Department of Computer and Information Science


Automated cross-document summarization of news storylines

With the advance of digital media, there is a growing need for resources that can handle the dynamic and complex text data coming from news streams. In particular, manually tracking news storylines across news streams can be a time-consuming activity.

This project investigates how news storylines spread across several online documents can be summarized automatically. The main goal is to summarize storylines the way humans do: deduplicating and aggregating content, ordering events in time, and resolving conflicts between information distributed across several documents.

To carry out this project, the students will use resources for deep content analysis, including named entity recognizers, event detection techniques and knowledge bases such as Wikipedia. This is a very challenging project in the area of natural language generation, suitable for one or two students.

The project is part of the SmartMedia program at IDI. SmartMedia collaborates with the Norwegian media industry and investigates the use of semantics and linked data in large-scale, real-time news recommendation.

The project is supervised by Cristina Marco and Jon Atle Gulla.



