Department of Computer and Information Science

TDT4215 Web-intelligence Spring 2014

Document collections

REUTERS-21578 altered collection

The collection consists of 21,578 documents, grouped into XML files of about 1,000 documents each. The individual files are explained in the assignment text. Read the README files as well.
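
As a quick sanity check after downloading, one could count the documents in each XML file and confirm that the totals roughly match the figures above. The following is only a minimal sketch: the element name REUTERS and the wrapping COLLECTION root are assumptions made for illustration (the real names are given in the README files), and the sample string stands in for the contents of one file.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name for one document; check the README files
# for the tag actually used in the altered collection.
DOC_TAG = "REUTERS"


def count_documents(xml_text, doc_tag=DOC_TAG):
    """Return the number of <doc_tag> elements found in one file's XML text."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the root element itself.
    return sum(1 for _ in root.iter(doc_tag))


# Tiny made-up stand-in for the contents of one collection file.
sample = """<COLLECTION>
  <REUTERS><TITLE>First</TITLE></REUTERS>
  <REUTERS><TITLE>Second</TITLE></REUTERS>
</COLLECTION>"""

print(count_documents(sample))
```

For the real files, the text of each file would be read from disk and passed to count_documents; summing the results over all files should then give a number close to 21,578 if the assumed tag name is correct.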