Data Sets

^ Title ^ Format ^ Description ^ Contact Person ^
| GOV2 | HTML and TXT | http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm | Kjetil |
| TREC 1/2/3 | TXT |  | Kjetil |
| British National Corpus | XML | http://www.natcorp.ox.ac.uk/ | Robert |
| MEDLINE | XML and TXT | http://www.nlm.nih.gov/pubs/factsheets/medline.html | Heri |
| TREC Genomic Track | XML and TXT | http://ir.ohsu.edu/genomics/ | Heri |
| Text Research Collection Vol.1,2,3,4,5 | TXT | http://trec.nist.gov/data/docs_eng.html | Wei |
| The AQUAINT Corpus of English News Text | TXT | http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T31 | Wei |
| The New York Times Annotated Corpus | XML | http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19 | Kjetil or Nattiya |
 
home.txt · Last modified: 2009/08/12 23:17 by user
 