| Title | Format | Description | Contact Person |
|---|---|---|---|
| GOV2 | HTML and TXT | http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm | Kjetil |
| TREC 1/2/3 | TXT | | Kjetil |
| British National Corpus | XML | http://www.natcorp.ox.ac.uk/ | Robert |
| MEDLINE | XML and TXT | http://www.nlm.nih.gov/pubs/factsheets/medline.html | Heri |
| TREC Genomic Track | XML and TXT | http://ir.ohsu.edu/genomics/ | Heri |
| Text Research Collection Vol.1,2,3,4,5 | TXT | http://trec.nist.gov/data/docs_eng.html | Wei |
| The AQUAINT Corpus of English News Text | TXT | http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T31 | Wei |
| The New York Times Annotated Corpus | XML | http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19 | Kjetil or Nattiya |