POS Annotation for ICSI Meeting Recorder Data

Here you can find the part-of-speech (POS) annotation for the ICSI Meeting Recorder Data. Please note that the files contain only the POS information and no word information, so you must already have the ICSI corpus to use this data. When using this data, please cite the following paper: Margot Mieskes and Michael Strube.
Part-of-Speech Tagging of Transcribed Speech. In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, 22-28 May 2006.
This paper also contains a description of the method used and detailed results.
The .txt files contain one segment per line.
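The one-segment-per-line layout lends itself to a very small reader. The sketch below is hypothetical: the helper names `read_pos_segments` and `tag_counts` are made up here, and it assumes that the tags within a line are separated by whitespace and that the files are UTF-8 encoded; check both assumptions against the actual files. Empty lines are kept as empty lists so that line n of the file still corresponds to segment n, which is what you need to realign the tags with the words from your own copy of the ICSI corpus.

```python
import sys
from collections import Counter


def read_pos_segments(path):
    """Return one list of POS tags per line (segment) of a .txt file.

    Assumption: tags inside a segment are separated by whitespace.
    Empty lines become empty lists, so list index == segment index.
    """
    segments = []
    with open(path, encoding="utf-8") as handle:  # assumed encoding
        for line in handle:
            segments.append(line.split())
    return segments


def tag_counts(segments):
    """Count how often each tag occurs across all segments."""
    return Counter(tag for segment in segments for tag in segment)


if __name__ == "__main__":
    # Usage: python read_pos.py <file.txt>
    segs = read_pos_segments(sys.argv[1])
    print(len(segs), "segments")
    for tag, count in tag_counts(segs).most_common(10):
        print(tag, count)
```

To attach words, you would zip each tag list with the corresponding tokenized segment from the ICSI transcripts and check that the lengths match; how the original tokenization was done is described in the paper, not here.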

The Gold Standard files are:

Bed016
Bed017
Bmr001
Bmr002
Bns003
Bmr003
Bmr004
Bmr005
Bsr001
Btr001
Btr002
Buw001

Downloads:

Here you can find the Gold Standard manual annotation in .txt format.

Here you can find the Gold Standard manual annotation in .mmax format.

Here you can find the automatic POS annotation for the Gold Standard, produced after retraining the four taggers on the manual data, in .txt format.

Here you can find the automatic POS annotation for the Gold Standard, produced after retraining the four taggers on the manual data, in .mmax format.

Here you can find the automatic POS annotation for the whole corpus, produced after retraining the four taggers on the manual data, in .txt format.

Here you can find the automatic POS annotation for the whole corpus, produced after retraining the four taggers on the manual data, in .mmax format.

The four taggers used were the following:

TBL Tagger: Eric Brill. Some Advances in Transformation-Based Part of Speech Tagging. In Proceedings of the 12th National Conference on Artificial Intelligence, Seattle, Washington, 1-4 August 1994, pp. 722-727.

TnT Tagger: Thorsten Brants. TnT - A Statistical Part-of-Speech Tagger. In Proceedings of the 6th International Conference on Applied Natural Language Processing, Seattle, Washington, 29 April - 4 May 2000, pp. 224-231.

Stanford NLP Library Tagger: Kristina Toutanova and Christopher D. Manning. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, Hong Kong, 2000, pp. 63-70.

Stanford NLP Library Tagger: Kristina Toutanova, Dan Klein, Christopher D. Manning and Yoram Singer. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Edmonton, Alberta, Canada, 27 May - 1 June 2003, pp. 252-259.