Publications about the MBT tagger:
- Recent Advances in Memory-Based Part-of-Speech Tagging. Jakub Zavrel and Walter Daelemans. in: Actas del VI Simposio Internacional de Comunicacion Social, Santiago de Cuba, pp. 590-597, 1999. ILK pub: ILK-9903.
- MBT: A Memory-Based Part of Speech Tagger-Generator. Walter Daelemans, Jakub Zavrel, Peter Berck and Steven Gillis. in: E. Ejerhed and I. Dagan (eds.) Proceedings of the Fourth Workshop on Very Large Corpora, Copenhagen, Denmark, pp. 14-27, 1996.
- description of English tagset (WSJ - Penn Treebank)
- Part-of-Speech Tagging for Dutch with MBT, a Memory-based Tagger Generator. Walter Daelemans, Jakub Zavrel and Peter Berck. in: Congresboek van de Interdisciplinaire Onderzoeksconferentie Informatiewetenschap 1996, TU Delft.
- description of Dutch tagset (WOTAN - I; Thanks to Hans van Halteren)
- The Spanish tagger was trained on a small part of the LEXESP corpus (Thanks to Rafael Nunoz).
- The Swedish tagger was trained on the SUC corpus (Thanks to Joachim Nivre).
- The Slovene tagger was trained on the MULTEXT-EAST corpus (George Orwell's 1984; Thanks to Tomaz Erjavec and Saso Dzeroski).
- The German tagger was trained on the Negra corpus, annotated with the STTS (Stuttgart-Tuebingen Tag Set) (Thanks to Thorsten Brants).
Contact Walter Daelemans or Jakub Zavrel for more information about these taggers.