This software is currently broken and may be removed entirely soon.
ILK: Induction of Linguistic Knowledge Research Group

Tagger-lemmatizer for transcribed speech

This tagger is trained on the full Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN) and can tag transcribed spontaneous speech. Transcriptions must follow the CGN orthographic transcription guidelines and may contain filled pauses, interrupted words, and other special markers. The lemmatizer is trained on the Spoken Dutch Corpus lexicon, which fully covers the corpus.
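To show what such markers look like to a program, here is a minimal Python sketch that splits one transcription line into words and flags the special tokens. The marker table (`*a` for an interrupted word, `*v` for a foreign word, `*d` for a dialect word, `*x` for an uncertain transcription, plus `uh`, `uhm`, `ggg` and `xxx`) is only our partial reading of the CGN conventions and should be treated as an assumption. The function name `tokenize_ort_line` is hypothetical and is not part of the tagger.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed, partial marker tables; the CGN orthographic transcription
# guidelines are the authoritative source.
SUFFIX_CODES = {
    "a": "interrupted word",
    "v": "foreign word",
    "d": "dialect word",
    "x": "uncertain transcription",
}
NON_LEXICAL = {
    "uh": "filled pause",
    "uhm": "filled pause",
    "ggg": "speaker noise",
    "xxx": "unintelligible",
}

# A word followed by "*" and a single lowercase code letter, e.g. "verga*a".
SUFFIX_RE = re.compile(r"^(?P<word>.+)\*(?P<code>[a-z])$")


@dataclass
class Token:
    word: str
    marker: Optional[str] = None  # None means an ordinary word


def tokenize_ort_line(line: str) -> list[Token]:
    """Split on whitespace and separate CGN-style markers from the words."""
    tokens = []
    for raw in line.split():
        if raw in NON_LEXICAL:
            tokens.append(Token(raw, NON_LEXICAL[raw]))
            continue
        m = SUFFIX_RE.match(raw)
        if m and m.group("code") in SUFFIX_CODES:
            tokens.append(Token(m.group("word"), SUFFIX_CODES[m.group("code")]))
        else:
            tokens.append(Token(raw))
    return tokens


if __name__ == "__main__":
    for t in tokenize_ort_line("ik uh ging naar de verga*a vergadering"):
        print(t.word, t.marker)
```

A real front end would also handle punctuation and any codes missing from the table. Here, unrecognized `*` suffixes are deliberately left attached to the word so that nothing is silently dropped.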


Input may also be uploaded as a file in the .ort format (maximum 128 KB).

Tagger-lemmatizer for written text

This tagger produces CGN tags, plus a few additional tags for phenomena specific to written text. Its estimated tagging accuracy is about 98% overall and about 76% on unseen words. The tagger is built with Mbt, a memory-based tagger generator. The lemmatizer is trained on the e-Lex lexicon and is based on TiMBL (Tilburg Memory-Based Learner).
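Mbt and TiMBL are memory-based learners. Training stores the examples, and classification assigns the label of the most similar stored examples. The toy Python sketch below illustrates that principle and shows how accuracy on seen and unseen words can be measured separately, as in the figures above. It is not Mbt's actual feature set or algorithm. The (left word, focus word, right word) features, the unweighted overlap distance, the simplified CGN main-category tags, and all function names are assumptions made for illustration.

```python
from collections import Counter

PAD = "_"  # hypothetical padding symbol for sentence boundaries


def windows(words):
    """Yield a (left, focus, right) feature triple for each word position."""
    padded = [PAD] + list(words) + [PAD]
    for i in range(1, len(padded) - 1):
        yield (padded[i - 1], padded[i], padded[i + 1])


def train(tagged_sentences):
    """Memory-based 'training': store every instance together with its tag."""
    memory = []
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        for feats, (_, tag_) in zip(windows(words), sent):
            memory.append((feats, tag_))
    return memory


def overlap_distance(a, b):
    """Count mismatching feature values (unweighted overlap metric)."""
    return sum(x != y for x, y in zip(a, b))


def classify(memory, feats):
    """Majority tag among all stored instances at the minimal distance."""
    best = min(overlap_distance(feats, m) for m, _ in memory)
    votes = Counter(t for m, t in memory if overlap_distance(feats, m) == best)
    return votes.most_common(1)[0][0]


def tag(memory, words):
    return [classify(memory, f) for f in windows(words)]


def accuracy_split(memory, gold_sentences):
    """Return (overall accuracy, accuracy on words never seen as a focus word)."""
    known = {feats[1] for feats, _ in memory}
    total = correct = unseen = unseen_correct = 0
    for sent in gold_sentences:
        words = [w for w, _ in sent]
        for w, predicted, (_, gold) in zip(words, tag(memory, words), sent):
            hit = predicted == gold
            total += 1
            correct += hit
            if w not in known:
                unseen += 1
                unseen_correct += hit
    return correct / total, (unseen_correct / unseen if unseen else None)


if __name__ == "__main__":
    data = [
        [("de", "LID"), ("man", "N"), ("loopt", "WW")],
        [("de", "LID"), ("vrouw", "N"), ("slaapt", "WW")],
    ]
    mem = train(data)
    print(tag(mem, ["de", "kat", "loopt"]))  # → ['LID', 'N', 'WW']
```

In this sketch, the unseen word `kat` can only be tagged from its neighbours, because its focus value matches nothing in memory. That is why unseen-word accuracy is worth reporting on its own. TiMBL also supports feature weighting, which this unweighted version leaves out.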


Input may also be uploaded as a file of raw ASCII text with no markup or codes (maximum 128 KB).