MBLEM and MBMA: Memory-based lemmatization and morphological analysis demos

MBLEM: Memory-based lemmatization

MBLEM is a lemmatizer for English, German, and Dutch. Its engine is a TiMBL server operating on CELEX data sets of wordform-lemma information for these three languages. MBLEM maps each instance (an encoded word) in a single step to a complex class that encodes both the information needed to go from the inflected form to the lemma and the POS tag. The demo produces all possible lemmatizations, each with its morphosyntactic information. For example, the word shoes is analyzed as:

Enter any word in the entry field below, select the appropriate language, and click on "analyse".
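
The one-step mapping described above can be illustrated with a small sketch. This is not the MBLEM or TiMBL code: the class format (LemmaClass), the feature layout (featurize), the toy memory, and the overlap measure are all assumptions made for illustration. The sketch also returns only the single best match, whereas the demo produces all possible lemmatizations.

```python
from typing import NamedTuple


class LemmaClass(NamedTuple):
    """Hypothetical class label: strip `delete` from the end of the word,
    append `insert`, and tag the result with `pos`."""
    delete: str
    insert: str
    pos: str


def featurize(word: str, width: int = 6) -> tuple:
    """Encode a word as its last `width` letters, left-padded with '_'."""
    return tuple(word[-width:].rjust(width, "_"))


# Toy instance memory; these entries are invented, not taken from CELEX.
MEMORY = [
    (featurize("books"), LemmaClass("s", "", "noun-plural")),
    (featurize("walked"), LemmaClass("ed", "", "verb-past")),
    (featurize("cities"), LemmaClass("ies", "y", "noun-plural")),
]


def overlap(a: tuple, b: tuple) -> int:
    """Count the feature positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b))


def lemmatize(word: str) -> tuple:
    """Copy the class of the most similar stored instance and apply it."""
    features = featurize(word)
    _, cls = max(MEMORY, key=lambda item: overlap(features, item[0]))
    # Only strip the suffix if the word actually ends with it.
    if word.endswith(cls.delete):
        stem = word[: len(word) - len(cls.delete)]
    else:
        stem = word
    return stem + cls.insert, cls.pos


print(lemmatize("parties"))  # ('party', 'noun-plural')
```

The point of the sketch is that the class itself carries both the rewrite and the tag, so one nearest-neighbour lookup yields the lemma and the POS label together.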



MBMA: Memory-based morphological analysis

The MBMA demo analyzes the morphology of Dutch words. Its engine is a TiMBL server operating on a CELEX data set of Dutch wordform morphology. MBMA maps each instance (a letter in its immediate letter context) in a single step to a complex class that encodes segmentation, derivation, affixation, inflection, spelling changes, and the POS tag. The resulting sequence of letter-by-letter codes is then mapped to a bracketed analysis with morphosyntactic labels. For example, the word stationshallen is analyzed as:

Enter any Dutch word in the entry field below, and click on "analyse".
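
To make the notion of "letters in their immediate letter context" concrete, here is a minimal sketch of how a word could be turned into one instance per letter. The function name letter_instances, the context width of two letters on each side, and the '_' padding symbol are assumptions; the window actually used by MBMA is not stated here.

```python
def letter_instances(word: str, context: int = 2, pad: str = "_") -> list:
    """Return one fixed-width window per letter of `word`.

    Each window holds the focus letter in the middle, with `context`
    letters to its left and right; positions beyond the word edges
    are filled with `pad`.
    """
    padded = pad * context + word + pad * context
    width = 2 * context + 1
    return [tuple(padded[i : i + width]) for i in range(len(word))]


# One instance per letter; a classifier would assign a code to each.
for instance in letter_instances("hal", context=1):
    print(instance)
# ('_', 'h', 'a')
# ('h', 'a', 'l')
# ('a', 'l', '_')
```

Each window would be classified separately, and the per-letter codes would then be combined into the bracketed analysis; that second step depends on the class inventory and is not sketched here.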