ILK: Induction of Linguistic Knowledge Research Group - Software
  
ILK Software
  
TiMBL - Tilburg Memory-Based Learner

  • TiMBL
    The TiMBL software package is a fast, tree-based implementation of k-nearest neighbor classification. The package includes the IB1, IB2, TRIBL, TRIBL2, and IGTree algorithms, and offers various weighting metrics. Server functionality is offered by the separate TimblServer wrapper.

  • Dimbl
    TiMBL wrapper that performs parallel k-NN classification on multi-CPU machines.

  • paramsearch
    Wrapped progressive sampling search for automatic optimization of algorithmic parameters for TiMBL and other machine learning algorithms.

  • python-timbl
    Python language bindings for TiMBL.

  • rtimbl
    Ruby language bindings for TiMBL.

  • knngraph
    Visualizes nearest neighbors in a TiMBL instance base.
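
The k-nearest neighbor classification mentioned in the TiMBL entry above can be illustrated with a minimal sketch. This is not TiMBL's API, its tree-based index, or any of its weighting metrics: it is a naive linear scan with an unweighted overlap distance, and the function names and the toy instance base are hypothetical.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def knn_classify(instance_base, query, k=1):
    """Return the majority label among the k stored instances nearest to query."""
    # Linear scan over all stored instances (assumption: no index structure).
    ranked = sorted(instance_base, key=lambda item: overlap_distance(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy instance base: (previous word, focus word, next word) -> part-of-speech tag.
instance_base = [
    (("the", "dog", "barks"), "N"),
    (("the", "cat", "sleeps"), "N"),
    (("dog", "barks", "loudly"), "V"),
    (("cat", "sleeps", "often"), "V"),
]

predicted = knn_classify(instance_base, ("the", "bird", "sings"), k=3)
print(predicted)
```

All training instances are kept in memory and classification is deferred until a query arrives, which is the sense in which the approach is called memory-based.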

Tools

  • CLAM
    Computational Linguistics Application Mediator, turning (legacy) NLP software into RESTful web services and web applications. Written by Maarten van Gompel.

  • ewnpy
    A command-line interface to the Dutch EuroWordNet. Written by Erwin Marsi.

  • chunklink
    Converts Penn Treebank II files into a one-word-per-line format containing (at least) the same information as the original files. This script was used to generate the data for the CoNLL-2000 Shared Task. Written by Sabine Buchholz.

  • FoLiA
    Format for Linguistic Annotation. A rich XML format supporting a wide variety of linguistic annotations, using an extensible and universal paradigm. Written by Maarten van Gompel.

  • PyNLPl
    Python Natural Language Processing Library (PyNLPl, pronounced "pineapple"). A collection of Python modules for a wide variety of NLP tasks. Written by Maarten van Gompel.

  • suffixtree
    C++ package implementing the suffix tree datatype. Written by Menno van Zaanen.

  • sarrays
    C++ package implementing the suffix array datatype. Also prints ngrams and skipgrams. Written by Herman Stehouwer.
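
For readers unfamiliar with the terms in the sarrays entry above, the following is a small sketch of a suffix array over tokens, together with n-gram and skipgram extraction. It is not the sarrays code or its interface. The function names are hypothetical, the suffix array is built by naive sorting, and the skipgram definition used here (first and last token kept, interior replaced by one gap marker) is only one of several in use.

```python
def suffix_array(tokens):
    """Return the start positions of all suffixes, sorted by suffix content."""
    # Naive construction: sorts full suffixes, fine for a short example only.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def ngrams(tokens, n):
    """Return all contiguous token sequences of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, gap="*"):
    """Return n-token windows with the interior collapsed into a gap marker."""
    return [(gram[0], gap, gram[-1]) for gram in ngrams(tokens, n)]


tokens = "to be or not to be".split()

sa = suffix_array(tokens)
bigrams = ngrams(tokens, 2)
skips = skipgrams(tokens, 3)

print(sa)
print(bigrams)
print(skips)
```

Because the suffix array places suffixes that share a prefix next to each other, repeated n-grams such as "to be" end up adjacent, which is what makes the structure convenient for counting them.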

Packaged

TiMBL and TimblServer, MBT and MbtServer, and Ucto have been packaged for Debian, Ubuntu, and Fedora. Consult this page for further instructions:

Generic NLP software

  • Mbt
    A customizable tagger-generator and tagger combined in one. Based on a tagged corpus, Mbtg generates a tagger, for instance for part-of-speech tagging or named-entity recognition. Mbt processes text from left to right, and uses a feedback loop to take its own previous decisions into account.

  • MBMT and PBMBMT
    Memory-based machine translation based on trigrams (MBMT, written by Antal van den Bosch and Peter Berck) or phrases (PBMBMT, written by Maarten van Gompel).

  • ABL
    The Alignment-Based Learning grammatical inference system by Menno van Zaanen.

  • DEMOCRAT
    Deciding between Multiple Outputs Created by Automatic Translation, a consensus-driven machine translation system by Menno van Zaanen and Harold Somers.

  • WOPR
    Memory-based language modeling, written by Peter Berck.

  • Ucto
    Generic tokenizer and sentence splitter, written by Maarten van Gompel and Ko van der Sloot.
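
The left-to-right feedback loop described in the Mbt entry above can be sketched as follows. This is not Mbt's implementation or interface: a plain lookup table stands in for the memory-based classifier, and the function name, the table layout, and the toy entries are all hypothetical.

```python
def tag_sentence(words, memory, default="N"):
    """Tag words left to right, feeding each predicted tag into the next decision."""
    tags = []
    prev_tag = "<s>"  # sentence-start marker (assumption)
    for word in words:
        # First try the context-sensitive entry keyed on the previous decision.
        tag = memory.get((prev_tag, word))
        if tag is None:
            # Fall back to a context-free entry, then to the default tag.
            tag = memory.get((None, word), default)
        tags.append(tag)
        prev_tag = tag  # feedback: the tagger's own output becomes a feature
    return tags


# Toy memory: (previous tag, word) -> tag; None as previous tag means "any context".
memory = {
    ("<s>", "the"): "DET",
    ("DET", "can"): "N",
    ("N", "can"): "V",
    (None, "rust"): "V",
}

tags = tag_sentence(["the", "can", "can", "rust"], memory)
print(tags)
```

The two occurrences of "can" receive different tags only because the tag assigned to the preceding word is part of the lookup key. An early mistake can therefore propagate to later words, which is the usual trade-off of this design.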

Dutch language and speech technology

  • Frog
    Frog (formerly called Tadpole) is a modular system integrating a tagger, lemmatizer, morphological analyzer, and dependency parser based on TiMBL and MBT. Read about Frog in this paper.

  • NeXTeNS
    A multi-platform, open source text-to-speech system for Dutch.

Generic data mining software

Antal.vdnBosch@uvt.nl | Last update: