TiMBL

TiMBL is an open source software package implementing several memory-based learning algorithms, including IB1-IG, an implementation of k-nearest neighbor classification with feature weighting suitable for symbolic feature spaces, and IGTree, a decision-tree approximation of IB1-IG. All implemented algorithms store some representation of the training set explicitly in memory. During testing, new cases are classified by extrapolation from the most similar stored cases.
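
To make the idea concrete, here is a minimal Python sketch of k-nearest neighbor classification over symbolic features, with an overlap distance weighted by information gain. It is an illustration only, not TiMBL's code: the function names and the toy data are made up for this example, and TiMBL's own handling of ties, of k, and of its default feature weighting may differ.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(instances, labels, feature_index):
    """Label entropy minus the expected entropy after splitting on one feature."""
    by_value = defaultdict(list)
    for features, label in zip(instances, labels):
        by_value[features[feature_index]].append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder


def classify(instances, labels, query, k=1):
    """Majority vote over the k stored instances nearest to the query.

    Simplification (assumption of this sketch): k counts instances,
    and ties are broken by storage order.
    """
    weights = [information_gain(instances, labels, i) for i in range(len(query))]

    def distance(stored):
        # Weighted overlap: each mismatching feature costs its information gain.
        return sum(w for w, a, b in zip(weights, stored, query) if a != b)

    ranked = sorted(zip(instances, labels), key=lambda pair: distance(pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: the first feature determines the class, the second is noise.
toy_instances = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
toy_labels = ["pos", "pos", "neg", "neg"]
print(classify(toy_instances, toy_labels, ("a", "z")))
```

In the toy data the second feature carries no information about the class, so its weight is zero and an unseen value such as "z" does not affect the outcome.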

For over fifteen years TiMBL has been used mostly in natural language processing as a machine learning classifier component, but its use extends to virtually any supervised machine learning domain. Due to its particular decision-tree-based implementation, TiMBL is in many cases far more efficient at classification than a standard k-nearest neighbor algorithm would be.

Features

Fast, decision-tree-based implementation of k-nearest neighbor classification
Implementations of IB1 and IB2, IGTree, TRIBL, and TRIBL2 algorithms
Similarity metrics: Overlap, MVDM, Jensen-Shannon and Jeffrey Divergence, Dot product, Cosine
Feature weighting metrics: information gain, gain ratio, chi squared, shared variance
Per-value similarity metrics: Levenshtein, Dice coefficient
Distance weighting metrics: inverse, inverse linear, exponential decay
Multi-CPU support
Extensive verbosity options to inspect nearest neighbor sets
Server functionality and extensive API
Fast leave-one-out testing and internal cross-validation
Handles user-defined example weighting
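
The three distance weighting metrics named in the list above can be sketched in Python as follows. The formulas are the common textbook forms and are an assumption of this example; the function names, the epsilon constant, and the alpha and beta parameters are hypothetical, and TiMBL's exact definitions and defaults should be checked in the Reference Guide.

```python
import math
from collections import defaultdict


def inverse_distance(distances, epsilon=1e-9):
    """Weight 1 / (d + epsilon); epsilon avoids division by zero on exact matches."""
    return [1.0 / (d + epsilon) for d in distances]


def inverse_linear(distances):
    """Nearest neighbor gets weight 1, the farthest gets 0, linear in between."""
    nearest, farthest = min(distances), max(distances)
    if farthest == nearest:
        # All neighbors are equally far away, so they all count fully.
        return [1.0 for _ in distances]
    return [(farthest - d) / (farthest - nearest) for d in distances]


def exponential_decay(distances, alpha=1.0, beta=1.0):
    """Weight exp(-alpha * d ** beta); larger alpha discounts distance faster."""
    return [math.exp(-alpha * d ** beta) for d in distances]


def weighted_vote(neighbors, weighting):
    """Return the label with the largest summed weight.

    neighbors is a list of (distance, label) pairs; weighting is one of
    the functions above.
    """
    distances = [d for d, _ in neighbors]
    totals = defaultdict(float)
    for (_, label), weight in zip(neighbors, weighting(distances)):
        totals[label] += weight
    return max(totals, key=totals.get)


# One very close neighbor of class A, two distant neighbors of class B.
toy_neighbors = [(0.1, "A"), (0.9, "B"), (1.0, "B")]
for scheme in (inverse_distance, inverse_linear, exponential_decay):
    print(scheme.__name__, weighted_vote(toy_neighbors, scheme))
```

An unweighted majority vote over these three neighbors would pick B; each of the weighting schemes lets the single close neighbor outvote the two distant ones.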

Download & Installation

TiMBL is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation.

To download and install TiMBL:

First check whether your distribution's package manager includes an up-to-date package. There are packages for Alpine Linux, Arch Linux (AUR), macOS (homebrew), and Debian and derivatives such as Ubuntu.
If not, we recommend you use our Docker container via docker pull proycon/timbl. It includes TiMBL and all necessary dependencies.
Alternatively, you can always download, compile and install TiMBL manually, as shown next.

Manual installation

To compile TiMBL manually, consult the included INSTALL document. You will need current versions of the following dependencies of our software:

ticcutils - A shared utility library

You will also need the following third-party dependencies:

A sane build environment with a C++ compiler (e.g. gcc or clang), autotools, libtool, pkg-config

Documentation

Book: Memory-Based Language Processing - Daelemans, W., and Van den Bosch, A. (2005). Cambridge, UK: Cambridge University Press.
Reference Guide: Daelemans, W., Zavrel, J., Van der Sloot, K., and Van den Bosch, A. (still in edit). TiMBL: Tilburg Memory Based Learner, version 6.4, Reference Guide.
API guide (34 pages, 129 kB PDF): Van der Sloot, K. (2010). TiMBL: Tilburg Memory Based Learner, version 6.3, API Guide. ILK Research Group Technical Report Series no. 10-03.
TimblServer Manual (12 pages, 62 kB PDF): Van der Sloot, K. (2010). TimblServer: Tilburg Memory-Based Learner Server, version 1.0, Manual. ILK Research Group Technical Report Series no. 10-02.

Extensions

Several wrappers, bindings and other extensions to TiMBL have been developed:

TimblServer - TiMBL wrapper that adds server functionality
python-timbl - Python language binding for TiMBL
Dimbl - Parallel TiMBL for multi-core processing; parallelizes by splitting the training set
paramsearch - Automatic hyperparameter optimization for TiMBL (and other ML algorithms)
rtimbl - a Ruby interface to TiMBL
Timpute - TiMBL-based data imputation
knngraph - Visualizes nearest neighbors in a TiMBL instance base

TiMBL is a core component of various NLP software systems such as MBT (memory-based tagger generator), Frog (Dutch morpho-syntactic analyzer), Gecco (context-sensitive spelling corrector, used by Valkuil.net for Dutch and Fowlt.net for English), and SoothSayer (Dutch word completion).

The development and improvement of TiMBL also rely on your bug reports, suggestions, and comments. Use the GitHub issue tracker or mail lamasoftware (at) science.ru.nl.