MBMT: Memory-based machine translation

MBMT is a software package for training and running a machine translation system. It is based on the k-nearest neighbor classifier as implemented in TiMBL, and includes a memory-based target-language model built on WOPR. Training assumes a word-aligned bilingual parallel corpus, such as one produced by GIZA++.

Features

  • Generates a machine translation model from a word-aligned parallel corpus
  • Based on k-nearest neighbor classification, as implemented in TiMBL
  • Fast training and translation

MBMT is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation.

Credits

MBMT is written by Antal van den Bosch and Peter Berck, with contributions from Ko van der Sloot.

See also

PBMBMT (phrase-based memory-based machine translation) is an extension to MBMT that uses variable-width n-grams, or phrases, instead of the fixed-width trigrams of MBMT. It was written by Maarten van Gompel.

Archived versions

http://software.ticc.uvt.nl/   

Download and installation

To install, please follow these basic instructions:

  • MBMT relies on an installation and availability in $PATH of the following two packages:
  • Unpack the tarball with 'tar zxvf mbmt-0.1.tar.gz'; this creates a directory called 'mbmt-0.1'.
  • In the 'mbmt-0.1' directory, issue a './configure' command, followed by 'make'.
  • If you want to install the software elsewhere, issue a './configure --prefix <install-dir>', followed by 'make' and 'make install'.
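
The build steps above could be scripted roughly as follows. This is a minimal sketch, not part of the MBMT distribution: the function name build_mbmt and the PREFIX variable are made up for illustration, and the script assumes that 'mbmt-0.1.tar.gz' sits in the current directory and that the prerequisite packages are already installed.

```shell
#!/bin/sh
# Sketch of the build steps listed above; not shipped with MBMT.
set -eu

TARBALL="mbmt-0.1.tar.gz"   # tarball name as given in the instructions
SRCDIR="mbmt-0.1"           # directory the tarball unpacks into
PREFIX="${PREFIX:-}"        # hypothetical variable: optional install directory

# The function body runs in a subshell (note the parentheses), so the
# caller's working directory is left unchanged.
build_mbmt() (
    tar zxvf "$TARBALL"
    cd "$SRCDIR"
    if [ -n "$PREFIX" ]; then
        # Install elsewhere: configure with a prefix, then build and install.
        ./configure --prefix "$PREFIX"
        make
        make install
    else
        # Default: configure and build in place.
        ./configure
        make
    fi
)

if [ -f "$TARBALL" ]; then
    build_mbmt
else
    echo "Place $TARBALL in the current directory first." >&2
fi
```

Saved under a hypothetical name such as 'build-mbmt.sh', it would take the '--prefix' route when called as 'PREFIX=$HOME/local sh build-mbmt.sh', and stop after 'make' when PREFIX is unset.
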
For step-by-step instructions on running MBMT, see this howto. Alternatively, the software comes with a shell script, mbmt.sh:
  • The mbmt.sh script included in the package runs a full training and translation process. It takes two inputs: a GIZA++-aligned 'A3.final' file and a target-language text (one sentence per line).
  • For example, run the script by
    1. changing to the 'etc/' directory,
    2. moving the following two files there: JRC-Acquis.sample.A3.final and JRC-Acquis.source.test.txt, and
    3. issuing 'mbmt.sh JRC-Acquis.sample.A3.final JRC-Acquis.source.test.txt'.
    (These Dutch-English files are small extracts from the JRC-Acquis multilingual parallel corpus).
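
The three example steps above could be wrapped in a small script like the one below. It is only a sketch under stated assumptions: the function name run_sample and the SAMPLE_DIR variable are hypothetical, the script is assumed to be started from the directory that contains 'etc/', and mbmt.sh is assumed to be executable and called with a './' prefix rather than found via $PATH.

```shell
#!/bin/sh
# Sketch of the three example steps above; not shipped with MBMT.
set -eu

A3_FILE="JRC-Acquis.sample.A3.final"
TEST_FILE="JRC-Acquis.source.test.txt"
# Hypothetical variable: where the two sample files currently live,
# given as a path relative to 'etc/' (or as an absolute path).
SAMPLE_DIR="${SAMPLE_DIR:-..}"

# The function body runs in a subshell, so the caller's working
# directory is left unchanged.
run_sample() (
    # Step 1: change to the 'etc/' directory.
    cd etc
    # Step 2: move the two sample files there.
    mv "$SAMPLE_DIR/$A3_FILE" "$SAMPLE_DIR/$TEST_FILE" .
    # Step 3: run the script on both files (assumes it is executable).
    ./mbmt.sh "$A3_FILE" "$TEST_FILE"
)

if [ -f etc/mbmt.sh ]; then
    run_sample
else
    echo "Run this from the directory that contains 'etc/mbmt.sh'." >&2
fi
```

If the sample files are stored somewhere else, setting SAMPLE_DIR to an absolute path before running the script avoids any confusion about relative paths after the 'cd'.
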
MBMT has been compiled successfully with gcc (versions 4.0 to 4.2) on Intel/AMD platforms running several versions of Linux and Mac OS X.

References

For more information and background on MBMT, see

Sponsor

MBMT is developed as part of the Implicit Linguistics project, funded by NWO, the Netherlands Organisation for Scientific Research.

Antal.vdnBosch@uvt.nl