MBMT: Memory-based machine translation
MBMT is a software package for training and
running a machine translation system. It is based on the
k-nearest neighbor classifier as implemented in TiMBL, and also features a
memory-based target language model based on WOPR, a memory-based language
model. It assumes a word-aligned bilingual parallel training corpus,
such as one produced by GIZA++.
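The idea behind a memory-based language model can be illustrated with a rough, hypothetical sketch: store every (left context, next word) instance seen in training text and, when predicting, return the stored continuations of the longest context that is found in memory. The class `MemoryLM`, its methods and the `<s>`/`</s>` padding symbols are invented for illustration; this exact-match backoff is a simplification and not the algorithm WOPR actually uses.

```python
from collections import Counter, defaultdict


class MemoryLM:
    """Hypothetical memory-based LM: stores (left context -> next word) instances."""

    def __init__(self, order: int = 3):
        self.order = order  # context length is order - 1 words
        # memory[n][context] counts the next words seen after that n-word context
        self.memory = defaultdict(lambda: defaultdict(Counter))

    def train(self, sentences):
        for sentence in sentences:
            words = ["<s>"] * (self.order - 1) + sentence.split() + ["</s>"]
            for i in range(self.order - 1, len(words)):
                # Store the instance under every context length so prediction can back off.
                for n in range(self.order - 1, -1, -1):
                    context = tuple(words[i - n:i])
                    self.memory[n][context][words[i]] += 1

    def predict(self, context_words):
        """Return a next-word distribution from the longest context found in memory."""
        padded = ["<s>"] * (self.order - 1) + list(context_words)
        for n in range(self.order - 1, -1, -1):
            context = tuple(padded[len(padded) - n:])
            counts = self.memory[n].get(context)
            if counts:
                total = sum(counts.values())
                return {word: count / total for word, count in counts.items()}
        return {}
```

For example, after training on "the house is small", "the house is big" and "the car is red", `predict(["house", "is"])` gives equal weight to "small" and "big", while an unseen context such as `["a", "tree"]` backs off to the empty context and returns the overall next-word distribution.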
Generates a machine translation model from a word-aligned parallel corpus
Based on k-nearest neighbor classification, as implemented in TiMBL
Fast training and
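To make the k-nearest neighbor idea concrete, here is a minimal sketch of memory-based classification with the overlap metric. It is not TiMBL's API: each stored instance is a hypothetical fixed-width source trigram (left word, focus word, right word) labeled with a target-language word, and a new trigram receives the majority label among all stored instances at the k smallest distances. The memory contents and function names are invented for illustration, and TiMBL refinements such as feature weighting are omitted.

```python
from collections import Counter

# Hypothetical training memory: (left, focus, right) source trigram -> target word.
# "_" pads sentence boundaries.
MEMORY = [
    (("_", "het", "huis"), "the"),
    (("het", "huis", "is"), "house"),
    (("huis", "is", "klein"), "is"),
    (("is", "klein", "_"), "small"),
    (("_", "de", "auto"), "the"),
    (("de", "auto", "is"), "car"),
]


def overlap_distance(a, b):
    """Overlap metric: the number of feature positions whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(trigram, memory=MEMORY, k=1):
    """Majority label over all stored instances at the k smallest distances."""
    distances = sorted({overlap_distance(trigram, feats) for feats, _ in memory})
    nearest = set(distances[:k])
    votes = Counter(label for feats, label in memory
                    if overlap_distance(trigram, feats) in nearest)
    return votes.most_common(1)[0][0]
```

An unseen trigram such as ("het", "huis", "was") differs from ("het", "huis", "is") in one position only, so it is labeled "house". Pooling all instances at a given distance, rather than counting individual neighbors, keeps the result independent of the order in which equidistant instances are stored.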
MBMT is free software; you can
redistribute it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation.
MBMT is written by Antal van den Bosch and Peter Berck, with contributions from Ko van der Sloot.
PBMBMT, phrase-based memory-based machine translation, is an extension to MBMT based on variable-width n-grams, or phrases, instead of the fixed-width trigrams of MBMT. Coded by Maarten van Gompel.
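The contrast between fixed-width trigrams and variable-width n-grams can be illustrated on the input side with a small, hypothetical windowing sketch. The function names, the "_" padding symbol and the `max_n` cap are assumptions for illustration; the sketch does not reproduce how MBMT or PBMBMT actually build their training instances.

```python
def trigram_windows(words, pad="_"):
    """One fixed-width window per word: (left neighbor, focus word, right neighbor)."""
    padded = [pad] + list(words) + [pad]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]


def ngram_windows(words, max_n=3):
    """All contiguous n-grams of width 1..max_n (variable width, no padding)."""
    return [tuple(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


sentence = "het huis is klein".split()
print(trigram_windows(sentence))
print(ngram_windows(sentence, max_n=2))
```

A four-word sentence always yields exactly four trigram windows, one per focus word, whereas the number of variable-width candidates grows with `max_n`.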
Download and installation
To install, please follow these basic instructions:
MBMT relies on an installation and availability in $PATH of the following two packages: TiMBL and WOPR.
The tarball will unpack ('tar zxvf mbmt-0.1.tar.gz') into a directory called 'mbmt-0.1'.
In the 'mbmt-0.1' directory, issue a './configure' command,
followed by 'make'.
If you want to install the software elsewhere,
issue './configure --prefix <install-dir>', followed by 'make' and 'make install'.
For step-by-step instructions on running
MBMT, see this . Alternatively,
the software comes with a shell script, mbmt.sh:
The mbmt.sh script included in the package runs a full training
and translation process, based on a GIZA++-aligned
'A3.final' file and a source-language test text (one sentence per line).
For example, run the script by changing to the 'etc/' directory,
moving the following two files there: JRC-Acquis.sample.A3.final and JRC-Acquis.source.test.txt, and
issuing 'mbmt.sh JRC-Acquis.sample.A3.final JRC-Acquis.source.test.txt'.
(These Dutch-English files are small extracts from the JRC-Acquis
multilingual parallel corpus.)
MBMT has been compiled successfully with gcc (4.0 - 4.2) on
Intel/AMD platforms running several versions of Linux and Mac OS X.
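The 'A3.final' file used by mbmt.sh is GIZA++ word-alignment output. It stores each sentence pair as three lines: a '#' header, the tokens of one side of the pair, and the tokens of the other side (starting with NULL), each followed by the 1-based positions of the words it aligns to, written as ({ 1 2 }). A minimal reader for that format might look like the following sketch. The function name `read_a3` and the returned tuple layout are hypothetical and not part of MBMT.

```python
import re

# Matches "word ({ 1 2 })" or "NULL ({ })" tokens on the third line of each record.
TOKEN = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def read_a3(lines):
    """Yield (header, words, aligned) for each sentence pair.

    words:   tokens on the second line of the record
    aligned: list of (token, [1-based positions into words]) from the third line
    """
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 2, 3):
        header, words_line, aligned_line = lines[i:i + 3]
        if not header.startswith("#"):
            raise ValueError(f"expected '#' header at line {i + 1}")
        aligned = [(tok, [int(p) for p in pos.split()])
                   for tok, pos in TOKEN.findall(aligned_line)]
        yield header, words_line.split(), aligned
```

Pairing each token on the third line with the words at its listed positions recovers the word alignments from which training instances could then be built; in a real file, any line iterable (such as an open file handle) can be passed in.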
For more information and background on MBMT, see
Van den Bosch, A. and Berck, P. (2009). Memory-based
machine translation and language modeling. The Prague Bulletin
of Mathematical Linguistics, No. 91, pp. 17-26.
Van den Bosch, A., Stroppa, N., and Way, A. (2007). A
memory-based classification approach to marker-based EBMT. In
F. Van Eynde, V. Vandeghinste, and I. Schuurman (Eds.), Proceedings
of the METIS-II Workshop on New Approaches to Machine Translation,
pp. 63-72. Leuven, Belgium.
Stroppa, N., Van den Bosch, A., and Way, A. (2007). Exploiting
source similarity for SMT using context-informed features. In A. Way
and B. Gawronska (Eds.), Proceedings of the 11th International
Conference on Theoretical Issues in Machine Translation (TMI 2007),
Skövde University Studies in Informatics 2007:1, pp. 231-240.
MBMT is developed as part of the Implicit Linguistics project, funded
by NWO, the Netherlands Organisation
for Scientific Research.