MBMT is a software package for training and
running a machine translation system. It is based on the
k-nearest neighbor classifier as implemented in TiMBL, and also features a
target language model built with WOPR, a memory-based language
model. It assumes a word-aligned bilingual parallel training corpus,
such as one produced by GIZA++.
Features
Generates a machine translation model from a word-aligned parallel corpus
Based on k-nearest neighbor classification, as implemented in TiMBL
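As a rough illustration of the k-nearest neighbor idea, the following toy sketch stores training instances in memory and labels a new instance by its closest stored neighbors under a simple overlap metric. It is not TiMBL's API and not MBMT's actual feature representation; the function names, the three-word window layout, and the Dutch-English example data are all assumptions made for illustration.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the positions at which two equal-length feature tuples differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def knn_classify(memory, query, k=1):
    """Return the majority label among the k stored examples nearest to query."""
    ranked = sorted(memory, key=lambda ex: overlap_distance(ex[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical instances: (previous, focus, next) source words -> target word.
memory = [
    (("<s>", "het", "huis"), "the"),
    (("het", "huis", "</s>"), "house"),
    (("<s>", "een", "huis"), "a"),
    (("een", "huis", "</s>"), "house"),
]

# The query differs from the second stored instance in one position only,
# so that instance is the nearest neighbor.
print(knn_classify(memory, ("het", "huis", "is"), k=1))  # prints "house"
```

Nothing is abstracted away at training time in this sketch: all examples are kept, and the work happens at classification time, which is the sense in which such a classifier is memory-based.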
MBMT was written by Antal van den Bosch and Peter Berck, with contributions from Ko van der Sloot.
See also
PBMBMT, phrase-based memory-based machine translation, is an extension to MBMT based on variable-width n-grams, or phrases, instead of the fixed-width trigrams of MBMT. Coded by Maarten van Gompel.
The tarball unpacks ('tar zxvf mbmt-0.1.tar.gz') into a directory called 'mbmt-0.1'.
In the 'mbmt-0.1' directory, run './configure',
followed by 'make'.
To install the software in a different location,
run './configure --prefix <install-dir>', followed by 'make' and
'make install'.
For step-by-step instructions on running MBMT, see this howto. Alternatively,
the package includes a shell script, mbmt.sh, which runs a full training
and translation process from a GIZA++-aligned 'A3.final' file and a
target-language text (one sentence per line).
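For readers unfamiliar with the 'A3.final' file mentioned above, here is a minimal sketch of reading one. It assumes the usual GIZA++ layout of three lines per sentence pair: a '#' header line, one plain sentence, and one sentence whose tokens are each followed by '({ ... })' holding the 1-based positions of the words they align to in the plain sentence. The function name parse_a3 and the sample data are hypothetical, and this is not how mbmt.sh itself reads the file.

```python
import re

# One aligned token: a word followed by "({ i j ... })", possibly empty.
TOKEN_RE = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def parse_a3(lines):
    """Yield (words, aligned) per sentence pair.

    words   -- tokens of the plain sentence line
    aligned -- list of (token, [1-based positions in words]) from the third line
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 2, 3):
        header, sentence, alignment = lines[i:i + 3]
        if not header.startswith("#"):
            raise ValueError(f"expected a '#' header line, got: {header!r}")
        words = sentence.split()
        aligned = [
            (token, [int(n) for n in positions.split()])
            for token, positions in TOKEN_RE.findall(alignment)
        ]
        yield words, aligned


# Made-up sample in the assumed layout.
sample = """\
# Sentence pair (1) source length 4 target length 4 alignment score : 0.001
the house is small
NULL ({ }) het ({ 1 }) huis ({ 2 }) is ({ 3 }) klein ({ 4 })
""".splitlines()

for words, aligned in parse_a3(sample):
    for token, positions in aligned:
        print(token, [words[p - 1] for p in positions])
```

The NULL token collects words that have no counterpart in the other sentence; in the sample its position list is empty.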
Van den Bosch, A., Stroppa, N., and Way, A. (2007). A
memory-based classification approach to marker-based EBMT. In
F. Van Eynde, V. Vandeghinste, and I. Schuurman (Eds.), Proceedings
of the METIS-II Workshop on New Approaches to Machine Translation,
pp. 63-72. Leuven, Belgium.
Stroppa, N., Van den Bosch, A., and Way, A. (2007). Exploiting
source similarity for SMT using context-informed features. In A. Way
and B. Gawronska (Eds.), Proceedings of the 11th International
Conference on Theoretical Issues in Machine Translation (TMI 2007),
Skövde University Studies in Informatics 2007:1, pp. 231-240.
Sponsor
MBMT is developed as part of the Implicit Linguistics project, funded
by NWO, the Netherlands Organisation
for Scientific Research.