Frog, formerly known as Tadpole, is an integration of
memory-based natural language processing (NLP) modules developed for
Dutch. All NLP modules are based
on Timbl, the Tilburg
memory-based learning software package. Most modules were created in
the 1990s at the ILK Research
Group (Tilburg University, the Netherlands) and
the CLiPS Research
Centre (University of Antwerp, Belgium). Over the years they
have been integrated into a single text processing tool. More
recently, a dependency parser, a base phrase chunker, and a
named-entity recognizer module were added.
Various (re)programming rounds have been made possible through funding
by NWO, the Netherlands
Organisation for Scientific Research, particularly under
project, the IMIX programme,
the Implicit Linguistics
project, and the CLARIN-NL
What does it do?
Frog's current version will tokenize, tag, lemmatize, and
morphologically segment word tokens in Dutch text files, will assign a
dependency graph to each sentence, will identify the base phrase
chunks in the sentence, and will attempt to find and label all named
entities.
Frog produces FoLiA XML, or tab-delimited
column-formatted output, one line per token, that looks as follows:
The ten columns contain the following information:
- Token number (resets every sentence)
- Token
- Lemma (according to MBLEM)
- Morphological segmentation (according to MBMA)
- PoS tag (CGN tagset; according to MBT)
- Confidence in the POS tag, a number between 0 and 1, representing the probability mass assigned to the best guess tag in the tag distribution
- Named entity type, identifying person (PER), organization (ORG), location (LOC), product (PRO), event (EVE), and miscellaneous (MISC), using a BIO (or IOB2) encoding
- Base (non-embedded) phrase chunk in BIO encoding
- Token number of head word in dependency graph (according to CSI-DP)
- Type of dependency relation with head word
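As a rough illustration of how the column output could be consumed, here is a minimal Python sketch that splits one tab-delimited line into named fields and groups BIO tags into spans. The column names, the helper names parse_frog_line and bio_spans, and the example line are all made up for this sketch; the example line is a hand-written placeholder, not real Frog output.

```python
# Illustrative labels for the ten columns described above.
# These names are assumptions of this sketch, not identifiers defined by Frog.
COLUMNS = [
    "index", "token", "lemma", "morph", "pos", "pos_confidence",
    "ner", "chunk", "head", "deprel",
]

# Hand-written placeholder line (NOT real Frog output), tab-separated.
EXAMPLE_LINE = "1\tFrog\tfrog\t[frog]\tN\t0.95\tO\tB-NP\t2\tsu"


def parse_frog_line(line):
    """Split one tab-delimited output line into a dict keyed by COLUMNS."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(
            "expected %d columns, got %d" % (len(COLUMNS), len(fields))
        )
    row = dict(zip(COLUMNS, fields))
    row["index"] = int(row["index"])
    row["pos_confidence"] = float(row["pos_confidence"])
    return row


def bio_spans(tags):
    """Group BIO tags (B-X, I-X, O) into (label, start, end) spans.

    The end index is exclusive. An I-X tag that does not continue an
    open X span is treated as the start of a new span.
    """
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current span
        if label is not None:
            spans.append((label, start, i))
            label, start = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans
```

The same bio_spans helper can be applied to either the named entity column or the chunk column, since both use BIO encoding. The head column is left as a string here because this sketch makes no assumption about what it contains when parsing is skipped.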
Credits and contact information
If you use Frog for your own work, please cite the following paper:
Van den Bosch, A., Busser, G.J., Daelemans, W., and Canisius,
S. (2007). An
efficient memory-based morphosyntactic tagger and parser for
Dutch. In F. van Eynde, P. Dirix, I. Schuurman, and
V. Vandeghinste (Eds.), Selected Papers of the 17th Computational
Linguistics in the Netherlands Meeting, Leuven, Belgium, pp. 99-114.
Frog, formerly known as Tadpole and before that as MB-TALPA, was coded
by Bertjan Busser, Ko van der Sloot, Maarten van Gompel, and Peter
Berck, subsuming code by Sander Canisius (constraint satisfaction
inference-based dependency parser), Antal van den Bosch (MBMA, MBLEM,
tagger-lemmatizer integration), Jakub Zavrel (MBT), and Maarten van
Gompel (Ucto). In the context of the CLARIN-NL infrastructure project
TTNWW, Frederik Vaassen (CLiPS, Antwerp) created the base phrase
chunking module, and Bart Desmet (LT3, Ghent) provided the data for the
named-entity recognizer module.
The development of Frog relies on earlier work
and ideas from Ko van der Sloot (lead programmer of MBT and TiMBL and
the TiMBL API), Walter Daelemans, Jakub Zavrel, Peter Berck, Gert
Durieux, and Ton Weijters.
The development of Frog relies on your bug reports, suggestions, and comments. Please send them to a.vandenbosch (at) let.ru.nl.
Frog is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public
License as published by the Free Software Foundation.
instructions for details on how to install this software if
you are using a Debian, Ubuntu, or Fedora-based
system (recommended). If you want to build the code from source
Because of file sizes and to cleanly separate code from data, the data
and configuration files for the modules of Frog have been packaged
separately. To get them,
Follow the same installation instructions as for Frog (see below);
this will unpack the data into the Frog configuration directory.
Installing and running Frog
If you downloaded Frog as a tarball, proceed as follows:
- The tarball will unpack (tar zxvf frog-latest.tar.gz) into a
directory called 'frog-[version]'.
- When in the frog-[version] directory, issue a ./configure --prefix=<installdir> command,
followed by make and make install.
- Repeat the same procedure for the frogdata tarball.
Frog relies on other software, so before installing Frog, check the following list of dependencies and make sure this software is installed:
Frog assumes installed current versions of
Frog will not work with versions of Timbl before 6.4 or Mbt before 3.2; please make sure you have the latest versions installed before installing Frog.
Frog is also dependent on Python 2.5 or higher and ICU 3.6 or higher. You may also need to install fresh versions of pkgconfig, libxml2 and/or libxml2-dev, and the autoconf toolkit.
- Mac users are advised to install the latest version of Xcode, and
use Fink, MacPorts,
or Homebrew to install
the above libraries. Currently (mid 2014) many Mac users report issues when compiling or linking Frog code. Unfortunately, we have no solution at hand.
Making Frog Leap
To let Frog leap, invoke frog on the command line. Running frog without arguments produces a list of available command-line options. Some main options are:
- frog -t <file> will run all modules on the text in <file>.
- frog --testdir=<dir> will let Frog process all files in the directory <dir>.
- frog -S <port> starts up a Frog server listening on port number <port>.
- With --skip=[mptnc] you can tell Frog to skip tokenization (t), base phrase chunking (c), named-entity recognition (n), multi-word unit chunking for the parser (m), or parsing (p).
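The options listed above can be assembled programmatically before launching Frog from a script. The following is a minimal sketch under stated assumptions: the helper name build_frog_command and its parameter names are hypothetical, and it only uses the flags named in the list above (-t, --testdir, -S, --skip). It builds the argument list without running anything.

```python
# Letters accepted by --skip, as listed above:
# m = multi-word unit chunking, p = parsing, t = tokenization,
# n = named-entity recognition, c = base phrase chunking.
SKIPPABLE = set("mptnc")


def build_frog_command(text_file=None, test_dir=None, server_port=None, skip=""):
    """Return an argument list for frog (hypothetical helper, not part of Frog)."""
    unknown = set(skip) - SKIPPABLE
    if unknown:
        raise ValueError("unknown --skip letters: " + "".join(sorted(unknown)))
    command = ["frog"]
    if skip:
        command.append("--skip=" + skip)
    if text_file is not None:
        command += ["-t", text_file]
    if test_dir is not None:
        command.append("--testdir=" + test_dir)
    if server_port is not None:
        command += ["-S", str(server_port)]
    return command
```

The resulting list can be handed to Python's subprocess.run. Building a list rather than a single string avoids shell quoting problems with file names that contain spaces.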
Calling the Frog server from Python with pynlpl
pynlpl, written by Maarten van Gompel, contains a Frog client through which a
Frog server running on a port can be called, and its output
processed. To install pynlpl, invoke
$ easy_install pynlpl
Communication with Frog can be established as follows:
from pynlpl.clients.frogclient import FrogClient

port = 8020
frogclient = FrogClient('localhost', port)
for data in frogclient.process("Een voorbeeldbericht om te froggen"):
    # The first four fields of each result are word, lemma, morphology, and PoS tag.
    word, lemma, morph, pos = data[:4]
    # TODO: further processing per frogged word
Memory and speed considerations
Without the dependency parser, Frog will process about 900 words
per second, and consume 542 MB on a 64-bit Linux architecture. With the
parser, Frog's speed reduces to about 200 words per second, taking
just under 1200 MB of memory; you have been warned.
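To get a feel for what these figures mean for a corpus, the following minimal sketch turns the throughput numbers quoted above into a rough time estimate. The function name estimate_seconds is hypothetical, and the rates are the approximate figures from this section, so actual speed on other hardware will differ.

```python
# Approximate throughput quoted above, keyed by whether the parser is used.
WORDS_PER_SECOND = {False: 900.0, True: 200.0}

# Approximate memory use quoted above, in MB
# ("just under 1200 MB" with the parser).
MEMORY_MB = {False: 542, True: 1200}


def estimate_seconds(word_count, with_parser=True):
    """Rough processing time in seconds for word_count words."""
    return word_count / WORDS_PER_SECOND[with_parser]
```

By this estimate, a corpus of one million words takes about 5000 seconds (roughly 83 minutes) with the parser, and about 1100 seconds (roughly 19 minutes) without it.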
Notice: we are in the process of writing a reference
guide for Frog that explains all options in detail.