Tadpole
About Tadpole

Tadpole, which stands for Tagger, Dependency Parser, and Other Linguistic Engines, is an integration of memory-based language processing modules developed for Dutch. All modules are based on TiMBL version 6.1.

The current version of Tadpole is 0.2, a beta release. It does not yet offer dependency parsing; it tokenizes, tags, lemmatizes, and morphologically segments the word tokens in incoming Dutch text files. We expect to update Tadpole frequently during 2008.

Tadpole expects simple, raw text in ISO Latin-1 encoding as input, and produces tab-delimited four-column output with one token per line. The columns give the token, its part-of-speech tag, its lemma, and its morphological segmentation (columns are aligned with spaces here for readability):

De           LID(bep,stan,rest)            de           [de]
oprichter    N(soort,ev,basis,zijd,stan)   oprichter    [op][richt][er]
van          VZ(init)                      van          [van]
Wikipedia    SPEC(deeleigen)               Wikipedia    [Wikipedia]
,            LET()                         ,            [,]
Jimmy        SPEC(deeleigen)               Jimmy        [Jimmy]
Wales        SPEC(deeleigen)               Wales        [Wales]
,            LET()                         ,            [,]
wil          WW(pv,tgw,ev)                 willen       [wil]
een          LID(onbep,stan,agr)           een          [een]
nieuwe       ADJ(prenom,basis,met-e,stan)  nieuw        [nieuw][e]
zoekmachine  N(soort,ev,basis,zijd,stan)   zoekmachine  [zoek][machine]
lanceren     WW(inf,vrij,zonder)           lanceren     [lanceren]
.            LET()                         .            [.]
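
For readers who want to post-process this output, the following is a minimal Python sketch of a line parser. It is not part of Tadpole. It assumes that every output line holds exactly four tab-separated fields and that every tag has the form CATEGORY(feature,feature,...), as in the sample above. The names Token and parse_line are hypothetical and chosen for illustration only.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    """One line of four-column output (hypothetical container, not a Tadpole type)."""
    word: str
    pos: str              # main tag category, e.g. "N" or "WW"
    features: List[str]   # tag features, e.g. ["soort", "ev", "basis"]
    lemma: str
    morphemes: List[str]  # e.g. ["op", "richt", "er"]


# Assumption: a tag is a category name followed by a parenthesised,
# comma-separated (possibly empty) feature list, as in the sample output.
TAG_RE = re.compile(r"^([^()]+)\((.*)\)$")
# Each morpheme is enclosed in its own pair of square brackets.
MORPHEME_RE = re.compile(r"\[([^\]]*)\]")


def parse_line(line: str) -> Token:
    """Split one tab-delimited output line into its four fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    word, tag, lemma, segmentation = fields

    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    pos, feature_str = match.groups()
    features = feature_str.split(",") if feature_str else []

    morphemes = MORPHEME_RE.findall(segmentation)
    return Token(word, pos, features, lemma, morphemes)


if __name__ == "__main__":
    sample = "oprichter\tN(soort,ev,basis,zijd,stan)\toprichter\t[op][richt][er]"
    print(parse_line(sample))
```

Since the input is expected in ISO Latin-1, a file of such lines would probably be opened with encoding="latin-1" before each line is passed to parse_line; whether the output uses the same encoding is an assumption that should be checked against the actual files.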


References

If you use Tadpole for your own work, please cite the following paper:

See the webpages of MBT and MBMA for more papers describing the memory-based tagger and morphological analyzer in Tadpole.

Download

Tadpole is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation.

Tadpole requires installed versions of TiMBL 6.1 and MBT 3.1, and needs about 38 MB of free disk space.

Installation

Please consult the README file in the package for installation instructions. This file will remind you to check the following:

Credits and contact information

Tadpole was coded by Bertjan Busser, Ko van der Sloot, and Peter Berck, subsuming older code by Antal van den Bosch (MBMA, MBLEM, tagger-lemmatizer integration) and Sabine Buchholz (tokenization). The development of Tadpole further relies on earlier work and ideas from Ko van der Sloot (lead programmer of MBT and TiMBL and the TiMBL API), Walter Daelemans, Jakub Zavrel, Peter Berck, Gert Durieux, and Ton Weijters.

Thanks to Erik Tjong Kim Sang and Lieve Macken for stress-testing the first versions of Tadpole.

The development of Tadpole relies on your bug reports, suggestions, and comments. Please send them to Antal.vdnBosch (at) uvt.nl.

Antal.vdnBosch@uvt.nl | Last update: Fri Dec 21 2007