About Tadpole
Tadpole, which stands for Tagger, Dependency Parser, and
Other Linguistic Engines, is an integration of memory-based
language processing modules developed for Dutch. All modules are based
on Timbl version 6.1.
Tadpole's current version is 0.2, which is in beta. It does not yet
offer dependency parsing; the current version tokenizes, tags,
lemmatizes, and morphologically segments word tokens in incoming Dutch
text files. We expect to update Tadpole frequently during 2008.
Tadpole expects plain, raw text in ISO Latin-1 (ISO 8859-1, a superset
of ASCII) as input. It produces tab-delimited output with one token per
line and four columns: the token, its part-of-speech tag, its lemma, and
its morphological segmentation. For example:
De	LID(bep,stan,rest)	de	[de]
oprichter	N(soort,ev,basis,zijd,stan)	oprichter	[op][richt][er]
van	VZ(init)	van	[van]
Wikipedia	SPEC(deeleigen)	Wikipedia	[Wikipedia]
,	LET()	,	[,]
Jimmy	SPEC(deeleigen)	Jimmy	[Jimmy]
Wales	SPEC(deeleigen)	Wales	[Wales]
,	LET()	,	[,]
wil	WW(pv,tgw,ev)	willen	[wil]
een	LID(onbep,stan,agr)	een	[een]
nieuwe	ADJ(prenom,basis,met-e,stan)	nieuw	[nieuw][e]
zoekmachine	N(soort,ev,basis,zijd,stan)	zoekmachine	[zoek][machine]
lanceren	WW(inf,vrij,zonder)	lanceren	[lanceren]
.	LET()	.	[.]
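For downstream processing, the tab-delimited output described above can be read back into structured records. The following Python sketch is not part of Tadpole; the names `Token`, `parse_line`, and `parse_output` are hypothetical. It assumes the column order seen in the sample (token, tag, lemma, bracketed morphemes) and that each token occupies a single unwrapped line.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    """One row of output; field names are illustrative, not Tadpole's own."""
    word: str
    tag: str
    lemma: str
    morphemes: List[str]


# Matches one bracketed morpheme such as "[richt]" and captures its content.
MORPHEME = re.compile(r"\[([^\]]*)\]")


def parse_line(line: str) -> Token:
    """Split one tab-delimited line into its four assumed columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(
            f"expected 4 tab-separated columns, got {len(fields)}: {line!r}"
        )
    word, tag, lemma, morph = fields
    return Token(word, tag, lemma, MORPHEME.findall(morph))


def parse_output(text: str) -> List[Token]:
    """Parse a whole output string, skipping blank lines."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "oprichter\tN(soort,ev,basis,zijd,stan)\toprichter\t[op][richt][er]"
    tok = parse_line(sample)
    print(tok.lemma, tok.morphemes)  # prints: oprichter ['op', 'richt', 'er']
```

A line that does not split into exactly four fields raises an error rather than being silently skipped, which makes a changed column layout or a wrapped line easy to notice.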
References
If you use Tadpole for your own work, please cite the following paper:
Van den Bosch, A., Busser, G.J., Daelemans, W., and Canisius, S.
(2007). An efficient memory-based morphosyntactic tagger and parser for
Dutch. In F. van Eynde, P. Dirix, I. Schuurman, and V. Vandeghinste
(Eds.), Selected Papers of the 17th Computational Linguistics in the
Netherlands Meeting, Leuven, Belgium, pp. 99-114.
See the webpages of MBT and MBMA for more papers describing the
memory-based tagger and morphological analyzer in Tadpole.