Parsing

My master's thesis describes the development and optimisation of a shallow syntactic parser for spoken Dutch. The parser performs part-of-speech tagging, syntactic chunking, and grammatical relation finding. A similar system for written English, developed over the years within the ILK group, served as the starting point; it comprises several modules, each trained to perform one of the parsing subtasks and all built on top of a memory-based learner. An overview of this English memory-based shallow parser is given in [DBV99].

The Dutch memory-based shallow parser was trained on the Spoken Dutch Corpus (Dutch: Corpus Gesproken Nederlands, or CGN), a large corpus of contemporary spoken Dutch in which approximately one million words have received full syntactic annotation. Both the features and the parameters of the parser were optimised on training material extracted from this annotation.
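
To give a rough idea of what "memory-based" means here, the sketch below stores labelled examples and classifies a new word by looking up the most similar stored example (nearest-neighbour classification with a plain overlap metric). It is only a toy illustration, not the parser described in the thesis: the class and function names, the three-word window, the unweighted distance, and the tags in the example sentence are all assumptions made for this sketch.

```python
from collections import Counter


def window_features(words, i, size=1):
    """Return the focus word at position i with `size` context words on
    each side; positions outside the sentence are padded with '_'."""
    padded = ["_"] * size + list(words) + ["_"] * size
    return tuple(padded[i : i + 2 * size + 1])


def overlap_distance(a, b):
    """Count the feature positions on which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedClassifier:
    """Toy memory-based learner: training only stores instances,
    classification is a majority vote among the k nearest ones."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (features, label) pairs

    def train(self, instances):
        self.memory.extend(instances)

    def classify(self, features):
        nearest = sorted(
            self.memory,
            key=lambda inst: overlap_distance(inst[0], features),
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Hypothetical training sentence with made-up tags (not the CGN tag set).
train_words = ["ik", "zie", "de", "hond"]
train_tags = ["PRON", "VERB", "DET", "NOUN"]

clf = MemoryBasedClassifier(k=1)
clf.train(
    (window_features(train_words, i), tag)
    for i, tag in enumerate(train_tags)
)

# Tag the third word of an unseen sentence.
test_words = ["ik", "zie", "de", "kat"]
predicted = clf.classify(window_features(test_words, 2))
```

In a modular shallow parser, the same kind of classifier could in principle be reused for each subtask by changing only the features and the labels, for example chunk tags instead of part-of-speech tags; the actual features and learner settings are those reported in the thesis and the papers listed below.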

Related publications

Sander Canisius and Antal van den Bosch (2007)
Recompiling a knowledge-based dependency parser into memory
In Proceedings of the International Conference on Recent
Advances in Natural Language Processing (RANLP-2007),
Borovets, Bulgaria.
[pdf]

Sander Canisius and Erik Tjong Kim Sang (2007)
A Constraint Satisfaction Approach to Dependency Parsing
In Proceedings of the CoNLL Shared Task Session of
EMNLP-CoNLL 2007, Prague, Czech Republic.
[pdf]

Sander Canisius, Toine Bogers, Antal van den Bosch,
Jeroen Geertzen, and Erik Tjong Kim Sang (2006)
Dependency Parsing by Inference over High-recall Dependency Predictions
In Proceedings of the Tenth Conference on Computational Natural
Language Learning (CoNLL-X), New York, USA.
[pdf]

Sander Canisius and Antal van den Bosch (2004)
A memory-based shallow parser for spoken Dutch
In B. Decadt, G. De Pauw, and V. Hoste (Eds.), Selected papers
from the Fourteenth Computational Linguistics in the Netherlands
Meeting, Antwerp, Belgium, pp. 31-45.
[pdf]

Sander Canisius (2004)
Memory-Based Shallow Parsing of Spoken Dutch
Master's thesis, Universiteit Maastricht
[pdf]

References

[DBV99]

Walter Daelemans, Sabine Buchholz, and Jorn Veenstra (1999)
Memory-Based Shallow Parsing
In Proceedings of CoNLL-99, Bergen, Norway, June 12, 1999.
http://ilk.uvt.nl/downloads/pub/papers/ilk.9907.ps.gz