My master's thesis describes the development and optimisation of a shallow syntactic parser for spoken Dutch. The parser performs part-of-speech tagging, syntactic chunking, and grammatical relation finding. A similar system for written English, developed over the years within the ILK group, served as a starting point; it comprises several modules, each trained to perform one of the parsing subtasks and all built on top of a memory-based learner. [DBV99] gives a good overview of this English memory-based shallow parser.
The Dutch memory-based shallow parser was trained on the Spoken Dutch Corpus (Dutch: Corpus Gesproken Nederlands, or CGN), a large corpus of contemporary spoken Dutch of which approximately one million words have received full syntactic annotation. Both features and parameters were optimised on training material extracted from this annotation.
Sander Canisius and Antal van den Bosch (2007)
Recompiling a knowledge-based dependency parser into memory
In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-2007), Borovets, Bulgaria.
Sander Canisius and Erik Tjong Kim Sang (2007)
A Constraint Satisfaction Approach to Dependency Parsing
In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, Prague, Czech Republic.
Sander Canisius, Toine Bogers, Antal van den Bosch, Jeroen Geertzen, and Erik Tjong Kim Sang (2006)
Dependency Parsing by Inference over High-recall Dependency Predictions
In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), New York, USA.
Sander Canisius and Antal van den Bosch (2004)
A memory-based shallow parser for spoken Dutch
In B. Decadt, G. De Pauw, and V. Hoste (Eds.), Selected papers from the Fourteenth Computational Linguistics in the Netherlands Meeting, Antwerp, Belgium, pp. 31-45.
Sander Canisius (2004)
Memory-Based Shallow Parsing of Spoken Dutch
Master's thesis, Universiteit Maastricht.