These pages are no longer maintained and will eventually disappear. Please visit my new homepage.

These pages are about my work. If you think that is rather boring, try my private pages.

Interests

I am a researcher in the area of Computational Linguistics. I am interested in basically all aspects of natural language and speech from a computational point of view. Some of the areas I have worked in are:

Recognizing Textual Entailment
Text-to-text generation and sentence fusion
Dependency Parsing
Prosody prediction, intonation in particular
Speech Synthesis, both Text-to-Speech and Concept-to-Speech conversion
Talking heads and Embodied Conversational Agents
Natural Language Generation
Corpus annotation and validation
Morphological analysis and POS tagging of Arabic
Machine Learning (memory-based learning in particular)

Employment

As of October 2006, I am employed as a postdoc researcher at the department of Communication and Cognition at Tilburg University in the Netherlands, while physically spending most of my time in Trondheim, Norway, as a guest researcher at NTNU. I work on a project called DAESO: Detecting And Exploiting Semantic Overlap, where I investigate the automatic detection of semantic overlap between Dutch sentences and the exploitation of this knowledge in a range of Natural Language Processing applications.

From March 2004 to March 2006, I was employed as a postdoc researcher at the same department at Tilburg University. I was involved in the IMIX (Interactive Multimodal Information Extraction) program, which aims at developing a multimodal question-answering system for Dutch. More specifically, I worked in the Imogen project, which concerns multimodal information presentation, focussing on the combination of language, speech and graphics. My main contributions were in the areas of text-to-text generation, speech synthesis, and talking head animation.

The three years before that, I worked as a postdoc researcher at the Induction of Linguistic Knowledge group, also at Tilburg University. I was involved in the PROSIT project (Prosody from Information in Text). The goal of this project was to improve the prediction of prosodic markers, that is, pitch accents and prosodic phrases, for the purpose of text-to-speech conversion in Dutch. We aimed for a robust automatic analysis of texts applying techniques from the fields of Information Retrieval and Natural Language Processing (shallow features), using machine learning algorithms (memory-based learning, in particular). We showed that just a shallow analysis of an input text is usually sufficient for adequately predicting prosodic markers.

Within the PROSIT project, I did quite a lot of work on text-to-speech. I am one of the initiators and the lead developer of the NeXTeNS project. NeXTeNS stands for Nederlandse Extensie voor Tekst naar Spraak (Dutch Extension for Text to Speech). It is a clean, multi-platform, open source text-to-speech system for Dutch that is freely available for research and education purposes. You can try the demo, and download the software for free. Since it uses diphone synthesis, the segmental quality is not as good as that of commercial TTS systems, but it gives you the freedom to control, modify and extend all aspects of the text-to-speech process.

Even earlier, I was a PhD student at the Department of Language and Speech at the Radboud University (which was still called Nijmegen University back then). I wrote my PhD thesis about intonation in spoken language generation (supervised by my promotor prof. dr. Carlos Gussenhoven and co-promotors dr. Peter-Arno Coppen and dr. Toni Rietveld). The main topic of this thesis was how to exploit the rich linguistic information generated by a natural language generation system (at the morphological, syntactic, and semantic level) in a rule-based system to predict the prosodic structure of utterances (location and type of pitch accents and boundary tones, as well as hierarchical prosodic phrasing), thereby improving the quality of synthetic speech.