Open PhD position in the MEMPHIX project
The Department of Communication and Information Sciences has an opening for a PhD position in the project MEMPHIX: MEMory-based paraPHrasing with Implicit and eXplicit semantics:

    Vacancy number 500.08.13, 4 years, full-time

Summary of the project

The ability to paraphrase, i.e. to say the same thing in another way, serves a variety of purposes. Paraphrases can be used to explain something or to provide feedback in dialogue. Shorter paraphrases are useful for subtitles or news feeds. Paraphrasing can also change the register of a text: from formal language to street language, or from old-fashioned prose to present-day language. In the MEMPHIX project, a system will be built that learns to generate paraphrases from examples. The project makes use of a memory-based machine translation (MBMT) system developed within the NWO VICI project Implicit Linguistics. Since the MBMT technology learns from aligned sentence pairs, it can be trained straightforwardly on pairs of paraphrases instead of translation pairs.

While paraphrase generation can be driven primarily by surface similarities (leaving semantics completely implicit, as in a statistical MT system), explicit semantic information may also play a role, such as the semantic roles of NPs and the coreference relations between NPs and pronouns. Such information can be computed automatically, through parsing, semantic role labeling, and coreference resolution. The project will compare the direct, implicit route with the use of explicitly computed semantics. This part of the project will join forces with an international ISO effort aimed at developing semantic annotation formalisms with well-defined semantics.

The project will make use of a richly annotated 1-million-word Dutch corpus developed in the STEVIN project Daeso, consisting of pairs of texts from various domains that express paraphrased, or at least comparable, information. While Dutch will be a core object of study, the methods are language-independent, so other paraphrase corpora will be considered as well, and alternative ways of gathering paraphrase subcorpora beyond the 1-million-word scale will be explored.

The MEMPHIX project will be carried out under the supervision of Antal van den Bosch and will be aligned with the NWO VICI project Implicit Linguistics (Van den Bosch), the ISO project "Semantic Annotation Framework" (Prof. dr. Harry Bunt), and the STEVIN project Daeso (Tilburg partners: Prof. dr. Emiel Krahmer and Dr. Erwin Marsi).


The candidate has an honours or (research) master's degree (or equivalent) in communication and information sciences, computational linguistics, or a related area, with a background in statistical NLP and/or machine learning, corpus linguistics, and computational semantics, and some programming experience.

The candidate is expected to have a strong interest in doing research, excellent writing skills and a good command of English. Applicants should have, or be willing to develop, active knowledge of spoken and written Dutch, which is the target language within the project.


Applications should include a cover letter, a Curriculum Vitae, and the names of two references.

Terms of employment

Tilburg University is rated among the top Dutch employers and offers excellent terms of employment. The collective labour agreement of Tilburg University applies. The selected candidate will start with a one-year contract, concluded by an evaluation. If the outcome of this first-year evaluation is positive, the candidate will be offered an employment contract for the remaining years.

The starting salary is 2,000 euros gross per month in the first year, rising to 2,558 euros in the fourth year. The selected candidate is expected to complete a PhD thesis (which may be based on articles) by the end of the contract.

Tilburg University

Tilburg University is a modern, specialized university located in the south of the Netherlands, with a focus on the humanities and social sciences.

Faculty of Humanities, Department of Communication and Information Sciences

The Department of Communication and Information Sciences in the Faculty of Humanities is dedicated to research and education in the areas of language technology, human-computer interaction, professional communication, and discourse studies. It offers bachelor's and master's programmes in business communication and digital media; a master's programme in human aspects of information technology; and it participates in an inter-university research master's programme in language and communication. It is home to two research programmes, one in language technology and computational linguistics, and one in communication, cognition and discourse studies ("Multimodality and Cognition"), directed by Prof. dr. Fons Maes and Prof. dr. Marc Swerts. The department has an inspiring working environment with an international orientation and good computing facilities.


For more information about the project, please contact Antal van den Bosch via email or by telephone (+31.13.466.3117).