This course aims to give students an understanding, at both the
conceptual and the technical level, of the development of natural
language processing (NLP) applications in the text mining /
information extraction area. At the conceptual level, the course
introduces machine learning as a powerful generic toolbox for
automatically learning NLP modules from data. At the technical level,
the course offers hands-on training and experience in building an
actual text mining application in which NLP modules contribute to
extracting information from text.
Course contents
Text mining, also known as 'information extraction from text', or as
'knowledge discovery from text', is an IT research and development
field that has drawn increasing attention over the last decade, attracting
researchers from computational linguistics, machine learning (an AI
subfield), and information retrieval. Key applications that
have emerged from this melting pot are question answering, information
extraction, and summarization. This course gives an overview of the
field in a practical, hands-on fashion, by first describing and then
building modules that perform subtasks in text mining, such as
part-of-speech tagging, phrase chunking, relation finding, and
named-entity recognition. Students build these modules from basic
ingredients (machine learning algorithms and language data) and
subsequently integrate them into the larger framework of a text mining
application. Using a mix of software tools (ranging from programming
from scratch to tuning existing modules), students test and report on
the modules they develop.
Required reading
Jackson, P. & I. Moulinier, Natural Language Processing for Online Applications: Text Retrieval, Extraction & Categorization, John Benjamins, 2002, ISBN 90 272 4989 X.