Text Mining

Second semester 2004-2005
Master TKI / Faculty of Arts / UvT
Course code 880414 / course schedule

Walter Daelemans (Walter.Daelemans@ua.ac.be) and Antal van den Bosch (antalb@uvt.nl)

Tuesday, 10.45 - 12.30, SZ33

Timeline

Dates and topics are subject to change!

[1 February 2005]
[15 February 2005]
[22 February 2005]
[1 March 2005]
[8 March 2005]
[15 March 2005]
[5 April 2005]
[12 April 2005]
[19 April 2005]
[26 April 2005]
[3 May 2005]
[10 May 2005]

Course objectives

This course aims to give students an understanding, both at the conceptual and the technical level, of the development of natural language processing (NLP) applications in the text mining / information extraction area. At the conceptual level, the course introduces machine learning as a powerful generic toolbox for automatically learning NLP modules from data. At the technical level, the course offers hands-on training and experience in building an actual text mining application in which NLP modules contribute to extracting information from text.

Course contents

Text mining, also known as 'information extraction from text' or 'knowledge discovery from text', is an IT research and development field that has attracted growing attention over the last decade, drawing researchers from computational linguistics, machine learning (an AI subfield), and information retrieval. Key applications that have emerged from this melting pot include question answering, information extraction, and summarization. This course gives an overview of the field in a practical, hands-on fashion, by first describing and then building modules that perform subtasks in text mining, such as part-of-speech tagging, phrase chunking, relation finding, and named-entity recognition. Students build these modules from basic ingredients (machine learning algorithms and language data) and subsequently integrate them into the larger framework of a text mining application. Using a mix of software tools (ranging from programming from scratch to tuning existing modules), students test and report on the modules they develop.

Required literature

Jackson, P. & I. Moulinier, Natural Language Processing for Online Applications: Text Retrieval, Extraction & Categorization, John Benjamins, 2002, ISBN 90 272 4989 X.
Supplemented with online articles