Program | Abstracts
Eduard Hovy (Information Sciences Institute of the University of Southern California, Marina del Rey, CA, U.S.A.)
Learning by Reading: From Information Extraction to Machine Reading
Creating computer systems that educate themselves by reading text was
one of the original dreams of Artificial Intelligence. Researchers in
Natural Language Processing (NLP) have made initial steps in this
direction, especially with Information Extraction and Text Mining,
which derive information from large sets of data. Can one, however,
build a system that learns by reading just one text, or a small number
of texts, about a given topic?
Starting in 2002, three research groups in an experiment called
Project Halo manually converted the information in one chapter of a
high school chemistry textbook into knowledge representation
statements, and then had a knowledge representation system take the US
high school standardized (AP) exam. Surprisingly, all three systems
passed, albeit not very well. Could one do the same, automatically? In
late 2005, DARPA funded several small pilot projects in NLP, Knowledge
Representation and Reasoning (KR&R), and Cognitive Science to take
up this challenge, which grew into Project Möbius, a collaboration of
SRI, USC/ISI, University of Texas Austin, Boeing, and BBN Inc. The
Möbius prototype learning‐by‐reading system read paragraph‐length
Wikipedia‐level texts about the human heart and about engines, built
up enough knowledge to apply inferences, to produce its own further
reading requests, and to answer unseen questions. Results were
encouraging. In 2009, DARPA funded a new 5‐year program called Machine
Reading, which funds three large teams that include many of the top
NLP and KR&R research scientists in the USA.
This talk describes the Machine Reading program and provides details
about one of the three teams, RACR, which is led by IBM's IE / QA team
and includes researchers at USC/ISI, University of Texas Austin, CMU,
and the University of Utah. The system contains several reading
engines that are being composed into a single large framework, with
access to a cluster of several thousand computers for large‐scale
experiments. The reading engines include traditional Information
Extraction engines, parsers, converters to various logical form
representations, abstract semantic models of world knowledge, and
various kinds of abductive and other reasoning engines. I will focus
on the use of large repositories of background knowledge and their
various uses to support reading and inference, and describe the
experiments currently being done.
Piek Vossen (Computational Lexicology and Terminology Lab, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands)
KYOTO: A Community Platform for Knowledge Modeling and Text Mining
The Asian-European project KYOTO develops an open system for knowledge
modeling and text mining: www.kyoto-project.eu. KYOTO operates in two
cycles. First, we derive a domain model from text by learning terms
and term-relations. Terms are automatically mapped to wordnets, which
are anchored to a central ontology. Next, the domain model is used to
extract events and facts from text through a process of incremental
annotation of semantic layers. These layers are extracted through
simple profiles that can take any previous step as input and generate
a next layer as output. The KYOTO system uses an open text
representation format and a central ontology to enable extraction of
knowledge and facts from large volumes of text in many different
languages. We implemented a semantic tagging approach that performs
off-line reasoning. Mining of facts and knowledge is achieved through
a flexible pattern matching module that can work in much the same way
for different languages, can efficiently handle large volumes of
documents, and is not restricted to a specific domain. We applied the
system to an English database on estuaries.
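
The idea of incremental annotation, where each profile reads the layers produced so far and adds the next one, can be illustrated with a minimal Python sketch. This is not the KYOTO implementation: the `Document` class, the profile functions, and the `TOY_WORDNET` mapping are hypothetical names invented for illustration, and the example sentence is made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    """A text plus the annotation layers built on top of it (hypothetical format)."""
    text: str
    layers: Dict[str, List[dict]] = field(default_factory=dict)


# A profile reads the document (including earlier layers) and returns a new layer.
Profile = Callable[[Document], List[dict]]

# Hypothetical term-to-concept mapping standing in for wordnets and the ontology.
TOY_WORDNET = {"estuary": "WaterBody", "salmon": "Fish", "pollution": "Process"}


def token_profile(doc: Document) -> List[dict]:
    """First layer: lowercase tokens with punctuation stripped."""
    return [
        {"id": i, "word": w.strip(".,").lower()}
        for i, w in enumerate(doc.text.split())
    ]


def term_profile(doc: Document) -> List[dict]:
    """Second layer: map known tokens to concepts, using the token layer as input."""
    return [
        {"token": t["id"], "concept": TOY_WORDNET[t["word"]]}
        for t in doc.layers["tokens"]
        if t["word"] in TOY_WORDNET
    ]


def annotate(doc: Document, profiles: List[Tuple[str, Profile]]) -> Document:
    """Apply the profiles in order, storing each output as a named layer."""
    for name, profile in profiles:
        doc.layers[name] = profile(doc)
    return doc


doc = annotate(
    Document("Pollution threatens salmon in the estuary."),
    [("tokens", token_profile), ("terms", term_profile)],
)
print(doc.layers["terms"])
```

Because every profile only depends on named layers, a further step (for example an event layer) could be appended to the list without changing the earlier ones.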
Marieke van Erp (Dept. of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands)
Accessing Natural History: Discoveries in Data Cleaning, Structuring, and Retrieval
Cultural heritage institutions harbour a vast treasure of
information. However, this treasure of information is often confined
to the walls of the archive, museum, or library. This thesis is about
improving access to cultural heritage collections through digitisation
and enrichment. In this thesis, three themes that improve information
access in a digital information collection from the Dutch National
Museum for Natural History Naturalis were investigated: data cleaning,
information structuring, and object retrieval.
Two methods for automatic cleanup of databases are presented: a
data-driven and a knowledge-driven method. Both methods detect a large
number of inconsistencies in the data, but the experiments show that
they also detect different types of errors and are thus complementary.
Next, an automatic ontology construction method is presented. This
method makes implicit domain information present in the database from
Naturalis explicit by linking it to the online encyclopaedia
Wikipedia.
Finally, a system for data retrieval, Mira, is presented in which three
different types of domain knowledge are used in three different stages
of the retrieval process. First, knowledge from external resources
and rules is used to interpret the queries and formulate more precise
queries. Then, the same types of knowledge are used to expand queries
with synonyms to increase recall. To rank results by relevance,
knowledge from the domain ontologies and query analysis is used. Mira
provides a significant improvement in data access as it decreases the
number of unanswered queries.
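
How synonym expansion can reduce the number of unanswered queries can be shown with a small Python sketch. It is an illustration only, not the retrieval system described in the thesis: the `SYNONYMS` table, the sample records, and the overlap-count ranking are assumptions chosen to keep the example short, and the ranking is a simple stand-in for ontology-based relevance.

```python
from typing import Dict, List, Set

# Hypothetical synonym table standing in for external domain knowledge.
SYNONYMS: Dict[str, Set[str]] = {"frog": {"anura"}}

# Made-up collection records for illustration.
RECORDS: List[str] = [
    "Anura specimen collected in Java",
    "Frog specimen Anura from Sumatra",
    "Lizard specimen from Borneo",
]


def expand_query(terms: List[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the lowercased query terms together with their known synonyms."""
    expanded: Set[str] = set()
    for term in terms:
        key = term.lower()
        expanded.add(key)
        expanded |= synonyms.get(key, set())
    return expanded


def search(records: List[str], query_terms: Set[str]) -> List[str]:
    """Return records sharing at least one term with the query, best match first."""
    scored = []
    for record in records:
        overlap = len(set(record.lower().split()) & query_terms)
        if overlap:
            scored.append((overlap, record))
    # Sort by overlap count, highest first; the sort is stable for ties.
    scored.sort(key=lambda pair: -pair[0])
    return [record for _, record in scored]


plain_hits = search(RECORDS, {"frog"})
expanded_hits = search(RECORDS, expand_query(["frog"], SYNONYMS))
print(len(plain_hits), len(expanded_hits))
```

In this toy collection the unexpanded query misses the record that only uses the scientific name, while the expanded query retrieves it, which is the recall effect the abstract refers to.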