William Cohen
Sequential learning methods for partitioning problems
One interesting special case of statistical relational learning is
sequential learning, in which the goal is to learn a sequentially
correlated set of decisions. Sequential learning has been applied to a
diverse set of tasks, including gene finding, noun-phrase chunking,
named entity recognition, and document analysis. I will review two
well-studied approaches to sequential learning, conditional random
fields (CRFs) and maximum-entropy Markov models (MEMMs), and then
describe a new sequential learning scheme called "stacked sequential
learning".
Stacked sequential learning is a meta-learning algorithm in which an
arbitrary base learner is augmented so as to make it aware of the labels
of nearby examples. I will present experimental results on several
"sequential partitioning problems", which are characterized by long
runs of identical labels, and show that on such problems MEMMs are
unstable, while CRFs and sequential stacking are not. I will also show
that on these problems sequential stacking usually improves the
performance of non-sequential base learners, that it often improves
the performance of CRFs, and that a sequentially stacked
maximum-entropy learner often outperforms CRFs.
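The stacking idea can be illustrated with a minimal Python sketch. It follows one common reading of the scheme: a base learner is run in cross-validation over the training sequences so that every training example gets a held-out predicted label; each example is then extended with the predicted labels in a window around its position; and a second model is trained on the extended examples. The toy base learner `train_feature_vote`, the bag-of-string-features representation, the fold assignment, and the window size are all hypothetical choices made for illustration, not details taken from the talk.

```python
from collections import Counter, defaultdict
from typing import Callable, List

Example = List[str]  # assumption: an example is a bag of string features
Predictor = Callable[[Example], str]


def train_feature_vote(xs: List[Example], ys: List[str]) -> Predictor:
    """Hypothetical toy base learner: every feature votes with its training label counts."""
    counts = defaultdict(Counter)
    for x, y in zip(xs, ys):
        for f in x:
            counts[f][y] += 1
    default = Counter(ys).most_common(1)[0][0]  # fallback when no feature was seen

    def predict(x: Example) -> str:
        votes = Counter()
        for f in x:
            votes.update(counts.get(f, {}))
        return votes.most_common(1)[0][0] if votes else default

    return predict


def flatten(seqs):
    return [item for seq in seqs for item in seq]


def cross_val_predict(train, x_seqs, y_seqs, k=2):
    """Label every training sequence with a model that was not trained on it.

    Sequences are assigned to folds by index; assumes at least two sequences.
    """
    preds = [None] * len(x_seqs)
    for fold in range(k):
        kept = [s for s in range(len(x_seqs)) if s % k != fold]
        model = train(flatten([x_seqs[s] for s in kept]), flatten([y_seqs[s] for s in kept]))
        for s in range(len(x_seqs)):
            if s % k == fold:
                preds[s] = [model(x) for x in x_seqs[s]]
    return preds


def add_window_labels(seq, labels, window):
    """Append the predicted labels at offsets -window..+window as extra features."""
    extended = []
    for i, x in enumerate(seq):
        extra = []
        for d in range(-window, window + 1):
            j = i + d
            label = labels[j] if 0 <= j < len(seq) else "<none>"
            extra.append(f"yhat[{d:+d}]={label}")
        extended.append(list(x) + extra)
    return extended


def stacked_sequential_learning(train, x_seqs, y_seqs, k=2, window=1):
    yhat = cross_val_predict(train, x_seqs, y_seqs, k)  # held-out predictions
    base = train(flatten(x_seqs), flatten(y_seqs))       # first pass at prediction time
    extended = flatten([add_window_labels(xs, ps, window) for xs, ps in zip(x_seqs, yhat)])
    stacked = train(extended, flatten(y_seqs))            # sees neighbouring predicted labels

    def predict(seq):
        first_pass = [base(x) for x in seq]
        return [stacked(x) for x in add_window_labels(seq, first_pass, window)]

    return predict


# Toy sequences (illustrative only) with long runs of identical labels;
# "tok=?" marks examples whose own features are uninformative.
x_seqs = [[["tok=a"], ["tok=?"], ["tok=a"], ["tok=b"], ["tok=?"], ["tok=b"]],
          [["tok=a"], ["tok=a"], ["tok=?"], ["tok=b"], ["tok=b"], ["tok=b"]]]
y_seqs = [["A", "A", "A", "B", "B", "B"],
          ["A", "A", "A", "B", "B", "B"]]
model = stacked_sequential_learning(train_feature_vote, x_seqs, y_seqs, k=2, window=1)
predicted = model([["tok=a"], ["tok=?"], ["tok=b"]])
print(predicted)
```

The held-out predictions matter for the design: if the second model were trained on predictions from a base model that had already seen the same examples, those predictions would look more accurate than they will on new data, and the stacked model would learn to trust them too much.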
Hendrik Blockeel
Experiment databases: A novel methodology for experimental research
Data mining and machine learning are to some extent experimental
sciences: much insight into the behaviour of algorithms is
obtained by implementing them and studying how they behave when run
on datasets. Performing experiments is a non-trivial task in this area:
the performance of algorithms on datasets can be characterized in many different
ways and is influenced by many parameters of both the algorithms and the datasets.
As a result, experiments need to be set up with care, and results need to be
interpreted with caution.
In this talk we will discuss the concept of "experiment databases" as the
basis of a new and improved experimental methodology for machine learning and
data mining. The basic idea behind experiment databases is that instead of
setting up experiments to answer specific research questions, large sets of
experiments are performed automatically and stored in a database, and
specific research questions are answered by querying that database.
The proposed methodology has numerous advantages over the classical one.
However, to exploit these optimally, several research challenges
need to be addressed. We will discuss these challenges as well as the
potential impact of the proposed methodology.
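The basic workflow of an experiment database can be sketched with Python's built-in `sqlite3` module. Everything in the sketch is a hypothetical stand-in: the table layout, the toy k-nearest-neighbour "algorithm", and the synthetic datasets are illustrative assumptions, and the stored accuracies are simply whatever the toy runs produce, not real measurements. The point is the order of operations: a grid of experiments is run once and every result is stored, and a research question is answered afterwards with a query.

```python
import random
import sqlite3


def make_dataset(seed, n=60, noise=0.2):
    """Hypothetical synthetic 1-D dataset: label is (x > 0.5), with a fraction of labels flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def knn_accuracy(data, k):
    """Toy algorithm under study: k-NN with a majority vote, scored on a 50/50 split."""
    train, test = data[: len(data) // 2], data[len(data) // 2 :]
    correct = 0
    for x, y in test:
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        vote = int(2 * sum(label for _, label in nearest) > k)  # odd k avoids ties
        correct += int(vote == y)
    return correct / len(test)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runs (dataset_seed INTEGER, noise REAL, k INTEGER, accuracy REAL)")

# Run a grid of experiments up front, without a specific question in mind.
for seed in range(5):
    for noise in (0.0, 0.2):
        data = make_dataset(seed, noise=noise)
        for k in (1, 3, 5, 7):
            db.execute("INSERT INTO runs VALUES (?, ?, ?, ?)",
                       (seed, noise, k, knn_accuracy(data, k)))
db.commit()

# Later, a question ("how does k interact with label noise?") becomes a query.
rows = db.execute(
    "SELECT noise, k, AVG(accuracy) FROM runs GROUP BY noise, k ORDER BY noise, k"
).fetchall()
for noise, k, acc in rows:
    print(f"noise={noise:.1f} k={k} mean accuracy={acc:.3f}")
```

A different question, such as which datasets are hardest for every setting of k, needs only a new `SELECT` over the same table rather than a new round of experiments.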
Maarten van Someren
Bias-variance analysis: What is it and why is it useful?
Bias-variance analysis decomposes prediction error into components that
have different causes. This talk will summarise the concept, show why
it is important using an analysis of different solutions to a
real-world learning problem, and conclude with guidelines on how to use
bias-variance analysis in data mining and machine learning.
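For squared loss, the standard decomposition of the expected error of a learned predictor h at a point x, with true function f and noise variance sigma^2, is E[(y - h(x))^2] = (E[h(x)] - f(x))^2 + E[(h(x) - E[h(x)])^2] + sigma^2, that is, squared bias plus variance plus irreducible noise. The abstract does not say which loss the talk uses (0-1 loss has several competing decompositions), so the Python simulation below covers only the squared-loss case. It assumes a hypothetical target function, Gaussian noise, and a k-nearest-neighbour regressor, and estimates squared bias and variance by retraining on many freshly sampled training sets.

```python
import math
import random


def true_f(x):
    return math.sin(2 * math.pi * x)  # assumed target function


def sample_training_set(rng, n=20, sigma=0.3):
    """Draw n noisy training points (assumption: uniform x, Gaussian noise)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, sigma)) for x in xs]


def knn_regress(train, x0, k):
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def bias_variance_at(x0, k, trials=500, seed=0):
    """Estimate squared bias, variance and mean squared error (against f) at x0."""
    rng = random.Random(seed)
    preds = [knn_regress(sample_training_set(rng), x0, k) for _ in range(trials)]
    mean_pred = sum(preds) / trials
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    mse_vs_truth = sum((p - true_f(x0)) ** 2 for p in preds) / trials
    return bias_sq, variance, mse_vs_truth


for k in (1, 5, 15):
    b, v, m = bias_variance_at(0.25, k)
    print(f"k={k:2d} bias^2={b:.4f} variance={v:.4f} bias^2+variance={b + v:.4f} mse={m:.4f}")
```

Because the error here is measured against the noise-free f rather than a noisy y, the sigma^2 term drops out and the estimated squared bias plus variance equals the estimated error exactly; varying k shows the two components moving in opposite directions as the regressor becomes smoother.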