The focus of the ROLAQUAD research project (2004-2008) was
basic research in the context of a multiple-turn question-answering
dialogue application. Team members of ROLAQUAD published work
on the following topics:
Constraint satisfaction inference for structured output learning, applied to, among other tasks:
  Open- and closed-domain entity recognition;
  Multi-label document classification;
  Dependency parsing;
  Machine translation.
Taxonomic knowledge extraction from semi-structured encyclopedic medical texts;
Question answering in a closed domain through semantic tagging of medical concepts and relations;
Pragma-semantic tagging of dialogue acts.
Software
Apart from project-internal modules for question answering and semantic tagging, the ROLAQUAD project spun off or contributed to the following open source software releases:
Publications
Canisius, S., and Van den Bosch, A. (2007). Recompiling a
knowledge-based dependency parser into memory. In Proceedings of
the International Conference on Recent Advances in Natural Language
Processing (RANLP-2007), Borovets, Bulgaria, pp. 104-108. [pdf]
Van den Bosch, A., Busser, G.J., Canisius, S., and Daelemans,
W. (2007). An efficient memory-based morpho-syntactic tagger and
parser for Dutch. In P. Dirix, I. Schuurman, V. Vandeghinste, and
F. Van Eynde (Eds.), Computational Linguistics in the Netherlands:
Selected Papers from the Seventeenth CLIN Meeting, Leuven, Belgium,
pp. 99-114. [preprint pdf]
Lendvai, P., and Geertzen, J. (2007). Token-based chunking of
turn-internal dialogue act sequences. In Proceedings of the 8th
SIGDIAL Workshop on Discourse and Dialogue, Antwerp, Belgium,
pp. 174-181. [pdf]
Spitters, M., De Boni, M., Zavrel, J., and Bonnema,
R. (2007). Learning to compose effective strategies from a library of
dialogue components. In Proceedings of the 45th Annual Meeting of
the Association for Computational Linguistics, Prague, Czech
Republic, pp. 792-799. [pdf]
Canisius, S., and Sporleder, C. (2007). Bootstrapping information
extraction from field books. In Proceedings of the 2007 Joint
Conference on Empirical Methods in Natural Language Processing and
Computational Natural Language Learning (EMNLP-CoNLL), Prague,
Czech Republic, pp. 827-836. [pdf, bib]
Canisius, S., and Tjong Kim Sang, E. (2007). A constraint
satisfaction approach to dependency parsing. In Proceedings of the
2007 Joint Conference on Empirical Methods in Natural Language
Processing and Computational Natural Language Learning
(EMNLP-CoNLL), Prague, Czech Republic, pp. 1124-1128. [pdf, bib]
Canisius, S., and Sporleder, C. (2007). Learning to segment and
label semi-structured documents with little or no supervision. In
P. Adriaans, M. van Someren, and S. Katrenko (Eds.), Proceedings of
the 18th BENELEARN Conference, Amsterdam, The Netherlands. [pdf]
Canisius, S., Bogers, T., Van den Bosch, A., Geertzen, J., and
Tjong Kim Sang, E. (2006). Dependency parsing by inference over
high-recall dependency predictions. In Proceedings of the Tenth
Conference on Computational Natural Language Learning, CoNLL-X,
June 2006, New York City, NY. [pdf]
Van den Bosch, A., and Canisius, S. (2006). Improved
morpho-phonological sequence processing with constraint satisfaction
inference. In Proceedings of the Eighth Meeting of the ACL Special
Interest Group in Computational Phonology, SIGPHON '06, June 2006,
New York City, NY. [pdf]
Canisius, S., Van den Bosch, A., and Daelemans,
W. (2006). Constraint satisfaction inference: Non-probabilistic global
inference for sequence labelling. In Proceedings of the EACL 2006
Workshop on Learning Structured Information in Natural Language
Applications, Trento, April 2006. [pdf]
Lendvai, P. (2005). Conceptual taxonomy identification in medical
documents. In Proceedings of The Second International Workshop on
Knowledge Discovery and Ontologies (KDO-2005), held within
ECML/PKDD, Porto, Portugal, 2005, pp. 31-38. [pdf]
Lendvai, P. (2005). Taxonómia felismerése
dokumentumszerkezetből. In Proceedings of Computational
Linguistics in Hungary Conference (Magyar
Számítógépes Nyelvészeti Konferencia,
MSZNY-2005), Szeged, Hungary, 2005, pp. 88-95. [pdf]
Van den Bosch, A. and Lendvai, P. (2005). Robust ASR lattice
representation types in pragma-semantic processing of spoken input. In
Proceedings of the AAAI Spoken Language Understanding Workshop,
SLU-2005, July 9, 2005, Pittsburgh, PA, pp. 15-22. [pdf]
Van den Bosch, A. (2005). Memory-based understanding of user
utterances in a spoken dialogue system: Effects of feature selection
and co-learning. In Workshop Proceedings of the 6th International
Conference on Case-Based Reasoning, Chicago, IL, pp. 85-94. [pdf]
Van den Bosch, A., and Daelemans, W. (2005). Improving sequence
segmentation learning by predicting trigrams. In Proceedings of the
Ninth Conference on Natural Language Learning, CoNLL-2005, June 29-30,
2005, Ann Arbor, MI, pp. 80-87. [pdf]
Canisius, S., Van den Bosch, A., and Daelemans, W. (2005). Rule
meta-learning for trigram-based sequence processing. In J. Cussens and
C. Nedellec (Eds.), Proceedings of the Fourth Learning Language in
Logic Workshop, pp. 3-10, Bonn, August 2005. [pdf]
Tjong Kim Sang, E., Canisius, S., Van den Bosch, A., and Bogers,
T. (2005). Applying spelling error correction techniques for improving
semantic role labelling. In Proceedings of the Ninth Conference on
Natural Language Learning, CoNLL-2005, June 29-30, 2005, Ann
Arbor, MI. [pdf]
Lendvai, P., Van den Bosch, A., Krahmer, E., and Canisius, S.
(2004). Memory-based Robust Interpretation of Recognised Speech. In:
Proceedings of SPECOM '04, 9th International Conference "Speech and
Computer", St. Petersburg, Russia, pp. 415-422. [pdf]
Canisius, S., and Van den Bosch, A. (2004). A memory-based shallow
parser for spoken Dutch. In Decadt, B., De Pauw, G. and Hoste,
V. (Eds.), Selected papers from the Thirteenth Computational
Linguistics in the Netherlands Meeting, Antwerp, Belgium, pp. 31-45.
The central idea of ROLAQUAD was that a basic QA module could be
developed rapidly by directly tagging both the questions and the
background texts (medical encyclopedic texts) with semantic labels at
the word level and the sentence level. This idea was realized in the IMIX
demonstrator, a spoken medical QA dialogue system. Besides the ROLAQUAD QA module, the demonstrator integrated speech recognition, dialogue management, natural language generation, speech synthesis, and two other QA modules: Joost (University of Groningen) and Factmine (University of Amsterdam).
ROLAQUAD was part of the NWO IMIX (Interactive Multimodal
Information Extraction) programme, and of the ILK Research Group of the Faculty of Humanities
(until January 2007: Faculty of Arts) of Tilburg University.
ROLAQUAD's industrial partner was Textkernel B.V., which
contributed expertise and software for information extraction, text
classification, and server-based annotation.
ROLAQUAD would like to thank Emiel Krahmer, Erwin Marsi,
Erik Tjong Kim Sang, and all other IMIX project partners; Bertjan Busser, Toine Bogers, Jeroen Geertzen, and all other ILK members; Martijn Spitters,
Remko Bonnema, Eduard Hovy, and Caroline Sporleder for their help,
contributions, and suggestions along the way. Many thanks also to student
assistants Ralph Claassens, Eva Creyghton, and Corina Koolen who
annotated the medical encyclopedic texts.
Older related publications by members of the group:
Lendvai, P. (2004). Extracting Information from Spoken User
Input. A Machine Learning Approach. Ph.D. thesis, Tilburg
University, 2004.
Lendvai, P. (2003). Learning to Identify Fragmented Words in
Spoken Discourse. In Proceedings of the EACL-03 Student Research
Workshop, Budapest, 2003, pp. 25-32. [pdf, slides]
Lendvai, P., Van den Bosch, A., and Krahmer, E. (2003). Memory-based
disfluency chunking. In R. Eklund (Ed.), Proceedings of DISS'03,
Disfluency in Spontaneous Speech Workshop, Göteborg
University, Sweden, 2003, pp. 63-66. [pdf]
Lendvai, P., Van den Bosch, A., and Krahmer, E. (2003). Machine
Learning for Shallow Interpretation of User Utterances in Spoken
Dialogue Systems. In Proceedings of EACL-03 Workshop on Dialogue
Systems: interaction, adaptation and styles of
management. Budapest, 2003, pp. 69-78. [pdf] (note - this version corrects
the published paper!)
Lendvai, P., and Maruster, L. (2003). Process discovery for
evaluating dialogue strategies. In Proceedings of the ISCA Workshop on Error Handling in Spoken
Dialogue Systems, Chateau d'Oex-Vaud, Switzerland, 2003, pp.
119-122. [pdf]
Lendvai, P., Van den Bosch, A., Krahmer, E., and Swerts,
M. (2002). Improving machine-learned detection of miscommunications in
human-machine dialogues through informed data splitting. In:
Proceedings of the ESSLLI 2002 Workshop on Machine Learning
Approaches in Computational Linguistics, Trento, Italy, August
2002. [postscript]
Lendvai, P., Van den Bosch, A., Krahmer, E., and Swerts, M. (2002).
Multi-feature error detection in spoken dialogue systems. In:
Proceedings of the 12th Computational Linguistics in The
Netherlands meeting, Twente, Netherlands, November 2001.
[postscript]
Van den Bosch, A., Krahmer, E., and Swerts, M. (2001). Detecting
problematic turns in human-machine interactions: Rule-induction versus
memory-based learning approaches. In Proceedings of the 39th
Meeting of the Association for Computational Linguistics
(ACL'01). New Brunswick, NJ: ACL, pp. 499-506. [postscript]