Open-Domain Question Answering on the World Wide Web

Sabine Buchholz

Overview



Introduction

What is Question Answering (QA)?

Related domains

Types of questions, answers and QA systems



Text REtrieval Conference (TREC) Question Answering Track

Purpose: First large-scale evaluation of domain-independent QA systems

Given

Wanted

Evaluation Results
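
The TREC QA track scores systems by mean reciprocal rank (MRR), the same measure reported in the Results table later on. A system returned up to five ranked answers per question. A question scores 1/r if the first correct answer is at rank r, and 0 if none of the answers is correct. MRR is the mean of these scores over all questions. The following is a minimal Python sketch. It assumes each question's judgements arrive as a list of booleans in rank order, an input format chosen only for illustration.

```python
from typing import Sequence


def reciprocal_rank(judgements: Sequence[bool], max_rank: int = 5) -> float:
    """Return 1/r for the first correct answer at rank r (1-based), else 0.

    `judgements` holds one correctness flag per returned answer, best first.
    Only the first `max_rank` answers are counted.
    """
    for rank, correct in enumerate(judgements[:max_rank], start=1):
        if correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[bool]], max_rank: int = 5) -> float:
    """Average the reciprocal rank over all questions (unanswered ones score 0)."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(j, max_rank) for j in runs) / len(runs)


# Three hypothetical questions: correct at rank 1, correct at rank 3, never correct.
print(mean_reciprocal_rank([[True], [False, False, True], [False, False]]))  # 4/9, about 0.444
```

Because only the first correct answer counts, a system gets no extra credit for returning several correct answers to the same question.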

Shapaqa: A prototype system for form-based QA on the World Wide Web using shallow parsing

Example results for "When was the telephone invented?"
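
The following minimal sketch illustrates form-based QA. It is not Shapaqa's actual code or data structures. It assumes the question has already been mapped to a form: a verb, some filled slots, and exactly one empty slot. It also assumes that each retrieved sentence has already been shallow-parsed into a mapping from relation labels to phrases. A sentence answers the form if it matches every filled slot, and the phrase in the empty slot is the answer. The slot names `verb`, `object`, `subject` and `time` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

Form = Dict[str, Optional[str]]  # slot -> required value, or None for the wanted slot
Parse = Dict[str, str]           # relation label -> phrase (hypothetical parser output)


def fill_form(form: Form, sentence: Parse) -> Optional[str]:
    """Return the wanted slot's filler if `sentence` matches all filled slots."""
    wanted = [slot for slot, value in form.items() if value is None]
    if len(wanted) != 1:
        raise ValueError("form must have exactly one empty slot")
    for slot, value in form.items():
        if value is not None and sentence.get(slot, "").lower() != value.lower():
            return None
    return sentence.get(wanted[0])


def answer(form: Form, sentences: Iterable[Parse]) -> List[str]:
    """Collect the answers from every sentence that fills the form."""
    answers = []
    for parse in sentences:
        filler = fill_form(form, parse)
        if filler is not None:
            answers.append(filler)
    return answers


# "When was the telephone invented?" as a hypothetical form.
form = {"verb": "invent", "object": "the telephone", "time": None}
# Hand-written stand-ins for shallow-parser output (verbs assumed lemmatized).
sentences = [
    {"verb": "invent", "object": "the telephone",
     "subject": "Alexander Graham Bell", "time": "1876"},
    {"verb": "invent", "object": "the light bulb", "time": "1879"},
    {"verb": "use", "object": "the telephone", "time": "today"},
]
print(answer(form, sentences))  # ['1876']
```

Exact string matching is the simplest possible choice here. A real system would need looser matching, for example on head words, to find answers in varied Web text.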

TREC vs. QA on the World Wide Web

                            TREC                           WWW
# docs                      < 1,000,000                    1,600,000,000
types of docs               newspaper, mostly 1990's       all kinds
processing                  offline                        online
answer format               50 or 250 bytes                easily readable
wrong information in docs   system gets credit             very bad
multiple correct answers    system gets no extra credit    better than single

Design principles

Architecture: global, NLP modules

All NLP modules (except the rule-based tokenizer) are trained on the Wall Street Journal Corpus of the Penn Treebank using the memory-based learning software package TiMBL
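
Memory-based learning, as implemented in TiMBL, stores the training instances as they are and classifies a new instance by looking up its most similar stored instances. The following is a simplified IB1-style classifier, not TiMBL itself. It uses the unweighted overlap metric (the number of mismatching feature values) and takes a majority vote among all stored instances at the nearest distance (k = 1). TiMBL additionally supports feature weighting, such as gain ratio, and other metrics. The three-word-window features for part-of-speech tagging are an illustrative assumption.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Instance = Tuple[Sequence[str], str]  # (feature values, class label)


def overlap_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of feature positions whose values differ (overlap metric)."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedClassifier:
    """IB1-style classifier: keep every example, classify by nearest neighbours."""

    def __init__(self) -> None:
        self.memory: List[Instance] = []

    def train(self, instances: Sequence[Instance]) -> None:
        # "Training" is simply storing the examples in memory.
        self.memory.extend(instances)

    def classify(self, features: Sequence[str]) -> str:
        distances = [(overlap_distance(features, f), label) for f, label in self.memory]
        nearest = min(d for d, _ in distances)
        # Majority vote among all stored examples at the minimum distance (k = 1).
        votes = Counter(label for d, label in distances if d == nearest)
        return votes.most_common(1)[0][0]


# Hypothetical POS-tagging instances: (previous word, focus word, next word) -> tag.
training = [
    (("the", "telephone", "was"), "NN"),
    (("telephone", "was", "invented"), "VBD"),
    (("was", "invented", "by"), "VBN"),
    (("invented", "by", "Alexander"), "IN"),
    (("the", "radio", "was"), "NN"),
]
tagger = MemoryBasedClassifier()
tagger.train(training)
print(tagger.classify(("the", "phonograph", "was")))  # NN
```

The unseen word "phonograph" is still tagged correctly because its context ("the" ... "was") matches stored noun instances. This illustrates why memory-based learners cope reasonably well with sparse lexical data.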

Sample sentence with treebank annotation:

( (S
    (NP-SBJ (DT The) (NN telephone) )
    (VP (VBD was)
       (VP (VBN invented)
          (PP (IN by)
             (NP-LGS (NNP Alexander) (NNP Graham) (NNP Bell) ) )
          (PP-TMP (IN in)
             (NP (CD 1876) ) ) ) )
    (. .) ))
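
The function tags in this annotation mark relations that matter for answering questions: -SBJ marks the surface subject, -LGS the logical subject of a passive, and -TMP a temporal modifier. The following minimal sketch reads the bracketed string and lists each function-tagged constituent with its words. It uses only the standard library rather than a treebank toolkit. The helper names `parse`, `words` and `tagged_constituents` are hypothetical.

```python
import re
from typing import List, Tuple, Union

Tree = Union[str, list]  # a leaf word, or [label, child, child, ...]

SAMPLE = """( (S
    (NP-SBJ (DT The) (NN telephone) )
    (VP (VBD was)
       (VP (VBN invented)
          (PP (IN by)
             (NP-LGS (NNP Alexander) (NNP Graham) (NNP Bell) ) )
          (PP-TMP (IN in)
             (NP (CD 1876) ) ) ) )
    (. .) ))"""


def parse(text: str) -> Tree:
    """Parse a Penn Treebank bracketing into nested [label, children...] lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack: List[list] = [[]]  # bottom list collects the top-level node
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            if not node or not isinstance(node[0], str):
                node.insert(0, "")  # the outer wrapper bracket has no label
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    return stack[0][0]


def words(node: Tree) -> List[str]:
    """All leaf words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]


def tagged_constituents(node: Tree) -> List[Tuple[str, str]]:
    """(label, words) for every constituent whose label carries a function tag."""
    if isinstance(node, str):
        return []
    found = []
    # CATEGORY-TAG such as NP-SBJ; skips -NONE-, -LRB- and bare indices like NP-1.
    if re.match(r"[A-Z]+-[A-Z]", node[0]):
        found.append((node[0], " ".join(words(node))))
    for child in node[1:]:
        found.extend(tagged_constituents(child))
    return found


for label, phrase in tagged_constituents(parse(SAMPLE)):
    print(f"{label:8} {phrase}")
```

For the sample sentence, this prints NP-SBJ "The telephone", NP-LGS "Alexander Graham Bell" and PP-TMP "in 1876". The last two are exactly the inventor and the date that a question about the telephone's invention would ask for.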

Sentences that did not pass a test:

Evaluation

Results

                    Shapaqa GR   Shapaqa CT   SENT   Google   Combination
MRR                 .28          .34          .32    .30      .45
attempted answers   72           101          114    198      -
points received     55.0         68.4         63.7   60.8     -
"precision"         .76          .68          .56    .31      -
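
The "precision" row equals points received divided by attempted answers. For example, 55.0 / 72 ≈ .76 for Shapaqa GR. The table gives these two figures only for the first four systems. The short check below reproduces the row from the table's own numbers:

```python
# Figures copied from the Results table: system -> (attempted answers, points received).
results = {
    "Shapaqa GR": (72, 55.0),
    "Shapaqa CT": (101, 68.4),
    "SENT": (114, 63.7),
    "Google": (198, 60.8),
}


def precision(attempted: int, points: float) -> float:
    """Points received per attempted answer."""
    return points / attempted


for system, (attempted, points) in results.items():
    print(f"{system:11} {precision(attempted, points):.2f}")  # .76, .68, .56, .31
```

Read this way, Google attempts the most answers, but each attempt earns the fewest points. Shapaqa GR attempts the fewest answers, but each attempt earns the most.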

Possible improvements