
The evaluation of question answering systems: Lessons learned from the TREC QA track (2002)

Abstract
The TREC question answering (QA) track was the first large-scale evaluation of open-domain question answering systems. In addition to successfully fostering research on the QA task, the track has also been used to investigate appropriate evaluation methodologies for question answering systems. This paper gives a brief history of the TREC QA track, motivating the decisions made in its implementation and summarizing the results. The lessons learned from the track will be used to evolve new QA evaluations for both the track and the ARDA AQUAINT program.

1. The TREC QA Task

TREC is a workshop series designed to provide the infrastructure required for large-scale evaluation of text retrieval and related technologies (National Institute of Standards and Technology, 2002). A "track" for the investigation of question answering systems was introduced into TREC-8 in 1999, and has been run each year since then, for a total of three times to date.

The original motivation for the track was to foster research that would move retrieval systems closer to information retrieval as opposed to document retrieval. Document retrieval systems' ability to work in any domain was considered an important feature to maintain. At the same time, the technology that had been developed by the information extraction community appeared ready to exploit. Thus the task for the TREC-8 QA track was defined such that both the information retrieval and the information extraction communities could work on a common problem.

The task was very similar to that used in the MURAX system (Kupiec, 1993), which used an on-line encyclopedia as a source of answers for closed-class questions, except that the answers were to be found in a large corpus of documents rather than an encyclopedia. Since the documents consisted mostly of newswire and newspaper articles, the domain was essentially unconstrained. However, only closed-class questions were used, so answers were generally entities familiar to information extraction systems. Participants were given a document collection and a test set of questions. The questions were fact-based, short-answer questions such as How many calories are there in a

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.6532
Source http://www-nlpir.nist.gov/works/papers/lrec02.ps
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.83.71, 10.1.1.114.5131