Experiments in Spoken Document Retrieval at CMU (1998)

Abstract
We describe our submission to the TREC-6 Spoken Document Retrieval (SDR) track and the speech recognition and information retrieval engines. We present SDR evaluation results and a brief analysis. A few developments and experiments are also described in detail, including:
- Vocabulary size experiments, which assess the effect of words missing from the speech recognition vocabulary. For our 51,000-word vocabulary the effect was minimal.
- Speech recognition using a stemmed language model, where the model statistics of words containing the same root are combined. Stemmed language models did not improve speech recognition or information retrieval.
- Merging the IBM and CMU speech recognition data. Combining the results of two independent recognition systems slightly boosted information retrieval results.
- Confidence annotations that estimate the correctness of each recognized word. Confidence annotations did not appear to improve retrieval.
- N-best lists where the top recogni...

Publication details
Download http://citeseer.ist.psu.edu/110270.html
Source http://trec.nist.gov/pubs/trec6/papers/cmu.trecSDR97-report.ps
Publisher unknown
Contributors The Pennsylvania State University CiteSeer Archives
Repository CiteSeer (United States)
Keywords M. A. Siegler, M. J. Witbrock, S. T. Slattery, K. Seymore, R. E. Jones, A. G. Hauptmann; Experiments in Spoken Document Retrieval at CMU
Language English