Experiments in Spoken Document Retrieval at CMU (1998)
Abstract
We describe our submission to the TREC-6 Spoken Document Retrieval (SDR) track, along with its speech recognition and information retrieval engines. We present SDR evaluation results and a brief analysis. Several developments and experiments are also described in detail, including:
• Vocabulary size experiments, which assess the effect of words missing from the speech recognition vocabulary. For our 51,000-word vocabulary, the effect was minimal.
• Speech recognition using a stemmed language model, in which the model statistics of words sharing the same root are combined. Stemmed language models did not improve speech recognition or information retrieval.
• Merging the IBM and CMU speech recognition data. Combining the results of two independent recognition systems slightly improved information retrieval results.
• Confidence annotations that estimate the correctness of each recognized word. Confidence annotations did not appear to improve retrieval.
• N-best lists where the top recogni...
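The stemmed language model idea described above, pooling the statistics of words that share a root, can be sketched for the simplest case of a unigram model. This is a minimal illustration under stated assumptions: the suffix-stripping `crude_stem` function, its suffix list, and the unigram setting are hypothetical stand-ins, not the stemmer or language model used in the CMU system.

```python
from collections import Counter

# Hypothetical, deliberately crude suffix list; a real system would use a
# proper stemmer (e.g. Porter's algorithm) instead of this illustration.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_unigram_model(tokens):
    """Pool the counts of all words sharing a stem, then normalize to probabilities."""
    counts = Counter(crude_stem(t.lower()) for t in tokens)
    total = sum(counts.values())
    return {stem: c / total for stem, c in counts.items()}


tokens = ["search", "searched", "searching", "speech", "speeches"]
model = stemmed_unigram_model(tokens)
```

Here the three surface forms of "search" contribute to a single entry with probability 0.6, and the two forms of "speech" to one entry with probability 0.4. Pooling counts this way gives rarely seen inflections the statistics of their root, at the cost of no longer distinguishing the surface forms.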
Publication details