TECHNIQUES FOR APPROXIMATING A TRIGRAM LANGUAGE MODEL (2007)
Abstract
In this paper several methods are proposed for reducing the size of a trigram language model (LM), which is often the biggest data structure in a continuous speech recognizer, without affecting its performance. The common factor shared by the different approaches is to select only a subset of the available trigrams, trying to identify those trigrams that contribute most to the performance of the full trigram LM. The proposed selection criteria apply to trigram contexts, both of length one or two. These criteria rely on information theory concepts, the back-off probabilities estimated by the LM, or on a measure of the phonetic-linguistic uncertainty relative to a given context. The performance of the reduced trigram LMs is compared both in terms of perplexity and recognition accuracy. Results show that all the considered methods perform better than the naive frequency shifting method. In fact, a 50% size reduction is obtained on a shift-1 trigram LM, at the cost of a 5% increase in word error rate. Moreover, the reduced LMs improve on the word error rate of a bigram LM of the same size by around 15%.
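The general idea of keeping only a subset of trigrams and measuring the resulting size reduction can be illustrated with a minimal sketch. It is not one of the selection criteria proposed in the paper: it assumes a simple count threshold as a stand-in for a frequency-based baseline, and the function names (`count_trigrams`, `select_by_frequency`, `size_reduction`, `perplexity`) and the toy token sequence are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def count_trigrams(tokens):
    """Count every consecutive word triple in a list of tokens."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


def select_by_frequency(trigram_counts, min_count):
    """Keep only trigrams seen at least min_count times.

    Assumption: a plain count threshold, used here as an illustrative
    frequency-based baseline, not the criteria from the paper.
    """
    return {tri: c for tri, c in trigram_counts.items() if c >= min_count}


def size_reduction(full, reduced):
    """Fraction of trigram entries removed by the selection."""
    return 1.0 - len(reduced) / len(full)


def perplexity(log_probs):
    """Perplexity from per-word natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


# Toy data (hypothetical): three trigrams occur twice, one occurs once.
tokens = "a b c a b c a b d".split()
full = count_trigrams(tokens)
reduced = select_by_frequency(full, min_count=2)
reduction = size_reduction(full, reduced)
```

In a real system the discarded trigrams would be scored through the back-off bigram distribution, and the reduced model would then be compared with the full one on held-out text using a measure such as `perplexity`, alongside recognition accuracy.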