VARIABLE WORD RATE N-GRAMS (2008)

Abstract
The rate of occurrence of words is not uniform but varies from document to document. Despite this observation, the parameters of conventional n-gram language models are usually estimated under the assumption of a constant word rate. In this paper we investigate the use of a variable word rate assumption, modelled by a Poisson distribution or a continuous mixture of Poissons. We present an approach to estimating the relative frequencies of words or n-grams that takes prior information about their occurrences into account. Discounting and smoothing schemes are also considered. On the Broadcast News task, the approach reduces perplexity by up to 10%.
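To make the contrast between a constant and a variable word rate concrete, here is a minimal sketch. It is not the authors' estimator. The abstract does not name the mixing density, so this sketch assumes a Gamma-distributed rate. Mixing a Poisson over a Gamma gives a negative binomial, which keeps the same mean but has variance m + m^2/r instead of m. The function names (`poisson_pmf`, `gamma_poisson_pmf`, `fit_shape`) and the method-of-moments fit are hypothetical illustrations. The sketch also assumes equal-length documents.

```python
import math
from statistics import mean, pvariance


def poisson_pmf(k, lam):
    # Constant word rate: the count of a word in a document ~ Poisson(lam).
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def gamma_poisson_pmf(k, m, r):
    # Variable word rate (assumed Gamma with mean m and shape r).
    # Integrating the Poisson over this Gamma yields a negative binomial
    # with mean m and variance m + m**2 / r.
    if m == 0.0:
        return 1.0 if k == 0 else 0.0
    p = r / (r + m)
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def fit_shape(counts):
    # Hypothetical method-of-moments estimate of the shape r from
    # per-document counts (equal-length documents assumed).
    # Returns None when the counts are not over-dispersed, in which
    # case a plain Poisson already matches the first two moments.
    m, v = mean(counts), pvariance(counts)
    return m * m / (v - m) if v > m else None


# A "bursty" word: absent from most documents, repeated where it occurs.
counts = [0, 0, 0, 0, 0, 0, 0, 5, 0, 3]
m = sum(counts) / len(counts)
r = fit_shape(counts)
print(poisson_pmf(0, m), gamma_poisson_pmf(0, m, r))
```

With these made-up counts, 80% of documents contain the word zero times. The constant-rate Poisson assigns roughly 0.45 probability to a zero count. The Gamma-Poisson mixture, with the same mean of 0.8, assigns roughly 0.67. This illustrates why a single constant rate is a poor description of bursty words.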

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.66.9589
Source http://homepages.inf.ed.ac.uk/srenals/pubs/2000/icassp2000.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.129.7219, 10.1.1.138.4180, 10.1.1.40.180, 10.1.1.12.5386, 10.1.1.38.3957, 10.1.1.21.3114