The Distribution of N-Grams (2000)

Abstract
N-grams are generalized words consisting of N consecutive symbols, as they are used in a text. This paper determines the rank-frequency distribution for redundant N-grams. For entire texts this is known to be Zipf's law (i.e., an inverse power law). For N-grams, however, we show that the rank (r)-frequency distribution is [formula], where $\psi_N$ is the inverse function of $f_N(x) = x \ln^{N-1} x$. Here we assume that the rank-frequency distribution of the symbols follows Zipf's law with exponent $\beta$.
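
The function $\psi_N$ has no elementary closed form for N > 1, so a numerical sketch may help make the definition concrete. The snippet below is illustrative only and is not taken from the paper: the names `f_n` and `psi_n`, the use of bisection, and the restriction to x ≥ 1 (where $f_N$ is increasing) are all assumptions made for this example.

```python
import math


def f_n(x: float, n: int) -> float:
    """f_N(x) = x * (ln x)^(N-1), as defined in the abstract."""
    return x * math.log(x) ** (n - 1)


def psi_n(r: float, n: int, tol: float = 1e-12) -> float:
    """Numerically invert f_N by bisection.

    Assumption (not from the paper): we search on x >= 1, where
    f_N is increasing and f_N(1) = 0 for N >= 2.
    """
    if r < 0:
        raise ValueError("r must be non-negative")
    if n == 1:
        return r  # f_1(x) = x, so its inverse is the identity

    # Grow the upper bracket until f_N(hi) >= r.
    lo, hi = 1.0, 2.0
    while f_n(hi, n) < r:
        hi *= 2.0

    # Bisect until the bracket is narrow relative to its size.
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f_n(mid, n) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For N = 1 the function $f_1$ is the identity, so $\psi_1(r) = r$. For every N, $f_N(e) = e$, so `psi_n(math.e, n)` should return a value close to `math.e`, which gives a quick sanity check of the inversion.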