An Algorithm for V/UV/S Segmentation of Speech (2008)
Abstract
Let f(n) be a sampled voice signal. Our goal is to identify the voiced (V) portions of f, as opposed to the silence (S) and unvoiced (UV) portions. In the following discussion, the signal is sampled at 22050 Hz and quantized to 8 bits. A window of 880 samples is about twice the longest period we track in the time domain: the minimum frequency of 50 Hz has a period of 441 samples at this sampling rate. We use the algorithm described below to provide a robust estimate of the fundamental period, which starts our glottal pulse (GP) or pitch tracker described in [1].

We compute R(f), the discrete Fourier transform (DFT) of the samples in the window. Let T(f) = | R(f) |, where the last half of R has been discarded because of symmetry. If the window contains more than one occurrence of the fundamental period of a voiced utterance, there will be “aliasing” in T, where we define “aliasing” to be any inaccuracy in the frequency analysis of a periodic signal that results from a poor choice of window size. Figure 1 plots T(f) for voiced speech.

Figure 1(a): DFT with aliasing. Figure 1(b): DFT without aliasing.

Notice that Figure 1(b) is the envelope of Figure 1(a). The aliasing is seen in Figure 2 by observing the large area under the DC term and under the first peak.

The real cepstrum c(f) is the logarithm of the power spectrum, c(f) = 2 ln | R(f) |. From the properties of the cepstrum, we know that c(f) consists of two components: a slowly varying component, which corresponds to the spectral envelope, and a rapidly varying component, which corresponds to the pitch harmonic peaks [2]. Since the logarithm is monotonically increasing, T(f) = | R(f) | also has two components: one which corresponds to the spectral envelope and another which corresponds to the pitch harmonic peaks. These components can be separated by filtering, which is the traditional way to proceed, or by a second Fourier transform, which we have found to be more resistant to noise: Y(f) = T(R(f)).

[Figure: graph of Y(f) for a voiced utterance.]
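As a rough illustration of the two-transform idea, the following Python sketch computes T from one window and then applies the same half-spectrum magnitude operation to T to obtain Y. It is a minimal sketch, not the authors' implementation. Several things here are assumptions: the reading of Y(f) = T(R(f)) as "magnitude of the DFT of T", the helper names `half_dft_magnitude` and `first_strong_peak`, the 0.9 threshold, and the synthetic pulse-train input with a 110-sample period. Only the 880-sample window comes from the text.

```python
import cmath
import math


def half_dft_magnitude(x):
    """Magnitude of the DFT of a real sequence, keeping only the first half of the bins."""
    n = len(x)
    # Precompute the n complex roots of unity used by the direct O(n^2) DFT.
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [
        abs(sum(x[t] * twiddle[(k * t) % n] for t in range(n)))
        for k in range(n // 2)
    ]


def first_strong_peak(y, frac=0.9):
    """Smallest index q >= 1 whose value reaches `frac` of the largest non-DC value.

    Hypothetical helper: the threshold rule is an assumption, not from the paper.
    """
    threshold = frac * max(y[1:])
    return next(q for q in range(1, len(y)) if y[q] >= threshold)


WINDOW = 880   # samples, as in the text
PERIOD = 110   # samples; hypothetical test period (about 200 Hz at 22050 Hz)

# Idealized "voiced" window: one unit pulse per period.
signal = [1.0 if t % PERIOD == 0 else 0.0 for t in range(WINDOW)]

T = half_dft_magnitude(signal)   # T = |R|, last half discarded
Y = half_dft_magnitude(T)        # second transform, applied to T (assumed reading)

q = first_strong_peak(Y)
# In this bin layout the harmonics in T sit WINDOW / PERIOD bins apart, so the
# first peak of Y falls at index len(T) * PERIOD / WINDOW.
estimated_period = q * WINDOW // len(T)
print(q, estimated_period)
```

For this idealized input the harmonics of T lie exactly 8 bins apart, so the first non-DC peak of Y lands at index 55, which maps back to a period of 110 samples. Real speech will not be this clean, and a production version would use an FFT routine rather than the direct DFT shown here.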
Publication details