FAST SPEAKER ADAPTION VIA MAXIMUM PENALIZED LIKELIHOOD KERNEL REGRESSION (2008)
Ivor W. Tsang, James T. Kwok, Brian Mak, Kai Zhang, Jeffrey J. Pan
Maximum likelihood linear regression (MLLR) has been a popular speaker adaptation method for many years. In this paper, we investigate a generalization of MLLR using nonlinear regression....
Pruning Hidden Markov Models with Optimal Brain Surgeon (2008)
A method of pruning hidden Markov models (HMMs) is presented. The main purpose is to find a good HMM topology for a given task with improved generalization capability. As a side effect, the resulting...
Unsupervised Speaker Adaptation using Reference Speaker Weighting (2008)
Recently, we revisited the fast adaptation method called reference speaker weighting (RSW), and suggested a few modifications. We then showed that the algorithmically simplest technique...
Man-wai Mak, Roger Hsiao, Brian Mak
One key factor that hinders the widespread deployment of speaker verification technologies is the requirement of long enrollment utterances to guarantee low error rate during verification. To gain...
Feature Decision extractionSpeaker Modeling (2008)
Man-wai Mak, Roger Hsiao, Brian Mak
To gain user acceptance of speaker verification technologies, adaptation algorithms that can enroll speakers with short utterances are highly essential. This paper compares four...
Brian Mak, Yik-cheung Tam, Roger Hsiao
The bank-of-filters spectrum analysis model is commonly used in the extraction of acoustic features for automatic speech recognition. The most critical component in the analysis model is a bank of...
TRAINING OF CONTEXT-DEPENDENT SUBSPACE DISTRIBUTION CLUSTERING HIDDEN MARKOV MODEL (2007)
Training of continuous density hidden Markov models (CDHMMs) is usually time-consuming and tedious due to the large number of model parameters involved. Recently we proposed a new derivative of...
MAP ADAPTATION WITH SUBSPACE REGRESSION CLASSES AND TYING (2007)
In the hidden Markov modeling framework with mixture Gaussians, adaptation is often done by modifying the Gaussian mean vectors using MAP estimation or MLLR transformation. When the amount of...
We would like to revisit a simple fast adaptation technique called reference speaker weighting (RSW). RSW is similar to eigenvoice (EV) adaptation, and simply requires the model of a new speaker to...
Kernel Eigenspace-Based MLLR Adaptation Using Multiple Regression Classes (2005)
Roger Hsiao, Brian Mak
Recently, we have been investigating the application of kernel methods to improve the performance of eigenvoice-based adaptation methods by exploiting possible nonlinearity in their original working...
Improving Eigenspace-based MLLR Adaptation by Kernel PCA (2004)
Brian Mak, Roger Hsiao
Eigenspace-based MLLR (EMLLR) adaptation has been shown effective for fast speaker adaptation. It applies the basic idea of eigenvoice adaptation, and derives a small set of eigenmatrices using...
Eigenvoice speaker adaptation via composite kernel PCA (2004)
James T. Kwok, Brian Mak, Simon Ho
Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA) employed to...
Speedup of kernel eigenvoice speaker adaptation by embedded kernel PCA (2004)
Brian Mak, Simon Ho, James T. Kwok
Recently, we proposed an improvement to the eigenvoice (EV) speaker adaptation called kernel eigenvoice (KEV) speaker adaptation. In KEV adaptation, eigenvoices are computed using kernel PCA, and a...
PLASER: Pronunciation Learning via Automatic Speech Recognition (2003)
Brian Mak, Manhung Siu, Mimi Ng, Yik-cheung Tam, Yu-chung Chan, Kin-wah Chan, ...
PLASER is a multimedia tool with instant feedback designed to teach English pronunciation for high-school students of Hong Kong whose mother tongue is Cantonese Chinese. The objective is to teach...
Eigenvoice Speaker Adaptation via Composite Kernel PCA (2003)
James T. Kwok, Brian Mak, Simon Ho
Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA) employed to...
Discriminative Auditory Features for Robust Speech Recognition (2002)
Recently, Li et al. proposed a new auditory feature for robust speech recognition in noise environments. The new feature was derived by mimicking closely the function of human auditory process....
During minimum-classification-error (MCE) training, competing hypotheses against the correct one are commonly derived by the N-best algorithm. One problem with the N-best algorithm is that, in...
Development of an asynchronous multi-band system for continuous speech recognition (2001)
Recently, multi-band automatic speech recognition (MBASR) has been proposed to combat environmental noise. In this paper, we describe the two major efforts in the development of our asynchronous MBASR...
Pruning of state-tying tree using Bayesian information criterion (2000)
Yu-chung Chan, Manhung Siu, Brian Mak
The use of context-dependent phonetic units together with Gaussian mixture models allows modern-day speech recognizers to build very complex and accurate acoustic models. However, because of data...
Recently multi-band speech recognition has been proposed to improve robustness under environmental noises. One important issue is how to combine decisions from individual sub-band recognizers to...
One of the central themes in multi-band automatic speech recognition (ASR) is to devise a strategy for recombining sub-band information. This in turn raises two questions: (1) at what phonetic unit...
Training of Subspace Distribution Clustering Hidden Markov Model (1998)
In [2] and [7], we presented our novel subspace distribution clustering hidden Markov models (SDCHMMs) which can be converted from continuous density hidden Markov models (CDHMMs) by clustering...
Combining ANNs to improve phone recognition (1997)
In applying neural networks to speech recognition, one often finds that slightly different training configurations lead to significantly different networks. Thus different training sessions using...
Subspace distribution clustering for continuous observation density hidden Markov models (1997)
This paper presents an efficient approximation of the Gaussian mixture state probability density functions of continuous observation density hidden Markov models (CHMMs). In CHMMs, the...
Brian Mak, Enrico Bocchieri, Etienne Barnard
In [1], our novel subspace distribution clustering hidden Markov model (SDCHMM) made its debut as an approximation to continuous density HMM (CDHMM). Deriving SDCHMMs from CDHMMs requires a definition...
Phone clustering using the Bhattacharyya distance (1996)
In this paper we study using the classification-based Bhattacharyya distance measure to guide biphone clustering. The Bhattacharyya distance is a theoretical distance measure between two Gaussian...
The Contribution of Consonants versus Vowels to Word Recognition in Fluent Speech (1996)
Ronald A. Cole, Yonghong Yan, Brian Mak, Mark Fanty, Troy Bailey
Three perceptual experiments were conducted to test the relative importance of vowels vs. consonants to recognition of fluent speech. Sentences were selected from the TIMIT corpus to obtain...