Brian Mak

Publication List Details

Period

1996 - 2008

Number

29

FAST SPEAKER ADAPTION VIA MAXIMUM PENALIZED LIKELIHOOD KERNEL REGRESSION (2008)

Ivor W. Tsang, James T. Kwok, Brian Mak, Kai Zhang, Jeffrey J. Pan

Maximum likelihood linear regression (MLLR) has been a popular speaker adaptation method for many years. In this paper, we investigate a generalization of MLLR using nonlinear regression....

Pruning Hidden Markov Models with Optimal Brain Surgeon (2008)

Brian Mak, Kin-wah Chan

A method of pruning hidden Markov models (HMMs) is presented. The main purpose is to find a good HMM topology for a given task with improved generalization capability. As a side effect, the resulting...

Unsupervised Speaker Adaptation using Reference Speaker Weighting (2008)

Tsz-chung Lai, Brian Mak

Recently, we revisited the fast adaptation method called reference speaker weighting (RSW), and suggested a few modifications. We then showed that the algorithmically simplest technique...

A comparison of various adaptation methods for speaker verification with limited enrollment data (2008)

Man-wai Mak, Roger Hsiao, Brian Mak

One key factor that hinders the widespread deployment of speaker verification technologies is the requirement of long enrollment utterances to guarantee a low error rate during verification. To gain...

Feature Decision extraction Speaker Modeling (2008)

Man-wai Mak, Roger Hsiao, Brian Mak

To gain user acceptance of speaker verification technologies, adaptation algorithms that can enroll speakers with short utterances are essential. This paper compares four...

FAST SPEAKER ADAPTION VIA MAXIMUM PENALIZED LIKELIHOOD KERNEL REGRESSION (2008)

Ivor W. Tsang, James T. Kwok, Brian Mak, Kai Zhang, Jeffrey J. Pan

Maximum likelihood linear regression (MLLR) has been a popular speaker adaptation method for many years. In this paper, we investigate a generalization of MLLR using nonlinear regression....

DISCRIMINATIVE TRAINING OF AUDITORY FILTERS OF DIFFERENT SHAPES FOR ROBUST SPEECH RECOGNITION (2008)

Brian Mak, Yik-cheung Tam, Roger Hsiao

The bank-of-filters spectrum analysis model is commonly used in the extraction of acoustic features for automatic speech recognition. The most critical component in the analysis model is a bank of...

TRAINING OF CONTEXT-DEPENDENT SUBSPACE DISTRIBUTION CLUSTERING HIDDEN MARKOV MODEL (2007)

Brian Mak, Enrico Bocchieri

Training of continuous density hidden Markov models (CDHMMs) is usually time-consuming and tedious due to the large number of model parameters involved. Recently we proposed a new derivative of...

MAP ADAPTATION WITH SUBSPACE REGRESSION CLASSES AND TYING (2007)

Kwok-man Wong, Brian Mak

In the hidden Markov modeling framework with mixture Gaussians, adaptation is often done by modifying the Gaussian mean vectors using MAP estimation or MLLR transformation. When the amount of...

Improving reference speaker weighting adaptation by the use of maximum-likelihood reference speakers (2006)

Brian Mak, Tsz-chung Lai

We would like to revisit a simple fast adaptation technique called reference speaker weighting (RSW). RSW is similar to eigenvoice (EV) adaptation, and simply requires the model of a new speaker to...

Kernel Eigenspace-Based MLLR Adaptation Using Multiple Regression Classes (2005)

Roger Hsiao, Brian Mak

Recently, we have been investigating the application of kernel methods to improve the performance of eigenvoice-based adaptation methods by exploiting possible nonlinearity in their original working...

Improving Eigenspace-based MLLR Adaptation by Kernel PCA (2004)

Brian Mak, Roger Hsiao

Eigenspace-based MLLR (EMLLR) adaptation has been shown effective for fast speaker adaptation. It applies the basic idea of eigenvoice adaptation, and derives a small set of eigenmatrices using...

Eigenvoice speaker adaptation via composite kernel PCA (2004)

James T. Kwok, Brian Mak, Simon Ho

Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA) employed to...

Speedup of kernel eigenvoice speaker adaptation by embedded kernel PCA (2004)

Brian Mak, Simon Ho, James T. Kwok

Recently, we proposed an improvement to the eigenvoice (EV) speaker adaptation called kernel eigenvoice (KEV) speaker adaptation. In KEV adaptation, eigenvoices are computed using kernel PCA, and a...

PLASER: Pronunciation Learning via Automatic Speech Recognition (2003)

Brian Mak, Manhung Siu, Mimi Ng, Yik-cheung Tam, Yu-chung Chan, Kin-wah Chan, ...

PLASER is a multimedia tool with instant feedback designed to teach English pronunciation for high-school students of Hong Kong whose mother tongue is Cantonese Chinese. The objective is to teach...

Eigenvoice Speaker Adaptation via Composite Kernel PCA (2003)

James T. Kwok, Brian Mak, Simon Ho

Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA) employed to...

Discriminative Auditory Features for Robust Speech Recognition (2002)

Brian Mak, Yik-cheung Tam

Recently, Li et al. proposed a new auditory feature for robust speech recognition in noisy environments. The new feature was derived by closely mimicking the function of the human auditory process....

An Alternative Approach of Finding Competing Hypotheses for Better Minimum Classification Error Training (2002)

Yik-cheung Tam, Brian Mak

During minimum-classification-error (MCE) training, competing hypotheses against the correct one are commonly derived by the N-best algorithm. One problem with the N-best algorithm is that, in...

Development of an asynchronous multi-band system for continuous speech recognition (2001)

Yik-cheung Tam, Brian Mak

Recently, multi-band automatic speech recognition (MBASR) has been proposed to combat environmental noise. In this paper, we describe the two major efforts in the development of our asynchronous MBASR...

Pruning of state-tying tree using bayesian information criterion (2000)

Yu-chung Chan, Manhung Siu, Brian Mak

The use of context-dependent phonetic units together with Gaussian mixture models allows modern-day speech recognizers to build very complex and accurate acoustic models. However, because of data...

Optimization of sub-band weights using simulated noisy speech in multi-band speech recognition (2000)

Yik-cheung Tam, Brian Mak

Recently multi-band speech recognition has been proposed to improve robustness under environmental noises. One important issue is how to combine decisions from individual sub-band recognizers to...

Asynchrony with trained transition probabilities improves performance in multi-band speech recognition (2000)

Brian Mak, Yik-cheung Tam

One of the central themes in multi-band automatic speech recognition (ASR) is to devise a strategy for recombining sub-band information. This in turn raises two questions: (1) at what phonetic unit...

Training of Subspace Distribution Clustering Hidden Markov Model (1998)

Brian Mak, Enrico Bocchieri

In [2] and [7], we presented our novel subspace distribution clustering hidden Markov models (SDCHMMs) which can be converted from continuous density hidden Markov models (CDHMMs) by clustering...

Combining ANNs to improve phone recognition (1997)

Brian Mak

In applying neural networks to speech recognition, one often finds that slightly different training configurations lead to significantly different networks. Thus different training sessions using...

Subspace distribution clustering for continuous observation density hidden markov models (1997)

Enrico Bocchieri, Brian Mak

This paper presents an efficient approximation of the Gaussian mixture state probability density functions of continuous observation density hidden Markov models (CHMMs). In CHMMs, the...

Stream Derivation And Clustering Scheme For Subspace Distribution Clustering Hidden Markov Model (1997)

Brian Mak, Enrico Bocchieri, Etienne Barnard

In [1], our novel subspace distribution clustering hidden Markov model (SDCHMM) made its debut as an approximation to continuous density HMM (CDHMM). Deriving SDCHMMs from CDHMMs requires a definition...

Phone clustering using the Bhattacharyya distance (1996)

Brian Mak, Etienne Barnard

In this paper we study using the classification-based Bhattacharyya distance measure to guide biphone clustering. The Bhattacharyya distance is a theoretical distance measure between two Gaussian...

The Contribution of Consonants versus Vowels to Word Recognition in Fluent Speech (1996)

Ronald A. Cole, Yonghong Yan, Brian Mak, Mark Fanty, Troy Bailey

Three perceptual experiments were conducted to test the relative importance of vowels vs. consonants to recognition of fluent speech. Sentences were selected from the TIMIT corpus to obtain...

Phone Clustering Using The Bhattacharyya Distance (1996)

Brian Mak, Etienne Barnard

In this paper we study using the classification-based Bhattacharyya distance measure to guide biphone clustering. The Bhattacharyya distance is a theoretical distance measure between two Gaussian...