Hervé Bourlard

Social Signals, their Function, and Automatic Analysis: (2009)

Alessandro Vinciarelli, Maja Pantic, Hervé Bourlard, Alex Pentland

Social Signal Processing (SSP) aims at the analysis of social behaviour in both Human-Human and Human-Computer interactions. SSP revolves around automatic sensing and interpretation of social...

Eurospeech 2001 - Scandinavia: Modeling Auxiliary Information in Bayesian Network Based ASR (2008)

Todd A. Stephenson, M. Mathew, Hervé Bourlard

Automatic speech recognition bases its models on the acoustic features derived from the speech signal. Some have investigated replacing or supplementing these features with information that can not...

Using More Informative Posterior Probabilities for Speech Recognition (2008)

Hamed Ketabdar, Jithendra Vepa, Samy Bengio, Hervé Bourlard

In this paper, we present initial investigations towards boosting posterior probability based speech recognition systems by estimating more informative posteriors taking into account acoustic context...

Recognition and Understanding of Meetings: The AMI and AMIDA Projects (2008)

Steve Renals, Thomas Hain, Hervé Bourlard

The AMI and AMIDA projects are concerned with the recognition and interpretation of multiparty meetings. Within these projects we have: developed an infrastructure for recording meetings using...

Text Identification in Complex Background Using SVM (2008)

Datong Chen, Hervé Bourlard

This paper presents a fast and robust algorithm to identify text in image or video frames with complex backgrounds and compression effects. The algorithm first extracts the candidate text line on the...

Summary (2008)

Shantanu Chakrabartty, Gert Cauwenberghs, Hervé Bourlard

Driven by the proliferation of portable devices like cellular phones, personal digital assistants (PDAs) and smart wrist watches there has been an ever increasing demand for efficient and robust user...

Hidden Markov Models and other Finite State Automata (2008)

Hervé Bourlard, Samy Bengio

During these last 20 years, Finite State Automata (FSA), and more particularly Stochastic Finite State Automata (SFSA) and different variants of Hidden Markov Models (HMMs), have been used quite...

Low Cost Duration Modelling for Noise Robust Speech Recognition (2008)

Andrew C. Morris, Simon Payne, Hervé Bourlard, ...

State transition matrices as used in standard HMM decoders have two widely perceived limitations. One is that the implicit Geometric state duration distributions which they model do not accurately...

Comparison and Combination of Features in a Hybrid HMM/MLP and a HMM/GMM Speech Recognition System (submitted to IEEE Trans. on SAP, EDICS: 1-RECO) (2008)

Pere Pujol, Susagna Pol, Climent Nadeu, Astrid Hagen, Hervé Bourlard

Recently, the advantages of the spectral parameters obtained by frequency filtering (FF) of the logarithmic filter-bank energies (logFBEs) have been reported. These parameters, which are frequency...

Social Signal Processing: State-of-the-art and future perspectives of an emerging domain (2008)

Alessandro Vinciarelli, Maja Pantic, Hervé Bourlard, Alex Pentland

The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued...

Interfacing of CASA and Multistream recognition (2007)

Hervé Glotin, Frédéric Berthommier, Emmanuel Tessier, Hervé Bourlard

In this paper we propose a running demonstration of coupling between an intermediate processing step (named CASA), based on the harmonicity cue, and partial recognition, implemented with a HMM/ANN...

Development of Acoustic and Linguistic Resources for Research and Evaluation in Interactive Vocal Information Servers (2007)

Giulia Bernardis, Hervé Bourlard, Martin Rajman, Jean-Cédric Chappelier

This paper describes the setting up of a resource database for research and evaluation in the domain of interactive vocal information servers. All this resource development work took place in a...

Modelling Auxiliary Features in Tandem Systems (2007)

Todd A. Stephenson, Shajith Ikbal, Hervé Bourlard

Tandem systems transform the cepstral features into posterior probabilities of subword units using artificial neural networks (ANNs), which are processed to form input features for conventional...

Recognition and Understanding of Meetings The AMI and AMIDA Projects (2007)

Steve Renals, Thomas Hain, Hervé Bourlard

To appear in Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU’07.

Agglomerative Information Bottleneck for Speaker Diarization of Meetings Data (2007)

Deepu Vijayasenan, Fabio Valente, Hervé Bourlard

Submitted for publication. In this paper, we investigate the use of agglomerative Information Bottleneck (aIB) clustering for the speaker diarization task of meetings data. In contrast to...

Using Pitch as Prior Knowledge in Template-Based Speech Recognition (2006)

Aradilla, Guillermo, Vepa, Jithendra, Bourlard, Hervé

In a previous paper on speech recognition, we showed that templates can better capture the dynamics of speech signal compared to parametric models such as hidden Markov models. The key point in...

On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR (2006)

Tyagi, Vivek, Bourlard, Hervé, Wellekens, Christian

It is often acknowledged that speech signals contain short-term and long-term temporal properties that are difficult to capture and model by using the usual fixed scale (typically 20ms) short time...

Unsupervised Spectral Subtraction for Noise-Robust ASR on Unknown Transmission Channels (2006)

Lathoud, Guillaume, Bourlard, Hervé

This paper addresses several issues of classical spectral subtraction methods with respect to the automatic speech recognition task in noisy environments. The main contributions of this paper are...

Towards using slide information to enhance speech transcription of meetings (2006)

Peregoudov, Artem, Vinciarelli, Alessandro, Bourlard, Hervé

In this paper we investigate the possibility of improving the speech recognition performance of meeting recordings by using slides captured during the recording process. The key hypothesis exploited...

Using Posterior-Based Features in Template Matching for Speech Recognition (2006)

Aradilla, Guillermo, Vepa, Jithendra, Bourlard, Hervé

Given the availability of large speech corpora, as well as the increasing of memory and computational resources, the use of template matching approaches for automatic speech recognition (ASR) have...

Hierarchical Multi-Stream Posterior Based Speech Recognition System (2005)

Ketabdar, Hamed, Bourlard, Hervé, Bengio, Samy

In this paper, we present initial results towards boosting posterior based speech recognition systems by estimating more informative posteriors using multiple streams of features and taking into...

Developing and Enhancing Posterior Based Speech Recognition Systems (2005)

Ketabdar, Hamed, Vepa, Jithendra, Bengio, Samy, Bourlard, Hervé

Local state or phone posterior probabilities are often investigated as local scores (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g., “Tandem”) to improve speech recogni...

Improving Speech Recognition Using a Data-Driven Approach (2005)

Aradilla, Guillermo, Vepa, Jithendra, Bourlard, Hervé

In this paper, we investigate the possibility of enhancing state-of-the-art HMM-based speech recognition systems using data-driven techniques, where whole set of training utterances is used as...

Multi-resolution Spectral Entropy Based Feature for Robust ASR (2005)

Misra, Hemant, Ikbal, Shajith, Sivadas, Sunil, Bourlard, Hervé

Recently, entropy measures at different stages of recognition have been used in automatic speech recognition (ASR) task. In a recent paper, we proposed that formant positions of a spectrum can be...

Spectral Entropy Feature in Full-Combination Multi-Stream for Robust ASR (2005)

Misra, Hemant, Bourlard, Hervé

In a recent paper, we reported promising automatic speech recognition results obtained by appending spectral entropy features to PLP features. In the present paper, spectral entropy features are used...

Continuous Microphone Array Speech Recognition on Wall Street Journal Corpus (2005)

Maganti, Hari Krishna, Vepa, Jithendra, Bourlard, Hervé

In this paper, we present a robust speech acquisition system to acquire continuous speech using a microphone array. A microphone array based speech recognition system is also presented to study the...

Using Pitch as Prior Knowledge in Template-Based Speech Recognition (2005)

Aradilla, Guillermo, Vepa, Jithendra, Bourlard, Hervé

In a previous paper on speech recognition, we showed that templates can better capture the dynamics of speech signal compared to parametric models such as hidden Markov models. The key point in...

Threshold Selection for Unsupervised Detection, with an Application to Microphone Arrays (2005)

Lathoud, Guillaume, Odobez, Jean-Marc, Bourlard, Hervé

Detection is usually done by comparing some criterion to a threshold. It is often desirable to keep a performance metric such as False Alarm Rate constant across conditions. Using training data to...

Multi-stream ASR: Oracle Test and Embedded Training (2005)

Misra, Hemant, Vepa, Jithendra, Bourlard, Hervé

Multi-stream based automatic speech recognition (ASR) systems outperform their single stream counterparts, especially in the case of noisy speech. However, the main issues in multi-stream systems are...

Improving Continuous Speech Recognition System Performance with Grapheme Modelling (2005)

Dines, John, Bourlard, Hervé, Hermansky, Hynek

This paper investigates automatic speech recognition system using context-dependent graphemes as subword units based on the conventional HMM/GMM system as well as TANDEM system. Experimental studies...

On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR (2005)

Tyagi, Vivek, Bourlard, Hervé, Wellekens, Christian

It is often acknowledged that speech signals contain short-term and long-term temporal properties that are difficult to capture and model by using the usual fixed scale (typically 20ms) short time...

Unsupervised Spectral Subtraction for Noise-Robust ASR (2005)

Lathoud, Guillaume, Mesot, Bertrand, Bourlard, Hervé

This paper proposes a simple, computationally efficient 2-mixture model approach to discriminate between speech and background noise at the magnitude spectrogram level. It is directly derived from...

Spectral Entropy Feature in Multi-Stream for Robust ASR (2005)

Misra, Hemant, Bourlard, Hervé

In recent papers, entropy computed from sub-bands of the spectrum was used as a feature for automatic speech recognition. In the present paper, we further study the sub-band spectral entropy features...

Hierarchical multi-stream posterior based speech recognition system (2005)

Hamed Ketabdar, Hervé Bourlard, Samy Bengio

In this paper, we present initial results towards boosting posterior based speech recognition systems by estimating more informative posteriors using multiple streams of features and taking...

Developing and enhancing posterior based speech recognition systems (2005)

Hamed Ketabdar, Jithendra Vepa, Samy Bengio, Hervé Bourlard

Local state or phone posterior probabilities are often investigated as local scores (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g., “Tandem”) to improve speech...

Developing and enhancing posterior based speech recognition systems (2005)

Hamed Ketabdar, Jithendra Vepa, Hervé Bourlard, Samy Bengio, ...

Submitted for publication. Local state or phone posterior probabilities are often investigated as local scores (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g.,...

Text Detection and Recognition in Images and Videos (2004)

Chen, Datong, Odobez, Jean-Marc, Bourlard, Hervé

Text embedded in images and videos represents a rich source of information for content-based indexing and retrieval applications. In this paper, we present a new method for localizing and recognizing...

Joint Decoding for Phoneme-Grapheme Continuous Speech Recognition (2004)

Bengio, Samy, Bourlard, Hervé

Standard ASR systems typically use phoneme as the subword units. Preliminary studies have shown that the performance of the ASR system could be improved by using grapheme as additional subword units....

Modelling Auxiliary Features in Tandem Systems (2004)

Stephenson, Todd A., Ikbal, Shajith, Bourlard, Hervé

Tandem systems transform the cepstral features into posterior probabilities of subword units using artificial neural networks (ANNs), which are processed to form input features for conventional...

Confidence Measures in Multiple Pronunciations Modeling for Speaker Verification (2004)

BenZeghiba, Mohamed F., Bourlard, Hervé

This paper investigates the use of multiple pronunciations modeling for User-Customized Password Speaker Verification (UCP-SV). The main characteristic of the UCP-SV is that the system does not have...

Spectral Entropy Based Feature for Robust ASR (2004)

Misra, Hemant, Ikbal, Shajith, Bourlard, Hervé, Hermansky, Hynek

In general, entropy gives us a measure of the number of bits required to represent some information. When applied to probability mass function (PMF), entropy can also be used to measure the...

Posteriori Probabilities and Likelihoods Combination for Speech and Speaker Recognition (2004)

BenZeghiba, Mohamed F., Bourlard, Hervé

This paper investigates a new approach to perform simultaneous speech and speaker recognition. The likelihood estimated by a speaker identification system is combined with the posterior probability...

On the Adequacy of Baseform Pronunciations and Pronunciation Variants (2004)

Bourlard, Hervé

This paper presents an approach to automatically extract and evaluate the “stability” of pronunciation variants (i.e., adequacy of the model to accommodate this variability), based on multiple...

Phoneme vs Grapheme Based Automatic Speech Recognition (2004)

Dines, John, Bourlard, Hervé, Hermansky, Hynek

In recent literature, different approaches have been proposed to use graphemes as subword units with implicit source of phoneme information for automatic speech recognition. The major advantage of...

Towards using hierarchical posteriors for flexible automatic speech recognition systems (2004)

Bourlard, Hervé, Bengio, Samy, Doss, Mathew Magimai, Zhu, Qifeng, Mesot, Bertrand, Morgan, Nelson

Local state (or phone) posterior probabilities are often investigated as local classifiers (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g., “Tandem”) towards improved...

Multi-resolution Spectral Entropy Based Feature for Robust ASR (2004)

Misra, Hemant, Ikbal, Shajith, Sivadas, Sunil, Bourlard, Hervé

Recently, entropy measures at different stages of recognition have been used in automatic speech recognition (ASR) task. In a recent paper, we proposed that formant positions of a spectrum can be...

User-Customized Password Speaker Verification Using Multiple Reference and Background Models (2004)

BenZeghiba, Mohamed F., Bourlard, Hervé

In this paper, we discuss and optimize a HMM-based User-Customized Password Speaker Verification (UCP-SV) system, where users can have their own passwords (with no lexical constraints) after a short...

Phase AutoCorrelation (PAC) Features in Entropy Based Multi-Stream for Robust Speech Recognition (2004)

Shajith Ikbal, Hemant Misra, Hervé Bourlard, Hynek Hermansky

Methods to improve noise robustness of speech recognition systems often result in degradation of recognition performance for clean speech. Recently proposed Phase AutoCorrelation (PAC) [1, 2] based...

Spectro-temporal activity pattern (STAP) features for noise robust ASR (2004)

Shajith Ikbal, Hemant Misra, Hervé Bourlard

In this paper, we introduce a new noise robust representation of speech signal obtained by locating points of potential importance in the spectrogram, and parameterizing the activity of...

Entropy based combination of tandem representations for noise robust ASR (2004)

Shajith Ikbal, Hemant Misra, Sunil Sivadas, Hynek Hermansky, Hervé Bourlard

In this paper, we present an entropy based method to combine tandem representations of the recently proposed Phase AutoCorrelation (PAC) based features and Mel-Frequency Cepstral Coefficients (MFCC)...

Multi channel sequence processing (2004)

Samy Bengio, Hervé Bourlard

Submitted for publication. This paper summarizes some of the current research challenges arising from multichannel sequence processing. Indeed, multiple real life applications involve...

Multi channel sequence processing (2004)

Samy Bengio, Hervé Bourlard

This paper summarizes some of the current research challenges arising from multi-channel sequence processing. Indeed, multiple real life applications involve simultaneous recording and...

Robust speaker change detection (2004)

Jitendra Ajmera, Iain McCowan, Hervé Bourlard

Most commonly used criteria for speaker change detection like log likelihood ratio (LLR) and Bayesian information criterion (BIC) have an adjustable threshold/penalty parameter to make...

Hybrid HMM/ANN and GMM Combination for User-Customized Password Speaker Verification (2003)

BenZeghiba, Mohamed F., Bourlard, Hervé

Recently we have proposed an approach for user-customized password speaker verification; in this approach, we combined a hybrid HMM/ANN model (used for utterance verification) and a GMM model (used...

On automatic annotation of meeting databases (2003)

Gatica-Perez, Daniel, McCowan, Iain, Barnard, Mark, Bengio, Samy, Bourlard, Hervé

In this paper, we discuss meetings as an application domain for multimedia content analysis. Meeting databases are a rich data source suitable for a variety of audio, visual and multi-modal tasks,...

On the Combination of Speech and Speaker Recognition (2003)

BenZeghiba, Mohamed F., Bourlard, Hervé

This paper investigates an approach that maximizes the joint posterior probability of the pronounced word and the speaker identity given the observed data. This probability can be expressed as a...

Using pitch frequency information in speech recognition (2003)

Stephenson, Todd A., Bourlard, Hervé

Automatic Speech Recognition systems typically use smoothed spectral features as acoustic observations. In recent studies, it has been shown that complementing these standard features with pitch...

Phoneme-Grapheme Based Speech Recognition System (2003)

Stephenson, Todd A., Bourlard, Hervé, Bengio, Samy

State-of-the-art Automatic Speech Recognition (ASR) systems typically use phoneme as the subword units. In this paper, we investigate a system where the word models are defined in terms of two...

Comparison and Combination of Features in a Hybrid HMM/MLP and a HMM/GMM Speech Recognition System (2003)

Pujol, Pere, Pol, Susagna, Nadeu, Climent, Hagen, Astrid, Bourlard, Hervé

Recently, the advantages of the spectral parameters obtained by frequency filtering (FF) of the logarithmic filter-bank energies (logFBEs) have been reported. These parameters, which are frequency...

New entropy based combination rules in HMM/ANN multi-stream ASR (2003)

Misra, Hemant, Bourlard, Hervé, Tyagi, Vivek

Classifier performance is often enhanced through combining multiple streams of information. In the context of multi-stream HMM/ANN systems in ASR, a confidence measure widely used in classifier...

Information Retrieval on Noisy Text (2003)

Grangier, David, Vinciarelli, Alessandro, Bourlard, Hervé

Spoken Document Retrieval (SDR) consists in retrieving segments of a speech database that are relevant to a query. The state-of-the-art approach to the SDR problem consists in transcribing the speech...

Joint Decoding for Phoneme-Grapheme Continuous Speech Recognition (2003)

Bengio, Samy, Bourlard, Hervé

Standard ASR systems typically use phoneme as the subword units. Preliminary studies have shown that the performance of the ASR system could be improved by using grapheme as additional subword units....

Confidence Measures in Multiple Pronunciations Modeling for Speaker Verification (2003)

BenZeghiba, Mohamed F., Bourlard, Hervé

This paper investigates the use of multiple pronunciations modeling for User-Customized Password Speaker Verification (UCP-SV). The main characteristic of the UCP-SV is that the system does not have...

Some Emerging Concepts in Speech Recognition (2003)

Hermansky, Hynek, Bourlard, Hervé

The paper presents a work-in-progress on several emerging concepts in Automatic Speech Recognition (ASR), that are being currently studied at IDIAP. This work can be roughly categorized into three...

Spectral Entropy Based Feature for Robust ASR (2003)

Misra, Hemant, Ikbal, Shajith, Bourlard, Hervé, Hermansky, Hynek

In general, entropy gives us a measure of the number of bits required to represent some information. When applied to probability mass function (PMF), entropy can also be used to measure the...

Phase Auto-Correlation (PAC) derived Robust Speech Features (2003)

Shajith Ikbal, Hemant Misra, Hervé Bourlard

In this paper, we introduce a new class of noise robust acoustic features derived from a new measure of autocorrelation, and explicitly exploiting the phase variation of the speech signal frame over...

Towards computer understanding of human interactions (2003)

Iain McCowan, Daniel Gatica-Perez, Samy Bengio, Darren Moore, Hervé Bourlard

People meet in order to interact: disseminating information, making decisions, and creating new ideas. Automatic analysis of meetings is therefore important from two points of view:...

Phoneme-Grapheme Based Speech Recognition (2003)

Todd A. Stephenson, Hervé Bourlard, ...

Submitted for publication. State-of-the-art Automatic Speech Recognition (ASR) systems typically use phoneme as the subword units. In this paper, we investigate a system where the word...

On Automatic Annotation Of Meeting Databases (2003)

Iain McCowan, Mark Barnard, Samy Bengio, Hervé Bourlard

In this paper, we present meetings as an application domain for multimedia content analysis. Meeting databases are a rich data source suitable for a variety of audio, visual and multi-modal tasks,...

On Factorizing Spectral Dynamics for Robust Speech Recognition (2003)

Vivek Tyagi, Iain McCowan, Hervé Bourlard, Hemant Misra

In this paper, we introduce new dynamic speech features based on the modulation spectrum. These features, termed Mel-cepstrum Modulation Spectrum (MCMS), map the time trajectories of the spectral...

On automatic annotation of meeting databases (2003)

Daniel Gatica-Perez, Iain McCowan, Mark Barnard, Samy Bengio, Hervé Bourlard

In this paper, we discuss meetings as an application domain for multimedia content analysis. Meeting databases are a rich data source suitable for a variety of audio, visual and multi-modal tasks,...

On automatic annotation of meeting databases (2003)

Daniel Gatica-Perez, Iain McCowan, Mark Barnard, Samy Bengio, Hervé Bourlard, ...

In this paper, we present meetings as an application domain for multimedia content analysis. Meeting databases are a rich data source suitable for a variety of audio, visual and multi-modal...

Mel-Cepstrum Modulation Spectrum (MCMS) Features for Robust ASR (2003)

Vivek Tyagi, Iain McCowan, Hemant Misra, Hervé Bourlard

In this paper, we present new dynamic features derived from the modulation spectrum of the cepstral trajectories of the speech signal. Cepstral trajectories are projected over the basis of sines and...

Microphone array post-filter based on noise field coherence (2003)

Iain A. McCowan, Hervé Bourlard

This paper introduces a novel technique for estimating the signal power spectral density to be used in the transfer function of a microphone array post-filter. The technique is a...

Mixed Bayesian Networks with Auxiliary Variables for Automatic Speech Recognition (2002)

Stephenson, Todd A., Magimai-Doss, Mathew, Bourlard, Hervé

Standard hidden Markov models (HMMs), as used in automatic speech recognition (ASR), calculate their emission probabilities by an artificial neural network (ANN) or a Gaussian distribution...

Dynamic Bayesian Network Based Speech Recognition with Pitch and Energy as Auxiliary Variables (2002)

Stephenson, Todd A., Escofet, Jaume, Magimai-Doss, Mathew, Bourlard, Hervé

Pitch and energy are two fundamental features describing speech, having importance in human speech recognition. However, when incorporated as features in automatic speech recognition (ASR), they...

Auxiliary Variables in Conditional Gaussian Mixtures for Automatic Speech Recognition (2002)

Stephenson, Todd A., Magimai-Doss, Mathew, Bourlard, Hervé

In previous work, we presented a case study using an estimated pitch value as the conditioning variable in conditional Gaussians that showed the utility of hiding the pitch values in certain...

Low cost duration modelling for noise robust speech recognition (2002)

Morris, Andrew, Payne, Simon, Bourlard, Hervé

State transition matrices as used in standard HMM decoders have two widely perceived limitations. One is that the implicit Geometric state duration distributions which they model do not accurately...

Text Segmentation and Recognition in Complex Background Based on Markov Random Field (2002)

Chen, Datong, Odobez, Jean-Marc, Bourlard, Hervé

In this paper we propose a method to segment and recognize text embedded in video and images. We model the gray level distribution in the text images as a mixture of Gaussians, and then assign each...

User-Customized Password HMM Based Speaker Verification (2002)

BenZeghiba, Mohamed F., Bourlard, Hervé

is presented. The system has no a priori knowledge of passwords. A hybrid HMM/ANN system is used to infer the phonetic transcription of the password. The emission probabilities are then modeled by...

User-Customized Password Speaker Verification based on HMM/ANN and GMM Models (2002)

BenZeghiba, Mohamed F., Bourlard, Hervé

In this paper, we present a new approach towards user-customized password speaker verification combining the advantages of hybrid HMM/ANN systems, using Artificial Neural Networks (ANN) to estimate...

Low cost duration modelling for noise robust speech recognition (2002)

Morris, Andrew C., Payne, Simon, Bourlard, Hervé

State transition matrices as used in standard HMM decoders have two widely perceived limitations. One is that the implicit Geometric state duration distributions which they model do not accurately...

Entropy-Based Multi-Stream Combination (2002)

Misra, Hemant, Bourlard, Hervé, Tyagi, Vivek

Full-combination multi-band approach has been proposed in the literature and performs well for band-limited noise. But the approach fails to deliver in case of wide-band noise. To overcome this,...

Text Detection and Recognition in Images and Videos (2002)

Chen, Datong, Odobez, Jean-Marc, Bourlard, Hervé

Text embedded in images and videos represents a rich source of information for content-based indexing and retrieval applications. In this paper, we present a new method for localizing and recognizing...

Modelling auxiliary information (pitch frequency) in hybrid HMM/ANN based ASR systems (2002)

Stephenson, Todd A., Bourlard, Hervé

Automatic Speech Recognition systems typically use smoothed spectral features as acoustic observations. In recent studies, it has been shown that complementing these standard features with auxiliary...

User-Customized Password HMM Based Speaker Verification (2002)

BenZeghiba, Mohamed F., Bourlard, Hervé

is presented. The system has no a priori knowledge of passwords. A hybrid HMM/ANN system is used to infer the phonetic transcription of the password. The emission probabilities are then modeled by...

Hybrid HMM/ANN and GMM Combination for User-Customized Password Speaker Verification (2002)

BenZeghiba, Mohamed F., Bourlard, Hervé

Recently we have proposed an approach for user-customized password speaker verification; in this approach, we combined a hybrid HMM/ANN model (used for utterance verification) and a GMM model (used...

Robust HMM-Based Speech/Music Segmentation (2002)

Jitendra Ajmera, Iain A. McCowan, Hervé Bourlard

In this paper we present a new approach towards high performance speech/music segmentation on realistic tasks related to the automatic transcription of broadcast news. In the approach presented here,...

Microphone Array Post-Filter for Diffuse Noise Field (2002)

Iain A. McCowan, Hervé Bourlard

This paper proposes a novel technique for estimating the signal power spectral density to be used in the transfer function of a microphone array post-filter. The technique is a modification of the...

Multi-stream adaptive evidence combination for noise robust ASR (2001)

Morris, Andrew, Hagen, Astrid, Glotin, Hervé, Bourlard, Hervé

In this paper we develop different mathematical models in the framework of the multi-stream paradigm for noise robust ASR, and discuss their close relationship with human speech perception. Largely...

Text Enhancement with Asymmetric Filter for Video OCR (2001)

Chen, Datong, Shearer, Kim, Bourlard, Hervé

Stripes are common sub-structures of text characters, and the scale of these stripes varies little within a word. This scale consistency thus provides us with a useful feature for text detection and...

From missing data to maybe useful data: soft data modelling for noise robust ASR (2001)

Morris, Andrew, Barker, Jon, Bourlard, Hervé

Much research has been focused on the problem of achieving automatic speech recognition (ASR) which approaches human recognition performance in its level of robustness to noise and channel...

Modeling Auxiliary Information in Bayesian Network Based ASR (2001)

Stephenson, Todd A., Mathew, M., Bourlard, Hervé

Automatic speech recognition bases its models on the acoustic features derived from the speech signal. Some have investigated replacing or supplementing these features with information that can not...

Video OCR for Sport Video Annotation and Retrieval (2001)

Chen, Datong, Shearer, Kim, Bourlard, Hervé

This paper presents a video OCR system that automatically extracts closed captions from video frames as keywords (or as we called "cues") for building annotations of sport videos. In this system,...

MAP Combination of Multi-Stream HMM or HMM/ANN Experts (2001)

Morris, Andrew, Hagen, Astrid, Bourlard, Hervé

Automatic speech recognition (ASR) performance falls dramatically with the level of mismatch between training and test data. The human ability to recognise speech when a large proportion of...

Text Identification in Complex Background using SVM (2001)

Chen, Datong, Bourlard, Hervé, Thiran, Jean-Philippe

This paper presents a fast and robust algorithm to identify text in image or video frames with complex backgrounds and compression effects. The algorithm first extracts the candidate text line on the...

Modeling Auxiliary Information in Bayesian Network Based ASR (2001)

Stephenson, Todd A., Magimai-Doss, Mathew, Bourlard, Hervé

Automatic speech recognition bases its models on the acoustic features derived from the speech signal. Some have investigated replacing or supplementing these features with information that can not...

Mixed Bayesian Networks with Auxiliary Variables for Automatic Speech Recognition (2001)

Stephenson, Todd A., Magimai-Doss, Mathew, Bourlard, Hervé

Standard hidden Markov models (HMMs), as used in automatic speech recognition (ASR), calculate their emission probabilities by an artificial neural network (ANN) or a Gaussian distribution...

Text Identification in Complex Background using SVM (2001)

Chen, Datong, Bourlard, Hervé, Thiran, Jean-Philippe

This paper presents a fast and robust algorithm to identify text in image or video frames with complex backgrounds and compression effects. The algorithm first extracts the candidate text line on the...

Pronunciation models and their evaluation using confidence measures (2001)

Doss, Mathew Magimai, Bourlard, Hervé

In this report, we present preliminary experiments towards automatic inference and evaluation of pronunciation models based on multiple utterances of each lexicon word and their given baseline...

Video OCR for Sport Video Annotation and Retrieval (2001)

Chen, Datong, Bourlard, Hervé

This paper presents a video OCR system that automatically extracts closed captions from video frames as keywords (or as we called "cues") for building annotations of sport videos. In this system,...

User Customized HMM/ANN Based Speaker Verification (2001)

BenZeghiba, Mohamed F., Bourlard, Hervé

In this paper, we describe a new speaker verification approach, using a hybrid HMM/ANN system, and accommodating user customized passwords. This system is exploiting the high phonetic recognition...

Text identification in complex background using SVM (2001)

Datong Chen, Hervé Bourlard

This paper presents a fast and robust algorithm to identify text in image or video frames with complex backgrounds and compression effects. The algorithm first extracts the candidate text line on the...

Text enhancement with asymmetric filter for video OCR (2001)

Datong Chen, Kim Shearer, Hervé Bourlard

Stripes are common sub-structures of text characters, and the scale of these stripes varies little within a word. This scale consistency thus provides us with a useful feature for text detection and...

Video OCR for Sport Video Annotation and Retrieval (2001)

Datong Chen, Kim Shearer, Hervé Bourlard

This paper presents a video OCR system that automatically extracts closed captions from video frames as keywords (or as we called "cues") for building annotations of sport videos. In this...

HIDDEN MARKOV MODELS AND OTHER FINITE STATE AUTOMATA FOR SEQUENCE PROCESSING (2001)

Hervé Bourlard, Samy Bengio, ...

During these last 20 years, Finite State Automata (FSA), and more particularly Stochastic Finite State Automata (SFSA) and different variants of Hidden Markov Models (HMMs), have been used quite...

HMM2- EXTRACTION OF FORMANT STRUCTURES AND THEIR USE FOR ROBUST ASR (2001)

Katrin Weber, Samy Bengio, Hervé Bourlard

Published in Proc. Eurospeech 2001. Abstract: As recently introduced in [1], an HMM2 can be considered as a particular case of an HMM mixture in which the HMM emission probabilities (usually estimated...

Automatic Speech Recognition using Dynamic Bayesian Networks with both Acoustic and Articulatory Variables (2000)

Stephenson, Todd A., Bourlard, Hervé, Bengio, Samy, Morris, Andrew C.

Current technology for automatic speech recognition (ASR) uses hidden Markov models (HMMs) that recognize spoken speech using the acoustic signal. However, no use is made of the causes of the...

A neural network for classification with incomplete data: application to robust ASR (2000)

Morris, Andrew, Josifovski, Ljubomir, Bourlard, Hervé, Cooke, Martin, Green, Phil

If the data vector for input to an automatic classifier is incomplete, the optimal estimate for each class probability must be calculated as the expected value of the classifier output. We identify a...

Automatic Speech Recognition using Pitch Information in Dynamic Bayesian Networks (2000)

Stephenson, Todd A., Magimai Doss, Mathew, Bourlard, Hervé

The challenge of automatic speech recognition (ASR) increases when speaker variability is encountered. Being able to automatically use different acoustic models according to speaker type might help...

Using Multiple Time Scales in the Framework of Multi-Stream Speech Recognition (2000)

Hervé Bourlard, Astrid Hagen

In this paper, we present a new approach to incorporating multiple time scale information as independent streams in multi-stream processing. To illustrate the procedure, we take two different sets of...

Automatic Speech Recognition using Dynamic Bayesian Networks with both Acoustic and Articulatory Variables (2000)

Hervé Bourlard, Todd A. Stephenson, Samy Bengio

Current technology for automatic speech recognition (ASR) uses hidden Markov models (HMMs) that recognize spoken speech using the acoustic signal. However, no use is made of the causes of the...

Recent Developments in Speaker Verification at IDIAP (2000)

Bojan Nedic, Hervé Bourlard

This report presents recent developments in speaker verification at IDIAP. The first part is mostly related to text-independent speaker verification with special emphasis on NIST'99...

INtegrating SPEech acoustic and linguistic Constraints: Baseline System Development (1999)

Bernardis, Giulia, Bourlard, Hervé, Rajman, Martin, Chappelier, Jean-Cédric

In this report, we discuss the initial issues addressed in a research project aiming at the development of an advanced natural speech recognition system for the automatic processing of telephone...

Multi-stream adaptive evidence combination for noise robust ASR (1999)

Morris, Andrew, Hagen, Astrid, Glotin, Hervé, Bourlard, Hervé

In this paper we develop different mathematical models in the framework of the multi-stream paradigm for noise robust ASR, and discuss their close relationship with human speech perception. Largely...

Different Weighting Schemes In The Full Combination Subbands Approach For Noise Robust ASR (1999)

Astrid Hagen, Andrew Morris, Hervé Bourlard

In this paper, we present and investigate a new method for subband-based Automatic Speech Recognition (ASR) which approximates the ideal `full combination' approach which is itself often not...

Non-stationary multi-channel (multi-stream) processing towards robust and adaptive ASR (1999)

Hervé Bourlard

In this paper, we discuss the rationale behind multichannel processing as applied to multi-stream automatic speech recognition (ASR). In this framework, we will develop different mathematical...

Improving Posterior Based Confidence Measures in Hybrid HMM/ANN Speech Recognition Systems (1998)

Bernardis, Giulia, Bourlard, Hervé

In this paper we define and investigate a set of confidence measures based on hybrid Hidden Markov Model/Artificial Neural Network (HMM/ANN) acoustic models. All these measures are using the neural...

Speaker Verification - A Quick Overview (1998)

Hervé Bourlard, Nelson Morgan

Signal Processing in Humans and Machines (publisher still to be defined), by Ben Gold and Nelson Morgan. IDIAP-RR 98-12. 1 Introduction: Speech contains many characteristics that are specific to each...

Hybrid HMM/ANN Systems for Speech Recognition: Overview and New Research Directions (1998)

Hervé Bourlard, Nelson Morgan

In this paper, we first give a brief overview of current state-of-the-art Automatic Speech Recognition (ASR), and then describe the use of ANNs as statistical estimators. We then review the basic...

Using Multiple Time Scales In A Multi-Stream Speech Recognition System (1997)

Stéphane Dupont, Hervé Bourlard

In this paper, we propose and investigate a new approach towards using multiple time scale information in automatic speech recognition (ASR) systems. In this framework, we are using a particular HMM...

Robust Speech Recognition Based on Multi-Stream Features (1997)

Stéphane Dupont, Hervé Bourlard, Christophe Ris

In this paper, we discuss a new automatic speech recognition (ASR) approach based on the independent processing and recombination of several feature streams. In this framework, it is assumed that the...

Hybrid HMM/ANN Systems for Training Independent Tasks: Experiments on Phonebook and Related Improvements (1997)

Stéphane Dupont, Hervé Bourlard, Olivier Deroo, Vincent Fontaine, Jean-Marc Boite

In this paper, we evaluate multi-Gaussian HMM systems and hybrid HMM/ANN systems in the framework of task independent training for small size (75 words) and medium size (600 words) vocabularies. To...

Subband-Based Speech Recognition (1997)

Hervé Bourlard, Stéphane Dupont

In the framework of Hidden Markov Models (HMM) or hybrid HMM/Artificial Neural Network (ANN) systems, we present a new approach towards automatic speech recognition (ASR). The general idea is to...

Multi-Stream Speech Recognition (1996)

Bourlard, Hervé, Dupont, Stéphane, Ris, Christophe

In this paper, we discuss a new automatic speech recognition (ASR) approach based on independent processing and recombination of several feature streams. In this framework, it is assumed that the...

Speaker-Dependent Speech Recognition Based on Phone-Like Units Models --- Application to Voice Dialing (1996)

Fontaine, Vincent, Bourlard, Hervé

This paper presents a speaker dependent speech recognition with application to voice dialing. This work has been developed under the constraints imposed by voice dialing applications, i.e., low...

REMAP - Experiments With Speech Recognition (1996)

Yochai Konig, Hervé Bourlard, Nelson Morgan

In this report we present experimental and theoretical results using a framework for training and modeling continuous speech recognition systems based on the theoretically optimal Maximum a...

Multi Stream Speech Recognition (1996)

Hervé Bourlard, Stéphane Dupont, Christophe Ris

In this paper, we discuss a new automatic speech recognition (ASR) approach based on independent processing and recombination of several feature streams. In this framework, it is assumed that the...

Stochastic Perceptual Speech Models with Durational Dependence (1996)

Jeff Bilmes, Nelson Morgan, Su-lin Wu, Hervé Bourlard

In [6], we develop a statistical model of speech recognition where emphasis is placed on the perceptually-relevant and information-rich portion of the speech signal. In that model, speech is viewed as...

Speaker-Dependent Speech Recognition Based on Phone-Like Units Models - Application to Voice Dialing (1996)

Vincent Fontaine, Hervé Bourlard

This paper presents a speaker dependent speech recognition with application to voice dialing. This work has been developed under the constraints imposed by voice dialing applications, i.e., low...

Transition-Based Statistical Training for ASR (1995)

Nelson Morgan, Hervé Bourlard, Yochai Konig, Su-lin Wu

Introduction: It is known that in human speech recognition, the perceptually-dominant and information-rich portions of the speech signal, which may also be the parts with a better chance to withstand...

Stochastic Perceptual Models Of Speech (1995)

Nelson Morgan, Hervé Bourlard, Steven Greenberg, Hynek Hermansky, Su-lin Wu

We have recently developed a statistical model of speech that avoids a number of current constraining assumptions for statistical speech recognition systems, particularly the model of speech as a...

REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities in Connectionist Speech Recognition (1995)

Hervé Bourlard, Yochai Konig, Nelson Morgan

In this paper, we briefly describe REMAP, an approach for the training and estimation of posterior probabilities, and report its application to speech recognition. REMAP is a recursive algorithm that...

Stochastic Perceptual Models of Speech (1995)

Nelson Morgan, Hervé Bourlard, Steven Greenberg, Hynek Hermansky, Su-lin Wu

We have recently developed a statistical model of speech that avoids a number of current constraining assumptions for statistical speech recognition systems, particularly the model of speech as a...

REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities - Application to Transition-Based Connectionist Speech Recognition (1995)

Yochai Konig, Hervé Bourlard, Nelson Morgan

In this paper, we introduce REMAP, an approach for the training and estimation of posterior probabilities using a recursive algorithm that is reminiscent of the EM-based Forward-Backward (Liporace...

Digit Recognition With Stochastic Perceptual Speech Models (1995)

Nelson Morgan, Su-lin Wu, Hervé Bourlard

We have recently developed a statistical model of speech that focuses statistical modeling power on phonetic transitions. These are the perceptually-dominant and information-rich portions of the...

Connectionist Probability Estimators in HMM Speech Recognition (1994)

Steve Renals, Nelson Morgan, Hervé Bourlard, Michael Cohen, Horacio Franco

Abstract—We are concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical interpretation of connectionist...

Connectionist Probability Estimators in HMM Speech Recognition (1994)

Steve Renals, Nelson Morgan, Hervé Bourlard, Michael Cohen, Horacio Franco

We are concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical interpretation of connectionist networks as...

Factoring Networks By A Statistical Method (1992)

Nelson Morgan, Hervé Bourlard

Introduction: Both on theoretical and practical grounds, it is generally preferable to reduce the number of parameters for a trainable classifier system. In particular, it would be desirable to factor...

A New Training Algorithm For Hybrid HMM/ANN Speech Recognition Systems

Hervé Bourlard, Yochai Konig, Nelson Morgan, Christophe Ris

In this paper, we briefly describe REMAP, an approach for the training and estimation of posterior probabilities, and report its application to speech recognition. REMAP is a recursive algorithm that...