INtegrating SPEech acoustic and linguistic Constraints: Baseline System Development (2006)
Bernardis, Giulia, Bourlard, Hervé, Chappelier, Jean-Cédric, Rajman, Martin
Using Pitch as Prior Knowledge in Template-Based Speech Recognition (2006)
Aradilla, Guillermo, Vepa, Jithendra, Bourlard, Hervé
In a previous paper on speech recognition, we showed that templates can better capture the dynamics of the speech signal compared to parametric models such as hidden Markov models. The key point in...
On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR (2006)
Tyagi, Vivek, Bourlard, Hervé, Wellekens, Christian
It is often acknowledged that speech signals contain short-term and long-term temporal properties that are difficult to capture and model by using the usual fixed scale (typically 20ms) short time...
Using Posterior-Based Features in Template Matching for Speech Recognition (2006)
Aradilla, Guillermo, Vepa, Jithendra, Bourlard, Hervé
Given the availability of large speech corpora, as well as the increase in memory and computational resources, the use of template matching approaches for automatic speech recognition (ASR) has...
State-of-the-Art and Recent Progress in Hybrid HMM/ANN Speech Recognition (2006)
Bourlard, Hervé, Gerstner, W., Germond, A., Hasler, M., Nicoud, J.-D.
Improving Posterior Based Confidence Measures in Hybrid HMM/ANN Speech Recognition Systems (2006)
Bernardis, Giulia, Bourlard, Hervé
In this paper we define and investigate a set of confidence measures based on hybrid Hidden Markov Model/Artificial Neural Network (HMM/ANN) acoustic models. All these measures use the neural...
Stephenson, Todd A., Bourlard, Hervé, Bengio, Samy, Morris, Andrew C.
Current technology for automatic speech recognition (ASR) uses hidden Markov models (HMMs) that recognize spoken speech using the acoustic signal. However, no use is made of the causes of the...
A neural network for classification with incomplete data: application to robust ASR (2006)
Morris, Andrew, Josifovski, Ljubomir, Bourlard, Hervé, Cooke, Martin, Green, Phil
If the data vector for input to an automatic classifier is incomplete, the optimal estimate for each class probability must be calculated as the expected value of the classifier output. We identify a...
Multi-stream adaptive evidence combination for noise robust ASR (2006)
Morris, Andrew, Hagen, Astrid, Glotin, Hervé, Bourlard, Hervé
In this paper we develop different mathematical models in the framework of the multi-stream paradigm for noise robust ASR, and discuss their close relationship with human speech perception. Largely...
Text Enhancement with Asymmetric Filter for Video OCR (2006)
Chen, Datong, Shearer, Kim, Bourlard, Hervé
Stripes are common sub-structures of text characters, and the scale of these stripes varies little within a word. This scale consistency thus provides us with a useful feature for text detection and...
From missing data to maybe useful data: soft data modelling for noise robust ASR (2006)
Morris, Andrew, Barker, Jon, Bourlard, Hervé
Much research has been focused on the problem of achieving automatic speech recognition (ASR) which approaches human recognition performance in its level of robustness to noise and channel...
Modeling Auxiliary Information in Bayesian Network Based ASR (2006)
Stephenson, Todd A., Magimai-Doss, Mathew, Bourlard, Hervé
Automatic speech recognition bases its models on the acoustic features derived from the speech signal. Some have investigated replacing or supplementing these features with information that cannot...
Video OCR for Sport Video Annotation and Retrieval (2006)
Chen, Datong, Shearer, Kim, Bourlard, Hervé
This paper presents a video OCR system that automatically extracts closed captions from video frames as keywords (or, as we call them, "cues") for building annotations of sport videos. In this system,...
MAP Combination of Multi-Stream HMM or HMM/ANN Experts (2006)
Morris, Andrew, Hagen, Astrid, Bourlard, Hervé
Automatic speech recognition (ASR) performance falls dramatically with the level of mismatch between training and test data. The human ability to recognise speech when a large proportion of...
Text Identification in Complex Background using SVM (2006)
Chen, Datong, Bourlard, Hervé, Thiran, Jean-Philippe
This paper presents a fast and robust algorithm to identify text in image or video frames with complex backgrounds and compression effects. The algorithm first extracts the candidate text line on the...
Mixed Bayesian Networks with Auxiliary Variables for Automatic Speech Recognition (2006)
Stephenson, Todd A., Magimai-Doss, Mathew, Bourlard, Hervé
Standard hidden Markov models (HMMs), as used in automatic speech recognition (ASR), calculate their emission probabilities by an artificial neural network (ANN) or a Gaussian distribution...
Stephenson, Todd A., Escofet, Jaume, Magimai-Doss, Mathew, Bourlard, Hervé
Pitch and energy are two fundamental features describing speech, having importance in human speech recognition. However, when incorporated as features in automatic speech recognition (ASR), they...
Auxiliary Variables in Conditional Gaussian Mixtures for Automatic Speech Recognition (2006)
Stephenson, Todd A., Magimai-Doss, Mathew, Bourlard, Hervé
In previous work, we presented a case study using an estimated pitch value as the conditioning variable in conditional Gaussians that showed the utility of hiding the pitch values in certain...
Low cost duration modelling for noise robust speech recognition (2006)
Morris, Andrew, Payne, Simon, Bourlard, Hervé
State transition matrices as used in standard HMM decoders have two widely perceived limitations. One is that the implicit Geometric state duration distributions which they model do not accurately...
Text Segmentation and Recognition in Complex Background Based on Markov Random Field (2006)
Chen, Datong, Odobez, Jean-Marc, Bourlard, Hervé
In this paper we propose a method to segment and recognize text embedded in video and images. We model the gray level distribution in the text images as a mixture of Gaussians, and then assign each...
User-Customized Password HMM Based Speaker Verification (2006)
BenZeghiba, Mohamed F., Bourlard, Hervé
... is presented. The system has no a priori knowledge of passwords. A hybrid HMM/ANN system is used to infer the phonetic transcription of the password. The emission probabilities are then modeled by...
User-Customized Password Speaker Verification based on HMM/ANN and GMM Models (2006)
BenZeghiba, Mohamed F., Bourlard, Hervé
In this paper, we present a new approach towards user-customized password speaker verification combining the advantages of hybrid HMM/ANN systems, using Artificial Neural Networks (ANN) to estimate...
Hybrid HMM/ANN and GMM Combination for User-Customized Password Speaker Verification (2006)
BenZeghiba, Mohamed F., Bourlard, Hervé
Recently we have proposed an approach for user-customized password speaker verification; in this approach, we combined a hybrid HMM/ANN model (used for utterance verification) and a GMM model (used...
On automatic annotation of meeting databases (2006)
Gatica-Perez, Daniel, McCowan, Iain, Barnard, Mark, Bengio, Samy, Bourlard, Hervé
In this paper, we discuss meetings as an application domain for multimedia content analysis. Meeting databases are a rich data source suitable for a variety of audio, visual and multi-modal tasks,...
On the Combination of Speech and Speaker Recognition (2006)
BenZeghiba, Mohamed F., Bourlard, Hervé
This paper investigates an approach that maximizes the joint posterior probability of the pronounced word and the speaker identity given the observed data. This probability can be expressed as a...
Using pitch frequency information in speech recognition (2006)
Magimai-Doss, Mathew, Stephenson, Todd A., Bourlard, Hervé
Automatic Speech Recognition systems typically use smoothed spectral features as acoustic observations. In recent studies, it has been shown that complementing these standard features with pitch...
Phoneme-Grapheme Based Speech Recognition System (2006)
Magimai-Doss, Mathew, Stephenson, Todd A., Bourlard, Hervé, Bengio, Samy
State-of-the-art Automatic Speech Recognition (ASR) systems typically use phonemes as the subword units. In this paper, we investigate a system where the word models are defined in terms of two...
Pujol, Pere, Pol, Susagna, Nadeu, Climent, Hagen, Astrid, Bourlard, Hervé
Recently, the advantages of the spectral parameters obtained by frequency filtering (FF) of the logarithmic filter-bank energies (logFBEs) have been reported. These parameters, which are frequency...
New entropy based combination rules in HMM/ANN multi-stream ASR (2006)
Misra, Hemant, Bourlard, Hervé, Tyagi, Vivek
Classifier performance is often enhanced through combining multiple streams of information. In the context of multi-stream HMM/ANN systems in ASR, a confidence measure widely used in classifier...
Text Detection and Recognition in Images and Videos (2006)
Chen, Datong, Odobez, Jean-Marc, Bourlard, Hervé
Text embedded in images and videos represents a rich source of information for content-based indexing and retrieval applications. In this paper, we present a new method for localizing and recognizing...
Joint Decoding for Phoneme-Grapheme Continuous Speech Recognition (2006)
Magimai-Doss, Mathew, Bengio, Samy, Bourlard, Hervé
Standard ASR systems typically use phonemes as the subword units. Preliminary studies have shown that the performance of the ASR system could be improved by using graphemes as additional subword units....
Modelling Auxiliary Features in Tandem Systems (2006)
Magimai-Doss, Mathew, Stephenson, Todd A., Ikbal, Shajith, Bourlard, Hervé
Tandem systems transform the cepstral features into posterior probabilities of subword units using artificial neural networks (ANNs), which are processed to form input features for conventional...
Confidence Measures in Multiple Pronunciations Modeling for Speaker Verification (2006)
BenZeghiba, Mohamed F., Bourlard, Hervé
This paper investigates the use of multiple pronunciations modeling for User-Customized Password Speaker Verification (UCP-SV). The main characteristic of the UCP-SV is that the system does not have...
Spectral Entropy Based Feature for Robust ASR (2006)
Misra, Hemant, Ikbal, Shajith, Bourlard, Hervé, Hermansky, Hynek
In general, entropy gives us a measure of the number of bits required to represent some information. When applied to a probability mass function (PMF), entropy can also be used to measure the...
Posteriori Probabilities and Likelihoods Combination for Speech and Speaker Recognition (2006)
BenZeghiba, Mohamed F., Bourlard, Hervé
This paper investigates a new approach to perform simultaneous speech and speaker recognition. The likelihood estimated by a speaker identification system is combined with the posterior probability...
Hierarchical Multi-Stream Posterior Based Speech Recognition System (2006)
Ketabdar, Hamed, Bourlard, Hervé, Bengio, Samy
In this paper, we present initial results towards boosting posterior based speech recognition systems by estimating more informative posteriors using multiple streams of features and taking into...
Developing and Enhancing Posterior Based Speech Recognition Systems (2006)
Ketabdar, Hamed, Vepa, Jithendra, Bengio, Samy, Bourlard, Hervé
Local state or phone posterior probabilities are often investigated as local scores (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g., "Tandem") to improve speech recogni...
Improving Speech Recognition Using a Data-Driven Approach (2006)
Aradilla, Guillermo, Vepa, Jithendra, Bourlard, Hervé
In this paper, we investigate the possibility of enhancing state-of-the-art HMM-based speech recognition systems using data-driven techniques, where the whole set of training utterances is used as...
Multi-resolution Spectral Entropy Based Feature for Robust ASR (2006)
Misra, Hemant, Ikbal, Shajith, Sivadas, Sunil, Bourlard, Hervé
Recently, entropy measures at different stages of recognition have been used in automatic speech recognition (ASR) tasks. In a recent paper, we proposed that formant positions of a spectrum can be...
Spectral Entropy Feature in Full-Combination Multi-Stream for Robust ASR (2006)
Misra, Hemant, Bourlard, Hervé
In a recent paper, we reported promising automatic speech recognition results obtained by appending spectral entropy features to PLP features. In the present paper, spectral entropy features are used...
Multi-Stream Speech Recognition (2006)
Bourlard, Hervé, Dupont, Stéphane, Ris, Christophe
In this paper, we discuss a new automatic speech recognition (ASR) approach based on independent processing and recombination of several feature streams. In this framework, it is assumed that the...
Fontaine, Vincent, Bourlard, Hervé
This paper presents a speaker-dependent speech recognition system with application to voice dialing. This work has been developed under the constraints imposed by voice dialing applications, i.e., low...
INtegrating SPEech acoustic and linguistic Constraints: Baseline System Development (2006)
Bernardis, Giulia, Bourlard, Hervé, Rajman, Martin, Chappelier, Jean-Cédric
In this report, we discuss the initial issues addressed in a research project aiming at the development of an advanced natural speech recognition system for the automatic processing of telephone...
Automatic Speech Recognition using Pitch Information in Dynamic Bayesian Networks (2006)
Stephenson, Todd A., Magimai-Doss, Mathew, Bourlard, Hervé
The challenge of automatic speech recognition (ASR) increases when speaker variability is encountered. Being able to automatically use different acoustic models according to speaker type might help...
Modeling Auxiliary Information in Bayesian Network Based ASR (2006)
Stephenson, Todd A., Magimai-Doss, Mathew, Bourlard, Hervé
Automatic speech recognition bases its models on the acoustic features derived from the speech signal. Some have investigated replacing or supplementing these features with information that cannot...
Text Identification in Complex Background using SVM (2006)
Chen, Datong, Bourlard, Hervé, Thiran, Jean-Philippe
This paper presents a fast and robust algorithm to identify text in image or video frames with complex backgrounds and compression effects. The algorithm first extracts the candidate text line on the...
Pronunciation models and their evaluation using confidence measures (2006)
Magimai-Doss, Mathew, Bourlard, Hervé
In this report, we present preliminary experiments towards automatic inference and evaluation of pronunciation models based on multiple utterances of each lexicon word and their given baseline...
User Customized HMM/ANN Based Speaker Verification (2006)
BenZeghiba, Mohamed F., Bourlard, Hervé
In this paper, we describe a new speaker verification approach, using a hybrid HMM/ANN system, and accommodating user customized passwords. This system exploits the high phonetic recognition...
Low cost duration modelling for noise robust speech recognition (2006)
Morris, Andrew C., Payne, Simon, Bourlard, Hervé
State transition matrices as used in standard HMM decoders have two widely perceived limitations. One is that the implicit Geometric state duration distributions which they model do not accurately...
Entropy-Based Multi-Stream Combination (2006)
Misra, Hemant, Bourlard, Hervé, Tyagi, Vivek
The full-combination multi-band approach has been proposed in the literature and performs well for band-limited noise. But the approach fails to deliver in the case of wide-band noise. To overcome this,...
Modelling auxiliary information (pitch frequency) in hybrid HMM/ANN based ASR systems (2006)
Magimai-Doss, Mathew, Stephenson, Todd A., Bourlard, Hervé
Automatic Speech Recognition systems typically use smoothed spectral features as acoustic observations. In recent studies, it has been shown that complementing these standard features with auxiliary...
Information Retrieval on Noisy Text (2006)
Grangier, David, Vinciarelli, Alessandro, Bourlard, Hervé
Spoken Document Retrieval (SDR) consists in retrieving segments of a speech database that are relevant to a query. The state-of-the-art approach to the SDR problem consists in transcribing the speech...