TC-Star: Cross-Language Voice Conversion Revisited (2008)
David Sündermann, Harald Höge, Antonio Bonafonte, Hermann Ney, Julia Hirschberg
In the framework of the European speech-to-speech translation project TC-Star, one of the research tasks is cross-language voice conversion. In the recent second evaluation campaign, five...
Residual Prediction (2008)
David Sündermann, Harald Höge, Antonio Bonafonte, Helenca Duxans
Residual prediction is a technique that aims at recovering the spectral details of speech that was encoded using parameterizations as linear predictive coefficients. Example applications of residual...
GAIA: Common Framework for the Development of Speech Translation Technologies (2008)
Javier Pérez, Antonio Bonafonte
We present here an open-source software platform for the integration of speech translation components. This tool is useful to integrate into a common framework different automatic speech recognition,...
Learning from Errors in Grapheme-to-Phoneme Conversion, INTERSPEECH 2006 (2008)
Tatyana Polyakova, Antonio Bonafonte
In speech technology it is very important to have a system capable of accurately performing grapheme-to-phoneme (G2P) conversion, which is not an easy task especially if talking about languages like...
Filled pauses in speech synthesis: towards conversational speech (2008)
Jordi Adell, Antonio Bonafonte, David Escudero
Speech synthesis techniques have already reached a high level of naturalness. However, they are often evaluated on text reading tasks. New applications will request for conversational...
Speech Emotion Recognition Using Hidden Markov Models, Eurospeech 2001 - Scandinavia (2008)
Albino Nogueiras, Asunción Moreno, Antonio Bonafonte, José B. Mariño
This paper introduces a first approach to emotion recognition using RAMSES, the UPC’s speech recognition system. The approach is based on standard speech recognition technology using hidden...
Pablo Daniel Agüero, Jordi Adell, Antonio Bonafonte
Improving TTS quality using pitch contour information of source speaker in
Jordi Adell, Antonio Bonafonte, David Escudero
Although speech technologies are steadily improving their performance, it is necessary to understand the mechanisms used in speech to convey, in addition to the...
Pablo Daniel Agüero, Antonio Bonafonte
This paper presents a comparative study of several pause prediction methods, using the same labeled corpus. Some methods proposed by other publications on the subject...
Consistent Estimation of Fujisaki’s Intonation Model Parameters (2008)
Pablo Daniel Agüero, Antonio Bonafonte
This paper presents a novel method to estimate an intonation model based on the representation of the fundamental frequency contour proposed by Fujisaki. Unlike other methods, this approach does not...
The development of a Speech-to-Speech Machine Translation (2008)
David Conejero, Jesús Giménez, Victoria Arranz, Antonio Bonafonte, Neus Pascual, Núria Castell, ...
Creation of lexica and corpora for Catalan, Spanish and US-English is described. A lexicon is being created for speech recognition and synthesis including relevant information. The lexicon contains...
INTERFACE DATABASES: DESIGN AND COLLECTION OF A MULTILINGUAL EMOTIONAL SPEECH DATABASE (2008)
Vladimir Hozjan, Zdravko Kacic, Asuncion Moreno, Antonio Bonafonte
As a part of the IST project Interface ("Multimodal Analysis/Synthesis System for Human Interaction to Virtual and Augmented environments"), an emotional speech database for...
Statistical analysis of filled pauses’ rhythm for disfluent speech synthesis (2008)
Jordi Adell, Antonio Bonafonte, David Escudero
Given that state of the art speech synthesis systems have already reached a high naturalness level, it is time to move to talking speech from the actual read speech framework. For this purpose it is...
Flexible Harmonic/Stochastic Speech Synthesis (2008)
Daniel Erro, Asunción Moreno, Antonio Bonafonte
In this paper, our flexible harmonic/stochastic waveform generator for a speech synthesis system is presented. The speech is modeled as the superposition of two components: a harmonic component and a...
The UPC TTS System Description for the 2007 Blizzard Challenge (2008)
Antonio Bonafonte, Jordi Adell, Pablo D. Agüero, Daniel Erro, Ignasi Esquerra, Asunción Moreno, ...
This paper presents the evaluation of Ogmios, the UPC TTS system carried out within the Blizzard Challenge Initiative, 2007. Ogmios is a unit-selection based system. Prosodic models are used to...
Automatic Voice-Source Parameterization of Natural Speech (2008)
Javier Pérez, Antonio Bonafonte
We present here our work in automatic parameterization of natural speech by means of a pitch synchronous source-filter decomposition algorithm. The derivative glottal source is modelled using the...
Tatyana Polyakova, Antonio Bonafonte
Letter-to-phoneme conversion for English is being developed for future integration into a speech synthesis system within the TC-STAR project. In this work...
Helenca Duxans, Daniel Erro, Javier Pérez, Ferran Diego, Antonio Bonafonte, Asunción Moreno
Voice conversion (VC) technology allows to transform the voice of the source speaker so that it is perceived as the voice of a target speaker. One of the applications of VC is speech-to-speech...
Spanish Synthesis Corpora (2008)
Martí Umbert, Asunción Moreno, Pablo Agüero, Antonio Bonafonte
This paper deals with the design of a synthesis database for a high quality corpus-based Speech Synthesis system in Spanish. The database has been designed for speech synthesis, speech conversion and...
Antonio Bonafonte, Pablo D. Agüero, Jordi Adell, Javier Pérez, Asunción Moreno
This paper presents the baseline text-to-speech system developed at UPC (Ogmios) plus our recent work on speech prosody generation and the procedures to create high quality language resources for...
Aneto: A Tool for Prosody Analysis of Speech (2007)
Albert Febrer, Antonio Bonafonte, Ignasi Esquerra
The developed tool provides utilities for prosody analysis and labeling of voice signals. It works under Windows 95 and Windows NT environments and uses the Microsoft Win32 application programming...
Acceptance testing of a spoken language translation system (2006)
Rafael Banchs, Antonio Bonafonte, Javier Pérez
This paper describes an acceptance test procedure for evaluating a spoken language translation system between Catalan and Spanish. The procedure consists of two independent tests. The first test was...
Text-independent voice conversion based on unit selection (2006)
David Sündermann, Harald Höge, Antonio Bonafonte, Hermann Ney, Alan Black, Shri Narayanan
So far, most of the voice conversion training procedures are text-dependent, i.e., they are based on parallel training utterances of source and target speaker. Since several applications (e.g....
Prosody generation for speech-to-speech translation (2006)
Pablo Daniel Agüero, Jordi Adell, Antonio Bonafonte
This paper deals with speech synthesis in the framework of speech-to-speech translation. Our current focus is to translate speeches or conversations between humans so that a third person can listen...
ECESS inter-module interface specification for speech synthesis (2006)
Javier Pérez, Antonio Bonafonte, Horst-Udo Hain, Eric Keller, Stefan Breuer, Jilei Tian
The newly founded European Centre of Excellence for Speech Synthesis (ECESS) (ECESS, 2004) is an initiative to promote the development of the European research area (ERA) in the field of Language...
Residual Prediction Based on Unit Selection (2005)
David Sündermann, Harald Höge, Antonio Bonafonte, Hermann Ney, Alan W Black
Recently, we presented a study on residual prediction techniques that can be applied to voice conversion based on linear transformation or hidden Markov model-based speech synthesis. Our voice...
Comparative study of automatic phone segmentation methods for TTS (2005)
Jordi Adell, Antonio Bonafonte, Jon Ander Gómez, María José Castro
In the present paper we present two novel approaches to phonetic speech segmentation. One based on acoustical clustering plus dynamic time warping and a second one based on a boundary...
A Study on Residual Prediction Techniques for Voice Conversion (2005)
David Sündermann, Antonio Bonafonte
Several well-studied voice conversion techniques use line spectral frequencies as features to represent the spectral envelopes of the processed speech frames. In order to return to the time domain,...
Matej Rojc, Pablo Daniel Agüero, Antonio Bonafonte, Zdravko Kacic
This paper focuses on the estimation of the Tilt intonation model [1]. Usually, Tilt events are detected using a first estimation which is improved using gradient descent techniques. To speed up the...
A First Step Towards Text-Independent Voice Conversion (2004)
David Sündermann, Antonio Bonafonte
So far, all conventional voice conversion approaches are text-dependent, i.e., they need equivalent training utterances of source and target speaker. Since several recently proposed applications call...
Intonation modeling for TTS using a joint extraction and prediction approach (2004)
Pablo Daniel Agüero, Antonio Bonafonte
This paper presents a joint extraction and prediction framework for intonation modeling. The intonation model is based on a superpositional approach using Bézier curves. The components are attached...
Towards Phone Segmentation For Concatenative Speech Synthesis (2004)
Jordi Adell, Antonio Bonafonte
We present a new approach to solve the problem of phone segmentation when preparing databases for concatenative Text-to-Speech synthesis. First, we describe the problem and review the state of the...
Automatic analysis and synthesis of Fujisaki’s intonation model for TTS, Speech Prosody (2004)
Pablo Daniel Agüero, Klaus Wimmer, Antonio Bonafonte
This paper deals with the automatic analysis and synthesis of intonation using Fujisaki’s model. Both the accent commands and the phrase commands are related to the accent group. We propose an...
Joint extraction and prediction of Fujisaki’s intonation model parameters (2004)
Pablo Daniel Agüero, Klaus Wimmer, Antonio Bonafonte
This paper presents a joint extraction and prediction framework for intonation modeling applied to Fujisaki’s intonation model for text-to-speech conversion. Previous methods in the area extract...
Including dynamic and phonetic information in voice conversion systems (2004)
Helenca Duxans, Antonio Bonafonte, Alexander Kain, Jan van Santen
Voice Conversion (VC) systems modify a speaker voice (source speaker) to be perceived as if another speaker (target speaker) had uttered it. Previous published VC approaches using Gaussian Mixture...
Toward phone segmentation for concatenative speech synthesis (2004)
Jordi Adell, Antonio Bonafonte
We present a new approach to solve the problem of phone segmentation when preparing databases for concatenative Text-to-Speech synthesis. First, we describe the problem and review the state of the...
HMM recognition of expressions in unrestrained video intervals (2003)
José Luis L, Montse Pardàs, Antonio Bonafonte
This paper discusses the application of a facial expression recognition system in unrestrained video intervals. The system is based on the modeling of the expressions by means of Hidden Markov...
Facial animation parameters extraction and expression detection using HMM (2002)
Montse Pardàs, Antonio Bonafonte
The video analysis system described in this paper aims at facial expression recognition consistent with the MPEG4 standardized parameters for facial animation, FAP. For this reason, two levels of...
Emotion recognition based on MPEG-4 facial animation parameters (2002)
Montse Pardàs, Antonio Bonafonte, José Luis L
In this paper a facial expression recognition system is presented. The system is based on the modelling of the expressions by means of Hidden Markov Models. The observations used to create the models...
SpeechDat-Car: Towards a collection of speech databases for automotive environments (1999)
Antonio Bonafonte, Jerome Boudy, Sandra Dufour, Philip Lockwood
The SpeechDat-Car project is a 4th Framework EC project in the Language Engineering programme. It aims at collecting a set of nine speech databases to support training and testing of robust...
Speechdat-Car: Towards A Collection Of Speech Databases For Automotive Environments (1999)
Antonio Bonafonte, Jerome Boudy, Sandra Dufour, Philip Lockwood, ...
The SpeechDat-Car project is a 4th Framework EC project in the Language Engineering programme. It aims at collecting a set of nine speech databases to support training and testing of robust...
Modeling Phone Duration: Application To Catalan TTS (1998)
Albert Febrer, Jaume Padrell, Antonio Bonafonte
There are many exhaustive works that deal with the use of models for segmental duration. The aim of this paper is to evaluate some of the properties mentioned in literature and evaluate factorial and...
Using X-Gram For Efficient Speech Recognition (1998)
Antonio Bonafonte, José B. Mariño
X-grams are a generalization of the n-grams, where the number of previous conditioning words is different for each case and decided from the training data. X-grams reduce perplexity with respect to...
The UPC text-to-speech system for Spanish and Catalan (1998)
Antonio Bonafonte, Ignasi Esquerra, Albert Febrer, Francesc Vallverdú
This paper summarizes the text-to-speech system that has been developed in the Speech Group of the Universitat Politècnica de Catalunya (UPC). The system is composed of a core and different...
Sethos: the UPC speech understanding system (1996)
Antonio Bonafonte, José B. Mariño, Albino Nogueiras
In EuroSpeech’95, we presented the first version of Sethos, the speech understanding system which has been developed at the UPC. In this paper some improvements are incorporated at...
Explicit segmentation of speech using Gaussian models (1996)
Antonio Bonafonte, Albino Nogueiras, Antonio Rodriguez-Garrido
In this paper we investigate an automatic method to segment labeled speech. The method needs an initial estimation of the segmentation which is provided by an alignment based on HMM. Afterwards, the...
Language Modeling Using X-Grams (1996)
Antonio Bonafonte, José B. Mariño
In this paper, an extension of n-grams is proposed. In this extension, the memory of the model (n) is not fixed a priori. Instead, first, large memories are accepted and afterwards, merging criteria...
Sethos: The UPC Speech Understanding System (1996)
Antonio Bonafonte, José B. Mariño, Albino Nogueiras
In EuroSpeech'95, we presented the first version of Sethos, the speech understanding system which has been developed at the UPC. In this paper some improvements are incorporated at different...
The Demiphone: An Efficient Subword Unit For Continuous Speech Recognition
José B. Mariño, Albino Nogueiras, Antonio Bonafonte
In this paper we introduce the demiphone as a contextual phonetic unit for continuous speech recognition. A phone is divided into two parts: a left demiphone that accounts for the left side...