Efficient Handling of N-gram Language Models for Statistical Machine Translation (2009)
Marcello Federico, Mauro Cettolo
Statistical machine translation, as well as other areas of human language processing, has recently pushed toward the use of large-scale n-gram language models. This paper presents efficient...
A Web-based Demonstrator of a Multi-lingual Phrase-based Translation System (2009)
Roldano Cattoni, Nicola Bertoldi, Mauro Cettolo, Boxing Chen, Marcello Federico
This paper describes a multi-lingual phrase-based Statistical Machine Translation system accessible by means of a Web page. The user can issue translation requests from Arabic, Chinese or Spanish...
Exploiting Word Transformation in Statistical Machine Translation from Spanish to English (2008)
Deepa Gupta, Marcello Federico
This paper investigates the use of morphosyntactic information to reduce data sparseness in statistical machine translation from Spanish to English. In particular, word-alignment training is performed...
phrase-based Statistical Machine Translation system (2008)
Roldano Cattoni, Nicola Bertoldi, Mauro Cettolo, Boxing Chen, Marcello Federico
In this demonstration we present our multi-lingual phrase-based Statistical Machine Translation system...
A Web-based Demonstrator of a Multi-lingual Phrase-based Translation System (2008)
Roldano Cattoni, Nicola Bertoldi, Mauro Cettolo, Boxing Chen, Marcello Federico
This paper describes a multi-lingual phrase-based Statistical Machine Translation system accessible by means of a Web page. The user can issue translation requests from Arabic, Chinese or Spanish...
Jerry Goldman, Steve Renals, Steven Bird, Franciska de Jong, Marcello Federico, Carl Fleischhauer, ...
Spoken word audio collections cover many domains, including radio and television broadcasts, oral narratives, governmental...
Robust and Reliable Speech Understanding in Restricted Domains (2007)
Giuliano Antoniol, Mauro Cettolo, Marcello Federico
This paper describes the components of an Automatic Speech Understanding (ASU) system developed at IRST within the framework of the MAIA
An Optimum Classifier Approximation for Network-Based Handwritten Character Recognition (2007)
Marcello Federico, Stefano Messelodi, Luigi Stringa
An approximation of the Bayes decision rule and its implementation on a two-layered network are described. The net is trained in two phases: first, probabilities of the discrete-valued input features...
Marcello Federico, Nicola Bertoldi, Vanessa Sandrini
This paper presents the development of a Named Entity (NE) recognition system for the Italian broadcast news domain. A statistical model is introduced based on a trigram language model defined on...
TECHNIQUES FOR APPROXIMATING A TRIGRAM LANGUAGE MODEL (2007)
Fabio Brugnara, Marcello Federico
In this paper several methods are proposed for reducing the size of a trigram language model (LM), which is often the biggest data structure in a continuous speech recognizer, without affecting...
USABILITY FIELD-TEST OF A SPOKEN DATA-ENTRY SYSTEM (2007)
Marcello Federico, Fabio Brugnara, Roberto Gretter
This paper reports on the field-test of a speech based data-entry system developed as a follow-up of an EC funded project. The application domain is the data-entry of personnel absence records from a...
Robust Analysis Of Spoken Input Combining Statistical And (2007)
Roldano Cattoni, Marcello Federico, Alon Lavie
The work presented in this paper concerns the analysis of automatic transcription of spoken input into an interlingua formalism for a speech-to-speech machine translation system. This process is...
Moses: Open source toolkit for statistical machine translation (2007)
Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Richard Zens, Alexandra Constantin, ...
We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c)...
Improving statistical word alignments with morpho-syntactic transformations (2006)
Adrià De Gispert, Deepa Gupta, Maja Popović, Patrik Lambert, Jose B. Mariño, Marcello Federico, ...
This paper presents a wide range of statistical word alignment experiments incorporating morphosyntactic information. By means of parallel corpus transformations according to information of...
Improving phrase-based statistical translation through combination of word alignment (2006)
Boxing Chen, Marcello Federico
This paper investigates the combination of word-alignments computed with the competitive linking algorithm and well-established IBM models. New training methods for phrase-based statistical...
Maja Popović, Hermann Ney, Adrià De Gispert, José B. Mariño, Deepa Gupta, Marcello Federico, ...
Evaluation of machine translation output is an important but difficult task. Over the last years, a variety of automatic evaluation measures have been studied, some of them like Word Error Rate...
How Many Bits Are Needed To Store Probabilities for Phrase-Based Translation? (2006)
Marcello Federico, Nicola Bertoldi
State of the art in statistical machine translation is currently represented by phrase-based models, which typically incorporate a large number of probabilities of phrase-pairs and word n-grams. In...
Accessing the spoken word (2005)
Jerry Goldman, Steve Renals, Steven Bird, Franciska de Jong, Marcello Federico, Carl Fleischhauer, ...
Spoken word audio collections cover many domains, including radio and television broadcasts, oral narratives, governmental proceedings, lectures, and telephone conversations. The collection, access...
Transforming Access to the Spoken Word (2005)
Jerry Goldman, Steve Renals, Steven Bird, Franciska de Jong, Marcello Federico, Carl Fleischhauer, ...
Spoken word audio collections cover many domains, including radio and television broadcasts, oral narratives, governmental proceedings, lectures, and telephone conversations. The collection, access and...
Accessing the spoken word (2005)
Jerry Goldman, Steve Renals, Steven Bird, Franciska de Jong, Marcello Federico, Carl Fleischhauer, ...
Spoken-word audio collections cover many domains, including radio and television broadcasts, oral narratives, governmental proceedings, lectures, and telephone conversations. The collection, access,...
Accessing the Spoken Word (2005)
Jerry Goldman, Steve Renals, Steven Bird, Franciska de Jong, Mark Kornbluh, ...
Spoken word audio collections cover many domains, including radio and television broadcasts, oral narratives, governmental proceedings, lectures, and telephone conversations. The collection, access...
The CLEF 2003 Cross-Language Spoken Document Retrieval Track (2004)
Marcello Federico, Gareth Jones
The current expansion in collections of natural language based digital documents in various media and languages is creating challenging opportunities for automatically accessing the information...
Evaluation Frameworks for Speech Translation Technologies (2003)
This paper reports on activities carried out under the European project PF-STAR and within the CSTAR consortium, which aim at evaluating speech translation technologies. In PF-STAR, speech...
Language model adaptation through topic decomposition and mdi estimation (2002)
This work presents a language model adaptation method combining the latent semantic analysis framework with the minimum discrimination information estimation criterion. In particular, an unsupervised...
ITC-irst at CLEF 2001: Monolingual and bilingual tracks (2002)
Nicola Bertoldi, Marcello Federico
This paper reports on the participation of ITC-irst in the Cross Language Evaluation Forum (CLEF) of 2001. ITC-irst took part in two tracks: the monolingual retrieval task, and the...
Cross-task portability of a broadcast news speech recognition system. Speech Communication (2002)
N. Bertoldi, F. Brugnara, M. Cettolo, M. Federico, D. Giuliani
This paper reports on experiments of porting the ITC-irst Italian broadcast news recognition system to two spontaneous dialogue domains. Porting was investigated by applying state-of-the-art...
ITC-irst at CLEF 2001: Monolingual and bilingual tracks (2002)
Nicola Bertoldi, Marcello Federico
This paper reports on the participation of ITC-irst in the Cross Language Evaluation Forum (CLEF) of 2001. ITC-irst took part in two tracks: the monolingual retrieval task, and the bilingual...
Broadcast News LM Adaptation using Contemporary Texts (2001)
Marcello Federico, Nicola Bertoldi
This paper investigates the problem of dynamically updating the language model (LM) of a broadcast news speech recognition system, in order to cope with language and topic changes, typical of the...
ITC-irst at CLEF 2000: Italian monolingual track (2001)
Nicola Bertoldi, Marcello Federico
This paper presents work on document retrieval for Italian carried out at ITC-irst. Two different approaches to information retrieval were investigated, one based on the Okapi weighting...
Unsupervised Language and Acoustic Model Adaptation for Cross Domain Portability (2001)
Diego Giuliani, Marcello Federico
This work investigates the task of porting a broadcast news recognition system to a conversational speech domain, for which only untranscribed acoustic data are available. An iterative adaptation...
Robust Analysis Of Spoken Input Combining Statistical And (2001)
Roldano Cattoni, Marcello Federico
The work presented in this paper concerns the analysis of automatic transcription of spoken input into an interlingua formalism for a speech-to-speech machine translation system. This process is...
Development and Evaluation of an Italian Broadcast News Corpus (2000)
Marcello Federico, Dimitri Giordani, Paolo Coletti
This paper reports on the development and evaluation of an Italian broadcast news corpus at ITC-irst, under a contract with the European Language Resources Distribution Agency (ELDA). The corpus...
A System for the Retrieval of Italian Broadcast News (2000)
This paper presents a prototype for the retrieval of Italian broadcast news, which has been developed at ITC-irst. The architecture employs a speech recognition engine for the automatic transcription...
Model selection criteria for acoustic segmentation (2000)
Mauro Cettolo, Marcello Federico
Robust acoustic segmentation has become a critical issue in order to apply speech recognition to audio streams with variable acoustic content, e.g. radio programs. Many techniques in the literature...
Italian text retrieval for CLEF 2000 at ITC-irst (2000)
Nicola Bertoldi, Marcello Federico
This paper presents work on document retrieval for Italian carried out at ITC-irst. Two different approaches to information retrieval were investigated, one based on the Okapi weighting formula and...
This work presents a usability evaluation performed during the field-test of a speech based data-entry system. The application domain is the data-entry of personnel absence records from a huge...
Efficient Language Model Adaptation through MDI Estimation (1999)
This paper presents a method for n-gram language model adaptation based on the principle of minimum discrimination information. A background language model is adapted to fit constraints on its...
A two-stage speech recognition method for information retrieval applications (1999)
Paolo Coletti, Marcello Federico
This paper presents a two-stage approach to speech recognition that is suited for information retrieval tasks, e.g. accessing a large telephone directory. The first stage performs a Viterbi beam...
Bayesian Estimation Methods for N-Gram Language Model Adaptation (1996)
Stochastic n-gram language models have been successfully applied in continuous speech recognition for several years. Such language models provide many computational advantages but also require huge...
Language Modeling for Efficient Beam-Search (1995)
Marcello Federico, Mauro Cettolo, Fabio Brugnara, Giuliano Antoniol
This paper considers the problems of estimating bigram language models and of efficiently representing them by a finite state network, which can be employed by a hidden Markov model based,...
Language Model Representations For Beam-Search Decoding (1995)
Giuliano Antoniol, Fabio Brugnara, Mauro Cettolo, Marcello Federico
This paper presents an efficient way of representing a bigram language model for a beam-search based, continuous speech, large vocabulary HMM recognizer. The tree-based topology considered takes...
RADIOLOGICAL REPORTING BY SPEECH RECOGNITION: THE A.Re.S. SYSTEM (1994)
Bianca Angelini, Giuliano Antoniol, Fabio Brugnara, Mauro Cettolo, Marcello Federico, Roberto Fiutem, ...
Radiological reporting has already been identified as a field in which voice technologies can prove to be very useful. Recent progress in automatic speech recognition and in hardware and software...
Language Model Estimations And Representations For Real-Time Continuous Speech Recognition (1994)
Giuliano Antoniol, Fabio Brugnara, Mauro Cettolo, Marcello Federico
This paper compares different ways of estimating bigram language models and of representing them in a finite state network used by a beam-search based, continuous speech, and speaker independent HMM...
Techniques For Robust Recognition In Restricted Domains (1993)
Giuliano Antoniol, Mauro Cettolo, Marcello Federico
This paper describes an Automatic Speech Understanding (ASU) system used in a human-robot interface for the remote control of a mobile robot. The intended application is that of an operator issuing...
Robust Speech Understanding for Robot Telecontrol (1993)
Giuliano Antoniol, Roldano Cattoni, Mauro Cettolo, Marcello Federico
This paper describes an Automatic Speech Understanding (ASU) system used in a human-robot interface for the remote control of a mobile robot. The intended application is that of an operator issuing...
Language Models Comparison in a Robot Telecontrol Application (1993)
Giuliano Antoniol, Fabio Brugnara, Mauro Cettolo, Marcello Federico
Stochastic Language Models (LMs) are key for achieving good performance in speech recognition systems. This is confirmed by the numerous LMs that have been proposed recently in the literature. This...