Exploiting Word Transformation in Statistical Machine Translation from Spanish to English (2008)

Abstract
This paper investigates the use of morphosyntactic information to reduce data sparseness in statistical machine translation from Spanish to English. In particular, word-alignment training is performed after applying different word transformations based on lemmas and stems. Stem-based training is observed to outperform lemma-based training when up to 1 million running words of data are used. A new word-alignment training technique is also proposed that applies syntactically motivated constraints to the parallel data. Preliminary experimental results show that stem-based training with syntactically motivated constraints yields a significant improvement in translation performance. Finally, a technique to reduce the impact of out-of-vocabulary words is discussed. The task considered is the translation of Plenary Sessions of the European Parliament.

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=?doi=10.1.1.127.1684
Source http://www.mt-archive.info/eamt-2006-gupta.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.10.1288, 10.1.1.120.6608, 10.1.1.97.1708