Exploiting Word Transformation in Statistical Machine Translation from Spanish to English (2008)
Abstract
This paper investigates the use of morphosyntactic information to reduce data sparseness in statistical machine translation from Spanish to English. In particular, word-alignment training is performed after applying different word transformations that use lemmas and stems. Stem-based training was observed to outperform lemma-based training when up to 1 million running words of data are used. A new word-alignment training technique is also proposed that applies syntactically motivated constraints to the parallel data. Preliminary experimental results show that stem-based training with syntactically motivated constraints gives a significant improvement in translation performance. Finally, a technique to reduce the impact of out-of-vocabulary words is discussed. The task considered is the translation of Plenary Sessions of the European Parliament.
Publication details