Publication View

Improving statistical word alignments with morpho-syntactic transformations (2006)

Abstract
This paper presents a wide range of statistical word alignment experiments incorporating morphosyntactic information. By means of parallel corpus transformations according to information of POS-tagging, lemmatization or stemming, we explore which linguistic information helps improve alignment error rates. For this, evaluation against a human word alignment reference is performed, aiming at an improved machine translation training scheme which eventually leads to improved SMT performance. Experiments are carried out in a Spanish–English European Parliament Proceedings parallel corpus, both in a large and a small data track. As expected, improvements due to introducing morphosyntactic information are bigger in case of data scarcity, but significant improvement is also achieved in a large data task, meaning that certain linguistic knowledge is relevant even in situations of large data availability.
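
The abstract reports alignment error rates measured against a human word alignment reference, but does not spell out the formula. The sketch below assumes the commonly used definition with sure (S) and possible (P) reference links, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), which may differ in detail from what the paper actually computes. The function name, argument names, and toy data are hypothetical and serve only as illustration.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """Alignment error rate of a hypothesis alignment A against a reference.

    Assumed definition (not taken from the record above):
        AER = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    Each link is a (source_index, target_index) pair.
    """
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # by convention every sure link is also possible
    if not a and not s:
        return 0.0  # nothing to align and nothing proposed: no error
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy data (invented for illustration only).
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
hypothesis = {(0, 0), (1, 1), (2, 2)}

aer = alignment_error_rate(hypothesis, sure, possible)
print(f"AER = {aer:.2f}")
```

Under this definition, a hypothesis that reproduces exactly the sure links scores 0, and each proposed link outside the possible set raises the score.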

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=?doi=10.1.1.100.6720
Source http://www.tc-star.org/pubblicazioni/scientific_publications/IRST/sett-2006/deepagupta_fintal_2006.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.13.8919, 10.1.1.10.1288, 10.1.1.14.2316, 10.1.1.21.280, 10.1.1.19.1876, 10.1.1.12.9928, 10.1.1.127.8981, 10.1.1.13.8624, 10.1.1.108.896