
The Impact of Morphological Stemming on Arabic Mention Detection and Coreference Resolution (2005)

Abstract
Arabic, a highly inflected and agglutinative language, presents an interesting challenge to natural language processing. This paper presents an in-depth investigation of the entity detection and recognition (EDR) task for Arabic. We start by highlighting why segmentation is a necessary prerequisite for EDR, continue by presenting a finite-state statistical segmenter, and then examine how the resulting segments can best be incorporated into a mention detection system and an entity recognition system. Both systems are statistical and built around the maximum entropy principle. Experiments on a clearly stated partition of the ACE 2004 data show that stem-based features significantly improve the performance of the EDR system, by 2 absolute F-measure points. The system presented here performed competitively in the ACE 2004 evaluation.
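To make concrete why segmentation matters for EDR, the following is a toy sketch, not the paper's method. It assumes Buckwalter transliteration, a small hypothetical set of affix slots (conjunction, preposition, article, pronoun suffix), and a hypothetical three-word stem lexicon. It is a plain dictionary lookup, not the finite-state statistical segmenter the paper describes. The point it illustrates is that a mention such as "mSr" (Egypt) can be hidden inside a single whitespace token, so a token-level lookup misses it while a segment-level lookup finds it.

```python
from itertools import product

# Toy sketch only: a dictionary-driven affix splitter over Buckwalter-
# transliterated Arabic. It is NOT the paper's finite-state statistical
# segmenter; the affix slots and the stem lexicon are hypothetical.
CONJ = ["", "w", "f"]              # and, so
PREP = ["", "b", "l", "k"]         # with/by, for/to, like
ART = ["", "Al"]                   # the
SUFF = ["", "h", "hA", "hm", "k"]  # his, her, their, your
LEXICON = {"ktAb", "mSr", "byt"}   # book, Egypt, house


def segment(token: str) -> list[str]:
    """Split a whitespace token into prefix+ / stem / +suffix pieces.

    Every conj+prep+article+suffix combination is tried; an analysis is
    kept only if its stem is in LEXICON, and the analysis with the most
    pieces wins. Tokens with no lexicon-backed analysis are returned unsplit.
    """
    best = [token]
    for c, p, a, s in product(CONJ, PREP, ART, SUFF):
        prefix = c + p + a
        if not token.startswith(prefix) or not token.endswith(s):
            continue
        stem = token[len(prefix):len(token) - len(s)]
        if stem not in LEXICON:
            continue
        pieces = [x + "+" for x in (c, p, a) if x] + [stem]
        if s:
            pieces.append("+" + s)
        if len(pieces) > len(best):
            best = pieces
    return best


# "wlmSr" (and to Egypt) hides the mention "mSr" inside one token.
sentence = "wlmSr ktAbh"
tokens = [piece for tok in sentence.split() for piece in segment(tok)]
print(tokens)                      # ['w+', 'l+', 'mSr', 'ktAb', '+h']
print("mSr" in sentence.split())   # False
print("mSr" in tokens)             # True
```

The lexicon check is what keeps "ktAb" (book) from being split into "k+ tAb"; naive affix stripping without it over-segments, which is one reason a learned, context-aware segmenter is attractive.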

Publication details
Download: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.9121
Source: http://acl.ldc.upenn.edu/W/W05/W05-0709.pdf
Contributors: CiteSeerX
Repository: CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type: text
Language: English
Relation: 10.1.1.103.7637, 10.1.1.14.3957, 10.1.1.20.896, 10.1.1.13.8615, 10.1.1.18.8040, 10.1.1.12.4507, 10.1.1.104.4130, 10.1.1.97.7131, 10.1.1.81.2546, 10.1.1.65.2033