
HowtogetaChineseName(Entity): Segmentation and Combination Issues (2007)

Abstract
When building a Chinese named entity recognition system, one must deal with certain language-specific issues, such as whether the model should be based on characters or words. While there is no unique answer to this question, we discuss in detail the advantages and disadvantages of each model, identify problems in segmentation, and suggest possible solutions, presenting our observations, analysis, and experimental results. The second topic of this paper is classifier combination. We present four classifiers for Chinese named entity recognition and describe various methods for combining their outputs. The results demonstrate that classifier combination is an effective technique for improving system performance: experiments over a large annotated corpus of fine-grained entity types exhibit a 10% relative reduction in F-measure error.

Presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2003), held in Sapporo, Japan, on 11-12 Jul 2003. Published in the Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2003. Sponsored in part by DARPA and the National Science Foundation.

Publication details
Download: http://handle.dtic.mil/100.2/ADA457910
Contributors: IBM THOMAS J WATSON RESEARCH CENTER YORKTOWN HEIGHTS NY
Repository: Defense Technical Information Center OAI-PMH Repository (United States)
Keywords: INFORMATION SCIENCE, LINGUISTICS, CYBERNETICS, *RECOGNITION, *WORDS(LANGUAGE), *CHINESE LANGUAGE, ALGORITHMS, SYMPOSIA, NATURAL LANGUAGE, MARKOV PROCESSES, MODELS, CLASSIFICATION, *NAMED ENTITY RECOGNITION, *HMM(HIDDEN MARKOV MODEL), *CLASSIFIERS, WORD SEGMENTATION, WORD-BASED MODELS, CHARACTER-BASED MODELS
Language: eng