In Mandarin Chinese, lexical semantic relation of near synonyms is a widespread phenomenon, and is of great interest to many linguists. Most works deal with lexical semantic relation between lexical...
Using Chinese Gigaword Corpus and Chinese Word Sketch in linguistic research (2008)
Abstract. We explore the possibility of deeper linguistic research based on corpus and computational linguistic tools in this paper. In particular, we adopt Chinese Word Sketch, the application of...
Lexicons can perform the bridging function between documents and conceptual categorisation (Calzolari, this panel). This position is motivated by both language engineering concerns as well as...
Basic Lexicon and Shared Ontology for Multilingual Resources: A SUMO+MILO Hybrid Approach (2008)
Shu-kai Hsieh, I-li Su, Pei-yi Hsiao, Chu-ren Huang, Tzu-yi Kuo, Laurent Prévot
Abstract. A common conceptual infrastructure is crucial for multilingual language processing and documentation. Global Wordnet (GWN) was proposed as the common infrastructure for linguistically...
Chu-ren Huang, Wei-yun Ma, Yi-ching Wu, Academia Sinica
This paper discusses the implementation of a knowledge-rich approach to automatic acquisition of grammatical information. Our study is based on Word Sketch Engine (Kilgarriff and Tugwell 2002). The...
Chinese Nominals, Kathleen Ahrens, Li-li Chang, Ke-jiann Chen, Chu-ren Huang
The goal of this paper is to explicate the nature of Chinese nominal semantics, and to create a paradigm for nominal semantics in general that will be useful for natural language processing purposes....
We explore the possibility of extracting linguistically felicitous characteristics based on quantitative data. This is a necessary and crucial step in the development of corpus and computational...
Automatic Acquisition of Linguistic Knowledge: From Sinica Corpus to Gigaword Corpus (2008)
The raison d’être for a corpus, as it was first conceived by Francis and Kucera in 1963, was to provide a body of linguistic facts from which linguistic knowledge could be generalized [1]. The...
GuangQunFangPu: e-Humanities Combining Textual and Botanic Information (2008)
Shu-kai Hsieh, Shu-ming Chang, Chun-han Chang, Yi-shuan Zhou, Chu-ren Huang, Feng-ju Lo
In this paper, we propose a lexicon-driven and ontology-merging methodology of constructing diachronic domain knowledge via Sinica BOW, a bilingual ontological lexical resource based on WordNet and...
The semantics of shapes: A study based on Mandarin quan1zi5 (圈子) (2008)
Cui-xia Weng, Chu-ren Huang
Mandarin shape nouns, such as fang1xing2 ‘square’ and san1jiao3xing2 ‘triangle’, share a set of very interesting lexical semantic features. These nouns can refer to either the contour...
Chu-ren Huang, Academia Sinica
Mandarin Chinese WH-questions exhibit two syntactically interesting characteristics. First, WH-words that bear interrogative information always occur in situ. Second, in spite of the above fact, I find...
Categorical Ambiguity and Information Content: A Corpus-based Study of Chinese (2008)
Assignment of grammatical categories is the fundamental step in natural language processing. And ambiguity resolution is one of the most challenging NLP tasks that is currently
Ching-yu Chen, Shu-fen Tseng, Chu-ren Huang, Keh-jiann Chen
The study of word frequency has been discussed by linguists, psychologists, and computer scientists. However, the results of these studies cannot be valid unless the corpus is big enough and...
Chu-ren Huang, Keh-jiann Chen, Feng-yi Chen, Li-li Chang
This paper proposes a segmentation standard for Chinese natural language processing. The standard is proposed to achieve linguistic felicity, computational feasibility, and data uniformity....
Meng-chien Yang, D. Victoria Rau, Tokunaga Takenobu, Chu-ren Huang
Preservation of an endangered language is an important and difficult task. The preservation project is proposed to include documentation, archiving and development of shared resources for the...
Towards a Common Conceptual Framework of Language Documentation (2008)
Language represents shared conventionalization of concepts by all speakers. Hence language documentation preserves information far beyond a collection of sound shapes, lexical forms, and grammatical...
papers from CLSW5. Computational Linguistics and Chinese Language Processing, 10.4. (2008)
Chu-ren Huang, P Huang, Nicoletta Calzolari, Aldo Gangemi, Ro Lenci, ...
Hanzi Grid: Toward a Knowledge Infrastructure for Chinese Character-based Cultures (2008)
Ya-min Chou, Shu-kai Hsieh, Chu-ren Huang
Abstract. The long-term historical development and broad geographical variation of Chinese character (Hanzi/Kanji) has made it a cross-cultural information sharing platform in East Asia. However,...
II.2. Experiments of Ontology Construction with Formal Concept Analysis (2008)
Chu-ren Huang, Nicoletta Calzolari, Aldo Gangemi, Ro Lenci, Laurent Prevot, Adam Pease, ...
Expected year of publication: 2007
Tokunaga Takenobu, Nicoletta Calzolari, Chu-ren Huang, Laurent Prevot
As an area of great linguistic and cultural diversity, Asian language resources have received much less attention than their western counterparts. Creating a common standard for Asian language...
Categorical ambiguity and information content: A Corpus-based study of Chinese (2008)
The degree of ambiguity in Chinese is investigated in this paper based on the tagged Sinica Corpus. We propose to use measurement of information content instead of frequency to model generalizations...
JOURNAL OF CHINESE LINGUISTICS, Vol. 18, No. 2 REVIEW (2008)
This inaugural volume of the monograph series of the Chinese Language Teachers Association is a welcome addition to the study of Chinese linguistics. Its focus on functionalism reflects a field that...
Kathleen Ahrens, Siaw-fong Chung, Chu-ren Huang
The goal of this paper is to further develop methods for verifying Mapping Principles between source and target domain pairings of conceptual metaphors. Previous work (Ahrens, Chung & Huang,...
Toward an Architecture for the Global Wordnet Initiative (2008)
Andrea Marchetti, Maurizio Tesconi, Francesco Ronzano, Marco Rosella, Francesca Bertagna, Monica Monachini, ...
Abstract — Enhancing the development of multilingual lexicons is of foremost importance for intercultural collaboration to take place, as multilingual lexicons are the cornerstone of several...
Information Processing) Group. This group was (2007)
This is a project note on the first stage of the construction of a comprehensive corpus of both Modern and Classical Chinese. The corpus is built with the dual aim of serving as the central database...
Sinica Treebank Design, Keh-jiann Chen, Chi-ching Luo, Ming-chung Chang, Feng-yi Chen, Jan Chen, ...
This paper describes the design criteria and annotation guidelines of the Sinica Treebank. The three design criteria are: Maximal Resource Sharing, Minimal Structural Complexity, and Optimal Semantic...
Identification of transliterated names is a particularly difficult task of Named Entity Recognition (NER), especially in the Chinese context. Of all possible variations of transliterated named...
Computing Thresholds of Linguistic Saliency (2007)
Chung, Siaw-Fong, Ahrens, Kathleen, Cheng, Chung-Ping, Huang, Chu-Ren, Simon, Petr
PACLIC 21 / Seoul National University, Seoul, Korea / November 1-3, 2007
The Polysemy of Da3: An ontology-based lexical semantic study (2007)
Hong, Jia-Fei, Huang, Chu-Ren, Ahrens, Kathleen
PACLIC 21 / Seoul National University, Seoul, Korea / November 1-3, 2007
Using Chinese Gigaword Corpus and Chinese Word Sketch in Linguistic Research (2006)
Huang, Chu-Ren, Ma, Wei-Yun, Wu, Yi-Ching, Chiu, Chih-Ming
PACLIC 20 / Wuhan, China / 1-3 November, 2006
Using the Swadesh list for creating a simple common taxonomy (2006)
Prevot, Laurent, Huang, Chu-Ren, Su, I-Li
PACLIC 20 / Wuhan, China / 1-3 November, 2006
Text-based Construction and Comparison of Domain Ontology : A Study Based on Classical Poetry (2005)
Ontology-based Prediction of Compound Relations : A Study Based on SUMO (2005)
Hong, Jia-Fei, Li, Xiang-Bing, Huang, Chu-Ren
This paper explores the interaction between conceptual structure and morpho-syntax. In particular, we show that ontology-based conceptual classification can be used to predict internal relations in...
Extensive Reading with Guidance (2005)
Cheng, Chin-chuan, Huang, Chu-ren, Lo, Feng-ju, Chen, Xiang-yu, Han, Joyce Ya-chi, Huang, Yu-chun
A language learning mode called “word-focused extensive reading” has been proposed to facilitate word-usage learning. The user inputs a word in the computer program designed and implemented for...
The Sinica Sense Management System: Design and Implementation (2005)
Chu-ren Huang, Chun-ling Chen, Cui-xia Weng, Hsiang-ping Lee, Yong-xiang Chen, Keh-jiann Chen
A sense-based lexical knowledgebase is a core foundation for language engineering. Two important criteria must be satisfied when constructing a knowledgebase: linguistic felicity and data cohesion....
In and Out: Senses and Meaning Extension of Mandarin Spatial Terms nei and wai (2005)
Wu, Yiching, Weng, Cui-Xia, Huang, Chu-Ren
PACLIC 19 / Taipei, Taiwan / December 1-3, 2005
Text-based Construction and Comparison of Domain Ontology : A Study Based on Classical Poetry (2004)
Ontology-based Prediction of Compound Relations : A Study Based on SUMO (2004)
Hong, Jia-Fei, Li, Xiang-Bing, Huang, Chu-Ren
This paper explores the interaction between conceptual structure and morpho-syntax. In particular, we show that ontology-based conceptual classification can be used to predict internal relations in...
Conceptual Metaphors: Ontology-based Representation and Corpora Driven Mapping Principles (2003)
Kathleen Ahrens, Siaw Fong Chung, Chu-ren Huang
The goal of this paper is to integrate the Conceptual Mapping Model with an ontology-based knowledge representation (i.e. Suggested Upper Merged Ontology (SUMO)) in order to demonstrate that...
The Structure of Polysemy : A Study of Multi-sense Words Based on WordNet (2002)
Lin, Jen-Yi, Yang, Chang-Hua, Tseng, Shu-Chuan, Huang, Chu-Ren
Translating Lexical Semantic Relations: The First Step Towards Multilingual Wordnets (2002)
Establishing correspondences between wordnets of different languages is essential to both multilingual knowledge processing and for bootstrapping wordnets of low-density languages. We claim that such...
Induction of Classification from Lexicon Expansion: Assigning Domain Tags (2002)
Echa Chang, Chu-ren Huang, Sue-jin Ker, Chang-hua Yang
We present in this paper a series of induced methods to assign domain tags to WordNet entries. Our prime objective is to enrich the contextual information in WordNet specific to each synset entry. By...
The Open Language Archives Community and Asian Language Resources (2001)
Bird, Steven, Simons, Gary, Huang, Chu-Ren
The Open Language Archives Community (OLAC) is a new project to build a worldwide system of federated language archives based on the Open Archives Initiative and the Dublin Core Metadata Initiative....
Chu-ren Huang, Kathleen Ahrens
In this paper, we set forth a theory of lexical knowledge. We propose two types of modules: event structure modules and role modules, as well as two attributes: event-internal attributes and...
The CKIP Chinese Treebank: Guidelines for Annotation (1999)
Keh-jiann Chen, Chi-ching Luo, Zhao-ming Gao, Ming-chung Chang, Feng-yi Chen, Chao-jan Chen, ...
Abstract. This paper aims to present the methodology and guidelines for annotation in CKIP Chinese Treebank. Under the framework of the Information-based Case grammar (ICG), a lexical feature-based...
What Can Near Synonyms Tell Us (1998)
National Sun, Lian-cheng Chief, Chu-ren Huang, Keh-jiann Chen, Mei-chih Tsai, Lili Chang, ...
This study examines near synonyms and tries to extract the contrasts that dictate their semantic and associated syntactic behaviors. A near synonym pair of Chinese verbs, fangbian and bianli, which...
Segmentation standard for Chinese natural language processing (1997)
Chu-ren Huang, Keh-jiann Chen, Li-li Chang
This paper proposes a segmentation standard for Chinese natural language processing. The standard is proposed to achieve linguistic felicity, computational feasibility, and data uniformity....
SINICA CORPUS : Design Methodology for Balanced Corpora (1996)
Chen, Keh-Jiann, Huang, Chu-Ren, Chang, Li-Ping, Hsu, Hui-Li
Mandarin Chinese NP de : a comparative study of current grammatical theories (1991)
Thesis (Ph. D.)--Cornell University, 1987.
National Science Council, Chu-Ren Huang, ...
Purpose: Academia Sinica Balanced Corpus of Modern Chinese, simplified as Sinica Corpus, is designed for analyzing modern Chinese. Every text in the corpus is segmented and each segmented word is...