Named entity translation is indispensable in cross language information retrieval nowadays. We propose an approach of combining lexical information, web statistics, and inverse search based on Google...
Using Polarity Scores of Words for Sentence-level Opinion Extraction (2009)
Lun-wei Ku, Yong-sheng Lo, Hsin-hsi Chen
The opinion analysis task is a pilot study task in NTCIR-6. It contains the challenges of opinion sentence extraction, opinion polarity judgment, opinion holder extraction and relevance sentence...
Department of Statistics, Chungnam National University (2008)
Kazuko Kuriyama, Noriko K, Hsin-hsi Chen, Taejon Korea
The purpose of this paper is to overview research efforts at the NTCIR-6 CLIR task, which is a project of large-scale retrieval experiments on cross-lingual information retrieval (CLIR) of Chinese,...
FRank: A Ranking Method with Fidelity Loss (2008)
Ming-feng Tsai, Tie-yan Liu, Tao Qin, Hsin-hsi Chen, Wei-ying Ma, ...
Ranking problem is becoming important in many fields, especially in information retrieval (IR). Many machine learning techniques have been proposed for ranking problem, such as RankSVM, RankBoost,...
2005. An Approach of Using the Web as a Live Corpus for Spoken Transliteration Name Access (2008)
Ming-shun Lin, Chia-ping Chen, Hsin-hsi Chen
Recognizing transliteration names is challenging due to their flexible formulation and lexical coverage. In our approach, we employ the Web as a giant corpus. The patterns extracted from the Web are...
David Kirk Evans, Lun-wei Ku, Yohei Seki, Hsin-hsi Chen
In this paper we introduce the NTCIR6 Opinion Analysis Pilot Task, information about the Chinese, Japanese, and English data, plans for future opinion analysis tasks at NTCIR, and a brief...
Classifying Biological Full-Text Articles for Multi-Database Curation (2008)
Wen-juan Hou, Chih Lee, Hsin-hsi Chen
In this paper, we propose an approach for identifying curatable articles from a large document set. This system considers three parts of an article (title and abstract, MeSH terms, and captions) as...
AN NTU-APPROACH TO AUTOMATIC SENTENCE EXTRACTION FOR SUMMARY GENERATION (2008)
Kuang-hua Chen, Sheng-jie Huang, Wen-cheng Lin, Hsin-hsi Chen
Automatic summarization and information extraction are two important Internet services. MUC and SUMMAC play their appropriate roles in the next generation Internet. This paper focuses on the...
Fang, Yu-Ching, Huang, Hsuan-Cheng, Chen, Hsin-Hsi, Juan, Hsueh-Fen
Background: Traditional Chinese Medicine (TCM), a complementary and alternative medical system in Western countries, has been used to treat various diseases over thousands of years in East...
Corpus-Based Analyses of Adjectives: Automatic Clustering (2008)
Similarity analysis is a substantial issue in both corpus-based researches and language usages. This paper focuses on the semantic usages of adjectives, and analyzes the similarities among...
A Rule-Based and Corpus-Oriented Approach to Prepositional Phrases Attachment (2008)
Prepositional Phrase is the key issue in structural ambiguity. Recently, researches in corpora provide the lexical cue of association among prepositions and other words and this information could be...
Tagging Heterogeneous Evaluation Corpora for Opinionated Tasks (2008)
Lun-wei Ku, Yu-ting Liang, Hsin-hsi Chen
Opinion retrieval aims to tell if a document is positive, neutral or negative on a given topic. Opinion extraction further identifies the supportive and the non-supportive evidence of a document. To...
Query Expansion with ConceptNet and WordNet: An Intrinsic Comparison (2008)
Ming-hung Hsu, Ming-feng Tsai, Hsin-hsi Chen
This paper compares the utilization of ConceptNet and WordNet in query expansion. Spreading activation selects candidate terms for query expansion from these two resources. Three measures...
A CHUNKING-AND-RAISING PARTIAL PARSER (2008)
Parsing is often seen as a combinatorial problem. It is not due to the properties of the natural languages, but due to the parsing strategies. This paper investigates a Constrained Grammar extracted...
Chih Lee, Wen-juan Hou, Hsin-hsi Chen
In the biological domain, extracting newly discovered functional features from the massive literature is a major challenging issue. To automatically annotate Gene References into Function (GeneRIF)...
Sung Hyon Myaeng Information and Communications University (2008)
Kazuaki Kishida, Hsin-hsi Chen, Kuang-hua Chen, Noriko Kando, Koji Eguchi, Sukhoon Lee, ...
National Taiwan University
Retrieval of Biomedical Documents by Prioritizing Key Phrases (2008)
In this paper, we presented an approach to retrieving relevant articles from the biomedical corpus. Our first run considered four kinds of operators as query expansion. The operators are phrase,...
A Multimedia Retrieval System for Retrieving Chinese Text and Speech Documents (2008)
Multimedia documents place new requirements on the conventional text retrieval systems. This paper presents a multimedia retrieval system that employs the content-based strategy to retrieve both text...
Constructing a Named Entity Ontology from Web Corpora (2008)
This paper proposes a named entity (NE) ontology generation engine, called X NE-Tree engine, which produces relational named entities given a seed. The engine incrementally extracts high...
Question Analysis and Answer Passage Retrieval for Opinion Question Answering Systems (2008)
Lun-wei Ku, Yu-ting Liang, Hsin-hsi Chen
Question answering systems provide an elegant way for people to access an underlying knowledge base. Humans are not only interested in factual questions but also interested in opinions. This paper...
General Terms Algorithms, Design, Experimentation. (2008)
Lun-wei Ku, Li-ying Lee, Tung-ho Wu, Hsin-hsi Chen
Watching specific information sources and summarizing the newly discovered opinions is important for governments to improve their services and companies to improve their products [1, 3]. Because no...
Merging Mechanisms in Multilingual Information Retrieval (2008)
National Taiwan University (NTU) Natural Language Processing Laboratory (NLPL) participated in MLIR task in CLEF 2002. We submitted five official multilingual runs. In this paper, we try to resolve...
This paper presents a design of a Mandarin to Taiwanese Min Nan (abbreviated as Taiwanese hereafter) machine translation system. It is the first machine translation system which focuses on these two...
Categories and Subject Descriptors (2008)
This paper employs ConceptNet, which covers a rich set of commonsense concepts, to retrieve images with text descriptions by focusing on spatial relationships. Evaluation on test data of the 2005...
Proceedings of ROCLING-93, pp. 99-117 A PROBABILISTIC CHUNKER (2008)
This paper proposes a probabilistic partial parser, which we call chunker. The chunker partitions the input sentence into segments. This idea is motivated by the fact that when we read a sentence, we...
A LOGIC PROGRAMMING APPROACH TO FRAME-BASED LANGUAGE DESIGN (2008)
In this paper, we will propose a logic programming approach to design a frame-based language. The relationship among frame, logic and Prolog is our basic design issue. Frame is considered as a...
Integrating Punctuation Rules and Naïve Bayesian Model for Chinese Creation Title Recognition (2008)
Creation titles, i.e. titles of literary and/or artistic works, comprise over 7% of named entities in Chinese documents. They are the fourth largest sort of named entities in Chinese other...
Gene Ontology Annotation Using Word Proximity Relationship (2008)
In this paper, we propose an approach for doing Gene Ontology (GO) annotation on full-text biomedical articles. This system explores the word proximity relationship between genes and GO terms. We...
Cross Document Event Clustering Using Knowledge Mining from Co-Reference Chains (2008)
Unification of the terminology usages which captures more term semantics is useful for event clustering. This paper proposes a metric of normalized chain edit distance to mine controlled...
Description of NTU Approach to NTCIR3 Multilingual Information Retrieval (2008)
This paper deals with Chinese, English and Japanese multilingual information retrieval. Several merging strategies, including raw-score merging, round-robin merging, normalized-score merging, and...
SVM Approach to GeneRIF Annotation (2008)
Wen-Juan Hou, Chun-Yuan Teng, ...
In the biological domain, to extract the newly discovered functional features from massive literature is a major challenging issue. To automatically annotate GeneRIF in a new literature is the main...
Retrieval of Biomedical Documents by Prioritizing Key Phrases (2008)
In this paper, we present an approach for retrieving relevant articles from the biomedical corpus. Our first run considered four kinds of operators as query expansion. The operators are phrase,...
National Taiwan University at Terabyte Track of TREC 2005 (2008)
There are three tasks in the Terabyte track of TREC 2005, i.e. Efficiency, Ad hoc and Named page finding. We participated in all the tasks and use different retrieval methods to deal with each task,...
NTU at CLEF 2001: Chinese-English Cross-Lingual Information Retrieval (2008)
which is 53.06 % of monolingual information retrieval.
Description of NTU System at TREC-10 QA Track (2008)
In the past years, we attended the 250-bytes group. Our main strategy was to measure the similarity score (or the informative score) of each candidate sentence to the question sentence....
Corpus-Based Analyses of Adjectives: Automatic Clustering (2007)
Similarity analysis is a substantial issue in both corpus-based researches and language usages. This paper focuses on the semantic usages of adjectives, and analyzes the similarities among...
Lun-wei Ku, Yong-sheng Lo, Hsin-hsi Chen
Opinion analysis is an important research topic in recent years. However, there are no common methods to create evaluation corpora. This paper introduces a method for developing opinion corpora...
Overview of opinion analysis pilot task at NTCIR-6 (2007)
Yohei Seki, David Kirk Evans, Lun-wei Ku, Hsin-hsi Chen, Noriko K, Chin-yew Lin
This paper describes an overview of the Opinion Analysis Pilot Task from 2006 to 2007 at the Sixth NTCIR Workshop. We created test collection for 32, 30, and 28 topics (11,907, 15,279, and 8,379...
Experiment for Using Web Information to do Query and Document Expansion (2007)
ImageCLEF photo task of this year is a little different from those of previous years. The caption field in image annotations and the narrative field in the text queries are removed, and the...
FRank: A ranking method with fidelity loss (2007)
Ming-feng Tsai, Tie-yan Liu, Tao Qin, Hsin-hsi Chen, Wei-ying Ma
Ranking problem is becoming increasingly important in many applications, especially in information retrieval. Many machine learning technologies have been proposed to solve this problem, such as...
Novel association measures using web search with double checking (2006)
Hsin-hsi Chen, Ming-shun Lin, Yu-chuan Wei
A web search with double checking model is proposed to explore the web as a live corpus. Five association measures including variants of Dice, Overlap Ratio, Jaccard, and Cosine, as well as Co-
Two kinds of intermedia are explored in ImageCLEFphoto2006. The approach of using a word-image ontology maps images to fundamental concepts in an ontology and measures the similarity of two images by...
Translating–transliterating named entities for multilingual information access (2006)
Hsin-hsi Chen, Wen-cheng Lin, Changhua Yang, Wei-hao Lin
Named entities are major constituents of a document but are usually unknown words. This work proposes a systematic way of dealing with formulation, transformation, translation, and transliteration of...
Combining text and image queries at ImageCLEF 2005 (2005)
Yih-cheng Chang, Wen-cheng Lin, Hsin-hsi Chen
This paper presents our methods for the tasks of bilingual ad hoc retrieval and automatic annotation in ImageCLEF 2005. In ad hoc task, we propose a feedback method for cross-media translation in a...
Integrating Textual and Visual Information for Cross-Language Image Retrieval (2005)
Wen-cheng Lin, Yih-chen Chang, Hsin-hsi Chen
This paper explores the integration of textual and visual information for cross-language image retrieval. An approach which automatically transforms textual queries into visual...
Identifying Relevant Full-Text Articles for Database Curation (2005)
Chih Lee, Wen-juan Hou, Hsin-hsi Chen
In this paper, we propose an approach for identifying curatable articles from a large pool. Our system currently considers three parts of an article as three individual representations of the...
A Relevance Detection Approach to Gene Annotation (2005)
Wen-juan Hou, Chih Lee, Hsin-hsi Chen
Gene Ontology (GO) enables scientists to describe and annotate gene products with three controlled vocabularies. However, the nature of variation in terminology makes automatic annotation of gene...
Enhancing performance of protein and gene name recognizers with filtering and integration strategies (2004)
Annotating Multiple Types of Biomedical Entities: A Single Word Classification Approach (2004)
Chih Lee, Wen-juan Hou, Hsin-hsi Chen
Named entity recognition is a fundamental task in biomedical data mining. Multiple-class annotation is more challenging than single-class annotation. In this paper, we took a single word...
Chih Lee, Wen-juan Hou, Hsin-hsi Chen
Gene Ontology (GO) is a controlled vocabulary. Given a gene product, GO enables scientists to clearly and unambiguously describe specific molecular functions of the gene product, specific biological...
Cross-Language Image Retrieval via Spoken Query (2004)
Wen-cheng Lin, Ming-shun Lin, Hsin-hsi Chen
This paper studies cross-language cross-medium information retrieval. We introduce several approaches to unify the languages and media of queries and documents. We experiment on cross-language image...
Spoken Cross-Language Access to Image Collection via Captions (2003)
This paper presents a framework of using Chinese speech to access images via English captions. The formulation and the structure mapping rules of Chinese and English named entities are extracted from...
Learning formulation and transformation rules for multilingual named entities (2003)
Hsin-hsi Chen, Changhua Yang, Ying Lin
This paper investigates three multilingual named entity corpora, including named people, named locations and named organizations. Frequency-based approaches with and without dictionary are proposed...
Building a Chinese-English WordNet for Translingual Applications (2002)
Hsin-hsi Chen, Chi-ching Lin, Wen-cheng Lin
A WordNet-like linguistic resource is useful, but difficult to construct. This article proposes a method to integrate five linguistic resources, including English/Chinese sense-tagged corpora,...
Overview of CLIR task at the third NTCIR workshop (2002)
Kuang-hua Chen, Hsin-hsi Chen, Noriko K, Kazuko Kuriyama, Sukhoon Lee, Sung Hyon Myaeng, ...
This report is an overview of Cross-Language Information Retrieval Task (CLIR) at the third NTCIR Workshop. There are 3 tracks in CLIR: Single Language IR (SLIR), Bilingual CLIR (BLIR), and...
Backward Machine Transliteration by Learning Phonetic Similarity (2002)
In many cross-lingual applications we need to convert a transliterated word into its original word. In this paper, we present a similarity-based framework to model the task...
Some Similarity Computation Methods in Novelty Detection (2002)
In the novelty task, the amount of information of a sentence that can be used in similarity computation is the major challenging issue. Some sort of information expansion methods was introduced to...
The Chinese text retrieval tasks of NTCIR workshop 2 (2001)
This paper is a report of Chinese Text Retrieval (CHTR) tasks in NTCIR Workshop 2. CHTR tasks fall into two categories: Chinese-Chinese IR (CHIR) and English-Chinese IR (ECIR). The definitions,...
Mining tables from large scale HTML texts (2000)
Hsin-hsi Chen, Shih-chung Tsai, Jin-he Tsai
Table is a very common presentation scheme, but few papers touch on table extraction in text data mining. This paper focuses on mining tables from large-scale HTML texts. Table filtering,...
A multilingual news summarizer (2000)
Huge multilingual news articles are reported and disseminated on the Internet. How to extract the key information and save the reading time is a crucial issue. This paper proposes architecture of...
A Summarization System for Chinese News from Multiple Sources (1999)
Hsin-hsi Chen, Sheng-jie Huang
This paper will propose a personal news secretariat that helps on-line readers absorb news information from multiple sources. Such a news secretariat eliminates the redundant information in the news,...
A Summarization System for Chinese News from Multiple Sources (1999)
Hsin-hsi Chen, June-jei Kuo, Sheng-jie Huang, Chuan-jie Lin, Hung-chia Wung
This article proposes a summarization system for multiple documents. It employs not only named entities and other signatures to cluster news from different sources, but also employs punctuation...
Resolving Translation Ambiguity and Target Polysemy in Cross-Language Information Retrieval (1999)
Hsin-hsi Chen, Guo-wei Bian, Wen-cheng Lin
This paper deals with translation ambiguity and target polysemy problems together. Two monolingual balanced corpora are employed to learn word co-occurrence for translation ambiguity resolution, and...
Chuan-jie Lin, Wen-cheng Lin, Guo-wei Bian, Hsin-hsi Chen
This paper describes a Japanese-English Cross-Language Information Retrieval (CLIR) System for the evaluation
Breiman L.: Arcing classifiers (1998)
Chuan-jie Lin, Che-chia Liu, Hsin-hsi Chen
Captions in videos contain valuable information for video retrieval. Although texts in captions can be obtained easily in the new image compression formats like MPEG2, there still are many video...
A New Hybrid Approach for Chinese-English Query Translation (1998)
A new hybrid approach combining the dictionary-based and corpus-based approaches for Chinese-English cross-language information retrieval is proposed. The bilingual dictionary provides the...
Description of the NTU System Used for MET2 (1998)
Hsin-hsi Chen, Yung-wei Ding, Shih-chung Tsai, Guo-wei Bian
Named entities form the major components in a document. When we catch the fundamental entities, we can understand a document to some degree. This paper employs different types of information from...
Proper name translation in cross-language information retrieval (1998)
Hsin-hsi Chen, Sheng-jie Huang, Yung-wei Ding, Shih-chung Tsai
Recently, language barrier becomes the major problem for people to search, retrieve, and understand WWW documents in different languages. This paper deals with query translation issue in...
White Page Construction from Web Pages for Finding People on the Internet (1998)
This paper proposes a method to extract proper names and their associated information from web pages for Internet/Intranet users automatically. The information extracted from World Wide Web documents...
Applying Repair Processing in Chinese Homophone Disambiguation (1997)
Repair processing plays an important role in spoken language processing systems. This paper proposes a method for correcting Chinese repetition repairs and demonstrates the effects of repair...
An MT Meta-Server for Information Retrieval on WWW (1997)
In the past few years, the World Wide Web (WWW) grows explosively and has become the most useful and powerful information retrieval and accessing system in the Internet. However, the language barrier...
Still image, seamless space / by Hsin-Hsi Chen (1996)
Thesis research directed by Dept. of Art.
A Hybrid Approach to Machine Translation System Design (1996)
It is difficult for pure statistics-based machine translation systems to process long sentences. In addition, the domain dependent problem is a key issue under such a framework. Pure rule-based...
Analysis of Error Count Distribution for Improving the Postprocessing Performance of OCCR (1996)
Contextual language processing plays an important role for the postprocessing of OCR. Its effects are demonstrated by many proposed systems. In general, it performs well. However, its performance is...
Identification and classification of proper nouns in Chinese texts (1996)
Various strategies are proposed to identify and classify three types of proper nouns in Chinese texts. Clues from character, sentence and paragraph levels are employed to resolve Chinese personal...
A Rule-Based and MT-Oriented Approach to Prepositional Phrases Attachment (1996)
Prepositional Phrase is the key issue in structural ambiguity. Recently, researches in corpora provide the lexical cue of...
A Rule-Based and MT-Oriented Approach to Prepositional Phrase Attachment (1996)
Prepositional Phrase is the key issue in structural ambiguity. Recently, researches in corpora provide the lexical cue of prepositions with other words and the information could be used to partly...
Analysis of Error Count Distributions for Improving the Postprocessing Performance of OCCR (1996)
Contextual language processing plays an important role for the postprocessing of OCR. Its effects are demonstrated by many proposed systems. In general, it performs well. However, its performance is...
A Corpus-Based Approach to Text Partition (1995)
A text partition model is proposed to determine the boundaries of discourse structures. It is based on association of noun-noun relations and noun-verb relations defined on discourse level and...
Aligning Bilingual corpus: Especially for Language Pairs from Different Families (1995)
Rather than using length-based or translation-based criterion to align bilingual texts, this paper proposes a part-of-speech-based (POS-based) criterion. The postulation is that bilingual texts...
Development of Partially Bracketed Corpus with Part-of-Speech Information Only (1995)
Research based on a treebank is active for many natural language applications. However, the work to build a large scale treebank is laborious and tedious. This paper proposes a probabilistic chunker...
Machine Translation: An Integrated Approach (1995)
A pure statistics-based machine translation system is usually incapable of processing long sentences and is usually domain dependent. A pure rule-based machine translation system involves many costs...
Aligning Parallel Chinese-English Texts Using Multiple Clues (1995)
Parallel texts bring much linguistic information, so that they can be applied to word-sense disambiguation, extraction of translation templates, automatic translation of noun compounds, construction...
Approximate N-Gram Markov Model for Natural Language Generation (1994)
This paper proposes an Approximate n-gram Markov Model for bag generation. Directed word association pairs with distances are used to approximate (n-1)-gram and n-gram training tables. This model has...
A Corrective Training Algorithm for Adaptive Learning in Bag Generation (1994)
The sampling problem in training corpus is one of the major sources of errors in corpus-based applications. This paper proposes a corrective training algorithm to best-fit the run-time context domain...
Chen, Kuang-hua, Chen, Hsin-Hsi
To acquire noun phrases from running texts is useful for many applications, such as word grouping, terminology indexing, etc. The reported literatures adopt pure probabilistic approach, or pure...
The Contextual Analysis of Chinese Sentences with Punctuation Marks (1994)
By corpus analysis, about 75% of Chinese sentences are composed of more than two sentence segments separated by commas or semicolons. A segment may be a sentence, a noun phrase, a verb phrase,...
A Probabilistic Chunker (1993)
This paper proposes a probabilistic partial parser, which we call chunker. The chunker partitions the input sentence into segments. This idea is motivated by the fact that when we read a sentence, we...
The Transfer of Anaphors in Translation (1992)
This paper adopts Government-Binding (GB) Theory and some Chinese-specific conditions to explain the various uses of anaphors such as the reflexive, pronoun, PRO, trace, pro and variable in Mandarin...
Proper Name Translation in Cross-Language Information Retrieval
Hsin-Hsi Chen, Sheng-Jie Huang, Yung-Wei Ding, Shih-Chang Tsai