Hsin-hsi Chen

Publication List Details

Period

1992 - 2009

Number

124

A High-Accurate Chinese-English NE Backward Translation System Combining Both Lexical Information and Web Statistics (2009)

Conrad Chen, Hsin-hsi Chen

Named entity translation is indispensable in cross language information retrieval nowadays. We propose an approach of combining lexical information, web statistics, and inverse search based on Google...

Using Polarity Scores of Words for Sentence-level Opinion Extraction (2009)

Lun-wei Ku, Yong-sheng Lo, Hsin-hsi Chen

The opinion analysis task is a pilot study task in NTCIR-6. It contains the challenges of opinion sentence extraction, opinion polarity judgment, opinion holder extraction and relevance sentence...

Department of Statistics, Chungnam National University (2008)

Kazuko Kuriyama, Noriko K, Hsin-hsi Chen

The purpose of this paper is to overview research efforts at the NTCIR-6 CLIR task, which is a project of large-scale retrieval experiments on cross-lingual information retrieval (CLIR) of Chinese,...

FRank: A Ranking Method with Fidelity Loss (2008)

Ming-feng Tsai, Tie-yan Liu, Tao Qin, Hsin-hsi Chen, Wei-ying Ma, ...

Ranking problem is becoming important in many fields, especially in information retrieval (IR). Many machine learning techniques have been proposed for ranking problem, such as RankSVM, RankBoost,...

An Approach of Using the Web as a Live Corpus for Spoken Transliteration Name Access (2008)

Ming-shun Lin, Chia-ping Chen, Hsin-hsi Chen

Recognizing transliteration names is challenging due to their flexible formulation and lexical coverage. In our approach, we employ the Web as a giant corpus. The patterns extracted from the Web are...

Opinion Analysis across languages: An Overview of and Observations from the NTCIR6 Opinion Analysis Pilot Task (2008)

David Kirk Evans, Lun-wei Ku, Yohei Seki, Hsin-hsi Chen

In this paper we introduce the NTCIR6 Opinion Analysis Pilot Task, information about the Chinese, Japanese, and English data, plans for future opinion analysis tasks at NTCIR, and a brief...

A Hybrid Approach to Machine Translation System Design (2008)

Kuang-hua Chen, Hsin-hsi Chen

It is difficult for pure statistics-based machine translation systems to process long sentences. In addition, the domain dependent problem is a key issue under such a framework. Pure rule-based...

Classifying Biological Full-Text Articles for Multi-Database Curation (2008)

Wen-juan Hou, Chih Lee, Hsin-hsi Chen

In this paper, we propose an approach for identifying curatable articles from a large document set. This system considers three parts of an article (title and abstract, MeSH terms, and captions) as...

AN NTU-APPROACH TO AUTOMATIC SENTENCE EXTRACTION FOR SUMMARY GENERATION (2008)

Kuang-hua Chen, Sheng-jie Huang, Wen-cheng Lin, Hsin-hsi Chen

Automatic summarization and information extraction are two important Internet services. MUC and SUMMAC play their appropriate roles in the next generation Internet. This paper focuses on the...

TCMGeneDIT: a database for associated traditional Chinese medicine, gene and disease information using text mining (2008)

Yu-Ching Fang, Hsuan-Cheng Huang, Hsin-Hsi Chen, Hsueh-Fen Juan

Traditional Chinese Medicine (TCM), a complementary and alternative medical system in Western countries, has been used to treat various diseases over thousands of years in East...

Corpus-Based Analyses of Adjectives: Automatic Clustering (2008)

Kuang-hua Chen, Hsin-hsi Chen

Similarity analysis is a substantial issue in both corpus-based researches and language usages. This paper focuses on the semantic usages of adjectives, and analyzes the similarities among...

A Rule-Based and Corpus-Oriented Approach to Prepositional Phrases Attachment (2008)

Kuang-hua Chen, Hsin-hsi Chen

Prepositional Phrase is the key issue in structural ambiguity. Recently, researches in corpora provide the lexical cue of association among prepositions and other words and this information could be...

Tagging Heterogeneous Evaluation Corpora for Opinionated Tasks (2008)

Lun-wei Ku, Yu-ting Liang, Hsin-hsi Chen

Opinion retrieval aims to tell if a document is positive, neutral or negative on a given topic. Opinion extraction further identifies the supportive and the non-supportive evidence of a document. To...

A High-Accurate Chinese-English NE Backward Translation System Combining Both Lexical Information and Web Statistics (2008)

Conrad Chen, Hsin-hsi Chen

Named entity translation is indispensable in cross language information retrieval nowadays. We propose an approach of combining lexical information, web statistics, and inverse search based on Google...

Query Expansion with ConceptNet and WordNet: An Intrinsic Comparison (2008)

Ming-hung Hsu, Ming-feng Tsai, Hsin-hsi Chen

This paper compares the utilization of ConceptNet and WordNet in query expansion. Spreading activation selects candidate terms for query expansion from these two resources. Three measures...

A CHUNKING-AND-RAISING PARTIAL PARSER (2008)

Hsin-hsi Chen, Yue-shi Lee

Parsing is often seen as a combinatorial problem. It is not due to the properties of the natural languages, but due to the parsing strategies. This paper investigates a Constrained Grammar extracted...

Support Vector Machine Approach to Extracting Gene References into Function from Biological Documents (2008)

Chih Lee, Wen-juan Hou, Hsin-hsi Chen

In the biological domain, extracting newly discovered functional features from the massive literature is a major challenging issue. To automatically annotate Gene References into Function (GeneRIF)...

White Page Construction from Web Pages for Finding People on the Internet (2008)

Hsin-hsi Chen, Guo-wei Bian

This paper proposes a method to extract proper names and their associated information from web pages for Internet/Intranet users automatically. The information extracted from World Wide Web documents...

Retrieval of Biomedical Documents by Prioritizing Key Phrases (2008)

Wen-juan Hou, Hsin-hsi Chen

In this paper, we presented an approach to retrieving relevant articles from the biomedical corpus. Our first run considered four kinds of operators as query expansion. The operators are phrase,...

A Multimedia Retrieval System for Retrieving Chinese Text and Speech Documents (2008)

Yue-shi Lee, Hsin-hsi Chen

Multimedia documents place new requirements on the conventional text retrieval systems. This paper presents a multimedia retrieval system that employs the content-based strategy to retrieve both text...

Constructing a Named Entity Ontology from Web Corpora (2008)

Ming-shun Lin, Hsin-hsi Chen

This paper proposes a named entity (NE) ontology generation engine, called X NE-Tree engine, which produces relational named entities by given a seed. The engine incrementally extracts high...

Question Analysis and Answer Passage Retrieval for Opinion Question Answering Systems (2008)

Lun-wei Ku, Yu-ting Liang, Hsin-hsi Chen

Question answering systems provide an elegant way for people to access an underlying knowledge base. Humans are not only interested in factual questions but also interested in opinions. This paper...

General Terms Algorithms, Design, Experimentation. (2008)

Lun-wei Ku, Li-ying Lee, Tung-ho Wu, Hsin-hsi Chen

Watching specific information sources and summarizing the newly discovered opinions is important for governments to improve their services and companies to improve their products [1, 3]. Because no...

Merging Mechanisms in Multilingual Information Retrieval (2008)

Wen-cheng Lin, Hsin-hsi Chen

National Taiwan University (NTU) Natural Language Processing Laboratory (NLPL) participated in MLIR task in CLEF 2002. We submitted five official multilingual runs. In this paper, we try to resolve...

A Mandarin to Taiwanese Min Nan Machine Translation System with Speech Synthesis of Taiwanese Min Nan (2008)

Chuan-jie Lin, Hsin-hsi Chen

This paper presents a design of a Mandarin to Taiwanese Min Nan (abbreviated as Taiwanese hereafter) machine translation system. It is the first machine translation system which focuses on these two...

Categories and Subject Descriptors (2008)

Ming-hung Hsu, Hsin-hsi Chen

This paper employs ConceptNet, which covers a rich set of commonsense concepts, to retrieve images with text descriptions by focusing on spatial relationships. Evaluation on test data of the 2005...

Proceedings of ROCLING-93, pp. 99-117 A PROBABILISTIC CHUNKER (2008)

Kuang-hua Chen, Hsin-hsi Chen

This paper proposes a probabilistic partial parser, which we call chunker. The chunker partitions the input sentence into segments. This idea is motivated by the fact that when we read a sentence, we...

A LOGIC PROGRAMMING APPROACH TO FRAME-BASED LANGUAGE DESIGN (2008)

Hsin-hsi Chen, I-peng Lin

In this paper, we will propose a logic programming approach to design a frame-based language. The relationship among frame, logic and Prolog is our basic design issue. Frame is considered as a...

Integrating Punctuation Rules and Naïve Bayesian Model for Chinese Creation Title Recognition (2008)

Conrad Chen, Hsin-hsi Chen

Creation titles, i.e. titles of literary and/or artistic works, comprise over 7 % of named entities in Chinese documents. They are the fourth large sort of named entities in Chinese other...

Gene Ontology Annotation Using Word Proximity Relationship (2008)

Wen-juan Hou, Hsin-hsi Chen

In this paper, we propose an approach for doing Gene Ontology (GO) annotation on full-text biomedical articles. This system explores the word proximity relationship between genes and GO terms. We...

Cross Document Event Clustering Using Knowledge Mining from Co-Reference Chains (2008)

June-jei Kuo, Hsin-hsi Chen

Unification of the terminology usages which captures more term semantics is useful for event clustering. This paper proposes a metric of normalized chain edit distance to mine controlled...

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval (2008)

Wen-cheng Lin, Hsin-hsi Chen

This paper deals with Chinese, English and Japanese multilingual information retrieval. Several merging strategies, including raw-score merging, round-robin merging, normalized-score merging, and...

SVM Approach to GeneRIF Annotation (2008)

Wen-Juan Hou, Chun-Yuan Teng, Chih Lee, Hsin-hsi Chen

In the biological domain, to extract the newly discovered functional features from massive literature is a major challenging issue. To automatically annotate GeneRIF in a new literature is the main...

Retrieval of Biomedical Documents by Prioritizing Key Phrases (2008)

Wen-juan Hou, Hsin-hsi Chen

In this paper, we present an approach for retrieving relevant articles from the biomedical corpus. Our first run considered four kinds of operators as query expansion. The operators are phrase,...

National Taiwan University at Terabyte Track of TREC 2005 (2008)

Ming-hung Hsu, Hsin-hsi Chen

There are three tasks in the Terabyte track of TREC 2005, i.e. Efficiency, Ad hoc and Named page finding. We participated in all the tasks and use different retrieval methods to deal with each task,...

Description of NTU System at TREC-10 QA Track (2008)

Chuan-jie Lin, Hsin-hsi Chen

In the past years, we attended the 250-bytes group. Our main strategy was to measure the similarity score (or the informative score) of each candidate sentence to the question sentence....

Corpus-Based Analyses of Adjectives: Automatic Clustering (2007)

Kuang-hua Chen, Hsin-Hsi Chen

Similarity analysis is a substantial issue in both corpus-based researches and language usages. This paper focuses on the semantic usages of adjectives, and analyzes the similarities among...

Backward Machine Transliteration by Learning Phonetic Similarity (2007)

Wei-hao Lin, Hsin-hsi Chen

In many cross-lingual applications we need to convert a transliterated word into its original word. In this paper, we present a similarity-based framework to model the task...

A Mandarin to Taiwanese Min Nan Machine Translation System with Speech Synthesis of Taiwanese Min Nan (2007)

Chuan-jie Lin, Hsin-hsi Chen

This paper presents a design of a Mandarin to Taiwanese Min Nan (abbreviated as Taiwanese hereafter) machine translation system. It is the first machine translation system which focuses on these two...

Test Collection Selection and Gold Standard Generation for a Multiply-Annotated Opinion Corpus (2007)

Lun-wei Ku, Yong-shen Lo, Hsin-hsi Chen

Opinion analysis is an important research topic in recent years. However, there are no common methods to create evaluation corpora. This paper introduces a method for developing opinion corpora...

Overview of opinion analysis pilot task at NTCIR-6 (2007)

Yohei Seki, David Kirk Evans, Lun-wei Ku, Hsin-hsi Chen, Noriko K, Chin-yew Lin

This paper describes an overview of the Opinion Analysis Pilot Task from 2006 to 2007 at the Sixth NTCIR Workshop. We created test collection for 32, 30, and 28 topics (11,907, 15,279, and 8,379...

Experiment for Using Web Information to do Query and Document Expansion (2007)

Yih-chen Chang, Hsin-hsi Chen

ImageCLEF photo task of this year is a little different from those of previous years. The caption field in image annotations and the narrative field in the text queries are removed, and the...

FRank: A ranking method with fidelity loss (2007)

Ming-feng Tsai, Tie-yan Liu, Tao Qin, Hsin-hsi Chen, Wei-ying Ma

Ranking problem is becoming increasingly important in many applications, especially in information retrieval. Many machine learning technologies have been proposed to solve this problem, such as...

Novel association measures using web search with double checking (2006)

Hsin-hsi Chen, Ming-shun Lin, Yu-chuan Wei

A web search with double checking model is proposed to explore the web as a live corpus. Five association measures including variants of Dice, Overlap Ratio, Jaccard, and Cosine, as well as Co-

Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval (2006)

Yih-chen Chang, Hsin-hsi Chen

Two kinds of intermedia are explored in ImageCLEFphoto2006. The approach of using a word-image ontology maps images to fundamental concepts in an ontology and measure the similarity of two images by...

Translating–transliterating named entities for multilingual information access (2006)

Hsin-hsi Chen, Wen-cheng Lin, Changhua Yang, Wei-hao Lin

Named entities are major constituents of a document but are usually unknown words. This work proposes a systematic way of dealing with formulation, transformation, translation, and transliteration of...

Combining text and image queries at ImageCLEF 2005 (2005)

Yih-cheng Chang, Wen-cheng Lin, Hsin-hsi Chen

This paper presents our methods for the tasks of bilingual ad hoc retrieval and automatic annotation in ImageCLEF 2005. In ad hoc task, we propose a feedback method for cross-media translation in a...

Integrating Textual and Visual Information for Cross-Language Image Retrieval (2005)

Wen-cheng Lin, Yih-chen Chang, Hsin-hsi Chen

This paper explores the integration of textual and visual information for cross-language image retrieval. An approach which automatically transforms textual queries into visual...

Identifying Relevant Full-Text Articles for Database Curation (2005)

Chih Lee, Wen-juan Hou, Hsin-hsi Chen

In this paper, we propose an approach for identifying curatable articles from a large pool. Our system currently considers three parts of an article as three individual representations of the...

A Relevance Detection Approach to Gene Annotation (2005)

Wen-juan Hou, Chih Lee, Hsin-hsi Chen

Gene Ontology (GO) enables scientists to describe and annotate gene products with three controlled vocabularies. However, the nature of variation in terminology makes automatic annotation of gene...

Enhancing performance of protein and gene name recognizers with filtering and integration strategies (2004)

Wen-juan Hou, Hsin-hsi Chen

Annotating Multiple Types of Biomedical Entities: A Single Word Classification Approach (2004)

Chih Lee, Wen-juan Hou, Hsin-hsi Chen

Named entity recognition is a fundamental task in biomedical data mining. Multiple-class annotation is more challenging than single-class annotation. In this paper, we took a single word...

Identifying relevant full-text articles for GO annotation without MeSH terms. The Thirteenth Text (2004)

Chih Lee, Wen-juan Hou, Hsin-hsi Chen

Gene Ontology (GO) is a controlled vocabulary. Given a gene product, GO enables scientists to clearly and unambiguously describe specific molecular functions of the gene product, specific biological...

Cross-Language Image Retrieval via Spoken Query (2004)

Wen-cheng Lin, Ming-shun Lin, Hsin-hsi Chen

This paper studies cross-language cross-medium information retrieval. We introduce several approaches to unify the languages and media of queries and documents. We experiment on cross-language image...

Spoken Cross-Language Access to Image Collection via Captions (2003)

Hsin-hsi Chen

This paper presents a framework of using Chinese speech to access images via English captions. The formulation and the structure mapping rules of Chinese and English named entities are extracted from...

Learning formulation and transformation rules for multilingual named entities (2003)

Hsin-hsi Chen, Changhua Yang, Ying Lin

This paper investigates three multilingual named entity corpora, including named people, named locations and named organizations. Frequency-based approaches with and without dictionary are proposed...

Building a Chinese-English WordNet for Translingual Applications (2002)

Hsin-hsi Chen, Chi-ching Lin, Wen-cheng Lin

A WordNet-like linguistic resource is useful, but difficult to construct. This article proposes a method to integrate five linguistic resources, including English/Chinese sense-tagged corpora,...

Overview of CLIR task at the third NTCIR workshop (2002)

Kuang-hua Chen, Hsin-hsi Chen, Noriko K, Kazuko Kuriyama, Sukhoon Lee, Sung Hyon Myaeng, ...

This report is an overview of Cross-Language Information Retrieval Task (CLIR) at the third NTCIR Workshop. There are 3 tracks in CLIR: Single Language IR (SLIR), Bilingual CLIR (BLIR), and...

Backward Machine Transliteration by Learning Phonetic Similarity (2002)

Wei-hao Lin, Hsin-hsi Chen

In many cross-lingual applications we need to convert a transliterated word into its original word. In this paper, we present a similarity-based framework to model the task...

Some Similarity Computation Methods in Novelty Detection (2002)

Ming-feng Tsai, Hsin-hsi Chen

In the novelty task, the amount of information of a sentence that can be used in similarity computation is the major challenging issue. Some sort of information expansion methods was introduced to...

The Chinese text retrieval tasks of NTCIR workshop 2 (2001)

Kuang-hua Chen, Hsin-hsi Chen

This paper is a report of Chinese Text Retrieval (CHTR) tasks in NTCIR Workshop 2. CHTR tasks fall into two categories: Chinese-Chinese IR (CHIR) and English-Chinese IR (ECIR). The definitions,...

Mining tables from large scale HTML texts (2000)

Hsin-hsi Chen, Shih-chung Tsai, Jin-he Tsai

Table is a very common presentation scheme, but few papers touch on table extraction in text data mining. This paper focuses on mining tables from large-scale HTML texts. Table filtering,...

A multilingual news summarizer (2000)

Hsin-hsi Chen

Huge multilingual news articles are reported and disseminated on the Internet. How to extract the key information and save the reading time is a crucial issue. This paper proposes architecture of...

A multilingual news summarizer (2000)

Hsin-hsi Chen, Chuan-jie Lin

Huge multilingual news articles are reported and disseminated on the Internet. How to extract the key information and save the reading time is a crucial issue. This paper proposes architecture of...

A Summarization System for Chinese News from Multiple Sources (1999)

Hsin-hsi Chen, Sheng-jie Huang

This paper will propose a personal news secretariat that helps on-line readers absorb news information from multiple sources. Such a news secretariat eliminates the redundant information in the news,...

A Summarization System for Chinese News from Multiple Sources (1999)

Hsin-hsi Chen, June-jei Kuo, Sheng-jie Huang, Chuan-jie Lin, Hung-chia Wung

This article proposes a summarization system for multiple documents. It employs not only named entities and other signatures to cluster news from different sources, but also employs punctuation...

Resolving Translation Ambiguity and Target Polysemy in Cross-Language Information Retrieval (1999)

Hsin-hsi Chen, Guo-wei Bian, Wen-cheng Lin

This paper deals with translation ambiguity and target polysemy problems together. Two monolingual balanced corpora are employed to learn word co-occurrence for translation ambiguity resolution, and...

Description of the NTU Japanese-English crosslingual information retrieval system used for NTCIR workshop (1999)

Chuan-jie Lin, Wen-cheng Lin, Guo-wei Bian, Hsin-hsi Chen

This paper describes a Japanese-English Cross-Language Information Retrieval (CLIR) System for the evaluation

Breiman L.: Arcing classifiers (1998)

Chuan-jie Lin, Che-chia Liu, Hsin-hsi Chen

Captions in videos contain valuable information for video retrieval. Although texts in captions can be obtained easily in the new image compression formats like MPEG2, there still are many video...

A New Hybrid Approach for Chinese-English Query Translation (1998)

Guo-wei Bian, Hsin-hsi Chen

A new hybrid approach combining the dictionary-based and corpus-based approaches for Chinese-English cross-language information retrieval is proposed. The bilingual dictionary provides the...

Description of the NTU System Used for MET2 (1998)

Hsin-hsi Chen, Yung-wei Ding, Shih-chung Tsai, Guo-wei Bian

Named entities form the major components in a document. When we catch the fundamental entities, we can understand a document to some degree. This paper employs different types of information from...

Proper name translation in cross-language information retrieval (1998)

Hsin-hsi Chen, Sheng-jie Huang, Yung-wei Ding, Shih-chung Tsai

Recently, language barrier becomes the major problem for people to search, retrieve, and understand WWW documents in different languages. This paper deals with query translation issue in...

White Page Construction from Web Pages for Finding People on the Internet (1998)

Hsin-Hsi Chen, Guo-wei Bian

This paper proposes a method to extract proper names and their associated information from web pages for Internet/Intranet users automatically. The information extracted from World Wide Web documents...

Applying Repair Processing in Chinese Homophone Disambiguation (1997)

Yue-shi Lee, Hsin-hsi Chen

Repair processing plays an important role in spoken language processing systems. This paper proposes a method for correcting Chinese repetition repairs and demonstrates the effects of repair...

An MT Meta-Server for Information Retrieval on WWW (1997)

Guo-wei Bian, Hsin-hsi Chen

In the past few years, the World Wide Web (WWW) grows explosively and has become the most useful and powerful information retrieval and accessing system in the Internet. However, the language barrier...

A Hybrid Approach to Machine Translation System Design (1996)

Kuang-hua Chen, Hsin-hsi Chen

It is difficult for pure statistics-based machine translation systems to process long sentences. In addition, the domain dependent problem is a key issue under such a framework. Pure rule-based...

Analysis of Error Count Distribution for Improving the Postprocessing Performance of OCCR (1996)

Yue-shi Lee, Hsin-hsi Chen

Contextual language processing plays an important role for the postprocessing of OCR. Its effects are demonstrated by many proposed systems. In general, it performs well. However, its performance is...

Identification and classification of proper nouns in Chinese texts (1996)

Hsin-hsi Chen, Jen-chang Lee

Various strategies are proposed to identify and classify three types of proper nouns in Chinese texts. Clues from character, sentence and paragraph levels are employed to resolve Chinese personal...

A Rule-Based and MT-Oriented Approach to Prepositional Phrases Attachment (1996)

Kuang-hua Chen, Hsin-hsi Chen

Prepositional Phrase is the key issue in structural ambiguity. Recently, researches in corpora provide the lexical cue of...

A Rule-Based and MT-Oriented Approach to Prepositional Phrase Attachment (1996)

Kuang-hua Chen, Hsin-Hsi Chen

Prepositional Phrase is the key issue in structural ambiguity. Recently, researches in corpora provide the lexical cue of prepositions with other words and the information could be used to partly...

Analysis of Error Count Distributions for Improving the Postprocessing Performance of OCCR (1996)

Yue-Shi Lee, Hsin-Hsi Chen

Contextual language processing plays an important role for the postprocessing of OCR. Its effects are demonstrated by many proposed systems. In general, it performs well. However, its performance is...

A Corpus-Based Approach to Text Partition (1995)

Kuang-hua Chen, Hsin-hsi Chen

A text partition model is proposed to determine the boundaries of discourse structures. It is based on association of noun-noun relations and noun-verb relations defined on discourse level and...

Aligning Bilingual corpus: Especially for Language Pairs from Different Families (1995)

Kuang-hua Chen, Hsin-hsi Chen

Rather than using length-based or translation-based criterion to align bilingual texts, this paper proposes a part-of-speech-based (POS-based) criterion. The postulation is that bilingual texts...

Development of Partially Bracketed Corpus with Part-of-Speech Information Only (1995)

Hsin-hsi Chen, Yue-shi Lee

Research based on a treebank is active for many natural language applications. However, the work to build a large scale treebank is laborious and tedious. This paper proposes a probabilistic chunker...

Machine Translation: An Integrated Approach (1995)

Kuang-hua Chen, Hsin-hsi Chen

A pure statistics-based machine translation system is usually incapable of processing long sentences and is usually domain dependent. A pure rule-based machine translation system involves many costs...

Aligning Parallel Chinese-English Texts Using Multiple Clues (1995)

Hsin-hsi Chen, Yeong-yui Wu

Parallel texts bring much linguistic information, so that they can be applied to word-sense disambiguation, extraction of translation templates, automatic translation of noun compounds, construction...

Approximate N-Gram Markov Model for Natural Language Generation (1994)

Hsin-Hsi Chen, Yue-Shi Lee

This paper proposes an Approximate n-gram Markov Model for bag generation. Directed word association pairs with distances are used to approximate (n-1)-gram and n-gram training tables. This model has...

A Corrective Training Algorithm for Adaptive Learning in Bag Generation (1994)

Hsin-Hsi Chen, Yue-Shi Lee

The sampling problem in training corpus is one of the major sources of errors in corpus-based applications. This paper proposes a corrective training algorithm to best-fit the run-time context domain...

Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation (1994)

Kuang-hua Chen, Hsin-Hsi Chen

To acquire noun phrases from running texts is useful for many applications, such as word grouping, terminology indexing, etc. The reported literatures adopt pure probabilistic approach, or pure...

Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and its Automatic Evaluation (1994)

Kuang-hua Chen, Hsin-hsi Chen

To acquire noun phrases from running texts is useful for many applications, such as word grouping, terminology indexing, etc. The reported literatures adopt pure probabilistic approach, or pure...

Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation (1994)

Kuang-hua Chen, Hsin-hsi Chen

phrases. The partial parser is motivated by an intuition (Abney, 1991): To acquire noun phrases from running texts is useful for many applications, such as word grouping, terminology indexing, etc....

The Contextual Analysis of Chinese Sentences with Punctuation Marks (1994)

Hsin-Hsi Chen

By corpus analysis, about 75% of Chinese sentences are composed of more than two sentence segments separated by commas or semicolons. A segment may be a sentence, a noun phrase, a verb phrase,...

A Probabilistic Chunker (1993)

Kuang-hua Chen, Hsin-hsi Chen

This paper proposes a probabilistic partial parser, which we call chunker. The chunker partitions the input sentence into segments. This idea is motivated by the fact that when we read a sentence, we...

The Transfer of Anaphors in Translation (1992)

Hsin-Hsi Chen

This paper adopts Government-Binding (GB) Theory and some Chinese-specific conditions to explain the various uses of anaphors such as the reflexive, pronoun, PRO, trace, pro and variable in Mandarin...