Min-yen Kan

Publication List Details

Period

1996 - 2009

Number

100

Ordering Phrases with Function Words (2009)

Hendra Setiawan, Min-yen Kan, Haizhou Li

This paper presents a Function Word centered, Syntax-based (FWS) solution to address phrase ordering in the context of statistical machine translation (SMT). Motivated by the observation that...

The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics (2009)

Steven Bird, Robert Dale, Bonnie J. Dorr, Bryan Gibson, Mark T. Joseph, Min-yen Kan, ...

The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of...

NUROP – Augmenting Focused Crawling using Search Engine Queries (2009)

Xuan Wang, Min-yen Kan

The pervasiveness of the Internet makes it an ideal medium for sharing scholarly information. Nowadays, many authors post their publications online so that others may easily access them,...

Paraphrase Recognition via Dissimilarity Significance Classification (2009)

Long Qiu, Min-yen Kan, Tat-seng Chua

We propose a supervised, two-phase framework to address the problem of paraphrase recognition (PR). Unlike most PR systems that focus on sentence similarity, our framework detects dissimilarities...

Extending corpus-based identification of light verb constructions using a supervised learning framework (2009)

Yee Fan Tan, Min-yen Kan, Hang Cui

NUS at DUC 2007: Using Evolutionary Models of Text (2008)

Ziheng Lin, Tat-seng Chua, Min-yen Kan, Wee Sun Lee, Long Qiu, Shiren Ye

This paper presents our new, query-based multi-document summarization system used in DUC 2007. Current graph-based approaches to text summarization, such as TextRank and LexRank, assume a static graph...

Paraphrase Recognition via Dissimilarity Significance Classification (2008)

Long Qiu, Min-yen Kan, Tat-seng Chua

We propose a supervised, two-phase framework to address the problem of paraphrase recognition (PR). Unlike most PR systems that focus on sentence similarity, our framework detects dissimilarities...

Extending corpus-based identification of light verb constructions using a supervised learning framework (2008)

Yee Fan Tan, Min-yen Kan, Hang Cui

LyricAlly: Automatic Synchronization of Textual Lyrics to Acoustic Music Signals (2008)

Min-yen Kan, Ye Wang, Denny Isk, Tin Lay Nwe, Arun Shenoy

We present LyricAlly, a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke. We tackle...

Web Page Categorization without the Web Page (2008)

Min-yen Kan

Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can hint at the category of the resource. This paper explores the use of...

Web Based Linkage (2008)

Ergin Elmacioglu, Min-yen Kan, Dongwon Lee, Yi Zhang

When a variety of names are used for the same real-world entity, the problem of detecting all such variants has been known as the (record) linkage or entity resolution problem. In this paper, toward...

Detecting and supporting known item queries in online public access catalogs (2008)

Min-yen Kan

When users seek to find specific resources in a digital library, they often use the library catalog to locate them. These catalog queries are defined as known item queries. As known item queries...

Ordering Phrases with Function Words (2008)

Hendra Setiawan, Min-yen Kan

This paper presents a Function Word centered, Syntax-based (FWS) solution to address phrase ordering in the context of statistical machine translation (SMT). Motivated by the observation that...

Supervised Categorization of JavaScript™ using Program Analysis Features (2008)

Wei Lu, Min-yen Kan

Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret or provide crucial...

Record Matching in Digital Library Metadata (2008)

Min-yen Kan, Yee Fan Tan

When data stores grow large, data quality, cleaning, and integrity become issues. The commercial sector spends a massive amount of time and energy canonicalizing customer and product records as their...

Record Matching in Digital Library Metadata (2008)

Min-yen Kan, Yee Fan Tan

When data stores grow large, data quality, cleaning and integrity become issues. The commercial sector spends a massive amount of time and energy canonicalizing customer and product records, as their...

Supervised Categorization of JavaScript™ using Program Analysis Features (2008)

Wei Lu, Min-yen Kan

Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret or...

Soft Pattern Matching Models for Definitional Question Answering (2008)

Hang Cui, Min-yen Kan, Tat-seng Chua

We explore probabilistic lexico-syntactic pattern matching, also known as soft pattern matching, in a definitional question answering system. Most current systems use regular expression-based hard...

Fast Webpage Classification Using URL Features (2008)

Min-yen Kan, Hoang Oanh, Nguyen Thi

We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is faster than typical web page classification, as the pages do not have...

In this demonstration, we present several integrated components (2008)

Noemie Elhadad, Min-yen Kan, Simon Lok, A Muresan, Of Image

to provide personalized access to a distributed digital library of medical literature and consumer health information. The global system architecture of PERSIVAL is best described as a two-stage...

Natural Language Processing; General Terms (2008)

Renxu Sun, Hang Cui, Keya Li, Min-yen Kan, Tat-seng Chua

Open domain question answering (QA) has become a popular research area in recent years. Most current QA systems search for answers in three major stages: document retrieval, passage retrieval and...

Soft Pattern Matching Models for Definitional Question Answering (2008)

Hang Cui, Min-yen Kan

We explore probabilistic lexico-syntactic pattern matching, also known as soft pattern matching, in a definitional question answering system. Most current systems use regular expression based hard...

Search Engine Driven Author Disambiguation (2008)

Yee Fan Tan, Min-yen Kan

In scholarly digital libraries, author disambiguation is an important task that attributes a scholarly work with specific authors. This is critical when individuals share the same name. We present an...

Using Librarian Techniques in Automatic Text Summarization for Information Retrieval (2008)

Min-yen Kan

A current application of automatic text summarization is to provide an overview of relevant documents coming from an information retrieval (IR) system. This paper examines how Centrifuser, one such...

The Eurospider Retrieval System and the TREC-8 Cross-Language Track (2007)

Martin Braschler, Min-yen Kan, Peter Schäuble, Judith L. Klavans

This year the Eurospider team, with help from Columbia, focused on trying different combinations of translation approaches. We investigated the use and integration of pseudo-relevance feedback,...

Extracting Japanese Domain and Technical Terms is Relatively Easy (2007)

Pascale Fung, Min-yen Kan, Yurie Horita

We argue that the important task of extracting domain and technical terms is much easier in Japanese than is commonly believed, and that technical term extraction should and can be more widely used....

Synthesizing composite topic structure trees for multiple domain specific documents (2007)

Min-yen Kan, Judith L. Klavans, Kathleen R. McKeown

Rich and Dynamic Library Catalogs: A Case Study of Online Search Interfaces (2007)

GOZALI, Jesse Prabawa, KAN, Min-Yen

We redesign the user interface of an online library catalog, leveraging current web technologies that allow dynamic and fine-grained user interaction. Over the course of our iterative design and test...

Rich and Dynamic Library Catalogs: A Case Study of Online Search Interfaces (2007)

PRABAWA, Jesse Gozali, KAN, Min-Yen

We redesign the user interface of an online library catalog, leveraging current web technologies that allow dynamic and fine-grained user interaction. Over the course of our iterative design and test...

Document concept lattice for text understanding and summarization (2007)

Shiren Ye, Tat-seng Chua, Min-yen Kan, Long Qiu

We argue that the quality of a summary can be evaluated based on how many concepts in the original document(s) can be preserved after summarization. Here, a concept refers to an abstract or...

Timestamped Graphs: Evolutionary models of text for multidocument summarization (2007)

Ziheng Lin, Min-yen Kan

Current graph-based approaches to automatic text summarization, such as LexRank and TextRank, assume a static graph which does not model how the input texts emerge. A suitable evolutionary text graph...

Rich and Dynamic Library Catalogs: A Case Study of Online Search Interfaces (2007)

Jesse Prabawa Gozali, Min-yen Kan, Jaffar Joxan

tutorial article, which has been submitted for publication in a journal or for consideration by the commissioning organization. The report represents the ideas of its author, and should not be taken...

Adaptive Sorted Neighborhood Methods for Efficient Record Linkage (2007)

Su Yan, Dongwon Lee, Min-yen Kan, C. Lee Giles

Traditionally, record linkage algorithms have played an important role in maintaining digital libraries - i.e., identifying matching citations or authors for consolidation in updating or integrating...

Keyphrase extraction in scientific publications (2007)

Thuy Dung Nguyen, Min-yen Kan

We present a keyphrase extraction algorithm for scientific publications. Different from previous work, we introduce features that capture the positions of phrases in document with respect...

Video retrieval using high level features: Exploiting query matching and confidence-based weighting (2006)

Shi-yong Neo, Jin Zhao, Min-yen Kan, Tat-seng Chua

Recent research in video retrieval has focused on automated, high-level feature indexing on shots or frames. One important application of such indexing is to support precise video retrieval....

NPIC: Hierarchical synthetic image classification using image search and generic features, CIVR’06 (2006)

Fei Wang, Min-yen Kan

We introduce NPIC, an image classification system that focuses on synthetic (e.g., non-photographic) images. We use class-specific keywords in an image search engine to create a noisily...

Fast Webpage Classification Using URL Features (2005)

KAN, Min-Yen, NGUYEN THI, Hoanh Oanh

We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is magnitudes faster than typical web page classification, as the pages...

Extending corpus-based identification of light verb constructions using a supervised learning framework (2005)

TAN, Yee Fan, KAN, Min-Yen, CUI, Hang

Light verb constructions (LVC) such as "make a call" and "give a presentation" pose challenges for natural language processing and understanding. We propose corpus-based methods to automatically...

NUS at DUC 2005: Understanding documents via concept links (2005)

Shiren Ye, Long Qiu, Tat-seng Chua, Min-yen Kan

The primary goal of our participation in DUC 2005 is two-fold. One is to benchmark the performance of a method of computing sentence semantic similarity. The other is to test the effectiveness of a...

Question answering passage retrieval using dependency relations (2005)

Hang Cui, Renxu Sun, Keya Li, Min-yen Kan, Tat-seng Chua

State-of-the-art question answering (QA) systems employ term-density ranking to retrieve answer passages. Such methods often retrieve incorrect passages as relationships among question terms are not...

A Public Reference Implementation of the RAP Anaphora Resolution Algorithm (2004)

Qiu, Long, Kan, Min-Yen, Chua, Tat-Seng

This paper describes a standalone, publicly-available implementation of the Resolution of Anaphora Procedure (RAP) given by Lappin and Leass (1994). The RAP algorithm resolves third person pronouns,...

Unsupervised Learning of Soft Patterns for Generating Definitions from Online News (2004)

Cui, Hang, Kan, Min-Yen, Chua, Tat-Seng

Breaking news often contains timely definitions and descriptions of current terms, organizations and personalities. We utilize such web sources to construct definitions for such terms. Previous work...

LyricAlly: automatic synchronization of acoustic musical signals and textual lyrics (2004)

Ye Wang, Min-yen Kan, Tin Lay Nwe, Arun Shenoy, Jun Yin

We present a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke. We tackle this problem using a...

A Comparative Study on Sentence Retrieval for Definitional Question Answering (2004)

Hang Cui, Min-yen Kan, Tat-seng Chua, Jing Xiao

Most definitional question answering (QA) systems integrate statistical ranking using Web and WordNet as external resources and pattern matching to retrieve relevant sentences for further processing....

A public reference implementation of the RAP anaphora resolution algorithm (2004)

Long Qiu, Min-yen Kan, Tat-seng Chua

This paper describes a standalone, publicly-available implementation of the Resolution of Anaphora Procedure (RAP) given by Lappin and Leass (1994). The RAP algorithm resolves third person pronouns,...

Web Page Categorization without the Web Page (2004)

Min-yen Kan

Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can hint at the category of the resource. This paper explores the use of...

Metadata extraction and text categorization using Universal Resource Locator expansions (2003)

Min-Yen KAN

Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can indicate metadata about a resource. This paper explores the mining of...

Automatic text summarization as applied to information retrieval: Using indicative and informative summaries (2003)

Kan, Min-Yen

I identify weaknesses with the standard “ranked list of documents” information retrieval user interface by examining the search process as performed in the traditional library by professional...

Metadata extraction and text categorization using Universal Resource Locator expansions (2003)

Min-yen Kan

Qualifier in TREC-12 QA main task (2003)

Hui Yang, Hang Cui, Mstislav Maslennikov, Long Qiu, Min-yen Kan, Tat-seng Chua

This paper describes a question answering system and its various modules to solve definition, factoid and list questions defined in the TREC12 Main task. In particular, we tackle the factoid QA task...

Qualifier in TREC-12 QA main task (2003)

Hui Yang, Hang Cui, Min-yen Kan, Mstislav Maslennikov, Long Qiu, Tat-seng Chua

This paper describes a question answering system and its various models handling the definition, factoid and list questions defined in the TREC12 Main task. Specifically, we model the factoid QA task as...

Using the Annotated Bibliography as a Resource for Indicative Summarization (2002)

Kan, Min-Yen, Klavans, Judith L., McKeown, Kathleen R.

We report on a language resource consisting of 2000 annotated bibliography entries, which is being analyzed as part of our research on indicative document summarization. We show how annotated...

Using the Annotated Bibliography as a Resource for Indicative Summarization (2002)

Min-yen Kan, Judith L. Klavans, Kathleen R. McKeown

We report on a language resource consisting of 2000 annotated bibliography entries, which is being analyzed as part of our research on indicative document summarization. We show how annotated...

Applying Natural Language Generation to Indicative Summarization (2001)

Kan, Min-Yen, McKeown, Kathleen R., Klavans, Judith L.

The task of creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a...

Combining visual layout and lexical cohesion features for text segmentation (2001)

Kan, Min-Yen

We propose integrating features from lexical cohesion with elements from layout recognition to build a composite framework. We use supervised machine learning on this composite feature set to derive...

Synthesizing composite topic structure trees for multiple domain specific documents (2001)

Kan, Min-Yen, McKeown, Kathleen R., Klavans, Judith L.

Domain specific texts often have implicit rules on content and organization. We introduce a novel method for synthesizing this topical structure. The system uses corpus examples and recursively merges...

Simfinder: A flexible clustering tool for summarization (2001)

Vasileios Hatzivassiloglou, Judith L. Klavans, Melissa L. Holcombe, Regina Barzilay, Min-yen Kan, Kathleen R. McKeown

We present a statistical similarity measuring and clustering tool, SIMFINDER, that organizes small pieces of text from one or multiple documents into tight clusters. By placing highly related text...

Combining visual layout and lexical cohesion features for text segmentation (2001)

Min-yen Kan

We propose integrating features from lexical cohesion with elements from layout recognition to build a composite framework. We use supervised machine learning on this composite feature set to derive...

SIMFINDER: A Flexible Clustering Tool for Summarization (2001)

Vasileios Hatzivassiloglou, Judith L. Klavans, Melissa L. Holcombe, Regina Barzilay, Min-yen Kan, Kathleen R. McKeown

We present a statistical similarity measuring and clustering tool, SIMFINDER, that organizes small pieces of text from one or multiple documents into tight clusters. By placing highly related text...

Applying natural language generation to indicative summarization (2001)

Min-yen Kan, Kathleen R. McKeown

The task of creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a...

Information Extraction and Summarization: Domain Independence through Focus Types (1999)

Kan, Min-Yen, McKeown, Kathleen R.

We show how information extraction (IE) and summarization can be merged in a sequential pipeline, resulting in a new approach to domain-independent summarization. IE finds the document's terms and...

The Eurospider Retrieval System and the TREC-8 Cross-Language Track (1999)

Martin Braschler, Min-yen Kan, Peter Schäuble, Judith L. Klavans

This year the Eurospider team, with help from Columbia, focused on trying different combinations of translation approaches. We investigated the use and integration of pseudo-relevance feedback,...

Resources for Evaluation of Summarization Techniques (1998)

Klavans, Judith L., McKeown, Kathleen R., Kan, Min-Yen, Lee, Susan

We report on two corpora to be used in the evaluation of component systems for the tasks of (1) linear segmentation of text and (2) summary-directed sentence extraction. We present characteristics of...

Linear Segmentation and Segment Significance (1998)

Kan, Min-Yen, Klavans, Judith L., McKeown, Kathleen R.

We present a new method for discovering a segmental discourse structure of a document while categorizing segment function. We demonstrate how retrieval of noun phrases and pronominal forms, along...

The Role of Verbs in Document Analysis (1998)

Klavans, Judith L., Kan, Min-Yen

We present results of two methods for assessing the event profile of news articles as a function of verb type. The unique contribution of this research is the focus on the role of verbs, rather than...

Role of verbs in document analysis (1998)

Judith Klavans, Min-yen Kan

We present results of two methods for assessing the event profile of news articles as a function of verb type. The unique contribution of this research is the focus on the role of verbs, rather than...

Linear segmentation and segment significance (1998)

Min-yen Kan, Judith L. Klavans, Kathleen R. McKeown

We present a new method for discovering a segmental discourse structure of a document while categorizing each segment's function and importance. Segments are determined by a zero-sum weighting...

Linear Segmentation and Segment Significance (1998)

Min-yen Kan, Judith L. Klavans, Kathleen R. McKeown

We present a new method for discovering a segmental discourse structure of a document while categorizing each segment's function and importance. Segments are determined by a zero-sum weighting...

Role of Verbs in Document Analysis (1998)

Judith Klavans, Min-yen Kan

We present results of two methods for assessing the event profile of news articles as a function of verb type. The unique contribution of this research is the focus on the role of verbs, rather than...

Resources for Evaluation of Summarization Techniques (1998)

Judith L. Klavans, Kathleen R. McKeown, Min-Yen Kan, Susan Lee

We report on two corpora to be used in the evaluation of component systems for the tasks of (1) linear segmentation of text and (2) summary-directed sentence extraction. We present characteristics of...

Extracting Japanese Domain and Technical Terms is Relatively Easy (1996)

Min-yen Kan, Yurie Horita

We argue that the important task of extracting domain and technical terms is much easier in Japanese than is commonly believed, and that technical term extraction should and can be more widely used....

Extracting Japanese Domain and Technical Terms is Relatively Easy (1996)

Pascale Fung, Min-yen Kan, Yurie Horita

We argue that the important task of extracting domain and technical terms is much easier in Japanese than is commonly believed, and that technical term extraction should and can be more widely used....