Ordering Phrases with Function Words (2009)
Hendra Setiawan, Min-yen Kan, Haizhou Li
This paper presents a Function Word centered, Syntax-based (FWS) solution to address phrase ordering in the context of statistical machine translation (SMT). Motivated by the observation that...
Steven Bird, Robert Dale, Bonnie J. Dorr, Bryan Gibson, Mark T. Joseph, Min-yen Kan, ...
The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of...
The pervasiveness of the Internet makes it an ideal medium for sharing scholarly information. Nowadays, many authors post their publications online so that others may easily access them,...
Paraphrase Recognition via Dissimilarity Significance Classification (2009)
Long Qiu, Min-yen Kan, Tat-seng Chua
We propose a supervised, two-phase framework to address the problem of paraphrase recognition (PR). Unlike most PR systems that focus on sentence similarity, our framework detects dissimilarities...
Yee Fan Tan, Min-yen Kan, Hang Cui
corpus-based identification of light verb constructions
NUS at DUC 2007: Using Evolutionary Models of Text (2008)
Ziheng Lin, Tat-seng Chua, Min-yen Kan, Wee Sun Lee, Long Qiu, Shiren Ye
This paper presents our new, query-based multi-document summarization system used in DUC 2007. Current graph-based approaches to text summarization, such as TextRank and LexRank, assume a static graph...
Paraphrase Recognition via Dissimilarity Significance Classification (2008)
Long Qiu, Min-yen Kan, Tat-seng Chua
We propose a supervised, two-phase framework to address the problem of paraphrase recognition (PR). Unlike most PR systems that focus on sentence similarity, our framework detects dissimilarities...
LyricAlly: Automatic Synchronization of Textual Lyrics to Acoustic Music Signals (2008)
Min-yen Kan, Ye Wang, Denny Isk, Tin Lay Nwe, Arun Shenoy
We present LyricAlly, a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke. We tackle...
and Retrieval]: Content Analysis and Indexing – Linguistic (2008)
Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can hint at the category of the resource. This paper explores the use of...
Web Based Linkage (2008)
Ergin Elmacioglu, Min-yen Kan, Dongwon Lee, Yi Zhang
When a variety of names are used for the same real-world entity, the problem of detecting all such variants has been known as the (record) linkage or entity resolution problem. In this paper, toward...
Detecting and supporting known item queries in online public access catalogs (2008)
When users seek to find specific resources in a digital library, they often use the library catalog to locate them. These catalog queries are defined as known item queries. As known item queries...
Ordering Phrases with Function Words (2008)
This paper presents a Function Word centered, Syntax-based (FWS) solution to address phrase ordering in the context of statistical machine translation (SMT). Motivated by the observation that...
Supervised Categorization of JavaScript™ using Program Analysis Features (2008)
Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret or provide crucial...
When data stores grow large, data quality, cleaning, and integrity become issues. The commercial sector spends a massive amount of time and energy canonicalizing customer and product records as their...
Record Matching in Digital Library Metadata (2008)
When data stores grow large, data quality, cleaning and integrity become issues. The commercial sector spends a massive amount of time and energy canonicalizing customer and product records, as their...
Supervised Categorization of JavaScript™ using Program Analysis Features (2008)
Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret or...
Soft Pattern Matching Models for Definitional Question Answering (2008)
Hang Cui, Min-yen Kan, Tat-seng Chua
We explore probabilistic lexico-syntactic pattern matching, also known as soft pattern matching, in a definitional question answering system. Most current systems use regular expression-based hard...
Storage and Retrieval]: Content Analysis and Indexing – Linguistic Processing (2008)
Min-yen Kan, Hoang Oanh Nguyen Thi
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is faster than typical web page classification, as the pages do not have...
In this demonstration, we present several integrated components (2008)
Noemie Elhadad, Min-yen Kan, Simon Lok, A Muresan
to provide personalized access to a distributed digital library of medical literature and consumer health information. The global system architecture of PERSIVAL is best described as a two-stage...
Natural Language Processing; General Terms (2008)
Renxu Sun, Hang Cui, Keya Li, Min-yen Kan, Tat-seng Chua
Open domain question answering (QA) has become a popular research area in recent years. Most current QA systems search for answers in three major stages: document retrieval, passage retrieval and...
Soft Pattern Matching Models for Definitional Question Answering (2008)
We explore probabilistic lexico-syntactic pattern matching, also known as soft pattern matching, in a definitional question answering system. Most current systems use regular expression based hard...
Search Engine Driven Author Disambiguation (2008)
In scholarly digital libraries, author disambiguation is an important task that attributes a scholarly work with specific authors. This is critical when individuals share the same name. We present an...
Using Librarian Techniques in Automatic Text Summarization for Information Retrieval (2008)
A current application of automatic text summarization is to provide an overview of relevant documents coming from an information retrieval (IR) system. This paper examines how Centrifuser, one such...
Bibliography Published Papers by Dragomir R. Radev References (2008)
Dragomir R. Radev, Alfred Aho, Shih-fu Chang, Kathleen McKeown, Dragomir Radev, James Allan, ...
retrieval and language modeling. SIGIR Forum, 37(1), March
Martin Braschler, Min-yen Kan, Peter Schäuble, Judith L. Klavans
This year the Eurospider team, with help from Columbia, focused on trying different combinations of translation approaches. We investigated the use and integration of pseudo-relevance feedback,...
Extracting Japanese Domain and Technical Terms is Relatively Easy (2007)
Pascale Fung, Min-yen Kan, Yurie Horita
We argue that the important task of extracting domain and technical terms is much easier in Japanese than is commonly believed, and that technical term extraction should and can be more widely used....
Min-yen Kan, Judith L. Klavans, Kathleen R. McKeown
composite topic structure trees for multiple domain
Rich and Dynamic Library Catalogs: A Case Study of Online Search Interfaces (2007)
GOZALI, Jesse Prabawa, KAN, Min-Yen
We redesign the user interface of an online library catalog, leveraging current web technologies that allow dynamic and fine-grained user interaction. Over the course of our iterative design and test...
Rich and Dynamic Library Catalogs: A Case Study of Online Search Interfaces (2007)
GOZALI, Jesse Prabawa, KAN, Min-Yen
We redesign the user interface of an online library catalog, leveraging current web technologies that allow dynamic and fine-grained user interaction. Over the course of our iterative design and test...
Document concept lattice for text understanding and summarization (2007)
Shiren Ye, Tat-seng Chua, Min-yen Kan, Long Qiu
We argue that the quality of a summary can be evaluated based on how many concepts in the original document(s) can be preserved after summarization. Here, a concept refers to an abstract or...
Timestamped Graphs: Evolutionary models of text for multidocument summarization (2007)
Current graph-based approaches to automatic text summarization, such as LexRank and TextRank, assume a static graph which does not model how the input texts emerge. A suitable evolutionary text graph...
Rich and Dynamic Library Catalogs: A Case Study of Online Search Interfaces (2007)
Jesse Prabawa Gozali, Min-yen Kan, Jaffar Joxan
tutorial article, which has been submitted for publication in a journal or for consideration by the commissioning organization. The report represents the ideas of its author, and should not be taken...
Adaptive Sorted Neighborhood Methods for Efficient Record Linkage (2007)
Su Yan, Dongwon Lee, Min-yen Kan, C. Lee Giles
Traditionally, record linkage algorithms have played an important role in maintaining digital libraries - i.e., identifying matching citations or authors for consolidation in updating or integrating...
Keyphrase extraction in scientific publications (2007)
We present a keyphrase extraction algorithm for scientific publications. Different from previous work, we introduce features that capture the positions of phrases in document with respect...
Shi-yong Neo, Jin Zhao, Min-yen Kan, Tat-seng Chua
Recent research in video retrieval has focused on automated, high-level feature indexing on shots or frames. One important application of such indexing is to support precise video retrieval....
We introduce NPIC, an image classification system that focuses on synthetic (e.g., non-photographic) images. We use class-specific keywords in an image search engine to create a noisily...
Fast Webpage Classification Using URL Features (2005)
KAN, Min-Yen, NGUYEN THI, Hoang Oanh
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is magnitudes faster than typical web page classification, as the pages...
TAN, Yee Fan, KAN, Min-Yen, CUI, Hang
Light verb constructions (LVC) such as "make a call" and "give a presentation" pose challenges for natural language processing and understanding. We propose corpus-based methods to automatically...
NUS at DUC 2005: Understanding documents via concept links (2005)
Shiren Ye, Long Qiu, Tat-seng Chua, Min-yen Kan
The primary goal of our participation in DUC 2005 is two-fold. One is to benchmark the performance of a method of computing sentence semantic similarity. The other is to test the effectiveness of a...
Question answering passage retrieval using dependency relations (2005)
Hang Cui, Renxu Sun, Keya Li, Min-yen Kan, Tat-seng Chua
State-of-the-art question answering (QA) systems employ term-density ranking to retrieve answer passages. Such methods often retrieve incorrect passages as relationships among question terms are not...
A Public Reference Implementation of the RAP Anaphora Resolution Algorithm (2004)
Qiu, Long, Kan, Min-Yen, Chua, Tat-Seng
This paper describes a standalone, publicly-available implementation of the Resolution of Anaphora Procedure (RAP) given by Lappin and Leass (1994). The RAP algorithm resolves third person pronouns,...
Unsupervised Learning of Soft Patterns for Generating Definitions from Online News (2004)
Cui, Hang, Kan, Min-Yen, Chua, Tat-Seng
Breaking news often contains timely definitions and descriptions of current terms, organizations and personalities. We utilize such web sources to construct definitions for such terms. Previous work...
LyricAlly: automatic synchronization of acoustic musical signals and textual lyrics (2004)
Ye Wang, Min-yen Kan, Tin Lay Nwe, Arun Shenoy, Jun Yin
We present a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke. We tackle this problem using a...
A Comparative Study on Sentence Retrieval for Definitional Question Answering (2004)
Hang Cui, Min-yen Kan, Tat-seng Chua, Jing Xiao
Most definitional question answering (QA) systems integrate statistical ranking using Web and WordNet as external resources and pattern matching to retrieve relevant sentences for further processing....
A public reference implementation of the RAP anaphora resolution algorithm (2004)
Long Qiu, Min-yen Kan, Tat-seng Chua
This paper describes a standalone, publicly-available implementation of the Resolution of Anaphora Procedure (RAP) given by Lappin and Leass (1994). The RAP algorithm resolves third person pronouns,...
Web Page Categorization without the Web Page (2004)
Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can hint at the category of the resource. This paper explores the use of...
Metadata extraction and text categorization using Universal Resource Locator expansions (2003)
Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can indicate metadata about a resource. This paper explores the mining of...
I identify weaknesses with the standard “ranked list of documents” information retrieval user interface by examining the search process as performed in the traditional library by professional...
Metadata extraction and text categorization using Universal Resource Locator expansions (2003)
Qualifier in TREC-12 QA main task (2003)
Hui Yang, Hang Cui, Mstislav Maslennikov, Long Qiu, Min-yen Kan, Tat-seng Chua
This paper describes a question answering system and its various modules to solve definition, factoid and list questions defined in the TREC12 Main task. In particular, we tackle the factoid QA task...
Qualifier in TREC-12 QA main task (2003)
Hui Yang, Hang Cui, Min-yen Kan, Mstislav Maslennikov, Long Qiu, Tat-seng Chua
This paper describes a question answering system and its various models handling the definition, factoid and list questions defined in the TREC12 Main task. Specifically, we model the factoid QA task as...
Department: Computer Science.
Using the Annotated Bibliography as a Resource for Indicative Summarization (2002)
Kan, Min-Yen, Klavans, Judith L., McKeown, Kathleen R.
We report on a language resource consisting of 2000 annotated bibliography entries, which is being analyzed as part of our research on indicative document summarization. We show how annotated...
Using the Annotated Bibliography as a Resource for Indicative Summarization (2002)
Min-yen Kan, Judith L. Klavans, Kathleen R. McKeown
We report on a language resource consisting of 2000 annotated bibliography entries, which is being analyzed as part of our research on indicative document summarization. We show how annotated...
Min-yen Kan, Judith L. Klavans, Kathleen R. McKeown
We report on a language resource consisting of 2000 annotated bibliography entries, which is being analyzed as part of our research on indicative document summarization. We show how annotated...
Applying Natural Language Generation to Indicative Summarization (2001)
Kan, Min-Yen, McKeown, Kathleen R., Klavans, Judith L.
The task of creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a...
Combining visual layout and lexical cohesion features for text segmentation (2001)
We propose integrating features from lexical cohesion with elements from layout recognition to build a composite framework. We use supervised machine learning on this composite feature set to derive...
Synthesizing composite topic structure trees for multiple domain specific documents (2001)
Kan, Min-Yen, McKeown, Kathleen R., Klavans, Judith L.
Domain specific texts often have implicit rules on content and organization. We introduce a novel method for synthesizing this topical structure. The system uses corpus examples and recursively merges...
Applying natural language generation to indicative summarization (2001)
Min-yen Kan, Kathleen R. McKeown
The task of creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from...
Simfinder: A flexible clustering tool for summarization (2001)
Vasileios Hatzivassiloglou, Judith L. Klavans, Melissa L. Holcombe, Regina Barzilay, Min-yen Kan, Kathleen R. McKeown
We present a statistical similarity measuring and clustering tool, SIMFINDER, that organizes small pieces of text from one or multiple documents into tight clusters. By placing highly related text...
Applying natural language generation to indicative summarization (2001)
Min-yen Kan, Kathleen R. McKeown, Judith L. Klavans
Workshop on
Domain-specific informative and indicative summarization for information retrieval (2001)
Min-yen Kan, Kathleen R. McKeown, Judith L. Klavans
retrieval
Combining visual layout and lexical cohesion features for text segmentation (2001)
We propose integrating features from lexical cohesion with elements from layout recognition to build a composite framework. We use supervised machine learning on this composite feature set to derive...
SIMFINDER: A Flexible Clustering Tool for Summarization (2001)
Vasileios Hatzivassiloglou, Judith L. Klavans, Melissa L. Holcombe, Regina Barzilay, Min-yen Kan, Kathleen R. McKeown
We present a statistical similarity measuring and clustering tool, SIMFINDER, that organizes small pieces of text from one or multiple documents into tight clusters. By placing highly related text...
Applying natural language generation to indicative summarization (2001)
Min-yen Kan, Kathleen R. McKeown
The task of creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a...
Information Extraction and Summarization: Domain Independence through Focus Types (1999)
Kan, Min-Yen, McKeown, Kathleen R.
We show how information extraction (IE) and summarization can be merged in a sequential pipeline, resulting in a new approach to domain-independent summarization. IE finds the document's terms and...
The Eurospider Retrieval System and the TREC-8 Cross-Language Track (1999)
Martin Braschler, Min-yen Kan, Peter Schäuble, Judith L. Klavans
This year the Eurospider team, with help from Columbia, focused on trying different combinations of translation approaches. We investigated the use and integration of pseudo-relevance feedback,...
Resources for Evaluation of Summarization Techniques (1998)
Klavans, Judith L., McKeown, Kathleen R., Kan, Min-Yen, Lee, Susan
We report on two corpora to be used in the evaluation of component systems for the tasks of (1) linear segmentation of text and (2) summary-directed sentence extraction. We present characteristics of...
Linear Segmentation and Segment Significance (1998)
Kan, Min-Yen, Klavans, Judith L., McKeown, Kathleen R.
We present a new method for discovering a segmental discourse structure of a document while categorizing segment function. We demonstrate how retrieval of noun phrases and pronominal forms, along...
The Role of Verbs in Document Analysis (1998)
Klavans, Judith L., Kan, Min-Yen
We present results of two methods for assessing the event profile of news articles as a function of verb type. The unique contribution of this research is the focus on the role of verbs, rather than...
Role of verbs in document analysis (1998)
We present results of two methods for assessing the event profile of news articles as a function of verb type. The unique contribution of this research is the focus on the role of verbs, rather than...
Linear segmentation and segment significance (1998)
Min-yen Kan, Judith L. Klavans, Kathleen R. McKeown
We present a new method for discovering a segmental discourse structure of a document while categorizing each segment's function and importance. Segments are determined by a zero-sum weighting...
Linear Segmentation and Segment Significance (1998)
Min-yen Kan, Judith L. Klavans, Kathleen R. McKeown
We present a new method for discovering a segmental discourse structure of a document while categorizing each segment's function and importance. Segments are determined by a zero-sum weighting...
Role of Verbs in Document Analysis (1998)
We present results of two methods for assessing the event profile of news articles as a function of verb type. The unique contribution of this research is the focus on the role of verbs, rather than...
Resources for Evaluation of Summarization Techniques (1998)
Judith L. Klavans, Kathleen R. McKeown, Min-Yen Kan, Susan Lee
We report on two corpora to be used in the evaluation of component systems for the tasks of (1) linear segmentation of text and (2) summary-directed sentence extraction. We present characteristics of...
Resources for the Evaluation of Summarization Techniques (1998)
Judith L. Klavans, Kathleen R. McKeown, Min-yen Kan, Susan Lee
Linear segmentation and segment significance (1998)
Min-yen Kan, Judith L. Klavans, Kathleen R. McKeown
We present a new method for discovering a segmental discourse structure of a document while categorizing each segment’s function and importance. Segments are determined by a zero-sum weighting...
Extracting Japanese Domain and Technical Terms is Relatively Easy (1996)
We argue that the important task of extracting domain and technical terms is much easier in Japanese than is commonly believed, and that technical term extraction should and can be more widely used....
Extracting Japanese Domain and Technical Terms is Relatively Easy (1996)
Pascale Fung, Min-yen Kan, Yurie Horita
We argue that the important task of extracting domain and technical terms is much easier in Japanese than is commonly believed, and that technical term extraction should and can be more widely used....