Information Retrieval Based on Word Senses (2007)
Hinrich Schütze, Jan O. Pedersen
This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing...
Ramana Rao, Jan O. Pedersen, Marti A. Hearst, George G. Robertson
Effective information access involves rich interactions between users and information residing in diverse locations. Users seek and retrieve information from the sources—for example, file servers,...
David A. Hull, Jan O. Pedersen, Hinrich Schütze
There is strong empirical and theoretic evidence that combination of retrieval methods can improve performance. In this paper, we systematically compare combination strategies in the context of...
A Comparative Study on Feature Selection in Text Categorization (1997)
This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods were evaluated,...
Xerox TREC-5 Site Report: Routing, Filtering, NLP, and Spanish Tracks (1997)
David A. Hull, Gregory Grefenstette, B. Maximilian Schulze, Eric Gaussier, Hinrich Schütze, Jan O. Pedersen
This report is divided into three sections. The first section describes our work on routing and filtering (Hull, Schütze, and Pedersen), the second section covers the NLP track (Grefenstette,...
Almost-Constant-Time Clustering of Arbitrary Corpus Subsets (1997)
Craig Silverstein, Jan O. Pedersen
Methods exist for constant-time clustering of corpus subsets selected via Scatter/Gather browsing [3]. In this paper we expand on those techniques, giving an algorithm for almost-constant-time...
Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results (1996)
Marti A. Hearst, Jan O. Pedersen
We present Scatter/Gather, a cluster-based document browsing method, as an alternative to ranked titles for the organization and viewing of retrieval results. We systematically evaluate...
A comparison of classifiers and document representations for the routing problem (1995)
Hinrich Schütze, David A. Hull, Jan O. Pedersen
In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification...
Information Retrieval Based on Word Senses (1995)
Hinrich Schütze, Jan O. Pedersen
This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing...
Scatter/Gather as a Tool for the Navigation of Retrieval Results (1995)
Marti A. Hearst, David R. Karger, Jan O. Pedersen
An important information access problem arises when the user is confronted with a very large number of documents that have been retrieved in response to a query. In this paper we explore the use of a...
A Neural Network Approach to Topic Spotting (1995)
Erik Wiener, Jan O. Pedersen, Andreas S. Weigend
This paper presents an application of nonlinear neural networks to topic spotting. Neural networks allow us to model higher-order interaction between document terms and to simultaneously predict...
Xerox TREC 3 Report: Combining Exact and Fuzzy Predictors (1995)
Hinrich Schütze, Jan O. Pedersen, Marti A. Hearst
this report, Section 2 describes our routing experiments, Section 3 describes our ad hoc experiments, and Section 4 discusses our results and possible future experiments.
Revealing collection structure through information access interface (1995)
Marti A. Hearst, Jan O. Pedersen
on amplifying the users' cognitive abilities, rather than trying to completely automate them. This framework emphasizes the participation of the user in a cycle of query formulation,...
Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections (1993)
Douglass R. Cutting, David R. Karger, Jan O. Pedersen
The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document...
Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections (1992)
Douglass R. Cutting, David R. Karger, Jan O. Pedersen, John W. Tukey
Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with...