Jan O. Pedersen

Information Retrieval Based on Word Senses (2007)

Hinrich Schütze, Jan O. Pedersen

This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing...

y (2007)

David A. Hull, Jan O. Pedersen, Hinrich Schütze

There is strong empirical and theoretic evidence that combination of retrieval methods can improve performance. In this paper, we systematically compare combination strategies in the context of...

A Comparative Study on Feature Selection in Text Categorization (1997)

Yiming Yang, Jan O. Pedersen

This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods were evaluated,...

Xerox TREC-5 Site Report: Routing, Filtering, NLP, and Spanish Tracks (1997)

David A. Hull, Gregory Grefenstette, B. Maximilian Schulze, Eric Gaussier, Hinrich Schütze, Jan O. Pedersen

This report is divided into three sections. The first section describes our work on routing and filtering (Hull, Schütze, and Pedersen), the second section covers the NLP track (Grefenstette,...

Almost-Constant-Time Clustering of Arbitrary Corpus Subsets (1997)

Craig Silverstein, Jan O. Pedersen

Methods exist for constant-time clustering of corpus subsets selected via Scatter/Gather browsing [3]. In this paper we expand on those techniques, giving an algorithm for almost-constant-time...

Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results (1996)

Marti A. Hearst, Jan O. Pedersen

We present Scatter/Gather, a cluster-based document browsing method, as an alternative to ranked titles for the organization and viewing of retrieval results. We systematically evaluate...

A comparison of classifiers and document representations for the routing problem (1995)

Hinrich Schütze, David A. Hull, Jan O. Pedersen

In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification...

Information Retrieval Based on Word Senses (1995)

Hinrich Schütze, Jan O. Pedersen

This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing...

Scatter/Gather as a Tool for the Navigation of Retrieval Results (1995)

Marti A. Hearst, David R. Karger, Jan O. Pedersen

An important information access problem arises when the user is confronted with a very large number of documents that have been retrieved in response to a query. In this paper we explore the use of a...

A Neural Network Approach to Topic Spotting (1995)

Erik Wiener, Jan O. Pedersen, Andreas S. Weigend

This paper presents an application of nonlinear neural networks to topic spotting. Neural networks allow us to model higher-order interaction between document terms and to simultaneously predict...

Xerox TREC 3 Report: Combining Exact and Fuzzy Predictors (1995)

Hinrich Schütze, Jan O. Pedersen, Marti A. Hearst

this report, Section 2 describes our routing experiments, Section 3 describes our ad hoc experiments, and Section 4 discusses our results and possible future experiments.

Revealing collection structure through information access interface (1995)

Marti A. Hearst, Jan O. Pedersen

on amplifying the users' cognitive abilities, rather than trying to completely automate them. This framework emphasizes the participation of the user in a cycle of query formulation,...

Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections (1993)

Douglass R. Cutting, David R. Karger, Jan O. Pedersen

The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document...

Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections (1992)

Douglass R. Cutting, David R. Karger, Jan O. Pedersen, John W. Tukey

Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with...
