David A. Hull, Jan O. Pedersen, Hinrich Schutze
There is strong empirical and theoretic evidence that combination of retrieval methods can improve performance. In this paper, we systematically compare combination strategies in the context of...
Dimensions of Meaning (2007)
In Proceedings of Supercomputing '92
The representation of documents and queries as vectors in a high-dimensional space is well-established in information retrieval [1]. This paper proposes to represent the semantics of words and...
The hypertext concordance: a better back-of-the-book index (1998)
This paper describes a tool that creates a usable index for organizing and hyperlinking related web pages. This type of hypertext construction is an important application of terminology extraction....
Projections for efficient document clustering (1997)
Hinrich Schutze, Craig Silverstein
Clustering is increasing in importance, but linear- and even constant-time clustering algorithms are often too slow for real-time applications. A simple way to speed up clustering is to speed up the...
Automatic Detection of Text Genre (1997)
Brett Kessler, Geoffrey Nunberg, Hinrich Schutze
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles...
Information Retrieval Based on Word Senses (1995)
Hinrich Schutze, Jan O. Pedersen
This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing...