Hinrich Schütze

The Effect of Corpus Size in Combining Supervised and Unsupervised Training for Disambiguation (2009)

Michaela Atterer, Hinrich Schütze

We investigate the effect of corpus size in combining supervised and unsupervised learning for two types of attachment decisions: relative clause attachment and prepositional phrase attachment. The...

On a Generic Uncertainty Model for Position Information (2009)

Ralph Lange, Harald Weinschrott, Lars Geiger, Andre Blessing, Frank Dürr, Kurt Rothermel, ...

Position information of moving as well as stationary objects is generally subject to uncertainties due to inherent measuring errors of positioning technologies, explicit tolerances of position update...

Performance Thresholding in Practical Text Classification (2008)

Hinrich Schütze

In practical classification, there is often a mix of learnable and unlearnable classes and only a classifier above a minimum performance threshold can be deployed. This problem is exacerbated if the...

An Exemplar-Theoretic Account of Syllable Frequency Effects (2008)

Michael Walsh, Hinrich Schütze, Bernd Möbius, Antje Schweitzer

This paper presents an exemplar-theoretic computational model of syllable frequency effects which yields simulation results in keeping with experimental results found in the literature. The argument...

Review of Foundations of Statistical Natural Language Processing (2008)

Christopher D. Manning, Hinrich Schütze (The MIT Press); reviewed by Lillian Lee

time, empirical techniques to natural language processing were on the rise — in that year, Computational Linguistics published a special issue on such methods — and Charniak’s text was the...

Information Retrieval Based on Word Senses (2007)

Hinrich Schütze, Jan O. Pedersen

This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing...

Using Text Analysis to Identify Functionally Coherent Gene Groups (2007)

Soumya Raychaudhuri, Hinrich Schütze, Russ B. Altman

The analysis of large-scale genomic information (such as sequence data or expression patterns) frequently involves grouping genes on the basis of common experimental features. Often, as with gene...

Towards a Context Model Driven German Geo-Tagging System (2007)

Andre Blessing, Reinhard Kuntz, Hinrich Schütze

In this paper, we present a new approach for recognition and grounding of geographic proper names for German. Named Entity Recognition (NER) in German is more difficult than in English because not...

Language-Derived Information and Context Models (2006)

Andre Blessing, Stefan Klatt, Daniela Nicklas, Steffen Volz, Hinrich Schütze

There are a number of possible sources for information about the environment when creating or updating a context model, including sensorial input, databases, and explicit modeling by the system...

Letter-to-Phoneme Conversion for a German Text-to-Speech System (2005)

Vera Demberg (supervisors: Dr. Helmut Schmid, Dr. Gregor Möhler; examiner: Prof. Hinrich Schütze)

This thesis deals with the conversion from letters to phonemes, syllabification and word stress assignment for a German text-to-speech system. In the first part of the thesis (chapter 5), several...

GAPSCORE: finding gene and protein names one word at a time (2004)

Jeffrey T. Chang, Hinrich Schütze, Russ B. Altman

Motivation: New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much knowledge is still stored as written natural language text....

The Interface between Linguistic and Knowledge Representation (2004)

Hinrich Schütze (language of instruction: German)

The interaction between knowledge representation and linguistics is important for theoretical and practical reasons. World knowledge is an important part of many linguistic theories, e.g.,...

Multi-Modal Browsing of Images in Web Documents (1999)

Francine Chen, Ullas Gargi, Les Niles, Hinrich Schütze

In this paper, we describe a system for performing browsing and retrieval on a collection of web images and associated text on an HTML page. Browsing is combined with retrieval to help a user locate...

Automatic Word Sense Discrimination (1998)

Hinrich Schütze

This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words,...
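
A minimal sketch of the general idea in this abstract (senses as clusters of similar contexts), assuming hand-picked two-dimensional toy word vectors and a plain two-means clustering. The vectors, the example contexts, and the helpers `context_vector` and `two_means` are hypothetical illustrations, not the paper's representation, clustering procedure, or data.

```python
# Hypothetical sketch: represent each occurrence of an ambiguous word by the
# average of its context words' vectors, then cluster those occurrences.
# The word vectors are toy values chosen by hand, not derived from a corpus.
WORD_VECTORS = {
    "river": (1.0, 0.0), "water": (0.9, 0.1), "shore": (0.8, 0.0),
    "money": (0.0, 1.0), "loan": (0.1, 0.9), "deposit": (0.0, 0.8),
}


def context_vector(context):
    """Average the vectors of the context words that have a known vector."""
    vecs = [WORD_VECTORS[w] for w in context if w in WORD_VECTORS]
    n = len(vecs)
    return tuple(sum(v[i] for v in vecs) / n for i in range(2))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iterations=10):
    """Plain k-means with k=2, seeded with the first and last points."""
    centroids = [points[0], points[-1]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(2), key=lambda k: sq_dist(p, centroids[k])) for p in points]
        # Move each centroid to the mean of its members.
        for k in range(2):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(m[i] for m in members) / len(members) for i in range(2))
    return labels


# Four occurrences of the ambiguous word "bank", each given by nearby words.
contexts = [
    ["river", "water"], ["money", "loan"],
    ["shore", "water"], ["deposit", "money"],
]
labels = two_means([context_vector(c) for c in contexts])
print(labels)  # → [0, 1, 0, 1]
```

Occurrences whose contexts point toward the same region of the vector space end up in the same cluster, which is the sense in which a cluster stands in for a sense here; a realistic setup would derive the word vectors from corpus co-occurrence statistics rather than fixing them by hand.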

Introduction (1997)

Christopher Manning, Hinrich Schütze

"Statistical considerations are essential to an understanding of the operation and development of languages" (Lyons 1968:98). Before we delve into a lot of theory, this chapter...

Projections for Efficient Document Clustering (1997)

Hinrich Schütze, Craig Silverstein

Clustering is increasing in importance, but linear- and even constant-time clustering algorithms are often too slow for real-time applications. A simple way to speed up clustering is to speed up the...

Xerox TREC-5 Site Report: Routing, Filtering, NLP, and Spanish Tracks (1997)

David A. Hull, Gregory Grefenstette, B. Maximilian Schulze, Eric Gaussier, Hinrich Schütze, Jan O. Pedersen

this report is divided into three sections. The first section describes our work on routing and filtering (Hull, Schütze, and Pedersen), the second section covers the NLP track (Grefenstette,...

Xerox Site Report: Four TREC-4 Tracks (1996)

Marti Hearst, Peter Pirolli, Hinrich Schütze, Gregory Grefenstette, David Hull

this document sample than one would expect by chance. The terms are selected according to a binomial likelihood ratio test [10], comparing their occurrence in the first 20 documents to their...
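
The snippet names a binomial likelihood ratio test but is cut off before saying what the first 20 documents are compared against. The sketch below assumes the comparison set is the rest of the collection and computes the standard statistic -2 log λ for testing whether two binomial samples share a single rate. The function names and all counts are hypothetical; this is not taken from the report or from its reference [10].

```python
import math


def log_l(p, k, n):
    """Binomial log-likelihood k*log(p) + (n-k)*log(1-p), treating 0*log(0) as 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1 - p)
    return total


def llr(k1, n1, k2, n2):
    """-2 log(lambda): H0 = one shared rate, H1 = separate rates k1/n1 and k2/n2.

    Assumption: k1 of n1 counts the term in the top documents, and k2 of n2
    counts it in the (hypothetical) comparison set of remaining documents.
    """
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled rate under H0
    return 2 * (log_l(p1, k1, n1) + log_l(p2, k2, n2)
                - log_l(p, k1, n1) - log_l(p, k2, n2))


# Hypothetical counts: a term in 12 of the top 20 documents vs. 50 of 10,000 others.
print(llr(12, 20, 50, 10000))
```

Under the null hypothesis the statistic is approximately chi-square distributed with one degree of freedom, so large values flag terms whose rate in the sample differs from the background rate. The statistic is symmetric, so selecting only over-represented terms would additionally require k1/n1 > k2/n2.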

Markov Models (1996)

Christopher Manning, Hinrich Schütze

we want to consider a sequence (perhaps through time) of random variables that aren't independent, but depend rather on previous elements in the sequence. For many such systems, it seems...
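
In the simplest such system, a first-order Markov chain, each element depends only on the one immediately before it, so the probability of a sequence factors as P(x1) times the product of P(x_t | x_{t-1}). A minimal sketch; the two states and all probabilities are hypothetical values chosen only for illustration:

```python
# Hypothetical first-order Markov chain over two toy states.
INITIAL = {"sun": 0.6, "rain": 0.4}
TRANSITION = {
    "sun": {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.4, "rain": 0.6},
}


def sequence_probability(states):
    """P(x1) * prod_t P(x_t | x_{t-1}): each state depends only on its predecessor."""
    prob = INITIAL[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= TRANSITION[prev][cur]
    return prob


print(sequence_probability(["sun", "sun", "rain"]))  # 0.6 * 0.8 * 0.2
```

Because every row of the transition table sums to 1, the probabilities of all sequences of a given length also sum to 1.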

Document Routing as Statistical Classification (1996)

David Hull, Jan Pedersen, Hinrich Schütze

In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification...

A comparison of classifiers and document representations for the routing problem (1995)

Hinrich Schütze, David A. Hull, Jan O. Pedersen

In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification...

Information Retrieval Based on Word Senses (1995)

Hinrich Schütze, Jan O. Pedersen

This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing...

Xerox TREC 3 Report: Combining Exact and Fuzzy Predictors (1995)

Hinrich Schütze, Jan O. Pedersen, Marti A. Hearst

this report, Section 2 describes our routing experiments, Section 3 describes our ad hoc experiments, and Section 4 discusses our results and possible future experiments.

Part-of-Speech Tagging Using a Variable Memory Markov Model (1994)

Hinrich Schütze, Yoram Singer

We present a new approach to disambiguating syntactically ambiguous words in context, based on Variable Memory Markov (VMM) models. In contrast to fixed-length Markov models, which predict based on...

Customizing a Lexicon to Better Suit a Computational Task (1993)

Marti A. Hearst, Hinrich Schütze

We discuss a method for augmenting and rearranging a structured lexicon in order to make it more suitable for a topic labeling task, by making use of lexical association information from a large text...

Word Space (1993)

Hinrich Schütze

Representations for semantic information about words are necessary for many applications of neural networks in natural language processing. This paper describes an efficient, corpus-based method for...
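
A minimal sketch of the general corpus-based idea, assuming plain window co-occurrence counts and cosine similarity on a three-sentence toy corpus. The window size, the corpus, and the helper names are hypothetical, and the sketch does not attempt the efficient method the paper itself describes.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count how often each word appears within `window` positions of another word."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus.
corpus = [
    "the doctor treated the patient".split(),
    "the nurse treated the patient".split(),
    "the bank approved the loan".split(),
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["doctor"], vecs["nurse"]), cosine(vecs["doctor"], vecs["loan"]))
```

Here `doctor` and `nurse` occur in identical contexts and receive identical vectors (cosine of 1, up to floating-point rounding), while `doctor` and `loan` share only the neighbour `the` and score lower (about 0.63).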

A Vector Model for Syntagmatic and Paradigmatic Relatedness (1993)

Hinrich Schütze, Jan Pedersen

This paper introduces context digests, high-dimensional real-valued representations for the typical left and right contexts of a word. Initial entries for the context digests are formed from the...

Creating an Online Dictionary of Abbreviations from MEDLINE

Jeffrey T. Chang, Hinrich Schütze, Russ B. Altman

Objective. The growth of the biomedical literature presents special challenges for both human readers and automatic algorithms. One such challenge derives from the common and uncontrolled use of...