Michaela Atterer, Hinrich Schütze
We investigate the effect of corpus size in combining supervised and unsupervised learning for two types of attachment decisions: relative clause attachment and prepositional phrase attachment. The...
On a Generic Uncertainty Model for Position Information (2009)
Lange, Ralph, Weinschrott, Harald, Geiger, Lars, Blessing, Andre, Dürr, Frank, Rothermel, Kurt, ...
Position information of moving as well as stationary objects is generally subject to uncertainties due to inherent measuring errors of positioning technologies, explicit tolerances of position update...
References [1] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information (2008)
Christopher D. Manning, Hinrich Schütze
[2] Eugene Charniak. Statistical techniques for natural language parsing. AI
Performance Thresholding in Practical Text Classification (2008)
In practical classification, there is often a mix of learnable and unlearnable classes and only a classifier above a minimum performance threshold can be deployed. This problem is exacerbated if the...
An Exemplar-Theoretic Account of Syllable Frequency Effects (2008)
Michael Walsh, Hinrich Schütze, Bernd Möbius, Antje Schweitzer
This paper presents an exemplar-theoretic computational model of syllable frequency effects which yields simulation results in keeping with experimental results found in the literature. The argument...
Christopher D. Manning, Hinrich Schütze, Lillian Lee
time, empirical techniques to natural language processing were on the rise — in that year, Computational Linguistics published a special issue on such methods — and Charniak’s text was the...
Information Retrieval Based on Word Senses (2007)
Hinrich Schütze, Jan O. Pedersen
This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing...
Using Text Analysis to Identify Functionally Coherent Gene Groups (2007)
Soumya Raychaudhuri, Hinrich Schütze, Russ B. Altman
The analysis of large-scale genomic information (such as sequence data or expression patterns) frequently involves grouping genes on the basis of common experimental features. Often, as with gene...
Towards a context model driven German geo-tagging system (2007)
Blessing, Andre, Kuntz, Reinhard, Schütze, Hinrich
In this paper, we present a new approach for recognition and grounding of geographic proper names for German. Named Entity Recognition (NER) in German is more difficult than in English because not...
Language-Derived Information and Context Models (2006)
Blessing, Andre, Klatt, Stefan, Nicklas, Daniela, Volz, Steffen, Schütze, Hinrich
There are a number of possible sources for information about the environment when creating or updating a context model, including sensorial input, databases, and explicit modeling by the system...
Von Philosophisch-historischen, Beate Dorow, Erstgutachter Pd, Dr. Ulrich Heid, Zweitgutachter Prof, ...
Letter-to-Phoneme Conversion for a German Text-to-Speech System (2005)
Vera Demberg (supervisors: Dr. Helmut Schmid, Dr. Gregor Möhler; examiner: Prof. Hinrich Schütze)
This thesis deals with the conversion from letters to phonemes, syllabification and word stress assignment for a German text-to-speech system. In the first part of the thesis (chapter 5), several...
GAPSCORE: finding gene and protein names one word at a time (2004)
Jeffrey T. Chang, Hinrich Schütze, Russ B. Altman
Motivation: New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much knowledge is still stored as written natural language text....
The Interface between Linguistics and Knowledge Representation (2004)
Hinrich Schütze (language of instruction: German)
The interaction between knowledge representation and linguistics is important for theoretical and practical reasons. World knowledge is an important part of many linguistic theories, e.g.,...
Foundations of Statistical Natural Language Processing (1999)
Manning, Christopher D., Schütze, Hinrich
ISBN 0-262-13360-1
Multi-Modal Browsing of Images in Web Documents (1999)
Francine Chen, Ullas Gargi, Les Niles, Hinrich Schütze
In this paper, we describe a system for performing browsing and retrieval on a collection of web images and associated text on an HTML page. Browsing is combined with retrieval to help a user locate...
Automatic Word Sense Discrimination (1998)
This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words,...
Christopher Manning, Hinrich Schütze
Introduction. "Statistical considerations are essential to an understanding of the operation and development of languages" (Lyons 1968:98). Before we delve into a lot of theory, this chapter...
Projections for Efficient Document Clustering (1997)
Hinrich Schütze, Craig Silverstein
Clustering is increasing in importance, but linear- and even constant-time clustering algorithms are often too slow for real-time applications. A simple way to speed up clustering is to speed up the...
Xerox TREC-5 Site Report: Routing, Filtering, NLP, and Spanish Tracks (1997)
David A. Hull, Gregory Grefenstette, B. Maximilian Schulze, Eric Gaussier, Hinrich Schütze, Jan O. Pedersen
this report is divided into three sections. The first section describes our work on routing and filtering (Hull, Schütze, and Pedersen), the second section covers the NLP track (Grefenstette,...
Xerox Site Report: Four TREC-4 Tracks (1996)
Marti Hearst, Peter Pirolli, Hinrich Schütze, Gregory Grefenstette, David Hull
this document sample than one would expect by chance. The terms are selected according to a binomial likelihood ratio test [10], comparing their occurrence in the first 20 documents to their...
Christopher Manning, Hinrich Schütze
we want to consider a sequence (perhaps through time) of random variables that aren't independent, but depend rather on previous elements in the sequence. For many such systems, it seems...
Document Routing as Statistical Classification (1996)
David Hull, Jan Pedersen, Hinrich Schütze
In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification...
A comparison of classifiers and document representations for the routing problem (1995)
Hinrich Schütze, David A. Hull, Jan O. Pedersen
In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification...
Information Retrieval Based on Word Senses (1995)
Hinrich Schütze, Jan O. Pedersen
This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing...
Xerox TREC 3 Report: Combining Exact and Fuzzy Predictors (1995)
Hinrich Schütze, Jan O. Pedersen, Marti A. Hearst
this report, Section 2 describes our routing experiments, Section 3 describes our ad hoc experiments, and Section 4 discusses our results and possible future experiments.
Part-of-Speech Tagging Using a Variable Memory Markov Model (1994)
We present a new approach to disambiguating syntactically ambiguous words in context, based on Variable Memory Markov (VMM) models. In contrast to fixed-length Markov models, which predict based on...
Customizing a Lexicon to Better Suit a Computational Task (1993)
Marti A. Hearst, Hinrich Schütze
We discuss a method for augmenting and rearranging a structured lexicon in order to make it more suitable for a topic labeling task, by making use of lexical association information from a large text...
Representations for semantic information about words are necessary for many applications of neural networks in natural language processing. This paper describes an efficient, corpus-based method for...
A Vector Model for Syntagmatic and Paradigmatic Relatedness (1993)
Hinrich Schütze, Jan Pedersen
This paper introduces context digests, high-dimensional real-valued representations for the typical left and right contexts of a word. Initial entries for the context digests are formed from the...
Using Text Analysis to Identify Functionally Coherent Gene Groups
Raychaudhuri, Soumya, Schütze, Hinrich, Altman, Russ B.
The analysis of large-scale genomic information (such as sequence data or expression patterns) frequently involves grouping genes on the basis of common experimental features. Often, as with gene...
Creating an Online Dictionary of Abbreviations from MEDLINE
Chang, Jeffrey T., Schütze, Hinrich, Altman, Russ B.
Objective. The growth of the biomedical literature presents special challenges for both human readers and automatic algorithms. One such challenge derives from the common and uncontrolled use of...