UMND2: SenseClusters Applied to the Sense Induction Task of SENSEVAL-4 (2009)
SenseClusters is a freely available open-source system that served as the University of Minnesota, Duluth entry in the SENSEVAL-4 sense induction task. For this task SenseClusters was configured...
Determining the Syntactic Structure of Medical Terms in Clinical Notes (2009)
Bridget T. McInnes, Ted Pedersen, Serguei V. Pakhomov
This paper demonstrates a method for determining the syntactic structure of medical terms. We use a model-fitting method based on the Log Likelihood Ratio to classify three-word medical terms as...
SenseClusters: Unsupervised Clustering and Labeling of Similar Contexts (2009)
SenseClusters is a freely available system that identifies similar contexts in text. It relies on lexical features to build first and second order representations of contexts, which are then...
SenseRelate::TargetWord - A Generalized Framework for Word Sense Disambiguation (2009)
Siddharth Patwardhan, Satanjeev Banerjee, Ted Pedersen
We have previously introduced a method of word sense disambiguation that computes the intended sense of a target word, using WordNet-based measures of semantic relatedness (Patwardhan et al., 2003)....
6 Unsupervised corpus-based methods for WSD (2009)
This chapter focuses on unsupervised corpus-based methods of word sense discrimination that are knowledge-lean, and do not rely on external knowledge sources such as machine readable dictionaries,...
Bridget T. McInnes, Ted Pedersen, John Carlis
This paper explores the use of Concept Unique Identifiers (CUIs) as assigned by MetaMap as features for a supervised learning approach to word sense disambiguation of biomedical text. We compare the...
SenseClusters - Finding Clusters that Represent Word Senses (2009)
SenseClusters is a freely available word sense discrimination system that takes a purely unsupervised clustering approach. It uses no knowledge other than what is available in a raw unstructured...
Parallel translations of written texts have long been useful tools for human students of language, and have begun to serve as an intriguing source of data for corpus-based approaches to natural...
Measures of Semantic Similarity and Relatedness in the Medical Domain (2008)
Ted Pedersen, Serguei Pakhomov, Siddharth Patwardhan
In this paper we introduce a measure of semantic relatedness based on context vectors derived from medical corpora. We also extend a number of measures of semantic similarity for general English to...
UMND2: SenseClusters Applied to the Sense Induction Task of SENSEVAL-4 (2008)
SenseClusters is a freely available open-source system that served as the University of Minnesota, Duluth entry in the SENSEVAL-4 sense induction task. For this task SenseClusters was configured...
Discovering Identities in Web Contexts with Unsupervised Clustering (2008)
We describe the application of unsupervised clustering methodologies to the problem of discriminating among ambiguous names found in short passages of text that appear on Web pages. We show how to...
Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces (2008)
This paper systematically compares unsupervised word sense discrimination techniques that cluster instances of a target word that occur in raw text using both vector and similarity spaces. The...
Measuring the similarity of short written contexts is a fundamental problem in Natural Language Processing. This article provides a unifying framework by which short context problems can be...
2007. Unsupervised Discrimination of Person Names in Web Contexts (2008)
Ambiguous person names are a problem in many forms of written text, including that which is found on the Web. In this paper we explore the use of unsupervised clustering techniques to...
SenseClusters: Unsupervised Clustering and Labeling of Similar Contexts (2008)
SenseClusters is a freely available system that identifies similar contexts in text. It relies on lexical features to build first and second order representations of contexts, which are then...
Improving Word Sense Discrimination with Gloss Augmented Feature Vectors (2008)
This paper presents a method of unsupervised word sense discrimination that augments co-occurrence feature vectors derived from raw untagged corpora with information from the glosses...
Romanian–English, and English–Hindi (2008)
Joel Martin, Rada Mihalcea, Ted Pedersen
This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized
Naive Bayes as a Satisficing Model (2007)
We report on an empirical study of supervised learning algorithms that induce models to resolve the meaning of ambiguous words in text. We find that the Naive Bayesian classifier is as accurate as...
Computational approaches to language processing arose during a time of bitter debate in the linguistics community that pitted the generative theory of grammar [Cho57] versus more quantitative and...
Siddharth Patwardhan, Satanjeev Banerjee, Ted Pedersen
This paper generalizes the Adapted Lesk Algorithm of Banerjee and Pedersen (2002) to a method of word sense disambiguation based on semantic relatedness. This is possible since Lesk's...
I. Dan Melamed, Ted Pedersen
Parallel translations of written texts have long been useful tools for human students of language, and have begun to serve as an intriguing source of data for corpus-based approaches to natural...
A baseline methodology for word sense disambiguation (2007)
This paper describes a methodology for supervised word sense disambiguation that relies on standard machine learning algorithms to induce classifiers from sense-tagged training examples where...
Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces (2007)
Amruta Purandare, Ted Pedersen
This paper systematically compares unsupervised word sense discrimination techniques that cluster instances of a target word that occur in raw text using both vector and similarity spaces. The...
SenseClusters - Finding Clusters that Represent Word Senses (2007)
Amruta Purandare, Ted Pedersen
SenseClusters is a freely available word sense discrimination system that takes a purely unsupervised clustering approach. It uses no knowledge other than what is available in a raw unstructured...
Learning Probabilistic Models of Word Sense Disambiguation (2007)
This dissertation presents several new methods of supervised and unsupervised learning of word sense disambiguation models. The supervised methods focus on performing model searches through a space...
Improving Name Discrimination: A Language Salad Approach (2006)
Ted Pedersen, Anagha Kulkarni, Zornitsa Kozareva
This paper describes a method of discriminating ambiguous names that relies upon features found in corpora of a more abundant language. In particular, we discriminate ambiguous names in Bulgarian,...
Mahesh Joshi, Serguei Pakhomov, Ted Pedersen, Christopher G. Chute
Electronic medical records (EMR) constitute a valuable resource of patient specific information and are increasingly used for clinical practice and research. Acronyms present a challenge to...
Kernel Methods for Word Sense Disambiguation and Abbreviation Expansion (2006)
Mahesh Joshi, Ted Pedersen, Richard Maclin
The scarcity of manually labeled data for supervised machine learning methods presents a significant limitation on their ability to acquire knowledge. The use of kernels in Support Vector Machines...
Ted Pedersen, Anagha Kulkarni, Roxana Angheluta, Zornitsa Kozareva, Thamar Solorio
Previous work by Pedersen, Purandare and Kulkarni (2005) has resulted in an unsupervised method of name discrimination that represents the context in which an ambiguous name occurs using second order...
Automatic cluster stopping with criterion functions and the Gap Statistic (2006)
SenseClusters is a freely available system that clusters similar contexts. It can be applied to a wide range of problems, although here we focus on word sense and name discrimination. It supports...
Resolving Ambiguities in Biomedical Text With Unsupervised Clustering Approaches (2005)
Guergana Savova, Ted Pedersen, Amruta Purandare, Anagha Kulkarni
This paper explores the effectiveness of unsupervised clustering techniques developed for general English in resolving semantic ambiguities in the biomedical domain. Methods that use first and second...
Name discrimination by clustering similar contexts (2005)
Ted Pedersen, Amruta Purandare, Anagha Kulkarni
It is relatively common for different people or organizations to share the same name. Given the increasing amount of information available online, this results in the ever growing...
Abbreviation and acronym disambiguation in clinical discourse (2005)
Serguei Pakhomov, Ted Pedersen, Christopher G. Chute
Use of abbreviations and acronyms is pervasive in clinical reports despite many efforts to limit the use of ambiguous and unsanctioned abbreviations and acronyms. Due to the fact that many...
Word Sense Discrimination (WSD): One word with multiple senses/meanings. (2005)
Anagha Kulkarni, Dr. Ted Pedersen, Computer Science, U Of Minnesota
Dr. Guergana Savova, for her support and guidance.
Word Alignment for Languages with Scarce Resources (2005)
Joel Martin, Rada Mihalcea, Ted Pedersen
This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the ACL 2005 Workshop on...
Theory revision via prior operationalization (2005)
Mahesh Joshi, Ted Pedersen, Richard Maclin
We have applied five supervised learning approaches to word sense disambiguation in the medical domain. Our objective is to evaluate Support Vector Machines (SVMs) in comparison with other...
WordNet::Similarity - Measuring the Relatedness of Concepts (2004)
Ted Pedersen, Siddharth Patwardhan
WordNet::Similarity is a freely available software package that makes it possible to measure the semantic similarity or relatedness between a pair of concepts (or word senses). It provides six...
WordNet::Similarity - Measuring the Relatedness of Concepts (2004)
Ted Pedersen, Siddharth Patwardhan, Jason Michelizzi
WordNet::Similarity is a freely available software package that makes it possible to measure the semantic similarity and relatedness between a pair of concepts (or synsets). It provides six measures...
The SENSEVAL-3 Multilingual English-Hindi Lexical Sample Task (2004)
Timothy Chklovski, Rada Mihalcea, Ted Pedersen, Amruta Purandare
This paper describes the English-Hindi Multilingual lexical sample task in SENSEVAL-3. Rather than tagging an English word with a sense from an English dictionary, this task seeks to assign the...
The SENSEVAL-3 multilingual English-Hindi lexical sample task (2004)
Timothy Chklovski, Ted Pedersen
This paper describes the English–Hindi Multilingual lexical sample task in SENSEVAL–3. Rather than tagging an English word with a sense from an English dictionary, this task seeks to assign the...
Writing About Research Or, the Art of WAR (2003)
There are a number of good books available that talk about writing for research. I recommend that you find one that you like, and that you read it carefully and follow its advice (let me know...
Extended gloss overlaps as a measure of semantic relatedness (2003)
Satanjeev Banerjee, Ted Pedersen
This paper presents a new measure of semantic relatedness between concepts that is based on the number of shared words (overlaps) in their definitions (glosses). This measure is unique in that it...
The Design, Implementation and Use of the Ngram Statistics Package (2003)
Satanjeev Banerjee, Ted Pedersen
The Ngram Statistics Package (NSP) is a flexible and easy-to-use software tool that supports the identification and analysis of Ngrams, sequences of N tokens in online text. We have designed...
An Evaluation Exercise for Word Alignment (2003)
This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the HLT/NAACL 2003 Workshop on...
Maximizing Semantic Relatedness to Perform Word Sense Disambiguation (2003)
Ted Pedersen, Satanjeev Banerjee, Siddharth Patwardhan
This article presents a method of word sense disambiguation that assigns a target word the sense that is most related to the senses of its neighboring words. We explore the use of measures of...
An evaluation exercise for word alignment (2003)
This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the
This paper presents an evaluation of an ensemble-based system that participated in the English and Spanish lexical sample tasks of Senseval-2. The system combines decision trees of unigrams,...
Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of Senseval-2 (2002)
This paper presents a comparative evaluation among the systems that participated in the Spanish and English lexical sample tasks of Senseval-2. The focus is on pairwise comparisons among systems to...
Machine Learning with Lexical Features: The Duluth Approach to Senseval-2 (2002)
This paper describes the sixteen Duluth entries in the Senseval-2 comparative exercise among word sense disambiguation systems. There were eight pairs of Duluth systems entered in the Spanish and...
Building Resources for Languages with Scarce Resources (2002)
Brian Rassier, Faculty Sponsor, Dr. Ted Pedersen
The objective of this UROP is to develop a Web interface that will allow for the construction of a large collection of multilingual parallel text with the help of Web users from around the world....
Assessing system agreement and instance difficulty in the lexical samples tasks of senseval-2 (2002)
This paper presents a comparative evaluation among the systems that participated in the Spanish and English lexical sample tasks of SENSEVAL-2. The focus is on pairwise comparisons among systems to...
This paper presents an evaluation of an ensemble-based system that participated in the English and Spanish lexical sample tasks of SENSEVAL-2. The system combines decision trees of unigrams,...
A Decision Tree of Bigrams is an Accurate Predictor of Word Sense (2001)
This paper presents a corpus-based approach to word sense disambiguation where a decision tree assigns a sense to an ambiguous word based on the bigrams that occur nearby. This approach is evaluated...
A decision tree of bigrams is an accurate predictor of word sense (2001)
This paper presents a corpus-based approach to word sense disambiguation where a decision tree assigns a sense to an ambiguous word based on the bigrams that occur nearby. This...
Machine Learning with Lexical Features: The Duluth Approach to SENSEVAL-2 (2001)
This paper describes the sixteen Duluth entries in the Senseval-2 comparative exercise among word sense disambiguation systems. There were eight pairs of Duluth systems entered in the Spanish and...
This paper presents a corpus-based approach to word sense disambiguation that builds an ensemble of Naive Bayesian classifiers, each of which is based on lexical features that represent co-occurring...
Search Techniques for Learning Probabilistic Models of Word Sense Disambiguation (1999)
The development of automatic natural language understanding systems remains an elusive goal. Given the highly ambiguous nature of the syntax and semantics of natural language, it is not possible to...
Integrating Natural Language Subtasks with Bayesian Belief Networks (1999)
The development of automatic natural language understanding systems remains an elusive goal. Given the highly ambiguous nature of the syntax and semantics of natural language, it is often impossible...
Learning probabilistic models of word sense disambiguation / by Ted Pedersen. (1998)
Thesis (Ph. D. in Computer Science)--S.M.U.
Knowledge lean word-sense disambiguation (1998)
We present a corpus-based approach to word-sense disambiguation that only requires information that can be automatically extracted from untagged text. We use unsupervised techniques to estimate the...
Knowledge Lean Word-Sense Disambiguation (1998)
We present a corpus-based approach to word-sense disambiguation that only requires information that can be automatically extracted from untagged text. We use unsupervised techniques to estimate the...
Dependent Bigram Identification (1998)
... industry 240 times, industry occurs without oil 1001 times, and bigrams other than oil industry occur 1,298,742 times. This distribution is sparse and skewed and thus violates a central...
Pirati su internet, by Ted Pedersen and Mel Gilden, illustrated by Smart Factory. Modena: Panini, 1998.
Distinguishing Word Senses in Untagged Text (1997)
This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper,...
Sequential Model Selection for Word Sense Disambiguation (1997)
Ted Pedersen, Rebecca Bruce, Janyce Wiebe
Statistical models of word-sense disambiguation are often based on a small number of contextual features or on a model that is assumed to characterize the interactions among a set of features. Model...
A Statistical Decision Making Method: A Case Study on Prepositional Phrase Attachment (1997)
Mehmet Kayaalp, Ted Pedersen, Rebecca Bruce
Statistical classification methods usually rely on a single best model to make accurate predictions. Such a model aims to maximize accuracy by balancing precision and recall. The Model Switching...
Distinguishing word senses in untagged text (1997)
This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper,...
Distinguishing word senses in untagged text (1997)
This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper,...
A Statistical Decision Making Method: A Case Study on Prepositional Phrase Attachment (1997)
Mehmet Kayaalp, Ted Pedersen, Rebecca Bruce
Statistical classification methods usually rely on a single best model to make accurate predictions. Such a model aims to maximize accuracy by balancing precision and recall. The Model Switching...
Class.3.0 : A Probabilistic Classifier Using Decomposable Models (1997)
Class.3.0 is a probabilistic classifier designed for use with decomposable models. It classifies held out test instances based on parameter estimates made from training data. This document is a...
Naive Mixes for Word Sense Disambiguation (1997)
The Naive Mix is a new supervised learning algorithm based on sequential model selection. The usual objective of model selection is to find a single probabilistic model that adequately characterizes,...
Unsupervised Text Mining (1997)
We describe the results of performing text mining on a challenging problem in natural language processing, word sense disambiguation. We compare two methods of unsupervised learning, Ward's...
Sequential Model Selection for Word Sense Disambiguation (1997)
Ted Pedersen, Rebecca Bruce, Janyce Wiebe
Statistical models of word-sense disambiguation are often based on a small number of contextual features or on a model that is assumed to characterize the interactions among a set of features. Model...
Knowledge Lean Word Sense Disambiguation (1997)
s an averaged probabilistic model, is introduced and shown to be competitive with well-known machine learning algorithms (Pedersen & Bruce 1997). In the absence of sense-tagged text, the sense...
Statistical methods for automatically identifying dependent word pairs (i.e. dependent bigrams) in a corpus of natural language text have traditionally been performed using asymptotic tests of...
Rebecca Bruce, Janyce Wiebe, Ted Pedersen
This paper describes measures for evaluating the three determinants of how well a probabilistic classifier performs on a given test set. These determinants are the appropriateness, for the test set,...
Significant lexical relationships (1996)
Ted Pedersen, Mehmet Kayaalp, Rebecca Bruce
Statistical NLP inevitably deals with a large number of rare events. As a consequence, NLP data often violates the assumptions implicit in traditional statistical procedures such as significance...
Significant Lexical Relationships (1996)
Ted Pedersen, Mehmet Kayaalp, Rebecca Bruce
Statistical NLP inevitably deals with a large number of rare events. As a consequence, NLP data often violates the assumptions implicit in traditional statistical procedures such as significance...
Significant Lexical Relationships (1996)
Ted Pedersen, Mehmet Kayaalp, Rebecca Bruce
We describe a test that can be used to accurately assess the significance of a population model from a data sample using freely available software. We apply this test to the study of lexical...
Model Selection using Sparse Data and Backward Sequential Search (1996)
Ted Pedersen, Rebecca Bruce, Mehmet Kayaalp
This paper presents an empirical comparison of a variety of model evaluation criteria used in backwards sequential search (BSS). Both information criteria (IC) and significance tests are compared...
What to Infer from a Description (1996)
Recent work in identifying dependent collocations among consecutive words (i.e., dependent bigrams) has applied inferential statistical methods where descriptive ones may have been more appropriate...
Rebecca Bruce, Janyce Wiebe, Ted Pedersen
This paper describes measures for evaluating the three determinants of how well a probabilistic classifier performs on a given test set. These determinants are the appropriateness, for the test set,...
Lexical Acquisition via Constraint Solving (1995)
This paper describes a method to automatically acquire the syntactic and semantic classifications of unknown words. Our method reduces the search space of the lexical acquisition problem by utilizing...
Automatic Acquisition of Noun and Verb Meanings (1995)
A robust Natural Language Processing (NLP) system must be able to automatically acquire the syntax and semantics of unknown words that it encounters during processing. It is inevitable that a...
This paper describes and evaluates a simple modification to the Brill Part-of-Speech Tagger. In its standard distribution the Brill Tagger allows manual assignment of a part-of-speech tag to...
A Comparative Study of Supervised Learning as Applied to Acronym Expansion in Clinical Reports
Mahesh Joshi, Serguei Pakhomov, Ted Pedersen, Christopher G. Chute
Electronic medical records (EMR) constitute a valuable resource of patient specific information and are increasingly used for clinical practice and research. Acronyms present a challenge to...
Abbreviation and Acronym Disambiguation in Clinical Discourse
Serguei Pakhomov, Ted Pedersen, Christopher G. Chute
Use of abbreviations and acronyms is pervasive in clinical reports despite many efforts to limit the use of ambiguous and unsanctioned abbreviations and acronyms. Due to the fact that many...
A New Supervised Learning Algorithm for Word Sense Disambiguation
The Naive Mix is a new supervised learning algorithm that is based on a sequential method for selecting probabilistic models. The usual objective of model selection is to find a single model that...
Using UMLS Concept Unique Identifiers (CUIs) for Word Sense Disambiguation in the Biomedical Domain
Bridget T. McInnes, Ted Pedersen, John Carlis
This paper explores the use of Concept Unique Identifiers (CUIs) as assigned by MetaMap as features for a supervised learning approach to word sense disambiguation of biomedical text. We compare the...