David Yarowsky

Estimating Upper and Lower Bounds on the Performance of Word-Sense Disambiguation Programs (2008)

William Gale, Kenneth Ward Church, David Yarowsky

We have recently reported on two new word-sense disambiguation systems, one trained on bilingual material (the Canadian Hansards) and the other trained on monolingual material (Roget's Thesaurus...

Using Bilingual Materials to Develop Word Sense Disambiguation Methods (2008)

William A. Gale, Kenneth W. Church, David Yarowsky

Word sense disambiguation has been recognized as a major problem in natural language processing research for over forty years. Much of this work has been stymied by difficulties in acquiring...

JHU1: An Unsupervised Approach to Person Name Disambiguation using Web Snippets (2008)

Delip Rao, Nikesh Garera, David Yarowsky

This paper presents an approach to person name disambiguation using K-means clustering on rich-feature-enhanced document vectors, augmented with additional web-extracted snippets surrounding the...
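
As a rough illustration of the clustering step this abstract names, the sketch below runs plain K-means over bag-of-words vectors of snippets that mention one ambiguous name, so that each cluster stands for one real-world referent. It does not reproduce the paper's rich features or web-extracted snippets. The toy snippets, the hypothetical `vectorize` and `kmeans` helpers, the raw term-count features, and the farthest-point initialization are all illustrative assumptions.

```python
import math
from collections import Counter

def vectorize(docs):
    """Build L2-normalized term-count vectors over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Plain K-means (hypothetical helper) with deterministic farthest-point init."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(vectors, key=lambda v: min(dist2(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(v, centroids[j])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical snippets: two different people who share the name "john smith".
snippets = [
    "john smith guitarist album tour",
    "john smith guitarist band album",
    "john smith senator election vote",
    "john smith senator campaign vote",
]
labels = kmeans(vectorize(snippets), k=2)
print(labels)  # → [0, 0, 1, 1]
```

The name tokens themselves are shared by every snippet, so only the surrounding context words separate the clusters. This is why richer context features matter for this task.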

Inducing Information Extraction... (2008)

Ellen Riloff, Charles Schafer, David Yarowsky. In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002)

Information extraction (IE) systems are costly to build because they require development texts, parsing tools, and specialized dictionaries for each application domain and each natural language that...

Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora (2007)

David Yarowsky

This paper describes a program that disambiguates English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories. Roget's categories serve as...

Taking the Load Off the Conference Chairs: Towards a Digital Paper-Routing Assistant (2007)

David Yarowsky, Radu Florian

This paper describes and extensively evaluates a system for the automatic routing of submitted papers to reviewers and area committees, without the need for any human annotation from the reviewers...

Word-Sense Disambiguation Using Statistical Models of Roget's Categories (2007)

David Yarowsky

This paper describes a program that disambiguates English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories. Roget's categories serve as...

Languages (2007)

Charles Schafer, David Yarowsky

This paper presents a method for inducing translation lexicons between two distant languages without the need for either parallel bilingual corpora or a direct bilingual seed dictionary. The...

I Task Definition (2007)

David Yarowsky, Richard Wicentowski

This paper presents a corpus-based algorithm capable of inducing inflectional morphological analyses of both regular and highly irregular forms (such as...

Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora (2007)

David Yarowsky, Grace Ngai, Richard Wicentowski

This paper describes a system and set of algorithms for automatically inducing stand-alone monolingual part-of-speech taggers, base noun-phrase bracketers, named-entity taggers and morphological...

Multipath Translation Lexicon Induction via Bridge Languages (2007)

Gideon S. Mann, David Yarowsky

This paper presents a method for inducing translation lexicons based on transduction models of cognate pairs via bridge languages. Bilingual lexicons within language families are induced using...
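
The bridge-language idea can be illustrated roughly as follows. A source word is first translated into a bridge language using an existing dictionary. The bridge word is then matched to its most similar word in a related target language. In this sketch, plain Levenshtein edit distance stands in for the learned transduction models the abstract mentions. The tiny English-Spanish dictionary, the Portuguese word list, and the helper `induce_via_bridge` are hypothetical illustrations.

```python
def levenshtein(a, b):
    """Classic edit distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def induce_via_bridge(source_to_bridge, target_vocab):
    """Hypothetical helper: map each source word to the target word
    closest (by edit distance) to its bridge-language translation."""
    lexicon = {}
    for src, bridge in source_to_bridge.items():
        # Ties are broken alphabetically so the result is deterministic.
        lexicon[src] = min(target_vocab, key=lambda t: (levenshtein(bridge, t), t))
    return lexicon

# Toy data: English -> Spanish (bridge dictionary), Spanish ~ Portuguese (cognates).
english_to_spanish = {"night": "noche", "milk": "leche", "bridge": "puente"}
portuguese_vocab = ["noite", "leite", "ponte", "casa"]
lexicon = induce_via_bridge(english_to_spanish, portuguese_vocab)
print(lexicon)  # → {'night': 'noite', 'milk': 'leite', 'bridge': 'ponte'}
```

A uniform edit cost treats every letter change as equally likely. A trained transduction model could instead learn that, for example, Spanish "ch" regularly corresponds to Portuguese "it".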

Language Independent NER using a Unified Model of Internal and Contextual Evidence (2007)

Silviu Cucerzan, David Yarowsky

This paper investigates the use of a language independent model for named entity recognition based on iterative learning in a co-training fashion, using word-internal and contextual...

Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day (2007)

Silviu Cucerzan, David Yarowsky

This paper presents a method for bootstrapping a fine-grained, broad-coverage part-of-speech (POS) tagger in a new language using only one person-day of data acquisition effort. It requires only...

Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora (2007)

David Yarowsky, Grace Ngai, Richard Wicentowski

This paper describes a system and set of algorithms for automatically inducing stand-alone monolingual part-of-speech taggers, base noun-phrase bracketers, named-entity taggers and morphological analyzers for an...

One Sense Per Collocation (2006)

David Yarowsky

Previous work [Gale, Church and Yarowsky, 1992] showed that with high probability a polysemous word has one sense per discourse. In this paper we show that for certain definitions of collocation, a...

Proceedings of the ACL Workshop on Building and Using Parallel Texts, Ann Arbor, June, pages 49–56 (2005)

Elliott Franco Drábek, David Yarowsky

This paper presents an original approach to part-of-speech tagging of fine-grained features (such as case, aspect, and adjective person/number) in languages such as English where these properties are...

Multi-field information extraction and cross-document fusion (2005)

Gideon S. Mann, David Yarowsky

In this paper, we examine the task of extracting a set of biographic facts about target individuals from a collection of Web pages. We automatically annotate training text with positive and negative...

Unsupervised Personal Name Disambiguation (2003)

Gideon S. Mann, David Yarowsky

This paper presents a set of algorithms for distinguishing personal names with multiple real referents in text, based on little or no supervision. The approach utilizes an unsupervised clustering...

Statistical machine translation using coercive two-level syntactic transduction (2003)

Charles Schafer, David Yarowsky

We define, implement and evaluate a novel model for statistical machine translation, which is based on shallow syntactic analysis (part-of-speech tagging and phrase chunking) in both the source and...

Modeling consensus: Classifier combination for word sense disambiguation (2002)

Radu Florian, David Yarowsky

This paper demonstrates the substantial empirical success of classifier combination for the word sense disambiguation task. It investigates more than 10 classifier combination methods, including...

Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day (2002)

Silviu Cucerzan, David Yarowsky

This paper presents a method for bootstrapping a fine-grained, broad-coverage part-of-speech (POS) tagger in a new language using only one person-day of data acquisition effort. It requires only three...

Language Independent NER using a Unified Model of Internal and Contextual Evidence (2002)

Silviu Cucerzan, David Yarowsky

This paper investigates the use of a language independent model for named entity recognition based on iterative learning in a co-training fashion, using word-internal and contextual information as...

Rule Writing or Annotation: Cost-efficient Resource Usage for Base Noun Phrase Chunking (2001)

Grace Ngai, David Yarowsky

This paper presents a comprehensive empirical comparison between two approaches for developing a base noun phrase chunker: human rule writing and active learning using interactive real-time human...

Dynamic Nonlocal Language Modeling via Hierarchical Topic-Based Adaptation (2001)

Radu Florian, David Yarowsky

This paper presents a novel method of generating and applying hierarchical, dynamic topic-based language models. It proposes and evaluates new cluster generation, hierarchical smoothing and adaptive...

The Johns Hopkins SENSEVAL2 system descriptions (2001)

David Yarowsky, Silviu Cucerzan, Radu Florian, Charles Schafer, Richard Wicentowski

This article describes the Johns Hopkins University (JHU) sense-disambiguation systems that participated in seven SENSEVAL2 tasks: four supervised lexical choice systems (Basque, English, Spanish,...

Multipath translation lexicon induction via bridge languages (2001)

Gideon S. Mann, David Yarowsky

This paper presents a method for inducing translation lexicons based on transduction models of cognate pairs via bridge languages. Bilingual lexicons within languages...

Language independent minimally supervised induction of lexical probabilities (2000)

Silviu Cucerzan, David Yarowsky

A central problem in part-of-speech tagging, especially for new languages for which limited annotated resources are available, is estimating the distribution of lexical probabilities for unknown...

Rule writing or annotation: Cost-efficient resource usage for base noun phrase chunking (2000)

Grace Ngai, David Yarowsky

This paper presents a comprehensive empirical comparison between two approaches for developing a base noun phrase chunker: human rule writing and active learning...

Hierarchical decision lists for word sense disambiguation (2000)

David Yarowsky

This paper describes a supervised algorithm for word sense disambiguation based on hierarchies of decision lists. This algorithm supports a useful degree of conditional branching while...

Dynamic nonlocal language modeling via hierarchical topic-based adaptation (1999)

Radu Florian, David Yarowsky

This paper presents a novel method of generating and applying hierarchical, dynamic topic-based language models. It proposes and evaluates new cluster generation, hierarchical smoothing and...

Language independent named entity recognition combining morphological and contextual evidence (1999)

Silviu Cucerzan, David Yarowsky

Identifying and classifying personal, geographic, institutional or other names in a text is an important task for numerous applications. This paper describes and evaluates a language-independent...

Statistical Machine Translation (1999)

Yaser Al-onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, ...

Automatic translation from one human language to another using computers, better known as machine translation (MT), is a longstanding goal of computer science. In order to be able to perform such a...

Taking the Load Off the Conference Chairs: Towards a Digital Paper-Routing Assistant (1999)

David Yarowsky, Radu Florian

This paper describes and extensively evaluates a system for the automatic routing of submitted papers to reviewers and area committees, without the need for any human annotation from the reviewers or...

Distinguishing Systems and Distinguishing Senses: New Evaluation Methods for Word Sense Disambiguation (1998)

Philip Resnik, David Yarowsky

Resnik and Yarowsky (1997) made a set of observations about the state of the art in automatic word sense disambiguation and, motivated by those observations, offered several specific proposals...

Exploiting nonlocal and syntactic word relationships in language models for conversational speech recognition (1997)

Radu Florian, David Yarowsky

Language models are used in various fields that deal with sequences of words. We present in this paper automatic methods to cluster sets of documents into topic trees (that go from general to specific)....

A Perspective on Word Sense Disambiguation Methods and Their Evaluation (1997)

Philip Resnik, David Yarowsky

In this position paper, we make several observations about the state of the art in automatic word sense disambiguation. Motivated by these observations, we offer several specific proposals to the...

Homograph Disambiguation in Text-to-speech Synthesis (1997)

David Yarowsky, Henry Iii

This chapter presents a statistical decision procedure for lexical ambiguity resolution in text-to-speech synthesis. Based on decision lists, the algorithm incorporates both local syntactic patterns...

Unsupervised word sense disambiguation rivaling supervised methods (1995)

David Yarowsky

This paper presents an unsupervised learning algorithm for sense disambiguation that, when trained on unannotated English text, rivals the performance of supervised techniques that require...

Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French (1994)

David Yarowsky

This paper presents a statistical decision procedure for lexical ambiguity resolution. The algorithm exploits both local syntactic patterns and more distant collocational evidence, generating an...
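
The following is a minimal sketch of a two-sense decision list in the spirit of this abstract. Each piece of evidence becomes a rule. Evidence can be a local pattern, such as the word immediately to the left, or a word anywhere in a wider window. Rules are sorted by the absolute log-likelihood ratio of the two senses given that evidence, and a new instance is labeled by the single highest-ranked rule that matches. The feature templates, the window size, the add-alpha smoothing constant, and the toy "bass" data are illustrative assumptions, not the paper's exact settings.

```python
import math
from collections import defaultdict

def features(tokens, i, window=3):
    """Illustrative templates: adjacent words plus a +/-window bag of words."""
    feats = set()
    if i > 0:
        feats.add("left=" + tokens[i - 1])
    if i + 1 < len(tokens):
        feats.add("right=" + tokens[i + 1])
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            feats.add("window=" + tokens[j])
    return feats

def train_decision_list(examples, senses, alpha=0.1):
    """Rank features by |log((count_A + alpha) / (count_B + alpha))| (assumed smoothing)."""
    a, b = senses
    counts = defaultdict(lambda: {a: 0, b: 0})
    for tokens, i, sense in examples:
        for f in features(tokens, i):
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        llr = math.log((c[a] + alpha) / (c[b] + alpha))
        if llr != 0.0:  # features seen equally often with both senses carry no evidence
            rules.append((abs(llr), f, a if llr > 0 else b))
    # Strongest evidence first; ties broken by feature name for determinism.
    rules.sort(key=lambda r: (-r[0], r[1]))
    return rules

def classify(rules, tokens, i, default):
    """Return the sense predicted by the first (strongest) matching rule."""
    feats = features(tokens, i)
    for _, f, sense in rules:
        if f in feats:
            return sense
    return default

# Hypothetical training data: (tokens, index of ambiguous word, sense).
train = [
    ("he caught a bass while fishing".split(), 3, "fish"),
    ("striped bass live in the river".split(), 1, "fish"),
    ("she plays bass guitar in a band".split(), 2, "music"),
    ("turn up the bass on the amplifier".split(), 3, "music"),
]
rules = train_decision_list(train, senses=("fish", "music"))
print(classify(rules, "caught a huge bass today".split(), 3, default="music"))  # → fish
print(classify(rules, "a loud bass guitar riff".split(), 2, default="fish"))  # → music
```

Because only the single strongest matching rule decides, each prediction can be traced back to one piece of evidence rather than to a sum over many features.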

A comparison of corpus-based techniques for restoring accents in Spanish and French text (1994)

David Yarowsky

This chapter will explore and compare three corpus-based techniques for lexical ambiguity resolution, focusing on the problem of restoring missing accents to Spanish and French text. Many...

Discrimination Decisions for 100,000-Dimensional Spaces (1994)

William A. Gale, Kenneth W. Church, David Yarowsky

Discrimination decisions arise in many natural language processing tasks. Three classical tasks are discriminating texts by their authors (author identification), discriminating documents by their...

Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora (1992)

David Yarowsky

This paper describes a program that disambiguates English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories. Roget's categories serve as...
