Dan Klein

Efficient Inference in Phylogenetic InDel Trees (2009)

Alexandre Bouchard-Côté, Michael I. Jordan, Dan Klein

Accurate and efficient inference in evolutionary trees is a central problem in computational biology. While classical treatments have made unrealistic site independence assumptions, ignoring...

Fully Distributed EM for Very Large Datasets (2009)

Jason Wolfe, Aria Haghighi, Dan Klein

In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of the parameters in a...
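
The abstract's point that the E-step distributes because data items are independent given the parameters can be illustrated with a minimal sketch. It assumes a toy two-coin mixture model (hypothetical, not a model from the paper): each shard computes partial expected counts on its own, the partials are summed, and a single M-step re-estimates the parameters. It leaves aside the parameter-storage issue the abstract goes on to raise.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy model: a mixture of two biased coins, A and B.
# Each data item is the (heads, tails) count from one batch of flips.
Params = Tuple[float, float, float]  # (P(coin A), P(heads | A), P(heads | B))


def e_step_shard(shard: Sequence[Tuple[int, int]], params: Params) -> List[float]:
    """Expected counts for one shard: [#A, #B, heads_A, tails_A, heads_B, tails_B].
    Items are independent given params, so shards need no communication here."""
    prior_a, p_a, p_b = params
    stats = [0.0] * 6
    for heads, tails in shard:
        like_a = prior_a * p_a ** heads * (1 - p_a) ** tails
        like_b = (1 - prior_a) * p_b ** heads * (1 - p_b) ** tails
        r_a = like_a / (like_a + like_b)  # posterior probability of coin A
        r_b = 1.0 - r_a
        contributions = (r_a, r_b, r_a * heads, r_a * tails, r_b * heads, r_b * tails)
        for k, value in enumerate(contributions):
            stats[k] += value
    return stats


def combine(partials: List[List[float]]) -> List[float]:
    """Sum per-shard expected counts (the only step that gathers results)."""
    return [sum(column) for column in zip(*partials)]


def m_step(stats: List[float]) -> Params:
    """Re-estimate parameters from the summed expected counts."""
    n_a, n_b, heads_a, tails_a, heads_b, tails_b = stats
    return (n_a / (n_a + n_b),
            heads_a / (heads_a + tails_a),
            heads_b / (heads_b + tails_b))


def distributed_em(data: Sequence[Tuple[int, int]], params: Params,
                   num_shards: int = 3, iterations: int = 10) -> Params:
    shards = [data[i::num_shards] for i in range(num_shards)]
    for _ in range(iterations):
        # Each shard's E-step is independent and could run on a separate worker.
        partials = [e_step_shard(shard, params) for shard in shards]
        params = m_step(combine(partials))
    return params
```

Because the summed partial counts equal the counts from a single-machine E-step, sharding changes where the work happens but not the result.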

Coarse-to-Fine Syntactic Machine Translation using Language Projections (2009)

Slav Petrov, Aria Haghighi, Dan Klein

The intersection of tree transducer-based translation models with n-gram language models results in huge dynamic programs for machine translation decoding. We propose a multipass, coarse-to-fine...

Mixture-of-Parents Maximum Entropy Markov Models (2009)

David S. Rosenberg, Dan Klein, Ben Taskar

We present the mixture-of-parents maximum entropy Markov model (MoP-MEMM), a class of directed graphical models extending MEMMs. The MoP-MEMM allows tractable incorporation of long-range dependencies...

Parsing German with Latent Variable Grammars (2009)

Slav Petrov, Dan Klein

We describe experiments on learning latent variable grammars for various German treebanks, using a language-agnostic statistical approach. In our method, a minimal initial grammar is hierarchically...

Learning Bilingual Lexicons from Monolingual Corpora (2009)

Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick, Dan Klein

We present a method for learning bilingual translation lexicons from monolingual corpora. Word types in each language are characterized by purely monolingual features, such as context counts and...

Analyzing the Errors of Unsupervised Learning (2009)

Percy Liang, Dan Klein

We identify four types of errors that unsupervised induction systems make and study each one in turn. Our contributions include (1) using a meta-model to analyze the incorrect biases of a model in a...

The Complexity of Phrase Alignment Problems (2009)

John DeNero, Dan Klein

Many phrase alignment models operate over the combinatorial space of bijective phrase alignments. We prove that finding an optimal alignment in this space is NP-hard, while computing alignment...

Efficient Sentence Segmentation Using Syntactic Features (2009)

Benoit Favre, Dilek Hakkani-Tür, Slav Petrov, Dan Klein

To enable downstream language processing, automatic speech recognition output must be segmented into its individual sentences. Previous sentence segmentation systems have typically been very local,...

Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing (2009)

Slav Petrov, Dan Klein

We present a discriminative, latent variable approach to syntactic parsing in which rules exist at multiple scales of refinement. The model is formally a latent variable CRF grammar over trees,...

Abstract (2009)

Percy Liang, Dan Klein, Michael I. Jordan

The learning of probabilistic models with many hidden variables and nondecomposable dependencies is an important and challenging problem. In contrast to traditional approaches based on approximate...

Non-Local Modeling with a Mixture of PCFGs (2009)

Slav Petrov, Leon Barrett, Dan Klein

While most work on parsing with PCFGs has focused on local correlations between tree configurations, we attempt to model non-local correlations using a finite mixture of PCFGs. A mixture grammar fit...

Conflicting Tests (2008)

Dan Klein, UC Berkeley

...coast on Sunday packing 135 mph winds and torrential rain and causing panic in Cancun, where frightened tourists squeezed into musty shelters.

• Binding • Reference (2008)

Dan Klein

• How do we know what nodes go in the tree?

Approximate Factoring for A* Search (2008)

Aria Haghighi, John DeNero, Dan Klein

We present a novel method for creating A* estimates for structured search problems. In our approach, we project a complex model onto multiple simpler models for which exact inference is efficient....

Kinds of Reference (2008)

Dan Klein, UC Berkeley

• Sign up with me • You’ve got 10-20 minutes, one slot per group • Tell us: • The problem: why do we care? • Your concrete task: input, output, evaluation • A simple baseline for the...

Wednesday 12/8 (2008)

Dan Klein, UC Berkeley

• You’ve got 6-8 minutes! • Tell us: • The problem: why do we care? • Your concrete task: input, output, evaluation • A simple baseline for the task • Your method (half the time here)...

Phrase Structure Parsing (2008)

Dan Klein, UC Berkeley

WSD? • Remember when we discussed WSD?

Non-Local Modeling with a Mixture of PCFGs (2008)

Slav Petrov, Leon Barrett, Dan Klein

While most work on parsing with PCFGs has focused on local correlations between tree configurations, we attempt to model non-local correlations using a finite mixture of PCFGs. A mixture grammar fit...

Conflicting Tests (2008)

Dan Klein, UC Berkeley

• Phrase structure parsing organizes syntax into constituents or brackets • In general, this involves nested trees • Linguists can, and do, argue about details • Lots of ambiguity • Not the...

Units of transfer (2008)

Dan Klein, UC Berkeley

• Phrase structure parsing organizes syntax into constituents or brackets • In general, this involves nested trees • Linguists can, and do, argue about details • Lots of ambiguity • Not the...

Syntactic language and translation models • Hypothesis Lattices • WSD? (2008)

Dan Klein, UC Berkeley

• Remember when we discussed WSD? • Word-based MT systems rarely have a WSD step • Why not? Phrase Structure Parsing • Phrase structure parsing organizes syntax into constituents or brackets...

Kinds of Reference • Referring expressions (2008)

Dan Klein, UC Berkeley

• Sign up with me • You’ve got 10-20 minutes, one slot per group • Tell us: • The problem: why do we care? • Your concrete task: input, output, evaluation • A simple baseline for the...

Fast Exact Inference with a Factored Model for Natural Language Parsing (2008)

Dan Klein, Christopher D. Manning

We present a novel generative model for natural language tree structures in which semantic (lexical dependency) and syntactic (PCFG) structures are scored with separate models. This factorization...

Corpus-based induction of syntactic structure: Models of dependency and constituency (2008)

Dan Klein, Christopher D. Manning

We present a generative model for the unsupervised learning of dependency structures. We also describe the multiplicative combination of this dependency model with a model of linear constituency. The...

Learning Structured Models for Phone Recognition (2008)

Slav Petrov, Adam Pauls, Dan Klein

We present a maximally streamlined approach to learning HMM-based acoustic models for automatic speech recognition. In our approach, an initial monophone HMM is iteratively refined using a...

CHAMP (camera, handlens, and microscope probe) (2008)

Mungas, Greg S., Boynton, John E., Balzer, Mark A., Beegle, Luther, Sobel, Harold R., Fisher, Ted, ...

CHAMP (Camera, Handlens And Microscope Probe) is a novel field microscope capable of color imaging with continuously variable spatial resolution from infinity imaging down to diffraction-limited...

Abstract (2008)

Percy Liang, Dan Klein, Michael I. Jordan

The learning of probabilistic models with many hidden variables and nondecomposable dependencies is an important and challenging problem. In contrast to traditional approaches based on approximate...

Discriminative log-linear grammars with latent variables (2008)

Slav Petrov, Dan Klein

We demonstrate that log-linear grammars with latent variables can be practically trained using discriminative methods. Central to efficient discriminative training is a hierarchical pruning procedure...

Structure compilation: trading structure for features (2008)

Percy Liang, Hal Daumé III, Dan Klein

Structured models often achieve excellent performance but can be slow at test time. We investigate structure compilation, where we replace structure with features, which are often computationally...

Evaluating Strategies for Similarity Search on the Web (2007)

Taher H. Haveliwala, Aristides Gionis, Dan Klein, Piotr Indyk

Finding pages on the Web that are similar to a query page (Related Pages) is an important component of modern search engines. A variety of strategies have been proposed for answering Related Pages...

Theory and Practice (2007)

Chris Manning, Dan Klein, Eric Gaussier, Nicola Cancedda, Franck Thollard, Alexander Simon Clark, ...

I hereby declare that this thesis has not been submitted, either in the same or different form, to this or any other university for a degree. Signature: Acknowledgements First, I would like to thank...

Abstract (2007)

Dan Klein, Christopher D. Manning

While O(n³) methods for parsing probabilistic context-free grammars (PCFGs) are well known, a tabular parsing framework for arbitrary PCFGs which allows for bottom-up, top-down, and other parsing strategies,...

Combining heterogeneous classifiers for word-sense disambiguation (2007)

H. Tolga Ilhan, Sepandar D. Kamvar, Dan Klein, Christopher D. Manning, Kristina Toutanova

The Stanford-CS224N system is an ensemble of simple classifiers. The first-tier systems are heterogeneous, consisting primarily of naive-Bayes variants, but also including vector space, memory-based,...

Named entity recognition with character-level models (2007)

Dan Klein, Joseph Smarr, Huy Nguyen, Christopher D. Manning

We discuss two named-entity recognition models which use characters and character n-grams either exclusively or as an important part of their data representation. The first model is a character-level...

Fully Distributed EM for Very Large Datasets (2007)

Jason Wolfe, Aria Haghighi, Dan Klein

• Part-of-speech induction • Parsing and grammar induction • Word segmentation • Word alignment • Document summarization • Coreference resolution (2007)

Percy Liang, Dan Klein

Recent interest in Bayesian nonparametric methods. Probabilistic modeling is a core technique for many NLP tasks such as the ones listed. In recent years, there has been increased interest in applying...

The infinite PCFG using hierarchical Dirichlet processes (2007)

Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein

We present a nonparametric Bayesian model of tree structures based on the hierarchical Dirichlet process (HDP). Our HDP-PCFG model allows the complexity of the grammar to grow as more training data...

Detecting categories in news video using acoustic, speech and image features (2007)

Slav Petrov, Arlo Faria, Pascal Michaillat, Er Berg, Andreas Stolcke, Dan Klein, ...

This work describes systems for detecting semantic categories present in news video. The multimedia data was processed in three ways: the audio signal was converted to a sequence of acoustic...

Improved inference for unlexicalized parsing (2007)

Slav Petrov, Dan Klein

We present several improvements to unlexicalized parsing with hierarchically state-split PCFGs. First, we present a novel coarse-to-fine method in which a grammar’s own hierarchical projections are...

Word alignment via quadratic assignment (2006)

Simon Lacoste-Julien, Dan Klein

Recently, discriminative word alignment methods have achieved state-of-the-art accuracies by extending the range of information sources that can be easily incorporated into aligners. The chief...

Why generative phrase models underperform surface heuristics (2006)

John DeNero, Dan Gillick, James Zhang, Dan Klein

We investigate why weights from generative models underperform heuristic estimates in phrase-based machine translation. We first propose a simple generative, phrase-based model and verify that its...

An end-to-end discriminative approach to machine translation (2006)

Percy Liang, Alexandre Bouchard-Côté, Dan Klein, Ben Taskar

We present a perceptron-style discriminative approach to machine translation in which large feature sets can be exploited. Unlike discriminative reranking approaches, our system can take advantage of...
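
The abstract describes a perceptron-style learner. Below is a minimal sketch of the generic structured-perceptron update, assuming a toy setting: `candidates` enumerates possible outputs and `feats` maps an (input, output) pair to a sparse feature dictionary. Both are hypothetical stand-ins; the paper's decoder, feature set, and choice of update target are not reproduced here.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Features = Dict[str, float]


def score(weights: Dict[str, float], features: Features) -> float:
    """Linear model score: dot product of weights and sparse features."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def perceptron_epoch(
    data: List[Tuple[Hashable, Hashable]],
    candidates: Callable[[Hashable], Iterable[Hashable]],
    feats: Callable[[Hashable, Hashable], Features],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """One pass of the structured perceptron; updates `weights` in place."""
    for x, gold in data:
        # Decode: pick the highest-scoring candidate under the current weights.
        pred = max(candidates(x), key=lambda y: score(weights, feats(x, y)))
        if pred != gold:
            # Move the weights toward the reference output, away from the prediction.
            for name, value in feats(x, gold).items():
                weights[name] = weights.get(name, 0.0) + value
            for name, value in feats(x, pred).items():
                weights[name] = weights.get(name, 0.0) - value
    return weights
```

With a toy word-translation setup (`candidates = lambda x: ["cat", "dog"]`, `feats = lambda x, y: {f"{x}->{y}": 1.0}`), one epoch over `[("chat", "cat"), ("chien", "dog")]` is enough for both inputs to decode correctly.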

Proceedings of the Workshop on Statistical Machine Translation, pages 31-38, New York City (2006)

John DeNero, Dan Gillick, James Zhang, Dan Klein

We investigate why weights from generative models underperform heuristic estimates in phrase-based machine translation. We first propose a simple generative, phrase-based model and verify that its...

Learning Accurate, Compact, and Interpretable Tree Annotation (2006)

Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein

We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple X-bar...

Word Alignment via Quadratic Assignment (2006)

Lacoste-Julien, Simon, Taskar, Ben, Klein, Dan, Jordan, Michael I.

Recently, discriminative word alignment methods have achieved state-of-the-art accuracies by extending the range of information sources that can be easily incorporated into aligners. The chief...

Prototype-driven grammar induction (2006)

Aria Haghighi, Dan Klein

We investigate prototype-driven learning for primarily unsupervised grammar induction. Prior knowledge is specified declaratively, by providing a few canonical examples of each target phrase type....

A Discriminative Matching Approach to Word Alignment (2005)

Taskar, Ben, Lacoste-Julien, Simon, Klein, Dan

We present a discriminative, large-margin approach to feature-based matching for word alignment. In this framework, pairs of word tokens receive a matching score, which is based on features of that...
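
In the matching view the abstract describes, every (source word, target word) pair gets a score and an alignment is a one-to-one matching. The sketch below, assuming a small square score matrix with made-up values, finds the highest-scoring matching by brute force over permutations. It ignores how the scores are learned from features and omits null alignments; a practical system would use a polynomial-time assignment algorithm (for example the Hungarian method) rather than enumeration.

```python
from itertools import permutations
from typing import List, Tuple


def best_matching(scores: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total score, [(source_index, target_index), ...]) for the best
    one-to-one matching in a square score matrix, by exhaustive search."""
    n = len(scores)
    best_total, best_perm = float("-inf"), tuple(range(n))
    for perm in permutations(range(n)):
        # perm[i] is the target word matched to source word i.
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, list(enumerate(best_perm))
```

For the hypothetical matrix `[[0.9, 0.1, 0.0], [0.2, 0.1, 0.8], [0.0, 0.7, 0.3]]`, the best matching pairs source 0 with target 0, source 1 with target 2, and source 2 with target 1, for a total of about 2.4.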

A discriminative matching approach to word alignment (2005)

Ben Taskar, Simon Lacoste-Julien, Dan Klein

We present a discriminative, large-margin approach to feature-based matching for word alignment. In this framework, pairs of word tokens receive a matching score, which is based on features of that...

A Core-Tools Statistical NLP Course (2005)

Dan Klein

In the fall term of 2004, I taught a new statistical NLP course focusing on core tools and machine-learning algorithms.

Transfer of Grammatical Structure (2005)

Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein

Recent research has demonstrated that PCFGs with latent annotations are an effective way to provide automated increases in parsing accuracy. We feel that they have more potential than the literature...

Corpus-based induction of syntactic structure: Models of dependency and constituency (2004)

Dan Klein

We present a generative model for the unsupervised learning of dependency structures. We also describe the multiplicative combination of this dependency model with a model of linear constituency. The...

Analyzing an Italian Treebank with State-of-the-Art Statistical Parsers (2004)

Alberto Lavelli, Giorgio Satta, Roberto Zanoli, ...

In this paper we report work in progress on the application of state-of-the-art statistical parsing techniques to Italian. Our approach partially differs from previous efforts on other languages because...

Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network (2003)

Kristina Toutanova, Dan Klein, Christopher D. Manning, Yoram Singer

We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of...

Accurate Unlexicalized Parsing (2003)

Dan Klein, Christopher D. Manning

We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence...
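
As an illustration of a linguistically motivated state split, the sketch below applies parent annotation, a standard example of such a split: each phrasal node is relabeled with its parent's category, so subject NPs (NP^S) and object NPs (NP^VP) get separate rule statistics. The tuple tree encoding and the `^` separator are assumptions for this sketch; the paper's full set of splits is not reproduced.

```python
from typing import Optional, Union

# A tree is either a word (str) or a tuple (label, child, child, ...).
Tree = Union[str, tuple]


def parent_annotate(tree: Tree, parent: Optional[str] = None) -> Tree:
    """Append the parent's original label to every phrasal node except the root."""
    if isinstance(tree, str):
        return tree  # words are left untouched
    label, *children = tree
    # Preterminals (nodes whose children are all words) keep their tag here.
    is_preterminal = all(isinstance(child, str) for child in children)
    new_label = label if (is_preterminal or parent is None) else f"{label}^{parent}"
    return (new_label, *(parent_annotate(child, label) for child in children))
```

Applied to `("S", ("NP", ("PRP", "He")), ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "dog"))))`, the subject becomes `NP^S` and the object becomes `NP^VP`.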

Factored A* search for models over sequences and trees (2003)

Dan Klein

We investigate the calculation of A* bounds for sequence and tree models which are the explicit intersection of a set of simpler models or can be bounded by such an intersection. We provide a...

Spectral learning (2003)

Sepandar D. Kamvar, Dan Klein, Christopher D. Manning

We present a simple, easily implemented spectral learning algorithm which applies equally whether we have no supervisory information, pairwise link constraints, or labeled examples. In the...

Named entity recognition with character-level models (2003)

Dan Klein, Joseph Smarr, Huy Nguyen, Christopher D. Manning

We discuss two named-entity recognition models which use characters and character n-grams either exclusively or as an important part of their data representation. The first model is a character-level...
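
The character n-gram representation mentioned in the abstract can be sketched as a feature extractor. This minimal version, assuming `<` and `>` as hypothetical word-boundary markers and a maximum n of 3, lists every character n-gram of a word; it is not the paper's exact feature set.

```python
from typing import List


def char_ngrams(word: str, max_n: int = 3) -> List[str]:
    """All character n-grams (1 <= n <= max_n) of a boundary-padded word."""
    padded = f"<{word}>"  # boundary markers let prefixes and suffixes show up as features
    grams: List[str] = []
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams
```

For example, `char_ngrams("ab")` returns `['<', 'a', 'b', '>', '<a', 'ab', 'b>', '<ab', 'ab>']`; n-grams such as a capitalized prefix or a suffix like `-ville>` are the kind of evidence that can help classify unseen names.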

A* parsing: Fast exact Viterbi parse selection (2003)

Dan Klein

We present an extension of the classic A* search procedure to tabular PCFG parsing. The use of A* search can dramatically reduce the time required to find a best parse by conservatively estimating...
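
The abstract credits the speed-up to conservatively estimating the remaining cost. The core mechanism is ordinary A* with an admissible (never-overestimating) heuristic, sketched here on a small generic graph rather than a parse chart. The graph, the edge costs (think negative log probabilities), and the heuristic values are hypothetical; the paper's grammar-derived estimates are not reproduced.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, edge cost), ...]


def a_star(graph: Graph, start: str, goal: str,
           h: Callable[[str], float]) -> Optional[Tuple[float, List[str]]]:
    """Cheapest path from start to goal; optimal whenever h never overestimates."""
    # Queue entries: (g + h estimate, cost so far g, node, path to node).
    frontier = [(h(start), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path  # first pop of the goal is optimal under an admissible h
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found later
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(frontier, (new_g + h(neighbor), new_g, neighbor, path + [neighbor]))
    return None
```

With `h = lambda n: 0.0` this reduces to uniform-cost (Dijkstra) search; a tighter admissible estimate pops fewer items before reaching the goal, which is the effect the abstract describes.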

Computing PageRank using power extrapolation (2003)

Taher Haveliwala, Sepandar Kamvar, Dan Klein, Chris Manning, Gene Golub

We present a novel technique for speeding up the computation of PageRank, a hyperlink-based estimate of the "importance" of Web pages, based on the ideas presented in...
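
For context on what is being accelerated, the sketch below is the plain power method for PageRank that such speed-ups start from; it does not implement power extrapolation itself. The damping factor 0.85, the tolerance, and the uniform redistribution of mass from pages without out-links are common conventions assumed here.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Plain power iteration for PageRank over a dict of out-links."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new = {p: (1.0 - damping) / n for p in pages}  # teleportation term
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)  # split rank over out-links
        for p in pages:
            new[p] += damping * dangling / n  # pages with no out-links spread mass uniformly
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # L1 change between iterates
            break
    return rank
```

The error of this iteration shrinks roughly by a factor of the damping value per step, which is why methods that extrapolate from successive iterates can save many iterations.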

A* Parsing: Fast Exact Viterbi Parse Selection (2003)

Dan Klein, Christopher D. Manning

We present an extension of the classic A* search procedure to tabular PCFG parsing. The use of A* search can dramatically reduce the time required to find a best parse by conservatively estimating...

Fast Exact Inference with a Factored Model for Natural Language Parsing (2003)

Dan Klein, Christopher D. Manning

We present a novel generative model for natural language tree structures in which semantic (lexical dependency) and syntactic (PCFG) structures are scored with separate models. This factorization...

Factored A* Search for Models over Sequences and Trees (2003)

Dan Klein, Christopher D. Manning

We investigate the calculation of A* bounds for sequence and tree models which are the explicit intersection of a set of simpler models or can be bounded by such an intersection. We provide a natural...

Spectral learning (2003)

Percy Liang, Dan Klein, Michael I. Jordan

The learning of probabilistic models with many hidden variables and nondecomposable dependencies is an important and challenging problem. In contrast to traditional approaches based on approximate...

Evaluating Strategies for Similarity Search on the Web (2002)

Haveliwala, Taher H., Gionis, Aristides, Klein, Dan, Indyk, Piotr

Finding pages on the Web that are similar to a query page (Related Pages) is an important component of modern search engines. A variety of strategies have been proposed for answering Related Pages...

Combining heterogeneous classifiers for word-sense disambiguation (2002)

Dan Klein, Kristina Toutanova, H. Tolga Ilhan, Sepandar D. Kamvar, Christopher D. Manning

This paper discusses ensembles of simple but heterogeneous classifiers for word-sense disambiguation, examining the Stanford-CS224N system entered in the SENSEVAL-2 English lexical sample task....

Interpreting and extending classical agglomerative clustering algorithms using a model-based approach (2002)

Sepandar D. Kamvar, Dan Klein, Christopher D. Manning

We present two results which arise from a model-based approach to hierarchical agglomerative clustering. First, we show formally that the common heuristic agglomerative clustering algorithms--...
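
A minimal sketch of the classical heuristic agglomerative procedure the abstract refers to, assuming one-dimensional points and absolute difference as the distance: repeatedly merge the two closest clusters until `k` remain. Passing `min` as the linkage gives single-link and `max` gives complete-link; the model-based interpretation developed in the paper is not shown.

```python
from typing import Callable, List


def agglomerate(points: List[float], k: int,
                linkage: Callable[[List[float]], float] = min) -> List[List[float]]:
    """Greedy bottom-up clustering: merge the closest pair until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest cluster pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Linkage reduces all cross-cluster point distances to one number.
                d = linkage([abs(a - b) for a in clusters[i] for b in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

This direct version recomputes all pairwise linkages at every merge, so it is only suitable for small inputs; the single-link and complete-link variants differ only in the reduction passed as `linkage`.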

From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering (2002)

Dan Klein, Sepandar D. Kamvar, Christopher D. Manning

We present an improved method for clustering in the presence of very limited supervisory information, given as pairwise instance constraints. By allowing instance-level constraints to have space-level...
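
The idea of letting instance-level constraints have space-level effects can be illustrated with a minimal sketch, assuming one simple mechanism: a must-link pair is given distance zero, and an all-pairs shortest-path pass (Floyd-Warshall) then pulls the neighbors of each linked point closer to its partner. This illustrates the idea rather than reproducing the paper's exact procedure; cannot-link constraints are not handled.

```python
from typing import List, Tuple


def impose_must_links(dist: List[List[float]],
                      must_link: List[Tuple[int, int]]) -> List[List[float]]:
    """Zero out must-link distances, then propagate them with Floyd-Warshall."""
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy of the distance matrix
    for i, j in must_link:
        d[i][j] = d[j][i] = 0.0
    # Shortest paths restore the triangle inequality, so points near one member
    # of a must-link pair also become near the other member.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

For points at 0, 1, 10, and 11 on a line with a must-link between the second and third, the distance between the outer two points drops from 11 to 2, so a subsequent clustering step treats the two groups as one region.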

Conditional structure versus conditional estimation in NLP models (2002)

Dan Klein, Christopher D. Manning

This paper separates conditional parameter estimation, which consistently raises test set accuracy on statistical NLP tasks, from conditional model structures, such as the conditional Markov model...

Natural language grammar induction using a constituent-context model (2002)

Dan Klein, Christopher D. Manning

This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG...

Interpreting and Extending Classical Agglomerative Clustering Algorithms (2002)

Sepandar D. Kamvar, Dan Klein, Christopher D. Manning

We present two results which arise from a model-based approach to hierarchical agglomerative clustering. First, we show formally that the common heuristic agglomerative clustering algorithms --...

Interpreting and extending classical agglomerative clustering algorithms using a model-based approach (2002)

Sepandar D. Kamvar, Dan Klein, Christopher D. Manning

We present two results which arise from a model-based approach to hierarchical agglomerative clustering. First, we show formally that the common heuristic agglomerative clustering algorithms –...

A Generative Constituent-Context Model for Improved Grammar Induction (2002)

Dan Klein, Christopher D. Manning

We present a generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts. Parameter...

Evaluating Strategies for Similarity Search on the Web (2002)

Taher H. Haveliwala, Aristides Gionis, Dan Klein, Piotr Indyk

Finding pages on the Web that are similar to a query page (Related Pages) is an important component of modern search engines. A variety of strategies have been proposed for answering Related Pages...

Artists in Glass: Late Twentieth Century Masters in Glass (2001)

Dan Klein

A book made up of short essays on the creations of around eighty artists who work with glass as their raw material. It includes images of each artist's work. The text...

Distributional phrase structure induction (2001)

Dan Klein, Christopher D. Manning

Unsupervised grammar induction systems commonly judge potential constituents on the basis of their effects on the likelihood of the data. Linguistic justifications of constituency, on the other hand,...

Parsing with treebank grammars: Empirical bounds, theoretical models, and the structure of the Penn Treebank (2001)

Dan Klein, Christopher D. Manning

This paper presents empirical studies and closely corresponding theoretical models of the performance of a chart parser exhaustively parsing the Penn Treebank with the Treebank's own CFG...

Parsing and hypergraphs (2001)

Dan Klein, Christopher D. Manning

While symbolic parsers can be viewed as deduction systems, this view is less natural for probabilistic parsers. We present a view of parsing as directed hypergraph analysis which naturally covers...

Candidate Model Problems in Software Architecture (1994)

Mary Shaw, David Garlan, Robert Allen, Dan Klein, John Ockerbloom, Curtis Scott, ...

...data types. The second solution decomposes the system into a similar set of five modules. However, in this case data is no longer directly shared by the computational components. Instead, each module...

Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon

Yang Huang, Henry J. Lowe, Dan Klein, Russell J. Cucina

Objective: The aim of this study was to develop and evaluate a method of extracting noun phrases with full phrase structures from a set of clinical radiology reports using natural language processing...

Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network

Kristina Toutanova, Dan Klein, Christopher D. Manning, Yoram Singer

We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of...

Factors affecting secondary share offerings in the IPO process

Dan Klein, Mingsheng Li

We investigate whether the sale of secondary shares in the IPO process is affected by an issuing firm's market-timing and window-dressing activities. We find that secondary share offerings in IPOs...