DRASO: Declaratively Regularized Alternating Structural Optimization (2009)
Partha Pratim Talukdar, Ted Sandler, Mark Dredze, Koby Crammer, John Blitzer, Fernando Pereira
Recent work has shown that Alternating Structural Optimization (ASO) can improve supervised learners by learning feature representations from unlabeled data. However, there is no natural way to...
Dynamic Quality Control for Transform Domain Wyner-Ziv Video Coding (2009)
Sören Sofke, Fernando Pereira, Erika Müller
Wyner-Ziv is an emerging video coding paradigm based on the Slepian-Wolf and Wyner-Ziv theorems where video coding may be performed by exploiting the temporal correlation at the decoder and not...
A transcription factor affinity-based code for mammalian transcription initiation (2009)
Megraw, Molly, Pereira, Fernando, Jensen, Shane T., Ohler, Uwe, Hatzigeorgiou, Artemis G.
The recent arrival of large-scale cap analysis of gene expression (CAGE) data sets in mammals provides a wealth of quantitative information on coding and noncoding RNA polymerase II transcription...
Feature Design for Transfer Learning (2008)
Mark Dredze, John Blitzer, Koby Crammer, Fernando Pereira
Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution and labeled using the same function. However, often we have labeled...
Penn/UMass/CHOP Biocreative II systems (2008)
Kuzman Ganchev, Koby Crammer, Fernando Pereira, Gideon Mann, Kedar Bellare, Andrew Mccallum, ...
Our team participated in the entity tagging and normalization tasks of Biocreative II. For the entity tagging task, we used a k-best MIRA learning algorithm with lexicons and automatically derived...
Administrative changes in DRG-based financing models in Portugal (2008)
Cardoso, Ana, Pereira, Fernando
No abstract available.
Global Inference and Learning Algorithms for Multi-Lingual Dependency Parsing (2008)
Ryan Mcdonald, Koby Crammer, Fernando Pereira, Kevin Lerman
This paper gives an overview of the work of McDonald et al. (McDonald et al. 2005a, 2005b; McDonald and Pereira 2006; McDonald et al. 2006) on global inference and learning algorithms for data-driven...
Feature Article MPEG-21: Goals and Achievements (2008)
MPEG-21 is an open standards-based framework for multimedia delivery and consumption. It aims to enable the use of multimedia resources across a wide range of networks and devices. In this article,...
Spanning Tree Methods for Discriminative Training of Dependency Parsers (2008)
Ryan Mcdonald, Koby Crammer, Fernando Pereira
Untyped dependency parsing can be viewed as the problem of finding maximum spanning trees (MSTs) in directed graphs. Using this representation, the Eisner (1996) parsing algorithm is sufficient for...
Mariam Kimiaei Asadi, Fernando Pereira, Yves Mathieu Examinateurs, Vincent Charvillat, Nabil Layaïda, ...
Thesis presented to obtain the degree of Doctor
José M. Martínez, Rob Koenen, Fernando Pereira
permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of MPEG's products or services. Internal or personal use of this material is permitted....
The Need for Open Source Software in Machine Learning (2008)
Sören Sonnenburg, Mikio L. Braun, Samy Bengio, Leon Bottou, Geoffrey Holmes, Yann Lecun, ...
Euclidean Embedding of Co-occurrence Data (2008)
Fernando Pereira, Naftali Tishby, John Lafferty
Embedding algorithms search for a low dimensional continuous representation of data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper...
Embedding Heterogeneous Data using Statistical Models (2008)
Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby
Embedding algorithms are a method for revealing low dimensional structure in complex data. Most embedding algorithms are designed to handle objects of a single type for which pairwise distances are...
Distributed Video Coding: Selecting the most promising application scenarios (2008)
Pereira, Fernando, Torres, Luis, Guillemot, Christine, Ebrahimi, Touradj, Leonardi, Riccardo, Klomp, Sven
Distributed Video Coding (DVC) is a new video coding paradigm based on two major Information Theory results: the Slepian-Wolf and Wyner-Ziv theorems. Recently, practical DVC solutions have been...
Learning to Create Data-Integrating Queries (2008)
Partha Pratim Talukdar, Marie Jacob, Muhammad Salman Mehmood, Koby Crammer, Zachary G. Ives, Fernando Pereira, ...
The number of potentially-related data resources available for querying — databases, data warehouses, virtual integrated schemas — continues to grow rapidly. Perhaps no area has seen this problem...
Distributional similarity is a useful notion in estimating the probabilities of rare joint events. It has been employed both to cluster events according to their distributions, and to directly...
Document Expansion for Speech Retrieval (2007)
Amit Singhal, Fernando Pereira
Advances in automatic speech recognition allow us to search large speech collections using traditional information retrieval methods. The problem of "aboutness" for documents --- is a...
Glyn V. Morrill, Fernando Pereira
Look under the hood of most theories of grammar or computational linguistic formalisms and you will find a "machine," often fueled by "rules," that grinds...
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data (2007)
John Lafferty, Andrew McCallum, Fernando Pereira
We present Conditional Random Fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models...
John Lafferty, Andrew Mccallum, Fernando Pereira
We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and...
Orlando Cicchello, Stefan C. Kremer, Fernando Pereira
This paper provides a comprehensive survey of the field of grammar induction applied to randomly generated languages using sparse example sets.
Euclidean Embedding of Co-occurrence Data (2007)
Globerson, Amir, Chechik, Gal, Pereira, Fernando, Tishby, Naftali
Embedding algorithms search for a low dimensional continuous representation of data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper...
The Need for Open Source Software in Machine Learning (2007)
Sonnenburg, Sören, Braun, Mikio, Ong, Cheng Soon, Bengio, Samy, Bottou, Leon, Holmes, Geoffrey, ...
Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a...
Global Discriminative Learning for Higher-Accuracy Computational Gene Prediction (2007)
Axel Bernal, Koby Crammer, Artemis Hatzigeorgiou, Fernando Pereira
Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models...
Pedro, José, Soares, Luís, Brites, Catarina, Ascenso, João, Pereira, Fernando, Bandeirinha, Carlos, ...
Wyner-Ziv (WZ) video coding is an emerging video coding paradigm based on two major Information Theory results: the Slepian-Wolf and Wyner-Ziv theorems. One of the most interesting and used WZ video...
DVC Before DISCOVER 3 The DVC World in 2004 … (2007)
compression Syndrome based Multimedia coding) solution developed at Univ. Berkeley by Prof. Ramchandran’s team. • Feedback-channel based solution developed at Univ. Stanford by Prof. Girod’s...
John Blitzer, Mark Dredze, Fernando Pereira
Automatic sentiment classification has been extensively studied and applied in recent years. However, sentiment is expressed differently in different domains, and annotating corpora for every...
Visual interactive subgroup discovery with numerical properties of interest (2006)
Azevedo, Paulo J., Jorge, Alípio M., Pereira, Fernando
Subgroup discovery consists in finding subsets of individuals from a given population which have distinctive collective properties with regard to one or more properties of interest. The interest of a...
Online Learning of Approximate Dependency Parsing Algorithms (2006)
Ryan Mcdonald, Fernando Pereira
In this paper we extend the maximum spanning tree (MST) dependency parsing framework of McDonald et al. (2005c) to incorporate higher-order feature representations and allow dependency structures...
Multilingual dependency analysis with a two-stage discriminative parser (2006)
Ryan Mcdonald, Kevin Lerman, Fernando Pereira
We present a two-stage multilingual dependency parser and evaluate it on 13 diverse languages. The first stage is based on the unlabeled dependency parsing models described by McDonald and Pereira...
Domain adaptation with structural correspondence learning (2006)
John Blitzer, Ryan Mcdonald, Fernando Pereira
Discriminative learning methods are widely used in natural language processing. These methods work best when their training and test data are drawn from the same distribution. For many NLP tasks,...
Video Object Relevance Metrics for Overall Segmentation Quality Evaluation (2006)
Paulo Correia, Fernando Pereira
Video object segmentation is a task that humans perform efficiently and effectively, but which is difficult for a computer to perform. Since video segmentation plays an important role for many...
Automatically annotating documents with normalized gene lists (2005)
Crim, Jeremiah, McDonald, Ryan, Pereira, Fernando
Background: Document gene normalization is the problem of creating a list of unique identifiers for genes that are mentioned within a document. Automating this process has many potential...
Identifying gene and protein mentions in text using conditional random fields (2005)
McDonald, Ryan, Pereira, Fernando
Background: We present a model for tagging gene and protein mentions from text using the probabilistic sequence tagging framework of conditional random fields (CRFs). Conditional random...
Automatically annotating documents with normalized gene lists (2005)
Crim, Jeremiah, McDonald, Ryan, Pereira, Fernando
Background: Document gene normalization is the problem of creating a list of unique identifiers for genes that are mentioned within a document. Automating this process has many potential applications...
Weighted Automata in Text and Speech Processing (2005)
Mohri, Mehryar, Pereira, Fernando, Riley, Michael
Finite-state automata are a very effective tool in natural language processing. However, in a variety of applications and especially in speech processing, it is necessary to consider more general...
Online large-margin training of dependency parsers (2005)
Ryan Mcdonald, Koby Crammer, Fernando Pereira
We present an effective training algorithm for linearly-scored dependency parsers that implements online large-margin multi-class training (Crammer and Singer, 2003; Crammer et al., 2003) on top of...
Flexible text segmentation with structured multilabel classification (2005)
Ryan Mcdonald, Koby Crammer, Fernando Pereira
Many language processing tasks can be reduced to breaking the text into segments with prescribed properties. Such tasks include sentence splitting, tokenization, named-entity extraction, and...
Scalable large-margin online learning for structured classification (2005)
Koby Crammer, Ryan Mcdonald, Fernando Pereira
We investigate large-margin online learning algorithms for large-scale structured classification tasks, focusing on a structured-output extension of MIRA, the multi-class classification algorithm of...
Simple algorithms for complex relation extraction with applications to biomedical IE (2005)
Ryan Mcdonald, Fernando Pereira, Seth Kulick, Scott Winters, Yang Jin, Pete White
A complex relation is any n-ary relation in which some of the arguments may be unspecified. We present here a simple two-stage method for extracting complex relations between named entities in...
Euclidean embedding of co-occurrence data (2005)
Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby
Embedding algorithms search for low dimensional structure in complex data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper...
Euclidean embedding of co-occurrence data (2005)
Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby
Embedding algorithms search for low dimensional structure in complex data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a...
Non-projective dependency parsing using spanning tree algorithms (2005)
Ryan Mcdonald, Fernando Pereira, Kiril Ribarov, Jan Hajič
We formalize weighted dependency parsing as searching for maximum spanning trees (MSTs) in directed graphs. Using this representation, the parsing algorithm of Eisner (1996) is sufficient for...
Euclidean embedding of co-occurrence data (2005)
Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby, John Lafferty
Embedding algorithms search for a low dimensional continuous representation of data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper...
Módulo Cifrador de Documentos Eletrônicos (2004)
Ricardo Custódio, Júlio Dias, Fernando Pereira, Adriana Notoya
AND THE COMMITTEE ON GRADUATE STUDIES (2004)
Ben Taskar, Daphne Koller, Andrew Y. Ng, Fernando Pereira
Case-factor diagrams for structured probabilistic modeling (2004)
David Mcallester, Michael Collins, Fernando Pereira
We introduce a probabilistic formalism subsuming Markov random fields of bounded tree width and probabilistic context free grammars. Our models are based on a representation of Boolean formulas that...
An entity tagger for recognizing acquired genomic variations in cancer literature (2004)
McDonald, Ryan T., Winters, R. Scott, Mandel, Mark, Jin, Yang, White, Peter S., Pereira, Fernando
Summary: VTag is an application for identifying the type, genomic location and genomic state-change of acquired genomic aberrations described in text. The application uses a machine learning...
Universal Multimedia Experiences for Tomorrow (2003)
Fernando Pereira, Ian Burnett
This article discusses the current status of universal multimedia access (UMA) technologies and investigates future directions in this area
Stand-Alone Objective Segmentation Quality Evaluation (2002)
Paulo Lobato Correia, Fernando Pereira
The identification of objects in video sequences, that is, video segmentation, plays a major role in emerging interactive multimedia services, such as those enabled by the ISO MPEG-4 and MPEG-7...
Linear Logic for Meaning Assembly (2002)
Mary Dalrymple, John Lamping, Fernando Pereira, Vijay Saraswat
Semantic theories of natural language associate meanings with utterances by providing meanings for lexical items, together with composition rules for computing the meanings of larger units from the...
Stand-Alone Objective Segmentation Quality Evaluation (2002)
Fernando Pereira, Paulo Lobato Correia
The identification of objects in video sequences, that is, video segmentation, plays a major role in emerging interactive multimedia services, such as those enabled by the ISO MPEG-4 and MPEG-7...
Weighted Finite-State Transducers in Speech Recognition (2001)
Mohri, Mehryar, Pereira, Fernando, Riley, Michael
We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for hidden Markov models (HMMs),...
Conditional random fields: Probabilistic models for segmenting and labeling sequence data (2001)
John Lafferty, Fernando Pereira
We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models...
Weighted Finite-State Transducers in Speech Recognition (2001)
Mehryar Mohri, Fernando Pereira, Michael Riley
We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for HMM models, context-dependency, pronunciation...
Maximum entropy markov models for information extraction and segmentation (2000)
Andrew Mccallum, Dayne Freitag, Fernando Pereira
Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech tagging, text...
The Design Principles of a Weighted Finite-State Transducer Library (2000)
Mehryar Mohri, Fernando Pereira, Michael Riley
We describe the algorithmic and software design principles of an object-oriented library for weighted finite-state transducers. By taking advantage of the theory of rational power series, we were...
Formal Grammar and Information Theory: Together Again? (2000)
this paper in the usual sense from theory of computation of a problem that has been proven to belong to one of the standard classes believed to require more than polynomial time on a deterministic...
Maximum Entropy Markov Models for Information Extraction and Segmentation (2000)
Andrew Mccallum, Dayne Freitag, Fernando Pereira
Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech tagging, text...
Maximum Entropy Markov Models for Information Extraction (2000)
Andrew Mccallum, Dayne Freitag, Fernando Pereira
Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech tagging, text...
Weighted Finite-State Transducers in Speech Recognition (2000)
Mehryar Mohri, Fernando Pereira, Michael Riley
We survey the weighted finite-state transducer (WFST) approach to speech recognition developed at AT&T over the last several years. We show that WFSTs provide a common and natural representation...
Relating probabilistic grammars and automata (1999)
Steven Abney, David Mcallester, Fernando Pereira
Both probabilistic context-free grammars (PCFGs) and shift-reduce probabilistic push-down automata (PPDAs) have been used for language modeling and maximum likelihood parsing. We investigate the...
Amit Singhal, John Choi, Donald Hindle, David D. Lewis, Fernando Pereira
This year AT&T participated in the ad-hoc task and the Filtering, SDR, and VLC tracks. Most of our effort for TREC-7 was concentrated on SDR and VLC tracks. On the filtering track, we tested a...
Finding Information In Audio: A New Paradigm For Audio Browsing And Retrieval (1999)
Julia Hirschberg, Steve Whittaker, Don Hindle, Fernando Pereira, Amit Singhal
Information retrieval from audio data is sharply different from information retrieval from text, not simply because speech recognition errors affect retrieval effectiveness, but more fundamentally...
Document Expansion for Speech Retrieval (1999)
Amit Singhal, Fernando Pereira
Advances in automatic speech recognition allow us to search large speech collections using traditional information retrieval methods. The problem of "aboutness" for documents --- is a...
Relating Probabilistic Grammars and Automata (1999)
Steven Abney, David Mcallester, Fernando Pereira
Both probabilistic context-free grammars (PCFGs) and shift-reduce probabilistic pushdown automata (PPDAs) have been used for language modeling and maximum likelihood parsing. We investigate the...
SCAN: Designing and evaluating user interfaces to support retrieval from speech archives (1999)
Steve Whittaker, Julia Hirschberg, John Choi, Don Hindle, Fernando Pereira, ...
Previous examinations of search in textual archives have assumed that users first retrieve a ranked set of documents relevant to their query, and then visually scan through these documents, to...
Amit Singhal, John Choi, Donald Hindle, David D. Lewis, Fernando Pereira
This year AT&T participated in the ad-hoc task and the Filtering, SDR, and VLC tracks. Most of our effort for TREC-7 was concentrated on SDR and VLC tracks. On the filtering track, we tested a...
Relating probabilistic grammars and automata (1999)
Steven Abney, David Mcallester, Fernando Pereira
Both probabilistic context-free grammars (PCFGs) and shift-reduce probabilistic pushdown automata (PPDAs) have been used for language modeling and maximum likelihood parsing. We investigate the...
Candide: An Interactive System for the Acquisition of Domain Specific Knowledge. (1998)
How is the knowledge that is embodied in a computer system acquired? In most present-day systems, it is painstakingly encoded in the actual algorithms that implement the system. Even in most...
A Conditional Random Field for Discriminatively-Trained Finite-State String Edit Distance (1998)
McCallum, Andrew, Bellare, Kedar, Pereira, Fernando
The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit...
Full Expansion Of Context-Dependent Networks In Large Vocabulary Speech Recognition (1998)
Mehryar Mohri, Michael Riley, Don Hindle, Andrej Ljolje, Fernando Pereira
We combine our earlier approach to context-dependent network representation with our algorithm for determinizing weighted networks to build optimized networks for large-vocabulary speech recognition...
An Overview Of The AT&T Spoken Document Retrieval (1998)
John Choi, Don Hindle, Julia Hirschberg, Ivan Magrin-chagnolleau, Christine Nakatani, Fernando Pereira, ...
We present an overview of a spoken document retrieval system developed at AT&T Labs-Research for the HUB4 Broadcast News corpus. This overview includes a description of the intonational phrase...
SCAN - Speech Content Based Audio Navigator: A Systems Overview (1998)
John Choi, Don Hindle, Julia Hirschberg, Ivan Magrin-chagnolleau, Christine Nakatani, Fernando Pereira, ...
SCAN (Speech Content based Audio Navigator) is a spoken document retrieval system integrating speaker-independent, large-vocabulary speech recognition with information-retrieval to support...
A Rational Design for a Weighted Finite-State Transducer Library (1998)
Mehryar Mohri, Fernando Pereira, Michael Riley
this paper is used ambiguously to refer both to the use of the theory of rational power series as a foundation for the library, and to the design approach, in which each object and function has a...
Similarity-Based Methods For Word Sense Disambiguation (1997)
Dagan, Ido, Lee, Lillian, Pereira, Fernando
We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and...
Aggregate and mixed-order Markov models for statistical language processing (1997)
Saul, Lawrence, Pereira, Fernando
We consider the use of language models whose size and accuracy are intermediate between different order n-gram models. Two types of models are studied in particular. Aggregate Markov models are...
Transducer composition for context-dependent network expansion (1997)
Michael Riley, Fernando Pereira, Mehryar Mohri
Context-dependent models for language units are essential in high-accuracy speech recognition. However, standard speech recognition frameworks are based on the substitution of lower-level models for...
Aggregate and mixed-order Markov models for statistical language processing (1997)
Lawrence Saul, Fernando Pereira
We consider the use of language models whose size and accuracy are intermediate between different order n-gram models. Two types of models are studied in particular. Aggregate Markov models are...
Transducer Composition for Context-Dependent Network Expansion (1997)
Michael Riley, Fernando Pereira, Mehryar Mohri
Context-dependent models for language units are essential in high-accuracy speech recognition. However, standard speech recognition frameworks are based on the substitution of lower-level models for...
Weighted Automata in Text and Speech Processing (1996)
Mehryar Mohri, Fernando Pereira, Michael Riley
Finite-state automata are a very effective tool in natural language processing. However, in a variety of applications and especially in speech processing, it is necessary to consider more general...
Second Generation Video Coding Schemes And Their Role In Mpeg-4 (1996)
Luis Torres, Murat Kunt, Fernando Pereira
Since its introduction in 1985, there has been a lot of activity in the field of second generation still image coding. In the last years, the approach has been extended to video coding and has been...
Speech Recognition by Composition of Weighted Finite Automata (1996)
Fernando Pereira, Michael D. Riley
We present a general framework based on weighted finite automata and weighted finite-state transducers for describing and implementing speech recognizers. The framework allows us to represent...
Quantifiers, Anaphora, and Intensionality (1995)
Dalrymple, Mary, Lamping, John, Pereira, Fernando, Saraswat, Vijay
The relationship between Lexical-Functional Grammar (LFG) functional structures (f-structures) for sentences and their semantic interpretations can be expressed directly in a fragment of linear...
Linear Logic for Meaning Assembly (1995)
Dalrymple, Mary, Lamping, John, Pereira, Fernando, Saraswat, Vijay
Semantic theories of natural language associate meanings with utterances by providing meanings for lexical items and rules for determining the meaning of larger units given the meanings of their...
Design of a linguistic postprocessor using Variable Memory Length Markov Models (1995)
Isabelle Guyon, Fernando Pereira
We describe a linguistic postprocessor for character recognizers. The central module of our system is a trainable variable memory length Markov model (VLMM) that predicts the next character given a...
Linear Logic for Meaning Assembly (1995)
Mary Dalrymple, John Lamping, Fernando Pereira, Vijay Saraswat
This paper provides a brief overview of our ongoing investigation in the use of formal deduction to explicate the relationship between syntactic analyses in Lexical-Functional Grammar (LFG) and...
Distributional Clustering of English Words (1994)
Pereira, Fernando, Tishby, Naftali, Lee, Lillian
We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Deterministic annealing is used to find lowest...
Similarity-Based Estimation of Word Cooccurrence Probabilities (1994)
Dagan, Ido, Pereira, Fernando, Lee, Lillian
In many applications of natural language processing it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two...
A Deductive Account of Quantification in LFG (1994)
Dalrymple, Mary, Lamping, John, Pereira, Fernando, Saraswat, Vijay
The relationship between Lexical-Functional Grammar (LFG) functional structures (f-structures) for sentences and their semantic interpretations can be expressed directly in a fragment of linear logic...
Intensional Verbs Without Type-Raising or Lexical Ambiguity (1994)
Dalrymple, Mary, Lamping, John, Pereira, Fernando, Saraswat, Vijay
We present an analysis of the semantic interpretation of intensional verbs such as seek that allows them to take direct objects of either individual or quantifier type, producing both de dicto and de...
Weighted rational transductions and their application to human language processing (1994)
Fernando Pereira, Michael Riley, Richard Sproat
We present the concepts of weighted language, transduction and automaton from algebraic automata theory as a general framework for describing and implementing decoding cascades in speech and...
Distributional Clustering of English Words (1993)
We describe and evaluate experimentally a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions...
Distributional Clustering of English Words (1993)
Fernando Pereira, Naftali Tishby, Lillian Lee
We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency...
Distributional Clustering of English Words (1993)
Fernando Pereira, Naftali Tishby, Lillian Lee
We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Deterministic annealing is used to find lowest...
Inside-outside reestimation from partially bracketed corpora (1992)
Fernando Pereira, Yves Schabes
The inside-outside algorithm for inferring the parameters of a stochastic context-free grammar is extended to take advantage of constituent information (constituent bracketing) in a partially parsed...
Finite-State Approximations of Grammars (1990)
Grammars for spoken language systems are subject to the conflicting requirements of language modeling for recognition and of language analysis for sentence interpretation. Current recognition...
Unification and Some New Grammatical Formalisms (1987)
Aravind K. Joshi, Fernando Pereira
I have organized my comments around some of the questions posed by the panel chair,
Logic for natural language analysis (1983)
Fernando Pereira
Revision of thesis (Ph. D.)--University of Edinburgh.
Global Discriminative Learning for Higher-Accuracy Computational Gene Prediction
Bernal, Axel, Crammer, Koby, Hatzigeorgiou, Artemis, Pereira, Fernando
Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models...
Reflection on Some Characteristics of the Colombian Entrepreneurial Spirit
Framed within the interest in understanding the development of the Colombian entrepreneurial spirit, the author proposes revisiting a simple but complete conceptual model, which is exemplified by identifying some...