Serafim Batzoglou

Publication List Details

Period

1996 - 2009

Number

155

Sequencing a Genome by Walking with Clone-End Sequences: A Mathematical Analysis (2009)

Serafim Batzoglou, Bonnie Berger, Jill Mesirov, Eric S. Lander

One approach to sequencing a large genome is (1) to sequence a collection of nonoverlapping “seeds” chosen from a genomic library of large-insert clones [such as bacterial artificial chromosomes...

Characterization of Evolutionary Rates and Constraints in Three Mammalian Genomes (2009)

Gregory M. Cooper, Michael Brudno, Eric A. Stone, Inna Dubchak, Serafim Batzoglou, Arend Sidow

We present an analysis of rates and patterns of microevolutionary phenomena that have shaped the human, mouse, and rat genomes since their last common ancestor. We find evidence for a shift in the...

Human and Mouse Gene Structure: Comparative Analysis and Application to Exon Prediction (2009)

Serafim Batzoglou, Lior Pachter, Jill P. Mesirov, Bonnie Berger, Eric S. Lander

We describe a novel analytical approach to gene recognition based on cross-species comparison. We first undertook a comparison of orthologous genomic loci from human and mouse, studying the extent of...

A Classifier-based approach to identify genetic similarities between diseases (2009)

Schaub, Marc A., Kaplow, Irene M., Sirota, Marina, Do, Chuong B., Butte, Atul J., Batzoglou, Serafim

Motivation: Genome-wide association studies are commonly used to identify possible associations between genetic variations and diseases. These studies mainly focus on identifying individual single...

ProbCons: Probabilistic consistency-based multiple sequence alignment (2008)

Chuong B. Do, Michael Brudno, Serafim Batzoglou

To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult...

Distribution and intensity of constraint in mammalian genomic sequence (2008)

Gregory M. Cooper, Eric A. Stone, George Asimenos, NISC Comparative Sequencing Program, Eric D. Green, ...

Comparisons of orthologous genomic DNA sequences can be used to characterize regions that have been subject to purifying selection and are enriched for functional elements. We here present the...

Special Issue Papers (2008)

Serafim Batzoglou

focus is computational biology.

Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies (2008)

Andreas Sundquist, Mostafa Ronaghi, Haixu Tang, Pavel Pevzner, Serafim Batzoglou

While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo...

Phylo-VISTA: interactive visualization of multiple DNA sequence alignments (2008)

Nameeta Shah, Olivier Couronne, Len A. Pennacchio, Michael Brudno, Serafim Batzoglou, Wes Bethel, ...

The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing...

AGenDA: homology-based gene prediction (2008). DOI: 10.1093/bioinformatics/btg181

Leila Taher, Oliver Rinner, Saurabh Garg, Alexander Sczyrba, Michael Brudno, Serafim Batzoglou, ...

Summary: We present a www server for homology-based gene prediction. The user enters a pair of evolutionary related genomic sequences, for example from human and mouse. Our software system uses CHAOS...

Computational genomics: Mapping, comparison, and annotation of genomes (2008)

Bonnie Berger, Arthur C. Smith, Serafim Batzoglou

The field of genomics provides many challenges to computer scientists and mathematicians. The area of computational genomics has been expanding recently, and the timely application of computer...

Finding Regulatory Motifs with Maximum Density Subgraphs (2008)

Eugene Fratkin, Brian Naughton, Douglas L. Brutlag, Serafim Batzoglou

The identification of over-represented but imperfectly conserved motifs in genomic DNA is a problem with important biological applications, such as the discovery of regulatory elements that determine...

DNA Computing and Molecular Self-Assembly Area Exam (2008)

Serafim Batzoglou

problem by encoding it in DNA and subsequently using a biological protocol that can create and extract the solution in a small number of steps. The main attraction of this method of performing...

Supplementary Material for Training Conditional Random Fields for Maximum Labelwise Accuracy (2008)

Samuel S. Gross, Olga Russakovsky, Chuong B. Do, Serafim Batzoglou

In this supplementary material, we derive the recurrences needed for the computation of the α⋆(i, j) and β⋆(i, j) matrices in the maximum labelwise accuracy algorithm. 1 Definitions Recall...

CONTRAST: A Discriminative, Phylogeny-free Approach to Multiple Informant De Novo Gene Prediction (2008)

Samuel S Gross, Chuong B Do, Marina Sirota, Serafim Batzoglou

We describe CONTRAST, the first system for vertebrate protein-coding gene prediction to successfully use the information present in multiple alignments to achieve greater accuracy than the best...

LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA (2008)

Michael Brudno, Chuong B. Do, Gregory M. Cooper, Michael F. Kim, Eugene Davydov, ...

To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the...

Training Conditional Random Fields for Maximum Labelwise Accuracy (2008)

Samuel S. Gross, Chuong B. Do, Olga Russakovsky, Serafim Batzoglou

We consider the problem of training a conditional random field (CRF) to maximize per-label predictive accuracy on a training set, an approach motivated by the principle of empirical risk...

Effect of genetic divergence in identifying ancestral origin using HAPAA (2008)

Sundquist, Andreas, Fratkin, Eugene, Do, Chuong B., Batzoglou, Serafim

The genome of an admixed individual with ancestors from isolated populations is a mosaic of chromosomal blocks, each following the statistical properties of variation seen in those populations. By...

A max-margin model for efficient simultaneous alignment and folding of RNA sequences (2008)

Do, Chuong B., Foo, Chuan-Sheng, Batzoglou, Serafim

Motivation: The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous...

Automatic parameter learning for multiple network alignment (2008)

Jason Flannick, Chuong B. Do, Balaji S. Srinivasan, Serafim Batzoglou

We developed Græmlin 2.0, a new multiple network aligner with (1) a novel scoring function that can use arbitrary features of a multiple network alignment, such as protein deletions,...

CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction (2007)

Gross, Samuel S, Do, Chuong B, Sirota, Marina, Batzoglou, Serafim

We describe CONTRAST, a gene predictor which directly incorporates information from multiple alignments rather than employing phylogenetic models. This is accomplished through the use of...

Bacterial flora-typing with targeted, chip-based Pyrosequencing (2007)

Sundquist, Andreas, Bigdeli, Saharnaz, Jalili, Roxana, Druzin, Maurice L, Waller, Sarah, Pullen, Kristin M, ...

Background: The metagenomic analysis of microbial communities holds the potential to improve our understanding of the role of microbes in clinical conditions. Recent, dramatic improvements in...

Current progress in network research: toward reference networks for key model organisms (2007)

Srinivasan, Balaji S., Shah, Nigam H., Flannick, Jason A., Abeliuk, Eduardo, Novak, Antal F., Batzoglou, Serafim

The collection of multiple genome-scale datasets is now routine, and the frontier of research in systems biology has shifted accordingly. Rather than clustering a single dataset to produce a static...

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome (2007)

Margulies, Elliott H., Cooper, Gregory M., Asimenos, George, Thomas, Daryl J., Dewey, Colin N., Siepel, Adam, ...

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation,...

CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction (2007)

Samuel S Gross, Chuong B Do, Marina Sirota, Serafim Batzoglou

A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites (2006)

Naughton, Brian T., Fratkin, Eugene, Batzoglou, Serafim, Brutlag, Douglas L.

Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to...

Multiple alignment of protein sequences with repeats and rearrangements (2006)

Phuong, Tu Minh, Do, Chuong B., Edgar, Robert C., Batzoglou, Serafim

Multiple sequence alignments are the usual starting point for analyses of protein structure and evolution. For proteins with repeated, shuffled and missing domains, however, traditional multiple...

Evidence for intelligent (algorithm) design (2006)

Srinivasan, Balaji S, Do, Chuong B, Batzoglou, Serafim

A report on the 10th annual Research in Computational Molecular Biology (RECOMB) Conference, Venice, Italy, 2-5 April 2006.

CONTRAlign: discriminative training for protein sequence alignment (2006)

Chuong B. Do, Samuel S. Gross, Serafim Batzoglou

In comparative structural biology studies, analyzing or predicting protein three-dimensional structure often begins with identifying patterns of amino acid substitution via protein...

Græmlin: General and robust alignment of multiple large interaction networks (2006)

Jason Flannick, Antal Novak, Balaji S. Srinivasan, Harley H. McAdams, Serafim Batzoglou, ...

Multiple alignment of protein sequences with repeats and rearrangements (2006). Nucleic Acids Research Advance Access, published October 26, 2006

Tu Minh Phuong, Chuong B. Do, Robert C. Edgar, Serafim Batzoglou

Multiple sequence alignments are the usual starting point for analyses of protein structure and evolution. For proteins with repeated, shuffled and missing domains, however, traditional multiple...

CONTRAlign: discriminative training for protein sequence alignment (2006)

Chuong B. Do, Samuel S. Gross, Serafim Batzoglou

In this paper, we present CONTRAlign, an extensible and fully automatic framework for parameter learning and protein pairwise sequence alignment using pair conditional random fields. When...

Integrated protein interaction networks for 11 microbes (2006)

Balaji S. Srinivasan, Antal F. Novak, Jason A. Flannick, Serafim Batzoglou, Harley H. McAdams

We have combined four different types of functional genomic data to create high coverage protein interaction networks for 11 microbes. Our integration algorithm naturally handles...

Training Log Linear Models using Smoothed Hamming Loss (2006)

Olga Russakovsky, Chuong Do, Serafim Batzoglou

In a paper that is to appear in NIPS this year [1], we proposed a new objective function for training Conditional Random Fields (CRFs). When using the traditional log-likelihood training, the...

Graemlin: General and robust alignment of multiple large interaction networks (2006)

Flannick, Jason, Novak, Antal, Srinivasan, Balaji S., McAdams, Harley H., Batzoglou, Serafim

The recent proliferation of protein interaction networks has motivated research into network alignment: the cross-species comparison of conserved functional modules. Previous studies have laid the...

MotifCut: regulatory motifs finding with maximum density subgraphs (2006)

Fratkin, Eugene, Naughton, Brian T., Brutlag, Douglas L., Batzoglou, Serafim

Motivation: DNA motif finding is one of the core problems in computational biology, for which several probabilistic and discrete approaches have been developed. Most existing methods formulate motif...

CONTRAfold: RNA secondary structure prediction without physics-based models (2006)

Do, Chuong B., Woods, Daniel A., Batzoglou, Serafim

Motivation: For several decades, free energy minimization methods have been the dominant strategy for single sequence RNA secondary structure prediction. More recently, stochastic context-free...

Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae (2005)

Galagan, James E., Calvo, Sarah E., Cuomo, Christina, Ma, Li-Jun, Wortman, Jennifer R., Batzoglou, Serafim, ...

The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution. Here we report the genome sequence of the model organism Aspergillus nidulans, and a...

Using multiple alignments to improve seeded local alignment algorithms (2005)

Jason Flannick, Serafim Batzoglou

Multiple alignments among genomes are becoming increasingly prevalent. This trend motivates the development of tools for efficient homology search between a query sequence and a database of multiple...

PROBCONS: Probabilistic consistency-based multiple sequence alignment (2005)

Chuong B. Do, Michael Brudno, Serafim Batzoglou

To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult...

Distribution and intensity of constraint in mammalian genomic sequence (2005)

Cooper, Gregory M., Stone, Eric A., Asimenos, George, Green, Eric D., Batzoglou, Serafim, ...

Comparisons of orthologous genomic DNA sequences can be used to characterize regions that have been subject to purifying selection and are enriched for functional elements. We here present the...

Using multiple alignments to improve seeded local alignment algorithms (2005)

Flannick, Jason, Batzoglou, Serafim

Multiple alignments among genomes are becoming increasingly prevalent. This trend motivates the development of tools for efficient homology search between a query sequence and a database of multiple...

ProbCons: Probabilistic consistency-based multiple sequence alignment (2005)

Do, Chuong B., Mahabhashyam, Mahathi S.P., Brudno, Michael, Batzoglou, Serafim

To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult...

The many faces of sequence alignment (2005)

Batzoglou, Serafim

Starting with the sequencing of the mouse genome in 2002, we have entered a period where the main focus of genomics will be to compare multiple genomes in order to learn about human biology and...

Phylo-VISTA: An Interactive Visualization Tool for Multiple DNA Sequence Alignments (2004)

Shah, Nameeta, Couronne, Olivier, Pennacchio, Len A., Brudno, Michael, Batzoglou, Serafim, Bethel, E. Wes, ...

We have developed Phylo-VISTA (Shah et al., 2003), an interactive software tool for analyzing multiple alignments by visualizing a similarity measure for DNA sequences of multiple species. The...

Eukaryotic regulatory element conservation analysis and identification using comparative genomics (2004)

Yueyi Liu, X. Shirley Liu, Liping Wei, Russ B. Altman, Serafim Batzoglou

Comparative genomics is a promising approach to the challenging problem of eukaryotic regulatory element identification, because functional noncoding sequences may be conserved across species from...

Phylo-VISTA: Interactive Visualization of Multiple DNA Sequence Alignments (2004)

Nameeta Shah, Olivier Couronne, Len A. Pennacchio, Michael Brudno, Serafim Batzoglou, Wes Bethel, ...

an Internet browser with Java Plug-in 1.4.2 and it is integrated into the global alignment program LAGAN at http://lagan.stanford.edu. Contact phylovista@lbl.gov 1 Introduction Large-scale genome...

ICA-based clustering of genes from microarray expression data (2004)

Su-in Lee, Serafim Batzoglou

We propose an unsupervised methodology using independent component analysis (ICA) to cluster genes from DNA microarray data. Based on an ICA mixture model of genomic expression patterns, linear and...

Chaining algorithms for alignment of draft sequence (2004)

Mukund Sundararajan, Michael Brudno, Kerrin Small, Arend Sidow, Serafim Batzoglou

In this paper we propose a chaining method that can align a draft genomic sequence against a finished genome. We introduce the use of an overlap tree to enhance the state information...

A Computational Model for RNA Multiple Structural Alignment (2004)

Eugene Davydov, Serafim Batzoglou

This paper addresses the problem of aligning multiple sequences of non-coding RNA genes. We approach this problem with the biologically motivated paradigm that scoring of ncRNA alignments...

PROBCONS: Probabilistic consistency-based multiple alignment of amino acid sequences (2004)

Chuong B. Do, Michael Brudno, Serafim Batzoglou

Eukaryotic Regulatory Element Conservation Analysis and Identification Using Comparative Genomics (2004)

Liu, Yueyi, Liu, X. Shirley, Wei, Liping, Altman, Russ B., Batzoglou, Serafim

Comparative genomics is a promising approach to the challenging problem of eukaryotic regulatory element identification, because functional noncoding sequences may be conserved across species from...

Phylo-VISTA: interactive visualization of multiple DNA sequence alignments (2004)

Shah, Nameeta, Couronne, Olivier, Pennacchio, Len A., Brudno, Michael, Batzoglou, Serafim, Bethel, E. Wes, ...

Motivation: The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by...

A suite of web-based programs to search for transcriptional regulatory motifs (2004)

Liu, Yueyi, Wei, Liping, Batzoglou, Serafim, Brutlag, Douglas L., Liu, Jun S., Liu, X. Shirley

The identification of regulatory motifs is important for the study of gene expression. Here we present a suite of programs that we have developed to search for regulatory sequence motifs: (i)...

Characterization of Evolutionary Rates and Constraints in Three Mammalian Genomes (2004)

Cooper, Gregory M., Brudno, Michael, Stone, Eric A., Dubchak, Inna, Batzoglou, Serafim, Sidow, Arend

We present an analysis of rates and patterns of microevolutionary phenomena that have shaped the human, mouse, and rat genomes since their last common ancestor. We find evidence for a shift in the...

Automated Whole-Genome Multiple Alignment of Rat, Mouse, and Human (2004)

Brudno, Michael, Poliakov, Alexander, Salamov, Asaf, Cooper, Gregory M., Sidow, Arend, Rubin, Edward M., ...

We have built a whole-genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline that combines the local/global approach of the Berkeley Genome...

Phylo-VISTA: interactive visualization of multiple DNA sequence alignments (2004)

Shah, Nameeta, Couronne, Olivier, Pennacchio, Len A., Brudno, Michael, Batzoglou, Serafim, Bethel, E. Wes, ...

Motivation: The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by...

Fast and sensitive multiple alignment of large genomic sequences (2003)

Brudno, Michael, Chapman, Michael, Göttgens, Berthold, Batzoglou, Serafim, Morgenstern, Burkhard

Background: Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory...

Application of independent component analysis to microarrays (2003)

Lee, Su-In, Batzoglou, Serafim

We apply linear and nonlinear independent component analysis (ICA) to project microarray data into statistically independent components that correspond to putative biological processes, and...

Phylo-VISTA: An interactive visualization tool for multiple DNA sequence alignments (2003)

Shah, Nameeta, Couronne, Olivier, Pennacchio, Len A., Brudno, Michael, Batzoglou, Serafim, Bethel, E. Wes, ...

Motivation. The power of multi-sequence comparison for biological discovery is well established and sequence data from a growing list of organisms is becoming available. Thus, a need exists for...

Identification of Promoter Regions in the Human Genome by Using a Retroviral Plasmid Library-Based Functional Reporter Gene Assay (2003)

Shirin Khambata-Ford, Yueyi Liu, Christopher Gleason, Mark Dickson, Russ B. Altman, Serafim Batzoglou, ...

This paper describes a retroviral plasmid library-based approach to identify promoter regions in the human genome. This method selects for and detects the promoter function of isolated DNA fragments from...

Phylo-VISTA: An Interactive Visualization Tool for Multiple DNA Sequence Alignments (2003)

Nameeta Shah, Olivier Couronne, Len A. Pennacchio, Michael Brudno, Serafim Batzoglou, E. Wes Bethel, ...

We have developed Phylo-VISTA (Shah et al., 2003), an interactive software tool for analyzing multiple alignments by visualizing a similarity measure for DNA sequences of multiple species. The...

Fast and sensitive multiple alignment of large genomic sequences (2003). BMC Bioinformatics (BioMed Central), methodology article

Michael Brudno, Michael Chapman, Berthold Göttgens, Serafim Batzoglou, Burkhard Morgenstern

Background: Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements....

LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA (2003)

Brudno, Michael, Do, Chuong B., Cooper, Gregory M., Kim, Michael F., Davydov, Eugene, NISC Comparative Sequencing Program, ...

To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the...

Quantitative Estimates of Sequence Divergence for Comparative Analyses of Mammalian Genomes (2003)

Cooper, Gregory M., Brudno, Michael, NISC Comparative Sequencing Program, Green, Eric D., Batzoglou, Serafim, Sidow, Arend

Comparative sequence analyses on a collection of carefully chosen mammalian genomes could facilitate identification of functional elements within the human genome and allow quantification of...

Identification of Promoter Regions in the Human Genome by Using a Retroviral Plasmid Library-Based Functional Reporter Gene Assay (2003)

Khambata-Ford, Shirin, Liu, Yueyi, Gleason, Christopher, Dickson, Mark, Altman, Russ B., Batzoglou, Serafim, ...

Attempts to identify regulatory sequences in the human genome have involved experimental and computational methods such as cross-species sequence comparisons and the detection of transcription factor...

Glocal alignment: finding rearrangements during alignment (2003)

Brudno, Michael, Malde, Sanket, Poliakov, Alexander, Do, Chuong B., Couronne, Olivier, Dubchak, Inna, ...

Motivation: To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align...

Computational genomics: mapping, comparison, and annotation of genomes (2000)

Batzoglou, Serafim

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.

Computational genomics: mapping, comparison, and annotation of genomes (2000)

Batzoglou, Serafim

The field of genomics provides many challenges to computer scientists and mathematicians. The area of computational genomics has been expanding recently, and the timely application of computer...

Sandia National Laboratories (2000)

Alan Hurd, Brian Walenz, John H. Conway, Freddie W. Peyerl, Serafim Batzoglou, Sorin Istrail, ...

Sandia is a multiprogram laboratory operated by Sandia Corporation.

A Dictionary Based Approach for Gene Annotation (1999)

Lior Pachter, Serafim Batzoglou, Valentin I. Spitkovsky, William S. Beebee, Eric S. Lander, Bonnie Berger, ...

This paper describes a fast and fully automated dictionary based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and...

A dictionary-based approach for gene annotation (1999)

Lior Pachter, Serafim Batzoglou, Valentin I. Spitkovsky, Eric Banks, Eric S. Lander, Daniel J. Kleitman, ...

This paper describes a fast and fully automated dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and...

Recent Developments in Computational Gene Recognition (1998)

Serafim Batzoglou, Bonnie Berger, Daniel J. Kleitman, Eric S. Lander, Lior Pachter

We survey recent mathematical and computational work in the field of gene recognition, focusing on the techniques that have been developed to tackle the problem of identifying protein coding...

Local rules for protein folding on a triangular lattice and generalized hydrophobicity in the HP model (1997)

Richa Agarwala, Serafim Batzoglou, Scott E. Decatur, Martin Farach, Sridhar Hannenhalli, Steven Skiena

We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. We use the HP model for protein folding proposed by Dill [3], which...

Local Rules for Protein Folding on a Triangular Lattice and Generalized Hydrophobicity in the HP Model (1997)

Richa Agarwala, Serafim Batzoglou, Scott E. Decatur, Sridhar Hannenhalli, Martin Farach, S. Muthukrishnan, ...

We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. We use the HP model for protein folding proposed by Dill [7], which...

Local Rules for Protein Folding on a Triangular Lattice and Generalized Hydrophobicity in the HP Model (1997)

Richa Agarwala, Serafim Batzoglou, Vlado Dancik, Scott E. Decatur, Martin Farach, Sridhar Hannenhalli, ...

We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. We use the HP model for protein folding proposed by Dill [3], which...

Local Rules for Protein Folding on a Triangular Lattice and Generalized Hydrophobicity in the HP Model (1997)

Richa Agarwala, Serafim Batzoglou, Scott E. Decatur, Martin Farach, Sridhar Hannenhalli, S. Muthukrishnan

Richa Agarwala, Serafim Batzoglou, Vlado Dančík, Scott E. Decatur, Martin Farach, Sridhar Hannenhalli, S. Muthukrishnan, Steven Skiena. A long standing problem in molecular biology is...

An equality theorem prover based on grammar rewriting (1996)

Batzoglou, Serafim

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996.

An equality theorem prover based on grammar rewriting (1996)

Batzoglou, Serafim

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996.

Protein Folding in the Hydrophobic-Polar Model on the 3D Triangular Lattice (1996)

Scott Decatur, Serafim Batzoglou

We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. The model we use is based on the Hydrophobic-Polar (HP) model [1] on...

Human and Mouse Gene Structure: Comparative Analysis and Application to Exon Prediction

Batzoglou, Serafim, Pachter, Lior, Mesirov, Jill P., Berger, Bonnie, Lander, Eric S.

We describe a novel analytical approach to gene recognition based on cross-species comparison. We first undertook a comparison of orthologous genomic loci from human and mouse, studying the extent of...

Application of independent component analysis to microarrays

Lee, Su-In, Batzoglou, Serafim

Linear and nonlinear independent component analysis (ICA) was used to project microarray data into statistically independent components that correspond to putative biological processes, and to...

Eukaryotic Regulatory Element Conservation Analysis and Identification Using Comparative Genomics

Liu, Yueyi, Liu, X. Shirley, Wei, Liping, Altman, Russ B., Batzoglou, Serafim

Comparative genomics is a promising approach to the challenging problem of eukaryotic regulatory element identification, because functional noncoding sequences may be conserved across species from...

Characterization of Evolutionary Rates and Constraints in Three Mammalian Genomes

Cooper, Gregory M., Brudno, Michael, Stone, Eric A., Dubchak, Inna, Batzoglou, Serafim, Sidow, Arend

We present an analysis of rates and patterns of microevolutionary phenomena that have shaped the human, mouse, and rat genomes since their last common ancestor. We find evidence for a shift in the...

Automated Whole-Genome Multiple Alignment of Rat, Mouse, and Human

Brudno, Michael, Poliakov, Alexander, Salamov, Asaf, Cooper, Gregory M., Sidow, Arend, Rubin, Edward M., ...

We have built a whole-genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline that combines the local/global approach of the Berkeley Genome...

Identification of Promoter Regions in the Human Genome by Using a Retroviral Plasmid Library-Based Functional Reporter Gene Assay

Khambata-Ford, Shirin, Liu, Yueyi, Gleason, Christopher, Dickson, Mark, Altman, Russ B., Batzoglou, Serafim, ...

Attempts to identify regulatory sequences in the human genome have involved experimental and computational methods such as cross-species sequence comparisons and the detection of transcription factor...

LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA

Brudno, Michael, Do, Chuong B., Cooper, Gregory M., Kim, Michael F., Davydov, Eugene, NISC Comparative Sequencing Program, ...

To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the...

Quantitative Estimates of Sequence Divergence for Comparative Analyses of Mammalian Genomes

Cooper, Gregory M., Brudno, Michael, NISC Comparative Sequencing Program, Green, Eric D., Batzoglou, Serafim, Sidow, Arend

Comparative sequence analyses on a collection of carefully chosen mammalian genomes could facilitate identification of functional elements within the human genome and allow quantification of...

A suite of web-based programs to search for transcriptional regulatory motifs

Liu, Yueyi, Wei, Liping, Batzoglou, Serafim, Brutlag, Douglas L., Liu, Jun S., Liu, X. Shirley

The identification of regulatory motifs is important for the study of gene expression. Here we present a suite of programs that we have developed to search for regulatory sequence motifs: (i)...

ProbCons: Probabilistic consistency-based multiple sequence alignment

Do, Chuong B., Mahabhashyam, Mahathi S.P., Brudno, Michael, Batzoglou, Serafim

To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult...

Distribution and intensity of constraint in mammalian genomic sequence

Cooper, Gregory M., Stone, Eric A., Asimenos, George, Green, Eric D., Batzoglou, Serafim, Sidow, Arend

Comparisons of orthologous genomic DNA sequences can be used to characterize regions that have been subject to purifying selection and are enriched for functional elements. We here present the...

Using multiple alignments to improve seeded local alignment algorithms

Flannick, Jason, Batzoglou, Serafim

Multiple alignments among genomes are becoming increasingly prevalent. This trend motivates the development of tools for efficient homology search between a query sequence and a database of multiple...

ARACHNE: A Whole-Genome Shotgun Assembler

Batzoglou, Serafim, Jaffe, David B., Stanley, Ken, Butler, Jonathan, Gnerre, Sante, Mauceli, Evan, ...

We describe a new computer system, called ARACHNE, for assembling genome sequence using paired-end whole-genome shotgun reads. ARACHNE has several key features, including an efficient and sensitive...

Multiple alignment of protein sequences with repeats and rearrangements

Phuong, Tu Minh, Do, Chuong B., Edgar, Robert C., Batzoglou, Serafim

Multiple sequence alignments are the usual starting point for analyses of protein structure and evolution. For proteins with repeated, shuffled and missing domains, however, traditional multiple...

A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites

Naughton, Brian T., Fratkin, Eugene, Batzoglou, Serafim, Brutlag, Douglas L.

Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to...

Evidence for intelligent (algorithm) design

Srinivasan, Balaji S., Do, Chuong B., Batzoglou, Serafim

A report on the 10th annual Research in Computational Molecular Biology (RECOMB) Conference, Venice, Italy, 2-5 April 2006.

Græmlin: General and robust alignment of multiple large interaction networks

Flannick, Jason, Novak, Antal, Srinivasan, Balaji S., McAdams, Harley H., Batzoglou, Serafim

The recent proliferation of protein interaction networks has motivated research into network alignment: the cross-species comparison of conserved functional modules. Previous studies have laid the...

Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

Sundquist, Andreas, Ronaghi, Mostafa, Tang, Haixu, Pevzner, Pavel, Batzoglou, Serafim

While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo...

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome

Margulies, Elliott H., Cooper, Gregory M., Asimenos, George, Thomas, Daryl J., Dewey, Colin N., Siepel, Adam, ...

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation,...

CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction

Gross, Samuel S., Do, Chuong B., Sirota, Marina, Batzoglou, Serafim

CONTRAST is a gene predictor that directly incorporates information from multiple alignments and uses discriminative machine learning techniques to give large improvements in prediction over previous...

Effect of genetic divergence in identifying ancestral origin using HAPAA

Sundquist, Andreas, Fratkin, Eugene, Do, Chuong B., Batzoglou, Serafim

The genome of an admixed individual with ancestors from isolated populations is a mosaic of chromosomal blocks, each following the statistical properties of variation seen in those populations. By...

Genetic and Computational Identification of a Conserved Bacterial Metabolic Module

Boutte, Cara C., Srinivasan, Balaji S., Flannick, Jason A., Novak, Antal F., Martens, Andrew T., Batzoglou, Serafim, ...

We have experimentally and computationally defined a set of genes that form a conserved metabolic module in the α-proteobacterium Caulobacter crescentus and used this module to illustrate a schema...

A Classifier-based approach to identify genetic similarities between diseases

Schaub, Marc A., Kaplow, Irene M., Sirota, Marina, Do, Chuong B., Butte, Atul J., Batzoglou, Serafim

Motivation: Genome-wide association studies are commonly used to identify possible associations between genetic variations and diseases. These studies mainly focus on identifying individual single...

A max-margin model for efficient simultaneous alignment and folding of RNA sequences

Do, Chuong B., Foo, Chuan-Sheng, Batzoglou, Serafim

Motivation: The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous...