Picking Alignments from (Steiner) Trees (2009)
Lior Pachter, Fumei Lam, Marina Alexandersson
The application of Needleman-Wunsch alignment techniques to biological sequences is complicated by two serious problems when the sequences are long: the running time, which scales as the product...
Serafim Batzoglou, Lior Pachter, Jill P. Mesirov, Bonnie Berger, Eric S. Lander
We describe a novel analytical approach to gene recognition based on cross-species comparison. We first undertook a comparison of orthologous genomic loci from human and mouse, studying the extent of...
John Maynard Smith, Lior Pachter, Bernd Sturmfels
www.cambridge.org Information on this title: www.cambridge.org/9780521857000 © Cambridge University Press 2005. This book is in copyright. Subject to statutory exception and to the provisions of...
TopHat: discovering splice junctions with RNA-Seq (2009)
Trapnell, Cole, Pachter, Lior, Salzberg, Steven L.
Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used...
Selecting universities: personal preference and rankings (2008)
Polyhedral geometry can be used to quantitatively assess the dependence of rankings on personal preference, and provides a tool for both students and universities to assess US News and World Report...
On the optimality of the neighbor-joining algorithm (2008)
Eickmeyer, Kord, Huggins, Peter, Pachter, Lior, Yoshida, Ruriko
The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for finding the balanced minimum evolution (BME) tree associated to a dissimilarity map. From this...
Micro-indels are small insertion or deletion events (indels) that occur during genome evolution. The study of micro-indels is important, both in order to better understand the underlying...
AMAP–Fast and accurate multiple alignment using posterior decoding and sequence annealing (2008)
Ariel S. Schwartz, Michael Smoot, Lior Pachter
We present AMAP, a fast and accurate multiple sequence alignment program. AMAP is based on a sequence annealing algorithm, which improves significantly on the standard...
SHADOWER: A generalized hidden Markov phylogeny for multiple-sequence functional annotation (2008)
Jon D. McAuliffe, Lior Pachter, Michael I. Jordan
BIOINFORMATICS Multiple alignment by sequence annealing (2008)
Ariel S. Schwartz, Lior Pachter
Motivation: We introduce a novel approach to multiple alignment that is based on an algorithm for rapidly checking whether single matches are consistent with a partial multiple alignment. This leads...
Radu Mihaescu, Dan Levy, Lior Pachter
Theorem 2 of [2], which claims to settle Atteson’s edge radius conjecture [1], is invalidly proven. The argument in [2] is inductive, and is based on the assumption that if initially | | ˆ D −...
Homology mapping with Markov random fields (2008)
multiple sequence alignment.
Combinatorics of least squares trees (2008)
A recurring theme in the least squares approach to phylogenetics has been the discovery of elegant combinatorial formulas for the least squares estimates of edge lengths. These formulas have proved...
Specific alignment of structured RNA: stochastic grammars and sequence annealing (2008)
Bradley, Robert K., Pachter, Lior, Holmes, Ian
Motivation: Whole-genome screens suggest that eukaryotic genomes are dense with non-coding RNAs (ncRNAs). We introduce a novel approach to RNA multiple alignment which couples a generative...
Combining statistical alignment and phylogenetic footprinting to detect regulatory elements (2008)
Satija, Rahul, Pachter, Lior, Hein, Jotun
Motivation: Traditional alignment-based phylogenetic footprinting approaches make predictions on the basis of a single assumed alignment. The predictions are therefore highly sensitive to alignment...
An Interesting Result About Subset Sums (2007)
Nitu Kitchloo, Lior Pachter
We consider the problem of determining the number of subsets $B \subseteq \{1, 2, \ldots, n\}$ such that $\sum_{b \in B} b \equiv k \pmod{n}$, where $k$ is a residue class mod $n$ ($0 \le k < n$). If the number of such subsets is denoted N...
Begun, David J, Holloway, Alisha K, Stevens, Kristian, Hillier, Ladeana W, Poh, Yu-Ping, Hahn, Matthew W, ...
The population genetic perspective is that the processes shaping genomic variation can be revealed only through simultaneous investigation of sequence polymorphism and divergence within and between...
David J. Begun, Alisha K. Holloway, Kristian Stevens, LaDeana W. Hillier, Yu-Ping Poh, Matthew W. Hahn, ...
The population genetic perspective is that the processes shaping genomic variation can be revealed only through simultaneous investigation of sequence polymorphism and divergence within and between...
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures (2007)
Lin, Michael F, Kheradpour, Pouya, Pedersen, Jakob S, Parts, Leopold, Carlson, Joseph W, ...
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12...
On the optimality of the neighbor-joining algorithm (2007)
Eickmeyer, Kord, Huggins, Peter, Pachter, Lior, Yoshida, Ruriko
The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for finding the balanced minimum evolution (BME) tree associated to a dissimilarity map. From this point of...
The Cyclohedron Test for Finding Periodic Genes in Time Course Expression Studies (2007)
Morton, Jason, Pachter, Lior, Shiu, Anne, Sturmfels, Bernd
The problem of finding periodically expressed genes from time course microarray experiments is at the center of numerous efforts to identify the molecular components of biological clocks. We present...
Viral population estimation using pyrosequencing (2007)
Eriksson, Nicholas, Pachter, Lior, Mitsuya, Yumi, Rhee, Soo-Yon, Wang, Chunlin, Gharizadeh, Baback, ...
The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently...
Margulies, Elliott H., Cooper, Gregory M., Asimenos, George, Thomas, Daryl J., Dewey, Colin N., Siepel, Adam, ...
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation,...
Analysis of epistatic interactions and fitness landscapes using a new geometric approach (2007)
Beerenwinkel, Niko, Pachter, Lior, Sturmfels, Bernd, Elena, Santiago F, Lenski, Richard E
Background: Understanding interactions between mutations and how they affect fitness is a central problem in evolutionary biology that bears on such fundamental issues as the structure of...
Convex Rank Tests and Semigraphoids (2007)
Morton, Jason, Pachter, Lior, Shiu, Anne, Sturmfels, Bernd, Wienand, Oliver
Convex rank tests are partitions of the symmetric group which have desirable geometric properties. The statistical tests defined by such partitions involve counting all permutations in the...
The Neighbor-Net Algorithm (2007)
The neighbor-joining algorithm is a popular phylogenetics method for constructing trees from dissimilarity maps. The neighbor-net algorithm is an extension of the neighbor-joining algorithm and is...
Multiple alignment by sequence annealing (2007)
Schwartz, Ariel S., Pachter, Lior
Motivation: We introduce a novel approach to multiple alignment that is based on an algorithm for rapidly checking whether single matches are consistent with a partial multiple alignment. This leads...
DOI 10.1007/s11538-007-9244-7 (2007)
Peter Huggins, Lior Pachter, Bernd Sturmfels
The human genotope is the convex hull of all allele frequency vectors that can be obtained from the genotypes present in the human population. In this paper, we take a few initial steps...
An introduction to reconstructing ancestral genomes (2006)
Recent advances in high-throughput genomics technologies have resulted in the sequencing of large numbers of (near) complete genomes. These genome sequences are being mined for important functional...
Towards the Human Genotope (2006)
Huggins, Peter, Pachter, Lior, Sturmfels, Bernd
The human genotope is the convex hull of all allele frequency vectors that can be obtained from the genotypes present in the human population. In this paper we take a few initial steps towards a...
Parametric Alignment of Drosophila Genomes (2006)
Colin N. Dewey, Peter M. Huggins, Kevin Woods, Bernd Sturmfels, Lior Pachter
The classic algorithms of Needleman–Wunsch and Smith–Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). To process large genomes that have undergone...
Morton, Jason, Pachter, Lior, Shiu, Anne, Sturmfels, Bernd, Wienand, Oliver
We study partitions of the symmetric group which have desirable geometric properties. The statistical tests defined by such partitions involve counting all permutations in the equivalence classes....
Parametric alignment of Drosophila genomes (2006)
Colin Dewey, Peter Huggins, Kevin Woods, Bernd Sturmfels, Lior Pachter
The classic algorithms of Needleman–Wunsch and Smith–Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). In order to process large genomes that have...
Reference based annotation with GeneMapper (2006)
Chatterji, Sourav, Pachter, Lior
We introduce GeneMapper, a program for transferring annotations from a well annotated genome to other genomes. Drawing on high quality curated annotations, GeneMapper enables rapid and...
Epistasis and Shapes of Fitness Landscapes (2006)
Beerenwinkel, Niko, Pachter, Lior, Sturmfels, Bernd
The relationship between the shape of a fitness landscape and the underlying gene interactions, or epistasis, has been extensively studied in the two-locus case. Gene interactions among multiple loci...
Why neighbor-joining works (2006)
Mihaescu, Radu, Levy, Dan, Pachter, Lior
We show that the neighbor-joining algorithm is a robust quartet method for constructing trees from distances. This leads to a new performance guarantee that contains Atteson's optimal radius bound as...
Why neighbor-joining works (2006)
Radu Mihaescu, Dan Levy, Lior Pachter
We show that the neighbor-joining algorithm is a robust quartet method for constructing trees from distances. This leads to a new performance guarantee that contains Atteson’s optimal...
Beyond pairwise distances: Neighbor-joining with phylogenetic diversity estimates (2006)
Dan Levy, Ruriko Yoshida, Lior Pachter
The Neighbor-Joining algorithm is a recursive procedure for reconstructing trees that is based on a transformation of pairwise distances between leaves. We present a generalization of the...
Sturmfels: The mathematics of phylogenomics (2006)
“The lack of real contact between mathematics and biology is either a tragedy, a scandal or a challenge, it is hard to decide which.” – Gian-Carlo Rota, [26, p. 2]
Parametric alignment of Drosophila genomes (2006)
Colin Dewey, Peter Huggins, Kevin Woods, Bernd Sturmfels, Lior Pachter
The classic algorithms of Needleman–Wunsch and Smith–Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). In order to process large genomes that have...
Evolution at the nucleotide level: the problem of multiple whole-genome alignment (2006)
Dewey, Colin N., Pachter, Lior
With the genome sequences of numerous species at hand, we have the opportunity to discover how evolution has acted at each and every nucleotide in our genome. To this end, we must identify sets of...
Beyond Pairwise Distances: Neighbor-Joining with Phylogenetic Diversity Estimates (2006)
Levy, Dan, Yoshida, Ruriko, Pachter, Lior
The “neighbor-joining algorithm” is a recursive procedure for reconstructing trees that is based on a transformation of pairwise distances between leaves. We present a generalization of the...
Jason Morton, Lior Pachter, Anne Shiu, Bernd Sturmfels, Oliver Wienand
We study partitions of the symmetric group which have desirable geometric properties. The statistical tests defined by such partitions involve counting all permutations in the equivalence classes....
Sourav Chatterji, Lior Pachter
Parametric Alignment of Drosophila Genomes (2005)
Dewey, Colin, Huggins, Peter, Woods, Kevin, Sturmfels, Bernd, Pachter, Lior
The classic algorithms of Needleman–Wunsch and Smith–Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). In order to process large genomes that have...
Alignment Metric Accuracy (2005)
Schwartz, Ariel S., Myers, Eugene W., Pachter, Lior
We propose a metric for the space of multiple sequence alignments that can be used to compare two alignments to each other. In the case where one of the alignments is a reference alignment, the...
Neighbor joining with phylogenetic diversity estimates (2005)
Levy, Dan, Yoshida, Ruriko, Pachter, Lior
The Neighbor-Joining algorithm is a recursive procedure for reconstructing trees that is based on a transformation of pairwise distances between leaves. We present a generalization of the...
Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities (2005)
The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis....
Beyond Pairwise Distances: Neighbor Joining with Phylogenetic Diversity Estimates (2005)
Levy, Dan, Yoshida, Ruriko, Pachter, Lior
The Neighbor-Joining algorithm is a recursive procedure for reconstructing trees that is based on a transformation of pairwise distances between leaves. We present a generalization of the...
Identification of transposable elements using multiple alignments of related genomes (2005)
Accurate genome-wide cataloging of transposable elements (TEs) will facilitate our understanding of mobile DNA evolution, expose the genomic effects of TEs on the host genome, and improve the quality...
Subtree power analysis finds optimal species for comparative genomics (2004)
McAuliffe, Jon D., Jordan, Michael I., Pachter, Lior
Sequence comparison across multiple organisms aids in the detection of regions under selection. However, resource limitations require a prioritization of genomes to be sequenced. This prioritization...
Guigo, Roderic, Birney, Ewan, Brent, Michael, Dermitzakis, Emmanouil, Pachter, Lior, Crollius, Hugues Roest, ...
With the sponsorship of “Fundacio La Caixa” we met in Barcelona, November 21st and 22nd, to analyze the reasons why, after the completion of the human genome sequence, the identification all...
The Mathematics of Phylogenomics (2004)
Pachter, Lior, Sturmfels, Bernd
The grand challenges in biology today are being shaped by powerful high-throughput technologies that have revealed the genomes of many organisms, global expression patterns of genes and detailed...
Intra-species sequence comparisons for annotating genomes (2004)
Boffelli, Dario, Weer, Claire V., Weng, Li, Lewis, Keith D., Shoukry, Malak I., Pachter, Lior, ...
Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its...
Parametric Inference for Biological Sequence Analysis (2004)
Pachter, Lior, Sturmfels, Bernd
One of the major successes in computational biology has been the unification, using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences....
VISTA - computational tools for comparative genomics (2004)
Frazer, Kelly A., Pachter, Lior, Poliakov, Alexander, Rubin, Edward M., Dubchak, Inna
Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here we describe the VISTA family of tools created to assist biologists in...
Parametric inference for biological sequence analysis (2004)
One of the major successes in computational biology has been the unification, using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences....
Jon D. McAuliffe, Michael I. Jordan, Lior Pachter
Subtree power analysis finds optimal species for
Multiple organism gene finding by collapsed Gibbs sampling (2004)
Sourav Chatterji, Lior Pachter
The Gibbs sampling method has been widely used for sequence analysis after it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Since then numerous...
Reconstructing trees from subtree weights (2004)
The tree-metric theorem provides a necessary and sufficient condition for a dissimilarity matrix to be a tree metric, and has served as the foundation for numerous distance-based reconstruction...
VISTA: computational tools for comparative genomics (2004)
Kelly A. Frazer, Lior Pachter, Alexander Poliakov, Edward M. Rubin, Inna Dubchak
Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here, we describe the VISTA family of tools created to assist biologists in...
Multiple-sequence functional annotation and the generalized hidden Markov phylogeny (2004)
McAuliffe, Jon D., Pachter, Lior, Jordan, Michael I.
Motivation: Phylogenetic shadowing is a comparative genomics principle which allows for the discovery of conserved regions in sequences from multiple closely-related organisms. We develop a formal...
VISTA: computational tools for comparative genomics (2004)
Frazer, Kelly A., Pachter, Lior, Poliakov, Alexander, Rubin, Edward M., Dubchak, Inna
Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here, we describe the VISTA family of tools created to assist biologists in...
Identification of Evolutionary Hotspots in the Rodent Genomes (2004)
We describe a whole-genome comparative analysis of the human, mouse, and rat genomes to describe the average substitution patterns of four genomic regions: ancient repeats, rodent-specific DNA,...
Dewey, Colin, Wu, Jia Qian, Cawley, Simon, Alexandersson, Marina, Gibbs, Richard, Pachter, Lior
We describe a new method for simultaneously identifying novel homologous genes with identical structure in the human, mouse, and rat genomes by combining pairwise predictions made with the SLAM...
MAVID: Constrained Ancestral Alignment of Multiple Sequences (2004)
We describe a new global multiple-alignment program capable of aligning a large number of genomic regions. Our progressive-alignment approach incorporates the following ideas: maximum-likelihood...
Visualization of Multiple Genome Annotations and Alignments With the K-BROWSER (2004)
Chakrabarti, Kushal, Pachter, Lior
We introduce a novel genome browser application, the K-BROWSER, that allows intuitive visualization of biological information across an arbitrary number of multiply aligned genomes. In particular,...
Intraspecies sequence comparisons for annotating genomes (2004)
Boffelli, Dario, Weer, Claire V., Weng, Li, Lewis, Keith D., Shoukry, Malak I., Pachter, Lior, ...
Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its...
Multiple-sequence functional annotation and the generalized hidden Markov phylogeny (2004)
McAuliffe, Jon D., Pachter, Lior, Jordan, Michael I.
Motivation: Phylogenetic shadowing is a comparative genomics principle that allows for the discovery of conserved regions in sequences from multiple closely related organisms. We develop a formal...
MAVID: Constrained ancestral alignment of multiple sequences (2003)
We describe a new global multiple alignment program capable of aligning a large number of genomic regions. Our progressive alignment approach incorporates the following ideas: maximum-likelihood...
Reconstructing Trees from Subtree Weights (2003)
Pachter, Lior, Speyer, David E
The tree-metric theorem provides a necessary and sufficient condition for a dissimilarity matrix to be a tree metric, and has served as the foundation for numerous distance-based reconstruction...
Tropical Geometry of Statistical Models (2003)
Pachter, Lior, Sturmfels, Bernd
This paper presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint,...
SLAM web server for comparative gene finding and alignment (2003)
Simon Cawley, Lior Pachter, Marina Alexandersson
SLAM is a program that simultaneously aligns and annotates pairs of homologous sequences. The SLAM web server integrates SLAM with repeat masking tools and the AVID alignment program to allow for...
DOI: 10.1093/nar/gkg623 MAVID multiple alignment server (2003)
MAVID is a multiple alignment program suitable for many large genomic regions. The MAVID webserver allows biomedical researchers to quickly obtain multiple alignments for genomic sequences and to...
Strategies and Tools for Whole-Genome Alignments (2003)
Couronne, Olivier, Poliakov, Alexander, Bray, Nicolas, Ishkhanov, Tigran, Ryaboy, Dmitriy, Rubin, Edward, ...
The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We investigated different strategies of alignment for...
AVID: A Global Alignment Program (2003)
Bray, Nick, Dubchak, Inna, Pachter, Lior
In this paper we describe a new global alignment method called AVID. The method is designed to be fast, memory efficient, and practical for sequence alignments of large genomic regions up to...
SLAM web server for comparative gene finding and alignment (2003)
Cawley, Simon, Pachter, Lior, Alexandersson, Marina
SLAM is a program that simultaneously aligns and annotates pairs of homologous sequences. The SLAM web server integrates SLAM with repeat masking tools and the AVID alignment program to allow for...
MAVID multiple alignment server (2003)
MAVID is a multiple alignment program suitable for many large genomic regions. The MAVID web server allows biomedical researchers to quickly obtain multiple alignments for genomic sequences and to...
HMM sampling and applications to gene finding and alternative splicing (2003)
Cawley, Simon L., Pachter, Lior
The standard method of applying hidden Markov models to biological problems is to find a Viterbi (maximal weight) path through the HMM graph. The Viterbi algorithm reduces the problem of finding the...
SLAM: Cross-Species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model (2003)
Alexandersson, Marina, Cawley, Simon, Pachter, Lior
Comparative-based gene recognition is driven by the principle that conserved regions between related organisms are more likely than divergent regions to be coding. We describe a probabilistic...
Strategies and tools for whole genome alignments (2002)
Couronne, Olivier, Poliakov, Alexander, Bray, Nicolas, Ishkhanov, Tigran, Ryaboy, Dmitriy, Rubin, Edward, ...
The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment...
Mapping and identification of essential gene functions on the X chromosome of Drosophila (2002)
Peter, Annette, Schoettler, Petra, Werner, Meike, Beinert, Nicole, Dowe, Gordon, Burkert, Peter, ...
The Drosophila melanogaster genome consists of four chromosomes that contain 165 Mb of DNA, 120 Mb of which are euchromatic. The two Drosophila Genome Projects, in collaboration with Celera Genomics...
Applications of generalized pair hidden Markov models to alignment and gene finding problems (2002)
Lior Pachter, Marina Alexandersson, Simon Cawley
Hidden Markov models (HMMs) have been successfully applied to a variety of problems in molecular biology, ranging from alignment problems to gene finding and annotation. Alignment problems can be...
The explosion in genomic sequence available in public databases has resulted in an unprecedented opportunity for computational whole genome analyses. A number of promising comparative-based...
Benos, Panayiotis V., Gatt, Melanie K., Murphy, Lee, Harris, David, Barrell, Bart, Ferraz, Concepcion, ...
We present the sequence of a contiguous 2.63 Mb of DNA extending from the tip of the X chromosome of Drosophila melanogaster. Within this sequence, we predict 277 protein coding genes, of which 94...
Lior Pachter, Inna Dubchak, Lior Pachter
Inna Dubchak is a staff scientist at Lawrence Berkeley National Laboratory. Her research interests include development of bioinformatics tools and biological databases in the areas of comparative genomics,...
Benos, Panayiotis V., Gatt, Melanie K., Murphy, Lee, Harris, David, Barrell, Bart, Ferraz, Concepcion, ...
Active Conservation of Noncoding Sequences Revealed by Three-Way Species Comparisons (2000)
Dubchak, Inna, Brudno, Michael, Loots, Gabriela G., Pachter, Lior, Mayor, Chris, Rubin, Edward M., ...
Human and Mouse Gene Structure: Comparative Analysis and Application to Exon Prediction (2000)
Batzoglou, Serafim, Pachter, Lior, Mesirov, Jill P., Berger, Bonnie, Lander, Eric S.
A Dictionary Based Approach for Gene Annotation (1999)
Lior Pachter, Serafim Batzoglou, Valentin I. Spitkovsky, William S. Beebee, Eric S. Lander, Bonnie Berger, ...
This paper describes a fast and fully automated dictionary based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and...
A Dictionary Based Approach for Gene Annotation (1999)
Lior Pachter, Valentin I. Spitkovsky, Eric Banks, Eric S. Lander, Daniel J. Kleitman, Bonnie Berger
L.P. and S.B. contributed equally to this work.
A dictionary-based approach for gene annotation (1999)
Lior Pachter, Serafim Batzoglou, Valentin I. Spitkovsky, Eric Banks, Eric S. Lander, Daniel J. Kleitman, ...
This paper describes a fast and fully automated dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and...
UAV Task Assignment with Timing Constraints via Mixed-Integer Linear Programming (1998)
Schumacher, Corey, Chandler, Phillip, Pachter, Meir, Pachter, Lior
The optimal timing of air-to-ground tasks is undertaken. Specifically, a scenario where multiple air vehicles are required to prosecute geographically dispersed targets is considered. The vehicles...
Recent Developments in Computational Gene Recognition (1998)
Serafim Batzoglou, Bonnie Berger, Daniel J. Kleitman, Eric S. Lander, Lior Pachter
We survey recent mathematical and computational work in the field of gene recognition, focusing on the techniques that have been developed to tackle the problem of identifying protein coding...
We give the first complete combinatorial proof of the fact that the number of domino tilings of the $2n \times 2n$ square grid is of the form $2^n (2k+1)^2$, thus settling a question raised in [4]. The...
The Role of G Triplets in Splicing - A Computational View (1997)
A recent paper by McCullough and Berget [11] discusses the effect of G triplets (GGG) on splice site selection in short introns. Experimental evidence suggests that G triplets in introns play an...
We give the first complete combinatorial proof of the fact that the number of domino tilings of the $2n \times 2n$ square grid is of the form $2^n (2k+1)^2$, thus settling a question raised in [4]. The proof...
rVista for Comparative Sequence-Based Discovery of Functional Transcription Factor Binding Sites
Loots, Gabriela G., Ovcharenko, Ivan, Pachter, Lior, Dubchak, Inna, Rubin, Edward M.
Identifying transcriptional regulatory elements represents a significant challenge in annotating the genomes of higher vertebrates. We have developed a computational tool, rVISTA, for high-throughput...
Active Conservation of Noncoding Sequences Revealed by Three-Way Species Comparisons
Dubchak, Inna, Brudno, Michael, Loots, Gabriela G., Pachter, Lior, Mayor, Chris, Rubin, Edward M., ...
Human and mouse genomic sequence comparisons are being increasingly used to search for evolutionarily conserved gene regulatory elements. Large-scale human–mouse DNA comparison studies have...
Human and Mouse Gene Structure: Comparative Analysis and Application to Exon Prediction
Batzoglou, Serafim, Pachter, Lior, Mesirov, Jill P., Berger, Bonnie, Lander, Eric S.
We describe a novel analytical approach to gene recognition based on cross-species comparison. We first undertook a comparison of orthologous genomic loci from human and mouse, studying the extent of...
AVID: A Global Alignment Program
Bray, Nick, Dubchak, Inna, Pachter, Lior
In this paper we describe a new global alignment method called AVID. The method is designed to be fast, memory efficient, and practical for sequence alignments of large genomic regions up to...
VISTA: computational tools for comparative genomics
Frazer, Kelly A., Pachter, Lior, Poliakov, Alexander, Rubin, Edward M., Dubchak, Inna
Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here, we describe the VISTA family of tools created to assist biologists in...
Tropical geometry of statistical models
Pachter, Lior, Sturmfels, Bernd
This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint,...
Parametric inference for biological sequence analysis
Pachter, Lior, Sturmfels, Bernd
One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences....
Intraspecies sequence comparisons for annotating genomes
Boffelli, Dario, Weer, Claire V., Weng, Li, Lewis, Keith D., Shoukry, Malak I., Pachter, Lior, ...
Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its...
Mapping and identification of essential gene functions on the X chromosome of Drosophila
Peter, Annette, Schöttler, Petra, Werner, Meike, Beinert, Nicole, Dowe, Gordon, Burkert, Peter, ...
The Drosophila melanogaster genome consists of four chromosomes that contain 165 Mb of DNA, 120 Mb of which are euchromatic. The two Drosophila Genome Projects, in collaboration with Celera Genomics...
Subtree power analysis and species selection for comparative genomics
McAuliffe, Jon D., Jordan, Michael I., Pachter, Lior
Sequence comparison across multiple organisms aids in the detection of regions under selection. However, resource limitations require a prioritization of genomes to be sequenced. This prioritization...
Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities
The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis....
Parametric Alignment of Drosophila Genomes
Dewey, Colin N, Huggins, Peter M, Woods, Kevin, Sturmfels, Bernd, Pachter, Lior
The classic algorithms of Needleman–Wunsch and Smith–Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). To process large genomes that have undergone...
Identification of transposable elements using multiple alignments of related genomes
Accurate genome-wide cataloging of transposable elements (TEs) will facilitate our understanding of mobile DNA evolution, expose the genomic effects of TEs on the host genome, and improve the quality...
Reference based annotation with GeneMapper
Chatterji, Sourav, Pachter, Lior
GeneMapper, a new program for transferring annotations from a well-annotated reference genome to other genomes, is described.
Analysis of epistatic interactions and fitness landscapes using a new geometric approach
Beerenwinkel, Niko, Pachter, Lior, Sturmfels, Bernd, Elena, Santiago F, Lenski, Richard E
Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome
Margulies, Elliott H., Cooper, Gregory M., Asimenos, George, Thomas, Daryl J., Dewey, Colin N., Siepel, Adam, ...
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation,...
Population Genomics: Whole-Genome Analysis of Polymorphism and Divergence in Drosophila simulans
Begun, David J, Holloway, Alisha K, Stevens, Kristian, Hillier, LaDeana W, Poh, Yu-Ping, Hahn, Matthew W, ...
The population genetic perspective is that the processes shaping genomic variation can be revealed only through simultaneous investigation of sequence polymorphism and divergence within and between...
Viral Population Estimation Using Pyrosequencing
Eriksson, Nicholas, Pachter, Lior, Mitsuya, Yumi, Rhee, Soo-Yon, Wang, Chunlin, Gharizadeh, Baback, ...
The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently...
The Cyclohedron Test for Finding Periodic Genes in Time Course Expression Studies
Morton, Jason, Pachter, Lior, Shiu, Anne, Sturmfels, Bernd
The problem of finding periodically expressed genes from time course microarray experiments is at the center of numerous efforts to identify the molecular components of biological clocks. We present...
On the optimality of the neighbor-joining algorithm
Eickmeyer, Kord, Huggins, Peter, Pachter, Lior, Yoshida, Ruriko
The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for finding the balanced minimum evolution (BME) tree associated to a dissimilarity map. From this point of...
Comparison of Pattern Detection Methods in Microarray Time Series of the Segmentation Clock
Dequéant, Mary-Lee, Ahnert, Sebastian, Edelsbrunner, Herbert, Fink, Thomas M. A., Glynn, Earl F., Hattem, Gaye, ...
While genome-wide gene expression data are generated at an increasing rate, the repertoire of approaches for pattern discovery in these data is still limited. Identifying subtle patterns of interest...
Combinatorics of least-squares trees
A recurring theme in the least-squares approach to phylogenetics has been the discovery of elegant combinatorial formulas for the least-squares estimates of edge lengths. These formulas have proved...
Bradley, Robert K., Roberts, Adam, Smoot, Michael, Juvekar, Sudeep, Do, Jaeyoung, Dewey, Colin, ...
We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical...
TopHat: discovering splice junctions with RNA-Seq
Trapnell, Cole, Pachter, Lior, Salzberg, Steven L.
Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used...