Efficient oligonucleotide probe selection for pan-genomic tiling arrays (2009)
Phillippy, Adam M, Deng, Xiangyu, Zhang, Wei, Salzberg, Steven L
Background: Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This...
A whole-genome assembly of the domestic cow, Bos taurus (2009)
Zimin, Aleksey V, Delcher, Arthur L, Florea, Liliana, Kelley, David R, Schatz, Michael C, Puiu, Daniela, ...
Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million...
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome (2009)
Langmead, Ben, Trapnell, Cole, Pop, Mihai, Salzberg, Steven L
Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more...
Review: Bioinformatics challenges of new... (2009)
This article was published in an Elsevier journal.
Insignia: a DNA signature search web server for diagnostic assay development (2009)
Phillippy, Adam M., Ayanbule, Kunmi, Edwards, Nathan J., Salzberg, Steven L.
Insignia is a web application for the rapid identification of unique DNA signatures. DNA signatures are distinct nucleotide sequences that can be used to detect the presence of certain organisms and...
OperonDB: a comprehensive database of predicted operons in microbial genomes (2009)
Pertea, Mihaela, Ayanbule, Kunmi, Smedinghoff, Megan, Salzberg, Steven L.
The fast pace of bacterial genome sequencing and the resulting dependence on highly automated annotation methods has driven the development of many genome-wide analysis tools. OperonDB, first...
TopHat: discovering splice junctions with RNA-Seq (2009)
Trapnell, Cole, Pachter, Lior, Salzberg, Steven L.
Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used...
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome (2009)
Ben Langmead, Cole Trapnell, Mihai Pop, Steven L Salzberg
Software. Volume 10, Issue 3, Article R25. Open Access.
Steven L. Salzberg, James A. Yorke
With hundreds of genomes now in GenBank, researchers might be forgiven for assuming that genome sequence data are correct, at least at a large scale. Certainly there might be errors at some small...
Improving Genome Assembly without Sequencing (2008)
Michael C. Schatz, Arthur L. Delcher, Pawel Gajer, Jason Miller, Martin Shumway, Steven L. Salzberg
Assembly of genomes from whole-genome sequencing (WGS) projects is one of the most complex computational problems in genomics. WGS assemblers such as Arachne [1] and Celera Assembler [2] are able to...
Arthur L. Delcher, Steven L. Salzberg, Mihai Pop
research interests include genome assembly and comparative genomics.
Hierarchical Scaffolding With Bambus (2008)
Mihai Pop, Daniel S. Kosack, Steven L. Salzberg
The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is...
John W. Sheppard, Steven L. Salzberg
Abstract. Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that genetic...
Identifying bacterial genes and endosymbiont DNA with Glimmer (2008)
Arthur L. Delcher, Kirsten A. Bratke, Edwin C. Powers, Steven L. Salzberg
Motivation: The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archæa and viruses representing hundreds of species. We describe several major changes to the...
Computational Gene Prediction Using Multiple Sources of Evidence (2008)
Jonathan E. Allen, Mihaela Pertea, Steven L. Salzberg
This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program,...
The Genome Assembly Archive: A New Public Resource (2008)
Steven L. Salzberg, Deanna Church, Michael DiCuccio, Eugene Yaschenko, James Ostell
Scientists have dedicated considerable effort to decoding the genomes of an ever-growing list of species, ranging from small viruses, whose genomes may be just a few thousand nucleotides in length,...
Carleton L. Kingsford, Kunmi Ayanbule, Steven L. Salzberg
Rapid, accurate, computational discovery of
Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A (2008)
Salzberg, Steven L, Sommer, Daniel D, Schatz, Michael C, Phillippy, Adam M, Rabinowicz, Pablo D, Tsuge, Seiji, ...
Background: Xanthomonas oryzae pv. oryzae causes bacterial blight of rice (Oryza sativa L.), a major disease that constrains production of this staple crop in many parts of the world. We...
It is time to end the patenting of software (2008)
John Quackenbush, Steven L. Salzberg
Bioinformatics editorial.
Haas, Brian J, Salzberg, Steven L, Zhu, Wei, Pertea, Mihaela, Allen, Jonathan E, Orvis, Joshua, ...
EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM,...
Jonathan A Eisen, John F Heidelberg, Owen White, Steven L Salzberg
for symmetric chromosomal inversions around the
Arthur L. Delcher, Simon Kasif, Robert D. Fleischmann, Jeremy Peterson, Owen White, Steven L. Salzberg
A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of...
Investigations of the Greedy Heuristic for Classification Tree Induction (2007)
Sreerama K. Murthy, Steven L. Salzberg
Most existing methods for automatic construction of classification trees utilize the greedy heuristic: trees are constructed one node at a time with no looking ahead or backtracking, choosing locally...
Bootstrapping Memory-Based Learning with Genetic Algorithms (2007)
John W. Sheppard, Steven L. Salzberg
A number of special-purpose learning techniques have been developed in recent years to address the problem of learning with delayed reinforcement. This category includes numerous important control...
Arthur L. Delcher, Steven L. Salzberg, Simon Kasif, Owen White (2007)
The GLIMMER system for microbial gene identification finds ~97–98% of all genes in a genome when compared with published annotation. This paper reports on two new results: (i) significant technical...
Keynote Talk: Genome Paleontology: Discoveries from Complete Genomes (2007)
Our group has been developing new algorithms for the analyses of complete genome sequences, and using these algorithms to make a variety of biological discoveries in organisms ranging from bacteria...
A Unified Model Explaining the Offsets of Overlapping and Near-Overlapping Prokaryotic Genes (2007)
Kingsford, Carl, Delcher, Arthur L., Salzberg, Steven L.
Overlapping genes are a common phenomenon. Among sequenced prokaryotes, more than 29% of all annotated genes overlap at least 1 of their 2 flanking genes. We present a unified model for the creation...
Pertea, Mihaela, Mount, Stephen M, Salzberg, Steven L
Background: Algorithmic approaches to splice site prediction have relied mainly on the consensus patterns found at the boundaries between protein coding and non-coding regions. However exonic...
Comprehensive DNA Signature Discovery and Validation (2007)
Adam M. Phillippy, Jacquline A. Mason, Kunmi Ayanbule, Daniel D. Sommer, Elisa Taviani, Anwar Huq, ...
DNA signatures are nucleotide sequences that can be used to detect the presence of an organism and to distinguish that organism from all other species. Here we describe Insignia, a new, comprehensive...
Identifying bacterial genes and endosymbiont DNA with Glimmer (2007)
Delcher, Arthur L., Bratke, Kirsten A., Powers, Edwin C., Salzberg, Steven L.
Motivation: The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archæa and viruses representing hundreds of species. We describe several major changes to the...
Hawkeye: an interactive visual analytics tool for genome assemblies (2007)
Schatz, Michael C., Phillippy, Adam M., Shneiderman, Ben, Salzberg, Steven L.
Genome sequencing remains an inexact science, and genome sequences can contain significant errors if they are not carefully examined. Hawkeye is our new visual analytics tool for genome assemblies,...
Minimus: a fast, lightweight genome assembler (2007)
Sommer, Daniel D., Delcher, Arthur L., Salzberg, Steven L., Pop, Mihai
Background: Genome assemblers have grown very large and complex in response to the need for algorithms to handle the challenges of large whole-genome sequencing projects. Many of the most common uses...
Kingsford, Carleton L., Ayanbule, Kunmi, Salzberg, Steven L.
Background: In many prokaryotes, transcription of DNA to RNA is terminated by a thymine-rich stretch of DNA following a hairpin loop. Detecting such Rho-independent transcription terminators can shed...
Genome re-annotation: a wiki solution? (2007)
The annotation of most genomes becomes outdated over time, owing in part to our ever-improving knowledge of genomes and in part to improvements in bioinformatics software. Unfortunately,...
Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis (2007)
Carlton, Jane M., Hirt, Robert P., Silva, Joana C., Delcher, Arthur L., Schatz, Michael, Zhao, Qi, ...
We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the ~160-megabase genome,...
Daniel D Sommer, Arthur L Delcher, Steven L Salzberg, ...
Software: Minimus: a fast, lightweight genome assembler. BMC Bioinformatics, BioMed Central.
Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote (2006)
Eisen, Jonathan A, Coyne, Robert S, Wu, Martin, Wu, Dongying, Thiagarajan, Mathangi, Wortman, Jennifer R, ...
The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct...
A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons (2006)
Allen, Jonathan E, Salzberg, Steven L
Background: An important challenge in eukaryotic gene prediction is accurate identification of alternatively spliced exons. Functional transcripts can go undetected in gene expression studies...
Allen, Jonathan E., Majoros, William H., Pertea, Mihaela, Salzberg, Steven L.
Background: Predicting complete protein-coding genes in human DNA remains a significant challenge. Though a number of promising approaches have been investigated, an ideal suite of tools has yet to...
falciparum antigens by antigenic analysis of genomic and proteomic (2006)
Z. Bozdech, M. Llinas, B. L. Pulliam, E. D. Wong, J. Zhu, Z. Bozdech, ...
plasmodium
Edward C. Holmes, Elodie Ghedin, Naomi Miller, Jill Taylor, Yiming Bao, Kirsten St. George, ...
Evolution of the flu virus is analyzed via genomic phylogeny; humans are found to provide a reservoir of antigenic variability implicit in flu adaptation and virulence.
Edward C. Holmes, Elodie Ghedin, Naomi Miller, Jill Taylor, Yiming Bao, Kirsten St. George, ...
Understanding the evolution of influenza A viruses in humans is important for surveillance and vaccine strain selection. We performed a phylogenetic analysis of 156 complete genomes of human H3N2...
Correction: Serendipitous discovery of Wolbachia genomes in multiple Drosophila species (2005)
Salzberg, Steven L, Dunning Hotopp, Julie, Delcher, Arthur L, Pop, Mihai, Smith, Douglas R, Eisen, Michael B, ...
No abstract available.
Serendipitous discovery of Wolbachia genomes in multiple Drosophila species (2005)
Salzberg, Steven L, Dunning Hotopp, Julie, Delcher, Arthur L, Pop, Mihai, Smith, Douglas R, Eisen, Michael B, ...
Background: The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of...
Efficient decoding algorithms for generalized hidden Markov model gene finders (2005)
Majoros, William H., Pertea, Mihaela, Delcher, Arthur L., Salzberg, Steven L.
Background: The Generalized Hidden Markov Model (GHMM) has proven a useful framework for the task of computational gene prediction in eukaryotic genomes, due to its flexibility and probabilistic...
JIGSAW: integration of multiple sources of evidence for gene prediction (2005)
Jonathan E. Allen, Steven L. Salzberg
Bioinformatics, original paper, sequence analysis.
Sequence-based heuristics for faster annotation of non-coding RNA families (2005)
Zasha Weinberg, Walter L. Ruzzo, Steven L. Salzberg
Bioinformatics, original paper, sequence analysis. Vol. 22 no. 1 2006, pages 35–39. doi:10.1093/bioinformatics/bti743
Steven L Salzberg, Arthur L Delcher
The electronic version of this article is the complete one and can be
JIGSAW: integration of multiple sources of evidence for gene prediction (2005)
Allen, Jonathan E., Salzberg, Steven L.
Motivation: Computational gene finding systems play an important role in finding new human genes, although no systems are yet accurate enough to predict all or even most protein-coding regions...
An empirical analysis of training protocols for probabilistic gene finders (2004)
Majoros, William H., Salzberg, Steven L.
Background: Generalized hidden Markov models (GHMMs) appear to be approaching acceptance as a de facto standard for state-of-the-art ab initio gene finding, as evidenced by the recent proliferation...
Naomi Ward, Øivind Larsen, James Sakwa, Live Bruseth, Hoda Khouri, A. Scott Durkin, ...
Methanotrophs are bacteria that use methane as a sole carbon source. The genome sequence of Methylococcus capsulatus deepens our understanding of methanotroph biology and its relationship to global...
Naomi Ward, Øivind Larsen, James Sakwa, Live Bruseth, Hoda Khouri, A. Scott Durkin, ...
Methanotrophs are ubiquitous bacteria that can use the greenhouse gas methane as a sole carbon and energy source for growth, thus playing major roles in global carbon cycles, and in particular,...
The Genome Assembly Archive: A New Public Resource (2004)
Steven L. Salzberg, Deanna Church, Michael DiCuccio, Eugene Yaschenko, James Ostell
With the genome assembly archive, it is possible to examine the raw data that underlies the DNA sequence in any sequenced genome.
Berman, Benjamin P, Pfeiffer, Barret D, Laverty, Todd R, Salzberg, Steven L, Rubin, Gerald M, Eisen, Michael B, ...
Background: The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of...
Versatile and open software for comparing large genomes (2004)
Kurtz, Stefan, Phillippy, Adam, Delcher, Arthur L., Smoot, Michael, Shumway, Martin, Antonescu, Corina, ...
The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing...
DAGchainer: a tool for mining segmental genome duplications and synteny (2004)
Brian J. Haas, Arthur L. Delcher, Jennifer R. Wortman, Steven L. Salzberg
Given the positions of protein-coding genes along genomic sequence and probability values for protein alignments between genes, DAGchainer identifies chains of gene pairs sharing conserved order...
Versatile and open software for comparing large genomes (2004)
Stefan Kurtz, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu, Steven L Salzberg
Software. Volume 5, Issue 2, Article R12. Open Access.
Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila (2004)
Benjamin P Berman, Barret D Pfeiffer, Todd R Laverty, Steven L Salzberg, Gerald M Rubin, Michael B Eisen, ...
Automated correction of genome sequence errors (2004)
Gajer, Pawel, Schatz, Michael, Salzberg, Steven L.
By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the...
Computational Gene Prediction Using Multiple Sources of Evidence (2004)
Allen, Jonathan E., Pertea, Mihaela, Salzberg, Steven L.
This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program,...
Hierarchical Scaffolding With Bambus (2004)
Pop, Mihai, Kosack, Daniel S., Salzberg, Steven L.
The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is...
Comparative genome assembly (2004)
Pop, Mihai, Phillippy, Adam, Delcher, Arthur L., Salzberg, Steven L.
One of the most complex and computationally intensive tasks of genome sequence analysis is genome assembly. Even today, few centres have the resources, in both software and hardware, to assemble a...
Automated correction of genome sequence errors (2003)
Pawel Gajer, Michael Schatz, Steven L. Salzberg
DOI: 10.1093/nar/gkh216. By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the...
Computational Discovery of Internal Micro-Exons (2003)
Volfovsky, Natalia, Haas, Brian J., Salzberg, Steven L.
Very short exons, also known as micro-exons, occur in large numbers in some eukaryotic genomes. Existing annotation tools have a limited ability to recognize these short sequences, which range in...
GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders (2003)
Majoros, William H., Pertea, Mihaela, Antonescu, Corina, Salzberg, Steven L.
We present three programs for ab initio gene prediction in eukaryotes: Exonomy, Unveil and GlimmerM. Exonomy is a 23-state Generalized Hidden Markov Model (GHMM), Unveil is a 283-state standard...
Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies (2003)
Haas, Brian J., Delcher, Arthur L., Mount, Stephen M., Wortman, Jennifer R., Smith, Roger K., Hannick, Linda I., ...
The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble...
Full-length messenger RNA sequences greatly improve genome annotation (2002)
Haas, Brian J, Volfovsky, Natalia, Town, Christopher D, Troukhan, Maxim, Alexandrov, Nickolai, Feldmann, Kenneth A, ...
Abstract Background Annotation of eukaryotic genomes is a complex endeavor that requires the integration of evidence from multiple, often contradictory, sources. With the ever-increasing amount of...
Full-length messenger RNA sequences greatly improve genome annotation (2002)
Haas, Brian J, Volfovsky, Natalia, Town, Christopher D, Troukhan, Maxim, Alexandrov, Nickolai, Feldman, Kenneth A, ...
Background: Annotation of eukaryotic genomes is a complex endeavor that requires the integration of evidence from multiple, often contradictory, sources. With the ever-increasing amount of genome...
Finding a Majority Among N Votes. (2002)
Fischer, Michael J., Salzberg, Steven L.
A commonly-used technique for fault-tolerant computing is to perform n redundant computations and then vote on the results, choosing the majority value if one exists. We present an algorithm for...
Fast algorithms for large-scale genome alignment and comparison (2002)
Delcher, Arthur L., Phillippy, Adam, Carlton, Jane, Salzberg, Steven L.
We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs...
A clustering method for repeat analysis in DNA sequences (2001)
Volfovsky, Natalia, Haas, Brian J, Salzberg, Steven L
Abstract Background A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this...
A clustering method for repeat analysis in DNA sequences. (2001)
Volfovsky, Natalia, Haas, Brian J., Salzberg, Steven L.
Background: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data...
GeneSplicer: a new computational method for splice site prediction (2001)
Mihaela Pertea, Xiaoying Lin, Steven L. Salzberg
GeneSplicer is a new, flexible system for detecting splice sites in the genomic DNA of various eukaryotes. The system has been tested successfully using DNA from two reference organisms: the model...
Prediction of operons in microbial genomes (2001)
Maria D. Ermolaeva, Owen White, Steven L. Salzberg
Operon structure is an important organization feature of bacterial genomes. Many sets of genes occur in the same order on multiple genomes; these conserved gene groupings represent candidate operons....
A Probabilistic Method For Identifying Start Codons In Bacterial Genomes (2001)
Baris E. Suzek, Maria D. Ermolaeva, Mark Schreiber, Steven L. Salzberg
As the pace of genome sequencing has accelerated, the need for highly accurate gene prediction systems has grown. Computational systems for identifying genes in prokaryotic genomes have sensitivities...
GeneSplicer: a new computational method for splice site prediction (2001)
Pertea, Mihaela, Lin, Xiaoying, Salzberg, Steven L.
GeneSplicer is a new, flexible system for detecting splice sites in the genomic DNA of various eukaryotes. The system has been tested successfully using DNA from two reference organisms: the model...
Prediction of operons in microbial genomes (2001)
Ermolaeva, Maria D., White, Owen, Salzberg, Steven L.
Operon structure is an important organization feature of bacterial genomes. Many sets of genes occur in the same order on multiple genomes; these conserved gene groupings represent candidate operons....
A probabilistic method for identifying start codons in bacterial genomes (2001)
Suzek, Baris E., Ermolaeva, Maria D., Schreiber, Mark, Salzberg, Steven L.
As the pace of genome sequencing has accelerated, the need for highly accurate gene prediction systems has grown. Computational systems for identifying genes in prokaryotic genomes have sensitivities...
Evidence for symmetric chromosomal inversions around the replication origin in bacteria (2000)
Eisen, Jonathan A, Heidelberg, John F, White, Owen, Salzberg, Steven L
Abstract Background Whole-genome comparisons can provide great insight into many aspects of biology. Until recently, however, comparisons were mainly possible only between distantly related species....
Evidence for symmetric chromosomal inversions around the replication origin in bacteria (2000)
Eisen, Jonathan A., Heidelberg, John F., White, Owen, Salzberg, Steven L.
Background: Whole-genome comparisons can provide great insight into many aspects of biology. Until recently, however, comparisons were mainly possible only between distantly related species. Complete...
An optimized protocol for analysis of EST sequences (2000)
Feng Liang, Ingeborg Holt, Geo Pertea, Svetlana Karamycheva, Steven L. Salzberg, John Quackenbush
The vast body of Expressed Sequence Tag (EST) data in the public databases provide an important resource for comparative and functional genomics studies and an invaluable tool for the annotation of...
Prediction of Transcription Terminators in Bacterial Genomes (2000)
Maria D. Ermolaeva, Hanif G. Khalak, Owen White, Hamilton O. Smith, Steven L. Salzberg
Introduction Bacterial genomes are organized into units of expression that are bounded by sites where transcription of DNA into RNA is initiated and terminated. Regulation of gene expression is often...
An optimized protocol for analysis of EST sequences (2000)
Liang, Feng, Holt, Ingeborg, Pertea, Geo, Karamycheva, Svetlana, Salzberg, Steven L., Quackenbush, John
The vast body of Expressed Sequence Tag (EST) data in the public databases provide an important resource for comparative and functional genomics studies and an invaluable tool for the annotation of...
Interpolated Markov models for eukaryotic gene finding (1999)
Steven L. Salzberg, Mihaela Pertea, Arthur L. Delcher, Malcolm J. Gardner, Hervé Tettelin
Computational gene finding research has emphasized the development of gene finders for bacterial and human DNA. This has left genome projects for some small eukaryotes without a system that addresses...
Optimized multiplex PCR: efficiently closing a whole-genome shotgun sequencing project (1999)
Hervé Tettelin, Diana Radune, Simon Kasif, Hoda Khouri, Steven L. Salzberg
A new method has been developed for rapidly closing a large number of gaps in a whole-genome shotgun sequencing project. The method employs multiplex PCR and a novel pooling strategy to minimize the...
Gene Discovery in DNA Sequences (1999)
In this article, I describe the two main computational techniques used for discovering new genes: sequence alignment and computational gene finding.
Microbial gene identification using interpolated Markov models (1998)
Steven L. Salzberg, Arthur L. Delcher, Simon Kasif, Owen White
This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this...
Method for Identifying Splice Sites and Translational Start Sites in Eukaryotic mRNA (1997)
This paper describes a new method for determining the consensus sequences that signal the start of translation and the boundaries between exons and introns (donor and acceptor sites) in...
On comparing classifiers: Pitfalls to avoid and a recommended approach (1997)
Abstract. An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very...
A Method for Identifying Splice Sites and Translational Start Sites in Eukaryotic mRNA (1997)
This paper describes a new method for determining the consensus sequences that signal the start of translation and the boundaries between exons and introns (donor and acceptor sites) in eukaryotic...
A Teaching Strategy for Memory-Based Control (1997)
John Sheppard, Steven L. Salzberg
Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that genetic algorithms can be...
A method for identifying splice sites and translational start sites in eukaryotic mRNA (1997)
This paper describes a new method for determining the consensus sequences that signal the start of translation and the boundaries between exons and introns (donor and acceptor sites) in eukaryotic...
Methodological Note On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach (1996)
Steven L. Salzberg, Usama Fayyad
Abstract. An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very...
On Comparing Classifiers: A Critique of Current Research and Methods (1995)
An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully,...
Memory Based Learning of Pursuit Games (1995)
John W. Sheppard, Steven L. Salzberg
Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that memory based learning can...
Learning to Catch: Applying Nearest Neighbor Algorithms to Dynamic Control Tasks (1994)
This paper examines the hypothesis that local weighted variants of k-nearest neighbor algorithms can support dynamic control tasks. We evaluated several k-nearest neighbor (k-NN) algorithms on the...
Learning to Catch: Applying Nearest Neighbor Algorithms to Dynamic Control Tasks (1994)
David W. Aha, Steven L. Salzberg
This paper examines the hypothesis that local weighted variants of k-nearest neighbor algorithms can support dynamic control tasks. We evaluated several k-nearest neighbor (k-NN) algorithms on the...
Steven L. Salzberg, David W. Aha
Dynamic control problems are the subject of much research in machine learning (e.g., Selfridge, Sutton, & Barto, 1985; Sammut, 1990; Sutton,...
Evidence for symmetric chromosomal inversions around the replication origin in bacteria
Eisen, Jonathan A, Heidelberg, John F, White, Owen, Salzberg, Steven L
GeneSplicer: a new computational method for splice site prediction
Pertea, Mihaela, Lin, Xiaoying, Salzberg, Steven L.
GeneSplicer is a new, flexible system for detecting splice sites in the genomic DNA of various eukaryotes. The system has been tested successfully using DNA from two reference organisms: the model...
Prediction of operons in microbial genomes
Ermolaeva, Maria D., White, Owen, Salzberg, Steven L.
Operon structure is an important organization feature of bacterial genomes. Many sets of genes occur in the same order on multiple genomes; these conserved gene groupings represent candidate operons....
Complete genome sequence of Caulobacter crescentus
Nierman, William C., Feldblyum, Tamara V., Laub, Michael T., Paulsen, Ian T., Nelson, Karen E., Eisen, Jonathan, ...
The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic...
An optimized protocol for analysis of EST sequences
Liang, Feng, Holt, Ingeborg, Pertea, Geo, Karamycheva, Svetlana, Salzberg, Steven L., Quackenbush, John
The vast body of Expressed Sequence Tag (EST) data in the public databases provide an important resource for comparative and functional genomics studies and an invaluable tool for the annotation of...
Full-length messenger RNA sequences greatly improve genome annotation
Haas, Brian J, Volfovsky, Natalia, Town, Christopher D, Troukhan, Maxim, Alexandrov, Nickolai, Feldmann, Kenneth A, ...
Fast algorithms for large-scale genome alignment and comparison
Delcher, Arthur L., Phillippy, Adam, Carlton, Jane, Salzberg, Steven L.
We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs...
Paulsen, Ian T., Seshadri, Rekha, Nelson, Karen E., Eisen, Jonathan A., Heidelberg, John F., Read, Timothy D., ...
The 3.31-Mb genome sequence of the intracellular pathogen and potential bioterrorism agent, Brucella suis, was determined. Comparison of B. suis with Brucella melitensis has defined a finite set of...
The Value of Complete Microbial Genome Sequencing (You Get What You Pay For)
Fraser, Claire M., Eisen, Jonathan A., Nelson, Karen E., Paulsen, Ian T., Salzberg, Steven L.
GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders
Majoros, William H., Pertea, Mihaela, Antonescu, Corina, Salzberg, Steven L.
We present three programs for ab initio gene prediction in eukaryotes: Exonomy, Unveil and GlimmerM. Exonomy is a 23-state Generalized Hidden Markov Model (GHMM), Unveil is a 283-state standard...
Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies
Haas, Brian J., Delcher, Arthur L., Mount, Stephen M., Wortman, Jennifer R., Smith, Roger K., Hannick, Linda I., ...
The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble...
Kennedy, Sean P., Ng, Wailap Victor, Salzberg, Steven L., Hood, Leroy, DasSarma, Shiladitya
The genome of the halophilic archaeon Halobacterium sp. NRC-1 and predicted proteome have been analyzed by computational methods and reveal characteristics relevant to life in an extreme environment...
Computational Gene Prediction Using Multiple Sources of Evidence
Allen, Jonathan E., Pertea, Mihaela, Salzberg, Steven L.
This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program,...
Hierarchical Scaffolding With Bambus
Pop, Mihai, Kosack, Daniel S., Salzberg, Steven L.
The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is...
Automated correction of genome sequence errors
Gajer, Pawel, Schatz, Michael, Salzberg, Steven L.
By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the...
Versatile and open software for comparing large genomes
Kurtz, Stefan, Phillippy, Adam, Delcher, Arthur L, Smoot, Michael, Shumway, Martin, Antonescu, Corina, ...
The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes.
Computational Discovery of Internal Micro-Exons
Volfovsky, Natalia, Haas, Brian J., Salzberg, Steven L.
Very short exons, also known as micro-exons, occur in large numbers in some eukaryotic genomes. Existing annotation tools have a limited ability to recognize these short sequences, which range in...
The Genome Assembly Archive: A New Public Resource
Salzberg, Steven L, Church, Deanna, DiCuccio, Michael, Yaschenko, Eugene, Ostell, James
With the genome assembly archive, it is possible to examine the raw data that underlies the DNA sequence in any sequenced genome
Genomic Insights into Methanotrophy: The Complete Genome Sequence of Methylococcus capsulatus (Bath)
Ward, Naomi, Larsen, Øivind, Sakwa, James, Bruseth, Live, Khouri, Hoda, Durkin, A. Scott, ...
Methanotrophs are ubiquitous bacteria that can use the greenhouse gas methane as a sole carbon and energy source for growth, thus playing major roles in global carbon cycles, and in particular,...
Berman, Benjamin P, Pfeiffer, Barret D, Laverty, Todd R, Salzberg, Steven L, Rubin, Gerald M, Eisen, Michael B, ...
27 predicted gene-regulatory regions in the Drosophila melanogaster genome were analyzed in vivo, confirming 15 active enhancer regions. A comparison with Drosophila pseudoobscura sequences revealed...
Efficient decoding algorithms for generalized hidden Markov model gene finders
Majoros, William H, Pertea, Mihaela, Delcher, Arthur L, Salzberg, Steven L
Serendipitous discovery of Wolbachia genomes in multiple Drosophila species
Salzberg, Steven L, Hotopp, Julie C Dunning, Delcher, Arthur L, Pop, Mihai, Smith, Douglas R, Eisen, Michael B, ...
By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly:...
Correction: Serendipitous discovery of Wolbachia genomes in multiple Drosophila species
Salzberg, Steven L, Dunning Hotopp, Julie C, Delcher, Arthur L, Pop, Mihai, Smith, Douglas R, Eisen, Michael B, ...
A correction to Serendipitous discovery of Wolbachia genomes in multiple Drosophila species by SL Salzberg, JC Dunning Hotopp, AL Delcher, M Pop, DR Smith, MB Eisen and WC Nelson. Genome Biology...
Holmes, Edward C, Ghedin, Elodie, Miller, Naomi, Taylor, Jill, Bao, Yiming, St George, Kirsten, ...
Understanding the evolution of influenza A viruses in humans is important for surveillance and vaccine strain selection. We performed a phylogenetic analysis of 156 complete genomes of human H3N2...
Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote
Eisen, Jonathan A, Coyne, Robert S, Wu, Martin, Wu, Dongying, Thiagarajan, Mathangi, Wortman, Jennifer R, ...
The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct...
JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions
Allen, Jonathan E, Majoros, William H, Pertea, Mihaela, Salzberg, Steven L
Minimus: a fast, lightweight genome assembler
Sommer, Daniel D, Delcher, Arthur L, Salzberg, Steven L, Pop, Mihai
Genome re-annotation: a wiki solution?
Genome annotation currently tends to represent a static snapshot. Routine re-annotation, perhaps using wiki software, would help.
Kingsford, Carleton L, Ayanbule, Kunmi, Salzberg, Steven L
Using a novel computational method, an extensive collection of predicted Rho-independent transcription terminators is derived from 343 prokaryotes, offering insight into their relationship to DNA...
Hawkeye: an interactive visual analytics tool for genome assemblies
Schatz, Michael C, Phillippy, Adam M, Shneiderman, Ben, Salzberg, Steven L
Hawkeye is a new, freely available visual analytics tool for genome assemblies, designed to aid in identifying and correcting assembly errors.
Comprehensive DNA Signature Discovery and Validation
Phillippy, Adam M, Mason, Jacquline A, Ayanbule, Kunmi, Sommer, Daniel D, Taviani, Elisa, Huq, Anwar, ...
DNA signatures are nucleotide sequences that can be used to detect the presence of an organism and to distinguish that organism from all other species. Here we describe Insignia, a new, comprehensive...
Haas, Brian J, Salzberg, Steven L, Zhu, Wei, Pertea, Mihaela, Allen, Jonathan E, Orvis, Joshua, ...
EVidenceModeler (EVM) is an automated annotation tool that predicts protein-coding regions, alternatively spliced transcripts and untranslated regions of eukaryotic genes.
Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A
Salzberg, Steven L, Sommer, Daniel D, Schatz, Michael C, Phillippy, Adam M, Rabinowicz, Pablo D, Tsuge, Seiji, ...
Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads
Salzberg, Steven L., Sommer, Daniel D., Puiu, Daniela, Lee, Vincent T.
Recent improvements in technology have made DNA sequencing dramatically faster and more efficient than ever before. The new technologies produce highly accurate sequences, but one drawback is that...
Re-Assembly of the Genome of Francisella tularensis Subsp. holarctica OSU18
Puiu, Daniela, Salzberg, Steven L.
Francisella tularensis is a highly infectious human intracellular pathogen that is the causative agent of tularemia. It occurs in several major subtypes, including the live vaccine strain holarctica...
Lu, Hong, Patil, Prabhu, Van Sluys, Marie-Anne, White, Frank F., Ryan, Robert P., Dow, J. Maxwell, ...
A whole-genome assembly of the domestic cow, Bos taurus
Zimin, Aleksey V, Delcher, Arthur L, Florea, Liliana, Kelley, David R, Schatz, Michael C, Puiu, Daniela, ...
A cow whole-genome assembly of 2.86 billion base pairs that closes gaps and corrects previously-described inversions and deletions as well as describing a portion of the Y chromosome.
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
Langmead, Ben, Trapnell, Cole, Pop, Mihai, Salzberg, Steven L
Bowtie: a new ultrafast memory-efficient tool for the alignment of short DNA sequence reads to large genomes.
TopHat: discovering splice junctions with RNA-Seq
Trapnell, Cole, Pachter, Lior, Salzberg, Steven L.
Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used...
OperonDB: a comprehensive database of predicted operons in microbial genomes
Pertea, Mihaela, Ayanbule, Kunmi, Smedinghoff, Megan, Salzberg, Steven L.
The fast pace of bacterial genome sequencing and the resulting dependence on highly automated annotation methods has driven the development of many genome-wide analysis tools. OperonDB, first...
The Complete Genome Sequence of Bacillus anthracis Ames “Ancestor”
Ravel, Jacques, Jiang, Lingxia, Stanley, Scott T., Wilson, Mark R., Decker, R. Scott, Read, Timothy D., ...
The pathogenic bacterium Bacillus anthracis has become the subject of intense study as a result of its use in a bioterrorism attack in the United States in September and October 2001. Previous...
Insignia: a DNA signature search web server for diagnostic assay development
Phillippy, Adam M., Ayanbule, Kunmi, Edwards, Nathan J., Salzberg, Steven L.
Insignia is a web application for the rapid identification of unique DNA signatures. DNA signatures are distinct nucleotide sequences that can be used to detect the presence of certain organisms and...
Genome Sequence of the Wolbachia Endosymbiont of Culex quinquefasciatus JHB
Salzberg, Steven L., Puiu, Daniela, Sommer, Daniel D., Nene, Vish, Lee, Norman H.
Wolbachia species are endosymbionts of a wide range of invertebrates, including mosquitoes, fruit flies, and nematodes. The wPip strains can cause cytoplasmic incompatibility in some strains of the...
Genome Analysis Linking Recent European and African Influenza (H5N1) Viruses
Salzberg, Steven L., Kingsford, Carl, Cattoli, Giovanni, Spiro, David J., Janies, Daniel A., Aly, Mona Mehrez, ...
Although linked, these viruses are distinct from earlier outbreak strains.