Steven L. Salzberg

Efficient oligonucleotide probe selection for pan-genomic tiling arrays (2009)

Phillippy, Adam M, Deng, Xiangyu, Zhang, Wei, Salzberg, Steven L

Background: Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This...

A whole-genome assembly of the domestic cow, Bos taurus (2009)

Zimin, Aleksey V, Delcher, Arthur L, Florea, Liliana, Kelley, David R, Schatz, Michael C, Puiu, Daniela, ...

Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million...

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome (2009)

Langmead, Ben, Trapnell, Cole, Pop, Mihai, Salzberg, Steven L

Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more...

Bioinformatics challenges of new (2009)

Mihai Pop, Steven L. Salzberg

This article was published in an Elsevier journal. The attached copy is furnished to the author for non-commercial research and education use, including for instruction at the author’s institution,...

A Whole-Genome Assembly of the Domestic Cow, Bos taurus (2009)

Zimin, Aleksey V, Delcher, Arthur L., Florea, Liliana, Kelley, David R., Schatz, Michael C., Puiu, Daniela, ...

Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million sequence...

Insignia: a DNA signature search web server for diagnostic assay development (2009)

Phillippy, Adam M., Ayanbule, Kunmi, Edwards, Nathan J., Salzberg, Steven L.

Insignia is a web application for the rapid identification of unique DNA signatures. DNA signatures are distinct nucleotide sequences that can be used to detect the presence of certain organisms and...

OperonDB: a comprehensive database of predicted operons in microbial genomes (2009)

Pertea, Mihaela, Ayanbule, Kunmi, Smedinghoff, Megan, Salzberg, Steven L.

The fast pace of bacterial genome sequencing and the resulting dependence on highly automated annotation methods has driven the development of many genome-wide analysis tools. OperonDB, first...

TopHat: discovering splice junctions with RNA-Seq (2009)

Trapnell, Cole, Pachter, Lior, Salzberg, Steven L.

Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used...

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome (2009)

Ben Langmead, Cole Trapnell, Mihai Pop, Steven L Salzberg

Software. Langmead et al. 2009, Volume 10, Issue 3, Article R25. Open Access.

Beware of mis-assembled genomes. Bioinformatics, Letter to the Editor, doi:10.1093/bioinformatics/bti769 (2008)

Steven L. Salzberg, James A. Yorke

With hundreds of genomes now in GenBank, researchers might be forgiven for assuming that genome sequence data are correct, at least at a large scale. Certainly there might be errors at some small...

Improving Genome Assembly without Sequencing (2008)

Michael C. Schatz, Arthur L. Delcher, Pawel Gajer, Jason Miller, Martin Shumway, Steven L. Salzberg

Assembly of genomes from whole-genome sequencing (WGS) projects is one of the most complex computational problems in genomics. WGS assemblers such as Arachne [1] and Celera Assembler [2] are able to...

Hierarchical Scaffolding With Bambus (2008)

Mihai Pop, Daniel S. Kosack, Steven L. Salzberg

The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is...

A Teaching Strategy for Memory-Based Control. © 1997 Kluwer Academic Publishers. Printed in the Netherlands. (2008)

John W. Sheppard, Steven L. Salzberg

Abstract. Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that genetic...

Identifying bacterial genes and endosymbiont DNA with Glimmer (2008)

Arthur L. Delcher, Kirsten A. Bratke, Edwin C. Powers, Steven L. Salzberg

Motivation: The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archæa and viruses representing hundreds of species. We describe several major changes to the...

Computational Gene Prediction Using Multiple Sources of Evidence (2008)

Jonathan E. Allen, Mihaela Pertea, Steven L. Salzberg

This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program,...

The Genome Assembly Archive: A New Public Resource (2008)

Steven L. Salzberg, Deanna Church, Michael DiCuccio, Eugene Yaschenko, James Ostell

Scientists have dedicated considerable effort to decoding the genomes of an ever-growing list of species, ranging from small viruses, whose genomes may be just a few thousand nucleotides in length,...

Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A (2008)

Salzberg, Steven L, Sommer, Daniel D, Schatz, Michael C, Phillippy, Adam M, Rabinowicz, Pablo D, Tsuge, Seiji, ...

Background: Xanthomonas oryzae pv. oryzae causes bacterial blight of rice (Oryza sativa L.), a major disease that constrains production of this staple crop in many parts of the world. We...

It is time to end the patenting of software (2008)

John Quackenbush, Steven L. Salzberg

Bioinformatics, Editorial.

Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments (2008)

Haas, Brian J, Salzberg, Steven L, Zhu, Wei, Pertea, Mihaela, Allen, Jonathan E, Orvis, Joshua, ...

EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM,...

Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments (2008)

Haas, Brian J., Salzberg, Steven L., Zhu, Wei, Pertea, Mihaela, Allen, Jonathan E., Orvis, Joshua, ...

EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when...

4 (2007)

Arthur L. Delcher, Simon Kasif, Robert D. Fleischmann, Jeremy Peterson, Owen White, Steven L. Salzberg

A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of...

Investigations of the Greedy Heuristic for Classification Tree Induction (2007)

Sreerama K. Murthy, Steven L. Salzberg

Most existing methods for automatic construction of classification trees utilize the greedy heuristic: trees are constructed one node at a time with no looking ahead or backtracking, choosing locally...

Bootstrapping Memory-Based Learning with Genetic Algorithms (2007)

John W. Sheppard, Steven L. Salzberg

A number of special-purpose learning techniques have been developed in recent years to address the problem of learning with delayed reinforcement. This category includes numerous important control...

,SimonKasif 3,OwenWhite (2007)

Arthur L. Delcher, Steven L. Salzberg

The GLIMMER system for microbial gene identification finds ~97–98% of all genes in a genome when compared with published annotation. This paper reports on two new results: (i) significant technical...

Full-length messenger RNA sequences greatly improve genome annotation (2007)

Brian J Haas, Natalia Volfovsky, Christopher D Town, Maxim Troukhan, Nickolai Alexandrov, ...

Background: Annotation of eukaryotic genomes is a complex endeavor that requires the integration of evidence from multiple, often contradictory, sources. With the ever-increasing amount of genome...

Fast algorithms for large-scale genome alignment and comparison (2007)

Arthur L. Delcher, Jane Carlton, Steven L. Salzberg

We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs...

Keynote Talk Genome Paleontology: Discoveries from Complete Genomes (2007)

Steven L. Salzberg, Ph. D

Our group has been developing new algorithms for the analyses of complete genome sequences, and using these algorithms to make a variety of biological discoveries in organisms ranging from bacteria...

A Unified Model Explaining the Offsets of Overlapping and Near-Overlapping Prokaryotic Genes (2007)

Kingsford, Carl, Delcher, Arthur L., Salzberg, Steven L.

Overlapping genes are a common phenomenon. Among sequenced prokaryotes, more than 29% of all annotated genes overlap at least 1 of their 2 flanking genes. We present a unified model for the creation...

A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana (2007)

Pertea, Mihaela, Mount, Stephen M, Salzberg, Steven L

Background: Algorithmic approaches to splice site prediction have relied mainly on the consensus patterns found at the boundaries between protein coding and non-coding regions. However exonic...

Comprehensive DNA Signature Discovery and Validation (2007)

Adam M. Phillippy, Jacquline A. Mason, Kunmi Ayanbule, Daniel D. Sommer, Elisa Taviani, Anwar Huq, ...

DNA signatures are nucleotide sequences that can be used to detect the presence of an organism and to distinguish that organism from all other species. Here we describe Insignia, a new, comprehensive...

Identifying bacterial genes and endosymbiont DNA with Glimmer (2007)

Delcher, Arthur L., Bratke, Kirsten A., Powers, Edwin C., Salzberg, Steven L.

Motivation: The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archæa and viruses representing hundreds of species. We describe several major changes to the...

Comprehensive DNA Signature Discovery and Validation (2007)

Adam Phillippy, Jacquline A. Mason, Kunmi Ayanbule, Daniel D. Sommer, Elisa Taviani, Anwar Huq, ...

DNA signatures are nucleotide sequences that can be used to detect the presence of an organism and to distinguish that organism from all other species. Here we describe Insignia, a new, comprehensive...

Hawkeye: an interactive visual analytics tool for genome assemblies (2007)

Schatz, Michael C, Phillippy, Adam M, Shneiderman, Ben, Salzberg, Steven L

Genome sequencing remains an inexact science, and genome sequences can contain significant errors if they are not carefully examined. Hawkeye is our new visual analytics tool for genome...

Hawkeye: an interactive visual analytics tool for genome assemblies (2007)

Schatz, Michael C., Phillippy, Adam M., Shneiderman, Ben, Salzberg, Steven L.

Genome sequencing remains an inexact science, and genome sequences can contain significant errors if they are not carefully examined. Hawkeye is our new visual analytics tool for genome assemblies,...

Minimus: a fast, lightweight genome assembler (2007)

Sommer, Daniel D, Delcher, Arthur L, Salzberg, Steven L, Pop, Mihai

Background: Genome assemblers have grown very large and complex in response to the need for algorithms to handle the challenges of large whole-genome sequencing projects. Many of the most...

Minimus: a fast, lightweight genome assembler (2007)

Sommer, Daniel D., Delcher, Arthur L., Salzberg, Steven L., Pop, Mihai

Background: Genome assemblers have grown very large and complex in response to the need for algorithms to handle the challenges of large whole-genome sequencing projects. Many of the most common uses...

Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake (2007)

Kingsford, Carleton L, Ayanbule, Kunmi, Salzberg, Steven L

Background: In many prokaryotes, transcription of DNA to RNA is terminated by a thymine-rich stretch of DNA following a hairpin loop. Detecting such Rho-independent transcription terminators...

Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake (2007)

Kingsford, Carleton L., Ayanbule, Kunmi, Salzberg, Steven L.

Background: In many prokaryotes, transcription of DNA to RNA is terminated by a thymine-rich stretch of DNA following a hairpin loop. Detecting such Rho-independent transcription terminators can shed...

Genome re-annotation: a wiki solution? (2007)

Salzberg, Steven L

The annotation of most genomes becomes outdated over time, owing in part to our ever-improving knowledge of genomes and in part to improvements in bioinformatics software. Unfortunately,...

Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis (2007)

Carlton, Jane M., Hirt, Robert P., Silva, Joana C., Delcher, Arthur L., Schatz, Michael, Zhao, Qi, ...

We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the ~160-megabase genome,...

A unified model explaining the offsets of overlapping and near-overlapping prokaryotic genes. (2007)

Kingsford, Carl, Delcher, Arthur L., Salzberg, Steven L.

Overlapping genes are a common phenomenon. Among sequenced prokaryotes, more than 29% of all annotated genes overlap at least 1 of their 2 flanking genes. We present a unified model for the creation...

Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis (2007)

Carlton, Jane M., Hirt, Robert P., Silva, Joana C., Delcher, Arthur L., Schatz, Michael, Zhao, Qi, ...

We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the ~160-megabase...

Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. (2006)

Eisen, Jonathan A, Coyne, Robert S, Wu, Martin, Wu, Dongying, Thiagarajan, Mathangi, Wortman, Jennifer R, ...

The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct...

Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote (2006)

Eisen, Jonathan A, Coyne, Robert S, Wu, Martin, Wu, Dongying, Thiagarajan, Mathangi, Wortman, Jennifer R, ...

The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct...

Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote (2006)

Jonathan A. Eisen, Robert S. Coyne, Martin Wu, Dongying Wu, Mathangi Thiagarajan, Jennifer R. Wortman, ...

The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct...

A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons (2006)

Allen, Jonathan E, Salzberg, Steven L

Background: An important challenge in eukaryotic gene prediction is accurate identification of alternatively spliced exons. Functional transcripts can go undetected in gene expression studies...

JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions (2006)

Allen, Jonathan E, Majoros, William H, Pertea, Mihaela, Salzberg, Steven L

Background: Predicting complete protein-coding genes in human DNA remains a significant challenge. Though a number of promising approaches have been investigated, an ideal suite of tools has...

JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions (2006)

Allen, Jonathan E., Majoros, William H., Pertea, Mihaela, Salzberg, Steven L.

Background: Predicting complete protein-coding genes in human DNA remains a significant challenge. Though a number of promising approaches have been investigated, an ideal suite of tools has yet to...

Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote (2006)

Eisen, Jonathan A., Coyne, Robert S., Wu, Martin, Wu, Dongying, Thiagarajan, Mathangi, Wortman, Jennifer R., ...

The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct...

Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among Recent H3N2 Viruses (2005)

Edward C. Holmes, Elodie Ghedin, Naomi Miller, Jill Taylor, Yiming Bao, Kirsten St. George, ...

Evolution of the flu virus is analyzed via genomic phylogeny; humans are found to provide a reservoir of antigenic variability implicit in flu adaptation and virulence.

Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among Recent H3N2 Viruses (2005)

Edward C. Holmes, Elodie Ghedin, Naomi Miller, Jill Taylor, Yiming Bao, Kirsten St. George, ...

Understanding the evolution of influenza A viruses in humans is important for surveillance and vaccine strain selection. We performed a phylogenetic analysis of 156 complete genomes of human H3N2...

Serendipitous discovery of Wolbachia genomes in multiple Drosophila species (2005)

Salzberg, Steven L, Hotopp, Julie, Delcher, Arthur L, Pop, Mihai, Smith, Douglas R, Eisen, Michael B, ...

Background: The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of...

Efficient decoding algorithms for generalized hidden Markov model gene finders (2005)

Majoros, William H, Pertea, Mihaela, Delcher, Arthur L, Salzberg, Steven L

Background: The Generalized Hidden Markov Model (GHMM) has proven a useful framework for the task of computational gene prediction in eukaryotic genomes, due to its flexibility and...

Efficient decoding algorithms for generalized hidden Markov model gene finders (2005)

Majoros, William H., Pertea, Mihaela, Delcher, Arthur L., Salzberg, Steven L.

Background: The Generalized Hidden Markov Model (GHMM) has proven a useful framework for the task of computational gene prediction in eukaryotic genomes, due to its flexibility and probabilistic...

Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among Recent H3N2 Viruses (2005)

Holmes, Edward C., Ghedin, Elodie, Miller, Naomi, Taylor, Jill, Bao, Yiming, St. George, Kirsten, ...

Understanding the evolution of influenza A viruses in humans is important for surveillance and vaccine strain selection. We performed a phylogenetic analysis of 156 complete genomes of human H3N2...

Serendipitous discovery of Wolbachia genomes in multiple Drosophila species (2005)

Salzberg, Steven L., Dunning Hotopp, Julie C., Delcher, Arthur L., Pop, Mihai, Smith, Douglas R., Eisen, Michael B., ...

Background: The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of...

Sequence-based heuristics for faster annotation of non-coding RNA families (2005)

Zasha Weinberg, Walter L. Ruzzo, Steven L. Salzberg

Bioinformatics, Original Paper, Sequence analysis. Vol. 22 no. 1 2006, pages 35–39. doi:10.1093/bioinformatics/bti743

Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol (2005)

Edward C. Holmes, Elodie Ghedin, Naomi Miller, Jill Taylor, Yiming Bao, Kirsten St. George, ...

Understanding the evolution of influenza A viruses in humans is important for surveillance and vaccine strain selection. We performed a phylogenetic analysis of 156 complete genomes of human H3N2...

JIGSAW: integration of multiple sources of evidence for gene prediction (2005)

Allen, Jonathan E., Salzberg, Steven L.

Motivation: Computational gene finding systems play an important role in finding new human genes, although no systems are yet accurate enough to predict all or even most protein-coding regions...

An empirical analysis of training protocols for probabilistic gene finders (2004)

Majoros, William H, Salzberg, Steven L

Background: Generalized hidden Markov models (GHMMs) appear to be approaching acceptance as a de facto standard for state-of-the-art ab initio gene finding, as evidenced by the recent...

An empirical analysis of training protocols for probabilistic gene finders (2004)

Majoros, William H., Salzberg, Steven L.

Background: Generalized hidden Markov models (GHMMs) appear to be approaching acceptance as a de facto standard for state-of-the-art ab initio gene finding, as evidenced by the recent proliferation...

Genomic Insights into Methanotrophy: The Complete Genome Sequence of Methylococcus capsulatus (Bath) (2004)

Naomi Ward, Øivind Larsen, James Sakwa, Live Bruseth, Hoda Khouri, A. Scott Durkin, ...

Methanotrophs are bacteria that use methane as a sole carbon source. The genome sequence of Methylococcus capsulatus deepens our understanding of methanotroph biology and its relationship to global...

Genomic Insights into Methanotrophy: The Complete Genome Sequence of Methylococcus capsulatus (Bath) (2004)

Naomi Ward, Øivind Larsen, James Sakwa, Live Bruseth, Hoda Khouri, A. Scott Durkin, ...

Methanotrophs are ubiquitous bacteria that can use the greenhouse gas methane as a sole carbon and energy source for growth, thus playing major roles in global carbon cycles, and in particular,...

Genomic Insights into Methanotrophy: The Complete Genome Sequence of Methylococcus capsulatus (Bath) (2004)

Ward, Naomi, Larsen, Øivind, Sakwa, James, Bruseth, Live, Khouri, Hoda, Durkin, A. Scott, ...

Methanotrophs are ubiquitous bacteria that can use the greenhouse gas methane as a sole carbon and energy source for growth, thus playing major roles in global carbon cycles, and in particular,...

Genomic Insights into Methanotrophy: The Complete Genome Sequence of Methylococcus capsulatus (Bath) (2004)

Ward, Naomi, Larsen, Øivind, Sakwa, James, Bruseth, Live, Khouri, Hoda, Durkin, A. Scott, ...

Methanotrophs are ubiquitous bacteria that can use the greenhouse gas methane as a sole carbon and energy source for growth, thus playing major roles in global carbon cycles, and in particular,...

The Genome Assembly Archive: A New Public Resource (2004)

Steven L. Salzberg, Deanna Church, Michael DiCuccio, Eugene Yaschenko, James Ostell

With the genome assembly archive, it is possible to examine the raw data that underlies the DNA sequence in any sequenced genome.

Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura (2004)

Berman, Benjamin P, Pfeiffer, Barret D, Laverty, Todd R, Salzberg, Steven L, Rubin, Gerald M, Eisen, Michael B, ...

Background: The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of...

Versatile and open software for comparing large genomes (2004)

Kurtz, Stefan, Phillippy, Adam, Delcher, Arthur L, Smoot, Michael, Shumway, Martin, Antonescu, Corina, ...

The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical...

Versatile and open software for comparing large genomes (2004)

Kurtz, Stefan, Phillippy, Adam, Delcher, Arthur L., Smoot, Michael, Shumway, Martin, Antonescu, Corina, ...

The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing...

DAGchainer: a tool for mining segmental genome duplications and synteny (2004)

Brian J. Haas, Arthur L. Delcher, Jennifer R. Wortman, Steven L. Salzberg

Given the positions of protein-coding genes along genomic sequence and probability values for protein alignments between genes, DAGchainer identifies chains of gene pairs sharing conserved order...

Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura (2004)

Benjamin P Berman, Barret D Pfeiffer, Todd R Laverty, Steven L Salzberg, Gerald M Rubin, Michael B Eisen, ...

DAGchainer: a tool for mining segmental genome duplications and synteny (2004)

Haas, Brian J., Delcher, Arthur L., Wortman, Jennifer R., Salzberg, Steven L.

Summary: Given the positions of protein-coding genes along genomic sequence and probability values for protein alignments between genes, DAGchainer identifies chains of gene pairs sharing conserved...

DAGchainer: A tool for mining segmental genome duplications and synteny (2004)

Haas, Brian J., Delcher, Arthur L., Wortman, Jennifer R., Salzberg, Steven L.

Given the positions of protein-coding genes along genomic sequence and probability values for protein alignments between genes, DAGchainer identifies chains of gene pairs sharing conserved order...

Automated correction of genome sequence errors (2004)

Gajer, Pawel, Schatz, Michael, Salzberg, Steven L.

By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the...

Computational Gene Prediction Using Multiple Sources of Evidence (2004)

Allen, Jonathan E., Pertea, Mihaela, Salzberg, Steven L.

This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program,...

Hierarchical Scaffolding With Bambus (2004)

Pop, Mihai, Kosack, Daniel S., Salzberg, Steven L.

The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is...

Comparative genome assembly (2004)

Pop, Mihai, Phillippy, Adam, Delcher, Arthur L., Salzberg, Steven L.

One of the most complex and computationally intensive tasks of genome sequence analysis is genome assembly. Even today, few centres have the resources, in both software and hardware, to assemble a...

Automated correction of genome sequence errors. DOI: 10.1093/nar/gkh216 (2003)

Pawel Gajer, Michael Schatz, Steven L. Salzberg

By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the...

Computational Discovery of Internal Micro-Exons (2003)

Volfovsky, Natalia, Haas, Brian J., Salzberg, Steven L.

Very short exons, also known as micro-exons, occur in large numbers in some eukaryotic genomes. Existing annotation tools have a limited ability to recognize these short sequences, which range in...

GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders (2003)

Majoros, William H., Pertea, Mihaela, Antonescu, Corina, Salzberg, Steven L.

We present three programs for ab initio gene prediction in eukaryotes: Exonomy, Unveil and GlimmerM. Exonomy is a 23-state Generalized Hidden Markov Model (GHMM), Unveil is a 283-state standard...

Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies (2003)

Haas, Brian J., Delcher, Arthur L., Mount, Stephen M., Wortman, Jennifer R., Smith, Roger K., Hannick, Linda I., ...

The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble...

Full-length messenger RNA sequences greatly improve genome annotation (2002)

Haas, Brian J, Volfovsky, Natalia, Town, Christopher D, Troukhan, Maxim, Alexandrov, Nickolai, Feldmann, Kenneth A, ...

Background: Annotation of eukaryotic genomes is a complex endeavor that requires the integration of evidence from multiple, often contradictory, sources. With the ever-increasing amount of...

Full-length messenger RNA sequences greatly improve genome annotation (2002)

Haas, Brian J, Volfovsky, Natalia, Town, Christopher D, Troukhan, Maxim, Alexandrov, Nickolai, Feldman, Kenneth A, ...

Background: Annotation of eukaryotic genomes is a complex endeavor that requires the integration of evidence from multiple, often contradictory, sources. With the ever-increasing amount of genome...

Finding a Majority Among N Votes (2002)

Fischer, Michael J., Salzberg, Steven L.

A commonly used technique for fault-tolerant computing is to perform n redundant computations and then vote on the results, choosing the majority value if one exists. We present an algorithm for...

Fast algorithms for large-scale genome alignment and comparison (2002)

Delcher, Arthur L., Phillippy, Adam, Carlton, Jane, Salzberg, Steven L.

We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs...

A clustering method for repeat analysis in DNA sequences (2001)

Volfovsky, Natalia, Haas, Brian J, Salzberg, Steven L

Background: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this...

A clustering method for repeat analysis in DNA sequences (2001)

Volfovsky, Natalia, Haas, Brian J., Salzberg, Steven L.

Background: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data...

GeneSplicer: a new computational method for splice site prediction (2001)

Mihaela Pertea, Xiaoying Lin, Steven L. Salzberg

GeneSplicer is a new, flexible system for detecting splice sites in the genomic DNA of various eukaryotes. The system has been tested successfully using DNA from two reference organisms: the model...

Prediction of operons in microbial genomes (2001)

Maria D. Ermolaeva, Owen White, Steven L. Salzberg

Operon structure is an important organization feature of bacterial genomes. Many sets of genes occur in the same order on multiple genomes; these conserved gene groupings represent candidate operons....

A Probabilistic Method For Identifying Start Codons In Bacterial Genomes (2001)

Baris E. Suzek, Maria D. Ermolaeva, Mark Schreiber, Steven L. Salzberg

As the pace of genome sequencing has accelerated, the need for highly accurate gene prediction systems has grown. Computational systems for identifying genes in prokaryotic genomes have sensitivities...

GeneSplicer: a new computational method for splice site prediction (2001)

Pertea, Mihaela, Lin, Xiaoying, Salzberg, Steven L.

GeneSplicer is a new, flexible system for detecting splice sites in the genomic DNA of various eukaryotes. The system has been tested successfully using DNA from two reference organisms: the model...

Prediction of operons in microbial genomes (2001)

Ermolaeva, Maria D., White, Owen, Salzberg, Steven L.

Operon structure is an important organization feature of bacterial genomes. Many sets of genes occur in the same order on multiple genomes; these conserved gene groupings represent candidate operons....

A probabilistic method for identifying start codons in bacterial genomes (2001)

Suzek, Baris E., Ermolaeva, Maria D., Schreiber, Mark, Salzberg, Steven L.

As the pace of genome sequencing has accelerated, the need for highly accurate gene prediction systems has grown. Computational systems for identifying genes in prokaryotic genomes have sensitivities...

Evidence for symmetric chromosomal inversions around the replication origin in bacteria (2000)

Eisen, Jonathan A, Heidelberg, John F, White, Owen, Salzberg, Steven L

Background: Whole-genome comparisons can provide great insight into many aspects of biology. Until recently, however, comparisons were mainly possible only between distantly related species....

Evidence for symmetric chromosomal inversions around the replication origin in bacteria (2000)

Eisen, Jonathan A., Heidelberg, John F., White, Owen, Salzberg, Steven L.

Background: Whole-genome comparisons can provide great insight into many aspects of biology. Until recently, however, comparisons were mainly possible only between distantly related species. Complete...

An optimized protocol for analysis of EST sequences (2000)

Feng Liang, Ingeborg Holt, Geo Pertea, Svetlana Karamycheva, Steven L. Salzberg, John Quackenbush

The vast body of Expressed Sequence Tag (EST) data in the public databases provide an important resource for comparative and functional genomics studies and an invaluable tool for the annotation of...

Prediction of Transcription Terminators in Bacterial Genomes (2000)

Maria D. Ermolaeva, Hanif G. Khalak, Owen White, Hamilton O. Smith, Steven L. Salzberg

Bacterial genomes are organized into units of expression that are bounded by sites where transcription of DNA into RNA is initiated and terminated. Regulation of gene expression is often...

An optimized protocol for analysis of EST sequences (2000)

Liang, Feng, Holt, Ingeborg, Pertea, Geo, Karamycheva, Svetlana, Salzberg, Steven L., Quackenbush, John

The vast body of Expressed Sequence Tag (EST) data in the public databases provide an important resource for comparative and functional genomics studies and an invaluable tool for the annotation of...

Interpolated Markov models for eukaryotic gene finding (1999)

Steven L. Salzberg, Mihaela Pertea, Arthur L. Delcher, Malcolm J. Gardner, Hervé Tettelin

Computational gene finding research has emphasized the development of gene finders for bacterial and human DNA. This has left genome projects for some small eukaryotes without a system that addresses...

Optimized multiplex PCR: efficiently closing a whole-genome shotgun sequencing project (1999)

Hervé Tettelin, Diana Radune, Simon Kasif, Hoda Khouri, Steven L. Salzberg

A new method has been developed for rapidly closing a large number of gaps in a whole-genome shotgun sequencing project. The method employs multiplex PCR and a novel pooling strategy to minimize the...

Gene Discovery in DNA Sequences (1999)

Steven L. Salzberg

In this article, I describe the two main computational techniques used for discovering new genes: sequence alignment and computational gene finding.

Microbial gene identification using interpolated Markov models (1998)

Steven L. Salzberg, Arthur L. Delcher, Simon Kasif, Owen White

This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this...

Method for Identifying Splice Sites and Translational Start Sites in Eukaryotic mRNA (1997)

Salzberg, Steven L.

This paper describes a new method for determining the consensus sequences that signal the start of translation and the boundaries between exons and introns (donor and acceptor sites) in...

On comparing classifiers: Pitfalls to avoid and a recommended approach (1997)

Steven L. Salzberg

An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very...

A Method for Identifying Splice Sites and Translational Start Sites in Eukaryotic mRNA (1997)

Steven L. Salzberg

This paper describes a new method for determining the consensus sequences that signal the start of translation and the boundaries between exons and introns (donor and acceptor sites) in eukaryotic...

A Teaching Strategy for Memory-Based Control (1997)

John Sheppard, Steven L. Salzberg

Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that genetic algorithms can be...

A method for identifying splice sites and translational start sites in eukaryotic mRNA (1997)

Salzberg, Steven L.

This paper describes a new method for determining the consensus sequences that signal the start of translation and the boundaries between exons and introns (donor and acceptor sites) in eukaryotic...

Methodological Note On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach (1996)

Steven L. Salzberg, Usama Fayyad

An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very...

On Comparing Classifiers: A Critique of Current Research and Methods (1995)

Steven L. Salzberg

An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully,...

Memory Based Learning of Pursuit Games (1995)

John W. Sheppard, Steven L. Salzberg

Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that memory based learning can...

Learning to Catch: Applying Nearest Neighbor Algorithms to Dynamic Control Tasks (1994)

David Aha, Steven L. Salzberg

This paper examines the hypothesis that local weighted variants of k-nearest neighbor algorithms can support dynamic control tasks. We evaluated several k-nearest neighbor (k-NN) algorithms on the...

Learning to Catch: Applying Nearest Neighbor Algorithms to Dynamic Control Tasks (1994)

David W. Aha, Steven L. Salzberg

This paper examines the hypothesis that local weighted variants of k-nearest neighbor algorithms can support dynamic control tasks. We evaluated several k-nearest neighbor (k-NN) algorithms on the...

Learning to Catch: Applying Nearest Neighbor Algorithms to Dynamic Control Tasks (Extended Abstract) (1993)

Steven L. Salzberg, David W. Aha

Dynamic control problems are the subject of much research in machine learning (e.g., Selfridge, Sutton, & Barto, 1985; Sammut, 1990; Sutton,...

GeneSplicer: a new computational method for splice site prediction

Pertea, Mihaela, Lin, Xiaoying, Salzberg, Steven L.

GeneSplicer is a new, flexible system for detecting splice sites in the genomic DNA of various eukaryotes. The system has been tested successfully using DNA from two reference organisms: the model...

Prediction of operons in microbial genomes

Ermolaeva, Maria D., White, Owen, Salzberg, Steven L.

Operon structure is an important organization feature of bacterial genomes. Many sets of genes occur in the same order on multiple genomes; these conserved gene groupings represent candidate operons....

Complete genome sequence of Caulobacter crescentus

Nierman, William C., Feldblyum, Tamara V., Laub, Michael T., Paulsen, Ian T., Nelson, Karen E., Eisen, Jonathan, ...

The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic...

An optimized protocol for analysis of EST sequences

Liang, Feng, Holt, Ingeborg, Pertea, Geo, Karamycheva, Svetlana, Salzberg, Steven L., Quackenbush, John

The vast body of Expressed Sequence Tag (EST) data in the public databases provide an important resource for comparative and functional genomics studies and an invaluable tool for the annotation of...

Fast algorithms for large-scale genome alignment and comparison

Delcher, Arthur L., Phillippy, Adam, Carlton, Jane, Salzberg, Steven L.

We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs...

The Brucella suis genome reveals fundamental similarities between animal and plant pathogens and symbionts

Paulsen, Ian T., Seshadri, Rekha, Nelson, Karen E., Eisen, Jonathan A., Heidelberg, John F., Read, Timothy D., ...

The 3.31-Mb genome sequence of the intracellular pathogen and potential bioterrorism agent, Brucella suis, was determined. Comparison of B. suis with Brucella melitensis has defined a finite set of...

GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders

Majoros, William H., Pertea, Mihaela, Antonescu, Corina, Salzberg, Steven L.

We present three programs for ab initio gene prediction in eukaryotes: Exonomy, Unveil and GlimmerM. Exonomy is a 23-state Generalized Hidden Markov Model (GHMM), Unveil is a 283-state standard...

Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies

Haas, Brian J., Delcher, Arthur L., Mount, Stephen M., Wortman, Jennifer R., Smith, Roger K., Hannick, Linda I., ...

The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble...

Understanding the Adaptation of Halobacterium Species NRC-1 to Its Extreme Environment through Computational Analysis of Its Genome Sequence

Kennedy, Sean P., Ng, Wailap Victor, Salzberg, Steven L., Hood, Leroy, DasSarma, Shiladitya

The genome of the halophilic archaeon Halobacterium sp. NRC-1 and predicted proteome have been analyzed by computational methods and reveal characteristics relevant to life in an extreme environment...

Computational Gene Prediction Using Multiple Sources of Evidence

Allen, Jonathan E., Pertea, Mihaela, Salzberg, Steven L.

This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program,...

Hierarchical Scaffolding With Bambus

Pop, Mihai, Kosack, Daniel S., Salzberg, Steven L.

The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is...

Automated correction of genome sequence errors

Gajer, Pawel, Schatz, Michael, Salzberg, Steven L.

By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the...

Versatile and open software for comparing large genomes

Kurtz, Stefan, Phillippy, Adam, Delcher, Arthur L, Smoot, Michael, Shumway, Martin, Antonescu, Corina, ...

The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes.

Computational Discovery of Internal Micro-Exons

Volfovsky, Natalia, Haas, Brian J., Salzberg, Steven L.

Very short exons, also known as micro-exons, occur in large numbers in some eukaryotic genomes. Existing annotation tools have a limited ability to recognize these short sequences, which range in...

The Genome Assembly Archive: A New Public Resource

Salzberg, Steven L, Church, Deanna, DiCuccio, Michael, Yaschenko, Eugene, Ostell, James

With the genome assembly archive, it is possible to examine the raw data that underlies the DNA sequence in any sequenced genome.

Genomic Insights into Methanotrophy: The Complete Genome Sequence of Methylococcus capsulatus (Bath)

Ward, Naomi, Larsen, Øivind, Sakwa, James, Bruseth, Live, Khouri, Hoda, Durkin, A. Scott, ...

Methanotrophs are ubiquitous bacteria that can use the greenhouse gas methane as a sole carbon and energy source for growth, thus playing major roles in global carbon cycles, and in particular,...

Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

Berman, Benjamin P, Pfeiffer, Barret D, Laverty, Todd R, Salzberg, Steven L, Rubin, Gerald M, Eisen, Michael B, ...

27 predicted gene-regulatory regions in the Drosophila melanogaster genome were analyzed in vivo, confirming 15 active enhancer regions. A comparison with Drosophila pseudoobscura sequences revealed...

Serendipitous discovery of Wolbachia genomes in multiple Drosophila species

Salzberg, Steven L, Hotopp, Julie C Dunning, Delcher, Arthur L, Pop, Mihai, Smith, Douglas R, Eisen, Michael B, ...

By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly:...

Correction: Serendipitous discovery of Wolbachia genomes in multiple Drosophila species

Salzberg, Steven L, Dunning Hotopp, Julie C, Delcher, Arthur L, Pop, Mihai, Smith, Douglas R, Eisen, Michael B, ...

A correction to Serendipitous discovery of Wolbachia genomes in multiple Drosophila species by SL Salzberg, JC Dunning Hotopp, AL Delcher, M Pop, DR Smith, MB Eisen and WC Nelson. Genome Biology...

Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among Recent H3N2 Viruses

Holmes, Edward C, Ghedin, Elodie, Miller, Naomi, Taylor, Jill, Bao, Yiming, St George, Kirsten, ...

Understanding the evolution of influenza A viruses in humans is important for surveillance and vaccine strain selection. We performed a phylogenetic analysis of 156 complete genomes of human H3N2...

Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote

Eisen, Jonathan A, Coyne, Robert S, Wu, Martin, Wu, Dongying, Thiagarajan, Mathangi, Wortman, Jennifer R, ...

The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct...

Genome re-annotation: a wiki solution?

Salzberg, Steven L

Genome annotation currently tends to represent a static snapshot. Routine re-annotation, perhaps using wiki software, would help.

Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake

Kingsford, Carleton L, Ayanbule, Kunmi, Salzberg, Steven L

Using a novel computational method, an extensive collection of predicted Rho-independent transcription terminators is derived from 343 prokaryotes, offering insight into their relationship to DNA...

Hawkeye: an interactive visual analytics tool for genome assemblies

Schatz, Michael C, Phillippy, Adam M, Shneiderman, Ben, Salzberg, Steven L

Hawkeye is a new, freely available visual analytics tool for genome assemblies, designed to aid in identifying and correcting assembly errors.

Comprehensive DNA Signature Discovery and Validation

Phillippy, Adam M, Mason, Jacquline A, Ayanbule, Kunmi, Sommer, Daniel D, Taviani, Elisa, Huq, Anwar, ...

DNA signatures are nucleotide sequences that can be used to detect the presence of an organism and to distinguish that organism from all other species. Here we describe Insignia, a new, comprehensive...

Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments

Haas, Brian J, Salzberg, Steven L, Zhu, Wei, Pertea, Mihaela, Allen, Jonathan E, Orvis, Joshua, ...

EVidenceModeler (EVM) is an automated annotation tool that predicts protein-coding regions, alternatively spliced transcripts and untranslated regions of eukaryotic genes.

Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads

Salzberg, Steven L., Sommer, Daniel D., Puiu, Daniela, Lee, Vincent T.

Recent improvements in technology have made DNA sequencing dramatically faster and more efficient than ever before. The new technologies produce highly accurate sequences, but one drawback is that...

Re-Assembly of the Genome of Francisella tularensis Subsp. holarctica OSU18

Puiu, Daniela, Salzberg, Steven L.

Francisella tularensis is a highly infectious human intracellular pathogen that is the causative agent of tularemia. It occurs in several major subtypes, including the live vaccine strain holarctica...

A whole-genome assembly of the domestic cow, Bos taurus

Zimin, Aleksey V, Delcher, Arthur L, Florea, Liliana, Kelley, David R, Schatz, Michael C, Puiu, Daniela, ...

A cow whole-genome assembly of 2.86 billion base pairs that closes gaps and corrects previously-described inversions and deletions as well as describing a portion of the Y chromosome.

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

Langmead, Ben, Trapnell, Cole, Pop, Mihai, Salzberg, Steven L

Bowtie: a new ultrafast memory-efficient tool for the alignment of short DNA sequence reads to large genomes.

TopHat: discovering splice junctions with RNA-Seq

Trapnell, Cole, Pachter, Lior, Salzberg, Steven L.

Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used...

OperonDB: a comprehensive database of predicted operons in microbial genomes

Pertea, Mihaela, Ayanbule, Kunmi, Smedinghoff, Megan, Salzberg, Steven L.

The fast pace of bacterial genome sequencing and the resulting dependence on highly automated annotation methods has driven the development of many genome-wide analysis tools. OperonDB, first...

The Complete Genome Sequence of Bacillus anthracis Ames “Ancestor”

Ravel, Jacques, Jiang, Lingxia, Stanley, Scott T., Wilson, Mark R., Decker, R. Scott, Read, Timothy D., ...

The pathogenic bacterium Bacillus anthracis has become the subject of intense study as a result of its use in a bioterrorism attack in the United States in September and October 2001. Previous...

Insignia: a DNA signature search web server for diagnostic assay development

Phillippy, Adam M., Ayanbule, Kunmi, Edwards, Nathan J., Salzberg, Steven L.

Insignia is a web application for the rapid identification of unique DNA signatures. DNA signatures are distinct nucleotide sequences that can be used to detect the presence of certain organisms and...

Genome Sequence of the Wolbachia Endosymbiont of Culex quinquefasciatus JHB

Salzberg, Steven L., Puiu, Daniela, Sommer, Daniel D., Nene, Vish, Lee, Norman H.

Wolbachia species are endosymbionts of a wide range of invertebrates, including mosquitoes, fruit flies, and nematodes. The wPip strains can cause cytoplasmic incompatibility in some strains of the...