Ewan Birney

Publication List Details

Period

1992 - 2009

Number

149

Reactome - a knowledgebase of human biological pathways (2009)

Bijay Jassal, Esther E. Schmidt, Guanming Wu, Imre Vastrik, David Croft, Bernard De Bono, ...

Reactome (http://www.reactome.org) is an expert-authored, peer-reviewed knowledgebase of human reactions and pathways that functions as a data mining resource and...

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes (2009)

Pruitt, Kim D., Harrow, Jennifer, Harte, Rachel A., Wallin, Craig, Diekhans, Mark, Maglott, Donna R., ...

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can...

VectorBase: a data resource for invertebrate vector genomics (2009)

Lawson, Daniel, Arensburger, Peter, Atkinson, Peter, Besansky, Nora J., Bruggner, Robert V., Butler, Ryan, ...

VectorBase (http://www.vectorbase.org) is an NIAID-funded Bioinformatic Resource Center focused on invertebrate vectors of human pathogens. VectorBase annotates and curates vector genomes providing a...

Petabyte-scale innovations at the European Nucleotide Archive (2009)

Cochrane, Guy, Akhtar, Ruth, Bonfield, James, Bower, Lawrence, Demiralp, Fehmi, Faruque, Nadeem, ...

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence...

MAPU 2.0: high-accuracy proteomes mapped to genomes (2009)

Gnad, Florian, Oroshi, Mario, Birney, Ewan, Mann, Matthias

The MAPU 2.0 database contains proteomes of organelles, tissues and cell types measured by mass spectrometry (MS)-based proteomics. In contrast to other databases it is meant to contain a limited...

Reactome knowledgebase of human biological pathways and processes (2009)

Matthews, Lisa, Gopinath, Gopal, Gillespie, Marc, Caudy, Michael, Croft, David, De Bono, Bernard, ...

Reactome (http://www.reactome.org) is an expert-authored, peer-reviewed knowledgebase of human reactions and pathways that functions as a data mining resource and electronic textbook. Its current...

EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates (2009)

Vilella, Albert J., Severin, Jessica, Ureta-Vidal, Abel, Heng, Li, Durbin, Richard, Birney, Ewan

We have developed a comprehensive gene orientated phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline to handle clustering, multiple alignment, and tree generation,...

Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment (2009)

Paten, Benedict, Herrero, Javier, Beal, Kathryn, Birney, Ewan

Motivation: Multiple sequence alignment is a cornerstone of comparative genomics. Much work has been done to improve methods for this task, particularly for the alignment of small sequences, and...

Integrating biological data – the Distributed Annotation System (2008)

Jenkinson, Andrew M, Albrecht, Mario, Birney, Ewan, Blankenburg, Hagen, Down, Thomas, Finn, Robert D, ...

Background: The Distributed Annotation System (DAS) is a widely adopted protocol for dynamically integrating a wide range of biological data from geographically diverse sources. DAS continues...

software engineering, source (2008)

Ewan Birney

joined the EBI as one of the founding investigators for the Ensembl project. He is now a senior scientist at the EBI and runs both research and database projects. Michele Clamp joined the Sanger...

The International Protein Index: An integrated database for proteomics experiments (2008)

Paul J. Kersey, Jorge Duarte, Allyson Williams, Youla Karavidopoulou, Ewan Birney, Rolf Apweiler

Despite the complete determination of the genome sequence of several higher eukaryotes, their proteomes remain relatively poorly defined. Information about proteins identified by different...

Optimized design and assessment of whole genome tiling arrays. Bioinformatics, Vol. 23, ISMB/ECCB 2007, pages i195–i204, doi:10.1093/bioinformatics/btm200 (2008)

Stefan Gräf, Stefan Kurtz, Martijn A. Huynen, Ewan Birney, Henk Stunnenberg, ...

Motivation: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important...

Gene finding in the chicken genome. BMC Bioinformatics (2008)

Eduardo Eyras, Alexandre Reymond, Robert Castelo, Jacqueline M Bye, Francisco Camara, Paul Flicek, ...

Genome analysis of the platypus reveals unique signatures of evolution (2008)

Warren, Wesley C., Hillier, LaDeana W., Marshall Graves, Jennifer A., Birney, Ewan, Ponting, Chris P., Grützner, Frank, ...

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a...

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs (2008)

Zerbino, Daniel R., Birney, Ewan

We have developed a new set of algorithms, collectively called “Velvet,” to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short...

Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database (2008)

Cochrane, Guy, Akhtar, Ruth, Aldebert, Philippe, Althorpe, Nicola, Baldwin, Alastair, Bates, Kirsty, ...

The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth...

The HGNC Database in 2008: a resource for the human genome (2008)

Bruford, Elspeth A., Lush, Michael J., Wright, Mathew W., Sneddon, Tam P., Povey, Sue, Birney, Ewan

The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique and ideally meaningful name and symbol to every human gene. The HGNC database currently comprises over 24 000 public records...

Confounding between recombination and selection, and the Ped/Pop method for detecting selection (2008)

O’Reilly, Paul F., Birney, Ewan, Balding, David J.

In recent years, there have been major developments of population genetics methods to estimate both rates of recombination and levels of natural selection. However, genomic variants subject to...

Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs (2008)

Paten, Benedict, Herrero, Javier, Beal, Kathryn, Fitzgerald, Stephen, Birney, Ewan

Pairwise whole-genome alignment involves the creation of a homology map, capable of performing a near complete transformation of one genome into another. For multiple genomes this problem is...

Genome-wide nucleotide-level mammalian ancestor reconstruction (2008)

Paten, Benedict, Herrero, Javier, Fitzgerald, Stephen, Beal, Kathryn, Flicek, Paul, Holmes, Ian, ...

Recently attention has been turned to the problem of reconstructing complete ancestral sequences from large multiple alignments. Successful generation of these genome-wide reconstructions will...

An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs) (2008)

Rakyan, Vardhman K., Down, Thomas A., Thorne, Natalie P., Flicek, Paul, Kulesha, Eugene, Gräf, Stefan, ...

We report a novel resource (methylation profiles of DNA, or mPod) for human genome-wide tissue-specific DNA methylation profiles. mPod consists of three fully integrated parts, genome-wide DNA...

Pfam: Multiple Sequence Alignments and HMM-Profiles of Protein Domains (2007)

Sean R. Eddy, Ewan Birney, Alex Bateman, Richard Durbin

Pfam contains multiple alignments and hidden Markov model based profiles (HMM-profiles) of complete protein domains. The definition of domain boundaries, family members, and alignment is done...

Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. (2007)

Alex Bateman, Ewan Birney, Richard Durbin, Sean R. Eddy, Robert D. Finn, Erik L. L

Pfam is a collection of multiple alignments and profile hidden Markov models of protein domain families. Release 3.1 is a major update of the Pfam database and contains 1313 families which are...

The Bioperl Toolkit: Perl Modules for the Life Sciences (2007)

Jason E. Stajich, David Block, Kris Boulez, Steven E. Brenner, Lincoln D. Stein, Elia Stupka, ...

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of...

Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins (2007)

Alex Bateman, Ewan Birney, Richard Durbin, Sean R. Eddy, Robert D. Finn

Pfam is a collection of multiple alignments and profile hidden Markov models of protein domain families. Release 3.1 is a major update of the Pfam database and contains 1313 families which are...

The Pfam Protein Families Database (2007)

Alex Bateman, Ewan Birney, Richard Durbin, Sean R. Eddy

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the WWW in the UK at

Reactome - a knowledgebase of human biological pathways (2007)

Peter D'Eustachio, David Croft, Bernard De Bono, Gopal Gopinath, Marc Gillespie, Bijay Jassal, ...

Pathway curation is a powerful tool for systematically associating gene products with functions. Reactome (www.reactome.org) is a manually curated human pathway knowledgebase describing a wide range...

In Vivo Validation of a Computationally Predicted Conserved Ath5 Target Gene Set (2007)

Filippo Del Bene, Laurence Ettwiller, Dorota Skowronska-Krawczyk, Herwig Baier, Jean-Marc Matter, Ewan Birney, ...

So far, the computational identification of transcription factor binding sites is hampered by the complexity of vertebrate genomes. Here we present an in silico procedure to predict target sites of a...

In vivo Validation of a Computationally Predicted Conserved Ath5 Target Gene Set (2007)

Filippo Del Bene, Laurence Ettwiller, Dorota Skowronska-Krawczyk, Herwig Baier, Jean-Marc Matter, Ewan Birney, ...

So far the computational identification of transcription factor binding sites was hampered by the complexity of vertebrate genomes. Here we present an in silico procedure to predict target sites of a...

Optimized design and assessment of whole genome tiling arrays (2007)

Gräf, Stefan, Nielsen, Fiona G. G., Kurtz, Stefan, Huynen, Martijn A., Birney, Ewan, Stunnenberg, Henk, ...

Motivation: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important...

The landscape of histone modifications across 1% of the human genome in five human cell lines (2007)

Koch, Christoph M., Andrews, Robert M., Flicek, Paul, Dillon, Shane C., Karaöz, Ulas, Clelland, Gayle K., ...

We generated high-resolution maps of histone H3 lysine 9/14 acetylation (H3ac), histone H4 lysine 5/8/12/16 acetylation (H4ac), and histone H3 at lysine 4 mono-, di-, and trimethylation (H3K4me1,...

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome (2007)

Margulies, Elliott H., Cooper, Gregory M., Asimenos, George, Thomas, Daryl J., Dewey, Colin N., Siepel, Adam, ...

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation,...

Reactome: a knowledge base of biologic pathways and processes (2007)

Vastrik, Imre, D'Eustachio, Peter, Schmidt, Esther, Joshi-Tope, Geeta, Gopinath, Gopal, Croft, David, ...

Reactome (http://www.reactome.org), an online curated resource for human pathway data, provides infrastructure for computation across the biologic reaction network. We use Reactome to infer...

Update of the Anopheles gambiae PEST genome assembly (2007)

Sharakhova, Maria V, Hammond, Martin P, Lobo, Neil F, Krzywinski, Jaroslaw, Unger, Maria F, Hillenmeyer, Maureen E, ...

Background: The genome of Anopheles gambiae, the major vector of malaria, was sequenced and assembled in 2002. This initial genome assembly and analysis made available to the scientific...

Patterns of somatic mutation in human cancer genomes (2007)

Greenman, Christopher, Stephens, Philip, Smith, Raffaella, Dalgliesh, Gillian L., Hunter, Christopher, Bignell, Graham, ...

Cancers arise owing to mutations in a subset of genes that confer growth advantage. The availability of the human genome sequence led us to propose that systematic resequencing of cancer genomes for...

Reactome: An integrated expert model of human molecular processes and access toolkit (2007)

De Bono, Bernard, Vastrik, Imre, D'Eustachio, Peter, Schmidt, Esther, Gopinath, Gopal, Croft, David, ...

The behaviour of pervasive molecular processes in human biology can be studied through the large-scale modeling of the molecular events that define them. Constructing detailed models of such extent...

The implications of alternative splicing in the ENCODE protein complement (2007)

Tress, Michael L., Martelli, Pier L., Frankish, Adam, Reeves, Gabrielle A., Wesselink, Jan J., Yeats, Corin, ...

Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological...

Identification of novel peptide hormones in the human proteome by hidden Markov model screening (2007)

Mirabeau, Olivier, Perlas, Emerald, Severini, Cinzia, Audero, Enrica, Gascuel, Olivier, Possenti, Roberta, ...

Peptide hormones are small, processed, and secreted peptides that signal via membrane receptors and play critical roles in normal and pathological physiology. The search for novel peptide hormones...

Genome browsing with Ensembl: a practical overview (2007)

Spudich, Giulietta, Fernández-Suárez, Xosé M., Birney, Ewan

A wealth of gene information is accruing in public databases. Genome browsers such as Ensembl are needed to organize and depict this information in the context of the genome. Ensembl provides an open...

Estimating the Neutral Rate of Nucleotide Substitution Using Introns (2007)

Hoffman, Michael M., Birney, Ewan

Evolutionary biologists frequently rely on estimates of the neutral rate of evolution when characterizing the selective pressure on protein-coding genes. We introduce a new method to estimate this...

VectorBase: a home for invertebrate vectors of human pathogens (2007)

Lawson, Daniel, Arensburger, Peter, Atkinson, Peter, Besansky, Nora J., Bruggner, Robert V., Butler, Ryan, ...

VectorBase (http://www.vectorbase.org/) is a web-accessible data repository for information about invertebrate vectors of human pathogens. VectorBase annotates and maintains vector genomes providing...

EGASP: the human ENCODE Genome Annotation Assessment Project (2006)

Guigó, Roderic, Flicek, Paul, Abril, Josep F, Reymond, Alexandre, Lagarde, Julien, Denoeud, France, ...

Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence....

EGASP: the human ENCODE Genome Annotation Assessment Project (2006)

Guigó Serra, Roderic, Flicek, Paul, Abril, Josep F., Reymond, Alexandre, Lagarde, Julien, Denoeud, France, ...

Background: Non-long terminal repeat (non-LTR) retrotransposons have contributed to shaping the structure and function of genomes. In silico and experimental approaches have been used to identify the...

VectorBase: a home for invertebrate vectors of human pathogens (2006)

Lawson, Daniel, Arensburger, Peter, Atkinson, Peter, Besansky, Nora J., Bruggner, Robert V., Butler, Ryan, ...

VectorBase (http://www.vectorbase.org/) is a web-accessible data repository for information about invertebrate vectors of human pathogens. VectorBase annotates and maintains vector genomes providing...

The discovery, positioning and verification of a set of transcription-associated motifs in vertebrates (2005)

Ettwiller, Laurence, Paten, Benedict, Souren, Marcel, Loosli, Felix, Wittbrodt, Jochen, Birney, Ewan

We have developed several new methods to investigate transcriptional motifs in vertebrates. We developed a specific alignment tool appropriate for regions involved in transcription control,...

Gene finding in the chicken genome (2005)

Eyras, Eduardo, Reymond, Alexandre, Castelo, Robert, Bye, Jacqueline M, Camara, Francisco, Flicek, Paul, ...

Background: Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive, and cost effective gene prediction remains problematic. This is...

Automated generation of heuristics for biological sequence comparison (2005)

Slater, Guy, Birney, Ewan

Background: Exhaustive methods of sequence alignment are accurate but slow, whereas heuristic approaches run quickly, but their complexity makes them more difficult to implement. We introduce...

Automated generation of heuristics for biological sequence comparison (2005)

Guy St. C. Slater, Ewan Birney

BMC Bioinformatics

Transcriptome analysis for the chicken based on 19,626 finished cDNA sequences and 485,337 expressed sequence tags (2005)

Hubbard, Simon J., Grafham, Darren V., Beattie, Kevin J., Overton, Ian M., McLaren, Stuart R., Croning, Michael D.R., ...

We present an analysis of the chicken (Gallus gallus) transcriptome based on the full insert sequences for 19,626 cDNAs, combined with 485,337 EST sequences. The cDNA data set has been functionally...

Needed for completion of the human genome: hypothesis driven experiments and biologically realistic mathematical models (2004)

Guigo, Roderic, Birney, Ewan, Brent, Michael, Dermitzakis, Emmanouil, Pachter, Lior, Crollius, Hugues Roest, ...

With the sponsorship of "Fundacio La Caixa" we met in Barcelona, November 21st and 22nd, to analyze the reasons why, after the completion of the human genome sequence, the identification all...

The Human Genome (2004)

Birney, Ewan

Graphical representation of the human genome sequence across the 23 pairs of human chromosomes. This is companion material to vol. 431, no. 7011 of the journal Nature (21...

The Pfam protein families database (2004)

Alex Bateman, Ewan Birney, Lorenzo Cerruti, Richard Durbin, Laurence Etwiller, Sean R. Eddy, ...

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web in the UK at

EnsMart: A Generic System for Fast and Flexible Access to Biological Data (2004)

Kasprzyk, Arek, Keefe, Damian, Smedley, Damian, London, Darin, Spooner, William, Melsopp, Craig, ...

The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools....

Biological database design and implementation (2004)

Birney, Ewan, Clamp, Michele

We present our experience of building biological databases. Such databases have most aspects in common with other complex databases in other fields. We do not believe that biological data are that...

An Overview of Ensembl (2004)

Birney, Ewan, Andrews, T. Daniel, Bevan, Paul, Caccamo, Mario, Chen, Yuan, Clarke, Laura, ...

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of...

Comparison of Human Chromosome 21 Conserved Nongenic Sequences (CNGs) With the Mouse and Dog Genomes Shows That Their Selective Constraint Is Independent of Their Genic Environment (2004)

Dermitzakis, Emmanouil T., Kirkness, Ewen, Schwarz, Scott, Birney, Ewan, Reymond, Alexandre, Antonarakis, Stylianos E.

The analysis of conservation between the human and mouse genomes resulted in the identification of a large number of conserved nongenic sequences (CNGs). The functional significance of this nongenic...

Transcriptome analysis for the chicken based on 19,626 finished cDNA sequences and 485,337 expressed sequence tags (2004)

Hubbard, Simon J., Grafham, Darren V., Beattie, Kevin J., Overton, Ian M., McLaren, Stuart R., Croning, Michael D.R., ...

We present an analysis of the chicken (Gallus gallus) transcriptome based on the full insert sequences for 19,626 cDNAs, combined with 485,337 EST sequences. The cDNA data set has been functionally...

The Ensembl Core Software Libraries (2004)

Stabenau, Arne, McVicker, Graham, Melsopp, Craig, Proctor, Glenn, Clamp, Michele, Birney, Ewan

Systems for managing genomic data must store a vast quantity of information. Ensembl stores these data in several MySQL databases. The core software libraries provide a practical and effective means...

Sockeye: A 3D Environment for Comparative Genomics (2004)

Montgomery, Stephen B., Astakhova, Tamara, Bilenky, Mikhail, Birney, Ewan, Fu, Tony, Hassel, Maik, ...

Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases,...

GeneWise and Genomewise (2004)

Birney, Ewan, Clamp, Michele, Durbin, Richard

We present two algorithms in this paper: GeneWise, which predicts gene structure using similar protein sequences, and Genomewise, which provides a gene structure final parse across cDNA- and...

The European Bioinformatics Institute's data resources (2003)

Brooksbank, Catherine, Camon, Evelyn, Harris, Midori A., Magrane, Michele, Martin, Maria Jesus, Mulder, Nicola, ...

As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics...

Discovering Novel cis-Regulatory Motifs Using Functional Networks (2003)

Ettwiller, Laurence M., Rung, Johan, Birney, Ewan

We combined functional information such as protein–protein interactions or metabolic networks with genome information in Saccharomyces cerevisiae to predict cis-regulatory motifs in the upstream...

The European Bioinformatics Institute’s data resources, DOI: 10.1093/nar/gkg066 (2002)

Catherine Brooksbank, Evelyn Camon, Midori A. Harris, Michele Magrane, Maria Jesus Martin, Nicola Mulder, ...

As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics...

The Pfam Protein Families Database (2002)

Alex Bateman, Ewan Birney, Lorenzo Cerruti, Richard Durbin, Laurence Etwiller, Sean R. Eddy, ...

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web in the UK at http://www.sanger.ac.uk/Software/Pfam/, in...

The Pfam Protein Families Database (2002)

Bateman, Alex, Birney, Ewan, Cerruti, Lorenzo, Durbin, Richard, Etwiller, Laurence, Eddy, Sean R., ...

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web in the UK at...

The Pfam Protein Families Database (2000)

Alex Bateman, Ewan Birney, Richard Durbin, Sean R. Eddy, Kevin L. Howe, Erik Sonnhammer, ...

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web in the UK at http://www.sanger.ac.uk/Software/Pfam/, in...

ProtEST: protein multiple sequence alignments from expressed sequence tags (2000)

Cuff, James A., Birney, Ewan, Clamp, Michele E., Barton, Geoffrey J.

Motivation: An automatic sequence searching method (ProtEST) is described which constructs multiple protein sequence alignments from protein sequences and translated expressed sequence tags (ESTs)....

The Pfam Protein Families Database (2000)

Bateman, Alex, Birney, Ewan, Durbin, Richard, Eddy, Sean R., Howe, Kevin L., Sonnhammer, Erik L. L.

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the WWW in the UK at...

Pfam: Multiple Sequence Alignments and HMM-Profiles of Protein Domains (1998)

Sean R. Eddy, Ewan Birney, Alex Bateman, Richard Durbin

Pfam contains multiple alignments and hidden Markov model based profiles (HMM-profiles) of complete protein domains. The definition of domain boundaries, family members and alignment is done...

Studies in Probabilistic Sequence Alignment and Evolution (1998)

Ian Holmes, Ewan Birney, Bill Bruno, Richard Durbin, Sean Eddy, David Haussler, ...

The complete sequencing of whole genomes presents opportunities for detailed study of molecular evolution. This thesis combines theoretical developments of Bayesian approaches in bioinformatics with...

Analysis of the RNA-recognition motif and RS and RGG domains: conservation in metazoan pre-mRNA splicing factors (1993)

Birney, Ewan, Kumar, Sanjay, Krainer, Adrian R.

We present a systematic analysis of sequence motifs found in metazoan protein factors involved in constitutive pre-mRNA splicing and in alternative splicing regulation. Using profile analysis we...

The Pfam Protein Families Database

Bateman, Alex, Birney, Ewan, Cerruti, Lorenzo, Durbin, Richard, Etwiller, Laurence, Eddy, Sean R., ...

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web in the UK at http://www.sanger.ac.uk/Software/Pfam/, in...

The Pfam Protein Families Database

Bateman, Alex, Birney, Ewan, Durbin, Richard, Eddy, Sean R., Howe, Kevin L., Sonnhammer, Erik L. L.

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the WWW in the UK at http://www.sanger.ac.uk/Software/Pfam/ , in Sweden at...

The European Bioinformatics Institute's data resources

Brooksbank, Catherine, Camon, Evelyn, Harris, Midori A., Magrane, Michele, Martin, Maria Jesus, Mulder, Nicola, ...

As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics...

The Bioperl Toolkit: Perl Modules for the Life Sciences

Stajich, Jason E., Block, David, Boulez, Kris, Brenner, Steven E., Chervitz, Stephen A., Dagdigian, Chris, ...

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of...

Comparative Analysis of Noncoding Regions of 77 Orthologous Mouse and Human Gene Pairs

Jareborg, Niclas, Birney, Ewan, Durbin, Richard

A data set of 77 genomic mouse/human gene pairs has been compiled from the EMBL nucleotide database, and their corresponding features determined. This set was used to analyze the degree of...

Using GeneWise in the Drosophila Annotation Experiment

Birney, Ewan, Durbin, Richard

The GeneWise method for combining gene prediction and homology searches was applied to the 2.9-Mb region from Drosophila melanogaster. The results from the Genome Annotation Assessment Project (GASP)...

EnsMart: A Generic System for Fast and Flexible Access to Biological Data

Kasprzyk, Arek, Keefe, Damian, Smedley, Damian, London, Darin, Spooner, William, Melsopp, Craig, ...

The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools....

Discovering Novel cis-Regulatory Motifs Using Functional Networks

Ettwiller, Laurence M., Rung, Johan, Birney, Ewan

We combined functional information such as protein–protein interactions or metabolic networks with genome information in Saccharomyces cerevisiae to predict cis-regulatory motifs in the upstream...

Comparison of Human Chromosome 21 Conserved Nongenic Sequences (CNGs) With the Mouse and Dog Genomes Shows That Their Selective Constraint Is Independent of Their Genic Environment

Dermitzakis, Emmanouil T., Kirkness, Ewen, Schwarz, Scott, Birney, Ewan, Reymond, Alexandre, Antonarakis, Stylianos E.

The analysis of conservation between the human and mouse genomes resulted in the identification of a large number of conserved nongenic sequences (CNGs). The functional significance of this nongenic...

An Overview of Ensembl

Birney, Ewan, Andrews, T. Daniel, Bevan, Paul, Caccamo, Mario, Chen, Yuan, Clarke, Laura, ...

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of...

The Ensembl Core Software Libraries

Stabenau, Arne, McVicker, Graham, Melsopp, Craig, Proctor, Glenn, Clamp, Michele, Birney, Ewan

Systems for managing genomic data must store a vast quantity of information. Ensembl stores these data in several MySQL databases. The core software libraries provide a practical and effective means...

Sockeye: A 3D Environment for Comparative Genomics

Montgomery, Stephen B., Astakhova, Tamara, Bilenky, Mikhail, Birney, Ewan, Fu, Tony, Hassel, Maik, ...

Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases,...

GeneWise and Genomewise

Birney, Ewan, Clamp, Michele, Durbin, Richard

We present two algorithms in this paper: GeneWise, which predicts gene structure using similar protein sequences, and Genomewise, which provides a gene structure final parse across cDNA- and...

Transcriptome analysis for the chicken based on 19,626 finished cDNA sequences and 485,337 expressed sequence tags

Hubbard, Simon J., Grafham, Darren V., Beattie, Kevin J., Overton, Ian M., McLaren, Stuart R., Croning, Michael D.R., ...

We present an analysis of the chicken (Gallus gallus) transcriptome based on the full insert sequences for 19,626 cDNAs, combined with 485,337 EST sequences. The cDNA data set has been functionally...

A survey of homozygous deletions in human cancer genomes

Cox, Charles, Bignell, Graham, Greenman, Chris, Stabenau, Arne, Warren, William, Stephens, Philip, ...

Homozygous deletions of recessive cancer genes and fragile sites are known to occur in human cancers. We identified 281 homozygous deletions in 636 cancer cell lines. Of these deletions, 86 were...

The discovery, positioning and verification of a set of transcription-associated motifs in vertebrates

Ettwiller, Laurence, Paten, Benedict, Souren, Marcel, Loosli, Felix, Wittbrodt, Jochen, Birney, Ewan

Several new methods and a specific alignment tool for investigating transcriptional motifs in vertebrates were used to identify all possible 12mers involved in transcription. Active instances of...

The Pfam Protein Families Database

Bateman, Alex, Birney, Ewan, Cerruti, Lorenzo, Durbin, Richard, Etwiller, Laurence, Eddy, Sean R., ...

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web in the UK at http://www.sanger.ac.uk/Software/Pfam/, in...

The Pfam Protein Families Database

Bateman, Alex, Birney, Ewan, Durbin, Richard, Eddy, Sean R., Howe, Kevin L., Sonnhammer, Erik L. L.

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the WWW in the UK at http://www.sanger.ac.uk/Software/Pfam/ , in Sweden at...

The European Bioinformatics Institute's data resources

Brooksbank, Catherine, Camon, Evelyn, Harris, Midori A., Magrane, Michele, Martin, Maria Jesus, Mulder, Nicola, ...

As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics...

The Bioperl Toolkit: Perl Modules for the Life Sciences

Stajich, Jason E., Block, David, Boulez, Kris, Brenner, Steven E., Chervitz, Stephen A., Dagdigian, Chris, ...

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of...

Comparative Analysis of Noncoding Regions of 77 Orthologous Mouse and Human Gene Pairs

Jareborg, Niclas, Birney, Ewan, Durbin, Richard

A data set of 77 genomic mouse/human gene pairs has been compiled from the EMBL nucleotide database, and their corresponding features determined. This set was used to analyze the degree of...

Using GeneWise in the Drosophila Annotation Experiment

Birney, Ewan, Durbin, Richard

The GeneWise method for combining gene prediction and homology searches was applied to the 2.9-Mb region from Drosophila melanogaster. The results from the Genome Annotation Assessment Project (GASP)...

EnsMart: A Generic System for Fast and Flexible Access to Biological Data

Kasprzyk, Arek, Keefe, Damian, Smedley, Damian, London, Darin, Spooner, William, Melsopp, Craig, ...

The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools....

VectorBase: a home for invertebrate vectors of human pathogens

Lawson, Daniel, Arensburger, Peter, Atkinson, Peter, Besansky, Nora J., Bruggner, Robert V., Butler, Ryan, ...

VectorBase (http://www.vectorbase.org) is a web-accessible data repository for information about invertebrate vectors of human pathogens. VectorBase annotates and maintains vector genomes providing an integrated resource for...

Update of the Anopheles gambiae PEST genome assembly

Sharakhova, Maria V, Hammond, Martin P, Lobo, Neil F, Krzywinski, Jaroslaw, Unger, Maria F, Hillenmeyer, Maureen E, ...

An update on the Anopheles gambiae PEST genome assembly places about 33% of previously unmapped sequences on the chromosomes.

Reactome: a knowledge base of biologic pathways and processes

Vastrik, Imre, D'Eustachio, Peter, Schmidt, Esther, Joshi-Tope, Geeta, Gopinath, Gopal, Croft, David, ...

Reactome, an online curated resource for human pathway data, can be used to infer equivalent reactions in non-human species and as a tool to aid in the interpretation of microarrays and other...

Identification of novel peptide hormones in the human proteome by hidden Markov model screening

Mirabeau, Olivier, Perlas, Emerald, Severini, Cinzia, Audero, Enrica, Gascuel, Olivier, Possenti, Roberta, ...

Peptide hormones are small, processed, and secreted peptides that signal via membrane receptors and play critical roles in normal and pathological physiology. The search for novel peptide hormones...

The implications of alternative splicing in the ENCODE protein complement

Tress, Michael L., Martelli, Pier Luigi, Frankish, Adam, Reeves, Gabrielle A., Wesselink, Jan Jaap, Yeats, Corin, ...

Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological...

In Vivo Validation of a Computationally Predicted Conserved Ath5 Target Gene Set

Del Bene, Filippo, Ettwiller, Laurence, Skowronska-Krawczyk, Dorota, Baier, Herwig, Matter, Jean-Marc, Birney, Ewan, ...

So far, the computational identification of transcription factor binding sites is hampered by the complexity of vertebrate genomes. Here we present an in silico procedure to predict target sites of a...

The landscape of histone modifications across 1% of the human genome in five human cell lines

Koch, Christoph M., Andrews, Robert M., Flicek, Paul, Dillon, Shane C., Karaöz, Ulaş, Clelland, Gayle K., ...

We generated high-resolution maps of histone H3 lysine 9/14 acetylation (H3ac), histone H4 lysine 5/8/12/16 acetylation (H4ac), and histone H3 at lysine 4 mono-, di-, and trimethylation (H3K4me1,...

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome

Margulies, Elliott H., Cooper, Gregory M., Asimenos, George, Thomas, Daryl J., Dewey, Colin N., Siepel, Adam, ...

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation,...

The HGNC Database in 2008: a resource for the human genome

Bruford, Elspeth A., Lush, Michael J., Wright, Mathew W., Sneddon, Tam P., Povey, Sue, Birney, Ewan

The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique and ideally meaningful name and symbol to every human gene. The HGNC database currently comprises over 24 000 public records...

Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database

Cochrane, Guy, Akhtar, Ruth, Aldebert, Philippe, Althorpe, Nicola, Baldwin, Alastair, Bates, Kirsty, ...

The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth...

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

Zerbino, Daniel R., Birney, Ewan

We have developed a new set of algorithms, collectively called “Velvet,” to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short...

Confounding between recombination and selection, and the Ped/Pop method for detecting selection

O’Reilly, Paul F., Birney, Ewan, Balding, David J.

In recent years, there have been major developments of population genetics methods to estimate both rates of recombination and levels of natural selection. However, genomic variants subject to...

An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs)

Rakyan, Vardhman K., Down, Thomas A., Thorne, Natalie P., Flicek, Paul, Kulesha, Eugene, Gräf, Stefan, ...

We report a novel resource (methylation profiles of DNA, or mPod) for human genome-wide tissue-specific DNA methylation profiles. mPod consists of three fully integrated parts, genome-wide DNA...

Dynamite: A flexible code generating language for dynamic programming methods used in sequence comparison

Birney, Ewan, Durbin, Richard

We have developed a code generating language, called Dynamite, specialised for the production and subsequent manipulation of complex dynamic programming methods for biological sequence comparison....

Genome-wide nucleotide-level mammalian ancestor reconstruction

Paten, Benedict, Herrero, Javier, Fitzgerald, Stephen, Beal, Kathryn, Flicek, Paul, Holmes, Ian, ...

Recently attention has been turned to the problem of reconstructing complete ancestral sequences from large multiple alignments. Successful generation of these genome-wide reconstructions will...

Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs

Paten, Benedict, Herrero, Javier, Beal, Kathryn, Fitzgerald, Stephen, Birney, Ewan

Pairwise whole-genome alignment involves the creation of a homology map, capable of performing a near complete transformation of one genome into another. For multiple genomes this problem is...

Petabyte-scale innovations at the European Nucleotide Archive

Cochrane, Guy, Akhtar, Ruth, Bonfield, James, Bower, Lawrence, Demiralp, Fehmi, Faruque, Nadeem, ...

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence...

VectorBase: a data resource for invertebrate vector genomics

Lawson, Daniel, Arensburger, Peter, Atkinson, Peter, Besansky, Nora J., Bruggner, Robert V., Butler, Ryan, ...

VectorBase (http://www.vectorbase.org) is an NIAID-funded Bioinformatic Resource Center focused on invertebrate vectors of human pathogens. VectorBase annotates and curates vector genomes providing a...

MAPU 2.0: high-accuracy proteomes mapped to genomes

Gnad, Florian, Oroshi, Mario, Birney, Ewan, Mann, Matthias

The MAPU 2.0 database contains proteomes of organelles, tissues and cell types measured by mass spectrometry (MS)-based proteomics. In contrast to other databases it is meant to contain a limited...

EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates

Vilella, Albert J., Severin, Jessica, Ureta-Vidal, Abel, Heng, Li, Durbin, Richard, Birney, Ewan

We have developed a comprehensive gene orientated phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline to handle clustering, multiple alignment, and tree generation,...

Reactome knowledgebase of human biological pathways and processes

Matthews, Lisa, Gopinath, Gopal, Gillespie, Marc, Caudy, Michael, Croft, David, De Bono, Bernard, ...

Reactome (http://www.reactome.org) is an expert-authored, peer-reviewed knowledgebase of human reactions and pathways that functions as a data mining resource and electronic textbook. Its current...