Methods in comparative genomics: genome correspondence, gene identification and regulatory motif discovery (2004)

Abstract
In Kellis et al. (2003), we reported the genome sequences of S. paradoxus, S. mikatae and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genome-wide comparative analysis allowed the identification of functionally important sequences, both coding and non-coding. In this companion paper we describe the mathematical and algorithmic results underpinning the analysis of these genomes. We present methods for the automatic determination of genome correspondence. The algorithms enabled the automatic identification of orthologs for more than 90% of genes and intergenic regions across the four species despite the large number of duplicated genes in the yeast genome. The remaining ambiguities in the gene correspondence revealed recent gene family expansions in regions of rapid genomic change. We present methods for the identification of protein-coding genes based on their patterns of nucleotide conservation across related species. We observed the pressure to conserve the reading frame of functional proteins and developed a test for gene identification with high sensitivity and specificity. We used this test to revisit the genome of S. cerevisiae, reducing the overall gene count by 500 genes (10% of previously annotated genes) and refining the gene structure of hundreds of genes. We present novel methods for the systematic de novo identification of regulatory motifs. The methods do not rely on previous knowledge of gene function and in that way differ from the current literature on computational motif discovery. Based

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.92.8956
Source http://web.mit.edu/manoli/www/publications/Kellis_JCB_04.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.121.7056, 10.1.1.28.9317, 10.1.1.18.8344, 10.1.1.25.1002, 10.1.1.37.7492, 10.1.1.15.6826, 10.1.1.86.6972, 10.1.1.15.2252