Mark Gerstein

Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions (2009)

Jiang Qian, Marisa Dolled-Filhart, Jimmy Lin, Haiyuan Yu, Mark Gerstein

The complexity of biological systems provides for a great diversity of relationships between genes. The current analysis of whole-genome expression data focuses on relationships based on global...

An approach to compare genome tiling microarray and MPSS sequencing data for transcript mapping (2009)

Sasidharan, Rajkumar, Agarwal, Ashish, Rozowsky, Joel, Gerstein, Mark

Corrected abstract: We are correcting the abstract of our published article (1). The sentence that starts "We observe that 4.5% of MPSS tags...." was not scientifically complete in the original...

The relationship between the evolution of microRNA targets and the length of their UTRs (2009)

Cheng, Chao, Bhardwaj, Nitin, Gerstein, Mark

Abstract Background MicroRNAs (miRNAs) are endogenous small RNA molecules that modulate the gene expression at the post-transcription levels in many eukaryotic cells. Their widespread and important...

mRNA expression profiles show differential regulatory effects of microRNAs between estrogen receptor-positive and estrogen receptor-negative breast cancer (2009)

Cheng, Chao, Fu, Xuping, Alves, Pedro, Gerstein, Mark

Abstract Background Recent studies have shown that the regulatory effect of microRNAs can be investigated by examining expression changes of their target genes. Given this, it is useful to define an...

Multi-level learning: improving the prediction of protein, domain and residue interactions by allowing information flow between levels (2009)

Yip, Kevin Y, Kim, Philip M, McDermott, Drew, Gerstein, Mark

Abstract Background Proteins interact through specific binding interfaces that contain many residues in domains. Protein interactions thus occur on three different levels of a concept hierarchy:...

An approach to comparing tiling array and high throughput sequencing technologies for genomic transcript mapping (2009)

Sasidharan, Rajkumar, Agarwal, Ashish, Rozowsky, Joel, Gerstein, Mark

Abstract Background There are two main technologies for transcriptome profiling, namely, tiling microarrays and high-throughput sequencing. Recently there has been a tremendous amount of excitement...

Systematic identification of transcription factors associated with patient survival in cancers (2009)

Cheng, Chao, Li, Lei M, Alves, Pedro, Gerstein, Mark

Abstract Background Aberrant activation or expression of transcription factors has been implicated in the tumorigenesis of various types of cancer. In spite of the prevalent application of microarray...

A Method Using Active-Site Sequence Conservation to Find Functional Shifts in Protein Families: Application to the Enzymes of Central Metabolism, Leading to the Identification of an Anomalous Isocitrate Dehydrogenase in Pathogens (2009)

Rajdeep Das, Mark Gerstein

ABSTRACT We have introduced a method to identify functional shifts in protein families. Our method is based on the calculation of an active-site conservation ratio, which we call the “ASC ratio.”...

Efficient yeast ChIP-Seq using multiplex short-read DNA sequencing (2009)

Lefrançois, Philippe, Euskirchen, Ghia M, Auerbach, Raymond K, Rozowsky, Joel, Gibson, Theodore, Yellman, Christopher M, ...

Abstract Background Short-read high-throughput DNA sequencing technologies provide new tools to answer biological questions. However, high cost and low throughput limit their widespread use,...

Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes (2009)

Balasubramanian, Suganthi, Zheng, Deyou, Liu, Yuen-Jong, Fang, Gang, Frankish, Adam, Carriero, Nicholas, ...

Abstract Background The availability of genome sequences of numerous organisms allows comparative study of pseudogenes in syntenic regions. Conservation of pseudogenes suggests that they might have a...

Zebrafish miR-1 and miR-133 shape muscle gene expression and regulate sarcomeric actin organization (2009)

Mishima, Yuichiro, Abreu-Goodger, Cei, Staton, Alison A., Stahlhut, Carlos, Shou, Chong, Cheng, Chao, ...

microRNAs (miRNAs) represent ∼4% of the genes in vertebrates, where they regulate deadenylation, translation, and decay of the target messenger RNAs (mRNAs). The integrated role of miRNAs to...

MAPK target networks in Arabidopsis thaliana revealed using functional protein microarrays (2009)

Popescu, Sorina C., Popescu, George V., Bachan, Shawn, Zhang, Zimei, Gerstein, Mark, Snyder, Michael, ...

Signaling through mitogen-activated protein kinases (MPKs) cascades is a complex and fundamental process in eukaryotes, requiring MPK-activating kinases (MKKs) and MKK-activating kinases (MKKKs)....

MSB: A mean-shift-based approach for the analysis of structural variation in the genome (2009)

Wang, Lu-yong, Abyzov, Alexej, Korbel, Jan O., Snyder, Michael, Gerstein, Mark

Genome structural variation includes segmental duplications, deletions, and other rearrangements, and array-based comparative genomic hybridization (array-CGH) is a popular technology for determining...

Training set expansion: an approach to improving the reconstruction of biological networks from limited and uneven reliable interactions (2009)

Yip, Kevin Y., Gerstein, Mark

Motivation: An important problem in systems biology is reconstructing complete networks of interactions between biological objects by extrapolating from a few known interactions as examples. While...

Mismatch oligonucleotides in human and yeast: guidelines for probe design on tiling microarrays (2008)

Seringhaus, Michael, Rozowsky, Joel, Royce, Thomas, Nagalakshmi, Ugrappa, Jee, Justin, Snyder, Michael, ...

Abstract Background Mismatched oligonucleotides are widely used on microarrays to differentiate specific from nonspecific hybridization. While many experiments rely on such oligos, the hybridization...

Interrelating Different Types of Genomic Data, from Proteome to Secretome: ’Oming in on Function (2008)

Dov Greenbaum, Nicholas M. Luscombe, Ronald Jansen, Jiang Qian, Mark Gerstein

With the completion of genome sequences, the current challenge for biology is to determine the functions of all gene products and to understand how they contribute in making an organism viable. For...

Inferring Protein-Protein Interactions Using Interaction Network Topologies (2008)

Alberto Paccanaro, Valery Trifonov, Haiyuan Yu, Mark Gerstein

Abstract: We describe two novel methods for predicting protein interactions, using only the topology of an observed protein interaction...

A supervised hidden Markov model framework for efficiently segmenting tiling array data in transcriptional and ChIP-chip experiments: systematically incorporating... (2008)

Jiang Du, Joel S. Rozowsky, Jan O. Korbel, Zhengdong D. Zhang, Thomas E. Royce, ...

Bioinformatics, Original Paper, Gene Expression. doi:10.1093/bioinformatics/btl515

Leveraging Biological Identifier Relationships and Related Documents to Enhance Information Retrieval for Proteomics (2008)

Andrew Smith, Kei Cheung, Michael Krauthammer, Martin Schultz, Mark Gerstein

Motivation: Proteomics researchers need to be able to quickly retrieve relevant information from the web and the biomedical literature. To improve information retrieval, we leverage a graph of...

Spectral Biclustering of Microarray Data: Coclustering Genes and Conditions (2008)

Yuval Kluger, Ronen Basri, Joseph T. Chang, Mark Gerstein

Global analyses of RNA expression levels are useful for classifying genes and overall phenotypes. Often these classification problems are linked, and one wants to find “marker genes” that are...

Helix Interaction Tool (HIT): A web-based tool for analysis of helix-helix interactions in proteins (2008)

Ursula Lehnert, Eric Z. Yu, Mark Gerstein

Motivation: In many proteins, helix-helix interactions can be critical to establishing protein conformation (folding) and dynamics, as well as determining associations between protein units. However,...

Predicting interactions in protein networks by completing defective cliques (2008)

Haiyuan Yu, Alberto Paccanaro, Valery Trifonov, Mark Gerstein

Bioinformatics, Original Paper, Systems biology. Vol. 22 no. 7 2006, pages 823–829. doi:10.1093/bioinformatics/btl014

The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks (2008)

Kevin Y. Yip, Haiyuan Yu, Philip M. Kim, Martin Schultz, Mark Gerstein

Biological processes involve complex networks of interactions between molecules. Various large-scale experiments and curation efforts have led to preliminary versions of complete cellular networks...

Unstructured Data (2008)

Mark Gerstein

• Databases make program data persistent • RDBs turn formless data into a number of structured tables ◊ Ways of joining tables together to give various views of the data

Perspective (2008)

Mark Gerstein

Folds in Genomes, shared & common folds

Established (2008)

Mark Gerstein

◊ Electrical non-bonded interactions ◊ bonded, fundamentally QM but treat as springs ◊ Sum up the energy

REVIEW DNA recognition code of transcription factors (2008)

Masashi Suzuki, Steven E. Brenner, Mark Gerstein, Naoto Yagi

To whom correspondence should be addressed. Key words: DNA binding/DNA-protein interaction/gene expression/molecular recognition

Selection and Characterization of Small Random Transmembrane Proteins That... (2008)

Ann M. Dixon, Jennifer B. Frank, Yu Xia, Lara Ely, ...

this article can be found at doi: 10.1016/j.jmb.2004.03.044 E-mail address of the corresponding author: daniel.dimaio@yale.edu Abbreviations used: PDGF, platelet-derived growth factor; CAT,...

Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome (2008)

Wu, Jia, Du, Jiang, Rozowsky, Joel, Zhang, Zhengdong, Urban, Alexander E, Euskirchen, Ghia, ...

Abstract Background Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there...

Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets (2008)

Johnson, David S., Li, Wei, Gordon, D. Benjamin, Bhattacharjee, Arindam, Curry, Bo, Ghosh, Jayati, ...

The most widely used method for detecting genome-wide protein-DNA interactions is chromatin immunoprecipitation on tiling microarrays, commonly known as ChIP-chip. Here, we conducted the first...

An integrated system for studying residue coevolution in proteins (2008)

Yip, Kevin Y., Patel, Prianka, Kim, Philip M., Engelman, Donald M., McDermott, Drew, Gerstein, Mark

Residue coevolution has recently emerged as an important concept, especially in the context of protein structures. While a multitude of different functions for quantifying it have been proposed, not...

Analysis of Nuclear Receptor Pseudogenes in Vertebrates: How the Silent Tell Their Stories (2008)

Zhang, Zhengdong D., Cayting, Philip, Weinstock, George, Gerstein, Mark

Transcription factor pseudogenes have not been systematically studied before. Nuclear receptors (NRs) constitute one of the largest groups of transcription factors in animals (e.g., 48 NRs in human)....

A genomic analysis of RNA polymerase II modification and chromatin architecture related to 3' end RNA polyadenylation (2008)

Lian, Zheng, Karpikov, Alexander, Lian, Jin, Mahajan, Milind C., Hartman, Stephen, Gerstein, Mark, ...

Genomic analyses have been applied extensively to analyze the process of transcription initiation in mammalian cells, but less to transcript 3′ end formation and transcription termination. We used...

1 (2007)

Mark Gerstein, Ronald Jansen, Ted Johnson, Jerry Tsai, Werner Krebs

We describe database approaches taken in our lab to the study of protein and nucleic acid motions. We have developed a database of macromolecular motions, which is accessible on the World Wide Web...

REVIEW DNA recognition code of transcription factors (2007)

Masashi Suzuki, Steven E. Brenner, Mark Gerstein, Naoto Yagi

To whom correspondence should be addressed. Key words: DNA binding/DNA-protein interaction/gene expression/molecular recognition

by Piotr Berman (2007)

Paul Bertone, Bhaskar Dasgupta, Mark Gerstein, Ming-yang Kao, Michael Snyder

A preliminary version of this paper appeared in the 2nd Workshop on Algorithms in Bioinformatics,

The Morph Server and the Macromolecular Motions Database: a standardized system for analyzing and visualizing macromolecular motions in a database framework (2007)

W. G. Krebs, Mark Gerstein

Relating Whole-Genome Expression Data with Protein-Protein Interactions (2007)

Ronald Jansen, Dov Greenbaum, Mark Gerstein

We investigate the relationship of protein-protein interactions with mRNA expression levels, by integrating a variety of data sources for yeast. We focus on known protein complexes that have clearly...

Average Core Structures and Variability Measures for Protein Families: ... (2007)

Mark Gerstein, Russ B Altman

Subject classification: Proteins. A variety of methods are currently available for creating multiple alignments, and these can be used to define and...

Calculated from Simulation, using Voronoi Polyhedra (2007)

Mark Gerstein, Jerry Tsai, Michael Levitt

The protein surface is of great interest since proteins recognize other molecules and perform their functions through their surfaces. Central to understanding the protein surface is understanding

Pages: __ _ in total including this one (2007)

Mark Gerstein, Michael Levitt

ss-pstxt.rtf (word-97 RTF file of text) ss-prsci.pdf (acrobat PDF file, text + figures) ss-psfig.pdf (acrobat PDF file, just figures) ss-pstxt.txt (ASCII text file of just the text) 1 Structural...

Fast Optimal Genome Tiling with Applications to Microarray Design and Homology Search (2007)

Piotr Berman, Paul Bertone, Bhaskar Dasgupta, Mark Gerstein, Ming-yang Kao, Michael Snyder

In this paper we consider several variations of the following basic tiling problem: given a sequence of real numbers with two size bound parameters, we want to find a set of tiles such that they...

Total ancestry measure: quantifying the similarity in tree-like classification, with genomic applications (2007)

Yu, Haiyuan, Jansen, Ronald, Stolovitzky, Gustavo, Gerstein, Mark

Motivation: Many classifications of protein function such as Gene Ontology (GO) are organized in directed acyclic graph (DAG) structures. In these classifications, the proteins are terminal leaf...

PARE: A tool for comparing protein abundance and mRNA expression data (2007)

Yu, Eric Z, Burba, Anne, Gerstein, Mark

Abstract Background Techniques for measuring protein abundance are rapidly advancing and we are now in a situation where we anticipate many protein abundance data sets will be available in the near...

Tilescope: online analysis pipeline for high-density tiling microarray data (2007)

Zhang, Zhengdong D, Rozowsky, Joel, Lam, Hugo YK, Du, Jiang, Snyder, Michael, Gerstein, Mark

Abstract We developed Tilescope, a fully integrated data processing pipeline for analyzing high-density tiling-array data (http://tilescope.gersteinlab.org). In a completely automated fashion,...

Getting connected: analysis and principles of biological networks (2007)

Zhu, Xiaowei, Gerstein, Mark, Snyder, Michael

The execution of complex biological processes requires the precise interaction and regulation of thousands of molecules. Systematic approaches to study large numbers of proteins, metabolites, and...

The Importance of Bottlenecks in Protein Networks: Correlation with Gene Essentiality and Expression Dynamics (2007)

Haiyuan Yu, Philip M. Kim, Emmett Sprecher, Valery Trifonov, Mark Gerstein

It has been a long-standing goal in systems biology to find relations between the topological properties and functional features of protein networks. However, most of the focus in network studies has...

Comparative analysis of genome tiling array data reveals many novel primate-specific functional RNAs in human (2007)

Zhang, Zhaolei, Pang, Andy, Gerstein, Mark

Abstract Background Widespread transcription activities in the human genome were recently observed in high-resolution tiling array experiments, which revealed many novel transcripts that are outside...

Tilescope: online analysis pipeline for high-density tiling microarray data (2007)

Zhengdong D. Zhang, Joel Rozowsky, Jiang Du, Michael Snyder, Mark Gerstein

Running title: microarray data analysis pipeline Key words: high-density tiling microarray, high-density oligonucleotide microarray, microarray data analysis For test data sets, sample result web...

New insights into Acinetobacter baumannii pathogenesis revealed by high-density pyrosequencing and transposon mutagenesis (2007)

Smith, Michael G., Gianoulis, Tara A., Pukatzki, Stefan, Mekalanos, John J., Ornston, L. Nicholas, Gerstein, Mark, ...

Acinetobacter baumannii has emerged as an important and problematic human pathogen as it is the causative agent of several types of infections including pneumonia, meningitis, septicemia, and urinary...

Positional artifacts in microarrays: experimental verification and construction of COP, an automated detection tool (2007)

Yu, Haiyuan, Nguyen, Katherine, Royce, Tom, Qian, Jiang, Nelson, Kenneth, Snyder, Michael, ...

Microarray technology is currently one of the most widely-used technologies in biology. Many studies focus on inferring the function of an unknown gene from its co-expressed genes. Here, we are able...

Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation (2007)

Karro, John E., Yan, Yangpan, Zheng, Deyou, Zhang, Zhaolei, Carriero, Nicholas, Cayting, Philip, ...

The Pseudogene.org knowledgebase serves as a comprehensive repository for pseudogene annotation. The definition of a pseudogene varies within the literature, resulting in significantly different...

PARE: A tool for comparing protein abundance and mRNA expression data (2007)

Eric Z Yu, Mark Gerstein, ...

BMC Bioinformatics (BioMed Central), Software.

ProCAT: a data analysis approach for protein microarrays (2006)

Zhu, Xiaowei, Gerstein, Mark, Snyder, Michael

Abstract Protein microarrays provide a versatile method for the analysis of many protein biochemical activities. Existing DNA microarray analytical methods do not translate to protein microarrays due...

BoCaTFBS: a boosted cascade learner to refine the binding sites suggested by ChIP-chip experiments (2006)

Wang, Lu-yong, Snyder, Michael, Gerstein, Mark

Abstract Comprehensive mapping of transcription factor binding sites is essential in postgenomic biology. For this, we propose a mining approach combining noisy data from ChIP (chromatin...

An Integrative Genomic Approach to Uncover Molecular Mechanisms of Prokaryotic Traits (2006)

Yang Liu, Jianrong Li, Lee Sam, Chern-Sing Goh, Mark Gerstein, Yves A. Lussier

With mounting availability of genomic and phenotypic databases, data integration and mining become increasingly challenging. While efforts have been put forward to analyze prokaryotic phenotypes,...

Integration of curated databases to identify genotype-phenotype associations (2006)

Goh, Chern-Sing, Gianoulis, Tara A, Liu, Yang, Li, Jianrong, Paccanaro, Alberto, Lussier, Yves A, ...

Abstract Background The ability to rapidly characterize an unknown microorganism is critical in both responding to infectious disease and biodefense. To do this, we need some way of anticipating an...

Design principles of molecular networks revealed by global comparisons and composite motifs (2006)

Yu, Haiyuan, Xia, Yu, Trifonov, Valery, Gerstein, Mark

Abstract Background Molecular networks are of current interest, particularly with the publication of many large-scale datasets. Previous analyses have focused on topologic structures of individual...

PseudoPipe: an automated pseudogene identification pipeline (2006)

Zhaolei Zhang, Nicholas Carriero, Deyou Zheng, John Karro, Paul M. Harrison, Mark Gerstein, ...

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for...

Integration of curated databases to identify genotype-phenotype associations (2006)

Chern-sing Goh, Tara A Gianoulis, Yang Liu, Jianrong Li, Alberto Paccanaro, Yves A Lussier, ...

BioMed Central, Open Access.

Design optimization methods for genomic DNA tiling arrays (2006)

Paul Bertone, Valery Trifonov, Joel S. Rozowsky, Falk Schubert, Olof Emanuelsson, John Karro, ...

A recent development in microarray construction entails the unbiased coverage, or tiling, of non-repetitive genomic DNA for the experimental identification of unannotated transcribed sequences and...

PseudoPipe: an automated pseudogene identification pipeline (2006)

Zhang, Zhaolei, Carriero, Nicholas, Zheng, Deyou, Karro, John, Harrison, Paul M., Gerstein, Mark

Motivation: Mammalian genomes contain many ‘genomic fossils’ i.e. pseudogenes. These are disabled copies of functional genes that have been retained in the genome by gene duplication or...

Helix Interaction Tool (HIT): a web-based tool for analysis of helix-helix interactions in proteins (2006)

Burba, Anne E. Counterman, Lehnert, Ursula, Yu, Eric Z., Gerstein, Mark

Motivation: In many proteins, helix–helix interactions can be critical to establishing protein conformation (folding) and dynamics, as well as determining associations between protein units....

Predicting essential genes in fungal genomes (2006)

Seringhaus, Michael, Paccanaro, Alberto, Borneman, Anthony, Snyder, Michael, Gerstein, Mark

Essential genes are required for an organism's viability, and the ability to identify these genes in pathogens is crucial to directed drug development. Predicting essential genes through...

The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks (2006)

Yip, Kevin Y., Yu, Haiyuan, Kim, Philip M., Schultz, Martin, Gerstein, Mark

Summary: Biological processes involve complex networks of interactions between molecules. Various large-scale experiments and curation efforts have led to preliminary versions of complete cellular...

Predicting interactions in protein networks by completing defective cliques (2006)

Yu, Haiyuan, Paccanaro, Alberto, Trifonov, Valery, Gerstein, Mark

Datasets obtained by large-scale, high-throughput methods for detecting protein–protein interactions typically suffer from a relatively high level of noise. We describe a novel method for improving...

Positional artifacts in microarrays: experimental verification and construction of COP, an automated detection tool (2006)

Yu, Haiyuan, Nguyen, Katherine, Royce, Tom, Qian, Jiang, Nelson, Kenneth, Snyder, Michael, ...

Microarray technology is currently one of the most widely-used technologies in biology. Many studies focus on inferring the function of an unknown gene from its co-expressed genes. Here, we are able...

Target hub proteins serve as master regulators of development in yeast (2006)

Borneman, Anthony R., Leigh-Bell, Justine A., Yu, Haiyuan, Bertone, Paul, Gerstein, Mark, Snyder, Michael

To understand the organization of the transcriptional networks that govern cell differentiation, we have investigated the transcriptional circuitry controlling pseudohyphal development in...

Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome (2006)

Emanuelsson, Olof, Nagalakshmi, Ugrappa, Zheng, Deyou, Rozowsky, Joel S., Urban, Alexander E., Du, Jiang, ...

Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated...

The Database of Macromolecular Motions: new features added at the decade mark (2006)

Flores, Samuel, Echols, Nathaniel, Milburn, Duncan, Hespenheide, Brandon, Keating, Kevin, Lu, Jason, ...

The database of molecular motions, MolMovDB (http://molmovdb.org), has been in existence for the past decade. It classifies macromolecular motions and provides tools to interpolate between two...

PubNet: a flexible system for visualizing literature derived networks (2005)

Douglas, Shawn M, Montelione, Gaetano T, Gerstein, Mark

Abstract We have developed PubNet, a web-based tool that extracts several types of relationships returned by PubMed queries and maps them into networks, allowing for graphical visualization, textual...

Global changes in STAT target selection and transcription regulation upon interferon treatments (2005)

Hartman, Stephen E., Bertone, Paul, Nath, Anjali K., Royce, Thomas E., Gerstein, Mark, Weissman, Sherman, ...

The STAT (signal transducer and activator of transcription) proteins play a crucial role in the regulation of gene expression, but their targets and the manner in which they select them remain...

Sequence variation in G-protein-coupled receptors: analysis of single nucleotide polymorphisms (2005)

Balasubramanian, Suganthi, Xia, Yu, Freinkman, Elizaveta, Gerstein, Mark

We assessed the disease-causing potential of single nucleotide polymorphisms (SNPs) based on a simple set of sequence-based features. We focused on SNPs from the dbSNP database in G-protein-coupled...

Normal modes for predicting protein motions: A comprehensive database assessment and associated Web tool (2005)

Alexandrov, Vadim, Lehnert, Ursula, Echols, Nathaniel, Milburn, Duncan, Engelman, Donald, Gerstein, Mark

We carry out an extensive statistical study of the applicability of normal modes to the prediction of mobile regions in proteins. In particular, we assess the degree to which the observed motions...

Assessing the limits of genomic data integration for predicting protein networks (2005)

Lu, Long J., Xia, Yu, Paccanaro, Alberto, Yu, Haiyuan, Gerstein, Mark

Genomic data integration—the process of statistically combining diverse sources of information from functional genomics experiments to make large-scale predictions—is becoming increasingly...

Biochemical and genetic analysis of the yeast proteome with a movable ORF collection (2005)

Gelperin, Daniel M., White, Michael A., Wilkinson, Martha L., Kon, Yoshiko, Kung, Li A., Wise, Kevin J., ...

Functional analysis of the proteome is an essential part of genomic research. To facilitate different proteomic approaches, a MORF (moveable ORF) library of 5854 yeast expression plasmids was...

Multi-species microarrays reveal the effect of sequence divergence on gene expression profiles (2005)

Gilad, Yoav, Rifkin, Scott A., Bertone, Paul, Gerstein, Mark, White, Kevin P.

Interspecies comparisons of gene expression levels will increase our understanding of the evolution of transcriptional mechanisms and help to identify targets of natural selection. This approach...

Design optimization methods for genomic DNA tiling arrays (2005)

Bertone, Paul, Trifonov, Valery, Rozowsky, Joel S., Schubert, Falk, Emanuelsson, Olof, Karro, John, ...

A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central...

YeastHub: a semantic web use case for integrating data in the life sciences domain (2005)

Cheung, Kei-Hoi, Yip, Kevin Y., Smith, Andrew, DeKnikker, Remko, Masiar, Andy, Gerstein, Mark

Motivation: As the semantic web technology is maturing and the need for life sciences data integration over the web is growing, it is important to explore how data integration needs can be addressed...

Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability (2005)

Harrison, Paul M., Zheng, Deyou, Zhang, Zhaolei, Carriero, Nicholas, Gerstein, Mark

Pseudogenes, in the case of protein-coding genes, are gene copies that have lost the ability to code for a protein; they are typically identified through annotation of disabled, decayed or incomplete...

Information assessment on predicting protein-protein interactions (2004)

Lin, Nan, Wu, Baolin, Jansen, Ronald, Gerstein, Mark, Zhao, Hongyu

Abstract Background Identifying protein-protein interactions is fundamental for understanding the molecular machinery of the cell. Proteome-wide studies of protein-protein interactions are of...

Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes (2004)

Liu, Yang, Harrison, Paul M, Kunin, Victor, Gerstein, Mark

Abstract Background Pseudogenes often manifest themselves as disabled copies of known genes. In prokaryotes, it was generally believed (with a few well-known exceptions) that they were rare. Results...

Using 3D Hidden Markov Models that explicitly represent spatial coordinates to model and compare protein structures (2004)

Alexandrov, Vadim, Gerstein, Mark

Abstract Background Hidden Markov Models (HMMs) have proven very useful in computational biology for such applications as sequence pattern matching, gene-finding, and structure prediction. Thus far,...

An XML-based approach to integrating heterogeneous yeast genome data (2004)

Kei-hoi Cheung, Deyun Pan, Andrew Smith, Michael Seringhaus, Shawn M. Douglas, Mark Gerstein

While there are an increasing number of genomes (including the human genome) whose sequences have been fully or nearly completed, the budding yeast Saccharomyces cerevisiae was the first...

Analyzing cellular biochemistry in terms of molecular networks (2004)

Yu Xia, Haiyuan Yu, Ronald Jansen, Michael Seringhaus, Sarah Baxter, Dov Greenbaum, ...

Key Words: genome-wide high-throughput experiments, protein-protein interaction networks, regulatory networks, integration and prediction, network topology. Abstract: One way to understand cells and...

TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics (2004)

Yu, Haiyuan, Zhu, Xiaowei, Greenbaum, Dov, Karro, John, Gerstein, Mark

Biological networks are a topic of great current interest, particularly with the publication of a number of large genome‐wide interaction datasets. They are globally characterized by a variety...

Annotation Transfer Between Genomes: Protein-Protein Interologs and Protein-DNA Regulogs (2004)

Yu, Haiyuan, Luscombe, Nicholas M., Lu, Hao Xin, Zhu, Xiaowei, Xia, Yu, Han, Jing-Dong J., ...

Proteins function mainly through interactions, especially with DNA and other proteins. While some large-scale interaction networks are now available for a number of model organisms, their...

Relationship between gene co-expression and probe localization on microarray slides (2003)

Kluger, Yuval, Yu, Haiyuan, Qian, Jiang, Gerstein, Mark

Background: Microarray technology allows simultaneous measurement of thousands of genes in a single experiment. This is a potentially useful tool for evaluating co-expression of genes and...

Comparing protein abundance and mRNA expression levels on a genomic scale (2003)

Greenbaum, Dov, Colangelo, Christopher, Williams, Kenneth, Gerstein, Mark

Attempts to correlate protein abundance with mRNA expression levels have had variable success. We review the results of these comparisons, focusing on yeast. In the process, we survey...

Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements (2003)

Zhang, Zhaolei, Gerstein, Mark

Phylogenetic footprinting is an approach to finding functionally important sequences in the genome that relies on detecting their high degrees of conservation across different species. A new...

A method to assess compositional bias in biological sequences and its application to prion-like glutamine/asparagine-rich domains in eukaryotic proteomes (2003)

Harrison, Paul M, Gerstein, Mark

We have derived a novel method to assess compositional biases in biological sequences, which is based on finding the lowest-probability subsequences for a given residue-type set. As a case...

ExpressYourself: a modular platform for processing and visualizing microarray data (2003)

Luscombe, Nicholas M., Royce, Thomas E., Bertone, Paul, Echols, Nathaniel, Horak, Christine E., Chang, Joseph T., ...

DNA microarrays are widely used in biological research; by analyzing differential hybridization on a single microarray slide, one can detect changes in mRNA expression levels, increases in DNA copy...

Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes (2003)

Zhang, Zhaolei, Gerstein, Mark

Nucleotide substitution, insertion and deletion (indel) events are the major driving forces that have shaped genomes. Using the recently identified human ribosomal protein (RP) pseudogene sequences,...

Identification of pseudogenes in the Drosophila melanogaster genome (2003)

Harrison, Paul M., Milburn, Duncan, Zhang, Zhaolei, Bertone, Paul, Gerstein, Mark

Pseudogenes are copies of genes that cannot produce a protein. They can be detected from disruptions to their apparent coding sequence, caused by frameshifts and premature stop codons. They are...

Millions of Years of Evolution Preserved: A Comprehensive Catalog of the Processed Pseudogenes in the Human Genome (2003)

Zhang, Zhaolei, Harrison, Paul M., Liu, Yin, Gerstein, Mark

Processed pseudogenes were created by reverse-transcription of mRNAs; they provide snapshots of ancient genes existing millions of years ago in the genome. To find them in the present-day human, we...

SPINE 2: a system for collaborative structural proteomics within a federated database framework (2003)

Goh, Chern-Sing, Lan, Ning, Echols, Nathaniel, Douglas, Shawn M., Milburn, Duncan, Bertone, Paul, ...

We present version 2 of the SPINE system for structural proteomics. SPINE is available over the web at http://nesg.org. It serves as the central hub for the Northeast Structural Genomics Consortium,...

Spectral Biclustering of Microarray Data: Coclustering Genes and Conditions (2003)

Kluger, Yuval, Basri, Ronen, Chang, Joseph T., Gerstein, Mark

Global analyses of RNA expression levels are useful for classifying genes and overall phenotypes. Often these classification problems are linked, and one wants to find “marker genes” that are...

MolMovDB: analysis and visualization of conformational change and structural flexibility (2003)

Echols, Nathaniel, Milburn, Duncan, Gerstein, Mark

The Database of Macromolecular Movements (http://MolMovDB.org) is a collection of data and software pertaining to flexibility in protein and RNA structures. The database is organized into two parts....

Prediction of regulatory networks: genome-wide identification of transcription factor targets from gene expression data (2003)

Qian, Jiang, Lin, Jimmy, Luscombe, Nicholas M., Yu, Haiyuan, Gerstein, Mark

Motivation: Defining regulatory networks, linking transcription factors (TFs) to their targets, is a central problem in post-genomic biology. One might imagine one could readily determine these...

Revisiting the codon adaptation index from a whole-genome perspective: analyzing the relationship between gene expression and codon occurrence in yeast using a variety of models (2003)

Jansen, Ronald, Bussemaker, Harmen J., Gerstein, Mark

Highly expressed genes in many bacteria and small eukaryotes often have a strong compositional bias, in terms of codon usage. Two widely used numerical indices, the codon adaptation index (CAI) and...

Genomic analysis of membrane protein families: abundance and conserved motifs (2002)

Liu, Yang, Engelman, Donald M, Gerstein, Mark

Background: Polytopic membrane proteins can be related to each other on the basis of the number of transmembrane helices and sequence similarities. Building on the Pfam classification of...

The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties (2002)

Luscombe, Nicholas M, Qian, Jiang, Zhang, Zhaolei, Johnson, Ted, Gerstein, Mark

Background: The sequencing of genomes provides us with an inventory of the 'molecular parts' in nature, such as protein families and folds, and their functions in living organisms. Through...

Structural genomics: a new era for pharmaceutical research (2002)

Liu, Yang, Luscombe, Nicholas M, Alexandrov, Vadim, Bertone, Paul, Harrison, Paul, Zhang, Zhaolei, ...

A report on the 15th Annual Center for Advanced Biotechnology and Medicine Symposium on structural genomics in pharmaceutical design, Princeton, USA, 24-25 October 2001.

Fast optimal genome tiling with applications to microarray design and homology search (2002)

Piotr Berman, Paul Bertone, Bhaskar Dasgupta, Mark Gerstein, Ming-yang Kao, Michael Snyder

In this paper we consider several variations of the following basic tiling problem: given a sequence of real numbers with two size bound parameters, we want to find a set of tiles of maximum total...

Fast optimal genome tiling with applications to microarray design and homology search (2002)

Piotr Berman, Paul Bertone, Bhaskar Dasgupta, Mark Gerstein, Ming-yang Kao, Michael Snyder, ...

In this paper we consider several variations of the following basic tiling problem: given a sequence of real numbers with two size bound parameters, we want to find a set of tiles such that...

A question of size: the eukaryotic proteome and the problems in defining it (2002)

Harrison, Paul M., Kumar, Anuj, Lang, Ning, Snyder, Michael, Gerstein, Mark

We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and...

Calculations of protein volumes: sensitivity analysis and parameter database (2002)

Tsai, Jerry, Gerstein, Mark

Motivation: The precise sizes of protein atoms in terms of occupied packing volume are of great importance. We have previously presented standard volumes for protein residues based on calculations...

Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes (2002)

Echols, Nathaniel, Harrison, Paul, Balasubramanian, Suganthi, Luscombe, Nicholas M., Bertone, Paul, Zhang, Zhaolei, ...

Based on searches for disabled homologs to known proteins, we have identified a large population of pseudogenes in four sequenced eukaryotic genomes—the worm, yeast, fly and human (chromosomes 21...

RNA expression patterns change dramatically in human neutrophils exposed to bacteria (2001)

Yamaga, Shigeru, Prashar, Yatindra, Lee, Helen H., Hoe, Nancy Palme, Kluger, Yuval, ...

A comprehensive study of changes in messenger RNA (mRNA) levels in human neutrophils following exposure to bacteria is described. Within 2 hours there are dramatic changes in the levels of several...

Determining the minimum number of types necessary to represent the sizes of protein atoms (2001)

Tsai, Jerry, Voss, Neil, Gerstein, Mark

Motivation: Traditionally, for packing calculations people have collected atoms together into a number of distinct ‘types’. These, in fact, often represent a heavy atom and its associated...

SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics (2001)

Bertone, Paul, Kluger, Yuval, Lan, Ning, Zheng, Deyou, Christendat, Dinesh, Yee, Adelinda, ...

High-throughput structural proteomics is expected to generate considerable amounts of data on the progress of structure determination for many proteins. For each protein this includes information...

Whole-genome trees based on the occurrence of folds and orthologs: implications for comparing genomes on different levels (2000)

Jimmy Lin, Mark Gerstein

We built “whole-genome” trees based on the presence or absence of particular molecular features (either orthologs or folds) in the genomes of a number of recently sequenced microorganisms. To...

Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins (2000)

Jansen, Ronald, Gerstein, Mark

We analyzed 10 genome expression data sets by large-scale cross-referencing against broad structural and functional categories. The data sets, generated by different techniques (e.g. SAGE and gene...

SURVEY AND SUMMARY: The morph server: a standardized system for analyzing and visualizing macromolecular motions in a database framework (2000)

Krebs, Werner G., Gerstein, Mark

The number of solved structures of macromolecules that have the same fold and thus exhibit some degree of conformational variability is rapidly increasing. It is consequently advantageous to develop...

Proteomics of Mycoplasma genitalium: identification and characterization of unannotated and atypical proteins in a small model genome (2000)

Balasubramanian, Suganthi, Schneider, Tamara, Gerstein, Mark, Regan, Lynne

We present the results of a comprehensive analysis of the proteome of Mycoplasma genitalium (MG), the smallest autonomously replicating organism that has been completely sequenced. Our aim was to...

Database of Macromolecular Movements (1999)

Mark Gerstein, Werner G. Krebs

The Molecular Movements Database lists motions in proteins and other macromolecules. It is arranged around a multi-level classification scheme and includes motions of loops, domains, and subunits.

Studying Macromolecular Motions in a Database Framework: From Structure to Sequence (1999)

Mark Gerstein, Ronald Jansen, Ted Johnson, Jerry Tsai, Werner Krebs

We describe database approaches taken in our lab to the study of protein and nucleic acid motions. We have developed a database of macromolecular motions, which is accessible on the World Wide Web...

Investigating Molecular Recognition Through Large-scale Analysis of Protein Sequences and Structures (1998)

Gerstein, Mark

The objective of this project is to study protein sequence-structure relationships through large-scale computational analysis of gene sequences and crystal structure in the databanks. The results of...

Comprehensive Assessment of Automatic Structural Alignment Against a Manual Standard, the Scop Classification of Proteins (1998)

Mark Gerstein, Michael Levitt

We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop...

Simulating the Minimum Core for Hydrophobic Collapse in Globular Proteins (1997)

Jerry Tsai, Mark Gerstein, Michael Levitt

To investigate the nature of hydrophobic collapse considered to be the driving force in protein folding, we have simulated aqueous solutions...

Using iterative dynamic programming to obtain accurate pairwise and multiple alignments of protein structures (1996)

Mark Gerstein, Michael Levitt

We show how a basic pairwise alignment procedure can be improved to more accurately align conserved structural regions, by using variable, position-dependent gap penalties that depend on secondary...

DNA recognition and superstructure formation by helix-turn-helix proteins (1995)

Suzuki, Masashi, Yagi, Naoto, Gerstein, Mark

The way helix-turn-helix proteins recognize DNA is analysed by comparing their sequences, structures, and binding specificities. Individual recognition helices in these proteins bind to four DNA base...

Stereochemical basis of DNA recognition by Zn fingers (1994)

Suzuki, Masashi, Gerstein, Mark, Yagi, Naoto

DNA-recognition rules for Zn fingers are discussed in terms of crystal structures. The rules can explain the DNA-binding characteristics of a number of Zn finger proteins for which there are no...

Solution structure of the DNA binding octapeptide repeat of the K10 gene product (1994)

Suzuki, Masashi, Neuhaus, David, Gerstein, Mark, Aimoto, Saburo

A putative transcription factor, the Drosophila K10 gene product, contains eight repeats of the octapeptide sequence SPNQQQHP or close variants. The solution structure of the K10 repeat was studied...

An NMR study on the DNA-binding SPKK motif and a model for its interaction with DNA (1993)

Suzuki, Masashi, Gerstein, Mark, Johnson, Tony

The solution structure of one and two repeats of the ‘SPKK’ DNA-binding motif is reported on the basis of NMR measurements. In dimethylsulphoxide (DMSO) the major population (approximately 90%)...

A structural census of the current population of protein sequences

Gerstein, Mark, Levitt, Michael

We examine the occurrence of the ≈300 known protein folds in different groups of organisms. To do this, we characterize a large fraction of the currently known protein sequences (≈140,000) in...

PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information

Qian, Jiang, Stenger, Brad, Wilson, Cyrus A., Lin, Jimmy, Jansen, Ronald, Teichmann, Sarah A., ...

As the number of protein folds is quite limited, a mode of analysis that will be increasingly common in the future, especially with the advent of structural genomics, is to survey and re-survey the...

A unified statistical framework for sequence comparison and structure comparison

Levitt, Michael, Gerstein, Mark

We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an...

GATA-1 binding sites mapped in the β-globin locus by using mammalian chIp-chip analysis

Horak, Christine E., Mahajan, Milind C., Luscombe, Nicholas M., Gerstein, Mark, Weissman, Sherman M., Snyder, Michael

The expression of the β-like globin genes is intricately regulated by a series of both general and tissue-restricted transcription factors. The hematopoietic lineage-specific transcription factor...

The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties

Luscombe, Nicholas M, Qian, Jiang, Zhang, Zhaolei, Johnson, Ted, Gerstein, Mark

The sequencing of genomes provides us with an inventory of the 'molecular parts' in nature, such as protein families and folds, and their functions in living organisms. The genomic occurrence of...

Genomic analysis of membrane protein families: abundance and conserved motifs

Liu, Yang, Engelman, Donald M, Gerstein, Mark

A genome-wide analysis was carried out on patterns of the classified polytopic membrane protein families, and the distribution of conserved amino acids and motifs in the transmembrane helix regions...

Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae

Horak, Christine E., Luscombe, Nicholas M., Qian, Jiang, Bertone, Paul, Piccirrillo, Stacy, Gerstein, Mark, ...

In the yeast Saccharomyces cerevisiae, SBF (Swi4–Swi6 cell cycle box binding factor) and MBF (MluI binding factor) are the major transcription factors regulating the START of the cell cycle, a time...

Identification and Analysis of Over 2000 Ribosomal Protein Pseudogenes in the Human Genome

Zhang, Zhaolei, Harrison, Paul, Gerstein, Mark

Mammals have 79 ribosomal proteins (RP). Using a systematic procedure based on sequence-homology, we have comprehensively identified pseudogenes of these proteins in the human genome. Our assignments...

Systematic Learning of Gene Functional Classes From DNA Array Expression Data by Using Multilayer Perceptrons

Mateos, Alvaro, Dopazo, Joaquín, Jansen, Ronald, Tu, Yuhai, Gerstein, Mark, Stolovitzky, Gustavo

Recent advances in microarray technology have opened new ways for functional annotation of previously uncharacterised genes on a genomic scale. This has been demonstrated by unsupervised clustering...

A method to assess compositional bias in biological sequences and its application to prion-like glutamine/asparagine-rich domains in eukaryotic proteomes

Harrison, Paul M, Gerstein, Mark

A novel method has been derived to assess compositional biases in biological sequences. It is based on finding the lowest-probability subsequences for a given residue-type set.

Comparing protein abundance and mRNA expression levels on a genomic scale

Greenbaum, Dov, Colangelo, Christopher, Williams, Kenneth, Gerstein, Mark

We review the results of attempts to correlate protein abundance with mRNA expression levels, focusing on yeast.

Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements

Zhang, Zhaolei, Gerstein, Mark

Phylogenetic footprinting is an approach to finding functionally important sequences in the genome that relies on detecting their high degrees of conservation across different species. A new study...

The transcriptional activity of human Chromosome 22

Rinn, John L., Euskirchen, Ghia, Bertone, Paul, Martone, Rebecca, Luscombe, Nicholas M., Hartman, Stephen, ...

A DNA microarray representing nearly all of the unique sequences of human Chromosome 22 was constructed and used to measure global-transcriptional activity in placental poly(A)+ RNA. We found that...

Distribution of NF-κB-binding sites across human chromosome 22

Martone, Rebecca, Euskirchen, Ghia, Bertone, Paul, Hartman, Stephen, Royce, Thomas E., Luscombe, Nicholas M., ...

We have mapped the chromosomal binding site distribution of a transcription factor in human cells. The NF-κB family of transcription factors plays an essential role in regulating the induction of...

A Genome-Wide Analysis of Blue-Light Regulation of Arabidopsis Transcription Factor Gene Expression during Seedling Development

Jiao, Yuling, Yang, Hongjuan, Ma, Ligeng, Sun, Ning, Yu, Haiyuan, Liu, Tie, ...

A microarray based on PCR amplicons of 1,864 confirmed and predicted Arabidopsis transcription factor genes was produced and used to profile the global expression pattern in seedlings, specifically...

Whole-genome Trees Based on the Occurrence of Folds and Orthologs: Implications for Comparing Genomes on Different Levels

Lin, Jimmy, Gerstein, Mark

We built whole-genome trees based on the presence or absence of particular molecular features, either orthologs or folds, in the genomes of a number of recently sequenced microorganisms. To put these...

Annotation Transfer for Genomics: Measuring Functional Divergence in Multi-Domain Proteins

Hegyi, Hedi, Gerstein, Mark

Annotation transfer is a principal process in genome annotation. It involves “transferring” structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed...

TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics

Yu, Haiyuan, Zhu, Xiaowei, Greenbaum, Dov, Karro, John, Gerstein, Mark

Biological networks are a topic of great current interest, particularly with the publication of a number of large genome-wide interaction datasets. They are globally characterized by a variety of...

Transmembrane protein domains rarely use covalent domain recombination as an evolutionary mechanism

Liu, Yang, Gerstein, Mark, Engelman, Donald M.

Recombination of evolutionarily unrelated domains is a mechanism often used by evolution to produce variety in soluble proteins. By using a classification of polytopic transmembrane domains into...

CREB Binds to Multiple Loci on Human Chromosome 22

Euskirchen, Ghia, Royce, Thomas E., Bertone, Paul, Martone, Rebecca, Rinn, John L., Nelson, F. Kenneth, ...

The cyclic AMP-responsive element-binding protein (CREB) is an important transcription factor that can be activated by hormonal stimulation and regulates neuronal function and development. An...

Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes

Liu, Yang, Harrison, Paul M, Kunin, Victor, Gerstein, Mark

A comprehensive analysis of the occurrence of pseudogenes in a diverse selection of 64 prokaryote genomes identified around 7,000 candidate pseudogenes. A large fraction of prokaryote pseudogenes...

DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states

White, Eric J., Emanuelsson, Olof, Scalzo, David, Royce, Thomas, Kosak, Steven, Oakeley, Edward J., ...

Duplication of the genome during the S phase of the cell cycle does not occur simultaneously; rather, different sequences are replicated at different times. The replication timing of specific...

A High Productivity/Low Maintenance Approach to High-performance Computation for Biomedicine: Four Case Studies

Carriero, Nicholas, Osier, Michael V., Cheung, Kei-Hoi, Miller, Perry L., Gerstein, Mark, Zhao, Hongyu, ...

The rapid advances in high-throughput biotechnologies such as DNA microarrays and mass spectrometry have generated vast amounts of data ranging from gene expression to proteomics data. The large size...

Sequence variation in G-protein-coupled receptors: analysis of single nucleotide polymorphisms

Balasubramanian, Suganthi, Xia, Yu, Freinkman, Elizaveta, Gerstein, Mark

We assessed the disease-causing potential of single nucleotide polymorphisms (SNPs) based on a simple set of sequence-based features. We focused on SNPs from the dbSNP database in G-protein-coupled...

Use of Thioredoxin as a Reporter To Identify a Subset of Escherichia coli Signal Sequences That Promote Signal Recognition Particle-Dependent Translocation

Huber, Damon, Boyd, Dana, Xia, Yu, Olma, Michael H., Gerstein, Mark, Beckwith, Jon

We have previously reported that the DsbA signal sequence promotes efficient, cotranslational translocation of the cytoplasmic protein thioredoxin-1 via the bacterial signal recognition particle...

Multi-species microarrays reveal the effect of sequence divergence on gene expression profiles

Gilad, Yoav, Rifkin, Scott A., Bertone, Paul, Gerstein, Mark, White, Kevin P.

Interspecies comparisons of gene expression levels will increase our understanding of the evolution of transcriptional mechanisms and help to identify targets of natural selection. This approach...

Assessing the limits of genomic data integration for predicting protein networks

Lu, Long J., Xia, Yu, Paccanaro, Alberto, Yu, Haiyuan, Gerstein, Mark

Genomic data integration—the process of statistically combining diverse sources of information from functional genomics experiments to make large-scale predictions—is becoming increasingly...

Network security and data integrity in academia: an assessment and a proposal for large-scale archiving

Smith, Andrew, Greenbaum, Dov, Douglas, Shawn M, Long, Morrow, Gerstein, Mark

A direct impediment to the optimal use of online databases is the increasing prevalence, severity, and toll of computer and network security incidents. Funding agencies should set up working groups...

PubNet: a flexible system for visualizing literature derived networks

Douglas, Shawn M, Montelione, Gaetano T, Gerstein, Mark

PubNet is a web-based tool to extract several types of relationships returned by PubMed queries and map them into networks.

The Database of Macromolecular Motions: new features added at the decade mark

Flores, Samuel, Echols, Nathaniel, Milburn, Duncan, Hespenheide, Brandon, Keating, Kevin, Lu, Jason, ...

The database of molecular motions, MolMovDB, has been in existence for the past decade. It classifies macromolecular motions and provides tools to interpolate between two conformations (the Morph...

Relating Whole-Genome Expression Data with Protein-Protein Interactions

Jansen, Ronald, Greenbaum, Dov, Gerstein, Mark

We investigate the relationship of protein-protein interactions with mRNA expression levels, by integrating a variety of data sources for yeast. We focus on known protein complexes that have clearly...

Molecular Fossils in the Human Genome: Identification and Analysis of the Pseudogenes in Chromosomes 21 and 22

Harrison, Paul M., Hegyi, Hedi, Balasubramanian, Suganthi, Luscombe, Nicholas M., Bertone, Paul, Echols, Nathaniel, ...

We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain...

Subcellular localization of the yeast proteome

Kumar, Anuj, Agarwal, Seema, Heyman, John A., Matson, Sandra, Heidtman, Matthew, Piccirillo, Stacy, ...

Protein localization data are a valuable information resource helpful in elucidating eukaryotic protein function. Here, we report the first proteome-scale analysis of protein localization within any...

Genomic analysis of insertion behavior and target specificity of mini-Tn7 and Tn3 transposons in Saccharomyces cerevisiae

Seringhaus, Michael, Kumar, Anuj, Hartigan, John, Snyder, Michael, Gerstein, Mark

Transposons are widely employed as tools for gene disruption. Ideally, they should display unbiased insertion behavior, and incorporate readily into any genomic DNA to which they are exposed....

Biochemical and genetic analysis of the yeast proteome with a movable ORF collection

Gelperin, Daniel M., White, Michael A., Wilkinson, Martha L., Kon, Yoshiko, Kung, Li A., Wise, Kevin J., ...

Functional analysis of the proteome is an essential part of genomic research. To facilitate different proteomic approaches, a MORF (moveable ORF) library of 5854 yeast expression plasmids was...

Global changes in STAT target selection and transcription regulation upon interferon treatments

Hartman, Stephen E., Bertone, Paul, Nath, Anjali K., Royce, Thomas E., Gerstein, Mark, Weissman, Sherman, ...

The STAT (signal transducer and activator of transcription) proteins play a crucial role in the regulation of gene expression, but their targets and the manner in which they select them remain...

Design optimization methods for genomic DNA tiling arrays

Bertone, Paul, Trifonov, Valery, Rozowsky, Joel S., Schubert, Falk, Emanuelsson, Olof, Karro, John, ...

A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central...

Target hub proteins serve as master regulators of development in yeast

Borneman, Anthony R., Leigh-Bell, Justine A., Yu, Haiyuan, Bertone, Paul, Gerstein, Mark, Snyder, Michael

To understand the organization of the transcriptional networks that govern cell differentiation, we have investigated the transcriptional circuitry controlling pseudohyphal development in...

Proton sensitivity of ASIC1 appeared with the rise of fishes by changes of residues in the region that follows TM1 in the ectodomain of the channel

Coric, Tatjana, Zheng, Deyou, Gerstein, Mark, Canessa, Cecilia M

The acid-sensitive ion channel 1 (ASIC1) is a neuronal Na+ channel insensitive to changes in membrane potential but is gated by external protons. Proton sensitivity is believed to be essential for...

An Integrative Genomic Approach to Uncover Molecular Mechanisms of Prokaryotic Traits

Liu, Yang, Li, Jianrong, Sam, Lee, Goh, Chern-Sing, Gerstein, Mark, Lussier, Yves A

With mounting availability of genomic and phenotypic databases, data integration and mining become increasingly challenging. While efforts have been put forward to analyze prokaryotic phenotypes,...

A structural census of the current population of protein sequences

Gerstein, Mark, Levitt, Michael

We examine the occurrence of the ≈300 known protein folds in different groups of organisms. To do this, we characterize a large fraction of the currently known protein sequences (≈140,000) in...

PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information

Qian, Jiang, Stenger, Brad, Wilson, Cyrus A., Lin, Jimmy, Jansen, Ronald, Teichmann, Sarah A., ...

As the number of protein folds is quite limited, a mode of analysis that will be increasingly common in the future, especially with the advent of structural genomics, is to survey and re-survey the...

A unified statistical framework for sequence comparison and structure comparison

Levitt, Michael, Gerstein, Mark

We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an...

SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics

Bertone, Paul, Kluger, Yuval, Lan, Ning, Zheng, Deyou, Christendat, Dinesh, Yee, Adelinda, ...

High-throughput structural proteomics is expected to generate considerable amounts of data on the progress of structure determination for many proteins. For each protein this includes information...

A question of size: the eukaryotic proteome and the problems in defining it

Harrison, Paul M., Kumar, Anuj, Lang, Ning, Snyder, Michael, Gerstein, Mark

We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and...

SURVEY AND SUMMARY: The morph server: a standardized system for analyzing and visualizing macromolecular motions in a database framework

Krebs, Werner G., Gerstein, Mark

The number of solved structures of macromolecules that have the same fold and thus exhibit some degree of conformational variability is rapidly increasing. It is consequently advantageous to develop...

Proteomics of Mycoplasma genitalium: identification and characterization of unannotated and atypical proteins in a small model genome

Balasubramanian, Suganthi, Schneider, Tamara, Gerstein, Mark, Regan, Lynne

We present the results of a comprehensive analysis of the proteome of Mycoplasma genitalium (MG), the smallest autonomously replicating organism that has been completely sequenced. Our aim was to...

Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins

Jansen, Ronald, Gerstein, Mark

We analyzed 10 genome expression data sets by large-scale cross-referencing against broad structural and functional categories. The data sets, generated by different techniques (e.g. SAGE and gene...

Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes

Echols, Nathaniel, Harrison, Paul, Balasubramanian, Suganthi, Luscombe, Nicholas M., Bertone, Paul, Zhang, Zhaolei, ...

Based on searches for disabled homologs to known proteins, we have identified a large population of pseudogenes in four sequenced eukaryotic genomes—the worm, yeast, fly and human (chromosomes 21...

GATA-1 binding sites mapped in the β-globin locus by using mammalian chIp-chip analysis

Horak, Christine E., Mahajan, Milind C., Luscombe, Nicholas M., Gerstein, Mark, Weissman, Sherman M., Snyder, Michael

The expression of the β-like globin genes is intricately regulated by a series of both general and tissue-restricted transcription factors. The hematopoietic lineage-specific transcription factor...

The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties

Luscombe, Nicholas M, Qian, Jiang, Zhang, Zhaolei, Johnson, Ted, Gerstein, Mark

The sequencing of genomes provides us with an inventory of the 'molecular parts' in nature, such as protein families and folds, and their functions in living organisms. The genomic occurrence of...

Genomic analysis of membrane protein families: abundance and conserved motifs

Liu, Yang, Engelman, Donald M, Gerstein, Mark

A genome-wide analysis was carried out on patterns of the classified polytopic membrane protein families, and the distribution of conserved amino acids and motifs in the transmembrane helix regions...

Structural genomics: a new era for pharmaceutical research

Liu, Yang, Luscombe, Nicholas M, Alexandrov, Vadim, Bertone, Paul, Harrison, Paul, Zhang, Zhaolei, ...

A report on the 15th Annual Center for Advanced Biotechnology and Medicine Symposium on structural genomics in pharmaceutical design, Princeton, USA, 24-25 October 2001.

Identification of pseudogenes in the Drosophila melanogaster genome

Harrison, Paul M., Milburn, Duncan, Zhang, Zhaolei, Bertone, Paul, Gerstein, Mark

Pseudogenes are copies of genes that cannot produce a protein. They can be detected from disruptions to their apparent coding sequence, caused by frameshifts and premature stop codons. They are...

Revisiting the codon adaptation index from a whole-genome perspective: analyzing the relationship between gene expression and codon occurrence in yeast using a variety of models

Jansen, Ronald, Bussemaker, Harmen J., Gerstein, Mark

Highly expressed genes in many bacteria and small eukaryotes often have a strong compositional bias, in terms of codon usage. Two widely used numerical indices, the codon adaptation index (CAI) and...

SPINE 2: a system for collaborative structural proteomics within a federated database framework

Goh, Chern-Sing, Lan, Ning, Echols, Nathaniel, Douglas, Shawn M., Milburn, Duncan, Bertone, Paul, ...

We present version 2 of the SPINE system for structural proteomics. SPINE is available over the web at http://nesg.org. It serves as the central hub for the Northeast Structural Genomics Consortium,...

MolMovDB: analysis and visualization of conformational change and structural flexibility

Echols, Nathaniel, Milburn, Duncan, Gerstein, Mark

The Database of Macromolecular Movements (http://MolMovDB.org) is a collection of data and software pertaining to flexibility in protein and RNA structures. The database is organized into two parts....

ExpressYourself: a modular platform for processing and visualizing microarray data

Luscombe, Nicholas M., Royce, Thomas E., Bertone, Paul, Echols, Nathaniel, Horak, Christine E., Chang, Joseph T., ...

DNA microarrays are widely used in biological research; by analyzing differential hybridization on a single microarray slide, one can detect changes in mRNA expression levels, increases in DNA copy...

Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae

Horak, Christine E., Luscombe, Nicholas M., Qian, Jiang, Bertone, Paul, Piccirrillo, Stacy, Gerstein, Mark, ...

In the yeast Saccharomyces cerevisiae, SBF (Swi4–Swi6 cell cycle box binding factor) and MBF (MluI binding factor) are the major transcription factors regulating the START of the cell cycle, a time...

Identification and Analysis of Over 2000 Ribosomal Protein Pseudogenes in the Human Genome

Zhang, Zhaolei, Harrison, Paul, Gerstein, Mark

Mammals have 79 ribosomal proteins (RP). Using a systematic procedure based on sequence-homology, we have comprehensively identified pseudogenes of these proteins in the human genome. Our assignments...

Systematic Learning of Gene Functional Classes From DNA Array Expression Data by Using Multilayer Perceptrons

Mateos, Alvaro, Dopazo, Joaquín, Jansen, Ronald, Tu, Yuhai, Gerstein, Mark, Stolovitzky, Gustavo

Recent advances in microarray technology have opened new ways for functional annotation of previously uncharacterised genes on a genomic scale. This has been demonstrated by unsupervised clustering...

A method to assess compositional bias in biological sequences and its application to prion-like glutamine/asparagine-rich domains in eukaryotic proteomes

Harrison, Paul M, Gerstein, Mark

A novel method has been derived to assess compositional biases in biological sequences. It is based on finding the lowest-probability subsequences for a given residue-type set.

Comparing protein abundance and mRNA expression levels on a genomic scale

Greenbaum, Dov, Colangelo, Christopher, Williams, Kenneth, Gerstein, Mark

We review the results of attempts to correlate protein abundance with mRNA expression levels, focusing on yeast.

Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements

Zhang, Zhaolei, Gerstein, Mark

Phylogenetic footprinting is an approach to finding functionally important sequences in the genome that relies on detecting their high degrees of conservation across different species. A new study...

The transcriptional activity of human Chromosome 22

Rinn, John L., Euskirchen, Ghia, Bertone, Paul, Martone, Rebecca, Luscombe, Nicholas M., Hartman, Stephen, ...

A DNA microarray representing nearly all of the unique sequences of human Chromosome 22 was constructed and used to measure global-transcriptional activity in placental poly(A)+ RNA. We found that...

Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes

Zhang, Zhaolei, Gerstein, Mark

Nucleotide substitution, insertion and deletion (indel) events are the major driving forces that have shaped genomes. Using the recently identified human ribosomal protein (RP) pseudogene sequences,...

Distribution of NF-κB-binding sites across human chromosome 22

Martone, Rebecca, Euskirchen, Ghia, Bertone, Paul, Hartman, Stephen, Royce, Thomas E., Luscombe, Nicholas M., ...

We have mapped the chromosomal binding site distribution of a transcription factor in human cells. The NF-κB family of transcription factors plays an essential role in regulating the induction of...

A Genome-Wide Analysis of Blue-Light Regulation of Arabidopsis Transcription Factor Gene Expression during Seedling Development

Jiao, Yuling, Yang, Hongjuan, Ma, Ligeng, Sun, Ning, Yu, Haiyuan, Liu, Tie, ...

A microarray based on PCR amplicons of 1,864 confirmed and predicted Arabidopsis transcription factor genes was produced and used to profile the global expression pattern in seedlings, specifically...

Whole-genome Trees Based on the Occurrence of Folds and Orthologs: Implications for Comparing Genomes on Different Levels

Lin, Jimmy, Gerstein, Mark

We built whole-genome trees based on the presence or absence of particular molecular features, either orthologs or folds, in the genomes of a number of recently sequenced microorganisms. To put these...

Annotation Transfer for Genomics: Measuring Functional Divergence in Multi-Domain Proteins

Hegyi, Hedi, Gerstein, Mark

Annotation transfer is a principal process in genome annotation. It involves “transferring” structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed...

TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics

Yu, Haiyuan, Zhu, Xiaowei, Greenbaum, Dov, Karro, John, Gerstein, Mark

Biological networks are a topic of great current interest, particularly with the publication of a number of large genome-wide interaction datasets. They are globally characterized by a variety of...

Transmembrane protein domains rarely use covalent domain recombination as an evolutionary mechanism

Liu, Yang, Gerstein, Mark, Engelman, Donald M.

Recombination of evolutionarily unrelated domains is a mechanism often used by evolution to produce variety in soluble proteins. By using a classification of polytopic transmembrane domains into...

CREB Binds to Multiple Loci on Human Chromosome 22

Euskirchen, Ghia, Royce, Thomas E., Bertone, Paul, Martone, Rebecca, Rinn, John L., Nelson, F. Kenneth, ...

The cyclic AMP-responsive element-binding protein (CREB) is an important transcription factor that can be activated by hormonal stimulation and regulates neuronal function and development. An...

Millions of Years of Evolution Preserved: A Comprehensive Catalog of the Processed Pseudogenes in the Human Genome

Zhang, Zhaolei, Harrison, Paul M., Liu, Yin, Gerstein, Mark

Processed pseudogenes were created by reverse-transcription of mRNAs; they provide snapshots of ancient genes existing millions of years ago in the genome. To find them in the present-day human, we...

Annotation Transfer Between Genomes: Protein–Protein Interologs and Protein–DNA Regulogs

Yu, Haiyuan, Luscombe, Nicholas M., Lu, Hao Xin, Zhu, Xiaowei, Xia, Yu, Han, Jing-Dong J., ...

Proteins function mainly through interactions, especially with DNA and other proteins. While some large-scale interaction networks are now available for a number of model organisms, their...

Spectral Biclustering of Microarray Data: Coclustering Genes and Conditions

Kluger, Yuval, Basri, Ronen, Chang, Joseph T., Gerstein, Mark

Global analyses of RNA expression levels are useful for classifying genes and overall phenotypes. Often these classification problems are linked, and one wants to find “marker genes” that are...

Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes

Liu, Yang, Harrison, Paul M, Kunin, Victor, Gerstein, Mark

A comprehensive analysis of the occurrence of pseudogenes in a diverse selection of 64 prokaryote genomes identified around 7,000 candidate pseudogenes. A large fraction of prokaryote pseudogenes...

DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states

White, Eric J., Emanuelsson, Olof, Scalzo, David, Royce, Thomas, Kosak, Steven, Oakeley, Edward J., ...

Duplication of the genome during the S phase of the cell cycle does not occur simultaneously; rather, different sequences are replicated at different times. The replication timing of specific...

Design principles of molecular networks revealed by global comparisons and composite motifs

Yu, Haiyuan, Xia, Yu, Trifonov, Valery, Gerstein, Mark

A global comparison of the four basic molecular networks in yeast - regulatory, co-expression, interaction and metabolic - reveals general design principles.

TOS9 Regulates White-Opaque Switching in Candida albicans

Srikantha, Thyagarajan, Borneman, Anthony R., Daniels, Karla J., Pujol, Claude, Wu, Wei, Seringhaus, Michael R., ...

In Candida albicans, the a1-α2 complex represses white-opaque switching, as well as mating. Based upon the assumption that the a1-α2 corepressor complex binds to the gene that regulates...

ProCAT: a data analysis approach for protein microarrays

Zhu, Xiaowei, Gerstein, Mark, Snyder, Michael

ProCAT, a powerful and flexible new approach for analyzing many types of protein microarrays, is described.

BoCaTFBS: a boosted cascade learner to refine the binding sites suggested by ChIP-chip experiments

Wang, Lu-yong, Snyder, Michael, Gerstein, Mark

BoCaTFBS, a new method that combines noisy data from ChIP-chip experiments with known binding-site patterns, is described and applied to the ENCODE project.

Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation

Karro, John E., Yan, Yangpan, Zheng, Deyou, Zhang, Zhaolei, Carriero, Nicholas, Cayting, Philip, ...

The Pseudogene.org knowledgebase serves as a comprehensive repository for pseudogene annotation. The definition of a pseudogene varies within the literature, resulting in significantly different...

Predicting essential genes in fungal genomes

Seringhaus, Michael, Paccanaro, Alberto, Borneman, Anthony, Snyder, Michael, Gerstein, Mark

Essential genes are required for an organism's viability, and the ability to identify these genes in pathogens is crucial to directed drug development. Predicting essential genes through...

Positional artifacts in microarrays: experimental verification and construction of COP, an automated detection tool

Yu, Haiyuan, Nguyen, Katherine, Royce, Tom, Qian, Jiang, Nelson, Kenneth, Snyder, Michael, ...

Microarray technology is currently one of the most widely-used technologies in biology. Many studies focus on inferring the function of an unknown gene from its co-expressed genes. Here, we are able...

Global Identification and Characterization of Transcriptionally Active Regions in the Rice Genome

Li, Lei, Wang, Xiangfeng, Sasidharan, Rajkumar, Stolc, Viktor, Deng, Wei, He, Hang, ...

Genome tiling microarray studies have consistently documented rich transcriptional activity beyond the annotated genes. However, systematic characterization and transcriptional profiling of the...

Genomic analysis of the hierarchical structure of regulatory networks

Yu, Haiyuan, Gerstein, Mark

A fundamental question in biology is how the cell uses transcription factors (TFs) to coordinate the expression of thousands of genes in response to various stimuli. The relationships between TFs and...

The Importance of Bottlenecks in Protein Networks: Correlation with Gene Essentiality and Expression Dynamics

Yu, Haiyuan, Kim, Philip M, Sprecher, Emmett, Trifonov, Valery, Gerstein, Mark

It has been a long-standing goal in systems biology to find relations between the topological properties and functional features of protein networks. However, most of the focus in network studies has...

Tilescope: online analysis pipeline for high-density tiling microarray data

Zhang, Zhengdong D, Rozowsky, Joel, Lam, Hugo YK, Du, Jiang, Snyder, Michael, Gerstein, Mark

Tilescope is a fully integrated and automated new data-processing pipeline for analyzing high-density tiling-array data.

New insights into Acinetobacter baumannii pathogenesis revealed by high-density pyrosequencing and transposon mutagenesis

Smith, Michael G., Gianoulis, Tara A., Pukatzki, Stefan, Mekalanos, John J., Ornston, L. Nicholas, Gerstein, Mark, ...

Acinetobacter baumannii has emerged as an important and problematic human pathogen as it is the causative agent of several types of infections including pneumonia, meningitis, septicemia, and urinary...

Differential binding of calmodulin-related proteins to their targets revealed through high-density Arabidopsis protein microarrays

Popescu, Sorina C., Popescu, George V., Bachan, Shawn, Zhang, Zimei, Seay, Montrell, Gerstein, Mark, ...

Calmodulins (CaMs) are the most ubiquitous calcium sensors in eukaryotes. A number of CaM-binding proteins have been identified through classical methods, and many proteins have been predicted to...

Normal modes for predicting protein motions: A comprehensive database assessment and associated Web tool

Alexandrov, Vadim, Lehnert, Ursula, Echols, Nathaniel, Milburn, Duncan, Engelman, Donald, Gerstein, Mark

We carry out an extensive statistical study of the applicability of normal modes to the prediction of mobile regions in proteins. In particular, we assess the degree to which the observed motions...

The role of disorder in interaction networks: a structural analysis

Kim, Philip M, Sboner, Andrea, Xia, Yu, Gerstein, Mark

Recent studies have emphasized the value of including structural information into the topological analysis of protein networks. Here, we utilized structural information to investigate the role of...

Transmembrane Protein Oxygen Content and Compartmentalization of Cells

Sasidharan, Rajkumar, Smith, Andrew, Gerstein, Mark

Recently, there was a report that explored the oxygen content of transmembrane proteins over macroevolutionary time scales where the authors observed a correlation between the geological time of...

Modeling ChIP Sequencing In Silico with Applications

Zhang, Zhengdong D., Rozowsky, Joel, Snyder, Michael, Chang, Joseph, Gerstein, Mark

ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate...

Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets

Johnson, David S., Li, Wei, Gordon, D. Benjamin, Bhattacharjee, Arindam, Curry, Bo, Ghosh, Jayati, ...

The most widely used method for detecting genome-wide protein–DNA interactions is chromatin immunoprecipitation on tiling microarrays, commonly known as ChIP-chip. Here, we conducted the first...

YMD: a microarray database for large-scale gene expression analysis

Cheung, Kei-Hoi, White, Kevin, Hager, Janet, Gerstein, Mark, Reinke, Valerie, Nelson, Kenneth, ...

The use of microarray technology to perform parallel analysis of the expression pattern of a large number of genes in a single experiment has created a new frontier of medical research. The vast...

A genomic analysis of RNA polymerase II modification and chromatin architecture related to 3′ end RNA polyadenylation

Lian, Zheng, Karpikov, Alexander, Lian, Jin, Mahajan, Milind C., Hartman, Stephen, Gerstein, Mark, ...

Genomic analyses have been applied extensively to analyze the process of transcription initiation in mammalian cells, but less to transcript 3′ end formation and transcription termination. We used...

Training set expansion: an approach to improving the reconstruction of biological networks from limited and uneven reliable interactions

Yip, Kevin Y., Gerstein, Mark

Motivation: An important problem in systems biology is reconstructing complete networks of interactions between biological objects by extrapolating from a few known interactions as examples. While...

Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes

Balasubramanian, Suganthi, Zheng, Deyou, Liu, Yuen-Jong, Fang, Gang, Frankish, Adam, Carriero, Nicholas, ...

An analysis of ribosomal protein pseudogenes in the four mammalian genomes reveals no correlation between number of pseudogenes and mRNA abundance.

MSB: A mean-shift-based approach for the analysis of structural variation in the genome

Wang, Lu-yong, Abyzov, Alexej, Korbel, Jan O., Snyder, Michael, Gerstein, Mark

Genome structural variation includes segmental duplications, deletions, and other rearrangements, and array-based comparative genomic hybridization (array-CGH) is a popular technology for determining...

MAPK target networks in Arabidopsis thaliana revealed using functional protein microarrays

Popescu, Sorina C., Popescu, George V., Bachan, Shawn, Zhang, Zimei, Gerstein, Mark, Snyder, Michael, ...

Signaling through mitogen-activated protein kinases (MPKs) cascades is a complex and fundamental process in eukaryotes, requiring MPK-activating kinases (MKKs) and MKK-activating kinases (MKKKs)....

Zebrafish miR-1 and miR-133 shape muscle gene expression and regulate sarcomeric actin organization

Mishima, Yuichiro, Abreu-Goodger, Cei, Staton, Alison A., Stahlhut, Carlos, Shou, Chong, Cheng, Chao, ...

microRNAs (miRNAs) represent ∼4% of the genes in vertebrates, where they regulate deadenylation, translation, and decay of the target messenger RNAs (mRNAs). The integrated role of miRNAs to...

An approach to compare genome tiling microarray and MPSS sequencing data for transcript mapping

Sasidharan, Rajkumar, Agarwal, Ashish, Rozowsky, Joel, Gerstein, Mark

We are correcting the abstract of our published article ([1]). The sentence that starts "We observe that 4.5% of MPSS tags...." was not scientifically complete in the original abstract, having only...

mRNA expression profiles show differential regulatory effects of microRNAs between estrogen receptor-positive and estrogen receptor-negative breast cancer

Cheng, Chao, Fu, Xuping, Alves, Pedro, Gerstein, Mark

Most microRNAs have a stronger inhibitory effect in estrogen receptor-negative than in estrogen receptor-positive breast cancers.