Simon Kasif

Phylogenetic detection of conserved gene clusters in microbial genomes (2009)

Yu Zheng, Brian P Anton, Richard J Roberts, Simon Kasif

Research article, BMC Bioinformatics, BioMed Central

Systems Biology via Redescription and Ontologies (III): Protein Classification using Malaria Parasite’s Temporal Transcriptomic Profiles (2009)

Antonina Mitrofanova, Simon Kasif, Samantha Kleinberg, Bud Mishra, Jane Carlton

This paper addresses the protein classification problem, and explores how its accuracy can be improved by using information from time-course gene expression data. The methods are tested on data from...

Human–Mouse Gene Identification by Comparative Evidence Integration and Evolutionary Analysis (2009)

Lingang Zhang, Vladimir Pavlovic, Charles R. Cantor, Simon Kasif

The identification of genes in the human genome remains a challenge, as the actual predictions appear to disagree tremendously and vary dramatically on the basis of the specific gene-finding...

Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering (2009)

Dotan-Cohen, Dikla, Kasif, Simon, Melkman, Avraham A.

Motivation: There is a growing interest in improving the cluster analysis of expression data by incorporating into it prior knowledge, such as the Gene Ontology (GO) annotations of genes, in order to...

Integration of relational and hierarchical network information for protein function prediction (2008)

Jiang, Xiaoyu, Nariai, Naoki, Steffen, Martin, Kasif, Simon, Kolaczyk, Eric D

Background: In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations,...

Analysis of gene expression in a developmental context emphasizes distinct biological leitmotifs in human cancers (2008)

Naxerova, Kamila, Bult, Carol J, Peaston, Anne, Fancher, Karen, Knowles, Barbara B, Kasif, Simon, ...

Background: In recent years, the molecular underpinnings of the long-observed resemblance between neoplastic and immature tissue have begun to emerge. Genome-wide transcriptional profiling...

The Limits of Multiplex PCR (2008)

John Rachlin, Chunming Ding, Charles Cantor, Simon Kasif

Multiplex PCR is an essential cost-saving technique for large scale genotyping with significant scientific, clinical, and commercial applications including gene expression [1], whole-genome...

Modeling Transcription Factor Motifs (2008)

Ben Kao, Simon Kasif, Deyou Cai, Peter Nelson

Motivation: Generating small models for which there may exist very little training data presents a crucial problem in computational biology, namely the trade-off between model specificity and...

Quantitative Analysis of Single Nucleotide Polymorphisms within Copy Number Variation (2008)

Lee, Soohyun, Kasif, Simon, Weng, Zhiping, Cantor, Charles R.

Background: Single nucleotide polymorphisms (SNPs) have been used extensively in genetics and epidemiology studies. Traditionally, SNPs that did not pass the Hardy-Weinberg equilibrium (HWE) test were...

Integration of relational and hierarchical network information for protein function prediction (2008)

Jiang, Xiaoyu, Nariai, Naoki, Steffen, Martin, Kasif, Simon, Kolaczyk, Eric

Background: In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become...

4 (2007)

Arthur L. Delcher, Simon Kasif, Robert D. Fleischmann, Jeremy Peterson, Owen White, Steven L. Salzberg

A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of...

Induction of Shallow Decision Trees (2007)

David Dobkin, Truxton Fulton, Dimitrios Gunopulos, Simon Kasif, Steven Salzberg

In this paper we describe efficient algorithms that induce shallow (i.e., low depth) decision trees. A key feature of these algorithms is their ability to induce decision trees over real-valued data...

Modeling DNA sequences with Bayes networks (2007)

Deyou Cai, Arthur Delcher, Ben Kao, Simon Kasif

Recent advances in biotechnology have triggered the generation of massive amounts of biological data. The size and complexity of biological sequence databases suggest that automated...

Learning a Hidden Matching Combinatorial Identification of Hidden Matchings with Applications to Whole Genome Sequencing (2007)

Noga Alon, Richard Beigel, Simon Kasif, Steven Rudich, Benny Sudakov

We consider the problem of learning a matching (i.e., a graph in which all vertices have degree 0 or 1) in a model where the only allowed operation is to query whether a set of vertices induces an...

Computational Identification of Operons in Microbial Genomes (2007)

Yu Zheng, Joseph D. Szustakowski, Lance Fortnow, Richard J. Roberts, Simon Kasif

By applying graph representations to biochemical pathways, a new computational pipeline is proposed to find potential operons in microbial genomes. The algorithm relies on the fact that enzyme genes...

Towards An Adaptive Framework for Information Retrieval (2007)

Scott A. Weiss, Simon Kasif, Eric Brill

We report on our investigation into techniques for adaptive information retrieval. We describe our domain of USENET newsgroups, and discuss some of our initial experiments. We illustrate the weakness...

(work on this paper performed during an internship at CRL) (2007)

Beth Logan, Pedro Moreno, Baris Suzek, Zhiping Weng, Simon Kasif, Beth Logan, ...

Functional annotation of newly sequenced genomes is an important challenge for computational biology systems. While much progress has been made towards scaling up experimental methods for functional...

Analysis and Algorithms for Protein Sequence-Structure Alignment by (2007)

Steven Salzberg, David Searls, Simon Kasif, Richard H. Lathrop, Robert G. Rogers, Jadwiga Bienkowska, ...

This chapter discusses analytic and algorithmic results for computational protein structure prediction by protein sequence-structure alignment, an approach also known as protein threading. Biological...

Tel Aviv University (2007)

Richard Beigel, Noga Alon, Mehmet Serkan Apaydin, Lance Fortnow, Simon Kasif

Tettelin et al. proposed a new method for closing the gaps in whole genome shotgun sequencing projects. The method uses a multiplex PCR strategy in order to minimize the time and effort required to...

Learning a Hidden Matching (2007)

Noga Alon, Richard Beigel, Simon Kasif, Steven Rudich, Benny Sudakov

We consider the problem of learning a matching (i.e., a graph in which all vertices have degree 0 or 1) in a model where the only allowed operation is to query whether a set of vertices induces an...

Network-Based Analysis of Affected Biological Processes in Type 2 Diabetes Models (2007)

Manway Liu, Arthur Liberzon, Sek Won Kong, Weil R. Lai, Peter J. Park, Isaac S. Kohane, ...

Type 2 diabetes mellitus is a complex disorder associated with multiple genetic, epigenetic, developmental, and environmental factors. Animal models of type 2 diabetes differ based on diet, drug...

Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles (2007)

Jeremiah J. Faith, Boris Hayete, Joshua T. Thaden, Ilaria Mogno, Jamey Wierzbowski, Guillaume Cottarel, ...

Machine learning approaches offer the potential to systematically identify transcriptional regulatory interactions from a compendium of microarray expression profiles. However, experimental...

Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles (2007)

Jeremiah J. Faith, Boris Hayete, Joshua T. Thaden, Ilaria Mogno, Jamey Wierzbowski, Guillaume Cottarel, ...

Machine learning approaches offer the potential to systematically identify transcriptional regulatory interactions from a compendium of microarray expression profiles. However, experimental...

Hierarchical tree snipping: clustering guided by prior knowledge (2007)

Dotan-Cohen, Dikla, Melkman, Avraham A., Kasif, Simon

Motivation: Hierarchical clustering is widely used to cluster genes into groups based on their expression similarity. This method first constructs a tree. Next this tree is partitioned into subtrees...

Towards the identification of essential genes using targeted genome sequencing and comparative analysis (2006)

Gustafson, Adam M, Snitkin, Evan S, Parker, Stephen CJ, DeLisi, Charles, Kasif, Simon

Background: The identification of genes essential for survival is of theoretical importance in the understanding of the minimal requirements for cellular life, and of practical importance in...

Towards the identification of essential genes using targeted genome sequencing and comparative analysis (2006)

Adam M Gustafson, Evan S Snitkin, Stephen CJ Parker, Charles DeLisi, Simon Kasif, ...

Research article, BMC Genomics, BioMed Central

Genes involved in complex adaptive processes tend to have highly conserved upstream regions in mammalian genomes (2005)

Lee, Soohyun, Kohane, Isaac, Kasif, Simon

Background: Recent advances in genome sequencing suggest a remarkable conservation in gene content of mammalian organisms. The similarity in gene repertoire present in different organisms has...

Phylogenetic detection of conserved gene clusters in microbial genomes (2005)

Zheng, Yu, Anton, Brian P, Roberts, Richard J, Kasif, Simon

Background: Microbial genomes contain an abundance of genes with conserved proximity forming clusters on the chromosome. However, the conservation can be a result of many factors such as...

Computational tradeoffs in multiplex PCR assay design for SNP genotyping (2005)

Rachlin, John, Ding, Chunming, Cantor, Charles, Kasif, Simon

Background: Multiplex PCR is a key technology for detecting infectious microorganisms, whole-genome sequencing, forensic analysis, and for enabling flexible yet low-cost genotyping. However,...

MuPlex: multi-objective multiplex PCR assay design (2005)

John Rachlin, Chunming Ding, Charles Cantor, Simon Kasif

We have developed a web-enabled system called MuPlex that aids researchers in the design of multiplex PCR assays. Multiplex PCR is a key technology for an endless list of applications, including detecting...

Genes involved in complex adaptive processes tend to have highly conserved upstream regions in mammalian genomes (2005)

Lee, Soohyun, Kohane, Isaac, Kasif, Simon

Background: Recent advances in genome sequencing suggest a remarkable conservation in gene content of mammalian organisms. The similarity in gene repertoire present in different organisms has...

MuPlex: multi-objective multiplex PCR assay design (2005)

Rachlin, John, Ding, Chunming, Cantor, Charles, Kasif, Simon

We have developed a web-enabled system called MuPlex that aids researchers in the design of multiplex PCR assays. Multiplex PCR is a key technology for an endless list of applications, including...

GEMS: a web server for biclustering analysis of expression data (2005)

Wu, Chang-Jiun, Kasif, Simon

The advent of microarray technology has revolutionized the search for genes that are differentially expressed across a range of cell types or experimental conditions. Traditional clustering methods,...

Less is more: towards an optimal universal description of protein folds (2005)

Szustakowski, Joseph D., Kasif, Simon, Weng, Zhiping

Motivation: Identification and characterization of protein structure regularities can reveal the mechanisms governing protein structure, function and evolution. Here we focus on an intermediate level...

Segmentally Variable Genes: A New Perspective on Adaptation (2004)

Yu Zheng, Richard J. Roberts, Simon Kasif

Segmentally variable genes show a mosaic pattern of one or more rapidly evolving, variable regions. Discerning their function may provide new insights into the forces that shape genome diversity and...

Segmentally Variable Genes: A New Perspective on Adaptation (2004)

Yu Zheng, Richard J. Roberts, Simon Kasif

Genomic sequence variation is the hallmark of life and is key to understanding diversity and adaptation among the numerous microorganisms on earth. Analysis of the sequenced microbial genomes...

Identification of transcription factor binding sites upstream of human genes regulated by the phosphatidylinositol 3-kinase and MEK/ERK signaling pathways (2004)

John W. Tullai, Michael E. Schaffer, Steven Mullenbrock, Simon Kasif, Geoffrey M. Cooper

We have taken an integrated approach in which expression profiling has been combined with the use of small molecule inhibitors and computational analysis of transcription factor binding sites to...

topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association (2004)

Nathan O. Stitziel, T. Andrew Binkowski, Yan Yuan Tseng, Jie Liang, Simon Kasif

The database of topographic mapping of Single Nucleotide Polymorphism (topoSNP) provides an online resource for analyzing non-synonymous SNPs (nsSNPs) that can be mapped onto known 3D structures of...

Identification of genes with fast-evolving regions in microbial genomes (2004)

Zheng, Yu, Roberts, Richard J., Kasif, Simon

Complete sequences of multiple strains of the same microbial species provide an invaluable source for studying the evolutionary dynamics between orthologous genes over a relatively short time scale....

topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association (2004)

Stitziel, Nathan O., Binkowski, T. Andrew, Tseng, Yan Yuan, Kasif, Simon, Liang, Jie

The database of topographic mapping of Single Nucleotide Polymorphism (topoSNP) provides an online resource for analyzing non‐synonymous SNPs (nsSNPs) that can be mapped onto known 3D structures...

Extracting conserved gene expression motifs from gene expression data (2003)

T. M. Murali, Simon Kasif

We propose a representation for gene expression data called conserved gene expression motifs or xmotifs. A gene’s expression level is conserved across a set of samples if the gene is expressed with...

Predicting (2003)

Stanley Letovsky, Simon Kasif

Vol. 19 Suppl. 1 2003, pages i197–i204

Structural location of disease-associated single-nucleotide polymorphisms (2003)

Nathan O. Stitziel, Yan Yuan Tseng, Dimitri Pervouchine, David Goddeau, Simon Kasif, Jie Liang

Single-nucleotide polymorphisms (SNPs) are the most common form of human genetic variation. The coding regions of the human genome contain about 500,000 SNPs. 1 Among these, the nonsynonymous

Human-Mouse Gene Identification by Comparative Evidence Integration and Evolutionary Analysis (2003)

Zhang, Lingang, Pavlovic, Vladimir, Cantor, Charles R, Kasif, Simon

The identification of genes in the human genome remains a challenge, as the actual predictions appear to disagree tremendously and vary dramatically on the basis of the specific gene-finding...

On the normalization of RNA equilibrium free energy to the length of the sequence (2003)

Pervouchine, Dmitri D., Graber, Joel H., Kasif, Simon

There is no universal definition of stability for RNA secondary structures. Here we present an approach that is based on normalization of the equilibrium free energy to the length of the sequence: a...

Predicting protein function from protein/protein interaction data: a probabilistic approach (2003)

Letovsky, Stanley, Kasif, Simon

Motivation: The development of experimental methods for genome scale analysis of molecular interaction networks has made possible new approaches to inferring protein function. This paper describes a...

Genomic functional annotation using co-evolution profiles of gene clusters (2002)

Zheng, Yu, Roberts, Richard J, Kasif, Simon

Background: The current speed of sequencing already exceeds the capability of annotation, creating a potential bottleneck. A large proportion of the genes in microbial genomes remains...

Learning a hidden matching (2002)

Noga Alon, Richard Beigel, Simon Kasif, Steven Rudich, Benny Sudakov

We consider the problem of learning a matching (i.e., a graph in which all vertices have degree 0 or 1) in a model where the only allowed operation is to query whether a set of vertices...

A comparative genomic method for computational identification of prokaryotic translation initiation sites (2002)

Walker, Megon, Pavlovic, Vladimir, Kasif, Simon

The ever growing number of completely sequenced prokaryotic genomes facilitates cross‐species comparisons by genomic annotation algorithms. This paper introduces a new probabilistic framework...

A Bayesian framework for combining gene predictions (2002)

Pavlovic, Vladimir, Garg, Ashutosh, Kasif, Simon

Motivation: Gene identification and gene discovery in new genomic sequences is one of the most timely computational questions addressed by bioinformatics scientists. This computational research has...

A computational framework for optimal masking in the synthesis of oligonucleotide microarrays (2002)

Kasif, Simon, Weng, Zhiping, Derti, Adnan, Beigel, Richard, DeLisi, Charles

High‐throughput genomic technologies are revolutionizing modern biology. In particular, DNA microarrays have become one of the most powerful tools for profiling global mRNA expression in...

An optimal procedure for gap closing in whole genome shotgun sequencing (2001)

Richard Beigel, Lance Fortnow, Mehmet Serkan Apaydin, Simon Kasif

Tettelin et al. proposed a new method for closing the gaps in whole genome shotgun sequencing projects. The method uses a multiplex PCR strategy in order to minimize the time and effort required to...

Modeling splice sites with Bayes networks (2000)

Cai, Deyou, Delcher, Arthur, Kao, Ben, Kasif, Simon

Motivation: The main goal in this paper is to develop accurate probabilistic models for important functional regions in DNA sequences (e.g. splice junctions that signal the beginning and end of...

Datascope: Mining Biological Sequences (1999)

Simon Kasif

The data-mining community defines mining data as a collection of techniques for extracting knowledge out of large databases. This definition is a bit ambiguous because the “knowledge” extracted from...

Inductive Inference: An Axiomatic Approach (1999)

Itzhak Gilboa, David Schmeidler, Edi Karni, Simon Kasif, Daniel Lehmann, Sujoy Mukerji, ...

A predictor is asked to rank eventualities according to their plausibility, based on past cases. We assume that she can form a ranking given any memory that consists of finitely many past cases. Mild...

An Axiomatic Approach to Inductive Inference: Extended Abstract (1999)

Itzhak Gilboa, David Schmeidler, Edi Karni, Simon Kasif, Daniel Lehmann, Sujoy Mukerji, ...

A predictor is asked to rank eventualities according to their plausibility, based on past cases. We assume that she can form a ranking given any memory that consists of finitely many past cases. Mild...

Optimized multiplex PCR: efficiently closing a whole-genome shotgun sequencing project (1999)

Hervé Tettelin, Diana Radune, Simon Kasif, Hoda Khouri, Steven L. Salzberg

A new method has been developed for rapidly closing a large number of gaps in a whole-genome shotgun sequencing project. The method employs multiplex PCR and a novel pooling strategy to minimize the...

Data Mining Research: Opportunities and Challenges. A Report of three NSF Workshops on Mining Large, Massive, and Distributed (1999)

Robert Grossman, Simon Kasif, Reagan Moore, David Rocke, Jeff Ullman

All opinions, findings, conclusions and recommendations in any material resulting from these workshops are those of the workshops' participants, and do not necessarily reflect the views of the...

Complexity of Connectionist and Constraint-Satisfaction Networks. (1998)

Kasif, Simon

Since the beginning of the funding of the grant, we established a substantial effort in the area of connectionist optimization algorithms, relaxation networks, and geometrical learning algorithms....

Microbial gene identification using interpolated Markov models (1998)

Steven L. Salzberg, Arthur L. Delcher, Simon Kasif, Owen White

This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and other complete microbial genomes, this...

A Probabilistic Framework for Memory-Based Reasoning (1998)

Simon Kasif, Steven Salzberg, David Waltz, John Rachlin, David Aha

In this paper, we propose a probabilistic framework for Memory-Based Reasoning (MBR). The framework allows us to clarify the technical merits and limitations of several recently published MBR methods...

Winnow vs. SMART: A Comparison in the Newsgroup Classification Domain (1997)

Scott A. Weiss, Simon Kasif

We detail our continuing investigation into the problem of classifying USENET postings by newsgroup. Our best results utilize a simple method we called full-text concatenation in the SMART...

Text Classification in USENET Newsgroups: A Progress Report (1996)

Scott A. Weiss, Simon Kasif, Eric Brill

We report on our investigations into topic classification with USENET newsgroups. Our framework is to determine the newsgroup that a new document should be posted to. We train our system by forming...

Committees of Decision Trees (1996)

David Heath, Simon Kasif, Steven Salzberg

Many intelligent systems are designed to sift through a mass of evidence and arrive at a decision. Certain pieces of evidence may be given more weight than others, and this may affect the final...

Local Induction of Decision Trees: Towards Interactive Data Mining (1996)

Truxton Fulton, Simon Kasif, Steven Salzberg, David Waltz

Decision trees are an important data mining tool with many applications. Like many classification techniques, decision trees process the entire data base in order to produce a generalization of the...

Text Classification in USENET Newsgroups: A Progress Report (1996)

Scott Weiss, Simon Kasif, Eric Brill

We report on our investigations into topic classification with USENET newsgroups. Our framework is to determine the newsgroup that a new document should be posted to. We train our system by forming...

Best-Case Results for Nearest Neighbor Learning (1995)

Steven Salzberg, Arthur Delcher, David Heath, Simon Kasif

In this paper we propose a theoretical model for analysis of classification methods, in which the teacher knows the classification algorithm and chooses examples in the best way possible. We apply...

Towards a Framework for Memory-Based Reasoning (1995)

Simon Kasif, Steven Salzberg, David Waltz, John Rachlin, David Aha

In this paper, we propose a broad framework for Memory-Based Reasoning (MBR). The framework allows us to clarify the technical merits and limitations of several current MBR methods and to design new...

Logarithmic-Time Updates and Queries in Probabilistic Networks (1995)

Arthur Delcher, Adam Grove, Simon Kasif, Judea Pearl

Traditional databases commonly support efficient query and update procedures that operate in time which is sublinear in the size of the database. Our goal in this paper is to take a first step toward...

On Reasoning from Data (1995)

David Waltz, Simon Kasif

Introduction Our society is currently entering a new phase where gigabytes of information are being generated, and are becoming readily available for exploration over academic networks, digital...

Logarithmic-time updates and queries in probabilistic networks (1995)

Arthur L. Delcher, Adam J. Grove, Simon Kasif

Traditional databases commonly support efficient query and update procedures that operate in time which is sublinear in the size of the database. Our goal in this paper is to take a first step toward...

Towards a framework for Memory-Based Reasoning (1995)

Simon Kasif, Steven Salzberg, David Waltz, John Rachlin, David Aha

In this paper, we propose a probabilistic framework for Memory-Based Reasoning (MBR). The framework allows us to clarify the technical merits and limitations of several recently published MBR methods...

Towards a better understanding of memory-based and Bayesian classifiers (1994)

John Rachlin, Simon Kasif, Steven Salzberg, David W. Aha

We quantify both experimentally and analytically the performance of memory-based reasoning (MBR) algorithms. To start gaining insight into the capabilities of MBR algorithms, we compare an MBR...

A System for Induction of Oblique Decision Trees (1994)

Sreerama K. Murthy, Simon Kasif, Steven Salzberg

This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in...

Local Consistency in Parallel Constraint-Satisfaction Networks (1994)

Simon Kasif, Arthur L. Delcher

We summarize our work on the parallel complexity of local consistency in constraint networks, and present several basic techniques for achieving parallel execution of constraint networks. We are...

Towards a Better Understanding of Memory-Based Reasoning Systems (1994)

John Rachlin, Simon Kasif, Steven Salzberg, David W. Aha

We quantify both experimentally and analytically the performance of memory-based reasoning (MBR) algorithms. To start gaining insight into the capabilities of MBR algorithms, we compare an MBR...

A system for induction of oblique decision trees (1994)

Sreerama K. Murthy, Simon Kasif, Steven Salzberg

This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the...

OC1: A randomized algorithm for building oblique decision trees (1993)

Sreerama K. Murthy, Simon Kasif, Steven Salzberg, Richard Beigel

This paper introduces OC1, a new algorithm for generating multivariate decision trees. Multivariate trees classify examples by testing linear combinations of the features at each non-leaf node of the...

Induction of Oblique Decision Trees (1993)

David Heath, Simon Kasif, Steven Salzberg

This paper introduces a randomized technique for partitioning examples using oblique hyperplanes. Standard decision tree techniques, such as ID3 and its descendants, partition a set of points with...

Protein Secondary Structure Modelling with Probabilistic Networks (Extended Abstract) (1993)

Arthur L. Delcher, Simon Kasif, Harry R. Goldberg, William H. Hsu

In this paper we study the performance of probabilistic networks in the context of protein sequence analysis in molecular biology. Specifically, we report the results of our initial experiments...

OC1: Randomized induction of oblique decision trees (1993)

Sreerama Murthy, Simon Kasif, Steven Salzberg, Richard Beigel

This paper introduces OC1, a new algorithm for generating multivariate decision trees. Multivariate trees classify examples by testing linear combinations of the features at each non-leaf node of the...

Learning nested concept classes with limited storage (1991)

David Heath, Simon Kasif, S. Rao Kosaraju, Steven Salzberg, Gregory Sullivan

Many existing learning methods use incremental algorithms that construct a generalization in one pass through a set of training data and modify it in subsequent passes (e.g., perceptrons, neural...

Formula dissection: A parallel algorithm for constraint satisfaction (1987)

John H Reif, Simon Kasif, Deepak Sherlekar

Many well-known problems in Artificial Intelligence can be formulated in terms of systems of constraints. The problem of testing the satisfiability of propositional formulae (SAT) is of special...

Genomic functional annotation using co-evolution profiles of gene clusters

Zheng, Yu, Roberts, Richard J, Kasif, Simon

The current speed of sequencing already exceeds the capability of annotation. A new method for functional annotation is proposed using the conservation patterns of gene clusters and has been applied...

A comparative genomic method for computational identification of prokaryotic translation initiation sites

Walker, Megon, Pavlovic, Vladimir, Kasif, Simon

The ever growing number of completely sequenced prokaryotic genomes facilitates cross-species comparisons by genomic annotation algorithms. This paper introduces a new probabilistic framework for...

A computational framework for optimal masking in the synthesis of oligonucleotide microarrays

Kasif, Simon, Weng, Zhiping, Derti, Adnan, Beigel, Richard, DeLisi, Charles

High-throughput genomic technologies are revolutionizing modern biology. In particular, DNA microarrays have become one of the most powerful tools for profiling global mRNA expression in different...

On the normalization of RNA equilibrium free energy to the length of the sequence

Pervouchine, Dmitri D., Graber, Joel H., Kasif, Simon

There is no universal definition of stability for RNA secondary structures. Here we present an approach that is based on normalization of the equilibrium free energy to the length of the sequence: a...

Computational Identification of Operons in Microbial Genomes

Zheng, Yu, Szustakowski, Joseph D., Fortnow, Lance, Roberts, Richard J., Kasif, Simon

By applying graph representations to biochemical pathways, a new computational pipeline is proposed to find potential operons in microbial genomes. The algorithm relies on the fact that enzyme genes...

topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association

Stitziel, Nathan O., Binkowski, T. Andrew, Tseng, Yan Yuan, Kasif, Simon, Liang, Jie

The database of topographic mapping of Single Nucleotide Polymorphism (topoSNP) provides an online resource for analyzing non-synonymous SNPs (nsSNPs) that can be mapped onto known 3D structures of...

Whole-genome annotation by using evidence integration in functional-linkage networks

Karaoz, Ulas, Murali, T. M., Letovsky, Stan, Zheng, Yu, Ding, Chunming, Cantor, Charles R., ...

The advent of high-throughput biology has catalyzed a remarkable improvement in our ability to identify new genes. A large fraction of newly discovered genes have an unknown functional role,...

Segmentally Variable Genes: A New Perspective on Adaptation

Zheng, Yu, Roberts, Richard J, Kasif, Simon

Genomic sequence variation is the hallmark of life and is key to understanding diversity and adaptation among the numerous microorganisms on earth. Analysis of the sequenced microbial genomes...

Human–Mouse Gene Identification by Comparative Evidence Integration and Evolutionary Analysis

Zhang, Lingang, Pavlovic, Vladimir, Cantor, Charles R, Kasif, Simon

The identification of genes in the human genome remains a challenge, as the actual predictions appear to disagree tremendously and vary dramatically on the basis of the specific gene-finding...

GC/AT-content spikes as genomic punctuation marks

Zhang, Lingang, Kasif, Simon, Cantor, Charles R., Broude, Natalia E.

Large-scale analysis of the GC-content distribution at the gene level reveals both common features and basic differences in genomes of different groups of species. Sharp changes in GC content are...

Identification of genes with fast-evolving regions in microbial genomes

Zheng, Yu, Roberts, Richard J., Kasif, Simon

Complete sequences of multiple strains of the same microbial species provide an invaluable source for studying the evolutionary dynamics between orthologous genes over a relatively short time scale....

Characterization of Two New Aminopeptidases in Escherichia coli

Zheng, Yu, Roberts, Richard J., Kasif, Simon, Guan, Chudi

Two genes in the Escherichia coli genome, ypdE and ypdF, have been cloned and expressed, and their products have been purified. YpdF is shown to be a metalloenzyme with Xaa-Pro aminopeptidase...

MuPlex: multi-objective multiplex PCR assay design

Rachlin, John, Ding, Chunming, Cantor, Charles, Kasif, Simon

We have developed a web-enabled system called MuPlex that aids researchers in the design of multiplex PCR assays. Multiplex PCR is a key technology for an endless list of applications, including...

GEMS: a web server for biclustering analysis of expression data

Wu, Chang-Jiun, Kasif, Simon

The advent of microarray technology has revolutionized the search for genes that are differentially expressed across a range of cell types or experimental conditions. Traditional clustering methods,...

Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles

Faith, Jeremiah J, Hayete, Boris, Thaden, Joshua T, Mogno, Ilaria, Wierzbowski, Jamey, Cottarel, Guillaume, ...

Machine learning approaches offer the potential to systematically identify transcriptional regulatory interactions from a compendium of microarray expression profiles. However, experimental...

Biological context networks: a mosaic view of the interactome

Rachlin, John, Cohen, Dikla Dotan, Cantor, Charles, Kasif, Simon

Network models are a fundamental tool for the visualization and analysis of molecular interactions occurring in biological systems. While broadly illuminating the molecular machinery of the cell,...

Probabilistic Protein Function Prediction from Heterogeneous Genome-Wide Data

Nariai, Naoki, Kolaczyk, Eric D., Kasif, Simon

Dramatic improvements in high throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not...

Network-Based Analysis of Affected Biological Processes in Type 2 Diabetes Models

Liu, Manway, Liberzon, Arthur, Kong, Sek Won, Lai, Weil R, Park, Peter J, Kohane, Isaac S, ...

Type 2 diabetes mellitus is a complex disorder associated with multiple genetic, epigenetic, developmental, and environmental factors. Animal models of type 2 diabetes differ based on diet, drug...

Quantifying DNA–protein binding specificities by using oligonucleotide mass tags and mass spectroscopy

Zhang, Lingang, Kasif, Simon

The ability to determine the relative binding affinity of different transcription-factors (TF) to their DNA binding sites is fundamentally important for a comprehensive understanding of gene...

Analysis of gene expression in a developmental context emphasizes distinct biological leitmotifs in human cancers

Naxerova, Kamila, Bult, Carol J, Peaston, Anne, Fancher, Karen, Knowles, Barbara B, Kasif, Simon, ...

A systematic analysis of the relationship between the neoplastic and developmental transcriptome provides an outline of global trends in cancer gene expression.

RimO, a MiaB-like enzyme, methylthiolates the universally conserved Asp88 residue of ribosomal protein S12 in Escherichia coli

Anton, Brian P., Saleh, Lana, Benner, Jack S., Raleigh, Elisabeth A., Kasif, Simon, Roberts, Richard J.

Ribosomal protein S12 undergoes a unique posttranslational modification, methylthiolation of residue D88, in Escherichia coli and several other bacteria. Using mass spectrometry, we have identified...

Genomewide Analysis of PRC1 and PRC2 Occupancy Identifies Two Classes of Bivalent Domains

Ku, Manching, Koche, Richard P., Rheinbay, Esther, Mendenhall, Eric M., Endoh, Mitsuhiro, Mikkelsen, Tarjei S., ...

In embryonic stem (ES) cells, bivalent chromatin domains with overlapping repressive (H3 lysine 27 tri-methylation) and activating (H3 lysine 4 tri-methylation) histone modifications mark the...

Nathan O. Stitziel, Yan Yuan Tseng, Dimitri Pervouchine, David Goddeau, Simon Kasif, Jie Liang

Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering

Dotan-Cohen, Dikla, Kasif, Simon, Melkman, Avraham A.

Motivation: There is a growing interest in improving the cluster analysis of expression data by incorporating into it prior knowledge, such as the Gene Ontology (GO) annotations of genes, in order to...

Triplet repeat length bias and variation in the human transcriptome

Molla, Michael, Delcher, Arthur, Sunyaev, Shamil, Cantor, Charles, Kasif, Simon

Length variation in short tandem repeats (STRs) is an important family of DNA polymorphisms with numerous applications in genetics, medicine, forensics, and evolutionary analysis. Several major...

Integration of heterogeneous expression data sets extends the role of the retinol pathway in diabetes and insulin resistance

Park, Peter J., Kong, Sek Won, Tebaldi, Toma, Lai, Weil R., Kasif, Simon, Kohane, Isaac S.

Motivation: Type 2 diabetes is a chronic metabolic disease that involves both environmental and genetic factors. To understand the genetics of type 2 diabetes and insulin resistance, the DIabetes...