DOI: 10.1093/bioinformatics/bth065 (2009)
A metric model of amino acid substitution
Integrating shotgun proteomics and mRNA expression data to improve protein identification (2009)
Ramakrishnan, Smriti R., Vogel, Christine, Prince, John T., Li, Zhihua, Penalva, Luiz O., Myers, Margaret, ...
Motivation: Tandem mass spectrometry (MS/MS) offers fast and reliable characterization of complex protein mixtures, but suffers from low sensitivity in protein identification. In a typical shotgun...
Mining gene functional networks to improve mass-spectrometry-based protein identification (2009)
Ramakrishnan, Smriti R., Vogel, Christine, Kwon, Taejoon, Penalva, Luiz O., Marcotte, Edward M., Miranker, Daniel P.
Motivation: High-throughput protein identification experiments based on tandem mass spectrometry (MS/MS) often suffer from low sensitivity and low-confidence protein identifications. In a typical...
Errata for “A Metric Model of Amino Acid Substitution” (2008)
Weijia Xu, Daniel P. Miranker
Due to a programming error, two substitution values in the derived distance metric must be changed in order to form a metric rather than just one as stated in the paper. As a result, the value of the...
Indexing Protein Sequences in Metric Space (2008)
Weijia Xu, Daniel P. Miranker, Rui Mao, Shu Wang
The hyper-exponential growth of biological sequence data and complex queries demand new approaches of managing sequence databases where sequence data is preprocessed off-line and organized in data...
Juan F. Sequeda, Syed Hamid Tirmizi, Daniel P. Miranker
Our position is founded on two assumptions. First, we assume that the SQL data definition language (SQL-DDL) is capable of encoding substantial domain semantics, albeit not in ways syntactically...
Implementing Federated Database Systems by Compiling SchemaSQL (2008)
François Barbançon, Daniel P. Miranker
Federated systems integrating data from multiple sources must cope with semantic heterogeneity by reasoning over both the data and meta-data of their sources. SchemaSQL is one of a number of related...
Willard S. Willard, Rui Mao, Wenguo Liu, Weijia Xu, Shulin Ni, Daniel P. Miranker
mSQL is an extended SQL query language targeting the expanding area of biological sequence databases and sequence analysis methods. The core aspects include first-class data types for biological...
Daniel P. Miranker, Willard J. Briggs, Rui Mao, Shulin Ni, Weijia Xu, Arthur Kaufmann, ...
The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and...
Biclustering as a Method for RNA Local Multiple Sequence Alignment (2008)
Shu Wang, Robin R. Gutell, Daniel P. Miranker
Bioinformatics Advance Access published October 6, 2007 Motivations: Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple...
On Metric-Space Indexing and Real Workloads (2008)
Rui Mao, Ving I. Lei, Smriti Ramakrishnan, Weijia Xu, Daniel P. Miranker
Contemporary technology is fostering new demands to manage large collections of complex data, including the contents of multimedia and biological databases. In many cases the similarity of the data...
SPHINX: Schema Integration by Example (2008)
François Barbançon, Daniel P. Miranker
We focus on the problem of semi-automated query discovery for XML views without requiring the intervention of an expert to guarantee a correct final result. Given multiple independent sources of...
Metric-Space Search of Protein Sequence Databases (2008)
Weijia Xu, Rui Mao, Shu Wang, Daniel P. Miranker
Motivation: Growing sequence databases are instigating sequence retrieval systems that construct k-mer (hot-spot) indexes off-line to speed up the on-line query execution. For fixed k, the content of...
SPHINX: Schema integration by example (2008)
François Barbançon, Daniel P. Miranker
The Internet has instigated a critical need for automated tools that facilitate integrating countless databases. Since non-technical end users are often the ultimate repositories of the domain...
Vasilis Samoladas, Daniel P. Miranker
and Papadimitriou [7] to model data indexing on external memory. Using indexing schemes, the complexity of indexing is quantified by two parameters: storage redundancy and access overhead. There is...
Definition and Application of a GenVoca Component Description Language (2007)
Lane Warshaw, Daniel P. Miranker, Kenneth S. Ulrich
We begin the formalization of the GenVoca method of component based software generation by introducing an architectural description language for GenVoca components. The description language includes...
On a Three-Way Hash Join Algorithm (2007)
Vasilis Samoladas, Daniel P. Miranker
We develop hash-based algorithms for computing a three-way join. The method involves hashing all three relations into buckets, and then joining buckets in main memory, three buckets at a time....
On Indexing Large Databases for Advanced Data Models (2007)
Daniel P. Miranker, Annamaria B. Amenta, James C. Browne, Donald S. Fussell, Dimitris Georgakopoulos, Vasilis Samoladas, ...
Implementing Federated Database Systems by Compiling SchemaSQL (2007)
François Barbançon, Daniel P. Miranker
Federated systems integrating data from multiple sources must cope with semantic heterogeneity by reasoning over both the data and meta-data of their sources. SchemaSQL is one of a number of related...
SPHINX: Schema Integration by Example (2007)
François Barbançon, Daniel P. Miranker
We focus on the problem of semi-automated query discovery for XML views without requiring the intervention of an expert to guarantee a correct final result. Given multiple independent sources of...
Morgan, Xochitl C, Ni, Shulin, Miranker, Daniel P, Iyer, Vishwanath R
Abstract Background Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some...
BMC Bioinformatics BioMed Central (2007)
Xochitl C Morgan, Shulin Ni, Daniel P Miranker, Vishwanath R Iyer
Research article Predicting combinatorial binding of transcription factors to regulatory elements in the human genome by association rule mining
Biclustering as a method for RNA local multiple sequence alignment (2007)
Wang, Shu, Gutell, Robin R., Miranker, Daniel P.
Motivations: Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of...
BIOINFORMATICS ORIGINAL PAPER Data and Text mining (2006)
Smriti R. Ramakrishnan, Rui Mao, Aleksey A. Nakorchevskiy, John T. Prince, Willard S. Willard, Weijia Xu, ...
doi:10.1093/bioinformatics/btl118 A fast coarse filtering method for peptide identification by mass spectrometry
A fast coarse filtering method for peptide identification by mass spectrometry (2006)
Ramakrishnan, Smriti R., Mao, Rui, Nakorchevskiy, Aleksey A., Prince, John T., Willard, Willard S., Xu, Weijia, ...
Motivation: We reformulate the problem of comparing mass-spectra by mapping spectra to a vector space model. Our search method leverages a metric space indexing algorithm to produce an initial...
On Optimizing Distance-Based Similarity Search for Biological Databases (2005)
Rui Mao, Weijia Xu, Smriti Ramakrishnan, Glen Nuckolls, Daniel P. Miranker
Similarity search leveraging distance-based index structures is increasingly being used for both multimedia and biological database applications. We consider distance-based indexing for three...
Case Study: Distance-Based Image Retrieval (2005)
Rui Mao, Qasim Iqbal, Wenguo Liu, Daniel P. Miranker
Similarity search leveraging distance-based index structures is increasingly being used for complex data types. It has been shown that for high dimensional uniform vectors with similarity norms, any...
Daniel P. Miranker, Kathleen S. Barber, Inderjit S. Dhillon, Raymond J. Mooney, Bruce W. Porter, François Gérard Barbanson, ...
approved version of the following dissertation:
DOI: 10.1093/bioinformatics/bth929 (2004)
Weijia Xu, Willard J. Briggs, Joanna Padolina, Ruth E. Timme, Wenguo Liu, Al Linder, ...
Vol. 20 Suppl. 1 2004, pages i355–i362
Interactive Schema Integration with Sphinx (2004)
François Barbançon, Daniel P. Miranker
Abstract. The Internet has instigated a critical need for automated tools that facilitate integrating countless databases. Since non-technical end users are often the ultimate repositories of the...
Biosequence Use Cases in MoBIoS SQL (2004)
Daniel P. Miranker, Willard J Briggs, Rui Mao, Shulin Ni, Weijia Xu
The sequencing and annotation of entire genomes has enriched the content of biological sequence databases such that new methods of sequence analysis, comparison and retrieval are being invented and...
A metric model of amino acid substitution (2004)
Xu, Weijia, Miranker, Daniel P.
Motivation: We address the question of whether there exists an effective evolutionary model of amino-acid substitution that forms a metric-distance function. There is always a trade-off between speed...
Xu, Weijia, Briggs, Willard J., Padolina, Joanna, Timme, Ruth E., Liu, Wenguo, Linder, C. Randal, ...
Motivation: For the purpose of identifying evolutionary reticulation events in flowering plants, we determine a large number of paired, conserved DNA oligomers that may be used as primers to amplify...
An Assessment of a Metric Space Database Index to Support Sequence Homology (2003)
Rui Mao, Weijia Xu, Neha Singh, Daniel P. Miranker
Hierarchical metric-space clustering methods have been commonly used to organize proteomes into taxonomies. Consequently, it is often anticipated that hierarchical clustering can be leveraged as a...
SQL extensions and database mechanisms for managing biosequences (2003)
Willard J Briggs, Wenguo Liu, Rui Mao, Weijia Xu, Daniel P. Miranker
Biologically effective retrieval and analysis of sequences entails much more than finding matching strings. While identification and storage of biological sequences usually comprises long functional...
On a Model of Indexability and its Bounds for Range Queries (2002)
Joseph M. Hellerstein, Elias Koutsoupias, Daniel P. Miranker, Christos H. Papadimitriou, Vasilis Samoladas
On a model of indexability and its bounds for range queries (2002)
Joseph M. Hellerstein, Elias Koutsoupias, Daniel P. Miranker, Christos H. Papadimitriou, Vasilis Samoladas
Abstract. We develop a theoretical framework to characterize the hardness of indexing data sets on block-access memory devices like hard disks. We define an indexing workload by a data set and a set...
VenusIDS: An Active Database Component for Intrusion Detection (1999)
Lane B. Warshaw, Lance Obermeyer, Daniel P. Miranker, Sara P. Matzner
Active-databases are a budding technology where rule-based expert systems can be developed in tight integration with database management systems. This paper presents VenusIDS: an active database...
Rule-Based Query Optimization, Revisited (1999)
Lane B. Warshaw, Daniel P. Miranker
We present an overview and initial performance assessment of a rule-based query optimizer written in VenusDB. VenusDB is an active-database rule language embedded in C++. Following the developments...
Rule-Based Query Optimization, Revisited (1999)
Lane B. Warshaw, Daniel P. Miranker
We present the architecture and a performance assessment of an extensible query optimizer written in Venus. Venus is a general-purpose active-database rule language embedded in C++. Following the...
Vasilis Samoladas, Daniel P. Miranker
Indexing schemes were proposed by Hellerstein, Koutsoupias and Papadimitriou [7] to model data indexing on external memory. Using indexing schemes, the complexity of indexing is quantified by two...
Monitoring Network Logs for Anomalous Activity (1998)
Lane B. Warshaw, Sara P. Matzner, Daniel P. Miranker, Lance Obermeyer, David Spindler
We report on the progress of the VenusDB active-database system as driven by WatchDog, an application in network intrusion detection. The application is typical of a class of problems we coin...
Venus: An Object-Oriented Extension of Rule-Based Programming (1998)
Daniel P. Miranker, Lance Obermeyer, Lane Warshaw, James C. Browne
Declarative programming, in the form of forward-chaining rule languages, offers advantages complementary to procedurally based object-oriented programming languages. The Venus rule language addresses...
Alamo: An Architecture for Integrating Heterogeneous Data Sources (1997)
Daniel P. Miranker, Vasilis Samoladas
We are developing an architecture, Alamo, that addresses both the semantic and physical aspects of data integration. The Alamo architecture permits the interoperability of both data sources and...
Evaluating Triggers Using Decision Trees (1997)
Lance Obermeyer, Daniel P. Miranker
This paper presents an algorithm for implementing rule filtering in active and trigger enabled databases. The algorithm generates one or more decision trees that determine what rules or triggers...
A General Purpose Rule Language as the Basis of a Query Optimizer (1997)
Lane Warshaw, Daniel P. Miranker, Tao Wang
Roberto J. Bayardo, Daniel P. Miranker
Learning during backtrack search is a space-intensive process that records information (such as additional constraints) in order to avoid redundant work. In this paper, we analyze the effects of...
An Overview of the VenusDB Active Multidatabase System (1996)
Daniel P. Miranker, Lance Obermeyer
VenusDB is a C++ embedded, forward-chaining rule language and compiler that includes linguistic elements and runtime support for accessing multiple databases across multiple platforms. Multidatabase...
Processing Queries for First-Few Answers (1996)
Roberto J. Bayardo, Daniel P. Miranker
Special support for quickly finding the first-few answers of a query is already appearing in commercial database systems. This support is useful in active databases, when dealing with potentially...
Roberto Bayardo, Daniel P. Miranker
Learning during backtrack search is a space-intensive process that records information (such as additional constraints) in order to avoid redundant work. In this paper, we analyze the effects of...
Porting an Expert Database Application to an Active Database - An Experience Report (1996)
Lance Obermeyer, Lane Warshaw, Daniel P. Miranker
This paper reports on our experience porting the ALEXSYS program to the Venus active database environment. Our experience shows that a deep understanding of program behavior but only moderate code...
A Case Study of Venus and a Declarative Basis for Rule Modules (1996)
Lane B. Warshaw, Daniel P. Miranker
The Venus Rule Language introduced a declarative basis for structured rule-based programming (as opposed to procedural encapsulation). The method is closely related to the nested transaction model...
Compilation for Critically Constrained Knowledge Bases (1996)
Robert C. Schrag, Daniel P. Miranker
We show that many "critically constrained" Random 3SAT knowledge bases (KBs) and other KBs commonly used as benchmarks for propositional knowledge compilation (KC) can be compiled into...
Selective Indexing Speeds Production Systems (1995)
Lance Obermeyer, Daniel P. Miranker, David Brant
In this paper we present performance results for a production system environment, CLIPS++, that demonstrate the advantage of selectively building and applying simple index structures. We contrast...
On the space-time trade-off in solving constraint satisfaction problems (1995)
Roberto J. Bayardo, Daniel P. Miranker
A common technique for bounding the runtime required to solve a constraint satisfaction problem is to exploit the structure of the problem’s constraint graph [Dechter, 92]. We show that a simple...
Toward Semantic-Based Exploration of Parallelism in Production Systems (1994)
Shiow-yang Wu, Daniel P. Miranker, James C. Browne
We propose a new approach for the parallel execution of production system programs. This approach embodies methods of decomposition abstraction using declarative mechanisms. Application semantics can...
Toward Semantic-Based Parallelism in Production Systems (1994)
Shiow-yang Wu, Daniel P. Miranker, James C. Browne
We propose a new approach for the parallel execution of production system programs. This approach embodies methods of decomposition abstraction using declarative mechanisms. Application semantics can...
A New Approach to Modularity in Rule-Based Programming (1994)
James C. Browne, Allen Emerson, Mohamed G. Gouda, Daniel P. Miranker, Aloysius Mok, Roberto Bayardo Jr., ...
In this paper we describe a purely declarative method for introducing modularity into forward-chaining, rule-based languages and its embodiment in the Venus rule language. The method is enforced by...
Loop optimizations for acyclic object-oriented queries (1992)
Vasilis Samoladas, Daniel P. Miranker
Nested loop execution of object-oriented queries retains the promise of maintaining the full generality of the object paradigm, independent of the specifics of any single object model. Thus, from...
Backtrack-Bounded Search in Polynomial Space
Roberto Bayardo Jr, Daniel P. Miranker
We present and analyze a polynomially space-bounded backtrack algorithm for solving constraint satisfaction problems. We show the algorithm is capable of bounding worst-case runtime almost as...