Weighted Proximity Best-Joins for Information Retrieval (2009)
Risi Thonangi, Hao He, Anhai Doan, Haixun Wang
Abstract—We consider the problem of efficiently computing weighted proximity best-joins over multiple lists, with applications in information retrieval and extraction. We are given a...
A Monte Carlo Sampling Framework for Information Recovery (2009)
Junyi Xie, Jun Yang, Yuguo Chen, Haixun Wang, Philip S. Yu
There has been a recent resurgence in research related to noisy and incomplete data. Many applications require information to be recovered from imperfect data. For example, in sensor data processing,...
Efficiently Answering Reachability Queries on Very Large Directed Graphs (2009)
Ruoming Jin, Yang Xiang, Ning Ruan, Haixun Wang
Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social...
Lock-Free Consistency Control for Web 2.0 Applications (2009)
Jiangming Yang, Haixun Wang, Ning Gu, Yiming Liu, Chunsong Wang, Qiwei Zhang
Online collaboration and sharing is the central theme of many web-based services that create the so-called Web 2.0 phenomena. Using the Internet as a computing platform, many Web 2.0 applications set...
Stop Chasing Trends: Discovering High Order Models in Evolving Data (2009)
Shixi Chen, Haixun Wang, Shuigeng Zhou, Philip S. Yu
Abstract — Many applications are driven by evolving data — patterns in web traffic, program execution traces, network event logs, etc., are often non-stationary. Building prediction models for...
Load Shedding in Classifying Multi-Source Streaming Data: A Bayes Risk Approach (2009)
Yijian Bai, Haixun Wang, Carlo Zaniolo
Monitoring multiple streaming sources for collective decision making presents several challenges. First, streaming data are often of large volume, fast speed, and highly bursty nature. Second, it is...
Estimating the Selectivity of XML Path Expression with predicates by Histograms (2008)
Yu Wang, Haixun Wang, Xiaofeng Meng, Shan Wang
Abstract. Selectivity estimation of path expressions in querying XML data plays an important role in query optimization. A path expression may contain multiple branches with predicates, each of which...
Providing Freshness Guarantees for Outsourced Databases (2008)
Database outsourcing becomes increasingly attractive as advances in network technologies eliminate the perceived performance difference between in-house databases and outsourced databases, and price...
In system management applications, an overwhelming amount of data are generated and collected in the form of temporal events. While mining temporal event data to discover interesting and frequent...
Integrity Auditing of Outsourced Data (2008)
An increasing number of enterprises outsource their IT services to third parties who can offer these services for a much lower cost due to economy of scale. Quality of service is a major concern in...
Integrity Auditing of Outsourced Data (2008)
An increasing number of enterprises outsource their IT functions or business processes to third-parties who offer these services with a lower cost due to the economy of scale. Quality of service has...
A Sampling-Based Approach to Information Recovery (2008)
Junyi Xie, Jun Yang, Yuguo Chen, Haixun Wang, Philip S. Yu
Abstract — There has been a recent resurgence of interest in research on noisy and incomplete data. Many applications require information to be recovered from such data. Ideally, an approach for...
A fully distributed framework for cost-sensitive data mining (2008)
Wei Fan, Haixun Wang, Philip S. Yu
In this paper, we propose a fully distributed system (as compared to centralized and partially distributed systems) for cost-sensitive data mining. Experimental results have shown that this approach...
Inductive Learning in Less Than One Sequential Data Scan (2008)
Wei Fan, Haixun Wang, Philip S. Yu
Most recent research of scalable inductive learning on very large dataset, decision tree construction in particular, focuses on eliminating memory constraints and reducing the number of sequential...
Discovery in Multi-Attribute Data with User-defined Constraints (2008)
Chang-shing Perng, Haixun Wang, Sheng Ma, Joseph L. Hellerstein
Pattern-based Similarity Search for Microarray Data (2008)
One fundamental task in near-neighbor search as well as other similarity matching efforts is to find a distance function that can efficiently quantify the similarity between two objects in a...
Yan-nei Law, Haixun Wang, Carlo Zaniolo
We study the fundamental limitations of relational algebra (RA) and SQL in supporting sequence and stream queries, and present effective query language and data model enrichments to deal with them....
A Random Method for Quantifying Changing Distributions in Data Streams (2008)
Abstract. In applications such as fraud and intrusion detection, it is of great interest to measure the evolving trends in the data. We consider the problem of quantifying changes between two...
Haixun Wang, Jian Yin, Philip S. Yu, Jeffrey Xu Yu
Mining data streams of changing class distributions is important for real-time business decision support. The stream classifier must evolve to reflect the current class distribution. This poses a...
A Balanced Ensemble Approach to Weighting Classifiers for Text Classification (2008)
Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Haixun Wang, David W. Cheung, Huan Liu
This paper studies the problem of constructing an effective heterogeneous ensemble classifier for text classification. One major challenge of this problem is to formulate a good combination function,...
Active Mining of Data Streams (2008)
Wei Fan, Yi-an Huang, Haixun Wang, Philip S. Yu
Most previously proposed mining methods on data streams make an unrealistic assumption that “labelled” data stream is readily available and can be mined at anytime. However, in most real-world...
Jiong Yang, Haixun Wang, Wei Wang, Philip S. Yu
Microarrays are one of the latest breakthroughs in experimental molecular biology, which provide a powerful tool by which the expression patterns of thousands of genes can be monitored simultaneously...
Faiz Arni, Shalom Tsur, Haixun Wang, Carlo Zaniolo
This paper describes the LDL++ system and the research advances that have enabled its design and development. We begin by discussing the new nonmonotonic and nondeterministic constructs that extend...
A lack of power and extensibility in their query languages has seriously limited the generality of DBMSs and hampered their ability to support new application domains, such as data mining. In this...
Yun Chi, Haixun Wang, Philip S. Yu, Richard R. Muntz
Information Systems Catch the moment: maintaining closed frequent itemsets over a data stream sliding window
Catch the Moment: Maintaining Closed Frequent Itemsets over a Data Stream Sliding Window (2008)
Yun Chi, Philip S. Yu, Haixun Wang, Richard R. Muntz
This paper considers the problem of mining closed frequent itemsets over a data stream sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the...
Fast computing reachability labelings for large graphs with high compression rate (2008)
Jiefeng Cheng, Jeffrey Xu Yu, Xuemin Lin, Haixun Wang, Philip S. Yu
Abstract. The need of processing graph reachability queries stems from many applications that manage complex data as graphs. The applications include transportation network, Internet traffic...
Database System Extensions for Decision Support: the AXL Approach (2007)
Research on database-centric data mining is seeking to improve the effectiveness of database systems in decision support applications. Different solutions are now used for different problems, including (i)...
Implementation of XY Stratification: An Extension to LDL++ (2007)
The problem of allowing non-monotonic constructs, such as negation and aggregates, in recursive programs represents a difficult challenge faced by current research in deductive...
User Defined Aggregates in LDL++ (2007)
The reason why aggregate is important is twofold. One is that aggregate in deductive database systems introduces a situation that is very similar to negation, since before any aggregate...
A lack of power and extensibility in their query languages has seriously limited the generality of DBMSs and hampered their ability to support new application domains. Considerable efforts by...
Improving Performance of Bicluster Discovery in a Large Data Set (2007)
Jiong Yang, Wei Wang, Haixun Wang, Philip Yu
Microarrays are one of the latest breakthroughs in experimental molecular biology, which provide a powerful tool by which the expression patterns of thousands of genes can be monitored simultaneously...
Wei Fan, Haixun Wang, Philip S. Yu, Shaw-hwa Lo, Salvatore Stolfo
Presently, inductive learning is still performed in a frustrating batch process. The user has little interaction with the system and no control over the final accuracy and training time. If the...
Inductive Learning in Less Than One Sequential Data Scan (2007)
Wei Fan, Haixun Wang, Philip S. Yu
Most recent research of scalable inductive learning on very large dataset, decision tree construction in particular, focuses on eliminating memory constraints and reducing the number of sequential...
The S²-Tree: An index structure for subsequence matching of spatial objects (2007)
Haixun Wang, Chang-shing Perng
Abstract. We present the S²-Tree, an indexing method for subsequence matching of spatial objects. The S²-Tree locates subsequences within a collection of spatial sequences, i.e., sequences made...
Mining Concept-Drifting Data Streams Using Ensemble Classifiers (2007)
Haixun Wang, Wei Fan, Philip S. Yu, Jiawei Han
Recently, mining data streams with concept drifts for actionable insights has become an important and challenging task for a wide range of applications including credit card fraud protection, target...
Database System Extensions for Decision Support: the AXL Approach (2007)
Research on database-centric data mining is seeking to improve the effectiveness of database systems in decision support applications. Different solutions are now used for different problems,...
Logic-Based User-Defined Aggregates for the Next Generation of Database Systems (2007)
Carlo Zaniolo, Haixun Wang
Summary. In this paper, we provide logic-based foundations for the extended aggregate constructs required by advanced database applications. In particular, we focus on data mining applications and...
ATLaS: A Native Extension of SQL for Data Mining (2007)
A lack of power and extensibility in their query languages has seriously limited the generality of DBMSs and hampered their ability to support data mining applications. Thus, there is a pressing need...
Yun Chi, Philip S. Yu, Haixun Wang, ...
been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be...
Kun-lung Wu, Kirsten W. Hildrum, Wei Fan, Gang Luo, Philip S. Yu, Charu C. Aggarwal, ...
In this paper, we describe the challenges of prototyping a reference application on System S, a distributed stream processing middleware under development at IBM Research. With a large number of...
Supporting ranking and clustering as generalized order-by and group-by (2007)
Chengkai Li, Min Wang, Lipyeow Lim, Haixun Wang
The Boolean semantics of SQL queries cannot adequately capture the “fuzzy” preferences and “soft” criteria required in non-traditional data retrieval applications. One way to solve this...
Gstring: A novel approach for efficient search in graph databases (2007)
Haoliang Jiang, Haixun Wang, Philip S. Yu, Shuigeng Zhou
Graphs are widely used for modeling complicated data, including chemical compounds, protein interactions, XML documents, and multimedia. Information retrieval against such data can be formulated as a...
Blinks: Ranked keyword searches on graphs (2007)
Hao He, Haixun Wang, Jun Yang, Philip S. Yu
Query processing over graph-structured data is enjoying a growing number of applications. A top-k keyword search query on a graph finds the top k answers according to some ranking criteria, where each...
Discovering frequent closed partial orders from strings (2006)
Jian Pei, Haixun Wang, Jian Liu, Ke Wang, Jianyong Wang, ...
Abstract—Mining knowledge about ordering from sequence data is an important problem with many applications, such as bioinformatics, Web mining, network management, and intrusion detection. For...
Dual labeling: Answering graph reachability queries in constant time (2006)
Haixun Wang, Hao He, Jun Yang, Philip S. Yu, Jeffrey Xu Yu
Graph reachability is fundamental to a wide range of applications, including XML indexing, geographic navigation, Internet routing, ontology queries based on RDF/OWL, etc. Many applications involve...
Load shedding in classifying multi-source streaming data: A Bayes Risk approach (2006)
Monitoring multiple streaming sources for collective decision making presents several challenges. First, streaming data are often of large volume, fast speed, and highly bursty nature. Second, it is...
Fast computation of reachability labeling for large graphs (2006)
Jiefeng Cheng, Jeffrey Xu Yu, Xuemin Lin, Haixun Wang, Philip S. Yu
There are numerous applications that need to deal with a large graph and need to query reachability between nodes in the graph. A 2-hop cover can compactly represent the whole edge transitive closure...
On the sequencing of tree structures for XML indexing (2005)
Sequence-based XML indexing aims at avoiding expensive join operations in query processing. It transforms structured XML data into sequences so that a structured query can be answered holistically...
Loadstar: A Load Shedding Scheme for Classifying Data Streams (2005)
Yun Chi, Philip S. Yu, Haixun Wang, Richard R. Muntz
We consider the problem of resource allocation in mining multiple data streams. Due to the large volume and the high speed of streaming data, mining algorithms must cope with the effects of system...
A native extension of SQL for mining data streams (2005)
Chang Luo, Hetal Thakkar, Haixun Wang, Carlo Zaniolo
ESL enables users to develop stream applications in an SQL-like
Loadstar: Load shedding in data stream mining (2005)
Yun Chi, Haixun Wang, Philip S. Yu
In this demo, we show that intelligent load shedding is essential in achieving optimum results in mining data streams under various resource constraints. The Loadstar system introduces load shedding...
Query Languages and Data Models for Database Sequences and Data Streams (2004)
Yan-nei Law, Haixun Wang, Carlo Zaniolo
We study the fundamental limitations of relational algebra (RA) and SQL in supporting sequence and stream queries, and present effective query language and data model enrichments to deal with them....
Moment: Maintaining closed frequent itemsets over a stream sliding window (2004)
Yun Chi, Haixun Wang, Philip S. Yu, Richard R. Muntz
This paper considers the problem of mining closed frequent itemsets over a sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window...
Compact reachability labeling for graph-structured data (2004)
Hao He, Haixun Wang, Jun Yang, Philip S. Yu
Testing reachability between nodes in a graph is a well-known problem with many important applications, including knowledge representation, program analysis, and more recently, biological and...
Indexing Weighted-Sequences in Large Databases (2003)
Haixun Wang, Chang-shing Perng, Wei Fan, Sanghyun Park, Philip S. Yu
We present an index structure for managing weighted-sequences in large databases. A weighted-sequence is defined as a two-dimensional structure where each element in the sequence is associated with a...
ViST: a dynamic index method for querying XML data by tree structures (2003)
Haixun Wang, Sanghyun Park, Wei Fan, Philip S. Yu
With the growing importance of XML in data exchange, much research has been done in providing flexible query facilities to extract data from structured XML documents. In this paper, we propose ViST,...
Enhanced biclustering on expression data (2003)
Jiong Yang, Haixun Wang, Wei Wang, Philip Yu, ...
Microarrays are one of the latest breakthroughs in experimental molecular biology, which provide a powerful tool by which the expression patterns of thousands of genes can be monitored simultaneously...
ATLaS: A native extension of SQL for data mining (2003)
A lack of power and extensibility in their query languages has seriously limited the generality of DBMSs and hampered their ability to support data mining applications. Thus, there is a pressing need...
ATLaS: a Small but Complete SQL Extension for Data Mining and Data Streams (2003)
Haixun Wang, Carlo Zaniolo, Chang Richard Luo
DBMSs have long suffered from SQL's lack of power and extensibility. We have implemented ATLAS [1], a powerful database language and system that enables users to develop complete...
Online Mining of Changes from Data Streams: Research Problems and Preliminary Results (2003)
Guozhu Dong, Jiawei Han, Jian Pei, Haixun Wang, ...
As data streams are gaining prominence in a growing number of emerging applications, advanced analysis and mining of data streams is becoming increasingly important. While there are some recent...
The Deductive Database System LDL++ (2002)
Faiz Arni, KayLiang Ong, Shalom Tsur, Haixun Wang, Carlo Zaniolo
This paper describes the LDL++ system and the research advances that have enabled its design and development. We begin by discussing the new nonmonotonic and nondeterministic constructs that extend...
ATLaS is a powerful database language and system that enables users to develop complete data-intensive applications in SQL—by writing new table functions and aggregates in SQL, rather than in...
Clustering by pattern similarity in large data sets (2002)
Haixun Wang, Wei Wang, Jiong Yang, Philip S. Yu
Clustering is the process of grouping a set of objects into classes of similar objects. Although definitions of similarity vary from one clustering model to another, in most of these models the...
δ-cluster: capturing subspace correlation in a large data set (2002)
Jiong Yang, Wei Wang, Haixun Wang, Philip Yu
Clustering has been an active research area of great practical importance for recent years. Most previous clustering models have focused on grouping objects with similar values on a (sub)set of...
Mark Hosang, Wayne Wight, Haixun Wang, Carlo Zaniolo, S. Sarawagi
Non-blocking AGGREGATE myavg(Next Int): Real { TABLE state(sum Int, cnt Int); INITIALIZE: { INSERT INTO state VALUES (Next, 1); } ITERATE: { UPDATE state SET sum=sum+Next...
Naoki Abe, Edwin Pednault, Haixun Wang, Bianca Zadrozny, Wei Fan, Chid Apte
We empirically evaluate the performance of various reinforcement learning methods in applications to sequential targeted marketing. In particular, we propose and evaluate a progression of...
Mining associations by pattern structure in large relational tables (2002)
Haixun Wang, Chang-shing Perng, Sheng Ma, Philip S. Yu
Association rule mining aims at discovering patterns whose support is beyond a given threshold. Mining patterns composed of items described by an arbitrary subset of attributes in a large relational...
The deductive database system LDL++ (2002)
Faiz Arni, Haixun Wang, Carlo Zaniolo
This paper describes the LDL++ system and the research advances that have enabled its design and development. We begin by discussing the new nonmonotonic and nondeterministic constructs that extend...
Extending SQL for decision support applications (2002)
The challenge of extending database systems for decision support applications has been the topic of much recent research—a very incomplete list of previous work includes [11, 8, 12, 4, 10, 5]. Yet,...
The deductive database system LDL++ (2002)
Natraj Arni, Kayliang Ong, Shalom Tsur, Haixun Wang, Carlo Zaniolo
This paper describes the LDL++ system and the research advances that have enabled its design and development. We begin by discussing the new nonmonotonic and nondeterministic constructs that extend...
User-defined aggregates for advanced database applications (2000)
Thesis (Ph. D.)--University of California, Los Angeles, 2000.
CMP: A Fast Decision Tree Classifier Using Multivariate Predictions (2000)
Most decision tree classifiers are designed to keep class histograms for single attributes, and to select a particular attribute for the next split using said histograms. In this paper, we propose a...
Landmarks: a new model for similarity-based pattern querying in time series databases (2000)
Chang-shing Perng, Haixun Wang, Sylvia R. Zhang, D. Stott Parker
In this paper we present the Landmark Model, a model for time series that yields new techniques for similarity-based time series pattern querying. The Landmark Model does not follow traditional...
Nonmonotonic reasoning in LDL++ (2000)
Abstract Deductive database systems have made major advances on efficient support for nonmonotonic reasoning. A first generation of deductive database systems supported the notion of stratification...
User-Defined Aggregates for Advanced Database Applications (2000)
Haixun Wang, Wesley Chu, D. Stott Parker, Edward Stabler
User Defined Aggregates in Object-Relational Systems (2000)
User-defined aggregates are essential in many advanced database applications, particularly in expressing data mining functions, but they find little support in current systems including...
Using SQL to Build New Aggregates and Extenders for Object-Relational Systems (2000)
User-defined Aggregates (UDAs) provide a versatile mechanism for extending the power and applicability of Object-Relational Databases (O-R DBs). In this paper, we describe the AXL system that...
Landmark: A New Technique for Similarity-Based Pattern Querying in Time Series Databases (2000)
Chang-Shing Perng, Haixun Wang, Sylvia R. Zhang, D. Stott Parker
In this paper we present Landmark, a new technique for similarity-based time series pattern querying. Landmark does not follow traditional similarity models which rely on the point-wise Euclidean...
User Defined Aggregates in Database Languages (1999)
Haixun Wang, Carlo Zaniolo
User-defined aggregates can be the linchpin of sophisticated datamining functions and other advanced database applications, but they find little support in current database systems including...
The S²-Tree: An Index Structure for Subsequence Matching of Spatial Objects (1999)
Haixun Wang, Chang-shing Perng
We present the S²-Tree, an indexing method for subsequence matching of spatial objects. The S²-Tree locates subsequences within a collection of spatial sequences, i.e., sequences made up of spatial...
User-Defined Aggregates for Datamining (1999)
Haixun Wang, Carlo Zaniolo
User-defined aggregates can be the linchpin of sophisticated datamining functions and other advanced database applications. This is demonstrated by our efficient implementation on DB2 of SQL3...
User-Defined Aggregates in Database Languages. DBPL 1999: 43-60 (1999)
Abstract. User-defined aggregates (UDAs) can be the linchpin of sophisticated data mining functions and other advanced database applications, but they find little support in current database...
Logic-Based User-Defined Aggregates for the Next Generation of Database Systems
A new wave of database applications, particularly decision-support and data mining applications, are based on complex aggregates not supported by current DBMSs: in fact, SQL2...