Publication View

Effective and Efficient Itemset Pattern Summarization: Regression-based Approaches (2009)

Abstract
In this paper, we propose a set of novel regression-based approaches to effectively and efficiently summarize frequent itemset patterns. Specifically, we show that the problem of minimizing the restoration error for a set of itemsets based on a probabilistic model corresponds to a non-linear regression problem. We show that under certain conditions, we can transform the non-linear regression problem to a linear regression problem. We propose two new methods, k-regression and tree-regression, to partition the entire collection of frequent itemsets in order to minimize the restoration error. The K-regression approach, employing a K-means type clustering method, guarantees that the total restoration error achieves a local minimum. The tree-regression approach employs a decision-tree type of top-down partition process. In addition, we discuss alternatives to estimate the frequency for the collection of itemsets being covered by the k representative itemsets. The experimental evaluation on both real and synthetic datasets demonstrates that our approaches significantly improve the summarization performance in terms of both accuracy (restoration error), and computational cost.

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.5951
Source http://www.cs.kent.edu/~jin/Papers/KDD08.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Keywords Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - Data Mining; General Terms: Algorithms, Performance; Keywords: frequency restoration, pattern summarization, regression
Type text
Language English