Combining Distributed Memory and Shared Memory Parallelization for Data Mining Algorithms (2008)

Abstract
In this paper, we focus on using a cluster of SMPs for scalable data mining. We have developed distributed memory and shared memory parallelization techniques that are applicable to a number of common data mining algorithms. These techniques are incorporated in a middleware called FREERIDE (FRamework for Rapid Implementations of Datamining Engines). We present an experimental evaluation of our techniques and framework using apriori association mining, k-means clustering, and a decision tree algorithm. We achieve excellent speedups for apriori and k-means, and good distributed memory speedup for decision tree construction. Despite using a common set of techniques and a middleware with a high-level interface, the speedups we achieve compare well against the reported performance of stand-alone implementations of individual parallel data mining algorithms. Overall, our work shows that a common framework can be used for efficiently parallelizing algorithms for different data mining tasks.

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=?doi=10.1.1.117.9130
Source http://www.cs.kent.edu/~jin/Papers/HPDM-Jin.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English