Combining Distributed Memory and Shared Memory Parallelization for Data Mining Algorithms (2008)
Abstract
In this paper, we focus on using a cluster of SMPs for scalable data mining. We have developed distributed memory and shared memory parallelization techniques that are applicable to a number of common data mining algorithms. These techniques are incorporated in a middleware called FREERIDE (FRamework for Rapid Implementations of Datamining Engines). We present an experimental evaluation of our techniques and framework using apriori association mining, k-means clustering, and a decision tree algorithm. We achieve excellent speedups for apriori and k-means, and good distributed memory speedup for decision tree construction. Despite using a common set of techniques and a middleware with a high-level interface, the speedups we achieve compare well against the reported performance of stand-alone implementations of individual parallel data mining algorithms. Overall, our work shows that a common framework can be used for efficiently parallelizing algorithms for different data mining tasks.