Publication View

ABSTRACT Accurate One Pass Decision Tree Construction (2008)

Abstract
In mining continuous data streams, one popular paradigm is using sampling and having a one-pass algorithm with probabilistic bound on the accuracy. A key application of this approach is in decision tree construction. A disadvantage, however, of an algorithm that only achieves a probabilistic bound is that there are no guarantees from the results obtained from a particular application of the algorithm. In this paper, we present an approach for making one-pass decision tree construction more accurate. We present a new algorithm which attempts to determine, and mark as such, exact split points for nodes. This is done by creating a confidence interval near the probabilistic split point for a node, and storing and potentially reassigning samples that belong to this interval. The confidence intervals are constructed using Hoeffding inequality, and can be shrunk as additional samples are processed. We have evaluated our approach using a number of synthetic and real datasets. In summary, our results suggests that the approach can split a large fraction of the nodes exactly, with only modest memory and computational overheads. 1.

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.88.9925
Source http://www.math.ohio-state.edu/~banerjee/anjan/DM/DecisionTree.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.106.9846, 10.1.1.41.6931, 10.1.1.104.152, 10.1.1.119.3124, 10.1.1.42.5335, 10.1.1.21.3216, 10.1.1.19.8648, 10.1.1.40.854, 10.1.1.14.4071, 10.1.1.136.3242, 10.1.1.10.3265, 10.1.1.121.9697, 10.1.1.15.4774, 10.1.1.13.9867, 10.1.1.118.6092, 10.1.1.71.5482