Publication View

Approximation and streaming algorithms for histogram construction problems (2008)

Abstract
Histograms are typically used to approximate data distributions. Histograms and related synopsis structures have been successful in a wide variety of popular database applications including approximate querying, similarity searching and data mining. Histograms were a few of the earliest synopsis structures proposed and continue to be popular tools. Typically, the histograms are used as quick and easy estimates, and thus the slight loss of accuracy can be offset by fast histogram construction algorithms. A natural question arises in this context: can we find a fast near optimal approximation algorithm for the histogram construction problem? In this paper, we give the first linear time (1 + ɛ)-factor approximation algorithms (for any ɛ> 0) for several histogram construction problems. Several of our algorithms extend to data streams. We also show that our method generalizes to a large number of histogram construction problems including the use of piecewise small degree polynomials to approximate data. Using synthetic and real-life data sets, we demonstrate that the approximate histograms have almost identical quality in many scenarios and offer significant performance benefits. 1

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.97.2420
Source http://www.cis.upenn.edu/~sudipto/mypapers/histjour.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.129.5879, 10.1.1.102.5483, 10.1.1.52.7679, 10.1.1.21.2565, 10.1.1.24.7941, 10.1.1.13.4827, 10.1.1.36.5401, 10.1.1.7.8618, 10.1.1.29.634, 10.1.1.40.2576, 10.1.1.130.3576, 10.1.1.131.3593, 10.1.1.112.2702, 10.1.1.10.4839, 10.1.1.100.5069, 10.1.1.100.2485, 10.1.1.31.1574, 10.1.1.102.535, 10.1.1.85.4086, 10.1.1.7.4576, 10.1.1.80.9314, 10.1.1.101.6655, 10.1.1.114.2440, 10.1.1.116.8433, 10.1.1.133.3531