| Data Mining and Knowledge Discovery KL503-02-Mannila-1 September 29, 1997 15:49 (2007) | |||||||||||||||||
Abstract | |||||||||||||||||
| One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class of sentences for defining subgroups of r, and a selection predicate, find all sentences of deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences determine whether is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD. | |||||||||||||||||
Publication details | |||||||||||||||||
| |||||||||||||||||