and G. Wets. Evaluating the performance of cost-based discretization versus entropy- and error-based discretization (2007)
Abstract
Discretization is the process of dividing continuous numeric values into intervals of discrete categorical values. This article introduces cost-based discretization as a preprocessing step to the induction of a classifier, with the aim of obtaining an optimal multi-interval splitting for each numeric attribute. Cost-based discretization is particularly useful when the costs of different kinds of errors are not equal. A transparent description of the method and of the steps involved in cost-based discretization is given. Furthermore, its performance is compared against two other well-known methods, namely entropy-based discretization and pure error-based discretization. To this end, experiments were carried out on several datasets taken from the UCI Repository on Machine Learning. To compare the methods, the area under the Receiver Operating Characteristic (ROC) graph was used, and the differences were tested for statistical significance. For most datasets, the results show that cost-based discretization outperforms entropy- and error-based discretization.
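To make the idea of unequal error costs concrete, here is a minimal sketch of choosing a single cut point on one numeric attribute so that the total misclassification cost is as small as possible. It is an illustration only, not the authors' algorithm: the abstract describes a multi-interval splitting, whereas this sketch is restricted to one cut and a binary class. The function names and the `cost_fp` and `cost_fn` parameters are hypothetical.

```python
# Illustrative sketch only: a single cost-minimizing cut for a binary class.
# Names and parameters are assumptions, not taken from the article.

def interval_cost(labels, cost_fp, cost_fn):
    """Cost of an interval when it is assigned its cheapest class label."""
    pos = sum(labels)           # instances with class 1
    neg = len(labels) - pos     # instances with class 0
    # Labeling the interval 0 misclassifies every positive (false negatives);
    # labeling it 1 misclassifies every negative (false positives).
    return min(pos * cost_fn, neg * cost_fp)

def best_cost_based_cut(values, labels, cost_fp=1.0, cost_fn=1.0):
    """Return (cut, cost) for the cheapest single cut; cut is None if no cut helps."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    best_cut = None
    best_cost = interval_cost(ys, cost_fp, cost_fn)  # cost with no cut at all
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # a cut cannot separate identical values
        cost = (interval_cost(ys[:i], cost_fp, cost_fn)
                + interval_cost(ys[i:], cost_fp, cost_fn))
        if cost < best_cost:
            best_cut = (xs[i - 1] + xs[i]) / 2  # midpoint between neighbours
            best_cost = cost
    return best_cut, best_cost

values = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 1, 0, 1, 1]
print(best_cost_based_cut(values, labels))
print(best_cost_based_cut(values, labels, cost_fp=1.0, cost_fn=5.0))
```

With equal costs this reduces to minimizing the number of errors, which corresponds to the pure error-based criterion mentioned above; the cost parameters only change the outcome when one kind of error is more expensive than the other.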
Publication details