Cukic, Statistical framework for the prediction of fault-proneness (2007)

Abstract
Accurate prediction of fault-prone modules in the software development process enables effective discovery and identification of defects. Such prediction models are especially valuable for large-scale systems, where verification experts need to focus their attention and resources on problem areas in the system under development. This paper presents a methodology for predicting fault-prone modules using a modified random forests algorithm. Random forests improve classification accuracy by growing an ensemble of classification trees and letting them vote on the classification decision. We applied the methodology to five NASA public domain defect data sets. These data sets vary in size, but all typically contain a small number of defect samples in the learning set. For instance, in project PC1, only around 7% of the instances are defects. If maximizing overall accuracy is the goal, then learning from such data usually results in a biased classifier, i.e., the majority of samples would be classified into the non-defect class. To obtain better prediction of fault-proneness, two strategies are investigated: a proper sampling technique in constructing the tree classifiers, and threshold adjustment in determining the winning class. Both are found to be effective in accurately predicting fault-prone modules. In addition, the paper presents a thorough and statistically sound comparison of these methods against ten other classifiers frequently used in the literature.
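
The two strategies named in the abstract can be illustrated with a small sketch. The abstract does not specify how the random forests algorithm was modified, so everything below is an assumption for illustration only: the function names `classify_by_votes` and `balanced_bootstrap`, the 100-module data set with 7 defects (chosen to mirror the roughly 7% defect rate quoted for PC1), and the vote counts are all hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence


def classify_by_votes(votes: Sequence[int], threshold: float = 0.5) -> int:
    """Return 1 (defect) when the share of trees voting 'defect' reaches threshold.

    threshold=0.5 corresponds to plain majority voting; lowering it makes
    the ensemble more willing to flag a module as fault-prone.
    """
    defect_share = sum(votes) / len(votes)
    return 1 if defect_share >= threshold else 0


def balanced_bootstrap(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Draw indices with replacement, taking equally many from each class.

    This is one possible "proper sampling" scheme (an assumption, not
    necessarily the one used in the paper).
    """
    defect_idx = [i for i, y in enumerate(labels) if y == 1]
    clean_idx = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(defect_idx), len(clean_idx))
    return ([rng.choice(defect_idx) for _ in range(n)]
            + [rng.choice(clean_idx) for _ in range(n)])


# Hypothetical learning set: 7 defective modules out of 100.
labels = [1] * 7 + [0] * 93

# A classifier that never predicts "defect" is right on every clean module,
# which is why maximizing overall accuracy favors the majority class.
baseline_accuracy = labels.count(0) / len(labels)

# Hypothetical forest of 100 trees, 30 of which vote "defect" for one module.
votes = [1] * 30 + [0] * 70
default_decision = classify_by_votes(votes)                  # majority vote
adjusted_decision = classify_by_votes(votes, threshold=0.2)  # lowered threshold

# A balanced bootstrap sample gives each tree equal exposure to both classes.
sample = balanced_bootstrap(labels, random.Random(0))
sample_defect_share = sum(labels[i] for i in sample) / len(sample)

print(baseline_accuracy, default_decision, adjusted_decision, sample_defect_share)
```

In this sketch the trivial "always non-defect" rule scores 93% accuracy while finding no defects. Lowering the vote threshold or balancing each tree's sample shifts the decision toward the minority class, at the cost of more false alarms.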

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=?doi=10.1.1.112.5016
Source http://www.hsc.wvu.edu/mbrcc/fs/guolab/publications/software engineering/software 1.pdf
Publisher Idea Group
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.32.9399, 10.1.1.16.3103, 10.1.1.48.8200, 10.1.1.31.1494, 10.1.1.18.5547, 10.1.1.134.2391, 10.1.1.38.929, 10.1.1.37.9873, 10.1.1.104.5130, 10.1.1.79.2439, 10.1.1.124.947