International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 117 - Number 23
Year of Publication: 2015
Authors: Kalai Magal. R, Shomona Gracia Jacob
10.5120/20693-3582 |
Kalai Magal. R, Shomona Gracia Jacob. Improved Random Forest Algorithm for Software Defect Prediction through Data Mining Techniques. International Journal of Computer Applications. 117, 23 (May 2015), 18-22. DOI=10.5120/20693-3582
Software defect prediction using classification algorithms has been advocated by many researchers. Moreover, a classifier ensemble can effectively improve classification performance compared with a single classifier. Research on defect prediction using classifier ensemble methods is motivated by the fact that these methods have not yet been fully exploited. Software defects lead to the failure of many defense systems. A comparative study of various classification methods was performed to classify software defects. The methods include Random Tree, Random Forest, Bayesian Network, Naive Bayes, K-Nearest Neighbour and Instance Based Classifier. The Random Forest algorithm was found to give more accurate predictions than the other classifiers. To enhance classification accuracy, a new algorithm, "Improved Random Forest", is proposed. It combines the best feature selection algorithm with Random Forest to give better accuracy. The Correlation-based Feature Subset Selection (CFS) algorithm selects the optimal subset of features, and these optimal features are fed to the Random Forest classifier to give better accuracy in software defect prediction. An optimal subset of six features was selected for the PC1 dataset. The features selected by CFS are used by Random Forest to improve on the accuracy of the existing Random Forest. The experiments were carried out on public NASA datasets from the PROMISE repository.
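The abstract does not spell out how CFS scores a candidate feature subset. The following is a minimal sketch, assuming the standard CFS merit heuristic (reward high average feature-to-class correlation, penalize high average feature-to-feature correlation). The use of absolute Pearson correlation as the correlation measure and of a simple greedy forward search are assumptions made to keep the example short; the paper's actual evaluator and search strategy are not stated here. The helper names (`pearson`, `cfs_merit`, `cfs_forward_select`), the feature names and the toy data are hypothetical and are not taken from PC1.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(subset, columns, target):
    """CFS merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|) for a subset of size k.

    Assumption: absolute Pearson correlation is used as the correlation measure.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Average feature-to-class correlation (relevance).
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    # Average feature-to-feature correlation (redundancy); zero for a single feature.
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, target):
    """Hypothetical greedy forward search: add the feature that raises the merit most,
    and stop as soon as no remaining feature raises it."""
    selected, best_merit = [], 0.0
    remaining = list(columns)
    while remaining:
        best_feature = None
        for f in remaining:
            merit = cfs_merit(selected + [f], columns, target)
            if merit > best_merit:  # strict '>' keeps the earlier feature on ties
                best_feature, best_merit = f, merit
        if best_feature is None:
            break
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected, best_merit


# Hypothetical toy data: 8 modules, defective = 1.
defective = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "loc":          [0, 0, 0, 1, 1, 1, 1, 0],
    "loc_x2":       [0, 0, 0, 2, 2, 2, 2, 0],  # redundant rescaled copy of "loc"
    "branch_count": [0, 0, 1, 0, 1, 1, 0, 1],
    "noise":        [1, 0, 1, 0, 1, 0, 1, 0],  # uncorrelated with the class
}
selected, merit = cfs_forward_select(columns, defective)
print(selected, round(merit, 4))  # ['loc', 'branch_count'] 0.7071
```

On this toy data the search keeps the two relevant, mutually uncorrelated features and rejects both the redundant copy and the noise column. In the approach described above, only the selected columns would then be passed to the Random Forest learner; in this sketch any Random Forest implementation (for example Weka's `RandomForest` or scikit-learn's `RandomForestClassifier`) could fill that role.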