Research Article

Improved Random Forest Algorithm for Software Defect Prediction through Data Mining Techniques

by Kalai Magal. R, Shomona Gracia Jacob
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 117 - Number 23
Year of Publication: 2015
Authors: Kalai Magal. R, Shomona Gracia Jacob
DOI: 10.5120/20693-3582

Kalai Magal. R, Shomona Gracia Jacob. Improved Random Forest Algorithm for Software Defect Prediction through Data Mining Techniques. International Journal of Computer Applications. 117, 23 (May 2015), 18-22. DOI=10.5120/20693-3582

@article{ 10.5120/20693-3582,
author = { Kalai Magal. R and Shomona Gracia Jacob },
title = { Improved Random Forest Algorithm for Software Defect Prediction through Data Mining Techniques },
journal = { International Journal of Computer Applications },
issue_date = { May 2015 },
volume = { 117 },
number = { 23 },
month = { May },
year = { 2015 },
issn = { 0975-8887 },
pages = { 18-22 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume117/number23/20693-3582/ },
doi = { 10.5120/20693-3582 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:00:11.192968+05:30
%A Kalai Magal. R
%A Shomona Gracia Jacob
%T Improved Random Forest Algorithm for Software Defect Prediction through Data Mining Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 117
%N 23
%P 18-22
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Many researchers have advocated software defect prediction using classification algorithms. Moreover, a classifier ensemble can effectively improve classification performance compared to a single classifier. Research on defect prediction using classifier ensemble methods is motivated by the fact that these methods have not been fully exploited. Software defects lead to the failure of many defense systems. A comparative study of various classification methods was performed to classify software defects. The methods include Random Tree, Random Forest, Bayesian Network, Naive Bayes, K-Nearest Neighbour and Instance Based Classifier. The Random Forest algorithm was found to give more accurate predictions than the other classifiers. To enhance classification accuracy, a new algorithm, "Improved Random Forest", is proposed. It works by incorporating the best feature selection algorithm into Random Forest. The Correlation based Feature Subset Selection (CFS) algorithm selects the optimal subset of features, and these features are fed to the Random Forest classifier to give better accuracy in software defect prediction than the existing Random Forest. An optimal subset of six features was selected for the PC1 dataset. The experiments were carried out on the public NASA datasets of the PROMISE repository.
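
The feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly cited CFS merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of a k-feature subset. As simplifications, it uses absolute Pearson correlation and a greedy forward search, and it runs on synthetic data rather than PC1. All function names (`pearson`, `cfs_merit`, `cfs_forward_select`, `make_demo_data`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, columns, labels):
    """Merit of a feature subset: k * r_cf / sqrt(k + k(k-1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each feature and the class label.
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    # Mean absolute correlation between all pairs of selected features.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    if pairs:
        r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, labels):
    """Greedy forward search: add the feature that raises merit most; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        merit, feature = max(
            (cfs_merit(selected + [f], columns, labels), f) for f in remaining
        )
        if merit <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected, best


def make_demo_data(n=500, seed=0):
    """Synthetic stand-in for a defect dataset (assumption: not real PC1 data)."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    f0 = [y + rng.gauss(0, 0.6) for y in labels]   # informative
    f1 = [v + rng.gauss(0, 0.01) for v in f0]      # near-duplicate of f0 (redundant)
    f2 = [rng.gauss(0, 1.0) for _ in range(n)]     # pure noise
    f3 = [y + rng.gauss(0, 0.4) for y in labels]   # informative, independent noise
    return [f0, f1, f2, f3], labels


if __name__ == "__main__":
    columns, labels = make_demo_data()
    selected, merit = cfs_forward_select(columns, labels)
    print("selected feature indices:", selected, "merit:", round(merit, 3))
```

The selected column indices would then be the only inputs passed to a Random Forest classifier. That step is omitted here because it depends on the toolkit used, which the abstract does not name.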

Index Terms

Computer Science
Information Sciences

Keywords

Software Defect Prediction, Feature Selection, Classification, Classifier Evaluation