Research Article

A Novel Feature Subset Selection Algorithm for Software Defect Prediction

by Reena P, Binu Rajan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 100 - Number 17
Year of Publication: 2014
Authors: Reena P, Binu Rajan
DOI: 10.5120/17618-8315

Reena P, Binu Rajan. A Novel Feature Subset Selection Algorithm for Software Defect Prediction. International Journal of Computer Applications. 100, 17 (August 2014), 39-43. DOI=10.5120/17618-8315

@article{ 10.5120/17618-8315,
author = { Reena P and Binu Rajan },
title = { A Novel Feature Subset Selection Algorithm for Software Defect Prediction },
journal = { International Journal of Computer Applications },
issue_date = { August 2014 },
volume = { 100 },
number = { 17 },
month = { August },
year = { 2014 },
issn = { 0975-8887 },
pages = { 39-43 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume100/number17/17618-8315/ },
doi = { 10.5120/17618-8315 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:30:13.417331+05:30
%A Reena P
%A Binu Rajan
%T A Novel Feature Subset Selection Algorithm for Software Defect Prediction
%J International Journal of Computer Applications
%@ 0975-8887
%V 100
%N 17
%P 39-43
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Feature subset selection is the process of choosing a subset of good features with respect to the target concept. In this work, a clustering-based feature subset selection algorithm is applied to software defect prediction data sets. The software defect prediction domain was chosen because of the growing importance of maintaining high reliability and quality in any software under development. A software quality prediction model is built using software metrics and defect data collected from a previously developed system release or from similar software projects. Once validated, such a model can be used to predict the fault-proneness of program modules that are currently under development. The proposed clustering-based feature selection algorithm uses a minimum spanning tree based method to cluster features. The algorithm is applied to four different data sets and its impact is analyzed.
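
The abstract does not spell out the steps of the algorithm, so the following is only a minimal sketch of what minimum spanning tree based feature clustering can look like, not the authors' implementation. It assumes discrete feature values, uses symmetric uncertainty as the similarity measure (an assumption borrowed from the general approach of reference 10), and introduces hypothetical names (`select_features`, `cut`) purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(columns, target, cut=0.5):
    """columns: dict of feature name -> discrete values; target: class labels.

    `cut` is a hypothetical threshold: MST edges joining features whose
    SU falls below it are removed, splitting the tree into clusters.
    """
    names = sorted(columns)
    # Complete graph over features; weight = 1 - SU, so similar features are close.
    edges = sorted(
        (1.0 - symmetric_uncertainty(columns[a], columns[b]), a, b)
        for a, b in combinations(names, 2)
    )
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Kruskal's algorithm: keep an edge only if it joins two components.
    mst = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            mst.append((w, a, b))

    # Partition the tree: re-join only strongly related features.
    parent = {n: n for n in names}
    for w, a, b in mst:
        if 1.0 - w >= cut:
            parent[find(a)] = find(b)
    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)

    # One representative per cluster: the feature most related to the class.
    return sorted(
        max(group, key=lambda f: symmetric_uncertainty(columns[f], target))
        for group in clusters.values()
    )
```

Calling `select_features` with a dictionary of discretized software metrics and a list of defect labels returns one feature name per cluster. This sketch only removes redundant features; a fuller version would likely also discard features that are weakly related to the class before building the tree.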

References
  1. Antonio Arauzo-Azofra, Jose Manuel Benitez and Juan Luis Castro, A Feature Set Measure Based On Relief, RASC2004.
  2. Roberto Battiti, Using Mutual Information For Selecting Features In Supervised Neural Net Learning, IEEE Transactions on Neural Networks 1994, VOL 5, NO 4, July 1994.
  3. Hussein Almuallim, Thomas G. Dietterich, Efficient Algorithms For Identifying Relevant Features, 1991.
  4. George Forman, An Extensive Empirical Study Of Feature Selection Metrics For Text Classification, Journal of Machine Learning Research 3 (2003) 1289-1305.
  5. M Scherf, W Brauer, Improving RBF Networks By The Feature Selection Approach EUBAFES.
  6. Guyon I. and Elisseeff A., An introduction to variable and feature selection, Journal of Machine Learning Research, 3, pp 1157-1182, 2003.
  7. Mitchell T. M., Generalization as Search, Artificial Intelligence, 18(2), pp 203-226, 1982.
  8. Dash M. and Liu H., Feature Selection for Classification, Intelligent Data Analysis, 1(3), pp 131-156, 1997.
  9. Langley P., Selection of relevant features in machine learning, In Proceedings of the AAAI.
  10. Qinbao Song, Jingjie Ni and Guangtao Wang, A Fast Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data, IEEE Transactions On Knowledge And Data Engineering Vol:25 No:1 Year 2013.
  11. Tim Menzies, Jeremy Greenwald, and Art Frank "Data Mining Static Code Attributes to Learn Defect Predictors", IEEE Transactions On Software Engineering, Vol. 33, No. 1, January 2007.
  12. Burak Turhan, Ayse Bener, Software Defect Prediction: Heuristics for Weighted Naïve Bayes, Proceedings Of World Congress On Engineering 2009 Vol 1.
Index Terms

Computer Science
Information Sciences

Keywords

Relevant features, Redundant features, Minimum spanning tree, Tree partition, Graph based clustering, Software defect prediction, Naïve Bayes classifier, Decision tree classifier