International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 46 - Number 8
Year of Publication: 2012
Authors: El Sayed Abdel Wahed, Ibrahim Al Emam, Amr Badr
DOI: 10.5120/6928-9371
El Sayed Abdel Wahed, Ibrahim Al Emam, Amr Badr. Feature Selection for Cancer Classification: An SVM based Approach. International Journal of Computer Applications. 46, 8 (May 2012), 20-26. DOI=10.5120/6928-9371
Cancer is an immense health problem in Egypt and a notorious killer, yet its true magnitude remains unknown. It is also a significant health problem in many other developing countries. Better diagnosis and classification can help reduce this burden. Classification is a machine learning technique that assigns data samples to predefined classes. Common classification techniques include Support Vector Machine (SVM), K-Nearest Neighbor (k-NN), and Naive Bayes (NB) classifiers. Feature selection for the classification of cancer data means discovering the feature values and profiles that distinguish diseased from healthy samples, and using this knowledge to predict the state of new samples. In this paper, we propose an approach to feature selection that uses SVM in three different ways. First, SVM is used as a classifier to build a model from the training data; the purpose is to measure the accuracy of the model in predicting the category of the test data, compared with other classifiers. Second, SVM is used as a learner: the data is clustered via K-Means into 3, 4, and 5 clusters, and different classifiers (SVM, k-NN, and NB) are then applied to the clustered data. Two validation methods are used to estimate the accuracy of each classifier: 10-fold cross-validation (CV) and leave-one-out. Third, SVM is used for feature weighting, estimating the importance of each feature relative to a target class. The experimental results show that SVM achieves the best accuracy as a classifier, a learner, and a feature weighting method compared with the other classifiers used in this study.
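The two validation methods named in the abstract, 10-fold cross-validation and leave-one-out, can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library and is not the authors' pipeline. A 1-nearest-neighbour rule stands in for the classifiers, the two-feature toy samples are invented for the example, and the names make_toy_samples, k_fold_indices, predict_1nn and cv_accuracy are hypothetical.

```python
import random


def make_toy_samples(n_per_class=20, seed=0):
    """Build hypothetical two-feature samples in two well-separated classes.

    These values are invented for illustration; they are not cancer data.
    """
    rng = random.Random(seed)
    healthy = [((rng.uniform(0, 1), rng.uniform(0, 1)), "healthy")
               for _ in range(n_per_class)]
    diseased = [((rng.uniform(4, 5), rng.uniform(4, 5)), "diseased")
                for _ in range(n_per_class)]
    return healthy + diseased


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def predict_1nn(train, query):
    """Return the label of the training sample nearest to `query`
    (squared Euclidean distance). Stand-in for SVM, k-NN or NB."""
    nearest = min(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)),
    )
    return nearest[1]


def cv_accuracy(samples, k):
    """Fraction of held-out samples predicted correctly, pooled over k folds.

    Setting k == len(samples) gives leave-one-out validation.
    """
    correct = 0
    for fold in k_fold_indices(len(samples), k):
        held_out = set(fold)
        train = [s for i, s in enumerate(samples) if i not in held_out]
        correct += sum(
            predict_1nn(train, samples[i][0]) == samples[i][1] for i in fold
        )
    return correct / len(samples)


samples = make_toy_samples()
print("10-fold CV accuracy:   ", cv_accuracy(samples, 10))
print("Leave-one-out accuracy:", cv_accuracy(samples, len(samples)))
```

Leave-one-out is the special case of k-fold validation in which each fold holds a single sample, so one function covers both schemes. Swapping predict_1nn for another classifier leaves the validation loop unchanged.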