International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 29 - Number 5
Year of Publication: 2011
Authors: Sarojini Balakrishnan, Ramaraj Narayanaswamy, Ilango Paramasivam
DOI: 10.5120/3564-4903
Sarojini Balakrishnan, Ramaraj Narayanaswamy, Ilango Paramasivam. An Empirical Study on the Performance of Integrated Hybrid Prediction Model on the Medical Datasets. International Journal of Computer Applications. 29, 5 (September 2011), 1-6. DOI=10.5120/3564-4903
Medical data are multidimensional, and hundreds of independent features in these high-dimensional databases must be considered and analyzed to extract valuable decision-making information for medical prediction. Most data mining methods depend on a set of features that define the behavior of the learning algorithm and directly or indirectly influence the complexity of the resulting models. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be preprocessed. Feature selection is a preprocessing step that aims to reduce the dimensionality of the data by selecting the most informative features that influence the diagnosis of the disease. We propose a feature-selection-embedded Hybrid Prediction model that combines two different functionalities of data mining: clustering and classification. The F-score feature selection method and k-means clustering select the optimal feature subsets of the medical datasets, which enhance the performance of the Support Vector Machine (SVM) classifier. The performance of the SVM classifier is empirically evaluated on the reduced feature subsets of the Diabetes, Breast Cancer and Heart Disease datasets. The proposed model is validated using four parameters, namely the accuracy of the classifier, the area under the ROC curve, sensitivity and specificity. The results show that the proposed feature-selection-embedded hybrid prediction model indeed improves the predictive power of the classifier and reduces false positive and false negative rates. The proposed method achieves a predictive accuracy of 98.9427% for the diabetes dataset, 99% for the cancer dataset and 100% for the heart disease dataset, the highest predictive accuracy for these datasets compared to other models reported in the literature.
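As an illustration of two ingredients named in the abstract, the following is a minimal sketch of F-score feature ranking and of the sensitivity and specificity measures. The abstract does not give the F-score formula, so the sketch assumes the commonly used two-class definition (squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances); the function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def f_scores(rows, labels):
    """Return one F-score per feature for binary labels (1 = positive, 0 = negative).

    Assumed definition (not stated in the abstract): between-class separation
    of the class means divided by the sum of within-class sample variances.
    Each class needs at least two samples so the sample variance is defined.
    """
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        pos = [row[j] for row, lab in zip(rows, labels) if lab == 1]
        neg = [row[j] for row, lab in zip(rows, labels) if lab == 0]
        overall = mean(column)
        between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
        within = variance(pos) + variance(neg)
        if within > 0:
            scores.append(between / within)
        else:
            # Degenerate case: no spread inside either class.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up toy data: feature 0 separates the classes well, feature 1 barely does.
rows = [[1, 5.0], [2, 5.1], [9, 4.9], [10, 5.0]]
labels = [0, 0, 1, 1]

scores = f_scores(rows, labels)
# Feature indices ordered from most to least discriminative.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

A higher F-score suggests a more discriminative feature, so a subset could be formed by keeping the top-ranked features before clustering and training the classifier. How the paper combines this ranking with k-means and the SVM is not specified in the abstract and is not reproduced here.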