Research Article

Cascaded Modeling for PIMA Indian Diabetes Data

by M.S. Barale, D.T. Shirke
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 139 - Number 11
Year of Publication: 2016
Authors: M.S. Barale, D.T. Shirke
DOI: 10.5120/ijca2016909426

M.S. Barale, D.T. Shirke. Cascaded Modeling for PIMA Indian Diabetes Data. International Journal of Computer Applications. 139, 11 (April 2016), 1-4. DOI=10.5120/ijca2016909426

@article{ 10.5120/ijca2016909426,
author = { M.S. Barale and D.T. Shirke },
title = { Cascaded Modeling for PIMA Indian Diabetes Data },
journal = { International Journal of Computer Applications },
issue_date = { April 2016 },
volume = { 139 },
number = { 11 },
month = { April },
year = { 2016 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume139/number11/24531-2016909426/ },
doi = { 10.5120/ijca2016909426 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:40:37.748765+05:30
%A M.S. Barale
%A D.T. Shirke
%T Cascaded Modeling for PIMA Indian Diabetes Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 139
%N 11
%P 1-4
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper develops cascaded models for classifying the PIMA Indian diabetes database. The k-nearest neighbour method is used to impute the missing data, and the processed data are then used for classification. Classification is done in two steps: in the first step, the k-means clustering algorithm is used to extract hidden patterns in the data set; in the second step, a suitable classifier is applied. The k-means algorithm combined with an artificial neural network classifier and the k-means algorithm combined with a logistic regression classifier both achieve classification accuracy above 98%.
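
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the toy data, the function names (`knn_impute`, `kmeans`, `cascade_filter`, `fit_logistic`, `predict`) and all parameter values are hypothetical. In particular, the abstract does not say how the k-means output feeds the classifier; the sketch assumes that records whose class disagrees with the majority class of their cluster are dropped before training, which is only one possible reading of the cascade.

```python
import math
import random


def knn_impute(rows, k=2):
    """Replace each None with the mean of that feature over the k nearest rows."""
    def dist(a, b):
        # Compare only the features observed in both rows.
        sq = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(sq) / len(sq)) if sq else math.inf

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                # Donors: other rows in which feature j is observed.
                donors = sorted(
                    (r for t, r in enumerate(rows) if t != i and r[j] is not None),
                    key=lambda r: dist(row, r),
                )[:k]
                new_row[j] = sum(r[j] for r in donors) / len(donors)
        filled.append(new_row)
    return filled


def kmeans(rows, k=2, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per row."""
    centers = random.Random(seed).sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centers[c])))
            for row in rows
        ]
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cascade_filter(X, y, labels):
    """ASSUMPTION: keep only rows whose class matches their cluster's majority class."""
    majority = {}
    for c in set(labels):
        classes = [t for t, lab in zip(y, labels) if lab == c]
        majority[c] = max(set(classes), key=classes.count)
    keep = [i for i, lab in enumerate(labels) if y[i] == majority[lab]]
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by stochastic gradient ascent; last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for row, target in zip(X, y):
            z = w[-1] + sum(wi * xi for wi, xi in zip(w, row))
            err = target - 1.0 / (1.0 + math.exp(-z))
            for j, xi in enumerate(row):
                w[j] += lr * err * xi
            w[-1] += lr * err
    return w


def predict(w, row):
    return 1 if w[-1] + sum(wi * xi for wi, xi in zip(w, row)) >= 0 else 0


# Hypothetical toy data: two features, None marks a missing value.
rows = [
    [1.0, 1.2], [0.9, None], [1.1, 0.8], [1.3, 1.0],
    [3.0, 3.1], [None, 2.9], [3.2, 3.3], [2.8, 3.0],
    [2.9, 3.2],  # labelled 0 although it lies among the class-1 records
]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0]

filled = knn_impute(rows)                                # preprocessing: imputation
labels = kmeans(filled)                                  # step 1: clustering
X_kept, y_kept = cascade_filter(filled, y, labels)       # assumed link between steps
w = fit_logistic(X_kept, y_kept)                         # step 2: classification
print(len(X_kept), [predict(w, r) for r in X_kept])
```

The toy example only shows how the stages connect. Note that accuracy measured on the retained records alone, as in this sketch, is not comparable to accuracy measured on the full data set.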

Index Terms

Computer Science
Information Sciences

Keywords

Missing data, Clustering, Classification