Research Article

An Efficient Approach Parallel Support Vector Machine for Classification of Diabetes Dataset

by Naveeen Kumar Shrivastava, Praneet Saurabh, Bhupendra Verma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 36 - Number 6
Year of Publication: 2011
Authors: Naveeen Kumar Shrivastava, Praneet Saurabh, Bhupendra Verma
10.5120/4496-6342

Naveeen Kumar Shrivastava, Praneet Saurabh, Bhupendra Verma. An Efficient Approach Parallel Support Vector Machine for Classification of Diabetes Dataset. International Journal of Computer Applications. 36, 6 (December 2011), 19-24. DOI=10.5120/4496-6342

@article{ 10.5120/4496-6342,
author = { Naveeen Kumar Shrivastava and Praneet Saurabh and Bhupendra Verma },
title = { An Efficient Approach Parallel Support Vector Machine for Classification of Diabetes Dataset },
journal = { International Journal of Computer Applications },
issue_date = { December 2011 },
volume = { 36 },
number = { 6 },
month = { December },
year = { 2011 },
issn = { 0975-8887 },
pages = { 19-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume36/number6/4496-6342/ },
doi = { 10.5120/4496-6342 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:22:27.755999+05:30
%A Naveeen Kumar Shrivastava
%A Praneet Saurabh
%A Bhupendra Verma
%T An Efficient Approach Parallel Support Vector Machine for Classification of Diabetes Dataset
%J International Journal of Computer Applications
%@ 0975-8887
%V 36
%N 6
%P 19-24
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper proposes a parallel SVM for predicting the likelihood of diabetes in humans, based on a survey dataset that relates various body parameters to diabetic and non-diabetic persons. The aim is to correctly predict the future possibility of diabetes for any person. Because the survey dataset can be very large, with a large number of parameters, it is difficult to handle with a simple SVM. A parallel SVM concept is therefore proposed that distributes the dataset into n subsets for n different machines, which reduces the computational complexity, processing power, and memory requirements of each machine. The proposed method is simple but quite reliable for parallel operation of SVM and can be used for large and unbalanced datasets. It also provides the flexibility to adapt to the dataset size and to the processors and memory available on different units. We tested the proposed method using MATLAB, and the results are very encouraging.
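
The partition-and-train idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation, and several pieces are assumptions made only for illustration: the stratified round-robin split (the keywords suggest the paper may use K-means clustering instead), the Pegasos-style linear SVM trainer, the majority vote used to combine the n models, the thread pool standing in for n machines, and the synthetic two-feature toy data standing in for the diabetes survey. All function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def train_linear_svm(samples, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient training of a linear SVM.

    Labels must be +1 or -1. A constant 1.0 is appended to every sample
    so that the bias is learned as the last weight.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x, y = data[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def decision(w, x):
    """Signed score of sample x under weight vector w (bias is last)."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def stratified_partition(samples, labels, n_parts):
    """Deal the samples of each class round-robin into n_parts subsets,
    so every subset keeps roughly the original class ratio (assumption)."""
    parts = [([], []) for _ in range(n_parts)]
    seen = {}
    for x, y in zip(samples, labels):
        k = seen.get(y, 0)
        seen[y] = k + 1
        parts[k % n_parts][0].append(x)
        parts[k % n_parts][1].append(y)
    return parts


def train_parallel_svm(samples, labels, n_parts=3):
    """Train one SVM per subset; threads stand in for separate machines."""
    parts = stratified_partition(samples, labels, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        return list(pool.map(lambda p: train_linear_svm(p[0], p[1]), parts))


def predict(models, x):
    """Combine the sub-models by majority vote (assumed combination rule)."""
    votes = sum(1 if decision(w, x) >= 0 else -1 for w in models)
    return 1 if votes >= 0 else -1


def make_toy_data(n_per_class=30, seed=1):
    """Synthetic, well-separated two-class data; NOT the diabetes survey."""
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n_per_class):
        samples.append([2 + rng.uniform(-0.5, 0.5), 2 + rng.uniform(-0.5, 0.5)])
        labels.append(1)
        samples.append([-2 + rng.uniform(-0.5, 0.5), -2 + rng.uniform(-0.5, 0.5)])
        labels.append(-1)
    return samples, labels


if __name__ == "__main__":
    X, y = make_toy_data()
    models = train_parallel_svm(X, y, n_parts=3)
    correct = sum(predict(models, x) == label for x, label in zip(X, y))
    print(f"training accuracy on toy data: {correct / len(X):.2f}")
```

Each sub-model only ever sees its own subset, which is what keeps the per-worker memory and computation small; the price is that the combined classifier is an approximation of an SVM trained on the full dataset.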

Index Terms

Computer Science
Information Sciences

Keywords

Diabetes, Support Vector Machine, K-means Clustering, Parallel Support Vector Machine, Binary Classification