Research Article

Approximation to the K-Means Clustering Algorithm using PCA

by Sathyendranath Malli, Nagesh H. R., B. Dinesh Rao
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 175 - Number 11
Year of Publication: 2020
Authors: Sathyendranath Malli, Nagesh H. R., B. Dinesh Rao
10.5120/ijca2020920605

Sathyendranath Malli, Nagesh H. R., B. Dinesh Rao. Approximation to the K-Means Clustering Algorithm using PCA. International Journal of Computer Applications. 175, 11 (Aug 2020), 43-46. DOI=10.5120/ijca2020920605

@article{ 10.5120/ijca2020920605,
author = { Sathyendranath Malli and Nagesh H. R. and B. Dinesh Rao },
title = { Approximation to the K-Means Clustering Algorithm using PCA },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2020 },
volume = { 175 },
number = { 11 },
month = { Aug },
year = { 2020 },
issn = { 0975-8887 },
pages = { 43-46 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume175/number11/31501-2020920605/ },
doi = { 10.5120/ijca2020920605 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:24:48.299280+05:30
%A Sathyendranath Malli
%A Nagesh H. R.
%A B. Dinesh Rao
%T Approximation to the K-Means Clustering Algorithm using PCA
%J International Journal of Computer Applications
%@ 0975-8887
%V 175
%N 11
%P 43-46
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Healthcare is an emerging domain in which data volumes grow exponentially. These massive data sets contain a wide variety of fields, which makes the information difficult to analyze. Clustering is a popular method for analyzing such data: the data is split into smaller clusters with similar properties, and each cluster is then analyzed. The K-means algorithm [1] is a well-known clustering technique. This paper introduces an efficient approximation to the K-means problem, targeted at large data sets, that reduces the number of features to one through Principal Component Analysis (PCA). The reduced data is then clustered in one dimension using the K-means algorithm. The intra-cluster RMS error of the modified algorithm is compared with that of the K-means algorithm in m dimensions and is found to be reasonable. The time taken by the modified algorithm is significantly less than that of the K-means algorithm.
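The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`lloyd_kmeans`, `first_principal_component`, `pca_kmeans`, `intra_cluster_rms`) are hypothetical, the random centroid initialization is an assumption, and the RMS error is taken here as the root of the mean squared distance from each point to its cluster mean in the original m-dimensional space, which may differ from the paper's exact definition. The synthetic data is for illustration only.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-means with random initialization (an assumption)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the centroids that are returned.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids


def first_principal_component(X):
    """Reduce m features to one by projecting onto the top PCA direction."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]


def pca_kmeans(X, k, seed=0):
    """Approximate K-means: cluster the 1-D PCA projection instead of X."""
    z = first_principal_component(X)
    labels, _ = lloyd_kmeans(z[:, None], k, seed=seed)
    return labels


def intra_cluster_rms(X, labels):
    """RMS distance of points to their own cluster mean, measured in X's space."""
    total = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return float(np.sqrt(total / len(X)))


# Illustrative comparison on synthetic data (three Gaussian blobs, m = 4).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(50, 4)) for c in (0.0, 5.0, 10.0)])
full_labels, _ = lloyd_kmeans(X, k=3)
approx_labels = pca_kmeans(X, k=3)
print("RMS, K-means in m dimensions:", intra_cluster_rms(X, full_labels))
print("RMS, K-means on 1-D PCA data:", intra_cluster_rms(X, approx_labels))
```

Both RMS values are computed in the original feature space so that they are directly comparable. The one-dimensional run works on an n-by-1 array, which is where the time saving described in the abstract would come from.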

References
  1. S. P. Lloyd, Least squares quantization in PCM, IEEE Trans. Inf. Theory 28(2) (1982) 129–136
  2. D. Arthur, S. Vassilvitskii, k-means++: the advantages of careful seeding, in ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007, pp. 1027–1035
  3. Hotelling H., Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441, and 498–520.
  4. Marco Capó; An efficient approximation to the K-means clustering for massive data, Knowledge-Based Systems 117 (2017) 56–69
  5. Grigorios Tzortzis; The MinMax k-Means clustering algorithm, Pattern Recognition 47 (2014) 2505–2516
  6. Jing Wang; Fast Approximate k-Means via Cluster Closures, 978-1-4673-1228-8/12/2012 IEEE.
  7. Hassan Ismkhan; I-k-means−+: An iterative clustering algorithm based on an enhanced version of the k-means, Pattern Recognition 79 (2018) 402–413
  8. M. E. Celebi, Hassan A, Patricio; A comparative study of efficient initialization methods for the k-means clustering algorithm, Expert Systems with Applications 40 (2013) 200–210
  9. Amir Ahmad, Lipika Dey, A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering 63 (2007) 503–527
  10. Han Xiao, Kashif Rasul, Roland Vollgraf; Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, https://www.researchgate.net/publication/319312259, 2017
  11. S. Sieranoja and P. Fränti, "Fast and general density peaks clustering", Pattern Recognition Letters, 128, 551-558, December 2019
  12. D. Sculley. Web-scale k-means clustering. In Proceedings of the 19th international conference on World wide web, pages 1177–1178. ACM, 2010.
Index Terms

Computer Science
Information Sciences

Keywords

K-means, RMS error, PCA, Approximation