Research Article

Approximation to the K-Means Clustering Algorithm using PCA

by Sathyendranath Malli, Nagesh H. R., B. Dinesh Rao
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 175 - Number 11
Year of Publication: 2020
Authors: Sathyendranath Malli, Nagesh H. R., B. Dinesh Rao
DOI: 10.5120/ijca2020920605

Sathyendranath Malli, Nagesh H. R., B. Dinesh Rao. Approximation to the K-Means Clustering Algorithm using PCA. International Journal of Computer Applications. 175, 11 (Aug 2020), 43-46. DOI=10.5120/ijca2020920605

@article{ 10.5120/ijca2020920605,
author = { Sathyendranath Malli and Nagesh H. R. and B. Dinesh Rao },
title = { Approximation to the K-Means Clustering Algorithm using PCA },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2020 },
volume = { 175 },
number = { 11 },
month = { Aug },
year = { 2020 },
issn = { 0975-8887 },
pages = { 43-46 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume175/number11/31501-2020920605/ },
doi = { 10.5120/ijca2020920605 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:24:48.299280+05:30
%A Sathyendranath Malli
%A Nagesh H. R.
%A B. Dinesh Rao
%T Approximation to the K-Means Clustering Algorithm using PCA
%J International Journal of Computer Applications
%@ 0975-8887
%V 175
%N 11
%P 43-46
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Healthcare is an emerging domain in which the volume of data grows exponentially. These massive datasets contain a wide variety of fields, which makes the information difficult to analyze. Clustering is a popular method for analyzing data: the data are split into smaller clusters with similar properties, and each cluster is then analyzed. The K-means algorithm [1] is a well-known clustering technique. This paper introduces an efficient approximation to the K-means problem for large data that reduces the number of features to one through Principal Component Analysis (PCA). The reduced data are then clustered in one dimension using the K-means algorithm. The intra-cluster RMS error of the modified algorithm is compared with that of the K-means algorithm in the original m dimensions and is found to be reasonable. The time taken by the modified algorithm is significantly less than that of the K-means algorithm.
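The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the power-iteration PCA, the random initialization, the toy data, and every function name (`first_component`, `project`, `kmeans`, `rms_error`, `pca_kmeans`) are assumptions made for this example. It also assumes that the intra-cluster RMS error is measured in the original m-dimensional space, using the labels obtained from the one-dimensional clustering.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def first_component(rows, iters=100):
    """Estimate the mean and first principal direction by power iteration."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - mean[j] for j in range(m)] for r in rows]
    v = [1.0 / math.sqrt(m)] * m  # arbitrary unit start vector (assumption)
    for _ in range(iters):
        # w = X^T (X v): the unnormalised covariance matrix applied to v
        proj = [sum(c[j] * v[j] for j in range(m)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, v):
    """Map each row to a one-element list holding its score on direction v."""
    return [[sum((x - mu) * d for x, mu, d in zip(r, mean, v))] for r in rows]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations with random initial centers; returns labels."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rms_error(rows, labels, k):
    """Intra-cluster RMS distance to centroids, measured on the given rows."""
    total = 0.0
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        if not members:
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sq_dist(r, centroid) for r in members)
    return math.sqrt(total / len(rows))


def pca_kmeans(rows, k, seed=0):
    """Cluster on the first principal component only; labels index the rows."""
    mean, v = first_component(rows)
    return kmeans(project(rows, mean, v), k, seed=seed)


# Hypothetical toy data: two Gaussian blobs in three dimensions.
rng = random.Random(1)
data = [[c + rng.gauss(0, 0.5) for c in center]
        for center in [(0.0, 0.0, 0.0), (10.0, 10.0, 1.0)] for _ in range(40)]
print("RMS error, K-means on 1 PCA feature:", rms_error(data, pca_kmeans(data, 2), 2))
print("RMS error, K-means on all features: ", rms_error(data, kmeans(data, 2), 2))
```

Both error values are computed on the original rows so that they are directly comparable. Each distance evaluation in the one-dimensional run touches a single feature instead of m, which is where a time saving of the kind the abstract reports would be expected to come from.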

References
  1. S. P. Lloyd, Least squares quantization in PCM, IEEE Trans. Inf. Theory 28(2) (1982) 129-136
  2. D. Arthur, S. Vassilvitskii, k-means++: the advantages of careful seeding, in ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007, pp. 1027–1035
  3. Hotelling H., Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24 (1933), 417–441 and 498–520.
  4. Marco Capó; An efficient approximation to the K-means clustering for massive data, Knowledge-Based Systems 117 (2017) 56–69
  5. Grigorios Tzortzis; The MinMax k-Means clustering algorithm, Pattern Recognition 47 (2014) 2505–2516
  6. Jing Wang; Fast Approximate k-Means via Cluster Closures, 978-1-4673-1228-8/12/2012 IEEE.
  7. Hassan Ismkhan; I-k-means−+: An iterative clustering algorithm based on an enhanced version of the k-means, Pattern Recognition 79 (2018) 402–413
  8. M. E. Celebi, Hassan A, Patricio; A comparative study of efficient initialization methods for the k-means clustering algorithm, Expert Systems with Applications 40 (2013) 200–210
  9. Amir Ahmad, Lipika Dey, A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering 63 (2007) 503–527
  10. Han Xiao, Kashif Rasul, Roland Vollgraf; Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, https://www.researchgate.net/publication/319312259, 2017
  11. S. Sieranoja and P. Fränti, "Fast and general density peaks clustering", Pattern Recognition Letters, 128, 551-558, December 2019
  12. D. Sculley. Web-scale k-means clustering. In Proceedings of the 19th international conference on World wide web, pages 1177–1178. ACM, 2010.
Index Terms

Computer Science
Information Sciences

Keywords

K-means, RMS error, PCA, Approximation