Approximation to the K-Means Clustering Algorithm using PCA

Sathyendranath Malli; Nagesh H. R.; B. Dinesh Rao

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 175 - Number 11

Year of Publication: 2020

Authors: Sathyendranath Malli, Nagesh H. R., B. Dinesh Rao

10.5120/ijca2020920605

Sathyendranath Malli, Nagesh H. R., B. Dinesh Rao. Approximation to the K-Means Clustering Algorithm using PCA. International Journal of Computer Applications. 175, 11 (Aug 2020), 43-46. DOI=10.5120/ijca2020920605

@article{10.5120/ijca2020920605,
author = { Sathyendranath Malli and Nagesh H. R. and B. Dinesh Rao },
title = { Approximation to the K-Means Clustering Algorithm using PCA },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2020 },
volume = { 175 },
number = { 11 },
month = { Aug },
year = { 2020 },
issn = { 0975-8887 },
pages = { 43-46 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume175/number11/31501-2020920605/ },
doi = { 10.5120/ijca2020920605 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T00:24:48.299280+05:30
%A Sathyendranath Malli
%A Nagesh H. R.
%A B. Dinesh Rao
%T Approximation to the K-Means Clustering Algorithm using PCA
%J International Journal of Computer Applications
%@ 0975-8887
%V 175
%N 11
%P 43-46
%D 2020
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Healthcare is an emerging domain that produces data at an exponentially growing rate. This massive data contains a wide variety of fields, which makes the information difficult to analyze. Clustering is a popular method for analyzing data: the data is split into smaller clusters with similar properties, and each cluster is then analyzed. The K-means algorithm [1] is a well-known clustering technique. This paper introduces an efficient approximation to the K-means problem, targeted at large data sets, that reduces the number of features to one through Principal Component Analysis (PCA). The reduced data is then clustered in one dimension using the K-means algorithm. The intra-cluster RMS error of the modified algorithm is compared with that of the K-means algorithm in m dimensions and is found to be reasonable. The time taken by the modified algorithm is significantly less than that of the K-means algorithm.
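The pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes NumPy, plain Lloyd iterations with random initialization, PCA computed via an SVD of the centered data, and synthetic Gaussian blobs as input. The function names (`pca_first_component`, `kmeans`, `intra_cluster_rms`) and the definition of the RMS error used here (root mean squared distance of each point to its own cluster mean, measured in the original m-dimensional space) are assumptions made for the example.

```python
import numpy as np


def pca_first_component(X):
    """Project the rows of X onto the first principal component (1-D scores)."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations with random initial centers (assumed variant)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).reshape(len(X), -1)  # 1-D input becomes (n, 1)
    centers = X[rng.choice(len(X), size=k, replace=False)]

    def assign(c):
        # Squared Euclidean distance from every point to every center.
        return ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(centers), centers


def intra_cluster_rms(X, labels):
    """RMS distance of each point to the mean of its assigned cluster."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    total = 0.0
    for j in np.unique(labels):
        pts = X[labels == j]
        total += ((pts - pts.mean(axis=0)) ** 2).sum()
    return float(np.sqrt(total / len(X)))


# Synthetic data: three Gaussian blobs in m = 5 dimensions (illustrative only).
rng = np.random.default_rng(42)
blob_means = np.array([[0.0] * 5, [10.0] * 5, [20.0] * 5])
X = np.vstack([mu + rng.normal(size=(200, 5)) for mu in blob_means])

labels_full, _ = kmeans(X, k=3)                      # K-means in m dimensions
labels_1d, _ = kmeans(pca_first_component(X), k=3)   # K-means on the 1-D scores

# Both errors are evaluated in the original space so they are comparable.
rms_full = intra_cluster_rms(X, labels_full)
rms_1d = intra_cluster_rms(X, labels_1d)
print("RMS error, m-dimensional K-means:", rms_full)
print("RMS error, PCA + 1-D K-means:   ", rms_1d)
```

Evaluating both labelings in the original feature space keeps the two error values directly comparable. Any speed-up in this sketch comes from the distance computations in the 1-D run touching a single feature instead of m, at the cost of one SVD up front.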

References

[1] S. P. Lloyd, Least squares quantization in PCM, IEEE Trans. Inf. Theory 28(2) (1982) 129–136.
[2] D. Arthur, S. Vassilvitskii, k-means++: the advantages of careful seeding, in ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007, pp. 1027–1035.
[3] H. Hotelling, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology 24 (1933) 417–441 and 498–520.
[4] Marco Capó; An efficient approximation to the K-means clustering for massive data, Knowledge-Based Systems 117 (2017) 56–69.
[5] Grigorios Tzortzis; The MinMax k-Means clustering algorithm, Pattern Recognition 47 (2014) 2505–2516.
[6] Jing Wang; Fast Approximate k-Means via Cluster Closures, 978-1-4673-1228-8/12, IEEE, 2012.
[7] Hassan Ismkhan; I-k-means−+: An iterative clustering algorithm based on an enhanced version of the k-means, Pattern Recognition 79 (2018) 402–413.
[8] M. E. Celebi, Hassan A, Patricio; A comparative study of efficient initialization methods for the k-means clustering algorithm, Expert Systems with Applications 40 (2013) 200–210.
[9] Amir Ahmad, Lipika Dey, A k-mean clustering algorithm for mixed numeric and categorical data, Data & Knowledge Engineering 63 (2007) 503–527.
[10] Han Xiao, Kashif Rasul, Roland Vollgraf; Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, https://www.researchgate.net/publication/319312259, 2017.
[11] S. Sieranoja, P. Fränti, Fast and general density peaks clustering, Pattern Recognition Letters 128 (December 2019) 551–558.
[12] D. Sculley, Web-scale k-means clustering, in Proceedings of the 19th International Conference on World Wide Web, ACM, 2010, pp. 1177–1178.

Index Terms

Computer Science

Information Sciences

Keywords

K-means, RMS error, PCA, Approximation