Research Article

Optimizing k-means for Scalability

by Akansha Agrawal, Shreya Sharma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 120 - Number 17
Year of Publication: 2015
Authors: Akansha Agrawal, Shreya Sharma
DOI: 10.5120/21320-4337

Akansha Agrawal, Shreya Sharma. Optimizing k-means for Scalability. International Journal of Computer Applications. 120, 17 (June 2015), 20-24. DOI=10.5120/21320-4337

@article{ 10.5120/21320-4337,
author = { Akansha Agrawal and Shreya Sharma },
title = { Optimizing k-means for Scalability },
journal = { International Journal of Computer Applications },
issue_date = { June 2015 },
volume = { 120 },
number = { 17 },
month = { June },
year = { 2015 },
issn = { 0975-8887 },
pages = { 20-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume120/number17/21320-4337/ },
doi = { 10.5120/21320-4337 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:06:29.089383+05:30
%A Akansha Agrawal
%A Shreya Sharma
%T Optimizing k-means for Scalability
%J International Journal of Computer Applications
%@ 0975-8887
%V 120
%N 17
%P 20-24
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Proposed decades ago, k-means is still the most popular clustering algorithm. Despite its drawbacks, its advantages keep it attractive, and several studies have sought to alleviate its problems. We suggest some simple modifications that optimize k-means for scalability without much sacrifice in precision. The current shift in the emphasis of data mining towards Big Data requires fast algorithms that scale well. We show how time-tested techniques can be adapted to these changing needs. The implementation results demonstrate the impact that simple modifications can have.

References
  1. J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967.
  2. A. K. Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31:651-666, 2010.
  3. X. Wu et al. Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1):1-37, 2008.
  4. Lozano, J. A., Pena, J. M., Larranaga, P., 1999. An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognition Letters 20, 1027–1040.
  5. E. W. Forgy (1965). "Cluster analysis of multivariate data: efficiency versus interpretability of classifications". Biometrics 21: 768–769.
  6. Kaufman, L., Rousseeuw, P. J., 1990. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, Canada.
  7. Erisoglu, M., Calis, N., Sakallioglu, S., 2011. A new algorithm for initial cluster centers in k-means algorithm. Pattern Recognition Letters 32, 1701–1705.
  8. C. Liu, T. Hu, Y. Ge and H. Xiong, "Which Distance Metric is Right: An Evolutionary K-Means View", Proceedings of the Twelfth SIAM International Conference on Data Mining, Anaheim, California, USA, April 26-28, 2012.
  9. Igor Melnykov, Volodymyr Melnykov. "On K-means algorithm with the use of Mahalanobis distances", Statistics and Probability Letters 84 (2014) 88–95. http://dx.doi.org/10.1016/j.spl.2013.09.026
  10. Grigorios Tzortzis, Aristidis Likas. "The MinMax k-Means clustering algorithm", Pattern Recognition 47 (2014) 2505–2516. http://dx.doi.org/10.1016/j.patcog.2014.01.015
  11. Sadhana Tiwari and Tanu Solanki, "An Optimized Approach for k-means Clustering", International Journal of Computer Applications (0975 – 8887) 9th International ICST Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness (QShine-2013)
  12. A. Singh, A. Yadav and A. Rana, "K-means with Three different Distance Metrics", International Journal of Computer Applications (0975 – 8887) Volume 67, No. 10, April 2013.
  13. M. Ramakrishnan and D. T. Jayaraj, "Modified K-Means Algorithm for Effective Clustering of Categorical Data Sets", International Journal of Computer Applications (0975 – 8887) Volume 89, No. 7, March 2014.
  14. E. H. Ruspini (1970) Numerical methods for fuzzy clustering. Inform. Sci. 2, 319–350.
Index Terms

Computer Science
Information Sciences

Keywords

Data mining, Big Data, k-means