Research Article

An Optimized Approach for k-means Clustering

Published on December 2013 by Sadhana Tiwari, Tanu Solanki
ICST Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness
Foundation of Computer Science USA
QSHINE - Number 1
December 2013
Authors: Sadhana Tiwari, Tanu Solanki

Sadhana Tiwari, Tanu Solanki. An Optimized Approach for k-means Clustering. ICST Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness. QSHINE, 1 (December 2013), 5-7.

@article{
author = { Sadhana Tiwari and Tanu Solanki },
title = { An Optimized Approach for k-means Clustering },
journal = { ICST Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness },
issue_date = { December 2013 },
volume = { QSHINE },
number = { 1 },
month = { December },
year = { 2013 },
issn = { 0975-8887 },
pages = { 5-7 },
numpages = 3,
url = { /proceedings/qshine/number1/14406-1302/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 ICST Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness
%A Sadhana Tiwari
%A Tanu Solanki
%T An Optimized Approach for k-means Clustering
%J ICST Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness
%@ 0975-8887
%V QSHINE
%N 1
%P 5-7
%D 2013
%I International Journal of Computer Applications
Abstract

Cluster analysis is one of the principal analytical methods in data mining, and the choice of clustering algorithm directly influences the clustering result. This paper reviews the standard k-means algorithm and analyzes its shortcomings, chiefly that it calculates the distance from every data point to every cluster centre in each iteration, which makes the algorithm inefficient. The paper introduces an optimized algorithm that addresses this problem by using a simple data structure to store some information in each iteration and reusing that information in the next iteration. As a result, the algorithm does not need to calculate the distance from each data point to each cluster centre in every iteration, which saves running time. Experimental results show that the improved algorithm increases both the speed and the accuracy of clustering by reducing the computational complexity of the standard k-means algorithm.
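
The abstract does not say what information the data structure stores, so the following Python sketch is only one plausible reading, not the authors' algorithm. It assumes that each point caches its cluster label and its distance to that cluster's centre. If the point is no farther from its updated centre than the cached distance, it keeps its label and skips the scan over all k centres. The names `kmeans_cached`, `dist`, `mean` and the `full_scans` counter are hypothetical, introduced here for illustration. The skip is a heuristic: another centre may have moved closer in the meantime, so the result can differ from that of standard (Lloyd's) k-means.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans_cached(points, k, max_iter=100, seed=0):
    """k-means that caches each point's label and distance between iterations.

    Returns (centres, labels, full_scans), where full_scans counts how often
    a point had to be compared against all k centres.
    """
    rng = random.Random(seed)
    centres = rng.sample(points, k)    # assumed initialisation: k random points
    labels = [None] * len(points)      # cached cluster index per point
    cached = [math.inf] * len(points)  # cached distance to that centre
    full_scans = 0

    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            if labels[i] is not None:
                d = dist(p, centres[labels[i]])
                if d <= cached[i]:
                    # Own centre is no farther than before: keep the label
                    # and skip the scan over all centres (heuristic).
                    cached[i] = d
                    continue
            # Otherwise fall back to the standard step: scan all k centres.
            full_scans += 1
            ds = [dist(p, c) for c in centres]
            best = min(range(k), key=ds.__getitem__)
            if best != labels[i]:
                changed = True
            labels[i], cached[i] = best, ds[best]

        # Update step, the same as in standard k-means.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centres.append(mean(members) if members else centres[j])
        if not changed and new_centres == centres:
            break
        centres = new_centres
    return centres, labels, full_scans
```

Calling `kmeans_cached(points, k)` on a list of coordinate tuples returns the final centres, the per-point labels, and the number of full scans. Comparing `full_scans` with `len(points) * iterations`, the cost of the standard assignment step, gives a rough measure of how many distance computations the cache avoids on a given data set.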

Index Terms

Computer Science
Information Sciences

Keywords

Cluster Analysis, K-means Clustering, Kd-tree, Lloyd's Algorithm, Standard K-means Algorithm, Constrained K-means Algorithm