Research Article

A Novel Clustering Algorithm using K-means (CUK)

by Khaled W. Alnaji, Wesam M. Ashour
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 25 - Number 1
Year of Publication: 2011
Authors: Khaled W. Alnaji, Wesam M. Ashour
DOI: 10.5120/2995-4025

Khaled W. Alnaji, Wesam M. Ashour. A Novel Clustering Algorithm using K-means (CUK). International Journal of Computer Applications. 25, 1 (July 2011), 25-30. DOI=10.5120/2995-4025

@article{ 10.5120/2995-4025,
author = { Khaled W. Alnaji and Wesam M. Ashour },
title = { A Novel Clustering Algorithm using K-means (CUK) },
journal = { International Journal of Computer Applications },
issue_date = { July 2011 },
volume = { 25 },
number = { 1 },
month = { July },
year = { 2011 },
issn = { 0975-8887 },
pages = { 25-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume25/number1/2995-4025/ },
doi = { 10.5120/2995-4025 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:10:39.580012+05:30
%A Khaled W. Alnaji
%A Wesam M. Ashour
%T A Novel Clustering Algorithm using K-means (CUK)
%J International Journal of Computer Applications
%@ 0975-8887
%V 25
%N 1
%P 25-30
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

K-means is one of the best-known methods for partitioning a data set into clusters, but it performs poorly when clusters differ in size and density, and it converges to one of many local minima. Many methods have been proposed to overcome these limitations, but most of them do not handle differences in density and in size at the same time: they succeed with one and fail with the other. In this paper we propose a novel clustering algorithm using K-means (CUK). The proposed algorithm clusters the data objects with K-means using one additional centroid, followed by several partitioning and merging steps. The merging decision depends on the average mean distance, that is, the average distance between each cluster mean and each data object: the clusters that are closest in average mean distance are merged into one cluster. This process continues until the required number of final clusters is reached. In a comparison with K-means, the results obtained by the proposed algorithm CUK were more effective and accurate.
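
The abstract gives only an outline of CUK, so the following is a minimal sketch of one possible reading of it, not the authors' implementation. It runs plain K-means with k + 1 starting centroids and then merges the pair of clusters with the smallest merge score until k clusters remain. The function names (`kmeans`, `avg_mean_distance`, `cuk_sketch`), the exact merge score, and the caller-supplied starting centroids are all assumptions made for illustration; the repeated partitioning steps mentioned in the abstract are not modelled.

```python
import math
from itertools import combinations


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, centroids, iters=100):
    """Plain Lloyd-style K-means from the given starting centroids."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return [c for c in clusters if c]


def avg_mean_distance(a, b):
    """Assumed merge score: average distance from each point in one
    cluster to the mean of the other cluster."""
    ma, mb = mean(a), mean(b)
    total = sum(math.dist(p, mb) for p in a) + sum(math.dist(p, ma) for p in b)
    return total / (len(a) + len(b))


def cuk_sketch(points, k, init):
    """Cluster with len(init) centroids (k + 1 in the abstract's setting),
    then merge the closest pair of clusters until k clusters remain."""
    clusters = kmeans(points, list(init))
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_mean_distance(clusters[ij[0]],
                                                    clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters


# Usage: two well-separated groups, k = 2, three starting centroids.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
init = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
for cluster in cuk_sketch(points, 2, init):
    print(sorted(cluster))
```

In this toy input the extra centroid splits the first group in two, and the merge step joins the two halves again because their merge score is far smaller than either half's score against the distant group.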

Index Terms

Computer Science
Information Sciences

Keywords

Data Clustering, K-means, Clustering using K-means, Average Mean Distance