CFP last date
20 January 2025
Reseach Article

A Deterministic K-means Algorithm based on Nearest Neighbor Search

by Omar Kettani, Benaissa Tadili, Faycal Ramdani
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 63 - Number 15
Year of Publication: 2013
Authors: Omar Kettani, Benaissa Tadili, Faycal Ramdani
10.5120/10544-5541

Omar Kettani, Benaissa Tadili, Faycal Ramdani . A Deterministic K-means Algorithm based on Nearest Neighbor Search. International Journal of Computer Applications. 63, 15 ( February 2013), 33-37. DOI=10.5120/10544-5541

@article{ 10.5120/10544-5541,
author = { Omar Kettani, Benaissa Tadili, Faycal Ramdani },
title = { A Deterministic K-means Algorithm based on Nearest Neighbor Search },
journal = { International Journal of Computer Applications },
issue_date = { February 2013 },
volume = { 63 },
number = { 15 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 33-37 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume63/number15/10544-5541/ },
doi = { 10.5120/10544-5541 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:14:25.739406+05:30
%A Omar Kettani
%A Benaissa Tadili
%A Faycal Ramdani
%T A Deterministic K-means Algorithm based on Nearest Neighbor Search
%J International Journal of Computer Applications
%@ 0975-8887
%V 63
%N 15
%P 33-37
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In data mining, the k-means algorithm is among the most commonly and widely used method for solving clustering problems because of its simplicity and performance. However, one of the main drawback of this algorithm is that its accuracy and performance are sensitive to the initial choice of clustering centers, which are generated randomly. To overcome this drawback, we propose a simple deterministic method based on nearest neighbor search and k-means procedure in order to improve clustering results. Experimental results on various data sets reveal that the proposed method is more accurate than standard K-means algorithm.

References
  1. Aloise, D. ; Deshpande, A. ; Hansen, P. ; Popat, P. (2009). "NP-hardness of Euclidean sum-of-squares clustering". Machine Learning 75: 245–249. doi:10. 1007/s10994-009-5103-0.
  2. Lloyd. , S. P. (1982). "Least squares quantization in PCM". IEEE Transactions on Information Theory 28 (2): 129–137. doi:10. 1109/TIT. 1982. 1056489.
  3. P. S. Bradley and U. M. Fayyad, "Refining initial points for K-means Clustering", Proceeding of The Fifteenth International Conference on Machine Learning, Morgan Kaufmann, San Francisco, CA, 1998, pp. 91-99.
  4. Khan and A. Ahmad, "Cluster Center Initialization for K-mean Clustering", Pattern Recognition Letters, Volume 25, Issue 11, 2004, pp. 1293-1302
  5. Arthur, D. and S. Vassilvitskii, 2007. K-means++: The advantages of careful seeding. Proceeding of the 18th Annual ACM-SIAM Symposium of Discrete Analysis, Jan. 7-9, ACM Press, New Orleans, Louisiana, pp:1027-1035.
  6. Ahmed and W. Ashour "An Initialization Method for the K-means Algorithm using RNN and Coupling Degree" International Journal of Computer Applications (0975 – 8887) Volume 25– No. 1, July 2011
  7. C. Zhang and Z. Fang "An Improved K-means Clustering Algorithm" Journal of Information & Computational Science 10: 1 (2013) 193–199
  8. L. Kaufman and P. J. Rousseeuw. Finding groups in Data: "an Introduction to Cluster Analysis". Wiley, 1990.
  9. Yoonho Hwang; Bohyung Han; Hee-Kap Ahn "A fast nearest neighbor search algorithm by nonlinear embedding" Computer Vision and Pattern Recognition (CVPR), 2012 IEEE
  10. Ming-Chao Chiang, Chun-Wei Tsai, Chu-Sing Yang "A time-efficient pattern reduction algorithm for k-means clustering" Information Sciences 181 (2011) 716–731
  11. You Li, Kaiyong Zhao, Xiaowen Chu , Jiming Liu "Speeding up k-Means algorithm by GPUs" Journal of Computer and System Sciences 79 (2013) 216–229
  12. Merz C and Murphy P, UCI Repository of Machine Learning ftp://ftp. ics. uci. edu/pub/machine-Learning-databases Clustering datasets: http://cs. joensuu. fi/sipu/datasets/
  13. http://www. mathworks. com
Index Terms

Computer Science
Information Sciences

Keywords

Nearest Neighbor Search Initial Centroid K-means Clustering Algorithm