Research Article

An Agglomerative Clustering Method for Large Data Sets

by Omar Kettani, Faycal Ramdani, Benaissa Tadili
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 92 - Number 14
Year of Publication: 2014
Authors: Omar Kettani, Faycal Ramdani, Benaissa Tadili
DOI: 10.5120/16074-4952

Omar Kettani, Faycal Ramdani, Benaissa Tadili. An Agglomerative Clustering Method for Large Data Sets. International Journal of Computer Applications. 92, 14 (April 2014), 1-7. DOI=10.5120/16074-4952

@article{ 10.5120/16074-4952,
author = { Omar Kettani, Faycal Ramdani, Benaissa Tadili },
title = { An Agglomerative Clustering Method for Large Data Sets },
journal = { International Journal of Computer Applications },
issue_date = { April 2014 },
volume = { 92 },
number = { 14 },
month = { April },
year = { 2014 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume92/number14/16074-4952/ },
doi = { 10.5120/16074-4952 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:14:16.897230+05:30
%A Omar Kettani
%A Faycal Ramdani
%A Benaissa Tadili
%T An Agglomerative Clustering Method for Large Data Sets
%J International Journal of Computer Applications
%@ 0975-8887
%V 92
%N 14
%P 1-7
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In data mining, agglomerative clustering algorithms are widely used because of their flexibility and conceptual simplicity. Their main drawback, however, is that they are slow. In this paper, a simple agglomerative clustering algorithm with low computational complexity is proposed. The method is especially convenient for clustering large data sets, and it could also be used as a linear-time initialization method for other clustering algorithms, such as the commonly used k-means algorithm. Experiments conducted on some standard data sets confirm that the proposed approach is effective.
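
The idea of seeding k-means with the output of an agglomerative pass can be illustrated with a minimal sketch. The abstract does not describe the proposed algorithm, so the code below does not reproduce it: it substitutes a textbook centroid-linkage agglomeration (roughly cubic in the number of points, so suitable only for small inputs) and then runs Lloyd's k-means iterations from the resulting centroids. The function names `agglomerative_centroids` and `kmeans`, and the synthetic three-blob data, are assumptions made for illustration.

```python
import numpy as np


def agglomerative_centroids(points, k):
    """Naive centroid-linkage agglomeration: merge the two closest
    clusters until k remain. Textbook procedure, NOT the paper's method."""
    centroids = [p.astype(float) for p in points]
    sizes = [1] * len(points)
    while len(centroids) > k:
        c = np.array(centroids)
        # Pairwise squared distances between current cluster centroids.
        d = ((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        np.fill_diagonal(d, np.inf)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        i, j = int(min(i, j)), int(max(i, j))
        # Merge cluster j into cluster i (size-weighted mean), then drop j.
        total = sizes[i] + sizes[j]
        centroids[i] = (sizes[i] * centroids[i] + sizes[j] * centroids[j]) / total
        sizes[i] = total
        del centroids[j], sizes[j]
    return np.array(centroids)


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm started from the given centroids."""
    centroids = init_centroids.copy()
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new_centroids = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical demo data: three well-separated Gaussian blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

seeds = agglomerative_centroids(data, k=3)      # agglomerative initialization
final_centroids, labels = kmeans(data, seeds)   # k-means refinement
```

Because the seeds already sit near the cluster centers, the k-means refinement typically needs very few iterations on data like this. A method with lower complexity, as the abstract claims for the proposed one, would make the same seeding strategy practical for large data sets.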

Index Terms

Computer Science
Information Sciences

Keywords

Agglomerative clustering, k-means initialization.