Research Article

A Comparative Analysis of Different Categorical Data Clustering Ensemble Methods in Data Mining

by S. Sarumathi, N. Shanthi, M. Sharmila
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 81 - Number 4
Year of Publication: 2013
Authors: S. Sarumathi, N. Shanthi, M. Sharmila
10.5120/14004-2050

S. Sarumathi, N. Shanthi, M. Sharmila. A Comparative Analysis of Different Categorical Data Clustering Ensemble Methods in Data Mining. International Journal of Computer Applications. 81, 4 (November 2013), 46-55. DOI=10.5120/14004-2050

@article{ 10.5120/14004-2050,
author = { S. Sarumathi and N. Shanthi and M. Sharmila },
title = { A Comparative Analysis of Different Categorical Data Clustering Ensemble Methods in Data Mining },
journal = { International Journal of Computer Applications },
issue_date = { November 2013 },
volume = { 81 },
number = { 4 },
month = { November },
year = { 2013 },
issn = { 0975-8887 },
pages = { 46-55 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume81/number4/14004-2050/ },
doi = { 10.5120/14004-2050 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:55:14.073736+05:30
%A S. Sarumathi
%A N. Shanthi
%A M. Sharmila
%T A Comparative Analysis of Different Categorical Data Clustering Ensemble Methods in Data Mining
%J International Journal of Computer Applications
%@ 0975-8887
%V 81
%N 4
%P 46-55
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Over the past decades, a considerable amount of work has been done on data clustering, an unsupervised learning technique in data mining. A myriad of algorithms and methods has been proposed, focusing on clustering different data types, the representation of cluster models, and the accuracy of the resulting clusters. However, no single clustering algorithm has proved to be the most efficient at providing the best results. To address this issue, a new technique called the cluster ensemble method emerged. The cluster ensemble is a good alternative approach to the cluster analysis problem. Its main aim is to combine different clustering solutions in such a way as to achieve accuracy and to improve on the quality of the individual data clusterings. Owing to the substantial and unremitting development of new methods in the sphere of data mining, it is necessary to make a critical analysis of the existing techniques and of future novelties. This paper presents a comparative study of different cluster ensemble methods, along with their features, their systematic working process, and the average accuracy and error rates of each ensemble method. This theoretical and comprehensive analysis should be useful for the community of clustering practitioners and should help in deciding on the most suitable method for the problem at hand.
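
As a rough illustration of what "combining different clustering solutions" can mean, the sketch below builds a co-association matrix from several base partitions and derives a consensus partition from it. It is not taken from the paper: the function names `co_association` and `consensus`, the 0.5 threshold, and the toy partitions are all assumptions chosen for the example, and the consensus step is a simple threshold-and-link rule in the spirit of evidence accumulation rather than any specific method the paper compares.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    partitions that place objects i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Hypothetical consensus function: link every pair of objects whose
    co-association exceeds `threshold`, then report connected components."""
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))  # union-find forest over the objects

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three base clusterings of six objects (toy data, labels are arbitrary).
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 1, 1, 1, 1],
]
print(consensus(base_partitions))  # [0, 0, 1, 1, 1, 1]
```

The base clusterings disagree about the third object, but two of the three group it with the fourth, so the consensus follows that majority. Cluster labels only need to be comparable for equality, so the same sketch works on partitions produced by categorical clustering algorithms.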

Index Terms

Computer Science
Information Sciences

Keywords

Cluster Ensemble methods, Co-association matrix, Consensus function, Median partition