Research Article

CATCLUS – A Proposed Algorithm for Clustering Categorical Data

by Srikanta Kolay, Kumar S. Ray
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 139 - Number 10
Year of Publication: 2016
Authors: Srikanta Kolay, Kumar S. Ray
10.5120/ijca2016909394

Srikanta Kolay, Kumar S. Ray. CATCLUS – A Proposed Algorithm for Clustering Categorical Data. International Journal of Computer Applications. 139, 10 (April 2016), 40-44. DOI=10.5120/ijca2016909394

@article{ 10.5120/ijca2016909394,
author = { Srikanta Kolay and Kumar S. Ray },
title = { CATCLUS – A Proposed Algorithm for Clustering Categorical Data },
journal = { International Journal of Computer Applications },
issue_date = { April 2016 },
volume = { 139 },
number = { 10 },
month = { April },
year = { 2016 },
issn = { 0975-8887 },
pages = { 40-44 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume139/number10/24530-2016909394/ },
doi = { 10.5120/ijca2016909394 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:40:37.039705+05:30
%A Srikanta Kolay
%A Kumar S. Ray
%T CATCLUS – A Proposed Algorithm for Clustering Categorical Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 139
%N 10
%P 40-44
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Classifying categorical data is more complex than classifying numerical data, because no firm outline can be drawn for categorical data. Researchers have adopted different assumptions to handle such data. Moreover, dissimilarity measures designed for numerical data cannot be applied directly to categorical data. This paper proposes a new clustering algorithm for categorical data that uses a newly devised dissimilarity measure. Only the theoretical description of the algorithm is given, together with an illustrative example.
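To illustrate why numerical dissimilarity measures do not carry over directly, the sketch below uses simple matching dissimilarity, the count of attributes on which two records disagree. This is the standard measure used by k-modes-style methods, not the newly devised measure of this paper, which the abstract does not specify. The function name and the sample records are hypothetical.

```python
from typing import Sequence


def simple_matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the attributes on which two categorical records disagree.

    Illustrative stand-in only: this is NOT the dissimilarity measure
    proposed in the paper.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical records with attributes (colour, shape, size).
r1 = ("red", "round", "small")
r2 = ("red", "square", "small")
r3 = ("blue", "square", "large")

d12 = simple_matching_dissimilarity(r1, r2)  # differ on shape only
d13 = simple_matching_dissimilarity(r1, r3)  # differ on all three attributes
```

Unlike a Euclidean distance over integer-coded categories, this count does not depend on an arbitrary numeric encoding of the values.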

References
  1. J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281–297.
  2. Z. Huang, Extensions to the k-means algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery, 2 (3) (1998), pp. 283–304.
  3. S. Guha, R. Rastogi, K. Shim, ROCK: a robust clustering algorithm for categorical attributes, in: 15th International Conference on Data Engineering, 2000, pp. 512–521.
  4. V. Ganti, J. Gehrke, R. Ramakrishnan, CACTUS – clustering categorical data using summaries, in: Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999, pp. 73–83.
  5. Z. He, X. Xu, S. Deng, Squeezer: an efficient algorithm for clustering categorical data, Journal of Computer Science & Technology, 17 (5) (2002), pp. 611–624.
  6. D. Kim, K. Lee, D. Lee, Fuzzy clustering of categorical data using fuzzy centroids, Pattern Recognition Letters, 25 (11) (2004), pp. 1263–1271.
Index Terms

Computer Science
Information Sciences

Keywords

Categorical Data, Clustering, Dissimilarity Measure, Algorithm.