Research Article

Categorical Data Clustering based on an Alternative Data Representation Technique

by Jyoti Prokash Goswami, Anjana Kakoti Mahanta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 72 - Number 5
Year of Publication: 2013
Authors: Jyoti Prokash Goswami, Anjana Kakoti Mahanta
10.5120/12488-8301

Jyoti Prokash Goswami, Anjana Kakoti Mahanta. Categorical Data Clustering based on an Alternative Data Representation Technique. International Journal of Computer Applications. 72, 5 (June 2013), 7-12. DOI=10.5120/12488-8301

@article{ 10.5120/12488-8301,
author = { Jyoti Prokash Goswami and Anjana Kakoti Mahanta },
title = { Categorical Data Clustering based on an Alternative Data Representation Technique },
journal = { International Journal of Computer Applications },
issue_date = { June 2013 },
volume = { 72 },
number = { 5 },
month = { June },
year = { 2013 },
issn = { 0975-8887 },
pages = { 7-12 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume72/number5/12488-8301/ },
doi = { 10.5120/12488-8301 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:37:06.615277+05:30
%A Jyoti Prokash Goswami
%A Anjana Kakoti Mahanta
%T Categorical Data Clustering based on an Alternative Data Representation Technique
%J International Journal of Computer Applications
%@ 0975-8887
%V 72
%N 5
%P 7-12
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering categorical data is more difficult than clustering numeric data. With numeric data, the inherent geometric properties can be used to define distance functions between data points. With categorical data, a distance or dissimilarity function cannot be defined directly. The classical k-means algorithm was extended to categorical data in [1], which proposed a new distance measure together with a way of representing a cluster by representatives that closely resemble the means used in k-means. In this paper we first propose an alternative representation of categorical data as numeric data, which makes the data easier to handle. This technique provides a uniform representation for both data points and cluster representatives. The similarity measure proposed in [2] is used in this new setting. The algorithm of [1] was implemented and tested in this setting, and the results are reported. Experiments were conducted on two real-life data sets: soybean diseases and mushroom. The clusters obtained on the soybean data set are pure, with one hundred percent accuracy. On the mushroom data set the method also achieves relatively high accuracy, with small errors.
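The abstract does not spell out the numeric representation, so the following is only a minimal sketch of one way the idea could look. It assumes that each record is encoded as an indicator vector over (attribute, category) pairs and that a cluster representative is the component-wise mean of its members, i.e. the relative frequency of each category, in the spirit of the representatives in [1]. The dissimilarity shown is a simple stand-in and is not the similarity measure of [2]. All function names are hypothetical.

```python
def build_vocab(records):
    """Map each (attribute index, category) pair to a column position."""
    pairs = sorted({(j, v) for rec in records for j, v in enumerate(rec)})
    return {pair: i for i, pair in enumerate(pairs)}


def encode(record, vocab):
    """Indicator vector: 1.0 in the column of each category the record takes."""
    vec = [0.0] * len(vocab)
    for j, v in enumerate(record):
        vec[vocab[(j, v)]] = 1.0
    return vec


def representative(vectors):
    """Component-wise mean: relative frequency of each category in the cluster."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dissimilarity(x, rep, n_attrs):
    """Assumed stand-in measure: sum over attributes of
    (1 - frequency of the record's category in the cluster)."""
    return n_attrs - sum(a * b for a, b in zip(x, rep))


# Toy data with two categorical attributes (colour, shape).
records = [("red", "round"), ("red", "square"), ("blue", "round")]
vocab = build_vocab(records)
vectors = [encode(r, vocab) for r in records]

# Representative of a cluster holding the first two records.
rep = representative(vectors[:2])
print(rep)
print(dissimilarity(vectors[0], rep, n_attrs=2))
print(dissimilarity(vectors[2], rep, n_attrs=2))
```

Under these assumptions, data points and representatives live in the same vector space, so a k-means style loop can assign each record to the nearest representative and then recompute the means without treating the two kinds of object differently.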

References
  1. Ohn Mar San, Van-Nam Huynh, Yoshiteru Nakamori. (2004): An Alternative Extension of the k-Means Algorithm for Clustering Categorical Data, Int. J. Appl. Math. Comput. Sci., Vol. 14, No. 2, 241-247.
  2. M. Dutta, A. Kakoti Mahanta. A Fast Summary Based Algorithm for Clustering Large Categorical Databases, Proceedings of ICWES12, Ottawa, Canada.
  3. Sudipto Guha, Rajeev Rastogi, Kyuseok Shim. ROCK: A Robust Clustering Algorithm for Categorical Attributes, Proceedings of the IEEE International Conference on Data Engineering, Sydney, March 1999.
  4. M. Dutta, A. Kakoti Mahanta, Arun K. Pujari. QROCK: A Quick Version of the ROCK Algorithm for Clustering Categorical Data.
  5. Malay Dutta, Anjana Kakoti. An Incremental Clustering Algorithm for Clustering Large Sets of Categorical Data, Proceedings of CIT 2001 (4th International Conference on Information Technology), National Institute of Science and Technology, Berhampur, Orissa, 20-23 Dec. 2001, 45-50.
  6. Liang Bai, Jiye Liang, Chuangyin Dang, Fuyuan Cao. (2012): A Novel Fuzzy Clustering Algorithm with Between-Cluster Information for Categorical Data.
  7. Tao Chen, Nevin L. Zhang, Tengfei Liu, Kin Man Poon, Yi Wang. (2011): Model-based Multidimensional Clustering of Categorical Data.
  8. N. Iam-On, T. Boongoen, S. Garrett, C. Price. (2012): A Link-Based Cluster Ensemble Approach for Categorical Data Clustering, IEEE Transactions on Knowledge and Data Engineering, Vol. 24, Issue 3, 413-425.
  9. Chiranth B. O., Panduranga Rao M. V., Basavaraj Patil S. A New Link Based Approach for Categorical Data Clustering, IJSR, Vol. 1, Issue 3.
  10. Zengyou He, Xiaofei Xu, Shengchun Deng. A Cluster Ensemble Method for Clustering Categorical Data.
  11. Sueli A. Mingoti, Renata A. Matos. Clustering Algorithms for Categorical Data: A Monte Carlo Study, International Journal of Statistics and Applications, 2012, 2(4): 24-32.
  12. Ludmila I. Kuncheva. Fuzzy Classifier Design. Physica-Verlag.
Index Terms

Computer Science
Information Sciences

Keywords

clustering, categorical data, cluster representative