Categorical Data Clustering based on an Alternative Data Representation Technique

Jyoti Prokash Goswami; Anjana Kakoti Mahanta

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 72 - Number 5

Year of Publication: 2013

Authors: Jyoti Prokash Goswami, Anjana Kakoti Mahanta

DOI: 10.5120/12488-8301

Jyoti Prokash Goswami, Anjana Kakoti Mahanta. Categorical Data Clustering based on an Alternative Data Representation Technique. International Journal of Computer Applications. 72, 5 (June 2013), 7-12. DOI=10.5120/12488-8301

@article{10.5120/12488-8301,
  author = {Jyoti Prokash Goswami and Anjana Kakoti Mahanta},
  title = {Categorical Data Clustering based on an Alternative Data Representation Technique},
  journal = {International Journal of Computer Applications},
  issue_date = {June 2013},
  volume = {72},
  number = {5},
  month = {June},
  year = {2013},
  issn = {0975-8887},
  pages = {7-12},
  numpages = {6},
  url = {https://ijcaonline.org/archives/volume72/number5/12488-8301/},
  doi = {10.5120/12488-8301},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:37:06.615277+05:30
%A Jyoti Prokash Goswami
%A Anjana Kakoti Mahanta
%T Categorical Data Clustering based on an Alternative Data Representation Technique
%J International Journal of Computer Applications
%@ 0975-8887
%V 72
%N 5
%P 7-12
%D 2013
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Clustering categorical data is more difficult than clustering numeric data. For numeric data, the inherent geometric properties of the data can be used to define distance functions between data points. For categorical data, such a distance or dissimilarity function cannot be defined directly. In [1], the classical k-means algorithm is extended to categorical data: a new distance measure is proposed together with a way of representing each cluster by a representative that closely resembles the mean used in the k-means algorithm. In this paper, we first propose an alternative representation of categorical data as numeric data, which makes the data easier to handle. This technique gives data points and cluster representatives a uniform representation. The similarity measure proposed in [2] is used in this new setting. We implemented the algorithm of [1] in this setting, tested it, and report the results. Experiments were conducted on two real-life data sets: the soybean diseases and mushroom data sets. On the soybean data set, the clusters obtained are pure, with 100% accuracy. On the mushroom data set, the method also achieves relatively high accuracy with few errors.
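The abstract does not spell out the encoding, so what follows is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the numeric representation is a one-hot (indicator) encoding. Under that encoding, averaging the encoded members of a cluster yields the per-attribute value frequencies, so data points and cluster representatives share one vector form. It also assumes a frequency-based dissimilarity (for each attribute, 1 minus the cluster frequency of the point's value) as a stand-in; the similarity measure of [2] is not reproduced here. All function names (`one_hot_domains`, `encode`, `representative`, `dissimilarity`, `cluster`, `accuracy`) are hypothetical, and `accuracy` is the common majority-class measure, assumed rather than taken from the paper.

```python
import random
from collections import Counter


def one_hot_domains(data):
    """Sorted list of the values observed for each attribute."""
    return [sorted({row[j] for row in data}) for j in range(len(data[0]))]


def encode(row, domains):
    """Hypothetical encoding: one 0/1 indicator slot per (attribute, value) pair."""
    vec = []
    for value, domain in zip(row, domains):
        vec.extend(1.0 if value == v else 0.0 for v in domain)
    return vec


def representative(vectors):
    """Mean of the encoded members. Within each attribute block this is the
    relative frequency of every value in the cluster, so a representative
    has the same vector form as a data point."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dissimilarity(x, q, num_attrs):
    """Assumed frequency-based dissimilarity: sum over attributes of
    1 - (frequency in q of the value x takes). For a one-hot x this equals
    num_attrs - dot(x, q)."""
    return num_attrs - sum(a * b for a, b in zip(x, q))


def cluster(data, k, max_iter=100, seed=0):
    """k-means-style loop over encoded records; returns a cluster label per record."""
    domains = one_hot_domains(data)
    m = len(domains)
    points = [encode(row, domains) for row in data]
    rng = random.Random(seed)
    # Initial representatives: k distinct records chosen at random.
    reps = [points[i][:] for i in rng.sample(range(len(points)), k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest representative (ties go to the lowest index).
        new_labels = [min(range(k), key=lambda c: dissimilarity(p, reps[c], m))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: recompute each representative as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous representative
                reps[c] = representative(members)
    return labels


def accuracy(labels, truth):
    """Majority-class accuracy: each cluster is credited with its most frequent true class."""
    counts = {}
    for lab, t in zip(labels, truth):
        counts.setdefault(lab, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in counts.values()) / len(truth)


if __name__ == "__main__":
    # Toy data (invented for illustration): two groups with no shared values.
    data = [("a", "x")] * 3 + [("b", "y")] * 3
    truth = [0, 0, 0, 1, 1, 1]
    print(accuracy(cluster(data, k=2), truth))  # → 1.0
```

On the toy data the two groups share no attribute values, so every initialization converges to the two groups and the script prints 1.0. Keeping the previous representative when a cluster empties is one simple design choice among several; reseeding from a distant point is another.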

References

[1] Ohn Mar San, Van-Nam Huynh, Yoshiteru Nakamori. An Alternative Extension of the k-Means Algorithm for Clustering Categorical Data. Int. J. Appl. Math. Comput. Sci., Vol. 14, No. 2, 2004, 241-247.
[2] M. Dutta, A. Kakoti Mahanta. A Fast Summary Based Algorithm for Clustering Large Categorical Databases. Proceedings of ICWES12, Ottawa, Canada.
[3] Sudipto Guha, Rajeev Rastogi, Kyuseok Shim. ROCK: A Robust Clustering Algorithm for Categorical Attributes. Proceedings of the IEEE International Conference on Data Engineering, Sydney, March 1999.
[4] M. Dutta, A. Kakoti Mahanta, Arun K. Pujari. QROCK: A Quick Version of the ROCK Algorithm for Clustering Categorical Data.
[5] Malay Dutta, Anjana Kakoti. An Incremental Clustering Algorithm for Clustering Large Sets of Categorical Data. Proceedings of CIT 2001 (4th International Conference on Information Technology), National Institute of Science and Technology, Berhampur, Orissa, 20-23 Dec. 2001, 45-50.
[6] Liang Bai, Jiye Liang, Chuangyin Dang, Fuyuan Cao. A Novel Fuzzy Clustering Algorithm with Between-Cluster Information for Categorical Data. 2012.
[7] Tao Chen, Nevin L. Zhang, Tengfei Liu, Kin Man Poon, Yi Wang. Model-Based Multidimensional Clustering of Categorical Data. 2011.
[8] N. Iam-On, T. Boongoen, S. Garrett, C. Price. A Link-Based Cluster Ensemble Approach for Categorical Data Clustering. IEEE Transactions on Knowledge and Data Engineering, Vol. 24, Issue 3, 2012, 413-425.
[9] Chiranth B. O., Panduranga Rao M. V., Basavaraj Patil S. A New Link Based Approach for Categorical Data Clustering. IJSR, Vol. 1, Issue 3.
[10] Zengyou He, Xiaofei Xu, Shengchun Deng. A Cluster Ensemble Method for Clustering Categorical Data.
[11] Sueli A. Mingoti, Renata A. Matos. Clustering Algorithms for Categorical Data: A Monte Carlo Study. International Journal of Statistics and Applications, 2012, 2(4): 24-32.
[12] Ludmila I. Kuncheva. Fuzzy Classifier Design. Physica-Verlag.

Index Terms

Computer Science

Information Sciences

Keywords

clustering, categorical data, cluster representative