International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 120 - Number 17
Year of Publication: 2015
Authors: Garima Khandelwal, Rakesh Sharma
DOI: 10.5120/21321-4341
Garima Khandelwal, Rakesh Sharma. A Simple Yet Fast Clustering Approach for Categorical Data. International Journal of Computer Applications. 120, 17 (June 2015), 25-30. DOI=10.5120/21321-4341
Categorical data has long posed a challenge for cluster analysis. With growing interest in big data analysis, the need for better clustering methods for categorical and mixed data has increased. Prevailing clustering algorithms are poorly suited to categorical data, mainly because the distance functions used for continuous data do not apply to categorical values. Recent research has explored several different approaches to clustering categorical data; however, the complexity of these methods makes them unsuitable for big data, where faster algorithms are needed. This paper proposes a simple, fast method derived from statistics for clustering categorical data. Results on popular datasets are encouraging.
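
The point about distance functions can be illustrated with a minimal sketch. It is not the method proposed in the paper. The attribute, the integer codes in `COLOR_CODE`, and both helper functions are hypothetical and chosen only for illustration. The sketch shows that Euclidean distance computed on arbitrary integer codes imposes an ordering that a nominal attribute does not have, whereas simple matching dissimilarity (a common categorical measure, used here as a stand-in) treats every mismatch equally.

```python
import math

# Hypothetical integer coding of a nominal attribute; the codes are arbitrary labels.
COLOR_CODE = {"red": 0, "green": 1, "blue": 2}

def euclidean_on_codes(a, b):
    """Euclidean distance computed on the arbitrary integer codes."""
    return math.sqrt(sum((COLOR_CODE[x] - COLOR_CODE[y]) ** 2 for x, y in zip(a, b)))

def simple_matching(a, b):
    """Simple matching dissimilarity: the number of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))

# Euclidean distance depends on how the labels happened to be numbered.
print(euclidean_on_codes(["red"], ["green"]))  # 1.0
print(euclidean_on_codes(["red"], ["blue"]))   # 2.0

# Simple matching counts each mismatch once, regardless of the coding.
print(simple_matching(["red"], ["green"]))     # 1
print(simple_matching(["red"], ["blue"]))      # 1
```

Under this coding, red appears twice as far from blue as from green, although no such ordering exists among the colors; renumbering the labels would change the result.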