International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 11 - Number 3
Year of Publication: 2010
Authors: K. Arunprabha, V. Bhuvaneswari
DOI: 10.5120/1565-1875
K. Arunprabha, V. Bhuvaneswari. Article: Comparing K-Value Estimation for Categorical and Numeric Data Clustring. International Journal of Computer Applications. 11, 3 (December 2010), 4-7. DOI=10.5120/1565-1875
In data mining, clustering is one of the major tasks. It aims to group data objects into meaningful classes (clusters) such that the similarity of objects within a cluster is maximized and the similarity of objects from different clusters is minimized. When clustering a dataset, the right number k of clusters is often not obvious, and choosing k automatically is a hard algorithmic problem. We present an improved algorithm for learning k while clustering categorical data. The algorithm, Gaussian-means (G-means), is applied within the k-means paradigm and works well for categorical features. To apply it to a categorical dataset, the dataset is first converted into a numeric dataset. We use novel heuristic techniques for this conversion and compare the results for categorical data with those for numeric data. The G-means algorithm is based on a statistical test for the hypothesis that a subset of data follows a Gaussian distribution. G-means runs k-means with increasing k in a hierarchical fashion until the test accepts the hypothesis that the data assigned to each k-means center are Gaussian. G-means requires only one intuitive parameter, the standard statistical significance level α.
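The G-means loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the normality test is the Anderson-Darling test with Stephens' small-sample correction, α is fixed at 0.05 (critical value 0.752), each center is split by mirroring its farthest point, and the input is already numeric (the categorical-to-numeric conversion heuristic is not shown). The function names `anderson_darling`, `kmeans`, `split_if_not_gaussian`, and `gmeans` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

# Assumed critical value: corrected Anderson-Darling statistic at alpha = 0.05
# with mean and variance estimated from the sample. A smaller alpha would
# need a larger critical value.
CRITICAL_05 = 0.752

def anderson_darling(values):
    """Corrected A^2 statistic for normality of a 1-D sample."""
    n = len(values)
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return 0.0  # degenerate sample: nothing to split
    std_normal = NormalDist()
    z = sorted((v - mu) / sigma for v in values)
    # Clamp the CDF so that log() never receives 0.
    cdf = [min(max(std_normal.cdf(x), 1e-12), 1 - 1e-12) for x in z]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, centers, iters=100):
    """Lloyd's algorithm from the given centers; returns (centers, clusters)."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
            clusters[j].append(p)
        new = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers, clusters

def split_if_not_gaussian(center, cluster, critical, min_size):
    """Return [center] if the cluster looks Gaussian, else two child centers."""
    if len(cluster) < min_size:
        return [center]
    # Assumed deterministic initialisation: farthest point and its mirror image.
    far = max(cluster, key=lambda p: dist2(p, center))
    mirror = tuple(2 * c - f for c, f in zip(center, far))
    (c1, c2), parts = kmeans(cluster, [far, mirror])
    v = tuple(a - b for a, b in zip(c1, c2))
    vv = sum(x * x for x in v)
    if vv == 0 or not all(parts):
        return [center]
    # Project the cluster onto the line joining the two children, then test.
    proj = [sum(x * y for x, y in zip(p, v)) / vv for p in cluster]
    return [center] if anderson_darling(proj) <= critical else [c1, c2]

def gmeans(points, critical=CRITICAL_05, min_size=8):
    """Grow k until every cluster passes the normality test."""
    centers, clusters = kmeans(points, [centroid(points)])
    while True:
        next_centers = []
        for center, cluster in zip(centers, clusters):
            next_centers.extend(
                split_if_not_gaussian(center, cluster, critical, min_size))
        if len(next_centers) == len(centers):
            return centers
        centers, clusters = kmeans(points, next_centers)

# Usage: two synthetic 1-D groups built from normal quantiles, 10 units apart.
blob = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
points = [(q,) for q in blob] + [(q + 10.0,) for q in blob]
print(len(gmeans(points)), "clusters found")
```

In this sketch the only tuning knob is the critical value, which plays the role of the significance level α: a stricter α makes the test harder to reject and therefore tends to yield fewer clusters.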