Comparing K-Value Estimation for Categorical and Numeric Data Clustering

K. Arunprabha, V. Bhuvaneswari


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 11 - Number 3

Year of Publication: 2010

Authors: K.Arunprabha, V.Bhuvaneswari

DOI: 10.5120/1565-1875

K. Arunprabha, V. Bhuvaneswari. Comparing K-Value Estimation for Categorical and Numeric Data Clustering. International Journal of Computer Applications 11, 3 (December 2010), 4-7. DOI=10.5120/1565-1875

@article{10.5120/1565-1875,
  author = {K. Arunprabha and V. Bhuvaneswari},
  title = {Comparing K-Value Estimation for Categorical and Numeric Data Clustering},
  journal = {International Journal of Computer Applications},
  issue_date = {December 2010},
  volume = {11},
  number = {3},
  month = {December},
  year = {2010},
  issn = {0975-8887},
  pages = {4-7},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume11/number3/1565-1875/},
  doi = {10.5120/1565-1875},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%A K. Arunprabha
%A V. Bhuvaneswari
%T Comparing K-Value Estimation for Categorical and Numeric Data Clustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 11
%N 3
%P 4-7
%D 2010
%I Foundation of Computer Science (FCS), NY, USA

Abstract

In data mining, clustering is one of the major tasks: it groups data objects into meaningful classes (clusters) so that the similarity of objects within a cluster is maximized and the similarity of objects in different clusters is minimized. When clustering a dataset, the right number k of clusters is often not obvious, and choosing k automatically is a hard algorithmic problem. We present an improved algorithm for learning k while clustering categorical data. The algorithm applies Gaussian means (G-means), which operates within the k-means paradigm, to categorical features. To apply the algorithm to a categorical dataset, the dataset is first converted into a numeric one; in this paper we present novel heuristic techniques for this conversion and compare the results on categorical data with those on numeric data. The G-means algorithm is based on a statistical test of the hypothesis that a subset of the data follows a Gaussian distribution. G-means runs k-means with increasing k in a hierarchical fashion until the test accepts the hypothesis that the data assigned to each k-means center are Gaussian. G-means requires only one intuitive parameter, the standard statistical significance level α.
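The G-means loop described in the abstract can be illustrated with a short Python sketch. It assumes NumPy and numeric input (the categorical-to-numeric conversion step is not shown, since its heuristic is not specified here), and it follows the splitting scheme of Hamerly and Elkan's G-means: each center is replaced by two children found by 2-means on its points, the points are projected onto the line joining the children, and an Anderson-Darling normality test decides whether to keep the split. This is not the authors' implementation; the helper names (`assign`, `kmeans`, `anderson_darling`, `gmeans`), the `min_points` and `max_k` safeguards, and the principal-component placement of the children are illustrative assumptions. The critical values are those of Stephens' (1974) table for a normal distribution with estimated mean and variance.

```python
import math

import numpy as np

# Stephens (1974) critical values for testing normality when the mean and
# variance are estimated from the data; they are compared against the
# corrected statistic A*^2 = A^2 * (1 + 4/n - 25/n^2).
AD_CRITICAL = {0.15: 0.576, 0.10: 0.656, 0.05: 0.787, 0.025: 0.918, 0.01: 1.092}


def anderson_darling(x):
    """Corrected Anderson-Darling statistic A*^2 for a 1-D sample (needs std > 0)."""
    z = np.sort((x - x.mean()) / x.std())  # standardize to mean 0, variance 1
    n = len(z)
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    cdf = np.clip(cdf, 1e-12, 1.0 - 1e-12)  # avoid log(0) in the tails
    i = np.arange(1, n + 1)
    a2 = -n - np.mean((2 * i - 1) * (np.log(cdf) + np.log(1.0 - cdf[::-1])))
    return a2 * (1.0 + 4.0 / n - 25.0 / n**2)


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, centers, iters=100):
    """Lloyd's k-means started from the given centers."""
    for _ in range(iters):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, assign(X, centers)


def gmeans(X, alpha=0.01, min_points=8, max_k=50):
    """Grow k until every cluster passes the Anderson-Darling normality test."""
    crit = AD_CRITICAL[alpha]
    centers = X.mean(axis=0, keepdims=True)  # start with k = 1
    while True:
        centers, labels = kmeans(X, centers)
        new_centers = []
        for j, c in enumerate(centers):
            pts = X[labels == j]
            split = False
            if len(pts) >= min_points:
                # Place two children along the principal component of the cluster.
                cov = np.atleast_2d(np.cov(pts, rowvar=False))
                eigvals, eigvecs = np.linalg.eigh(cov)
                m = eigvecs[:, -1] * math.sqrt(2.0 * max(eigvals[-1], 0.0) / math.pi)
                children, _ = kmeans(pts, np.array([c + m, c - m]))
                # Project the points onto the line joining the two children.
                v = children[0] - children[1]
                vv = float(v @ v)
                if vv > 0:
                    proj = pts @ v / vv
                    split = proj.std() > 0 and anderson_darling(proj) > crit
            if split:
                new_centers.extend(children)  # reject "Gaussian": keep both children
            else:
                new_centers.append(c)  # accept "Gaussian": keep the original center
        if len(new_centers) == len(centers) or len(new_centers) > max_k:
            return centers, labels
        centers = np.array(new_centers)
```

Because every center is tested independently at level α, a smaller α raises the critical value, so splits are accepted less often and fewer clusters tend to result.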


Index Terms

Computer Science

Information Sciences

Keywords

Data mining, Clustering, Algorithm, Categorical data, Gaussian distribution