Research Article

Effective Purity Method for Measuring the Clustering Accuracy and its Illustration

by Srinivasa Suresh Sikhakolli, Asha Kiran Sikhakolli
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Number 9
Year of Publication: 2023
Authors: Srinivasa Suresh Sikhakolli, Asha Kiran Sikhakolli
10.5120/ijca2023922752

Srinivasa Suresh Sikhakolli, Asha Kiran Sikhakolli. Effective Purity Method for Measuring the Clustering Accuracy and its Illustration. International Journal of Computer Applications. 185, 9 (May 2023), 28-33. DOI=10.5120/ijca2023922752

@article{ 10.5120/ijca2023922752,
author = { Srinivasa Suresh Sikhakolli, Asha Kiran Sikhakolli },
title = { Effective Purity Method for Measuring the Clustering Accuracy and its Illustration },
journal = { International Journal of Computer Applications },
issue_date = { May 2023 },
volume = { 185 },
number = { 9 },
month = { May },
year = { 2023 },
issn = { 0975-8887 },
pages = { 28-33 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume185/number9/32731-2023922752/ },
doi = { 10.5120/ijca2023922752 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Clustering is one of the most commonly used models in business and scientific applications. Data science specialists and researchers often apply clustering techniques for classification and optimization. Measuring clustering accuracy is a key requirement. Several extrinsic measures exist for assessing clustering quality. One of them is purity, which indicates the level of homogeneity of the clusters. Purity sums the frequencies of the dominant class in each cluster and divides that sum by the total number of records. The existing purity method does not take the total number of clusters into consideration. According to the researcher, the number of clusters has a significant effect on overall cluster quality. In this paper, the researcher proposes an algorithm that makes a few changes to the existing purity method. The proposed algorithm is applied to machine learning data sets taken from the UCI Machine Learning Repository. A significant improvement in purity computation is observed when the method is applied with FCM and K-means clustering. This paper presents the proposed algorithm, an artificial illustration, results and analysis, and a comparative analysis of the proposed purity method and the existing purity method.
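
The existing purity measure described in the abstract can be sketched in a few lines of Python. This is only an illustration of the conventional definition (sum of dominant-class counts per cluster, divided by the total number of records); it is not the algorithm proposed in the paper, which is not specified in the abstract. The function name `purity` and the toy cluster and class assignments are assumptions made for this example.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Conventional purity: dominant-class counts summed over clusters,
    divided by the total number of records."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    if len(cluster_ids) == 0:
        raise ValueError("at least one record is required")

    # Count how often each class label occurs inside each cluster.
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        per_cluster[cluster][label] += 1

    # For every cluster, keep only the frequency of its dominant class.
    dominant_total = sum(max(counts.values()) for counts in per_cluster.values())
    return dominant_total / len(cluster_ids)


# Hypothetical toy data: 10 records, 3 clusters, 3 true classes.
clusters = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
classes = ["a", "a", "a", "b", "b", "b", "a", "c", "c", "a"]

score = purity(clusters, classes)
print(score)  # dominant counts 3 + 2 + 2 = 7 out of 10 records

# Placing every record in its own cluster yields a perfect score,
# because each singleton cluster is trivially homogeneous.
singleton_score = purity(list(range(len(classes))), classes)
print(singleton_score)
```

The second call shows why the number of clusters matters: the conventional measure reaches its maximum when every record forms its own cluster, even though such a clustering carries no useful grouping.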

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, clustering accuracy, clustering extrinsic measure, clustering purity