Research Article

A Comparative Study of Data Clustering Algorithms

by Geet Singhal, Shipra Panwar, Kanika Jain, Devender Banga
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 83 - Number 15
Year of Publication: 2013
10.5120/14528-2927

Geet Singhal, Shipra Panwar, Kanika Jain, Devender Banga. A Comparative Study of Data Clustering Algorithms. International Journal of Computer Applications. 83, 15 (December 2013), 41-46. DOI=10.5120/14528-2927

@article{ 10.5120/14528-2927,
author = { Geet Singhal and Shipra Panwar and Kanika Jain and Devender Banga },
title = { A Comparative Study of Data Clustering Algorithms },
journal = { International Journal of Computer Applications },
issue_date = { December 2013 },
volume = { 83 },
number = { 15 },
month = { December },
year = { 2013 },
issn = { 0975-8887 },
pages = { 41-46 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume83/number15/14528-2927/ },
doi = { 10.5120/14528-2927 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Data clustering is the process of partitioning data points into meaningful clusters such that each cluster holds similar data and different clusters hold dissimilar data. It is an unsupervised approach to grouping data into different patterns. In general, clustering algorithms fall into two categories: hard clustering, where each data object belongs to exactly one cluster, and soft clustering, where a data object can belong to more than one cluster. In this report we present a comparative study of three major data clustering algorithms, highlighting their merits and demerits: k-means, fuzzy c-means and the k-NN clustering algorithm. Choosing an appropriate clustering algorithm requires weighing several factors; one example is the size of the data set to be partitioned.
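The hard/soft distinction can be illustrated with a minimal pure-Python sketch that contrasts k-means (hard) with fuzzy c-means (soft). This is not the authors' code. The function names (`k_means`, `fuzzy_c_means`, `fcm_memberships`), the initialization from randomly sampled data points, the fuzzifier m = 2, the stopping tolerance and the toy data set are all illustrative assumptions. Euclidean distance is used throughout.

```python
import math
import random

# Hypothetical helpers for illustration only; initialization, m=2 and the
# tolerances are assumptions, not parameters taken from the paper.


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Hard clustering: one cluster label per point, plus the centroids."""
    rng = random.Random(seed)
    centroids = [tuple(float(x) for x in p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def fcm_memberships(points, centers, m=2.0):
    """Soft assignment: degree of membership of every point in every cluster."""
    memberships = []
    for p in points:
        d = [euclidean(p, c) for c in centers]
        if 0.0 in d:
            # Point sits exactly on a center: full membership there.
            row = [1.0 if dj == 0.0 else 0.0 for dj in d]
            row = [val / sum(row) for val in row]
        else:
            exp = 2.0 / (m - 1.0)
            row = [1.0 / sum((dj / dk) ** exp for dk in d) for dj in d]
        memberships.append(row)
    return memberships


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Soft clustering: membership matrix (rows sum to 1) and cluster centers."""
    rng = random.Random(seed)
    centers = [tuple(float(x) for x in p) for p in rng.sample(points, c)]
    dims = len(points[0])
    for _ in range(max_iter):
        u = fcm_memberships(points, centers, m)
        new_centers = []
        for j in range(c):
            # Centers are means weighted by membership raised to the power m.
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[dim] for wi, p in zip(w, points)) / total
                for dim in range(dims)))
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return fcm_memberships(points, centers, m), centers


# Toy data (assumed): two well-separated groups in 2-D.
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]

labels, _ = k_means(data, k=2)
memberships, _ = fuzzy_c_means(data, c=2)
print("k-means labels:", labels)
for p, row in zip(data, memberships):
    print(p, [round(v, 3) for v in row])
```

Here `k_means` gives each point a single label (hard clustering), while `fuzzy_c_means` gives each point a membership degree in every cluster, with each row summing to 1 (soft clustering). On well-separated data the largest membership in each row usually agrees with the k-means label.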

Index Terms

Computer Science
Information Sciences

Keywords

k-means algorithm, c-means algorithm, k-NN algorithm, Euclidean distance, hard clustering, soft clustering.