Research Article

Efficiency and Effectiveness of Clustering Algorithms for High Dimensional Data

by Smita Chormunge, Sudarson Jena
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 125 - Number 11
Year of Publication: 2015
DOI: 10.5120/ijca2015906144

Smita Chormunge, Sudarson Jena. Efficiency and Effectiveness of Clustering Algorithms for High Dimensional Data. International Journal of Computer Applications 125, 11 (September 2015), 35-40. DOI=10.5120/ijca2015906144

@article{ 10.5120/ijca2015906144,
author = { Smita Chormunge and Sudarson Jena },
title = { Efficiency and Effectiveness of Clustering Algorithms for High Dimensional Data },
journal = { International Journal of Computer Applications },
issue_date = { September 2015 },
volume = { 125 },
number = { 11 },
month = { September },
year = { 2015 },
issn = { 0975-8887 },
pages = { 35-40 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume125/number11/22479-2015906144/ },
doi = { 10.5120/ijca2015906144 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:16:12.825634+05:30
%A Smita Chormunge
%A Sudarson Jena
%T Efficiency and Effectiveness of Clustering Algorithms for High Dimensional Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 125
%N 11
%P 35-40
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering high-dimensional data is challenging because of the curse of dimensionality, which affects both the time complexity and the accuracy of clustering methods. This paper evaluates the efficiency and effectiveness of K-means and agglomerative hierarchical clustering on text and microarray datasets, based on Euclidean distance and the F-measure, while varying the number of clusters. Efficiency refers to the computational time required to build the clusters for a dataset, and effectiveness refers to the accuracy with which the data are clustered. Experimental results on the datasets used in this empirical study show that K-means is favourable in terms of effectiveness, whereas agglomerative hierarchical clustering is more time-efficient on the text datasets.
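The comparison described in the abstract can be illustrated with a minimal, pure-Python sketch: K-means and agglomerative clustering, both driven by Euclidean distance, each timed on the same data. This is not the authors' experimental setup, which the abstract does not detail. Several choices here are assumptions: Lloyd's K-means with randomly sampled initial centroids, single-linkage merging (the paper's linkage criterion is not stated), synthetic toy data in place of the text and microarray datasets, and the hypothetical helper names `kmeans`, `agglomerative` and `timed`.

```python
import math
import random
import time


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means with random initial centroids; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def agglomerative(points, k):
    """Naive single-linkage agglomerative clustering, merged down to k clusters."""
    dist = [[euclidean(a, b) for b in points] for a in points]  # pairwise distances
    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > k:
        best = None
        # Find the pair of clusters whose closest members are nearest (single linkage).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]  # safe because j > i
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: two Gaussian blobs in 50 dimensions (not the paper's datasets).
    data = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(30)]
    data += [[rng.gauss(10, 1) for _ in range(50)] for _ in range(30)]
    for name, fn in (("K-means", kmeans), ("Agglomerative", agglomerative)):
        labels, secs = timed(fn, data, 2)
        sizes = sorted(labels.count(c) for c in set(labels))
        print(f"{name}: {secs:.4f} s, cluster sizes {sizes}")
```

Looping over several values of `k` would mirror the abstract's "varying the number of clusters". Timings from this toy sketch are not comparable with the paper's measurements, which depend on the implementation and datasets the authors used.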

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, K-means, Agglomerative hierarchical, F-measure, Precision, Recall.
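The keywords pair the F-measure with precision and recall. One common way to score a clustering against known class labels, from the document-clustering literature, treats each (class, cluster) pair as a retrieval result and credits every class with its best-matching cluster. The sketch below implements that definition; the abstract does not state which F-measure variant the paper uses, so this is an assumption, and `clustering_f_measure` is a hypothetical name.

```python
from collections import Counter


def clustering_f_measure(classes, clusters):
    """Overall F-measure of a clustering against reference class labels.

    For class i and cluster j: precision P = n_ij / n_j, recall R = n_ij / n_i,
    F(i, j) = 2PR / (P + R). Overall F = sum_i (n_i / n) * max_j F(i, j).
    """
    n = len(classes)
    class_sizes = Counter(classes)      # n_i
    cluster_sizes = Counter(clusters)   # n_j
    joint = Counter(zip(classes, clusters))  # n_ij
    total = 0.0
    for c, n_i in class_sizes.items():
        best = 0.0
        for k, n_j in cluster_sizes.items():
            n_ij = joint[(c, k)]
            if n_ij == 0:
                continue  # F is zero when class and cluster share no items
            p, r = n_ij / n_j, n_ij / n_i
            best = max(best, 2 * p * r / (p + r))
        total += (n_i / n) * best  # weight each class by its size
    return total


if __name__ == "__main__":
    truth = ["a", "a", "a", "b", "b", "b"]
    found = [0, 0, 1, 1, 1, 1]
    print(round(clustering_f_measure(truth, found), 4))  # prints 0.8286
```

In the example, class `a` matches cluster 0 with F = 0.8 and class `b` matches cluster 1 with F = 6/7, giving 0.5 × 0.8 + 0.5 × 6/7 ≈ 0.8286; a perfect clustering scores 1.0.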