Research Article

Performance based Analysis and Comparison of Multi-Algorithmic Clustering Techniques

by Rajesh N. Phursule, P. C. Bhaskar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 45 - Number 4
Year of Publication: 2012
Authors: Rajesh N. Phursule, P. C. Bhaskar
10.5120/6770-9056

Rajesh N. Phursule, P. C. Bhaskar. Performance based Analysis and Comparison of Multi-Algorithmic Clustering Techniques. International Journal of Computer Applications. 45, 4 (May 2012), 40-44. DOI=10.5120/6770-9056

@article{ 10.5120/6770-9056,
author = { Rajesh N. Phursule, P. C. Bhaskar },
title = { Performance based Analysis and Comparison of Multi-Algorithmic Clustering Techniques },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 45 },
number = { 4 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 40-44 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume45/number4/6770-9056/ },
doi = { 10.5120/6770-9056 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:36:45.643487+05:30
%A Rajesh N. Phursule
%A P. C. Bhaskar
%T Performance based Analysis and Comparison of Multi-Algorithmic Clustering Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 45
%N 4
%P 40-44
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering documents by word similarity and then searching the clustered text is a common search procedure, widely used for large document collections. Documents can be clustered with many algorithms, such as Nearest Neighbor, K-Means, Hierarchical, and Graph Theoretic clustering [4] [5] [7]. Measuring the performance of these algorithms in terms of space complexity and execution time, and the quality of their search output in terms of accuracy and redundancy, is a needed study [3]. This paper measures the performance of the Nearest Neighbor, K-Means and Hierarchical agglomerative clustering algorithms on text documents and compares them in terms of space complexity, execution time, accuracy and redundancy. In particular, the input text document is preprocessed and converted into a document graph represented as a matrix. That document graph is then converted into a relation matrix, which gives the relation (a similarity score between 0 and 1) among all the nodes [2]. The implementation and results of the applied clustering algorithms (Nearest Neighbor, K-Means and Hierarchical agglomerative) on documents are discussed.
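
The pipeline described in the abstract (text, then a relation matrix of similarity scores between 0 and 1, then clustering) can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm: each node is one short text, the similarity score is the cosine similarity of term-frequency vectors, and the Nearest Neighbor step uses a fixed threshold. The function names `term_vector`, `cosine`, `relation_matrix` and `nearest_neighbor_clusters` are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Preprocess a text: lowercase it and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relation_matrix(texts):
    """Build an n x n matrix of pairwise similarity scores in [0, 1].

    Assumption: cosine similarity stands in for the paper's scoring
    function, which the abstract does not specify.
    """
    vectors = [term_vector(t) for t in texts]
    n = len(vectors)
    return [
        [1.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


def nearest_neighbor_clusters(sim, threshold):
    """Threshold-based nearest-neighbor clustering over a similarity matrix.

    Items are visited in order. Each item joins the cluster of its most
    similar already-placed item if that similarity reaches the threshold;
    otherwise it starts a new cluster. Returns one cluster label per item.
    """
    labels = []
    next_label = 0
    for i in range(len(sim)):
        best_j, best_score = None, -1.0
        for j in range(i):
            if sim[i][j] > best_score:
                best_j, best_score = j, sim[i][j]
        if best_j is not None and best_score >= threshold:
            labels.append(labels[best_j])
        else:
            labels.append(next_label)
            next_label += 1
    return labels


if __name__ == "__main__":
    sample_texts = [
        "the cat sat on the mat",
        "a cat sat on a mat",
        "stock markets fell sharply",
        "markets fell as stock prices dropped",
    ]
    matrix = relation_matrix(sample_texts)
    print(nearest_neighbor_clusters(matrix, threshold=0.4))
```

The same relation matrix could also feed K-Means or agglomerative clustering. The threshold of 0.4 is an arbitrary illustrative value, not one reported by the authors.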

References
  1. Sholom Weiss, Brian White and Chidanand Apte, "A Lightweight Document Clustering", IBM T. J. Watson Research Center, NY 10598, USA.
  2. Ramakrishna Varadarajan, Vagelis Hristidis, "A System for Query-Specific Document Summarization", Florida International University.
  3. Michael Steinbach, George Karypis, Vipin Kumar, "A Comparison of Document Clustering Techniques", University of Minnesota, Technical Report #00-034.
  4. A. K. Jain (Michigan State University), M. N. Murty (Indian Institute of Science) and P. J. Flynn (The Ohio State University), "Data Clustering: A Review".
  5. B. King, "Step-wise Clustering Procedures", J. Am. Stat. Assoc. 69, 86–101, 1967.
  6. M. R. Anderberg, "Cluster Analysis for Applications", Academic Press, Inc., New York, NY, 1973.
  7. Abracos and G. Pereira-Lopes, "Statistical methods for retrieving most significant paragraphs in newspaper articles", ACL/EACL Workshop on Intelligent Scalable Text Summarization, 1997.
  8. S. Agrawal, S. Chaudhuri, and G. Das, "DBXplorer: A System for Keyword-Based Search over Relational Databases", ICDE, 2002.
  9. E. Amitay, C. Paris, "Automatically Summarizing Web Sites - Is there any way around it?", CIKM, 2000.
  10. H. H. Chen, J. J. Kuo, and T. C. Su, "Clustering and Visualization in a Multi-Lingual Multi-Document Summarization System", ECIR, 2003.
  11. G. Erkan and D. R. Radev, "LexRank: Graph-based centrality as salience in text summarization", JAIR, 2004.
  12. J. Goldstein, M. Kantrowitz, V. Mittal, J. Carbonell, "Summarizing text documents: Sentence selection and evaluation metrics", ACM SIGIR, 1999.
  13. C. Y. Lin, "Improving Summarization Performance by Sentence Compression - A Pilot Study", IRAL, 2003.
  14. D. Cutting, D. Karger, J. Pedersen, and J. Tukey, "Scatter/Gather: a Cluster-based Approach to Browsing Large Document Collections", ACM SIGIR, 1992.
  15. J. Hartigan and M. Wong, "A k-means clustering algorithm", Applied Statistics, 1979.
  16. A. El-Hamdouchi and P. Willett, "Comparison of Hierarchic Agglomerative Clustering Methods for Document Retrieval", The Computer Journal, Vol. 32, No. 3, 1989.
Index Terms

Computer Science
Information Sciences

Keywords

Analysis and Comparison, K-means, Nearest Neighbor, Agglomerative Hierarchical, Document Graph, Clustering Algorithm