Research Article

Performance Comparison of Hard and Soft Approaches for Document Clustering

by Vibekananda Dutta, Krishna Kumar Sharma, Deepti Gahalot
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 41 - Number 7
Year of Publication: 2012
Authors: Vibekananda Dutta, Krishna Kumar Sharma, Deepti Gahalot
10.5120/5557-7632

Vibekananda Dutta, Krishna Kumar Sharma, Deepti Gahalot. Performance Comparison of Hard and Soft Approaches for Document Clustering. International Journal of Computer Applications. 41, 7 (March 2012), 44-48. DOI=10.5120/5557-7632

@article{ 10.5120/5557-7632,
author = { Vibekananda Dutta and Krishna Kumar Sharma and Deepti Gahalot },
title = { Performance Comparison of Hard and Soft Approaches for Document Clustering },
journal = { International Journal of Computer Applications },
issue_date = { March 2012 },
volume = { 41 },
number = { 7 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 44-48 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume41/number7/5557-7632/ },
doi = { 10.5120/5557-7632 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:29:01.712571+05:30
%A Vibekananda Dutta
%A Krishna Kumar Sharma
%A Deepti Gahalot
%T Performance Comparison of Hard and Soft Approaches for Document Clustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 41
%N 7
%P 44-48
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The amount of information available through large shared sources such as search engines is growing tremendously. Fast, high-quality document clustering algorithms play an important role in helping users navigate vertical search engines and the World Wide Web, and in summarizing and organizing information. Recent surveys have shown that partitional clustering algorithms are more suitable for clustering large datasets such as the World Wide Web. K-means is the most commonly used partitional clustering algorithm because it is easy to implement and is the most efficient in terms of execution time. In this paper we present a short overview of soft approaches, in the form of an optimal fuzzy document clustering algorithm, and compare them with hard approaches. In our experiments we applied a hard approach (K-means) and a soft approach (Fuzzy c-means) to different text document datasets. The number of documents in the datasets ranges from 1500 to 2600, and the number of terms ranges from 6000 to over 7500, for both the hard and the soft approaches. The results illustrate that the soft approaches can generate slightly better results than the hard approaches.
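
To make the distinction between hard and soft clustering concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes documents are already represented as term-frequency vectors; the five toy vectors, the choice of initial centers, the fuzzifier `m = 2.0`, and all function names are hypothetical and chosen only for illustration. K-means assigns each document to exactly one cluster, whereas Fuzzy c-means gives each document a membership degree in every cluster, with the degrees summing to 1.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_mean(points, weights):
    """Weighted centroid of the points (weights must not all be zero)."""
    total = sum(weights)
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(len(points[0]))]

def assign(points, centers):
    """Hard assignment: index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points]

def kmeans(points, centers, iters=20):
    """Hard clustering: alternate nearest-center assignment and centroid update."""
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old center if a cluster loses all of its members.
            new_centers.append(
                weighted_mean(members, [1.0] * len(members)) if members else old)
        centers = new_centers
    return assign(points, centers), centers

def memberships(points, centers, m, eps=1e-9):
    """Soft assignment: u[i][k] = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances so a point sitting on a center does not divide by zero.
        d = [max(dist(p, c), eps) for c in centers]
        u.append([1.0 / sum((dk / dj) ** power for dj in d) for dk in d])
    return u

def fuzzy_cmeans(points, centers, m=2.0, iters=20):
    """Soft clustering: alternate membership update and weighted centroid update."""
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = [weighted_mean(points, [row[k] ** m for row in u])
                   for k in range(len(centers))]
    return memberships(points, centers, m), centers

# Hypothetical term-frequency vectors over three terms (toy data, not from the paper).
docs = [[3.0, 0.0, 0.0], [2.0, 1.0, 0.0],   # mostly term 0
        [0.0, 0.0, 3.0], [0.0, 1.0, 2.0],   # mostly term 2
        [1.0, 1.0, 1.0]]                    # equally close to both groups
init = [docs[0], docs[2]]                   # assumed initial centers

labels, _ = kmeans(docs, init)
u, _ = fuzzy_cmeans(docs, init)
print("k-means labels:", labels)
print("fuzzy memberships:", [[round(x, 3) for x in row] for row in u])
```

On this toy data the last document lies exactly between the two groups. K-means has to place it in one cluster, while Fuzzy c-means gives it a membership of about 0.5 in each, which is the kind of overlap a soft approach is able to express.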

Index Terms

Computer Science
Information Sciences

Keywords

Document Clustering, Hard and Soft Approaches, Text Datasets, Cluster Centroid, Vector Space Model