Research Article

A Fuzzy based Document Clustering Algorithm

by Kabita Thaoroijam, A. Kakoti Mahanta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 151 - Number 10
Year of Publication: 2016
Authors: Kabita Thaoroijam, A. Kakoti Mahanta
10.5120/ijca2016911923

Kabita Thaoroijam, A. Kakoti Mahanta. A Fuzzy based Document Clustering Algorithm. International Journal of Computer Applications. 151, 10 (Oct 2016), 21-24. DOI=10.5120/ijca2016911923

@article{ 10.5120/ijca2016911923,
author = { Kabita Thaoroijam and A. Kakoti Mahanta },
title = { A Fuzzy based Document Clustering Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { Oct 2016 },
volume = { 151 },
number = { 10 },
month = { Oct },
year = { 2016 },
issn = { 0975-8887 },
pages = { 21-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume151/number10/26270-2016911923/ },
doi = { 10.5120/ijca2016911923 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:56:45.509524+05:30
%A Kabita Thaoroijam
%A A. Kakoti Mahanta
%T A Fuzzy based Document Clustering Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 151
%N 10
%P 21-24
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Document clustering is the automatic grouping of text documents into clusters such that documents within a cluster are highly similar to one another but dissimilar to documents in other clusters. It has wide applications in areas such as search engines, web mining, information retrieval and topological analysis. This paper presents a new document clustering algorithm based on fuzzy sets, in which each cluster is viewed as a fuzzy set over a finite universal set. The algorithm was implemented, and the results are reported. The efficiency and time complexity of the algorithm are also discussed.
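
The idea of viewing a cluster as a fuzzy set can be illustrated with a minimal sketch. The abstract does not specify the universal set, the membership function, or the merge rule, so everything below is an assumption made for illustration and is not the authors' algorithm: the universal set is taken to be the vocabulary, a term's membership in a cluster is the fraction of the cluster's documents that contain it, similarity is a fuzzy Jaccard ratio, and clusters are merged agglomeratively until the best similarity falls below a hypothetical `threshold`. The names `fuzzy_cluster`, `fuzzy_similarity`, and `agglomerate` are likewise hypothetical.

```python
from collections import Counter


def fuzzy_cluster(docs):
    """Summarize a cluster as a fuzzy set over terms.

    Assumed membership function (illustration only): the fraction of
    documents in the cluster that contain the term.
    """
    counts = Counter(term for doc in docs for term in set(doc.split()))
    return {term: n / len(docs) for term, n in counts.items()}


def fuzzy_similarity(a, b):
    """Fuzzy Jaccard similarity: sum of min memberships over sum of max."""
    terms = set(a) | set(b)
    inter = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    union = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return inter / union if union else 0.0


def agglomerate(docs, threshold=0.2):
    """Merge the most similar pair of clusters until none reaches threshold."""
    clusters = [[d] for d in docs]  # start with one cluster per document
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = fuzzy_similarity(fuzzy_cluster(clusters[i]),
                                     fuzzy_cluster(clusters[j]))
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        if s < threshold:  # assumed stopping rule
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


docs = [
    "fuzzy set clustering",
    "fuzzy clustering algorithm",
    "web search engine",
    "search engine ranking",
]
clusters = agglomerate(docs)
```

In this sketch each cluster is summarized by a single term-to-membership mapping, so two clusters can be compared without revisiting every pair of documents. Recomputing the summaries on every pass keeps the code short but is not efficient; a practical version would cache and update them on merge.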

Index Terms

Computer Science
Information Sciences

Keywords

Document Clustering, Fuzzy Set, Agglomerative Algorithm, Compact Representation