Research Article

An Approach for Document Clustering using Agglomerative Clustering and Hebbian-type Neural Network

by Gopal Patidar, Anju Singh, Divakar Singh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 75 - Number 9
Year of Publication: 2013
DOI: 10.5120/13139-0532

Gopal Patidar, Anju Singh, Divakar Singh. An Approach for Document Clustering using Agglomerative Clustering and Hebbian-type Neural Network. International Journal of Computer Applications. 75, 9 (August 2013), 17-22. DOI=10.5120/13139-0532

@article{ 10.5120/13139-0532,
author = { Gopal Patidar and Anju Singh and Divakar Singh },
title = { An Approach for Document Clustering using Agglomerative Clustering and Hebbian-type Neural Network },
journal = { International Journal of Computer Applications },
issue_date = { August 2013 },
volume = { 75 },
number = { 9 },
month = { August },
year = { 2013 },
issn = { 0975-8887 },
pages = { 17-22 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume75/number9/13139-0532/ },
doi = { 10.5120/13139-0532 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:43:49.923249+05:30
%A Gopal Patidar
%A Anju Singh
%A Divakar Singh
%T An Approach for Document Clustering using Agglomerative Clustering and Hebbian-type Neural Network
%J International Journal of Computer Applications
%@ 0975-8887
%V 75
%N 9
%P 17-22
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering is a useful method that groups a large quantity of unordered text documents into a small number of meaningful and coherent collections, thereby providing a basis for intuitive and informative navigation and browsing mechanisms. Different types of distance functions and similarity measures have been used for clustering, such as squared Euclidean distance, cosine similarity, and relative entropy. This paper presents dimension reduction of the text document space for document retrieval using agglomerative clustering and a Hebbian-type neural network. The Hebbian-type neural network reduces the document space to two dimensions, so each document is represented as a point in the reduced document space. The clusters are then formed in this compact document space.
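
The pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy term-count matrix, the use of Oja's subspace rule as the Hebbian-type update, the learning rate and epoch count, average linkage with Euclidean distance, and all function names. The sketch projects each document vector onto two output units and then merges the resulting 2-D points bottom-up until two clusters remain.

```python
import math
import random

def center(docs):
    """Subtract the per-term mean so the linear layer sees zero-mean input."""
    n = len(docs)
    means = [sum(col) / n for col in zip(*docs)]
    return [[x - m for x, m in zip(doc, means)] for doc in docs]

def project(W, x):
    """Output of the linear layer: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def train_oja_subspace(docs, n_out=2, lr=0.01, epochs=200, seed=0):
    """Oja's subspace rule (assumed variant): W += lr * y * (x - W^T y)^T."""
    rng = random.Random(seed)
    n_terms = len(docs[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_terms)] for _ in range(n_out)]
    for _ in range(epochs):
        for x in docs:
            y = project(W, x)
            # Reconstruction of the input from the two outputs: x_hat = W^T y
            x_hat = [sum(y[k] * W[k][j] for k in range(n_out))
                     for j in range(n_terms)]
            for k in range(n_out):
                for j in range(n_terms):
                    W[k][j] += lr * y[k] * (x[j] - x_hat[j])
    return W

def agglomerative(points, n_clusters):
    """Bottom-up clustering with average linkage on Euclidean distance."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical term-count matrix: 6 documents x 4 terms, two rough topics.
docs = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [3, 3, 1, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [0, 0, 3, 3],
]
centered = center(docs)
W = train_oja_subspace(centered)              # 2 x 4 weight matrix
points = [project(W, x) for x in centered]    # each document as a 2-D point
clusters = agglomerative(points, n_clusters=2)
print(clusters)
```

On real collections the rows of `docs` would typically be weighted term vectors (for example tf-idf) with thousands of terms, and a library implementation of hierarchical clustering would be a more practical choice than the quadratic loop shown here.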

Index Terms

Computer Science
Information Sciences

Keywords

Agglomerative clustering, Oja learning rule, Hebbian-type neural network, F-measure