Research Article

Using Data Fusion for a Context Aware Document Clustering

by P. Venkateshkumar, A. Subramani
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 72 - Number 6
Year of Publication: 2013
DOI: 10.5120/12497-7430

P. Venkateshkumar, A. Subramani. Using Data Fusion for a Context Aware Document Clustering. International Journal of Computer Applications. 72, 6 (June 2013), 17-20. DOI=10.5120/12497-7430

@article{ 10.5120/12497-7430,
author = { P. Venkateshkumar and A. Subramani },
title = { Using Data Fusion for a Context Aware Document Clustering },
journal = { International Journal of Computer Applications },
issue_date = { June 2013 },
volume = { 72 },
number = { 6 },
month = { June },
year = { 2013 },
issn = { 0975-8887 },
pages = { 17-20 },
numpages = { 4 },
url = { https://ijcaonline.org/archives/volume72/number6/12497-7430/ },
doi = { 10.5120/12497-7430 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

The large volume of unstructured text data available from sources such as digital libraries, news, and the internet has created a need to organize information according to the user's requirements. Searching for relevant information is more efficient when the context of a selected word in the document is considered. Document clustering aims to discover natural groupings and to present an overview of the classes (topics) in a document collection, so that documents with similar contents are grouped together and are related to the same query. In this paper, a new method for clustering documents is proposed. In the proposed method, the term frequency of the document collection is computed and context-based terms are fused. Agglomerative clustering and Bisecting K-Means are then used to cluster the extracted features.
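The abstract names the main stages (term-frequency features, then Agglomerative clustering or Bisecting K-Means) but does not specify the context-based fusion step or any parameters. The following is a minimal Python sketch, using only the standard library, of how those two clustering algorithms can be applied to raw term-frequency vectors. It is not the authors' implementation: whitespace tokenization, cosine similarity, average linkage, "always bisect the largest cluster", and "keep the most cohesive of several trial splits" are all assumptions chosen for illustration, the fusion step is omitted, and every function name (`tf_vectors`, `agglomerative`, `bisecting_kmeans`, and so on) and the toy documents are hypothetical.

```python
# Sketch only: raw term-frequency features clustered with average-link
# agglomerative clustering and Bisecting K-Means (cosine similarity).
# Assumptions: whitespace tokenization, no stop-word removal or stemming.
# The paper's context-based term fusion step is NOT implemented here.
import math
import random
from collections import Counter

def tf_vectors(docs):
    """Raw term-frequency vectors over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors, members):
    """Component-wise mean of the member vectors."""
    dim = len(vectors[0])
    return [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]

def agglomerative(vectors, k):
    """Average-link agglomerative clustering: merge the most similar pair until k remain."""
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        # Average pairwise cosine similarity between two clusters.
        return sum(cosine(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        x, y = max(pairs, key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters

def two_means(vectors, members, rng, iters=10):
    """Bisecting step: split one cluster in two with basic 2-means (cosine assignment)."""
    cents = [vectors[s] for s in rng.sample(members, 2)]
    best = None
    for _ in range(iters):
        parts = ([], [])
        for i in members:
            side = 0 if cosine(vectors[i], cents[0]) >= cosine(vectors[i], cents[1]) else 1
            parts[side].append(i)
        if not parts[0] or not parts[1]:
            break  # degenerate split (e.g. identical documents); keep the last valid one
        best = parts
        cents = [centroid(vectors, p) for p in parts]
    return best if best is not None else (members[:1], members[1:])

def cohesion(vectors, members):
    """Sum of member-to-centroid cosine similarities (higher means a tighter cluster)."""
    c = centroid(vectors, members)
    return sum(cosine(vectors[i], c) for i in members)

def bisecting_kmeans(vectors, k, trials=5, seed=0):
    """Repeatedly bisect the largest cluster, keeping the most cohesive trial split."""
    rng = random.Random(seed)
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()  # largest cluster
        if len(target) < 2:
            clusters.append(target)
            break  # every cluster is a singleton; nothing left to split
        best_split, best_score = None, float("-inf")
        for _ in range(trials):
            a, b = two_means(vectors, target, rng)
            score = cohesion(vectors, a) + cohesion(vectors, b)
            if score > best_score:
                best_split, best_score = (a, b), score
        clusters.extend(best_split)
    return clusters

# Hypothetical toy documents (not the Reuters data used in the paper).
docs = ["stock market shares trading", "market shares stock prices",
        "football match goal team", "team goal football league"]
vocab, vecs = tf_vectors(docs)
print(agglomerative(vecs, 2))     # [[0, 1], [2, 3]]
print(bisecting_kmeans(vecs, 2))  # clusters {0, 1} and {2, 3}; list order may vary
```

Average linkage and splitting the largest cluster are only two of several reasonable choices; complete or single linkage, or splitting the least cohesive cluster, would be equally plausible readings of the abstract, and a real pipeline would typically also weight or normalize the term frequencies.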

Index Terms

Computer Science
Information Sciences

Keywords

Document clustering, term frequency, Bisecting K-means, Agglomerative clustering, Reuters dataset