Using Data Fusion for a Context Aware Document Clustering

P. Venkateshkumar; A. Subramani


Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 72 - Number 6

Year of Publication: 2013

Authors: P. Venkateshkumar, A. Subramani

DOI: 10.5120/12497-7430

P. Venkateshkumar, A. Subramani. Using Data Fusion for a Context Aware Document Clustering. International Journal of Computer Applications. 72, 6 (June 2013), 17-20. DOI=10.5120/12497-7430

@article{ 10.5120/12497-7430,
  author = { P. Venkateshkumar and A. Subramani },
  title = { Using Data Fusion for a Context Aware Document Clustering },
  journal = { International Journal of Computer Applications },
  issue_date = { June 2013 },
  volume = { 72 },
  number = { 6 },
  month = { June },
  year = { 2013 },
  issn = { 0975-8887 },
  pages = { 17-20 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume72/number6/12497-7430/ },
  doi = { 10.5120/12497-7430 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:37:12.680084+05:30
%A P. Venkateshkumar
%A A. Subramani
%T Using Data Fusion for a Context Aware Document Clustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 72
%N 6
%P 17-20
%D 2013
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The large volume of unstructured text data available from sources such as digital libraries, news, and the internet has given rise to a need to organize information according to the user's requirements. Searching for relevant information is more efficient when the context of the selected word in the document is considered. Document clustering aims to discover natural groupings and to present an overview of the classes (topics) in a document collection, so that documents with similar content are related to the same query. In this paper, a new method for clustering documents is proposed. In the proposed method, the term frequency of the document collection is computed and context-based terms are fused. Agglomerative clustering and Bisecting K-Means are then used to cluster the extracted features.
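
The pipeline named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the context-based fusion step, so that step is omitted here, and the whitespace tokenizer, cosine similarity, average linkage, deterministic seeding, and all function names are assumptions chosen for illustration only.

```python
import math
from collections import Counter

def term_frequency(doc):
    """Relative frequency of each lowercase, whitespace-separated token."""
    tokens = doc.lower().split()
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def agglomerative(vectors, k):
    """Average-link agglomerative clustering down to k clusters (assumed linkage)."""
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        a, b = max(pairs, key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]   # merge the most similar pair
        del clusters[b]
    return clusters

def centroid(vectors, members):
    """Mean term-weight vector of the given documents."""
    total = Counter()
    for i in members:
        for t, w in vectors[i].items():
            total[t] += w
    return {t: w / len(members) for t, w in total.items()}

def two_means(vectors, members, iters=10):
    """Split one cluster in two; seeds are the first document and the one
    least similar to it (a deterministic choice made for this sketch)."""
    seed_a = members[0]
    seed_b = min(members, key=lambda i: cosine(vectors[seed_a], vectors[i]))
    cents = [vectors[seed_a], vectors[seed_b]]
    for _ in range(iters):
        left = [i for i in members
                if cosine(vectors[i], cents[0]) >= cosine(vectors[i], cents[1])]
        right = [i for i in members if i not in left]
        if not left or not right:
            break
        cents = [centroid(vectors, left), centroid(vectors, right)]
    return left, right

def bisecting_kmeans(vectors, k):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        biggest = max(clusters, key=len)
        clusters.remove(biggest)
        clusters += list(two_means(vectors, biggest))
    return clusters

# Toy documents (hypothetical, not drawn from the Reuters dataset).
docs = [
    "stock market shares rise",
    "market shares fall stock",
    "football match goal win",
    "goal football team win",
]
vectors = [term_frequency(d) for d in docs]
print(agglomerative(vectors, 2))
print(bisecting_kmeans(vectors, 2))
```

On this toy input both procedures group the two finance documents together and the two football documents together. A real evaluation would need stop-word removal, a weighting scheme such as TF-IDF, and random restarts for the K-means step, none of which are shown here.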


Index Terms

Computer Science

Information Sciences

Keywords

Document clustering, term frequency, Bisecting K-means, Agglomerative clustering, Reuters dataset