Bangla Document Categorization using Term Graph

Enamul Hassan; Md Nazim Uddin; Moudud Ahmed Khan

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 178 - Number 44

Year of Publication: 2019

Authors: Enamul Hassan, Md Nazim Uddin, Moudud Ahmed Khan

10.5120/ijca2019919329

Enamul Hassan, Md Nazim Uddin, Moudud Ahmed Khan. Bangla Document Categorization using Term Graph. International Journal of Computer Applications. 178, 44 (Aug 2019), 24-32. DOI=10.5120/ijca2019919329

@article{ 10.5120/ijca2019919329,
  author = { Enamul Hassan and Md Nazim Uddin and Moudud Ahmed Khan },
  title = { Bangla Document Categorization using Term Graph },
  journal = { International Journal of Computer Applications },
  issue_date = { Aug 2019 },
  volume = { 178 },
  number = { 44 },
  month = { Aug },
  year = { 2019 },
  issn = { 0975-8887 },
  pages = { 24-32 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume178/number44/30835-2019919329/ },
  doi = { 10.5120/ijca2019919329 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T00:53:05.189954+05:30
%A Enamul Hassan
%A Md Nazim Uddin
%A Moudud Ahmed Khan
%T Bangla Document Categorization using Term Graph
%J International Journal of Computer Applications
%@ 0975-8887
%V 178
%N 44
%P 24-32
%D 2019
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Bangla document categorization is an emerging topic. Every document has some keywords that reflect its category. Document categorization refers to automatically assigning a category to a document based on the keywords it contains. An effective keyword selection method is necessary to classify a document correctly. TF-IDF [1], Naive Bayes [2][3][4], and KNN [5] are some of the popular methods used in document categorization. Some of these models have also been used in Bangla document categorization. This research focuses mainly on the Term Graph concept. The Term Graph Model (TGM) [5] has not previously been used for Bangla document categorization, so the work combines the Term Graph concept with other existing models to categorize Bangla documents. Experiments were also performed by changing and tuning the feature selection method. Subsets of size at most 3 were used in the experiments. Features were selected by varying the selection formula: in some runs all features were kept, and in others less important features were removed to increase accuracy and reduce space complexity.

References

[1] Y.-h. Lu and Y. Huang, “Document categorization with entropy based TF/IDF classifier,” vol. 4, pp. 269–273, May 2009.
[2] Y. Wang, J. Hodges, and B. Tang, “Classification of web documents using a Naive Bayes method,” pp. 560–564, Dec. 2003.
[3] Y. Matsuo and M. Ishizuka, “Keyword extraction from a single document using word co-occurrence statistical information,” vol. 13, Mar. 2003.
[4] M. El Kourdi, A. Bensaid, and T.-e. Rachidi, “Automatic Arabic document categorization based on the Naïve Bayes algorithm,” Aug. 2004.
[5] V. Bijalwan, P. Kumari, J. Espada, and V. Semwal, “KNN based machine learning approach for text and document mining,” vol. 7, Jun. 2014.
[6] M. Alexandrov, A. Gelbukh, and G. Lozovoi, “Chi-square classifier for document categorization,” pp. 457–459, Feb. 2001.
[7] C.-h. Chan, A. Sun, and E.-P. Lim, “Automated online news classification with personalization,” Mar. 2002.
[8] A. Mesleh, “Support vector machines based Arabic language text classification system: Feature selection comparative study,” pp. 11–16, Jan. 2007.
[9] Y. Wang and Z.-O. Wang, “A fast KNN algorithm for text categorization,” vol. 6, pp. 3436–3441, Sep. 2007.
[10] M. S. Islam, F. Elahi, and S. Ikhtiar Ahmed, “A support vector machine mixed with TF-IDF algorithm to categorize Bengali document,” Apr. 2017.
[11] S. Weiss, C. Apte, F. Damerau, and S. Weiss, “Text mining with decision trees and decision rules,” Oct. 1999.
[12] Z. Chen, C. Ni, and Y. L. Murphey, “Neural network approaches for text document categorization,” pp. 1054–1060, 2006.
[13] P. Soucy and G. Mineau, “A simple KNN algorithm for text categorization,” pp. 647–648, Feb. 2001.
[14] H. Al-Mubaid and S. A. Umair, “A new text categorization technique using distributional clustering and learning logic,” 2006.

Index Terms

Computer Science

Information Sciences

Keywords

TF-IDF, TGM, KNN, SVM, NB