Research Article

Bangla Document Categorization using Term Graph

by Enamul Hassan, Md Nazim Uddin, Moudud Ahmed Khan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 178 - Number 44
Year of Publication: 2019
Authors: Enamul Hassan, Md Nazim Uddin, Moudud Ahmed Khan
10.5120/ijca2019919329

Enamul Hassan, Md Nazim Uddin, Moudud Ahmed Khan. Bangla Document Categorization using Term Graph. International Journal of Computer Applications. 178, 44 (Aug 2019), 24-32. DOI=10.5120/ijca2019919329

@article{ 10.5120/ijca2019919329,
author = { Enamul Hassan and Md Nazim Uddin and Moudud Ahmed Khan },
title = { Bangla Document Categorization using Term Graph },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2019 },
volume = { 178 },
number = { 44 },
month = { Aug },
year = { 2019 },
issn = { 0975-8887 },
pages = { 24-32 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume178/number44/30835-2019919329/ },
doi = { 10.5120/ijca2019919329 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:53:05.189954+05:30
%A Enamul Hassan
%A Md Nazim Uddin
%A Moudud Ahmed Khan
%T Bangla Document Categorization using Term Graph
%J International Journal of Computer Applications
%@ 0975-8887
%V 178
%N 44
%P 24-32
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Bangla document categorization is an emerging research topic. Every document contains keywords that reflect its category, and document categorization is the task of automatically assigning a document to a category based on the keywords it contains. An effective keyword selection method is therefore necessary to classify a document correctly. TF-IDF [1], Naive Bayes [2][3][4], and KNN [5] are among the methods commonly used for document categorization, and some of these models have also been applied to Bangla documents. This research focuses on the Term Graph concept. The Term Graph Model (TGM) [5] has not previously been used for Bangla document categorization, so the work combines the Term Graph concept with other existing models to categorize Bangla documents. Experiments were also performed by changing and tuning the feature selection method. Subsets of size at most three were used in the experiments, and features were selected using different selection formulas. In some runs all features were kept; in others, less important features were removed to increase accuracy and reduce space complexity.
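
The abstract does not spell out how the term graph is built, so the following is only a rough sketch of one common reading of the idea, not the authors' implementation. It assumes that the "3-size subsets" are sets of up to three terms co-occurring in a document, that a subset is kept when it appears in at least `min_support` documents, and that two terms are linked when they share a kept subset. The function names, the `min_support` threshold, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_term_subsets(documents, max_size=3, min_support=2):
    """Count term subsets of size 1..max_size and keep those that occur
    in at least min_support documents (assumed selection rule)."""
    counts = Counter()
    for tokens in documents:
        terms = sorted(set(tokens))  # each subset counts once per document
        for size in range(1, max_size + 1):
            counts.update(combinations(terms, size))
    return {subset: n for subset, n in counts.items() if n >= min_support}


def build_term_graph(frequent_subsets):
    """Link every pair of terms that share a frequent subset. The edge
    weight accumulates the support of each subset containing the pair."""
    graph = Counter()
    for subset, support in frequent_subsets.items():
        for a, b in combinations(subset, 2):
            graph[(a, b)] += support
    return graph


# Toy, already tokenized Bangla documents (hypothetical data).
docs = [
    ["ক্রিকেট", "দল", "জয়"],      # cricket, team, win
    ["ক্রিকেট", "দল", "খেলা"],     # cricket, team, game
    ["বাজেট", "অর্থনীতি"],         # budget, economy
]
subsets = frequent_term_subsets(docs)
graph = build_term_graph(subsets)
```

With these toy documents, the only pair of terms that co-occurs in two documents is ক্রিকেট and দল, so the graph contains a single edge between them with weight 2. Raising `min_support` or lowering `max_size` trades graph richness for memory, which mirrors the accuracy and space trade-off the abstract mentions.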

References
  1. Y. h. Lu and Y. Huang, “Document categorization with entropy based TF/IDF classifier,” vol. 4, pp. 269–273, May 2009.
  2. Y. Wang, J. Hodges, and B. Tang, “Classification of web documents using a Naive Bayes method,” pp. 560–564, 12 2003.
  3. Y. Matsuo and M. Ishizuka, “Keyword extraction from a single document using word co-occurrence statistical information,” vol. 13, 03 2003.
  4. M. El Kourdi, A. Bensaid, and T.-e. Rachidi, “Automatic Arabic document categorization based on the naïve Bayes algorithm,” 08 2004.
  5. V. Bijalwan, P. Kumari, J. Espada, and V. Semwal, “KNN based machine learning approach for text and document mining,” vol. 7, 06 2014.
  6. M. Alexandrov, A. Gelbukh, and G. Lozovoi, “Chi-square classifier for document categorization,” pp. 457–459, 02 2001.
  7. C.-h. Chan, A. Sun, and E.-P. Lim, “Automated online news classification with personalization,” 03 2002.
  8. A. Mesleh, “Support vector machines based Arabic language text classification system: Feature selection comparative study,” pp. 11–16, 01 2007.
  9. Y. Wang and Z.-O. Wang, “A fast KNN algorithm for text categorization,” vol. 6, pp. 3436–3441, 09 2007.
  10. M. S. Islam, F. Elahi, and S. Ikhtiar Ahmed, “A support vector machine mixed with TF-IDF algorithm to categorize Bengali document,” 04 2017.
  11. S. Weiss, C. Apte, F. Damerau, and S. Weiss, “Text mining with decision trees and decision rules,” 10 1999.
  12. Z. Chen, C. Ni, and Y. L. Murphey, “Neural network approaches for text document categorization,” pp. 1054–1060, 2006.
  13. P. Soucy and G. Mineau, “A simple KNN algorithm for text categorization,” pp. 647–648, 02 2001.
  14. H. Al-Mubaid and S. A. Umair, “A new text categorization technique using distributional clustering and learning logic,” 2006.
Index Terms

Computer Science
Information Sciences

Keywords

TF-IDF, TGM, KNN, SVM, NB