International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 178 - Number 44
Year of Publication: 2019
Authors: Enamul Hassan, Md Nazim Uddin, Moudud Ahmed Khan
DOI: 10.5120/ijca2019919329
Enamul Hassan, Md Nazim Uddin, Moudud Ahmed Khan. Bangla Document Categorization using Term Graph. International Journal of Computer Applications. 178, 44 (Aug 2019), 24-32. DOI=10.5120/ijca2019919329
Bangla document categorization is an emerging research topic. Every document contains keywords that reflect its category, and document categorization refers to automatically assigning a document to a category based on the keywords it contains. An effective keyword selection method is therefore necessary to classify a document correctly. TF-IDF [1], Naive Bayes [2][3][4] and KNN [5] are among the widely used methods in document categorization, and some of these models have also been applied to Bangla document categorization. This research focuses mainly on the Term Graph concept. TGM [5] had not previously been used for Bangla document categorization, so the work concentrates on combining the Term Graph concept with other existing models to categorize Bangla documents. Experiments are also performed by varying and tuning the feature selection method. Term subsets of size at most 3 are used in the experiments, and features are selected using different selection formulas. In some experiments all features are kept; in others, less important features are removed to increase accuracy and reduce space complexity.
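The abstract does not spell out the authors' exact algorithm. The Python sketch below is therefore only a rough illustration of two ideas it names: term subsets of size at most 3, and the removal of less important features. It assumes that documents are already tokenized into Bangla terms. It is a simple frequent-term-subset profile classifier, not the Term Graph Model of [5].

All of the following are hypothetical choices made for illustration:

- the names `term_subsets`, `build_profiles` and `classify`;
- the `min_support` pruning threshold;
- the size-weighted scoring rule;
- the toy training data.

```python
from collections import Counter
from itertools import combinations


def term_subsets(terms, max_size=3):
    """Yield every subset of the document's distinct terms with 1..max_size members, as sorted tuples."""
    unique = sorted(set(terms))
    for k in range(1, max_size + 1):
        yield from combinations(unique, k)


def build_profiles(labeled_docs, max_size=3, min_support=1):
    """For each category, count how many of its documents contain each term subset.

    Subsets whose document support is below min_support are dropped; this is a
    hypothetical stand-in for removing less important features.
    """
    counts = {}
    for terms, label in labeled_docs:
        counts.setdefault(label, Counter()).update(term_subsets(terms, max_size))
    return {
        label: {s: c for s, c in counter.items() if c >= min_support}
        for label, counter in counts.items()
    }


def classify(terms, profiles, max_size=3):
    """Return the category whose profile best matches the document.

    Hypothetical scoring: sum (subset size * support) over the document's subsets
    found in a profile, normalized by the profile's total support.
    """
    subsets = list(term_subsets(terms, max_size))
    scores = {}
    for label, profile in profiles.items():
        total = sum(profile.values()) or 1  # avoid division by zero for an empty profile
        scores[label] = sum(len(s) * profile.get(s, 0) for s in subsets) / total
    return max(scores, key=scores.get)


# Toy, made-up training data: (tokenized document, category).
docs = [
    (["খেলা", "দল", "গোল"], "sports"),
    (["খেলা", "গোল", "মাঠ"], "sports"),
    (["নির্বাচন", "দল", "সরকার"], "politics"),
    (["সরকার", "নির্বাচন", "ভোট"], "politics"),
]
profiles = build_profiles(docs, max_size=3, min_support=2)
print(classify(["গোল", "খেলা", "সরকার"], profiles))  # → sports
```

With `min_support=2`, subsets that occur in only one training document of a category are dropped. For example, "দল" is removed from the sports profile. This mirrors the space-versus-accuracy trade-off the abstract describes. Setting `min_support=1` keeps every subset, analogous to selecting all features.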