Research Article

Different Machine Learning based Approaches of Baseline and Deep Learning Models for Bengali News Categorization

by Mohammad Rabib Hossain, Soikot Sarkar, Moqsadur Rahman
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176 - Number 18
Year of Publication: 2020
Authors: Mohammad Rabib Hossain, Soikot Sarkar, Moqsadur Rahman
10.5120/ijca2020920107

Mohammad Rabib Hossain, Soikot Sarkar, Moqsadur Rahman. Different Machine Learning based Approaches of Baseline and Deep Learning Models for Bengali News Categorization. International Journal of Computer Applications. 176, 18 (Apr 2020), 10-16. DOI=10.5120/ijca2020920107

@article{ 10.5120/ijca2020920107,
author = { Mohammad Rabib Hossain and Soikot Sarkar and Moqsadur Rahman },
title = { Different Machine Learning based Approaches of Baseline and Deep Learning Models for Bengali News Categorization },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2020 },
volume = { 176 },
number = { 18 },
month = { Apr },
year = { 2020 },
issn = { 0975-8887 },
pages = { 10-16 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume176/number18/31299-2020920107/ },
doi = { 10.5120/ijca2020920107 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Mohammad Rabib Hossain
%A Soikot Sarkar
%A Moqsadur Rahman
%T Different Machine Learning based Approaches of Baseline and Deep Learning Models for Bengali News Categorization
%J International Journal of Computer Applications
%@ 0975-8887
%V 176
%N 18
%P 10-16
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In today's world, people increasingly live much of their lives online, and online news portals have become a major gateway to this digital life. Various platforms have therefore been developed around the globe to meet this demand. Considerable work has been done to automate these platforms for English, so machine-learning-based news classification is a well-developed field for English. The same cannot be said for Bangla, which motivated this research. Bangla news articles were collected from newspapers to build a Bengali corpus. After the news text was preprocessed, several procedures for classifying it were applied, using both baseline machine learning models and deep learning models.
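The abstract outlines a two-stage pipeline: preprocess the news text, then classify it. As a minimal sketch of the baseline stage only, the Python below assumes simple whitespace tokenization after stripping punctuation (including the Bangla danda "।"), a hypothetical two-category toy dataset, and a multinomial Naive Bayes classifier with Laplace smoothing, which is a common text-classification baseline. It also builds a confusion matrix, one of the listed keywords. This is not the authors' code; the paper's actual preprocessing, categories, and models are not specified here.

```python
import math
import re
from collections import Counter

# Hypothetical preprocessing: replace punctuation (including the Bangla danda "।")
# with spaces, then split on whitespace. Not the paper's actual pipeline.
PUNCT = re.compile(r"[।,.!?;:()\[\]-]")

def tokenize(text):
    return PUNCT.sub(" ", text).split()

class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)               # documents per class (priors)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)                   # adds the token keys
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        # log P(c) + sum over tokens of log P(token | c), up to a shared constant
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))

def confusion_matrix(y_true, y_pred, classes):
    # Rows are true classes, columns are predicted classes.
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Hypothetical toy corpus (two categories), for illustration only.
train_docs = [
    "ক্রিকেট খেলা দল জিতেছে।",
    "ফুটবল খেলা শুরু হবে।",
    "সরকার নির্বাচন ঘোষণা করেছে।",
    "মন্ত্রী সরকার নিয়ে কথা বলেছেন।",
]
train_labels = ["sports", "sports", "politics", "politics"]

model = MultinomialNB().fit(train_docs, train_labels)
test_docs = ["ক্রিকেট দল", "নির্বাচন সরকার"]
test_labels = ["sports", "politics"]
preds = [model.predict(d) for d in test_docs]
print(preds)  # ['sports', 'politics']
print(confusion_matrix(test_labels, preds, model.classes))  # [[1, 0], [0, 1]]
```

Rows of the confusion matrix follow the true classes and columns the predicted classes, both in `model.classes` order, so off-diagonal counts show which categories a model confuses. The deep learning models named in the keywords (CNN, BiLSTM) would replace this count-based scoring with learned text representations and are not sketched here.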

Index Terms

Computer Science
Information Sciences

Keywords

Sentiment Analysis, Bangla News Categorization, Confusion Matrix, CNN, BiLSTM