Research Article

Open Source Autonomous Bengali Corpus

by Summit Haque, Md. Abu Shahriar Ratul, Md. Yousuf Ali Khan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176 - Number 17
Year of Publication: 2020
Authors: Summit Haque, Md. Abu Shahriar Ratul, Md. Yousuf Ali Khan
10.5120/ijca2020920120

Summit Haque, Md. Abu Shahriar Ratul, Md. Yousuf Ali Khan. Open Source Autonomous Bengali Corpus. International Journal of Computer Applications. 176, 17 (Apr 2020), 33-37. DOI=10.5120/ijca2020920120

@article{ 10.5120/ijca2020920120,
author = { Summit Haque and Md. Abu Shahriar Ratul and Md. Yousuf Ali Khan },
title = { Open Source Autonomous Bengali Corpus },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2020 },
volume = { 176 },
number = { 17 },
month = { Apr },
year = { 2020 },
issn = { 0975-8887 },
pages = { 33-37 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume176/number17/31297-2020920120/ },
doi = { 10.5120/ijca2020920120 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:42:50.406922+05:30
%A Summit Haque
%A Md. Abu Shahriar Ratul
%A Md. Yousuf Ali Khan
%T Open Source Autonomous Bengali Corpus
%J International Journal of Computer Applications
%@ 0975-8887
%V 176
%N 17
%P 33-37
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A sentiment analysis system makes it possible to determine what kind of information a text contains. For example, one can identify whether a text concerns a particular product, political view, sport, entertainment, education, and so on. The text can then be further categorized as positive, negative, or neutral. Proper sentiment analysis would therefore take current technology a step further. Much work on sentiment analysis has already been done in different languages, but work on Bangla text remains very limited for lack of data, because word categorization accuracy depends heavily on the size of the text corpus used to derive the inter-word statistics. This work therefore set out to develop an automated corpus generation system that traverses the Web, collects text, and stores it under a defined category. This flexible scheme can produce very large general-purpose corpora or particular samples of domain-specific text.
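
The crawl-and-categorize idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`extract_bengali`, `build_corpus`), the injected `fetch` callable, the `max_pages` limit, and the rule of keeping only characters from the Bengali Unicode block (U+0980 to U+09FF, plus the danda U+0964) are all assumptions made for illustration.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: "Bengali text" means runs of characters from the Bengali
# Unicode block (U+0980..U+09FF), optionally joined by whitespace or the danda.
BENGALI_RUN = re.compile(r"[\u0980-\u09FF][\u0980-\u09FF\u0964\s]*")


class _PageParser(HTMLParser):
    """Collects visible text and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def extract_bengali(html):
    """Return (bengali_text, links) for one page of HTML."""
    parser = _PageParser()
    parser.feed(html)
    text = " ".join(parser.text_parts)
    runs = (" ".join(m.group().split()) for m in BENGALI_RUN.finditer(text))
    return " ".join(r for r in runs if r), parser.links


def build_corpus(seeds, fetch, max_pages=50):
    """Breadth-first crawl from each category's seed URLs.

    seeds maps a category name to a list of start URLs (hypothetical input).
    fetch(url) returns the page HTML as a string, or None on failure.
    Text found while crawling a category's seeds is stored under that category.
    """
    corpus = {category: [] for category in seeds}
    seen = set()
    for category, urls in seeds.items():
        queue = deque(urls)
        visited = 0
        while queue and visited < max_pages:
            url = queue.popleft()
            if url in seen:
                continue
            seen.add(url)
            html = fetch(url)
            if html is None:
                continue
            visited += 1
            text, links = extract_bengali(html)
            if text:
                corpus[category].append(text)
            queue.extend(urljoin(url, link) for link in links)
    return corpus
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real crawler would presumably wrap an HTTP client there and would also need politeness controls such as rate limiting and robots.txt handling, which are omitted here.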

Index Terms

Computer Science
Information Sciences

Keywords

Corpus, Autonomous Corpus, Autonomous Bengali Corpus, Zipf's Law