International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176 - Number 17
Year of Publication: 2020
Authors: Summit Haque, Md. Abu Shahriar Ratul, Md. Yousuf Ali Khan
10.5120/ijca2020920120
Summit Haque, Md. Abu Shahriar Ratul, Md. Yousuf Ali Khan. Open Source Autonomous Bengali Corpus. International Journal of Computer Applications. 176, 17 (Apr 2020), 33-37. DOI=10.5120/ijca2020920120
A sentiment analysis system makes it possible to determine what kind of information a text contains. For example, one can identify whether a text is about a particular product, politics, sport, entertainment, education, and so on. The text can then be further categorized as positive, negative, or neutral. Proper sentiment analysis would therefore take current technology a step further. Much work on sentiment analysis has already been done in different languages, but work on Bangla text remains very limited because of a lack of data: word categorization accuracy depends heavily on the size of the text corpus used to derive the inter-word statistics. It was therefore planned to develop an automated corpus generation system that traverses the Web, collects text, and stores it under a defined category. This flexible scheme can produce very large general-purpose corpora or particular samples of domain-specific text.
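
The traversal-and-storage idea described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes that each category is defined by a list of seed URLs, that a page counts as Bangla when at least half of its non-whitespace characters fall in the Unicode Bengali block (U+0980 to U+09FF), and that each accepted page is written to a numbered text file in a per-category directory. The names `build_corpus`, `bengali_ratio`, `fetch`, `max_pages`, and `min_ratio` are hypothetical and chosen only for this example.

```python
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen


class _TextAndLinks(HTMLParser):
    """Collects visible text chunks and outgoing links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks, self.links, self._skip = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore script/style contents and whitespace-only chunks.
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def fetch(url):
    """Default page fetcher (assumption: pages are UTF-8 encoded)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def bengali_ratio(text):
    """Share of non-whitespace characters in the Unicode Bengali block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if "\u0980" <= c <= "\u09ff") / len(chars)


def build_corpus(seeds, out_dir, fetch_page=fetch, max_pages=20, min_ratio=0.5):
    """Breadth-first crawl per category; `seeds` maps category -> seed URLs.

    Returns a dict mapping each category to the number of pages saved.
    """
    saved = {}
    for category, urls in seeds.items():
        cat_dir = Path(out_dir) / category
        cat_dir.mkdir(parents=True, exist_ok=True)
        queue, seen = deque(urls), set()
        saved[category] = 0
        while queue and len(seen) < max_pages:
            url = queue.popleft()
            if url in seen:
                continue
            seen.add(url)
            try:
                html = fetch_page(url)
            except Exception:
                continue  # skip pages that cannot be fetched
            parser = _TextAndLinks()
            parser.feed(html)
            text = "\n".join(parser.chunks)
            if bengali_ratio(text) >= min_ratio:
                name = f"{saved[category]:05d}.txt"
                (cat_dir / name).write_text(text, encoding="utf-8")
                saved[category] += 1
            # Follow links found on the page, resolved against its URL.
            queue.extend(urljoin(url, href) for href in parser.links)
    return saved
```

Passing the fetcher as a parameter keeps the sketch testable without network access. A real crawler would also need to respect robots.txt, rate-limit its requests, and restrict which domains it follows, none of which is shown here.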