International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 11
Year of Publication: 2024
Authors: Datla Tarun Anjaneya Varma, Nukala Sai Dhanuj, Nookala Gopala Krishna Murthy
DOI: 10.5120/ijca2024923466
Datla Tarun Anjaneya Varma, Nukala Sai Dhanuj, Nookala Gopala Krishna Murthy. Hostile Content Detection from Tweets in Hindi using Machine Learning and Deep Learning. International Journal of Computer Applications. 186, 11 (Mar 2024), 30-34. DOI=10.5120/ijca2024923466
This paper addresses cyberbullying detection in Hindi social media posts, an area that has received little research attention. The study uses the dataset from the CONSTRAINT-2021[1][6] shared task, which contains approximately 8,200 posts annotated with hostile categories such as fake, hate, offensive, and defamation. Two approaches are compared: one translates the posts into English and applies the mBERT transformer model, and the other applies iNLTK embeddings directly to the Hindi posts. The mBERT approach outperforms the iNLTK approach. Using gradient-boosting classifiers such as XGBoost, LightGBM, and CatBoost, the study attains strong F1 scores across the categories of hostile content. The work adds to the literature on cyberbullying detection in regional languages and offers insights for mitigating the problem.
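Because results are reported as F1 scores per hostile category, a small worked example of that metric may help. The sketch below is illustrative only and is not the authors' evaluation code: it assumes each post may carry zero or more of the four categories, and the function name `per_category_f1` and the toy labels are hypothetical.

```python
from typing import Dict, List, Set

# Category names taken from the abstract; treating them as independent
# labels per post is an assumption made for this illustration.
CATEGORIES = ["fake", "hate", "offensive", "defamation"]


def per_category_f1(
    gold: List[Set[str]],
    pred: List[Set[str]],
    categories: List[str] = CATEGORIES,
) -> Dict[str, float]:
    """Return the F1 score of each category, computed one-vs-rest."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of posts")

    scores: Dict[str, float] = {}
    for cat in categories:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if cat in g and cat in p)      # true positives
        fp = sum(1 for g, p in pairs if cat not in g and cat in p)  # false positives
        fn = sum(1 for g, p in pairs if cat in g and cat not in p)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when the
        # category never appears in either gold or predicted labels.
        scores[cat] = (2 * tp / denom) if denom else 0.0
    return scores


# Toy data (made up for illustration, not drawn from the dataset).
gold = [{"fake"}, {"hate", "offensive"}, set(), {"defamation"}, {"fake", "hate"}]
pred = [{"fake"}, {"hate"}, {"offensive"}, {"defamation"}, {"hate"}]

scores = per_category_f1(gold, pred)
for category, value in scores.items():
    print(f"{category}: {value:.3f}")
```

Scoring each category separately keeps a weak category visible instead of letting it be averaged away by the others.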