Research Article

Detecting Hate Speech and Offensive Language: Evaluating Multiple Machine Learning Approaches on a Common Dataset

by Krishna B. Singha, Arti Bajaj
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 67
Year of Publication: 2025
Authors: Krishna B. Singha, Arti Bajaj
10.5120/ijca2025924488

Krishna B. Singha, Arti Bajaj. Detecting Hate Speech and Offensive Language: Evaluating Multiple Machine Learning Approaches on a Common Dataset. International Journal of Computer Applications. 186, 67 (Feb 2025), 47-52. DOI=10.5120/ijca2025924488

@article{ 10.5120/ijca2025924488,
author = { Krishna B. Singha and Arti Bajaj },
title = { Detecting Hate Speech and Offensive Language: Evaluating Multiple Machine Learning Approaches on a Common Dataset },
journal = { International Journal of Computer Applications },
issue_date = { Feb 2025 },
volume = { 186 },
number = { 67 },
month = { Feb },
year = { 2025 },
issn = { 0975-8887 },
pages = { 47-52 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume186/number67/detecting-hate-speech-and-offensive-language-evaluating-multiple-machine-learning-approaches-on-a-common-dataset/ },
doi = { 10.5120/ijca2025924488 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-02-25T22:58:01.915345+05:30
%A Krishna B. Singha
%A Arti Bajaj
%T Detecting Hate Speech and Offensive Language: Evaluating Multiple Machine Learning Approaches on a Common Dataset
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 67
%P 47-52
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Social media platforms are now readily exploited to spread hate speech and offensive language directed at an individual, an organization, one or more communities, or an entire country. Such messages can have horrific consequences, as witnessed in the recent past. Controlling their spread in a timely manner is therefore crucial, and such content should ideally be identified and filtered out before it becomes available on a platform. Machine learning is one of the popular approaches to this task. This work compares the performance of three machine learning algorithms, Support Vector Machine (SVM), Stochastic Gradient Descent (SGD), and Naïve Bayes (NB), and one ensemble learning method, Random Forest (RF), on a dataset of hate speech text from X (formerly Twitter). We treat the detection of hate speech in a tweet as a binary classification problem with two classes, hate and non-hate. The text is first cleaned using various natural language processing techniques, and the dataset is then resampled to balance the two classes. Suitable feature engineering techniques are used to extract and select the features that matter for classification. Each learning technique is then evaluated on the resulting feature set. SVM achieved the highest F1 score, 98%, and NB the lowest, 92%.
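
The abstract names three steps around the classifiers: cleaning the text, balancing the two classes, and scoring with F1. It does not say which cleaning rules or which resampling method were used. The sketch below is therefore only an illustration built on assumptions: regex-based cleaning, random oversampling of the minority class, and hypothetical helper names (`clean_tweet`, `oversample`, `f1_score`). It is not the authors' implementation, and it omits the feature extraction and the classifiers themselves.

```python
import random
import re


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and non-letter characters.
    These cleaning rules are assumed; the abstract does not list them."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)       # drop digits, punctuation, '#'
    return " ".join(text.split())               # collapse whitespace


def oversample(texts, labels, seed=0):
    """Randomly duplicate items of smaller classes until every class matches
    the largest one. Random oversampling is an assumed choice of resampler."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the positive class; 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a setup like this, resampling would normally be applied to the training split only, so that duplicated tweets cannot also appear in the test data and inflate the reported F1 score.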

Index Terms

Computer Science
Information Sciences
Classification task
Machine Learning Algorithms

Keywords

Hate Speech, Offensive Speech, Hate Speech Classification, Social Media, Support Vector Machine, Machine Learning, Random Forest, Ensemble Learning, Stochastic Gradient Descent, Naïve Bayes