International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186, Number 67
Year of Publication: 2025
Authors: Krishna B. Singha, Arti Bajaj
Krishna B. Singha, Arti Bajaj. Detecting Hate Speech and Offensive Language: Evaluating Multiple Machine Learning Approaches on a Common Dataset. International Journal of Computer Applications. 186, 67 (Feb 2025), 47-52. DOI=10.5120/ijca2025924488
Social media platforms are now readily exploited to spread hate speech and offensive language directed at individuals, organizations, communities, countries, and other targets. Such messages can have severe real-world consequences, as recent history has shown. To prevent this, the spread of such content must be curbed in a timely manner, ideally by identifying and filtering it out before it is published on social media platforms. Machine learning algorithms have become a popular technique for this task. This work compares the performance of three machine learning algorithms, Support Vector Machine (SVM), Stochastic Gradient Descent (SGD), and Naïve Bayes (NB), and one ensemble learning method, Random Forest (RF), on a dataset of hate speech text from X (formerly Twitter). We treat hate speech detection in a tweet as a binary classification problem with two classes: hate and non-hate. After the text was cleaned using various natural language processing techniques, the dataset was resampled to balance the two classes. Suitable feature engineering techniques were then used to extract and select features for classification, and each learning technique was evaluated on the resulting feature set. SVM achieved the highest F1 score (98%), whereas NB achieved the lowest (92%).
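The abstract does not spell out the exact cleaning, resampling, or feature-engineering steps, so the following is only a minimal, standard-library Python sketch of what such a pipeline might look like. Every concrete choice here is an assumption for illustration and not the authors' method: the stop-word list, the regular expressions in `clean`, random oversampling as the balancing strategy, raw bag-of-words counts as features, and the toy tweets. It implements one of the four compared classifiers (multinomial Naive Bayes with Laplace smoothing) and the F1 score for the hate class, the metric used to rank the models.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list (assumption); the paper's cleaning steps are not specified.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you", "i"}


def clean(text):
    """Lowercase, strip URLs, @mentions and non-letters, then drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def oversample(docs, labels, seed=0):
    """Randomly duplicate minority-class examples until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for doc, y in zip(docs, labels):
        by_class.setdefault(y, []).append(doc)
    target = max(len(items) for items in by_class.values())
    out_docs, out_labels = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for doc in items + extra:
            out_docs.append(doc)
            out_labels.append(y)
    return out_docs, out_labels


class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            counts = Counter(tok for d in class_docs for tok in d)
            total = sum(counts.values()) + len(self.vocab)
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.log_lik[c] = {t: math.log((counts[t] + 1) / total) for t in self.vocab}
        return self

    def predict(self, doc):
        def score(c):
            # Log-posterior up to a constant; tokens unseen in training are ignored.
            return self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)

        return max(self.classes, key=score)


def f1(y_true, y_pred, positive=1):
    """F1 for the positive (hate) class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (1 = hate, 0 = non-hate); deliberately imbalanced 2 vs 3.
train = [
    ("I hate you, go away @user", 1),
    ("they are trash and should leave", 1),
    ("what a lovely sunny day", 0),
    ("great match last night http://t.co/x", 0),
    ("enjoying coffee with friends", 0),
]
docs, labels = oversample([clean(t) for t, _ in train], [y for _, y in train])
model = MultinomialNB().fit(docs, labels)

test = [("you are trash go away", 1), ("lovely day with friends", 0)]
preds = [model.predict(clean(t)) for t, _ in test]
print(preds, f1([y for _, y in test], preds))  # prints: [1, 0] 1.0
```

In practice a library such as scikit-learn would supply the vectorizer, the SVM, SGD, NB and RF classifiers, and an F1 metric; the pure-Python version is shown only to make each step (cleaning, balancing, fitting, scoring) explicit.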