International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 2
Year of Publication: 2024
Authors: Yehia Helmy, Merna Ashraf, Laila Abdelhamid |
10.5120/ijca2024923346 |
Yehia Helmy, Merna Ashraf, Laila Abdelhamid. Assessing the Effectiveness of Various Text Classification Algorithms in Customer Complaint Classification: An Informative Resource for Data Scientists and Data Analysts. International Journal of Computer Applications. 186, 2 (Jan 2024), 8-16. DOI=10.5120/ijca2024923346
Customers become dissatisfied for many reasons, some of which are not within the company's control, and a complaint is the means by which they convey that dissatisfaction. With the rapid advancement of technology and the many convenient channels now available for voicing complaints, including email, web, and chatbots, online complaints have grown exponentially. As a result, classifying each complaint under the pertinent issue in a timely manner has become a difficult task. Selecting an appropriate classification model and fitting it with suitable training and testing ratios is a recurring problem for researchers. This paper implements and compares the performance of six machine learning algorithms for multi-class text classification (SVM, KNN, NB, DT, RF, and GB) under two types of sampling (random and stratified) and five data splitting ratios (50:50, 60:40, 70:30, 80:20, and 90:10) on a complaint dataset. It aims to provide a roadmap that helps researchers working in text classification select the optimum classification model and splitting ratio. The results demonstrate that DT, with an accuracy of 99%, an F1-measure of 99%, and a runtime of 1 second, outperformed all other algorithms, and that 80:20 is the most suitable splitting ratio, fitting most algorithms and acting as a secure base to work with. The results also indicate that stratified sampling produces better results than random sampling in multi-class text classification.
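The difference between the two sampling schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which tooling was used. The complaint labels, class sizes, and the helper names `random_split` and `stratified_split` are all hypothetical, chosen only to show how each scheme distributes classes across the five train:test ratios.

```python
import random
from collections import Counter, defaultdict


def random_split(labels, test_ratio, seed=0):
    """Shuffle all indices together, then cut off a test_ratio share."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    n_test = round(len(idx) * test_ratio)
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)


def stratified_split(labels, test_ratio, seed=0):
    """Split each class separately so both parts keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_ratio)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical, deliberately imbalanced complaint labels (not the paper's data).
labels = ["billing"] * 700 + ["delivery"] * 200 + ["fraud"] * 100

# The five train:test ratios compared in the paper.
for train_pct, test_pct in [(50, 50), (60, 40), (70, 30), (80, 20), (90, 10)]:
    for name, split in [("random", random_split), ("stratified", stratified_split)]:
        train, test = split(labels, test_pct / 100)
        test_counts = Counter(labels[i] for i in test)
        print(f"{train_pct}:{test_pct} {name:<10} test class counts: {dict(test_counts)}")
```

With stratified sampling, every test set keeps the 70/20/10 class mix of the full label list. With random sampling the mix drifts from split to split, and the drift matters most for the smallest class and the smallest test sets. This is one plausible reason a stratified split could score better on multi-class data, though the abstract itself does not state the cause.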