International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 7
Year of Publication: 2024
Authors: Korakot Matarat, Chaidan Mingmuang, Weerasak Charoenrat
DOI: 10.5120/ijca2024923409
Korakot Matarat, Chaidan Mingmuang, Weerasak Charoenrat. A Comprehensive Performance Analysis of Supervised Machine Learning Techniques for Sentiment Analysis. International Journal of Computer Applications. 186, 7 (Feb 2024), 35-42. DOI=10.5120/ijca2024923409
Sentiment analysis plays a crucial role in deciphering opinions and emotions expressed in textual data, with wide-ranging business applications such as customer feedback analysis and social media monitoring. This paper presents a thorough performance analysis of supervised machine learning algorithms for sentiment analysis on the Wongnai reviews dataset, which comprises 40,000 reviews. Using a multi-stage preprocessing pipeline and a comparative analysis of feature extraction methods, the research improves sentiment analysis by first eliminating stop words and special symbols (e.g., <, >, %, I, /, #, +, -, ;, *, &, @, $) and then removing Thai words that carry no meaning for text processing, for example มี ("have"), เฉยๆ ("just, so-so"), เช่นใด ("in what way"), เพียงแต่ ("only, merely"), น้อยๆ ("a little") and ข้างเคียง ("nearby"). The pipeline further applies hashtag removal, part-of-speech (POS) tagging, sentiment score computation, and TF-IDF analysis. The research introduces a novel approach to dominant feature extraction that surpasses traditional bag-of-words (BoW) methods. Six algorithms are compared on accuracy, precision, and recall: Logistic Regression (LR), Multinomial Naïve Bayes (NB), Decision Tree Classifier (DT), Neural Network (NN), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVC). The comparison reveals notable insights within the context of Wongnai reviews. SVC emerges as the top-performing algorithm with an accuracy of 0.73, ahead of LR, NB, NN, and SGD, which perform identically at 0.72, while DT performs worst. Combining TF-IDF with BoW raises the accuracy of both SGD and SVC to 0.74, reinforcing the strong performance of SVC in this experiment. In conclusion, this paper not only contributes to understanding sentiment analysis performance but also serves as a valuable resource for optimising models in diverse domains.
This concise summary provides a foundation for practitioners and researchers engaged in sentiment analysis, aiding informed decision-making and paving the way for future exploration with advanced machine learning algorithms.
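The cleaning and weighting steps summarised above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: reviews are assumed to be already segmented into tokens (Thai is written without spaces between words, and the abstract does not name a tokenizer), the stop-word set holds only the six Thai examples quoted above, the symbol set is the punctuation from the abstract's example list, and TF-IDF uses the textbook weighting tf(t, d) · log(N / df(t)). The names `preprocess` and `tf_idf` are hypothetical helpers.

```python
import math
from collections import Counter

# Illustrative stop words: only the Thai examples quoted in the abstract,
# not the full list used in the study (assumption).
THAI_STOP_WORDS = {"มี", "เฉยๆ", "เช่นใด", "เพียงแต่", "น้อยๆ", "ข้างเคียง"}

# Punctuation from the abstract's example list, deleted from every token.
SYMBOLS = "<>%/#+-;*&@$"
_DELETE_SYMBOLS = str.maketrans("", "", SYMBOLS)


def preprocess(tokens):
    """Drop hashtags, strip symbols and remove stop words from pre-segmented tokens."""
    cleaned = []
    for tok in tokens:
        if tok.startswith("#"):  # hashtag removal: discard the whole token
            continue
        tok = tok.translate(_DELETE_SYMBOLS).strip().lower()
        if tok and tok not in THAI_STOP_WORDS:
            cleaned.append(tok)
    return cleaned


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        bag = Counter(doc)  # bag-of-words counts for this document
        total = len(doc)
        weights.append(
            {term: (count / total) * math.log(n_docs / doc_freq[term])
             for term, count in bag.items()}
        )
    return weights


# Hypothetical, already-tokenised reviews (not taken from the Wongnai dataset).
reviews = [
    ["อาหาร", "อร่อย", "มี", "#wongnai", "@"],
    ["ร้าน", "แพง", "เฉยๆ"],
    ["อาหาร", "แพง", "บริการ", "ดี&"],
]
docs = [preprocess(r) for r in reviews]
weights = tf_idf(docs)
```

Each `Counter(doc)` is the bag-of-words count vector that the TF-IDF weights rescale, and a term occurring in every review receives weight zero. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and L2 normalisation by default, so their numbers will differ from this plain formula.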
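The three metrics used to compare the classifiers (accuracy, precision, and recall) can be computed from true and predicted labels as below. This is a hedged sketch: the abstract does not state the label scheme or how precision and recall were averaged across classes, so macro averaging (the unweighted mean over classes) is an assumption, and `evaluate` is a hypothetical helper.

```python
def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision and recall for any label set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Treat precision/recall as 0.0 when the denominator is empty.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / len(labels),  # macro average
        "recall": sum(recalls) / len(labels),        # macro average
    }


# Hypothetical predictions, not results from the paper.
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
scores = evaluate(y_true, y_pred)
```

Here three of five predictions are correct, so accuracy is 0.6, and each class contributes its own precision and recall to the macro average. scikit-learn's `precision_score` and `recall_score` with `average="macro"` compute the same macro averages.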