Research Article

A Comprehensive Performance Analysis of Supervised Machine Learning Techniques for Sentiment Analysis

by Korakot Matarat, Chaidan Mingmuang, Weerasak Charoenrat
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 7
Year of Publication: 2024
Authors: Korakot Matarat, Chaidan Mingmuang, Weerasak Charoenrat
10.5120/ijca2024923409

Korakot Matarat, Chaidan Mingmuang, Weerasak Charoenrat. A Comprehensive Performance Analysis of Supervised Machine Learning Techniques for Sentiment Analysis. International Journal of Computer Applications. 186, 7 (Feb 2024), 35-42. DOI=10.5120/ijca2024923409

@article{ 10.5120/ijca2024923409,
author = { Korakot Matarat and Chaidan Mingmuang and Weerasak Charoenrat },
title = { A Comprehensive Performance Analysis of Supervised Machine Learning Techniques for Sentiment Analysis },
journal = { International Journal of Computer Applications },
issue_date = { Feb 2024 },
volume = { 186 },
number = { 7 },
month = { Feb },
year = { 2024 },
issn = { 0975-8887 },
pages = { 35-42 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number7/a-comprehensive-performance-analysis-of-supervised-machine-learning-techniques-for-sentiment-analysis/ },
doi = { 10.5120/ijca2024923409 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-22T22:17:52.859284+05:30
%A Korakot Matarat
%A Chaidan Mingmuang
%A Weerasak Charoenrat
%T A Comprehensive Performance Analysis of Supervised Machine Learning Techniques for Sentiment Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 7
%P 35-42
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Sentiment analysis plays a crucial role in deciphering opinions and emotions expressed in textual data, with wide-ranging business applications such as customer feedback analysis and social media monitoring. This paper presents a performance analysis of supervised machine learning algorithms for sentiment analysis, using the Wongnai reviews dataset, which comprises 40,000 reviews. The study applies a multi-step preprocessing pipeline and a comparative analysis of feature extraction methods. Preprocessing first eliminates symbols treated as stop words (e.g., < > □% I < / # + -;- * & @ $), then removes words that carry no meaning for text processing (for example, มี, เฉยๆ, เช่นใด, เพียงแต่, น้อยๆ, ข้างเคียง), and is followed by hashtag removal, POS tagging, sentiment score computation, and TF-IDF analysis. The research introduces a novel approach to dominant feature extraction that surpasses traditional bag-of-words methods. Six algorithms are applied: Logistic Regression (LR), Multinomial Naïve Bayes (NB), Decision Tree Classifier (DT), Neural Network (NN), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVC). Their accuracy, precision, and recall values are compared on the Wongnai reviews. SVC emerges as the top-performing algorithm with an accuracy of 0.73, ahead of LR, NB, NN, and SGD, which all achieve 0.72, while DT exhibits the lowest performance. Further analysis combining TF-IDF with bag-of-words (BoW) features improves both SGD and SVC to an accuracy of 0.74, reinforcing the superior performance of SVC in this experiment. In conclusion, this paper not only contributes to understanding sentiment analysis performance but also serves as a valuable resource for optimising models in diverse domains.
This concise summary provides a foundation for practitioners and researchers engaged in sentiment analysis, aiding informed decision-making and paving the way for future exploration with advanced machine learning algorithms.
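
The feature extraction and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list, the symbol pattern, the whitespace tokenisation, the way TF-IDF and BoW features are concatenated, and all function names are assumptions made for illustration. Thai text would first need a word segmenter, which is omitted here, so the toy reviews are in English.

```python
import math
import re
from collections import Counter

# Hypothetical toy lists; the paper's actual stop words and symbols are not reproduced here.
STOP_WORDS = {"the", "is", "a"}
HASHTAG = re.compile(r"#\w+")
SYMBOLS = re.compile(r"[<>%/#+\-;*&@$]")


def preprocess(text):
    """Remove hashtags, then symbols, then stop words; return a token list."""
    text = HASHTAG.sub(" ", text.lower())
    text = SYMBOLS.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tf_idf(docs):
    """TF-IDF per document, with tf = count / len(doc) and idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def combined_features(tokens, tfidf_weights):
    """Concatenate BoW counts and TF-IDF weights (one assumed way to combine them)."""
    feats = {("bow", term): float(count) for term, count in Counter(tokens).items()}
    feats.update({("tfidf", term): w for term, w in tfidf_weights.items()})
    return feats


def evaluate(y_true, y_pred, positive):
    """Return accuracy, plus precision and recall for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


reviews = [
    "The food is great #yum",
    "The service is slow & rude",
    "Great food great price",
]
docs = [preprocess(r) for r in reviews]
weights = tf_idf(docs)
features = [combined_features(d, w) for d, w in zip(docs, weights)]
print(docs)
print(features[2])
print(evaluate(["pos", "neg", "pos", "neg"], ["pos", "pos", "pos", "neg"], "pos"))
```

The resulting feature dictionaries could be fed to any of the six classifiers listed above. The sketch stops at feature construction and metric computation because the abstract gives no model hyperparameters.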

Index Terms

Computer Science
Information Sciences

Keywords

Performance analysis, Supervised learning, Bag-of-words, TF-IDF analysis, Thai language data analysis, Sentiment analysis