Research Article

Sarcasm Detection in Telugu Language Text using Distinct Machine Learning Classification Algorithms

by B. Ravikiran, Srinivasu Badugu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 42
Year of Publication: 2024
10.5120/ijca2024924040

B. Ravikiran, Srinivasu Badugu. Sarcasm Detection in Telugu Language Text using Distinct Machine Learning Classification Algorithms. International Journal of Computer Applications. 186, 42 (Sep 2024), 28-35. DOI=10.5120/ijca2024924040

@article{ 10.5120/ijca2024924040,
author = { B. Ravikiran and Srinivasu Badugu },
title = { Sarcasm Detection in Telugu Language Text using Distinct Machine Learning Classification Algorithms },
journal = { International Journal of Computer Applications },
issue_date = { Sep 2024 },
volume = { 186 },
number = { 42 },
month = { Sep },
year = { 2024 },
issn = { 0975-8887 },
pages = { 28-35 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number42/sarcasm-detection-in-telugu-language-text-using-distinct-machine-learning-classification-algorithms/ },
doi = { 10.5120/ijca2024924040 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-09-30T23:02:41.252793+05:30
%A B. Ravikiran
%A Srinivasu Badugu
%T Sarcasm Detection in Telugu Language Text using Distinct Machine Learning Classification Algorithms
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 42
%P 28-35
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Sarcasm detection is a growing field in Natural Language Processing (NLP). Sarcasm typically uses positive or intensified positive words with a negative connotation to insult or mock others, which makes detecting it critical for sentiment analysis: a sentiment detection model struggles to identify the true sentiment of a sarcastic statement, so an automated sarcasm detection system is needed. The authors reviewed numerous relevant research articles and found that, because Telugu is a low-resource language, detecting sarcasm in Telugu text remains challenging. Many researchers have trained and tested machine learning classification algorithms to identify sarcasm, but these algorithms require a dataset as input, and such datasets often contain noise, which is removed through preprocessing. The authors gathered a Telugu conversational dataset from the Kaggle repository and developed their own dataset, the Telugu News Headline dataset. Annotators labeled each statement as sarcastic or non-sarcastic, and the labeled statements were then input to the proposed model. The proposed model was built using SVM (Support Vector Machine), NB (Naive Bayes), and LR (Logistic Regression), with One Hot Encoding (OHE) used to transform the dataset into vectors that were fed to the sarcasm detection model to determine its accuracy. The model was trained and tested on positive and intensified positive sentences with 60:40, 70:30, 80:20, and 90:10 train-test split ratios to compare performance. At the base 70:30 split ratio, Logistic Regression was the best of the three algorithms on the Telugu conversational dataset, with an accuracy of 65.89% on the imbalanced version and 67.01% on the balanced version. On the Telugu news headline dataset, Logistic Regression reached 90.07% accuracy on the imbalanced version, and SVM reached 98.35% on the balanced version. Overall, Logistic Regression had the better accuracy on the imbalanced and balanced Telugu conversational dataset and on the imbalanced Telugu news headline dataset, whereas SVM had the better accuracy on the balanced Telugu news headline dataset. In future work, deep learning algorithms can be applied to detect sarcasm with better accuracy.
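The pipeline described in the abstract (one-hot encoding, a classifier, and evaluation at several train-test split ratios) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, a from-scratch Bernoulli Naive Bayes stands in for the three classifiers, and the toy English sentences, labels, and all function names are hypothetical placeholders rather than data or code from the paper.

```python
import math
import random

def build_vocab(texts):
    """Map each distinct whitespace-separated token to a column index."""
    vocab = {}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab

def one_hot(text, vocab):
    """Binary presence vector: 1 where a vocabulary token occurs in the text."""
    vec = [0] * len(vocab)
    for token in text.split():
        if token in vocab:  # tokens unseen during training are ignored
            vec[vocab[token]] = 1
    return vec

def split(rows, train_fraction, seed=0):
    """Shuffle reproducibly, then cut into train and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def train_nb(X, y):
    """Bernoulli Naive Bayes with add-one (Laplace) smoothing."""
    model = {}
    for label in set(y):
        docs = [x for x, lab in zip(X, y) if lab == label]
        log_prior = math.log(len(docs) / len(X))
        # Smoothed probability that each feature is present in this class.
        probs = [(sum(col) + 1) / (len(docs) + 2) for col in zip(*docs)]
        model[label] = (log_prior, probs)
    return model

def predict_nb(model, x):
    """Return the label with the highest log-posterior score."""
    def score(label):
        log_prior, probs = model[label]
        return log_prior + sum(
            math.log(p if bit else 1 - p) for bit, p in zip(x, probs)
        )
    return max(model, key=score)

def evaluate(rows, train_fraction):
    """Train on one part, report accuracy on the held-out part."""
    train, test = split(rows, train_fraction)
    vocab = build_vocab(text for text, _ in train)  # vocabulary from train only
    X = [one_hot(text, vocab) for text, _ in train]
    y = [label for _, label in train]
    model = train_nb(X, y)
    hits = sum(predict_nb(model, one_hot(t, vocab)) == lab for t, lab in test)
    return hits / len(test)

# Hypothetical toy data (1 = sarcastic, 0 = non-sarcastic); NOT from the paper.
ROWS = [
    ("oh great another delay", 1), ("wow what a wonderful traffic jam", 1),
    ("great just great", 1), ("oh wonderful the server is down again", 1),
    ("wow such a brilliant plan", 1), ("the train arrives at noon", 0),
    ("the server is running", 0), ("the plan was approved today", 0),
    ("traffic is light this morning", 0), ("the meeting starts at ten", 0),
]

for fraction in (0.6, 0.7, 0.8, 0.9):
    print(f"train fraction {fraction:.1f}: accuracy {evaluate(ROWS, fraction):.2f}")
```

On a dataset this small the printed accuracies are not meaningful; the sketch only shows how the encoding, the split ratios, and the accuracy measurement fit together.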

Index Terms

Computer Science
Information Sciences
NLP
Machine Learning Classification Algorithms
Telugu Language Text
SVM
NB
LR

Keywords

Natural Language Processing; Sarcasm Detection; Machine Learning; Low-resource language