Research Article

Evaluating the Performance of Machine Learning Classifiers for Detecting Twitter Spam

by Dipalee B. Borse, Swati K. Borse, Vijaya Ahire
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Number 10
Year of Publication: 2023
DOI: 10.5120/ijca2023922766

Dipalee B. Borse, Swati K. Borse, Vijaya Ahire. Evaluating the Performance of Machine Learning Classifiers for Detecting Twitter Spam. International Journal of Computer Applications. 185, 10 (May 2023), 12-17. DOI=10.5120/ijca2023922766

@article{ 10.5120/ijca2023922766,
author = { Dipalee B. Borse and Swati K. Borse and Vijaya Ahire },
title = { Evaluating the Performance of Machine Learning Classifiers for Detecting Twitter Spam },
journal = { International Journal of Computer Applications },
issue_date = { May 2023 },
volume = { 185 },
number = { 10 },
month = { May },
year = { 2023 },
issn = { 0975-8887 },
pages = { 12-17 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume185/number10/32735-2023922766/ },
doi = { 10.5120/ijca2023922766 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:25:43.988429+05:30
%A Dipalee B. Borse
%A Swati K. Borse
%A Vijaya Ahire
%T Evaluating the Performance of Machine Learning Classifiers for Detecting Twitter Spam
%J International Journal of Computer Applications
%@ 0975-8887
%V 185
%N 10
%P 12-17
%D 2023
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The use of social networking sites is rising rapidly. Twitter is hugely popular as a microblogging site among legitimate and illegitimate users alike. People with malicious intent use Twitter to spread spam posts, which leads to phishing, monetary loss, useless or noisy data on social media, theft of personal information, and similar harms. It is therefore extremely important to stop spamming activities. In this paper, six machine learning classifiers, namely Logistic Regression and Support Vector Machine (linear models) and Random Forest, K-Nearest Neighbor, Decision Tree, and Naive Bayes (nonlinear models), are implemented on existing data, and their performance is compared using metrics such as accuracy, precision, recall, and F1-score (F-measure). Among the six classifiers, Random Forest showed the best accuracy, followed by K-Nearest Neighbor, and both performed better on the large continuous dataset than on the small or random datasets. Accuracy increased by 3% to 13% on the large continuous data. In addition, the false positive rate (FPR) of Random Forest and K-Nearest Neighbor was 0.001 and 0.005 respectively, much lower than that of the other algorithms. With the lowest accuracy and the highest FPR, Naive Bayes performed worst on large datasets.
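
The evaluation metrics named in the abstract can all be derived from a binary confusion matrix in which spam is the positive class. The following is a minimal sketch of those standard definitions, not the authors' implementation; the class name `ConfusionCounts`, the function `evaluate`, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with spam as the positive class."""
    tp: int  # spam tweets correctly flagged as spam
    fp: int  # legitimate tweets wrongly flagged as spam
    tn: int  # legitimate tweets correctly passed
    fn: int  # spam tweets that were missed


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    """Compute accuracy, precision, recall, F1-score, and FPR from the counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": _ratio(2 * precision * recall, precision + recall),
        # FPR is the share of legitimate tweets wrongly flagged as spam.
        "fpr": _ratio(c.fp, c.fp + c.tn),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the paper.
    example = ConfusionCounts(tp=80, fp=5, tn=895, fn=20)
    for name, value in evaluate(example).items():
        print(f"{name}: {value:.3f}")
```

A low FPR matters for spam filtering because each false positive is a legitimate tweet that gets blocked, which is why the abstract reports FPR alongside accuracy.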

Index Terms

Computer Science
Information Sciences

Keywords

Spam detection, Machine learning, Twitter spam detection, Information security