Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy

Mirsad Hadžić; Zerina Mašetić; Fatima Mašić

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 187 - Number 58

Year of Publication: 2025

10.5120/ijca2025925988

Mirsad Hadžić, Zerina Mašetić, Fatima Mašić. Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy. International Journal of Computer Applications. 187, 58 (Nov 2025), 73-79. DOI=10.5120/ijca2025925988

@article{10.5120/ijca2025925988,
  author = {Mirsad Hadžić and Zerina Mašetić and Fatima Mašić},
  title = {Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy},
  journal = {International Journal of Computer Applications},
  issue_date = {Nov 2025},
  volume = {187},
  number = {58},
  month = {Nov},
  year = {2025},
  issn = {0975-8887},
  pages = {73-79},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume187/number58/sentiment-analysis-of-fifa-related-tweets-integrating-nltks-vader-with-bert-for-enhanced-classification-accuracy/},
  doi = {10.5120/ijca2025925988},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2025-11-18T21:11:20+05:30
%A Mirsad Hadžić
%A Zerina Mašetić
%A Fatima Mašić
%T Sentiment Analysis of FIFA-Related Tweets: Integrating NLTK’s VADER with BERT for Enhanced Classification Accuracy
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 58
%P 73-79
%D 2025
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This research performs sentiment analysis on Twitter data using Natural Language Processing (NLP) techniques, primarily the NLTK library in Python within a Jupyter notebook environment. The study explores sentiment classification methods, evaluating the emotional tone of tweets and categorizing each as neutral, positive, or negative using NLTK's SentimentIntensityAnalyzer. The sample consists of Twitter data with columns such as 'Tweet' and 'Sentiment', sourced from a CSV file. The methodology involves tokenizing and processing the text, scoring sentiment, counting occurrences of the hashtag #fifa, and analyzing word frequencies [1]. In addition to the lexicon-based VADER approach, the study incorporates a transformer-based deep learning model, BERT (Bidirectional Encoder Representations from Transformers), to enhance sentiment classification accuracy. BERT, pre-trained on large corpora and capable of capturing context and nuanced language, offers a state-of-the-art alternative to traditional models. Including it allows a comparative analysis between rule-based and deep learning approaches, highlighting BERT’s effectiveness in handling complex tweet structures. Furthermore, the study investigates the impact of removing stopwords and examines the list of eliminated stopwords. The expected results include insights into prevalent sentiments on Twitter regarding the chosen topic, the frequency of the hashtag #fifa, and a comprehensive picture of word usage, depicted visually through word clouds. Possible limitations include the inherent subjectivity of sentiment analysis, variations in language use, reliance on hashtag frequency as an indicator of topic prevalence, and the context-dependent effectiveness of stopword removal. Word cloud analysis adds a visual representation of the most frequent words, giving a holistic perspective on the dataset.
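The preprocessing steps named in the abstract (tokenizing, counting the #fifa hashtag, removing stopwords, counting word frequencies, and mapping a sentiment score to a label) can be illustrated with a minimal Python sketch. This is not the authors' notebook. The sample tweets, the short stopword list, the function names, and the ±0.05 cut-off on the compound score are all assumptions made for illustration. The cut-off follows the convention commonly used with VADER. In practice the compound score would come from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, and a fuller stopword list from `nltk.corpus.stopwords`.

```python
import re
from collections import Counter

# Hypothetical sample tweets standing in for the 'Tweet' column of the CSV.
SAMPLE_TWEETS = [
    "The #FIFA final was amazing",
    "I love this #fifa match and the goals",
    "#fifa2022 is here",
]

# Deliberately tiny stopword list (an assumption for this sketch);
# NLTK's English stopword corpus is the fuller equivalent.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "for", "this", "was"}

# A token is a run of word characters, optionally prefixed by '#'.
TOKEN_RE = re.compile(r"#?\w+")


def tokenize(text):
    """Lowercase a tweet and split it into word and hashtag tokens."""
    return TOKEN_RE.findall(text.lower())


def count_hashtag(tweets, tag="#fifa"):
    """Count exact occurrences of a hashtag (case-insensitive) across tweets."""
    tag = tag.lower()
    return sum(tokenize(tweet).count(tag) for tweet in tweets)


def word_frequencies(tweets, stopwords=STOPWORDS):
    """Return (kept, removed) counters: kept holds non-stopword word counts,
    removed holds the stopwords that were filtered out. Hashtags are skipped."""
    kept, removed = Counter(), Counter()
    for tweet in tweets:
        for token in tokenize(tweet):
            if token.startswith("#"):
                continue
            if token in stopwords:
                removed[token] += 1
            else:
                kept[token] += 1
    return kept, removed


def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.
    The 0.05 threshold is a conventional choice, assumed here."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    kept, removed = word_frequencies(SAMPLE_TWEETS)
    print("hashtag count:", count_hashtag(SAMPLE_TWEETS))
    print("most common words:", kept.most_common(3))
    print("removed stopwords:", sorted(removed))
    print("label for 0.6:", label_from_compound(0.6))
```

Returning the removed stopwords alongside the kept words mirrors the abstract's interest in inspecting which stopwords were eliminated. The `kept` counter is also the kind of frequency table a word cloud would be drawn from.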

References

Mohammad, S. M. (2017). Challenges in Sentiment Analysis. arXiv preprint. https://ufal.mff.cuni.cz/~hana/teaching/Mohammad2017_Chapter_ChallengesInSentimentAnalysis.pdf
VADER. (2024). https://www.geeksforgeeks.org/python-sentiment-analysis-using-vader/
Liu, B. (2012). Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies. https://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.pdf
Liu, B. (2012). Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.
Pang, B., & Lee, L. (2008). Thumbs up? Sentiment Classification using Machine Learning Techniques. https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
Devlin, J., et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
Zhang, L., Wang, S., & Liu, B. (2018). Deep Learning for Sentiment Analysis: A Survey. https://arxiv.org/abs/1801.07883
Mohammad, S. M., & Kiritchenko, S. (2018). https://svkir.com/papers/Mohammad-Kiritchenko-Tweets-VAD-EI-LREC-2018.pdf
Caliskan, A., et al. (2017). Semantics derived automatically from language corpora contain human-like biases. Science. https://www.science.org/doi/10.1126/science.aal4230
"Natural Language Processing in Python: Exploring Word Frequencies with NLTK" - Medium. (2021) https://medium.com/@siglimumuni/natural-language-processing-in-python-exploring-word-frequencies-with-nltk-918f33c1e4c3
Dataset. (2022). https://www.kaggle.com/datasets/tirendazacademy/fifa-world-cup-2022-tweets
NLTK. (2025). https://www.nltk.org/
"Simple WordCloud using NLTK Library in Python" - NLPfy. (2021) https://nlpfy.com/simple-wordcloud-using-nltk-library-in-python/
Mueller, A. (2012). WordCloud Documentation. https://github.com/amueller/word_cloud
Hutto, C. J., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216-225.
MNB. (2024). https://www.geeksforgeeks.org/multinomial-naive-bayes/
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
Kaggle (2023). https://www.kaggle.com/datasets/tirendazacademy/fifa-world-cup-2022-tweets/data.
Snscrape (2007). https://github.com/JustAnotherArchivist/snscrape

Index Terms

Computer Science

Information Sciences

Keywords

Sentiment analysis, Twitter data mining, VADER, BERT, NLP