International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176, Number 39
Year of Publication: 2020
Authors: Oluwafemi Oriola
DOI: 10.5120/ijca2020920503
Oluwafemi Oriola. Exploring N-gram, Word Embedding and Topic Models for Content-based Fake News Detection in FakeNewsNet Evaluation. International Journal of Computer Applications. 176, 39 (Jul 2020), 25-30. DOI=10.5120/ijca2020920503
FakeNewsNet is a repository of two novel datasets, PolitiFact and GossipCop, which are used to evaluate fake news detection techniques. Unlike other extensively studied benchmark fake news datasets, the FakeNewsNet datasets incorporate news content, social context, and dynamic information, which can be used to study fake news propagation, detection, and mitigation. Existing works on FakeNewsNet have focused on one-hot encoding, social contexts such as user-based models, and dynamic information such as news propagation models. However, n-gram, word embedding, and topic models of news content, which have performed well in other contexts, have not been explored. This paper therefore explores n-gram, word embedding, and topic models of news content for the evaluation of the FakeNewsNet datasets. A unigram-based n-gram model, a skip-gram word2vec-based word embedding model, and a Latent Dirichlet Allocation-based topic model are extracted after preprocessing the datasets. The features are weighted by TF-IDF to overcome the shortcomings of the individual models and analyzed using Logistic Regression. The evaluation of the models and their hybrids shows that the n-gram model outperforms the word embedding and topic models. Specifically, the n-gram model records accuracy, precision, recall, and F1-score of 0.80, 0.79, 0.78, and 0.79, respectively, for PolitiFact, and 0.82, 0.75, 0.79, and 0.77, respectively, for GossipCop. Comparison with benchmark approaches also shows that the n-gram model performs better.
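To make the described pipeline concrete, the sketch below builds the three content-based feature models (TF-IDF-weighted unigrams, skip-gram word2vec document vectors, and LDA topic distributions) and evaluates each with Logistic Regression. It is a minimal illustration using scikit-learn and gensim; the dataset loading, preprocessing, and hyperparameters (vector size, number of topics, train/test split) are assumptions for illustration, not the paper's exact configuration.

```python
# Illustrative sketch of the three content-based feature models evaluated with
# Logistic Regression. Dataset loading and hyperparameters are assumptions.
import numpy as np
from gensim.models import Word2Vec
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


def ngram_features(train_texts, test_texts):
    # Unigram model weighted by TF-IDF.
    vec = TfidfVectorizer(ngram_range=(1, 1), stop_words="english")
    return vec.fit_transform(train_texts), vec.transform(test_texts)


def word2vec_features(train_texts, test_texts, dim=100):
    # Skip-gram word2vec (sg=1); each document is the mean of its word vectors.
    tokenized = [t.split() for t in train_texts]
    w2v = Word2Vec(sentences=tokenized, vector_size=dim, sg=1, min_count=2, seed=1)

    def doc_vec(text):
        vecs = [w2v.wv[w] for w in text.split() if w in w2v.wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    return (np.vstack([doc_vec(t) for t in train_texts]),
            np.vstack([doc_vec(t) for t in test_texts]))


def lda_features(train_texts, test_texts, n_topics=20):
    # LDA topic model; each document is represented by its topic distribution.
    counts = CountVectorizer(stop_words="english")
    X_train = counts.fit_transform(train_texts)
    X_test = counts.transform(test_texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=1)
    return lda.fit_transform(X_train), lda.transform(X_test)


def evaluate(X_train, y_train, X_test, y_test):
    # Logistic Regression classifier with the four reported metrics.
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    pred = clf.predict(X_test)
    return {"accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred),
            "recall": recall_score(y_test, pred),
            "f1": f1_score(y_test, pred)}


# Usage (placeholder: loading PolitiFact/GossipCop news content is not shown):
# train_x, test_x, train_y, test_y = ...  # preprocessed texts and fake/real labels
# for extract in (ngram_features, word2vec_features, lda_features):
#     Xtr, Xte = extract(train_x, test_x)
#     print(extract.__name__, evaluate(Xtr, train_y, Xte, test_y))
```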