International Journal of Computer Applications |
Foundation of Computer Science (FCS), NY, USA |
Volume 183 - Number 37 |
Year of Publication: 2021 |
Authors: Elaf Alhazmi |
10.5120/ijca2021921765 |
Elaf Alhazmi . Fake News Classification in Machine Learning with Different Word Representations. International Journal of Computer Applications. 183, 37 ( Nov 2021), 1-7. DOI=10.5120/ijca2021921765
Text classification has been effectively applied in a variety of domains, one of which is the detection of fake news. Working with a classification framework is an important approach for detecting fake news. One of the most significant steps in converting text to numbers in a classification framework is feature extraction. In this paper, we compare the effectiveness of several feature extraction approaches such as bag of words, TF-IDF, and one-hot encoding. For the experiment, we measured the accuracy of the classification and evaluated the best/worst classifier in three techniques using three fake news detection data sets and six machine learning classifiers. Following our tests, we discovered that employing a bag of words, also known as CountVectorizer, and the TF-IDF approach in text classification for selected data outperforms one-hot encoding. Despite the fact that logistic regression and support vector machine both produce valid results by using bag of words and TF-IDF, random forest classifier is the only algorithm that consistently produces accurate results in all three feature extraction methods. The accuracy of support vector machine in one-hot encoding was the lowest even though the algorithm produced substantial results in the other two extraction procedures.