Classification of Tweets based on Emotions using Word Embedding and Random Forest Classifiers

Parth Vora; Mansi Khara; Kavita Kelkar

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 178 - Number 3

Year of Publication: 2017

Authors: Parth Vora, Mansi Khara, Kavita Kelkar

10.5120/ijca2017915773

Parth Vora, Mansi Khara, Kavita Kelkar. Classification of Tweets based on Emotions using Word Embedding and Random Forest Classifiers. International Journal of Computer Applications. 178, 3 (Nov 2017), 1-7. DOI=10.5120/ijca2017915773

@article{10.5120/ijca2017915773,
  author = {Parth Vora and Mansi Khara and Kavita Kelkar},
  title = {Classification of Tweets based on Emotions using Word Embedding and Random Forest Classifiers},
  journal = {International Journal of Computer Applications},
  issue_date = {Nov 2017},
  volume = {178},
  number = {3},
  month = {Nov},
  year = {2017},
  issn = {0975-8887},
  pages = {1-7},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume178/number3/28651-2017915773/},
  doi = {10.5120/ijca2017915773},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article

%1 2024-02-07T00:49:21.606492+05:30

%A Parth Vora

%A Mansi Khara

%A Kavita Kelkar

%T Classification of Tweets based on Emotions using Word Embedding and Random Forest Classifiers

%J International Journal of Computer Applications

%@ 0975-8887

%V 178

%N 3

%P 1-7

%D 2017

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Social media has penetrated daily life on a large scale and has become a platform where individuals share their views, feelings, opinions, and thoughts. Identifying emotions has many applications, ranging from personalized marketing to behavioral studies. Individuals express their feelings in language that is frequently ambiguous and rich in figures of speech, which makes it difficult even for humans to interpret. In this paper, we propose a new approach to classifying text into emotion categories. We use Twitter data as labeled input; the data is labeled using hashtags, and our approach addresses features such as emoticons, emoji, apostrophes, Twitter slang, and spelling variations, which are part of informal language on social media. Our model uses word vectors generated by architectures such as Word2vec, Glove, and Fasttext to produce representations of the text. We then investigate the utility of these models with a random forest classifier. Finally, we compare the results to find the best model for emotion-based text classification. We achieve an overall precision of 91% for four emotion classes on a mined dataset of more than 100,000 tweets. This is a useful tool for understanding human behavior and a natural step beyond positive/negative polarity.
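The two data-preparation ideas named in the abstract, labeling tweets by hashtag and turning each tweet into a vector built from word embeddings, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the hashtag-to-emotion map, the toy three-dimensional vectors, the function names, and the choice to average word vectors are not taken from the paper, which does not state its four classes or its pooling method in the abstract.

```python
import re

# Hypothetical hashtag-to-emotion map; the abstract does not list the four classes.
HASHTAG_LABELS = {
    "#happy": "joy",
    "#sad": "sadness",
    "#angry": "anger",
    "#scared": "fear",
}

# Toy 3-dimensional vectors standing in for pretrained word embeddings.
TOY_VECTORS = {
    "great": [0.9, 0.1, 0.0],
    "day": [0.2, 0.2, 0.1],
    "lost": [0.0, 0.8, 0.3],
    "keys": [0.1, 0.3, 0.2],
}


def label_from_hashtags(tweet):
    """Return (label, text with label hashtags removed).

    Returns (None, tweet) when the tweet carries no label hashtag or
    hashtags that map to more than one emotion.
    """
    tags = re.findall(r"#\w+", tweet.lower())
    labels = {HASHTAG_LABELS[t] for t in tags if t in HASHTAG_LABELS}
    if len(labels) != 1:
        return None, tweet
    # Drop the label hashtag so the classifier cannot read the answer directly.
    words = [w for w in tweet.split() if w.lower() not in HASHTAG_LABELS]
    return labels.pop(), " ".join(words)


def tweet_vector(text, vectors, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    tokens = re.findall(r"[a-z']+", text.lower())
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


label, text = label_from_hashtags("Great day #happy")
features = tweet_vector(text, TOY_VECTORS, dim=3)
```

Fixed-length vectors of this kind, paired with their hashtag-derived labels, are the sort of input a random forest implementation such as scikit-learn's `RandomForestClassifier` accepts for training.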

References

Bollen, Johan, Huina Mao, and Xiaojun Zeng. "Twitter mood predicts the stock market." Journal of computational science 2.1 (2011): 1-8.
Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of machine learning research 3.Feb (2003): 1137-1155.
Schwenk, Holger. "Continuous space language models." Computer Speech & Language 21.3 (2007): 492-518.
Mikolov, Tomáš, et al. "Empirical evaluation and combination of advanced language modeling techniques." Twelfth Annual Conference of the International Speech Communication Association. 2011.
Pennington, Jeffrey, Richard Socher, and Christopher Manning. "Glove: Global vectors for word representation." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.
Bojanowski, Piotr, et al. "Enriching word vectors with subword information." arXiv preprint arXiv:1607.04606 (2016).
Joulin, Armand, et al. "Bag of tricks for efficient text classification." arXiv preprint arXiv:1607.01759 (2016).
Hasan, Maryam, Emmanuel Agu, and Elke Rundensteiner. "Using hashtags as labels for supervised learning of emotions in Twitter messages." Proceedings of the Health Informatics Workshop (HI-KDD). 2014.
Hasan, Maryam, Elke Rundensteiner, and Emmanuel Agu. "Emotex: Detecting emotions in twitter messages." (2014).
Barbieri, Francesco, Francesco Ronzano, and Horacio Saggion. "What does this Emoji Mean? A Vector Space Skip-Gram Model for Twitter Emojis." LREC. 2016.
Norvig, Peter. "How to write a spelling corrector." http://norvig.com/spell-correct.html (2007).
Rehurek, Radim, and Petr Sojka. "Software framework for topic modelling with large corpora." In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. 2010.
Bird, Steven, Ewan Klein, and Edward Loper. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc., 2009.
Xu, Baoxun, et al. "An Improved Random Forest Classifier for Text Categorization." JCP 7.12 (2012): 2913-2920.
Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12.Oct (2011): 2825-2830.

Index Terms

Computer Science

Information Sciences

Keywords

Word vectors, random forests, Word2vec, Glove, emotion, text classification