Research Article

Data Extraction and Sentiment Analysis of Social Media

by Kshitij Sekhar Dutta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 24
Year of Publication: 2024
Authors: Kshitij Sekhar Dutta

Kshitij Sekhar Dutta. Data Extraction and Sentiment Analysis of Social Media. International Journal of Computer Applications. 186, 24 (Jun 2024), 23-28. DOI=10.5120/ijca2024923700

@article{ 10.5120/ijca2024923700,
author = { Kshitij Sekhar Dutta },
title = { Data Extraction and Sentiment Analysis of Social Media },
journal = { International Journal of Computer Applications },
issue_date = { Jun 2024 },
volume = { 186 },
number = { 24 },
month = { Jun },
year = { 2024 },
issn = { 0975-8887 },
pages = { 23-28 },
numpages = {9},
url = { },
doi = { 10.5120/ijca2024923700 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-06-27T00:56:26.972947+05:30
%A Kshitij Sekhar Dutta
%T Data Extraction and Sentiment Analysis of Social Media
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 24
%P 23-28
%D 2024
%I Foundation of Computer Science (FCS), NY, USA

Social media has become nearly synonymous with the internet. Social media platforms have evolved from simple forums where people posted photos and thoughts into bases from which people can launch successful businesses or become influencers, potentially earning millions. This paper applies Natural Language Processing, specifically Sentiment Analysis, to one of the most popular social media forums today, Reddit. Reddit has 73.1 million daily active users and 267.5 million weekly active users, and more than 100,000 active subreddits (sub-forums). The paper uses the Reddit APIs to run a crawler that collects data from Reddit and organizes it into a single data set, and then examines the structure of that data set. From this data set it analyses which topics were under discussion and what opinions users appeared to hold about them. The paper also investigates whether there is a correlation between the amount and the type of emotions expressed. Finally, it highlights concerns with using Sentiment Analysis and outlines some of its other real-life applications.
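The pipeline summarized in the abstract (collect posts, assemble them into one data set, score their sentiment) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the record fields, the sample texts, the tiny word lexicon, and the scoring rule. The crawling step is replaced by hard-coded records so that the example runs without network access.

```python
from collections import defaultdict

# Hypothetical records standing in for crawled Reddit posts; the field
# names and texts are illustrative assumptions, not data from the paper.
posts = [
    {"subreddit": "technology", "text": "I love this great new phone"},
    {"subreddit": "technology", "text": "The update is terrible and slow"},
    {"subreddit": "movies", "text": "What a great and wonderful film"},
]

# Toy sentiment lexicon (assumption): a real study would rely on a
# trained model or an established lexicon instead.
POSITIVE = {"love", "great", "wonderful", "good"}
NEGATIVE = {"terrible", "slow", "bad", "hate"}


def sentiment_score(text):
    """Return (#positive words - #negative words) / #words, in [-1, 1]."""
    words = text.lower().split()
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def mean_score_by_subreddit(records):
    """Average the per-post scores within each subreddit."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["subreddit"]].append(sentiment_score(record["text"]))
    return {name: sum(scores) / len(scores) for name, scores in grouped.items()}


summary = mean_score_by_subreddit(posts)
for name, score in sorted(summary.items()):
    print(f"{name}: {score:+.3f}")
```

Grouping scores by subreddit is one simple way to compare perceived opinion across topics; the grouping key and the averaging are design choices of this sketch only.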

  1. Curiskis, S.A., Drake, B., Osborn, T.R., Kennedy, P.J.: An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit. Inf. Process. Manage. 57(2), 102034 (2020)
  2. Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, p. 271. Association for Computational Linguistics (2004)
  3. Choi, D., Han, J., Chung, T., Ahn, Y.Y., Chun, B.G., Kwon, T.T.: Characterizing conversation patterns in Reddit: from the perspectives of content properties and user participation behaviors. In: Proceedings of the 2015 ACM on Conference on Online Social Networks, pp. 233–243 (2015)
  4. Mullen, T., Collier, N.: Sentiment analysis using support vector machines with diverse information sources. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 412–418 (2004)
  5. Kouloumpis, E., Wilson, T., Moore, J.: Twitter sentiment analysis: the good the bad and the OMG! In: Proceedings of the International AAAI Conference on Web and Social Media 5(1), pp. 538–541 (2021)
  6. Godbole, N., Srinivasaiah, M., Skiena, S.: Large-scale sentiment analysis for news and blogs. ICWSM 7(21), 219–222 (2007)
  7. Glenski, M., Pennycuff, C., Weninger, T.: Consumers and curators: browsing and voting patterns on Reddit. IEEE Trans. Comput. Soc. Syst. 4(4), 196–206 (2017)
  8. Stoddard, G.: Popularity and quality in social news aggregators: a study of Reddit and hacker news. In: Proceedings of the 24th International Conference on World Wide Web, pp. 815–818 (2015)
  9. Semrush Blog, accessed on 20th of April 2024, <
Index Terms

Computer Science
Information Sciences
Natural Language Processing


Sentiment Analysis, Data Mining, Natural Language Processing, Social Media, Reddit