Research Article

Towards SENTIEXTRACT: A Combination of OCR and Sentiment Analysis

by Sumitra Pundlik, Varun Rambal, Sonu Tayade, Sonali Ramteke, Pratigya Suri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Number 12
Year of Publication: 2014
Authors: Sumitra Pundlik, Varun Rambal, Sonu Tayade, Sonali Ramteke, Pratigya Suri
DOI: 10.5120/16399-6020

Sumitra Pundlik, Varun Rambal, Sonu Tayade, Sonali Ramteke, Pratigya Suri. Towards SENTIEXTRACT: A Combination of OCR and Sentiment Analysis. International Journal of Computer Applications. 94, 12 (May 2014), 38-41. DOI=10.5120/16399-6020

@article{ 10.5120/16399-6020,
author = { Sumitra Pundlik and Varun Rambal and Sonu Tayade and Sonali Ramteke and Pratigya Suri },
title = { Towards SENTIEXTRACT: A Combination of OCR and Sentiment Analysis },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 94 },
number = { 12 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 38-41 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume94/number12/16399-6020/ },
doi = { 10.5120/16399-6020 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:17:30.633221+05:30
%A Sumitra Pundlik
%A Varun Rambal
%A Sonu Tayade
%A Sonali Ramteke
%A Pratigya Suri
%T Towards SENTIEXTRACT: A Combination of OCR and Sentiment Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 94
%N 12
%P 38-41
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Do you have a lot of unstructured data in image files? Are you interested in finding out the sentiment of those files? If you are, SENTIEXTRACT is the perfect tool for you. In this paper, we give an insight into our system, SENTIEXTRACT. The system uses the Tesseract OCR engine to convert image files to text files and a Naïve Bayes classifier to determine the sentiment of those files. We use a dataset of movie reviews collected from IMDB as the training set for the Naïve Bayes classifier, with the aim of achieving high accuracy. We also present experimental results showing how the system responds to image files of different sizes and word counts.
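
The two-stage pipeline described in the abstract (OCR, then Naïve Bayes classification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `NaiveBayesSentiment`, the helper `image_to_text`, and the four toy reviews are hypothetical stand-ins for the IMDB training set, and the OCR helper assumes that the `pytesseract` and Pillow packages and the Tesseract binary are installed. The classifier is a multinomial Naïve Bayes with add-one smoothing, one common variant; the abstract does not say which variant the system uses.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (hypothetical helper)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training documents
        self.vocabulary = set()

    def train(self, labelled_texts):
        """Accumulate per-label word counts from (text, label) pairs."""
        for text, label in labelled_texts:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        """Return the label with the highest log posterior for the text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # skip words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def image_to_text(path):
    """OCR step. Assumes pytesseract, Pillow and the Tesseract binary are installed."""
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path))


# Hypothetical toy data standing in for the IMDB movie-review training set.
TOY_REVIEWS = [
    ("a wonderful film with a great cast", "pos"),
    ("great story and wonderful acting", "pos"),
    ("a boring film with a terrible plot", "neg"),
    ("terrible acting and a boring story", "neg"),
]

classifier = NaiveBayesSentiment()
classifier.train(TOY_REVIEWS)
```

With a scanned review at hand, the two stages would be chained as `classifier.classify(image_to_text("review.png"))`, where the file name is a placeholder. OCR errors in the extracted text simply become unseen words, which this sketch ignores during scoring.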

Index Terms

Computer Science
Information Sciences

Keywords

Sentiment Analysis
Optical Character Recognition
Internet Movie Database