International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Number 12
Year of Publication: 2014
Authors: Sumitra Pundlik, Varun Rambal, Sonu Tayade, Sonali Ramteke, Pratigya Suri |
10.5120/16399-6020 |
Sumitra Pundlik, Varun Rambal, Sonu Tayade, Sonali Ramteke, Pratigya Suri. Towards SENTIEXTRACT: A Combination of OCR and Sentiment Analysis. International Journal of Computer Applications. 94, 12 (May 2014), 38-41. DOI=10.5120/16399-6020
Do you have a lot of unstructured data in image files? Are you interested in finding out the sentiment of those files? If so, SENTIEXTRACT is designed for you. In this paper, we give an overview of our system, SENTIEXTRACT. The system uses Tesseract OCR to convert image files to text files and a naïve Bayes classifier to determine the sentiment of the extracted text. The classifier is trained on a dataset of movie reviews collected from IMDB, with the aim of achieving high accuracy. We also present experimental results showing how the system responds to image files of different sizes and word counts.
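The two-stage pipeline described in the abstract (OCR, then naïve Bayes classification) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `image_to_text`, `NaiveBayes`, `TOY_REVIEWS`, and `train_toy_classifier` are hypothetical, the four training sentences are made-up stand-ins for the IMDB reviews, and the add-one (Laplace) smoothing is an assumption, since the abstract does not say how unseen words are handled. The OCR step assumes the `pytesseract` and `Pillow` packages and a local Tesseract installation.

```python
import math
import re
from collections import Counter, defaultdict


def image_to_text(path):
    """OCR step. Assumes pytesseract, Pillow and the Tesseract binary are installed."""
    from PIL import Image   # imported lazily so the classifier runs without them
    import pytesseract
    return pytesseract.image_to_string(Image.open(path))


def tokenize(text):
    """Lowercase the text and keep runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed choice)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        words = [w for w in tokenize(text) if w in self.vocab]
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy data standing in for the IMDB movie reviews.
TOY_REVIEWS = [
    ("a wonderful film with great acting", "pos"),
    ("brilliant story and a great cast", "pos"),
    ("boring plot and terrible acting", "neg"),
    ("a dull and awful movie", "neg"),
]


def train_toy_classifier():
    clf = NaiveBayes()
    clf.train(TOY_REVIEWS)
    return clf
```

With a scanned review at a hypothetical path such as `review.png`, the two stages would be chained as `train_toy_classifier().classify(image_to_text("review.png"))`. Scores are summed in log space to avoid numerical underflow on longer texts.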