Optical Character Recognition by Open source OCR Tool Tesseract: A Case Study

Chirag Patel; Atul Patel; Dharmendra Patel

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 55 - Number 10

Year of Publication: 2012

10.5120/8794-2784

Chirag Patel, Atul Patel, Dharmendra Patel. Optical Character Recognition by Open source OCR Tool Tesseract: A Case Study. International Journal of Computer Applications. 55, 10 (October 2012), 50-56. DOI=10.5120/8794-2784

@article{ 10.5120/8794-2784,
  author = { Chirag Patel and Atul Patel and Dharmendra Patel },
  title = { Optical Character Recognition by Open source OCR Tool Tesseract: A Case Study },
  journal = { International Journal of Computer Applications },
  issue_date = { October 2012 },
  volume = { 55 },
  number = { 10 },
  month = { October },
  year = { 2012 },
  issn = { 0975-8887 },
  pages = { 50-56 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume55/number10/8794-2784/ },
  doi = { 10.5120/8794-2784 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:56:55.725173+05:30
%A Chirag Patel
%A Atul Patel
%A Dharmendra Patel
%T Optical Character Recognition by Open source OCR Tool Tesseract: A Case Study
%J International Journal of Computer Applications
%@ 0975-8887
%V 55
%N 10
%P 50-56
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Optical character recognition (OCR) converts printed text into editable text and is widely used in many applications. The accuracy of OCR depends on the text preprocessing and segmentation algorithms used. Text can be difficult to retrieve from an image because of variations in size, style, and orientation, or because of a complex image background. We begin this paper with an introduction to OCR, followed by the history and architecture of the open source OCR tool Tesseract, and then discuss experimental results of OCR performed by Tesseract on different kinds of images. We conclude with a comparative study of Tesseract and a commercial OCR tool, Transym OCR, using vehicle number plates as input. We extracted the vehicle number from each plate with both Tesseract and Transym and compared the tools on various parameters.
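
The number plate experiment summarized above could be approximated along the following lines. This is only a minimal sketch and not the authors' setup: it assumes the third-party packages pytesseract and Pillow and an installed tesseract binary (none of which are named in the abstract), the helper names normalize_plate, char_accuracy and read_plate are hypothetical, and the similarity ratio is a stand-in metric, since the abstract does not list the parameters used for the comparison.

```python
import re
from difflib import SequenceMatcher


def normalize_plate(raw_text: str) -> str:
    """Upper-case the OCR output and keep only letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw_text.upper())


def char_accuracy(predicted: str, expected: str) -> float:
    """Similarity ratio in [0, 1] between OCR output and the true plate.

    Stand-in metric (assumption): the abstract does not specify how
    the two tools were scored.
    """
    if not predicted and not expected:
        return 1.0
    return SequenceMatcher(None, predicted, expected).ratio()


def read_plate(image_path: str) -> str:
    """Run Tesseract on one image file and return the normalized text.

    Assumes pytesseract and Pillow are installed and the tesseract
    binary is on PATH. The imports are local so that the helpers above
    remain usable without those packages.
    """
    from PIL import Image
    import pytesseract

    # "--psm 7" asks recent Tesseract versions to treat the image as a
    # single line of text, which suits a cropped plate (assumption).
    raw = pytesseract.image_to_string(Image.open(image_path), config="--psm 7")
    return normalize_plate(raw)
```

With a cropped plate image on disk, read_plate("plate.jpg") (a hypothetical file name) would return the cleaned string, and char_accuracy could then score it against the known plate number. The same scoring could be applied to the output of a second tool to compare the two on equal terms.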

Index Terms

Computer Science

Information Sciences

Keywords

Optical Character Recognition (OCR), Open Source, DLL, Tesseract, Transym