International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 39 - Number 6
Year of Publication: 2012
Authors: Nitin Mishra, C. Patvardhan, C. Vasantha Lakshmi, Sarika Singh
DOI: 10.5120/4824-7076
Nitin Mishra, C. Patvardhan, C. Vasantha Lakshmi, Sarika Singh. Shirorekha Chopping Integrated Tesseract OCR Engine for Enhanced Hindi Language Recognition. International Journal of Computer Applications. 39, 6 (February 2012), 19-23. DOI=10.5120/4824-7076
The Tesseract OCR engine is one of the most efficient open source OCR engines currently available. As of version 3.01, Tesseract can recognize Hindi, but its performance still needs improvement. Hindi recognition accuracy is low even for printed text, because Hindi conjunct character combinations partially overlap and are therefore hard to separate. The proposed shirorekha chopping approach addresses this problem so that Devanagari conjunct characters can be segmented and recognized more easily by the Tesseract OCR engine. This paper presents a complete methodology for improving Hindi recognition accuracy. It also compares the approach with other available Devanagari OCR engines in terms of recognition accuracy, processing time, font variations, and database size.
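The abstract does not specify how the shirorekha (the horizontal headline that joins the characters of a Devanagari word) is chopped. The sketch below shows one common, generic way to do it with projection profiles. It is not the authors' method. The function names, the binary-image representation (1 = ink, 0 = background), and the `ratio` threshold of 0.8 are illustrative assumptions. The idea is that the headline rows carry the most ink (horizontal projection). Once those rows are cleared, empty columns (vertical projection of zero) separate glyphs that the headline previously connected. This sketch does not handle the partially overlapping conjuncts described above.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a binary word image, 1 = ink (foreground), 0 = background.
Image = List[List[int]]


def find_shirorekha_rows(img: Image, ratio: float = 0.8) -> List[int]:
    """Return rows whose ink count is at least `ratio` times the densest row.

    Assumption: in a Devanagari word image the densest rows belong to the
    headline (shirorekha); `ratio` is an illustrative threshold, not a tuned value.
    """
    counts = [sum(row) for row in img]  # horizontal projection profile
    peak = max(counts, default=0)
    if peak == 0:
        return []
    return [r for r, c in enumerate(counts) if c >= ratio * peak]


def chop_shirorekha(img: Image, rows: List[int]) -> Image:
    """Return a copy of the image with the given headline rows cleared to background."""
    drop = set(rows)
    return [[0] * len(row) if r in drop else list(row) for r, row in enumerate(img)]


def segment_columns(img: Image) -> List[Tuple[int, int]]:
    """Split at columns with no ink; return (start, end) spans with `end` exclusive."""
    width = len(img[0]) if img else 0
    col_counts = [sum(row[c] for row in img) for c in range(width)]  # vertical projection
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for c, count in enumerate(col_counts):
        if count > 0 and start is None:
            start = c  # a glyph begins
        elif count == 0 and start is not None:
            spans.append((start, c))  # a glyph ends at the first empty column
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


if __name__ == "__main__":
    # Toy word: row 0 is the headline joining two glyph stems.
    word = [
        [1, 1, 1, 1, 1, 1, 1],
        [0, 1, 0, 0, 0, 1, 0],
        [0, 1, 1, 0, 1, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
    ]
    print(segment_columns(word))  # [(0, 7)]: the headline joins everything
    rows = find_shirorekha_rows(word)
    print(segment_columns(chop_shirorekha(word, rows)))  # [(1, 3), (4, 6)]
```

Without chopping, the headline makes every column non-empty, so the whole word comes back as a single segment. After the headline rows are cleared, the two glyphs separate. A real pipeline would typically restore the headline onto each segment before passing it to a recognizer such as Tesseract. That step is omitted here.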