An Open Source Tesseract based Tool for Extracting Text from Images with Application in Braille Translation for the Visually Impaired

Pijush Chakraborty; Arnab Mallik

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 68 - Number 16

Year of Publication: 2013

DOI: 10.5120/11664-7254

Pijush Chakraborty, Arnab Mallik. An Open Source Tesseract based Tool for Extracting Text from Images with Application in Braille Translation for the Visually Impaired. International Journal of Computer Applications. 68, 16 (April 2013), 26-32. DOI=10.5120/11664-7254

@article{10.5120/11664-7254,
  author = { Pijush Chakraborty, Arnab Mallik },
  title = { An Open Source Tesseract based Tool for Extracting Text from Images with Application in Braille Translation for the Visually Impaired },
  journal = { International Journal of Computer Applications },
  issue_date = { April 2013 },
  volume = { 68 },
  number = { 16 },
  month = { April },
  year = { 2013 },
  issn = { 0975-8887 },
  pages = { 26-32 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume68/number16/11664-7254/ },
  doi = { 10.5120/11664-7254 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

Abstract

Valuable paper documents are often scanned and archived as images, so a tool that extracts the text from such images is widely useful. One important application is Braille translation: Braille has been the primary reading and writing system for the visually impaired since the 19th century, and an application that extracts text from images and converts it to Braille can make old documents and books available in Braille format. This paper presents the complete methodology used for extracting text from scanned images and translating it to Braille. Each scanned image is first pre-processed: it is converted to grayscale and then passed through an adaptive threshold function to produce a binary image. The binary image is then passed to Google's Tesseract recognition engine, which is considered to be the best open source OCR engine currently available. The recognized text is post-processed with the JOrtho spell-checking API to correct errors introduced during recognition. The corrected text is finally translated to six-dot cell Braille using the rules published by the International Council on English Braille (www.iceb.org). The translation covers numbers, letters, symbols and compound letters. The translated text can be saved for later printing or sent to a refreshable Braille display.
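
To make two of the steps above concrete, the following is a minimal Java sketch of grayscale conversion with adaptive thresholding and of uncontracted six-dot Braille translation for letters and digits. It is not the authors' implementation. The class and method names (BraillePipelineSketch, toGray, binarize, toBraille) are hypothetical, the threshold is a simple local-mean variant chosen for illustration, and the output uses Unicode braille patterns rather than the Braille ASCII mentioned in the keywords. The Tesseract recognition and JOrtho spell-checking steps are left out because they depend on external libraries.

```java
/** Illustrative sketch only; all names here are hypothetical, not taken from the paper. */
final class BraillePipelineSketch {

    // Dot patterns for 'a'..'z' as offsets from U+2800 (dot n sets bit n-1).
    private static final int[] LETTERS = {
        0x01, 0x03, 0x09, 0x19, 0x11, 0x0B, 0x1B, 0x13, 0x0A, 0x1A, // a-j
        0x05, 0x07, 0x0D, 0x1D, 0x15, 0x0F, 0x1F, 0x17, 0x0E, 0x1E, // k-t
        0x25, 0x27, 0x3A, 0x2D, 0x3D, 0x35                          // u-z
    };
    private static final int NUMBER_SIGN = 0x3C; // dots 3-4-5-6
    private static final int CAPITAL = 0x20;     // dot 6
    private static final int GRADE1 = 0x30;      // dots 5-6

    /** Converts a packed 0xRRGGBB pixel to a 0..255 luminance value. */
    static int toGray(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000;
    }

    /**
     * Local-mean adaptive threshold (an assumed variant): a pixel counts as ink
     * when it is darker than the mean of its (2*radius+1)^2 neighbourhood minus offset.
     */
    static boolean[][] binarize(int[][] gray, int radius, int offset) {
        int h = gray.length, w = gray[0].length;
        boolean[][] ink = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long sum = 0;
                int n = 0;
                for (int j = Math.max(0, y - radius); j <= Math.min(h - 1, y + radius); j++) {
                    for (int i = Math.max(0, x - radius); i <= Math.min(w - 1, x + radius); i++) {
                        sum += gray[j][i];
                        n++;
                    }
                }
                ink[y][x] = gray[y][x] < sum / n - offset;
            }
        }
        return ink;
    }

    private static char cell(int bits) {
        return (char) (0x2800 + bits);
    }

    /** Translates ASCII letters, digits and spaces to braille cells; other characters pass through. */
    static String toBraille(String text) {
        StringBuilder out = new StringBuilder();
        boolean numeric = false; // true while inside a run of digits
        for (char c : text.toCharArray()) {
            if (c >= '0' && c <= '9') {
                if (!numeric) out.append(cell(NUMBER_SIGN));
                // Digits 1-9 reuse the cells of a-i; 0 reuses j.
                out.append(cell(LETTERS[(c - '0' + 9) % 10]));
                numeric = true;
                continue;
            }
            boolean upper = c >= 'A' && c <= 'Z';
            char lower = upper ? (char) (c + 32) : c;
            if (lower >= 'a' && lower <= 'z') {
                if (upper) {
                    out.append(cell(CAPITAL));
                } else if (numeric && lower <= 'j') {
                    // A lowercase a-j right after digits would otherwise read as a digit.
                    out.append(cell(GRADE1));
                }
                out.append(cell(LETTERS[lower - 'a']));
            } else if (c == ' ') {
                out.append(cell(0x00)); // blank braille cell
            } else {
                out.append(c); // punctuation and symbols are not handled in this sketch
            }
            numeric = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBraille("Page 26"));
    }
}
```

A full translator following the ICEB rules would also need punctuation, symbols and the compound letters mentioned above; this sketch covers only the letter, digit, capital and numeric indicators.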

Index Terms

Computer Science

Information Sciences

Keywords

OCR, Tesseract, Tess4J, JOrtho, Phonetic Matching, Soundex, Braille, Braille Translation, Braille ASCII