International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 44 - Number 22
Year of Publication: 2012
Authors: Shivani Saluja, Tushar Patnaik, Tanvi Jain
DOI: 10.5120/6410-8759
Shivani Saluja, Tushar Patnaik, Tanvi Jain. Text Extraction and Non Text Removal from Colored Images. International Journal of Computer Applications. 44, 22 (April 2012), 13-19. DOI=10.5120/6410-8759
This paper proposes a new methodology for text extraction and non-text removal from colored images. Text in web pages, library documents, and similar sources is a powerful source of high-level semantics. Existing text extraction methods do not work efficiently on images with complex backgrounds, varying contrast, or text embedded in a complex background. OCR works efficiently on documents that contain only text, so non-text regions need to be removed first. The paper considers several images in several languages (English, Telugu, and Gurumukhi) and also discusses several existing text detection techniques. The proposed approach consists of preprocessing steps, adaptive thresholding, detection of connected components, generation of blobs, and finally extraction of only those blobs that contain text.
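
The stages named in the abstract (adaptive thresholding, connected components, blob filtering) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the thresholding rule or the criteria for deciding which blobs are text, so the local-mean threshold, the function names, and the area and aspect-ratio limits below are all assumptions chosen for illustration. The input is assumed to be an already preprocessed grayscale image stored as a list of rows.

```python
from collections import deque


def adaptive_threshold(gray, window=3, offset=10):
    """Mark a pixel as foreground (1) when it is darker than its local mean minus offset.

    The local-mean rule, window size, and offset are illustrative assumptions.
    """
    h, w = len(gray), len(gray[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            vals = [gray[j][i] for j in ys for i in xs]
            mean = sum(vals) / len(vals)
            out[y][x] = 1 if gray[y][x] < mean - offset else 0
    return out


def connected_components(binary):
    """Group foreground pixels into blobs (lists of (y, x)) using 8-connectivity."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                blob = []
                while queue:
                    cy, cx = queue.popleft()
                    blob.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                blobs.append(blob)
    return blobs


def looks_like_text(blob, min_area=2, max_area=50, max_aspect=5.0):
    """Hypothetical text test: reject specks, large regions, and long thin lines."""
    ys = [p[0] for p in blob]
    xs = [p[1] for p in blob]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    aspect = max(width, height) / min(width, height)
    return min_area <= len(blob) <= max_area and aspect <= max_aspect


def extract_text_blobs(gray):
    """Threshold, label blobs, and keep only those passing the text heuristic."""
    binary = adaptive_threshold(gray)
    return [b for b in connected_components(binary) if looks_like_text(b)]
```

Calling `extract_text_blobs(gray)` returns the pixel lists of the blobs that pass the filter. In practice the filter would need to be tuned to the scripts and image resolutions at hand, and a library routine would usually replace the per-pixel loops for speed.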