Research Article

Text Extraction and Non Text Removal from Colored Images

by Shivani Saluja, Tushar Patnaik, Tanvi Jain
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 44 - Number 22
Year of Publication: 2012
Authors: Shivani Saluja, Tushar Patnaik, Tanvi Jain
10.5120/6410-8759

Shivani Saluja, Tushar Patnaik, Tanvi Jain. Text Extraction and Non Text Removal from Colored Images. International Journal of Computer Applications. 44, 22 (April 2012), 13-19. DOI=10.5120/6410-8759

@article{ 10.5120/6410-8759,
author = { Shivani Saluja and Tushar Patnaik and Tanvi Jain },
title = { Text Extraction and Non Text Removal from Colored Images },
journal = { International Journal of Computer Applications },
issue_date = { April 2012 },
volume = { 44 },
number = { 22 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 13-19 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume44/number22/6410-8759/ },
doi = { 10.5120/6410-8759 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:36:13.239460+05:30
%A Shivani Saluja
%A Tushar Patnaik
%A Tanvi Jain
%T Text Extraction and Non Text Removal from Colored Images
%J International Journal of Computer Applications
%@ 0975-8887
%V 44
%N 22
%P 13-19
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper proposes a new methodology for text extraction and non-text removal from colored images. Text in web pages, library documents, and similar sources is a powerful source of high-level semantics. Existing text extraction methods do not work efficiently on images with varying contrast or with text embedded in a complex background. OCR works efficiently on documents that contain only text, so non-text regions need to be removed beforehand. The paper considers several images in several languages (English, Telugu, and Gurumukhi). Several existing text detection techniques are also discussed. The proposed approach is based on preprocessing steps, adaptive thresholding, detection of connected components, generation of blobs, and finally extraction of only those blobs that contain text.
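
The stages named in the abstract (preprocessing, adaptive thresholding, connected components, blob generation, blob filtering) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract gives no parameters, so the grayscale weights, the local-mean threshold, the window `radius`, the `offset`, and every cutoff in `looks_like_text` are assumptions chosen only for illustration.

```python
from collections import deque

def to_gray(rgb):
    """Preprocessing: convert rows of (r, g, b) tuples to integer gray levels."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row] for row in rgb]

def adaptive_threshold(gray, radius=2, offset=10):
    """Mark a pixel as foreground (1) if it is darker than its local mean minus offset."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            if gray[y][x] < sum(window) / len(window) - offset:
                out[y][x] = 1
    return out

def connected_blobs(binary):
    """Return (x0, y0, x1, y1, pixel_count) for each 8-connected foreground blob."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            count = 0
            while queue:
                cx, cy = queue.popleft()
                count += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            blobs.append((x0, y0, x1, y1, count))
    return blobs

def looks_like_text(blob, min_pixels=3, max_side=8, max_aspect=5.0):
    """Hypothetical filter: keep small blobs with a moderate aspect ratio."""
    x0, y0, x1, y1, count = blob
    bw, bh = x1 - x0 + 1, y1 - y0 + 1
    aspect = max(bw, bh) / min(bw, bh)
    return count >= min_pixels and max(bw, bh) <= max_side and aspect <= max_aspect

def extract_text_blobs(rgb):
    """Run the full sketch pipeline and return only the blobs kept by the filter."""
    binary = adaptive_threshold(to_gray(rgb))
    return [b for b in connected_blobs(binary) if looks_like_text(b)]
```

A real system would tune the window size and filter cutoffs to the image resolution and script, and would likely use an optimized image-processing library rather than nested Python loops.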

Index Terms

Computer Science
Information Sciences

Keywords

Binarization, Pixel, Image, Text, Non Text, Text Localization, Connected Component, Blobs, Color