Text Extraction and Non Text Removal from Colored Images

Shivani Saluja; Tushar Patnaik; Tanvi Jain

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 44 - Number 22

Year of Publication: 2012

Authors: Shivani Saluja, Tushar Patnaik, Tanvi Jain

DOI: 10.5120/6410-8759

Shivani Saluja, Tushar Patnaik, Tanvi Jain. Text Extraction and Non Text Removal from Colored Images. International Journal of Computer Applications. 44, 22 (April 2012), 13-19. DOI=10.5120/6410-8759

@article{10.5120/6410-8759,
  author = {Shivani Saluja and Tushar Patnaik and Tanvi Jain},
  title = {Text Extraction and Non Text Removal from Colored Images},
  journal = {International Journal of Computer Applications},
  issue_date = {April 2012},
  volume = {44},
  number = {22},
  month = {April},
  year = {2012},
  issn = {0975-8887},
  pages = {13--19},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume44/number22/6410-8759/},
  doi = {10.5120/6410-8759},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:36:13.239460+05:30
%A Shivani Saluja
%A Tushar Patnaik
%A Tanvi Jain
%T Text Extraction and Non Text Removal from Colored Images
%J International Journal of Computer Applications
%@ 0975-8887
%V 44
%N 22
%P 13-19
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The objective of this paper is to propose a new methodology for text extraction and non-text removal from colored images. Text in web pages, library documents, and similar sources is a powerful source of high-level semantics. Existing text extraction methods do not work efficiently on images with complex backgrounds, varying contrast, or text embedded in a complex background. Documents fed into OCR are processed efficiently when they contain only text. The paper considers images in several languages (English, Telugu, and Gurmukhi) and also discusses several existing text detection techniques. The proposed approach is based on preprocessing, adaptive thresholding, detection of connected components, generation of blobs, and finally extraction of only those blobs that contain text.
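The pipeline summarized in the abstract (preprocessing, adaptive thresholding, connected components, blob generation, selection of text blobs) can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `to_gray`, `adaptive_threshold`, `connected_components`, `is_text_blob`, and `extract_text_blobs` are hypothetical. The grayscale weights, the mean-based local threshold (`window`, `offset`), 8-connectivity, and the text/non-text rule (height range, aspect ratio, fill ratio) are illustrative choices; the abstract does not specify them.

```python
from collections import deque


def to_gray(rgb):
    """Preprocessing (assumed): convert rows of (r, g, b) tuples to grayscale
    with ITU-R BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def adaptive_threshold(gray, window=15, offset=10.0):
    """Mean-based adaptive thresholding: a pixel becomes foreground (1) when it
    is darker than the mean of its window x window neighbourhood minus offset."""
    h, w = len(gray), len(gray[0])
    # Integral image with a zero border: ii[y][x] sums gray[0..y-1][0..x-1].
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    r = window // 2
    binary = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            # Window sum from four integral-image lookups (window clipped at borders).
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            binary[y][x] = 1 if gray[y][x] < mean - offset else 0
    return binary


def connected_components(binary):
    """Group 8-connected foreground pixels into blobs, each described by its
    bounding box and pixel count."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            top, left, bottom, right, count = sy, sx, sy, sx, 0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append({"top": top, "left": left, "height": bottom - top + 1,
                          "width": right - left + 1, "pixels": count})
    return blobs


def is_text_blob(blob, min_h=3, max_h=40, max_aspect=10.0, min_fill=0.1):
    """Hypothetical text/non-text rule: character-sized height, not extremely
    elongated, and dense enough to hold strokes. Thresholds are illustrative."""
    h, w = blob["height"], blob["width"]
    aspect = max(h, w) / min(h, w)
    fill = blob["pixels"] / (h * w)
    return min_h <= h <= max_h and aspect <= max_aspect and fill >= min_fill


def extract_text_blobs(rgb):
    """Whole sketch: grayscale -> adaptive threshold -> blobs -> text-like blobs only."""
    binary = adaptive_threshold(to_gray(rgb))
    return [b for b in connected_components(binary) if is_text_blob(b)]
```

Because each pixel is compared with its local mean, a uniformly dark region larger than the window is reduced to its outline; such ring-shaped blobs are then rejected by the height rule when they exceed `max_h`. Isolated specks are rejected by `min_h`, which is one simple way non-text noise can be removed before OCR.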

Index Terms

Computer Science

Information Sciences

Keywords

Binarization, Pixel, Image, Text, Non Text, Text Localization, Connected Component, Blobs, Color