International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 181 - Number 48
Year of Publication: 2019
Authors: Siva Rama Sastry Gumma
10.5120/ijca2019918653
Siva Rama Sastry Gumma. Extracting Text from Telugu Color Documents by Removing Dither Patterns. International Journal of Computer Applications. 181, 48 (Apr 2019), 1-7. DOI=10.5120/ijca2019918653
Preprocessing is an important step in the development of an Optical Character Recognition (OCR) system. It comprises several modules, such as binarization and skew detection and correction. This paper discusses the binarization module. Although many algorithms exist for binarizing document images, fewer address printed color images, because printed color documents contain dither patterns, normal text, reversed text, colored text overlaid on colored backgrounds, and drawings and graphics that appear in millions of different colors. Preprocessing colored documents is therefore a challenging task. For printed color documents, this paper presents the elimination of dither patterns using a Butterworth band-reject filter, and text extraction by eliminating graphics based on the height of each component. Results on a corpus of newspapers published in Telugu show that the proposed method is promising.
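The two steps named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the standard textbook form of the Butterworth band-reject transfer function, H = 1 / (1 + (D·W / (D² − D0²))^(2n)), on a centered frequency grid, and a plain height threshold on component bounding boxes. The function names and the parameters `d0`, `w`, `order`, `min_h`, and `max_h` are hypothetical; the abstract does not state the values or the exact component criteria the author used.

```python
import math


def butterworth_band_reject(rows, cols, d0, w, order=2):
    """Build a rows x cols band-reject mask for a centered spectrum.

    d0 is the radius of the band center, w the band width, and order the
    filter order n. All three are illustrative assumptions.
    """
    cu, cv = rows / 2.0, cols / 2.0
    mask = []
    for u in range(rows):
        row = []
        for v in range(cols):
            d = math.hypot(u - cu, v - cv)  # distance from spectrum center
            denom = d * d - d0 * d0
            if denom == 0.0:
                # Exactly on the band center: the frequency is fully rejected.
                row.append(0.0)
            else:
                row.append(1.0 / (1.0 + (d * w / denom) ** (2 * order)))
        mask.append(row)
    return mask


def keep_text_sized(boxes, min_h, max_h):
    """Keep bounding boxes (x, y, width, height) whose height is text-like.

    Components taller or shorter than the assumed range are treated as
    graphics or noise and dropped.
    """
    return [b for b in boxes if min_h <= b[3] <= max_h]
```

In use, the mask would be multiplied element-wise with the centered Fourier spectrum of the image before inverting the transform, so that the periodic dither frequencies near radius `d0` are suppressed; the height filter would then run on the connected components of the binarized result.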