National conference on Digital Image and Signal Processing
Foundation of Computer Science USA
DISP2015 - Number 3
April 2015
Authors: Ranjana S. Zinjore, R. J. Ramteke
Ranjana S. Zinjore, R. J. Ramteke. Identification and Removal of Devanagari Script and Extraction of Roman Words from Printed Bilingual Text Document. National conference on Digital Image and Signal Processing. DISP2015, 3 (April 2015), 17-20.
In this paper, a generalized framework is proposed for the identification and removal of Devanagari (Marathi) script and the extraction of Roman (English) words from printed bilingual text documents. For identification, the grayscale image is first converted into a binary image. A Sobel edge detector is then applied to the binary image, followed by morphological dilation with a square structuring element. The connected components are then labeled, and Marathi words are identified using visual discriminating features. All identified Marathi words are removed from the document so that the Roman script can be extracted at the word level. To extract Roman words, closely neighboring bounding boxes (BBs) are joined, and two BBs that lie on the same text line are grouped if the distance between them is less than a chosen threshold value. The proposed methodology was tested on 10 different bilingual documents collected from newspapers and book text, some of which were generated manually. The identification accuracy obtained is 85.95%.
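The processing chain above can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not fix: a fixed global threshold of 128 for binarization, an "edge" meaning any pixel with non-zero Sobel gradient magnitude, 8-connectivity for labeling, and "same text line" meaning that two boxes have overlapping row ranges. The function names (`binarize`, `sobel_edges`, `dilate`, `component_boxes`, `remove_regions`, `group_words`) and the `gap_threshold` parameter are hypothetical. Because the abstract does not describe the visual discriminating features used to decide whether a component is Marathi, that classification is not implemented. `remove_regions` simply blanks out whatever boxes it receives.

```python
from collections import deque

import numpy as np

Box = tuple[int, int, int, int]  # (top, left, bottom, right), inclusive pixel coordinates


def binarize(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Assumed fixed global threshold: dark (ink) pixels -> 1, background -> 0."""
    return (gray < threshold).astype(np.uint8)


def sobel_edges(binary: np.ndarray) -> np.ndarray:
    """0/1 edge map: pixels whose 3x3 Sobel gradient magnitude is non-zero."""
    h, w = binary.shape
    p = np.pad(binary.astype(np.int32), 1)  # zero border keeps the output h x w

    def s(dy: int, dx: int) -> np.ndarray:  # every pixel's neighbour at offset (dy, dx)
        return p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

    gx = (s(-1, 1) + 2 * s(0, 1) + s(1, 1)) - (s(-1, -1) + 2 * s(0, -1) + s(1, -1))
    gy = (s(1, -1) + 2 * s(1, 0) + s(1, 1)) - (s(-1, -1) + 2 * s(-1, 0) + s(-1, 1))
    return (np.hypot(gx, gy) > 0).astype(np.uint8)


def dilate(img: np.ndarray, size: int = 3) -> np.ndarray:
    """Binary dilation with a size x size square structuring element (size odd)."""
    r = size // 2
    h, w = img.shape
    p = np.pad(img, r)
    out = np.zeros_like(img)
    for dy in range(size):
        for dx in range(size):
            out |= p[dy:dy + h, dx:dx + w]  # OR in the image shifted by (dy - r, dx - r)
    return out


def component_boxes(img: np.ndarray) -> list[Box]:
    """Label 8-connected foreground components by BFS and return their bounding boxes."""
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y, x] or seen[y, x]:
                continue
            seen[y, x] = True
            queue, box = deque([(y, x)]), [y, x, y, x]
            while queue:
                cy, cx = queue.popleft()
                box = [min(box[0], cy), min(box[1], cx), max(box[2], cy), max(box[3], cx)]
                for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                        if img[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            boxes.append(tuple(box))
    return boxes


def remove_regions(binary: np.ndarray, boxes: list[Box]) -> np.ndarray:
    """Blank out the given boxes, e.g. components that some other step classed as Marathi."""
    out = binary.copy()
    for top, left, bottom, right in boxes:
        out[top:bottom + 1, left:right + 1] = 0
    return out


def group_words(boxes: list[Box], gap_threshold: int) -> list[Box]:
    """Repeatedly merge two boxes on the same text line whose horizontal gap < gap_threshold."""
    words = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                a, b = words[i], words[j]
                same_line = a[0] <= b[2] and b[0] <= a[2]  # assumption: row ranges overlap
                gap = max(0, b[1] - a[3] - 1, a[1] - b[3] - 1)  # blank columns between boxes
                if same_line and gap < gap_threshold:
                    words[i] = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
                    del words[j]
                    merged = True
                    break
            if merged:
                break
    return sorted(words, key=lambda b: (b[0], b[1]))
```

As a usage example, consider a synthetic 12 x 30 page containing two glyphs separated by two blank columns and a third glyph farther to the right. After dilation, the first two glyphs fuse into one component. `group_words` then joins the two remaining boxes only when `gap_threshold` is larger than the 8 blank columns that separate them. In this sketch, dilation already merges very close glyphs, and the explicit distance rule handles the wider gaps.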