Research Article

Identification and Removal of Devanagari Script and Extraction of Roman Words from Printed Bilingual Text Document

Published on April 2015 by Ranjana S. Zinjore, R. J. Ramteke
National conference on Digital Image and Signal Processing
Foundation of Computer Science USA
DISP2015 - Number 3
April 2015
Authors: Ranjana S. Zinjore, R. J. Ramteke

Ranjana S. Zinjore, R. J. Ramteke. Identification and Removal of Devanagari Script and Extraction of Roman Words from Printed Bilingual Text Document. National conference on Digital Image and Signal Processing. DISP2015, 3 (April 2015), 17-20.

@article{
author = { Ranjana S. Zinjore, R. J. Ramteke },
title = { Identification and Removal of Devanagari Script and Extraction of Roman Words from Printed Bilingual Text Document },
journal = { National conference on Digital Image and Signal Processing },
issue_date = { April 2015 },
volume = { DISP2015 },
number = { 3 },
month = { April },
year = { 2015 },
issn = { 0975-8887 },
pages = { 17-20 },
numpages = 4,
url = { /proceedings/disp2015/number3/20492-3027/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National conference on Digital Image and Signal Processing
%A Ranjana S. Zinjore
%A R. J. Ramteke
%T Identification and Removal of Devanagari Script and Extraction of Roman Words from Printed Bilingual Text Document
%J National conference on Digital Image and Signal Processing
%@ 0975-8887
%V DISP2015
%N 3
%P 17-20
%D 2015
%I International Journal of Computer Applications
Abstract

This paper proposes a generalized framework for identifying and removing Devanagari (Marathi) script and extracting Roman (English) words from printed bilingual text documents. For identification, the grayscale image is first converted into a binary image, and the Sobel edge detector is applied to it. Morphological dilation with a square structuring element is then applied, the connected components are labeled, and Marathi words are identified with the help of visual discriminating features. All identified Marathi words are removed from the document so that Roman script can be extracted at the word level. To extract Roman words, bounding boxes (BB) that are close neighbors are joined: two BBs on the same text line are grouped if the distance between them is less than a chosen threshold value. The proposed methodology was tested on 10 bilingual documents drawn from newspapers and book text, along with some manually generated documents. The identification accuracy obtained is 85.95%.
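
The dilation, connected-component labeling and bounding-box grouping steps described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`dilate`, `component_boxes`, `group_words`), the 3 x 3 structuring element, the 8-connectivity, the vertical-overlap test for "same text line" and the `max_gap` threshold are all assumptions made for illustration. The Sobel edge detection and the visual discriminating features used to identify Marathi words are left out because the abstract does not specify them.

```python
from collections import deque


def dilate(img, k=3):
    """Binary dilation with a k x k square structuring element (k odd, assumed)."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Switch on every pixel of the square centred on (x, y).
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        out[yy][xx] = 1
    return out


def component_boxes(img):
    """Label 8-connected foreground regions and return their bounding
    boxes as (x0, y0, x1, y1) with inclusive coordinates."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def group_words(boxes, max_gap):
    """Join boxes that overlap vertically (treated here as 'same text line')
    and whose horizontal gap in pixels is below max_gap."""
    words = []
    for x0, y0, x1, y1 in sorted(boxes):  # left to right
        for i, (wx0, wy0, wx1, wy1) in enumerate(words):
            same_line = y0 <= wy1 and wy0 <= y1
            gap = x0 - wx1 - 1  # background pixels between the two boxes
            if same_line and gap < max_gap:
                words[i] = (wx0, min(wy0, y0), max(wx1, x1), max(wy1, y1))
                break
        else:
            words.append((x0, y0, x1, y1))
    return words


# Two character boxes on one line plus a box on a lower line.
print(group_words([(0, 0, 2, 2), (4, 0, 6, 2), (0, 10, 2, 12)], max_gap=3))
# prints [(0, 0, 6, 2), (0, 10, 2, 12)]
```

In this sketch the threshold plays the role the abstract assigns to it: a small `max_gap` keeps characters as separate boxes, while a larger one joins characters into words. A suitable value would depend on the font size and scan resolution of the document.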

References
  1. S. Basavaraj Patil and N V Subbareddy, "Neural Network based System for Script Identification in Indian Documents", Sadhana, Special Issue on Indian Language Document Processing, Feb 2002, Vol. 27, part-1, pp. 83-97.
  2. D. Ghosh, T. Dube and A. P. Shivaprasad, "Script Recognition A Review", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 12, (2010) December, pp. 2142-2161.
  3. B. V. Dhandra, H. Mallikarjun, R. Hegadi and V. S. Malemath, "Word Level Script Identification in Bilingual Documents through Discriminating Features", IEEE International Conference on Signal Processing, Communications and Networking (ICSCN), Feb 22-24, 2007, pp. 630-635.
  4. Sushama Shelk and Shaila Apte, "A Multistage Handwritten Marathi Compound Character Recognition Scheme using Neural Networks and Wavelet Features", International Journal of Signal Processing, Image Processing and Pattern Recognition, March 2011, Vol. 4.
  5. Aarti G. Ambekar, Chhaya S. Hinge, Samidha S. Kulkarni, "Bilingual OCR for Printed English and Devnagari Text", International Journal of Research, Jan 2013, Vol. 2, Issue: 1, ISSN: 2250-1991.
  6. K. Roy, U. Pal, and B. B. Chaudhuri, "Neural Network based Word wise Handwritten Script Identification System for Indian Postal Automation", IEEE-Proceedings of International conference on Intelligent Sensing and Information Processing (ICISIP), Jan 4-7 2005, pp 240-245
  7. K. Roy and U. Pal, "Word-wise Hand-written Script Separation for Indian Postal automation", In Proc. 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 521-526, 2006.
  8. Lijun Zhou, Yue Lu, Chew Lim Tan, "Bangla/English Script Identification based on Analysis of Connected component Profiles", In Proc. 7th IAPR workshop on Document Analysis System, New land, pp. 234-254,13-15, Feb-2006
  9. R. Dhir, C. Singh and G. S. Lehal, "A Structural Feature Based Approach for Script identification of Gurumukhi and Roman Characters and Words", Proceedings of 39th Annual National Convention of Computer Society of India (2004) December.
  10. Sunilkumar K. Sangame , R. J. Ramteke , Shivkumar Andure and Yogesh V. Gundge, "Script identification of text words from a bilingual document using voting Techniques", World Journal of Science and Technology 2012, 2(5):114-119 ISSN: 2231 – 2587
  11. Savita Pal Godara and Pratap Singh Patwal, "Latin Script Detection and Removal from Devanagari Document Image for OCR", International Journal of Computer & Organization Trends, Mar 2014, Vol. 6.
  12. Lincoln Faria and Angel Sanchez, " Word- Level Segmentation in Printed Handwritten Documents",
Index Terms

Computer Science
Information Sciences

Keywords

Bounding Box, Morphological Operation, Feature Extraction, Script Identification