Research Article

Script Identification for Tri-Lingual Image Document

Published in September 2014 by Anil Kumar Dahiya, Vivek Kumar Verma
Recent Advances in Wireless Communication and Artificial Intelligence
Foundation of Computer Science USA
RAWCAI - Number 1
September 2014
Authors: Anil Kumar Dahiya, Vivek Kumar Verma

Anil Kumar Dahiya, Vivek Kumar Verma. Script Identification for Tri-Lingual Image Document. Recent Advances in Wireless Communication and Artificial Intelligence. RAWCAI, 1 (September 2014), 35-38.

@article{
author = { Anil Kumar Dahiya, Vivek Kumar Verma },
title = { Script Identification for Tri-Lingual Image Document },
journal = { Recent Advances in Wireless Communication and Artificial Intelligence },
issue_date = { September 2014 },
volume = { RAWCAI },
number = { 1 },
month = { September },
year = { 2014 },
issn = { 0975-8887 },
pages = { 35-38 },
numpages = 4,
url = { /proceedings/rawcai/number1/17916-1412/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 Recent Advances in Wireless Communication and Artificial Intelligence
%A Anil Kumar Dahiya
%A Vivek Kumar Verma
%T Script Identification for Tri-Lingual Image Document
%J Recent Advances in Wireless Communication and Artificial Intelligence
%@ 0975-8887
%V RAWCAI
%N 1
%P 35-38
%D 2014
%I International Journal of Computer Applications
Abstract

In a multilingual environment, where a single document image may contain more than one script, a script identification system is needed. Automatic identification of scripts in a document facilitates (i) automatic archiving of multilingual documents, (ii) searching online archives of document images, and (iii) selection of a script-specific OCR in a multilingual environment. The main objective of this system is to identify each script and feed the text into its corresponding Optical Character Recognition (OCR) system, which converts a document image into an editable text document. Script identification of written text in Indian-script languages is a well-studied research field. This paper describes a script identification technique that discriminates three major south Indian scripts: Oriya, Telugu and Kannada. All three descend from the Brahmi script, and most of their character shapes are very similar. The method is applied to lines segmented from the document image and is independent of font and size. The proposed technique uses basic distinguishing features based on texture analysis, derived from the horizontal and vertical projection profiles. We obtain an overall accuracy of 98.64% at line level on a test dataset of three ancient mixed-script document images.
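
The projection profiles mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `projection_profiles` and `profile_feature`, the convention that nonzero pixels are ink, the bin count, and the resampling step (one simple way to make a profile independent of line size) are all assumptions made for illustration.

```python
import numpy as np


def projection_profiles(line_img):
    """Return (horizontal, vertical) projection profiles of a binary text line.

    line_img: 2-D array in which nonzero values are ink pixels (assumed convention).
    """
    ink = np.asarray(line_img) != 0
    horizontal = ink.sum(axis=1)  # ink pixels in each row
    vertical = ink.sum(axis=0)    # ink pixels in each column
    return horizontal, vertical


def profile_feature(profile, bins=16):
    """Resample a profile to a fixed length and scale it to sum to 1.

    This is a hypothetical normalization so that lines of different
    heights and widths yield comparable feature vectors.
    """
    profile = np.asarray(profile, dtype=float)
    positions = np.linspace(0, len(profile) - 1, bins)
    resampled = np.interp(positions, np.arange(len(profile)), profile)
    total = resampled.sum()
    return resampled / total if total > 0 else resampled


# Toy 3x5 "line" containing a single box-shaped glyph.
toy = np.array([
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
])
h, v = projection_profiles(toy)
feature = np.concatenate([profile_feature(h, bins=4), profile_feature(v, bins=4)])
```

Feature vectors of this kind could then be passed to a classifier such as the KNN named in the keywords; the actual features and classifier settings used in the paper are not given in this abstract.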

Index Terms

Computer Science
Information Sciences

Keywords

OCR, Script Identification, KNN, Oriya, Telugu, Kannada, Projection Profile