Recent Advances in Wireless Communication and Artificial Intelligence |
Foundation of Computer Science USA |
RAWCAI - Number 1 |
September 2014 |
Authors: Anil Kumar Dahiya, Vivek Kumar Verma |
Anil Kumar Dahiya, Vivek Kumar Verma. Script Identification for Tri-Lingual Image Document. Recent Advances in Wireless Communication and Artificial Intelligence. RAWCAI, 1 (September 2014), 35-38.
In a multilingual environment, where a single document image may contain more than one script, a script identification system is needed. Automatic script identification facilitates (i) automatic archiving of multilingual documents, (ii) searching online archives of document images, and (iii) selection of a script-specific OCR engine in a multilingual environment. The main objective of the system is to identify the script of each text region and pass it to the corresponding Optical Character Recognition (OCR) system, which converts a document image into editable text. Script identification for languages written in Indian scripts is a well-studied research field. This paper describes a script identification technique that discriminates among three major south Indian scripts: Oriya, Telugu and Kannada. All three descend from the Brahmi script, and most of their character shapes are closely similar. The method is applied to text lines segmented from the document image and is independent of font and size. The proposed technique uses basic distinguishing features based on texture analysis, derived from the horizontal and vertical projection profiles. We obtain an overall accuracy of 98.64% at line level on a test dataset of three ancient mixed-script document images.
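As an illustration of the projection profiles mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes a segmented text line is available as a binary array in which 1 marks ink and 0 marks background. The function names `projection_profiles` and `normalized_profile`, and the choice of eight bins, are hypothetical and only serve the example.

```python
import numpy as np


def projection_profiles(binary_line):
    """Return (horizontal, vertical) projection profiles of a text line.

    Assumption: binary_line is a 2-D array with 1 = ink, 0 = background.
    """
    img = np.asarray(binary_line, dtype=np.int64)
    horizontal = img.sum(axis=1)  # ink pixels in each row
    vertical = img.sum(axis=0)    # ink pixels in each column
    return horizontal, vertical


def normalized_profile(profile, bins=8):
    """Resample a profile into a fixed number of bins that sum to 1.

    This is one simple way to reduce dependence on line height and
    width; the paper does not state which normalization it uses.
    """
    profile = np.asarray(profile, dtype=float)
    chunks = np.array_split(profile, bins)
    binned = np.array([chunk.sum() for chunk in chunks])
    total = binned.sum()
    return binned / total if total > 0 else binned


if __name__ == "__main__":
    # Tiny hand-made "line image" used only to exercise the functions.
    line = [
        [0, 1, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 1, 0],
    ]
    h, v = projection_profiles(line)
    print("horizontal:", h.tolist())
    print("vertical:", v.tolist())
    print("normalized horizontal:", normalized_profile(h, bins=3).tolist())
```

Feature vectors built this way could be passed to any classifier to separate the three scripts. Which features and decision rule the paper actually uses is not stated in the abstract.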