International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 85 - Number 15
Year of Publication: 2014
Authors: Gopal Prasad, Atul Kumar Singh, Pawan Kumar
DOI: 10.5120/14916-3462
Gopal Prasad, Atul Kumar Singh, Pawan Kumar. A Multiple Feature based Novel Approach for Identification of Printed Indian Scripts at Word Level. International Journal of Computer Applications. 85, 15 (January 2014), 8-13. DOI=10.5120/14916-3462
In a country like India, where many different scripts are in use, automatic identification of printed script supports important applications such as automatic transcription of multilingual documents and the selection of a script-specific OCR engine in a multilingual environment. This paper proposes a novel method to identify, at word level, the script of documents printed in seven Indian languages: Bangla, Hindi, English, Malayalam, Oriya, Tamil, and Kannada. Recognition is based on multiple features extracted using the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT). Script classification performance is analyzed using a K-nearest neighbor classifier, with the final decision taken by majority voting over the outputs of the DCT-based and DWT-based methods. The proposed scheme thus combines the strengths of both DCT and DWT features. Experiments yielded an overall accuracy of 98.11%, which indicates that the proposed multiple-feature scheme outperforms several existing script identification schemes.
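
The pipeline summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the wavelet, the number of retained coefficients, or the exact voting rule. The sketch assumes a one-level Haar wavelet, a 4 x 4 block of low-frequency DCT coefficients, sub-band energies as DWT features, and a vote pooled over the k nearest neighbors found in each feature space. All function names (`dct_features`, `haar_features`, `knn_labels`, `classify`) are hypothetical, and a word image is assumed to be a list of equal-length rows of grey levels with even height and width.

```python
import math
from collections import Counter

def dct2_1d(x):
    """Unnormalized DCT-II of a 1-D sequence."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def dct_features(img, keep=4):
    """2-D DCT (rows, then columns); keep the low-frequency keep x keep block.
    The block size is an assumption, not a value from the paper."""
    rows = [dct2_1d(r) for r in img]
    # cols[j][i]: horizontal frequency j, vertical frequency i
    cols = [dct2_1d(list(c)) for c in zip(*rows)]
    return [cols[j][i] for i in range(keep) for j in range(keep)]

def haar_features(img):
    """One-level 2-D Haar DWT; return the mean energy of the approximation
    sub-band followed by the three detail sub-bands."""
    h, w = len(img), len(img[0])
    energy = [0.0, 0.0, 0.0, 0.0]
    for r in range(0, h - 1, 2):
        for c in range(0, w - 1, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            coeffs = ((a + b + d + e) / 2,   # approximation
                      (a - b + d - e) / 2,   # difference across columns
                      (a + b - d - e) / 2,   # difference across rows
                      (a - b - d + e) / 2)   # diagonal difference
            for i, v in enumerate(coeffs):
                energy[i] += v * v
    blocks = (h // 2) * (w // 2)
    return [v / blocks for v in energy]

def knn_labels(train, query, k):
    """Labels of the k training vectors closest to query (Euclidean distance)."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    return [label for _, label in ranked[:k]]

def classify(img, train_imgs, k=3):
    """Pool the k-NN votes from the DCT and DWT feature spaces and return the
    majority label. Ties go to the label seen first (DCT votes come first)."""
    dct_train = [(dct_features(im), lab) for im, lab in train_imgs]
    dwt_train = [(haar_features(im), lab) for im, lab in train_imgs]
    votes = (knn_labels(dct_train, dct_features(img), k)
             + knn_labels(dwt_train, haar_features(img), k))
    return Counter(votes).most_common(1)[0][0]
```

In this sketch, `train_imgs` is a list of `(image, script_label)` pairs and `classify(word_image, train_imgs, k)` returns the predicted script label. Pooling the votes is only one plausible reading of the majority-voting step; an alternative would be to classify separately in each feature space and reconcile the two decisions afterwards.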