International Conference on Communication Computing and Virtualization |
Foundation of Computer Science USA |
ICCCV2016 - Number 1 |
July 2016 |
Authors: Shailesh A. Chaudhari, Ravi M. Gulati |
Shailesh A. Chaudhari, Ravi M. Gulati. A Comparative Analysis of Feature Extraction Techniques and Classifiers Inaccuracies for Bilingual Printed Documents (Gujarati-English). International Conference on Communication Computing and Virtualization. ICCCV2016, 1 (July 2016), 16-20.
In a bilingual or multilingual optical character recognition (OCR) system, script identification is a challenging task. Considerable research on script identification has been reported in both Indian and non-Indian contexts. Many commercial and official regional documents in the states of India are bilingual, containing the regional language of the respective state together with English, and English words are frequently interspersed in such documents. Script identification is therefore one of the primary tasks in multi-script document recognition. This paper presents word-level script identification for Gujarati and English. Two approaches are used for feature extraction: the first extracts statistical features, and the second extracts Gabor features of a word using Gabor filters with suitable frequencies and orientations. The proposed system uses two classifiers, k-NN and SVM with different kernel functions, to assign the extracted features to one of the two scripts. The experiments show that SVM outperforms k-NN.
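The Gabor-feature approach with a k-NN classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the kernel size, wavelength, sigma, aspect ratio, number of orientations, and the choice of mean and standard deviation as per-filter features are all assumptions made for illustration, as are the function names. Toy striped textures stand in for word images, and the SVM classifier is omitted to keep the example dependency-free.

```python
import math
from collections import Counter


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real (cosine) part of a Gabor kernel as a nested list; size should be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_response(image, kernel):
    """Valid-mode correlation of image with kernel, returned as a flat list."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        sum(kernel[a][b] * image[i + a][j + b] for a in range(k) for b in range(k))
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]


def gabor_features(image, wavelengths=(4.0,), orientations=4, size=5, sigma=2.0):
    """Mean and standard deviation of each filter response (assumed feature choice)."""
    feats = []
    for lam in wavelengths:
        for n in range(orientations):
            theta = n * math.pi / orientations
            resp = filter_response(image, gabor_kernel(size, lam, theta, sigma))
            mean = sum(resp) / len(resp)
            var = sum((r - mean) ** 2 for r in resp) / len(resp)
            feats.extend([mean, math.sqrt(var)])
    return feats


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples nearest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def stripes(vertical, phase=0, n=8):
    """Toy n x n texture with period-4 stripes; a stand-in for a word image."""
    return [[1.0 if ((j if vertical else i) + phase) % 4 < 2 else 0.0
             for j in range(n)] for i in range(n)]


# Hypothetical usage: two labelled toy textures, one shifted query.
train = [(gabor_features(stripes(True)), "vertical"),
         (gabor_features(stripes(False)), "horizontal")]
print(knn_predict(train, gabor_features(stripes(True, phase=1)), k=1))
```

In a real system the inputs would be segmented word images, the filter bank would span several frequencies as well as orientations, and the same feature vectors could be passed to an SVM with the kernel function under comparison.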