International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 32 - Number 1
Year of Publication: 2011
Authors: Mohammad Abu Obaida, Md. Jakir Hossain, Momotaz Begum, Md. Shahin Alam
DOI: 10.5120/3872-5414
Mohammad Abu Obaida, Md. Jakir Hossain, Momotaz Begum, Md. Shahin Alam. Multilingual OCR (MOCR): An Approach to Classify Words to Languages. International Journal of Computer Applications. 32, 1 (October 2011), 46-53. DOI=10.5120/3872-5414
There have been immense efforts to design a complete OCR for most of the world’s leading languages; however, multilingual documents, whether handwritten or printed, remain a challenge. As a unified attempt, Unicode-based OCRs have been the most studied, with some positive outcomes, although a large character set slows recognition significantly. In this paper, we present a method to classify words to a language once word segmentation is complete. For this purpose, we identified the characteristics of the writing of several languages and used a projection method combined with other feature extraction methods. In addition, this paper proposes a modified statistical approach to correct skew before processing a segmented document. The proposed procedure, evaluated on a collection of both handwritten and printed documents, produced excellent results in assigning words to languages.
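To illustrate the kind of projection feature the abstract refers to, the following is a minimal sketch, not the authors' implementation. It assumes a segmented word is available as a binarized grid (1 for ink, 0 for background). The function names and the `peak_row_ratio` feature are hypothetical and chosen only for illustration; the abstract does not specify which projection features the method actually uses.

```python
from typing import List

# Assumption: a segmented word is a binary grid, 1 = ink pixel, 0 = background.
Image = List[List[int]]


def horizontal_projection(img: Image) -> List[int]:
    """Count ink pixels in each row (top to bottom)."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Count ink pixels in each column (left to right)."""
    if not img:
        return []
    return [sum(col) for col in zip(*img)]


def peak_row_ratio(img: Image) -> float:
    """Hypothetical feature: ink count of the densest row divided by image width.

    A value near 1.0 means some row is almost entirely ink, as happens
    with a continuous horizontal stroke across the word.
    """
    width = len(img[0]) if img else 0
    if width == 0:
        return 0.0
    return max(horizontal_projection(img)) / width


# Toy 4x5 word image with a solid top row.
word: Image = [
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
]

print(horizontal_projection(word))
print(vertical_projection(word))
print(peak_row_ratio(word))
```

Profiles like these could feed a per-word classifier alongside other extracted features; how they are combined, and how skew is corrected beforehand, is described in the paper itself and is not reproduced here.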