International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 4 - Number 6
Year of Publication: 2010
Authors: B. V. Dhandra, Mallikarjun Hangarge
DOI: 10.5120/834-1170
B. V. Dhandra, Mallikarjun Hangarge. Offline Handwritten Script Identification in Document Images. International Journal of Computer Applications. 4, 6 (July 2010), 1-5. DOI=10.5120/834-1170
Automatic handwritten script identification from document images facilitates many important applications, such as sorting, transcription of multilingual documents, and indexing of large collections of such images, and it can serve as a precursor to optical character recognition (OCR). In this paper, we investigate texture as a tool for determining the script of a handwritten document image, based on the observation that text has a distinct visual texture. A K-nearest-neighbour classifier is used to assign 300 text blocks and 400 text lines to one of three major Indian scripts (English, Devnagari and Urdu), based on 13 spatial spread features extracted using morphological filters. The proposed algorithm attains average classification accuracies as high as 99.2% for bi-script separation and 88.6% for tri-script separation, at the text-line and text-block levels respectively, under a five-fold cross-validation test.
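The classification and evaluation step described in the abstract can be illustrated with a minimal sketch: a K-nearest-neighbour vote over 13-dimensional feature vectors, scored by five-fold cross-validation. This is not the authors' implementation. The feature values are synthetic Gaussian placeholders rather than morphological spatial spread features, the equal split of 100 blocks per script is an assumption, and the names knn_predict, five_fold_accuracy and make_synthetic_blocks, as well as the choice k=3, are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Squared Euclidean distance preserves the neighbour ranking.
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def five_fold_accuracy(features, labels, k=3, folds=5, seed=0):
    """Average accuracy over `folds` disjoint held-out partitions."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    accuracies = []
    for f in range(folds):
        test_idx = order[f::folds]          # every folds-th sample is held out
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, features[i], k) == labels[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds


def make_synthetic_blocks(per_script=100, n_features=13, seed=1):
    """Assumed stand-in data: one Gaussian cluster per script (not real features)."""
    rng = random.Random(seed)
    centres = {"English": 0.0, "Devnagari": 5.0, "Urdu": 10.0}
    features, labels = [], []
    for script, centre in centres.items():
        for _ in range(per_script):
            features.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            labels.append(script)
    return features, labels


if __name__ == "__main__":
    X, y = make_synthetic_blocks()
    print(f"five-fold accuracy on synthetic data: {five_fold_accuracy(X, y):.3f}")
```

Because the synthetic clusters are well separated, the accuracy printed here says nothing about real handwriting; the sketch only shows how each sample is held out exactly once and how the per-fold accuracies are averaged.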