International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 37 - Number 9
Year of Publication: 2012
Authors: Apurva A. Desai
DOI: 10.5120/4635-6683
Apurva A. Desai. Segmentation of Characters from Old Typewritten Documents using Radon Transform. International Journal of Computer Applications. 37, 9 (January 2012), 10-15. DOI=10.5120/4635-6683
Optical character recognition (OCR) is a challenging area. Much work has been done, and continues to be done, for many languages across the world, including a good amount for many Indian languages. For Gujarati, however, hardly any work can be found. Gujarati has a rich literary heritage, and it is therefore important to preserve it for the next generation. This paper attempts to segment words and characters from old typewritten Gujarati documents. The algorithm presented uses a global threshold to convert scanned RGB documents to black-and-white documents, and noise removal is also applied. The Radon transform is used for skew detection. The novel concept presented in this work is the use of the Radon transform to segment documents into lines; vertical profiles are then used to further segment the lines into characters. Finally, the segmentation algorithm is also tested on documents typewritten in Hindi. The algorithm gives very good results.
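The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names (binarize, projection, estimate_skew, runs, segment) are hypothetical, the mean-intensity threshold and the ±5 degree search range are assumptions, and the Radon transform is approximated by binning ink-pixel coordinates along a tilted axis. Noise removal and the rotation that corrects the detected skew are omitted.

```python
import numpy as np

def binarize(gray, threshold=None):
    """Global threshold on a grayscale page; True marks ink (dark) pixels."""
    if threshold is None:
        threshold = gray.mean()  # assumption: the paper does not state its threshold rule
    return gray < threshold

def projection(ink, angle_deg):
    """Radon-style projection: count ink pixels along lines tilted by angle_deg."""
    ys, xs = np.nonzero(ink)
    if ys.size == 0:
        return np.zeros(1, dtype=int)
    theta = np.deg2rad(angle_deg)
    # Signed offset of each ink pixel along the axis normal to the tilted text lines.
    t = ys * np.cos(theta) - xs * np.sin(theta)
    bins = np.round(t).astype(int)
    bins -= bins.min()
    return np.bincount(bins)

def estimate_skew(ink, angles=np.arange(-5.0, 5.25, 0.25)):
    """Pick the angle whose projection is sharpest (highest variance)."""
    scores = [np.var(projection(ink, a)) for a in angles]
    return float(angles[int(np.argmax(scores))])

def runs(profile, min_count=1):
    """Return (start, end) pairs, end exclusive, where profile >= min_count."""
    on = np.concatenate(([0], (profile >= min_count).astype(np.int8), [0]))
    d = np.diff(on)
    starts = np.nonzero(d == 1)[0]
    ends = np.nonzero(d == -1)[0]
    return list(zip(starts.tolist(), ends.tolist()))

def segment(ink):
    """Split a deskewed binary page into lines, then each line into character boxes.

    Each box is (top, bottom, left, right) with exclusive bottom and right.
    """
    boxes = []
    for top, bottom in runs(ink.sum(axis=1)):        # horizontal profile: text lines
        line = ink[top:bottom]
        cols = runs(line.sum(axis=0))                # vertical profile: characters
        boxes.append([(top, bottom, left, right) for left, right in cols])
    return boxes
```

The horizontal profile used in segment is the same projection taken at an angle of zero, which is why a single projection routine can serve both skew detection and line segmentation. Separating characters by empty columns in the vertical profile assumes glyphs do not touch; connected or overlapping characters would need additional handling.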