KanOCR: Conversion of Printed Kannada Document to Editable form using Convolutional Neural Networks

Pradyumna Mukunda; Niraj S. Prasad; Mamatha H. R.


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 177 - Number 37

Year of Publication: 2020

Authors: Pradyumna Mukunda, Niraj S. Prasad, Mamatha H. R.

10.5120/ijca2020919885

Pradyumna Mukunda, Niraj S. Prasad, Mamatha H. R. KanOCR: Conversion of Printed Kannada Document to Editable form using Convolutional Neural Networks. International Journal of Computer Applications. 177, 37 (Feb 2020), 51-58. DOI=10.5120/ijca2020919885

@article{10.5120/ijca2020919885,
author = {Pradyumna Mukunda and Niraj S. Prasad and Mamatha H. R.},
title = {KanOCR: Conversion of Printed Kannada Document to Editable form using Convolutional Neural Networks},
journal = {International Journal of Computer Applications},
issue_date = {Feb 2020},
volume = {177},
number = {37},
month = {Feb},
year = {2020},
issn = {0975-8887},
pages = {51-58},
numpages = {8},
url = {https://ijcaonline.org/archives/volume177/number37/31151-2020919885/},
doi = {10.5120/ijca2020919885},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%A Pradyumna Mukunda
%A Niraj S. Prasad
%A Mamatha H. R.
%T KanOCR: Conversion of Printed Kannada Document to Editable form using Convolutional Neural Networks
%J International Journal of Computer Applications
%@ 0975-8887
%V 177
%N 37
%P 51-58
%D 2020
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Optical Character Recognition (OCR), the conversion of an image containing text into editable text, is of considerable importance in document image processing. The input to OCR can be a scanned document or something as simple as a newspaper cut-out. Supervised learning using neural networks yields output with greater accuracy. Unlike English, Kannada has a very large character set, because it includes kaagunithas (consonants combined with vowel signs) and vattaksharas (subscript consonant forms used in conjuncts), which makes character recognition much more complex. This paper focuses on OCR for printed Kannada text. As a first step, the input image is thresholded into a binary image, which makes segmentation easier. Characters are then extracted from the document using segmentation methods, and vattaksharas are separated from the words using a baseline technique. A convolutional neural network (CNN) plays the central role: it recognizes each character, and the result is matched against a Unicode look-up table to print the output. The system has been tested with a variety of fonts on a total of 37 sample documents. It has been developed for printed Kannada text only.
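
The abstract names thresholding and segmentation but not the specific methods. The sketch below is a minimal illustration, assuming Otsu's global threshold for binarization and gaps in projection profiles for splitting lines and glyph columns; both are common choices for printed text, not necessarily the authors' method. Images are plain Python lists (grayscale 0 = black, 255 = white), and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of pixels: grayscale 0..255, or binary 0/1


def otsu_threshold(gray: Image) -> int:
    """Gray level that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * n for level, n in enumerate(hist))
    w_low, sum_low = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_low += hist[t]            # pixels with level <= t
        sum_low += t * hist[t]
        w_high = total - w_low      # pixels with level > t
        if w_low == 0:
            continue
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: Image) -> Image:
    """Dark (ink) pixels become 1, background pixels become 0."""
    t = otsu_threshold(gray)
    return [[1 if p <= t else 0 for p in row] for row in gray]


def ink_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """(start, end) index pairs of consecutive non-zero entries in a projection profile."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i
        elif not count and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


def segment_lines(binary: Image) -> List[Tuple[int, int]]:
    """Text lines = runs of rows containing ink (horizontal projection)."""
    return ink_runs([sum(row) for row in binary])


def segment_columns(binary: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Glyph columns in one line = runs of columns containing ink (vertical projection)."""
    band = binary[top:bottom + 1]
    return ink_runs([sum(col) for col in zip(*band)])


# Tiny synthetic page: two one-row "lines" of text.
page = [
    [255, 255, 255, 255],
    [0, 255, 0, 255],
    [255, 255, 255, 255],
    [0, 0, 255, 255],
]
binary = binarize(page)
lines = segment_lines(binary)             # [(1, 1), (3, 3)]
glyphs = segment_columns(binary, 1, 1)    # [(0, 0), (2, 2)]
```

Plain column gaps are the simplest possible character splitter; a real Kannada segmenter would also have to keep detached vowel signs with their base glyph, which this sketch does not attempt.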
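
The abstract states that vattaksharas are separated from words by a baseline technique but does not define it. One plausible reading, sketched here purely as an assumption: within a binarized text line, take the baseline as the lowest row whose ink count is still a large fraction of the densest row, and treat everything below it as the vattakshara zone. The `fraction` parameter (0.5) and both function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary text-line image: 1 = ink, 0 = background


def find_baseline(line: Image, fraction: float = 0.5) -> int:
    """Lowest row whose ink count is at least `fraction` of the densest row's count."""
    profile = [sum(row) for row in line]   # horizontal projection of the line
    peak = max(profile, default=0)
    if peak == 0:
        raise ValueError("text line contains no ink")
    cutoff = fraction * peak
    return max(y for y, count in enumerate(profile) if count >= cutoff)


def split_at_baseline(line: Image, fraction: float = 0.5) -> Tuple[Image, Image]:
    """Split a line into the main-character zone (through the baseline) and the
    zone below it, where vattaksharas are assumed to sit."""
    base = find_baseline(line, fraction)
    return line[:base + 1], line[base + 1:]


line = [
    [0, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],   # densest row (6 ink pixels)
    [0, 1, 1, 1, 1, 0],   # lowest row with >= 3 ink pixels: the baseline
    [0, 0, 1, 0, 0, 0],   # sparse strokes below the baseline
    [0, 0, 1, 1, 0, 0],
]
main_zone, below_zone = split_at_baseline(line)
```

A fixed fraction is the simplest choice and may need tuning across fonts and sizes; grouping the ink in the lower zone into connected components would be a natural next step for isolating individual vattaksharas.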
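
The final step maps each recognized character to Unicode through a look-up table. The sketch below assumes the CNN (not shown) returns one score per glyph class; the four-entry `LOOKUP` table, its class indices, and the helper names are hypothetical, while the code points are the standard Unicode Kannada values. It also shows one detail any such table must respect: a vattakshara is drawn below its base consonant but is encoded as consonant + virama (U+0CCD) + consonant, with any vowel sign placed last.

```python
from typing import Dict, List

VIRAMA = "\u0CCD"  # KANNADA SIGN VIRAMA: joins two consonants into a conjunct

# Hypothetical class-index -> Unicode table; a real table would cover every
# glyph class the CNN was trained on.
LOOKUP: Dict[int, str] = {
    0: "\u0C85",        # ಅ  KANNADA LETTER A
    1: "\u0C86",        # ಆ  KANNADA LETTER AA
    2: "\u0C95",        # ಕ  KANNADA LETTER KA
    3: "\u0C95\u0CBE",  # ಕಾ KA + VOWEL SIGN AA (a kaagunitha)
}


def argmax(scores: List[float]) -> int:
    """Index of the highest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def decode(scores_per_glyph: List[List[float]]) -> str:
    """Turn per-glyph class scores into text via the look-up table."""
    return "".join(LOOKUP[argmax(s)] for s in scores_per_glyph)


def conjunct(base: str, vattakshara: str, vowel_sign: str = "") -> str:
    """Unicode order for a base consonant with a vattakshara:
    C1, VIRAMA, C2, then the vowel sign, even though C2 is drawn below C1."""
    return base + VIRAMA + vattakshara + vowel_sign


# Made-up scores for two glyphs: class 1 (ಆ) then class 3 (ಕಾ).
text = decode([[0.1, 0.7, 0.1, 0.1], [0.0, 0.1, 0.2, 0.7]])  # "ಆಕಾ"
kki = conjunct("\u0C95", "\u0C95", "\u0CBF")                  # ಕ್ಕಿ
```

Keeping the table keyed by class index means the CNN only has to learn glyph classes; all knowledge of Unicode ordering lives in the decoding step.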


Index Terms

Computer Science

Information Sciences

Keywords

Base-line Identification, CNN, Kannada, Neural Network, Optical Character Recognition, Pre-processing, Python, Segmentation.