Machine Learning based Multilingual OCR

Chandrahas Gaikwad; Satish Akolkar; Reshma Khodade; Deepali Dalal; Smita S. Pawar

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 117 - Number 7

Year of Publication: 2015

DOI: 10.5120/20568-2963

Chandrahas Gaikwad, Satish Akolkar, Reshma Khodade, Deepali Dalal, Smita S. Pawar. Machine Learning based Multilingual OCR. International Journal of Computer Applications 117, 7 (May 2015), 27-31. DOI=10.5120/20568-2963

@article{10.5120/20568-2963,
  author = {Chandrahas Gaikwad and Satish Akolkar and Reshma Khodade and Deepali Dalal and Smita S. Pawar},
  title = {Machine Learning based Multilingual OCR},
  journal = {International Journal of Computer Applications},
  issue_date = {May 2015},
  volume = {117},
  number = {7},
  month = {May},
  year = {2015},
  issn = {0975-8887},
  pages = {27-31},
  numpages = {5},
  url = {https://ijcaonline.org/archives/volume117/number7/20568-2963/},
  doi = {10.5120/20568-2963},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

Abstract

Paperless business has driven rapid advances in technology, making the storage, processing and retrieval of data effortless. To prevent unintended alterations during these phases, documents are often stored as images or as Portable Document Format (PDF) files. When such documents must later be modified, however, platform and script dependencies become barriers. This project presents a generic, machine-learning-based way to overcome this problem. The input consists of a training character set and a PDF written in the same script. The machine learns the distinguishing features of each character in the character set through various classifiers; corresponding patterns are then searched for in the PDF and a profile is generated for each. These classifiers distinguish characters by the number of ripples in their patterns, the number of regions, and other parameters. The profiles extracted from the PDF are compared with the learnt ones, and the exact match is declared as the result. Owing to its 'classifier reuse' strategy, the system removes the need, common in conventional OCR software, to 'start from scratch' whenever a new script is encountered. The work also has a social benefit in situations where data is available to the user but in a format that is tedious to manipulate: the user simply supplies the PDF and its character set as input and obtains a corresponding editable version as output.
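The abstract names the features only loosely ("number of ripples", "number of regions"), so the following is a minimal Python sketch of how a feature profile and exact-match lookup of this kind might work. It is not the authors' implementation, and every interpretation in it is an assumption: glyphs are taken to be already segmented and binarised ('1' = ink), "regions" is read as the number of 8-connected ink strokes plus the number of enclosed holes, and "ripples" as the largest number of separate ink runs crossed by any single row or column. The function names `profile`, `learn` and `recognise` are hypothetical.

```python
from collections import deque

# Ink uses 8-connectivity and background uses 4-connectivity (the usual
# pairing), so a diagonal gap in a stroke does not "leak" out of a hole.
N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8 = N4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def count_components(grid: list[str], value: str, neighbours) -> int:
    """Count connected components of cells equal to `value` (breadth-first search)."""
    rows, cols = len(grid), len(grid[0])
    seen: set[tuple[int, int]] = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and grid[ny][nx] == value):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
    return count


def max_runs(lines: list[str]) -> int:
    """Assumed 'ripples': largest number of separate ink runs in any one line."""
    return max(
        sum(1 for i, ch in enumerate(line) if ch == "1" and (i == 0 or line[i - 1] == "0"))
        for line in lines
    )


def profile(glyph: list[str]) -> tuple[int, int, int, int]:
    """Hypothetical profile: (ink strokes, enclosed holes, row ripples, column ripples)."""
    width = len(glyph[0])
    # Pad with background so the outside forms one region touching every edge.
    border = "0" * (width + 2)
    padded = [border] + ["0" + row + "0" for row in glyph] + [border]
    strokes = count_components(padded, "1", N8)
    holes = count_components(padded, "0", N4) - 1  # subtract the outer background
    columns = ["".join(row[c] for row in glyph) for c in range(width)]
    return (strokes, holes, max_runs(glyph), max_runs(columns))


def learn(character_set: dict[str, list[str]]) -> dict[tuple, list[str]]:
    """'Train' by mapping each profile to the label(s) that produced it."""
    table: dict[tuple, list[str]] = {}
    for label, glyph in character_set.items():
        table.setdefault(profile(glyph), []).append(label)
    return table


def recognise(table: dict[tuple, list[str]], glyph: list[str]):
    """Return the unique label whose profile matches exactly, or None."""
    labels = table.get(profile(glyph), [])
    return labels[0] if len(labels) == 1 else None


if __name__ == "__main__":
    training = {
        "O": ["01110", "10001", "10001", "10001", "01110"],
        "I": ["01110", "00100", "00100", "00100", "01110"],
    }
    table = learn(training)
    print(profile(training["O"]))  # (1, 1, 2, 2)
    print(recognise(table, training["I"]))  # I
```

In this sketch the feature extractor does not depend on the script, so one possible reading of "classifier reuse" is that supporting a new script only requires calling `learn` on that script's character set. Exact matching is brittle, though: noise, a different font size, or two characters that share a profile all make `recognise` return `None`, which is why a practical system would need richer features or a more tolerant classifier.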

Index Terms

Computer Science

Information Sciences

Keywords

Multilingual Optical Character Recognition, Machine Learning, Classifiers