Research Article

Bangla Character Recognition for Android Devices

by Aparajita Chowdhury, Abu Foysal, Shafiqul Islam
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 136 - Number 11
Year of Publication: 2016
Authors: Aparajita Chowdhury, Abu Foysal, Shafiqul Islam
DOI: 10.5120/ijca2016908566

Aparajita Chowdhury, Abu Foysal, Shafiqul Islam. Bangla Character Recognition for Android Devices. International Journal of Computer Applications. 136, 11 (February 2016), 13-19. DOI=10.5120/ijca2016908566

@article{ 10.5120/ijca2016908566,
author = { Aparajita Chowdhury and Abu Foysal and Shafiqul Islam },
title = { Bangla Character Recognition for Android Devices },
journal = { International Journal of Computer Applications },
issue_date = { February 2016 },
volume = { 136 },
number = { 11 },
month = { February },
year = { 2016 },
issn = { 0975-8887 },
pages = { 13-19 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume136/number11/24197-2016908566/ },
doi = { 10.5120/ijca2016908566 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:36:48.780328+05:30
%A Aparajita Chowdhury
%A Abu Foysal
%A Shafiqul Islam
%T Bangla Character Recognition for Android Devices
%J International Journal of Computer Applications
%@ 0975-8887
%V 136
%N 11
%P 13-19
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The goal of the project was to build an Android application that extracts text from any image containing Bengali characters and converts it into an editable document. Existing systems had several limitations that left room for improvement. To recognize more characters and joint letters, and so preserve more of the text, the work focused on reducing the error rate. Characters were recognized with Tesseract (v3.03), which uses the Leptonica image processing library to process the image and extract data from it. Joint letters, dangerous ambiguity and contrast issues were handled to increase efficiency. A record of the analyzed data and overall progress was kept to guide future improvement.
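
The abstract speaks of decreasing the rate of error but does not say how that rate is measured. One common way to quantify OCR error is the character error rate: the edit distance between the reference text and the recognized text, divided by the length of the reference. The following is a minimal sketch under that assumption, not the authors' evaluation code; the class name `CharErrorRate`, its method names, and the sample strings are hypothetical.

```java
public final class CharErrorRate {
    private CharErrorRate() {}

    /** Levenshtein distance between two sequences of Unicode code points. */
    public static int editDistance(int[] a, int[] b) {
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        // Turning an empty prefix of a into the first j symbols of b takes j insertions.
        for (int j = 0; j <= b.length; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i; // deleting the first i symbols of a
            for (int j = 1; j <= b.length; j++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    /**
     * Edit distance divided by the reference length.
     * Convention assumed here: an empty reference gives 0.0 if the
     * recognized text is also empty, and 1.0 otherwise.
     */
    public static double characterErrorRate(String reference, String recognized) {
        int[] ref = reference.codePoints().toArray();
        int[] hyp = recognized.codePoints().toArray();
        if (ref.length == 0) {
            return hyp.length == 0 ? 0.0 : 1.0;
        }
        return (double) editDistance(ref, hyp) / ref.length;
    }

    public static void main(String[] args) {
        // Hypothetical sample: the word "Bangla" in Bengali script (five code points),
        // written with Unicode escapes so the file compiles under any source encoding.
        String reference = "\u09AC\u09BE\u0982\u09B2\u09BE";
        // Hypothetical OCR output that dropped the final vowel sign.
        String recognized = "\u09AC\u09BE\u0982\u09B2";
        System.out.println(characterErrorRate(reference, recognized));
    }
}
```

Comparing code points rather than UTF-16 units keeps the count correct for any script. Note that a Bengali joint letter spans several code points, so a single misread conjunct can count as more than one error under this measure.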

Index Terms

Computer Science
Information Sciences

Keywords

Optical Character Recognition (OCR), Bangla language, Android, Tesseract, Leptonica