Research Article

A Complete Workflow for Development of Bangla OCR

by Farjana Yeasmin Omee, Shiam Shabbir Himel, Md. Abu Naser Bikas
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 21 - Number 9
Year of Publication: 2011
Authors: Farjana Yeasmin Omee, Shiam Shabbir Himel, Md. Abu Naser Bikas
DOI: 10.5120/2543-3483

Farjana Yeasmin Omee, Shiam Shabbir Himel, Md. Abu Naser Bikas. A Complete Workflow for Development of Bangla OCR. International Journal of Computer Applications. 21, 9 (May 2011), 1-6. DOI=10.5120/2543-3483

@article{ 10.5120/2543-3483,
author = { Farjana Yeasmin Omee and Shiam Shabbir Himel and Md. Abu Naser Bikas },
title = { A Complete Workflow for Development of Bangla OCR },
journal = { International Journal of Computer Applications },
issue_date = { May 2011 },
volume = { 21 },
number = { 9 },
month = { May },
year = { 2011 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume21/number9/2543-3483/ },
doi = { 10.5120/2543-3483 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:08:00.616042+05:30
%A Farjana Yeasmin Omee
%A Shiam Shabbir Himel
%A Md. Abu Naser Bikas
%T A Complete Workflow for Development of Bangla OCR
%J International Journal of Computer Applications
%@ 0975-8887
%V 21
%N 9
%P 1-6
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Developing a Bangla OCR requires a number of algorithms and methods. Many efforts have been made to develop a Bangla OCR, but none of them has produced an error-free system, and each has its own shortcomings. We discuss the problem scope of the currently existing Bangla OCRs. In this paper, we present the basic steps required for developing a Bangla OCR and a complete workflow for its development, mentioning all the possible algorithms required.

Index Terms

Computer Science
Information Sciences

Keywords

OCR, Bangla OCR, Bangla Font, Matra, Preprocessing, Binarization, Classification, Segmentation, Page Layout analysis, Tesseract