Research Article

Line-wise Script Segmentation for Indian Language Documents

by Manoj Kumar Shukla, Haider Banka
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 108 - Number 9
Year of Publication: 2014
Authors: Manoj Kumar Shukla, Haider Banka
10.5120/18943-0411

Manoj Kumar Shukla, Haider Banka. Line-wise Script Segmentation for Indian Language Documents. International Journal of Computer Applications. 108, 9 (December 2014), 34-37. DOI=10.5120/18943-0411

@article{ 10.5120/18943-0411,
author = { Manoj Kumar Shukla and Haider Banka },
title = { Line-wise Script Segmentation for Indian Language Documents },
journal = { International Journal of Computer Applications },
issue_date = { December 2014 },
volume = { 108 },
number = { 9 },
month = { December },
year = { 2014 },
issn = { 0975-8887 },
pages = { 34-37 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume108/number9/18943-0411/ },
doi = { 10.5120/18943-0411 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:42:34.935054+05:30
%A Manoj Kumar Shukla
%A Haider Banka
%T Line-wise Script Segmentation for Indian Language Documents
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 9
%P 34-37
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In a multilingual country like India, separating the scripts that appear together in a document page image is of primary importance for a script identification system. A multi-script page must be segmented by script before the OCR engine for each individual script can be run. In this paper we present a technique for script segmentation of individual text lines in printed Indian-language documents. Our line-wise approach is based on the horizontal projection profile. A prototype of the system has been tested on printed lines of Indian-language script, and an average accuracy of 99% has been achieved.
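
As a rough illustration of the horizontal projection profile idea named in the abstract, the sketch below counts ink pixels per row of a binarized page and treats each run of non-empty rows as one text line. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `horizontal_projection` and `segment_lines`, the `min_ink` threshold, and the toy `page` array are all hypothetical, and the step that assigns a script to each line is not shown because the abstract does not describe it.

```python
from typing import List, Tuple


def horizontal_projection(image: List[List[int]]) -> List[int]:
    """Count foreground (ink) pixels in each row of a binary image.

    Assumption: the image is already binarized, 1 = ink, 0 = background.
    """
    return [sum(row) for row in image]


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows
    whose ink count is at least min_ink (a hypothetical threshold)."""
    profile = horizontal_projection(image)
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins here
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # a blank row closes the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy binarized page with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

print(horizontal_projection(page))  # [0, 4, 2, 0, 0, 6, 1, 0]
print(segment_lines(page))          # [(1, 3), (5, 7)]
```

On a real scan, `min_ink` would likely need to be above 1 to tolerate noise, and each returned row range would then be passed to whatever per-line script classifier the system uses.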

Index Terms

Computer Science
Information Sciences

Keywords

Script line documents OCR