Research Article

Text-Line Extraction from Handwritten Document Images using Histogram and Connected Component Analysis

Published on April 2015 by G. G. Rajput, Suryakant B. Ummapure, Preethi N. Patil
National conference on Digital Image and Signal Processing
Foundation of Computer Science USA
DISP2015 - Number 2
April 2015
Authors: G. G. Rajput, Suryakant B. Ummapure, Preethi N. Patil

G. G. Rajput, Suryakant B. Ummapure, Preethi N. Patil. Text-Line Extraction from Handwritten Document Images using Histogram and Connected Component Analysis. National conference on Digital Image and Signal Processing. DISP2015, 2 (April 2015), 11-17.

@article{
author = { G. G. Rajput and Suryakant B. Ummapure and Preethi N. Patil },
title = { Text-Line Extraction from Handwritten Document Images using Histogram and Connected Component Analysis },
journal = { National conference on Digital Image and Signal Processing },
issue_date = { April 2015 },
volume = { DISP2015 },
number = { 2 },
month = { April },
year = { 2015 },
issn = { 0975-8887 },
pages = { 11-17 },
numpages = 7,
url = { /proceedings/disp2015/number2/20484-3014/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National conference on Digital Image and Signal Processing
%A G. G. Rajput
%A Suryakant B. Ummapure
%A Preethi N. Patil
%T Text-Line Extraction from Handwritten Document Images using Histogram and Connected Component Analysis
%J National conference on Digital Image and Signal Processing
%@ 0975-8887
%V DISP2015
%N 2
%P 11-17
%D 2015
%I International Journal of Computer Applications
Abstract

Text-line segmentation is an essential step in script identification from handwritten and printed document images. In handwritten documents, overlapping, touching, and skewed lines, together with small perforations between the lines, make line extraction a difficult task. Such variations lead to errors and incorrect script identification. This paper describes an efficient technique for extracting lines from handwritten document images using histogram and connected component analysis. From the horizontal histogram profile, a threshold, i.e., the average height of a line in the given document, is computed and used to extract the non-overlapping lines. To extract overlapping lines, which exceed this threshold, a rectangular bounding box is imposed over the words of the overlapping lines using connected component analysis. The midpoint of each bounding box is then calculated and compared with the average height of the image to label each component as belonging to either the upper line or the lower line. Experiments are carried out on document images in Kannada, Telugu, Hindi, English, and Malayalam scripts, and the results obtained are encouraging.
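The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is taken to be already binarized (1 = ink, 0 = background) and stored as a list of rows, components are 8-connected, and the function names (`horizontal_profile`, `row_bands`, `component_boxes`, `extract_lines`) are hypothetical. The abstract does not specify exactly what the bounding-box midpoint is compared against, so the sketch assumes the vertical centre of the over-tall band.

```python
from collections import deque


def horizontal_profile(img):
    """Count ink pixels (value 1) in every row of a binary image."""
    return [sum(row) for row in img]


def row_bands(profile):
    """Return (top, bottom) row ranges, bottom exclusive, of non-empty runs."""
    bands, start = [], None
    for y, count in enumerate(profile):
        if count and start is None:
            start = y
        elif not count and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def component_boxes(img, top, bottom):
    """Inclusive bounding boxes (y0, x0, y1, x1) of the 8-connected
    components found in rows top..bottom-1, in scan order."""
    width = len(img[0])
    seen, boxes = set(), []
    for y in range(top, bottom):
        for x in range(width):
            if not img[y][x] or (y, x) in seen:
                continue
            queue = deque([(y, x)])
            seen.add((y, x))
            y0 = y1 = y
            x0 = x1 = x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                y0, y1 = min(y0, cy), max(y1, cy)
                x0, x1 = min(x0, cx), max(x1, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (top <= ny < bottom and 0 <= nx < width
                                and img[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            boxes.append((y0, x0, y1, x1))
    return boxes


def extract_lines(img):
    """Return one list of component boxes per detected text line."""
    bands = row_bands(horizontal_profile(img))
    if not bands:
        return []
    # Threshold: average band height, as described in the abstract.
    threshold = sum(bottom - top for top, bottom in bands) / len(bands)
    lines = []
    for top, bottom in bands:
        boxes = component_boxes(img, top, bottom)
        if bottom - top <= threshold:
            lines.append(boxes)  # ordinary, non-overlapping line
            continue
        # Assumption: split an over-tall band at its vertical centre.
        cut = (top + bottom - 1) / 2
        upper = [b for b in boxes if (b[0] + b[2]) / 2 < cut]
        lower = [b for b in boxes if (b[0] + b[2]) / 2 >= cut]
        lines.extend(group for group in (upper, lower) if group)
    return lines
```

Using the plain average as the threshold follows the abstract literally. On real pages with uneven line heights, a tolerance factor on the threshold would probably be needed, and touching components that span both lines would require an additional splitting step that this sketch does not attempt.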

Index Terms

Computer Science
Information Sciences

Keywords

Handwritten Document, Text-line Segmentation, Histogram, Connected Component