International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 84 - Number 3
Year of Publication: 2013
Authors: Deepika Ghai, Neelu Jain
10.5120/14559-2661
Deepika Ghai, Neelu Jain. Text Extraction from Document Images- A Review. International Journal of Computer Applications. 84, 3 (December 2013), 40-48. DOI=10.5120/14559-2661
Text extraction from images is a challenging task in computer vision, and it plays an important role in providing useful information. This paper reviews various approaches, such as Adaptive Local Connectivity Map (ALCM), Expectation Maximization (EM), Maximum Likelihood (ML), Markov Random Field (MRF), Spiral Run Length Smearing Algorithm (SRLSA), and the Curvelet transform, for extracting text from scanned book covers, journals, multi-color documents, handwritten documents, ancient documents, and newspaper document images. Text line segmentation is a major component of document image analysis. Text in documents varies with many factors, such as language, style, font, size, color, background, orientation, fluctuating text lines, and crossing or touching text lines. This paper compares the performance of several existing document text extraction methods proposed by researchers on the basis of recall rate, precision rate, processing time, accuracy, etc.
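The recall and precision rates named above as comparison criteria can be illustrated with a minimal sketch. It uses the standard definitions (precision = correct detections / all detections, recall = correct detections / all ground-truth items), which the abstract itself does not spell out. The function name `precision_recall` and the counts in the usage line are hypothetical and do not come from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from detection counts.

    Assumed standard definitions, not taken from the reviewed paper:
      precision = TP / (TP + FP)  -> share of detected text regions that are correct
      recall    = TP / (TP + FN)  -> share of actual text regions that were detected
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against division by zero when nothing was detected or nothing exists.
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 80 text lines found correctly, 20 false alarms, 10 missed.
p, r = precision_recall(true_positives=80, false_positives=20, false_negatives=10)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.800 recall=0.889
```

A method can score high on one rate and low on the other (for example, by detecting very few regions but getting them all right), which is presumably why both are reported side by side.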