International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 33 - Number 1
Year of Publication: 2011
Authors: Brij Mohan Singh, Ankush Mittal, Vivek Chand, Debashish Ghosh
10.5120/3988-5640
Brij Mohan Singh, Ankush Mittal, Vivek Chand, Debashish Ghosh. Text Line Extraction from Complex Layout Documents. International Journal of Computer Applications. 33, 1 (November 2011), 36-43. DOI=10.5120/3988-5640
Many stylized documents depart from traditional text layouts: their printed text regions are not parallel to one another. Such complex layouts make text line extraction challenging because paragraphs appear in multiple orientations. This paper introduces a system for text line extraction from complex layout documents. The proposed method is based on dilation and histogram profiling. Text regions are first extracted using a dilation and flood fill based approach; the orientation of each paragraph is then determined, and individual text lines are extracted. The accuracy of the extracted text lines is evaluated using a newly proposed measure that is also based on histogram profiling. The results of the proposed approach on complex layouts are promising.
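The pipeline named in the abstract (dilation, flood fill, histogram profiling) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `dilate`, `flood_fill_regions` and `split_lines`, the rectangular structuring element, the 4-connectivity, and the toy `page` are all assumptions made for illustration. The sketch also assumes each region is already horizontal, so the paragraph orientation step is omitted.

```python
from collections import deque

def dilate(img, ry=1, rx=1):
    """Binary dilation with a (2*ry+1) x (2*rx+1) rectangle (assumed shape)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for yy in range(max(0, y - ry), min(h, y + ry + 1)):
                    for xx in range(max(0, x - rx), min(w, x + rx + 1)):
                        out[yy][xx] = 1
    return out

def flood_fill_regions(img):
    """Bounding boxes (top, left, bottom, right) of 4-connected ink regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill from the seed pixel
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def split_lines(img, box):
    """Split one region into (first_row, last_row) text lines using the
    horizontal projection profile (ink count per row) of the original image."""
    top, left, bottom, right = box
    profile = [sum(img[y][left:right + 1]) for y in range(top, bottom + 1)]
    lines, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i                      # a run of inked rows begins
        elif not count and start is not None:
            lines.append((top + start, top + i - 1))
            start = None                   # an empty row ends the line
    if start is not None:
        lines.append((top + start, bottom))
    return lines

# Toy page: two separate text blocks, each holding two one-pixel-high lines.
page = [
    [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
]

regions = flood_fill_regions(dilate(page))   # dilation merges lines into blocks
for box in regions:
    print(box, split_lines(page, box))
```

Dilation is applied only to find the regions; the projection profile is computed on the original pixels, since the dilated image would have closed the gaps between lines. For rotated paragraphs, each region would first need to be deskewed by its estimated orientation, which this sketch does not attempt.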