Text Extraction from PDF document

Research Article

Published in January 2013 by D. Sasirekha and E. Chandra

Amrita International Conference of Women in Computing - 2013

Foundation of Computer Science USA

AICWIC - Number 3

January 2013

Authors: D. Sasirekha, E. Chandra

D. Sasirekha, E. Chandra. Text Extraction from PDF document. Amrita International Conference of Women in Computing - 2013. AICWIC, 3 (January 2013), 17-19.

@article{
author = { D. Sasirekha and E. Chandra },
title = { Text Extraction from PDF document },
journal = { Amrita International Conference of Women in Computing - 2013 },
issue_date = { January 2013 },
volume = { AICWIC },
number = { 3 },
month = { January },
year = { 2013 },
issn = { 0975-8887 },
pages = { 17-19 },
numpages = { 3 },
url = { /proceedings/aicwic/number3/9876-1318/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}

%0 Proceeding Article
%1 Amrita International Conference of Women in Computing - 2013
%A D. Sasirekha
%A E. Chandra
%T Text Extraction from PDF document
%J Amrita International Conference of Women in Computing - 2013
%@ 0975-8887
%V AICWIC
%N 3
%P 17-19
%D 2013
%I International Journal of Computer Applications

Abstract

PDF is now widely regarded as a universal document format. A PDF-to-speech conversion system involves many steps, and extracting the text from the PDF is the first of them, since all further processing depends on it. This paper begins with a brief discussion of the steps involved in extracting text from PDF documents. Its aim is to introduce some basic concepts of PDF and of text extraction, which will be useful to readers who are less familiar with this area of research.
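As a rough illustration of what text extraction from a PDF involves at the lowest level, the sketch below pulls the string operands of the `Tj` and `TJ` text-showing operators out of the `BT` ... `ET` text objects of a page content stream. It is not the method described in the paper, only a minimal sketch under strong assumptions: the content stream is already decompressed, strings are simple literal strings whose bytes map directly to characters, and each text object is treated as one line. The names `extract_text`, `TEXT_OP`, and `SAMPLE_STREAM` are hypothetical and introduced here only for the example.

```python
import re

# "(string) Tj" shows a single string; "[(a) -15 (b)] TJ" shows an array of
# strings interleaved with kerning adjustments. Escaped characters such as
# "\(" are allowed inside a string. A literal "]" inside an array string is
# not handled by this simplified pattern.
TEXT_OP = re.compile(
    r"\((?P<single>(?:\\.|[^\\()])*)\)\s*Tj"
    r"|\[(?P<array>(?:\\.|[^\\\]])*)\]\s*TJ"
)
STRING = re.compile(r"\(((?:\\.|[^\\()])*)\)")


def unescape(raw: str) -> str:
    """Undo the backslash escapes for parentheses and backslashes."""
    return re.sub(r"\\([()\\])", r"\1", raw)


def extract_text(content_stream: str) -> str:
    """Return the text shown inside each BT ... ET text object, one per line."""
    lines = []
    for block in re.findall(r"\bBT\b(.*?)\bET\b", content_stream, flags=re.S):
        pieces = []
        for match in TEXT_OP.finditer(block):
            if match.group("single") is not None:
                pieces.append(unescape(match.group("single")))
            else:
                # Keep the strings of the array and drop the kerning numbers.
                pieces.extend(unescape(s) for s in STRING.findall(match.group("array")))
        lines.append("".join(pieces))
    return "\n".join(lines)


# A hand-written, uncompressed content stream used only for illustration.
SAMPLE_STREAM = r"""
BT /F1 12 Tf 72 720 Td (Text extraction) Tj ET
BT /F1 12 Tf 72 700 Td [(from a ) -15 (PDF \(sketch\))] TJ ET
"""

if __name__ == "__main__":
    print(extract_text(SAMPLE_STREAM))
```

Real PDF files usually store content streams compressed and encode characters through font-specific mappings, so a practical extractor would need to decode both before a step like this one could recover readable text.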

References

http://desktoppub.about.com/od/electronicpublishing/g/pdf.htm
http://www.digitalpreservation.gov/formats/fdd/fdd000030.shtml
http://www.techterms.com/definition/pdf
http://www.webopedia.com/TERM/P/PDF.html
Lin, X., Gao, L., Tang, Z., Lin, X., & Hu, X. 2011. Mathematical formula identification in PDF documents. In Document Analysis and Recognition (ICDAR), 2011 International Conference on (pp. 1419-1423).
Ajedig, M. A., Li, F., & ur Rehman, A. 2011. A PDF Text Extractor Based on PDF-Renderer. In Proceedings of the International MultiConference of Engineers and Computer Scientists (Vol. 1).
Gupta, G., Niranjan, S., Shrivastava, A., & Sinha, R. 2006. Document Layout Analysis and Classification and Its Application in OCR. In Enterprise Distributed Object Computing Conference Workshops, 2006. EDOCW'06. 10th IEEE International (pp. 58-58).
William S. Lovegrove and David F. Brailsford. 1995. Document analysis of PDF files: methods, results and implications. Electronic Publishing, vol. 8 (2&3), 20-220.
S. Audithan and R. M. Chandrasekaran. 2009. Document text extraction from document images using Haar Discrete Wavelet Transform. EJSR.
Claudie Faure and Nicole Vincent. 2009. Simultaneous detection of vertical and horizontal text lines based on perceptual organization. Proc. SPIE 7247, Document Recognition and Retrieval XVI, 72470M. doi:10.1117/12.805504.
K. S. Sesh Kumar, Anoop M. Namboodiri, and C. V. Jawahar. 2006. Learning segmentation of documents with complex scripts. ICVGIP'06, Proceedings of the 5th Indian Conference on Computer Vision, Graphics and Image Processing, pp. 749-760.
Song Mao, Azriel Rosenfeld, and Tapas Kanungo. 2003. Document structure analysis algorithms: A literature survey. Vol. 5010 of SPIE Proceedings, SPIE, pp. 197-207.
Tamir Hassan. 2009. Object-Level Document Analysis of PDF Files. DocEng'09, September 16-18, 2009, Munich, Germany.

Index Terms

Computer Science

Information Sciences

Keywords

Text Extraction, PDF, Text Extraction Technique