Text Recognition from PDF Files using BPNN and SVM

D. Sasirekha; E. Chandra

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 59 - Number 20

Year of Publication: 2012

Authors: D. Sasirekha, E. Chandra

DOI: 10.5120/9819-4417

D. Sasirekha, E. Chandra. Text Recognition from PDF Files using BPNN and SVM. International Journal of Computer Applications. 59, 20 (December 2012), 18-22. DOI=10.5120/9819-4417

@article{ 10.5120/9819-4417,
  author = { D. Sasirekha and E. Chandra },
  title = { Text Recognition from PDF Files using BPNN and SVM },
  journal = { International Journal of Computer Applications },
  issue_date = { December 2012 },
  volume = { 59 },
  number = { 20 },
  month = { December },
  year = { 2012 },
  issn = { 0975-8887 },
  pages = { 18-22 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume59/number20/9819-4417/ },
  doi = { 10.5120/9819-4417 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:04:47.953195+05:30
%A D. Sasirekha
%A E. Chandra
%T Text Recognition from PDF Files using BPNN and SVM
%J International Journal of Computer Applications
%@ 0975-8887
%V 59
%N 20
%P 18-22
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Optical character recognition (OCR) is the electronic conversion of scanned images of handwritten, typewritten, or printed text into machine-encoded text, and OCR systems are receiving increasing attention. PDF files consist of text, images, and graphs. The Mixed Raster Content (MRC) technique separates the text and non-text regions of a PDF file, and only the text part is extracted. An artificial neural network (ANN) is a standard pattern classifier applicable to a wide range of problems; here it is trained with the backpropagation learning algorithm, which is well suited to image processing. A support vector machine (SVM) is a classifier that seeks an optimal solution to the classification problem. This research therefore applies both a backpropagation neural network (BPNN) and an SVM to OCR on the extracted text, using features computed from it. One hundred PDF files of different formats were tested, and the recognition performance of the two techniques is tabulated and compared.
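
The abstract does not state the network architecture, the SVM kernel, or the feature set, so the following is only an illustrative sketch of the two classifier families it compares, not the authors' method. It assumes a hypothetical toy dataset of 3x3 glyph bitmaps (two variants each of "I" and "L"), a one-hidden-layer sigmoid network trained by online backpropagation on squared error, and a linear SVM trained by subgradient descent on the regularized hinge loss. All names, hyperparameters, and data are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy data (not from the paper): 3x3 glyphs flattened row by row.
# Label 0 = "I", label 1 = "L".
SAMPLES = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], 0),  # plain I
    ([0, 1, 0, 0, 1, 0, 1, 1, 1], 0),  # I with a base serif
    ([1, 0, 0, 1, 0, 0, 1, 1, 1], 1),  # plain L
    ([1, 0, 0, 1, 0, 0, 1, 1, 0], 1),  # L with a short foot
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_bpnn(samples, hidden=3, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer sigmoid network trained by online backpropagation."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(dot(row, x) + b) for row, b in zip(w1, b1)]
        return h, sigmoid(dot(w2, h) + b2)

    for _ in range(epochs):
        for x, t in samples:
            h, o = forward(x)
            # Error terms for squared error, propagated output -> hidden.
            d_o = (o - t) * o * (1.0 - o)
            d_h = [d_o * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                w2[j] -= lr * d_o * h[j]
                b1[j] -= lr * d_h[j]
                for i in range(n):
                    w1[j][i] -= lr * d_h[j] * x[i]
            b2 -= lr * d_o

    return lambda x: 1 if forward(x)[1] > 0.5 else 0


def train_linear_svm(samples, lr=0.01, lam=0.01, epochs=1000):
    """Linear SVM: subgradient descent on L2-regularized hinge loss."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, label in samples:
            y = 1 if label == 1 else -1
            violated = y * (dot(w, x) + b) < 1  # inside the margin
            for i in range(n):
                grad = lam * w[i] - (y * x[i] if violated else 0.0)
                w[i] -= lr * grad
            if violated:
                b += lr * y  # bias is left unregularized
    return lambda x: 1 if dot(w, x) + b > 0 else 0


if __name__ == "__main__":
    bpnn = train_bpnn(SAMPLES)
    svm = train_linear_svm(SAMPLES)
    for x, t in SAMPLES:
        print("target", t, "bpnn", bpnn(x), "svm", svm(x))
```

On real scanned glyphs the inputs would be extracted feature vectors rather than raw 3x3 bitmaps, and a multi-class setup would be needed; the sketch only shows how the two training rules differ.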

Index Terms

Computer Science

Information Sciences

Keywords

MRC, OCR, ANN, BPNN, SVM