Research Article

Text Recognition from PDF Files using BPNN and SVM

by D. Sasirekha, E. Chandra
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 59 - Number 20
Year of Publication: 2012
Authors: D. Sasirekha, E. Chandra

D. Sasirekha, E. Chandra. Text Recognition from PDF Files using BPNN and SVM. International Journal of Computer Applications. 59, 20 (December 2012), 18-22. DOI=10.5120/9819-4417

@article{ 10.5120/9819-4417,
author = { D. Sasirekha and E. Chandra },
title = { Text Recognition from PDF Files using BPNN and SVM },
journal = { International Journal of Computer Applications },
issue_date = { December 2012 },
volume = { 59 },
number = { 20 },
month = { December },
year = { 2012 },
issn = { 0975-8887 },
pages = { 18-22 },
numpages = {9},
url = { },
doi = { 10.5120/9819-4417 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:04:47.953195+05:30
%A D. Sasirekha
%A E. Chandra
%T Text Recognition from PDF Files using BPNN and SVM
%J International Journal of Computer Applications
%@ 0975-8887
%V 59
%N 20
%P 18-22
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Optical character recognition (OCR) is the electronic conversion of scanned images of handwritten, typewritten, or printed text into machine-encoded text, and OCR systems are receiving growing attention. PDF files consist of text, images, and graphs. The Mixed Raster Content (MRC) technique separates text regions from non-text regions in PDF files, and only the text part is extracted. Artificial neural networks (ANNs) are standard pattern classifiers applicable to a wide range of problems; here the network is trained with the backpropagation learning algorithm, which is well suited to image processing. A support vector machine (SVM) is a classifier that searches for an optimal solution to the classification problem. This research therefore applies the backpropagation neural network (BPNN) and SVM methods to OCR, using features of the extracted text. PDF files in 100 different formats were tested, and the recognition performance of the two techniques is tabulated and compared.
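
The abstract does not state the network architecture, the SVM kernel, or the features used, so the following is only a minimal sketch of the two classifier families it names, not the authors' implementation. It assumes hypothetical 3x3 glyph bitmaps as features, a one-hidden-layer network trained with plain backpropagation, and a linear SVM trained by sub-gradient descent on the regularized hinge loss. All names (`GLYPHS`, `train_bpnn`, `train_linear_svm`) and hyperparameters are illustrative assumptions.

```python
import math
import random

# Hypothetical 3x3 glyph bitmaps, flattened row by row (illustrative data only).
GLYPHS = {
    "I": [0, 1, 0,
          0, 1, 0,
          0, 1, 0],
    "-": [0, 0, 0,
          1, 1, 1,
          0, 0, 0],
}
LABELS = ["I", "-"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_bpnn(samples, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer network trained with backpropagation on squared error."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(LABELS)
    # Every weight row carries a trailing bias weight.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(epochs):
        for x, label in samples:
            target = [1.0 if label == name else 0.0 for name in LABELS]
            # Forward pass.
            xb = x + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
            hb = h + [1.0]
            o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w2]
            # Backward pass: deltas are error times the sigmoid derivative.
            d_o = [(o[k] - target[k]) * o[k] * (1 - o[k]) for k in range(n_out)]
            d_h = [h[j] * (1 - h[j]) * sum(d_o[k] * w2[k][j] for k in range(n_out))
                   for j in range(n_hidden)]
            # Gradient-descent weight updates.
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    w2[k][j] -= lr * d_o[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w1[j][i] -= lr * d_h[j] * xb[i]
    return w1, w2


def predict_bpnn(model, x):
    w1, w2 = model
    xb = x + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1] + [1.0]
    o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w2]
    return LABELS[o.index(max(o))]


def train_linear_svm(samples, lr=0.1, lam=0.01, epochs=50):
    """Two-class linear SVM: sub-gradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, label in samples:
            y = 1.0 if label == LABELS[0] else -1.0
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization shrinks the weights on every step.
            w = [(1.0 - lr * lam) * wi for wi in w]
            if margin < 1.0:
                # The hinge loss is active: move toward the violating sample.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict_svm(model, x):
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return LABELS[0] if score >= 0 else LABELS[1]


if __name__ == "__main__":
    samples = [(GLYPHS[name], name) for name in LABELS]
    bpnn = train_bpnn(samples)
    svm = train_linear_svm(samples)
    for name in LABELS:
        print(name, predict_bpnn(bpnn, GLYPHS[name]), predict_svm(svm, GLYPHS[name]))
```

A real comparison of the kind the abstract describes would use many labeled character samples segmented from the extracted text regions and would report recognition performance on held-out data rather than on the training glyphs.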

Index Terms

Computer Science
Information Sciences
