International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 59 - Number 20
Year of Publication: 2012
Authors: D. Sasirekha, E. Chandra
DOI: 10.5120/9819-4417
D. Sasirekha, E. Chandra. Text Recognition from PDF Files using BPNN and SVM. International Journal of Computer Applications. 59, 20 (December 2012), 18-22. DOI=10.5120/9819-4417
Optical character recognition (OCR) is the electronic conversion of scanned images of handwritten, typewritten, or printed text into machine-encoded text, and OCR systems are receiving increasing attention. PDF files contain text, images, and graphs. The Mixed Raster Content (MRC) technique separates the text and non-text regions of a PDF file so that only the text part is extracted. An artificial neural network (ANN) is a standard pattern classifier applicable to a wide range of problems; here it is trained with the backpropagation learning algorithm, which is well suited to image processing. A support vector machine (SVM) is a classifier that performs classification by finding an optimal solution. This research therefore applies a backpropagation neural network (BPNN) and an SVM to OCR on the extracted text, using extracted features. PDF files in 100 different formats were tested, and the recognition performance of the two techniques is tabulated and compared.
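
To make the two classifiers concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It trains a small backpropagation network and a linear SVM (hinge loss with subgradient descent, swapped in here for whatever SVM variant the paper uses) on toy 3x3 glyph bitmaps. The glyph data, the class and function names (TinyBPNN, train_linear_svm, svm_predict), and all hyperparameters are assumptions chosen for illustration; the paper's MRC segmentation and feature extraction are not reproduced.

```python
import math
import random

# Hypothetical toy data: 3x3 glyphs flattened to 9 pixel features (not from the paper).
I_CLEAN = [0, 1, 0, 0, 1, 0, 0, 1, 0]  # vertical bar
I_NOISY = [1, 1, 0, 0, 1, 0, 0, 1, 0]  # vertical bar with one stray pixel
H_CLEAN = [0, 0, 0, 1, 1, 1, 0, 0, 0]  # horizontal bar
H_NOISY = [0, 0, 0, 1, 1, 1, 0, 0, 1]  # horizontal bar with one stray pixel
X = [I_CLEAN, I_NOISY, H_CLEAN, H_NOISY]
Y = [1, 1, 0, 0]  # 1 = vertical bar, 0 = horizontal bar


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNN:
    """One hidden layer, sigmoid units, trained by online backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, out

    def loss(self, xs, ys):
        # Mean squared error over the data set.
        return sum((self.forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def train(self, xs, ys, lr=0.5, epochs=2000):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                h, out = self.forward(x)
                # Output delta: gradient of 0.5 * (out - y)^2 through the sigmoid.
                d_out = (out - y) * out * (1 - out)
                # Hidden deltas use the output weights before they are updated.
                d_hid = [d_out * w * hi * (1 - hi) for w, hi in zip(self.w2, h)]
                self.w2 = [w - lr * d_out * hi for w, hi in zip(self.w2, h)]
                self.b2 -= lr * d_out
                for j, dj in enumerate(d_hid):
                    self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * dj


def train_linear_svm(xs, ys_pm, lr=0.1, lam=0.01, epochs=200):
    """Linear SVM via subgradient descent on the L2-regularised hinge loss.

    ys_pm must contain labels in {-1, +1}.
    """
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys_pm):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1 - lr * lam) for wi in w]  # shrinkage from the L2 term
            if margin < 1:  # sample is inside the margin or misclassified
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def svm_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


if __name__ == "__main__":
    net = TinyBPNN(n_in=9, n_hidden=3)
    net.train(X, Y)
    w, b = train_linear_svm(X, [1 if y == 1 else -1 for y in Y])
    for x, y in zip(X, Y):
        bpnn_label = 1 if net.forward(x)[1] >= 0.5 else 0
        print("true:", y, "BPNN:", bpnn_label, "SVM:", svm_predict(w, b, x))
```

On real OCR data the inputs would be feature vectors computed from segmented character images, and the comparison would be made on held-out samples with a multi-class setup; this sketch only shows the two training rules side by side on a two-class toy problem.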