International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 106 - Number 12
Year of Publication: 2014
Authors: Chandrahas Gaikwad, Satish Akolkar, Reshma Khodade, Deepali Dalal, Swarupa Kamble
DOI: 10.5120/18572-9405
Chandrahas Gaikwad, Satish Akolkar, Reshma Khodade, Deepali Dalal, Swarupa Kamble. Generic PDF To Text Conversion using Machine Learning. International Journal of Computer Applications. 106, 12 (November 2014), 17-21. DOI=10.5120/18572-9405
The world is moving toward a paperless era. Stockpiling logs, charters, records and other documents on paper has become tedious, and storing them as 'soft copy' is more convenient and reliable, since it makes searching and sorting easy. Such documents are generally stored as PDF (Portable Document Format) so that they are easily viewable and do not change unexpectedly across software platforms. However, editing documents written in local scripts then becomes inconvenient, because conventional PDF-to-text conversion software cannot handle scripts it does not already support. This research paper presents a generic, script-independent way of making PDF documents editable using machine learning. Individual characters are first sliced out of the PDF, and a set of classifiers is applied to identify each character. A Decision Model, implemented as part of the machine learning component, organizes the classifier functions, and the resulting classifier set determines the character. This approach removes the restriction to international scripts and facilitates the use of regional scripts in the technological world.
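The pipeline outlined in the abstract (slice characters, apply a set of classifiers, let a decision step resolve the result) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the blank-column segmentation, the weighted vote standing in for the Decision Model, and every name in it (`slice_characters`, `decide`, the toy classifiers and weights) are assumptions made purely for illustration.

```python
from collections import defaultdict

def slice_characters(bitmap):
    """Split a binary text-line image (rows of 0/1) into glyphs at blank columns.

    Assumption: characters are separated by at least one ink-free column.
    """
    width = len(bitmap[0]) if bitmap else 0
    spans, start = [], None
    for x in range(width):
        has_ink = any(row[x] for row in bitmap)
        if has_ink and start is None:
            start = x                      # a glyph begins here
        elif not has_ink and start is not None:
            spans.append((start, x))       # a glyph ended at the previous column
            start = None
    if start is not None:
        spans.append((start, width))       # glyph touching the right edge
    return [[row[a:b] for row in bitmap] for a, b in spans]

def decide(glyph, classifiers, weights):
    """Hypothetical decision step: weighted vote over the classifier outputs."""
    scores = defaultdict(float)
    for classify, weight in zip(classifiers, weights):
        scores[classify(glyph)] += weight
    return max(scores, key=scores.get)

# Toy classifiers (illustrative only): each maps a glyph to a label.
by_width = lambda g: "I" if len(g[0]) == 1 else "H"
by_ink = lambda g: "I" if sum(map(sum, g)) <= 2 else "H"
always_h = lambda g: "H"

line = [
    [1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0],
]
glyphs = slice_characters(line)
text = "".join(decide(g, [by_width, by_ink, always_h], [1.0, 1.0, 0.5]) for g in glyphs)
print(text)
```

In a real system the toy classifiers would be replaced by trained models and the weights learned from labeled samples of the target script; the sketch only shows how the segmentation and decision stages fit together.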