Research Article

Urdu Character Recognition using Principal Component Analysis

by Khalil Khan, Rehan Ullah, Nasir Ahmad Khan, Khwaja Naveed
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 60 - Number 11
Year of Publication: 2012
10.5120/9733-2082

Khalil Khan, Rehan Ullah, Nasir Ahmad Khan, Khwaja Naveed. Urdu Character Recognition using Principal Component Analysis. International Journal of Computer Applications. 60, 11 (December 2012), 1-4. DOI=10.5120/9733-2082

@article{ 10.5120/9733-2082,
author = { Khalil Khan and Rehan Ullah and Nasir Ahmad Khan and Khwaja Naveed },
title = { Urdu Character Recognition using Principal Component Analysis },
journal = { International Journal of Computer Applications },
issue_date = { December 2012 },
volume = { 60 },
number = { 11 },
month = { December },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume60/number11/9733-2082/ },
doi = { 10.5120/9733-2082 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:06:16.301163+05:30
%A Khalil Khan
%A Rehan Ullah
%A Nasir Ahmad Khan
%A Khwaja Naveed
%T Urdu Character Recognition using Principal Component Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 60
%N 11
%P 1-4
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper proposes a method for searching Urdu text in image-based Urdu documents. The method uses two image databases: 'TrainDatabase' for training and 'TestDatabase' for testing. The training database contains all the characters of the Urdu language in their different shapes. Eigenvalues and eigenvectors are calculated from all the images placed in 'TrainDatabase', and only the eigenvectors with the highest eigenvalues are kept. The algorithm then computes a feature vector for each image in 'TrainDatabase'. A threshold is chosen that defines the maximum allowable distance between 'TrainDatabase' and 'TestDatabase' images. A feature vector is also computed for each image to be identified, which is placed in 'TestDatabase'. The character to be identified is compared with every image in 'TrainDatabase', and if it matches any of them, the algorithm reports the result. MATLAB was used as the simulation tool, and the recognition rate obtained was 96.2% for isolated characters.
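
The pipeline described in the abstract can be illustrated with a minimal Python/NumPy sketch. This is not the authors' MATLAB code. It assumes flattened grayscale images of equal size, Euclidean distance between feature vectors, and the small Gram-matrix shortcut familiar from eigenface-style PCA. The function names `train_pca` and `recognize`, and the parameters `num_components` and `threshold`, are hypothetical names chosen for illustration.

```python
import numpy as np


def train_pca(train_images, num_components):
    """Build a PCA basis and feature vectors from training images.

    train_images: array of shape (n_samples, n_pixels), one flattened
    character image per row (assumed layout, not taken from the paper).
    """
    X = np.asarray(train_images, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean

    # Eigen-decompose the small (n_samples x n_samples) Gram matrix
    # instead of the (n_pixels x n_pixels) covariance matrix.
    gram = centered @ centered.T
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order

    # Keep only the directions with the highest eigenvalues,
    # skipping numerically zero ones.
    order = np.argsort(eigvals)[::-1][:num_components]
    order = order[eigvals[order] > 1e-10]

    # Map the Gram-matrix eigenvectors back to pixel space and normalize.
    basis = centered.T @ eigvecs[:, order]
    basis /= np.linalg.norm(basis, axis=0)

    # Feature vector of every training image in the reduced space.
    features = centered @ basis
    return mean, basis, features


def recognize(test_image, mean, basis, features, labels, threshold):
    """Return the label of the nearest training image, or None when the
    nearest distance exceeds the threshold (no match)."""
    feature = (np.asarray(test_image, dtype=float) - mean) @ basis
    distances = np.linalg.norm(features - feature, axis=1)
    best = int(np.argmin(distances))
    return labels[best] if distances[best] <= threshold else None
```

In this sketch the threshold acts as the rejection rule: a test image whose nearest training feature vector is farther away than `threshold` is reported as unmatched rather than forced onto the closest character. Suitable values for `num_components` and `threshold` depend on the data and are not given in the abstract.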

Index Terms

Computer Science
Information Sciences

Keywords

Optical Character Recognition, Principal Component Analysis, Training Database, Testing Database