Research Article

Automatic Speech Recognition of Urdu Digits with Optimal Classification Approach

by Hazrat Ali, An Jianwei, Khalid Iqbal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 118 - Number 9
Year of Publication: 2015
Authors: Hazrat Ali, An Jianwei, Khalid Iqbal
10.5120/20770-3275

Hazrat Ali, An Jianwei, Khalid Iqbal. Automatic Speech Recognition of Urdu Digits with Optimal Classification Approach. International Journal of Computer Applications. 118, 9 (May 2015), 1-5. DOI=10.5120/20770-3275

@article{ 10.5120/20770-3275,
author = { Hazrat Ali and An Jianwei and Khalid Iqbal },
title = { Automatic Speech Recognition of Urdu Digits with Optimal Classification Approach },
journal = { International Journal of Computer Applications },
issue_date = { May 2015 },
volume = { 118 },
number = { 9 },
month = { May },
year = { 2015 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume118/number9/20770-3275/ },
doi = { 10.5120/20770-3275 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:01:11.909072+05:30
%A Hazrat Ali
%A An Jianwei
%A Khalid Iqbal
%T Automatic Speech Recognition of Urdu Digits with Optimal Classification Approach
%J International Journal of Computer Applications
%@ 0975-8887
%V 118
%N 9
%P 1-5
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Speech recognition for the Urdu language is an interesting but underdeveloped task, primarily because linguistic resources such as a rich corpus are not available for Urdu. A few attempts have nevertheless been made to develop Urdu speech recognition frameworks using traditional approaches such as Hidden Markov Models and Neural Networks. In this work, we investigate three classification methods for the Urdu speech recognition task. We extract the Mel-Frequency Cepstral Coefficients (MFCCs) together with their delta and delta-delta features from the speech data and train the classifiers on them. We report the performance of a Support Vector Machine (SVM) classifier and, for comparison, a random forest (RF) classifier and a linear discriminant analysis (LDA) classifier. The experimental results show that SVM performs better than the RF and LDA classifiers on this task.
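
The abstract does not say how the delta and delta-delta features are computed. As a rough illustration only, the sketch below uses the common regression formula over a window of two frames on each side, which is an assumption rather than the authors' stated method. The function names, the 13-coefficient frame size, and the random placeholder input are likewise hypothetical; real MFCCs would come from a feature extractor applied to the speech recordings.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along the time axis of a (frames, coeffs) array.

    Assumed formula (not taken from the paper):
        d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    """
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has `width` neighbours.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_with_deltas(mfcc: np.ndarray) -> np.ndarray:
    """Concatenate static, delta, and delta-delta features per frame."""
    d1 = delta(mfcc)
    d2 = delta(d1)  # delta-delta is the delta of the delta features
    return np.hstack([mfcc, d1, d2])


# Placeholder input: 50 frames x 13 coefficients of random numbers,
# standing in for MFCCs extracted from a real utterance.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((50, 13))
features = stack_with_deltas(mfcc)
print(features.shape)  # (50, 39)
```

Each frame ends up with three times as many values as the static MFCC vector, and it is vectors of this kind that a classifier such as SVM, RF, or LDA would be trained on.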

Index Terms

Computer Science
Information Sciences

Keywords

Linear Discriminant Analysis, Mel-Frequency Cepstral Coefficients, Random Forest, Support Vector Machines, Urdu