Research Article

Nepali Speech Recognition using RNN-CTC Model

by Paribesh Regmi, Arjun Dahal, Basanta Joshi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 178 - Number 31
Year of Publication: 2019
Authors: Paribesh Regmi, Arjun Dahal, Basanta Joshi
DOI: 10.5120/ijca2019918401

Paribesh Regmi, Arjun Dahal, Basanta Joshi. Nepali Speech Recognition using RNN-CTC Model. International Journal of Computer Applications. 178, 31 (Jul 2019), 1-6. DOI=10.5120/ijca2019918401

@article{ 10.5120/ijca2019918401,
author = { Paribesh Regmi and Arjun Dahal and Basanta Joshi },
title = { Nepali Speech Recognition using RNN-CTC Model },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2019 },
volume = { 178 },
number = { 31 },
month = { Jul },
year = { 2019 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume178/number31/30732-2019918401/ },
doi = { 10.5120/ijca2019918401 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:51:54.188862+05:30
%A Paribesh Regmi
%A Arjun Dahal
%A Basanta Joshi
%T Nepali Speech Recognition using RNN-CTC Model
%J International Journal of Computer Applications
%@ 0975-8887
%V 178
%N 31
%P 1-6
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents a neural-network-based Nepali speech recognition model. A Recurrent Neural Network (RNN) is used to process the sequential audio data. The Connectionist Temporal Classification (CTC) [1] technique is applied so that the RNN can be trained on audio data. CTC is a probabilistic approach that maximizes the probability of the desired label sequence given the RNN output. After the audio passes through the RNN and CTC layers, Nepali text is obtained as output. This paper also defines a character set of 67 Nepali characters required for transcribing Nepali speech to text.
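The CTC idea summarized above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the symbol indices, the choice of index 0 for the blank, and all function names are assumptions made for illustration, and a toy three-symbol alphabet stands in for the 67-character Nepali set. The sketch computes the probability that per-frame softmax outputs produce a given label sequence (the quantity CTC training maximizes) using the standard forward recursion, and shows greedy best-path decoding.

```python
from itertools import product

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def collapse(path, blank=BLANK):
    """Map a frame-level path to labels: merge repeats, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out


def ctc_label_probability(probs, labels, blank=BLANK):
    """Probability of `labels` given per-frame distributions `probs` (T x K)."""
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of path prefixes ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def brute_force_probability(probs, labels, blank=BLANK):
    """Sanity check: sum the probability of every path that collapses to `labels`."""
    K, total = len(probs[0]), 0.0
    for path in product(range(K), repeat=len(probs)):
        if collapse(path, blank) == list(labels):
            p = 1.0
            for t, s in enumerate(path):
                p *= probs[t][s]
            total += p
    return total


def greedy_decode(probs, blank=BLANK):
    """Best-path decoding: take the arg-max symbol per frame, then collapse."""
    path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    return collapse(path, blank)


# Toy softmax outputs for 3 frames over the symbols {blank, 1, 2}.
toy_probs = [[0.1, 0.6, 0.3], [0.7, 0.2, 0.1], [0.2, 0.1, 0.7]]
decoded = greedy_decode(toy_probs)  # frame-wise arg-max path is [1, 0, 2]
```

In training, the negative logarithm of `ctc_label_probability` would serve as the loss; practical implementations work in log space to avoid underflow on long utterances, which this sketch omits for readability.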

References
  1. A. Graves, S. Fernandez, F. Gomez and J. Schmidhuber. 2006. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. In ICML '06: Proc. of the International Conference on Machine Learning, Pittsburgh, Pennsylvania, USA.
  2. G. E. Hinton, S. Osindero and Y.-W. Teh. 2006. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18, pp. 1527-1554.
  3. H. A. Bourlard and N. Morgan. 1993. Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, Norwell, MA, USA.
  4. G. E. Dahl, D. Yu, L. Deng and A. Acero. 2012. Context-Dependent Pre-Trained Deep Neural Networks for Large Vocabulary Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20, pp. 30-42.
  5. A. Graves and N. Jaitly. 2014. Towards End-to-End Speech Recognition with Recurrent Neural Networks. In ICML '14: Proc. of the International Conference on Machine Learning, Beijing, China.
  6. A. Kalakheti, K. P. Bhattarari, S. Kuwar and S. Adhikari. Automatic Speech Recognition for Nepali Language. Tribhuvan University, Nepal.
  7. B. Joshi, A. Gajurel, A. Pokhrel and M. K. Sharma. 2017. HMM Based Isolated Word Nepali Speech Recognition. In International Conference on Machine Learning and Cybernetics, Ningbo, China.
  8. S. Hochreiter and J. Schmidhuber. 1997. Long Short-Term Memory. Neural Computation, 9(8), pp. 1735-1780.
  9. S. Hochreiter. 1998. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6, pp. 107-116.
  10. M. Schuster and K. K. Paliwal. 1997. Bidirectional Recurrent Neural Networks. IEEE Transactions on Signal Processing, 45.
  11. A. Graves, S. Fernandez and J. Schmidhuber. 2005. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition. In Proceedings of the 2005 International Conference on Artificial Neural Networks, Warsaw, Poland.
  12. S. Magre, P. Janse and R. Deshmukh. 2014. A Review on Feature Extraction and Noise Reduction Technique. International Journal of Advanced Research in Computer Science and Software Engineering.
  13. The Python Tutorial, https://docs.python.org/3/tutorial/index.html
  14. TensorFlow, https://www.tensorflow.org
Index Terms

Computer Science
Information Sciences

Keywords

Artificial Intelligence, Machine Learning, Automatic Speech Recognition, Recurrent Neural Network, Connectionist Temporal Classification, Softmax, Hidden Markov Model, Nepali Speech Recognition, Long Short-Term Memory (LSTM), Backpropagation, Character Error Rate