Nepali Speech Recognition using RNN-CTC Model

Paribesh Regmi; Arjun Dahal; Basanta Joshi

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 178 - Number 31

Year of Publication: 2019

Authors: Paribesh Regmi, Arjun Dahal, Basanta Joshi

10.5120/ijca2019918401

Paribesh Regmi, Arjun Dahal, Basanta Joshi. Nepali Speech Recognition using RNN-CTC Model. International Journal of Computer Applications. 178, 31 (Jul 2019), 1-6. DOI=10.5120/ijca2019918401

@article{ 10.5120/ijca2019918401,

author = { Paribesh Regmi and Arjun Dahal and Basanta Joshi },

title = { Nepali Speech Recognition using RNN-CTC Model },

journal = { International Journal of Computer Applications },

issue_date = { Jul 2019 },

volume = { 178 },

number = { 31 },

month = { Jul },

year = { 2019 },

issn = { 0975-8887 },

pages = { 1-6 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume178/number31/30732-2019918401/ },

doi = { 10.5120/ijca2019918401 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

Abstract

This paper presents a neural-network-based Nepali speech recognition model. A Recurrent Neural Network (RNN) is used to process the sequential audio data. The Connectionist Temporal Classification (CTC) technique [1] is applied so that the RNN can be trained on audio data. CTC is a probabilistic approach that maximizes the probability of the desired label sequence given the RNN output. After processing through the RNN and CTC layers, Nepali text is obtained as output. The paper also defines a set of 67 Nepali characters required for transcribing Nepali speech to text.
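The CTC idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it only assumes the standard CTC formulation from [1]. The function names `collapse` and `ctc_probability`, the choice of index 0 for the blank symbol, and the toy probabilities are all hypothetical. The sketch shows how a frame-level path is collapsed into a label sequence, and how the probability of a label sequence is summed over all paths that collapse to it. Training would then maximize this probability (equivalently, minimize its negative logarithm).

```python
BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def collapse(path):
    """Map a frame-level path to labels: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return out


def ctc_probability(probs, labels):
    """Probability of `labels` given per-frame softmax outputs.

    probs[t][k] is the probability of symbol k at frame t.
    Uses the CTC forward recursion over the blank-extended label sequence.
    """
    # Interleave blanks: [a, b] becomes [blank, a, blank, b, blank].
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    size = len(ext)

    # A path may start with a blank or with the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]                 # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]        # advance by one position
            # Skipping the blank is allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


# Toy example: 3 frames, 3 symbols (blank plus two hypothetical characters).
toy_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]
print(collapse([1, 1, 0, 1, 2, 2, 0]))   # [1, 1, 2]
print(ctc_probability(toy_probs, [1, 2]))
```

Because the recursion sums over every alignment, no frame-level segmentation of the audio is needed; in practice a library routine working in log space would be preferred for numerical stability.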

References

[1] A. Graves, S. Fernandez, F. Gomez and J. Schmidhuber. 2006. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. In ICML '06: Proc. of the Int. Conf. on Machine Learning, Pittsburgh, Pennsylvania, USA.
[2] G. E. Hinton, S. Osindero and Y.-W. Teh. 2006. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18, pp. 1527-1554.
[3] H. A. Bourlard and N. Morgan. 1993. Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, Norwell, MA, USA.
[4] G. E. Dahl, D. Yu, L. Deng and A. Acero. 2012. Context-Dependent Pre-Trained Deep Neural Networks for Large Vocabulary Speech Recognition. IEEE Transactions on Audio, Speech and Language Processing, 20, pp. 30-42.
[5] A. Graves and N. Jaitly. 2014. Towards End-to-End Speech Recognition with Recurrent Neural Networks. In ICML '14: Proc. of the Int. Conf. on Machine Learning, Beijing, China.
[6] A. Kalakheti, K. P. Bhattarari, S. Kuwar and S. Adhikari. Automatic Speech Recognition for Nepali Language. Tribhuvan University, Nepal.
[7] B. Joshi, A. Gajurel, A. Pokhrel and M. K. Sharma. 2017. HMM Based Isolated Word Nepali Speech Recognition. In Int. Conf. on Machine Learning and Cybernetics, Ningbo, China.
[8] S. Hochreiter and J. Schmidhuber. 1997. Long Short-Term Memory. Neural Computation, 9(8), pp. 1735-1780.
[9] S. Hochreiter. 1998. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6, pp. 107-116.
[10] M. Schuster and K. K. Paliwal. 1997. Bidirectional Recurrent Neural Networks. IEEE Transactions on Signal Processing, 45.
[11] A. Graves, S. Fernandez and J. Schmidhuber. 2005. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition. In Proc. of the 2005 Int. Conf. on Artificial Neural Networks, Warsaw, Poland.
[12] S. Magre, P. Janse and R. Deshmukh. 2014. A Review on Feature Extraction and Noise Reduction Technique. International Journal of Advanced Research in Computer Science and Software Engineering.
[13] The Python Tutorial, https://docs.python.org/3/tutorial/index.html
[14] Tensorflow, https://www.tensorflow.org

Index Terms

Computer Science

Information Sciences

Keywords

Artificial Intelligence, Machine Learning, Automatic Speech Recognition, Recurrent Neural Network, Connectionist Temporal Classification, Softmax, Hidden Markov Model, Nepali Speech Recognition, Long Short-Term Memory (LSTM), Backpropagation, Character Error Rate