International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 178 - Number 31
Year of Publication: 2019
Authors: Paribesh Regmi, Arjun Dahal, Basanta Joshi |
DOI: 10.5120/ijca2019918401
Paribesh Regmi, Arjun Dahal, Basanta Joshi. Nepali Speech Recognition using RNN-CTC Model. International Journal of Computer Applications. 178, 31 (Jul 2019), 1-6. DOI=10.5120/ijca2019918401
This paper presents a neural-network-based Nepali speech recognition model. A Recurrent Neural Network (RNN) processes the sequential audio data, and the Connectionist Temporal Classification (CTC) [1] technique is applied so that the RNN can be trained on audio data. CTC is a probabilistic approach that maximizes the probability of the desired label sequence given the RNN output. After the audio passes through the RNN and CTC layers, Nepali text is obtained as output. The paper also defines a set of 67 Nepali characters required for transcribing Nepali speech to text.
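To make the probabilistic idea behind CTC concrete, the following is a minimal sketch in Python. It is not the authors' implementation. It assumes the RNN emits, for each time step, a probability distribution over the characters plus one extra "blank" symbol, and it uses the standard CTC forward recursion to sum the probabilities of all frame-level paths that collapse to a given label sequence. The function names, the blank index of 0, and the tiny alphabet in the usage note are illustrative assumptions.

```python
BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_collapse(path):
    """Map a frame-level path to a label sequence:
    merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return out


def ctc_label_probability(probs, label):
    """Probability that the per-frame distributions `probs`
    (a list of T rows, each a list of symbol probabilities)
    produce `label` after CTC collapsing.

    Uses the CTC forward recursion over the label sequence
    extended with blanks: [blank, l1, blank, l2, ..., blank].
    """
    ext = [BLANK]
    for sym in label:
        ext.extend([sym, BLANK])
    S = len(ext)

    # Initialisation: a path may start with a blank or the first label.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                 # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]        # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank.
    result = alpha[S - 1]
    if S > 1:
        result += alpha[S - 2]
    return result
```

As a usage example with a hypothetical two-symbol alphabet (blank and one character), `ctc_label_probability([[0.5, 0.5], [0.5, 0.5]], [1])` sums the three two-frame paths (character-blank, blank-character, character-character) that collapse to the single character. Training with CTC amounts to maximizing this quantity, usually by minimizing its negative logarithm; practical implementations work in log space to avoid numerical underflow on long utterances.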