International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Number 28
Year of Publication: 2021
Authors: Nazik O’mar Balula, Mohsen Rashwan, Sherif Mahdi Abdo
DOI: 10.5120/ijca2021921152
Nazik O’mar Balula, Mohsen Rashwan, Sherif Mahdi Abdo. TDNN-LSTM-based Acoustic Modeling for verification of Qur’an Recitation for Non-Arabic Speakers using Kaldi Toolkit. International Journal of Computer Applications. 183, 28 (Sep 2021), 31-40. DOI=10.5120/ijca2021921152
Automatic Speech Recognition (ASR) has become an important component of Human-Computer Interaction (HCI), in areas such as learning and processing natural languages. This paper presents a hybrid system that uses GMM-HMM (Gaussian Mixture Model-Hidden Markov Model) and TDNN-LSTM (Time Delay Neural Network with Long Short-Term Memory) acoustic models to detect and correct pronunciation errors in Qur’an recitation by non-Arabic speakers, specifically Indian speakers. The developed system targets 10 Arabic letters that non-Arabic speakers cannot pronounce correctly and may confuse with other letters that share the same articulation point. Training and testing data were collected from 94 Indian speakers. MFCCs were used for feature extraction, and GMM-HMM and TDNN-LSTM were used as the recognition models. The main contribution of the system is improving the accuracy of the HAFSS© system by using a deep neural network instead of GMM-HMM. The open-source Kaldi ASR toolkit recipes were used for building, training, testing, and evaluating the system. The developed system outperforms the GMM-HMM model by 1.56% based on the Kaldi toolkit word accuracy equation. For the letter SUD, the accuracy of the Kaldi-based DNN-HMM model exceeds that of the GMM-HMM model by 1% and that of the HTK-based DNN-HMM model by 9.5%. The system accuracy was 95.14% using GMM-HMM and 96.88% using TDNN-LSTM. Across the 10 letters, the best accuracy was 97.3%, achieved by the letter TTA, and the worst was 90.1%, achieved by the letter DAA.
The paper is organized as follows. Section 1 is the introduction. Section 2 introduces Qur’an recitation problems along with previous and related studies. Section 3 outlines the project goal, and Section 4 explains the structure of the system. Acoustic model training is explained in Section 5. Section 6 presents the experimental results and discussion. A comparison of model results is presented in Section 7, and a comparison with previously published results is given in Section 8. Section 9 gives recommendations and the conclusion.
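The abstract reports results in terms of a word accuracy equation without stating it. A common definition, assumed here and not confirmed by the paper, derives accuracy from the word error rate: accuracy = 100 * (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitution, deletion, and insertion counts from the alignment. The function name and the counts below are hypothetical and purely illustrative; they are not taken from the paper's experiments.

```python
def word_accuracy(num_ref_words, substitutions, deletions, insertions):
    """Return word accuracy in percent, assuming accuracy = 100 - WER.

    WER = 100 * (S + D + I) / N, with N the number of reference words.
    This definition is an assumption; the paper's exact equation is not
    given in the abstract.
    """
    if num_ref_words <= 0:
        raise ValueError("num_ref_words must be positive")
    errors = substitutions + deletions + insertions
    return 100.0 * (num_ref_words - errors) / num_ref_words


# Hypothetical error counts for illustration only.
acc = word_accuracy(num_ref_words=1000, substitutions=20, deletions=10, insertions=5)
print(f"{acc:.2f}%")  # prints 96.50%
```

Under this definition, insertions count as errors, so accuracy can fall below the plain fraction of correctly recognized words.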