International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 70
Year of Publication: 2025
Authors: Nazik O’mar Balula, Mohsen Rashwan
10.5120/ijca2025926045
Nazik O’mar Balula, Mohsen Rashwan. A Mispronunciation Detection Model for Certain Arabic Letters and Selected Chapters of Holy Quran Recitation, Designed for Non-native Arabic Speakers, Developed using the Kaldi Toolkit. International Journal of Computer Applications. 187, 70 (Dec 2025), 1-13. DOI=10.5120/ijca2025926045
Acoustic models built with Deep Neural Networks (DNNs) achieve significantly higher accuracy than traditional approaches such as the Hidden Markov Model combined with the Gaussian Mixture Model (GMM-HMM). This research developed and evaluated a baseline GMM-HMM model alongside a hybrid acoustic model that combines Time-Delay Neural Networks (TDNN) and Long Short-Term Memory (LSTM) networks with GMM-HMM, using the open-source Kaldi ASR toolkit. The main goal is to detect pronunciation errors in Arabic spoken by Indian speakers and in recitation of the Holy Qur’an, focusing on ten Arabic letters (د, ح, خ, ص, ض, ط, ظ, ع, غ, ق) that non-native speakers often mispronounce by confusing them with letters that have similar points of articulation. The speech dataset comprised about 65 hours of audio: 58 hours for training and 7 hours for validation and testing. The hybrid TDNN-LSTM with GMM-HMM model achieved the best performance, 96.88%, with a Word Error Rate (WER) of 3.12%, outperforming the GMM-HMM model, which achieved 95.2% with a WER of 4.68%. These results confirm that the hybrid model identifies pronunciation errors in the speech and recitation of Indian speakers more accurately than the GMM-HMM model alone, a significant step toward more accurate and efficient speech recognition systems.
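The results above are reported as Word Error Rate. Under the standard definition, WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment of the recognised words against a reference of N words; word accuracy is often quoted as 1 - WER. The sketch below is a minimal illustration of that definition in plain Python, not the authors' scoring pipeline or Kaldi's own scoring code. The function names `align_counts` and `wer` are hypothetical. The second usage example is also hypothetical. It shows how confusing ح with a similar-sounding letter in a single word appears as one substitution.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] holds (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j].
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, ins)  # match, no cost
            else:
                diag = (c + 1, s + 1, d, ins)  # substitution
            c, s, d, ins = dp[i - 1][j]
            dele = (c + 1, s, d + 1, ins)  # reference word deleted
            c, s, d, ins = dp[i][j - 1]
            inse = (c + 1, s, d, ins + 1)  # extra hypothesis word inserted
            dp[i][j] = min(diag, dele, inse, key=lambda t: t[0])
    _, s, d, ins = dp[n][m]
    return s, d, ins


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (S + D + I) / N over whitespace-separated words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    s, d, i = align_counts(ref, hyp)
    return (s + d + i) / len(ref)


if __name__ == "__main__":
    print(wer("a b c d", "a x c"))  # one substitution + one deletion over 4 words
    # Hypothetical recitation error: ح in الرحمن rendered as ه.
    print(wer("بسم الله الرحمن الرحيم", "بسم الله الرهمن الرحيم"))
```

Because WER is normalised by the reference length only, it can exceed 1.0 when the hypothesis contains many insertions. This is one reason accuracy figures derived as 1 - WER should be read alongside the underlying error counts.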