International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 64
Year of Publication: 2025
Authors: Chandan Senapati, Utpal Roy
DOI: 10.5120/ijca2025924462
Chandan Senapati, Utpal Roy. Multilingual ASR Model for Kudmali Voice Recognition. International Journal of Computer Applications. 186, 64 (Jan 2025), 27-35. DOI=10.5120/ijca2025924462
Kudmali, an underrepresented and potentially vulnerable language, poses significant challenges for the development of Automatic Speech Recognition (ASR) systems because of its minimal digital presence and limited annotated datasets. This paper investigates the application of the multilingual XLS-R model, a transformer-based pre-trained ASR framework, to Kudmali voice recognition. Using transfer learning and fine-tuning, we adapt XLS-R to recognize and transcribe Kudmali speech. The proposed system uses a diverse dataset of Kudmali audio recordings transcribed in Bengali script, which addresses the lack of native transcriptions. We present a data preparation pipeline that includes audio normalization, data augmentation, and multilingual model adaptation to overcome resource limitations. Comparative analysis against baseline models shows significant improvements: the fine-tuned model achieves a Word Error Rate (WER) of 19.8% and a Character Error Rate (CER) of 12.1%, with further reductions when data augmentation is applied. The study also evaluates the models’ adaptability, accuracy, and error patterns in recognizing this lesser-known language. The findings highlight the potential of multilingual pre-trained models such as XLS-R for building ASR systems for low-resource languages, supporting their preservation and promoting digital inclusivity, and they underscore the importance of adapting state-of-the-art ASR frameworks to linguistic diversity.
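For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how WER and CER are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, taken over words for WER and over characters for CER. It is illustrative only and is not the authors' evaluation code. The function names are hypothetical, the example strings are English placeholders rather than Kudmali data, and counting spaces as characters in CER is an assumed convention.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length.

    Assumption: spaces count as characters. Python iterates over Unicode code
    points, so combining marks in Bengali script are counted individually.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"   # placeholder reference transcript
    hyp = "the cat sit on mat"       # placeholder model output
    print(f"WER = {wer(ref, hyp):.3f}")
    print(f"CER = {cer(ref, hyp):.3f}")
```

Under this definition, a WER of 19.8% corresponds to roughly one word-level error for every five reference words. Because Bengali script relies on combining vowel signs and conjuncts, a CER computed over grapheme clusters rather than code points could give a different figure; the abstract does not say which convention was used.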