Research Article

Multilingual ASR Model for Kudmali Voice Recognition

by Chandan Senapati, Utpal Roy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 64
Year of Publication: 2025
10.5120/ijca2025924462

Chandan Senapati, Utpal Roy. Multilingual ASR Model for Kudmali Voice Recognition. International Journal of Computer Applications 186, 64 (Jan 2025), 27-35. DOI=10.5120/ijca2025924462

@article{10.5120/ijca2025924462,
  author = {Chandan Senapati and Utpal Roy},
  title = {Multilingual ASR Model for Kudmali Voice Recognition},
  journal = {International Journal of Computer Applications},
  issue_date = {Jan 2025},
  volume = {186},
  number = {64},
  month = {Jan},
  year = {2025},
  issn = {0975-8887},
  pages = {27-35},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume186/number64/multilingual-asr-model-for-kudmali-voice-recognition/},
  doi = {10.5120/ijca2025924462},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}
%0 Journal Article
%A Chandan Senapati
%A Utpal Roy
%T Multilingual ASR Model for Kudmali Voice Recognition
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 64
%P 27-35
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The Kudmali language, an underrepresented and potentially vulnerable language, faces significant challenges in the development of Automatic Speech Recognition (ASR) systems due to its minimal digital presence and limited annotated datasets. This paper investigates the application of the multilingual XLS-R model, a transformer-based pre-trained ASR framework, to Kudmali speech recognition. By leveraging transfer learning and fine-tuning, we adapt the XLS-R model to recognize and transcribe Kudmali speech effectively. The proposed system utilizes a diverse dataset of Kudmali audio recordings, transcribed in Bengali script to address the lack of native transcriptions. We present a comprehensive data preparation pipeline, including audio normalization, data augmentation, and multilingual model adaptation, to overcome resource limitations. Comparative analysis against baseline models demonstrates significant improvements, achieving a Word Error Rate (WER) of 19.8% and a Character Error Rate (CER) of 12.1% after fine-tuning, with further reductions when data augmentation is applied. This study highlights the potential of multilingual pre-trained models such as XLS-R for developing ASR systems for low-resource languages, supporting their preservation and promoting digital inclusivity. The findings underscore the importance of adapting state-of-the-art ASR frameworks to linguistic diversity, paving the way for further advances in underrepresented language technology. The study also evaluates the model's adaptability, accuracy, and error patterns in recognizing this lesser-known language, contributing to the broader application of ASR technologies in low-resource settings.
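The WER and CER figures quoted above are standard edit-distance metrics. A minimal sketch of how they are typically computed (not the authors' evaluation code; in practice a library such as jiwer is often used):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all remaining reference tokens
    for j in range(n + 1):
        dp[0][j] = j  # insert all remaining hypothesis tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[m][n]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Both metrics divide the number of substitutions, insertions, and deletions by the length of the reference, so they apply unchanged to Bengali-script transcriptions.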

References
  1. Harpreet Singh Anand, Amulya Ratna Dash, and Yashvardhan Sharma. Empowering low-resource language translation: Methodologies for bhojpuri-hindi and marathi-hindi asr and mt. In Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024), pages 229–234, 2024.
  2. Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick Von Platen, Yatharth Saraf, Juan Pino, et al. XLS-R: Self-supervised cross-lingual speech representation learning at scale. arXiv preprint arXiv:2111.09296, 2021.
  3. Joyanta Basu, Soma Khan, Rajib Roy, Tapan Kumar Basu, and Swanirbhar Majumder. Multilingual speech corpus in low-resource eastern and northeastern indian languages for speaker and language identification. Circuits, Systems, and Signal Processing, 40(10):4986–5013, 2021.
  4. Sruti Sruba Bharali and Sanjib Kr Kalita. A comparative study of different features for isolated spoken word recognition using hmm with reference to assamese language. International Journal of Speech Technology, 18:673–684, 2015.
  5. Shuangyu Chang, Lokendra Shastri, and Steven Greenberg. Automatic phonetic transcription of spontaneous speech (american english). In INTERSPEECH, pages 330–333. Citeseer, 2000.
  6. Niladri Sekhar Dash. Documentation and digitization of endangered indigenous languages: Methods and strategies.
  7. Barsha Deka, Joyshree Chakraborty, Abhishek Dey, Shikhamoni Nath, Priyankoo Sarmah, SR Nirmala, and Samudra Vijaya. Speech corpora of under-resourced languages of northeast india. In 2018 Oriental COCOSDA-International Conference on Speech Database and Assessments, pages 72–77. IEEE, 2018.
  8. Barsha Deka, S. R. Nirmala, and K. Samudravijaya. Development of assamese continuous speech recognition system. In The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018.
  9. Amandeep Singh Dhanjal and Williamjeet Singh. A comprehensive survey on automatic speech recognition using neural networks. Multimedia Tools and Applications, 83(8):23367– 23412, 2024.
  10. Mark JF Gales, Kate M Knill, Anton Ragni, and Shakti P Rath. Speech recognition and keyword spotting for low-resource languages: Babel project research at cued. In Fourth International workshop on spoken language technologies for under-resourced languages (SLTU-2014), pages 16–23. International Speech Communication Association (ISCA), 2014.
  11. Rupak Raj Ghimire, Bal Krishna Bal, and Prakash Poudyal. A comprehensive study of the current state-of-the-art in nepali automatic speech recognition systems. arXiv preprint arXiv:2402.03050, 2024.
  12. Shivang Gupta, Kowshik Siva Sai Motepalli, Ravi Kumar, Vamsi Narasinga, Sai Ganesh Mirishkar, and Anil Kumar Vuppala. Enhancing language identification in indian context through exploiting learned features with wav2vec2.0. In International Conference on Speech and Computer, pages 503–512. Springer, 2023.
  13. Hamza Kheddar, Mustapha Hemis, and Yassine Himeur. Automatic speech recognition using advanced deep learning approaches: A survey. Information Fusion, page 102422, 2024.
  14. Ritesh Kumar, Bornini Lahiri, and Deepak Alok. Developing lrs for non-scheduled indian languages: A case of magahi. In Human Language Technology Challenges for Computer Science and Linguistics: 5th Language and Technology Conference, LTC 2011, Poznań, Poland, November 25–27, 2011, Revised Selected Papers 5, pages 491–501. Springer, 2014.
  15. Ritesh Kumar, Atul Kr Ojha, Bornini Lahiri, and Chingrimnng Lungleng. Aggression in hindi and english speech: Acoustic correlates and automatic identification. arXiv preprint arXiv:2204.02814, 2022.
  16. Hong Leung and V Zue. A procedure for automatic alignment of phonetic transcriptions with continuous speech. In ICASSP’84. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 9, pages 73–76. IEEE, 1984.
  17. Min-Siong Liang, Ren-Yuan Lyu, and Yuang-Chin Chiang. Phonetic transcription using speech recognition technique considering variations in pronunciation. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07, volume 4, pages IV–109. IEEE, 2007.
  18. Rajeev Ranjan and Rajesh Kumar Dubey. Isolated word recognition using hmm for maithili dialect. In 2016 International Conference on Signal Processing and Communication (ICSC), pages 323–327, 2016.
  19. Nay San, Martijn Bartelds, Mitchell Browne, Lily Clifford, Fiona Gibson, John Mansfield, David Nash, Jane Simpson, Myfany Turpin, Maria Vollmer, Sasha Wilmoth, and Dan Jurafsky. Leveraging pre-trained representations to improve access to untranscribed speech from endangered languages. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 1094–1101, 2021.
  20. Himangshu Sarma, Navanath Saharia, and Utpal Sharma. Development and analysis of speech recognition systems for assamese language using htk. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 17(1):1–14, 2017.
  21. Abhayjeet Singh, Arjun Singh Mehta, Jai Nanavati, Jesuraja Bandekar, Karnalius Basumatary, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, Priyanka Pai, et al. Model adaptation for asr in low-resource indian languages. arXiv preprint arXiv:2307.07948, 2023.
  22. Shivangi Singh and Shobha Bhatt. Phoneme based hindi speech recognition using deep learning. In 2024 2nd International Conference on Disruptive Technologies (ICDT), pages 159–162. IEEE, 2024.
  23. Jinshi Wang. Cross-lingual Transfer Learning for Low- Resource Natural Language Processing Tasks. PhD thesis, Master Thesis. Institute for Anthropomatics and Robotics, Karlsruhe . . . , 2021.
Index Terms

Computer Science
Information Sciences
Natural Language Processing
Speech Recognition

Keywords

ASR, Kudmali Language, Multilingual Models, XLS-R Model, Speech Recognition, Low-Resource Languages