Research Article

Hindi Pronunciation Analysis for Speech Impaired using MFCC and DTW

by Sahil Panchbhaiya, Pranav Menon, Rishikesh Lingayat, Nikhita Mangaonkar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 31
Year of Publication: 2024
Authors: Sahil Panchbhaiya, Pranav Menon, Rishikesh Lingayat, Nikhita Mangaonkar
10.5120/ijca2024923887

Sahil Panchbhaiya, Pranav Menon, Rishikesh Lingayat, Nikhita Mangaonkar. Hindi Pronunciation Analysis for Speech Impaired using MFCC and DTW. International Journal of Computer Applications. 186, 31 (Aug 2024), 48-54. DOI=10.5120/ijca2024923887

@article{ 10.5120/ijca2024923887,
author = { Sahil Panchbhaiya and Pranav Menon and Rishikesh Lingayat and Nikhita Mangaonkar },
title = { Hindi Pronunciation Analysis for Speech Impaired using MFCC and DTW },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2024 },
volume = { 186 },
number = { 31 },
month = { Aug },
year = { 2024 },
issn = { 0975-8887 },
pages = { 48-54 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number31/hindi-pronunciation-analysis-for-speech-impaired-using-mfcc-and-dtw/ },
doi = { 10.5120/ijca2024923887 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-08-11T02:24:43.897894+05:30
%A Sahil Panchbhaiya
%A Pranav Menon
%A Rishikesh Lingayat
%A Nikhita Mangaonkar
%T Hindi Pronunciation Analysis for Speech Impaired using MFCC and DTW
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 31
%P 48-54
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The aim of this experiment is to teach speech-impaired learners to pronounce Hindi syllables by providing word breakdowns, sounds, and usage examples. Once the learner is familiar with the syllables, a voice sample is taken as input and compared against predefined reference data to check that the learner is pronouncing them correctly. Feature matching is performed using a combination of Mel-Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW). Speech analysis is a two-step process: the first phase uses MFCC to extract fourteen features, and the second phase evaluates three classifiers, k-Nearest Neighbour (KNN), Support Vector Machine (SVM), and DTW, to determine the best combination for accurate and precise feature matching.
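The matching step described above can be illustrated with a minimal sketch of DTW-based template matching. This is not the authors' implementation: the function names (`dtw_distance`, `closest_template`), the Euclidean frame distance, and the toy two-dimensional frames are assumptions chosen for illustration, whereas the paper extracts fourteen MFCC features per frame.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Uses the textbook recurrence with steps (i-1, j), (i, j-1), (i-1, j-1)
    and no windowing constraint; this is an assumed variant for illustration.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of the first i and first j frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def closest_template(sample, templates):
    """Return the label of the reference template with the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))

# Hypothetical reference templates: one short frame sequence per syllable.
templates = {
    "ka": [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],
    "ma": [[5.0, 5.0], [6.0, 4.0], [7.0, 5.0]],
}
# The same "ka" pattern spoken more slowly (first frame repeated).
sample = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
print(closest_template(sample, templates))  # prints "ka"
```

Because DTW lets one frame align with several frames of the other sequence, the slower sample still matches the "ka" template at zero cost, which is why the technique suits comparing a learner's utterance with a reference spoken at a different pace.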

Index Terms

Computer Science
Information Sciences

Keywords

Mel-Frequency Cepstral Coefficients (MFCC); Dynamic Time Warping (DTW); k-Nearest Neighbour (KNN); Support Vector Machine (SVM); Speech Impairment; Hindi Syllables