International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Number 25
Year of Publication: 2021
Authors: Atharva Bankar, Aryan Gandhi, Dipali Baviskar
DOI: 10.5120/ijca2021921625
Atharva Bankar, Aryan Gandhi, Dipali Baviskar. Image and Signal Processing of Mel-Spectrograms in Isolated Speech Recognition. International Journal of Computer Applications. 183, 25 (Sep 2021), 11-17. DOI=10.5120/ijca2021921625
Speech is one of the fundamental modes of communication. Over the past decade, speech recognition systems have advanced considerably. The basic idea behind these systems is the conversion of acoustic waveforms into human-readable text. In this paper, an automatic speech recognition (speech-to-text) system is modelled that recognizes isolated words, one at a time. Word predictions are made using two methods, namely Image Processing and Signal Processing. The paper also gives an overview of the techniques used in each stage of speech recognition and presents a comparative analysis of the two methods on the basis of accuracy and computation time. The techniques showcased in this study are used for feature extraction, and the extracted features are then used to identify 30 spoken commands with convolutional neural networks (CNNs).
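As an illustration of the feature-extraction stage, the following is a minimal sketch of how a log mel-spectrogram might be computed from a raw waveform with NumPy alone. It is not the authors' implementation: the sample rate, FFT size, hop length, number of mel bands, and the HTK-style mel formula are all assumptions chosen for the example, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention; assumed here)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are evenly spaced on the mel scale
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Parameter defaults are illustrative assumptions, not values from the paper
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T                 # (n_mels, n_frames), in dB

# Usage: a one-second synthetic 1 kHz tone standing in for a spoken word
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2.0 * np.pi * 1000.0 * t)
spec = log_mel_spectrogram(tone, sr=sr)
```

The resulting two-dimensional array can be treated either as a sequence of spectral feature vectors or as a single-channel image, which is what makes it a plausible input to a CNN classifier of the kind described above.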