International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 177 - Number 16
Year of Publication: 2019
Authors: Francisco Carlos M. Souza, Alinne C. Correa Souza, Carolina Y. V. Watanabe, Patricia Pupin Mandrá, Alessandra Alaniz Macedo |
DOI: 10.5120/ijca2019919393
Francisco Carlos M. Souza, Alinne C. Correa Souza, Carolina Y. V. Watanabe, Patricia Pupin Mandrá, Alessandra Alaniz Macedo. An Analysis of Visual Speech Features for Recognition of Non-articulatory Sounds using Machine Learning. International Journal of Computer Applications. 177, 16 (Nov 2019), 1-9. DOI=10.5120/ijca2019919393
People with articulation and phonological disorders need exercises to produce speech sounds. Such exercises typically begin with the production of non-articulatory sounds in clinics or homes, where a wide variety of environmental sounds exist, i.e., in noisy locations. Speech recognition systems treat environmental sounds as background noise, which can lead to unsatisfactory speech recognition. This study aims to assess a system that aggregates visual features with audio features during recognition of non-articulatory sounds in noisy environments. Mel-Frequency Cepstrum Coefficients and the Laplace transform were used to extract audio features, a Convolutional Neural Network to extract video features, a Support Vector Machine for audio recognition, and Long Short-Term Memory networks for video recognition. Experimental results regarding the accuracy, recall, and precision of the system were obtained on a set of 585 sounds. Overall, the results indicate that video information can complement audio recognition and assist the recognition of non-articulatory sounds.
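To make the audio branch of such a pipeline concrete, the following is a minimal sketch, not the authors' implementation: it extracts MFCC features (librosa) and trains an SVM classifier (scikit-learn), mirroring the MFCC-plus-SVM pairing named in the abstract. The load_dataset helper, the 13-coefficient setting, the mean-over-time aggregation, and the RBF kernel are all illustrative assumptions not taken from the paper.

```python
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def mfcc_features(path, n_mfcc=13):
    """Load one audio clip and summarize it as a fixed-length MFCC vector."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    # Averaging over time frames yields one vector per clip; this is a common
    # simple aggregation, since the abstract does not specify the paper's choice.
    return mfcc.mean(axis=1)

# load_dataset() is a hypothetical helper standing in for the 585-sound corpus;
# it would return a list of file paths and their class labels.
paths, labels = load_dataset()
X = np.stack([mfcc_features(p) for p in paths])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)

clf = SVC(kernel="rbf")  # kernel choice is an assumption, not from the paper
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

The video branch (CNN features fed to an LSTM) would follow the same pattern, with per-frame feature vectors forming the sequence input to the recurrent classifier.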