Research Article

Audio Visual Arabic Speech Recognition using KNN Model by Testing different Audio Features

by Esra J. Harfash, Diyar H. Shakir
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 180 - Number 1
Year of Publication: 2017
Authors: Esra J. Harfash, Diyar H. Shakir
10.5120/ijca2017915901

Esra J. Harfash, Diyar H. Shakir. Audio Visual Arabic Speech Recognition using KNN Model by Testing different Audio Features. International Journal of Computer Applications. 180, 1 (Dec 2017), 33-38. DOI=10.5120/ijca2017915901

@article{ 10.5120/ijca2017915901,
author = { Esra J. Harfash and Diyar H. Shakir },
title = { Audio Visual Arabic Speech Recognition using KNN Model by Testing different Audio Features },
journal = { International Journal of Computer Applications },
issue_date = { Dec 2017 },
volume = { 180 },
number = { 1 },
month = { Dec },
year = { 2017 },
issn = { 0975-8887 },
pages = { 33-38 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume180/number1/28766-2017915901/ },
doi = { 10.5120/ijca2017915901 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:59:27.192334+05:30
%A Esra J. Harfash
%A Diyar H. Shakir
%T Audio Visual Arabic Speech Recognition using KNN Model by Testing different Audio Features
%J International Journal of Computer Applications
%@ 0975-8887
%V 180
%N 1
%P 33-38
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A central challenge in audio-visual speech recognition (AVSR), and the focus of most research in the area, is choosing which features to extract and which combinations of them give better results. A second challenge is that the resulting feature vectors are naturally large, so it is preferable to reduce them with a method that preserves their properties after downsizing. The system presented in this research recognizes a group of spoken Arabic words, the words from one to ten. In the acoustic part, MFCC, LPC, and FFT coefficients were extracted to determine which type of feature is most effective in AVSR. All three types gave good results, but MFCC performed best. The visual features are computed from the DCT matrix and extracted by applying a zigzag scan. In the feature reduction stage, several data reduction methods were implemented: LDA, PCA, and SVD. Each method was applied to the data separately. KNN models are used in the recognition stage, with testing carried out on dependent and independent databases of the words from one to ten. The final results are good and encouraging.
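
The visual feature extraction and recognition steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`dct2`, `zigzag`, `knn_predict`), the JPEG-style scan order, the orthonormal DCT-II scaling, the Euclidean distance, and the value of `k` are all assumptions chosen for illustration, and the audio features and the LDA/PCA/SVD reduction stage are omitted.

```python
import math
from collections import Counter


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        # Scaling that makes the transform orthonormal (an assumption here).
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def zigzag(matrix, keep):
    """Read a square matrix in JPEG-style zigzag order, keeping `keep` values."""
    n = len(matrix)
    # Cells are grouped by anti-diagonal (row + col); the direction of travel
    # alternates between consecutive diagonals.
    order = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )
    return [matrix[r][c] for r, c in order[:keep]]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy usage: a hypothetical 4x4 "mouth region" block reduced to 3 coefficients.
block = [[1.0] * 4 for _ in range(4)]
visual_features = zigzag(dct2(block), keep=3)
```

The zigzag scan places low-frequency DCT coefficients first, so truncating the scanned vector keeps the coarse shape information in a short feature vector; that vector (or a reduced version of it) is what a KNN classifier would then compare against labeled training examples.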

Index Terms

Computer Science
Information Sciences

Keywords

Audio-Video Speech Processing, Automatic Speech Recognition, Mouth Detection, Discrete Cosine Transformation, Visual Features