International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 137 - Number 4
Year of Publication: 2016
Authors: Prashant Borde, Ramesh Manza, Bharti Gawali, Pravin Yannawar
DOI: 10.5120/ijca2016908696
Prashant Borde, Ramesh Manza, Bharti Gawali, Pravin Yannawar. ‘vVISWa’ – A Multilingual Multi-Pose Audio Visual Database for Robust Human Computer Interaction. International Journal of Computer Applications. 137, 4 (March 2016), 25-31. DOI=10.5120/ijca2016908696
Automatic Speech Recognition (ASR) is an active research topic in signal processing and pattern recognition and has attracted contributions from many researchers. In recent years, there have been many advances in automatic speech reading systems through the inclusion of audio and visual speech features to recognize words under noisy conditions. The objective of an audio-visual speech recognition (AVSR) system is to improve recognition accuracy. Developing robust AVSR systems for Human Computer Interaction requires appropriate, simultaneously recorded speech and video data. This paper describes ‘vVISWa’ (Visual Vocabulary of Independent Standard Words), a database consisting of audio-visual data from 48 native speakers and 10 non-native speakers. These speakers contributed to the corpus in three poses: full frontal, 45° profile, and side pose. The database was primarily designed to support multi-pose audio-visual speech recognition in three languages: ‘Marathi’ (the native language of Maharashtra), ‘Hindi’ (national language of India), and ‘English’ (universal language). It is a multi-pose, multilingual database formed in the Indian context. The database is available on request from http://visbamu.in/viswaDataset.html.