Research Article

LipVision: A Deep Learning Approach

by Parth Khetarpal, Riaz Moradian, Shayan Sadar, Sunny Doultani, Salma Pathan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 179 - Number 8
Year of Publication: 2017
10.5120/ijca2017916029

Parth Khetarpal, Riaz Moradian, Shayan Sadar, Sunny Doultani, Salma Pathan. LipVision: A Deep Learning Approach. International Journal of Computer Applications. 179, 8 (Dec 2017), 34-36. DOI=10.5120/ijca2017916029

@article{ 10.5120/ijca2017916029,
author = { Parth Khetarpal, Riaz Moradian, Shayan Sadar, Sunny Doultani, Salma Pathan },
title = { LipVision: A Deep Learning Approach },
journal = { International Journal of Computer Applications },
issue_date = { Dec 2017 },
volume = { 179 },
number = { 8 },
month = { Dec },
year = { 2017 },
issn = { 0975-8887 },
pages = { 34-36 },
numpages = { 3 },
url = { https://ijcaonline.org/archives/volume179/number8/28759-2017916029/ },
doi = { 10.5120/ijca2017916029 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Parth Khetarpal
%A Riaz Moradian
%A Shayan Sadar
%A Sunny Doultani
%A Salma Pathan
%T LipVision: A Deep Learning Approach
%J International Journal of Computer Applications
%@ 0975-8887
%V 179
%N 8
%P 34-36
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Lip-reading is the task of interpreting what an individual is saying by analysing the movement patterns of the mouth while the individual is talking. This paper surveys previous work on lip-reading, discussing the different classifiers used, their efficiency and the final accuracy obtained. Lip-reading has applications in a myriad of fields such as medicine, communication and gaming. The proposed system uses the GRID corpus dataset, in which videos are recorded from 33 speakers. OpenCV and dlib are used for face and mouth detection, and the mouth region of interest (ROI) is annotated with facial landmarks using the iBUG tool. The architecture consists of Convolutional Neural Networks built and trained in TensorFlow (an open-source software library), whose per-frame outputs are passed through Connectionist Temporal Classification (CTC). A saliency visualisation technique is then used to interpret and match the learned behaviour and generate text.
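
The abstract's pipeline ends with Connectionist Temporal Classification, which maps the network's per-frame label predictions to a final transcript. As an illustrative sketch only (not the authors' implementation), the greedy CTC decoding step can be shown in plain Python: collapse consecutive repeated labels, then drop the blank symbol. The label sequence below is a hypothetical example, not data from the paper.

```python
# Greedy CTC decoding sketch: collapse repeats, then drop blanks.
# The blank symbol and the frame-level labels are hypothetical examples.

BLANK = "-"  # CTC blank symbol

def ctc_greedy_decode(frame_labels):
    """Collapse consecutive duplicate labels, then remove blank symbols."""
    collapsed = []
    prev = None
    for label in frame_labels:
        if label != prev:
            collapsed.append(label)
        prev = label
    return "".join(l for l in collapsed if l != BLANK)

# Per-frame argmax labels, e.g. from a CNN applied to mouth-ROI frames:
frames = ["-", "b", "b", "-", "i", "i", "-", "n", "n", "-"]
print(ctc_greedy_decode(frames))  # → "bin" (a word from the GRID vocabulary)
```

The blank symbol is what lets CTC distinguish a genuinely repeated character (blank in between) from one character held across several frames (no blank in between).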

References
  1. Yannis M. Assael, Brendan Shillingford, Shimon Whiteson and Nando de Freitas, “LipNet: End-to-end sentence-level lipreading”, arXiv:1611.01599, 2016.
  2. Jithin George, Ronan Keane and Conor Zellmer, “Estimating speech from lip dynamics”, arXiv:1708.01198, 2017.
  3. Salma Pathan and Archana Ghotkar, “Recognition of spoken English phrases using visual features extraction and classification”, International Journal of Computer Science and Information Technologies, Vol. 6 (4), 3716-3719, 2015.
  4. Bor-Shing Lin, Yu-Hsien Yao, Ching-Feng Liu, Ching-Feng Lien and Bor-Shyh Lin, “Development of Novel Lip-Reading Recognition Algorithm”, IEEE Access, Volume 5, Pages 794-801, 2017.
  5. Amit Garg, Jonathan Noyola and Sameep Bagadia, “Lip reading using CNN and LSTM”, 2016.
  6. OpenCV documentation, https://www.docs.opencv.org
  7. GRID corpus dataset, http://spandh.dcs.shef.ac.uk/gridcorpus/
Index Terms

Computer Science
Information Sciences

Keywords

Computer Vision, Deep Learning, Pattern Recognition