Research Article

A Review on Action Recognition and Action Prediction of Human(s) using Deep Learning Approaches

by Syed Abdussami, Nagendraprasad S., Shivarajakumara K., Sanjeet Singh, A. Thyagarajamurthy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 177 - Number 20
Year of Publication: 2019
DOI: 10.5120/ijca2019919605

Syed Abdussami, Nagendraprasad S., Shivarajakumara K., Sanjeet Singh, A. Thyagarajamurthy. A Review on Action Recognition and Action Prediction of Human(s) using Deep Learning Approaches. International Journal of Computer Applications. 177, 20 (Nov 2019), 1-5. DOI=10.5120/ijca2019919605

@article{ 10.5120/ijca2019919605,
author = { Syed Abdussami and Nagendraprasad S. and Shivarajakumara K. and Sanjeet Singh and A. Thyagarajamurthy },
title = { A Review on Action Recognition and Action Prediction of Human(s) using Deep Learning Approaches },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2019 },
volume = { 177 },
number = { 20 },
month = { Nov },
year = { 2019 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume177/number20/31012-2019919605/ },
doi = { 10.5120/ijca2019919605 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:46:23.988381+05:30
%A Syed Abdussami
%A Nagendraprasad S.
%A Shivarajakumara K.
%A Sanjeet Singh
%A A. Thyagarajamurthy
%T A Review on Action Recognition and Action Prediction of Human(s) using Deep Learning Approaches
%J International Journal of Computer Applications
%@ 0975-8887
%V 177
%N 20
%P 1-5
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Human action recognition and prediction are active research topics in computer vision, with notable applications in anomaly detection. Many researchers work in this field, and many new algorithms have been proposed in recent decades. This paper reviews eight such approaches, drawn from eight research papers. Compared with their counterparts for still images (2D CNNs for visual recognition), 3D CNNs are considered less efficient because of limitations such as the high training complexity of spatio-temporal fusion and a large memory cost. The authors of the first paper therefore propose MiCT (Mixed Convolutional Tube, for videos), which combines 2D and 3D CNNs to reduce training time. In the second paper, the glimpse sequences in each frame correspond to interest points in the scene that are relevant to the classified activities. Unlike the second paper, the third presents a novel method that recognizes human action as the evolution of pose estimation maps. The fourth paper presents a model for long-term prediction of pedestrians from on-board observations. The fifth paper attempts to recognize human rights violations using deep convolutional neural networks. The sixth paper uses a convolutional LSTM to detect violent videos. The seventh paper introduces a new Two-Stream Inflated 3D ConvNet (I3D) that is based on 2D ConvNet inflation. In the eighth paper, a new temporal transition layer (TTL) that models variable temporal convolution kernel depths is embedded into a 3D CNN to form T3D (Temporal 3D ConvNets). Transferring knowledge from a pre-trained 2D CNN to a 3D CNN reduces the number of training samples that the 3D CNN requires.
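
The "2D ConvNet inflation" mentioned for the seventh paper can be illustrated with a small sketch. It assumes the commonly described scheme in which a 2D kernel is repeated along a new time axis and each copy is scaled by 1/depth, so that a clip made of one repeated frame gives the same response as the original 2D kernel on that frame. The function names and the example values are hypothetical and are not taken from the reviewed papers.

```python
def inflate_kernel(kernel_2d, depth):
    """Repeat a 2D kernel `depth` times along a new time axis, scaled by 1/depth."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return [[[w / depth for w in row] for row in kernel_2d] for _ in range(depth)]


def response_2d(patch, kernel_2d):
    """Dot product of one image patch with a 2D kernel (one convolution output)."""
    return sum(p * w
               for patch_row, kernel_row in zip(patch, kernel_2d)
               for p, w in zip(patch_row, kernel_row))


def response_3d(clip, kernel_3d):
    """Dot product of a stack of patches (time, height, width) with a 3D kernel."""
    return sum(response_2d(frame, k2d) for frame, k2d in zip(clip, kernel_3d))


# Hypothetical 3x3 kernel and image patch, chosen only for illustration.
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
patch = [[3.0, 1.0, 0.0], [4.0, 1.0, 2.0], [5.0, 9.0, 2.0]]

# A "static" clip: the same patch repeated over four time steps.
static_clip = [patch] * 4
inflated = inflate_kernel(kernel, depth=4)

# The inflated kernel on the static clip matches the 2D kernel on one frame.
print(response_2d(patch, kernel), response_3d(static_clip, inflated))
```

The 1/depth scaling is what lets weights from a pre-trained 2D network serve as a starting point for the 3D network, which is in line with the abstract's remark about transferring knowledge from 2D to 3D CNNs.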

References
  1. Y. Zhou, X. Sun, Z. Zha and W. Zeng, "MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 449-458.
  2. F. Baradel, C. Wolf, J. Mille and G. W. Taylor, "Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 469-478.
  3. M. Liu and J. Yuan, "Recognizing Human Actions as the Evolution of Pose Estimation Maps," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 1159-1168.
  4. A. Bhattacharyya, M. Fritz and B. Schiele, "Long-Term On-board Prediction of People in Traffic Scenes Under Uncertainty," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 4194-4202.
  5. G. Kalliatakis, S. Ehsan, M. Fasli, A. Leonardis, J. Gall and K. McDonald-Maier, "Detection of Human Rights Violations in Images: Can Convolutional Neural Networks help?," 2016.
  6. S. Sudhakaran and O. Lanz, "Learning to detect violent videos using convolutional long short-term memory," 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, 2017, pp. 1-6.
  7. J. Carreira and A. Zisserman, "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset," 2017, pp. 4724-4733, doi: 10.1109/CVPR.2017.502.
  8. A. Diba, M. Fayyaz, V. Sharma, A. Hossein Karami, M. Mahdi Arzani, L. Van Gool and R. Yousefzadeh, "Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification," 2017.
  9. Z. Tu, W. Xie, Q. Qin, R. Poppe, R. Veltkamp, B. Li and J. Yuan, "Multi-stream CNN: Learning representations based on human-related regions for action recognition," Pattern Recognition, vol. 79, pp. 32-43, 2018, doi: 10.1016/j.patcog.2018.01.020.
Index Terms

Computer Science
Information Sciences

Keywords

CNN, SVM, MiCT, Glimpse Clouds, Two-stream Bayesian Encoder-Decoder, Pose estimation, Heat Maps, ConvLSTM, Two-stream 3D CNN, TTL, T3D.