Research Article

Human Action Recognition through the First-Person Point of view, Case Study Two Basic Task

by Mohammad Almasi, Hamed Fathi, Sayed Adel Ghaeinian, Samaneh Samiee
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 177 - Number 24
Year of Publication: 2019
DOI: 10.5120/ijca2019919703

Mohammad Almasi, Hamed Fathi, Sayed Adel Ghaeinian, Samaneh Samiee. Human Action Recognition through the First-Person Point of view, Case Study Two Basic Task. International Journal of Computer Applications. 177, 24 (Dec 2019), 19-23. DOI=10.5120/ijca2019919703

@article{10.5120/ijca2019919703,
  author     = {Mohammad Almasi and Hamed Fathi and Sayed Adel Ghaeinian and Samaneh Samiee},
  title      = {Human Action Recognition through the First-Person Point of view, Case Study Two Basic Task},
  journal    = {International Journal of Computer Applications},
  issue_date = {Dec 2019},
  volume     = {177},
  number     = {24},
  month      = {Dec},
  year       = {2019},
  issn       = {0975-8887},
  pages      = {19-23},
  numpages   = {5},
  url        = {https://ijcaonline.org/archives/volume177/number24/31045-2019919703/},
  doi        = {10.5120/ijca2019919703},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}
%0 Journal Article
%A Mohammad Almasi
%A Hamed Fathi
%A Sayed Adel Ghaeinian
%A Samaneh Samiee
%T Human Action Recognition through the First-Person Point of view, Case Study Two Basic Task
%J International Journal of Computer Applications
%@ 0975-8887
%V 177
%N 24
%P 19-23
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this study, a human motion dataset of indoor and outdoor actions is built using a head-mounted camera together with an Xsens system for motion tracking. The dataset is structured so that a deep neural network can be trained on ordered sequences of frames from each performed task (washing, eating, etc.). Finally, a 3D model of the person is proposed at every frame by a second network whose structure is comparable to the first. The dataset comprises more than 120,000 frames taken from 7 different people, each acting out different tasks in diverse indoor and outdoor scenarios. The sequences of every video were synchronized with the 3D data and segmented into 23 parts.
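The abstract describes the pipeline only at a high level (per-frame visual features fed to a sequence model that labels the performed task), and the keywords point to a ResNet/LSTM combination. The sketch below is a minimal, hypothetical illustration of that kind of architecture, not the authors' implementation: the layer sizes, clip length, and the two task labels are assumptions, and a small convolutional encoder stands in for a ResNet backbone.

# Illustrative sketch only: per-frame CNN encoder + LSTM sequence classifier
# for two egocentric tasks (e.g. "washing" vs. "eating"). All sizes assumed.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Maps one RGB frame to a fixed-length feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):              # x: (batch, 3, H, W)
        return self.fc(self.conv(x).flatten(1))

class ActionRecognizer(nn.Module):
    """Encodes each frame, aggregates the sequence with an LSTM, and
    predicts the task label from the last hidden state."""
    def __init__(self, feat_dim=128, hidden=256, num_classes=2):
        super().__init__()
        self.encoder = FrameEncoder(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clip):           # clip: (batch, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])      # (batch, num_classes)

if __name__ == "__main__":
    model = ActionRecognizer()
    clip = torch.randn(4, 16, 3, 112, 112)   # 4 clips of 16 frames each
    print(model(clip).shape)                  # torch.Size([4, 2])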

References
  1. Alletto, S., Serra, G., Calderara, S., & Cucchiara, R. (2015). Understanding social relationships in egocentric vision. Pattern Recognition, 48(12), 4082-4096. DOI:10.1016/j.patcog.2015.06.006
  2. Almasi, M. (2018). Investigating the Effect of Head Movement during Running and Its Results in Record Time Using Computer Vision. International Journal of Applied Engineering Research, 13(11), 9433-9436.
  3. Cao, Z., Simon, T., Wei, S., & Sheikh, Y. (2017). Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI:10.1109/cvpr.2017.143
  4. Li, C., Wang, P., Wang, S., Hou, Y., & Li, W. (2017). Skeleton-based action recognition using LSTM and CNN. 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). DOI:10.1109/icmew.2017.8026287
  5. Damen, D., Doughty, H., Farinella, G. M., Fidler, S., Furnari, A., Kazakos, E., ... Wray, M. (2018). Scaling Egocentric Vision: The EPIC-KITCHENS Dataset. Computer Vision – ECCV 2018, 753-771. DOI:10.1007/978-3-030-01225-0_44
  6. Ekvall, S., & Kragic, D. (2006). Learning Task Models from Multiple Human Demonstrations. ROMAN 2006 - The 15th IEEE International Symposium on Robot and Human Interactive Communication. DOI:10.1109/roman.2006.314460
  7. El-Yacoubi, M. A., He, H., Roualdes, F., Selmi, M., Hariz, M., & Gillet, F. (2015). Vision-based Recognition of Activities by a Humanoid Robot. International Journal of Advanced Robotic Systems, 1. DOI: 10.5772/61819
  8. Fathi, A., Li, Y., & Rehg, J. M. (2012). Learning to Recognize Daily Actions Using Gaze. Computer Vision – ECCV 2012, 314-327. DOI: 10.1007/978-3-642-33718-5_23
  9. Hara, K., Kataoka, H., & Satoh, Y. (2018). Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. DOI:10.1109/cvpr.2018.00685
  10. Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9, 1735-1780.
  11. Jiang, H., & Grauman, K. (2017). Seeing Invisible Poses: Estimating 3D Body Pose from Egocentric Video. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI:10.1109/cvpr.2017.373
  12. Li, Y., Lan, C., Xing, J., Zeng, W., Yuan, C., & Liu, J. (2016). Online Human Action Detection Using Joint Classification-Regression Recurrent Neural Networks. Computer Vision – ECCV 2016, 203-220. DOI: 10.1007/978-3-319-46478-7_13
  13. Majd, M., & Safabakhsh, R. (2019). Correlational Convolutional LSTM for human action recognition. Neurocomputing. DOI:10.1016/j.neucom.2018.10.095
  14. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. CoRR, abs/1512.03385.
  15. Patel, D., & Upadhyay, S. (2013). Optical Flow Measurement using Lucas Kanade Method. International Journal of Computer Applications, 61(10), 6-10. DOI: 10.5120/9962-4611
  16. Pirsiavash, H., & Ramanan, D. (2012). Detecting activities of daily living in first-person camera views. 2012 IEEE Conference on Computer Vision and Pattern Recognition. DOI:10.1109/cvpr.2012.6248010
  17. Squartini, S., Hussain, A., & Piazza, F. (2003). Preprocessing based solution for the vanishing gradient problem in recurrent neural networks. Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03). DOI:10.1109/iscas.2003.1206412
  18. Yuan, Y., & Kitani, K. (2018). 3D Ego-Pose Estimation via Imitation Learning. Computer Vision – ECCV 2018, 763-778. DOI: 10.1007/978-3-030-01270-0_45
  19. Zhu, L., & Wan, W. (2018). Human Pose Estimation Based on Deep Neural Network. 2018 International Conference on Audio, Language and Image Processing (ICALIP). DOI:10.1109/icalip.2018.8455245
Index Terms

Computer Science
Information Sciences

Keywords

Machine learning, deep learning, computer vision, LSTM, recurrent neural network, ResNet, motion recognition.