Skeleton-based Human Action Recognition with Recurrent Neural Network
University of Science, VNU-HCM
Faculty of Information Technology
Advanced Program in Computer Science
Võ Trần Thanh Lương
1551020
Vũ Hoàng Quân
1551026
Thesis Advisor:
Dr. Trần Thái Sơn
Ho Chi Minh City
August 18, 2019
Introduction
Every human action serves a purpose, and machines should be able to learn and understand these actions.
Motivation
Contributions
Related Work
Zhuowen Lv, Xianglei Xing, Kejun Wang, and Donghai Guan, "Class Energy Image Analysis for Video Sensor-Based Gait Recognition: A Review,"
An example of local representations for human action
Georgios D. Evangelidis, Gurkirt Singh, Radu
Horaud, "Continuous Gesture Recognition from
Articulated Poses,"
Spatio-temporal interest points. S.F. Wong, T.-K. Kim, and R. Cipolla, "Learning motion
categories using both semantic and structural information"
Example of 3D convolution. Karen Simonyan and Andrew Zisserman, "Two-Stream Convolutional Networks for Action Recognition in Videos,"
Hybrid network for temporal modeling. Lisa Anne Hendricks, Marcus Rohrbach,
Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell,
"Long-term recurrent convolutional networks for visual recognition and description,"
Jun Liu, Gang Wang, Ling-Yu Duan, Kamila Abdiyeva, Alex C. Kot, "Skeleton-Based Human Action
Recognition with Global Context-Aware Attention LSTM Networks,"
Proposed Method
Temporal RNN
Spatial RNN
3D Transformation
Training Flow
Diagram boxes: NTU Dataset; Kinetics Dataset; Raw video; Extract Skeleton Data (using OpenPose); Skeleton Dataset; Training Set; Testing Set; Initialize RNN; Feature Extraction; Training; Softmax Classification; Trained Model
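The recurrent and softmax stages named in this flow can be illustrated with a minimal sketch. It assumes a plain (Elman-style) recurrent cell with random weights and a made-up skeleton sequence. The helper names, the layer sizes, and the choice of cell are illustrative assumptions and are not taken from the thesis, whose actual architecture these slides do not specify.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_classify(frames, w_in, w_rec, w_out):
    """Run a vanilla RNN over per-frame joint features, then apply softmax.

    frames: one feature vector per video frame (flattened joint coordinates).
    """
    hidden = [0.0] * len(w_rec)
    for frame in frames:
        pre = [a + b for a, b in zip(matvec(w_in, frame), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return softmax(matvec(w_out, hidden))


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights standing in for trained ones."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_joints, hidden_size, num_classes = 25, 8, 4  # illustrative sizes only
feature_size = num_joints * 3  # (x, y, z) per joint

w_in = random_matrix(hidden_size, feature_size, rng)
w_rec = random_matrix(hidden_size, hidden_size, rng)
w_out = random_matrix(num_classes, hidden_size, rng)

# A fake 10-frame skeleton sequence standing in for real extracted data.
frames = [[rng.uniform(-1.0, 1.0) for _ in range(feature_size)] for _ in range(10)]

probs = rnn_classify(frames, w_in, w_rec, w_out)
predicted_class = max(range(num_classes), key=lambda c: probs[c])
```

In a real pipeline the weights would come from training on the training set, and each frame vector would hold the joint coordinates extracted from the dataset or from OpenPose output.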
Prediction Flow
Diagram boxes: Raw Video; Extract Skeleton Data (using OpenPose); Load Trained Model; Predicted Value
Experiments
NTU RGB+D and NTU RGB+D 120 Dataset
Skeleton Joints Position (NTU Dataset)
Kinetics Dataset
Sample frames of Kinetics Dataset
Skeleton Joints Position (Coco Model)
Proposed Method Result
Comparison with the state-of-the-art
Accuracy Calculation
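The slide's own formula did not survive extraction. As a hedged illustration only, a common way to score action classifiers is top-k accuracy: the fraction of samples whose true label is among the k highest-scoring classes. The function name and the toy scores below are assumptions for this sketch, not the thesis's evaluation code or results.

```python
def top_k_accuracy(score_rows, labels, k=1):
    """Fraction of samples whose true label is among the k best-scored classes."""
    if not labels or len(score_rows) != len(labels):
        raise ValueError("score_rows and labels must be non-empty and equal in length")
    hits = 0
    for scores, label in zip(score_rows, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy per-class scores for three samples (made up for illustration).
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.5, 0.4],
    [0.2, 0.3, 0.5],
]
toy_labels = [0, 2, 2]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With k=1 this reduces to ordinary classification accuracy (correct predictions divided by total samples).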
Problems with Kinetics dataset
Data type | NTU RGB+D | NTU RGB+D 120 | Kinetics
Raw videos | Yes | Yes | No (can be obtained from the given URLs)
3D skeleton data | Yes | Yes | No
Depth maps | Yes | Yes | No
Demo
Conclusion
Future Work
Jun Liu, Gang Wang, Ling-Yu Duan, Kamila Abdiyeva, Alex C. Kot, "Skeleton-Based Human
Action Recognition with Global Context-Aware Attention LSTM Networks,"
APSR Framework. Jun Liu, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan,
Alex C. Kot, "NTU RGB+D 120: A Large-Scale Benchmark for 3D
Human Activity Understanding,"
THANK YOU FOR YOUR ATTENTION