HUMAN ACTION RECOGNITION
- INTRODUCTION OF HUMAN ACTION RECOGNITION
- ISSUES OF SKELETON-BASED ACTION RECOGNITION
- RESEARCH RELATED TO SKELETON-BASED ACTION RECOGNITION
Inwoong Lee, Ph.D. candidate, Yonsei University
HUMAN ACTION RECOGNITION
- INTRODUCTION OF HUMAN ACTION RECOGNITION
Introduction of Human Action Recognition
RGB-based Human Action Recognition
■ "Two-Stream Convolutional Networks for Action Recognition in Videos" (NIPS 2014)
  - Accuracy: UCF-101 (88.0 %), HMDB-51 (59.4 %)
■ State of the art as of CVPR 2017: UCF-101 (94.9 %), HMDB-51 (72.2 %)
■ These methods map video directly to an action label and do not model human pose
RGB-based 2D Human Pose Estimation
■ Hand, Face, and Body Keypoint Detection (CVPR 2017)
  - Provides more context information than raw RGB video alone
RGB-based 3D Human Pose Estimation
■ Recurrent 3D Pose Sequence Machines (CVPR 2017)
  - A 3D pose carries more information than a 2D pose
RGB-D based 3D Human Pose Estimation
■ Microsoft Kinect version 2.0
  - More accurate than RGB-based 3D skeletons because a depth channel is available
[Figure: RGB image with 2D skeleton; 3D skeleton]
HUMAN ACTION RECOGNITION
- ISSUES OF SKELETON-BASED ACTION RECOGNITION
Issues of Skeleton-based Action Recognition
Attributes of Skeleton extracted from Camera
■ Variable scale (small vs. large skeletons)
■ Variable view orientation (Views 1, 2, and 3, each with its own x, y, z axes)
■ Very noisy joint positions
Attributes of Human Action
■ Rate variation: the same action may span 3 frames (fast) or 5 frames (slow)
■ Intra-action variation: e.g., straight punch vs. curved punch
HUMAN ACTION RECOGNITION
- RESEARCH RELATED TO SKELETON-BASED ACTION RECOGNITION
: ENSEMBLE DEEP LEARNING USING TS-LSTM NETWORKS
Ensemble Deep Learning using TS-LSTM networks
Overview of the proposed deep learning network [1]
[Figure: four-stage pipeline. Coordinate Transformation (translation, rotation, scale) → Salient Motion Extraction (Motion 1, Motion 2, ..., Motion N) → Discriminative Multi-term LSTMs (multi-term diversity axis) → Ensemble of Deep Learning (softmax axis)]
[1] Inwoong Lee, Doyoung Kim, Seoungyoon Kang, and Sanghoon Lee, "Ensemble Deep Learning for Skeleton-based Action Recognition using Temporal Sliding LSTM networks," in ICCV 2017.
Feature Representation (1/3)
■ Absolute skeleton position
  - Raw joint coordinates of the skeleton
  - High orientation and location variation within the same action
■ Relative skeleton position
  - Joint coordinates expressed relative to a reference joint of each frame
  - Low orientation and location variation within the same action
  - Simplifies temporal skeleton movement, since whole-body displacement is removed (e.g., a jump)
■ Absolute + relative skeleton position
  - Joint coordinates expressed relative to a reference joint of the initial frame
  - Low orientation and location variation within the same action
  - Preserves temporal skeleton movement
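The three representations above can be sketched in NumPy. This is a minimal illustration, not the authors' code: the `(T, J, 3)` array layout, the function names, and the choice of joint 0 as the reference joint are assumptions made here.

```python
import numpy as np

def absolute_position(seq):
    """Raw joint coordinates; seq has shape (T frames, J joints, 3)."""
    return np.asarray(seq, dtype=float)

def relative_per_frame(seq, ref_joint=0):
    """Subtract the reference joint of EACH frame (assumed: joint 0)."""
    seq = np.asarray(seq, dtype=float)
    return seq - seq[:, ref_joint:ref_joint + 1, :]

def relative_to_initial(seq, ref_joint=0):
    """Subtract the reference joint of the INITIAL frame only."""
    seq = np.asarray(seq, dtype=float)
    return seq - seq[0, ref_joint, :]

# Toy "jump": two joints, and the whole body rises by 0.5 in y at frame 1.
jump = np.zeros((3, 2, 3))
jump[:, 1, 0] = 1.0      # second joint sits 1.0 to the side of the first
jump[1, :, 1] += 0.5     # whole-body vertical displacement in the middle frame

per_frame = relative_per_frame(jump)    # jump disappears: all frames identical
initial = relative_to_initial(jump)     # jump is kept: frame 1 has y = 0.5
```

In the toy sequence, the per-frame version makes every frame identical, which is the "simplification" noted on the slide, while the initial-frame version keeps the vertical displacement.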
Feature Representation (2/3)
■ Pose orientation alignment
  - Approach: transform the coordinate frame so that the y axis is the vector vertical to the ground, the x axis is the left direction of the initial skeleton, and the z axis is the cross product of x and y.
  - Effect: this achieves view/orientation invariance and retains the temporal relation between skeletons.
[Figure: initial skeleton of each sequence (Sequence 1, Sequence 2) aligned to common x, y, z axes]
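A rough sketch of this alignment follows. The slides do not say how the "left direction" or the ground normal is obtained, so this example assumes the left direction is the vector from a right-side joint to a left-side joint in the first frame, and that the vertical direction is supplied by the caller. The function name and joint indices are hypothetical, and only the rotation is handled (no translation or scale).

```python
import numpy as np

def align_to_initial_pose(seq, left_idx, right_idx, up=(0.0, 1.0, 0.0)):
    """Rotate a (T, J, 3) sequence into axes defined by its initial skeleton.

    y: vertical to the ground (assumed known and passed in as `up`)
    x: left direction of the initial skeleton (assumed: right joint -> left joint)
    z: cross product of x and y
    """
    seq = np.asarray(seq, dtype=float)
    y = np.array(up, dtype=float)
    y /= np.linalg.norm(y)
    x = seq[0, left_idx] - seq[0, right_idx]
    x -= np.dot(x, y) * y            # keep x parallel to the ground
    x /= np.linalg.norm(x)
    z = np.cross(x, y)
    rotation = np.stack([x, y, z])   # rows are the new axes
    return seq @ rotation.T          # the same rotation is applied to every frame
```

Because one rotation, computed from the first frame, is applied to all frames, the relative motion between frames is preserved. Two recordings of the same motion that differ only by a rotation about the vertical axis map to the same coordinates.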
Feature Representation (3/3)
■ Motion feature extraction
  - Approach: take the difference between two skeleton frames.
  - Effect: the motion feature captures the actual movement of the skeleton joints.
[Figure: difference between two skeletons]
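The frame difference can be written in one line. The sketch below assumes a `(T, J, 3)` array and introduces a `gap` parameter (the number of frames between the two skeletons) for illustration; the slides do not state which gaps are used.

```python
import numpy as np

def motion_feature(seq, gap=1):
    """Difference between skeleton frames that are `gap` frames apart.

    seq: array of shape (T, J, 3); returns an array of shape (T - gap, J, 3).
    """
    seq = np.asarray(seq, dtype=float)
    if not 1 <= gap < len(seq):
        raise ValueError("gap must satisfy 1 <= gap < number of frames")
    return seq[gap:] - seq[:-gap]
```

Using several values of `gap` would give several motion features (Motion 1 ... Motion N in the overview), each emphasizing movement at a different temporal scale.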
Modeling of Human Action (1/8)
■ Traditional work (1/4)
  - "Mining Actionlet Ensemble for Action Recognition with Depth Cameras" (CVPR 2012)
  - Fourier temporal pyramids: in addition to the global Fourier coefficients, the action is recursively partitioned into a pyramid
  - Mining actionlet ensemble
  - Feature pooling
  - Support vector machine
Modeling of Human Action (2/8)
■ Traditional work (2/4)
  - "Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group" (CVPR 2014)
  - Feature representation on a manifold, temporal alignment through dynamic time warping, and SVM classification using FTP
SVM: Support Vector Machine
FTP: Fourier Temporal Pyramid
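Dynamic time warping, used above for temporal alignment, addresses the rate variation issue raised earlier. The following is the textbook dynamic-programming formulation with a Euclidean frame distance, shown for illustration only; it is not taken from the cited paper.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors.

    a: shape (N, D), b: shape (M, D). Frames may be repeated or skipped in
    time, so a slow and a fast execution of one action can still match.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

fast = [[0.0], [1.0], [2.0]]                  # 3 frames per action
slow = [[0.0], [0.0], [1.0], [1.0], [2.0]]    # same action over 5 frames
```

Here `dtw_distance(fast, slow)` is zero because the slow sequence is only a time-stretched copy of the fast one.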
Modeling of Human Action (3/8)
■ Traditional work (3/4)
  - "Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition" (CVPR 2015)
  - Spatial modeling: body-part-based features; temporal modeling: recurrent networks
Modeling of Human Action (4/8)
■ Traditional work (4/4)
  - "Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition" (ECCV 2016)
  - Combines spatial and temporal features using LSTM
LSTM: Long Short-Term Memory
Modeling of Human Action (5/8)
■ Temporal Sliding LSTM (TS-LSTM)
  - A traditional LSTM, unrolled over the whole input sequence, captures only long-term dependency.
  - TS-LSTM can capture short-term, medium-term, and long-term dependencies.
  - TS-LSTM is adapted to each dependency range by controlling the temporal stride and the internal LSTM time-step size.
[Figure: traditional LSTM (one LSTM chain over the input sequence) vs. proposed TS-LSTM (LSTM windows sliding over the input sequence with a temporal stride); markers distinguish LSTM inputs from LSTM outputs]
LSTM: Long Short-Term Memory
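The role of the temporal stride and the time-step size can be sketched as plain windowing. The example covers only how a sequence is cut into windows; each window would then be fed to an LSTM, which is omitted here. The function name and the window/stride values are hypothetical and are not the settings used in [1].

```python
import numpy as np

def sliding_windows(features, window, stride):
    """Cut a (T, D) feature sequence into windows for a TS-LSTM-style model.

    window: internal LSTM time-step size (frames seen by one LSTM pass)
    stride: temporal stride between the starts of consecutive windows
    Returns an array of shape (num_windows, window, D).
    """
    features = np.asarray(features, dtype=float)
    if window < 1 or stride < 1 or window > len(features):
        raise ValueError("need 1 <= window <= T and stride >= 1")
    starts = range(0, len(features) - window + 1, stride)
    return np.stack([features[s:s + window] for s in starts])

# Hypothetical settings for three dependency ranges over a 24-frame sequence.
frames = np.arange(24, dtype=float).reshape(24, 1)
short = sliding_windows(frames, window=4, stride=4)       # many short windows
medium = sliding_windows(frames, window=8, stride=8)      # fewer, longer windows
long_term = sliding_windows(frames, window=24, stride=24) # one full-length window
```

A small window limits each LSTM pass to local motion, while a window equal to the sequence length reduces to the traditional LSTM case.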
Modeling of Human Action (6/8)
■ Conceptual diagram of Ensemble TS-LSTM v1
  - 1 short-term TS-LSTM with 1 motion feature
  - 2 medium-term TS-LSTMs with 2 motion features
  - 3 long-term TS-LSTMs with 3 motion features
[Figure: input motion features feed rows of Short, Medium, and Long TS-LSTM blocks, whose ensemble features are combined into the output]
Modeling of Human Action (7/8)
■ Conceptual diagram of Ensemble TS-LSTM v2
  - 1 short-term TS-LSTM with 1 motion feature
  - 2 medium-term TS-LSTMs with 2 motion features
  - 3 long-term TS-LSTMs with 3 motion features
  - 1 medium-term TS-LSTM with 1 pose feature
[Figure: as in v1, input motion features feed rows of Short, Medium, and Long TS-LSTM blocks; an additional Medium row takes the pose feature as input, and all ensemble features are combined into the output]
Modeling of Human Action (8/8)
■ Actual Ensemble TS-LSTM v1 & Ensemble TS-LSTM v2
Used Datasets
■ MSR Action3D dataset (widely used)
  - 20 actions, each performed 2 or 3 times by 10 subjects
■ Northwestern-UCLA dataset (3 views)
  - 10 actions, each performed 1 to 6 times by 10 subjects
Abbreviation Definition
■ Human Cognitive Coordinate (HCC)
  - y axis → the vector vertical to the ground, x axis → the left direction of the initial skeleton, z axis → the cross product of x and y
■ Salient Motion Feature (SMF)
  - Difference features between two skeleton frames
Results and Comparisons (1/2)
■ Bag of 3D points: projection of the sampled 3D points
■ Lie group: manifold-feature-based SVM model
■ HBRNN: body-part-based LSTM model
■ ST-LSTM + Trust Gate: spatio-temporal LSTM model with trust gates
[Table: experimental result comparison on the MSR Action3D dataset; the best performance is marked]
SVM: Support Vector Machine
Results and Comparisons (2/2)
■ Lie group: manifold-feature-based SVM model
■ Actionlet ensemble: temporal pyramid features + SVM model
■ HBRNN-L: body-part-based LSTM model
■ Enhanced skeleton visualization: CNN-based model
[Table: experimental results on the Northwestern-UCLA dataset; the best performance is marked]
CNN: Convolutional Neural Network
Result Analysis (1/3)
■ Misclassified actions
  - Forward punch → Tennis serve
  - High throw → Hammer, Tennis serve
  - Pickup & throw → Bend
■ Ensemble TS-LSTM v1 separates these similar actions to some degree by using multiple TS-LSTM networks
[Figure: confusion matrix of AS1 (Ensemble TS-LSTM v1); axes: ground truth vs. prediction]
Result Analysis (2/3)
■ Softmax feature analysis of Ensemble TS-LSTM v1 (1/2)
  - Overall, the diagonal probabilities of Softmax2 (long-term LSTMs) are higher than those of Softmax0 (short-term LSTMs) and Softmax1 (medium-term LSTMs)
  - The global temporal features have relatively more influence on performance than the local temporal features
[Figure: confusion matrices for Softmax0 (short), Softmax1 (medium), and Softmax2 (long)]
Result Analysis (3/3)
■ Softmax feature analysis of Ensemble TS-LSTM v1 (2/2): weakness compensation
  - Softmax0 and Softmax1 sometimes produce lower misclassification rates than Softmax2
  - For example, Softmax0 and Softmax1 are less likely than Softmax2 to misclassify "Pickup & throw" as "Bend"
  - This makes the model less prone to overfitting to certain actions
[Figure: confusion matrices for Softmax0 (short), Softmax1 (medium), and Softmax2 (long)]
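The compensation effect can be illustrated with a toy ensemble. The slides do not state how the softmax outputs are combined, so the sketch below assumes a plain average of the per-branch probabilities; the logits and class order are invented for the example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def ensemble_predict(branch_logits):
    """Average the class probabilities of all branches (an assumed rule)."""
    probs = np.mean([softmax(l) for l in branch_logits], axis=0)
    return probs, int(np.argmax(probs))

# Invented logits for classes [other, "Pickup & throw", "Bend"].
short_logits = [0.0, 2.0, 0.0]    # Softmax0 branch favors "Pickup & throw"
medium_logits = [0.0, 2.0, 0.5]   # Softmax1 branch favors "Pickup & throw"
long_logits = [0.0, 1.0, 1.2]     # Softmax2 branch narrowly prefers "Bend"

probs, label = ensemble_predict([short_logits, medium_logits, long_logits])
```

In this toy case the long-term branch alone would predict "Bend", but the averaged probabilities select "Pickup & throw", mirroring the weakness compensation described above.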
Action Sequence Prediction (Yonsei Dataset)
Action label order: 1. Fall Down → 2. Sit Down → 3. Stand Up → 4. Wave Hands → 5. Hands On The Head → 6. Hunker Down → 7. Punch → 8. Kick → 9. Wield Knife → 10. Aim Handgun → 11. Aim Rifle → 12. Throw → 13. Kick Object
Remaining Issues
■ Network design for skeleton-based action recognition
  - Advanced ensemble TS-LSTM networks
■ Untrimmed skeleton-based action recognition
  - Action classification + action localization
  - Real-time action detection system
■ Human pose estimation
  - 2D/3D skeleton estimation from RGB images
  - Skeleton tracking in video
■ Other types of human action recognition
  - Human-object interaction analysis
  - Action video question answering
  - Multi-person action recognition
Q & A
Thank you

Human Action Recognition

  • 1. HUMAN ACTION RECOGNITION - INTRODUCTION OF HUMAN ACTION RECOGNITION - ISSUES OF SKELETON-BASED ACTION RECOGNITION - RESEARCH RELATED TO SKELETON-BASED ACTION RECOGNITION 1 연세대 박사과정 이인웅
  • 2. HUMAN ACTION RECOGNITION - INTRODUCTION OF HUMAN ACTION RECOGNITION
  • 3. Introduction of Human Action Recognition  RGB based Human Action Recognition ■ Two-stream convolutional networks for action recognition in videos, in NIPS 2014  UCF-101 (88.0 %), HMDB-51 (59.4 %) ■ Currently, UCF-101 (94.9 %), HMDB-51 (72.2 %) in CVPR 2017 ■ Focusing on mapping video into action label, not human pose 3
  • 4. Introduction of Human Action Recognition  RGB based 2D Human Pose Estimation ■ Hand Face and Body Keypoint Detection (CVPR 2017)  More context information than just RGB video 4
  • 5. Introduction of Human Action Recognition  RGB based 3D Human Pose Estimation ■ Recurrent 3D Pose Sequence Machine (CVPR 2017)  More information (3D) than 2D human pose 5
  • 6. Introduction of Human Action Recognition  RGB-D based 3D Human Pose Estimation ■ Microsoft Kinect version 2.0  More accurate than RGB based 3D skeleton because of depth 6 RGB with 2D Skeleton 3D Skeleton
  • 7. HUMAN ACTION RECOGNITION - ISSUES OF SKELETON-BASED ACTION RECOGNITION
  • 8. Issues of Skeleton-based Action Recognition  Attributes of Skeleton extracted from Camera 8 z Variable Scale View 1 x y x y z View 2 Variable View Orientation View 3z x y small large Very Noisy
  • 9. Issues of Skeleton-based Action Recognition  Attributes of Human Action 9 Rate Variation 5 frames per 1 action 3 frames per 1 action fast slow Intra-action Variation Straight Punch Curved Punch
  • 10. HUMAN ACTION RECOGNITION - RESEARCH RELATED TO SKELETON-BASED ACTION RECOGNITION : ENSEMBLE DEEP LEARNING USING TS-LSTM NETWORKS
  • 11. Ensemble Deep Learning using TS-LSTM networks  Overview of the proposed deep learning network [1] 11 Coordinate Transformation Salient Motion Extraction Discriminative Multi-term LSTMs Ensemble of Deep Learning LSTM LSTM LSTM LSTM LSTM LSTM LSTM LSTM Multi-termDiversityaxis Softmax axis Motion 1 Motion 2 Motion N LSTM LSTM … Ensemble Translation Rotation Scale LSTM LSTM [1] Inwoong Lee, Doyoung Kim, Seoungyoon Kang, and Sanghoon Lee, "Ensemble Deep Learning for Skeleton-based Action Recognition using Temporal Sliding LSTM networks," in ICCV 2017.
  • 12.  Feature Representation (1/3) ■ Absolute skeleton position  Joint coordinates of raw skeleton  High orientation and location variations at the same action ■ Relative skeleton position  Joint difference coordinates from a reference joint of each frame  Low orientation and location variations at the same action  Simplification of temporal skeleton movements (ex: Jump) ■ Absolute + Relative skeleton position  Joint difference coordinates from a reference joint of initial frame  Low orientation and location variations at the same action  Reflection of temporal skeleton movements 12 Ensemble Deep Learning using TS-LSTM networks
  • 13. Ensemble Deep Learning using TS-LSTM networks  Feature Representation (2/3) ■ Pose orientation alignment 13 Trial: We transform y axis  the vector vertical to the ground, x axis  the left direction of initial skeleton, and z axis  the cross product of x and y. Effect: This trial achieves view/orientation invariance and retains temporal relation between skeletons. Initial skeleton of each sequence Sequence 1 Sequence 2 x z y
  • 14. Ensemble Deep Learning using TS-LSTM networks  Feature Representation (3/3) ■ Motion feature extraction 14 Trial: We obtain a difference between two skeleton frames. Effect: Motion feature can capture the actual movements of skeleton joints. Difference between two skeletons
  • 15.  Modeling of Human Action (1/8) ■ Traditional work (1/4)  Mining Actionlet Ensemble for Action Recognition with Depth Cameras, in CVPR 2012  Fourier temporal pyramids  In addition to the global fourier coefficients, they recursively partition the action into a pyramid  Mining actionlet ensemble  Feature pooling  Support vector machine 15 Ensemble Deep Learning using TS-LSTM networks
  • 16.  Modeling of Human Action (2/8) ■ Traditional work (2/4)  Human Action Recognition by Representing 3D Skeletons as Points in A Lie Group, in CVPR 2014  Feature representation using manifold, temporal alignment through dynamic time warping, and SVM classification using FTP 16 Ensemble Deep Learning using TS-LSTM networks SVM: Support Vector Machine FTP: Fourier Temporal Pyramid
  • 17.  Modeling of Human Action (3/8) ■ Traditional work (3/4)  Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition, in CVPR 2015  Spatial: body part based features, temporal: recurrent networks 17 Ensemble Deep Learning using TS-LSTM networks
  • 18. Modeling of Human Action (4/8)
    ■ Traditional work (4/4)
      - Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition, in ECCV 2016
      - Combination of spatial and temporal features using LSTM
    LSTM: Long Short-Term Memory
  • 19. Modeling of Human Action (5/8)
    ■ Temporal Sliding LSTM (TS-LSTM)
      - A traditional LSTM applied over the whole sequence captures only long-term dependency.
      - TS-LSTM can capture short-term, medium-term, and long-term dependencies.
      - TS-LSTM can be adapted to each kind of dependency by controlling the temporal stride and the internal LSTM time-step size.
    [Figure: traditional LSTM vs. proposed TS-LSTM, showing LSTM inputs and outputs along the input sequence and the temporal stride]
    LSTM: Long Short-Term Memory
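How the temporal stride and the LSTM time-step size select frames can be sketched as window indexing. The function name and the concrete window and stride values are assumptions chosen for illustration; the slides do not give the actual settings.

```python
def sliding_windows(n_frames, window, stride):
    """(start, end) frame ranges, end exclusive, that fit fully inside the sequence."""
    return [(s, s + window) for s in range(0, n_frames - window + 1, stride)]

n_frames = 12
long_term = sliding_windows(n_frames, window=12, stride=12)   # one window over everything
medium_term = sliding_windows(n_frames, window=6, stride=3)   # overlapping mid-size windows
short_term = sliding_windows(n_frames, window=3, stride=3)    # many small windows
```

Each window would be fed to its own LSTM pass, so a small `window` exposes short-term dependencies and a window covering the full sequence reproduces the traditional long-term case.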
  • 20. Modeling of Human Action (6/8)
    ■ Conceptual diagram of Ensemble TS-LSTM v1
      - 1 short-term TS-LSTM with 1 motion feature
      - 2 medium-term TS-LSTMs with 2 motion features
      - 3 long-term TS-LSTMs with 3 motion features
    [Figure: short-, medium-, and long-term TS-LSTM branches between input and output; legend: motion feature, ensemble feature]
  • 21. Modeling of Human Action (7/8)
    ■ Conceptual diagram of Ensemble TS-LSTM v2
      - 1 short-term TS-LSTM with 1 motion feature
      - 2 medium-term TS-LSTMs with 2 motion features
      - 3 long-term TS-LSTMs with 3 motion features
      - 1 medium-term TS-LSTM with 1 pose feature
    [Figure: short-, medium-, and long-term TS-LSTM branches between input and output, plus a medium-term pose branch; legend: motion feature, pose feature, ensemble feature]
  • 22. Modeling of Human Action (8/8)
    ■ Actual Ensemble TS-LSTM v1 & Ensemble TS-LSTM v2
    [Figure: network diagrams of the actual Ensemble TS-LSTM v1 and v2]
  • 23. Used Datasets
    ■ MSR Action3D dataset (widely used)
      - 20 actions performed by 10 subjects, each 2 or 3 times
    ■ Northwestern-UCLA dataset (3 views)
      - 10 actions performed by 10 subjects, each 1 to 6 times
    Abbreviation Definition
    ■ Human Cognitive Coordinate (HCC)
      - y axis: the vector vertical to the ground; x axis: the left direction of the initial skeleton; z axis: the cross product of x and y
    ■ Salient Motion Feature (SMF)
      - Difference features between two skeleton frames
  • 24. Results and Comparisons (1/2)
    ■ Bag of 3D points: projection of the sampled 3D points
    ■ Lie group: manifold-feature-based SVM model
    ■ HBRNN: body-part-based LSTM model
    ■ ST-LSTM + Trust Gate: Spatio-Temporal LSTM model with Trust Gate
    [Table: experimental result comparison on the MSR Action3D dataset, with the best performance marked]
    SVM: Support Vector Machine
  • 25. Results and Comparisons (2/2)
    ■ Lie group: manifold-feature-based SVM model
    ■ Actionlet ensemble: temporal pyramid features + SVM model
    ■ HBRNN-L: body-part-based LSTM model
    ■ Enhanced skeleton visualization: CNN-based model
    [Table: experimental results on the Northwestern-UCLA dataset, with the best performance marked]
    CNN: Convolutional Neural Network
  • 26. Result Analysis (1/3)
    ■ Misclassified actions
      - Forward punch → Tennis serve
      - High throw → Hammer, Tennis serve
      - Pickup & throw → Bend
    ■ Ensemble TS-LSTM v1 still separates these similar actions to some degree by using multiple TS-LSTM networks
    [Figure: confusion matrix of AS1 (Ensemble TS-LSTM v1); axes: ground truth, prediction]
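A confusion matrix like the one analyzed on this slide can be computed as sketched here. The labels are action names from the slide, but the ground-truth and prediction lists are invented toy data, and the convention of rows for ground truth and columns for predictions is an assumption.

```python
def confusion_matrix(truth, pred, labels):
    """Counts with one row per ground-truth label and one column per predicted label."""
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(truth, pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def confusions(matrix, labels):
    """Off-diagonal (truth, prediction, count) entries, most frequent first."""
    pairs = [(labels[i], labels[j], matrix[i][j])
             for i in range(len(labels)) for j in range(len(labels))
             if i != j and matrix[i][j] > 0]
    return sorted(pairs, key=lambda item: -item[2])

labels = ["Forward punch", "Tennis serve", "Pickup & throw", "Bend"]
truth = ["Pickup & throw", "Pickup & throw", "Bend", "Forward punch"]   # toy data
pred = ["Bend", "Pickup & throw", "Bend", "Tennis serve"]               # toy data
matrix = confusion_matrix(truth, pred, labels)
mistakes = confusions(matrix, labels)
```

The diagonal of `matrix` counts correct predictions, and `mistakes` lists which action was taken for which other action.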
  • 27. Result Analysis (2/3)
    ■ Softmax feature analysis of Ensemble TS-LSTM v1 (1/2)
      - Overall, the diagonal probabilities of Softmax2 (long-term LSTMs) are higher than those of Softmax0 (short-term LSTMs) and Softmax1 (medium-term LSTMs)
      - The global temporal features have relatively more influence on performance than the local temporal features
    [Figure: Softmax0 (short), Softmax1 (medium), and Softmax2 (long) probability matrices]
  • 28. Result Analysis (3/3)
    ■ Softmax feature analysis of Ensemble TS-LSTM v1 (2/2): weakness compensation
      - Softmax0 and Softmax1 sometimes produce lower misclassification rates than Softmax2
      - This makes the model less prone to overfitting to certain actions
      - For example, Softmax0 and Softmax1 assign lower probabilities than Softmax2 to misclassifying "Pickup & throw" as "Bend"
    [Figure: Softmax0 (short), Softmax1 (medium), and Softmax2 (long) probability matrices]
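The compensation effect can be illustrated with a toy calculation. It assumes that the three softmax outputs are combined by simple averaging, which these slides do not state, and every probability value is invented for the example.

```python
def average_probabilities(outputs):
    """Element-wise mean of several class-probability vectors (assumed fusion rule)."""
    n = len(outputs)
    return [sum(col) / n for col in zip(*outputs)]

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

classes = ["Pickup & throw", "Bend"]
# Invented probabilities for one "Pickup & throw" sample.
softmax0 = [0.70, 0.30]   # short-term branch: correct
softmax1 = [0.60, 0.40]   # medium-term branch: correct
softmax2 = [0.45, 0.55]   # long-term branch: confuses the sample with "Bend"

fused = average_probabilities([softmax0, softmax1, softmax2])
long_only = classes[argmax(softmax2)]
ensemble = classes[argmax(fused)]
```

With these invented numbers the long-term branch alone predicts "Bend", while the averaged output predicts "Pickup & throw".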
  • 29. Action Sequence Prediction (Yonsei Dataset)
    ■ Action label order
      1. Fall Down
      2. Sit Down
      3. Stand Up
      4. Wave Hands
      5. Hands On The Head
      6. Hunker Down
      7. Punch
      8. Kick
      9. Wield Knife
      10. Aim Handgun
      11. Aim Rifle
      12. Throw
      13. Kick Object
  • 30. Remaining Issues
    ■ Network design for skeleton-based action recognition
      - Advanced ensemble TS-LSTM networks
    ■ Untrimmed skeleton-based action recognition
      - Action classification + action localization
      - Real-time action detection system
    ■ Human pose estimation
      - 2D/3D skeleton estimation from RGB images
      - Skeleton tracking in video
    ■ Other types of human action recognition
      - Human-object interaction analysis
      - Action video question answering
      - Multi-person action recognition
  • 31. Q & A Thank you