SlideShare a Scribd company logo
1 of 20
Make Skeleton-based Action Recognition
Model Smaller, Faster and Better
Fan Yang1,2, Sakriani Sakti1,2, Yang Wu3 , Satoshi Nakamura1,2
1Nara Institute of Science and Technology, Japan
2RIKEN Center for Advanced Intelligence Project, Japan
3Kyoto University, Japan
0
Skeleton-based action recognition
12019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
RGB-based action
recognition
Skeleton-based
action recognition
Using JHMDB data as an example
With a comparable performance, one of the fastest RGB-based action recognition work is ECO
(Zolfaghari, et al. 2018) : 230 clips per second.
With a comparable performance, one of the reported fastest skeleton-based action recognition work
is MFA-Net (Chen et al. 2019) : 361 clips per second, about 3M parameters.
Compared with the RGB data, the skeleton data is more sparse, can we make it smaller
faster, and better?
Skeleton-based action recognition by neural network models
22019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Joints {j0, j1, …, jn}
Frames{f0,f0,…,fk}
Inputs
Actions
Neural Network
The raw skeleton sequence feature
Representation (e.g., 3D skeleton)
32019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Joints {j0, j1, …, jn}
Frames{f0,f0,…,fk}
Inputs
Selecting features
≠
joints
xyz
=
Joints 1:N-1
Joints2:N
Geometry feature
Joints 1:N-1
Joints2:N
Has been solely used in:
“Learning a 3d human pose distance metric from geo-metric pose descriptor”,
“Fusing geometric features for skeleton-based action recognition using multilayer lstm networks”,
“Enhanced skeleton visualization for view invariant human action recognition”, etc.
Cartesian coordinate feature
xyz
joints
Has been solely used in:
“Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition”
“Spatial-temporal attention res-tcn for skeleton-based dynamic hand gesture recognition”,
“Mfa-net: Motion feature augmented network for dynamic hand gesture recognition from skeletal data”, etc.
( lose global spatio-temporal information)
(contains global spatio-temporal information)
Let’s use both of them
=
Rotate and shift
Action
Obtaining global motions from Cartesian coordinate feature
42019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Actions related to global trajectories,
e.g., swipe V.
Actions unrelated to global trajectories,
e.g., pinch.
Therefore, we use Cartesian coordinates features, other than geometry features, to
calculate the motions.
Motion = -
frame k+1 frame k
It is variant to global view points but invariant to global locations.
Double-scale motions
52019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Slow Motion
Double-scale
motions
Fast motion = -
frame k+1 frame k
Slow motion = -
frame k+2 frame k
Fast Motion
Previous works only utilize a single-scale motion.
To be robust to the motion scale variance, we use double-scale motions.
Implementation in our neural network architecture
62019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Concatenate
CNN(1,2*filters)
CNN(1,filters), /2
Temporal difference
(stride=2)
CNN(1,2*filters)
CNN(1,filters), /2
CNN(1,2*filters)
CNN(1,filters)
Temporal difference
(stride=1)
Cartesian
Coordinates
Joints Collection
Distances
2×CNN(3,2*filters), /2
2×CNN(3,4*filters), /2
2×CNN(3,8*filters)
CNN(3,filters) CNN(3,filters) CNN(3,filters)
Fast
motion
Slow
motionInputs
GAP
FC(num_classes)
FC(128)
Skeleton-based action recognition by neural network models
72019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Joints {j0, j1, …, jn}
Frames{f0,f0,…,fk}
Actions
To make the model smaller faster, and better
Inputs Neural Network
Fewer parameters and FLOPs
FLOPs: floating point operations
Neural network models used in skeleton-based action recognition
82019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Reference:
[1] “Convolutional Neural Networks for Multivariate Time Series Classification using both Inter- and Intra Channel Parallel Convolutions”(1d cnn)
[2] “Spatial-temporal attention res-tcn for skeleton-based dynamic hand gesture recognition”(1d cnn)
[3] “Simple yet efficient real-time pose-based action recognition” (2d)
[4] “PoTion: Pose MoTion Representation for Action Recognition” (2d)
[5]“Mfa-net: Motion feature augmented network for dynamic hand gesture recognition from skeletal data” (lstm)
[6]“Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition” (2d cnn+lstm)
[7] “Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition” (GCNN)
2D CNN + LSTM
[6]
1D CNN
[1,2]
LSTM
[5]
2D CNN
[3,4]
Could be relatively
smaller and faster
GCNN
[7]
Skeleton is ordered point data, but the correlation is complicated
92019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
1
3
4 5
6 7
8 9
10 11
12 13
14 15
2
Point Data
Ordered Point Data
GCNN, CNN, RNN and FCN could be used
Unordered Point Data
generally GCNN is used
e.g., point cloud e.g., skeletons
However,
Adjacent indices but not
locally correlated joints:
[9, 10],
[11, 12],
[13, 14], etc.
Locally correlated but not
adjacent indices joints:
[4, 8, 12],
[5, 9, 13],
[2, 6, 7], etc.
The relationship between joints are dynamically changed.
Dynamically learning the implicit relationship of joints
102019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Raw Features (relationship of joints is fixed)
Frames{f0,f0,…,fk}
Embedding features (implicit relationship of joints are learned)
Frames{f0,f0,…,fk}
Actions
Previous works [1,2]
Dynamically learning the
implicit relationship of joints
by a neural network
Reference:
[1] “Convolutional Neural Networks for Multivariate Time Series Classification using both Inter- and Intra Channel Parallel Convolutions”(1d cnn)
[2] “Spatial-temporal attention res-tcn for skeleton-based dynamic hand gesture recognition”(1d cnn)
Implementation in our neural network architecture
112019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Concatenate
CNN(1,2*filters)
CNN(1,filters), /2
Temporal difference
(stride=2)
CNN(1,2*filters)
CNN(1,filters), /2
CNN(1,2*filters)
CNN(1,filters)
Temporal difference
(stride=1)
Cartesian
Coordinates
Joints Collection
Distances
2×CNN(3,2*filters), /2
2×CNN(3,4*filters), /2
2×CNN(3,8*filters)
CNN(3,filters) CNN(3,filters) CNN(3,filters)
Fast
motion
Slow
motion
Embedding
GAP
FC(num_classes)
FC(128)
Experimental Datasets
122019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Results on SHREC data (Hand Gesture Recognition)
132019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
e.g., swipe left gestures
Global trajectory matters
Double motions are more effective
Results on JHMDB data
142019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Actions are weakly related to
global trajectories
Double motions are more effective
Our code is released
152019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
2019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Thanks!
2019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Appendix: cartesian coordinate features
Limitation of our method:
For some cases (e.g., NTU data), Cartesian coordinates matters
182019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Appendix: parameters and FLOPs for the neural network
2D CNN:
Parameters: (Ci x K2 + 1) x Co
FLOPs: (2 x Ci x K2 – 1) x T x J x Co
Ci=input channel, k=kernel size, T=frames, J=joint number, Co=output channel.
Dense layer:
Parameters: (I +1) x O
FLOPs: (2 x I – 1) x O
I = input neuron numbers, O = output neuron numbers.
1D CNN:
Parameters: (Ci x K +1) x Co
FLOPs: (2 x Ci x K – 1) x T x Co
Ci=input channel, k=kernel size, T=frames, Co=output channel.
LSTM:
Parameters: 4 x ((I+1) x O + O^2)
FLOPs: 16 x T x H x (H – 1)
I = input size, O = output size, H = hidden size.
Appendix: motion feature
192019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
Simonyan et al. “Two-Stream Convolutional Networks for Action Recognition in Videos”
In RGB-based action recognition, motion features (i.e., optical flow) are used to improve the action
recognition performance.
In skeleton-based action recognition, cross-frame skeleton difference can be used as motion
features.
is a motion feature vector

More Related Content

What's hot

Batch normalization
Batch normalizationBatch normalization
Batch normalizationYuichiro Iio
 
(Paper Review) Abnormal Event Detection in Videos using Generative Adversaria...
(Paper Review) Abnormal Event Detection in Videos using Generative Adversaria...(Paper Review) Abnormal Event Detection in Videos using Generative Adversaria...
(Paper Review) Abnormal Event Detection in Videos using Generative Adversaria...MYEONGGYU LEE
 
Survey of Super Resolution Task (SISR Only)
Survey of Super Resolution Task (SISR Only)Survey of Super Resolution Task (SISR Only)
Survey of Super Resolution Task (SISR Only)MYEONGGYU LEE
 
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...Universitat Politècnica de Catalunya
 
(Paper Review) Reconstruction of Monte Carlo Image Sequences using a Recurren...
(Paper Review) Reconstruction of Monte Carlo Image Sequences using a Recurren...(Paper Review) Reconstruction of Monte Carlo Image Sequences using a Recurren...
(Paper Review) Reconstruction of Monte Carlo Image Sequences using a Recurren...MYEONGGYU LEE
 
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...Mohamed Elawady
 
Senseapp13 keynote
Senseapp13 keynoteSenseapp13 keynote
Senseapp13 keynoteRaja Jurdak
 
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-ResolutionTaegyun Jeon
 
[CVPRW 2020]Real world Super-Resolution via Kernel Estimation and Noise Injec...
[CVPRW 2020]Real world Super-Resolution via Kernel Estimation and Noise Injec...[CVPRW 2020]Real world Super-Resolution via Kernel Estimation and Noise Injec...
[CVPRW 2020]Real world Super-Resolution via Kernel Estimation and Noise Injec...KIMMINHA3
 
PR-146: CornerNet detecting objects as paired keypoints
PR-146: CornerNet detecting objects as paired keypointsPR-146: CornerNet detecting objects as paired keypoints
PR-146: CornerNet detecting objects as paired keypointsjaewon lee
 
Transferable GAN-generated Images Detection Framework.
Transferable GAN-generated Images  Detection Framework.Transferable GAN-generated Images  Detection Framework.
Transferable GAN-generated Images Detection Framework.KIMMINHA3
 
Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向Shunta Saito
 
PR-110: An Analysis of Scale Invariance in Object Detection – SNIP
PR-110: An Analysis of Scale Invariance in Object Detection – SNIPPR-110: An Analysis of Scale Invariance in Object Detection – SNIP
PR-110: An Analysis of Scale Invariance in Object Detection – SNIPjaewon lee
 
Tomoya Sato Master Thesis
Tomoya Sato Master ThesisTomoya Sato Master Thesis
Tomoya Sato Master Thesispflab
 
Deblurring of License Plate Image using Blur Kernel Estimation
Deblurring of License Plate Image using Blur Kernel EstimationDeblurring of License Plate Image using Blur Kernel Estimation
Deblurring of License Plate Image using Blur Kernel EstimationIRJET Journal
 
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...KAIST
 
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...KAIST
 
An efficient block matching algorithm for fast motion ESTIMATION USING COMBIN...
An efficient block matching algorithm for fast motion ESTIMATION USING COMBIN...An efficient block matching algorithm for fast motion ESTIMATION USING COMBIN...
An efficient block matching algorithm for fast motion ESTIMATION USING COMBIN...Mr Santosh Kumar Chhotray
 
YolactEdge Review [cdm]
YolactEdge Review [cdm]YolactEdge Review [cdm]
YolactEdge Review [cdm]Dongmin Choi
 

What's hot (20)

Batch normalization
Batch normalizationBatch normalization
Batch normalization
 
Convolutional Features for Instance Search
Convolutional Features for Instance SearchConvolutional Features for Instance Search
Convolutional Features for Instance Search
 
(Paper Review) Abnormal Event Detection in Videos using Generative Adversaria...
(Paper Review) Abnormal Event Detection in Videos using Generative Adversaria...(Paper Review) Abnormal Event Detection in Videos using Generative Adversaria...
(Paper Review) Abnormal Event Detection in Videos using Generative Adversaria...
 
Survey of Super Resolution Task (SISR Only)
Survey of Super Resolution Task (SISR Only)Survey of Super Resolution Task (SISR Only)
Survey of Super Resolution Task (SISR Only)
 
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
 
(Paper Review) Reconstruction of Monte Carlo Image Sequences using a Recurren...
(Paper Review) Reconstruction of Monte Carlo Image Sequences using a Recurren...(Paper Review) Reconstruction of Monte Carlo Image Sequences using a Recurren...
(Paper Review) Reconstruction of Monte Carlo Image Sequences using a Recurren...
 
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...
 
Senseapp13 keynote
Senseapp13 keynoteSenseapp13 keynote
Senseapp13 keynote
 
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
 
[CVPRW 2020]Real world Super-Resolution via Kernel Estimation and Noise Injec...
[CVPRW 2020]Real world Super-Resolution via Kernel Estimation and Noise Injec...[CVPRW 2020]Real world Super-Resolution via Kernel Estimation and Noise Injec...
[CVPRW 2020]Real world Super-Resolution via Kernel Estimation and Noise Injec...
 
PR-146: CornerNet detecting objects as paired keypoints
PR-146: CornerNet detecting objects as paired keypointsPR-146: CornerNet detecting objects as paired keypoints
PR-146: CornerNet detecting objects as paired keypoints
 
Transferable GAN-generated Images Detection Framework.
Transferable GAN-generated Images  Detection Framework.Transferable GAN-generated Images  Detection Framework.
Transferable GAN-generated Images Detection Framework.
 
Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向
 
PR-110: An Analysis of Scale Invariance in Object Detection – SNIP
PR-110: An Analysis of Scale Invariance in Object Detection – SNIPPR-110: An Analysis of Scale Invariance in Object Detection – SNIP
PR-110: An Analysis of Scale Invariance in Object Detection – SNIP
 
Tomoya Sato Master Thesis
Tomoya Sato Master ThesisTomoya Sato Master Thesis
Tomoya Sato Master Thesis
 
Deblurring of License Plate Image using Blur Kernel Estimation
Deblurring of License Plate Image using Blur Kernel EstimationDeblurring of License Plate Image using Blur Kernel Estimation
Deblurring of License Plate Image using Blur Kernel Estimation
 
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
 
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
 
An efficient block matching algorithm for fast motion ESTIMATION USING COMBIN...
An efficient block matching algorithm for fast motion ESTIMATION USING COMBIN...An efficient block matching algorithm for fast motion ESTIMATION USING COMBIN...
An efficient block matching algorithm for fast motion ESTIMATION USING COMBIN...
 
YolactEdge Review [cdm]
YolactEdge Review [cdm]YolactEdge Review [cdm]
YolactEdge Review [cdm]
 

Similar to Make Skeleton Models Smaller, Faster and Better

IRJET- A Review Paper on Object Detection using Zynq-7000 FPGA for an Embedde...
IRJET- A Review Paper on Object Detection using Zynq-7000 FPGA for an Embedde...IRJET- A Review Paper on Object Detection using Zynq-7000 FPGA for an Embedde...
IRJET- A Review Paper on Object Detection using Zynq-7000 FPGA for an Embedde...IRJET Journal
 
Human Action Recognition Based on Spacio-temporal features-Poster
Human Action Recognition Based on Spacio-temporal features-PosterHuman Action Recognition Based on Spacio-temporal features-Poster
Human Action Recognition Based on Spacio-temporal features-Posternikhilus85
 
Object Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNetObject Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNetIRJET Journal
 
IRJET- Human Fall Detection using Co-Saliency-Enhanced Deep Recurrent Convolu...
IRJET- Human Fall Detection using Co-Saliency-Enhanced Deep Recurrent Convolu...IRJET- Human Fall Detection using Co-Saliency-Enhanced Deep Recurrent Convolu...
IRJET- Human Fall Detection using Co-Saliency-Enhanced Deep Recurrent Convolu...IRJET Journal
 
FCCM2020: High-Throughput Convolutional Neural Network on an FPGA by Customiz...
FCCM2020: High-Throughput Convolutional Neural Network on an FPGA by Customiz...FCCM2020: High-Throughput Convolutional Neural Network on an FPGA by Customiz...
FCCM2020: High-Throughput Convolutional Neural Network on an FPGA by Customiz...Hiroki Nakahara
 
06-08 ppt.pptx
06-08 ppt.pptx06-08 ppt.pptx
06-08 ppt.pptxFarah Naaz
 
A Literature Survey: Neural Networks for object detection
A Literature Survey: Neural Networks for object detectionA Literature Survey: Neural Networks for object detection
A Literature Survey: Neural Networks for object detectionvivatechijri
 
A NOVEL GRAPH REPRESENTATION FOR SKELETON-BASED ACTION RECOGNITION
A NOVEL GRAPH REPRESENTATION FOR SKELETON-BASED ACTION RECOGNITIONA NOVEL GRAPH REPRESENTATION FOR SKELETON-BASED ACTION RECOGNITION
A NOVEL GRAPH REPRESENTATION FOR SKELETON-BASED ACTION RECOGNITIONsipij
 
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformHuman Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformFadwa Fouad
 
Review On Different Feature Extraction Algorithms
Review On Different Feature Extraction AlgorithmsReview On Different Feature Extraction Algorithms
Review On Different Feature Extraction AlgorithmsIRJET Journal
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingDESMOND YUEN
 
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction NetworkEDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction Networkgerogepatton
 
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction NetworkEDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction Networkgerogepatton
 
PR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
PR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical FlowPR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
PR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical FlowHyeongmin Lee
 
IRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character RecognitionIRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character RecognitionIRJET Journal
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET Journal
 
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...IRJET Journal
 
Object gripping algorithm for robotic assistance by means of deep leaning
Object gripping algorithm for robotic assistance by means of deep leaning Object gripping algorithm for robotic assistance by means of deep leaning
Object gripping algorithm for robotic assistance by means of deep leaning IJECEIAES
 

Similar to Make Skeleton Models Smaller, Faster and Better (20)

IRJET- A Review Paper on Object Detection using Zynq-7000 FPGA for an Embedde...
IRJET- A Review Paper on Object Detection using Zynq-7000 FPGA for an Embedde...IRJET- A Review Paper on Object Detection using Zynq-7000 FPGA for an Embedde...
IRJET- A Review Paper on Object Detection using Zynq-7000 FPGA for an Embedde...
 
Human Action Recognition Based on Spacio-temporal features-Poster
Human Action Recognition Based on Spacio-temporal features-PosterHuman Action Recognition Based on Spacio-temporal features-Poster
Human Action Recognition Based on Spacio-temporal features-Poster
 
Object Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNetObject Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNet
 
IRJET- Human Fall Detection using Co-Saliency-Enhanced Deep Recurrent Convolu...
IRJET- Human Fall Detection using Co-Saliency-Enhanced Deep Recurrent Convolu...IRJET- Human Fall Detection using Co-Saliency-Enhanced Deep Recurrent Convolu...
IRJET- Human Fall Detection using Co-Saliency-Enhanced Deep Recurrent Convolu...
 
FCCM2020: High-Throughput Convolutional Neural Network on an FPGA by Customiz...
FCCM2020: High-Throughput Convolutional Neural Network on an FPGA by Customiz...FCCM2020: High-Throughput Convolutional Neural Network on an FPGA by Customiz...
FCCM2020: High-Throughput Convolutional Neural Network on an FPGA by Customiz...
 
06-08 ppt.pptx
06-08 ppt.pptx06-08 ppt.pptx
06-08 ppt.pptx
 
A Literature Survey: Neural Networks for object detection
A Literature Survey: Neural Networks for object detectionA Literature Survey: Neural Networks for object detection
A Literature Survey: Neural Networks for object detection
 
A NOVEL GRAPH REPRESENTATION FOR SKELETON-BASED ACTION RECOGNITION
A NOVEL GRAPH REPRESENTATION FOR SKELETON-BASED ACTION RECOGNITIONA NOVEL GRAPH REPRESENTATION FOR SKELETON-BASED ACTION RECOGNITION
A NOVEL GRAPH REPRESENTATION FOR SKELETON-BASED ACTION RECOGNITION
 
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformHuman Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
 
Ribeiro visfaces07
Ribeiro visfaces07Ribeiro visfaces07
Ribeiro visfaces07
 
Review On Different Feature Extraction Algorithms
Review On Different Feature Extraction AlgorithmsReview On Different Feature Extraction Algorithms
Review On Different Feature Extraction Algorithms
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic Computing
 
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction NetworkEDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
 
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction NetworkEDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
 
PR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
PR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical FlowPR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
PR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
 
IRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character RecognitionIRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character Recognition
 
AR/SLAM for end-users
AR/SLAM for end-usersAR/SLAM for end-users
AR/SLAM for end-users
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and Python
 
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...
 
Object gripping algorithm for robotic assistance by means of deep leaning
Object gripping algorithm for robotic assistance by means of deep leaning Object gripping algorithm for robotic assistance by means of deep leaning
Object gripping algorithm for robotic assistance by means of deep leaning
 

Recently uploaded

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 

Recently uploaded (20)

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 

Make Skeleton Models Smaller, Faster and Better

  • 1. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Fan Yang1,2, Sakriani Sakti1,2, Yang Wu3 , Satoshi Nakamura1,2 1Nara Institute of Science and Technology, Japan 2RIKEN Center for Advanced Intelligence Project, Japan 3Kyoto University, Japan 0
  • 2. Skeleton-based action recognition 12019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better RGB-based action recognition Skeleton-based action recognition Using JHMDB data as an example With a comparable performance, one of the fastest RGB-based action recognition work is ECO (Zolfaghari, et al. 2018) : 230 clips per second. With a comparable performance, one of the reported fastest skeleton-based action recognition work is MFA-Net (Chen et al. 2019) : 361 clips per second, about 3M parameters. Compared with the RGB data, the skeleton data is more sparse, can we make it smaller faster, and better?
  • 3. Skeleton-based action recognition by neural network models 22019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Joints {j0, j1, …, jn} Frames{f0,f0,…,fk} Inputs Actions Neural Network The raw skeleton sequence feature Representation (e.g., 3D skeleton)
  • 4. 32019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Joints {j0, j1, …, jn} Frames{f0,f0,…,fk} Inputs Selecting features ≠ joints xyz = Joints 1:N-1 Joints2:N Geometry feature Joints 1:N-1 Joints2:N Has been solely used in: “Learning a 3d human pose distance metric from geo-metric pose descriptor”, “Fusing geometric features for skeleton-based action recognition using multilayer lstm networks”, “Enhanced skeleton visualization for view invariant human action recognition”, etc. Cartesian coordinate feature xyz joints Has been solely used in: “Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition” “Spatial-temporal attention res-tcn for skeleton-based dynamic hand gesture recognition”, “Mfa-net: Motion feature augmented network for dynamic hand gesture recognition from skeletal data”, etc. ( lose global spatio-temporal information) (contains global spatio-temporal information) Let’s use both of them = Rotate and shift Action
  • 5. Obtaining global motions from Cartesian coordinate feature 42019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Actions related to global trajectories, e.g., swipe V. Actions unrelated to global trajectories, e.g., pinch. Therefore, we use Cartesian coordinates features, other than geometry features, to calculate the motions. Motion = - frame k+1 frame k It is variant to global view points but invariant to global locations.
  • 6. Double-scale motions 52019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Slow Motion Double-scale motions Fast motion = - frame k+1 frame k Slow motion = - frame k+2 frame k Fast Motion Previous works only utilize a single-scale motion. To be robust to the motion scale variance, we use double-scale motions.
  • 7. Implementation in our neural network architecture 62019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Concatenate CNN(1,2*filters) CNN(1,filters), /2 Temporal difference (stride=2) CNN(1,2*filters) CNN(1,filters), /2 CNN(1,2*filters) CNN(1,filters) Temporal difference (stride=1) Cartesian Coordinates Joints Collection Distances 2×CNN(3,2*filters), /2 2×CNN(3,4*filters), /2 2×CNN(3,8*filters) CNN(3,filters) CNN(3,filters) CNN(3,filters) Fast motion Slow motionInputs GAP FC(num_classes) FC(128)
  • 8. Skeleton-based action recognition by neural network models 72019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Joints {j0, j1, …, jn} Frames{f0,f0,…,fk} Actions To make the model smaller faster, and better Inputs Neural Network Fewer parameters and FLOPs FLOPs: floating point operations
  • 9. Neural network models used in skeleton-based action recognition 82019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Reference: [1] “Convolutional Neural Networks for Multivariate Time Series Classification using both Inter- and Intra Channel Parallel Convolutions”(1d cnn) [2] “Spatial-temporal attention res-tcn for skeleton-based dynamic hand gesture recognition”(1d cnn) [3] “Simple yet efficient real-time pose-based action recognition” (2d) [4] “PoTion: Pose MoTion Representation for Action Recognition” (2d) [5]“Mfa-net: Motion feature augmented network for dynamic hand gesture recognition from skeletal data” (lstm) [6]“Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition” (2d cnn+lstm) [7] “Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition” (GCNN) 2D CNN + LSTM [6] 1D CNN [1,2] LSTM [5] 2D CNN [3,4] Could be relatively smaller and faster GCNN [7]
  • 10. Skeleton is ordered point data, but the correlation is complicated 92019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better 1 3 4 5 6 7 8 9 10 11 12 13 14 15 2 Point Data Ordered Point Data GCNN, CNN, RNN and FCN could be used Unordered Point Data generally GCNN is used e.g., point cloud e.g., skeletons However, Adjacent indices but not locally correlated joints: [9, 10], [11, 12], [13, 14], etc. Locally correlated but not adjacent indices joints: [4, 8, 12], [5, 9, 13], [2, 6, 7], etc. The relationship between joints are dynamically changed.
  • 11. Dynamically learning the implicit relationship of joints 102019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Raw Features (relationship of joints is fixed) Frames{f0,f0,…,fk} Embedding features (implicit relationship of joints are learned) Frames{f0,f0,…,fk} Actions Previous works [1,2] Dynamically learning the implicit relationship of joints by a neural network Reference: [1] “Convolutional Neural Networks for Multivariate Time Series Classification using both Inter- and Intra Channel Parallel Convolutions”(1d cnn) [2] “Spatial-temporal attention res-tcn for skeleton-based dynamic hand gesture recognition”(1d cnn)
  • 12. Implementation in our neural network architecture 112019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Concatenate CNN(1,2*filters) CNN(1,filters), /2 Temporal difference (stride=2) CNN(1,2*filters) CNN(1,filters), /2 CNN(1,2*filters) CNN(1,filters) Temporal difference (stride=1) Cartesian Coordinates Joints Collection Distances 2×CNN(3,2*filters), /2 2×CNN(3,4*filters), /2 2×CNN(3,8*filters) CNN(3,filters) CNN(3,filters) CNN(3,filters) Fast motion Slow motion Embedding GAP FC(num_classes) FC(128)
  • 13. Experimental Datasets 122019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
  • 14. Results on SHREC data (Hand Gesture Recognition) 132019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better e.g., swipe left gestures Global trajectory matters Double motions are more effective
  • 15. Results on JHMDB data 142019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Actions are weakly related to global trajectories Double motions are more effective
  • 16. Our code is released 152019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better
  • 17. 2019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Thanks!
  • 18. 2019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Appendix: cartesian coordinate features Limitation of our method: For some cases (e.g., NTU data), Cartesian coordinates matters
  • 19. 182019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Appendix: parameters and FLOPs for the neural network 2D CNN: Parameters: (Ci x K2 + 1) x Co FLOPs: (2 x Ci x K2 – 1) x T x J x Co Ci=input channel, k=kernel size, T=frames, J=joint number, Co=output channel. Dense layer: Parameters: (I +1) x O FLOPs: (2 x I – 1) x O I = input neuron numbers, O = output neuron numbers. 1D CNN: Parameters: (Ci x K +1) x Co FLOPs: (2 x Ci x K – 1) x T x Co Ci=input channel, k=kernel size, T=frames, Co=output channel. LSTM: Parameters: 4 x ((I+1) x O + O^2) FLOPs: 16 x T x H x (H – 1) I = input size, O = output size, H = hidden size.
  • 20. Appendix: motion feature 192019©Fan Yang et al. Make Skeleton-based Action Recognition Model Smaller, Faster and Better Simonyan et al. “Two-Stream Convolutional Networks for Action Recognition in Videos” In RGB-based action recognition, motion features (i.e., optical flow) are used to improve the action recognition performance. In skeleton-based action recognition, cross-frame skeleton difference can be used as motion features. is a motion feature vector

Editor's Notes

  1. Morning everyone, this is Fan Yang, I will present our work make skeleton-based action recogntion smaller, faster, and better.
  2. By leveraging the power of machine learning, human actions can be automatically learned. One common method is to utilize the RGB videos, which contains rich context information. Another method is to use skeletons, which does not contain context information, but can well capture the motions with fewer noise.
  3. What we see for the skeleton sequence How to represent the skeleton sequence data
  4. To achieve this goal, let’s take a close look at some properties of skeleton sequence data.
  5. To achieve this goal, let’s take a close look at some properties of skeleton sequence data.
  6. Our features do no contain any location in Cartesian coordinates, but they can be useful for some data
  7. Since RNN takes recursive progress and it is slow, we do not talk about it here. For GCNN, the parameters and FLOPs can be referred to 2D CNN case.