A survey on deep learning based approaches for action and gesture recognition in image sequences
ģžģ—°ģ–“ģ²˜ė¦¬ ģ—°źµ¬ģ‹¤
M2020064
ģ”°ė‹Øė¹„
Published in: 2017 IEEE 12th International Conference on Automatic Face & Gesture Recognition
URL: https://ieeexplore.ieee.org/abstract/document/7961779
Contents
1. Introduction
2. Taxonomy (architecture & challenges)
3. Action/Activity & Gesture Recognition
4. Discussion
#Kookmin_University #Natural_Language_Processing_lab. 1
Introduction
> Action and Gesture recognition + Deep learning
> Challenging problem: the large amount of data to be processed, and model complexity
> Proposed models: RNNs and LSTMs for action/gesture recognition
+ 3D convolutional networks
+ pre-computed motion-based features
+ combination of multiple visual
> Our goal: how do these models treat the temporal dimension of the data?
Computer vision and pattern recognition
Temporal dimension in sequences
Taxonomy
1. Architectures
2. Fusion Strategies
3. Datasets
4. Challenges
Architectures
Action/Gesture Recognition Approaches
- 3D Models (3D conv & pool)
- Motion-based input features
- Temporal Methods
  - 2D Models + RNN + LSTM
  - 2D Models + B-RNN + LSTM
  - 2D Models + H-RNN + LSTM
  - 2D Models + D-RNN + LSTM
  - 2D Models + HMM
  - 2D/3D Models + Auxiliary outputs
  - 2D/3D Models + Hand-crafted features
* B: Bidirectional, H: Hierarchical, D: Differential
Architectures
> How do deep models deal with the temporal dimension in human action and gesture recognition?
1) Using 3D filters in the convolutional layers
> They capture discriminative features along both the spatial and temporal dimensions while maintaining a certain temporal structure
2) Motion features
> Motion features (e.g., optical flow) are pre-computed
> and fed to the network as additional input channels
3) Combining a 2D (or 3D) CNN applied to individual frames with a temporal sequence model
> e.g., an RNN or LSTM
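A minimal NumPy sketch of approach 1): a single-channel 3D convolution slides its kernel over time as well as space, so the output keeps a temporal axis. The function name `conv3d_valid` and the hand-set kernel are hypothetical illustrations, not taken from the survey; real 3D CNNs learn many multi-channel kernels inside a deep learning framework.

```python
import numpy as np

def conv3d_valid(video: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Hypothetical helper: single-channel 'valid' 3D convolution
    (cross-correlation, as in most deep learning libraries) over a clip
    shaped (T, H, W)."""
    kt, kh, kw = kernel.shape
    T, H, W = video.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The window covers kt consecutive frames, not just one image.
                patch = video[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
clip = rng.random((8, 16, 16))      # toy clip: 8 frames of 16x16 pixels

# Hand-set 3x3x3 kernel acting as a crude temporal derivative:
# it subtracts the centre pixel of frame t from that of frame t+2.
kernel = np.zeros((3, 3, 3))
kernel[0, 1, 1] = -1.0
kernel[2, 1, 1] = 1.0

features = conv3d_valid(clip, kernel)
print(features.shape)  # (6, 14, 14): the temporal axis is preserved
```

Because this kernel spans three frames, a static clip produces an all-zero response, which is one simple way to see that 3D filters respond to change over time.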
Architectures
Fusion Strategies
> Main variants of information fusion in deep learning models
1) Early
> Information from multiple sources is fused directly at the input, before the data is fed into the model
2) Late
> The outputs of the deep learning models are combined
3) Middle
> Intermediate layers fuse the information
Additional fusion strategies: ensembles or stacked networks
to combine the information from parts of a segmented video sequence
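The three fusion points can be contrasted in a toy NumPy sketch with two modalities (an RGB frame and a depth map). Everything here is an assumption for illustration: `branch` is a hypothetical stand-in for a deep feature extractor, the weights are random rather than learned, and late fusion uses a plain average of class probabilities, which is only one of several possible combination rules.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def branch(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical feature extractor standing in for a deep network:
    global average pooling over (H, W), a linear layer, then ReLU."""
    return np.maximum(x.mean(axis=(0, 1)) @ w, 0.0)

rng = np.random.default_rng(0)
n_feat, n_classes = 8, 5
rgb = rng.random((32, 32, 3))    # appearance modality (H, W, C)
depth = rng.random((32, 32, 1))  # second modality, e.g. a depth map

# Early fusion: concatenate the raw inputs, then run a single model.
x_early = np.concatenate([rgb, depth], axis=-1)           # (32, 32, 4)
p_early = softmax(branch(x_early, rng.normal(size=(4, n_feat)))
                  @ rng.normal(size=(n_feat, n_classes)))

# One branch per modality (reused by late and middle fusion below).
f_rgb = branch(rgb, rng.normal(size=(3, n_feat)))
f_depth = branch(depth, rng.normal(size=(1, n_feat)))

# Late fusion: each modality has its own classifier; average the outputs.
p_late = (0.5 * softmax(f_rgb @ rng.normal(size=(n_feat, n_classes)))
          + 0.5 * softmax(f_depth @ rng.normal(size=(n_feat, n_classes))))

# Middle fusion: concatenate intermediate features, then a shared classifier.
p_middle = softmax(np.concatenate([f_rgb, f_depth])
                   @ rng.normal(size=(2 * n_feat, n_classes)))
```

The only difference between the three variants is where the two modalities meet: at the input, at the intermediate features, or at the class scores.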
Datasets
Challenges
Reviews: Action/Activity & Gesture Recognition
1. 3D Convolutional Neural Networks
2. Motion-based Features
3. Temporal Deep Learning Models: RNN and LSTM
4. Deep Learning with Fusion Strategies
3D Convolutional Neural Networks
> Extending the convolution along the temporal axis (3D CNNs)
- Initializing the weights of a 3D CNN with 2D weights learned on ImageNet
- Factorizing 3D convolutional kernel learning into a sequential process of learning 2D spatial and 1D temporal kernels in different layers
- Performing 3D convolutions over stacks of optical flow maps
- Using multiple 3D CNNs in a multi-stage framework
- Combining 3D CNN models with sequence modeling methods or hand-crafted feature descriptors
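The first bullet can be illustrated with the weight "inflation" recipe popularized by I3D (Carreira and Zisserman): each pretrained 2D filter is repeated along the time axis and divided by its temporal extent, so a "boring video" (one frame repeated) gives the same response as the original 2D filter on that frame. This is one known recipe, not necessarily the one used by every work the survey cites; the random `k2d` stands in for an ImageNet-pretrained filter, and the helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Valid' N-D cross-correlation of x with kernel k (same ndim)."""
    windows = sliding_window_view(x, k.shape)     # (out dims..., k dims...)
    kernel_axes = tuple(range(-k.ndim, 0))
    return (windows * k).sum(axis=kernel_axes)

def inflate_2d_kernel(k2d: np.ndarray, t: int) -> np.ndarray:
    """Repeat a (kh, kw) kernel t times along time and rescale by 1/t,
    giving a (t, kh, kw) kernel."""
    return np.repeat(k2d[np.newaxis, :, :], t, axis=0) / t

rng = np.random.default_rng(0)
k2d = rng.normal(size=(3, 3))        # placeholder for a pretrained 2D filter
k3d = inflate_2d_kernel(k2d, t=3)    # (3, 3, 3)

frame = rng.random((16, 16))
boring_video = np.repeat(frame[np.newaxis], 8, axis=0)  # same frame 8 times

resp_2d = conv_valid(frame, k2d)           # (14, 14)
resp_3d = conv_valid(boring_video, k3d)    # (6, 14, 14)
print(np.allclose(resp_3d[0], resp_2d))    # True
```

The 1/t rescaling is what keeps the pretrained activations on the same scale, so the 3D network starts from a sensible point before fine-tuning on video.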
Motion-based Features
> Incorporating pre-computed temporal features within the deep model
- Two-stream CNNs (a spatial network on RGB frames and a temporal network on stacked optical flow)
- Exploiting the motion vectors available in compressed video
- Extending the convolutions in time with long-term temporal convolutions
> Extending the CNN capabilities using trajectory features
- Pooling and normalizing convolutional features along trajectories
- Learning bag-of-features from dense trajectories of synthetic 3D human models
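Two input constructions from this slide can be sketched in NumPy. `stack_flow` follows the two-stream idea of stacking the horizontal and vertical optical-flow components of L consecutive frames into 2L input channels for the temporal network. The flow fields here are random placeholders (computing real optical flow needs a dedicated method), and `frame_differences` is a deliberately crude motion cue swapped in for illustration of "additional channels"; it is not optical flow and not a method from the survey.

```python
import numpy as np

def stack_flow(flows: np.ndarray) -> np.ndarray:
    """Stack L flow fields shaped (L, H, W, 2) into one (H, W, 2L) input.
    Channel 2*l holds the horizontal (dx) and channel 2*l + 1 the vertical
    (dy) displacement of frame pair l."""
    L, H, W, _ = flows.shape
    return flows.transpose(1, 2, 0, 3).reshape(H, W, 2 * L)

def frame_differences(frames: np.ndarray) -> np.ndarray:
    """Crude motion proxy (NOT optical flow): absolute difference between
    consecutive grayscale frames, (T, H, W) -> (T - 1, H, W)."""
    return np.abs(np.diff(frames, axis=0))

rng = np.random.default_rng(0)
L, H, W = 10, 32, 32

# Temporal-stream input from (placeholder) pre-computed optical flow.
flows = rng.normal(size=(L, H, W, 2))
temporal_input = stack_flow(flows)                # (32, 32, 20)

# Motion cues appended to an RGB frame as additional input channels.
gray = rng.random((L + 1, H, W))                  # placeholder grayscale clip
rgb = rng.random((H, W, 3))                       # placeholder RGB frame
motion = frame_differences(gray)                  # (10, 32, 32)
augmented = np.concatenate([rgb, motion.transpose(1, 2, 0)], axis=-1)  # (32, 32, 13)
```

In a two-stream model, `temporal_input` would feed the temporal network while the RGB frame feeds the spatial one; the `augmented` tensor instead shows the single-network option of treating motion as extra channels.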
Temporal Deep Learning Models: RNN and LSTM
> Combining CNNs with temporal sequence models (RNN or LSTM)
- Modeling the change in information caused by salient motions between successive frames (differential RNN)
- A multi-stream (motion and appearance) network with a bidirectional RNN
- Observing video frames and deciding both where to look next and when to emit a prediction
- Using 3D skeleton sequences to regularize an LSTM network (LSTM+CNN) on video frames
- A multimodal RNN-based system (depth video, skeleton, and speech)
- Multi-RNN to facilitate the handling of variable-length gestures
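The CNN + LSTM pattern can be sketched in plain NumPy: per-frame features from a 2D model are read by an LSTM, and the last hidden state is classified. Assumptions: `frame_features` is a hypothetical placeholder for a real 2D CNN, all weights are random and untrained, and the gate ordering in `init_lstm` is a local convention. The last lines show the bidirectional variant (B-RNN in the taxonomy) by running a second LSTM over the reversed sequence.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(rng, d_in: int, d_hid: int):
    """Random (untrained) LSTM parameters; rows of W, U, b are grouped as
    [input gate, forget gate, output gate, candidate]."""
    W = rng.normal(scale=0.1, size=(4 * d_hid, d_in))
    U = rng.normal(scale=0.1, size=(4 * d_hid, d_hid))
    b = np.zeros(4 * d_hid)
    return W, U, b

def lstm_forward(xs: np.ndarray, W, U, b) -> np.ndarray:
    """Run a single-layer LSTM over xs of shape (T, D); return all hidden
    states, shape (T, H)."""
    d_hid = U.shape[1]
    h, c = np.zeros(d_hid), np.zeros(d_hid)
    hs = []
    for x in xs:
        z = W @ x + U @ h + b
        i = sigmoid(z[:d_hid])                 # input gate
        f = sigmoid(z[d_hid:2 * d_hid])        # forget gate
        o = sigmoid(z[2 * d_hid:3 * d_hid])    # output gate
        g = np.tanh(z[3 * d_hid:])             # candidate cell update
        c = f * c + i * g                      # new cell state
        h = o * np.tanh(c)                     # new hidden state
        hs.append(h)
    return np.stack(hs)

def frame_features(frames: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a per-frame 2D CNN: global average pooling of
    each (H, W, 3) frame, a fixed projection, then ReLU."""
    pooled = frames.mean(axis=(1, 2))          # (T, 3)
    return np.maximum(pooled @ proj, 0.0)      # (T, D)

rng = np.random.default_rng(0)
T, D, Hd, n_classes = 12, 16, 32, 5
clip = rng.random((T, 64, 64, 3))              # placeholder RGB clip
feats = frame_features(clip, rng.normal(size=(3, D)))

# Unidirectional CNN + LSTM: classify from the last hidden state.
hs_fwd = lstm_forward(feats, *init_lstm(rng, D, Hd))
logits = rng.normal(scale=0.1, size=(n_classes, Hd)) @ hs_fwd[-1]
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Bidirectional variant: a second LSTM reads the frames in reverse order;
# its outputs are flipped back so both directions align per frame.
hs_bwd = lstm_forward(feats[::-1], *init_lstm(rng, D, Hd))[::-1]
h_bi = np.concatenate([hs_fwd, hs_bwd], axis=1)    # (T, 2 * Hd)
```

Separating the frame-level model from the sequence model is what lets these architectures reuse image-pretrained CNNs while the LSTM handles longer-range temporal relations.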
Deep Learning with Fusion Strategies
> Using diverse fusion schemes to improve action recognition performance
- Learning an end-to-end hierarchical RNN with skeleton data
- DeepConvLSTM, based on convolutional and LSTM recurrent units
- Combining deep models with an HMM (Hidden Markov Model) or a GMM (Gaussian Mixture Model)
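Combining a deep model with an HMM is often done as a "hybrid": the network outputs per-frame posteriors over hidden states, these are divided by the state priors to obtain scaled likelihoods, and the HMM decodes a temporally consistent state sequence with the Viterbi algorithm. The NumPy sketch below assumes that hybrid setup with made-up posteriors and a hand-picked "sticky" transition matrix; it is not the model of any specific paper in the survey, and a GMM-based variant is not shown.

```python
import numpy as np

def viterbi(log_emit: np.ndarray, log_trans: np.ndarray, log_init: np.ndarray) -> list:
    """Most likely hidden-state sequence of an HMM.
    log_emit: (T, S) per-frame log emission scores
    log_trans: (S, S), log_trans[i, j] = log P(state j at t | state i at t-1)
    log_init: (S,) log initial-state probabilities"""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]              # best score ending in each state
    back = np.zeros((T, S), dtype=int)          # best predecessor per (t, state)
    for t in range(1, T):
        scores = delta[:, None] + log_trans     # (previous state, next state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):               # backtrack from the last frame
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Made-up per-frame posteriors P(state | frame) from a hypothetical network
# over 3 gesture states; frame 2 is a noisy, isolated prediction.
posteriors = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
])
priors = np.full(3, 1 / 3)                       # assumed state priors
# Hybrid trick: posterior / prior gives a likelihood up to a constant.
log_emit = np.log(posteriors) - np.log(priors)
trans = np.full((3, 3), 0.05) + np.eye(3) * 0.85  # stay with probability 0.9
path = viterbi(log_emit, np.log(trans), np.log(priors))

print(posteriors.argmax(axis=1).tolist())  # [0, 0, 1, 0, 2, 2, 2]
print(path)                                # [0, 0, 0, 0, 2, 2, 2]
```

Under the assumed transitions, switching states twice costs more than accepting one less likely emission, so the decoded path removes the isolated flip in frame 2 that a per-frame argmax would keep.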
Discussion
> Comprehensive overview of deep learning-based models for action and gesture recognition
- How does a method deal with temporal information?
- How can such large networks be trained with small datasets?
> 3D networks applied over long sequences can learn complex temporal patterns
> Temporal models (RNN and LSTM) have the crucial advantage of coping with longer-range temporal relations
> Ensemble learning (fusion strategies) reduces the bias and variance errors of the learning algorithm
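The variance part of the last point can be quantified with a standard identity, assuming $M$ fused models whose predictions $f_m(x)$ each have variance $\sigma^2$ and pairwise correlation $\rho$:

```latex
\operatorname{Var}\!\left(\frac{1}{M}\sum_{m=1}^{M} f_m(x)\right)
  = \frac{1}{M^2}\Big(M\sigma^2 + M(M-1)\,\rho\,\sigma^2\Big)
  = \rho\,\sigma^2 + \frac{1-\rho}{M}\,\sigma^2
```

Averaging therefore helps most when the fused models are weakly correlated (small $\rho$), for example streams built on different modalities. Plain averaging leaves the bias unchanged when all models share the same bias; bias reduction is usually attributed to sequential ensembles such as boosting.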
Other papers
- "Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification" (ACM 2015)
- "Long-term Recurrent Convolutional Networks for Visual Recognition and Description" (CVPR 2015)
- "FASTER Recurrent Networks for Efficient Video Classification" (AAAI 2020)
- "Attention Boosted Deep Networks for Video Classification" (IEEE 2020)
- "Traditional Bangladeshi Sports Video Classification Using Deep Learning Method" (Applied Sciences 2021)
Thank You.
Ā 
GraphSummit Milan - Neo4j: The Art of the Possible with Graph
GraphSummit Milan - Neo4j: The Art of the Possible with GraphGraphSummit Milan - Neo4j: The Art of the Possible with Graph
GraphSummit Milan - Neo4j: The Art of the Possible with Graph
Ā 
Abortion Clinic In Pretoria ](+27832195400*)[ šŸ„ Safe Abortion Pills in Pretor...
Abortion Clinic In Pretoria ](+27832195400*)[ šŸ„ Safe Abortion Pills in Pretor...Abortion Clinic In Pretoria ](+27832195400*)[ šŸ„ Safe Abortion Pills in Pretor...
Abortion Clinic In Pretoria ](+27832195400*)[ šŸ„ Safe Abortion Pills in Pretor...
Ā 
Test Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdfTest Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdf
Ā 
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAOpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
Ā 
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit MilanWorkshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan
Ā 
Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?
Ā 
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Ā 
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdfAzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
Ā 
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Ā 
Novo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMsNovo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMs
Ā 
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Ā 
From Theory to Practice: Utilizing SpiraPlan's REST API
From Theory to Practice: Utilizing SpiraPlan's REST APIFrom Theory to Practice: Utilizing SpiraPlan's REST API
From Theory to Practice: Utilizing SpiraPlan's REST API
Ā 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
Ā 
BusinessGPT - Security and Governance for Generative AI
BusinessGPT  - Security and Governance for Generative AIBusinessGPT  - Security and Governance for Generative AI
BusinessGPT - Security and Governance for Generative AI
Ā 
The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)
Ā 
Abortion Pill Prices Jozini ](+27832195400*)[ šŸ„ Women's Abortion Clinic in Jo...
Abortion Pill Prices Jozini ](+27832195400*)[ šŸ„ Women's Abortion Clinic in Jo...Abortion Pill Prices Jozini ](+27832195400*)[ šŸ„ Women's Abortion Clinic in Jo...
Abortion Pill Prices Jozini ](+27832195400*)[ šŸ„ Women's Abortion Clinic in Jo...
Ā 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Ā 
[GRCPP] Introduction to concepts (C++20)
[GRCPP] Introduction to concepts (C++20)[GRCPP] Introduction to concepts (C++20)
[GRCPP] Introduction to concepts (C++20)
Ā 

A survey on deep learning based approaches for action and gesture recognition in image sequences

  • 8. Fusion Strategies
    > Main variants of information fusion in deep learning models:
    1) Early fusion
      > Information from multiple sources is fused directly, before the data is fed into the model
    2) Late fusion
      > The outputs of separate deep learning models are combined
    3) Middle fusion
      > Intermediate layers fuse the information
    > Additional fusion strategies, such as ensembles or stacked networks, combine the information from parts of a segmented video sequence
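The three fusion points can be illustrated with a toy two-modality classifier. This is a minimal NumPy sketch, not a model from the survey. The modalities (called RGB and depth here), the layer sizes and the one-layer "sub-networks" are hypothetical stand-ins for real deep models. The weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def encoder(x, w):
    """Hypothetical stand-in for a deep sub-network: one linear layer + ReLU."""
    return np.maximum(x @ w, 0.0)

def early_fusion(x_a, x_b, w_hid, w_out):
    # Fuse at the input: concatenate raw features, then run one shared model.
    x = np.concatenate([x_a, x_b], axis=1)
    return softmax(encoder(x, w_hid) @ w_out)

def middle_fusion(x_a, x_b, w_a, w_b, w_out):
    # Fuse in the middle: separate encoders, concatenated intermediate features.
    h = np.concatenate([encoder(x_a, w_a), encoder(x_b, w_b)], axis=1)
    return softmax(h @ w_out)

def late_fusion(x_a, x_b, w_a, w_a_out, w_b, w_b_out, alpha=0.5):
    # Fuse at the output: two independent models, weighted average of class scores.
    p_a = softmax(encoder(x_a, w_a) @ w_a_out)
    p_b = softmax(encoder(x_b, w_b) @ w_b_out)
    return alpha * p_a + (1.0 - alpha) * p_b

# Hypothetical sizes: 4 clips, two modalities with 16- and 8-dim features, 5 classes.
n_clips, d_a, d_b, d_hid, n_classes = 4, 16, 8, 12, 5
x_rgb = rng.normal(size=(n_clips, d_a))
x_depth = rng.normal(size=(n_clips, d_b))

p_early = early_fusion(x_rgb, x_depth,
                       rng.normal(size=(d_a + d_b, d_hid)),
                       rng.normal(size=(d_hid, n_classes)))
p_mid = middle_fusion(x_rgb, x_depth,
                      rng.normal(size=(d_a, d_hid)),
                      rng.normal(size=(d_b, d_hid)),
                      rng.normal(size=(2 * d_hid, n_classes)))
p_late = late_fusion(x_rgb, x_depth,
                     rng.normal(size=(d_a, d_hid)), rng.normal(size=(d_hid, n_classes)),
                     rng.normal(size=(d_b, d_hid)), rng.normal(size=(d_hid, n_classes)))
print(p_early.shape, p_mid.shape, p_late.shape)  # (4, 5) (4, 5) (4, 5)
```

All three produce one class distribution per clip. They differ only in where the two inputs meet. Early fusion can model cross-modal interactions from the first layer but needs aligned inputs. Late fusion keeps the streams independent, so each can be trained separately, but the modalities only interact at the final scores.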
  • 12. Reviews: Action/Activity & Gesture Recognition
    1. 3D Convolutional Neural Networks
    2. Motion-based Features
    3. Temporal Deep Learning Models: RNN and LSTM
    4. Deep Learning with Fusion Strategies
  • 13. 3D Convolutional Neural Networks
    > Extending the convolution along the temporal axis (3D CNNs):
    - Initializing the weights of a 3D CNN with 2D weights learned on ImageNet
    - Factorizing 3D convolutional kernel learning into a sequential process of learning 2D spatial kernels and 1D temporal kernels in different layers
    - Performing 3D convolutions over stacks of optical flow maps
    - Using multiple 3D CNNs in a multi-stage framework
    - Combining 3D CNN models with sequence modeling methods or hand-crafted feature descriptors
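The first two ideas on this slide can be checked numerically. Below is a minimal NumPy sketch of a "valid" 3D convolution, written with loops for clarity rather than speed. The `inflate` helper follows the repeat-and-rescale initialization used by inflated 3D networks such as I3D. This is one way to reuse 2D ImageNet weights; the slide does not name a specific method, so treat it as an assumption. `factorized_conv` shows the spatial-then-temporal decomposition. All function names are hypothetical. The factorization check is exact only for a rank-1 (separable) kernel with no nonlinearity in between. Factorized architectures such as (2+1)D learn separate kernels and usually insert a nonlinearity, so they are more expressive than this check.

```python
import numpy as np

def conv2d_valid(frame, k):
    """'Valid' 2D cross-correlation of an (H, W) frame with a (kh, kw) kernel."""
    kh, kw = k.shape
    out = np.zeros((frame.shape[0] - kh + 1, frame.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * k)
    return out

def conv3d_valid(clip, k):
    """'Valid' 3D cross-correlation of a (T, H, W) clip with a (kt, kh, kw) kernel.
    The kernel also slides along time, so each output mixes kt consecutive frames."""
    kt, kh, kw = k.shape
    T, H, W = clip.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(clip[t:t + kt, i:i + kh, j:j + kw] * k)
    return out

def inflate(k2d, kt):
    """Build a 3D kernel from a pretrained 2D one: repeat it kt times along time, divide by kt."""
    return np.repeat(k2d[None, :, :], kt, axis=0) / kt

def factorized_conv(clip, k_spatial, k_temporal):
    """2D spatial convolution on every frame, followed by a 1D temporal convolution."""
    spatial = np.stack([conv2d_valid(f, k_spatial) for f in clip])  # (T, H', W')
    kt = len(k_temporal)
    return np.stack([np.tensordot(k_temporal, spatial[t:t + kt], axes=1)
                     for t in range(spatial.shape[0] - kt + 1)])

rng = np.random.default_rng(0)
frame = rng.normal(size=(6, 6))
k2d = rng.normal(size=(3, 3))          # stands in for an ImageNet-trained 2D filter

# 1) On a static clip (same frame repeated), the inflated 3D filter reproduces the 2D response.
static_clip = np.repeat(frame[None], 5, axis=0)
out3d = conv3d_valid(static_clip, inflate(k2d, kt=3))   # shape (3, 4, 4)
out2d = conv2d_valid(frame, k2d)                        # shape (4, 4)

# 2) A rank-1 kernel a[t] * b[y, x] can be applied as spatial-then-temporal convolution.
a = rng.normal(size=3)
clip = rng.normal(size=(5, 6, 6))
full = conv3d_valid(clip, a[:, None, None] * k2d[None, :, :])
fact = factorized_conv(clip, k2d, a)
```

The static-clip check explains why inflation is a sensible starting point. Before any video training, the 3D network behaves like the 2D image network on each frame. It then learns temporal structure from there.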
  • 14. Motion-based Features
    > Incorporating pre-computed temporal features within the deep model:
    - Two-stream CNNs (separate spatial and temporal networks)
    - Exploiting motion vectors from video compression
    - Extending the convolutions in time with long-term temporal convolutions
    > Extending CNN capabilities with trajectory features:
    - Trajectory-constrained pooling and normalization
    - Learning bag-of-features from dense trajectories of synthetic 3D human models
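Both ways of feeding motion to a network come down to building extra input channels. This NumPy sketch is illustrative only.

- `stack_flow` mirrors the stacked optical-flow input of a two-stream temporal network. Random arrays stand in for flow fields, since no flow estimator is included.
- `rgb_plus_motion` appends motion channels to an RGB frame. It uses simple frame differences as an assumed, much cruder substitute for optical flow.

Function names and sizes are hypothetical.

```python
import numpy as np

def stack_flow(flows):
    """Temporal-stream input: concatenate L consecutive optical-flow fields,
    each (H, W, 2) with horizontal and vertical components, into (H, W, 2L)."""
    return np.concatenate(flows, axis=-1)

def rgb_plus_motion(clip, t, L):
    """Single-network input: RGB frame t with L motion channels appended.
    clip is (T, H, W, 3); requires t + L < T.
    Motion is approximated by absolute grayscale frame differences, a cheap
    stand-in (assumption) for the optical flow a real pipeline would compute."""
    gray = clip.mean(axis=-1)                                # (T, H, W)
    diffs = np.abs(np.diff(gray[t:t + L + 1], axis=0))       # (L, H, W)
    motion = np.moveaxis(diffs, 0, -1)                       # (H, W, L)
    return np.concatenate([clip[t], motion], axis=-1)        # (H, W, 3 + L)

rng = np.random.default_rng(0)

# Two-stream style: 10 precomputed flow fields (random placeholders here) -> 20 channels.
flows = [rng.normal(size=(8, 8, 2)) for _ in range(10)]
temporal_input = stack_flow(flows)            # (8, 8, 20)

# Additional-channel style: a static clip yields all-zero motion channels.
static = np.repeat(rng.uniform(size=(1, 8, 8, 3)), 6, axis=0)
x_static = rgb_plus_motion(static, t=0, L=4)  # (8, 8, 7)
moving = rng.uniform(size=(6, 8, 8, 3))
x_moving = rgb_plus_motion(moving, t=1, L=4)  # (8, 8, 7)
```

Because motion is precomputed, the network no longer has to discover it from raw pixels. This is the main appeal of these methods, paid for with the cost of computing flow beforehand.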
  • 15. Temporal Deep Learning Models: RNN and LSTM
    > Combining CNNs with temporal sequence models (RNN or LSTM):
    - Modeling the change in information caused by salient motions between successive frames (differential RNN)
    - A multi-stream (motion and appearance) network with a bidirectional RNN
    - A model that observes video frames and decides both where to look next and when to emit a prediction
    - Using 3D skeleton sequences to regularize an LSTM network (LSTM+CNN) on video frames
    - A multimodal RNN system using depth video, skeleton, and speech
    - A multi-RNN model that facilitates the handling of variable-length gestures
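The CNN + LSTM pattern can be sketched in a few lines. A per-frame encoder turns each frame into a feature vector, and an LSTM reads those vectors in order. This is a forward-pass-only NumPy sketch with random, untrained weights. `frame_features` is a hypothetical one-layer stand-in for a real 2D CNN. Classifying from the last hidden state is one common readout, assumed here. A bidirectional variant is included to match the B-RNN entry in the taxonomy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTM:
    """Minimal single-layer LSTM, forward pass only, untrained random weights."""

    def __init__(self, d_in, d_hidden, rng):
        s = 1.0 / np.sqrt(d_hidden)
        # Rows are stacked as [input gate, forget gate, output gate, candidate].
        self.W = rng.uniform(-s, s, size=(4 * d_hidden, d_in))
        self.U = rng.uniform(-s, s, size=(4 * d_hidden, d_hidden))
        self.b = np.zeros(4 * d_hidden)
        self.d_hidden = d_hidden

    def run(self, xs):
        """xs: (T, d_in) per-frame features; returns hidden states, shape (T, d_hidden)."""
        h = np.zeros(self.d_hidden)
        c = np.zeros(self.d_hidden)
        hs = []
        for x in xs:
            i, f, o, g = np.split(self.W @ x + self.U @ h + self.b, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # update the cell memory
            h = sigmoid(o) * np.tanh(c)                    # expose a gated view of it
            hs.append(h)
        return np.stack(hs)

def frame_features(frames, w_cnn):
    """Hypothetical stand-in for a 2D CNN applied to each frame independently."""
    return np.maximum(frames.reshape(len(frames), -1) @ w_cnn, 0.0)

def bidirectional(fwd, bwd, xs):
    """B-RNN variant: one LSTM reads forward, one backward; states are concatenated per frame."""
    return np.concatenate([fwd.run(xs), bwd.run(xs[::-1])[::-1]], axis=1)

rng = np.random.default_rng(0)
w_cnn = rng.normal(size=(8 * 8, 32)) / 8.0
lstm_fwd, lstm_bwd = LSTM(32, 16, rng), LSTM(32, 16, rng)
w_cls = rng.normal(size=(16, 5))

# The same weights handle clips of different lengths (5 and 9 frames of 8x8 pixels).
for T in (5, 9):
    feats = frame_features(rng.normal(size=(T, 8, 8)), w_cnn)   # (T, 32)
    hs = lstm_fwd.run(feats)                                     # (T, 16)
    logits = hs[-1] @ w_cls                                      # last state summarizes the clip
    print(T, hs.shape, logits.shape, bidirectional(lstm_fwd, lstm_bwd, feats).shape)
```

The loop over clip lengths shows why recurrent models suit variable-length gestures: nothing in the weights depends on T. A 3D CNN, by contrast, usually expects a fixed temporal window.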
  • 16. Deep Learning with Fusion Strategies
    > Using diverse fusion schemes to improve action recognition performance:
    - Learning an end-to-end hierarchical RNN on skeleton data
    - DeepConvLSTM, built from convolutional and LSTM recurrent units
    - Combining deep models with an HMM (Hidden Markov Model) or a GMM (Gaussian Mixture Model)
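The HMM entry refers to hybrid systems in which a network scores each frame and an HMM imposes temporal structure. A common recipe, sketched here in NumPy under simplifying assumptions, treats per-frame class posteriors as emission scores. (Hybrid systems often divide by class priors first, which is skipped here.) It then decodes the most likely label sequence with the Viterbi algorithm. The posteriors and transition probabilities below are made-up numbers. They are chosen to show how a "sticky" transition matrix suppresses an isolated misclassified frame.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Most likely state path.
    log_emit:  (T, S) per-frame log scores, e.g. from a CNN or LSTM
    log_trans: (S, S) log P(state j at t | state i at t-1)
    log_init:  (S,)   log P(state at t = 0)"""
    T, S = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans          # cand[i, j]: best path ending i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):                  # follow back-pointers to t = 0
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Made-up per-frame posteriors for two gesture classes; frame 2 is a noisy flip.
posteriors = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1],
                       [0.2, 0.8], [0.1, 0.9], [0.1, 0.9]])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])                    # classes tend to persist over frames
frame_wise = posteriors.argmax(axis=1).tolist()   # [0, 0, 1, 0, 1, 1, 1]
decoded = viterbi(np.log(posteriors), np.log(trans), np.log([0.5, 0.5]))
print(frame_wise, decoded)                        # [0, 0, 1, 0, 1, 1, 1] [0, 0, 0, 0, 1, 1, 1]
```

Frame-wise argmax reports a spurious class change at frame 2. The decoded path keeps a single transition at frame 4, which is the kind of temporal consistency the HMM adds on top of the network.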
  • 17. Discussion
    > A comprehensive overview of deep-learning-based models for action and gesture recognition, organized around two questions:
    - How does a method deal with temporal information?
    - How can such a large network be trained with small datasets?
    > 3D networks applied over long sequences can learn complex temporal patterns
    > Temporal models (RNN and LSTM) have the crucial advantage of coping with longer-range temporal relations
    > Ensemble learning (fusion strategies) reduces the bias and variance errors of the learning algorithm
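The variance part of the ensemble claim follows from a standard calculation. Assume, for illustration, M ensemble members whose predictions f_1, ..., f_M each have variance σ² and pairwise correlation ρ. Averaging them gives:

```latex
\bar f = \frac{1}{M}\sum_{m=1}^{M} f_m, \qquad
\operatorname{Var}(\bar f) = \frac{1}{M^2}\Big[M\sigma^2 + M(M-1)\rho\sigma^2\Big]
= \rho\sigma^2 + \frac{1-\rho}{M}\,\sigma^2 .
```

So the variance falls below σ² whenever ρ < 1 and M > 1. The gain is largest when the members make weakly correlated errors, for instance networks fed different modalities or fusion branches. The ensemble's bias, by contrast, is the mean of the members' biases. Plain averaging therefore lowers bias only to the extent that members err in different directions.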
  • 18. Other papers: "Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification" (ACM 2015)
  • 19. Other papers: "Long-term Recurrent Convolutional Networks for Visual Recognition and Description" (CVPR 2015)
  • 20. Other papers: "FASTER Recurrent Networks for Efficient Video Classification" (AAAI 2020)
  • 21. Other papers: "Attention Boosted Deep Networks for Video Classification" (IEEE 2020)
  • 22. Other papers: "Traditional Bangladeshi Sports Video Classification Using Deep Learning Method" (Applied Sciences 2021)