2022-04-21
Sangmin Woo
Computational Intelligence Lab.
School of Electrical Engineering
Korea Advanced Institute of Science and Technology (KAIST)
Multi-modal Action Recognition
Datasets & Benchmarks
Action Recognition Datasets
Generic
• Kinetics [1]
• Charades [2]
• ActivityNet [3]
• UCF101 [4]
Instructional
• YouCook [5]
• COIN [6]
• HowTo100M [7]
[1] Carreira, Joao, et al. "Quo vadis, action recognition? A new model and the Kinetics dataset." CVPR 2017
[2] Sigurdsson, Gunnar A., et al. "Hollywood in homes: Crowdsourcing data collection for activity understanding." ECCV 2016
[3] Caba Heilbron, Fabian, et al. "ActivityNet: A large-scale video benchmark for human activity understanding." CVPR 2015
[4] Soomro, Khurram, et al. "UCF101: A dataset of 101 human actions classes from videos in the wild." arXiv 2012
[5] Zhou, Luowei, et al. "Towards automatic learning of procedures from web instructional videos." AAAI 2018
[6] Tang, Yansong, et al. "COIN: A large-scale dataset for comprehensive instructional video analysis." CVPR 2019
[7] Miech, Antoine, et al. "HowTo100M: Learning a text-video embedding by watching hundred million narrated video clips." ICCV 2019
Ego-centric
• EPIC Kitchens [1]
Compositional
• Action Genome [2]
• Something-Something [3]
• HOMAGE [4]
[1] Damen, Dima, et al. "Scaling egocentric vision: The EPIC-Kitchens dataset." ECCV 2018
[2] Ji, Jingwei, et al. "Action Genome: Actions as compositions of spatio-temporal scene graphs." CVPR 2020
[3] Goyal, Raghav, et al. "The 'something something' video database for learning and evaluating visual common sense." ICCV 2017
[4] Rai, Nishant, et al. "Home Action Genome: Cooperative compositional action understanding." CVPR 2021
Multi-view
• LEMMA [1]
[1] Jia, Baoxiong, et al. "LEMMA: A multi-view dataset for learning multi-agent multi-task activities." ECCV 2020
Multi-modal
Single Label (Video-level Action)
• MSR-Action3D [1]: depth map
• PKU-MMD [2]: RGB+D+IR+skeleton
• NTU RGB+D [3, 4]: RGB+D+IR+3D human joint
Multi Labels (Temporally Localized Actions)
• MMAct [5]: RGB+keypoints+acc+gyro+orientation
• LEMMA [6]: RGB+D
• HOMAGE [7]: RGB+IR+audio+RGB light+light+acc+gyro, etc.
[1] Li, Wanqing, et al. "Action recognition based on a bag of 3D points." CVPRW 2010
[2] Liu, Chunhui, et al. "PKU-MMD: A large scale benchmark for continuous multi-modal human action understanding." arXiv 2017
[3] Shahroudy, Amir, et al. "NTU RGB+D: A large scale dataset for 3D human activity analysis." CVPR 2016
[4] Liu, Jun, et al. "NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding." TPAMI 2019
[5] Kong, Quan, et al. "MMAct: A large-scale dataset for cross modal human action understanding." ICCV 2019
[6] Jia, Baoxiong, et al. "LEMMA: A multi-view dataset for learning multi-agent multi-task activities." ECCV 2020
[7] Rai, Nishant, et al. "Home Action Genome: Cooperative compositional action understanding." CVPR 2021
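The difference between the two label granularities on this slide can be illustrated with a minimal sketch. Everything in it is hypothetical: the class and field names, the clip ids, the action labels, and the timestamps are illustrative assumptions and do not reflect the actual annotation format of any dataset listed here. Only the modality strings are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VideoLevelSample:
    """Single-label case: one action label covers the whole clip."""
    video_id: str
    modalities: Tuple[str, ...]
    label: str


@dataclass
class Segment:
    """One temporally localized action: a label with start/end times in seconds."""
    label: str
    start: float
    end: float


@dataclass
class LocalizedSample:
    """Multi-label case: several action segments, which may overlap in time."""
    video_id: str
    modalities: Tuple[str, ...]
    segments: List[Segment] = field(default_factory=list)

    def labels_at(self, t: float) -> List[str]:
        """Return every action label active at time t (start inclusive, end exclusive)."""
        return [s.label for s in self.segments if s.start <= t < s.end]


# Hypothetical single-label sample with the modality set listed for PKU-MMD.
single = VideoLevelSample("clip_0001", ("RGB", "D", "IR", "skeleton"), "drink water")

# Hypothetical multi-label sample with the modality set listed for LEMMA.
multi = LocalizedSample(
    "clip_0002",
    ("RGB", "D"),
    [Segment("open fridge", 0.0, 2.5), Segment("take bottle", 2.0, 4.0)],
)

print(single.label)          # drink water
print(multi.labels_at(2.2))  # ['open fridge', 'take bottle']
```

In this sketch a video-level sample needs only a classifier over the whole clip, whereas a temporally localized sample can have more than one label active at the same instant, which is why the second group is described as "Multi Labels".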
Multi-modal
Single Label (Video-level Action)
• PKU-MMD: RGB+D+IR+skeleton
Liu, Chunhui, et al. "PKU-MMD: A large scale benchmark for continuous multi-modal human action understanding." arXiv 2017
Multi-modal
Single Label (Video-level Action)
• NTU RGB+D: RGB+D+IR+3D human joint
Shahroudy, Amir, et al. "NTU RGB+D: A large scale dataset for 3D human activity analysis." CVPR 2016
Liu, Jun, et al. "NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding." TPAMI 2019
[Figure: sample frames with panels labeled RGB, RGB+Joints, Depth, Depth+Joints, IR]
Multi-modal
Multi Labels (Temporally Localized Actions)
• MMAct: RGB+keypoints+acc+gyro+orientation
Kong, Quan, et al. "MMAct: A large-scale dataset for cross modal human action understanding." ICCV 2019
Multi-modal
Multi Labels (Temporally Localized Actions)
• HOMAGE: RGB+IR+audio+RGB light+light+acc+gyro, etc.
Rai, Nishant, et al. "Home Action Genome: Cooperative compositional action understanding." CVPR 2021
Thank You
Sangmin Woo
sangminwoo.github.io
smwoo95@kaist.ac.kr
sangminwoo
