This document summarizes several human action recognition datasets. It covers both single-label datasets, in which an entire video is assigned a single action class, and multi-label datasets, in which actions are temporally localized within untrimmed videos. Datasets are further categorized as generic, instructional, egocentric, compositional, multi-view, or multi-modal, depending on the type of activities and the data modalities they include. Several prominent multi-modal datasets are highlighted, such as PKU-MMD, NTU RGB+D, MMAct, and HOMAGE, which provide video alongside additional modalities such as depth, infrared, audio, and sensor data.
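To make the distinction between the two annotation styles concrete, the following is a minimal sketch (not drawn from any of the datasets above) of how a video-level label differs from temporally localized, possibly overlapping action segments; all class names, field names, and timestamps are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VideoLevelAnnotation:
    """Single-label style: one action class assigned to the whole video."""
    video_id: str
    label: str  # e.g. "jumping" (hypothetical class name)


@dataclass
class ActionSegment:
    """One temporally localized action instance inside an untrimmed video."""
    label: str
    start_sec: float
    end_sec: float


@dataclass
class TemporalAnnotation:
    """Multi-label style: several (possibly overlapping) segments per video."""
    video_id: str
    segments: List[ActionSegment] = field(default_factory=list)


# Hypothetical examples of each annotation style
clip = VideoLevelAnnotation(video_id="v_0001", label="jumping")

untrimmed = TemporalAnnotation(
    video_id="v_0002",
    segments=[
        ActionSegment("open_fridge", 3.2, 5.8),
        ActionSegment("pour_milk", 5.5, 9.1),  # overlaps the previous segment
    ],
)
```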