1. DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
VideoMix: Rethinking Data Augmentation for
Video Classification
Takumi Ohkuma, Nakayama Lab D1
2. Bibliographic Information
• Title: VideoMix: Rethinking Data Augmentation for Video Classification
• Source: arXiv (submitted 2020/12/07)
• Authors: Sangdoo Yun et al. (five authors; a research team at NAVER, South Korea)
• URL: https://arxiv.org/abs/2012.03457v1
Figures and tables in these slides without an explicit source are taken from this paper.
19. Impressions
• The results can only be summed up as "simple is best".
• In the end, the methodological novelty amounts to little more than extending CutMix straightforwardly to video and proposing the most accurate variant, Spatial CutMix, as the method?
• None of the more complex attempts led to better accuracy (mixing three or more videos, more elaborate beta distributions).
• The analysis feels somewhat weak.
• It is hard to shake the impression that the explanations for the accuracy gains are forced.
• There still seems to be plenty of room to explore in the video × data augmentation direction.
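For reference, the Spatial CutMix variant discussed above can be sketched in a few lines: paste one spatial rectangle from clip B into clip A, sharing the same box across every frame, and mix the labels by the pasted-area ratio (as in the original CutMix formulation). This is a minimal NumPy sketch, not the authors' implementation; the function name and the (T, H, W, C) tensor layout are assumptions for illustration.

```python
import numpy as np

def spatial_videomix(video_a, video_b, label_a, label_b, alpha=1.0, rng=None):
    """Sketch of Spatial CutMix for video (hypothetical helper, not the paper's code).

    Pastes a spatial box from clip B into clip A, with the same box shared
    across all frames. Videos are (T, H, W, C); labels are one-hot vectors.
    """
    if rng is None:
        rng = np.random.default_rng()
    T, H, W, C = video_a.shape
    lam = rng.beta(alpha, alpha)        # mixing ratio ~ Beta(alpha, alpha)
    cut_ratio = np.sqrt(1.0 - lam)      # box side ratio so that box area = 1 - lam
    cut_h, cut_w = int(H * cut_ratio), int(W * cut_ratio)
    cy, cx = rng.integers(H), rng.integers(W)  # random box center
    y1, y2 = np.clip(cy - cut_h // 2, 0, H), np.clip(cy + cut_h // 2, 0, H)
    x1, x2 = np.clip(cx - cut_w // 2, 0, W), np.clip(cx + cut_w // 2, 0, W)
    mixed = video_a.copy()
    mixed[:, y1:y2, x1:x2, :] = video_b[:, y1:y2, x1:x2, :]  # same box on every frame
    # re-derive lambda from the actual (clipped) pasted area
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (H * W)
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed, mixed_label
```

With alpha = 1.0 the Beta distribution is uniform, which matches the slide's observation that more elaborate beta distributions did not help.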
20. References
1. Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical data augmentation with no separate search.
arXiv preprint arXiv:1909.13719, 2019.
2. Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint
arXiv:1710.09412, 2017.
3. Terrance DeVries and Graham W Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint
arXiv:1708.04552, 2017.
4. Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy
to train strong classifiers with localizable features. In Proceedings of the IEEE International Conference on Computer Vision, pages
6023–6032, 2019.
5. Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor
Back, Paul Natsev, et al. The kinetics human action video dataset. arXiv preprint arXiv:1705.06950, 2017.
6. Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. Slowfast networks for video recognition. In Proceedings of the
IEEE International Conference on Computer Vision, pages 6202–6211, 2019.
7. Du Tran, Heng Wang, Lorenzo Torresani, and Matt Feiszli. Video classification with channel-separated convolutional networks. In
Proceedings of the IEEE International Conference on Computer Vision, pages 5552–5561, 2019.
21. References (cont.)
8. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization.
In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2921–2929, 2016.
9. Y.-G. Jiang, J. Liu, A. Roshan Zamir, G. Toderici, I. Laptev, M. Shah, and R. Sukthankar. THUMOS challenge: Action recognition with a
large number of classes. http://crcv.ucf.edu/THUMOS14/, 2014.
10. Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? a new model and the kinetics dataset. In proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017.
11. Krishna Kumar Singh and Yong Jae Lee. Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and
action localization. In 2017 IEEE International Conference on Computer Vision, pages 3544–3553. IEEE, 2017.
12. Chunhui Gu, Chen Sun, David A Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici,
Susanna Ricco, Rahul Sukthankar, et al. Ava: A video dataset of spatio-temporally localized atomic visual actions. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, pages 6047–6056, 2018.
13. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal
networks. In Advances in neural information processing systems, pages 91–99, 2015.