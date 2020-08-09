
Discussion 1. Rethinking pre-training and universal feature representations - Requirements of universal feature representa...
Review : Rethinking Pre-training and Self-training

Review : Rethinking Pre-training and Self-training (by Google Research, Brain Team)

Paper Link : https://arxiv.org/abs/2006.06882

Published in: Technology

  1. Rethinking Pre-training and Self-training
      - Google Research, Brain Team
      - Yonsei University Severance Hospital CCIDS, Choi Dongmin
  4. Introduction • Generality and Flexibility of Self-training with three insights
      1) Stronger data augmentation & more labeled data → diminish the value of pre-training
      2) Unlike pre-training, self-training is always helpful
      3) Self-training improves upon pre-training
  5. Methodology • Methods and Control Factors
      1. Data Augmentation
      2. Pre-training
      3. Self-training
  6. Methodology • Methods and Control Factors
      1. Data Augmentation
  7. Methodology • Methods and Control Factors
      1. Data Augmentation
      - AutoAugment: automatically searches for improved data augmentation policies
      - RandAugment: removes the separate search phase on a proxy task
      - Slide annotation: stronger
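The RandAugment idea on slide 7 (no separate policy search) can be illustrated with a small sketch. It assumes the usual RandAugment formulation of applying a fixed number of randomly chosen operations at one shared magnitude. The toy image format, the three operations, and the names `rand_augment`, `num_ops`, and `magnitude` are hypothetical and chosen only for illustration; they are not the policies used in the paper.

```python
import random

# Toy "image": a list of rows of pixel intensities in [0, 255].
# The three operations below are hypothetical stand-ins for real image ops.

def flip_horizontal(img, magnitude):
    # Mirror each row; the magnitude is ignored for this operation.
    return [row[::-1] for row in img]

def brighten(img, magnitude):
    # Add a magnitude-scaled offset, clipped at 255.
    delta = 10 * magnitude
    return [[min(255, p + delta) for p in row] for row in img]

def darken(img, magnitude):
    # Subtract a magnitude-scaled offset, clipped at 0.
    delta = 10 * magnitude
    return [[max(0, p - delta) for p in row] for row in img]

OPS = [flip_horizontal, brighten, darken]

def rand_augment(img, num_ops, magnitude, rng):
    """Apply `num_ops` uniformly sampled operations at a shared magnitude."""
    for _ in range(num_ops):
        op = rng.choice(OPS)
        img = op(img, magnitude)
    return img

if __name__ == "__main__":
    image = [[0, 100], [200, 255]]
    print(rand_augment(image, num_ops=2, magnitude=3, rng=random.Random(0)))
```

Only two numbers (how many operations, how strong) need tuning in this scheme, which is why no search on a proxy task is required.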
  8. Methodology • Methods and Control Factors
      2. Pre-training (EfficientNet-B7 baseline)
  9. Methodology • Methods and Control Factors
      2. Pre-training (EfficientNet-B7 baseline)
      - ImageNet++ Init: EfficientNet-B7 + Noisy Student Method
      - Noisy Student: a semi-supervised learning method (self-training + distillation)
      - M Tan et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML 2019
      - Qizhe Xie et al. Self-training with Noisy Student improves ImageNet classification. arXiv:1911.04252
  10. Methodology • Methods and Control Factors
      3. Self-training (based on the Noisy Student Method)
      - Qizhe Xie et al. Self-training with Noisy Student improves ImageNet classification. arXiv:1911.04252
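The self-training recipe named on slide 10 can be sketched as a teacher/student loop: fit a teacher on human-labeled data, let it pseudo-label unlabeled data, then fit a student on both. The sketch below is a minimal illustration under strong simplifying assumptions. It uses a hypothetical 1-D nearest-centroid classifier in place of a detection network, and it omits the noise (such as data augmentation) that the Noisy Student method applies to the student. The function names are made up for this example.

```python
def fit_centroids(points, labels):
    """Fit a toy classifier: one mean value (centroid) per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))

def self_train(labeled_x, labeled_y, unlabeled_x):
    # Step 1: train the teacher on human-labeled data only.
    teacher = fit_centroids(labeled_x, labeled_y)
    # Step 2: the teacher generates pseudo labels for the unlabeled data.
    pseudo_y = [predict(teacher, x) for x in unlabeled_x]
    # Step 3: train the student on human labels and pseudo labels together.
    student = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
    return teacher, student

if __name__ == "__main__":
    teacher, student = self_train([0.0, 10.0], ["a", "b"], [1.0, 2.0, 8.0, 9.0])
    print(teacher)  # {'a': 0.0, 'b': 10.0}
    print(student)  # {'a': 1.0, 'b': 9.0}
```

In the toy run the student's centroids move toward the unlabeled points, which is the sense in which unlabeled data shapes the final model.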
  11. Experiments 1. The effects of augmentation and labeled dataset size on pre-training
      - Task: COCO object detection
      - Network: RetinaNet with the EfficientNet-B7 backbone
      - Left: under various ImageNet pre-trained checkpoints and data augmentation strengths
      - Finding 1: Pre-training hurts performance when stronger data augmentation is used
      - TY Lin et al. Focal Loss for Dense Object Detection. ICCV 2017
  12. Experiments 1. The effects of augmentation and labeled dataset size on pre-training
      - Task: COCO object detection
      - Network: RetinaNet with the EfficientNet-B7 backbone
      - Right: under various COCO dataset sizes and ImageNet pre-trained checkpoints
      - Finding 2: More labeled data diminishes the value of pre-training
      - TY Lin et al. Focal Loss for Dense Object Detection. ICCV 2017
  13. Experiments 2. The effects of augmentation and labeled dataset size on self-training
      - Task: COCO object detection (self-training only treats ImageNet as unlabeled data)
      - Network: RetinaNet with the EfficientNet-B7 backbone
      - Finding 1: Self-training helps in high data/strong augmentation regimes, even when pre-training hurts
  14. Experiments 2. The effects of augmentation and labeled dataset size on self-training
      - Task: COCO object detection (self-training only treats ImageNet as unlabeled data)
      - Network: RetinaNet with the EfficientNet-B7 backbone
      - Finding 2: Self-training works across dataset sizes and is additive to pre-training
  15. Experiments 3. Self-supervised pre-training also hurts when self-training helps in high data/strong augmentation regimes
      - Task: COCO object detection
      - Network: RetinaNet with the ResNet-50 backbone
      - All models use Augment-S4
      - T Chen et al. A Simple Framework for Contrastive Learning of Visual Representations. arXiv:2002.05709
      - https://amitness.com/2020/03/illustrated-simclr/
  16. Experiments 4. Exploring the limits of self-training and pre-training
      - Task: COCO object detection
      - Network: SpineNet (closer to SOTA)
      - Self-training dataset: Open Images Dataset
      - SpineNet with self-training achieves the best performance
      - X Du et al. SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization. CVPR 2020
  17. Experiments 4. Exploring the limits of self-training and pre-training
      - Task: PASCAL VOC semantic segmentation
      - Network: NAS-FPN (EfficientNet backbone)
      - Pre-training + Self-training + Augment-S4
      - Pre-training dataset: ImageNet
      - Self-training dataset: aug set of PASCAL
      - Improves SOTA by +1.5% mIOU with far fewer human labels
      - G Ghiasi et al. NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. CVPR 2019
  18. Experiments 4. Exploring the limits of self-training and pre-training
      - Task: PASCAL VOC semantic segmentation
      - Network: NAS-FPN (EfficientNet backbone)
      - Pre-training + Self-training + Augment-S4
      - Pre-training dataset: ImageNet
      - Self-training dataset: aug set of PASCAL
      - Pre-training with a good checkpoint is crucial due to PASCAL's small dataset size (Appendix C)
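The PASCAL VOC segmentation result above is reported in mIOU (mean intersection over union). As a reminder of what that metric measures, here is a minimal sketch over flat lists of per-pixel class ids. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions made for this example, not details taken from the slides or the paper's evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for flat label lists."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither list
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

if __name__ == "__main__":
    # Class 0: IoU 1/2; class 1: IoU 2/3; the mean is 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```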
  21. Discussion 2. The benefit of joint-training
      - Joint-training: jointly train ImageNet classification with COCO object detection
      - Random Initialization + Self-training + Joint Training: +4.4 improvement
      - Joint Training (+2.9) and Pre-training (+2.6) give similar improvements, but Joint Training gets there with 19 epochs of training, while Pre-training needed 350 epochs
  22. Discussion 3. The importance of the task alignment
      - aug: additional PASCAL VOC dataset with much noisier labels
      - Training with the aug dataset hurts performance when strong augmentation is used
      - Self-training (pseudo-labels on the aug dataset) improves accuracy
      - Noisy (PASCAL) or un-targeted (ImageNet) labeling is worse than targeted pseudo-labeling
  23. Discussion 3. The importance of the task alignment
      - aug: additional PASCAL VOC dataset with much noisier labels
      - Training with the aug dataset hurts performance when strong augmentation is used
      - Self-training (pseudo-labels on the aug dataset) improves accuracy
      - Noisy (PASCAL) or un-targeted (ImageNet) labeling is worse than targeted pseudo-labeling
      - Shao et al.: Pre-training on Open Images hurts performance on COCO, despite both being annotated with bounding boxes
      - Not only the task but also the annotations need to be the same for pre-training to be beneficial (but self-training is very general)
      - Shao et al. Objects365: A Large-scale, High-quality Dataset for Object Detection. ICCV 2019
  24. Discussion 4. Limitations
      - Self-training requires more compute than pre-training
      - Good pre-trained models are also needed for low-data applications (e.g. PASCAL segmentation)
  25. Discussion 4. Limitations
      - Self-training requires more compute than pre-training
      - Good pre-trained models are also needed for low-data applications (e.g. PASCAL segmentation)
      5. The scalability, generality and flexibility of self-training
      - Scalability: works well as we have more labeled data
      - Generality: works well both when pre-training fails and when pre-training succeeds
      - Flexibility: works well in every setup (low or high data / weak or strong aug) and with different architectures, data sources, and tasks
      - Most methods fail when we have more labeled data, more compute, or better supervised training recipes, but that does not seem to apply to self-training
