BB VALUATION PAN HOUSE SEMINAR
Date: December 2nd, 2017
Presenter: Kaishu MINAMI
Paper info
• Title: Using Pre-Training Can Improve Model Robustness and Uncertainty
• Authors: Dan Hendrycks, Kimin Lee, Mantas Mazeika
• Affiliations: UC Berkeley, KAIST, Univ. of Chicago
• Published: ICML 2019
• Targeted Problem:
to show that pre-training can improve model robustness and uncertainty estimates
• Summary:
✓ “Pre-training does not necessarily help reduce overfitting” (He+, 2018), whilst
it helps to improve model robustness and uncertainty estimates.
✓ large gains from pre-training on 1. adversarial examples, 2. label corruption, 3. class
imbalance, 4. out-of-distribution detection, and 5. confidence calibration
✓ adversarial pre-training reaches the previous SotA in adversarial robustness
Recap: Pre-training
• Pre-training model history
✓ pre-trained model [Krizhevsky+, 2012]
✓ SotA object detection and segmentation [He+, 2017]
✓ “universal representations” that transfer to multiple domains [Rebuffi+, 2017]
✓ “pre-train then tune” paradigm [Zeiler&Fergus, 2014]
• When does a pre-trained model work?
✓ when the dataset for the target task is extremely small
✴ analyzed properties:
❖ when fine-tuning should stop [Agrawal+ 2014]
❖ which layers should be fine-tuned [Yosinski+ 2014]
❖ works across datasets, including the removal of classes [Huh+ 2016]
Recap: Rethinking Pre-training
• Rethinking ImageNet Pre-training [He+ 2018]
• Discussion Points:
✓ Results of training from scratch are no worse than pre-train+tune,
with the sole exception of requiring more training iterations
✓ Training from random initialization is robust:
✴ using only 10% of the training data
✴ for deeper and wider models
✴ for multiple tasks and metrics
(Figure: Mask R-CNN with a ResNet-50 FPN and GroupNorm backbone trained on
the COCO set; the learning rate is reduced where the accuracy leaps.)
(Figure: total numbers of images, instances, and pixels seen during all
training iterations.)
Recap: Robustness & Uncertainty Estimates
• Model robustness to
✓ label corruption [Sukhbaatar 2014, Patrini 2017, Zhang&Sabuncu 2018]
✴ using a stochastic matrix encoding the label noise
✴ two-step training to estimate the stochastic matrix for a corrected classifier
✴ networks overfit the noise if trained too long (Fig)
→ pre-training only requires fine-tuning for a short period
✓ class imbalance [Japkowicz 2000, He&Garcia 2008, Huang 2016]
✴ over-sampling the minority classes
✴ supervised loss function, re-weighting each sample by its inverse class frequency
✓ adversarial attacks [Szegedy 2014]
• Uncertainty estimates for
✓ out-of-distribution detection [Hendrycks&Gimpel 2017]
✓ calibration [Nguyen&O’Connor 2015]
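The inverse-frequency re-weighting mentioned above can be sketched as follows. This is a minimal illustration, not the exact scheme of any cited paper; the function name and the normalization (weights scaled so the effective sample count is preserved) are our own choices.

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Per-class weights proportional to inverse class frequency.

    `labels` is an array of integer class labels. Classes absent from
    the data keep weight 0 to avoid division by zero. Weights are
    scaled so the total weighted sample count equals len(labels).
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    weights = np.zeros(num_classes)
    present = counts > 0
    # total samples / (number of present classes * class count)
    weights[present] = counts.sum() / (present.sum() * counts[present])
    return weights

# Example: a 3-class problem with an 8:1:1 imbalance.
labels = np.array([0] * 8 + [1] + [2])
w = inverse_frequency_weights(labels, 3)  # minority classes get larger weights
```

Each sample's loss would then be multiplied by `w[label]`, so minority-class mistakes cost more.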
Test Robustness to Adversarial Perturbations
• Training:
✓ adversarial pre-training (with ℓ∞-bounded adversarial perturbations)
✓ learning rate: starts at 0.1 and anneals following a cosine curve
• Model: 28-10 Wide ResNet [Kurakin 2017, Madry 2018]
• Results:
✓ An adversarially pre-trained network can surpass the previous SotA.
✓ There is only a 1.04% decrease in adversarial accuracy when pre-training with
CIFAR-10-related classes removed.
→ training on more natural images will increase adversarial robustness.
✓ Even when only the last layer is adversarially tuned, the model surpasses the previous SotA.
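The cosine learning-rate annealing used above can be sketched as below. The function name and the `lr_max`/`lr_min` parameters are ours; the slide only states that the rate starts at 0.1 and follows a cosine decay.

```python
import math

def cosine_lr(step, total_steps, lr_max=0.1, lr_min=0.0):
    """Cosine annealing from lr_max down to lr_min over total_steps.

    At step 0 the rate is lr_max; at total_steps it has decayed to
    lr_min along a half-cosine curve.
    """
    t = min(step, total_steps) / total_steps  # progress in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# Example: a 100-step schedule starting at 0.1.
lrs = [cosine_lr(s, 100) for s in range(101)]  # 0.1 -> 0.05 at midpoint -> 0.0
```

The schedule decays slowly at first, fastest at the midpoint, and flattens out near the end, which tends to stabilize the final epochs of fine-tuning.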
Test Robustness to Label Corruption
• Task:
✓ predict y = argmax_y p(y|x) given a dataset with corrupted labels D̃ = {(x, ỹ)}
✓ with a ground-truth matrix of corruption probabilities
✓ 11 experiments, with the off-diagonal terms swept from 0 to 1 in increments of 0.1
• Training:
✓ pre-training: Downsampled ImageNet classifier against an untargeted adversary
✓ fine-tuning: CIFAR-10 or CIFAR-100
• Baselines:
✓ Forward [Patrini 2017]: two-stage training procedure:
1. estimate the corruption matrix C, 2. train the corrected classifier
✓ GLC [Hendrycks 2018]: specify the "trusted fraction" of clean labels for estimating the matrix
• Results:
✓ combined with a label-noise correction method, pre-training improves that method further
✓ pre-training with no correction already yields superior performance
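The Forward baseline's corrected loss can be sketched as below. This is a minimal single-example version under our assumptions: `C[i, j]` holds P(noisy label j | true label i), and `probs` is the model's softmax output over clean labels; the function name is ours.

```python
import numpy as np

def forward_corrected_nll(probs, noisy_label, C):
    """Forward correction (in the spirit of Patrini 2017).

    The model's clean-label posterior `probs` is pushed through the
    estimated corruption matrix C to get a distribution over noisy
    labels, and the negative log-likelihood is taken on the observed
    (possibly corrupted) label.
    """
    noisy_probs = probs @ C  # distribution over noisy labels
    return -np.log(noisy_probs[noisy_label])

# Sanity check: with no corruption (C = identity), this reduces to
# the standard cross-entropy on the observed label.
probs = np.array([0.7, 0.2, 0.1])
loss = forward_corrected_nll(probs, 0, np.eye(3))  # == -log 0.7
```

Training against this corrected loss lets the network keep a clean-label posterior internally even though it is only ever supervised with noisy labels.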