BB VALUATION PAN HOUSE SEMINAR
Date: December 2nd, 2017
Presenter: Kaishu MINAMI
Paper info
• Title: Using Pre-Training Can Improve Model Robustness and Uncertainty
• Authors: Dan Hendrycks, Kimin Lee, Mantas Mazeika
• Affiliations: UC Berkeley, KAIST, Univ. of Chicago
• Published: ICML 2019
• Targeted Problem:
to show that pre-training can improve model robustness and uncertainty estimates
• Summary:
✓ “Pre-training does not necessarily help reduce overfitting” (He+, 2018), whilst
it helps to improve model robustness and uncertainty estimates.
✓ large gains from pre-training on 1. adversarial examples, 2. label corruption, 3. class
imbalance, 4. out-of-distribution detection, and 5. confidence calibration
✓ adversarial pre-training reaches the previous SotA in adversarial robustness
Recap: Pre-training
• Pre-training model history
✓ pre-trained model [Krizhevsky+, 2012]
✓ SotA object detection and segmentation [He+, 2017]
✓ “universal representations” that transfer to multiple domains [Rebuffi+, 2017]
✓ “pre-train then tune” paradigm [Zeiler&Fergus, 2014]
• When does a pre-trained model work?
✓ when the dataset for the target task is extremely small
✴ analyzed properties:
❖ when fine-tuning should stop [Agrawal+ 2014]
❖ which layers should be fine-tuned [Yosinski+ 2014]
❖ works across datasets, including the removal of classes [Huh+ 2016]
Recap: Rethinking Pre-training
• Rethinking ImageNet Pre-training [He+ 2018]
• Discussion Points:
✓ Results of training from scratch are no worse than pre-train+tune,
with the sole exception of requiring more training iterations
✓ Training from random initialization is robust:
✴ using only 10% of the training data
✴ for deeper and wider models
✴ for multiple tasks and metrics
(Figure: Mask R-CNN with a ResNet-50 FPN and GroupNorm backbone trained on
the COCO set; the learning rate is reduced where the accuracy leaps.)
(Figure: total numbers of images, instances, and pixels seen during all
training iterations.)
Recap: Robustness & Uncertainty Estimates
• Model robustness to
✓ label corruption [Sukhbaatar 2014, Patrini 2017, Zhang&Sabuncu 2018]
✴ using a stochastic matrix encoding the label noise
✴ two-step training to estimate the stochastic matrix for a corrected classifier
✴ networks overfit the noise if trained too long (Fig)
→ pre-training only requires fine-tuning for a short period
✓ class imbalance [Japkowicz 2000, He&Garcia 2008, Huang 2016]
✴ over-sampling the minority classes
✴ supervised loss function, re-weighting each sample by its inverse class frequency
✓ adversarial attacks [Szegedy 2014]
• Uncertainty estimates for
✓ out-of-distribution detection [Hendrycks&Gimpel 2017]
✓ calibration [Nguyen&O’Connor 2015]
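The inverse-frequency re-weighting mentioned above can be sketched as follows. This is a minimal illustration, not the exact scheme of any cited paper; the function name and the normalization (weights scaled so the effective sample count is preserved) are our own choices.

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Per-class weights proportional to inverse class frequency.

    `labels` is an array of integer class labels. Classes absent from
    the data keep weight 0 to avoid division by zero. Weights are
    scaled so the total weighted sample count equals len(labels).
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    weights = np.zeros(num_classes)
    present = counts > 0
    # total samples / (number of present classes * class count)
    weights[present] = counts.sum() / (present.sum() * counts[present])
    return weights

# Example: a 3-class problem with an 8:1:1 imbalance.
labels = np.array([0] * 8 + [1] + [2])
w = inverse_frequency_weights(labels, 3)  # minority classes get larger weights
```

Each sample's loss would then be multiplied by `w[label]`, so minority-class mistakes cost more.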
Test Robustness to Adversarial Perturbations
• Training:
✓ adversarial pre-training (with ℓ∞-bounded adversarial perturbations)
✓ learning rate: starts at 0.1 and anneals following a cosine curve
• Model: 28-10 Wide ResNet [Kurakin 2017, Madry 2018]
• Results:
✓ An adversarially pre-trained network can surpass the previous SotA.
✓ There is only a 1.04% decrease in adversarial accuracy when pre-training with
CIFAR-10-related classes removed.
→ training on more natural images will increase adversarial robustness.
✓ Even when only the last layer is adversarially tuned, the model surpasses the previous SotA.
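The cosine learning-rate annealing used above can be sketched as below. The function name and the `lr_max`/`lr_min` parameters are ours; the slide only states that the rate starts at 0.1 and follows a cosine decay.

```python
import math

def cosine_lr(step, total_steps, lr_max=0.1, lr_min=0.0):
    """Cosine annealing from lr_max down to lr_min over total_steps.

    At step 0 the rate is lr_max; at total_steps it has decayed to
    lr_min along a half-cosine curve.
    """
    t = min(step, total_steps) / total_steps  # progress in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# Example: a 100-step schedule starting at 0.1.
lrs = [cosine_lr(s, 100) for s in range(101)]  # 0.1 -> 0.05 at midpoint -> 0.0
```

The schedule decays slowly at first, fastest at the midpoint, and flattens out near the end, which tends to stabilize the final epochs of fine-tuning.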
Test Robustness to Label Corruption
• Task:
✓ predict y = argmax_y p(y|x) given a dataset with corrupted labels D̃ = {(x, ỹ)}
✓ with a ground-truth matrix of corruption probabilities
✓ 11 experiments, with the off-diagonal terms swept from 0 to 1 in increments of 0.1
• Training:
✓ pre-training: Downsampled ImageNet classifier against an untargeted adversary
✓ fine-tuning: CIFAR-10 or CIFAR-100
• Baselines:
✓ Forward [Patrini 2017]: two-stage training procedure:
1. estimate the corruption matrix C, 2. train the corrected classifier
✓ GLC [Hendrycks 2018]: specify the "trusted fraction" of clean labels for estimating the matrix
• Results:
✓ combined with a label-noise correction method, pre-training improves that method further
✓ pre-training with no correction already yields superior performance
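The Forward baseline's corrected loss can be sketched as below. This is a minimal single-example version under our assumptions: `C[i, j]` holds P(noisy label j | true label i), and `probs` is the model's softmax output over clean labels; the function name is ours.

```python
import numpy as np

def forward_corrected_nll(probs, noisy_label, C):
    """Forward correction (in the spirit of Patrini 2017).

    The model's clean-label posterior `probs` is pushed through the
    estimated corruption matrix C to get a distribution over noisy
    labels, and the negative log-likelihood is taken on the observed
    (possibly corrupted) label.
    """
    noisy_probs = probs @ C  # distribution over noisy labels
    return -np.log(noisy_probs[noisy_label])

# Sanity check: with no corruption (C = identity), this reduces to
# the standard cross-entropy on the observed label.
probs = np.array([0.7, 0.2, 0.1])
loss = forward_corrected_nll(probs, 0, np.eye(3))  # == -log 0.7
```

Training against this corrected loss lets the network keep a clean-label posterior internally even though it is only ever supervised with noisy labels.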