Sharpness-Aware Minimization for
Efficiently Improving Generalization
Presenter
이재윤
Fundamental Team
고형권, 김동희, 김준호, 김창연, 송헌, 이민경
Foret, Pierre, et al. "Sharpness-Aware Minimization for Efficiently
Improving Generalization." arXiv preprint arXiv:2010.01412 (2020).
Contents
1. Introduction
2. Sharpness-Aware Minimization
3. Experiments
1. Introduction
Purpose
[Figure: a sharp minimum to which a ResNet trained with SGD converged vs. a wide minimum to which the same ResNet trained with SAM converged.]
Sharpness-Generalization Correlation
On Large-Batch Training For Deep Learning: Generalization Gap and Sharp Minima, ICLR 2017
Applied Task
1. Image Classification (CIFAR-10, CIFAR-100)
2. Finetuning
3. Robustness to Label Noise
2. Sharpness-Aware Minimization
Motivation
SGD / Adam / RMSProp
 Concerned only with finding minima of the training loss.
 Results in suboptimal generalization for modern overparameterized models.
SAM
 Exploits the connection between the sharpness of the loss landscape and generalization.
 Seeks flat minima while minimizing the training loss.
PAC Bayesian Generalization Bound
Probably Approximately Correct
Probably, the given classifier is Approximately Correct for the test data
(given a training dataset drawn i.i.d. from distribution D).
H : hypothesis class (a measure of model complexity)
m : number of training samples
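As a reference point, a textbook form of the classical PAC bound for a finite hypothesis class H (a standard result, not specific to this paper): with probability at least 1 - \delta over the draw of the m training samples,

L_D(w) \;\le\; L_S(w) + \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}} \quad \text{for all } w \in H

The gap shrinks as m grows and widens with the model complexity |H|.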
PAC Bayesian Generalization Bound
Probably Approximately Correct
Figure source: https://www.textbook.ds100.org/ch/15/bias_cv.html
PAC Bayesian Generalization Bound
Probably Approximately Correct
Bayesian Generalization Bound
D : data distribution from which training and test data are drawn
S : training set (drawn i.i.d. from D)
w : classifier parameters
ρ : perturbation radius, a value greater than 0
ε : noise around 0
h : a strictly increasing function
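The bound these definitions build up to (the simplified form of the PAC-Bayesian bound stated in the paper): with probability 1 - \delta over the choice of the training set S,

L_D(w) \;\le\; \max_{\|\epsilon\|_2 \le \rho} L_S(w + \epsilon) \;+\; h\!\left( \|w\|_2^2 / \rho^2 \right)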
Sharpness Aware Minimization
 Minimizing the upper bound leads to better generalization.
 Make the sharpness term explicit: add and subtract L_S(w) (a net change of 0), which splits the right-hand side into a sharpness term, the training loss, and a regularizer; since h depends only on \|w\|_2^2 / \rho^2, it is replaced in practice by a standard L2-regularizer (see the decomposition below).
 Minimize L_S^{SAM}(w) + \lambda \|w\|_2^2, where L_S^{SAM}(w) = \max_{\|\epsilon\|_2 \le \rho} L_S(w + \epsilon).
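Written out, the decomposition annotated term by term on these slides:

\max_{\|\epsilon\|_2 \le \rho} L_S(w+\epsilon) + h(\|w\|_2^2/\rho^2)
\;=\; \underbrace{\Big[ \max_{\|\epsilon\|_2 \le \rho} L_S(w+\epsilon) - L_S(w) \Big]}_{\text{Sharpness}}
\;+\; \underbrace{L_S(w)}_{\text{Training Loss}}
\;+\; \underbrace{h(\|w\|_2^2/\rho^2)}_{\text{Regularizer (in practice } \lambda\|w\|_2^2\text{)}}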
Sharpness Aware Minimization
S : training set
w : classifier parameters
ρ : perturbation radius, a value greater than 0
ε : noise around 0
Sharpness Aware Minimization
 Before minimizing, find the ε that attains the inner maximum of L_S^{SAM}.
 Approximate the inner maximization by a 1st-order Taylor expansion of L_S(w + ε) around ε = 0.
 The linearized problem is a classical dual norm problem with a closed-form solution ε̂(w).
 Substitute ε̂(w) into L_S^{SAM} and differentiate, dropping second-order terms, to obtain the SAM gradient (see the derivation below).
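The derivation, specialized to the l_2 norm (p = 2, the choice used in the paper's experiments):

\epsilon^*(w) \;\approx\; \arg\max_{\|\epsilon\|_2 \le \rho} \; \epsilon^\top \nabla_w L_S(w) \qquad \text{(1st-order Taylor expansion around } \epsilon = 0\text{)}

\hat{\epsilon}(w) \;=\; \rho \, \frac{\nabla_w L_S(w)}{\|\nabla_w L_S(w)\|_2} \qquad \text{(closed-form dual norm solution)}

\nabla_w L_S^{SAM}(w) \;\approx\; \nabla_w L_S(w) \Big|_{w + \hat{\epsilon}(w)} \qquad \text{(substitute and drop second-order terms)}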
Algorithm
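A minimal PyTorch sketch of one SAM training step, assuming generic model, loss_fn, and base optimizer objects (hypothetical names; rho = 0.05 is only an illustrative default, and the paper presents the method as pseudocode rather than PyTorch):

import torch

def sam_step(model, loss_fn, optimizer, x, y, rho=0.05):
    # First pass: gradient of the training loss at the current weights w.
    loss = loss_fn(model(x), y)
    loss.backward()

    # eps_hat = rho * grad / ||grad||_2  (the p = 2 dual-norm solution).
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]), p=2)
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)                      # climb to w + eps_hat
            eps.append(e)
    optimizer.zero_grad()

    # Second pass: gradient of L_S evaluated at the perturbed point w + eps_hat.
    loss_fn(model(x), y).backward()

    # Undo the perturbation, then update w using the SAM gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

Each update costs two forward/backward passes: one to compute the ascent direction ε̂(w), one to evaluate the gradient at w + ε̂(w). This doubled per-step cost is SAM's main computational overhead.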
3. Experiments
Experiments – Image Classification
Experiments – Finetuning
Experiments – Label Noise
Experiments – Evolution of the spectrum of the Hessian
Thank you
