Sharpness-Aware Minimization for
Efficiently Improving Generalization
Presenter
이재윤
Fundamental Team
고형권, 김동희, 김준호, 김창연, 송헌, 이민경
Foret, Pierre, et al. "Sharpness-Aware Minimization for Efficiently
Improving Generalization." arXiv preprint arXiv:2010.01412 (2020).
Contents
1. Introduction
2. Sharpness-Aware-Minimization
3. Experiments
1. Introduction
Purpose
Sharp minimum to which a ResNet trained with SGD converged.
Wide minimum to which the same ResNet trained with SAM converged.
Sharpness-Generalization Correlation
On Large-Batch Training For Deep Learning: Generalization Gap and Sharp Minima, ICLR 2017
Applied Task
1. Image Classification (CIFAR-10, CIFAR-100)
2. Finetuning
3. Robustness to Label Noise
2. Sharpness-Aware-Minimization
Motivation
SGD, Adam, RMSProp
• Only concerned with driving the training loss to its global minimum.
• Can give suboptimal generalization for modern overparameterized models.
SAM
• Uses the connection between the sharpness of the loss landscape and generalization.
• Seeks flat minima while minimizing the loss.
PAC Bayesian Generalization Bound
Probably Approximately Correct
Probably, the given classifier is Approximately Correct on the test data
(given a training dataset drawn i.i.d. from distribution D).
H: model complexity
m: number of samples in the training data
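The formula on this slide did not survive extraction. As a hedged illustration of how H and m enter a PAC-style bound, the sketch below evaluates the textbook bound for a finite hypothesis class and a loss in [0, 1] (Hoeffding's inequality plus a union bound): with probability at least 1 - δ, test error ≤ training error + sqrt((ln|H| + ln(1/δ)) / (2m)). Using ln|H| as the complexity term and the name `finite_class_gap` are assumptions made for illustration and are not taken from the slides.

```python
import math


def finite_class_gap(num_hypotheses: int, m: int, delta: float = 0.05) -> float:
    """Bound on (test error - training error) for a finite hypothesis class.

    Holds with probability >= 1 - delta, simultaneously for every classifier
    in the class, assuming the loss takes values in [0, 1].
    """
    complexity = math.log(num_hypotheses)   # ln|H|: grows with model complexity
    confidence = math.log(1.0 / delta)      # ln(1/delta): price of "probably"
    return math.sqrt((complexity + confidence) / (2.0 * m))


# Hypothetical setting: a class of 2**20 classifiers, growing training set.
for m in (100, 10_000, 1_000_000):
    print(m, finite_class_gap(num_hypotheses=2**20, m=m))
```

The gap shrinks as the number of training samples m grows and widens as the hypothesis class gets larger, which is the qualitative trade-off the slide describes.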
Figure source: https://www.textbook.ds100.org/ch/15/bias_cv.html
Bayesian Generalization Bound
D: data distribution from which the test set is drawn
S: training set
w: classifier parameters
ρ: a value greater than 0 (neighborhood radius)
ε: noise around 0 (perturbation added to w)
h : strictly increasing function
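The inequality that these symbols annotate is not present in the extracted text. For reference, the bound as stated informally in the cited paper (Foret et al., 2020) is reproduced below from the paper rather than from the slide. It holds for any ρ > 0, with high probability over the training set S drawn from D, under technical conditions given in the paper.

```latex
L_{\mathcal{D}}(w) \;\le\; \max_{\|\epsilon\|_2 \le \rho} L_S(w + \epsilon)
  \;+\; h\!\left( \frac{\|w\|_2^2}{\rho^2} \right)
```

Here $L_{\mathcal{D}}$ is the population (test) loss and $L_S$ is the training loss, so the right-hand side depends only on quantities available during training.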
Sharpness Aware Minimization
• Minimizing the upper bound leads to better generalization.
• Make the sharpness term explicit: add and subtract the training loss (net change = 0), so the bound splits into three labeled terms.
  Sharpness
  Training Loss
  Regularizer (replaced by a standard L2-Regularizer)
• Minimize $L_S^{SAM}$.
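The labels on these slides (Sharpness, Training Loss, Regularizer) annotate an equation that did not survive extraction. The decomposition below is reconstructed from the cited paper, not from the slide, and should be read as a reference sketch. The bracketed term is the sharpness, the middle term is the training loss, and the last term is the regularizer.

```latex
\underbrace{\Big[ \max_{\|\epsilon\|_2 \le \rho} L_S(w + \epsilon) - L_S(w) \Big]}_{\text{sharpness}}
  \;+\; \underbrace{L_S(w)}_{\text{training loss}}
  \;+\; \underbrace{h\!\left( \|w\|_2^2 / \rho^2 \right)}_{\text{regularizer}}

% Replacing h(.) with a standard L2 term gives the SAM objective:
\min_{w} \; L_S^{SAM}(w) + \lambda \|w\|_2^2,
\qquad
L_S^{SAM}(w) \triangleq \max_{\|\epsilon\|_p \le \rho} L_S(w + \epsilon)
```

In the paper, λ is the usual weight-decay hyperparameter and p = 2 is the norm used by default.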
• Before minimizing, find the ε that attains the maximum in $L_S^{SAM}$, using a 1st-order Taylor expansion of the loss around w.
• Dual norm problem
• Substitute ε into $L_S^{SAM}$.
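The equations for these three steps are missing from the extracted text. The derivation below follows the cited paper and is offered as a reference sketch, with 1/p + 1/q = 1.

```latex
\begin{align*}
\epsilon^{*}(w)
  &= \arg\max_{\|\epsilon\|_p \le \rho} L_S(w + \epsilon) \\
  &\approx \arg\max_{\|\epsilon\|_p \le \rho} \; L_S(w) + \epsilon^{\top} \nabla_w L_S(w)
   && \text{(1st-order Taylor expansion)} \\
  &= \arg\max_{\|\epsilon\|_p \le \rho} \; \epsilon^{\top} \nabla_w L_S(w) \\[4pt]
\hat{\epsilon}(w)
  &= \rho \, \operatorname{sign}\!\big(\nabla_w L_S(w)\big)
     \frac{\big|\nabla_w L_S(w)\big|^{q-1}}{\big( \|\nabla_w L_S(w)\|_q^{q} \big)^{1/p}}
   && \text{(dual norm problem)} \\
  &= \rho \, \frac{\nabla_w L_S(w)}{\|\nabla_w L_S(w)\|_2}
   && \text{(for } p = 2\text{)} \\[4pt]
\nabla_w L_S^{SAM}(w)
  &\approx \nabla_w L_S(w) \,\big|_{\,w + \hat{\epsilon}(w)}
   && \text{(substitute } \hat{\epsilon}\text{, drop second-order terms)}
\end{align*}
```

The last line is what makes SAM practical: the update needs only two ordinary gradient evaluations, one at w and one at w + ε̂(w).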
Algorithm
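The algorithm box itself is not in the extracted text. A minimal sketch of one SAM update for p = 2, as described in the cited paper, might look like the following. The toy quadratic loss, the use of plain gradient descent as the base optimizer, and all function names here are assumptions for illustration, not the presenter's or the authors' code.

```python
import math


def sam_perturbation(grad, rho):
    """eps_hat = rho * grad / ||grad||_2, the maximizer of the linearized loss."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:                      # at a stationary point there is no direction
        return [0.0 for _ in grad]
    return [rho * g / norm for g in grad]


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM update with plain gradient descent as the base optimizer."""
    g = grad_fn(w)                                   # 1) gradient at w
    eps = sam_perturbation(g, rho)                   # 2) approximate worst-case perturbation
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    g_sam = grad_fn(w_adv)                           # 3) gradient at w + eps_hat
    return [wi - lr * gi for wi, gi in zip(w, g_sam)]  # 4) apply it at the original w


# Hypothetical stand-in for a network loss: L(w) = 0.5 * sum(c_i * w_i^2).
CURVATURE = [10.0, 0.1]


def loss(w):
    return 0.5 * sum(c * wi * wi for c, wi in zip(CURVATURE, w))


def grad(w):
    return [c * wi for c, wi in zip(CURVATURE, w)]


w0 = [1.0, 1.0]
w = list(w0)
for _ in range(100):
    w = sam_step(w, grad)
print(loss(w0), loss(w))
```

Each step costs two gradient evaluations instead of one, which is the main overhead of SAM relative to the base optimizer.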
3. Experiments
Experiments – Image Classification
Experiments – Finetuning
Experiments – Label Noise
Experiments – Evolution of the spectrum of the Hessian
Thank you
