Bayesian Neural Networks for
Uncertainty Quantification
John Mitros
ioannis.mitros@insight-centre.org
(3/12/2019)
What is a Neural Network?
Neural networks are parameterised functions
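• Data: $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N} = (X, y)$
• Parameters $\theta$ are the weights of the neural net.
• Feedforward neural nets model $p(y_i \mid x_i, \theta)$ as a nonlinear
function of $\theta$ and $x$, e.g.:
$p(y_i = 1 \mid x_i, \theta) = \sigma\big(\sum_j \theta_j x_j^{(i)}\big)$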
• Multilayer/deep neural networks model the overall function as a
composition of functions (layers), e.g.:
$y_i = \sum_k \theta_k^{(2)} \, \sigma\big(\sum_j \theta_{kj}^{(1)} x_j^{(i)}\big)$
• Usually trained to maximise the likelihood using variants of SGD
optimisation
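A minimal sketch of the two model forms on this slide, in plain Python. The weights and the input vector are made-up values for illustration, not numbers from the talk.

```python
import math

def sigmoid(a):
    """Logistic nonlinearity sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def logistic_unit(theta, x):
    """p(y = 1 | x, theta) = sigma(sum_j theta_j * x_j)."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def two_layer_net(theta1, theta2, x):
    """y = sum_k theta2[k] * sigma(sum_j theta1[k][j] * x[j])."""
    hidden = [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in theta1]
    return sum(t * h for t, h in zip(theta2, hidden))

# Hypothetical weights and input, chosen only for illustration.
x = [1.0, -2.0]
p = logistic_unit([0.5, 0.25], x)   # pre-activation is 0.5 - 0.5 = 0
y = two_layer_net([[0.5, 0.25], [0.0, 0.0]], [2.0, -2.0], x)
```

Both hidden units see a pre-activation of zero here, so the two output weights cancel; the point is only to show the composition of layers.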
Limitations of Deep Learning
Neural networks and deep learning systems provide great performance on many
tasks, but they are generally:
• data hungry (e.g. often millions of examples)
• compute intensive to train and deploy (GPU resources)
• poor at representing uncertainty
• miscalibrated
• difficult to optimise (non-convex objective; the choice of architecture, learning procedure,
initialisation, etc. requires experimentation)
Objectives
• Questions:
• Are Bayesian neural networks better calibrated?
• Can Bayesian neural networks assign high uncertainty to out-of-distribution samples?
• Approximate Bayesian Inference Techniques
• MC-Dropout (Gal et al. ‘15)
• Stochastic Weight Averaging-Gaussian - SWAG (Maddox et al. ‘19)
• Models
• VGG16
• PreResNet164
• WideResNet28x10
Bayesian Neural Networks
Bayesian neural network
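• Data: $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N} = (X, y)$
• Parameters $\theta$ are the weights of the neural net.
• prior $p(\theta)$
• posterior $p(\theta \mid \mathcal{D}) \propto p(y \mid X, \theta)\, p(\theta)$
• prediction $p(y^{*} \mid \mathcal{D}, x^{*}) = \int p(y^{*} \mid x^{*}, \theta)\, p(\theta \mid \mathcal{D})\, d\theta$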
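The prior, posterior and prediction steps can be sketched on a toy problem where the integral is replaced by a sum over a grid. Everything below is an illustrative assumption: a one-weight logistic model, a four-point data set and a standard normal prior. It is not the setup used in the talk.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def likelihood(theta, data):
    """p(y | X, theta) for a one-weight logistic model with y in {0, 1}."""
    prob = 1.0
    for x, y in data:
        p1 = sigmoid(theta * x)
        prob *= p1 if y == 1 else (1.0 - p1)
    return prob

# Hypothetical toy data set D = {(x_i, y_i)} and a coarse grid over theta.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
grid = [i * 0.5 for i in range(-10, 11)]          # theta in [-5, 5]

# Prior p(theta): standard normal density on the grid (unnormalised).
prior = [math.exp(-0.5 * t * t) for t in grid]

# Posterior is proportional to likelihood * prior; normalise over the grid.
unnorm = [likelihood(t, data) * p for t, p in zip(grid, prior)]
z = sum(unnorm)
posterior = [u / z for u in unnorm]

def predict(x_star):
    """p(y* = 1 | x*, D) as a posterior-weighted average over the grid."""
    return sum(w * sigmoid(t * x_star) for t, w in zip(grid, posterior))

p_pos = predict(1.5)    # input resembling the positive training examples
p_zero = predict(0.0)   # sigma(0) = 0.5 for every theta, so the average is 0.5
```

For real networks the grid is infeasible because $\theta$ has millions of dimensions, which is why approximate schemes such as MC-Dropout and SWAG are used instead.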
Bayesian Machine Learning
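• Learning
$P(\theta \mid \mathcal{D}, m) = \frac{P(\mathcal{D} \mid \theta, m)\, P(\theta \mid m)}{P(\mathcal{D} \mid m)}$
• Prediction
$P(x \mid \mathcal{D}, m) = \int P(x \mid \theta, \mathcal{D}, m)\, P(\theta \mid \mathcal{D}, m)\, d\theta$
• Model Comparison
$P(m \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid m)\, P(m)}{P(\mathcal{D})}$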
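A small worked example of the three rules, assuming a coin-flip data set and two hypothetical models: "fair" fixes the heads probability at 0.5, while "free" puts a uniform prior over a grid of values. The data, the model names and the equal model prior are all assumptions made for illustration.

```python
heads, tails = 9, 1   # hypothetical data D: 9 heads and 1 tail

def lik(theta):
    """P(D | theta): probability of the observed flip sequence."""
    return theta ** heads * (1.0 - theta) ** tails

# Each model is a prior P(theta | m) over candidate theta values.
models = {
    "fair": {0.5: 1.0},
    "free": {i / 20: 1.0 / 21 for i in range(21)},
}

evidence = {}    # P(D | m)
posterior = {}   # P(theta | D, m)
for name, prior in models.items():
    joint = {th: lik(th) * p for th, p in prior.items()}
    evidence[name] = sum(joint.values())
    # Learning: posterior = likelihood * prior / evidence
    posterior[name] = {th: j / evidence[name] for th, j in joint.items()}

# Prediction: P(next flip is heads | D, m) = sum_theta theta * P(theta | D, m)
pred_heads = {
    name: sum(th * w for th, w in post.items())
    for name, post in posterior.items()
}

# Model comparison: P(m | D) = P(D | m) P(m) / P(D), with equal P(m) = 0.5
p_data = sum(0.5 * e for e in evidence.values())
model_post = {name: 0.5 * e / p_data for name, e in evidence.items()}
```

With nine heads in ten flips the evidence favours the "free" model, and its predictive probability of heads moves well above 0.5.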
MC-Dropout
• MC-Dropout (Gal et al. ‘15) views dropout at test time as
approximate Bayesian inference
• Establishes a relationship between neural networks with dropout and
Gaussian processes
• A Gaussian process over a dataset (X, Y)
• Key idea: represent the kernel K with a probabilistic neural network
• Because the integral is intractable, use Monte Carlo (MC) sampling to approximate it
• Consider such that
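The Monte Carlo approximation can be sketched as follows: keep dropout active at test time, run several stochastic forward passes and average the class probabilities. This is a toy illustration in plain Python with made-up fixed weights, a ReLU hidden layer and inverted dropout; none of these choices come from the slides.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, w1, w2, rng, p_drop):
    """One stochastic pass: ReLU hidden layer, dropout kept on, softmax output."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    keep = 1.0 - p_drop
    # Inverted dropout: zero each unit with probability p_drop, rescale the rest.
    hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return softmax(logits)

def mc_dropout_predict(x, w1, w2, n_samples=200, p_drop=0.5, seed=0):
    """Average the class probabilities of n_samples stochastic passes."""
    rng = random.Random(seed)
    mean = [0.0] * len(w2)
    for _ in range(n_samples):
        probs = forward(x, w1, w2, rng, p_drop)
        mean = [m + p / n_samples for m, p in zip(mean, probs)]
    # Entropy of the averaged prediction as a scalar uncertainty score.
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy

# Hypothetical fixed weights: 2 inputs -> 3 hidden units -> 2 classes.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w2 = [[2.0, -1.0, 0.5], [-1.0, 2.0, 0.5]]
mean_probs, pred_entropy = mc_dropout_predict([1.0, 0.2], w1, w2)
```

Each dropout mask corresponds to one sample of the weights, so the average over passes plays the role of the integral over the posterior.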
SWAG
• Stochastic Weight Averaging-Gaussian (SWAG, Maddox et al. ‘19) is
an extension of stochastic weight averaging (SWA, i.e. averaged SGD)
• The weights of a NN are averaged over different SGD iterates, which
in itself can be viewed as approximate Bayesian inference
• Key idea: noisy SGD dynamics resemble sampling techniques (MCMC,
Gibbs sampler, etc.)
• SWAG estimates the covariance of the NN weights from the SGD iterates
• Maintains running averages to compute the covariance,
yielding an approximate Gaussian posterior
• At test time, Bayesian model averaging over samples from this posterior yields the final predictions
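A minimal sketch of the running-average idea, restricted to a diagonal covariance for brevity (the method of Maddox et al. also adds a low-rank term, which is omitted here). The class name `DiagonalSwag` and the three weight iterates are hypothetical.

```python
import math
import random

class DiagonalSwag:
    """Running first and second moments of flattened weight vectors."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.sq_mean = [0.0] * dim

    def collect(self, weights):
        """Update both running averages with one SGD iterate."""
        self.n += 1
        for i, w in enumerate(weights):
            self.mean[i] += (w - self.mean[i]) / self.n
            self.sq_mean[i] += (w * w - self.sq_mean[i]) / self.n

    def variance(self):
        """Diagonal covariance E[w^2] - E[w]^2, clamped at zero."""
        return [max(s - m * m, 0.0) for m, s in zip(self.mean, self.sq_mean)]

    def sample(self, rng):
        """Draw one weight vector from N(mean, diag(variance))."""
        return [
            rng.gauss(m, math.sqrt(v))
            for m, v in zip(self.mean, self.variance())
        ]

# Hypothetical SGD iterates for a 2-parameter model.
swag = DiagonalSwag(dim=2)
for iterate in ([1.0, 4.0], [3.0, 4.0], [2.0, 4.0]):
    swag.collect(iterate)

rng = random.Random(0)
samples = [swag.sample(rng) for _ in range(5)]   # weights for model averaging
```

At test time each sampled weight vector would be loaded into the network, and the resulting predictions averaged.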
Example of Miscalibration
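• Miscalibration expressed as the deviation between accuracy and confidence
• Miscalibrated classifier: $\sum_{m=1}^{M} \frac{|B_m|}{n} \left| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \right| > 0$, where $B_m$ is the set of samples whose confidence falls in bin $m$ and $n$ is the number of samples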
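The binned sum can be computed directly. The sketch below assumes equal-width confidence bins and uses made-up predictions; the function name and the bin count are illustrative choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - confidence| over confidence bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Bin m covers [m / n_bins, (m + 1) / n_bins); confidence 1.0 joins the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        acc = sum(hit for _, hit in members) / len(members)
        avg_conf = sum(c for c, _ in members) / len(members)
        ece += len(members) / n * abs(acc - avg_conf)
    return ece

# Hypothetical predictions: top-class confidence and whether the prediction was right.
confidences = [0.95, 0.95, 0.85, 0.85, 0.65, 0.55]
correct = [1, 1, 1, 0, 0, 1]
ece = expected_calibration_error(confidences, correct)
```

A perfectly calibrated classifier has accuracy equal to confidence in every bin, so the sum is zero.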
Results
Calibration Confidence Error
[Figure: bar chart of Expected Calibration Error (axis range 0 to 0.08) for VGG16, PreResNet164 and WideResNet28x10, each trained with SGD, MC-Dropout and SWAG, on CIFAR10 and SVHN]
Example of Representing Uncertainty
• The model predicts the wrong category with high confidence
• Desirable:
• make the model aware of inputs outside the data distribution
• reduce the model’s confidence in wrong predictions
Results
Out of Distribution Uncertainty
• Capturing uncertainty in a quantifiable scalar metric
• Higher values indicate models’ ability to respond to uncertain inputs with low confidence
• Symmetric KL: $D_{\mathrm{KL}}^{\mathrm{sym}}(P, Q) = D_{\mathrm{KL}}(P \,\|\, Q) + D_{\mathrm{KL}}(Q \,\|\, P)$
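A short sketch of the two scalar scores named on this slide, symmetric KL and entropy, for discrete distributions. The slide does not say which two distributions are compared, so the example simply uses two made-up softmax outputs.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def symmetric_kl(p, q):
    """KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

# Hypothetical softmax outputs over 4 classes.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

skl = symmetric_kl(confident, uncertain)
h_confident = entropy(confident)
h_uncertain = entropy(uncertain)   # maximal for 4 classes: log(4)
```

A model that responds well to out-of-distribution inputs should produce predictions closer to `uncertain` than to `confident` on those inputs, which shows up as higher entropy.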
[Figure: bar chart of Entropy (axis range 0 to 7) for VGG16, PreResNet164 and WideResNet28x10, each trained with SGD, MC-Dropout and SWAG, on CIFAR10 and SVHN]
Accuracy on Test Set
[Figure: bar chart of test-set Accuracy (%) for VGG16, PreResNet164 and WideResNet28x10, each trained with SGD, MC-Dropout and SWAG, on CIFAR10 and SVHN. Bar labels in extraction order: 94.4, 93.26, 93.8, 93.56, 94.68, 93.14, 94.04, 95.54, 95.12, 97.1, 96.87, 96.83, 97.9, 97.73, 97.69, 97.44, 97.63, 97.95]
References
• 1. Choi, H., Jang, E., Alemi, A.A.: WAIC, but Why? Generative Ensembles for
Robust Anomaly Detection. arXiv:1810.01392 [cs, stat] (Oct 2018)
• 2. Damianou, A.C., Lawrence, N.D.: Deep Gaussian Processes. arXiv e-prints (Nov
2012)
• 3. Fawzi, A., Fawzi, H., Fawzi, O.: Adversarial vulnerability for any classifier. Neural
Information Processing Systems (Feb 2018)
• 4. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian Approximation: Representing
Model Uncertainty in Deep Learning. arXiv e-prints (Jun 2015)
• 5. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On Calibration of Modern Neural
Networks. International Conference on Machine Learning (Jun 2017)
• 6. He, K., Zhang, X., Ren, S., Sun, J.: Identity Mappings in Deep Residual Networks.
arXiv e-prints (Mar 2016)
Thanks!
Preprint: https://arxiv.org/abs/1912.01530
Acknowledgement. This work was supported by Science Foundation Ireland under Grant No. 15/CDA/3520 and
Grant No. 12/RC/2289.

On the Validity of Bayesian Neural Networks for Uncertainty Estimation
