On the Validity of Bayesian Neural Networks for Uncertainty Estimation

Bayesian Neural Networks for Uncertainty Quantification
John Mitros
ioannis.mitros@insight-centre.org
(3/12/2019)

What is a Neural Network?
Neural networks are parameterised functions
• Data: $\mathcal{D} = \{(x_n, y_n)\}_{n=1}^{N} = (X, y)$
• Parameters $\theta$ are the weights of the neural net.
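• Feedforward neural nets model $p(y_n \mid x_n, \theta)$ as a nonlinear function of $\theta$ and $x$, e.g.:
$p(y_n = 1 \mid x_n, \theta) = \sigma\left(\sum_i \theta_i x_i^{(n)}\right)$, where $x_i^{(n)}$ is the $i$-th feature of $x_n$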
• Multilayer/deep neural networks model the overall function as a composition of functions (layers), e.g.:
$y_n = \sum_j \theta_j^{(2)} \sigma\left(\sum_i \theta_{ji}^{(1)} x_i^{(n)}\right)$
• Usually trained to maximise the likelihood using variants of stochastic gradient descent (SGD) optimisation
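The two example models on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the networks used in the experiments: the layer sizes, the random weights and the function names `single_layer_prob` and `two_layer_output` are assumptions made for the example.

```python
import numpy as np

def sigmoid(a):
    """Logistic function sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def single_layer_prob(x, theta):
    """p(y = 1 | x, theta) = sigma(sum_i theta_i * x_i)."""
    return sigmoid(x @ theta)

def two_layer_output(x, theta1, theta2):
    """y = sum_j theta2_j * sigma(sum_i theta1_ji * x_i)."""
    hidden = sigmoid(theta1 @ x)   # one activation per hidden unit j
    return theta2 @ hidden         # linear read-out over hidden units

rng = np.random.default_rng(0)
x = rng.normal(size=3)             # one input with 3 features (illustrative)
theta = rng.normal(size=3)         # weights of the single-layer model
theta1 = rng.normal(size=(4, 3))   # first-layer weights, 4 hidden units (assumed)
theta2 = rng.normal(size=4)        # second-layer weights

p = single_layer_prob(x, theta)
y = two_layer_output(x, theta1, theta2)
```

Training would then adjust the weights by following the gradient of the log-likelihood on mini-batches, which is omitted here.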

Limitations of Deep Learning
Neural networks and deep learning systems provide great performance on many
tasks, but they are generally:
• data hungry (e.g. often millions of examples)
• compute intensive to train and deploy (GPU resources)
• poor at representing uncertainty
• miscalibrated
• difficult to optimise (non-convex objective; the choice of architecture, learning procedure,
initialisation, etc. requires experimentation)

Objectives
• Questions:
  • Are Bayesian neural networks better calibrated?
  • Can Bayesian neural networks assign high uncertainty to out-of-distribution samples?
• Approximate Bayesian Inference Techniques
  • MC-Dropout (Gal and Ghahramani '15)
  • Stochastic Weight Averaging-Gaussian, SWAG (Maddox et al. '19)
• Models
  • VGG16
  • PreResNet164
  • WideResNet28x10

Bayesian Neural Networks
Bayesian neural network
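• Data: $\mathcal{D} = \{(x_n, y_n)\}_{n=1}^{N} = (X, y)$
• Parameters $\theta$ are the weights of the neural net.
• prior $p(\theta \mid \alpha)$, with hyperparameters $\alpha$
• posterior $p(\theta \mid \alpha, \mathcal{D}) \propto p(y \mid X, \theta)\, p(\theta \mid \alpha)$
• prediction $p(y^* \mid \mathcal{D}, x^*) = \int p(y^* \mid x^*, \theta)\, p(\theta \mid \mathcal{D})\, d\theta$ (hyperparameters $\alpha$ left implicit)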
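For neural networks the predictive integral above rarely has a closed form. A common workaround, written here as a sketch, is a Monte Carlo average over weight samples. The number of samples $S$ and the approximate posterior $q(\theta)$ are symbols introduced for this illustration and do not appear on the slide.

```latex
p(y^* \mid \mathcal{D}, x^*) \approx \frac{1}{S} \sum_{s=1}^{S} p(y^* \mid x^*, \theta_s),
\qquad \theta_s \sim q(\theta) \approx p(\theta \mid \mathcal{D})
```

Under this reading, approximate inference methods differ mainly in how they construct $q(\theta)$ and draw the samples $\theta_s$.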

Bayesian Machine Learning
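• Learning
$P(\theta \mid \mathcal{D}, m) = \frac{P(\mathcal{D} \mid \theta, m)\, P(\theta \mid m)}{P(\mathcal{D} \mid m)}$
• Prediction
$P(x \mid \mathcal{D}, m) = \int P(x \mid \theta, \mathcal{D}, m)\, P(\theta \mid \mathcal{D}, m)\, d\theta$
• Model Comparison
$P(m \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid m)\, P(m)}{P(\mathcal{D})}$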
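A toy example where all three quantities are available in closed form may help fix ideas. The sketch below uses coin flips rather than a neural network, which is purely an assumption for illustration: model m1 fixes the heads probability at 0.5, while model m2 places a uniform prior on it. All function names are hypothetical.

```python
from math import comb

def evidence_fair(heads, tails):
    """P(D | m1): a coin whose heads probability is fixed at 0.5."""
    return 0.5 ** (heads + tails)

def evidence_uniform(heads, tails):
    """P(D | m2): theta ~ Uniform(0, 1), integrated out analytically.

    The integral of theta^h (1 - theta)^t over [0, 1] equals
    h! t! / (n + 1)!, i.e. 1 / ((n + 1) * C(n, h)).
    """
    n = heads + tails
    return 1.0 / ((n + 1) * comb(n, heads))

def predictive_heads(heads, tails):
    """P(next flip = heads | D, m2): mean of the Beta(h + 1, t + 1) posterior."""
    return (heads + 1) / (heads + tails + 2)

def model_posterior(evidences, priors):
    """P(m | D) = P(D | m) P(m) / P(D), with P(D) summed over the listed models."""
    joint = [e * p for e, p in zip(evidences, priors)]
    total = sum(joint)
    return [j / total for j in joint]

# Nine heads and one tail, equal prior belief in both models.
post = model_posterior(
    [evidence_fair(9, 1), evidence_uniform(9, 1)],
    [0.5, 0.5],
)
```

With nine heads in ten flips, most of the posterior mass moves to m2, because its evidence averages over heads probabilities that explain the data better than 0.5.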

MC-Dropout
• MC-Dropout (Gal and Ghahramani '15) views dropout at test time as approximate Bayesian inference
• Establishes a relationship between neural networks with dropout and
Gaussian processes
• A Gaussian process over a dataset (X, Y)
• Key idea: represent the covariance function K with a probabilistic neural network
• Because the integral is intractable, use Monte Carlo (MC) sampling to approximate it
• Consider such that
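In practice, MC-Dropout amounts to keeping dropout active at test time and averaging the outputs of several stochastic forward passes. The following is a minimal NumPy sketch under stated assumptions: a single hidden ReLU layer, weights `w1` and `w2` supplied by the caller, and a hypothetical function name `mc_dropout_predict`. It is not the architecture evaluated in these slides.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # stabilise the exponent
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, w1, w2, drop_prob=0.5, n_samples=100, rng=None):
    """Average softmax outputs over n_samples stochastic forward passes.

    Returns the mean class probabilities and their standard deviation
    across passes, each with shape (batch, classes).
    """
    rng = np.random.default_rng() if rng is None else rng
    keep = 1.0 - drop_prob
    probs = []
    for _ in range(n_samples):
        hidden = np.maximum(x @ w1, 0.0)         # ReLU hidden layer
        mask = rng.random(hidden.shape) < keep   # fresh dropout mask per pass
        hidden = hidden * mask / keep            # inverted-dropout scaling
        probs.append(softmax(hidden @ w2))
    probs = np.stack(probs)                      # (n_samples, batch, classes)
    return probs.mean(axis=0), probs.std(axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))      # 5 inputs with 8 features (illustrative)
w1 = rng.normal(size=(8, 16))    # assumed layer sizes
w2 = rng.normal(size=(16, 3))
mean_probs, std_probs = mc_dropout_predict(x, w1, w2, n_samples=50, rng=rng)
```

The spread across passes (`std_probs`) is one simple proxy for model uncertainty; with `drop_prob=0.0` every pass is identical and the spread collapses to zero.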

SWAG
• Stochastic Weight Averaging-Gaussian (SWAG; Maddox et al. '19) is
an extension of stochastic weight averaging (SWA, i.e. averaged SGD)
• SWA averages the weights of a NN over SGD iterates, which
can itself be viewed as approximate Bayesian inference
• Key idea: noisy SGD dynamics resemble sampling techniques (MCMC,
Gibbs sampling, etc.)
• SWAG additionally estimates the covariance of the NN weights from the SGD iterates
• Maintains running averages to compute the covariance,
yielding an approximate Gaussian posterior
• At test time, Bayesian model averaging over samples from this posterior yields the final predictions
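The running-average idea can be sketched as follows. This is a simplified, diagonal-covariance variant written for illustration only; the full SWAG method also keeps a low-rank covariance term, which is omitted here. The class name `DiagonalSWAG` and its methods are hypothetical, and weights are assumed to be flattened into a single vector.

```python
import numpy as np

class DiagonalSWAG:
    """Running first and second moments of flattened weights (diagonal only)."""

    def __init__(self, n_params):
        self.n = 0
        self.mean = np.zeros(n_params)      # running average of the weights
        self.sq_mean = np.zeros(n_params)   # running average of squared weights

    def collect(self, weights):
        """Update both running averages with one SGD iterate."""
        self.n += 1
        self.mean += (weights - self.mean) / self.n
        self.sq_mean += (weights ** 2 - self.sq_mean) / self.n

    def sample(self, rng):
        """Draw weights from N(mean, diag(sq_mean - mean^2))."""
        var = np.clip(self.sq_mean - self.mean ** 2, 1e-30, None)
        return self.mean + np.sqrt(var) * rng.standard_normal(self.mean.shape)

swag = DiagonalSWAG(n_params=2)
for w in ([1.0, 2.0], [3.0, 2.0], [5.0, 2.0]):   # stand-ins for SGD iterates
    swag.collect(np.array(w))
theta_sample = swag.sample(np.random.default_rng(0))
```

Final predictions would then be obtained by averaging the network outputs over several such samples, which is the Bayesian model averaging step in the last bullet.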

Example of Miscalibration
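• Miscalibration is expressed as the deviation between accuracy and confidence
• Miscalibrated classifier: $\sum_{m=1}^{M} \frac{|B_m|}{n} \left| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \right| > 0$, where $B_m$ is the set of predictions whose confidence falls in bin $m$ and $n$ is the number of samples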
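The binned sum above is the expected calibration error (ECE) described by Guo et al. A minimal NumPy sketch follows; the choice of ten equal-width bins and the function name `expected_calibration_error` are assumptions made for this example.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum_m |B_m| / n * |acc(B_m) - conf(B_m)| over equal-width bins.

    confidences: predicted probability of the chosen class, in (0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)   # bin covers (lo, hi]
        if in_bin.any():
            acc = correct[in_bin].mean()        # accuracy inside the bin
            conf = confidences[in_bin].mean()   # average confidence inside the bin
            ece += in_bin.sum() / n * abs(acc - conf)
    return ece

# Four predictions made at 95% confidence, of which only two are right.
overconfident = expected_calibration_error([0.95] * 4, [1, 1, 0, 0])
```

A perfectly calibrated classifier would give zero here, since accuracy and average confidence agree in every bin.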

Calibration Confidence Error
[Figure: bar chart of Expected Calibration Error (axis 0 to 0.08) on CIFAR10 and SVHN for VGG16, PreResNet164 and WideResNet28x10, each with SGD, MC-Dropout and SWAG]

Example of Representing Uncertainty
• The model predicts the wrong category with high confidence
• Desirable:
  • make the model aware of inputs outside the data distribution
  • reduce the model's confidence in wrong predictions

Out of Distribution Uncertainty
• Capturing uncertainty in a quantifiable scalar metric
• Higher values indicate models’ ability to respond to uncertain inputs with low confidence
• Symmetric KL: $D_{\mathrm{KL}}^{\mathrm{sym}}(p, q) = D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p)$
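A small sketch of the two quantities named on this slide, predictive entropy and symmetric KL divergence, is given below. The slide does not state which two distributions are compared, so the example assumes they are discrete distributions over the same bins, for instance normalised histograms of predictive entropy on in-distribution and out-of-distribution inputs. The function names and the `eps` clipping constant are illustrative choices.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) along the last axis of class probabilities."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same bins."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)   # avoid log(0)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p, q = p / p.sum(), q / q.sum()                      # renormalise after clipping
    return float((p * np.log(p / q)).sum())

def symmetric_kl(p, q):
    """KL(p || q) + KL(q || p); zero only when the distributions coincide."""
    return kl(p, q) + kl(q, p)

skl = symmetric_kl([0.5, 0.5], [0.9, 0.1])
```

Under the assumption above, a larger value means the model's uncertainty on unfamiliar inputs is distributed more differently from its uncertainty on familiar ones.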
[Figure: bar chart of Entropy (axis 0 to 7) on CIFAR10 and SVHN for VGG16, PreResNet164 and WideResNet28x10, each with SGD, MC-Dropout and SWAG]

Accuracy on Test Set
[Figure: bar chart of test-set accuracy (%) on CIFAR10 and SVHN for VGG16, PreResNet164 and WideResNet28x10, each with SGD, MC-Dropout and SWAG]
Data labels in extraction order (their assignment to models and datasets is not recoverable from the text): 94.4, 93.26, 93.8, 93.56, 94.68, 93.14, 94.04, 95.54, 95.12, 97.1, 96.87, 96.83, 97.9, 97.73, 97.69, 97.44, 97.63, 97.95

References
• 1. Choi, H., Jang, E., Alemi, A.A.: WAIC, but Why? Generative Ensembles for
Robust Anomaly Detection. arXiv:1810.01392 [cs, stat] (Oct 2018)
• 2. Damianou, A.C., Lawrence, N.D.: Deep Gaussian Processes. arXiv e-prints (Nov
2012)
• 3. Fawzi, A., Fawzi, H., Fawzi, O.: Adversarial vulnerability for any classifier. Neural
Information Processing Systems (Feb 2018)
• 4. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian Approximation: Representing
Model Uncertainty in Deep Learning. arXiv e-prints (Jun 2015)
• 5. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On Calibration of Modern Neural
Networks. International Conference on Machine Learning (Jun 2017)
• 6. He, K., Zhang, X., Ren, S., Sun, J.: Identity Mappings in Deep Residual Networks.
arXiv e-prints (Mar 2016)

Thanks!
Preprint: https://arxiv.org/abs/1912.01530
Acknowledgement. This work was supported by Science Foundation Ireland under Grant No. 15/CDA/3520 and
Grant No. 12/RC/2289.
