Survey of Uncertainties in Deep Learning in a nutshell
Sungjoon Choi

CPSLAB, SNU
Introduction
The first fatality from an assisted driving system
Introduction
Google Photos identified two black people as 'gorillas'
Y. Gal, Uncertainty in Deep Learning, 2016
Gal (2016)
Dropout as a Bayesian approximation
Bayesian Neural Network
Inference:
$$p(y^* \mid x^*, X, Y) = \int p(y^* \mid x^*, w)\, p(w \mid X, Y)\, dw$$
Posterior: $p(w \mid X, Y)$
Variational inference:
$$\mathrm{KL}\big(q_\theta(w) \,\|\, p(w \mid X, Y)\big) = \int q_\theta(w) \log \frac{q_\theta(w)}{p(w \mid X, Y)}\, dw$$
ELBO:
$$\int q_\theta(w) \log p(Y \mid X, w)\, dw - \mathrm{KL}\big(q_\theta(w) \,\|\, p(w)\big)$$
ELBO (reparametrization), with $w = g(\theta, \epsilon)$:
$$\int p(\epsilon) \log p(Y \mid X, w)\, d\epsilon - \mathrm{KL}\big(q_\theta(w) \,\|\, p(w)\big)$$
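As a rough illustration of the reparametrized ELBO, the sketch below estimates it by Monte Carlo for a toy one-weight regression model. Everything specific here is an assumption made for the example and not taken from the slides: the model y = w * x + noise, the Gaussian q with parameters (m, s), the standard normal prior, and the function name `elbo_estimate`.

```python
import numpy as np

def elbo_estimate(m, log_s, X, Y, noise_std=0.5, n_samples=64, rng=None):
    """Monte Carlo ELBO for y = w * x + noise, q(w) = N(m, s^2), p(w) = N(0, 1)."""
    rng = np.random.default_rng(0) if rng is None else rng
    s = np.exp(log_s)
    eps = rng.standard_normal(n_samples)      # eps ~ p(eps) = N(0, 1)
    w = m + s * eps                           # w = g(theta, eps)
    # log p(Y | X, w) for every sampled w, summed over the data points
    resid = Y[None, :] - w[:, None] * X[None, :]
    log_lik = -0.5 * np.sum(resid ** 2 / noise_std ** 2
                            + np.log(2 * np.pi * noise_std ** 2), axis=1)
    # KL(N(m, s^2) || N(0, 1)) in closed form
    kl = 0.5 * (s ** 2 + m ** 2 - 1.0) - log_s
    return log_lik.mean() - kl

# Toy data generated with a true weight of 2.0 (hypothetical)
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=50)
Y = 2.0 * X + 0.5 * rng.standard_normal(50)

elbo_good = elbo_estimate(2.0, np.log(0.1), X, Y)    # q centred on the true weight
elbo_bad = elbo_estimate(-2.0, np.log(0.1), X, Y)    # q centred far from it
```

Because the noise eps is sampled independently of (m, s), the estimate is a deterministic function of the variational parameters for fixed eps, which is what makes gradient-based optimization of the ELBO possible.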
Bayesian Neural Network with dropout
Model uncertainty
1. Out-of-distribution test data: given a model trained on pictures of several dog breeds, a user asks the model to decide on a dog breed using a photo of a cat.
2. Aleatoric uncertainty: we have three types of images to classify (cat, dog, and cow), where only the cat images are noisy.
3. Epistemic uncertainty: which model parameters best explain a given dataset? Which model structure should we use?
Predictive mean and uncertainties
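One common way to obtain a predictive mean and an uncertainty estimate from a dropout network is to keep dropout active at test time and average over several stochastic forward passes. The sketch below does this for a tiny, untrained two-layer network; the weights, the dropout rate, and the name `mc_dropout_predict` are hypothetical, and the variance shown is only the spread across passes (no observation-noise term is added).

```python
import numpy as np

def mc_dropout_predict(x, W1, b1, W2, b2, p_drop=0.5, T=200, rng=None):
    """Predictive mean and variance from T forward passes with dropout left on."""
    rng = np.random.default_rng(0) if rng is None else rng
    preds = []
    for _ in range(T):
        h = np.maximum(0.0, x @ W1 + b1)        # hidden ReLU layer
        mask = rng.random(h.shape) > p_drop     # a fresh dropout mask per pass
        h = h * mask / (1.0 - p_drop)           # inverted-dropout scaling
        preds.append(h @ W2 + b2)
    preds = np.stack(preds)                     # shape (T, n_points, 1)
    return preds.mean(axis=0), preds.var(axis=0)

# Hypothetical random weights standing in for a trained network
rng = np.random.default_rng(42)
W1, b1 = rng.standard_normal((1, 32)), rng.standard_normal(32)
W2, b2 = rng.standard_normal((32, 1)) / 32, np.zeros(1)
x = np.linspace(-2, 2, 5).reshape(-1, 1)
mean, var = mc_dropout_predict(x, W1, b1, W2, b2)
```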
P. McClure, Representing Inferential Uncertainty in Deep
Neural Networks Through Sampling, 2017
McClure (2017)
Different variational distributions
Results on MNIST
G. Kahn et al., Uncertainty-Aware Reinforcement Learning for Collision Avoidance, 2017
Kahn (2017)
Uncertainty-Aware Reinforcement Learning
Uncertainty-aware collision prediction model
Uncertainty is based on bootstrapped neural networks using dropout
Bootstrapping

- Generate multiple datasets using sampling with replacement. 

- The intuition behind bootstrapping is that, by generating multiple populations
and training one model per population, the models will agree in high-density
areas (low uncertainty) and disagree in low-density areas (high uncertainty).
Dropout

- Dropout can be viewed as an economical approximation of an ensemble
method (such as bootstrapping) in which each sampled dropout mask
corresponds to a different model.
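The bootstrapping intuition above can be sketched with a small ensemble of polynomial regressors, each fit to a resampled copy of the data. The data, the polynomial degree, and the helper names are assumptions chosen for illustration; the collision prediction model in the paper is a neural network, not a polynomial.

```python
import numpy as np

rng = np.random.default_rng(0)
# Training inputs are concentrated in [-1, 1]; nothing is observed near x = 3.
x_train = rng.uniform(-1, 1, size=200)
y_train = np.sin(2 * x_train) + 0.1 * rng.standard_normal(200)

def fit_bootstrap_ensemble(x, y, n_models=20, degree=3, rng=rng):
    """Fit one polynomial per bootstrap resample of (x, y)."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))   # sample with replacement
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models

def ensemble_std(models, x_query):
    """Disagreement between ensemble members at each query point."""
    preds = np.array([np.polyval(c, x_query) for c in models])
    return preds.std(axis=0)

models = fit_bootstrap_ensemble(x_train, y_train)
std_dense, std_sparse = ensemble_std(models, np.array([0.0, 3.0]))
```

The members agree closely at x = 0, where data is dense, and disagree at x = 3, where no data was seen, so `std_sparse` comes out larger than `std_dense`.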
B. Lakshminarayanan et al., Simple and Scalable Predictive Uncertainty
Estimation using Deep Ensembles, 2017
Lakshminarayanan (2017)
Proper scoring rule
A scoring rule assigns a numerical score to a predictive distribution rewarding
better calibrated predictions over worse. (…) It turns out many common neural
network loss functions are proper scoring rules.
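A small numerical check can make the "proper" property concrete: for a binary outcome with true probability p, the expected loss of a forecast q is minimized at q = p. The sketch below checks this for the log loss and the Brier score on a grid of forecasts; the value p = 0.7 and the function names are assumptions for the example.

```python
import numpy as np

def expected_log_loss(q, p):
    """Expected negative log-likelihood of forecast q when the true P(y=1) is p."""
    return -(p * np.log(q) + (1 - p) * np.log(1 - q))

def expected_brier(q, p):
    """Expected squared error of forecast q when the true P(y=1) is p."""
    return p * (1 - q) ** 2 + (1 - p) * q ** 2

p_true = 0.7
grid = np.linspace(0.01, 0.99, 99)    # candidate forecasts 0.01, 0.02, ..., 0.99
best_log = grid[np.argmin(expected_log_loss(grid, p_true))]
best_brier = grid[np.argmin(expected_brier(grid, p_true))]
```

Both minimizers land on the grid point closest to 0.7, so neither loss rewards an over- or under-confident forecast.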
Density network
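Input $x$, network $f_\theta(x)$, outputs $\mu_\theta(x)$ and $\sigma_\theta(x)$.
$$L = -\frac{1}{N} \sum_{i=1}^{N} \log \mathcal{N}\big(y_i;\, \mu_\theta(x_i),\, \sigma^2_\theta(x_i)\big)$$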
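The density network loss is the average Gaussian negative log-likelihood of the targets under the predicted mean and variance. A minimal NumPy sketch follows; the function name `gaussian_nll` and the synthetic targets are assumptions, and the predicted mean and standard deviation are passed in directly instead of coming from a network.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of y under N(mu, sigma^2)."""
    var = sigma ** 2
    return np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (y - mu) ** 2 / var)

# Synthetic targets with true mean 1.0 and true standard deviation 0.3
rng = np.random.default_rng(0)
y = 1.0 + 0.3 * rng.standard_normal(1000)

nll_matched = gaussian_nll(y, mu=1.0, sigma=0.3)
nll_overconfident = gaussian_nll(y, mu=1.0, sigma=0.03)   # variance too small
nll_underconfident = gaussian_nll(y, mu=1.0, sigma=3.0)   # variance too large
```

The loss is lowest when the predicted spread matches the spread of the data, which is why minimizing it pushes the network toward calibrated variances.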
Adversarial training with a fast gradient sign method
Adversarial training can also be interpreted as a computationally efficient
solution to smooth the predictive distributions by increasing the likelihood
of the target around an ε-neighborhood of the observed training examples.
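The fast gradient sign method perturbs an input by a small step eps in the direction of the sign of the loss gradient with respect to that input. The sketch below applies it to a logistic regression model, where the input gradient has a closed form; the weights, the input, and eps are hypothetical values chosen for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(x, y, w, b):
    """Binary cross-entropy of a logistic regression model on one example."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(x, y, w, b, eps=0.1):
    """x' = x + eps * sign(d loss / d x)."""
    grad_x = (sigmoid(x @ w + b) - y) * w     # input gradient for logistic regression
    return x + eps * np.sign(grad_x)

w, b = np.array([1.5, -2.0]), 0.25            # hypothetical fixed model parameters
x, y = np.array([0.4, -0.3]), 1.0
x_adv = fgsm(x, y, w, b)
loss_clean, loss_adv = bce_loss(x, y, w, b), bce_loss(x_adv, y, w, b)
```

Adversarial training would then add the loss on `x_adv` (with the same target `y`) to the loss on `x`, which encourages similar predictions within an eps-sized neighborhood of each training example.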
Proposed method
[Figure panels: Empirical variance (5), Density network (1), Adversarial training, Deep ensemble (5)]
A. Kendall and Y. Gal, What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, 2017
Kendall (2017)
Aleatoric & epistemic uncertainties
$\hat{W} \sim q(W)$
$[\hat{y}, \hat{\sigma}^2] = f^{\hat{W}}(x)$, with outputs $\hat{y}_{\hat{W}}(x)$ and $\hat{\sigma}^2_{\hat{W}}(x)$
$$L = -\frac{1}{N} \sum_{i=1}^{N} \log \mathcal{N}\big(y_i;\, \hat{y}_{\hat{W}}(x_i),\, \hat{\sigma}^2_{\hat{W}}(x_i)\big)$$
$$\mathrm{Var}(y) \approx \underbrace{\frac{1}{T} \sum_{t=1}^{T} \hat{y}_t^2 - \left( \frac{1}{T} \sum_{t=1}^{T} \hat{y}_t \right)^2}_{\text{epistemic uncertainty}} + \underbrace{\frac{1}{T} \sum_{t=1}^{T} \hat{\sigma}_t^2}_{\text{aleatoric uncertainty}}$$
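The decomposition can be computed directly from the outputs of T stochastic forward passes: the spread of the predicted means gives the epistemic part, and the average predicted variance gives the aleatoric part. In the sketch below the arrays `y_hat` and `sigma2_hat` are hypothetical network outputs for a single input.

```python
import numpy as np

def decompose_uncertainty(y_hat, sigma2_hat):
    """y_hat, sigma2_hat: arrays of shape (T, ...) from T stochastic forward passes."""
    epistemic = np.mean(y_hat ** 2, axis=0) - np.mean(y_hat, axis=0) ** 2
    aleatoric = np.mean(sigma2_hat, axis=0)
    return epistemic, aleatoric

# Hypothetical outputs of T = 4 passes for one input
y_hat = np.array([1.0, 1.2, 0.8, 1.0])
sigma2_hat = np.array([0.05, 0.04, 0.06, 0.05])
epistemic, aleatoric = decompose_uncertainty(y_hat, sigma2_hat)
total = epistemic + aleatoric
```

For these values the epistemic term is 0.02 and the aleatoric term is 0.05, giving a total predictive variance of 0.07.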
Heteroscedastic uncertainty as loss attenuation
$\hat{W} \sim q(W)$
$[\hat{y}, \hat{\sigma}^2] = f^{\hat{W}}(x)$, with outputs $\hat{y}_{\hat{W}}(x)$ and $\hat{\sigma}^2_{\hat{W}}(x)$
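To see where the attenuation comes from, it may help to expand the Gaussian negative log-likelihood for a single scalar target. The shorthand $\hat{y}_i$ and $\hat{\sigma}_i^2$ for the network outputs at input $x_i$ is introduced here for brevity and is not notation from the slides.

$$-\log \mathcal{N}\big(y_i;\, \hat{y}_i,\, \hat{\sigma}_i^2\big) = \frac{(y_i - \hat{y}_i)^2}{2 \hat{\sigma}_i^2} + \frac{1}{2} \log \hat{\sigma}_i^2 + \frac{1}{2} \log 2\pi$$

The first term divides the squared residual by the predicted variance, so the network can down-weight (attenuate) the loss on noisy examples by predicting a large $\hat{\sigma}_i^2$. The second term penalizes large variances, which keeps the network from predicting high uncertainty everywhere.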
Results
S. Choi et al., Uncertainty-Aware Learning from Demonstration Using
Mixture Density Networks with Sampling-Free Variance Modeling, 2017
Choi (2017)
Mixture density networks
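Input $x$, network $f_{\hat{W}}(x)$, outputs $\pi_j(x)$, $\mu_j(x)$, $\sigma_j(x)$ for $j = 1, 2, 3$.
$$L = -\frac{1}{N} \sum_{i=1}^{N} \log \sum_{j=1}^{K} \pi_j(x_i)\, \mathcal{N}\big(y_i;\, \mu_j(x_i),\, \sigma_j^2(x_i)\big)$$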
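The mixture density network loss can be evaluated stably with the log-sum-exp trick. The sketch below assumes scalar targets and takes the mixture parameters as plain arrays instead of network outputs; the function name `mdn_nll` and the example values are hypothetical.

```python
import numpy as np

def mdn_nll(y, pi, mu, sigma):
    """y: (N,); pi, mu, sigma: (N, K) with pi rows summing to 1 and sigma > 0."""
    # log of pi_j * N(y; mu_j, sigma_j^2) for every example and component
    log_comp = (np.log(pi)
                - 0.5 * np.log(2 * np.pi * sigma ** 2)
                - 0.5 * (y[:, None] - mu) ** 2 / sigma ** 2)
    m = log_comp.max(axis=1, keepdims=True)          # log-sum-exp for stability
    log_mix = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return -log_mix.mean()

# Two identical standard normal components collapse to a single standard normal
y = np.array([0.0])
pi = np.array([[0.5, 0.5]])
mu = np.zeros((1, 2))
sigma = np.ones((1, 2))
nll_value = mdn_nll(y, pi, mu, sigma)
```

In this degenerate case the result equals the standard normal negative log-density at zero, 0.5 * log(2 * pi), which is a quick sanity check on the implementation.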
Explained and unexplained variance
We propose a sampling-free variance modeling method using a mixture
density network which can be decomposed into explained variance and
unexplained variance.
In particular, explained variance represents model uncertainty whereas
unexplained variance indicates the uncertainty inherent in the process, e.g.,
measurement noise.
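One way to compute such a decomposition without sampling is the law of total variance applied to the predicted Gaussian mixture: the variance of the component means (explained) plus the expected within-component variance (unexplained). The sketch below assumes scalar outputs and that this matches the decomposition intended by Choi (2017); the slides do not show the formula, and the function name and parameter values are hypothetical.

```python
import numpy as np

def mdn_variances(pi, mu, sigma):
    """pi, mu, sigma: (K,) mixture parameters predicted for one input x."""
    mean = np.sum(pi * mu)
    explained = np.sum(pi * (mu - mean) ** 2)    # variance of the component means
    unexplained = np.sum(pi * sigma ** 2)        # expected within-component variance
    return explained, unexplained

# Components that disagree about the mean: high explained variance
exp_a, unexp_a = mdn_variances(np.array([0.5, 0.5]),
                               np.array([-1.0, 1.0]),
                               np.array([0.2, 0.2]))
# Components that agree about the mean: zero explained variance
exp_b, unexp_b = mdn_variances(np.array([0.5, 0.5]),
                               np.array([1.0, 1.0]),
                               np.array([0.2, 0.2]))
```

Both cases have the same unexplained variance (0.04), but only the first has a nonzero explained variance (1.0), which is the quantity that would trigger a switch to a safe mode under the scheme described in the slides.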
We present uncertainty-aware learning from demonstration by using the
explained variance as a switching criterion between the trained policy and
a rule-based safe mode.
[Figure panels: Unexplained variance, Explained variance]
Driving experiments
Anonymous, Bayesian Uncertainty Estimation for
Batch Normalized Deep Networks, 2017
Anonymous (2018?)
Monte Carlo Batch Normalization (MCBN)
Batch normalized deep nets as Bayesian modeling
Learnable parameter
Stochastic parameter
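A rough sketch of the idea, under the assumption that the stochastic parameters are the batch normalization mean and variance computed from randomly sampled training mini-batches, while the scale and shift (gamma, beta) and the weights are the learnable parameters. The one-hidden-layer network, its random weights, and the name `mcbn_predict` are hypothetical and only illustrate the sampling loop.

```python
import numpy as np

def batch_norm(h, mean, var, gamma, beta, eps=1e-5):
    return gamma * (h - mean) / np.sqrt(var + eps) + beta

def mcbn_predict(x, X_train, W, gamma, beta, v, batch_size=32, T=100, rng=None):
    """Predictive mean and variance over T passes with resampled batch statistics."""
    rng = np.random.default_rng(0) if rng is None else rng
    preds = []
    for _ in range(T):
        idx = rng.choice(len(X_train), size=batch_size, replace=False)
        h_batch = X_train[idx] @ W                     # pre-activations of a training mini-batch
        mean, var = h_batch.mean(axis=0), h_batch.var(axis=0)   # stochastic parameters
        h = np.maximum(0.0, batch_norm(x @ W, mean, var, gamma, beta))
        preds.append(h @ v)
    preds = np.stack(preds)                            # shape (T, n_test)
    return preds.mean(axis=0), preds.var(axis=0)

# Hypothetical data and weights standing in for a trained network
rng = np.random.default_rng(0)
X_train = rng.standard_normal((500, 3))
W, v = rng.standard_normal((3, 16)), rng.standard_normal(16) / 16
gamma, beta = np.ones(16), np.zeros(16)                # learnable parameters
x_test = rng.standard_normal((4, 3))
mean, var = mcbn_predict(x_test, X_train, W, gamma, beta, v)
```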
MCBN to Bayesian SegNet
Uncertainty Modeling in Deep Learning