Uncertainty Modeling in Deep Learning

Survey of Uncertainties in Deep Learning in a nutshell
Sungjoon Choi

CPSLAB, SNU

Introduction
The first fatality from an assisted driving system

Introduction
Google Photos identified two black people as 'gorillas'

Y. Gal, Uncertainty in Deep Learning, 2016

Gal (2016)
Dropout as a Bayesian approximation

Gal (2016)
Bayesian Neural Network
Inference: $p(y^* \mid x^*, X, Y) = \int p(y^* \mid x^*, w) \, p(w \mid X, Y) \, dw$
Posterior: $p(w \mid X, Y)$

Gal (2016)
Bayesian Neural Network
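Variational inference: approximate the posterior with $q_\theta(w)$ by minimizing
$\mathrm{KL}(q_\theta(w) \,\|\, p(w \mid X, Y)) = \int q_\theta(w) \log \frac{q_\theta(w)}{p(w \mid X, Y)} \, dw$
ELBO (minimizing the KL above is equivalent to maximizing this lower bound on $\log p(Y \mid X)$):
$\int q_\theta(w) \log p(Y \mid X, w) \, dw - \mathrm{KL}(q_\theta(w) \,\|\, p(w))$
ELBO (reparametrization), with $w = g(\theta, \epsilon)$:
$\int p(\epsilon) \log p(Y \mid X, w) \, d\epsilon - \mathrm{KL}(q_\theta(w) \,\|\, p(w))$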
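To make the reparametrized ELBO concrete, here is a minimal Python sketch for a toy model. Every modeling choice is an assumption for illustration: a single scalar weight $w$, a Gaussian variational distribution $q_\theta(w) = \mathcal{N}(m, s^2)$ with $\theta = (m, \log s)$, a standard normal prior $p(w) = \mathcal{N}(0, 1)$, and a likelihood $y \sim \mathcal{N}(w x, 1)$. The expected log-likelihood is estimated by Monte Carlo with $w = g(\theta, \epsilon) = m + s\epsilon$, $\epsilon \sim \mathcal{N}(0, 1)$, and the KL term uses the closed form for two Gaussians.

```python
import math
import random

def log_likelihood(w, xs, ys, noise_var=1.0):
    """log p(Y | X, w) for the assumed toy model y ~ N(w * x, noise_var)."""
    total = 0.0
    for x, y in zip(xs, ys):
        resid = y - w * x
        total += -0.5 * math.log(2 * math.pi * noise_var) - resid ** 2 / (2 * noise_var)
    return total

def kl_gaussian_to_std_normal(m, s):
    """Closed-form KL(N(m, s^2) || N(0, 1))."""
    return 0.5 * (s ** 2 + m ** 2 - 1.0) - math.log(s)

def elbo_estimate(m, log_s, xs, ys, num_samples=100, rng=None):
    """Monte Carlo ELBO using the reparametrization w = g(theta, eps) = m + s * eps."""
    rng = rng or random.Random(0)
    s = math.exp(log_s)
    expected_ll = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)  # eps ~ p(eps) = N(0, 1)
        w = m + s * eps            # w = g(theta, eps)
        expected_ll += log_likelihood(w, xs, ys)
    expected_ll /= num_samples
    return expected_ll - kl_gaussian_to_std_normal(m, s)
```

Because the randomness enters only through $\epsilon$, the estimate is a differentiable function of $m$ and $\log s$ for fixed $\epsilon$ samples, which is what lets gradient-based training optimize $\theta$.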

Gal (2016)
Bayesian Neural Network with dropout
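In practice, MC dropout keeps dropout switched on at test time and runs the network $T$ times, each pass with a freshly sampled mask, so each pass acts as one sample from the approximate posterior. The sketch below assumes a hypothetical one-hidden-layer network with fixed, hand-picked weights (`W1`, `B1`, `W2`, `B2`), ReLU activations, dropout on the hidden units, and inverted-dropout scaling; none of these choices come from the slides.

```python
import random

# Hypothetical fixed weights: 1 input, 4 hidden units, 1 output.
W1 = [0.5, -1.0, 1.5, 0.8]
B1 = [0.1, 0.2, -0.3, 0.0]
W2 = [1.0, -0.5, 0.7, 1.2]
B2 = 0.05

def forward_with_dropout(x, drop_prob, rng):
    """One stochastic forward pass with a fresh Bernoulli mask on the hidden units."""
    keep_prob = 1.0 - drop_prob
    out = B2
    for w1, b1, w2 in zip(W1, B1, W2):
        h = max(0.0, w1 * x + b1)          # ReLU hidden unit
        if rng.random() < keep_prob:       # z ~ Bernoulli(keep_prob)
            out += w2 * h / keep_prob      # inverted-dropout scaling
    return out

def mc_dropout_predict(x, drop_prob=0.5, num_samples=200, seed=0):
    """Predictive mean and variance from T = num_samples stochastic passes."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, drop_prob, rng) for _ in range(num_samples)]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, var
```

With `drop_prob=0.0` every pass is identical and the variance is zero; a non-zero dropout rate spreads the samples, and their variance serves as a model-uncertainty estimate.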


Gal (2016)
Model uncertainty
1. Given a model trained with several pictures of dog breeds, a user asks the model
to decide on a dog breed using a photo of a cat.

Gal (2016)
Model uncertainty
2. We have three different types of images to classify (cat, dog, and cow), where only
the cat images are noisy.

Gal (2016)
Model uncertainty
3. Which model parameters best explain a given dataset? What
model structure should we use?

Gal (2016)
Model uncertainty
The three examples above correspond to three types of uncertainty:
1. Out-of-distribution test data
2. Aleatoric uncertainty
3. Epistemic uncertainty

Gal (2016)
Predictive mean and uncertainties
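For regression, Gal (2016) estimates these moments by Monte Carlo from $T$ stochastic forward passes. A sketch of the estimators, writing $\hat{y}_t$ (a row vector) for the output under the $t$-th sampled weights $\hat{W}_t \sim q_\theta(W)$ and $\tau$ for the model precision (notation assumed here):

```latex
\mathbb{E}[y^*] \approx \frac{1}{T} \sum_{t=1}^{T} \hat{y}_t
\qquad
\mathrm{Var}[y^*] \approx \tau^{-1} I + \frac{1}{T} \sum_{t=1}^{T} \hat{y}_t^{\top} \hat{y}_t
  - \mathbb{E}[y^*]^{\top} \mathbb{E}[y^*]
```

The $\tau^{-1} I$ term accounts for observation noise; the remaining terms form the sample covariance of the $T$ outputs, which reflects model uncertainty.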

P. McClure, Representing Inferential Uncertainty in Deep Neural Networks Through Sampling, 2017

McClure (2017)
Different variational distributions
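The variational distribution determines how weights are sampled in each stochastic forward pass. As a hedged illustration (the exact set compared in McClure (2017) is not listed on this slide), the sketch below samples a weight matrix under three common noise models: Bernoulli dropout (one mask value per input unit, zeroing a whole row), Bernoulli DropConnect (one mask value per weight), and Gaussian multiplicative noise $\mathcal{N}(1, \alpha)$ per weight. The function names are hypothetical, `weights[i][j]` is assumed to connect input $i$ to output $j$, and rescaling of kept weights is omitted for brevity.

```python
import math
import random

def bernoulli_dropout(weights, keep_prob, rng):
    """Drop whole input units: one mask value per row of the weight matrix."""
    sampled = []
    for row in weights:
        z = 1.0 if rng.random() < keep_prob else 0.0
        sampled.append([z * w for w in row])
    return sampled

def bernoulli_dropconnect(weights, keep_prob, rng):
    """Drop individual weights: an independent mask value per entry."""
    return [[w if rng.random() < keep_prob else 0.0 for w in row] for row in weights]

def gaussian_multiplicative(weights, alpha, rng):
    """Multiply each weight by noise drawn from N(1, alpha), alpha being a variance."""
    std = math.sqrt(alpha)
    return [[w * rng.gauss(1.0, std) for w in row] for row in weights]
```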

McClure (2017)
Results on MNIST

G. Kahn et al., Uncertainty-Aware Reinforcement Learning for Collision Avoidance, 2016

Kahn (2016)
Uncertainty-Aware Reinforcement Learning
Uncertainty-aware collision prediction model

Kahn (2016)
Uncertainty is estimated with bootstrapped neural networks that also use dropout.

Kahn (2016)
Bootstrapping

- Generate multiple datasets using sampling with replacement.

- The intuition behind bootstrapping is that, by generating multiple populations
and training one model per population, the models will agree in high-density
areas (low uncertainty) and disagree in low-density areas (high uncertainty).
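A minimal sketch of the bootstrapping intuition, with least-squares lines standing in for neural networks (an assumption made to keep it short; `fit_line`, `bootstrap_ensemble`, and `ensemble_spread` are hypothetical names). Each model is fit to a resample drawn with replacement, and the variance of the models' predictions at a query point measures their disagreement.

```python
import random

def fit_line(points):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def bootstrap_ensemble(points, num_models=20, seed=0):
    """Train one model per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    while len(models) < num_models:
        resample = [rng.choice(points) for _ in points]
        if len({x for x, _ in resample}) < 2:
            continue  # degenerate resample: slope undefined, draw again
        models.append(fit_line(resample))
    return models

def ensemble_spread(models, x):
    """Mean and variance of the ensemble's predictions at x (variance = disagreement)."""
    preds = [a + b * x for a, b in models]
    mean = sum(preds) / len(preds)
    return mean, sum((p - mean) ** 2 for p in preds) / len(preds)
```

With training inputs in $[0, 1]$, the spread at a query point far outside the data, such as $x = 5$, is larger than at $x = 0.5$, mirroring high uncertainty in low-density regions.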

Kahn (2016)
Dropout

- Dropout can be viewed as an economical approximation of an ensemble
method (such as bootstrapping) in which each sampled dropout mask
corresponds to a different model.

B. Lakshminarayanan et al., Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, 2017

Lakshminarayanan (2017)
Proper scoring rule
A scoring rule assigns a numerical score to a predictive distribution rewarding
better calibrated predictions over worse. (…) It turns out many common neural
network loss functions are proper scoring rules.
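As a small numeric check of what "proper" means, the sketch below computes the expected negative log-likelihood and the expected Brier score of a predicted class distribution `pred` when labels are drawn from `true`. For a proper scoring rule, the expected score (treated as a loss) is smallest when `pred` equals `true`. The function names and example distributions are made up for illustration.

```python
import math

def expected_nll(pred, true):
    """Expected negative log-likelihood E_{y~true}[-log pred[y]]."""
    return -sum(q * math.log(p) for p, q in zip(pred, true) if q > 0)

def expected_brier(pred, true):
    """Expected Brier score E_{y~true}[sum_k (pred[k] - 1[y == k])^2]."""
    total = 0.0
    for y, q in enumerate(true):
        total += q * sum((p - (1.0 if k == y else 0.0)) ** 2 for k, p in enumerate(pred))
    return total
```

Softmax cross-entropy is exactly this negative log-likelihood, which is why standard classification training already optimizes a proper scoring rule.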

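Density network
For input $x$, the network $f_\theta(x)$ outputs a mean $\mu_\theta(x)$ and a variance $\sigma^2_\theta(x)$ and is trained by minimizing
$L = -\frac{1}{N} \sum_{i=1}^{N} \log \mathcal{N}(y_i; \mu_\theta(x_i), \sigma^2_\theta(x_i))$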
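A minimal sketch of the density-network loss for scalar targets, assuming the network's raw mean and variance outputs are already computed. Keeping the variance positive through a softplus plus a small floor is an assumed, common choice (`positive_variance` and `min_var` are hypothetical names), written here in a numerically stable form.

```python
import math

def positive_variance(raw, min_var=1e-6):
    """Stable softplus, log(1 + exp(raw)), plus a small floor so the variance stays positive."""
    return max(raw, 0.0) + math.log1p(math.exp(-abs(raw))) + min_var

def gaussian_nll(y, mean, var):
    """-log N(y; mean, var) for a scalar target."""
    return 0.5 * math.log(2 * math.pi * var) + (y - mean) ** 2 / (2 * var)

def density_network_loss(ys, means, raw_vars):
    """L = -(1/N) sum_i log N(y_i; mu(x_i), sigma^2(x_i))."""
    variances = [positive_variance(r) for r in raw_vars]
    return sum(gaussian_nll(y, m, v) for y, m, v in zip(ys, means, variances)) / len(ys)
```

The first term of `gaussian_nll` penalizes large variances and the second penalizes errors relative to the variance, so the network cannot inflate $\sigma^2$ for free.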

Adversarial training with the fast gradient sign method
Adversarial training can also be interpreted as a computationally efficient
solution to smooth the predictive distributions by increasing the likelihood
of the target around an ε-neighborhood of the observed training examples.
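A minimal sketch of the fast gradient sign method and the resulting training objective, assuming a hypothetical scalar model with linear mean $\mu = w x + b$ and fixed variance, so that $\partial L / \partial x$ can be written by hand (a real network would use automatic differentiation). The perturbed input is $x' = x + \epsilon \, \mathrm{sign}(\nabla_x L)$, and adversarial training adds the loss at $x'$ to the loss at $x$.

```python
import math

def nll_and_input_grad(x, y, w, b, var):
    """Gaussian NLL of a linear mean model mu = w * x + b, and d(NLL)/dx."""
    mu = w * x + b
    nll = 0.5 * math.log(2 * math.pi * var) + (y - mu) ** 2 / (2 * var)
    grad_x = -(y - mu) * w / var
    return nll, grad_x

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, var, epsilon):
    """Fast gradient sign method: x' = x + epsilon * sign(dL/dx)."""
    _, grad_x = nll_and_input_grad(x, y, w, b, var)
    return x + epsilon * sign(grad_x)

def adversarial_training_loss(x, y, w, b, var, epsilon):
    """Loss on the clean example plus loss on its FGSM perturbation."""
    x_adv = fgsm(x, y, w, b, var, epsilon)
    clean, _ = nll_and_input_grad(x, y, w, b, var)
    adv, _ = nll_and_input_grad(x_adv, y, w, b, var)
    return clean + adv
```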

Proposed method

Figure panels: empirical variance (5), density network (1), adversarial training, deep ensemble (5)
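The deep-ensemble recipe trains several density networks independently (for example, from different random initializations) and treats their predictions as a uniformly weighted mixture, which is then summarized by a single Gaussian with the mixture's mean $\mu_*(x) = \frac{1}{M}\sum_m \mu_m(x)$ and variance $\sigma_*^2(x) = \frac{1}{M}\sum_m \big(\sigma_m^2(x) + \mu_m^2(x)\big) - \mu_*^2(x)$. A minimal sketch of that combination step for scalar outputs; the function name is hypothetical:

```python
def combine_ensemble(means, variances):
    """Treat M Gaussian predictions as a uniform mixture; return its mean and variance."""
    m = len(means)
    mix_mean = sum(means) / m
    mix_var = sum(v + mu ** 2 for mu, v in zip(means, variances)) / m - mix_mean ** 2
    return mix_mean, mix_var
```

When the members agree, the combined variance is close to their average variance; disagreement between member means adds to it.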

A. Kendall and Y. Gal, What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, 2017

Kendall (2017)
Aleatoric & epistemic uncertainties

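Kendall (2017)
$\hat{W} \sim q(W)$, $[\hat{y}, \hat{\sigma}^2] = f^{\hat{W}}(x)$: for input $x$, the network with sampled weights $\hat{W}$ outputs a predictive mean $\hat{y}_{\hat{W}}(x)$ and a variance $\hat{\sigma}^2_{\hat{W}}(x)$.
$L = -\frac{1}{N} \sum_{i=1}^{N} \log \mathcal{N}(y_i; \hat{y}_{\hat{W}}(x_i), \hat{\sigma}^2_{\hat{W}}(x_i))$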

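Kendall (2017)
With $T$ sampled weights $\hat{W}_t \sim q(W)$ and $[\hat{y}_t, \hat{\sigma}^2_t] = f^{\hat{W}_t}(x)$:
$\mathrm{Var}(y) \approx \underbrace{\frac{1}{T} \sum_{t=1}^{T} \hat{y}_t^2 - \Big(\frac{1}{T} \sum_{t=1}^{T} \hat{y}_t\Big)^2}_{\text{epistemic uncertainty}} + \underbrace{\frac{1}{T} \sum_{t=1}^{T} \hat{\sigma}^2_t}_{\text{aleatoric uncertainty}}$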

Kendall (2017)
Heteroscedastic uncertainty as loss attenuation
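In practice the network can predict the log variance $s_i = \log \hat{\sigma}_i^2$ instead of $\hat{\sigma}_i^2$, which avoids a potential division by zero. Dropping constants, the Gaussian negative log-likelihood then becomes $\frac{1}{N}\sum_i \frac{1}{2} e^{-s_i} (y_i - \hat{y}_i)^2 + \frac{1}{2} s_i$. A minimal sketch for scalar targets (the function name is hypothetical):

```python
import math

def attenuated_loss(ys, preds, log_vars):
    """Mean of 0.5 * exp(-s) * (y - y_hat)^2 + 0.5 * s, with s = log sigma^2."""
    total = 0.0
    for y, y_hat, s in zip(ys, preds, log_vars):
        total += 0.5 * math.exp(-s) * (y - y_hat) ** 2 + 0.5 * s
    return total / len(ys)
```

For a residual $r$, the per-example loss is minimized at $s = \log r^2$: a noisy example receives a larger predicted variance, which down-weights (attenuates) its residual term, while the $\frac{1}{2} s$ term keeps the model from predicting a large variance everywhere.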

S. Choi et al., Uncertainty-Aware Learning from Demonstration Using Mixture Density Networks with Sampling-Free Variance Modeling, 2017

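Choi (2017)
Mixture density networks
For input $x$, the network $f_{\hat{W}}(x)$ outputs, for each mixture component $j = 1, \dots, K$ ($K = 3$ in the figure), a mixing weight $\pi_j(x)$, a mean $\mu_j(x)$, and a variance $\sigma_j^2(x)$:
$L = -\frac{1}{N} \sum_{i=1}^{N} \log \sum_{j=1}^{K} \pi_j(x_i) \, \mathcal{N}(y_i; \mu_j(x_i), \sigma_j^2(x_i))$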
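A minimal sketch of the MDN loss for scalar targets, assuming the network has already produced mixing weights that sum to one (for example, through a softmax) and positive variances. The inner sum is evaluated with the log-sum-exp trick so that very small component densities do not underflow to zero; the function names are hypothetical.

```python
import math

def log_gaussian(y, mean, var):
    """log N(y; mean, var) for a scalar target."""
    return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)

def mdn_nll(y, weights, means, variances):
    """-log sum_j pi_j N(y; mu_j, sigma_j^2), computed with log-sum-exp."""
    log_terms = [math.log(pi) + log_gaussian(y, mu, var)
                 for pi, mu, var in zip(weights, means, variances) if pi > 0]
    peak = max(log_terms)
    return -(peak + math.log(sum(math.exp(t - peak) for t in log_terms)))

def mdn_loss(ys, params):
    """Average MDN NLL; params[i] = (weights, means, variances) predicted for x_i."""
    return sum(mdn_nll(y, *p) for y, p in zip(ys, params)) / len(ys)
```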

Choi (2017)
Explained and unexplained variance
We propose a sampling-free variance modeling method using a mixture
density network which can be decomposed into explained variance and
unexplained variance.

Choi (2017)
In particular, explained variance represents model uncertainty whereas
unexplained variance indicates the uncertainty inherent in the process, e.g.,
measurement noise.
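By the law of total variance, the MDN predictive variance splits into the spread of the component means around the mixture mean $\bar{\mu} = \sum_j \pi_j \mu_j$, namely $\sum_j \pi_j (\mu_j - \bar{\mu})^2$ (explained variance), and the average component variance $\sum_j \pi_j \sigma_j^2$ (unexplained variance). A minimal sketch for scalar outputs; the function name is hypothetical:

```python
def mdn_variance_decomposition(weights, means, variances):
    """Split the MDN predictive variance into (explained, unexplained) for scalar outputs."""
    mix_mean = sum(pi * mu for pi, mu in zip(weights, means))
    explained = sum(pi * (mu - mix_mean) ** 2 for pi, mu in zip(weights, means))
    unexplained = sum(pi * var for pi, var in zip(weights, variances))
    return explained, unexplained
```

Both terms come from a single forward pass, which is what makes this variance modeling sampling-free.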

Choi (2017)
We present uncertainty-aware learning from demonstration by using the
explained variance as a switching criterion between the trained policy and a
rule-based safe mode.
Figure panels: unexplained variance, explained variance
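A minimal sketch of the switching rule, assuming hypothetical callables for the learned policy, the rule-based safe controller, and the explained-variance estimate, plus a user-chosen threshold (how the threshold is set is not specified here):

```python
def select_action(observation, policy, safe_controller, explained_variance_fn, threshold):
    """Use the learned policy unless its explained variance (model uncertainty) is too high."""
    if explained_variance_fn(observation) > threshold:
        return safe_controller(observation)
    return policy(observation)
```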

Choi (2017)
Driving experiments

Anonymous, Bayesian Uncertainty Estimation for Batch Normalized Deep Networks, 2017

Anonymous (2018?)
Monte Carlo Batch Normalization (MCBN)
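A minimal sketch of MCBN at test time, assuming a hypothetical toy model with one batch-normalization step on a scalar feature followed by a linear output, with parameters `(gamma, beta, w, c)`. For each of `num_samples` stochastic passes, a mini-batch is drawn from the training inputs, the normalization statistics are taken from that mini-batch instead of running averages, and the test input is passed through; the spread of the outputs is the uncertainty estimate.

```python
import math
import random

def bn_predict(x, batch, gamma, beta, w, c, eps=1e-5):
    """Toy 'BN -> linear' forward pass using the statistics of the given mini-batch."""
    mu_b = sum(batch) / len(batch)
    var_b = sum((v - mu_b) ** 2 for v in batch) / len(batch)
    x_hat = gamma * (x - mu_b) / math.sqrt(var_b + eps) + beta
    return w * x_hat + c

def mcbn_predict(x, train_inputs, batch_size, num_samples, params, seed=0):
    """Re-estimate BN statistics from a random training mini-batch for every sample."""
    rng = random.Random(seed)
    preds = []
    for _ in range(num_samples):
        batch = rng.sample(train_inputs, batch_size)  # drawn without replacement
        preds.append(bn_predict(x, batch, *params))
    mean = sum(preds) / len(preds)
    return mean, sum((p - mean) ** 2 for p in preds) / len(preds)
```

If the whole training set is used as the "mini-batch", the statistics never change and the variance collapses to zero, so in this toy the batch size controls how much stochasticity MCBN injects.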

Anonymous (2018?)
Batch normalized deep nets as Bayesian modeling
Figure labels: learnable parameter, stochastic parameter

Anonymous (2018?)
MCBN to Bayesian SegNet
