Do Deep Generative Models* Know
What They Don't Know?
Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan
(DeepMind)
ICLR 2019
*Fake news, no GANs
Presented by: Julius Hietala
TL;DR
Normalizing flows, VAEs, and PixelCNNs aren't reliable enough to detect out-of-distribution data*
*in some interesting cases
Outline
• Paper introduction
• Some notes
• How do normalizing flows work?
• Paper experiments
• Paper findings
• Conclusions
• Discussion
Paper introduction
• Density estimation is used in many applications (anomaly detection, transfer learning, etc.)
• These applications have spawned interest in deep generative models
• Currently popular choices are VAEs, GANs, autoregressive models, and invertible latent-variable models
• The latter two are interesting because they allow exact likelihood computation
• Main question of the paper: can these models be used for anomaly detection?
Some notes
• The authors report results for VAEs, PixelCNNs, and normalizing flows
• Only normalizing flows are discussed and studied in depth
• Is their analysis applicable to all the different model types?
How do normalizing flows work?
• Change of variables (f maps data x to latent z; g = f⁻¹ maps z back to x):
  • p_x(x) = p_z(z) · |dz/dx|
  • ⟹ p_x(x) = p_z(f(x)) · |df/dx|
*Illustration stolen from here: https://www.youtube.com/watch?v=P4Ta-TZPVi0
How do normalizing flows work?
• In multiple dimensions this is p_x(x) = p_z(f(x)) · |det(∂f/∂x)|
• We want to determine p_x(x)
• We can choose p_z(z) as we wish (usually a Gaussian)
• We can choose f (invertible, g = f⁻¹)
• Challenges?
How do normalizing flows work?
• Calculating det(∂f/∂x) (the Jacobian determinant) could be hard
• Designing f to be invertible might be a challenge
• Flow-based models are designed so that both of these are easy
• Jacobian determinant:
  • Make the Jacobian triangular so that only the diagonal terms matter
  • Make the diagonal elements easy to calculate
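The triangular trick is easy to verify numerically. The toy sketch below (not from the paper) checks that the determinant of a triangular matrix is just the product of its diagonal entries, which is why a triangular Jacobian makes the log-determinant cheap:

```python
import numpy as np

# For a triangular matrix the determinant equals the product of the
# diagonal, so a triangular Jacobian makes log|det J| an O(D) sum.
rng = np.random.default_rng(0)
J = np.tril(rng.normal(size=(5, 5)))  # a lower-triangular "Jacobian"
assert np.allclose(np.linalg.det(J), np.prod(np.diag(J)))
```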
How do normalizing flows work?
• Example from RealNVP (https://arxiv.org/pdf/1605.08803.pdf): an affine coupling layer
  y_{1:d} = x_{1:d}
  y_{d+1:D} = x_{d+1:D} ⊙ exp(s(x_{1:d})) + t(x_{1:d})
  *s and t are neural networks
• Even with multiple composed steps of "flow", the Jacobian determinant remains tractable since det(AB) = det(A) · det(B)
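The coupling idea can be sketched in a few lines of NumPy. This is a hypothetical toy version (one-layer stand-ins for the s and t networks instead of deep CNNs), meant only to show the mechanics: the forward pass, the exact inverse, and the cheap log-determinant:

```python
import numpy as np

# Toy RealNVP-style affine coupling layer. The s and t "networks" are
# hypothetical one-layer stand-ins; real models use deep CNNs.
rng = np.random.default_rng(0)
D, d = 4, 2                                   # input dim, split point
Ws = rng.normal(size=(d, D - d)) * 0.1
Wt = rng.normal(size=(d, D - d)) * 0.1

def s(x1): return np.tanh(x1 @ Ws)            # log-scale "network"
def t(x1): return x1 @ Wt                     # translation "network"

def forward(x):
    """Map x -> y and return log|det J| (a sum of the log-scales)."""
    x1, x2 = x[:d], x[d:]
    y2 = x2 * np.exp(s(x1)) + t(x1)
    log_det = np.sum(s(x1))                   # Jacobian is triangular
    return np.concatenate([x1, y2]), log_det

def inverse(y):
    """Exact inverse: undo the affine transform of the second half."""
    y1, y2 = y[:d], y[d:]
    x2 = (y2 - t(y1)) * np.exp(-s(y1))
    return np.concatenate([y1, x2])

x = rng.normal(size=D)
y, log_det = forward(x)
assert np.allclose(inverse(y), x)             # invertible by construction
```

Stacking such layers (alternating which half is transformed) keeps both invertibility and the tractable log-determinant, since the log-determinants of the composed steps simply add.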
How do normalizing flows work?
• So we are able to determine p_x(x)
• For generation, we sample from p_x(x): sample from p_z(z) and "flow" the sample back through g = f⁻¹
• For likelihood estimation (anomaly detection etc.), we just "flow" x through the model to get the likelihood p_x(x) = p_z(f(x)) · |det(∂f/∂x)|
• Models are trained simply by maximizing the log-likelihood: θ* = argmax_θ log p_x(x; θ)
• Glow demo: https://openai.com/blog/glow/
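As a concrete 1D example of the likelihood formula: if we (hypothetically) fix f(x) = log(x) with a standard-Gaussian base p_z, the change-of-variables formula recovers the log-normal density exactly:

```python
import numpy as np

# 1D change-of-variables sketch with a fixed, hypothetical flow
# f(x) = log(x), g(z) = exp(z), and a standard-Gaussian base p_z.
def log_px(x):
    z = np.log(x)                                    # z = f(x)
    log_pz = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)   # log p_z(f(x))
    log_det = -np.log(x)                             # log|df/dx| = -log x
    return log_pz + log_det                          # log-normal log-density
```

In a real flow, f has learnable parameters θ and training maximizes this quantity over the data.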
Paper experiments
• Train the model (Glow) on one data set (in-distribution), then compute likelihoods both for the training data (in-distribution) and for another data set that was not used in training (out-of-distribution)
• Data set pairs (in-distribution vs. out-of-distribution):
  • FashionMNIST vs. MNIST
  • CIFAR-10 vs. SVHN
  • CelebA vs. SVHN
  • ImageNet vs. CIFAR-10/CIFAR-100/SVHN
Paper findings
• Likelihood histograms for each pair (result figures omitted):
  • FashionMNIST vs. MNIST
  • CIFAR-10 vs. SVHN
  • CelebA vs. SVHN
  • ImageNet vs. CIFAR-10/CIFAR-100/SVHN
  • Other model types
• In each pair, the out-of-distribution data set was assigned higher likelihoods than the in-distribution data
Paper findings
• The observations presented above are the main contributions of the paper; the next points need a grain of salt
• The authors try to explain the phenomenon, but the explanation raised many questions from the reviewers
• Change-of-variables formula* term analysis:
  *p_x(x) = p_z(f(x)) · |det(∂f/∂x)|
Paper findings
• They make the model "constant volume" (CV), i.e. det(∂f/∂x) is constant
Paper findings
• Explanation of the phenomenon, making a lot of assumptions:
  • Training distribution x ~ p*, "adversarial distribution" x ~ q, generative model p(x; θ)
  • q will have higher likelihood than p* if E_q[log p(x; θ)] − E_p*[log p(x; θ)] > 0
  • Assumptions:
    • Second-order expansion around x₀
    • E_q[x] = E_p*[x] = x₀ (some empirical support in the example case)
    • The latent distribution is Gaussian
    • A constant-volume flow is used
    • q = SVHN, p* = CIFAR-10
Paper findings
• For q = SVHN, p* = CIFAR-10, the assumptions given, and the empirical variances of the data,
  E_q[log p(x; θ)] − E_p*[log p(x; θ)] > 0
  simplifies to
  (1 / (2σ_ψ²)) · (α₁² · 12.3 + α₂² · 6.5 + α₃² · 14.5) ≥ 0, where α_c = Σ_{k=1}^{K} Σ_{j=1}^{C} u_{k,c,j}
• E_q[log p(x; θ)] − E_p*[log p(x; θ)] is thus always greater than or equal to zero, since α_c² ≥ 0
• This predicts that SVHN will be more likely than CIFAR-10
Paper findings
• They then hypothesize that artificially reducing the variance of the data will increase the likelihood
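This intuition is easy to reproduce in a toy setting (not the paper's Glow experiment): fit a Gaussian to high-variance data, then score lower-variance data centred at the same mean; the lower-variance data gets a higher average likelihood, mirroring the SVHN-vs-CIFAR-10 effect:

```python
import numpy as np

# Toy analogue of the variance effect: a density fit to high-variance
# data assigns a HIGHER average likelihood to lower-variance samples.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=100_000)   # "in-distribution" data
mu, sigma = train.mean(), train.std()        # fitted Gaussian "model"

def mean_loglik(x):
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (x - mu) ** 2 / (2 * sigma**2))

ood = rng.normal(0.0, 0.5, size=100_000)     # lower-variance "OOD" data
assert mean_loglik(ood) > mean_loglik(train)
```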
Conclusions
• Cause for pause when using generative models for anomaly detection
• A second-order analysis is provided (only applicable to a certain type of flow, and relying on many assumptions)
• The authors urge further study of the subject
Discussion
• How valid/applicable is their analysis?
• How come samples do not look like the OOD images if they
have higher likelihood?
