Do Deep Generative Models* Know
What They Don't Know?
Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan
(DeepMind)
ICLR 2019
*Fake news, no GANs
Presented by: Julius Hietala
TL;DR
Normalizing flows, VAEs, and PixelCNNs aren't reliable enough to detect out-of-distribution data*
*in some interesting cases
Outline
• Paper introduction
• Some notes
• How do normalizing flows work?
• Paper experiments
• Paper findings
• Conclusions
• Discussion
Paper introduction
• Density estimation is used in many applications (anomaly detection, transfer learning, etc.)
• These applications have spawned interest in deep generative models
• Currently popular choices are VAEs, GANs, autoregressive models, and invertible latent-variable models
• The latter two are interesting because they allow exact likelihood computation
• Main question of the paper: can these models be used for anomaly detection?
Some notes
• The authors report results for VAEs, PixelCNNs, and normalizing flows
• Only normalizing flows are discussed and studied in depth
• Is their analysis applicable to all the different types of models?
How do normalizing flows work?
• Change of variables (in one dimension):
  • $g = f^{-1}$, i.e. $z = f(x)$ and $x = g(z)$
  • $p_x(x) = p_z(z)\left|\frac{\partial z}{\partial x}\right|$
  • $\Rightarrow\; p_x(x) = p_z(f(x))\left|\frac{\partial f}{\partial x}\right|$
[Figure: an invertible map between the data space $X$ and the latent space $Z$ (both $\mathbb{R}$), with $f: X \to Z$ and $g: Z \to X$]
*Illustration from: https://www.youtube.com/watch?v=P4Ta-TZPVi0
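A quick worked example (my own, not from the slides) to sanity-check the 1-D formula: take $p_z$ to be a standard normal and $f(x) = (x-\mu)/\sigma$. Then

$$z = f(x) = \frac{x-\mu}{\sigma}, \qquad \left|\frac{\partial f}{\partial x}\right| = \frac{1}{\sigma}, \qquad p_x(x) = p_z\!\left(\frac{x-\mu}{\sigma}\right)\frac{1}{\sigma} = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$

i.e. the change of variables turns the standard normal base density into a $\mathcal{N}(\mu, \sigma^2)$ density over $x$, as expected.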
How do normalizing flows work?
• In multiple dimensions this becomes $p_x(\mathbf{x}) = p_z(f(\mathbf{x}))\left|\det \frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right|$
• We want to determine $p_x(x)$
• We can choose $p_z(z)$ as we wish (usually a Gaussian)
• We can choose $f$ (invertible, with $g = f^{-1}$)
• Challenges?
How do normalizing flows work?
• Calculating the Jacobian determinant $\det \frac{\partial f}{\partial x}$ can be hard
• Designing $f$ to be invertible can be a challenge
• Flow-based models are designed so that both of these are easy
• Jacobian determinant:
  • Make the Jacobian triangular, so that only the diagonal terms matter
  • Make the diagonal elements easy to calculate
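To make the triangular-Jacobian idea concrete, here is the structure for an affine coupling layer of the kind used in RealNVP/Glow (a sketch following the RealNVP paper linked on the next slide; the input is split as $x = (x_{1:d}, x_{d+1:D})$):

$$y_{1:d} = x_{1:d}, \qquad y_{d+1:D} = x_{d+1:D} \odot \exp\!\big(s(x_{1:d})\big) + t(x_{1:d}),$$

$$\frac{\partial y}{\partial x} = \begin{pmatrix} I_d & 0 \\ \frac{\partial y_{d+1:D}}{\partial x_{1:d}} & \operatorname{diag}\!\big(\exp(s(x_{1:d}))\big) \end{pmatrix}, \qquad \det\frac{\partial y}{\partial x} = \exp\!\Big(\sum_j s(x_{1:d})_j\Big).$$

The lower-left block can be arbitrarily complicated (it involves derivatives of the networks $s$ and $t$), but it never enters the determinant.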
How do normalizing flows work?
• Example: the affine coupling layer from RealNVP (https://arxiv.org/pdf/1605.08803.pdf)
*s and t are neural networks
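A minimal PyTorch sketch of such a coupling layer (my own illustration, under the assumption that $s$ and $t$ are small MLPs; not the authors' code):

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One RealNVP-style affine coupling step: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1)."""
    def __init__(self, dim: int):
        super().__init__()
        half = dim // 2
        # s and t are (small) neural networks conditioned on the untouched half.
        self.s = nn.Sequential(nn.Linear(half, 64), nn.Tanh(), nn.Linear(64, dim - half))
        self.t = nn.Sequential(nn.Linear(half, 64), nn.Tanh(), nn.Linear(64, dim - half))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.s(x1), self.t(x1)
        y = torch.cat([x1, x2 * torch.exp(s) + t], dim=-1)   # one step of z = f(x)
        log_det = s.sum(dim=-1)                               # log|det df/dx| = sum(s)
        return y, log_det

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        s, t = self.s(y1), self.t(y1)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=-1)  # x = g(y)

layer = AffineCoupling(dim=4)
x = torch.randn(8, 4)
z, log_det = layer(x)
print(torch.allclose(layer.inverse(z), x, atol=1e-5))  # True: the layer is invertible
```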
How do normalizing flows work?
• Example from RealNVP, continued (https://arxiv.org/pdf/1605.08803.pdf):
• Even when multiple of these "flow" steps are composed, the Jacobian determinant remains tractable, since $\det(AB) = \det(A)\det(B)$
How do normalizing flows work?
• So we are able to determine $p_x(x)$
• For generation, we simply sample from $p_x(x)$: sample from $p_z(z)$ and "flow" the sample back through $g = f^{-1}$
• For likelihood estimation (anomaly detection and similar applications), we just "flow" $x$ through the model to get the likelihood
  $p_x(x) = p_z(f(x))\left|\det \frac{\partial f}{\partial x}\right|$
• Models are optimized simply by maximizing the (log-)likelihood:
  $\theta^* = \arg\max_\theta \log p_x(x; \theta)$
• Glow demo: https://openai.com/blog/glow/
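For orientation, a toy end-to-end sketch (my own; not the paper's code or the Glow implementation): a 1-D affine transform stands in for a real flow, we fit its parameters by maximizing the log-likelihood above, and then query the model for likelihoods of new points.

```python
# Minimal sketch: maximum-likelihood training of a 1-D affine flow
# z = f(x) = (x - t) * exp(-s), whose log|det df/dx| = -s.
import torch

torch.manual_seed(0)
data = 3.0 + 0.5 * torch.randn(1024)          # "in-distribution" training data

s = torch.nn.Parameter(torch.zeros(()))        # log-scale
t = torch.nn.Parameter(torch.zeros(()))        # shift
base = torch.distributions.Normal(0.0, 1.0)    # p_z, chosen to be a Gaussian
opt = torch.optim.Adam([s, t], lr=1e-2)

def log_px(x):
    z = (x - t) * torch.exp(-s)                # f(x)
    return base.log_prob(z) - s                # log p_z(f(x)) + log|det df/dx|

for _ in range(2000):                          # theta* = argmax_theta log p_x(x; theta)
    loss = -log_px(data).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(log_px(torch.tensor(3.0)))               # high likelihood: near the training data
print(log_px(torch.tensor(30.0)))              # low likelihood: far from the training data
```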
Paper experiments
• Train the model (Glow) on one dataset (in-distribution), then compute likelihoods both for the training data (in-distribution) and for another dataset that was not used in training (out-of-distribution); a toy version of this protocol is sketched after the list below
• Dataset (in-distribution vs. out-of-distribution) pairs:
  • FashionMNIST vs. MNIST
  • CIFAR-10 vs. SVHN
  • CelebA vs. SVHN
  • ImageNet vs. CIFAR-10/CIFAR-100/SVHN
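A self-contained toy sketch of this protocol (everything here is hypothetical: `ToyFlow` and the random stand-in batches replace a trained Glow and real image datasets):

```python
# Compare mean likelihood (as bits per dimension) on in-distribution vs. OOD data.
import math
import torch

class ToyFlow:
    """Stand-in for a trained flow; log_prob returns log p_x(x) per image."""
    def log_prob(self, x):                           # x: (N, C, H, W)
        z = x                                        # pretend f is the identity (log|det| = 0)
        base = torch.distributions.Normal(0.0, 1.0)
        return base.log_prob(z).sum(dim=(1, 2, 3))

def bits_per_dim(log_px, x):
    d = x[0].numel()
    return -log_px / (d * math.log(2.0))             # ignoring dequantization constants

flow = ToyFlow()
in_dist = torch.randn(64, 3, 32, 32)                 # stand-in for in-distribution test data
ood = 0.5 * torch.randn(64, 3, 32, 32)               # stand-in for OOD test data (lower variance)

for name, batch in [("in-dist", in_dist), ("OOD", ood)]:
    ll = flow.log_prob(batch)
    print(name, "mean bits/dim:", bits_per_dim(ll, batch).mean().item())
# Even this toy identity flow assigns the lower-variance "OOD" data higher likelihood
# (fewer bits/dim) -- the same direction as the paper's CIFAR-10 vs. SVHN result.
```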
Paper findings
• FashionMNIST vs. MNIST
• CIFAR-10 vs. SVHN
• CelebA vs. SVHN
• ImageNet vs. CIFAR-10/CIFAR-100/SVHN
• Other model types (VAEs, PixelCNNs) show the same behavior
• In each case, the paper's likelihood histograms show the out-of-distribution data receiving likelihoods as high as, or higher than, the in-distribution data
Paper findings
• The observations presented above are the paper's main contribution; the following points should be taken with a grain of salt
• The authors try to explain the phenomenon, but the explanation raised many questions from the reviewers
• Term-by-term analysis of the change-of-variables formula*:
*$p_x(x) = p_z(f(x))\left|\det \frac{\partial f}{\partial x}\right|$
Paper findings
• They make the model "constant-volume" (CV), i.e. $\det \frac{\partial f}{\partial x}$ is constant (independent of $x$)
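With a constant-volume flow, the log-likelihood decomposition simplifies (my summary of why this assumption makes the analysis tractable): the Jacobian term no longer depends on the input, so

$$\log p(x;\theta) = \log p_z\!\big(f(x;\theta)\big) + \underbrace{\log\left|\det\frac{\partial f}{\partial x}\right|}_{\text{constant } c(\theta)},$$

and any difference in likelihood between two datasets comes entirely from the latent term $\log p_z(f(x;\theta))$.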
Paper findings
• Explanation of the phenomenon, making a lot of assumptions:
  • Training distribution $x \sim p^*$, "adversarial" (out-of-distribution) distribution $x \sim q$, generative model $p(x;\theta)$
  • $q$ will have higher likelihood than $p^*$ if $\mathbb{E}_q[\log p(x;\theta)] - \mathbb{E}_{p^*}[\log p(x;\theta)] > 0$
  • Assumptions (the expansion step is sketched after this list):
    • Second-order expansion around $x_0$
    • $\mathbb{E}_q[x] = \mathbb{E}_{p^*}[x] = x_0$ (some empirical support in the example case)
    • The latent distribution is Gaussian
    • The model is constant-volume
    • $q$ = SVHN, $p^*$ = CIFAR-10
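The expansion step behind these assumptions, roughly as stated in the paper (reconstructed from memory; see the paper for the exact form): expand $\log p(x;\theta)$ to second order around $x_0$,

$$\log p(x;\theta) \approx \log p(x_0;\theta) + \nabla_{x}\log p(x_0;\theta)^{\top}(x - x_0) + \tfrac{1}{2}(x - x_0)^{\top}\nabla^2_{x}\log p(x_0;\theta)\,(x - x_0).$$

Taking $\mathbb{E}_q$ and $\mathbb{E}_{p^*}$ of both sides and subtracting, the constant terms cancel and the first-order terms vanish because both distributions have mean $x_0$, leaving

$$\mathbb{E}_q[\log p(x;\theta)] - \mathbb{E}_{p^*}[\log p(x;\theta)] \approx \tfrac{1}{2}\,\operatorname{Tr}\!\Big\{\nabla^2_{x}\log p(x_0;\theta)\,\big(\Sigma_q - \Sigma_{p^*}\big)\Big\},$$

which, for a constant-volume flow with a Gaussian latent, reduces to the expression on the next slide.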
Paper findings
• For $q$ = SVHN, $p^*$ = CIFAR-10, the assumptions above, and the empirical variances of the data, the condition
  $\mathbb{E}_q[\log p(x;\theta)] - \mathbb{E}_{p^*}[\log p(x;\theta)] > 0$
  simplifies to
  $\frac{1}{2\sigma_\psi^2}\left(\alpha_1^2 \cdot 12.3 + \alpha_2^2 \cdot 6.5 + \alpha_3^2 \cdot 14.5\right) \ge 0$, where $\alpha_c = \sum_{k=1}^{K}\sum_{j=1}^{C} u_{k,c,j}$
• $\mathbb{E}_q[\log p(x;\theta)] - \mathbb{E}_{p^*}[\log p(x;\theta)]$ is thus always greater than or equal to zero, since $\alpha_c^2 \ge 0$
• This predicts that SVHN will be assigned higher likelihood than CIFAR-10
Paper findings
• The authors then hypothesize that artificially reducing the variance of the data will increase its likelihood
Conclusions
• Cause for caution when using deep generative models for anomaly detection
• A second-order analysis is provided (but it only applies to a certain type of flow and relies on many assumptions)
• The authors urge further study of the subject
Discussion
• How valid/applicable is their analysis?
• If the OOD images have higher likelihood, how come samples from the model do not look like them?