Do Deep Generative Models* Know
What They Don't Know?
Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan
(DeepMind)
ICLR 2019
*Fake news, no GANs
Presented by: Julius Hietala
TL;DR
Normalizing flows, VAEs, and PixelCNNs aren't reliable enough to
detect out-of-distribution data*
*in some interesting cases
Outline
• Paper introduction
• Some notes
• How do normalizing flows work?
• Paper experiments
• Paper findings
• Conclusions
• Discussion
Paper introduction
• Density estimation is used in many applications
(anomaly detection, transfer learning, etc.)
• These applications have spurred interest in deep
generative models
• Currently popular choices are VAEs, GANs, autoregressive
models, and invertible latent variable models
• The latter two are interesting because they allow
exact likelihood calculation
• Main question of the paper: can these models be used for
anomaly detection?
Some notes
• The authors report results for VAEs, PixelCNNs, and
normalizing flows
• Only normalizing flows are discussed and studied in depth
• Is their analysis applicable to all the different types of models?
How do normalizing flows work?
• Change of variables:
• $g = f^{-1}$
• $p_x(x) = p_z(z) \left|\frac{\partial z}{\partial x}\right|$
• $\Longrightarrow p_x(x) = p_z(f(x)) \left|\frac{\partial f}{\partial x}\right|$
[Figure: g maps a latent variable z ∈ ℝ to a data point x ∈ ℝ]
*Illustration stolen from here:
https://www.youtube.com/watch?v=P4Ta-TZPVi0
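A quick numerical sanity check of the one-dimensional change-of-variables formula can make it more concrete. The sketch below is not from the paper: it assumes a standard Gaussian base density and a hypothetical map f(x) = log(x) on x > 0, and checks that the resulting p_x integrates to roughly one.

```python
import math


def standard_normal_pdf(z):
    """Base density p_z: a standard Gaussian (an assumption for this sketch)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def f(x):
    """Hypothetical invertible map from data space (x > 0) to latent space."""
    return math.log(x)


def df_dx(x):
    """Derivative of f with respect to x."""
    return 1.0 / x


def p_x(x):
    """Density of x from the change of variables: p_z(f(x)) * |df/dx|."""
    return standard_normal_pdf(f(x)) * abs(df_dx(x))


def integrate(density, lo, hi, steps=200_000):
    """Midpoint-rule integral of a density over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(density(lo + (i + 0.5) * width) for i in range(steps)) * width


# The transformed density should still carry (almost) all of the probability mass.
total_mass = integrate(p_x, 1e-9, 200.0)
print(f"integral of p_x over (0, 200]: {total_mass:.6f}")
```

Without the |df/dx| factor the integral would not come out close to one, which is the reason the Jacobian term appears in the formula.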
How do normalizing flows work?
• In multiple dimensions this is
$p_x(x) = p_z(f(x)) \left|\det \frac{\partial f}{\partial x}\right|$
• We want to determine $p_x(x)$
• We can choose $p_z(z)$ as we wish (usually a Gaussian)
• We can choose $f$ (invertible, $g = f^{-1}$)
• Challenges?
How do normalizing flows work?
• Calculating $\det \frac{\partial f}{\partial x}$ (the Jacobian determinant) could be hard
• Designing $f$ to be invertible might be a challenge
• Flow-based models are designed so that both of these are easy
• Jacobian determinant:
  • Make the Jacobian triangular so that only the diagonal terms matter
  • Make the diagonal elements easy to calculate
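The triangular trick can be illustrated with a small sketch. The matrix below is a made-up 3x3 lower-triangular "Jacobian" (not taken from any model); the code compares a general O(n^3) determinant with the O(n) sum of log-diagonal entries that a flow would use.

```python
import math


def determinant(matrix):
    """General determinant via Gaussian elimination with partial pivoting, O(n^3)."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det


def triangular_log_abs_det(matrix):
    """log|det| of a triangular matrix: just the sum of log|diagonal|, O(n)."""
    return sum(math.log(abs(matrix[i][i])) for i in range(len(matrix)))


# Hypothetical lower-triangular Jacobian, for illustration only.
jacobian = [
    [2.0, 0.0, 0.0],
    [5.0, 0.5, 0.0],
    [-3.0, 7.0, 4.0],
]

full = determinant(jacobian)
cheap = math.exp(triangular_log_abs_det(jacobian))
print(full, cheap)
```

Working with the log of the diagonal is the usual choice in practice, because the log-likelihood needs log|det| rather than the determinant itself.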
How do normalizing flows work?
• Example from RealNVP (https://arxiv.org/pdf/1605.08803.pdf):
*s and t are neural networks
• Even with multiple levels of these steps of "flow", the Jacobian
determinant remains tractable since
$\det(AB) = \det(A)\det(B)$
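A minimal sketch of an affine coupling layer in the style of RealNVP, written from the published description rather than from the slide images (which are not recoverable here). The functions `s` and `t` are toy stand-ins for the neural networks, and the helper names are hypothetical. The first half of the input passes through unchanged, so the Jacobian is triangular and its log-determinant is just the sum of the scale outputs; stacking layers adds the log-determinants.

```python
import math


def s(x1):
    """Toy scale function; a stand-in for a neural network."""
    return [math.tanh(v) for v in x1]


def t(x1):
    """Toy translation function; a stand-in for a neural network."""
    return [0.5 * v + 1.0 for v in x1]


def coupling_forward(x):
    """y1 = x1, y2 = x2 * exp(s(x1)) + t(x1); returns (y, log|det J|)."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    scale, shift = s(x1), t(x1)
    y2 = [b * math.exp(sc) + sh for b, sc, sh in zip(x2, scale, shift)]
    return x1 + y2, sum(scale)


def coupling_inverse(y):
    """Exact inverse: x2 = (y2 - t(y1)) * exp(-s(y1)); s and t are never inverted."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    scale, shift = s(y1), t(y1)
    x2 = [(b - sh) * math.exp(-sc) for b, sc, sh in zip(y2, scale, shift)]
    return y1 + x2


def flip(v):
    """Reverse the coordinates so the other half gets transformed next (|det| = 1)."""
    return v[::-1]


def flow_forward(x, num_layers=3):
    """Stack coupling layers; log-determinants add because det(AB) = det(A)det(B)."""
    h, total_log_det = x, 0.0
    for _ in range(num_layers):
        h, log_det = coupling_forward(h)
        total_log_det += log_det
        h = flip(h)
    return h, total_log_det


def flow_inverse(z, num_layers=3):
    """Undo the layers in reverse order."""
    h = z
    for _ in range(num_layers):
        h = coupling_inverse(flip(h))
    return h


x = [0.3, -1.2, 0.8, 2.0]
z, log_det = flow_forward(x)
x_reconstructed = flow_inverse(z)
print(z, log_det, x_reconstructed)
```

This sketch reuses the same `s` and `t` in every layer for brevity; a real model would give each layer its own networks and, as in Glow, might use a learned permutation instead of a fixed flip.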
How do normalizing flows work?
• So we are able to determine $p_x(x)$
• For generation, we just sample from $p_x(x)$ (sample from
$p_z(z)$ and "flow" the sample back in reverse)
• For likelihood estimation (anomaly detection and similar applications)
we just "flow" $x$ through the model to get the likelihood given by
$p_x(x) = p_z(f(x)) \left|\det \frac{\partial f}{\partial x}\right|$
• Models are optimized simply by maximizing the (log) likelihood
$\theta^* = \arg\max_\theta \log p_x(x; \theta)$
• Glow demo: https://openai.com/blog/glow/
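The three uses above (likelihood evaluation, sampling, maximum-likelihood training) can be sketched with the simplest possible flow. This is an illustrative toy, not Glow: a one-dimensional affine map f(x) = (x - mu) / sigma with a standard Gaussian base density. The parameter names, learning rate, and synthetic data are all assumptions.

```python
import math
import random


def log_likelihood(x, mu, log_sigma):
    """log p_x(x) = log p_z(f(x)) + log|df/dx| for f(x) = (x - mu) / sigma."""
    z = (x - mu) * math.exp(-log_sigma)
    log_p_z = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    log_abs_det = -log_sigma  # df/dx = 1 / sigma
    return log_p_z + log_abs_det


def sample(mu, log_sigma, rng):
    """Generation: draw z from p_z and push it back through g = f^{-1}."""
    z = rng.gauss(0.0, 1.0)
    return mu + math.exp(log_sigma) * z


def fit(data, steps=1000, learning_rate=0.1):
    """Gradient ascent on the average log-likelihood."""
    mu, log_sigma = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_mu, grad_log_sigma = 0.0, 0.0
        inv_sigma = math.exp(-log_sigma)
        for x in data:
            z = (x - mu) * inv_sigma
            grad_mu += z * inv_sigma          # d log p / d mu
            grad_log_sigma += z * z - 1.0     # d log p / d log_sigma
        mu += learning_rate * grad_mu / n
        log_sigma += learning_rate * grad_log_sigma / n
    return mu, log_sigma


rng = random.Random(0)
data = [rng.gauss(3.0, 2.0) for _ in range(500)]  # synthetic training data

mu, log_sigma = fit(data)
average_log_likelihood = sum(log_likelihood(x, mu, log_sigma) for x in data) / len(data)
generated = [sample(mu, log_sigma, rng) for _ in range(5)]
print(mu, math.exp(log_sigma), average_log_likelihood, generated)
```

For this toy flow the maximum-likelihood solution is simply the sample mean and standard deviation; deep flows follow the same recipe but need stochastic gradient methods.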
Paper experiments
• Train the model (Glow) on one data set (in distribution),
then determine likelihoods for the training data (in
distribution) and for another data set that was not used in training
(out of distribution)
• Data set pairs (in distribution vs. out of distribution):
  • FashionMNIST vs. MNIST
  • CIFAR-10 vs. SVHN
  • CelebA vs. SVHN
  • ImageNet vs. CIFAR-10/CIFAR-100/SVHN
Paper findings
• FashionMNIST vs. MNIST
• CIFAR-10 vs. SVHN
• CelebA vs. SVHN
• ImageNet vs. CIFAR-10/CIFAR-100/SVHN
• Other model types
Paper findings
• The observations presented were the main contributions of the paper;
take the next points with a grain of salt
• The authors try to explain the phenomenon, but their explanation raised
many questions from the reviewers
• Change of variables formula* term analysis:
*$p_x(x) = p_z(f(x)) \left|\det \frac{\partial f}{\partial x}\right|$
Paper findings
• They make the model "constant volume" (CV), i.e. $\left|\det \frac{\partial f}{\partial x}\right|$
is constant (independent of $x$)
Paper findings
• Explanation of the phenomenon, making a lot of assumptions:
• Training distribution $x \sim p^*$ and "adversarial distribution" $x \sim q$,
generative model $p(x; \theta)$
• $q$ will have higher likelihood than $p^*$ (in expectation) if
$\mathbb{E}_q[\log p(x; \theta)] - \mathbb{E}_{p^*}[\log p(x; \theta)] > 0$
• Assumptions:
  • Second-order expansion around $x_0$
  • Assuming $\mathbb{E}_q[x] = \mathbb{E}_{p^*}[x] = x_0$ (some empirical evidence in the example case)
  • Latent distribution is Gaussian
  • Using constant volume
  • $q$ = SVHN, $p^*$ = CIFAR-10
Paper findings
• For $q$ = SVHN, $p^*$ = CIFAR-10, the assumptions given, and the empirical
variances of the data,
$\mathbb{E}_q[\log p(x; \theta)] - \mathbb{E}_{p^*}[\log p(x; \theta)] > 0$
simplifies to:
$\frac{1}{2\sigma_\psi^2}\left[\alpha_1^2 \cdot 12.3 + \alpha_2^2 \cdot 6.5 + \alpha_3^2 \cdot 14.5\right] \geq 0$, where
$\alpha_c = \prod_{k=1}^{K} \sum_{j=1}^{C} u_{k,c,j}$
• $\mathbb{E}_q[\log p(x; \theta)] - \mathbb{E}_{p^*}[\log p(x; \theta)]$ is thus always greater than or equal to zero,
since $\alpha_c^2 \geq 0$
• This predicts that SVHN will be more likely than CIFAR-10
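The same effect shows up in a toy setting that has nothing to do with images. The sketch below is an illustration under stated assumptions, not the paper's experiment: a one-dimensional Gaussian model is fitted to high-variance "training" data, then evaluated on "out-of-distribution" data with the same mean but lower variance. The variances 2.0 and 0.5 are arbitrary.

```python
import math
import random


def gaussian_log_pdf(x, mean, std):
    """Log-density of a one-dimensional Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


rng = random.Random(0)
# Stand-in for p* (training data): wide spread around zero.
train = [rng.gauss(0.0, 2.0) for _ in range(20_000)]
# Stand-in for q (OOD data): same mean, much smaller spread.
ood = [rng.gauss(0.0, 0.5) for _ in range(20_000)]

# Fit the model to the training data by maximum likelihood.
mean = sum(train) / len(train)
std = math.sqrt(sum((x - mean) ** 2 for x in train) / len(train))

avg_train = sum(gaussian_log_pdf(x, mean, std) for x in train) / len(train)
avg_ood = sum(gaussian_log_pdf(x, mean, std) for x in ood) / len(ood)
gap = avg_ood - avg_train

# Closed form for this toy case: (var_train - var_ood) / (2 * var_model).
expected_gap = (2.0 ** 2 - 0.5 ** 2) / (2.0 * 2.0 ** 2)
print(avg_train, avg_ood, gap, expected_gap)
```

The low-variance data sits near the mode of the fitted density, so it scores a higher average log-likelihood even though the model would almost never generate a whole data set that looks like it.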
Paper findings
• The authors then hypothesize that artificially reducing the variance of
the data will increase the likelihood
Conclusions
• Reason for caution when using generative models for anomaly
detection
• A second-order analysis is provided (only applicable to a certain
type of flow, and under many assumptions)
• The authors urge further study on the subject
Discussion
• How valid/applicable is their analysis?
• How come samples do not look like the OOD images if they
have higher likelihood?
