author title 2020/7/20 1
2020/6/3
Monte Carlo dropout and variational bound
Lab Seminar
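Self-introduction
• ヤン ティエンラ (Japanese reading: よう てんらく)
• 1996/7/18
• Hometown: Chengdu, Sichuan Province
• University: University of Electronic Science and Technology of China, Automation major
• Hobbies: music, karaoke, anime, sports
• Research field: Bayesian deep learning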
Outline
• Variational Inference
• Dropout
• Evaluating the Log Evidence Lower Bound
• VI with upper bound
Preliminaries
Variational inference Approximation
Variational inference with Upper bound
Bayesian Inference
Variational Inference
Bayesian Inference
Variational Inference
ELBO
$W \sim q_\theta(W)$
Variational Inference
• The predictive distribution
• Log evidence lower bound
• Objective:
• Variational Prediction:
• Objective:
• A variational distribution q(w) that explains the data well while still being close to the prior
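The formulas on this slide did not survive extraction. As a hedged reconstruction in standard notation (the symbols $X, Y$ for the training data, $x^*, y^*$ for a test point, and $\mathcal{L}$ for the bound are assumptions, not taken from the slide), the quantities named in the bullets are usually written as:

```latex
\begin{align}
p(y^* \mid x^*, X, Y) &= \int p(y^* \mid x^*, w)\, p(w \mid X, Y)\, dw \\
\log p(Y \mid X) &\ge \mathbb{E}_{q_\theta(w)}\big[\log p(Y \mid X, w)\big]
  - \mathrm{KL}\big(q_\theta(w)\,\|\,p(w)\big) =: \mathcal{L}(\theta) \\
\theta^* &= \arg\max_\theta \mathcal{L}(\theta) \\
q_{\theta^*}(y^* \mid x^*) &= \int p(y^* \mid x^*, w)\, q_{\theta^*}(w)\, dw
\end{align}
```

The expectation term rewards a q(w) that explains the data, and the KL term keeps q(w) close to the prior.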
Variational Inference
• The mini-batch objective with reparameterization:
• Simple MC estimate with a single sample:
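A minimal sketch of such a one-sample, mini-batch estimate, assuming a Bayesian linear regression with a diagonal Gaussian q(w) and a standard normal prior. The model, the function name `minibatch_neg_elbo`, and all parameter names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def minibatch_neg_elbo(mu, log_sigma, x_batch, y_batch, n_total, rng, noise_std=1.0):
    """One-sample estimate of the negative ELBO on a mini-batch.

    Assumptions: q(w) = N(mu, diag(sigma^2)), prior p(w) = N(0, I),
    likelihood y ~ N(x @ w, noise_std^2).
    """
    sigma = np.exp(log_sigma)
    # Reparameterization: w = mu + sigma * eps with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    w = mu + sigma * eps
    # Gaussian log-likelihood of the mini-batch under the sampled weights.
    resid = y_batch - x_batch @ w
    log_lik = (-0.5 * np.sum(resid ** 2) / noise_std ** 2
               - len(y_batch) * np.log(noise_std * np.sqrt(2.0 * np.pi)))
    # Rescale by N / M so the mini-batch term is unbiased for the full-data sum.
    scale = n_total / len(y_batch)
    # Closed-form KL between the diagonal Gaussian q and the N(0, I) prior.
    kl = 0.5 * np.sum(sigma ** 2 + mu ** 2 - 1.0 - 2.0 * log_sigma)
    return -scale * log_lik + kl

rng = np.random.default_rng(0)
x_batch = rng.standard_normal((4, 3))
y_batch = rng.standard_normal(4)
loss = minibatch_neg_elbo(np.zeros(3), np.zeros(3), x_batch, y_batch, n_total=20, rng=rng)
```

Because the noise `eps` is drawn independently of `mu` and `log_sigma`, the sampled loss is differentiable with respect to both, which is what makes stochastic gradient optimization possible.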
Variational Inference
Dropout
Procedure
• Training stage: each unit is present with probability p
• Testing stage: every unit is always present and the weights are multiplied by p, so the output matches the expected output during training
Intuition
• Training a neural network with dropout can be seen as training a collection of $2^K$ thinned networks with extensive weight sharing
• At test time, a single neural network approximates the averaged output of these thinned networks
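The procedure above can be sketched for a single linear layer, where the match between the averaged training-time output and the scaled test-time output is exact in expectation (with a nonlinearity it is only an approximation). The function name `dropout_forward` and the sizes are illustrative assumptions.

```python
import numpy as np

def dropout_forward(x, weights, p, rng=None, train=True):
    """Linear layer with dropout on its input units; p is the keep probability."""
    if train:
        # Each input unit is kept independently with probability p.
        mask = rng.random(x.shape) < p
        return (x * mask) @ weights
    # Test time: keep every unit and scale the weights by p.
    return x @ (p * weights)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
weights = rng.standard_normal((5, 3))
p = 0.8

# Average many thinned networks and compare with the single scaled network.
train_mean = np.mean(
    [dropout_forward(x, weights, p, rng) for _ in range(20000)], axis=0
)
test_out = dropout_forward(x, weights, p, train=False)
```

With enough sampled masks, `train_mean` should approach `test_out`, which is the sense in which one scaled network stands in for the ensemble of thinned networks.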
Dropout for one-hidden-layer Neural Networks
• Dropout is applied to local units
• This is equivalent to multiplying the global weight matrices by the binary vectors, which drops out entire rows:
• Application to regression
• Inputs x and output y
• g: activation function; Weights:
• $b_l$ are binary dropout variables
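The equivalence between dropping units and zeroing rows of the weight matrices can be checked numerically. This is a small sketch with illustrative sizes; the names `W1`, `W2`, `b`, `b1`, `b2` and the choice of `tanh` for g are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, K, D = 4, 6, 2                       # input, hidden, output sizes (illustrative)
x = rng.standard_normal((1, Q))
W1 = rng.standard_normal((Q, K))
W2 = rng.standard_normal((K, D))
b = rng.standard_normal(K)
g = np.tanh                             # any elementwise activation works here

# Binary dropout variables for the input units and the hidden units.
b1 = rng.integers(0, 2, size=Q)
b2 = rng.integers(0, 2, size=K)

# Dropout applied to the units themselves.
hidden = g((x * b1) @ W1 + b)
y_units = (hidden * b2) @ W2

# Same computation with rows of the weight matrices zeroed instead.
W1_dropped = np.diag(b1) @ W1
W2_dropped = np.diag(b2) @ W2
y_weights = g(x @ W1_dropped + b) @ W2_dropped
```

Both paths give the same output because `(x * b1) @ W1` equals `x @ (np.diag(b1) @ W1)`, and likewise for the second layer.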
Dropout for one-hidden-layer Neural Networks
• The loss can be rewritten as a negative log-likelihood:
• The weight parameters with the probability of a unit being present:
• The new loss:
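The equations for this slide are missing. A hedged sketch of the usual rewriting for regression, assuming a Gaussian likelihood with precision $\tau$ and weight-decay coefficients $\lambda_1, \lambda_2, \lambda_3$ (all of these symbols are assumptions following the notation common in Gal and Ghahramani's work, not recovered from the slide):

```latex
\begin{align}
E(y_n, \hat{y}_n) = \tfrac{1}{2}\lVert y_n - \hat{y}_n \rVert^2
  &= -\frac{1}{\tau}\log \mathcal{N}\big(y_n;\ \hat{y}_n,\ \tau^{-1} I_D\big) + \text{const}, \\
\mathcal{L}_{\text{dropout}}
  &= \frac{1}{N}\sum_{n=1}^{N} E(y_n, \hat{y}_n)
   + \lambda_1 \lVert W_1 \rVert^2 + \lambda_2 \lVert W_2 \rVert^2 + \lambda_3 \lVert b \rVert^2 .
\end{align}
```

The first line follows from $\log \mathcal{N}(y; \hat{y}, \tau^{-1} I_D) = -\tfrac{\tau}{2}\lVert y - \hat{y}\rVert^2 + \tfrac{D}{2}\log\tau - \tfrac{D}{2}\log 2\pi$, so the squared error and the negative log-likelihood differ only by the factor $1/\tau$ and an additive constant.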
Dropout for one-hidden-layer Neural Networks
A single-layer neural network example
• Setup
• Idea: introduce $W_1$ and $W_2$ for the approximation
• Q: input dimension; K: number of hidden units; D: output dimension
• Goal: learn $W_1 \in \mathbb{R}^{Q \times K}$ and $W_2 \in \mathbb{R}^{K \times D}$ that map $X \in \mathbb{R}^{N \times Q}$ to $Y \in \mathbb{R}^{N \times D}$
Variational Inference in the Approximate Model
• To mimic dropout, q(W1) is factorised over the input dimensions; each factor is a Gaussian mixture distribution with two components
Where
• Same for q(W2)
Where
• Optimise over parameters, especially
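A minimal sketch of sampling from such a factorised mixture, assuming each row $w_q$ of W1 follows $p_1\,\mathcal{N}(m_q, \sigma^2 I_K) + (1 - p_1)\,\mathcal{N}(0, \sigma^2 I_K)$ as in Gal and Ghahramani's construction. The names `M1`, `p1`, `sigma` and the function `sample_w1` are assumptions, since the slide's own formulas are missing.

```python
import numpy as np

def sample_w1(M1, p1, sigma, rng):
    """Draw W1 ~ q(W1), one two-component Gaussian mixture per row of M1."""
    Q, K = M1.shape
    # z_q = 1 selects the component centred at m_q, z_q = 0 the one centred at 0.
    z = rng.random(Q) < p1
    noise = sigma * rng.standard_normal((Q, K))
    return z[:, None] * M1 + noise

rng = np.random.default_rng(0)
M1 = rng.standard_normal((4, 6))
W1_sample = sample_w1(M1, p1=0.9, sigma=1e-3, rng=rng)
```

As `sigma` goes to zero, each sampled row is either the corresponding row of `M1` or zero, which is the row-wise dropout mask applied to the variational parameters.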
Evaluating the Log Evidence Lower Bound for Regression
• Log evidence lower bound
• Approximation of
• For large enough K we can approximate the KL divergence term as (Gal, 2016)
• Similarly for
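The KL approximation itself is missing from the extracted slide. The following is a sketch of the standard argument for one row $w_q$ of $W_1$, assuming a standard normal prior and the two-component mixture with parameters $m_q$, $p_1$, $\sigma$ (assumed symbols); constants and factors may be written differently in (Gal, 2016).

```latex
\begin{align}
q(w_q) &= p_1\,\mathcal{N}(m_q, \sigma^2 I_K) + (1 - p_1)\,\mathcal{N}(0, \sigma^2 I_K),
\qquad p(w_q) = \mathcal{N}(0, I_K), \\
\mathrm{KL}\big(q(w_q)\,\|\,p(w_q)\big)
 &= -\mathbb{E}_{q}[\log p(w_q)] - \mathcal{H}[q] \\
 &\approx \frac{p_1}{2}\, m_q^{\top} m_q
   + \frac{K}{2}\big(\sigma^2 - \log \sigma^2 - 1\big) - \mathcal{H}(p_1),
\end{align}
```

where $\mathcal{H}(p_1) = -p_1 \log p_1 - (1 - p_1)\log(1 - p_1)$. The cross-entropy term is exact; the approximation lies in the mixture entropy, which for large $K$ (little overlap between the two components) is close to the entropy of one Gaussian component plus $\mathcal{H}(p_1)$. For fixed $p_1$ and $\sigma$, the only term that depends on the variational mean is $\tfrac{p_1}{2} m_q^{\top} m_q$, which has the form of an L2 weight penalty.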
KL condition
• More specifically, if we define the prior p(ω) such that the following holds:
• We obtain the following:
• With identical optimization procedures
• Dropout can be interpreted as Bayesian variational inference (Gal, 2016)
MC dropout in Epistemic uncertainty
• Predictive distribution
where $w$ is our set of random variables for a model with L layers, $f$ is our model's stochastic output, and $q_\theta^*(w)$ is an optimum of:
MC dropout in Epistemic uncertainty
For regression this epistemic uncertainty is captured by the predictive mean and variance, which can be approximated as:
For classification this can be approximated using Monte Carlo integration as follows:
MC estimator
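A minimal sketch of the MC estimators described above, assuming a one-hidden-layer ReLU network with dropout on the input and hidden units. The network, the keep probability `p`, the number of passes `T`, and the function names are illustrative assumptions. The sample variance below covers only the spread across stochastic passes; the additive observation-noise term $\tau^{-1}$ used in Gal's regression formula is left out.

```python
import numpy as np

def mc_dropout_predict(x, W1, W2, p, T, rng):
    """Run T stochastic forward passes with dropout kept on at test time."""
    outputs = []
    for _ in range(T):
        # Fresh dropout masks for every pass (keep probability p).
        b1 = rng.random(W1.shape[0]) < p
        b2 = rng.random(W2.shape[0]) < p
        hidden = np.maximum((x * b1) @ W1, 0.0)   # ReLU
        outputs.append((hidden * b2) @ W2)
    return np.stack(outputs)                      # shape (T, N, D)

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
W1 = rng.standard_normal((4, 16))
W2 = rng.standard_normal((16, 3))
samples = mc_dropout_predict(x, W1, W2, p=0.9, T=200, rng=rng)

# Regression: predictive mean and variance across the T passes.
pred_mean = samples.mean(axis=0)
pred_var = samples.var(axis=0)

# Classification: average the softmax outputs of the T passes.
class_probs = softmax(samples).mean(axis=0)
```

Averaging probabilities rather than logits keeps `class_probs` a valid distribution for each input, and the spread of the individual passes indicates how much the prediction depends on the sampled weights.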
MC dropout in Epistemic uncertainty
• Usually dropout is used only at training time, but for MC dropout we also apply it at test time and average the stochastic outputs to estimate uncertainty for classification
• Dropout VI can severely underestimate model uncertainty (Gal, 2016, Section 3.3.2), a property many VI methods share (Dropout with alpha divergence, Li, 2017)
• For the evidence lower bound:
• Can we use an additional bound for estimating q(w)?
VI with upper bound
• Divergence measures (Minka, 2005)
VI with upper bound
• Zero-forcing behavior leads to underestimating the posterior
[Figure panels: zero-forcing behavior; mass-covering (zero-avoiding) property]
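The two behaviors can be illustrated numerically. This sketch assumes a bimodal target p and two Gaussian candidates for q, one sitting on a single mode and one moment-matched to p; all densities and names are illustrative assumptions. KL(q || p) is the divergence minimized through the ELBO, and KL(p || q) is the inclusive direction.

```python
import numpy as np

def log_normal(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def kl_on_grid(log_a, log_b, dx):
    """KL(a || b) by a Riemann sum over a uniform grid of log-densities."""
    return np.sum(np.exp(log_a) * (log_a - log_b)) * dx

grid = np.linspace(-20.0, 20.0, 40001)
dx = grid[1] - grid[0]

# Bimodal "posterior": equal mixture of N(-3, 0.5^2) and N(3, 0.5^2).
log_p = np.logaddexp(log_normal(grid, -3.0, 0.5),
                     log_normal(grid, 3.0, 0.5)) + np.log(0.5)

candidates = {
    "narrow": log_normal(grid, 3.0, 0.5),            # sits on one mode
    "wide": log_normal(grid, 0.0, np.sqrt(9.25)),    # matches mean and variance of p
}

# Exclusive direction KL(q || p) and inclusive direction KL(p || q).
exclusive = {name: kl_on_grid(lq, log_p, dx) for name, lq in candidates.items()}
inclusive = {name: kl_on_grid(log_p, lq, dx) for name, lq in candidates.items()}
```

In this setup the exclusive divergence is smaller for the narrow candidate (zero-forcing: it ignores one mode), while the inclusive divergence is smaller for the wide candidate (mass-covering: it spreads over both modes).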
VI with upper bound
CUBO (Adji B. Dieng, 2017)
• Chi-square divergence
• For the log evidence and chi-square divergence:
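A hedged numerical sketch of how the chi-square bound sits above the log evidence while the ELBO sits below it. It assumes the order-2 form $\mathrm{CUBO}_2 = \tfrac{1}{2}\log \mathbb{E}_q[(p(x, z)/q(z))^2]$ and a toy conjugate Gaussian model whose evidence is known in closed form; the model, the observed value, and the choice of q are illustrative assumptions.

```python
import numpy as np

def log_normal(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

x_obs = 1.0
z = np.linspace(-15.0, 15.0, 30001)
dz = z[1] - z[0]

# Toy model (an assumption for illustration): z ~ N(0, 1), x | z ~ N(z, 1).
log_joint = log_normal(z, 0.0, 1.0) + log_normal(x_obs, z, 1.0)
# Marginally x ~ N(0, 2), so the log evidence is available in closed form.
log_evidence = log_normal(x_obs, 0.0, np.sqrt(2.0))

# A deliberately imperfect variational distribution q(z).
log_q = log_normal(z, 0.3, 0.8)
log_ratio = log_joint - log_q                      # log of p(x, z) / q(z)

# ELBO = E_q[log ratio];  CUBO_2 = 0.5 * log E_q[ratio^2], both by grid integration.
elbo = np.sum(np.exp(log_q) * log_ratio) * dz
cubo = 0.5 * np.log(np.sum(np.exp(log_q + 2.0 * log_ratio)) * dz)
```

Grid integration is used instead of sampling so that the ordering `elbo <= log_evidence <= cubo` is not blurred by Monte Carlo noise; in practice the CUBO expectation is estimated from samples of q.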
VI with upper bound
With an accompanying upper bound, one can perform what we call maximum entropy model selection, in which each model's evidence value is chosen to be the one that maximizes the entropy of the resulting distribution on models
VI with upper bound
EUBO (Chunlin, 2019)
• Inclusive KL divergence
• From Gibbs' inequality:
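The derivation on this slide is missing. A short sketch of how an upper bound follows from the non-negativity of the inclusive KL divergence (Gibbs' inequality), with $\lambda$ denoting the variational parameter and $\mathrm{EUBO}(\lambda)$ used here as an assumed name for the resulting bound:

```latex
\begin{align}
0 \le \mathrm{KL}\big(p(\theta \mid D)\,\|\,q_\lambda(\theta)\big)
  &= \mathbb{E}_{p(\theta \mid D)}\!\left[\log \frac{p(D, \theta)}{q_\lambda(\theta)}\right] - \log p(D), \\
\log p(D) &\le \mathbb{E}_{p(\theta \mid D)}\!\left[\log \frac{p(D, \theta)}{q_\lambda(\theta)}\right]
  =: \mathrm{EUBO}(\lambda).
\end{align}
```

The first line uses $p(\theta \mid D) = p(D, \theta) / p(D)$. The bound is tight exactly when $q_\lambda(\theta) = p(\theta \mid D)$, and the expectation is taken under the posterior rather than under $q_\lambda$, which is why it cannot be estimated by sampling from $q_\lambda$ directly.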
VI with upper bound
Theorem 2:
• SGD for the EUBO:
w(θ) is generally unknown, so the joint distribution p(D|θ)p(θ) is used instead and the weights are normalized to cancel the unknown constant p(D)
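A minimal sketch of the weight normalization described above, assuming that w(θ) denotes the importance weight p(θ|D)/q(θ) and that samples are drawn from q. The toy Gaussian model and all names are illustrative assumptions, and this is a self-normalized importance sampling estimate of the bound's value, not the paper's full SGD procedure.

```python
import numpy as np

def log_normal(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
d_obs = 1.0
q_mean, q_std = 0.3, 0.8                      # variational parameters (illustrative)

# Samples from q and their unnormalized log-weights log p(D, theta) - log q(theta).
theta = q_mean + q_std * rng.standard_normal(5000)
log_joint = log_normal(theta, 0.0, 1.0) + log_normal(d_obs, theta, 1.0)
log_w = log_joint - log_normal(theta, q_mean, q_std)

# Normalizing the weights cancels the unknown constant p(D).
w = np.exp(log_w - log_w.max())
w_norm = w / w.sum()

eubo_hat = np.sum(w_norm * log_w)             # approximates E_{p(theta|D)}[log p/q]
elbo_hat = np.mean(log_w)                     # plain Monte Carlo ELBO for comparison
```

Because the normalized weights grow with `log_w`, the weighted average `eubo_hat` is never below the unweighted average `elbo_hat` computed from the same samples.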
VI with upper bound
• Assume the posterior and the joint distribution do not depend on the variational parameter λ:
References
[1] Gal, Yarin and Ghahramani, Zoubin. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In ICML, 2016.
[2] Gal, Yarin. Uncertainty in Deep Learning. PhD thesis, University of Cambridge, 2016.
[3] Minka, Tom. Divergence measures and message passing. Technical report, Microsoft Research, 2005.
[4] Li, Yingzhen and Turner, Richard E. Rényi divergence variational inference. In NIPS, 2016.
[5] Ji, C. and Shen, H. Stochastic variational inference via upper bound. arXiv preprint arXiv:1912.00650, 2019.
[6] Dieng, Adji B. Variational inference via chi-upper bound minimization. In NIPS, 2017.
Appendix
• Marginal and Conditional Gaussians
• Given a marginal Gaussian distribution for x and a conditional Gaussian distribution for y given x in the form
• The marginal distribution of y and the conditional distribution of x given y are given by
• where
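The formulas for this slide were lost in extraction. The standard result for linear-Gaussian models, written here in the notation of Bishop's PRML (the symbols $\mu, \Lambda, A, b, L, \Sigma$ are an assumption about what the slide used), reads:

```latex
\begin{align}
p(x) &= \mathcal{N}\big(x \mid \mu, \Lambda^{-1}\big), &
p(y \mid x) &= \mathcal{N}\big(y \mid A x + b, L^{-1}\big), \\
p(y) &= \mathcal{N}\big(y \mid A\mu + b,\; L^{-1} + A \Lambda^{-1} A^{\top}\big), &
p(x \mid y) &= \mathcal{N}\big(x \mid \Sigma\{A^{\top} L (y - b) + \Lambda \mu\},\; \Sigma\big), \\
\Sigma &= \big(\Lambda + A^{\top} L A\big)^{-1}.
\end{align}
```

Here $\Lambda$ and $L$ are precision matrices, so the marginal covariance of $y$ adds the observation noise $L^{-1}$ to the prior uncertainty propagated through $A$.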
Appendix-Uncertainty
Appendix
Appendix-Sandwich theorem
Appendix-Sandwich theorem
Appendix-EUBO

More Related Content

What's hot

PRML Chapter 7
PRML Chapter 7PRML Chapter 7
PRML Chapter 7Sunwoo Kim
 
Nonnegative Matrix Factorization
Nonnegative Matrix FactorizationNonnegative Matrix Factorization
Nonnegative Matrix FactorizationTatsuya Yokota
 
"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper Review"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper ReviewLEE HOSEONG
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep LearningRayKim51
 
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...Bang Xiang Yong
 
Maxwell W Libbrecht - pomegranate: fast and flexible probabilistic modeling i...
Maxwell W Libbrecht - pomegranate: fast and flexible probabilistic modeling i...Maxwell W Libbrecht - pomegranate: fast and flexible probabilistic modeling i...
Maxwell W Libbrecht - pomegranate: fast and flexible probabilistic modeling i...PyData
 
Classification Algorithm.
Classification Algorithm.Classification Algorithm.
Classification Algorithm.Megha Sharma
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBenjamin Bengfort
 
Cluster Analysis Introduction
Cluster Analysis IntroductionCluster Analysis Introduction
Cluster Analysis IntroductionPrasiddhaSarma
 
Leveraged Gaussian Process
Leveraged Gaussian ProcessLeveraged Gaussian Process
Leveraged Gaussian ProcessSungjoon Choi
 
Spectral clustering
Spectral clusteringSpectral clustering
Spectral clusteringSOYEON KIM
 
Variational Inference
Variational InferenceVariational Inference
Variational InferenceTushar Tank
 
Linear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaLinear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaEdureka!
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machinesnextlib
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet AllocationMarco Righini
 
Graph kernels
Graph kernelsGraph kernels
Graph kernelsLuc Brun
 
Linear models for data science
Linear models for data scienceLinear models for data science
Linear models for data scienceBrad Klingenberg
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationAdnan Masood
 

What's hot (20)

PRML Chapter 7
PRML Chapter 7PRML Chapter 7
PRML Chapter 7
 
Nonnegative Matrix Factorization
Nonnegative Matrix FactorizationNonnegative Matrix Factorization
Nonnegative Matrix Factorization
 
"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper Review"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper Review
 
Predictive Analytics - An Introduction
Predictive Analytics - An IntroductionPredictive Analytics - An Introduction
Predictive Analytics - An Introduction
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep Learning
 
XGBoost & LightGBM
XGBoost & LightGBMXGBoost & LightGBM
XGBoost & LightGBM
 
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
 
Maxwell W Libbrecht - pomegranate: fast and flexible probabilistic modeling i...
Maxwell W Libbrecht - pomegranate: fast and flexible probabilistic modeling i...Maxwell W Libbrecht - pomegranate: fast and flexible probabilistic modeling i...
Maxwell W Libbrecht - pomegranate: fast and flexible probabilistic modeling i...
 
Classification Algorithm.
Classification Algorithm.Classification Algorithm.
Classification Algorithm.
 
Beginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix FactorizationBeginners Guide to Non-Negative Matrix Factorization
Beginners Guide to Non-Negative Matrix Factorization
 
Cluster Analysis Introduction
Cluster Analysis IntroductionCluster Analysis Introduction
Cluster Analysis Introduction
 
Leveraged Gaussian Process
Leveraged Gaussian ProcessLeveraged Gaussian Process
Leveraged Gaussian Process
 
Spectral clustering
Spectral clusteringSpectral clustering
Spectral clustering
 
Variational Inference
Variational InferenceVariational Inference
Variational Inference
 
Linear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | EdurekaLinear Regression vs Logistic Regression | Edureka
Linear Regression vs Logistic Regression | Edureka
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
 
Graph kernels
Graph kernelsGraph kernels
Graph kernels
 
Linear models for data science
Linear models for data scienceLinear models for data science
Linear models for data science
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
 

Similar to Monte carlo dropout and variational bound

KVA, Mind Your P's and Q's!
KVA, Mind Your P's and Q's!KVA, Mind Your P's and Q's!
KVA, Mind Your P's and Q's!Shashi Jain
 
Linked science presentation 25
Linked science presentation 25Linked science presentation 25
Linked science presentation 25Francesco Osborne
 
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...Pirouz Nourian
 
Representation Learning & Generative Modeling with Variational Autoencoder(VA...
Representation Learning & Generative Modeling with Variational Autoencoder(VA...Representation Learning & Generative Modeling with Variational Autoencoder(VA...
Representation Learning & Generative Modeling with Variational Autoencoder(VA...changedaeoh
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density ModelsSangwoo Mo
 
Mb0040 statistics for management
Mb0040   statistics for managementMb0040   statistics for management
Mb0040 statistics for managementsmumbahelp
 
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15MLconf
 
Multi-class Classification on Riemannian Manifolds for Video Surveillance
Multi-class Classification on Riemannian Manifolds for Video SurveillanceMulti-class Classification on Riemannian Manifolds for Video Surveillance
Multi-class Classification on Riemannian Manifolds for Video SurveillanceDiego Tosato
 
Bi gaussianity and indicator variograms (2006)
Bi gaussianity and indicator variograms (2006)Bi gaussianity and indicator variograms (2006)
Bi gaussianity and indicator variograms (2006)David F. Machuca-Mory
 
NIPS2007: structured prediction
NIPS2007: structured predictionNIPS2007: structured prediction
NIPS2007: structured predictionzukun
 
Laurent jojczyk et Fréderic Dierick - Falla-way
Laurent jojczyk et Fréderic Dierick -  Falla-wayLaurent jojczyk et Fréderic Dierick -  Falla-way
Laurent jojczyk et Fréderic Dierick - Falla-waySynhera
 
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
“Robust Object Detection Under Dataset Shifts,” a Presentation from ArmEdge AI and Vision Alliance
 
Outlier analysis for Temporal Datasets
Outlier analysis for Temporal DatasetsOutlier analysis for Temporal Datasets
Outlier analysis for Temporal DatasetsQuantUniversity
 
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelA Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelTomonari Masada
 
Bag Jacobs Ead Model Ccl Irmc 6 10
Bag Jacobs Ead Model Ccl Irmc 6 10Bag Jacobs Ead Model Ccl Irmc 6 10
Bag Jacobs Ead Model Ccl Irmc 6 10Michael Jacobs, Jr.
 

Similar to Monte carlo dropout and variational bound (20)

KVA, Mind Your P's and Q's!
KVA, Mind Your P's and Q's!KVA, Mind Your P's and Q's!
KVA, Mind Your P's and Q's!
 
Linked science presentation 25
Linked science presentation 25Linked science presentation 25
Linked science presentation 25
 
OOD_PPT.pptx
OOD_PPT.pptxOOD_PPT.pptx
OOD_PPT.pptx
 
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...
 
Representation Learning & Generative Modeling with Variational Autoencoder(VA...
Representation Learning & Generative Modeling with Variational Autoencoder(VA...Representation Learning & Generative Modeling with Variational Autoencoder(VA...
Representation Learning & Generative Modeling with Variational Autoencoder(VA...
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density Models
 
Mb0040 statistics for management
Mb0040   statistics for managementMb0040   statistics for management
Mb0040 statistics for management
 
Model Uncertainty
Model UncertaintyModel Uncertainty
Model Uncertainty
 
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15
 
Multi-class Classification on Riemannian Manifolds for Video Surveillance
Multi-class Classification on Riemannian Manifolds for Video SurveillanceMulti-class Classification on Riemannian Manifolds for Video Surveillance
Multi-class Classification on Riemannian Manifolds for Video Surveillance
 
Bi gaussianity and indicator variograms (2006)
Bi gaussianity and indicator variograms (2006)Bi gaussianity and indicator variograms (2006)
Bi gaussianity and indicator variograms (2006)
 
NIPS2007: structured prediction
NIPS2007: structured predictionNIPS2007: structured prediction
NIPS2007: structured prediction
 
Laurent jojczyk et Fréderic Dierick - Falla-way
Laurent jojczyk et Fréderic Dierick -  Falla-wayLaurent jojczyk et Fréderic Dierick -  Falla-way
Laurent jojczyk et Fréderic Dierick - Falla-way
 
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
 
Outlier analysis for Temporal Datasets
Outlier analysis for Temporal DatasetsOutlier analysis for Temporal Datasets
Outlier analysis for Temporal Datasets
 
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelA Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
 
vector QUANTIZATION
vector QUANTIZATIONvector QUANTIZATION
vector QUANTIZATION
 
vector QUANTIZATION
vector QUANTIZATIONvector QUANTIZATION
vector QUANTIZATION
 
vector QUANTIZATION
vector QUANTIZATIONvector QUANTIZATION
vector QUANTIZATION
 
Bag Jacobs Ead Model Ccl Irmc 6 10
Bag Jacobs Ead Model Ccl Irmc 6 10Bag Jacobs Ead Model Ccl Irmc 6 10
Bag Jacobs Ead Model Ccl Irmc 6 10
 

Recently uploaded

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Monte carlo dropout and variational bound

  • 1. author title 2020/7/20 1 2020/6/3 Monte Carlo dropout and variational bound Lab Seminar
  • 2. author title 2020/7/20 2 • ヤン ティエンラ(訓読み:よう てんらく) • 1996/7/18 • 出身:四川省成都市 • 大学:電子科技大学自動化専攻 • 趣味:音楽、カラオケ、アニメ、スポーツ • 研究分野:ベイズ深層学習 自己紹介
  • 3. author title 2020/7/20 3 Outline • Variational Inference • Dropout • Evaluating the Log Evidence Lower Bound • VI with upper bound Preliminaries Variational inference Approximation Variational inference with Upper bound
  • 4. author title 2020/7/20 4 Bayesian Inference
  • 5. author title 2020/7/20 5 Variational Inference
  • 6. author title 2020/7/20 6 Bayesian Inference
  • 7. author title 2020/7/20 7 Variational Inference ELBO 𝑊~𝑞 𝜃(𝑊)
  • 8. author title 2020/7/20 8 Variational Inference • The predictive distribution • Log evidence lower bound • Objective: • Variational Prediction: • Objective: • A variational distribution q(w) that explains the data well while still being close to prior
  • 9. author title 2020/7/20 9 • The mini-batch objective with reparameterization: • Simple MC sampling (one): Variational Inference
  • 10. author title 2020/7/20 10 Variational Inference
  • 11. author title 2020/7/20 11 Dropout • Training stage: A unit is present with probability p • Testing stage: The unit is always present and the weights are multiplied by p, same as the expected output • Training a neural network with dropout can be seen as training a collection of 2 𝐾 thinned networks with extensive weight sharing • A single neural network to approximate averaging output at test time Procedure Intuition
  • 12. author title 2020/7/20 12 Dropout for one-hidden-layer Neural Networks • Dropout local units • Equivalent to multiplying the global weight matrices by the binary vectors to dropout entire rows: • Application to regression • Inputs x and Output y • g: activation function; Weights: • 𝑏𝑙 is binary dropout variables
  • 13. Dropout for one-hidden-layer Neural Networks • We can rewrite the loss as a negative log-likelihood: • The weight parameter with present probability: • The new loss:
  • 14. Dropout for one-hidden-layer Neural Networks
  • 15. A single-layer neural network example • Setup • Idea: Introduce $W_1$ and $W_2$ for the approximation • $Q$: input dimension; $K$: number of hidden units; $D$: output dimension • Goal: Learn $W_1 \in \mathbb{R}^{Q \times K}$ and $W_2 \in \mathbb{R}^{K \times D}$ to map $X \in \mathbb{R}^{N \times Q}$ to $Y \in \mathbb{R}^{N \times D}$
  • 16. Variational Inference in the Approximate Model • To mimic dropout, $q(W_1)$ is factorised over the input dimension; each factor is a Gaussian mixture distribution with two components, where • Same for $q(W_2)$, where • Optimise over the parameters, especially
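The mixture definitions are missing from the transcript. The form below is a sketch of how this approximating distribution is commonly written in the dropout-as-VI literature (Gal and Ghahramani, 2016); the symbols $p_1$, $p_2$, $m_q$, $m_k$ and $\sigma$ are assumptions and may differ from the slide.

```latex
\begin{align}
  q(W_1) &= \prod_{q=1}^{Q} q(w_q), &
  q(w_q) &= p_1\, \mathcal{N}(m_q, \sigma^2 I_K) + (1 - p_1)\, \mathcal{N}(0, \sigma^2 I_K) \\
  q(W_2) &= \prod_{k=1}^{K} q(u_k), &
  q(u_k) &= p_2\, \mathcal{N}(m_k, \sigma^2 I_D) + (1 - p_2)\, \mathcal{N}(0, \sigma^2 I_D)
\end{align}
```

With a small $\sigma$, each row of a weight matrix is either close to its mean (with probability $p_1$ or $p_2$) or close to zero, which is how the mixture mimics a row being kept or dropped.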
  • 17. Evaluating the Log Evidence Lower Bound for Regression • Log evidence lower bound • Approximation of • For large enough K we can approximate the KL divergence term as (Gal, 2016) • Similarly for
  • 18. KL condition • More specifically, if we define the prior $p(\omega)$ such that the following holds: • we get the following: • with identical optimization procedures • Dropout can be interpreted as Bayesian variational inference (Gal, 2016)
  • 19. MC dropout in Epistemic uncertainty • Predictive distribution, where $w$ is our set of random variables for a model with $L$ layers, $f$ is our model's stochastic output, and $q_\theta^*(w)$ is an optimum of:
  • 20. MC dropout in Epistemic uncertainty • For regression, this epistemic uncertainty is captured by the predictive mean and variance, which can be approximated as: • For classification, this can be approximated using Monte Carlo integration as follows: • MC estimator
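A minimal NumPy sketch of the two Monte Carlo estimators named on this slide. The tiny ReLU network, the keep probability `p`, the noise precision `tau`, and every function name are assumptions made for illustration; the variance formula used is the common moment-matching form (noise term $1/\tau$ plus the sample variance of $T$ stochastic passes).

```python
import numpy as np

def stochastic_forward(x, W1, W2, p, rng):
    """One forward pass with fresh dropout masks (dropout kept on at test time)."""
    b1 = rng.binomial(1, p, size=W1.shape[0])
    b2 = rng.binomial(1, p, size=W2.shape[0])
    hidden = np.maximum(0.0, (x * b1) @ W1)  # ReLU hidden layer
    return (hidden * b2) @ W2

def mc_dropout_regression(x, W1, W2, p, tau, T, rng):
    """Predictive mean and variance from T stochastic passes."""
    samples = np.stack([stochastic_forward(x, W1, W2, p, rng) for _ in range(T)])
    mean = samples.mean(axis=0)
    # Observation noise 1/tau plus the spread of the stochastic outputs.
    var = 1.0 / tau + (samples ** 2).mean(axis=0) - mean ** 2
    return mean, var

def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def mc_dropout_classification(x, W1, W2, p, T, rng):
    """Class probabilities averaged over T stochastic softmax outputs."""
    probs = np.stack([softmax(stochastic_forward(x, W1, W2, p, rng))
                      for _ in range(T)])
    return probs.mean(axis=0)

# Usage with random, untrained weights (illustration only).
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 6))
W2 = rng.standard_normal((6, 2))
x = rng.standard_normal(3)
mean, var = mc_dropout_regression(x, W1, W2, p=0.8, tau=10.0, T=200, rng=rng)
probs = mc_dropout_classification(x, W1, W2, p=0.8, T=200, rng=rng)
```

Larger `T` reduces the Monte Carlo noise of both estimators at the cost of `T` forward passes per input.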
  • 21. MC dropout in Epistemic uncertainty • Generally, dropout is used only at training time, but for MC dropout we also apply it at test time to get an averaged uncertainty estimate for classification • Dropout VI can severely underestimate model uncertainty (Gal, 2016, Section 3.3.2), a property many VI methods share (Dropout with alpha divergence, Li, 2017) • For the Evidence Lower Bound: • Can we use an additional bound for estimating q(w)?
  • 22. VI with upper bound • Divergence measures (Minka, 2005)
  • 23. VI with upper bound • Underestimating the posterior with zero-forcing behavior • [Figure: zero-forcing behavior vs. mass-covering (zero-avoiding) property]
  • 24. VI with upper bound • CUBO (Adji B. Dieng, 2017) • Chi-square divergence • For the log evidence and chi-square divergence:
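A small numerical sketch of how a chi-square upper bound sandwiches the log evidence together with the ELBO. It assumes the commonly stated form $\mathrm{CUBO}_n = \frac{1}{n} \log \mathbb{E}_q[(p(x, z)/q(z))^n]$ with $n = 2$, and uses a made-up discrete latent variable so that every expectation is an exact sum; the numbers are illustrative only.

```python
import numpy as np

# Toy joint p(x, z) for one fixed observation x and z in {0, 1, 2, 3} (assumed values).
joint = np.array([0.02, 0.10, 0.05, 0.03])
# A variational distribution q(z) over the same four states.
q = np.array([0.1, 0.6, 0.2, 0.1])

log_evidence = np.log(joint.sum())     # log p(x)
posterior = joint / joint.sum()        # p(z | x)
ratio = joint / q                      # p(x, z) / q(z)

# Lower bound: ELBO = E_q[log p(x, z) - log q(z)].
elbo = np.sum(q * np.log(ratio))

# Upper bound: CUBO_2 = 0.5 * log E_q[(p(x, z) / q(z))^2].
cubo_2 = 0.5 * np.log(np.sum(q * ratio ** 2))

# Chi-square divergence between the posterior and q.
chi2 = np.sum(q * (posterior / q) ** 2) - 1.0
```

In this toy case the gap of the upper bound is exactly $\tfrac{1}{2}\log(1 + \chi^2)$, so minimising the bound over $q$ minimises the chi-square divergence to the posterior.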
  • 25. VI with upper bound • With an accompanying upper bound, one can perform what we call maximum entropy model selection, in which each model's evidence value is chosen to be the one that maximizes the entropy of the resulting distribution on models
  • 26. VI with upper bound • EUBO (Chunlin, 2019) • Inclusive KL divergence • From Gibbs' inequality:
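The slide's equations are not preserved. One way to see how an evidence upper bound follows from the inclusive KL divergence is sketched below; the notation ($\theta$ for the latent variables, $\lambda$ for the variational parameters, $D$ for the data) follows the later slides, and the exact definition used in the cited paper may differ.

```latex
\begin{align}
  % Gibbs' inequality: the inclusive KL divergence is non-negative
  \mathrm{KL}\big(p(\theta \mid D) \,\|\, q(\theta; \lambda)\big)
    &= \mathbb{E}_{p(\theta \mid D)}\!\left[\log \frac{p(\theta \mid D)}{q(\theta; \lambda)}\right] \ge 0 \\
  % Substitute p(\theta | D) = p(D, \theta) / p(D) and rearrange
  \log p(D)
    &\le \mathbb{E}_{p(\theta \mid D)}\!\left[\log \frac{p(D, \theta)}{q(\theta; \lambda)}\right]
     = \log p(D) + \mathrm{KL}\big(p(\theta \mid D) \,\|\, q(\theta; \lambda)\big)
\end{align}
```

Because the expectation is taken under the posterior, minimising this bound pushes $q$ to cover all regions where the posterior has mass, in contrast to the zero-forcing behavior of the ELBO.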
  • 27. VI with upper bound • Theorem 2: • SGD for the EUBO: $w(\theta)$ is generally unknown, so we use the joint distribution $p(D \mid \theta)p(\theta)$ instead and normalize the weights to cancel the unknown constant $p(D)$
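The cancellation of $p(D)$ can be illustrated with self-normalized importance weights. The sketch below uses an assumed conjugate toy model (prior $\mathcal{N}(0, 1)$, one observation with unit-variance Gaussian noise) so that the exact posterior is available for comparison; the proposal, the function names and the sample size are all hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def self_normalized_weights(log_target, log_q):
    """Normalized importance weights from a possibly unnormalized log target."""
    log_w = log_target - log_q
    log_w = log_w - np.max(log_w)  # stabilise before exponentiating
    w = np.exp(log_w)
    return w / np.sum(w)

rng = np.random.default_rng(0)
observation = 1.0
q_mean, q_var = 0.0, 2.0
theta = q_mean + np.sqrt(q_var) * rng.standard_normal(1000)  # samples from q
log_q = log_normal_pdf(theta, q_mean, q_var)

# Unnormalized target: log p(D | theta) + log p(theta).
log_joint = (log_normal_pdf(observation, theta, 1.0)
             + log_normal_pdf(theta, 0.0, 1.0))
# Exact posterior for this toy model: N(0.5, 0.5).
log_posterior = log_normal_pdf(theta, 0.5, 0.5)

weights_from_joint = self_normalized_weights(log_joint, log_q)
weights_from_posterior = self_normalized_weights(log_posterior, log_q)
```

The two weight vectors agree because the joint and the posterior differ only by the constant factor $p(D)$, which the normalization removes.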
  • 28. VI with upper bound • Assume the posterior and the joint distribution do not depend on the variational parameter $\lambda$:
  • 29. Reference • [1] Gal, Yarin and Ghahramani, Zoubin. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In ICML, 2016. • [2] Gal, Yarin. Uncertainty in Deep Learning. PhD thesis, University of Cambridge, 2016. • [3] Minka, Tom. Divergence measures and message passing. Technical report, Microsoft Research, 2005. • [4] Li, Yingzhen and Turner, Richard E. Renyi divergence variational inference. In NIPS, 2016. • [5] Ji, C. and Shen, H. Stochastic variational inference via upper bound. arXiv preprint arXiv:1912.00650, 2019. • [6] Dieng, Adji B. Variational inference via chi-upper bound minimization. In Proceedings of the Neural Information Processing Systems, 2017.
  • 30. Appendix • Marginal and Conditional Gaussians • Given a marginal Gaussian distribution for x and a conditional Gaussian distribution for y given x in the form • The marginal distribution of y and the conditional distribution of x given y are given by • where
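The formulas for this appendix slide are missing from the transcript. The standard textbook result for linear-Gaussian models reads as follows; the symbols $\mu$, $\Lambda$, $A$, $b$, $L$ and $\Sigma$ are the usual choices and are assumed here rather than taken from the slide.

```latex
\begin{align}
  % Given
  p(x) &= \mathcal{N}\big(x \mid \mu, \Lambda^{-1}\big) \\
  p(y \mid x) &= \mathcal{N}\big(y \mid A x + b, L^{-1}\big) \\
  % Then
  p(y) &= \mathcal{N}\big(y \mid A \mu + b,\; L^{-1} + A \Lambda^{-1} A^{\top}\big) \\
  p(x \mid y) &= \mathcal{N}\big(x \mid \Sigma \{ A^{\top} L (y - b) + \Lambda \mu \},\; \Sigma\big) \\
  % where
  \Sigma &= \big(\Lambda + A^{\top} L A\big)^{-1}
\end{align}
```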
  • 31. Appendix-Uncertainty
  • 32. Appendix
  • 33. Appendix-Sandwich theorem
  • 34. Appendix-Sandwich theorem
  • 35. Appendix-EUBO