Variational Inference
Note: Much (meaning almost all) of this has been
liberated from John Winn’s and Matthew Beal’s theses,
and David MacKay’s book.
Overview
• Probabilistic models & Bayesian
inference
• Variational Inference
• Univariate Gaussian Example
• GMM Example
• Variational Message Passing
Bayesian networks
• Directed graph
• Nodes represent
variables
• Links show dependencies
• Conditional distribution at
each node
• Defines a joint
distribution:
P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S)
[Figure: Bayes net with nodes Object class (C), Lighting color (L), Surface color (S), Image color (I), and conditionals P(C), P(L), P(S|C), P(I|L,S)]
Bayesian inference
[Figure: the same network, with hidden and observed nodes marked]
• Observed variables D
and hidden variables H.
• Hidden variables include
parameters and latent
variables.
• Learning/inference
involves finding:
• P(H1, H2…| D), or
• P(H,Θ|D,M) - explicitly for
generative model.
Bayesian inference vs. ML/MAP
• Consider learning one parameter θ
P(θ|D) = P(D|θ) P(θ) / P(D) ∝ P(D|θ) P(θ)
• How should we represent this posterior distribution?
Bayesian inference vs. ML/MAP
θMAP
θ
Maximum of P(D| θ) P(θ)
• Consider learning one parameter θ
P(D| θ) P(θ)
Bayesian inference vs. ML/MAP
P(D| θ) P(θ)
θMAP
θ
High probability mass
High probability density
• Consider learning one parameter θ
Bayesian inference vs. ML/MAP
θML
θ
Samples
• Consider learning one parameter θ
P(D| θ) P(θ)
Bayesian inference vs. ML/MAP
θML
θ
Variational
approximation
Q(θ)
• Consider learning one parameter θ
P(D| θ) P(θ)
Variational Inference
1. Choose a family of variational
distributions Q(H).
2. Use Kullback-Leibler divergence
KL(Q||P) as a measure of ‘distance’
between P(H|D) and Q(H).
3. Find Q which minimizes divergence.
(in three easy steps…)
Choose Variational Distribution
• P(H|D) ≈ Q(H).
• If P is so complex, how do we choose Q?
• Any Q is better than an ML or MAP point
estimate.
• Choose Q so it can “get” close to P and is
tractable – factorised, conjugate.
Kullback-Leibler Divergence
• Derived from the variational free energy of Feynman and
Bogoliubov
• Relative entropy between two probability distributions
• KL(Q||P) ≥ 0 for any Q (Jensen’s inequality)
• KL(Q||P) = 0 iff P = Q
• Not a true distance measure – not symmetric
KL(Q||P) = Σ_X Q(x) ln [ Q(x) / P(x) ]
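The properties above are easy to check numerically. A minimal sketch (NumPy assumed; the distributions are made-up examples):

```python
import numpy as np

def kl(q, p):
    """KL(Q||P) = sum_x Q(x) ln [Q(x)/P(x)] for discrete distributions."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return float(np.sum(q * np.log(q / p)))

q = np.array([0.5, 0.5])
p = np.array([0.9, 0.1])

print(kl(q, q))  # 0.0 — KL is zero iff the distributions match
print(kl(q, p))  # > 0, by Jensen's inequality
print(kl(p, q))  # differs from kl(q, p): KL is not symmetric
```

Here kl(q, p) = 0.5 ln(25/9) ≈ 0.511, while kl(p, q) ≈ 0.368, showing the asymmetry.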
Kullback-Leibler Divergence
Minimising KL(Q||P) = Σ_H Q(H) ln [ Q(H) / P(H|D) ] is exclusive: Q is pushed to avoid regions where P is small, so it tends to lock onto a single mode of P.
Minimising KL(P||Q) = Σ_H P(H|D) ln [ P(H|D) / Q(H) ] is inclusive: Q must cover every region where P has mass.
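The exclusive/inclusive contrast can be seen by fitting a single Gaussian Q to a bimodal P on a grid, minimising each divergence by brute force. A sketch, with entirely made-up mixture parameters and search ranges:

```python
import numpy as np

# Bimodal target P on a grid: mixture of two well-separated Gaussians.
x = np.linspace(-10, 10, 2001)

def normal(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p = 0.5 * normal(x, -4, 1) + 0.5 * normal(x, 4, 1)
p /= p.sum()  # normalise on the grid

def kl(a, b):
    b = np.clip(b, 1e-300, None)   # guard against underflow in tails
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

# Brute-force search over single-Gaussian Q(mu, sigma).
best = {}
for name, div in [("exclusive KL(Q||P)", lambda q: kl(q, p)),
                  ("inclusive KL(P||Q)", lambda q: kl(p, q))]:
    scores = []
    for mu in np.linspace(-6, 6, 61):
        for sigma in np.linspace(0.5, 6, 56):
            q = normal(x, mu, sigma)
            q /= q.sum()
            scores.append((div(q), mu, sigma))
    best[name] = min(scores)

print(best)
```

The exclusive fit locks onto one mode (mu near ±4, sigma near 1); the inclusive fit spreads across both (mu near 0, large sigma), matching the pictures on this slide.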
Kullback-Leibler Divergence
KL(Q||P) = Σ_H Q(H) ln [ Q(H) / P(H|D) ]
         = Σ_H Q(H) ln [ Q(H) P(D) / P(H,D) ]                 (Bayes rule)
         = Σ_H Q(H) ln [ Q(H) / P(H,D) ] + Σ_H Q(H) ln P(D)   (log property)
         = Σ_H Q(H) ln [ Q(H) / P(H,D) ] + ln P(D)            (sum over H: Σ_H Q(H) = 1)
Kullback-Leibler Divergence
DEFINE   L(Q) ≡ Σ_H Q(H) ln P(H,D) − Σ_H Q(H) ln Q(H)
• L is the difference between: the expectation of the log joint ln P(H,D)
with respect to Q, and the (negative) entropy of Q
• Maximising L(Q) is equivalent to minimising the KL divergence:
KL(Q||P) = Σ_H Q(H) ln [ Q(H) / P(H,D) ] + ln P(D)
KL(Q||P) = ln P(D) − L(Q)
• We could not do the same trick for KL(P||Q); thus we will
approximate the likelihood with a function that has its mass where
the likelihood is most probable (exclusive).
Summarize
ln P(D) = L(Q) + KL(Q||P)
where L(Q) = Σ_H Q(H) ln [ P(H,D) / Q(H) ]
• Holds for arbitrary Q(H): ln P(D) is fixed, so maximising L(Q) minimises KL(Q||P).
• We choose a family of Q distributions where L(Q) is tractable to compute.
• KL(Q||P) itself is still difficult in general to calculate.
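The decomposition ln P(D) = L(Q) + KL(Q||P) can be verified on a toy model small enough to enumerate. A sketch, with a made-up binary hidden variable and made-up joint values:

```python
import numpy as np

# Tiny model: one binary hidden variable H, data D already observed,
# so the joint P(H, D) is just two numbers (they need not sum to 1).
joint = np.array([0.12, 0.28])        # P(H=0, D), P(H=1, D)
evidence = joint.sum()                # P(D) = sum_H P(H, D)
posterior = joint / evidence          # exact P(H | D)

q = np.array([0.7, 0.3])              # an arbitrary Q(H)

L = float(np.sum(q * np.log(joint / q)))        # lower bound L(Q)
kl = float(np.sum(q * np.log(q / posterior)))   # KL(Q || P)

print(L + kl, np.log(evidence))  # equal: ln P(D) = L(Q) + KL(Q||P)
```

Since KL ≥ 0, L(Q) ≤ ln P(D) for every Q, with equality exactly when Q is the true posterior.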
Minimising the KL divergence
ln P(D) (fixed)  =  L(Q) (maximise)  +  KL(Q || P)
Factorised Approximation
• Assume Q factorises: Q(H) = Π_i Q_i(H_i)
• Optimal solution for one factor given by
ln Q_j*(H_j) = < ln P(H,D) >_{i≠j} + const.
or equivalently
Q_j*(H_j) = (1/Z) exp( Σ_{H_i: i≠j} [ Π_{i≠j} Q_i(H_i) ] ln P(H,D) )
• Given the form of Q, find the best H in the KL sense
• Choose conjugate priors P(H) to give the form of Q
• Do it iteratively for each Q_j(H_j)
Derivation
Show that the optimal factor is
Q_j*(H_j) = (1/Z) exp( Σ_{H_i: i≠j} [ Π_{i≠j} Q_i(H_i) ] ln P(H,D) )
Start from the lower bound and substitute the factorised Q:
L(Q) ≡ Σ_H Q(H) ln P(H,D) − Σ_H Q(H) ln Q(H)                       (substitution)
     = Σ_H [ Π_i Q_i(H_i) ] ln P(H,D) − Σ_H [ Π_i Q_i(H_i) ] ln Π_j Q_j(H_j)
     = Σ_H [ Π_i Q_i(H_i) ] ln P(H,D) − Σ_H [ Π_i Q_i(H_i) ] Σ_j ln Q_j(H_j)   (log property)
     = Σ_H [ Π_i Q_i(H_i) ] ln P(H,D) − Σ_i Σ_{H_i} Q_i(H_i) ln Q_i(H_i)
Factor out one term Q_j:
     = Σ_{H_j} Q_j(H_j) Σ_{H_i: i≠j} [ Π_{i≠j} Q_i(H_i) ] ln P(H,D)
       − Σ_{H_j} Q_j(H_j) ln Q_j(H_j) − Σ_{i≠j} Σ_{H_i} Q_i(H_i) ln Q_i(H_i)
The last term is not a function of Q_j. Idea: use the factoring of Q
to isolate Q_j and maximise L with respect to Q_j. Up to terms constant in Q_j,
L = −KL(Q_j || Q_j*) + log Z
so L is maximised when Q_j = Q_j*.
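The update rule derived above can be exercised directly on a toy posterior small enough to enumerate. A sketch of the mean-field (coordinate ascent) iteration for two made-up binary hidden variables with a made-up joint table:

```python
import numpy as np

# Joint P(H1, H2, D) at the observed D: a 2x2 table of unnormalised
# values; P(D) is its sum and the exact posterior is joint / P(D).
joint = np.array([[0.30, 0.05],
                  [0.10, 0.25]])
log_joint = np.log(joint)
evidence = joint.sum()

def elbo(q1, q2):
    q = np.outer(q1, q2)  # factorised Q(H1) Q(H2)
    return float(np.sum(q * (log_joint - np.log(q))))

# Mean-field updates: ln Qj*(Hj) = <ln P(H,D)>_{i != j} + const.
q1, q2 = np.array([0.5, 0.5]), np.array([0.5, 0.5])
for _ in range(50):
    q1 = np.exp(log_joint @ q2)      # expectation of ln P over Q(H2)
    q1 /= q1.sum()                   # the 1/Z normalisation
    q2 = np.exp(log_joint.T @ q1)    # expectation of ln P over Q(H1)
    q2 /= q2.sum()

print(elbo(q1, q2), np.log(evidence))  # L(Q) <= ln P(D); gap is KL(Q||P)
```

Each update provably increases L(Q), but because Q is forced to factorise, L(Q) generally converges below ln P(D): the remaining gap is the KL divergence to the true (correlated) posterior.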
Example: Univariate Gaussian
• Normal distribution
• Find P(µ,γ | x)
• Conjugate prior
• Factorized variational
distribution
• Q distribution same form as
prior distributions
• Inference involves updating
these hidden parameters
Example: Univariate Gaussian
• Use Q* to derive the update equations
• where <·> denotes expectation with respect to
the Q distributions
• Iteratively solve
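The slide’s update equations are not reproduced here, so the following is a standard mean-field sketch for this model, not necessarily the exact parameterisation the slides use. It assumes independent conjugate priors µ ~ N(µ0, 1/τ0) and γ ~ Gamma(a0, b0), with the example’s values P(µ) = N(0, 1000) (1000 read as the variance) and P(γ) = Gamma(.001, .001); the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=4)   # four samples from a "true" Gaussian
N, xbar = len(x), x.mean()

# Priors (example slide's values, under the stated assumptions).
mu0, tau0 = 0.0, 1.0 / 1000.0      # prior mean / precision on mu
a0, b0 = 1e-3, 1e-3                # shape / rate on the precision gamma

# Mean-field VB: alternate Q(mu) = N(m, s2) and Q(gamma) = Gamma(a, b),
# each update plugging in expectations under the other factor.
E_gamma = a0 / b0
for _ in range(100):
    lam = tau0 + N * E_gamma                       # precision of Q(mu)
    m = (tau0 * mu0 + E_gamma * N * xbar) / lam
    s2 = 1.0 / lam
    a = a0 + N / 2.0
    b = b0 + 0.5 * (np.sum((x - m) ** 2) + N * s2)  # uses <(x - mu)^2>
    E_gamma = a / b

print(m, s2, E_gamma)  # posterior mean of mu, its variance, E[gamma]
```

With such a broad prior the posterior mean m lands essentially on the sample mean, and the coupled updates converge in a handful of iterations.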
Example: Univariate Gaussian
• Estimate of the log evidence can be found by
calculating L(Q):
• where <·> are expectations wrt Q(·)
Example
Take four data samples from a
Gaussian (thick line) to find the
posterior. Dashed lines show distributions
drawn from the variational posterior.
Variational and true posterior from a
Gaussian given four samples. P(µ) =
N(0,1000). P(γ) = Gamma(.001,.001).
VB with Image Segmentation
RGB histogram of two pixel
locations.
“VB at the pixel level will give
better results.”
Feature vector (x,y,Vx,Vy,r,g,b) -
will have issues with data
association.
VB with GMM will be complex –
doing this in real time will be
execrable.
Lower Bound for GMM-Ugly
Variational Equations for GMM-Ugly
Brings Up VMP – Efficient Computation
[Figure: the Object class / Lighting color / Surface color / Image color Bayes net again, as the running example for VMP]

More Related Content

What's hot

Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative ModelsMLReview
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기NAVER Engineering
 
スペクトラルグラフ理論入門
スペクトラルグラフ理論入門スペクトラルグラフ理論入門
スペクトラルグラフ理論入門irrrrr
 
変分ベイズ法の説明
変分ベイズ法の説明変分ベイズ法の説明
変分ベイズ法の説明Haruka Ozaki
 
パターン認識 04 混合正規分布
パターン認識 04 混合正規分布パターン認識 04 混合正規分布
パターン認識 04 混合正規分布sleipnir002
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative ModelsKenta Oono
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것NAVER Engineering
 
Graph Neural Network - Introduction
Graph Neural Network - IntroductionGraph Neural Network - Introduction
Graph Neural Network - IntroductionJungwon Kim
 
An introduction on normalizing flows
An introduction on normalizing flowsAn introduction on normalizing flows
An introduction on normalizing flowsGrigoris C
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Appsilon Data Science
 
Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationJason Anderson
 
Wasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 IWasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 ISungbin Lim
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial NetworksMark Chang
 
深層生成モデルによるメディア生成
深層生成モデルによるメディア生成深層生成モデルによるメディア生成
深層生成モデルによるメディア生成kame_hirokazu
 
Lecture 7: Hidden Markov Models (HMMs)
Lecture 7: Hidden Markov Models (HMMs)Lecture 7: Hidden Markov Models (HMMs)
Lecture 7: Hidden Markov Models (HMMs)Marina Santini
 
PRML上巻勉強会 at 東京大学 資料 第1章後半
PRML上巻勉強会 at 東京大学 資料 第1章後半PRML上巻勉強会 at 東京大学 資料 第1章後半
PRML上巻勉強会 at 東京大学 資料 第1章後半Ohsawa Goodfellow
 
Deep generative model.pdf
Deep generative model.pdfDeep generative model.pdf
Deep generative model.pdfHyungjoo Cho
 
PRML 1.6 情報理論
PRML 1.6 情報理論PRML 1.6 情報理論
PRML 1.6 情報理論sleepy_yoshi
 
PRML Chapter 10
PRML Chapter 10PRML Chapter 10
PRML Chapter 10Sunwoo Kim
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Manohar Mukku
 

What's hot (20)

Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative Models
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
 
スペクトラルグラフ理論入門
スペクトラルグラフ理論入門スペクトラルグラフ理論入門
スペクトラルグラフ理論入門
 
変分ベイズ法の説明
変分ベイズ法の説明変分ベイズ法の説明
変分ベイズ法の説明
 
パターン認識 04 混合正規分布
パターン認識 04 混合正規分布パターン認識 04 混合正規分布
パターン認識 04 混合正規分布
 
VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative Models
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
Graph Neural Network - Introduction
Graph Neural Network - IntroductionGraph Neural Network - Introduction
Graph Neural Network - Introduction
 
An introduction on normalizing flows
An introduction on normalizing flowsAn introduction on normalizing flows
An introduction on normalizing flows
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image Generation
 
Wasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 IWasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 I
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial Networks
 
深層生成モデルによるメディア生成
深層生成モデルによるメディア生成深層生成モデルによるメディア生成
深層生成モデルによるメディア生成
 
Lecture 7: Hidden Markov Models (HMMs)
Lecture 7: Hidden Markov Models (HMMs)Lecture 7: Hidden Markov Models (HMMs)
Lecture 7: Hidden Markov Models (HMMs)
 
PRML上巻勉強会 at 東京大学 資料 第1章後半
PRML上巻勉強会 at 東京大学 資料 第1章後半PRML上巻勉強会 at 東京大学 資料 第1章後半
PRML上巻勉強会 at 東京大学 資料 第1章後半
 
Deep generative model.pdf
Deep generative model.pdfDeep generative model.pdf
Deep generative model.pdf
 
PRML 1.6 情報理論
PRML 1.6 情報理論PRML 1.6 情報理論
PRML 1.6 情報理論
 
PRML Chapter 10
PRML Chapter 10PRML Chapter 10
PRML Chapter 10
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)
 

Similar to Variational Inference

Some Thoughts on Sampling
Some Thoughts on SamplingSome Thoughts on Sampling
Some Thoughts on SamplingDon Sheehy
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi DivergenceMasahiro Suzuki
 
Harmonic Analysis and Deep Learning
Harmonic Analysis and Deep LearningHarmonic Analysis and Deep Learning
Harmonic Analysis and Deep LearningSungbin Lim
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clusteringFrank Nielsen
 
Clustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningClustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningFrank Nielsen
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applicationsFrank Nielsen
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-DivergencesFrank Nielsen
 
On complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographyOn complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographywtyru1989
 
Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space Eliezer Silva
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetSuvrat Mishra
 
Variational inference
Variational inference  Variational inference
Variational inference Natan Katz
 
Linear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsLinear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsAlexander Litvinenko
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetJoachim Gwoke
 
Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math ColloquiumChristian Robert
 

Similar to Variational Inference (20)

QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
 
Some Thoughts on Sampling
Some Thoughts on SamplingSome Thoughts on Sampling
Some Thoughts on Sampling
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence
 
Harmonic Analysis and Deep Learning
Harmonic Analysis and Deep LearningHarmonic Analysis and Deep Learning
Harmonic Analysis and Deep Learning
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clustering
 
Clustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningClustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learning
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applications
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 
On complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographyOn complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptography
 
Athens workshop on MCMC
Athens workshop on MCMCAthens workshop on MCMC
Athens workshop on MCMC
 
Probability Cheatsheet.pdf
Probability Cheatsheet.pdfProbability Cheatsheet.pdf
Probability Cheatsheet.pdf
 
Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Variational inference
Variational inference  Variational inference
Variational inference
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
Linear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsLinear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficients
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math Colloquium
 
LieGroup
LieGroupLieGroup
LieGroup
 
HPWFcorePRES--FUR2016
HPWFcorePRES--FUR2016HPWFcorePRES--FUR2016
HPWFcorePRES--FUR2016
 

More from Tushar Tank

Image Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video EdittingImage Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video EdittingTushar Tank
 
Intuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov ChainsIntuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov ChainsTushar Tank
 
Bayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with ExamplesBayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with ExamplesTushar Tank
 
Review of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series AnalysisReview of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series AnalysisTushar Tank
 
Tech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paperTech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paperTushar Tank
 
Shapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley DiscussionShapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley DiscussionTushar Tank
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical ClusteringTushar Tank
 
Time Frequency Analysis for Poets
Time Frequency Analysis for PoetsTime Frequency Analysis for Poets
Time Frequency Analysis for PoetsTushar Tank
 
Kalman filter upload
Kalman filter uploadKalman filter upload
Kalman filter uploadTushar Tank
 

More from Tushar Tank (10)

Image Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video EdittingImage Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video Editting
 
Intuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov ChainsIntuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov Chains
 
Bayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with ExamplesBayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with Examples
 
Review of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series AnalysisReview of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series Analysis
 
Tech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paperTech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paper
 
Shapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley DiscussionShapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley Discussion
 
Hindu ABC Book
Hindu ABC BookHindu ABC Book
Hindu ABC Book
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical Clustering
 
Time Frequency Analysis for Poets
Time Frequency Analysis for PoetsTime Frequency Analysis for Poets
Time Frequency Analysis for Poets
 
Kalman filter upload
Kalman filter uploadKalman filter upload
Kalman filter upload
 

Recently uploaded

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

Variational Inference

  • 1. Variational Inference Note: Much (meaning almost all) of this has been liberated from John Winn and Matthew Beal’s theses, and David McKay’s book.
  • 2. Overview • Probabilistic models & Bayesian inference • Variational Inference • Univariate Gaussian Example • GMM Example • Variational Message Passing
  • 3. Bayesian networks • Directed graph • Nodes represent variables • Links show dependencies • Conditional distribution at each node • Defines a joint distribution: . P(C,L,S,I)=P(L) P(C) P(S|C) P(I|L,S) Lighting color Surface color Image color Object class C SL I P(L) P(C) P(S|C) P(I|L,S)
  • 4. Lighting color Hidden Bayesian inference Observed • Observed variables D and hidden variables H. • Hidden variables include parameters and latent variables. • Learning/inference involves finding: • P(H1, H2…| D), or • P(H,Θ|D,M) - explicitly for generative model. Surface color Image color C SL I Object class
  • 5. Bayesian inference vs. ML/MAP • Consider learning one parameter θ )( )()|( )|( DP PDP DP θθ θ = • How should we represent this posterior distribution? )()|( θθ PDP∝
  • 6. Bayesian inference vs. ML/MAP θMAP θ Maximum of P(V| θ) P(θ) • Consider learning one parameter θ P(D| θ) P(θ)
  • 7. Bayesian inference vs. ML/MAP P(D| θ) P(θ) θMAP θ High probability mass High probability density • Consider learning one parameter θ
  • 8. Bayesian inference vs. ML/MAP θML θ Samples • Consider learning one parameter θ P(D| θ) P(θ)
  • 9. Bayesian inference vs. ML/MAP θML θ Variational approximation )(θQ • Consider learning one parameter θ P(D| θ) P(θ)
  • 10. Variational Inference 1. Choose a family of variational distributions Q(H). 2. Use Kullback-Leibler divergence KL(Q||P) as a measure of ‘distance’ between P(H|D) and Q(H). 3. Find Q which minimizes divergence. (in three easy steps…)
  • 11. Choose Variational Distribution • P(H|D) ~ Q(H). • If P is so complex how do we choose Q? • Any Q is better than an ML or MAP point estimate. • Choose Q so it can “get” close to P and is tractable – factorize, conjugate.
  • 12. Kullback-Leibler Divergence • Derived from Variational Free Energy by Feynman and Bobliubov • Relative Entropy between two probability distributions • KL(Q||P) > 0 , for any Q (Jensen’s inequality) • KL(Q||P) = 0 iff P = Q. • Not true distance measure, not symmetric ∑= X xP xQ xQPQKL )( )( ln)()||(
  • 13. Kullback-Leibler Divergence Minimising KL(Q||P) P Q Q Exclusive ∑= H DHP HQ HQ )|( )( ln)( Minimising KL(P||Q) P ∑= H HQ DHP DHP )( )|( ln)|( Inclusive
  • 14. Kullback-Leibler Divergence ∑= H DHP HQ HQPQKL )|( )( ln)()||( ∑ ∑+= H H DPHQ DHP HQ HQPQKL )(ln)( ),( )( ln)()||( ∑= H DHP DPHQ HQPQKL ),( )()( ln)()||( ∑= H HQ DHP DHPQPK )( )|( ln)|()||( ∑ += H DP DHP HQ HQPQKL )(ln ),( )( ln)()||( ∑= H DHP HQ HQPQKL )|( )( ln)()||( Bayes Rules Log property Sum over H
  • 15. Kullback-Leibler Divergence • DEFINE L(Q) ≡ Σ_H Q(H) ln P(H,D) − Σ_H Q(H) ln Q(H) • L is the sum of the expectation of the log joint under Q and the entropy of Q • Rearranging gives KL(Q||P) = ln P(D) − L(Q), so maximising L(Q) is equivalent to minimising the KL divergence • We could not do the same trick for KL(P||Q); instead we approximate with a Q that puts its mass where P is most probable (exclusive)
  • 16. Summary • For arbitrary Q(H): ln P(D) = L(Q) + KL(Q||P), where L(Q) = Σ_H Q(H) ln [P(H,D) / Q(H)] • ln P(D) is fixed, so maximising L(Q) minimises KL(Q||P) • Still difficult in general: we choose a family of Q distributions for which L(Q) is tractable to compute
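The decomposition ln P(D) = L(Q) + KL(Q||P) can be verified numerically on a tiny discrete model; the joint-table values and Q below are arbitrary illustrative numbers.

```python
import math

# Toy model: hidden H in {0,1,2}, observed D fixed.
joint = [0.05, 0.15, 0.10]            # P(H=h, D) for h = 0, 1, 2
evidence = sum(joint)                  # P(D)
posterior = [j / evidence for j in joint]

q = [0.2, 0.5, 0.3]                    # any valid Q(H)

L = sum(qi * math.log(ji / qi) for qi, ji in zip(q, joint))       # lower bound L(Q)
kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, posterior))  # KL(Q || P(H|D))

assert abs(math.log(evidence) - (L + kl)) < 1e-12  # decomposition holds exactly
assert L <= math.log(evidence)                     # L(Q) is a lower bound on ln P(D)
```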
  • 17. Minimising the KL divergence • [Diagram: ln P(D) is fixed and decomposes into L(Q) plus KL(Q||P); as L(Q) is maximised, KL(Q||P) shrinks.]
  • 22. Factorised Approximation • Assume Q factorises: Q(H) = Π_i Q_i(H_i) • The optimal solution for one factor is given by ln Q_j*(H_j) = <ln P(H,D)>_{i≠j} + const., i.e. Q_j*(H_j) = (1/Z_j) exp <ln P(H,D)>_{i≠j} • Given the form of Q, find the best H in the KL sense • Choose conjugate priors P(H) to give the form of Q • Update each Q_i(H_i) iteratively
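The coordinate updates Q_j*(H_j) ∝ exp <ln P(H,D)>_{i≠j} can be run by hand on a toy joint over two binary hidden variables (the table values are illustrative, not from the slides):

```python
import math

# P(h1, h2, D) for h1, h2 in {0, 1}, with D fixed
P = [[0.10, 0.05],
     [0.20, 0.65]]

def normalise(u):
    s = sum(u)
    return [v / s for v in u]

q1, q2 = [0.5, 0.5], [0.5, 0.5]        # factorised Q(h1) Q(h2)
for _ in range(100):
    # Q1*(h1) proportional to exp( sum_h2 Q2(h2) ln P(h1, h2, D) )
    q1 = normalise([math.exp(sum(q2[j] * math.log(P[i][j]) for j in range(2)))
                    for i in range(2)])
    # Q2*(h2) proportional to exp( sum_h1 Q1(h1) ln P(h1, h2, D) )
    q2 = normalise([math.exp(sum(q1[i] * math.log(P[i][j]) for i in range(2)))
                    for j in range(2)])

# The exclusive KL makes Q concentrate on the dominant configuration (1, 1)
assert q1[1] > 0.5 and q2[1] > 0.5
```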
  • 23. Derivation • Idea: use the factoring of Q to isolate Q_j and maximise L with respect to Q_j • L(Q) = Σ_H Q(H) ln P(H,D) − Σ_H Q(H) ln Q(H) = Σ_H Π_i Q_i(H_i) ln P(H,D) − Σ_H Π_i Q_i(H_i) Σ_j ln Q_j(H_j) (substitution, log property) • Keeping only the terms that depend on Q_j: L(Q) = Σ_{H_j} Q_j(H_j) <ln P(H,D)>_{i≠j} − Σ_{H_j} Q_j(H_j) ln Q_j(H_j) + const. • With Q_j*(H_j) = (1/Z) exp <ln P(H,D)>_{i≠j} this is −KL(Q_j || Q_j*) + ln Z + const., maximised when Q_j = Q_j*
  • 24. Example: Univariate Gaussian • Normal distribution with unknown mean µ and precision γ • Find P(µ, γ | x) • Conjugate priors: Gaussian on µ, Gamma on γ • Factorised variational distribution Q(µ, γ) = Q(µ) Q(γ) • Each Q factor takes the same form as the corresponding prior • Inference involves updating these hidden (variational) parameters
  • 25. Example: Univariate Gaussian • Use the optimal-factor result Q* to derive the updates for Q(µ) and Q(γ) • <·> denotes the expectation over Q • Solve iteratively: each factor’s update uses expectations under the other factor
  • 26. Example: Univariate Gaussian • An estimate of the log evidence can be found by calculating L(Q) • <·> denotes an expectation with respect to Q(·)
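The iterative scheme on these slides can be sketched for the standard conjugate setup p(µ|γ) = N(µ0, (λ0 γ)^-1), p(γ) = Gamma(a0, b0); the data and hyperparameter values below are illustrative, not the slides' exact N(0, 1000) / Gamma(0.001, 0.001) experiment.

```python
import math

# Factorised posterior: Q(mu) = N(muN, 1/lamN), Q(gamma) = Gamma(aN, bN)
x = [1.2, 0.8, 1.5, 0.9]                       # four data samples
N, xbar = len(x), sum(x) / len(x)
mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3      # broad priors

muN = (lam0 * mu0 + N * xbar) / (lam0 + N)     # fixed by conjugacy
aN = a0 + (N + 1) / 2.0
lamN = 1.0                                     # initial guess for Q(mu) precision
for _ in range(100):
    # Update Q(gamma): bN needs expectations over the current Q(mu)
    e_sq = sum((xi - muN) ** 2 + 1.0 / lamN for xi in x)   # E[sum_i (x_i - mu)^2]
    bN = b0 + 0.5 * (e_sq + lam0 * ((muN - mu0) ** 2 + 1.0 / lamN))
    # Update Q(mu): its precision uses E[gamma] = aN / bN
    lamN = (lam0 + N) * aN / bN

print(muN, aN / bN)    # posterior mean of mu and posterior mean precision
```

With such broad priors, muN sits essentially at the sample mean and E[γ] = aN/bN lands near the inverse sample variance, as the true posterior would.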
  • 27. Example • Four data samples are taken from a Gaussian to find the posterior • Thick line: true posterior; dashed lines: variational approximation • Priors: P(µ) = N(0, 1000), P(γ) = Gamma(0.001, 0.001)
  • 28. VB with Image Segmentation • [Figure: RGB histograms at two pixel locations.] • “VB at the pixel level will give better results.” • A feature vector (x, y, Vx, Vy, r, g, b) will have issues with data association • VB with a GMM will be complex; doing this in real time will be execrable
  • 29. Lower Bound for GMM: Ugly
  • 31. Brings Up VMP: Efficient Computation • [Bayesian network: Object class C; Lighting color L with P(L); Surface color S with P(S|C); Image color I with P(I|L,S).]

Editor's Notes

  1. Illustration: ML vs. Bayesian; for Bayesian methods, mention sampling. WRITE CONCLUSION SLIDE!! Maximum likelihood/MAP: finds point estimates of hidden variables; vulnerable to over-fitting. Variational inference: finds posterior distributions over hidden variables; allows direct model comparison.
  6. Guaranteed to increase the lower bound, unless already at a maximum.