Variational Inference
Note: Much (meaning almost all) of this has been
liberated from John Winn’s and Matthew Beal’s theses,
and David MacKay’s book.
Overview
• Probabilistic models & Bayesian
inference
• Variational Inference
• Univariate Gaussian Example
• GMM Example
• Variational Message Passing
Bayesian networks
• Directed graph
• Nodes represent
variables
• Links show dependencies
• Conditional distribution at
each node
• Defines a joint
distribution:
P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S)
[Figure: directed graph with nodes L (lighting color), C (object class), S (surface color), I (image color) and conditionals P(L), P(C), P(S|C), P(I|L,S)]
Bayesian inference
• Observed variables D and hidden variables H.
• Hidden variables include parameters and latent
variables.
• Learning/inference involves finding:
• P(H1, H2…| D), or
• P(H,Θ|D,M) - explicitly for generative model.
[Figure: the same network, with image color I observed and lighting color L, object class C, and surface color S hidden]
Bayesian inference vs. ML/MAP
• Consider learning one parameter θ:
P(θ|D) = P(D|θ) P(θ) / P(D) ∝ P(D|θ) P(θ)
• How should we represent this posterior distribution?
Bayesian inference vs. ML/MAP
• Consider learning one parameter θ
[Figure: curve of P(D|θ) P(θ) against θ; θMAP marks its maximum]
Bayesian inference vs. ML/MAP
• Consider learning one parameter θ
[Figure: same curve; θMAP sits at a narrow spike of high probability density, while most of the probability mass lies elsewhere]
Bayesian inference vs. ML/MAP
• Consider learning one parameter θ
[Figure: representing P(D|θ) P(θ) by samples drawn near θML]
Bayesian inference vs. ML/MAP
• Consider learning one parameter θ
[Figure: a variational approximation Q(θ) fitted to P(D|θ) P(θ), near θML]
Variational Inference
1. Choose a family of variational
distributions Q(H).
2. Use Kullback-Leibler divergence
KL(Q||P) as a measure of ‘distance’
between P(H|D) and Q(H).
3. Find Q which minimizes divergence.
(in three easy steps…)
Choose Variational Distribution
• P(H|D) ≈ Q(H).
• If P is so complex, how do we choose Q?
• Any Q is better than an ML or MAP point
estimate.
• Choose Q so it can “get” close to P and is
tractable – factorise, conjugate.
Kullback-Leibler Divergence
• Derived from the variational free energy of Feynman and
Bogoliubov
• Relative entropy between two probability distributions
• KL(Q||P) ≥ 0 for any Q (Jensen’s inequality)
• KL(Q||P) = 0 iff P = Q
• Not a true distance measure: not symmetric
KL(Q||P) = Σ_x Q(x) ln [ Q(x) / P(x) ]
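These properties can be checked numerically; a minimal sketch (the distributions are illustrative values, not from the slides):

```python
import numpy as np

def kl(q, p):
    """KL(Q||P) = sum_x Q(x) ln(Q(x)/P(x)) for discrete distributions."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    mask = q > 0          # terms with Q(x) = 0 contribute 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

q = [0.5, 0.3, 0.2]
p = [0.4, 0.4, 0.2]
print(kl(q, p))   # positive
print(kl(p, q))   # a different value: KL is not symmetric
print(kl(q, q))   # 0.0: KL(Q||P) = 0 iff P = Q
```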
Kullback-Leibler Divergence
Minimising KL(Q||P) is exclusive: Q locks onto a single mode of P.
KL(Q||P) = Σ_H Q(H) ln [ Q(H) / P(H|D) ]
Minimising KL(P||Q) is inclusive: Q spreads to cover all of P’s mass.
KL(P||Q) = Σ_H P(H|D) ln [ P(H|D) / Q(H) ]
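The exclusive/inclusive distinction can be seen numerically. As a sketch (my own illustration, not from the slides), fit a single Gaussian Q to a bimodal P on a grid, minimising each divergence by brute-force search:

```python
import numpy as np

# Bimodal target P on a grid: an equal mixture of N(-3, 1) and N(3, 1).
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def normal(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

p = 0.5 * normal(x, -3.0, 1.0) + 0.5 * normal(x, 3.0, 1.0)
p /= p.sum() * dx

def kl(a, b):
    """Discretised KL(a||b) on the grid."""
    b = np.maximum(b, 1e-300)            # guard against log(0)
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])) * dx)

# Brute-force search over single Gaussians Q(mean, std) for each divergence.
best_ex = best_in = (np.inf, 0.0, 1.0)
for m in np.linspace(-5, 5, 101):
    for s in np.linspace(0.5, 6.0, 56):
        q = normal(x, m, s)
        q /= q.sum() * dx
        best_ex = min(best_ex, (kl(q, p), m, s))   # exclusive KL(Q||P)
        best_in = min(best_in, (kl(p, q), m, s))   # inclusive KL(P||Q)

print("exclusive fit (mean, std):", best_ex[1:])   # hugs a single mode
print("inclusive fit (mean, std):", best_in[1:])   # spreads over both modes
```

The exclusive fit ends up narrow, centred on one of the two modes; the inclusive fit ends up wide, straddling both.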
Kullback-Leibler Divergence
KL(Q||P) = Σ_H Q(H) ln [ Q(H) / P(H|D) ]
= Σ_H Q(H) ln [ Q(H) P(D) / P(H,D) ]   (Bayes rule: P(H|D) = P(H,D) / P(D))
= Σ_H Q(H) ln [ Q(H) / P(H,D) ] + Σ_H Q(H) ln P(D)   (log property)
= Σ_H Q(H) ln [ Q(H) / P(H,D) ] + ln P(D)   (sum over H: Σ_H Q(H) = 1)
Kullback-Leibler Divergence
DEFINE   L(Q) ≡ Σ_H Q(H) ln P(H,D) − Σ_H Q(H) ln Q(H)
• L is the difference between the expectation of the log joint
ln P(H,D) with respect to Q, and the entropy of Q
• Maximising L(Q) is equivalent to minimising the KL divergence:
KL(Q||P) = Σ_H Q(H) ln [ Q(H) / P(H,D) ] + ln P(D)
KL(Q||P) = ln P(D) − L(Q)
• We could not do the same trick for KL(P||Q); thus we
approximate the likelihood with a function that has its mass where
the likelihood is most probable (exclusive).
Summarize
ln P(D) = L(Q) + KL(Q||P)
where
L(Q) = Σ_H Q(H) ln [ P(H,D) / Q(H) ]
• Holds for arbitrary Q(H)
• ln P(D) is fixed, so maximising L(Q) minimises KL(Q||P)
• KL(Q||P) is still difficult in general to calculate
• We choose a family of Q distributions
where L(Q) is tractable to compute.
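The decomposition ln P(D) = L(Q) + KL(Q||P) holds for any normalised Q; a minimal sketch verifying it on a discrete toy model (the joint values are illustrative):

```python
import numpy as np

# Three hidden states; D is fixed/observed, so P(H, D) is just a vector over H.
p_joint = np.array([0.10, 0.25, 0.05])   # P(H = h, D)  (illustrative values)
p_D = p_joint.sum()                      # marginal likelihood P(D)
p_post = p_joint / p_D                   # posterior P(H|D)

q = np.array([0.6, 0.3, 0.1])            # an arbitrary normalised Q(H)

L  = float(np.sum(q * np.log(p_joint / q)))   # L(Q) = sum_H Q ln[P(H,D)/Q]
KL = float(np.sum(q * np.log(q / p_post)))    # KL(Q || P(H|D))

print(L + KL, "=", np.log(p_D))   # ln P(D) = L(Q) + KL(Q||P), for any such Q
```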
Minimising the KL divergence
[Figure, repeated across several build slides: the bar ln P(D) is fixed and splits into L(Q) + KL(Q||P); as L(Q) is maximised, KL(Q||P) shrinks]
Factorised Approximation
• Assume Q factorises:
Q(H) = Π_i Q_i(H_i)
• Optimal solution for one factor given by:
ln Q_j*(H_j) = < ln P(H,D) >_{i≠j} + const.
i.e.  Q_j*(H_j) = (1/Z) exp( Σ_{H_{i≠j}} Π_{i≠j} Q_i(H_i) ln P(H,D) )
• Given the form of Q, find the best H in the KL sense
• Choose conjugate priors P(H) to give the form of Q
• Update each Q_i(H_i) iteratively
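The fixed-point update can be exercised on a toy model. A sketch (the joint table and variable names are made up for illustration) with two binary hidden variables, where each factor update is Q_j* ∝ exp(<ln P(H,D)> over the other factor):

```python
import numpy as np

# Toy model: two binary hidden variables H1, H2 with D fixed;
# the joint P(H1, H2, D) is given as a table (values are illustrative).
logP = np.log(np.array([[0.30, 0.05],
                        [0.10, 0.25]]))   # rows: H1, columns: H2

q1 = np.array([0.5, 0.5])                 # initial factors Q1, Q2
q2 = np.array([0.5, 0.5])

def lower_bound(q1, q2):
    Q = np.outer(q1, q2)                  # factorised Q(H) = Q1(H1) Q2(H2)
    return float(np.sum(Q * logP) - np.sum(Q * np.log(Q)))

for _ in range(50):
    q1 = np.exp(logP @ q2); q1 /= q1.sum()    # Q1* ∝ exp(<ln P>_Q2)
    q2 = np.exp(logP.T @ q1); q2 /= q2.sum()  # Q2* ∝ exp(<ln P>_Q1)

lb = lower_bound(q1, q2)
ln_pD = float(np.log(np.exp(logP).sum()))     # exact ln P(D) for comparison
print(lb, "<=", ln_pD)                        # L(Q) never exceeds ln P(D)
```

Each coordinate update can only increase L(Q), and the converged bound sits below the exact ln P(D) by exactly the residual KL(Q||P).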
Derivation
Substitute the factorisation into L(Q) and isolate one factor Q_j:
L(Q) ≡ Σ_H Q(H) ln P(H,D) − Σ_H Q(H) ln Q(H)
= Σ_H Π_i Q_i(H_i) ln P(H,D) − Σ_H Π_i Q_i(H_i) ln Π_j Q_j(H_j)   (substitution)
= Σ_H Π_i Q_i(H_i) ln P(H,D) − Σ_H Π_i Q_i(H_i) Σ_j ln Q_j(H_j)   (log property)
= Σ_H Π_i Q_i(H_i) ln P(H,D) − Σ_i Σ_{H_i} Q_i(H_i) ln Q_i(H_i)
= Σ_H Q_j(H_j) Π_{i≠j} Q_i(H_i) ln P(H,D) − Σ_{H_j} Q_j(H_j) ln Q_j(H_j) − Σ_{i≠j} Σ_{H_i} Q_i(H_i) ln Q_i(H_i)   (factor out one term Q_j; the last sum is not a function of Q_j)
Idea: use the factoring of Q to isolate Q_j and maximise L with respect to Q_j. Writing Q_j* for the normalised exponentiated expectation, with normaliser Z,
L = −KL(Q_j || Q_j*) + log Z + const.
so L is maximised with respect to Q_j exactly when Q_j = Q_j*.
−−=
Example: Univariate Gaussian
• Normal distribution
• Find P(µ,γ | x)
• Conjugate prior
• Factorized variational
distribution
• Q distribution same form as
prior distributions
• Inference involves updating
these hidden parameters
Example: Univariate Gaussian
• Use Q* to derive the update equations
• Where <·> denotes the expectation with respect to Q
• Solve iteratively
Example: Univariate Gaussian
• An estimate of the log evidence can be found by
calculating L(Q):
• Where <·> denotes the expectation with respect to Q(·)
Example
Take four data samples from a
Gaussian (thick line) to find the
posterior. Dashed lines: distributions
sampled from the variational posterior.
Variational and true posterior from a
Gaussian given four samples. P(µ) =
N(0,1000). P(γ) = Gamma(.001,.001).
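The update equations on these slides survive only as images in this scrape. As a hedged sketch, the standard mean-field updates for this model (independent Normal prior on µ and Gamma prior on the precision γ, using the slide’s priors; the data here is synthetic) look like:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=4)       # four samples from a Gaussian
N, sx = len(x), x.sum()

# Priors from the slide: P(mu) = N(0, 1000), P(gamma) = Gamma(0.001, 0.001)
mu0, v0 = 0.0, 1000.0                  # prior mean / variance for mu
a0, b0 = 1e-3, 1e-3                    # Gamma shape / rate for precision gamma

E_gamma = 1.0                          # initial guess for <gamma>
for _ in range(100):
    # q(mu) = N(m, v): prior combined with N observations at precision <gamma>
    v = 1.0 / (1.0 / v0 + N * E_gamma)
    m = v * (mu0 / v0 + E_gamma * sx)
    # q(gamma) = Gamma(a, b), using <sum_i (x_i - mu)^2> = sum_i (x_i - m)^2 + N v
    a = a0 + 0.5 * N
    b = b0 + 0.5 * (np.sum((x - m) ** 2) + N * v)
    E_gamma = a / b

print("E[mu]    =", m)                 # close to the sample mean (weak prior)
print("E[gamma] =", E_gamma)           # roughly 1 / sample variance
```

The two factors are coupled through the expectations <γ> and <Σ(x_i − µ)²>, which is why the updates are iterated until they stop changing.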
VB with Image Segmentation
[Figure: video frame and RGB histograms at two pixel locations]
“VB at the pixel level will give
better results.”
Feature vector (x,y,Vx,Vy,r,g,b) -
will have issues with data
association.
VB with GMM will be complex –
doing this in real time will be
execrable.
Lower Bound for GMM-Ugly
Variational Equations for GMM-Ugly
Brings Up VMP – Efficient Computation
[Figure: the Bayesian network from earlier, with nodes L (lighting color), C (object class), S (surface color), I (image color) and conditionals P(L), P(C), P(S|C), P(I|L,S)]

More Related Content

What's hot

PR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsPR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsHyeongmin Lee
 
有向グラフに対する 非線形ラプラシアンと ネットワーク解析
有向グラフに対する 非線形ラプラシアンと ネットワーク解析有向グラフに対する 非線形ラプラシアンと ネットワーク解析
有向グラフに対する 非線形ラプラシアンと ネットワーク解析Yuichi Yoshida
 
Neural Processes Family
Neural Processes FamilyNeural Processes Family
Neural Processes FamilyKota Matsui
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and ApplicationsEmanuele Ghelfi
 
Independent Component Analysis
Independent Component AnalysisIndependent Component Analysis
Independent Component AnalysisTatsuya Yokota
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
パンハウスゼミ 異常検知論文紹介 20191005
パンハウスゼミ 異常検知論文紹介  20191005パンハウスゼミ 異常検知論文紹介  20191005
パンハウスゼミ 異常検知論文紹介 20191005ぱんいち すみもと
 
Bayesian Neural Networks
Bayesian Neural NetworksBayesian Neural Networks
Bayesian Neural NetworksNatan Katz
 
シンギュラリティを知らずに機械学習を語るな
シンギュラリティを知らずに機械学習を語るなシンギュラリティを知らずに機械学習を語るな
シンギュラリティを知らずに機械学習を語るなhoxo_m
 
PRML Reading Chapter 11 - Sampling Method
PRML Reading Chapter 11 - Sampling MethodPRML Reading Chapter 11 - Sampling Method
PRML Reading Chapter 11 - Sampling MethodHa Phuong
 
An Introduction to Optimal Transport
An Introduction to Optimal TransportAn Introduction to Optimal Transport
An Introduction to Optimal TransportGabriel Peyré
 
ニューラルチューリングマシン入門
ニューラルチューリングマシン入門ニューラルチューリングマシン入門
ニューラルチューリングマシン入門naoto moriyama
 
Neural Processes
Neural ProcessesNeural Processes
Neural ProcessesSangwoo Mo
 
[DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima...
 [DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima... [DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima...
[DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima...Deep Learning JP
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSungjoon Choi
 
Uncertainty Estimation in Deep Learning
Uncertainty Estimation in Deep LearningUncertainty Estimation in Deep Learning
Uncertainty Estimation in Deep LearningChristian Perone
 
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...yukihiro domae
 
Style space analysis paper review !
Style space analysis paper review !Style space analysis paper review !
Style space analysis paper review !taeseon ryu
 

What's hot (20)

PR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsPR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic Models
 
Introduction of VAE
Introduction of VAEIntroduction of VAE
Introduction of VAE
 
有向グラフに対する 非線形ラプラシアンと ネットワーク解析
有向グラフに対する 非線形ラプラシアンと ネットワーク解析有向グラフに対する 非線形ラプラシアンと ネットワーク解析
有向グラフに対する 非線形ラプラシアンと ネットワーク解析
 
Neural Processes Family
Neural Processes FamilyNeural Processes Family
Neural Processes Family
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
 
Independent Component Analysis
Independent Component AnalysisIndependent Component Analysis
Independent Component Analysis
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
 
Chainerで流体計算
Chainerで流体計算Chainerで流体計算
Chainerで流体計算
 
パンハウスゼミ 異常検知論文紹介 20191005
パンハウスゼミ 異常検知論文紹介  20191005パンハウスゼミ 異常検知論文紹介  20191005
パンハウスゼミ 異常検知論文紹介 20191005
 
Bayesian Neural Networks
Bayesian Neural NetworksBayesian Neural Networks
Bayesian Neural Networks
 
シンギュラリティを知らずに機械学習を語るな
シンギュラリティを知らずに機械学習を語るなシンギュラリティを知らずに機械学習を語るな
シンギュラリティを知らずに機械学習を語るな
 
PRML Reading Chapter 11 - Sampling Method
PRML Reading Chapter 11 - Sampling MethodPRML Reading Chapter 11 - Sampling Method
PRML Reading Chapter 11 - Sampling Method
 
An Introduction to Optimal Transport
An Introduction to Optimal TransportAn Introduction to Optimal Transport
An Introduction to Optimal Transport
 
ニューラルチューリングマシン入門
ニューラルチューリングマシン入門ニューラルチューリングマシン入門
ニューラルチューリングマシン入門
 
Neural Processes
Neural ProcessesNeural Processes
Neural Processes
 
[DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima...
 [DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima... [DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima...
[DL輪読会]StarGAN: Unified Generative Adversarial Networks for Multi-Domain Ima...
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep Learning
 
Uncertainty Estimation in Deep Learning
Uncertainty Estimation in Deep LearningUncertainty Estimation in Deep Learning
Uncertainty Estimation in Deep Learning
 
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...
 
Style space analysis paper review !
Style space analysis paper review !Style space analysis paper review !
Style space analysis paper review !
 

Similar to Variational Inference

Some Thoughts on Sampling
Some Thoughts on SamplingSome Thoughts on Sampling
Some Thoughts on SamplingDon Sheehy
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi DivergenceMasahiro Suzuki
 
Harmonic Analysis and Deep Learning
Harmonic Analysis and Deep LearningHarmonic Analysis and Deep Learning
Harmonic Analysis and Deep LearningSungbin Lim
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clusteringFrank Nielsen
 
Clustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningClustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningFrank Nielsen
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applicationsFrank Nielsen
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-DivergencesFrank Nielsen
 
On complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographyOn complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographywtyru1989
 
Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space Eliezer Silva
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetSuvrat Mishra
 
Variational inference
Variational inference  Variational inference
Variational inference Natan Katz
 
Linear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsLinear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsAlexander Litvinenko
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetJoachim Gwoke
 
Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math ColloquiumChristian Robert
 

Similar to Variational Inference (20)

QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
 
Some Thoughts on Sampling
Some Thoughts on SamplingSome Thoughts on Sampling
Some Thoughts on Sampling
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence
 
Harmonic Analysis and Deep Learning
Harmonic Analysis and Deep LearningHarmonic Analysis and Deep Learning
Harmonic Analysis and Deep Learning
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clustering
 
Clustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningClustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learning
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applications
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 
On complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographyOn complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptography
 
Athens workshop on MCMC
Athens workshop on MCMCAthens workshop on MCMC
Athens workshop on MCMC
 
Probability Cheatsheet.pdf
Probability Cheatsheet.pdfProbability Cheatsheet.pdf
Probability Cheatsheet.pdf
 
Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Variational inference
Variational inference  Variational inference
Variational inference
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
Linear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsLinear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficients
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math Colloquium
 
LieGroup
LieGroupLieGroup
LieGroup
 
HPWFcorePRES--FUR2016
HPWFcorePRES--FUR2016HPWFcorePRES--FUR2016
HPWFcorePRES--FUR2016
 

More from Tushar Tank

Image Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video EdittingImage Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video EdittingTushar Tank
 
Intuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov ChainsIntuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov ChainsTushar Tank
 
Bayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with ExamplesBayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with ExamplesTushar Tank
 
Review of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series AnalysisReview of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series AnalysisTushar Tank
 
Tech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paperTech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paperTushar Tank
 
Shapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley DiscussionShapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley DiscussionTushar Tank
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical ClusteringTushar Tank
 
Time Frequency Analysis for Poets
Time Frequency Analysis for PoetsTime Frequency Analysis for Poets
Time Frequency Analysis for PoetsTushar Tank
 
Kalman filter upload
Kalman filter uploadKalman filter upload
Kalman filter uploadTushar Tank
 

More from Tushar Tank (10)

Image Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video EdittingImage Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video Editting
 
Intuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov ChainsIntuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov Chains
 
Bayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with ExamplesBayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with Examples
 
Review of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series AnalysisReview of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series Analysis
 
Tech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paperTech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paper
 
Shapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley DiscussionShapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley Discussion
 
Hindu ABC Book
Hindu ABC BookHindu ABC Book
Hindu ABC Book
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical Clustering
 
Time Frequency Analysis for Poets
Time Frequency Analysis for PoetsTime Frequency Analysis for Poets
Time Frequency Analysis for Poets
 
Kalman filter upload
Kalman filter uploadKalman filter upload
Kalman filter upload
 

Recently uploaded

Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Hiroshi SHIBATA
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxFIDO Alliance
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfFIDO Alliance
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewDianaGray10
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Skynet Technologies
 

Recently uploaded (20)

Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 

Variational Inference

  • 1. Variational Inference Note: Much (meaning almost all) of this has been liberated from John Winn and Matthew Beal’s theses, and David McKay’s book.
  • 2. Overview • Probabilistic models & Bayesian inference • Variational Inference • Univariate Gaussian Example • GMM Example • Variational Message Passing
  • 3. Bayesian networks • Directed graph • Nodes represent variables • Links show dependencies • Conditional distribution at each node • Defines a joint distribution: . P(C,L,S,I)=P(L) P(C) P(S|C) P(I|L,S) Lighting color Surface color Image color Object class C SL I P(L) P(C) P(S|C) P(I|L,S)
  • 4. Lighting color Hidden Bayesian inference Observed • Observed variables D and hidden variables H. • Hidden variables include parameters and latent variables. • Learning/inference involves finding: • P(H1, H2…| D), or • P(H,Θ|D,M) - explicitly for generative model. Surface color Image color C SL I Object class
  • 5. Bayesian inference vs. ML/MAP • Consider learning one parameter θ: P(θ|D) = P(D|θ) P(θ) / P(D) ∝ P(D|θ) P(θ) • How should we represent this posterior distribution?
  • 6. Bayesian inference vs. ML/MAP θMAP θ Maximum of P(D|θ) P(θ) • Consider learning one parameter θ P(D|θ) P(θ)
  • 7. Bayesian inference vs. ML/MAP P(D| θ) P(θ) θMAP θ High probability mass High probability density • Consider learning one parameter θ
  • 8. Bayesian inference vs. ML/MAP θML θ Samples • Consider learning one parameter θ P(D| θ) P(θ)
  • 9. Bayesian inference vs. ML/MAP θML θ Variational approximation )(θQ • Consider learning one parameter θ P(D| θ) P(θ)
  • 10. Variational Inference 1. Choose a family of variational distributions Q(H). 2. Use Kullback-Leibler divergence KL(Q||P) as a measure of ‘distance’ between P(H|D) and Q(H). 3. Find Q which minimizes divergence. (in three easy steps…)
  • 11. Choose Variational Distribution • P(H|D) ~ Q(H). • If P is so complex how do we choose Q? • Any Q is better than an ML or MAP point estimate. • Choose Q so it can “get” close to P and is tractable – factorize, conjugate.
  • 12. Kullback-Leibler Divergence • Related to the variational free energy of Feynman and Bogoliubov • Relative entropy between two probability distributions • KL(Q||P) ≥ 0 for any Q (Jensen’s inequality) • KL(Q||P) = 0 iff P = Q • Not a true distance measure – not symmetric • KL(Q||P) = Σ_x Q(x) ln [Q(x) / P(x)]
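As a quick numeric check of the definition above, here is a minimal Python sketch that evaluates the discrete KL sum in both directions; the distributions q and p are made-up examples, not taken from the slides:

```python
import math

def kl(q, p):
    # KL(Q||P) = sum_x Q(x) * ln(Q(x) / P(x)) over a discrete support
    return sum(qx * math.log(qx / px) for qx, px in zip(q, p) if qx > 0)

# Two made-up discrete distributions over three outcomes
q = [0.5, 0.3, 0.2]
p = [0.6, 0.2, 0.2]

print(kl(q, p))  # non-negative, by Jensen's inequality
print(kl(p, q))  # differs from kl(q, p): KL is not symmetric
print(kl(q, q))  # zero iff the two distributions are equal
```

Swapping the arguments changes the value, which is why the exclusive/inclusive distinction on the next slide matters.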
  • 13. Kullback-Leibler Divergence • Minimising KL(Q||P) = Σ_H Q(H) ln [Q(H) / P(H|D)] – exclusive • Minimising KL(P||Q) = Σ_H P(H|D) ln [P(H|D) / Q(H)] – inclusive
  • 14. Kullback-Leibler Divergence • KL(Q||P) = Σ_H Q(H) ln [Q(H) / P(H|D)] • Bayes’ rule: = Σ_H Q(H) ln [Q(H) P(D) / P(H,D)] • Log property: = Σ_H Q(H) ln [Q(H) / P(H,D)] + Σ_H Q(H) ln P(D) • Sum over H (Σ_H Q(H) = 1): = Σ_H Q(H) ln [Q(H) / P(H,D)] + ln P(D)
  • 15. Kullback-Leibler Divergence • DEFINE L(Q) ≡ Σ_H Q(H) ln P(H,D) − Σ_H Q(H) ln Q(H) • L is the difference between the expectation of the log joint ln P(H,D) with respect to Q and the negative entropy of Q • Combining with the previous slide: KL(Q||P) = ln P(D) − L(Q) • Maximising L(Q) is equivalent to minimising the KL divergence • We could not do the same trick for KL(P||Q); thus we approximate the posterior with a function that has its mass where the posterior is most probable (exclusive)
  • 16. Summary • For arbitrary Q(H): ln P(D) = L(Q) + KL(Q||P), where L(Q) = Σ_H Q(H) ln [P(H,D) / Q(H)] • ln P(D) is fixed, so minimising KL(Q||P) is equivalent to maximising L(Q) • ln P(D) is still difficult in general to calculate • We choose a family of Q distributions for which L(Q) is tractable to compute
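The identity ln P(D) = L(Q) + KL(Q||P) can be verified numerically on a toy model. This sketch uses a made-up joint over one binary hidden variable and an arbitrary Q, purely for illustration:

```python
import math

# A toy model with one binary hidden variable H and fixed observed data D.
# p_joint[h] = P(H=h, D); the evidence P(D) and posterior P(H|D) follow directly.
p_joint = [0.3, 0.1]
p_D = sum(p_joint)                      # P(D)
posterior = [pj / p_D for pj in p_joint]  # P(H|D)

q = [0.6, 0.4]                          # an arbitrary variational distribution

# L(Q) = sum_H Q(H) ln [P(H,D) / Q(H)]
L = sum(qh * math.log(pj / qh) for qh, pj in zip(q, p_joint))
# KL(Q||P) = sum_H Q(H) ln [Q(H) / P(H|D)]
kl_qp = sum(qh * math.log(qh / ph) for qh, ph in zip(q, posterior))

print(L + kl_qp, math.log(p_D))  # the two sides of the identity
```

Since KL(Q||P) ≥ 0, the same computation shows L(Q) is a lower bound on ln P(D).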
  • 17.–21. Minimising the KL divergence (figure, shown in stages: the fixed quantity ln P(D) splits into L(Q) + KL(Q||P); as L(Q) is maximised, KL(Q||P) shrinks)
  • 22. Factorised Approximation • Assume Q factorises: Q(H) = Π_i Q_i(H_i) • Optimal solution for one factor: ln Q_j*(H_j) = <ln P(H,D)>_{i≠j} + const., i.e. Q_j*(H_j) = (1/Z) exp( Σ_{H_i, i≠j} Π_{i≠j} Q_i(H_i) ln P(H,D) ) • Given the form of Q, find the best Q in the KL sense • Choose conjugate priors P(H) to give the form of Q • Iterate over each Q_i(H_i)
  • 23. Derivation • Idea: use the factorisation of Q to isolate Q_j and maximise L with respect to Q_j • Substitution: L(Q) = Σ_H Π_i Q_i(H_i) ln P(H,D) − Σ_H Π_i Q_i(H_i) ln Π_j Q_j(H_j) • Log property: the second term becomes Σ_j Σ_{H_j} Q_j(H_j) ln Q_j(H_j) • Factor out one term Q_j: L(Q) = Σ_{H_j} Q_j(H_j) <ln P(H,D)>_{i≠j} − Σ_{H_j} Q_j(H_j) ln Q_j(H_j) + terms that are not a function of Q_j • With ln Q_j*(H_j) = <ln P(H,D)>_{i≠j} − log Z, this gives L(Q) = −KL(Q_j || Q_j*) − log Z + terms independent of Q_j, so L is maximised by setting Q_j = Q_j*
  • 24. Example: Univariate Gaussian • Normal distribution • Find P(µ,γ | x) • Conjugate prior • Factorized variational distribution • Q distribution same form as prior distributions • Inference involves updating these hidden parameters
  • 25. Example: Univariate Gaussian • Use Q* to derive the parameter updates • where <·> denotes expectation with respect to Q • Iteratively solve
  • 26. Example: Univariate Gaussian • An estimate of the log evidence can be found by calculating L(Q) • where <·> denotes expectation with respect to Q(·)
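The iterative scheme of slides 24–26 can be sketched in a few lines. This is an illustrative implementation, assuming independent priors µ ~ N(0, 1000) and γ ~ Gamma(0.001, 0.001) as used in the example on the next slide; the synthetic data (four samples from N(1, 1)) is made up:

```python
import math
import random

# VB for a univariate Gaussian with unknown mean mu and precision gamma.
# Factorised posterior: q(mu) = N(m, s2), q(gamma) = Gamma(a, b);
# each update uses expectations under the other factor.
mu0, v0 = 0.0, 1000.0          # prior mean and variance of mu
a0, b0 = 1e-3, 1e-3            # shape/rate of the Gamma prior on the precision

random.seed(0)
x = [random.gauss(1.0, 1.0) for _ in range(4)]   # four samples, as on the slide
n, xbar = len(x), sum(x) / len(x)

m, s2 = 0.0, 1.0               # initial q(mu)
a, b = a0, b0                  # initial q(gamma)

for _ in range(100):
    e_gamma = a / b                              # <gamma> under q(gamma)
    s2 = 1.0 / (n * e_gamma + 1.0 / v0)          # variance of q(mu)
    m = s2 * (e_gamma * sum(x) + mu0 / v0)       # mean of q(mu)
    a = a0 + n / 2.0
    # <(x_i - mu)^2> = (x_i - m)^2 + s2 under q(mu)
    b = b0 + 0.5 * sum((xi - m) ** 2 + s2 for xi in x)
```

With such a broad prior on µ, the converged posterior mean m sits essentially at the sample mean, while q(γ) reflects the sample spread.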
  • 27. Example • Four data samples are taken from a Gaussian (thick line) to find the posterior • Dashed lines: distributions drawn from the variational posterior • Variational and true posteriors for the Gaussian given the four samples • P(µ) = N(0, 1000), P(γ) = Gamma(0.001, 0.001)
  • 28. VB with Image Segmentation • Figure: RGB histograms at two pixel locations • “VB at the pixel level will give better results.” • Feature vector (x, y, Vx, Vy, r, g, b) – will have issues with data association • VB with a GMM will be complex – doing this in real time will be execrable
  • 29. Lower Bound for GMM – Ugly
  • 31. Brings Up VMP – Efficient Computation Lighting color Surface color Image color Object class C SL I P(L) P(C) P(S|C) P(I|L,S)

Editor's Notes

  1. Illustration ML vs. Bayesian – for Bayesian methods, mention sampling. WRITE CONCLUSION SLIDE!! Maximum likelihood/MAP: finds point estimates of hidden variables; vulnerable to over-fitting. Variational inference: finds posterior distributions over hidden variables; allows direct model comparison.
  6. Guaranteed to increase the lower bound – unless already at a maximum.