Variational Inference
By
Natan Katz
Natan.Katz@gmail.com
Framework - Bayesian Inference
• The inputs:
1. A sample of length n (numbers, categories, vectors, images). We call this entity the Evidence (E).
2. An assumption about the probabilistic structure that generates the sample: the Hypothesis (H).
• Posterior = P(H|E)
• Objective: update our information about the Hypothesis using the Evidence.
Bayesian Inference - into formulas
• Estimating the hypothesis from the evidence:
• Z, X random variables
We wish to obtain P(Z|X).
• Bayes' formula:
$P(Z|X) = \frac{P(Z,X)}{P(X)}$
Bayesian inference is therefore about working with the RHS terms.
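A minimal sketch, assuming a small hypothetical joint table P(Z, X) with discrete Z and X (the numbers are made up for illustration): when Z is tiny and discrete, the evidence P(X) is just a sum over Z, and the posterior is the normalized column of the joint. The same sum grows exponentially with the number of hidden variables, which is where the difficulty starts.

```python
import numpy as np

# Hypothetical joint table P(Z, X): rows are the 3 values of Z, columns the 2 values of X.
joint = np.array([[0.10, 0.20],
                  [0.25, 0.05],
                  [0.15, 0.25]])

def posterior(joint, x_index):
    """Bayes' formula: P(Z | X = x) = P(Z, X = x) / P(X = x)."""
    p_zx = joint[:, x_index]  # numerator P(Z, X = x), one entry per value of Z
    p_x = p_zx.sum()          # evidence P(X = x) = sum over z of P(z, X = x)
    return p_zx / p_x

post = posterior(joint, x_index=0)
print(post)
```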
RHS terms
• Hidden variables (Hypothesis) (Z): the variables of the mechanism that generates the sample
(e.g. the topic distribution in a corpus, or the Gaussians in a GMM)
1. Their values are not given.
2. We do have the joint distribution P(Z,X)!
• Observed data (Evidence) (X): the sample that we actually have.
1. We know every value.
2. We may know nothing about its distribution.
RHS terms (Cont.)
• In many models P(X) is intractable or extremely difficult to calculate: $P(X) = \int P(Z,X)\,dZ$ integrates (or sums) over every configuration of the hidden variables.
• Hence we cannot obtain the conditional distribution directly from the terms of Bayes' formula.
• Variational inference offers a class of algorithms for this problem: approximating the posterior when P(X) is difficult to compute.
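Examples:
GMM (known number of Gaussians and known variance)
• We have K Gaussians
• Draw $\mu_k \sim N(0, \tau)$, k = 1…K ($\tau > 0$)
• For each sample i = 1…n:
$z_i \sim \mathrm{Cat}(1/K, \ldots, 1/K)$
$x_i \sim N(\mu_{z_i}, \sigma)$
• The evidence:
$p(x_{1:n}) = \int \prod_{l=1}^{K} p(\mu_l) \prod_{i=1}^{n} \sum_{z_i=1}^{K} p(z_i)\, p(x_i \mid \mu_{z_i})\, d\mu_{1:K}$
=> Pretty ugly! Expanding the product gives a sum of $K^n$ terms, each an integral over $\mu_{1:K}$.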
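To make the $K^n$ blow-up concrete, here is a small sketch that samples from the generative process above (the values of K, n, τ and σ are illustrative assumptions, with τ and σ read as variances) and, for fixed means, checks that the per-point sums equal an explicit enumeration of all $K^n$ assignments of z. For fixed μ the sum factorizes; the integral over $\mu_{1:K}$ is what breaks this factorization.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K, n, tau, sigma = 3, 5, 10.0, 1.0  # illustrative values (assumed); tau, sigma are variances

# Generative process from the slide
mu = rng.normal(0.0, np.sqrt(tau), size=K)  # mu_k ~ N(0, tau)
z = rng.integers(0, K, size=n)              # z_i ~ Cat(1/K, ..., 1/K)
x = rng.normal(mu[z], np.sqrt(sigma))       # x_i ~ N(mu_{z_i}, sigma)

def normal_pdf(v, mean, var):
    return np.exp(-0.5 * (v - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# For FIXED means, p(x | mu) is a product of per-point sums over z_i ...
factorized = np.prod([np.mean(normal_pdf(xi, mu, sigma)) for xi in x])

# ... which equals an explicit sum over all K**n joint assignments of z.
brute_force = 0.0
for assignment in itertools.product(range(K), repeat=n):
    a = np.array(assignment)
    brute_force += np.prod(normal_pdf(x, mu[a], sigma)) / K**n

n_terms = K**n
print(n_terms, factorized, brute_force)
```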
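Examples - LDA
Corpus D; for every document w of length N:
$N \sim \mathrm{Poisson}(\xi)$
$\theta \sim \mathrm{Dir}(\alpha)$
β - topics (distributions over words, fixed or Dirichlet)
For each of the N words $w_n$:
a topic $z_n \sim \mathrm{Cat}(\theta)$
$w_n \sim p(w_n \mid z_n, \beta)$, where $\beta_{ij} = P(w_i \mid z_j)$
$p(w \mid \alpha, \beta) = \int p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\, d\theta$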
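A minimal sketch of the LDA generative process, assuming illustrative sizes (3 topics, 10-word vocabulary, ξ = 20, α = 0.5) and drawing β from a Dirichlet, one of the two options on the slide. Here β is stored with topics as rows, `beta[k, v] = P(word v | topic k)`, which is the transpose of the slide's $\beta_{ij}$ indexing.

```python
import numpy as np

rng = np.random.default_rng(1)
n_topics, vocab_size = 3, 10    # illustrative sizes (assumed)
alpha = np.full(n_topics, 0.5)  # Dirichlet prior on topic proportions (assumed value)
xi = 20                         # Poisson mean for document length (assumed value)

# beta[k, v] = P(word v | topic k), drawn here from a Dirichlet
beta = rng.dirichlet(np.full(vocab_size, 0.5), size=n_topics)

def generate_document(rng):
    N = rng.poisson(xi)                          # N ~ Poisson(xi)
    theta = rng.dirichlet(alpha)                 # theta ~ Dir(alpha)
    z = rng.choice(n_topics, size=N, p=theta)    # z_n ~ Cat(theta)
    # w_n ~ p(w_n | z_n, beta): pick a word from the row of the chosen topic
    words = np.array([rng.choice(vocab_size, p=beta[k]) for k in z], dtype=int)
    return theta, z, words

corpus = [generate_document(rng) for _ in range(5)]
```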
Sampling
• The common solution for estimating distributions is sampling:
1. MCMC
• Metropolis-Hastings
• Gibbs
2. RBM (mostly trained with Gibbs sampling)
3. Hybrid Monte Carlo
• Today we won't talk about these methods!
Sampling Vs. Analysis
• Sampling
1. Solutions are (asymptotically) exact
2. Numerically expensive
• Deterministic (variational)
1. Solutions are cheaper
2. Less accurate
3. Non-conjugate models are harder to handle
4. An optimization process
Sampling Vs. Analysis cont.
• MCMC methods are good for small data where accuracy is essential
• When we have big data and many models should be tested, VI methods have an advantage
Can we do something analytically?
• Can we analytically approximate the posterior?!
• Can we find a distribution that is close to the posterior, and estimate the distance well?
• When the framework is a vector space:
1. Calculus allows us to find extrema easily
2. We are endowed with $L^p$ metrics (typically p = 1, 2)
• Our domain is a space of functions (distributions) and functionals on it. We need:
1. An analytical method to find extrema of functionals
2. A nice metric
Calculus of Variations
• Consider the following: F, y functions (with the usual regularity assumptions)
$J(y) = \int_a^b F(t, y(t), y'(t))\, dt$ (y is differentiable)
If y is an extremum of J, it satisfies the Euler-Lagrange equation:
$\frac{\partial F}{\partial y} - \frac{d}{dt}\left(\frac{\partial F}{\partial y'}\right) = 0$
• Example: maximum entropy principle
Generally speaking, this domain is a calculus for function spaces, hence it is useful for optimization.
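As a worked instance of the maximum entropy example, here is the standard textbook argument written out as an illustration (not taken from the slides): maximize the entropy of a density on [a, b] under the normalization constraint, handled with a Lagrange multiplier λ.

```latex
% Maximize the entropy of a density p on [a, b] subject to \int_a^b p(t)\,dt = 1
J(p) = \int_a^b \Big(-p(t)\ln p(t) + \lambda\, p(t)\Big)\, dt,
\qquad F(t, p, p') = -p\ln p + \lambda p
% F does not depend on p', so Euler-Lagrange reduces to \partial F / \partial p = 0:
\frac{\partial F}{\partial p} = -\ln p(t) - 1 + \lambda = 0
\;\Longrightarrow\; p(t) = e^{\lambda - 1} = \text{const} = \frac{1}{b-a}
% Adding mean and variance constraints adds \lambda_1 t\,p + \lambda_2 t^2 p to F,
% which gives p(t) \propto e^{\lambda_1 t + \lambda_2 t^2}: a Gaussian on the real line.
```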
Calculus of Var. Cont.
The fundamental lemma of calculus of variations:
If M is continuous on [a,b] and, for every differentiable h with h(a) = h(b) = 0,
$\int_a^b M(x)\, h(x)\, dx = 0 \implies M \equiv 0$ on (a,b) (cf. Cybenko)
KL (Kullback-Leibler) Divergence
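• A divergence between distributions (not a true metric: it is not symmetric and does not satisfy the triangle inequality)
* Kullback & Leibler, "On Information and Sufficiency", 1951 (Ann. Math. Statist.)
Properties:
1. Non-symmetric (it measures a relative distance: which distribution P observes as the closest)
2. Non-negative: $KL(p\|q) \ge 0$, and 0 is obtained only for KL(p, p) (proof via Jensen's inequality and the concavity of log, or via Lagrange multipliers)
3. $KL(P(x,y) \,\|\, p(x)p(y))$ is the mutual information I(X;Y); it equals 0 iff X and Y are independent
Usage:
Cross entropy: $H(p, q) = H(p) + KL(p\|q)$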
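A small numerical check of these properties on discrete distributions (the distributions `p`, `q` and the joint tables below are arbitrary illustrative values, assumed to have full support so every logarithm is finite):

```python
import numpy as np

def entropy(p):
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    return -np.sum(p * np.log(q))

def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    return np.sum(p * np.log(p / q))

def mutual_information(joint):
    """KL between a joint table and the product of its marginals."""
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    return kl(joint.ravel(), np.outer(px, py).ravel())

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

dependent = np.array([[0.3, 0.1],
                      [0.1, 0.5]])
independent = np.outer([0.4, 0.6], [0.3, 0.7])  # P(x, y) = p(x) p(y) by construction

print(kl(p, q), kl(q, p))  # non-symmetric, both positive
print(mutual_information(dependent), mutual_information(independent))
```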
PMI (Pointwise Mutual Information)
• Let X, Y be random variables
• $PMI(X=a, Y=b) = \log\frac{P(X=a, Y=b)}{P(X=a)\,P(Y=b)}$
• $KL(P(X|Y=a) \,\|\, P(X)) = \sum_x P(X=x|Y=a)\, PMI(X=x, Y=a)$
• What does this term mean?
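A numerical check of the identity on a hypothetical joint table (values made up, summing to 1). It uses $PMI(x, a) = \log\frac{P(x|a)}{P(x)}$, and also shows that averaging the per-value KL over $P(Y=b)$ recovers the mutual information I(X;Y).

```python
import numpy as np

# Hypothetical joint P(X = x, Y = b): X takes 3 values (rows), Y takes 2 (columns).
joint = np.array([[0.20, 0.10],
                  [0.05, 0.30],
                  [0.15, 0.20]])
p_x = joint.sum(axis=1)                   # P(X = x)
p_y = joint.sum(axis=0)                   # P(Y = b)
pmi = np.log(joint / np.outer(p_x, p_y))  # PMI(X = x, Y = b)

a = 1                                     # condition on Y = a
p_x_given_a = joint[:, a] / p_y[a]        # P(X = x | Y = a)

lhs = np.sum(p_x_given_a * np.log(p_x_given_a / p_x))  # KL(P(X|Y=a) || P(X))
rhs = np.sum(p_x_given_a * pmi[:, a])                   # sum_x P(x|a) PMI(x, a)

# Averaging KL(P(X|Y=b) || P(X)) over P(Y = b) gives the mutual information I(X;Y)
kl_per_b = np.array([np.sum(joint[:, b] / p_y[b] * pmi[:, b]) for b in range(joint.shape[1])])
mutual_info = np.sum(p_y * kl_per_b)
print(lhs, rhs, mutual_info)
```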
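ELBO - Evidence Lower Bound
Consider now P(X), the Evidence. By Jensen's inequality (log is concave):
$\log P(X) = \log \mathbb{E}_Q\left[\frac{p(X,Z)}{Q(Z)}\right] \ge \mathbb{E}_Q[\log p(X,Z)] - \mathbb{E}_Q[\log Q(Z)]$
The RHS is called the ELBO, and it is a lower bound of the LHS.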
Back to KL
• With these analytical tools we can approximate the posterior: find Q such that Q(Z) ≈ P(Z|X):
• $\min_Q KL(Q(Z) \,\|\, P(Z|X))$
$KL(Q \,\|\, P(Z|X)) = \log P(X) - ELBO$
$\Rightarrow \log P(X) = KL(Q \,\|\, P(Z|X)) + ELBO$
log P(X) does not depend on Q, hence maximizing the ELBO is equivalent to minimizing the KL.
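A numerical check of the decomposition on a hypothetical discrete model: Z takes 4 values, X is fixed at its observed value, and the entries of `p_xz` are made-up values of P(X = x_obs, Z = z). For an arbitrary Q, the ELBO stays below log P(X), the gap is exactly KL(Q || P(Z|X)), and the bound is tight when Q is the true posterior.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical values of P(X = x_obs, Z = z) for z = 0..3
p_xz = np.array([0.02, 0.10, 0.05, 0.03])
log_evidence = np.log(p_xz.sum())  # log P(X) = log sum_z P(X, z)
posterior = p_xz / p_xz.sum()      # P(Z | X)

def elbo(q):
    # E_Q[log P(X, Z)] - E_Q[log Q(Z)]
    return np.sum(q * (np.log(p_xz) - np.log(q)))

def kl(q, p):
    return np.sum(q * np.log(q / p))

q = rng.dirichlet(np.ones(4))      # an arbitrary variational distribution
gap = log_evidence - elbo(q)
print(log_evidence, elbo(q), gap, kl(q, posterior))
```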
Let’s use Calculus!
• We wish to optimize the ELBO term.
We can define a functional:
$ELBO = \mathbb{E}_Q[\log p(X,Z)] - \mathbb{E}_Q[\log Q(Z)] = \int Q(Z) \log\frac{p(X,Z)}{Q(Z)}\, dZ = J(Q)$
We could apply Euler-Lagrange here, but let's first try to simplify Q!
Mean Field Theory (MFT)
• The original idea: solving many-body problems (e.g. the Ising model)
Assume a system of many bodies (atoms, other particles):
1. For each body, replace its interactions with the other particles by their average.
2. Assume no correlations between interacting bodies.
We will use assumption 2 to simplify Q:
$Q(Z) = \prod_{i=1}^{n} q_i(z_i)$ (obviously not true in general)
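MFT - cont.
• We can now use Euler-Lagrange with the constraint $\int q_i(z)\, dz = 1$
• We obtain ($\mathbb{E}_{-i}$: expectation with respect to all $q_j$, j ≠ i)
$\log q_i^*(z_i) = \text{const} + \mathbb{E}_{-i}[\log p(X, Z)]$, a Boltzmann distribution!
Did we win? No!
Note that each $q_i$ depends on the other $q_j$'s, so updating one factor changes the optimum of the others.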
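The Euler-Lagrange step can be filled in as follows (a sketch of the standard derivation, using only the mean-field factorization and the normalization constraint; λ is the Lagrange multiplier):

```latex
% Fix all factors except q_j. Using Q = \prod_i q_i, only these terms depend on q_j:
J(q_j) = \int q_j(z_j)\, \mathbb{E}_{-j}[\log p(X, Z)]\, dz_j
         - \int q_j(z_j) \log q_j(z_j)\, dz_j + \text{const}
% Add the constraint \int q_j = 1 with a Lagrange multiplier \lambda:
F(z_j, q_j) = q_j\, \mathbb{E}_{-j}[\log p(X, Z)] - q_j \log q_j + \lambda q_j
% F does not depend on q_j', so Euler-Lagrange reduces to \partial F / \partial q_j = 0:
\mathbb{E}_{-j}[\log p(X, Z)] - \log q_j - 1 + \lambda = 0
\;\Longrightarrow\;
q_j^*(z_j) \propto \exp\big(\mathbb{E}_{-j}[\log p(X, Z)]\big)
```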
Coordinate Ascent Variational Inference
CAVI
• An iterative algorithm:
1. Construct a model P(X,Z)
2. Sequentially set each $\log q_i$ to $\mathbb{E}_{-i}[\log p(X, Z)]$ + constant
3. As always, repeat until the q's converge
(Wikipedia, Blei) https://www.youtube.com/watch?v=uKxtmkfeuxg
“Message passing” – Winn & Bishop
Minka 2005, Knowles & Minka 2011
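A minimal CAVI sketch for the GMM example, under assumptions the slides do not fix: 1-D data, unit likelihood variance (σ = 1), prior variance τ = 10, and a quantile-based initialization (CAVI only reaches a local optimum, so initialization matters). Each sweep applies the rule $\log q_i = \mathbb{E}_{-i}[\log p(X, Z)] + \text{const}$ to every q(z_i) and then to every q(μ_k), which for this model gives closed-form updates (as in Blei et al.'s review of VI). The last lines are a usage example on synthetic data.

```python
import numpy as np

def cavi_gmm(x, K, tau=10.0, n_iter=200, tol=1e-8):
    """CAVI for a 1-D Bayesian GMM with unit-variance components (assumed model):
    mu_k ~ N(0, tau), z_i ~ Cat(1/K, ..., 1/K), x_i | z_i ~ N(mu_{z_i}, 1).
    Mean-field family: q(mu_k) = N(m_k, s2_k), q(z_i) = Cat(phi_i).
    """
    x = np.asarray(x, dtype=float)
    # Heuristic initialization: spread the means over the data quantiles.
    m = np.quantile(x, (np.arange(K) + 0.5) / K)
    s2 = np.ones(K)
    for _ in range(n_iter):
        # Update every q(z_i): log phi_ik = E[mu_k] x_i - E[mu_k^2] / 2 + const
        log_phi = np.outer(x, m) - 0.5 * (s2 + m**2)
        log_phi -= log_phi.max(axis=1, keepdims=True)  # for numerical stability
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=1, keepdims=True)
        # Update every q(mu_k) given the current responsibilities
        s2 = 1.0 / (1.0 / tau + phi.sum(axis=0))
        m_new = s2 * (phi * x[:, None]).sum(axis=0)
        converged = np.max(np.abs(m_new - m)) < tol
        m = m_new
        if converged:
            break
    return m, s2, phi

# Usage example on synthetic data (the true means are illustrative)
rng = np.random.default_rng(0)
true_means = np.array([-4.0, 0.0, 4.0])
z = rng.integers(0, 3, size=300)
x = rng.normal(true_means[z], 1.0)
m, s2, phi = cavi_gmm(x, K=3)
print("posterior means:", np.sort(m))
```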
Gaussian Example
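• Let $\lambda_0, a_0, b_0, \mu_0$ be hyper-parameters
$\tau \sim \mathrm{Gamma}(a_0, b_0)$
$\mu \sim N(\mu_0, (\lambda_0 \tau)^{-1})$
Data = $\{x_1, x_2, \ldots, x_n\}$
Latent variables Z = (τ, μ)
Hyper-parameters = $\{a_0, b_0, \mu_0, \lambda_0\}$
$P(X, \tau, \mu) = P(X|\tau, \mu)\, P(\mu|\tau)\, P(\tau)$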
Gaussian Cont.
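• $P(X|\tau, \mu) = \prod_{i=1}^{n} N(x_i \mid \mu, \tau^{-1})$
• $P(\mu|\tau) = N(\mu \mid \mu_0, (\lambda_0\tau)^{-1})$
• $P(\tau) = \mathrm{Gamma}(\tau \mid a_0, b_0)$
• MFT implies:
$q(\mu, \tau) = q(\mu)\, q(\tau)$ (not that accurate in this case, since the true posterior couples μ and τ!)
Using the mean-field update:
$\ln q(\mu) = \mathbb{E}_\tau[\ln P(X|\tau,\mu) + \ln P(\mu|\tau) + \ln P(\tau)] + C$
$\ln q(\tau) = \mathbb{E}_\mu[\ln P(X|\tau,\mu) + \ln P(\mu|\tau) + \ln P(\tau)] + C$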
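Working out these two expectations gives closed forms: q(μ) is Gaussian and q(τ) is Gamma (this is the standard univariate-Gaussian example, e.g. in Bishop's PRML, Section 10.1.3). A minimal sketch iterating them, assuming a shape/rate Gamma parametrization and the illustrative hyper-parameter defaults below:

```python
import numpy as np

def cavi_normal_gamma(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, n_iter=100, tol=1e-10):
    """Mean-field CAVI for x_i ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0 tau)),
    tau ~ Gamma(a0, b0) (shape/rate, assumed). Returns q(mu) = N(mu_n, 1/lam_n)
    and q(tau) = Gamma(a_n, b_n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sum_x, sum_x2 = x.sum(), np.sum(x**2)
    # The mean of q(mu) does not depend on q(tau), so it is computed once.
    mu_n = (lam0 * mu0 + sum_x) / (lam0 + n)
    a_n = a0 + (n + 1) / 2.0
    e_tau = a0 / b0  # initial guess for E[tau]
    for _ in range(n_iter):
        # q(mu) update: its precision is scaled by the current E[tau]
        lam_n = (lam0 + n) * e_tau
        e_mu, e_mu2 = mu_n, 1.0 / lam_n + mu_n**2
        # q(tau) update: b_n = b0 + 0.5 * E_mu[sum (x_i - mu)^2 + lam0 (mu - mu0)^2]
        b_n = b0 + 0.5 * (sum_x2 - 2.0 * e_mu * sum_x + n * e_mu2
                          + lam0 * (e_mu2 - 2.0 * mu0 * e_mu + mu0**2))
        new_e_tau = a_n / b_n
        converged = abs(new_e_tau - e_tau) < tol
        e_tau = new_e_tau
        if converged:
            break
    lam_n = (lam0 + n) * e_tau  # final q(mu) precision, consistent with final q(tau)
    return mu_n, lam_n, a_n, b_n

# Usage example: data with true mean 2 and true precision 4 (std 0.5)
rng = np.random.default_rng(0)
data = rng.normal(2.0, 0.5, size=1000)
mu_n, lam_n, a_n, b_n = cavi_normal_gamma(data)
print("E[mu] =", mu_n, " E[tau] =", a_n / b_n)
```

Note that the factorized q(μ)q(τ) cannot represent the posterior dependence between μ and τ, which is why the slide calls the approximation "not that accurate" here.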
Stochastic - VI
• CAVI does not scale well to big data: every iteration updates the local factor of every data item before the global factors are updated
• Stochastic VI: rather than doing full coordinate updates of the q's, we compute noisy gradients of the ELBO on mini-batches of the data and optimize its parameters (similar to EM)
• Used in LDA applications (David Blei et al.)
• http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf
• https://www.cs.princeton.edu/courses/archive/fall11/cos597C/reading/Blei2011.pdf
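A minimal sketch of stochastic VI in the style of Hoffman et al. (2013), applied to a 1-D Bayesian GMM under assumed settings: unit likelihood variance, prior μ_k ~ N(0, τ) with τ = 10, and a quantile-based initialization. Each step computes local responsibilities for a mini-batch only, forms the global update as if the mini-batch were the whole data set (scaled by n/B), and moves the natural parameters of q(μ_k) toward it with a Robbins-Monro step size ρ_t = (t + delay)^(-forget). The batch size, schedule and step count are illustrative choices.

```python
import numpy as np

def svi_gmm(x, K, tau=10.0, batch_size=32, n_steps=2000, delay=1.0, forget=0.7, seed=0):
    """Stochastic VI for a 1-D Bayesian GMM (assumed model):
    mu_k ~ N(0, tau), z_i ~ Cat(1/K), x_i | z_i ~ N(mu_{z_i}, 1).
    Global q(mu_k) = N(m_k, s2_k) is stored via natural parameters
    (eta1 = m / s2, eta2 = 1 / s2); local q(z_i) is recomputed per mini-batch.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    m = np.quantile(x, (np.arange(K) + 0.5) / K)  # heuristic initialization
    eta2 = np.ones(K)                             # precision 1 / s2
    eta1 = m * eta2                               # precision-weighted mean
    for t in range(n_steps):
        batch = x[rng.choice(n, size=batch_size, replace=False)]
        m, s2 = eta1 / eta2, 1.0 / eta2
        # Local step: responsibilities for the mini-batch only
        log_phi = np.outer(batch, m) - 0.5 * (s2 + m**2)
        log_phi -= log_phi.max(axis=1, keepdims=True)
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=1, keepdims=True)
        # Intermediate global parameters, as if the batch were replicated n / B times
        scale = n / batch_size
        eta2_hat = 1.0 / tau + scale * phi.sum(axis=0)
        eta1_hat = scale * (phi * batch[:, None]).sum(axis=0)
        # Robbins-Monro step size: sum of rho_t diverges, sum of rho_t^2 converges
        rho = (t + delay) ** (-forget)
        eta1 = (1.0 - rho) * eta1 + rho * eta1_hat
        eta2 = (1.0 - rho) * eta2 + rho * eta2_hat
    return eta1 / eta2, 1.0 / eta2

# Usage example on synthetic data (the true means are illustrative)
rng = np.random.default_rng(0)
true_means = np.array([-4.0, 0.0, 4.0])
z = rng.integers(0, 3, size=300)
x = rng.normal(true_means[z], 1.0)
m_svi, s2_svi = svi_gmm(x, K=3)
print("posterior means:", np.sort(m_svi))
```

Each step touches only `batch_size` points, so its cost does not grow with n; this is the property that makes the approach attractive for large corpora.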
Appendix - Ising Model
• Ferromagnetism (Pierre Weiss)
• Ising Model - Lenz & Ernst Ising
• We have a Hamiltonian
$H(\sigma) = -h\sum_x \sigma_x - J\sum_{\langle x,y \rangle} \sigma_x \sigma_y$
($\sigma_x$ is the spin of a site (atom); x, y are nearest neighbors, hence the second sum runs over adjacent spins; h is the magnetic field and J is the "coupling constant")
• Consider the contribution of a single atom (spin):
$\xi(\sigma_x) = -h\sigma_x - J\sigma_x \sum_y \sigma_y$
(y runs over the nearest neighbors of x)
Ising Model (cont.)
• Now we replace the second summation by its mean:
$\xi(\sigma_x) = -h\sigma_x - J\sigma_x \sum_y \langle\sigma_y\rangle$. We obtain
$\xi(\sigma_x) = -h_0\sigma_x$, with the effective field $h_0 = h + J\sum_y \langle\sigma_y\rangle$
• If we use this approximation for every site of the system, we get:
$E_{mf} = E_0 - h_0\sum_x \sigma_x$
The solution is a single-spin Boltzmann distribution:
$P(s_i) = \frac{e^{a s_i}}{e^{a} + e^{-a}}, \quad s_i \in \{-1, +1\}$, with $a = \beta h_0$ (β the inverse temperature)
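Averaging a single spin under this Boltzmann distribution gives ⟨s⟩ = tanh(a) = tanh(β h_0), and since h_0 itself depends on the neighbors' mean spin, the mean-field magnetization must solve a self-consistency equation. A minimal fixed-point sketch, assuming a uniform magnetization m = ⟨σ⟩ on a lattice where every site has z nearest neighbors (so h_0 = h + J z m); the parameter values in the usage lines are illustrative:

```python
import math

def mean_field_magnetization(beta, J, h, z, m0=0.5, n_iter=10_000, tol=1e-12):
    """Solve m = tanh(beta * (h + J * z * m)) by fixed-point iteration."""
    m = m0
    for _ in range(n_iter):
        h0 = h + J * z * m            # effective (mean) field felt by one spin
        m_new = math.tanh(beta * h0)  # <s> under P(s) = e^{a s} / (e^a + e^{-a}), a = beta * h0
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# With h = 0: beta * J * z > 1 gives an ordered (magnetized) phase,
# beta * J * z < 1 gives the disordered phase m = 0.
m_ordered = mean_field_magnetization(beta=1.0, J=1.0, h=0.0, z=4)
m_disordered = mean_field_magnetization(beta=0.1, J=1.0, h=0.0, z=4)
print(m_ordered, m_disordered)
```

Like CAVI, each update replaces one quantity by its expectation given the others and repeats until nothing changes; near βJz = 1 the iteration converges slowly.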
Remarks
1. Maxwell speed distribution: the use of independence for "achieving" a normal distribution
2. RBM
3. Conditional Random Field (CRF)
4. Cybenko, G. (1989), "Approximation by superpositions of a sigmoidal function"
5. Kullback & Leibler (1951), "On Information and Sufficiency"
6. David Blei, Latent Dirichlet Allocation (and the rest of his papers)
7. Expectation-maximization algorithm (EM, Baum-Welch)
VI in Python
• https://gist.github.com/AustinRochford/91cabfd2e1eecf9049774ce529ba4c16
• https://www.cs.toronto.edu/~duvenaud/papers/blackbox.pdf
• Edward - edwardlib.org/
• PyMC3 - http://pymc-devs.github.io/pymc3/notebooks/bayesian_neural_network_advi.html
• PyStan - http://mc-stan.org/interfaces/pystan.html
• TensorFlow - https://github.com/carpedm20/variational-text-tensorflow
VI - Other Languages
• R - https://artax.karlin.mff.cuni.cz/r-help/library/varbvs/html/00Index.html
• R - https://cran.r-project.org/web/packages/varbvs/varbvs.pdf
• R - https://github.com/kieranrcampbell/clvm (claims to implement CAVI)
• Blog on Scala: http://alexminnaar.com/online-latent-dirichlet-allocation-the-best-option-for-topic-modeling-with-large-data-sets.html
• Spark MLlib - https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 

Variational inference

  • 2. Framework – Bayesian Inference • The inputs: 1. A sample of length n (numbers, categories, vectors, images); we call this entity the Evidence. 2. An assumption about the probabilistic structure that generates the sample; we call this the Hypothesis. • Posterior = P(H|E) • Objective: gain/update information about the Hypothesis using the Evidence
  • 3. Bayesian Inference – into formulas • Estimating the hypothesis given the evidence: • Z, X are random variables; we wish to obtain $P(Z|X)$. • Bayes' formula: $P(Z|X) = \frac{P(Z,X)}{P(X)}$ • Bayesian inference is therefore about working with the RHS terms.
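To make the RHS terms concrete, here is a minimal sketch on a made-up discrete joint table (the numbers and the names `joint` and `posterior` are hypothetical, not from the slides). The denominator P(X) is obtained by summing the joint over every value of Z; that sum is exactly the step that becomes infeasible when Z is large or continuous.

```python
# Toy joint table P(Z, X) over a hidden Z in {0, 1, 2} and an observed X in {0, 1}.
# The numbers are made up purely for illustration.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.05,
    (2, 0): 0.15, (2, 1): 0.25,
}

def posterior(joint, x):
    """Return P(Z | X = x) = P(Z, X = x) / P(X = x) as a dict over z."""
    # The evidence P(X = x) is the joint summed over every hidden value z.
    evidence = sum(p for (z, xv), p in joint.items() if xv == x)
    return {z: p / evidence for (z, xv), p in joint.items() if xv == x}

post = posterior(joint, x=1)
print(post)
```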
  • 4. RHS terms • Hidden variables (Hypothesis) (Z): the variables of the mechanism that generates the sample (e.g. the topic distribution in a corpus, or the Gaussians in a GMM). 1. Their values are not given. 2. We do have the joint distribution $P(Z,X)$! • Observed data (Evidence) (X): the sample that we actually have. 1. We know every value. 2. We may know nothing about its distribution.
  • 5. RHS terms (cont.) • In some case studies $P(X)$ is intractable or extremely difficult to calculate, since $P(X) = \int P(Z,X)\,dZ$ requires integrating (or summing) over every hidden configuration. • We then cannot obtain the conditional distribution from the terms of Bayes' formula. • Variational inference offers a class of algorithms for this problem: approximating the posterior when $P(X)$ is difficult.
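  • 6. Examples: GMM (known number of Gaussians and known variance) • We have K Gaussians. • Draw $\mu_k \sim N(0, \tau)$, $k = 1,\dots,K$ ($\tau > 0$) • For each sample $j = 1,\dots,n$: $z_j \sim \mathrm{Cat}(1/K,\dots,1/K)$, $x_j \sim N(\mu_{z_j}, \sigma)$ • $p(x_{1:n}) = \int \prod_{l=1}^{K} P(\mu_l) \prod_{i=1}^{n} \sum_{z_i} p(z_i)\, P(x_i \mid \mu_{z_i})\, d\mu_{1:K}$ ⇒ intractable in practice: expanding the product gives $K^n$ terms, one per assignment $z_{1:n}$, each requiring an integral over $\mu_{1:K}$.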
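A minimal sketch of where the cost comes from, assuming the second argument of N(·,·) is a variance (the slide does not say) and using hypothetical function names. For fixed means the sum over assignments factorizes per data point, so brute-force enumeration of all K^n assignments and the per-point product agree; the difficulty is that p(x) also integrates over μ_{1:K}, and the expanded integrand then has K^n terms.

```python
import itertools
import math
import random

def normal_pdf(x, mean, var):
    """Density of N(mean, var); the second argument is treated as a variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def sample_gmm(n, K, tau, sigma, seed=0):
    """Generative process: mu_k ~ N(0, tau), z_j ~ Cat(1/K, ..., 1/K), x_j ~ N(mu_{z_j}, sigma)."""
    rng = random.Random(seed)
    mu = [rng.gauss(0.0, math.sqrt(tau)) for _ in range(K)]
    z = [rng.randrange(K) for _ in range(n)]
    x = [rng.gauss(mu[zj], math.sqrt(sigma)) for zj in z]
    return mu, z, x

def likelihood_brute_force(x, mu, sigma):
    """p(x | mu) by enumerating all K**n assignments z_1..z_n explicitly."""
    K = len(mu)
    total = 0.0
    for z in itertools.product(range(K), repeat=len(x)):
        term = 1.0
        for xi, zi in zip(x, z):
            term *= (1.0 / K) * normal_pdf(xi, mu[zi], sigma)
        total += term
    return total

def likelihood_factorized(x, mu, sigma):
    """Same quantity: for fixed mu the sum over assignments factorizes per data point."""
    K = len(mu)
    return math.prod(sum(normal_pdf(xi, m, sigma) / K for m in mu) for xi in x)

mu, z, x = sample_gmm(n=6, K=3, tau=4.0, sigma=1.0)  # 3**6 = 729 assignments
print(likelihood_brute_force(x, mu, 1.0), likelihood_factorized(x, mu, 1.0))
```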
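  • 7. Examples – LDA • Corpus D; each document has length N • $N \sim \mathrm{Poisson}(\xi)$ • $\theta \sim \mathrm{Dir}(\alpha)$ • β: the topics (distributions over words, fixed or Dirichlet) • For each of the N words $w_n$: a topic $z_n \sim \mathrm{Cat}(\theta)$, then $w_n \sim p(w_n \mid z_n, \beta)$, where $\beta_{ij} = P(w_i \mid z_j)$ • $p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\, d\theta$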
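The generative process above can be sketched with the standard library alone. This is an illustration under stated assumptions: `generate_document` and the helpers are hypothetical names, β is stored as `beta[k][v] = P(word v | topic k)` (the transpose of the slide's β_ij), Poisson draws use Knuth's multiplication method, and Dirichlet draws use normalized Gamma variates.

```python
import math
import random

def sample_poisson(rng, lam):
    """Knuth's method: count uniform draws until their product falls below exp(-lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_dirichlet(rng, alpha):
    """Dir(alpha) via normalized Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [gi / s for gi in g]

def generate_document(alpha, beta, xi, seed=0):
    """One document from LDA's generative process; beta[k][v] = P(word v | topic k)."""
    rng = random.Random(seed)
    N = sample_poisson(rng, xi)            # N ~ Poisson(xi)
    theta = sample_dirichlet(rng, alpha)   # theta ~ Dir(alpha)
    K, V = len(beta), len(beta[0])
    topics, words = [], []
    for _ in range(N):
        z = rng.choices(range(K), weights=theta)[0]    # z_n ~ Cat(theta)
        w = rng.choices(range(V), weights=beta[z])[0]  # w_n ~ p(w_n | z_n, beta)
        topics.append(z)
        words.append(w)
    return theta, topics, words

beta = [[0.7, 0.2, 0.1, 0.0],   # hypothetical topic 0 over a 4-word vocabulary
        [0.0, 0.1, 0.2, 0.7]]   # hypothetical topic 1
theta, topics, words = generate_document([1.0, 1.0], beta, xi=20, seed=1)
print(theta, topics, words)
```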
  • 8. Sampling • The common solution for estimating distributions is sampling: 1. MCMC: Metropolis-Hastings, Gibbs 2. RBM (mostly trained with Gibbs sampling) 3. Hybrid Monte Carlo • Today we won't talk about these methods!
  • 9. Sampling vs. Analysis • Sampling: 1. The solutions are asymptotically exact 2. Numerically expensive • Deterministic (analytical) methods: 1. Solutions are cheaper 2. Less accurate 3. Non-conjugate models are harder to handle 4. Inference becomes an optimization process
  • 10. Sampling vs. Analysis (cont.) • MCMC methods are good for small data where accuracy is essential. • When we have big data and many models should be tested, VI methods have an advantage.
  • 11. Can we do something analytically? • Can we analytically approximate the posterior? • Can we find a distribution that is close to the posterior, and estimate that distance well? • When the framework is a vector space: 1. Calculus allows us to find extrema easily. 2. We are endowed with $L^p$ metrics (typically p = 1, 2). • Our domain is a space of functions and the functionals on it. We need: 1. An analytical method to find the extrema of functionals 2. A suitable measure of distance between distributions
  • 12. Calculus of Variations • Consider the functional $J(y) = \int F(t, y(t), y'(t))\,dt$, where F and y are sufficiently smooth (y differentiable). • If y is an extremum of J, it satisfies the Euler–Lagrange equation: $\frac{\partial F}{\partial y} - \frac{d}{dt}\left(\frac{\partial F}{\partial y'}\right) = 0$ • Example: the maximum entropy principle • Generally speaking, this field is a calculus on function spaces, hence it is useful for optimization.
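As a worked instance of the maximum entropy example (our own illustration, written with the density p in place of y and x in place of t): maximize the entropy $-\int p\ln p\,dx$ subject to normalization and fixed first and second moments, with Lagrange multipliers $\lambda_0, \lambda_1, \lambda_2$. The integrand contains no $p'$, so the Euler–Lagrange equation reduces to $\partial F/\partial p = 0$.

```latex
\begin{aligned}
F(x, p) &= -p\ln p + \lambda_0\, p + \lambda_1\, x\, p + \lambda_2\, x^2 p \\
\frac{\partial F}{\partial p} - \frac{d}{dx}\Big(\underbrace{\frac{\partial F}{\partial p'}}_{=\,0}\Big)
  &= -\ln p - 1 + \lambda_0 + \lambda_1 x + \lambda_2 x^2 = 0 \\
\Rightarrow\quad p(x) &= \exp\!\big(\lambda_0 - 1 + \lambda_1 x + \lambda_2 x^2\big)
\end{aligned}
```

With $\lambda_2 < 0$ this is a Gaussian, whose parameters are fixed by the three constraints.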
  • 13. Calculus of Variations (cont.) • The fundamental lemma of the calculus of variations: if M is continuous on [a,b] and $\int_a^b M(x)\,h(x)\,dx = 0$ for every differentiable h with $h(a) = h(b) = 0$, then $M \equiv 0$ on (a,b). (cf. Cybenko)
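  • 14. KL (Kullback–Leibler) Divergence • A divergence between distributions (not a true metric) • Kullback & Leibler, “On Information and Sufficiency”, Ann. Math. Statist., 1951 • Definition: $KL(p\,\|\,q) = \sum_x p(x)\log\frac{p(x)}{q(x)}$ • Properties: 1. Non-symmetric (it measures a relative distance: how close q looks from the viewpoint of p) 2. Non-negative: $KL(p\,\|\,q) \ge 0$, and 0 is obtained only when q = p (proof: Jensen's inequality with the concavity of log) 3. The divergence between $P(x,y)$ and $p(x)\,p(y)$ is the mutual information $I(X;Y)$, which is 0 iff X and Y are independent • Usage: cross entropy $H(p,q) = H(p) + KL(p\,\|\,q)$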
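A minimal numerical check of the listed properties for discrete distributions; the distributions and function names are made up for illustration. It shows the asymmetry, the cross-entropy decomposition, and that the KL between a joint and the product of its marginals (the mutual information) is positive for a dependent pair.

```python
import math

def kl(p, q):
    """KL(p || q) = sum_x p(x) log(p(x) / q(x)) for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

# Non-symmetric: the two directions generally differ.
print(kl(p, q), kl(q, p))
# Cross entropy decomposes as H(p) + KL(p || q).
print(cross_entropy(p, q), entropy(p) + kl(p, q))

# KL between a joint and the product of its marginals is the mutual information.
joint = [[0.3, 0.1], [0.1, 0.5]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]
flat_joint = [joint[i][j] for i in range(2) for j in range(2)]
flat_prod = [px[i] * py[j] for i in range(2) for j in range(2)]
print(kl(flat_joint, flat_prod))  # positive here, since this joint is not the product of its marginals
```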
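  • 15. PMI (Pointwise Mutual Information) • Let X, Y be random variables • $PMI(X{=}a, Y{=}b) = \log\frac{P(X=a,\,Y=b)}{P(X=a)\,P(Y=b)}$ • With Q the marginal of X, $Q(x) = P(X=x)$: $KL\big(P(X \mid Y=a)\,\|\,Q\big) = \sum_x P(X=x \mid Y=a)\,PMI(X{=}x, Y{=}a)$ • What does this term mean? It is the expected PMI: how far observing Y = a moves the distribution of X away from its marginal (averaging it over a gives the mutual information I(X;Y)).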
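The identity on this slide can be checked numerically on a small made-up joint table (hypothetical numbers and names): the KL of the conditional from the marginal equals the conditional expectation of the PMI, and weighting those values by P(Y = a) recovers the mutual information.

```python
import math

# Hypothetical joint table P(X = x, Y = y); rows index x, columns index y.
joint = [[0.30, 0.10],
         [0.05, 0.25],
         [0.10, 0.20]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

def pmi(x, y):
    return math.log(joint[x][y] / (px[x] * py[y]))

def kl_conditional_vs_marginal(a):
    """KL(P(X | Y = a) || P(X)) computed directly from the definition."""
    cond = [joint[x][a] / py[a] for x in range(len(px))]
    return sum(c * math.log(c / px[x]) for x, c in enumerate(cond) if c > 0)

def expected_pmi(a):
    """sum_x P(X = x | Y = a) * PMI(X = x, Y = a)."""
    return sum(joint[x][a] / py[a] * pmi(x, a) for x in range(len(px)))

for a in range(len(py)):
    print(a, kl_conditional_vs_marginal(a), expected_pmi(a))

# Averaging over a with weights P(Y = a) gives the mutual information I(X; Y).
mutual_info = sum(py[a] * expected_pmi(a) for a in range(len(py)))
print(mutual_info)
```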
  • 16. ELBO – Evidence Lower Bound • Consider now P(X), the Evidence. For any distribution Q over Z, Jensen's inequality gives: $\log P(X) \ge E_Q[\log p(X,Z)] - E_Q[\log Q(Z)]$ • The RHS is called the ELBO, and it is a lower bound of the LHS.
  • 17. Back to KL • With these analytical tools we can approximate the posterior: find Q such that $Q(Z) \approx P(Z|X)$: • $\min_Q KL\big(Q(Z)\,\|\,P(Z|X)\big)$ • $KL\big(Q\,\|\,P(Z|X)\big) = \log P(X) - ELBO \;\Rightarrow\; \log P(X) = KL\big(Q\,\|\,P(Z|X)\big) + ELBO$ • $\log P(X)$ does not depend on Q, hence maximizing the ELBO ⇔ minimizing the KL.
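A minimal discrete check of the bound and the decomposition, using a made-up joint p(X = x_obs, Z = z) over three hidden values (names hypothetical). For every Q, log P(X) equals KL + ELBO; the ELBO stays below log P(X) and reaches it exactly when Q is the posterior.

```python
import math

# Hypothetical model: hidden Z in {0, 1, 2}; joint values p(X = x_obs, Z = z) for the observed x_obs.
p_xz = [0.05, 0.15, 0.10]
log_evidence = math.log(sum(p_xz))              # log P(X)
posterior = [v / sum(p_xz) for v in p_xz]       # P(Z | X)

def elbo(q):
    """E_Q[log p(X, Z)] - E_Q[log Q(Z)]."""
    return sum(qz * (math.log(pz) - math.log(qz)) for qz, pz in zip(q, p_xz) if qz > 0)

def kl(q, p):
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)

for q in ([1/3, 1/3, 1/3], [0.6, 0.3, 0.1], posterior):
    # log P(X) = KL(Q || P(Z|X)) + ELBO holds for every Q;
    # the ELBO reaches log P(X) exactly when Q is the posterior.
    print(round(log_evidence, 6), round(kl(q, posterior) + elbo(q), 6), round(elbo(q), 6))
```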
  • 18. Let's use calculus! • We wish to optimize the ELBO. We can define a functional: $ELBO = E_Q[\log p(X,Z)] - E_Q[\log Q(Z)] = \int Q(Z)\log\frac{P(X,Z)}{Q(Z)}\,dZ = J(Q)$ • We could apply Euler–Lagrange here, but let's first try to simplify Q!
  • 19. Mean Field Theory (MFT) • The main idea comes from solving many-body problems (e.g. the Ising model). Assume a system of many bodies (atoms, other particles): 1. For each body, replace the particles it interacts with by their average. 2. Assume no correlations between interacting bodies. • We will use assumption 2 to simplify Q: $Q(z) = \prod_{i=1}^{n} q_i(z_i)$ (an approximation: the true posterior generally has correlations between the $z_i$)
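  • 20. MFT (cont.) • We can now use Euler–Lagrange with the constraint $\int q_i(z_i)\,dz_i = 1$ • We obtain $\log q_i(z_i) = \text{const} + E_{-i}[\log p(x,z)]$, where $E_{-i}$ is the expectation over all $q_j$, $j \ne i$: a Boltzmann distribution, $q_i \propto \exp\big(E_{-i}[\log p(x,z)]\big)$! • Did we win? No! Each $q_i$ depends on the other $q_j$'s, so updating one changes the optimum of the others.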
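The same update also follows without Euler–Lagrange, by a standard argument (see e.g. Bishop, Pattern Recognition and Machine Learning, Ch. 10) that we add here as a worked step: hold every $q_j$, $j \ne i$, fixed and view the ELBO as a function of $q_i$ alone. The constant collects the entropies of the other factors and the normalizer of $\tilde q_i$.

```latex
\begin{aligned}
ELBO(q_i) &= \int q_i(z_i)\, E_{-i}[\log p(x,z)]\, dz_i - \int q_i(z_i)\log q_i(z_i)\, dz_i + \text{const} \\
          &= -KL\big(q_i \,\|\, \tilde q_i\big) + \text{const},
\qquad \tilde q_i(z_i) \propto \exp\big(E_{-i}[\log p(x,z)]\big)
\end{aligned}
```

The KL is minimized (to zero) at $q_i = \tilde q_i$, which is exactly $\log q_i = E_{-i}[\log p(x,z)] + \text{const}$.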
  • 21. Coordinate Ascent Variational Inference (CAVI) • An iterative algorithm: 1. Construct a model P(X,Z) 2. Sequentially set each $\log q_i$ to $E_{-i}[\log p(x,z)]$ + constant 3. As always, repeat until the q's (equivalently, the ELBO) converge (Wikipedia, Blei) • https://www.youtube.com/watch?v=uKxtmkfeuxg • “Message passing”: Winn & Bishop; Minka 2005; Knowles & Minka 2011
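The three steps can be sketched on the smallest non-trivial case: two binary hidden variables with a made-up, strongly coupled joint p(x_obs, z1, z2) and a mean-field q1(z1) q2(z2). All numbers and names are hypothetical. Each sweep sets log q_i to E_{-i}[log p] + const (a softmax), and the recorded ELBO never decreases and stays below log P(X).

```python
import math

# Hypothetical joint p(x_obs, z1, z2) for two binary hidden variables (rows: z1, columns: z2).
# Strong coupling between z1 and z2 makes the mean-field approximation visibly imperfect.
log_p = [[math.log(0.30), math.log(0.02)],
         [math.log(0.03), math.log(0.15)]]

def normalize_exp(logits):
    """Turn log-weights into a probability vector (softmax)."""
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    s = sum(w)
    return [wi / s for wi in w]

def elbo(q1, q2):
    """E_q[log p(x, z1, z2)] - E_q[log q1(z1)] - E_q[log q2(z2)]."""
    e_log_p = sum(q1[a] * q2[b] * log_p[a][b] for a in range(2) for b in range(2))
    ent = -sum(v * math.log(v) for v in q1 + q2 if v > 0)
    return e_log_p + ent

def cavi(max_iter=100, tol=1e-10):
    q1, q2 = [0.5, 0.5], [0.9, 0.1]  # arbitrary starting point
    history = [elbo(q1, q2)]
    for _ in range(max_iter):
        # log q1(z1) = E_{q2}[log p(x, z1, z2)] + const
        q1 = normalize_exp([sum(q2[b] * log_p[a][b] for b in range(2)) for a in range(2)])
        # log q2(z2) = E_{q1}[log p(x, z1, z2)] + const
        q2 = normalize_exp([sum(q1[a] * log_p[a][b] for a in range(2)) for b in range(2)])
        history.append(elbo(q1, q2))
        if abs(history[-1] - history[-2]) < tol:
            break
    return q1, q2, history

q1, q2, history = cavi()
log_evidence = math.log(sum(math.exp(v) for row in log_p for v in row))
print(q1, q2, history[-1], log_evidence)
```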
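  • 22. Gaussian Example • Let $\lambda_0, a_0, b_0, \mu_0$ be hyperparameters: $\tau \sim \mathrm{Gamma}(a_0, b_0)$, $\mu \sim N\big(\mu_0, (\lambda_0\tau)^{-1}\big)$ • Data $X = \{x_1, x_2, \dots, x_n\}$ • Latent variables $Z = (\tau, \mu)$ • Hyperparameters $= \{a_0, b_0, \mu_0, \lambda_0\}$ • $P(X, \tau, \mu) = P(X \mid \tau, \mu)\,P(\mu \mid \tau)\,P(\tau)$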
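  • 23. Gaussian (cont.) • $P(X \mid \tau, \mu) = \prod_{i=1}^{n} N(x_i \mid \mu, \tau^{-1})$ • $P(\mu \mid \tau) = N\big(\mu \mid \mu_0, (\lambda_0\tau)^{-1}\big)$ • $P(\tau) = \mathrm{Gamma}(\tau \mid a_0, b_0)$ • MFT implies $q(\mu, \tau) = q(\mu)\,q(\tau)$ (not that accurate in this case: the true posterior couples μ and τ) • Using the mean-field update: $\ln q(\mu) = E_\tau[\ln P(X \mid \tau, \mu) + \ln P(\mu \mid \tau) + \ln P(\tau)] + C$ and $\ln q(\tau) = E_\mu[\ln P(X \mid \tau, \mu) + \ln P(\mu \mid \tau) + \ln P(\tau)] + C$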
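Carrying out these two expectations gives closed-form factors q(μ) = N(μ_n, 1/λ_n) and q(τ) = Gamma(a_n, b_n); the sketch below uses the standard result for this model (Bishop, PRML §10.1.3) and assumes the shape/rate convention for the Gamma, which the slide does not specify. The function name, default hyperparameters, and synthetic data are hypothetical. Only E[τ] couples the two factors, so CAVI reduces to iterating that one quantity.

```python
import random

def cavi_normal_gamma(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, max_iter=100, tol=1e-10):
    """Mean-field CAVI for x_i ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0).

    Gamma uses the shape/rate convention (an assumption).
    Returns q(mu) = N(mu_n, 1/lam_n) and q(tau) = Gamma(a_n, b_n) as (mu_n, lam_n, a_n, b_n).
    """
    n = len(x)
    xbar = sum(x) / n
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)  # mean of q(mu); does not depend on q(tau)
    a_n = a0 + (n + 1) / 2.0                     # shape of q(tau); also fixed
    e_tau = a0 / b0                              # initial guess for E[tau]
    for _ in range(max_iter):
        # Update q(mu): its precision scales with the current E[tau].
        lam_n = (lam0 + n) * e_tau
        # Update q(tau): uses E_mu[(c - mu)^2] = (c - mu_n)^2 + 1/lam_n.
        e_sq_data = sum((xi - mu_n) ** 2 for xi in x) + n / lam_n
        e_sq_prior = lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n)
        b_n = b0 + 0.5 * (e_sq_data + e_sq_prior)
        new_e_tau = a_n / b_n
        converged = abs(new_e_tau - e_tau) < tol
        e_tau = new_e_tau
        if converged:
            break
    lam_n = (lam0 + n) * e_tau
    return mu_n, lam_n, a_n, b_n

rng = random.Random(0)
data = [rng.gauss(2.0, 0.5) for _ in range(1000)]   # synthetic: true mu = 2, true tau = 1 / 0.25 = 4
mu_n, lam_n, a_n, b_n = cavi_normal_gamma(data)
print("E[mu] =", mu_n, " E[tau] =", a_n / b_n)
```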
  • 24. Stochastic VI • CAVI does not scale well to big data: every iteration must update the factor of every data item before the global factors move. • Stochastic VI: rather than full coordinate updates of the q's, we estimate the (natural) gradient of the ELBO from random subsamples of the data and follow it with a decreasing step size (similar in spirit to EM) • Used in LDA applications (David Blei et al.) • http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf • https://www.cs.princeton.edu/courses/archive/fall11/cos597C/reading/Blei2011.pdf
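A minimal sketch of the stochastic update in the form used by Hoffman et al., reduced (our choice, not the slides') to a conjugate model with only a global variable: x_i ~ N(μ, 1/τ) with τ known and μ ~ N(μ0, 1/λ0). Each step computes intermediate natural parameters as if the mini-batch were the whole data set, then moves the current parameters toward them with a Robbins-Monro step size; that convex combination is the natural-gradient step. Names and settings are hypothetical. Because this model is conjugate, the result can be compared with the exact posterior.

```python
import random

def svi_gaussian_mean(x, tau, mu0=0.0, lam0=1.0, batch_size=10, steps=2000, kappa=0.7, seed=0):
    """Stochastic VI for mu with known precision tau: x_i ~ N(mu, 1/tau), mu ~ N(mu0, 1/lam0).

    q(mu) is Gaussian; we track its natural parameters eta1 = precision * mean and eta2 = precision.
    """
    rng = random.Random(seed)
    n = len(x)
    eta1, eta2 = lam0 * mu0, lam0               # start from the prior
    for t in range(steps):
        batch = rng.sample(x, batch_size)
        scale = n / batch_size                  # treat the mini-batch as if it were the full data set
        eta1_hat = lam0 * mu0 + scale * tau * sum(batch)
        eta2_hat = lam0 + n * tau
        rho = (t + 1) ** (-kappa)               # Robbins-Monro step size
        # Natural-gradient step: convex combination of current and intermediate parameters.
        eta1 = (1 - rho) * eta1 + rho * eta1_hat
        eta2 = (1 - rho) * eta2 + rho * eta2_hat
    return eta1 / eta2, eta2                    # mean and precision of q(mu)

rng = random.Random(1)
tau = 4.0
data = [rng.gauss(2.0, 0.5) for _ in range(1000)]
mean_svi, prec_svi = svi_gaussian_mean(data, tau)
# Exact conjugate posterior (prior mu0 = 0, lam0 = 1) for comparison.
prec_exact = 1.0 + len(data) * tau
mean_exact = (1.0 * 0.0 + tau * sum(data)) / prec_exact
print(mean_svi, mean_exact, prec_svi, prec_exact)
```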
  • 25. Appendix – Ising Model • Ferromagnetism (Pierre Weiss) • Ising model: Lenz & Ernst Ising • We have the Hamiltonian $H(\sigma) = -h\sum_x \sigma_x - j\sum_{\langle x,y\rangle} \sigma_x\sigma_y$ ($\sigma_x$ is the spin of site (atom) x; $\langle x,y\rangle$ runs over nearest neighbors, so the second sum is over adjacent spins; h is the magnetic field and j is the coupling constant) • Consider the contribution of a single atom (spin): $\xi(\sigma_x) = -h\sigma_x - j\sigma_x\sum_{y} \sigma_y$ (y runs over the neighbors of x)
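  • 26. Ising Model (cont.) • Now we replace the second sum by its mean: $\xi(\sigma_x) = -h\sigma_x - j\sigma_x\sum_y \langle\sigma_y\rangle$ • We obtain $\xi(\sigma_x) = -h_0\sigma_x$, with the effective field $h_0 = h + j\sum_y \langle\sigma_y\rangle$ • If we use this approximation for every site of the system, we get $E_{mf} = E_0 - h_0\sum_x \sigma_x$ • The solution is a single-spin Boltzmann distribution: $P(s_i) = \frac{e^{a s_i}}{e^{a} + e^{-a}}$, $s_i \in \{-1, +1\}$, with $a = h_0/(k_B T)$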
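The single-spin distribution closes the loop: its mean is P(+1) − P(−1) = tanh(a), and in the mean-field picture that mean must equal ⟨σ⟩ itself. The sketch assumes every site has `coord` neighbors sharing the same ⟨σ⟩ and takes a = βh_0 with β = 1/(k_B T); the function name and parameter values are hypothetical. It solves the resulting self-consistency equation m = tanh(β(h + j·coord·m)) by fixed-point iteration.

```python
import math

def mean_field_magnetization(h, j, coord, beta, m0=0.5, max_iter=10_000, tol=1e-12):
    """Solve m = tanh(beta * h0) with effective field h0 = h + j * coord * m.

    coord is the number of nearest neighbors; each neighbor sum is replaced by coord * <sigma>.
    """
    m = m0
    for _ in range(max_iter):
        h0 = h + j * coord * m           # effective field felt by one spin
        a = beta * h0
        # Mean of P(s) = e^{a s} / (e^a + e^{-a}) over s in {-1, +1} is tanh(a).
        new_m = math.tanh(a)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m

# beta * j * coord < 1: the zero-field magnetization vanishes.
# beta * j * coord > 1: a spontaneous magnetization appears.
print(mean_field_magnetization(h=0.0, j=1.0, coord=4, beta=0.125))
print(mean_field_magnetization(h=0.01, j=1.0, coord=4, beta=0.5))
```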
  • 27. Remarks: 1. Maxwell speeds: the use of independence for “achieving” the normal distribution 2. RBM 3. Conditional Random Field (CRF) 4. Cybenko, G. (1989), “Approximation by superpositions of a sigmoidal function” 5. Kullback & Leibler, “On Information and Sufficiency” 6. David Blei, Latent Dirichlet Allocation (and the rest of his papers) 7. Expectation-maximization algorithm (EM, Baum-Welch)
  • 28. VI in Python • https://gist.github.com/AustinRochford/91cabfd2e1eecf9049774ce529ba4c16 • https://www.cs.toronto.edu/~duvenaud/papers/blackbox.pdf • Edward: edwardlib.org/ • PyMC3: http://pymc-devs.github.io/pymc3/notebooks/bayesian_neural_network_advi.html • PyStan: http://mc-stan.org/interfaces/pystan.html • TensorFlow: https://github.com/carpedm20/variational-text-tensorflow
  • 29. VI in Other Languages • R: https://artax.karlin.mff.cuni.cz/r-help/library/varbvs/html/00Index.html • R: https://cran.r-project.org/web/packages/varbvs/varbvs.pdf • R: https://github.com/kieranrcampbell/clvm (claims to implement CAVI) • Blog on Scala: http://alexminnaar.com/online-latent-dirichlet-allocation-the-best-option-for-topic-modeling-with-large-data-sets.html • Spark MLlib: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala