Variational Inference
By
Natan Katz
Natan.Katz@gmail.com
Framework –Bayesian Inference
• The inputs:
1. A sample of length n (numbers, categories, vectors, images)
We call this entity the Evidence
2. An assumption about the probabilistic structure that generates the sample: the Hypothesis
Posterior = P(H|E)
Objective: update our information about the Hypothesis using the Evidence
Bayesian Inference- into formulas
• Estimating the hypothesis given the evidence:
• Z, X are random variables
We wish to have P(Z|X).
• Bayes formula:
$P(Z \mid X) = \frac{P(Z, X)}{P(X)}$
Bayesian inference is therefore about working with the RHS terms.
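As a minimal sketch of the formula above, assume a small discrete joint table (the numbers and the helper name `posterior` are illustrative, not from the slides). The posterior is then one column of the joint divided by the evidence:

```python
import numpy as np

# Hypothetical joint table P(Z, X): rows index Z (2 hidden states),
# columns index X (3 observable values). Entries sum to 1.
joint = np.array([[0.10, 0.25, 0.05],
                  [0.30, 0.05, 0.25]])

def posterior(joint, x):
    """Return P(Z | X = x) = P(Z, X = x) / P(X = x)."""
    evidence = joint[:, x].sum()   # P(X = x) = sum over z of P(z, x)
    return joint[:, x] / evidence

post = posterior(joint, x=1)
print(post)
```

Here the evidence is a two-term sum. The rest of the talk is about models where this sum (or integral) is too expensive to compute.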
RHS terms
• Hidden Variables (Hypothesis) (Z): the variables of the mechanism that
generates the sample
(e.g. the topic distribution in a corpus or the Gaussians in a GMM)
1. The values are not given
2. We have the joint distribution P(Z,X)!
• Observed Data (Evidence) (X): the sample that we actually have.
1. We know every value
2. We may know nothing about its distribution
RHS terms (Cont.)
• In some case studies P(X) is intractable or extremely difficult to
calculate.
• We cannot obtain the conditional distribution based on Bayes
formula’s terms
• Variational inference offers a class of algorithms to solve this
problem: Approximating posterior for difficult P(X)
Examples:
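GMM (known number of Gaussians and known variance)
We have K Gaussians
Draw $\mu_k \sim N(0, \tau)$ for k = 1…K (τ is positive)
For each sample j = 1…n
$z_j \sim \mathrm{Cat}(1/K, 1/K, \dots, 1/K)$
$x_j \sim N(\mu_{z_j}, \sigma)$
$p(x_{1:n}) = \int_{\mu_{1:K}} \prod_{l=1}^{K} P(\mu_l) \prod_{j=1}^{n} \sum_{z_j} p(z_j)\, P(x_j \mid \mu_{z_j})\, d\mu_{1:K}$
=> Pretty nasty! (expanding the product of sums gives K^n terms)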
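The generative process above can be sketched in a few lines of NumPy. This is only an illustration: the function name `sample_gmm` and the default values are assumptions, and τ and σ are treated here as variances, which the slide does not specify.

```python
import numpy as np

def sample_gmm(n, K, tau=5.0, sigma=1.0, seed=0):
    """Draw component means, assignments and data from the GMM on the slide.

    Assumption: tau and sigma are variances (the slide leaves this open).
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, np.sqrt(tau), size=K)   # mu_k ~ N(0, tau)
    z = rng.integers(0, K, size=n)               # z_j ~ Cat(1/K, ..., 1/K)
    x = rng.normal(mu[z], np.sqrt(sigma))        # x_j ~ N(mu_{z_j}, sigma)
    return mu, z, x

mu, z, x = sample_gmm(n=500, K=3)
```

Sampling forward is easy. The hard direction is recovering `mu` and `z` from `x` alone, which requires the evidence integral above.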
Examples - LDA
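Corpus D; every document has length N
N ∼ Poisson(ξ)
θ ∼ Dir(α)
β: the topics (each a distribution over words; fixed or Dirichlet-distributed)
For each of the N words $w_n$:
a topic $z_n \sim \mathrm{Cat}(\theta)$
$w_n \sim p(w_n \mid z_n, \beta)$
$\beta_{ij} = P(w_i \mid z_j)$
$p(w \mid \alpha, \beta) = \int P(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\, d\theta$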
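A rough sketch of this generative story for a single document, assuming a fixed topic matrix `beta` with `beta[i, j] = P(word i | topic j)` as on the slide. The vocabulary size, the two toy topics and the function name `sample_document` are hypothetical.

```python
import numpy as np

def sample_document(alpha, beta, xi=8.0, seed=0):
    """Sample one LDA document.

    alpha: (T,) Dirichlet parameter.
    beta:  (V, T) matrix, column j is the word distribution of topic j.
    """
    rng = np.random.default_rng(seed)
    n_words = rng.poisson(xi)                                # N ~ Poisson(xi)
    theta = rng.dirichlet(alpha)                             # theta ~ Dir(alpha)
    topics = rng.choice(len(alpha), size=n_words, p=theta)   # z_n ~ Cat(theta)
    words = np.array(
        [rng.choice(beta.shape[0], p=beta[:, t]) for t in topics], dtype=int
    )                                                        # w_n ~ p(w | z_n, beta)
    return theta, topics, words

# Toy setup: 4 words, 2 topics; each column of beta sums to 1.
alpha = np.array([0.5, 0.5])
beta = np.array([[0.5, 0.1],
                 [0.3, 0.1],
                 [0.1, 0.3],
                 [0.1, 0.5]])
theta, topics, words = sample_document(alpha, beta, seed=1)
```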
Sampling
• The common solution for estimating distributions is sampling:
1. MCMC
• Metropolis-Hastings
• Gibbs
2. RBM (mostly by Gibbs sampling)
3. Hybrid Monte Carlo
• Today we won't talk about these methods!
Sampling Vs. Analysis
• Sampling
1. The solutions are asymptotically exact (given enough samples)
2. Numerically expensive
• Deterministic
1. Solutions are cheaper
2. Less accurate
3. Non-conjugate problem
4. An optimization process
Sampling Vs. Analysis cont.
• MCMC methods are good for small data where accuracy is essential
• When we have big data and many models should be tested, VI
methods have an advantage
Can we do something analytically?
• Can we analytically approximate the posterior?
• Can we find a distribution that is close to the posterior and measure the distance well?
• When the framework is a vector space
1. Calculus allows us to find extrema easily
2. We are endowed with $L^p$ metrics (typically p = 1, 2)
• Our domain is a space of functions and the functionals defined on it
We need:
1. An analytical method to find the extrema of a functional
2. A nice metric
Calculus of Variations
• Consider the following
F, y functions (with all the “extras”)
$J(y) = \int F(y, y', t)\, dt$ (y is differentiable)
If y is an extremum of J, it satisfies the Euler-Lagrange equation
$\frac{\partial F}{\partial y} - \frac{d}{dt}\left(\frac{\partial F}{\partial y'}\right) = 0$
• Example: maximum entropy principle
Generally speaking this domain is a calculus for functional spaces hence it is
beneficial for optimizations
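A short worked version of the maximum entropy example, written as a sketch. The choice of constraints (fixed mean μ and variance σ²) and the multiplier names λ₀, λ₁, λ₂ are assumptions made for illustration. The integrand does not depend on p′, so the Euler-Lagrange equation reduces to ∂F/∂p = 0:

```latex
\[
\begin{aligned}
&\text{maximize } H[p] = -\int p(x)\,\log p(x)\,dx
\quad\text{s.t.}\quad \int p = 1,\ \int x\,p = \mu,\ \int (x-\mu)^2 p = \sigma^2 \\[4pt]
&F(p, x) = -p\log p + \lambda_0\, p + \lambda_1\, x\, p + \lambda_2\,(x-\mu)^2 p \\[4pt]
&\frac{\partial F}{\partial p} = -\log p - 1 + \lambda_0 + \lambda_1 x + \lambda_2 (x-\mu)^2 = 0 \\[4pt]
&\Rightarrow\ p(x) = \exp\!\big(\lambda_0 - 1 + \lambda_1 x + \lambda_2 (x-\mu)^2\big)
\end{aligned}
\]
```

Imposing the three constraints gives λ₁ = 0 and λ₂ = −1/(2σ²), so the maximizer is the Gaussian N(μ, σ²).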
Calculus of Var. Cont.
The fundamental lemma of the calculus of variations:
If M is continuous and, for all differentiable h,
$\int_a^b M(x)\, h(x)\, dx = 0$, then $M \equiv 0$ on (a, b) (cf. Cybenko)
KL (Kullback-Leibler) Divergence
• A measure of distance between distributions (not a true metric)
* “On Information and Sufficiency” 1951 (Ann. Math. Statist.)
Properties:
1. Non-symmetric (it actually measures a relative distance: which distribution
P observes as the closest)
2. Non-negative and convex; 0 is obtained only for KL(p, p) (proof by concavity of log and Jensen's inequality, or by Lagrange
multipliers)
3. The distance between P(x,y) and p(x)*p(y) is 0 exactly when X and Y are independent (in general it is the mutual information)
Usage:
Cross Entropy = H(p) + KL(p, q)
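The properties listed above are easy to check numerically. The sketch below assumes discrete distributions with strictly positive entries; the vectors `p` and `q` are made-up examples.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) in nats."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_x p(x) log q(x)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

def kl(p, q):
    """KL(p || q) = sum_x p(x) log(p(x) / q(x)); assumes p, q > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
print(kl(p, q), kl(q, p))                          # the two directions differ
print(cross_entropy(p, q) - entropy(p) - kl(p, q)) # zero up to rounding
```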
PMI (Pointwise Mutual Information)
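• Let X, Y be random variables
• $\mathrm{PMI}(X=a, Y=b) = \log \frac{P(X=a, Y=b)}{P(X=a)\, P(Y=b)}$
• $KL(P(X \mid Y=a) \,\|\, Q(X)) = \sum_x P(X=x \mid Y=a)\, \mathrm{PMI}(X=x, Y=a)$, where Q(x) = P(X=x) is the marginal of X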
• What does this term mean?
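One way to read the identity: the KL term is the average PMI of X with the observed value of Y, i.e. how much observing Y = a shifts our belief about X. The check below uses a made-up joint table and takes Q to be the marginal of X, which is an assumption needed for the identity to hold.

```python
import numpy as np

# Hypothetical joint P(X = x, Y = y): rows index x, columns index y.
joint = np.array([[0.10, 0.30],
                  [0.25, 0.05],
                  [0.05, 0.25]])
px = joint.sum(axis=1)                      # marginal P(X)
py = joint.sum(axis=0)                      # marginal P(Y)
pmi = np.log(joint / np.outer(px, py))      # PMI(x, y) for every pair

def kl_conditional_vs_marginal(y):
    """KL(P(X | Y = y) || P(X))."""
    cond = joint[:, y] / py[y]
    return np.sum(cond * np.log(cond / px))

def expected_pmi(y):
    """sum_x P(X = x | Y = y) * PMI(x, y)."""
    cond = joint[:, y] / py[y]
    return np.sum(cond * pmi[:, y])

# Averaging over y as well gives the mutual information I(X; Y).
mutual_information = sum(py[y] * expected_pmi(y) for y in range(2))
```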
ELBO- Evidence Lower Bound
Consider now P(X), the Evidence
We have:
$\log P(X) \ge E_Q[\log P(X, Z)] - E_Q[\log Q(Z)]$
The RHS is called the ELBO, and it is a lower bound on the LHS
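The bound follows from Jensen's inequality in one line. A sketch of the standard argument, assuming Q(Z) > 0 wherever P(X, Z) > 0:

```latex
\[
\begin{aligned}
\log P(X) &= \log \int P(X, Z)\, dZ
           = \log \int Q(Z)\, \frac{P(X, Z)}{Q(Z)}\, dZ
           = \log E_Q\!\left[\frac{P(X, Z)}{Q(Z)}\right] \\
          &\ge E_Q\!\left[\log \frac{P(X, Z)}{Q(Z)}\right]
           = E_Q[\log P(X, Z)] - E_Q[\log Q(Z)]
\end{aligned}
\]
```

The inequality step uses the concavity of the logarithm.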
Back to KL
• Having the requested analytical tools, we can approximate the
posterior: find Q s.t. Q(Z) ≈ P(Z|X):
• min KL(Q(Z) || P(Z|X))
KL(Q || P(Z|X)) = log P(X) − ELBO
=> log P(X) = KL(Q || P(Z|X)) + ELBO
P is fixed, hence: maximizing the ELBO <=> minimizing the KL
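The decomposition can be verified on a toy problem where everything is computable. This sketch assumes a latent variable with three states and a fixed observation. The numbers in `joint_xz` and the trial distribution `q` are invented for illustration.

```python
import numpy as np

# Hypothetical values of P(X = x_obs, Z = z) for z = 0, 1, 2.
joint_xz = np.array([0.02, 0.10, 0.08])
log_evidence = np.log(joint_xz.sum())       # log P(X)
posterior = joint_xz / joint_xz.sum()       # P(Z | X)

def elbo(q):
    """E_q[log P(X, Z)] - E_q[log q(Z)] for a discrete q."""
    return np.sum(q * (np.log(joint_xz) - np.log(q)))

def kl(q, p):
    """KL(q || p) for strictly positive discrete distributions."""
    return np.sum(q * np.log(q / p))

q = np.array([0.2, 0.5, 0.3])               # an arbitrary approximation
gap = log_evidence - elbo(q)                # should equal KL(q || posterior)
print(gap, kl(q, posterior))
```

Setting `q` to the exact posterior closes the gap, so the ELBO then equals log P(X).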
Let’s use Calculus!
• We wish to optimize the ELBO term.
We can define a functional:
$\mathrm{ELBO} = E_Q[\log P(X, Z)] - E_Q[\log Q(Z)] = \int Q(Z) \log \frac{P(X, Z)}{Q(Z)}\, dZ = J(Q)$
We can go to Euler-Lagrange here, but let's try and simplify Q!
Mean Field Theory-MFT
• The main idea is solving a many-body problem (Ising model)
Assume a system of many bodies (atoms, other particles)
1. For each body, replace the particles it interacts with by their average.
2. Assume no correlations between interacting bodies
We will use item 2 to simplify Q
$Q(z) = \prod_{i=1}^{n} q_i(z_i)$ (Obviously not true)
MFT –cont.
• We can now use Euler-Lagrange with the constraint
$\int q_i(z_i)\, dz_i = 1$
• We obtain
$\log q_i(z_i) = \mathrm{const} + E_{-i}[\log P(x, z)]$ (a Boltzmann distribution!)
Did we win? No!
Note that changing each $q_i$ may change the other $q_j$'s
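For completeness, a sketch of how the update falls out of the ELBO functional under the factorized Q. The multiplier λ is introduced here for the normalization constraint, and E₋ᵢ denotes the expectation over all factors except qᵢ:

```latex
\[
\begin{aligned}
J(q_i) &= \int q_i(z_i)\, E_{-i}[\log P(x, z)]\, dz_i
          - \int q_i(z_i) \log q_i(z_i)\, dz_i + \text{const} \\[4pt]
F(q_i) &= q_i\, E_{-i}[\log P(x, z)] - q_i \log q_i + \lambda\, q_i \\[4pt]
\frac{\partial F}{\partial q_i} &= E_{-i}[\log P(x, z)] - \log q_i - 1 + \lambda = 0 \\[4pt]
\Rightarrow\ q_i(z_i) &\propto \exp\!\big(E_{-i}[\log P(x, z)]\big)
\end{aligned}
\]
```

The integrand has no dependence on the derivative of qᵢ, so only the ∂F/∂qᵢ term of the Euler-Lagrange equation survives.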
Coordinate Ascent Variational Inference
CAVI
• An iterative algorithm
1. Construct a model P(X,Z)
2. Sequentially set each $\log q_i$ to $E_{-i}[\log P(x, z)]$ + constant
3. As always, we repeat until the q's converge
(Wikipedia,Blei) https://www.youtube.com/watch?v=uKxtmkfeuxg
“Message passing” – Winn & Bishop
Minka 2005, Knowles & Minka 2011
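The iteration can be sketched on the smallest possible case: two binary latent variables with a made-up joint table and a factorized Q = q1 · q2. All names and numbers here are illustrative. The point is only to show the alternating updates and that the ELBO never decreases.

```python
import numpy as np

# Hypothetical joint P(x_obs, z1, z2) over two binary latents (rows: z1, cols: z2).
log_joint = np.log(np.array([[0.30, 0.05],
                             [0.10, 0.15]]))

def normalise(log_q):
    """Exponentiate and normalise; the dropped constant is the '+ constant'."""
    q = np.exp(log_q - log_q.max())
    return q / q.sum()

def elbo(q1, q2):
    q = np.outer(q1, q2)
    return np.sum(q * (log_joint - np.log(q)))

def cavi(n_iters=50, tol=1e-10):
    q1 = np.array([0.5, 0.5])
    q2 = np.array([0.6, 0.4])
    history = [elbo(q1, q2)]
    for _ in range(n_iters):
        q1 = normalise(log_joint @ q2)   # log q1(z1) = E_{q2}[log P(z1, z2)] + const
        q2 = normalise(q1 @ log_joint)   # log q2(z2) = E_{q1}[log P(z1, z2)] + const
        history.append(elbo(q1, q2))
        if history[-1] - history[-2] < tol:
            break
    return q1, q2, history

q1, q2, history = cavi()
```

The final ELBO stays below log P(x_obs) = log 0.6, because this joint does not factorize and the mean-field family cannot represent its posterior exactly.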
Gaussian Example
• Let $\lambda_0, a_0, b_0, \mu_0$ be hyper-parameters
$\tau \sim \mathrm{Gamma}(a_0, b_0)$
$\mu \sim N(\mu_0, (\lambda_0 \tau)^{-1})$
Data = $\{x_1, x_2, \dots, x_n\}$
Latent variables Z = (τ, μ)
Hyper-parameters = $\{a_0, b_0, \mu_0, \lambda_0\}$
P(X, τ, μ) = P(X | τ, μ) P(μ | τ) P(τ)
Gaussian Cont.
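• $P(X \mid \tau, \mu) = \prod_{i=1}^{n} N(x_i \mid \mu, \tau^{-1})$ (τ is the precision)
• $P(\mu \mid \tau) = N(\mu \mid \mu_0, (\lambda_0 \tau)^{-1})$
• $P(\tau) = \mathrm{Gamma}(\tau \mid a_0, b_0)$
• MFT implies:
q(μ, τ) = q(μ) q(τ) (Not that accurate in this case!)
Using the ELBO formula:
$\ln q(\mu) = E_\tau[\ln P(X \mid \tau, \mu) + \ln P(\mu \mid \tau) + \ln P(\tau)] + C$
$\ln q(\tau) = E_\mu[\ln P(X \mid \tau, \mu) + \ln P(\mu \mid \tau) + \ln P(\tau)] + C$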
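Working out the two expectations gives a Gaussian q(μ) = N(μ_n, 1/λ_n) and a Gamma q(τ) = Gamma(a_n, b_n). That closed form is the standard textbook result for this model and is not shown on the slide. The sketch below iterates these updates, assuming the Gamma is parameterized by shape and rate. The function name and the default hyper-parameters are hypothetical.

```python
import numpy as np

def cavi_normal_gamma(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, n_iters=100, tol=1e-10):
    """Mean-field updates for q(mu) = N(mu_n, 1/lam_n), q(tau) = Gamma(a_n, b_n)."""
    x = np.asarray(x, dtype=float)
    n, sum_x, sum_x2 = x.size, x.sum(), np.sum(x ** 2)
    mu_n = (lam0 * mu0 + sum_x) / (lam0 + n)    # does not depend on q(tau)
    a_n = a0 + (n + 1) / 2.0                    # does not depend on q(mu)
    e_tau = a0 / b0                             # initial guess for E[tau]
    for _ in range(n_iters):
        lam_n = (lam0 + n) * e_tau              # precision of q(mu)
        e_mu, e_mu2 = mu_n, mu_n ** 2 + 1.0 / lam_n
        # b_n = b0 + 0.5 * E_mu[ sum_i (x_i - mu)^2 + lam0 * (mu - mu0)^2 ]
        b_n = b0 + 0.5 * (sum_x2 - 2.0 * e_mu * sum_x + n * e_mu2
                          + lam0 * (e_mu2 - 2.0 * mu0 * e_mu + mu0 ** 2))
        new_e_tau = a_n / b_n                   # E[tau] under q(tau)
        converged = abs(new_e_tau - e_tau) < tol
        e_tau = new_e_tau
        if converged:
            break
    return mu_n, lam_n, a_n, b_n

# Synthetic data: true mean 3, true precision 4 (standard deviation 0.5).
data = np.random.default_rng(0).normal(3.0, 0.5, size=2000)
mu_n, lam_n, a_n, b_n = cavi_normal_gamma(data)
```

Only `lam_n` and `b_n` are coupled, so the loop is effectively a fixed-point iteration on E[τ].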
Stochastic - VI
• CAVI does not scale well to big data (every update touches every item)
• Stochastic VI: rather than updating the q's one by one, we calculate the gradient of the
ELBO and optimize its parameters (similar to EM)
• Used in LDA applications (David Blei et al.)
• http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf
• https://www.cs.princeton.edu/courses/archive/fall11/cos597C/reading/Blei2011.pdf
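A toy sketch of the stochastic idea, not the LDA algorithm itself. It assumes the model x_i ~ N(μ, 1) with prior μ ~ N(0, 1), and takes a noisy natural-gradient step on the parameters of q(μ) using one minibatch per iteration. The step-size schedule ρ_t = t^(−κ) and all names (`svi_gaussian_mean`, `kappa`, `batch_size`) are assumptions made for illustration.

```python
import numpy as np

def svi_gaussian_mean(x, batch_size=100, n_steps=2000, kappa=0.7, seed=0):
    """Stochastic VI for q(mu) = N(m, v) in the model x_i ~ N(mu, 1), mu ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    # Natural parameters of q(mu): precision and precision * mean (start at the prior).
    prec, prec_mean = 1.0, 0.0
    for t in range(1, n_steps + 1):
        rho = t ** (-kappa)                               # decaying step size
        batch = rng.choice(x, size=batch_size, replace=False)
        # Optimal parameters if the minibatch were replicated to the full data size.
        prec_hat = 1.0 + n
        prec_mean_hat = (n / batch_size) * batch.sum()
        # Move part of the way towards the noisy optimum.
        prec = (1.0 - rho) * prec + rho * prec_hat
        prec_mean = (1.0 - rho) * prec_mean + rho * prec_mean_hat
    return prec_mean / prec, 1.0 / prec                   # mean and variance of q(mu)

data = np.random.default_rng(1).normal(2.0, 1.0, size=10000)
m, v = svi_gaussian_mean(data)
exact_mean = data.sum() / (data.size + 1)                 # exact posterior mean
```

Each step touches only `batch_size` items, which is the property that makes the approach attractive for large data sets.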
Appendix- Ising Model
• Ferromagnetism (Pierre Weiss)
• Ising Model - Lenz & Ernst Ising
• We have a Hamiltonian
$H(\sigma) = -h \sum_x \sigma_x - j \sum_{\langle x, y \rangle} \sigma_x \sigma_y$
($\sigma_x$ is the spin of a site (atom); x, y are nearest neighbors, hence the sum is over
adjacent spins; h is the magnetic field and j is the “coupling constant”)
• Consider the contribution of a single atom (spin):
$\xi(\sigma_x) = -h \sigma_x - j \sigma_x \sum_y \sigma_y$
(y runs over the nearest neighbors of x)
Ising Model(cont.)
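• Now we replace the second summation by its mean:
$\xi(\sigma_x) = -h \sigma_x - j \sigma_x \langle \sum_y \sigma_y \rangle$. We obtain
$\xi(\sigma_x) = -h_0 \sigma_x$, with the effective field $h_0 = h + j \langle \sum_y \sigma_y \rangle$
• Note that if we use this approximation to average the entire
system, we obtain:
$E_{mf} = E_0 - h \sum_x \sigma_x$
The solution is the single-spin Boltzmann distribution:
$P(s_i) = \frac{e^{a s_i}}{e^{a s_i} + e^{-a s_i}}$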
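The single-spin distribution has mean ⟨s⟩ = tanh(a). If we take a = β · h_0 and assume every site has the same number of neighbors with the same mean spin m, we get the self-consistency condition m = tanh(β (h + j · n_neighbors · m)). The inverse temperature β, the neighbor count and the function names below are assumptions that do not appear on the slides.

```python
import math

def spin_prob(s, a):
    """P(s) = exp(a s) / (exp(a s) + exp(-a s)) for s in {-1, +1}."""
    return math.exp(a * s) / (math.exp(a * s) + math.exp(-a * s))

def mean_field_magnetisation(h, j, n_neighbors=4, beta=1.0, n_iters=1000, tol=1e-12):
    """Fixed-point iteration for m = tanh(beta * (h + j * n_neighbors * m))."""
    m = 0.5                                       # arbitrary starting guess
    for _ in range(n_iters):
        a = beta * (h + j * n_neighbors * m)      # effective field times beta
        new_m = math.tanh(a)                      # mean spin under P(s)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m

m = mean_field_magnetisation(h=0.1, j=0.2)
```

This is the same pattern as CAVI: each factor is updated given the current averages of the others until nothing changes.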
Remarks
1. Maxwell speeds: the use of independence for “achieving” the normal
distribution
2. RBM
3. Conditional Random Field (CRF)
4. Cybenko, G. (1989) “Approximation by superpositions of a sigmoidal
function”
5. Kullback & Leibler, “On Information and Sufficiency”
6. David Blei, Latent Dirichlet Allocation (and the rest of his papers)
7. Expectation-maximization algorithm (EM, Baum-Welch)
VI in Python
• https://gist.github.com/AustinRochford/91cabfd2e1eecf9049774ce529ba4c16
• https://www.cs.toronto.edu/~duvenaud/papers/blackbox.pdf
• Edward - edwardlib.org/
• PyMC3 - http://pymc-devs.github.io/pymc3/notebooks/bayesian_neural_network_advi.html
• PyStan - http://mc-stan.org/interfaces/pystan.html
• TensorFlow - https://github.com/carpedm20/variational-text-tensorflow
VI –Other Languages.
• R - https://artax.karlin.mff.cuni.cz/r-help/library/varbvs/html/00Index.html
• R - https://cran.r-project.org/web/packages/varbvs/varbvs.pdf
• R - https://github.com/kieranrcampbell/clvm (claims to implement CAVI)
• Blog on Scala - http://alexminnaar.com/online-latent-dirichlet-allocation-the-best-option-for-topic-modeling-with-large-data-sets.html
• Spark MLlib - https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala

More Related Content

What's hot

Variational Inference
Variational InferenceVariational Inference
Variational InferenceTushar Tank
 
Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.Alexander Decker
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionFlavio Morelli
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerChristian Robert
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsChristian Robert
 
"reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli..."reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli...Christian Robert
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsStefano Cabras
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsCaleb (Shiqiang) Jin
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerChristian Robert
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distancesChristian Robert
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big DataChristian Robert
 

What's hot (20)

Nested sampling
Nested samplingNested sampling
Nested sampling
 
Variational Inference
Variational InferenceVariational Inference
Variational Inference
 
CLIM Fall 2017 Course: Statistics for Climate Research, Climate Informatics -...
CLIM Fall 2017 Course: Statistics for Climate Research, Climate Informatics -...CLIM Fall 2017 Course: Statistics for Climate Research, Climate Informatics -...
CLIM Fall 2017 Course: Statistics for Climate Research, Climate Informatics -...
 
Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle Introduction
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
CLIM Fall 2017 Course: Statistics for Climate Research, Geostats for Large Da...
CLIM Fall 2017 Course: Statistics for Climate Research, Geostats for Large Da...CLIM Fall 2017 Course: Statistics for Climate Research, Geostats for Large Da...
CLIM Fall 2017 Course: Statistics for Climate Research, Geostats for Large Da...
 
"reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli..."reflections on the probability space induced by moment conditions with impli...
"reflections on the probability space induced by moment conditions with impli...
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-Likelihoods
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear models
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like sampler
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distances
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
 

Similar to Variational Inference Explained

Foundation of KL Divergence
Foundation of KL DivergenceFoundation of KL Divergence
Foundation of KL DivergenceNatan Katz
 
GAN for Bayesian Inference objectives
GAN for Bayesian Inference objectivesGAN for Bayesian Inference objectives
GAN for Bayesian Inference objectivesNatan Katz
 
NICE Research -Variational inference project
NICE Research -Variational inference projectNICE Research -Variational inference project
NICE Research -Variational inference projectNatan Katz
 
Discrete Structure Lecture #5 & 6.pdf
Discrete Structure Lecture #5 & 6.pdfDiscrete Structure Lecture #5 & 6.pdf
Discrete Structure Lecture #5 & 6.pdfMuhammadUmerIhtisham
 
1004_theorem_proving_2018.pptx on the to
1004_theorem_proving_2018.pptx on the to1004_theorem_proving_2018.pptx on the to
1004_theorem_proving_2018.pptx on the tofariyaPatel
 
Finite mathematics
Finite mathematicsFinite mathematics
Finite mathematicsIgor Rivin
 
machine learning.pptx
machine learning.pptxmachine learning.pptx
machine learning.pptxAbdusSadik
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...eXascale Infolab
 
A Family Of Extragradient Methods For Solving Equilibrium Problems
A Family Of Extragradient Methods For Solving Equilibrium ProblemsA Family Of Extragradient Methods For Solving Equilibrium Problems
A Family Of Extragradient Methods For Solving Equilibrium ProblemsYasmine Anino
 
An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)ananth
 
Probability statistics assignment help
Probability statistics assignment helpProbability statistics assignment help
Probability statistics assignment helpHomeworkAssignmentHe
 
CS221: HMM and Particle Filters
CS221: HMM and Particle FiltersCS221: HMM and Particle Filters
CS221: HMM and Particle Filterszukun
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfJunghyun Lee
 
clustering tendency
clustering tendencyclustering tendency
clustering tendencyAmir Shokri
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking componentsChristian Robert
 
Nber slides11 lecture2
Nber slides11 lecture2Nber slides11 lecture2
Nber slides11 lecture2NBER
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMCPierre Jacob
 

Similar to Variational Inference Explained (20)

Foundation of KL Divergence
Foundation of KL DivergenceFoundation of KL Divergence
Foundation of KL Divergence
 
GAN for Bayesian Inference objectives
GAN for Bayesian Inference objectivesGAN for Bayesian Inference objectives
GAN for Bayesian Inference objectives
 
NICE Research -Variational inference project
NICE Research -Variational inference projectNICE Research -Variational inference project
NICE Research -Variational inference project
 
Discrete Structure Lecture #5 & 6.pdf
Discrete Structure Lecture #5 & 6.pdfDiscrete Structure Lecture #5 & 6.pdf
Discrete Structure Lecture #5 & 6.pdf
 
Neural ODE
Neural ODENeural ODE
Neural ODE
 
1004_theorem_proving_2018.pptx on the to
1004_theorem_proving_2018.pptx on the to1004_theorem_proving_2018.pptx on the to
1004_theorem_proving_2018.pptx on the to
 
Finite mathematics
Finite mathematicsFinite mathematics
Finite mathematics
 
machine learning.pptx
machine learning.pptxmachine learning.pptx
machine learning.pptx
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
 
A Family Of Extragradient Methods For Solving Equilibrium Problems
A Family Of Extragradient Methods For Solving Equilibrium ProblemsA Family Of Extragradient Methods For Solving Equilibrium Problems
A Family Of Extragradient Methods For Solving Equilibrium Problems
 
An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)
 
Probability statistics assignment help
Probability statistics assignment helpProbability statistics assignment help
Probability statistics assignment help
 
Ch01
Ch01Ch01
Ch01
 
CS221: HMM and Particle Filters
CS221: HMM and Particle FiltersCS221: HMM and Particle Filters
CS221: HMM and Particle Filters
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdf
 
clustering tendency
clustering tendencyclustering tendency
clustering tendency
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
Nber slides11 lecture2
Nber slides11 lecture2Nber slides11 lecture2
Nber slides11 lecture2
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMC
 
Curve fitting
Curve fittingCurve fitting
Curve fitting
 

More from Natan Katz

AI for PM.pptx
AI for PM.pptxAI for PM.pptx
AI for PM.pptxNatan Katz
 
SGLD Berlin ML GROUP
SGLD Berlin ML GROUPSGLD Berlin ML GROUP
SGLD Berlin ML GROUPNatan Katz
 
Ancestry, Anecdotes & Avanan -DL for Amateurs
Ancestry, Anecdotes & Avanan -DL for Amateurs Ancestry, Anecdotes & Avanan -DL for Amateurs
Ancestry, Anecdotes & Avanan -DL for Amateurs Natan Katz
 
Deep VI with_beta_likelihood
Deep VI with_beta_likelihoodDeep VI with_beta_likelihood
Deep VI with_beta_likelihoodNatan Katz
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement LearningNatan Katz
 

More from Natan Katz (11)

final_v.pptx
final_v.pptxfinal_v.pptx
final_v.pptx
 
AI for PM.pptx
AI for PM.pptxAI for PM.pptx
AI for PM.pptx
 
SGLD Berlin ML GROUP
SGLD Berlin ML GROUPSGLD Berlin ML GROUP
SGLD Berlin ML GROUP
 
Ancestry, Anecdotes & Avanan -DL for Amateurs
Ancestry, Anecdotes & Avanan -DL for Amateurs Ancestry, Anecdotes & Avanan -DL for Amateurs
Ancestry, Anecdotes & Avanan -DL for Amateurs
 
Cyn meetup
Cyn meetupCyn meetup
Cyn meetup
 
Finalver
FinalverFinalver
Finalver
 
Quant2a
Quant2aQuant2a
Quant2a
 
Bismark
BismarkBismark
Bismark
 
Deep VI with_beta_likelihood
Deep VI with_beta_likelihoodDeep VI with_beta_likelihood
Deep VI with_beta_likelihood
 
Ucb
UcbUcb
Ucb
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement Learning
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)

Variational Inference Explained

  • 2. Framework: Bayesian Inference. The inputs: (1) a sample of length n (numbers, categories, vectors, images), which we call the Evidence; (2) an assumption about the probabilistic structure that generates the sample, which we call the Hypothesis. Posterior = P(H|E). Objective: update our information about the Hypothesis using the Evidence.
  • 3. Bayesian Inference in formulas. We estimate the hypothesis given the evidence: for random variables Z and X, we wish to have P(Z|X). Bayes' formula: $P(Z \mid X) = \frac{P(Z, X)}{P(X)}$. Bayesian inference is therefore about working with the terms on the right-hand side (RHS).
  • 4. RHS terms. Hidden variables Z (the Hypothesis): the variables of the mechanism that generates the sample (e.g. the topic distribution in a corpus or the Gaussians in a GMM). (1) Their values are not given. (2) We have the joint distribution P(Z,X). Observed data X (the Evidence): the sample that we actually have. (1) We know every value. (2) We may know nothing about its distribution.
  • 5. RHS terms (cont.). In some cases P(X) is intractable or extremely difficult to calculate, so we cannot obtain the conditional distribution from the terms of Bayes' formula. Variational inference offers a class of algorithms for this problem: approximating the posterior when P(X) is difficult.
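  • 6. Examples: GMM (known number of Gaussians and known variance). We have K Gaussians. Draw $\mu_k \sim N(0, \tau)$ for k = 1…K (τ is positive). For each sample j = 1…n: $z_j \sim \mathrm{Cat}(1/K, \dots, 1/K)$ and $x_j \sim N(\mu_{z_j}, \sigma)$. The evidence is $p(x_{1:n}) = \int_{\mu_{1:K}} \prod_{l=1}^{K} P(\mu_l) \prod_{j=1}^{n} \sum_{z_j} p(z_j)\, P(x_j \mid \mu_{z_j})\, d\mu_{1:K}$. Expanding the product of sums gives $K^n$ terms inside the integral, so this is pretty nasty to compute!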
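The generative process on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `sample_gmm` and the chosen values of n, K, τ and σ are made up for the example, and τ and σ are assumed to be variances (the slide does not say whether they are variances or standard deviations).

```python
import numpy as np

def sample_gmm(n, K, tau, sigma, seed=0):
    """Draw n points from the mixture described on the slide (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Component means: mu_k ~ N(0, tau); tau is ASSUMED to be a variance.
    mu = rng.normal(loc=0.0, scale=np.sqrt(tau), size=K)
    # Assignments: z_j ~ Cat(1/K, ..., 1/K), encoded as integers 0..K-1.
    z = rng.integers(low=0, high=K, size=n)
    # Observations: x_j ~ N(mu_{z_j}, sigma); sigma is ASSUMED to be a variance.
    x = rng.normal(loc=mu[z], scale=np.sqrt(sigma))
    return mu, z, x

mu, z, x = sample_gmm(n=500, K=3, tau=25.0, sigma=1.0)
```

In an inference setting only `x` would be observed; `mu` and `z` play the role of the hidden variables Z.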
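  • 7. Examples: LDA. A corpus D in which every document has length N. $N \sim \mathrm{Poisson}(\xi)$, $\theta \sim \mathrm{Dir}(\alpha)$. β: the topics (arrays of words, fixed or Dirichlet), with $\beta_{ij} = P(w_i \mid z_j)$. For each of the N words $w_n$: draw a topic $z_n \sim \mathrm{Cat}(\theta)$, then draw $w_n \sim p(w_n \mid z_n, \beta)$. The evidence is $p(w \mid \alpha, \beta) = \int P(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\, d\theta$.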
  • 8. Sampling. The common solution for estimating distributions is sampling: (1) MCMC (Metropolis-Hastings, Gibbs); (2) RBM (mostly by Gibbs sampling); (3) Hybrid Monte Carlo. Today we won't talk about these methods!
  • 9. Sampling vs. Analysis. Sampling: (1) the solutions are exact in the limit of many samples; (2) numerically expensive. Deterministic (VI): (1) solutions are cheaper; (2) less accurate; (3) non-conjugate models are a problem; (4) it is an optimization process.
  • 10. Sampling vs. Analysis (cont.). MCMC methods are good for small data where accuracy is essential. When we have big data and many models should be tested, VI methods have an advantage.
  • 11. Can we do something analytically? Can we analytically approximate the posterior? Can we find a distribution that is close to the posterior and estimate the distance well? When the framework is a vector space: (1) calculus allows us to find extrema easily; (2) we are endowed with $L^p$ metrics (typically p = 1, 2). Our domain is a space of functions and the functionals on it. We need: (1) an analytical method to find the extrema of a functional; (2) a nice metric.
  • 12. Calculus of Variations. Consider functions F and y (with all the “extras”) and the functional $J(y) = \int F(y, y', t)\, dt$ (y is differentiable). If y is an extremum of J, it satisfies the Euler-Lagrange equation $\frac{\partial F}{\partial y} - \frac{d}{dt}\left(\frac{\partial F}{\partial y'}\right) = 0$. Example: the maximum entropy principle. Generally speaking, this domain is a calculus for function spaces, hence it is useful for optimization.
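As a small worked instance of the maximum entropy example, consider a density p on an interval [a, b] with normalization as the only constraint (the interval and the multiplier λ are assumptions of this sketch, not taken from the slides). The integrand does not depend on p', so the Euler-Lagrange equation reduces to a pointwise condition and the maximizer is the uniform density:

```latex
\begin{aligned}
J(p) &= \int_a^b \bigl(-p(t)\log p(t) + \lambda\, p(t)\bigr)\,dt,
\qquad F(p, p', t) = -p\log p + \lambda p,\\
\frac{\partial F}{\partial p} - \frac{d}{dt}\Bigl(\frac{\partial F}{\partial p'}\Bigr)
 &= -\log p - 1 + \lambda - 0 = 0
 \;\Longrightarrow\; p(t) = e^{\lambda - 1} = \text{const},\\
\int_a^b p(t)\,dt = 1 &\;\Longrightarrow\; p(t) = \frac{1}{b-a}.
\end{aligned}
```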
  • 13. Calculus of Variations (cont.). The fundamental lemma of the calculus of variations: if M is continuous on (a,b) and $\int_a^b M(x)\, h(x)\, dx = 0$ for all differentiable h, then $M \equiv 0$ on (a,b). (Cybenko)
  • 14. KL (Kullback-Leibler) Divergence. A measure of distance between distributions, $KL(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$; it is not a true metric. Source: “On Information and Sufficiency”, 1951 (Ann. Math. Statist.). Properties: (1) Non-symmetric: it measures a relative distance, i.e. which distribution p sees as the closest. (2) Non-negative, and 0 is obtained only for KL(p, p) (proof by the concavity of log and Jensen's inequality, or by Lagrange multipliers). (3) The distance between P(x, y) and p(x)p(y) is 0 exactly when X and Y are independent. Usage: Cross Entropy = H(p) + KL(p, q).
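The properties on this slide are easy to check numerically for discrete distributions. The sketch below assumes strictly positive probability vectors (so that no logarithm of zero appears); the helper names and the example vectors are chosen here for illustration only.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) in nats; p must be strictly positive."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_x p(x) log q(x)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

def kl(p, q):
    """KL(p || q) = sum_x p(x) log(p(x) / q(x))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.6, 0.3])

kl_pq, kl_qp = kl(p, q), kl(q, p)      # differ: KL is not symmetric
gap = cross_entropy(p, q) - (entropy(p) + kl_pq)   # ~0 up to rounding

# Independence: the joint of independent X, Y equals the product of marginals.
joint = np.outer(p, q)
kl_indep = kl(joint.ravel(), np.outer(joint.sum(axis=1), joint.sum(axis=0)).ravel())
```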
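  • 15. PMI (Pointwise Mutual Information). Let X, Y be random variables. $\mathrm{PMI}(X=a, Y=b) = \log \frac{P(X=a, Y=b)}{P(X=a)\, P(Y=b)}$. With Q(x) = P(X=x) the marginal of X: $KL\bigl(P(X \mid Y=a) \,\|\, Q\bigr) = \sum_x P(X=x \mid Y=a)\, \mathrm{PMI}(X=x, Y=a)$. What does this term mean?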
  • 16. ELBO: Evidence Lower Bound. Consider now P(X), the Evidence. We have $\log P(X) \ge E_Q[\log P(X, Z)] - E_Q[\log Q(Z)]$. The RHS is called the ELBO, and it is a lower bound on the LHS.
  • 17. Back to KL. Having the required analytical tools, we can approximate the posterior: find Q such that Q(Z) ≈ P(Z|X), i.e. minimize $KL(Q(Z) \,\|\, P(Z \mid X))$. We have $KL(Q \,\|\, P(Z \mid X)) = \log P(X) - \mathrm{ELBO}$, so $\log P(X) = KL(Q \,\|\, P(Z \mid X)) + \mathrm{ELBO}$. P(X) is fixed, hence maximizing the ELBO is the same as minimizing the KL.
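The decomposition log P(X) = KL + ELBO can be verified on a toy model. The sketch below assumes a hidden variable Z with three states and a single observed x; the numbers in `joint` (the values of P(X=x, Z=z)) and in `q` are made up for illustration.

```python
import numpy as np

# Hypothetical joint P(X = x_obs, Z = z) for z = 0, 1, 2.
joint = np.array([0.10, 0.25, 0.05])
evidence = joint.sum()            # P(X = x_obs)
posterior = joint / evidence      # P(Z | X = x_obs)

def elbo(q):
    """E_Q[log P(X, Z)] - E_Q[log Q(Z)] for a strictly positive q."""
    return np.sum(q * (np.log(joint) - np.log(q)))

def kl_to_posterior(q):
    """KL(Q || P(Z | X))."""
    return np.sum(q * np.log(q / posterior))

q = np.array([0.2, 0.5, 0.3])     # an arbitrary variational distribution
log_evidence = np.log(evidence)
residual = log_evidence - (kl_to_posterior(q) + elbo(q))   # ~0 up to rounding
```

Setting `q` equal to `posterior` makes the KL term vanish, and the ELBO then equals `log_evidence`.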
  • 18. Let's use Calculus! We wish to optimize the ELBO term. We can define a functional: $\mathrm{ELBO} = E_Q[\log P(X, Z)] - E_Q[\log Q(Z)] = \int Q(Z) \log \frac{P(X, Z)}{Q(Z)}\, dZ = J(Q)$. We can go to Euler-Lagrange here, but let's try to simplify Q first!
  • 19. Mean Field Theory (MFT). The main idea comes from solving the many-body problem (Ising model). Assume a system of many bodies (atoms, other particles): (1) for each body, replace the particles it interacts with by their average; (2) assume no correlations between interacting bodies. We will use item 2 to simplify Q: $Q(z) = \prod_{i=1}^{n} q_i(z_i)$ (obviously not true in general).
  • 20. MFT (cont.). We can now use Euler-Lagrange with the constraint $\int q_i(z_i)\, dz_i = 1$. We obtain $\log q_i(z_i) = \mathrm{const} + E_{-i}[\log p(x, z)]$: a Boltzmann distribution! Did we win? No! Note that each $q_i$ may change the other $q_j$'s.
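For reference, here is one way the mean-field update can be derived. It is a sketch under the factorization $Q(z) = \prod_i q_i(z_i)$: all factors except $q_i$ are held fixed, λ is a Lagrange multiplier for the normalization constraint, and "const" collects the terms that do not depend on $q_i$.

```latex
\begin{aligned}
J(q_i) &= \int q_i(z_i)\, E_{-i}[\log p(x, z)]\, dz_i
        - \int q_i(z_i) \log q_i(z_i)\, dz_i
        + \lambda\Bigl(\int q_i(z_i)\, dz_i - 1\Bigr) + \text{const},\\
\frac{\partial F}{\partial q_i} &= E_{-i}[\log p(x, z)] - \log q_i(z_i) - 1 + \lambda = 0,\\
\log q_i(z_i) &= E_{-i}[\log p(x, z)] + \text{const}
\quad\Longleftrightarrow\quad
q_i(z_i) \propto \exp\bigl(E_{-i}[\log p(x, z)]\bigr).
\end{aligned}
```

The integrand F has no dependence on the derivative of $q_i$, so the Euler-Lagrange equation reduces to the pointwise condition in the second line.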
  • 21. Coordinate Ascent Variational Inference (CAVI). An iterative algorithm: (1) construct a model P(X,Z); (2) sequentially set each $\log q_i$ to $E_{-i}[\log p(x, z)]$ + constant; (3) as always, repeat until the q's converge. (Wikipedia, Blei) https://www.youtube.com/watch?v=uKxtmkfeuxg “Message passing”: Winn & Bishop; Minka 2005; Knowles & Minka 2011.
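  • 22. Gaussian Example. Let $\lambda_0, a_0, b_0, \mu_0$ be hyperparameters. $\tau \sim \mathrm{Gamma}(a_0, b_0)$, $\mu \sim N(\mu_0, (\lambda_0 \tau)^{-1})$. Data = $\{x_1, x_2, \dots, x_n\}$. Latent variables Z = (τ, μ). Hyperparameters = $\{a_0, b_0, \mu_0, \lambda_0\}$. $P(X, \tau, \mu) = P(X \mid \tau, \mu)\, P(\mu \mid \tau)\, P(\tau)$.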
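  • 23. Gaussian (cont.). $P(X \mid \tau, \mu) = \prod_{i=1}^{n} N(x_i \mid \mu, \tau^{-1})$. $P(\mu \mid \tau) = N(\mu \mid \mu_0, (\lambda_0 \tau)^{-1})$. $P(\tau) = \mathrm{Gamma}(\tau \mid a_0, b_0)$. MFT implies $q(\mu, \tau) = q(\mu)\, q(\tau)$ (not that accurate in this case!). Using the ELBO formula: $\ln q(\mu) = E_\tau[\ln P(X \mid \tau, \mu) + \ln P(\mu \mid \tau) + \ln P(\tau)] + C$ and $\ln q(\tau) = E_\mu[\ln P(X \mid \tau, \mu) + \ln P(\mu \mid \tau) + \ln P(\tau)] + C$.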
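Working out the two expectations for this model gives closed-form factors, q(μ) Gaussian and q(τ) Gamma, which can be iterated as coordinate ascent. The sketch below follows the standard textbook treatment of this example and assumes the Gamma distribution is parameterized by shape a and rate b (the slides do not state the parameterization). The function name, default hyperparameters and synthetic data are illustrative only.

```python
import numpy as np

def cavi_gaussian(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, iters=50):
    """Coordinate ascent for q(mu) = N(mu_n, 1/lam_n), q(tau) = Gamma(a_n, b_n)."""
    x = np.asarray(x, dtype=float)
    n, xbar = x.size, x.mean()
    # These two quantities do not change between iterations.
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    a_n = a0 + (n + 1) / 2.0
    # Initialize the precision of q(mu) from the prior mean of tau (shape / rate).
    lam_n = (lam0 + n) * a0 / b0
    for _ in range(iters):
        # Update q(tau): expectations of squared deviations under q(mu).
        e_data = np.sum((x - mu_n) ** 2) + n / lam_n
        e_prior = lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n)
        b_n = b0 + 0.5 * (e_data + e_prior)
        # Update q(mu): its precision uses E[tau] = a_n / b_n.
        lam_n = (lam0 + n) * a_n / b_n
    return mu_n, lam_n, a_n, b_n

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=2000)   # true precision = 1 / 2**2 = 0.25
mu_n, lam_n, a_n, b_n = cavi_gaussian(data)
expected_tau = a_n / b_n
```

With this much data, `mu_n` should land near the true mean and `expected_tau` near the true precision; the prior only matters noticeably for small samples.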
  • 24. Stochastic VI. CAVI does not work well for big data (it updates for every item). Stochastic VI: rather than updating the q's directly, we calculate the gradient of the ELBO and optimize its parameters (similar to EM). Used in LDA applications (David Blei et al.). http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf https://www.cs.princeton.edu/courses/archive/fall11/cos597C/reading/Blei2011.pdf
  • 25. Appendix: Ising Model. Ferromagnetism (Pierre Weiss). Ising model: Lenz & Ernst Ising. We have a Hamiltonian $H(\sigma) = -h \sum_x \sigma_x - j \sum_{\langle x, y \rangle} \sigma_x \sigma_y$ ($\sigma_x$ is the spin of a site (atom); x, y are nearest neighbors, hence the sum is over adjacent spins; h is the magnetic field and j is the “coupling constant”). Consider the contribution of a single atom (spin): $\xi(\sigma_x) = -h \sigma_x - j \sigma_x \sum_y \sigma_y$ (y runs over the neighboring spins of x).
  • 26. Ising Model (cont.). Now we replace the second summation by its mean: $\xi(\sigma_x) = -h \sigma_x - j \sigma_x \langle \sigma_y \rangle$. We obtain $\xi(\sigma_x) = -h_0 \sigma_x$, with an effective field $h_0$ that absorbs the mean of the neighboring spins. Note that if we use this approximation for the entire system we have $E_{mf} = E_0 - h_0 \sum_x \sigma_x$. The solution is a single-spin Boltzmann distribution: $P(s_i) = \frac{e^{a s_i}}{e^{a s_i} + e^{-a s_i}}$.
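The single-spin distribution above has mean tanh(a), which leads to the usual mean-field self-consistency condition for the average spin m. The sketch below solves it by fixed-point iteration, assuming a = β(h + j·z·m), where the inverse temperature β and the number of neighbors z (here `n_neighbors`) are symbols introduced for this example and do not appear in the slides.

```python
import math

def spin_up_probability(a):
    """P(s = +1) for the single-spin Boltzmann distribution with parameter a."""
    return math.exp(a) / (math.exp(a) + math.exp(-a))

def mean_field_magnetization(beta, h, j, n_neighbors, m0=0.5, iters=1000, tol=1e-12):
    """Iterate m <- tanh(beta * (h + j * n_neighbors * m)) until it stops changing."""
    m = m0
    for _ in range(iters):
        m_new = math.tanh(beta * (h + j * n_neighbors * m))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# High temperature, no field: the only solution is m = 0 (no magnetization).
m_hot = mean_field_magnetization(beta=0.1, h=0.0, j=1.0, n_neighbors=4)
# Low temperature, no field: a non-zero solution appears (spontaneous magnetization).
m_cold = mean_field_magnetization(beta=1.0, h=0.0, j=1.0, n_neighbors=4)
```

Starting from a negative `m0` at low temperature converges to the mirror solution, which reflects the symmetry of the model when h = 0.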
  • 27. Remarks. (1) Maxwell speeds: the use of independence for “achieving” the normal distribution. (2) RBM. (3) Conditional Random Field (CRF). (4) Cybenko, G. (1989), “Approximation by superpositions of a sigmoidal function”. (5) Kullback & Leibler, “On Information and Sufficiency”. (6) David Blei, Latent Dirichlet Allocation (and the rest of his papers). (7) Expectation-maximization algorithm (EM, Baum-Welch).
  • 28. VI in Python. https://gist.github.com/AustinRochford/91cabfd2e1eecf9049774ce529ba4c16 • https://www.cs.toronto.edu/~duvenaud/papers/blackbox.pdf • Edward: edwardlib.org/ • PyMC3: http://pymc-devs.github.io/pymc3/notebooks/bayesian_neural_network_advi.html • PyStan: http://mc-stan.org/interfaces/pystan.html • TensorFlow: https://github.com/carpedm20/variational-text-tensorflow
  • 29. VI: Other Languages. R: https://artax.karlin.mff.cuni.cz/r-help/library/varbvs/html/00Index.html • R: https://cran.r-project.org/web/packages/varbvs/varbvs.pdf • R: https://github.com/kieranrcampbell/clvm (claims to implement CAVI) • Blog on Scala: http://alexminnaar.com/online-latent-dirichlet-allocation-the-best-option-for-topic-modeling-with-large-data-sets.html • Spark MLlib: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala