Variational Inference
Presenter: Shuai Zhang, CSE, UNSW
Content
1. Brief Introduction
2. Core Idea of VI; Optimization
3. Example: Bayesian Mixture of Gaussians
What is Variational Inference?
Variational Bayesian methods are a family of techniques for
approximating the intractable integrals that arise in Bayesian
inference and machine learning [Wiki].
They are widely used to approximate posterior densities in Bayesian
models, as an alternative strategy to Markov chain Monte Carlo (MCMC);
variational inference tends to be faster and easier to scale to large data.
It has been applied to problems such as document analysis,
computational neuroscience and computer vision.
Core Idea
Consider a general problem of Bayesian inference. Let the latent
variables in our problem be z and the observed data be x.
Inference in a Bayesian model amounts to conditioning on the data
and computing the posterior p(z|x).
Approximate Inference
The inference problem is to compute the conditional p(z|x) = p(x,z)/p(x).
The denominator p(x) is the marginal distribution of the data, obtained
by marginalizing all the latent variables out of the joint distribution
p(x,z).
For many models, this evidence integral is unavailable in closed
form or requires exponential time to compute. Yet the evidence is
exactly what we need to compute the conditional from the joint; this is
why inference in such models is hard.
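Reconstructing the slide's equation (shown as an image in the original) in the notation used here, the conditional and the evidence are:

```latex
p(z \mid x) = \frac{p(x, z)}{p(x)},
\qquad
p(x) = \int p(x, z) \, dz .
```

The integral over z is what becomes intractable when the latent space is high-dimensional or combinatorial.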
MCMC
In MCMC, we first construct an ergodic Markov chain on z whose
stationary distribution is the posterior p(z|x).
Then, we sample from the chain to collect samples from the
stationary distribution.
Finally, we approximate the posterior with an empirical estimate
constructed from the collected samples.
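As an illustration (not part of the original slides), the three steps above can be sketched with a toy random-walk Metropolis sampler; the target `log_joint` here is a hypothetical unnormalized log posterior, not a model from the deck:

```python
import numpy as np

def metropolis_hastings(log_joint, z0, n_samples=5000, step=0.5, seed=0):
    """Random-walk Metropolis: an ergodic Markov chain whose stationary
    distribution is proportional to exp(log_joint(z))."""
    rng = np.random.default_rng(seed)
    z = z0
    samples = []
    for _ in range(n_samples):
        proposal = z + step * rng.standard_normal()
        # Accept with probability min(1, p(proposal) / p(z)).
        if np.log(rng.random()) < log_joint(proposal) - log_joint(z):
            z = proposal
        samples.append(z)
    return np.array(samples)

# Toy target: an unnormalized standard-normal "posterior".
samples = metropolis_hastings(lambda z: -0.5 * z**2, z0=0.0)
# Empirical estimate of a posterior expectation, discarding burn-in.
posterior_mean = samples[1000:].mean()
```

The final line is the "empirical estimate constructed from the collected samples" step: any posterior expectation is approximated by an average over the chain.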
VI vs. MCMC
MCMC                                     VI
More computationally intensive           Less computationally intensive
Guarantees asymptotically exact          No such guarantees
samples from the target distribution
Slower                                   Faster, especially for large data
                                         sets and complex distributions
Best for precise inference               Useful to explore many scenarios
                                         quickly or on large data sets
Core Idea
Rather than use sampling, the main idea behind variational
inference is to use optimization.
We restrict ourselves to a family D of approximate distributions over
the latent variables, and then try to find the member of that
family that minimizes the Kullback-Leibler (KL) divergence to the exact
posterior. Inference thus reduces to solving an optimization problem.
The goal is to approximate p(z|x) with the resulting q(z); we
choose q(z) to minimize the KL divergence.
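In symbols (reconstructing the slide's image equation; D is the chosen variational family):

```latex
q^{*}(z) = \operatorname*{arg\,min}_{q(z) \in \mathcal{D}}
\; \mathrm{KL}\big( q(z) \,\|\, p(z \mid x) \big) .
```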
Core Idea
This objective is not directly computable, because the KL divergence
to the posterior involves the evidence log p(x), which is intractable.
Because we cannot compute the KL itself, we optimize an alternative
objective, the evidence lower bound (ELBO), which is equivalent to the
KL up to an added constant. We know this from our discussion of EM.
Core Idea
Thus, we have the objective function: maximizing the ELBO is
equivalent to minimizing the KL divergence.
Intuition: we can rewrite the ELBO as the expected log likelihood
of the data minus the KL divergence between q(z) and the prior p(z).
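Reconstructing the slide's equations, the ELBO and its relation to the evidence are:

```latex
\mathrm{ELBO}(q)
  = \mathbb{E}_{q}[\log p(x, z)] - \mathbb{E}_{q}[\log q(z)]
  = \mathbb{E}_{q}[\log p(x \mid z)] - \mathrm{KL}\big( q(z) \,\|\, p(z) \big),

\log p(x) = \mathrm{ELBO}(q) + \mathrm{KL}\big( q(z) \,\|\, p(z \mid x) \big).
```

Since log p(x) is constant in q, the second identity shows that maximizing the ELBO is exactly minimizing the KL to the posterior.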
Mean field approximation
Now that we have specified the variational objective (the ELBO),
we need to specify the variational family of distributions from
which we pick the approximate variational distribution.
A common choice is the mean-field variational family, in which the
latent variables are mutually independent and each is governed by
its own distinct factor in the variational distribution.
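The mean-field assumption, written out (with m latent variables):

```latex
q(z) = \prod_{j=1}^{m} q_j(z_j).
```

Each factor q_j can take any parametric form appropriate to its variable; the factorization itself is the only restriction.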
Coordinate ascent mean-field VI
Having specified our objective function and the variational family
from which to pick the approximation, we now work to optimize.
Coordinate ascent variational inference (CAVI) maximizes the ELBO by
iteratively optimizing each variational factor of the mean-field
distribution while holding the others fixed. It does not, however,
guarantee finding the global optimum.
Coordinate ascent mean-field VI
Given that we fix the values of all the other variational factors
q_l(z_l) (l not equal to j), the optimal q_j(z_j) is proportional to
the exponentiated expected log of the complete conditional. This is in
turn proportional to the exponentiated expected log of the joint,
because the mean-field family assumes that all the latent variables
are independent.
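The coordinate update just described, in symbols (reconstructed from the slide's image):

```latex
q_j^{*}(z_j)
  \;\propto\; \exp\big\{ \mathbb{E}_{-j}\big[ \log p(z_j \mid z_{-j}, x) \big] \big\}
  \;\propto\; \exp\big\{ \mathbb{E}_{-j}\big[ \log p(z_j, z_{-j}, x) \big] \big\},
```

where the expectation E_{-j} is taken with respect to the product of the other factors, prod_{l != j} q_l(z_l).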
Coordinate ascent mean-field VI
Below, we rewrite the first term using iterated expectation, and for
the second term we retain only the part that depends on z_j.
In the final equation, the right-hand side equals the negative KL
divergence between 𝑞 𝑗 and exp(A) (up to normalization), where A is the
expected log of the joint under the other factors. Thus, maximizing this
expression is the same as minimizing the KL divergence between
𝑞 𝑗 and exp(A).
This occurs when 𝑞 𝑗 ∝ exp(A).
Coordinate ascent mean-field VI
Bayesian Mixture of Gaussians
The full hierarchical model of
The joint dist.
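The slide's equations are images; presumably, following Blei et al. (reference 4), with K components, prior variance sigma^2, and unit observation variance, the model and joint are:

```latex
\mu_k \sim \mathcal{N}(0, \sigma^2), \qquad k = 1, \dots, K

c_i \sim \mathrm{Categorical}\big(\tfrac{1}{K}, \dots, \tfrac{1}{K}\big), \qquad i = 1, \dots, n

x_i \mid c_i, \mu \sim \mathcal{N}\big(c_i^{\top} \mu,\, 1\big)

p(\mu, c, x) = \prod_{k=1}^{K} p(\mu_k) \, \prod_{i=1}^{n} p(c_i) \, p(x_i \mid c_i, \mu)
```

Here c_i is a one-hot cluster indicator, so c_i^T mu picks out the mean of the assigned component.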
Bayesian Mixture of Gaussians
The mean field variational family contains approximate posterior
densities of the form
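Presumably (again following Blei et al., reference 4), the factorized form is:

```latex
q(\mu, c) = \prod_{k=1}^{K} q(\mu_k;\, m_k, s_k^2) \, \prod_{i=1}^{n} q(c_i;\, \varphi_i),
```

with a Gaussian factor q(mu_k; m_k, s_k^2) for each component mean and a categorical factor q(c_i; phi_i) for each cluster assignment.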
Bayesian Mixture of Gaussians
We derive the ELBO as a function of the variational factors and
solve for it.
Bayesian Mixture of Gaussians
Next, we derive the CAVI update for the variational factors.
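As a concrete sketch (not from the slides), the CAVI updates for this model can be written in a few lines of numpy, assuming the unit-variance likelihood and N(0, sigma^2) prior on each component mean as in Blei et al. (reference 4):

```python
import numpy as np

def cavi_gmm(x, K, sigma2=10.0, n_iters=100, seed=0):
    """CAVI for a Bayesian mixture of unit-variance Gaussians.
    Returns variational means m_k, variances s_k^2, and responsibilities phi."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal(K)   # variational means of mu_k
    s2 = np.ones(K)              # variational variances of mu_k
    for _ in range(n_iters):
        # Update each assignment factor: phi_ik ∝ exp(m_k x_i - (s_k^2 + m_k^2)/2).
        log_phi = np.outer(x, m) - 0.5 * (s2 + m**2)      # shape (n, K)
        log_phi -= log_phi.max(axis=1, keepdims=True)     # for numerical stability
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=1, keepdims=True)
        # Update each Gaussian factor, holding phi fixed.
        denom = 1.0 / sigma2 + phi.sum(axis=0)
        m = (phi * x[:, None]).sum(axis=0) / denom
        s2 = 1.0 / denom
    return m, s2, phi

# Toy data: two well-separated clusters.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-4, 1, 200), rng.normal(4, 1, 200)])
m, s2, phi = cavi_gmm(x, K=2)
```

Each outer iteration is one sweep of coordinate ascent: all assignment factors are updated given the current Gaussian factors, then each Gaussian factor is updated given the responsibilities. The ELBO is non-decreasing across sweeps, but the fixed point reached depends on the initialization.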
References
1. https://am207.github.io/2017/wiki/VI.html
2. https://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf
3. https://www.cs.cmu.edu/~epxing/Class/10708-17/notes-17/10708-scribe-lecture13.pdf
4. https://arxiv.org/pdf/1601.00670.pdf
Week Report
• Last week
• Metric Factorization model
• Learning Group
• This week
• Submit the ICDE paper
