Composing graphical models with neural networks for structured representations and fast inference
1. CS592 Presentation #18
Composing graphical models with neural networks for structured representations and fast inference
20173586 Jeongmin Cha
20184666 Yajie Zhou
20174463 Jaesung Choe
2. Content
1. Motivation
2. Modeling idea
3. Structured Variational Autoencoder (SVAE)
4. Background
5. Main algorithm
6. Experiment
7. Group Discussion Point
3. 1. Motivation
● How can we build interpretable models of high-dimensional data?
● modeling video of a mouse
● a mouse usually repeats a certain behavior
(figure: example behaviors: dart, groom, rear)
4. 1. Motivation
● We want a model that
○ can explain which behavior the mouse is performing at each frame
5. 1. Motivation
● What we want to do is ...
● segment and categorize mouse behavior from the video
● Q: generative vs discriminative model for this task?
6. 1. Motivation
● What we want to do is ...
● segment and categorize mouse behavior from the video
● Q: generative vs discriminative model for this task?
○ We can use both
○ the discriminative scheme needs a large amount of labeled data
○ a discriminative model relaxes the conditional independence assumption,
so it may achieve better predictive results
7. 1. Motivation
● What we want to do is ...
● segment and categorize mouse behavior from the video
● Q: generative vs discriminative model for this task?
○ We can use both
○ a large number of unlabeled data from a small number of labeled data
8. 1. Motivation
● What we want to do is ...
● segment and categorize mouse behavior from the video
● Q: generative vs discriminative model for this task?
○ We can use both
○ a generative scheme can exploit a large amount of unlabeled data alongside a small amount of labeled data
○ This paper wants to build a generative model for video of a mouse
9. 1. Motivation
● a generative model for video of a mouse
● a mouse repeats certain behaviors
● a Gaussian mixture model (GMM) is one candidate solution
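Since the deck only names the idea, here is a minimal sketch of fitting such a mixture: a two-component 1-D GMM trained with EM on synthetic data. This is numpy-only, and the data, component count, and initialization are illustrative assumptions, not the paper's mouse-video setup.

```python
import numpy as np

def fit_gmm_1d(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    # Crude initialization: start the two means at the data quartiles.
    mu = np.array([np.percentile(x, 25), np.percentile(x, 75)])
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] proportional to pi_k * N(x_n | mu_k, var_k)
        log_r = (np.log(pi)
                 - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means, variances, mixing weights.
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
mu, var, pi = fit_gmm_1d(x)
print(np.sort(mu))  # component means near -3 and 3
```

With well-separated clusters like these, EM recovers the two modes; the slides' point is that on real video features the number of Gaussian clusters needed explodes, which is what motivates the SVAE.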
11. 1. Motivation
● a mixture of gaussians fits the data poorly
● reports too many clusters (not natural clustering result)
(figure: GMM fit)
12. 1. Motivation
● neural network fits data well
● but, difficult to interpret in high dimensions (lack interpretability)
(figure: GMM fit vs. density network (VAE) fit)
13. 1. Motivation
● neural network fits data well
● but, difficult to interpret in high dimensions (lack interpretability)
● does not explicitly represent discrete mixture components
(figure: GMM fit vs. density network (VAE) fit)
An appropriate model might switch between discrete states
14. 1. Motivation
● How about combining both? (Graphical model + Deep Learning)
● Structured Variational AutoEncoder (SVAE)
(figure: GMM, density network (VAE), and SVAE fits)
16. 1. Motivation
● Q: Graphical model vs Deep Learning, pros and cons?
● Graphical models specify explicit relationships between variables before learning
○ a graphical model is configured from a higher level of abstraction (deduction)
○ a deep learning model is configured from a lower level (induction)
17. 1. Motivation
● Graphical model
○ + interpretable, structured
representations
○ + data and computational efficiency
○ - strong assumptions may not fit
○ - feature engineering
○ - top-down inference
● Deep learning
○ - not directly interpretable structure
○ - can require lots of data
○ + flexible representations, learn
automatically
○ + feature learning
○ + recognition networks (bottom-up)
18. 2. Modeling idea
● graphical models on latent variables
○ structured probability distributions
○ fast exact inference subroutines
● neural network models (VAE) for observations
○ produce a flexible non-linear feature manifold
■ map nonlinear high-dimensional data to low-dimensional, dense representations
○ recognition network
■ instead of learning variational distribution parameters directly,
■ maps observations to conjugate graphical model potentials
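A hedged sketch of the last point, assuming a 1-D Gaussian latent variable: a tiny, randomly initialized (untrained, hypothetical) recognition network maps an observation y to the natural parameters of a Gaussian potential, which the graphical-model side can combine with a conjugate N(0, 1) prior simply by adding natural parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer recognition network: y -> (eta1, eta2),
# the natural parameters of a 1-D Gaussian potential on the latent x.
W1 = rng.normal(size=(8, 2)) * 0.1
b1 = np.zeros(8)
W2 = rng.normal(size=(2, 8)) * 0.1
b2 = np.zeros(2)

def recognition_potential(y):
    h = np.tanh(W1 @ y + b1)
    eta1, raw = W2 @ h + b2
    eta2 = -np.exp(raw)  # second natural parameter of a Gaussian must be negative
    return np.array([eta1, eta2])

# Conjugate combination: natural parameters of Gaussians simply add, so the
# N(0, 1) prior (natural params [0, -1/2]) times the observation potential
# is again a Gaussian over the latent x -- no extra inference machinery needed.
prior_eta = np.array([0.0, -0.5])
post_eta = prior_eta + recognition_potential(np.array([1.0, -2.0]))
post_var = -1.0 / (2.0 * post_eta[1])   # recover mean/variance from natural params
post_mean = post_eta[0] * post_var
print(post_mean, post_var)
```

The design point the slides make is visible here: because the network outputs conjugate potentials rather than a finished variational distribution, the graphical model's fast exact-inference subroutines still apply.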
19. 3. Structured Variational AutoEncoder (SVAE)
Under the exponential family conjugacy property we can define the SVAE as below,
p(θ) = exp{⟨η_θ, t_θ(θ)⟩ − log Z_θ(η_θ)}, p(x|θ) = exp{⟨η_x(θ), t_x(x)⟩ − log Z_x(η_x(θ))},
where p(θ) is the prior distribution and p(x|θ) is the likelihood.
The statistic function is defined as t_θ(θ) = (η_x(θ), −log Z_x(η_x(θ))), and the partition function is log Z_θ(η_θ).
Finally, we would like to infer the marginal likelihood p(x).
20. 3. Structured Variational AutoEncoder (SVAE)
Under the exponential family conjugacy property we can define the SVAE as below,
p(θ) = exp{⟨η_θ, t_θ(θ)⟩ − log Z_θ(η_θ)}, p(x|θ) = exp{⟨η_x(θ), t_x(x)⟩ − log Z_x(η_x(θ))},
where p(θ) is the prior distribution and p(x|θ) is the likelihood.
The statistic function is defined as t_θ(θ) = (η_x(θ), −log Z_x(η_x(θ))), and the partition function is log Z_θ(η_θ).
Finally, we would like to infer the marginal likelihood p(x).
Discussion Point:
Can you tell the fundamental difference between the VAE and the SVAE?
21. 3. Structured Variational AutoEncoder (SVAE)
Under the exponential family conjugacy property we can define the SVAE as below,
p(θ) = exp{⟨η_θ, t_θ(θ)⟩ − log Z_θ(η_θ)}, p(x|θ) = exp{⟨η_x(θ), t_x(x)⟩ − log Z_x(η_x(θ))},
where p(θ) is the prior distribution and p(x|θ) is the likelihood.
The statistic function is defined as t_θ(θ) = (η_x(θ), −log Z_x(η_x(θ))), and the partition function is log Z_θ(η_θ).
Finally, we would like to infer the marginal likelihood p(x).
Discussion Point:
Can you tell the fundamental difference between the VAE and the SVAE? A: the conjugacy property
22. 4. Background : conjugate distributions (VAE vs SVAE)
What is a conjugate distribution?
If the posterior distribution p(θ|x) is in the same probability distribution family as the prior probability
distribution p(θ), the prior and posterior are called conjugate distributions.
Example
If the likelihood function is a Poisson distribution, choosing a Gamma prior over the rate parameter λ will ensure that
the posterior distribution is also a Gamma distribution.
(figure: posterior and prior plots: Poisson with λ = 4; Gamma prior with k = 4 and θ = 1)
23. 4. Background : conjugate distributions
What is a conjugate distribution?
If the posterior distribution p(θ|x) is in the same probability distribution family as the prior probability
distribution p(θ), the prior and posterior are called conjugate distributions.
Example
If the likelihood function is a Poisson distribution, choosing a Gamma prior over the rate parameter λ will ensure that
the posterior distribution is also a Gamma distribution.
Likelihood: Poisson over observations x_i (assume i = 1, …, 6)
Prior: Gamma with k = 10 and θ = 0.5, which updates to the Gamma posterior
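Assuming the slide's Gamma(shape k, scale θ) prior on the Poisson rate λ, the conjugate update has a simple closed form: posterior shape k' = k + Σᵢ xᵢ and posterior scale θ' = θ / (nθ + 1). A minimal sketch, with six illustrative counts (not from the slides):

```python
import numpy as np

def gamma_poisson_update(k, theta, counts):
    """Conjugate update of a Gamma(shape=k, scale=theta) prior on a
    Poisson rate, given observed counts."""
    counts = np.asarray(counts)
    n = len(counts)
    k_post = k + counts.sum()            # shape absorbs the total count
    theta_post = theta / (n * theta + 1.0)  # scale shrinks with sample size
    return k_post, theta_post

# Prior Gamma(k=10, theta=0.5) as on the slide; counts are illustrative.
k_post, theta_post = gamma_poisson_update(10, 0.5, [3, 5, 4, 2, 6, 4])
post_mean = k_post * theta_post  # posterior mean of the rate
print(k_post, theta_post, post_mean)  # 34 0.125 4.25
```

No integration is required: conjugacy turns posterior inference into two parameter updates, which is exactly the property the SVAE exploits on its latent graphical model.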
24. 4. Background : conjugate distributions
What is a conjugate distribution?
If the posterior distribution p(θ|x) is in the same probability distribution family as the prior probability
distribution p(θ), the prior and posterior are called conjugate distributions.
We estimate the posterior by updating the parameters of our prior,
reflecting a new mean and confidence level
25. 4. Background : conjugate distributions
What is a conjugate distribution?
If the posterior distribution p(θ|x) is in the same probability distribution family as the prior probability
distribution p(θ), the prior and posterior are called conjugate distributions.
We estimate the posterior by updating the parameters of our prior,
reflecting a new mean and confidence level
Discussion Point:
Why is this property important in the SVAE?
26. 4. Background : conjugate distributions
What is a conjugate distribution?
If the posterior distribution p(θ|x) is in the same probability distribution family as the prior probability
distribution p(θ), the prior and posterior are called conjugate distributions.
We estimate the posterior by updating the parameters of our prior,
reflecting a new mean and confidence level
Discussion Point:
Why is this property important in the SVAE?
A: The conjugacy property is useful in Bayesian inference!
27. 4. Background : conjugate distributions
Intractability in the VAE:
the integral of the marginal likelihood p(x) = ∫ p(θ) p(x|θ) dθ is intractable.
Conjugacy in the SVAE (Proposition B.4):
p(θ|x) ∝ exp{⟨η_θ + (t_x(x), 1), t_θ(θ)⟩},
where the posterior p(θ|x) is in the same exponential family as p(θ) with the natural parameter
η_θ + (t_x(x), 1), and t_x, t_θ are the statistic functions.
28. 4. Background : conjugate distributions
Intractability in the VAE:
the integral of the marginal likelihood p(x) = ∫ p(θ) p(x|θ) dθ is intractable.
Conjugacy in the SVAE (Proposition B.4):
p(θ|x) ∝ exp{⟨η_θ + (t_x(x), 1), t_θ(θ)⟩},
where the posterior p(θ|x) is in the same exponential family as p(θ) with the natural parameter
η_θ + (t_x(x), 1), and t_x, t_θ are the statistic functions.
The VAE handles general non-conjugate observation models by introducing a recognition network.
31. 4. Background : conjugate distributions
Intractability in the VAE:
the integral of the marginal likelihood p(x) = ∫ p(θ) p(x|θ) dθ is intractable.
Conjugacy in the SVAE (Proposition B.4):
p(θ|x) ∝ exp{⟨η_θ + (t_x(x), 1), t_θ(θ)⟩}
This relationship is useful in Bayesian inference under the conjugacy property.
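One way to see why conjugacy helps Bayesian inference is to check the closed-form posterior against brute-force numerical integration of prior × likelihood; the two should agree. A minimal sketch using the Gamma-Poisson pair, where the prior parameters, data, and grid bounds are all illustrative assumptions:

```python
import numpy as np

# Gamma(shape k, rate b) prior over a Poisson rate lam, with observed counts.
k, b = 2.0, 1.0
counts = np.array([4, 3, 5])

# Closed-form conjugate posterior: Gamma(k + sum(x), b + n).
k_post, b_post = k + counts.sum(), b + len(counts)
closed_form_mean = k_post / b_post

# Brute force: unnormalized log posterior (prior x likelihood) on a fine grid.
lam = np.linspace(1e-6, 30.0, 200001)
log_post = ((k - 1.0) * np.log(lam) - b * lam                  # Gamma prior, up to a constant
            + counts.sum() * np.log(lam) - len(counts) * lam)  # Poisson likelihood, up to a constant
post = np.exp(log_post - log_post.max())
d = lam[1] - lam[0]
post /= post.sum() * d                  # normalize numerically
grid_mean = (lam * post).sum() * d      # numerical posterior mean

print(closed_form_mean, grid_mean)  # the two means should agree closely
```

The conjugate route is a constant-time parameter update, while the grid route needs an integral per query; in the SVAE this is the difference between reusing fast exact graphical-model subroutines and resorting to generic approximate inference.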
43. https://www.youtube.com/watch?v=9WSb-89UsEo&t=60s
(This video can only be watched on YouTube)
7. Group discussion
Group Discussion Point:
VAE vs SVAE: which model can have better performance? (Is the pitch a strike or a ball?)
(video frames labeled: Strike, None, Strike, Ball)
※ Supplementary material
- For those who are not familiar with the baseball rules.
44. https://www.youtube.com/watch?v=9WSb-89UsEo&t=60s
(This video can only be watched on YouTube)
7. Group discussion
Group Discussion Point:
VAE vs SVAE: which model can have better performance? (Is the pitch a strike or a ball?)
(video frames labeled: Strike, None, Strike, Ball)
※ Supplementary material
- For those who are not familiar with the baseball rules.
Hint or not
45. If the SVAE follows a linear-chain structure,
we expect the SVAE to achieve better accuracy in video classification,
and the VAE to be better for single-image classification.
7. Group discussion
46. If the SVAE follows a linear-chain structure,
we expect the SVAE to achieve better accuracy in video classification,
and the VAE to be better for single-image classification.
7. Group discussion
NO
47. By the way, what is the result?
7. Group discussion
(figure: strike-ball count)
48. By the way, what is the result?
7. Group discussion
(figure: strike-ball count; scoreboard)
49. By the way, what is the result? Strike!!
7. Group discussion
(figure: strike-ball count; scoreboard)
50. By the way, what is the result? Strike!! How did you check the results?
7. Group discussion
(figure: strike-ball count; scoreboard)
51. By the way, what is the result? Strike!! How did you check the results?
I think just a single frame is enough!!
7. Group discussion
(figure: strike-ball count; scoreboard)
52. Just as we check the scoreboard,
the AI also looks at the scoreboard for inference.
In other words, we do not need sequential frames.
(Our expectation) The VAE would be better.
If we mask the scoreboard,
(our expectation) the SVAE would be better.
7. Group discussion
(figure: where the AI is looking, i.e. high attention; mask over the non-observable area)