3. Generative Models
GAN: Focuses on sampling rather than likelihood
VAE: Maximizes a lower bound on the likelihood using a surrogate loss
Normalizing Flow: Exact likelihood maximization via invertible transformations
https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html
4. Generative Models - limitations
GAN: High fidelity but hard to train
VAE: No exact likelihood maximization (only a lower bound)
Normalizing Flow: Lack of flexibility (must be invertible)
https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html
6. Motivations
Why do we need a probability density function (PDF)?
- MLE is equivalent to minimizing the KL divergence to the real data distribution.
- In a normalized PDF, all densities compete with each other: raising the density at one point necessarily lowers it elsewhere.
https://web.eecs.umich.edu/~justincj/slides/eecs498/498_FA2019_lecture19.pdf
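As a one-line justification of the first point:
$$\arg\max_\theta\, \mathbb{E}_{x \sim p_{data}}\big[\log p_\theta(x)\big] = \arg\min_\theta\, D_{KL}\big(p_{data} \,\|\, p_\theta\big),$$
since $D_{KL}(p_{data} \| p_\theta) = -\mathbb{E}_{p_{data}}[\log p_\theta(x)] - H(p_{data})$ and the entropy term does not depend on $\theta$.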
7. Motivations
It is difficult to define a flexible, high-capacity probability density function:
$$p_\theta(x) = \frac{e^{-f_\theta(x)}}{Z_\theta}, \qquad Z_\theta = \int e^{-f_\theta(x)}\,dx$$
$Z_\theta$: normalizing constant
$e^{-f_\theta(x)}$: unnormalized function, $f_\theta(x)$: energy function
- The integral in the normalizing constant is intractable.
- AR and NF handle this problem via special structures that force a unit normalizing constant.
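This intractability is what motivates modeling the score instead: the normalizing constant drops out under the gradient,
$$\nabla_x \log p_\theta(x) = -\nabla_x f_\theta(x) - \underbrace{\nabla_x \log Z_\theta}_{=\,0} = -\nabla_x f_\theta(x).$$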
12. Score matching
The score matching objective (Hyvärinen, 2005):
$$\mathbb{E}_{p_{data}(x)}\Big[\operatorname{tr}\big(\nabla_x s_\theta(x)\big) + \tfrac{1}{2}\,\|s_\theta(x)\|_2^2\Big]$$
- The norm term drives the gradient to zero at data points; the trace term regularizes them toward local maxima.
- But computing the Hessian trace $\operatorname{tr}(\nabla_x s_\theta(x))$ needs $O(D)$ backprops!
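A minimal PyTorch sketch of this objective (score_net is a hypothetical model mapping a (batch, D) tensor to its (batch, D) score); note the loop doing one backward pass per input dimension to get the trace:

import torch

def exact_score_matching_loss(score_net, x):
    # x: (batch, D) data batch; score_net(x): estimated score s_theta(x), shape (batch, D)
    x = x.clone().requires_grad_(True)
    s = score_net(x)
    norm_term = 0.5 * (s ** 2).sum(dim=1)          # 1/2 * ||s_theta(x)||^2
    trace = torch.zeros(x.shape[0], device=x.device)
    for i in range(x.shape[1]):                    # O(D) backprops: one per dimension
        grad_i = torch.autograd.grad(s[:, i].sum(), x, create_graph=True)[0]
        trace = trace + grad_i[:, i]               # diagonal entry (nabla_x s_theta)_ii
    return (trace + norm_term).mean()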
13. Score matching - sliced score matching
Project the score onto random vectors $v \sim p_v$ with $\mathbb{E}[vv^\top] = I$:
$$\mathbb{E}_{p_v}\,\mathbb{E}_{p_{data}}\Big[v^\top \nabla_x s_\theta(x)\,v + \tfrac{1}{2}\big(v^\top s_\theta(x)\big)^2\Big]$$
In expectation over $v$, this equals the original score matching objective.
$v^\top \nabla_x s_\theta(x)\,v$ is a Hessian-vector product (https://en.wikipedia.org/wiki/Hessian_automatic_differentiation)
-> single backprop
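A sketch of the sliced objective in PyTorch (same hypothetical score_net as above); the Hessian-vector product costs one extra backward pass regardless of D:

import torch

def sliced_score_matching_loss(score_net, x):
    # x: (batch, D); v: random projection vectors with identity covariance
    x = x.clone().requires_grad_(True)
    v = torch.randn_like(x)
    s = score_net(x)
    sv = (s * v).sum()                                       # sum over batch of v^T s_theta(x)
    hvp = torch.autograd.grad(sv, x, create_graph=True)[0]   # (nabla_x s_theta)^T v in ONE backprop
    trace_est = (v * hvp).sum(dim=1)                         # v^T (nabla_x s_theta(x)) v
    norm_term = 0.5 * (s * v).sum(dim=1) ** 2                # 1/2 * (v^T s_theta(x))^2
    return (trace_est + norm_term).mean()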
14. Score matching - denoising score matching
Predict the score of the perturbed distribution $q_\sigma(\tilde{x}) = \int q_\sigma(\tilde{x} \mid x)\,p(x)\,dx$ instead of $p(x)$:
$$\tfrac{1}{2}\,\mathbb{E}_{q_\sigma(\tilde{x} \mid x)\,p(x)}\Big[\big\|s_\theta(\tilde{x}) - \nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x)\big\|_2^2\Big]$$
Minimizing the objective function above gives us the optimal score function of the perturbed distribution, $s_\theta(\tilde{x}) = \nabla_{\tilde{x}} \log q_\sigma(\tilde{x})$.
Let $q_\sigma(\tilde{x} \mid x) = \mathcal{N}(\tilde{x};\,x,\,\sigma^2 I)$ be Gaussian; then $\nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x) = -(\tilde{x} - x)/\sigma^2$.
But results could be noisy if $\sigma$ is large, since $q_\sigma \approx p$ only holds for small noise.
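With the Gaussian kernel, the loss becomes a simple regression onto $-(\tilde{x} - x)/\sigma^2$; a PyTorch sketch (same hypothetical score_net):

import torch

def denoising_score_matching_loss(score_net, x, sigma):
    # Perturb the data: x_tilde = x + sigma * z with z ~ N(0, I)
    z = torch.randn_like(x)
    x_tilde = x + sigma * z
    target = -(x_tilde - x) / sigma ** 2     # nabla log q_sigma(x_tilde | x)
    s = score_net(x_tilde)                   # predicted score of the perturbed distribution
    return 0.5 * ((s - target) ** 2).sum(dim=1).mean()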
15. Pitfalls
Now that we have a score function, let's sample through Langevin dynamics:
$$x_{t+1} = x_t + \frac{\epsilon}{2}\,\nabla_x \log p(x_t) + \sqrt{\epsilon}\,z_t, \qquad z_t \sim \mathcal{N}(0, I)$$
Under some conditions (step size $\epsilon \to 0$ and number of steps $t \to \infty$), $x_t$ is proved to be an exact sample from $p(x)$!
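A minimal sketch of this sampler, assuming score_net approximates $\nabla_x \log p(x)$:

import torch

@torch.no_grad()
def langevin_sample(score_net, x0, eps=1e-4, n_steps=1000):
    # x0: initial points, e.g. torch.randn(batch, D); eps: step size
    x = x0
    for _ in range(n_steps):
        z = torch.randn_like(x)
        x = x + 0.5 * eps * score_net(x) + eps ** 0.5 * z
    return x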
But there are some problems...
16. Pitfalls - manifold hypothesis
The manifold hypothesis: data lie on a low-dimensional manifold.
Problems
- Score function is inaccurate in the low-density regions.
- It is difficult to recover the relative weights between modes.
17. Pitfalls - inaccurate score function
The score $\nabla_x \log p_{data}(x)$ is not well-defined in the low-density regions.
Since we train the score matching objective via Monte Carlo estimation,
the model observes few samples from the low-density regions.
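Concretely, the objective weights the error by the data density, so low-density regions contribute almost nothing to the loss:
$$\mathbb{E}_{p_{data}(x)}\big[\|s_\theta(x) - \nabla_x \log p_{data}(x)\|_2^2\big] = \int p_{data}(x)\,\|s_\theta(x) - \nabla_x \log p_{data}(x)\|_2^2\,dx$$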
19. Score matching - recovering relative weights
Langevin dynamics fails to recover the relative weights between two modes.
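To see why, consider a mixture $p(x) = \pi\,p_1(x) + (1 - \pi)\,p_2(x)$ whose components have (nearly) disjoint supports. On the support of $p_1$,
$$\nabla_x \log p(x) \approx \nabla_x \big(\log \pi + \log p_1(x)\big) = \nabla_x \log p_1(x),$$
so the score carries no information about the weight $\pi$, and Langevin dynamics cannot recover it.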
20. Perturbed distribution
The low-density regions can be filled by injecting noise.
But how do we determine the noise strength? -> Gradually decrease the variance.
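A sketch of this idea as annealed Langevin dynamics; score_net(x, sigma) is assumed to be a noise-conditional score model (the NCSN of the next slide), and the schedule values and step-size rule are illustrative:

import torch

@torch.no_grad()
def annealed_langevin_sample(score_net, x0, sigmas, eps0=2e-5, steps_per_level=100):
    # sigmas: decreasing noise levels, e.g. torch.logspace(0, -2, steps=10)
    x = x0
    for sigma in sigmas:
        eps = eps0 * (sigma / sigmas[-1]) ** 2   # scale the step size with the noise level
        for _ in range(steps_per_level):
            z = torch.randn_like(x)
            x = x + 0.5 * eps * score_net(x, sigma) + eps ** 0.5 * z
    return x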
21. Noise Conditional Score Network (NCSN)
A single network $s_\theta(x, \sigma)$ estimates the scores of multiple perturbed data distributions.
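Training combines the denoising objective above over all noise levels; a sketch with the $\lambda(\sigma) = \sigma^2$ weighting (the noise-conditional score_net(x, sigma) signature is assumed):

import torch

def ncsn_loss(score_net, x, sigmas):
    # sigmas: (L,) tensor of noise levels on x.device.
    # Sample one noise level per example, perturb, and regress the conditional score.
    idx = torch.randint(len(sigmas), (x.shape[0],), device=x.device)
    sigma = sigmas[idx].view(-1, *([1] * (x.dim() - 1)))   # broadcast over feature dims
    z = torch.randn_like(x)
    x_tilde = x + sigma * z
    s = score_net(x_tilde, sigma)
    # sigma^2 * ||s + z/sigma||^2 = ||sigma * s + z||^2 keeps all levels on the same scale
    return 0.5 * ((sigma * s + z) ** 2).flatten(1).sum(dim=1).mean()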
25. Inverse Problem Solving
An inverse problem is a Bayesian inference problem: we want to know $p(x \mid y)$ when $p(y \mid x)$ is given. E.g.,
super-resolution, colorization, inpainting...
The prior score of $p(x)$ comes from score matching; the likelihood $p(y \mid x)$ comes from the known forward process.
We can then sample from $p(x \mid y)$ via Langevin dynamics!
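This works because Bayes' rule needs no normalizing constant at the level of scores, since $\log p(y)$ is constant in $x$:
$$\nabla_x \log p(x \mid y) = \nabla_x \log p(y \mid x) + \nabla_x \log p(x)$$
Plugging this posterior score into the Langevin update yields conditional samples.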
27. Conclusion
Very flexible architecture compared to NF or AR: the score function can be any function,
so modern DL architectures can be used (ResNet, U-Net, etc.).
Sampling from the exact $p(x)$ is possible, unlike VAEs, which optimize a surrogate loss.
GAN-level fidelity without a minimax game.
Naturally solves inverse problems.
28. References
Song, Yang, and Stefano Ermon. "Generative modeling by estimating gradients of the data distribution." arXiv preprint arXiv:1907.05600 (2019).
Song, Yang, et al. "Score-based generative modeling through stochastic differential equations." arXiv preprint arXiv:2011.13456 (2020).
Yang Song blog (https://yang-song.github.io/blog/2021/score/)
Stefano Ermon seminar (https://www.youtube.com/watch?v=8TcNXi3A5DI&t=562s)