This document discusses Bayesian methods for Gaussian and Student's t-distribution models. It covers Bayesian linear regression, Gaussian processes, and variational mixtures of Gaussians. The key points are:
- Bayesian inference for Gaussian distributions uses conjugate priors: a Gaussian prior over the mean when the variance is known, and a Gaussian-Gamma prior when the variance is also unknown.
- The Student's t-distribution can be represented as an infinite mixture of Gaussians, making it more robust to outliers than a single Gaussian.
- Gibbs sampling can be used to fit finite and infinite Gaussian mixture models to data in an unsupervised manner.
17. Assume we are given samples from p(z, μ), one at a time. The goal is to find the root μ* of the regression function f(μ) ≡ E[z|μ].
The Robbins-Monro Algorithm
18. Successive estimates μ^(N) of the root are then given by
μ^(N) = μ^(N−1) + a_{N−1} z(μ^(N−1)).
Conditions on aN for convergence:
lim_{N→∞} aN = 0,  Σ_{N=1}^∞ aN = ∞,  Σ_{N=1}^∞ aN² < ∞.
The Robbins-Monro Algorithm
19. Example: estimate the mean of a Gaussian.
Robbins-Monro for Maximum Likelihood
Here z = ∂/∂μML ln p(x|μML, σ²) = (x − μML)/σ², so the distribution of z is Gaussian with mean (μ − μML)/σ². For the Robbins-Monro update equation, aN = σ²/N, which recovers the sequential maximum-likelihood estimate
μML^(N) = μML^(N−1) + (1/N)(xN − μML^(N−1)).
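As a small sketch (with made-up sample values), the Robbins-Monro update with step size aN = σ²/N reduces to the familiar running-average estimate of the mean, since the σ² factors cancel:

```python
# Robbins-Monro sequential estimation of a Gaussian mean.
# With z = (x - mu) / sigma^2 and a_N = sigma^2 / N, the update
# mu_N = mu_{N-1} + a_{N-1} * z is exactly the running-average formula.

def robbins_monro_mean(samples, sigma2=1.0):
    mu = 0.0
    for n, x in enumerate(samples, start=1):
        a = sigma2 / n                  # step size a_N = sigma^2 / N
        z = (x - mu) / sigma2           # z = d/dmu log p(x | mu, sigma^2)
        mu = mu + a * z                 # Robbins-Monro update
    return mu

samples = [1.0, 3.0, 2.0, 4.0]
print(robbins_monro_mean(samples))   # equals the batch mean, 2.5
```

Note that the final estimate does not depend on σ²: the step size and the gradient scale cancel, which is why the sequential and batch maximum-likelihood estimates coincide here.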
21. Bayesian Inference for the Gaussian (1)
Assume σ² is known. Given i.i.d. data X = {x1, …, xN}, the likelihood function for μ is given by
p(X|μ) = ∏_{n=1}^N p(xn|μ) = (2πσ²)^(−N/2) exp{ −(1/(2σ²)) Σ_{n=1}^N (xn − μ)² }.
This has a Gaussian shape as a function of μ (but it is not a distribution over μ).
22. Bayesian Inference for the Gaussian (2)
Combined with a Gaussian prior over μ, p(μ) = N(μ|μ0, σ0²), this gives the posterior
p(μ|X) ∝ p(X|μ) p(μ).
Completing the square over μ, we see that p(μ|X) = N(μ|μN, σN²), where
μN = (σ² μ0 + N σ0² μML) / (N σ0² + σ²),  1/σN² = 1/σ0² + N/σ².
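As a sketch of these closed-form updates (the data values and hyperparameters below are invented for illustration), the posterior mean interpolates between the prior mean μ0 and the maximum-likelihood estimate μML:

```python
# Posterior over the mean mu of a Gaussian with known variance sigma2,
# given a Gaussian prior N(mu | mu0, sigma0_2).

def posterior_mean_params(data, sigma2, mu0, sigma0_2):
    n = len(data)
    mu_ml = sum(data) / n
    # mu_N = (sigma^2 mu_0 + N sigma_0^2 mu_ML) / (N sigma_0^2 + sigma^2)
    mu_n = (sigma2 * mu0 + n * sigma0_2 * mu_ml) / (n * sigma0_2 + sigma2)
    # 1 / sigma_N^2 = 1 / sigma_0^2 + N / sigma^2
    sigma_n_2 = 1.0 / (1.0 / sigma0_2 + n / sigma2)
    return mu_n, sigma_n_2

mu_n, var_n = posterior_mean_params([0.9, 1.1, 1.3], sigma2=1.0,
                                    mu0=0.0, sigma0_2=1.0)
print(mu_n, var_n)  # mu_ML = 1.1 is shrunk toward the prior mean 0
```

As N grows, μN approaches μML and σN² shrinks toward zero, so the data eventually dominate the prior.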
26. Bayesian Inference for the Gaussian (5)
Now assume μ is known. The likelihood function for λ ≡ 1/σ² is given by
p(X|λ) = ∏_{n=1}^N N(xn|μ, λ⁻¹) ∝ λ^(N/2) exp{ −(λ/2) Σ_{n=1}^N (xn − μ)² }.
This has a Gamma shape as a function of λ.
28. Bayesian Inference for the Gaussian (7)
Now we combine a Gamma prior, Gam(λ|a0, b0), with the likelihood function for λ to obtain
p(λ|X) ∝ λ^(a0−1) λ^(N/2) exp{ −b0 λ − (λ/2) Σ_{n=1}^N (xn − μ)² },
which we recognize as Gam(λ|aN, bN) with
aN = a0 + N/2,  bN = b0 + (1/2) Σ_{n=1}^N (xn − μ)² = b0 + (N/2) σ²ML.
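A minimal sketch of this conjugate update (the data and prior values are made up for illustration):

```python
# Conjugate Gamma update for the precision lambda = 1/sigma^2 of a
# Gaussian with known mean mu: Gam(a0, b0) prior -> Gam(aN, bN) posterior.

def posterior_precision_params(data, mu, a0, b0):
    n = len(data)
    a_n = a0 + n / 2.0                                  # a_N = a_0 + N/2
    b_n = b0 + 0.5 * sum((x - mu) ** 2 for x in data)   # b_N = b_0 + sum/2
    return a_n, b_n

a_n, b_n = posterior_precision_params([1.0, 3.0], mu=2.0, a0=1.0, b0=1.0)
print(a_n, b_n)  # a_N = 2.0, b_N = 2.0
```

The prior here behaves like 2·a0 "effective" prior observations with a total squared deviation of 2·b0, which foreshadows the "prior as memory" discussion at the end of this document.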
29. Bayesian Inference for the Gaussian (8)
If both μ and λ are unknown, the joint likelihood function is given by
p(X|μ, λ) = ∏_{n=1}^N (λ/(2π))^(1/2) exp{ −(λ/2)(xn − μ)² }.
We need a prior with the same functional dependence on μ and λ: the Gaussian-Gamma (normal-gamma) distribution
p(μ, λ) = N(μ|μ0, (βλ)⁻¹) Gam(λ|a, b).
38. Student’s t-Distribution (2)
If we use the inverse Wishart distribution as the prior (over the covariance of a multivariate Gaussian) and integrate it out, we also get a Student's t-distribution.
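The univariate version of this mixture representation can be checked numerically: integrating N(x|0, λ⁻¹) against a Gam(λ|ν/2, ν/2) prior over the precision reproduces the Student's t density. The values ν = 3 and x = 1 below are arbitrary test points:

```python
import math

def t_density(x, nu):
    # Closed-form Student's t density (zero mean, unit scale).
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_density_as_mixture(x, nu, steps=120000, lam_max=60.0):
    # Numerically integrate N(x | 0, 1/lam) * Gam(lam | nu/2, nu/2) d lam,
    # i.e. the "infinite mixture of Gaussians" view of Student's t.
    a = b = nu / 2.0
    h = lam_max / steps
    total = 0.0
    for i in range(1, steps + 1):
        lam = i * h
        gauss = math.sqrt(lam / (2 * math.pi)) * math.exp(-0.5 * lam * x * x)
        gamma_pdf = (b ** a / math.gamma(a)) * lam ** (a - 1) * math.exp(-b * lam)
        total += gauss * gamma_pdf * h
    return total

print(t_density(1.0, nu=3.0))
print(t_density_as_mixture(1.0, nu=3.0))  # agrees to ~4 decimal places
```

Because the mixture puts mass on many different precisions, the resulting density has heavier tails than any single Gaussian, which is the source of the robustness to outliers noted above.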
Gibbs sampling for fitting finite and infinite Gaussian mixture models by Herman Kamper
39. An application of the IGMM (1)
How many Gaussian distributions are needed to approximate the data shown on the right?
40. An application of the IGMM (2)
For this data set the model settles on 6 Gaussians. Let the data speak for itself.
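The Kamper notes referenced above derive full Gibbs samplers for finite and infinite GMMs. As a much simpler teaching sketch (not Kamper's implementation: this toy version assumes one dimension, a known shared variance, uniform mixing weights, and invented hyperparameters), a Gibbs sampler for a finite 1-D GMM alternates between sampling assignments and sampling component means:

```python
import math, random

def gibbs_gmm_1d(data, K, sweeps=50, sigma2=1.0, mu0=0.0, sigma0_2=100.0, seed=0):
    """Toy Gibbs sampler: K Gaussians with known shared variance sigma2,
    uniform mixing weights, and a N(mu0, sigma0_2) prior on each mean."""
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in data]          # random initial assignments
    mu = [rng.gauss(mu0, 1.0) for _ in range(K)]  # initial component means
    for _ in range(sweeps):
        # 1. Resample each assignment given the current means.
        for i, x in enumerate(data):
            logp = [-0.5 * (x - mu[k]) ** 2 / sigma2 for k in range(K)]
            m = max(logp)
            w = [math.exp(lp - m) for lp in logp]
            r = rng.random() * sum(w)
            acc = 0.0
            for k in range(K):
                acc += w[k]
                if r <= acc:
                    z[i] = k
                    break
        # 2. Resample each mean from its Gaussian conditional posterior
        #    (the same known-variance update derived earlier).
        for k in range(K):
            xs = [x for x, zi in zip(data, z) if zi == k]
            n = len(xs)
            var_n = 1.0 / (1.0 / sigma0_2 + n / sigma2)
            mean_n = var_n * (mu0 / sigma0_2 + sum(xs) / sigma2)
            mu[k] = rng.gauss(mean_n, math.sqrt(var_n))
    return sorted(mu)

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(60)] + \
       [random.gauss(10.0, 1.0) for _ in range(60)]
print(gibbs_gmm_1d(data, K=2))  # means should land near 0 and 10
```

The infinite (IGMM) case replaces step 1 with Chinese-restaurant-process assignment probabilities so that K itself is inferred; the conditional for the means is unchanged.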
The likelihood function depends on the data set only through the two quantities Σn xn and Σn xn² (the sufficient statistics).
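This can be verified directly (with made-up data and parameters): the Gaussian log-likelihood computed from the raw data matches the one computed from the two sufficient statistics alone.

```python
import math

def loglik_raw(data, mu, sigma2):
    # Log-likelihood summed over the raw observations.
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in data)

def loglik_suffstats(n, s1, s2, mu, sigma2):
    # Same quantity from the sufficient statistics s1 = sum(x), s2 = sum(x^2),
    # using sum((x - mu)^2) = s2 - 2*mu*s1 + n*mu^2.
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - (s2 - 2 * mu * s1 + n * mu * mu) / (2 * sigma2))

data = [0.5, 1.5, 2.5, -1.0]
a = loglik_raw(data, mu=0.7, sigma2=2.0)
b = loglik_suffstats(len(data), sum(data), sum(x * x for x in data), 0.7, 2.0)
print(a, b)  # identical up to floating-point rounding
```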
We want a better explanation. The prior does not only encode prior knowledge; it also acts as a "memory" in the inference process. For example: a professor publishes 0 papers this week and I publish 1. Looking at this week alone, it seems I am the better researcher. But nobody reasons this way: you have to consider the history. Perhaps the professor has published 1000 papers before, while I have only 2. What we did before is the prior, and it helps us reach a better judgement.
But if I keep publishing while the professor stops, then by the time I have 2000 papers of the same quality, it becomes likely that I have surpassed him. Throughout this process, the prior acts as a "memory".
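The "prior as memory" idea can be made concrete with a conjugate Beta-Binomial sketch. The modelling here is deliberately loose (weekly paper counts treated as Bernoulli pseudo-counts, with numbers invented to mirror the story), but it shows how the prior carries the history:

```python
# Prior as "memory": a Beta(a, b) prior over a success rate behaves like
# a + b pseudo-observations that this week's data must compete with.

def posterior_mean(successes, failures, prior_a, prior_b):
    # Posterior mean of a Beta-Binomial model: (a + s) / (a + b + s + f).
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)

# History as the prior: the professor's 1000 past papers vs my 2,
# encoded (loosely) as pseudo-counts of productive weeks.
prof = posterior_mean(0, 1, prior_a=1000, prior_b=1)   # 0 papers this week
me = posterior_mean(1, 0, prior_a=2, prior_b=1)        # 1 paper this week
print(prof, me)  # the professor's estimate stays far higher
```

Only as my own counts grow large does the posterior forget the initial gap, which matches the 2000-papers ending of the story above.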