2. The Gaussian Distribution
For a 𝐷-dimensional vector 𝑥, the multivariate Gaussian
distribution takes the form:
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu) \right\}$$
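To make the formula concrete, here is a minimal sketch (the values of 𝜇, Σ, and 𝑥 are assumed example inputs) that evaluates the density directly and compares it with scipy's reference implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma) directly from the density formula."""
    D = mu.shape[0]
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.5])
print(gaussian_pdf(x, mu, Sigma))             # direct evaluation
print(multivariate_normal(mu, Sigma).pdf(x))  # reference value, should match
```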
Motivations:
- Maximum entropy: among all distributions with a given mean and covariance, the Gaussian has the largest entropy.
- Central limit theorem: the sum of a large number of i.i.d. random variables tends to a Gaussian.
3. The Gaussian Distribution: Properties
The density depends on 𝑥 only through the Mahalanobis distance from 𝑥 to 𝜇:
$$\Delta^2 = (x - \mu)^\top \Sigma^{-1} (x - \mu)$$
The expectation of 𝑥 under the Gaussian distribution is:
$$\mathbb{E}[x] = \mu$$
The covariance matrix of 𝑥 is:
$$\operatorname{cov}[x] = \Sigma$$
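These moments can be checked by simulation; a minimal Monte Carlo sketch with assumed example values of 𝜇 and Σ:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=100_000)

print(X.mean(axis=0))           # close to mu
print(np.cov(X, rowvar=False))  # close to Sigma
```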
4. The Gaussian Distribution: Properties
The density is constant on ellipsoidal surfaces on which the quadratic form Δ² is constant. Writing the eigendecomposition $\Sigma u_i = \lambda_i u_i$:
- $\lambda_i$ are the eigenvalues of Σ
- $u_i$ are the associated eigenvectors
The ellipsoids are centered at 𝜇, with axes along the eigenvectors $u_i$ and half-lengths proportional to $\lambda_i^{1/2}$.
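A small sketch (assumed example Σ) that extracts these axes numerically:

```python
import numpy as np

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
lam, U = np.linalg.eigh(Sigma)          # eigenvalues lam_i, eigenvectors in columns of U
print(lam)                              # axis half-lengths scale as sqrt(lam_i)
print(U)                                # axis directions u_i
print(np.allclose(Sigma @ U, U * lam))  # checks Sigma u_i = lam_i u_i; True
```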
5. The Gaussian Distribution: more examples
Contours of constant probability density for a Gaussian in which the covariance matrix is:
a) of general form (tilted ellipses)
b) diagonal (axis-aligned ellipses)
c) proportional to the identity matrix (circular contours; see the example matrices below)
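For illustration, the three covariance structures as concrete matrices (all values assumed):

```python
import numpy as np

Sigma_general   = np.array([[2.0, 0.8],
                            [0.8, 1.0]])    # (a) general form: tilted ellipses
Sigma_diagonal  = np.diag([2.0, 0.5])       # (b) diagonal: axis-aligned ellipses
Sigma_isotropic = 0.7 * np.eye(2)           # (c) sigma^2 * I: circular contours
```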
6. Conditional Law
Given a Gaussian distribution 𝒩(𝑥|𝜇, Σ) with $x = (x_a, x_b)^\top$, $\mu = (\mu_a, \mu_b)^\top$, and
$$\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix} = \begin{pmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{pmatrix}^{-1}$$
where Λ = Σ⁻¹ is the precision matrix.
What’s the conditional distribution 𝑝(𝑥𝑎|𝑥𝑏)?
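For reference, the standard Gaussian-conditioning result: the conditional is again Gaussian, with mean $\mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b)$ and covariance $\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$. A minimal sketch implementing this (all numbers are assumed example values):

```python
import numpy as np

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])
a, b = [0], [1, 2]          # partition: x_a = x[0], x_b = x[1:]
x_b = np.array([0.8, 2.4])  # observed value of x_b

S_aa = Sigma[np.ix_(a, a)]
S_ab = Sigma[np.ix_(a, b)]
S_bb = Sigma[np.ix_(b, b)]

K = S_ab @ np.linalg.inv(S_bb)        # gain Sigma_ab Sigma_bb^{-1}
mu_cond = mu[a] + K @ (x_b - mu[b])   # conditional mean
Sigma_cond = S_aa - K @ S_ab.T        # conditional covariance (Schur complement)
print(mu_cond, Sigma_cond)
```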
13. Bayes’ theorem for Gaussian variables
Setup:
$$p(x) = \mathcal{N}(x \mid \mu, \Lambda^{-1})$$
$$p(y \mid x) = \mathcal{N}(y \mid Ax + b, L^{-1})$$
What are the marginal distribution 𝑝(𝑦) and the conditional distribution 𝑝(𝑥|𝑦)?
A good strategy is to first compute 𝑝(𝑧), where $z = (x, y)^\top$. Since 𝑝(𝑧) is a Gaussian distribution, consider the log of the joint distribution:
$$\ln p(z) = \ln p(x) + \ln p(y \mid x) = -\frac{1}{2}(x - \mu)^\top \Lambda (x - \mu) - \frac{1}{2}\big(y - Ax - b\big)^\top L \big(y - Ax - b\big) + \text{const}$$
14. Bayes’ theorem for Gaussian variables
By the same trick (inspecting the second-order terms), we get
$$\mathbb{E}[z] = \begin{pmatrix} \mu \\ A\mu + b \end{pmatrix}, \qquad \operatorname{cov}[z] = \begin{pmatrix} \Lambda^{-1} & \Lambda^{-1}A^\top \\ A\Lambda^{-1} & L^{-1} + A\Lambda^{-1}A^\top \end{pmatrix}$$
We can then obtain 𝑝(𝑦) and 𝑝(𝑥|𝑦) by the marginal and conditional laws!
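A Monte Carlo sketch (all numbers are assumed example values) checking the resulting marginal, $p(y) = \mathcal{N}(y \mid A\mu + b,\; L^{-1} + A\Lambda^{-1}A^\top)$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Lam_inv = np.array([[1.0, 0.2],
                    [0.2, 0.5]])   # Lambda^{-1}
A = np.array([[1.0, 2.0]])
b = np.array([0.5])
L_inv = np.array([[0.3]])          # L^{-1}

n = 200_000
x = rng.multivariate_normal(mu, Lam_inv, size=n)
noise = rng.multivariate_normal(np.zeros(1), L_inv, size=n)
y = x @ A.T + b + noise

print(y.mean(axis=0), A @ mu + b)                          # both ~ A mu + b
print(y.var(axis=0), (L_inv + A @ Lam_inv @ A.T).ravel())  # both ~ L^{-1} + A Lam^{-1} A^T
```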
15. Maximum likelihood for the Gaussian
Assume we have $X = (x_1, \dots, x_N)^\top$, in which the observations 𝑥𝑛 are assumed to be drawn independently from a multivariate Gaussian. The log likelihood function is given by
$$\ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n - \mu)^\top \Sigma^{-1} (x_n - \mu)$$
Setting the derivatives to zero, we obtain the maximum likelihood estimators:
$$\frac{\partial}{\partial \mu} \ln p(X \mid \mu, \Sigma) = \sum_{n=1}^{N} \Sigma^{-1}(x_n - \mu) = 0 \;\Rightarrow\; \mu_{\text{ML}} = \frac{1}{N}\sum_{n=1}^{N} x_n$$
$$\Sigma_{\text{ML}} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \mu_{\text{ML}})(x_n - \mu_{\text{ML}})^\top$$
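A minimal sketch of these estimators on synthetic data (the true 𝜇 and Σ below are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 1.0], [[2.0, 0.3], [0.3, 1.0]], size=500)
N = X.shape[0]

mu_ml = X.mean(axis=0)        # sample mean
diff = X - mu_ml
Sigma_ml = diff.T @ diff / N  # 1/N outer-product sum (biased)
print(mu_ml)
print(Sigma_ml)
```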
16. Maximum likelihood for the Gaussian
The empirical mean is unbiased in the sense that
$$\mathbb{E}[\mu_{\text{ML}}] = \mu$$
However, the maximum likelihood estimate of the covariance has an expectation that is less than the true value:
$$\mathbb{E}[\Sigma_{\text{ML}}] = \frac{N-1}{N}\Sigma$$
We can correct this bias by multiplying $\Sigma_{\text{ML}}$ by the factor $\frac{N}{N-1}$.
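A quick check (on assumed synthetic data) that the corrected estimate coincides with the usual unbiased sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=50)
N = X.shape[0]

diff = X - X.mean(axis=0)
Sigma_ml = diff.T @ diff / N             # biased ML estimate (1/N)
Sigma_unbiased = Sigma_ml * N / (N - 1)  # corrected estimate
print(np.allclose(Sigma_unbiased, np.cov(X, rowvar=False)))  # True: matches ddof=1
```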
17. Conjugate prior for the Gaussian
Introducing prior distributions over the parameters of the Gaussian:
- The maximum likelihood framework only gives point estimates for the parameters; we would also like an uncertainty estimate (e.g., a confidence interval) for the estimation.
- The conjugate prior for 𝜇 (with the precision known) is a Gaussian.
- The conjugate prior for the precision is a Gamma distribution in the univariate case (a Wishart distribution for a precision matrix Λ).
- We would like the posterior $p(\theta \mid D) \propto p(\theta)\,p(D \mid \theta)$ to have the same functional form as the prior; this is what makes a prior conjugate (see the sketch below).
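A minimal sketch of conjugate updating in the simplest case, a univariate Gaussian mean with known variance (all prior and data values are assumed; the update follows the standard Gaussian-Gaussian conjugacy formulas):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0                                   # known observation variance
x = rng.normal(0.5, np.sqrt(sigma2), size=20)  # data with true mean 0.5
N, x_bar = len(x), x.mean()

mu0, sigma0_sq = 0.0, 10.0                     # Gaussian prior on mu
post_var = 1.0 / (1.0 / sigma0_sq + N / sigma2)               # precisions add
post_mean = post_var * (mu0 / sigma0_sq + N * x_bar / sigma2)  # precision-weighted mean
print(post_mean, post_var)   # posterior is again Gaussian: same form as the prior
```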
18. The Gaussian Distribution: limitations
- A lot of parameters to estimate, $D + \frac{D^2 + D}{2}$: use a structured approximation (e.g., a diagonal covariance matrix); see the count sketch after this list.
- Maximum likelihood estimators are not robust to outliers: use Student's t-distribution.
- Not able to describe periodic data: use the von Mises distribution.
- Unimodal: use a mixture of Gaussians.
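An illustration of the parameter count (D = 100 is an assumed example):

```python
D = 100                           # assumed dimensionality
full     = D + D * (D + 1) // 2   # mean + symmetric covariance: 5150 parameters
diagonal = D + D                  # mean + diagonal covariance:   200 parameters
print(full, diagonal)
```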
19. The Gaussian Distribution: frontiers
Gaussian Process
Bayesian Neural Networks
Generative modeling (Variational Autoencoder)
…