2. The Gaussian Distribution
For a 𝐷-dimensional vector 𝑥, the multivariate Gaussian
distribution takes the form:
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu) \right\}$$
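To make the formula concrete, here is a minimal sketch (the values of 𝜇, Σ, and 𝑥 are assumed example inputs) that evaluates the density directly and compares it with scipy's reference implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma) directly from the density formula."""
    D = mu.shape[0]
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.5])
print(gaussian_pdf(x, mu, Sigma))             # direct evaluation
print(multivariate_normal(mu, Sigma).pdf(x))  # reference value, should match
```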
Motivations:
- Maximum entropy: among all distributions with a given mean and covariance, the Gaussian has the largest entropy.
- Central limit theorem: the sum of a large number of i.i.d. random variables tends to a Gaussian.
3. The Gaussian Distribution: Properties
The density depends on 𝑥 only through the Mahalanobis distance from 𝑥 to 𝜇:
$$\Delta^2 = (x - \mu)^\top \Sigma^{-1} (x - \mu)$$
The expectation of 𝑥 under the Gaussian distribution is:
$$\mathbb{E}[x] = \mu$$
The covariance matrix of 𝑥 is:
$$\operatorname{cov}[x] = \Sigma$$
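These moments can be checked by simulation; a minimal Monte Carlo sketch with assumed example values of 𝜇 and Σ:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=100_000)

print(X.mean(axis=0))           # close to mu
print(np.cov(X, rowvar=False))  # close to Sigma
```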
4. The Gaussian Distribution: Properties
The density is constant on ellipsoidal surfaces on which the quadratic form Δ² is constant. Writing the eigendecomposition $\Sigma u_i = \lambda_i u_i$:
- $\lambda_i$ are the eigenvalues of Σ
- $u_i$ are the associated eigenvectors
The ellipsoids are centered at 𝜇, with axes along the eigenvectors $u_i$ and half-lengths proportional to $\lambda_i^{1/2}$.
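A small sketch (assumed example Σ) that extracts these axes numerically:

```python
import numpy as np

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
lam, U = np.linalg.eigh(Sigma)          # eigenvalues lam_i, eigenvectors in columns of U
print(lam)                              # axis half-lengths scale as sqrt(lam_i)
print(U)                                # axis directions u_i
print(np.allclose(Sigma @ U, U * lam))  # checks Sigma u_i = lam_i u_i; True
```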
5. The Gaussian Distribution: more examples
Contours of constant probability density for a Gaussian in which the covariance matrix is:
a) of general form (tilted ellipses)
b) diagonal (axis-aligned ellipses)
c) proportional to the identity matrix (circular contours; see the example matrices below)
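For illustration, the three covariance structures as concrete matrices (all values assumed):

```python
import numpy as np

Sigma_general   = np.array([[2.0, 0.8],
                            [0.8, 1.0]])    # (a) general form: tilted ellipses
Sigma_diagonal  = np.diag([2.0, 0.5])       # (b) diagonal: axis-aligned ellipses
Sigma_isotropic = 0.7 * np.eye(2)           # (c) sigma^2 * I: circular contours
```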
6. Conditional Law
Given a Gaussian distribution 𝒩(𝑥|𝜇, Σ) with $x = (x_a, x_b)^\top$, $\mu = (\mu_a, \mu_b)^\top$, and
$$\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix} = \begin{pmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{pmatrix}^{-1}$$
where Λ = Σ⁻¹ is the precision matrix.
What’s the conditional distribution 𝑝(𝑥𝑎|𝑥𝑏)?
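For reference, the standard Gaussian-conditioning result: the conditional is again Gaussian, with mean $\mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b)$ and covariance $\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$. A minimal sketch implementing this (all numbers are assumed example values):

```python
import numpy as np

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])
a, b = [0], [1, 2]          # partition: x_a = x[0], x_b = x[1:]
x_b = np.array([0.8, 2.4])  # observed value of x_b

S_aa = Sigma[np.ix_(a, a)]
S_ab = Sigma[np.ix_(a, b)]
S_bb = Sigma[np.ix_(b, b)]

K = S_ab @ np.linalg.inv(S_bb)        # gain Sigma_ab Sigma_bb^{-1}
mu_cond = mu[a] + K @ (x_b - mu[b])   # conditional mean
Sigma_cond = S_aa - K @ S_ab.T        # conditional covariance (Schur complement)
print(mu_cond, Sigma_cond)
```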
13. Bayes’ theorem for Gaussian variables
Setup:
$$p(x) = \mathcal{N}(x \mid \mu, \Lambda^{-1})$$
$$p(y \mid x) = \mathcal{N}(y \mid Ax + b, L^{-1})$$
What are the marginal distribution 𝑝(𝑦) and the conditional distribution 𝑝(𝑥|𝑦)?
A good strategy is to first compute 𝑝(𝑧), where $z = (x, y)^\top$. Since 𝑝(𝑧) is a Gaussian distribution, consider the log of the joint distribution:
$$\ln p(z) = \ln p(x) + \ln p(y \mid x) = -\frac{1}{2}(x - \mu)^\top \Lambda (x - \mu) - \frac{1}{2}\big(y - Ax - b\big)^\top L \big(y - Ax - b\big) + \text{const}$$
14. Bayes’ theorem for Gaussian variables
By the same trick (inspecting the second-order terms), we get
$$\mathbb{E}[z] = \begin{pmatrix} \mu \\ A\mu + b \end{pmatrix}, \qquad \operatorname{cov}[z] = \begin{pmatrix} \Lambda^{-1} & \Lambda^{-1}A^\top \\ A\Lambda^{-1} & L^{-1} + A\Lambda^{-1}A^\top \end{pmatrix}$$
We can then obtain 𝑝(𝑦) and 𝑝(𝑥|𝑦) by the marginal and conditional laws!
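A Monte Carlo sketch (all numbers are assumed example values) checking the resulting marginal, $p(y) = \mathcal{N}(y \mid A\mu + b,\; L^{-1} + A\Lambda^{-1}A^\top)$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Lam_inv = np.array([[1.0, 0.2],
                    [0.2, 0.5]])   # Lambda^{-1}
A = np.array([[1.0, 2.0]])
b = np.array([0.5])
L_inv = np.array([[0.3]])          # L^{-1}

n = 200_000
x = rng.multivariate_normal(mu, Lam_inv, size=n)
noise = rng.multivariate_normal(np.zeros(1), L_inv, size=n)
y = x @ A.T + b + noise

print(y.mean(axis=0), A @ mu + b)                          # both ~ A mu + b
print(y.var(axis=0), (L_inv + A @ Lam_inv @ A.T).ravel())  # both ~ L^{-1} + A Lam^{-1} A^T
```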
15. Maximum likelihood for the Gaussian
Assume we have $X = (x_1, \dots, x_N)^\top$, in which the observations 𝑥𝑛 are assumed to be drawn independently from a multivariate Gaussian. The log likelihood function is given by
$$\ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n - \mu)^\top \Sigma^{-1} (x_n - \mu)$$
Setting the derivatives to zero, we obtain the maximum likelihood estimators:
$$\frac{\partial}{\partial \mu} \ln p(X \mid \mu, \Sigma) = \sum_{n=1}^{N} \Sigma^{-1}(x_n - \mu) = 0 \;\Rightarrow\; \mu_{\text{ML}} = \frac{1}{N}\sum_{n=1}^{N} x_n$$
$$\Sigma_{\text{ML}} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \mu_{\text{ML}})(x_n - \mu_{\text{ML}})^\top$$
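A minimal sketch of these estimators on synthetic data (the true 𝜇 and Σ below are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 1.0], [[2.0, 0.3], [0.3, 1.0]], size=500)
N = X.shape[0]

mu_ml = X.mean(axis=0)        # sample mean
diff = X - mu_ml
Sigma_ml = diff.T @ diff / N  # 1/N outer-product sum (biased)
print(mu_ml)
print(Sigma_ml)
```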
16. Maximum likelihood for the Gaussian
The empirical mean is unbiased in the sense that
$$\mathbb{E}[\mu_{\text{ML}}] = \mu$$
However, the maximum likelihood estimate of the covariance has an expectation that is less than the true value:
$$\mathbb{E}[\Sigma_{\text{ML}}] = \frac{N-1}{N}\Sigma$$
We can correct this bias by multiplying $\Sigma_{\text{ML}}$ by the factor $\frac{N}{N-1}$.
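A quick check (on assumed synthetic data) that the corrected estimate coincides with the usual unbiased sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=50)
N = X.shape[0]

diff = X - X.mean(axis=0)
Sigma_ml = diff.T @ diff / N             # biased ML estimate (1/N)
Sigma_unbiased = Sigma_ml * N / (N - 1)  # corrected estimate
print(np.allclose(Sigma_unbiased, np.cov(X, rowvar=False)))  # True: matches ddof=1
```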
17. Conjugate prior for the Gaussian
Introducing prior distributions over the parameters of the Gaussian:
- The maximum likelihood framework only gives point estimates for the parameters; we would also like an uncertainty estimate (e.g., a confidence interval) for the estimation.
- The conjugate prior for 𝜇 (with the precision known) is a Gaussian.
- The conjugate prior for the precision is a Gamma distribution in the univariate case (a Wishart distribution for a precision matrix Λ).
- We would like the posterior $p(\theta \mid D) \propto p(\theta)\,p(D \mid \theta)$ to have the same functional form as the prior; this is what makes a prior conjugate (see the sketch below).
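A minimal sketch of conjugate updating in the simplest case, a univariate Gaussian mean with known variance (all prior and data values are assumed; the update follows the standard Gaussian-Gaussian conjugacy formulas):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0                                   # known observation variance
x = rng.normal(0.5, np.sqrt(sigma2), size=20)  # data with true mean 0.5
N, x_bar = len(x), x.mean()

mu0, sigma0_sq = 0.0, 10.0                     # Gaussian prior on mu
post_var = 1.0 / (1.0 / sigma0_sq + N / sigma2)               # precisions add
post_mean = post_var * (mu0 / sigma0_sq + N * x_bar / sigma2)  # precision-weighted mean
print(post_mean, post_var)   # posterior is again Gaussian: same form as the prior
```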
18. The Gaussian Distribution: limitations
- A lot of parameters to estimate, $D + \frac{D^2 + D}{2}$: use a structured approximation (e.g., a diagonal covariance matrix); see the count sketch after this list.
- Maximum likelihood estimators are not robust to outliers: use Student's t-distribution.
- Not able to describe periodic data: use the von Mises distribution.
- Unimodal: use a mixture of Gaussians.
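An illustration of the parameter count (D = 100 is an assumed example):

```python
D = 100                           # assumed dimensionality
full     = D + D * (D + 1) // 2   # mean + symmetric covariance: 5150 parameters
diagonal = D + D                  # mean + diagonal covariance:   200 parameters
print(full, diagonal)
```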
19. The Gaussian Distribution: frontiers
Gaussian Process
Bayesian Neural Networks
Generative modeling (Variational Autoencoder)
…