The Gaussian Distribution
CSC 411/2515
Adopted from PRML 2.3
The Gaussian Distribution
 For a 𝐷-dimensional vector 𝑥, the multivariate Gaussian
distribution takes the form:
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$$
(A short numerical check of this formula follows the motivation bullets below.)
 Motivations:
 Maximum entropy distribution for a given mean and covariance
 Central limit theorem
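As a quick sanity check of the density formula above, here is a minimal NumPy/SciPy sketch (not part of the original slides; the particular μ, Σ, and x values are made up) that evaluates the formula directly and compares it with scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density N(x | mu, Sigma)."""
    D = mu.shape[0]
    diff = x - mu
    norm_const = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** (-0.5)
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    return norm_const * np.exp(-0.5 * quad)

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.7])

print(gaussian_pdf(x, mu, Sigma))              # manual formula
print(multivariate_normal(mu, Sigma).pdf(x))   # SciPy reference, should match
```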
The Gaussian Distribution: Properties
 The density depends on $x$ only through the Mahalanobis distance from $x$ to $\mu$:
$$\Delta^2 = (x-\mu)^\top \Sigma^{-1} (x-\mu)$$
 The expectation of $x$ under the Gaussian distribution is $\mathbb{E}[x] = \mu$
 The covariance matrix of $x$ is $\operatorname{cov}[x] = \Sigma$
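A small Monte Carlo illustration of these two properties, assuming arbitrary example values for μ and Σ: samples drawn from 𝒩(μ, Σ) should have an empirical mean close to μ and an empirical covariance close to Σ.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

samples = rng.multivariate_normal(mu, Sigma, size=100_000)

print(samples.mean(axis=0))           # close to mu
print(np.cov(samples, rowvar=False))  # close to Sigma
```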
The Gaussian Distribution: Properties
 The density is constant on elliptical surfaces, on which the quadratic form $\Delta^2$ is constant:
 $\lambda_i$ are the eigenvalues of $\Sigma$
 $u_i$ are the associated eigenvectors; the ellipsoid axes point along the $u_i$, with half-lengths proportional to $\lambda_i^{1/2}$
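The eigenstructure can be computed directly; the sketch below (example Σ chosen arbitrarily) recovers the λ_i and u_i, and the axis half-lengths of the constant-density ellipse scale with √λ_i:

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# eigh is appropriate for symmetric matrices; columns of U are the eigenvectors u_i
lam, U = np.linalg.eigh(Sigma)

print("eigenvalues  lambda_i:", lam)
print("eigenvectors u_i (columns):\n", U)
print("ellipse axis half-lengths ~", np.sqrt(lam))
```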
The Gaussian Distribution: more examples
 Contours of constant probability density, for a covariance matrix that is:
a) of general form
b) diagonal
c) proportional to the identity matrix
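For concreteness, the three covariance structures can be written down as follows (a small illustrative sketch with arbitrary numbers, standing in for the contour figure):

```python
import numpy as np

D = 2
Sigma_general   = np.array([[2.0, 0.8],
                            [0.8, 1.0]])   # (a) general symmetric positive definite: tilted ellipses
Sigma_diagonal  = np.diag([2.0, 1.0])      # (b) diagonal: axis-aligned ellipses
Sigma_isotropic = 1.5 * np.eye(D)          # (c) proportional to the identity: circles

for name, S in [("general", Sigma_general),
                ("diagonal", Sigma_diagonal),
                ("isotropic", Sigma_isotropic)]:
    print(name, "eigenvalues:", np.linalg.eigvalsh(S))
```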
Conditional Law
 Given a Gaussian distribution $\mathcal{N}(x \mid \mu, \Sigma)$ with the partition
$$x = \begin{pmatrix} x_a \\ x_b \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix} = \begin{pmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{pmatrix}^{-1}$$
 What’s the conditional distribution 𝑝(𝑥𝑎|𝑥𝑏)?
Conditional Law
 Given a Gaussian distribution $\mathcal{N}(x \mid \mu, \Sigma)$ with the partition
$$x = \begin{pmatrix} x_a \\ x_b \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix} = \begin{pmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{pmatrix}^{-1}$$
 What’s the conditional distribution 𝑝(𝑥𝑎|𝑥𝑏)?
$$-\frac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu) = -\frac{1}{2}(x_a-\mu_a)^\top \Lambda_{aa}(x_a-\mu_a) - \frac{1}{2}(x_a-\mu_a)^\top \Lambda_{ab}(x_b-\mu_b) - \frac{1}{2}(x_b-\mu_b)^\top \Lambda_{ba}(x_a-\mu_a) - \frac{1}{2}(x_b-\mu_b)^\top \Lambda_{bb}(x_b-\mu_b)$$
Conditional Law
 Using the general quadratic form:
$$-\frac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu) = -\frac{1}{2}x^\top \Sigma^{-1} x + x^\top \Sigma^{-1} \mu + \text{const}$$
 What’s the conditional distribution 𝑝(𝑥𝑎|𝑥𝑏)?
 Comparing the terms that are quadratic and linear in $x_a$ gives:
$$\Sigma_{a|b}^{-1} = \Lambda_{aa}, \qquad \Sigma_{a|b}^{-1}\mu_{a|b} = \Lambda_{aa}\mu_a - \Lambda_{ab}(x_b - \mu_b)$$
 Recall the partition
$$\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix} = \begin{pmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{pmatrix}^{-1}$$
 $\Lambda_{aa} = \left(\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\right)^{-1}$
 $\Lambda_{ab} = -\left(\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\right)^{-1}\Sigma_{ab}\Sigma_{bb}^{-1}$
Inverse partition identity:
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} + D^{-1}CMBD^{-1} \end{pmatrix}, \qquad M = \left(A - BD^{-1}C\right)^{-1}$$
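A quick numerical check of this identity on a random partitioned matrix (illustrative only; any partition with invertible $D$ and Schur complement would do):

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a random symmetric positive definite matrix and partition it
X = rng.standard_normal((5, 5))
S = X @ X.T + 5 * np.eye(5)
A, B = S[:2, :2], S[:2, 2:]
C, D = S[2:, :2], S[2:, 2:]

M = np.linalg.inv(A - B @ np.linalg.inv(D) @ C)   # inverse Schur complement
Dinv = np.linalg.inv(D)
top    = np.hstack([M,             -M @ B @ Dinv])
bottom = np.hstack([-Dinv @ C @ M, Dinv + Dinv @ C @ M @ B @ Dinv])
block_inverse = np.vstack([top, bottom])

print(np.allclose(block_inverse, np.linalg.inv(S)))   # True
```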
Conditional Law
 The conditional distribution $p(x_a \mid x_b)$ is a Gaussian with
$$\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b), \qquad \Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$$
 Equivalently, in terms of the precision matrix:
$$\Lambda_{a|b} = \Lambda_{aa}, \qquad \mu_{a|b} = \mu_a - \Lambda_{aa}^{-1}\Lambda_{ab}(x_b - \mu_b)$$
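Here is a minimal NumPy sketch of these conditional formulas, using made-up values for μ, Σ, and the observed x_b; it also confirms that the covariance form and the precision form agree:

```python
import numpy as np

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
a, b = [0], [1, 2]               # partition: x_a = x[0], x_b = x[1:3]
x_b = np.array([1.2, -0.4])      # observed value of x_b

S_aa, S_ab = Sigma[np.ix_(a, a)], Sigma[np.ix_(a, b)]
S_ba, S_bb = Sigma[np.ix_(b, a)], Sigma[np.ix_(b, b)]

# Covariance form
mu_cond    = mu[a] + S_ab @ np.linalg.solve(S_bb, x_b - mu[b])
Sigma_cond = S_aa - S_ab @ np.linalg.solve(S_bb, S_ba)

# Precision form (should agree)
Lam = np.linalg.inv(Sigma)
L_aa, L_ab = Lam[np.ix_(a, a)], Lam[np.ix_(a, b)]
mu_cond2    = mu[a] - np.linalg.solve(L_aa, L_ab @ (x_b - mu[b]))
Sigma_cond2 = np.linalg.inv(L_aa)

print(mu_cond, Sigma_cond)
print(np.allclose(mu_cond, mu_cond2), np.allclose(Sigma_cond, Sigma_cond2))
```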
Marginal Law
 The marginal distribution is given by:
$$p(x_a) = \int p(x_a, x_b)\, dx_b$$
 Picking out those terms that involve $x_b$, we have
$$-\frac{1}{2}x_b^\top \Lambda_{bb} x_b + x_b^\top m = -\frac{1}{2}\left(x_b - \Lambda_{bb}^{-1}m\right)^\top \Lambda_{bb} \left(x_b - \Lambda_{bb}^{-1}m\right) + \frac{1}{2}m^\top \Lambda_{bb}^{-1} m$$
where $m = \Lambda_{bb}\mu_b - \Lambda_{ba}(x_a - \mu_a)$
 Integrate over $x_b$ (an unnormalized Gaussian):
$$\int \exp\!\left(-\frac{1}{2}\left(x_b - \Lambda_{bb}^{-1}m\right)^\top \Lambda_{bb} \left(x_b - \Lambda_{bb}^{-1}m\right)\right) dx_b$$
 This integral equals the Gaussian normalization constant, which does not depend on $x_a$
Marginal Law
 After integrating over $x_b$, we pick out the remaining terms that depend on $x_a$:
$$-\frac{1}{2}x_a^\top \Lambda_{aa} x_a + x_a^\top\left(\Lambda_{aa}\mu_a + \Lambda_{ab}\mu_b\right) + \frac{1}{2}m^\top \Lambda_{bb}^{-1} m + \text{const}, \qquad m = \Lambda_{bb}\mu_b - \Lambda_{ba}(x_a - \mu_a)$$
 The marginal distribution is a Gaussian with
$$\mathbb{E}[x_a] = \mu_a, \qquad \operatorname{cov}[x_a] = \Sigma_{aa}$$
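A sampling-based illustration (arbitrary example values) that marginalizing a joint Gaussian simply picks out μ_a and Σ_aa: draw joint samples, drop x_b, and compare empirical moments:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
a = [0, 1]                                  # keep x_a, integrate out x_b

z = rng.multivariate_normal(mu, Sigma, size=200_000)
x_a = z[:, a]                               # marginal samples: just drop the x_b coordinates

print(x_a.mean(axis=0))                     # ~ mu[a]
print(np.cov(x_a, rowvar=False))            # ~ Sigma_aa
print(Sigma[np.ix_(a, a)])
```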
Short Summary
 Conditional distribution:
$$p(x_a \mid x_b) = \mathcal{N}\!\left(x_a \mid \mu_{a|b}, \Lambda_{aa}^{-1}\right), \qquad \mu_{a|b} = \mu_a - \Lambda_{aa}^{-1}\Lambda_{ab}(x_b - \mu_b)$$
 Marginal distribution:
$$p(x_a) = \mathcal{N}(x_a \mid \mu_a, \Sigma_{aa})$$
where
$$\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix} = \begin{pmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{pmatrix}^{-1}$$
Bayes’ theorem for Gaussian variables
 Setup:
$$p(x) = \mathcal{N}(x \mid \mu, \Lambda^{-1}), \qquad p(y \mid x) = \mathcal{N}(y \mid Ax + b, L^{-1})$$
 What are the marginal distribution $p(y)$ and the conditional distribution $p(x \mid y)$?
 Strategy: first compute $p(z)$, where $z = (x, y)^\top$
 $p(z)$ is a Gaussian distribution; consider the log of the joint distribution:
$$\ln p(z) = \ln p(x) + \ln p(y \mid x) = -\frac{1}{2}(x-\mu)^\top \Lambda (x-\mu) - \frac{1}{2}(y - Ax - b)^\top L (y - Ax - b) + \text{const}$$
Bayes’ theorem for Gaussian variables
 Using the same trick (examine the second-order terms), we get
$$\mathbb{E}[z] = \begin{pmatrix} \mu \\ A\mu + b \end{pmatrix}, \qquad \operatorname{cov}[z] = \begin{pmatrix} \Lambda^{-1} & \Lambda^{-1}A^\top \\ A\Lambda^{-1} & L^{-1} + A\Lambda^{-1}A^\top \end{pmatrix}$$
 We can then obtain $p(y)$ and $p(x \mid y)$ from the marginal and conditional laws above!
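The sketch below builds 𝔼[z] and cov[z] for a small linear-Gaussian model with invented A, b, Λ, L, then reads off p(y) as the y-block and applies the conditional law for p(x|y); it illustrates the recipe on this slide, not a general-purpose implementation:

```python
import numpy as np

# p(x) = N(x | mu, Lam^{-1}),  p(y|x) = N(y | A x + b, L^{-1})   (toy numbers)
mu  = np.array([1.0, -0.5])
Lam = np.array([[2.0, 0.3],
                [0.3, 1.0]])
A   = np.array([[1.0, 2.0]])          # y is 1-dimensional here
b   = np.array([0.5])
L   = np.array([[4.0]])

Lam_inv, L_inv = np.linalg.inv(Lam), np.linalg.inv(L)

# Joint z = (x, y): mean and covariance blocks from the slide
Ez    = np.concatenate([mu, A @ mu + b])
cov_z = np.block([[Lam_inv,      Lam_inv @ A.T],
                  [A @ Lam_inv,  L_inv + A @ Lam_inv @ A.T]])

# Marginal p(y): read off the y-block
mu_y, Sigma_y = Ez[2:], cov_z[2:, 2:]

# Conditional p(x | y) via the conditional law, for an observed y
y = np.array([1.3])
S_xx, S_xy = cov_z[:2, :2], cov_z[:2, 2:]
S_yx, S_yy = cov_z[2:, :2], cov_z[2:, 2:]
mu_x_given_y    = Ez[:2] + S_xy @ np.linalg.solve(S_yy, y - mu_y)
Sigma_x_given_y = S_xx - S_xy @ np.linalg.solve(S_yy, S_yx)

print(mu_y, Sigma_y)
print(mu_x_given_y, Sigma_x_given_y)
```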
Maximum likelihood for the Gaussian
 Assume we have $X = (x_1, \dots, x_N)^\top$, in which the observations $x_n$ are drawn independently from a multivariate Gaussian. The log-likelihood function is given by
$$\ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n - \mu)^\top \Sigma^{-1}(x_n - \mu)$$
 Setting the derivatives to zero, we obtain the maximum likelihood estimators:
$$\frac{\partial}{\partial \mu}\ln p(X \mid \mu, \Sigma) = \sum_{n=1}^{N}\Sigma^{-1}(x_n - \mu) = 0$$
$$\mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \Sigma_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu_{\mathrm{ML}})(x_n - \mu_{\mathrm{ML}})^\top$$
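A short NumPy sketch of these estimators on synthetic data (the true μ and Σ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[1.0, 0.4],
                       [0.4, 2.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=5_000)   # rows are x_n
N = X.shape[0]

mu_ml = X.mean(axis=0)                   # (1/N) sum_n x_n
diff = X - mu_ml
Sigma_ml = diff.T @ diff / N             # (1/N) sum_n (x_n - mu_ML)(x_n - mu_ML)^T

print(mu_ml)      # close to mu_true
print(Sigma_ml)   # close to Sigma_true (slightly biased toward smaller values)
```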
Maximum likelihood for the Gaussian
 The empirical mean is unbiased in the sense that
$$\mathbb{E}[\mu_{\mathrm{ML}}] = \mu$$
 However, the maximum likelihood estimate of the covariance has an expectation that is less than the true value:
$$\mathbb{E}[\Sigma_{\mathrm{ML}}] = \frac{N-1}{N}\Sigma$$
 We can correct this bias by multiplying $\Sigma_{\mathrm{ML}}$ by the factor $\frac{N}{N-1}$
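The bias can be seen empirically: averaging Σ_ML over many small datasets drawn from a known Σ gives roughly (N−1)/N · Σ. A minimal sketch (all numbers arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
Sigma_true = np.array([[1.0, 0.3],
                       [0.3, 2.0]])
mu_true, N, trials = np.zeros(2), 5, 20_000

acc = np.zeros((2, 2))
for _ in range(trials):
    X = rng.multivariate_normal(mu_true, Sigma_true, size=N)
    mu_ml = X.mean(axis=0)
    acc += (X - mu_ml).T @ (X - mu_ml) / N   # Sigma_ML for this dataset

print(acc / trials)                          # ~ (N-1)/N * Sigma_true
print((N - 1) / N * Sigma_true)
# Note: np.cov(X, rowvar=False) already uses the unbiased N-1 denominator by default.
```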
Conjugate prior for the Gaussian
 Introduce prior distributions over the parameters of the Gaussian
 The maximum likelihood framework only gives point estimates for the parameters; we would also like uncertainty estimates (interval estimates) for them
 The conjugate prior for $\mu$ (with known covariance) is a Gaussian
 The conjugate prior for the precision is a Gamma distribution in the univariate case (a Wishart distribution for a full precision matrix $\Lambda$)
 We would like the posterior $p(\theta \mid \mathcal{D}) \propto p(\theta)\,p(\mathcal{D} \mid \theta)$ to have the same functional form as the prior (a conjugate prior!)
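As one concrete instance (the univariate mean with known variance, which is the simplest conjugate case; the prior hyperparameters and data below are made up), a Gaussian prior on μ combines with the Gaussian likelihood to give a Gaussian posterior:

```python
import numpy as np

rng = np.random.default_rng(5)

# Univariate Gaussian likelihood with KNOWN variance sigma2; Gaussian prior on mu.
mu_true, sigma2 = 1.5, 0.5 ** 2
mu0, sigma0_2 = 0.0, 1.0 ** 2                 # prior N(mu | mu0, sigma0_2)

x = rng.normal(mu_true, np.sqrt(sigma2), size=20)
N, mu_ml = len(x), x.mean()

# Conjugate (Gaussian) posterior on mu: precisions add, means are precision-weighted
sigmaN_2 = 1.0 / (1.0 / sigma0_2 + N / sigma2)
muN = sigmaN_2 * (mu0 / sigma0_2 + N * mu_ml / sigma2)

print("posterior mean:", muN, " posterior variance:", sigmaN_2)
```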
The Gaussian Distribution: limitations
 Many parameters to estimate: $D + \frac{D^2 + D}{2}$ ($D$ for the mean plus $\frac{D(D+1)}{2}$ for the symmetric covariance); structured approximations help (e.g., a diagonal covariance matrix)
 Maximum likelihood estimators are not robust to outliers: Student's t-distribution
 Not able to describe periodic data: von Mises distribution
 Unimodal: mixture of Gaussians
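A tiny helper (illustrative) that counts these parameters for the full and diagonal cases:

```python
def gaussian_param_count(D, diagonal=False):
    """Number of free parameters: D for the mean, plus the covariance entries."""
    if diagonal:
        return D + D                 # diagonal covariance: D variances
    return D + D * (D + 1) // 2      # full symmetric covariance: D(D+1)/2 entries

for D in (2, 10, 100):
    print(D, gaussian_param_count(D), gaussian_param_count(D, diagonal=True))
```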
The Gaussian Distribution: frontiers
 Gaussian Process
 Bayesian Neural Networks
 Generative modeling (Variational Autoencoder)
 ……