Chapter 3
Pattern Recognition and Machine Learning, Christopher M. Bishop
Reviewer: Sunwoo Kim
Department of Applied Statistics, Yonsei University
Chapter 3.1. Basic Linear Regression

1. Common linear regression: $y(x, w) = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_D x_D$
2. Extension to basis functions: $y(x, w) = w_0 + w_1 \phi_1(x) + w_2 \phi_2(x) + \cdots + w_{M-1} \phi_{M-1}(x)$
3. Notable fact: there exists a relationship that…
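As a concrete illustration, here is a minimal sketch of building a design matrix from basis functions and fitting $w$ by least squares. The Gaussian bases, their centers and width, and the toy $\sin(2\pi x)$ data are assumptions for illustration, not values fixed by the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)                            # toy inputs
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 50)   # noisy targets

def design_matrix(x, centers, s=0.1):
    """Phi[n, 0] = 1 (bias); Phi[n, j] = exp(-(x_n - mu_j)^2 / (2 s^2))."""
    g = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s ** 2))
    return np.hstack([np.ones((len(x), 1)), g])

centers = np.linspace(0, 1, 9)
Phi = design_matrix(x, centers)                  # N x M design matrix
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # minimizes ||t - Phi w||^2
```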
Chapter 3.1. Basic Linear Regression

We can view the target as normally distributed around the regression line, i.e. the mean of the normal distribution passes through the line. Thus we can treat the optimization problem as an MLE task. The derivation was covered in undergraduate regression analysis.
Chapter 3.1. Basic Linear Regression

Understanding from a geometrical perspective

By definition, the projection matrix is $H = A(A^T A)^{-1} A^T$; $HB$ projects $B$ onto the column space of $A$.
Our estimated value is $\hat{t} = \Phi(\Phi^T \Phi)^{-1} \Phi^T t$; $Ht$ projects $t$ onto the column space of $\Phi$.

(Figure: the green vector $t$ is the target value; the blue vector $y$ is the estimated value, our $\hat{y}$.)
Sequential update of linear regression: the stochastic (LMS) update $w^{(\tau+1)} = w^{(\tau)} + \eta\,\bigl(t_n - w^{(\tau)T}\phi(x_n)\bigr)\,\phi(x_n)$ has a familiar form, just like gradient descent.
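A short sketch of both views on a toy problem; the design matrix and targets are random placeholders assumed only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 4))            # toy design matrix
t = rng.normal(size=50)                   # toy targets

# Hat (projection) matrix H = Phi (Phi^T Phi)^{-1} Phi^T
H = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)
y_hat = H @ t                             # projection of t onto col(Phi)
print(np.allclose(H @ H, H))              # idempotent, as a projection must be

# Sequential (LMS) update: w <- w + eta * (t_n - w^T phi_n) * phi_n
w, eta = np.zeros(4), 0.01
for _ in range(100):                      # a few passes over the data
    for phi_n, t_n in zip(Phi, t):
        w += eta * (t_n - w @ phi_n) * phi_n
```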
Chapter 3.1. Basic Linear Regression

Regularization

Regularization prevents overfitting; it is also called weight decay. The most common choice is L2 regularization:

$$L = \frac{1}{2}\sum_{n}\bigl(t_n - w^T \phi(x_n)\bigr)^2 + \frac{\lambda}{2}\, w^T w$$

The L1 (lasso) counterpart is

$$L = \frac{1}{2}\sum_{n}\bigl(t_n - w^T \phi(x_n)\bigr)^2 + \frac{\lambda}{2}\sum_{j} |w_j|$$

(Figures: for the L2 and L1 cases, the minimum of the loss without the penalty versus the minimum with the penalty.)
Theoretically, L1 regularization (the lasso) shrinks coefficients harder, all the way to zero, which yields sparse solutions. But the L1 penalty has no usable first- and second-order derivatives at zero, so there is no closed form; we fit the lasso by numerical optimization.
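For reference, a minimal sketch of the ridge closed form; $\lambda$ and the inputs are assumed for illustration. The lasso, by contrast, is left to a numerical solver.

```python
import numpy as np

def ridge_fit(Phi, t, lam):
    """Closed form: w = (lam I + Phi^T Phi)^{-1} Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

# The lasso objective is non-differentiable at w_j = 0, so it is fit
# numerically, e.g. by coordinate descent (scikit-learn's Lasso does this):
# from sklearn.linear_model import Lasso
# w_lasso = Lasso(alpha=lam).fit(Phi, t).coef_
```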
Chapter 3.1. Basic Linear Regression

Multiple outputs

This is a very interesting part: how do we estimate a linear regression with multiple outputs? E.g., with $X_1, X_2, \ldots, X_p$ we predict house price and house year at the same time. The solution $W_{ML} = (\Phi^T \Phi)^{-1} \Phi^T T$ indicates that even when we predict multiple outputs, we use the same design matrix and only change the target matrix $T$. Geometrically, we project each column vector of $T$ onto the column space of $\Phi$. We get the same result as computing the two outputs separately, since the column vectors of $T$ are assumed independent.
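A quick numerical check of this decoupling, on assumed random data:

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(100, 5))        # shared design matrix
T = rng.normal(size=(100, 2))          # two targets, e.g. price and year

# One solve for all outputs: W = (Phi^T Phi)^{-1} Phi^T T   (M x K)
W_joint = np.linalg.solve(Phi.T @ Phi, Phi.T @ T)

# Fitting each output separately gives identical columns.
W_sep = np.column_stack([
    np.linalg.solve(Phi.T @ Phi, Phi.T @ T[:, k]) for k in range(T.shape[1])
])
print(np.allclose(W_joint, W_sep))     # True
```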
Chapter 3.3. Bayesian linear regression

Prior & posterior of regression

Now we assume a probability distribution over the weights (parameters). Consider the simple conjugate prior for a normal pdf: we assume the parameter $w$ itself follows a normal distribution. To keep the entire process as simple as we can, we assume the simpler zero-mean isotropic prior (the univariate conjugate prior of the normal distribution: Normal prior / Normal likelihood / Normal posterior). The resulting posterior mean is a weighted average of the prior mean and the MLE.
Note that $\mathrm{Var}(W_{ML}) = \beta^{-1}(\Phi^T \Phi)^{-1}$ and $W_{ML} = (\Phi^T \Phi)^{-1}\Phi^T \boldsymbol{t}$. In the posterior mean $m_N = S_N\bigl(S_0^{-1} m_0 + \beta \Phi^T \boldsymbol{t}\bigr)$, the first term $S_0^{-1} m_0$ is the weighted prior mean, and the second term is the weighted MLE mean, since $\beta \Phi^T \boldsymbol{t} = \beta\,(\Phi^T \Phi)\,(\Phi^T \Phi)^{-1}\Phi^T \boldsymbol{t} = \beta\,(\Phi^T \Phi)\,W_{ML}$.
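For the simpler prior $p(w) = \mathcal{N}(w \mid 0, \alpha^{-1} I)$ used here, the general update specializes to the standard conjugate result:

$$p(w \mid \mathbf{t}) = \mathcal{N}(w \mid m_N, S_N), \qquad m_N = \beta\, S_N \Phi^T \mathbf{t}, \qquad S_N^{-1} = \alpha I + \beta\, \Phi^T \Phi .$$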
Chapter 3.3. Bayesian linear regression

Intrinsic regularization of Bayes regression

We know that likelihood × prior is proportional to the posterior. Let's reconsider the posterior from this point of view.
$$\ln p(w \mid t) = \ln \exp\left(-\frac{\beta}{2}\sum_n \bigl(t_n - w^T\phi(x_n)\bigr)^2\right) + \ln \exp\left(-\frac{\alpha}{2}\, w^T w\right) + C, \quad \text{where } C \text{ collects the constant normalizing terms}$$

$$\therefore\; \ln p(w \mid t) = -\frac{\beta}{2}\sum_n \bigl(t_n - w^T\phi(x_n)\bigr)^2 - \frac{\alpha}{2}\, w^T w + \text{const}$$
Even though we did not intend to include regularization, the prior itself acts as a regularizer! The figure shows the sequential updating process of the posterior, which becomes the prior for the next observation; we can see the variance of the distribution shrink as data accumulate.
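A minimal sketch of that sequential updating, assuming known precisions $\alpha, \beta$ and a toy true line $t = -0.3 + 0.5x + \varepsilon$ (all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 2.0, 25.0            # assumed prior / noise precision
m = np.zeros(2)                    # prior mean for w = (w0, w1)
S = np.eye(2) / alpha              # prior covariance alpha^{-1} I

for _ in range(20):                # one observation at a time
    x_n = rng.uniform(-1, 1)
    t_n = -0.3 + 0.5 * x_n + rng.normal(0.0, beta ** -0.5)
    phi = np.array([1.0, x_n])
    # The current posterior becomes the prior for the next update:
    S_inv_old = np.linalg.inv(S)
    S_inv = S_inv_old + beta * np.outer(phi, phi)
    m = np.linalg.solve(S_inv, S_inv_old @ m + beta * phi * t_n)
    S = np.linalg.inv(S_inv)
    # np.trace(S) shrinks steadily -- the variance reduction in the figure.
```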
Chapter 3.3. Bayesian linear regression

Predictive distribution of Bayesian linear regression

To get a predicted value, we do not need the parameter distribution itself; we only need certain estimated quantities, like the Bayes estimator. The derivation of the following equation will be covered in Chapter 8. The important point is that, as $N \to \infty$, the variance contributed by the posterior converges to zero, and only the noise variance term $1/\beta$ remains.
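For reference, the standard form of that predictive distribution (a known result, stated here since the slide presents it as an image):

$$p(t \mid x, \mathbf{t}, \alpha, \beta) = \int p(t \mid x, w, \beta)\, p(w \mid \mathbf{t})\, dw = \mathcal{N}\bigl(t \mid m_N^T \phi(x),\; \sigma_N^2(x)\bigr), \qquad \sigma_N^2(x) = \frac{1}{\beta} + \phi(x)^T S_N\, \phi(x).$$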
(Figure panels: fitted line; generated samples.)
Chapter 3.3. Bayesian linear regression

Predictive distribution of Bayesian linear regression

Since we have so far studied linear regression entirely from the frequentist perspective, this process can feel really tricky. So let's implement the entire process in Python.
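Below is a compact end-to-end sketch: posterior over weights, predictive mean and variance, and sample curves drawn from the posterior. The precisions, Gaussian bases, and the $\sin(2\pi x)$ toy data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta = 2.0, 25.0                        # assumed precisions

x = rng.uniform(0, 1, 25)
t = np.sin(2 * np.pi * x) + rng.normal(0, beta ** -0.5, 25)

centers = np.linspace(0, 1, 9)
def phi(x):
    """Bias + Gaussian basis functions (an assumed choice)."""
    g = np.exp(-(np.atleast_1d(x)[:, None] - centers) ** 2 / (2 * 0.1 ** 2))
    return np.hstack([np.ones((g.shape[0], 1)), g])

Phi = phi(x)                                   # N x M design matrix
M = Phi.shape[1]

# Posterior p(w | t) = N(m_N, S_N)
S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

# Predictive distribution N(mean, var) on a grid
xs = np.linspace(0, 1, 200)
Phis = phi(xs)
mean = Phis @ m_N
var = 1 / beta + np.einsum('ij,jk,ik->i', Phis, S_N, Phis)

# "Generated samples": curves from weights drawn from the posterior
ws = rng.multivariate_normal(m_N, S_N, size=5)
curves = Phis @ ws.T                           # one column per sampled curve
```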
Chapter 3.3. Bayesian linear regression

Equivalent kernel and its insight

Let's talk about the kernel. First, the predictive mean of Bayesian regression can be written as

$$y(x) = m_N^T \phi(x) = \sum_{n=1}^{N} \beta\, \phi(x)^T S_N\, \phi(x_n)\, t_n = \sum_{n=1}^{N} k(x, x_n)\, t_n .$$

This function $k$ is called the smoother matrix, or the equivalent kernel. What does this kernel indicate? It gives an important intuition about linear regression from the perspective of a "weighted average of neighbors". The kernel acts as a similarity measure, and it is multiplied with the observed target values $t_n$: the estimate is a weighted mean of the observed targets, and inputs with higher similarity to $x$ receive higher weights. The following equations yield similar intuitions.
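A two-line check of this reading, reusing Phi, S_N, beta, phi, xs, mean, and t from the sketch above (it assumes that code has already run):

```python
import numpy as np

# Equivalent kernel / smoother matrix: K[i, n] = beta * phi(xs_i)^T S_N phi(x_n)
K = beta * phi(xs) @ S_N @ Phi.T
print(np.allclose(K @ t, mean))   # predictive mean = weighted sum of targets
print(K.sum(axis=1)[:5])          # rows sum to ~1: a weighted average
```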
Chapter 3.4. Bayesian model comparison

Bayesian model comparison asks which model best explains the data; the answer, developed through the evidence approximation below, is the model with the highest evidence (marginal likelihood).
Equivalent kernel and its insight
Let’s talk about the kernel. First, we can get the predicted value of Bayesian regression by the following equations.
This k function is called smoother matrix or the equivalent kernel.
What the heck does this kernel indicates??
This gives an important intuition about the linear regression on the perspective of “weighted average of neighbors”.
You can see a kernel acts as an “similarity measure”. And it is being multiplied with the observed target values 𝒕𝒏.
What does it mean? It shows the estimating process is the weighted mean of the observed target values.
So called kernel, the similarity measure, gives more weights to the true value.
So, if input values have high similarity, it gets higher weights.
Following equations yield similar intuitions.
Chapter 3.5. The evidence approximation

Fully Bayesian treatment

The true predictive distribution is given by the following integral. This integral is analytically intractable, so we take another approach: if the distribution $p(\alpha, \beta \mid t)$ is sharply peaked around $(\hat\alpha, \hat\beta)$, we can replace the integration over $\alpha$ and $\beta$ by plugging in the estimated values. That is,
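Written out (the standard statement of this approximation, supplied here for reference):

$$p(t \mid \mathbf{t}) = \iiint p(t \mid w, \beta)\, p(w \mid \mathbf{t}, \alpha, \beta)\, p(\alpha, \beta \mid \mathbf{t})\; dw\, d\alpha\, d\beta \;\approx\; \int p(t \mid w, \hat\beta)\, p(w \mid \mathbf{t}, \hat\alpha, \hat\beta)\, dw .$$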
Chapter 3.5. The evidence approximation

Evaluation of the evidence function

What we are trying to do is estimate the nuisance parameters $\alpha$ and $\beta$. The evidence is the likelihood × prior integrated over $w$; the overall equation can be rewritten as follows (the ingredients were covered in previous sections). Now, let's rewrite $E(w)$ by completing the square. Why are we rewriting the equation?
1. We can perform the integral much more easily.
2. We can do model comparison.
3. We can estimate the nuisance parameters.
Chapter 3.5. The evidence approximation

Re-writing the evidence function
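For reference, the standard rewriting from PRML §3.5.1 that this slide presents:

$$p(\mathbf{t} \mid \alpha, \beta) = \left(\frac{\beta}{2\pi}\right)^{N/2} \left(\frac{\alpha}{2\pi}\right)^{M/2} \int \exp\{-E(w)\}\, dw,$$

$$E(w) = \frac{\beta}{2}\lVert \mathbf{t} - \Phi w \rVert^2 + \frac{\alpha}{2}\, w^T w = E(m_N) + \frac{1}{2}(w - m_N)^T A\, (w - m_N), \qquad A = \alpha I + \beta\, \Phi^T \Phi, \quad m_N = \beta A^{-1} \Phi^T \mathbf{t},$$

$$\ln p(\mathbf{t} \mid \alpha, \beta) = \frac{M}{2}\ln\alpha + \frac{N}{2}\ln\beta - E(m_N) - \frac{1}{2}\ln\lvert A\rvert - \frac{N}{2}\ln(2\pi).$$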
Chapter 3.5. The evidence approximation

Evidence function for the model comparison

Which model is best for the data? The model that yields the highest evidence value! The once-difficult integration is computed easily with the rewritten equation.
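A sketch of using the rewritten log evidence to compare models; the polynomial bases, toy data, and the fixed $\alpha, \beta$ values are assumptions for illustration.

```python
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """ln p(t | alpha, beta) via the completed-square form."""
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * Phi.T @ Phi
    m_N = beta * np.linalg.solve(A, Phi.T @ t)
    E_mN = beta / 2 * np.sum((t - Phi @ m_N) ** 2) + alpha / 2 * m_N @ m_N
    return (M / 2 * np.log(alpha) + N / 2 * np.log(beta) - E_mN
            - np.linalg.slogdet(A)[1] / 2 - N / 2 * np.log(2 * np.pi))

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 30)
for deg in range(1, 10):                           # compare polynomial degrees
    Phi = np.vander(x, deg + 1, increasing=True)   # 1, x, ..., x^deg
    print(deg, round(log_evidence(Phi, t, alpha=5e-3, beta=25.0), 2))
```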
Chapter 3.5. The evidence approximation

Nuisance parameter estimation of $\alpha$ & $\beta$

Why do eigenvalues appear in the re-estimation equations? Because a determinant is equal to the product of its eigenvalues! (We covered this in multivariate analysis.) Here $\alpha$ is the prior precision and $\beta$ the likelihood (noise) precision, so $\beta^{-1}$ plays the role of $\sigma^2$.
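A sketch of the resulting fixed-point iteration (the standard evidence re-estimation equations, with starting values assumed):

```python
import numpy as np

def evidence_maximize(Phi, t, alpha=1.0, beta=1.0, iters=50):
    """Re-estimate alpha and beta by maximizing the evidence."""
    N, M = Phi.shape
    eig0 = np.linalg.eigvalsh(Phi.T @ Phi)     # eigenvalues of Phi^T Phi
    for _ in range(iters):
        lam = beta * eig0                      # eigenvalues of beta Phi^T Phi
        A = alpha * np.eye(M) + beta * Phi.T @ Phi
        m_N = beta * np.linalg.solve(A, Phi.T @ t)
        gamma = np.sum(lam / (alpha + lam))    # effective number of parameters
        alpha = gamma / (m_N @ m_N)            # alpha = gamma / (m_N^T m_N)
        beta = (N - gamma) / np.sum((t - Phi @ m_N) ** 2)
    return alpha, beta, m_N
```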