Slides for the Statistical Pattern Recognition course taught by Professor Zohreh Azimifar at Shiraz University.
Statistical Pattern Recognition
Lecture 2
Dr Zohreh Azimifar
School of Electrical and Computer Engineering
Shiraz University
August 3, 2014
Table of Contents

Introduction
Linear Regression
Objective Function
Closed Form Solution
Gradient Descent Method
Model Complexity
Locally Weighted Linear Regression
Probabilistic Interpretation of the Least Squares Method
Lecture Summary
Regression
What is Regression
In classification, we seek to identify the categorical class $C_k$ associated with a given input vector $x$.
In regression, we seek to identify (or estimate) a continuous variable $y$ associated with a given input vector $x$.
Here, $y$ is called the dependent variable and $x$ is called the independent variable.
We assume $y = f(x)$.
Figure: Polynomial curve fitting
Figure: Polynomial curve fitting: polynomial of degree 1
Figure: Polynomial curve fitting: polynomial of degree 3
Figure: Polynomial curve fitting: polynomial of degree 9
What is Linear Regression
The simplest scenario is y = ax + b.
Figure: y = x + 2
What are good guesses $(\hat{a}, \hat{b})$ for $(a, b)$, based on the data?
Figure: Inexact linear regression
Some Examples
Number of cars in a city and the amount of ozone per m³ of air.
Number of iterations set in a MATLAB script and the time to finish the computation.
Size of a house in m² and its price.
Table: Price of a house as a function of its size and number of bedrooms.

Bias (offset)   Area (m²)   No. of bedrooms   Price (millions)
1               156         3                 350
1               60          1                 112
1               137         2                 283
1               142         1                 277
1               147         1                 242
1               68          3                 154
1               100         2                 183
1               86          2                 151
1               140         1                 246
1               103         3                 217
Linear Regression Definition
Assume we have $m$ training samples $\{(X^{(j)}, y^{(j)}) : j = 1, \ldots, m\}$, where $X^{(j)}$ and $y^{(j)}$ are the feature vector and label corresponding to the $j$-th training sample.
Assume each feature vector (pattern) contains $n$ features, i.e., $X^{(j)}$ is $n$-dimensional.
The goal here is to learn a mapping function $h(\cdot)$ from feature space $X$ to output label $y$.
$h(\cdot)$ is called the hypothesis function (model).
(a) Train phase: sample data $(X, y)$ is entered and the function $h(\cdot)$ is learned.
(b) Test phase: the hypothesis function $h(\cdot)$ is used to predict $y$ for a new sample $X_{new}$.
The goal here is to learn a hypothesis function $h(\cdot)$ from feature space $X$ to output label $y$:

$$\hat{y}^{(j)} = h(X^{(j)}) = h_\theta(X^{(j)}) = \theta_0 + \theta_1 x_1^{(j)} + \cdots + \theta_n x_n^{(j)} = \sum_{i=0}^{n} \theta_i x_i^{(j)} = \theta^T X^{(j)}, \qquad j = 1, 2, \ldots, m$$

There are $m$ samples, each being a point in the $n$-dimensional feature space.
Here, $\theta_0$ is called the bias (offset) and $x_0^{(j)} = 1$ for all patterns.
We seek a hyperplane passing through the $n$-dimensional feature space.
Objective Function
The objective is to find the parameters $\theta = \{\theta_0, \theta_1, \ldots, \theta_n\}$.
The closer $h_\theta(X^{(j)})$ is to $y^{(j)}$, the more accurate our prediction for unseen data. Be careful!
Define the error: $h_\theta(X^{(j)}) - y^{(j)} \equiv \hat{y}^{(j)} - y^{(j)}$.
A good criterion is to minimize the squared error (i.e., squared distance).
We are now ready to define our first objective function!
Minimize the sum of squared errors over all $m$ samples:

$$J(\theta) = \frac{1}{2} \sum_{j=1}^{m} \left(h_\theta(X^{(j)}) - y^{(j)}\right)^2$$

$$\underset{\theta}{\text{minimize}} \; J(\theta)$$

$\theta^*$ is obtained by solving this optimization problem:

$$\theta^* = \arg\min_\theta J(\theta)$$
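As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this objective, assuming the data matrix already carries the bias column $x_0 = 1$; the names `h` and `J` simply mirror the notation above:

```python
import numpy as np

def h(theta, X):
    """Hypothesis h_theta(X) = theta^T X, applied to every row of the m x (n+1) matrix X."""
    return X @ theta

def J(theta, X, y):
    """Sum-of-squared-errors objective: J(theta) = 1/2 * sum_j (h_theta(X^(j)) - y^(j))^2."""
    r = h(theta, X) - y        # residual vector of length m
    return 0.5 * float(r @ r)  # equals 1/2 * ||X theta - y||^2
```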
Least Squares Method
The goal is to minimize $J(\theta)$ directly, in closed form.
First, construct a data matrix $X$ from all samples:

$$X = \begin{bmatrix} \cdots & (X^{(1)})^T & \cdots \\ \cdots & (X^{(2)})^T & \cdots \\ & \vdots & \\ \cdots & (X^{(m)})^T & \cdots \end{bmatrix}
\quad \text{and} \quad
X\theta = \begin{bmatrix} (X^{(1)})^T \theta \\ \vdots \\ (X^{(m)})^T \theta \end{bmatrix} = \begin{bmatrix} h_\theta(X^{(1)}) \\ \vdots \\ h_\theta(X^{(m)}) \end{bmatrix}$$
Second, put all sample labels into a vector:

$$y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

Then, write the error vector:

$$X\theta - y = \begin{bmatrix} h_\theta(X^{(1)}) - y^{(1)} \\ \vdots \\ h_\theta(X^{(m)}) - y^{(m)} \end{bmatrix}$$

Finally, the (halved) squared error becomes:

$$\frac{1}{2}(X\theta - y)^T (X\theta - y) = \frac{1}{2} \sum_{j=1}^{m} \left(h_\theta(X^{(j)}) - y^{(j)}\right)^2 = J(\theta)$$
The so-called least-squares (L.S.) solution:

$$\nabla_\theta J(\theta) = 0$$
$$\Rightarrow \nabla_\theta \, \tfrac{1}{2}(X\theta - y)^T (X\theta - y) = 0$$
$$\Rightarrow \tfrac{1}{2} \nabla_\theta \, \mathrm{tr}\left(\theta^T X^T X \theta - \theta^T X^T y - y^T X \theta + y^T y\right) = 0$$
$$\Rightarrow \tfrac{1}{2} \left[ \nabla_\theta \, \mathrm{tr}\, \theta \theta^T X^T X - \nabla_\theta \, \mathrm{tr}\, y^T X \theta - \nabla_\theta \, \mathrm{tr}\, y^T X \theta \right] = 0$$
$$\Rightarrow \tfrac{1}{2} \left[ X^T X \theta + X^T X \theta - X^T y - X^T y \right] = 0$$
$$\Rightarrow X^T X \theta - X^T y = 0$$
$$\Rightarrow \theta = (X^T X)^{-1} X^T y$$

(Note: the expansion carries $+\,y^T y$; since it does not depend on $\theta$, it drops out of the gradient.)
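A minimal sketch of this closed-form solution applied to the house-price table from earlier; `np.linalg.solve` is used on the normal equations rather than forming the inverse explicitly (a standard numerical practice, assumed here):

```python
import numpy as np

# House data from the table: columns are bias, area (m^2), no. of bedrooms.
X = np.array([[1, 156, 3], [1, 60, 1], [1, 137, 2], [1, 142, 1], [1, 147, 1],
              [1, 68, 3], [1, 100, 2], [1, 86, 2], [1, 140, 1], [1, 103, 3]], dtype=float)
y = np.array([350, 112, 283, 277, 242, 154, 183, 151, 246, 217], dtype=float)  # price (millions)

# Normal equations: (X^T X) theta = X^T y  <=>  theta = (X^T X)^{-1} X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print("theta =", theta)  # [theta_0 (offset), theta_1 (per m^2), theta_2 (per bedroom)]
print("prediction for a 120 m^2, 2-bedroom house:", np.array([1, 120, 2]) @ theta)
```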
Parameter Estimation by Gradient Descent
Not all objective functions have a closed-form solution!
Try an iterative method based on the gradient of $J(\theta)$:

$$\theta_i := \theta_i - \alpha \frac{\partial}{\partial \theta_i} J(\theta)$$

We need to determine $\frac{\partial}{\partial \theta_i} J(\theta)$.
$$\frac{\partial}{\partial \theta_i} J(\theta) = \frac{\partial}{\partial \theta_i} \sum_{j=1}^{m} \frac{1}{2} \left(h_\theta(X^{(j)}) - y^{(j)}\right)^2$$
$$= \sum_{j=1}^{m} 2 \cdot \frac{1}{2} \left(h_\theta(X^{(j)}) - y^{(j)}\right) \cdot \frac{\partial}{\partial \theta_i} \left(h_\theta(X^{(j)}) - y^{(j)}\right)$$
$$= \sum_{j=1}^{m} \left(h_\theta(X^{(j)}) - y^{(j)}\right) \frac{\partial}{\partial \theta_i} \left(\theta_0 x_0^{(j)} + \theta_1 x_1^{(j)} + \cdots + \theta_n x_n^{(j)} - y^{(j)}\right)$$
$$= \sum_{j=1}^{m} \left(h_\theta(X^{(j)}) - y^{(j)}\right) x_i^{(j)}$$
Substituting the derivative gives the update rule:

$$\theta_i := \theta_i - \alpha \sum_{j=1}^{m} \left(h_\theta(X^{(j)}) - y^{(j)}\right) x_i^{(j)}$$
Implementation of Gradient Descent
Stochastic and Batch Gradient Descent
In batch gradient descent, each update uses all $m$ samples; in stochastic gradient descent, each update uses a single sample. A sketch of both algorithm structures follows.
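The following is a minimal sketch of the two variants named above, assuming a hand-picked learning rate `alpha` and fixed iteration counts (both are illustrative hyperparameters, not from the slides). With unscaled features such as the house data, `alpha` must be small for the iterations to converge.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=1e-6, n_iters=10000):
    """Batch GD: every step uses the full gradient over all m samples."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y)  # sum_j (h_theta(X^(j)) - y^(j)) * x^(j)
        theta -= alpha * grad
    return theta

def stochastic_gradient_descent(X, y, alpha=1e-6, n_epochs=100, seed=0):
    """Stochastic GD: every step uses the gradient of a single random sample."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for j in rng.permutation(len(y)):
            err = X[j] @ theta - y[j]    # scalar residual for sample j
            theta -= alpha * err * X[j]  # single-sample gradient step
    return theta
```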
Model Complexity
Model learning depends on the number of features (i.e., the dimension).
Model learning depends on the number of parameters (i.e., $\theta$).
Underfit (linear): $y = \theta_0 + \theta_1 x_1$
Nonlinear (quadratic): $y = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2$
Overfit: $y = \theta_0 + \theta_1 x_1 + \cdots + \theta_6 x_1^6$
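To make the underfit/overfit contrast concrete (an illustration beyond the slides, with synthetic data), fitting the three model orders above to noisy samples of a smooth curve shows the training error always shrinking with degree, even as the high-degree fit chases the noise:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)  # noisy target values

for degree in (1, 2, 6):                   # underfit, quadratic, overfit
    coeffs = np.polyfit(x, y, degree)      # least-squares polynomial fit
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {mse:.4f}")
# Training MSE decreases monotonically with degree; error on fresh samples would not.
```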
Locally Weighted Linear Regression
Goal: generalize instance-based learning to predict continuous outputs!
A non-parametric algorithm.
"Local" means nearby samples have more effect on $X_{new}$ (i.e., a nearest-neighbors approach).
"Weighted" means we value samples based upon how far away they are from $X_{new}$.
Fit a linear (or quadratic) function to the $k$ nearest neighbors of a given sample, thus producing a piecewise approximation.
The idea: in order to predict the output for a new sample $X_{new}$:
Build a local model of the function (using a linear function, quadratic, neural network, etc.).
Use the model to predict the output value.
Throw the model away!
For any new sample $X_{new}$:

$$J(\theta) = \sum_j w^{(j)} \left(y^{(j)} - \theta^T X^{(j)}\right)^2$$

The weighting (kernel) function can be a simple Gaussian:

$$w^{(j)} = \exp\left(-\frac{(X^{(j)} - X_{new})^2}{2}\right)$$

Small $|X^{(j)} - X_{new}|$ makes $w^{(j)} \approx 1$.
Large $|X^{(j)} - X_{new}|$ makes $w^{(j)} \approx 0$.
Think about the effect of $\sigma^2$:

$$w^{(j)} = \exp\left(-\frac{(X^{(j)} - X_{new})^2}{2\sigma^2}\right)$$
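A minimal sketch of this procedure: for each query point, solve a weighted least-squares problem with the Gaussian kernel above, predict, and discard the local model. The bandwidth `tau` plays the role of $\sigma$ and is an assumed hyperparameter:

```python
import numpy as np

def lwlr_predict(x_new, X, y, tau=1.0):
    """Locally weighted linear regression for one query point x_new.
    X is m x (n+1) with a bias column; x_new is one (n+1,) query vector."""
    d2 = np.sum((X - x_new) ** 2, axis=1)  # squared distances to the query
    w = np.exp(-d2 / (2 * tau ** 2))       # Gaussian kernel weights w^(j)
    XtW = X.T * w                          # same as X^T W without forming diag(w)
    theta = np.linalg.solve(XtW @ X, XtW @ y)  # weighted normal equations
    return x_new @ theta                   # predict, then throw theta away
```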
Probabilistic Interpretation of the Least Squares Method
Let us consider uncertainty in the model:

$$y^{(j)} = \theta^T X^{(j)} + \epsilon^{(j)}$$

1. Gaussian noise:

$$\epsilon^{(j)} \sim \mathcal{N}(0, \sigma^2), \qquad P(\epsilon^{(j)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\epsilon^{(j)})^2}{2\sigma^2}\right)$$
$$P(y^{(j)} - \theta^T X^{(j)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}\right)$$

Note: $y^{(j)} \mid X^{(j)} \equiv y^{(j)} - \theta^T X^{(j)}$, so

$$P(y^{(j)} \mid X^{(j)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}\right)$$

$$y^{(j)} \mid X^{(j)}; \theta \sim \mathcal{N}(\theta^T X^{(j)}, \sigma^2)$$

The larger $P(y^{(j)} \mid X^{(j)})$ is, the better $y^{(j)}$ labels $X^{(j)}$.
2. The noise terms $\epsilon^{(j)}$ are assumed i.i.d.

Define the likelihood function:

$$L(\theta) = P(y \mid X; \theta) = \prod_{j=1}^{m} P(y^{(j)} \mid X^{(j)}; \theta) = \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}\right)$$
Taking the log simplifies the procedure:

$$\ell(\theta) = \log L(\theta) = \log \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}\right)$$
$$= \sum_{j=1}^{m} \log \left[\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}\right)\right]$$
$$= m \log \frac{1}{\sqrt{2\pi}\,\sigma} + \sum_{j=1}^{m} -\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}$$

Thus, maximizing $\ell(\theta)$ is in fact minimizing $\sum_{j=1}^{m} \frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}$, which (up to the constant factor $1/\sigma^2$) is the objective function $J(\theta)$.
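A quick numerical check of this equivalence on synthetic data (an illustration, not from the slides): at the least-squares estimate, the gradient of the log-likelihood, $\nabla_\theta \ell(\theta) = X^T(y - X\theta)/\sigma^2$, vanishes, so the maximum-likelihood and least-squares solutions coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 50, 0.5
X = np.hstack([np.ones((m, 1)), rng.standard_normal((m, 2))])  # bias + 2 features
theta_true = np.array([1.0, 2.0, -3.0])
y = X @ theta_true + sigma * rng.standard_normal(m)            # Gaussian noise model

theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # minimizes the sum of squared errors

# Gradient of l(theta) at the least-squares estimate: X^T (y - X theta) / sigma^2
grad_ll = X.T @ (y - X @ theta_ls) / sigma**2
print("theta_LS =", np.round(theta_ls, 3))
print("||grad l(theta_LS)|| ~ 0:", np.linalg.norm(grad_ll))
```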
The predicted label for $X^{(j)}$ is:

$$E_{y|X}\left(y^{(j)} \mid X^{(j)}\right) = E_{y|X}\left(\theta^T X^{(j)} + \epsilon^{(j)}\right) = \theta^T X^{(j)}$$

Note: the probabilistic solution to regression with Gaussian noise is identical to the algebraic least-squares solution!
Lecture Summary
1. Regression estimates a continuous variable $y$ from an input vector $x$.
2. Linear regression fits a hyperplane $h_\theta(X) = \theta^T X$ by minimizing the squared-error objective $J(\theta)$.
3. $J(\theta)$ admits the closed-form least-squares solution $\theta = (X^T X)^{-1} X^T y$; otherwise, batch or stochastic gradient descent can be used.
4. Model complexity must balance underfitting and overfitting; locally weighted linear regression fits simple local models near each query.
5. Under i.i.d. Gaussian noise, maximizing the likelihood is equivalent to minimizing $J(\theta)$.