Slides for the Statistical Pattern Recognition course taught by Professor Zohreh Azimifar at Shiraz University.
Statistical Pattern Recognition
Lecture 2
Dr Zohreh Azimifar
School of Electrical and Computer Engineering
Shiraz University
August 3, 2014
Table of Contents

Introduction
Linear Regression
Objective Function
Closed Form Solution
Gradient Descent Method
Model Complexity
Locally Weighted Linear Regression
Probabilistic Interpretation of the Least Squares Method
Lecture Summary
Regression
What is Regression
In classification, we seek to identify the categorical class $C_k$ associated with a given input vector $x$.
In regression, we seek to identify (or estimate) a continuous variable $y$ associated with a given input vector $x$.
Here, $y$ is called the dependent variable and $x$ is called the independent variable.
We assume $y = f(x)$.
Figure: Polynomial curve fitting
Figure: Polynomial curve fitting: polynomial of degree 1
Figure: Polynomial curve fitting: polynomial of degree 3
Figure: Polynomial curve fitting: polynomial of degree 9
What is Linear Regression
The simplest scenario is y = ax + b.
Figure: y = x + 2
What are good guesses $(\hat{a}, \hat{b})$ for $(a, b)$, based on the data?
Figure: Inexact linear regression
Some Examples
Number of cars in a city and the amount of ozone per m³ of air.
Number of iterations set in a MATLAB script and the time to finish the computation.
Size of a house in m² and its price.
Table: Price of a house as a function of its size and number of bedrooms.

Bias (offset)   Area (m²)   No. of bedrooms   Price (millions)
1               156         3                 350
1               60          1                 112
1               137         2                 283
1               142         1                 277
1               147         1                 242
1               68          3                 154
1               100         2                 183
1               86          2                 151
1               140         1                 246
1               103         3                 217
Linear Regression Definition
Assume we have $m$ training samples $\{(X^{(j)}, y^{(j)}) : j = 1, \ldots, m\}$, where $X^{(j)}$ and $y^{(j)}$ are the feature vector and label corresponding to the $j$-th training sample.
Assume each feature vector (pattern) contains $n$ features, i.e., $X^{(j)}$ is $n$-dimensional.
The goal here is to learn a mapping function $h(\cdot)$ from feature space $X$ to output label $y$.
$h(\cdot)$ is called the hypothesis function (model).
(a) Train phase: sample data $(X, y)$ is entered and the function $h(\cdot)$ is learned.
(b) Test phase: the hypothesis function $h(\cdot)$ is used to predict $y$ for a new sample $X_{new}$.
The goal here is to learn a hypothesis function $h(\cdot)$ from feature space $X$ to output label $y$:

$$\hat{y}^{(j)} = h(X^{(j)}) = h_\theta(X^{(j)}) = \theta_0 + \theta_1 x_1^{(j)} + \cdots + \theta_n x_n^{(j)} = \sum_{i=0}^{n} \theta_i x_i^{(j)} = \theta^T X^{(j)}, \qquad j = 1, 2, \ldots, m$$

There are $m$ samples, each being a point in the $n$-dimensional feature space.
Here, $\theta_0$ is called the bias (offset) and $x_0^{(j)} = 1$ for all patterns.
We seek a hyperplane passing through the $n$-dimensional feature space.
Objective Function
The objective is to find the parameters $\theta = \{\theta_0, \theta_1, \ldots, \theta_n\}$.
The closer $h_\theta(X^{(j)})$ is to $y^{(j)}$, the more accurate our prediction for unseen data. Be careful!
Define the error: $h_\theta(X^{(j)}) - y^{(j)} \equiv \hat{y}^{(j)} - y^{(j)}$.
A good criterion is to minimize the squared error (i.e., squared distance).
We are now ready to define our first objective function!
Minimize the sum of squared errors over all $m$ samples:

$$J(\theta) = \frac{1}{2} \sum_{j=1}^{m} \left(h_\theta(X^{(j)}) - y^{(j)}\right)^2$$

$$\underset{\theta}{\text{minimize}} \; J(\theta)$$

$\theta^*$ is obtained by solving this optimization problem:

$$\theta^* = \arg\min_\theta J(\theta)$$
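As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this objective, assuming the data matrix already carries the bias column $x_0 = 1$; the names `h` and `J` simply mirror the notation above:

```python
import numpy as np

def h(theta, X):
    """Hypothesis h_theta(X) = theta^T X, applied to every row of the m x (n+1) matrix X."""
    return X @ theta

def J(theta, X, y):
    """Sum-of-squared-errors objective: J(theta) = 1/2 * sum_j (h_theta(X^(j)) - y^(j))^2."""
    r = h(theta, X) - y        # residual vector of length m
    return 0.5 * float(r @ r)  # equals 1/2 * ||X theta - y||^2
```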
Least Squares Method
The goal is to minimize $J(\theta)$ directly, in closed form.
First, construct a data matrix $X$ from all samples:

$$X = \begin{bmatrix} \cdots & (X^{(1)})^T & \cdots \\ \cdots & (X^{(2)})^T & \cdots \\ & \vdots & \\ \cdots & (X^{(m)})^T & \cdots \end{bmatrix}
\quad \text{and} \quad
X\theta = \begin{bmatrix} (X^{(1)})^T \theta \\ \vdots \\ (X^{(m)})^T \theta \end{bmatrix} = \begin{bmatrix} h_\theta(X^{(1)}) \\ \vdots \\ h_\theta(X^{(m)}) \end{bmatrix}$$
Second, put all sample labels into a vector:

$$y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

Then, write the error vector:

$$X\theta - y = \begin{bmatrix} h_\theta(X^{(1)}) - y^{(1)} \\ \vdots \\ h_\theta(X^{(m)}) - y^{(m)} \end{bmatrix}$$

Finally, the (halved) squared error becomes:

$$\frac{1}{2}(X\theta - y)^T (X\theta - y) = \frac{1}{2} \sum_{j=1}^{m} \left(h_\theta(X^{(j)}) - y^{(j)}\right)^2 = J(\theta)$$
The so-called least-squares (L.S.) solution:

$$\nabla_\theta J(\theta) = 0$$
$$\Rightarrow \nabla_\theta \, \tfrac{1}{2}(X\theta - y)^T (X\theta - y) = 0$$
$$\Rightarrow \tfrac{1}{2} \nabla_\theta \, \mathrm{tr}\left(\theta^T X^T X \theta - \theta^T X^T y - y^T X \theta + y^T y\right) = 0$$
$$\Rightarrow \tfrac{1}{2} \left[ \nabla_\theta \, \mathrm{tr}\, \theta \theta^T X^T X - \nabla_\theta \, \mathrm{tr}\, y^T X \theta - \nabla_\theta \, \mathrm{tr}\, y^T X \theta \right] = 0$$
$$\Rightarrow \tfrac{1}{2} \left[ X^T X \theta + X^T X \theta - X^T y - X^T y \right] = 0$$
$$\Rightarrow X^T X \theta - X^T y = 0$$
$$\Rightarrow \theta = (X^T X)^{-1} X^T y$$

(Note: the expansion carries $+\,y^T y$; since it does not depend on $\theta$, it drops out of the gradient.)
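A minimal sketch of this closed-form solution applied to the house-price table from earlier; `np.linalg.solve` is used on the normal equations rather than forming the inverse explicitly (a standard numerical practice, assumed here):

```python
import numpy as np

# House data from the table: columns are bias, area (m^2), no. of bedrooms.
X = np.array([[1, 156, 3], [1, 60, 1], [1, 137, 2], [1, 142, 1], [1, 147, 1],
              [1, 68, 3], [1, 100, 2], [1, 86, 2], [1, 140, 1], [1, 103, 3]], dtype=float)
y = np.array([350, 112, 283, 277, 242, 154, 183, 151, 246, 217], dtype=float)  # price (millions)

# Normal equations: (X^T X) theta = X^T y  <=>  theta = (X^T X)^{-1} X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print("theta =", theta)  # [theta_0 (offset), theta_1 (per m^2), theta_2 (per bedroom)]
print("prediction for a 120 m^2, 2-bedroom house:", np.array([1, 120, 2]) @ theta)
```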
Parameter Estimation by Gradient Descent
Not all objective functions have a closed-form solution!
Try an iterative method based on the gradient of $J(\theta)$:

$$\theta_i := \theta_i - \alpha \frac{\partial}{\partial \theta_i} J(\theta)$$

We need to determine $\frac{\partial}{\partial \theta_i} J(\theta)$.
$$\frac{\partial}{\partial \theta_i} J(\theta) = \frac{\partial}{\partial \theta_i} \sum_{j=1}^{m} \frac{1}{2} \left(h_\theta(X^{(j)}) - y^{(j)}\right)^2$$
$$= \sum_{j=1}^{m} 2 \cdot \frac{1}{2} \left(h_\theta(X^{(j)}) - y^{(j)}\right) \cdot \frac{\partial}{\partial \theta_i} \left(h_\theta(X^{(j)}) - y^{(j)}\right)$$
$$= \sum_{j=1}^{m} \left(h_\theta(X^{(j)}) - y^{(j)}\right) \frac{\partial}{\partial \theta_i} \left(\theta_0 x_0^{(j)} + \theta_1 x_1^{(j)} + \cdots + \theta_n x_n^{(j)} - y^{(j)}\right)$$
$$= \sum_{j=1}^{m} \left(h_\theta(X^{(j)}) - y^{(j)}\right) x_i^{(j)}$$
Substituting the derivative gives the update rule:

$$\theta_i := \theta_i - \alpha \sum_{j=1}^{m} \left(h_\theta(X^{(j)}) - y^{(j)}\right) x_i^{(j)}$$
Implementation of Gradient Descent
Stochastic and Batch Gradient Descent
In batch gradient descent, each update uses all $m$ samples; in stochastic gradient descent, each update uses a single sample. A sketch of both algorithm structures follows.
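The following is a minimal sketch of the two variants named above, assuming a hand-picked learning rate `alpha` and fixed iteration counts (both are illustrative hyperparameters, not from the slides). With unscaled features such as the house data, `alpha` must be small for the iterations to converge.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=1e-6, n_iters=10000):
    """Batch GD: every step uses the full gradient over all m samples."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y)  # sum_j (h_theta(X^(j)) - y^(j)) * x^(j)
        theta -= alpha * grad
    return theta

def stochastic_gradient_descent(X, y, alpha=1e-6, n_epochs=100, seed=0):
    """Stochastic GD: every step uses the gradient of a single random sample."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for j in rng.permutation(len(y)):
            err = X[j] @ theta - y[j]    # scalar residual for sample j
            theta -= alpha * err * X[j]  # single-sample gradient step
    return theta
```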
Model Complexity
Model learning depends on the number of features (i.e., the dimension).
Model learning depends on the number of parameters (i.e., $\theta$).
Underfit (linear): $y = \theta_0 + \theta_1 x_1$
Nonlinear (quadratic): $y = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2$
Overfit: $y = \theta_0 + \theta_1 x_1 + \cdots + \theta_6 x_1^6$
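To make the underfit/overfit contrast concrete (an illustration beyond the slides, with synthetic data), fitting the three model orders above to noisy samples of a smooth curve shows the training error always shrinking with degree, even as the high-degree fit chases the noise:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)  # noisy target values

for degree in (1, 2, 6):                   # underfit, quadratic, overfit
    coeffs = np.polyfit(x, y, degree)      # least-squares polynomial fit
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {mse:.4f}")
# Training MSE decreases monotonically with degree; error on fresh samples would not.
```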
Locally Weighted Linear Regression
Goal: generalize instance-based learning to predict continuous outputs!
A non-parametric algorithm.
"Local" means nearby samples have more effect on $X_{new}$ (i.e., a nearest-neighbors approach).
"Weighted" means we value samples based upon how far away they are from $X_{new}$.
Fit a linear (or quadratic) function to the $k$ nearest neighbors of a given sample, thus producing a piecewise approximation.
The idea: in order to predict the output for a new sample $X_{new}$:
Build a local model of the function (using a linear function, quadratic, neural network, etc.).
Use the model to predict the output value.
Throw the model away!
For any new sample $X_{new}$:

$$J(\theta) = \sum_j w^{(j)} \left(y^{(j)} - \theta^T X^{(j)}\right)^2$$

The weighting (kernel) function can be a simple Gaussian:

$$w^{(j)} = \exp\left(-\frac{(X^{(j)} - X_{new})^2}{2}\right)$$

Small $|X^{(j)} - X_{new}|$ makes $w^{(j)} \approx 1$.
Large $|X^{(j)} - X_{new}|$ makes $w^{(j)} \approx 0$.
Think about the effect of $\sigma^2$:

$$w^{(j)} = \exp\left(-\frac{(X^{(j)} - X_{new})^2}{2\sigma^2}\right)$$
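A minimal sketch of this procedure: for each query point, solve a weighted least-squares problem with the Gaussian kernel above, predict, and discard the local model. The bandwidth `tau` plays the role of $\sigma$ and is an assumed hyperparameter:

```python
import numpy as np

def lwlr_predict(x_new, X, y, tau=1.0):
    """Locally weighted linear regression for one query point x_new.
    X is m x (n+1) with a bias column; x_new is one (n+1,) query vector."""
    d2 = np.sum((X - x_new) ** 2, axis=1)  # squared distances to the query
    w = np.exp(-d2 / (2 * tau ** 2))       # Gaussian kernel weights w^(j)
    XtW = X.T * w                          # same as X^T W without forming diag(w)
    theta = np.linalg.solve(XtW @ X, XtW @ y)  # weighted normal equations
    return x_new @ theta                   # predict, then throw theta away
```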
Probabilistic Interpretation of the Least Squares Method
Let us consider uncertainty in the model:

$$y^{(j)} = \theta^T X^{(j)} + \epsilon^{(j)}$$

1. Gaussian noise:

$$\epsilon^{(j)} \sim \mathcal{N}(0, \sigma^2), \qquad P(\epsilon^{(j)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\epsilon^{(j)})^2}{2\sigma^2}\right)$$
$$P(y^{(j)} - \theta^T X^{(j)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}\right)$$

Note: $y^{(j)} \mid X^{(j)} \equiv y^{(j)} - \theta^T X^{(j)}$, so

$$P(y^{(j)} \mid X^{(j)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}\right)$$

$$y^{(j)} \mid X^{(j)}; \theta \sim \mathcal{N}(\theta^T X^{(j)}, \sigma^2)$$

The larger $P(y^{(j)} \mid X^{(j)})$ is, the better $y^{(j)}$ labels $X^{(j)}$.
2. The noise terms $\epsilon^{(j)}$ are assumed i.i.d.

Define the likelihood function:

$$L(\theta) = P(y \mid X; \theta) = \prod_{j=1}^{m} P(y^{(j)} \mid X^{(j)}; \theta) = \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}\right)$$
Taking the log simplifies the procedure:

$$\ell(\theta) = \log L(\theta) = \log \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}\right)$$
$$= \sum_{j=1}^{m} \log \left[\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}\right)\right]$$
$$= m \log \frac{1}{\sqrt{2\pi}\,\sigma} + \sum_{j=1}^{m} -\frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}$$

Thus, maximizing $\ell(\theta)$ is in fact minimizing $\sum_{j=1}^{m} \frac{(y^{(j)} - \theta^T X^{(j)})^2}{2\sigma^2}$, which (up to the constant factor $1/\sigma^2$) is the objective function $J(\theta)$.
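A quick numerical check of this equivalence on synthetic data (an illustration, not from the slides): at the least-squares estimate, the gradient of the log-likelihood, $\nabla_\theta \ell(\theta) = X^T(y - X\theta)/\sigma^2$, vanishes, so the maximum-likelihood and least-squares solutions coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 50, 0.5
X = np.hstack([np.ones((m, 1)), rng.standard_normal((m, 2))])  # bias + 2 features
theta_true = np.array([1.0, 2.0, -3.0])
y = X @ theta_true + sigma * rng.standard_normal(m)            # Gaussian noise model

theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # minimizes the sum of squared errors

# Gradient of l(theta) at the least-squares estimate: X^T (y - X theta) / sigma^2
grad_ll = X.T @ (y - X @ theta_ls) / sigma**2
print("theta_LS =", np.round(theta_ls, 3))
print("||grad l(theta_LS)|| ~ 0:", np.linalg.norm(grad_ll))
```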
The predicted label for $X^{(j)}$ is:

$$E_{y|X}\left(y^{(j)} \mid X^{(j)}\right) = E_{y|X}\left(\theta^T X^{(j)} + \epsilon^{(j)}\right) = \theta^T X^{(j)}$$

Note: the probabilistic solution to regression with Gaussian noise is identical to the algebraic least-squares solution!
Lecture Summary
1. Regression estimates a continuous variable $y$ from an input vector $x$.
2. Linear regression fits a hyperplane $h_\theta(X) = \theta^T X$ by minimizing the squared-error objective $J(\theta)$.
3. $J(\theta)$ admits the closed-form least-squares solution $\theta = (X^T X)^{-1} X^T y$; otherwise, batch or stochastic gradient descent can be used.
4. Model complexity must balance underfitting and overfitting; locally weighted linear regression fits simple local models near each query.
5. Under i.i.d. Gaussian noise, maximizing the likelihood is equivalent to minimizing $J(\theta)$.