2. Outlines
1. Linear Regression
2. Basic Rule for Linear Regression
3. Parametric Estimation in Linear Regression
4. Linear Regression of Higher Order
5. Introduction to Quadratic or Polynomial Regression
6. References
3. Linear Regression:
Linear Regression:
Linear regression is an approach to model the relationship between a scalar
response (or dependent variable) and one or more explanatory variables (or
independent variables). The case of one explanatory variable is called
simple linear regression.
y = mx + c
where y → Dependent variable, x → Independent variable,
m → Gradient or slope, and c → Intercept
or
y = a_1 x_1 + a_2 x_2 + \cdots + a_k x_k + b
Here, y → Dependent variable
x_1, x_2, ..., x_k → Independent variables
4. Basic Rule for Linear Regression:
Table for bivariate linear regression:

  y         x         x̂ − x          ŷ − y
  y_1       x_1       x̂ − x_1        ŷ − y_1
  y_2       x_2       x̂ − x_2        ŷ − y_2
  ...       ...       ...             ...
  y_{k−1}   x_{k−1}   x̂ − x_{k−1}    ŷ − y_{k−1}
  y_k       x_k       x̂ − x_k        ŷ − y_k

where ŷ = mean(y) and x̂ = mean(x).
m = \frac{\sum_{i=1}^{k} (\hat{x} - x_i)(\hat{y} - y_i)}{\sum_{i=1}^{k} (\hat{x} - x_i)^2}
Based on these observations, the estimated straight-line equation is
y − ŷ = m(x − x̂), i.e., y = mx + c, where c = ŷ − m x̂.
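A minimal NumPy sketch of this rule (the function name and the sample data are illustrative assumptions, not from the lecture):

```python
import numpy as np

def fit_line(x, y):
    """Slope m and intercept c from the mean-deviation rule above."""
    x_hat, y_hat = x.mean(), y.mean()   # x̂ and ŷ
    m = np.sum((x_hat - x) * (y_hat - y)) / np.sum((x_hat - x) ** 2)
    c = y_hat - m * x_hat               # from y − ŷ = m(x − x̂)
    return m, c

# Hypothetical sample data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
m, c = fit_line(x, y)
print(m, c)   # ≈ 1.96, 0.14
```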
5. Continued–
Note:
1. There may exist an infinite number of lines for different gradient and
intercept values.
2. The best-fit straight line in linear regression is the one that gives
the minimum least-square error.
3. In linear regression, the fitness of the straight line can also be
measured by the R² value: the closer R² is to 1, the better the fit.
4. The above straight line may be one solution, but it is not necessarily
the true estimate that gives the minimum mean square error.
Calculation of R²:
R^2 = \frac{\sum_{i=1}^{k} (y_p^i - \hat{y})^2}{\sum_{i=1}^{k} (y_i - \hat{y})^2}
where y_p^i → predicted value of the i-th sample from the straight line
y_i → value of the i-th sample of the y vector
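Continuing the sketch above, this ratio can be computed directly (r_squared is a hypothetical helper name):

```python
def r_squared(y, y_pred):
    """R² as the ratio of explained variation to total variation."""
    y_hat = y.mean()
    return np.sum((y_pred - y_hat) ** 2) / np.sum((y - y_hat) ** 2)

y_pred = m * x + c            # predictions from the fitted line
print(r_squared(y, y_pred))   # ≈ 1 for a good fit
```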
6. Parametric Estimation in Linear Regression
Mean Square Error:
MSE = \frac{1}{N} \sum_{i=1}^{N} (y_p^i - y_i)^2
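The same quantity in NumPy, continuing the sketch above:

```python
def mse(y, y_pred):
    """Mean square error between observed and predicted values."""
    return np.mean((y - y_pred) ** 2)

print(mse(y, y_pred))   # small for a good fit
```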
Parametric Estimation in Linear Regression
Let the linear regression model be as follows:
r = f(x) + ε
f(x) is an unknown function, which needs to be estimated properly.
G(x|θ) → linear estimator that estimates the unknown function f(x)
ε ∼ N(0, σ²)
p(r|x) ∼ N(G(x|θ), σ²)
Maximum likelihood is used to learn the parameters θ.
7. Continued–
The pairs (x^t, r^t) in the training set are drawn from an unknown joint
probability density p(x, r), which we can write as
p(x, r) = p(r|x) p(x)
Let χ = \{x^t, r^t\}_{t=1}^{N} → training set.
Log-likelihood Function:
\mathcal{L}(\theta|\chi) = \log p(r, x) = \log p(r|x) + \log p(x)
= \log \prod_{t=1}^{N} p(r^t|x^t) + \log \prod_{t=1}^{N} p(x^t)
Note: We can ignore the second term since it does not depend on our
estimator.
\mathcal{L}(\theta|\chi) = \log \prod_{t=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(r^t - G(x^t|\theta))^2}{2\sigma^2}\right)
= -N \log(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2} \sum_{t=1}^{N} (r^t - G(x^t|\theta))^2
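To see this numerically, a small sketch (reusing x, y, m, c from above; σ = 1 is an arbitrary assumed value) shows that the least-squares fit scores highest under this log-likelihood:

```python
def log_likelihood(w1, w0, x, r, sigma=1.0):
    """Gaussian log-likelihood of the model r = w1*x + w0 + ε."""
    resid = r - (w1 * x + w0)
    return (-len(x) * np.log(np.sqrt(2 * np.pi) * sigma)
            - np.sum(resid ** 2) / (2 * sigma ** 2))

print(log_likelihood(m, c, x, y))         # maximal at the least-squares fit
print(log_likelihood(m + 0.5, c, x, y))   # strictly lower
```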
8. Continued–
Note:
In the log-likelihood function L(θ|χ), the first term is independent of the
parameter θ. Maximizing the likelihood is therefore equivalent to minimizing
the error function
E(\theta|\chi) = \frac{1}{2} \sum_{t=1}^{N} (r^t - G(x^t|\theta))^2
(the constant factor 1/σ² does not change the minimizer, so it is dropped).
In linear regression, we have the linear model
G(x^t|w_1, w_0) = w_1 x^t + w_0
There are two unknowns, w_1 and w_0. Hence, two equations are required:
\sum_{t=1}^{N} r^t = N w_0 + w_1 \sum_{t=1}^{N} x^t    (1)
9. Continued–
\sum_{t=1}^{N} r^t x^t = w_0 \sum_{t=1}^{N} x^t + w_1 \sum_{t=1}^{N} (x^t)^2    (2)
which can be written in vector-matrix form as AW = y, where
A = \begin{bmatrix} N & \sum_{t=1}^{N} x^t \\ \sum_{t=1}^{N} x^t & \sum_{t=1}^{N} (x^t)^2 \end{bmatrix},
W = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix},
y = \begin{bmatrix} \sum_{t=1}^{N} r^t \\ \sum_{t=1}^{N} r^t x^t \end{bmatrix}
Hence, we can estimate the parameters w_0 and w_1 of the estimator G(x|θ)
from the training set χ = \{x^t, r^t\}_{t=1}^{N} as
W = A^{-1} y
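A sketch of this matrix solution on the same data as before (y plays the role of rᵗ; np.linalg.solve is used rather than forming A⁻¹ explicitly, a standard numerical choice):

```python
N = len(x)
A = np.array([[N,       x.sum()],
              [x.sum(), np.sum(x ** 2)]])
y_vec = np.array([y.sum(), np.sum(y * x)])   # right-hand sides of (1) and (2)
w0, w1 = np.linalg.solve(A, y_vec)
print(w0, w1)   # matches c, m from the mean-deviation rule
```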
10. Linear Regression of Higher Order:
Example:
1. Let a mathematical input-output relation be such that
r = f(x) + ε
where the unknown function f(x) is estimated by a linear estimator
G(x|θ) = a_1 x_1 + a_2 x_2 + a_3 x_3 + b
where x = \{x_1, x_2, x_3\}, θ = \{a_1, a_2, a_3, b\}, and the training
data set is χ = \{r^t, x_1^t, x_2^t, x_3^t\}_{t=1}^{N}.
Ans. There are four unknowns (a_1, a_2, a_3, b); hence, four equations are
required:
\sum_{t=1}^{N} r^t = bN + a_1 \sum_{t=1}^{N} x_1^t + a_2 \sum_{t=1}^{N} x_2^t + a_3 \sum_{t=1}^{N} x_3^t    (3)
11. Continued–
\sum_{t=1}^{N} r^t x_1^t = b \sum_{t=1}^{N} x_1^t + a_1 \sum_{t=1}^{N} (x_1^t)^2 + a_2 \sum_{t=1}^{N} x_1^t x_2^t + a_3 \sum_{t=1}^{N} x_1^t x_3^t    (4)

\sum_{t=1}^{N} r^t x_2^t = b \sum_{t=1}^{N} x_2^t + a_1 \sum_{t=1}^{N} x_2^t x_1^t + a_2 \sum_{t=1}^{N} (x_2^t)^2 + a_3 \sum_{t=1}^{N} x_2^t x_3^t    (5)

\sum_{t=1}^{N} r^t x_3^t = b \sum_{t=1}^{N} x_3^t + a_1 \sum_{t=1}^{N} x_3^t x_1^t + a_2 \sum_{t=1}^{N} x_3^t x_2^t + a_3 \sum_{t=1}^{N} (x_3^t)^2    (6)
From the previous example, we can also express this as
AW = y
where y = \left[\sum_{t=1}^{N} r^t, \; \sum_{t=1}^{N} r^t x_1^t, \; \sum_{t=1}^{N} r^t x_2^t, \; \sum_{t=1}^{N} r^t x_3^t\right]^T
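A sketch of the three-variable case on synthetic data (all coefficients and sample sizes are assumed, for illustration only); the design matrix D with a leading column of ones reproduces equations (3)-(6) through DᵀD and Dᵀr:

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # columns x1, x2, x3
r = 3.0 + 2.0*X[:, 0] - 1.0*X[:, 1] + 0.5*X[:, 2] \
    + rng.normal(scale=0.1, size=100)

D = np.hstack([np.ones((100, 1)), X])         # columns: 1, x1, x2, x3
b, a1, a2, a3 = np.linalg.solve(D.T @ D, D.T @ r)
print(b, a1, a2, a3)                          # ≈ 3.0, 2.0, -1.0, 0.5
```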
12. Quadratic or Polynomial Regression
Quadratic or Polynomial Regression:
For quadratic regression, our estimator can be modeled as
G(x|\theta) = a_2 x^2 + a_1 x + a_0
Here, θ = \{a_2, a_1, a_0\}.
For polynomial regression, our estimator can be modeled as
G(x|\theta) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0
Here, θ = \{a_n, a_{n-1}, \ldots, a_0\}.
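Polynomial regression reuses the same normal-equation machinery, with the powers 1, x, x², ... acting as the features. A sketch on synthetic data (coefficients assumed for illustration; rng is reused from the previous block):

```python
x_q = np.linspace(-2, 2, 50)
r_q = 1.5*x_q**2 - 0.7*x_q + 2.0 + rng.normal(scale=0.1, size=50)

V = np.vander(x_q, 3, increasing=True)        # columns: 1, x, x^2
a0, a1, a2 = np.linalg.solve(V.T @ V, V.T @ r_q)
print(a2, a1, a0)                             # ≈ 1.5, -0.7, 2.0
# np.polyfit(x_q, r_q, 2) returns the same coefficients, highest power first
```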
13. References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University, School of Computer Science, Machine Learning Department, 2006, vol. 9.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly Media, 2019.