1. SIMPLE LINEAR REGRESSION
PREPARED BY
HASSAN SHEHWAR SHAH
DEPARTMENT OF MECHANICAL ENGINEERING
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
TAXILA
3/16/2021 1
2. OUTLINE
• Introduction to linear regression
• Simple linear regression definition
• Simple linear regression model
• Assumptions of simple linear regression
• Formulas for estimating the regression line
• Example
3. INTRODUCTION TO LINEAR REGRESSION
• Linear regression is used to:
  - Predict the value of a dependent variable based on the value of at least one independent variable
  - Explain the impact of a change in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to explain the dependent variable
4. SIMPLE LINEAR REGRESSION MODEL
• Only one independent variable, X
• The relationship between X and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in X
5. SIMPLE LINEAR REGRESSION MODEL
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• This is also known as the Population Regression Model (PRM), where
  $Y_i$ is the dependent variable, an observation randomly drawn from the population
  $X_i$ is the independent variable, treated as fixed
  $\varepsilon_i$ is the random error (residual) term
  $\beta_0$ and $\beta_1$ are the population parameters: $\beta_0$ is the intercept and $\beta_1$ is the slope (regression coefficient)
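A minimal sketch of what the population model describes: fixed X values, a straight line, and a random error added to each observation. The parameter values, the normal error distribution with standard deviation `sigma`, and the function name are illustrative assumptions, not part of the slides.

```python
import random


def simulate_population_model(x_values, beta0, beta1, sigma, seed=0):
    """Return Y_i = beta0 + beta1 * X_i + eps_i, with eps_i drawn from Normal(0, sigma)."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]


# Hypothetical values: X is fixed, only the error term is random.
x_fixed = [1.0, 2.0, 3.0, 4.0, 5.0]
y_on_line = simulate_population_model(x_fixed, beta0=2.0, beta1=0.5, sigma=0.0)
y_observed = simulate_population_model(x_fixed, beta0=2.0, beta1=0.5, sigma=1.0)

print(y_on_line)   # points lying exactly on the population line
print(y_observed)  # the same line plus random error
```

With `sigma=0.0` every point falls on the line; any positive `sigma` scatters the observations around it.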
6. SIMPLE LINEAR REGRESSION MODEL
[Figure: the population regression line $Y_i = \beta_0 + \beta_1 X_i$ plotted against X, with intercept $\beta_0$ and slope $\beta_1$. At a given $X_i$, the random error $\varepsilon_i$ is the vertical distance between the observed value of Y and the predicted value of Y on the line.]
8. ASSUMPTIONS OF SIMPLE LINEAR REGRESSION
• The simple linear regression model is linear in the parameters.
• The values of the independent variable X are non-random.
• The residual (error) term has zero mean, that is, the expected value of the error term is zero.
• The residual (error) values follow a normal distribution.
• There is no relationship between the residual (error) term and the X variable.
9. ESTIMATED REGRESSION MODEL
• Let there be a set of observations $(X_i, Y_i)$, $i = 1, 2, 3, \dots, n$:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
In terms of the sample data,
$Y_i = b_0 + b_1 X_i + e_i$
where $b_0$ and $b_1$ are the least squares estimates of $\beta_0$ and $\beta_1$, and $e_i$ is commonly called the residual.
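10. FORMULAS FOR ESTIMATING THE REGRESSION LINE
• The least squares estimates are as follows:
$b_1 = \dfrac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$   (1)
$b_0 = \bar{Y} - b_1 \bar{X}$   (2)
where $\bar{X} = \dfrac{\sum X_i}{n}$ and $\bar{Y} = \dfrac{\sum Y_i}{n}$.
The estimates $b_0$ and $b_1$ can also be expressed in terms of sums of squares as follows:
$b_1 = \dfrac{SS_{XY}}{SS_X}$   (3)
$b_0 = \bar{Y} - b_1 \bar{X}$   (4)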
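Equations (1) and (2) translate almost line for line into code. The sketch below assumes two equal-length lists of numbers; the function name and the small data set are hypothetical and chosen so the answer is easy to check by hand.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) from equations (1) and (2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # equation (1)
    b0 = y_bar - b1 * x_bar                                        # equation (2)
    return b0, b1


# Hypothetical data lying exactly on Y = 1 + 2X.
b0, b1 = least_squares_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```

Because the points lie exactly on a line, the fit recovers an intercept of 1 and a slope of 2.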
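11. FORMULAS FOR ESTIMATING THE REGRESSION LINE
$SS_X = \sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2$   (5)
$SS_Y = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2$   (6)
$SS_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \dfrac{(\sum X_i)(\sum Y_i)}{n}$   (7)
• $SS_X$ and $SS_Y$ are the terms used to determine the variance of X and the variance of Y respectively, and are called corrected sums of squares. The estimates $\hat{\beta}_1$ and $\hat{\beta}_0$ (the same quantities as $b_1$ and $b_0$) are obtained as follows:
$\hat{\beta}_1 = \dfrac{SS_{XY}}{SS_X}$   (8)
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$   (9)
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$ is the regression equation   (10)
Variance of X: $S_X^2 = \dfrac{SS_X}{n-1}$   (11)
Variance of Y: $S_Y^2 = \dfrac{SS_Y}{n-1}$   (12)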
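Each of equations (5) to (7) gives two algebraically equivalent forms. The sketch below computes both on a small hypothetical data set so the agreement can be seen directly, then derives the sample variances from equations (11) and (12). Function names are illustrative.

```python
def corrected_sums_of_squares(x, y):
    """Definition form: sums of squared (or cross) deviations from the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_x, ss_y, ss_xy


def shortcut_sums_of_squares(x, y):
    """Shortcut form: raw sums minus a correction term."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum(xi * xi for xi in x) - n * x_bar ** 2
    ss_y = sum(yi * yi for yi in y) - n * y_bar ** 2
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    return ss_x, ss_y, ss_xy


x = [1, 2, 3, 4]  # hypothetical data
y = [2, 4, 5, 9]
ss_x, ss_y, ss_xy = corrected_sums_of_squares(x, y)
shortcut = shortcut_sums_of_squares(x, y)
var_x = ss_x / (len(x) - 1)  # equation (11)
var_y = ss_y / (len(y) - 1)  # equation (12)
print(ss_x, ss_y, ss_xy, var_x, var_y)
```

The shortcut form needs only running totals, which is why it is convenient for hand calculation; the definition form is usually less sensitive to rounding.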
12. STANDARD ERROR AND ANOVA EQUATION
• Similarly, the error sum of squares ($SS_e$) for the regression is
$SS_e = \sum e_i^2$
and the standard error is
$S_e = \sqrt{\dfrac{\sum e_i^2}{n-2}} = \sqrt{\dfrac{SS_e}{df_e}}$
where $df_e = n - 2$ is the error degrees of freedom.
• The ANOVA equation for linear regression is
Total corrected sum of squares = sum of squares due to regression + sum of squares due to error
$SS_{Total} = SS_R + SS_e$
• The sums of squares are computed as follows. Let the correction factor be
$CF = \dfrac{(\text{Grand total})^2}{\text{Number of observations}} = \dfrac{(\sum Y_i)^2}{n}$
$SS_{Total} = \sum Y_i^2 - \dfrac{(\sum Y_i)^2}{n} = \sum Y_i^2 - CF$   (13)
$SS_R = \hat{\beta}_1 SS_{XY}$   (14)
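The decomposition can be checked numerically: the error sum of squares obtained by subtraction should equal the sum of squared residuals computed directly. This is a sketch on hypothetical data, with an illustrative function name.

```python
import math


def anova_decomposition(x, y):
    """Return b0, b1, SS_Total, SS_R, SS_e and the standard error S_e."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    correction_factor = sum_y ** 2 / n                       # CF = (grand total)^2 / n
    ss_total = sum(yi * yi for yi in y) - correction_factor  # equation (13)
    ss_x = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b1 = ss_xy / ss_x
    b0 = sum_y / n - b1 * sum_x / n
    ss_r = b1 * ss_xy                                        # equation (14)
    ss_e = ss_total - ss_r
    s_e = math.sqrt(ss_e / (n - 2))                          # n - 2 error degrees of freedom
    return b0, b1, ss_total, ss_r, ss_e, s_e


x = [1, 2, 3, 4]  # hypothetical data
y = [2, 4, 5, 9]
b0, b1, ss_total, ss_r, ss_e, s_e = anova_decomposition(x, y)

# Direct check: sum of squared residuals e_i = Y_i - (b0 + b1 * X_i).
residual_ss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(ss_total, ss_r, ss_e, residual_ss, s_e)
```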
13. ANOVA TABLE FOR SIMPLE LINEAR REGRESSION AND F-TEST
Source     | Sum of squares                 | Degrees of freedom | Mean square           | F0
Regression | $SS_R = \hat{\beta}_1 SS_{XY}$ | 1                  | $MS_R = SS_R / 1$     | $F_0 = MS_R / MS_e$
Error      | $SS_e = SS_{Total} - SS_R$     | n - 2              | $MS_e = SS_e / (n-2)$ |
Total      | $SS_{Total} = SS_Y$            | n - 1              |                       |
• The test for significance of regression determines whether there is a linear relationship between X and Y.
The appropriate hypotheses are
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Reject $H_0$ if $F_0$ exceeds $F_{\alpha,1,n-2}$.
Rejection of $H_0$ implies that there is a significant relationship between the variables X and Y. We can also test the coefficients $\beta_0$ and $\beta_1$ using a t-test with $n - 2$ degrees of freedom.
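The F-test reduces to two divisions and a comparison. In this sketch the critical value is passed in by the caller (read from an F table), since the Python standard library has no F distribution. The numbers are those of the deck's worked example; the function name is an assumption.

```python
def f_test(ss_r, ss_e, n, f_critical):
    """Return (F0, reject_H0) for the significance-of-regression test."""
    ms_r = ss_r / 1        # regression has 1 degree of freedom
    ms_e = ss_e / (n - 2)  # error has n - 2 degrees of freedom
    f0 = ms_r / ms_e
    return f0, f0 > f_critical


# Values from the worked example in these slides: n = 15, F(5%, 1, 13) = 4.67.
f0, reject_h0 = f_test(ss_r=292.723, ss_e=47.677, n=15, f_critical=4.67)
print(round(f0, 3), reject_h0)
```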
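14. T-TEST FOR β0 AND β1
• The hypotheses are
$H_0: \beta_0 = 0$, $H_1: \beta_0 \neq 0$
and
$H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$
• The test statistic for the intercept is
$t_0 = \dfrac{\hat{\beta}_0}{\sqrt{MS_e \left( \dfrac{1}{n} + \dfrac{\bar{X}^2}{SS_X} \right)}}$   (15)
Reject $H_0$ if $|t_0| > t_{\alpha/2,\,n-2}$.
• The test statistic for the slope is
$t_0 = \dfrac{\hat{\beta}_1}{\sqrt{MS_e / SS_X}}$   (16)
Reject $H_0$ if $|t_0| > t_{\alpha/2,\,n-2}$.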
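Equations (15) and (16) can be sketched as two small functions. The inputs below are the rounded values from the deck's worked example, and the critical value 2.160 is taken from a t table rather than computed; the function names are illustrative.

```python
import math


def t_statistic_intercept(b0, ms_e, ss_x, x_bar, n):
    """Equation (15): estimated intercept divided by its standard error."""
    return b0 / math.sqrt(ms_e * (1 / n + x_bar ** 2 / ss_x))


def t_statistic_slope(b1, ms_e, ss_x):
    """Equation (16): estimated slope divided by its standard error."""
    return b1 / math.sqrt(ms_e / ss_x)


# Rounded values from the worked example in these slides.
t_slope = t_statistic_slope(b1=1.386, ms_e=3.667, ss_x=152.363)
t_intercept = t_statistic_intercept(b0=24.004, ms_e=3.667, ss_x=152.363, x_bar=7.067, n=15)

t_critical = 2.160  # t(0.025, 13) from a t table
print(round(t_slope, 3), abs(t_slope) > t_critical)
print(round(t_intercept, 3), abs(t_intercept) > t_critical)
```

For the slope, the square of this statistic matches the F statistic of the ANOVA table up to rounding, as expected in simple linear regression.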
15. COEFFICIENT OF DETERMINATION (R²)
• The coefficient of determination is
$R^2 = \dfrac{SS_R}{SS_{Total}} = 1 - \dfrac{SS_e}{SS_{Total}}$   (17)
• The value of $R^2$ lies between 0 and 1.
• The higher $R^2$, the greater the percentage of the variation in Y explained by the regression line, that is, the better the fit of the regression line to the sample observations.
• The closer $R^2$ is to zero, the worse the fit.
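Both forms of equation (17) give the same number whenever the sums of squares satisfy $SS_{Total} = SS_R + SS_e$. A short sketch, using the sums of squares from the deck's worked example and an illustrative function name:

```python
def r_squared(ss_r, ss_e):
    """Return R^2 computed both ways in equation (17)."""
    ss_total = ss_r + ss_e
    explained_form = ss_r / ss_total        # SS_R / SS_Total
    unexplained_form = 1 - ss_e / ss_total  # 1 - SS_e / SS_Total
    return explained_form, unexplained_form


# Sums of squares from the worked example in these slides.
r2_a, r2_b = r_squared(ss_r=292.723, ss_e=47.677)
print(round(r2_a, 5), round(r2_b, 5))
```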
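16. CONFIDENCE INTERVALS ON THE SLOPE AND INTERCEPT
• A $100(1-\alpha)\%$ confidence interval on the slope $\beta_1$ in simple linear regression is
$\hat{\beta}_1 \pm t_{\alpha/2,\,n-2} \sqrt{\dfrac{MS_e}{SS_X}}$   (18)
• Similarly, a $100(1-\alpha)\%$ confidence interval on the intercept $\beta_0$ is
$\hat{\beta}_0 \pm t_{\alpha/2,\,n-2} \sqrt{MS_e \left( \dfrac{1}{n} + \dfrac{\bar{X}^2}{SS_X} \right)}$   (19)
• A $100(1-\alpha)\%$ prediction interval on a future observation $Y_0$ at the value $X_0$ is given by
$\hat{Y}_0 \pm t_{\alpha/2,\,n-2} \sqrt{MS_e \left[ 1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{SS_X} \right]}$   (20)
where $\hat{Y}_0 = \hat{\beta}_0 + \hat{\beta}_1 X_0$ is the predicted value at $X_0$.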
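The three intervals share one pattern: a point estimate plus or minus a t critical value times a standard error. The sketch below applies equations (18) to (20) to the rounded values of the deck's worked example at a 95% level, with $t_{0.025,13} = 2.160$ supplied from a table. The slides do not compute these intervals, so the printed numbers are an illustration only, and the function names are assumptions.

```python
import math


def slope_interval(b1, ms_e, ss_x, t_crit):
    """Equation (18): confidence interval on the slope."""
    half_width = t_crit * math.sqrt(ms_e / ss_x)
    return b1 - half_width, b1 + half_width


def intercept_interval(b0, ms_e, ss_x, x_bar, n, t_crit):
    """Equation (19): confidence interval on the intercept."""
    half_width = t_crit * math.sqrt(ms_e * (1 / n + x_bar ** 2 / ss_x))
    return b0 - half_width, b0 + half_width


def prediction_interval(b0, b1, x0, ms_e, ss_x, x_bar, n, t_crit):
    """Equation (20): prediction interval on a future observation at x0."""
    y0_hat = b0 + b1 * x0
    half_width = t_crit * math.sqrt(ms_e * (1 + 1 / n + (x0 - x_bar) ** 2 / ss_x))
    return y0_hat - half_width, y0_hat + half_width


# Rounded values from the worked example in these slides.
common = dict(ms_e=3.667, ss_x=152.363, t_crit=2.160)
slope_ci = slope_interval(b1=1.386, **common)
intercept_ci = intercept_interval(b0=24.004, x_bar=7.067, n=15, **common)
profit_pi = prediction_interval(b0=24.004, b1=1.386, x0=13, x_bar=7.067, n=15, **common)
print(slope_ci, intercept_ci, profit_pi)
```

The prediction interval is wider than a confidence interval at the same point because of the extra 1 inside the bracket, which accounts for the random error of the new observation itself.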
17. EXAMPLE: SIMPLE LINEAR REGRESSION
A software company wants to find out whether its profit is related to its investment in research and development. It has collected the following data from company records.
i. Fit a simple linear regression model to the data and estimate the profit when the investment is 13 lakh rupees.
ii. Test the significance of regression using the F-test.
iii. Test the significance of $\beta_1$.
Sr. No. | Investment in R&D (in lakhs) | Annual profit (in lakhs)
1       | 2                            | 24
2       | 3                            | 25
3       | 4                            | 31
4       | 5                            | 34
5       | 11                           | 40
6       | 3                            | 31
7       | 10                           | 36
8       | 8.5                          | 36
9       | 4                            | 29
10      | 6.5                          | 33
11      | 8                            | 37
12      | 9.5                          | 37
13      | 11.5                         | 39
14      | 10.5                         | 39
15      | 9.5                          | 36
19. SOLUTION OF EXAMPLE (CONTINUED)
$\hat{\beta}_1 = \dfrac{SS_{XY}}{SS_X} = \dfrac{211.2}{152.363} = 1.386$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 33.8 - 1.386(7.067) = 24.004$
The regression model equation is $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$:
$\hat{Y} = 24.004 + 1.386X$
(i) When the investment is 13 lakhs, the profit is $\hat{Y} = 24.004 + 1.386(13) = 42.022$ lakhs.
(ii) $SS_{Total} = SS_Y = 340.4$
$SS_R = \hat{\beta}_1 SS_{XY} = (1.386)(211.2) = 292.723$
$SS_e = SS_{Total} - SS_R = 340.4 - 292.723 = 47.677$
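The figures above can be reproduced from the raw data table with a few lines of code. This sketch uses the shortcut formulas (5) to (7) with the unrounded mean, so $SS_X$ comes out near 152.43 rather than 152.363; the slide value appears to result from rounding $\bar{X}$ to 7.067 before squaring. The estimates still agree with the slides to about two decimal places. Variable names are illustrative.

```python
investment = [2, 3, 4, 5, 11, 3, 10, 8.5, 4, 6.5, 8, 9.5, 11.5, 10.5, 9.5]
profit = [24, 25, 31, 34, 40, 31, 36, 36, 29, 33, 37, 37, 39, 39, 36]

n = len(investment)
x_bar = sum(investment) / n
y_bar = sum(profit) / n

# Corrected sums of squares, shortcut forms of equations (5) to (7).
ss_x = sum(x * x for x in investment) - n * x_bar ** 2
ss_y = sum(y * y for y in profit) - n * y_bar ** 2
ss_xy = sum(x * y for x, y in zip(investment, profit)) - sum(investment) * sum(profit) / n

b1 = ss_xy / ss_x           # slope estimate
b0 = y_bar - b1 * x_bar     # intercept estimate
profit_at_13 = b0 + b1 * 13  # part (i): predicted profit at an investment of 13 lakhs

ss_r = b1 * ss_xy           # part (ii): regression sum of squares
ss_e = ss_y - ss_r          # error sum of squares

print(f"x_bar={x_bar:.3f} y_bar={y_bar:.1f} SSx={ss_x:.3f} SSy={ss_y:.1f} SSxy={ss_xy:.1f}")
print(f"b1={b1:.3f} b0={b0:.3f} profit_at_13={profit_at_13:.3f}")
print(f"SSR={ss_r:.3f} SSe={ss_e:.3f}")
```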
20. EXAMPLE SOLUTION (CONTINUED)
ANOVA TABLE FOR SIMPLE LINEAR REGRESSION
Source     | Sum of squares       | Degrees of freedom | Mean square      | F0
Regression | $SS_R = 292.723$     | 1                  | $MS_R = 292.723$ | 79.816
Error      | $SS_e = 47.677$      | 13                 | $MS_e = 3.667$   |
Total      | $SS_{Total} = 340.4$ | 14                 |                  |
• Since $F_0 = 79.816$ exceeds $F_{5\%,1,13} = 4.67$, the regression is significant. That is, the relationship between X and Y is significant.
21. EXAMPLE SOLUTION (CONTINUED)
T-TEST FOR β1
The statistic to test the regression coefficient $\beta_1$ is
$t_0 = \dfrac{\hat{\beta}_1}{\sqrt{MS_e / SS_X}} = \dfrac{1.386}{\sqrt{3.667 / 152.363}} = 8.934$
Since $|t_0| = 8.934$ exceeds $t_{0.025,13} = 2.160$, the regression coefficient $\beta_1$ is significant.
22. COEFFICIENT OF DETERMINATION R-SQUARE
• For this we have the following formula:
$R^2 = \dfrac{SS_R}{SS_{Total}} = \dfrac{292.723}{340.4} = 0.85994$
• This means that 85.994% of the variation in the dependent variable is explained by the explanatory variable, while the remaining 14.006% is left unexplained and may be due to random error.
23. SIMPLE LINEAR REGRESSION
• The coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ and all other regression results can also be obtained from MS Excel.