The document discusses simple linear regression and correlation. Simple linear regression predicts a dependent variable based on an independent variable. The linear regression equation is represented as y = β0 + β1x + ε, where β0 is the y-intercept, β1 is the slope, and ε is the error term. Multiple linear regression extends this to use two or more independent variables. Correlation is measured on a scale of -1 to 1 and indicates the strength and direction of the linear relationship between two variables. The coefficient of correlation r is calculated to quantify the correlation between variables.
2. • Simple linear regression: predicts a
variable based on the information from
another variable.
• Linear regression can only be used when
one has two continuous variables—an
independent variable and a dependent
variable.
11/7/2023
Simple Linear Regression and
Correlations
2
4. • A simple regression model is a two-variable
(bivariate) linear regression
model because it relates the two
variables x and y.
• Multiple linear regression (MLR) is
used to predict the outcome of a
variable based on the values of two or
more other variables.
6. Example:
• Suppose the relationship between
expenditure (Y) and income (X) of
households is expressed as:
Y = 0.6X + 120
• Here, on the basis of income, we can
predict expenditure. For an income level of
Br 1,500, the estimated expenditure is:
Expenditure = 0.6(1500) + 120 = Br 1,020
• This functional relationship is
deterministic or exact, that is, given
income we can determine the exact
expenditure of a household.
7. • But in reality this rarely happens:
different households with the same
income are not expected to spend equal
amounts due to habit, preference,
geographical and time variation, etc.
• Thus, we should express the regression
model as:
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
where $\epsilon_i$ is the random error term for household $i$.
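The contrast between the exact relationship and the model with an error term can be illustrated with a small simulation. This is a minimal sketch: the line Y = 0.6X + 120 comes from the expenditure example, while the normally distributed error, its standard deviation of 50, the seed, and the function names are assumptions made purely for illustration.

```python
import random


def deterministic_expenditure(income):
    """Exact relationship from the example: Y = 0.6X + 120."""
    return 0.6 * income + 120


def stochastic_expenditure(income, rng, error_sd=50.0):
    """Same line plus a random error term (assumed normal; sd is arbitrary)."""
    return deterministic_expenditure(income) + rng.gauss(0.0, error_sd)


rng = random.Random(0)  # fixed seed so the run is repeatable
income = 1500

exact = deterministic_expenditure(income)
# Five hypothetical households that all share the same income.
households = [stochastic_expenditure(income, rng) for _ in range(5)]

print("Deterministic prediction:", exact)
for k, spend in enumerate(households, start=1):
    print(f"Household {k}: expenditure = {spend:.2f}, error = {spend - exact:.2f}")
```

Every simulated household has the same income, yet each spends a different amount; the gap between each value and the deterministic prediction plays the role of the error term.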
8. Generally the reasons for including the
error term are:
i. Omitted variables: a model is a
simplification of reality. It is not
always possible to include all relevant
variables in a functional form.
Excluding variables from the model
introduces an error.
ii. Measurement error: inaccuracy in the
collection and measurement of sample
data.
iii. Sampling error
9. Stochastic and Non-stochastic
Relationships
• If the relationship between x and y is such
that for a particular value of x there is
only one corresponding value of y, it is
known as a deterministic (non-stochastic)
relationship. Other factors in $\epsilon_i$ are held
fixed, so that the change in $\epsilon_i$ is zero:
$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$$
• Taking into account the sources of error,
captured by the stochastic term $\epsilon_i$ (or $u_i$), the function
becomes:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \epsilon_i$$
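10. A simple regression analysis effectively treats
all factors affecting y other than x as being
unobserved.
$$y = \beta_0 + \beta_1 x$$
Let's start by noting the following:
$$\bar{x} = \frac{\sum x_i}{n}, \quad \text{which is} \quad \sum x_i = n\bar{x}$$
$$\text{similarly} \quad \sum y_i = n\bar{y}$$
Also
$$\begin{aligned}
\sum (x_i - \bar{x})^2 &= \sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2) \\
&= \sum x_i^2 - 2\bar{x}\sum x_i + \sum \bar{x}^2 \\
&= \sum x_i^2 - 2\bar{x}\,n\bar{x} + n\bar{x}^2 \\
&= \sum x_i^2 - n\bar{x}^2
\end{aligned}$$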
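11. • Now we can take the first derivative with respect to
$\beta_0$.
The model is
$$y_i = \beta_0 + \beta_1 x_i + \mu_i$$
and the fitted line gives
$$\hat{y}_i = \beta_0 + \beta_1 x_i$$
The sum of squares of the errors (SSE)
is:
$$SSE = \sum \varepsilon_i^2 = \sum (y_i - \hat{y}_i)^2$$
where $\varepsilon_i = y_i - \hat{y}_i$. The coefficients are chosen by minimizing the errors, that is, by minimizing SSE.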
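12. $$\begin{aligned}
-2\sum (y_i - \beta_0 - \beta_1 x_i) &= 0 \\
\sum (y_i - \beta_0 - \beta_1 x_i) &= 0 \\
\sum y_i - \sum \beta_0 - \sum \beta_1 x_i &= 0 \\
\sum y_i - n\beta_0 - \beta_1 \sum x_i &= 0 \\
n\bar{y} - n\beta_0 - \beta_1 n\bar{x} &= 0 \\
\bar{y} - \beta_0 - \beta_1 \bar{x} &= 0 \\
\beta_0 &= \bar{y} - \beta_1 \bar{x} \qquad \text{(I)}
\end{aligned}$$
Note: This implies that the OLS line passes
through the means $\bar{x}$ and $\bar{y}$.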
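The condition for the slope can be sketched in the same way. The steps below are a reconstruction, not taken from the slides: they assume the SSE is differentiated with respect to $\beta_1$, that $\sum x_i = n\bar{x}$, and that result I is substituted for $\beta_0$.

```latex
\begin{aligned}
\frac{\partial\, SSE}{\partial \beta_1}
  = -2\sum x_i\,(y_i - \beta_0 - \beta_1 x_i) &= 0 \\
\sum x_i y_i - \beta_0 \sum x_i - \beta_1 \sum x_i^2 &= 0 \\
\sum x_i y_i - (\bar{y} - \beta_1 \bar{x})\, n\bar{x} - \beta_1 \sum x_i^2 &= 0 \\
\sum x_i y_i - n\bar{x}\bar{y} &= \beta_1 \sum x_i^2 - \beta_1 n\bar{x}^2
\end{aligned}
```

The last line has the slope $\beta_1$ isolated on the right-hand side, ready to be rewritten in terms of deviations from the means.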
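14. But we know that
$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$$
and also $n\bar{x}^2 = \sum \bar{x}^2$, so
$$\sum x_i y_i - n\bar{x}\bar{y} = \beta_1 \sum x_i^2 - \beta_1 \sum \bar{x}^2$$
$$\sum x_i y_i - n\bar{x}\bar{y} = \beta_1 \sum (x_i - \bar{x})^2$$
Also
$$\sum x_i y_i - n\bar{x}\bar{y} = \sum (x_i - \bar{x})(y_i - \bar{y})$$
Hence
$$\sum (x_i - \bar{x})(y_i - \bar{y}) = \beta_1 \sum (x_i - \bar{x})^2$$
$$\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad \text{(II)}$$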
15. Example: For the data given below, develop the linear
regression line.
X   2   3   4   5   6   7
Y   7   2   8  14  12  10
$$\sum x_i = 27, \qquad \sum y_i = 53$$
$$\bar{x} = \frac{\sum x_i}{n} = \frac{27}{6}, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{53}{6}$$
$$\sum (x_i - \bar{x})^2 = 17.5$$
$$\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y} = 25.5$$
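16. Hence
$$\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{25.5}{17.5} \approx 1.46$$
$$\beta_0 = \bar{y} - \beta_1 \bar{x} = \frac{53}{6} - 1.46\left(\frac{27}{6}\right) \approx 2.3$$
The regression line will be
$$y = 2.3 + 1.46x$$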
[Figure: scatter plot of y against x with the fitted line y = 1.4571x + 2.2762]
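The worked example with the six (X, Y) pairs can be checked numerically. The sketch below applies the deviation-from-the-mean formulas for the slope and intercept in plain Python; the function name `fit_simple_ols` and its return layout are illustrative choices, not something defined in the slides.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope, s_xx, s_xy) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope, s_xx, s_xy


x = [2, 3, 4, 5, 6, 7]
y = [7, 2, 8, 14, 12, 10]

intercept, slope, s_xx, s_xy = fit_simple_ols(x, y)
print(f"S_xx = {s_xx:.2f}, S_xy = {s_xy:.2f}")
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Carrying the unrounded slope through to the intercept reproduces the coefficients shown on the chart; the hand calculation differs slightly only because it rounds the slope to two decimals first.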
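17. • The coefficient of x ($\beta_1$) can be
expressed in other terms.
• Multiplying the numerator and denominator of $\beta_1$ by $\frac{1}{n}$ gives
$$\beta_1 = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n}\sum (x_i - \bar{x})^2}$$
$$\beta_1 = \frac{Cov(x, y)}{Var(x)}$$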
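A quick numerical check of the covariance-over-variance form, as a sketch only. It reuses the six (X, Y) pairs from the earlier worked example and uses the 1/n (population) definitions; the helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Population covariance: (1/n) * sum((x_i - x_bar) * (y_i - y_bar))."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / len(x)


def variance(x):
    """Population variance: (1/n) * sum((x_i - x_bar) ** 2)."""
    x_bar = mean(x)
    return sum((xi - x_bar) ** 2 for xi in x) / len(x)


x = [2, 3, 4, 5, 6, 7]
y = [7, 2, 8, 14, 12, 10]

slope_from_moments = covariance(x, y) / variance(x)
slope_from_sums = 25.5 / 17.5  # ratio of the two sums from the worked example

print("Cov(x, y) =", covariance(x, y))
print("Var(x)    =", variance(x))
print("Cov/Var   =", slope_from_moments)
print("S_xy/S_xx =", slope_from_sums)
```

Because the same factor divides both numerator and denominator, using 1/(n - 1) instead of 1/n would give the same slope.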
18. COEFFICIENT OF CORRELATION ($r$)
• It measures the degree of linear relationship between two
variables.
• It ranges from -1 to 1.
• A value of 1 indicates that the two variables move in
unison: they rise and fall together and have perfect
positive correlation.
• A value of -1 means that the two variables move in perfectly
opposite directions.
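$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
or
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$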
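The two expressions for r are algebraically equivalent, which a short script can confirm. This is a sketch under assumptions: the data are the six (X, Y) pairs from the regression example earlier in the slides, reused here only for illustration, and the function names are hypothetical.

```python
import math


def r_from_sums(x, y):
    """Computational form: uses raw sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def r_from_deviations(x, y):
    """Definitional form: uses deviations from the means."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [2, 3, 4, 5, 6, 7]
y = [7, 2, 8, 14, 12, 10]

r1 = r_from_sums(x, y)
r2 = r_from_deviations(x, y)
print(f"r (sums form)       = {r1:.4f}")
print(f"r (deviations form) = {r2:.4f}")
```

The sums form is convenient when only totals are available; the deviations form is usually less prone to rounding problems when the values are large.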
19. • Example: It looks as if there exists a positive linear correlation
between average interest rate and yearly investment. This
means that as the average interest rate increases, yearly
investment tends to increase as well.
20. [Figure: scatter plot of average investment (Y) against average interest (X)]
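23. The equation of the straight line is
$$y = \beta_0 + \beta_1 x$$
with slope
$$\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$
$$\beta_1 = \frac{10(222{,}569) - (149.1)(14{,}730)}{10(2{,}229.03) - (149.1)^2}$$
$$\beta_1 = \frac{29{,}447}{59.49}$$
$$\beta_1 = 494.99$$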
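24. And, writing $a$ for the intercept $\beta_0$ and $b$ for the slope $\beta_1$,
$$a = \frac{\sum_{i=1}^{10} y_i}{n} - \frac{b\sum_{i=1}^{10} x_i}{n} = \frac{14{,}730}{10} - \frac{494.99\,(149.1)}{10} = -5907.30$$
Thus,
$$y = -5907.30 + 494.99x$$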
[Figure: scatter plot of average investment (Y) against average interest (X) with the fitted line y = 494.99x - 5907.3]
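The interest and investment fit can be reproduced from summary totals alone. This is a sketch under a stated assumption: the value used for the sum of x·y, 222,569, is inferred as the total consistent with the reported slope of 494.99 and the fitted line on the chart; the other totals (n = 10, sum of x = 149.1, sum of y = 14,730, sum of x² = 2,229.03) are those used in the slope calculation.

```python
n = 10
sum_x = 149.1      # total of average interest rates
sum_y = 14_730     # total of yearly investments
sum_x2 = 2_229.03  # total of squared interest rates
sum_xy = 222_569   # ASSUMED: inferred from the reported slope of 494.99

# Computational form of the slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2).
numerator = n * sum_xy - sum_x * sum_y
denominator = n * sum_x2 - sum_x ** 2
slope = numerator / denominator

# Intercept from the means: a = y_bar - b * x_bar.
intercept = sum_y / n - slope * (sum_x / n)

print(f"numerator   = {numerator:.2f}")
print(f"denominator = {denominator:.2f}")
print(f"slope       = {slope:.2f}")
print(f"intercept   = {intercept:.2f}")
```

The intercept printed here uses the unrounded slope, so it can differ in the last decimal from a hand calculation that rounds the slope first.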
25. COEFFICIENT OF DETERMINATION ($r^2$)
• The coefficient of determination measures
how much of the variability in one variable
can be explained by its linear relationship to another
variable.
• It can be thought of as a percentage.
• Values of $r^2$ lie between 0 and 1.
• In the example above the coefficient of
determination is $r^2 = 0.8989^2 = 0.8080$. This means
that almost 81% of the variation in yearly
investments can be explained by the average
interest rate.
• An $r^2$ closer to 1 indicates a
better goodness of fit: the observations
lie close to the regression line.
26. Example: A study was undertaken at eight garages
to determine how the resale value of a car is
affected by its age. The following data were
obtained:
Garage   Age of car (in years)   Resale value (in Birr)
1        1                       41,250
2        6                       10,250
3        4                       24,310
4        2                       38,720
5        5                        8,740
6        4                       26,110
7        1                       38,650
8        2                       36,200
27. The garage manager suspects a linear
relationship between the two variables.
Fit a line of the form y = a + bx to the
data.
The equation for the regression line is
$$y = 48{,}644.17 - 6{,}596.93x$$
The correlation coefficient is
$$r = -0.9601$$
and the coefficient of determination is
$$r^2 \approx 0.922$$
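The garage example can be reproduced end to end from the eight (age, resale value) pairs. The sketch below is one way to do it in plain Python using the deviation-from-the-mean formulas; the variable names are illustrative.

```python
import math

# Age of car (years) and resale value (Birr) for the eight garages.
age = [1, 6, 4, 2, 5, 4, 1, 2]
value = [41_250, 10_250, 24_310, 38_720, 8_740, 26_110, 38_650, 36_200]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(value) / n

s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in value)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, value))

b = s_xy / s_xx        # slope: change in resale value per extra year of age
a = y_bar - b * x_bar  # intercept: fitted value at age 0
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

print(f"y = {a:.2f} + ({b:.2f})x")
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

The negative slope and the value of r close to -1 both say the same thing: within this sample, resale value falls almost linearly as the car gets older.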