This document provides an overview of single linear regression. It explains that single linear regression extends the concept of correlation by using one variable to predict the value of another variable. It discusses using scatter plots to visualize the relationship between two variables and determine if the relationship is strong or weak, and whether it is positive or negative. Examples are provided to illustrate single linear regression concepts and how to interpret different types of relationships between variables.
2. • Welcome to this explanation of Single Linear Regression.
3. • Single linear regression is an extension of correlation.
4. • Correlation extends to Single Linear Regression.
5. • Correlation is designed to render a single coefficient that represents the degree of coherence between two variables.
8. • As one variable increases, the other increases: +.99
9. • This coefficient represents an almost perfect positive correlation or relationship between these two variables.
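The coefficient the slides describe is Pearson's product-moment correlation. As a minimal sketch (the `xs`/`ys` pairs here are made-up illustration data, not from the slides), it can be computed from scratch:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: co-deviation of the pairs, scaled by both spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# As one variable increases, the other increases -> r close to +1
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 21, 29, 42, 50, 61]
r = pearson_r(xs, ys)   # close to +.99
```

When the pairing reverses perfectly (one goes up exactly as the other goes down), r comes out as -1.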
10. • The same single coefficient can also represent a negative relationship between two variables.

    [Chart: Ave Daily Temp, with axis values 50° to 90°]

11. • As one variable decreases, the other increases: -.99
13. • This is almost a perfect negative correlation or relationship between these two variables.
14. • Single linear regression uses that information to predict the value of one variable based on the given value of the other variable.
16. • For example:
17. • If the following data set were real, what would you predict ice cream sales would be when the temperature reaches 100°?

    Ave Daily Temp:              100°   90°   80°   70°   60°   50°
    Ave Daily Ice Cream Sales:     ?   560   480   350   320   230

19. • Single linear regression uses that information to predict the value of one variable (ice cream) based on the given value of the other variable (temperature).
21. • Rather than simply examining the relationship between the variables (as is the case with the Pearson Product Moment Correlation), one variable will be used as the predictor (temperature) and the other will be used as the outcome, or predicted, variable (ice cream sales).
22. • Linear Regression makes it possible to estimate a value like 630 for the missing sales figure.
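An estimate like 630 comes from fitting a least-squares line to the five known (temperature, sales) pairs and evaluating it at 100°. A sketch, using the slide's rounded figures:

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

temps = [90, 80, 70, 60, 50]         # Ave Daily Temp (degrees)
sales = [560, 480, 350, 320, 230]    # Ave Daily Ice Cream Sales
intercept, slope = fit_line(temps, sales)

predicted = intercept + slope * 100  # 634, close to the slide's rough estimate of 630
```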
23. • In some cases, which variable is considered predictor or outcome is arbitrary.
24. • Like measures of depression and anxiety:

    Composite Depression Score:   33    26    22    14    12     6
    Composite Anxiety Score:     103   100    92    74    52    26

26. • It’s not clear which influences which. Most likely depression and anxiety mutually influence one another.
27. • In some cases, either by theory or by the nature of the research design, one variable will be rationally defined as the predictor and the other as the outcome.

    Ave Daily Exposure to Sunlight:          3.3 hrs   2.6 hrs   2.2 hrs   1.4 hrs   1.2 hrs   0.6 hrs
    Levels of Vitamin E after two months:   10.3       8.1       7.3       7.0       6.8       5.7  (units)

30. • In this example, exposure to sunlight may impact levels of Vitamin E. But levels of Vitamin E would not impact the amount of sunlight one gets.
31. • An easy way to conceptualize single linear regression is to create a scatterplot in Cartesian space.
32. • Let’s plot the following data set:

    Composite Depression Score:   33    26    22    14    12     6
    Composite Anxiety Score:     103   100    92    74    52    26

34. • First, we assign the predictor variable along the X axis, which in this case we’ll arbitrarily say is depression.
36. • ... and the outcome variable along the Y axis, which we’ll arbitrarily say is anxiety.

    [Chart: “Relationship between Depression & Anxiety”, with Depression on the X axis (0 to 40) and Anxiety on the Y axis (0 to 120)]
39. • Now, let’s identify or plot each point or dot:

    Depression:   33    26    22    14    12     6
    Anxiety:     103   100    92    74    52    26

    [Chart: the points plotted one at a time at (33, 103), (26, 100), (22, 92), (14, 74), (12, 52), and (6, 26)]
52. • Visually, one can see in the plotted space whether there is a tendency for the variables to be related and in what direction they are related.

    [Chart: the scatterplot of Depression & Anxiety]

54. • In this case there is a strong tendency to relate, and the relationship is positive.
55. • With this data set the tendency for the variables to relate is strong and the direction is negative:

    Depression:    6    12    14    22    26    33
    Anxiety:     103   100    92    74    52    26

    [Chart: scatterplot labeled “Strong and Negative”]
59. • When no relationship exists, the scatter plot tends to look like a big circle.

    Depression:   22     6    33    26    14    12
    Anxiety:     103   100    92    74    52    26

    [Chart: near-circular scatterplot labeled “Weak and Positive”]

66. • Pairing the same values differently gives a weak relationship in the other direction:

    Depression:    6    14    33    26    12    22
    Anxiety:     103   100    74    92    52    26

    [Chart: near-circular scatterplot labeled “Weak and Negative”]
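The “strong” and “weak” labels on these scatterplots can be checked numerically with Pearson's r; a self-contained sketch, using the data sets from the slides above:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))

anxiety = [103, 100, 92, 74, 52, 26]

# Strong and Negative: depression ordered low-to-high against anxiety high-to-low
r_strong_neg = pearson_r([6, 12, 14, 22, 26, 33], anxiety)

# Weak and Positive / Weak and Negative: the same values paired differently
r_weak_pos = pearson_r([22, 6, 33, 26, 14, 12], anxiety)
r_weak_neg = pearson_r([6, 14, 33, 26, 12, 22], [103, 100, 74, 92, 52, 26])
```

The strong pairing gives an r near -1, while both weak pairings give an r much closer to 0.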
69. • You might have noticed that as the variables are related either positively or negatively, the plot looks more like an oval tilted one way or the other.

    [Charts: the “Weak and Negative” and “Weak and Positive” scatterplots side by side]
72-77. • As mentioned before, Linear Regression is used to predict
one variable (ice cream sales) from another related variable
(temperature).
• The stronger the relationship (e.g., +.99 or -.99) the more
accurate the prediction.
• The weaker the relationship (e.g., +.14 or -.03) the less
accurate the prediction.
• One of the ways to represent those relationships is of
course with the coefficients (e.g., +.99, +.14, -.03, -.99).
• Another way to represent it is by graphing the relationship.
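The coefficients mentioned above are correlation coefficients. As a minimal Python sketch (illustrative, not from the slides), Pearson's r can be computed from deviations about the means:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A perfectly linear relationship gives r = +1.0
assert abs(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]) - 1.0) < 1e-9
```

A value near +1 or -1 means predictions from the regression line will be accurate; a value near 0 means they will not.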
78-80. • Recall that a line in Cartesian space is defined by its
slope and its Y intercept (the value of Y when X
equals 0).
[Y = intercept + (slope ∙ X)]
[Line plot on a 0–6 by 0–6 grid]
81-84. • In this case the slope would be 1. You may
remember that this value is derived by taking what
is called the “rise” over the “run”.
[Line plot on a 0–6 by 0–6 grid, with rise = 1 and run = 1 marked]
• So the equation for this line so far would look like this:
y = 0 + (1/1)x
86-87. [Line plot on a 0–6 by 0–6 grid, with rise = 1 and run = 1 marked]
y = 0 + (1/1)x
• The 0 is where the line crosses the Y axis.
• The 1/1 is the slope, which is the rise over the run.
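The intercept-plus-slope form above can be written directly as code. A minimal Python sketch (illustrative, not from the slides):

```python
def line_y(intercept, slope, x):
    """A line in Cartesian space: y = intercept + slope * x."""
    return intercept + slope * x

rise, run = 1, 1
slope = rise / run            # slope of 1, as in the figure
assert line_y(0, slope, 5) == 5   # the line passes through (5, 5)
```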
88-89. • A line represents the functional relationship
between variable X and variable Y; therefore, that
line can be used to predict a Y value from any given
X value.

Month   Ave Monthly Temperature   Ave Monthly Ice Cream Sales
Feb     50                        239
Mar     60                        320
Apr     70                        400
May     80                        480
Jun     90                        560
90-91. • In this case the two variables (temperature and ice
cream sales) have a perfect linear relationship. This
is rarely ever seen among variables such as these in
the real world, but for illustrative purposes we have
created a perfect relationship.
[Scatter plot: Ave Monthly Temperature (x-axis, 0–120) vs. Average Monthly Ice Cream Sales (y-axis, 0–700)]
92-97. • Now let’s say we have data for the average temperature
during the month of July, but we don’t have the data for
the average ice cream sales for July.

Month   Ave Monthly Temperature   Ave Monthly Ice Cream Sales
Feb     50                        239
Mar     60                        320
Apr     70                        400
May     80                        480
Jun     90                        560
Jul     100                       ?

• Using single linear regression we can predict the average ice
cream sales for July. Here is the formula we will use for the
prediction:
ŷ = y-intercept + slope(x)
• There are many ways to write this equation. Here is one
way:
ŷ = b + m(x)
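The prediction formula above is one line of code. A minimal Python sketch using the deck's ice-cream numbers (intercept b = -162, slope m = 8, July temperature x = 100):

```python
def predict(b, m, x):
    """Single linear regression prediction: y-hat = b + m * x."""
    return b + m * x

# The deck's ice-cream line: b = -162, m = 8, July temperature = 100
assert predict(-162, 8, 100) == 638
```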
98-100. • Using this data set we can create a formula for a straight line
that represents that relationship:

Month   Ave Monthly Temperature   Ave Monthly Ice Cream Sales
Feb     50                        239
Mar     60                        320
Apr     70                        400
May     80                        480
Jun     90                        560

y = -162 + 8(x)
[Scatter plot with the fitted line: Ave Monthly Temperature vs. Average Monthly Ice Cream Sales]
101-107. • With this equation we can now plug in the average
temperature for July (100) and see what the predicted
average ice cream sales would be:

Month   Ave Monthly Temperature   Ave Monthly Ice Cream Sales
Feb     50                        239
Mar     60                        320
Apr     70                        400
May     80                        480
Jun     90                        560
Jul     100                       ŷ

ŷ = -162 + 8(100)
ŷ = -162 + 800
ŷ = 638
[Scatter plot with the July prediction marked on the fitted line]
108-110. • So, based on our single linear regression analysis, we would
predict that in the month of July the average monthly
ice cream sales will be 638.

Month   Ave Monthly Temperature   Ave Monthly Ice Cream Sales
Feb     50                        239
Mar     60                        320
Apr     70                        400
May     80                        480
Jun     90                        560
Jul     100                       638

• This is a simple demonstration of how regression works.
• In reality, however, most variables will not correlate so
perfectly like this did:
113-115. • Most will look like this:
[Scatter plot with the best fitting line through scattered points]
• This line is called the best fitting line because it minimizes
the distance between the line and all of the points. You will
notice again that we have a linear equation for that line:
y = -50.93 + 7.21(x)
• This equation is calculated by using the standard
deviations and means of the two variables. For brevity’s
sake we will not go into this here.
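The calculation the slide skips can be sketched briefly. A minimal Python version of the least-squares fit (from means and deviations), checked against the deck's twelve-month data; the monthly temperatures here are read off the deck's later tables:

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b + m*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - m * mx
    return b, m

# The deck's data: monthly temperature (X) and ice cream sales (Y), Jan-Dec
temps = [40, 50, 60, 70, 80, 90, 100, 90, 80, 60, 40, 20]
sales = [300, 320, 370, 480, 560, 640, 720, 600, 400, 300, 200, 122]
b, m = fit_line(temps, sales)
assert abs(m - 7.21) < 0.01 and abs(b - (-50.93)) < 0.01
```

This reproduces the slide's equation y = -50.93 + 7.21(x) to two decimal places.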
116-119. • Given the infinite number of lines that could be fit through a
scatterplot, the one that comes closest to representing the
functional relationship between X and Y is the line that
results in the least cumulative squared error between the
predicted values of Y and the true observed values of Y for
each given X.
[Scatter plot: the line is the predicted values of Y calculated
from the equation ŷ = b + mx; the dots represent the actual data]
120. • We don’t have to actually plot the coordinates and lines. We
can operate solely on the equations to generate predicted
values and errors in prediction. In this way we can
determine if temperature is a statistically significant
predictor of ice cream sales.
121-126. • So here are the actual data from which we plotted:

Month   (X) Ave Monthly Temp   (y) Actual Ave Monthly Ice Cream Sales
Jan     40                     300
Feb     50                     320
Mar     60                     370
Apr     70                     480
May     80                     560
Jun     90                     640
Jul     100                    720
Aug     90                     600
Sep     80                     400
Oct     60                     300
Nov     40                     200
Dec     20                     122

• We can now plot the predicted Y using the equation:
ŷ = -50.93 + 7.21(x)
• Which is the equation for the best fitting line
between these two variables:
127-132. • We can now plot the predicted Y using the equation:
ŷ = -50.93 + 7.21(x)

Month   (X) Temp   (y) Actual Sales   ŷ = -50.93 + 7.21(x)          (ŷ) Predicted Sales
Jan     40         300                ŷ = -50.93 + 7.21(40)  =      237.47
Feb     50         320                ŷ = -50.93 + 7.21(50)  =      309.57
Mar     60         370                ŷ = -50.93 + 7.21(60)  =      381.67
Apr     70         480                ŷ = -50.93 + 7.21(70)  =      453.77
May     80         560                ŷ = -50.93 + 7.21(80)  =      525.87
Jun     90         640                ŷ = -50.93 + 7.21(90)  =      597.97
Jul     100        720                ŷ = -50.93 + 7.21(100) =      670.07
Aug     90         600                ŷ = -50.93 + 7.21(90)  =      597.97
Sep     80         400                ŷ = -50.93 + 7.21(80)  =      525.87
Oct     60         300                ŷ = -50.93 + 7.21(60)  =      381.67
Nov     40         200                ŷ = -50.93 + 7.21(40)  =      237.47
Dec     20         122                ŷ = -50.93 + 7.21(20)  =      93.27

• With this information we can now determine if x (temperature) is a
statistically significant predictor of y (ice cream sales).
133-136. • To begin we need to determine the total sum of squares, just
like we would do with analysis of variance.
• This is done by subtracting the average or mean ice cream
sales for the whole year from each actual “Y” (ice cream
sales) value.
• The mean is calculated by adding up the values and dividing
by how many there are.
• (300+320+370+480+560+640+720+600+400+300+200+122)
/ 12 = 417 average ice cream sales
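The mean calculation above, as a one-line Python check (the slides truncate 5012 / 12 ≈ 417.67 to 417):

```python
# Monthly ice cream sales from the slides, Jan through Dec
sales = [300, 320, 370, 480, 560, 640, 720, 600, 400, 300, 200, 122]
mean_sales = sum(sales) / len(sales)   # 5012 / 12 = 417.67, shown as 417 on the slides
assert int(mean_sales) == 417
```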
137-139. • We then subtract the mean from each y value:

(y) Actual Ave Monthly          Mean        Difference
Ice Cream Sales
300                        -    417    =    -117
320                        -    417    =    -97
370                        -    417    =    -47
480                        -    417    =    63
560                        -    417    =    143
640                        -    417    =    223
720                        -    417    =    303
600                        -    417    =    183
400                        -    417    =    -17
300                        -    417    =    -117
200                        -    417    =    -217
122                        -    417    =    -295

• Note - if we did not know the functional relationship
between X and Y, our best prediction of any one person’s Y
value would be the mean of Y.
140-143. • Because we are calculating the total sum of squares we will
need to square the results and then sum them up. (The
average of these squared differences is the variance of all
of the scores.)

Difference   Squared
-117         13689
-97          9409
-47          2209
63           3969
143          20449
223          49729
303          91809
183          33489
-17          289
-117         13689
-217         47089
-295         87025
             SUM = 372,844
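The squaring-and-summing step above, as a minimal Python check against the slides' figure:

```python
# Total sum of squares: squared deviations of each sale from the mean
sales = [300, 320, 370, 480, 560, 640, 720, 600, 400, 300, 200, 122]
mean = 417   # the slides' rounded mean
ss_total = sum((y - mean) ** 2 for y in sales)
assert ss_total == 372844   # matches the slides' total sum of squares
```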
144-149. • Now we find regression (good) and residual (bad). To have
better prediction power we want the regression sums of
squares to be large and the residual or error sums of squares
to be small.
• Let’s see if the residual or the regression is greater.
• We know that the total sums of squares is 372,844.
• Now we will calculate the residual (error) and the
regression sums of squares, which will add up to 372,844.

                   Sum of Squares   df   Mean Square   F-ratio   Significance
Regression         ?
Residual (error)   ?
Total              372,844
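The partition the table promises can be verified numerically. A Python sketch (assumptions: the deck's monthly data, and an exact least-squares fit rather than the rounded coefficients, so the identity holds exactly):

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b + m*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - m * mx, m

temps = [40, 50, 60, 70, 80, 90, 100, 90, 80, 60, 40, 20]
sales = [300, 320, 370, 480, 560, 640, 720, 600, 400, 300, 200, 122]
b, m = fit_line(temps, sales)
mean_y = sum(sales) / len(sales)
preds = [b + m * x for x in temps]

ss_total = sum((y - mean_y) ** 2 for y in sales)
ss_reg = sum((p - mean_y) ** 2 for p in preds)             # explained by the line
ss_res = sum((y - p) ** 2 for y, p in zip(sales, preds))   # error

# Regression and residual partition the total sum of squares
assert abs((ss_reg + ss_res) - ss_total) < 1e-6
assert ss_reg > ss_res   # regression dominates: good predictive power

# F-ratio as in the ANOVA-style table: df = 1 and n - 2
f_ratio = (ss_reg / 1) / (ss_res / (len(sales) - 2))
assert f_ratio > 10      # far above typical critical values
```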
150-156. • Before we calculate residual and regression, let’s see
visually how we calculated the total sums of squares -
372,844.
• Once again we subtract the mean of the actual Y values
from each actual Y value:

(y) Actual Ave Monthly          Mean
Ice Cream Sales
300                        -    417
320                        -    417
370                        -    417
480                        -    417
560                        -    417
640                        -    417
720                        -    417
600                        -    417
400                        -    417
300                        -    417
200                        -    417
122                        -    417

[Scatter plot: Ave Monthly Temperature vs. Ice Cream Sales, with the mean (417) marked]
157-158. • The first column holds the actual Y values. We subtract from
each of them the mean (417), which would be our best prediction
if we did not know the relationship between X (temperature)
and Y (ice cream sales).
[Scatter plot of the actual Y values with the mean (417) marked]
159-162. • Here is the graphic depiction of our subtracting each data
point from the mean (417):
[Scatter plot: the December point (122) minus the mean: 122 - 417 = -295]

(y) Actual Ave Monthly          Mean        Difference
Ice Cream Sales
300                        -    417    =    -117
320                        -    417    =    -97
370                        -    417    =    -47
480                        -    417    =    63
560                        -    417    =    143
640                        -    417    =    223
720                        -    417    =    303
600                        -    417    =    183
400                        -    417    =    -17
300                        -    417    =    -117
200                        -    417    =    -217
122                        -    417    =    -295
163-164. • Similarly for the November data point:
[Scatter plot: 200 - 417 = -217]
165-166. • And for the July data point:
[Scatter plot: 720 - 417 = +303]
167-168. • Now we have the difference between the actual values for
Y (ice cream sales) and the mean of the values for Y (417).
[Scatter plot of all twelve deviations from the mean]
169-172. • As we showed previously, we have to square these differences,
because if we don’t, when we sum them they will come to zero.

Difference   Squared
-117         13689
-97          9409
-47          2209
63           3969
143          20449
223          49729
303          91809
183          33489
-17          289
-117         13689
-217         47089
-295         87025
SUM = 0      SUM = 372,844

• We are doing all this once again to show a
visual depiction of what the total sums of
squares are:

              Sum of Squares   df   Mean Square   F-ratio   Significance
Total         372,844
173-175. • Now that we’ve seen a visual depiction of how we
calculated total sums of squares, we compare the sums of
squares that are associated with error (residual) and those
associated with regression.

              Sum of Squares   df   Mean Square   F-ratio   Significance
Regression
Residual
Total         372,844

• Let’s calculate the error or residual sums of squares now.
176-179. • The error or residual sums of squares are
computed by subtracting each predicted Y value from
each actual Y value.
• Here are the actual Y values:
[Scatter plot: the actual Y values, i.e. average ice cream sales, against temperature]

(y) Actual Ave Monthly Ice Cream Sales
300
320
370
480
560
640
720
600
400
300
200
122
180. • Here are the predicted values using the linear regression formula, where x is the month’s temperature:

(y) Actual Sales     ŷ = -50.93 + 7.21(x)        (ŷ) Predicted Sales
300                  -50.93 + 7.21(40)  =        237.47
320                  -50.93 + 7.21(50)  =        309.57
370                  -50.93 + 7.21(60)  =        381.67
480                  -50.93 + 7.21(70)  =        453.77
560                  -50.93 + 7.21(80)  =        525.87
640                  -50.93 + 7.21(90)  =        597.97
720                  -50.93 + 7.21(100) =        670.07
600                  -50.93 + 7.21(90)  =        597.97
400                  -50.93 + 7.21(80)  =        525.87
300                  -50.93 + 7.21(60)  =        381.67
200                  -50.93 + 7.21(40)  =        237.47
122                  -50.93 + 7.21(20)  =        93.27
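The predictions above can be reproduced in a few lines; a sketch, assuming the regression equation from the slides (ŷ = -50.93 + 7.21x) and temperatures inferred from the worked values, since the temperature column itself is not printed on the slide:

```python
# Fitted equation from the slides: y_hat = -50.93 + 7.21 * x,
# where x is the month's temperature.
# These temperatures are inferred from the worked results
# (e.g. -50.93 + 7.21 * 40 = 237.47); they are not listed on the slide.
temps = [40, 50, 60, 70, 80, 90, 100, 90, 80, 60, 40, 20]

predicted = [round(-50.93 + 7.21 * x, 2) for x in temps]
print(predicted)
# [237.47, 309.57, 381.67, 453.77, 525.87, 597.97, 670.07,
#  597.97, 525.87, 381.67, 237.47, 93.27]
```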
184. • From these points and the linear regression formula, a line can be drawn

[Figure: scatterplot with the regression line drawn through the predicted values; temperature (x-axis) against average ice cream sales (y-axis)]
187. • The difference between each actual value (orange) and the predicted value (green line) is what is called error or residual. The closer these two values are to each other, the smaller the error; the farther apart they are, the larger the error and the weaker the predictive power of the regression line.

[Figure: scatterplot with vertical "Difference" markers between the orange actual points and the green regression line]

189. • Let’s subtract the green-line predicted values from the orange actual values:
196.          Sum of Squares   df   Mean Square   F-ratio   Significance
Regression
Residual      35,014
Total         372,844

197. • We will now calculate the regression sums of squares.
• Our hope is that this value will be much bigger than the residual (35,014).
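The residual sums of squares just computed can be checked directly; a sketch, assuming the actual and predicted values tabulated earlier:

```python
# Residual (error) sum of squares: sum of (actual - predicted)^2.
actual    = [300, 320, 370, 480, 560, 640, 720, 600, 400, 300, 200, 122]
predicted = [237.47, 309.57, 381.67, 453.77, 525.87, 597.97, 670.07,
             597.97, 525.87, 381.67, 237.47, 93.27]

residual_ss = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
print(round(residual_ss))  # 35014
```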
199. • The regression sums of squares is calculated by subtracting the mean from each predicted value.
• Let’s see what this looks like visually. The green line is the predicted values for Y, or the regression line.

[Figure: scatterplot with the green regression line; the blue horizontal line is the mean (417), which is the best predictor absent anything else]
204. • You can probably already tell that it will be bigger, because a simple way to calculate it is to subtract the residual (35,014) from the total (372,844).
• However, we will calculate it the long way so you can see what is happening.
206. • We subtract the mean of the actual Y values from each predicted value

[Figure: scatterplot with "Difference" markers between the green regression line and the blue mean line]

(ŷ) Predicted Ave       Mean Monthly
Monthly Sales       -   Ice Cream Sales   =   Difference
237.47              -   417.7             =   -180.2
309.57              -   417.7             =   -108.1
381.67              -   417.7             =   -36.0
453.77              -   417.7             =   36.1
525.87              -   417.7             =   108.2
597.97              -   417.7             =   180.3
670.07              -   417.7             =   252.4
597.97              -   417.7             =   180.3
525.87              -   417.7             =   108.2
381.67              -   417.7             =   -36.0
237.47              -   417.7             =   -180.2
93.27               -   417.7             =   -324.4

• For example: 93 - 417 ≈ -324, and 670 - 417 ≈ +252.
210. • Then we square the differences (or deviations) and sum them up

(ŷ) Predicted   Difference   Squared
237.47          -180.2       32,470.8
309.57          -108.1       11,684.9
381.67          -36.0        1,295.76
453.77          36.1         1,303.45
525.87          108.2        11,708
597.97          180.3        32,509.3
670.07          252.4        63,707.4
597.97          180.3        32,509.3
525.87          108.2        11,708
381.67          -36.0        1,295.76
237.47          -180.2       32,470.8
93.27           -324.4       105,233

Sum = 337,830

213.          Sum of Squares   df   Mean Square   F-ratio   Significance
Regression    337,830
Residual      35,014
Total         372,844
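The regression sums of squares, and the partition SS total = SS regression + SS residual, can be checked the same way; a sketch, assuming the predicted values above. Note that the slide figures carry rounding (the mean and the total were computed from rounded values), so the direct sum lands near, not exactly on, the tabled 337,830:

```python
actual    = [300, 320, 370, 480, 560, 640, 720, 600, 400, 300, 200, 122]
predicted = [237.47, 309.57, 381.67, 453.77, 525.87, 597.97, 670.07,
             597.97, 525.87, 381.67, 237.47, 93.27]
mean = sum(actual) / len(actual)  # about 417.7

# Regression SS: squared distances of the regression line from the mean.
regression_ss = sum((y_hat - mean) ** 2 for y_hat in predicted)
# Residual SS: squared distances of the actual points from the line.
residual_ss = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))

# The slides report 337,830 (= 372,844 - 35,014); the direct sum differs
# slightly because the slide values are rounded along the way.
print(round(regression_ss))                # close to 337,830
print(round(regression_ss + residual_ss))  # close to the total, 372,844
```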
214. • Now we have all of the information to test for
significance
215. • Now we have all of the information to test for
significance
Sum of
Squares
df Mean Square F-ratio Significance
Regression 337,830
Residual 35,014
Total 372,844
216. • The degrees of freedom (df) for the regression are the number of parameters being estimated (here, the Y intercept and the slope) minus 1:
• 2 parameters - 1 = 1

              Sum of Squares   df   Mean Square   F-ratio   Significance
Regression    337,830          1
Residual      35,014
Total         372,844
218. • The degrees of freedom for the residual are the number of cases (12) minus the number of parameters (2):
• 12 months - 2 parameters (slope / Y intercept) = 10

              Sum of Squares   df   Mean Square   F-ratio   Significance
Regression    337,830          1
Residual      35,014           10
Total         372,844
221. • We now have the information we need to calculate the Mean Square values. They are calculated by dividing the sums of squares by the degrees of freedom.

              Sum of Squares   df   Mean Square             F-ratio   Significance
Regression    337,830          1    337,830 / 1 = 337,830
Residual      35,014           10   35,014 / 10 = 3,501
Total         372,844
223. • The F-ratio is computed by dividing the Regression Mean Square by the Residual Mean Square:
• 337,830 / 3,501 = 96.5

              Sum of Squares   df   Mean Square   F-ratio   Significance
Regression    337,830          1    337,830       96.5
Residual      35,014           10   3,501
Total         372,844
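The mean squares and F-ratio reduce to a few divisions; a sketch using the sums of squares and degrees of freedom from the table:

```python
regression_ss, regression_df = 337_830, 1   # df = 2 parameters - 1
residual_ss, residual_df     = 35_014, 10   # df = 12 cases - 2 parameters

# Mean square = sum of squares divided by its degrees of freedom.
ms_regression = regression_ss / regression_df   # 337,830
ms_residual   = residual_ss / residual_df       # 3,501.4

# F-ratio = regression mean square over residual mean square.
f_ratio = ms_regression / ms_residual
print(round(f_ratio, 1))  # 96.5
```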
226. • With this information we can turn to the F-distribution table to determine the significance value.

              Sum of Squares   df   Mean Square   F-ratio   Significance
Regression    337,830          1    337,830       96.5
Residual      35,014           10   3,501
Total         372,844

229. • The regression degrees of freedom (1) is represented by the columns of the F table.
232. • The residual degrees of freedom (10) is represented by the rows of the F table.
235. • Put them together and we find the critical F value at the .05 alpha level to be 4.96.
237. • Because the F-ratio (96.5) exceeds the critical F value (4.96), we reject the null hypothesis and conclude that temperature is a statistically significant predictor of ice cream sales.
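The decision rule amounts to a single comparison; a sketch, using the critical value 4.96 read from the F table (df = 1 and 10, alpha = .05) as given on the slides:

```python
f_ratio = 96.5      # observed F from the ANOVA table
f_critical = 4.96   # F table value for df = (1, 10) at alpha = .05

# Reject the null hypothesis when the observed F exceeds the critical F.
decision = "reject H0" if f_ratio > f_critical else "fail to reject H0"
print(decision)  # reject H0
```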
239. In Summary
• The whole point of this demonstration was to:
(1) explain that linear regression is used to predict the value of one variable (ice cream sales) based on another variable (temperature);
(2) show that the total variance in Y can be partitioned into regression (prediction power) and residual (error); and
(3) show how this can be used to test whether the prediction is better than chance.