Vietnam Maritime University
School of Mechanical Engineering
16/02/2024
1. Describe some of the differences between multiple regression and bi-variate regression.
2. Assess the importance of the R-squared statistic.
3. Examine the F-test and its distribution.
4. Show how we can use the F-test to determine joint significance.
In general, the regression estimates are more reliable if:
1. n is large (large dataset)
2. The sample variance of the explanatory variables is high.
3. The variance of the error term is small.
4. The explanatory variables are not closely related to each other (low multicollinearity).
The constant and slope parameters are derived in the same way as in the bi-variate model: by minimising the sum of squared errors. The formula for each slope parameter now contains an expression for the covariance between the explanatory variables.
When a new variable is added, it therefore affects the coefficients of the existing variables.
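As a minimal numerical sketch of this point (illustrative only; the data-generating process and all numbers below are invented, not taken from the slides), the OLS coefficients can be obtained from the normal equations, and adding a correlated explanatory variable visibly shifts an existing coefficient:

```python
import numpy as np

# OLS via the normal equations b = (X'X)^-1 X'y, i.e. the coefficient
# vector that minimises the sum of squared residuals.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)            # z is correlated with x
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)

def ols(X, y):
    """Return OLS coefficients for a design matrix X (first column = constant)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Bi-variate model: y on a constant and x only
b1 = ols(np.column_stack([np.ones(n), x]), y)

# Multiple regression: y on a constant, x and z
b2 = ols(np.column_stack([np.ones(n), x, z]), y)

# Because x and z are correlated, the coefficient on x changes
# when z is added to the model.
print(b1[1], b2[1])
```

The bi-variate slope on x absorbs part of z's effect; the multiple-regression slope recovers something close to the true value of 2.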
Regression

ŷ_t = 0.6 + 0.4x_t + 0.9z_t
      (0.1)  (0.4)  (0.3)

R² = 0.3, DW = 1.56, 45 observations
(standard errors in brackets)
Regression
In the previous slide, a unit rise in x produces a 0.4 unit rise in y, with z held constant.
Interpretation of the t-statistics remains the same, i.e. t = (0.4 − 0)/0.4 = 1 (critical value is 2.02), so we fail to reject the null: x is not significant.
The R-squared statistic indicates that 30% of the variance of y is explained.
The DW statistic lies in the zone of indecision (dL = 1.43, dU = 1.62), so we cannot be sure whether there is autocorrelation.
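The t-test arithmetic here can be checked directly (0.4 and 0.4 are the slide's coefficient and standard error for x; 2.02 is the quoted critical value):

```python
# t = (estimate - hypothesised value) / standard error
coef, se = 0.4, 0.4
t_stat = (coef - 0.0) / se
critical = 2.02          # 5% two-tailed critical value quoted on the slide

significant = abs(t_stat) > critical
print(t_stat, significant)   # 1.0 False -> fail to reject the null
```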
Adjusted R-squared Statistic
This statistic is used in multiple regression analysis because it does not automatically rise when an extra explanatory variable is added.
Its value depends on the number of explanatory variables.
It is usually written as R̄² (R-bar squared).
Adjusted R-squared
It generally rises when the t-statistic of an extra variable exceeds unity (1), so a rise does not necessarily imply the extra variable is significant.
It has the following formula (n = number of observations, k = number of parameters):

R̄² = R² − (1 − R²)(k − 1)/(n − k)
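As a quick numerical check (values taken from the earlier regression example: R² = 0.3, n = 45 observations, k = 3 parameters), the adjusted R-squared can be computed in two algebraically equivalent standard forms:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared (R-bar squared).
    n = number of observations, k = number of parameters."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

def adjusted_r2_alt(r2, n, k):
    """Equivalent penalty form: R^2 minus a term that grows with k."""
    return r2 - (1 - r2) * (k - 1) / (n - k)

# Earlier regression: R^2 = 0.3, n = 45, k = 3
print(round(adjusted_r2(0.3, 45, 3), 4))   # 0.2667
```

Note the adjustment pulls the statistic below the raw R² of 0.3, reflecting the two slope parameters used.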
The F-test
The F-test is an analysis of the variance of a regression.
It can be used to test for the significance of a group of variables or for a restriction.
It has a different distribution to the t-test, but can be used to test at different levels of significance.
When determining the F-statistic we need to collect either the residual sum of squares (RSS) or the R-squared statistic.
The formula for the F-test of a group of variables can be expressed in terms of either the residual sum of squares (RSS) or the explained sum of squares (ESS).
F-test of explanatory power
This is the F-test for the goodness of fit of a regression; in effect it tests for the joint significance of the explanatory variables.
It is based on the R-squared statistic.
It is routinely produced by most computer software packages.
It follows the F-distribution, which is quite different to the t-distribution.
F-test formula
The formula for the F-test of the goodness of fit is:

F(k − 1, n − k) = [R²/(k − 1)] / [(1 − R²)/(n − k)]
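The goodness-of-fit F-statistic can be computed straight from R². A small sketch using the earlier regression's values (R² = 0.3, n = 45, k = 3):

```python
def f_goodness_of_fit(r2, n, k):
    """F-statistic for the joint significance of the explanatory
    variables; distributed as F(k - 1, n - k)."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Earlier regression: R^2 = 0.3, n = 45, k = 3
print(f_goodness_of_fit(0.3, 45, 3))   # approx. 9.0, to be compared
                                       # with the F(2, 42) critical value
```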
F-distribution
To find the critical value of the F-distribution, in general you need to know the number of parameters and the degrees of freedom.
The number of parameters is read across the top of the table, the degrees of freedom from the side. Where these two values intersect, we find the critical value.
F-distribution
Both degrees of freedom go up to infinity.
If we wanted to find the critical value for F(3, 4), it would be 6.6.
The first value (3) is often termed the numerator, whilst the second (4) the denominator.
It is often written as: F(3, 4)
F-statistic
When testing the significance of the goodness of fit, our null hypothesis is that the coefficients on the explanatory variables are jointly equal to 0.
If our F-statistic is below the critical value, we fail to reject the null and therefore say the goodness of fit is not significant.
Joint Significance
The F-test is useful for testing a number of hypotheses and is often used to test for the joint significance of a group of variables.
In this type of test, we often refer to 'testing a restriction'.
The restriction is that the coefficients on a group of explanatory variables are jointly equal to 0.
F-test for joint significance
The formula for this test can be viewed as:

F = (Improvement in fit / Extra degrees of freedom used up)
    ÷ (Residual sum of squares remaining / Degrees of freedom remaining)
F-tests
The test for joint significance has its own formula, which takes the following form:

F = [(RSS_R − RSS_U)/m] / [RSS_U/(n − k)]

where:
RSS_R = RSS of the restricted model
RSS_U = RSS of the unrestricted model
m = number of restrictions
k = number of parameters in the unrestricted model
Joint Significance of a group of variables
To carry out this test you need to run two separate OLS regressions: one with all the explanatory variables included (the unrestricted equation), and one with the variables whose joint significance is being tested removed (the restricted equation).
Then collect the RSS from both equations.
Put the values into the formula.
Find the critical value and compare it with the test statistic. The null hypothesis is that the coefficients on the variables are jointly equal to 0.
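These steps can be sketched in Python (a minimal illustration with invented data; the variable names w, x, z mirror the slides' example, but the coefficients and sample are made up):

```python
import numpy as np

# Run the unrestricted and restricted OLS regressions, collect each
# RSS, then form F = [(RSS_R - RSS_U)/m] / [RSS_U/(n - k)].
rng = np.random.default_rng(1)
n = 60
w, x, z = rng.normal(size=(3, n))
y = 1.0 + 0.8 * w + 0.8 * x + 0.8 * z + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return resid @ resid

const = np.ones(n)
rss_u = rss(np.column_stack([const, w, x, z]), y)   # unrestricted
rss_r = rss(np.column_stack([const, w]), y)         # x and z removed

m, k = 2, 4    # 2 restrictions; 4 parameters in the unrestricted model
F = ((rss_r - rss_u) / m) / (rss_u / (n - k))
print(F)       # compare with the F(2, 56) critical value
```

Removing variables can never lower the RSS, so the numerator is non-negative; the F-statistic asks whether the rise in RSS is large relative to the unrestricted fit.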
Joint Significance
If we have a 3 explanatory variable model and wish to test for the joint significance of 2 of the variables (x and z), we need to run the following unrestricted and restricted models:

Unrestricted: y_t = β₀ + β₁w_t + β₂x_t + β₃z_t + u_t
Restricted:   y_t = β₀ + β₁w_t + u_t
Given the following models, we wish to test the joint significance of w and z. Having estimated them, we collect their respective RSSs (n = 60):

Unrestricted: y_t = β₀ + β₁x_t + β₂w_t + β₃z_t + u_t,   RSS = 0.75
Restricted:   y_t = β₀ + β₁x_t + v_t,   RSS = 1.5
where:
β₀ is the constant, β₁ ... β₃ are slope parameters, u_t and v_t are error terms, and x_t, w_t, z_t are explanatory variables.
Joint significance
Having obtained the RSSs, we need to input the values into the earlier formula (slide 18):

F = [(1.5 − 0.75)/2] / [0.75/(60 − 4)] = 0.375/0.0134 ≈ 28 ~ F(2, 56)

Critical value: 3.15
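Plugging the slides' numbers (RSS_R = 1.5, RSS_U = 0.75, m = 2 restrictions, n = 60, k = 4 parameters) into the joint-significance formula:

```python
# F = [(RSS_R - RSS_U)/m] / [RSS_U/(n - k)] with the slides' values
rss_r, rss_u = 1.5, 0.75
n, m, k = 60, 2, 4

F = ((rss_r - rss_u) / m) / (rss_u / (n - k))
print(round(F, 2))   # 28.0, well above the critical value of 3.15
```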
Joint significance

H₀: β₂ = β₃ = 0
H₁: β₂ ≠ 0 and/or β₃ ≠ 0

As the F-statistic is greater than the critical value (28 > 3.15), we reject the null hypothesis and conclude that the variables w and z are jointly significant and should remain in the model.
1. Multiple regression analysis is similar to bi-variate analysis; however, correlation between the x variables needs to be taken into account.
2. The adjusted R-squared statistic tends to be used in this case.
3. The F-test is used to test the joint explanatory power of the whole regression or of a subset of the variables.
4. We often use the F-test when testing for things like seasonal effects in the data.