1. Advanced econometrics and Stata
L5-6 (1) Hypothesis testing, multi-regression
Dr. Chunxia Jiang
Business School, University of Aberdeen, UK
Beijing, 17-26 Nov 2019
2. Topics and schedule
Sessions plan
Evening —
L1-2 Introduction to Econometrics and Stata
Evening —
L3-4 Data, single regression
Morning —
L5-6 Hypothesis testing, Multi-regression , Violation of assumptions
Afternoon Exercises and practice
Morning —
L7-8 Time series models
Evening —
L9-10 Panel data models & Endogeneity
Morning Exercises and practice
Afternoon L11-12 Frontier1 SFA
Evening L13-14: Frontier2 DEA
Evening L15-16 DID
Morning Revision
Afternoon Exam
3. Basic data analysis: Summary statistics
One variable:
Mean or average value
Minimum and Maximum value
Mode & Median
Variance and standard deviation
Two variables:
Covariance
Correlation
Cross-plot (or scattergram or scatter plot).
Single regression/multivariate regression analysis
How do we tell that OLS is a good estimator of the PRF?
Assumptions
R-square
Review: Data and simple regression
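A minimal sketch of the summary statistics listed above, using only the Python standard library. The data are the four country rows shown in the computer-demand example later in these notes; they are used here purely for illustration, and the helper names `covariance` and `correlation` are assumptions of this sketch.

```python
from statistics import mean, median, stdev, variance

# Four illustrative rows: PCs per 100 persons and per capita income ($).
pcs = [8.2, 2.76, 48.7, 0.72]
income = [11410, 4980, 30040, 2880]


def covariance(x, y):
    """Sample covariance with the usual n - 1 divisor."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / (stdev(x) * stdev(y))


# One variable at a time.
print("mean:", mean(pcs), "median:", median(pcs))
print("min:", min(pcs), "max:", max(pcs))
print("variance:", variance(pcs), "std dev:", stdev(pcs))

# Two variables together.
print("covariance:", covariance(pcs, income))
print("correlation:", correlation(pcs, income))
```

With only four observations the correlation is not a reliable estimate; the point is only to show how each statistic is computed.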
4. Statistical inference
Derivation of variance and standard error of our coefficient
estimates
How to use the variance and standard error to assess our
model and testing hypotheses
Hypothesis testing: null and alternative hypothesis
Testing hypotheses for single parameters
Testing joint hypotheses (for two or more
parameters)
Dummy variables
Functional form
Preview:
5. Statistical inference
We want to know how good $\hat{\alpha}$ and $\hat{\beta}$ are as estimates of
the population parameters, alpha and beta.
How reliable are our estimates?
How reliable is the least squares estimation procedure?
We need to know the nature of the variables in our
regression model: which variables are random and which
variables are deterministic.
Let’s look again at the assumptions of the CLRM:
6. Assumptions of the Classical Linear
Regression Model (CLRM)
(1) The regression model is linear in the parameters.
(2) X values are fixed in repeated sampling.
(3) The number of observations must be greater than the number of
parameters to be estimated.
(4) There must be variability in the X values.
(5) The explanatory variable X is uncorrelated with the error term: $\mathrm{Cov}(u_i, X_i) = 0$
(6) There is no perfect multicollinearity.
(7) Given the value of X, the expected value of the error term is zero: $E(u \mid X) = 0$
(8) The variance of the error term is constant (homoscedasticity): $\mathrm{var}(u_i \mid X_i) = \sigma^2$
(9) There is no correlation between two error terms (no autocorrelation): $\mathrm{cov}(u_i, u_j \mid X_i, X_j) = 0$
(10) The disturbance term must be normally distributed: $u_t \sim N(0, \sigma^2)$
(11) The model is correctly specified.
7. What is random and what is non
random in our regression model?
Random variables: u, Y
Non-random variables: X
To estimate the properties of the estimators $\hat{\alpha}$ and $\hat{\beta}$
we need to know:
The mean (expected value)
The variance and covariance
The probability distribution of the error term, $\hat{\alpha}$ and $\hat{\beta}$
8. Distribution of the error term &
coefficient estimates
Assumption: the error term is normally distributed with
mean 0 and variance $\sigma^2$:
$$u_i \sim N(0, \sigma^2)$$
Important property of the normal distribution: any linear
function of normally distributed variables is itself normally
distributed.
Therefore alpha hat and beta hat are also normally
distributed:
$$\hat{\alpha} \sim N(?, ?), \qquad \hat{\beta} \sim N(?, ?)$$
We now need to find their mean and variance.
The residual is the observed value minus the fitted value:
$$\hat{u}_t = y_t - \hat{y}_t = y_t - \hat{\alpha} - \hat{\beta} x_t$$
9. Mean of alpha hat and beta hat
Under the CLRM assumptions, the expected values of the
OLS estimators of alpha and beta are:
$$E(\hat{\alpha}) = \alpha, \qquad E(\hat{\beta}) = \beta$$
The expected values of $\hat{\alpha}$ and $\hat{\beta}$ are equal to the true
parameter values. This means that the estimators are
unbiased.
10. A ‘little story’ about repeated sampling
Let’s imagine that we want to estimate the average height of our
class. Our class is our population. From this population we draw a
random sample of 2 students, we measure their height and we take
the average: this is our first estimate, $\hat{\beta}_1$.
We then randomly draw another sample of 2 students and we
obtain a different estimate, call it $\hat{\beta}_2$.
We carry on drawing different samples and each time obtain a slightly
different estimate for the average height of our class. That is why
our coefficient estimates are random variables.
We assume that this random variable is normally distributed.
Using our sample we will produce a value of $\hat{\beta}$ that is very close to
the population mean.
By randomly drawing a sample we are very likely to be very close to
the population mean.
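The repeated-sampling story can be simulated. The sketch below assumes a hypothetical class of 30 students with made-up heights; the seed, class size and number of draws are arbitrary choices, not values from the lecture.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population: heights (cm) of a class of 30 students.
population = [rng.gauss(170, 8) for _ in range(30)]
population_mean = mean(population)

# Draw many random samples of 2 students and record each sample average.
estimates = [mean(rng.sample(population, 2)) for _ in range(5000)]

print("population mean:         ", round(population_mean, 2))
print("first two estimates:     ", [round(e, 2) for e in estimates[:2]])
print("average of all estimates:", round(mean(estimates), 2))
```

Individual estimates differ from sample to sample, which is why the estimator is a random variable, but their average sits close to the population mean.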
11. ...and what we actually do
In reality we do not use repeated sampling: we only
have one sample. We could be very unlucky and choose a
sample that is very far from the population mean. But
there is a high probability that our random sample will
give a value of $\hat{\beta}$
which is very close to the real population parameter.
• We recognise that there is always a margin of error in
our estimates. This is represented by the variance (or
by the standard error) of $\hat{\beta}$.
12. Variance of alpha hat and beta hat
If the assumptions of the CLRM are correct it is possible to
derive the variance of the estimators alpha hat and beta hat
and obtain the following expressions:
$$\mathrm{var}(\hat{\alpha}) = \frac{\sigma^2 \sum x_i^2}{N \sum (x_i - \bar{x})^2}$$
$$\mathrm{var}(\hat{\beta}) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$
13. Distribution of alpha hat and beta hat
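Wrapping up what we have said so far:
$$u_i \sim N(0, \sigma^2)$$
$$\hat{\alpha} \sim N\left(\alpha, \frac{\sigma^2 \sum x_i^2}{N \sum (x_i - \bar{x})^2}\right)$$
$$\hat{\beta} \sim N\left(\beta, \frac{\sigma^2}{\sum (x_i - \bar{x})^2}\right)$$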
14. Standardised normal distribution
Let’s now go back to the distribution of our coefficient beta hat:
$$\hat{\beta} \sim N\left(\beta, \frac{\sigma^2}{\sum (x_i - \bar{x})^2}\right)$$
A more convenient way to represent this distribution is by
constructing a variable Z obtained by subtracting the mean and
dividing by the standard error (standardised normal
distribution):
$$Z = \frac{\hat{\beta} - \beta}{\sqrt{\sigma^2 / \sum (x_i - \bar{x})^2}} \sim N(0, 1)$$
SE: the standard deviation of the sampling distribution of the
estimator, or an estimate of that standard deviation.
15. The t-distribution
Remember: we do not know the real variance of the error
term. We have to substitute it with its estimate.
We use the variance of the residuals:
$$\hat{\sigma}^2 = \frac{\sum \hat{u}_t^2}{N - 2}$$
By doing so we obtain a random variable with a slightly
different probability distribution.
We now have a t-distribution with N-2 degrees of freedom:
$$t = \frac{\hat{\beta} - \beta}{\sqrt{\hat{\sigma}^2 / \sum (x_i - \bar{x})^2}} \sim t_{(N-2)}$$
Or more simply:
$$t = \frac{\hat{\beta} - \beta}{se(\hat{\beta})} \sim t_{(N-2)}$$
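The formulas above can be put together in a few lines. This is a sketch for the two-variable model only; the function name `ols_simple` and the five data points are made up for illustration and do not come from the lecture.

```python
import math


def ols_simple(x, y):
    """OLS for y = alpha + beta * x + u, returning estimates and standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar

    # Residual variance replaces the unknown error variance (N - 2 df).
    residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
    sigma2_hat = sum(u ** 2 for u in residuals) / (n - 2)

    se_beta = math.sqrt(sigma2_hat / sxx)
    se_alpha = math.sqrt(sigma2_hat * sum(xi ** 2 for xi in x) / (n * sxx))
    return alpha_hat, beta_hat, se_alpha, se_beta


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

alpha_hat, beta_hat, se_alpha, se_beta = ols_simple(x, y)
t_beta = beta_hat / se_beta  # t ratio for H0: beta = 0, with N - 2 = 3 df
print(alpha_hat, beta_hat, se_alpha, se_beta, t_beta)
```

The t ratio in the last line is the quantity that follows the t-distribution with N-2 degrees of freedom when the null hypothesis is true.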
16. Degrees of freedom
It is the total number of observations in the sample (n)
less the number of independent (linear) constraints
or restrictions put on them.
It is the number of independent observations out of a
total of n observations.
The general rule is df = n - number of parameters estimated
17. From the Standardised normal
distribution to the T distribution
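$$\hat{\beta} \sim N\left(\beta, \frac{\sigma^2}{\sum (x_i - \bar{x})^2}\right)$$
$$Z = \frac{\hat{\beta} - \beta}{\sqrt{\sigma^2 / \sum (x_i - \bar{x})^2}} \sim N(0, 1)$$
$$t = \frac{\hat{\beta} - \beta}{\sqrt{\hat{\sigma}^2 / \sum (x_i - \bar{x})^2}} \sim t_{(N-2)}$$
Very important for hypothesis testing!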
18. Probability density function of a t distribution
It is a symmetrical distribution with zero mean and is generally
flatter than the normal distribution. As the degrees of
freedom increase, the t distribution approximates the normal
distribution.
The area under the curve measures the probability of a
certain event occurring.
19. Testing hypotheses
We now want to use all this information to carry out
hypothesis testing.
We derive our hypotheses from the theory.
For example, in the CAPM model, the theory suggests that
the intercept should be equal to zero:
$$R_{jt} - R_{ft} = \alpha + \beta (R_{mt} - R_{ft}) + u_{jt}$$
where $R_{jt} - R_{ft}$ is the excess return of fund j at time t and $\alpha$ is the expected risk-adjusted return.
Here are the steps to follow to carry out hypothesis
testing:
20. How to carry out hypothesis testing:
Using the t test
1. Draw up the null hypothesis (H0)
2. Draw up the alternative hypothesis (H1 or HA).
This will determine the type of test to carry out
(one tailed or two tailed test)
3. Compute your t statistic (or t ratio)
4. Choose your significance level
5. Find out the critical value on the table
6. Compare your t statistic to the critical value
and decide the outcome of the test.
21. Null Hypothesis
We can set up hypotheses on any
parameter of our model.
Step 1: Draw up the null hypothesis (H0)
In the case of the CAPM model this will
be: H0: α = 0
$$R_{jt} - R_{ft} = \alpha + \beta (R_{mt} - R_{ft}) + u_{jt}$$
where $R_{jt} - R_{ft}$ is the excess return of fund j at time t and $\alpha$ is the expected risk-adjusted return.
22. Alternative Hypothesis
The alternative is trickier.
Usually the alternative is just that the null
is wrong:
◦ H1: α ≠ 0 (fund managers earn non-zero risk
adjusted excess returns)
But sometimes it is more specific:
◦ H1: α < 0 (fund managers underperform)
◦ H1: α > 0 (fund managers outperform)
23. Construct your t statistics
Our primary interest is in testing the following null hypothesis:
$$H_0: \beta = 0$$
Is the unknown population parameter equal to zero?
In order to test this hypothesis we first compute the t-statistic or t-ratio:
$$t_{\hat{\beta}} = \frac{\hat{\beta}}{se(\hat{\beta})} = \frac{\hat{\beta} - 0}{se(\hat{\beta})}$$
The t ratio measures how many estimated standard deviations beta hat
is away from zero.
How far is this value from zero?
24. Choose your significance level
We want to test:
$$H_0: \beta = 0$$
Against the alternative:
$$H_1: \beta > 0$$
We have to choose a significance level: the probability of
rejecting H0 when it is true.
Usually we choose the 5% significance level
(5% of the time we make the error of rejecting H0 when it is true).
But we might also want to use the 10% and the 1%
significance levels.
25. One tailed test/one sided test
In order to reject H0 in favour of H1 we need a sufficiently
large value of t.
How large?
Look at the t tables. Find the critical value
corresponding to the chosen significance level and number
of degrees of freedom.
If t > critical value we reject H0
If t < critical value we cannot reject H0
26. T distribution and critical values
Suppose we have to find the critical value for the 5% significance level and 20
degrees of freedom. Use your tables: the value is 1.725 for a one-tailed test and 2.086 for a two-tailed test.
27. Example: the CAPM model
Regression Statistics
R Square 0.928
Adjusted R Square 0.903
Standard Error 3.179
Observations 5
              Coefficients   Standard Error   t Stat
Intercept     -1.737         4.114            -0.422
X Variable 1   1.642         0.265             6.200
Dependent variable (Y): excess return on portfolio XXX
Independent variable (X): excess return on the market portfolio
We are testing whether alpha (the intercept) = 0, against the alternative that it is > 0
28. Example: the CAPM model
This is a very small sample with only 5 observations
and 3 degrees of freedom. The critical value from the
t distribution with 3 degrees of freedom at the 5%
significance level is 3.182 for a two-tailed test and
2.353 for a one-tailed test. In both cases we cannot
reject the null hypothesis.
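The decision rule for this example can be sketched in code, using the intercept estimate (-1.737), its standard error (4.114) and the two critical values quoted for 3 degrees of freedom. The function `t_test` and its arguments are illustrative names, and the one-tailed branch assumes the alternative is that the parameter is greater than the hypothesised value.

```python
def t_test(estimate, std_error, critical_value, hypothesised=0.0, two_tailed=True):
    """Return the t ratio and whether H0 is rejected.

    For a one-tailed test this sketch assumes H1: parameter > hypothesised.
    """
    t_ratio = (estimate - hypothesised) / std_error
    if two_tailed:
        reject = abs(t_ratio) > critical_value
    else:
        reject = t_ratio > critical_value
    return t_ratio, reject


# Intercept from the CAPM regression output; critical values for 3 df at 5%.
t_two, reject_two = t_test(-1.737, 4.114, critical_value=3.182)
t_one, reject_one = t_test(-1.737, 4.114, critical_value=2.353, two_tailed=False)

print(round(t_two, 3), reject_two, reject_one)  # -0.422 False False
```

Both calls return the same t ratio; only the critical value and the rejection rule change between the two-tailed and one-tailed versions.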
29. Test of significance
The test we have just seen is very common in
econometric analysis
We are usually interested in checking that the variables
included in our model are relevant
We carry out a test of significance.
For this test we always have:
H0: beta =0
H1: beta ≠0
It is always a two-sided test
30. An example
A simple model: explaining how the demand for
computers is related to personal income.
Y = number of PCs per 100 persons
X = per capita income (in $)
We use cross-sectional data for 34 countries
Country      PCs per 100 persons   Per capita income ($)
Argentina    8.2                   11410
China        2.76                  4980
Canada       48.7                  30040
India        0.72                  2880
31. Results from our OLS regression
analysis
$$\hat{Y}_i = -6.5833 + 0.0018 X_i$$
Standard errors (se): (2.7437) (0.00014)
R2 = 0.829
Interpretation of the results:
•There is a positive relationship between the number of computers and
per capita income. If income increases by 1,000 dollars the demand for computers
goes up by about 2 units per 100 persons (0.0018 * 1000 = 1.8).
•The intercept has no meaningful interpretation in this case;
•The estimated R2 is very high. It suggests that 83% of the variation in
the demand for computers is explained by per capita income.
•Is the estimated coefficient significantly different from zero?
The t ratio = 0.0018/0.00014 = 12.857
Find the critical value for the 5% significance level.
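As a worked sketch of that last step: with 34 countries and two estimated parameters there are 34 - 2 = 32 degrees of freedom. The two-tailed 5% critical value of about 2.037 is taken from standard t tables, not from the slides.

```latex
\[
t = \frac{\hat{\beta}}{se(\hat{\beta})} = \frac{0.0018}{0.00014} \approx 12.86
\;>\; t_{0.025,\,32} \approx 2.037
\]
```

The t ratio is far above the critical value, so the slope coefficient is significantly different from zero at the 5% level.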
32. Confidence intervals
(or interval estimates)
We can also use interval estimates to carry out our
statistical inference.
In this case we compare our hypothesised value for
beta with a range of values derived from our
estimates.
If the hypothesised value of beta is within that range
we cannot reject the null hypothesis at the chosen
level of significance.
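33. ..or more simply
A 95% confidence interval:
$$\hat{\beta} \pm 1.96 \times \text{standard error}(\hat{\beta})$$
where 1.96 is the 5% critical value for a two-tailed test, with over 120 degrees of freedom.
We could also define a 99% confidence interval:
$$\hat{\beta} \pm 2.576 \times \text{standard error}(\hat{\beta})$$
where 2.576 is the 1% critical value.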
34. Let’s construct a confidence interval
for an example
            OLS estimate   Standard error   t stat.   Critical value
Intercept   .283           (.104)           2.721     1.960 (5%)
Exper       .004           (.001)           4.00      1.960 (5%)
$$0.004 - 1.960 \times 0.001 \le \beta \le 0.004 + 1.960 \times 0.001$$
Parameter estimate +/- (critical value * standard error) = [0.002, 0.006]
Does the value ‘0’ lie within that interval? No
We reject the null hypothesis that beta = 0
We could also say that an extra year of experience will increase wages by between
0.002 and 0.006.
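The same interval can be reproduced in code. This is a small sketch; `confidence_interval` is an illustrative helper name, and the default critical value of 1.960 is the large-sample 5% value used in the table.

```python
def confidence_interval(estimate, std_error, critical_value=1.960):
    """Interval estimate: estimate +/- critical value * standard error."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin


# Coefficient on experience: estimate 0.004, standard error 0.001.
lower, upper = confidence_interval(0.004, 0.001)
contains_zero = lower <= 0.0 <= upper

print(round(lower, 3), round(upper, 3), contains_zero)  # 0.002 0.006 False
```

Because zero lies outside the interval, the conclusion matches the t test: the null hypothesis that beta = 0 is rejected at the 5% level.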
35. Probability values
Another way of carrying out hypothesis testing is based on
the use of probability values
Compute the t statistic
Check on the tables what is the probability of obtaining a
value of the test statistic as large as, or larger than, that
obtained in the example.
This probability is called the p value (probability value).
This is the lowest significance level at which a null
hypothesis can be rejected.
36. Probability values – in practice
Probability values are usually reported with the
regression output in most computer software.
To reject the null hypothesis we want the p-value to be
quite small.
Relationship between significance level and
probability level:
- We choose the significance level before running the
regression. In practice, economists use the 10%, the 5%
or the 1% significance level.
- The probability value tells us the exact lowest
significance level at which the null hypothesis can
be rejected (the probability of making a type I error).
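Software reports p-values directly, but the calculation can be sketched with the standard library by integrating the t density numerically. The function names are illustrative, and Simpson's rule is a choice made for this sketch, not the method any particular package uses. The example reuses the CAPM intercept t ratio of -0.422 with 3 degrees of freedom.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) via Simpson's rule on [0, |t_stat|].

    steps must be even; math.gamma overflows for very large df,
    so this sketch is meant for small samples only.
    """
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t_stat|
    return 1 - 2 * central_area


p = two_sided_p_value(-0.422, df=3)
print(round(p, 3), "reject at 5%:", p < 0.05)
```

The p-value here is far above 0.05, which agrees with the earlier conclusion that the null hypothesis cannot be rejected.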
37. Summary
We have looked at three possible ways of carrying out
hypothesis testing: comparing the t ratio with a critical value,
confidence intervals, and probability values.
All three ways of carrying out the hypothesis test lead to the
same conclusion.
Computer output usually provides all the necessary
information to easily carry out hypothesis testing.
38. Hypothesis testing
Individual coefficient (t-test)
Significance of all coefficients (F-test)
Test for restriction (F-test)
Test for stability over time (F-test)
Normality test is a Chi-square test to see whether
the error term is distributed normally (we
don’t discuss it here)
39. T-test: Example: model of R&D expenditure, n=32
R&D is a function of sales and the profit margin:
log(rd) = -4.38 + 1.084*log(sales) + 0.0217*prof.marg.
          (.47)   (.060)             (.0218)
Standard errors are in parentheses.
R-square = 0.918
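A sketch of the significance tests for the two slope coefficients in this model. With n = 32 and three estimated parameters there are 29 degrees of freedom; the two-tailed 5% critical value of 2.045 is taken from standard t tables, not from the slides, and the dictionary keys are illustrative labels.

```python
# (estimate, standard error) for each slope coefficient in the R&D model.
slopes = {
    "log(sales)": (1.084, 0.060),
    "prof.marg.": (0.0217, 0.0218),
}

CRITICAL_5PCT = 2.045  # two-tailed, 32 - 3 = 29 df (assumed from t tables)


def significance_tests(coefficients, critical_value):
    """Map each variable to (t ratio, significant at the chosen level?)."""
    results = {}
    for name, (estimate, std_error) in coefficients.items():
        t_ratio = estimate / std_error  # H0: coefficient = 0
        results[name] = (t_ratio, abs(t_ratio) > critical_value)
    return results


results = significance_tests(slopes, CRITICAL_5PCT)
for name, (t_ratio, significant) in results.items():
    print(f"{name}: t = {t_ratio:.3f}, significant at 5%: {significant}")
```

On these numbers log(sales) is clearly significant, while the profit margin has a t ratio just below 1 and is not significant at the 5% level.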
40. Testing the overall significance of the sample regression
The null hypothesis is that all the slope coefficients are equal to
zero:
$$H_0: \beta_2 = \beta_3 = \dots = \beta_k = 0$$
The alternative hypothesis is that at least one of
these coefficients is not equal to zero.
F-test
$$F = \frac{ESS/df_1}{RSS/df_2} = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$
If F > Fc(k-1, n-k), reject H0, where Fc(k-1, n-k) is the critical value,
e.g. the critical value at the 5% level or the 1% level, with the first
degree of freedom df1 = (k-1) and the second degree of
freedom df2 = (n-k). k is the number of regressors
including the intercept. n is the number of observations in
the regression.
41. The overall significance example:
Expectations-augmented Phillips curve for the US, 1970-82 (n = 13)
Actual inflation rate Y (%), unemployment rate X2 (%), and
expected inflation rate X3 (%)
$$\hat{Y}_t = 7.1933 - 1.3925 X_{2t} + 1.4700 X_{3t}, \qquad R^2 = 0.8766$$
Standard errors: (1.5958) (0.3050) (0.1758)
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \frac{0.8766/2}{(1-0.8766)/10} = 35.52$$
F0.05(2,10) = 4.10 and F0.01(2,10) = 7.56. The calculated F-value is
greater than the critical value at 1%, so we reject the null
hypothesis. We conclude that the coefficients are jointly
significantly different from zero.
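The F statistic for this example can be checked in a couple of lines. `overall_f` is an illustrative helper, with k counting the regressors including the intercept as defined for the F-test.

```python
def overall_f(r_squared, n, k):
    """F statistic for H0: all slope coefficients are zero.

    k counts the regressors including the intercept.
    """
    df1, df2 = k - 1, n - k
    return (r_squared / df1) / ((1 - r_squared) / df2)


# Phillips curve example: R-squared 0.8766, 13 observations, 3 parameters.
f_stat = overall_f(0.8766, n=13, k=3)
print(round(f_stat, 2))  # 35.52
```

The only inputs are the R-squared, the sample size and the number of parameters, so the test can be run from a standard regression printout.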
42. Testing for additional variable
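To test whether the coefficient on an additional variable is significantly different
from zero
$$\text{Model 1: } Y_t = \beta_1 + \beta_2 X_{2t} + e_t, \qquad R^2_{old},\ ESS_{old}$$
$$\text{Model 2: } Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + e_t, \qquad R^2_{new},\ ESS_{new}$$
$$F = \frac{(ESS_{new} - ESS_{old})/\text{number of new variables}}{RSS_{new}/(n - k_{new})} = \frac{(R^2_{new} - R^2_{old})/df_1}{(1 - R^2_{new})/df_2}$$
Example: the R2 from model 1 is 0.9978, the R2 from model 2 is
0.9988, n = 15, k_new = 3, df1 = 1, df2 = 12.
$$F = \frac{(R^2_{new} - R^2_{old})/1}{(1 - R^2_{new})/12} = \frac{(0.9988 - 0.9978)/1}{(1 - 0.9988)/12} = 10.3978$$
(With the rounded R2 values shown here the ratio works out to 10.0; the reported 10.3978 presumably comes from unrounded R2 values.)
F > F0.01(1,12) = 9.33, so we reject the null hypothesis that β3 = 0 at the
1% significance level.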
43. Testing linear equality restriction
Model 1: Unrestricted model (double-log linear function)
$$\ln Y_t = \beta_1 + \beta_2 \ln X_{2t} + \beta_3 \ln X_{3t} + e_t$$
Model 2: Restricted model
If we assume constant returns to scale, then β2 + β3 = 1, so β2 = 1 - β3.
By imposing this, we have
$$\ln Y_t = \beta_1 + (1 - \beta_3) \ln X_{2t} + \beta_3 \ln X_{3t} + e_t$$
$$\ln Y_t = \beta_1 + \ln X_{2t} + \beta_3 \ln(X_{3t}/X_{2t}) + e_t$$
$$\ln Y_t - \ln X_{2t} = \beta_1 + \beta_3 \ln(X_{3t}/X_{2t}) + e_t$$
$$\ln(Y_t/X_{2t}) = \beta_1 + \beta_3 \ln(X_{3t}/X_{2t}) + e_t \qquad \text{(restricted model)}$$
44. Testing linear equality restriction
Hypothesis testing
$$H_0: \beta_2 + \beta_3 = 1 \qquad H_1: \beta_2 + \beta_3 \neq 1$$
Run regressions of both the unrestricted and
restricted models and obtain the R2’s or the RSS’s,
then calculate the F value.
$$F = \frac{(RSS_R - RSS_{UR})/m}{RSS_{UR}/(n-k)} = \frac{(R^2_{UR} - R^2_R)/m}{(1 - R^2_{UR})/(n-k)}$$
Where RSS_R = RSS of the restricted model, RSS_UR = RSS
of the unrestricted model, R2_R = R2 of the restricted
model, R2_UR = R2 of the unrestricted model, and m = the number of restrictions.
45. Testing linear equality restriction:
example
The Cobb-Douglas production function for Taiwanese
agriculture 1958-72.
Unrestricted model
$$\ln \hat{Y}_t = -3.3384 + 1.4988 \ln X_{2t} + 0.4899 \ln X_{3t}$$
t = (-1.36) (2.78) (4.80), R2_UR = 0.8890, df = 12
Restricted model
$$\ln(\hat{Y}_t / X_{2t}) = 1.7086 + 0.6129 \ln(X_{3t}/X_{2t})$$
t = (4.11) (6.57), R2_R = 0.8489, m = 1
F?
46. Testing linear equality restriction:
example
$$F = \frac{(R^2_{UR} - R^2_R)/m}{(1 - R^2_{UR})/(n-k)} = \frac{(0.8890 - 0.8489)/1}{(1 - 0.8890)/12} = 4.36$$
$$F_{0.05}(1,12) = 4.75, \qquad F < F_{0.05}(1,12)$$
We cannot reject the null hypothesis that β2 + β3 = 1 at
the 5% significance level. However, the F value and the
critical value are close. Hence we have to be careful
about making a decision. For example, we can reject
the hypothesis at the 10% level.
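A sketch of the restricted-versus-unrestricted F test using the R2 form of the formula. `restriction_f` is an illustrative name. With the rounded R2 values quoted in the example, the function returns about 4.34 rather than the reported 4.36; the gap presumably comes from rounding, and the conclusion at the 5% level is the same.

```python
def restriction_f(r2_unrestricted, r2_restricted, m, df_resid):
    """F statistic for m linear restrictions.

    df_resid is n - k from the unrestricted model.
    """
    numerator = (r2_unrestricted - r2_restricted) / m
    denominator = (1 - r2_unrestricted) / df_resid
    return numerator / denominator


# Taiwanese agriculture example: one restriction, 12 residual df.
f_stat = restriction_f(0.8890, 0.8489, m=1, df_resid=12)
critical_5pct = 4.75  # F0.05(1,12), as quoted in the example
reject = f_stat > critical_5pct

print(round(f_stat, 3), "reject H0 at 5%:", reject)
```

The same function covers the test for an additional variable, where the "restricted" model is the one without the new regressor.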
47. Testing for structural stability (Chow-
test, use of F-value again)
This is to test whether the same model Yt = f(Xt) can be
used for different time periods. If not, there is a
structural break in the model.
There are three ways in which the same model may not
work for different time periods:
The intercept may be different
The coefficients may be different
Both the intercept and the coefficients are different
Hence, it is important that we test for the structural
stability of the model.
48. Testing for structural stability (Chow-
test, use of F-value again)
Suppose we break the entire data period into two sub-periods,
then run the model for each of the two sub-periods and for the
entire data period to obtain the RSS’s from three different
regressions.
RSSall = RSS from the regression that uses all the observations from
the entire data period, the number of observations is n = n1 + n2.
RSS1 = RSS from the regression that uses only the observations of
the first sub-period, the number of observations is n1.
RSS2 = RSS from the regression that uses only the observations of
the second sub-period, the number of observations is n2.
RSS1+2 = RSS1 + RSS2.
k = number of regressors including the intercept.
$$F = \frac{(RSS_{all} - RSS_{1+2})/k}{RSS_{1+2}/(n - 2k)}$$
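The Chow statistic can be sketched directly from the three residual sums of squares. The RSS values, sample size and k below are hypothetical numbers chosen only to show the arithmetic; `chow_f` is an illustrative name.

```python
def chow_f(rss_all, rss_1, rss_2, n, k):
    """Chow test F statistic with k and n - 2k degrees of freedom."""
    rss_split = rss_1 + rss_2  # RSS1+2 from the two sub-period regressions
    numerator = (rss_all - rss_split) / k
    denominator = rss_split / (n - 2 * k)
    return numerator / denominator


# Hypothetical values: 30 observations, intercept plus one regressor (k = 2).
f_stat = chow_f(rss_all=120.0, rss_1=40.0, rss_2=50.0, n=30, k=2)
print(round(f_stat, 4))
```

The result is compared with the critical value Fc(k, n-2k) from the F tables at the chosen significance level.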
49. Testing for structural stability
The null hypothesis is no structural change
If the F value does not exceed the critical F value, we do
not reject the null hypothesis of parameter stability
If the F value exceeds the critical F value, we reject the
hypothesis of parameter stability
50. Looking at the p-values for the F test
We can also look at the p-value: the probability, if the
null hypothesis is true, of obtaining an F statistic at least
as large as the one computed from our estimates.
These are computed by the econometric package.
For a 5% significance level, you reject H0 if p-value < 0.05.
52. Introducing qualitative factors in our
analysis
What do we mean by qualitative factors?
Examples:
Do men earn more than women?
Do students who live on campus get higher scores in their
exams compared to students who live off-campus?
How do we account for these factors?
Dummy variable = a binary variable (0/1) where 1 indicates
that a particular event occurs (e.g. living on campus) or a
particular feature is present (e.g. beautiful).
53. Dummy variables
The name of the dummy indicates how it is constructed.
A dummy called ‘female’ equals 1 when the person is a female
and 0 otherwise.
A dummy called ‘campus’ equals 1 when the student lives on
campus and 0 otherwise.
Different types of dummies
Shift/intercept dummies
Slope dummies: a dummy interacted with X
Interaction dummies: the product of two dummy variables
54. Case 1: intercept or shift dummy
Let’s start with the simple case of a single dummy
explanatory variable:
$$\text{wage}_i = \alpha + \delta_0 \text{female}_i + \beta_1 \text{education}_i + u_i$$
Although there are 2 genders, we only need one
dummy variable. In this case we are using a dummy
called ‘female’. Remember that female + male = 1.
The quality or feature that we set equal to zero is called the base
group or the benchmark group (or reference group).
Example: men are the base group in the previous case.
We cannot include a dummy for male and one for female
because this will cause a problem of perfect collinearity
55. Model with a dummy variable
$$\text{wage}_i = \alpha + \delta_0 \text{female}_i + \beta_1 \text{education}_i + u_i$$
When a particular worker is a male, our equation will look like:
$$\text{wage}_i = \alpha + \delta_0 \times 0 + \beta_1 \text{education}_i + u_i$$
This shows that alpha is the intercept for male workers.
The intercept for female workers is alpha + delta.
After estimating our equation we may find the following results for $\hat{\delta}_0$:
Not significantly different from zero
Has a negative sign
Has a positive sign
57. Hypothesis testing
We could simply carry out a significance test to test
whether the dummy is significantly different from zero.
Two tailed/sided test:
$$H_0: \delta_0 = 0 \qquad H_1: \delta_0 \neq 0$$
Or we might want to test the null against the alternative
that there is discrimination against women.
One tailed/sided test:
$$H_0: \delta_0 = 0 \qquad H_1: \delta_0 < 0$$
58. Possible outcomes of the test for the
wage example
$$\text{wage}_i = \alpha + \delta_0 \text{female}_i + \beta_1 \text{education}_i + u_i$$
Possible results for $\hat{\delta}_0$ and their interpretation:
Negative and significantly different from zero: women earn less than men, given the same level of
education. We find evidence of discrimination against women.
Positive and significantly different from zero: women earn more than men, given the same level of
education. We find evidence of discrimination against men.
Not significantly different from zero: there is no significant difference between male and
female hourly earnings. We do not find any evidence of discrimination.
59. Interaction dummies:
interact dummies with explanatory
variables
We can also interact dummy variables with
other explanatory variables that are not
dummy variables to test whether there
are differences in slope.
For example, we want to estimate
whether the returns to education, in terms
of wages, are the same for men and
women.
61. How do we formulate this model for
OLS estimation?
$$\log(\text{wage}_i) = \alpha + \delta_0 \text{female}_i + \beta_1 \text{educ}_i + \delta_1 \text{female}_i \times \text{educ}_i + u_i$$
We could have different situations according to the coefficient estimates.
Example: let’s assume that all variables are statistically significant.
$$\widehat{\log(\text{wage}_i)} = 0.389 - 0.227 \text{female}_i + 0.082 \text{educ}_i - 0.0056 \text{female}_i \times \text{educ}_i$$
Return to education for men: 0.082 (about 8.2%).
Women’s returns to education are 0.56 percentage points lower than men’s.
To know the returns to education for women,
combine the two coefficients: 0.082 - 0.0056 = 0.0764 (about 7.6%).
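The interaction model can be explored with a short sketch. The coefficient values are those of the example; the negative signs on the female dummy and on the interaction term are assumed from the statement that women's returns to education are lower than men's. The function names and the choice of 12 years of education are illustrative.

```python
# Coefficients of the estimated log-wage equation (signs as assumed above).
COEF = {"const": 0.389, "female": -0.227, "educ": 0.082, "female_educ": -0.0056}


def predicted_log_wage(female, educ):
    """Fitted log(wage) for a worker with the given dummy value and education."""
    return (COEF["const"]
            + COEF["female"] * female
            + COEF["educ"] * educ
            + COEF["female_educ"] * female * educ)


def return_to_education(female):
    """Slope of log(wage) with respect to education for each group."""
    return COEF["educ"] + COEF["female_educ"] * female


print("return for men:  ", return_to_education(0))
print("return for women:", return_to_education(1))

# Gap in predicted log wage between a woman and a man, each with 12 years
# of education: the intercept shift plus 12 times the slope shift.
gap = predicted_log_wage(1, 12) - predicted_log_wage(0, 12)
print("log-wage gap at 12 years:", round(gap, 4))
```

The dummy shifts the intercept and the interaction term shifts the slope, so the gap between the two groups depends on the level of education.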