Multiple Regression Analysis: Statistical Inference: I
Introductory Econometrics: A Modern Approach, 5e
Haoming Liu
National University of Singapore
August 21, 2022
1. Sampling Distributions of the OLS Estimators
2. Testing Hypotheses About a Single Population Parameter
3. Confidence Intervals
4. Testing Single Linear Restrictions
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 1 / 104
Recap
So far, what do we know how to do with the population model
y = β0 + β1x1 + ... + βkxk + u?
1 Mechanics of OLS for a given sample. We only need MLR.2 insofar as
it introduces the data, and MLR.3 (no perfect collinearity) so that the
OLS estimates exist. Interpretation of OLS regression line – ceteris
paribus effects – R2 goodness-of-fit measure. Some functional form
(natural logarithm).
Recap: MLRs
1 : y = β0 + β1x1 + β2x2 + ... + βkxk + u
2 : random sampling from the population
3 : no perfect collinearity in the sample
4 : E(u|x1, ..., xk) = E(u) = 0 (exogenous explanatory variables)
5 : Var(u|x1, ..., xk) = Var(u) = σ2 (homoskedasticity)
Recap
Unbiasedness of OLS under MLR.1 to MLR.4. Obtain the bias (or at
least its direction) when MLR.4 fails due to an omitted variable.
Obtain the variances, Var(β̂j ), under MLR.1 to MLR.5.
The Gauss-Markov Assumptions also imply OLS is the best linear
unbiased estimator (BLUE) (conditional on the values of the
explanatory variables).
Sampling Distributions of the OLS Estimators
We now want to test hypotheses about the βj . This means we hypothesize
that a population parameter is a certain value, then use the data to
determine whether the hypothesis is likely to be false.
EXAMPLE: (Motivated by ATTEND.DTA)
final = β0 + β1missed + β2priGPA + β3ACT + u
where ACT is the achievement test score. The null hypothesis, that
missing lectures has no effect on final exam performance (after accounting
for prior MSU GPA and ACT score), is
H0 : β1 = 0
Sampling Distributions of the OLS Estimators
To test hypotheses about the βj using exact (or “finite sample”) testing
procedures, we need to know more than just the mean and variance of the
OLS estimators.
MLR.1 to MLR.4: We can compute the expected value as
E(β̂j ) = βj
MLR.1 to MLR.5: We know the variance is
Var(β̂j) = σ²/[SSTj(1 − R²j)]
And σ̂² = SSR/(n − k − 1) is an unbiased estimator of σ²
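A small numeric sketch of these two formulas in Python (the slides use Stata; all input numbers below are made up for illustration):

```python
# Sketch: the unbiased error-variance estimator and the OLS slope variance,
# using hypothetical regression summaries (SSR, n, k, SST_j, R_j^2).

def sigma2_hat(ssr, n, k):
    """Unbiased estimator of the error variance: SSR / (n - k - 1)."""
    return ssr / (n - k - 1)

def var_beta_hat(sigma2, sst_j, r2_j):
    """Var(beta_hat_j) = sigma^2 / (SST_j * (1 - R_j^2))."""
    return sigma2 / (sst_j * (1 - r2_j))

s2 = sigma2_hat(ssr=50.0, n=104, k=3)        # 50 / 100 = 0.5
v = var_beta_hat(s2, sst_j=200.0, r2_j=0.5)  # 0.5 / 100 = 0.005
```

Note how a higher R²j (more collinearity with the other regressors) mechanically inflates Var(β̂j).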
Sampling Distributions of the OLS Estimators
But hypothesis testing relies on the entire sampling distributions of
the β̂j. Even under MLR.1 through MLR.5, the sampling distributions
can be virtually anything.
Write
β̂j = βj + Σᵢ₌₁ⁿ wij ui,
where the wij are functions of {(xi1, ..., xik) : i = 1, ..., n}.
Conditional on {(xi1, ..., xik) : i = 1, ..., n}, β̂j inherits its distribution
from that of {ui : i = 1, ..., n}, which is a random sample from the
population distribution of u.
Assumption MLR.6 (Normality)
Normality
The population error u is independent of (x1, ..., xk) and is normally
distributed with mean zero and variance σ²:
u ∼ Normal(0, σ²)
MLR.4: E(u|x1, ..., xk) = E(u) = 0
MLR.5: Var(u|x1, ..., xk) = Var(u) = σ2
Now MLR.6 imposes full independence between u and (x1, x2, ..., xk)
(not just mean and variance independence), which is where the label
of the xj as “independent variables” originated.
The important part of MLR.6 is that we have now made a very
specific distributional assumption for u: the familiar bell-shaped curve.
Assumption MLR.6 (Normality)
Normality is by far the most common assumption, but the usual
arguments about why normality is a good assumption are not always
operative.
Usually, the argument starts with the claim that u is the sum of many
independent factors, say u = a1 + a2 + ... + am for “large” m, and then
we can apply the central limit theorem. But what if the factors have
very different distributions, or are multiplicative rather than additive?
Assumptions MLR.1-6
Ultimately, like Assumption MLR.5, Assumption MLR.6 is maintained
for convenience. Fortunately, we will later see that, for approximate
inference in large samples, we can drop MLR.6. For now we keep it.
It is very difficult to perform exact statistical inference without
Assumption MLR.6.
Assumptions MLR.1 to MLR.6 are called the classical linear model
(CLM) assumptions (for cross-sectional regression).
Normality
For practical purposes, think of
CLM = Gauss-Markov + normality
An important fact about independent normal random variables: any
linear combination is also normally distributed. Because the ui are
independent and identically distributed (iid) as Normal(0, σ2),
β̂j = βj + Σᵢ₌₁ⁿ wij ui ∼ Normal[βj, Var(β̂j)]
where we already know the formula for Var(β̂j):
Var(β̂j) = σ²/[SSTj(1 − R²j)]
THEOREM (Normal Sampling Distributions)
Under the CLM Assumptions (and conditional on the sample outcomes of
the explanatory variables),
β̂j ∼ Normal[βj , Var(β̂j )]
and so
(β̂j − βj)/sd(β̂j) ∼ Normal(0, 1)
The second result follows from a feature of the normal distribution: if
W ∼ Normal then a + bW ∼ Normal for constants a and b.
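A small simulation (a sketch, not from the slides) illustrates the theorem: with normal errors and fixed regressors, the OLS slope in a simple regression is centered at the true β1 across repeated samples. The model y = 1 + 2x + u below is hypothetical.

```python
import random
import statistics

random.seed(1)
beta0, beta1, n, reps = 1.0, 2.0, 30, 500
x = [i / n for i in range(n)]           # fixed regressors: we condition on x
xbar = statistics.mean(x)
sxx = sum((xi - xbar) ** 2 for xi in x)

slopes = []
for _ in range(reps):
    # Draw a fresh sample of normal errors, then compute the OLS slope.
    y = [beta0 + beta1 * xi + random.gauss(0, 1) for xi in x]
    ybar = statistics.mean(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    slopes.append(b1)

# Across repeated samples, the slope estimates center on the true beta1 = 2.
avg = statistics.mean(slopes)
```

A histogram of `slopes` would trace out the bell-shaped Normal[β1, Var(β̂1)] sampling distribution the theorem describes.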
Normality
The standardized random variable
(β̂j − βj)/sd(β̂j)
always has zero mean and variance one. Under MLR.6, it is also
normally distributed.
Notice that the standard normal distribution holds even when we do
not condition on {(xi1, xi2, ..., xik) : i = 1, ..., n}.
Testing Hypotheses About a Single Population Parameter
We cannot directly use the result
(β̂j − βj)/sd(β̂j) ∼ Normal(0, 1)
to test hypotheses about βj: sd(β̂j) depends on σ = sd(u), which is
unknown.
But we have σ̂ as an estimator of σ. Using this in place of σ gives us
the standard error, se(β̂j ).
THEOREM (t Distribution for Standardized Estimators)
Under the CLM Assumptions,
(β̂j − βj)/se(β̂j) ∼ tn−k−1 = tdf
We will not prove this as the argument is somewhat involved.
It is replacing σ (an unknown constant) with σ̂ (an estimator that
varies across samples) that takes us from the standard normal to the
t distribution.
Distribution for Standardized Estimators
The t distribution also has a bell shape, but is more spread out than
the Normal(0, 1).
E(tdf) = 0 if df > 1
Var(tdf) = df/(df − 2) > 1 if df > 2
We will never have very small df in this class.
When df = 10, Var(tdf ) = 1.25, which is 25% larger than the
Normal(0, 1) variance.
When df = 120, Var(tdf ) ≈ 1.017 – only 1.7% larger than the
standard normal.
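The variance figures quoted above follow directly from Var(tdf) = df/(df − 2), a one-line check:

```python
def t_var(df):
    """Variance of the t distribution with df degrees of freedom (valid for df > 2)."""
    return df / (df - 2)

v10 = t_var(10)    # 1.25: 25% larger than the Normal(0,1) variance of 1
v120 = t_var(120)  # about 1.017: only 1.7% larger than the standard normal
```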
Distribution for Standardized Estimators
As df → ∞,
tdf → Normal(0, 1)
The difference is practically small for df > 120.
The next graph plots a standard normal pdf against a t6 pdf.
Testing
We use the result on the t distribution to test the null hypothesis that
xj has no partial effect on y:
H0 : βj = 0
lwage = β0 + β1educ + β2exper + β3tenure + u
H0 : β2 = 0
In words: Once we control for education and time on the current job
(tenure), total workforce experience has no effect on
lwage = log(wage).
Testing
To test H0 : βj = 0, we use the t statistic (or t ratio),
tβ̂j = β̂j/se(β̂j)
This is the estimated coefficient divided by our estimate of β̂j’s
sampling standard deviation. In virtually all cases β̂j is not exactly
equal to zero. When we use tβ̂j, we are measuring how far β̂j is from
zero relative to its standard error.
Testing
Because se(β̂j) > 0, tβ̂j always has the same sign as β̂j. To use tβ̂j to
test H0 : βj = 0, we need to have an alternative.
Some like to define tβ̂j as the absolute value, so it is always positive.
This makes it cumbersome to test against one-sided alternatives.
Testing Against One-Sided Alternatives
First consider the alternative
H1 : βj > 0
which means the null is effectively
H0 : βj ≤ 0
Using a positive one-sided alternative, if we reject βj = 0 then we
reject any βj < 0, too. We often just state H0 : βj = 0 and act as if
we do not care about negative values.
Testing Against One-Sided Alternatives
If the estimated coefficient β̂j is negative, it provides no evidence
against H0 in favor of H1 : βj > 0.
If β̂j is positive, the question is: How big does tβ̂j = β̂j/se(β̂j) have
to be before we conclude H0 is “unlikely”?
Traditional approach to hypothesis testing:
Testing Against One-Sided Alternatives
1. Choose a null hypothesis: H0 : βj = 0 (or H0 : βj ≤ 0)
2. Choose an alternative hypothesis: H1 : βj > 0
3. Choose a significance level (or simply level, or size) for the test:
the probability of rejecting the null hypothesis when it is in fact
true (a Type I error). Suppose we use 5%, so the probability of
committing a Type I error is .05.
4. Choose a critical value, c > 0, so that the rejection rule
tβ̂j > c
leads to a 5% level test.
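The four steps can be sketched as a small decision rule (the t values fed to it below are hypothetical; c = 1.701 is the df = 28, 5% one-sided entry from Table G.2):

```python
def reject_one_sided_positive(t_stat, c):
    """Reject H0: beta_j = 0 in favor of H1: beta_j > 0 iff t > c."""
    return t_stat > c

c_05 = 1.701  # Table G.2 critical value for df = 28 at the 5% level

decision_big = reject_one_sided_positive(2.1, c_05)    # reject H0
decision_small = reject_one_sided_positive(1.5, c_05)  # fail to reject H0
```

A negative t statistic never rejects against this alternative, no matter how large in magnitude.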
Testing Against One-Sided Alternatives
The key is that, under the null hypothesis,
tβ̂j ∼ tn−k−1 = tdf
and this is what we use to obtain the critical value, c.
Suppose df = 28 and we use a 5% test. The critical value is
c = 1.701, as can be obtained from Table G.2 (page 833 in 5e).
This is a one-tailed test, and it is the one-tailed entries of the
table that should be used.
Testing Against One-Sided Alternatives
So, with df = 28, the rejection rule for H0 : βj = 0 against
H1 : βj > 0, at the 5% level, is
tβ̂j > 1.701
We need a t statistic greater than 1.701 to conclude there is enough
evidence against H0.
If tβ̂j ≤ 1.701, we fail to reject H0 against H1 at the 5% significance
level.
Suppose df = 28, but we want to carry out the test at a different
significance level (often the 10% level or the 1% level).
c.10 = 1.313
c.05 = 1.701
c.01 = 2.467
Testing Against One-Sided Alternatives
If we want to reduce the probability of Type I error, we must increase
the critical value (so we reject the null less often).
If we reject at, say, the 1% level, then we must also reject at any
larger level.
If we fail to reject at, say, the 10% level – so that tβ̂j ≤ 1.313 – then
we will fail to reject at any smaller level.
Testing Against One-Sided Alternatives
With large sample sizes – certainly when df > 120 – we can use critical
values from the standard normal distribution. These are the df = ∞
entries in Table G.2.
c.10 = 1.282
c.05 = 1.645
c.01 = 2.326
which we can round to 1.28, 1.65, and 2.33, respectively. The value
1.65 is especially common for a one-tailed test.
EXAMPLE: Factors Affecting lwage (WAGE2.DTA)
In applications, it is helpful to label parameters with variable names
to state hypotheses. So βeduc, βIQ, and βexper , for example. Then
H0 : βexper = 0
is that workforce experience has no effect on wage once education
and IQ have been accounted for.
EXAMPLE: Factors Affecting lwage (WAGE2.DTA)
lwage^ = −.229 + .107 educ + .0080 IQ + .0435 exper
         (.230)  (.012)      (.0016)    (.0084)
n = 759, R² = .217
The quantities in parentheses are still standard errors, not t statistics!
Easiest to read the t statistic off the Stata output, when available:
texper = 5.17,
which is well above the one-sided critical value at the 1% level, 2.33.
In fact, the .5% critical value is about 2.58.
EXAMPLE: Factors Affecting lwage (WAGE2.DTA)
The bottom line is that H0 : βexper = 0 can be rejected against
H1 : βexper > 0 at very small significance levels. A t of 5.17 is very
large.
The estimated effect of exper – that is, its economic importance – is
apparent. Another year of experience, holding educ and IQ fixed, is
estimated to be worth about 4.4%.
The t statistics for educ and IQ are also very large; there is no need
to even look up critical values.
. reg lwage educ IQ exper
Source | SS df MS Number of obs = 759
-------------+------------------------------ F( 3, 755) = 69.78
Model | 57.0352742 3 19.0117581 Prob > F = 0.0000
Residual | 205.71337 755 .27246804 R-squared = 0.2171
-------------+------------------------------ Adj R-squared = 0.2140
Total | 262.748644 758 .346634095 Root MSE = .52198
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1069849 .0116513 9.18 0.000 .084112 .1298578
IQ | .0080269 .0015893 5.05 0.000 .0049068 .0111469
exper | .0435405 .0084242 5.17 0.000 .0270028 .0600783
_cons | -.228922 .2299876 -1.00 0.320 -.6804132 .2225692
------------------------------------------------------------------------------
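Each t statistic in the Stata output above is just the coefficient divided by its standard error; for exper, in Python:

```python
# Reproducing the t statistic for exper from the WAGE2 regression output.
coef_exper, se_exper = 0.0435405, 0.0084242
t_exper = coef_exper / se_exper   # Stata reports 5.17
```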
EXAMPLE: Does ACT score help predict college GPA?
The data set GPA1.DTA contains n = 141 MSU students from the mid-1990s.
All variables are self-reported.
Consider controlling for high school GPA:
colGPA = β0 + β1hsGPA + β2ACT + u
H0 : β2 = 0
From the Stata output, β̂2 = β̂ACT = .0094 and tACT = .87. Even at
the 10% level (c = 1.28), we cannot reject H0 against H1 : βACT > 0.
Does ACT score help predict college GPA?
Because we fail to reject H0 : βACT = 0, we say that “β̂ACT is
statistically insignificant at the 10% level against a one-sided
alternative.”
It is also important to see that the estimated effect of ACT is
small. Three more points (slightly more than one standard
deviation) predicts a colGPA only .0094(3) ≈ .028 higher – not even
three one-hundredths of a grade point.
Does ACT score help predict college GPA?
By contrast, β̂hsGPA = .453 is large in a practical sense – each point
on hsGPA is associated with about .45 points on colGPA – and
thsGPA = 4.73 is very large.
No critical values in Table G.2 with df = 141 − 3 = 138 are even
close to 4. So “β̂hsGPA is statistically significant” at very small
significance levels.
Notice what happens if we do not control for hsGPA. The simple
regression estimate is .0271 with tACT = 2.49. The magnitude is still
pretty modest, but we would conclude it is statistically different from
zero at the 1% significance level using the one-sided standard normal
critical value, 2.33.
Does ACT score help predict college GPA?
Not clear why ACT has such a small, statistically insignificant effect.
The sample size is small and the scores were self-reported. The survey
was done in a couple of economics courses, so it is not a random
sample of all MSU students.
. des colGPA hsGPA ACT
storage display value
variable name type format label variable label
-----------------------------------------------------------------------------
colGPA float %9.0g MSU GPA
hsGPA float %9.0g high school GPA
ACT byte %9.0g ’achievement’ score
. sum colGPA hsGPA ACT
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
colGPA | 141 3.056738 .3723103 2.2 4
hsGPA | 141 3.402128 .3199259 2.4 4
ACT | 141 24.15603 2.844252 16 33
. reg colGPA hsGPA ACT
Source | SS df MS Number of obs = 141
-------------+------------------------------ F( 2, 138) = 14.78
Model | 3.42365506 2 1.71182753 Prob > F = 0.0000
Residual | 15.9824444 138 .115814814 R-squared = 0.1764
-------------+------------------------------ Adj R-squared = 0.1645
Total | 19.4060994 140 .138614996 Root MSE = .34032
------------------------------------------------------------------------------
colGPA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hsGPA | .4534559 .0958129 4.73 0.000 .2640047 .6429071
ACT | .009426 .0107772 0.87 0.383 -.0118838 .0307358
_cons | 1.286328 .3408221 3.77 0.000 .612419 1.960237
------------------------------------------------------------------------------
. reg colGPA ACT
Source | SS df MS Number of obs = 141
-------------+------------------------------ F( 1, 139) = 6.21
Model | .829558811 1 .829558811 Prob > F = 0.0139
Residual | 18.5765406 139 .133644177 R-squared = 0.0427
-------------+------------------------------ Adj R-squared = 0.0359
Total | 19.4060994 140 .138614996 Root MSE = .36557
------------------------------------------------------------------------------
colGPA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ACT | .027064 .0108628 2.49 0.014 .0055862 .0485417
_cons | 2.402979 .2642027 9.10 0.000 1.880604 2.925355
------------------------------------------------------------------------------
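The quantities discussed on the preceding slides can be reproduced from the first regression's output above (a quick check in Python):

```python
# t statistic for ACT in the multiple regression, and the predicted
# colGPA change from a 3-point (about one standard deviation) ACT increase.
coef_act, se_act = 0.009426, 0.0107772
t_act = coef_act / se_act       # about 0.87, as Stata reports
effect_3pts = coef_act * 3      # about .028 grade points: practically tiny
```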
For the negative one-sided alternative,
H1 : βj < 0,
we use a symmetric rule: the rejection rule is
tβ̂j < −c
where c > 0 is chosen in the same way as in the positive case.
With df = 18 and a 5% test, the critical value is c = 1.734, so the
rejection rule is
tβ̂j < −1.734
Now we must see a significantly negative value for the t statistic to
reject H0 : βj = 0 in favor of H1 : βj < 0.
EXAMPLE: Does missing lectures affect final exam
performance?
final = β0 + β1missed + β2priGPA + β3ACT + u
H0 : β1 = 0, H1 : β1 < 0
We get β̂1 = −.079 and tβ̂1 = −2.25. The 5% cv is −1.65 and the 1% cv
is −2.33. So we reject H0 in favor of H1 at the 5% level but not at
the 1% level.
The effect is not huge: 10 missed lectures, out of 32, lowers final
exam score by about .8 points – so not even one point.
. reg final missed priGPA ACT
Source | SS df MS Number of obs = 680
-------------+------------------------------ F( 3, 676) = 56.79
Model | 3032.09408 3 1010.69803 Prob > F = 0.0000
Residual | 12029.853 676 17.7956405 R-squared = 0.2013
-------------+------------------------------ Adj R-squared = 0.1978
Total | 15061.9471 679 22.1825435 Root MSE = 4.2185
------------------------------------------------------------------------------
final | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
missed | -.0793386 .0352349 -2.25 0.025 -.1485216 -.0101556
priGPA | 1.915294 .372614 5.14 0.000 1.183674 2.646914
ACT | .4010639 .0532268 7.54 0.000 .2965542 .5055736
_cons | 12.37304 1.171961 10.56 0.000 10.07192 14.67416
------------------------------------------------------------------------------
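The one-sided test of H0 : β1 = 0 against H1 : β1 < 0 can be reproduced from the output above (using the exact large-sample standard normal critical values 1.645 and 2.326):

```python
# t statistic for missed, and the one-sided negative rejection rule t < -c.
coef_missed, se_missed = -0.0793386, 0.0352349
t_missed = coef_missed / se_missed   # about -2.25

reject_5pct = t_missed < -1.645      # True: reject at the 5% level
reject_1pct = t_missed < -2.326      # False: cannot reject at the 1% level
```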
If we do not control for ACT score, the effect of missed goes away. It turns out that missed and ACT are
positively correlated: those with higher ACT scores miss more classes, on average.
. reg final missed priGPA
Source | SS df MS Number of obs = 680
-------------+------------------------------ F( 2, 677) = 52.48
Model | 2021.72415 2 1010.86207 Prob > F = 0.0000
Residual | 13040.2229 677 19.2617768 R-squared = 0.1342
-------------+------------------------------ Adj R-squared = 0.1317
Total | 15061.9471 679 22.1825435 Root MSE = 4.3888
------------------------------------------------------------------------------
final | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
missed | .0172012 .0341483 0.50 0.615 -.0498481 .0842504
priGPA | 3.237554 .3419779 9.47 0.000 2.56609 3.909019
_cons | 17.41567 1.000942 17.40 0.000 15.45035 19.381
------------------------------------------------------------------------------
Reminder about Testing
Our hypotheses involve the unknown population values, βj. If in our
sample we obtain, say, β̂j = 2.75, we do not write the null
hypothesis as
H0 : 2.75 = 0
(which is obviously false).
Nor do we write
H0 : β̂j = 0
(which is also false except in the very rare case that our estimate is
exactly zero).
Testing Against Two-Sided Alternatives
We do not test hypotheses about the estimate! We know what it is
once we collect the sample. We hypothesize about the unknown
population value, βj .
Sometimes we do not know ahead of time whether a variable definitely
has a positive effect or a negative effect. Even in the example
final = β0 + β1missed + β2priGPA + β3ACT + u
it is conceivable that missing class helps final exam performance.
(The extra time is used for studying, say.)
Generally, the null and alternative are
H0 : βj = 0
H1 : βj ̸= 0
Testing against the two-sided alternative is usually the default. It
prevents us from looking at the regression results and then deciding
on the alternative. Also, it is harder to reject H0 against the two-sided
alternative, so it requires more evidence that xj actually affects y.
Two-Sided Alternatives
Now we reject if β̂j is sufficiently large in magnitude, either positive or
negative. We again use the t statistic tβ̂j = β̂j/se(β̂j), but now the
rejection rule is
|tβ̂j| > c
This results in a two-tailed test, and those are the critical values we
pull from Table G.2.
For example, if we use a 5% level test and df = 25, the two-tailed cv
is 2.06. The two-tailed cv is, in this case, the 97.5 percentile in the
t25 distribution. (Compare the one-tailed cv, about 1.71, the 95th
percentile in the t25 distribution).
EXAMPLE: Factors affecting math pass rates.
(MEAP98.DTA)
Run a multiple regression of math4 on lunch, str, avgsal, enrol.
A priori, we might expect lunch to have a negative effect (it is
essentially a school-level poverty rate), str to have a negative effect,
and avgsal to have a positive effect. The expected sign on enrol is
ambiguous. But we can still test against two-sided alternatives to avoid
specifying the alternative ahead of time.
With 923 observations, we can use the standard normal critical values: for
a 10% test, 1.65; for 5%, 1.96; and for 1%, 2.58.
. des math4 lunch str avgsal enrol
storage display value
variable name type format label variable label
------------------------------------------------------------------------------
math4 byte %9.0g pass rate, 4th grade math test
lunch float %9.0g % students eligible free lunch
str float %9.0g student-teacher ratio
avgsal float %9.0g average teacher salary
enrol int %9.0g school enrollment
. sum math4 lunch str avgsal enrol
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
math4 | 923 60.54713 19.71111 3 100
lunch | 923 37.34231 26.21696 0 98.78
str | 923 23.50704 3.755936 7.6 41.1
avgsal | 923 47557.53 8577.373 13976 81045
enrol | 923 403.5655 162.6491 18 1176
. reg math4 lunch str avgsal enrol
Source | SS df MS Number of obs = 923
-------------+------------------------------ F( 4, 918) = 68.82
Model | 82641.3258 4 20660.3315 Prob > F = 0.0000
Residual | 275581.374 918 300.197575 R-squared = 0.2307
-------------+------------------------------ Adj R-squared = 0.2273
Total | 358222.7 922 388.527874 Root MSE = 17.326
------------------------------------------------------------------------------
math4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lunch | -.2911477 .0237168 -12.28 0.000 -.3376931 -.2446023
str | -.8354922 .1776196 -4.70 0.000 -1.18408 -.4869046
avgsal | .0003744 .000079 4.74 0.000 .0002194 .0005294
enrol | .0050858 .0036523 1.39 0.164 -.002082 .0122537
_cons | 71.20066 4.302933 16.55 0.000 62.75593 79.64539
------------------------------------------------------------------------------
The variables lunch, str, and avgsal all have coefficients with the anticipated signs, and the absolute
values of their t statistics are above 4. So we easily reject H0 : βj = 0 against H1 : βj ̸= 0.
enrol is a different situation: tenrol = 1.39 < 1.65, so we fail to reject H0 even at the 10% significance
level.
Functional form can make a difference. The math pass rates are capped at 100, so a diminishing effect in
avgsal and enrol seems appropriate; these variables have lots of variation. So use logarithms instead.
. reg math4 lunch str lavgsal lenrol
Source | SS df MS Number of obs = 923
-------------+------------------------------ F( 4, 918) = 71.09
Model | 84715.9491 4 21178.9873 Prob > F = 0.0000
Residual | 273506.751 918 297.937637 R-squared = 0.2365
-------------+------------------------------ Adj R-squared = 0.2332
Total | 358222.7 922 388.527874 Root MSE = 17.261
------------------------------------------------------------------------------
math4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lunch | -.2886697 .0235046 -12.28 0.000 -.3347986 -.2425408
str | -.9549563 .1824296 -5.23 0.000 -1.312984 -.5969288
lavgsal | 18.13305 3.605116 5.03 0.000 11.05782 25.20827
lenrol | 2.622179 1.256434 2.09 0.037 .1563616 5.087996
_cons | -116.6793 37.28153 -3.13 0.002 -189.8462 -43.51239
------------------------------------------------------------------------------
Of course, all estimates change, but it is those on lavgsal and
lenrol that are now much different. Before, we were measuring a dollar
effect. But now, holding the other variables fixed,
Δmath4^ = (18.13/100)%Δavgsal = .1813(%Δavgsal)
So if, say, %Δavgsal = 10 – teacher salaries are 10 percent higher –
math4 is estimated to increase by about 1.8 points.
Also,
Δmath4^ = (2.62/100)%Δenrol = .0262(%Δenrol)
so a 10% increase in enrollment is associated with a .26 point increase in
math4.
Notice that lenrol = log(enrol) is statistically significant at the 5% level:
tlenrol = 2.09 > 1.96.
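These level-change calculations are a one-liner (coefficients taken from the log-model output above):

```python
# Predicted change in math4 (in points) from a given percentage change in a
# logged regressor: delta_math4 = (coef / 100) * pct_change.
def level_change(coef, pct_change):
    return (coef / 100) * pct_change

salary_effect = level_change(18.13, 10)  # ~1.8 points for 10% higher salaries
enrol_effect = level_change(2.62, 10)    # ~0.26 points for 10% higher enrollment
```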
Reminder: When we report the results of, say, the second regression,
it looks like
math4^ = −116.68 − .289 lunch − .955 str + 18.13 lavgsal + 2.62 lenrol
         (37.28)   (.024)       (.182)     (3.61)          (1.26)
n = 923, R² = .237
so that standard errors are below coefficients.
When we reject H0 : βj = 0 against H1 : βj ̸= 0, we often say that β̂j is
statistically different from zero, and we usually mention a significance level. For
example, if we can reject at the 1% level, we say that. If we can reject at the
10% level but not the 5%, we say that.
As in the one-sided case, we also say β̂j is “statistically significant” when we
can reject H0 : βj = 0.
Testing Other Hypotheses about the βj
Testing the null H0 : βj = 0 is by far the most common. That is why
Stata and other regression packages automatically report the t
statistic for this hypothesis.
It is critical to remember that
tβ̂j = β̂j/se(β̂j)
is only for H0 : βj = 0.
What if we want to test a different null value? For example, in a
constant-elasticity consumption function,
log(cons) = β0 + β1 log(inc) + β2famsize + β3pareduc + u
we might want to test
H0 : β1 = 1
which means an income elasticity equal to one. (We can be pretty
sure that β1 > 0.)
More generally, suppose the null is
H0 : βj = aj
where we specify the value aj (usually zero, but, in the consumption
example, aj = 1).
It is easy to extend the t statistic:
t = (β̂j − aj)/se(β̂j)
This t statistic just measures how far our estimate, β̂j, is from the
hypothesized value, aj, relative to se(β̂j).
A useful expression for general t testing:
t = (estimate − hypothesized value)/standard error
The alternative can be one-sided or two-sided.
We choose critical values in exactly the same way as before.
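The general t statistic is the same one-line computation regardless of the null value (the numbers below are hypothetical):

```python
def t_statistic(estimate, hypothesized, se):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / se

# H0: beta_j = 0 is the special case hypothesized = 0.
t_default = t_statistic(0.5, 0.0, 0.25)  # 2.0
# Testing, say, H0: beta_j = 1 (a unit elasticity) uses the same estimate
# and standard error but a different center.
t_unit = t_statistic(0.5, 1.0, 0.25)     # -2.0
```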
The language needs to be suitably modified. If, for example,
H0 : βj = 1
H1 : βj ̸= 1
is rejected at the 5% level, we say “β̂j is statistically different from
one at the 5% level.” Otherwise, β̂j is “not statistically different from
one.” If the alternative is H1 : βj > 1, then “β̂j is statistically greater
than one at the 5% level.”
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 62 / 104
EXAMPLE: Crime and enrollment on college campuses
(CAMPUS.DTA)
A simple regression model:
log(crime) = β0 + β1 log(enroll) + u
H0 : β1 = 1
H1 : β1 > 1
We get β̂1 = 1.27, and so a 1% increase in enrollment is estimated to
increase crime by 1.27% (so more than 1%). Is this estimate statistically
greater than one?
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 63 / 104
Crime and enrollment on college campuses
(CAMPUS.DTA)
We cannot pull the t statistic off of the usual Stata output. We can
compute it by hand (rounding the estimate and standard error):
t = (1.270 − 1)/.110 ≈ 2.45
(Note how this is much smaller than the t for H0 : β1 = 0, reported
by Stata.)
We have df = 97 − 2 = 95, so we use the df = 120 entry in Table
G.2. The 1% cv for a one-sided alternative is about 2.36, so we reject
at the 1% significance level.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 64 / 104
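The hand calculation on this slide can be checked directly. A sketch in Python with scipy (the slides use Stata; this is just a cross-check with the rounded numbers from the slide):

```python
# Campus crime example: H0: beta1 = 1 vs H1: beta1 > 1, df = 97 - 2 = 95.
from scipy import stats

b1, se, df = 1.270, 0.110, 95     # rounded estimate and standard error
t = (b1 - 1) / se                 # about 2.45
p_one_sided = stats.t.sf(t, df)   # exact one-sided p-value, below .01
```

With the exact df = 95 (rather than the df = 120 table entry), the one-sided p-value is below 1%, agreeing with the rejection at the 1% level.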
. reg lcrime lenroll
Source | SS df MS Number of obs = 97
-------------+------------------------------ F( 1, 95) = 133.79
Model | 107.083654 1 107.083654 Prob > F = 0.0000
Residual | 76.0358244 95 .800377098 R-squared = 0.5848
-------------+------------------------------ Adj R-squared = 0.5804
Total | 183.119479 96 1.90749457 Root MSE = .89464
------------------------------------------------------------------------------
lcrime | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lenroll | 1.26976 .109776 11.57 0.000 1.051827 1.487693
_cons | -6.63137 1.03354 -6.42 0.000 -8.683206 -4.579533
------------------------------------------------------------------------------
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 65 / 104
Alternatively, we can let Stata do the work using the lincom (“linear combination”) command. Here the
null is stated equivalently as
H0 : β1 − 1 = 0
. lincom lenroll - 1
( 1) lenroll = 1
------------------------------------------------------------------------------
lcrime | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | .2697603 .109776 2.46 0.016 .0518273 .4876932
------------------------------------------------------------------------------
The t = 2.46 is the more accurate calculation of the t statistic.
The lincom lenroll - 1 command is Stata’s way of saying “test whether βlenroll − 1 equals zero.”
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 66 / 104
Computing p-Values for t Tests
The traditional approach to testing, where we choose a significance
level ahead of time, can be cumbersome.
Plus, it can conceal information. For example, suppose that, for
testing against a two-sided alternative, a t statistic is just below the
5% cv. I could simply say that “I fail to reject H0 : βj = 0 against the
two-sided alternative at the 5% level.” But there is nothing sacred
about 5%. Might I reject at, say, 6%?
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 67 / 104
Computing p-Values for t Tests
Rather than have to specify a level ahead of time, or discuss different
traditional significance levels (10%, 5%, 1%), it is better to answer
the following question: Given the observed value of the t statistic,
what is the smallest significance level at which I can reject H0?
The smallest level at which the null can be rejected is known as the
p-value of a test. It is a single number that automatically allows us to
carry out the test at any level.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 68 / 104
One way to think about the p-value is that it uses the observed
statistic as the critical value, and then finds the significance level of
the test using that critical value.
It is most common to report p-values for two-sided alternatives. This
is what Stata does. The t tables are not detailed enough.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 69 / 104
For t testing against a two-sided alternative,
p-value = P(|T| > |t|)
where t is the value of the t statistic and T is a random variable with
the tdf distribution.
The p-value is a probability, so it is between zero and one.
Perhaps the best way to think about p-values: it is the probability of
observing a statistic as extreme as we did if the null hypothesis is true.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 70 / 104
So smaller p-values provide more evidence against the null. For
example, if p-value = .50, then there is a 50% chance of observing a
t as large as we did (in absolute value). This is not enough evidence
against H0.
If p-value = .001, then the chance of seeing a t statistic as extreme
as we did is .1%. We can conclude that we got a very rare sample –
which is not helpful – or that the null hypothesis is very likely false.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 71 / 104
From
p-value = P(|T| > |t|)
we see that as |t| increases the p-value decreases. Large absolute t
statistics are associated with small p-values.
Suppose df = 40 and, from our data, we obtain t = 1.85 or
t = −1.85. Then
p-value = P(|T| > 1.85) = 2P(T > 1.85) = 2(.0359) = .0718
where T ∼ t40. Finding the actual numbers requires using Stata.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 72 / 104
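The df = 40, t = 1.85 calculation above can be reproduced in one line. A sketch in Python with scipy (an assumption; the slides compute this in Stata):

```python
# Two-sided p-value for t = 1.85 with df = 40, as on the slide:
# p = P(|T| > 1.85) = 2 * P(T > 1.85) for T ~ t_40.
from scipy import stats

t, df = 1.85, 40
p = 2 * stats.t.sf(abs(t), df)    # approximately .0718
```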
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 73 / 104
Given the p-value, we can carry out a test at any significance level. If α is
the chosen level, then
Reject H0 if p-value < α
For example, in the previous example we obtained p-value = .0718.
This means that we reject H0 at the 10% level but not the 5% level.
We reject at 8% but not (quite) at 7%.
Knowing p-value = .0718 is clearly much better than just saying “I
fail to reject at the 5% level.”
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 74 / 104
Computing p-Values for One-Sided Alternatives
Stata and other packages report the two-sided p-value. How can we
get a one-sided p-value?
With a caveat, the answer is simple:
one-sided p-value = (two-sided p-value)/2
We only want the area in one tail, not two tails. The two-sided
p-value gives us the area in both tails.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 75 / 104
This is the correct calculation when it is interesting to do the
calculation. The caveat is simple: if the estimated coefficient is not in
the direction of the alternative, the one-sided p-value is above .50,
and so it is not an interesting calculation.
In Stata, the two-sided p-values for H0 : βj = 0 are given in the
column labeled P>|t|.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 76 / 104
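The halving rule and its caveat can be captured in a small helper. A hedged sketch in Python (the function name and structure are my own, not from the slides):

```python
# Converting a two-sided p-value to a one-sided one, respecting the caveat:
# if the estimate points AWAY from the alternative, the one-sided p-value
# exceeds .5 and the calculation is uninteresting.
from scipy import stats

def one_sided_p(t, df, positive_alternative=True):
    two_sided = 2 * stats.t.sf(abs(t), df)
    same_direction = (t > 0) == positive_alternative
    return two_sided / 2 if same_direction else 1 - two_sided / 2
```

For the points coefficient below (t = 2.41, df = 263), this gives roughly .017/2 = .0085 against the positive alternative.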
EXAMPLE: Factors Affecting NBA Salaries
(NBASAL.DTA)
. des wage games mingame points rebounds assists
storage display value
variable name type format label variable label
------------------------------------------------------------------------------------------------
wage float %9.0g annual salary, thousands $
games byte %9.0g average games per year
mingame float %9.0g minutes per game
points float %9.0g points per game
rebounds float %9.0g rebounds per game
assists float %9.0g assists per game
. sum wage games mingame points rebounds assists
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
wage | 269 1423.828 999.7741 150 5740
games | 269 65.72491 18.85111 3 82
mingame | 269 23.97925 9.731177 2.888889 43.08537
points | 269 10.21041 5.900667 1.2 29.8
rebounds | 269 4.401115 2.892573 .5 17.3
-------------+--------------------------------------------------------
assists | 269 2.408922 2.092986 0 12.6
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 77 / 104
Factors Affecting NBA Salaries (NBASAL.DTA)
Use lwage = log(wage) to get constant percentage effects.
. reg lwage games mingame points rebounds assists
Source | SS df MS Number of obs = 269
-------------+------------------------------ F( 5, 263) = 40.27
Model | 90.2698185 5 18.0539637 Prob > F = 0.0000
Residual | 117.918945 263 .448361006 R-squared = 0.4336
-------------+------------------------------ Adj R-squared = 0.4228
Total | 208.188763 268 .776823743 Root MSE = .6696
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
games | .0004132 .002682 0.15 0.878 -.0048679 .0056942
mingame | .0302278 .0130868 2.31 0.022 .0044597 .055996
points | .0363734 .0150945 2.41 0.017 .0066519 .0660949
rebounds | .0406795 .0229455 1.77 0.077 -.0045007 .0858597
assists | .0003665 .0314393 0.01 0.991 -.0615382 .0622712
_cons | 5.648996 .1559075 36.23 0.000 5.34201 5.955982
------------------------------------------------------------------------------
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 78 / 104
Forgetting the intercept (or “constant”), none of the variables is
statistically significant at the 1% level against a two-sided alternative.
The closest is points, with p-value = .017. (The one-sided p-value is
.017/2 = .0085 < .01, so it is significant at the 1% level against the
positive one-sided alternative.)
mingame is statistically significant at the 5% level because p-value
= .022 < .05.
rebounds is statistically significant at the 10% level (against a
two-sided alternative) because p-value = .077 < .10, but not at the
5% level. But the one-sided p-value is .077/2 = .0385 < .05, so it is
significant at the 5% level against a one-sided alternative.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 79 / 104
Both games and assists have very small t statistics, which lead to
p-values close to one (for example, for assists, p-value = .991).
These variables are statistically insignificant.
In some applications, p-values equal to zero up to three decimal
places are not uncommon. We do not have to worry about statistical
significance in such cases.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 80 / 104
Using WAGE2.DTA:
. reg lwage educ IQ exper motheduc
Source | SS df MS Number of obs = 759
-------------+------------------------------ F( 4, 754) = 54.26
Model | 58.7293322 4 14.682333 Prob > F = 0.0000
Residual | 204.019312 754 .270582642 R-squared = 0.2235
-------------+------------------------------ Adj R-squared = 0.2194
Total | 262.748644 758 .346634095 Root MSE = .52018
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1006798 .0118813 8.47 0.000 .0773555 .124004
IQ | .00735 .0016068 4.57 0.000 .0041957 .0105043
exper | .0449386 .0084136 5.34 0.000 .0284217 .0614555
motheduc | .0239265 .0095623 2.50 0.013 .0051545 .0426985
_cons | -.3837064 .2373921 -1.62 0.106 -.8497344 .0823215
------------------------------------------------------------------------------
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 81 / 104
Language of Hypothesis Testing
If we do not reject H0 (against any alternative), it is better to say “we
fail to reject H0” as opposed to “we accept H0,” which is somewhat
common.
The reason is that many null hypotheses cannot be rejected in any
application. For example, if I have β̂j = .75 and se(β̂j ) = .25, I do
not say that I “accept H0 : βj = 1.”
I fail to reject because the t statistic is (.75 − 1)/.25 = −1.
But the t statistic for H0 : βj = .5 is (.75 − .5)/.25 = 1, so I cannot
reject H0 : βj = .5, either.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 82 / 104
Clearly βj = .5 and βj = 1 cannot both be true. There is a single,
unknown value in the population. So I should not “accept” either.
The outcomes of the t tests tell us the data cannot reject either
hypothesis. Nor can the data reject H0 : βj = .6, and so on. The data
does reject H0 : βj = 0 (t = 3) at a pretty small significance level (if
we have a reasonable df .)
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 83 / 104
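The point that the data cannot reject many nulls at once is easy to see numerically. A small sketch (mine, not from the slides) using the β̂j = .75, se = .25 example:

```python
# With b_hat = .75 and se = .25, several different nulls all survive a
# t test at conventional levels -- which is why we "fail to reject"
# rather than "accept" any one of them.
b_hat, se = 0.75, 0.25
for a in (1.0, 0.6, 0.5, 0.0):
    t = (b_hat - a) / se
    print(f"H0: beta = {a}: t = {t:.2f}")
# |t| is at most 1 for a = 1, .6, .5; only a = 0 gives a large t (t = 3).
```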
Practical versus Statistical Significance
t testing is purely about statistical significance. It does not directly
speak to the issue of whether a variable has a practically, or
economically, large effect.
Practical (Economic) Significance depends on the size (and sign)
of β̂j .
Statistical Significance depends on tβ̂j
.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 84 / 104
It is possible to estimate practically large effects but have the estimates
so imprecise that they are statistically insignificant. This is especially
an issue with small data sets (but not only small data sets).
Even more importantly, it is possible to get estimates that are
statistically significant – often with very small p-values – but are not
practically large. This can happen with very large data sets.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 85 / 104
EXAMPLE
Suppose that, using a large cross section data set for teenagers across
the U.S.,
we estimate the elasticity of alcohol demand with respect to price to
be −.013 with se = .002.
Then the t statistic is −6.5, and we need look no further to conclude
the elasticity is statistically different from zero. But the estimate
means that, say, a 10% increase in the price of alcohol reduces
demand by an estimated .13%. This is a small effect.
The bottom line: do not just fixate on t statistics! Interpreting the β̂j
is just as important.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 86 / 104
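The arithmetic behind the alcohol-demand example is worth making explicit. A two-line sketch (illustrative, using the numbers from the slide):

```python
# Statistically very significant, practically small: the elasticity is
# precisely estimated (t = -6.5) but tiny in economic terms.
elasticity, se = -0.013, 0.002
t = elasticity / se                 # -6.5
effect_of_10pct = 10 * elasticity   # a 10% price rise changes demand by -.13%
```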
Confidence Intervals
Rather than just testing hypotheses about parameters it is also useful
to construct confidence intervals (also known as interval estimates).
Loosely, the CI is supposed to give a “likely” range of values for the
corresponding population parameter.
We will only consider CIs of the form
β̂j ± c · se(β̂j )
where c > 0 is chosen based on the confidence level.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 87 / 104
We will use a 95% confidence level, in which case c comes from the
97.5th percentile of the tdf distribution. In other words, c is the 5%
critical value against a two-sided alternative.
Stata automatically reports a 95% CI for each parameter, based on
the t distribution using the appropriate df .
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 88 / 104
. reg lwage games mingame points rebounds assists
Source | SS df MS Number of obs = 269
-------------+------------------------------ F( 5, 263) = 40.27
Model | 90.2698185 5 18.0539637 Prob > F = 0.0000
Residual | 117.918945 263 .448361006 R-squared = 0.4336
-------------+------------------------------ Adj R-squared = 0.4228
Total | 208.188763 268 .776823743 Root MSE = .6696
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
games | .0004132 .002682 0.15 0.878 -.0048679 .0056942
mingame | .0302278 .0130868 2.31 0.022 .0044597 .055996
points | .0363734 .0150945 2.41 0.017 .0066519 .0660949
rebounds | .0406795 .0229455 1.77 0.077 -.0045007 .0858597
assists | .0003665 .0314393 0.01 0.991 -.0615382 .0622712
_cons | 5.648996 .1559075 36.23 0.000 5.34201 5.955982
------------------------------------------------------------------------------
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 89 / 104
Notice how the three estimates that are not statistically different from
zero at the 5% level – games, rebounds, and assists – all have 95%
CIs that include zero. For example, the 95% CI for βrebounds is
[−.0045, .0859]
By contrast, the 95% CI for βpoints is
[.0067, .0661]
which excludes zero.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 90 / 104
A simple rule-of-thumb is useful for constructing a CI given the
estimate and its standard error. For, say, df ≥ 60, an approximate
95% CI is
β̂j ± 2se(β̂j ) or [β̂j − 2se(β̂j ), β̂j + 2se(β̂j )]
That is, subtract and add twice the standard error to the estimate.
(In the case of the standard normal, the 2 becomes 1.96.)
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 91 / 104
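The rule β̂j ± c · se(β̂j ) can be checked against the Stata output above. A sketch in Python with scipy (the numbers are the points coefficient from the NBA regression):

```python
# 95% CI for beta_points: c is the 97.5th percentile of t with df = 263,
# which is close to the rule-of-thumb value of 2.
from scipy import stats

b, se, df = 0.0363734, 0.0150945, 263
c = stats.t.ppf(0.975, df)          # about 1.97
lo, hi = b - c * se, b + c * se     # reproduces Stata's [.0066519, .0660949]
```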
Properly interpeting a CI is a bit tricky. One often sees statements
such as “there is a 95% chance that βpoints is in the interval
[.0067, .0661].” This is incorrect. βpoints is some fixed value, and it
either is or is not in the interval.
The correct way to interpret a CI is to remember that the endpoints,
β̂j − c · se(β̂j ) and β̂j + c · se(β̂j ), change with each sample (or at
least can change). That is, the endpoints are random outcomes that
depend on the data we draw.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 92 / 104
What a 95% CI means is that for 95% of the random samples that we
draw from the population, the interval we compute using the rule
β̂j ± c · se(β̂j ) will include the value βj . But for a particular sample
we do not know whether βj is in the interval.
This is similar to the idea that unbiasedness of β̂j does not means
that β̂j = βj . Most of the time β̂j is not βj . Unbiasedness means
E(β̂j ) = βj .
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 93 / 104
CIs and Hypothesis Testing
If we have constructed a 95% CI for, say, βj , we can test any null
value against a two-sided alternative, at the 5% level. So
H0 : βj = aj
H1 : βj ̸= aj
1. If aj is in the 95% CI, then we fail to reject H0 at the 5% level.
2. If aj is not in the 95% CI then we reject H0 in favor of H1 at the
5% level.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 94 / 104
Note that, measured as percents,
significance level = 100 − confidence level
. reg lwage educ IQ exper motheduc
Source | SS df MS Number of obs = 759
-------------+------------------------------ F( 4, 754) = 54.26
Model | 58.7293322 4 14.682333 Prob > F = 0.0000
Residual | 204.019312 754 .270582642 R-squared = 0.2235
-------------+------------------------------ Adj R-squared = 0.2194
Total | 262.748644 758 .346634095 Root MSE = .52018
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1006798 .0118813 8.47 0.000 .0773555 .124004
IQ | .00735 .0016068 4.57 0.000 .0041957 .0105043
exper | .0449386 .0084136 5.34 0.000 .0284217 .0614555
motheduc | .0239265 .0095623 2.50 0.013 .0051545 .0426985
_cons | -.3837064 .2373921 -1.62 0.106 -.8497344 .0823215
------------------------------------------------------------------------------
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 95 / 104
The 95% CI for βIQ is about [.0042, .0105]. So we can reject
H0 : βIQ = 0 against the two-sided alternative at the 5% level. We
cannot reject H0 : βIQ = .01 (although it is close).
We can reject a return to schooling of 7.5% as too low, and a return
of 12.5% as too high.
Just as with hypothesis testing, these CIs are only as good as the
underlying assumptions. If we have omitted key variables, the β̂j are
biased. If the error variance is not constant, the standard errors are
improperly computed.
With df = 754, we will see later that normality is not very important.
But normality is needed for these CIs to be exact 95% CIs.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 96 / 104
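The CI-based tests for βIQ can be verified directly. A sketch in Python (an illustration of the duality, using the coefficient and standard error from the output above):

```python
# CI-testing duality for beta_IQ (df = 754): a null value a_j is rejected
# at the 5% level exactly when it falls outside the 95% CI.
from scipy import stats

b, se, df = 0.00735, 0.0016068, 754
c = stats.t.ppf(0.975, df)
lo, hi = b - c * se, b + c * se           # about [.0042, .0105]
reject_zero = not (lo <= 0.0 <= hi)       # 0 is outside: reject H0: beta = 0
reject_point01 = not (lo <= 0.01 <= hi)   # .01 is (barely) inside: fail to reject
```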
Testing Single Linear Restrictions
So far, we have discussed testing hypotheses that involve only one
parameter, βj . But some hypotheses involve many parameters.
EXAMPLE: Are the Returns to a Year of Junior College the Same as
for a Four-Year University? (COLLEGE.DTA). Sample of high school
graduates.
lwage = β0 + β1jc + β2univ + β3exper + u
H0 : β1 = β2
H1 : β1 < β2
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 97 / 104
We could use a two-sided alternative, too.
We can also write
H0 : β1 − β2 = 0
Remember the general way to construct a t statistic:
t =
(estimate − hypothesized value)
standard error
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 98 / 104
Given the OLS estimates β̂1 and β̂2,
t = (β̂1 − β̂2)/se(β̂1 − β̂2)
Problem: The OLS output gives us β̂1 and β̂2 and their standard
errors, but that is not enough to obtain se(β̂1 − β̂2).
Recall a fact about variances:
Var(β̂1 − β̂2) = Var(β̂1) + Var(β̂2) − 2Cov(β̂1, β̂2)
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 99 / 104
The standard error is an estimate of the square root:
se(β̂1 − β̂2) = {[se(β̂1)]2 + [se(β̂2)]2 − 2s12}1/2
where s12 is an estimate of Cov(β̂1, β̂2). This is the piece we are
missing.
Stata will report s12 if we ask, but calculating se(β̂1 − β̂2) is
cumbersome. There is also a trick of rewriting the model (see text,
Section 4.4).
These days, it is easiest to use a command for testing linear functions
of the coefficients. In Stata, it is lincom.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 100 / 104
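The variance formula for a difference can be illustrated with the jc/univ numbers that appear below. A sketch in Python; note that s12 is backed out from the reported standard errors here (an assumption for illustration; in practice Stata computes it from the covariance matrix):

```python
# se(b1 - b2) = sqrt(se1^2 + se2^2 - 2*s12), using the college example.
import math

se1, se2 = 0.0202773, 0.0068935   # se(b_jc), se(b_univ) from the regression
se_diff = 0.0206407               # se(b_jc - b_univ), as lincom reports it
# implied covariance estimate (backed out so the pieces are consistent):
s12 = (se1**2 + se2**2 - se_diff**2) / 2
# forward direction: rebuild se_diff from its three pieces
se_diff_check = math.sqrt(se1**2 + se2**2 - 2 * s12)
t = (0.0661471 - 0.0836956) / se_diff   # about -0.85, matching Stata
```

The missing piece s12 is exactly why the usual regression output alone is not enough; lincom handles it automatically.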
. des lwage jc univ exper
storage display value
variable name type format label variable label
-----------------------------------------------------------------------------
lwage float %9.0g log(wage)
jc float %9.0g total 2-year credits
univ float %9.0g total 4-year credits
exper float %8.0g work experience, years
. sum lwage jc univ exper
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
lwage | 750 2.233674 .4906276 .6931472 3.901973
jc | 750 .3449006 .7731012 0 3.833333
univ | 750 1.817076 2.276202 0 7.5
exper | 750 10.26722 2.713302 .25 13.83333
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 101 / 104
. reg lwage jc univ exper
Source | SS df MS Number of obs = 750
-------------+------------------------------ F( 3, 746) = 86.25
Model | 46.4300797 3 15.4766932 Prob > F = 0.0000
Residual | 133.86575 746 .179444705 R-squared = 0.2575
-------------+------------------------------ Adj R-squared = 0.2545
Total | 180.295829 749 .240715393 Root MSE = .42361
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
jc | .0661471 .0202773 3.26 0.001 .0263398 .1059544
univ | .0836956 .0068935 12.14 0.000 .0701626 .0972287
exper | .0653706 .0057692 11.33 0.000 .0540448 .0766964
_cons | 1.387603 .0636388 21.80 0.000 1.262671 1.512536
------------------------------------------------------------------------------
Note that β̂jc − β̂univ = .0661 − .0837 = −.0176, so the estimated return to univ is about 1.8% higher.
But is the difference statistically significant?
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 102 / 104
. reg lwage jc univ exper
Source | SS df MS Number of obs = 750
-------------+------------------------------ F( 3, 746) = 86.25
Model | 46.4300797 3 15.4766932 Prob > F = 0.0000
Residual | 133.86575 746 .179444705 R-squared = 0.2575
-------------+------------------------------ Adj R-squared = 0.2545
Total | 180.295829 749 .240715393 Root MSE = .42361
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
jc | .0661471 .0202773 3.26 0.001 .0263398 .1059544
univ | .0836956 .0068935 12.14 0.000 .0701626 .0972287
exper | .0653706 .0057692 11.33 0.000 .0540448 .0766964
_cons | 1.387603 .0636388 21.80 0.000 1.262671 1.512536
------------------------------------------------------------------------------
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 103 / 104
. lincom jc - univ
( 1) jc - univ = 0
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | -.0175485 .0206407 -0.85 0.395 -.0580694 .0229723
------------------------------------------------------------------------------
The two-sided p-value is .395, which means the one-sided p-value is .1975. Even against a one-sided
alternative, we cannot reject H0 : βjc = βuniv at even the 20% level.
Note how much more variation there is in univ compared with jc.
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 104 / 104
Of course, nothing changes (except the sign of the estimate) if we use βuniv − βjc:
. lincom univ - jc
( 1) - jc + univ = 0
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | .0175485 .0206407 0.85 0.395 -.0229723 .0580694
------------------------------------------------------------------------------
Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 105 / 104
Stock Market Brief Deck for 4/24/24 .pdfMichael Silva
 
Unveiling the Top Chartered Accountants in India and Their Staggering Net Worth
Unveiling the Top Chartered Accountants in India and Their Staggering Net WorthUnveiling the Top Chartered Accountants in India and Their Staggering Net Worth
Unveiling the Top Chartered Accountants in India and Their Staggering Net WorthShaheen Kumar
 
Classical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithClassical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithAdamYassin2
 
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办fqiuho152
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdfHenry Tapper
 
Vp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppVp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppmiss dipika
 
How Automation is Driving Efficiency Through the Last Mile of Reporting
How Automation is Driving Efficiency Through the Last Mile of ReportingHow Automation is Driving Efficiency Through the Last Mile of Reporting
How Automation is Driving Efficiency Through the Last Mile of ReportingAggregage
 
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...shivangimorya083
 
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfBPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfHenry Tapper
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfHenry Tapper
 
Call Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Monthly Market Risk Update: April 2024 [SlideShare]
Monthly Market Risk Update: April 2024 [SlideShare]Monthly Market Risk Update: April 2024 [SlideShare]
Monthly Market Risk Update: April 2024 [SlideShare]Commonwealth
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一S SDS
 
Instant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School SpiritInstant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School Spiritegoetzinger
 
SBP-Market-Operations and market managment
SBP-Market-Operations and market managmentSBP-Market-Operations and market managment
SBP-Market-Operations and market managmentfactical
 
call girls in Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in  Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in  Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...
 
Mulki Call Girls 7001305949 WhatsApp Number 24x7 Best Services
Mulki Call Girls 7001305949 WhatsApp Number 24x7 Best ServicesMulki Call Girls 7001305949 WhatsApp Number 24x7 Best Services
Mulki Call Girls 7001305949 WhatsApp Number 24x7 Best Services
 
Log your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaignLog your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaign
 
Stock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfStock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdf
 
Unveiling the Top Chartered Accountants in India and Their Staggering Net Worth
Unveiling the Top Chartered Accountants in India and Their Staggering Net WorthUnveiling the Top Chartered Accountants in India and Their Staggering Net Worth
Unveiling the Top Chartered Accountants in India and Their Staggering Net Worth
 
Classical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithClassical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam Smith
 
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdf
 
Vp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppVp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsApp
 
How Automation is Driving Efficiency Through the Last Mile of Reporting
How Automation is Driving Efficiency Through the Last Mile of ReportingHow Automation is Driving Efficiency Through the Last Mile of Reporting
How Automation is Driving Efficiency Through the Last Mile of Reporting
 
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
 
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdfBPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
BPPG response - Options for Defined Benefit schemes - 19Apr24.pdf
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
 
Call Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Maya Call 7001035870 Meet With Nagpur Escorts
 
Monthly Market Risk Update: April 2024 [SlideShare]
Monthly Market Risk Update: April 2024 [SlideShare]Monthly Market Risk Update: April 2024 [SlideShare]
Monthly Market Risk Update: April 2024 [SlideShare]
 
🔝+919953056974 🔝young Delhi Escort service Pusa Road
🔝+919953056974 🔝young Delhi Escort service Pusa Road🔝+919953056974 🔝young Delhi Escort service Pusa Road
🔝+919953056974 🔝young Delhi Escort service Pusa Road
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
 
Instant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School SpiritInstant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School Spirit
 
SBP-Market-Operations and market managment
SBP-Market-Operations and market managmentSBP-Market-Operations and market managment
SBP-Market-Operations and market managment
 
call girls in Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in  Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in  Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 

Chapter6.pdf.pdf

EXAMPLE (motivated by ATTEND.DTA):

final = β0 + β1 missed + β2 priGPA + β3 ACT + u

where ACT is the achievement test score. The null hypothesis, that missing lecture has no effect on final exam performance (after accounting for prior MSU GPA and ACT score), is

H0: β1 = 0
Sampling Distributions of the OLS Estimators

To test hypotheses about the βj using exact (or "finite sample") testing procedures, we need to know more than just the mean and variance of the OLS estimators.

Under MLR.1 to MLR.4, we can compute the expected value: E(β̂j) = βj.

Under MLR.1 to MLR.5, we know the variance:

Var(β̂j) = σ² / [SSTj (1 − Rj²)]

And σ̂² = SSR/(n − k − 1) is an unbiased estimator of σ².
Sampling Distributions of the OLS Estimators

But hypothesis testing relies on the entire sampling distributions of the β̂j. Even under MLR.1 through MLR.5, the sampling distributions can be virtually anything. Write

β̂j = βj + Σ_{i=1}^{n} wij ui,

where the wij are functions of {(xi1, ..., xik): i = 1, ..., n}. Conditional on {(xi1, ..., xik): i = 1, ..., n}, β̂j inherits its distribution from that of {ui: i = 1, ..., n}, which is a random sample from the population distribution of u.
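The decomposition above can be checked numerically in the simple-regression case (k = 1), where wi = (xi − x̄)/SSTx. A minimal sketch with simulated data (not from the slides):

```python
import random

# Illustrative sketch (simulated data): for simple regression
# y = b0 + b1*x + u, the OLS slope satisfies the exact identity
#   b1_hat = b1 + sum_i w_i * u_i,   with  w_i = (x_i - xbar) / SST_x.
random.seed(42)
n, b0, b1 = 50, 1.0, 2.0
x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
y = [b0 + b1 * xi + ui for xi, ui in zip(x, u)]

xbar = sum(x) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)
b1_hat = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sst_x

w = [(xi - xbar) / sst_x for xi in x]
decomposed = b1 + sum(wi * ui for wi, ui in zip(w, u))

assert abs(b1_hat - decomposed) < 1e-10  # identity holds to machine precision
```

The identity is exact in every sample, which is why, conditional on the x's, the distribution of β̂j is fully determined by the distribution of the errors.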
Assumption MLR.6 (Normality)

The population error u is independent of (x1, ..., xk) and is normally distributed with mean zero and variance σ²:

u ∼ Normal(0, σ²)

Recall MLR.4: E(u|x1, ..., xk) = E(u) = 0, and MLR.5: Var(u|x1, ..., xk) = Var(u) = σ². Now MLR.6 imposes full independence between u and (x1, x2, ..., xk) (not just mean and variance independence), which is where the label of the xj as "independent variables" originated.
The important part of MLR.6 is that we have now made a very specific distributional assumption for u: the familiar bell-shaped normal curve.
Assumption MLR.6 (Normality)

Normality is by far the most common assumption, but the usual arguments about why normality is a good assumption are not always operative. Usually, the argument starts with the claim that u is the sum of many independent factors, say u = a1 + a2 + ... + am for "large" m, and then we can apply the central limit theorem. But what if the factors have very different distributions, or are multiplicative rather than additive?
Assumptions MLR.1–6

Ultimately, like Assumption MLR.5, Assumption MLR.6 is maintained for convenience. Fortunately, we will later see that, for approximate inference in large samples, we can drop MLR.6. For now we keep it: it is very difficult to perform exact statistical inference without Assumption MLR.6. Assumptions MLR.1 to MLR.6 are called the classical linear model (CLM) assumptions (for cross-sectional regression).
Normality

For practical purposes, think of CLM = Gauss-Markov + normality.

An important fact about independent normal random variables: any linear combination is also normally distributed. Because the ui are independent and identically distributed (iid) as Normal(0, σ²),

β̂j = βj + Σ_{i=1}^{n} wij ui ∼ Normal[βj, Var(β̂j)]

where we already know the formula for Var(β̂j):

Var(β̂j) = σ² / [SSTj (1 − Rj²)]
THEOREM (Normal Sampling Distributions)

Under the CLM Assumptions (and conditional on the sample outcomes of the explanatory variables),

β̂j ∼ Normal[βj, Var(β̂j)]

and so

(β̂j − βj) / sd(β̂j) ∼ Normal(0, 1)

The second result follows from a feature of the normal distribution: if W ∼ Normal then a + bW ∼ Normal for constants a and b.
Normality

The standardized random variable

(β̂j − βj) / sd(β̂j)

always has zero mean and variance one. Under MLR.6, it is also normally distributed. Notice that the standard normal distribution holds even when we do not condition on {(xi1, xi2, ..., xik): i = 1, ..., n}.
Testing Hypotheses About a Single Population Parameter

We cannot directly use the result

(β̂j − βj) / sd(β̂j) ∼ Normal(0, 1)

to test hypotheses about βj: sd(β̂j) depends on σ = sd(u), which is unknown. But we have σ̂ as an estimator of σ. Using this in place of σ gives us the standard error, se(β̂j).
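In the simple-regression case the substitution is easy to see concretely: se(β̂1) = σ̂/√SSTx with σ̂² = SSR/(n − k − 1). A hedged sketch on simulated data (the data and numbers are illustrative, not from the slides):

```python
import math
import random

# Illustrative sketch (simulated simple regression, k = 1): estimate sigma
# from the OLS residuals, then form the standard error of the slope.
random.seed(1)
n = 40
x = [random.uniform(0, 5) for _ in range(n)]
y = [3.0 + 0.5 * xi + random.gauss(0, 2) for xi in x]

xbar, ybar = sum(x) / n, sum(y) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
b0 = ybar - b1 * xbar

ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
sigma_hat = math.sqrt(ssr / (n - 2))      # df = n - k - 1 with k = 1
se_b1 = sigma_hat / math.sqrt(sst_x)      # standard error of the slope

assert se_b1 > 0
```

Because σ̂ varies from sample to sample, the standardized ratio with se(β̂j) in the denominator is no longer standard normal, which is the point of the next theorem.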
THEOREM (t Distribution for Standardized Estimators)

Under the CLM Assumptions,

(β̂j − βj) / se(β̂j) ∼ t_{n−k−1} = t_{df}

We will not prove this, as the argument is somewhat involved. It is replacing σ (an unknown constant) with σ̂ (an estimator that varies across samples) that takes us from the standard normal to the t distribution.
Distribution for Standardized Estimators

The t distribution also has a bell shape, but is more spread out than the Normal(0, 1):

E(t_df) = 0 if df > 1
Var(t_df) = df/(df − 2) > 1 if df > 2

We will never have very small df in this class. When df = 10, Var(t_df) = 1.25, which is 25% larger than the Normal(0, 1) variance. When df = 120, Var(t_df) ≈ 1.017 – only 1.7% larger than the standard normal.
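The two numerical claims follow directly from the variance formula, as a quick check shows:

```python
# The variance formula Var(t_df) = df / (df - 2) from the slide, for df > 2.
def t_var(df):
    """Variance of a t distribution with df > 2 degrees of freedom."""
    return df / (df - 2)

assert t_var(10) == 1.25              # 25% larger than the N(0,1) variance
assert round(t_var(120), 3) == 1.017  # only about 1.7% larger
```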
Distribution for Standardized Estimators

As df → ∞, t_df → Normal(0, 1). The difference is practically small for df > 120. The next graph plots a standard normal pdf against a t6 pdf.
Testing

We use the result on the t distribution to test the null hypothesis that xj has no partial effect on y: H0: βj = 0.

lwage = β0 + β1 educ + β2 exper + β3 tenure + u

H0: β2 = 0

In words: once we control for education and time on the current job (tenure), total workforce experience has no effect on lwage = log(wage).
Testing

To test H0: βj = 0, we use the t statistic (or t ratio),

t_β̂j = β̂j / se(β̂j)

This is the estimated coefficient divided by our estimate of β̂j's sampling standard deviation. In virtually all cases β̂j is not exactly equal to zero. When we use t_β̂j, we are measuring how far β̂j is from zero relative to its standard error.
Testing

Because se(β̂j) > 0, t_β̂j always has the same sign as β̂j. To use t_β̂j to test H0: βj = 0, we need to have an alternative. Some like to define t_β̂j as the absolute value, so it is always positive; this makes it cumbersome to test against one-sided alternatives.
Testing Against One-Sided Alternatives

First consider the alternative

H1: βj > 0

which means the null is effectively

H0: βj ≤ 0

Using a positive one-sided alternative, if we reject βj = 0 then we reject any βj < 0, too. We often just state H0: βj = 0 and act like we do not care about negative values.
Testing Against One-Sided Alternatives

If the estimated coefficient β̂j is negative, it provides no evidence against H0 in favor of H1: βj > 0. If β̂j is positive, the question is: how big does t_β̂j = β̂j/se(β̂j) have to be before we conclude H0 is "unlikely"?

The traditional approach to hypothesis testing:
Testing Against One-Sided Alternatives

1. Choose a null hypothesis: H0: βj = 0 (or H0: βj ≤ 0).
2. Choose an alternative hypothesis: H1: βj > 0.
3. Choose a significance level (or simply level, or size) for the test: the probability of rejecting the null hypothesis when it is in fact true (a Type I error). Suppose we use 5%, so the probability of committing a Type I error is .05.
4. Choose a critical value, c > 0, so that the rejection rule t_β̂j > c leads to a 5% level test.
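The four steps reduce to a one-line decision rule. A small sketch (the helper name and the example coefficients are hypothetical, not from the slides):

```python
# Hypothetical helper: the one-sided rejection rule from steps 1-4.
# Reject H0: beta_j = 0 in favor of H1: beta_j > 0 when t exceeds c.
def reject_one_sided_positive(beta_hat, se, c):
    t_stat = beta_hat / se
    return t_stat > c

# With df = 28 and a 5% level, c = 1.701 (Table G.2).
assert reject_one_sided_positive(0.50, 0.20, 1.701)       # t = 2.5 > 1.701
assert not reject_one_sided_positive(0.20, 0.20, 1.701)   # t = 1.0, fail to reject
```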
Testing Against One-Sided Alternatives

The key is that, under the null hypothesis,

t_β̂j ∼ t_{n−k−1} = t_{df}

and this is what we use to obtain the critical value, c. Suppose df = 28 and we use a 5% test. The critical value is c = 1.701, as can be gotten from Table G.2 (page 833 in 5e). The following picture shows that we are conducting a one-tailed test (and it is these entries that should be used in the table).
Testing Against One-Sided Alternatives

So, with df = 28, the rejection rule for H0: βj = 0 against H1: βj > 0, at the 5% level, is

t_β̂j > 1.701

We need a t statistic greater than 1.701 to conclude there is enough evidence against H0. If t_β̂j ≤ 1.701, we fail to reject H0 against H1 at the 5% significance level.

Suppose df = 28, but we want to carry out the test at a different significance level (often the 10% level or the 1% level):

c.10 = 1.313, c.05 = 1.701, c.01 = 2.467
Testing Against One-Sided Alternatives

If we want to reduce the probability of a Type I error, we must increase the critical value (so we reject the null less often). If we reject at, say, the 1% level, then we must also reject at any larger level. If we fail to reject at, say, the 10% level – so that t_β̂j ≤ 1.313 – then we will fail to reject at any smaller level.
Testing Against One-Sided Alternatives

With large sample sizes – certainly when df > 120 – we can use critical values from the standard normal distribution. These are the df = ∞ entries in Table G.2:

c.10 = 1.282, c.05 = 1.645, c.01 = 2.326

which we can round to 1.28, 1.65, and 2.33, respectively. The value 1.65 is especially common for a one-tailed test.
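The df = ∞ entries are just standard normal quantiles, so they can be reproduced without the table. A quick sketch using the Python standard library:

```python
from statistics import NormalDist

# The large-sample (df = infinity) one-sided critical values are quantiles
# of the standard normal distribution.
z = NormalDist()  # Normal(0, 1)

c10 = z.inv_cdf(0.90)   # 10% one-sided critical value
c05 = z.inv_cdf(0.95)   # 5%
c01 = z.inv_cdf(0.99)   # 1%

assert round(c10, 3) == 1.282
assert round(c05, 3) == 1.645
assert round(c01, 3) == 2.326
```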
EXAMPLE: Factors Affecting lwage (WAGE2.DTA)

In applications, it is helpful to label parameters with variable names to state hypotheses: βeduc, βIQ, and βexper, for example. Then H0: βexper = 0 says that workforce experience has no effect on wage once education and IQ have been accounted for.
EXAMPLE: Factors Affecting lwage (WAGE2.DTA)

lwage = −.229 + .107 educ + .0080 IQ + .0435 exper
        (.230)  (.012)      (.0016)    (.0084)

n = 759, R² = .217

The quantities in parentheses are still standard errors, not t statistics! It is easiest to read the t statistic off the Stata output, when available: texper = 5.17, which is well above the one-sided critical value at the 1% level, 2.33. In fact, the .5% critical value is about 2.58.
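The reported t statistic and the economic magnitude both follow from the coefficient and standard error. A sketch using the numbers from the Stata output:

```python
# Reproduce the exper t statistic from its coefficient and standard error
# (numbers taken from the Stata output for this regression).
b_exper, se_exper = 0.0435405, 0.0084242
t_exper = b_exper / se_exper
assert round(t_exper, 2) == 5.17   # matches the Stata output

# In a log-wage equation, 100*beta approximates the percentage effect of
# one more year of experience.
pct_effect = 100 * b_exper
assert round(pct_effect, 1) == 4.4   # about 4.4% per year
```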
EXAMPLE: Factors Affecting lwage (WAGE2.DTA)

The bottom line is that H0: βexper = 0 can be rejected against H1: βexper > 0 at very small significance levels. A t of 5.17 is very large. The estimated effect of exper – that is, its economic importance – is apparent: another year of experience, holding educ and IQ fixed, is estimated to be worth about 4.4%. The t statistics for educ and IQ are also very large; there is no need to even look up critical values.
. reg lwage educ IQ exper

      Source |       SS       df       MS              Number of obs =     759
-------------+------------------------------           F(  3,   755) =   69.78
       Model |  57.0352742     3  19.0117581           Prob > F      =  0.0000
    Residual |   205.71337   755   .27246804           R-squared     =  0.2171
-------------+------------------------------           Adj R-squared =  0.2140
       Total |  262.748644   758  .346634095           Root MSE      =  .52198

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1069849   .0116513     9.18   0.000      .084112   .1298578
          IQ |   .0080269   .0015893     5.05   0.000     .0049068   .0111469
       exper |   .0435405   .0084242     5.17   0.000     .0270028   .0600783
       _cons |   -.228922   .2299876    -1.00   0.320    -.6804132   .2225692
------------------------------------------------------------------------------
EXAMPLE: Does ACT score help predict college GPA?

GPA1.DTA contains n = 141 MSU students from the mid-1990s. All variables are self-reported. Consider controlling for high school GPA:

colGPA = β0 + β1 hsGPA + β2 ACT + u

H0: β2 = 0

From the Stata output, β̂2 = β̂ACT = .0094 and tACT = .87. Even at the 10% level (c = 1.28), we cannot reject H0 against H1: βACT > 0.
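The failure to reject is immediate once the t statistic is compared with the 10% critical value. A sketch using the numbers from the Stata output:

```python
# The ACT t statistic from the reported coefficient and standard error
# (numbers taken from the Stata output for this regression).
b_act, se_act = 0.009426, 0.0107772
t_act = b_act / se_act
assert round(t_act, 2) == 0.87   # matches the Stata output

c10 = 1.28                       # 10% one-sided large-sample critical value
assert not (t_act > c10)         # fail to reject H0: beta_ACT = 0
```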
Does ACT score help predict college GPA?

Because we fail to reject H0: βACT = 0, we say that "β̂ACT is statistically insignificant at the 10% level against a one-sided alternative."

It is also very important to see that the estimated effect of ACT is small. Three more points (slightly more than one standard deviation) only predicts a colGPA that is .0094(3) ≈ .028 higher – not even three one-hundredths of a grade point.
Does ACT score help predict college GPA?

By contrast, β̂hsGPA = .453 is large in a practical sense – each point on hsGPA is associated with about .45 points on colGPA – and thsGPA = 4.73 is very large. No critical values in Table G.2 with df = 141 − 3 = 138 are even close to 4. So "β̂hsGPA is statistically significant" at very small significance levels.

Notice what happens if we do not control for hsGPA. The simple regression estimate is .0271 with tACT = 2.49. The magnitude is still pretty modest, but we would conclude it is statistically different from zero at the 1% significance level using the standard normal critical value, 2.33.
Does ACT score help predict college GPA?

It is not clear why ACT has such a small, statistically insignificant effect. The sample size is small and the scores were self-reported. The survey was done in a couple of economics courses, so it is not a random sample of all MSU students.
. des colGPA hsGPA ACT

              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------
colGPA          float  %9.0g                  MSU GPA
hsGPA           float  %9.0g                  high school GPA
ACT             byte   %9.0g                  'achievement' score

. sum colGPA hsGPA ACT

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      colGPA |       141    3.056738    .3723103        2.2          4
       hsGPA |       141    3.402128    .3199259        2.4          4
         ACT |       141    24.15603    2.844252         16         33
. reg colGPA hsGPA ACT

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  2,   138) =   14.78
       Model |  3.42365506     2  1.71182753           Prob > F      =  0.0000
    Residual |  15.9824444   138  .115814814           R-squared     =  0.1764
-------------+------------------------------           Adj R-squared =  0.1645
       Total |  19.4060994   140  .138614996           Root MSE      =  .34032

------------------------------------------------------------------------------
      colGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hsGPA |   .4534559   .0958129     4.73   0.000     .2640047   .6429071
         ACT |    .009426   .0107772     0.87   0.383    -.0118838   .0307358
       _cons |   1.286328   .3408221     3.77   0.000      .612419   1.960237
------------------------------------------------------------------------------

. reg colGPA ACT

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  1,   139) =    6.21
       Model |  .829558811     1  .829558811           Prob > F      =  0.0139
    Residual |  18.5765406   139  .133644177           R-squared     =  0.0427
-------------+------------------------------           Adj R-squared =  0.0359
       Total |  19.4060994   140  .138614996           Root MSE      =  .36557

------------------------------------------------------------------------------
      colGPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |    .027064   .0108628     2.49   0.014     .0055862   .0485417
       _cons |   2.402979   .2642027     9.10   0.000     1.880604   2.925355
------------------------------------------------------------------------------
For the negative one-sided alternative, H1: βj < 0, we use a symmetric rule, but the rejection rule is

t_β̂j < −c

where c > 0 is chosen in the same way as in the positive case. With df = 18 and a 5% test, the critical value is c = 1.734, so the rejection rule is

t_β̂j < −1.734
Now we must see a significantly negative value for the t statistic to reject H0: βj = 0 in favor of H1: βj < 0.
EXAMPLE: Does missing lectures affect final exam performance?

final = β0 + β1 missed + β2 priGPA + β3 ACT + u

H0: β1 = 0, H1: β1 < 0

We get β̂1 = −.079, t_β̂1 = −2.25. The 5% cv is −1.65 and the 1% cv is −2.33. So we reject H0 in favor of H1 at the 5% level but not at the 1% level. The effect is not huge: 10 missed lectures, out of 32, lowers the final exam score by about .8 points – not even one point.
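The negative one-sided decision can be traced through with the reported numbers. A sketch using the coefficient and standard error from the Stata output:

```python
# One-sided test against H1: beta_1 < 0 for missed (numbers taken from
# the Stata output for this regression).
b_missed, se_missed = -0.0793386, 0.0352349
t_missed = b_missed / se_missed
assert round(t_missed, 2) == -2.25   # matches the Stata output

# Negative one-sided alternative: reject H0 when t < -c.
assert t_missed < -1.65              # reject at the 5% level
assert not (t_missed < -2.33)        # but not at the 1% level
```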
. reg final missed priGPA ACT

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  3,   676) =   56.79
       Model |  3032.09408     3  1010.69803           Prob > F      =  0.0000
    Residual |   12029.853   676  17.7956405           R-squared     =  0.2013
-------------+------------------------------           Adj R-squared =  0.1978
       Total |  15061.9471   679  22.1825435           Root MSE      =  4.2185

------------------------------------------------------------------------------
       final |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      missed |  -.0793386   .0352349    -2.25   0.025    -.1485216  -.0101556
      priGPA |   1.915294    .372614     5.14   0.000     1.183674   2.646914
         ACT |   .4010639   .0532268     7.54   0.000     .2965542   .5055736
       _cons |   12.37304   1.171961    10.56   0.000     10.07192   14.67416
------------------------------------------------------------------------------
If we do not control for ACT score, the effect of missed goes away. It turns out that missed and ACT are positively correlated: those with higher ACT scores miss more classes, on average.

. reg final missed priGPA

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  2,   677) =   52.48
       Model |  2021.72415     2  1010.86207           Prob > F      =  0.0000
    Residual |  13040.2229   677  19.2617768           R-squared     =  0.1342
-------------+------------------------------           Adj R-squared =  0.1317
       Total |  15061.9471   679  22.1825435           Root MSE      =  4.3888

------------------------------------------------------------------------------
       final |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      missed |   .0172012   .0341483     0.50   0.615    -.0498481    .0842504
      priGPA |   3.237554   .3419779     9.47   0.000      2.56609    3.909019
       _cons |   17.41567   1.000942    17.40   0.000     15.45035      19.381
------------------------------------------------------------------------------
Reminder about Testing

Our hypotheses involve the unknown population values, βj. If in our data set we obtain, say, β̂j = 2.75, we do not write the null hypothesis as H0 : 2.75 = 0 (which is obviously false). Nor do we write H0 : β̂j = 0 (which is also false except in the very rare case that our estimate is exactly zero).
Testing Against Two-Sided Alternatives

We do not test hypotheses about the estimate! We know what it is once we collect the sample. We hypothesize about the unknown population value, βj.

Sometimes we do not know ahead of time whether a variable definitely has a positive effect or a negative effect. Even in the example

final = β0 + β1missed + β2priGPA + β3ACT + u

it is conceivable that missing class helps final exam performance. (The extra time is used for studying, say.)
Generally, the null and alternative are

H0 : βj = 0
H1 : βj ̸= 0

Testing against the two-sided alternative is usually the default. It prevents us from looking at the regression results and then deciding on the alternative. Also, it is harder to reject H0 against the two-sided alternative, so it requires more evidence that xj actually affects y.
Two-Sided Alternatives

Now we reject if β̂j is sufficiently large in magnitude, either positive or negative. We again use the t statistic tβ̂j = β̂j /se(β̂j ), but now the rejection rule is

|tβ̂j | > c

This results in a two-tailed test, and those are the critical values we pull from Table G.2. For example, if we use a 5% level test and df = 25, the two-tailed cv is 2.06. The two-tailed cv is, in this case, the 97.5th percentile in the t25 distribution. (Compare the one-tailed cv, about 1.71, the 95th percentile in the t25 distribution.)
Two-Sided Alternatives

[Figure: rejection regions for the two-tailed t test]
EXAMPLE: Factors affecting math pass rates (MEAP98.DTA)

Run a multiple regression of math4 on lunch, str, avgsal, enrol. A priori, we might expect lunch to have a negative effect (it is essentially a school-level poverty rate), str to have a negative effect, and avgsal to have a positive effect. But we can still test against a two-sided alternative to avoid specifying the alternative ahead of time. enrol is clearly ambiguous.
With 923 observations, we can use the standard normal critical values. For a 10% test it is 1.65, for 5%, 1.96, and for 1%, cv = 2.58.

. des math4 lunch str avgsal enrol

              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
math4           byte   %9.0g                  pass rate, 4th grade math test
lunch           float  %9.0g                  % students eligible free lunch
str             float  %9.0g                  student-teacher ratio
avgsal          float  %9.0g                  average teacher salary
enrol           int    %9.0g                  school enrollment

. sum math4 lunch str avgsal enrol

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       math4 |       923    60.54713    19.71111          3        100
       lunch |       923    37.34231    26.21696          0      98.78
         str |       923    23.50704    3.755936        7.6       41.1
      avgsal |       923    47557.53    8577.373      13976      81045
       enrol |       923    403.5655    162.6491         18       1176
. reg math4 lunch str avgsal enrol

      Source |       SS       df       MS              Number of obs =     923
-------------+------------------------------           F(  4,   918) =   68.82
       Model |  82641.3258     4  20660.3315           Prob > F      =  0.0000
    Residual |  275581.374   918  300.197575           R-squared     =  0.2307
-------------+------------------------------           Adj R-squared =  0.2273
       Total |    358222.7   922  388.527874           Root MSE      =  17.326

------------------------------------------------------------------------------
       math4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lunch |  -.2911477   .0237168   -12.28   0.000    -.3376931   -.2446023
         str |  -.8354922   .1776196    -4.70   0.000     -1.18408   -.4869046
      avgsal |   .0003744    .000079     4.74   0.000     .0002194    .0005294
       enrol |   .0050858   .0036523     1.39   0.164     -.002082    .0122537
       _cons |   71.20066   4.302933    16.55   0.000     62.75593    79.64539
------------------------------------------------------------------------------

The variables lunch, str, and avgsal all have coefficients with the anticipated signs, and the absolute values of the t statistics are above 4. So we easily reject H0 : βj = 0 against H1 : βj ̸= 0. enrol is a different situation: tenrol = 1.39 < 1.65, so we fail to reject H0 at even the 10% significance level.
Functional form can make a difference. The math pass rates are capped at 100, so a diminishing effect in avgsal and enrol seems appropriate; these variables have lots of variation. So use the logarithm instead.

. reg math4 lunch str lavgsal lenrol

      Source |       SS       df       MS              Number of obs =     923
-------------+------------------------------           F(  4,   918) =   71.09
       Model |  84715.9491     4  21178.9873           Prob > F      =  0.0000
    Residual |  273506.751   918  297.937637           R-squared     =  0.2365
-------------+------------------------------           Adj R-squared =  0.2332
       Total |    358222.7   922  388.527874           Root MSE      =  17.261

------------------------------------------------------------------------------
       math4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lunch |  -.2886697   .0235046   -12.28   0.000    -.3347986   -.2425408
         str |  -.9549563   .1824296    -5.23   0.000    -1.312984   -.5969288
     lavgsal |   18.13305   3.605116     5.03   0.000     11.05782    25.20827
      lenrol |   2.622179   1.256434     2.09   0.037     .1563616    5.087996
       _cons |  -116.6793   37.28153    -3.13   0.002    -189.8462   -43.51239
------------------------------------------------------------------------------
Of course, all estimates change, but it is those on lavgsal and lenrol that are now much different. Before, we were measuring a dollar effect. But now, holding the other variables fixed,

∆math4 = (18.13/100)(%∆avgsal) = .1813(%∆avgsal)

So if, say, %∆avgsal = 10 – teacher salaries are 10 percent higher – math4 is estimated to increase by about 1.8 points.
Also,

∆math4 = (2.62/100)(%∆enrol) = .0262(%∆enrol)

so a 10% increase in enrollment is associated with a .26 point increase in math4. Notice how lenrol = log(enrol) is statistically significant at the 5% level: tlenrol = 2.09 > 1.96.
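The two level-log calculations above are simple enough to verify directly; a small Python sketch using the coefficients reported in the regression:

```python
# Level-log model: delta(math4) = (coef / 100) * (% change in x).
def level_log_effect(coef, pct_change):
    return (coef / 100) * pct_change

print(round(level_log_effect(18.13, 10), 2))  # 1.81 points for 10% higher avgsal
print(round(level_log_effect(2.62, 10), 3))   # 0.262 points for 10% higher enrol
```

These match the "about 1.8 points" and ".26 point" figures quoted on the slides.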
Reminder: When we report the results of, say, the second regression, it looks like

math4 = −116.68 − .289 lunch − .955 str + 18.13 lavgsal + 2.62 lenrol
        (37.28)   (.024)       (.182)     (3.61)          (1.26)

n = 923, R2 = .237

so that standard errors are below coefficients.
When we reject H0 : βj = 0 against H1 : βj ̸= 0, we often say that β̂j is statistically different from zero and usually mention a significance level. For example, if we can reject at the 1% level, we say that. If we can reject at the 10% level but not the 5%, we say that. As in the one-sided case, we also say β̂j is "statistically significant" when we can reject H0 : βj = 0.
Testing Other Hypotheses about the βj

Testing the null H0 : βj = 0 is by far the most common. That is why Stata and other regression packages automatically report the t statistic for this hypothesis. It is critical to remember that

tβ̂j = β̂j / se(β̂j )

is only for H0 : βj = 0.
What if we want to test a different null value? For example, in a constant-elasticity consumption function,

log(cons) = β0 + β1 log(inc) + β2famsize + β3pareduc + u

we might want to test

H0 : β1 = 1

which means an income elasticity equal to one. (We can be pretty sure that β1 > 0.)
More generally, suppose the null is

H0 : βj = aj

where we specify the value aj (usually zero, but, in the consumption example, aj = 1). It is easy to extend the t statistic:

t = (β̂j − aj ) / se(β̂j )

This t statistic just measures how far our estimate, β̂j, is from the hypothesized value, aj, relative to se(β̂j ).
A useful general expression for general t testing:

t = (estimate − hypothesized value) / standard error

The alternative can be one-sided or two-sided. We choose critical values in exactly the same way as before.
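The general recipe fits in a one-line helper; a minimal sketch (the numbers in the example calls are hypothetical, chosen only for illustration):

```python
# t = (estimate - hypothesized value) / standard error
def t_stat(estimate, hypothesized, se):
    return (estimate - hypothesized) / se

# Hypothetical numbers: beta_hat = 0.8, se = 0.1, testing H0: beta_j = 1.
print(round(t_stat(0.8, 1.0, 0.1), 2))   # -2.0

# The default H0: beta_j = 0 is just the special case hypothesized = 0.
print(round(t_stat(0.8, 0.0, 0.1), 2))   # 8.0
```

Note how the same estimate can be far from one null value and close to another; the critical values are then chosen exactly as before.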
The language needs to be suitably modified. If, for example,

H0 : βj = 1
H1 : βj ̸= 1

is rejected at the 5% level, we say "β̂j is statistically different from one at the 5% level." Otherwise, β̂j is "not statistically different from one." If the alternative is H1 : βj > 1, then "β̂j is statistically greater than one at the 5% level."
EXAMPLE: Crime and enrollment on college campuses (CAMPUS.DTA)

A simple regression model:

log(crime) = β0 + β1 log(enroll) + u

H0 : β1 = 1
H1 : β1 > 1

We get β̂1 = 1.27, and so a 1% increase in enrollment is estimated to increase crime by 1.27% (so more than 1%). Is this estimate statistically greater than one?
Crime and enrollment on college campuses (CAMPUS.DTA)

We cannot pull the t statistic off of the usual Stata output. We can compute it by hand (rounding the estimate and standard error):

t = (1.270 − 1)/.110 ≈ 2.45

(Note how this is much smaller than the t for H0 : β1 = 0, reported by Stata.) We have df = 97 − 2 = 95, so we use the df = 120 entry in Table G.2. The 1% cv for a one-sided alternative is about 2.36, so we reject at the 1% significance level.
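The by-hand calculation above, expressed in Python:

```python
# t statistic for H0: beta1 = 1 in the campus crime regression,
# using the rounded estimate and standard error from the slide.
t = (1.270 - 1) / 0.110
print(round(t, 2))           # 2.45

# One-sided 1% critical value (df = 120 row of Table G.2): about 2.36.
print(round(t, 2) > 2.36)    # True: reject H0 at the 1% level
```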
. reg lcrime lenroll

      Source |       SS       df       MS              Number of obs =      97
-------------+------------------------------           F(  1,    95) =  133.79
       Model |  107.083654     1  107.083654           Prob > F      =  0.0000
    Residual |  76.0358244    95  .800377098           R-squared     =  0.5848
-------------+------------------------------           Adj R-squared =  0.5804
       Total |  183.119479    96  1.90749457           Root MSE      =  .89464

------------------------------------------------------------------------------
      lcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lenroll |    1.26976    .109776    11.57   0.000     1.051827    1.487693
       _cons |   -6.63137    1.03354    -6.42   0.000    -8.683206   -4.579533
------------------------------------------------------------------------------
Alternatively, we can let Stata do the work using the lincom ("linear combination") command. Here the null is stated equivalently as H0 : β1 − 1 = 0.

. lincom lenroll - 1

 ( 1)  lenroll = 1

------------------------------------------------------------------------------
      lcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .2697603    .109776     2.46   0.016     .0518273    .4876932
------------------------------------------------------------------------------

The t = 2.46 is the more accurate calculation of the t statistic. The lincom lenroll - 1 command is Stata's way of saying "test whether βlenroll − 1 equals zero."
Computing p-Values for t Tests

The traditional approach to testing, where we choose a significance level ahead of time, can be cumbersome. Plus, it can conceal information. For example, suppose that, for testing against a two-sided alternative, a t statistic is just below the 5% cv. I could simply say that "I fail to reject H0 : βj = 0 against the two-sided alternative at the 5% level." But there is nothing sacred about 5%. Might I reject at, say, 6%?
Computing p-Values for t Tests

Rather than have to specify a level ahead of time, or discuss different traditional significance levels (10%, 5%, 1%), it is better to answer the following question: Given the observed value of the t statistic, what is the smallest significance level at which I can reject H0?

The smallest level at which the null can be rejected is known as the p-value of a test. It is a single number that automatically allows us to carry out the test at any level.
One way to think about the p-value is that it uses the observed statistic as the critical value, and then finds the significance level of the test using that critical value. It is most common to report p-values for two-sided alternatives. This is what Stata does. The t tables are not detailed enough.
For t testing against a two-sided alternative,

p-value = P(|T| > |t|)

where t is the value of the t statistic and T is a random variable with the tdf distribution. The p-value is a probability, so it is between zero and one. Perhaps the best way to think about p-values: the p-value is the probability of observing a statistic as extreme as we did if the null hypothesis is true.
So smaller p-values provide more evidence against the null. For example, if p-value = .50, then there is a 50% chance of observing a t as large as we did (in absolute value). This is not enough evidence against H0. If p-value = .001, then the chance of seeing a t statistic as extreme as we did is .1%. We can conclude either that we got a very rare sample – which is not helpful – or that the null hypothesis is very likely false.
From

p-value = P(|T| > |t|)

we see that as |t| increases the p-value decreases. Large absolute t statistics are associated with small p-values.

Suppose df = 40 and, from our data, we obtain t = 1.85 or t = −1.85. Then

p-value = P(|T| > 1.85) = 2P(T > 1.85) = 2(.0359) = .0718

where T ~ t40. Finding the actual numbers requires using Stata.
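Stata is not the only way to get the tail probability. As a rough cross-check, the t40 tail can be computed by numerically integrating the t density; a sketch using only the Python standard library (the grid size and upper cutoff are arbitrary choices of mine, fine enough for table-level accuracy):

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, upper=60.0, n=20000):
    """p-value = P(|T| > |t|): twice the tail area, via the trapezoid rule."""
    a = abs(t)
    h = (upper - a) / n
    s = 0.5 * (t_pdf(a, df) + t_pdf(upper, df))
    for i in range(1, n):
        s += t_pdf(a + i * h, df)
    return 2 * s * h

print(round(two_sided_p(1.85, 40), 3))   # 0.072 (the slide reports .0718)
```

In practice one would use Stata's ttail() or scipy.stats.t; the point is only that the p-value is nothing more than a tail area of the tdf density.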
Given the p-value, we can carry out a test at any significance level. If α is the chosen level, then

Reject H0 if p-value < α

For example, in the previous example we obtained p-value = .0718. This means that we reject H0 at the 10% level but not at the 5% level. We reject at 8% but not (quite) at 7%. Knowing p-value = .0718 is clearly much better than just saying "I fail to reject at the 5% level."
Computing p-Values for One-Sided Alternatives

Stata and other packages report the two-sided p-value. How can we get a one-sided p-value? With a caveat, the answer is simple:

one-sided p-value = (two-sided p-value)/2

We only want the area in one tail, not two tails. The two-sided p-value gives us the area in both tails.
This is the correct calculation when it is interesting to do the calculation. The caveat is simple: if the estimated coefficient is not in the direction of the alternative, the one-sided p-value is above .50, and so it is not an interesting calculation. In Stata, the two-sided p-values for H0 : βj = 0 are given in the column labeled P>|t|.
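Folding the caveat into code, a small helper (the function name and arguments are mine, not Stata's):

```python
# Convert a reported two-sided p-value into a one-sided p-value.
# If the estimate points in the direction of the alternative, halve it;
# otherwise the one-sided p-value is above .50.
def one_sided_p(two_sided, estimate, alternative_is_positive):
    points_toward_alt = (estimate > 0) == alternative_is_positive
    return two_sided / 2 if points_toward_alt else 1 - two_sided / 2

print(one_sided_p(0.017, 0.036, True))              # 0.0085: estimate agrees with H1
print(round(one_sided_p(0.017, 0.036, False), 4))   # 0.9915: estimate opposes H1
```

The first call reproduces the .017/2 = .0085 calculation used for points in the NBA example that follows.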
EXAMPLE: Factors Affecting NBA Salaries (NBASAL.DTA)

. des wage games mingame points rebounds assists

              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
wage            float  %9.0g                  annual salary, thousands $
games           byte   %9.0g                  average games per year
mingame         float  %9.0g                  minutes per game
points          float  %9.0g                  points per game
rebounds        float  %9.0g                  rebounds per game
assists         float  %9.0g                  assists per game

. sum wage games mingame points rebounds assists

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |       269    1423.828    999.7741        150       5740
       games |       269    65.72491    18.85111          3         82
     mingame |       269    23.97925    9.731177   2.888889   43.08537
      points |       269    10.21041    5.900667        1.2       29.8
    rebounds |       269    4.401115    2.892573         .5       17.3
     assists |       269    2.408922    2.092986          0       12.6
Factors Affecting NBA Salaries (NBASAL.DTA)

Use lwage = log(wage) to get constant percentage effects.

. reg lwage games mingame points rebounds assists

      Source |       SS       df       MS              Number of obs =     269
-------------+------------------------------           F(  5,   263) =   40.27
       Model |  90.2698185     5  18.0539637           Prob > F      =  0.0000
    Residual |  117.918945   263  .448361006           R-squared     =  0.4336
-------------+------------------------------           Adj R-squared =  0.4228
       Total |  208.188763   268  .776823743           Root MSE      =   .6696

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       games |   .0004132    .002682     0.15   0.878    -.0048679    .0056942
     mingame |   .0302278   .0130868     2.31   0.022     .0044597     .055996
      points |   .0363734   .0150945     2.41   0.017     .0066519    .0660949
    rebounds |   .0406795   .0229455     1.77   0.077    -.0045007    .0858597
     assists |   .0003665   .0314393     0.01   0.991    -.0615382    .0622712
       _cons |   5.648996   .1559075    36.23   0.000      5.34201    5.955982
------------------------------------------------------------------------------
Forgetting the intercept (or "constant"), none of the variables is statistically significant at the 1% level against a two-sided alternative. The closest is points, with p-value = .017. (The one-sided p-value is .017/2 = .0085 < .01, so it is significant at the 1% level against the positive one-sided alternative.)

mingame is statistically significant at the 5% level because p-value = .022 < .05.

rebounds is statistically significant at the 10% level (against a two-sided alternative) because p-value = .077 < .10, but not at the 5% level. But the one-sided p-value is .077/2 = .0385.
Both games and assists have very small t statistics, which lead to p-values close to one (for example, for assists, p-value = .991). These variables are statistically insignificant.

In some applications, p-values equal to zero up to three decimal places are not uncommon. We do not have to worry about statistical significance in such cases.
Using WAGE2.DTA:

. reg lwage educ IQ exper motheduc

      Source |       SS       df       MS              Number of obs =     759
-------------+------------------------------           F(  4,   754) =   54.26
       Model |  58.7293322     4   14.682333           Prob > F      =  0.0000
    Residual |  204.019312   754  .270582642           R-squared     =  0.2235
-------------+------------------------------           Adj R-squared =  0.2194
       Total |  262.748644   758  .346634095           Root MSE      =  .52018

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1006798   .0118813     8.47   0.000     .0773555     .124004
          IQ |     .00735   .0016068     4.57   0.000     .0041957    .0105043
       exper |   .0449386   .0084136     5.34   0.000     .0284217    .0614555
    motheduc |   .0239265   .0095623     2.50   0.013     .0051545    .0426985
       _cons |  -.3837064   .2373921    -1.62   0.106    -.8497344    .0823215
------------------------------------------------------------------------------
Language of Hypothesis Testing

If we do not reject H0 (against any alternative), it is better to say "we fail to reject H0" as opposed to "we accept H0," which is somewhat common. The reason is that many null hypotheses cannot be rejected in any application. For example, if I have β̂j = .75 and se(β̂j ) = .25, I do not say that I "accept H0 : βj = 1." I fail to reject because the t statistic is (.75 − 1)/.25 = −1. But the t statistic for H0 : βj = .5 is (.75 − .5)/.25 = 1, so I cannot reject H0 : βj = .5, either.
Clearly βj = .5 and βj = 1 cannot both be true. There is a single, unknown value in the population. So I should not "accept" either. The outcomes of the t tests tell us the data cannot reject either hypothesis. Nor can the data reject H0 : βj = .6, and so on. The data does reject H0 : βj = 0 (t = 3) at a pretty small significance level (if we have a reasonable df ).
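The arithmetic behind "fail to reject several nulls at once" is easy to verify; a sketch with the numbers from the slide:

```python
# beta_hat = .75, se = .25: t statistics for several null values.
beta_hat, se = 0.75, 0.25

for a in (1.0, 0.5, 0.0):
    t = (beta_hat - a) / se
    print(a, round(t, 2))
# H0: beta_j = 1   -> t = -1.0 (fail to reject)
# H0: beta_j = 0.5 -> t =  1.0 (fail to reject)
# H0: beta_j = 0   -> t =  3.0 (rejected at small significance levels)
```

All three nulls are tested with the same estimate; only the hypothesized value changes, which is exactly why "accept" is the wrong word.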
Practical versus Statistical Significance

t testing is purely about statistical significance. It does not directly speak to the issue of whether a variable has a practically, or economically, large effect.

Practical (Economic) Significance depends on the size (and sign) of β̂j.

Statistical Significance depends on tβ̂j.
It is possible to estimate practically large effects but have the estimates so imprecise that they are statistically insignificant. This is especially an issue with small data sets (but not only small data sets).

Even more importantly, it is possible to get estimates that are statistically significant – often with very small p-values – but are not practically large. This can happen with very large data sets.
EXAMPLE: Suppose that, using a large cross section data set for teenagers across the U.S., we estimate the elasticity of alcohol demand with respect to price to be −.013 with se = .002. Then the t statistic is −6.5, and we need look no further to conclude the elasticity is statistically different from zero. But the estimate means that, say, a 10% increase in the price of alcohol reduces demand by an estimated .13%. This is a small effect.

The bottom line: do not just fixate on t statistics! Interpreting the β̂j is just as important.
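The two numbers in the alcohol-demand example, checked in Python:

```python
# Estimated price elasticity of alcohol demand and its standard error.
elasticity, se = -0.013, 0.002

# Statistical significance: a very large t statistic.
t = elasticity / se
print(round(t, 1))          # -6.5

# Practical significance: the effect of a 10% price increase on demand.
print(round(elasticity * 10, 2))   # -0.13 (percent): tiny in economic terms
```

The contrast between the two printed numbers is the whole point of the slide.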
Confidence Intervals

Rather than just testing hypotheses about parameters, it is also useful to construct confidence intervals (also known as interval estimates). Loosely, the CI is supposed to give a "likely" range of values for the corresponding population parameter. We will only consider CIs of the form

β̂j ± c · se(β̂j )

where c > 0 is chosen based on the confidence level.
We will use a 95% confidence level, in which case c comes from the 97.5th percentile in the tdf distribution. In other words, c is the 5% critical value against a two-sided alternative. Stata automatically reports a 95% CI for each parameter, based on the t distribution using the appropriate df.
. reg lwage games mingame points rebounds assists

      Source |       SS       df       MS              Number of obs =     269
-------------+------------------------------           F(  5,   263) =   40.27
       Model |  90.2698185     5  18.0539637           Prob > F      =  0.0000
    Residual |  117.918945   263  .448361006           R-squared     =  0.4336
-------------+------------------------------           Adj R-squared =  0.4228
       Total |  208.188763   268  .776823743           Root MSE      =   .6696

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       games |   .0004132    .002682     0.15   0.878    -.0048679    .0056942
     mingame |   .0302278   .0130868     2.31   0.022     .0044597     .055996
      points |   .0363734   .0150945     2.41   0.017     .0066519    .0660949
    rebounds |   .0406795   .0229455     1.77   0.077    -.0045007    .0858597
     assists |   .0003665   .0314393     0.01   0.991    -.0615382    .0622712
       _cons |   5.648996   .1559075    36.23   0.000      5.34201    5.955982
------------------------------------------------------------------------------
Notice how the three estimates that are not statistically different from zero at the 5% level – games, rebounds, and assists – all have 95% CIs that include zero. For example, the 95% CI for βrebounds is

[−.0045, .0859]

By contrast, the 95% CI for βpoints is

[.0067, .0661]

which excludes zero.
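These intervals can be reproduced from the coefficient and standard-error columns; a sketch using c = 1.969, which I take as an approximation to the 97.5th percentile of the t distribution with 263 df (close to what Stata uses):

```python
# 95% CI: estimate +/- c * se, with c ~ 97.5th percentile of t_263 (assumed 1.969).
c = 1.969

def ci95(beta_hat, se):
    return (beta_hat - c * se, beta_hat + c * se)

lo, hi = ci95(0.0406795, 0.0229455)   # rebounds
print(round(lo, 4), round(hi, 4))     # -0.0045 0.0859: interval includes zero

lo, hi = ci95(0.0363734, 0.0150945)   # points
print(round(lo, 4), round(hi, 4))     # 0.0067 0.0661: interval excludes zero
```

Both match the intervals in the Stata output to four decimal places.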
A simple rule-of-thumb is useful for constructing a CI given the estimate and its standard error. For, say, df ≥ 60, an approximate 95% CI is

β̂j ± 2se(β̂j ), that is, [β̂j − 2se(β̂j ), β̂j + 2se(β̂j )]

That is, subtract and add twice the standard error to the estimate. (In the case of the standard normal, the 2 becomes 1.96.)
Properly interpreting a CI is a bit tricky. One often sees statements such as "there is a 95% chance that βpoints is in the interval [.0067, .0661]." This is incorrect. βpoints is some fixed value, and it either is or is not in the interval.

The correct way to interpret a CI is to remember that the endpoints, β̂j − c · se(β̂j ) and β̂j + c · se(β̂j ), change with each sample (or at least can change). That is, the endpoints are random outcomes that depend on the data we draw.
What a 95% CI means is that for 95% of the random samples that we draw from the population, the interval we compute using the rule β̂j ± c · se(β̂j ) will include the value βj. But for a particular sample we do not know whether βj is in the interval.

This is similar to the idea that unbiasedness of β̂j does not mean that β̂j = βj. Most of the time β̂j is not βj. Unbiasedness means E(β̂j ) = βj.
CIs and Hypothesis Testing

If we have constructed a 95% CI for, say, βj, we can test any null value against a two-sided alternative, at the 5% level. So

H0 : βj = aj
H1 : βj ̸= aj

1. If aj is in the 95% CI, then we fail to reject H0 at the 5% level.
2. If aj is not in the 95% CI, then we reject H0 in favor of H1 at the 5% level.
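The duality between CIs and two-sided tests fits in two lines; a sketch using the (rounded) 95% CI for the IQ coefficient from the WAGE2 regression discussed in this section:

```python
# Two-sided 5%-level test via the 95% CI: reject H0: beta_j = a
# exactly when a lies outside the interval.
def reject_at_5pct(ci_low, ci_high, a):
    return not (ci_low <= a <= ci_high)

ci_iq = (0.0042, 0.0105)   # 95% CI for beta_IQ, rounded from the Stata output

print(reject_at_5pct(*ci_iq, 0.0))    # True: reject H0: beta_IQ = 0
print(reject_at_5pct(*ci_iq, 0.01))   # False: cannot reject H0: beta_IQ = .01
```

One interval thus answers infinitely many two-sided tests at the 5% level.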
Note that, measured as percents,

significance level = 100 − confidence level

. reg lwage educ IQ exper motheduc

      Source |       SS       df       MS              Number of obs =     759
-------------+------------------------------           F(  4,   754) =   54.26
       Model |  58.7293322     4   14.682333           Prob > F      =  0.0000
    Residual |  204.019312   754  .270582642           R-squared     =  0.2235
-------------+------------------------------           Adj R-squared =  0.2194
       Total |  262.748644   758  .346634095           Root MSE      =  .52018

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1006798   .0118813     8.47   0.000     .0773555     .124004
          IQ |     .00735   .0016068     4.57   0.000     .0041957    .0105043
       exper |   .0449386   .0084136     5.34   0.000     .0284217    .0614555
    motheduc |   .0239265   .0095623     2.50   0.013     .0051545    .0426985
       _cons |  -.3837064   .2373921    -1.62   0.106    -.8497344    .0823215
------------------------------------------------------------------------------
The 95% CI for βIQ is about [.0042, .0105]. So we can reject H0 : βIQ = 0 against the two-sided alternative at the 5% level. We cannot reject H0 : βIQ = .01 (although it is close). We can reject a return to schooling of 7.5% as being too low, and we can reject 12.5% as too high.

Just as with hypothesis testing, these CIs are only as good as the underlying assumptions. If we have omitted key variables, the β̂j are biased. If the error variance is not constant, the standard errors are improperly computed. With df = 754, we will see later that normality is not very important. But normality is needed for these CIs to be exact 95% CIs.
Testing Single Linear Restrictions

So far, we have discussed testing hypotheses that involve only one parameter, βj. But some hypotheses involve more than one parameter.

EXAMPLE: Are the Returns to a Year of Junior College the Same as for a Four-Year University? (COLLEGE.DTA). Sample of high school graduates.

lwage = β0 + β1jc + β2univ + β3exper + u

H0 : β1 = β2
H1 : β1 < β2
We could use a two-sided alternative, too. We can also write

H0 : β1 − β2 = 0

Remember the general way to construct a t statistic:

t = (estimate − hypothesized value) / standard error
• 99. Given the OLS estimates β̂1 and β̂2, t = (β̂1 − β̂2)/se(β̂1 − β̂2). Problem: The OLS output gives us β̂1 and β̂2 and their standard errors, but that is not enough to obtain se(β̂1 − β̂2). Recall a fact about variances: Var(β̂1 − β̂2) = Var(β̂1) + Var(β̂2) − 2Cov(β̂1, β̂2) Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 99 / 104
• 100. The standard error is an estimate of the square root: se(β̂1 − β̂2) = {[se(β̂1)]² + [se(β̂2)]² − 2s12}^(1/2), where s12 is an estimate of Cov(β̂1, β̂2). This is the piece we are missing. Stata will report s12 if we ask, but calculating se(β̂1 − β̂2) by hand is cumbersome. There is also a trick of rewriting the model (see text, Section 4.4). These days, it is easiest to use a command for testing linear functions of the coefficients. In Stata, it is lincom. Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 100 / 104
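Given s12, the computation is mechanical. A sketch using the jc and univ estimates from the COLLEGE.DTA regression that follows; the value of s12 used here is not printed by the regression output, so it is backed out to be consistent with the standard error that lincom reports below:

```python
import math

# Coefficients and standard errors for jc and univ (COLLEGE.DTA regression)
b1, b2 = 0.0661471, 0.0836956
se1, se2 = 0.0202773, 0.0068935

# s12 estimates Cov(b1_hat, b2_hat); this value is an assumption backed out
# from lincom's reported se of .0206407, since reg alone does not show it
s12 = 1.633e-05

se_diff = math.sqrt(se1**2 + se2**2 - 2 * s12)
t = (b1 - b2) / se_diff
print(round(se_diff, 4), round(t, 2))   # roughly .0206 and -0.85
```

Note that ignoring s12 (treating the estimates as uncorrelated) would give the wrong standard error here, since the covariance is positive; this is exactly why lincom is the convenient route.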
• 101.

. des lwage jc univ exper

              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------------
lwage           float   %9.0g                 log(wage)
jc              float   %9.0g                 total 2-year credits
univ            float   %9.0g                 total 4-year credits
exper           float   %8.0g                 work experience, years

. sum lwage jc univ exper

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       lwage |       750    2.233674    .4906276   .6931472   3.901973
          jc |       750    .3449006    .7731012          0   3.833333
        univ |       750    1.817076    2.276202          0        7.5
       exper |       750    10.26722    2.713302        .25   13.83333

Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 101 / 104
• 102.

. reg lwage jc univ exper

      Source |       SS       df       MS              Number of obs =     750
-------------+------------------------------           F(  3,   746) =   86.25
       Model |  46.4300797     3  15.4766932           Prob > F      =  0.0000
    Residual |   133.86575   746  .179444705           R-squared     =  0.2575
-------------+------------------------------           Adj R-squared =  0.2545
       Total |  180.295829   749  .240715393           Root MSE      =  .42361

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          jc |   .0661471   .0202773     3.26   0.001     .0263398    .1059544
        univ |   .0836956   .0068935    12.14   0.000     .0701626    .0972287
       exper |   .0653706   .0057692    11.33   0.000     .0540448    .0766964
       _cons |   1.387603   .0636388    21.80   0.000     1.262671    1.512536
------------------------------------------------------------------------------

Note that β̂jc − β̂univ = .0661 − .0837 = −.0176, so the estimated return to a year at univ is about 1.8% higher. But is the difference statistically significant?

Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 102 / 104
• 104.

. lincom jc - univ

 ( 1)  jc - univ = 0

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.0175485   .0206407    -0.85   0.395    -.0580694    .0229723
------------------------------------------------------------------------------

The two-sided p-value is .395, which means the one-sided p-value is .1975. Even against a one-sided alternative, we cannot reject H0 : βjc = βuniv at even the 20% level. Note how much more variation there is in univ compared with jc.

Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 104 / 104
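Halving the two-sided p-value to get the one-sided one relies on the symmetry of the t distribution and on the estimate having the sign H1 predicts (here β̂jc − β̂univ < 0, matching H1 : β1 < β2). A sketch confirming the numbers, with the t statistic from lincom and df = 746:

```python
from scipy.stats import t

# t statistic and df from the lincom output (df = 750 - 4)
tstat = -0.0175485 / 0.0206407
df = 746

p_two = 2 * t.sf(abs(tstat), df)   # two-sided p-value, close to Stata's .395
p_one = p_two / 2                  # one-sided, since the estimate has H1's sign
print(round(p_two, 3), round(p_one, 4))
```

If the estimate had come out with the opposite sign from H1, the one-sided p-value would instead be 1 − p_two/2, and H0 could never be rejected in favor of that alternative.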
• 105. Of course, nothing changes (except the sign of the estimate) if we use βuniv − βjc:

. lincom univ - jc

 ( 1)  - jc + univ = 0

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .0175485   .0206407     0.85   0.395    -.0229723    .0580694
------------------------------------------------------------------------------

Liu, H (NUS) Multiple Regression Analysis: Statistical Inference: I August 21, 2022 105 / 104