2. Review of the Linear Model
Population model: Y = α + βX + ε
– Assume that the true slope is positive, so β > 0
Sample model: Y = a + bX + e
– Least squares (LS) estimator of β:
bLS = (X′X)⁻¹X′Y = Cov(X,Y) / Var(X)
Under what conditions can we speak of bLS
as a causal estimate of the effect of X on Y?
3. Review of the Linear Model
Key assumption of the linear model:
E(X′e) = Cov(X,e) = E(e | X) = 0
– Exogeneity assumption = X is uncorrelated with
the unobserved determinants of Y
Important statistical property of the LS
estimator under exogeneity:
E(bLS) = β + Cov(X,e) / Var(X)
plim(bLS) = β + Cov(X,e) / Var(X)
Both second terms are 0 under exogeneity,
so bLS is unbiased
and consistent
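A minimal simulation sketch of the bias formula above, using only the Python standard library. The data-generating values (α = 1, β = 2, and the loadings 0.8 and 1.5 on an unobserved W) are assumptions chosen for illustration and do not come from the slides. Because Y = α + βX + e, the identity bLS = β + Cov(X,e)/Var(X) holds exactly in the sample when e is the true error.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance; cov(v, v) is the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

rng = random.Random(0)
n, alpha, beta = 5000, 1.0, 2.0   # hypothetical true parameters

# W is an unobserved determinant of Y that also drives X.
w = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * wi + rng.gauss(0, 1) for wi in w]
# The structural error e contains W, so Cov(X, e) > 0 by construction.
e = [1.5 * wi + rng.gauss(0, 1) for wi in w]
y = [alpha + beta * xi + ei for xi, ei in zip(x, e)]

b_ls = cov(x, y) / cov(x, x)
bias_term = cov(x, e) / cov(x, x)
print("bLS:", round(b_ls, 3), "beta + Cov(X,e)/Var(X):", round(beta + bias_term, 3))
```

With exogeneity violated in this way, bLS lands well above the assumed β; setting the 0.8 loading to zero removes the bias term.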
4. When Is the
Exogeneity Assumption Violated?
Omitted variable (W) that is correlated with
both X and Y
– Classic problem of omitted variables bias
Coefficient on X will absorb the indirect path through W,
whose sign depends on Cov(X,W) and Cov(W,Y)
[Diagram: X and Y, with the omitted variable W linked to both]
Things are more complicated in applied
settings because there are bound
to be many W’s
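A short derivation sketch for the case of a single omitted variable. The coefficient γ on W and the remaining error u are notation introduced here for illustration; they do not appear in the slides.

```latex
\begin{align*}
Y &= \alpha + \beta X + \gamma W + u, \qquad \operatorname{Cov}(X,u) = 0 \\
\varepsilon &= \gamma W + u
  \;\Rightarrow\; \operatorname{Cov}(X,\varepsilon) = \gamma \operatorname{Cov}(X,W) \\
\operatorname{plim}(b_{LS}) &= \beta + \gamma \,\frac{\operatorname{Cov}(X,W)}{\operatorname{Var}(X)}
\end{align*}
```

Under these assumptions the bias is positive when γ and Cov(X,W) have the same sign and negative when they differ.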
5. Instrumental Variables
Estimation Is a Viable Approach
An “instrumental variable” for X is one solution
to the problem of omitted variables bias
Requirements for Z to be a
valid instrument for X
– Relevant = Correlated with X
– Exogenous = Not correlated
with Y but through its
correlation with X
[Diagram: instrument Z, endogenous variable X, outcome Y, omitted variable W, and error e]
6. Instrumental Variables Models
I often hear...“A good instrument should not
be correlated with the dependent variable”
– WRONG!!!
Z has to be correlated with Y, otherwise it is
useless as an instrument
– It can only be correlated with Y through X
A good instrument must not be correlated
with the unobserved determinants of Y
7. Instrumental Variables Models
Requirements for Z to be a good instrument
for X
– Assumption #1 (Validity):
Z is correlated with X
– Assumption #2 (Exclusion Restriction):
Z is not correlated with Y except through its correlation with
X (once we know X, there is no extra predictive
information in Z; in other words, Z can be excluded from
the equation Y = Xβ + ε, that is, Z is not correlated with
ε).
8. Instrumental Variables Models
– Assumption #1: Z is correlated with X
Testable
Regress endogenous (key) X on Z (and other controls
(W))
First stage: X = α0 + α1Z + α2W +ω
Look at the F statistic on the instrument(s) in the first stage. Rule of thumb: it needs
to be 10 or greater; otherwise there are weak-instrument problems
(more on this later)
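A minimal sketch of the first-stage check, assuming one instrument and no controls (the slide's first stage also includes W, which is omitted here to keep the example short). The function name `first_stage_f` and the simulated data are hypothetical.

```python
import random

def first_stage_f(x, z):
    """F statistic for H0: a1 = 0 in X = a0 + a1*Z + u (one instrument, no controls)."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a1 = szx / szz                 # first-stage slope
    explained = a1 * szx           # explained sum of squares (1 degree of freedom)
    ssr = sxx - explained          # residual sum of squares
    return explained / (ssr / (n - 2))

rng = random.Random(1)
n = 1000
z = [rng.gauss(0, 1) for _ in range(n)]
x_strong = [0.5 * zi + rng.gauss(0, 1) for zi in z]   # hypothetical strong first stage

f_strong = first_stage_f(x_strong, z)
print("first-stage F:", round(f_strong, 1), "meets rule of thumb:", f_strong >= 10)
```

With a single instrument this F statistic equals the squared t statistic on Z in the first stage.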
9. Instrumental Variables Models
– Assumption #2 Z is not correlated with Y but
through its correlation with X (Z is not correlated
with ε after controlling for W)
NOT TESTABLE !!!*
You need to convince yourself and the reader that your
instrument is appropriate. (That people with high and
low values of Z share all omitted traits that impact Y.)
*When there are multiple instruments, there are overidentification tests but
they still depend on having one valid instrument which is untestable.
10. Examples of Possible IVs
Fertility and Female Labor Supply
– Y is labor force participation
– Endogenous variable is family size
– Potential Instruments for number of children:
Twins, Abortion, Sex Composition of first two
children, Infertility, Mother’s Education
– All impact completed family size so they meet the
validity requirement. Which ones do/do not meet
the exclusion restriction?
11. Instrumental
Variables Terminology
Three different models to be familiar with
– First stage: X = α0 + α1Z + ω
– Structural model: Y = β0 + β1X + ε
– Reduced form: Y = δ0 + δ1Z + ξ
13. The Wald estimator example
My Journal of Human Resources paper
Y is female labor supply
– X is number of children (one endogenous variable)
– Z is an indicator for being infertile (binary instrument)
The Wald estimator is the ratio of the reduced form to
first stage
bWald = [E(Y | Z = 1) – E(Y | Z = 0)] / [E(X | Z = 1) – E(X | Z = 0)]
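A minimal sketch of the Wald estimator with a binary instrument, using only the Python standard library. All parameter values (for example the assumed true effect β = –0.3 and the confounder loadings) are hypothetical and are not estimates from the paper. The sketch also computes Cov(Z,Y)/Cov(Z,X), which coincides with the Wald ratio when Z is binary and there are no controls.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

rng = random.Random(2)
n, beta = 20000, -0.3   # beta: hypothetical true effect of X on Y

z = [1 if rng.random() < 0.5 else 0 for _ in range(n)]   # binary instrument
w = [rng.gauss(0, 1) for _ in range(n)]                  # unobserved confounder
x = [2.0 - 1.0 * zi + 0.8 * wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
y = [0.5 + beta * xi + 1.0 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# Reduced form and first stage as differences in group means.
reduced_form = mean([yi for yi, zi in zip(y, z) if zi == 1]) - \
               mean([yi for yi, zi in zip(y, z) if zi == 0])
first_stage = mean([xi for xi, zi in zip(x, z) if zi == 1]) - \
              mean([xi for xi, zi in zip(x, z) if zi == 0])

b_wald = reduced_form / first_stage
b_iv = cov(z, y) / cov(z, x)
b_ls = cov(x, y) / cov(x, x)
print("Wald:", round(b_wald, 3), "IV:", round(b_iv, 3), "LS:", round(b_ls, 3))
```

In this simulated setup LS is pulled away from β by the confounder, while the Wald ratio stays close to it.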
16. Different Types of
Instrumental Variables Estimators
Single binary instrument and no control
variables...
bWald = bIV = b2SLS
Single instrument (binary or continuous) with
or without control variables...
bIV = b2SLS
Multiple instruments (binary or continuous)
with or without control variables...
b2SLS
17. Different Types of
Instrumental Variables Estimators
Least squares (LS) estimator of β:
– bLS = (X′X)⁻¹X′Y = Cov(X,Y) / Var(X)
Instrumental variables (IV) estimator:
bIV = (Z′X)⁻¹Z′Y = Cov(Z,Y) / Cov(Z,X)
– Shows that bIV can be recovered from two samples
Two-stage least squares (2SLS) estimator:
b2SLS = (X̃′X̃)⁻¹X̃′Y = Cov(X̃,Y) / Var(X̃)
– X̃ represents “fitted” value from first-stage model
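A short derivation sketch of why the 2SLS and IV formulas agree with a single instrument and no controls. The first-stage estimates â₀ and â₁ are notation assumed here for the fitted line.

```latex
\begin{align*}
\tilde{X} &= \hat{a}_0 + \hat{a}_1 Z, \qquad
  \hat{a}_1 = \frac{\operatorname{Cov}(Z,X)}{\operatorname{Var}(Z)} \\
b_{2SLS} &= \frac{\operatorname{Cov}(\tilde{X},Y)}{\operatorname{Var}(\tilde{X})}
  = \frac{\hat{a}_1 \operatorname{Cov}(Z,Y)}{\hat{a}_1^{2} \operatorname{Var}(Z)}
  = \frac{\operatorname{Cov}(Z,Y)}{\hat{a}_1 \operatorname{Var}(Z)}
  = \frac{\operatorname{Cov}(Z,Y)}{\operatorname{Cov}(Z,X)} = b_{IV}
\end{align*}
```

The last step substitutes the definition of â₁, so the equality is exact in any sample where Cov(Z,X) ≠ 0.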
18. Important Point about
Instrumental Variables Models
Not all of the available variation in X is used
(this means standard errors will increase !!)
– Only that portion of X which is “explained” by
Z is used to explain Y
[Venn diagram: overlapping circles X, Y, and Z]
X = Endogenous variable
Y = Response variable
Z = Instrumental variable
19. Important Point about
Instrumental Variables Models
[Venn diagram: circles X, Y, and Z; weak first stage]
Realistic scenario: Very
little of X is explained by Z,
or what is explained does
not overlap much with Y
[Venn diagram: circles X, Y, and Z; strong first stage]
Best-case scenario: A lot of
X is explained by Z, and
most of the overlap between
X and Y is accounted for
20. Statistical Inference with IV
Variance estimation
$\sigma^2_{\beta_{LS}} = \sigma^2_{\varepsilon} / SST_X$
$\sigma^2_{\beta_{IV}} = \sigma^2_{\varepsilon} / (SST_X \cdot R^2_{X,Z})$
where…
ε = Y – β0 – β1X
NOTICE: Because $R^2_{X,Z} < 1$, $s_{b_{IV}} > s_{b_{LS}}$
– IV standard errors tend to be large, especially when
$R^2_{X,Z}$ is very small, which can lead to type II errors
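A small worked example of what the two variance formulas imply for standard errors, holding the error variance fixed. The R² values below are hypothetical, chosen only to show the scale of the effect.

```latex
\begin{align*}
\frac{s_{b_{IV}}}{s_{b_{LS}}} &= \frac{1}{\sqrt{R^2_{X,Z}}} \\
R^2_{X,Z} = 0.25 &\;\Rightarrow\; \frac{s_{b_{IV}}}{s_{b_{LS}}} = 2 \\
R^2_{X,Z} = 0.04 &\;\Rightarrow\; \frac{s_{b_{IV}}}{s_{b_{LS}}} = 5
\end{align*}
```

So an instrument that explains 4 percent of the variation in X would, under these assumptions, give IV standard errors five times the LS ones.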
21. Regression Results
bWald = bIV = b2SLS (with no controls)
Table 3
Number of Children and Labor Force Participation of Women
Dependent variable: Women worked in the last 12 months (=1)

                             Model 1                Model 2
                             OLS        IV          OLS        IV
                             (i)        (ii)        (iii)      (iv)
Number of children           -0.024***  -0.006      -0.017***  -0.005
                             [0.002]    [0.008]     [0.002]    [0.007]
Observations                 90,965     90,965      90,965     90,965
F-statistic (first stage)               814.2                  853.9
Note: Robust standard errors (in brackets) are clustered at the sub-national level. * denotes
significance at 10 percent, ** at 5 percent, and *** at 1 percent. Model 1 includes women’s
age and survey fixed effects. Model 2 adds to Model 1 education, age and education
interactions, age at first intercourse, marital status, age at first marriage, and spouse’s
education. The 2SLS models instrument for children at home using the union of the infertility
measures. The F-statistic refers to the first stage results.
22. “Evidence to support the
exclusion restriction”
We are worried that infertility captures
unobservables that directly impact female
labor supply
– Show balance in observables between fertile and
infertile women (after conditioning on age)
– Include key controls (health indicators) and show
that results are unchanged
– Cite medical literature on infertility which argues
for randomness
23. Example: Levitt (1997), A.E.R.
Breaking the simultaneity in the police-crime
connection
– When more police are hired, crime should decline
– But...more police may be hired during crime waves
Election cycles and police hiring
– Increases in size of police force disproportionately
concentrated in election years
– Growth is 2.1% in mayoral election years, 2.0% in
gubernatorial election years, and 0.0% in non-
election years
24. Levitt (1997), A.E.R.
However...can election cycles affect crime
rates through other spending channels?
– Ex., education, welfare, unemployment benefits
– If so, all of these other indirect channels must be
netted out or shown to not be true
[Diagram: Election Year → (+) Growth in Police Manpower → (–) Growth in Crime Rate]
27. How to do IV with one instrument
(Z) and covariates (W)
Step 1: X = a0 + a1Z + a2W+ u
– Obtain fitted values (X̃) from the first-stage model
Step 2: Y = b0 + b1X̃ + b2W + e
– Substitute the fitted X̃ in place of the original X
– Note: If done manually in two stages, the standard
errors are based on the wrong residual
e = Y – b0 – b1X̃ – b2W when it should be e = Y – b0 – b1X – b2W
Best to just let the software do it for you
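A minimal sketch of the standard-error problem with manual two-stage estimation, assuming one instrument and no covariates (the W term is dropped to keep the example short). The simulated parameter values and the names `ols` and `resid_var` are hypothetical. The point estimate is the same either way; only the residual used for the error variance differs.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def ols(y, x):
    """Intercept and slope from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

rng = random.Random(3)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
conf = [rng.gauss(0, 1) for _ in range(n)]            # unobserved confounder
x = [0.5 * zi + 0.8 * ci + rng.gauss(0, 1) for zi, ci in zip(z, conf)]
y = [1.0 + 2.0 * xi + 1.5 * ci + rng.gauss(0, 1) for xi, ci in zip(x, conf)]

a0, a1 = ols(x, z)                                    # Step 1: first stage
x_fit = [a0 + a1 * zi for zi in z]
b0, b1 = ols(y, x_fit)                                # Step 2: second stage

def resid_var(regressor):
    """Residual variance of Y - b0 - b1*regressor."""
    return sum((yi - b0 - b1 * ri) ** 2 for yi, ri in zip(y, regressor)) / (n - 2)

s2_wrong = resid_var(x_fit)   # what a manual second-stage regression reports
s2_right = resid_var(x)       # residual based on the actual X
m_fit = mean(x_fit)
sst_fit = sum((xf - m_fit) ** 2 for xf in x_fit)
se_wrong = math.sqrt(s2_wrong / sst_fit)
se_right = math.sqrt(s2_right / sst_fit)
print("b1:", round(b1, 3), "manual SE:", round(se_wrong, 4), "corrected SE:", round(se_right, 4))
```

In this particular setup the manual standard error is too large; in other designs it can be too small, which is why dedicated IV routines are preferable.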
28. Including Control
Variables in an IV/2SLS Model
Control variables (W’s) should be entered into
the model at both stages
– First stage: X = a0 + a1Z + a2W + u
– Second stage: Y = b0 + b1X̃ + b2W + e
Control variables are considered
“instruments,” they are just not “excluded
instruments”
– They serve as their own instrument
29. Software Considerations
Basic model specification in Stata
ivreg y (x = z) w [weight = wtvar], options
y = dependent variable
x = endogenous variable
z = instrumental variable/s
w = control variable(s)
– Useful options: first, ffirst, robust, cluster(varname)
30. Functional
Form Considerations with IV/2SLS
Binary endogenous regressor (X)
– Consistency of second-stage estimates does not
hinge on getting first-stage functional form correct
– Can run OLS not probit in the first stage
Binary response variable (Y)
– IV probit (or logit) is feasible but is technically
unnecessary
In both cases, linear model is tractable, easily
interpreted, and consistent
31. Functional
Form Considerations with IV/2SLS
– Linear and squared X’s are treated as two different
endogenous regressors, each of which needs its
own instrument
– Entering first-stage fitted values and their square
into second-stage model leads to inconsistency
The square of a linear projection is not equivalent to a
linear projection on a quadratic
– Squares and cross-products of IV’s should be
treated (when appropriate) as additional
instruments.
32. More Examples of Possible IVs
Random Experiment with imperfect compliance
The draft, actual experiments
Instruments trying to approximate a random
encouragement design
– Distance to hospital/school
Natural Experiments
– Shift shares, quarter of birth, Election years
– If you can identify a new instrument and convince
the readers that it satisfies the exclusion
restriction then you have a paper
33. Often a given instrument can be
used in multiple settings
Relationship between childhood TV watching and
autism
– Instrument rainy/snowy days
– First stage/Validity: More TV is watched on “bad” days
– Exclusion Restriction: The weather is pretty random and
should not impact autism rates
Could use the instrument to look at, say, TV watching and
vision problems
Bad instrument for the relationship between TV watching and
school grades if bad weather depresses kids (and depressed
kids get worse grades).
34. Instrumental Variables
and dif-n-dif
An instrument can be formed by interacting two variables,
say time and group. We are then using a DD as the first
stage of the relationship. In the second stage, we control
for the two uninteracted variables.
Consider the school experiment in (Duflo 2000). There are
two types of regions (High H and Low L regions) and two
types of cohorts (Young Y and Old O). The program
affected mostly the education of young cohorts in the high
program regions.
Assume that the program affected the wage of the
individuals only through its effects on education
35. Instrumental Variables
and dif-n-dif
The DD estimator for the effect of the program on education S is:
(E[S|H, Y ] − E[S|H, O]) − (E[S|L, Y ] − E[S|L, O])
The DD estimator for the effect of the program on wages W is:
(E[W|H,Y ] − E[W|H,O]) − (E[W|L,Y ] − E[W|L,O])
The effect of education on wages can be obtained by taking the
ratio of the two DD. This is the Wald estimator:
[(E[W|H,Y ] − E[W|H,O]) − (E[W|L,Y ] − E[W|L,O])] /
[(E[S|H, Y ] − E[S|H, O]) − (E[S|L, Y ] − E[S|L, O])]
The corresponding regression would be:
W = α + βY + γH + δS + ε
where H is a dummy equal to 1 in the high program region, Y is a
dummy equal to 1 for the young and S is instrumented with the
interaction H × Y
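A minimal sketch of the ratio-of-DDs calculation from four cell means. The cell means below are hypothetical numbers for illustration only and are not taken from Duflo's data; the helper name `dd` is also an assumption.

```python
def dd(cell):
    """Difference-in-differences from cell means keyed by (region, cohort)."""
    return (cell[("H", "Y")] - cell[("H", "O")]) - (cell[("L", "Y")] - cell[("L", "O")])

# Hypothetical cell means: years of schooling S and log wages W.
schooling = {("H", "Y"): 9.0, ("H", "O"): 8.0, ("L", "Y"): 8.5, ("L", "O"): 8.0}
log_wage = {("H", "Y"): 6.90, ("H", "O"): 6.80, ("L", "Y"): 6.86, ("L", "O"): 6.80}

dd_schooling = dd(schooling)   # first stage: program effect on education
dd_wage = dd(log_wage)         # reduced form: program effect on wages
wald = dd_wage / dd_schooling  # implied effect of one year of schooling on log wages
print("DD schooling:", round(dd_schooling, 3),
      "DD wage:", round(dd_wage, 3),
      "Wald:", round(wald, 3))
```

The ratio is only interpretable as a return to schooling under the slide's assumption that the program affects wages solely through education.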
36. Rules for Good Practice with
Instrumental Variables Models
IV models can be very informative, but it is
your job to convince your audience
– Show the first-stage model diagnostics
Even the most clever IV might not be sufficiently strongly
related to X to be a useful source of identification
– Report test(s) of overidentifying restrictions
An invalid IV is often worse than no IV at all
– Report LS endogeneity (DWH) test
37. Useful
Diagnostic Tools for IV Models
Overidentification test
– Model must be overidentified, i.e., more IV’s than
endogenous X’s
– we may test whether the excluded instruments
are appropriately independent of the error process
In Stata it can be calculated after ivreg
estimation with the overid command
38. Tests of Instrument Exogeneity
A test of overidentifying restrictions regresses the
residuals from a 2SLS regression on all instruments
in Z .
H0: All IV’s uncorrelated with structural error
Overidentification test:
1. Estimate structural model
2. Regress IV residuals on all exogenous variables
3. Compute N·R² and compare to chi-square
df = # IV’s – # endogenous X’s
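The three steps above can be sketched as follows for two instruments, one endogenous X, and no controls, so df = 1. This is an illustrative standard-library implementation under assumed simulated data, not the routine Stata runs; the helper names `fit2` and `overid_stat` are hypothetical. In the second data set Z2 is given a direct effect on Y, which violates the exclusion restriction.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def fit2(y, z1, z2):
    """OLS of y on a constant, z1, and z2; returns fitted deviations from mean(y)."""
    y, z1, z2 = demean(y), demean(z1), demean(z2)
    s11 = sum(a * a for a in z1)
    s22 = sum(a * a for a in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s1y = sum(a * b for a, b in zip(z1, y))
    s2y = sum(a * b for a, b in zip(z2, y))
    det = s11 * s22 - s12 * s12
    c1 = (s22 * s1y - s12 * s2y) / det
    c2 = (s11 * s2y - s12 * s1y) / det
    return [c1 * a + c2 * b for a, b in zip(z1, z2)]

def overid_stat(y, x, z1, z2):
    """N*R^2 from regressing 2SLS residuals on both instruments."""
    n = len(y)
    x_fit = fit2(x, z1, z2)                          # first-stage fitted values
    yd, xd = demean(y), demean(x)
    b = sum(f * a for f, a in zip(x_fit, yd)) / sum(f * f for f in x_fit)
    resid = [a - b * c for a, c in zip(yd, xd)]      # residuals use the actual X
    r_fit = fit2(resid, z1, z2)
    r2 = sum(f * f for f in r_fit) / sum(r * r for r in resid)
    return n * r2

rng = random.Random(4)
n = 5000
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
conf = [rng.gauss(0, 1) for _ in range(n)]           # unobserved confounder
x = [a + b + 0.8 * c + rng.gauss(0, 1) for a, b, c in zip(z1, z2, conf)]
y_valid = [1.0 + 2.0 * xi + 1.5 * c + rng.gauss(0, 1) for xi, c in zip(x, conf)]
y_invalid = [yi + 1.0 * b for yi, b in zip(y_valid, z2)]   # Z2 affects Y directly

CHI2_1_CRIT_5PCT = 3.841   # 5 percent critical value, chi-square with 1 df
stat_valid = overid_stat(y_valid, x, z1, z2)
stat_invalid = overid_stat(y_invalid, x, z1, z2)
print("valid instruments:", round(stat_valid, 2),
      "one invalid instrument:", round(stat_invalid, 2))
```

The test has power here only because the two instruments disagree; if both were invalid in the same direction the statistic could stay small.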
39. Useful
Diagnostic Tools for IV Models
Durbin-Wu-Hausman test
– Endogeneity of the problem regressor(s)
In Stata ivendog
40. Important Point about
Instrumental Variables Models
The IV estimator is BIASED
– In other words, E(bIV) ≠ β (finite-sample bias)
– The appeal of IV derives from its consistency
“Consistency” is a way of saying that b converges in probability to β as N → ∞, i.e., plim(b) = β
So…IV studies often have very large samples
– But with endogeneity, E(bLS) ≠ β and plim(bLS) ≠ β
anyway
Asymptotic behavior of IV
plim(bIV) = β + Cov(Z,e) / Cov(Z,X)
– If Z is truly exogenous, then Cov(Z,e) = 0
41. Durbin-Wu-Hausman (DWH) Test
Balances the consistency of IV against the
efficiency of OLS
– H0: IV and OLS both consistent, but OLS is
efficient
– H1: Only IV is consistent
DWH test for a single endogenous regressor:
$DWH = (b_{IV} - b_{LS}) \,/\, \sqrt{s^2_{b_{IV}} - s^2_{b_{LS}}} \sim N(0,1)$
– If |DWH| > 1.96, then X is endogenous and IV is
the preferred estimator despite its inefficiency
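A minimal sketch of the single-regressor DWH statistic as a Python function. The function name `dwh_stat` and the input values are hypothetical, chosen so the arithmetic is easy to follow; they are not estimates from any study in these slides. The formula presumes conventional (non-robust) standard errors, under which the IV variance exceeds the LS variance.

```python
import math

def dwh_stat(b_iv, se_iv, b_ls, se_ls):
    """(b_IV - b_LS) / sqrt(s2_IV - s2_LS) for one endogenous regressor."""
    var_diff = se_iv ** 2 - se_ls ** 2
    if var_diff <= 0:
        raise ValueError("IV variance must exceed LS variance for this form of the test")
    return (b_iv - b_ls) / math.sqrt(var_diff)

# Hypothetical estimates and standard errors.
stat = dwh_stat(b_iv=0.50, se_iv=0.20, b_ls=0.90, se_ls=0.12)
reject_exogeneity = abs(stat) > 1.96
print("DWH:", round(stat, 3), "reject H0 at 5 percent:", reject_exogeneity)
```

With robust or clustered standard errors the variance difference can be negative, in which case a regression-based version of the test is the usual alternative.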
42. Post-estimation tests
These tests are useful, but have two
problems:
– They may reject if the treatment effect is
heterogeneous, and the instruments exploit variation
at different parts of the treatment response function.
(LATE vs. ATE more on this soon)
– Their power is low, so they fail to reject
too often.
43. Rules for Good Practice with
Instrumental Variables Models
Most importantly, TELL A STORY about why a
particular IV is a “good instrument”
Something to consider when thinking about whether a
particular IV is “good”
– Does the IV, for all intents and purposes, randomize the
endogenous regressor?
– Often authors show that, conditional on W (or some key
W), individuals with high and low values of Z look similar.
– Control for potential pathways and show they don’t
matter
44. Instrumental Variables and
Local Average Treatment Effects
We have established the conditions that would yield
Internal Validity, i.e., that we would indeed get a
causal estimate of the effect of children on female
labor supply in our sample with the IV that we were
using.
There is also the important issue of External Validity:
what do our estimates tell us about the world in
general? What results are particular to our sample and
the IV that we were using?
45. Instrumental Variables and
Local Average Treatment Effects
Definition of a L.A.T.E.
– The average treatment effect for individuals “who
can be induced to change [treatment] status by a
change in the instrument”
Imbens and Angrist (1994, p. 470)
– The average causal effect of X on Y for “compliers,”
as opposed to “always takers” or “never takers”
L.A.T.E. is instrument-dependent, in contrast
to the population average treatment effect
(A.T.E)
46. L.A.T.E.
in the Previous Examples
In the Levitt study...
– In cities that increased police spending to appear
tough on crime during the election year, each
additional cop resulted in a mean xxx decline in the
violent crime rate
In the Duflo study...
– For young people who got additional schooling
because a school opened up in their area, each
additional year of schooling resulted in a xxx
increase in wages
47. Instrumental Variables and
Local Average Treatment Effects
Assume a binary instrument and a binary treatment
“Compliers”
X = 1 if and only if Z = 1 (X = 0 if Z = 0 and X = 1 if Z = 1)
These are the people from whom identification comes
“Always Takers”
X = 1 regardless of Z (X = 1 if Z = 0 and X = 1 if Z = 1)
“Never Takers”
X = 0 regardless of Z (X = 0 if Z = 0 and X = 0 if Z = 1)
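A minimal simulation sketch of why identification comes from compliers. The shares of each type and the type-specific treatment effects are hypothetical values chosen for illustration. With these assumptions the population average effect is 2.5, while the Wald ratio recovers the complier effect of 2.0.

```python
import random

def mean(v):
    return sum(v) / len(v)

# Hypothetical treatment effects by compliance type.
EFFECT = {"complier": 2.0, "always": 5.0, "never": 1.0}

rng = random.Random(5)
n = 40000
rows = []
for _ in range(n):
    u = rng.random()
    kind = "complier" if u < 0.50 else ("always" if u < 0.75 else "never")
    z = 1 if rng.random() < 0.5 else 0                 # randomized binary instrument
    if kind == "complier":
        x = z                                          # treatment follows the instrument
    else:
        x = 1 if kind == "always" else 0               # treatment ignores the instrument
    y = EFFECT[kind] * x + rng.gauss(0, 1)
    rows.append((z, x, y))

first_stage = mean([x for z, x, y in rows if z == 1]) - mean([x for z, x, y in rows if z == 0])
reduced_form = mean([y for z, x, y in rows if z == 1]) - mean([y for z, x, y in rows if z == 0])
wald = reduced_form / first_stage
print("complier share (first stage):", round(first_stage, 3), "Wald:", round(wald, 3))
```

The first stage estimates the share of compliers, and the never-taker effect never shows up in the data because never takers are untreated in both arms.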
48. Instrumental Variables and
Local Average Treatment Effects
In the draft example
“Compliers”: those for whom the lottery number makes a
difference to the army service decision
“Always Takers”: would have volunteered anyway
“Never Takers”: would have avoided the draft irrespective of
their lottery number
If we assume that the impact of army service on wages is the
same for every individual in the population then this IV estimate
represents a population average. However, if the impact of
army service is different for “non-compliers”, then we must be
careful while extrapolating IV estimates to the whole population.
49. Local Average Treatment Effects
L.A.T.E.
IV estimates may not be easily compared to
each other or to OLS because of LATE.
Similarly the IV estimate may not be
meaningful for the policy question at hand.
IV will not produce the average treatment
effect, but instead the average treatment effect
for all those individuals who helped provide
us with identifying variation
50. L.A.T.E.
For the fertility instruments
For twins and sex composition...
– People who had more children than they otherwise
would have (those who desire small families)
– Sex composition: further restricted to the subsample
that cares about the sex composition of their children
For infertility...
– People who had fewer children than they otherwise
would have (those who desire large families)
If interested in family planning programs this is the policy
relevant one.
51. Instrumental Variables
and Randomized Experiments
Imperfect compliance in randomized trials
– Some individuals assigned to treatment group will
not receive Tx, and/or some assigned to control
group will receive Tx
Assignment error; subject refusal; investigator discretion
– Some individuals who receive Tx will not change
their behavior, and some who do not receive Tx will
change their behavior
A problem in randomized job training studies and other
social experiments (e.g., housing vouchers)
52. Instrumental Variables
and Randomized Experiments
Two different measures of treatment (X)
– Treatment assigned = Exogenous
Intention-to-treat (ITT) analysis
– Reduced-form model: Y = δ0 + δ1Z + ξ
– Where Z is randomized into treatment
Often leads to underestimation of treatment effect
– Treatment delivered = Endogenous
Individuals who do not comply probably differ in ways that
can undermine the study
Self-selection bias and inconsistency
53. Angrist (2006), J.E.C.
Minneapolis domestic violence experiment
– Sherman and Berk (1984)
Cases of male-on-female misdemeanor assault in two high-
density precincts, in which both parties present at scene
– Random assignment of arrest-mediation
– But...treatment assigned was not treatment delivered
Fidelity vis-à-vis arrest, but many subjects (~25%) assigned
to mediation were arrested
– “Upgrading” was more likely when suspect was rude, suspect
assaulted officer, weapons were involved, victim persistently
demanded arrest, and incident violated restraining order
55. Angrist (2006), J.E.C.
Estimates of effect of arrest (vs. mediate) on D.V.
recidivism (Tables 2, 3)
– OLS: b = –.070 (s.e. = .038)
– ITT: b = –.108 (s.e. = .041)
– 2SLS: b = –.140 (s.e. = .053)
Deterrent effect of arrest is twice as large in 2SLS as
opposed to OLS
– In this context, the 2SLS estimate is known as a
treatment on the treated (it will always be at least as large
in magnitude as the ITT; think of the Wald equation).
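A back-of-envelope check using the Wald logic and the ITT and 2SLS estimates reported above. It assumes a single binary instrument (assigned treatment); because the reported estimates may include covariates, the implied first stage is only approximate.

```latex
\begin{align*}
b_{2SLS} &= \frac{\text{ITT}}{\text{first stage}}
  \;\Rightarrow\;
  \text{first stage} \approx \frac{-0.108}{-0.140} \approx 0.77 \\
|\text{first stage}| \le 1 &\;\Rightarrow\; |b_{2SLS}| \ge |\text{ITT}|
\end{align*}
```

The implied difference in arrest rates between the two assignment groups, roughly 0.77, is in line with the slide's remark that about 25% of those assigned to mediation were arrested.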