IV Slides 2020.pptx

Instrumental Variables Estimation

Review of the Linear Model
 Population model: Y = α + βX + ε
– Assume that the true slope is positive, so β > 0
 Sample model: Y = a + bX + e
– Least squares (LS) estimator of β:
bLS = (X′X)⁻¹X′Y = Cov(X,Y) / Var(X)
 Under what conditions can we speak of bLS
as a causal estimate of the effect of X on Y?

Review of the Linear Model
 Key assumption of the linear model:
E(X′e) = Cov(X,e) = E(e | X) = 0
– Exogeneity assumption = X is uncorrelated with
the unobserved determinants of Y
 Important statistical property of the LS
estimator under exogeneity:
E(bLS) = β + Cov(X,e) / Var(X)
plim(bLS) = β + Cov(X,e) / Var(X)
Under exogeneity both second terms are 0,
so bLS is unbiased and consistent

When Is the
Exogeneity Assumption Violated?
Omitted variable (W) that is correlated with
both X and Y
– Classic problem of omitted variables bias
 Coefficient on X will absorb the indirect path through W,
whose sign depends on Cov(X,W) and Cov(W,Y)
[Diagram: omitted variable W linked to both X and Y]
Things are more complicated in applied settings because there are bound
to be many W’s.
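A minimal simulation sketch of the omitted-variables problem described above. The data-generating process (the values of beta and gamma, the 0.5 loading of X on W, and the sample size) is an assumption chosen for illustration and does not come from the slides. It checks that the LS slope with W left out is close to β plus the omitted-variable term.

```python
import random
from statistics import mean

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(0)
n = 50_000
beta, gamma = 1.0, 2.0  # assumed: true effect of X on Y, effect of omitted W on Y

w = [random.gauss(0, 1) for _ in range(n)]
x = [0.5 * wi + random.gauss(0, 1) for wi in w]  # X is correlated with W
y = [beta * xi + gamma * wi + random.gauss(0, 1) for xi, wi in zip(x, w)]

b_ls = cov(x, y) / cov(x, x)                    # LS slope with W omitted
implied = beta + gamma * cov(x, w) / cov(x, x)  # beta plus the omitted-variable term

print(f"b_LS = {b_ls:.3f}, beta + bias term = {implied:.3f}, true beta = {beta}")
```

Here Cov(X,W) and the effect of W on Y are both positive, so the LS slope overstates β; flipping the sign of either one would flip the sign of the bias.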

Instrumental Variables
Estimation Is a Viable Approach
 An “instrumental variable” for X is one solution
to the problem of omitted variables bias
 Requirements for Z to be a
valid instrument for X
– Relevant = Correlated with X
– Exogenous = Not correlated
with Y except through its
correlation with X
[Diagram: path from Z through X to Y, with W and e as other determinants of Y]

Instrumental Variables Models
 I often hear...“A good instrument should not
be correlated with the dependent variable”
– WRONG!!!
 Z has to be correlated with Y, otherwise it is
useless as an instrument
– It can only be correlated with Y through X
 A good instrument must not be correlated
with the unobserved determinants of Y

 Requirements for Z to be a good instrument
for X
– Assumption #1 (Validity):
 Z is correlated with X
– Assumption #2 (Exclusion Restriction):
 Z is not correlated with Y except through its correlation
with X. Once we know X, there is no extra predictive
information in Z; in other words, Z can be excluded from
the equation Y = Xβ + ε, that is, Z is not correlated with ε.

– Assumption #1: Z is correlated with X
 Testable
 Regress endogenous (key) X on Z (and other controls
(W))
 First stage: X = α0 + α1Z + α2W +ω
 Look at the F statistic in the first stage. Rule of thumb: F should
be 10 or greater; otherwise there are weak-instrument problems
(more on this later)
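A rough sketch of the first-stage check, simplified to one instrument and no controls so that the F statistic is R²(N – 2) / (1 – R²). The function name first_stage_f and the simulated first-stage coefficients (0.30 and 0.02) are hypothetical; with controls W the relevant statistic is the partial F on the excluded instrument, which this sketch does not compute.

```python
import random
from statistics import mean

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def first_stage_f(x, z):
    """F statistic for H0: a1 = 0 in X = a0 + a1*Z + u (one instrument, no controls)."""
    n = len(x)
    r2 = cov(z, x) ** 2 / (cov(z, z) * cov(x, x))
    return r2 * (n - 2) / (1 - r2)

random.seed(1)
n = 2_000
z = [random.gauss(0, 1) for _ in range(n)]
x_strong = [0.30 * zi + random.gauss(0, 1) for zi in z]  # assumed strong first stage
x_weak = [0.02 * zi + random.gauss(0, 1) for zi in z]    # assumed weak first stage

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)
print(f"F (strong) = {f_strong:.1f}, F (weak) = {f_weak:.1f}, rule of thumb: F >= 10")
```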

– Assumption #2: Z is not correlated with Y except
through its correlation with X (Z is not correlated
with ε after controlling for W)
 NOT TESTABLE !!!*
 You need to convince yourself and the reader that your
instrument is appropriate (that people with high and
low values of Z share all omitted traits that affect Y).
*When there are multiple instruments, there are overidentification tests, but
they still depend on having at least one valid instrument, which is untestable.

Examples of Possible IVs
 Fertility and Female Labor Supply
– Y is labor force participation
– Endogenous variable is family size
– Potential Instruments for number of children:
Twins, Abortion, Sex Composition of first two
children, Infertility, Mother’s Education
– All impact completed family size so they meet the
validity requirement. Which ones do/do not meet
the exclusion restriction?

Instrumental
Variables Terminology
 Three different models to be familiar with
– First stage: X = α0 + α1Z + ω
– Structural model: Y = β0 + β1X + ε
– Reduced form: Y = δ0 + δ1Z + ξ

The Wald estimator example
 My Journal of Human Resources paper
– Y is female labor supply
– X is number of children (one endogenous variable)
– Z is an indicator for infertility (binary instrument)
 The Wald estimator is the ratio of the reduced form to
first stage
 bWald = [E(Y | Z = 1) – E(Y | Z = 0)] / [E(X | Z = 1) – E(X | Z = 0)]
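The Wald ratio above can be sketched on simulated data. Everything in the data-generating process below (a binary Z that lowers X by one unit, an unobserved confounder u, and a true effect of –0.5) is an assumption for illustration, not the data from the paper.

```python
import random
from statistics import mean

random.seed(2)
n = 40_000
beta = -0.5  # assumed true effect of X on Y

z, x, y = [], [], []
for _ in range(n):
    zi = 1 if random.random() < 0.5 else 0        # binary instrument
    u = random.gauss(0, 1)                        # unobserved confounder
    xi = 2.0 - 1.0 * zi + u + random.gauss(0, 1)  # Z lowers X by one unit
    yi = beta * xi - u + random.gauss(0, 1)       # Z affects Y only through X
    z.append(zi)
    x.append(xi)
    y.append(yi)

def group_mean(v, val):
    """Mean of v among observations with Z == val."""
    return mean([vi for vi, zi in zip(v, z) if zi == val])

reduced_form = group_mean(y, 1) - group_mean(y, 0)  # E(Y | Z=1) - E(Y | Z=0)
first_stage = group_mean(x, 1) - group_mean(x, 0)   # E(X | Z=1) - E(X | Z=0)
b_wald = reduced_form / first_stage

print(f"reduced form = {reduced_form:.3f}, first stage = {first_stage:.3f}")
print(f"b_Wald = {b_wald:.3f} (true beta = {beta})")
```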

The Wald estimator (in pictures)
First Stage

The Wald estimator (in pictures)
Reduced Form and Wald

Different Types of
Instrumental Variables Estimators
 Single binary instrument and no control
variables...
bWald = bIV = b2SLS
 Single instrument (binary or continuous) with
or without control variables...
bIV = b2SLS
 Multiple instruments (binary or continuous)
with or without control variables...
b2SLS

Different Types of
Instrumental Variables Estimators
 Least squares (LS) estimator of β:
– bLS = (X′X)⁻¹X′Y = Cov(X,Y) / Var(X)
 Instrumental variables (IV) estimator:
bIV = (Z′X)⁻¹Z′Y = Cov(Z,Y) / Cov(Z,X)
– Shows that bIV can be recovered from two samples
 Two-stage least squares (2SLS) estimator:
b2SLS = (X̃′X̃)⁻¹X̃′Y = Cov(X̃,Y) / Var(X̃)
– X̃ represents “fitted” value from first-stage model
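As a sketch of how the three formulas relate with a single instrument and no controls, the following simulation computes bLS, bIV, and b2SLS from covariances. The coefficients in the simulated model are assumptions for illustration. With one instrument, bIV and b2SLS coincide up to floating-point error, while bLS is pulled away from β by the confounder.

```python
import random
from statistics import mean

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(3)
n = 20_000
beta = 1.0  # assumed true effect of X on Y

z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta * xi + 2.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

b_ls = cov(x, y) / cov(x, x)  # Cov(X,Y) / Var(X)
b_iv = cov(z, y) / cov(z, x)  # Cov(Z,Y) / Cov(Z,X)

# 2SLS: replace X with its first-stage fitted value, then apply the LS formula
a1 = cov(z, x) / cov(z, z)
a0 = mean(x) - a1 * mean(z)
x_fit = [a0 + a1 * zi for zi in z]
b_2sls = cov(x_fit, y) / cov(x_fit, x_fit)

print(f"b_LS = {b_ls:.3f}, b_IV = {b_iv:.3f}, b_2SLS = {b_2sls:.3f}")
```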

Important Point about
 Not all of the available variation in X is used
(this means standard errors will increase !!)
– Only that portion of X which is “explained” by
Z is used to explain Y
[Venn diagram of X, Y, and Z, where X = endogenous variable, Y = response variable, Z = instrumental variable]

[Venn diagram, weak first stage] Realistic scenario: Very little of X is explained by Z, or what is explained does not overlap much with Y.
[Venn diagram, strong first stage] Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for.

Statistical Inference with IV
 Variance estimation
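$\sigma^2_{\beta_{LS}} = \sigma^2_{\varepsilon} / SST_X$
$\sigma^2_{\beta_{IV}} = \sigma^2_{\varepsilon} / (SST_X \times R^2_{X,Z})$
where $\varepsilon = Y - \beta_0 - \beta_1 X$
 NOTICE: Because $R^2_{X,Z} < 1$, $s_{b_{IV}} > s_{b_{LS}}$
– IV standard errors tend to be large, especially when $R^2_{X,Z}$ is very small, which can lead to type II errors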
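A small arithmetic sketch of what the two variance formulas imply: with the same error variance in both, the ratio of the IV to the LS standard error is 1 / √R² of the first stage. The helper name se_inflation and the R² values are hypothetical.

```python
from math import sqrt

def se_inflation(r2_xz):
    """Ratio s_bIV / s_bLS implied by the two variance formulas: 1 / sqrt(R^2 of X on Z)."""
    if not 0 < r2_xz <= 1:
        raise ValueError("first-stage R^2 must lie in (0, 1]")
    return 1 / sqrt(r2_xz)

for r2 in (0.25, 0.05, 0.01):  # hypothetical first-stage R^2 values
    print(f"R^2 = {r2:.2f}: IV standard error is {se_inflation(r2):.1f} times the LS one")
```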

Regression Results
bWald = bIV = b2SLS (with no controls)
Table 3
Number of Children and Labor Force Participation of Women
Dependent variable: Women worked in the last 12 months (=1)

                             Model 1                Model 2
                             OLS        IV          OLS        IV
                             (i)        (ii)        (iii)      (iv)
Number of children           -0.024***  -0.006      -0.017***  -0.005
                             [0.002]    [0.008]     [0.002]    [0.007]
Observations                 90,965     90,965      90,965     90,965
F-statistic (first stage)               814.2                  853.9

Note: Robust standard errors (in brackets) are clustered at the sub-national level. * denotes
significance at 10 percent, ** at 5 percent, and *** at 1 percent. Model 1 includes women’s
age and survey fixed effects. Model 2 adds to Model 1 education, age and education
interactions, age at first intercourse, marital status, age at first marriage, and spouse’s
education. The 2SLS models instrument for children at home using the union of the infertility
measures. The F-statistic refers to the first stage results.

“Evidence to support the
exclusion restriction”
 We are worried that infertility captures
unobservables that directly impact female
labor supply
– Show balance in observables between fertile and
infertile women (after conditioning on age)
– Include key controls (health indicators) and show
that results are unchanged
– Cite medical literature on infertility which argues
for randomness

Example: Levitt (1997), A.E.R.
 Breaking the simultaneity in the police-crime
connection
– When more police are hired, crime should decline
– But...more police may be hired during crime waves
 Election cycles and police hiring
– Increases in size of police force disproportionately
concentrated in election years
– Growth is 2.1% in mayoral election years, 2.0% in
gubernatorial election years, and 0.0% in non-
election years

Levitt (1997), A.E.R.
 However...can election cycles affect crime
rates through other spending channels?
– Ex., education, welfare, unemployment benefits
– If so, all of these other indirect channels must be
netted out or shown to not be true
[Diagram: Election Year → (+) Growth in Police Manpower → (–) Growth in Crime Rate]

[Figure: first-stage coefficients]
[Figure: reduced-form coefficients]

 Comparative estimates of the effect of police
manpower on city crime rates
– Violent crime rate
 Changes: bOLS = –.27 (s.e. = .06)
 Changes: bIV = –1.39 (s.e. = .55)
– Property crime rate
 Changes: bOLS = –.23 (s.e. = .09)
 Changes: bIV = –.38 (s.e. = .83)
Note larger s.e.

How to do IV with one instrument
(Z) and covariates (W)
 Step 1: X = a0 + a1Z + a2W+ u
– Obtain fitted values (X̃) from the first-stage model
 Step 2: Y = b0 + b1X̃ + b2W + e
– Substitute the fitted X̃ in place of the original X
– Note: If done manually in two stages, the standard
errors are based on the wrong residual:
e = Y – b0 – b1X̃ – b2W when it should be e = Y – b0 – b1X – b2W
 Best to just let the software do it for you
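The two steps, and the residual problem noted above, can be sketched as follows. This is an illustration under assumed values, not the software's implementation: the simulated coefficients are hypothetical, the control W is partialled out of Y, X, and Z first (by the Frisch-Waugh result this gives the same b1 as entering W at both stages), and the standard errors use the simple homoskedastic formula.

```python
import random
from math import sqrt
from statistics import mean

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def residualize(v, w):
    """Residuals from an LS regression of v on a constant and w."""
    slope = cov(w, v) / cov(w, w)
    mv, mw = mean(v), mean(w)
    return [vi - mv - slope * (wi - mw) for vi, wi in zip(v, w)]

random.seed(4)
n = 20_000
b1_true = 1.0  # assumed true effect of X on Y
w = [random.gauss(0, 1) for _ in range(n)]  # observed control
z = [random.gauss(0, 1) for _ in range(n)]  # instrument
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [0.8 * zi + 0.5 * wi + ui + random.gauss(0, 1) for zi, wi, ui in zip(z, w, u)]
y = [b1_true * xi + 0.7 * wi + 2.0 * ui + random.gauss(0, 1)
     for xi, wi, ui in zip(x, w, u)]

# Partial the control W (and the constant) out of Y, X and Z
y_r, x_r, z_r = residualize(y, w), residualize(x, w), residualize(z, w)

# Step 1: first stage and fitted values
a1 = cov(z_r, x_r) / cov(z_r, z_r)
x_fit = [a1 * zi for zi in z_r]

# Step 2: regress Y on the fitted values
b1 = cov(x_fit, y_r) / cov(x_fit, x_fit)

def std_error(resid):
    """Homoskedastic standard error of b1 for a given residual vector."""
    sigma2 = sum(e * e for e in resid) / (n - 3)
    return sqrt(sigma2 / sum(xf * xf for xf in x_fit))

se_manual = std_error([yi - b1 * xf for yi, xf in zip(y_r, x_fit)])  # wrong: uses fitted X
se_correct = std_error([yi - b1 * xi for yi, xi in zip(y_r, x_r)])   # right: uses actual X

print(f"b1 = {b1:.3f}, manual s.e. = {se_manual:.4f}, correct s.e. = {se_correct:.4f}")
```

In this particular design the manual standard error is too large; in other designs it can be too small, which is one more reason to let the software compute it.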

Including Control
Variables in an IV/2SLS Model
 Control variables (W’s) should be entered into
the model at both stages
– First stage: X = a0 + a1Z + a2W + u
– Second stage: Y = b0 + b1X̃ + b2W + e
 Control variables are considered
“instruments,” they are just not “excluded
instruments”
– They serve as their own instrument

Software Considerations
 Basic model specification in Stata
ivreg y (x = z) w [weight = wtvar], options
y = dependent variable
x = endogenous variable
z = instrumental variable/s
w = control variable(s)
– Useful options: first, ffirst, robust, cluster(varname)

Functional
Form Considerations with IV/2SLS
 Binary endogenous regressor (X)
– Consistency of second-stage estimates does not
hinge on getting first-stage functional form correct
– Can run OLS not probit in the first stage
 Binary response variable (Y)
– IV probit (or logit) is feasible but is technically
unnecessary
 In both cases, linear model is tractable, easily
interpreted, and consistent

Functional
Form Considerations with IV/2SLS
– Linear and squared X’s are treated as two different
endogenous regressors, each of which needs its
own instrument
– Entering first-stage fitted values and their square
into second-stage model leads to inconsistency
 The square of a linear projection is not equivalent to a
linear projection on a quadratic
– Squares and cross-products of IV’s should be
treated (when appropriate) as additional
instruments.

More Examples of Possible IVs
 Random Experiment with imperfect compliance
– The draft, actual experiments
 Instruments trying to approximate a random
encouragement design
– Distance to hospital/school
 Natural Experiments
– Shift shares, quarter of birth, Election years
– If you can identify a new instrument and convince
the readers that it satisfies the exclusion
restriction then you have a paper

Often a given instrument can be
used in multiple settings
 Relationship between childhood TV watching and
autism
– Instrument rainy/snowy days
– First stage/Validity: More TV is watched on “bad” days
– Exclusion Restriction: The weather is pretty random and
should not impact autism rates
 Could use the same instrument to look at, say, TV watching and
vision problems
 Bad instrument for the relationship between TV watching and
school grades if bad weather depresses kids (and depressed
kids get worse grades)

and dif-n-dif
 An instrument can be formed by interacting two variables,
say time and group. We are then using a DD as the first
stage of the relationship. In the second stage, we control
for the two uninteracted variables.
 Consider the school experiment in (Duflo 2000). There are
two types of regions (High H and Low L regions) and two
types of cohorts (Young Y and Old O). The program
affected mostly the education of young cohorts in the high
program regions.
 Assume that the program affected the wage of the
individuals only through its effects on education

Rules for Good Practice with
 IV models can be very informative, but it is
your job to convince your audience
– Show the first-stage model diagnostics
 Even the most clever IV might not be sufficiently strongly
related to X to be a useful source of identification
– Report test(s) of overidentifying restrictions
 An invalid IV is often worse than no IV at all
– Report LS endogeneity (DWH) test

Useful
Diagnostic Tools for IV Models
 Overidentification test
– Model must be overidentified, i.e., more IV’s than
endogenous X’s
– We can then test whether the excluded instruments
are appropriately independent of the error process
 In Stata it can be calculated after ivreg
estimation with the overid command

Tests of Instrument Exogeneity
A test of overidentifying restrictions regresses the
residuals from a 2SLS regression on all instruments
in Z.
H0: All IV’s uncorrelated with structural error
 Overidentification test:
1. Estimate structural model
2. Regress IV residuals on all exogenous variables
3. Compute NR2 and compare to chi-square
 df = # IV’s – # endogenous X’s
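The three steps above can be sketched for the simplest overidentified case: one endogenous X, two instruments, no controls, so df = 1. The function names ls_two and overid_stat and the simulated model are assumptions for illustration. In the "invalid" outcome, the second instrument is given a direct effect on Y, which violates the exclusion restriction.

```python
import random
from statistics import mean

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def ls_two(v, z1, z2):
    """Slopes from an LS regression of v on a constant, z1 and z2."""
    s11, s22, s12 = cov(z1, z1), cov(z2, z2), cov(z1, z2)
    s1v, s2v = cov(z1, v), cov(z2, v)
    det = s11 * s22 - s12 ** 2
    return (s1v * s22 - s2v * s12) / det, (s2v * s11 - s1v * s12) / det

def overid_stat(y, x, z1, z2):
    """N * R^2 from regressing the 2SLS residuals on both instruments."""
    n = len(y)
    a1, a2 = ls_two(x, z1, z2)                      # first stage
    x_fit = [a1 * p + a2 * q for p, q in zip(z1, z2)]
    b = cov(x_fit, y) / cov(x_fit, x_fit)           # step 1: structural estimate
    resid = [yi - b * xi for yi, xi in zip(y, x)]   # IV residuals use X, not fitted X
    c1, c2 = ls_two(resid, z1, z2)                  # step 2: residuals on instruments
    r2 = (c1 * cov(z1, resid) + c2 * cov(z2, resid)) / cov(resid, resid)
    return n * r2                                   # step 3: compare to chi-square(1)

random.seed(5)
n = 5_000
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [p + q + ui + random.gauss(0, 1) for p, q, ui in zip(z1, z2, u)]
y_valid = [xi + 2.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]
y_invalid = [yi + q for yi, q in zip(y_valid, z2)]  # z2 now affects Y directly

stat_valid = overid_stat(y_valid, x, z1, z2)
stat_invalid = overid_stat(y_invalid, x, z1, z2)
print(f"valid: {stat_valid:.2f}, invalid: {stat_invalid:.2f}, 5% critical value: 3.84")
```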

Useful
Diagnostic Tools for IV Models
 Durbin-Wu-Hausman test
– Endogeneity of the problem regressor(s)
In Stata ivendog

 The IV estimator is BIASED
– In other words, E(bIV) ≠ β (finite-sample bias)
– The appeal of IV derives from its consistency
 “Consistency” is a way of saying that b converges in probability to β as N → ∞
 So…IV studies often have very large samples
– But with endogeneity, E(bLS) ≠ β and plim(bLS) ≠ β
anyway
 Asymptotic behavior of IV
plim(bIV) = β + Cov(Z,e) / Cov(Z,X)
– If Z is truly exogenous, then Cov(Z,e) = 0
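A short arithmetic sketch of the plim formula above. The covariance values are hypothetical and chosen only to show that the same small Cov(Z,e) does more damage when Cov(Z,X) is small.

```python
def iv_plim(beta, cov_ze, cov_zx):
    """plim(b_IV) = beta + Cov(Z,e) / Cov(Z,X)."""
    return beta + cov_ze / cov_zx

beta = 1.0                            # hypothetical true slope
exogenous = iv_plim(beta, 0.0, 0.05)  # Cov(Z,e) = 0: consistent even if Z is weak
strong = iv_plim(beta, 0.02, 0.50)    # small violation, strong first stage
weak = iv_plim(beta, 0.02, 0.05)      # same violation, weak first stage

print(f"exogenous Z: {exogenous:.2f}, strong Z: {strong:.2f}, weak Z: {weak:.2f}")
```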

Durbin-Wu-Hausman (DWH) Test
 Balances the consistency of IV against the
efficiency of OLS
– H0: IV and OLS both consistent, but OLS is
efficient
– H1: Only IV is consistent
 DWH test for a single endogenous regressor:
$DWH = (b_{IV} - b_{LS}) / \sqrt{s^2_{b_{IV}} - s^2_{b_{LS}}} \sim N(0,1)$
– If |DWH| > 1.96, then X is endogenous and IV is
the preferred estimator despite its inefficiency
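A sketch of the single-regressor DWH statistic, applied to the Levitt (1997) estimates quoted earlier in these slides. The function name dwh is hypothetical. The formula assumes OLS is efficient under H0, which the published standard errors may not satisfy, so the numbers are illustrative only.

```python
from math import sqrt

def dwh(b_iv, b_ls, se_iv, se_ls):
    """(b_IV - b_LS) / sqrt(s2_bIV - s2_bLS), compared against N(0,1)."""
    var_diff = se_iv ** 2 - se_ls ** 2
    if var_diff <= 0:
        raise ValueError("the IV variance must exceed the LS variance")
    return (b_iv - b_ls) / sqrt(var_diff)

# Estimates and standard errors as quoted in the Levitt (1997) slide
dwh_violent = dwh(b_iv=-1.39, b_ls=-0.27, se_iv=0.55, se_ls=0.06)
dwh_property = dwh(b_iv=-0.38, b_ls=-0.23, se_iv=0.83, se_ls=0.09)

print(f"violent crime: DWH = {dwh_violent:.2f}")
print(f"property crime: DWH = {dwh_property:.2f}")
```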

Post-estimation tests
 These tests are useful, but have two
problems:
– They may reject if the treatment effect is
heterogeneous and the instruments exploit variation
at different parts of the treatment response function
(LATE vs. ATE, more on this soon)
– Their power is low, so they fail to reject too often

Rules for Good Practice with
 Most importantly, TELL A STORY about why a
particular IV is a “good instrument”
 Something to consider when thinking about whether a
particular IV is “good”
– Does the IV, for all intents and purposes, randomize the
endogenous regressor?
– Authors often show that, conditional on W (or some key
W), individuals with high and low values of Z look similar.
– Control for potential pathways and show they don’t
matter

Instrumental Variables and
Local Average Treatment Effects
 We have established the conditions that would yield
Internal Validity, i.e., that we would indeed get a
causal estimate of the effect of children on female
labor supply in our sample with the IV that we were
using.
 There is also the important issue of External Validity:
what do our estimates tell us about the world in
general? What results are particular to our sample and
the IV that we were using?

 Definition of a L.A.T.E.
– The average treatment effect for individuals “who
can be induced to change [treatment] status by a
change in the instrument”
 Imbens and Angrist (1994, p. 470)
– The average causal effect of X on Y for “compliers,”
as opposed to “always takers” or “never takers”
 L.A.T.E. is instrument-dependent, in contrast
to the population average treatment effect
(A.T.E.)

L.A.T.E.
in the Previous Examples
 In the Levitt study...
– In cities that increased police spending to appear
tough on crime during the election year, each
additional cop resulted in a mean xxx decline in the
violent crime rate
 In the Duflo study...
– For young people who got additional schooling
because a school opened up in their area, each
additional year of schooling resulted in a xxx
increase in wages

 Assume a binary instrument and a binary treatment
 “Compliers”
 X = 1 iff Z = 1, i.e., X = 0 if Z = 0 and X = 1 if Z = 1
 These are the people from whom identification comes
 “Always Takers”
 X = 1 regardless of Z, i.e., X = 1 if Z = 0 and X = 1 if Z = 1
 “Never Takers”
 X = 0 regardless of Z, i.e., X = 0 if Z = 0 and X = 0 if Z = 1

In the draft example
 “Compliers”: those for whom the lottery number makes a
difference to the army service decision
 “Always Takers”: would have volunteered anyway
 “Never Takers”: would have avoided the draft irrespective of
their lottery number
 If we assume that the impact of army service on wages is the
same for every individual in the population then this IV estimate
represents a population average. However, if the impact of
army service is different for “non-compliers”, then we must be
careful while extrapolating IV estimates to the whole population.

L.A.T.E.
 IV estimates may not be easily compared to
each other or to OLS because of LATE.
Similarly the IV estimate may not be
meaningful for the policy question at hand.
 IV will not produce the average treatment
effect, but instead the average treatment effect
for all those individuals who helped provide
us with identifying variation
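A simulation sketch of this point with a binary instrument and a binary treatment. The population shares and the treatment effects by compliance type are assumptions invented for illustration. The Wald/IV estimate lands near the compliers' effect, not near the population average.

```python
import random
from statistics import mean

random.seed(6)
n = 200_000
# Hypothetical treatment effects and population shares by compliance type
effect = {"complier": 2.0, "always": 5.0, "never": 1.0}
share = {"complier": 0.50, "always": 0.25, "never": 0.25}
ate = sum(share[k] * effect[k] for k in effect)  # population average effect

y1, y0, x1, x0 = [], [], [], []  # outcomes and treatment, split by Z
for _ in range(n):
    r = random.random()
    kind = "complier" if r < 0.50 else ("always" if r < 0.75 else "never")
    z = random.random() < 0.5                    # randomly assigned instrument
    x = 1 if kind == "always" or (kind == "complier" and z) else 0
    base = 1.0 if kind == "always" else 0.0      # always takers also differ at baseline
    y = base + effect[kind] * x + random.gauss(0, 1)
    (y1 if z else y0).append(y)
    (x1 if z else x0).append(x)

wald = (mean(y1) - mean(y0)) / (mean(x1) - mean(x0))
print(f"Wald/IV = {wald:.2f}, complier effect = {effect['complier']}, ATE = {ate}")
```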

L.A.T.E.
For the fertility instruments
 For twins and sex composition...
– People who had more children than they otherwise
would have (those who desire small families)
– Sex composition: further restricted to the subsample
that cares about the sex composition of their children
 For infertility...
– People who had fewer children than they otherwise
would have (those who desire large families)
 If interested in family planning programs, this is the
policy-relevant one.

and Randomized Experiments
 Imperfect compliance in randomized trials
– Some individuals assigned to treatment group will
not receive Tx, and/or some assigned to control
group will receive Tx
 Assignment error; subject refusal; investigator discretion
– Some individuals who receive Tx will not change
their behavior, and some who do not receive Tx will
change their behavior
 A problem in randomized job training studies and other
social experiments (e.g., housing vouchers)

and Randomized Experiments
 Two different measures of treatment (X)
– Treatment assigned = Exogenous
 Intention-to-treat (ITT) analysis
– Reduced-form model: Y = δ0 + δ1Z + ξ
– Where Z is randomized into treatment
 Often leads to underestimation of treatment effect
– Treatment delivered = Endogenous
 Individuals who do not comply probably differ in ways that
can undermine the study
 Self-selection → bias and inconsistency

Angrist (2006), J.E.C.
 Minneapolis domestic violence experiment
– Sherman and Berk (1984)
 Cases of male-on-female misdemeanor assault in two high-
density precincts, in which both parties present at scene
– Random assignment of arrest-mediation
– But...treatment assigned was not treatment delivered
 Fidelity vis-à-vis arrest, but many subjects (~25%) assigned
to mediation were arrested
– “Upgrading” was more likely when suspect was rude, suspect
assaulted officer, weapons were involved, victim persistently
demanded arrest, and incident violated restraining order

[Diagram: Treatment Assigned (Arrest) → (+) Treatment Delivered (Arrest) → (–) Recidivism; Violence Proneness → (+) Treatment Delivered (Arrest) and (+) Recidivism]

 Estimates of effect of arrest (vs. mediate) on D.V.
recidivism (Tables 2, 3)
– OLS: b = –.070 (s.e. = .038)
– ITT: b = –.108 (s.e. = .041)
– 2SLS: b = –.140 (s.e. = .053)
 Deterrent effect of arrest is twice as large in 2SLS as
in OLS
– In this context, the 2SLS estimate is known as the
treatment on the treated (it will always be larger in
magnitude than the ITT; think of the Wald estimator).
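A back-of-the-envelope sketch using the three estimates quoted above. By the Wald logic, the 2SLS estimate is the ITT divided by the difference in treatment delivered between the assigned groups, so the ratio ITT / 2SLS gives the implied first stage. This identity is exact only without covariates, so the implied value is approximate if the published models include them.

```python
# Estimates as quoted on this slide (Angrist 2006, Tables 2 and 3)
b_ols = -0.070
b_itt = -0.108
b_2sls = -0.140

implied_first_stage = b_itt / b_2sls  # difference in delivery rates implied by Wald
ratio_2sls_to_ols = b_2sls / b_ols

print(f"implied first stage = {implied_first_stage:.3f}")
print(f"2SLS / OLS = {ratio_2sls_to_ols:.1f}")
```

Because the implied first stage is below 1, dividing by it scales the ITT up in magnitude.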
