2. Causality
• Economists’ use of data to answer cause-and-effect
questions constitutes the field of applied econometrics
• Comparisons made under ceteris paribus (other things equal) conditions
have a causal interpretation
– real other-things-equal comparisons are hard to engineer;
some would even say impossible
• However, program evaluators use data to approximate
other-things-equal comparisons in spite of the obstacles,
called selection bias or omitted variables bias, that lie on
the path from raw numbers to reliable causal knowledge.
3. Set-up
• Americans spend heavily on health care but have relatively poor health
• Medicare covers the elderly and Medicaid covers the low-income population
• But many Americans have no health insurance at all
• The ceteris paribus question in this context contrasts the health of someone with
insurance coverage with the health of the same person were they without insurance
• Data: the National Health Interview Survey (NHIS), an annual survey of the U.S.
population with detailed information on health and health insurance
• NHIS asks: “Would you say your health in general is excellent, very good, good, fair,
or poor?”
• Build an index that assigns 5 to excellent health and 1 to poor health in a sample of
married 2009 NHIS respondents who may or may not be insured.
• Treatment group = insured; comparison (or control) group = uninsured
• A good control group reveals the fate of the treated in a counterfactual world
where they are not treated.
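A minimal Python sketch of the health index and the treated-versus-control comparison described above. The five respondents, their answers and their insurance status are made up for illustration, and `HEALTH_INDEX`, `health_index` and `group_mean` are hypothetical names, not part of the NHIS.

```python
# Hypothetical coding of the NHIS self-reported health question:
# 5 = excellent, 4 = very good, 3 = good, 2 = fair, 1 = poor.
HEALTH_INDEX = {"excellent": 5, "very good": 4, "good": 3, "fair": 2, "poor": 1}


def health_index(answer: str) -> int:
    """Map a survey answer to the 1-5 health index."""
    return HEALTH_INDEX[answer.strip().lower()]


def group_mean(records, insured: bool) -> float:
    """Average health index for the treatment (insured) or control (uninsured) group."""
    values = [health_index(ans) for ans, has_hi in records if has_hi == insured]
    return sum(values) / len(values)


# Made-up respondents: (answer, has health insurance?)
sample = [
    ("Excellent", True),
    ("Very good", True),
    ("Good", True),
    ("Good", False),
    ("Fair", False),
]

treated = group_mean(sample, insured=True)    # (5 + 4 + 3) / 3
control = group_mean(sample, insured=False)   # (3 + 2) / 2
print(treated, control, treated - control)    # 4.0 2.5 1.5
```

The 1.5-point gap is only a raw difference in group means; whether it has a causal interpretation is the question the next slides take up.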
4. TABLE 1.1
Health and demographic characteristics of insured and uninsured couples in the NHIS
Notes: This table reports average characteristics for insured and uninsured married couples in the 2009
National Health Interview Survey (NHIS). Columns (1), (2), (4), and (5) show average characteristics of the group
of individuals specified by the column heading. Columns (3) and (6) report the difference between the average
characteristic for individuals with and without health insurance (HI). Standard deviations are in brackets;
standard errors are reported in parentheses.
5. CETERIS PARIBUS
• Simple comparisons are misleading
– Other things NOT held constant
• Health-insured people are healthier, but they are also
wealthier, more educated, etc.
• Health is correlated with healthier habits,
education, income and the like
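To see why such a simple comparison misleads, here is a small simulation sketch in a hypothetical world where insurance has no effect on health at all, but higher income both raises health and makes coverage more likely. All parameters (income distribution, coverage probabilities, health equation) are assumptions chosen for illustration.

```python
import random


def gap_with_income_confounding(n: int, seed: int = 0) -> float:
    """Insured-minus-uninsured mean health in a hypothetical world where
    insurance has NO effect, but income raises both coverage and health."""
    rng = random.Random(seed)
    insured, uninsured = [], []
    for _ in range(n):
        income = rng.gauss(0.0, 1.0)                       # standardized income
        health = 3.0 + 0.5 * income + rng.gauss(0.0, 0.5)  # insurance does not appear here
        if rng.random() < (0.8 if income > 0 else 0.3):    # richer people insure more often
            insured.append(health)
        else:
            uninsured.append(health)
    return sum(insured) / len(insured) - sum(uninsured) / len(uninsured)


gap = gap_with_income_confounding(100_000)
print(round(gap, 2))  # positive, although the true effect of insurance is 0
```

Under these assumptions the insured look healthier by roughly 0.4 index points, purely because they are richer on average.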
6. Crux of Program Evaluation
• Yi = outcome (here, health) for individual i; it is
observed in the data
• But there are two potential outcomes, only one of which
is observed: Y1i if insured, Y0i if not insured
• The causal effect of the treatment (insurance) on
the outcome (health) is Y1i – Y0i
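Because only one potential outcome is ever observed, the observed outcome can be written in terms of both. With D_i = 1 if individual i is insured and D_i = 0 otherwise, the standard way to link them is:

```latex
Y_i = Y_{0i} + (Y_{1i} - Y_{0i})\, D_i =
\begin{cases}
Y_{1i} & \text{if } D_i = 1 \text{ (insured)} \\
Y_{0i} & \text{if } D_i = 0 \text{ (uninsured)}
\end{cases}
```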
7. Example
TABLE 1.2
Outcomes and treatments for Khuzdar and Maria

                                              Khuzdar    Maria
Potential outcome without insurance: Y0i        {3}        5
Potential outcome with insurance: Y1i            4        {5}
Treatment (insurance status chosen): Di          1         0
Actual health outcome: Yi                        4         5
Treatment effect: Y1i – Y0i                      1         0

Naïve comparison: Y_Khuzdar – Y_Maria = 4 – 5 = -1
{ }: not observed
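The arithmetic of Table 1.2 can be checked with a few lines of Python. The `Person` class is a hypothetical helper, not from the source; the numbers are the table's. The last step splits the naïve comparison into Khuzdar's own treatment effect and the difference in Y0i between the two, the part labelled selection bias.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    y0: int  # potential health without insurance (Y0i)
    y1: int  # potential health with insurance (Y1i)
    d: int   # insurance status chosen (Di): 1 = insured, 0 = not

    @property
    def y(self) -> int:
        """Observed outcome: Y1i if insured, Y0i otherwise."""
        return self.y1 if self.d == 1 else self.y0

    @property
    def effect(self) -> int:
        """Individual causal effect Y1i - Y0i (never observed in real data)."""
        return self.y1 - self.y0


khuzdar = Person("Khuzdar", y0=3, y1=4, d=1)
maria = Person("Maria", y0=5, y1=5, d=0)

naive = khuzdar.y - maria.y             # 4 - 5
causal = khuzdar.effect                 # 4 - 3
selection_bias = khuzdar.y0 - maria.y0  # 3 - 5
print(naive, causal, selection_bias)    # -1 1 -2
```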
8. Selection Bias
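• The naïve comparison splits into two parts:
$$
Y_{\text{Khuzdar}} - Y_{\text{Maria}}
= \underbrace{(Y_{1,\text{Khuzdar}} - Y_{0,\text{Khuzdar}})}_{4 - 3 = 1}
+ \underbrace{(Y_{0,\text{Khuzdar}} - Y_{0,\text{Maria}})}_{3 - 5 = -2}
= -1
$$
• 1 represents the causal effect of insurance (for Khuzdar)
• -2 reflects Khuzdar’s relative frailty and is called SELECTION BIAS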
• Core problem with selection bias: it does not average out in large samples,
and it persists when we try to calculate average causal effects
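A simulation sketch of why a bigger sample does not help. The assumptions are all hypothetical: a constant insurance effect `KAPPA = 1`, Y0i drawn from a normal distribution, and frailer people (below-average Y0i) insuring with probability 0.7 versus 0.3 for everyone else. The naïve insured-minus-uninsured gap is printed for increasing sample sizes.

```python
import random

KAPPA = 1.0  # hypothetical constant causal effect of insurance on health


def naive_gap_with_selection(n: int, seed: int = 1) -> float:
    """Insured-minus-uninsured mean health in a simulated sample where
    frailer people (lower Y0i) are more likely to buy insurance."""
    rng = random.Random(seed)
    insured, uninsured = [], []
    for _ in range(n):
        y0 = rng.gauss(3.0, 1.0)                          # health without insurance
        if rng.random() < (0.7 if y0 < 3.0 else 0.3):     # frail people select into insurance
            insured.append(y0 + KAPPA)                    # observed Y1i = Y0i + KAPPA
        else:
            uninsured.append(y0)                          # observed Y0i
    return sum(insured) / len(insured) - sum(uninsured) / len(uninsured)


for n in (100, 10_000, 200_000):
    print(n, round(naive_gap_with_selection(n), 3))
```

Under this selection rule the gap settles near 1 - 0.8·√(2/π) ≈ 0.36 rather than 1 as n grows: more data shrinks the noise but leaves the selection bias intact.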
9. Extension for a group as a whole
• Define a dummy variable Di denoting insurance
status for individual i (Di = 1 if insured, Di = 0 if not)
Denote by Avgn[·] the average over a sample of n individuals:
Avgn[Yi|Di = 1] = average among the insured: an average of the
outcome Y1i, but it contains no information about Y0i
Avgn[Yi|Di = 0] = average among the uninsured: an average of the
outcome Y0i, but it contains no information about Y1i
10. Contd.
• We are looking for the average causal effect,
Avgn[Y1i – Y0i], for the whole group of n individuals, but we see
average Y1i only for the insured and average Y0i
only for the uninsured
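• A simple manipulation leads to
$$
\text{Avg}_n[Y_i \mid D_i = 1] - \text{Avg}_n[Y_i \mid D_i = 0]
= \underbrace{\text{Avg}_n[Y_{1i} - Y_{0i} \mid D_i = 1]}_{\text{average causal effect on the insured}}
+ \underbrace{\text{Avg}_n[Y_{0i} \mid D_i = 1] - \text{Avg}_n[Y_{0i} \mid D_i = 0]}_{\text{selection bias}}
$$
where selection bias is defined as the difference in
average Y0i between the groups being compared.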
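The decomposition of the difference in group means into the average causal effect among the insured plus selection bias (the difference in average Y0i between the insured and the uninsured) holds exactly in any sample, not just on average. A short Python check on simulated data, where both potential outcomes are known (something real NHIS data never provide); the effect distribution and the selection rule are hypothetical.

```python
import random


def avg(xs):
    return sum(xs) / len(xs)


rng = random.Random(2)
people = []
for _ in range(50_000):
    y0 = rng.gauss(3.0, 1.0)                      # health without insurance (Y0i)
    y1 = y0 + rng.gauss(1.0, 0.5)                 # heterogeneous causal effects (Y1i)
    d = 1 if rng.random() < (0.7 if y0 < 3.0 else 0.3) else 0  # frailer people insure more
    people.append((y0, y1, d))

treated = [p for p in people if p[2] == 1]
control = [p for p in people if p[2] == 0]

# Difference in group means: Avgn[Yi|Di=1] - Avgn[Yi|Di=0]
diff_in_means = avg([y1 for _, y1, _ in treated]) - avg([y0 for y0, _, _ in control])
# Average causal effect on the insured: Avgn[Y1i - Y0i | Di=1]
effect_on_insured = avg([y1 - y0 for y0, y1, _ in treated])
# Selection bias: Avgn[Y0i|Di=1] - Avgn[Y0i|Di=0]
selection_bias = avg([y0 for y0, _, _ in treated]) - avg([y0 for y0, _, _ in control])

print(round(diff_in_means, 3), round(effect_on_insured, 3), round(selection_bias, 3))
```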
11. Selection on Unobservables
• Recall from the lower panel of Table 1.1 that health-insured people
differ from the uninsured in many observed characteristics
• If selection bias is due only to observed, measurable
characteristics (say, education), it can be eliminated
by focusing on samples of people with the same schooling.
• However, when observed differences proliferate, so should
our suspicions about unobserved differences
– even in a sample consisting of insured and uninsured people
with the same education, income, and employment status, the
insured might have higher values of Y0i.
• The principal challenge is eliminating the selection bias
that arises from such unobserved differences
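A Python sketch of this point on simulated data. Everything is hypothetical: a constant insurance effect `KAPPA = 1`, a binary education variable, and an unobserved trait `u` (think of it as health-consciousness) that raises Y0i. Comparing insured and uninsured people with the same schooling recovers `KAPPA` when coverage depends only on education, but not when it also depends on `u`.

```python
import random

KAPPA = 1.0  # hypothetical true (constant) effect of insurance on health


def gap_within_education(n: int, select_on_unobserved: bool, seed: int = 3) -> float:
    """Insured-minus-uninsured mean health among simulated people who all
    have the same (high) schooling, so education is held fixed."""
    rng = random.Random(seed)
    insured, uninsured = [], []
    for _ in range(n):
        educated = rng.random() < 0.5
        u = rng.gauss(0.0, 1.0)                    # unobserved trait, never seen by the analyst
        y0 = 2.5 + (1.0 if educated else 0.0) + 0.5 * u
        p = 0.8 if educated else 0.3               # coverage depends on education...
        if select_on_unobserved:
            p += 0.15 if u > 0 else -0.15          # ...and possibly on the unobserved trait
        is_insured = rng.random() < p
        if not educated:
            continue                               # compare only people with the same schooling
        if is_insured:
            insured.append(y0 + KAPPA)             # observed outcome is Y1i
        else:
            uninsured.append(y0)                   # observed outcome is Y0i
    return sum(insured) / len(insured) - sum(uninsured) / len(uninsured)


observed_only = gap_within_education(200_000, select_on_unobserved=False)
with_unobserved = gap_within_education(200_000, select_on_unobserved=True)
print(round(observed_only, 2), round(with_unobserved, 2))
```

Under these assumptions the first gap is close to 1.0, while the second is about 1.37: within the same education level the insured still have higher Y0i, so holding schooling fixed does not remove the bias.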