10. mastersalikeas’metrics.Thetoolsofthe’metricstrade aredisciplined
dataanalysis,pairedwiththemachineryofstatisticalinference.There
is
amysticalaspecttoourworkaswell:we’reaftertruth,buttruthisnot
revealed in full, and the messages the data transmit require
interpretation. In this spirit,wedraw inspiration fromthe
journeyof
KwaiChangCaine,herooftheclassicKungFuTVseries.Caine,amixe
d-
raceShaolinmonk,wandersinsearchofhisU.S.-bornhalf-
brotherinthe
nineteenthcenturyAmericanWest.Ashesearches,Cainequestionsal
l
hesees inhumanaffairs,uncoveringhiddenrelationshipsanddeeper
meanings.LikeCaine’s journey, theWayof ’Metrics is
illuminatedby
questions.
OtherThingsEqual
Inadisturbingdevelopmentyoumayhaveheardof,theproportionof
Americancollegestudentscompletingtheirdegreesinatimelyfashio
n
has taken a sharp turn south. Politicians and policy analysts
blame
fallingcollegegraduationratesonaperniciouscombinationoftuition
hikesand the large student loansmany studentsuse to finance
their
studies.Perhapsincreasedstudentborrowingderailssomewhowould
otherwisestayontrack.Thefactthatthestudentsmostlikelytodrop
out of school often shoulder large student loans would seem to
substantiatethishypothesis.
You’d rather pay for school with inherited riches than borrowed
money if you can. As we’ll discuss in detail, however,
education
probablyboostsearningsenoughtomakeloanrepaymentbearablefor
11. mostgraduates.Howthenshouldweinterpretthenegativecorrelation
betweendebtburdenandcollegegraduationrates?Doesindebtedness
causedebtorstodropout?Thefirstquestiontoaskinthiscontextiswho
borrows themost. Studentswhoborrowheavily typically come
from
middle and lower income families, since richer families have
more
savings.Formanyreasons,studentsfromlowerincomefamiliesarele
ss
likely to complete a degree than those fromhigher income
families,
regardlessofwhetherthey’veborrowedheavily.Weshouldtherefore
be
skeptical of claims that high debt burdens cause lower college
completionrateswhentheseclaimsarebasedsolelyoncomparisonsof
completionratesbetweenthosewithmoreorlessdebt.Byvirtueofthe
correlationbetweenfamilybackgroundandcollegedebt,thecontrasti
n
graduationratesbetweenthosewithandwithoutstudentloansisnotan
otherthingsequalcomparison.
Ascollegestudentsmajoringineconomics,wefirstlearnedtheother
thingsequal ideaby
itsLatinname,ceterisparibus.Comparisonsmade
underceterisparibusconditionshaveacausalinterpretation.Imagine
two
studentsidenticalineveryway,sotheirfamilieshavethesamefinanci
al
resourcesandtheirparentsaresimilarlyeducated.Oneofthesevirtual
twinsfinancescollegebyborrowingandtheotherfromsavings.Becau
se
theyareotherwiseequalineveryway(theirgrandmotherhastreated
bothtoasmallnestegg),differencesintheireducationalattainmentca
n
12. beattributedtothefactthatonlyonehasborrowed.Tothisday,we
wonderwhy somany economics students first encounter this
central
ideainLatin;maybeit’saconspiracytokeepthemfromthinkingabout
it.Because,as thishypotheticalcomparisonsuggests, realother
things
equalcomparisonsarehardtoengineer,somewouldevensayimpossib
ile
(that’sItaliannotLatin,butatleastpeoplestillspeakit).
Hardtoengineer,maybe,butnotnecessarilyimpossible.The’metrics
craftusesdatatogettootherthingsequalinspiteoftheobstacles —
called
selectionbiasoromittedvariablesbias —
foundonthepathrunningfrom
raw numbers to reliable causal knowledge. The path to causal
understandingisroughandshadowedasitsnakesaround theboulders
of selection bias. And yet, masters of ’metrics walk this path
with
confidenceaswellashumility,successfullylinkingcauseandeffect.
Our first line of attack on the causality problem is a randomized
experiment, often called a randomized trial. In a randomized
trial,
researcherschangethecausalvariablesofinterest(say,theavailabilit
y
ofcollegefinancialaid)foragroupselectedusingsomethinglikeacoin
toss.Bychangingcircumstancesrandomly,wemakeithighlylikelyth
at
the variable of interest is unrelated to the many other factors
determiningtheoutcomeswemeantostudy.Randomassignmentisn’t
thesameasholdingeverythingelse fixed,but ithas thesameeffect.
Randommanipulationmakesotherthingsequalholdonaverageacros
s
thegroupsthatdidanddidnotexperiencemanipulation.Asweexplain
13. inChapter1,“onaverage”isusuallygoodenough.
Randomized trials takeprideofplace inour ’metrics toolkit.Alas,
randomizedsocialexperimentsareexpensivetofieldandmaybeslowt
o
bear fruit, while research funds are scarce and life is short.
Often,
therefore,mastersof’metricsturntolesspowerfulbutmoreaccessible
researchdesigns.Evenwhenwecan’tpracticablyrandomize,howeve
r,
we still dreamof the trialswe’d like to do. The notion of an ideal
experimentdisciplinesourapproachtoeconometricresearch.Master
ing
’Metrics showshowwise application of our five favorite
econometric
toolsbringsusascloseaspossibletothecausality-revealingpowerofa
realexperiment.
Ourfavoriteeconometrictoolsareillustratedherethroughaseriesof
well-
craftedandimportanteconometricstudies.VettedbyGrandMast er
OogwayofKungFuPanda’sJadePalace,theseinvestigationsofcausa
l
effectsaredistinguishedbytheirawesomeness.Themethodstheyuse
—
random assignment, regression, instrumental variables,
regression
discontinuity designs, and differences-in-differences—are the
Furious
Five of econometric research. For starters, motivated by the
contemporary American debate over health care, the first
chapter
describes two social experiments that reveal whether, as many
14. policymakersbelieve,healthinsuranceindeedhelpsthosewhohaveit
stayhealthy.Chapters2–5putourothertoolstowork,craftinganswers
to importantquestions ranging from
thebenefitsofattendingprivate
collegesandselectivehighschoolstothecostsofteendrinkingandthe
effectsofcentralbankinjectionsofliquidity.
OurfinalchapterputstheFuriousFivetothetestbyreturningtothe
education arena. On average, college graduates earn about twice
as
muchashighschoolgraduates,anearningsgapthatonlyseemstobe
growing.Chapter6askswhetherthisgapisevidenceofalargecausal
returntoschoolingormerelyareflectionofthemanyotheradvantages
thosewithmoreeducationmighthave(suchasmoreeducatedparents).
Cantherelationshipbetweenschoolingandearningseverbeevaluate
d
onaceterisparibusbasis,ormustthebouldersofselectionbiasforever
blockourway?Thechallengeofquantifying thecausal linkbetween
schoolingandearningsprovidesagrippingtestmatchfor’metricstool
s
andthemasterswhowieldthem.
Mastering’Metrics
Chapter1
RandomizedTrials
KWAICHANGCAINE:Whathappensinaman’slifeisalreadywritte
n.A
manmustmovethroughlifeashisdestinywills.
OLDMAN:Yeteachisfreetoliveashechooses.Thoughtheyseem
15. opposite,botharetrue.
KungFu,Pilot
OurPath
Our path begins with experimental random assignment, both as
a
frameworkforcausalquestionsandabenchmarkbywhichtheresults
fromothermethodsare judged.We illustrate theawesomepowerof
randomassignmentthroughtworandomizedevaluationsoftheeffect
sof
health insurance. The appendix to this chapter also uses the
experimental framework to review the concepts and methods of
statisticalinference.
1.1InSicknessandinHealth(Insura nce)
The Affordable Care Act (ACA) has proven to be one of the
most
controversial and interestingpolicy innovationswe’ve
seen.TheACA
requiresAmericanstobuyhealthinsurance,withataxpenaltyforthose
who don’t voluntarily buy in. The question of the proper role of
governmentinthemarketforhealthcarehasmanyangles.Oneisthe
causaleffectofhealth insuranceonhealth.TheUnitedStates spends
moreof itsGDPonhealthcare thandootherdevelopednations,yet
Americansaresurprisinglyunhealthy.Forexample,Americansarem
ore
likelytobeoverweightanddiesoonerthantheirCanadiancousins,wh
o
spendonlyabouttwo-thirdsasmuchoncare.Americaisalsounusual
among developed countries in having no universal health
insurance
16. scheme.Perhapsthere’sacausalconnectionhere.
ElderlyAmericansarecoveredbyafederalprogramcalledMedicare,
while some poor Americans (including most single mothers,
their
children,andmanyotherpoorchildren)arecoveredbyMedicaid.Man
y
oftheworking,prime-agepoor,however,havelongbeenuninsured.In
fact,manyuninsuredAmericanshavechosennot toparticipate inan
employer-provided insuranceplan.1 Theseworkers, perhaps
correctly,
count on hospital emergency departments, which cannot turn
them
away,toaddresstheirhealth-
careneeds.Buttheemergencydepartment
mightnotbethebestplacetotreat,say,theflu,ortomanagec hronic
conditionslikediabetesandhypertensionthataresopervasiveamong
poorAmericans.Theemergencydepartmentisnotrequiredtoprovide
long-termcare.Itthereforestandstoreasonthatgovernment-
mandated
healthinsurancemightyieldahealthdividend.Thepushforsubsidize
d
universalhealthinsurancestemsinpartfromthebeliefthatitdoes.
The ceteris paribus question in this context contrasts the health
of
someonewithinsurancecoveragetothehealthofthesamepersonwere
theywithoutinsurance(otherthananemergencydepartmentbackstop
).
Thiscontrasthighlightsafundamentalempiricalconundrum:people
are
eitherinsuredornot.Wedon’tgettoseethembothways,atleastnotat
thesametimeinexactlythesamecircumstances.
Inhiscelebratedpoem,“TheRoadNotTaken,”RobertFrostusedthe
metaphor of a crossroads to describe the causal effects of
personal
choice:
18. an indexthatassigns5 toexcellenthealthand1topoorhealth ina
sample ofmarried 2009NHIS respondentswhomay ormay not be
insured.2 This index is our outcome: a measure we’re interested
in
studying.Thecausalrelationofinteresthereisdeterminedbyavariabl
e
thatindicatescoveragebyprivatehealthinsurance.Wecallthisvariab
le
thetreatment,borrowingfromtheliteratureonmedicaltrials,althoug
h
thetreatmentswe’reinterestedinneednotbemedicaltreatmentslike
drugsorsurgery.Inthiscontext,thosewithinsurancecanbethoughtof
asthetreatmentgroup;thosewithoutinsurancemakeupthecompariso
n
orcontrolgroup.Agoodcontrolgrouprevealsthefateofthetreatedina
counterfactualworldwheretheyarenottreated.
The first row of Table 1.1 compares the average health index of
insuredanduninsuredAmericans,withstatisticstabulatedseparately
for
husbandsandwives.3Thosewithhealthinsuranceareindeedhealthie
r
thanthosewithout,agapofabout.3intheindexformenand.4inthe
indexforwomen.Thesearelargedifferenceswhenmeasuredagainstt
he
standard deviation of the health index,which is about 1.
(Standard
deviations,reportedinsquarebracketsinTable1.1,measurevariabili
ty
indata.Thechapterappendixreviewstherelevantformula.)Theselar
ge
gapsmightbethehealthdividendwe’relookingfor.
19. FruitlessandFruitfulComparisons
Simplecomparisons,suchasthoseatthetopofTable1.1,areoftencited
as evidence of causal effects. More often than not, however,
such
comparisonsaremisleading.Onceagaintheproblemisotherthingseq
ual,
or lack thereof. Comparisons of people with and without health
insurancearenotapplestoapples;suchcontrastsareapplestooranges,
orworse.
Among other differences, those with health insurance are better
educated,havehigherincome,andaremorelikelytobeworkingthan
theuninsured.ThiscanbeseeninpanelBofTable1.1,whichreports
theaveragecharacteristicsofNHISrespondentswhodoanddon’thav
e
health insurance.Many of the differences in the table are large
(for
example,anearly3-yearschoolinggap);mostarestatisticallyprecise
enoughtoruleoutthehypothesisthatthesediscrepanciesaremerely
chancefindings(seethechapterappendixforarefresheronstatistical
significance).Itwon’tsurpriseyoutolearnthatmostvariablestabulat
ed
herearehighlycorrelatedwithhealthaswellaswithhealthinsurance
status.More-educatedpeople,forexample,tendtobehealthieraswell
as being overrepresented in the insured group. Thismaybe
because
more-
educatedpeopleexercisemore,smokeless,andaremorelikelyto
wearseatbelts.Itstandstoreasonthatthedifferenceinhealthbetween
insuredanduninsuredNHISrespondentsatleastpartlyreflectstheext
ra
schoolingoftheinsured.
21. 4
Tonailthisdownfurther,considerthestoryofvisitingMassachusetts
InstituteofTechnology(MIT)studentKhuzdarKhalat,recentlyarriv
ed
fromKazakhstan. Kazakhstan has a national health insurance
system
thatcoversallitscitizensautomatically(thoughyouwouldn’tgothere
just for the health insurance). Arriving
inCambridge,Massachusetts,
KhuzdarissurprisedtolearnthatMITstudentsmustdecidewhetherto
optintotheuniversity’shealthinsuranceplan,forwhichMITleviesa
hefty fee. Upon reflection, Khuzdar judges theMIT
insuranceworth
paying for, since he fears upper respiratory infections in
chillyNew
England.Let’ssaythatY0i=3andY1i=4fori=Khuzdar.Forhim,
thecausaleffectofinsuranceisonestepupontheNHISscale:
Table1.2summarizesthisinformation.
TABLE1.2
OutcomesandtreatmentsforKhuzdarandMaria
KhuzdarKhalat MariaMoreño
Potentialoutcomewithoutinsurance:Y0i 3 5
Potentialoutcomewithinsurance:Y1i 4 5
Treatment(insurancestatuschosen):Di 1 0
Actualhealthoutcome:Yi 4 5
Treatmenteffect:Y1i−Y0i 1 0
22. It’sworthemphasizingthatTable1.2isanimaginarytable:someof
theinformationitdescribesmustremainhidden.Khuzdarwilleitherb
uy
insurance,revealinghisvalueofY1i,orhewon’t,inwhichcasehisY0ii
s
revealed. Khuzdar has walked many a long and dusty road in
Kazakhstan,butevenhecannotbesurewha tliesattheendofthosenot
taken.
MariaMoreñoisalsocomingtoMITthisyear;shehailsfromChile’s
Andeanhighlands.LittleconcernedbyBostonwinters,heartyMariai
s
not the type to fall sick easily. She therefore passes up the MIT
insurance,planningtousehermoneyfortravelinstead.BecauseMaria
hasY0,Maria=Y1,Maria=5,thecausaleffectofinsuranceonherhealt
h
is
Maria’snumberslikewiseappearinTable1.2.
SinceKhuzdarandMariamakedifferentinsurancechoices,theyoffer
aninterestingcomparison.Khuzdar’shealthisYKhuzdar=Y1,Khuz
dar=4,
whileMaria’sisYMaria=Y0,Maria=5.Thedifferencebetweenthemi
s
Taken at face value, this quantity—which we observe—suggests
Khuzdar’s decision to buy insurance is counterproductive. His
MIT
insurancecoveragenotwithstanding,insuredKhuzdar’shealthiswor
se
thanuninsuredMaria’s.
Infact,thecomparisonbetweenfrailKhuzdarandheartyMariatells
23. uslittleaboutthecausaleffectsoftheirchoices.Thiscanbeseenby
linkingobservedandpotentialoutcomesasfollows:
Thesecondlineinthisequationisderivedbyaddingandsubtracting
Y0,Khuzdar, therebygenerating twohidden comparisons
thatdetermine
the onewe see. The first comparison,Y1,Khuzdar−Y0,Khuzdar,
is the
causaleffectofhealthinsuranceonKhuzdar,whichisequalto1.The
second,Y0,Khuzdar−Y0,Maria,isthedifferencebetweenthetwostu
dents’
healthstatuswerebothtodecideagainstinsurance.Thisterm,equalto
−2, reflectsKhuzdar’s relative frailty. In thecontextofoureffort
to
uncovercausaleffects,thelackofcomparabilitycapturedbythesecon
d
termiscalledselectionbias.
Youmightthinkthatselectionbiashassomethingtodowithour focus
on particular individuals instead of on groups, where, perhaps,
extraneousdifferencescanbeexpectedto“averageout.”Butthediffic
ult
problemofselectionbiascarriesovertocomparisonsofgroups,thoug
h,
insteadofindividualcausaleffects,ourattentionshiftstoavera gecaus
al
effects.Inagroupofnpeople,averagecausaleffectsarewrittenAvgn[
Y1i
−Y0i],where averaging is done in the usualway (that is,we sum
individualoutcomesanddividebyn):
Thesymbol indicatesasumovereveryonefromi=1ton,wheren
is thesizeof thegroupoverwhichweareaveraging.Notethatboth
summationsinequation(1.1)aretakenovereverybodyinthegroupof
interest.Theaveragecausaleffectofhealthinsurancecomparesavera
ge
24. healthinhypotheticalscenarioswhereeverybodyinthegroupdoesan
d
doesnothavehealthinsurance.Asacomputationalmatter,thisisthe
average of individual causal effects like Y1,Khuzdar −
Y0,Khuzdar and
Y1,Maria−Y0,Mariaforeachstudentinourdata.
An investigationof theaveragecausaleffectof insurancenaturally
begins by comparing the average health of groups of insured
and
uninsuredpeople,asinTable1.1.Thiscomparisonisfacilitatedbythe
constructionofadummyvariable,Di,whichtakesonthevalues0and1
toindicateinsurancestatus:
WecannowwriteAvgn[Yi|Di=1]fortheaverageamongtheinsured
and Avgn[Yi|Di = 0] for the average among the uninsured.
These
quantitiesareaveragesconditionaloninsurancestatus.5
TheaverageYifortheinsuredisnecessarilyanaverageofoutcome
Y1i, but contains no information aboutY0i. Likewise, the
averageYi
amongtheuninsuredisanaverageofoutcomeY0i,butthisaverageis
devoidofinformationaboutthecorrespondingY1i.Inotherwords,the
roadtakenbythosewithinsuranceendswithY1i,whiletheroadtaken
bythosewithoutinsuranceleadstoY0i.Thisinturnleadstoasimple
but important conclusion about the difference in average health
by
insurancestatus:
anexpressionhighlightingthefactthatthecomparisonsinTable1.1tel
l
ussomethingaboutpotentialoutcomes,thoughnotnecessarilywhatw
e
25. want to know.We’re afterAvgn[Y1i−Y0i], an average causal
effect
involvingeveryone’sY1iandeveryone’sY0i,butweseeaverageY1io
nly
fortheinsuredandaverageY0ionlyfortheuninsured.
Tosharpenourunderstandingofequation(1.2),ithelpstoimagine
thathealthinsurancemakeseveryonehealthierbyaconstantamount,κ
.
Asisthecustomamongourpeople,weuseGreekletterstolabelsuch
parameters,soastodistinguishthemfromvariablesordata;thisoneis
theletter“kappa.”Theconstant-effectsassumptionallowsustowrite:
or,equivalently,Y1i−Y0i=κ.Inotherwords,κisboththeindividual
andaveragecausaleffectofinsuranceonhealth.Thequestionathandis
howcomparisonssuchasthoseatthetopofTable1.1relatetoκ.
Using the constant-effects model (equation (1.3)) to substitute
for
Avgn[Y1i|Di=1]inequation(1.2),wehave:
Thisequationrevealsthathealthcomparisonsbetweenthosewithand
without insurance equal the causal effect of interest (κ) plus the
differenceinaverageY0ibetweentheinsuredandtheuninsured.Asin
theparableofKhuzdarandMaria,thissecondtermdescribesselection
bias.Specifically, thedifference inaveragehealthby
insurancestatus
canbewritten:
whereselectionbiasisdefinedasthedifferenceinaverageY0ibetwee
n
thegroupsbeingcompared.
Howdoweknowthatthedifferenceinmeansbyinsurancestatusis
26. contaminatedbyselectionbias?WeknowbecauseY0iisshorthandfor
everythingaboutpersonirelatedtohealth,otherthaninsurancestatus.
The lower part of Table 1.1 documents important noninsurance
differencesbetweentheinsuredanduninsured,showingthatceterisis
n’t
paribushereinmanyways.TheinsuredintheNHISarehealthierforall
sortsofreasons,including,perhaps,thecausaleffectsofinsurance.Bu
t
theinsuredarealsohealthierbecausetheyaremoreeducated,among
other things.To seewhy thismatters, imagineaworld inwhich the
causaleffectofinsuranceiszero(thatis,κ=0).Eveninsuchaworld,
we should expect insured NHIS respondents to be healthier,
simply
becausetheyaremoreeducated,richer,andsoon.
Wewrapupthisdiscussionbypointingoutthesubtleroleplayedby
informationlikethatreportedinpanelBofTable1.1.Thispanelshows
thatthegroupsbeingcompareddifferinwaysthatwecanobserve.As
we’llseeinthenextchapter,iftheonlysourceofselectionbiasisaset
of differences in characteristics that we can observe and
measure,
selectionbiasis(relatively)easytofix.Suppose,forexample,thatthe
onlysourceofselectionbiasintheinsurancecomparisoniseducation.
Thisbiasiseliminatedbyfocusingonsamplesofpeoplewiththesame
schooling,say,collegegraduates.Educationisthesameforinsuredan
d
uninsuredpeopleinsuchasample,becauseit’sthesameforeveryonein
thesample.
The subtlety inTable1.1arisesbecausewhenobserveddifferences
proliferate,soshouldoursuspicionsaboutunobserveddifferences.T
he
28. this
comparison ceteris paribus: groups insured and uninsured by
random
assignmentdifferonly intheir
insurancestatusandanyconsequences
thatfollowfromit.
SupposetheMITHealthServiceelectstoforgopaymentandtossesa
coin to determine the insurance status of new students Ashish
and
Zandile (just this once, as a favor to their distinguished
Economics
Department).Zandileisinsuredifthetosscomesupheads;otherwise,
Ashish gets the coverage. A good start, but not good enough,
since
random assignment of two experimental subjects does not
produce
insuredanduninsuredapples.Foronething,AshishismaleandZandil
e
female.Women,asarule,arehealthierthanmen.IfZandilewindsup
healthier,itmightbeduetohergoodluckinhavingbeenbornawoman
andunrelatedtoherluckydrawintheinsurancelottery.Theproblem
here is that two is not enough to tangowhen it comes to random
assignment.Wemustrandomlyassigntreatmentinasamplethat’slarg
e
enoughtoensurethatdifferencesinindividualcharacteristics
likesex
washout.
Two randomly chosen groups, when large enough, are indeed
comparable.Thisfactisduetoapowerfulstatisticalpropertyknownas
theLawofLargeNumbers(LLN).TheLLNcharacterizesthebehavior
of
29. sampleaverages in relation to sample size.Specifically,
theLLNsays
thatasampleaveragecanbebroughtascloseasweliketotheaverage
in the population from which it is drawn (say, the population of
Americancollegestudents)simplybyenlargingthesample.
ToseetheLLNinaction,playdice.6Specifically,rollafairdieonce
andsavetheresult.Thenrollagainandaveragethesetworesults.Keep
onrollingandaveraging.Thenumbers1to6areequallylikely(that’s
whythedieissaidtobe“fair”),sowecanexpecttoseeeachvaluean
equal number of times if we play long enough. Since there are
six
possibilitieshere,andallareequallylikely,theexpectedoutcomeisan
equallyweightedaverageofeachpossibility,withweightsequalto1/6
:
Thisaveragevalueof3.5 is calledamathematical expectation; in
this
case,it’stheaveragevaluewe’dgetininfinitelymanyrollsofafairdie.
The expectation concept is important to our work, so we define
it
formallyhere.
MATHEMATICALEXPECTATIONThemathematicalexpectation
ofavariable,Yi,
writtenE[Yi], is thepopulationaverageof thisvariable. IfYi isa
variable generated by a randomprocess, such as throwing a die,
E[Yi]istheaverageininfinitelymanyrepetitionsofthisprocess.IfYi
isavariablethatcomesfromasamplesurvey,E[Yi]istheaverage
obtained if everyone in the population fromwhich the sample is
drawnweretobeenumerated.
Rollingadieonlyafewtimes,theaveragetossmaybefarfromthe
30. correspondingmathematicalexpectation.Roll two times,
forexample,
andyoumightgetboxcarsorsnakeeyes(twosixesortwoones).These
averagetovalueswellawayfromtheexpectedvalueof3.5.Butasthe
numberoftossesgoesup,theaverageacrosstossesreliablytendsto3.5
.
ThisistheLLNinaction(andit’showcasinosmakeaprofit:inmost
gamblinggames,youcan’tbeatthehouseinthelongrun,becausethe
expectedpayout forplayers isnegative).More remarkably,
itneedn’t
take toomany rolls or too large a sample for a sample average to
approach the expected value. The chapter appendix addresses
the
question of how the number of rolls or the size of a sample
survey
determinesstatisticalaccuracy.
Inrandomizedtrials,experimentalsamplesarecreatedbysampling
fromapopulationwe’dliketostudyratherthanbyrepeatingagame,
buttheLLNworksjustthesame.Whensampledsubjectsarerandomly
divided(as ifbyacointoss) intotreatmentandcontrolgroups, they
comefromthesameunderlyingpopulation.TheLLNthereforepromis
es
thatthoseinrandomlyassignedtreatmentandcontrolsampleswillbe
similarifthesamplesarelargeenough.Forexample,weexpecttosee
similarproportionsofmenandwomeninrandomlyassignedtreatment
andcontrolgroups.Randomassignmentalsoproducesgroupsofabout
the same age and with similar schooling levels. In fact,
randomly
assignedgroupsshouldbesimilarineveryway,includinginwaysthat
we cannot easily measure or observe. This is the root of random
assignment’sawesomepowertoeliminateselectionbias.
Thepowerofrandomassignmentcanbedescribedpreciselyusingthe
following definition, which is closely related to the definition
of
31. mathematicalexpectation.
CONDITIONAL EXPECTATION The conditional expectation
of a variable,Yi,
givenadummyvariable,Di=1,iswrittenE[Yi|Di=1].Thisisthe
averageofYiinthepopulationthathasDiequalto1.Likewise,the
conditional expectation of a variable, Yi, given Di = 0, written
E[Yi|Di=0],istheaverageofYiinthepopulationthathasDiequal
to0.IfYiandDiarevariablesgeneratedbyarandomprocess,such
asthrowingadieunderdifferentcircumstances,E[Yi|Di=d]isthe
averageofinfinitelymanyrepetitionsofthisprocesswhileholding
thecircumstancesindicatedbyDifixedatd.IfYiandDicomefroma
samplesurvey,E[Yi|Di=d]istheaveragecomputedwheneveryone
inthepopulationwhohasDi=dissampled.
Becauserandomlyassignedtreatmentandcontrolgroupscomefrom
the same underlying population, they are the same in every way,
including their expected Y0i. In other words, the conditional
expectations,E[Y0i|Di=1]andE[Y0i|Di=0],arethesame.Thisinturn
meansthat:
RANDOM ASSIGNMENT ELIMINATES SELECTION BIAS
When Di is randomly
assigned,E[Y0i|Di= 1]= E[Y0i|Di= 0], and the difference in
expectations by treatment status captures the causal effect of
treatment:
ProvidedthesampleathandislargeenoughfortheLLNtoworkits
magic(sowecanreplacetheconditionalaveragesinequation(1.4)wi t
h
conditional expectations), selection bias disappears in a
randomized
experiment. Random assignmentworks not by eliminating
individual
32. differences but rather by ensuring that themix of individuals
being
comparedisthesame.Thinkofthisascomparingbarrelsthatinclude
equalproportionsofapplesandoranges.Asweexplaininthechapters
that follow, randomization isn’t theonlyway togenerate
suchceteris
paribuscomparisons,butmostmastersbelieveit’sthebest.
Whenanalyzingdatafromarandomizedtrialoranyotherresearch
design,mastersalmostalwaysbeginwithacheckonwhethertreatment
andcontrolgroupsindeedlooksimilar.Thisprocess,calledcheckingf
or
balance,amountstoacomparisonofsampleaveragesasinpanelBof
Table1.1.Theaveragecharacteristics inpanelBappeardissimilaror
unbalanced,underliningthefactthatthedatainthistabledon’tcome
fromanythinglikeanexperiment.It’sworthcheckingforbalanceinthi
s
manneranytimeyoufindyourselfestimatingcausaleffects.
Random assignment of health insurance seems like a fanciful
proposition. Yet health insurance coverage has twice been
randomly
assignedtolargerepresentativesamplesofAmericans.TheRANDHe
alth
InsuranceExperiment(HIE),whichranfrom1974to1982,wasoneof
themost influential social experiments in research history. The
HIE
enrolled3,958peopleaged14to61fromsixareasofthecountry.The
HIE sample excluded Medicare participants and most Medicaid
and
militaryhealth
insurancesubscribers.HIEparticipantswererandomly
assignedtooneof14insuranceplans.Participantsdidnothavetopay
insurancepremiums,buttheplanshadavarietyofprovisionsr elatedto
costsharing,leadingtolargedifferencesintheamountofinsurancethe
33. y
offered.
ThemostgenerousHIEplanofferedcomprehensivecareforfree.At
theotherendoftheinsurancespectrum,three“catastrophiccoverage”
plans required families topay95%of theirhealth-carecosts,
though
thesecostswerecappedasaproportionofincome(orcappedat$1,000
perfamily,ifthatwaslower).Thecatastrophicplansapproximateano-
insurance condition. A second insurance scheme (the
“individual
deductible” plan) also required families to pay 95% of
outpatient
charges,butonlyupto$150perpersonor$450perfamily.Agroupof
nine other plans had a variety of coinsurance provisions,
requiring
participantstocoveranywherefrom25%to50%ofcharges,butalways
capped at a proportion of income or $1,000, whicheverwas
lower.
Participatingfamiliesenrolledintheexperimentalplansfor3or5year
s
andagreed togiveupanyearlier insurancecoverage in return fora
fixedmonthlypaymentunrelatedtotheiruseofmedicalcare.7
TheHIEwasmotivatedprimarilybyaninterestinwhateconomists
callthepriceelasticityofdemandforhealthcare.Specifically,theRA
ND
investigatorswantedtoknowwhetherandbyhowmuchhealth-
careuse
fallswhenthepriceofhealthcaregoesup.Familiesinthefreecareplan
facedapriceofzero,whilecoinsuranceplanscutpricesto25%or50%
of costs incurred, and families in the catastrophic coverage and
deductibleplanspaidsomethingclosetothestickerpriceforcare,at
leastuntiltheyhitthespendingcap.Buttheinvestigatorsalsowantedt
34. o
knowwhethermorecomprehensiveandmoregeneroushealthinsuran
ce
coverageindeedleadstobetterhealth.Theanswertothefirstquestion
wasaclear“yes”:health-careconsumptionishighlyresponsivetothe
priceofcare.Theanswertothesecondquestionismurkier.
RandomizedResults
Randomized field experiments are more elaborate than a coin
toss,
sometimes regrettably so. TheHIEwas
complicatedbyhavingmany
smalltreatmentgroups,spreadovermorethanadozeninsuranceplans
.
Thetreatmentgroupsassociatedwitheachplanaremostlytoosmallfor
comparisonsbetweenthemtobestatisticallymeaningful.Mostanalys
es
oftheHIEdatathereforestartbygroupingsubjectsw howereassigned
tosimilarHIEplanstogether.Wedothathereaswell.8
Anatural grouping scheme combines plans by the amount of cost
sharing they require. The three catastrophic coverage plans,
with
subscribers shouldering almost all of theirmedical expenses up
to a
fairly high cap, approximate a no-insurance state. The
individual
deductibleplanprovidedmorecoverage,butonlybyreducingthecap
ontotalexpensesthatplanparticipantswererequiredtoshoulder.The
ninecoinsuranceplansprovidedmoresubstantialcoveragebysplittin
g
subscribers’ health-care costswith the insurer, startingwith the
first
35. dollar of costs incurred. Finally, the free plan constituted a
radical
interventionthatmightbeexpectedtogeneratethelargestincreasein
health-careusageand,perhaps,health.Thiscategorizatio nleadsusto
four groups of plans: catastrophic, deductible, coinsurance, and
free,
instead of the 14 original plans. The catastrophic plans provide
the
(approximate)no-
insurancecontrol,whilethedeductible,coinsurance,
andfreeplansarecharacterizedbyincreasinglevelsofcoverage.
Aswithnonexperimentalcomparisons,afirststepinourexperimental
analysis is to check for balance. Do subjects randomly assigned
to
treatmentandcontrolgroups—
inthiscase,tohealthinsuranceschemes
ranging from little to complete coverage—indeed look similar?
We
gauge thisbycomparingdemographiccharacteristicsandhealthdata
collected before the experiment began. Because demographic
characteristicsareunchanging,while thehealthvariables
inquestion
weremeasuredbeforerandomassignment,weexpecttoseeonlysmall
differences in these variables across the groups assigned to
different
plans.
IncontrastwithourcomparisonofNHISrespondents’characteristics
byinsurancestatusinTable1.1,acomparisonofcharacteristicsacross
randomlyassignedtreatmentgroupsintheRANDexperimentshowst
he
peopleassignedtodifferentHIEplanstobesimilar.Thiscanbeseenin
panelAofTable1.3.Column(1)inthistablereportsaveragesforthe
catastrophic plan group, while the remaining columns compare
the
groupsassignedmoregenerousinsurancecoveragewiththecatastrop
hic
36. controlgroup.Asasummarymeasure,column(5)comparesasample
combiningsubjectsinthedeductible,coinsurance,andfreeplanswith
subjectsinthecatastrophicplans.Individualsassignedtotheplanswit
h
moregenerouscoveragearealittlelesslikelytobefemaleandalittle
lesseducated than those in thecatastrophicplans.Wealso see
some
variation in income, but differences between plan groups
aremostly
smallandareaslikelytogoonewayasanother.Thispatterncontrasts
withthelargeandsystematicdemographicdifferencesbetweeninsur
ed
anduninsuredpeopleseenintheNHISdatasummarizedinTable1.1.
ThesmalldifferencesacrossgroupsseeninpanelAofTable1.3seem
likelytoreflectchancevariationthatemergesnaturallyaspartofthe
sampling process. In any statistical sample, chance differences
arise
because we’re looking at one of many possible draws from the
underlying population fromwhichwe’ve sampled. A new sample
of
similar size from the same population can be expected to
produce
comparisons that are similar—though not identical—to those in
the
table.Thequestionofhow muchvariationweshouldexpectfromone
sampletoanotherisaddressedbythetoolsofstatisticalinference.
TABLE1.3
DemographiccharacteristicsandbaselinehealthintheRANDHIE
Notes:Thistabledescribesthedemographiccharacteristicsandbasel
37. inehealthofsubjectsin
theRANDHealth InsuranceExperiment (HIE).Column (1) shows
theaverage for thegroup
assigned catastrophic coverage. Columns (2)–(5) compare
averages in the deductible, cost-
sharing,freecare,andanyinsurancegroupswiththeaverageincolumn
(1).Standarderrorsare
reported in parentheses in columns (2)–(5); standard deviations
are reported in brackets in
column(1).
Theappendixtothischapterbrieflyexplainshowtoquantifysampling
variation with formal statistical tests. Such tests amount to the
juxtapositionofdifferencesinsampleaverageswiththeirstandarderr
ors,
thenumbersinparenthesesreportedbelowthedifferencesinaverages
listedincolumns(2)–
(5)ofTable1.3.Thestandarderrorofadifference
inaveragesisameasureofitsstatisticalprecision:whenadifferencein
sample averages is smaller than about two standard errors, the
differenceistypicallyjudgedtobeachancefindingcompatiblewithth
e
hypothesisthatthepopulationsfromwhichthesesamplesweredrawn
are,infact,thesame.
Differencesthatarelargerthanabouttwostandarderrorsaresaidto
bestatisticallysignificant:insuchcases,itishighlyunlikely(thoughn
ot
impossible) that
thesedifferencesarosepurelybychance.Differences
thatarenotstatisticallysignificantareprobablyduetothevagariesof
the sampling process. The notion of statistical significance
helps us
38. interpret comparisons like those in Table 1.3. Not only are the
differencesinthistablemostlysmall,onlytwo(forproportionfemalei
n
columns (4) and (5)) aremore than twiceas largeas theassociated
standarderrors.Intableswithmanycomparisons,thepresenceofafew
isolatedstatisticallysignificantdifferencesisusuallyalsoattributabl
eto
chance.Wealsotakecomfortfromthefactthatthestandarderrorsin
this table are not very big, indicating differences across groups
are
measuredreasonablyprecisely.
Panel B of Table 1.3 complements the contrasts in panel A with
evidence for reasonablygoodbalance inpre-
treatmentoutcomesacross
treatmentgroups.Thispanelshowsnostatisticallysignificantdiffere
nces
in a pre-treatment index of general health. Likewise, pre-
treatment
cholesterol,bloodpressure,andmental healthappearlargelyunrelate
d
to treatment assignment, with only a couple of contrasts close to
statisticalsignificance.Inaddition,althoughlowercholesterolinthef
ree
groupsuggestssomewhatbetterhealththaninthecatastrophicgroup,
differencesinthegeneralhealthindexbetweenthesetwogroupsgothe
otherway(sincelowerindexvaluesindicateworsehealth).Lackofa
consistent pattern reinforces the notion that these gaps are due
to
chance.
ThefirstimportantfindingtoemergefromtheHIEwasthatsubjects
assigned to more generous insurance plans used substantially
more
health care. This finding, which vindicates economists’ view
that
39. demandforagoodshouldgoupwhenitgetscheaper,canbeseenin
panel A of Table 1.4.9 As might be expected, hospital inpatient
admissions were less sensitive to price than was outpatient care,
probablybecauseadmissionsdecisionsareusuallymadebydoctors.O
n
the other hand, assignment to the free care plan raised
outpatient
spending by two-thirds (169/248) relative to spending by those
in
catastrophic plans, while total medical expenses increased by
45%.
These large gaps are economically important as well as
statistically
significant.
Subjectswhodidn’thavetoworryaboutthecostofhealthcareclearly
consumedquiteabitmoreofit.Didthisextracareandexpensemake
themhealthier?PanelBinTable1.4,whichcompareshealthindicators
across HIE treatment groups, suggests not. Cholesterol levels,
blood
pressure,andsummaryindicesofoverallhealthandmentalhealthare
remarkablysimilaracrossgroups(theseoutcomesweremostlymeasu
red
3or5yearsafterrandomassi gnment).Formalstatisticaltestsshowno
statisticallysignificantdifferences,ascanbeseeninthegroup-
specific
contrasts(reportedincolumns(2) –(4))andinthedifferencesinhealth
betweenthoseinacatastrophicplanandeveryoneinthemoregenerous
insurancegroups(reportedincol umn(5)).
TheseHIEfindingsconvincedmanyeconomiststhatgeneroushealth
insurance can have unintended and undesirable consequences,
increasinghealth-
40. careusageandcosts,withoutgeneratingadividendin
theformofbetterhealth.10
TABLE1.4
HealthexpenditureandhealthoutcomesintheRANDHIE
Notes: This table reportsmeans and treatment effects for health
expenditure and health
outcomesintheRANDHealthInsuranceExperiment(HIE).Column(
1)showstheaverageforthe
groupassignedcatastrophiccoverage.Columns(2)–
(5)compareaveragesinthedeductible,cost-
sharing,freecare,andanyinsurancegroupswiththeaverageincolumn
(1).Standarderrorsare
reported in parentheses in columns (2)–(5); standard deviations
are reported in brackets in
column(1).
1.2TheOregonTrail
MASTERKAN:Truthishardtounderstand.
KWAICHANGCAINE:Itisafact,itisnotthetruth.Truthisoftenhidde
n,
likeashadowindarkness.
KungFu,Season1,Episode14
The HIE was an ambitious attempt to assess the impact of health
insurance on health-care costs and health. And yet, as far as the
contemporarydebateoverhealth insurancegoes, theHIEmighthave
missedthemark.Foronething,eachHIEtreatmentgrouphadatleast
catastrophic coverage, so financial liability for health-care
41. costswas
limited under every treatment. More importantly, today’s
uninsured
Americans differ considerably from theHIE population:most of
the
uninsured are younger, less educated, poorer, and less likely to
be
working.Thevalueofextrahealthcareinsuchagroupmightbevery
differentthanforthemiddleclassfamiliesthatparticipatedintheHIE.
Oneofthemostcontroversialideasinthecontemporaryhealthpolicy
arena is the expansion ofMedicaid to cover the currently
uninsured
(interestingly, on the eve of the RAND experiment, talk was of
expanding Medicare, the public insurance program for
America’s
elderly).Medicaidnowcoversfamiliesonwelfare,someofthedisable
d,
otherpoor children, andpoorpregnantwomen.Supposewewere to
expandMedicaidtocoverthosewhodon’tqualifyundercurrentrules.
Howwould suchan expansionaffecthealth-care spending?Would
it
shift treatment from costly and crowded emergency departments
to
possibly more effective primary care? Would Medicaid
expansion
improvehealth?
Many American states have begun to “experiment”withMedicaid
expansioninthesensethatthey’veagreedtobroadeneligibility,with
thefederalgovernmentfootingmostofthebill.Alas,thesearen’treal
experiments, since everyone who is eligible for expanded
Medicaid
coverage gets it. The most convincing way to learn about the
consequences of Medicaid expansion is to randomly offer
Medicaid
42. coveragetopeopleincurrentlyineligiblegroups.Randomassignment
of
Medicaid seems too much to hope for. Yet, in an awesome
social
experiment,thestateofOregonrecentlyofferedMedicaidtothousand
s
of randomlychosenpeople inapubliclyannouncedhealth insurance
lottery.
We can think of Oregon’s health insurance lottery as randomly
selectingwinnersandlosersfromapoolofregistrants,thoughcoverag
e
was not automatic, even for lottery winners. Winners won the
opportunitytoapplyfor thestate-runOregonHealthPlan(OHP), the
OregonversionofMedicaid.Thestatethenreviewedtheseapplication
s,
awardingcoveragetoOregonresidentswhowereU.S.citizensorlegal
immigrantsaged19–
64,nototherwiseeligibleforMedicaid,uninsured
foratleast6months,withincomebelowthefederalpovertylevel,and
few financial assets. To initiate coverage, lottery winners had to
documenttheirpovertystatusandsubmittherequiredpaperworkwith
in
45days.
Therationaleforthe2008OHPlotterywasfairnessandnotresearch,
butit’snolessawesomeforthat.TheOregonhealthinsurancelottery
providessomeofthebestevidencewecanhopetofindonthecostsand
benefitsof insurancecoverageforthecurrentlyuninsured,afactthat
motivated researchonOHPbyMITmasterAmyFinkelstein andher
coauthors.11
Roughly75,000 lotteryapplicants registered
43. forexpandedcoverage
throughtheOHP.Ofthese,almost30,000wererandomlyselectedand
invitedtoapplyforOHP;thesewinnersconstitutetheOHPtreatment
group.Theother45,000constitutetheOHPcontrolsample.
ThefirstquestionthatarisesinthiscontextiswhetherOHPlottery
winnersweremorelikelytoendupinsuredasaresultofwinning.This
question ismotivated by the fact that some applicants qualified
for
regularMedicaidevenwithoutthelottery.PanelAofTable1.5shows
thatabout14%ofcontrols(lotterylosers)werecoveredbyMedicaidin
theyearfollowingthefirstOHPlottery.Atthesametime,thesecond
column,which reportsdifferencesbetween the
treatmentandcontrol
groups,showsthattheprobabilityofMedicaidcoverageincreasedby2
6
percentage points for lottery winners. Column (4) shows a
similar
increase for the subsample living in and around Portland,
Oregon’s
largest city.Theupshot is thatOHP lotterywinnerswere insuredat
muchhigherratesthanwerelotterylosers,adifferencethatmighthave
affectedtheiruseofhealthcareandtheirhealth.12
TheOHPtreatmentgroup(thatis,lotterywinners)usedmorehealth-
careservicesthantheyotherwisewouldha ve.Thiscanalsobeseenin
Table1.5,whichshowsestimatesofchangesinserviceuseintherows
below the estimate of the OHP effect on Medicaid coverage.
The
hospitalizationrateincreasedbyabouthalfapercentagepoint,amode
st
though statistically significant effect. Emerge ncy department
visits,
44. outpatientvisits,andprescriptiondruguseallincreasedmarkedly.Th
e
factthatthenumberofemergencydepartmentvisitsroseabout10%,a
precisely estimated effect (the standard error associated with
this
estimate, reported in column (4), is .029), is especially
noteworthy.
Many policymakers hoped and expected health insurance to
shift
formerlyuninsuredpatientsawayfromhospitalemergencydepartme
nts
towardlesscostlysourcesofcare.
TABLE1.5
OHPeffectsoninsurancecoverageandhealth-careuse
Notes:ThistablereportsestimatesoftheeffectofwinningtheOregon
HealthPlan(OHP)
lotteryoninsurancecoverageanduseofhealthcare.Odd-
numberedcolumnsshowcontrolgroup
averages. Even-numbered columns report the regression
coefficient on a dummy for lottery
winners.Standarderrorsarereportedinparentheses.
Finally, theproofof thehealth insurancepuddingappears inTable
1.6: lottery winners in the statewide sample report a modest
improvementintheprobabilitytheyassesstheirhealthasbeinggoodo
r
better(aneffectof.039,whichcanbecomparedwithacontrolmeanof
.55; theHealth isGoodvariable isadummy).Results fromin-person
interviewsconducted inPortlandsuggest thesegains stemmore
from
improvedmental rather than physical health, as can be seen in
the
45. second and third rows in column (4) (the health variables in the
Portlandsampleare indicesranging from0 to100).As in theRAND
experiment,resultsfromPortlandsuggestphysicalhealthindicatorsl
ike
cholesterol and blood pressurewere largely unchanged by
increased
accesstoOHPinsurance.
TABLE1.6
OHPeffectsonhealthindicatorsandfinancialhealth
Notes:ThistablereportsestimatesoftheeffectofwinningtheOregon
HealthPlan(OHP)
lotteryonhealth indicatorsandfinancialhealth.Odd-
numberedcolumnsshowcontrolgroup
averages. Even-numbered columns report the regression
coefficient on a dummy for lottery
winners.Standarderrorsarereportedinparentheses.
TheweakhealtheffectsoftheOHPlotterydisappointedpolicymakers
wholookedtopubliclyprovidedinsurancetogenerateahealthdividen
d
for low-income Americans. The fact that health insurance
increased
ratherthandecreasedexpensiveemergencydepartmentuseisespecia
lly
frustrating.Atthesametime,panelBofTable1.6revealsthathealth
insurance provided the sort of financial safety net forwhich
itwas
designed.Specifically,householdswinningthelotterywerelesslike l
yto
have incurred large medical expenses or to have accumulated
debt
generated by the need to pay for health care. It may be this
46. improvement in financial health that accounts for improved
mental
healthinthetreatmentgroup.
Italsobearsemphasizingthatthe financialandhealtheffectsseenin
Table1.6mostlikelycomefromthe25%ofthesamplewhoobtained
insuranceasaresultofthelottery.Adjustingforthefactthatinsurance
statuswasunchangedformanywinnersshowsthatgains in financial
securityandmentalhealthfortheone-quarterofapplicantswhowere
insuredasaresultofthelotterywereconsiderablylargerthansimple
comparisons of winners and losers would suggest. Chapter 3, on
instrumentalvariablesmethods,detailsthenatureofsuchadjustment
s.
As you’ll soon see, the appropriate adjustment here amounts to
the
divisionofwin/lossdifferencesinoutcomesbywin/lossdifferencesi
n
theprobabilityofinsurance.Thisimpliesthattheeffectofbeinginsure
d
is asmuch as four times larger than the effect ofwinning theOHP
lottery(statisticalsignificanceisunchangedbythisadjustment).
The RAND and Oregon findings are remarkably similar. Two
ambitiousexperimentstargetingsubstantiallydifferentpopulations
show
that the use of health-care services increases sharply in response
to
insurance coverage, while neither experiment reveals much of
an
insurance effect on physical health. In 2008, OHP lottery
winners
enjoyed small but noticeable improvements in mental health.
Importantly,andnotcoincidentally,OHPalsosucceeded in
insulating
many lotterywinners fromthe
48. Mastersof’Metrics:FromDanieltoR.A.Fisher
ThevalueofacontrolgroupwasrevealedintheOldTestament.The
BookofDanielrecountshowBabylonianKingNebuchadnezzardecid
ed
togroomDanielandother Israelite captives forhis royal
service.As
slaverygoes,thiswasn’tabadgig,sincethekingorderedhiscaptivesb
e
fed“foodandwinefromtheking’stable.”Danielwasuneasyaboutthe
rich diet, however, preferring modest vegetarian fare. The
king’s
chamberlainsinitiallyrefusedDaniel’sspecialmealsrequest,fearing
that
hisdietwouldprove inadequate foronecalledon to serve theking.
Daniel,notwithoutchutzpah,proposedacontrolledexperiment:“Tes
t
yourservants fortendays.Giveusnothingbutvegetablestoeatand
watertodrink.Thencompareourappearancewiththatoftheyoung
menwhoeattheroyalfood,andtreatyourservantsinaccordancewith
what you see” (Daniel 1, 12–13). The Bible recounts how this
experiment supported Daniel’s conjecture regarding the relative
healthfulness of a vegetarian diet, though as far aswe
knowDaniel
himselfdidn’tgetanacademicpaperoutofit.
Nutrition is a recurring theme in the quest for balance. Scurvy,
a
debilitatingdiseasecausedbyvitaminCdeficiency,wasthescourgeof
theBritishNavy. In 1742, James Lind, a
surgeononHMSSalisbury,
experimentedwithacureforscurvy.Lindchose12seamenwithscurvy
and started themonan identicaldiet.He then formed sixpairs and
treatedeachofthepairswithadifferentsupplementtotheirdailyfood
50. Fisher hadmany fantastically good ideas and a few bad ones. In
additiontoexplainingthevalueofrandomassignment,heinventedthe
statisticalmethodofmaximumlikelihood.Alongwith
’metricsmaster
SewallWright(andJ.B.S.Haldane),helaunchedthefieldoftheoretica
l
population genetics. But he was also a committed eugenicist and
a
proponentof
forcedsterilization(aswasregressionmasterSirFrancis
Galton,whocoinedtheterm“eugenics”).Fisher,alifelongpipesmoke
r,
wasalsoonthewrongsideofthedebateoversmokingandhealth,due
inparttohisstronglyheldbeliefthatsmokingandlungcancersharea
commongeneticorigin.Thenegativeeffectofsmokingonhealthnow
seemswellestablished,thoughFisherwasrighttoworryaboutselecti
on
biasinhealthresearch.Manylifestylechoices,suchaslow -
fatdietsand
vitamins,havebeenshown tobeunrelated tohealthoutcomeswhen
evaluatedwithrandomassignment.
Appendix:MasteringInference
YOUNGCAINE:Iampuzzled.
MASTERPO:Thatisthebeginningofwisdom.
KungFu,Season2,Episode25
Thisisthefirstofanumberofappendicesthatfillinkeyeconometric
and statistical details. You can spend your life studying
statistical
inference;manymastersdo.Hereweofferabrief sketchofessential
ideasandbasicstatisticaltools,enoughtounderstandtableslikethose
in
51. thischapter.
TheHIEisbasedonasampleofparticipantsdrawn(moreorless)at
random from the population eligible for the experiment.
Drawing
anothersamplefromthesamepopulation,we’dgetsomewhatdifferen
t
results,butthegeneralpictureshouldbesimilarifthesampleislarge
enoughfortheLLNtokickin.Howcanwedecidewhetherstatistical
resultsconstitutestrongevidenceormerelyaluckydraw,unlikelytob
e
replicatedinrepeatedsamples?Howmuchsamplingvarianceshould
we
expect?Thetoolsofformalstatisticalinferenceanswerthesequestion
s.
Thesetoolsworkforalloftheeconometricstrategiesofconcerntous.
Quantifyingsamplinguncertainty isanecessarystep
inanyempirical
project and on the road to understanding statistical claimsmade
by
others.WeexplainthebasicinferenceideahereinthecontextofHIE
treatmenteffects.
The taskathand is thequantificationof theuncertaintyassociate d
withaparticularsampleaverageand,especially,groupsofaveragesan
d
thedifferencesamongthem.Forexample,we’dliketoknowifthelarge
differencesinhealth-
careexpenditureacrossHIEtreatmentgroupscan
bediscountedasachancefinding.TheHIEsamplesweredrawnfroma
much largerdata set thatwe thinkof as covering thepopulationof
interest. The HIE population consists of all families eligible for
the
experiment(tooyoungforMedicareandsoon).Insteadofstudyingthe
manymillionsofsuchfamilies,amuchsmallergroupofabout2,000
52. families (containingabout4,000people)was selectedat randomand
thenrandomlyallocatedtooneof14plansortreatmentgroups.Note
thattherearetwosortsofrandomnessatworkhere:thefirstpertainsto
theconstructionofthestudysampleandthesecondtohowtreatment
wasallocatedtothosewhoweresampled.Randomsamplingandrando
m
assignmentarecloselyrelatedbutdistinctideas.
AWorldwithoutBias
We first quantify the uncertainty induced by random sampling,
beginning with a single sample average, say, the average health
of
everyone in thesampleathand,asmeasuredbyahealth index.Our
targetisthecorrespondingpopulationaveragehealthindex,thatis,the
meanovereveryoneinthepopulationofinterest.Aswenotedonp.14,
thepopulationmeanofavariableiscalleditsmathematicalexpectatio
n,
or justexpectation for short.For theexpectationo favariable,Yi,we
write E[Yi]. Expectation is intimately related to formal notions
of
probability. Expectations can bewritten as aweighted average of
all
possiblevaluesthatthevariableYicantakeon,withweightsgivenby
the probability these values appear in the population. In our
dice-
throwingexample,theseweightsareequalandgivenby1/6(seeSectio
n
1.1).
Unlikeournotationforaverages,thesymbolforexpectationdoesnot
reference the sample size.That’sbecause
expectationsarepopulation
quantities, defined without reference to a particular sample of
individuals.Foragivenpopulation,thereisonlyoneE[Yi],whilethere
53. aremanyAvgn[Yi],dependingonhowwechoosenandjustwhoendsup
inoursample.BecauseE[Yi]isafixedfeatureofaparticularpopulatio
n,
wecallitaparameter.Quantitiesthatvaryfromone sampletoanother,
suchasthesampleaverage,arecalledsamplestatistics.
Atthispoint,it’shelpfultoswitchfromAvgn[Yi]toamorecompact
notationforaverages,Ȳ.Notethatwe’redispensingwiththesubscript
n
to avoid clutter—henceforth, it’s on you to remember that
sample
averages are computed in a sample of a particular size. The
sample
average,Ȳ,isagoodestimatorofE[Yi](instatistics,anestimatorisany
functionofsampledatausedtoestimateparameters).Foronething,the
LLNtellsusthatinlargesamples,thesampleaverageislikelytobevery
closetothecorrespondingpopulationmean.Arelatedpropertyisthat
theexpectationofȲisalsoE[Yi].Inotherwords,ifweweretodraw
infinitelymanyrandomsamples,theaverageoftheresultingȲacross
draws would be the underlying population mean. When a sample
statistic has expectation equal to the corresponding population
parameter,it’ssaidtobeanunbiasedestimatorofthatparameter.Here’
s
thesamplemean’sunbiasednesspropertystatedformally:
UNBIASEDNESSOFTHESAMPLEMEANE[Ȳ]=E[Yi]
The sample mean should not be expected to be bang on the
corresponding population mean: the sample average in one
sample
might be too big, while in other samples it will be too small.
Unbiasednesstellsusthatthesedeviationsarenotsystematicallyupor
down; rather, in repeated samples they average out to zero. This
unbiasedness property is distinct from the LLN,which says that
54. the
samplemeangetscloserandclosertothepopulationmeanasthesampl
e
sizegrows.Unbiasednessofthesamplemeanholdsforsamplesofany
size.
MeasuringVariability
In addition to averages, we’re interested in variability. To gauge
variability,it’scustomarytolookataveragesquareddeviationsfromt
he
mean, in which positive and negative gaps get equal weight. The
resultingsummaryofvariabilityiscalledvariance.
ThesamplevarianceofYiinasampleofsizenisdefinedas
The corresponding population variance replaces averages with
expectations,giving:
Like E[Yi], the quantityV(Yi) is a fixed feature of a
population—a
parameter. It’s therefore customary to christen it inGreek: ,
whichisreadas“sigma-squared-y.”16
Becausevariancessquarethedatatheycanbeverylarge.Multiplya
variableby10and itsvariancegoesupby100.Therefore,weoften
describevariabilityusingthesquarerootofthevariance:thisiscalled
the standard deviation,writtenσY.Multiply a variable by 10 and
its
standarddeviationincreasesby10.Asalways,thepopulationstandar
d
deviation,σY,hasasamplecounterpartS(Yi),thesquarerootofS(Yi)
2.
VarianceisadescriptivefactaboutthedistributionofYi.(Reminder:
56. o
one another; in otherwords, they are statistically independent.
This
important property allows us to take advantage of the fact that
the
varianceofasumofstatisticallyindependentobservations,eachdraw
n
randomly from the same population, is the sum of their
variances.
Moreover,becauseeachYi issampledfromthesamepopulation,each
drawhasthesamevariance, .Finally,weusethepropertythatthe
varianceofaconstant(like1/n)timesYiisthesquareofthisconstant
timesthevarianceofYi.Fromtheseconsiderations,weget
Simplifyingfurther,wehave
We’veshownthatthesamplingvarianceofasampleaveragedepends
onthevarianceoftheunderlyingobservations, ,andthesamplesize,
n. As you might have guessed, more data means less dispersion
of
sampleaveragesinrepeatedsamples.Infact,whenthesamplesizeis
verylarge,there’salmostnodispersionatall,becausewhennislarge,
is small. This is the LLNatwork: asn approaches infinity, the
sampleaverageapproachesthepopulationmean,andsamplingvarian
ce
disappears.
Inpractice,weoftenworkwiththestandarddeviationofthesample
meanratherthanitsvariance.Thestandarddeviationofastatisticlike
thesampleaverageiscalleditsstandarderror.Thestandarderrorofthe
samplemeancanbewrittenas
57. Everyestimatediscussedinthisbookhasanassociatedstandarderror.
This includes sample means (for which the standard error
formula
appearsinequation(1.6)),differencesinsamplemeans(discussedlat
er
inthisappendix),regressioncoefficients(discussedinChapter2),and
instrumentalvariablesandothermoresophisticatedestimates.Form
ulas
forstandarderrorscangetcomplicated,but the idearemainssimple.
The standard error summarizes the variability in an estimate due
to
random sampling. Again, it’s important to avoid confusing
standard
errorswiththestandarddeviationsoftheunderlyingvariables;thetwo
quantitiesareintimatelyrelatedyetmeasuredifferentthings.
One last step on the road to standard errors: most population
quantities, includingthestandarddeviationinthenumeratorof(1.6),
are unknown and must be estimated. In practice, therefore, when
quantifyingthesamplingvarianceofasamplemean,weworkwithan
estimatedstandarderror.ThisisobtainedbyreplacingσYwithS(Yi)i
nthe
formula for SE(Ȳ). Specifically, the estimated standard error of
the
samplemeancanbewrittenas
Weoftenforgetthequalifier“estimated”whendiscussingstatisticsan
d
theirstandarderrors,butthat’sstillwhatwehaveinmind.Forexample,
thenumbersinparenthesesinTable1.4areestimatedstandarderrors
fortherelevantdifferencesinmeans.
Thet-StatisticandtheCentralLimitTheorem
58. Havinglaidoutasimpleschemetomeasurevaria bilityusingstandard
errors,itremainstointerpretthismeasure.Thesimplestinterpretation
usesat-statistic.Supposethedataathandcomefromadistributionfor
whichwe believe the populationmean,E[Yi], takes on a
particular
value, μ (read this Greek letter as “mu”). This value constitutes
a
workinghypothesis.At-
statisticforthesamplemeanundertheworking
hypothesisthatE[Yi]=μisconstructedas
Theworkinghypothesisisareferencepointthatisoftencalledthenull
hypothesis.Whenthenullhypothesisisμ=0,thet-statisticistheratio
ofthesamplemeantoitsestimatedstandarderror.
Manypeoplethinkthescienceofstatisticalinferenceisboring,butin
fact it’snothingshortofmiraculous.Onemiraculousstatistical fact
is
thatifE[Yi]isindeedequaltoμ,then—aslongasthesampleislarge
enough—thequantityt(μ)hasasamplingdistributionthatisveryclose
toabell-shapedstandardnormaldistribution, sketched inFigure1.1.
Thisproperty,whichappliesregardlessofwhetherYiitselfisnormall
y
distributed,iscalledtheCentralLimitTheorem(CLT).TheCLTallow
sus
tomakeanempirically informeddecisi onastowhethertheavailable
datasupportorcastdoubtonthehypothesisthatE[Yi]equalsμ.
FIGURE1.1
Astandardnormaldistribution
TheCLTisanastonishingandpowerfulresult.Amongotherthings,it
impliesthatthe(large-sample)distributionofat-
statisticisindependent
59. of the distribution of the underlying data used to calculate it.
For
example, supposewemeasure health status with a dummy
variable
distinguishinghealthypeoplefromsickandthat20%ofthepopulation
issick.Thedistributionofthisdummyvariablehastwospikes,oneof
height.8atthevalue1andoneofheight.2atthevalue0.TheCLTtells
usthatwithenoughdata,thedistributionofthet-statisticissmoothand
bell-
shapedeventhoughthedistributionoftheunderlyingdatahasonly
twovalues.
We can see the CLT in action through a sampling experiment. In
sampling experiments, we use the random number generator in
our
computertodrawrandomsamplesofdifferentsizesoverandoveragai
n.
Wedidthisforadummyvariablethatequalsone80%ofthetimeand
forsamplesofsize10,40,and100.Foreachsamplesize,wecalculated
thet-statisticinhalfamillionrandomsamplesusing.8asourvalueofμ.
Figures1.2–1.4plotthedistributionof500,000t-statisticscalculated
foreachofthethreesamplesizesinourexperiment,withthestandard
normal distribution superimposed. With only 10 observations,
the
samplingdistributionisspiky,thoughtheoutlinesofabell-
shapedcurve
alsoemerge.Asthesamplesizeincreases,thefittoanormaldistributio
n
improves.With100observations,thestandardnormalisjustaboutban
g
on.
The standard normal distribution has a mean of 0 and standard
60. deviationof1.Withanystandardnormalvariable,values largerthan
±2arehighlyunlikely. In fact, realizations larger than2
inabsolute
valueappearonlyabout5%ofthetime.Becausethet-
statisticiscloseto
normallydistributed,wesimilarlyexpect it tofallbetweenabout±2
mostofthetime.Therefore,it’scustomarytojudgeanyt-
statisticlarger
thanabout2(inabsolutevalue)astoounlikelytobeconsistentwiththe
nullhypothesisusedtoconstructit.Whenthenullhypothesisisμ=0
andthet-statisticexceeds2inabsolutevalue,wesaythesamplemeanis
significantlydifferent fromzero.Otherwise, it’snot.Similar
language is
usedforothervaluesofμaswell.
FIGURE1.2
Thedistributionofthet-statisticforthemeaninasampleofsize10
Note:Thisfigureshowsthedistributionofthesamplemeanofadummy
variablethatequals
1withprobability.8.
FIGURE1.3
Thedistributionofthet-statisticforthemeaninasampleofsize40
Note:Thisfigureshowsthedistributionofthesamplemeanofadummy
variablethatequals
1withprobability.8.
FIGURE1.4
Thedistributionofthet-statisticforthemeaninasampleofsize100
61. Note:Thisfigureshowsthedistributionofthesamplemeanofadummy
variablethatequals
1withprobability.8.
Wemightalsoturnthequestionofstatisticalsignificanceonitsside:
instead of checkingwhether the sample is consistentwith a
specific
valueofμ,wecanconstructthesetofallvaluesofμthatareconsistent
withthedata.Thesetofsuchvaluesiscalledaconfidenceintervalfor
E[Yi].Whencalculatedinrepeatedsamples,theinterval
shouldcontainE[Yi]about95%ofthetime.Thisintervalistherefore
said to be a 95% confidence interval for the population mean.
By
describing the set of parameter values consistent with our data,
confidence intervals provide a compact summary of the
information
thesedatacontainaboutthepopulationfromwhichtheyweresampled.
PairingOff
Onesampleaverageistheloneliestnumberthatyou’lleverdo.Luckily
,
we’re usually concernedwith two.We’re especially keen to
compare
averagesforsubjectsinexperimentaltreatmentandcontrolgroups.W
e
reference these averages with a compact notation, writing Ȳ1
for
Avgn[Yi|Di=1]andȲ
0forAvgn[Yi|Di=0].Thetreatmentgroupmean,
Ȳ1, is theaverage for then1observationsbelonging to the
62. treatment
group,withȲ0definedsimilarly.Thetotalsamplesizeisn=n0+n1.
For our purposes, the difference between Ȳ1 andȲ0 is either an
estimateof the causal effectof treatment (ifYi is anoutcome), or
a
checkonbalance(ifYiisacovariate).Tokeepthediscussionfocused,
we’ll assume the former. Themost important null hypothesis in
this
contextisthattreatmenthasnoeffect,inwhichcasethetwosamples
usedtoconstructtreatmentandcontrolaveragescomefromthesame
population. On the other hand, if treatment changes outcomes,
the
populationsfromwhichtreatmentandcontrolobservationsaredrawn
arenecessarilydifferent.Inparticular,theyhavedifferentmeans,whi
ch
wedenoteμ1andμ0.
Wedecidewhethertheevidencefavorsthehypothesisthatμ1=μ0
by lookingforstatisticallysignificantdifferences in
thecorresponding
sampleaverages.Statisticallysignificantresultsprovidestrongevid
ence
of a treatment effect, while results that fall short of statistical
significanceareconsistentwiththenotionthattheobserveddifferenc
e
in treatment and controlmeans is a chance finding. The
expression
“chancefinding”inthiscontextmeansthatinahypotheticalexperime
nt
involvingvery large samples—so large that any
samplingvariance is
effectivelyeliminated—
we’dfindtreatmentandcontrolmeanstobethe
same.
Statisticalsignificanceisdeterminedbytheappropriate t-statistic.A
63. keyingredientinanytrecipeisthestandarderrorthatlivesdownstairs
inthetratio.Thestandarderrorforacomparisonofmeansisthesquare
root of the sampling variance ofȲ1−Ȳ0. Using the fact that the
varianceofadifferencebetweentwostatisticallyindependentvariabl
es
isthesumoftheirvariances,wehave
Thesecondequalityhereusesequation(1.5),whichgivesthesampling
varianceofasingleaverage.Thestandarderrorweneedistherefore
In deriving this expression, we’ve assumed that the variances of
individualobservationsarethesameintreatmentandcontrolgroups.
This assumption allows us to use one symbol, , for the common
variance.Aslightlymorecomplicatedformulaallowsvariancestodif
fer
acrossgroupsevenifthemeansarethesa me(anideatakenupagainin
thediscussionofrobustregressionstandarderrors in theappendix to
Chapter2).17
Recognizingthat mustbeestimated,inpracticeweworkwiththe
estimatedstandarderror
whereS(Yi) isthepooledsamplestandarddeviation.This is
thesample
standard deviation calculated using data from both treatment
and
controlgroupscombined.
Underthenullhypothesisthatμ1−μ0isequaltothevalueμ,thet-
statisticforadifferenceinmeansis
Weusethist-statistictotestworkinghypothesesaboutμ1−μ0andto
construct confidence intervals for this difference. When the null
67. compromised the experiment’s
validity.Today’s“randomistas”dobetteronsuchnuts-and-
boltsdesignissues(see,forexample,
the experiments described in Abhijit Banerjee and Esther Duflo,
Poor Economics: A Radical
RethinkingoftheWaytoFightGlobalPoverty,PublicAffairs,2011).
9TheRANDresultsreportedherearebasedonourowntabulationsfro
mtheHIEpublicuse
file,asdescribedintheEmpiricalNotessectionattheendofthebo ok.T
heoriginalRANDresults
are summarized in Joseph P. Newhouse et al., Free for All?
Lessons from the RANDHealth
InsuranceExperiment,HarvardUniversityPress,1994.
10Participants
inthefreeplanhadslightlybettercorrectedvisionthanthose
intheother
plans;seeBrooketal.,“DoesFreeCareImproveHealth?”NewEnglan
dJournalofMedicine,1983,
fordetails.
11SeeAmyFinkelsteinetal.,“TheOregonHealthInsuranceExperim
ent:Evidencefromthe
FirstYear,”Quarterly Journal of Economics, vol. 127, no.
3,August 2012, pages1057–1106;
KatherineBaickeretal.,“TheOregonExperiment—
EffectsofMedicaidonClinicalOutcomes,”
NewEnglandJournalofMedicine,vol.368,no.18,May2,2013,pages
1713–1722;andSarah
Taubmanetal.,“MedicaidIncreasesEmergencyDepartmentUse:Evi
dencefromOregon’sHealth
InsuranceExperiment,”Science,vol.343,no.6168,January17,2014,
pages263–268.
12 Why weren’t all OHP lottery winners insured? Some failed
to submit the required
70. Studentswhoattendedaprivatefour-yearcollegeinAmericapaidan
averageofabout$29,000intuitionandfeesinthe2012–2013school
year.Thosewhowenttoapublicuniversityintheirhomestatepaidless
than$9,000.Aneliteprivateeducationmightbebetterinmanyways:
the classes smaller, the athletic facilities newer, the faculty
more
distinguished,andthestudentssmarter.But$20,000peryearofstudyi
s
abigdifference.Itmakesyouwonderwhetherthedifferenceisworthit.
Theapples-to-applesquestioninthiscaseaskshowmucha40-year-
oldMassachusetts-
borngraduateof,say,Harvard,wouldhaveearnedif
heorshehadgonetotheUniversityofMassachusetts(U-
Mass)instead.
Moneyisn’teverything,but,asGrouchoMarxobserved:“Moneyfree
s
you from doing things you dislike. Since I dislike doing nearly
everything, money is handy.” So whenwe ask whether the
private
school tuition premium is worth paying, we focus on the
possible
earnings gain enjoyed by thosewho attend elite private
universities.
Higherearningsaren’ttheonlyreasonyoumightpreferaneliteprivate
institutionoveryour localstateschool.Manycollegestudentsmeeta
futurespouseandmakelastingfriendshipswhileincollege.Still,whe
n
families invest an additional $100,000 ormore in human capital,
a
higheranticipatedearningspayoffseemslikelytobepartofthestory.
Comparisonsofearningsbetweenthosewhoattenddifferentsortsof
schools invariably reveal large gaps in favor of elite-college
alumni.
Thinkingthisthrough,however,it’seasytoseewhycomparisonsofth
e
earningsofstudentswhoattendedHarvardandU-Massareunlikelyto
71. revealthepayofftoaHarvarddegree.Thiscomparisonreflectsthefact
thatHarvardgradstypicallyhavebetterhighschoolgradesandhigher
SAT scores, aremoremotivated, and perhaps have other skills
and
talents.NodisrespectintendedforthemanygoodstudentswhogotoU-
Mass,butit’sdamnhardtogetintoHarvard,andthosewhodoarea
specialandselectgroup.Incontrast,U-Massacceptsandevenawards
scholarshipmoneytoalmosteveryMassachusettsapplicantwithdece
nt
tenth-grade test scores. We should therefore expect earnings
comparisonsacrossalmamaterstobecontaminatedbyselectionbias,
justlikethecomparisonsofhealthbyinsurancestatusdiscussedinthe
previous chapter.We’ve also seen that this sort of selection bias
is
eliminatedbyrandomassignment.Regrettably,theHarvardadmissio
ns
officeisnotyetpreparedtoturntheiradmissionsdecisionsovertoa
randomnumbergenerator.
Thequestionofwhethercollegeselectivitymattersmustbeanswered
using the data generated by the routine application, admission,
and
matriculation decisionsmade by students and universities of
various
types.Canweusethesedatatomimictherandomizedtrialwe’dliketo
runinthiscontext?Not toperfection,surely,butwemaybeableto
comeclose.Thekeytothisundertakingisthefactthatmanydecisions
and choices, including those related to college attendance,
involve a
certain amount of serendipitous variation generated by financial
considerations,personalcircumstances,andtiming.
Serendipitycanbeexploitedinasampleofapplicantsonthecusp,
whocouldeasilygoonewayor theother.Doesanyoneadmitted to
Harvard really go to their local state school instead?Our
72. friendand
formerMITPhDstudent,Nancy,didjustthat.NancygrewupinTe xas,
sotheUniversityofTexas(UT)washerstateschool.UT’sflagshipAus
tin
campusisrated“HighlyCompetitive”inBarron’srankings,butit’sno
t
Harvard. UT is, however, much less expensive than Harvard
(The
PrincetonReview recently named UT Austin a “Best Value
College”).
Admitted to both Harvard and UT, Nancy chose UT over
Harvard
becausetheUTadmissionsoffice,anxioustoboostaverageSATscore
s
oncampus,offeredNancyanda fewotheroutstandingapplicantsan
especiallygenerousfinancialaidpackage,whichNancygladlyaccept
ed.
WhataretheconsequencesofNancy’sdecisiontoacceptUT’soffer
anddeclineHarvard’s?ThingsworkedoutprettywellforNancyinspit
e
ofherchoiceofUToverHarvard:todayshe’saneconomicsprofessora
t
anotherIvyLeagueschoolinNewEngland.Butthat’sonlyoneexampl
e.
Well,actually,it’stwo:OurfriendMandygotherbachelor’sfromthe
University of Virginia, her home state school, declining offers
from
Duke, Harvard, Princeton, and Stanford. Today, Mandy teaches
at
Harvard.
Asampleoftwoisstilltoosmallforreliablecausalinference.We’d
like to comparemany people likeMandy andNancy tomany other
similarpeoplewhochoseprivatecollegesanduniversities.Fromlarge
73. r
groupcomparisons,wecanhopetodrawgeneral lessons.Accesstoa
largesampleisnotenough,however.Thefirstandmostimportantstep
inourefforttoisolatetheserendipitouscomponentofschoolchoiceist
o
hold constant the most obvious and important differences
between
studentswhogotoprivateandstateschools.Inthismanner,wehope
(thoughcannotpromise)tomakeotherthingsequal.
Here’s a small-sample numerical example to illustrate the
ceteris
paribus idea (we’ll have more data when the time comes for real
empiricalwork).Supposetheonlythingsthatmatterinlife,atleastas
farasyourearningsgo,areyourSATscoresandwhereyougotoschool.
ConsiderUmaandHarvey,bothofwhomhaveacombinedreadingand
mathscoreof1,400ontheSAT.1UmawenttoU-Mass,whileHarvey
wenttoHarvard.WestartbycomparingUma’sandHarvey’searnings.
Becausewe’veassumedthatallthatmattersforearningsbesidescolle
ge
choiceisthecombinedSATscore,Umavs.Harveyisaceterisparibus
comparison.
Inpractice,ofcourse,lifeismorecomplicated.Thissimpleexample
suggests one significant complication: Uma is a young woman,
and
Harveyisayoungman.Womenwithsimilareducationalqualification
s
oftenearnlessthanmen,perhapsduetodiscriminationortimespent
outofthelabormarkettohavechildren.ThefactthatHarveyearns20%
morethanUmamaybetheeffectofasuperiorHarvardeducation,butit
might justaswellreflectamale-femalewagegapgeneratedbyother
things.
We’d like to disentangle the pureHarvard effect from these
other
74. things.Thisiseasyiftheonlyotherthingthatmattersisgender:replace
Harvey with a female Harvard student, Hannah, who also has a
combinedSATof1,400,comparingUmaandHannah.Finally,becaus
e
we’re after general conclusions that gobeyond individual
stories,we
lookformanysimilarsame-sexandsame-SATcontrastsacrossthetwo
schools. That is,we compute the average earnings difference
among
HarvardandU-MassstudentswiththesamegenderandSATscore.The
averageofallsuchgroup-specificHarvardversusU-
Massdifferencesis
ourfirstshotatestimatingthecausaleffectofaHarvard education.Thi
s
is an econometricmatching estimator that controls for—that is,
holds
fixed—
sexandSATscores.Assumingthat,conditionalonsexandSAT
scores, the students who attend Harvard and U-Mass have
similar
earningspotential,thisestimatorcapturestheaveragecausaleffectof
a
Harvarddegreeonearnings.
Matchmaker,Matchmaker
Alas,there’smoretoearningsthansex,schools,andSATscores.Since
collegeattendancedecisionsaren’trandomlyassigned,wemustcontr
ol
for all factors that determine both attendance decisions and later
earnings. These factors include student characteristics, like
writing
ability,diligence,familyconnections,andmore.Controlforsuchawi
de
76. fromcollege.Theanalysisherefocusesonstudentswhoenrolledin19
76
and who were working in 1995 (most adult college graduates are
working).Thecollegesincludeprestigiousprivateuniversities,liket
he
UniversityofPennsylvania,Princeton,andYale; anumberof
smaller
privatecolleges,likeSwarthmore,Williams,andOberlin;andfourpu
blic
universities(Michigan,TheUniversityofNorthCarolina,PennState,
and
MiamiUniversity inOhio). The average (1978) SAT scores at
these
schoolsrangedfromalowof1,020atTulanetoahighof1,370atBryn
Mawr.In1976,tuitionrateswereaslowas$540attheUniversityof
NorthCarolinaandashighas$3,850atTufts(thosewerethedays).
Table2.1details a stripped-downversionof theDale andKrueger
matchingstrategy,inasetupwecallthe“collegematchingmatrix.”Th
is
table lists applications, admissions, andmatriculation decisions
for a
(made-up) listofninestudents,eachofwhomapplied toasmanyas
threeschoolschosenfromanimaginarylistofsix.Threeoutofthesix
schoolslistedinthetablearepublic(AllState,TallState,andAltered
State)andthreeareprivate(Ivy,Leafy,andSmart).Fiveofournine
students(numbers1,2,4,6,and7)attendedprivateschools.Average
earnings in this group are $92,000. The other four, with average
earningsof$72,500,wenttoapublicschool.Thealmost$20,000gap
betweenthesetwogroupssuggestsalargeprivateschooladvantage.
TABLE2.1
Thecollegematchingmatrix
77. Note:Enrollmentdecisionsarehighlightedingray.
ThestudentsinTable2.1areorganizedinfourgroupsdefinedbythe
setof schools towhich theyappliedandwereadmitted.Withineach
group,studentsarelikelytohavesimilarcareerambitions,whilethey
were also judged to be of similar ability by admissions staff at
the
schools to which they applied. Within-group comparisons
should
therefore be considerably more apples-to-apples than
uncontrolled
comparisonsinvolvingallstudents.
ThethreegroupAstudentsappliedtotwoprivateschools,Leafyand
Smart,andonepublicschool,TallState.Althoughthesestudentswere
rejectedatLeafy,theywereadmittedtoSmartandTallState.Students
1
and2wenttoSmart,whilestudent3optedforTallState.Thestudents
ingroupAhavehighearnings,andprobablycomefromuppermiddle
classfamilies(asignalhereisthattheyappliedtomoreprivateschools
thanpublic).Student3, thoughadmitted toSmart,opted forcheaper
TallState,perhapstosaveherfamilymoney(likeourfriendsNancyan
d
Mandy).Althoughthestudents ingroupAhavedonewell,withhigh
averageearningsandahighrateofprivateschoolattendance,within
groupA,theprivateschooldifferentialisnegative:(110+100)/2−
110=−5,inotherwords,agapof−$5,000.
ThecomparisoningroupAisoneofanumberofpossiblematched
comparisonsinthetable.GroupBincludestwostudents,eachofwhom
appliedtooneprivateandtwopublicschools(Ivy,AllState,andAltere
d
State).ThestudentsingroupBhaveloweraverageearningsthanthose
in group A. Bothwere admitted to all three schools towhich they
78. applied.Number4enrolledatIvy,whilenumber5choseAlteredState.
Theearningsdifferentialhere is$30,000 (60−30=30).Thisgap
suggestsasubstantialprivateschooladvantage.
GroupCincludestwostudentswhoappliedtoasingleschool(Leafy),
where they were admitted and enrolled. Group C earnings reveal
nothing about the effects of private school attendance, because
both
studentsinthisgroupattendedprivateschool.Thetwostudentsingrou
p
Dappliedtothreeschools,wereadmittedtotwo,andmadedifferent
choices. But these two students choseAll State and Tall State,
both
publicschools,sotheirearningsalsorevealnothingaboutthevalueofa
privateeducation.GroupsCandDareuninformative,because,fromth
e
perspectiveofourefforttoestimateaprivateschooltreatmenteffect,
eachiscomposedofeitherall-treatedorall-controlindividuals.
GroupsAandBarewheretheactionisinourexample,sincethese
groupsincludepublicandprivateschoolstudentswhoappliedtoand
wereadmittedtothesamesetofschools.Togenerateasingleestimate
thatusesallavailabledata,weaveragethegroup-
specificestimates.The
averageof−$5,000forgroupAand$30,000forgroupBis$12,500.
This isagoodestimateof theeffectofprivate schoolattendanceon
averageearnings,because,toalargedegree,itcontrolsforapplicants’
choicesandabilities.
Thesimpleaverageoftreatment-controldifferencesingroupsAandB
isn’t theonlywell-controlled comparison that canbe computed
from
thesetwogroups.Forexample,wemightconstructaweightedaverage
whichreflectsthefactthatgroupBincludestwostudentsandgroupA
includesthree.Theweightedaverageinthiscaseiscalculatedas
Byemphasizinglargergroups,thisweightingschemeusesthedatamo
79. re
efficiently and may therefore generate a statistically more
precise
summaryoftheprivate-publicearningsdifferential.
Themostimportantpointinthiscontextistheapples-to-applesand
oranges-to-oranges nature of the underlying matched
comparisons.
ApplesingroupAarecomparedtoothergroupAapples,whileoranges
in group B are compared only with oranges. In contrast, naive
comparisons that simply compare the earnings of private and
public
schoolstudentsgenerateamuchlargergapof$19,500whencomputed
using all nine students in the table. Evenwhen limited to the
five
studentsingroupsAandB,theuncontrolledcomparisongeneratesaga
p
of$20,000(20=(110+100+60)/3−(110+30)/2).Thesemuch
larger uncontrolled comparisons reflect selection bias: students
who
apply to and are admitted to private schools have higher
earnings
wherevertheyultimatelychosetogo.
Evidence of selection bias emerges from a comparison of
average
earningsacross(insteadofwithin)groupsAandB.Averageearningsi
n
group A, where two-thirds apply to private schools, are around
$107,000.AverageearningsingroupB,wheretwo-
thirdsapplytopublic
schools, are only $45,000.Ourwithin-group comparisons reveal
that
much of this shortfall is unrelated to students’ college
attendance
decisions. Rather, the cross-group differential is explained by a
combinationofambitionandability,asreflectedinapplicationdecisi
ons
80. andthesetofschoolstowhichstudentswereadmitted.
2.2MakeMeaMatch,RunMeaRegression
Regression is the tool thatmasterspickup first, ifonly toprovidea
benchmarkformoreelaborateempiricalstrategies.Althoughregressi
on
isamany-splendoredthing,wethinkofitasanautomatedmatchmaker.
Specifically, regression estimates are weighted averages of
multiple
matched comparisons of the sort constructed for the groups in
our
stylized matching matrix (the appendix to this chapter discusses
a
closely related connection between regression and mathematical
expectation).
Thekeyingredientsintheregressionrecipeare
▪thedependentvariable,inthiscase,studenti’searningslaterin
life,alsocalledtheoutcomevariable(denotedbyYi);
▪ the treatment variable, in this case, a dummy variable that
indicatesstudentswhoattendedaprivatecollegeoruniversity
(denotedbyPi);and
▪asetofcontrolvariables,inthiscase,variablesthatidentifysets
ofschoolstowhichstudentsappliedandwereadmitted.
Inourmatchingmatrix,thefivestudentsingroupsAandB(Table
2.1)contributeusefuldata,whilestudents ingroupsCandDcanbe
discarded.InadatasetcontainingthoseleftafterdiscardinggroupsC
andD,asinglevariableindicatingthestudentsingroupAtellsuswhich