1. Bayesian hierarchical models
for estimating the health
effects of air pollution sources
Roger D. Peng, PhD
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
@rdpeng, simplystatistics.org
(joint work with Jenna Krall and Amber Hackstadt)
New England Statistics Symposium
April 2016
2. Not So Standard Deviations
(with Hilary Parker of Stitch Fix)
Subscribe in iTunes: https://goo.gl/ZhWYbd
https://soundcloud.com/nssd-podcast
3. Particulate Matter and Health
• PM has been linked with health outcomes:
hospitalization, mortality, decreased lung function,
cardiac events
• WHO estimates ~800,000 premature deaths per year
• Evidence of both short-term (acute) and long-term
(chronic) effects of exposure to ambient PM
• Much recent work has examined ambient PM mass
(PM10, PM2.5, PM10-2.5) as the exposure indicator
4. What’s Next?
• There is strong evidence that ambient PM is
associated with mortality and morbidity
• What should we do about it? How can we
intervene to improve health?
• Target sources of PM that are most harmful to
human health
• How do we identify sources of ambient PM?
10. Problems
• Current source apportionment methods are applied
on an ad hoc, highly tweaked basis and are
difficult to scale to a region or nation
• Source apportionment models are typically
informed by the investigator’s local knowledge
• No reproducible way to combine information across
locations to gain power when estimating health
effects (multi-site studies)
11. Incorporating New Data Sources on Pollution Sources
Component                  Data Source
Particulate Matter         EPA Air Quality System (AQS)
PM Chemical Constituents   EPA Chemical Speciation Network
PM Source Profiles         EPA SPECIATE Database
PM Source Emissions        EPA National Emissions Inventory
20. Model Fitting
• We use Markov chain Monte Carlo to simulate from
the posterior distribution of the unknown
parameters
• Adaptive MCMC approach of Haario et al. (2001)
• Use data from SPECIATE and NEI to calibrate the
prior distributions
• Constraints placed on profile matrix based on what
is known about composition of specific sources
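The adaptive idea can be sketched on a toy problem. The block below is a minimal one-dimensional version of the adaptive Metropolis scheme of Haario et al. (2001): a random-walk proposal whose variance is rescaled from the running variance of the chain. The standard normal target, the function names, and the tuning values (t0, init_sd, eps) are illustrative assumptions, not the posterior or the settings used in this work.

```python
import math
import random


def log_target(x):
    # Toy log-density (standard normal, up to a constant).
    # Assumption: a stand-in for the real source apportionment posterior.
    return -0.5 * x * x


def adaptive_metropolis(log_density, x0, n_iter, t0=200, init_sd=1.0,
                        eps=1e-6, seed=0):
    """1-D adaptive Metropolis: proposal variance tracks the chain history."""
    rng = random.Random(seed)
    scale = 2.4 ** 2          # s_d = 2.4^2 / d with d = 1
    x, lp = x0, log_density(x0)
    mean, m2, n = x0, 0.0, 1  # Welford running mean and squared deviations
    samples = []
    for t in range(n_iter):
        if t < t0:
            sd = init_sd      # fixed proposal during the initial period
        else:
            var = m2 / (n - 1)                 # empirical variance so far
            sd = math.sqrt(scale * (var + eps))
        proposal = x + rng.gauss(0.0, sd)
        lp_prop = log_density(proposal)
        # Metropolis accept/reject step for a symmetric proposal
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        # Update the running moments with the current state
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        samples.append(x)
    return samples


draws = adaptive_metropolis(log_target, x0=0.0, n_iter=20000, seed=1)
kept = draws[2000:]  # discard an initial stretch as burn-in
print(sum(kept) / len(kept))
```

In the multivariate case the same recipe uses the empirical covariance matrix of the chain in place of the scalar variance.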
23. Estimating Health Effects
• For individual cities, estimated source time series can
be plugged into regression models with health
outcomes
• For multi-site studies, source determination cannot be a
manual process (not reproducible)
• Need automatic method to combine information across
a region
• Current approaches assume pollution sources are the
same everywhere
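For a single city, the plug-in step might look like the sketch below: a log-linear Poisson regression of daily hospitalization counts on one estimated source series, fit by Newton-Raphson. The Poisson form, the single covariate, the function name, and the numbers are assumptions for illustration; the slides say only "regression models", and a real analysis would also adjust for confounders such as weather and time trends.

```python
import math


def fit_poisson_loglinear(x, y, n_iter=50, tol=1e-10):
    """Fit log E[y_t] = b0 + b1 * x_t by Newton-Raphson.

    x: estimated source concentration per day (hypothetical input)
    y: daily outcome counts
    Returns (b0, b1, standard error of b1).
    """
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information times score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard error of b1 from the information matrix at the final estimate
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    h00 = sum(mu)
    h01 = sum(xi * mi for xi, mi in zip(x, mu))
    h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1


# Synthetic, noiseless example generated from known coefficients,
# so the fit should recover b0 close to 3.0 and b1 close to 0.05.
source = [0.5 + 0.1 * (t % 20) for t in range(200)]
counts = [math.exp(3.0 + 0.05 * s) for s in source]
b0, b1, se_b1 = fit_poisson_loglinear(source, counts)
print(b0, b1, se_b1)
```

Here b1 is the log relative rate of hospitalization per unit increase in the source concentration. Repeating this by hand for each city is the step that does not scale to multi-site studies.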
24. US EPA Chemical Speciation Network
• 85 monitors, 24 constituents
Medicare cohort (1999–2010)
• Cardiovascular disease (CVD) hospitalizations for 63 counties
25. SHARE
• A method for estimating health effects of sources
SHared Across a REgion
• Sources and their health effects are estimated at
individual locations
• Sources are matched across locations via population
value decomposition (Crainiceanu et al. 2011)
• Health effects combined for common sources via
hierarchical modeling
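The final pooling step can be illustrated with a generic two-level normal model: each location's estimate is treated as normal around its own true effect, and the true effects are normal around a regional mean mu with between-location standard deviation tau. The sketch below integrates over tau on a grid under a flat prior on mu and a uniform prior on tau. This is an assumed textbook hierarchical model, not the SHARE implementation itself; the matching of sources across locations is not shown, and the function name, grid settings, and input numbers are hypothetical.

```python
import math


def pool_effects(estimates, std_errors, n_grid=400, tau_max=None):
    """Posterior mean and sd of the regional mean effect mu.

    Model (assumed): est_c ~ N(beta_c, se_c^2), beta_c ~ N(mu, tau^2),
    flat prior on mu, uniform prior on tau over [0, tau_max].
    """
    if tau_max is None:
        # Assumption: a generous upper bound for the tau grid
        tau_max = 3.0 * (max(estimates) - min(estimates)) + max(std_errors)
    taus = [tau_max * k / (n_grid - 1) for k in range(n_grid)]
    log_w, mu_hats, v_mus = [], [], []
    for tau in taus:
        prec = [1.0 / (s * s + tau * tau) for s in std_errors]
        v_mu = 1.0 / sum(prec)                   # Var(mu | tau, data)
        mu_hat = v_mu * sum(p * y for p, y in zip(prec, estimates))
        # Log marginal posterior of tau, up to an additive constant
        lw = 0.5 * math.log(v_mu)
        for p, y in zip(prec, estimates):
            lw += 0.5 * math.log(p) - 0.5 * p * (y - mu_hat) ** 2
        log_w.append(lw)
        mu_hats.append(mu_hat)
        v_mus.append(v_mu)
    top = max(log_w)
    w = [math.exp(l - top) for l in log_w]       # stabilized grid weights
    total = sum(w)
    w = [wi / total for wi in w]
    post_mean = sum(wi * mh for wi, mh in zip(w, mu_hats))
    # Law of total variance across the tau grid
    post_var = sum(wi * (v + (mh - post_mean) ** 2)
                   for wi, v, mh in zip(w, v_mus, mu_hats))
    return post_mean, math.sqrt(post_var)


# Hypothetical log relative rates for one matched source in four counties
est = [0.010, 0.025, 0.004, 0.018]
se = [0.008, 0.012, 0.006, 0.010]
print(pool_effects(est, se))
```

Only locations where a given source was actually identified would contribute estimates for that source, which is what relaxes the assumption that sources are the same everywhere.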
28. Summary
• A Bayesian source apportionment model can
integrate information from 3 national databases
• Data on sources and profiles can be used to
constrain the problem and to construct informative
prior distributions
• SHARE method can be used to automatically
combine health effects of estimated sources across
a region