1.
Modeling with Data
INTRO TO COMPUTING FOR COMPLEX SYSTEMS
(Session XVI)
Jon Zelner
University of Michigan
8/11/2010
2.
Data in the Modeling Process
[Diagram labels: Agents, Environment, Model, Observed Behavior, Simulated Behavior]
3.
Pattern Oriented Modeling (POM)
Term coined by Grimm et al. in 2005 Science paper
Modeling process should be guided by patterns of
interest
Can use patterns @ multiple levels:
Individual agents
Environment
Aggregate agent behavior
Patterns should be used both to guide model
development and to calibrate and validate models.
4.
Types of data for modelers
Counts/Proportions:
Infections
Occupied patches
Distributions:
Age
Lifespan
Duration of infection
Rates:
Birthrates
Transmission rate
Time Series:
Evolution of outbreak in time
Timeline of conflict
Number of firms over time
Qualitative:
‘Norovirus-like’ outbreaks
Size and shape of forest patches
Clusters of settlements
6.
Pattern Oriented Modeling: Kayenta
Anasazi (Axtell et al. 2001)
Trying to understand
population growth and
collapse among the
Kayenta Anasazi in U.S.
Southwest
Many factors in this:
Weather
Farming
Kinship
Optimize models by
explaining multiple
patterns @ one time.
8.
Inference for POM
Bayesian/Qualitative
Use some kind of quality function to score goodness of runs
and optimize by minimizing distance between model output
and optimum quality and/or data.
Number of occupied patches
Size of elephant herds
Frequentist/Likelihood-based
Define a likelihood function for Data | Model
Simulate runs from the model and evaluate likelihood of
data as (# runs == Data) / # runs
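The simulation-based likelihood above can be sketched in a few lines. This is only an illustration: `simulate_count` is a hypothetical stand-in for a full model run (it just draws a number of infections out of `n_people`), and the parameter values are made up.

```python
import random

def simulate_count(p_infect, n_people, rng):
    """Toy stand-in for one model run: number infected out of n_people."""
    return sum(rng.random() < p_infect for _ in range(n_people))

def simulated_likelihood(observed, p_infect, n_people, n_runs=20000, seed=1):
    """Estimate P(Data | Model) as (# runs == Data) / # runs."""
    rng = random.Random(seed)
    hits = sum(simulate_count(p_infect, n_people, rng) == observed
               for _ in range(n_runs))
    return hits / n_runs

# Compare a few candidate parameter values against an observed count of 3.
for p in (0.1, 0.3, 0.5):
    print(p, simulated_likelihood(observed=3, p_infect=p, n_people=10))
```

Exact matching only works when the data are discrete and low-dimensional; for richer patterns, the distance-based scoring described under Bayesian/Qualitative is the usual fallback.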
9.
How Infections Propagate
After Point-Source Events
An Analysis of Secondary Norovirus Transmission
Jon Zelner (a,b), Aaron A. King (a,c), Christine Moe (e) &
Joseph N.S. Eisenberg (a,d)
University of Michigan: (a) Center for the Study of Complex Systems, (b) Sociology & Public Policy,
(c) Ecology & Evolutionary Biology, (d) School of Public Health
Emory University: (e) Rollins School of Public Health
10.
Norovirus (NoV) Epidemiology
Most common cause of non-bacterial
gastroenteritis in the US and worldwide.
Est. 90 million cases in 2007
Explosive diarrhea & projectile vomiting
in symptomatic cases.
Single-stranded, non-enveloped RNA
virus
Member of family Caliciviridae
Often transmitted via food
Salad greens
Shellfish
Most person-to-person transmission is
via the environment and fomites.
11.
Why model transmission after
point-source events?
Typical analysis of point-source events
focuses on primary, one-to-many risk:
How many cases are created by an
infectious food handler?
How many people are infected after a water
treatment failure?
[Diagram labels: IA, IS, H, S]
However, actual size of point source events is
underestimated without including secondary
transmission risk.
Within-household transmission is an important
bridge between point-source events.
So, even if within-household R0 < 1,
household cases have important dynamic
consequences at the community level.
12.
NoV Transmission Dynamics
Norovirus transmission dynamics tend to be
locally unstable but globally persistent.
E.g., small, explosive outbreaks in Mercer
County, but no local NoV epidemic
Multiple reported NoV outbreaks
throughout New Jersey every week.
Stochasticity operates at multiple levels.
Disease/Contact
13.
NoV Transmission Dynamics
Exponential Growth,
Global Invasion
(e.g., Pandemic Flu)
Short, Explosive &
Limited
(Typical of NoV
outbreaks)
14.
Outbreak Data
Götz et al. (2001) observed 500+
households exposed to NoV after a
point-source outbreak in a
network of daycare centers in
Stockholm, Sweden.
Traceable to salad prepared by a
food handler who was shedding
post-symptoms.
Followed 153 of these households,
eliminating those with only one
person.
49 had secondary cases
104 had no secondary cases
15.
Deterministic SEIR model
Infinite population
Mass-action mixing
Frequency-dependent transmission
When I > 0, a fraction of the susceptible population is infected at every instant
Constant average rate of recovery
Doesn’t matter who is infected
‘Nano-fox’ problem
$$\frac{dS}{dt} = -\beta S I$$
$$\frac{dE}{dt} = \beta S I - \sigma E$$
$$\frac{dI}{dt} = \sigma E - \gamma I$$
$$\frac{dR}{dt} = \gamma I$$
(σ: rate of progression from E to I; γ: recovery rate)
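A minimal sketch of the deterministic model, integrated with forward Euler steps. S, E, I and R are treated as population fractions; `sigma` is a name assumed here for the E-to-I progression rate, and all numeric values are hypothetical. Note how fractions of a person become infected at every step, which is the ‘nano-fox’ problem in action.

```python
def seir_euler(beta, sigma, gamma, s0, e0, i0, r0, dt=0.01, t_max=60.0):
    """Integrate dS/dt = -beta*S*I, dE/dt = beta*S*I - sigma*E,
    dI/dt = sigma*E - gamma*I, dR/dt = gamma*I with forward Euler."""
    s, e, i, r = s0, e0, i0, r0
    path = [(0.0, s, e, i, r)]
    steps = int(round(t_max / dt))
    for n in range(1, steps + 1):
        new_exposed = beta * s * i * dt      # S -> E
        new_infectious = sigma * e * dt      # E -> I
        new_recovered = gamma * i * dt       # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        path.append((n * dt, s, e, i, r))
    return path

# Hypothetical values: 1.5-day incubation, 1.2-day infectious period.
path = seir_euler(beta=2.0, sigma=1 / 1.5, gamma=1 / 1.2,
                  s0=0.999, e0=0.0, i0=0.001, r0=0.0)
t, s, e, i, r = path[-1]
print(f"t={t:.0f}: susceptible={s:.3f}, recovered={r:.3f}")
```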
16.
Why use a stochastic model?
Deterministic models work well
when assumptions are plausible,
but are less useful when:
Populations are small:
e.g., Household outbreak
Global contact patterns deviate
from homogeneous mixing:
Social networks
Realistic behavior
Disease natural history is not
memoryless:
Recovery period is not
exponentially or gamma
distributed
Lots of variability in
individual infectiousness
[Figure panels: Exponential RV; Lognormal RV]
17.
Progression of NoV Infection
Short incubation period (~1.5 days)
Typical symptom duration around 1.5 days.
Exceptional cases up to a year have been reported.
Most people shed asymptomatically after symptoms
resolve:
Typically for several days
It is not uncommon for shedding to last more than a month, sometimes a year or more
15-50% of all infections may be totally asymptomatic
18.
Basic NoV Transmission Model for
Household Outbreaks
SEIR Transmission Model
Individuals may be in one of four states:
Susceptible
Exposed/Incubating
Infectious
Recovered/Immune
Multiple boxes in E & I states correspond to shape parameter of gamma distributed
waiting times.
Background infection parameter, α. (Fixed to 0.001/day)
Although NoV immunity tends to be partial and short-lived, this model is adequate
for analyzing short-lived outbreaks.
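The link between "multiple boxes" and gamma-distributed waiting times can be checked numerically. In this sketch (function name and values are illustrative, not from the original analysis), passing through `n_boxes` sub-stages, each exponential with mean `mean_duration / n_boxes`, gives a gamma-distributed total with shape `n_boxes`: the mean stays fixed while the variance shrinks as `mean_duration**2 / n_boxes`.

```python
import random
import statistics

def staged_duration(mean_duration, n_boxes, rng):
    """Total time to pass through n_boxes exponential sub-stages.

    Each sub-stage has mean mean_duration / n_boxes, so the sum is
    gamma distributed with shape n_boxes and mean mean_duration.
    """
    rate_per_box = n_boxes / mean_duration
    return sum(rng.expovariate(rate_per_box) for _ in range(n_boxes))

rng = random.Random(42)
for n_boxes in (1, 2, 5):
    draws = [staged_duration(1.2, n_boxes, rng) for _ in range(20000)]
    print(n_boxes,
          round(statistics.mean(draws), 2),
          round(statistics.pvariance(draws), 2))
```

With one box the waiting time is exponential (memoryless); adding boxes concentrates durations around the mean.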
19.
Analysis Objectives
Estimate daily person-to-person rate of infection (β).
0.14 infections per day
Estimate average effective duration of infection (1/γ)
and shape parameter of gamma-distributed
infectiousness duration.
1.2 days; γs = 1
Effect of missing household sizes on results.
Minimal
Effect of asymptomatic infection.
0.035 increase in β for each 10% increase in
proportion of individuals who are asymptomatically
infectious
20.
What makes these data
challenging to work with?
We want to understand:
Daily person-to-person rate of infection (β).
Average effective duration of infection (1/γ).
Variability in 1/γ.
Generation of asymptomatic infections.
But household data are noisy and only partially observed:
We know time of symptom onset but are missing:
Time of infection
Time of recovery
Firm estimate of asymptomatic ratio & infectiousness
Household Sizes (!)
The strength of these data is that we can treat each household
as an independent trial of a random infection process.
21.
Likelihood Function for Fully
Observed Household Outbreaks
Force of infection @ t:
$$\lambda(S_{ij}, I_{ij}, \beta, \alpha) = S_{ij}(\beta I_{ij} + \alpha)$$
Likelihood of no infections over all infection-free intervals:
$$L_{i,a} = \prod_{j=0}^{N_Q - 1} \exp\left(-\lambda(S_{ij}, I_{ij}, \beta, \alpha)\,(t_{j+1} - t_j)\right)$$
Probability of all infections:
$$L_{i,b} = \prod_{k=1}^{N_K} \lambda(S_{ik}, I_{ik}, \beta, \alpha)$$
[Figure legend: markers for infection (x), symptom onset, and recovery]
22.
Likelihood Function for Fully
Observed Household Outbreaks
[Figure legend: markers for infection (x), symptom onset, and recovery]
Likelihood of a household observation:
$$L_i = L_{i,a} \times L_{i,b}$$
Likelihood of all household observations:
$$L_O = \prod_{i \in H} L_i$$
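A sketch of how this likelihood could be evaluated for fully observed households, working on the log scale to avoid underflow. The data layout (lists of `(S, I, duration)` intervals and `(S, I)` infection events) and the toy household are assumptions made for illustration; only the force-of-infection formula comes from the slides.

```python
import math

def force_of_infection(s, i, beta, alpha):
    """lambda(S, I, beta, alpha) = S * (beta * I + alpha)."""
    return s * (beta * i + alpha)

def household_log_likelihood(intervals, infections, beta, alpha):
    """Log-likelihood of one fully observed household.

    intervals:  (S, I, duration) for each interval between events
    infections: (S, I) in force at each infection event
    Requires alpha > 0 or I > 0 at every infection, else log(0).
    """
    log_no_infection = -sum(force_of_infection(s, i, beta, alpha) * dt
                            for s, i, dt in intervals)
    log_infections = sum(math.log(force_of_infection(s, i, beta, alpha))
                         for s, i in infections)
    return log_no_infection + log_infections

def total_log_likelihood(households, beta, alpha):
    """Sum over independent households (product on the natural scale)."""
    return sum(household_log_likelihood(iv, inf, beta, alpha)
               for iv, inf in households)

# Hypothetical household of 4: one secondary infection after 1 day.
toy_household = ([(3, 1, 1.0), (2, 1, 0.2), (2, 0, 5.0)], [(3, 1)])
print(total_log_likelihood([toy_household], beta=0.14, alpha=0.001))
```

A household with no secondary cases simply has an empty `infections` list, so only the infection-free term contributes.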
24.
Unobserved Infection States
+ 104 Households w/
No Secondary Cases
25.
Unobserved Infection States
Use data augmentation to generate
complete observations.
For each symptom onset event (q):
Draw incubation time, k, from distribution
Infection time, a = q – k
If you draw any a < 0, whole sample has
likelihood = 0.
Draw recovery time, r, from symptom
duration distribution.
If r > observation period, w:
r=w
For right-censoring in data.
Repeat for many (1K+) samples
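The augmentation loop above might look like the following sketch. The gamma shapes and means are placeholders (the slides give only approximate incubation and symptom durations), and treating time 0 as the earliest possible infection time is an assumption taken from the `a < 0` rule.

```python
import random

def augment_household(onset_times, observation_end, rng,
                      mean_incubation=1.5, incubation_shape=2.0,
                      mean_symptoms=1.5, symptom_shape=2.0):
    """Return [(infection, onset, recovery), ...] for one household,
    or None when any drawn infection time is negative (likelihood 0)."""
    augmented = []
    for q in onset_times:
        # gammavariate(shape, scale): mean = shape * scale
        k = rng.gammavariate(incubation_shape, mean_incubation / incubation_shape)
        a = q - k                      # infection time
        if a < 0:
            return None                # whole sample is discarded
        duration = rng.gammavariate(symptom_shape, mean_symptoms / symptom_shape)
        r = min(q + duration, observation_end)   # right-censoring at w
        augmented.append((a, q, r))
    return augmented

rng = random.Random(0)
samples = [augment_household([2.0, 3.5], 10.0, rng) for _ in range(1000)]
valid = [s for s in samples if s is not None]
print(len(valid), "of", len(samples), "samples have non-zero likelihood")
```

Discarded samples still count in the denominator when averaging, since they contribute a likelihood of exactly zero.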
26.
Unobserved Infection States
[Figure legend: markers for infection (x), symptom onset, and recovery]
Evaluate likelihood w/respect to β and α for each sample.
E(L) is estimated likelihood of data.
27.
Unobserved Household Sizes
Sizes of households in Stockholm outbreak are
unknown.
Expected number of cases is:
S(βI + α)Δt
Missing S!
Solution:
Assume exposed households are sampled at random from
the whole population.
For each augmented household time series, sample household
size from Swedish census distribution.
Save samples by setting a lower bound:
Likelihood of an outbreak in a household with fewer individuals than
observed infections = 0, so don’t sample these.
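One way to sample household sizes with the lower bound is to truncate the size distribution before drawing. The distribution below is invented for illustration and is not the Swedish census data; `kept_mass` is a helper assumed here to track how much probability the truncation retains.

```python
import random

# Hypothetical size distribution (size -> proportion); NOT census figures.
SIZE_DISTRIBUTION = {2: 0.45, 3: 0.20, 4: 0.22, 5: 0.09, 6: 0.04}

def kept_mass(n_observed_infections, distribution=SIZE_DISTRIBUTION):
    """Probability that a random household is large enough."""
    return sum(p for size, p in distribution.items()
               if size >= n_observed_infections)

def sample_household_size(n_observed_infections, rng,
                          distribution=SIZE_DISTRIBUTION):
    """Draw a size no smaller than the number of observed infections."""
    sizes = [s for s in distribution if s >= n_observed_infections]
    if not sizes:
        raise ValueError("no household size can hold the observed infections")
    weights = [distribution[s] for s in sizes]
    return rng.choices(sizes, weights=weights, k=1)[0]

rng = random.Random(0)
print([sample_household_size(4, rng) for _ in range(10)], kept_mass(4))
```

Averaging the likelihood over truncated draws and multiplying by `kept_mass` matches averaging over the full distribution; because that factor does not depend on β or γ, it does not move the maximum.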
28.
Results: MLE Parameter Values
and 95% Confidence Intervals
1/γ limited to values >= 1 day; infectiousness duration < 1 day not plausible
29.
Results: Likelihood Surface
Contour plot shows likelihood for combinations of β and 1/γ for γs = 1.
Triangle is location of MLE; dashed oval shows 95% confidence bounds.
Parameter space isn’t very large, so optimize using brute force.
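Brute-force optimization here just means evaluating the likelihood on a grid and keeping the best point. In this sketch `toy_log_likelihood` is a made-up single-peaked surface standing in for the augmented-data likelihood; its peak is placed at the estimates reported earlier purely for illustration, and the grid ranges are assumptions apart from the 1/γ >= 1 day bound.

```python
import itertools

def grid_search(log_likelihood, beta_grid, duration_grid):
    """Evaluate every (beta, 1/gamma) pair and return the best one."""
    best = max(itertools.product(beta_grid, duration_grid),
               key=lambda pair: log_likelihood(*pair))
    return best, log_likelihood(*best)

def toy_log_likelihood(beta, duration):
    """Stand-in surface; replace with the real estimated likelihood."""
    return -((beta - 0.14) / 0.05) ** 2 - ((duration - 1.2) / 0.5) ** 2

beta_grid = [round(0.01 * k, 2) for k in range(1, 41)]         # 0.01 to 0.40 per day
duration_grid = [round(1.0 + 0.1 * k, 1) for k in range(31)]   # 1.0 to 4.0 days

best, value = grid_search(toy_log_likelihood, beta_grid, duration_grid)
print("best (beta, 1/gamma):", best)
```

The same grid of values can be reused directly to draw the contour plot.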
30.
Goodness of fit
Simulate from SEIR model using fitted parameters
and same demographics as outbreak.
If [condition]:
    λ = βSI
    Draw number of new infections, x, from Binomial(S, λ)
    S = S - x
    E = E + x
    Draw symptom onset times from the incubation-time distribution for all new infections.
    t = t + dt
    At end of step:
        Transition E → I those who have infectiousness onset time <= t.
        Transition I → R those who have recovery time <= t.
Else:
    STOP
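A runnable sketch of a single-household simulation in the spirit of these steps. Several details are assumptions, not the original code: the loop continues while anyone is exposed or infectious, the per-susceptible infection probability per step is taken as 1 - exp(-(βI + α)dt), and incubation and infectious durations are drawn from exponential distributions.

```python
import math
import random

def simulate_household(size, beta, alpha, mean_incubation, mean_infectious,
                       dt=0.1, t_max=60.0, rng=None):
    """Return total infections (index case included) in one household."""
    rng = rng or random.Random()
    s, t = size - 1, 0.0
    onset_times = []                                           # people in E
    recovery_times = [rng.expovariate(1.0 / mean_infectious)]  # index case in I
    total_infected = 1
    while (onset_times or recovery_times) and t < t_max:
        i = len(recovery_times)
        p = 1.0 - math.exp(-(beta * i + alpha) * dt)
        x = sum(rng.random() < p for _ in range(s))   # Binomial(S, p) draw
        s -= x
        total_infected += x
        onset_times += [t + rng.expovariate(1.0 / mean_incubation)
                        for _ in range(x)]
        t += dt
        # E -> I for those whose infectiousness onset time has passed
        newly_infectious = [o for o in onset_times if o <= t]
        onset_times = [o for o in onset_times if o > t]
        recovery_times += [o + rng.expovariate(1.0 / mean_infectious)
                           for o in newly_infectious]
        # I -> R for those whose recovery time has passed
        recovery_times = [r for r in recovery_times if r > t]
    return total_infected

rng = random.Random(3)
totals = [simulate_household(4, 0.14, 0.001, 1.5, 1.2, rng=rng) for _ in range(200)]
print("households with secondary cases:", sum(n > 1 for n in totals))
```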
31.
Goodness of fit
Simulate from SEIR model using
fitted parameters and same
demographics as outbreak.
Quantify model performance
based on closeness to outbreak
characteristics
Average number of infections in
households with secondary cases.
Simulated: 1.9, SD = 0.2
Stockholm: 1.6
Average number of households
with no secondary cases.
Simulated: 110.5, SD = 5.5
Stockholm: 104
[Figure panels: # of infections in households w/ 2-ary transmission; # of households with zero secondary cases]
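The two summary statistics can be computed from simulated runs roughly as follows. The input format (one list of secondary-case counts per simulated outbreak) and the example numbers are hypothetical.

```python
import statistics

def outbreak_summary(secondary_cases_per_household):
    """Mean secondary cases among affected households, and the number
    of households with no secondary cases, for one outbreak."""
    with_secondary = [n for n in secondary_cases_per_household if n > 0]
    n_zero = len(secondary_cases_per_household) - len(with_secondary)
    mean_secondary = statistics.mean(with_secondary) if with_secondary else 0.0
    return mean_secondary, n_zero

def summarize_runs(runs):
    """Mean and SD of both statistics across simulated outbreaks."""
    summaries = [outbreak_summary(run) for run in runs]
    means = [m for m, _ in summaries]
    zeros = [z for _, z in summaries]
    return ((statistics.mean(means), statistics.pstdev(means)),
            (statistics.mean(zeros), statistics.pstdev(zeros)))

# Two tiny made-up outbreaks of six households each.
print(summarize_runs([[0, 0, 1, 2, 0, 3], [0, 1, 1, 0, 0, 0]]))
```

Comparing the observed outbreak against the simulated mean and SD gives a quick check on whether the fitted model reproduces both patterns at once.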
32.
Sensitivity Analysis:
Household Sizes
Want to understand the extent to which using sampled
household sizes biases results.
Simulate outbreaks with household sizes drawn from
Swedish census distribution.
Estimate parameters using:
Sampled household sizes
Known sizes from simulation
Compare results.
33.
Results: Sensitivity Analysis
Estimate parameters for outbreak with β = 0.14/day
and 1/γ = 1.2 days
Dashed lines show fit when household sizes are
known, solid lines when they are unknown.
Results almost exactly the same.
34.
Asymptomatic Infections
Problem: Only observed symptomatic infections
Asymptomatics likely don’t contribute much to outbreaks in
households with symptomatic cases, but can be infected during
these outbreaks.
Are very important for seeding new outbreaks:
Stockholm outbreak started by post-symptomatic food-handler
Afternoon Delight outbreak in Ann Arbor
Subway outbreaks in Kent County, MI
Full analysis of asymptomatic infections requires active surveillance
e.g., Stool and environmental samples.
Solution: Estimate parameters for outbreaks with varying levels of
asymptomatic infection using simulated data.
35.
Modeling asymptomatic infection
π is proportion of new infections that are asymptomatic.
Assume asymptomatic infections are non-infectious during household
outbreak.
Sample 20 outbreaks each for combinations of:
β = {0.075, 0.085, …, 0.2}
π = {0, 0.1, …, 0.5}
36.
Modeling asymptomatic infection
If [condition]:
    λ = βSI
    Draw number of new infections, x, from Binomial(S, λ)
    Draw number never symptomatic, a, from Binomial(x, π)
    S = S - x
    E = E + (x - a)
    R = R + a
    Draw symptom onset times from the incubation-time distribution for all new infections entering E.
    t = t + dt
    At end of step:
        Transition E → I those who have infectiousness onset time <= t.
        Transition I → R those who have recovery time <= t.
Else:
    STOP
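The asymptomatic split in a single time step can be sketched as below. Every newly infected person leaves S, so this sketch subtracts all x from S and sends the a never-symptomatic cases straight to R; the per-susceptible probability 1 - exp(-βI dt) is an assumed discretization, and `binomial` is a small helper written here to avoid library dependencies.

```python
import math
import random

def binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def infection_step(s, e, i, r, beta, pi, dt, rng):
    """One step of new infections with a proportion pi never symptomatic."""
    p_infect = 1.0 - math.exp(-beta * i * dt)
    x = binomial(s, p_infect, rng)   # all new infections
    a = binomial(x, pi, rng)         # never symptomatic, non-infectious
    return s - x, e + (x - a), i, r + a

rng = random.Random(7)
state = (5, 0, 1, 0)                 # S, E, I, R in a household of 6
state = infection_step(*state, beta=0.14, pi=0.3, dt=1.0, rng=rng)
print(state)
```

Because asymptomatic cases bypass E and I, they deplete susceptibles without adding to the force of infection, which is why ignoring them biases the estimated β.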
37.
Modeling asymptomatic infection
Estimate parameters using data augmentation method.
Assume π = 0, as when fitting Stockholm data.
Find expected value of β for each π when estimated β = 0.14.
39.
Norovirus outbreaks in realistic
communities
Norovirus has interesting qualitative outbreak dynamics in the
community.
Outbreaks are explosive but typically limited.
Multiple levels of transmission:
Can embed findings about household transmission.
Community rate of transmission is unknown.
Data on community and region-level Norovirus outbreaks are rare.
Take a pattern-oriented approach to building community-level
models of NoV transmission.
Build a model based on observed patterns and data that can
recreate outbreaks with NoV-like characteristics.
40.
Detailed Transmission Model
[Diagram: compartments S, E, IS, IA1, IA2, R; force of infection (βIS × IS) + (βIA × IA)]
NoV transmission is marked by heterogeneous asymptomatic infectious periods.
~5% of the population will shed for 100+ days.
Existing theory predicts that increasing variability in individual infectiousness
makes outbreaks less predictable, but smaller on average.
Want to understand how this heterogeneity impacts outbreak dynamics in the
context of heterogeneous contact structure.
41.
Contact structure
Household sizes:
Assume a representative community, i.e., household sizes are a
random sample from the census distribution of household sizes.
Contacts in the community:
Individuals separated into compartments:
School, work, etc
Social network:
How do we choose a network topology that is useful and informative?
Food handlers:
About 1% of U.S. adults are food handlers
Average norovirus point-source outbreak size is about 40
42.
Empirical contact networks
Many empirical
community contact
networks have an
exponentially
distributed degree.
Moderate
heterogeneity in
contact
43.
Outbreak Realizations
[Figure legend: Household Transmission; Community Transmission; Point Source Event]