
ICPSR - Complex Systems Models in the Social Sciences - 2013 - Professor Daniel Martin Katz (Guest: Jon Zelner - Modeling with Data)


  1. Modeling with Data
     Jon Zelner
     Postdoctoral Fellow, Ecology & Evolutionary Biology, Princeton University
     Postdoctoral Fellow, NIH Fogarty International Center, Research and Policy for Infectious Disease Dynamics (RAPIDD) Program
  2. Data in the Modeling Process
     (Diagram labels: Model; Agents; Environment; Simulated Behavior; Observed Behavior)
  3. Pattern Oriented Modeling (POM)
     - Term coined by Grimm et al. in a 2005 Science paper
     - The modeling process should be guided by patterns of interest
     - Can use patterns at multiple levels:
       - Individual agents
       - Environment
       - Aggregate agent behavior
     - Patterns should be used both to guide model development and to calibrate and validate models.
  4. Types of data for modelers
     - Counts/Proportions:
       - Infections
       - Occupied patches
     - Distributions:
       - Age
       - Lifespan
       - Duration of infection
     - Rates:
       - Birthrates
       - Transmission rate
     - Time Series:
       - Evolution of outbreak in time
       - Timeline of conflict
       - Number of firms over time
     - Qualitative:
       - 'Norovirus-like' outbreaks
       - Size and shape of forest patches
       - Clusters of settlements
  5. Pattern Oriented Modeling
  6. Pattern Oriented Modeling: Kayenta Anasazi (Axtell et al. 2001)
     - Trying to understand population growth and collapse among the Kayenta Anasazi in the U.S. Southwest
     - Many factors in this:
       - Weather
       - Farming
       - Kinship
     - Optimize models by explaining multiple patterns at one time.
  7. Anasazi (cont'd)
  8. Inference for POM
     - Bayesian/Qualitative
       - Use some kind of quality function to score goodness of runs and optimize by minimizing distance between model output and optimum quality and/or data. Examples:
         - Number of occupied patches
         - Size of elephant herds
     - Frequentist/Likelihood-based
       - Define a likelihood function for Data | Model
       - Simulate runs from the model and evaluate likelihood of data as (# runs == Data) / # runs
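The simulation-based likelihood on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's code: `simulate_occupied_patches` is a hypothetical toy model (independent patches, each occupied with a fixed probability), and the observed count, patch count and probability are made-up values.

```python
import random

def simulate_occupied_patches(n_patches, p_occupy, rng):
    """Toy stand-in for one model run: count patches that end up occupied."""
    return sum(rng.random() < p_occupy for _ in range(n_patches))

def simulated_likelihood(observed, n_patches, p_occupy, n_runs=20000, seed=1):
    """Estimate P(Data | Model) as (# runs == Data) / # runs."""
    rng = random.Random(seed)
    matches = sum(
        simulate_occupied_patches(n_patches, p_occupy, rng) == observed
        for _ in range(n_runs)
    )
    return matches / n_runs

# Hypothetical data: 3 of 10 patches observed occupied.
estimate = simulated_likelihood(observed=3, n_patches=10, p_occupy=0.3)
```

For this toy model the exact answer is a binomial probability, so the estimate can be checked against it; for a real agent-based model the simulated fraction is usually all that is available.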
  9. How Infections Propagate After Point-Source Events: An Analysis of Secondary Norovirus Transmission
     Jon Zelner (a, b), Aaron A. King (a, c), Christine Moe (e) and Joseph N.S. Eisenberg (a, d)
     University of Michigan: (a) Center for the Study of Complex Systems, (b) Sociology & Public Policy, (c) Ecology & Evolutionary Biology, (d) School of Public Health
     Emory University: (e) Rollins School of Public Health
  10. Norovirus (NoV) Epidemiology
      - Most common cause of non-bacterial gastroenteritis in the US and worldwide.
        - Est. 90 million cases in 2007
      - Explosive diarrhea & projectile vomiting in symptomatic cases.
      - Single-stranded, non-enveloped RNA virus
        - Member of family Caliciviridae
      - Often transmitted via food:
        - Salad greens
        - Shellfish
      - Most person-to-person transmission is via the environment and fomites.
  11. Why model transmission after point-source events?
      - Typical analysis of point-source events focuses on primary, one-to-many risk:
        - How many cases are created by an infectious food handler?
        - How many people are infected after a water treatment failure?
      - However, the actual size of point-source events is underestimated unless secondary transmission risk is included.
      - Within-household transmission is an important bridge between point-source events.
        - So, even if within-household R0 < 1, household cases have important dynamic consequences at the community level.
      (Diagram labels: H, IS, S, IA)
  12. NoV Transmission Dynamics
      - Norovirus transmission dynamics tend to be locally unstable but globally persistent.
        - E.g., small, explosive outbreaks in Mercer County, but no local NoV epidemic
        - Multiple reported NoV outbreaks throughout New Jersey every week.
      - Stochasticity operates at multiple levels.
        - Disease/Contact
  13. NoV Transmission Dynamics
      (Figure panels: Exponential Growth, Global Invasion (e.g., Pandemic Flu); Short, Explosive & Limited (Typical of NoV outbreaks))
  14. Outbreak Data
      - Gotz et al. (2001) observed 500+ households exposed to NoV after a point-source outbreak in a network of daycare centers in Stockholm, Sweden.
        - Traceable to salad prepared by a food handler who was shedding post-symptoms.
      - Followed 153 of these households
        - Eliminating those with only one person.
        - 49 had secondary cases
        - 104 had no secondary cases
  15. Deterministic SEIR model
      - Infinite population
      - Mass-action mixing
      - Frequency-dependent transmission
      - When I > 0, a fraction of the susceptible population is infected at every instant
      - Constant average rate of recovery
        - Doesn't matter who is infected
      - 'Nano-fox' problem
      $$\frac{dS}{dt} = -\beta S I, \qquad \frac{dE}{dt} = \beta S I - \varepsilon E, \qquad \frac{dI}{dt} = \varepsilon E - \gamma I, \qquad \frac{dR}{dt} = \gamma I$$
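A minimal sketch of how the four SEIR equations behave when integrated numerically, using a plain forward-Euler step and only the standard library. The parameter values and initial fractions are illustrative assumptions, not the fitted household estimates, and `seir_euler` is a hypothetical helper name.

```python
def seir_euler(beta, epsilon, gamma, s0, e0, i0, r0, days, dt=0.01):
    """Integrate dS/dt=-bSI, dE/dt=bSI-eE, dI/dt=eE-gI, dR/dt=gI with forward Euler."""
    s, e, i, r = s0, e0, i0, r0
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i * dt        # flow S -> E
        new_infectious = epsilon * e * dt      # flow E -> I
        new_recovered = gamma * i * dt         # flow I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return s, e, i, r

# Illustrative values only (assumed): state variables are population fractions.
s, e, i, r = seir_euler(beta=1.5, epsilon=1 / 1.5, gamma=1 / 1.2,
                        s0=0.99, e0=0.0, i0=0.01, r0=0.0, days=60)
```

In this sketch `i` shrinks toward zero but never reaches it exactly, which is one way to see the 'nano-fox' problem: the deterministic model keeps a vanishingly small infectious fraction alive where a real finite population would have no cases left.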
  16. Why use a stochastic model?
      - Deterministic models work well when assumptions are plausible, but are less useful when:
        - Populations are small:
          - e.g., Household outbreak
        - Global contact patterns deviate from homogeneous mixing:
          - Social networks
          - Realistic behavior
        - Disease natural history is not memoryless:
          - Recovery period is not exponentially or gamma distributed
          - Lots of variability in individual infectiousness
      (Figure panels: Exponential RV; Lognormal RV)
  17. Progression of NoV Infection
      - Short incubation period (~1.5 days)
      - Typical symptom duration is around 1.5 days.
        - Exceptional cases lasting up to a year have been reported.
      - Most people shed asymptomatically after symptoms resolve:
        - Typically for several days
        - It is not uncommon for shedding to last more than a month, and in some cases a year or more
      - 15-50% of all infections may be totally asymptomatic
  18. Basic NoV Transmission Model for Household Outbreaks
      - SEIR Transmission Model
        - Individuals may be in one of four states:
          - Susceptible
          - Exposed/Incubating
          - Infectious
          - Recovered/Immune
      - Multiple boxes in E & I states correspond to shape parameter of gamma distributed waiting times.
      - Background infection parameter, α. (Fixed to 0.001/day)
      - Although NoV immunity tends to be partial and short-lived, this model is adequate for analyzing short-lived outbreaks.
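The link between "multiple boxes" and a gamma shape parameter can be illustrated with a short simulation: passing through n sequential exponential boxes gives a gamma-distributed total waiting time with shape n. This is a sketch under assumed values (mean of 1.2 days, 1 versus 4 boxes); `boxed_waiting_time` is a hypothetical helper.

```python
import random
import statistics

def boxed_waiting_time(mean_duration, n_boxes, rng):
    """Total time spent in n_boxes sequential boxes, each exponential
    with mean mean_duration / n_boxes (gamma with shape n_boxes overall)."""
    rate_per_box = n_boxes / mean_duration
    return sum(rng.expovariate(rate_per_box) for _ in range(n_boxes))

rng = random.Random(42)
one_box = [boxed_waiting_time(1.2, 1, rng) for _ in range(20000)]
four_boxes = [boxed_waiting_time(1.2, 4, rng) for _ in range(20000)]

mean_one, mean_four = statistics.mean(one_box), statistics.mean(four_boxes)
var_one, var_four = statistics.variance(one_box), statistics.variance(four_boxes)
```

Both settings keep the same mean duration, while adding boxes shrinks the variance, so the number of boxes controls how tightly durations cluster around the mean.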
  19. Analysis Objectives
      - Estimate daily person-to-person rate of infection (β).
        - Result: 0.14 infections per day
      - Estimate average effective duration of infection (1/γ) and shape parameter of gamma-distributed infectiousness duration.
        - Result: 1.2 days; γs = 1
      - Effect of missing household sizes on results.
        - Result: Minimal
      - Effect of asymptomatic infection.
        - Result: 0.035 increase in β for each 10% increase in the proportion of individuals who are asymptomatically infectious
  20. What makes these data challenging to work with?
      - We want to understand:
        - Daily person-to-person rate of infection (β).
        - Average effective duration of infection (1/γ).
        - Variability in 1/γ.
        - Generation of asymptomatic infections.
      - But household data are noisy and only partially observed:
        - We know the time of symptom onset but are missing:
          - Time of infection
          - Time of recovery
          - Firm estimate of asymptomatic ratio & infectiousness
          - Household sizes (!)
      - A strength of these data is that we can treat each household as an independent trial of a random infection process.
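  21. Likelihood Function for Fully Observed Household Outbreaks
      - Force of infection at time t: $\lambda(S_{ij}, I_{ij}, \beta, \alpha) = S_{ij}(\beta I_{ij} + \alpha)$
      - Likelihood of no infections over all infection-free intervals: $\mathcal{L}_{i,a} = \prod_{j=0}^{N_Q - 1} \exp\left(-\lambda(S_{ij}, I_{ij}, \beta, \alpha)\,(t_{j+1} - t_j)\right)$
      - Probability of all infections: $\mathcal{L}_{i,b} = \prod_{k=1}^{N_K} \lambda(S_{ik}, I_{ik}, \beta, \alpha)$
      (Figure legend: x = infection; the markers for symptom onset and recovery were lost in extraction.)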
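  22. Likelihood Function for Fully Observed Household Outbreaks
      - Likelihood of a household observation: $\mathcal{L}_i = \mathcal{L}_{i,a} \times \mathcal{L}_{i,b}$
      - Likelihood of all household observations: $\mathcal{L}_O = \prod_{i \in H} \mathcal{L}_i$
      (Figure legend: x = infection; the markers for symptom onset and recovery were lost in extraction.)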
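  23. Likelihood Function for Fully Observed Household Outbreaks
      - Force of infection at time t: $\lambda(S_{ij}, I_{ij}, \beta, \alpha) = S_{ij}(\beta I_{ij} + \alpha)$
      - Likelihood of no infections over all infection-free intervals: $\mathcal{L}_{i,a} = \prod_{j=0}^{N_Q - 1} \exp\left(-\lambda(S_{ij}, I_{ij}, \beta, \alpha)\,(t_{j+1} - t_j)\right)$
      - Probability of all infections: $\mathcal{L}_{i,b} = \prod_{k=1}^{N_K} \lambda(S_{ik}, I_{ik}, \beta, \alpha)$
      - Likelihood of a household observation: $\mathcal{L}_i = \mathcal{L}_{i,a} \times \mathcal{L}_{i,b}$
      - Likelihood of all household observations: $\mathcal{L}_O = \prod_{i \in H} \mathcal{L}_i$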
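The household likelihood can be sketched on the log scale, where the products become sums. This is a minimal illustration, assuming each household is summarized as a list of infection-free intervals `(S, I, duration)` and a list of `(S, I)` states at each infection time; that data layout, the function names and the example household are all hypothetical.

```python
import math

def force_of_infection(s, i, beta, alpha):
    """lambda(S, I, beta, alpha) = S * (beta * I + alpha)."""
    return s * (beta * i + alpha)

def household_log_likelihood(intervals, infections, beta, alpha):
    """log L_i = log L_{i,a} + log L_{i,b} for one fully observed household."""
    # Part a: no infections during each infection-free interval.
    log_a = -sum(force_of_infection(s, i, beta, alpha) * dt
                 for s, i, dt in intervals)
    # Part b: an infection occurs at each observed infection time.
    log_b = sum(math.log(force_of_infection(s, i, beta, alpha))
                for s, i in infections)
    return log_a + log_b

def total_log_likelihood(households, beta, alpha):
    """Sum of household log-likelihoods (households treated as independent)."""
    return sum(household_log_likelihood(iv, inf, beta, alpha)
               for iv, inf in households)

# Hypothetical household: 3 susceptibles and 1 infectious for 2 days,
# then one infection, then 2 susceptibles and 1 infectious for 1 day.
example = ([(3, 1, 2.0), (2, 1, 1.0)], [(3, 1)])
ll_one = household_log_likelihood(*example, beta=0.14, alpha=0.001)
ll_two = total_log_likelihood([example, example], beta=0.14, alpha=0.001)
```

Working with logs avoids numerical underflow when many small household likelihoods are multiplied together.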
  24. Unobserved Infection States
      (Figure annotation: + 104 households with no secondary cases)
  25. Unobserved Infection States
      - Use data augmentation to generate complete observations.
      - For each symptom onset event (q):
        - Draw incubation time, k, from the incubation time distribution
        - Infection time, a = q - k
          - If you draw any a < 0, the whole sample has likelihood = 0.
        - Draw recovery time, r, from the symptom duration distribution.
          - If r > observation period, w, set r = w (this accounts for right-censoring in the data).
      - Repeat for many (1K+) samples
  26. Unobserved Infection States
      - Evaluate likelihood with respect to β and α for each sample.
      - E(L) is the estimated likelihood of the data.
      (Figure legend: x = infection; the markers for symptom onset and recovery were lost in extraction.)
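The augmentation steps on the two slides above can be sketched as follows. The slides do not say which distributions were used, so this sketch assumes gamma-distributed incubation and symptom durations with means of 1.5 days and an arbitrary shape of 2, and it treats recovery time as onset time plus a drawn duration. All function names are hypothetical.

```python
import random

def augment_household(onset_times, observation_end, rng,
                      incubation_mean=1.5, duration_mean=1.5, shape=2.0):
    """One augmented sample: (infection, onset, recovery) per case,
    or None if any drawn infection time is negative (likelihood 0)."""
    sample = []
    for q in onset_times:
        k = rng.gammavariate(shape, incubation_mean / shape)  # incubation time
        a = q - k                                             # infection time
        if a < 0:
            return None
        r = q + rng.gammavariate(shape, duration_mean / shape)
        r = min(r, observation_end)                           # right-censoring
        sample.append((a, q, r))
    return sample

def augmented_samples(onset_times, observation_end, n_samples=1000, seed=7):
    """Draw n_samples augmentations; return the valid ones and the total drawn."""
    rng = random.Random(seed)
    draws = [augment_household(onset_times, observation_end, rng)
             for _ in range(n_samples)]
    return [d for d in draws if d is not None], len(draws)

def expected_likelihood(likelihoods_of_valid, n_total):
    """E(L): average over all samples, counting rejected samples as 0."""
    return sum(likelihoods_of_valid) / n_total

# Hypothetical household with symptom onsets on days 2.0 and 4.5, observed 10 days.
valid, total = augmented_samples([2.0, 4.5], observation_end=10.0)
```

Each valid sample would then be scored with the fully observed likelihood, and `expected_likelihood` averages those scores over every draw, including the rejected ones.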
  27. Unobserved Household Sizes
      - Sizes of households in the Stockholm outbreak are unknown.
      - Expected number of cases is:
        - S(βI + α)Δt
        - Missing S!
      - Solution:
        - Assume exposed households are sampled at random from the whole population.
        - For each augmented household time series, sample household size from the Swedish census distribution.
        - Save samples by setting a lower bound:
          - The likelihood of an outbreak in a household with fewer individuals than observed infections is 0, so don't sample these sizes.
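A short sketch of sampling household sizes with the lower bound described above. The size weights below are a HYPOTHETICAL placeholder chosen for illustration; they are not the Swedish census distribution, which the slides do not reproduce.

```python
import random

# HYPOTHETICAL size distribution for illustration only; not census data.
# Sizes start at 2 because single-person households were excluded.
EXAMPLE_SIZE_WEIGHTS = {2: 0.40, 3: 0.25, 4: 0.25, 5: 0.10}

def sample_household_size(n_observed_infections, rng,
                          size_weights=EXAMPLE_SIZE_WEIGHTS):
    """Sample a household size, skipping sizes too small to hold the
    observed infections (those would have likelihood 0)."""
    feasible = {size: w for size, w in size_weights.items()
                if size >= n_observed_infections}
    if not feasible:
        raise ValueError("no household size can hold the observed infections")
    sizes = list(feasible)
    weights = [feasible[size] for size in sizes]
    return rng.choices(sizes, weights=weights, k=1)[0]

rng = random.Random(0)
draws = [sample_household_size(3, rng) for _ in range(500)]
```

Restricting the draw to feasible sizes avoids spending samples on configurations that contribute nothing to the estimated likelihood.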
  28. Results: MLE Parameter Values and 95% Confidence Intervals
      - 1/γ limited to values >= 1 day; an infectiousness duration < 1 day is not plausible
  29. Results: Likelihood Surface
      - Contour plot shows likelihood for combinations of β and 1/γ for γs = 1.
      - Triangle is the location of the MLE; the dashed oval marks the 95% confidence bounds.
      - The parameter space isn't very large, so optimize using brute force.
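Brute-force optimization over a small parameter space amounts to evaluating the likelihood at every grid point and keeping the best one. The sketch below uses a toy stand-in surface that is peaked at (0.14, 1.2) by construction; it is not the fitted likelihood, and the grid ranges are assumptions (with 1/γ kept at or above 1 day, as on the previous slide).

```python
def brute_force_mle(log_likelihood, beta_grid, duration_grid):
    """Evaluate every (beta, 1/gamma) pair and return (value, beta, duration)
    for the pair with the highest log-likelihood."""
    best = None
    for beta in beta_grid:
        for duration in duration_grid:
            value = log_likelihood(beta, duration)
            if best is None or value > best[0]:
                best = (value, beta, duration)
    return best

def toy_log_likelihood(beta, duration):
    """Toy stand-in surface, peaked at (0.14, 1.2) by construction."""
    return -((beta - 0.14) ** 2) - ((duration - 1.2) ** 2)

beta_grid = [k / 100 for k in range(5, 31)]      # 0.05 to 0.30 per day
duration_grid = [k / 10 for k in range(10, 31)]  # 1.0 to 3.0 days
best_value, best_beta, best_duration = brute_force_mle(
    toy_log_likelihood, beta_grid, duration_grid)
```

The same grid of values is what a contour plot of the likelihood surface would be drawn from.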
  30. Goodness of fit
      - Simulate from SEIR model using fitted parameters and same demographics as outbreak.
      - Simulation step (the loop condition and the onset-time distribution were lost in extraction):
        If [condition]:
          λ = βSI
          Draw number of new infections, x, from Binomial(S, λ)
          S = S - x
          E = E + x
          Draw symptom onset times for all new infections.
          t = t + dt
          At end of step:
            Transition E → I those who have infectiousness onset time <= t.
            Transition I → R those who have recovery time <= t.
        Else: STOP
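A runnable sketch of a household simulation in the spirit of the steps above. Several pieces are assumptions rather than the speaker's method: the loop continues while anyone is exposed or infectious, the per-susceptible infection probability over a step is taken as 1 - exp(-(βI + α)dt), incubation and infectious durations are exponential, and every household has 4 members. `simulate_household` is a hypothetical name.

```python
import math
import random

def simulate_household(size, beta, alpha, rng, incubation_mean=1.5,
                       infectious_mean=1.2, dt=0.1, max_days=30.0):
    """Return the number of secondary infections in one simulated household
    that starts with a single infectious primary case."""
    s = size - 1
    exposed = []  # (infectiousness onset time, recovery time) per exposed person
    infectious = [rng.expovariate(1.0 / infectious_mean)]  # recovery times
    secondary = 0
    t = 0.0
    while (exposed or infectious) and t < max_days:
        # Assumed conversion of the per-susceptible hazard to a probability.
        p = 1.0 - math.exp(-(beta * len(infectious) + alpha) * dt)
        x = sum(rng.random() < p for _ in range(s))  # Binomial(S, p) draw
        s -= x
        secondary += x
        for _ in range(x):
            onset = t + rng.expovariate(1.0 / incubation_mean)
            exposed.append((onset, onset + rng.expovariate(1.0 / infectious_mean)))
        t += dt
        # End of step: I -> R, then E -> I for those whose onset has passed.
        infectious = [rec for rec in infectious if rec > t]
        infectious += [rec for onset, rec in exposed if onset <= t and rec > t]
        exposed = [(onset, rec) for onset, rec in exposed if onset > t]
    return secondary

rng = random.Random(3)
results = [simulate_household(4, 0.14, 0.001, rng) for _ in range(153)]
n_zero = sum(r == 0 for r in results)  # households with no secondary cases
```

Repeating the 153-household run many times would give a simulated mean and spread for summary statistics such as `n_zero`, which can then be compared with the observed outbreak.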
  31. Goodness of fit
      - Simulate from SEIR model using fitted parameters and same demographics as outbreak.
      - Quantify model performance based on closeness to outbreak characteristics
        - Average number of infections in households with secondary cases.
          - Simulated: 1.9, SD = 0.2
          - Stockholm: 1.6
        - Average number of households with no secondary cases.
          - Simulated: 110.5, SD = 5.5
          - Stockholm: 104
      (Figure panels: # of infections in households with secondary transmission; # of households with zero secondary cases)
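One simple way to read the numbers on this slide is to express each observed value as a distance from the simulated mean in units of the simulated SD. This standardized gap is an added illustration, not a statistic reported in the slides; `standardized_gap` is a hypothetical helper.

```python
def standardized_gap(observed, simulated_mean, simulated_sd):
    """Distance of the observed value from the simulated mean, in SD units."""
    return (observed - simulated_mean) / simulated_sd

# Values taken from the slide above.
gap_infections = standardized_gap(1.6, 1.9, 0.2)
gap_zero_households = standardized_gap(104, 110.5, 5.5)
```

Both observed values fall within about 1.5 simulated SDs of the simulated means (roughly -1.5 and -1.2).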
  32. Sensitivity Analysis: Household Sizes
      - Want to understand the extent to which using sampled household sizes biases results.
      - Simulate outbreaks with household sizes drawn from the Swedish census distribution.
      - Estimate parameters using:
        - Sampled household sizes
        - Known sizes from simulation
      - Compare results.
  33. Results: Sensitivity Analysis
      - Estimate parameters for an outbreak with β = 0.14/day and 1/γ = 1.2 days
      - Dashed lines show the fit when household sizes are known; solid lines show the fit when they are unknown.
      - Results are almost exactly the same.
  34. Asymptomatic Infections
      - Problem: Only observed symptomatic infections
      - Asymptomatics likely don't contribute much to outbreaks in households with symptomatic cases, but can be infected during these outbreaks.
      - Are very important for seeding new outbreaks:
        - Stockholm outbreak started by post-symptomatic food handler
        - Afternoon Delight outbreak in Ann Arbor
        - Subway outbreaks in Kent County, MI
      - Full analysis of asymptomatic infections requires active surveillance
        - e.g., Stool and environmental samples.
      - Solution: Estimate parameters for outbreaks with varying levels of asymptomatic infection using simulated data.
  35. Modeling asymptomatic infection
      - π is the proportion of new infections that are asymptomatic.
      - Assume asymptomatic infections are non-infectious during household outbreak.
      - Sample 20 outbreaks each for combinations of:
        - β = {0.075, 0.085, …, 0.2}
        - π = {0, 0.1, …, 0.5}
  36. Modeling asymptomatic infection
      - Simulation step (the loop condition and the onset-time distribution were lost in extraction):
        If [condition]:
          λ = βSI
          Draw number of new infections, x, from Binomial(S, λ)
          Draw number never symptomatic, a, from Binomial(x, π)
          S = S - (x - a)
          E = E + (x - a)
          R = R + a
          Draw symptom onset times for all new infections.
          t = t + dt
          At end of step:
            Transition E → I those who have infectiousness onset time <= t.
            Transition I → R those who have recovery time <= t.
        Else: STOP
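The two binomial draws in the step above can be sketched as a single function. One deliberate deviation, flagged as an assumption: this sketch removes all x new infections from S (rather than x - a as written on the slide) so that the household total S + E + R is conserved. The probability `p_infect` is passed in directly, and `infection_step` is a hypothetical name.

```python
import random

def infection_step(s, e, r, p_infect, pi_asymptomatic, rng):
    """One infection step: draw new infections, split off the never-symptomatic
    ones, and update the S, E and R counts."""
    x = sum(rng.random() < p_infect for _ in range(s))         # Binomial(S, p)
    a = sum(rng.random() < pi_asymptomatic for _ in range(x))  # Binomial(x, pi)
    s -= x      # ASSUMPTION: all new infections leave S (slide has x - a)
    e += x - a  # symptomatic infections start incubating
    r += a      # asymptomatic infections treated as non-infectious, moved to R
    return s, e, r, x, a

rng = random.Random(5)
s1, e1, r1, x1, a1 = infection_step(50, 0, 0, 0.3, 0.2, rng)
_, _, _, x_none, a_none = infection_step(50, 0, 0, 0.3, 0.0, rng)
_, _, _, x_all, a_all = infection_step(50, 0, 0, 0.3, 1.0, rng)
```

Setting π to 0 recovers the step without asymptomatic infection, while π = 1 sends every new infection straight to R.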
  37. Modeling asymptomatic infection
      - π is the proportion of new infections that are asymptomatic.
      - Assume asymptomatic infections are non-infectious during household outbreak.
      - Sample 20 outbreaks each for combinations of:
        - β = {0.075, 0.085, …, 0.2}
        - π = {0, 0.1, …, 0.5}
      - Estimate parameters using the data augmentation method.
        - Assume π = 0, as when fitting the Stockholm data.
      - Find the expected value of β for each π when the estimated β = 0.14.
  38. Modeling asymptomatic infection
  39. Norovirus outbreaks in realistic communities
      - Norovirus has interesting qualitative outbreak dynamics in the community.
        - Outbreaks are explosive but typically limited.
      - Multiple levels of transmission:
        - Can embed findings about household transmission.
        - Community rate of transmission is unknown.
      - Data on community and region-level Norovirus outbreaks are rare.
      - Take a pattern-oriented approach to building community-level models of NoV transmission.
        - Build a model based on observed patterns and data that can recreate outbreaks with NoV-like characteristics.
  40. Detailed Transmission Model
      (Diagram compartments: S, E, IS, IA1, IA2, R; force of infection: (βIS*IS) + (βIA*IA))
      - NoV transmission is marked by heterogeneous asymptomatic infectious periods.
      - ~5% of the population will shed for 100+ days.
      - Existing theory predicts that increasing variability in individual infectiousness makes outbreaks less predictable, but smaller on average.
      - Want to understand how this heterogeneity impacts outbreak dynamics in the context of heterogeneous contact structure.
  41. Contact structure
      - Household sizes:
        - Assume a representative community, i.e., household sizes are a random sample from the census distribution of household sizes.
      - Contacts in the community:
        - Individuals separated into compartments:
          - School, work, etc.
        - Social network:
          - How do we choose a network topology that is useful and informative?
      - Food handlers:
        - About 1% of U.S. adults are food handlers
        - Average norovirus point-source outbreak size is about 40
  42. Empirical contact networks
      - Many empirical community contact networks have an exponentially distributed degree.
      - Moderate heterogeneity in contact
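A small sketch of what an exponentially distributed degree looks like when sampled. The mean degree of 5 is an assumed value, not one given in the slides, and degrees are obtained by rounding exponential draws down to whole numbers, which pulls the sample mean slightly below the nominal mean.

```python
import random

def sample_degrees(n_people, mean_degree, rng):
    """Draw a whole-number contact count per person from an exponential
    distribution (rounded down)."""
    return [int(rng.expovariate(1.0 / mean_degree)) for _ in range(n_people)]

rng = random.Random(11)
degrees = sample_degrees(20000, 5.0, rng)  # mean degree of 5 is an assumption
mean_observed = sum(degrees) / len(degrees)
share_above_3x_mean = sum(d > 15 for d in degrees) / len(degrees)
```

Most people end up with a handful of contacts and only a small share exceed three times the mean, which is the sense in which heterogeneity is moderate rather than heavy-tailed.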
  43. Outbreak Realizations
      (Figure legend, markers lost in extraction: Household Transmission; Community Transmission; Point Source Event)
