Variability (noise) caused by random fluctuation rather than true differences among individuals is an intrinsic feature of the biomedical world. Time series data from patients (in clinical science) or counts of infections (in epidemics) can vary because of both intrinsic differences and incidental fluctuations. Traditional methods for fitting ODEs to real data sets ascribe any deviation from the trend to either measurement error or parametric heterogeneity. Thus, noise can be wrongly classified as differences among individuals, leading to potentially erroneous predictions and misguided policies or research programs. We study the ability of model fitting, under different hypotheses (fixed or random effects), to capture individual differences in the underlying data. We explore a simple (exactly solvable) example displaying initial exponential growth, comparing state-of-the-art stochastic fitting with traditional least squares approximations. We discuss the implications of these results for the interpretation of biological data, using the 2014-2015 Ebola epidemic in Africa as an example.
Individual Variability vs Stochastic Variability in Data Modeling
1. Individual variability or just variability?
Ruy M. Ribeiro
Faculdade de Medicina da Universidade de Lisboa & Los Alamos National Laboratory
2. Individual variability or just variability?
Ruy M. Ribeiro
Ethan Romero-Severson, LANL
Mario Castro, UP Comillas, Madrid
3.
4. Typical data sets
• Panel data
• Repeated measures
• Hierarchical structure
• Covariates
5. HIV RNA decay under therapy
Time (days)
Cardozo et al. PLoS Path 2016
6. Data fitting
• Biological model
– Viral dynamics with treatment
• Non-linear mixed effects
– Population-based data fitting
• Inter-individual variability model
– μ_i = θ·exp(b_i) and b_i ~ N(0, Ω)
• Error model
– ε_i ~ N(0, σ²)
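The three layers listed on this slide can be sketched in a few lines of Python. This is only an illustration of how the layers stack, not the fitting procedure used in the talk. The exponential decay standing in for the biological model, all parameter values, and the names `simulate_panel`, `theta`, `omega_sd`, `sigma` and `log10_v0` are assumptions made for the example, and `omega_sd` is the standard deviation of a single random effect rather than a full covariance matrix.

```python
import math
import random

def simulate_panel(n_subjects=20, times=(0, 2, 4, 7, 14), theta=0.5,
                   omega_sd=0.3, sigma=0.2, log10_v0=5.0, seed=1):
    """Simulate log10 viral loads from a toy mixed-effects model.

    Inter-individual model: mu_i = theta * exp(b_i), b_i ~ N(0, omega_sd^2)
    Biological model (placeholder): V(t) = V0 * exp(-mu_i * t)
    Error model: additive N(0, sigma^2) noise on log10 V
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n_subjects):
        b_i = rng.gauss(0.0, omega_sd)      # random effect for this subject
        mu_i = theta * math.exp(b_i)        # individual decay rate, always > 0
        obs = [log10_v0 - mu_i * t / math.log(10) + rng.gauss(0.0, sigma)
               for t in times]              # noisy log10 viral load at each time
        panel.append((mu_i, obs))
    return panel

panel = simulate_panel()
```

Each entry of `panel` pairs a subject's true decay rate with its noisy observations. A mixed-effects fit would try to recover `theta`, `omega_sd` and `sigma` from the observations alone.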
12. What about stochastic variability?
• We estimate variance of the error
• We estimate variance of parameter distribution
• What about process variability?
– Each time series is a realization of a biological
process that (presumably) is intrinsically stochastic
16. Multiple models to fit the same data
• Data generation/simulation: stochastic birth process, with either no parametric variance or parametric variance
• Data fitting (four models applied to each simulated data set)
– Deterministic, no random effect
– Stochastic, no random effect
– Deterministic, random effect
– Stochastic, random effect
17. Fitting methodology
• Simulation based
– pomp in R: Partial observed Markov process
• Iterated filtering algorithm
• Maximum likelihood
• Profile likelihood
• x_{j,k} ~ Poisson(x_{j,k} | α x_{j,k−1}) (stochastic) or x_{j,k} ~ α x_{j,k−1} (deterministic)
King et al. (2016) J. Stat. Softw. 69, 1–43. Romero-Severson et al. (2015). Am. J. Epidemiol. 182, 255–262
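To make the two process models on this slide concrete, here is a minimal Python sketch of one trajectory under each update rule. It only simulates the process. It is not the pomp iterated filtering workflow, and the function names, the value of α, the starting count and the number of steps are all assumptions chosen for illustration.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def step_stochastic(rng, x_prev, alpha):
    # Poisson(alpha * x_prev) drawn as a sum of x_prev independent Poisson(alpha) variates
    return sum(poisson(rng, alpha) for _ in range(x_prev))

def step_deterministic(x_prev, alpha):
    # Deterministic counterpart: the count is multiplied by alpha at every step
    return alpha * x_prev

def trajectory(alpha=1.2, x0=5, n_steps=15, seed=None):
    rng = random.Random(seed)
    stoch, det = [x0], [float(x0)]
    for _ in range(n_steps):
        stoch.append(step_stochastic(rng, stoch[-1], alpha))
        det.append(step_deterministic(det[-1], alpha))
    return stoch, det

stoch, det = trajectory(seed=42)
```

Running `trajectory` with several seeds gives one identical deterministic curve and a different stochastic curve each time, even though α is the same in every run.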
31. Conclusions
1. Heterogeneity between units has implications for data
modeling and its interpretations (precision medicine)
2. Stochastic heterogeneity accounts for a large fraction
of total heterogeneity
3. Deterministic models introduce bias by forcing all
heterogeneity between units to be accounted for by
parametric heterogeneity
4. Is there a way to estimate the relative importance of
stochastic vs. parametric heterogeneity?
34. What about stochastic variability?
• The question we were interested in was “Is
the estimated distribution of parameters real
or induced by the model, which does not
account for potential stochastic variability?”
Editor's Notes
This should be my title slide. And I thank both Ethan and Mario, not only for the work I will present, but also for contributions to the presentation.
What I want to talk about today is something that I think is very relevant for precision or individualized medicine. These are also the typical data sets I work with. They are not big data, but I hope that they are still informative data.
This is a recent example. Describe.
Panel data. Repeated measures. Hierarchical (within individual). Treatment type as covariate.
Non-linear mixed effects models. We have three levels of models:
The biological model...
The error model, which I will assume is multiplicative (that is, additive on the logarithms) and constant (that is, homogeneous variance).
The inter-individual variability model, which includes the random effects.
Where we can include therapy
Specifically...
Vij is the viral load at time i for subject j, and f is the biological model. We assume additive error (on the logarithms), and the parameters may depend on covariates xj, with random effect bj and covariance Omega.
Multivariate normal distribution
The distribution of the parameters...
And we can even get the individual parameter estimates from the random effects (posterior mean).
All this looks good. But I have a nagging question.
This model, though apparently simple, is still too complicated: too many parameters and variables. So we started with something really simple.
Fifty trajectories
Growth rate: alpha=0.1
Left: no parametric variability, sigalpha=0. But is this a log-normal distribution for alpha, a gamma distribution, or something else? This is from Mario's data sets.
Right: sigalpha=0.01 (Note: prob(x<0 | mu=0.1, sig=0.01) < 10^(-23))
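The setup described in these notes (fifty trajectories, mean growth rate 0.1, with or without parametric variability) can be sketched as follows. This is a minimal illustration under stated assumptions: each trajectory starts from a single individual, the growth rate is drawn from a normal distribution, and the end time of 50 is arbitrary. None of these are confirmed by the notes, and the function names are hypothetical. The last line checks the tail probability quoted in the note.

```python
import math
import random

def birth_process(alpha, t_end, n0=1, rng=random):
    """Population size at t_end for a pure birth process with per-capita rate alpha."""
    if alpha <= 0:
        return n0                            # guard: no births for a non-positive rate
    n, t = n0, 0.0
    while True:
        t += rng.expovariate(alpha * n)      # waiting time until the next birth
        if t > t_end:
            return n
        n += 1

def simulate(n_traj=50, mu_alpha=0.1, sig_alpha=0.0, t_end=50.0, seed=0):
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_traj):
        # One growth rate per trajectory; identical for all if sig_alpha == 0
        alpha = rng.gauss(mu_alpha, sig_alpha) if sig_alpha > 0 else mu_alpha
        sizes.append(birth_process(alpha, t_end, rng=rng))
    return sizes

sizes_fixed = simulate(sig_alpha=0.0)        # stochastic variability only
sizes_random = simulate(sig_alpha=0.01)      # stochastic plus parametric variability

# Normal tail probability P(alpha < 0) for mu=0.1, sigma=0.01, i.e. z = -10
p_negative = 0.5 * math.erfc(10 / math.sqrt(2))
```

Even with `sig_alpha=0.0` the final sizes differ widely between trajectories, which is the stochastic variability the talk is concerned with.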
Fit of the data in the previous slide.
Black line is for sigma=0 and red line for sigma=0.01, in both cases mu=0.1. Fit with glmer from package lme4, with family = poisson(link = "log").
Stochastic birth data with mu=0.15 (top panel), and sigma=0.02 (bottom panel)
Red squares – standard mixed effect model with Poisson log-link
To make it even simpler, we assume that we have the exact model, so there is no error, as is the case in the preceding simulations – although this is relaxed later with a model for the observations.
Geometric distribution for the birth process with growth rate given by the normal distribution
Expansion for small sigma and t <1/sigma
Dashed line is empirical variance from 50 (only) trajectories, simulated from a model similar to previous slides. Brown is the variance from the formula above and red is stochastic variance vs. blue parametric variance.
The (second) approximation is valid when exp(alpha t)>>1.
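The formula these notes refer to is not reproduced in the transcript. One way to write a decomposition consistent with the description is sketched below. It assumes a single initial individual, so that the population size is geometric given the growth rate, and a normally distributed growth rate α ~ N(μ, σ²), and it applies the law of total variance. It may differ in detail from the expression on the slide.

```latex
% Conditional on \alpha, N(t) is geometric with success probability e^{-\alpha t}:
\mathbb{E}[N(t)\mid\alpha] = e^{\alpha t}, \qquad
\operatorname{Var}[N(t)\mid\alpha] = e^{2\alpha t} - e^{\alpha t}.

% With \alpha \sim \mathcal{N}(\mu,\sigma^2):
% \mathbb{E}[e^{k\alpha t}] = e^{k\mu t + k^2\sigma^2 t^2/2}.
\operatorname{Var}[N(t)]
  = \underbrace{\mathbb{E}\big[\operatorname{Var}(N\mid\alpha)\big]}_{\text{stochastic}}
  + \underbrace{\operatorname{Var}\big(\mathbb{E}[N\mid\alpha]\big)}_{\text{parametric}}

\text{stochastic} = e^{2\mu t + 2\sigma^2 t^2} - e^{\mu t + \sigma^2 t^2/2},
\qquad
\text{parametric} = e^{2\mu t + 2\sigma^2 t^2} - e^{2\mu t + \sigma^2 t^2}.

% Small \sigma t:  parametric \approx \sigma^2 t^2 \, e^{2\mu t}.
% e^{\mu t} \gg 1: stochastic \approx e^{2\mu t + 2\sigma^2 t^2}.
```

Under these assumptions the parametric term vanishes when σ = 0, so all the spread between trajectories is then stochastic.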
Plot assuming muA=0.1.
For fixed average growth rates, lower heterogeneity takes longer to become “apparent”.
Effect of mean growth rates on R^2 is very small and only present when t is very small (not shown)
There is no heuristic for how big an infected population must become before R^2 is guaranteed to be large
Red squares correspond to R^2 determined from linear mixed effects models.
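The behaviour described in these notes can be explored numerically with a short sketch. It computes the share of total variance due to parametric heterogeneity for a birth process started from one individual with a normally distributed growth rate. Both of those are assumptions, and the transcript does not establish that this share is the R^2 plotted in the talk. The function names and the grid of values are illustrative.

```python
import math

def variance_components(mu, sigma, t):
    """Stochastic and parametric variance of N(t), assuming N(0)=1 and alpha ~ N(mu, sigma^2)."""
    m1 = math.exp(mu * t + 0.5 * (sigma * t) ** 2)      # E[exp(alpha * t)]
    m2 = math.exp(2 * mu * t + 2 * (sigma * t) ** 2)    # E[exp(2 * alpha * t)]
    stochastic = m2 - m1           # E[Var(N | alpha)]
    parametric = m2 - m1 ** 2      # Var(E[N | alpha])
    return stochastic, parametric

def parametric_fraction(mu, sigma, t):
    """Share of total variance due to parametric heterogeneity (requires t > 0)."""
    stochastic, parametric = variance_components(mu, sigma, t)
    return parametric / (stochastic + parametric)

# Mean growth rate fixed at 0.1, as in the plot described in the notes
for sigma in (0.005, 0.01, 0.02):
    row = [round(parametric_fraction(0.1, sigma, t), 3) for t in (25, 50, 100)]
    print(sigma, row)
```

In this sketch, a smaller `sigma` needs a longer time before the parametric share becomes visible, and changing `mu` has almost no effect once exp(mu * t) is large.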
County level data from 2014-2015 Ebola outbreak. Align and trim data such that we are only addressing approximately exponential growth
Fit stochastic and deterministic growth models to trimmed data
Negative Binomial error model has a free variance term that is fit to the data
Rather than assuming Poisson, we allow for overdispersion using an NB
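To illustrate what the free variance term buys, the sketch below draws negative binomial counts as a gamma-Poisson mixture and compares their variance with the Poisson case. The parameterization (mean plus a dispersion parameter `size`, with variance mean + mean²/size) is a common one but is an assumption here, since the notes do not say which parameterization was used. The function names and values are illustrative.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def neg_binomial(rng, mean, size):
    """Negative binomial draw as a Poisson whose rate is gamma distributed."""
    lam = rng.gammavariate(size, mean / size)   # shape=size, scale=mean/size, so E[lam]=mean
    return poisson(rng, lam)

def nb_variance(mean, size):
    # Poisson variance (mean) plus the overdispersion term mean^2 / size
    return mean + mean ** 2 / size

rng = random.Random(7)
counts = [neg_binomial(rng, 10.0, 2.0) for _ in range(5000)]
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)
```

As `size` grows the overdispersion term shrinks and the distribution approaches a Poisson with the same mean, so the Poisson model is a limiting case of the NB model.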
We used a generalised linear mixed effects model (GLMM) comprising both fixed and random effects, which explicitly allows for clustering in the data [37]. Such a hierarchical model allows the mean values to vary between the different countries, but borrows information across the districts within a country. The weekly number of new infections c_ijk in district i in country j at time-point t_ijk was assumed to follow a Poisson distribution with a mean λ_ijk and was modelled with the logarithm as the link function.
I am really interested in discussing these issues with anyone who has relevant ideas…
Compare theoretical distribution in black with the estimated by mixed effect model in red at time t=50.
Compare theoretical distribution in black with the estimated by mixed effect model in red at time t=100.