Individual Variability vs Stochastic Variability in Data Modeling

The Statistical and Applied Mathematical Sciences Institute

Variability (noise) caused by random variation, rather than by true differences among individuals, is an intrinsic feature of the biomedical world. Time series data from patients (in clinical science) or counts of infections (in epidemics) can vary because of both intrinsic differences and incidental fluctuations. Traditional methods for fitting ODEs to real data sets ascribe any deviation from the trend either to error or to parametric heterogeneity. Noise can therefore be wrongly classified as differences among individuals, leading to potentially erroneous predictions and misguided policies or research programs. We study the ability of model fitting, under different hypotheses (fixed or random effects), to capture individual differences in the underlying data. We explore a simple, exactly solvable example with initial exponential growth, comparing state-of-the-art stochastic fitting with traditional least-squares approximations. I discuss the implications of these results for the interpretation of biological data, using the 2014-2015 Ebola epidemic in Africa as an example.

Individual variability or just
variability?
Ruy M. Ribeiro
Faculdade de Medicina da Universidade de
Lisboa & Los Alamos National Laboratory

Individual variability or just
variability?
Ruy M. Ribeiro
Ethan Romero-Severson, LANL
Mario Castro, UP Comillas, Madrid

Typical data sets
• Panel data
• Repeated measures
• Hierarchical structure
• Covariates

HIV RNA decay under therapy
[Figure: HIV RNA decay; x-axis: time (days)]
Cardozo et al. PLoS Path 2016

Data fitting
• Biological model
– Viral dynamics with treatment
• Non-linear mixed effects
– Population-based data fitting
• Inter-individual variability model
– $\mu_i = \theta e^{b_i}$ and $b_i \sim N(0, \Omega)$
• Error model
– $\varepsilon_i \sim N(0, \sigma^2)$

Fitting the data
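$$\log_{10}(V_{ij}) = f(t_{ij}, \vec{\theta}_j) + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)$$
$$\vec{\theta}_j = \vec{\theta}\, e^{\vec{\beta}^{T} \vec{x}_j + \vec{b}_j}, \qquad \vec{b} \sim N(0, \Omega)$$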
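To make the three levels of the model concrete, the following is a minimal simulation sketch, not the model fitted in the talk. It assumes a hypothetical structural model f (simple exponential decay of viral load), omits covariates, and uses made-up values for theta, omega, sigma and the initial load; only the log-normal random effect and the additive error on the log10 scale follow the slide.

```python
import math
import random

def simulate_panel(n_subjects=5, times=(0, 2, 4, 7, 14), theta=0.5,
                   omega=0.3, sigma=0.1, log10_v0=5.0, seed=1):
    """Simulate log10 viral loads under a toy mixed-effects decay model.

    Hypothetical structural model (an assumption for illustration):
        V(t) = V0 * exp(-theta_j * t)
    so that f(t, theta_j) = log10_v0 - theta_j * t / ln(10).
    Returns a list of (subject, time, theta_j, log10_V) tuples.
    """
    rng = random.Random(seed)
    data = []
    for j in range(n_subjects):
        b_j = rng.gauss(0.0, omega)        # random effect, b_j ~ N(0, omega^2)
        theta_j = theta * math.exp(b_j)    # log-normal individual parameter
        for t in times:
            e_ij = rng.gauss(0.0, sigma)   # additive error on the log10 scale
            y = log10_v0 - theta_j * t / math.log(10) + e_ij
            data.append((j, t, theta_j, y))
    return data
```

Calling `simulate_panel()` yields one row per subject and time point. Setting `omega=0` removes inter-individual variability and setting `sigma=0` removes observation error, which makes the separate roles of the two variance terms easy to see.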

Fits of the model
[Figure: model fits; x-axis: time (days); panels: Quad Tx (no RAL), RAL combination]

What about stochastic variability?
• We estimate variance of the error
• We estimate variance of parameter distribution
• What about process variability?
– Each time series is a realization of a biological
process that (presumably) is intrinsically stochastic

Exponential growth - birth process
[Figure: simulated trajectories; left panel: $\mu = 0.1$, $\sigma = 0$; right panel: $\mu = 0.1$, $\sigma = 0.01$]
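A stochastic birth process of this kind can be sketched with a Gillespie-style simulation. The code below is an illustration under stated assumptions, not the code used for the talk: each trajectory starts from a single individual, the growth rate is drawn once per trajectory from a normal distribution with mean `mu` and standard deviation `sigma`, and the function names are hypothetical.

```python
import random

def birth_process(alpha, t_end, n0=1, rng=random):
    """Gillespie simulation of a pure birth (Yule) process.

    Each of the n individuals gives birth at rate alpha, so the waiting
    time to the next birth is exponential with rate alpha * n.
    Returns the population size at time t_end.
    """
    n, t = n0, 0.0
    if alpha <= 0:
        return n                      # no growth for non-positive rates
    while True:
        t += rng.expovariate(alpha * n)
        if t > t_end:
            return n
        n += 1

def simulate_trajectories(mu=0.1, sigma=0.0, t_end=50.0, n_traj=50, seed=0):
    """Final sizes of n_traj trajectories, one growth rate per trajectory."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_traj):
        alpha = rng.gauss(mu, sigma)  # sigma = 0 gives alpha = mu exactly
        sizes.append(birth_process(alpha, t_end, rng=rng))
    return sizes
```

Even with `sigma=0` the final sizes differ between trajectories, because the birth times themselves are random. That spread is the process variability that a deterministic fit has no way to represent.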

Mixed-effect fit
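[Figure: distribution of the growth rate from the mixed-effect fit; panels: t = 50, t = 100; x-axis: growth rate; curves: $\sigma = 0$, $\sigma = 0.01$]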

Multiple models to fit the same data
DATA GENERATION/SIMULATION: stochastic birth process
• No parametric variance
• Parametric variance
DATA FITTING (each model fitted to each simulated data set)
• Deterministic, no random effect
• Stochastic, no random effect
• Deterministic, random effect
• Stochastic, random effect

Estimates
[Figure: estimated MEAN and ST DEV; panels: data with no parametric variability, data with parametric variability]

Variance of the birth process
• Infected people, I, generate new infections at rate $\alpha$
• $\alpha \sim N(\mu_A, \sigma_A)$
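The split between stochastic and parametric variance can be worked out with the law of total variance. The sketch below rests on assumptions not stated on the slide: the process starts from one infected individual, so that for a fixed rate the population size is geometric with mean $e^{\alpha t}$ and variance $e^{\alpha t}(e^{\alpha t} - 1)$, and $\sigma_A$ is read as a standard deviation. Whether the resulting share of parametric variance matches the $R^2$ plotted in the talk is also an assumption.

```python
import math

def variance_components(mu, sigma, t):
    """Return (stochastic, parametric) variance of N(t) for a Yule process
    started from one individual with growth rate alpha ~ N(mu, sigma^2).

    Uses the normal moment-generating function:
        E[exp(k * alpha * t)] = exp(k * mu * t + (k * sigma * t)**2 / 2)
    """
    m1 = math.exp(mu * t + 0.5 * (sigma * t) ** 2)      # E[e^{alpha t}]
    m2 = math.exp(2 * mu * t + 2.0 * (sigma * t) ** 2)  # E[e^{2 alpha t}]
    stochastic = m2 - m1       # E[Var(N | alpha)] = E[e^{2 alpha t} - e^{alpha t}]
    parametric = m2 - m1 ** 2  # Var(E[N | alpha]) = Var(e^{alpha t})
    return stochastic, parametric

def parametric_fraction(mu, sigma, t):
    """Share of the total variance attributable to variation in alpha."""
    stochastic, parametric = variance_components(mu, sigma, t)
    return parametric / (stochastic + parametric)
```

With `sigma=0` the parametric component vanishes and all variance is stochastic. Evaluating `parametric_fraction(0.1, 0.01, t)` at several times shows how the balance shifts as the population grows.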

Estimates of $R^2$ in the fits
[Figure: panels: Stochastic, Deterministic]

Estimates of growth rate Ebola data
Romero-Severson et al. (2018). Frontiers Microbiology

Mixed-effect model fitting
Krauer et al. (2016). PLoS Neglected Trop Dis

Conclusions
1. Heterogeneity between units has implications for data modeling and its interpretations (precision medicine)
2. Stochastic heterogeneity accounts for a large fraction of total heterogeneity
3. Deterministic models introduce bias by forcing all heterogeneity between units to be accounted for by parametric heterogeneity
4. Is there a way to estimate the relative importance of stochastic vs. parametric heterogeneity?

Individual variability
or
just variability?

What about stochastic variability?
• The question we were interested in was “Is
the estimated distribution of parameters real
or induced by the model, which does not
account for potential stochastic variability?”

Individual Variability vs Stochastic Variability in Data Modeling

1. Individual variability or just variability? Ruy M. Ribeiro Faculdade de Medicina da Universidade de Lisboa & Los Alamos National Laboratory

2. Individual variability or just variability? Ruy M. Ribeiro Ethan Romero-Severson, LANL Mario Castro, UP Comillas, Madrid

4. Typical data sets • Panel data • Repeated measures • Hierarchical structure • Covariates

5. HIV RNA decay under therapy Time (days) Cardozo et al. PLoS Path 2016

6. Data fitting • Biological model – Viral dynamics with treatment • Non-linear mixed effects – Population-based data fitting • Inter-individual variability model – $\mu_i = \theta e^{b_i}$ and $b_i \sim N(0, \Omega)$ • Error model – $\varepsilon_i \sim N(0, \sigma^2)$

7. Possible structural models

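8. Fitting the data $\log_{10}(V_{ij}) = f(t_{ij}, \vec{\theta}_j) + \varepsilon_{ij}$, $\varepsilon_{ij} \sim N(0, \sigma^2)$; $\vec{\theta}_j = \vec{\theta}\, e^{\vec{\beta}^{T} \vec{x}_j + \vec{b}_j}$, $\vec{b} \sim N(0, \Omega)$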

9. Fits of the model Time (days) Quad Tx (No RAL) RAL combination

10. Distribution of parameters efficacy

11.

12. What about stochastic variability? • We estimate variance of the error • We estimate variance of parameter distribution • What about process variability? – Each time series is a realization of a biological process that (presumably) is intrinsically stochastic

13.

14. Exponential growth - birth process [Figure: left panel: $\mu = 0.1$, $\sigma = 0$; right panel: $\mu = 0.1$, $\sigma = 0.01$]

15. Mixed-effect fit [Figure: distribution of the growth rate; panels: t = 50, t = 100; curves: $\sigma = 0$, $\sigma = 0.01$]

16. Multiple models to fit the same data • Data generation/simulation: stochastic birth process with no parametric variance or with parametric variance • Data fitting: deterministic or stochastic model, each with or without a random effect

17. Fitting methodology • Simulation based – pomp in R: partially observed Markov process • Iterated filtering algorithm • Maximum likelihood • Profile likelihood • $x_{j,k} \sim \mathrm{Poisson}(x_{j,k} \mid \alpha x_{j,k-1})$ or $x_{j,k} = \alpha x_{j,k-1}$ • References: King et al. (2016) J. Stat. Softw. 69, 1–43; Romero-Severson et al. (2015) Am. J. Epidemiol. 182, 255–262

18. Estimates [Figure: estimated MEAN and ST DEV; panels: data with no parametric variability, data with parametric variability]

19. Panel units and observations

20. Varying sigma

21. Variability

22. Variance of the birth process • Infected people, I, generate new infections at rate $\alpha$ • $\alpha \sim N(\mu_A, \sigma_A)$

23. Variance of the birth process (ii)

24. Variance of the birth process

25. $R^2$ for the birth process

26. Estimates of $R^2$ in the fits [Figure: panels: Stochastic, Deterministic]

27. What does all this mean?

28. Ebola cases (counties)

29. Estimates of growth rate Ebola data Romero-Severson et al. (2018). Frontiers Microbiology

30. Mixed-effect model fitting Krauer et al. (2016). PLoS Neglected Trop Dis

31. Conclusions 1. Heterogeneity between units has implications for data modeling and its interpretations (precision medicine) 2. Stochastic heterogeneity accounts for a large fraction of total heterogeneity 3. Deterministic models introduce bias by forcing all heterogeneity between units to be accounted for by parametric heterogeneity 4. Is there a way to estimate the relative importance of stochastic vs. parametric heterogeneity?

32. Individual variability or just variability?

33.

34. What about stochastic variability? • The question we were interested in was “Is the estimated distribution of parameters real or induced by the model, which does not account for potential stochastic variability?”
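The two process models listed under slide 17 (Poisson step versus deterministic step) can be contrasted in the simplest possible setting. The sketch below assumes a single, fully observed series with no observation model, so it does not reproduce pomp or iterated filtering; the function names and the example counts are made up for illustration. Under these assumptions both estimators have closed forms.

```python
def poisson_mle_alpha(x):
    """MLE of alpha when x[k] ~ Poisson(alpha * x[k-1]).

    Up to a constant the log-likelihood is
        sum_k [x_k * log(alpha * x_{k-1}) - alpha * x_{k-1}],
    and setting its derivative to zero gives
        alpha_hat = sum_k x_k / sum_k x_{k-1}.
    """
    return sum(x[1:]) / sum(x[:-1])

def least_squares_alpha(x):
    """Least-squares alpha for the deterministic step x[k] = alpha * x[k-1].

    Minimising sum_k (x_k - alpha * x_{k-1})**2 gives
        alpha_hat = sum_k x_k * x_{k-1} / sum_k x_{k-1}**2.
    """
    num = sum(a * b for a, b in zip(x[1:], x[:-1]))
    den = sum(b * b for b in x[:-1])
    return num / den

counts = [2, 3, 3, 5, 6, 8]  # made-up counts, for illustration only
alpha_poisson = poisson_mle_alpha(counts)
alpha_ls = least_squares_alpha(counts)
```

The two estimates agree on noise-free geometric data and differ on noisy counts, because least squares gives more weight to the later, larger steps. With an observation model on top of the process the likelihood generally has no closed form, which is where simulation-based methods come in.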

Editor's Notes

This should be my title slide. And I thank both Ethan and Mario, not only for the work I will present, but also for contributions to the presentation.
What I want to talk about today is something that I think is very relevant for precision or individualized medicine. These are also the typical data sets I work with. They are not big data, but I hope that they are still informative data.
This is a recent example. Describe. Panel data. Repeated measures. Hierarchical (within individual). Treatment type as covariate.
Non-linear mixed-effects models. We have three levels of model: the biological model; the error model, which I will assume to be multiplicative, or additive on the logarithms, and constant (that is, homogeneous variance); and the inter-individual variability model, which includes the random effects.
Where we can include therapy
Specifically... Vij is the viral load at time i for subject j. f is the biological model, we assume additive error (on the logarithms), and the parameters may depend on covariates xj, with random effect bj and covariance Omega. Multivariate normal distribution.
The distribution of the parameters... And we can even get the individual parameter estimates from the random effects (posterior mean).
All this looks good. But I have a nagging question.
This model though apparently simple is still too complicated: too many parameters and variables. So we started really simply.
Fifty trajectories. Growth rate: alpha = 0.1. Left: no parametric variability, sigalpha = 0. (But is this a log-normal distribution for alpha, a gamma distribution, or something else? This is from Mario's data sets.) Right: sigalpha = 0.01. (Note: prob(x < 0 | mu = 0.1, sig = 0.01) < 10^(-23).)
Fit of the data in the previous slide. Black line is for sigma = 0 and red line for sigma = 0.01; in both cases mu = 0.1. Fit with glmer from package lme4, with family = poisson(link = "log").
Stochastic birth data with mu=0.15 (top panel), and sigma=0.02 (bottom panel)
Red squares – standard mixed effect model with Poisson log-link
To make it even simpler, we assume that we have the exact model, so there is no error, as is the case in the preceding simulations – although this is relaxed later with a model for the observations.
Geometric distribution for the birth process with growth rate given by the normal distribution
Expansion for small sigma and t <1/sigma
Dashed line is empirical variance from 50 (only) trajectories, simulated from a model similar to previous slides. Brown is the variance from the formula above and red is stochastic variance vs. blue parametric variance.
The (second) approximation is valid when exp(alpha t) >> 1. Plot assuming muA = 0.1. For fixed average growth rates, lower heterogeneity takes longer to become "apparent". The effect of mean growth rates on R^2 is very small and only present when t is very small (not shown). There is no heuristic for how big an infected population must become before R^2 is guaranteed to be large.
Red squares correspond to R^2 determined from linear mixed effects models.
County-level data from the 2014-2015 Ebola outbreak. Align and trim the data so that we are only addressing approximately exponential growth. Fit stochastic and deterministic growth models to the trimmed data. The negative binomial error model has a free variance term that is fit to the data.
Rather than assuming Poisson, we allow for overdispersion using a negative binomial (NB).
We used a generalised linear mixed effects model (GLMM) comprising both fixed and random effects, which explicitly allows for clustering in the data [37]. Such a hierarchical model allows the mean values to vary between the different countries, but borrows information across the districts within a country. The weekly number of new infections cijk in district i in country j at time-point tijk was assumed to follow a Poisson distribution with a mean λijk and was modelled with the logarithm as the link function
I am really interested in discussing these issues with anyone who has relevant ideas…
Compare theoretical distribution in black with the estimated by mixed effect model in red at time t=50.
Compare theoretical distribution in black with the estimated by mixed effect model in red at time t=100.
