This presentation describes how mixed-effect Bayesian networks can be used to model the personal effects of nutrition. These personal models can then be applied to personal nutrition. The presentation also discusses, in general terms, Bayesian modelling and probabilistic programming in the Stan language.
Revealing Personal Effects of Nutrition
1. Revealing Personal
Effects of Nutrition
WITH BIOSTATISTICAL MODELLING USING
PROBABILISTIC PROGRAMMING IN STAN
JARI TURKIA // ADVANCED TOPICS IN EPIDEMIOLOGY
2. Presentation outline
• Motivation of the study:
We hypothesize that people can react to the same nutrition differently
• Results:
There are substantial personal variations in the effects of nutrition
• Following the analysis:
• Modelling process (Box’s loop)
• Probabilistic programming with Stan and R
3. ”Turkia, J., Mehtätalo, L., Schwab, U. et al. Mixed-effect Bayesian network reveals personal
effects of nutrition. Sci Rep 11, 12016 (2021). https://doi.org/10.1038/s41598-021-91437-3”
Online: https://www.nature.com/articles/s41598-021-91437-3
Code and data: https://github.com/turkiaj/personal-effects-supplemental
4. Motivation
• Nutritional therapists know from experience that people can react to the
same nutrition very differently
• However, the magnitudes of these personal reactions must be measured and put
into computable form to be applicable. This is currently lacking.
• We aim to define a statistical method that could reveal these differences and be
applicable in personal nutrition
5. • A systems biology view on personalized nutrition
• Four interacting layers are used to demonstrate
the connection between personal nutrition–based
consumer goals (top layer) and nutrients (bottom
layer)
• The two middle layers (the organ and process
layers) connect nutrients to goals and represent
the detailing of the biological processes involved
• Keep in mind the form of this graph. We develop a
graphical model that aims to replicate it.
Figure: "Ben van Ommen, Tim van den Broek, Iris de Hoogh, Marjan van Erk,
Eugene van Someren, Tanja Rouhani-Rankouhi, Joshua C Anthony, Koen
Hogenelst, Wilrike Pasman, André Boorsma, Suzan Wopereis, Systems biology
of personalized nutrition, Nutrition Reviews, Volume 75, Issue 8, August 2017,
Pages 579–599, https://doi.org/10.1093/nutrit/nux029"
Personal nutrition: biological motivation
6. • 106 subjects
• 17 nutrients, medication, and
5 blood measurements
• The amounts of nutrients are estimated
from food records that were kept in the
week before the lab measurements
• 4 repeated measurements during
12 weeks
• Personal reactions were unknown;
estimating them is our goal here
Data: Personal nutrition
7. • This graph visualizes the biological process
of nutrients affecting the blood
concentrations at a general level
• Left-hand side shows nutrients in diet and
right-hand side the results in
concentrations
• A red line indicates that the nutrient increases
the concentration and a blue line indicates
a decreasing effect
• Line thickness indicates the strength of the effect
Results: General effects
8. • Two panels on the left show the same
general effects of nutrients with and
without shrinkage
• Even more interesting is the estimated
variance of personal effects, which indicates
the variation between persons. This is
shown on the right.
• The largest variation is found in how the
energy content of the diet affects the
insulin concentration
• It is also interesting that cholesterol-
lowering medication increases insulin
concentrations for some patients
Results: There are variations between persons
9. • Our method allows constructing reaction graphs at both general and personal levels
• These patients both have high insulin concentrations, but their personal models show that the reasons differ
Example of differently reacting patients
Patient 1 Patient 2
10. Modelling process: Box’s loop
Blei, David M. “Build, compute, critique, repeat: Data analysis with latent variable
models.” Annual Review of Statistics and Its Application 1 (2014): 203-232.
11. • A Bayesian network was considered a good
candidate for modelling the biological
network of nutritional effects. It consists
of variable nodes and connections
between them.
• We expanded the standard Bayesian
network into a mixed-effect model that can
estimate the effects at multiple levels of
detail
• Each separable concentration distribution
is estimated with a hierarchical (mixed-
effect) model that models the effects on
an additive scale as a sum of general effects
and their personal variations
Building model: Bayesian network
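The additive structure described on this slide can be sketched in a few lines. This is a minimal illustration in Python rather than the study's Stan implementation; the predictor names, coefficient values, and the helper `linear_predictor` are all hypothetical.

```python
# Hypothetical general effects (shared by everyone) for two predictors.
general_effects = {"energy": 0.30, "fiber": -0.10}

# Hypothetical personal deviations from the general effects, per subject.
personal_deviations = {
    "subject_1": {"energy": 0.25, "fiber": 0.00},
    "subject_2": {"energy": -0.20, "fiber": -0.05},
}


def linear_predictor(intercept, intake, subject):
    """Sum of (general effect + personal deviation) * intake over predictors."""
    deviations = personal_deviations[subject]
    return intercept + sum(
        (general_effects[name] + deviations[name]) * amount
        for name, amount in intake.items()
    )


# The same diet gives different expected concentrations for the two subjects.
same_diet = {"energy": 1.0, "fiber": 2.0}
mu_1 = linear_predictor(1.0, same_diet, "subject_1")
mu_2 = linear_predictor(1.0, same_diet, "subject_2")
print(mu_1, mu_2)
```

With identical intake, the two hypothetical subjects end up with different expected concentrations only because their personal deviations differ.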
12. Building model: Starting point
• We try to understand the
behavior of blood concentrations
(data generating process) rather
than blindly over-fit the model to
data
• Purple lines are samples from the
estimated distributions of
concentration and black lines are
their true values
• The modelling was started by
assuming a simple normal
distribution for concentrations...
• ...but this results in impossible
negative concentration levels
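The problem with the normal assumption can be quantified directly. The sketch below uses Python's standard library; the mean and standard deviation are hypothetical values chosen for illustration, not estimates from the study.

```python
from statistics import NormalDist

# Hypothetical concentration scale: mean 1.0 with standard deviation 0.8.
concentration = NormalDist(mu=1.0, sigma=0.8)

# Probability mass that a normal model puts on impossible negative values.
prob_negative = concentration.cdf(0.0)
print(f"P(concentration < 0) = {prob_negative:.3f}")
```

Whenever the standard deviation is not small relative to the mean, a noticeable share of the predicted concentrations falls below zero.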
13. Building model: More realistic
• In the next iteration, we used a
Gamma distribution that allows
only positive values and models
occasional larger values, resulting
in a right tail
• The model fits better but it shows
large variance and some over-
and underestimation
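The two properties named on this slide, positive support and a right tail, can be checked by simulation. This is a minimal sketch using Python's standard library; the shape and mean are hypothetical values.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical Gamma parameters: shape 2.0 and mean 1.0 (scale = mean / shape).
shape, mean = 2.0, 1.0
draws = [rng.gammavariate(shape, mean / shape) for _ in range(10_000)]

# Gamma draws are strictly positive, unlike draws from a normal distribution.
all_positive = min(draws) > 0.0
# A right tail pulls the mean above the median.
right_skewed = statistics.mean(draws) > statistics.median(draws)
print(all_positive, right_skewed)
```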
14. Building model: Fine tuning
• We understood from clinical
knowledge and observed data
that successive concentration
observations are correlated with
each other
• Adding this autocorrelation
to the model removed the large
variance and adjusted the model
fit
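Correlation between successive observations can be illustrated with a first-order autoregressive, AR(1), series. The slide does not state the order of the autocorrelation; first order is assumed here because the model files later in the presentation are named `ar1`. The coefficient 0.8 and both helper functions are hypothetical.

```python
import random


def simulate_ar1(phi, n, rng):
    """Simulate x[t] = phi * x[t-1] + noise with standard normal noise."""
    series = [0.0]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
    return series


def lag1_autocorrelation(series):
    """Sample correlation between each observation and its predecessor."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    numerator = sum(a * b for a, b in zip(centered[:-1], centered[1:]))
    denominator = sum(c * c for c in centered)
    return numerator / denominator


rng = random.Random(7)
series = simulate_ar1(phi=0.8, n=20_000, rng=rng)
estimate = lag1_autocorrelation(series)
print(round(estimate, 2))
```

A model that ignores this dependence has to absorb it as extra variance, which is consistent with the large variance seen in the previous iteration.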
15. Likelihood of Bayesian network
Our goal is to find the model that most probably describes the process that generated the
data. The joint probability of network G given observational data D is defined with
the likelihood of the Bayesian network, which factorizes into a product of concentration-specific
local distributions Gi, where the linear predictor 𝜇𝑖𝑘𝑑 is defined in matrix form.
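The definitions on this slide can be written out as follows. This is a sketch consistent with the Stan code shown later in the presentation, not a verbatim copy of the slide's formulas. The prior P(G), the parameter symbols θ, α, β and b, and the reading of the indices (i for the target concentration, k for the person, d for the observation) are assumptions, and the `lin_trans` term of the Stan code is omitted.

```latex
\begin{align*}
P(G \mid D) &\propto P(D \mid G)\, P(G) \\
P(D \mid G) &= \prod_{i} P\bigl(Y_i \mid \mathrm{pa}(Y_i),\ \theta_i,\ G_i\bigr) \\
Y_{ikd} &\sim \mathrm{Gamma}\!\left(\alpha_i,\ \frac{\alpha_i}{\mu_{ikd}}\right) \\
\boldsymbol{\mu}_i &= \beta_{0i} + X \boldsymbol{\beta}_i + Z \mathbf{b}_i
\end{align*}
```

Here pa(Y_i) denotes the parents of concentration Y_i in the graph, X and Z are the model matrices of the general and personal effects, and the Gamma distribution is written with shape α_i and rate α_i / μ_ikd so that its mean is μ_ikd.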
16. • Probabilistic programming allows expressing the probabilistic model in explicit program code and
separating the model from the estimation algorithm
• Probabilistic modelling is well suited to tasks that involve managing great
uncertainties, like the effects of nutrition
• We have implemented these models with the probabilistic programming language
Stan, but other languages can be found e.g. at
https://en.wikipedia.org/wiki/Probabilistic_programming
Concentration models (Gi) are implemented
with probabilistic programming
17. Implementation: Stan, mc-stan.org
data {
  int<lower=1,upper=J> group[N]; // group indicator
  matrix[N,p] X;                 // general effect model matrix
  vector[N] Y;                   // response
}
parameters {
  …
  vector[p-1] beta;              // general effects
  cholesky_factor_corr[k] L;     // Cholesky factor of personal effect correlation matrix
  vector<lower=0>[k] sigma_b;    // personal effect standard deviations
}
transformed parameters {
  real<lower=0> g_alpha;         // alpha (shape) parameter of Gamma
  matrix[k, k] Lambda;           // diag(sigma_b) * L
  vector[k] b[J];                // personal effects
  // diag(sigma_b) * L
  Lambda = diag_pre_multiply(sigma_b, L);
  for (j in 1:J)
    b[j] = Lambda * z[j];
  g_alpha = exp(g_log_alpha);
}
model {
  ...
  L ~ lkj_corr_cholesky(1);
  for (j in 1:J)
    z[j] ~ normal(0,1);
  mu = beta_Intercept + lin_trans + X_t[n] * beta + Z_t[n] * b[group[n]];
  g_beta = g_alpha / mu;
  target += gamma_lpdf(Y_t[n] | g_alpha, g_beta);
}
The linear predictor
and the probability distributions are explicitly
expressed in Stan code
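Two steps in the Stan code above are worth spelling out: building correlated personal effects as `b = diag(sigma_b) * L * z` from standard-normal `z`, and setting the Gamma rate to `g_alpha / mu` so that the mean equals `mu`. The sketch below reproduces the arithmetic in plain Python for k = 2; the standard deviations, the correlation, and the values of `g_alpha` and `mu` are hypothetical.

```python
import math

# Hypothetical personal-effect standard deviations and correlation (k = 2).
sigma_b = [0.5, 2.0]
rho = 0.3

# Cholesky factor L of the correlation matrix [[1, rho], [rho, 1]].
L = [[1.0, 0.0],
     [rho, math.sqrt(1.0 - rho ** 2)]]

# Lambda = diag(sigma_b) * L, as diag_pre_multiply(sigma_b, L) does in Stan.
Lambda = [[sigma_b[r] * L[r][c] for c in range(2)] for r in range(2)]


def personal_effects(z):
    """Map standard-normal z to correlated personal effects b = Lambda * z."""
    return [sum(Lambda[r][c] * z[c] for c in range(2)) for r in range(2)]


# Implied covariance of b is Lambda * Lambda^T.
covariance = [[sum(Lambda[r][m] * Lambda[c][m] for m in range(2))
               for c in range(2)] for r in range(2)]

# Gamma with shape alpha and rate alpha / mu has mean shape / rate = mu.
g_alpha, mu = 4.0, 1.5
g_beta = g_alpha / mu
gamma_mean = g_alpha / g_beta

print(covariance, personal_effects([1.0, 0.0]), gamma_mean)
```

The implied covariance has the squared standard deviations on its diagonal and `rho * sigma_b[0] * sigma_b[1]` off the diagonal, so sampling `z` and transforming it is equivalent to sampling the personal effects from their multivariate normal distribution.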
18. Implementation: Custom R code
Complete networks are constructed with R and the igraph package as GraphML files/objects
initial_graph <- mebn.new_graph_with_randomvariables(datadesc)
sysdimet_gamma_ar1 <- mebn.bipartite_graph(
reaction_graph = initial_graph,
inputdata = sysdimet,
predictor_columns = assumedpredictors,
assumed_targets = assumedtargets,
group_column = "SUBJECT_ID",
local_estimation = mebn.sampling,
local_model_cache = "models/BLMM_gamma/ar1",
stan_model_file = "mebn/BLMM_gamma_ar1.stan",
normalize_values = TRUE)
write_graph(sysdimet_gamma_ar1, "graphs/sysdimet_gamma_ar1.graphml", "graphml")
# Evaluating the fit of the local distributions with visual posterior predictive check
library(bayesplot)
mebn.target_dens_overlays("BLMM_gamma/ar1/", assumedtargets, sysdimet)
19. Implementation: Bayesian networks
Personal network models are extracted from the mixed-effect networks
> personal_graph1 <- mebn.personal_graph(
person_id = person_id1,
reaction_graph = initial_graph,
predictor_columns = assumedpredictors,
assumed_targets = assumedtargets,
local_distributions = expfam_ar1_dirs)
> write.graph(personal_graph1, "graphs/personal_graph_s01.graphml", "graphml")
> mebn.plot_personal_effects(personal_graph1, 10, graph_layout)
> eff_vertice <- V(personal_graph1)[name == "personal_dvit_fsins"]
> c(eff_vertice$value, eff_vertice$value_lCI, eff_vertice$value_uCI)
0.45, -0.41, 1.43
> # these objects are valid Bayesian networks in R
> library(bnlearn)
> bn <- as.bn(personal_graph1)
> ancestors(bn, "fsins")
"rasva", "safa", "mufa","linoli","linoleeni", "personal_prot_fsins","b_prot_fsins",
"personal_rasva_fsins","b_rasva_fsins"…
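What an ancestor query computes can be shown on a toy graph. The sketch below is plain Python, not bnlearn; the edge list is a hypothetical fragment that reuses a few node names from the output above, and the node `downstream_example` is invented for illustration.

```python
# Hypothetical edges (parent -> child); the real graph has many more nodes.
edges = [
    ("rasva", "b_rasva_fsins"),
    ("rasva", "personal_rasva_fsins"),
    ("b_rasva_fsins", "fsins"),
    ("personal_rasva_fsins", "fsins"),
    ("fsins", "downstream_example"),
]


def ancestors(edges, node):
    """Return every node that has a directed path into `node`."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


result = sorted(ancestors(edges, "fsins"))
print(result)
```

Nodes downstream of the target are not returned, only those from which the target can be reached.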
20. Wrap-up
• Even subtle, but real, effects can be extracted from data with careful modelling
• Expert knowledge should be guiding the modelling process
• Start with a simple model and then develop-criticize-repeat
• The model evaluation should match your use case. Mostly, we don’t need a
perfect model but one that is usable in the ways that matter.
• Probabilistic programming allows iterative model development and helps
matching the actual implementation with its mathematical formulation.
You need this in your papers!