This manuscript contributes a general and practical framework for casting a Markov process model of a system at equilibrium as a structural causal model, and for carrying out counterfactual inference. Markov processes mathematically describe the mechanisms in the system, and predict the system’s equilibrium behavior upon intervention, but do not support counterfactual inference. In contrast, structural causal models support counterfactual inference, but do not identify the mechanisms. This manuscript leverages the benefits of both approaches. We define the structural causal models in terms of the parameters and the equilibrium dynamics of the Markov process models, and counterfactual inference flows from these settings. The proposed approach alleviates the identifiability drawback of the structural causal models, in that the counterfactual inference is consistent with the counterfactual trajectories simulated from the Markov process model. We showcase the benefits of this framework in case studies of complex biomolecular systems with nonlinear dynamics. We illustrate that, in the presence of Markov process model misspecification, counterfactual inference leverages prior data, and therefore estimates the outcome of an intervention more accurately than a direct simulation.
Integrating Markov Process with Structural Causal Modeling Enables Counterfactual Inference in Complex Systems
Dr. Robert Ness - Gamalon
Kaushal Paneri - Northeastern University
Dr. Olga Vitek - Northeastern University
2. Markov process models
• Stoichiometric network - a collection of chemical processes taken together. Consider an
irreversible degradation process.
• Ordinary Differential Equations -
• Mass balance equation (by the law of mass conservation)
• Assumption: Large number of molecules (change in concentration as real number)
• Stochastic Differential Equations -
• Low concentration - collisions between molecules are unpredictable (Brownian motion
becomes significant)
• Instead of the concentration of S2 itself, we model the distribution (PMF or PDF) of S2.
• In complex systems, these processes are reversible.
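The contrast between the deterministic and the stochastic description can be sketched numerically. This is a minimal illustration under stated assumptions, not the model from the slide: it assumes a hypothetical single species S2 that is produced at a constant rate v0 and degraded at rate k1 * S2, and it uses a chemical Langevin approximation integrated with Euler-Maruyama as a stand-in for the stochastic model. All numeric values are made up.

```python
import math
import random
from statistics import mean, stdev

V0 = 10.0   # hypothetical production rate (assumption)
K1 = 0.1    # hypothetical degradation rate constant (assumption)

def simulate_ode(v0, k1, s0=0.0, dt=0.05, steps=2000):
    """Explicit Euler integration of dS2/dt = v0 - k1 * S2."""
    s = s0
    for _ in range(steps):
        s += (v0 - k1 * s) * dt
    return s

def simulate_sde(v0, k1, rng, s0=0.0, dt=0.05, steps=2000):
    """Euler-Maruyama for dS2 = (v0 - k1*S2) dt + sqrt(v0 + k1*S2) dW."""
    s = s0
    for _ in range(steps):
        drift = v0 - k1 * s
        diffusion = math.sqrt(max(v0 + k1 * s, 0.0))
        s += drift * dt + diffusion * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        s = max(s, 0.0)  # abundances cannot be negative
    return s

rng = random.Random(0)
ode_end = simulate_ode(V0, K1)                               # a single number
sde_ends = [simulate_sde(V0, K1, rng) for _ in range(100)]   # a sample from a distribution

print("ODE end point:", round(ode_end, 2))
print("SDE end points: mean", round(mean(sde_ends), 2), "sd", round(stdev(sde_ends), 2))
```

The ODE returns one value per run, while repeated SDE runs give a spread of values around it, which is the distribution of S2 that the stochastic description targets.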
Background
X0 - Input signal
v0 - Reaction rate
k1, k3 - Rate constants
3. Steady-state
• The rates of change of all the species are zero, while
at the same time the net rates are non-zero.
• Species concentrations are unchanging.
• Consider a simplified model
• Setting both to zero and solving for A and B, we get
• We can achieve the same result by solving the differential equations for A and B
as functions of time and then taking the limit as time goes to infinity.
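As a worked illustration of the two routes to the steady state, assume a hypothetical open chain in which A is produced at a constant rate v_0, A is converted to B at rate k_1 A, and B is removed at rate k_2 B. The simplified model from the slide is not reproduced in this text, so this form and its symbols are assumptions.

```latex
\begin{align*}
\frac{dA}{dt} &= v_0 - k_1 A, &
\frac{dB}{dt} &= k_1 A - k_2 B \\[4pt]
\text{Setting both to zero:}\quad
A^{*} &= \frac{v_0}{k_1}, &
B^{*} &= \frac{k_1 A^{*}}{k_2} = \frac{v_0}{k_2} \\[4pt]
\text{Solving with } A(0) = 0:\quad
A(t) &= \frac{v_0}{k_1}\left(1 - e^{-k_1 t}\right), &
\lim_{t \to \infty} A(t) &= \frac{v_0}{k_1} = A^{*}
\end{align*}
```

Substituting A(t) into the equation for B and letting t grow gives B(t) → v_0 / k_2 in the same way. At the steady state each reaction still carries the flux v_0, so the concentrations are constant while the net rates are non-zero.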
4. Complex Systems: Protein signaling networks
• Protein networks operate as signal-processing networks and are
responsible for sensing external signals, such as nutritional or cell
signals, and passing them to the inner components.
• Protein networks can be stoichiometric or non-stoichiometric.
• Stoichiometric networks consider specific biochemical events
between proteins, such as phosphorylation, degradation.
• A phosphorylated protein is an activated protein: if a network has an
edge from protein S to protein N, phosphorylated S causes N to become
phosphorylated, which in turn enables the downstream mechanism.
• Stoichiometric networks describe underlying elementary processes,
which give a physical causal relationship between system
components.
5. Complex Systems: Case Studies
• Mitogen-Activated Protein Kinase (MAPK)
• Model:
• Biochemical Processes:
• Steady-state solution:
• Insulin-like Growth Factor (IGF)
• Model:
• Steady-state solution:
v^act_K - Activation rate of MAPK
v^inh_K - Deactivation rate of MAPK
K3, K2, K - Abundances (concentrations) of proteins MAP3K, MAP2K, MAPK
PA_{M,i} - Parents of protein i in Markov process model M
6. Structural Causal Models (SCMs)
• For the same system with the set of components J, a structural causal model
consists of
1. Random variables X = {X_i : i ∈ J}, corresponding to the states of the
system.
2. A distribution on a set of independent random variables
N = {N_i : i ∈ J}.
3. A set of functions f = {f_i : i ∈ J}, called structural assignments, such that
X_i = f_i(PA_i, N_i) for every i ∈ J,
where PA_i ⊆ X \ {X_i} are the parents of X_i.
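Example: MAPK
[Diagram: causal graph E1 → MAP3K → MAP2K → MAPK. The endogenous variables MAP3K, MAP2K and MAPK are produced by the structural assignments f_3K, f_2K and f_K; each also receives its own exogenous noise variable N_MAP3K, N_MAP2K or N_MAPK.]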
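The MAPK example can be sketched as a small sampler. This is a minimal illustration under stated assumptions: the functional form of the structural assignments (a steady-state mean TOTAL * theta plus Gaussian noise scaled by a Binomial-style standard deviation), the total abundance and all rate values are hypothetical and are not taken from the slides.

```python
import math
import random

TOTAL = 100.0  # hypothetical total abundance of each protein (assumption)
# Hypothetical (activation, deactivation) rates per protein (assumption).
RATES = {"MAP3K": (0.1, 1.0), "MAP2K": (0.05, 1.0), "MAPK": (0.05, 1.0)}
ORDER = ["MAP3K", "MAP2K", "MAPK"]  # causal order E1 -> MAP3K -> MAP2K -> MAPK

def structural_assignment(parent, noise, v_act, v_inh):
    """X_i = f_i(PA_i, N_i): assumed steady-state mean plus scaled noise."""
    parent = max(parent, 0.0)  # guard against a negative parent abundance
    theta = v_act * parent / (v_act * parent + v_inh)
    return TOTAL * theta + math.sqrt(TOTAL * theta * (1.0 - theta)) * noise

def sample_scm(e1, rng):
    """Draw independent exogenous noise, then evaluate assignments in causal order."""
    noise = {name: rng.gauss(0.0, 1.0) for name in ORDER}
    values, parent = {}, e1
    for name in ORDER:
        values[name] = structural_assignment(parent, noise[name], *RATES[name])
        parent = values[name]
    return values, noise

values, noise = sample_scm(e1=10.0, rng=random.Random(0))
print(values)
```

Keeping the noise dictionary alongside the sampled values makes the split between exogenous and endogenous variables explicit: the noise is drawn independently, and every endogenous value is a deterministic function of its parent and its own noise term.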
7. Counterfactual Inference
• Intervention - do(MAP2K = a)
• Counterfactual -
1. Observe MAP3K=m3, MAP2K=m2 and MAPK=m1.
2. Condition the model on these observations and infer the noise variables N'_MAP3K, …, N'_MAPK.
3. Intervention - what would have happened to MAPK if MAP3K had been m3'?
do(MAP3K = m3')
4. Use the noise from Step 2 as input to the intervention model to get the
counterfactual distribution of MAPK.
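The four steps above can be sketched as follows. This is a minimal illustration, not the inference procedure used in the work: it assumes hypothetical additive-noise structural assignments (the same made-up form and rates as an illustrative MAPK cascade), so that Step 2 reduces to exact inversion of each assignment instead of probabilistic inference. With all three proteins observed, the recovered noise is a single point and the counterfactual is a single value.

```python
import math

TOTAL = 100.0  # hypothetical total abundance per protein (assumption)
# Hypothetical (activation, deactivation) rates (assumption).
RATES = {"MAP3K": (0.1, 1.0), "MAP2K": (0.05, 1.0), "MAPK": (0.05, 1.0)}

def moments(parent, v_act, v_inh):
    """Mean and noise scale of the assumed assignment X = mean + scale * N."""
    theta = v_act * parent / (v_act * parent + v_inh)
    return TOTAL * theta, math.sqrt(TOTAL * theta * (1.0 - theta))

def abduct(observed, e1):
    """Step 2: recover the noise values consistent with the observation.

    Assumes every parent abundance is positive, so each scale is non-zero.
    """
    noise, parent = {}, e1
    for name in ("MAP3K", "MAP2K", "MAPK"):
        mean, scale = moments(parent, *RATES[name])
        noise[name] = (observed[name] - mean) / scale
        parent = observed[name]
    return noise

def counterfactual_mapk(observed, e1, m3_new):
    noise = abduct(observed, e1)
    # Step 3: do(MAP3K = m3_new) replaces f_3K, so N_MAP3K is not used downstream.
    mean2, scale2 = moments(m3_new, *RATES["MAP2K"])
    map2k_cf = mean2 + scale2 * noise["MAP2K"]
    # Step 4: propagate the abducted noise through the intervened model.
    mean1, scale1 = moments(map2k_cf, *RATES["MAPK"])
    return mean1 + scale1 * noise["MAPK"]

# Step 1: a hypothetical observation (m3, m2, m1).
observed = {"MAP3K": 50.0, "MAP2K": 73.0, "MAPK": 80.0}
print(counterfactual_mapk(observed, e1=10.0, m3_new=25.0))
```

A useful sanity check on any implementation of these steps is consistency: intervening with the value that was actually observed must reproduce the observed outcome.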
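[Diagram: the MAPK causal graph under the intervention do(MAP2K = a). MAP2K is fixed to the value a, so f_2K and N_MAP2K no longer appear; E1 and N_MAP3K still determine MAP3K through f_3K, and N_MAPK still enters MAPK through f_K.]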
8. Markov Process Models vs. Structural Causal Models
Markov Process Models
Pros:
• Describe the system in terms of its initial parameters and conditions.
• Capture the temporal dynamics of deterministic or stochastic systems.
Cons:
• Have no prescribed formulation for counterfactuals.
• Subject to model misspecification.
Structural Causal Models
Pros:
• Can perform counterfactual inference.
• Only need steady-state observational data for parameter estimation.
Cons:
• Identifiability.
• Hard to perform inference.
• Can only model steady state or equilibrium.
10. SCM: Case studies
• Step 1… 6: Steady-state:
• Step 8: Reparameterize:
• Use the Gaussian approximation to the Binomial distribution,
which makes reparameterization easy
• Use inverse CDF, if tractable
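Both reparameterization routes can be sketched with the standard library. This is a minimal illustration with made-up values of n and p: the first function writes a Binomial variable as a deterministic function of standard Normal noise, and the second shows the inverse-CDF route for a distribution where the inverse is tractable (an Exponential, chosen here only as an example).

```python
import math
import random
from statistics import NormalDist, mean

def binomial_gaussian_reparam(n, p, noise):
    """Approximate X ~ Binomial(n, p) by n*p + sqrt(n*p*(1-p)) * N, N ~ Normal(0, 1)."""
    return n * p + math.sqrt(n * p * (1.0 - p)) * noise

def exponential_inverse_cdf(u, rate):
    """Inverse CDF of an Exponential(rate) variable: X = -ln(1 - u) / rate."""
    return -math.log(1.0 - u) / rate

rng = random.Random(1)
n, p = 100, 0.3  # hypothetical values

# Gaussian route: all randomness sits in the noise term.
approx = [binomial_gaussian_reparam(n, p, rng.gauss(0.0, 1.0)) for _ in range(5000)]
# Exact Binomial draws for comparison (sum of n Bernoulli trials).
exact = [sum(rng.random() < p for _ in range(n)) for _ in range(5000)]
print("Gaussian approximation mean:", round(mean(approx), 2))
print("exact Binomial mean:", round(mean(exact), 2))

# Inverse-CDF route: uniform noise U ~ Uniform(0, 1) is pushed through F^{-1}.
u = rng.random()
print("Exponential draw:", exponential_inverse_cdf(u, rate=2.0))
print("Normal draw:", NormalDist(0.0, 1.0).inv_cdf(u))
```

In both cases the variable becomes a deterministic function of independent noise, which is the form a structural assignment requires.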
Methods
12. Counterfactual Simulation and Evaluation
Causal effect of MAP3K on MAPK
Causal effect: the difference between the observed outcome and the intervened outcome.
Counterfactual question: suppose we ran a simulation and observed MAPK
abundance; what would have happened if the activation rate of MAP3K had been lower?
ODE and SDE: simulate two trajectories and return the difference of their steady-state values.
SCM: observe MAPK, infer the noise in the conditioned model, intervene on MAP3K and
get the counterfactual distribution of MAPK.
If the distributions are aligned, then the counterfactual distribution is rooted in the causal
mechanism present in the Markov process model.
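The comparison described above can be sketched for the deterministic case. This is a minimal illustration under stated assumptions: the ODE form (activation proportional to the parent abundance and to the inactive fraction, first-order deactivation), the matching SCM assignments, the total abundance and all rate values are hypothetical. The ODE route simulates two trajectories; the SCM route observes the first trajectory's steady state, infers the noise, lowers the MAP3K activation rate and predicts.

```python
import math

TOTAL = 100.0  # hypothetical total abundance per protein (assumption)
ORDER = ["MAP3K", "MAP2K", "MAPK"]
# Hypothetical (activation, deactivation) rates (assumption).
BASE = {"MAP3K": (0.1, 1.0), "MAP2K": (0.05, 1.0), "MAPK": (0.05, 1.0)}
LOWER = dict(BASE, MAP3K=(0.05, 1.0))  # lower activation rate of MAP3K
E1 = 10.0  # hypothetical input signal

def steady_ode(e1, rates, dt=0.01, steps=5000):
    """Explicit Euler on dX/dt = v_act * parent * (TOTAL - X) - v_inh * X."""
    x = {name: 0.0 for name in ORDER}
    for _ in range(steps):
        new, parent = {}, e1
        for name in ORDER:
            v_act, v_inh = rates[name]
            rate = v_act * parent * (TOTAL - x[name]) - v_inh * x[name]
            new[name] = x[name] + dt * rate
            parent = x[name]  # the next protein is driven by this one
        x = new
    return x

def moments(parent, v_act, v_inh):
    theta = v_act * parent / (v_act * parent + v_inh)
    return TOTAL * theta, math.sqrt(TOTAL * theta * (1.0 - theta))

def abduct(observed, e1, rates):
    """Infer the noise values that reproduce the observation."""
    noise, parent = {}, e1
    for name in ORDER:
        mean, scale = moments(parent, *rates[name])
        noise[name] = (observed[name] - mean) / scale
        parent = observed[name]
    return noise

def predict(noise, e1, rates):
    """Evaluate the structural assignments in causal order with fixed noise."""
    out, parent = {}, e1
    for name in ORDER:
        mean, scale = moments(parent, *rates[name])
        out[name] = mean + scale * noise[name]
        parent = out[name]
    return out

# ODE route: two trajectories, difference of steady-state MAPK values.
observed = steady_ode(E1, BASE)
ode_effect = observed["MAPK"] - steady_ode(E1, LOWER)["MAPK"]

# SCM route: condition on the observation, intervene on the rate, predict.
noise = abduct(observed, E1, BASE)
scm_effect = observed["MAPK"] - predict(noise, E1, LOWER)["MAPK"]

print("ODE causal effect:", ode_effect)
print("SCM causal effect:", scm_effect)
```

Because the SCM here is built from the steady state of the same ODE, the two effects agree in this noise-free setting; with stochastic trajectories the comparison would be between distributions of effects instead of two numbers.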
Results
13. Counterfactual Robustness under Model Misspecification
• Systems biology models are subject to misspecification, due to
• Limited understanding of the molecular mechanisms
• Open systems - the described model is actually part of a broader
molecular system, where omitted components can affect the
outcomes of an intervention.
• Counterfactual inference enables an investigator to use data from past experiments to
more realistically assess the utility of interventions.
• Suppose we know the model of a system at steady state, and its true
initial rate parameters. Consider an experiment where a biologist sets
the model parameters, which are subject to misspecification.
• Because the misspecified model is conditioned on the true data to
get the counterfactual, the resulting distribution will be closer to the
true causal effect than a direct simulation from the misspecified model.
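The mechanism behind this robustness can be sketched in a toy setting. Everything here is an assumption made for illustration: the form of the structural assignments, the rate values, the choice to misspecify only the MAP3K activation rate, and the intervention (halving that rate in whichever model is used). Closed-form steady states with zero noise stand in for a direct simulation. The sketch shows one configuration and is not evidence for the general claim.

```python
import math

TOTAL = 100.0  # hypothetical total abundance per protein (assumption)
ORDER = ["MAP3K", "MAP2K", "MAPK"]
# Hypothetical "true" rates, and a model with a wrong MAP3K activation rate.
TRUE_RATES = {"MAP3K": (0.1, 1.0), "MAP2K": (0.05, 1.0), "MAPK": (0.05, 1.0)}
WRONG_RATES = dict(TRUE_RATES, MAP3K=(0.2, 1.0))
E1 = 10.0
ZERO = {name: 0.0 for name in ORDER}

def moments(parent, v_act, v_inh):
    theta = v_act * parent / (v_act * parent + v_inh)
    return TOTAL * theta, math.sqrt(TOTAL * theta * (1.0 - theta))

def predict(noise, e1, rates):
    """Evaluate the assumed structural assignments in causal order."""
    out, parent = {}, e1
    for name in ORDER:
        mean, scale = moments(parent, *rates[name])
        out[name] = mean + scale * noise[name]
        parent = out[name]
    return out

def abduct(observed, e1, rates):
    """Infer the noise that makes the given model reproduce the data."""
    noise, parent = {}, e1
    for name in ORDER:
        mean, scale = moments(parent, *rates[name])
        noise[name] = (observed[name] - mean) / scale
        parent = observed[name]
    return noise

def intervene(rates, factor=0.5):
    """Scale the MAP3K activation rate by `factor` (hypothetical intervention)."""
    v_act, v_inh = rates["MAP3K"]
    return dict(rates, MAP3K=(v_act * factor, v_inh))

# "True" data and the true causal effect on MAPK.
observed = predict(ZERO, E1, TRUE_RATES)
true_effect = observed["MAPK"] - predict(ZERO, E1, intervene(TRUE_RATES))["MAPK"]

# Direct route: both outcomes come from the misspecified model alone.
direct_effect = (predict(ZERO, E1, WRONG_RATES)["MAPK"]
                 - predict(ZERO, E1, intervene(WRONG_RATES))["MAPK"])

# Counterfactual route: the misspecified model is conditioned on the true data.
noise = abduct(observed, E1, WRONG_RATES)
cf_effect = observed["MAPK"] - predict(noise, E1, intervene(WRONG_RATES))["MAPK"]

print("true effect:", true_effect)
print("direct misspecified effect:", direct_effect)
print("counterfactual effect:", cf_effect)
```

In this configuration the abducted noise absorbs part of the error in the MAP3K rate, so the counterfactual estimate lands closer to the true effect than the direct estimate does. Other choices of which rate is wrong, and by how much, can change the size of the gain.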
14. Counterfactual Robustness under Misspecified Rates
Histograms of causal effects, i.e. differences between the “observed” and the “counterfactual”
trajectories at equilibrium, for one repetition of the evaluation. The causal effect from the
misspecified SCM (blue histogram) is closer to the true causal effect (orange histogram) than the
causal effect derived from a direct but misspecified simulation (green histogram). (a) MAPK, (b)
IGF.
15. Discussion
• This work demonstrated the process of converting a
mechanistic model to an SCM, with the aim of performing counterfactual
inference at steady state.
• Empirical validation shows that the counterfactual inference
from the resulting SCM is rooted in the causal dynamic
processes of the mechanistic model.
• A continuation of this work would be to tackle systems with
multiple or no closed-form steady-state solutions.
• Cycles
• Combining CausalGANs with this approach to include
non-stoichiometric system components.