Deep Reinforcement Learning for Control
of Probabilistic Boolean Networks
Georgios Papagiannis1 and Sotiris Moschoyiannis2
1University of Cambridge, UK
2University of Surrey, UK
Complex Networks and their Applications 2020 – 1 Dec 2020
s.moschoyiannis@surrey.ac.uk
Boolean Networks1 (BNs)
[Figure: example Boolean network with nodes n1–n5]
A class of discrete dynamical systems:
• Nodes represent genes,
gene expression is quantized: 0 (inactive), 1 (active)
• Expression level of each gene is functionally related to
the expression states of some other genes
• At each time step,
each node computes and produces output (0 or 1),
which is input for its connected nodes in the next time step
[Figure: the Boolean network (BN), with AND/OR update functions, and its corresponding state space]
1 Kauffman (1969) Metabolic stability and epigenesis in randomly constructed genetic nets, J. of Theoretical Biology, 22(3):437-467
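To make the synchronous update just described concrete, here is a minimal Python sketch (purely illustrative, not the network in the figure): three hypothetical nodes, each recomputing its value from the current state at every time step.

```python
# Minimal sketch of a synchronous Boolean network (BN) update.
# Illustrative 3-node example, not the 5-node network from the slide.

def bn_step(state):
    """state: tuple of 0/1 values (x1, x2, x3); returns the next state."""
    x1, x2, x3 = state
    return (
        x2 & x3,      # x1 <- x2 AND x3
        x1 | x3,      # x2 <- x1 OR  x3
        1 - x1,       # x3 <- NOT x1
    )

state = (1, 0, 1)
for t in range(5):
    print(t, state)
    state = bn_step(state)   # all nodes update simultaneously
```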
Attractors
[Figure: the Boolean network (BN) and its corresponding state space, with fixed point and limit cycle attractors highlighted]
Dynamics of BNs dictate that the network will
evolve to a state, or set of states, that it cannot
leave without external intervention
• Fixed point attractors
• Limit cycle attractors
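Since the update is deterministic, an attractor can be located by iterating until a state repeats; a small sketch, reusing the hypothetical bn_step from the previous example:

```python
def find_attractor(step, initial_state):
    """Iterate a deterministic BN until a state repeats; return the attractor
    (a single state for a fixed point, several states for a limit cycle)."""
    seen = {}                     # state -> time step at which it was first visited
    state, t, trajectory = initial_state, 0, []
    while state not in seen:
        seen[state] = t
        trajectory.append(state)
        state = step(state)
        t += 1
    return trajectory[seen[state]:]   # the repeating part of the trajectory

print(find_attractor(bn_step, (1, 0, 1)))   # limit cycle reached from this initial state
```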
[Figure: probabilistic Boolean network with nodes n1–n5; node n1 has three candidate functions: AND with p=0.5, OR with p=0.3, NAND with p=0.2]
Probabilistic Boolean Networks2 (PBNs)
More than one Boolean function at each node;
one function executes at each step t, with prob. p
Accommodate uncertainty in gene regulation.
• Dynamics of PBNs
- admit Markov Chain theory (and, under intervention, MDPs)
- exhibit attractors; these manifest as:
• absorbing states
• irreducible sets
2 Shmulevich et al (2002) Probabilistic Boolean Networks: a rule-based uncertainty model for gene regulatory networks, Bioinformatics 18(2):261-274
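One PBN step can be sketched by sampling, for each node, one of its candidate functions according to its probability. The sketch below is illustrative only: it borrows the AND/OR/NAND probabilities shown for n1 and invents deterministic functions for the other two nodes.

```python
import random

def pbn_step(state, rng=random):
    """One step of a toy 3-node PBN (illustrative, not the paper's network)."""
    x0, x1, x2 = state
    # Node 0: choose AND / OR / NAND of its inputs with prob. 0.5 / 0.3 / 0.2
    candidates = [
        (0.5, lambda: x1 & x2),          # AND,  p = 0.5
        (0.3, lambda: x1 | x2),          # OR,   p = 0.3
        (0.2, lambda: 1 - (x1 & x2)),    # NAND, p = 0.2
    ]
    probs = [p for p, _ in candidates]
    funcs = [f for _, f in candidates]
    f0 = rng.choices(funcs, weights=probs, k=1)[0]
    # Nodes 1 and 2: single (made-up) functions, kept deterministic for brevity
    return (f0(), x0 ^ x2, 1 - x1)

state = (1, 0, 1)
for t in range(5):
    print(t, state)
    state = pbn_step(state)
```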
Gene Regulatory Networks (GRNs)
[Figures: segment polarity genes 5; fission yeast cell-cycle 6]
• Spontaneous emergence of ordered collective behaviour 3
e.g., functional states of the cell such as growth or quiescence
correspond to such attractors 3
e.g., high / low resistance to antibiotics at different attractors 4
• (Why PBN study is useful) Targeted therapeutics: external
perturbation on certain gene(s), at certain state(s), can drive the
GRN to a desirable attractor (drug targets)
• where perturbation = change of state (i.e., 0 -> 1, 1 -> 0)
• Kauffman: attractors are stable under most gene perturbations
We study PBNs as dynamical systems in which a change of state of certain genes, at certain states, may drastically affect the state of the network as a whole, and
• lead to a different attractor, with desirable properties, or
• switch between attractors
3 Huang, Ingber (2000) Shape-dependent control of cell growth: Switching between attractors in cell regulatory networks. Experimental Cell Research, 261(1): 91-103
4 Reardon (2017) Modified viruses deliver death to antibiotic-resistant bacteria. Nature, 546:586-587
5 Albert, Othmer (2003) The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J. of Theoretical Biology, 223(1):1-18
6 Wang, Du, Chen, et al (2010) Process-based network decomposition reveals backbone motif structure. Proc Natl Acad Sci 107(23):10,478–10,483
Control
Complex Networks perspective:
A dynamical system is controllable if it can be driven from
any initial state to any desired state, within finite time 7
7 Liu, Slotine, Barabasi (2011) Controllability of complex networks, Nature 473: 167-173
Our goal:
Discover control strategies to effect perturbations on individual nodes
(targeted intervention), aiming to drive the whole network from its current
state to a specified target state that exhibits desirable (biological) properties.
Control in (P)BNs
Comes in different flavours.
e.g.,
Assume control inputs,
intervene only using these 8
Intervene on one node to
affect another node’s state 9
Intervene on any node to affect
the long-run network behaviour3
8 Wu, Guo, Toyoda (2020) Policy Iteration Approach to the Infinite Horizon Average Optimal Control of Probabilistic Boolean Networks, IEEE Trans. Neural Netw. Learn. Syst.
9 Pal, Datta, Dougherty (2006) Optimal Infinite-Horizon Control for Probabilistic Boolean Networks. IEEE Trans on Signal Processing, 54(6):2375-2387
10 Shmulevich, Dougherty, Zhang (2002) Gene Perturbation and Intervention in PBNs. Bioinformatics 18(10):1319-1331
Lac operon of E. coli 8
Metastatic melanoma 9
Toy example from Shmulevich 10
Control (in our work, here)
What is the series of required interventions (which gene, at which step) to drive a
PBN from any state towards a target attractor, within a finite number of steps?
- Can intervene on any node
- Can intervene on at most one node at each time step
- Each intervention is followed by a natural evolution step (internal dynamics)
- Aim for minimum number of interventions (perturbations)
- Limit the number of steps (or number of interventions)
- Assume no additional info from systems biology study
- One requirement: knowledge of the target attractor
STG / Probability Transition Matrix
Intractable
Lac operon E.coli PBN
n = 9 nodes
Corresponding State Transition Graph (STG) : 2^9 = 512 states
Corresponding Probability Transition Matrix (PTM) : 2^9 x 2^9
STG / Probability Transition Matrix
Corresponding STG : 2^10 = 1,024 states
Corresponding Probability Transition Matrix : 2^10 x 2^10
Fission yeast PBN
n = 10 nodes
STG / Probability Transition Matrix
Corresponding STG : 2^20 = 1,048,576 states
Corresponding Probability Transition Matrix : 2^20 x 2^20
Synthetic PBN
n = 20 nodes
(in this paper – see Section 4)
STG / Probability Transition Matrix
Corresponding STG : 2^28 = 268,435,456 states
Corresponding Probability Transition Matrix : 2^28 x 2^28
Metastatic Melanoma PBN
n = 28 nodes
The Probability Transition Matrix (PTM) becomes
computationally intractable for larger networks 11 –
it requires the estimation of 2^n x (2^n – 1) probabilities.
Can we work without the PTM?
11 Akutsu, Hayashida, et al (2007) Control of Boolean networks: Hardness results and algorithms for tree structured network. Journal of Theoretical Biology, 244(4):670-679
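The blow-up is easy to quantify: the PTM of an n-node PBN has 2^n x 2^n entries, of which 2^n x (2^n - 1) are free parameters (each row sums to 1). A quick back-of-the-envelope check:

```python
# Number of probabilities behind the PTM of an n-node (P)BN.
for n in (9, 10, 20, 28, 70):
    states = 2 ** n
    free_params = states * (states - 1)   # entries to estimate per the slide
    print(f"n = {n:2d}: {states:,} states, about {free_params:.2e} probabilities")
```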
How
Formulate the problem as one of reward maximization and
use Reinforcement Learning.
Reinforcement Learning
Policy 𝜋(𝑠): how to select an action at each state; a distribution over actions given a state.
Goal: Maximize expected cumulative reward.
12 Sutton, Barto (2018) Reinforcement Learning: an introduction, MIT Press [Chapter 6]
Markov Decision Process (MDP)
An MDP is a tuple (𝑆, 𝐴, 𝑃, 𝑅, 𝛾)
𝑆 – Set of states of the environment
𝐴 – Set of possible actions to perform
at some state s ∈ 𝑆
𝑃 – State transition matrix, where P^a_{s_t, s'_{t+1}} = P[ s'_{t+1} | s_t, a_t ], for a ∈ 𝐴
𝑅 – Reward function, where R^a_{s_t} = E[ R_{t+1} | s_t, a_t ]
𝛾 – Discount factor 𝛾 ∈ [0, 1]
PBNs as MDPs:
𝑆 – Binary states
𝐴 – Possible interventions given a state s ∈ 𝑆
(in fact, N+1 actions at each state)
𝑃 – Probability of transitioning between
binary states, given Boolean function realisations
𝑅 – Problem dependent
(we define the reward function)
𝛾 – Problem dependent
(we choose this)
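Putting the two columns together, the control problem can be wrapped in a reset/step environment. The sketch below is an assumption about the interface, not the paper's code: the state is the binary vector, actions 0..N-1 flip the corresponding node, action N means 'no intervention', and every action is followed by one natural (stochastic) PBN step; the reward values here are placeholders (the actual reward is discussed later).

```python
import random

class PBNControlEnv:
    """Minimal sketch of a PBN control environment (illustrative interface)."""

    def __init__(self, pbn_step, n_nodes, target_attractor, horizon=11):
        self.pbn_step = pbn_step              # function: state tuple -> next state tuple
        self.n = n_nodes
        self.target = set(target_attractor)   # set of states in the target attractor
        self.horizon = horizon                # max interventions per episode

    def reset(self):
        self.t = 0
        self.state = tuple(random.randint(0, 1) for _ in range(self.n))
        return self.state

    def step(self, action):
        s = list(self.state)
        if action < self.n:                   # actions 0..N-1 flip a node; action N is a no-op
            s[action] = 1 - s[action]
        self.state = self.pbn_step(tuple(s))  # natural evolution step follows the intervention
        self.t += 1
        done = self.state in self.target or self.t >= self.horizon
        reward = 1.0 if self.state in self.target else -1.0   # placeholder values
        return self.state, reward, done
```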
Q-Learning
Temporal-Difference (TD) learning.
Model-free – no knowledge of the dynamics (no PTM) is required.
Off-policy.
N.B. ‘Q’ in Q-Learning stands for Quality, or value, of an action
Q-Learning
Q(s_t, a_t) : the expected reward of taking action a_t at state s_t, at time step t.
Sample update (new estimate = old estimate + step size × error in estimate):
Q(s_t, a_t) ← Q(s_t, a_t) + α [ R_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
where R_{t+1} + γ max_a Q(s_{t+1}, a) is the target for the update and the bracketed term is the error in the estimate.
12 Sutton, Barto (2018) Reinforcement Learning: an introduction, MIT Press [Chapter 6]
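In tabular form this update is a couple of lines; a sketch with a dictionary Q-table (the environment interface above would supply s, a, r and the next state):

```python
from collections import defaultdict

Q = defaultdict(float)        # Q[(state, action)] -> current estimate
alpha, gamma = 0.1, 0.95      # learning rate and discount factor (illustrative values)

def q_update(s, a, r, s_next, n_actions):
    """One Q-Learning sample update: new estimate = old estimate + alpha * TD error."""
    td_target = r + gamma * max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
```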
But how do we get to the true value Q*?
ε-greedy
Simplest idea for ensuring continual exploration.
All m actions available are tried with non-zero probability:
with probability ε choose an action at random (define small ε)
with probability 1 – ε choose the greedy action
where a greedy action is an action whose expected reward is the greatest,
and is given by argmax_a Q(s, a)
Approximate Q* iteratively, by selecting actions at each time step.
N.B. Q has been shown to converge to Q* with ε-greedy e.g., see Sutton, Barto (2018) Reinforcement Learning: an introduction, MIT Press
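The ε-greedy rule itself is a few lines; a sketch (q_values would come from the Q-table or, later, the DQN):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])      # exploit: argmax_a Q(s, a)
```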
Q-Learning implies storing a Q value for each state–action pair – costly for large state spaces.
Deep Q Net (DQN)
Use a function approximator to learn a parameterised form Q(s, a; θ).
Use DQN to iteratively update θ, in order to approximate Q*(s, a; θ) (true Q values).
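A minimal DQN sketch, assuming PyTorch (the framework and layer sizes are assumptions, not details from the paper): the network maps the N-bit state to N+1 Q values, one per intervention plus the no-op.

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Small MLP approximating Q(s, a; theta) for an N-node PBN (sketch)."""

    def __init__(self, n_nodes, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_nodes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_nodes + 1),   # one Q value per action (N flips + no intervention)
        )

    def forward(self, state):                 # state: float tensor of shape (batch, N)
        return self.net(state)
```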
TD-Learning is often susceptible to large oscillations in expected Q values.
Use a separate network to determine the TD-target.
Double DQN (DDQN)
• “target” DQN – initialised with the same parameters as the main DQN (“policy” DQN) but
has its parameters updated every k iterations
• “policy” DQN – the expected Q values of the target DQN are fixed and every k iterations the
parameters of the policy DQN are copied to the target
The policy DQN’s parameters are used to update the target: θ′_t = θ_t every k time steps.
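One common way to compute the TD target with the two networks (a sketch, assuming the PyTorch DQN above; γ and the batch tensors are illustrative): the policy DQN picks the next action, the frozen target DQN evaluates it, and every k training steps θ is copied to θ′.

```python
import torch

def td_targets(policy_dqn, target_dqn, rewards, next_states, dones, gamma=0.95):
    """Double-DQN style TD targets (sketch): policy net selects, target net evaluates."""
    with torch.no_grad():
        next_actions = policy_dqn(next_states).argmax(dim=1, keepdim=True)
        next_q = target_dqn(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * next_q * (1.0 - dones)   # dones: float tensor of 0/1

def sync_target(policy_dqn, target_dqn):
    """Copy theta -> theta' (called every k training steps)."""
    target_dqn.load_state_dict(policy_dqn.state_dict())
```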
Reward
Objective: Find a policy that drives a PBN to an attractor in order to maximise reward
The reward is defined with respect to the set of states in the target attractor: positive reward for actions that reach it, negative otherwise, with a larger penalty for actions that lead to a non-target attractor.
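A sketch of such a reward function (the shape follows the description above; the magnitudes are made up, not the paper's values):

```python
def reward(next_state, target_attractor, other_attractors):
    """Illustrative reward: positive for reaching the target attractor, negative
    otherwise, with a larger penalty for falling into a non-target attractor."""
    if next_state in target_attractor:
        return +5.0                          # reached the target attractor
    if any(next_state in a for a in other_attractors):
        return -5.0                          # trapped in the wrong attractor
    return -1.0                              # still wandering: small penalty per step
```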
Training a network from consecutive samples directly from the environment is
susceptible to strong correlations in the data.
Sample from a batch of experiences, at each time step t, to update the DDQN.
Prioritised Experience Replay (DDQN with PER)
During training,
• the agent observes state s_t, performs action a_t on the environment, and
• then, the environment transitions to s_{t+1} and the agent receives reward R_{t+1}
The transition / experience (s_t, a_t, R_{t+1}, s_{t+1}) is stored in a replay buffer
• 5K buffer (for n=10 nodes); 500K (for n=20)
At each t, a batch of experiences is sampled in order to update the network parameters
• 128 for n=10; 512 for n=20
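Storage and batch sampling can be sketched with a bounded buffer (the capacities and batch sizes in the comments are the ones quoted above; uniform sampling shown here, the prioritised variant follows):

```python
import random
from collections import deque

class ReplayBuffer:
    """Plain experience replay sketch (5K capacity for n=10, 500K for n=20;
    batch sizes 128 and 512 respectively, per the slide)."""

    def __init__(self, capacity=5_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=128):
        return random.sample(list(self.buffer), batch_size)
```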
N.B. Please see Section 3.2 (pp. 4-5) in the paper for more detail.
PER – proportional, importance
• Proportional - the probability of experience i being sampled is given by P(i) = p_i^ω / Σ_k p_k^ω,
where p_i = |δ_i| + c is the priority of sample i, δ is the TD-error, c is a small constant to prevent
experiences with zero TD-error from never being replayed, and ω is the magnitude of prioritisation
• Importance weights are used to compensate for samples with high TD-error being sampled more often:
w_i = ( 1 / (L · P(i)) )^β, where L is the size of the replay memory and β is used to anneal the amount of
importance sampling over training episodes.
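The two formulas translate directly into code; a sketch with NumPy (c, ω and β values are illustrative, and normalising the weights by their maximum is a common implementation choice, not something stated on the slide):

```python
import numpy as np

def per_probabilities(td_errors, c=1e-2, omega=0.6):
    """Proportional prioritisation: p_i = |delta_i| + c, P(i) = p_i^omega / sum_k p_k^omega."""
    priorities = np.abs(np.asarray(td_errors)) + c
    scaled = priorities ** omega
    return scaled / scaled.sum()

def importance_weights(probs, beta=0.4):
    """Importance-sampling weights: w_i = (1 / (L * P(i)))^beta, scaled by the max weight."""
    L = len(probs)
    weights = (1.0 / (L * probs)) ** beta
    return weights / weights.max()
```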
Does it work?
Does it work for real GRNs?
Results
Success rate is at least 99% from any initial state
PBN10
- 1,024 states
- attractor occurs 1/100 times
- random interventions: 1,387
- horizon of 11 is < 1% of that
- DRL: 99.8% successful control
- 100% if horizon is set to 14

PBN20
- 1,048,576 states
- attractor occurs 1/10,000 times
- random interventions: 6,511
- horizon of 100 is < 1.5% of that
- DRL: 100% successful control
- 99% if horizon is set to 15

Melanoma
- 512 states
- attractor set to 1001111, motivated by biology (2nd gene, WNT5A, unexpressed)
- DRL: 100% successful control for a horizon of 10
- 99.72% if horizon is set to 7
Not in this paper – control larger PBNs
We have tried our DRL (DDQN with PER) method
on the more common type of control problem 13,14
Intervene on pirin’s state only, aiming to drive the
PBN to a state where WNT5A is OFF (target state).
Cancerous Melanoma PBN inferred from GRN data 13,14
13 Pal, Datta, Dougherty (2006) Optimal Infinite-Horizon Control for Probabilistic Boolean Networks. IEEE Trans on Signal Processing, 54(6):2375-2387
14 Sirin, Polat, Alhajj (2013) Employing Batch Reinforcement Learning to Control Gene Regulation Without Explicitly Constructing Gene Regulatory Networks, 23rd IJCAI 2013, 2042-2048
Not in this paper – control larger PBN (N = 70)
On N=7 we get favourable performance
to existing literature 13,14
On N=28 we get favourable performance
to existing literature 14
On N=70 we get 97.6% successful
control.
This is the largest PBN to be controlled,
from real data or synthetic data.
Joint work with Vytenis Sliogeris, paper under preparation
13 Pal, Datta, Dougherty (2006) Optimal Infinite-Horizon Control for Probabilistic Boolean Networks. IEEE Trans on Signal Processing, 54(6):2375-2387
14 Sirin, Polat, Alhajj (2013) Employing Batch Reinforcement Learning to Control Gene Regulation Without Explicitly Constructing Gene Regulatory Networks, 23rd IJCAI, 2042-2048
Not in this paper – infer larger PBNs
We have been successful in inferring a PBN directly from real gene expression data
(samples taken when network in a steady-state distribution)
• Metastatic melanoma dataset from Bittner et al1
• Using CoDs and a perceptron as a predictive model2,3
Our approach does not build the PTM (as our control method does not need it!).
We are looking at inferring a PBN from real, time-series gene expression data
• But, typically, studies provide no more than 6-7 time steps
This is in progress – please get in touch if you are also working on something like this.
1 Bittner, Meltzer, Chen et al (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406: 536-540
2 Kim, Dougherty, et al (2000) General nonlinear framework for the analysis of gene interaction via multivariate expression arrays. Journal of Biomedical Optics 5: 411-424
3 Shmulevich, Dougherty, Zhang (2002) Gene Perturbation and Intervention in PBNs. Bioinformatics 18(10): 1319-1331
Thank you for listening.
Any questions?
s.moschoyiannis@surrey.ac.uk


Editor's Notes

  1. Boolean networks were introduced by Stuart Kauffman as a model of genetic networks. BNs are a class of discrete dynamical systems, characterized by interactions over a set of Boolean variables. Nodes represent genes… As the network evolves we build the corresponding state space, shown here, which shows the state of the network as a whole between time steps.
  2. If you let these networks evolve for some time, they will invariably end up in a state they cannot get out of, at least not without external intervention. These are the so-called attractors and they come in the form of fixed points (blue) or limit cycles (red)
  3. Probabilistic BNs extend BNs to accommodate uncertainty in gene regulation. In PBNs each node is associated with more than one function, and one of these executes at each time step, with some probability p. So node n1 will use AND 5 out of 10 times, OR 3/10 times, NAND 2/10 times. The dynamics are similar to BNs with absorbing states and irreducible sets, this time, being the attractors.
  4. BNs have been used to model Gene Regulatory Networks Seminal work by Reka Albert, who gave the keynote in last year's Complex Networks conf (Lisbon), another example - the fission yeast cycle here There is a nice correspondence between ordered collective behaviour in GRNs and attractors in PBNs. I note here just 2 of the works from systems biology and genetics, who point to the fact that certain attractors are desirable, others not (ref in 3 here); sometimes switching between attractors is a good idea (ref in 4 here) So beyond using PBN for modelling GRNs, one starts to think where and when to intervene (what gene to change from 0 to 1, and vice versa) in order to drive the GRN to a desired state. Kauffman since 1993 tells us that not all interventions are effective.. so we propose the study of PBNs as a dynamical system ... PAUSE.. where change of state on certain genes (your control parameter - micro) has drastic effect on the state of the whole network (your order parameter - macro)
  5. So we approach Control from a complex networks perspective. I quote from Liu Slotine Barabasi’s 2011 paper in Nature: “READ IT OUT LOUD “
  6. And I say this because Control in the literature comes in different flavours. In certain GRNs one can assume control inputs and focus on those only (the green ones in the example at the top) to control the network. Other works consider intervening on one node only, to affect another node’s state (study of Melanoma in ref 9 here)
  7. Just to be clear, in the work in this paper specifically, we ask: “READ IT” And these are the rules of engagement, so to speak. Won’t go through all of them, they are discussed in detail in the paper! -- Interventions allowed on any node. Only one node at a time, so can’t change all of the nodes at once. Respect the internal dynamics as much as possible, so minimum number of interventions, finite number of steps
  8. I mentioned the state space when introducing PBNs. For N nodes, and since nodes are Boolean variables, there are 2^N states. So 9 nodes in the PBN, 512 states Matrix representation of that is 2^N times 2^N
  9. Fission yeast modelled as a PBN of 10 nodes, state space of 1024. So one more node, double the number of states…
  10. In the paper we control a PBN of 20 nodes, that is about a MILLION states!
  11. We have also worked with a PBN with 28 nodes since submitting this paper, which takes us closer to 270M states! So the point here is that although there are some elegant techniques around, the PTM is intractable for larger networks. See this paper here from 2007 … So… can we work without the PTM?
  12. YES WE CAN ! We formulate the problem as one of reward maximization and use RL ---- this is what the paper is about. I try to give you a quick tour of how we do this.
  13. Very briefly, in RL the agent learns by interaction with the environment. The agent's goal is to maximise the expected cumulative reward. The key word here is cumulative. So having been presented with a state by the env, the Agent selects an action and that choice not only determines the reward it receives at the next step, but also affects the next state it is presented with... and by virtue of that, also the set of actions it can take at that next step. So selecting an action is important, what policy the agent follows, as we say... this may be random or more sophisticated.
  14. The environment is often modelled as Markov Decision Process. -The binary states of the PBN become the states in the MDP -There are N+1 actions at each state: the option for flipping each of the N nodes plus doing nothing -There is the probability of function selection at each node, at each time step
  15. As far as the agent is concerned we equip it in our work with a Q-Learning algorithm. This is a Temporal-Difference Learning method ....so it combines learning from experience (so sampling like MC) with bootstrapping (like DP). Importantly Q-Learning is model-free, so no knowledge of the dynamics is required. So we don’t need the PTM (CLICK!), which as we saw earlier is intractable for larger networks. Also it is off-policy and I will come back to this later. The idea behind Temporal-Difference (TD) Learning methods is to learn directly from the environment but update estimates based on other learned estimates, without waiting for the end of the episode (bootstrap – like DP)
  16. As you probably suspect it is important to be able to calculate the expected reward of each action (as that can be used in selecting actions at each step). This formula here is key for improving the estimates. I will not go into detail, this is pretty standard in RL. So this is the expected reward at t+1 having taken a at t. This is the expected value of the best action available from that next state (target for the update). And if I take out the current estimate, this gives me the error in the estimate. And α is the “learning rate” which says how much I should take this error into account when updating to the new estimate …
  17. So that formula tells us how to improve our estimated Q values.... But how do we get to the true Q values?
  18. The answer is… Approximate iteratively. In this work, we use and epsilon-greedy policy so with prob epsilon we choose an action randomly with prob 1- epsilon we choose greedily – CLICK so we go for the action with the greatest expected reward Important note: Q converges to Q* with this policy
  19. So we know how to get the true values. PAUSE But there is some computational cost associated with storing these, especially in large state spaces.
  20. We turn to Deep RL to address this. We use a function approximator to learn a parameterised form This Deep Q Net (DQN) is trained by minimizing a sequence of loss functions L_i you see here…
  21. Now TD-Learning is often susceptible to overestimating….
  22. To address this, and I won’t go into much detail here, we use a __separate__ network for the target during training, And we use another, a second network, the so-called “policy DQN”, for updating the parameters of the “target DQN” [CLICK] every k time steps. So we effectively use a Double DQN (DDQN) in our control method. -- The parameters from this second DQN, the so-called “policy DQN”, are used to update the parameters of the “target DQN”
  23. When it comes to rewards, we assign negative rewards for actions that do not lead to the desired attractor and positive for those that do. We also take into account the internal dynamics, the natural tendency of a PBN to gravitate towards an attractor, which might not be the desired one, so we penalise more the actions that lead to a non-target attractor.
  24. Before we talk results, last point on the method… Training from successive experiences is susceptible to strong correlations in the data.
  25. To break up such correlations we sample from a batch of experiences, at each t, to update the network parameters. CLICK Size of the sample varies depending on the size of the PBN.
  26. The particular flavour of PER we use in this work is Proportional, as compared to rank-based for example, and we also use importance weighting .
  27. ... RESULTS !
  28. We have applied our method to various networks. HEADLINE NEWS: our DRL method leads to successful control, over 99% of the time, from any initial state! Some highlights: CLICK On the synthetic PBN20 (just over 1M states), the target attractor occurs 1/10K times (!), it would take 6,500 random interventions to reach that attractor; in comparison, our DRL method needs under 100 interventions to take us there (in fact, the success rate only drops to 99% if we limit the agent to 15 interventions) On the PBN of a real GRN, from the well-studied Melanoma gene expression dataset, we get a 100% success rate when allowing up to 10 interventions; if we limit to 7, success drops by only a fraction to 99.72% So we are quite pleased with the results.
  29. That made us think: Can we do larger networks? Can we do the other type of control? I refer to the problem that other work on control has been looking at: play with some subset of genes (CLICK pirin, in the studies I reference here), to fix the state of another gene (CLICK WNT5A OFF here)
  30. We applied our method to this kind of control on the 7-node PBN from the melanoma GRN We did it on the 28 node PBN -- which is the largest PBN addressed in existing literature. We get favourable results! Recently, we have been successful in controlling a 70-node PBN from this Melanoma dataset. So these developments make us hopeful we can control larger networks!!
  31. That's all from me! I hope you found at least some parts of this talk interesting. Please do get in touch if you are working on something similar. Thank you.