1. The Big Sick 2018
OPTIMISING SEPSIS TREATMENT
WITH REINFORCEMENT LEARNING
Dr Matthieu Komorowski
Consultant, Intensive Care Unit, Charing Cross Hospital, London
PhD student, Dept of Surgery and Cancer, Dept of Bioengineering, Imperial College London
Visiting scientist, Lab of Computational Physiology, MIT
Affiliate, Harvard School of Engineering and Applied Sciences
@matkomorowski matthieu.komorowski@gmail.com
4. Sepsis: the big picture
[Rhodes 2017, Carneiro 2017, www.SCCM.org, www.CDC.gov]
• > 25 million cases annually worldwide
• Infectious diseases: 2nd or 3rd global cause of mortality
• Main cause of in-hospital deaths
• Most expensive condition treated in hospitals (US: $24B/year)
5. Treatment of sepsis
1. Control the source of infection: OK
2. Correct hypovolaemia and vasoplegia
3. Treat secondary complications (organ failures): ± OK
6. Correcting hypovolaemia and vasoplegia
Unanswered questions:
• How do we measure volaemia?
• What is the correct volaemia?
• What volume of IV fluid is required?
• When is the right time to initiate vasopressors?
• What is the right balance between IV fluids and vasopressors?
• What parameters should we target?
8. Current state-of-the-art of sepsis management
• The demise of Early Goal-Directed Therapy
• No real-time decision support to deliver “precision medicine”
[Marik Acta Scand 2015, Andrews JAMA 2017, PRISM Investigators NEJM 2017]
9. Machine learning = "learning from data"
• Supervised learning: learn the function y = f(x) (1. regression, 2. classification)
• Unsupervised learning: learn the structure of the data (1. clustering, 2. dimensionality reduction)
• Reinforcement learning: learn an optimal strategy
11. Medical cognitive process
A medical decision combines data from the new patient, theoretical medical knowledge, and clinical experience (cases previously encountered).
Difficulties:
• Cognitive biases
• Lack of a physiological/pathological model
• Lack of theoretical knowledge
• Similar cases seen previously but forgotten
• Rare cases
• Wrong diagnoses
• Practice variations
• Etc.
12. The "perfect physician"
• Complete knowledge of all human physiology and diseases, of all existing treatment options, and of which one is optimal
OR
• Permanent and unbiased knowledge of a vast number of very similar patients, of the treatments they received, and of their outcomes
[Figure: a new patient matched against previously seen patients and their mortality outcomes]
14. Medical decision as a reinforcement learning problem
• Agent: the physician, following policy π
• Environment: the patient
• State s = the patient's condition
• Action a = the prescription of a dose of drug (IV fluids and vasopressors)
• Reward r = the change in mortality risk
Objective 1: What is the physician's policy?
Objective 2: What is the optimal policy π*?
[Sutton & Barto, 2017]
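The mapping on this slide can be sketched as the standard agent-environment loop. Everything below is a toy stand-in: the policy rule, the transition function, and the state-space size are made up for illustration; only the 25-action dose grid comes from the talk.

```python
# Toy sketch (not the study's code) of the clinician-patient RL loop:
# observe state, pick a (fluids, vasopressor) action, receive a reward.

def physician_policy(state):
    """Stand-in for the clinicians' policy pi: map a discretised
    patient state to one of 25 (fluid, vasopressor) dose actions."""
    return state % 25  # placeholder rule

def step(state, action):
    """Placeholder patient 'environment'. In the study, transitions come
    from recorded ICU data and the reward reflects the change in
    mortality risk (nonzero only at discharge or death)."""
    next_state = (state * 31 + action) % 750  # arbitrary toy dynamics
    reward = 0.0
    return next_state, reward

state = 12
trajectory = []
for _ in range(3):  # three 4-hour decision points
    action = physician_policy(state)
    next_state, reward = step(state, action)
    trajectory.append((state, action, reward))
    state = next_state
```

In the study the loop is replayed from recorded trajectories rather than simulated, which is exactly the constraint discussed on the next slide.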
15. Why is it harder than playing Atari games?
In medicine:
• Limited amount of training data
• Environment not fully specified
• Impossible to learn by trial-and-error
• No simulator to test suggested strategies
[Mnih Nature 2015]
16. The datasets
• Inclusion: adults with sepsis
• Data: time series of 48 variables
• Up to 72 h of data per patient
Development dataset: MIMIC-III (17,898 patients from 5 ICUs)
Validation dataset: eICU-RI (80,257 patients from 128 ICUs)
22. Estimated mortality with optimal decisions
• What mortality gain can be expected with optimal decisions?
• Method: random forest regression model
• Predicted hospital mortality risk with optimal actions: 9.6% (95% CI 9.1%–10.1%), compared with an actual mortality of 17.7%
24. Conclusion
• Current sepsis management is suboptimal
• Reinforcement learning could lead to the development of
decision support systems for sepsis
• Flexible framework transferable to other clinical questions
26. Markov Decision Process
• A general framework for modelling sequential, stochastic and dynamic decisions [Schaefer 2005]
• Defined by (S, A, T, R):
  • S: a finite set of states
  • A: a finite set of actions
  • T(s_{t+1} | s_t, a_t): the transition matrix
  • R: the immediate reward ∈ {−100, +100}
[Figure: toy transition graph between patient states (e.g. states 12, 65, 91, 307) under different actions, ending in survival (+100) or death (−100); the actual policy π and the optimal policy π* trace different paths]
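The (S, A, T, R) tuple above is enough to run dynamic programming on a toy example. The sketch below uses made-up states and transition probabilities, with the talk's terminal rewards of +100 (survival) and −100 (death), and finds the optimal policy by policy iteration.

```python
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.99
terminal = np.array([False, False, True, True])  # states 2 (survival), 3 (death)

# T[a, s, s']: toy transition probabilities; terminal states are absorbing.
T = np.zeros((n_actions, n_states, n_states))
T[0, 0] = [0.0, 0.5, 0.4, 0.1]
T[1, 0] = [0.0, 0.2, 0.2, 0.6]
T[0, 1] = [0.3, 0.0, 0.6, 0.1]
T[1, 1] = [0.1, 0.0, 0.3, 0.6]
T[:, 2, 2] = 1.0
T[:, 3, 3] = 1.0

R = np.array([0.0, 0.0, 100.0, -100.0])  # reward received on entering a state

def q_values(V):
    """Q[a, s] = sum_s' T[a, s, s'] * (R[s'] + gamma * V[s'])."""
    return T @ (R + gamma * V)

V = np.zeros(n_states)
policy = np.zeros(n_states, dtype=int)
for _ in range(20):                       # policy iteration
    for _ in range(300):                  # iterative policy evaluation
        Q = q_values(V)
        V = np.where(terminal, 0.0, Q[policy, np.arange(n_states)])
    policy = q_values(V).argmax(axis=0)   # policy improvement
```

Here action 0 gives each nonterminal state a better chance of reaching the survival state, so the computed optimal policy picks it in both.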
27. Cohort characteristics
                          Development (MIMIC-III)   Validation (Philips eICU-RI)
# ICU admissions          17,898                    80,257
# ICUs                    5                         128
Primary ICD code:
  Sepsis                  34%                       52%
  Cardiovascular          31%                       14%
  Other respiratory       10%                       11%
  Neurological            9%                        9%
  Other                   15%                       13%
Mean age, years           65                        65
Gender                    56% male                  52% male
Initial SOFA (0-24)       7.3 (3.3)                 7.0 (3.5)
Initial OASIS (0-70)      33.5 (8.8)                34.8 (12.4)
Procedures:
  Mechanical ventilation  55%                       50%
  Vasopressors            35%                       30%
  Dialysis                9%                        8%
Hospital mortality        13.7%                     17.7%
90-day mortality          22.5%                     Not available
29. Action space: 25 actions
Vasopressors = norepinephrine, vasopressin and phenylephrine

Discretised   IV fluids (mL in 4h)       Vasopressors (unitless)
action        Range         Median dose  Range           Median dose
1             0             0            0               0
2             ]0-140]       56           ]0-0.06]        0.04
3             ]140-350]     240          ]0.06-0.14]     0.1
4             ]350-675]     486          ]0.14-0.38]     0.2
5             >675          1150         >0.38           0.6
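The 5 × 5 grid above can be sketched as a small discretisation routine. The cut-points are the bin edges from the table; the function names and the 1–25 action numbering are my own illustrative choices.

```python
import numpy as np

def discretise(dose, edges):
    """Map a dose to bin 1..5: bin 1 for zero dose, otherwise bins 2..5
    by the half-open intervals ]edge_i, edge_{i+1}] from the table."""
    if dose <= 0:
        return 1
    return 2 + int(np.searchsorted(edges, dose))  # 'left' keeps 140 in ]0-140]

fluid_edges = [140, 350, 675]    # mL in 4h, cut-points from the table
vaso_edges = [0.06, 0.14, 0.38]  # vasopressor cut-points from the table

def action_index(fluid, vaso):
    """Combine the two 1..5 bins into a single discretised action 1..25."""
    return (discretise(fluid, fluid_edges) - 1) * 5 + discretise(vaso, vaso_edges)
```

For example, a 240 mL fluid bolus with a 0.1 vasopressor dose lands in fluid bin 3 and vasopressor bin 3, i.e. action 13.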
30. Objective 1: Estimate the value of the clinicians' policy
Method: offline sampling SARSA
Goal: estimate the true value of the physician's policy (the state-action value Q)
Repeat:
  Pick an actual, observed episode, with resampling
  For each step of the episode, observe (s, a, r, s', a')
  Update Q: Q(s, a) ← Q(s, a) + α · (r + γ · Q(s', a') − Q(s, a))

Objective 2: Find the optimal policy
Method: dynamic programming (policy iteration)
Goal: maximise the sum of expected discounted rewards
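The SARSA update for Objective 1 can be sketched as follows. The episodes are toy stand-ins for recorded patient trajectories, and the state count is an arbitrary assumption; only the 25-action grid and the ±100 terminal rewards come from the talk.

```python
import random

n_states, n_actions = 750, 25   # 25 actions as on slide 29; 750 states is assumed
alpha, gamma = 0.1, 0.99
Q = [[0.0] * n_actions for _ in range(n_states)]

def sarsa_update(s, a, r, s_next, a_next):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a));
    at a terminal transition (s_next is None) the target is just r."""
    q_next = Q[s_next][a_next] if s_next is not None else 0.0
    Q[s][a] += alpha * (r + gamma * q_next - Q[s][a])

# Toy stand-ins for observed episodes of (s, a, r, s', a') steps; the
# final transition carries +100 (survival) or -100 (death).
episodes = [
    [(12, 3, 0, 91, 7), (91, 7, 0, 65, 2), (65, 2, 100, None, None)],
    [(12, 8, 0, 307, 4), (307, 4, -100, None, None)],
]

random.seed(0)
for _ in range(2000):                 # sample episodes with resampling
    for s, a, r, s_next, a_next in random.choice(episodes):
        sarsa_update(s, a, r, s_next, a_next)
```

After many replays, Q ranks the fluid/vasopressor choices actually taken in each state: here the action that led to survival ends up valued far above the one that led to death.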
31. MAP target in sepsis
• The model could help identify individual MAP targets
• We can model the best possible trajectory of patients from any state: the "Optimal Path"
• Let's plot the MAP along this optimal path, along with the MAP of survivors and non-survivors who started in the same clinical state
37. Supervised learning: the NEWS score
Objective: predict hospital mortality from 7 clinical parameters
[Figure: survival plot of patients presenting in the ED with respiratory distress]
[Bilben 2016; UK Royal College of Physicians 2012]
38. Ensemble learning
• Principle: ensemble methods build a linear combination of sub-models
• Melds the results from many weak learners into one high-quality ensemble predictor
• "Does at least as well as the best member of its library"
• AUROC in the validation cohort = 0.94
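A minimal sketch of the "linear combination of sub-models" idea, assuming nothing about the study's actual learners: two toy weak learners are combined with least-squares weights. For brevity the weights are fitted in-sample; a real stack fits them on cross-validated (held-out) predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5).astype(float)  # toy binary outcome

# Weak learner 1: an ordinary-least-squares linear model.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_linear = X @ w
# Weak learner 2: a crude single-feature threshold rule.
pred_rule = (X[:, 0] > 0.5).astype(float)

# Ensemble: a linear combination of the sub-models' predictions,
# with the combination weights themselves fitted by least squares.
P = np.column_stack([pred_linear, pred_rule])
weights, *_ = np.linalg.lstsq(P, y, rcond=None)
pred_ensemble = P @ weights

mse = lambda p: float(np.mean((p - y) ** 2))
```

Because the fitted combination includes each sub-model on its own as a special case, the ensemble's in-sample error can never exceed that of its best member, which is the "at least as well as the best member of its library" property quoted above.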