1. The Big Sick 2018
OPTIMISING SEPSIS TREATMENT
WITH REINFORCEMENT LEARNING
Dr Matthieu Komorowski
Consultant, Intensive Care Unit, Charing Cross Hospital, London
PhD student, Dept of Surgery and Cancer, Dept of Bioengineering, Imperial College London
Visiting scientist, Lab of Computational Physiology, MIT
Affiliate, Harvard School of Engineering and Applied Sciences
@matkomorowski matthieu.komorowski@gmail.com
4. Sepsis: the big picture
[Rhodes 2017, Carneiro 2017, www.SCCM.org, www.CDC.gov]
• > 25 million cases annually worldwide
• Infectious diseases: 2nd or 3rd global cause of mortality
• Main cause of in-hospital deaths
• Most expensive condition treated in hospitals (US: $24B/year)
5. Treatment of sepsis
1. Control the source of infection: OK
2. Correct hypovolaemia and vasoplegia
3. Treat secondary complications (organ failures): ± OK
6. Correcting hypovolaemia and vasoplegia
Unanswered questions:
• How do we measure volaemia?
• What is the correct volaemia?
• What volume of IV fluid is required?
• When is the right time to initiate vasopressors?
• What is the right balance between IV fluids and vasopressors?
• What parameters should we target?
8. Current state-of-the-art of sepsis management
• The demise of Early Goal-Directed Therapy
• No real-time decision support to deliver “precision medicine”
[Marik Acta Scand 2015, Andrews JAMA 2017, PRISM Investigators NEJM 2017]
9. Machine learning = "learning from data"
• Supervised learning: learn the function y = f(x) (1. regression, 2. classification)
• Unsupervised learning: learn the structure of the data (1. clustering, 2. dimensionality reduction)
• Reinforcement learning: learn an optimal strategy
11. Medical cognitive process
A medical decision combines data from the new patient, theoretical medical knowledge, and clinical experience (cases previously encountered).
Difficulties:
• Cognitive biases
• Lack of a physiological/pathological model
• Lack of theoretical knowledge
• Similar cases seen previously but forgotten
• Rare cases
• Wrong diagnoses
• Practice variations
• Etc.
12. The "perfect physician"
• Complete knowledge of all human physiology and diseases, of all existing treatment options, and of which one is optimal
OR
• Permanent and unbiased knowledge of a vast number of very similar patients, of the treatments they received, and of their outcomes
[Figure: a new patient matched against previously seen patients and their mortality outcomes]
14. Medical decision as a reinforcement learning problem
• Agent: the physician, following policy π
• Environment: the patient
• State s = the patient's condition
• Action a = the prescription of a dose of drug (IV fluids and vasopressors)
• Reward r = the change in mortality risk
Objective 1: What is the physician's policy?
Objective 2: What is the optimal policy π*?
[Sutton & Barto, 2017]
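The mapping on this slide can be sketched as the standard agent-environment loop. Everything below is a toy stand-in: the policy rule, the transition function, and the state-space size are made up for illustration; only the 25-action dose grid comes from the talk.

```python
# Toy sketch (not the study's code) of the clinician-patient RL loop:
# observe state, pick a (fluids, vasopressor) action, receive a reward.

def physician_policy(state):
    """Stand-in for the clinicians' policy pi: map a discretised
    patient state to one of 25 (fluid, vasopressor) dose actions."""
    return state % 25  # placeholder rule

def step(state, action):
    """Placeholder patient 'environment'. In the study, transitions come
    from recorded ICU data and the reward reflects the change in
    mortality risk (nonzero only at discharge or death)."""
    next_state = (state * 31 + action) % 750  # arbitrary toy dynamics
    reward = 0.0
    return next_state, reward

state = 12
trajectory = []
for _ in range(3):  # three 4-hour decision points
    action = physician_policy(state)
    next_state, reward = step(state, action)
    trajectory.append((state, action, reward))
    state = next_state
```

In the study the loop is replayed from recorded trajectories rather than simulated, which is exactly the constraint discussed on the next slide.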
15. Why is it harder than playing Atari games?
In medicine:
• Limited amount of training data
• Environment not fully specified
• Impossible to learn by trial-and-error
• No simulator to test suggested strategies
[Mnih Nature 2015]
16. The datasets
• Inclusion: adults with sepsis
• Data: time series of 48 variables
• Up to 72 h of data per patient
Development dataset: MIMIC-III (17,898 patients from 5 ICUs)
Validation dataset: eICU-RI (80,257 patients from 128 ICUs)
22. Estimated mortality with optimal decisions
• What mortality gain can be expected with optimal decisions?
• Method: random forest regression model
• Predicted hospital mortality risk with optimal actions: 9.6% (95% CI 9.1%–10.1%), compared with an actual mortality of 17.7%
24. Conclusion
• Current sepsis management is suboptimal
• Reinforcement learning could lead to the development of
decision support systems for sepsis
• Flexible framework transferable to other clinical questions
26. Markov Decision Process
• A general framework for modelling sequential, stochastic and dynamic decisions [Schaefer 2005]
• Defined by (S, A, T, R):
  • S: a finite set of states
  • A: a finite set of actions
  • T(s_{t+1} | s_t, a_t): the transition matrix
  • R: the immediate reward ∈ {−100, +100}
[Figure: toy transition graph between patient states (e.g. states 12, 65, 91, 307) under different actions, ending in survival (+100) or death (−100); the actual policy π and the optimal policy π* trace different paths]
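The (S, A, T, R) tuple above is enough to run dynamic programming on a toy example. The sketch below uses made-up states and transition probabilities, with the talk's terminal rewards of +100 (survival) and −100 (death), and finds the optimal policy by policy iteration.

```python
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.99
terminal = np.array([False, False, True, True])  # states 2 (survival), 3 (death)

# T[a, s, s']: toy transition probabilities; terminal states are absorbing.
T = np.zeros((n_actions, n_states, n_states))
T[0, 0] = [0.0, 0.5, 0.4, 0.1]
T[1, 0] = [0.0, 0.2, 0.2, 0.6]
T[0, 1] = [0.3, 0.0, 0.6, 0.1]
T[1, 1] = [0.1, 0.0, 0.3, 0.6]
T[:, 2, 2] = 1.0
T[:, 3, 3] = 1.0

R = np.array([0.0, 0.0, 100.0, -100.0])  # reward received on entering a state

def q_values(V):
    """Q[a, s] = sum_s' T[a, s, s'] * (R[s'] + gamma * V[s'])."""
    return T @ (R + gamma * V)

V = np.zeros(n_states)
policy = np.zeros(n_states, dtype=int)
for _ in range(20):                       # policy iteration
    for _ in range(300):                  # iterative policy evaluation
        Q = q_values(V)
        V = np.where(terminal, 0.0, Q[policy, np.arange(n_states)])
    policy = q_values(V).argmax(axis=0)   # policy improvement
```

Here action 0 gives each nonterminal state a better chance of reaching the survival state, so the computed optimal policy picks it in both.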
27. Cohort characteristics
                          Development (MIMIC-III)   Validation (Philips eICU-RI)
# ICU admissions          17,898                    80,257
# ICUs                    5                         128
Primary ICD code:
  Sepsis                  34%                       52%
  Cardiovascular          31%                       14%
  Other respiratory       10%                       11%
  Neurological            9%                        9%
  Other                   15%                       13%
Mean age, years           65                        65
Gender                    56% male                  52% male
Initial SOFA (0-24)       7.3 (3.3)                 7.0 (3.5)
Initial OASIS (0-70)      33.5 (8.8)                34.8 (12.4)
Procedures:
  Mechanical ventilation  55%                       50%
  Vasopressors            35%                       30%
  Dialysis                9%                        8%
Hospital mortality        13.7%                     17.7%
90-day mortality          22.5%                     Not available
29. Action space: 25 actions
Vasopressors = norepinephrine, vasopressin and phenylephrine

Discretised   IV fluids (mL in 4h)       Vasopressors (unitless)
action        Range         Median dose  Range           Median dose
1             0             0            0               0
2             ]0-140]       56           ]0-0.06]        0.04
3             ]140-350]     240          ]0.06-0.14]     0.1
4             ]350-675]     486          ]0.14-0.38]     0.2
5             >675          1150         >0.38           0.6
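The 5 × 5 grid above can be sketched as a small discretisation routine. The cut-points are the bin edges from the table; the function names and the 1–25 action numbering are my own illustrative choices.

```python
import numpy as np

def discretise(dose, edges):
    """Map a dose to bin 1..5: bin 1 for zero dose, otherwise bins 2..5
    by the half-open intervals ]edge_i, edge_{i+1}] from the table."""
    if dose <= 0:
        return 1
    return 2 + int(np.searchsorted(edges, dose))  # 'left' keeps 140 in ]0-140]

fluid_edges = [140, 350, 675]    # mL in 4h, cut-points from the table
vaso_edges = [0.06, 0.14, 0.38]  # vasopressor cut-points from the table

def action_index(fluid, vaso):
    """Combine the two 1..5 bins into a single discretised action 1..25."""
    return (discretise(fluid, fluid_edges) - 1) * 5 + discretise(vaso, vaso_edges)
```

For example, a 240 mL fluid bolus with a 0.1 vasopressor dose lands in fluid bin 3 and vasopressor bin 3, i.e. action 13.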
30. Objective 1: Estimate the value of the clinicians' policy
Method: offline sampling SARSA
Goal: estimate the true value of the physician's policy (the state-action value Q)
Repeat:
  Pick an actual, observed episode, with resampling
  For each step of the episode, observe (s, a, r, s', a')
  Update Q: Q(s, a) ← Q(s, a) + α · (r + γ · Q(s', a') − Q(s, a))

Objective 2: Find the optimal policy
Method: dynamic programming (policy iteration)
Goal: maximise the sum of expected discounted rewards
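The SARSA update for Objective 1 can be sketched as follows. The episodes are toy stand-ins for recorded patient trajectories, and the state count is an arbitrary assumption; only the 25-action grid and the ±100 terminal rewards come from the talk.

```python
import random

n_states, n_actions = 750, 25   # 25 actions as on slide 29; 750 states is assumed
alpha, gamma = 0.1, 0.99
Q = [[0.0] * n_actions for _ in range(n_states)]

def sarsa_update(s, a, r, s_next, a_next):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a));
    at a terminal transition (s_next is None) the target is just r."""
    q_next = Q[s_next][a_next] if s_next is not None else 0.0
    Q[s][a] += alpha * (r + gamma * q_next - Q[s][a])

# Toy stand-ins for observed episodes of (s, a, r, s', a') steps; the
# final transition carries +100 (survival) or -100 (death).
episodes = [
    [(12, 3, 0, 91, 7), (91, 7, 0, 65, 2), (65, 2, 100, None, None)],
    [(12, 8, 0, 307, 4), (307, 4, -100, None, None)],
]

random.seed(0)
for _ in range(2000):                 # sample episodes with resampling
    for s, a, r, s_next, a_next in random.choice(episodes):
        sarsa_update(s, a, r, s_next, a_next)
```

After many replays, Q ranks the fluid/vasopressor choices actually taken in each state: here the action that led to survival ends up valued far above the one that led to death.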
31. MAP target in sepsis
• The model could help identify individual MAP targets
• We can model the best possible trajectory of patients from any state: the "Optimal Path"
• Let's plot the MAP along this optimal path, along with the MAP of survivors and non-survivors who started in the same clinical state
37. Supervised learning: the NEWS score
Objective: predict hospital mortality from 7 clinical parameters
[Figure: survival plot of patients presenting in the ED with respiratory distress]
[Bilben 2016; UK Royal College of Physicians 2012]
38. Ensemble learning
• Principle: ensemble methods build a linear combination of sub-models
• Melds the results from many weak learners into one high-quality ensemble predictor
• "Does at least as well as the best member of its library"
• AUROC in the validation cohort = 0.94
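A minimal sketch of the "linear combination of sub-models" idea, assuming nothing about the study's actual learners: two toy weak learners are combined with least-squares weights. For brevity the weights are fitted in-sample; a real stack fits them on cross-validated (held-out) predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5).astype(float)  # toy binary outcome

# Weak learner 1: an ordinary-least-squares linear model.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_linear = X @ w
# Weak learner 2: a crude single-feature threshold rule.
pred_rule = (X[:, 0] > 0.5).astype(float)

# Ensemble: a linear combination of the sub-models' predictions,
# with the combination weights themselves fitted by least squares.
P = np.column_stack([pred_linear, pred_rule])
weights, *_ = np.linalg.lstsq(P, y, rcond=None)
pred_ensemble = P @ weights

mse = lambda p: float(np.mean((p - y) ** 2))
```

Because the fitted combination includes each sub-model on its own as a special case, the ensemble's in-sample error can never exceed that of its best member, which is the "at least as well as the best member of its library" property quoted above.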