Using Machine Learning to Automate Clinical Pathways
David Sontag, PhD
Department of Computer Science
Courant Institute of Mathematical Sciences
NYU
Joint work with my student Yoni Halpern (NYU) and Steven Horng (Beth Israel Deaconess Medical Center)
  
Health Information Technology is Rapidly Changing
•  Aided by the HITECH Act, hospital adoption of EHRs has increased 5-fold since 2008 [Charles et al., ONC Data Brief, May 2014]
•  Over $4 billion of investment in digital health startups in 2014
Funding categories shown: Analytics / Big Data; Healthcare Consumer Engagement; EHR / Clinical Workflow; Digital Diagnostics; Population Health Management; Digital Medical Device
[Wang et al., “Digital health funding in Q1 2015 over $600M”, Rock Health, April 2015]
  
Wealth of digital health data available
[Weber et al. (2014). Finding the Missing Link for Big Biomedical Data. JAMA.]
  
Research in my clinical ML lab
•  Next-generation electronic health records (focus of today’s talk)
•  Population-level risk stratification
•  Better managing patients with chronic disease
clinicalml.org
  
Emergency Department:
•  Limited resources
•  Time sensitive
•  Critical decisions
  
Next-Generation EHR for the Emergency Department
Timeline of an ED visit (T=0, 30 min, 2 hrs, Disposition) and the data collected along it:
•  Triage information (free text)
•  Lab results (continuous valued)
•  MD comments (free text)
•  Specialist consults
•  Physician documentation
•  Repeated vital signs (continuous values), measured every 30 s

Built on Top of Real-time Prediction of Clinical State Variables
All patient observations: MD/nurse documentation, Billing codes, Vitals, Orders, Labs, History
   → Machine learning and natural language processing
Clinical state variables: From nursing home? Has altered mental status? Has cardiac etiology? Has infection? Will die in next 30 days?
Action: Alerts/Reminders, Decision support, Cohort Selection, QA review, Contextual display
Example actions: Advise fall precautions, Suggested order sets, Triggering cellulitis pathway, Sepsis alert, Panel management
  
Example: Triggering Clinical Pathways
•  Clinical Pathways project at Beth Israel Deaconess Medical Center (BIDMC)
•  Standardizing care in the Emergency Department
   –  Reduce possibilities for error
   –  Enforce established best practices
•  Pathways have been shown to reduce in-hospital complications without increasing costs [Rotter et al. 2010]
  
Cellulitis Pathway Flowchart
  
Automating triggers
•  Don’t rely on the user’s knowledge that the pathway exists!
  
Current triggering mechanism (Cellulitis pathway)
Trigger if chief complaint contains any of the following:
CELLULITIS, REDDENED HOT LIMB, ERYTHEMA, LEG SWELLING, INFECTION, HAND, LEG, FOOT, TOE, ARM, FACE, FINGER
Expert-constructed rule, built for sensitivity
Could we learn a better rule?
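
The keyword trigger described above can be sketched in a few lines. This is only an illustration: the keyword list is copied from the slide, while the function name, the case-insensitive substring matching, and the sample complaints are assumptions, not the BIDMC implementation.

```python
# Keywords copied from the slide. Everything else is a hypothetical sketch.
CELLULITIS_KEYWORDS = [
    "CELLULITIS", "REDDENED HOT LIMB", "ERYTHEMA", "LEG SWELLING",
    "INFECTION", "HAND", "LEG", "FOOT", "TOE", "ARM", "FACE", "FINGER",
]


def triggers_cellulitis_pathway(chief_complaint: str) -> bool:
    """Return True if the chief complaint contains any trigger keyword.

    Assumption: "contains" means case-insensitive substring matching.
    """
    text = chief_complaint.upper()
    return any(keyword in text for keyword in CELLULITIS_KEYWORDS)


# Made-up chief complaints, for illustration only.
for complaint in ["Cellulitis of left leg", "Hand laceration", "Chest pain"]:
    print(complaint, "->", triggers_cellulitis_pathway(complaint))
```

A rule like this fires on any complaint that mentions a body part, such as a hand laceration. That is the price of building for sensitivity, and it motivates the question of whether a better rule can be learned.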
  
Supervised learning is a non-starter
•  Leverage large clinical databases to learn predictive rules.
•  Need labeled data
•  Classifiers often don’t generalize across institutions
Data types shown: LOINC, UMLS CUID, RxNorm, ICD9, unstructured data

Our contribution: Anchor & Learn Framework
•  Use a combination of domain expertise (simple rules) and vast amounts of data (machine learning).
•  Method does not require any manual labeling.
•  Anchors are highly transferable between institutions.
[Halpern et al., AMIA 2014]
  
What are anchors?
•  Rather than provide gold-standard labels, construct a simple rule that can catch some positive cases. Low sensitivity here is ok!
•  Examples:
   Phenotype      Possible Anchor
   Diabetic       gsn:016313 (insulin) in Medications
   Cardiac        ICD9:428.X (heart failure) in Diagnoses
   Nursing home   “from nursing home” in text
   Social work    “social work consulted” in text
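
One way to picture the anchors in the table is as small boolean rules over a patient record. The sketch below assumes a hypothetical `Patient` record with `medications`, `diagnoses`, and `text` fields. Only the anchor values themselves (the GSN code, the ICD9 prefix, and the two phrases) come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical record layout, assumed for illustration only."""
    medications: set = field(default_factory=set)  # e.g. {"gsn:016313"}
    diagnoses: set = field(default_factory=set)    # e.g. {"428.0"}
    text: str = ""                                 # free-text notes


# Anchor values taken from the slide. The rule encoding is an assumption.
ANCHORS = {
    "diabetic": lambda p: "gsn:016313" in p.medications,
    "cardiac": lambda p: any(code.startswith("428.") for code in p.diagnoses),
    "nursing_home": lambda p: "from nursing home" in p.text.lower(),
    "social_work": lambda p: "social work consulted" in p.text.lower(),
}


def anchor_labels(patient: Patient) -> dict:
    """Return a 0/1 indicator per phenotype: 1 means the anchor fired."""
    return {name: int(rule(patient)) for name, rule in ANCHORS.items()}


example = Patient(medications={"gsn:016313"},
                  text="Pt sent from nursing home with fever")
print(anchor_labels(example))
```

A 0 here does not mean the phenotype is absent. It only means this particular rule did not fire, which is why low sensitivity is acceptable.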
  
Learning with Anchors
Patient database (LOINC, UMLS CUID, RxNorm, ICD9, unstructured data), with one anchor indicator per patient: 1 0 1 1 0 0 1
•  Identify anchors
•  Learn to predict the anchors (anchors as pseudo-labels)
•  Account for the difference between anchors and labels
Transform: Predict anchor → Predict label
  
Generalizability/Portability
New institution (LOINC, UMLS CUID, RxNorm, ICD9, different data types)
Data may be very different:
•  Language
•  Representation
•  Population
As long as our anchors appear in the new data as well, we can learn a new model specific to the new institution.
Only need to share anchor definitions; each site trains models on its own data.

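
The portability idea is that only anchor definitions travel between institutions. A minimal sketch follows, assuming a hypothetical JSON format for anchor definitions and a hypothetical record layout at each site. Neither is described in the talk. Each site applies the shared definitions to its own records to obtain its own pseudo-labels.

```python
import json

# Hypothetical shared file: only these definitions would leave an institution.
SHARED_ANCHORS = json.dumps([
    {"phenotype": "cardiac", "field": "diagnoses",
     "kind": "code_prefix", "value": "428"},
    {"phenotype": "nursing_home", "field": "text",
     "kind": "phrase", "value": "from nursing home"},
])


def anchor_fires(spec, record):
    """Evaluate one anchor definition against one local patient record."""
    value = record.get(spec["field"])
    if value is None:
        return False
    if spec["kind"] == "code_prefix":
        return any(code.startswith(spec["value"]) for code in value)
    if spec["kind"] == "phrase":
        return spec["value"] in value.lower()
    raise ValueError(f"unknown anchor kind: {spec['kind']}")


def pseudo_labels(shared_json, records):
    """Compute per-phenotype anchor indicators for a site's own records."""
    specs = json.loads(shared_json)
    return {s["phenotype"]: [int(anchor_fires(s, r)) for r in records]
            for s in specs}


# Made-up records for two sites. Patient data never leaves either site.
site_a = [{"diagnoses": ["428.0"], "text": "sob, chf exacerbation"},
          {"diagnoses": ["682.6"], "text": "Sent from nursing home, ams"}]
site_b = [{"diagnoses": [], "text": "pt arrives FROM NURSING HOME via ambulance"},
          {"diagnoses": ["428.22", "401.9"], "text": ""}]

print(pseudo_labels(SHARED_ANCHORS, site_a))
print(pseudo_labels(SHARED_ANCHORS, site_b))
```

Each site would then train its own classifier on these pseudo-labels, so differences in language, representation, and population are absorbed locally.
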
Theoretical basis for anchors
•  Unobserved variable: Y, Observation: A
•  A is an anchor for Y if conditioning on A=1 gives uniform samples from the set of positive cases.
•  Alternative formulation, two necessary conditions (X represents all other observations):
   Positive condition: $P(Y = 1 \mid A = 1) = 1$
      e.g. If the patient is taking insulin, the patient is surely diabetic.
   AND
   Conditional independence: $A \perp X \mid Y$
      e.g. If we know the patient had heart failure, knowing whether the diagnosis code appears does not inform us about the rest of the record.
•  Theorem [Elkan & Noto 2008]: In the above setting, a function to predict A can be transformed to predict Y
•  Can also use more recent advances on learning with noisy labels (e.g., Natarajan et al., NIPS ‘13)
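
The transformation in the theorem can be motivated by a short derivation from the two anchor conditions. This is a sketch of the standard argument rather than a quotation from the talk. The symbol c is notation assumed here, and the derivation assumes c > 0 and that negative cases occur with nonzero probability.

```latex
\begin{align*}
P(A = 1 \mid X)
  &= \sum_{y \in \{0,1\}} P(A = 1 \mid Y = y, X)\, P(Y = y \mid X) \\
  &= \sum_{y \in \{0,1\}} P(A = 1 \mid Y = y)\, P(Y = y \mid X)
     && \text{(conditional independence, } A \perp X \mid Y\text{)} \\
  &= P(A = 1 \mid Y = 1)\, P(Y = 1 \mid X)
     && \text{(positive condition gives } P(A = 1 \mid Y = 0) = 0\text{)} \\
\Rightarrow\quad
P(Y = 1 \mid X) &= \frac{P(A = 1 \mid X)}{c},
  \qquad c := P(A = 1 \mid Y = 1).
\end{align*}
```

A predictor of the anchor is therefore a predictor of the label up to the constant factor 1/c, which is why a low-sensitivity anchor (small c) is still usable.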
  
Learning with anchors [Elkan & Noto 2008]
Input: anchor A, unlabeled patients
Output: prediction rule
1.  Learning: learn a calibrated classifier (e.g. logistic regression) to predict A from the other variables:
    $\Pr(A = 1 \mid \tilde{X})$
2.  Calibration: using a validation set, let P be the patients with A=1. Compute C, the average model prediction for patients with anchors:
    $C = \frac{1}{|P|} \sum_{k \in P} \Pr(A = 1 \mid \tilde{X}^{(k)})$
3.  Transformation: for a previously unseen patient t, predict 1 if the anchor is present; if no anchor is present, predict according to a scaled version of the anchor-prediction model:
    $\frac{1}{C} \Pr(A = 1 \mid X^{(t)})$ if $A^{(t)} = 0$
    $1$ if $A^{(t)} = 1$
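
The three steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the authors' software. The logistic regression is a hand-written gradient-descent fit with no calibration check, the patients are synthetic with two made-up binary features, and the `min(1.0, ...)` clip in the last step is an addition that the slide does not mention.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, a, epochs=300, lr=0.5):
    """Step 1 (Learning): batch gradient descent for Pr(A = 1 | X)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(X, a):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_anchor(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def calibration_constant(model, X_val, a_val):
    """Step 2 (Calibration): mean prediction over validation patients with A = 1."""
    scores = [predict_anchor(model, x) for x, a in zip(X_val, a_val) if a == 1]
    return sum(scores) / len(scores)


def predict_label(model, C, x, a):
    """Step 3 (Transformation): 1 if anchored, else the scaled anchor prediction."""
    if a == 1:
        return 1.0
    return min(1.0, predict_anchor(model, x) / C)  # clip is an added assumption


def make_patients(n, rng):
    """Synthetic data: hidden label y, features depend on y, anchor only if y."""
    X, A = [], []
    for _ in range(n):
        y = rng.random() < 0.3
        x1 = float(rng.random() < (0.8 if y else 0.1))  # informative feature
        x2 = float(rng.random() < 0.5)                  # noise feature
        X.append([x1, x2])
        A.append(int(y and rng.random() < 0.4))         # low-sensitivity anchor
    return X, A


rng = random.Random(0)
X_train, A_train = make_patients(400, rng)
X_val, A_val = make_patients(200, rng)

model = fit_logistic(X_train, A_train)
C = calibration_constant(model, X_val, A_val)
print("C =", round(C, 3))
print("unanchored, x1=1:", round(predict_label(model, C, [1.0, 0.0], 0), 3))
print("unanchored, x1=0:", round(predict_label(model, C, [0.0, 0.0], 0), 3))
```

Note that the hidden label y is never shown to the classifier. Only the anchor indicator is used for training, which is the point of the method.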
  
User interface to specify anchors
Panels shown: Specified anchors; Automated suggestions; Detailed patient display; Ranked patient list; Patient filters
Rapid iteration: ~30 min to add a new clinical state variable
Software freely available: clinicalml.org
  
Learned model: Cellulitis (using 200K patients’ data, 2008-2013)
Anchors: ICD9 680-686 (Infections of skin and subcutaneous tissue)
Highly weighted features (covariates), from Pyxis and unstructured text:
cellulitis, cellulitic, cellulits, paronychia, pilonidal, bite, cyst, boil, abcess, abscess, abcesses, red, redness, reddness, erythema, unasyn, vanco, finger, thumb, rle, lle, gluteal, cephalexin, vancomycin, clindamycin, cephazolin, amoxicillin, sulfameth/trimeth
  
Learned model: Cardiac Etiology (using 200K patients’ data, 2008-2013)
Figure labels: Anchors; Highly weighted features (covariates); ICD9 codes; Pyxis; Unstructured text; Medications; Ages
ICD9 codes: 410.* acute MI; 411.* other acute …; 413.* angina pectoris; 785.51 card. shock
Pyxis: coron. vasodilators; loop diuretic
Ages: age=80-90, age=70-80, age=90+
Medications: aspirin, clopidogrel, Heparin Sodium, Metoprolol Tartrate, Morphine Sulfate, Integrilin, Labetalol
Unstructured text: cp, chest pain, edema, cmed, chf exacerbation, sob, pedal edema
Other terms shown: cmed, nstemi, stemi, ntg, lasix, nitro, lasix, furosemide, Sex=M
  
Learned model: Nursing Home (using 200K patients’ data, 2008-2013)
Figure labels: Anchors; Highly weighted features (covariates); Pyxis; Unstructured text; Medications; Ages
Terms shown: nursing facility, nursing home, nsg facility, nsg home, nsg. home, from, staff, at, resident, sent, reported, baseline, changes, nonverbal, ams, unwitnessed_fall, confusion, senna, colace, trazodone, dnr, full code, g tube, foley, nh, mirtazapine, maalox, tums
Ages: age=90+, age=80-90, age=70-80
Medications: vancomycin, levofloxacin
  
Evaluation: ED red flags
We gathered gold-standard labels for these 9 variables by adding questions to the EMR at time of ED disposition:
•  Active malignancy
•  Fall
•  Cardiac Etiology
•  Infection
•  From Nursing Home
•  Anticoagulated
•  Immunosuppressed
•  Septic Shock
•  Pneumonia
  
Comparison to Existing Approaches
•  (Rules) Predict just according to the anchors.
   –  1 if anchor is present, 0 otherwise
•  (ML) Machine learning (logistic regression)
   –  Using up to 3K labels
   –  Improves with more labels, but labels are expensive!
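
The "Rules" baseline is simple enough to write out. The sketch below scores it against gold labels on a tiny made-up example. The arrays are invented for illustration and are not results from the talk, and sensitivity and positive predictive value are metrics chosen here, not necessarily the ones reported.

```python
def rules_baseline(anchor_present):
    """Predict 1 if the anchor is present, 0 otherwise."""
    return [int(a) for a in anchor_present]


def sensitivity_and_ppv(predictions, gold):
    """Return (sensitivity, positive predictive value) for 0/1 lists."""
    tp = sum(1 for p, g in zip(predictions, gold) if p == 1 and g == 1)
    fn = sum(1 for p, g in zip(predictions, gold) if p == 0 and g == 1)
    fp = sum(1 for p, g in zip(predictions, gold) if p == 1 and g == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, ppv


# Invented toy data: the anchor fires for only some of the true positives.
anchors = [1, 0, 0, 1, 0, 0]
gold = [1, 1, 0, 1, 1, 0]

sens, ppv = sensitivity_and_ppv(rules_baseline(anchors), gold)
print("sensitivity:", sens, "ppv:", ppv)
```

In this toy case the baseline is precise but misses half of the positives, which is the gap the learned anchor model is meant to close without requiring extra labels.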
  
Accuracy of predictions (figure)
  
Scaling this up
•  Currently making predictions for 40 clinical variables within the BIDMC patient display
   –  e.g. allergic reaction, motor vehicle accident, hiv+
•  Only turned on for a small number of clinicians
Suggested tags: MD can accept/reject
Accepting a tag triggers events (pathway enrollment, specialized order sets, etc.)
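
The accept/reject flow can be pictured as a small event dispatcher in which accepting a suggested tag runs whatever actions are registered for it. The sketch below is hypothetical throughout: the handler registry, function names, and returned messages are assumptions and do not describe the BIDMC system.

```python
# Hypothetical registry mapping a tag name to the actions it triggers.
handlers = {}


def on_accept(tag):
    """Decorator that registers a function to run when `tag` is accepted."""
    def register(func):
        handlers.setdefault(tag, []).append(func)
        return func
    return register


@on_accept("cellulitis")
def enroll_in_pathway(patient_id):
    return f"enroll patient {patient_id} in cellulitis pathway"


@on_accept("cellulitis")
def suggest_order_set(patient_id):
    return f"suggest cellulitis order set for patient {patient_id}"


def review_tag(tag, patient_id, accepted):
    """Run the registered actions only if the clinician accepted the tag."""
    if not accepted:
        return []
    return [handler(patient_id) for handler in handlers.get(tag, [])]


print(review_tag("cellulitis", "p1", accepted=True))
print(review_tag("cellulitis", "p2", accepted=False))
```

Keeping the clinician's decision as the gate means the model only suggests, and no pathway enrollment happens without an explicit accept.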
  
Our next steps
•  Shared library of anchored phenotypes
•  Real-time estimation of clinical states and actual use for decision support within ED
•  Test portability of anchors to other institutions
More info: clinicalml.org
  
