Scaling is caring
Building scalable feature engineering pipelines for
machine learning in healthcare
April 3, 2019
Codemotion Amsterdam 2019
Introductions
• Michele Tonutti
• Data Scientist at Pacmed
• Intensive Care Team
• Background in Biomedical Engineering and Robotics

Introductions
• Developing machine-learning-driven decision support tools to make healthcare more personal, personalised and precise.
• Patients only get care that has the highest probability of success for them.
• Focus on oncology, emergency care, chronic diseases, and intensive care.
Pacmed focuses on four applications
Emergency care: What is the urgency level of a patient (how quickly should someone see a doctor)?
Intensive care: Predicting risk of ICU and post-ICU complications to support decision-making
Chronic diseases: What is the best treatment (combination) for patients with hypertension, diabetes and/or chronic kidney failure?
Oncology: What are the optimal treatments for the individual patient with colon, prostate or breast cancer?
Intensive care is most promising and furthest developed
The Intensive Care Unit (ICU)
Pacmed is currently working on four prediction problems in intensive care
Discharge decision: predicting the readmission and mortality risk of patients on discharge
[Timeline: t-3 to today (vital signs), t+7 (readmission/mortality)]
Extubation decision: predicting the risk of re-intubation of patients if they are extubated
[Timeline: t-3 to today (respiratory parameters), t+2 (re-intubation)]
Capacity management: predicting the number of full/available beds
[Timeline: t-3 to today (patient inflow & outflow), t+1 and t+2 (bed capacity)]
Predicting complications: e.g. predicting kidney function
[Timeline: t-3 to today (creatinine), t+1 (kidney function)]
Machine-learning based decision support software
Explainable prediction of eligibility for discharge from the ICU
Feature: SATURATION (max value of the admission)
Value: 98%
Interpretation of value: A max value of 98% is lower than 95% of all discharged patients.

Feature: SERUM CREATININE (trend in last 24 hours)
Value: Increase of 20 ml (from 100 to 120)
Interpretation of value: The average patient had a stable serum creatinine during the last 24 hours. The increase of +20 is higher than 99% of discharged patients.

Feature: ALAT (variation in values last 24 hours)
Value: Variation of 7 ml (between 5 and 12)
Interpretation of value: The average patient had a variation of ALAT of 2 in the last 24 hours. A variation of 7 is higher than 76% of all patients.

Feature: URINE OUTPUT (average last 24 hours)
Value: 240 ml
Interpretation of value: An average value of the last 24 hours. The average discharged patient has a urine output of 250.
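The "higher than X% of discharged patients" statements above can be read as percentile ranks against a reference population. The sketch below shows one way to compute such a statement. The function names and the reference values are hypothetical, and this is not necessarily how Pacmed's software derives its explanations.

```python
import numpy as np

def percentile_rank(value, reference):
    """Percentage of reference values that lie strictly below `value`."""
    reference = np.asarray(reference, dtype=float)
    return 100.0 * np.mean(reference < value)

def describe(feature, value, reference):
    """Turn a feature value into a sentence a clinician can read."""
    rank = percentile_rank(value, reference)
    return f"{feature}: {value} is higher than {rank:.0f}% of discharged patients"

# Hypothetical reference population: 24-hour creatinine change of discharged patients.
discharged_creatinine_change = [-5, 0, 0, 2, 3, 5, 8, 10, 15, 25]
print(describe("Serum creatinine trend", 20, discharged_creatinine_change))
```

In practice the reference distribution would be computed once from historical discharges and stored alongside the model, so the same comparison is available in development and in production.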
A pipeline for ICUs that works for both development and production
[Diagram: data from Hospital 1, Hospital 2 and Hospital 3 passes through a shared feature engineering step that serves both development and production]
Feature engineering for medical data is an iterative process
Medical knowledge
Feature engineering
Modelling
Validation
The issue of variety in medical data
1. High number of unique parameters
2. Differing feature structure for different problems
3. Different parameter distributions between populations
4. Variability of measurements over time
Patient and admission characteristics
Clinical observations
Vital signs & device data
Lab values
High number of parameters measured in the ICU
Respiration
• Respiratory rate
• Mechanical ventilation
• Tidal volume
• Expiratory minute volume
• Respiration mode
• PEEP
• Peak pressure
• Supplemental O2
• Fraction of inspired O2
• Type of O2 administration
• Peripheral O2 saturation
Circulation
• Blood pressure (diastolic and systolic, arterial and non-invasive)
• Pulmonary artery pressure (diastolic and systolic)
• CVP
• PCWP wedge
• Heart rate
• Cardiac output
• Tidal volume (inspiratory and expiratory)
• Heart rhythm & ectopic
• Shock index
Clinical observations
• Peripheral temperature
• CAM, DOS, RASS, NAS
• GCS
• Pupil size and reaction
• Cough stimulant
• Urine output
• Number of bronchial toilets
Patient and admission characteristics
• Age, sex
• Height and weight at admission
• Department of origin
• Length of stay
• Number of prior admissions
• Time in the hospital before admission
• CPR code
Blood gas analysis
• Base excess
• O2 content in blood
• Arterial O2 saturation
• pH
• Partial pressure (O2 & CO2)
• Actual bicarbonate
Haematology
• Hb, Ht
• White blood cell count
• MCH, MCV
• Erythrocytes
• Thrombocytes
• Lymphocytes
• Leucocytes
• Baso, eo and neutro
• Reticulocytes
• PT, APTT
Cardiac enzymes
• CK-MB
• Troponin-T
Chemistry
• Sodium, potassium
• Chloride
• Calcium, ion. calcium
• Magnesium
• Phosphate
• Creatinine
• CK
• EST and CRP
• Blood glucose
• Blood lactate
• Amylase
• Serum albumin
• BUN_creatinine
• NT-ProBNP
Liver tests
• ALAT and ASAT
• GGT, AF
• LDH
• Bilirubin
Urinalysis
• Sodium, potassium
• Urea
Medication categories
• Alimentary tract and metabolism
• Antibiotics
• Blood and blood-forming organs
• Cardiovascular
• Musculoskeletal system
• Nervous system
• General (tube feeding)
Other
• CVVH
• Lines and drains
Measurements can vary widely between hospitals
[Figure: number of measurements and mean value of activated partial thromboplastin time (aPTT), Hospital 1 vs Hospital 2]
Parameters are measured at different time scales, with highly varying
values and measurement frequencies
What do we need?
• A feature engineering pipeline that:

1. is scalable
2. can be used efficiently for both development and production
3. can be used for multiple outcome measures
4. produces features that are interpretable and useful for both machine
learning models and doctors
Challenge: how to turn time series into information relevant for a
model (and doctors)?
๏ Recurrent Neural Networks, e.g. (Phased) LSTMs
๏ Frequency domain transforms, e.g. Fourier transform
๏ Embedded representations, e.g. patient2vec
• Scalable?
• Reusable across models?
• Interpretable?
Extracting interpretable aggregated values from vital parameters
Aggregates extracted per time window (example: heart rate, bpm): first, last, minimum, maximum, average, standard deviation, slope, counts, …
Time windows:
1. First 24h
2. First 48h
3. First 72h
…
We use these aggregated features to capture short-term effects as well as
longer-term trends
Time windows:
1. Whole stay
2. Day averages
3. First and last day
Multiple patients, multiple parameters, continuous time scale
Split - apply - combine
1) Splitting the data into groups based on some criteria.
2) Applying a function to each group independently.
3) Combining the results into a data structure.
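The three steps above map directly onto a pandas `groupby`. The sketch below is a minimal illustration on a hypothetical long-format table (one row per patient, parameter and measurement). The column names and the `parameter__aggregate` naming scheme are assumptions made for this example.

```python
import pandas as pd

# Hypothetical long-format measurements.
measurements = pd.DataFrame({
    "patient_id": [1, 1, 1, 1, 1, 2, 2, 2],
    "parameter": ["heart_rate", "heart_rate", "heart_rate", "saturation",
                  "saturation", "heart_rate", "heart_rate", "saturation"],
    "value": [80.0, 90.0, 100.0, 97.0, 98.0, 60.0, 70.0, 95.0],
})

features = (
    measurements
    .groupby(["patient_id", "parameter"])["value"]                  # 1) split
    .agg(["first", "last", "min", "max", "mean", "std", "count"])   # 2) apply
    .unstack("parameter")                                           # 3) combine
)
# Flatten the (aggregate, parameter) column pairs into single names.
features.columns = [f"{param}__{agg}" for agg, param in features.columns]
print(features)
```

The result has one row per patient and one column per parameter and aggregate, which is the shape most tabular models expect.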
Creating features grouped in custom time windows
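The code shown on these slides did not survive extraction. As a stand-in, here is a minimal sketch of how features might be grouped in windows counted from admission (first 24h, 48h, 72h). The table layout, the `window_features` helper and the window definitions are assumptions, not the original implementation.

```python
import pandas as pd

# Hypothetical windows, in hours since admission.
WINDOWS_H = {"first_24h": 24, "first_48h": 48, "first_72h": 72}

def window_features(df, windows=WINDOWS_H):
    """Aggregate `value` per patient within each window after admission."""
    hours = (df["time"] - df["admitted"]).dt.total_seconds() / 3600
    frames = []
    for name, limit in windows.items():
        in_window = df[hours < limit]
        aggregated = in_window.groupby("patient_id")["value"].agg(
            ["min", "max", "mean", "count"]
        )
        frames.append(aggregated.add_prefix(f"heart_rate__{name}__"))
    # Patients without measurements in a window get NaN for that window.
    return pd.concat(frames, axis=1)

admitted = pd.Timestamp("2019-04-01 08:00")
offsets_h = [1, 12, 30, 60, 2, 50]
heart_rate = pd.DataFrame({
    "patient_id": [1, 1, 1, 1, 2, 2],
    "admitted": admitted,
    "time": admitted + pd.to_timedelta(offsets_h, unit="h"),
    "value": [80.0, 90.0, 100.0, 110.0, 60.0, 70.0],
})

features = window_features(heart_rate)
print(features)
```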
Why not stick to Pandas then?
• Interpretable, easy, reliable
• Works very well with datetime formats
• Most simple aggregations available
• No out-of-the-box parallelisation
• Everything in memory
• Custom aggregations can be extremely computationally heavy
Heavy computational load for custom functions
Dask: scalable Pandas
• Abstraction over numpy, pandas and scikit-learn allowing you to run
operations on them in parallel, using multicore processing
• Manipulating large datasets, even when those datasets don’t fit in memory
• Distributed computing on large datasets with standard Pandas operations
like groupby, join, and time series computations
• Scales up to multiple machines auto-magically.
• Scales down: low-memory and fast even on local machines.
Reminder: our goal of scalability
๏ Develop and test on any machine
๏ Re-use the same pipeline for production
๏ For both large and small datasets
Problems with Dask
• Not all pandas aggregations are available (e.g. applying custom functions on expanding windows)
• Complex to optimise on each machine
• Need to manually select the number of workers, partitions, etc.
• Performance highly dependent on settings
• Slower for small datasets and certain transformations
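For reference, this is what a custom function on an expanding window looks like in plain pandas, the kind of aggregation the first bullet refers to. The `value_range` function and the data are hypothetical examples.

```python
import pandas as pd

def value_range(values):
    """Custom aggregate: spread between the highest and lowest value so far."""
    return values.max() - values.min()

heart_rate = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "value": [80.0, 95.0, 90.0, 60.0, 72.0],
})

# For each patient, evaluate the function on all measurements up to each row.
heart_rate["range_so_far"] = (
    heart_rate.groupby("patient_id")["value"]
    .expanding()
    .apply(value_range, raw=True)        # raw=True passes numpy arrays
    .reset_index(level=0, drop=True)     # drop patient_id to realign with rows
)
print(heart_rate)
```

Because the Python function is called once per row, this pattern becomes slow on long time series, which is the computational load mentioned earlier in the talk.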
Can we do better?
TSFRESH
• "Time Series Feature extraction based on scalable hypothesis tests".
• Same split-apply-combine concept, but feature calculations are done on
numpy arrays (vectorized), in parallel
Dealing with time-varying signals
[Diagram: pandas Series → numpy array → aggregates calculated in parallel (min(), max(), std(), …) → pandas DataFrame]
Huge list of aggregates available out of the box
Result: clean, interpretable dataframe ready for modelling
Scaling up and down
• (Local) multiprocessing
• Cluster with Dask
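The idea behind this approach can be sketched without TSFRESH itself: each (patient, parameter) series is reduced to a numpy array, a set of calculators runs on every array, and the independent chunks are mapped over a pool. The code below is an illustration of that pattern, not TSFRESH's API. All names are hypothetical, and a thread pool stands in for the multiprocessing or Dask back-ends mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

# Feature calculators: each takes a numpy array and returns one number.
CALCULATORS = {
    "minimum": np.min,
    "maximum": np.max,
    "mean": np.mean,
    "standard_deviation": np.std,  # population standard deviation (ddof=0)
}

def extract(chunk):
    """Apply every calculator to one (patient, parameter) chunk."""
    patient_id, parameter, values = chunk
    row = {f"{parameter}__{name}": float(fn(values))
           for name, fn in CALCULATORS.items()}
    return patient_id, row

def extract_all(long_df, max_workers=2):
    # Split: one numpy array per (patient, parameter) pair.
    chunks = [
        (pid, param, group["value"].to_numpy())
        for (pid, param), group in long_df.groupby(["patient_id", "parameter"])
    ]
    # Apply: chunks are independent, so they can be mapped over a pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(extract, chunks))
    # Combine: merge the rows that belong to the same patient.
    rows = {}
    for pid, row in results:
        rows.setdefault(pid, {}).update(row)
    return pd.DataFrame.from_dict(rows, orient="index").sort_index()

measurements = pd.DataFrame({
    "patient_id": [1, 1, 1, 1, 1, 2, 2],
    "parameter": ["heart_rate"] * 3 + ["saturation"] * 2 + ["heart_rate"] * 2,
    "value": [80.0, 90.0, 100.0, 97.0, 99.0, 60.0, 70.0],
})
features = extract_all(measurements)
print(features)
```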
Dealing with time-varying signals
• Problem: using numpy arrays means losing the datetime dimension
• Solution: custom fork of TSFRESH
• The DatetimeIndex of the input pandas dataframe is used only when
calculating time-dependent aggregations
• Medication data can also be taken into account by exploiting multi-indices (e.g. medications)
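To see why the datetime dimension matters for time-dependent aggregations, compare a slope computed on sample positions with one computed on elapsed time. This is a hypothetical illustration and not the code of the custom fork. The series below rises by exactly 1 unit per hour but is sampled irregularly.

```python
import numpy as np
import pandas as pd

def slope_per_sample(series):
    """Slope against sample position: ignores when values were measured."""
    return np.polyfit(np.arange(len(series)), series.to_numpy(), deg=1)[0]

def slope_per_hour(series):
    """Slope against elapsed hours, taken from the DatetimeIndex."""
    elapsed = (series.index - series.index[0]).total_seconds() / 3600
    return np.polyfit(np.asarray(elapsed), series.to_numpy(), deg=1)[0]

# Irregularly sampled values at 0h, 6h and 24h after the first measurement.
creatinine = pd.Series(
    [100.0, 106.0, 124.0],
    index=pd.to_datetime(
        ["2019-04-01 08:00", "2019-04-01 14:00", "2019-04-02 08:00"]
    ),
)

print(slope_per_sample(creatinine))  # depends on sampling, not on time
print(slope_per_hour(creatinine))    # rate of change per hour
```

With a plain array only the first variant is possible, so two patients with the same trend but different measurement frequencies would get different slopes.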
Dealing with medications
Aggregates:
- Total amount
- Time since last dose
- Time under treatment
- Time without treatment
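The four aggregates can be sketched on a hypothetical table of administrations with a start time, a stop time and a dose. The definitions used here are assumptions made for illustration: time since last dose is measured from the last stop time, administrations of the same drug do not overlap, and time without treatment is counted from the first administration onwards.

```python
import pandas as pd

def medication_features(admin, now):
    """Aggregate administrations per (patient, medication) up to `now`."""
    duration_h = (admin["stop"] - admin["start"]).dt.total_seconds() / 3600
    admin = admin.assign(duration_h=duration_h)
    grouped = admin.groupby(["patient_id", "medication"])
    out = pd.DataFrame({
        "total_amount": grouped["dose_mg"].sum(),
        "time_under_treatment_h": grouped["duration_h"].sum(),
        "first_start": grouped["start"].min(),
        "last_stop": grouped["stop"].max(),
    })
    out["time_since_last_dose_h"] = (
        (now - out["last_stop"]).dt.total_seconds() / 3600
    )
    observed_h = (now - out["first_start"]).dt.total_seconds() / 3600
    out["time_without_treatment_h"] = observed_h - out["time_under_treatment_h"]
    return out.drop(columns=["first_start", "last_stop"])

administrations = pd.DataFrame({
    "patient_id": [1, 1],
    "medication": ["noradrenaline", "noradrenaline"],
    "start": pd.to_datetime(["2019-04-01 08:00", "2019-04-01 18:00"]),
    "stop": pd.to_datetime(["2019-04-01 12:00", "2019-04-01 20:00"]),
    "dose_mg": [4.0, 2.0],
})
features = medication_features(administrations, now=pd.Timestamp("2019-04-02 08:00"))
print(features)
```

The result is indexed by patient and medication, which matches the multi-index idea mentioned earlier in the talk.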
Summary
• Creating features for medical data entails dealing with variety and
variability
• Quick processing and interpretable features are top priorities
• No single tool offers a unique solution
Summary
• Pandas works well for quick processing of relatively small datasets
• Split-apply-combine
• Parallelizing (e.g. through Dask) allows quick computation of aggregates
both locally and distributed
• Vectorizing the split-apply-combine approach (e.g. with TSFRESH) speeds
up computation both for small and large datasets.
• Native support for Dask and custom distributors enables scaling
Conclusions
• Approach not limited to Python or specific packages
• Can be extended to any application that involves time series
• Scaling horizontally: we adapted the ICU pipeline for various other
projects (e.g. treatment decision based on patients’ clinical history)
• No need to re-invent the wheel every time
Key takeaway
[Image with the labels: “FEATURE ENGINEERING”, PANDAS, DATA SCIENTIST]
Questions or feedback?
Michele Tonutti
michele.tonutti@pacmed.nl

More Related Content

Similar to Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019

David Snead on The use of digital pathology in the primary diagnosis of histo...
David Snead on The use of digital pathology in the primary diagnosis of histo...David Snead on The use of digital pathology in the primary diagnosis of histo...
David Snead on The use of digital pathology in the primary diagnosis of histo...Cirdan
 
HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise...
HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise...HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise...
HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise...HBaseCon
 
March 5, 2015 PoCDx Seminar - Wallace White, Stratos - Development of the Pan...
March 5, 2015 PoCDx Seminar - Wallace White, Stratos - Development of the Pan...March 5, 2015 PoCDx Seminar - Wallace White, Stratos - Development of the Pan...
March 5, 2015 PoCDx Seminar - Wallace White, Stratos - Development of the Pan...BerkeleyPoCDx
 
Meditech products list 2015 no FM v.B1
Meditech products list 2015 no FM v.B1Meditech products list 2015 no FM v.B1
Meditech products list 2015 no FM v.B1Miss.Alicia Zhang
 
InterSystems UK Symposium 2012 Corporate Overview
InterSystems UK Symposium 2012 Corporate OverviewInterSystems UK Symposium 2012 Corporate Overview
InterSystems UK Symposium 2012 Corporate OverviewISCMarketing
 
Raise the bar webinar 3.21.17 final
Raise the bar webinar   3.21.17 finalRaise the bar webinar   3.21.17 final
Raise the bar webinar 3.21.17 finalMeghan Carter
 
Untether Your Data with EndoGear: Wireless Volumetric Blood Flow and Pressure...
Untether Your Data with EndoGear: Wireless Volumetric Blood Flow and Pressure...Untether Your Data with EndoGear: Wireless Volumetric Blood Flow and Pressure...
Untether Your Data with EndoGear: Wireless Volumetric Blood Flow and Pressure...InsideScientific
 
Esco Versati Laboratory Centrifuge
Esco Versati Laboratory CentrifugeEsco Versati Laboratory Centrifuge
Esco Versati Laboratory CentrifugeEsco Group
 
HPC Infrastructure for Simulations of Drug-Induced Arrhythmias in Living Hearts
HPC Infrastructure for Simulations of Drug-Induced Arrhythmias in Living HeartsHPC Infrastructure for Simulations of Drug-Induced Arrhythmias in Living Hearts
HPC Infrastructure for Simulations of Drug-Induced Arrhythmias in Living Heartsinside-BigData.com
 
CBCC Biorepository capabilities
CBCC Biorepository capabilitiesCBCC Biorepository capabilities
CBCC Biorepository capabilitiesHarini Patel
 
Using Simulation for Hospital Planning
Using Simulation for Hospital PlanningUsing Simulation for Hospital Planning
Using Simulation for Hospital PlanningSIMUL8 Corporation
 
Dr. Jim Lowe - Big data and models: Are they really useful in disease managem...
Dr. Jim Lowe - Big data and models: Are they really useful in disease managem...Dr. Jim Lowe - Big data and models: Are they really useful in disease managem...
Dr. Jim Lowe - Big data and models: Are they really useful in disease managem...John Blue
 
A Real-Time Prostate Cancer Radiotherapy Research Database
A Real-Time Prostate Cancer Radiotherapy Research DatabaseA Real-Time Prostate Cancer Radiotherapy Research Database
A Real-Time Prostate Cancer Radiotherapy Research DatabaseCancer Institute NSW
 
H2O World - Machine Learning to Save Lives - Taposh Dutta Roy
H2O World - Machine Learning to Save Lives - Taposh Dutta RoyH2O World - Machine Learning to Save Lives - Taposh Dutta Roy
H2O World - Machine Learning to Save Lives - Taposh Dutta RoySri Ambati
 
Orlando Agrippa, Draper & Dash at "Journeys of Health-Tech Innovation" Nov 3...
Orlando Agrippa, Draper & Dash  at "Journeys of Health-Tech Innovation" Nov 3...Orlando Agrippa, Draper & Dash  at "Journeys of Health-Tech Innovation" Nov 3...
Orlando Agrippa, Draper & Dash at "Journeys of Health-Tech Innovation" Nov 3...Health-Tech Innovation LABS
 
2011-10-21 ASIP Santé Conférence Télémédecine "Présentation TEMPiS"
2011-10-21 ASIP Santé Conférence Télémédecine "Présentation TEMPiS"2011-10-21 ASIP Santé Conférence Télémédecine "Présentation TEMPiS"
2011-10-21 ASIP Santé Conférence Télémédecine "Présentation TEMPiS"ASIP Santé
 

Similar to Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019 (20)

David Snead on The use of digital pathology in the primary diagnosis of histo...
David Snead on The use of digital pathology in the primary diagnosis of histo...David Snead on The use of digital pathology in the primary diagnosis of histo...
David Snead on The use of digital pathology in the primary diagnosis of histo...
 
HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise...
HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise...HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise...
HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise...
 
March 5, 2015 PoCDx Seminar - Wallace White, Stratos - Development of the Pan...
March 5, 2015 PoCDx Seminar - Wallace White, Stratos - Development of the Pan...March 5, 2015 PoCDx Seminar - Wallace White, Stratos - Development of the Pan...
March 5, 2015 PoCDx Seminar - Wallace White, Stratos - Development of the Pan...
 
Cardiac Design Labs
Cardiac Design LabsCardiac Design Labs
Cardiac Design Labs
 
Cardiac Design Labs
Cardiac Design Labs Cardiac Design Labs
Cardiac Design Labs
 
Meditech products list 2015 no FM v.B1
Meditech products list 2015 no FM v.B1Meditech products list 2015 no FM v.B1
Meditech products list 2015 no FM v.B1
 
InterSystems UK Symposium 2012 Corporate Overview
InterSystems UK Symposium 2012 Corporate OverviewInterSystems UK Symposium 2012 Corporate Overview
InterSystems UK Symposium 2012 Corporate Overview
 
Raise the bar webinar 3.21.17 final
Raise the bar webinar   3.21.17 finalRaise the bar webinar   3.21.17 final
Raise the bar webinar 3.21.17 final
 
Untether Your Data with EndoGear: Wireless Volumetric Blood Flow and Pressure...
Untether Your Data with EndoGear: Wireless Volumetric Blood Flow and Pressure...Untether Your Data with EndoGear: Wireless Volumetric Blood Flow and Pressure...
Untether Your Data with EndoGear: Wireless Volumetric Blood Flow and Pressure...
 
Center for Integrative Research in Critical Care by Kevin Ward
Center for Integrative Research in Critical Care by Kevin WardCenter for Integrative Research in Critical Care by Kevin Ward
Center for Integrative Research in Critical Care by Kevin Ward
 
Esco Versati Laboratory Centrifuge
Esco Versati Laboratory CentrifugeEsco Versati Laboratory Centrifuge
Esco Versati Laboratory Centrifuge
 
HPC Infrastructure for Simulations of Drug-Induced Arrhythmias in Living Hearts
HPC Infrastructure for Simulations of Drug-Induced Arrhythmias in Living HeartsHPC Infrastructure for Simulations of Drug-Induced Arrhythmias in Living Hearts
HPC Infrastructure for Simulations of Drug-Induced Arrhythmias in Living Hearts
 
CBCC Biorepository capabilities
CBCC Biorepository capabilitiesCBCC Biorepository capabilities
CBCC Biorepository capabilities
 
Using Simulation for Hospital Planning
Using Simulation for Hospital PlanningUsing Simulation for Hospital Planning
Using Simulation for Hospital Planning
 
Dr. Jim Lowe - Big data and models: Are they really useful in disease managem...
Dr. Jim Lowe - Big data and models: Are they really useful in disease managem...Dr. Jim Lowe - Big data and models: Are they really useful in disease managem...
Dr. Jim Lowe - Big data and models: Are they really useful in disease managem...
 
A Real-Time Prostate Cancer Radiotherapy Research Database
A Real-Time Prostate Cancer Radiotherapy Research DatabaseA Real-Time Prostate Cancer Radiotherapy Research Database
A Real-Time Prostate Cancer Radiotherapy Research Database
 
H2O World - Machine Learning to Save Lives - Taposh Dutta Roy
H2O World - Machine Learning to Save Lives - Taposh Dutta RoyH2O World - Machine Learning to Save Lives - Taposh Dutta Roy
H2O World - Machine Learning to Save Lives - Taposh Dutta Roy
 
M-Health
M-HealthM-Health
M-Health
 
Orlando Agrippa, Draper & Dash at "Journeys of Health-Tech Innovation" Nov 3...
Orlando Agrippa, Draper & Dash  at "Journeys of Health-Tech Innovation" Nov 3...Orlando Agrippa, Draper & Dash  at "Journeys of Health-Tech Innovation" Nov 3...
Orlando Agrippa, Draper & Dash at "Journeys of Health-Tech Innovation" Nov 3...
 
2011-10-21 ASIP Santé Conférence Télémédecine "Présentation TEMPiS"
2011-10-21 ASIP Santé Conférence Télémédecine "Présentation TEMPiS"2011-10-21 ASIP Santé Conférence Télémédecine "Présentation TEMPiS"
2011-10-21 ASIP Santé Conférence Télémédecine "Présentation TEMPiS"
 

More from Codemotion

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Codemotion
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyCodemotion
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaCodemotion
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserCodemotion
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Codemotion
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Codemotion
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Codemotion
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 - Codemotion
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Codemotion
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Codemotion
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Codemotion
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Codemotion
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Codemotion
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...Codemotion
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Codemotion
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Codemotion
 
Mike Kotsur - What can philosophy teach us about programming - Codemotion Ams...
Mike Kotsur - What can philosophy teach us about programming - Codemotion Ams...Mike Kotsur - What can philosophy teach us about programming - Codemotion Ams...
Mike Kotsur - What can philosophy teach us about programming - Codemotion Ams...Codemotion
 

More from Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019

Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019

  • 1. Scaling is caring: Building scalable feature engineering pipelines for machine learning in healthcare. April 3, 2019, Codemotion Amsterdam 2019
  • 2. Introductions • Michele Tonutti • Data Scientist at Pacmed • Intensive Care Team • Background in Biomedical Engineering and Robotics

  • 3. Introductions • Developing machine-learning-driven decision support tools to make healthcare more personal, personalised and precise. • Patients only get care that has the highest probability of success for them. • Focus on oncology, emergency care, chronic diseases, and intensive care.
  • 4. Pacmed focuses on four applications
      - Emergency care: What is the urgency level of a patient (how quickly should someone see a doctor)?
      - Intensive care: Predicting risk of ICU and post-ICU complications to support decision-making
      - Chronic diseases: What is the best treatment (combination) for patients with hypertension, diabetes and/or chronic kidney failure?
      - Oncology: What are the optimal treatments for the individual patient with colon, prostate or breast cancer?
  • 5. Intensive care is most promising and furthest developed
  • 6. The Intensive Care Unit (ICU)
  • 7. Pacmed is currently working on four prediction problems on the intensive care (each uses data from t-3 up to today)
      - Discharge decision: predicting the readmission and mortality risk of patients on discharge (vital signs → readmission/mortality at t+7)
      - Extubation decision: predicting the risk of re-intubation of patients if they are extubated (respiratory parameters → re-intubation at t+2)
      - Capacity management: predicting the number of full/available beds (patient inflow & outflow → bed capacity at t+1 and t+2)
      - Predicting complications: e.g. predicting kidney function (creatinine → kidney function at t+1)
  • 10. Explainable prediction of eligibility for discharge from the ICU (columns: Feature | Value | Interpretation of value)
      - SATURATION, max value of the admission | 98% | A max value of 98% is lower than 95% of all discharged patients.
      - SERUM CREATININE, trend in last 24 hours | Increase of 20 ml, from 100 to 120 | The average patient had a stable serum creatinine during the last 24 hours. The increase of +20 is higher than 99% of discharged patients.
      - ALAT, variation in values last 24 hours | Variation of 7 ml, between 5 and 12 | The average patient had a variation of ALAT of 2 in the last 24 hours. A variation of 7 is higher than 76% of all patients.
      - URINE OUTPUT, average last 24 hours | 240 ml | An average value of last 24 hours. The average discharged patient has a urine output of 250.
  • 13. A pipeline for ICUs that works for both development and production. [Diagram: Hospital 1, Hospital 2, Hospital 3; Feature Engineering; Development, Production]
  • 14. Feature engineering for medical data is an iterative process: medical knowledge, feature engineering, modelling, validation
  • 16. The issue of variety in medical data 1. High number of unique parameters 2. Differing feature structure for different problems 3. Different parameter distributions between populations 4. Variability of measurements over time
  • 17. High number of parameters measured in the ICU (slide columns: Patient and admission characteristics; Clinical observations; Vital signs & device data; Lab values)
      - Patient and admission characteristics: Age, sex; Length and weight at admission; Department of origin; Length of stay; Number of prior admissions; Time in the hospital before admission; CPR code
      - Clinical observations: CAM, DOS, RASS, NAS; GCS; Pupil size and reaction; Cough stimulant; Urine output; Number of bronchial toilets
      - Respiration: Respiratory rate; Mechanical ventilation; Tidal volume; Expiratory minute volume; Respiration mode; PEEP; Peak pressure; Supplemental O2; Fraction of inspired O2; Type of O2 administration; Peripheral O2 saturation
      - Circulation: Blood pressure (diastolic and systolic, arterial and non-invasive); Pulmonary artery press. (diastolic and systolic); CVP; PCWP wedge; Heart rate; Cardiac output; Tidal volume (inspiratory and expiratory); Heart rhythm & ectopic; Shock index; Temperature peripheral
      - Blood gas analysis: Base excess; O2 content in blood; Arterial O2 saturation; pH; Part. press. (O2 & CO2); Actual bicarbonate
      - Haematology: Hb, Ht; White blood cell count; MCH, MCV; Erythrocytes; Thrombocytes; Lymphocytes; Leucocytes; Baso, eo and neutro; Reticulocytes; PT, APTT
      - Cardiac enzymes: CK-MB; Troponin-T
      - Chemistry: Sodium, potassium; Chloride; Calcium, ion. calcium; Magnesium; Phosphate; Creatinine; CK; EST and CRP; Blood glucose; Blood lactate; Amylase; Serum albumin; BUN_creatinine; NT-ProBNP
      - Liver tests: ALAT and ASAT; GGT, AF; LDH; Bilirubin
      - Urinalysis: Sodium, potassium; Urea
      - Medication categories: Alimentary tract and metabolism; Antibiotics; Blood and blood-forming organs; Cardiovascular; Musculoskeletal system; Nervous system; General (tube feeding)
      - Other: CVVH; Lines and drains
  • 18. Measurements can vary widely between hospitals. [Figure: number of measurements and mean value of activated partial thromboplastin time (aPTT), Hospital 1 vs Hospital 2]
  • 19. Parameters are measured at different time scales, with highly varying values and measurement frequencies
  • 20. What do we need? A feature engineering pipeline that: 1. is scalable 2. can be used efficiently for both development and production 3. can be used for multiple outcome measures 4. produces features that are interpretable and useful for both machine learning models and doctors
  • 23. Challenge: how to turn time series into information relevant for a model (and doctors)? ๏ Recurrent Neural Networks, e.g. (Phased) LSTMs ๏ Frequency domain transforms, e.g. Fourier transform ๏ Embedded representations, e.g. patient2vec • Scalable? • Reusable across models? • Interpretable?
  • 25. Extracting interpretable aggregated values from vital parameters. [Figure: heart rate (bpm) over time, annotated with first, last, minimum, maximum, average, standard deviation, slope, counts, {…}]
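
A minimal sketch of the aggregates named on slide 25, computed for one heart-rate series with pandas and numpy. The function name `aggregate_signal`, the sample values, and the choice of a least-squares line for the slope are assumptions made for illustration; the slides do not show how Pacmed computes them.

```python
import numpy as np
import pandas as pd


def aggregate_signal(series: pd.Series) -> dict:
    """Summarise one time-indexed signal with interpretable aggregates."""
    values = series.to_numpy(dtype=float)
    # Hours elapsed since the first measurement, used as x-axis for the slope.
    hours = np.asarray((series.index - series.index[0]).total_seconds()) / 3600.0
    # Assumption: slope = least-squares linear trend, in units per hour.
    slope = np.polyfit(hours, values, 1)[0] if len(values) > 1 else np.nan
    return {
        "first": values[0],
        "last": values[-1],
        "minimum": values.min(),
        "maximum": values.max(),
        "average": values.mean(),
        "standard_deviation": values.std(ddof=1) if len(values) > 1 else np.nan,
        "slope": slope,
        "count": len(values),
    }


# Made-up hourly heart-rate measurements (bpm).
heart_rate = pd.Series(
    [80.0, 84.0, 88.0, 92.0],
    index=pd.to_datetime(
        ["2019-04-03 00:00", "2019-04-03 01:00", "2019-04-03 02:00", "2019-04-03 03:00"]
    ),
    name="heart_rate",
)
features = aggregate_signal(heart_rate)
```

Each key in `features` maps directly to a quantity a clinician can read off the chart, which is the interpretability argument the slide makes.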
  • 26. We use these aggregated features to capture short-term effects as well as longer-term trends. [Figure: aggregates {…} over windows 1. first 24h, 2. first 48h, 3. first 72h]
  • 27. We use these aggregated features to capture short-term effects as well as longer-term trends. [Figure: aggregates {…} over 1. whole stay, 2. day averages, 3. first and last day]
  • 28. Multiple patients, multiple parameters, continuous time scale
  • 30. Split - apply - combine 1) Splitting the data into groups based on some criteria. 2) Applying a function to each group independently. 3) Combining the results into a data structure.
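
The three steps on slide 30 can be sketched with a pandas `groupby`. The table layout (long format with `patient_id`, `parameter`, `value`) and the `parameter__aggregate` column naming are assumptions for illustration, not the layout used in the talk.

```python
import pandas as pd

# Made-up long-format measurements: one row per observation.
measurements = pd.DataFrame(
    {
        "patient_id": [1, 1, 1, 2, 2],
        "parameter": ["heart_rate", "heart_rate", "creatinine", "heart_rate", "heart_rate"],
        "value": [80.0, 90.0, 100.0, 60.0, 70.0],
    }
)

# 1) Split by patient and parameter, 2) apply aggregations to each group,
# 3) combine the results into one DataFrame.
aggregated = (
    measurements.groupby(["patient_id", "parameter"])["value"]
    .agg(["min", "max", "mean", "count"])
)

# Pivot to one row per patient and one column per (aggregate, parameter) pair.
features = aggregated.unstack("parameter")
features.columns = [f"{param}__{agg}" for agg, param in features.columns]
```

Patients without a given parameter (patient 2 has no creatinine here) end up with missing values in those columns, which the downstream model or imputation step has to handle.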
  • 31. Creating features grouped in custom time windows
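
The code shown on these slides did not survive in the transcript. As a rough sketch of the idea, the example below computes aggregates over the "first 24h", "first 48h" and "whole stay" windows mentioned earlier. The window dictionary, the column names, and the use of each patient's first measurement as a stand-in for admission time are all assumptions.

```python
import pandas as pd

# Made-up heart-rate measurements for two patients.
measurements = pd.DataFrame(
    {
        "patient_id": [1, 1, 1, 1, 2, 2],
        "timestamp": pd.to_datetime(
            [
                "2019-04-01 08:00", "2019-04-01 20:00",
                "2019-04-02 20:00", "2019-04-03 20:00",
                "2019-04-01 10:00", "2019-04-01 12:00",
            ]
        ),
        "heart_rate": [80.0, 90.0, 100.0, 110.0, 60.0, 70.0],
    }
)

# Hours since each patient's first measurement (assumed proxy for admission).
start = measurements.groupby("patient_id")["timestamp"].transform("min")
measurements["hours_in"] = (measurements["timestamp"] - start).dt.total_seconds() / 3600.0

# Hypothetical window definitions: name -> upper bound in hours (None = whole stay).
windows = {"first_24h": 24, "first_48h": 48, "whole_stay": None}

parts = []
for name, limit in windows.items():
    subset = measurements if limit is None else measurements[measurements["hours_in"] < limit]
    stats = subset.groupby("patient_id")["heart_rate"].agg(["mean", "max"])
    parts.append(stats.add_prefix(f"heart_rate__{name}__"))

# One row per patient, one column per window and aggregate.
window_features = pd.concat(parts, axis=1)
```

Adding a window is a one-line change to `windows`, which is one way to keep the same code usable across different prediction problems.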
  • 35. Why not stick to Pandas then? Pros: • Interpretable, easy, reliable • Works very well with datetime formats • Most simple aggregations available. Cons: • No out-of-the-box parallelisation • Everything in memory • Custom aggregations can be extremely computationally heavy
  • 36. Heavy computational load for custom functions
  • 37. Dask: scalable Pandas • Abstraction over numpy, pandas and scikit-learn allowing you to run operations on them in parallel, using multicore processing
  • 40. Dask: scalable Pandas • Manipulating large datasets, even when those datasets don’t fit in memory • Distributed computing on large datasets with standard Pandas operations like groupby, join, and time series computations • Scales up to multiple machines auto-magically. • Scales down: low-memory and fast even on local machines.
  • 41. Reminder: our goal of scalability ๏ Develop and test on any machine ๏ Re-use the same pipeline for production ๏ For both large and small datasets
  • 42. Problems with Dask • Not all pandas aggregations are available (e.g. applying custom functions on expanding windows) • Complex to optimise on each machine • Need to manually select the number of workers, partitions, etc. • Performance highly dependent on settings • Slower for small datasets and certain transformations
  • 43. Can we do better?
  • 44. TSFRESH • "Time Series Feature extraction based on scalable hypothesis tests".
  • 46. TSFRESH • Same split-apply-combine concept, but feature calculations are done on numpy arrays (vectorized), in parallel
  • 47. Dealing with time-varying signals: pandas Series → numpy array → calculate aggregates in parallel (min(), max(), std(), …) → pandas DataFrame
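
The flow on slide 47 can be illustrated without TSFRESH itself. The sketch below converts each patient's series to a numpy array, applies numpy aggregates in parallel, and combines the rows into a DataFrame. A thread pool is used here only as a simple stand-in for process- or cluster-based parallelism; `AGGREGATES` and `calculate` are hypothetical names, and this is not TSFRESH's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

# Made-up heart-rate measurements for two patients.
measurements = pd.DataFrame(
    {
        "patient_id": [1, 1, 1, 2, 2, 2],
        "heart_rate": [80.0, 85.0, 90.0, 60.0, 70.0, 80.0],
    }
)

# Aggregates operate on plain numpy arrays rather than pandas objects.
AGGREGATES = {"min": np.min, "max": np.max, "std": np.std}


def calculate(chunk):
    """Compute all aggregates for one (patient_id, values) chunk."""
    patient_id, values = chunk
    row = {"patient_id": patient_id}
    row.update({name: float(func(values)) for name, func in AGGREGATES.items()})
    return row


# Split: one numpy array per patient.
chunks = [
    (pid, group.to_numpy())
    for pid, group in measurements.groupby("patient_id")["heart_rate"]
]

# Apply: each chunk is processed independently, so the work can run in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    rows = list(pool.map(calculate, chunks))

# Combine: back to a pandas DataFrame with one row per patient.
features = pd.DataFrame(rows).set_index("patient_id")
```

Because the arrays carry no index, any time information is gone at the `to_numpy()` step, which is the limitation a later slide addresses.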
  • 48. Huge list of aggregates available out of the box
  • 49. Result: clean, interpretable dataframe ready for modelling
  • 50. Scaling up and down • (Local) multiprocessing • Cluster with Dask
  • 51. Dealing with time-varying signals • Problem: using numpy arrays means losing the datetime dimension • Solution: custom fork of TSFRESH • The DatetimeIndex of the input pandas dataframe is used only when calculating time-dependent aggregations • Other data, e.g. medications, can also be taken into account by exploiting multi-indices
  • 52. Dealing with medications. Aggregates: - Total amount - Time since last dose - Time under treatment - Time without treatment
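
A hedged sketch of the four medication aggregates on slide 52. The slides do not define them, so the definitions below are assumptions: administrations are non-overlapping intervals with a `start`, `end` and `amount`; "time since last dose" is measured from the end of the last administration to a chosen `reference_time`; and "time without treatment" is the time between the first start and `reference_time` that is not covered by an administration.

```python
import pandas as pd

# Made-up medication administrations (intervals assumed not to overlap).
doses = pd.DataFrame(
    {
        "patient_id": [1, 1, 2],
        "medication": ["noradrenaline", "noradrenaline", "noradrenaline"],
        "start": pd.to_datetime(["2019-04-01 08:00", "2019-04-01 14:00", "2019-04-01 09:00"]),
        "end": pd.to_datetime(["2019-04-01 10:00", "2019-04-01 18:00", "2019-04-01 12:00"]),
        "amount": [2.0, 4.0, 3.0],
    }
)
# Hypothetical prediction moment.
reference_time = pd.Timestamp("2019-04-02 00:00")


def medication_aggregates(group: pd.DataFrame) -> pd.Series:
    """Aggregates for one patient and one medication (assumed definitions)."""
    hours_treated = (group["end"] - group["start"]).dt.total_seconds().sum() / 3600.0
    hours_observed = (reference_time - group["start"].min()).total_seconds() / 3600.0
    hours_since_last = (reference_time - group["end"].max()).total_seconds() / 3600.0
    return pd.Series(
        {
            "total_amount": group["amount"].sum(),
            "hours_since_last_dose": hours_since_last,
            "hours_under_treatment": hours_treated,
            "hours_without_treatment": hours_observed - hours_treated,
        }
    )


# The result is indexed by a (patient_id, medication) MultiIndex.
medication_features = (
    doses.groupby(["patient_id", "medication"])[["start", "end", "amount"]]
    .apply(medication_aggregates)
)
```

The `(patient_id, medication)` MultiIndex mirrors the multi-index idea from the previous slide: the same aggregation function is applied per medication without special-casing each drug.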
  • 53. Summary • Creating features for medical data entails dealing with variety and variability • Quick processing and interpretable features are top priorities • No single tool offers a unique solution
  • 54. Summary • Pandas works well for quick processing of relatively small datasets • Split-apply-combine • Parallelizing (e.g. through Dask) allows quick computation of aggregates both locally and distributed • Vectorizing the split-apply-combine approach (e.g. with TSFRESH) speeds up computation both for small and large datasets. • Native support for Dask and custom distributors enables scaling
  • 55. Conclusions • Approach not limited to Python or specific packages • Can be extended to any application that involves time series • Scaling horizontally: we adapted the ICU pipeline for various other projects (e.g. treatment decision based on patients’ clinical history) • No need to re-invent the wheel every time
  • 57. Questions or feedback? Michele Tonutti michele.tonutti@pacmed.nl