Lorenzo Rossi, PhD
Data Scientist
City of Hope National Medical Center
DataCon LA, August 2019
Best Practices for Prototyping
Machine Learning Models for
Healthcare
Machine learning in healthcare is growing fast,
but best practices are not well established yet
Towards Guidelines for ML in Health (8.2018, Stanford)
Motivations for ML in Healthcare
1. Lots of information about patients, but not enough time for clinicians
to process it
2. Physicians spend too much time typing information about patients
during encounters
3. Overwhelming amount of false alerts (e.g. in ICU)
Topics
1. The electronic health record (EHR)
2. Cohort definition
3. Data quality
4. Training - testing split
5. Performance metrics and reporting
6. Survival analysis
Data preparation
1. The Electronic Health Record (EHR)
• Laboratory tests
• Vitals
• Diagnoses
• Medications
• X-rays, CT scans, EKGs, …
• Notes
EHR data are very heterogeneous
• Laboratory tests [multi dimensional time series]
• Vitals [multi dimensional time series]
• Diagnoses [text, codes]
• Medications [text, codes, numeric]
• X-rays, CT scans, EKGs,… [2D - 3D images, time series, ..]
• Notes [text]
Time is a key aspect of EHR data
[Figure: labs, vitals, notes, … shown for patients p01, p02, p03 along a time axis]
Temporal resolution varies a lot
• ICU patient [minutes]
• Hospital patient [hours]
• Outpatient [weeks]
Events hospitals want to predict from EHR data
• Unplanned 30 day readmission
• Length of stay
• Mortality
• Sepsis
• ICU admission
• Surgical complications
Improve capacity
Optimize decisions
Consider only binary prediction tasks for simplicity
Prediction algorithm gives score from 0 to 1
– E.g. close to 1 → high risk of readmission within 30 days
Trade-off between falsely detected and missed targets
2. Cohort Definition
Cohort
Individuals “who experienced particular event during specific
period of time”
Given prediction task, select clinically relevant cohort
E.g. for surgery complication prediction, patients who had one
or more surgeries between 2011 and 2018.
A. Pick records of subset of patients
B. Pick a prediction time for each patient. Records after
prediction time are discarded
[Figure: labs, vitals, notes, … shown for patients p01, p02, p03 along a time axis]
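Steps A and B can be sketched as a simple filter. The record layout, patient dates and the function name `build_dataset` below are hypothetical; a real EHR extract would look different.

```python
from datetime import date

# Hypothetical records: (patient_id, record_date, record_type).
records = [
    ("p01", date(2012, 3, 1), "labs"),
    ("p01", date(2012, 3, 9), "vitals"),
    ("p02", date(2015, 6, 2), "notes"),
    ("p02", date(2015, 6, 20), "labs"),
    ("p03", date(2009, 1, 5), "labs"),
]

# Hypothetical prediction times. p03 has none because it is not in the cohort.
prediction_time = {"p01": date(2012, 3, 5), "p02": date(2015, 6, 30)}


def build_dataset(records, prediction_time):
    """Keep cohort patients only (A) and drop records after prediction time (B)."""
    kept = []
    for patient_id, record_date, record_type in records:
        cutoff = prediction_time.get(patient_id)
        if cutoff is None:
            continue  # step A: patient is outside the cohort
        if record_date <= cutoff:
            kept.append((patient_id, record_date, record_type))  # step B
    return kept


dataset = build_dataset(records, prediction_time)
print(dataset)
```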
3. Data Quality
EHR data challenging in many different ways
Example: most common non-numeric entries for
lab values in a legacy EHR system
• pending
• “>60”
• see note
• not done
• “<2”
• normal
• “1+”
• “2 to 5”
• “<250”
• “<0.1”
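One possible way to handle entries like those listed above is to recover a number plus a qualifier where that is possible and mark everything else as missing. This is a sketch under assumptions: the function name `parse_lab_entry`, the qualifier labels, and the choice to map ranges to their midpoint and "1+" to missing are illustrative, not the approach used in the talk.

```python
import re

_NUMBER = r"\d+(?:\.\d+)?"


def parse_lab_entry(raw):
    """Return (value, qualifier); value is None when no number can be recovered."""
    text = raw.strip().strip("“”\"").strip()
    if re.fullmatch(_NUMBER, text):
        return float(text), "exact"
    match = re.fullmatch(rf"([<>])\s*({_NUMBER})", text)
    if match:
        # Censored value: keep the bound and remember the direction.
        qualifier = "below" if match.group(1) == "<" else "above"
        return float(match.group(2)), qualifier
    match = re.fullmatch(rf"({_NUMBER})\s*to\s*({_NUMBER})", text)
    if match:
        # Assumption: represent a range by its midpoint.
        midpoint = (float(match.group(1)) + float(match.group(2))) / 2
        return midpoint, "range_midpoint"
    # "pending", "see note", "not done", "normal", "1+", ... carry no usable number.
    return None, "missing"


for entry in ["pending", ">60", "2 to 5", "<0.1", "1+", "4.2"]:
    print(entry, parse_lab_entry(entry))
```

Keeping the qualifier next to the value lets a later step decide how to treat censored results such as ">60" instead of silently reading them as exact measurements.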
Example: discrepancies in dates of death
between hospital records and Social Security
(~4.8% of shared patients)
Anomalies vs. Outliers
Distinguish between Anomalies and Outliers
Outlier: legitimate data point far away from mean/median of
distribution
Anomaly: illegitimate data point generated by process
different from one generating rest of data
Need domain knowledge to differentiate
E.g.: Albumin level in blood. Normal range: 3.4 – 5.4 g/dL.
µ=3.5, σ=0.65 over cohort.
ρ = -1 → anomaly (treat as missing value)
ρ = 1 → possibly an outlier (clinically relevant)
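The albumin example can be written as a small rule. This is a sketch only: the function name, the physical lower bound of 0 g/dL and the 3σ cut-off are assumptions standing in for the domain knowledge the slide calls for.

```python
def classify_lab_value(value, mean, std, physical_min=0.0, z_threshold=3.0):
    """Label a lab value as 'anomaly', 'outlier' or 'typical'."""
    if value < physical_min:
        # Impossible measurement: produced by a different process, treat as missing.
        return "anomaly"
    z_score = (value - mean) / std
    if abs(z_score) > z_threshold:
        # Legitimate but extreme value: keep it, it may be clinically relevant.
        return "outlier"
    return "typical"


# Cohort statistics from the slide: mean 3.5 g/dL, standard deviation 0.65 g/dL.
for albumin in (-1.0, 1.0, 3.6):
    print(albumin, classify_lab_value(albumin, mean=3.5, std=0.65))
```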
4. Training - Testing Split
Guidelines
• Machine learning models are evaluated on their ability to make
predictions on new (unseen) data
• Split train (cross-validation) and test sets based on
temporal criteria
– e.g. no records in train set after prediction dates in test set
– random splits, even if stratified, could let records from the
‘future’ leak into model training
• In retrospective studies, also avoid records of the same
patients across train and test
– model could just learn to recognize patients
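A temporal, patient-disjoint split along these guidelines might look as follows. The row layout, the cutoff date and the choice to drop earlier records of test patients from the train set (rather than the reverse) are assumptions made for illustration.

```python
from datetime import date


def temporal_split(rows, cutoff):
    """rows: list of (patient_id, prediction_date). Returns (train, test).

    Test rows have prediction dates on or after the cutoff. Train rows come
    strictly before it and exclude every patient who appears in the test set.
    """
    test = [row for row in rows if row[1] >= cutoff]
    test_patients = {row[0] for row in test}
    train = [
        row for row in rows
        if row[1] < cutoff and row[0] not in test_patients
    ]
    return train, test


# Hypothetical prediction dates; p02 has encounters on both sides of the cutoff.
rows = [
    ("p01", date(2016, 5, 1)),
    ("p02", date(2016, 9, 12)),
    ("p02", date(2018, 2, 3)),
    ("p03", date(2018, 7, 21)),
]
train, test = temporal_split(rows, cutoff=date(2018, 1, 1))
print(train)
print(test)
```

Here the 2016 record of p02 is left out of the train set so that the model cannot simply recognize that patient at test time.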
5. Performance Metrics and Reporting
Background
Generally highly imbalanced problems:
15% unplanned 30 day readmissions
< 10% sepsis cases
< 1% 30 day mortality
Types of Performance Metrics
1. Measure trade-offs
– (ROC) AUC
– average precision / PR AUC
2. Measure error rate at specific decision point
– false positive, false negative rates
– precision, recall
– F1
– accuracy
Types of Performance Metrics (II)
1. Measure trade-offs
– AUC, average precision / PR AUC
– good for global performance characterization and
(intra-)model comparisons
2. Measure error rate at a specific decision point
– false positives, false negatives, …, precision, recall
– possibly good for interpretation of specific clinical costs and
benefits
Don’t use accuracy unless dataset is balanced
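A short illustration of why accuracy is uninformative on imbalanced data. The 1% positive rate is an assumption chosen to resemble the rare-event rates listed earlier; the "model" simply predicts the negative class for everyone.

```python
# Hypothetical cohort: 10 positives out of 1000 patients (1% event rate).
labels = [1] * 10 + [0] * 990
predictions = [0] * len(labels)  # trivial model: never raises an alert

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

The trivial model reaches 99% accuracy while detecting none of the events.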
ROC AUC can be misleading too
ROC AUC can be misleading (II)
ROC AUC (1 year) > ROC AUC (5 years), but PR AUC (1
year) < PR AUC (5 years)! Latter prediction task is easier.
[Avati, Ng et al., Countdown Regression: Sharp and Calibrated
Survival Predictions. ArXiv, 2018]
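One way to see how ROC AUC and average precision can disagree is to keep the score distributions fixed and only change the class balance. The sketch below uses made-up scores and plain-Python implementations (pairwise ROC AUC, rank-based average precision); it assumes no positive and negative share the same score, and it is not a reproduction of the cited paper's experiment.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


pos_scores = [0.9, 0.6]
neg_scores = [0.7, 0.3, 0.2]

# Balanced-ish data set versus the same negatives repeated ten times.
for copies in (1, 10):
    scores = pos_scores + neg_scores * copies
    labels = [1] * len(pos_scores) + [0] * (len(neg_scores) * copies)
    print(copies, round(roc_auc(labels, scores), 4),
          round(average_precision(labels, scores), 4))
```

ROC AUC is identical in both runs, while average precision drops once negatives dominate, which is why reporting ROC AUC alone can hide poor precision on rare events.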
Performance should be reported with both types
of metrics
• 1 or 2 metrics for trade-off evaluation
– ROC AUC
– average precision
• 1 metric for performance at clinically meaningful decision
point
– e.g. recall @ 90% precision
+ Comparison with a known benchmark (baseline)
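A metric such as recall @ 90% precision can be computed by sweeping the decision threshold over the ranked scores. This is a sketch with made-up data; the function name is hypothetical and the code assumes all scores are distinct.

```python
def recall_at_precision(labels, scores, min_precision):
    """Highest recall among thresholds whose precision is at least min_precision."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_positives = sum(labels)
    best_recall, true_positives = 0.0, 0
    for flagged, (_, y) in enumerate(ranked, start=1):
        true_positives += y
        precision = true_positives / flagged
        if precision >= min_precision:
            best_recall = max(best_recall, true_positives / total_positives)
    return best_recall


# Hypothetical scores, already sorted from highest to lowest for readability.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.10]
result = recall_at_precision(labels, scores, min_precision=0.9)
print(round(result, 3))
```

In this toy case only the two top-ranked patients can be flagged without precision falling below 90%, so two of the three events are detected.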
Metrics in Stanford 2017 paper on mortality
prediction: AUC, average precision, recall @ 90%
Benchmarks
Main paper [Google, Nature, 2018] only reports deep
learning results with no benchmark comparison
Comparison appears only in supplemental online file (not in
the Nature paper): deep learning only 1-2% better than
logistic regression benchmark
Plot scales can be deceiving [undisclosed
vendor, 2017]!
Same TP, FP plots rescaled
6. Survival Analysis
B. Pick a prediction time for each patient. Records after
prediction time are discarded
C. Plot survival curves
• Consider binary classification tasks
– Event of interest (e.g. death) either happens or not before
censoring time
• Survival curve: distribution of time to event and time to
censoring
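The slides do not say how the survival curves were computed. A common choice is the Kaplan-Meier estimator, sketched here in plain Python under that assumption; the durations (days from prediction time) and event flags are made up.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each observed event time.

    durations: time from prediction time to event or censoring.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    curve, survival = [], 1.0
    event_times = sorted({d for d, e in zip(durations, events) if e == 1})
    for t in event_times:
        at_risk = sum(d >= t for d in durations)
        died = sum(d == t and e == 1 for d, e in zip(durations, events))
        survival *= 1 - died / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up: two patients are censored (event flag 0).
durations = [5, 10, 10, 20, 30]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
for time, probability in curve:
    print(time, round(probability, 3))
```

Plotting this curve for a candidate cohort is a quick check on whether the chosen prediction times produce a realistic survival profile.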
Different selections of prediction times lead to
different survival profiles over same cohort
Example: high percentage of patients deceased within 30
days. Model trained to distinguish mostly between
relatively healthy and moribund patients → performance
overestimate
Final Remarks
• Outliers should not be treated like anomalies
• Split train (CV) and test sets temporally
• Metrics:
– ROC AUC alone could be misleading
– Precision-Recall curve often more useful than ROC
– Compare with meaningful benchmarks
• Performance possibly overestimated for cohorts with
unrealistic survival curves
Thank You!
Twitter: @LorenzoARossi
Supplemental Material
Example: ROC Curve
Very high detection rate,
but also high false alarm rate
Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for Healthcare by Lorenzo Rossi

More Related Content

What's hot

Fallacies indrayan
Fallacies indrayanFallacies indrayan
Fallacies indrayan
Abhaya Indrayan
 
Knowledge discovery in medicine
Knowledge discovery in medicineKnowledge discovery in medicine
Knowledge discovery in medicine
Avinash Hanwate
 
Sample size and power calculations
Sample size and power calculationsSample size and power calculations
Sample size and power calculations
Ramachandra Barik
 
Scientific Studies Reporting Guidelines
Scientific Studies Reporting GuidelinesScientific Studies Reporting Guidelines
Scientific Studies Reporting Guidelines
Cognibrain Healthcare
 
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
IJDKP
 
Trends in clinical research and career gd 09_may20
Trends in clinical research and career gd 09_may20Trends in clinical research and career gd 09_may20
Trends in clinical research and career gd 09_may20
Dr. Ganesh Divekar
 
Searching for Evidence
Searching for EvidenceSearching for Evidence
Searching for Evidence
Aboubakr Elnashar
 
To Cochrane or not: that's the question
To Cochrane or not: that's the questionTo Cochrane or not: that's the question
To Cochrane or not: that's the question
Hesham Al-Inany
 
When to Select Observational Studies as Evidence for Comparative Effectivenes...
When to Select Observational Studies as Evidence for Comparative Effectivenes...When to Select Observational Studies as Evidence for Comparative Effectivenes...
When to Select Observational Studies as Evidence for Comparative Effectivenes...Effective Health Care Program
 
Therapeutic_Innovation_&_Regulatory_Science-2015-Tantsyura
Therapeutic_Innovation_&_Regulatory_Science-2015-TantsyuraTherapeutic_Innovation_&_Regulatory_Science-2015-Tantsyura
Therapeutic_Innovation_&_Regulatory_Science-2015-TantsyuraVadim Tantsyura
 
Day 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in HealthcareDay 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in Healthcare
Aseda Owusua Addai-Deseh
 
How to conduct meta analysis
How to conduct meta analysisHow to conduct meta analysis
How to conduct meta analysis
Dr.Junaid Nazar
 
Meta analysis
Meta analysisMeta analysis
Meta analysis
Sethu S
 
Research methodology and biostatistics
Research methodology and biostatisticsResearch methodology and biostatistics
Research methodology and biostatistics
Medical Ultrasound
 
lecture C
lecture Clecture C
lecture C
CMDLMS
 
Common statistical pitfalls in basic science research
Common statistical pitfalls in basic science researchCommon statistical pitfalls in basic science research
Common statistical pitfalls in basic science research
Ramachandra Barik
 

What's hot (20)

Fallacies indrayan
Fallacies indrayanFallacies indrayan
Fallacies indrayan
 
Knowledge discovery in medicine
Knowledge discovery in medicineKnowledge discovery in medicine
Knowledge discovery in medicine
 
Sample size and power calculations
Sample size and power calculationsSample size and power calculations
Sample size and power calculations
 
Scientific Studies Reporting Guidelines
Scientific Studies Reporting GuidelinesScientific Studies Reporting Guidelines
Scientific Studies Reporting Guidelines
 
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
 
Trends in clinical research and career gd 09_may20
Trends in clinical research and career gd 09_may20Trends in clinical research and career gd 09_may20
Trends in clinical research and career gd 09_may20
 
Searching for Evidence
Searching for EvidenceSearching for Evidence
Searching for Evidence
 
Amsterdam 11.06.2008
Amsterdam 11.06.2008Amsterdam 11.06.2008
Amsterdam 11.06.2008
 
To Cochrane or not: that's the question
To Cochrane or not: that's the questionTo Cochrane or not: that's the question
To Cochrane or not: that's the question
 
When to Select Observational Studies as Evidence for Comparative Effectivenes...
When to Select Observational Studies as Evidence for Comparative Effectivenes...When to Select Observational Studies as Evidence for Comparative Effectivenes...
When to Select Observational Studies as Evidence for Comparative Effectivenes...
 
Therapeutic_Innovation_&_Regulatory_Science-2015-Tantsyura
Therapeutic_Innovation_&_Regulatory_Science-2015-TantsyuraTherapeutic_Innovation_&_Regulatory_Science-2015-Tantsyura
Therapeutic_Innovation_&_Regulatory_Science-2015-Tantsyura
 
Day 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in HealthcareDay 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in Healthcare
 
How to conduct meta analysis
How to conduct meta analysisHow to conduct meta analysis
How to conduct meta analysis
 
Meta analysis
Meta analysisMeta analysis
Meta analysis
 
Research methodology and biostatistics
Research methodology and biostatisticsResearch methodology and biostatistics
Research methodology and biostatistics
 
lecture C
lecture Clecture C
lecture C
 
Common statistical pitfalls in basic science research
Common statistical pitfalls in basic science researchCommon statistical pitfalls in basic science research
Common statistical pitfalls in basic science research
 
297 vickers
297 vickers297 vickers
297 vickers
 
297 vickers
297 vickers297 vickers
297 vickers
 
Malmo 11.11.2008
Malmo 11.11.2008Malmo 11.11.2008
Malmo 11.11.2008
 

Similar to Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for Healthcare by Lorenzo Rossi

Final_Presentation.pptx
Final_Presentation.pptxFinal_Presentation.pptx
Final_Presentation.pptx
SudeekshaKoricherla
 
SHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPSHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLP
AlAcademia Tsr
 
Cadth 2015 c2 tt eincea_cadth_042015
Cadth 2015 c2 tt eincea_cadth_042015Cadth 2015 c2 tt eincea_cadth_042015
Cadth 2015 c2 tt eincea_cadth_042015
CADTH Symposium
 
Data analysis ( Bio-statistic )
Data analysis ( Bio-statistic )Data analysis ( Bio-statistic )
Data analysis ( Bio-statistic )
Amany Elsayed
 
statistics introduction.ppt
statistics introduction.pptstatistics introduction.ppt
statistics introduction.ppt
CHANDAN PADHAN
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
Evangelos Kritsotakis
 
bio equivalence studies
bio equivalence studiesbio equivalence studies
bio equivalence studies
RamyaP53
 
Evaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk predictionEvaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk prediction
Ewout Steyerberg
 
Automated Abstracting - NCRA San Antonio 2015
Automated Abstracting - NCRA San Antonio 2015Automated Abstracting - NCRA San Antonio 2015
Automated Abstracting - NCRA San Antonio 2015Victor Brunka
 
Statistics for DP Biology IA
Statistics for DP Biology IAStatistics for DP Biology IA
Statistics for DP Biology IA
Veronika Garga
 
Biological variation as an uncertainty component
Biological variation as an uncertainty componentBiological variation as an uncertainty component
Biological variation as an uncertainty component
GH Yeoh
 
Quality control clia
Quality control cliaQuality control clia
Quality control clia
Juan Méndez
 
In tech quality-control_in_clinical_laboratories
In tech quality-control_in_clinical_laboratoriesIn tech quality-control_in_clinical_laboratories
In tech quality-control_in_clinical_laboratories
Millat Sultan
 
First in man tokyo
First in man tokyoFirst in man tokyo
First in man tokyo
Stephen Senn
 
Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016Praveen Chand
 
ICU SCORES.pptx
ICU SCORES.pptxICU SCORES.pptx
ICU SCORES.pptx
drmayanksach
 
Clinical Research Statistics for Non-Statisticians
Clinical Research Statistics for Non-StatisticiansClinical Research Statistics for Non-Statisticians
Clinical Research Statistics for Non-Statisticians
Brook White, PMP
 
Extrapolation of time-to-event data
Extrapolation of time-to-event dataExtrapolation of time-to-event data
Extrapolation of time-to-event dataSheily Kamra
 
Data-driven Disease Phenotyping and Bulk Learning
Data-driven Disease Phenotyping and Bulk LearningData-driven Disease Phenotyping and Bulk Learning
Data-driven Disease Phenotyping and Bulk Learning
Po-Hsiang (Barnett) Chiu
 
Sample size calculation
Sample size calculationSample size calculation
Sample size calculation
Santam Chakraborty
 

Similar to Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for Healthcare by Lorenzo Rossi (20)

Final_Presentation.pptx
Final_Presentation.pptxFinal_Presentation.pptx
Final_Presentation.pptx
 
SHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPSHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLP
 
Cadth 2015 c2 tt eincea_cadth_042015
Cadth 2015 c2 tt eincea_cadth_042015Cadth 2015 c2 tt eincea_cadth_042015
Cadth 2015 c2 tt eincea_cadth_042015
 
Data analysis ( Bio-statistic )
Data analysis ( Bio-statistic )Data analysis ( Bio-statistic )
Data analysis ( Bio-statistic )
 
statistics introduction.ppt
statistics introduction.pptstatistics introduction.ppt
statistics introduction.ppt
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
 
bio equivalence studies
bio equivalence studiesbio equivalence studies
bio equivalence studies
 
Evaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk predictionEvaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk prediction
 
Automated Abstracting - NCRA San Antonio 2015
Automated Abstracting - NCRA San Antonio 2015Automated Abstracting - NCRA San Antonio 2015
Automated Abstracting - NCRA San Antonio 2015
 
Statistics for DP Biology IA
Statistics for DP Biology IAStatistics for DP Biology IA
Statistics for DP Biology IA
 
Biological variation as an uncertainty component
Biological variation as an uncertainty componentBiological variation as an uncertainty component
Biological variation as an uncertainty component
 
Quality control clia
Quality control cliaQuality control clia
Quality control clia
 
In tech quality-control_in_clinical_laboratories
In tech quality-control_in_clinical_laboratoriesIn tech quality-control_in_clinical_laboratories
In tech quality-control_in_clinical_laboratories
 
First in man tokyo
First in man tokyoFirst in man tokyo
First in man tokyo
 
Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016
 
ICU SCORES.pptx
ICU SCORES.pptxICU SCORES.pptx
ICU SCORES.pptx
 
Clinical Research Statistics for Non-Statisticians
Clinical Research Statistics for Non-StatisticiansClinical Research Statistics for Non-Statisticians
Clinical Research Statistics for Non-Statisticians
 
Extrapolation of time-to-event data
Extrapolation of time-to-event dataExtrapolation of time-to-event data
Extrapolation of time-to-event data
 
Data-driven Disease Phenotyping and Bulk Learning
Data-driven Disease Phenotyping and Bulk LearningData-driven Disease Phenotyping and Bulk Learning
Data-driven Disease Phenotyping and Bulk Learning
 
Sample size calculation
Sample size calculationSample size calculation
Sample size calculation
 

More from Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
Data Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
Data Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
Data Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
Data Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
Data Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
Data Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
Data Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
Data Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA
 

More from Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for Healthcare by Lorenzo Rossi

  • 1. Lorenzo Rossi, PhD Data Scientist City of Hope National Medical Center DataCon LA, August 2019 Best Practices for Prototyping Machine Learning Models for Healthcare
  • 3. Machine learning in healthcare is growing fast, but best practices are not well established yet Towards Guidelines for ML in Health (8.2018, Stanford)
  • 4. Motivations for ML in Healthcare 1. Lots of information about patients, but not enough time for clinicians to process it 2. Physicians spend too much time typing information about patients during encounters 3. Overwhelming amount of false alerts (e.g. in ICU)
  • 5. Topics 1. The electronic health record (EHR) 2. Cohort definition 3. Data quality 4. Training - testing split 5. Performance metrics and reporting 6. Survival analysis
  • 6. Topics 1. The electronic health record (EHR) 2. Cohort definition 3. Data quality 4. Training - testing split 5. Performance metrics and reporting 6. Survival curves Data preparation
  • 7. 1. The Electronic Health Record (EHR)
  • 8. EHR data are very heterogeneous: • Laboratory tests • Vitals • Diagnoses • Medications • X-rays, CT scans, EKGs, … • Notes
  • 9. EHR data are very heterogeneous: • Laboratory tests [multi-dimensional time series] • Vitals [multi-dimensional time series] • Diagnoses [text, codes] • Medications [text, codes, numeric] • X-rays, CT scans, EKGs, … [2D - 3D images, time series, …] • Notes [text]
  • 10. Time is a key aspect of EHR data [figure: labs, vitals, notes, … along a time axis for patients p01, p02, p03]
  • 11. Time is a key aspect of EHR data [figure: labs, vitals, notes, … along a time axis for patients p01, p02, p03]. Temporal resolution varies a lot: • ICU patient [minutes] • Hospital patient [hours] • Outpatient [weeks]
  • 12. Events hospitals want to predict from EHR data: • Unplanned 30-day readmission • Length of stay • Mortality • Sepsis • ICU admission • Surgical complications
  • 13. Events hospitals want to predict from EHR data: • Unplanned 30-day readmission • Length of stay • Mortality • Sepsis • ICU admission • Surgical complications [annotation: Improve capacity]
  • 14. Events hospitals want to predict from EHR data: • Unplanned 30-day readmission • Length of stay • Mortality • Sepsis • ICU admission • Surgical complications [annotations: Improve capacity, Optimize decisions]
  • 15. Consider only binary (0 / 1) prediction tasks for simplicity. The prediction algorithm gives a score from 0 to 1 – e.g. close to 1 → high risk of readmission within 30 days
  • 16. Consider only binary (0 / 1) prediction tasks for simplicity. The prediction algorithm gives a score from 0 to 1 – e.g. close to 1 → high risk of readmission within 30 days. There is a trade-off between falsely detected and missed targets
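The trade-off between falsely detected and missed targets can be illustrated with a minimal sketch. The scores, labels, and the two thresholds below are made up for illustration and do not come from the talk; `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(scores, labels, threshold):
    """Count outcomes when every score >= threshold is predicted as 1."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1  # correctly detected target
        elif predicted == 1 and label == 0:
            fp += 1  # falsely detected target
        elif predicted == 0 and label == 1:
            fn += 1  # missed target
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Hypothetical risk scores (e.g. for 30-day readmission) and true outcomes.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

low = confusion_counts(scores, labels, threshold=0.25)
high = confusion_counts(scores, labels, threshold=0.70)
print("threshold 0.25:", low)
print("threshold 0.70:", high)
```

With these toy values, the lower threshold misses no targets but raises two false alarms, while the higher threshold raises no false alarms but misses one target.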
  • 18. Cohort: individuals “who experienced particular event during specific period of time”
  • 19. Cohort: individuals “who experienced particular event during specific period of time”. Given a prediction task, select a clinically relevant cohort. E.g. for surgery complication prediction, patients who had one or more surgeries between 2011 and 2018.
  • 20. A. Pick records of a subset of patients [figure: labs, vitals, notes, … along a time axis for patients p01, p02, p03]
  • 21. B. Pick a prediction time for each patient. Records after the prediction time are discarded [figure: labs, vitals, notes, … along a time axis for patients p01, p02, p03]
  • 22. B. Pick a prediction time for each patient. Records after the prediction time are discarded [figure: labs, vitals, notes, … along a time axis for patients p01, p02, p03]
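Steps A and B can be sketched as follows. The record layout (patient id, date, kind, value), the helper name `records_before`, and all values are assumptions made for illustration; keeping records dated exactly at the prediction time is also an assumption, since the slides do not specify it.

```python
from datetime import date


def records_before(records, prediction_times):
    """Keep records of cohort patients dated on or before their prediction time."""
    kept = []
    for patient_id, when, kind, value in records:
        cutoff = prediction_times.get(patient_id)
        # Patients without a prediction time are not in the cohort (step A).
        if cutoff is not None and when <= cutoff:
            kept.append((patient_id, when, kind, value))
    return kept


# Hypothetical records: (patient id, date, record kind, value).
records = [
    ("p01", date(2018, 1, 5), "lab", 4.1),
    ("p01", date(2018, 3, 1), "vital", 120),   # after p01's prediction time
    ("p02", date(2018, 2, 10), "lab", 3.2),
    ("p03", date(2018, 2, 11), "note", "follow-up"),  # p03 not in cohort
]
prediction_times = {"p01": date(2018, 2, 1), "p02": date(2018, 2, 10)}

kept = records_before(records, prediction_times)
for row in kept:
    print(row)
```

Discarding everything after the prediction time keeps information from the "future" out of the features.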
  • 23. 3. Data Quality [Image source: SalesForce]
  • 24. EHR data are challenging in many different ways
  • 25. Example: most common non-numeric entries for lab values in a legacy EHR system • pending • “>60” • see note • not done • “<2” • normal • “1+” • “2 to 5” • “<250” • “<0.1”
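One possible way to handle such entries is sketched below. It is an assumption, not the method used in the talk: exact numbers are parsed, censored values such as “<2” keep their bound together with a qualifier, and everything else is flagged as non-numeric so it can be treated as missing. The function name and qualifier labels are hypothetical.

```python
import re

_NUMBER = r"[-+]?\d+(?:\.\d+)?"


def parse_lab_value(raw):
    """Return (value, qualifier); value is None when no number can be recovered."""
    text = raw.strip().strip('"“”').strip()
    if re.fullmatch(_NUMBER, text):
        return float(text), "exact"
    match = re.fullmatch(r"([<>])\s*(" + _NUMBER + r")", text)
    if match:
        qualifier = "below_limit" if match.group(1) == "<" else "above_limit"
        return float(match.group(2)), qualifier
    # Entries such as "pending", "see note", "1+" or "2 to 5" land here.
    return None, "non_numeric"


for entry in ["4.2", ">60", "<0.1", "pending", "1+", "2 to 5"]:
    print(repr(entry), "->", parse_lab_value(entry))
```

Keeping the qualifier separate from the number lets a later step decide how to use censored values instead of silently treating “<2” as 2.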
  • 26. Example: discrepancies in dates of death between hospital records and Social Security (~ 4.8 % of shared patients)
  • 28. Distinguish between Anomalies and Outliers Outlier: legitimate data point far away from mean/median of distribution Anomaly: illegitimate data point generated by process different from one producing rest of data Need domain knowledge to differentiate
  • 29. Distinguish between Anomalies and Outliers Outlier: legitimate data point far away from mean/median of distribution Anomaly: illegitimate data point generated by process different from one producing rest of data Need domain knowledge to differentiate E.g.: Albumin level in blood. Normal range: 3.4 – 5.4 g/dL. µ=3.5, σ=0.65 over cohort.
  • 30. Distinguish between Anomalies and Outliers Outlier: legitimate data point far away from mean/median of distribution Anomaly: illegitimate data point generated by process different from one generating rest of data Need domain knowledge to differentiate E.g.: Albumin level in blood. Normal range: 3.4 – 5.4 g/dL. µ=3.5, σ=0.65 over cohort. ρ = -1 → ?
  • 31. Distinguish between Anomalies and Outliers Outlier: legitimate data point far away from mean/median of distribution Anomaly: illegitimate data point generated by process different from one generating rest of data Need domain knowledge to differentiate E.g.: Albumin level in blood. Normal range: 3.4 – 5.4 g/dL. µ=3.5, σ=0.65 over cohort. ρ = -1 → anomaly (treat as missing value)
  • 32. Distinguish between Anomalies and Outliers Outlier: legitimate data point far away from mean/median of distribution Anomaly: illegitimate data point generated by process different from one generating rest of data Need domain knowledge to differentiate E.g.: Albumin level in blood. Normal range: 3.4 – 5.4 g/dL. µ=3.5, σ=0.65 over cohort. ρ = 1 → ?
  • 33. Distinguish between Anomalies and Outliers Outlier: legitimate data point far away from mean/median of distribution Anomaly: illegitimate data point generated by process different from one generating rest of data Need domain knowledge to differentiate E.g.: Albumin level in blood. Normal range: 3.4 – 5.4 g/dL. µ=3.5, σ=0.65 over cohort. ρ = 1 → possibly an outlier (clinically relevant)
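The albumin example can be sketched in code. The rule below is an assumption for illustration: a negative concentration is physically impossible, so it is labeled an anomaly and replaced by a missing value, while a legitimate value more than `z_cutoff` standard deviations from the cohort mean is labeled an outlier and kept. The function names and the cutoff of 3 are hypothetical; in practice the bounds would come from domain knowledge.

```python
def classify_lab_value(value, mean, std, lower_bound=0.0, z_cutoff=3.0):
    """Label a value as 'anomaly', 'outlier' or 'typical'."""
    if value < lower_bound:
        return "anomaly"  # cannot be produced by a real measurement
    z = (value - mean) / std
    return "outlier" if abs(z) > z_cutoff else "typical"


def clean(values, mean, std):
    """Replace anomalies with None (missing); keep outliers untouched."""
    return [
        None if classify_lab_value(v, mean, std) == "anomaly" else v
        for v in values
    ]


# Cohort statistics quoted on the slide: mean 3.5 g/dL, std 0.65 g/dL.
MEAN, STD = 3.5, 0.65
for albumin in (-1, 1, 3.4):
    print(albumin, "->", classify_lab_value(albumin, MEAN, STD))

cleaned = clean([-1, 1, 3.4], MEAN, STD)
print(cleaned)
```

A purely statistical filter would drop both -1 and 1; separating the two cases preserves the clinically relevant low reading.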
  • 34. 4. Training - Testing Split
  • 36. Guidelines • Machine learning models are evaluated on their ability to make predictions on new (unseen) data • Split train (cross-validation) and test sets based on temporal criteria – e.g. no records in train set after prediction dates in test set – random splits, even if stratified, could include records virtually from the ‘future’ to train the model • In retrospective studies, also avoid records of the same patients across train and test – the model could just learn to recognize patients
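A minimal sketch of these guidelines follows, assuming each example is a dict with a `patient_id` and a `prediction_date` (hypothetical field names) and that a single cutoff date separates train from test. Dropping a test patient's earlier examples from the train set is one possible way to keep patients disjoint, not necessarily the one used in the talk.

```python
from datetime import date


def temporal_split(examples, cutoff):
    """Split by prediction date, keeping patients disjoint across the sets."""
    test = [e for e in examples if e["prediction_date"] >= cutoff]
    test_patients = {e["patient_id"] for e in test}
    train = [
        e for e in examples
        if e["prediction_date"] < cutoff
        and e["patient_id"] not in test_patients
    ]
    return train, test


# Hypothetical examples; p02 appears both before and after the cutoff.
examples = [
    {"patient_id": "p01", "prediction_date": date(2016, 5, 1)},
    {"patient_id": "p02", "prediction_date": date(2017, 3, 1)},
    {"patient_id": "p02", "prediction_date": date(2018, 6, 1)},
    {"patient_id": "p03", "prediction_date": date(2018, 9, 1)},
]

train, test = temporal_split(examples, cutoff=date(2018, 1, 1))
print("train:", [e["patient_id"] for e in train])
print("test: ", [e["patient_id"] for e in test])
```

Here p02's 2017 example is excluded from training because the same patient appears in the test set.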
  • 37. 5. Performance Metrics and Reporting
  • 38. Background: generally highly imbalanced problems • 15% unplanned 30-day readmissions • < 10% sepsis cases • < 1% 30-day mortality
  • 39. Types of Performance Metrics 1. Measure trade-offs – (ROC) AUC – average precision / PR AUC 2. Measure error rate at specific decision point – false positive, false negative rates – precision, recall – F1 – accuracy
  • 40. Types of Performance Metrics (II) 1. Measure trade-offs – AUC, average precision / PR AUC – good for global performance characterization and (intra-)model comparisons 2. Measure error rate at a specific decision point – false positives, false negatives, …, precision, recall – possibly good for interpretation of specific clinical costs and benefits
  • 41. Don’t use accuracy unless dataset is balanced
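A small worked example shows why accuracy misleads on imbalanced data. The numbers are made up: 1,000 patients with a 1% event rate and a trivial model that always predicts "no event".

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def recall(labels, predictions):
    """Fraction of true events that were detected."""
    detected = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    return detected / sum(labels)


# Hypothetical cohort: 10 events among 1,000 patients (1% prevalence).
labels = [1] * 10 + [0] * 990
always_negative = [0] * len(labels)

acc = accuracy(labels, always_negative)
rec = recall(labels, always_negative)
print(f"accuracy = {acc:.2f}, recall = {rec:.2f}")
```

The trivial model reaches 99% accuracy while detecting none of the events.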
  • 42. ROC AUC can be misleading too
  • 44. ROC AUC can be misleading (II) [Avati, Ng et al., Countdown Regression: Sharp and Calibrated Survival Predictions. ArXiv, 2018]
  • 45. ROC AUC (1 year) > ROC AUC (5 years), but PR AUC (1 year) < PR AUC (5 years)! Latter prediction task is easier. [Avati, Ng et al., Countdown Regression: Sharp and Calibrated Survival Predictions. ArXiv, 2018]
  • 46. Performance should be reported with both types of metrics • 1 or 2 metrics for trade-off evaluation – ROC AUC – average precision • 1 metric for performance at clinically meaningful decision point – e.g. recall @ 90% precision
  • 47. Performance should be reported with both types of metrics • 1 or 2 metrics for trade-off evaluation – ROC AUC – average precision • 1 metric for performance at clinically meaningful decision point – e.g. recall @ 90% precision + Comparison with a known benchmark (baseline)
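The three kinds of numbers named above can be computed with a short standard-library sketch. The scores and labels are made up, the function names are hypothetical, and average precision is computed here as the step-wise sum of precision weighted by the increase in recall, which is one common definition.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_points(labels, scores):
    """(threshold, precision, recall) at each distinct score, highest first."""
    n_pos = sum(labels)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / (tp + fp), tp / n_pos))
    return points


def average_precision(labels, scores):
    """Sum of precision weighted by the recall gained at each threshold."""
    ap, prev_recall = 0.0, 0.0
    for _, precision, recall in precision_recall_points(labels, scores):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def recall_at_precision(labels, scores, min_precision=0.9):
    """Best recall among thresholds whose precision is at least min_precision."""
    feasible = [r for _, p, r in precision_recall_points(labels, scores)
                if p >= min_precision]
    return max(feasible, default=0.0)


scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
r90 = recall_at_precision(labels, scores, 0.9)
print(f"ROC AUC = {auc:.3f}, AP = {ap:.3f}, recall @ 90% precision = {r90:.3f}")
```

Running the same functions on a simple baseline model (for example, logistic regression scores) would give the benchmark comparison the slide asks for.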
  • 48. Metrics in Stanford 2017 paper on mortality prediction: AUC, average precision, recall @ 90%
  • 50. Main paper [Google, Nature, 2018] only reports deep learning results with no benchmark comparison
  • 51. Comparison only in supplemental online file (not in the Nature paper): deep learning only 1-2% better than logistic regression benchmark
  • 52. Plot scales can be deceiving [undisclosed vendor, 2017]!
  • 53. Same TP, FP plots rescaled
  • 55. B. Pick a prediction time for each patient. Records after the prediction time are discarded [figure: labs, vitals, notes, … for patients p01, p02, p03]
  • 56. C. Plot survival curves • Consider binary classification tasks – Event of interest (e.g. death) either happens or not before censoring time • Survival curve: distribution of time to event and time to censoring
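One standard way to estimate a survival curve with censoring is the Kaplan-Meier estimator. The slides do not name an estimator, so its use here is an assumption, and the durations below are made up. `observed` is 1 when the event (e.g. death) happened at that time and 0 when the patient was censored.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with an event."""
    curve, survival = [], 1.0
    event_times = sorted({d for d, e in zip(durations, observed) if e})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in days from prediction time; 0 = censored.
durations = [5, 10, 10, 20, 30, 30]
observed = [1, 1, 0, 1, 0, 0]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"day {t:>2}: S(t) = {s:.3f}")
```

Plotting such a curve for the cohort under each choice of prediction time makes it easy to spot cases where a large share of patients die shortly after the prediction time.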
  • 57. Different selections of prediction times lead to different survival profiles over same cohort
  • 58. Example: high percentage of patients deceased within 30 days. Model trained to distinguish mostly between relatively healthy and moribund patients
  • 59. Example: high percentage of patients deceased within 30 days. Model trained to distinguish mostly between relatively healthy and moribund patients → performance overestimate
  • 60. Final Remarks • Outliers should not be treated like anomalies • Split train (CV) and test sets temporally • Metrics: – ROC AUC alone could be misleading – Precision-Recall curve often more useful than ROC – Compare with meaningful benchmarks • Performance possibly overestimated for cohorts with unrealistic survival curves
  • 64. Example: ROC Curve. Very high detection rate, but also high false alarm rate