DATA SCIENCE IN HEALTHCARE & LIFE SCIENCES
APPLIED CLINICAL ANALYTICS
DATA ANALYTICS IN HEALTHCARE & LIFE SCIENCES
1. VITAL BUSINESS PROBLEMS:
Many different problems exist, with varying degrees of complexity:
- What impacts favorable clinical outcomes
- Drivers of adverse events
- Factors impacting cost of care
- Earlier diagnosis of cancers and chronic diseases
Understanding these different business problems is critical for generating possible solutions.
2. POTENTIAL DATA SOURCES:
Huge amounts of data are now being generated from different sources that are capable of capturing information:
- Electronic Health Records
- Healthcare claims from Insurance companies
- Pharmacies – claims and medication reviews
- Lab tests and Imaging results
- Population health data – Social Determinants of Health
- Genomics (and later Proteomics and Metabolomics)
- Wearable and other devices
- Other sources (Surveys, Patient Reported Outcomes)
The volume, velocity, variety, and veracity of the data being generated are staggering: a typical Big Data problem.
3. DATA PROCESSING, MANAGEMENT AND ANALYSIS:
Making sense of these varied sources of data and processing them so that they are useful for analysis is a data engineering challenge.
Structured data needs to be cleaned and curated; data from different sources need to be matched to get a complete 360 degree view of the customer.
Semi-structured and unstructured data sources (physician notes, imaging data) are challenging to curate and store so that the information can be retrieved and analyzed at scale and speed.
Various Big Data technologies have been developed to tackle this problem, both for storing (Hadoop ecosystem, Spark) and for analyzing semi-structured and unstructured data (text mining, NLP, deep learning for image and video analytics).
4. SOLUTIONS TO THE PROBLEMS:
At the end of the day, all the analysis should be able to generate actionable insights. Interpretation of the results and their implementation to solve the problem are key.
HOW ML/DL CAN AUGMENT THE DECISION MAKING PROCESS FOR CLINICIANS
PROGNOSIS
• A machine-learning model can learn the patterns of health trajectories of vast numbers of patients. This facility can help physicians to anticipate future events at an expert level, drawing from information well beyond the individual physician’s practice experience. For example, how likely is it that a patient will be able to return to work, or how quickly will the disease progress?
DIAGNOSIS
• A diagnostic error will occur in the care of nearly every patient in his or her lifetime, and receiving the right diagnosis is critical to receiving appropriate care. This problem is not limited to rare conditions. Cardiac chest pain, TB, dysentery, and complications of childbirth are commonly not detected in developing countries.
TREATMENT
• In a large health care system with tens of thousands of physicians treating tens of millions of patients, there is variation in when and why patients present for care and how patients with similar conditions are treated. Can a model sort through these natural variations to help physicians identify when the collective experience points to a preferred treatment pathway?
CLINICAL WORKFLOW
• The same machine-learning techniques that are used in many consumer products can be used to make clinicians more efficient. Machine learning that drives search engines can help expose required information in a patient’s chart for a clinician without multiple clicks. Data entry of forms and text fields can be improved with the use of machine-learning techniques.
REMOTE AREAS
• There is no way for physicians to individually interact with all the patients who may need care. Can machine learning extend the reach of clinicians to provide expert-level medical assessment without their personal involvement? For example, patients with new rashes may be able to obtain a diagnosis by sending a picture that they take on their smartphones, thereby averting unnecessary urgent-care visits.
REFERENCE: https://www.nejm.org/doi/full/10.1056/NEJMra1814259
COMPONENTS OF ELECTRONIC HEALTH RECORDS
EMR: DEMOG & HISTORY, DRUGS, ALLERGIES, VISITS, ADMISSIONS, DIAGNOSES, LAB RESULTS, PROCEDURE
ADDITIONAL DATA FACTORS (normally not present)
• GENOMICS
• SOCIAL DETERMINANTS OF HEALTH
• IMAGING DATA – X-RAY/USG/CT/MRI
• PATIENT REPORTED OUTCOMES - PRO
STANDARD EMR/EHR DATA COMPONENTS
• DEMOGRAPHICS – Age, Gender, Race, Language, Religion, Insurance, Location
• CLINICAL HISTORY – Habits, Past Dx and Observations
• MEDICATIONS – Drug NDC, Quantity, Refills, Route, Rx dates
• FOOD AND DRUG ALLERGIES – Allergen, Reaction Desc., Severity, Dates
• VISITS TO ER AND OPD – Date/Time, Encounter Type, Provider Info
• INPATIENT ADMISSIONS – Date/Time, Source, Discharge Code
• PRIMARY DIAGNOSES AND COMORBIDITIES – ICD9/10, SNOMED
• PROCEDURES AND SURGERIES – Procedure codes and ICD codes
• LABORATORY RESULTS – LOINC, Date/Time, Reference Range, Value, UOM
Standard dictionaries: ICD9/10, SNOMED-CT, NDC, LOINC, NPI
DIABETES – THE MAGNITUDE OF THE PROBLEM
Diabetes is the world's eighth biggest killer, accounting for some 1.5 million deaths each year. A major new World Health Organization report has now revealed that the number of cases around the world has nearly quadrupled to 422 million in 2014 from 108 million in 1980. The Eastern Mediterranean region had the biggest increase in cases during that time frame. Diabetes now affects one in 11 adults, with high blood sugar levels linked to 3.8 million deaths every year.
REFERENCE: https://www.statista.com/chart/4617/the-unrelenting-global-march-of-diabetes/
WHAT HAPPENS IN DIABETES MELLITUS
• https://youtu.be/qn2dhw0NJxo
Type 1 diabetes (T1DM)
In people with type 1 diabetes, the body does not make insulin. The immune system attacks and destroys the cells in the pancreas that make insulin. Type 1 diabetes is usually diagnosed in children and young adults, although it can appear at any age. People with type 1 diabetes need to take insulin every day to stay alive.
Type 2 diabetes (T2DM)
In people with type 2 diabetes, the body does not make or use insulin well. Type 2 diabetes can develop at any age, even during childhood. However, this type of diabetes occurs most often in middle-aged and older people. Type 2 is the most common type of diabetes.
COURTESY: NIDDK https://www.niddk.nih.gov/health-information/diabetes/overview/what-is-diabetes
IMAGE COURTESY: KHAN ACADEMY
HOW MACHINE LEARNING CAN HELP IN DIABETES
• Predicting risk of heart failure for diabetes patients with help from machine learning
• Identification of Type 2 Diabetes Risk Factors Using Phenotypes Consisting of Anthropometry and Triglycerides based on Machine Learning
• Use of a Machine Learning Algorithm Improves Prediction of Progression to Diabetes
• Predicting Future Glucose Fluctuations Using Machine Learning and Wearable Sensor Data
• Predicting Diabetes Mellitus With Machine Learning Techniques
• Machine-learning to stratify diabetic patients using novel cardiac biomarkers and integrative genomics
• Predicting diabetic retinopathy and identifying interpretable biomedical features using machine learning algorithms
• Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records
• Data-Driven Blood Glucose Pattern Classification and Anomalies Detection: Machine-Learning Applications in Type 1 Diabetes
APPROACH FOR DM READMISSION PREDICTIVE MODEL
• DMT2 risk prediction using clinical data and statistical and machine learning algorithms/models
Predictor Variables (total 44 variables)
• Demographic
  - Age
  - Gender
  - Ethnicity
• Diagnosis
  - Type of condition (DM T1/T2) diagnosis
  - # of comorbidities
  - Position (primary, secondary, etc.) of diagnosis
• Encounter
  - IP, OP, AE visits
• Medications
  - Dosage, frequency, route
• Lab results
  - Test names, dates, UOM, value
  - Normal/abnormal result
• Admission
  - Length of stay
  - Admission method (elective, non-elective)
  - Discharge destination
• Procedure
  - Count of procedures
  - Cost of procedures
Response Variable
• Readmission within 30 days
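The response variable can be derived from admission and discharge dates. The sketch below is one possible way to do it, not the method used in the study: the tuple layout, the function name `label_30_day_readmission`, and the choice to flag the earlier (index) admission are all assumptions made for illustration.

```python
from datetime import date

def label_30_day_readmission(admissions, window_days=30):
    """Return one 0/1 label per admission.

    admissions: list of (patient_id, admit_date, discharge_date) tuples.
    An admission is labelled 1 when the same patient is admitted again
    within `window_days` of its discharge date (assumed definition).
    """
    # Group admission positions by patient.
    by_patient = {}
    for idx, (patient_id, _admit, _discharge) in enumerate(admissions):
        by_patient.setdefault(patient_id, []).append(idx)

    labels = [0] * len(admissions)
    for indices in by_patient.values():
        # Order each patient's admissions by admission date.
        indices.sort(key=lambda i: admissions[i][1])
        for current, following in zip(indices, indices[1:]):
            gap = (admissions[following][1] - admissions[current][2]).days
            if 0 <= gap <= window_days:
                labels[current] = 1
    return labels

# Hypothetical records, for illustration only.
admissions = [
    ("p1", date(2020, 1, 1), date(2020, 1, 5)),
    ("p1", date(2020, 1, 20), date(2020, 1, 22)),  # 15 days after discharge
    ("p1", date(2020, 6, 1), date(2020, 6, 3)),    # long gap
    ("p2", date(2020, 3, 1), date(2020, 3, 4)),    # no later admission
]
labels = label_30_day_readmission(admissions)
```

Real readmission definitions often exclude planned readmissions and transfers; those rules are not modelled here.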
Flow: INPUT, MODEL, OUTPUT
1. Data split into time windows: observation window, performance window, validation window (timeline labels on the slide: 4 years, 1 year)
2. Models built using the following algorithms (data from observation and performance windows)
• Logistic regression model (LOG)
• Decision tree model (DT)
• Random forest model (RF)
• Model ensembles
3. In-time validation (within performance window)

Model   GINI    AUC     KS      WORST DECILE CAPTURE
LOG     48.6%   74.3%   34.9%   29.4%
DT      37.3%   68.7%   38.5%   28.2%
RF      53.5%   76.7%   39.8%   33.7%

4. Out-of-time validation (in validation window)
All three models provided accuracy of ~80% in the out-of-time validation scenario.
The RF model, with ~76% AUC, indicates a reasonably good fit.
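The four validation metrics are related: Gini is conventionally 2 × AUC − 1, which matches the reported figures (for example, an AUC of 74.3% gives a Gini of 48.6%). The sketch below shows one way to compute them with the standard library only. It assumes that "worst decile capture" means the share of all readmissions that fall in the 10% of patients with the highest scores; the slide does not define the term, and the toy labels and scores are invented.

```python
def auc_score(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def gini_from_auc(auc):
    """Gini coefficient derived from AUC."""
    return 2.0 * auc - 1.0

def ks_statistic(y_true, scores):
    """Largest gap between true-positive and false-positive rates over all thresholds."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for threshold in sorted(set(scores)):
        tpr = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold) / n_pos
        fpr = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold) / n_neg
        best = max(best, tpr - fpr)
    return best

def top_decile_capture(y_true, scores):
    """Share of all positives found in the 10% highest-scored records (assumed definition)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, len(ranked) // 10)
    return sum(y for _, y in ranked[:cutoff]) / sum(y_true)

# Invented toy data: 1 = readmitted within 30 days.
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]

auc = auc_score(y_true, scores)
gini = gini_from_auc(auc)
ks = ks_statistic(y_true, scores)
capture = top_decile_capture(y_true, scores)
```

The pairwise AUC loop is quadratic in the number of records, so it only suits small samples; a rank-based formulation scales better.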
5. Significant variables (major drivers of readmission)
• Severity of DM
• # of DM spells in past 1 year
• ED LOS in past 1 year
• # of procedures undergone
• # of OPD visits in past 1 year
• # of ED visits in past 1 year
• # of IP visits in past 1 year
• # of comorbidities
• Distance from hospital
• DM LOS in past 1 year
• Time since last ED visit
• Total ED cost in past 1 year
• Age of patient
6. Patient category based on risk score (scale: High to Low)
RISK PREDICTION MODEL: DESIGN, EVALUATION
Missing imputation
• Mean/Median
• Regression
• KNN
Feature Selection
• Feature Imp
• RFE
• WoE and IV
Model Build
• Tree based (DT, RF, GBT)
• Others (SVM, NN, NB)
Model Evaluation
• K-fold cross validation
• ROC curve
Patient cohorts are created based on ICD 9/10 codes for a defined chronic disease (e.g. DMT2) and also on the time of diagnosis, to separate already diagnosed patients from those who will potentially develop the disease.
Prospective Cohort - Scoring Dataset
Feature selection mechanisms help to focus on the most important variables that influence the outcome variable; the methods mentioned above have been used.
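Of the feature selection methods listed, Weight of Evidence (WoE) and Information Value (IV) are simple enough to sketch by hand. The example below assumes one common convention, WoE = ln(% of non-events / % of events) per bin and IV = sum of (% non-events − % events) × WoE; the bin labels and counts are hypothetical and do not come from the study.

```python
import math

def woe_iv(bins):
    """Compute WoE per bin and the total IV for one binned variable.

    bins: dict mapping bin label -> (non_event_count, event_count).
    Assumes every bin has at least one event and one non-event; real data
    usually needs smoothing or bin merging to avoid log(0).
    """
    total_non_events = sum(non for non, _ in bins.values())
    total_events = sum(evt for _, evt in bins.values())
    woe = {}
    iv = 0.0
    for label, (non, evt) in bins.items():
        pct_non = non / total_non_events
        pct_evt = evt / total_events
        woe[label] = math.log(pct_non / pct_evt)
        iv += (pct_non - pct_evt) * woe[label]
    return woe, iv

# Hypothetical counts: (not readmitted, readmitted) by number of ED visits.
ed_visit_bins = {
    "0 visits": (600, 40),
    "1-2 visits": (300, 60),
    "3+ visits": (100, 100),
}
woe, iv = woe_iv(ed_visit_bins)
```

Under this convention a negative WoE marks a bin where events are over-represented, and a larger IV suggests a variable that separates the two outcomes more strongly.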
EMR data has many dimensions, which also means a lot of values are missing; imputation methods help keep most of the features usable.
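As a minimal sketch of the simplest option, median imputation, the function below fills missing numeric values column by column. The column names and values are made up, and `None` is assumed to be the missing-value marker.

```python
from statistics import median

def impute_median(rows, columns):
    """Fill None values in each column with the median of the observed values."""
    fill_values = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        # statistics.median raises StatisticsError if a column is entirely missing.
        fill_values[col] = median(observed)
    imputed = [
        {col: (row[col] if row[col] is not None else fill_values[col]) for col in columns}
        for row in rows
    ]
    return imputed, fill_values

# Hypothetical patient rows with gaps.
rows = [
    {"hba1c": 7.2, "length_of_stay": 3},
    {"hba1c": None, "length_of_stay": 5},
    {"hba1c": 8.1, "length_of_stay": None},
    {"hba1c": 6.5, "length_of_stay": 10},
]
imputed, fill_values = impute_median(rows, ["hba1c", "length_of_stay"])
```

In a modelling workflow the fill values would normally be learned on the training window only and then reused when scoring later data, so that no information leaks from the validation period.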
The basic task is classification, which is done by computing the probability of the outcome for each patient and then applying thresholds.
Multiple models were created and then validated on accuracy metrics to select the best model. Cross-validation and the area under the ROC curve were utilized.
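K-fold cross-validation splits the data into k parts and uses each part once as the held-out test set. The sketch below only generates the index splits; model fitting and AUC scoring per fold are left out. The function name, the default of five folds, and the fixed seed are illustrative assumptions.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(10, k=5))
```

Each record lands in exactly one test fold, so averaging a metric such as AUC over the folds uses every record for evaluation once. With time-ordered clinical data, random folds complement rather than replace the out-of-time check.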
Scoring was done on the prospective cohort to group patients into high-risk, medium-risk and low-risk groups. The high-risk group was to be targeted for interventions.
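Turning predicted probabilities into risk groups amounts to applying two cutoffs. The sketch below assumes cutoffs of 0.3 and 0.6 purely for illustration; the deck does not state the thresholds that were used, and the patient identifiers and scores are invented.

```python
def risk_tier(probability, medium_cutoff=0.3, high_cutoff=0.6):
    """Map a predicted probability to a tier; the cutoffs are illustrative assumptions."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high_cutoff:
        return "high"
    if probability >= medium_cutoff:
        return "medium"
    return "low"

# Hypothetical model outputs for a prospective cohort.
predicted = {"patient_a": 0.82, "patient_b": 0.45, "patient_c": 0.10}
tiers = {pid: risk_tier(p) for pid, p in predicted.items()}

# Patients in the high tier would be the ones flagged for intervention.
outreach_list = [pid for pid, tier in tiers.items() if tier == "high"]
```

In practice the cutoffs would be chosen from the validation results, for example to match the capacity of the intervention programme.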
PRACTICAL USE CASE AND CODE DEMO
USE CASE
• Risk Prediction for Diabetes
DATASET
• Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of Clinical Database Patient Records
UCI MACHINE LEARNING REPOSITORY - Description
100,000 T2DM patients from 130 hospitals; CERNER HEALTH FACTS
OUTCOME
• How likely is a patient to be diagnosed with DM in near future?
• How likely is a T2DM patient to come back to the hospital within 30 days of discharge, or more than 30 days after discharge?
METHODS
Multiple ML models generated and compared
Individual Classifiers: DT, LOGREG, SVC
Ensemble Classifiers: RF, GBC
GitHub Link

Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 

Predictive Analytics and Machine Learning for Healthcare - Diabetes

  • 1. DATA SCIENCE IN HEALTHCARE & LIFE SCIENCES APPLIED CLINICAL ANALYTICS
  • 2. DATA ANALYTICS IN HEALTHCARE & LIFE SCIENCES
    1. VITAL BUSINESS PROBLEMS: Many different problems exist, with varying degrees of complexity:
    - What impacts favorable clinical outcomes
    - Drivers of adverse events
    - Factors impacting cost of care
    - Earlier diagnosis of cancers and chronic diseases
    Understanding these business problems is critical for generating possible solutions.
    2. POTENTIAL DATA SOURCES: Huge amounts of data are now generated by many different sources:
    - Electronic Health Records
    - Healthcare claims from insurance companies
    - Pharmacies: claims and medication reviews
    - Lab tests and imaging results
    - Population health data: Social Determinants of Health
    - Genomics (and later Proteomics and Metabolomics)
    - Wearable and other devices
    - Other sources (surveys, Patient Reported Outcomes)
    The volume, velocity, variety, and veracity of the data being generated are staggering: a typical Big Data problem.
    3. DATA PROCESSING, MANAGEMENT AND ANALYSIS: Making sense of these varied data sources and processing them so that they are useful for analysis is a data engineering challenge. Structured data needs to be cleaned and curated, and data from different sources needs to be matched to get a complete 360-degree view of the customer. Semi-structured and unstructured data sources (physician notes, imaging data) are hard to curate and store in a way that allows retrieval and analysis at scale and speed. Various Big Data technologies have been developed for storing such data (Hadoop ecosystem, Spark) and for analyzing it (text mining, NLP, deep learning for image and video analytics).
    4. SOLUTIONS TO THE PROBLEMS: Ultimately, the analysis should generate actionable insights. Interpreting the results and implementing them to solve the problem are key.
  • 3. HOW ML/DL CAN AUGMENT THE DECISION MAKING PROCESS FOR CLINICIANS
    - PROGNOSIS: A machine-learning model can learn the patterns of health trajectories of vast numbers of patients. This can help physicians anticipate future events at an expert level, drawing on information well beyond the individual physician's practice experience. For example, how likely is it that a patient will be able to return to work, or how quickly will the disease progress?
    - DIAGNOSIS: A diagnostic error will occur in the care of nearly every patient in his or her lifetime, and receiving the right diagnosis is critical to receiving appropriate care. This problem is not limited to rare conditions. Cardiac chest pain, TB, dysentery, and complications of childbirth are commonly not detected in developing countries.
    - TREATMENT: In a large health care system with tens of thousands of physicians treating tens of millions of patients, there is variation in when and why patients present for care and in how patients with similar conditions are treated. Can a model sort through these natural variations to help physicians identify when the collective experience points to a preferred treatment pathway?
    - CLINICAL WORKFLOW: The same machine-learning techniques that are used in many consumer products can make clinicians more efficient. The machine learning that drives search engines can surface the required information in a patient's chart without multiple clicks. Data entry in forms and text fields can also be improved with machine-learning techniques.
    - REMOTE AREAS: Physicians cannot individually interact with all the patients who may need care. Can machine learning extend the reach of clinicians to provide expert-level medical assessment without their personal involvement? For example, patients with new rashes may be able to obtain a diagnosis by sending a picture taken on their smartphones, thereby averting unnecessary urgent-care visits.
REFERENCE: https://www.nejm.org/doi/full/10.1056/NEJMra1814259
  • 4. COMPONENTS OF ELECTRONIC HEALTH RECORDS
    EMR (diagram labels): demographics and history, drugs, allergies, visits, admissions, diagnoses, lab results, procedures
    STANDARD EMR/EHR DATA COMPONENTS
    - DEMOGRAPHICS: age, gender, race, language, religion, insurance, location
    - CLINICAL HISTORY: habits, past Dx and observations
    - MEDICATIONS: drug NDC, quantity, refills, route, Rx dates
    - FOOD AND DRUG ALLERGIES: allergen, reaction description, severity, dates
    - VISITS TO ER AND OPD: date/time, encounter type, provider info
    - INPATIENT ADMISSIONS: date/time, source, discharge code
    - PRIMARY DIAGNOSES AND COMORBIDITIES: ICD9/10, SNOMED
    - PROCEDURES AND SURGERIES: procedure codes and ICD codes
    - LABORATORY RESULTS: LOINC, date/time, reference range, value, UOM
    Standard dictionaries: ICD9/10, SNOMED-CT, NDC, LOINC, NPI
    ADDITIONAL DATA FACTORS (normally not present)
    - GENOMICS
    - SOCIAL DETERMINANTS OF HEALTH (SDoH)
    - IMAGING DATA: X-ray/USG/CT/MRI
    - PATIENT REPORTED OUTCOMES (PRO)
  • 5. DIABETES – THE MAGNITUDE OF THE PROBLEM
    Diabetes is the world's eighth biggest killer, accounting for some 1.5 million deaths each year. A major World Health Organization report revealed that the number of cases around the world nearly quadrupled, from 108 million in 1980 to 422 million in 2014. The Eastern Mediterranean region had the biggest increase in cases during that time frame. Diabetes now affects one in 11 adults, with high blood sugar levels linked to 3.8 million deaths every year.
    REFERENCE: https://www.statista.com/chart/4617/the-unrelenting-global-march-of-diabetes/
  • 6. WHAT HAPPENS IN DIABETES MELLITUS
    Video: https://youtu.be/qn2dhw0NJxo
    - Type 1 diabetes (T1DM): In people with type 1 diabetes, the body does not make insulin. The immune system attacks and destroys the cells in the pancreas that make insulin. Type 1 diabetes is usually diagnosed in children and young adults, although it can appear at any age. People with type 1 diabetes need to take insulin every day to stay alive.
    - Type 2 diabetes (T2DM): In people with type 2 diabetes, the body does not make or use insulin well. Type 2 diabetes can develop at any age, even during childhood, but it occurs most often in middle-aged and older people. Type 2 is the most common type of diabetes.
    COURTESY: NIDDK https://www.niddk.nih.gov/health-information/diabetes/overview/what-is-diabetes
    IMAGE COURTESY: KHAN ACADEMY
  • 7. HOW MACHINE LEARNING CAN HELP IN DIABETES
    - Predicting risk of heart failure for diabetes patients with help from machine learning
    - Identification of Type 2 Diabetes Risk Factors Using Phenotypes Consisting of Anthropometry and Triglycerides based on Machine Learning
    - Use of a Machine Learning Algorithm Improves Prediction of Progression to Diabetes
    - Predicting Future Glucose Fluctuations Using Machine Learning and Wearable Sensor Data
    - Predicting Diabetes Mellitus With Machine Learning Techniques
    - Machine-learning to stratify diabetic patients using novel cardiac biomarkers and integrative genomics
    - Predicting diabetic retinopathy and identifying interpretable biomedical features using machine learning algorithms
    - Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records
    - Data-Driven Blood Glucose Pattern Classification and Anomalies Detection: Machine-Learning Applications in Type 1 Diabetes
  • 8. APPROACH FOR DM READMISSION PREDICTIVE MODEL
    DMT2 risk prediction using clinical data and statistical and machine learning algorithms/models. Flow: INPUT, MODEL, OUTPUT.
    Predictor variables (44 in total):
    - Demographic: age, gender, ethnicity
    - Diagnosis: type of condition (DM T1/T2), number of comorbidities, position of the diagnosis (primary, secondary, etc.)
    - Encounter: IP, OP, and AE visits
    - Medications: dosage, frequency, route
    - Lab results: test names, dates, UOM, value, normal/abnormal result
    - Admission: length of stay, admission method (elective, non-elective), discharge destination
    - Procedure: count of procedures, cost of procedures
    Response variable: readmission within 30 days
    Steps:
    1. Data split into time windows: observation window, performance window, validation window (timeline labels on the slide: 4 years, 1 year)
    2. Models built on data from the observation and performance windows using logistic regression (LOG), decision tree (DT), random forest (RF), and model ensembles
    3. In-time validation (within the performance window):
       - LOG: Gini 48.6%, AUC 74.3%, KS 34.9%, worst decile capture 29.4%
       - DT: Gini 37.3%, AUC 68.7%, KS 38.5%, worst decile capture 28.2%
       - RF: Gini 53.5%, AUC 76.7%, KS 39.8%, worst decile capture 33.7%
    4. Out-of-time validation (in the validation window): all three models provided accuracy of ~80%; the RF model's AUC of ~76% indicates a reasonably good fit
    5. Significant variables (major drivers of readmission): severity of DM, number of DM spells in the past 1 year, ED LOS in the past 1 year, number of procedures undergone, number of OPD visits in the past 1 year, number of ED visits in the past 1 year, number of IP visits in the past 1 year, number of comorbidities, distance from hospital, DM LOS in the past 1 year, time since last ED visit, total ED cost in the past 1 year, age of patient
    6. Patient category based on risk score, from low to high
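The validation metrics on this slide (Gini, AUC, KS, worst decile capture) can be computed from a list of outcomes and model scores. The following is a minimal standard-library sketch, not the presenter's code. It assumes that Gini is derived from AUC as 2 × AUC − 1 (the slide's figures are consistent with this, e.g. 2 × 74.3% − 1 = 48.6% for LOG) and that "worst decile capture" means the share of all readmissions that fall in the 10% of patients with the highest scores; that reading and all function names are assumptions made for illustration.

```python
from typing import Sequence


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc: float) -> float:
    """Gini coefficient as commonly derived from AUC."""
    return 2.0 * auc - 1.0


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Largest gap between the cumulative shares of positives and negatives,
    scanning patients from the highest score down."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        j = i
        # Consume all patients tied at this score before measuring the gap.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            j += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
        i = j
    return best


def worst_decile_capture(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Share of all positives found among the top 10% highest-scoring patients
    (assumed meaning of the slide's 'worst decile capture')."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    top_n = max(1, len(ranked) // 10)
    captured = sum(y for _, y in ranked[:top_n])
    return captured / sum(y_true)
```

The pairwise AUC loop is quadratic and only suitable for small illustrative samples; a production pipeline would more likely use a library implementation.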
  • 9. RISK PREDICTION MODEL: DESIGN, EVALUATION
    Pipeline stages:
    - Missing imputation: mean/median, regression, KNN
    - Feature selection: feature importance, RFE (recursive feature elimination), WoE (weight of evidence) and IV (information value)
    - Model build: tree based (DT, RF, GBT), others (SVM, NN, NB)
    - Model evaluation: k-fold cross validation, ROC curve
    Patient cohorts are created based on ICD 9/10 codes for a defined chronic disease (e.g. DMT2) and on the time of diagnosis, to separate already diagnosed patients from those who may develop the disease. The prospective cohort serves as the scoring dataset.
    EMR data has many dimensions, which also means that many values are missing; imputation methods help keep most of the features usable.
    Feature selection helps to focus on the variables that matter most for the outcome variable; the methods listed above were used.
    The basic task is classification, done by computing the probability of the outcome for each patient and then applying thresholds.
    Multiple models were created and then validated on accuracy metrics to select the best model, using cross validation and the area under the ROC curve.
    Scoring was done on the prospective cohort to group patients into high risk, medium risk, and low risk. The high risk group was to be targeted for interventions.
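Three of the steps on this slide are small enough to sketch directly: median imputation, WoE/IV for a binned feature, and thresholding a predicted probability into risk groups. The code below is an illustrative standard-library sketch, not the presenter's implementation. It assumes the convention WoE = ln(share of non-events / share of events) per bin, adds a smoothing constant of 0.5 to avoid division by zero, and uses hypothetical cut-offs of 0.2 and 0.5 for the risk groups; none of these values come from the slides.

```python
import math
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def woe_iv(
    bins: Sequence[str], y: Sequence[int], smooth: float = 0.5
) -> Tuple[Dict[str, float], float]:
    """Weight of evidence per bin and the feature's total information value.

    Convention assumed here: WoE = ln(share of non-events / share of events).
    `smooth` is an illustrative additive constant that keeps empty cells finite.
    """
    events: Dict[str, float] = {}
    non_events: Dict[str, float] = {}
    for b, label in zip(bins, y):
        events.setdefault(b, 0.0)
        non_events.setdefault(b, 0.0)
        if label == 1:
            events[b] += 1.0
        else:
            non_events[b] += 1.0
    n_bins = len(events)
    total_events = sum(events.values()) + smooth * n_bins
    total_non_events = sum(non_events.values()) + smooth * n_bins
    woe: Dict[str, float] = {}
    iv = 0.0
    for b in events:
        share_events = (events[b] + smooth) / total_events
        share_non_events = (non_events[b] + smooth) / total_non_events
        woe[b] = math.log(share_non_events / share_events)
        iv += (share_non_events - share_events) * woe[b]
    return woe, iv


def risk_tier(prob: float, low_cut: float = 0.2, high_cut: float = 0.5) -> str:
    """Map a predicted probability to a risk group (cut-offs are hypothetical)."""
    if prob >= high_cut:
        return "high"
    if prob >= low_cut:
        return "medium"
    return "low"
```

A feature whose bins separate events from non-events yields a large IV, while a feature with identical event and non-event shares in every bin yields an IV of zero, which is why IV can be used to rank candidate predictors.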
  • 10. PRACTICAL USE CASE AND CODE DEMO
    USE CASE
    - Risk prediction for diabetes
    DATASET
    - Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of Clinical Database Patient Records
    - UCI MACHINE LEARNING REPOSITORY description: 100,000 T2DM patients from 130 hospitals; CERNER HEALTH FACTS
    OUTCOME
    - How likely is a patient to be diagnosed with DM in the near future?
    - How likely is a T2DM patient to come back to the hospital within 30 days of discharge, or more than 30 days after discharge?
    METHODS
    - Multiple ML models generated and compared
    - Individual classifiers: DT, LOGREG, SVC
    - Ensemble classifiers: RF, GBC
    GitHub Link
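Before any of the classifiers listed above can be compared, the readmission outcome has to be turned into a numeric target and the records split into folds. The sketch below is a standard-library illustration, not the code behind the GitHub link. It assumes the outcome is stored as the strings `<30`, `>30`, and `NO` (the coding I believe the UCI file uses for its `readmitted` column; verify against the actual data), and the fold helper is a deterministic, unshuffled simplification of stratified k-fold cross validation.

```python
from typing import Dict, Hashable, List, Sequence

# Assumed raw codes for the readmission outcome; check them against the data file.
_MULTICLASS_CODES: Dict[str, int] = {"NO": 0, ">30": 1, "<30": 2}


def encode_readmission(value: str, binary: bool = True) -> int:
    """Encode a raw readmission label.

    binary=True  -> 1 for readmission within 30 days, 0 otherwise
    binary=False -> 0 (none), 1 (after 30 days), 2 (within 30 days)
    """
    key = value.strip().upper()
    if key not in _MULTICLASS_CODES:
        raise ValueError(f"unexpected readmission label: {value!r}")
    if binary:
        return 1 if key == "<30" else 0
    return _MULTICLASS_CODES[key]


def stratified_folds(labels: Sequence[Hashable], k: int) -> List[List[int]]:
    """Deal row indices of each class round-robin into k folds so that every
    fold keeps roughly the overall class mix (no shuffling, for clarity)."""
    by_class: Dict[Hashable, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds
```

Each fold in turn can serve as the held-out set while the model is trained on the remaining folds; stratifying matters here because early readmissions are the minority class.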