DATA SCIENCE IN HEALTHCARE & LIFE SCIENCES
APPLIED CLINICAL ANALYTICS
DATA ANALYTICS IN HEALTHCARE & LIFE SCIENCES
1. VITAL BUSINESS PROBLEMS:
Many different problems exist, of varying degrees of complexity:
- What impacts favorable clinical outcomes
- Drivers of adverse events
- Factors impacting cost of care
- Earlier diagnosis of cancers and chronic diseases
Understanding these business problems is critical to generating possible solutions.
2. POTENTIAL DATA SOURCES:
Huge amounts of data are now being generated by many different sources that capture patient information:
- Electronic Health Records
- Healthcare claims from Insurance companies
- Pharmacies – claims and medication reviews
- Lab tests and Imaging results
- Population health data – Social Determinants of Health
- Genomics (and later Proteomics and Metabolomics)
- Wearable and other devices
- Other sources (Surveys, Patient Reported Outcomes)
The volume, velocity, variety, and veracity of the data being generated are staggering: a typical Big Data problem.
3. DATA PROCESSING, MANAGEMENT AND ANALYSIS:
Making sense of these varied data sources and processing them so that they are useful for analysis is a data engineering challenge.
Structured data needs to be cleaned and curated, and data from different sources must be matched to get a complete 360-degree view of each patient.
Semi-structured and unstructured data sources (physician notes, imaging data) make it challenging to curate and store the information so that it can be retrieved and analyzed at scale and speed.
Various Big Data technologies have been developed to tackle the problem of storing (the Hadoop ecosystem, Spark) and analyzing (text mining, NLP, deep learning for image and video analytics) semi-structured and unstructured data.
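A minimal sketch of the matching step described above, joining records from two sources on a shared key. The field names (`mrn`, `member_id`, `last_name`, `dob`, `dx`, `paid`) and the deterministic match on normalized last name plus date of birth are hypothetical assumptions for illustration; production patient matching (master patient indexes) is usually probabilistic and uses many more identifiers.

```python
from collections import defaultdict


def match_key(last_name: str, dob: str) -> tuple:
    """Hypothetical deterministic match key: normalized last name + date of birth."""
    return (last_name.strip().lower(), dob.strip())


def link_sources(ehr_rows, claims_rows):
    """Group EHR and claims rows that share a match key into one patient view."""
    patients = defaultdict(lambda: {"ehr": [], "claims": []})
    for row in ehr_rows:
        patients[match_key(row["last_name"], row["dob"])]["ehr"].append(row)
    for row in claims_rows:
        patients[match_key(row["last_name"], row["dob"])]["claims"].append(row)
    return dict(patients)


# Hypothetical rows: the same person appears with different casing/spacing.
ehr = [{"mrn": "E1", "last_name": "Smith ", "dob": "1960-04-02", "dx": "E11.9"}]
claims = [
    {"member_id": "C9", "last_name": "SMITH", "dob": "1960-04-02", "paid": 120.0},
    {"member_id": "C7", "last_name": "Jones", "dob": "1971-09-15", "paid": 80.0},
]

linked = link_sources(ehr, claims)
for key, view in linked.items():
    print(key, "EHR rows:", len(view["ehr"]), "claims rows:", len(view["claims"]))
```

Patients that appear in only one source (here, the claims-only record) surface immediately, which is typically where curation effort is needed.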
4. SOLUTIONS TO THE PROBLEMS:
Ultimately, all the analysis should generate actionable insights. Interpreting the results and implementing them to solve the problem are key.
HOW ML/DL CAN AUGMENT THE DECISION-MAKING PROCESS FOR CLINICIANS
PROGNOSIS
• A machine-learning model can learn the patterns of health trajectories of vast numbers of patients. This capability can help physicians anticipate future events at an expert level, drawing on information well beyond the individual physician's practice experience. For example, how likely is it that a patient will be able to return to work, or how quickly will the disease progress?
DIAGNOSIS
• It has been estimated that nearly every patient will experience a diagnostic error at some point in his or her lifetime, and receiving the right diagnosis is critical to receiving appropriate care. This problem is not limited to rare conditions: cardiac chest pain, TB, dysentery, and complications of childbirth are commonly missed in developing countries.
TREATMENT
• In a large health care system with tens of thousands of physicians treating tens of millions of patients, there is variation in when and why patients present for care and in how patients with similar conditions are treated. Can a model sort through these natural variations to help physicians identify when the collective experience points to a preferred treatment pathway?
CLINICAL WORKFLOW
• The same machine-learning techniques that are used in many consumer products can be used to make clinicians more efficient. Machine learning that drives search engines can help surface the required information in a patient's chart without multiple clicks. Data entry of forms and text fields can also be improved with machine-learning techniques.
REMOTE AREAS
• There is no way for physicians to individually interact with all the patients who may need care. Can machine learning extend the reach of clinicians to provide expert-level medical assessment without their direct involvement? For example, patients with new rashes may be able to obtain a diagnosis by sending a picture taken on their smartphones, thereby averting unnecessary urgent-care visits.
REFERENCE: https://www.nejm.org/doi/full/10.1056/NEJMra1814259
COMPONENTS OF ELECTRONIC HEALTH RECORDS
EMR: Demographics & History, Drugs, Allergies, Visits, Admissions, Diagnoses, Lab Results, Procedures
ADDITIONAL DATA FACTORS (normally not present)
• GENOMICS
• SOCIAL DETERMINANTS OF HEALTH
• IMAGING DATA – X-RAY/USG/CT/MRI
• PATIENT REPORTED OUTCOMES – PRO
STANDARD EMR/EHR DATA COMPONENTS
• DEMOGRAPHICS – Age, Gender, Race, Language, Religion, Insurance, Location
• CLINICAL HISTORY – Habits, Past Dx and Observations
• MEDICATIONS – Drug NDC, Quantity, Refills, Route, Rx dates
• FOOD AND DRUG ALLERGIES – Allergen, Reaction Desc., Severity, Dates
• VISITS TO ER AND OPD – Date/Time, Encounter Type, Provider Info
• INPATIENT ADMISSIONS – Date/Time, Source, Discharge Code
• PRIMARY DIAGNOSES AND COMORBIDITIES – ICD9/10, SNOMED
• PROCEDURES AND SURGERIES – Procedure codes and ICD codes
• LABORATORY RESULTS – LOINC, Date/Time, Reference Range, Value, UOM
Standard dictionaries: ICD9/10, SNOMED-CT, NDC, LOINC, NPI
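The laboratory component listed above can be represented as a small record type. A minimal sketch, assuming one row per result with a numeric value and a lab-supplied reference range; the `flag` method derives a normal/abnormal indicator of the kind later used as a predictor. The class and field names are hypothetical, and the LOINC code 4548-4 (hemoglobin A1c) with a 4.0 to 5.6% range are illustrative values only, not clinical guidance.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LabResult:
    loinc: str            # LOINC code identifying the test
    taken_at: datetime    # specimen date/time
    value: float          # numeric result
    uom: str              # unit of measure
    ref_low: float        # reference range lower bound
    ref_high: float       # reference range upper bound

    def flag(self) -> str:
        """Return 'L' (below range), 'H' (above range) or 'N' (within range)."""
        if self.value < self.ref_low:
            return "L"
        if self.value > self.ref_high:
            return "H"
        return "N"


# Hypothetical HbA1c result with illustrative reference bounds.
a1c = LabResult("4548-4", datetime(2019, 3, 1, 8, 30), 8.2, "%", 4.0, 5.6)
print(a1c.loinc, a1c.value, a1c.uom, a1c.flag())  # 4548-4 8.2 % H
```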
DIABETES – THE MAGNITUDE OF THE PROBLEM
Diabetes is the world's eighth biggest killer, accounting for some 1.5 million deaths each year. A major new World Health Organization report has revealed that the number of cases around the world has nearly quadrupled, from 108 million in 1980 to 422 million in 2014. The Eastern Mediterranean region had the biggest increase in cases during that time frame. Diabetes now affects one in 11 adults, and high blood sugar levels are linked to 3.8 million deaths every year.
REFERENCE: https://www.statista.com/chart/4617/the-unrelenting-global-march-of-diabetes/
WHAT HAPPENS IN DIABETES MELLITUS
• https://youtu.be/qn2dhw0NJxo
Type 1 diabetes (T1DM)
In people with type 1 diabetes, the body does not make insulin. The immune system attacks and destroys the cells in the pancreas that make insulin. Type 1 diabetes is usually diagnosed in children and young adults, although it can appear at any age. People with type 1 diabetes need to take insulin every day to stay alive.
Type 2 diabetes (T2DM)
In people with type 2 diabetes, the body does not make or use insulin well. Type 2 diabetes can develop at any age, even during childhood. However, it occurs most often in middle-aged and older people. Type 2 is the most common type of diabetes.
COURTESY: NIDDK
https://www.niddk.nih.gov/health-information/diabetes/overview/what-is-diabetes
IMAGE COURTESY: KHAN ACADEMY
HOW MACHINE LEARNING CAN HELP IN DIABETES
• Predicting risk of heart failure for diabetes patients with help from machine learning
• Identification of Type 2 Diabetes Risk Factors Using Phenotypes Consisting of Anthropometry and Triglycerides based on Machine Learning
• Use of a Machine Learning Algorithm Improves Prediction of Progression to Diabetes
• Predicting Future Glucose Fluctuations Using Machine Learning and Wearable Sensor Data
• Predicting Diabetes Mellitus With Machine Learning Techniques
• Machine-learning to stratify diabetic patients using novel cardiac biomarkers and integrative genomics
• Predicting diabetic retinopathy and identifying interpretable biomedical features using machine learning algorithms
• Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records
• Data-Driven Blood Glucose Pattern Classification and Anomalies Detection: Machine-Learning Applications in Type 1 Diabetes
APPROACH FOR DM READMISSION PREDICTIVE MODEL
• DMT2 risk prediction using clinical data and statistical and machine learning algorithms/models
Predictor Variables (total 44 variables)
• Demographic
  - Age
  - Gender
  - Ethnicity
• Diagnosis
  - Type of condition (DM T1/T2) diagnosis
  - # of comorbidities
  - Position (primary, secondary, etc.) of diagnosis
• Encounter
  - IP, OP, AE visits
• Medications
  - Dosage, frequency, route
• Lab results
  - Test names, dates, UOM, value
  - Normal/abnormal result
• Admission
  - Length of stay
  - Admission method (elective, non-elective)
  - Discharge destination
• Procedure
  - Count of procedures
  - Cost of procedures
Response Variable
• Readmission within 30 days
Step 1. Data split into time windows: a 4-year observation window (model input), followed by a 1-year performance window (model output) and a validation window.
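A minimal sketch of assigning events to the time windows described above. The boundary dates are hypothetical, and the length of the validation window is an assumption since the slide does not state it. Presumably, predictors are built from events in the observation window, the 30-day readmission label comes from the performance window, and the validation window is held out for out-of-time testing.

```python
from datetime import date

# Hypothetical window boundaries (half-open intervals [start, end)).
OBS_START = date(2012, 1, 1)
PERF_START = date(2016, 1, 1)   # observation window = 4 years
VAL_START = date(2017, 1, 1)    # performance window = 1 year
VAL_END = date(2018, 1, 1)      # validation window length: assumption


def window_of(d: date) -> str:
    """Label an event date with the time window it falls in."""
    if OBS_START <= d < PERF_START:
        return "observation"
    if PERF_START <= d < VAL_START:
        return "performance"
    if VAL_START <= d < VAL_END:
        return "validation"
    return "outside"


events = [date(2013, 6, 1), date(2016, 3, 15), date(2017, 11, 30), date(2019, 1, 1)]
print([window_of(d) for d in events])
# ['observation', 'performance', 'validation', 'outside']
```

Splitting by calendar time rather than at random is what makes the later validation "out-of-time": the model is tested on a period it never saw.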
Step 2. Models built using the following algorithms (data from the observation and performance windows):
• Logistic regression model (LOG)
• Decision tree model (DT)
• Random forest model (RF)
• Model ensembles
Step 3. In-time validation (within the performance window):
Model   GINI    AUC     KS      WORST DECILE CAPTURE
LOG     48.6%   74.3%   34.9%   29.4%
DT      37.3%   68.7%   38.5%   28.2%
RF      53.5%   76.7%   39.8%   33.7%
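The in-time validation metrics can be computed directly from predicted scores. A minimal pure-Python sketch on a hypothetical 10-patient example. It uses the standard relation Gini = 2 × AUC − 1 (the reported Gini and AUC values agree with it to within rounding, e.g. 2 × 0.767 − 1 ≈ 0.534 for RF), reads KS as the maximum gap between the cumulative shares of readmitted and non-readmitted patients ranked by score, and assumes "worst decile capture" means the share of all readmissions that fall in the top-scoring 10% of patients.

```python
def auc_score(y, p):
    """ROC AUC: probability a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def ks_stat(y, p):
    """Max gap between cumulative positive and negative shares, highest scores first.
    Tied scores are not grouped, which is acceptable for a sketch."""
    ranked = sorted(zip(p, y), key=lambda t: -t[0])
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    cum_pos = cum_neg = 0
    best = 0.0
    for _, t in ranked:
        if t == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        best = max(best, cum_pos / n_pos - cum_neg / n_neg)
    return best


def top_decile_capture(y, p):
    """Share of all positives that fall in the top 10% of scores."""
    n_top = max(1, len(y) // 10)
    ranked = sorted(zip(p, y), key=lambda t: -t[0])
    return sum(t for _, t in ranked[:n_top]) / sum(y)


# Hypothetical outcomes (1 = readmitted within 30 days) and model scores.
y = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

auc = auc_score(y, p)
gini = 2 * auc - 1
ks = ks_stat(y, p)
cap = top_decile_capture(y, p)
print(f"AUC={auc:.3f} Gini={gini:.3f} KS={ks:.3f} TopDecile={cap:.3f}")
# AUC=0.952 Gini=0.905 KS=0.857 TopDecile=0.333
```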
Step 4. Out-of-time validation (in the validation window):
All three models achieved ~80% accuracy in the out-of-time validation.
The RF model, with an AUC of ~76%, shows reasonably good discrimination.
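The model-building and comparison step can be sketched with scikit-learn. This is not the deck's actual code: `make_classification` generates synthetic stand-in data for the 44 EMR predictors with roughly 15% positives, the hyperparameters are arbitrary, and the soft-voting ensemble is one assumed way to build the "model ensembles" mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for 44 predictors; ~15% of patients are "readmitted".
X, y = make_classification(n_samples=2000, n_features=44, n_informative=10,
                           weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "LOG": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(max_depth=5, random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}
# Soft voting averages the three models' predicted probabilities
# (VotingClassifier fits its own clones of the estimators).
models["ENS"] = VotingClassifier(estimators=list(models.items()), voting="soft")

auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc[name]:.3f}")
```

Because readmission is a minority class, AUC is compared here rather than plain accuracy: on imbalanced data, a model can reach high accuracy simply by predicting "no readmission" for everyone.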
Significant variables (major drivers of readmission):
• SEVERITY OF DM
• # of DM spells in past 1 year
• ED LOS in past 1 year
• # of procedures undergone
• # of OPD visits in past 1 year
• # of ED visits in past 1 year
• # of IP visits in past 1 year
• # of comorbidities
• Distance from hospital
• DM LOS in past 1 year
• Time since last ED visit
• Total ED cost in past 1 year
• Age of patient
Patient category based on risk score: High / Low
RISK PREDICTION MODEL: DESIGN, EVALUATION
• Missing imputation: mean/median, regression, KNN
• Feature selection: feature importance, RFE, WoE and IV
• Model build: tree-based (DT, RF, GBT); others (SVM, NN, NB)
• Model evaluation: k-fold cross-validation, ROC curve
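The four stages above map naturally onto a scikit-learn `Pipeline`. A minimal sketch on synthetic data (an assumption standing in for EMR features, with about 10% of values blanked out): median imputation, recursive feature elimination (RFE) driven by logistic regression, and a random forest, scored with 5-fold stratified cross-validation on ROC AUC. `KNNImputer` from `sklearn.impute` could be swapped in for the KNN imputation option; the feature count and hyperparameters are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           weights=[0.8], random_state=0)
# Blank out ~10% of entries to mimic sparse EMR fields.
X[rng.random(X.shape) < 0.10] = np.nan

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),          # missing imputation
    ("select", RFE(LogisticRegression(max_iter=1000),       # feature selection
                   n_features_to_select=10)),
    ("model", RandomForestClassifier(n_estimators=200, random_state=0)),  # model build
])

# Model evaluation: k-fold cross-validation scored by area under the ROC curve.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(f"ROC AUC per fold: {np.round(scores, 3)}; mean = {scores.mean():.3f}")
```

Keeping imputation and feature selection inside the pipeline means they are refit on each training fold only, so the cross-validated AUC is not inflated by information leaking from the held-out fold.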
Patient cohorts are created based on ICD-9/10 codes for a defined chronic disease (e.g., DMT2) and on the time of diagnosis, to separate already-diagnosed patients from those who may develop the disease.
Prospective Cohort – Scoring Dataset
Feature selection mechanisms help focus on the variables most important for predicting the outcome variable; the methods listed above were used.
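Of the feature selection methods listed, Weight of Evidence (WoE) and Information Value (IV) are simple to compute by hand. A minimal sketch on hypothetical binned data ("# of ED visits" bins). Sign conventions vary between references; this one uses WoE = ln(% non-events / % events), with a small additive smoothing term `eps` (an assumption) so empty cells do not produce log(0). IV sums (% non-events − % events) × WoE over bins, so it is non-negative under either convention.

```python
import math
from collections import Counter


def woe_iv(bins, y, eps=0.5):
    """Weight of Evidence per bin and total Information Value.

    bins: bin label per patient; y: 1 = event (readmitted), 0 = non-event.
    """
    events = Counter(b for b, t in zip(bins, y) if t == 1)
    non_events = Counter(b for b, t in zip(bins, y) if t == 0)
    labels = sorted(set(bins))
    tot_e = sum(events.values()) + eps * len(labels)
    tot_ne = sum(non_events.values()) + eps * len(labels)
    woe, iv = {}, 0.0
    for b in labels:
        pe = (events[b] + eps) / tot_e          # share of events in this bin
        pne = (non_events[b] + eps) / tot_ne    # share of non-events in this bin
        woe[b] = math.log(pne / pe)
        iv += (pne - pe) * woe[b]
    return woe, iv


# Hypothetical data: more prior ED visits -> more readmissions.
bins = ["0"] * 6 + ["1-2"] * 4 + ["3+"] * 4
y = [1, 0, 0, 0, 0, 0] + [1, 0, 0, 0] + [1, 1, 1, 0]

woe, iv = woe_iv(bins, y)
print({b: round(w, 3) for b, w in woe.items()}, "IV =", round(iv, 3))
```

Variables are then ranked by IV, and a monotone WoE pattern across bins (as here) is usually taken as a sign of a sensible binning.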
EMR data has many dimensions, which also means many values are missing; imputation methods help keep most of the features usable.
The basic task is classification, done by computing the probability of the outcome for each patient and then applying thresholds.
Multiple models were built and validated on accuracy metrics to select the best model, using cross-validation and the area under the ROC curve.
Scoring was done on the prospective cohort to group patients into high-, medium-, and low-risk groups. The high-risk group was to be targeted for interventions.
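The scoring step amounts to applying thresholds to each patient's predicted probability. A minimal sketch; the cut-offs (0.30 and 0.15) and patient IDs are hypothetical, and in practice thresholds are usually chosen from the score distribution or from intervention capacity (for example, treating the top-scoring decile as high risk).

```python
def risk_tier(prob: float, high: float = 0.30, medium: float = 0.15) -> str:
    """Map a predicted readmission probability to a risk group (hypothetical cut-offs)."""
    if prob >= high:
        return "high"
    if prob >= medium:
        return "medium"
    return "low"


# Hypothetical scores for a prospective cohort.
scores = {"P001": 0.42, "P002": 0.18, "P003": 0.07}
tiers = {pid: risk_tier(p) for pid, p in scores.items()}
print(tiers)  # {'P001': 'high', 'P002': 'medium', 'P003': 'low'}

# Patients to target for interventions.
targeted = [pid for pid, tier in tiers.items() if tier == "high"]
print(targeted)  # ['P001']
```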
PRACTICAL USE CASE AND CODE DEMO
USE CASE
• Risk Prediction for Diabetes
• Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records
DATASET
UCI Machine Learning Repository: ~100,000 encounters of diabetic patients from 130 US hospitals (Cerner Health Facts)
OUTCOME
• How likely is a patient to be diagnosed with DM in the near future?
• How likely is a T2DM patient to be readmitted to the hospital within 30 days of discharge, or after 30 days?
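The readmission outcome can be derived from the dataset's readmission field. A minimal sketch, assuming the UCI file (commonly distributed as `diabetic_data.csv`) has a `readmitted` column with the values "<30", ">30" and "NO", plus `encounter_id` and `patient_nbr` identifiers; check these against the actual data dictionary. The inline three-row sample is hypothetical so the example runs without the file.

```python
import csv
import io

# Hypothetical excerpt; the real file has ~100,000 rows and many more columns.
sample = """encounter_id,patient_nbr,readmitted
1001,501,<30
1002,502,NO
1003,503,>30
"""


def readmission_targets(rows):
    """Derive binary targets from the 'readmitted' column ("<30", ">30", "NO")."""
    out = []
    for r in rows:
        label = r["readmitted"]
        out.append({
            "encounter_id": r["encounter_id"],
            "early_readmit": int(label == "<30"),  # readmitted within 30 days
            "any_readmit": int(label != "NO"),     # readmitted at any time
        })
    return out


targets = readmission_targets(csv.DictReader(io.StringIO(sample)))
for t in targets:
    print(t)
```

Because the same `patient_nbr` can appear in several encounters, splitting train and test sets by patient rather than by encounter helps avoid leakage between them.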
METHODS
Multiple ML models were generated and compared:
• Individual classifiers: DT, LOGREG, SVC
• Ensemble classifiers: RF, GBC
GitHub Link

why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Inference rules in artificial intelligence
Inference rules in artificial intelligenceInference rules in artificial intelligence
Inference rules in artificial intelligence
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
prediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approachprediction of default payment next month using a logistic approach
prediction of default payment next month using a logistic approach
 
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsbaAdobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
Adobe Scan 06-Mar-2024 (1).pdfwvsbbsbsba
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 

Predictive Analytics and Machine Learning for Healthcare - Diabetes

  • 1. DATA SCIENCE IN HEALTHCARE & LIFE SCIENCES APPLIED CLINICAL ANALYTICS
  • 2. DATA ANALYTICS IN HEALTHCARE & LIFE SCIENCES 1. VITAL BUSINESS PROBLEMS: Many different problems exist, and they vary in complexity: what drives favorable clinical outcomes; drivers of adverse events; factors affecting the cost of care; earlier diagnosis of cancers and chronic diseases. Understanding these business problems is critical for generating possible solutions. 2. POTENTIAL DATA SOURCES: Huge amounts of data are now generated by many different sources that capture health information: Electronic Health Records; healthcare claims from insurance companies; pharmacies (claims and medication reviews); lab tests and imaging results; population health data (Social Determinants of Health); genomics (and later proteomics and metabolomics); wearables and other devices; other sources (surveys, Patient Reported Outcomes). The volume, velocity, variety, and veracity of this data are staggering, a typical Big Data problem. 3. DATA PROCESSING, MANAGEMENT AND ANALYSIS: Making sense of these varied data sources and processing them so that they are useful for analysis is a data engineering challenge. Structured data needs to be cleaned and curated, and data from different sources must be matched to get a complete 360-degree view of the patient. Semi-structured and unstructured sources (physician notes, imaging data) make it hard to curate and store the information so that it can be retrieved and analyzed at scale and speed. Various Big Data technologies have been developed to store (Hadoop ecosystem, Spark) and analyze (text mining, NLP, deep learning for image and video analytics) semi-structured and unstructured data. 4. SOLUTIONS TO THE PROBLEMS: Ultimately, the analysis must generate actionable insights; interpreting the results and implementing them to solve the problem are key.
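Matching data from different sources can be pictured as deterministic record linkage: records from two systems are joined on a normalized key. The sketch below is only an illustration; the field names (`patient_id`, `first_name`, `last_name`, `dob`, `claim_id`) and the key (normalized last name, first initial, date of birth) are assumptions, and production master patient indexes usually use more identifiers and probabilistic scoring.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Strip accents, punctuation and case so trivial spelling variants match."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z]", "", ascii_only.lower())


def link_key(record: dict) -> tuple:
    """Hypothetical match key: normalized last name, first initial, date of birth."""
    return (
        normalize_name(record["last_name"]),
        normalize_name(record["first_name"])[:1],
        record["dob"],  # ISO date string, e.g. "1956-04-12"
    )


def link_sources(ehr_rows, claims_rows):
    """Attach each claim to the single EHR patient sharing its key.

    Returns ({ehr_patient_id: [claim_id, ...]}, [unmatched claim ids]).
    Ambiguous keys (several EHR patients) are left unmatched for manual review.
    """
    index = defaultdict(list)
    for row in ehr_rows:
        index[link_key(row)].append(row["patient_id"])

    linked, unmatched = defaultdict(list), []
    for claim in claims_rows:
        candidates = index.get(link_key(claim), [])
        if len(candidates) == 1:
            linked[candidates[0]].append(claim["claim_id"])
        else:
            unmatched.append(claim["claim_id"])
    return dict(linked), unmatched


# Made-up example records.
ehr = [
    {"patient_id": "P1", "first_name": "José", "last_name": "García", "dob": "1956-04-12"},
    {"patient_id": "P2", "first_name": "Mary", "last_name": "O'Brien", "dob": "1971-09-30"},
]
claims = [
    {"claim_id": "C10", "first_name": "Jose", "last_name": "GARCIA", "dob": "1956-04-12"},
    {"claim_id": "C11", "first_name": "M.", "last_name": "OBrien", "dob": "1971-09-30"},
    {"claim_id": "C12", "first_name": "Mary", "last_name": "O'Brien", "dob": "1971-09-03"},
]
linked, unmatched = link_sources(ehr, claims)
print(linked, unmatched)
```

Exact-key matching is conservative: a single typo in the date of birth (claim C12) leaves a record unlinked rather than risking a false merge.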
  • 3. HOW ML/DL CAN AUGMENT THE DECISION MAKING PROCESS FOR CLINICIANS PROGNOSIS • A machine-learning model can learn the patterns of health trajectories of vast numbers of patients. This can help physicians anticipate future events at an expert level, drawing on information well beyond the individual physician's practice experience. For example, how likely is it that a patient will be able to return to work, or how quickly will the disease progress? DIAGNOSIS • A diagnostic error will occur in the care of nearly every patient in his or her lifetime, and receiving the right diagnosis is critical to receiving appropriate care. This problem is not limited to rare conditions: cardiac chest pain, TB, dysentery, and complications of childbirth are commonly not detected in developing countries. TREATMENT • In a large health care system with tens of thousands of physicians treating tens of millions of patients, there is variation in when and why patients present for care and in how patients with similar conditions are treated. Can a model sort through these natural variations to help physicians identify when the collective experience points to a preferred treatment pathway? CLINICAL WORKFLOW • The same machine-learning techniques used in many consumer products can make clinicians more efficient. The machine learning that drives search engines can surface the required information in a patient's chart without multiple clicks, and data entry into forms and text fields can be improved with machine-learning techniques. REMOTE AREAS • Physicians cannot individually interact with every patient who may need care. Can machine learning extend the reach of clinicians to provide expert-level medical assessment without their direct involvement? For example, patients with new rashes may be able to obtain a diagnosis by sending a photo taken on their smartphones, averting unnecessary urgent-care visits.
REFERENCE: https://www.nejm.org/doi/full/10.1056/NEJMra1814259
  • 4. COMPONENTS OF ELECTRONIC HEALTH RECORDS STANDARD EMR/EHR DATA COMPONENTS: DEMOGRAPHICS (age, gender, race, language, religion, insurance, location); CLINICAL HISTORY (habits, past diagnoses and observations); MEDICATIONS (drug NDC, quantity, refills, route, Rx dates); FOOD AND DRUG ALLERGIES (allergen, reaction description, severity, dates); VISITS TO ER AND OPD (date/time, encounter type, provider info); INPATIENT ADMISSIONS (date/time, source, discharge code); PRIMARY DIAGNOSES AND COMORBIDITIES (ICD-9/10, SNOMED); PROCEDURES AND SURGERIES (procedure codes and ICD codes); LABORATORY RESULTS (LOINC, date/time, reference range, value, UOM). Standard dictionaries: ICD-9/10, SNOMED-CT, NDC, LOINC, NPI. ADDITIONAL DATA FACTORS (normally not present): genomics; Social Determinants of Health (SDoH); imaging data (X-ray/USG/CT/MRI); Patient Reported Outcomes (PRO).
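A laboratory result with the fields listed above (LOINC code, date/time, value, UOM, reference range) maps naturally onto a small record type. This is a minimal sketch; the class name, the L/H/N flag convention and the 4.0 to 5.6 % range used for the example are illustrative assumptions, not any lab's actual reference values. LOINC 4548-4 is the code for hemoglobin A1c.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LabResult:
    """One laboratory result row (hypothetical structure)."""
    loinc_code: str            # e.g. "4548-4" (Hemoglobin A1c/Hemoglobin.total in Blood)
    taken_at: datetime         # specimen date/time
    value: float
    unit: str                  # unit of measure (UOM)
    ref_low: Optional[float]   # reference range bounds; None if open-ended
    ref_high: Optional[float]

    def flag(self) -> str:
        """Return 'L' (below range), 'H' (above range) or 'N' (within range)."""
        if self.ref_low is not None and self.value < self.ref_low:
            return "L"
        if self.ref_high is not None and self.value > self.ref_high:
            return "H"
        return "N"


# Illustrative HbA1c result with an assumed reference range.
hba1c = LabResult("4548-4", datetime(2019, 3, 1, 8, 30), 7.2, "%", 4.0, 5.6)
print(hba1c.flag())  # prints H
```

A derived flag like this is one way to obtain the "normal/abnormal result" feature that appears among the predictor variables on the readmission slide.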
  • 5. DIABETES: THE MAGNITUDE OF THE PROBLEM Diabetes is the world's eighth biggest killer, accounting for some 1.5 million deaths each year. A major new World Health Organization report has revealed that the number of cases around the world has nearly quadrupled, from 108 million in 1980 to 422 million in 2014. The Eastern Mediterranean region had the biggest increase in cases during that period. Diabetes now affects one in 11 adults, and high blood sugar levels are linked to 3.8 million deaths every year. REFERENCE: https://www.statista.com/chart/4617/the-unrelenting-global-march-of-diabetes/
  • 6. WHAT HAPPENS IN DIABETES MELLITUS • https://youtu.be/qn2dhw0NJxo Type 1 diabetes (T1DM): In people with type 1 diabetes, the body does not make insulin. The immune system attacks and destroys the cells in the pancreas that make insulin. Type 1 diabetes is usually diagnosed in children and young adults, although it can appear at any age. People with type 1 diabetes need to take insulin every day to stay alive. Type 2 diabetes (T2DM): In people with type 2 diabetes, the body does not make or use insulin well. Type 2 diabetes can develop at any age, even during childhood, but it occurs most often in middle-aged and older people. Type 2 is the most common type of diabetes. COURTESY: NIDDK https://www.niddk.nih.gov/health-information/diabetes/overview/what-is-diabetes IMAGE COURTESY: KHAN ACADEMY
  • 7. HOW MACHINE LEARNING CAN HELP IN DIABETES: Predicting risk of heart failure for diabetes patients with help from machine learning; Identification of Type 2 Diabetes Risk Factors Using Phenotypes Consisting of Anthropometry and Triglycerides based on Machine Learning; Use of a Machine Learning Algorithm Improves Prediction of Progression to Diabetes; Predicting Future Glucose Fluctuations Using Machine Learning and Wearable Sensor Data; Predicting Diabetes Mellitus With Machine Learning Techniques; Machine-learning to stratify diabetic patients using novel cardiac biomarkers and integrative genomics; Predicting diabetic retinopathy and identifying interpretable biomedical features using machine learning algorithms; Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records; Data-Driven Blood Glucose Pattern Classification and Anomalies Detection: Machine-Learning Applications in Type 1 Diabetes
  • 8. APPROACH FOR DM READMISSION PREDICTIVE MODEL • DMT2 risk prediction using clinical data and statistical and machine-learning algorithms/models. Predictor variables (44 in total): Demographic (age, gender, ethnicity); Diagnosis (type of condition (DM T1/T2), number of comorbidities, position of the diagnosis (primary, secondary, etc.)); Encounter (IP, OP, AE visits); Medications (dosage, frequency, route); Lab results (test names, dates, UOM, value, normal/abnormal result); Admission (length of stay, admission method (elective, non-elective), discharge destination); Procedure (count of procedures, cost of procedures). Response variable: readmission within 30 days. (1) Data split into time windows: observation window, performance window, and validation window (the slide labels spans of 4 years and 1 year). (2) Models built on data from the observation and performance windows using logistic regression (LOG), decision tree (DT), random forest (RF), and model ensembles. (3) In-time validation (within the performance window): LOG: Gini 48.6%, AUC 74.3%, KS 34.9%, worst-decile capture 29.4%; DT: Gini 37.3%, AUC 68.7%, KS 38.5%, worst-decile capture 28.2%; RF: Gini 53.5%, AUC 76.7%, KS 39.8%, worst-decile capture 33.7%. (4) Out-of-time validation (in the validation window): all three models achieved accuracy of ~80%; the RF model's AUC of ~76% indicates a reasonably good fit. (5) Significant variables (major drivers of readmission): severity of DM; number of DM spells in past 1 year; ED LOS in past 1 year; number of procedures undergone; number of OPD visits in past 1 year; number of ED visits in past 1 year; number of IP visits in past 1 year; number of comorbidities; distance from hospital; DM LOS in past 1 year; time since last ED visit; total ED cost in past 1 year; age of patient. (6) Patient category based on risk score: high / low.
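The four validation metrics on this slide can be computed directly from observed outcomes and predicted risk scores. The sketch below uses plain Python on a made-up toy sample; it computes AUC as the Mann-Whitney probability, Gini as 2 × AUC − 1, KS as the largest gap between true- and false-positive rates, and reads "worst-decile capture" as the share of all readmissions that fall in the 10% of patients with the highest scores (that reading is an assumption). It also checks that the slide's Gini and AUC figures agree with Gini = 2 × AUC − 1 up to rounding. Both classes are assumed to be present.

```python
import math


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini coefficient of a binary classifier: 2 * AUC - 1."""
    return 2 * auc(labels, scores) - 1


def ks(labels, scores):
    """Kolmogorov-Smirnov statistic: max over thresholds of TPR - FPR."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores), reverse=True):
        tpr = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t) / n_neg
        best = max(best, tpr - fpr)
    return best


def worst_decile_capture(labels, scores):
    """Share of all positives found among the top 10% highest-risk scores."""
    k = math.ceil(len(scores) / 10)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / sum(labels)


# Toy sample (made up): 1 = readmitted within 30 days.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(auc(labels, scores), gini(labels, scores), ks(labels, scores),
      worst_decile_capture(labels, scores))

# In-time validation figures from the slide as (Gini, AUC).
slide = {"LOG": (0.486, 0.743), "DT": (0.373, 0.687), "RF": (0.535, 0.767)}
for model, (g, a) in slide.items():
    print(model, "2*AUC-1 =", round(2 * a - 1, 3), "reported Gini =", g)
```

The pairwise AUC loop is quadratic and is meant for clarity only; on a full cohort a sort-based or library implementation would be used instead.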
  • 9. RISK PREDICTION MODEL: DESIGN, EVALUATION • Missing imputation: mean/median, regression, KNN • Feature selection: feature importance, RFE, WoE and IV • Model build: tree-based (DT, RF, GBT), others (SVM, NN, NB) • Model evaluation: k-fold cross-validation, ROC curve. Patient cohorts are created from ICD-9/10 codes for the defined chronic disease (e.g. DMT2) and from the time of diagnosis, to separate already-diagnosed patients from those who may develop the disease (the prospective cohort, used as the scoring dataset). Feature selection focuses the model on the variables that most strongly drive the outcome variable, using the methods listed above. EMR data has many dimensions, which also means many values are missing; imputation methods keep most of the features usable. The basic task is classification: the probability of the outcome is computed for each patient, and thresholds are then applied. Multiple models were built and validated on accuracy metrics to select the best one, using cross-validation and the area under the ROC curve. The prospective cohort was then scored to group patients into high-, medium-, and low-risk groups, and the high-risk group was to be targeted for interventions.
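Two steps from this slide, WoE/IV feature screening and threshold-based risk grouping, can be sketched in a few lines of plain Python. Weight of Evidence (WoE) and Information Value (IV) come from credit scoring; here WoE is taken as ln(share of non-events / share of events) per bin, and sign conventions differ between sources. The smoothing count, the binned feature, its counts, and the 0.30 / 0.10 risk cut-offs are all hypothetical choices for illustration.

```python
import math
from collections import Counter


def woe_iv(rows, smoothing=0.5):
    """Weight of Evidence per bin and total Information Value.

    rows: iterable of (bin_label, outcome), outcome 1 = event (e.g. readmitted).
    A small smoothing count avoids log(0) for bins with no events or non-events.
    """
    events, non_events = Counter(), Counter()
    for b, y in rows:
        (events if y == 1 else non_events)[b] += 1
    bins = sorted(set(events) | set(non_events))
    total_e = sum(events[b] + smoothing for b in bins)
    total_n = sum(non_events[b] + smoothing for b in bins)
    woe, iv = {}, 0.0
    for b in bins:
        share_e = (events[b] + smoothing) / total_e
        share_n = (non_events[b] + smoothing) / total_n
        woe[b] = math.log(share_n / share_e)
        iv += (share_n - share_e) * woe[b]
    return woe, iv


def risk_tier(probability, high=0.30, medium=0.10):
    """Map a predicted probability to a risk group; cut-offs are illustrative."""
    if probability >= high:
        return "high"
    if probability >= medium:
        return "medium"
    return "low"


# Toy binned feature "ED visits in past year" with made-up counts.
rows = ([("0", 1)] * 10 + [("0", 0)] * 90
        + [("1-2", 1)] * 20 + [("1-2", 0)] * 60
        + [("3+", 1)] * 30 + [("3+", 0)] * 20)
woe, iv = woe_iv(rows)
print({b: round(w, 2) for b, w in woe.items()}, round(iv, 2))
print([risk_tier(p) for p in (0.45, 0.12, 0.03)])
```

A negative WoE marks bins where events are over-represented (here "3+" visits), and a larger IV means the feature separates readmitted from non-readmitted patients more strongly, which is why IV can rank candidate features before model building.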
  • 10. PRACTICAL USE CASE AND CODE DEMO USE CASE • Risk prediction for diabetes • Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of Clinical Database Patient Records DATASET • UCI Machine Learning Repository: about 100,000 encounters of diabetic patients from 130 US hospitals (Cerner Health Facts database) OUTCOME • How likely is a patient to be diagnosed with DM in the near future? • How likely is a T2DM patient to be readmitted within 30 days of discharge, or after more than 30 days? METHODS • Multiple ML models generated and compared: individual classifiers (DT, LOGREG, SVC) and ensemble classifiers (RF, GBC) • GitHub Link
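Before comparing classifiers, the readmission outcome has to be turned into a target. The sketch below assumes the UCI file's `encounter_id`, `patient_nbr` and `readmitted` columns, with `readmitted` taking the values "<30", ">30" and "NO"; the sample rows are made up. It keeps one encounter per patient (the first listed, assuming the file is in chronological order) so repeat visits are not treated as independent observations, then builds both a binary 30-day label and a three-way label matching the slide's question.

```python
import csv
import io

# Made-up rows in the assumed layout; only the columns used here are shown.
SAMPLE = """encounter_id,patient_nbr,readmitted
101,7,NO
102,7,<30
103,9,>30
104,11,<30
"""


def load_first_encounters(text):
    """Keep the first listed encounter for each patient_nbr."""
    seen, rows = set(), []
    for row in csv.DictReader(io.StringIO(text)):
        if row["patient_nbr"] not in seen:
            seen.add(row["patient_nbr"])
            rows.append(row)
    return rows


def label_30_day(row):
    """Binary target: 1 if readmitted within 30 days ('<30'), else 0."""
    return 1 if row["readmitted"] == "<30" else 0


def label_three_class(row):
    """Three-way target: within 30 days, after 30 days, or not readmitted."""
    return {"<30": "within_30", ">30": "after_30", "NO": "none"}[row["readmitted"]]


rows = load_first_encounters(SAMPLE)
labels = [label_30_day(r) for r in rows]
print([r["encounter_id"] for r in rows], labels, [label_three_class(r) for r in rows])
```

Keeping only one encounter per patient also matters for cross-validation: if the same patient's visits were spread across training and test folds, the estimated accuracy of the DT, LOGREG, SVC, RF and GBC models would likely be optimistic.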