Data Science for
Health Care
Use Case:
Heart Risk Analytics
in Python
Chetan Khanzode
WHO Report
• Coronary heart disease (CHD) and stroke are the
leading causes of death in both economically
developed and developing countries. Each year, more
than 17 million people die from cardiovascular
disease worldwide.
• It is number 1 cause of death globally
Health Domain- CHD – ML Model
• CHD is a disease of the blood vessels
supplying the heart.
• Logistic regression ML model can be used
to predict risk of CHD
• Collection of Health and behavioural data,
improve overall health of individuals by
identifying preventive measure to reduce
cost with best treatment to individual.
Health Domain- CHD – Risk Factors
Risk factors are variables that increase the chances of a
disease
• Demographic risk factors
– Gender: sex of patient
– Age: Age in years
– Education: education of patient
• First medical examination Risk factors
– TotChol: Total cholesterol (mg/dL)
– SysBP: Systolic blood pressure
– DiaBP: Diastolic blood pressure
– BMI: Body Mass Index, weight (kg)/height (m)2
– HeartRate: Heart rate (beats/minute)
– Glucose: Blood glucose level (mg/dL)
Build Predictive Model
Data Input
Impute the
Missing
Values with
median
values
Randomly split
the dataset into
Training and Test
sets
Train the Logistic
regression Model on
training set to predict
whether or not patient
experience CHD within
10 years of first medical
examination
Evaluate predictive
model on test set for
future set of
observations
Health Data Input – Dataset
Dataset used contains 4200+ observations
14 independent variables
1 dependent variable
Import Relevant Libs and Impute
missing values
Quick insight into data
Quick insight into data
Quick insight into data - Grouping
Quick insight into data - correlation
Quick insight into data - SysBP
Quick insight into data - Heatmap
Train and Test set , CM metric
The model accuracy on test data is 85.7%.
Confusion Matrix
Model Usage
• “Prevention is better than cure”. The model can help to predict CHD risk to individual and
they can adapt to healthy style and avoid expenses in later stage of life.
• Hospitals can run the health campaign with this predicted model to identify set of patients
who are at risk of CHD. Also it can be used for new set for medical examinations.
• $Revenue = x*$y*$z – $o
Where
– x is number of people who enrol for preventive care
– $y is amount of expenses
– Period of say z( 5 to 10 years).
– Overhead (for say marketing campaign etc.)
– Also as next step can be to reduce overhead by implementing AI solutions so as to
improve the revenue.
Thank you

Data science in health care

  • 1.
    Data Science for HealthCare Use Case: Heart Risk Analytics in Python Chetan Khanzode
  • 2.
    WHO Report • Coronaryheart disease (CHD) and stroke are the leading causes of death in both economically developed and developing countries. Each year, more than 17 million people die from cardiovascular disease worldwide. • It is number 1 cause of death globally
  • 3.
    Health Domain- CHD– ML Model • CHD is a disease of the blood vessels supplying the heart. • Logistic regression ML model can be used to predict risk of CHD • Collection of Health and behavioural data, improve overall health of individuals by identifying preventive measure to reduce cost with best treatment to individual.
  • 4.
    Health Domain- CHD– Risk Factors Risk factors are variables that increase the chances of a disease • Demographic risk factors – Gender: sex of patient – Age: Age in years – Education: education of patient • First medical examination Risk factors – TotChol: Total cholesterol (mg/dL) – SysBP: Systolic blood pressure – DiaBP: Diastolic blood pressure – BMI: Body Mass Index, weight (kg)/height (m)2 – HeartRate: Heart rate (beats/minute) – Glucose: Blood glucose level (mg/dL)
  • 5.
    Build Predictive Model DataInput Impute the Missing Values with median values Randomly split the dataset into Training and Test sets Train the Logistic regression Model on training set to predict whether or not patient experience CHD within 10 years of first medical examination Evaluate predictive model on test set for future set of observations
  • 6.
    Health Data Input– Dataset Dataset used contains 4200+ observations 14 independent variables 1 dependent variable
  • 7.
    Import Relevant Libsand Impute missing values
  • 8.
  • 9.
  • 10.
    Quick insight intodata - Grouping
  • 11.
    Quick insight intodata - correlation
  • 12.
    Quick insight intodata - SysBP
  • 13.
    Quick insight intodata - Heatmap
  • 14.
    Train and Testset , CM metric The model accuracy on test data is 85.7%.
  • 15.
  • 16.
    Model Usage • “Preventionis better than cure”. The model can help to predict CHD risk to individual and they can adapt to healthy style and avoid expenses in later stage of life. • Hospitals can run the health campaign with this predicted model to identify set of patients who are at risk of CHD. Also it can be used for new set for medical examinations. • $Revenue = x*$y*$z – $o Where – x is number of people who enrol for preventive care – $y is amount of expenses – Period of say z( 5 to 10 years). – Overhead (for say marketing campaign etc.) – Also as next step can be to reduce overhead by implementing AI solutions so as to improve the revenue.
  • 17.