Cardiovascular Disease Risk Assessment
Evripidis Themelis, Democritus University of Thrace
Introduction
Cardiovascular diseases are a leading cause of death worldwide. Analyzing patient data can
provide insights into the factors contributing to these diseases and aid in early detection and
prevention. This report delves into a dataset containing patient records, exploring various
factors that may contribute to cardiovascular diseases.
Dataset Overview
The dataset consists of 70,000 patient records (Kaggle). It encompasses 11 features,
including age, height, weight, and blood pressure readings, along with a target variable
indicating the presence or absence of cardiovascular disease:
1. Age:
• Type: Integer
• Description: Age of the person in days.
• Example Values: 18393, 20228, 18857
2. Gender:
• Type: Categorical Integer
• Description: Gender of the person (Assuming 1 represents ‘Female’ and 2 represents
‘Male’).
• Example Values: 2, 1, 1
3. Height:
• Type: Integer
• Description: Height of the person in centimeters.
• Example Values: 168, 156, 165
4. Weight:
• Type: Float
• Description: Weight of the person in kilograms.
• Example Values: 62, 85, 64
5. Systolic Blood Pressure (ap_hi):
• Type: Integer
• Description: Systolic blood pressure measurement.
• Example Values: 110, 140, 130
6. Diastolic Blood Pressure (ap_lo):
• Type: Integer
• Description: Diastolic blood pressure measurement.
• Example Values: 80, 90, 70
7. Cholesterol:
• Type: Categorical Integer
• Description: Cholesterol level.
– 1: normal
– 2: above normal
– 3: well above normal
• Example Values: 1, 3, 3
8. Glucose (gluc):
• Type: Categorical Integer
• Description: Glucose level.
– 1: normal
– 2: above normal
– 3: well above normal
• Example Values: 1, 1, 1
Table 1: Subset of the Cardiovascular Disease dataset
age    gender  height  weight  ap_hi  ap_lo  cholesterol
18393  2       168     62      110    80     1
20228  1       156     85      140    90     3
18857  1       165     64      130    70     3
17623  2       169     82      150    100    1
17474  1       156     56      100    60     1
21914  1       151     67      120    80     2
22113  1       157     93      130    80     3
22584  2       178     95      130    90     3
17668  1       158     71      110    70     1
19834  1       164     68      110    60     1
22530  1       169     80      120    80     1
18815  2       173     60      120    80     1
9. Smoking:
• Type: Binary (0 for ‘No’ and 1 for ‘Yes’)
• Description: Whether the person smokes or not.
• Example Values: 0, 0, 0
10. Alcohol Intake (alco):
• Type: Binary (0 for ‘No’ and 1 for ‘Yes’)
• Description: Whether the person consumes alcohol or not.
• Example Values: 0, 0, 0
11. Physical Activity (active):
• Type: Binary (0 for ‘No’ and 1 for ‘Yes’)
• Description: Whether the person is physically active or not.
• Example Values: 1, 1, 0
12. Cardiovascular Disease (cardio):
• Type: Binary (0 for ‘No’ and 1 for ‘Yes’)
• Description: Presence or absence of cardiovascular disease.
• Example Values: 0, 1, 1
Feature Engineering
New variables related to the age in years, height in meters, BMI categories, and blood pressure
health categories were created and added to the dataset.
Age in Years (data$age): The original age variable is given in days. It is converted to
years by dividing the original value by 365.
Height in Meters (data["height"]): The original height variable is converted from
centimeters to meters by dividing by 100.
Body Mass Index (BMI): A measure used to assess whether a person's weight is appropriate
for their height. It is calculated with the formula:
\[ \mathrm{BMI} = \frac{\text{weight (kg)}}{\left(\text{height (m)}\right)^{2}} \]
The BMI is then rounded to one decimal place.
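The three transformations above can be sketched in a few lines of Python. This is only an illustration applied to the first three rows of Table 1, not the code used for the report; the function name `engineer_features` is hypothetical.

```python
def engineer_features(age_days, height_cm, weight_kg):
    """Return (age in years, height in meters, BMI rounded to one decimal)."""
    age_years = age_days / 365              # the report divides by 365
    height_m = height_cm / 100              # centimeters to meters
    bmi = round(weight_kg / height_m ** 2, 1)
    return age_years, height_m, bmi


# First three rows of Table 1: (age in days, height in cm, weight in kg)
rows = [(18393, 168, 62), (20228, 156, 85), (18857, 165, 64)]
features = [engineer_features(*row) for row in rows]

for age_years, height_m, bmi in features:
    print(f"{age_years:.1f} years, {height_m:.2f} m, BMI {bmi}")
```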
A discretized BMI variable has been subsequently created:
Underweight: BMI < 18.5
Normal Weight: 18.5 ≤ BMI ≤ 24.9
Overweight: 25 ≤ BMI ≤ 29.9
Obesity Class 1 (Moderate): 30 ≤ BMI ≤ 34.9
Obesity Class 2 (Severe): 35 ≤ BMI ≤ 39.9
Obesity Class 3 (Very Severe): BMI ≥ 40
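A minimal sketch of the discretization follows. It assumes each class extends up to the lower bound of the next one, so that no BMI value is left without a category; the function name `bmi_category` is hypothetical and the report's own implementation may differ.

```python
def bmi_category(bmi):
    """Map a BMI value to one of the six categories listed above."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal Weight"
    if bmi < 30:
        return "Overweight"
    if bmi < 35:
        return "Obesity Class 1 (Moderate)"
    if bmi < 40:
        return "Obesity Class 2 (Severe)"
    return "Obesity Class 3 (Very Severe)"


# BMI values of the first three rows of Table 1, rounded to one decimal
for value in (22.0, 34.9, 23.5):
    print(value, bmi_category(value))
```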
Blood Pressure Categories: A function categorizes patients into health categories based on
their systolic (the top number) and diastolic (the bottom number) blood pressure readings:
Normal Blood Pressure: Systolic < 120 and Diastolic < 80
Prehypertension: 120 ≤ Systolic < 130 and Diastolic < 80
Hypertension Stage 1: 130 ≤ Systolic < 140 or 80 ≤ Diastolic < 90
Hypertension Stage 2: Systolic ≥ 140 or Diastolic ≥ 90
Hypertensive Crisis: Systolic ≥ 180 or Diastolic ≥ 120
The Stage 2 and Hypertensive Crisis conditions overlap; the most severe matching category is
presumably the one assigned.
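One way to write such a categorization function is sketched below. Because the listed conditions overlap, the sketch checks the most severe category first; that ordering is an assumption, as is the function name `bp_category`.

```python
def bp_category(ap_hi, ap_lo):
    """Categorize a reading from systolic (ap_hi) and diastolic (ap_lo) values.

    Conditions are tested from most to least severe so that a reading
    matching several rules receives the most severe label (assumption).
    """
    if ap_hi >= 180 or ap_lo >= 120:
        return "Hypertensive Crisis"
    if ap_hi >= 140 or ap_lo >= 90:
        return "Hypertension Stage 2"
    if ap_hi >= 130 or ap_lo >= 80:
        return "Hypertension Stage 1"
    if ap_hi >= 120:
        return "Prehypertension"
    return "Normal Blood Pressure"


# Example readings from the dataset overview: (ap_hi, ap_lo)
for reading in [(110, 80), (140, 90), (130, 70)]:
    print(reading, bp_category(*reading))
```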
Correlations
Understanding correlations between various features can highlight the relationships and
dependencies among them. For instance, a strong positive correlation between age and the onset
of cardiovascular disease might indicate age as a significant risk factor.
[Figure: Correlation heatmap of age, height, weight, ap_hi, ap_lo, and BMI; color scale from −1 to 1]
As anticipated, weight exhibits a positive correlation with both height and BMI, while the
correlations among the remaining variables are relatively weak.
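Each cell of a correlation matrix is a pairwise correlation coefficient. The sketch below computes one such cell by hand, assuming the Pearson coefficient was used (the report does not say which one), from the height and weight columns of Table 1. Twelve rows are far too few to reproduce the values in the figure; the point is only to show the calculation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# height (cm) and weight (kg) columns of Table 1
heights = [168, 156, 165, 169, 156, 151, 157, 178, 158, 164, 169, 173]
weights = [62, 85, 64, 82, 56, 67, 93, 95, 71, 68, 80, 60]

r_height_weight = pearson(heights, weights)
print(f"height vs. weight: r = {r_height_weight:.2f}")
```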
How does the age distribution look for individuals with and without cardiovascular diseases?
Understanding the age distribution of individuals with cardiovascular diseases (CVD) is crucial.
Age is one of the most established risk factors for CVD. As age increases, the risk of damaged
and narrowed arteries, as well as weakened or thickened heart muscle, increases. Here, we
visualize how age varies for those with and without CVD. On average, individuals with CVD
tend to be older.
[Figure: Age Distribution by Cardiovascular Disease Status. x-axis: Cardiovascular Disease Status (0 = No Cardiovascular Disease, 1 = Cardiovascular Disease); y-axis: Age]
Cholesterol Levels and Cardiovascular Disease Prevalence
Cholesterol is a fatty substance found in the blood. While it’s vital for the formation of cell
membranes, certain hormones, and vitamin D, too much cholesterol in the blood can increase
the risk of cardiovascular diseases. Here, we explore how varying cholesterol levels correlate
with the prevalence of cardiovascular diseases. The three cholesterol levels are:
Normal: Cholesterol levels are within the recommended range.
Above Normal: Cholesterol levels are higher than what’s considered normal but not critically
high.
Well Above Normal: Critically high cholesterol levels that might need immediate medical
attention.
[Figure: Cholesterol Levels and Cardiovascular Disease Prevalence. x-axis: Cholesterol Level (Normal, Above Normal, Well Above Normal); y-axis: Proportion (0 to 1); legend: No Cardiovascular Disease, Cardiovascular Disease]
The proportion of individuals with CVD increases as we move from the normal cholesterol
group to the well-above-normal cholesterol group.
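The quantity plotted for each group is the share of its members with cardio = 1. A minimal sketch of that calculation is given below; it uses only the three example values quoted in the dataset overview, so the resulting numbers are illustrative and not the report's results. The function name `prevalence_by_group` is hypothetical.

```python
from collections import defaultdict


def prevalence_by_group(groups, outcomes):
    """Share of records with outcome 1 within each group."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for group, outcome in zip(groups, outcomes):
        totals[group] += 1
        cases[group] += outcome
    return {group: cases[group] / totals[group] for group in sorted(totals)}


# Example values from the dataset overview (three records only)
cholesterol = [1, 3, 3]
cardio = [0, 1, 1]

prevalence = prevalence_by_group(cholesterol, cardio)
print(prevalence)
```

The same function applies unchanged to the blood pressure and BMI categories.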
Blood Pressure Categories and Cardiovascular Disease Prevalence
Blood pressure, a key health indicator, measures the force exerted by blood against the walls
of your arteries as your heart pumps it around your body. It’s a vital sign of cardiovascular
health. Persistent high blood pressure, medically known as hypertension, can lead to serious
health complications. It can increase the risk of heart disease, stroke, kidney disease, and
other health problems.
The categories of blood pressure are defined as:
Normal Blood Pressure: Indicative of a healthy heart and no heightened risk of
cardiovascular disease.
Hypertension Stage 1: A warning sign that one might be at risk of heart-related
complications in the future.
Hypertension Stage 2: Indicates an advanced level of hypertension, which might result in
damage to critical organs if left unaddressed.
Hypertensive Crisis: A severe condition that demands immediate medical attention.
In this visualization, we aim to explore the distribution of these blood pressure categories in
relation to the presence or absence of cardiovascular diseases.
[Figure: Blood Pressure Categories and Cardiovascular Disease Prevalence. x-axis: Blood Pressure Category (Normal Blood Pressure, Hypertension Stage 1, Hypertension Stage 2, Hypertensive Crisis); y-axis: Proportion (0 to 1); legend: No Cardiovascular Disease, Cardiovascular Disease]
The prevalence of CVD triples as we progress from the normal blood pressure group to the
Hypertensive Crisis group.
BMI and Cardiovascular Diseases
Body Mass Index (BMI) is a widely used measure to categorize individuals based on their
weight relative to their height. It’s a useful metric to gauge whether a person has an appropriate
weight for their height. Here are the typical BMI categories:
Underweight: BMI is less than 18.5.
Normal Weight: BMI is 18.5 to 24.9.
Overweight: BMI is 25 to 29.9.
Obesity Class 1 (Moderate): BMI is 30 to 34.9.
Obesity Class 2 (Severe): BMI is 35 to 39.9.
Obesity Class 3 (Very Severe or Morbidly Obese): BMI is 40 or higher.
Higher BMI values are associated with increased risks of CVD.
[Figure: BMI Categories and Cardiovascular Disease Prevalence. x-axis: BMI Category (Underweight, Normal Weight, Overweight, Obesity Class 1 (Moderate), Obesity Class 2 (Severe), Obesity Class 3 (Very Severe)); y-axis: Proportion (0 to 1); legend: No Cardiovascular Disease, Cardiovascular Disease]
Logistic Regression Model for Predicting Cardiovascular Diseases
Logistic Regression is a statistical method used for analyzing datasets in which there are one
or more independent variables that determine an outcome. The outcome is measured with
a dichotomous variable (in which there are only two possible outcomes). In this case, we’re
using logistic regression with 5-fold cross-validation to predict the likelihood of a patient having
cardiovascular disease based on several features.
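For readers unfamiliar with k-fold cross-validation, the sketch below shows how 5 folds partition the records: each record is held out exactly once while the model is fitted on the other four folds. It illustrates the splitting step only, under the assumption of a simple shuffled split; the function name `k_fold_indices` is hypothetical and no model is fitted here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]      # k roughly equal folds
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [
            idx
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for idx in fold
        ]
        yield train_idx, test_idx


splits = list(k_fold_indices(20, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), "training records,", len(test_idx), "held out")
```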
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 8313 3573
1 2193 6920
Accuracy : 0.7254
95% CI : (0.7193, 0.7314)
No Information Rate : 0.5003
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.4508
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.7913
Specificity : 0.6595
Pos Pred Value : 0.6994
Neg Pred Value : 0.7594
Prevalence : 0.5003
Detection Rate : 0.3959
Detection Prevalence : 0.5660
Balanced Accuracy : 0.7254
'Positive' Class : 0
Model Performance Insights
The model achieved an accuracy of 72.54%, meaning it correctly predicted the cardiovascular
disease status for about 72.54% of the patients in the test data. While this is a decent
performance, there might be room for improvement using either more advanced algorithms
or feature engineering.
These class-specific figures are reported with class 0 (no cardiovascular disease) as the
‘positive’ class. The sensitivity of 79.13% is therefore the share of patients without
cardiovascular disease that the model correctly identifies, and the specificity of 65.95% is
the share of patients with the disease that it detects. In other words, the model misses
roughly one in three patients who have cardiovascular disease (3573 of 10493).
For the same reason, the positive predictive value of 69.94% applies to predictions of class 0:
when the model predicts that a patient doesn’t have cardiovascular disease, it’s correct about
69.94% of the time. The negative predictive value of 75.94% applies to predictions of class 1:
when it predicts that a patient has the disease, it’s correct about 75.94% of the time.
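The reported figures can be recomputed from the four counts in the confusion matrix. The sketch below assumes, as the printed header suggests, that rows are predictions and columns are reference labels. It evaluates the metrics twice: once with class 0 as the positive class, as in the output above, and once with class 1 (disease present) as the positive class. The function name `class_metrics` is hypothetical.

```python
def class_metrics(tp, fn, fp, tn):
    """Standard metrics for a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),        # true negative rate
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
    }


# Counts from the logistic regression confusion matrix:
# predicted 0 / reference 0 = 8313, predicted 0 / reference 1 = 3573,
# predicted 1 / reference 0 = 2193, predicted 1 / reference 1 = 6920
as_reported = class_metrics(tp=8313, fn=2193, fp=3573, tn=6920)       # positive = class 0
disease_positive = class_metrics(tp=6920, fn=3573, fp=2193, tn=8313)  # positive = class 1

for name, value in as_reported.items():
    print(f"{name}: {value:.4f} (class 0 positive), "
          f"{disease_positive[name]:.4f} (class 1 positive)")
```

Switching the positive class swaps sensitivity with specificity and the positive with the negative predictive value, while accuracy is unchanged.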
Model Interpretation
Age is a significant predictor of cardiovascular disease. As age increases, the likelihood of
having cardiovascular disease also increases.
Having cholesterol level 2 or 3 (above normal or well above normal) significantly increases
the odds of cardiovascular disease.
Glucose level 3 (well above normal) is associated with a decreased likelihood, though this is
a bit counterintuitive and might require further investigation.
Smoking, alcohol consumption, and being inactive are associated with increased odds
of cardiovascular disease.
As BMI category increases, the odds of having cardiovascular disease also tend to increase.
Blood pressure categories (Hypertension Stage 1, Hypertension Stage 2, Hypertensive
Crisis) are very significant predictors. The higher the blood pressure category, the higher the
odds of having cardiovascular disease.
A Decision Tree for Predicting Cardiovascular Diseases
We build and evaluate a decision tree model for predicting cardiovascular diseases.
The model achieved an accuracy of 72.76%, indicating that it correctly predicted the
cardiovascular disease status for approximately 72.76% of the patients in the test data. While
this accuracy is decent, there is still potential for improvement, either through more advanced
algorithms or feature engineering.
As in the logistic regression output, the class-specific figures are reported with class 0 (no
cardiovascular disease) as the ‘positive’ class. The sensitivity of 75.77% therefore means that
the model correctly identifies 75.77% of the patients without cardiovascular disease.
The specificity of 69.75% means that it correctly identifies 69.75% of the patients with
cardiovascular disease. The remaining 30.25% of patients with the disease (3174 of 10493) are
not detected by the model.
The positive predictive value of 71.49% applies to predictions of class 0: when the model
predicts that a patient does not have cardiovascular disease, it is correct approximately
71.49% of the time. The negative predictive value of 74.19% applies to predictions of class 1:
when the model predicts that a patient has the disease, it is correct about 74.19% of the time.
Overall, the model provides a reasonable level of accuracy and balance between sensitivity
and specificity. It correctly identifies a significant portion of patients with and without
cardiovascular disease. However, there is still room for improvement, and further refinement of
the model may enhance its predictive performance.
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 7960 3174
1 2546 7319
Accuracy : 0.7276
95% CI : (0.7215, 0.7336)
No Information Rate : 0.5003
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.4552
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.7577
Specificity : 0.6975
Pos Pred Value : 0.7149
Neg Pred Value : 0.7419
Prevalence : 0.5003
Detection Rate : 0.3791
Detection Prevalence : 0.5302
Balanced Accuracy : 0.7276
'Positive' Class : 0
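The Kappa statistic in the output compares the observed agreement between predictions and reference labels with the agreement expected by chance from the row and column totals. A short sketch of that calculation (Cohen's kappa), applied to the counts of both confusion matrices in this report, follows; the function name `cohens_kappa` is hypothetical.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: predictions)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected if predictions and labels were independent
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


decision_tree = [[7960, 3174], [2546, 7319]]
logistic_regression = [[8313, 3573], [2193, 6920]]

print(f"decision tree kappa: {cohens_kappa(decision_tree):.4f}")
print(f"logistic regression kappa: {cohens_kappa(logistic_regression):.4f}")
```

A value near 0.45 indicates agreement clearly above chance but well short of perfect, in line with the accuracy of about 73% on a nearly balanced test set.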
A Decision Tree for Predicting Cardiovascular Diseases
The primary predictors in this decision tree model are newap (the blood pressure category),
age, and cholesterol. They classify individuals into two categories, Class 0 and Class 1,
indicating the absence or presence of cardiovascular disease, respectively.
Specifically:
• Individuals with hypertension stage 2 or experiencing a hypertensive crisis are classified
as having cardiovascular disease (Class 1).
• Individuals with normal blood pressure (NBP) or hypertension stage 1 (HS1) who are younger
than 61 and have cholesterol level 1 or 2 are classified as not having cardiovascular
disease (Class 0).
• Individuals with normal blood pressure or hypertension stage 1 are classified as having
cardiovascular disease (Class 1) if they have cholesterol level 3 or are aged 61 or older.
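The splits shown in the tree figure (newap = NBP,HS1; age < 54; cholesterol = 1,2; age < 61) can be written as nested conditions. The sketch below is one reading of that figure, assuming NBP stands for normal blood pressure and HS1 for hypertension stage 1; the function name `tree_predict` and the codes "HS2" and "HC" used in the examples are hypothetical.

```python
def tree_predict(newap, age, cholesterol):
    """Return the predicted class (0 or 1) by following the tree's splits."""
    if newap not in ("NBP", "HS1"):
        return 1                      # stage 2 or crisis: class 1
    if age < 54:
        return 0 if cholesterol in (1, 2) else 1
    if cholesterol not in (1, 2):
        return 1                      # aged 54 or older with cholesterol level 3
    return 0 if age < 61 else 1       # cholesterol 1 or 2: split again on age


examples = [("NBP", 50, 1), ("HS1", 58, 2), ("HS1", 65, 1), ("HS2", 40, 1)]
for newap, age, cholesterol in examples:
    print(newap, age, cholesterol, "->", tree_predict(newap, age, cholesterol))
```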
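[Figure: Decision tree. Each node shows the predicted class, the counts of class 0 / class 1 records, and the share of all records reaching the node.]
newap = NBP,HS1? [class 0; 25e+3 / 24e+3; 100%]
  yes: age < 54? [class 0; 21e+3 / 11e+3; 65%]
    yes: cholesterol = 1,2? [class 0; 14e+3 / 4747; 38%]
      yes: class 0 [13e+3 / 4168; 36%]
      no: class 1 [321 / 579; 2%]
    no: cholesterol = 1,2? [class 0; 7042 / 6215; 27%]
      yes: age < 61? [class 0; 6598 / 5055; 24%]
        yes: class 0 [5170 / 3382; 17%]
        no: class 1 [1428 / 1673; 6%]
      no: class 1 [444 / 1160; 3%]
  no: class 1 [3774 / 14e+3; 35%]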
Concluding remarks
In conclusion, this report has explored a dataset containing patient records with a focus on
understanding the factors contributing to cardiovascular diseases and improving early detection
and prevention.
The key findings and insights from this analysis are as follows:
Age is a significant risk factor: As age increases, the likelihood of having cardiovascular
disease also increases. This underscores the importance of age as a risk factor in cardiovascular
health.
Elevated cholesterol levels (above normal and well above normal) are associated with an
increased risk of cardiovascular disease. Monitoring and managing cholesterol levels is crucial
in preventing cardiovascular diseases.
Blood pressure is a vital indicator of cardiovascular health. The higher the blood pressure
category (Hypertension Stage 1, Hypertension Stage 2, or Hypertensive Crisis), the higher the
odds of having cardiovascular disease. Regular blood pressure monitoring and management
are essential for cardiovascular health.
Body Mass Index (BMI) is a significant predictor. Higher BMI categories (overweight and
different obesity classes) are associated with an increased likelihood of cardiovascular disease.
Weight management and maintaining a healthy BMI are essential for reducing the risk of
cardiovascular diseases.
Smoking, alcohol consumption, and physical inactivity are associated with increased odds of
cardiovascular disease. Promoting a healthy lifestyle, including smoking cessation, moderate
alcohol consumption, and regular physical activity, is crucial for cardiovascular health.
Additionally, we developed two predictive models, logistic regression and a decision tree, to
assess the likelihood of a patient having cardiovascular disease based on various features. These
models achieved reasonable accuracy, with scope for further improvement through advanced
algorithms and feature engineering.
Overall, this analysis provides valuable insights into the risk factors and predictive models for
cardiovascular diseases, serving as a foundation for further research and healthcare
interventions to reduce the global burden of cardiovascular diseases.
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
Bonku-Babus-Friend by Sathyajith Ray (9)
Bonku-Babus-Friend by Sathyajith Ray  (9)Bonku-Babus-Friend by Sathyajith Ray  (9)
Bonku-Babus-Friend by Sathyajith Ray (9)
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
 

Themelis-BIP-Project.pdf

Cardiovascular Disease Risk Assessment
Evripidis Themelis, Democritus University of Thrace

Introduction
Cardiovascular diseases are a leading cause of death worldwide. Analyzing patient data can provide insights into the factors contributing to these diseases and aid in early detection and prevention. This report delves into a dataset containing patient records, exploring various factors that may contribute to cardiovascular diseases.

Dataset Overview
The dataset consists of 70000 records of patients’ data (Kaggle). It encompasses 11 features, including age, height, weight, and blood pressure readings, along with a target variable indicating the presence or absence of cardiovascular disease:
1. Age:
• Type: Integer
• Description: Age of the person in days.
• Example Values: 18393, 20228, 18857
2. Gender:
• Type: Categorical Integer
• Description: Gender of the person (Assuming 1 represents ‘Female’ and 2 represents ‘Male’).
• Example Values: 2, 1, 1
3. Height:
• Type: Integer
• Description: Height of the person in centimeters.
• Example Values: 168, 156, 165
4. Weight:
• Type: Float
• Description: Weight of the person in kilograms.
• Example Values: 62, 85, 64
5. Systolic Blood Pressure (ap_hi):
• Type: Integer
• Description: Systolic blood pressure measurement.
• Example Values: 110, 140, 130
6. Diastolic Blood Pressure (ap_lo):
• Type: Integer
• Description: Diastolic blood pressure measurement.
• Example Values: 80, 90, 70
7. Cholesterol:
• Type: Categorical Integer
• Description: Cholesterol level.
– 1: normal
– 2: above normal
– 3: well above normal
• Example Values: 1, 3, 3
8. Glucose (gluc):
• Type: Categorical Integer
• Description: Glucose level.
– 1: normal
– 2: above normal
– 3: well above normal
• Example Values: 1, 1, 1
Table 1: Subset of the Cardiovascular Disease dataset
age    gender  height  weight  ap_hi  ap_lo  cholesterol
18393  2       168     62      110    80     1
20228  1       156     85      140    90     3
18857  1       165     64      130    70     3
17623  2       169     82      150    100    1
17474  1       156     56      100    60     1
21914  1       151     67      120    80     2
22113  1       157     93      130    80     3
22584  2       178     95      130    90     3
17668  1       158     71      110    70     1
19834  1       164     68      110    60     1
22530  1       169     80      120    80     1
18815  2       173     60      120    80     1

9. Smoking:
• Type: Binary (0 for ‘No’ and 1 for ‘Yes’)
• Description: Whether the person smokes or not.
• Example Values: 0, 0, 0
10. Alcohol Intake (alco):
• Type: Binary (0 for ‘No’ and 1 for ‘Yes’)
• Description: Whether the person consumes alcohol or not.
• Example Values: 0, 0, 0
11. Physical Activity (active):
• Type: Binary (0 for ‘No’ and 1 for ‘Yes’)
• Description: Whether the person is physically active or not.
• Example Values: 1, 1, 0
12. Cardiovascular Disease (cardio):
• Type: Binary (0 for ‘No’ and 1 for ‘Yes’)
• Description: Presence or absence of cardiovascular disease.
• Example Values: 0, 1, 1
Feature Engineering
New variables for age in years, height in meters, BMI categories, and blood pressure health categories were created and added to the dataset.

Age in Years (data$age): The original age variable is given in days. It is transformed to represent age in years by dividing the original value by 365.

Height in Meters (data[“height”]): The original height variable was converted to meters by dividing the height by 100.

Body Mass Index (BMI) is a measure used to determine whether a person has an appropriate weight with respect to their height. It is calculated with the formula:

$$\mathrm{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2}$$

The BMI is then rounded to one decimal place. A discretized BMI variable has been subsequently created:
• Underweight: BMI < 18.5
• Normal Weight: 18.5 ≤ BMI ≤ 24.9
• Overweight: 25 ≤ BMI ≤ 29.9
• Obesity Class 1 (Moderate): 30 ≤ BMI ≤ 34.9
• Obesity Class 2 (Severe): 35 ≤ BMI ≤ 39.9
• Obesity Class 3 (Very Severe): BMI ≥ 40

Blood Pressure Categories: A function is used to categorize patients based on their systolic (the top number) and diastolic (the bottom number) blood pressure readings into different health categories:
• Normal Blood Pressure: Systolic < 120 and Diastolic < 80
• Prehypertension: 120 ≤ Systolic < 130 and Diastolic < 80
• Hypertension Stage 1: 130 ≤ Systolic < 140 or 80 ≤ Diastolic < 90
• Hypertension Stage 2: Systolic ≥ 140 or Diastolic ≥ 90
• Hypertensive Crisis: Systolic ≥ 180 or Diastolic ≥ 120
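The derived variables above can be sketched in a few lines of Python. This is only an illustrative sketch, not the report's own code: the function names are hypothetical, and because the Stage 2 and Crisis conditions overlap as written, the sketch assumes the categories are checked from most to least severe.

```python
def age_in_years(age_days):
    """Convert age from days to years by dividing by 365."""
    return age_days / 365


def bmi(weight_kg, height_cm):
    """BMI = weight (kg) / height (m)^2, rounded to one decimal place."""
    height_m = height_cm / 100  # centimeters to meters
    return round(weight_kg / height_m ** 2, 1)


def bmi_category(bmi_value):
    """Discretize a one-decimal BMI value into the categories listed above."""
    if bmi_value < 18.5:
        return "Underweight"
    if bmi_value < 25.0:
        return "Normal Weight"
    if bmi_value < 30.0:
        return "Overweight"
    if bmi_value < 35.0:
        return "Obesity Class 1 (Moderate)"
    if bmi_value < 40.0:
        return "Obesity Class 2 (Severe)"
    return "Obesity Class 3 (Very Severe)"


def bp_category(systolic, diastolic):
    """Assign a blood pressure category, checking the most severe first (assumption)."""
    if systolic >= 180 or diastolic >= 120:
        return "Hypertensive Crisis"
    if systolic >= 140 or diastolic >= 90:
        return "Hypertension Stage 2"
    if systolic >= 130 or diastolic >= 80:
        return "Hypertension Stage 1"
    if systolic >= 120:  # diastolic is already known to be below 80 here
        return "Prehypertension"
    return "Normal Blood Pressure"


# First two rows of Table 1: (age in days, height cm, weight kg, ap_hi, ap_lo)
sample_rows = [(18393, 168, 62, 110, 80), (20228, 156, 85, 140, 90)]
for age_days, height_cm, weight_kg, ap_hi, ap_lo in sample_rows:
    value = bmi(weight_kg, height_cm)
    print(round(age_in_years(age_days), 1), value,
          bmi_category(value), bp_category(ap_hi, ap_lo))
```

Using strict upper bounds at 25, 30, 35, and 40 is equivalent to the listed ranges once BMI has been rounded to one decimal place.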
Correlations
Understanding correlations between various features can highlight the relationships and dependencies among them. For instance, a strong positive correlation between age and the onset of cardiovascular disease might indicate age as a significant risk factor.

[Figure: correlation matrix of age, height, weight, ap_hi, ap_lo, and BMI; color scale from −1 to 1]

As anticipated, height exhibits a positive correlation with weight and BMI, while the correlations among the remaining variables are relatively weak.

How does the age distribution look for individuals with and without cardiovascular diseases?
Understanding the age distribution of individuals with cardiovascular diseases (CVD) is crucial. Age is one of the most established risk factors for CVD. As age increases, the risk of damaged and narrowed arteries, as well as weakened or thickened heart muscle, increases. Here, we visualize how age varies for those with and without CVD. On average, individuals with CVD tend to be older.
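For readers who want to see how an entry of the correlation matrix in this section is obtained, the following is a minimal sketch of the Pearson correlation coefficient. It assumes Pearson correlation is the measure used (the report does not say), the function name is hypothetical, and it uses only the 12 rows of Table 1, so the result will not match the full-dataset figure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# height (cm) and weight (kg) columns from the 12 rows of Table 1
height = [168, 156, 165, 169, 156, 151, 157, 178, 158, 164, 169, 173]
weight = [62, 85, 64, 82, 56, 67, 93, 95, 71, 68, 80, 60]

r = pearson(height, weight)
print(f"height vs weight on the Table 1 subset: r = {r:.2f}")
```

Repeating the call for every pair of numeric columns fills the full matrix.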
[Figure: Age Distribution by Cardiovascular Disease Status; age (roughly 30 to 60 years) for status 0 (No Cardiovascular Disease) and status 1 (Cardiovascular Disease)]

Cholesterol Levels and Cardiovascular Disease Prevalence
Cholesterol is a fatty substance found in the blood. While it is vital for the formation of cell membranes, certain hormones, and vitamin D, too much cholesterol in the blood can increase the risk of cardiovascular diseases. Here, we explore how varying cholesterol levels correlate with the prevalence of cardiovascular diseases. The three cholesterol levels are:
• Normal: Cholesterol levels are within the recommended range.
• Above Normal: Cholesterol levels are higher than what is considered normal but not critically high.
• Well Above Normal: Critically high cholesterol levels that might need immediate medical attention.
[Figure: Cholesterol Levels and Cardiovascular Disease Prevalence; proportion (0 to 1) of No Cardiovascular Disease and Cardiovascular Disease for each cholesterol level: Normal, Above Normal, Well Above Normal]

The proportion of individuals with CVD tends to increase as we move from the normal cholesterol group to the well above normal cholesterol group.

Blood Pressure Categories and Cardiovascular Disease Prevalence
Blood pressure, a key health indicator, measures the force exerted by blood against the walls of your arteries as your heart pumps it around your body. It is a vital sign of cardiovascular health. Persistent high blood pressure, medically known as hypertension, can lead to serious health complications. It can increase the risk of heart disease, stroke, kidney disease, and other health problems. The categories of blood pressure are defined as:
• Normal Blood Pressure: Indicative of a healthy heart and no heightened risk of cardiovascular disease.
• Hypertension Stage 1: A warning sign that one might be at risk of heart-related complications in the future.
• Hypertension Stage 2: Indicates an advanced level of hypertension, which might result in damage to critical organs if left unaddressed.
• Hypertensive Crisis: A severe condition that demands immediate medical attention.

In this visualization, we aim to explore the distribution of these blood pressure categories in relation to the presence or absence of cardiovascular diseases.
[Figure: Blood Pressure Categories and Cardiovascular Disease Prevalence; proportion (0 to 1) of No Cardiovascular Disease and Cardiovascular Disease for each blood pressure category: Normal Blood Pressure, Hypertension Stage 1, Hypertension Stage 2, Hypertensive Crisis]

The prevalence of CVD triples as we progress from the normal blood pressure group to the Hypertensive Crisis group.

BMI and Cardiovascular Diseases
Body Mass Index (BMI) is a widely used measure to categorize individuals based on their weight relative to their height. It is a useful metric to gauge whether a person has an appropriate weight for their height. Here are the typical BMI categories:
• Underweight: BMI is less than 18.5.
• Normal Weight: BMI is 18.5 to 24.9.
• Overweight: BMI is 25 to 29.9.
• Obesity Class 1 (Moderate): BMI is 30 to 34.9.
• Obesity Class 2 (Severe): BMI is 35 to 39.9.
• Obesity Class 3 (Very Severe or Morbidly Obese): BMI is 40 or higher.

Higher BMI values are associated with increased risks of CVD.
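The proportion plots in this report (by cholesterol level, blood pressure category, and BMI category) all rest on the same computation: the share of patients with CVD within each category. A minimal sketch follows; the function name and the eight records are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def cvd_proportion_by_category(categories, cardio):
    """Return, for each category, the fraction of records with cardio == 1."""
    totals = Counter(categories)
    with_cvd = Counter(c for c, y in zip(categories, cardio) if y == 1)
    return {c: with_cvd[c] / totals[c] for c in totals}


# Hypothetical records: cholesterol level (1, 2, 3) and cardio flag (0 or 1)
cholesterol = [1, 1, 1, 1, 2, 2, 3, 3]
cardio = [0, 0, 1, 0, 1, 0, 1, 1]

proportions = cvd_proportion_by_category(cholesterol, cardio)
for level in sorted(proportions):
    print(level, proportions[level])
```

The same function applies unchanged to the blood pressure or BMI category column in place of cholesterol.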
[Figure: BMI Categories and Cardiovascular Disease Prevalence; proportion (0 to 1) of No Cardiovascular Disease and Cardiovascular Disease for each BMI category: Underweight, Normal Weight, Overweight, Obesity Class 1 (Moderate), Obesity Class 2 (Severe), Obesity Class 3 (Very Severe)]

Logistic Regression Model for Predicting Cardiovascular Diseases
Logistic Regression is a statistical method used for analyzing datasets in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (in which there are only two possible outcomes). In this case, we use logistic regression with 5-fold cross-validation to predict the likelihood of a patient having cardiovascular disease based on several features.

Confusion Matrix and Statistics

              Reference
    Prediction    0    1
             0 8313 3573
             1 2193 6920

                   Accuracy : 0.7254
                     95% CI : (0.7193, 0.7314)
        No Information Rate : 0.5003
        P-Value [Acc > NIR] : < 2.2e-16
                      Kappa : 0.4508
     Mcnemar's Test P-Value : < 2.2e-16
                Sensitivity : 0.7913
                Specificity : 0.6595
             Pos Pred Value : 0.6994
             Neg Pred Value : 0.7594
                 Prevalence : 0.5003
             Detection Rate : 0.3959
       Detection Prevalence : 0.5660
          Balanced Accuracy : 0.7254
           'Positive' Class : 0

Model Performance Insights
The model achieved an accuracy of 72.54%, meaning it correctly predicted the cardiovascular disease status for about 72.54% of the patients in the test data. While this is a decent performance, there might be room for improvement using either more advanced algorithms or feature engineering.

Because the reported positive class is 0 (no cardiovascular disease), sensitivity and specificity are defined relative to that class. The sensitivity of 79.13% is the share of patients without cardiovascular disease that the model correctly identifies (8313 of 10506), and the specificity of 65.95% is the share of patients with cardiovascular disease that it correctly identifies (6920 of 10493). The model therefore misses about 34% of the patients who have the disease (3573 of 10493).

The positive predictive value of 69.94% and the negative predictive value of 75.94% are likewise defined relative to class 0. When the model predicts that a patient does not have cardiovascular disease, it is correct about 69.94% of the time (8313 of 11886). When it predicts that a patient has the disease, it is correct about 75.94% of the time (6920 of 9113).

Model Interpretation
Age is a significant predictor of cardiovascular disease. As age increases, the likelihood of having cardiovascular disease also increases.

Having cholesterol level 2 or 3 (above normal or well above normal) significantly increases the odds of cardiovascular disease.

Glucose level 3 (well above normal) is associated with a decreased likelihood, though this is counterintuitive and might require further investigation.

Smoking, alcohol consumption, and being inactive are associated with increased odds of cardiovascular disease.

As BMI category increases, the odds of having cardiovascular disease also tend to increase.
Blood pressure categories (Hypertension Stage 1, Hypertension Stage 2, Hypertensive Crisis) are very significant predictors. The higher the blood pressure category, the higher the odds of having cardiovascular disease.

A Decision Tree for Predicting Cardiovascular Diseases
We build and evaluate a decision tree model for predicting cardiovascular diseases.

The model achieved an accuracy of 72.76%, indicating that it correctly predicted the cardiovascular disease status for approximately 72.76% of the patients in the test data. While this accuracy is decent, there is still potential for improvement, either through more advanced algorithms or feature engineering.

As with the logistic regression, the statistics are reported with class 0 (no cardiovascular disease) as the positive class. The sensitivity of 75.77% therefore means that the model correctly identifies 75.77% of the patients without cardiovascular disease (7960 of 10506).

The specificity of 69.75% means that it correctly identifies 69.75% of the patients with cardiovascular disease (7319 of 10493). The remaining 30.25% of patients with the disease (3174 of 10493) are not detected by the model.

The positive predictive value of the model is 71.49%: when the model predicts that a patient does not have cardiovascular disease, it is correct approximately 71.49% of the time (7960 of 11134). The negative predictive value is 74.19%: when the model predicts that a patient has the disease, it is correct about 74.19% of the time (7319 of 9865).

Overall, the model provides a reasonable level of accuracy and balance between sensitivity and specificity. It correctly identifies a significant portion of patients with and without cardiovascular disease. However, there is still room for improvement, and further refinement of the model may enhance its predictive performance.

Confusion Matrix and Statistics

              Reference
    Prediction    0    1
             0 7960 3174
             1 2546 7319

                   Accuracy : 0.7276
                     95% CI : (0.7215, 0.7336)
        No Information Rate : 0.5003
        P-Value [Acc > NIR] : < 2.2e-16
                      Kappa : 0.4552
     Mcnemar's Test P-Value : < 2.2e-16
                Sensitivity : 0.7577
                Specificity : 0.6975
             Pos Pred Value : 0.7149
             Neg Pred Value : 0.7419
                 Prevalence : 0.5003
             Detection Rate : 0.3791
       Detection Prevalence : 0.5302
          Balanced Accuracy : 0.7276
           'Positive' Class : 0

A Decision Tree for Predicting Cardiovascular Diseases
The primary predictors in this decision tree model are newap (the blood pressure category), age, and cholesterol, and they contribute to classifying individuals into two categories, Class 0 and Class 1, indicating the absence or presence of cardiovascular disease, respectively. Specifically:
• Individuals with hypertension stage 2 or experiencing a hypertensive crisis are more likely to be classified as having cardiovascular disease (Class 1).
• Individuals younger than 54 with lower cholesterol levels (1 or 2) and either normal blood pressure (NBP) or hypertension stage 1 (HS1) are more likely to be classified as not having cardiovascular disease (Class 0).
• Individuals aged 54 or older with higher cholesterol levels are more likely to be classified as having cardiovascular disease (Class 1), even when their blood pressure is normal or at hypertension stage 1.
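The statistics reported for both confusion matrices (logistic regression and decision tree) can be re-derived from the four cell counts. The sketch below is a plain Python re-computation, not the tool that produced the report's output; the function name and the `positive` parameter are hypothetical. It makes explicit that sensitivity and specificity depend on which class is treated as positive.

```python
def confusion_stats(matrix, positive=0):
    """Compute summary statistics from a 2x2 confusion matrix.

    matrix[p][r] is the number of cases with predicted class p and
    reference (true) class r, for classes 0 and 1.
    """
    negative = 1 - positive
    tp = matrix[positive][positive]   # predicted positive, truly positive
    fp = matrix[positive][negative]   # predicted positive, truly negative
    fn = matrix[negative][positive]   # predicted negative, truly positive
    tn = matrix[negative][negative]   # predicted negative, truly negative
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Rows are predictions (0, 1); columns are reference classes (0, 1)
logistic = [[8313, 3573], [2193, 6920]]
tree = [[7960, 3174], [2546, 7319]]

for name, matrix in (("logistic regression", logistic), ("decision tree", tree)):
    stats = confusion_stats(matrix, positive=0)
    print(name, {key: round(value, 4) for key, value in stats.items()})

# Treating class 1 (disease present) as positive swaps sensitivity and specificity
disease_view = confusion_stats(logistic, positive=1)
print(round(disease_view["sensitivity"], 4), round(disease_view["specificity"], 4))
```

With `positive=1`, the sensitivity becomes the share of patients with cardiovascular disease that the model detects, which is usually the quantity of interest in a screening context.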
[Figure: decision tree plot with splits newap = NBP,HS1; age < 54; cholesterol = 1,2; age < 61]

Concluding remarks
In conclusion, this report has explored a dataset containing patient records with a focus on understanding the factors contributing to cardiovascular diseases and improving early detection and prevention. The key findings and insights from this analysis are as follows:

Age is a significant risk factor: As age increases, the likelihood of having cardiovascular disease also increases. This underscores the importance of age as a risk factor in cardiovascular health.

Elevated cholesterol levels (above normal and well above normal) are associated with an increased risk of cardiovascular disease. Monitoring and managing cholesterol levels is crucial in preventing cardiovascular diseases.

Blood pressure is a vital indicator of cardiovascular health. The higher the blood pressure category (Hypertension Stage 1, Hypertension Stage 2, or Hypertensive Crisis), the higher the odds of having cardiovascular disease. Regular blood pressure monitoring and management are essential for cardiovascular health.

Body Mass Index (BMI) is a significant predictor. Higher BMI categories (overweight and different obesity classes) are associated with an increased likelihood of cardiovascular disease. Weight management and maintaining a healthy BMI are essential for reducing the risk of cardiovascular diseases.
Smoking, alcohol consumption, and physical inactivity are associated with increased odds of cardiovascular disease. Promoting a healthy lifestyle, including smoking cessation, moderate alcohol consumption, and regular physical activity, is crucial for cardiovascular health.

Additionally, we developed two predictive models, logistic regression and a decision tree, to assess the likelihood of a patient having cardiovascular disease based on various features. These models achieved reasonable accuracy, with scope for further improvement through advanced algorithms and feature engineering.

Overall, this analysis provides valuable insights into the risk factors and predictive models for cardiovascular diseases, serving as a foundation for further research and healthcare interventions to reduce the global burden of cardiovascular diseases.