2021 International Conference on Automation, Control and Mechatronics for Industry 4.0 (ACMI), 8-9 July 2021, Rajshahi, Bangladesh
Applying Machine Learning Classifiers on ECG
Dataset for Predicting Heart Disease
Adiba Ibnat Hossain∗, Sabitri Sikder†, Annesha Das‡ and Ashim Dey§
Department of Computer Science and Engineering
Chittagong University of Engineering and Technology
Chittagong-4349, Bangladesh
∗hossainadiba123@gmail.com, †sabitri287525@gmail.com, ‡annesha@cuet.ac.bd, §ashim@cuet.ac.bd
Abstract—Sudden death from heart disease is rising at an alarming rate,
and the disease has become a common cause of death worldwide. Heart
disease is, however, largely preventable through simple lifestyle changes,
and early prognosis can greatly improve recovery. Identifying high-risk
patients is difficult because many risk factors interact, such as high
cholesterol, high blood pressure, and diabetes. In most cases, diagnosis
depends on the doctor's observation and expertise rather than on the large
amount of knowledge-rich medical data available. To change this situation,
scientists and doctors have turned to machine learning techniques that
evaluate screening results along with other medical parameters to predict
heart disease. This study implements five machine learning algorithms for
heart disease prediction, namely Support Vector Machine, Logistic
Regression, K-nearest Neighbor, Naive Bayes, and an Ensemble Voting
Classifier, on a dataset of 1190 records collected from the UCI repository.
The dataset combines five independent ECG datasets, which gives us an
advantage in achieving our objectives. The relations among the attributes
in the dataset are analyzed before the accuracy is calculated. Among the
five classification algorithms, Support Vector Machine outperforms the
other classifiers with an accuracy of 85.49%. We hope this study will
support early diagnosis of heart disease and increase the chance of
survival.
Keywords—Cardiovascular disease, ECG dataset, Heart disease prediction,
Machine learning classifiers, Support vector machine
I. INTRODUCTION
The term “heart disease”, also known as “cardiovascular disease (CVD)”,
commonly refers to conditions that affect the muscles, valves, and blood
vessels of the heart and that can give rise to severe cardiovascular
problems leading to a heart attack. Angina (chest pain or discomfort) is
regarded as a form of CVD in which constricted or blocked blood vessels
can endanger the patient's life by causing heart failure. CVDs are among
the leading causes of death all over the world. According to the World
Health Survey carried out by the World Health Organization (WHO), CVD is
estimated to account for 17.9 million deaths every year, which is about
31% of all deaths globally [1]. Death rates from heart disease are
reported to be highest in developed regions such as the USA, Scotland,
and Northern England. According to 2018 statistics from the American
Heart Association, one in three deaths in the USA is caused by heart
disease [2]. Heart disease can often be avoided by changing daily habits,
such as maintaining a healthy diet, quitting alcohol and tobacco, and
exercising regularly. Early diagnosis of heart disease can make the
difference between life and death because patients can be treated before
they become seriously ill. Therefore, prediction of heart disease is
regarded as one of the prime focuses of medical data research. A huge
amount of raw medical data remains to be processed into practicable
knowledge for cardiovascular data analysis, which can help us make prompt
predictions based on credible facts.
Usually, the tests a patient needs for the diagnosis of heart disease
depend on what condition the physician thinks the patient may have.
Besides blood tests, chest X-ray, and electrocardiogram (ECG),
conventional tests for diagnosing heart disease include cardiac magnetic
resonance imaging (MRI), cardiac computerized tomography (CT) scan,
echocardiogram, Holter monitoring, heart catheterization, and stress
test. Moreover, several new techniques and models based on machine
learning and image processing have been introduced, such as medical
image fusion [3], a feature fusion approach [4], and a prediction model
based on the Weighted Associative Classifier (WAC) [5]. In many
developing countries, the scarcity of medical professionals and the lack
of efficient diagnostic tools make it very difficult to diagnose heart
disease and provide proper treatment.
This study aims to address these difficulties by developing a prediction
model, based on machine learning algorithms, that takes medical
parameters of a patient and analyzes them to forecast whether the patient
may have heart disease. For this purpose, we have used Support Vector
Machine (SVM), Logistic Regression, K-nearest Neighbors (KNN), Naïve
Bayes, and an Ensemble Voting Classifier to implement a heart disease
prediction model on a publicly available ECG dataset. The main goals of
our work are:
• To analyze the comprehensive dataset consisting of the Statlog-Heart,
Long Beach VA, Switzerland, Hungarian, and Cleveland datasets by
depicting the relations and implications between the features.
• To develop five classification models on the chosen dataset based on
the aforesaid algorithms.
• To investigate the performance of the applied models, considering
their accuracy, in order to select the best one.
978-1-6654-3843-8/21/$31.00 ©2021 IEEE
The rest of the paper is organized as follows. Section II reviews the
related works we studied to develop our idea. Section III details our
methodology. Section IV analyzes the performance of the applied
algorithms on the chosen dataset. Finally, Section V concludes the paper
with a summary.
II. LITERATURE REVIEW
In recent times, researchers have proposed different machine
learning based techniques to detect the existence of heart
disease among patients.
In [6], classification algorithms such as Naive Bayes, SVM, and KNN were
applied to clinical evidence to predict whether or not a patient has
cardiopathy. With an accuracy of 86.6%, Naive Bayes predicted heart
disease better than the other algorithms.
In [7], dimensionality reduction was performed using two
methods including feature extraction and feature selection.
Among several supervised machine learning algorithms, SVM
performed very well in this study.
In [8], the primary goal was to make heart disease prediction more
dynamic by using various sensors, such as AliveKor, HealthGear, MyHeart,
and Fitbit, to gather data on heart disease and avoid costly medical
examinations. For training and testing, a neural network algorithm and
multi-layer perceptron techniques were implemented.
In [9], heart disease prediction models were generated using seven
classification techniques. RapidMiner Studio, a data science software
platform, was used to perform the experiments. The prediction model
developed using a voting classifier with nine selected features obtained
the highest accuracy of 87.41%. A benchmarking tool was used to assess
the performance of the applied model relative to other works.
In [10], the authors used a multilayer perceptron neural network with
backpropagation as the training algorithm. The experimental findings
demonstrate that the proposed system based on neural networks can
accurately identify heart disease.
In [11], several classification algorithms including SVM, Logistic
Regression, Decision Tree, KNN, Naive Bayes, and ANN were implemented.
For the selection of appropriate features, algorithms such as Least
Absolute Shrinkage and Selection Operator (LASSO), Minimum Redundancy
Maximum Relevance, and Relief were applied. The dataset went through
various statistical operations before the models were trained.
In [12], authors have used several machine learning meth-
ods to compare the accuracy of the heart disease diagnosis.
Without any feature selection constraints, the Hybrid Random
Forest with Linear Model (HRFLM) technique predicts CVDs
with lower classification error and higher accuracy.
In [13], the authors analyzed the accuracy of each algorithm with the
support of a confusion matrix while developing a model for heart disease
prediction. In this work, KNN performed most efficiently, with 87%
accuracy, relative to the other classifiers.
In [14], Relief was identified as the best feature selection algorithm.
Chest pain, exercise-induced angina, and thallium scan were identified
as the most important features. Here, Logistic Regression outperformed
the other classifiers in terms of accuracy, whereas SVM was the best in
terms of specificity. The study also focused on reducing execution time.
In [15], between Naive Bayes and Decision Tree, Decision
Tree has done significantly well with 19 attributes. Each
attribute’s information gain has been calculated and the highest
value of information gain is taken to construct a shorter tree.
In [16], the research demonstrates that it is critical to select
the most appropriate and influential features to maximize
the heart disease prediction result. In spite of opting for six
features among the eight features, the accuracy varied a little.
In this work, Random Forest yields the highest accuracy of
95%.
To the best of the authors' knowledge, only a few works exist on the
combined dataset used in this work. In addition to traditional machine
learning algorithms (SVM, Naïve Bayes, Logistic Regression, KNN), this
work includes an ensemble voting classifier, a more recent technique
that combines multiple diverse models.
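As a minimal sketch of how a voting ensemble combines diverse models, the snippet below applies hard (majority) voting to the labels predicted by three base models. The base models are named after the Logistic Regression, Random Forest, and Naive Bayes components described in Section III-E, but the prediction lists are hypothetical, and the choice of hard rather than soft voting is an assumption, since the paper does not state which scheme was used.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by hard (majority) voting."""
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns [(label, count)] for the most frequent label
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions for five patients (1 = heart disease, 0 = none)
logistic_regression = [1, 0, 1, 1, 0]
random_forest = [1, 1, 0, 1, 0]
naive_bayes = [0, 0, 1, 1, 1]

ensemble = majority_vote([logistic_regression, random_forest, naive_bayes])
print(ensemble)  # [1, 0, 1, 1, 0]
```

With three binary voters a tie cannot occur, which is one practical reason to use an odd number of base models.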
III. METHODOLOGY
Before implementing the machine learning algorithms and analyzing their
results, we established a methodology of procedural steps to achieve our
objectives. Fig. 1 represents the overall workflow of our study, which
sums up every step required to reach the goal. Initially, a dataset based
on ECG reports is read from a CSV file. The parameters of the dataset are
studied and preprocessed before the algorithms are applied to predict a
result.
The following subsections describe our workflow in greater detail.
A. Data Collection
To accomplish our goal, we started by collecting data from the UCI
repository, whose datasets are well verified by the research community.
The dataset we collected is a combination of five popular independent
datasets available in the UCI machine learning repository. It is
essentially an ECG dataset, combined over 12 common attributes from the
five constituent datasets, resulting in 1190 records in total; it is
described as the largest CVD dataset [17] available to research
practitioners. The dataset contains common medical parameters related to
heart condition, along with information on comorbidities. The details of
the five constituent datasets are given in Table I.
Fig. 1. Overall workflow.
TABLE I. Dataset Overview
Name of Dataset | Number of Data | Source
Cleveland Dataset | 303 | Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
Hungarian Dataset | 294 | Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D.
Switzerland Dataset | 123 | University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
Long Beach VA Dataset | 200 | V.A. Medical Center, Long Beach
Statlog (Heart) Dataset | 270 | University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
Total Data | 1190 |
B. Feature Description
The dataset holds 1190 records of patients from four differ-
ent countries (UK, US, Hungary and Switzerland). It consists
of 11 features and 1 target variable as exhibited in Table II.
C. Preprocessing of Data
Working with a large amount of diverse data required us to preprocess
the dataset. Preprocessing improves data quality, which facilitates
practical insights and yields better results from the system. It converts
unprocessed data into a comprehensible and readable format. For
preprocessing, we conducted the following steps:
1) Identifying and handling missing values: If missing values are not
found and resolved appropriately, the conclusions drawn from the data may
be inaccurate. When the dataset has enough samples, a row that holds null
values can be removed to avoid introducing bias. Alternatively, a missing
value can be replaced with the mean, median, or mode of the attribute,
which is applicable to numerical data. We used the isnull() function of
the Python library pandas to detect missing values and found no null
values. This indicates the completeness of the dataset.
TABLE II. Attributes of the Dataset
SL No. | Attribute Name | Attribute Description
1 | Age | Patient's age in years
2 | Sex | 1 = male; 0 = female
3 | Chest pain | Chest pain type; ranges from 0 to 3 depending on the symptoms experienced by the patient
4 | Resting BPS | Resting blood pressure (in mm Hg, after being admitted to the hospital)
5 | Cholesterol | Serum cholesterol in mg/dl
6 | Fasting blood sugar | Fasting blood sugar > 120 mg/dl (1 = blood sugar level in excess of 120 mg/dl; 0 = blood sugar level lower than 120 mg/dl)
7 | Resting ECG | Resting electrocardiographic results
8 | Max heart rate | The maximum heart rate of an individual using a Thallium Test, measured in beats per minute
9 | Exercise angina | Exercise-triggered angina (1 = yes; 0 = no)
10 | Oldpeak | ST depression induced by exercise relative to rest
11 | ST slope | The slope of the peak exercise ST segment
12 | Target | 1 or 0 (1 = heart attack may happen; 0 = heart attack may not happen)
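The missing-value check and the median replacement described in step 1 can be sketched in plain Python as follows. The records and column names are hypothetical and are not taken from the dataset; with pandas, `df.isnull().sum()` reports the same per-column counts in a single call.

```python
from statistics import median

# Hypothetical records; None marks a missing measurement
records = [
    {"age": 54, "cholesterol": 239, "max_heart_rate": 160},
    {"age": 61, "cholesterol": None, "max_heart_rate": 145},
    {"age": 47, "cholesterol": 204, "max_heart_rate": None},
    {"age": 58, "cholesterol": 263, "max_heart_rate": 173},
]


def count_missing(rows):
    """Return {column: number of None entries}."""
    counts = {column: 0 for column in rows[0]}
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return counts


def impute_median(rows):
    """Return a copy of rows with each None replaced by its column median."""
    medians = {
        column: median(row[column] for row in rows if row[column] is not None)
        for column in rows[0]
    }
    return [
        {column: medians[column] if value is None else value
         for column, value in row.items()}
        for row in rows
    ]


missing_before = count_missing(records)
cleaned = impute_median(records)
missing_after = count_missing(cleaned)
print(missing_before)  # {'age': 0, 'cholesterol': 1, 'max_heart_rate': 1}
print(missing_after)   # {'age': 0, 'cholesterol': 0, 'max_heart_rate': 0}
```

For the dataset used in this study, the check reported no missing values, so no rows had to be dropped or imputed.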
2) Data balancing: A dataset is said to be balanced if it contains
approximately as many positive samples as negative samples. Some machine
learning classifiers struggle with imbalanced training datasets because
they are sensitive to the proportions of the different classes [18].
Fig. 2 shows the target classes, where “1” represents patients with
heart disease and “0” represents patients without heart disease. There
are 629 patients with heart disease and 561 patients without. Thus, the
target classes contain nearly equal numbers of entries, which indicates
a balanced dataset.
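The balance claim can be checked with a short calculation on the two class counts reported above. The variable names are illustrative only.

```python
# Class counts reported in the text (1 = heart disease, 0 = no heart disease)
class_counts = {1: 629, 0: 561}

total = sum(class_counts.values())
shares = {label: count / total for label, count in class_counts.items()}
# Ratio of the larger class to the smaller one; 1.0 means perfectly balanced
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

print(total)                                                       # 1190
print({label: round(share, 4) for label, share in shares.items()}) # {1: 0.5286, 0: 0.4714}
print(round(imbalance_ratio, 3))                                   # 1.121
```

Roughly 53% of the records are positive and 47% negative, which supports treating the dataset as balanced without resampling.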
Fig. 2. Distribution of target class.
Fig. 3. Age variation for every target class.
D. Data Analysis
Fig. 3 shows the variation of age for each target class. People aged
around 50-65 are more prone to heart disease than people in other age
groups. This graph depicts the tendency toward heart disease in a
certain age group.
Fig. 4. Correlation between different features.
Fig. 4 illustrates a correlation heatmap. Correlation describes how
strongly pairs of input features, and each feature and the target
variable, are linearly related. The correlation coefficient ranges from
-1 to +1. Values closer to zero mean a weaker linear relationship
between the two variables. Values close to +1 indicate that the
variables are strongly positively correlated, whereas values close to -1
indicate strong negative correlation. In the heatmap, the most positive
correlations take the darkest shade of green, while the most negative
correlations take the darkest tone of red. From this correlation
heatmap, we can see that ST slope is the feature most positively
correlated with the target (0.51), whereas max heart rate has the most
negative correlation with the target (-0.41).
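To make the meaning of the coefficient concrete, the sketch below computes the Pearson correlation between a feature and the target. It assumes the heatmap shows Pearson coefficients, which the paper does not state explicitly, and the toy values are hypothetical, chosen only so that one feature rises with the target and the other falls.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values, not taken from the dataset
target = [0, 0, 0, 1, 1, 1]
st_slope = [1, 1, 2, 2, 2, 3]
max_heart_rate = [172, 165, 158, 150, 141, 120]

r_slope = pearson(st_slope, target)       # positive: rises with the target
r_rate = pearson(max_heart_rate, target)  # negative: falls as the target rises
print(r_slope > 0, r_rate < 0)
```

Applying the same function to every pair of columns yields the full matrix that a heatmap such as Fig. 4 visualizes.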
E. Training and Testing
Five machine learning classification methods, namely KNN, Naive Bayes,
SVM, Logistic Regression, and an Ensemble Voting Classifier, are used to
develop prediction models on the dataset. The Ensemble Voting Classifier
is a hybrid classifier built from Logistic Regression, Random Forest,
and Naive Bayes. Before the algorithms are applied, the dataset is split
into a training set and a test set. The training set, for which the
corresponding outputs are known, is used to fit the machine learning
model, whereas the test set is used to evaluate the model's predictions.
We split our dataset in a train-test ratio of 67:33, so that the
training set takes 797 of the records, leaving the remaining 393 records
for testing. The accuracy is calculated by comparing the actual response
values with the predicted response values. After building the prediction
model, we can make predictions on out-of-sample data to check that the
model is ready for heart disease prediction.
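The 67:33 split can be sketched with a shuffled index split. Rounding the test share up is an assumption made here because it reproduces the reported 797 and 393 records; the seed value is hypothetical. Library helpers such as scikit-learn's `train_test_split` provide the same kind of shuffled split.

```python
import math
import random


def train_test_indices(n_samples, test_fraction, seed):
    """Shuffle indices 0..n_samples-1 and split them into train and test lists."""
    n_test = math.ceil(n_samples * test_fraction)  # round the test share up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_indices(n_samples=1190, test_fraction=0.33, seed=42)
print(len(train_idx), len(test_idx))  # 797 393
```

Shuffling before the split matters for a dataset assembled from five sources, since the records may otherwise be grouped by source.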
IV. RESULT ANALYSIS
Python is used to implement the classification algorithms, as it offers
versatile and rich libraries. After the aforementioned steps, the
machine learning classifiers are trained on the chosen dataset. The
NumPy Python library is used to perform mathematical operations on the
confusion matrix to yield the accuracy.
TABLE III. Comparison Among Algorithms
Algorithms | Training Accuracy | Testing Accuracy | Precision | Recall | F-measure
SVM | 82.56% | 85.49% | 87.05% | 87.44% | 87.24%
Naive Bayes | 83.18% | 84.98% | 87.27% | 86.09% | 86.67%
Logistic Regression | 81.05% | 84.47% | 86.49% | 86.09% | 86.29%
Ensemble Vote Classifier | 84.06% | 84.11% | 89.14% | 88.34% | 88.64%
KNN | 75.28% | 75.31% | 80.58% | 74.44% | 77.39%
The confusion matrix facilitates the evaluation of the model for
performance analysis. It is a table layout of the different outcomes of
the prediction that helps to visualize the results. For binary
classification, it contains the following four outcomes:
• True Positive (TP): The number of accurately identified
actual positive values
• True Negative (TN): The number of accurately identified
actual negative values
• False Positive (FP): The number of times the negative
values are predicted as positive
• False Negative (FN) : The number of times the positive
values are predicted as negative
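The four outcomes listed above can be counted directly from paired actual and predicted labels. The label lists in this sketch are hypothetical and serve only to exercise each branch.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:  # actual positive predicted as negative
            counts["FN"] += 1
    return counts


# Hypothetical labels for eight test patients
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```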
The confusion matrix of SVM algorithm is given in Table IV.
TABLE IV. Confusion Matrix of SVM Classifier
Total Tests = 393 | Predicted No | Predicted Yes
Actual No | 141 | 29
Actual Yes | 28 | 195
The performance of the model cannot be judged clearly just by observing
the confusion matrix. To determine how accurate the model is, accuracy,
precision, recall, and F-measure are calculated using (1), (2), (3), and
(4), respectively. Accuracy is the proportion of correctly classified
values; it tells how often the classifier predicts correctly. It is the
number of true predictions (TP + TN) divided by the total number of
predictions.
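$$\text{Accuracy} = \frac{\text{True Predictions}}{\text{Total Predictions}} \tag{1}$$

$$\text{Precision} = \frac{\text{True Positives}}{\text{False Positives} + \text{True Positives}} \tag{2}$$

$$\text{Recall} = \frac{\text{True Positives}}{\text{False Negatives} + \text{True Positives}} \tag{3}$$

$$\text{F-Measure} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}} \tag{4}$$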
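As a worked example, equations (1) to (4) can be applied to the SVM confusion matrix in Table IV. The variable names are illustrative; the four counts are the ones reported in that table.

```python
# Counts from Table IV (SVM classifier, 393 test records)
tn, fp, fn, tp = 141, 29, 28, 195

total = tn + fp + fn + tp
accuracy = (tp + tn) / total                                # equation (1)
precision = tp / (fp + tp)                                  # equation (2)
recall = tp / (fn + tp)                                     # equation (3)
f_measure = 2 * recall * precision / (recall + precision)   # equation (4)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("f_measure", f_measure)]:
    print(f"{name}: {value:.4f}")
# accuracy: 0.8550
# precision: 0.8705
# recall: 0.8744
# f_measure: 0.8725
```

These values agree with the SVM row of Table III (85.49%, 87.05%, 87.44%, 87.24%) up to a difference in the last digit, which is consistent with the table truncating rather than rounding.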
Table III compares the implemented algorithms on the aforementioned
performance metrics. The table shows that SVM outperformed the other
classification algorithms with a testing accuracy of 85.49%. SVM aims to
find the best hyperplane (also called the decision boundary) in a binary
classification model, and this hyperplane best splits our dataset into
two classes, yielding the prediction of whether a patient would have CVD
or not. On the other hand, KNN showed the lowest accuracy (75.31%) among
these algorithms; KNN works well with a small number of input variables
but struggles when the number of inputs is very large.
Fig. 5 gives a more in-depth analysis of the models by plotting the
total number of wrongly predicted values for each classification
algorithm. The number of wrong predictions on the test set is calculated
from the confusion matrix as (FP + FN). SVM produces the fewest wrong
predictions. In contrast, KNN produces the most wrong predictions, as it
has the lowest accuracy. From this investigation, it can be inferred
that SVM performs more competently than the other algorithms.
[Fig. 5 bar chart. x-axis: Prediction Models (SVM, Naive Bayes, Logistic Regression, Ensemble Vote Classifier, KNN); y-axis: Total Wrong Predicted Values (0 to 100).]
Fig. 5. Wrong predicted values for the applied algorithms.
In Table V, the performance of our work and other existing works is
compared based on the accuracy obtained by the best performing algorithm
in our work, i.e., SVM. We attained better accuracy in this case by
using the combined dataset (1190 records), whereas the other works
applied the algorithm to the Cleveland dataset (303 records).
TABLE V. Comparison with Other Existing Work
Work Reference | Accuracy Using SVM
Ours | 85.49%
[6] | 77.7%
[11] | 83%
[13] | 85%
V. CONCLUSION
The number of deaths due to heart disease is increasing day by day for
lack of early prognosis and timely treatment. For heart disease, early
diagnosis can improve the chance of survival and reduce the associated
health complications. In this work, we have developed a prediction model
to detect heart disease based on parameters derived from ECG reports. To
achieve this goal, the five aforementioned classification algorithms
were applied to a publicly available dataset of 1190 records, which is a
combination of five popular datasets in this field. Among the applied
classifiers, the Ensemble Voting Classifier is a hybrid technique
combining Logistic Regression, Random Forest, and Naive Bayes. Our
analysis shows that SVM performs reasonably well, with 85.49% accuracy.
As a further enhancement of this study, a large-scale dataset can be
collected from local hospitals, since individual health also depends on
region and socio-economic factors. In the future, more analysis can be
performed with different combinations of algorithms in hybrid techniques
to obtain a better performing heart disease prediction model.
REFERENCES
[1] T. WHO. (2016) Cardiovascular diseases. [Online]. Available: https://www.who.int/health-topics/cardiovascular-diseases
[2] R. Cementa. (2018) Heart attacks: Men vs. women. [Online]. Available: https://www.caringseniorservice.com/blog/heart-attacks-men-vs.-women?
[3] M. Diwakar, A. Tripathi, K. Joshi, M. Memoria, P. Singh et al., “Latest trends on heart disease prediction using machine learning and image fusion,” Materials Today: Proceedings, vol. 37, pp. 3213–3218, 2021.
[4] F. Ali, S. El-Sappagh, S. R. Islam, D. Kwak, A. Ali, M. Imran, and K.-S. Kwak, “A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion,” Information Fusion, vol. 63, pp. 208–222, 2020.
[5] J. Soni, U. Ansari, D. Sharma, and S. Soni, “Intelligent and effective heart disease prediction system using weighted associative classifiers,” International Journal on Computer Science and Engineering, vol. 3, no. 6, pp. 2385–2392, 2011.
[6] S. Anitha and N. Sridevi, “Heart disease prediction using data mining techniques,” Journal of Analysis and Computation, 2019.
[7] V. Ramalingam, A. Dandapath, and M. K. Raja, “Heart disease prediction using machine learning techniques: a survey,” International Journal of Engineering & Technology, vol. 7, no. 2.8, pp. 684–687, 2018.
[8] A. Gavhane, G. Kokkula, I. Pandya, and K. Devadkar, “Prediction of heart disease using machine learning,” in 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE, 2018, pp. 1275–1278.
[9] M. S. Amin, Y. K. Chiam, and K. D. Varathan, “Identification of significant features and data mining techniques in predicting heart disease,” Telematics and Informatics, vol. 36, pp. 82–93, 2019.
[10] P. Singh, S. Singh, and G. S. Pandi-Jain, “Effective heart disease prediction system using data mining techniques,” International Journal of Nanomedicine, vol. 13, no. T-NANO 2014 Abstracts, p. 121, 2018.
[11] J. P. Li, A. U. Haq, S. U. Din, J. Khan, A. Khan, and A. Saboor, “Heart disease identification method using machine learning classification in e-healthcare,” IEEE Access, vol. 8, pp. 107 562–107 582, 2020.
[12] S. Mohan, C. Thirumalai, and G. Srivastava, “Effective heart disease prediction using hybrid machine learning techniques,” IEEE Access, vol. 7, pp. 81 542–81 554, 2019.
[13] A. Singh and R. Kumar, “Heart disease prediction using machine learning algorithms,” in 2020 International Conference on Electrical and Electronics Engineering (ICE3). IEEE, 2020, pp. 452–457.
[14] A. U. Haq, J. P. Li, M. H. Memon, S. Nazir, and R. Sun, “A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms,” Mobile Information Systems, vol. 2018, 2018.
[15] S. Nikhar and A. Karandikar, “Prediction of heart disease using machine learning algorithms,” International Journal of Advanced Engineering, Management and Science, vol. 2, no. 6, p. 239484, 2016.
[16] N. S. C. Reddy, S. S. Nee, L. Z. Min, and C. X. Ying, “Classification and feature selection approaches by machine learning techniques: Heart disease prediction,” International Journal of Innovative Computing, vol. 9, no. 1, 2019.
[17] M. Siddharta. (2019) Heart disease dataset (most comprehensive). [Online]. Available: https://www.kaggle.com/sid321axn/heart-statlog-cleveland-hungary-final
[18] D. J. Dittman, T. M. Khoshgoftaar, and A. Napolitano, “The effect of data sampling when using random forest on imbalanced bioinformatics data,” in 2015 IEEE International Conference on Information Reuse and Integration. IEEE, 2015, pp. 457–463.

More Related Content

Similar to 238_heartdisease (1).pdf

Supervised Feature Selection for Diagnosis of Coronary Artery Disease Based o...
Supervised Feature Selection for Diagnosis of Coronary Artery Disease Based o...Supervised Feature Selection for Diagnosis of Coronary Artery Disease Based o...
Supervised Feature Selection for Diagnosis of Coronary Artery Disease Based o...cscpconf
 
SUPERVISED FEATURE SELECTION FOR DIAGNOSIS OF CORONARY ARTERY DISEASE BASED O...
SUPERVISED FEATURE SELECTION FOR DIAGNOSIS OF CORONARY ARTERY DISEASE BASED O...SUPERVISED FEATURE SELECTION FOR DIAGNOSIS OF CORONARY ARTERY DISEASE BASED O...
SUPERVISED FEATURE SELECTION FOR DIAGNOSIS OF CORONARY ARTERY DISEASE BASED O...csitconf
 
A Survey on Heart Disease Prediction Techniques
A Survey on Heart Disease Prediction TechniquesA Survey on Heart Disease Prediction Techniques
A Survey on Heart Disease Prediction Techniquesijtsrd
 
Comparing Data Mining Techniques used for Heart Disease Prediction
Comparing Data Mining Techniques used for Heart Disease PredictionComparing Data Mining Techniques used for Heart Disease Prediction
Comparing Data Mining Techniques used for Heart Disease PredictionIRJET Journal
 
IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...
IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...
IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...IRJET Journal
 
A comprehensive study of machine learning for predicting cardiovascular disea...
A comprehensive study of machine learning for predicting cardiovascular disea...A comprehensive study of machine learning for predicting cardiovascular disea...
A comprehensive study of machine learning for predicting cardiovascular disea...IJECEIAES
 
Machine learning approach for predicting heart and diabetes diseases using da...
Machine learning approach for predicting heart and diabetes diseases using da...Machine learning approach for predicting heart and diabetes diseases using da...
Machine learning approach for predicting heart and diabetes diseases using da...IAESIJAI
 
javed_prethesis2608 on predcition of heart disease
javed_prethesis2608 on predcition of heart diseasejaved_prethesis2608 on predcition of heart disease
javed_prethesis2608 on predcition of heart diseasejaved75
 
Heart Failure Prediction using Different Machine Learning Techniques
Heart Failure Prediction using Different Machine Learning TechniquesHeart Failure Prediction using Different Machine Learning Techniques
Heart Failure Prediction using Different Machine Learning TechniquesIRJET Journal
 
Hybrid CNN and LSTM Network For Heart Disease Prediction
Hybrid CNN and LSTM Network For Heart Disease PredictionHybrid CNN and LSTM Network For Heart Disease Prediction
Hybrid CNN and LSTM Network For Heart Disease PredictionBASMAJUMAASALEHALMOH
 
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...IRJET Journal
 
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNINGHEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNINGIJDKP
 
Prediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A ReviewPrediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A ReviewIRJET Journal
 
An optimal heart disease prediction using chaos game optimization‑based recur...
An optimal heart disease prediction using chaos game optimization‑based recur...An optimal heart disease prediction using chaos game optimization‑based recur...
An optimal heart disease prediction using chaos game optimization‑based recur...BASMAJUMAASALEHALMOH
 
A STUDY OF THE LITERATURE ON CARDIOVASCULAR DISEASE PREDICTION METHODS
A STUDY OF THE LITERATURE ON CARDIOVASCULAR DISEASE PREDICTION METHODSA STUDY OF THE LITERATURE ON CARDIOVASCULAR DISEASE PREDICTION METHODS
A STUDY OF THE LITERATURE ON CARDIOVASCULAR DISEASE PREDICTION METHODSIRJET Journal
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksIAEME Publication
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksIAEME Publication
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksIAEME Publication
 
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...BASMAJUMAASALEHALMOH
 

238_heartdisease (1).pdf

2021 International Conference on Automation, Control and Mechatronics for Industry 4.0 (ACMI), 8-9 July 2021, Rajshahi, Bangladesh

Applying Machine Learning Classifiers on ECG Dataset for Predicting Heart Disease

Adiba Ibnat Hossain∗, Sabitri Sikder†, Annesha Das‡ and Ashim Dey§
Department of Computer Science and Engineering
Chittagong University of Engineering and Technology
Chittagong-4349, Bangladesh
∗hossainadiba123@gmail.com, †sabitri287525@gmail.com, ‡annesha@cuet.ac.bd, §ashim@cuet.ac.bd

Abstract: Sudden death from heart disease is rising at an alarming rate, and heart disease has become a common cause of death worldwide. Encouragingly, heart disease is often preventable through simple lifestyle changes, and early prognosis can greatly improve recovery. Identifying high-risk patients is difficult because of the multifaceted nature of risk factors such as high cholesterol, high blood pressure, and diabetes. Most of the time, diagnosis of heart disease depends on a doctor's observation and expertise rather than on the large amount of knowledge-rich medical data available. To change this situation, scientists and doctors have turned to machine learning techniques that evaluate screening results along with other medical parameters to predict heart disease. This study implements five machine learning algorithms (Support Vector Machine, Logistic Regression, K-nearest Neighbor, Naive Bayes, and Ensemble Voting Classifier) on a dataset of 1190 records collected from the UCI repository. The dataset combines five independent ECG datasets, which gives us an extra edge in achieving our objectives. The relations among the attributes in the dataset are analyzed before the accuracy is calculated. Among the five classification algorithms, Support Vector Machine outperforms the other classifiers with an accuracy of 85.49%.
We hope this study will support early diagnosis of heart disease and increase the chance of survival.

Keywords: Cardiovascular disease, ECG dataset, Heart disease prediction, Machine learning classifiers, Support vector machine

I. INTRODUCTION

The term "Heart Disease", also known as "Cardiovascular Disease (CVD)", commonly refers to conditions affecting the muscles, valves, and blood vessels of the heart that can give rise to severe cardiovascular problems leading to a heart attack. Angina (chest pain or discomfort) is regarded as a form of CVD in which constricted or blocked blood vessels can endanger the patient's life by causing heart failure. CVDs are among the leading causes of death all over the world. As per the World Health Survey carried out by the World Health Organization (WHO), CVD is estimated to account for 17.9 million deaths every year, which is about 31% of all deaths globally [1]. Death rates from heart disease are highest in developed regions such as the USA, Scotland, and Northern England. According to 2018 statistics from the American Heart Association, 1 in 3 deaths in the USA is caused by heart disease [2].

Heart disease can be avoided by changing some daily habits, such as maintaining a healthy diet, quitting alcohol and tobacco, and exercising regularly. Early diagnosis of heart disease can make the difference between life and death because patients can be treated before they become seriously ill. Therefore, prediction of heart disease is regarded as one of the prime focuses of medical data research. A huge amount of raw medical data remains to be processed into practical knowledge for cardiovascular data analysis, which can help us make decisions based on credible facts and enable prompt predictions. Usually, the tests a patient needs for diagnosing heart disease depend on what condition the physician thinks he or she may have.
Besides blood tests, chest X-ray, and electrocardiogram (ECG), conventional tests for diagnosing heart disease include cardiac magnetic resonance imaging (MRI), cardiac computerized tomography (CT) scan, echocardiogram, Holter monitoring, heart catheterization, stress test, etc. Moreover, a number of new techniques and models based on machine learning and image processing have been introduced, such as medical image fusion [3], a feature fusion approach [4], and a prediction model based on the Weighted Associative Classifier (WAC) [5]. In many developing countries, the scarcity of medical professionals and the lack of efficient diagnostic tools make diagnosing heart disease and providing proper treatment very difficult.

This study aims to address these difficulties by developing a prediction model that applies machine learning algorithms to a patient's medical parameters and forecasts whether the patient may have heart disease. For this purpose, we have used Support Vector Machine (SVM), Logistic Regression, K-nearest neighbors (KNN), Naive Bayes, and Ensemble Voting Classifier algorithms to implement a heart disease prediction model using a publicly available ECG dataset. The main goals of our work are:

• To analyze the comprehensive dataset consisting of the Statlog-Heart, Long Beach VA, Switzerland, Hungarian, and Cleveland datasets by depicting the relations and implications between the features.
• To develop five classification models using the chosen dataset based on the aforesaid algorithms.
• To investigate the performance of the applied models, considering their accuracy, in order to select the best one.

The rest of the paper is organized as follows: Section II reviews the related works we studied to develop our idea. Section III details our methodology. Section IV analyzes the performance of the applied algorithms on the chosen dataset. Finally, Section V concludes the paper with a summary.

978-1-6654-3843-8/21/$31.00 ©2021 IEEE

II. LITERATURE REVIEW

In recent times, researchers have proposed different machine learning based techniques to detect the existence of heart disease among patients.

In [6], with the aid of clinical evidence, classification algorithms such as Naive Bayes, SVM, and KNN were used to predict whether or not a patient has cardiopathy. With an accuracy of 86.6%, Naive Bayes predicted heart disease better than the other algorithms.

In [7], dimensionality reduction was performed using two methods: feature extraction and feature selection. Among several supervised machine learning algorithms, SVM performed very well in this study.

In [8], the primary goal was to make heart disease prediction more dynamic by using various sensors, such as AliveKor, HealthGear, MyHeart, and Fitbit, to gather heart-related data and avoid costly medical examinations. For training and testing, a neural network algorithm and multi-layer perceptron techniques were implemented.

In [9], heart disease prediction models were generated using seven classification techniques. RapidMiner Studio, a data science software platform, was used to perform the experiments. The prediction model built using a voting classifier with nine selected features obtained the highest accuracy of 87.41%. A benchmarking tool was used to assess the performance of the applied model relative to other works.
In [10], the authors used a multilayer perceptron neural network with backpropagation as the training algorithm. The experimental findings demonstrate that the proposed neural network based system can accurately identify heart disease.

In [11], several classification algorithms including SVM, Logistic Regression, Decision Tree, KNN, Naive Bayes, and ANN were implemented. For the selection of appropriate features, algorithms such as Least Absolute Shrinkage, Minimum Redundancy Maximum Relevance, and Relief were applied. The dataset went through various statistical operations before the models were trained.

In [12], the authors used several machine learning methods to compare the accuracy of heart disease diagnosis. Without any feature selection constraints, the Hybrid Random Forest with Linear Model (HRFLM) technique predicted CVDs with lower classification error and higher accuracy.

In [13], the authors analyzed the accuracy of each algorithm with the support of a confusion matrix while developing a model for heart disease prediction. In this work, KNN performed best, with 87% accuracy relative to the other classifiers.

In [14], Relief was identified as the best feature selection algorithm. Chest pain, exercise-induced angina, and thallium scan were identified as the most useful features. Logistic Regression performed best in terms of accuracy, whereas SVM was the best in terms of specificity. This study also focused on reducing execution time.

In [15], between Naive Bayes and Decision Tree, Decision Tree performed significantly better with 19 attributes. The information gain of each attribute was calculated, and the attribute with the highest information gain was used to construct a shorter tree.

In [16], the research demonstrates that it is critical to select the most appropriate and influential features to maximize the heart disease prediction result.
Although only six of the eight features were selected, the accuracy varied only slightly. In this work, Random Forest yielded the highest accuracy of 95%.

To the best of the authors' knowledge, only a few works exist on the combined dataset used in this work. Apart from the application of traditional machine learning algorithms (SVM, Naive Bayes, Logistic Regression, KNN), this work includes the ensemble voting classifier algorithm, a relatively recent approach that combines multiple diverse models.

III. METHODOLOGY

Before implementing the machine learning algorithms and analyzing their results, we worked out the procedural steps and established a methodology to achieve our objectives. Fig. 1 represents the overall workflow of our study, which sums up every step required to reach the goal. Initially, a dataset based on ECG reports is read from a CSV file. The parameters of the dataset are studied and preprocessed before the algorithms are applied to predict a result. The following subsections describe our workflow in greater detail.

A. Data Collection

To accomplish our goal, we started by collecting data from the UCI repository, whose datasets are well verified by the research community. The dataset we collected is a combination of five popular independent datasets available in the UCI machine learning repository. It is basically an ECG dataset, combined over 12 common attributes from the five constituent datasets, resulting in 1190 records in total, which can be claimed to be the largest CVD dataset [17] available to research practitioners. The dataset contains common medical parameters related to heart condition along with information on comorbidities. The details of the five constituent datasets are given in Table I.
Fig. 1. Overall workflow.

TABLE I. Dataset Overview

| Name of Dataset | Number of Data | Source |
| --- | --- | --- |
| Cleveland Dataset | 303 | Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D. |
| Hungarian Dataset | 294 | Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D. |
| Switzerland Dataset | 123 | University Hospital, Zurich, Switzerland: William Steinbrunn, M.D. |
| Long Beach VA Dataset | 200 | V.A. Medical Center, Long Beach |
| Statlog (Heart) Dataset | 270 | University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D. |
| Total Data | 1190 | |

B. Feature Description

The dataset holds 1190 records of patients from four different countries (UK, US, Hungary and Switzerland). It consists of 11 features and 1 target variable, as shown in Table II.

TABLE II. Attributes of the Dataset

| SL No. | Attribute Name | Attribute Description |
| --- | --- | --- |
| 1 | Age | Patient's age in years |
| 2 | Sex | 1 = male; 0 = female |
| 3 | Chest pain | Chest pain type; ranges from 0 to 3 depending on the symptoms experienced by the patient |
| 4 | Resting BPS | Resting blood pressure (in mm Hg, after being admitted to the hospital) |
| 5 | Cholesterol | Serum cholesterol in mg/dl |
| 6 | Fasting blood sugar | Fasting blood sugar > 120 mg/dl (1 = blood sugar level in excess of 120 mg/dl; 0 = blood sugar level lower than 120 mg/dl) |
| 7 | Resting ECG | Resting electrocardiographic results |
| 8 | Max heart rate | The maximum heart rate of an individual using a Thallium Test, measured in beats per minute |
| 9 | Exercise angina | Exercise-triggered angina (1 = yes; 0 = no) |
| 10 | Oldpeak | ST depression induced by exercise relative to rest |
| 11 | ST slope | The slope of the peak exercise ST segment |
| 12 | Target | 1 or 0 (1 = heart attack may happen; 0 = heart attack may not happen) |

C. Preprocessing of Data

Because we were working with a large amount of diverse data, we had to preprocess the dataset. Preprocessing improves data quality, facilitates practical insights, and helps obtain better results from the system. It converts unprocessed data into a comprehensible and readable format. For preprocessing, we conducted the following steps:

1) Identifying and handling missing values: If missing values are not found and resolved appropriately, the conclusions drawn may be inaccurate. When there are enough samples in the dataset, a row that holds null values can be removed to avoid introducing bias. Alternatively, a missing value can be replaced with the mean, median, or mode of the attribute, which is applicable to numerical data. We imported the Python library pandas and applied the isnull() function to detect missing values, and found zero null values. This indicates the completeness of the dataset.

2) Data balancing: A dataset is said to be balanced if it contains approximately as many positive samples as negative samples. Some machine learning classifiers struggle with imbalanced training datasets, as they are sensitive to the proportions of the different classes [18]. Fig. 2 shows the target classes, where "1" represents patients having heart disease and "0" represents patients not having heart disease. The number of patients with heart disease is 629, whereas 561 patients have no heart disease. Thus, the target classes contain a nearly equal number of entries, which indicates a balanced dataset.

Fig. 2. Distribution of target class.
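The two preprocessing checks described above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the frame below is a synthetic stand-in that reproduces nothing but the class counts reported in the text (629 and 561), and the column name `target` is a hypothetical choice, not necessarily the name used in the authors' CSV file.

```python
import pandas as pd

# Synthetic stand-in for the combined dataset: only the class label is
# reproduced, using the counts reported in the text (629 positive, 561 negative).
# The column name "target" is a hypothetical choice for illustration.
df = pd.DataFrame({"target": [1] * 629 + [0] * 561})

# Step 1: count missing values per column; zero means nothing to impute or drop.
missing_per_column = df.isnull().sum()

# Step 2: inspect the class balance as absolute counts and as proportions.
class_counts = df["target"].value_counts()
class_share = df["target"].value_counts(normalize=True)

print(missing_per_column)
print(class_counts)
print(class_share.round(4))
```

On the real data, the same two calls would be applied to the frame returned by `pd.read_csv` for the dataset file.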
Fig. 3. Age variation for every target class.

D. Data Analysis

Fig. 3 shows the variation of age for each target class. People aged around 50-65 are more prone to heart disease than people in other age groups. The graph depicts the likely tendency of having heart disease in a given age group.

Fig. 4. Correlation between different features.

Fig. 4 illustrates a correlation heatmap. Correlation describes how the input features are related to one another and to the target variable. The strength of the correlation ranges from -1 to +1. Values closer to zero mean a weaker linear relationship between the two variables. Values close to +1 indicate that the variables are strongly positively correlated, whereas values close to -1 indicate a strong negative correlation. In the heatmap, the most positive correlations take the darkest shade of green, while the most negative correlations take the darkest shade of red. From this heatmap, we can ascertain that ST slope is the feature most positively correlated with the target (0.51), whereas max heart rate has the most negative correlation with the target (-0.41).

E. Training and Testing

Five machine learning classification methods, namely KNN, Naive Bayes, SVM, Logistic Regression, and Ensemble Voting Classifier, are used to develop prediction models on the dataset. The Ensemble Voting Classifier is a hybrid classifier implemented with Logistic Regression, Random Forest, and Naive Bayes. Before implementing the algorithms, the dataset is split into two portions: a training set and a test set. The training set, for which the corresponding outputs are known, is used to fit the model, whereas the test set is used to evaluate the model's predictions. We have split our dataset with a train-test ratio of 67:33.
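A minimal sketch of the split and of the ensemble described above, using scikit-learn, might look as follows. It is an illustration rather than the authors' code: the data are synthetic (generated with `make_classification` to match only the number of records and features), and the hard-voting mode, the `random_state` values, and `max_iter=1000` are assumptions that the text does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in with the same shape as the dataset: 1190 records, 11 features.
X, y = make_classification(
    n_samples=1190, n_features=11, n_informative=5, random_state=0
)

# 67:33 split. scikit-learn rounds the test share up (ceil(0.33 * 1190) = 393).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Voting ensemble over the three base learners named in the text.
# Hard (majority) voting is an assumption; the text does not state the mode.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)

# Accuracy on the held-out test set: share of predictions that match the labels.
y_pred = ensemble.predict(X_test)
test_accuracy = ensemble.score(X_test, y_test)
print(X_train.shape, X_test.shape, round(test_accuracy, 4))
```

Because the inputs are synthetic, the printed accuracy says nothing about the results reported in this paper; only the split sizes carry over.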
Here, the training set takes 797 records of the total data, leaving the remaining 393 records for testing. The accuracy is calculated by comparing the actual response values with the predicted response values. After building the prediction model, we can make predictions on out-of-sample data to make sure that the model is ready to do heart disease prediction.

IV. RESULT ANALYSIS

Python is used to implement the classification algorithms, as it offers versatile and rich libraries. After going through the aforementioned steps, the machine learning classifiers are trained on the chosen dataset. The NumPy Python library is used to perform mathematical operations on the confusion matrix to yield the accuracy. The confusion matrix facilitates the evaluation of the model for performance analysis. It is a table layout of the different outcomes of the prediction that helps to visualize the results. Usually, it contains four outcomes, as follows:
• True Positive (TP): The number of actual positive values that are correctly identified
• True Negative (TN): The number of actual negative values that are correctly identified
• False Positive (FP): The number of times negative values are predicted as positive
• False Negative (FN): The number of times positive values are predicted as negative
The confusion matrix of the SVM algorithm is given in Table IV.

TABLE III. Comparison Among Algorithms
Algorithms                 Training Accuracy   Testing Accuracy   Precision   Recall    F-measure
SVM                        82.56%              85.49%             87.05%      87.44%    87.24%
Naive Bayes                83.18%              84.98%             87.27%      86.09%    86.67%
Logistic Regression        81.05%              84.47%             86.49%      86.09%    86.29%
Ensemble Vote Classifier   84.06%              84.11%             89.14%      88.34%    88.64%
KNN                        75.28%              75.31%             80.58%      74.44%    77.39%

TABLE IV. Confusion Matrix of SVM Classifier
Total Tests = 393   Predicted No   Predicted Yes
Actual No           141            29
Actual Yes          28             195

The performance of the model cannot be judged clearly just by observing the confusion matrix. To determine how accurate the model is, accuracy, precision, recall, and F-measure are calculated using (1), (2), (3), and (4), respectively. Accuracy measures the proportion of correctly classified values, that is, how often our classifier predicts correctly. It is the total of all true predictions divided by the total number of predictions.
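As an illustration, the four metrics can be recomputed from the counts of the SVM confusion matrix in Table IV. The following is a minimal sketch that assumes the standard definitions of accuracy, precision, recall, and F-measure; the function name `metrics_from_confusion` is hypothetical and is not taken from the original implementation.

```python
def metrics_from_confusion(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F-measure from the four counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total      # true predictions / total predictions
    precision = tp / (fp + tp)        # share of predicted positives that are correct
    recall = tp / (fn + tp)           # share of actual positives that are found
    f_measure = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f_measure


# Counts read from Table IV (SVM classifier, 393 test records).
accuracy, precision, recall, f_measure = metrics_from_confusion(
    tp=195, tn=141, fp=29, fn=28
)

for name, value in [
    ("accuracy", accuracy),
    ("precision", precision),
    ("recall", recall),
    ("f-measure", f_measure),
]:
    print(f"{name}: {value:.2%}")
```

For the counts in Table IV, these ratios agree with the SVM testing figures in Table III to within 0.01 percentage points.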
$$\text{Accuracy} = \frac{\text{True Predictions}}{\text{Total Predictions}} \tag{1}$$

$$\text{Precision} = \frac{\text{True Positives}}{\text{False Positives} + \text{True Positives}} \tag{2}$$

$$\text{Recall} = \frac{\text{True Positives}}{\text{False Negatives} + \text{True Positives}} \tag{3}$$

$$\text{F-Measure} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \tag{4}$$

Table III compares the implemented algorithms on the aforementioned performance metrics. SVM achieves the highest testing accuracy (85.49%) among the classifiers. SVM aims to find the best hyperplane (also called the decision boundary) in a binary classification model; this hyperplane splits our dataset into two classes, yielding the prediction of whether a patient would have CVD or not. On the other hand, KNN shows the lowest accuracy (75.31%) among these algorithms, as it works well with a small number of input variables but struggles when the number of inputs is very large.

Fig. 5 gives a more detailed view of the models by plotting the total number of wrongly predicted values for each classification algorithm. The number of wrong predictions on the test set is calculated from the confusion matrix as FP + FN. SVM generates the fewest wrong predictions, whereas KNN produces the most, as it has the lowest accuracy. From this investigation, it can be inferred that SVM performs more competently than the other algorithms.

Fig. 5. Wrong predicted values for the applied algorithms (x-axis: prediction models; y-axis: total wrong predicted values).

In Table V, our work is compared with other existing works based on the accuracy obtained by the best-performing algorithm in our work, i.e., SVM. We attained better accuracy in this case by using the combined dataset (1190 records), whereas the other works applied the algorithm to the Cleveland dataset (303 records).

TABLE V. Comparison with Other Existing Work
Work Reference   Accuracy Using SVM
Ours             85.49%
[6]              77.7%
[11]             83%
[13]             85%

V. CONCLUSION

The number of deaths due to heart disease is increasing day by day for lack of early prognosis and timely treatment. In the case of heart disease, early diagnosis can improve the chance of survival and also reduce the associated health complications. In this work, we have developed a prediction model to detect heart disease based on parameters derived from ECG reports. To achieve this goal, the five aforementioned classification algorithms have been applied to a publicly available dataset with 1190 records, which is a combination of five popular datasets in this field. Among the applied classifiers, the Ensemble Voting classifier is a hybrid technique combining Logistic Regression, Random Forest, and Naive Bayes. From our analysis, it is observed that SVM performs reasonably well, with 85.49% accuracy. As a further enhancement of this study, a large-scale dataset can be collected from local hospitals, since individual health also depends on regional and socio-economic factors. In the future, more analysis can be performed with different combinations of the algorithms used in hybrid techniques to obtain a better-performing heart disease prediction model.

REFERENCES
[1] T. WHO. (2016) Cardiovascular diseases. [Online]. Available: https://www.who.int/health-topics/cardiovascular-diseases
[2] R. Cementa. (2018) Heart attacks: Men vs. women. [Online]. Available: https://www.caringseniorservice.com/blog/heart-attacks-men-vs.-women?
[3] M. Diwakar, A. Tripathi, K. Joshi, M. Memoria, P.
Singh et al., “Latest trends on heart disease prediction using machine learning and image fusion,” Materials Today: Proceedings, vol. 37, pp. 3213–3218, 2021.
[4] F. Ali, S. El-Sappagh, S. R. Islam, D. Kwak, A. Ali, M. Imran, and K.-S. Kwak, “A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion,” Information Fusion, vol. 63, pp. 208–222, 2020.
[5] J. Soni, U. Ansari, D. Sharma, and S. Soni, “Intelligent and effective heart disease prediction system using weighted associative classifiers,” International Journal on Computer Science and Engineering, vol. 3, no. 6, pp. 2385–2392, 2011.
[6] S. Anitha and N. Sridevi, “Heart disease prediction using data mining techniques,” Journal of Analysis and Computation, 2019.
[7] V. Ramalingam, A. Dandapath, and M. K. Raja, “Heart disease prediction using machine learning techniques: a survey,” International Journal of Engineering & Technology, vol. 7, no. 2.8, pp. 684–687, 2018.
[8] A. Gavhane, G. Kokkula, I. Pandya, and K. Devadkar, “Prediction of heart disease using machine learning,” in 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE, 2018, pp. 1275–1278.
[9] M. S. Amin, Y. K. Chiam, and K. D. Varathan, “Identification of significant features and data mining techniques in predicting heart disease,” Telematics and Informatics, vol. 36, pp. 82–93, 2019.
[10] P. Singh, S. Singh, and G. S. Pandi-Jain, “Effective heart disease prediction system using data mining techniques,” International journal of nanomedicine, vol. 13, no. T-NANO 2014 Abstracts, p. 121, 2018.
[11] J. P. Li, A. U. Haq, S. U. Din, J. Khan, A. Khan, and A. Saboor, “Heart disease identification method using machine learning classification in e-healthcare,” IEEE Access, vol. 8, pp. 107562–107582, 2020.
[12] S. Mohan, C. Thirumalai, and G. Srivastava, “Effective heart disease prediction using hybrid machine learning techniques,” IEEE Access, vol. 7, pp. 81542–81554, 2019.
[13] A. Singh and R. Kumar, “Heart disease prediction using machine learning algorithms,” in 2020 international conference on electrical and electronics engineering (ICE3). IEEE, 2020, pp. 452–457.
[14] A. U. Haq, J. P. Li, M. H. Memon, S. Nazir, and R. Sun, “A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms,” Mobile Information Systems, vol. 2018, 2018.
[15] S. Nikhar and A. Karandikar, “Prediction of heart disease using machine learning algorithms,” International Journal of Advanced Engineering, Management and Science, vol. 2, no. 6, p. 239484, 2016.
[16] N. S. C. Reddy, S. S. Nee, L. Z. Min, and C. X. Ying, “Classification and feature selection approaches by machine learning techniques: Heart disease prediction,” International Journal of Innovative Computing, vol. 9, no. 1, 2019.
[17] M. Siddharta. (2019) Heart disease dataset (most comprehensive). [Online]. Available: https://www.kaggle.com/sid321axn/heart-statlog-cleveland-hungary-final
[18] D. J. Dittman, T. M. Khoshgoftaar, and A. Napolitano, “The effect of data sampling when using random forest on imbalanced bioinformatics data,” in 2015 IEEE international conference on information reuse and integration. IEEE, 2015, pp. 457–463.