Informatics in Medicine Unlocked 26 (2021) 100696
Available online 11 August 2021
2352-9148/© 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Heart disease prediction by using novel optimization algorithm: A supervised
learning prospective
ARTICLE INFO
Keywords:
Supervised learning
Salp swarm optimization algorithm
Heart attack prediction
Neural network
ABSTRACT
Data analysis is used increasingly in medicine to clarify diagnoses, refine research methods, and plan equipment supplies according to the importance of the pathologies encountered. Artificial intelligence offers the software tools required to analyze such data and predict outcomes optimally. A system model can apply several data processing algorithms to the classification of heart disease. This research work focuses on classification, which allows a prediction model to be obtained from training and test data. These data are screened by a classification algorithm that produces a model capable of assigning new data to the appropriate classes through a combination of mathematical tools and computer methods. To analyze the available data and predict optimal results, an optimization technique is needed. This research work aims to design a framework for heart disease prediction using major risk factors and different classifier algorithms: Naïve Bayes (NB), Bayesian Optimized Support Vector Machine (BO-SVM), K-Nearest Neighbors (KNN), and Salp Swarm Optimized Neural Network (SSA-NN). The study is carried out for the effective diagnosis of heart disease using the heart disease dataset available in the UCI Machine Learning Repository. The highest performance was obtained with BO-SVM (accuracy = 93.3%, precision = 100%, sensitivity = 80%), followed by SSA-NN (accuracy = 86.7%, precision = 100%, sensitivity = 60%). The results reveal that the proposed optimized algorithms can provide an effective healthcare monitoring system for the early prediction of heart disease.
1. Introduction
A disease is an abnormal medical condition of the human body. It negatively affects the functional state of the organism and is generally associated with signs of illness in the patient. According to the World Health Organization (WHO), over the last 15 years an estimated 17 million people have died each year from cardiovascular disease, particularly heart attacks and strokes [1]. Heart disease and stroke are the biggest killers. To predict heart disease, machine learning can be used to identify unseen patterns and provide clinical insights that assist physicians in planning and providing care.
Heart disease refers to a series of conditions involving the heart, vessels, muscles, valves, or the internal electrical pathways responsible for muscle contraction. According to the Centers for Disease Control and Prevention (CDC), heart disease is one of the leading causes of death in India, the UK, the US, Canada, and Australia. Cardiovascular diseases (CVDs) are a leading cause of clinical (i.e., death and disability), health, and economic burden globally, accounting for approximately 31% (17.9 million) of total deaths each year; one in four deaths in the USA occurs as a result of heart disease [2]. Heart disease is common among both men and women in most countries around the world. Therefore, people should consider heart disease risk factors. Although genetics plays a role, some lifestyle factors significantly affect heart disease. The known risk factors for heart disease include age, gender, family history, smoking, some chemotherapy drugs and radiation therapy for cancer, malnutrition, high blood pressure, high blood cholesterol levels, diabetes, obesity, physical inactivity, stress, and poor hygiene [3]. A risk factor is a factor whose presence increases the patient's likelihood of developing a CVD; conversely, its removal or improvement decreases this risk. This interpretation implies causality between the factor and the illness, which means that the risk factor precedes the disease (the notion of anteriority) and that correction of the factor decreases the occurrence of the disease (the idea of reversibility). Of course, the factor must be recognized in several different populations and offer a plausible physiopathological explanation of the disease. Strictly speaking, when there is no direct causal relationship, it is a "risk marker," a witness to a process (e.g., elevation of microalbuminuria or of C-reactive protein "CRP"). The main heart disease risk factors considered here are physiological factors (age, sex, and menopausal status), lifestyle factors (smoking, physical activity, alcohol, stress), metabolic syndrome factors (insulin resistance, dyslipidemia, abdominal obesity, high blood pressure), and dietary factors. The importance of a risk factor is defined by the strength of its association with the disease (expressed by the relative risk observed in exposed subjects compared to unexposed subjects) and by the
graded association between the level of the factor and the risk.
When a dataset contains many features, some of them are not useful and degrade the results. Therefore, the main aim of this research is to use a combined method to improve classification through better feature selection, which leads to a better diagnosis of heart disease. In this study, an imperialist competitive algorithm with a meta-heuristic approach is used to optimize the selection of important features for heart disease. This algorithm can provide a more optimal response for feature selection than genetic and other optimization algorithms. After data pre-processing, the dataset is split into two sets: a training set (80%) and a testing set (20%). After feature extraction, the features are supplied to K-nearest neighbor (KNN), Naïve Bayes, and Support Vector Machine classifiers for classification purposes. The combination of these methods can therefore improve the results of heart disease diagnosis in its different aspects; in other words, we are trying to improve classification accuracy for heart disease diagnosis. This particular combination of K-nearest neighbor (KNN), Naïve Bayes, and Support Vector Machine classifiers has not been explored before. According to the simulation results in Section 13, the proposed method achieves a better result than other algorithms, with two advantages: first, it decreases the number of features; second, it increases classification accuracy.
The objectives of this research are as follows:
• Data collection of new features related to heart disease.
• Prediction and classification of the incidence of heart disease using the proposed method.
• Use of new feature selection algorithms for the first time.
• Provision of a new combined approach with higher accuracy.
The paper is organized as follows: Section 2 contains the literature review, Section 3 the design parameters, Section 4 the optimization methods, and Section 5 the research gap; the proposed methodology is presented in Section 6, data analysis and the classification methods in Sections 7, 8, and 9, and finally the experimental results and conclusion are presented in the last sections.
2. Literature review
Feature selection plays a significant role in any classification task. Swarm algorithms have since been proposed and have shown valuable performance for feature selection. There are several studies on the classification of heart disease in the literature. One of them is the study of hybrid smart modeling schemes for the classification of heart disease by Shao et al. [4]. This paper uses 13 risk factors for heart disease prediction. The study, which differs from existing approaches, proposes a novel hybrid framework that combines various risk factors. The framework contains three methods: Multivariate Adaptive Regression (MAR), Logistic Regression (LR), and Artificial Neural Network (ANN). Initially, the encoded values of the risk factors are reduced using LR and MAR. Then, the remaining encoded factors are used to train the ANN. The simulation results show that the hybrid approach outperforms the conventional single-stage neural network [4]. The study on the use of data mining techniques for the prediction of heart diseases by Priyanka et al. compared the performance of the Naïve Bayes and Decision tree algorithms; the decision tree yielded much better results than Naïve Bayes, with accuracy rates of 98.03% and 82.35%, respectively [5]. Yekkala et al. [5] used Particle Swarm Optimization (PSO) in conjunction with ensemble methods (Random Forest, AdaBoost, and Bagged Tree) to predict the outcome more accurately. The Heart Statlog dataset, taken from the UCI database, has 270 samples and 14 attributes [5]. The data had already been processed, and PSO was used as a feature selection method to delete unnecessary and missing data.
The selected features were then evaluated with ensemble classifiers using various performance measures, following these steps: after loading the dataset, PSO was applied to select the relevant features, data cleaning was performed and useless features were removed, the retained features were passed to AdaBoost, Bagging, and Random Forest, and finally the performance of each algorithm was measured. As a result, Bagged Tree achieved 100%, Random Forest 90.37%, and AdaBoost 88.89%. According to the test results, Yekkala et al. [5] showed that using Bagged Trees with PSO improves learning accuracy in predicting heart disease. Amin et al. [6] present a heart disease prediction model using a genetic algorithm, neural network, Naïve Bayes, Bagging Trees, Decision Tree, Core Density, and SVM. Learning is faster, more stable, and more accurate compared to back-propagation. They collected risk factor data from 50 patients, and the hybrid model achieved 96% training accuracy and 89% test accuracy. Amin and his colleagues then developed a system using a hybrid fuzzy and k-nearest neighbor approach to predict heart diseases; in another system, a neural network ensemble was used, with an accuracy of 89.01% in the diagnosis of heart disease. The advantage of this hybrid system is that it helps patients reduce cost and time and monitor themselves with medical examinations before heart disease and its side effects develop. The researchers compared the algorithms using the confusion matrix and found that J48 recorded the highest accuracy, at 99%. The K-nearest neighbor algorithm is simple, but it can give impressive results. It is a classification method widely used in many fields and is also found among the top 10 data mining algorithms [8]. Typically, houses that are close to each other have similar characteristics, so we can group them and give them a classification; the algorithm uses this same logic to group elements that are close to each other. There are two basic types of data mining techniques: predictive methods and descriptive methods [7].
• Descriptive methods: these methods identify the current situation, describe the common properties of the data in the dataset, and emphasize the understanding and interpretation of the features.
• Predictive methods: these methods learn from the past to predict the future. They use data with known outcomes to develop a model that can predict the values of other data.
The stress risk factor appears to be compounded by the depressive state that can follow myocardial infarction. According to some studies, the incidence of depression is higher in patients with CVD than in those without CVD. Several studies agree that after myocardial infarction, a depressive state increases the risk of recurrence during the two years following the cardiac event [9]. Various approaches exist to explain the correlation between psychosocial factors (stress, anxiety, and depression) and CVD. These factors increase catecholamine synthesis, with consequences for the different metabolisms, blood pressure, and heart rate [10]. According to Ref. [11], Data Mining has three main axes: Statistics, Artificial Intelligence (AI, including Machine Learning), and Databases. Although these three axes are well specified, it is difficult to give a single definition of Data Mining. However, the most widely used description is probably the one stated in Ref. [11], where it is mentioned that "Data Mining is the non-trivial process of extracting information from data that is present implicitly, previously unknown and potentially useful for the user."
This information is present in the data as patterns that are very useful when applied to solve problems in a particular context. A. E. Hegazy et al. [12] highlighted how to improve the basic SSA structure to enhance accuracy, convergence speed, and reliability. In that research, the authors introduced a new control parameter to adjust the current solution and named the improved algorithm the improved salp swarm algorithm (ISSA). The algorithm was tested on the feature selection task, combining ISSA with a K-nearest neighbor classifier: ISSA was used as a wrapper feature selection method with the KNN classifier as the fitness function. They evaluated the performance of ISSA on 23 UCI datasets and compared it with four other swarm methods, obtaining superior feature reduction and classification accuracy; the average classification accuracy for ISSA was 0.8422. Pei Du et al. [13] implemented a model for accurate and reliable air pollutant forecasting. Air pollution seriously affects the living environment of human beings and can even endanger lives. To overcome this problem, the researchers proposed a novel hybrid model. A robust data preprocessing method is implemented to decompose the original time series into different modes, comprising low-frequency and high-frequency components. For air pollution series prediction, the researchers tuned the ELM model parameters to obtain high forecasting accuracy and consistency. The authors carried out two experiments, PM2.5 forecasting and PM10 forecasting, to illustrate the hybrid model's superiority. Liyuan Gao
et al. [14] highlighted that early disease prediction and diagnosis are crucial for improving patients' survival, and that it is critically important to recognize the patient's condition and its predictive characteristics. The authors provided a comparative analysis of various machine learning systems; by sampling with replacement, their unit estimates the standard deviation of the data. The emphasis of that research is on analyzing and comparing machine learning strategies to predict breast cancer and heart disease and to identify early high-risk features. The results show that the Bayesian hyperparameter optimization model performs better than random search and grid search. Using the breast cancer diagnosis dataset with an Extreme Gradient Boosting model, they obtained 94.74% accuracy, and for the heart disease dataset they obtained 73.50%. Ahmed A. Abusnain et al. [15] highlighted that pattern classification is the most popular application of neural networks and that training the neural network is the critical step. They noted that the back-propagation algorithm has a slow convergence rate, which the Salp Swarm Algorithm (SSA) can overcome, since SSA gives good results for optimization problems. In their work, they proposed SSA for optimizing the weight coefficients of neural networks for pattern classification, using datasets from the UCI machine learning repository; the approach adjusts the NN connection weights using the SSA algorithm. Zaher Mundher Yaseen et al. [16] highlighted the classical extreme learning machine (ELM), a non-tuned model algorithm. This approach is based on a random procedure that is not efficient at converging to outstanding performance on local problems. In that work, the researchers investigated forecasting of the monthly flow of the Tigris River at Baghdad using the Salp Swarm Algorithm combined with ELM. They used twenty years of river flow time series data, and the results were evaluated using graphical presentations and several statistical measures. With the SSA-ELM model, the results showed an acceptable level of improvement, achieving gains of 8.4% and 13.1% in RMSE and MAE, respectively. Youness Khourdifi et al. [17] highlighted that machine learning is one of the key areas for predicting heart disease, and that optimization algorithms have the advantage of dealing with complex non-linear problems with high adaptability and flexibility. In their research, to improve the quality of heart disease classification, a method named Fast Correlation-Based Feature Selection (FCBF) is used to filter redundant features. The researchers then applied various classification algorithms, such as support vector machine, k-nearest neighbor, naïve Bayes, and random forest, optimized by particle swarm optimization combined with ant colony optimization. With the proposed mixed approach, applied to the heart disease dataset, the optimized models including FCBF, PSO, and ACO achieved a maximum classification accuracy of 99.65% with KNN and 99.6% with RF. Jiyang Wang et al. [18] proposed that reliable and effective load forecasting is one of the important factors for operation decisions and power system planning.
The safety and economic operation of the power system are directly affected by forecasting accuracy, and because of the complexity and instability of the power load, forecasting accuracy is a most challenging issue. Hence, the researchers proposed a novel hybrid forecasting system that embeds a multi-objective module. A detailed review of the salp swarm algorithm and its critical characteristics was presented by Abualigah et al. [19]. SSA is one of the effective meta-heuristic optimization algorithms used for optimization problems; it can be applied in machine learning, wireless networking, engineering design, energy storage, and image processing. The authors carried out a comprehensive review of the various SSA types, including the chaotic salp swarm algorithm, hybridizations of the salp swarm algorithm, the binary salp swarm algorithm, and others, and highlighted the different limitations of the salp swarm algorithm, such as limited control over multimodal strategies. Finally, the review notes that SSA has several advantages: speed, simplicity, and ease of hybridization with other optimization algorithms. Sobhi Ahmed et al. [20] highlighted the effect of data dimensionality on a classification algorithm's performance.
Because of the high dimensionality of data, classifiers face many problems, including high computational time; feature selection is the best solution to avoid this. The technique aims to reduce the number of features by removing irrelevant, noisy, and redundant data. The authors highlighted that metaheuristic algorithms are well suited to solving this type of problem and proposed a chaotic version of the Salp Swarm Algorithm, using four different types of chaotic maps to control the balance between exploration and exploitation. The researchers used twelve well-known datasets taken from the UCI data repository. For the wrapper feature selection, they used a K-NN classifier as the evaluator, and each dataset was divided into two parts, 80% for training and 20% for testing. Subrat Kumar Nayak et al. [21] highlighted how to deal with real-world data, which are often complex; feature selection plays an important role in handling such data. In their research, the authors presented a filter approach using a multi-objective differential evolution algorithm for feature selection, applied to handle the duplicate and unwanted features of a given dataset. The researchers pursued two objectives: removing redundant features and removing erroneous features, by evaluating their relevance with respect to the other features and the class labels. In this work, the feature subsets of 23 benchmark datasets were tested using 10-fold cross-validation with four different well-known classifiers. Yun Bai et al. [22] highlighted PM2.5 concentration forecasting, which is useful and essential for protecting public health. In their research, the authors proposed an ensemble long short-term memory neural network (E-LSTM). The proposed model is implemented in three different steps: multimodal feature extraction, multimodal feature learning, and integration. Real datasets were used, collected from environmental monitoring stations in Beijing, China.
They developed various LSTMs in different modes within E-LSTM and compared it with a single LSTM and a feed-forward neural network; the model achieved a mean absolute error of 19.604%, a root mean square error of 12.077, and a correlation coefficient of 0.994. Alani, H., et al. [23] highlighted that chronic kidney disease (CKD) leads to high mortality rates and high patient expenditure. CKD may be one of the critical factors leading to heart disease, and it is a significant cause of death among renal transplant patients. The associated risk factors are mainly uremia-specific and increase in prevalence as kidney function declines; they include hemoglobin abnormalities, albuminuria, and abnormal bone and mineral metabolism. These factors are under-recognized owing to a lack of diagnostic screening tools, insufficient sensitivity and specificity to make screening reliable, and the need for more RCT-quality evidence to guide intervention. Jacqueline O'Toole et al. [24] noted that if heart disease can be predicted earlier in young adults, the risk contributed to the future CVD burden can be reduced. In young adults, CVD risk is mostly discovered only when they present with chest pain.
This research analyzed lifestyle habits and CVD risks in young adults. In this work, they used data from 26 young adults in the 39–40 year age group. The survey shows a low risk of developing heart disease within ten years owing to their age, although half of the young adults were identified as having two or more CVD risk factors, and most of the adults were at risk of heart disease because they were sedentary and overweight. Puja Wieslaw et al. [25] noted that feature selection is the initial step in most knowledge discovery work. In their research, a tree-based generational feature selection application is presented for medical data analysis. The basic approach is to estimate the importance of attributes extracted from a given tree structure through the recursive generation of feature sets. It removes the selected features from the dataset and creates a next generation with a critical feature set, and this process continues until a crucial feature becomes a random value. The researchers applied this process to real-world medical datasets, including the Colon dataset and the Lung dataset. In this work, they found that almost all truly relevant features (19 of them) were identified, with average accuracy increasing from 0.68 to 0.74; only one relevant feature was not identified as important.
Juan-Jose Beunza et al. [26] examined the use of supervised machine learning algorithms for predicting clinical events, in terms of validity and accuracy. In this work, the data came from the Framingham heart study, which contains 4240 observations. The researchers focused on heart disease risk factors combined with data mining, using different machine learning algorithms along with RapidMiner and R-Studio for analyzing the data. A neural network model implemented after omitting all missing values achieved an AUC of 0.71; later, using the same data with RapidMiner and support vector machines, they obtained an AUC of 0.75. Sinan Q. Salih et al. [27] highlighted that metaheuristic algorithms are well suited to solving different optimization and engineering problems, although such algorithms have a few issues concerning their global search and local search capabilities. The researchers developed an algorithm called the nomadic people optimizer, which simulates the nature of nomadic people's movement, how they search for food, and how they live. The research focused on a multi-swarm approach in which several clans exist and each clan finds its best place. The algorithm was validated on 36 unconstrained benchmark functions, and the outcome of the research was a unique solution provided by the NPO algorithm.
Khaled Mohamad Almustafa et al. [28] highlighted that heart disease has become one of the most common diseases and that its early diagnosis is challenging for healthcare providers. In their research, they implemented various classifiers for classifying a heart disease dataset and predicting heart disease with minimal attributes. The dataset, collected from Cleveland and Switzerland, contains 76 characteristics with a class attribute for 1025 patients, of which only 14 features were used in this work. The researchers applied various algorithms, including k-nearest neighbor, decision tree, Naïve Bayes, SVM, and stochastic gradient descent, for classification and prediction of heart disease cases, obtaining accuracies of 99.70% for KNN (k = 1), 97.26% for JRip, and 98.04% for the decision tree. K. Vembandasamy et al. [29] regarded health care as an essential factor in human life, with the health business becoming notable in medical science. The healthcare industry holds a large amount of patient data, and various data mining techniques are applied to these data to detect heart disease in patients. However, data mining techniques alone could not obtain significant test results from the hidden information, so the researchers proposed a system using data mining algorithms to classify the data and detect heart disease. In that research, the Naïve Bayes algorithm was used to diagnose heart disease patients, and the experiments were implemented using the Weka tool. The proposed Naïve Bayes model classified 74% of the input instances correctly and exhibited an average precision of 71%, an average recall of 74%, and an F-measure of 71.2%. Vikas Chaurasia et al. [30] discussed heart disease as a major cause of death, with most of these deaths occurring in low- and middle-income countries. The healthcare industry collects enormous amounts of heart disease data, but these data are not well mined to discover the hidden information needed for effective decision-making. In their research, the authors highlighted different knowledge discovery concepts in databases using various data mining techniques to help medical practitioners make effective decisions.
The primary aim of the work was to predict the presence of heart disease more accurately with fewer attributes. The researchers took only 11 features and used three classifiers, the J48 decision tree, Naïve Bayes, and a Bagging algorithm, to predict patients' diagnoses. In this work, the highest accuracy obtained was 85.03% and the lowest 82.31%, while the remaining algorithm yielded an average accuracy of 84.35%. Yudong Zhang et al. [31] highlighted that particle swarm optimization is treated as a heuristic global optimization method and is one of the most commonly used optimization techniques. In their research, the authors presented a comprehensive investigation of PSO with its advances and modifications, such as quantum-behaved PSO, chaotic PSO, and fuzzy PSO, and surveyed various applications of PSO in different areas, including automation control systems, operations research, communication theory, and fuel and energy. The work covers various aspects, including PSO modifications, extensions of PSO, hybridization of PSO, parallel implementations of PSO, and theoretical analysis of PSO.
Rizk-Allah et al. [32] focused on the salp swarm algorithm, a recent meta-heuristic algorithm that imitates the behavior of salps while navigating and foraging. Their research proposed a new version of SSA named BSSA, the binary salp swarm algorithm. The proposed BSSA was compared using four different variants of transfer functions on several global optimization problems, and nonparametric statistical tests (Wilcoxon's rank-sum at the 5% significance level) were carried out to judge the statistical significance of the results obtained by the different algorithms. In this work, the results of BSSA were better than those of the other algorithms.
Patro, S. P et al. [33] highlighted the challenge of taking health care
anthem of age populations social care. Mostly, heart disease and chronic
illnesses become more dangerous on these aged people, and sometimes
it leads to a heart attack without any omens. It is too difficult for doctors
to identify the patient’s status in time. In this regard, the researcher
proposed a model that can identify these challenges, remotely real-time
patient health data. A framework was proposed for predicting heart
disease using major risk factors based on different classifier algorithms,
including Naïve Bayes, K-Nearest, Support Vector Machine, Laso, and
Ridge regression. Along with this for data classification, the researcher
used principal component analysis and linear discriminant analysis.
They used an open-source data set. In this process, they used 14 attri­
butes. After successful implementation, the support vector machine
provides 92% accuracy, and F1 Accuracy is 85%. Khan, M. A., et al. [34]
discussed the IoT application, including manufacturing, agriculture,
healthcare, etc. The researcher focused on wearable devices application
in the health monitoring system named Internet of Medical Things
(IoMT). We can identify the mortality rate of early detection of heart
disease prediction with clinical data analysis with its help. In this
research, they investigated the key characteristics of heart disease pre­
diction using machine learning techniques. To improve the prediction
accuracy of an IoMT framework designed to diagnose heart disease, they
used modified salp swarm optimization, an adaptive neuro-fuzzy infer­
ence system in this research. The proposed MSSO-ANFIS prediction
model obtains an accuracy of 99.45 with a precision of 96.54, which is
higher than the other approaches. Wang, J., et al. [35] proposed the
coronary arteriography (CAG) approach for the diagnosis of coronary
heart disease (CHD). Machine learning’s help to perform different se­
lective of multiple ml algorithms for feature selection methods is used in
the health care industry. In this research, they implemented a two-level
stacking named base-level and meta-level. The prediction of base-level
classifiers is selected as the input of the meta-level. They used the
Z-Alizadeh Sani CHD dataset in this research, and this data set consisting
of 2020 CAG cases. This model’s results obtained an accuracy, speci­
ficity, and sensitivity of 95.3%, 94.44%, and 95.84%, respectively.
The common goal of all the above-mentioned techniques is to classify heart disease using hybrid classification techniques. Much of this research is carried out using only classification and optimization techniques. The proposed approaches achieve the desired results by combining different optimization techniques with various machine learning algorithms.
In this research, different classifier approaches are proposed that combine several ensemble-based machine learning algorithms to identify redundant features and improve the accuracy and quality of heart disease classification. We present a comparative analysis of heart disease dataset classification using various classification algorithms, all of which are commonly used in similar heart disease-related research. The classifiers were first used with 10-fold cross-validation; we then study the performance of the Bayesian Optimized Support Vector Machine (BO-SVM) and K-Nearest Neighbors (KNN) classifiers using various training sets instead of the 10-fold cross-validation method. Finally, we apply different classifier algorithms, namely Naïve Bayes (NB), Bayesian Optimized Support Vector Machine (BO-SVM), K-Nearest Neighbors (KNN), and Salp Swarm Optimized Neural Network (SSA-NN). The main goal of this research is to find the best accuracy for the prediction of heart disease using major risk factors and different classifier algorithms such as the Bayesian Optimized Support Vector Machine (BO-SVM) and K-Nearest Neighbors (KNN).
3. Design parameters
Parameters or design variables are controlled factors that influence performance. They can be of various natures: geometric dimensions, material properties, structural choices, and so on. They may be quantitative or qualitative, continuous or discrete. The selection and number of parameters also shape the definition of the optimization problem: more parameters enlarge the search space, but the optimization process then takes longer. The parameters must also be chosen so as to ensure the validity of the retained model and its proper functioning (for example, a suitable geometrical shape).
4. Optimization methods
4.1. Continuous optimization
Continuous optimization covers two cases, linear and non-linear.
• Integer linear optimization studies linear optimization problems in which some or all variables are constrained to take integer values.
• Non-linear optimization addresses the general case in which the objective or the constraints (or both) contain non-linear, possibly non-convex, parts.
4.2. Combinatorial optimization
Combinatorial optimization consists of finding the best solution among a finite number of choices; in other words, minimizing a function, with or without constraints, over a finite set of possibilities. When the number of possible combinations grows exponentially with the problem's size, the computation time rapidly becomes critical.
A generalized optimization problem consists of finding a solution s that optimizes the value of the cost function f. Formally, we seek s* ∈ S such that
f(s*) ≤ f(s) for all s ∈ S (1)
Such a solution s* is called an optimal solution or a global optimum.
5. Research gap
Data quality is viewed from various basic dimensions (Accuracy,
timeliness, relevance, completeness, intelligibility, and reliability),
mainly addressing the data’s integrity in a particular research project.
Missing data values cause errors; data without value creates ambiguity
since it can be correct or wrong. Its importance lies in the fact that
decision-making efficiency depends on the quality of the data. Small
improvements in the dimensions of the data can lead to substantial
improvements in the information for decision-making. Hence, it is
beneficial for organizations to have proven studies of the selection and
evaluation of characteristics of computational learning techniques and
to use hybrid technologies that improve the results obtained. Various
methods have been developed for the analysis of heart disease. Nevertheless, there is always scope for improvement, and new systems are still being developed to overcome the limitations of existing strategies. Different data mining techniques exist for discovering relations between diseases, their symptoms, and prescriptions, although such methods have certain constraints, such as high iteration counts and high response times. The principal limitation of the back-propagation neural network is
that it results in a higher MSE (mean squared error) if the weight value is
random (not optimized). Therefore, this research work uses the Salp
Swarm algorithm to optimize the neural network’s weight value to
reduce the MSE. Thus the Accuracy of an SSA-optimized Neural Network
is higher than that of a Neural Network alone. Similarly, the support
vector machine is optimized by Bayesian optimization. The KNN and
Naïve Bayes classification algorithms are also used for comparative
analysis of the research work.
6. Proposed methodology
In this research work, the main focus is on machine learning. Machine learning is a discipline comprising algorithms that work on empirical data in two ways: first, by identifying complex relationships through the characteristics of the data, and second, by employing the discovered patterns to make predictions.
Within the data, algorithms can find relationships between the observed variables, acting like a machine that learns from a data sample (training data) to capture characteristics that are not directly observed, through a probability distribution. The learned knowledge can then be used to make smarter decisions on new data. Machine learning algorithms are normally classified into different categories depending on the desired results; two of these categories are supervised learning and unsupervised learning.
When analyzing a large number of variables, problems of high dimensionality may arise. To avoid such problems, a variety of methods are used; for example, applying a one-step selection method before another approach can increase the latter's power. The strategies fall into two general classes: 1. dimension reduction methods and 2. variable selection methods.
6.1. Data analysis and encoding
In this work, the Cleveland dataset is used. The data are organized as a matrix of rows and columns, from which heart disease is predicted. Several heart disease datasets are available in the UCI repository, including the Hungarian, Cleveland, and Switzerland datasets. The dataset contains 76 attributes and 303 records, but all published experiments refer to using a subset of 14 of them. The target column in the dataset contains two classes: 1 indicates the presence of heart disease and 0 its absence.
The important risk factors of the dataset are listed in Table 1. The table shows each risk factor and its possible values, along with the encoded values in brackets. These encoded values will be used as input to the proposed framework.
Machine learning methods are dynamic because they usually contain several parameters that need to be tuned for best performance, and tuning them manually can be tedious and error-prone. Therefore, the two classification approaches SVM and Neural Network are optimized by Bayesian Optimization and Salp Swarm Optimization, respectively. In addition, two standalone classification methods, KNN and Naïve Bayes, are used. These approaches are proposed to define an optimum number of clusters in the analyzed data.
The proposed methodology is shown in Fig. 1.
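As a concrete illustration of this data preparation, the following sketch (an assumption-laden example, not the authors' code) loads a CSV version of the Cleveland dataset, binarizes the target column, and creates the 80%/20% training/testing split described above; the file name "heart.csv" and the column name "target" are hypothetical.

# Sketch: load the Cleveland data, encode the target, and split 80/20.
# File and column names are assumptions for illustration only.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")            # 303 records, 14 attributes (assumed layout)
X = df.drop(columns=["target"])          # risk-factor columns
y = (df["target"] > 0).astype(int)       # 1 = heart disease present, 0 = absent

# 80% training set and 20% testing set, as used in this work
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(X_train.shape, X_test.shape)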
7. Classification using K-Nearest neighbor
First of all, three parameters are considered: the sample data, the number of closest neighbors to select (K), and the point we want to evaluate (X). Subsequently, for each element of the sample, we compute the distance between the reference point X and the point Xi of the learning set, and we check whether this distance is less than one of the distances contained in the nearest-neighbors list. If so, the point is added to the list; if the number of items in the list then exceeds K, the last value is simply removed from the list. Fig. 2 illustrates the classification of the K-Nearest Neighbor. The algorithm itself is not very complicated and can be run by brute force if the sample is not too big. However, since we are dealing with data mining, the number of individuals to be evaluated is often very large, which is why an optimization structure is needed. There are many types of trees to speed up a search, such as the KD tree or the ball tree; the ball tree algorithm will be covered later in this paper. The pseudo-code representing the algorithm is given in [37].
Phase 1: For heart disease prediction, the dataset is taken from the UCI machine learning repository, one of the foremost collections of datasets used to analyze machine learning algorithms.
Phase 2: The data preprocessing step remains the same as for Logistic Regression and refers to cleaning and organizing the raw data for building and training the machine learning models. Data preprocessing for machine learning generally follows specific steps, such as:
1.1. Import libraries.
1.2. Import the dataset, which usually comes in CSV format.
1.3. Handle missing data in the dataset. For identifying missing data, the "Scikit-Learn" preprocessing module is used, which contains an imputer class that helps take care of the missing values.
Table 1
Risk factors and their corresponding encodings [36].
1. Sex: Male (1), Female (0)
2. Age (years): 20–34 (−2), 35–50 (−1), 51–60 (0), 61–79 (1), >79 (2)
3. Blood Cholesterol: Below 200 mg/dL – Low (−1); 200–239 mg/dL – Normal (0); 240 mg/dL and above – High (1)
4. Blood Pressure: Below 120 mm Hg – Low (−1); 120–139 mm Hg – Normal (0); Above 139 mm Hg – High (1)
5. Hereditary: Family member diagnosed with HD – Yes (1); otherwise – No (0)
6. Smoking: Yes (1) or No (0)
7. Alcohol Intake: Yes (1) or No (0)
8. Physical Activity: Low (−1), Normal (0) or High (1)
9. Diabetes: Yes (1) or No (0)
10. Diet: Poor (−1), Normal (0) or Good (1)
11. Obesity: Yes (1) or No (0)
12. Stress: Yes (1) or No (0)
Output – Heart Disease: Yes (1) or No (0)
Fig. 1. Proposed methodology.
1.4. Encoding categorical data.
1.5. Split the dataset into a Training set and Test set.
1.6. Feature Scaling. This is the final step of data preprocessing.
Phase 3: Training the model: here we fit the K-NN classifier to the training data. To do this, we import the KNeighborsClassifier class from the sklearn.neighbors library; after importing the class, we create the classifier object.
Phase 4: After training, the K-Nearest Neighbor method is used to predict the class. The KNN algorithm works as follows:
4.1 Load the data.
4.2 Initialize K to the chosen number of neighbors.
4.3 Compare the predicted output with the actual/desired output. To obtain the predicted class, repeat the following for each test point: calculate the distance between the test data and every training data point using the most popular distance metric, the Euclidean distance; sort the distances in ascending order; pick the first K entries from the sorted array; and take the most frequent class among the selected K entries.
4.4 If any error occurs, repeat steps 1 to 3; otherwise, return the predicted class.
The class of x is determined from the classes of the examples stored among its K nearest neighbors.
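The phases above can be realized, for example, with scikit-learn; the sketch below is illustrative only (modern scikit-learn exposes SimpleImputer rather than the older Imputer class) and reuses the X_train/X_test split from the earlier example.

# Illustrative sketch of Phases 2-4 with scikit-learn (not the authors' code)
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

knn_model = make_pipeline(
    SimpleImputer(strategy="mean"),       # step 1.3: handle missing values
    StandardScaler(),                     # step 1.6: feature scaling
    KNeighborsClassifier(n_neighbors=5,   # Phase 4: K is the chosen neighbor count
                         metric="euclidean"))

knn_model.fit(X_train, y_train)           # Phase 3: fit the classifier to training data
y_pred_knn = knn_model.predict(X_test)    # Phase 4: predict the class of test points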
8. Classification using Naïve Bayes classifier
The Naïve Bayes technique is based on probability theory: conditional probabilities estimated from frequencies are used to predict the class of new cases. Fig. 3 shows the Naïve Bayes classifier-based approach.
Let E and F be events; we can express E as:
E = EF ∪ EF^c (2)
That is, for E to occur, either E and F must both occur, or E must occur and F not. Because EF and EF^c are mutually exclusive, we have:
P(E) = P(EF) + P(EF^c)
     = P(E|F)P(F) + P(E|F^c)P(F^c)
     = P(E|F)P(F) + P(E|F^c)(1 − P(F)) (3)
Equation (3) states that the probability of event E is a weighted average of the conditional probability of E given that F has occurred and the conditional probability of E given that F has not occurred, each conditional probability being weighted by the probability of the event on which it is conditioned.
Fig. 2. Classification using K-Nearest Neighbor.
Equation (3) can be generalized as follows: suppose that the events F1, F2, …, Fn are mutually exclusive and exhaustive, such that ∪_{i=1}^{n} Fi = S, where S is the sample space. In other words, precisely one of the events will occur (Fig. 4). One can then write:
E = ∪_{i=1}^{n} EFi (4)
From the definition of conditional probability, we have:
P(EFi) = P(E|Fi)P(Fi) (5)
Furthermore, using the fact that the events EFi, i = 1, …, n, are mutually exclusive, we obtain:
P(E) = Σ_{i=1}^{n} P(EFi) = Σ_{i=1}^{n} P(E|Fi)P(Fi) (6)
Thus, equation (6) shows how, for given events F1, F2, …, Fn of which one and only one can occur, P(E) can be calculated by conditioning on which Fi occurs. That is, P(E) equals a weighted average of the P(E|Fi), with each term weighted by the probability of the event on which it is conditioned. Now suppose that E has occurred and that we want to determine the probability that event Fj has occurred. By equation (6) we have:
P(Fj|E) = P(EFj) / P(E) = P(E|Fj)P(Fj) / Σ_{i=1}^{n} P(E|Fi)P(Fi) (7)
Equation (7) is known as Bayes' formula. Thus, we can consider E as evidence for Fj and calculate the probability that Fj has occurred given the evidence, P(Fj|E). Now suppose that evidence is available from multiple sources E1, E2, …, Em. Applying the same reasoning:
P(Fj|E1E2…Em) = P(E1E2…Em|Fj)P(Fj) / P(E1E2…Em) (8)
Equation (8) is used to obtain the classification results. The assumption that gives rise to the adjective "Naïve" is the independence between the variables, which is not always true. However, the approach is efficient in practice, because the classification relies more on the comparison of the computed probabilities than on their exact values.
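A minimal sketch of this classifier, using scikit-learn's GaussianNB (one common realization of equation (8) under the independence assumption, not the authors' code), reusing the earlier train/test split:

# Naive Bayes classification sketch (illustrative only)
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

nb_model = GaussianNB()
nb_model.fit(X_train, y_train)                    # estimate class-conditional probabilities

y_pred_nb = nb_model.predict(X_test)              # class with the highest posterior
print(confusion_matrix(y_test, y_pred_nb))        # TP/TN/FP/FN counts
print(nb_model.predict_proba(X_test[:5]))         # posterior P(F_j | E_1 ... E_m)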
9. Classification using Bayesian Optimized SVM classifier
Fig. 5 shows the Bayesian optimized SVM classifier-based approach
described in the following subheadings.
Fig. 3. Basic block diagram of the proposed Naïve Bayes method.
Fig. 4. Event E occurs in conjunction with one of the mutually exclusive events
Fj [38].
9.1. Support vector machines (SVM)
Multi-class classification is used today in many real-world problems, whereas Support Vector Machines were originally designed to deal with binary (+/−1) problems. The multi-class objective function is expressed by:
min over w^r ∈ H, ξ^r ∈ R^m, b^r ∈ R:
(1/2) Σ_{r=1}^{M} ||w^r||^2 + (C/m) Σ_{i=1}^{m} Σ_{r ≠ yi} ξ_i^r (9)
Subject to:
⟨w^{yi}, xi⟩ + b^{yi} ≥ ⟨w^r, xi⟩ + b^r + 2 − ξ_i^r, ξ_i^r ≥ 0 (10)
where r ∈ {1, …, M} \ {yi} and yi ∈ {1, …, M} is the multi-class label of the pattern xi.
In terms of precision, the results obtained with this approach are comparable to those obtained directly using the one-against-rest method. For practical problems, the choice of approach depends on the constraints at hand; relevant factors include the precision required, the time available for development, the processing time, and the nature of the classification problem.
9.2. Bayesian Optimization of support vector machine
The main idea of Bayesian Optimization (BO) is to sequentially construct a surrogate probabilistic model to infer the objective function. New observations are made iteratively; the model is updated, and reducing its uncertainty allows working with a known and cheaper model, which is used to construct a utility function that determines the next point to evaluate. The different steps of the BO methodology are described below.
First, the a priori model must be chosen over the possible space of functions. Different parametric approaches can be used, such as Beta-Bernoulli bandits or (generalized) linear models, as well as nonparametric models such as Student-t processes or Gaussian processes. Then, the following steps are repeated until a particular stopping criterion is met:
The prior and the likelihood of the observations so far are combined
to obtain a posterior distribution. This is done using Bayes’ theorem,
hence the origin of the name.
Recall Bayes’ theorem. Let A and B be two events such that the
conditional probability P (B | A) is known, then the probability P (A | B)
is given by:
P(A|B) = P(B|A)P(A) / P(B) (11)
where P(A) is the prior probability, P(B|A) is the probability of event B conditional on the occurrence of event A, and P(A|B) is the posterior probability. A particular utility function is then maximized on the posterior model to determine the next point to evaluate, and the new observation is collected; this is repeated until the stopping criterion is met. The primary task for the SVM classifier is to resolve feature subset selection together with parameter tuning. When the SVM approach uses a discretization technique for a continuous parameter, it loses information and gives less accurate results. This work therefore discusses an algorithm that can tune the SVM parameters directly. The algorithm is proposed to optimize two SVM parameters: the soft-margin weight C and the kernel function parameter. The first parameter, C, controls the trade-off between misclassifying specific points and correctly classifying the others, while the second, the kernel parameter, is tuned so that SVM parameter setting and feature subset selection are performed simultaneously.
Fig. 5. Basic block diagram of the proposed BO-SVM method.
9.3. BO-SVM algorithm
In the above algorithm, the roles of the variables are as follows:
i. K holds the solutions received from the records.
ii. M holds the number of models used to generate solutions.
iii. Q holds the algorithm parameter used to control diversification of the search process.
iv. C holds the soft-margin parameter.
v. Y is the kernel function parameter, also called the margin or width parameter.
vi. Finally, the termination condition yields the best values for the SVM parameters (C and Y).
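One possible way to realize this tuning (a sketch under assumptions, not the authors' implementation) is with scikit-optimize's BayesSearchCV, letting Bayesian optimization search over the soft-margin parameter C and the RBF kernel width (the Y/gamma parameter), reusing the earlier train/test split:

# BO-SVM sketch: Bayesian optimization of C and the kernel width parameter
from sklearn.svm import SVC
from skopt import BayesSearchCV
from skopt.space import Real

search_space = {
    "C":     Real(1e-3, 1e3, prior="log-uniform"),   # soft-margin parameter (C)
    "gamma": Real(1e-4, 1e1, prior="log-uniform"),   # kernel width parameter (Y)
}

bo_svm = BayesSearchCV(
    SVC(kernel="rbf"),
    search_space,
    n_iter=30,            # number of Bayesian-optimization evaluations
    cv=10,                # K-10 fold cross-validation, as listed in Table 2
    random_state=42)

bo_svm.fit(X_train, y_train)
print(bo_svm.best_params_, bo_svm.best_score_)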
10. Classification using salp swarm optimized neural network
classifier
Fig. 6 shows the salp swarm optimization of the neural network-
based approach described in the following subheadings.
11. Neural network (NN)
In this work, different neural network structures with one hidden layer are tested, starting from a number of neurons equal to the average of the number of inputs and the number of outputs. The number of neurons in that layer is then gradually increased until the structure best suited to predicting heart disease is found. The selection of the best network structure considers the following evaluation measures inside and outside the sample: the RMSE (Root Mean Square Error) and the MAPE (Mean Absolute Percentage Error), calculated using equations (12) and (13).
RMSE = sqrt( (1/n) Σ_{t=1}^{n} (ŷ_t − y_t)^2 ) (12)
MAPE = (100/n) Σ_{t=1}^{n} | (ŷ_t − y_t) / y_t | (13)
Here n is the number of observations, y_t is the actual value, and ŷ_t is the value estimated by the model. The Salp Swarm Algorithm is used for determining the bias and weight values.
Fig. 6. Basic block diagram of the proposed SSA-NN method.
Fitness Function: The fitness function’s objective is to minimize the
MSE between the target class and the predicted class of the training data.
It is also a function of bias and weight.
min F(w, v) = Σ_{t=1}^{q} [c_t − (w x_t + v)]^2 (14)
where x_t is the input and c_t is the target output.
The Salp Swarm Algorithm is utilized to find optimal bias and weight values with the goal of minimizing the objective function in equation (14).
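To make the optimization target concrete, the sketch below (illustrative only; the 13-feature input size and the activation functions are assumptions, while the 6 hidden neurons follow Table 2) shows how a candidate solution, a flattened vector of weights and biases, can be decoded into a one-hidden-layer network and scored with the MSE of equation (14):

# Fitness function for SSA-NN: MSE of a 1-hidden-layer network (sketch only)
import numpy as np

N_IN, N_HID, N_OUT = 13, 6, 1            # input size is an assumption; 6 neurons per Table 2

def unpack(vec):
    # Split a flat candidate vector into weight matrices and bias vectors
    i = 0
    W1 = vec[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = vec[i:i + N_HID];                              i += N_HID
    W2 = vec[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = vec[i:i + N_OUT]
    return W1, b1, W2, b2

def fitness(vec, X, y):
    """MSE between targets and network outputs for one candidate, as in eq. (14)."""
    W1, b1, W2, b2 = unpack(vec)
    X = np.asarray(X, dtype=float)
    t = np.asarray(y, dtype=float).reshape(-1, 1)
    hidden = np.tanh(X @ W1 + b1)                       # hidden-layer activations
    out = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))     # sigmoid output in [0, 1]
    return float(np.mean((t - out) ** 2))

DIM = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT      # length of a salp position vector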
12. Salp swarm algorithm (SSA)
A salp is a barrel-shaped planktonic tunicate belonging to the family Salpidae. Its body is very similar in texture to that of a jellyfish, and it moves through the water in a similar way, propelling itself by pumping water through its body. Because of the behavior of these creatures, their habitats are difficult to reach, and it is also difficult to keep them in a laboratory environment, so biological research on them is only a starting point. The main idea of the algorithm comes from the swarming behavior of salps: in deep oceans, salps form a chained structure called a salp chain. The reason for this behavior is still unknown, but some researchers believe it is adopted to achieve better locomotion through rapid coordinated changes and foraging.
A mathematical model of this swarm behavior is built around salp chains. The population is divided into two parts, leaders and followers. In the salp chain, the leader is always at the front and the others follow it. The goal in the search space is a food source, called TF, toward which all the salps are directed. The leader salp's position is updated with respect to the target food source using the following formula:
x_j^1 = { TF_j + c1((ub_j − lb_j)c2 + lb_j), c3 ≥ 0
        { TF_j − c1((ub_j − lb_j)c2 + lb_j), c3 < 0 (15)
Here, x_j^1 represents the position of the leading salp in the j-th dimension, TF_j represents the target food source in the j-th dimension, c1, c2, and c3 are random numbers, and ub_j and lb_j are, respectively, the upper and lower boundaries in the j-th dimension. The coefficient c1 balances the exploration (global search) and exploitation (local search) phases of the search, which is why it is the most important parameter of the SSA algorithm; it is written mathematically as [39]:
c1 = 2 e^{−(4m/M)^2} (16)
Here, m represents the current iteration, while M represents the total number of iterations (for example, M = 100). The coefficients c2 and c3 are random numbers drawn uniformly in the range [0, 1]. Each follower salp updates its position according to the following equation [39]:
x_j^i = (1/2) (x_j^i + x_j^{i−1}), ∀ i ≥ 2 (17)
Equation (17) shows that each follower salp follows the salp immediately ahead of it, so that the population forms a chain of salps. Here, x_j^i denotes the position of the i-th follower salp in the j-th dimension. As in all swarm-based optimization techniques, the starting positions of the salps are generated randomly [39].
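For concreteness, the following minimal Python sketch (illustrative only, not the authors' MATLAB implementation; the bounds, toy objective, and the 0.5 threshold applied to c_3 are assumptions) puts equations (15)-(17) together for a single-objective minimization:

```python
import numpy as np

def salp_swarm(objective, lb, ub, n_salps=30, max_iter=100, seed=0):
    """Minimal Salp Swarm Algorithm: leader update per eq. (15),
    c1 decay per eq. (16), follower update per eq. (17)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    positions = rng.uniform(lb, ub, size=(n_salps, dim))      # random initial salps
    fitness = np.array([objective(p) for p in positions])
    best_idx = int(np.argmin(fitness))
    food, food_fit = positions[best_idx].copy(), float(fitness[best_idx])

    for m in range(1, max_iter + 1):
        c1 = 2.0 * np.exp(-(4.0 * m / max_iter) ** 2)          # equation (16)
        for i in range(n_salps):
            if i == 0:                                          # leader salp, eq. (15)
                c2, c3 = rng.random(dim), rng.random(dim)
                step = c1 * (c2 * (ub - lb) + lb)
                # threshold 0.5 because c3 is drawn from [0, 1] (a common SSA choice)
                positions[i] = np.where(c3 >= 0.5, food + step, food - step)
            else:                                               # follower salps, eq. (17)
                positions[i] = 0.5 * (positions[i] + positions[i - 1])
            positions[i] = np.clip(positions[i], lb, ub)
            f = objective(positions[i])
            if f < food_fit:                                    # update food source TF
                food_fit, food = float(f), positions[i].copy()
    return food, food_fit

# Toy usage: minimize the 2-D sphere function
best, best_fit = salp_swarm(lambda p: float(np.sum(p ** 2)), lb=[-5, -5], ub=[5, 5])
print(best, best_fit)
```

In the proposed SSA-NN, the objective passed to such a routine would be the fitness of equation (14) evaluated on the training set, with the candidate vector holding the network weights and biases.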
13. Experimental result
Table 2 lists the simulation parameters used in the experiments.
13.1. Evaluation parameters
The formulas shown below are used to calculate accuracy, precision,
and sensitivity.
Accuracy: Accuracy means what percentage of data is correctly
classified:
Accuracy = (TP + TN) / (TP + TN + FP + FN)   (18)
Precision (P): The percentage of predicted positives that are classified correctly is called precision:
P = TP / (TP + FP)   (19)
Sensitivity (S): The percentage of actual positives that are classified correctly is called sensitivity:
S = TP / (TP + FN)   (20)
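A small helper that evaluates equations (18)-(20) from confusion-matrix counts might look as follows (a minimal sketch; the function name and the zero-division guards are our own):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision and sensitivity from confusion-matrix counts,
    following equations (18)-(20)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, sensitivity
```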
13.2. Simulation results for KNN
Fig. 7 shows the confusion matrix for the K-Nearest Neighbor based approach. The matrix describes the performance of the KNN model in terms of the target class and output class values.
Here, TP = 3, TN = 9, FP = 1, FN = 2
Fig. 7. Confusion matrix for KNN based approach.
Table 2
Simulation parameters.
Salp Swarm Algorithm: number of salps (swarm size) = 30; maximum iterations = 100.
Neural Network: feed-forward network with 1 hidden layer of 6 neurons; scaled conjugate gradient training.
Bayesian Optimization: K = 10 fold cross-validation.
SVM: –
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (3 + 9) / (3 + 9 + 1 + 2) = 80%
Precision = TP / (TP + FP) = 3 / (3 + 1) = 75%
Sensitivity = TP / (TP + FN) = 3 / (3 + 2) = 60%
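Plugging the KNN counts into the metrics helper sketched in Section 13.1 reproduces these figures:

```python
acc, prec, sens = metrics(tp=3, tn=9, fp=1, fn=2)
print(f"accuracy={acc:.1%}, precision={prec:.1%}, sensitivity={sens:.1%}")
# accuracy=80.0%, precision=75.0%, sensitivity=60.0%
```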
13.3. Simulation results for Naïve Bayes
Fig. 8 shows the confusion matrix for the Naïve Bayes based approach. The matrix describes the performance of the NB model in terms of the target class and output class values.
Here, TP = 4, TN = 9, FP = 1, FN = 1
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (4 + 9) / (4 + 9 + 1 + 1) = 86.7%
Precision = TP / (TP + FP) = 4 / (4 + 1) = 80%
Sensitivity = TP / (TP + FN) = 4 / (4 + 1) = 80%
13.4. Simulation results for SSA-NN
Fig. 9 and Fig. 12 show the confusion matrices for the Neural Network based approach and the Salp Swarm Optimized Neural Network based approach, respectively. Fig. 9 describes the performance of the Neural Network model in terms of the target class and output class values.
Here, TP = 3, TN = 9, FP = 1, FN = 2
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (3 + 9) / (3 + 9 + 1 + 2) = 80%
Precision = TP / (TP + FP) = 3 / (3 + 1) = 75%
Sensitivity = TP / (TP + FN) = 3 / (3 + 2) = 60%
Fig. 10 shows the simulation results for the Neural Network based approach, and Fig. 11 shows the mean squared error performance graph for the Neural Network based approach.
Fig. 12 shows the confusion matrix for the Salp Swarm Optimized Neural Network based approach.
Here, TP = 3, TN = 10, FP = 0, FN = 2
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (3 + 10) / (3 + 10 + 0 + 2) = 86.7%
Precision = TP / (TP + FP) = 3 / (3 + 0) = 100%
Sensitivity = TP / (TP + FN) = 3 / (3 + 2) = 60%
Fig. 13 shows the simulation results for the Salp Swarm Optimized Neural Network based approach, and Fig. 14 shows the mean squared error performance graph for the Salp Swarm Optimized Neural Network based approach.
Fig. 9. Confusion matrix for a neural network-based approach.
Fig. 8. Confusion matrix for Naïve Bayes based approach.
Fig. 10. Simulation result.
Fig. 11. Mean squared error performance graph for a neural network-
based approach.
Fig. 12. Confusion matrix for salp swarm optimized neural network-
based approach.
13.5. Simulation results for BO-SVM
The confusion matrices for SVM and BO-SVM are shown in Fig. 15 and Fig. 16, respectively.
For the SVM based approach: TP = 4, TN = 8, FP = 2, FN = 1
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (4 + 8) / (4 + 8 + 2 + 1) = 80%
Precision = TP / (TP + FP) = 4 / (4 + 2) = 66.7%
Sensitivity = TP / (TP + FN) = 4 / (4 + 1) = 80%
For the BO-SVM based approach: TP = 4, TN = 10, FP = 0, FN = 1
Fig. 13. Output for salp swarm optimized neural network-based approach.
Fig. 14. Mean squared error performance graph for salp swarm optimized
neural network-based approach.
Fig. 15. Confusion matrix for SVM based approach.
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (4 + 10) / (4 + 10 + 0 + 1) = 93.3%
Precision = TP / (TP + FP) = 4 / (4 + 0) = 100%
Sensitivity = TP / (TP + FN) = 4 / (4 + 1) = 80%
The objective function model for SVM is shown in Fig. 17 and the function evaluation graph in Fig. 18. The classification process was developed using MATLAB R2018a. In this study, four classification techniques were applied and compared to observe which provided greater accuracy and less error in the predictions related to heart disease. Each confusion matrix yields the accuracy, precision, and sensitivity of the corresponding method. The comparative results of each method are presented in Table 3 for better analysis, and Fig. 19 shows the same comparison as a graph.
14. Conclusion
Classification in data mining reduces the size of the problem, which shortens learning time and simplifies the learned model. This simplification generally makes the model easier to interpret. It also helps to avoid over-fitting, improves the accuracy of the prediction, and makes the classifier easier to understand. In this research work, the KNN and Naïve Bayes methods are used standalone for classification, while the Salp Swarm Algorithm optimizes the bias and weight values of the Neural Network, and the weights and kernel function of the SVM are optimized by Bayesian Optimization. It can be observed from the confusion matrix plots that the optimization methods are very useful in heart disease prediction. In this research, the Bayesian Optimized SVM-based approach outperforms the other methods with a maximum accuracy of 93.3%.
Fig. 17. Objective function model for SVM.
Table 3
Comparative results for various classification methods.
Proposed Method Accuracy Precision Sensitivity
KNN 80% 75% 60%
Naïve Bayes 86.7% 80% 80%
NN 80% 75% 60%
SSA-NN 86.7% 100% 60%
SVM 80% 66.7% 80%
BO-SVM 93.3% 100% 80%
Fig. 16. Confusion matrix for Bayesian optimized-SVM based approach.
Fig. 18. Minimum objective vs. the number of function evaluations graph.
Fig. 19. Comparative results graph representation for various classification methods.
14.1. Future scope
1. From a future perspective, it is necessary to formalize alliances and work together with the institutions that collect data at the forefront of knowledge, so that this work can be applied to real problems at the country level and contribute to society.
2. This work delivers an application that can be used to support medical personnel in medical decision making, although it is currently limited to discrete data variables.
3. In future work, the approach can also be extended to detect heart disease, cancer, arthritis, and other chronic diseases.
4. As the developed system is generalized, it can be used to analyze various datasets in the future.
5. Deep learning algorithms can be used to increase accuracy.
Conflict of interest statement
The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.
References
[1] https://medium.com/analytics-vidhya/heart-disease-prediction-with-ensemble-learning-74d6109beba1.
[2] Felman, A. (2018). Everything you need to know about heart disease. Medical
News Today, https://www.medicalnewstoday.com/articles/237191#types,
accessed date : 05/02/2021.
[3] Thomas J, Princy RT. Human heart disease prediction system using data mining techniques. In: 2016 international conference on circuit, power and computing technologies (ICCPCT). IEEE; 2016, March. p. 1–5.
[4] Shao YE, Hou CD, Chiu CC. Hybrid intelligent modeling schemes for heart disease
classification. Appl Soft Comput 2014;14:47–52.
[5] Yekkala I, Dixit S, Jabbar MA. August. Prediction of heart disease using ensemble
learning and Particle Swarm Optimization. In: 2017 international conference on
smart technologies for smart nation (SmartTechCon). IEEE; 2017. p. 691–8.
[6] Amin SU, Agarwal K, Beg R. April. Genetic neural network-based data mining in
the prediction of heart disease using risk factors. In: 2013 IEEE conference on in­
formation & communication technologies. IEEE; 2013. p. 1227–31.
[7] Tan PN, Chawla S, Ho CK, Bailey J, editors. Advances in knowledge discovery and
data mining, Part II: 16th Pacific-Asia conference, PAKDD 2012, Kuala Lumpur,
Malaysia, may 29-June 1, 2012, Proceedings, Part II, vol. 7302. Springer; 2012.
[8] Chandel K, Kunwar V, Sabitha S, Choudhury T, Mukherjee S. A comparative study
on thyroid disease detection using K-nearest neighbor and Naive Bayes classifica­
tion techniques. CSI Trans. ICT 2016;4(2–4):313–9.
[9] Lépine JP, Briley M. The increasing burden of depression. Neuropsychiatric Dis
Treat 2011;7(Suppl 1):3.
[10] Gielen S, Schuler G, Adams V. Cardiovascular effects of exercise training. K: mo­
lecular Marti; 2010.
[11] Marti K. Stochastic optimization methods, vol. 3. Berlin: Springer; 2005.
[12] Hegazy AE, Makhlouf MA, El-Tawel GS. Improved salp swarm algorithm for feature
selection. J. King Saud Univ. Comput. Inf. Sci. 2020;32(3):335–44.
[13] Du P, Wang J, Hao Y, Niu T, Yang W. A novel hybrid model based on a multi-objective Harris hawks optimization algorithm for daily PM2.5 and PM10 forecasting. Appl Soft Comput 2020;96:106620.
[14] Gao L, Ding Y. Disease prediction via Bayesian hyperparameter optimization and
ensemble learning. BMC Res Notes 2020;13:1–6.
[15] Abusnaina AA, Ahmad S, Jarrar R, Mafarja M. June). Training neural networks
using a salp swarm algorithm for pattern classification. In: Proceedings of the 2nd
international conference on future networks and distributed systems; 2018. p. 1–6.
[16] Yaseen ZM, Faris H, Al-Ansari N. Hybridized extreme learning machine model with
salp swarm algorithm: a novel predictive model for hydrological application.
Complexity; 2020. 2020.
[17] Khourdifi Y, Bahaj M. Heart disease prediction and classification using machine
learning algorithms optimized by particle swarm optimization and ant colony
optimization. Int. J. Intell. Eng. Syst. 2019;12(1):242–52.
[18] Wang J, Gao Y, Chen X. A novel hybrid interval prediction approach based on
modified lower upper bound estimation in combination with multi-objective salp
swarm algorithm for short-term load forecasting. Energies 2018;11(6):1561.
[19] Abualigah L, Shehab M, Alshinwan M, Alabool H. Salp swarm algorithm: a
comprehensive survey. Neural Comput Appl 2019:1–21.
[20] Ahmed S, Mafarja M, Faris H, Aljarah I. March). Feature selection using a salp
swarm algorithm with chaos. In: Proceedings of the 2nd international conference
on intelligent systems. Metaheuristics & Swarm Intelligence; 2018. p. 65–9.
[21] Nayak SK, Rout PK, Jagadev AK, Swarnkar T. Elitism-based multi-objective dif­
ferential evolution for feature selection: a filter approach with an efficient
redundancy measure. J. King Saud Univ. Comput. Inf. Sci. 2020;32(2):174–87.
[22] Bai Y, Zeng B, Li C, Zhang J. An ensemble long short-term memory neural network for hourly PM2.5 concentration forecasting. Chemosphere 2019;222:286–94.
[23] Alani H, Tamimi A, Tamimi N. Cardiovascular co-morbidity in chronic kidney
disease: current knowledge and future research needs. World J Nephrol 2014;3(4):
156.
[24] Gao L, Ding Y. Disease prediction via Bayesian hyperparameter optimization and
ensemble learning. BMC Res Notes 2020;13:1–6.
[25] Wiesław P. Tree-based generational feature selection in medical applications.
Procedia Comput. Sci. 2019;159:2172–8.
[26] Beunza JJ, Puertas E, García-Ovejero E, Villalba G, Condes E, Koleva G,
Landecho MF. Comparison of machine learning algorithms for clinical event pre­
diction (risk of coronary heart disease). J Biomed Inf 2019;97:103257.
[27] Salih SQ, Alsewari AA. A new algorithm for normal and large-scale optimization
problems: nomadic People Optimizer. Neural Comput Appl 2020;32(14):
10359–86.
[28] Almustafa KM. Prediction of heart disease and classifiers' sensitivity analysis. BMC Bioinf 2020;21.
[29] Vembandasamy K, Sasipriya R, Deepa E. Heart disease detection using Naive Bayes
algorithm. Int. J. Innov. Sci. Eng. Technol. 2015;2(9):441–4.
[30] Chaurasia V, Pal S. Data mining approach to detect heart diseases. Int J Adv
Comput Sci Inf Technol 2014;2:56–66.
[31] Zhang Y, Wang S, Ji G. A comprehensive survey on particle swarm optimization
algorithm and its applications. Math Probl Eng 2015;2015:931256. https://doi.
org/10.1155/2015/931256.
[32] Rizk-Allah RM, Hassanien AE, Elhoseny M, Gunasekaran M. A new binary salp
swarm algorithm: development and application for optimization tasks. Neural
Comput Appl 2019;31(5):1641–63.
[33] Patro SP, Padhy N, Chiranjevi D. Ambient assisted living predictive model for
cardiovascular disease prediction using supervised learning. Evol. Intell. 2020:
1–29.
[34] Khan MA, Algarni F. A healthcare monitoring system for the diagnosis of heart
disease in the IoMT cloud environment using MSSO-ANFIS. IEEE Access 2020;8:
122259–69.
[35] Wang J, Liu C, Li L, Li W, Yao L, Li H, Zhang H. A stacking-based model for non-
invasive detection of coronary heart disease. IEEE Access 2020;8:37124–33.
[36] Amin SU, Agarwal K, Beg R. Genetic neural network based data mining in pre­
diction of heart disease using risk factors. In: 2013 IEEE conference on information
& communication technologies. IEEE; 2013, April. p. 1227–31.
[37] Khateeb N, Usman M. Efficient heart disease prediction system using K-nearest
neighbor classification technique. In: proceedings of the international conference
on big data and Internet of thing; 2017, December. p. 21–6.
[38] Rish I. An empirical study of the naive Bayes classifier. In: IJCAI 2001 workshop on
empirical methods in artificial intelligence, vol. 3; 2001, August. p. 41–6. 22.
[39] Mirjalili S, Gandomi AH, Mirjalili SZ, Saremi S, Faris H, Mirjalili SM. Salp Swarm
Algorithm: a bio-inspired optimizer for engineering design problems. Adv Eng
Software 2017;114:163–91.
Sibo Prasad Patro*, Gouri Sankar Nayak, Neelamadhab Padhy
School of Engineering and Technology, Department of Computer Science and Engineering, GIET University, Gunupur-765022, Odisha, India
* Corresponding author.
E-mail addresses: sibofromgiet@giet.edu (S.P. Patro), gsnayakcse@giet.edu (G.S. Nayak), dr.neelamadhab@giet.edu (N. Padhy).
S.P. Patro et al.

More Related Content

Similar to Heart disease prediction by using novel optimization algorithm_ A supervised learning prospective.pdf

EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...mlaij
 
A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...BASMAJUMAASALEHALMOH
 
08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyan08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyanIAESIJEECS
 
Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...IJECEIAES
 
Machine learning approach for predicting heart and diabetes diseases using da...
Machine learning approach for predicting heart and diabetes diseases using da...Machine learning approach for predicting heart and diabetes diseases using da...
Machine learning approach for predicting heart and diabetes diseases using da...IAESIJAI
 
A Heart Disease Prediction Model using Decision Tree
A Heart Disease Prediction Model using Decision TreeA Heart Disease Prediction Model using Decision Tree
A Heart Disease Prediction Model using Decision TreeIOSR Journals
 
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...BASMAJUMAASALEHALMOH
 
IRJET - Effective Heart Disease Prediction using Distinct Machine Learning Te...
IRJET - Effective Heart Disease Prediction using Distinct Machine Learning Te...IRJET - Effective Heart Disease Prediction using Distinct Machine Learning Te...
IRJET - Effective Heart Disease Prediction using Distinct Machine Learning Te...IRJET Journal
 
A Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic RegressionA Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic Regressionijtsrd
 
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...ijcsa
 
A Comparative Analysis of Heart Disease Prediction System Using Machine Learn...
A Comparative Analysis of Heart Disease Prediction System Using Machine Learn...A Comparative Analysis of Heart Disease Prediction System Using Machine Learn...
A Comparative Analysis of Heart Disease Prediction System Using Machine Learn...IRJET Journal
 
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-applicationAcute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-applicationCemal Ardil
 
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACH
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACHENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACH
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACHindexPub
 
An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...BASMAJUMAASALEHALMOH
 
Heart disease prediction using machine learning algorithm
Heart disease prediction using machine learning algorithm Heart disease prediction using machine learning algorithm
Heart disease prediction using machine learning algorithm Kedar Damkondwar
 
CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...
CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...
CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...indexPub
 
An optimal heart disease prediction using chaos game optimization‑based recur...
An optimal heart disease prediction using chaos game optimization‑based recur...An optimal heart disease prediction using chaos game optimization‑based recur...
An optimal heart disease prediction using chaos game optimization‑based recur...BASMAJUMAASALEHALMOH
 
Genetically Optimized Neural Network for Heart Disease Classification
Genetically Optimized Neural Network for Heart Disease ClassificationGenetically Optimized Neural Network for Heart Disease Classification
Genetically Optimized Neural Network for Heart Disease ClassificationIRJET Journal
 
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes ClassifierIRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes ClassifierIRJET Journal
 
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...BASMAJUMAASALEHALMOH
 

Similar to Heart disease prediction by using novel optimization algorithm_ A supervised learning prospective.pdf (20)

EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
 
A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...
 
08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyan08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyan
 
Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...
 
Machine learning approach for predicting heart and diabetes diseases using da...
Machine learning approach for predicting heart and diabetes diseases using da...Machine learning approach for predicting heart and diabetes diseases using da...
Machine learning approach for predicting heart and diabetes diseases using da...
 
A Heart Disease Prediction Model using Decision Tree
A Heart Disease Prediction Model using Decision TreeA Heart Disease Prediction Model using Decision Tree
A Heart Disease Prediction Model using Decision Tree
 
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
 
IRJET - Effective Heart Disease Prediction using Distinct Machine Learning Te...
IRJET - Effective Heart Disease Prediction using Distinct Machine Learning Te...IRJET - Effective Heart Disease Prediction using Distinct Machine Learning Te...
IRJET - Effective Heart Disease Prediction using Distinct Machine Learning Te...
 
A Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic RegressionA Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic Regression
 
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...
 
A Comparative Analysis of Heart Disease Prediction System Using Machine Learn...
A Comparative Analysis of Heart Disease Prediction System Using Machine Learn...A Comparative Analysis of Heart Disease Prediction System Using Machine Learn...
A Comparative Analysis of Heart Disease Prediction System Using Machine Learn...
 
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-applicationAcute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
 
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACH
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACHENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACH
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACH
 
An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...
 
Heart disease prediction using machine learning algorithm
Heart disease prediction using machine learning algorithm Heart disease prediction using machine learning algorithm
Heart disease prediction using machine learning algorithm
 
CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...
CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...
CARDIOVASCULAR DISEASE DETECTION USING MACHINE LEARNING AND RISK CLASSIFICATI...
 
An optimal heart disease prediction using chaos game optimization‑based recur...
An optimal heart disease prediction using chaos game optimization‑based recur...An optimal heart disease prediction using chaos game optimization‑based recur...
An optimal heart disease prediction using chaos game optimization‑based recur...
 
Genetically Optimized Neural Network for Heart Disease Classification
Genetically Optimized Neural Network for Heart Disease ClassificationGenetically Optimized Neural Network for Heart Disease Classification
Genetically Optimized Neural Network for Heart Disease Classification
 
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes ClassifierIRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
 
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...
 

More from BASMAJUMAASALEHALMOH

A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...
A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...
A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...BASMAJUMAASALEHALMOH
 
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...BASMAJUMAASALEHALMOH
 
An ensemble deep learning classifier of entropy convolutional neural network ...
An ensemble deep learning classifier of entropy convolutional neural network ...An ensemble deep learning classifier of entropy convolutional neural network ...
An ensemble deep learning classifier of entropy convolutional neural network ...BASMAJUMAASALEHALMOH
 
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...BASMAJUMAASALEHALMOH
 
Hybrid CNN and LSTM Network For Heart Disease Prediction
Hybrid CNN and LSTM Network For Heart Disease PredictionHybrid CNN and LSTM Network For Heart Disease Prediction
Hybrid CNN and LSTM Network For Heart Disease PredictionBASMAJUMAASALEHALMOH
 
Wang2022_Article_ArtificialIntelligenceForPredi.pdf
Wang2022_Article_ArtificialIntelligenceForPredi.pdfWang2022_Article_ArtificialIntelligenceForPredi.pdf
Wang2022_Article_ArtificialIntelligenceForPredi.pdfBASMAJUMAASALEHALMOH
 

More from BASMAJUMAASALEHALMOH (6)

A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...
A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...
A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...
 
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
 
An ensemble deep learning classifier of entropy convolutional neural network ...
An ensemble deep learning classifier of entropy convolutional neural network ...An ensemble deep learning classifier of entropy convolutional neural network ...
An ensemble deep learning classifier of entropy convolutional neural network ...
 
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...
 
Hybrid CNN and LSTM Network For Heart Disease Prediction
Hybrid CNN and LSTM Network For Heart Disease PredictionHybrid CNN and LSTM Network For Heart Disease Prediction
Hybrid CNN and LSTM Network For Heart Disease Prediction
 
Wang2022_Article_ArtificialIntelligenceForPredi.pdf
Wang2022_Article_ArtificialIntelligenceForPredi.pdfWang2022_Article_ArtificialIntelligenceForPredi.pdf
Wang2022_Article_ArtificialIntelligenceForPredi.pdf
 

Recently uploaded

Call Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service GurgaonCall Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service GurgaonCall Girls Service Gurgaon
 
Call Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any TimeCall Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any Timedelhimodelshub1
 
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012Call Girls Service Gurgaon
 
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...delhimodelshub1
 
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking Models
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking ModelsDehradun Call Girls Service 7017441440 Real Russian Girls Looking Models
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking Modelsindiancallgirl4rent
 
Hot Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In Chandigarh
Hot  Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In ChandigarhHot  Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In Chandigarh
Hot Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In ChandigarhVip call girls In Chandigarh
 
Call Girls Kukatpally 7001305949 all area service COD available Any Time
Call Girls Kukatpally 7001305949 all area service COD available Any TimeCall Girls Kukatpally 7001305949 all area service COD available Any Time
Call Girls Kukatpally 7001305949 all area service COD available Any Timedelhimodelshub1
 
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...High Profile Call Girls Chandigarh Aarushi
 
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591adityaroy0215
 
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabad
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service HyderabadCall Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabad
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabaddelhimodelshub1
 
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsi
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsiindian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsi
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana TulsiHigh Profile Call Girls Chandigarh Aarushi
 
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7Miss joya
 
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service DehradunDehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service DehradunNiamh verma
 
Call Girls LB Nagar 7001305949 all area service COD available Any Time
Call Girls LB Nagar 7001305949 all area service COD available Any TimeCall Girls LB Nagar 7001305949 all area service COD available Any Time
Call Girls LB Nagar 7001305949 all area service COD available Any Timedelhimodelshub1
 
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service HyderabadCall Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabaddelhimodelshub1
 
Basics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptxBasics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptxAyush Gupta
 

Recently uploaded (20)

Call Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service GurgaonCall Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
 
Call Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any TimeCall Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any Time
 
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
 
College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...
College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...
College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...
 
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
College Call Girls Hyderabad Sakshi 9907093804 Independent Escort Service Hyd...
 
Call Girl Dehradun Aashi 🔝 7001305949 🔝 💃 Independent Escort Service Dehradun
Call Girl Dehradun Aashi 🔝 7001305949 🔝 💃 Independent Escort Service DehradunCall Girl Dehradun Aashi 🔝 7001305949 🔝 💃 Independent Escort Service Dehradun
Call Girl Dehradun Aashi 🔝 7001305949 🔝 💃 Independent Escort Service Dehradun
 
Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝
 
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking Models
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking ModelsDehradun Call Girls Service 7017441440 Real Russian Girls Looking Models
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking Models
 
VIP Call Girls Lucknow Isha 🔝 9719455033 🔝 🎶 Independent Escort Service Lucknow
VIP Call Girls Lucknow Isha 🔝 9719455033 🔝 🎶 Independent Escort Service LucknowVIP Call Girls Lucknow Isha 🔝 9719455033 🔝 🎶 Independent Escort Service Lucknow
VIP Call Girls Lucknow Isha 🔝 9719455033 🔝 🎶 Independent Escort Service Lucknow
 
Hot Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In Chandigarh
Hot  Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In ChandigarhHot  Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In Chandigarh
Hot Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In Chandigarh
 
Call Girls Kukatpally 7001305949 all area service COD available Any Time
Call Girls Kukatpally 7001305949 all area service COD available Any TimeCall Girls Kukatpally 7001305949 all area service COD available Any Time
Call Girls Kukatpally 7001305949 all area service COD available Any Time
 
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
 
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
VIP Call Girl Sector 88 Gurgaon Delhi Just Call Me 9899900591
 
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabad
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service HyderabadCall Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabad
Call Girls in Hyderabad Lavanya 9907093804 Independent Escort Service Hyderabad
 
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsi
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsiindian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsi
indian Call Girl Panchkula ❤️🍑 9907093804 Low Rate Call Girls Ludhiana Tulsi
 
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
 
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service DehradunDehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
 
Call Girls LB Nagar 7001305949 all area service COD available Any Time
Call Girls LB Nagar 7001305949 all area service COD available Any TimeCall Girls LB Nagar 7001305949 all area service COD available Any Time
Call Girls LB Nagar 7001305949 all area service COD available Any Time
 
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service HyderabadCall Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabad
Call Girls Hyderabad Krisha 9907093804 Independent Escort Service Hyderabad
 
Basics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptxBasics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptx
 

Heart disease prediction by using novel optimization algorithm_ A supervised learning prospective.pdf

  • 1. Informatics in Medicine Unlocked 26 (2021) 100696 Available online 11 August 2021 2352-9148/© 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Heart disease prediction by using novel optimization algorithm: A supervised learning prospective A R T I C L E I N F O Keywords Supervised learning Salp swarm optimization algorithm Heart attack Prediction Neural network A B S T R A C T Data analysis in medicine is becoming more and more frequent to clarify diagnoses, refine research methods, and plan appropriate equipment supplies according to the importance of the pathologies that appear. Artificial in­ telligence offers software solutions that are required to analyze the present data for optimal prediction of results. A system model is capable of several data processing algorithms for the classification of heart disease. This research work is particularly interested in the category of data. The classification allows us to obtain a prediction model from training data and test data. These data are screened by a classification algorithm that produces a new model capable of detailed data, possibly having the same classes of data through the combination of mathe­ matical tools and computer methods. To analyze the present data to predict optimal results, we need to use the optimization technique. This research work aims to design a framework for heart disease prediction by using major risk factors based on different classifier algorithms such as Naïve Bayes (NB), Bayesian Optimized Support Vector Machine (BO-SVM), K-Nearest Neighbors (KNN), and Salp Swarm Optimized Neural Network (SSA-NN). This research is carried out for the effective diagnosis of heart disease using the heart disease dataset available on the UCI Machine Repository. The highest performance was obtained using BO-SVM (accuracy = 93.3%, preci­ sion = 100%, sensitivity = 80%) followed by SSA-NN with (accuracy = 86.7%, precision = 100%, sensitivity = 60%) respectively. The results reveal that the proposed novel optimized algorithm can provide an effective healthcare monitoring system for the early prediction of heart disease. 1. Introduction A disease in the human body is an unnatural medical condition. It affects negatively the human body organism’s functional state. It is generally associated with few signs of illness in the patient body. Ac­ cording to the World Health Organization (WHO), in the last 15 years, an estimated 17 million people die each year from cardiovascular dis­ ease, particularly heart attacks and strokes [1]. Heart disease and stroke are the biggest killers. To predict heart disease, Machine Learning can be used for identifying unseen patterns and providing some clinical insights that will assist the physicians in planning and providing care. (see Fig. 8–19) Heart disease refers to a series of conditions that include the heart, vessels, muscles, valves, or internal electrical pathways responsible for muscle contraction. According to the Centers for Disease Control and Prevention(CDC), heart disease is one of the leading causes of death in India, the UK, the US, Canada, and Australia. Cardiovascular diseases (CVDs) are a leading cause of clinical (i.e., death and disability), health, and economic burden globally, accounting for approximately 31% (17.9 million) of total deaths each year, One in four deaths in the USA occurs as a result of heart disease [2]. 
Heart disease is common among both men and women in most countries around the world. Therefore, people should consider heart disease risk factors. Although it plays a genetic role, some lifestyle factors significantly affect heart disease. The known risk factors for heart disease; radiation therapy for age, gender, family history, smoking, some chemotherapy drugs and cancer, malnutrition, high blood pressure, high blood cholesterol levels, diabetes, obesity, physical mobility, stress, and poor hygiene [3]. These are the various risk factors in which the patient’s exposure towards developing a CVD. In contrast, the removal or improvement of this factor decreases this risk. This interpretation suggests the causality between the factor and the illness, which means that the risk factor precedes the disease (the notion of anteriority). Correction of the factor will cause the disease (the idea of reversibility) to decrease its occurrence. Of course, it must be recognized in several different populations and offer a plausible phys­ iopathological explanation of the disease. Strictly speaking, when there is no direct causal relationship, it is a “risk marker,” a witness to a process (e.g., the elevation of microalbuminuria, elevation of C-reactive protein “CRP”). We will see the main heart disease risk factors such as physiological factors (age, sex, and menopausal status), lifestyle factors (smoking, physical activity, alcohol, stress), metabolic syndrome factors (insulin resistance), dyslipidemia, abdominal obesity, high blood pres­ sure) and dietary factors. A heart disease risk factor is defined as a factor in which the patient’s exposure to this factor increases the risk of developing a CVD. In contrast, the removal or improvement of this factor decreases this risk. The risk factor’s importance is defined by the association’s strength with the disease (expressed by the relative risk observed in the exposed subjects compared to the unexposed) and the Contents lists available at ScienceDirect Informatics in Medicine Unlocked journal homepage: www.elsevier.com/locate/imu https://doi.org/10.1016/j.imu.2021.100696 Received 14 May 2021; Received in revised form 20 July 2021; Accepted 7 August 2021
  • 2. Informatics in Medicine Unlocked 26 (2021) 100696 2 gradual association (parallel to the risk factor). When the dataset contains various features, some of them are not useful and cause bad results. Therefore, the main aim of this research is to use a combined method to improve the classification and better feature selection which will lead to a better diagnosis of heart disease. In this study, an imperialist competitive algorithm with a meta-heuristic approach is used to optimize the selection of important features in heart disease. This algorithm can provide a more optimal response for feature selection towards genetic and other optimization algorithms. After data pre-processing of the dataset spitted into two sets training and testing set. The training set with 80% and the testing set with 20%. After feature extraction, the features have been supplied to K-nearest neighbor (KNN), Naïve Bayes Classifier, Support Vector Machine for classification proposes. Therefore, using the combination of these four methods can lead to improving the result of heart disease diagnosis and their different aspects. In other words, we are trying to improve classification accuracy on heart disease diagnosis. The proposed K-nearest neighbor (KNN), Naïve Bayes Classifier, Support Vector Machine classifier idea has never been done before. According to the simulation result section 13 the proposed method has achieved a better result in comparison with other algorithms, which has had two advantages, first decreasing the number of features, second, increasing classification accuracy. Objectives of this research are as follows: • Data collection from new features about heart disease. • Prediction and classification of incidence of heart disease using the proposed method. • Using new feature selection algorithms for the first time. • Providing a new combined approach with higher accuracy The paper is organized as follows: Section 2 contains a literature review, section 3 contains design parameters, Section 4 contains the optimization methods, section 5 contains the research gap, the proposed methodology is presented in section 6, data analysis and classification methods are presented in section 7,8 and 9, finally experimental results and conclusion are presented in the last section. 2. Literature review Any classification regarding Feature selection plays a significant role. Later, Swarm algorithms are suggested and they proved a valuable performance for feature selection. There are some studies on the clas­ sification of heart disease in the literature. One of them is the study of hybrid smart modeling schemes for the classification of heart disease of Shao et al. [4]. This paper uses 13 risk factors for heart disease predic­ tion. This study, which differs from existing approaches, proposes a novel hybrid framework to achieve various risk factors. This hybrid framework contains three methods; Multivariate Adaptive Regression (MAR), Logistic Regression (LR), and Artificial Neural Network (ANN). Initially, the encoded values of risk factors are reduced by using LR and MAR. Then, the rest of the encoded factors are used for the training of the ANN. The simulation results show that the hybrid approach out­ performs the conventional single-stage neural network [4]. The study of the use of data mining techniques in the prediction of heart diseases by Priyanka et al. 
compared the performances of Naïve Bayes and Decision tree algorithms, and the decision tree algorithm yielded much more successful results than Naïve Bayes, which gave an accuracy rate of 98.03%–82.35% [5]. Yekkala et al. [5] used Particle Swarm Optimiza­ tion (PSO) in conjunction with particle methods (Random Forest, Ada­ Boost, and Bagged Tree) to more accurately predict the results. The Heart Stalog dataset has 270 samples and 14 attributes, taken from the UCI database [5]. The data has already been processed, and PSO is used as a feature selection method to delete unnecessary and missing data. The significant features have been tested on the community classifier for various performance measures and steps as follows. After loading the data collection, after using PSO’s data to pick the element, removed the cleaning technique used for data after removing useless functions. Powerful features continued, and the AdaBoost, Bagging, and Random Forest. The two factor’s importance to full features. Eventually, we measured the performance of each algorithm. As a result, Bagged Tree performed 100%, Random Forest 90.37%, and AdaBoost 88.89%. Ac­ cording to test results, Yekkal et al. [5] proved that using Bagging Trees on PSO will improve learning accuracy in predicting heart disease. Amin et al. [6] show a heart disease prediction model using a genetic algo­ rithm, neural network, Naïve Bayes, Bagging Trees, Decision Tree, Core Density, and SVM. Learning is faster, more stable, and accurate compared to back-propagation. Collected risk factors data of 50 patients and the hybrid model resulted in 96% training accuracy and 89% test accuracy. Amin and his colleagues then developed the system using the hybrid fuzzy and k-nearest neighbor approach to predict heart diseases; in another system, using the neural network community was used with an accuracy of 89.01% in the diagnosis of heart disease. This hybrid system’s advantage is to help patients reduce their cost and time and control themselves for medical examinations before their heart disease and side effects. The researchers compared the algorithms used ac­ cording to the confusion matrix. In the end, understood that the J48 recorded the highest Accuracy at 99%. The K-nearest neighbor algo­ rithm is simple, but it can give impressive results. Factor’s importance about a classification method widely used in many fields and is also found among the top 10 data mining algorithms [8]. Typically, houses that are close to each other have similar characteristics. We can group them and give them a classification. The algorithm uses this same logic to try to group the elements that are close to each other. There are two basic types of data mining techniques; Predictive methods and descriptive methods [7]. • Descriptive Methods: These methods identify the current situation, describes the common belongings of the data in the dataset, and emphasize the understanding and interpretation of the feature. • Predictive Methods: These methods by learning the past simulate the feature. They use data with the help of known results to develop a model that could predict the values of other data. The stress risk factor seems to be complicated by a depressive state that follows the myocardial infarction. According to some studies, the incidence of depression is higher in patients with CVD than in those without CVD. Several studies agree that after myocardial infarction, a depressive state increases the recurrence risk by two years following the heart disease event [9]. 
Various approaches exist to explain the corre­ lation between psychosocial factors (stress, anxiety, and depression) and CVD. These factors increase catecholamine synthesis with their conse­ quences on the different metabolisms, blood pressure, and heart rate [10]. According to Ref. [11], Data Mining has three main axes: Statistics, Artificial Intelligence (AI, including Machine Learning), and Databases. Although these three axes are well specified, it is difficult to give a single definition for Data Mining. However, the description used is probably stated in Ref. [11], where it is mentioned that “Data Mining is the non-trivial process of extracting information from the data that is pre­ sent implicitly, previously unknown and potentially useful for the user.” This information is present in the data as patterns that are very useful when applied to solve problems in a particular context. Ah. E. Hegazy et al. [12] have highlighted how to improve basic SSA structure to enhance accuracy, convergence, speed, and reliability. In this research, the author presented a new control parameter to adjust the existing solution and suggested a new name to the improved salp swarm algo­ rithm(ISSA). This algorithm is used for testing the feature selection task. The combination of the ISSA algorithm and K-nearest neighbor classifier is used for feature selection. In this work, they used 23 UCI datasets for finding the performance of the ISSA algorithm. ISSA is used as a wrapper feature selection with the combination of KNN classifier as a fitness function. The researcher compared ISSA with the other four swarm methods. They got superior results than the previous feature reduction S.P. Patro et al.
and classification accuracy. For ISSA, the average classification accuracy is 0.8422. Pei Du et al. [13] implemented a model for accurate and reliable air pollutant forecasting. Air pollution seriously affects the living environment of human beings and can even endanger their lives. To overcome this problem, the researchers proposed a novel hybrid model in which a robust data preprocessing step decomposes the original time series into different modes containing low-frequency and high-frequency components. For air pollution series prediction, the ELM model parameters are tuned to obtain high forecasting accuracy and consistency. The authors used two experiments, PM2.5 forecasting and PM10 forecasting, to illustrate the hybrid model's superiority. Liyuan Gao et al. [14] highlighted that early disease prediction and diagnosis are most important for improving patients' survival, and that it is critically important to recognize the patient's condition and its predictive characteristics. The authors provided a comparative analysis of various machine learning systems; by sampling with replacement, they estimate the standard deviation of the data. The research emphasizes analyzing and comparing machine learning strategies to predict breast cancer and heart disease and to identify early high-risk features. The results show that the Bayesian hyperparameter optimization model is better than random search and grid search methods. Using the breast cancer diagnosis dataset with an Extreme Gradient Boosting model, they obtained 94.74% accuracy, and for the heart disease dataset they obtained 73.50%. Ahmed A. Abusnaina et al. [15] highlighted that pattern classification is the most popular application of neural networks and that training the neural network is the critical step. They noted that the back-propagation algorithm has a slow convergence rate, and the Salp Swarm Algorithm (SSA), which gives good performance on optimization problems, is used to overcome this. In this work, they proposed using SSA to optimize the weight coefficients of neural networks for pattern classification, using datasets from the UCI machine learning repository; the approach adjusts the connection weights of the NN with the SSA algorithm. Zaher Mundher Yaseen et al. [16] highlighted that the classical extreme learning machine (ELM) is a non-tuned model: it relies on a random initialization procedure that is not efficient at converging to outstanding performance on local problems. In this work, the researchers investigated forecasting of the monthly flow of the Tigris river at Baghdad, hybridizing the ELM with the Salp Swarm Algorithm. They took a twenty-year river flow time series, and the results were evaluated using graphical presentations and several statistical measures. The SSA-ELM model achieved an acceptable level of augmentation in the absolute metrics, 8.4% and 13.1% for RMSE and MAE, respectively. Youness Khourdifi et al. [17] highlighted that machine learning is one of the key areas for predicting heart disease.
The researchers highlighted that optimization algorithms have the advantage of dealing with complex non-linear problems with high adaptability and flexibility. In this research, to improve the quality of heart disease classification, a method named Fast Correlation-Based Feature Selection (FCBF) is used to filter redundant features. The researchers used various classification algorithms, such as support vector machine, k-nearest neighbor, naïve Bayes, and random forest, optimized by particle swarm optimization combined with an ant colony optimization process. With this mixed approach they classified the heart disease dataset and, with the help of the proposed optimized models including FCBF, PSO, and ACO, obtained a maximum classification accuracy of 99.65% with KNN and 99.6% with RF. Jiyang Wang et al. [18] proposed that reliable and effective load forecasting is one of the important factors for operation decisions and power system planning. The safety and economic operation of the power system are directly affected by forecasting accuracy and, due to the complexity and instability of power load, forecasting accuracy is a very challenging issue. Hence the researchers proposed a novel hybrid forecasting system that embeds a multi-objective optimization module. A detailed review of the salp swarm algorithm and its critical characteristics was given by Abualigah et al. [19]. SSA is an effective meta-heuristic optimization algorithm and can be used in machine learning, wireless networking, engineering design, energy storage, and image processing. The authors carried out a comprehensive review of the various SSA types, including the chaotic salp swarm algorithm, hybridizations of the salp swarm algorithm, the binary salp swarm algorithm, and others. They also highlighted limitations of the salp swarm algorithm, such as limited control over multimodal search strategies. Finally, the review notes that SSA has a few advantages: speed, simplicity, and easy hybridization with other optimization algorithms. Sobhi Ahmed et al. [20] examined the effect of data dimensionality on classification performance. With high-dimensional data, a classifier's computational time becomes large; feature selection is the best solution to avoid this. The technique aims to reduce the number of features by removing irrelevant, noisy, and redundant data. The authors highlighted that metaheuristic algorithms are well suited to this type of problem and proposed a chaotic version of the Salp Swarm Algorithm, using four different chaotic maps to control the balance between exploration and exploitation. They evaluated the method on twelve well-known datasets from the UCI data repository, using a K-NN classifier as the evaluator for wrapper feature selection, and divided each dataset into 80% for training and 20% for testing. Subrat Kumar Nayak et al. [21] highlighted how to deal with real-world data, which are often complex; feature selection plays an important role in handling such data. In this research, the authors presented a filter approach using a multi-objective differential evolution algorithm for feature selection, applied to handle the duplicate and unwanted features of a given dataset.
The researchers considered two objectives: removing redundant features and removing erroneous features, evaluated in terms of their relevance with respect to the other features and the class labels. In this work, the selected feature subsets from 23 benchmark datasets were tested using 10-fold cross-validation with four well-known classifiers to obtain the results. Yun Bai et al. [22] highlighted PM2.5 concentration forecasting, which is useful and essential for protecting public health. In this research, the authors proposed an ensemble long short-term memory neural network (E-LSTM). The proposed model is implemented in three steps: multimodal feature extraction, multimodal feature learning, and integration. Real datasets were used, collected from environmental monitoring stations in Beijing, China. They developed LSTMs in different modes within E-LSTM and compared the model against a single LSTM and a feed-forward neural network, reporting a mean absolute error of 19.604%, a root mean square error of 12.077, and a correlation coefficient of 0.994. Alani, H., et al. [23] highlighted that chronic kidney disease (CKD) leads to high mortality rates and high patient expenditure. CKD may be one of the critical factors leading to heart disease, and it is a significant cause of death among renal transplant patients. The associated risk factors are mainly uremia-specific and increase in prevalence as kidney function declines; they include hemoglobin abnormalities, abnormal bone and mineral metabolism, and albuminuria. Progress is limited by a lack of diagnostic screening tools, insufficient sensitivity and specificity to make them reliable, and the need for more RCT-quality evidence to guide intervention. Jacqueline O'Toole et al. [24] observed that if heart disease risk can be identified earlier in young adults, the future CVD burden can be reduced; in young adults, CVD risk is mostly discovered when they present with chest pain. This research analyzed lifestyle habits and CVD risks in young adults, using data from 26 young adults in the age group of 39-40 years. The survey shows a low risk of developing heart disease
within ten years because of their age, although half of the young adults were identified with two or more CVD risk factors, and most of the adults suffered heart disease risk due to being sedentary and overweight. Wiesław [25] approached feature selection as the initial part of most knowledge discovery work. In this research, a tree-based generational feature selection application is presented for medical data analysis. The basic approach is to estimate the importance of attributes extracted from a given tree structure through the recursive generation of feature sets: selected features are removed from the dataset, a next generation is created with the remaining important feature set, and the process continues until the most important remaining feature is no better than a random value. The researchers applied this process to real-world medical datasets, including the Colon dataset and the Lung dataset. In this work, they found that almost all truly relevant features (19 of them) were identified, with average accuracy increasing from 0.68 to 0.74; only one relevant feature was not identified as important. Juan-Jose Beunza et al. [26] examined the use of supervised machine learning algorithms for predicting clinical events in terms of validity and accuracy. In this work, the data were drawn from the Framingham heart study, which contains 4240 observations. They focused on risk factors of heart disease in combination with data mining, and used different machine learning algorithms along with RapidMiner and R-Studio to analyze the data. A neural network model, implemented after omitting all missing values, reached an AUC of 0.71; using the same data in RapidMiner with support vector machines gave an AUC of 0.75. Sinan Q. Salih et al. [27] highlighted that metaheuristic algorithms are well suited to solving different optimization and engineering problems, but such algorithms have a few issues concerning their global search and local search capabilities. The researchers developed an algorithm called the Nomadic People Optimizer (NPO), which simulates the nature of nomadic people's movement: how they search for food and how they migrate over the years. The research focused on a multi-swarm approach, in which several clans exist and each clan finds its best place. The algorithm was validated on 36 unconstrained benchmark functions, and the outcome of the research was the distinctive solution approach of the NPO algorithm. Khaled Mohamad Almustafa et al. [28] highlighted that heart disease has become one of the most common diseases, and its early diagnosis is challenging for healthcare providers. In this research, various classifiers were implemented to classify the heart disease dataset and predict heart disease with minimal attributes. The dataset, collected from Cleveland and Switzerland, contains 76 characteristics with a class attribute for 1025 patients; only 14 of the 76 attributes were used in this work. The researchers used various algorithms, including k-nearest neighbor, decision tree, Naïve Bayes, SVM, and stochastic gradient descent, for classifying and predicting heart disease cases. Using these classification algorithms, they obtained accuracies of 99.70%, 97.26%, and 98.04% for KNN (k = 1), JRip, and the decision tree, respectively. K. Vembandasamy et al.
[29] approached health care as an essential factor in human life; the health care business has become notable in medical science. The health care industry holds a large amount of patient data, and various data mining techniques are applied to these data to detect heart disease in patients. However, plain data mining could not obtain significant test results from the hidden information, so the researchers proposed a system using data mining algorithms to classify the data and detect heart disease. In this research, the Naïve Bayes algorithm was used to diagnose heart disease patients, with the experiments implemented in the Weka tool. The proposed Naïve Bayes model classified 74% of the input instances correctly and exhibited an average precision of 71%, an average recall of 74%, and an F-measure of 71.2%. Vikas Chaurasia et al. [30] discussed heart disease as a major cause of death, with most of these deaths occurring in low- and middle-income countries. The healthcare industry collects enormous amounts of heart disease data, but these data are not well mined to discover the hidden information needed for effective decision-making. In this research, the authors highlighted different knowledge discovery concepts in databases using various data mining techniques to help medical practitioners make effective decisions. The primary aim of the work was to predict the presence of heart disease more accurately with fewer attributes. The researchers took only 11 features and used three classifiers, the J48 Decision Tree, Naïve Bayes, and the Bagging algorithm, to predict patients' diagnoses. The highest accuracy obtained was 85.03% and the lowest 82.31%, with the remaining algorithm yielding an average accuracy of 84.35%. Yudong Zhang et al. [31] highlighted that particle swarm optimization is treated as a heuristic global optimization method and is one of the most commonly used optimization techniques. In this research, the authors presented a comprehensive investigation of PSO with its advances and modifications, such as quantum-behaved PSO, chaotic PSO, and fuzzy PSO, and surveyed various applications of PSO in different areas: automation control systems, operations research, communication theory, fuel and energy, and others. The work covers various aspects, including PSO modifications, extensions of PSO, hybridization of PSO, parallel implementations of PSO, and theoretical analysis of PSO. Rizk-Allah et al. [32] focused on the salp swarm algorithm, a recent meta-heuristic that imitates the navigating and foraging behavior of salps. This research proposes a new binary version of SSA, named BSSA, and compares four different variants of transfer functions for solving several global optimization problems. Nonparametric statistical tests, namely Wilcoxon's rank-sum at a 5% significance level, were carried out to judge the statistical significance of the results obtained by the different algorithms. In this work, the results of BSSA were better than those of the other algorithms. Patro, S. P., et al. [33] highlighted the challenge of providing health and social care to aging populations. Heart disease and chronic illnesses are particularly dangerous for elderly people and can lead to a heart attack without any warning signs, and it is difficult for doctors to identify the patient's status in time.
In this regard, the researchers proposed a model that addresses these challenges by monitoring real-time patient health data remotely. A framework was proposed for predicting heart disease using major risk factors based on different classifier algorithms, including Naïve Bayes, K-Nearest Neighbors, Support Vector Machine, Lasso, and Ridge regression. In addition, principal component analysis and linear discriminant analysis were used for data classification. They used an open-source dataset with 14 attributes. After successful implementation, the support vector machine provided 92% accuracy and an F1 score of 85%. Khan, M. A., et al. [34] discussed IoT applications in manufacturing, agriculture, healthcare, and other domains, focusing on wearable devices in the health monitoring system known as the Internet of Medical Things (IoMT). With its help, clinical data analysis can support early detection of heart disease and thereby reduce mortality. In this research, they investigated the key characteristics of heart disease prediction using machine learning techniques. To improve the prediction accuracy of an IoMT framework designed to diagnose heart disease, they used modified salp swarm optimization (MSSO) with an adaptive neuro-fuzzy inference system (ANFIS). The proposed MSSO-ANFIS prediction model obtains an accuracy of 99.45 with a precision of 96.54, which is higher than the other approaches. Wang, J., et al. [35] considered the coronary arteriography (CAG) approach for the diagnosis of coronary heart disease (CHD), using machine learning with multiple algorithms and feature selection methods, as is increasingly common in the health care industry. In this research, they implemented two-level stacking with a base level and a meta level, where the predictions of the base-level classifiers are used as the input of the meta level. They used the Z-Alizadeh Sani CHD dataset, consisting of 2020 CAG cases. The model obtained an accuracy, specificity, and sensitivity of 95.3%, 94.44%, and 95.84%, respectively.
The common goal of all the above-mentioned techniques is to classify heart disease using hybrid classification techniques. Much of the existing research is carried out using classification or optimization techniques alone. The approaches proposed here aim to achieve the desired results by combining different optimization techniques with various machine learning algorithms.
In this research, different classifier approaches are proposed, combining ensemble-based machine learning algorithms to identify redundant features and improve the accuracy and quality of heart disease classification. We present a comparative analysis of heart disease dataset classification using various classification algorithms, all of which are commonly used in similar heart disease-related research. The classifiers were first used with 10-fold cross-validation; we then study the performance of the Bayesian Optimized Support Vector Machine (BO-SVM) and K-Nearest Neighbors (KNN) classifiers using various training sets instead of the 10-fold cross-validation method. We then apply different classifier algorithms, namely Naïve Bayes (NB), Bayesian Optimized Support Vector Machine (BO-SVM), K-Nearest Neighbors (KNN), and Salp Swarm Optimized Neural Network (SSA-NN). The main goal of this research is to find the best accuracy for the prediction of heart disease using major risk factors, based on classifier algorithms such as Bayesian Optimized Support Vector Machine (BO-SVM) and K-Nearest Neighbors (KNN).
3. Design parameters
Parameters or design variables are controlled factors that influence performance. They can be of various natures: geometric dimensions, material properties, structural choices, and so on. They may be quantitative or qualitative, continuous or discrete. The selection and the number of parameters also determine the definition of the optimization problem. Adding more parameters enlarges the search space, but the optimization process then takes longer. For example, parameters may describe a suitable geometric shape chosen to ensure the validity of the retained model and its proper functioning.
4. Optimization methods
4.1. Continuous optimization
Continuous optimization is addressed by two families of methods, linear and non-linear.
• Integer linear optimization studies linear optimization problems in which some or all variables are constrained to take integer values.
• Non-linear optimization covers the general case in which the objective or the constraints (or both) contain non-linear, possibly non-convex, parts.
4.2. Combinatorial optimization
Combinatorial optimization consists of finding the best solution among a finite number of choices; in other words, minimizing a function, with or without constraints, over a finite set of possibilities. When the number of possible combinations grows exponentially with the problem's size, the computation time rapidly becomes critical. A general optimization problem consists of finding a solution s in a search space X that optimizes the value of the cost function f. Formally, we seek s* ∈ X such that
f(s*) ≤ f(s) for all s ∈ X (1)
Such a solution s* is called an optimal solution or a global optimum.
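To make the notion of a global optimum in equation (1) concrete, here is a minimal Python sketch of exhaustive combinatorial search over a small finite space; the cost function, the per-feature "relevance" scores, and the feature count are made-up illustrations, not values from this work.

```python
from itertools import product

# Hypothetical per-feature relevance scores (illustrative only).
relevance = [0.9, 0.1, 0.6, 0.3]

def cost(s):
    # s is a tuple of 0/1 flags, one per feature: penalize using many features
    # while rewarding the total relevance of the features kept.
    n_used = sum(s)
    gain = sum(r for r, flag in zip(relevance, s) if flag)
    return 0.25 * n_used - gain            # lower is better

# Finite search space X: every subset of the four features.
X = list(product([0, 1], repeat=4))

# Exhaustive search guarantees f(s*) <= f(s) for all s in X, as in equation (1).
s_star = min(X, key=cost)
print("s* =", s_star, "f(s*) =", cost(s_star))
```

For realistic feature counts the space grows as 2^n, which is exactly why the metaheuristics discussed later (PSO, SSA, Bayesian optimization) are used instead of exhaustive enumeration.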
5. Research gap
Data quality is viewed through various basic dimensions (accuracy, timeliness, relevance, completeness, intelligibility, and reliability), mainly addressing the data's integrity in a particular research project. Missing data values cause errors, and values of unknown validity create ambiguity, since they can be correct or wrong. The importance of data quality lies in the fact that decision-making efficiency depends on it: small improvements in the dimensions of the data can lead to substantial improvements in the information available for decision-making. Hence, it is beneficial for organizations to have proven studies of the selection and evaluation of characteristics of computational learning techniques, and to use hybrid technologies that improve the results obtained. Various methods have been created for the analysis of heart disease. Nevertheless, there is always scope for improvement, and new systems are still being developed to overcome the limitations of the current methods. There are different data mining techniques for discovering relations between diseases, their symptoms, and prescriptions, although such methods have certain constraints: high iteration counts, difficulty discarding conflicting rules, higher response times, and so forth. The principal limitation of the back-propagation neural network is that it results in a higher MSE (mean squared error) if the weight values are random (not optimized). Therefore, this research work uses the Salp Swarm Algorithm to optimize the neural network's weight values and reduce the MSE; thus the accuracy of an SSA-optimized Neural Network is higher than that of a Neural Network alone. Similarly, the support vector machine is optimized by Bayesian optimization. The KNN and Naïve Bayes classification algorithms are also used for comparative analysis in this work.
6. Proposed methodology
In this research work, we focus on machine learning. Machine learning is a discipline comprising algorithms that work on empirical data in two broad steps: first, identify complex relationships among the data's characteristics, and second, employ the discovered patterns to make predictions. Algorithms can find relationships between the observed variables, acting like a machine that learns from a data sample (training data) to capture, through a probability distribution, characteristics that are not directly observed. The learned knowledge can then be used to make smarter decisions on new data. Machine learning algorithms are normally classified into different categories depending on the results; two of these categories are supervised learning and unsupervised learning.
When we want to analyze a large number of variables, we may face problems of high dimensionality. A variety of methods are used to avoid such problems; for example, using a one-step selection method before some other approach can increase the latter's power. The strategies fall into two general categories:
1. Dimension reduction methods
2. Variable selection methods
6.1. Data analysis and encoding
In this work, the Cleveland dataset is used. The data were taken in the form of a matrix containing a set of rows and columns, and from these data we predict heart disease. In the UCI repository, several heart disease datasets are available: Hungarian, Cleveland, and Switzerland.
The dataset contains 76 attributes and 303 records, but all published experiments use a subset of 14 of them. The target column in the dataset includes two classes: 1 indicates heart disease and 0 indicates its absence. The important risk factors of the dataset are listed in Table 1, which shows each risk factor and its possible values along with the encoded values in brackets. These encoded values are used as input to the proposed framework.
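As a small illustration of how raw risk-factor readings could be turned into the bracketed codes of Table 1 below, here is a hedged Python sketch; the function names, thresholds, and the example patient are simply a restatement of the table for illustration and are not the authors' code.

```python
def encode_age(years):
    # Age bands from Table 1: 20-34 -> -2, 35-50 -> -1, 51-60 -> 0, 61-79 -> 1, >79 -> 2
    if years <= 34: return -2
    if years <= 50: return -1
    if years <= 60: return 0
    if years <= 79: return 1
    return 2

def encode_cholesterol(mg_dl):
    # Below 200 mg/dL -> -1 (low), 200-239 -> 0 (normal), 240 and above -> 1 (high)
    if mg_dl < 200: return -1
    return 0 if mg_dl <= 239 else 1

def encode_blood_pressure(mm_hg):
    # Below 120 mm Hg -> -1, 120-139 -> 0, above 139 -> 1
    if mm_hg < 120: return -1
    return 0 if mm_hg <= 139 else 1

# Binary factors (sex, hereditary, smoking, alcohol, diabetes, obesity, stress)
# are encoded as 1 for "yes"/male and 0 otherwise, as in Table 1.
patient = {"sex": 1, "age": encode_age(57), "chol": encode_cholesterol(225),
           "bp": encode_blood_pressure(142), "smoking": 1, "diabetes": 0}
print(patient)   # hypothetical encoded input row for the framework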
Machine learning methods are dynamic because they usually contain several parameters that need to be optimized for best performance, and optimizing them manually can be tiring as well as error-prone. Therefore, the two classification approaches SVM and Neural Network are optimized by Bayesian Optimization and Salp Swarm Optimization, respectively. In addition, two standalone classification methods are used: KNN and Naïve Bayes. These approaches are proposed to define an optimum number of clusters in the analyzed data. The proposed methodology is shown in Fig. 1.
7. Classification using K-Nearest neighbor
First of all, three parameters are considered: the sample data, the number of closest neighbors to select (K), and the point we want to evaluate (X). Subsequently, for each element of the sample, we assess the distance between the reference point X and the point Xi of the learning set, and we check whether this distance is less than one of the distances contained in the nearest-neighbors list. If so, the point is added to the list; if the number of items in the list then exceeds K, the last value is simply removed. Fig. 2 illustrates classification with K-Nearest Neighbor. The algorithm itself is not very complicated, and a brute-force search is feasible if the sample is not too big. However, in data mining the number of individuals to be evaluated is often very large, which is why an optimization of the search is needed; there are many types of trees to speed up such a search, like the KD tree or the ball tree, and the ball tree algorithm is discussed later in this work. Here is the pseudo-code representing the algorithm [37].
Phase 1: In heart disease prediction, the dataset comes from one of the foremost repositories, the UCI machine learning repository, a collection of datasets used to analyze machine learning algorithms.
Phase 2: The data preprocessing step is the same as for Logistic Regression and refers to cleaning and organizing the raw data for building and training the machine learning models. Data preprocessing for machine learning generally follows specific steps:
1.1. Import libraries.
1.2. Import the dataset; datasets usually come in CSV format.
1.3. Handle missing data in the dataset. For identifying missing data we use the scikit-learn preprocessing library, which contains a class called Imputer that helps take care of the missing values.
Table 1. Risk factors and their corresponding encodings [36].
S. No. | Risk Factors | Values
1 | Sex | Male (1), Female (0)
2 | Age (years) | 20-34 (−2), 35-50 (−1), 51-60 (0), 61-79 (1), >79 (2)
3 | Blood Cholesterol | Below 200 mg/dL - Low (−1); 200-239 mg/dL - Normal (0); 240 mg/dL and above - High (1)
4 | Blood Pressure | Below 120 mm Hg - Low (−1); 120-139 mm Hg - Normal (0); Above 139 mm Hg - High (1)
5 | Hereditary | Family member diagnosed with HD - Yes (1); Otherwise - No (0)
6 | Smoking | Yes (1) or No (0)
7 | Alcohol Intake | Yes (1) or No (0)
8 | Physical Activity | Low (−1), Normal (0) or High (1)
9 | Diabetes | Yes (1) or No (0)
10 | Diet | Poor (−1), Normal (0) or Good (1)
11 | Obesity | Yes (1) or No (0)
12 | Stress | Yes (1) or No (0)
Output | Heart Disease | Yes (1) or No (0)
Fig. 1. Proposed methodology.
1.4. Encode categorical data.
1.5. Split the dataset into a training set and a test set.
1.6. Feature scaling. This is the final step of data preprocessing.
Phase 3: Train on the training set, here by fitting the K-NN classifier to the training data. To do this, we import the KNeighborsClassifier class from the sklearn.neighbors library; after importing the class, we can create the classifier object.
Phase 4: After training, the K-Nearest Neighbor method is used to predict the class. The KNN algorithm works as follows:
4.1 Load the data.
4.2 Initialize K to the chosen number of neighbors.
4.3 Compare the actual/desired output. To obtain the predicted class, repeat the following for every training data point: calculate the distance between the test data and the training data using the most popular distance metric, the Euclidean distance; sort the distances into an ordered collection by index; pick the first K entries from the sorted array; and take the most frequent class among the selected K entries.
4.4 If there is an error, repeat steps 1 to 3; otherwise, return the predicted class. The class of x is determined from the classes of the examples whose indices are stored in the KNN list.
Fig. 2. Classification using K-Nearest Neighbor.
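Phases 1-4 above can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation (which was developed in MATLAB); the file name "heart_encoded.csv", the column name "target", and the choice of K = 5 are assumptions for the sake of the example.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Phase 1-2: load the (assumed) encoded dataset and handle missing values.
df = pd.read_csv("heart_encoded.csv")                     # hypothetical file name
X, y = df.drop(columns=["target"]), df["target"]
X = SimpleImputer(strategy="most_frequent").fit_transform(X)

# Phase 2 (steps 1.5-1.6): train/test split and feature scaling.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)

# Phase 3: fit the K-NN classifier (Euclidean distance, K neighbors).
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(scaler.transform(X_train), y_train)

# Phase 4: predict the class of unseen points and evaluate.
y_pred = knn.predict(scaler.transform(X_test))
print("accuracy:", accuracy_score(y_test, y_pred))
```

The distance computation, sorting, and majority vote of steps 4.1-4.4 are performed internally by the fitted KNeighborsClassifier.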
8. Classification using Naïve Bayes classifier
The Naïve Bayes technique is based on the theory of probability: conditional probabilities are calculated from frequencies in order to predict the class of new cases. Fig. 3 shows the Naïve Bayes classifier-based approach. Let E and F be events; we can express E as:
E = EF ∪ EF^c (2)
That is, for event E to occur, either E and F both occur, or E occurs and F does not. Because EF and EF^c are mutually exclusive, we have:
P(E) = P(EF) + P(EF^c) = P(E|F)P(F) + P(E|F^c)P(F^c) = P(E|F)P(F) + P(E|F^c)(1 − P(F)) (3)
Equation (3) states that the probability of event E is a weighted average of the conditional probability of E given that F has occurred and the conditional probability of E given that F has not occurred, each conditional probability receiving as much weight as the conditioning event tends to occur. Equation (3) can be generalized as follows: suppose that the events F1, F2, ..., Fn are mutually exclusive and such that ∪_{i=1}^{n} Fi = S, where S is the sample space; in other words, precisely one of the events will occur (Fig. 4). One can then write:
E = ∪_{i=1}^{n} EFi (4)
From the definition of conditional probability, we have:
P(EFi) = P(E|Fi)P(Fi) (5)
Furthermore, using the fact that the events EFi, i = 1, ..., n, are mutually exclusive, we obtain:
P(E) = Σ_{i=1}^{n} P(EFi) = Σ_{i=1}^{n} P(E|Fi)P(Fi) (6)
Thus, equation (6) shows how, for given events F1, F2, ..., Fn of which one and only one can occur, P(E) can be calculated by conditioning on which Fi occurs. That is, P(E) equals the weighted average of the P(E|Fi), each term being weighted by the probability of the event on which it is conditioned. Now suppose that E has occurred and that we want to determine the probability that event Fj has occurred. By equation (6) we have:
P(Fj | E) = P(EFj) / P(E) = P(E|Fj)P(Fj) / Σ_{i=1}^{n} P(E|Fi)P(Fi) (7)
Equation (7) is known as the Bayes formula. Thus, we can consider E as evidence for Fj and calculate the probability that Fj has occurred given the evidence. Now suppose we have evidence from multiple sources. From equation (4):
P(Fj | E1 E2 ... Em) = P(E1 E2 ... Em | Fj) P(Fj) / P(E1 E2 ... Em) (8)
The above equation is used to obtain the results. The assumption that gives rise to the adjective "Naïve" is independence between the variables, which is not always true. However, the approach is efficient in its implementation, because the relevant knowledge is found in the comparatively large quantities of data rather than in the exact values of the probabilities themselves.
Fig. 3. Basic block diagram of the proposed Naïve Bayes method.
Fig. 4. Event E occurs in conjunction with one of the mutually exclusive events Fj [38].
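Equations (7)-(8) are what a Naïve Bayes classifier evaluates under the independence assumption. The following is a small, self-contained sketch with scikit-learn's Gaussian implementation; the toy feature vectors, labels, and the choice of GaussianNB are illustrative assumptions, not the paper's data or configuration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up encoded risk-factor vectors (rows) and 0/1 heart-disease labels.
X = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 1, 1],
              [0, 0, 0, 1], [1, 0, 1, 1], [0, 1, 0, 0]])
y = np.array([1, 0, 1, 0, 1, 0])

nb = GaussianNB().fit(X, y)

# predict_proba returns the posterior P(F_j | E1...Em) of each class given the
# evidence, i.e. the quantity in equations (7)-(8) computed with the naïve
# independence assumption described above.
print(nb.predict([[1, 0, 0, 1]]))
print(nb.predict_proba([[1, 0, 0, 1]]))
```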
9. Classification using Bayesian Optimized SVM classifier
Fig. 5 shows the Bayesian optimized SVM classifier-based approach described in the following subheadings.
9.1. Support vector machines (SVM)
Many real-world problems today require multi-class classification, whereas Support Vector Machines were originally used for binary (+/−1) problems. The multi-class objective function is expressed as:
min_{w_r ∈ H, ξ_i^r ∈ R, b_r ∈ R} (1/2) Σ_{r=1}^{M} ||w_r||² + (C/m) Σ_{i=1}^{m} Σ_{r ≠ y_i} ξ_i^r (9)
subject to:
⟨w_{y_i}, x_i⟩ + b_{y_i} ≥ ⟨w_r, x_i⟩ + b_r + 2 − ξ_i^r, ξ_i^r ≥ 0 (10)
where r ∈ {1, ..., M} \ {y_i} and y_i ∈ {1, ..., M} is the multi-class label of the pattern x_i. In terms of precision, the results obtained with this approach are comparable to those obtained directly using the one-against-the-rest method. For practical problems, the choice of approach depends on the applicable constraints; relevant factors include the precision required, the time available for development, the processing time, and the nature of the classification problem.
9.2. Bayesian Optimization of support vector machine
The main idea of Bayesian Optimization (BO) is to sequentially construct a surrogate probabilistic model that infers the objective function. Iteratively, new observations are made and the model is updated; reducing its uncertainty allows working with a known and cheaper model, which is used to construct a utility function that determines the next point to evaluate. The steps of the BO methodology are described below. First, the a priori model must be chosen over the space of possible functions; different parametric approaches can be used, such as Beta-Bernoulli bandits or (generalized) linear models, or nonparametric models such as Student-t processes or Gaussian processes. Then, repeatedly until a particular stopping criterion is met, the prior and the likelihood of the observations collected so far are combined to obtain a posterior distribution. This is done using Bayes' theorem, hence the origin of the name. Recall Bayes' theorem: let A and B be two events such that the conditional probability P(B|A) is known; then the probability P(A|B) is given by:
P(A|B) = P(B|A)P(A) / P(B) (11)
where P(A) is the a priori probability, P(B|A) is the probability of event B conditional on the occurrence of A, and P(A|B) is the posterior probability. A particular utility function is then maximized on the posterior model to determine the next point to evaluate, the new observation is collected, and the procedure repeats until the stopping criterion is met. The primary task for the SVM classifier here is to resolve feature subset selection together with parameter tuning. Since the SVM approach uses a discretization technique for continuous parameters, it can lose information and give less accurate results. This work therefore discusses an algorithm that can tune the SVM parameters: the proposed algorithm optimizes two SVM parameters, the soft-margin weight C and the kernel function parameter. The first parameter, C, controls the trade-off between misclassifying specific points and correctly classifying the others, and the second, the kernel parameter, is used to tune the SVM and select the feature subset simultaneously.
Fig. 5. Basic block diagram of the proposed BO-SVM method.
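To illustrate the BO loop just described (prior, posterior update via Bayes' theorem, and maximization of a utility/acquisition function to pick the next point), here is a hedged sketch that tunes the SVM soft-margin parameter C and kernel width gamma with scikit-optimize's Gaussian-process minimizer. The library choice, the search ranges, and the stand-in dataset are assumptions for illustration, not the authors' MATLAB setup.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer   # stand-in dataset for illustration
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from skopt import gp_minimize
from skopt.space import Real

X, y = load_breast_cancer(return_X_y=True)

def objective(params):
    C, gamma = params
    svm = SVC(C=C, gamma=gamma, kernel="rbf")
    # Negative mean CV accuracy: gp_minimize minimizes, so lower is better.
    return -np.mean(cross_val_score(svm, X, y, cv=10, scoring="accuracy"))

space = [Real(1e-3, 1e3, prior="log-uniform", name="C"),
         Real(1e-6, 1e1, prior="log-uniform", name="gamma")]

# Each iteration fits a Gaussian-process surrogate to the observations so far
# and maximizes an acquisition function to choose the next (C, gamma) to try.
result = gp_minimize(objective, space, n_calls=25, random_state=0)
print("best (C, gamma):", result.x, "cv accuracy:", -result.fun)
```

The "minimum objective vs. number of function evaluations" curve reported later (Fig. 18) is exactly the kind of trace such a loop produces as the surrogate's uncertainty shrinks.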
9.3. BO-SVM algorithm
In the above algorithm, the roles of the variables are as follows.
i. K holds the solutions received from the records.
ii. M holds the number of models used to generate solutions.
iii. Q holds the algorithm parameter used to control diversification of the search process.
iv. C holds the soft-margin parameter.
v. Y is the kernel function parameter, also called the margin or width parameter.
vi. Finally, the termination conditions yield the best values for the SVM parameters (C and Y).
10. Classification using salp swarm optimized neural network classifier
Fig. 6 shows the salp swarm optimization of the neural network-based approach described in the following subheadings.
11. Neural network (NN)
In this work, different neural network structures with one hidden layer are tested, starting from a number of neurons equal to the average of the number of inputs and the number of outputs. The number of neurons in that layer is then gradually increased until the most suitable structure for predicting heart disease is found. The selection of the best network structure is made considering the following evaluation measures inside and outside the sample: the RMSE (Root Mean Square Error) and the MAPE (Mean Absolute Percentage Error), calculated using equations (12) and (13).
RMSE = sqrt( (1/n) Σ_{t=1}^{n} (ŷ_t − y_t)² ) (12)
MAPE = (100/n) Σ_{t=1}^{n} | (ŷ_t − y_t) / y_t | (13)
Here n is the number of observations, y_t is the actual value, and ŷ_t is the value estimated by the model. The Salp Swarm Algorithm is used to determine the bias and weight values.
Fig. 6. Basic block diagram of the proposed SSA-NN method.
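As a small numeric sketch of equations (12) and (13), the following Python snippet computes both measures for made-up target and predicted values (illustrative only; the targets are chosen non-zero so the MAPE denominator is valid).

```python
import numpy as np

y_true = np.array([1.0, 0.8, 1.2, 1.0, 0.5])    # made-up actual values
y_pred = np.array([0.9, 1.0, 1.1, 1.1, 0.4])    # made-up model estimates

# Equation (12): square root of the mean squared difference.
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))

# Equation (13): mean absolute percentage error.
mape = 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))

print(f"RMSE = {rmse:.4f}, MAPE = {mape:.2f}%")
```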
Fitness function: the objective of the fitness function is to minimize the MSE between the target class and the predicted class of the training data. It is a function of the bias and the weights:
min F(w, v) = Σ_{t=1}^{q} [c_t − (w x_t + v)]² (14)
where x_t is the input and c_t is the target output. The Salp Swarm Algorithm is utilized to find the optimal bias and weight values with the goal of minimizing the objective function of equation (14).
12. Salp swarm algorithm (SSA)
A salp looks like a barrel-shaped planktonic tunicate belonging to the family Salpidae. Its body is very similar in texture to that of a jellyfish, and it moves through the water like a jellyfish, propelling itself by pumping water through its body. Because of this behavior, it is difficult to reach the salps' habitats and to keep them in a laboratory environment, so biological research on these creatures is only a starting point. The main idea of the algorithm is the swarming behavior of salps: in deep oceans, salps form a chained structure called a salp chain. The reason for this behavior is still unknown, but some researchers consider that it is adopted to achieve better movement through fast, coordinated changes and foraging. A mathematical model of the swarm behavior is designed with the help of salp chains. The population is divided into two groups, leaders and followers; in the salp chain, the leader is always at the front and the others follow. The goal in the search space is a food source, called TF, toward which all the salps are directed. The position of the leader salp is updated with respect to the target food source using the following formula:
x¹_j = TF_j + c1 ( c2 ( ub_j − lb_j ) + lb_j ), if c3 ≥ 0
x¹_j = TF_j − c1 ( c2 ( ub_j − lb_j ) + lb_j ), if c3 < 0 (15)
Here, x¹_j represents the position of the leading salp in the jth dimension, TF_j represents the target food source in the jth dimension, c2 and c3 are random numbers, and ub_j and lb_j are, respectively, the upper and lower boundaries in the jth dimension. The coefficient c1 balances the exploration (global search) and exploitation (local search) phases of the search, which is why it is the most important parameter of the SSA algorithm; it is written mathematically as [39]
c1 = 2 e^{ −(4m/M)² } (16)
Here, m represents the current iteration and M the total number of iterations; in this work M is set to 100. The coefficients c2 and c3 are random numbers produced uniformly in the range [0, 1]. Each follower salp updates its position according to the salp in front of it, using the following equation [39]:
x^i_j = (1/2) ( x^i_j + x^{i−1}_j ), for i ≥ 2 (17)
Equation (17) shows that each follower salp follows its leader, so that together they form a chain of salps. Here, x^i_j denotes the position of the ith follower salp in the jth dimension. As in all swarm-based optimization techniques, the starting positions of the salps are generated randomly [39].
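The leader and follower updates of equations (15)-(17) can be sketched directly in Python. In the snippet below the objective is a simple sum-of-squares function standing in for the network MSE of equation (14); the bounds, the dimensionality, and the 0.5 split used for the c3 test (a common implementation convention, since c3 is drawn from [0, 1]) are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def objective(x):
    # Stand-in fitness; in the paper this role is played by the MSE of eq. (14).
    return np.sum(x ** 2)

def ssa(dim=6, n_salps=30, max_iter=100, lb=-5.0, ub=5.0, seed=0):
    rng = np.random.default_rng(seed)
    salps = rng.uniform(lb, ub, size=(n_salps, dim))    # random starting positions
    fitness = np.apply_along_axis(objective, 1, salps)
    food = salps[np.argmin(fitness)].copy()             # TF, best solution so far

    for m in range(1, max_iter + 1):
        c1 = 2 * np.exp(-((4 * m / max_iter) ** 2))     # equation (16)
        for i in range(n_salps):
            if i == 0:                                   # leader, equation (15)
                c2, c3 = rng.random(dim), rng.random(dim)
                step = c1 * ((ub - lb) * c2 + lb)
                salps[i] = np.where(c3 >= 0.5, food + step, food - step)
            else:                                        # followers, equation (17)
                salps[i] = 0.5 * (salps[i] + salps[i - 1])
        salps = np.clip(salps, lb, ub)
        fitness = np.apply_along_axis(objective, 1, salps)
        if fitness.min() < objective(food):              # keep the best food source
            food = salps[np.argmin(fitness)].copy()
    return food, objective(food)

best, best_fit = ssa()
print("best position:", best, "fitness:", best_fit)
```

In the SSA-NN classifier the position vector would hold the network's weights and biases, and the objective would be the training-set MSE, so the chain converges toward weight values that minimize equation (14).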
13. Experimental result
Table 2 lists the parameters of the simulation.
Table 2. Simulation parameters.
Salp Swarm Algorithm: number of swarm = 30; maximum iteration = 100.
Neural Network: a feed-forward neural network with 1 hidden layer of 6 neurons; scaled conjugate training.
Bayesian Optimization: K-10 fold cross-validation.
SVM: -
13.1. Evaluation parameters
The formulas shown below are used to calculate accuracy, precision, and sensitivity.
Accuracy: the percentage of data that is correctly classified:
Accuracy = (TP + TN) / (TP + TN + FP + FN) (18)
Precision (P): the percentage of correct classifications among those predicted positive:
P = TP / (total classified positive) = TP / (TP + FP) (19)
Sensitivity (S): the percentage of actual positives that are classified as positive:
S = TP / (total positive) = TP / (TP + FN) (20)
13.2. Simulation results for KNN
Fig. 7 shows the confusion matrix for the K-Nearest Neighbor based approach. The matrix describes the performance of the KNN model in terms of the target class and output class values. Here, TP = 3, TN = 9, FP = 1, FN = 2.
Fig. 7. Confusion matrix for KNN based approach.
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (3 + 9) / (3 + 9 + 1 + 2) = 80%
Precision = TP / (TP + FP) = 3 / (3 + 1) = 75%
Sensitivity = TP / (TP + FN) = 3 / (3 + 2) = 60%
13.3. Simulation results for Naïve Bayes
Fig. 8 shows the confusion matrix for the Naïve Bayes based approach. The matrix describes the performance of the NB model in terms of the target class and output class values. Here, TP = 4, TN = 9, FP = 1, FN = 1.
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (4 + 9) / (4 + 9 + 1 + 1) = 86.7%
Precision = TP / (TP + FP) = 4 / (4 + 1) = 80%
Sensitivity = TP / (TP + FN) = 4 / (4 + 1) = 80%
13.4. Simulation results for SSA-NN
Fig. 9 and Fig. 12 show the confusion matrices for the Neural Network based approach and the Salp Swarm Optimized Neural Network based approach, respectively. The first matrix describes the performance of the plain Neural Network model in terms of the target class and output class values. Here, TP = 3, TN = 9, FP = 1, FN = 2.
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (3 + 9) / (3 + 9 + 1 + 2) = 80%
Precision = TP / (TP + FP) = 3 / (3 + 1) = 75%
Sensitivity = TP / (TP + FN) = 3 / (3 + 2) = 60%
Fig. 10 shows the simulation results for the Neural Network based approach, and Fig. 11 shows the mean squared error performance graph for the neural network based approach. Fig. 12 shows the confusion matrix for the Salp Swarm Optimized Neural Network based approach. Here, TP = 3, TN = 10, FP = 0, FN = 2.
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (3 + 10) / (3 + 10 + 0 + 2) = 86.7%
Precision = TP / (TP + FP) = 3 / (3 + 0) = 100%
Sensitivity = TP / (TP + FN) = 3 / (3 + 2) = 60%
Fig. 13 shows the simulation results for the Salp Swarm Optimized Neural Network based approach, and Fig. 14 shows the mean squared error performance graph for the Salp Swarm Optimized Neural Network based approach.
Fig. 9. Confusion matrix for a neural network-based approach.
Fig. 8. Confusion matrix for Naïve Bayes based approach.
Fig. 10. Simulation result.
Fig. 11. Mean squared error performance graph for a neural network-based approach.
Fig. 12. Confusion matrix for salp swarm optimized neural network-based approach.
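Before moving on to the BO-SVM results, the per-model figures above follow directly from equations (18)-(20); as a small check, this hedged Python snippet recomputes them from the confusion-matrix counts reported for KNN/NN, Naïve Bayes, and SSA-NN.

```python
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return accuracy, precision, sensitivity

# Counts reported in Sections 13.2-13.4 for each approach.
for name, counts in [("KNN / NN", (3, 9, 1, 2)),
                     ("Naive Bayes", (4, 9, 1, 1)),
                     ("SSA-NN", (3, 10, 0, 2))]:
    acc, prec, sens = metrics(*counts)
    print(f"{name}: accuracy={acc:.1%} precision={prec:.1%} sensitivity={sens:.1%}")
```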
13.5. Simulation results for BO-SVM
The confusion matrices for SVM and BO-SVM are shown in Fig. 15 and Fig. 16. For the SVM, TP = 4, TN = 8, FP = 2, FN = 1.
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (4 + 8) / (4 + 8 + 2 + 1) = 80%
Precision = TP / (TP + FP) = 4 / (4 + 2) = 66.7%
Sensitivity = TP / (TP + FN) = 4 / (4 + 1) = 80%
For the BO-SVM, TP = 4, TN = 10, FP = 0, FN = 1.
Fig. 13. Output for salp swarm optimized neural network-based approach.
Fig. 14. Mean squared error performance graph for salp swarm optimized neural network-based approach.
Fig. 15. Confusion matrix for SVM based approach.
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (4 + 10) / (4 + 10 + 0 + 1) = 93.3%
Precision = TP / (TP + FP) = 4 / (4 + 0) = 100%
Sensitivity = TP / (TP + FN) = 4 / (4 + 1) = 80%
The objective function model for SVM is shown in Fig. 17 and the function evaluation graph in Fig. 18. The classification process was developed using MATLAB R2018a. In the study carried out, four classification techniques were applied and compared to observe which provided greater accuracy and less error in the predictions related to heart disease. Each confusion matrix shows the accuracy, precision, and sensitivity of the particular method applied. The comparative results of each method are presented in Table 3 for better analysis, and Fig. 19 shows the comparative results graph for the various classification methods.
14. Conclusion
Classification in data mining ensures a reduction in the problem's size, which reduces the duration of learning and simplifies the learned model. This simplification generally facilitates the interpretation of the model; it also makes it possible to avoid over-learning, improving the accuracy of the prediction and the understanding of the classifier. In this research work, the KNN and Naïve Bayes methods are used standalone for classification, while the Salp Swarm Algorithm optimizes the bias and weight values of the Neural Network, and the weight and kernel function of the SVM are optimized by Bayesian Optimization. It can be observed from the confusion matrix plots that the optimization methods are very useful in heart disease prediction. In this research, the Bayesian Optimized SVM-based approach exceeds the other methods with a maximum accuracy of 93.3%.
Fig. 17. Objective function model for SVM.
Table 3. Comparative results for various classification methods.
Proposed Method | Accuracy | Precision | Sensitivity
KNN | 80% | 75% | 60%
Naïve Bayes | 86.7% | 80% | 80%
NN | 80% | 75% | 60%
SSA-NN | 86.7% | 100% | 60%
SVM | 80% | 66.7% | 80%
BO-SVM | 93.3% | 100% | 80%
Fig. 16. Confusion matrix for Bayesian optimized-SVM based approach.
Fig. 18. Minimum objective vs. the number of function evaluations graph.
Fig. 19. Comparative results graph representation for various classification methods.
14.1. Future scope
1. From a future perspective, it is necessary to formalize an alliance and work together with the institutions that collect data at the forefront of knowledge, and thus be able to apply it to improve a real problem at the country level, as a contribution to society.
2. This work delivers an application that can be used as support for medical personnel in medical decision making, although it relies on discrete data variables.
3. In future work, it can also be extended to detect heart disease, cancer, arthritis, and other chronic diseases.
4. As the developed system is generalized, it can be used to analyze various datasets in the future.
5. Deep learning algorithms can be used to increase accuracy.
Conflict of interest statement
The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.
Acknowledgement
Sibo Prasad Patro, Assistant Professor, Dept. of CSE, GIET University, Gunupur - 765022. 14/05/2021.
I/We wish to submit an original research article entitled "Heart Disease Prediction by using novel optimization algorithm: A supervised learning prospective" for consideration by Informatics in Medicine Unlocked. I/We confirm that this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere. In this paper, I/we report on heart disease prediction using a novel optimization technique. This is significant because the results reveal that the proposed novel optimized algorithm can provide an effective healthcare monitoring system for the early prediction of cardiovascular disease. We believe that this manuscript is appropriate for publication in this Elsevier journal because the publisher is popular across different research domains worldwide. This research work is particularly interested in the category of data: classification allows us to obtain a prediction model from training data and test data, which are screened by a classification algorithm that produces a new model capable of describing the data, possibly having the same classes of data, through the combination of mathematical tools and computer methods. The analysis of data in medicine is becoming more frequent to clarify diagnoses, refine research methods, and plan appropriate equipment supplies according to the importance of the pathologies that appear. To analyze the present data and predict optimal results, we need to use an optimization technique. This research work aims to design a framework for the prediction of heart disease by using major risk factors based on different classifier algorithms such as Naïve Bayes (NB), Bayesian Optimized Support Vector Machine (BO-SVM), K-Nearest Neighbors (KNN), and Salp Swarm Optimized Neural Network (SSA-NN). We have no conflicts of interest to disclose.
References
[1] https://medium.com/analytics-vidhya/heart-disease-prediction-with-ensemble-learning-74d6109beba1.
[2] Felman, A. (2018). Everything you need to know about heart disease.
Medical News Today, https://www.medicalnewstoday.com/articles/237191#types, accessed date : 05/02/2021. [3] Thomas J, Princy RT. March. “Human heart disease prediction system using data mining techniques,”. In: 2016 international conference on circuit, power and computing technologies (ICCPCT). IEEE; 2016. p. 1–5. [4] Shao YE, Hou CD, Chiu CC. Hybrid intelligent modeling schemes for heart disease classification. Appl Soft Comput 2014;14:47–52. [5] Yekkala I, Dixit S, Jabbar MA. August. Prediction of heart disease using ensemble learning and Particle Swarm Optimization. In: 2017 international conference on smart technologies for smart nation (SmartTechCon). IEEE; 2017. p. 691–8. [6] Amin SU, Agarwal K, Beg R. April. Genetic neural network-based data mining in the prediction of heart disease using risk factors. In: 2013 IEEE conference on in­ formation & communication technologies. IEEE; 2013. p. 1227–31. [7] Tan PN, Chawla S, Ho CK, Bailey J, editors. Advances in knowledge discovery and data mining, Part II: 16th Pacific-Asia conference, PAKDD 2012, Kuala Lumpur, Malaysia, may 29-June 1, 2012, Proceedings, Part II, vol. 7302. Springer; 2012. [8] Chandel K, Kunwar V, Sabitha S, Choudhury T, Mukherjee S. A comparative study on thyroid disease detection using K-nearest neighbor and Naive Bayes classifica­ tion techniques. CSI Trans. ICT 2016;4(2–4):313–9. [9] Lépine JP, Briley M. The increasing burden of depression. Neuropsychiatric Dis Treat 2011;7(Suppl 1):3. [10] Gielen S, Schuler G, Adams V. Cardiovascular effects of exercise training. K: mo­ lecular Marti; 2010. [11] Marti K. Stochastic optimization methods, vol. 3. Berlin: Springer; 2005. [12] Hegazy AE, Makhlouf MA, El-Tawel GS. Improved salp swarm algorithm for feature selection. J. King Saud Univ. Comput. Inf. Sci. 2020;32(3):335–44. [13] Du P, Wang J, Hao Y, Niu T, Yang W. A novel hybrid model based on a multi- objective Harris hawks optimization algorithm for daily PM2. 5 and PM10 fore­ casting. Appl Soft Comput 2020;96:106620. [14] Gao L, Ding Y. Disease prediction via Bayesian hyperparameter optimization and ensemble learning. BMC Res Notes 2020;13:1–6. [15] Abusnaina AA, Ahmad S, Jarrar R, Mafarja M. June). Training neural networks using a salp swarm algorithm for pattern classification. In: Proceedings of the 2nd international conference on future networks and distributed systems; 2018. p. 1–6. [16] Yaseen ZM, Faris H, Al-Ansari N. Hybridized extreme learning machine model with salp swarm algorithm: a novel predictive model for hydrological application. Complexity; 2020. 2020. [17] Khourdifi Y, Bahaj M. Heart disease prediction and classification using machine learning algorithms optimized by particle swarm optimization and ant colony optimization. Int. J. Intell. Eng. Syst. 2019;12(1):242–52. [18] Wang J, Gao Y, Chen X. A novel hybrid interval prediction approach based on modified lower upper bound estimation in combination with multi-objective salp swarm algorithm for short-term load forecasting. Energies 2018;11(6):1561. [19] Abualigah L, Shehab M, Alshinwan M, Alabool H. Salp swarm algorithm: a comprehensive survey. Neural Comput Appl 2019:1–21. [20] Ahmed S, Mafarja M, Faris H, Aljarah I. March). Feature selection using a salp swarm algorithm with chaos. In: Proceedings of the 2nd international conference on intelligent systems. Metaheuristics & Swarm Intelligence; 2018. p. 65–9. [21] Nayak SK, Rout PK, Jagadev AK, Swarnkar T. 
Elitism-based multi-objective dif­ ferential evolution for feature selection: a filter approach with an efficient redundancy measure. J. King Saud Univ. Comput. Inf. Sci. 2020;32(2):174–87. [22] Bai Y, Zeng B, Li C, Zhang J. An ensemble long short-term memory neural network for hourly PM2. 5 concentration forecasting. Chemosphere 2019;222:286–94. [23] Alani H, Tamimi A, Tamimi N. Cardiovascular co-morbidity in chronic kidney disease: current knowledge and future research needs. World J Nephrol 2014;3(4): 156. [24] Gao L, Ding Y. Disease prediction via Bayesian hyperparameter optimization and ensemble learning. BMC Res Notes 2020;13:1–6. [25] Wiesław P. Tree-based generational feature selection in medical applications. Procedia Comput. Sci. 2019;159:2172–8. [26] Beunza JJ, Puertas E, García-Ovejero E, Villalba G, Condes E, Koleva G, Landecho MF. Comparison of machine learning algorithms for clinical event pre­ diction (risk of coronary heart disease). J Biomed Inf 2019;97:103257. [27] Salih SQ, Alsewari AA. A new algorithm for normal and large-scale optimization problems: nomadic People Optimizer. Neural Comput Appl 2020;32(14): 10359–86. [28] Almustafa, K. M. Prediction of heart disease and classifiers’ sensitivity analysis, 02 July 2020. BMC Bioinf, 21. [29] Vembandasamy K, Sasipriya R, Deepa E. Heart disease detection using Naive Bayes algorithm. Int. J. Innov. Sci. Eng. Technol. 2015;2(9):441–4. [30] Chaurasia V, Pal S. Data mining approach to detect heart diseases. Int J Adv Comput Sci Inf Technol 2014;2:56–66. [31] Zhang Y, Wang S, Ji G. A comprehensive survey on particle swarm optimization algorithm and its applications. Math Probl Eng 2015;2015:931256. https://doi. org/10.1155/2015/931256. [32] Rizk-Allah RM, Hassanien AE, Elhoseny M, Gunasekaran M. A new binary salp swarm algorithm: development and application for optimization tasks. Neural Comput Appl 2019;31(5):1641–63. [33] Patro SP, Padhy N, Chiranjevi D. Ambient assisted living predictive model for cardiovascular disease prediction using supervised learning. Evol. Intell. 2020: 1–29. [34] Khan MA, Algarni F. A healthcare monitoring system for the diagnosis of heart disease in the IoMT cloud environment using MSSO-ANFIS. IEEE Access 2020;8: 122259–69. S.P. Patro et al.
  • 17. Informatics in Medicine Unlocked 26 (2021) 100696 17 [35] Wang J, Liu C, Li L, Li W, Yao L, Li H, Zhang H. A stacking-based model for non- invasive detection of coronary heart disease. IEEE Access 2020;8:37124–33. [36] Amin SU, Agarwal K, Beg R. Genetic neural network based data mining in pre­ diction of heart disease using risk factors. In: 2013 IEEE conference on information & communication technologies. IEEE; 2013, April. p. 1227–31. [37] Khateeb N, Usman M. Efficient heart disease prediction system using K-nearest neighbor classification technique. In: proceedings of the international conference on big data and Internet of thing; 2017, December. p. 21–6. [38] Rish I. An empirical study of the naive Bayes classifier. In: IJCAI 2001 workshop on empirical methods in artificial intelligence, vol. 3; 2001, August. p. 41–6. 22. [39] Mirjalili S, Gandomi AH, Mirjalili SZ, Saremi S, Faris H, Mirjalili SM. Salp Swarm Algorithm: a bio-inspired optimizer for engineering design problems. Adv Eng Software 2017;114:163–91. Sibo Prasad Patro* , Gouri Sankar Nayak, Neelamadhab Padhy School of Engineering and Technology, Department of Computer Science and Engineering, GIET University, Gunupur-765022, Odisha, India * Corresponding author. E-mail addresses: sibofromgiet@giet.edu (S.P. Patro), gsnayakcse@giet. edu (G.S. Nayak), dr.neelamadhab@giet.edu (N. Padhy). S.P. Patro et al.