SlideShare a Scribd company logo
1 of 15
Download to read offline
Vol.:(0123456789)
1 3
Int J Syst Assur Eng Manag
https://doi.org/10.1007/s13198-022-01837-5
ORIGINAL ARTICLE
An ensemble deep learning classifier of entropy convolutional
neural network and divergence weight bidirectional LSTM
for efficient disease prediction
S. R. Lavanya1
· R. Mallika2
Received: 7 March 2022 / Revised: 17 September 2022 / Accepted: 12 December 2022
© The Author(s) under exclusive licence to The Society for Reliability Engineering, Quality and Operations Management (SREQOM), India and
The Division of Operation and Maintenance, Lulea University of Technology, Sweden 2022
Abstract According to the World Health Organization
(WHO), worldwide rise in death rates amongst women is
mostly attributable to HD (Heart Disease) and BC (breast
cancer). In the last three decades, India has experienced a
30% increase in incidences of BC’s where every nine min-
utes, a woman dies of BC. HD has become very common,
and their early identification is crucial for clinicians to save
lives and protect their patients. Early diagnosis of BC and
HD are thus beneficial. MLT (Machine Learning Technique)
have been used in medical diagnostics for detecting illnesses,
minimizing modalities, and enhancing survival rates. Single
classifier or Conventional classification algorithms exhibit a
limited accuracy and affect the prediction outcome. Ensem-
ble learning is a technique for producing many basic clas-
sifiers are those that are used to construct new classifiers
created that outperforms any constituent classifier. In this
work, EDL (Ensemble Deep Learning) classifier is estimated
based on two classifiers, ECNN (Entropy Convolutional
Neural Network) and DWBi-LSTM. These classifiers are
applied to an imputed dataset of heart disease and breast
cancer, with selected feature subsets. In the ECNN classifier,
the weight of the Convolutional layers is computed via the
entropy function. In the DWBi-LSTM classifier, divergence
is used to assign the attentive weights of each feature to the
classification. The results of ECNN and DWBi-LSTM clas-
sifiers are combined via bootstrap aggregation and the final
classification is predicted by majority voting for accuracy.
Metrics such as precision, recall, f-measure, specificity and
accuracy has been used for assessing the results of proposed
system, and existing classifiers.
Keywords Heart disease · Breast cancer · Feature
selection · ECNN · DWBi-LSTM · EDL
Abbreviations
AMCGWO	
Adaptive mean chaotic grey wolf
optimization
BC	Breast cancer
BiGRU​	
Bidirectional gated recurrent unit
Bi-LSTM	
Bidirectional- long short term memory
BN	Bayesian network
BPNN	
Backpropagation neural network
CAD	Computer-aided diagnosis
CNN	
Convolutional neural network
CP	Canonical polyadic
DBN	
Deep belief network
DLT	
Deep learning technique
DNN	
Deep neural network
DT	Decision tree
DWBi-LSTM	
Divergence weight bidirectional-long
short-term memory
ECNN	
Entropy convolutional neural network
EDL	
Ensemble deep learning
EM	Expectation–maximization
FCLF-CNN	
Fully connected layer first convolutional
neural network
FFNN	
Feed-forward neural network
FNA	
Fine needle aspiration
Grad-CAM	
Gradient-weighted class activation
mapping
HD	Heart disease
* S. R. Lavanya
srlavanyaraj@gmail.com
R. Mallika
mallikapanneer@hotmail.com
1
Research and Development Centre, Bharathiar University,
Coimbatore, India
2
Department of Computer Science, CBM College,
Coimbatore, India
Int J Syst Assur Eng Manag
1 3
KL	Kullback–Leibler
KNN	
K Nearest neighbor
LM	Liebenberg marquardt
LR	Logistic regression
MCC	
Matthews correlation coefficient
MFWCP	
Mode fuzzy weight based canonical
polyadic
MLP	Multilayer perceptron
MLT	
Machine learning technique
MSD	
Mean squared deviation
NB	Naive bayes
NN	Neural network
ReLU	
Rectified linear unit
RNN	
Recurrent neural network
SSAE-SM	
Stacked sparse autoencoder and softmax
regression-based model
SVM	
Support vector machine
TFMF	
Trapezoidal fuzzy membership function
UCI	
University of California Irvine
WBCD	
Wisconsin breast cancer dataset
WDBC	
Wisconsin diagnostic breast cancer
WHO	
World health organization
1 Introduction
Human beings suffer from a multitude of diseases that affect
them both physically and psychologically. Diseases develop
primarily from infections or deficiencies or heredity traits or
organ dysfunctions. Doctors or medical experts detect and
diagnose disease-affected humans and include medical ther-
apy to treat them. Though many diseases can be cured with
therapies, HDand BCcannot be cured despite treatment, but
medications can prevent these diseases from getting worse
over a period of time. BC is a type of cancer that devel-
ops in breast cells and is a very common illness in women.
According to projections for India in 2020, the number of
Patients might reach two million (Malvia et al. 2017). Men
are more likely to be affected by HD than women. Accord-
ing to WHO, HD is responsible for 24% of non—communi-
cable disease deaths in India. Assessing HD based on risk
factors manually is a challenging task where the diagnosis
is the process of determining, explaining, or establishing a
human’s condition based on their symptoms and indications.
Early and accurate diagnosis is critical because it affects
treatment efficiency and prevents long-term consequences
for the infected person. Diagnostic errors are responsible
for around 10% of patient deaths and a variety of severe
consequences and/or occurrences in hospitals (De and
Chakraborty 2020). Errors in diagnosis may result from
a variety of causes including mismatches in communica-
tions between doctors, patients, and their families or poor
diagnostic procedures, or inefficient information from
healthcare systems.
MLTs are sophisticated, automated techniques for analyz-
ing high-dimensional, multimodal biomedical data and can
greatly expedite and enhance medical diagnosis processes.
One way of using these techniques is predicting depend-
ent variables based on the values of independent variables.
MLTs once designed can repeat tasks with great accuracy,
a crucial factor for making decisions in healthcare. MLTs
which can classify accurately are important components of
CAD (Computer-Aided Diagnosis) systems designed for
assisting medical practitioners who use them in the early
diagnosis of abnormalities. CAD systems help radiologists
visually examine mammograms to reduce chances of mis-
diagnosis due to fatigue or eye strain or inexperience. Thus,
proper use of CAD systems in healthcare can undoubtedly
save lives. MLTs that have been used in CAD systems for
predictions of abnormalities include SVM, KNN, NN, DT,
and NB.
The presence of irrelevant characteristics in training sets
affects classifier performances which can be enhanced by
discarding unnecessary features and choosing a subset of
significant characteristics called feature selections. These
approaches can be supervised (Song et al. 2017)-(Solorio-
Fernández et al. 2020), unsupervised (Sheikhpour et al.
2017), or semi-supervised (Xu et al. 2016)-(Ang et al. 2015)
based on how the training sets are labeled. Filter, wrapper,
and filtering is used in guided component choice techniques
like embedded has minimal effects on classifications. Wrap-
per on the other hand uses prediction accuracies of preset
learning algorithms to measure the quality of selected fea-
tures. An embedded technique like filters begins by using
statistical criteria for selecting many possible feature subsets
with specific cardinalities. Subsequently, the subset with the
best accuracy is chosen for classifications. Unsupervised fea-
ture choices may be used on unmarked data, but it can be
challenging to determine which characteristics are relevant.
Marked and unmarked data are used in semi-supervised fea-
ture choices to determine the feature’s significance. Biologi-
cal processes can be processed by computational algorithms
and thus aid in problem-solving and decision-making (Xue
et al. 2015). On selection of features, DLT like DNN, RNN,
DBN, and CNN can be used to diagnose diseases. Using
ensemble algorithms enhances illness predictions or catego-
rization accuracy (Rokach et al. 2014).
This paper proposes a diagnostic framework for clin-
ics using bio-inspired algorithms for feature selection and
EDL (Ensemble Deep Learning) for the categorization of
BC and HD. The obtained data is initially separated into
categorical and continuously subsets, using BN (Bayesian
Network)-imputed discrete fields and MFWCP-imputed ten-
sor decomposition subsets. In addition, AMCGWO is used
for wrapper-based feature selection technique which results
Int J Syst Assur Eng Manag
1 3
in selecting key features from missing data imputations.
Finally, EDL based classifier detects BC and HD. The pro-
posed scheme’s outcomes are evaluated using performance
evaluation metrics. This article’s remainder is organized as
following; Outlines of related works on ensemble classi-
fiers for BC and HD are provided in part 2, and the recom-
mended technique is explained in Sect. 3. Division 4 shows
the details of experimental studies along with results and
discussions. In Sect. 5 research study is concluded in detail.
2 Literature review
Raza (Raza 2019) In order to predict cardiac disorders, a
variety of machine learning techniques, including Logistic
Regression (LR), Neural Networks (NB), and the Multi-
layer Perceptron (MLP), are combined with an accuracy of
88.88%. In order to get a final determination, by weighing
and combining numerous separate classifiers, ensemble
learning is also used to increase the validity of a categoriza-
tion. The output of an assembly system is significantly influ-
enced by how the classifiers’ outputs are mixed. The findings
were compared to previously published studies, which found
that the ensemble technique outperformed individual classi-
fiers in terms of accuracy. The suggested ensembles method
is presented to improve the model’s capacity to forecast car-
diac illness accurately, robustly, and consistently, as well as
to prevent patient’s misinterpretation.
A majority vote ensemble approach developed by Atallah
and Al-Mousa (Atallah and Al-Mousa 2019) predicted the
existence of HD in individuals. Their forecasts were based
on basic, low-cost medical test results conducted at clin-
ics. They trained their model using real-life data consisting
of healthy and sick individuals as they aimed to increase
the trust and accuracy for diagnosing clinicians. In order to
produce more accurate results, the study identified patients
using many MLT. Their strategy resulted in 90% accuracy
of detection while using the hard voting ensemble model.
Sapra et al. (Sapra et al. 2021) employed MLT in the
diagnosis of HD. Their scheme was a quick recursive pro-
cedure that was very low in cost and accurate. Patient data
from clinics were inputs for the scheme which was predicted
based on these low-cost clinical test results. Moreover, since
the scheme was put to test on the results of patients and
healthy individual’s results, its results on predictions were
more trustworthy. The study also benchmarked several MLT
for evaluations where they found that their proposed strategy
which employed a hard voting ensemble model resulted in
90% accuracy.
Latha and Jeeva (Latha and Jeeva 2019) investigated
ensemble classifications by combining numerous classi-
fiers to increase the accuracy of weak algorithms. Their
experiments using the tool were executed on a heart
disease database dataset. The study employed comparative
analytical methods to see how ensembles could be used to
increase the prediction accuracy of HD. The use of ensem-
ble classification resulted in a maximum accuracy boost of
7.00% for weak classifiers. Implementing feature selection
boosted the process’s performance even further, with the
results indicating a significant rise in prediction accuracy.
In their data preparation, Baccouche et al. (Baccouche
et al. 2020) proposed a feature selection phase. Experi-
ments with unidirectional and bidirectional NNs revealed
that ensemble classifiers with Bi-LSTM model combined
with CNN achieved the most successful categorization
result for predicting various types of HD and with accu-
racy and F1-scores ranging from 91.00 to 96.00%. DLT
based ensemble-learning framework could address classifi-
cation issues of unbalanced heart disease datasets and their
proposed technique could result in exceptionally accurate
models suitable for clinical real-world diagnosis.
Kadam et al. (Kadam et al. 2019) Sparse Auto encoders
and Softmax Regression were used to categorize BC as
benign or cancerous. The suggested approach was tested
using the UCI machine learning library’s Wisconsin Diag-
nostic Breast Cancer (WDBC) dataset utilizing efficiency
in categorization, sensitivities, recollection, recollection,
clarity, and f measure and Matthews correlation coefficient
(MCC). The method has excellent reliability and efficiency
characteristics. Their approach beat SSAE-SM and other
classifiers in experiments, indicating the approach may be
beneficial for categorizing BC.
Elgin Christo et al. (Elgin Christo et al. 2019) described
a clinical diagnostics system that used bio-inspired feature
selection approaches with gradient descendent BPNN for
classifications. The study’s correlation-based ensemble
feature selections selected best features from three fea-
ture subsets which were obtained via correlation-based
ensemble feature selections and subsequently trained on
gradient descendent BPNN. The study used ten-fold cross-
validations to train and assess the classifier’s performance
where classification accuracy was assessed using the UCI
Machine Learning Repository’s Hepatitis dataset and
WDBC dataset.
Liu et al. (Liu et al. 2018) proposed the use using CNN
to enhance the accuracy of categorization for datasets. The
investigation to enhance organized data categorization effi-
ciency recommended using FCLF-CNN (Fully Connected
Layer First- Convolutional Neural Network). Before the first
convolutional layer, fully connected layers are combined and
fully connected layers are used as encoders or approximates
to transform raw data into representations of locations. To
boost performances, the study trained four different types
of FCLF-CNN and combined them to form an ensemble
FCLF-CNN. The results from WDBC and WBCD datasets
were cross validated five folds. In classification results, the
Int J Syst Assur Eng Manag
1 3
proposed FCLF-CNN outperformed MLP and CNN on both
datasets.
Masud et al. (Masud et al. 2020) created shallow propri-
etary CNNs that outperformed pre-trained models across
a wide range of performance metrics. To avoid bias, the
study’s model was trained using a fivefold cross-validation
strategy. Furthermore, the model was simpler to train than
pre-trained models as it required very few trainable param-
eters. The proposed Grad-CAM (Gradient-weighted Class
Activation Mapping) heat map visualizations clearly dis-
played that the proposed framework could extract crucial
characteristics for diagnosing BCs.Recently, CAD system
is developed to detect classes in an efficient manner. How-
ever single classifiers are will not enhance performance of
the system due to irrelevant features, incomplete dataset.
Ensemble learning is a technique for producing with numer-
ous basic classifications, a fresh one is created. created that
outperforms any constituent classifier and then issues of
missing data, and feature selection has been also solved by
factorization, and optimization methods.
3 Proposed methodology
The major advantage of the suggested strategy aims to
improve breast cancer and heart disease performances.
prediction by employing an EDL classifier. The proposed
approach comprises five stages i.e., data splitting, data pre-
processing, feature selection, training model, and perfor-
mance evaluation. In the first Stage, the actual data is split
into discrete and continuous feature sets. The missing value
imputation is performed by the Bayesian network (BN) in
the second stage. From this, reconstruction of data and ten-
sor factorization is performed by MFWCP. In the third stage,
Feature selection by AMCGWO is performed for reducing
the number of features in the dataset. In the fourth Stage, the
EDL classifier is utilized to improve two classifiers’ accu-
racy of ECNN and DWBi-LSTM for diseases prediction.
The evaluation metrics like precision, sensitivity, specific-
ity, F-measure, and accuracy for assessing the classifiers.
Figure 1 depicts the proposed method architecture.
3.1 
Imputation methods for incomplete dataset
When data values are missing, classifier accuracy suffers,
and Imputed values based on these data become essential.
BN makes assumptions in place of lacking data in that study
because of its capacity to represent ambiguity through causa-
tive relationships among factors. This work’s primary goal
is to impute imbalanced datasets for enhanced predictions
of BC. Missing at Random (MAR) is used to attain values
that are absent from one of the reported databases instances.
In this study, the dataset is utilized to teach the Directed
Acyclic Graph (DAG) characteristics of dependent prob-
abilities distributes. With only two stages, EM (Expecta-
tion–Maximization) is a fast BN technique that iteratively
finds the greatest probabilities. The Expectations phase com-
putes the log probability of the data, and a Log step that
computes the network’s current structure, log, and param-
eters (Franzin et al. 2017). The maximizing stage next iden-
tifies the parameters that maximize the probability from the
prior action. Repeating the process up to the network reaches
equilibrium or no parameters are present. As a consequence,
the training of missing data is successful.
3.2 
Reconstruction and imputation using tensor Via
MFWCP
Tensors are configurations that exist across several dimen-
sions, and the degree of a tensor is proportional to the num-
ber of dimensions it contains. Tensor factorizations result in
more accuracy, but they take much more time to compute.
Tensor factorization models Tucker and CP are quite well-
known at this point (Yang et al. 2017). MFWCP, factoriza-
tion has as its major objective the correction of errors that
occur during the restoration of basic tensors as well as the
aggregate of singular tensors ranking tensors with the fewest
t deviations from the original tensor. TFMF (Trapezoidal
Fuzzy Membership Function) are used to construct fuzzy
weight tensors that are not negative and have fuzzy member-
ship values. In order to compare comparable weighted ten-
sors, such as the original tensor, to deliberate missing data
imputation. Reconstruction results are measured via Mean
Squared Deviation (MSD), and the results are calculated
using Eq. (1) (Vazifehdan et al. 2019):
where Output(t)i—estimated value in tth iteration and n—
count of missing values.
3.3 
Feature selection Via AMCGWO
In this method, AMCGWO is used to choose features,
and EDL is used to determine the optimum feature set.
AMCGWO imitates grey wolves’ hunting and prey-search-
ing behaviours for optimum database component extraction.
AMCGWO believes grey wolf social structures relate 𝛼 first,
𝛽 second, 𝛿 at next, and finally 𝜔 wolves. 𝛼 are the dominant
wolves that are utilized for leading and controlling the entire
pack of grey wolves in order to choose those with the most
desirable characteristics.𝛽 wolf is the supreme candidate that
receives feedback from other wolves and provides it to the
head wolf. The next level of wolves, i.e.𝛿 wolves, control
the wolves, and final level 𝜔 wolves that are dependable
(1)
MSD =
1
n
n
∑
i=1
(
Outputi(t) − Outputi(t − 1)
)2
Int J Syst Assur Eng Manag
1 3
for preserving the consistency and safety of the wolf group
(Faris et al. 2018). The ranges of the method’s regulat-
ing parameters, including a, A, and C, are first assessed
the random vectors now ��⃗
r1 and ��⃗
r2 within [0, 1] are used
to the wolves getting between them and their prey. Here,
the concept’s mean is used to calculate the vector value.
Enhance the randomized vectors if the mean value is more
important for categorization; otherwise, lower it. Though
the convergence rate of GWO is high, thus it doesn’t work
fine in identifying global optima that affect the algorithm’s
rate of convergence. Thus, so as to decrease the effect and
enhance efficiency, the AMCGWO method was built by hav-
ing confusion in the GWO method. For these chaotic maps,
the initial value is identified between 0 and 1. Nevertheless,
these initial values can have a significant change in chaotic
map patterns. The current collection of chaotic maps is cho-
sen using a variety of behaviors, with the starting value set
at 0.7. Initially, stochastic initialization for the population is
performed through the quantity of grey wolves. Next, map-
ping of the selected ICMIC map with the approach is per-
formed during the initialization of the initial chaotic value
and the variable (Gandomi and Yang 2014). The AMCGWO
Fig. 1  Proposed Feature Selec-
tion and Ensemble Deep Learn-
ing (EDL) Classifier for Disease
Diagnosis
Incomplete Data
Tensor factorization with MFWCP
Ensemble Deep Learning (EDL)
Dataset with discrete missing values Dataset with continuous missing values
Missing data imputation by Bayesian network Reconstruction and imputation using tensor factorization
Optimal filled dataset
Mean
SquaredDeviation
(MSD)
No
Yes
ECNN DWBi-LSTM
AMCGWO
Performance analysis
Bootstrap aggregation
Int J Syst Assur Eng Manag
1 3
approach’s parameters a, A, and C are specified as being
comparable to CGWO in order to be employed in extraction
and exploratory operations. All of the grey wolves’ fitness is
evaluated using the benchmark function, and characteristics
are then ranked according to their fitness values. The most
suited wolf is the best outcome of the AMCGWO procedure
at the last iteration.
3.4 
Ensemble deep learning (EDL) classification
Ensemble is a method that may be used to increase a classi-
fier’s accuracy. It’s a helpful meta categorization strategy to
pair pairing less capable students with more capable students
to increase the effectiveness of the less capable students.
The performance of numerous illness detection algorithms
is improved in this work using the ensemble deep learning
(EDL) technique. The goal of integrating several classifiers
is to achieve greater performance than an individual classi-
fier. Figure 2 shows theensemble deep learning (edl) process.
In this work two classifiers like ECNN and DWBi-LSTM
are combined via bootstrap aggregation (Ren et al. 2017).
3.4.1 
Entropy convolutional neural network (ECNN)
CNN is modeled as an FFNN (Feed-Forward Neural Net-
work) with fully linked, compression, and max-pooling lay-
ers. Convolutions come next, then max-pooling layers, and
the completely linked layer serves as the final layer (Bashir
et al. 2015).
3.4.1.1 Convolutional layers In convolutional layers, the
weights are represented as the multiplicative factor of the
filters.In the proposed work, the weight of the Convolutional
layers is computed via the entropy function. Entropy is used
to compute the weight value of the layer by considering the
feature range to classes. If the feature range is higher for the
positive class, then the entropy range is also higher which
results in increased weight value and a reduced bias value.
If the feature range is lower for the positive class, then the
entropy range is lower which results in reduced weight value
and a reduced bias value. Based on this classifier results are
enhanced for disease diagnosis. Let vi ∈ ℝk
be k dimen-
sioned feature vector related to the ­
ith
sample of the data-
base. An extended dataset is indicated by Eq. (2),
here ⊕ represent the concatenation operator. A filter
w ∈ ℝhk is used by the convolution operation, which is uti-
lized in the time period for developing a fresh feature using
h features. Consider, a feature ci is developed from a window
of features vi∶i+h−1 by Eq. (3),
where b ∈ R indicate a hyperbolic tangential and biased
factor are examples of non-linear functions. The fil-
ter is utilized by every feature window in the dataset
{
v1∶h, v2∶h+1, ..., vn−h+1∶n
}
to build a feature map by Eq. (4),
with c ∈ ℝn−h+1. Max pooling method is performed on
the feature map and acquires maximum value ̂
c = max{c}
as the feature related to this filter. The goal is to search for
highly significant features with high value for every feature
map.
3.4.1.2 Dropout layer The dropout is performed with
weighted vector l2-norms of constraint to have regulariza-
tions. It is mentioned by Eq. (5),
The output y in forward propagation, z is denoted as the
input samples, dropout utilizes Eq. (6),
where ◦ represent component multiplying operation, as well
as r ∈ ℝm, the vector used to mask Bernoulli random vari-
ables when p is 1. Gradients are transmitted backwards using
uncovered integers. Sizing of learnt weight vectors by p is
done during the test period so ̂
w = pw, and ̂
w is utilized
to achieve samples. In addition to that l2-norm restriction
of the weighted vectors by w’s resizing to ||w||2 = s when
||w||2  s after a step of steep decline.In Eq. (6), w and b
denote the weight and bias of classifier which is calculated
(2)
v1∶n = v1 ⊕ v2 ⊕ … ⊕ vn
(3)
ci = f
(
w.vi∶i+h−1 + b
)
(4)
c =
[
c1, … .cn−h+1
]
(5)
y = w.z + b
(6)
y = w.(z◦r) + b
Training set 1 Training set 2
Classifier 1 Classifier 2
Ensemble Classifier
Combined results using
averaging
Prediction in
the test set
Feature Selection
Test set
Fig. 2  Ensemble deep learning (EDL) process
Int J Syst Assur Eng Manag
1 3
via the entropy basedon feature importance. The quadratic
entropy of information is calculated by Eq. (7),
Here P(x=k) is the chances that a specified characteristic
will have a certain number, k. if the entropy value is higher
than the weight and bias of the vector or the samples is
increased. It may give importance to the feature to the clas-
sifier to predict the positive or negative class. If the entropy
is higher belonging to the positive class, then the w and b
of the classifier is increased to improve the prediction rate.
3.4.1.3 Softmax layer or fully connected layer ReLU is
employed as an activation function. ReLU definition is dem-
onstrate in Eq. (8)
When x0, output=0, x0, output=x.
3.4.1.4 Output layer The final layer contains n neurons
related to n feature classes. This is a fully connected layer.
The common method is considering the high output neuron
as a class label of given input in classification (Sainath et al.
2013).
3.4.2 
Divergence weight bidirectional‑ long short‑term
memory
DWBi-LSTM classifier is used to diagnose various dis-
eases in this work. LSTM is a specialized RNN architec-
ture designed to learn long-term relationships (Sahoo et al.
2020). The cell additionally takes previous cell output state
(Ct−1), the cell input state (̃
Ct), and the cell output state (Ct).
For classification of various diseases, the LSTM architecture
comprises three gates: forget, input, and output, abbrevi-
ated as ft, it, and ot correspondingly.DWBi-LSTM classi-
fier weight is computed via Kullback–Leibler (KL) diver-
gence function. If the feature range is wide, the KL value
is enhanced, resulting in a larger weight value. Otherwise,
the classifier’s weight value is lowered for classification. It
improves the classifier’s accuracy and lowers the system’s
error. The cell state serves as a network storage, transmitting
important data along the series. Which data on the status of
the cell is authorized is determined by the gateways, which
are NN. When the HD and BC datasets are trained, the gates
will learn which data is crucial to preserve and which to
discard. Equations (9–12) may be used to calculate the value
of gates and cell state,
(7)
Entropy(x) = −
∑ (
P(x = k) ∗ log2(P(x = k)
)
(8)
f(x) = {0, if x  0x, if x ≥ 0
were DWf, DWi, DWo and DWc are divergence weights
linking layer’s contribution to the states of all gateways and
input cells Uf, Ui, Uo, Uc are the value vectors connecting
the inpu cell state and all of the gateways to the preceding
cell terminal side.bf, bi, bo,bc are bias vectors. σ and tanh
are, respectively, the sigmoid and tanh activation functions.
Cellular output state
(
Ct
)
, output layer
(
ht
)
, at every time
iteration t, is computed as byequation (13)-(14),
LSTM layer, the result vector for each of the outputs
is shown. as YT =
[
hT−n, … hT−1
]
.. Bidirectional LSTM is
based on bidirectional RNN (Houdt et al. 2020). It analyzes
successive data using two separate hidden layers, and trave-
ling ahead or backward links those layers to the same output
layer. Figure 3 shows a layered architecture for the Bi-LSTM
network.
The sequence input layer is also known as the first layer
embedding layer. As input, it uses the sorted chosen features
from the HD and BC datasets. Hidden forward and back-
ward LSTM layers are the second and third layers, giving
the 100-hidden unit Bi-LSTM layer. These two layers relate
present data to prior and future phases. Two data sequences
reach the system via the hidden layer. The outputs of the
hidden layers are integrated after processing to create the
Bi-LSTM layer’s final output. The following Eq. (15) may
be used to calculate the output from both LSTM layers,
when it accepts sequence from x1 and xT as input, h
f
t and hb
t
indicates the relative results of the advanced and reverse LSTM
layers. 𝛼 and 𝛽 are used to adjust the Bi-LSTM factors. At the
time, ht is dual bidirectional LSTM components. Bi-output
LSTM’s feeds a completely linked level with Five categories.
This layer links input characteristic data to output information
so subsequent layers can categorize them. Ultimately, the soft-
max and classification layers divide data into several classes.
(9)
ft = 𝜎
(
DWf xt + Uf ht−1 + bf
)
(10)
it = 𝜎
(
DWixt + Uiht−1 + bi
)
(11)
ot = 𝜎
(
DWoxt + Uoht−1 + bo
)
(12)
̃
Ct = tanh
(
DWcxt + Ucht−1 + bc
)
(13)
Ct = ft ∗ Ct−1 + it ∗ ̃
Ct
(14)
ht = ot*tanh
(
Ct
)
(15)
ht = 𝛼h
f
t + 𝛽hb
t
Int J Syst Assur Eng Manag
1 3
The softmax activations transform real vector values into in
the range 0 to 1, allowing probabilities to be understood. In
the somax regression (Wang et al. 2018), the probability of
classifying into a class may be calculated using Eq. (16).
where the value of K indicates the total amount of categories
𝜃 is denoted as the model parameters. The model gets the
results from the softmax function in the classification layer
and assigns every input for a class that makes use of the
cross-entropy function via Eq. (17).
N observations, K categories, tij denotes that the ­
ith
sam-
ple belongs to the ­
jth
class and yij. denotes the softmax value.
Weighting features is essential in classification because it
ensures that each feature has the same benefits in compari-
son to the targeted idea. Let us assume that when a given
feature value is seen, it provides a certain quantity of in
addition to calculating the comparative relevance of the
targeted characteristic, information is also provided to the
target feature of every distinguishing characteristic in the
categorization scheme,the discrepancy between prior and
posterior distributions of the target feature is used to define
the amount of information contained in a particular feature
value. The range of weight is computed using Kullback–Lei-
bler (KL) metric, which is computed using Eq. (18),
Here frij is the ­
ith
feature’s j value in training samples.
The weighted average of the KL measurements across the
(16)
P
�
y(i)
= k|x;θ
�
=
exp
�
θ(k)Tx
�
∑K
j=1 exp
�
θ(j)Tx
�
(17)
loss = −
N
∑
i=1
K
∑
j=1
tij ln yij
(18)
KL
(
C|frij
)
=
∑
C
P
(
c|frij
)
log
(
P
(
c|frij
)
P(c)
)
feature values is the feature weight. As a result, the weight
of feature i represented by Eq. (19) as fwavg(i),
P( frij) is the probability that the feature i has the value
of frij in this Eq. (19). Above weight fwavg(i) favors charac-
teristics that include a large amount of entries; as a result, a
range of records linked with each feature value is too less to
make any reliable learning. Equation (20) defines the final
form of the weight of feature i denoted as fw(i)
here Z is a normalization constant which is computed by
Eq. (21),
The value of n in equations represents the number of
selected features from the training data this Eq. (21). The
normalized form of fw(i) (Eq. (20) is presented in this work
in order to verify that
∑
i
fw(i) = n. Lastly, each gate in the
Bi-LSTM classifier is updated with this weight value. The
network’s hyper-parameters are initialized once the network
has been defined. The qualities on which the whole training
process is based are known as model’s hyper-parameters
where hyperparameters can be model-specific or optimiza-
tion specific. These parameters include epoch counts, batch
sizes, and learning rates which impact performances consid-
erably when optimized. Model-specific parameters are ele-
ments that impact structures like hidden units or layer
counts. These hyperparameters directly control training
(19)
fwavg(i) =
∑
j|i
P
(
frij
)
.KL
(
C|frij
)
(20)
fw(i) =
∑
j�i P
�
frij
� ∑
c P
�
c�frij
�
log
�P(c�frij)
P(c)
�
−Z.
∑
j�i P
�
frij
�
log
�
P
�
frij
��
(21)
Z =
1
n
∑
i
fw(i)
Fig. 3  DWBi-LSTM Network
Classes
Input
data
Padded
Input
data
1
2
3
………
……
Sorting
data
1
2
3
Embedding
Layer
LSTM
LSTM
LSTM
LSTM
LSTM
LSTM
LSTM
LSTM
ℎ1 ℎ1
ℎ2 ℎ2
ℎ3 ℎ3
ℎ ℎ
Fully
Connected
Layer
Softmax
Layer
Classification
Layer
Int J Syst Assur Eng Manag
1 3
processes which have significant influences on the resulting
performances of models. Hence, it is imperative to select the
right parameters for a model’s learning. A huge number of
trials are required to select optimal hyperparameters, which
may be consuming time making operations difficult. Clas-
sification accuracies of test sets are evaluated for obtaining
appropriate hyperparameters sets. Hence, this research work
diagnoses performances hyperparameters in both training
and validation datasets for selecting the right mix in terms
of classification results. This study considers learning rates,
batch sizes, epoch counts, and hidden unit counts as its
hyperparameters.
3.4.3 Bootstrap aggregation
Bootstrap aggregation uses replacement to select some sam-
ples from the training set at random. The term “bootstrap
replicate” refers to a fresh training set. The process of
obtaining bootstrap samples from data and training the clas-
sifier with each individual sample is referred to as bagging.
The combined votes of each classifier are tallied, and the
final outcome of the classification is decided according to
whichever vote has a greater number of affirmatives. Any
other classifier that uses a majority vote may be merged into
the majority voting classifier since it is a meta-classifier. The
ultimate label for the class is the one that was anticipated by
the vast majority of classifiers. The training set is used to
generate samples for the bootstrap aggregation method,
which selects them at random and replaces them. The new
training set is referred to by the moniker bootstrap replicate
throughout this article. The process of bagging entails
obtaining samples of the bootstrap from the raw data, and
then proceed to train the classifier using each individual
example. After tallying up the votes from each classifier, the
final outcome of the classification is decided according to
whichever vote had the majority of support.The class label
dJ is represented as dJ = mode
{
C1, C2, … Cn
}
, where
{C1, C2, … Cn} refers to individual classifiers that partici-
pated in voting. Let be ci,j the prediction of the ­
ith
classifier
on a class with j labels is represented as follows,
n
∑
i=1
ci,j = max
j=1,…,m
n
∑
t=1
ci,j.The algorithm for bootstrap aggrega-
tion is shown below.
4 
Experiments and results discussion
The influence of imputation methods was measured depend-
ing on how well the predictions turned out. approaches in
this study. The experiments assessed and examined perfor-
mances based on detailed description of real datasets.
4.1 Datasets
This study uses the UCI Machine Learning Repository’s
WDBC and WOBC datasets for BCs, and its Hungarian and
Switzerland datasets for HD.
4.1.1 WDBC
Dataset, the characteristics of a digitized picture that belong
to FNA (Fine Needle Aspiration) of breast mass have been
added. The aforementioned features have defined the cell
nuclei feature in the picture. There are 569 data points in
the dataset, with 212 belonging to Malignant and 357 to
Benign. The dataset was divided into 10 groups based on
the following characteristics: fractal size, radius, symmetry,
texture, concave points, perimeter, concavity, area, compact-
ness, and smoothness. Three measures, such as Mean, SE,
and the mean of the three greatest values, are calculated for
each characteristic. As a result, there will be a total of 30
dataset features.
4.1.2 WOBC
The dataset contains 699 samples obtained from the UCI
repository. There are 458 benign samples and 241 malignant
samples in the group. In addition, the dataset contains ten
characteristics and one class. The class level is divided into
two categories: benign and malignant. In addition, the data-
base has missing data. The traits comprise the code number:
id; clump thickness: 1–10; cell size–1–10; cell shape–1–10;
peripheral bonding: 1–10; singular enterocytes size–1–10;
naked nuclei: 1–10; bland chromatomatin: 1–10; regular
nucleoli: 1–10; mitoses: 1–10; benign-2 and malignant-4
classes.
4.1.3 
Heart disease (HD)
Only a subset of 14 of the 76 qualities that make up heart
disease (HD) are taken into account. Particularly, the Cleve-
land database was used the most by MLT. A number of 0
(absence) to 4 (presence) in the goal field indicates the
existence of cardiac disease. The tests performed on this
database aimed to detect the present or absent of illness.
4.1.3.1 Hungarian dataset It contains 294 samples with
14 features.
4.1.3.2 Switzerland dataset It contains 123 samples with
14 features.
Int J Syst Assur Eng Manag
1 3
4.2 Evaluation metrics
Following completion of the dataset, any missing values are
recognized, and the following metrics are utilized to calcu-
late Precision, Recall, Specificity, F-measure, and Accuracy.
Precision refers to the proportion of correctly recognized
good results.
Precision gives the proportion of positive predictions that
are actually correct. It can be calculated by Eq. (22),
Recall gauges the percentage of accurately anticipated
positive results that actually materialized. It can be calcu-
lated by Eq. (23),
The F-measure may be represented using Eq. (24),
Selectivity metric, which evaluates the percentage of
properly detected negatives in comparison to the overall neg-
ative predictions generated by the model, is often referred to
as the true negative rate. It is possible to represent it using
an Eq. (25),
Accuracy is a measurement that has been recognized as
being one of the highly acknowledged metrics to examine
the classification efficiency, which has been carried out
in this study for the purpose of cancer detection. It can be
expressed as follows by Eq. (26),
The imputation strategies result of the classifiers are
measured usingNRMSE to find the missing value. The
Eq. (27) to calculate NRMSE is described as follows (27),
In which the true worth is shown by represented by xi,
while simulated values is denoted by x
′
i
.
4.3 Results comparison
During the course of the tests, the suggested EDL classi-
fier was evaluated in comparison to four other classifying
(22)
Precision =
TP
FP + TP
(23)
Recall =
TP
TP + FN
(24)
F − measure =
2 ∗ (Recall ∗ Precision)
(Recall + Precision)
(25)
Specificity =
TN
FP + TN
(26)
Accuracy =
TP + TN
TP + FP + TN + FN
(27)
NRMSE =
1
max − min
√
√
√
√1
n
n
∑
i=1
(
xi − x
�
i
)2
strategies, including KNN, DT, ANFIS, and CNN. All these
classifiers are performed after the feature selection is com-
pleted via the AMCGWO. Table 1 shows the results com-
parison of feature selection with classifiers vs. datasets.
Figure 4 depicts precision value comparison of the BC
datasets (Fig. 4a) and the HD datasets (See Fig. 4b) with
earlier classifiers. For WDBC and WOBC, the suggested
MFWCP-EDL approach yields accuracy values of 98.7304%
and 98.1207%, respectively. For Switzerland and Hungary,
the similarly suggested MFWCP-EDL approach yields accu-
racy values of 98.5446% and 91.6667%, respectively. Addi-
tionally, there are additional techniques like WCP-KNN,
MFWCP-KNN, WCP-DT, MFWCP-DT, WCP-ANFIS,
and MFWCP-ANFIS, WCP-CNN, and MFWCP-CNN has
the precision value of 56.3333%, 60.9211%, 56.8750%,
71.2698%, 73.4848%, 76.3158%, 83.3333%, and 85.7143%
respectively for the Switzerland dataset (See Table 1). The
results of the proposed system have higher precision due to
optimal selection of features from the AMCGWO algorithm,
it exactly predicts the true positive results.
Recall results comparison of various classifiers with
imputation methods for BC and Heart Disease (HD) datasets
are shown in Fig. 5a, b respectively. For WDBC and WOBC,
the suggested MFWCP-EDL approach yields recall values
of 98.6364% and 98.3235%, respectively. For the Swiss and
Hungarian datasets, the similarly suggested MFWCP-EDL
approach yields recall values of 99.1150% and 98.9132%,
respectively. In addition, there are more techniques like
WCP-KNN, MFWCP-KNN, WCP-DT, MFWCP-DT, WCP-
ANFIS, and MFWCP-ANFIS, WCP-CNN, and MFWCP-
CNN shows the recall value of 60.7843%, 75.7080%,
68.6275%, 85.5752%, 80.3922%, 96.0177%, 97.7876%,
and 98.2301% respectively for Switzerland dataset. The
proposed system has higher sensitivity results due to opti-
mal features selection by AMCGWO algorithm; it exactly
predicts the actual data correctly.
Figure 6 displays the comparison of the F-measure value
with categorization methods in relation to the BC data-
sets (Fig. 6a) and the datasets for heart disease (HD) (See
Fig. 6b). For WDBC and WOBC, the suggested MFWCP-
EDL technique has values of 98.6834% and 98.2220%,
respectively. For the Hungarian and Swiss databases,
the identically suggested MFWCP-EDL approach yields
F-measure values of 98.7286% and 95.2455%, respectively.
In addition, there are more techniques like WCP-KNN,
MFWCP-KNN, WCP-DT, MFWCP-DT, WCP-ANFIS, and
MFWCP-ANFIS, WCP-CNN, and MFWCP-CNN shows the
F-measure value of 58.7647%, 67.5143%, 61.8756%, 77.77
02%,76.7835%,85.0405%,89.9837%, and 91.5464%respec-
tivelyfor Switzerland dataset. The suggested method accu-
rately predicts the real data by increasing f-measure by the
average while reducing the total amount of characteristics
in the database.
Int J Syst Assur Eng Manag
1 3
Table 1  Results Comparison of Feature Selection with Classifiers vs. Datasets
Imputation AMCGWO+Classifiers WDBC Results (%) NRMSE
Precision Recall F-measure Specificity Accuracy
WCP KNN 69.6698 71.0979 70.3766 63.1981 70.1754 0.5461
DT 84.7445 85.5820 85.1612 76.0729 85.9649 0.3746
ANFIS 90.2199 91.4021 90.8071 81.2463 91.2281 0.2962
CNN 94.9526 95.8280 95.3883 85.1805 95.6063 0.2096
EDL 97.5141 97.6025 97.5593 86.7596 97.7153 0.1512
MFWCP KNN 80.0703 80.3774 80.2236 71.4446 80.5000 0.4416
DT 89.7456 89.8593 89.8024 79.8749 90.0000 0.3162
ANFIS 92.9545 93.4383 93.1958 83.0563 93.2500 0.2598
CNN 96.0864 96.2805 96.1834 85.5827 96.2500 0.1936
EDL 98.7304 98.6364 98.6834 87.6768 98.7698 0.1109
Imputation AMCGWO+Classifiers WOBC Results (%) NRMSE
Precision Recall F-measure Specificity Accuracy
WCP KNN 69.7442 70.2857 70.0139 62.4762 70.000 0.5477
DT 85.5054 86.4286 85.9645 76.8254 85.8333 0.3764
ANFIS 91.2896 91.7143 91.5014 81.5238 91.6667 0.2887
CNN 95.1193 95.1850 95.1522 84.6089 95.2500 0.2179
EDL 97.9570 97.9570 97.9570 87.0729 98.0000 0.1414
MFWCP KNN 78.6044 79.6271 79.1125 70.7797 79.9649 0.4476
DT 90.0764 91.2287 90.6489 81.0921 91.0369 0.2994
ANFIS 93.5409 93.7054 93.6231 83.2937 94.0246 0.2444
CNN 95.7174 95.6292 95.6733 85.0037 95.9578 0.2011
EDL 98.1207 98.3235 98.2220 87.3987 98.2500 0.1323
Imputation AMCGWO+Classifiers Hungarian dataset (%) NRMSE
Precision Recall F-measure Specificity Accuracy
WCP KNN 64.0625 68.5294 66.2207 60.9150 70.4545 0.5436
DT 79.3396 86.9118 82.9532 77.2549 85.2273 0.3844
ANFIS 87.5920 91.3235 89.4188 81.1765 92.0455 0.2820
CNN 93.2307 95.5658 94.8690 85.8363 95.1282 0.2103
EDL 96.2771 96.9744 96.6245 86.1995 97.2789 0.1650
MFWCP KNN 76.9502 81.4989 79.1592 72.4434 80.9524 0.4364
DT 87.7941 90.2191 88.9901 80.1947 90.8163 0.3030
ANFIS 90.2989 94.1575 92.1878 83.6956 93.1973 0.2608
CNN 94.1029 96.2355 95.0594 86.3649 95.9184 0.2020
EDL 98.5446 98.9132 98.7286 87.9229 98.9796 0.1010
Imputation AMCGWO+Classifiers Switzerland dataset (%) NRMSE
Precision Recall F-measure Specificity Accuracy
WCP KNN 56.3333 60.7843 58.7647 54.0305 70.2703 0.5452
DT 56.8750 68.6275 61.8756 61.0022 83.7838 0.4027
ANFIS 73.4848 80.3922 76.7835 71.4597 91.8919 0.2847
CNN 83.3333 97.7876 89.9837 86.9223 95.9350 0.2016
EDL 88.4615 98.6726 93.2885 87.7089 97.5610 0.1562
MFWCP KNN 60.9211 75.7080 67.5143 67.2960 80.4878 0.4417
DT 71.2698 85.5752 77.7702 76.0669 90.2439 0.3123
ANFIS 76.3158 96.0177 85.0405 85.3491 92.6829 0.2705
CNN 85.7143 98.2301 91.5464 87.3156 96.7480 0.1803
EDL 91.6667 99.1150 95.2455 88.1023 98.3740 0.1275
Int J Syst Assur Eng Manag
1 3
Specificity evaluation of various classifiers for Fig. 7
is an illustration of BC datasets. For WDBC and WOBC,
the suggested MFWCP-EDL technique yields specific-
ity values of 87.6768% and 87.3987%, correspondingly.
For the Swiss and Hungarian datasets, the identically sug-
gested MFWCP-EDL technique yields particular values of
88.1023% and 87.9229%, correspondingly. The alternative
techniques, include WCP-KNN, MFWCP-KNN, WCP-DT,
MFWCP-DT, and WCP-ANFIS, MFWCP-ANFIS, WCP-
CNN, and MFWCP-CNN shows the specificity of 54.0305%,
67.2960%, 61.0022%, 76.0669%, 71.4597%, 85.3491%,
86.9223%, and 87.3156% respectively for Switzerland data-
set (See Table 1).
Overall accuracy comparison of several classifiers with
imputation, Fig. 8 shows datasets with regard to BC. For
WDBC and WOBC, the proposed MFWCP-EDL tech-
nique demonstrates superior accuracy with 98.7698% and
98.2500%, correspondingly. For the Swiss and Hungarian
datasets, the identically suggested MFWCP-EDL approach
yields correctness of 98.9796% and 98.3740%, correspond-
ingly. In addition, there are more techniques like WCP-KNN,
MFWCP-KNN, WCP-DT, MFWCP-DT, WCP-ANFIS, and
MFWCP-ANFIS., WCP-CNN, and MFWCP-CNN gives
the accuracy value of 70.2703%, 80.4878%, 83.7838%,
90.2439%, 91.8919%, 92.6829%, 95.9350%, and 96.7480%
respectively for Switzerlanddataset.The proposed system has
increased accuracy by correctly classifying the samples as
positive, and negative.
Figure 9 depicts the NRMSE evaluation of classifica-
tion methods on BC and HD datasets. The methods like
WCP-KNN, MFWCP-KNN, WCP-DT, MFWCP-DT,
WCP-ANFIS, MFWCP-ANFIS, WCP-CNN, and MFWCP-
CNN gives the NRMSE of 0.5452, 0.4417, 0.4027, 0.3123,
0.2847, 0.2705, 0.2016, and 0.1803 respectively for the
Switzerland dataset.
Fig. 4  Precision Comparison VS Classifiers
Fig. 5  Recall Value Comparison Vs Classifiers
Int J Syst Assur Eng Manag
1 3
5 
Conclusion and future work
In many classification issues when the feature training
matrix is missing, imputation of missing data is a typical
use. Simultaneously, feature selection becomes more sig-
nificant, particularly in data sets with a huge number of ele-
ments and variables. Imputation of missing data and feature
selection, and classification issues are solved for multiple
disease diagnoses. Initially, missing value imputation is
done by Bayesian network (BN) and optimal missing data
imputation reconstruction of data is performed by the pro-
cess of factoring tensors via MFWCP for the imputed data.
AMCGWO is a wrapper-based strategy for picking just the
optimum characteristics. It collects the average value of the
characteristic and introduces the ICMIC map to improve
GWO’s performance. EDL improves the efficiency of
disease-diagnosis algorithms. EDL is built on ECNN and
DWBi-LSTM through bootstrap aggregation. In the ECNN
classifier, the weight and bias of the classifier are calcu-
lated via the entropy with feature-based importance. In the
DWBi-LSTM classifier, weight features play a major vital
role in classification that every feature has similar impor-
tance regarding target concept by Kullback–Leibler (KL)
divergence. The majority voting classifier merges classifiers
by majority vote. Most classifiers anticipated the final class
label. In order to forecast illnesses, classifiers’ outcomes are
assessed.When compared against MFWCP-KNN, MFWCP-
DT, MFWCP-ANFIS, and MFWCP-CNN, sequentially for
the Switzerland dataset, the suggested MFWCP-EDL tech-
nique has produced a higher accuracy of 98.374%, which
is 17.8862%, 8.1301%, 5.6911%, and 1.626% greater than
Fig. 6  F-Measure Value Comparison Vs Classifiers
Fig. 7  Specificity Results Comparison vs. Classifiers
Int J Syst Assur Eng Manag
1 3
those methods’ corresponding previous bests. This study has
been expanded by integrating more datasets, and new deep
learning algorithms have also been incorporated to improve
the effectiveness of the classifier.
Declarations
Conflict of interest The authors declare they have no conflicts of
interest.
Research involving Human Participants and/or Animals In our
work, no animals or human are involved.
Informed consent Not applicable as no human or animal sample
was involved in this study.
References
Ang JC, Mirzal A, Haron H, Hamed HNA (2015) Supervised, unsu-
pervised, and semi-supervised feature selection: a review on gene
selection. IEEE/ACM Trans Comput Biol Bioinf 13(5):971–989
Atallah R, Al-Mousa A (2019) Heart disease detection using machine
learning majority voting ensemble method. In: Proceedings of the
2019 2nd International Conference on New Trends in Computing
Sciences (ICTCS), pp. 1–6, Amman, Jordan, October 2019.
Baccouche A, Garcia-Zapirain B, Castillo Olea C, Elmaghraby A
(2020) Ensemble deep learning models for heart disease classifi-
cation: a case study from Mexico. Information 11(4):1–28
Bashir S, Qamar U, Khan FH (2015) BagMOOV: A novel ensemble for
heart disease prediction bootstrap aggregation with multi-objec-
tive optimized voting. Australas Phys Eng Sci Med 38(2):305–323
De S, Chakraborty B (2020) Disease Detection System (DDS) Using
Machine Learning Technique. In Machine Learning with Health
Care Perspective (pp. 107–132). Springer, Cham.
Elgin Christo VR, Khanna Nehemiah H, Minu B, Kannan A (2019)
Correlation-based ensemble feature selection using bioinspired
algorithms and classification using backpropagation neural net-
work. Comput Math Methods Med 2019(7398307):1–17
Faris H, Aljarah I, Al-Betar MA, Mirjalili S (2018) Grey wolf opti-
mizer: a review of recent variants and applications. Neural Com-
put Appl 30(2):413–435
Franzin A, Sambo F, Di Camillo B (2017) BNSTRUCT: an R package
for Bayesian network structure learning in the presence of missing
data. Bioinformatics 33(8):1250–1252
Gandomi AH, Yang X-S (2014) Chaotic bat algorithm. J Comput Sci
5(2):224–232
Kadam VJ, Jadhav SM, Vijayakumar K (2019) Breast cancer diagnosis
using feature ensemble learning based on stacked sparse autoen-
coders and softmax regression. J Med Syst 43(8):1–11
Latha CBC, Jeeva SC (2019) Improving the accuracy of prediction of
heart disease risk based on ensemble classification techniques.
Inform Med Unlocked 16:1–9
Liu K, Kang G, Zhang N, Hou B (2018) Breast cancer classifica-
tion based on fully-connected layer first Convolutional neural
networks. IEEE Access 6:23722–23732
Fig. 8  Accuracy Results Comparison Vs Classifiers
Fig. 9  NRMSE Value Comparison Vs Classifiers
Int J Syst Assur Eng Manag
1 3
Malvia S, Bagadi SA, Dubey US, Saxena S (2017) Epidemiol-
ogy of breast cancer in Indian women. Asia Pac J Clin Oncol
13(4):289–295
Masud M, Rashed AEE, Hossain MS (2020) Convolutional neural
network-based models for diagnosis of breast cancer. Neural
Comput Appl, pp.1–12.
Raza K (2019) Improving the prediction accuracy of heart disease
with ensemble learning and majority voting rule,” InU-Health-
care Monitoring Systems, pp. 179–196, 2019.
Ren Y, Zhao P, Sheng Y, Yao D, Xu Z (2017) Robust softmax regres-
sion for multi-class classification with self-paced learning.
In: Proceedings of the ­
26th
International Joint Conference on
Artificial Intelligence (pp. 2641–2647).
Rokach L, Schclar A, Itach E (2014) Ensemble methods for multi-
label classification. Exp Syst Appl 41(16):7507–7523
Sahoo AK, Pradhan C, Das H (2020) Performance evaluation of
different machine learning methods and deep-learning based
Convolutional neural network for health decision making.
In Nature inspired computing for data science (pp. 201–212).
Springer, Cham.
Sainath TN, Mohamed AR, Kingsbury B, Ramabhadran B (2013)
Deep Convolutional neural networks for LVCSR. In: Proceed-
ings of the 38th IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP ’13), pp. 8614–8618,
2013.
Sapra L, Sandhu JK, Goyal N (2021) Intelligent method for detection
of coronary artery disease with ensemble approach. Advances in
Communication and Computational Technology, vol. 1033–1042,
2021.
Sheikhpour R, Sarram MA, Gharaghani S, Chahooki MAZ (2017)
A survey on semi-supervised feature selection methods. Pattern
Recogn 64:141–158
Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2020)
A review of unsupervised feature selection methods. Artif Intell
Rev 53(2):907–948
Song L, Smola A, Gretton A, Borgwardt KM, Bedo J (2017) Super-
vised feature selection via dependence estimation,” in Proceed-
ings of the ­
24th
international conference on Machine learning, pp.
823–830, ACM, Corvallis, OR, USA, June 2017.
Van Houdt G, Mosquera C, Nápoles G (2020) A review on the long
short-term memory model. Artif Intell Rev 53(8):5929–5955
Vazifehdan M, Moattar MH, Jalali M (2019) A hybrid Bayesian net-
work and tensor factorization approach for missing value impu-
tation to improve BC recurrence prediction. J King Saud Univ-
Comput Inform Sci 31(2):175–184
Wang J, Wen G, Yang S, Liu Y (2018) Remaining useful life estimation
in prognostics using deep bidirectional LSTM neural network.
In 2018 Prognostics and System Health Management Conference
(PHM-Chongqing) ,pp. 1037–1042.
Xue B, Zhang M, Browne WN, Yao X (2015) A survey on evolutionary
computation approaches to feature selection. IEEE Trans Evol
Comput 20(4):606–626
Xu J, Tang B, He H, Man H (2016) Semisupervised feature selection
based on relevance and redundancy criteria. IEEE Trans Neural
Netw Learn Syst 28(9):1974–1984
Yang F, Shang F, Huang Y, Cheng J, Li J, Zhao Y, Zhao R (2017)
LFTF: a framework for efficient tensor analytics at scale. Proceed
VLDB Endowment 10(7):745–756
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds
exclusive rights to this article under a publishing agreement with the
author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of
such publishing agreement and applicable law.

More Related Content

Similar to An ensemble deep learning classifier of entropy convolutional neural network and divergence weight bidirectional LSTM for efficient disease prediction

BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
ijistjournal
 
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
ijistjournal
 
Hybrid model for detection of brain tumor using convolution neural networks
Hybrid model for detection of brain tumor using convolution neural networksHybrid model for detection of brain tumor using convolution neural networks
Hybrid model for detection of brain tumor using convolution neural networks
CSITiaesprime
 

Similar to An ensemble deep learning classifier of entropy convolutional neural network and divergence weight bidirectional LSTM for efficient disease prediction (20)

A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...
 
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
 
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
 
Hybrid model for detection of brain tumor using convolution neural networks
Hybrid model for detection of brain tumor using convolution neural networksHybrid model for detection of brain tumor using convolution neural networks
Hybrid model for detection of brain tumor using convolution neural networks
 
IRJET- Brain Tumor Detection using Convolutional Neural Network
IRJET- Brain Tumor Detection using Convolutional Neural NetworkIRJET- Brain Tumor Detection using Convolutional Neural Network
IRJET- Brain Tumor Detection using Convolutional Neural Network
 
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...
 
Comparison of breast cancer classification models on Wisconsin dataset
Comparison of breast cancer classification models on Wisconsin  datasetComparison of breast cancer classification models on Wisconsin  dataset
Comparison of breast cancer classification models on Wisconsin dataset
 
IRJET - Detection of Heamorrhage in Brain using Deep Learning
IRJET - Detection of Heamorrhage in Brain using Deep LearningIRJET - Detection of Heamorrhage in Brain using Deep Learning
IRJET - Detection of Heamorrhage in Brain using Deep Learning
 
Cervical cancer diagnosis based on cytology pap smear image classification us...
Cervical cancer diagnosis based on cytology pap smear image classification us...Cervical cancer diagnosis based on cytology pap smear image classification us...
Cervical cancer diagnosis based on cytology pap smear image classification us...
 
Updated proposal powerpoint.pptx
Updated proposal powerpoint.pptxUpdated proposal powerpoint.pptx
Updated proposal powerpoint.pptx
 
Multi Disease Detection using Deep Learning
Multi Disease Detection using Deep LearningMulti Disease Detection using Deep Learning
Multi Disease Detection using Deep Learning
 
Classification ANN
Classification ANNClassification ANN
Classification ANN
 
Hybrid CNN and LSTM Network For Heart Disease Prediction
Hybrid CNN and LSTM Network For Heart Disease PredictionHybrid CNN and LSTM Network For Heart Disease Prediction
Hybrid CNN and LSTM Network For Heart Disease Prediction
 
Discovering Abnormal Patches and Transformations of Diabetics Retinopathy in ...
Discovering Abnormal Patches and Transformations of Diabetics Retinopathy in ...Discovering Abnormal Patches and Transformations of Diabetics Retinopathy in ...
Discovering Abnormal Patches and Transformations of Diabetics Retinopathy in ...
 
DISCOVERING ABNORMAL PATCHES AND TRANSFORMATIONS OF DIABETICS RETINOPATHY IN ...
DISCOVERING ABNORMAL PATCHES AND TRANSFORMATIONS OF DIABETICS RETINOPATHY IN ...DISCOVERING ABNORMAL PATCHES AND TRANSFORMATIONS OF DIABETICS RETINOPATHY IN ...
DISCOVERING ABNORMAL PATCHES AND TRANSFORMATIONS OF DIABETICS RETINOPATHY IN ...
 
BRAIN TUMOR DETECTION
BRAIN TUMOR DETECTIONBRAIN TUMOR DETECTION
BRAIN TUMOR DETECTION
 
Brain Tumor Detection From MRI Image Using Deep Learning
Brain Tumor Detection From MRI Image Using Deep LearningBrain Tumor Detection From MRI Image Using Deep Learning
Brain Tumor Detection From MRI Image Using Deep Learning
 
1 springer format chronic changed edit iqbal qc
1 springer format chronic changed edit iqbal qc1 springer format chronic changed edit iqbal qc
1 springer format chronic changed edit iqbal qc
 
Predicting heart failure using a wrapper-based feature selection
Predicting heart failure using a wrapper-based feature selectionPredicting heart failure using a wrapper-based feature selection
Predicting heart failure using a wrapper-based feature selection
 
A Comprehensive Evaluation of Machine Learning Approaches for Breast Cancer C...
A Comprehensive Evaluation of Machine Learning Approaches for Breast Cancer C...A Comprehensive Evaluation of Machine Learning Approaches for Breast Cancer C...
A Comprehensive Evaluation of Machine Learning Approaches for Breast Cancer C...
 

More from BASMAJUMAASALEHALMOH

A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...
A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...
A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...
BASMAJUMAASALEHALMOH
 
Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...
BASMAJUMAASALEHALMOH
 
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
BASMAJUMAASALEHALMOH
 
An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...
BASMAJUMAASALEHALMOH
 
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...
BASMAJUMAASALEHALMOH
 
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...
BASMAJUMAASALEHALMOH
 
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
BASMAJUMAASALEHALMOH
 
Wang2022_Article_ArtificialIntelligenceForPredi.pdf
Wang2022_Article_ArtificialIntelligenceForPredi.pdfWang2022_Article_ArtificialIntelligenceForPredi.pdf
Wang2022_Article_ArtificialIntelligenceForPredi.pdf
BASMAJUMAASALEHALMOH
 

More from BASMAJUMAASALEHALMOH (10)

A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...
A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...
A_Healthcare_Monitoring_System_for_the_Diagnosis_of_Heart_Disease_in_the_IoMT...
 
Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...
 
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
 
An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...An automatic heart disease prediction using cluster-based bidirectional LSTM ...
An automatic heart disease prediction using cluster-based bidirectional LSTM ...
 
A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...
 
An optimal heart disease prediction using chaos game optimization‑based recur...
An optimal heart disease prediction using chaos game optimization‑based recur...An optimal heart disease prediction using chaos game optimization‑based recur...
An optimal heart disease prediction using chaos game optimization‑based recur...
 
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...
Deep Spectral Time‑Variant Feature Analytic Model for Cardiac Disease Predict...
 
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...
Bidirectional Recurrent Network and Neuro‑fuzzy Frequent Pattern Mining for H...
 
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
Diagnosis of Cardiac Disease Utilizing Machine Learning Techniques and Dense ...
 
Wang2022_Article_ArtificialIntelligenceForPredi.pdf
Wang2022_Article_ArtificialIntelligenceForPredi.pdfWang2022_Article_ArtificialIntelligenceForPredi.pdf
Wang2022_Article_ArtificialIntelligenceForPredi.pdf
 

Recently uploaded

Gorgeous Call Girls In Pune {9xx000xx09} ❤️VVIP ANKITA Call Girl in Pune Maha...
Gorgeous Call Girls In Pune {9xx000xx09} ❤️VVIP ANKITA Call Girl in Pune Maha...Gorgeous Call Girls In Pune {9xx000xx09} ❤️VVIP ANKITA Call Girl in Pune Maha...
Gorgeous Call Girls In Pune {9xx000xx09} ❤️VVIP ANKITA Call Girl in Pune Maha...
Sheetaleventcompany
 
Call Girls in Udaipur Girija Udaipur Call Girl ✔ VQRWTO ❤️ 100% offer with...
Call Girls in Udaipur  Girija  Udaipur Call Girl  ✔ VQRWTO ❤️ 100% offer with...Call Girls in Udaipur  Girija  Udaipur Call Girl  ✔ VQRWTO ❤️ 100% offer with...
Call Girls in Udaipur Girija Udaipur Call Girl ✔ VQRWTO ❤️ 100% offer with...
mahaiklolahd
 
Kottayam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Kottayam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetKottayam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Kottayam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh
 
9316020077📞Majorda Beach Call Girls Numbers, Call Girls Whatsapp Numbers Ma...
9316020077📞Majorda Beach Call Girls  Numbers, Call Girls  Whatsapp Numbers Ma...9316020077📞Majorda Beach Call Girls  Numbers, Call Girls  Whatsapp Numbers Ma...
9316020077📞Majorda Beach Call Girls Numbers, Call Girls Whatsapp Numbers Ma...
Goa cutee sexy top girl
 
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...
dilpreetentertainmen
 
Call Girl in Indore 8827247818 {Low Price}👉 Nitya Indore Call Girls * ITRG...
Call Girl in Indore 8827247818 {Low Price}👉   Nitya Indore Call Girls  * ITRG...Call Girl in Indore 8827247818 {Low Price}👉   Nitya Indore Call Girls  * ITRG...
Call Girl in Indore 8827247818 {Low Price}👉 Nitya Indore Call Girls * ITRG...
mahaiklolahd
 
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetsurat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh
 
Call Girl in Indore 8827247818 {Low Price}👉 Meghna Indore Call Girls * DXZ...
Call Girl in Indore 8827247818 {Low Price}👉   Meghna Indore Call Girls  * DXZ...Call Girl in Indore 8827247818 {Low Price}👉   Meghna Indore Call Girls  * DXZ...
Call Girl in Indore 8827247818 {Low Price}👉 Meghna Indore Call Girls * DXZ...
mahaiklolahd
 

Recently uploaded (20)

Gorgeous Call Girls In Pune {9xx000xx09} ❤️VVIP ANKITA Call Girl in Pune Maha...
Gorgeous Call Girls In Pune {9xx000xx09} ❤️VVIP ANKITA Call Girl in Pune Maha...Gorgeous Call Girls In Pune {9xx000xx09} ❤️VVIP ANKITA Call Girl in Pune Maha...
Gorgeous Call Girls In Pune {9xx000xx09} ❤️VVIP ANKITA Call Girl in Pune Maha...
 
Independent Call Girls Service Chandigarh Sector 17 | 8868886958 | Call Girl ...
Independent Call Girls Service Chandigarh Sector 17 | 8868886958 | Call Girl ...Independent Call Girls Service Chandigarh Sector 17 | 8868886958 | Call Girl ...
Independent Call Girls Service Chandigarh Sector 17 | 8868886958 | Call Girl ...
 
Call Now ☎ 8868886958 || Call Girls in Chandigarh Escort Service Chandigarh
Call Now ☎ 8868886958 || Call Girls in Chandigarh Escort Service ChandigarhCall Now ☎ 8868886958 || Call Girls in Chandigarh Escort Service Chandigarh
Call Now ☎ 8868886958 || Call Girls in Chandigarh Escort Service Chandigarh
 
❤️Ludhiana Call Girls ☎️98157-77685☎️ Call Girl service in Ludhiana☎️Ludhiana...
❤️Ludhiana Call Girls ☎️98157-77685☎️ Call Girl service in Ludhiana☎️Ludhiana...❤️Ludhiana Call Girls ☎️98157-77685☎️ Call Girl service in Ludhiana☎️Ludhiana...
❤️Ludhiana Call Girls ☎️98157-77685☎️ Call Girl service in Ludhiana☎️Ludhiana...
 
Call Girls in Udaipur Girija Udaipur Call Girl ✔ VQRWTO ❤️ 100% offer with...
Call Girls in Udaipur  Girija  Udaipur Call Girl  ✔ VQRWTO ❤️ 100% offer with...Call Girls in Udaipur  Girija  Udaipur Call Girl  ✔ VQRWTO ❤️ 100% offer with...
Call Girls in Udaipur Girija Udaipur Call Girl ✔ VQRWTO ❤️ 100% offer with...
 
Kottayam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Kottayam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetKottayam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Kottayam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Sexy Call Girl Kumbakonam Arshi 💚9058824046💚 Kumbakonam Escort Service
Sexy Call Girl Kumbakonam Arshi 💚9058824046💚 Kumbakonam Escort ServiceSexy Call Girl Kumbakonam Arshi 💚9058824046💚 Kumbakonam Escort Service
Sexy Call Girl Kumbakonam Arshi 💚9058824046💚 Kumbakonam Escort Service
 
9316020077📞Majorda Beach Call Girls Numbers, Call Girls Whatsapp Numbers Ma...
9316020077📞Majorda Beach Call Girls  Numbers, Call Girls  Whatsapp Numbers Ma...9316020077📞Majorda Beach Call Girls  Numbers, Call Girls  Whatsapp Numbers Ma...
9316020077📞Majorda Beach Call Girls Numbers, Call Girls Whatsapp Numbers Ma...
 
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...
 
Dehradun Call Girls 8854095900 Call Girl in Dehradun Uttrakhand
Dehradun Call Girls 8854095900 Call Girl in Dehradun  UttrakhandDehradun Call Girls 8854095900 Call Girl in Dehradun  Uttrakhand
Dehradun Call Girls 8854095900 Call Girl in Dehradun Uttrakhand
 
💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...
💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...
💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...
 
Budhwar Peth ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Budhwar Peth ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Budhwar Peth ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Budhwar Peth ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
Sexy Call Girl Villupuram Arshi 💚9058824046💚 Villupuram Escort Service
Sexy Call Girl Villupuram Arshi 💚9058824046💚 Villupuram Escort ServiceSexy Call Girl Villupuram Arshi 💚9058824046💚 Villupuram Escort Service
Sexy Call Girl Villupuram Arshi 💚9058824046💚 Villupuram Escort Service
 
Independent Call Girls Hyderabad 💋 9352988975 💋 Genuine WhatsApp Number for R...
Independent Call Girls Hyderabad 💋 9352988975 💋 Genuine WhatsApp Number for R...Independent Call Girls Hyderabad 💋 9352988975 💋 Genuine WhatsApp Number for R...
Independent Call Girls Hyderabad 💋 9352988975 💋 Genuine WhatsApp Number for R...
 
Call Girl in Indore 8827247818 {Low Price}👉 Nitya Indore Call Girls * ITRG...
Call Girl in Indore 8827247818 {Low Price}👉   Nitya Indore Call Girls  * ITRG...Call Girl in Indore 8827247818 {Low Price}👉   Nitya Indore Call Girls  * ITRG...
Call Girl in Indore 8827247818 {Low Price}👉 Nitya Indore Call Girls * ITRG...
 
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetsurat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Call Girl in Indore 8827247818 {Low Price}👉 Meghna Indore Call Girls * DXZ...
Call Girl in Indore 8827247818 {Low Price}👉   Meghna Indore Call Girls  * DXZ...Call Girl in Indore 8827247818 {Low Price}👉   Meghna Indore Call Girls  * DXZ...
Call Girl in Indore 8827247818 {Low Price}👉 Meghna Indore Call Girls * DXZ...
 
(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...
(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...
(Deeksha) 💓 9920725232 💓High Profile Call Girls Navi Mumbai You Can Get The S...
 
Ludhiana Call Girls Service Just Call 6367187148 Top Class Call Girl Service ...
Ludhiana Call Girls Service Just Call 6367187148 Top Class Call Girl Service ...Ludhiana Call Girls Service Just Call 6367187148 Top Class Call Girl Service ...
Ludhiana Call Girls Service Just Call 6367187148 Top Class Call Girl Service ...
 
AECS Layout Escorts (Bangalore) 9352852248 Women seeking Men Real Service
AECS Layout Escorts (Bangalore) 9352852248 Women seeking Men Real ServiceAECS Layout Escorts (Bangalore) 9352852248 Women seeking Men Real Service
AECS Layout Escorts (Bangalore) 9352852248 Women seeking Men Real Service
 

An ensemble deep learning classifier of entropy convolutional neural network and divergence weight bidirectional LSTM for efficient disease prediction

  • 1. Vol.:(0123456789) 1 3 Int J Syst Assur Eng Manag https://doi.org/10.1007/s13198-022-01837-5 ORIGINAL ARTICLE An ensemble deep learning classifier of entropy convolutional neural network and divergence weight bidirectional LSTM for efficient disease prediction S. R. Lavanya1 · R. Mallika2 Received: 7 March 2022 / Revised: 17 September 2022 / Accepted: 12 December 2022 © The Author(s) under exclusive licence to The Society for Reliability Engineering, Quality and Operations Management (SREQOM), India and The Division of Operation and Maintenance, Lulea University of Technology, Sweden 2022 Abstract According to the World Health Organization (WHO), worldwide rise in death rates amongst women is mostly attributable to HD (Heart Disease) and BC (breast cancer). In the last three decades, India has experienced a 30% increase in incidences of BC’s where every nine min- utes, a woman dies of BC. HD has become very common, and their early identification is crucial for clinicians to save lives and protect their patients. Early diagnosis of BC and HD are thus beneficial. MLT (Machine Learning Technique) have been used in medical diagnostics for detecting illnesses, minimizing modalities, and enhancing survival rates. Single classifier or Conventional classification algorithms exhibit a limited accuracy and affect the prediction outcome. Ensem- ble learning is a technique for producing many basic clas- sifiers are those that are used to construct new classifiers created that outperforms any constituent classifier. In this work, EDL (Ensemble Deep Learning) classifier is estimated based on two classifiers, ECNN (Entropy Convolutional Neural Network) and DWBi-LSTM. These classifiers are applied to an imputed dataset of heart disease and breast cancer, with selected feature subsets. In the ECNN classifier, the weight of the Convolutional layers is computed via the entropy function. In the DWBi-LSTM classifier, divergence is used to assign the attentive weights of each feature to the classification. The results of ECNN and DWBi-LSTM clas- sifiers are combined via bootstrap aggregation and the final classification is predicted by majority voting for accuracy. Metrics such as precision, recall, f-measure, specificity and accuracy has been used for assessing the results of proposed system, and existing classifiers. Keywords Heart disease · Breast cancer · Feature selection · ECNN · DWBi-LSTM · EDL Abbreviations AMCGWO Adaptive mean chaotic grey wolf optimization BC Breast cancer BiGRU​ Bidirectional gated recurrent unit Bi-LSTM Bidirectional- long short term memory BN Bayesian network BPNN Backpropagation neural network CAD Computer-aided diagnosis CNN Convolutional neural network CP Canonical polyadic DBN Deep belief network DLT Deep learning technique DNN Deep neural network DT Decision tree DWBi-LSTM Divergence weight bidirectional-long short-term memory ECNN Entropy convolutional neural network EDL Ensemble deep learning EM Expectation–maximization FCLF-CNN Fully connected layer first convolutional neural network FFNN Feed-forward neural network FNA Fine needle aspiration Grad-CAM Gradient-weighted class activation mapping HD Heart disease * S. R. Lavanya srlavanyaraj@gmail.com R. Mallika mallikapanneer@hotmail.com 1 Research and Development Centre, Bharathiar University, Coimbatore, India 2 Department of Computer Science, CBM College, Coimbatore, India
  • 2. Int J Syst Assur Eng Manag 1 3 KL Kullback–Leibler KNN K Nearest neighbor LM Liebenberg marquardt LR Logistic regression MCC Matthews correlation coefficient MFWCP Mode fuzzy weight based canonical polyadic MLP Multilayer perceptron MLT Machine learning technique MSD Mean squared deviation NB Naive bayes NN Neural network ReLU Rectified linear unit RNN Recurrent neural network SSAE-SM Stacked sparse autoencoder and softmax regression-based model SVM Support vector machine TFMF Trapezoidal fuzzy membership function UCI University of California Irvine WBCD Wisconsin breast cancer dataset WDBC Wisconsin diagnostic breast cancer WHO World health organization 1 Introduction Human beings suffer from a multitude of diseases that affect them both physically and psychologically. Diseases develop primarily from infections or deficiencies or heredity traits or organ dysfunctions. Doctors or medical experts detect and diagnose disease-affected humans and include medical ther- apy to treat them. Though many diseases can be cured with therapies, HDand BCcannot be cured despite treatment, but medications can prevent these diseases from getting worse over a period of time. BC is a type of cancer that devel- ops in breast cells and is a very common illness in women. According to projections for India in 2020, the number of Patients might reach two million (Malvia et al. 2017). Men are more likely to be affected by HD than women. Accord- ing to WHO, HD is responsible for 24% of non—communi- cable disease deaths in India. Assessing HD based on risk factors manually is a challenging task where the diagnosis is the process of determining, explaining, or establishing a human’s condition based on their symptoms and indications. Early and accurate diagnosis is critical because it affects treatment efficiency and prevents long-term consequences for the infected person. Diagnostic errors are responsible for around 10% of patient deaths and a variety of severe consequences and/or occurrences in hospitals (De and Chakraborty 2020). Errors in diagnosis may result from a variety of causes including mismatches in communica- tions between doctors, patients, and their families or poor diagnostic procedures, or inefficient information from healthcare systems. MLTs are sophisticated, automated techniques for analyz- ing high-dimensional, multimodal biomedical data and can greatly expedite and enhance medical diagnosis processes. One way of using these techniques is predicting depend- ent variables based on the values of independent variables. MLTs once designed can repeat tasks with great accuracy, a crucial factor for making decisions in healthcare. MLTs which can classify accurately are important components of CAD (Computer-Aided Diagnosis) systems designed for assisting medical practitioners who use them in the early diagnosis of abnormalities. CAD systems help radiologists visually examine mammograms to reduce chances of mis- diagnosis due to fatigue or eye strain or inexperience. Thus, proper use of CAD systems in healthcare can undoubtedly save lives. MLTs that have been used in CAD systems for predictions of abnormalities include SVM, KNN, NN, DT, and NB. The presence of irrelevant characteristics in training sets affects classifier performances which can be enhanced by discarding unnecessary features and choosing a subset of significant characteristics called feature selections. These approaches can be supervised (Song et al. 2017)-(Solorio- Fernández et al. 2020), unsupervised (Sheikhpour et al. 2017), or semi-supervised (Xu et al. 2016)-(Ang et al. 2015) based on how the training sets are labeled. Filter, wrapper, and filtering is used in guided component choice techniques like embedded has minimal effects on classifications. Wrap- per on the other hand uses prediction accuracies of preset learning algorithms to measure the quality of selected fea- tures. An embedded technique like filters begins by using statistical criteria for selecting many possible feature subsets with specific cardinalities. Subsequently, the subset with the best accuracy is chosen for classifications. Unsupervised fea- ture choices may be used on unmarked data, but it can be challenging to determine which characteristics are relevant. Marked and unmarked data are used in semi-supervised fea- ture choices to determine the feature’s significance. Biologi- cal processes can be processed by computational algorithms and thus aid in problem-solving and decision-making (Xue et al. 2015). On selection of features, DLT like DNN, RNN, DBN, and CNN can be used to diagnose diseases. Using ensemble algorithms enhances illness predictions or catego- rization accuracy (Rokach et al. 2014). This paper proposes a diagnostic framework for clin- ics using bio-inspired algorithms for feature selection and EDL (Ensemble Deep Learning) for the categorization of BC and HD. The obtained data is initially separated into categorical and continuously subsets, using BN (Bayesian Network)-imputed discrete fields and MFWCP-imputed ten- sor decomposition subsets. In addition, AMCGWO is used for wrapper-based feature selection technique which results
  • 3. Int J Syst Assur Eng Manag 1 3 in selecting key features from missing data imputations. Finally, EDL based classifier detects BC and HD. The pro- posed scheme’s outcomes are evaluated using performance evaluation metrics. This article’s remainder is organized as following; Outlines of related works on ensemble classi- fiers for BC and HD are provided in part 2, and the recom- mended technique is explained in Sect. 3. Division 4 shows the details of experimental studies along with results and discussions. In Sect. 5 research study is concluded in detail. 2 Literature review Raza (Raza 2019) In order to predict cardiac disorders, a variety of machine learning techniques, including Logistic Regression (LR), Neural Networks (NB), and the Multi- layer Perceptron (MLP), are combined with an accuracy of 88.88%. In order to get a final determination, by weighing and combining numerous separate classifiers, ensemble learning is also used to increase the validity of a categoriza- tion. The output of an assembly system is significantly influ- enced by how the classifiers’ outputs are mixed. The findings were compared to previously published studies, which found that the ensemble technique outperformed individual classi- fiers in terms of accuracy. The suggested ensembles method is presented to improve the model’s capacity to forecast car- diac illness accurately, robustly, and consistently, as well as to prevent patient’s misinterpretation. A majority vote ensemble approach developed by Atallah and Al-Mousa (Atallah and Al-Mousa 2019) predicted the existence of HD in individuals. Their forecasts were based on basic, low-cost medical test results conducted at clin- ics. They trained their model using real-life data consisting of healthy and sick individuals as they aimed to increase the trust and accuracy for diagnosing clinicians. In order to produce more accurate results, the study identified patients using many MLT. Their strategy resulted in 90% accuracy of detection while using the hard voting ensemble model. Sapra et al. (Sapra et al. 2021) employed MLT in the diagnosis of HD. Their scheme was a quick recursive pro- cedure that was very low in cost and accurate. Patient data from clinics were inputs for the scheme which was predicted based on these low-cost clinical test results. Moreover, since the scheme was put to test on the results of patients and healthy individual’s results, its results on predictions were more trustworthy. The study also benchmarked several MLT for evaluations where they found that their proposed strategy which employed a hard voting ensemble model resulted in 90% accuracy. Latha and Jeeva (Latha and Jeeva 2019) investigated ensemble classifications by combining numerous classi- fiers to increase the accuracy of weak algorithms. Their experiments using the tool were executed on a heart disease database dataset. The study employed comparative analytical methods to see how ensembles could be used to increase the prediction accuracy of HD. The use of ensem- ble classification resulted in a maximum accuracy boost of 7.00% for weak classifiers. Implementing feature selection boosted the process’s performance even further, with the results indicating a significant rise in prediction accuracy. In their data preparation, Baccouche et al. (Baccouche et al. 2020) proposed a feature selection phase. Experi- ments with unidirectional and bidirectional NNs revealed that ensemble classifiers with Bi-LSTM model combined with CNN achieved the most successful categorization result for predicting various types of HD and with accu- racy and F1-scores ranging from 91.00 to 96.00%. DLT based ensemble-learning framework could address classifi- cation issues of unbalanced heart disease datasets and their proposed technique could result in exceptionally accurate models suitable for clinical real-world diagnosis. Kadam et al. (Kadam et al. 2019) Sparse Auto encoders and Softmax Regression were used to categorize BC as benign or cancerous. The suggested approach was tested using the UCI machine learning library’s Wisconsin Diag- nostic Breast Cancer (WDBC) dataset utilizing efficiency in categorization, sensitivities, recollection, recollection, clarity, and f measure and Matthews correlation coefficient (MCC). The method has excellent reliability and efficiency characteristics. Their approach beat SSAE-SM and other classifiers in experiments, indicating the approach may be beneficial for categorizing BC. Elgin Christo et al. (Elgin Christo et al. 2019) described a clinical diagnostics system that used bio-inspired feature selection approaches with gradient descendent BPNN for classifications. The study’s correlation-based ensemble feature selections selected best features from three fea- ture subsets which were obtained via correlation-based ensemble feature selections and subsequently trained on gradient descendent BPNN. The study used ten-fold cross- validations to train and assess the classifier’s performance where classification accuracy was assessed using the UCI Machine Learning Repository’s Hepatitis dataset and WDBC dataset. Liu et al. (Liu et al. 2018) proposed the use using CNN to enhance the accuracy of categorization for datasets. The investigation to enhance organized data categorization effi- ciency recommended using FCLF-CNN (Fully Connected Layer First- Convolutional Neural Network). Before the first convolutional layer, fully connected layers are combined and fully connected layers are used as encoders or approximates to transform raw data into representations of locations. To boost performances, the study trained four different types of FCLF-CNN and combined them to form an ensemble FCLF-CNN. The results from WDBC and WBCD datasets were cross validated five folds. In classification results, the
  • 4. Int J Syst Assur Eng Manag 1 3 proposed FCLF-CNN outperformed MLP and CNN on both datasets. Masud et al. (Masud et al. 2020) created shallow propri- etary CNNs that outperformed pre-trained models across a wide range of performance metrics. To avoid bias, the study’s model was trained using a fivefold cross-validation strategy. Furthermore, the model was simpler to train than pre-trained models as it required very few trainable param- eters. The proposed Grad-CAM (Gradient-weighted Class Activation Mapping) heat map visualizations clearly dis- played that the proposed framework could extract crucial characteristics for diagnosing BCs.Recently, CAD system is developed to detect classes in an efficient manner. How- ever single classifiers are will not enhance performance of the system due to irrelevant features, incomplete dataset. Ensemble learning is a technique for producing with numer- ous basic classifications, a fresh one is created. created that outperforms any constituent classifier and then issues of missing data, and feature selection has been also solved by factorization, and optimization methods. 3 Proposed methodology The major advantage of the suggested strategy aims to improve breast cancer and heart disease performances. prediction by employing an EDL classifier. The proposed approach comprises five stages i.e., data splitting, data pre- processing, feature selection, training model, and perfor- mance evaluation. In the first Stage, the actual data is split into discrete and continuous feature sets. The missing value imputation is performed by the Bayesian network (BN) in the second stage. From this, reconstruction of data and ten- sor factorization is performed by MFWCP. In the third stage, Feature selection by AMCGWO is performed for reducing the number of features in the dataset. In the fourth Stage, the EDL classifier is utilized to improve two classifiers’ accu- racy of ECNN and DWBi-LSTM for diseases prediction. The evaluation metrics like precision, sensitivity, specific- ity, F-measure, and accuracy for assessing the classifiers. Figure 1 depicts the proposed method architecture. 3.1  Imputation methods for incomplete dataset When data values are missing, classifier accuracy suffers, and Imputed values based on these data become essential. BN makes assumptions in place of lacking data in that study because of its capacity to represent ambiguity through causa- tive relationships among factors. This work’s primary goal is to impute imbalanced datasets for enhanced predictions of BC. Missing at Random (MAR) is used to attain values that are absent from one of the reported databases instances. In this study, the dataset is utilized to teach the Directed Acyclic Graph (DAG) characteristics of dependent prob- abilities distributes. With only two stages, EM (Expecta- tion–Maximization) is a fast BN technique that iteratively finds the greatest probabilities. The Expectations phase com- putes the log probability of the data, and a Log step that computes the network’s current structure, log, and param- eters (Franzin et al. 2017). The maximizing stage next iden- tifies the parameters that maximize the probability from the prior action. Repeating the process up to the network reaches equilibrium or no parameters are present. As a consequence, the training of missing data is successful. 3.2  Reconstruction and imputation using tensor Via MFWCP Tensors are configurations that exist across several dimen- sions, and the degree of a tensor is proportional to the num- ber of dimensions it contains. Tensor factorizations result in more accuracy, but they take much more time to compute. Tensor factorization models Tucker and CP are quite well- known at this point (Yang et al. 2017). MFWCP, factoriza- tion has as its major objective the correction of errors that occur during the restoration of basic tensors as well as the aggregate of singular tensors ranking tensors with the fewest t deviations from the original tensor. TFMF (Trapezoidal Fuzzy Membership Function) are used to construct fuzzy weight tensors that are not negative and have fuzzy member- ship values. In order to compare comparable weighted ten- sors, such as the original tensor, to deliberate missing data imputation. Reconstruction results are measured via Mean Squared Deviation (MSD), and the results are calculated using Eq. (1) (Vazifehdan et al. 2019): where Output(t)i—estimated value in tth iteration and n— count of missing values. 3.3  Feature selection Via AMCGWO In this method, AMCGWO is used to choose features, and EDL is used to determine the optimum feature set. AMCGWO imitates grey wolves’ hunting and prey-search- ing behaviours for optimum database component extraction. AMCGWO believes grey wolf social structures relate 𝛼 first, 𝛽 second, 𝛿 at next, and finally 𝜔 wolves. 𝛼 are the dominant wolves that are utilized for leading and controlling the entire pack of grey wolves in order to choose those with the most desirable characteristics.𝛽 wolf is the supreme candidate that receives feedback from other wolves and provides it to the head wolf. The next level of wolves, i.e.𝛿 wolves, control the wolves, and final level 𝜔 wolves that are dependable (1) MSD = 1 n n ∑ i=1 ( Outputi(t) − Outputi(t − 1) )2
  • 5. Int J Syst Assur Eng Manag 1 3 for preserving the consistency and safety of the wolf group (Faris et al. 2018). The ranges of the method’s regulat- ing parameters, including a, A, and C, are first assessed the random vectors now ��⃗ r1 and ��⃗ r2 within [0, 1] are used to the wolves getting between them and their prey. Here, the concept’s mean is used to calculate the vector value. Enhance the randomized vectors if the mean value is more important for categorization; otherwise, lower it. Though the convergence rate of GWO is high, thus it doesn’t work fine in identifying global optima that affect the algorithm’s rate of convergence. Thus, so as to decrease the effect and enhance efficiency, the AMCGWO method was built by hav- ing confusion in the GWO method. For these chaotic maps, the initial value is identified between 0 and 1. Nevertheless, these initial values can have a significant change in chaotic map patterns. The current collection of chaotic maps is cho- sen using a variety of behaviors, with the starting value set at 0.7. Initially, stochastic initialization for the population is performed through the quantity of grey wolves. Next, map- ping of the selected ICMIC map with the approach is per- formed during the initialization of the initial chaotic value and the variable (Gandomi and Yang 2014). The AMCGWO Fig. 1  Proposed Feature Selec- tion and Ensemble Deep Learn- ing (EDL) Classifier for Disease Diagnosis Incomplete Data Tensor factorization with MFWCP Ensemble Deep Learning (EDL) Dataset with discrete missing values Dataset with continuous missing values Missing data imputation by Bayesian network Reconstruction and imputation using tensor factorization Optimal filled dataset Mean SquaredDeviation (MSD) No Yes ECNN DWBi-LSTM AMCGWO Performance analysis Bootstrap aggregation
  • 6. Int J Syst Assur Eng Manag 1 3 approach’s parameters a, A, and C are specified as being comparable to CGWO in order to be employed in extraction and exploratory operations. All of the grey wolves’ fitness is evaluated using the benchmark function, and characteristics are then ranked according to their fitness values. The most suited wolf is the best outcome of the AMCGWO procedure at the last iteration. 3.4  Ensemble deep learning (EDL) classification Ensemble is a method that may be used to increase a classi- fier’s accuracy. It’s a helpful meta categorization strategy to pair pairing less capable students with more capable students to increase the effectiveness of the less capable students. The performance of numerous illness detection algorithms is improved in this work using the ensemble deep learning (EDL) technique. The goal of integrating several classifiers is to achieve greater performance than an individual classi- fier. Figure 2 shows theensemble deep learning (edl) process. In this work two classifiers like ECNN and DWBi-LSTM are combined via bootstrap aggregation (Ren et al. 2017). 3.4.1  Entropy convolutional neural network (ECNN) CNN is modeled as an FFNN (Feed-Forward Neural Net- work) with fully linked, compression, and max-pooling lay- ers. Convolutions come next, then max-pooling layers, and the completely linked layer serves as the final layer (Bashir et al. 2015). 3.4.1.1 Convolutional layers In convolutional layers, the weights are represented as the multiplicative factor of the filters.In the proposed work, the weight of the Convolutional layers is computed via the entropy function. Entropy is used to compute the weight value of the layer by considering the feature range to classes. If the feature range is higher for the positive class, then the entropy range is also higher which results in increased weight value and a reduced bias value. If the feature range is lower for the positive class, then the entropy range is lower which results in reduced weight value and a reduced bias value. Based on this classifier results are enhanced for disease diagnosis. Let vi ∈ ℝk be k dimen- sioned feature vector related to the ­ ith sample of the data- base. An extended dataset is indicated by Eq. (2), here ⊕ represent the concatenation operator. A filter w ∈ ℝhk is used by the convolution operation, which is uti- lized in the time period for developing a fresh feature using h features. Consider, a feature ci is developed from a window of features vi∶i+h−1 by Eq. (3), where b ∈ R indicate a hyperbolic tangential and biased factor are examples of non-linear functions. The fil- ter is utilized by every feature window in the dataset { v1∶h, v2∶h+1, ..., vn−h+1∶n } to build a feature map by Eq. (4), with c ∈ ℝn−h+1. Max pooling method is performed on the feature map and acquires maximum value ̂ c = max{c} as the feature related to this filter. The goal is to search for highly significant features with high value for every feature map. 3.4.1.2 Dropout layer The dropout is performed with weighted vector l2-norms of constraint to have regulariza- tions. It is mentioned by Eq. (5), The output y in forward propagation, z is denoted as the input samples, dropout utilizes Eq. (6), where ◦ represent component multiplying operation, as well as r ∈ ℝm, the vector used to mask Bernoulli random vari- ables when p is 1. Gradients are transmitted backwards using uncovered integers. Sizing of learnt weight vectors by p is done during the test period so ̂ w = pw, and ̂ w is utilized to achieve samples. In addition to that l2-norm restriction of the weighted vectors by w’s resizing to ||w||2 = s when ||w||2 s after a step of steep decline.In Eq. (6), w and b denote the weight and bias of classifier which is calculated (2) v1∶n = v1 ⊕ v2 ⊕ … ⊕ vn (3) ci = f ( w.vi∶i+h−1 + b ) (4) c = [ c1, … .cn−h+1 ] (5) y = w.z + b (6) y = w.(z◦r) + b Training set 1 Training set 2 Classifier 1 Classifier 2 Ensemble Classifier Combined results using averaging Prediction in the test set Feature Selection Test set Fig. 2  Ensemble deep learning (EDL) process
  • 7. Int J Syst Assur Eng Manag 1 3 via the entropy basedon feature importance. The quadratic entropy of information is calculated by Eq. (7), Here P(x=k) is the chances that a specified characteristic will have a certain number, k. if the entropy value is higher than the weight and bias of the vector or the samples is increased. It may give importance to the feature to the clas- sifier to predict the positive or negative class. If the entropy is higher belonging to the positive class, then the w and b of the classifier is increased to improve the prediction rate. 3.4.1.3 Softmax layer or fully connected layer ReLU is employed as an activation function. ReLU definition is dem- onstrate in Eq. (8) When x0, output=0, x0, output=x. 3.4.1.4 Output layer The final layer contains n neurons related to n feature classes. This is a fully connected layer. The common method is considering the high output neuron as a class label of given input in classification (Sainath et al. 2013). 3.4.2  Divergence weight bidirectional‑ long short‑term memory DWBi-LSTM classifier is used to diagnose various dis- eases in this work. LSTM is a specialized RNN architec- ture designed to learn long-term relationships (Sahoo et al. 2020). The cell additionally takes previous cell output state (Ct−1), the cell input state (̃ Ct), and the cell output state (Ct). For classification of various diseases, the LSTM architecture comprises three gates: forget, input, and output, abbrevi- ated as ft, it, and ot correspondingly.DWBi-LSTM classi- fier weight is computed via Kullback–Leibler (KL) diver- gence function. If the feature range is wide, the KL value is enhanced, resulting in a larger weight value. Otherwise, the classifier’s weight value is lowered for classification. It improves the classifier’s accuracy and lowers the system’s error. The cell state serves as a network storage, transmitting important data along the series. Which data on the status of the cell is authorized is determined by the gateways, which are NN. When the HD and BC datasets are trained, the gates will learn which data is crucial to preserve and which to discard. Equations (9–12) may be used to calculate the value of gates and cell state, (7) Entropy(x) = − ∑ ( P(x = k) ∗ log2(P(x = k) ) (8) f(x) = {0, if x 0x, if x ≥ 0 were DWf, DWi, DWo and DWc are divergence weights linking layer’s contribution to the states of all gateways and input cells Uf, Ui, Uo, Uc are the value vectors connecting the inpu cell state and all of the gateways to the preceding cell terminal side.bf, bi, bo,bc are bias vectors. σ and tanh are, respectively, the sigmoid and tanh activation functions. Cellular output state ( Ct ) , output layer ( ht ) , at every time iteration t, is computed as byequation (13)-(14), LSTM layer, the result vector for each of the outputs is shown. as YT = [ hT−n, … hT−1 ] .. Bidirectional LSTM is based on bidirectional RNN (Houdt et al. 2020). It analyzes successive data using two separate hidden layers, and trave- ling ahead or backward links those layers to the same output layer. Figure 3 shows a layered architecture for the Bi-LSTM network. The sequence input layer is also known as the first layer embedding layer. As input, it uses the sorted chosen features from the HD and BC datasets. Hidden forward and back- ward LSTM layers are the second and third layers, giving the 100-hidden unit Bi-LSTM layer. These two layers relate present data to prior and future phases. Two data sequences reach the system via the hidden layer. The outputs of the hidden layers are integrated after processing to create the Bi-LSTM layer’s final output. The following Eq. (15) may be used to calculate the output from both LSTM layers, when it accepts sequence from x1 and xT as input, h f t and hb t indicates the relative results of the advanced and reverse LSTM layers. 𝛼 and 𝛽 are used to adjust the Bi-LSTM factors. At the time, ht is dual bidirectional LSTM components. Bi-output LSTM’s feeds a completely linked level with Five categories. This layer links input characteristic data to output information so subsequent layers can categorize them. Ultimately, the soft- max and classification layers divide data into several classes. (9) ft = 𝜎 ( DWf xt + Uf ht−1 + bf ) (10) it = 𝜎 ( DWixt + Uiht−1 + bi ) (11) ot = 𝜎 ( DWoxt + Uoht−1 + bo ) (12) ̃ Ct = tanh ( DWcxt + Ucht−1 + bc ) (13) Ct = ft ∗ Ct−1 + it ∗ ̃ Ct (14) ht = ot*tanh ( Ct ) (15) ht = 𝛼h f t + 𝛽hb t
  • 8. Int J Syst Assur Eng Manag 1 3 The softmax activations transform real vector values into in the range 0 to 1, allowing probabilities to be understood. In the somax regression (Wang et al. 2018), the probability of classifying into a class may be calculated using Eq. (16). where the value of K indicates the total amount of categories 𝜃 is denoted as the model parameters. The model gets the results from the softmax function in the classification layer and assigns every input for a class that makes use of the cross-entropy function via Eq. (17). N observations, K categories, tij denotes that the ­ ith sam- ple belongs to the ­ jth class and yij. denotes the softmax value. Weighting features is essential in classification because it ensures that each feature has the same benefits in compari- son to the targeted idea. Let us assume that when a given feature value is seen, it provides a certain quantity of in addition to calculating the comparative relevance of the targeted characteristic, information is also provided to the target feature of every distinguishing characteristic in the categorization scheme,the discrepancy between prior and posterior distributions of the target feature is used to define the amount of information contained in a particular feature value. The range of weight is computed using Kullback–Lei- bler (KL) metric, which is computed using Eq. (18), Here frij is the ­ ith feature’s j value in training samples. The weighted average of the KL measurements across the (16) P � y(i) = k|x;θ � = exp � θ(k)Tx � ∑K j=1 exp � θ(j)Tx � (17) loss = − N ∑ i=1 K ∑ j=1 tij ln yij (18) KL ( C|frij ) = ∑ C P ( c|frij ) log ( P ( c|frij ) P(c) ) feature values is the feature weight. As a result, the weight of feature i represented by Eq. (19) as fwavg(i), P( frij) is the probability that the feature i has the value of frij in this Eq. (19). Above weight fwavg(i) favors charac- teristics that include a large amount of entries; as a result, a range of records linked with each feature value is too less to make any reliable learning. Equation (20) defines the final form of the weight of feature i denoted as fw(i) here Z is a normalization constant which is computed by Eq. (21), The value of n in equations represents the number of selected features from the training data this Eq. (21). The normalized form of fw(i) (Eq. (20) is presented in this work in order to verify that ∑ i fw(i) = n. Lastly, each gate in the Bi-LSTM classifier is updated with this weight value. The network’s hyper-parameters are initialized once the network has been defined. The qualities on which the whole training process is based are known as model’s hyper-parameters where hyperparameters can be model-specific or optimiza- tion specific. These parameters include epoch counts, batch sizes, and learning rates which impact performances consid- erably when optimized. Model-specific parameters are ele- ments that impact structures like hidden units or layer counts. These hyperparameters directly control training (19) fwavg(i) = ∑ j|i P ( frij ) .KL ( C|frij ) (20) fw(i) = ∑ j�i P � frij � ∑ c P � c�frij � log �P(c�frij) P(c) � −Z. ∑ j�i P � frij � log � P � frij �� (21) Z = 1 n ∑ i fw(i) Fig. 3  DWBi-LSTM Network Classes Input data Padded Input data 1 2 3 ……… …… Sorting data 1 2 3 Embedding Layer LSTM LSTM LSTM LSTM LSTM LSTM LSTM LSTM ℎ1 ℎ1 ℎ2 ℎ2 ℎ3 ℎ3 ℎ ℎ Fully Connected Layer Softmax Layer Classification Layer
  • 9. Int J Syst Assur Eng Manag 1 3 processes which have significant influences on the resulting performances of models. Hence, it is imperative to select the right parameters for a model’s learning. A huge number of trials are required to select optimal hyperparameters, which may be consuming time making operations difficult. Clas- sification accuracies of test sets are evaluated for obtaining appropriate hyperparameters sets. Hence, this research work diagnoses performances hyperparameters in both training and validation datasets for selecting the right mix in terms of classification results. This study considers learning rates, batch sizes, epoch counts, and hidden unit counts as its hyperparameters. 3.4.3 Bootstrap aggregation Bootstrap aggregation uses replacement to select some sam- ples from the training set at random. The term “bootstrap replicate” refers to a fresh training set. The process of obtaining bootstrap samples from data and training the clas- sifier with each individual sample is referred to as bagging. The combined votes of each classifier are tallied, and the final outcome of the classification is decided according to whichever vote has a greater number of affirmatives. Any other classifier that uses a majority vote may be merged into the majority voting classifier since it is a meta-classifier. The ultimate label for the class is the one that was anticipated by the vast majority of classifiers. The training set is used to generate samples for the bootstrap aggregation method, which selects them at random and replaces them. The new training set is referred to by the moniker bootstrap replicate throughout this article. The process of bagging entails obtaining samples of the bootstrap from the raw data, and then proceed to train the classifier using each individual example. After tallying up the votes from each classifier, the final outcome of the classification is decided according to whichever vote had the majority of support.The class label dJ is represented as dJ = mode { C1, C2, … Cn } , where {C1, C2, … Cn} refers to individual classifiers that partici- pated in voting. Let be ci,j the prediction of the ­ ith classifier on a class with j labels is represented as follows, n ∑ i=1 ci,j = max j=1,…,m n ∑ t=1 ci,j.The algorithm for bootstrap aggrega- tion is shown below. 4  Experiments and results discussion The influence of imputation methods was measured depend- ing on how well the predictions turned out. approaches in this study. The experiments assessed and examined perfor- mances based on detailed description of real datasets. 4.1 Datasets This study uses the UCI Machine Learning Repository’s WDBC and WOBC datasets for BCs, and its Hungarian and Switzerland datasets for HD. 4.1.1 WDBC Dataset, the characteristics of a digitized picture that belong to FNA (Fine Needle Aspiration) of breast mass have been added. The aforementioned features have defined the cell nuclei feature in the picture. There are 569 data points in the dataset, with 212 belonging to Malignant and 357 to Benign. The dataset was divided into 10 groups based on the following characteristics: fractal size, radius, symmetry, texture, concave points, perimeter, concavity, area, compact- ness, and smoothness. Three measures, such as Mean, SE, and the mean of the three greatest values, are calculated for each characteristic. As a result, there will be a total of 30 dataset features. 4.1.2 WOBC The dataset contains 699 samples obtained from the UCI repository. There are 458 benign samples and 241 malignant samples in the group. In addition, the dataset contains ten characteristics and one class. The class level is divided into two categories: benign and malignant. In addition, the data- base has missing data. The traits comprise the code number: id; clump thickness: 1–10; cell size–1–10; cell shape–1–10; peripheral bonding: 1–10; singular enterocytes size–1–10; naked nuclei: 1–10; bland chromatomatin: 1–10; regular nucleoli: 1–10; mitoses: 1–10; benign-2 and malignant-4 classes. 4.1.3  Heart disease (HD) Only a subset of 14 of the 76 qualities that make up heart disease (HD) are taken into account. Particularly, the Cleve- land database was used the most by MLT. A number of 0 (absence) to 4 (presence) in the goal field indicates the existence of cardiac disease. The tests performed on this database aimed to detect the present or absent of illness. 4.1.3.1 Hungarian dataset It contains 294 samples with 14 features. 4.1.3.2 Switzerland dataset It contains 123 samples with 14 features.
  • 10. Int J Syst Assur Eng Manag 1 3 4.2 Evaluation metrics Following completion of the dataset, any missing values are recognized, and the following metrics are utilized to calcu- late Precision, Recall, Specificity, F-measure, and Accuracy. Precision refers to the proportion of correctly recognized good results. Precision gives the proportion of positive predictions that are actually correct. It can be calculated by Eq. (22), Recall gauges the percentage of accurately anticipated positive results that actually materialized. It can be calcu- lated by Eq. (23), The F-measure may be represented using Eq. (24), Selectivity metric, which evaluates the percentage of properly detected negatives in comparison to the overall neg- ative predictions generated by the model, is often referred to as the true negative rate. It is possible to represent it using an Eq. (25), Accuracy is a measurement that has been recognized as being one of the highly acknowledged metrics to examine the classification efficiency, which has been carried out in this study for the purpose of cancer detection. It can be expressed as follows by Eq. (26), The imputation strategies result of the classifiers are measured usingNRMSE to find the missing value. The Eq. (27) to calculate NRMSE is described as follows (27), In which the true worth is shown by represented by xi, while simulated values is denoted by x ′ i . 4.3 Results comparison During the course of the tests, the suggested EDL classi- fier was evaluated in comparison to four other classifying (22) Precision = TP FP + TP (23) Recall = TP TP + FN (24) F − measure = 2 ∗ (Recall ∗ Precision) (Recall + Precision) (25) Specificity = TN FP + TN (26) Accuracy = TP + TN TP + FP + TN + FN (27) NRMSE = 1 max − min √ √ √ √1 n n ∑ i=1 ( xi − x � i )2 strategies, including KNN, DT, ANFIS, and CNN. All these classifiers are performed after the feature selection is com- pleted via the AMCGWO. Table 1 shows the results com- parison of feature selection with classifiers vs. datasets. Figure 4 depicts precision value comparison of the BC datasets (Fig. 4a) and the HD datasets (See Fig. 4b) with earlier classifiers. For WDBC and WOBC, the suggested MFWCP-EDL approach yields accuracy values of 98.7304% and 98.1207%, respectively. For Switzerland and Hungary, the similarly suggested MFWCP-EDL approach yields accu- racy values of 98.5446% and 91.6667%, respectively. Addi- tionally, there are additional techniques like WCP-KNN, MFWCP-KNN, WCP-DT, MFWCP-DT, WCP-ANFIS, and MFWCP-ANFIS, WCP-CNN, and MFWCP-CNN has the precision value of 56.3333%, 60.9211%, 56.8750%, 71.2698%, 73.4848%, 76.3158%, 83.3333%, and 85.7143% respectively for the Switzerland dataset (See Table 1). The results of the proposed system have higher precision due to optimal selection of features from the AMCGWO algorithm, it exactly predicts the true positive results. Recall results comparison of various classifiers with imputation methods for BC and Heart Disease (HD) datasets are shown in Fig. 5a, b respectively. For WDBC and WOBC, the suggested MFWCP-EDL approach yields recall values of 98.6364% and 98.3235%, respectively. For the Swiss and Hungarian datasets, the similarly suggested MFWCP-EDL approach yields recall values of 99.1150% and 98.9132%, respectively. In addition, there are more techniques like WCP-KNN, MFWCP-KNN, WCP-DT, MFWCP-DT, WCP- ANFIS, and MFWCP-ANFIS, WCP-CNN, and MFWCP- CNN shows the recall value of 60.7843%, 75.7080%, 68.6275%, 85.5752%, 80.3922%, 96.0177%, 97.7876%, and 98.2301% respectively for Switzerland dataset. The proposed system has higher sensitivity results due to opti- mal features selection by AMCGWO algorithm; it exactly predicts the actual data correctly. Figure 6 displays the comparison of the F-measure value with categorization methods in relation to the BC data- sets (Fig. 6a) and the datasets for heart disease (HD) (See Fig. 6b). For WDBC and WOBC, the suggested MFWCP- EDL technique has values of 98.6834% and 98.2220%, respectively. For the Hungarian and Swiss databases, the identically suggested MFWCP-EDL approach yields F-measure values of 98.7286% and 95.2455%, respectively. In addition, there are more techniques like WCP-KNN, MFWCP-KNN, WCP-DT, MFWCP-DT, WCP-ANFIS, and MFWCP-ANFIS, WCP-CNN, and MFWCP-CNN shows the F-measure value of 58.7647%, 67.5143%, 61.8756%, 77.77 02%,76.7835%,85.0405%,89.9837%, and 91.5464%respec- tivelyfor Switzerland dataset. The suggested method accu- rately predicts the real data by increasing f-measure by the average while reducing the total amount of characteristics in the database.
  • 11. Int J Syst Assur Eng Manag 1 3 Table 1  Results Comparison of Feature Selection with Classifiers vs. Datasets Imputation AMCGWO+Classifiers WDBC Results (%) NRMSE Precision Recall F-measure Specificity Accuracy WCP KNN 69.6698 71.0979 70.3766 63.1981 70.1754 0.5461 DT 84.7445 85.5820 85.1612 76.0729 85.9649 0.3746 ANFIS 90.2199 91.4021 90.8071 81.2463 91.2281 0.2962 CNN 94.9526 95.8280 95.3883 85.1805 95.6063 0.2096 EDL 97.5141 97.6025 97.5593 86.7596 97.7153 0.1512 MFWCP KNN 80.0703 80.3774 80.2236 71.4446 80.5000 0.4416 DT 89.7456 89.8593 89.8024 79.8749 90.0000 0.3162 ANFIS 92.9545 93.4383 93.1958 83.0563 93.2500 0.2598 CNN 96.0864 96.2805 96.1834 85.5827 96.2500 0.1936 EDL 98.7304 98.6364 98.6834 87.6768 98.7698 0.1109 Imputation AMCGWO+Classifiers WOBC Results (%) NRMSE Precision Recall F-measure Specificity Accuracy WCP KNN 69.7442 70.2857 70.0139 62.4762 70.000 0.5477 DT 85.5054 86.4286 85.9645 76.8254 85.8333 0.3764 ANFIS 91.2896 91.7143 91.5014 81.5238 91.6667 0.2887 CNN 95.1193 95.1850 95.1522 84.6089 95.2500 0.2179 EDL 97.9570 97.9570 97.9570 87.0729 98.0000 0.1414 MFWCP KNN 78.6044 79.6271 79.1125 70.7797 79.9649 0.4476 DT 90.0764 91.2287 90.6489 81.0921 91.0369 0.2994 ANFIS 93.5409 93.7054 93.6231 83.2937 94.0246 0.2444 CNN 95.7174 95.6292 95.6733 85.0037 95.9578 0.2011 EDL 98.1207 98.3235 98.2220 87.3987 98.2500 0.1323 Imputation AMCGWO+Classifiers Hungarian dataset (%) NRMSE Precision Recall F-measure Specificity Accuracy WCP KNN 64.0625 68.5294 66.2207 60.9150 70.4545 0.5436 DT 79.3396 86.9118 82.9532 77.2549 85.2273 0.3844 ANFIS 87.5920 91.3235 89.4188 81.1765 92.0455 0.2820 CNN 93.2307 95.5658 94.8690 85.8363 95.1282 0.2103 EDL 96.2771 96.9744 96.6245 86.1995 97.2789 0.1650 MFWCP KNN 76.9502 81.4989 79.1592 72.4434 80.9524 0.4364 DT 87.7941 90.2191 88.9901 80.1947 90.8163 0.3030 ANFIS 90.2989 94.1575 92.1878 83.6956 93.1973 0.2608 CNN 94.1029 96.2355 95.0594 86.3649 95.9184 0.2020 EDL 98.5446 98.9132 98.7286 87.9229 98.9796 0.1010 Imputation AMCGWO+Classifiers Switzerland dataset (%) NRMSE Precision Recall F-measure Specificity Accuracy WCP KNN 56.3333 60.7843 58.7647 54.0305 70.2703 0.5452 DT 56.8750 68.6275 61.8756 61.0022 83.7838 0.4027 ANFIS 73.4848 80.3922 76.7835 71.4597 91.8919 0.2847 CNN 83.3333 97.7876 89.9837 86.9223 95.9350 0.2016 EDL 88.4615 98.6726 93.2885 87.7089 97.5610 0.1562 MFWCP KNN 60.9211 75.7080 67.5143 67.2960 80.4878 0.4417 DT 71.2698 85.5752 77.7702 76.0669 90.2439 0.3123 ANFIS 76.3158 96.0177 85.0405 85.3491 92.6829 0.2705 CNN 85.7143 98.2301 91.5464 87.3156 96.7480 0.1803 EDL 91.6667 99.1150 95.2455 88.1023 98.3740 0.1275
  • 12. Int J Syst Assur Eng Manag 1 3 Specificity evaluation of various classifiers for Fig. 7 is an illustration of BC datasets. For WDBC and WOBC, the suggested MFWCP-EDL technique yields specific- ity values of 87.6768% and 87.3987%, correspondingly. For the Swiss and Hungarian datasets, the identically sug- gested MFWCP-EDL technique yields particular values of 88.1023% and 87.9229%, correspondingly. The alternative techniques, include WCP-KNN, MFWCP-KNN, WCP-DT, MFWCP-DT, and WCP-ANFIS, MFWCP-ANFIS, WCP- CNN, and MFWCP-CNN shows the specificity of 54.0305%, 67.2960%, 61.0022%, 76.0669%, 71.4597%, 85.3491%, 86.9223%, and 87.3156% respectively for Switzerland data- set (See Table 1). Overall accuracy comparison of several classifiers with imputation, Fig. 8 shows datasets with regard to BC. For WDBC and WOBC, the proposed MFWCP-EDL tech- nique demonstrates superior accuracy with 98.7698% and 98.2500%, correspondingly. For the Swiss and Hungarian datasets, the identically suggested MFWCP-EDL approach yields correctness of 98.9796% and 98.3740%, correspond- ingly. In addition, there are more techniques like WCP-KNN, MFWCP-KNN, WCP-DT, MFWCP-DT, WCP-ANFIS, and MFWCP-ANFIS., WCP-CNN, and MFWCP-CNN gives the accuracy value of 70.2703%, 80.4878%, 83.7838%, 90.2439%, 91.8919%, 92.6829%, 95.9350%, and 96.7480% respectively for Switzerlanddataset.The proposed system has increased accuracy by correctly classifying the samples as positive, and negative. Figure 9 depicts the NRMSE evaluation of classifica- tion methods on BC and HD datasets. The methods like WCP-KNN, MFWCP-KNN, WCP-DT, MFWCP-DT, WCP-ANFIS, MFWCP-ANFIS, WCP-CNN, and MFWCP- CNN gives the NRMSE of 0.5452, 0.4417, 0.4027, 0.3123, 0.2847, 0.2705, 0.2016, and 0.1803 respectively for the Switzerland dataset. Fig. 4  Precision Comparison VS Classifiers Fig. 5  Recall Value Comparison Vs Classifiers
  • 13. Int J Syst Assur Eng Manag 1 3 5  Conclusion and future work In many classification issues when the feature training matrix is missing, imputation of missing data is a typical use. Simultaneously, feature selection becomes more sig- nificant, particularly in data sets with a huge number of ele- ments and variables. Imputation of missing data and feature selection, and classification issues are solved for multiple disease diagnoses. Initially, missing value imputation is done by Bayesian network (BN) and optimal missing data imputation reconstruction of data is performed by the pro- cess of factoring tensors via MFWCP for the imputed data. AMCGWO is a wrapper-based strategy for picking just the optimum characteristics. It collects the average value of the characteristic and introduces the ICMIC map to improve GWO’s performance. EDL improves the efficiency of disease-diagnosis algorithms. EDL is built on ECNN and DWBi-LSTM through bootstrap aggregation. In the ECNN classifier, the weight and bias of the classifier are calcu- lated via the entropy with feature-based importance. In the DWBi-LSTM classifier, weight features play a major vital role in classification that every feature has similar impor- tance regarding target concept by Kullback–Leibler (KL) divergence. The majority voting classifier merges classifiers by majority vote. Most classifiers anticipated the final class label. In order to forecast illnesses, classifiers’ outcomes are assessed.When compared against MFWCP-KNN, MFWCP- DT, MFWCP-ANFIS, and MFWCP-CNN, sequentially for the Switzerland dataset, the suggested MFWCP-EDL tech- nique has produced a higher accuracy of 98.374%, which is 17.8862%, 8.1301%, 5.6911%, and 1.626% greater than Fig. 6  F-Measure Value Comparison Vs Classifiers Fig. 7  Specificity Results Comparison vs. Classifiers
  • 14. Int J Syst Assur Eng Manag 1 3 those methods’ corresponding previous bests. This study has been expanded by integrating more datasets, and new deep learning algorithms have also been incorporated to improve the effectiveness of the classifier. Declarations Conflict of interest The authors declare they have no conflicts of interest. Research involving Human Participants and/or Animals In our work, no animals or human are involved. Informed consent Not applicable as no human or animal sample was involved in this study. References Ang JC, Mirzal A, Haron H, Hamed HNA (2015) Supervised, unsu- pervised, and semi-supervised feature selection: a review on gene selection. IEEE/ACM Trans Comput Biol Bioinf 13(5):971–989 Atallah R, Al-Mousa A (2019) Heart disease detection using machine learning majority voting ensemble method. In: Proceedings of the 2019 2nd International Conference on New Trends in Computing Sciences (ICTCS), pp. 1–6, Amman, Jordan, October 2019. Baccouche A, Garcia-Zapirain B, Castillo Olea C, Elmaghraby A (2020) Ensemble deep learning models for heart disease classifi- cation: a case study from Mexico. Information 11(4):1–28 Bashir S, Qamar U, Khan FH (2015) BagMOOV: A novel ensemble for heart disease prediction bootstrap aggregation with multi-objec- tive optimized voting. Australas Phys Eng Sci Med 38(2):305–323 De S, Chakraborty B (2020) Disease Detection System (DDS) Using Machine Learning Technique. In Machine Learning with Health Care Perspective (pp. 107–132). Springer, Cham. Elgin Christo VR, Khanna Nehemiah H, Minu B, Kannan A (2019) Correlation-based ensemble feature selection using bioinspired algorithms and classification using backpropagation neural net- work. Comput Math Methods Med 2019(7398307):1–17 Faris H, Aljarah I, Al-Betar MA, Mirjalili S (2018) Grey wolf opti- mizer: a review of recent variants and applications. Neural Com- put Appl 30(2):413–435 Franzin A, Sambo F, Di Camillo B (2017) BNSTRUCT: an R package for Bayesian network structure learning in the presence of missing data. Bioinformatics 33(8):1250–1252 Gandomi AH, Yang X-S (2014) Chaotic bat algorithm. J Comput Sci 5(2):224–232 Kadam VJ, Jadhav SM, Vijayakumar K (2019) Breast cancer diagnosis using feature ensemble learning based on stacked sparse autoen- coders and softmax regression. J Med Syst 43(8):1–11 Latha CBC, Jeeva SC (2019) Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inform Med Unlocked 16:1–9 Liu K, Kang G, Zhang N, Hou B (2018) Breast cancer classifica- tion based on fully-connected layer first Convolutional neural networks. IEEE Access 6:23722–23732 Fig. 8  Accuracy Results Comparison Vs Classifiers Fig. 9  NRMSE Value Comparison Vs Classifiers
  • 15. Int J Syst Assur Eng Manag 1 3 Malvia S, Bagadi SA, Dubey US, Saxena S (2017) Epidemiol- ogy of breast cancer in Indian women. Asia Pac J Clin Oncol 13(4):289–295 Masud M, Rashed AEE, Hossain MS (2020) Convolutional neural network-based models for diagnosis of breast cancer. Neural Comput Appl, pp.1–12. Raza K (2019) Improving the prediction accuracy of heart disease with ensemble learning and majority voting rule,” InU-Health- care Monitoring Systems, pp. 179–196, 2019. Ren Y, Zhao P, Sheng Y, Yao D, Xu Z (2017) Robust softmax regres- sion for multi-class classification with self-paced learning. In: Proceedings of the ­ 26th International Joint Conference on Artificial Intelligence (pp. 2641–2647). Rokach L, Schclar A, Itach E (2014) Ensemble methods for multi- label classification. Exp Syst Appl 41(16):7507–7523 Sahoo AK, Pradhan C, Das H (2020) Performance evaluation of different machine learning methods and deep-learning based Convolutional neural network for health decision making. In Nature inspired computing for data science (pp. 201–212). Springer, Cham. Sainath TN, Mohamed AR, Kingsbury B, Ramabhadran B (2013) Deep Convolutional neural networks for LVCSR. In: Proceed- ings of the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’13), pp. 8614–8618, 2013. Sapra L, Sandhu JK, Goyal N (2021) Intelligent method for detection of coronary artery disease with ensemble approach. Advances in Communication and Computational Technology, vol. 1033–1042, 2021. Sheikhpour R, Sarram MA, Gharaghani S, Chahooki MAZ (2017) A survey on semi-supervised feature selection methods. Pattern Recogn 64:141–158 Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2020) A review of unsupervised feature selection methods. Artif Intell Rev 53(2):907–948 Song L, Smola A, Gretton A, Borgwardt KM, Bedo J (2017) Super- vised feature selection via dependence estimation,” in Proceed- ings of the ­ 24th international conference on Machine learning, pp. 823–830, ACM, Corvallis, OR, USA, June 2017. Van Houdt G, Mosquera C, Nápoles G (2020) A review on the long short-term memory model. Artif Intell Rev 53(8):5929–5955 Vazifehdan M, Moattar MH, Jalali M (2019) A hybrid Bayesian net- work and tensor factorization approach for missing value impu- tation to improve BC recurrence prediction. J King Saud Univ- Comput Inform Sci 31(2):175–184 Wang J, Wen G, Yang S, Liu Y (2018) Remaining useful life estimation in prognostics using deep bidirectional LSTM neural network. In 2018 Prognostics and System Health Management Conference (PHM-Chongqing) ,pp. 1037–1042. Xue B, Zhang M, Browne WN, Yao X (2015) A survey on evolutionary computation approaches to feature selection. IEEE Trans Evol Comput 20(4):606–626 Xu J, Tang B, He H, Man H (2016) Semisupervised feature selection based on relevance and redundancy criteria. IEEE Trans Neural Netw Learn Syst 28(9):1974–1984 Yang F, Shang F, Huang Y, Cheng J, Li J, Zhao Y, Zhao R (2017) LFTF: a framework for efficient tensor analytics at scale. Proceed VLDB Endowment 10(7):745–756 Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.