WAVES IN RANDOM AND COMPLEX MEDIA
https://doi.org/10.1080/17455030.2023.2189485
Deep convolutional neural networks accurately predict breast
cancer using mammograms
Lal Hussaina,b, Sara Ansaric, Mamoona Shabird, Shahzad Ahmad Qureshie,
Amjad Aldweeshf, Abdulfattah Omarg, Zahoor Iqbalh and Syed Ahmed Chan Bukharii
aDepartment of Computer Science & IT, Neelum Campus, The University of Azad Jammu and Kashmir,
Muzaffarabad, Pakistan; bDepartment of Computer Science & IT, King Abdullah Campus, The University of
Azad Jammu and Kashmir, Muzaffarabad, Pakistan; cThe Children’s Hospital, University of Child Sciences,
Lahore, Pakistan; dServices Institute of Medical Sciences, Lahore, Pakistan; eDepartment of Computer and
Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad, Pakistan;
fCollege of Computer Science and Information Technology, Shaqra University, Shaqra, Saudi Arabia;
gDepartment of English, College of Science & Humanities, Prince Sattam Bin Abdulaziz University, Al Kharj,
Saudi Arabia; hDepartment of Mathematics, Quaid-i-Azam University, Islamabad, Pakistan; iHealthcare
Informatics, St. John’s University, Queens, NY, USA
ABSTRACT
Breast cancer is the most frequently diagnosed cancer in women and a leading cause of cancer deaths. Due to the complex nature of micro-calcifications and masses, radiologists may fail to diagnose breast cancer properly. In this research paper, we employed a novel Deep Convolutional Neural Network (DCNN) model using a transfer learning strategy and compared the results with Machine Learning (ML) techniques such as Support Vector Machine (SVM) kernels and Decision Trees based on different feature-extracting strategies, to distinguish cancer mammograms from those of normal subjects. We first extracted hand-crafted features such as texture, morphological, entropy-based, scale-invariant feature transform (SIFT), and elliptic Fourier descriptor (EFDs) features and fed them into machine learning algorithms for classification. We then utilized deep learning algorithms with a transfer learning approach. The deep learning models yielded the highest detection performance with default and optimized parameters, i.e. GoogleNet yielded accuracy (99.26%) and AUC (0.9998) with default parameters, and AlexNet yielded accuracy (99.26%) and AUC (0.9996) with optimized parameters. The results reveal that the proposed approach is more robust for early detection of breast cancer in mammograms, which can be best utilized for improved diagnosis and prognosis.
ARTICLE HISTORY
Received 29 November 2021
Accepted 20 February 2023
KEYWORDS
Breast cancer; deep learning (DL); convolutional neural network (CNN); GoogleNet; AlexNet; support vector machine (SVM); scale-invariant feature transform (SIFT)

CONTACT Lal Hussain lall_hussain2008@live.com; Amjad Aldweesh a.aldweesh@su.edu.sa
© 2023 Informa UK Limited, trading as Taylor & Francis Group
1. Introduction
Breast cancer is among the most frequently diagnosed cancers in women. In developing countries, breast cancer accounts for 23% of all cancer cases, and 1.6 million new cases of breast cancer are estimated worldwide among women [1–3]. Breast cancer accounts for nearly one in three cancers among US women, excluding skin cancer, and is the second leading cause of cancer death among women after lung cancer [4]. In 2016, breast cancer accounted for about 29% of cancer deaths among females in the United States. In the same year, it was estimated that 595,690 Americans would die from cancer, corresponding to about 1600 deaths per day [5]. The most common causes of cancer deaths are lung and bronchus, prostate, and colorectal cancers in men; for women, these are lung and bronchus, breast, and colorectal cancers. The lifetime probability of being diagnosed with invasive cancer is higher in men (42%) than in women (38%). This may reflect differences in environmental exposure, endogenous hormones, and the complex interaction between these influences. Cancer incidence and death in both men and women are associated with adult height, determined by genetics and childhood nutrition, which accounts for roughly one-third of the difference in cancer risk [5,6]. The cancer risk for adults younger than 50 years is higher in women (5.4%) than in men (3.4%) because of the relatively high burden of breast, genital, and thyroid cancers in young women [7].
The early diagnosis and detection of breast cancer can decrease the death rate and
provide means for prompt treatment. Breast cancer is diagnosed and detected using a com-
bination of approaches, including imaging, physical examination, and biopsy [8]. One of
the imaging techniques used to detect breast cancer is mammography, where X-rays are
used to create images, known as mammograms, of the breast. Radiologists are trained to
read mammograms to detect the signs of breast cancer. The effectiveness of the screening process can depend on the radiologists' interpretations [9]. Patients with palpable breast cancer may have sonogram and mammogram examinations with a normal, benign, or nonspecific appearance [10]. A biopsy is used to confirm the symptoms of breast cancer, but it is an invasive surgical operation with a psychological and physical impact on patients. To avoid unnecessary biopsies, researchers have devised and investigated various computer-aided diagnosis (CAD) systems [3,11] that provide stable detection rates by identifying ultrasound and clinical features [12], using data mining classification techniques, medical imaging and computer-aided diagnostics [13], and breast magnetic resonance imaging (MRI) [14].
As far as mammography is concerned, research evidence shows that radiologists may miss up to 30% of breast cancers, depending on the density of the breasts [15]. Mammograms in breast cancer have been evaluated using two powerful indicators: masses and micro-calcifications. Mass detection is more challenging than micro-calcification detection, not only due to the large variation in size and shape with which masses can appear in mammograms but also because masses often exhibit poor image contrast [16]. Radiologists read mammograms based on their experience, training, and subjective criteria. There may be a 65–75% inter-observer variation rate even among trained experts [17]. Hence, computer-aided diagnosis (CAD) may help radiologists to interpret mammograms in order to detect and classify masses. The literature also reveals that about 65–90% of biopsies of suspected cancers turn out to be benign. Thus, it is essential to develop techniques that can distinguish malignant from benign lesions. The combination of computer-aided diagnosis (CAD), expert knowledge, and Machine Learning (ML) techniques can greatly improve detection accuracy: detection accuracy without CAD was below 80%, and with CAD above 90% [18]. CAD can automatically identify areas of abnormal contrast, directing the radiologist towards suspicious regions. Thus, mammograms with CAD will improve the detection of cancer. In many cases, cancer masses and micro-calcifications are hidden in dense breast tissue, especially in younger women, making cancer complex to detect and diagnose [3].
Feature extraction is an important step in detecting pathologies from physiological and neurophysiological systems. For example, time–frequency representation methods were employed by [19] to determine the correlation and coupling between brain waves during resting states. Hussain et al. [20] extracted multimodal features based on fuzzy entropy to detect arrhythmia, which outperformed traditional feature-extracting approaches, and hybrid features [21] with regression methods to detect and predict epileptic seizures. Moreover, to distinguish normal images from malignant subjects, researchers have extracted different imaging-related features. Karahaliou et al. [22] used a probabilistic neural network to diagnose breast cancer by extracting multi-scale texture properties of the tissue surrounding the micro-calcifications. In the past few decades, other approaches have also been used to detect and diagnose breast cancer, viz. a probabilistic algorithm and a radial gradient index-based algorithm [23], a Convolutional Neural Network (CNN) classifier [24], a mixed feature-based neural network [25], fractal geometry and analysis using digital mammograms [26–28], and a method for automated segmentation of individual micro-calcifications in a region of interest (ROI). Recently, Hussain et al. [29] computed the associations between morphological features extracted from prostate cancer images and found very strong associations among the features.
In the past, researchers employed different hand-crafted feature-extracting strategies such as texture, morphology, gray-level co-occurrence matrix, histogram of oriented gradients, scale-invariant feature transform, or a hybrid of these features for brain tumor, prostate cancer, and arrhythmia detection using ML and DL techniques [20,30,31]. The existing techniques have some limitations: graph-based techniques are computationally expensive, and other computer-aided diagnosis (CAD) techniques based on texture features exploit general texture features for classification and fail to provide the background knowledge of morphological features. Machine learning methods based on different feature-extracting strategies are also limited, as different researchers employ different feature-extracting methods. Moreover, these classifiers are not fine-tuned for the challenging contrast present in the features.
With the advent of modern computational systems, ML-related Artificial Intelligence applications and graphical processing unit (GPU)-embedded processors have achieved exponential growth through the development of novel models and methodologies, a field currently known as DL [32]. The DL-based Convolutional Neural Network (CNN) model adopts the architecture of an artificial neural network containing a much larger number of processing layers, contrary to shallower architectures. CNNs drastically reduce the structural elements (i.e. neurons) in comparison to traditional feedforward neural networks [32]. For image processing, different baseline CNN architectures have been developed and successfully applied to complicated image-processing tasks.
Breast cancer diagnosis has seen classification and segmentation performance improvements thanks to representation learning, a characteristic of DL, and its automatic feature-extraction proficiency, as compared with the handpicked feature extraction required in ML [33]. The learning phase is characterized by the flow of information, exhibiting the capability of self-learning [34]. In DL, the Bayesian framework determines uncertainty in the model output using a Bayesian neural network [35,36]. Donald F. Specht introduced the probabilistic neural network (PNN), based on Bayesian classification theory and consisting of three layers, viz. input, radial basis, and competitive layers [37,38]. PNNs have been used to categorize mammography images into normal, benign, and malignant classes, with the discrete wavelet transform used to form the handpicked input feature vector. That study used seventy-five mammograms and claimed an accuracy of 90%.
Zhang, Lin, et al. [39] introduced a three-stage neural network method to reduce the false-positive rate of microcalcification detection in mammographic images. Microcalcifications were detected in the first stage; in the second stage, the false-positive detections from the first stage's output were reduced; and lastly, in the third stage, a Kalman filter-based backpropagation neural network isolated the microcalcifications in the mammograms.
DL networks using CNNs have achieved outstanding performance for the detection and classification of masses and microcalcifications. In this context, Fukushima et al. introduced an early CNN for pattern recognition, the Neocognitron, later applied to medical image analysis [40,41]. Lo et al. [42] introduced a CNN with multiple circular paths in which information was first collected from the suspected regions of mammograms and then processed as features by the CNN. Sahiner et al. [43] proposed a CNN for mammography in which selected regions, extracted by either averaging or subsampling, were input to the CNN.
Jiao et al. [44] classified breast masses using a DL-based strategy in which intensity-based features were combined with CNN-extracted features from mammograms. Fonseca et al. [45] used a CNN with an SVM classifier for the classification of breast cancer. Su et al. [46] introduced a rapid CNN method for breast cancer categorization in which semantic segmentation was carried out to reduce redundant information, at the cost of a more complex CNN model. Huynh et al. [47] used a CNN with transfer learning to classify masses and microcalcifications. Arevalo et al. [48] introduced a method that did not use hand-crafted features, in which a CNN learned the data representation in a supervised manner from biopsy images of 344 breast cancer patients.
Rezaeilouyeh et al. [49] proposed a microscopic breast cancer classification model using a CNN in which shearlet transform-based images were obtained as feature vectors; the shearlet coefficients were then input to the CNN for classification. Jaffar [50] proposed a method based on enhancement as a preprocessing step for mammograms, followed by a CNN for feature extraction; the features were used to train an SVM classifier. Jadoon et al. [51] introduced a dual deep neural network-based classification model for three classes, viz. benign, malignant, and normal. The two algorithms were a convolutional neural network-discrete wavelet model and a convolutional neural network-curvelet transform model. The features extracted from the discrete wavelet and curvelet transform-based coefficients were fused and fed to the CNN, which was trained with softmax and SVM for classification. Gastounioti et al. [52] used an ensemble classifier for breast cancer categorization, in which textural feature maps obtained from lattice-based methods were fed to a CNN for multi-class categorization. Wang et al. [53] proposed a hybrid approach for breast cancer classification into benign and malignant classes: cropping and clinical features were extracted using multi-view patches of mammograms, and the CNN was trained using multiple features to focus on the regions related to semantic lesions. Zhu et al. [54] introduced a fully convolutional network combined with a conditional random field to segment the masses within mammograms. The method estimated ROIs on an empirical basis with prior information on positions, which helped to improve the prediction of ROIs.
Ribli et al. [55] introduced Faster Regions with Convolutional Neural Networks (Faster R-CNN) for breast cancer classification into benign and malignant cases. In Faster R-CNN, the ROI pooling method was used to extract the features that are fed to the VGG-16 model. The output of the method was a set of bounding boxes with confidence scores that decide the class of cancer. Chiao et al. [56] proposed an improved version of the region proposal network called Mask R-CNN, which was used for the detection and segmentation of cancer regions in mammograms. The Mask R-CNN method used the ROI alignment technique; after feature extraction by the ROI Align method, a CNN was used for detection and classification. Nahid et al. [57] used LSTM for the classification of microcalcifications and masses by transforming mammograms into a 1D-vector format, followed by conversion into time-series data. A total of 7909 images from the BreakHis histopathological dataset were evaluated using SVM and softmax at the decision layer.
In contrast, DL convolutional neural network models with TL approaches are fine-tuned to optimize the parameters by minimizing the error. In this study, we tested the generalization of breast cancer mammographic images through AlexNet [33] and GoogleNet [58] as pre-trained CNN models, using a TL approach verified in the literature [59,60] on the most widely used imaging datasets. The features and training data are assumed to lie within the same feature space. Transfer learning allows users to extract previously acquired expertise and apply it to a new domain, reducing overall computational time, with the images lying in the combined feature space of the two TL methods, offering a broader spectrum with marked discrimination in feature space. The widened solution space obtained through feature fusion has resulted in superior performance.
2. Methods
2.1. Datasets
Datasets were taken from the publicly available databases provided by the University of South Florida [61], available online at http://marathon.csee.usf.edu/Mammography/Database.html. In DDSM images, suspicious regions of interest are marked by experienced radiologists, and BI-RADS information is also annotated for each abnormal region. In our experiment, we used mass instance images digitized by LUMISYS. This dataset contains approximately 2500 studies. We used the latest volumes of the DDSM database, i.e. 12 normal volumes and 15 cancer volumes, containing a total of 899 images, including 500 cancer images from 105 cases and 399 normal subject images from 100 cases.
2.2. Convolutional neural network
Owing to their strong performance, CNNs have been used for breast cancer classification [62]. An end-to-end CNN architecture was applied to classify the cancer images directly. To obtain high performance, a careful combination of pre-processing, TL, and data augmentation is required. In the proposed work, the performance was evaluated using two CNN architectures, namely AlexNet [33] and GoogleNet [58]. For both networks, the original architecture was used, only replacing the last fully connected (FC) layer to output two classes; from GoogleNet, the two auxiliary classifiers were removed. We also used batch normalization to regularize the data flowing between neural network layers, reducing the internal covariate shift [63]. Input images of 224 x 224 x 3 were supplied to the network. The CNN consists of convolution blocks composed of 3 x 3 convolutions (Batch Norm-ReLU-Max Pooling) with 32, 32, and 64 filters respectively, followed by three fully connected layers of size 128, 64, and 2. The final layer is a softmax for binary classification. In this study, we used default and optimized parameters: Xavier's [64] weight initialization, the ReLU activation function, and Adam's [62] update rule. We used a base learning rate of 10^-4 and mini-batch sizes of 20 and 64, while for the optimized parameters we used a momentum of 0.9, an initial learning rate of 0.001, a learning rate drop factor of 0.1, L2 regularization of 0.004, a batch size of 20, and 2 epochs.
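For concreteness, the block structure just described can be sketched in a few lines of PyTorch. This is an illustrative reconstruction, not the authors' code (the study relied on MATLAB pre-trained models); the layer dimensions follow the text: three 3 x 3 Conv-BatchNorm-ReLU-MaxPool blocks with 32, 32, and 64 filters, then FC layers of 128, 64, and 2.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Baseline CNN sketched in the text: three 3x3 Conv-BN-ReLU-MaxPool
    blocks (32, 32, 64 filters) followed by FC layers of 128, 64, 2."""
    def __init__(self, in_channels=3, num_classes=2):
        super().__init__()
        layers, prev = [], in_channels
        for filters in (32, 32, 64):
            layers += [
                nn.Conv2d(prev, filters, kernel_size=3, padding=1),
                nn.BatchNorm2d(filters),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            ]
            prev = filters
        self.features = nn.Sequential(*layers)
        # A 224x224 input is halved three times -> 28x28 spatial maps.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 64), nn.ReLU(inplace=True),
            nn.Linear(64, num_classes),  # softmax is applied in the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
out = model(torch.randn(1, 3, 224, 224))  # logits for the 2 classes
```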
Let us consider an output y (for example, the object depicted in an image) produced by a model y = f(x, θ). Since the model is not known in advance, our aim is to use a generic model described by a set of parameters θ that is specialized to the target task. This can be done with a supervised ML approach by presenting the model with a set of input-label pairs (x, y) and iteratively updating its parameters so that the obtained output is close to the associated labels. To measure the difference between the label ŷ predicted by the model and the desired label y, a loss function L(y, ŷ) is employed. The main purpose of the learning process is to select the parameter values θ that minimize this function. An optimization method from the family of gradient descent algorithms is used to adjust the parameter values θ.
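A minimal sketch of this supervised loop, assuming a placeholder model, random stand-in data, and Adam at the base learning rate of 10^-4 named earlier:

```python
import torch
import torch.nn as nn

# Placeholder model and data; substitute the CNN and mammogram
# batches described in the text.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2))
x = torch.randn(20, 3, 224, 224)   # mini-batch of 20 images
y = torch.randint(0, 2, (20,))     # binary labels (cancer / normal)

loss_fn = nn.CrossEntropyLoss()    # softmax + negative log-likelihood
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(10):             # iterative parameter updates
    opt.zero_grad()
    y_hat = model(x)               # y_hat = f(x, theta)
    loss = loss_fn(y_hat, y)       # discrepancy L(y, y_hat)
    loss.backward()                # gradients of the loss w.r.t. theta
    opt.step()                     # gradient-descent-family update
```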
2.2.1. Deep learning ResNet101
ResNet101, named after the 101 layers of its residual network, is a modified version of the ResNet-50 architecture. The ResNet model was originally proposed by He et al. in 2016 [32]. ResNet is an abbreviation for residual networks; it has been employed in solving numerous problems related to computer vision and other applications. ResNet is one of the deepest convolutional neural network architectures used at large scale and has been applied to a wide range of tasks on the ImageNet dataset (i.e. object detection and recognition, and various classification purposes). Generally, the multiple layers of a CNN are interconnected in a specified manner, and these layers are trained to perform various tasks. The basic idea behind the ResNet architecture is the residual connection, across which the gradients can pass so that they are prevented from vanishing to zero when the chain rule is applied [32]. ResNet101 has 104 convolutional layers organized into 33 blocks, one block per group of layers. Nine of the 33 blocks use the output of previous layers directly, which is known as a residual connection; these residual connections serve as the first operand of the summation operator at the end of each block, carrying the input from earlier layers. The remaining 4 layers receive the output of the previous block as input and pass it through a convolutional layer with a filter size of 1 x 1 and a stride of 1, followed by a group of normalization layers. This normalization layer performs the normalization operation, and the obtained output is transferred to the summation operator at the output of that block. The depth of each block may vary according to its density [65]. The general architecture of ResNet101 is shown in Figure 1.
The hyper-parameter settings found empirically for ResNet101 are listed in Table 1. The hyper-parameters of the CNN models were adjusted heuristically to facilitate the convergence of the loss function during training. The Adam optimizer was chosen because of its parameter-specific, adaptive learning rates. The initial learning rate was chosen as 0.0001 for ResNet101: a large learning rate may prevent the loss function from converging and can cause overshoots, while an extremely small learning rate drastically increases the training time. Mini-batch sizes of 10 and 12 were set according to the speed of training and the computational requirements; extremely large batch sizes adversely affect model quality.

Figure 1. ResNet101 overall architecture.

Table 1. Empirically tuned set of parameters.
Model: ResNet101 (TL Deep CNN)
  Optimizer: Adam
  Momentum: 0.90
  Initial learning rate: 0.0001
  L2 regularization: 0.00004
  Max epochs: 10
  Mini-batch size: 12
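The Table 1 settings translate roughly to the PyTorch optimizer configuration below; note that Adam has no separate momentum term, so mapping the listed momentum of 0.90 onto its first-moment coefficient beta1 is an interpretation, not something stated in the text.

```python
import torch
from torchvision.models import resnet101

model = resnet101(weights=None)  # stand-in for the fine-tuned ResNet101

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,              # initial learning rate (Table 1)
    betas=(0.90, 0.999),  # beta1 = 0.90 playing the role of momentum
    weight_decay=4e-5,    # L2 regularization (Table 1)
)
max_epochs = 10           # Table 1
minibatch_size = 12       # Table 1
```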
2.2.2. GoogleNet
GoogleNet was retrained on a new set of cancer images. The weights of the earlier layers in the network were frozen by setting their learning rates to zero. While the training layers were frozen, their parameters were not updated because the gradients of these layers were not computed, which helped to improve the network performance significantly. This property also helps to avoid overfitting on the new dataset. The first 110 layers of GoogleNet include the inception modules. Using freezeWeights(), the learning rates of the first 110 layers were set to zero, and the layers were then reconnected in their original order using createLgraphUsingConnections() while the earlier layers' learning rates remained zero. Figure 2 illustrates the schematic diagram of the GoogleNet model.

Figure 2. Schematic diagram of GoogleNet architecture.
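freezeWeights() is a MATLAB helper; an equivalent effect in PyTorch is obtained by disabling gradients for the early parameters, as in the following sketch (the 110-parameter cut-off is illustrative only, since PyTorch and MATLAB do not count "layers" identically):

```python
import torch
from torchvision.models import googlenet

model = googlenet(weights="IMAGENET1K_V1")  # ImageNet pre-trained GoogLeNet

# Freeze the earlier parameters: their gradients are never computed,
# so they are never updated during retraining (cf. freezeWeights()).
for p in list(model.parameters())[:110]:    # illustrative cut-off
    p.requires_grad = False

# Only the remaining, unfrozen parameters are handed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```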
2.2.2.1. Train network-framework. The networks require input images of size 224 x 224 x 3 and 227 x 227 x 3 for GoogleNet and AlexNet, respectively, but the images in the dataset have different sizes, so we used the imresize() function to resize the images to the required input size. The TL-based framework adopts ResNet-101 (2048 features) and GoogleNet (1000 features) using mammograms; after fusion, 3048 features were used for each image. The entire dataset was fed to the cross-validation (10-fold) stage, and the optimized model was used to determine the performance on the test instances for discriminating healthy from diseased subjects.
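A sketch of the described 2048 + 1000 = 3048 feature fusion, assuming torchvision's pre-trained ResNet-101 and GoogLeNet stand in for the authors' models and that ResNet's pooled features and GoogLeNet's final-layer outputs are the vectors being fused:

```python
import torch
from torchvision.models import resnet101, googlenet

resnet = resnet101(weights="IMAGENET1K_V1").eval()
gnet = googlenet(weights="IMAGENET1K_V1").eval()

# Strip ResNet-101's classifier so it emits its 2048-dim pooled features.
resnet.fc = torch.nn.Identity()

@torch.no_grad()
def fused_features(batch):             # batch: (N, 3, 224, 224)
    f1 = resnet(batch)                 # (N, 2048) ResNet-101 features
    f2 = gnet(batch)                   # (N, 1000) GoogLeNet outputs
    return torch.cat([f1, f2], dim=1)  # (N, 3048) fused vector per image

feats = fused_features(torch.randn(4, 3, 224, 224))
print(feats.shape)                     # torch.Size([4, 3048])
```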
2.2.2.2. Transfer learning (TL) approach. We applied the TL approach using GoogleNet and AlexNet, CNNs pre-trained on ImageNet and comprising inception, convolution, and fully connected layers. The fully connected layers require a fixed input image size for processing, while convolution layers can work with arbitrary input image sizes. To avoid overfitting during training, the images were resized to 224 x 224 x 3 for GoogleNet and 227 x 227 x 3 for AlexNet. Moreover, for GoogleNet, we modified the dimension of the last fully connected layer from 1000 to 2. This last fully connected layer was completely re-initialized at random, while all other layers retained their weights from pre-training. The shallow layers capture general, low-level image features, while deeper layers are high-level and task-specific; thus, the learning rate of deeper layers should be larger than that of shallow layers. The batch size was set to 20, with an initial learning rate of 10^-4 and a maximum of 6 epochs over 378 iterations.
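A hedged PyTorch sketch of this head replacement and resizing step (torchvision's GoogLeNet is assumed; the authors worked with MATLAB networks):

```python
import torch.nn as nn
from torchvision import transforms
from torchvision.models import googlenet

model = googlenet(weights="IMAGENET1K_V1")

# Replace the 1000-way ImageNet head with a freshly initialized
# 2-way layer (cancer vs. normal); all other layers keep their
# pre-trained weights.
model.fc = nn.Linear(model.fc.in_features, 2)

# Resize inputs to the size the network expects:
# 224 x 224 x 3 for GoogLeNet (227 x 227 x 3 for AlexNet).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # analogue of the imresize() step
    transforms.ToTensor(),
])
```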
Training a CNN entirely from scratch can be cumbersome because a small dataset may cause overfitting. To tackle this kind of problem, a TL technique is employed. This technique can solve a new problem using previously learned knowledge, extracting knowledge from source tasks and applying it to a target task through the concepts of a task T and a domain D.
Consider a domain D = {χ, P(X)} comprising a feature space χ and a marginal probability distribution P(X), where X = {x_1, x_2, . . . , x_n} ⊂ χ. For a domain D = {χ, P(X)}, a task T = {γ, f(·)} comprises a label space γ and an objective predictive function f(·) learned from the training data, which consists of pairs {x_i, y_i}, where x_i ∈ χ and y_i ∈ γ; f(·) predicts the corresponding label f(x) of a new instance x. Given a source domain Ds with corresponding source task Ts, and a target domain Dt with corresponding target task Tt, the TL approach aims to improve the learning of the target predictive function ft(·) in Dt using the knowledge in Ds and Ts, where Ds ≠ Dt or Ts ≠ Tt [66].
Various approaches have been employed to apply TL to CNNs [67]. For a CNN previously trained on another task, say image classification on the ImageNet dataset [68], two approaches can be distinguished: (a) fine-tuning, in which the network parameters are retrained by back-propagating the error through the whole network [69]; and (b) freezing layers, in which most of the transferred features remain unchanged during training on the new task. The first layers contain the most generic features, common to many problems, while later layers progressively become more specific to the target dataset [70].
Applying the proper type of TL to a specific task requires taking several factors into consideration. The most important factors include the dataset size [71] and its similarity to the dataset used in the originally trained network [72], viz. ImageNet. When the dataset is smaller than the original dataset, the freezing-layer approach is most feasible because the low-level features remain relevant for the target dataset; moreover, a smaller dataset may lead to overfitting when the fine-tuning approach is employed, so fine-tuning is suggested when bigger data is available. The latter approach is also suitable when the available dataset differs substantially from the original one.
2.2.2.3. Convolutional layer. The convolutional layer is the main building block of a CNN. In the basic CNN, the convolution filter is a generalized linear model (GLM) for the underlying local image patch, and it works well when the instances of the latent concepts are linearly separable. This layer's learnable parameters are filters: 3D matrices of numerical values that are spatially smaller than the input in terms of dimension. By design choice, the width and height are fixed, while the depth is set by the number of input channels, i.e. the number of 2D inputs in the layer. During the forward pass, these filters slide across the height and width of the input. The filter sliding operation translates mathematically into a dot product between the filter and the input at each position. The 2D output is called an activation map, and it is stacked along the depth dimension with the other activation maps to form the output volume. The spatial size of the output is controlled by zero-padding techniques.
For convolutional layer l, the output of the i-th filter is denoted by y_i^l, with a total of C_{l-1} input feature maps, and is mathematically expressed as:

y_i^l = s( \sum_{j=1}^{C_{l-1}} f_{i,j}^l * y_j^{l-1} + b^l )   (1)

For layer l, the bias vector is denoted by b^l, the i-th filter of the convolution layer is denoted by f_{i,j}^l, which connects to the j-th feature map of layer l-1, and the activation function is represented by s.
A convolution operation is also employed during the backward pass, but the filters are flipped spatially along both the height and width axes. Using the backpropagation algorithm, the parameters f_{i,j}^l are updated and learned by the network. In this way, the network is capable of learning various types of filters, with their own specialized properties, to solve many kinds of tasks.
2.2.2.4. Pooling layer. The convolution layer is followed by the pooling layer, whose major function is to reduce the spatial size of the input and to operate independently on every depth slice. This layer is nonparametric and consists of filters that slide over the input with a fixed stride to produce the output [32,73]. It uses the filter functions max pooling or average pooling.
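The sliding dot product of Eq. (1) and the nonparametric pooling operation can both be written out directly in NumPy; the following is a didactic single-channel sketch, not an efficient implementation:

```python
import numpy as np

def conv2d(x, filt, bias=0.0):
    """Valid 2D convolution of one input map with one filter,
    i.e. the sliding dot product described in the text."""
    fh, fw = filt.shape
    oh, ow = x.shape[0] - fh + 1, x.shape[1] - fw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i+fh, j:j+fw] * filt) + bias
    return out

def pool2d(x, size=2, mode="max"):
    """Nonparametric pooling: reduces spatial size, slice by slice."""
    oh, ow = x.shape[0] // size, x.shape[1] // size
    blocks = x[:oh*size, :ow*size].reshape(oh, size, ow, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.random.rand(6, 6)
fmap = np.maximum(conv2d(x, np.ones((3, 3)) / 9.0), 0)  # conv + ReLU
print(pool2d(fmap, 2, "max").shape)                      # (2, 2)
```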
2.2.2.5. Fully connected layer. To convert the combined features into class scores, at least one fully connected (FC) layer is present in a CNN before the network output. In this layer, each neuron is connected to all neurons in the preceding layer, following a mesh topology. The main function of this layer is to learn parameters (biases and weights) that map the input layer to the corresponding output layer.
The output y^l of FC layer l is computed as:

y^l = s( y^{l-1} * W^l + b^l )   (2)

where W^l and b^l denote the weight and bias vectors of layer l, and s represents the activation function. FC layers, contrary to convolution layers, do not support parameter sharing; due to this property, the number of learnable parameters in a CNN increases substantially.
2.2.2.6. Activation function. The activation function provides the nonlinearity that allows the network to learn more complex functions. In the DL framework, the nonlinear transformation from input to output is performed by the activation functions of the nonlinear layers in combination with the other layers [74,75]. An appropriate activation function is therefore required for a better feature-extracting strategy [33,76,77].
A brief overview of the most commonly used activation functions g(·) follows.

The sigmoid function is given by g(a) = 1 / (1 + e^{-a}), where a denotes the input from the preceding layer. The sigmoid transforms values into the range 0 to 1 and is commonly used to produce a Bernoulli distribution:

g̃ = 0 if g(a) ≤ 0.5, and 1 if g(a) > 0.5   (3)

The hyperbolic tangent function is given by g(a) = tanh(a) = (e^a - e^{-a}) / (e^a + e^{-a}). Its derivative, g' = 1 - g^2, makes it convenient to work with in backpropagation algorithms.

The softmax function is given by g(a_i) = e^{a_i} / \sum_j e^{a_j}. This is commonly used as the final output layer, as it can be considered a probability distribution over the categories.

The Rectified Linear Unit (ReLU) is the most widely used activation function, given by g(a) = max(0, a). With gradient-based algorithms, ReLU retains the easy-to-optimize property of linear models; it is easy to implement and greatly accelerates the convergence of optimization methods [32,73]. Superior performance has been shown using this activation function and its variants, and it is the most popular activation function in DL so far [77–80]. The gradient diffusion problem can also be mitigated using the ReLU function [74,81,82].

The softplus function, a variant of ReLU, is given by g(a) = log(1 + e^a); it is a smooth approximation of ReLU.

The absolute value rectification function, g(a) = |a|, is used when the pooling layer takes average values in CNNs [81], since it prevents negative and positive features from cancelling out.

The Maxout function is given by g_i(x) = max_i (b_i + w_i · x). In this case, a three-dimensional array is used for the weight matrix, with the connections between neighboring layers corresponding to the third dimension [75].
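These activation functions are straightforward to implement; the following NumPy sketch uses numerically stable forms for softmax and softplus:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):                       # (e^a - e^-a) / (e^a + e^-a)
    return np.tanh(a)

def softmax(a):
    e = np.exp(a - np.max(a))      # shift for numerical stability
    return e / e.sum()

def relu(a):
    return np.maximum(0.0, a)

def softplus(a):                   # smooth approximation of ReLU
    # Stable form of log(1 + e^a): max(a, 0) + log1p(exp(-|a|))
    return np.maximum(a, 0.0) + np.log1p(np.exp(-np.abs(a)))

a = np.array([-2.0, 0.0, 3.0])
print(sigmoid(a), relu(a), softmax(a), softplus(a))
```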
2.2.2.7. Optimization objective. The objective function is composed of a loss function and a regularization term. The loss function measures the discrepancy between the network output f(x|θ) and the expected result y: in classification tasks, y denotes the true class labels, and in prediction tasks, the true target value. Regularization is the strategy of reducing the test error, so that the learning algorithm performs well not only on training data but also on unseen test data [74,75]. To prevent overly complex models, regularization applies penalties to the parameters. Denoting the loss function by L(f(θ), y) and the regularization term by Ω(θ), the optimization objective is defined as:

\tilde{L}(X, y, θ) = L(f(θ), y) + α Ω(θ)   (4)

where α balances the two components. Pragmatically, the loss function is usually computed across randomly sampled training examples rather than the data-generating distribution, because the latter is unknown.
2.2.2.8. Loss function. Most networks use the cross entropy between the model distribution and the training data as the loss function. The commonly used cross entropy is the negative conditional log-likelihood, L(f(θ), y) = -log P(y|x, θ), which represents the family of loss functions corresponding to the distribution of y given the value of the input variable x. Consider the following commonly used loss functions. Suppose y is continuous and has a Gaussian distribution given x. The loss function is:

L(f(θ), y) = -log [ (1 / \sqrt{2πσ^2}) exp( -(y - f)^2 / (2σ^2) ) ]   (5)

= (y - f)^2 / (2σ^2) + (1/2) log(2πσ^2)   (6)

This is equivalent to the squared error, which was the most commonly used loss function in the 1980s [74,75]. However, it penalizes outliers excessively, leading to slower convergence rates [83]. If the output variable y follows a Bernoulli distribution, the loss function is:

L(f(θ), y) = -y log f(θ) - (1 - y) log(1 - f(θ))   (7)

When y is discrete and takes one of k values, e.g. y ∈ {1, 2, 3, . . . , k}, we can use the softmax value as the probability over the categories, and the loss function becomes:

L(f(θ), y) = -log( e^{a_y} / \sum_j e^{a_j} )   (8)

= -a_y + log( \sum_j e^{a_j} )   (9)
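A NumPy sketch of the losses in Eqs. (7)-(9); the softmax cross entropy exploits the shift-invariance of Eq. (9) for numerical stability:

```python
import numpy as np

def cross_entropy(logits, y):
    """Negative log-likelihood of the true class y under the softmax
    distribution, i.e. Eqs. (8)-(9): -a_y + log(sum_j e^{a_j})."""
    a = logits - np.max(logits)            # stability shift (value unchanged)
    return -a[y] + np.log(np.sum(np.exp(a)))

def binary_cross_entropy(f, y):
    """Bernoulli loss of Eq. (7) for a prediction f in (0, 1)."""
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

print(cross_entropy(np.array([2.0, 0.5, -1.0]), y=0))
print(binary_cross_entropy(0.9, 1))
```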
2.2.2.9. Regularization term. For regularization, the L2 penalty is commonly used; it contributes to the convexity of the optimization objective, helping convergence to the minimum of the solution via the Hessian matrix [66,84]. The L2 regularization term can be defined as:

Ω(θ) = (1/2) ||ω||^2   (10)

where ω represents the weights connecting the network units.
2.2.3. Performance evaluation parameters
Breast cancer and normal subjects are classified using ML classifiers, and performance is
measured by computing sensitivity, specificity, PPV, NPV, and Total Accuracy.
2.2.3.1. Sensitivity. Sensitivity measures the proportion of people who test positive for the disease among those who have the disease. Mathematically, it is expressed as:

Sensitivity = TP / (TP + FN)   (11)

2.2.3.2. Specificity. Specificity measures the proportion of negatives that are correctly identified. Mathematically, it is expressed as:

Specificity = TN / (TN + FP)   (12)

2.2.3.3. Positive predictive value (PPV). It is mathematically expressed as:

PPV = TP / (TP + FP)   (13)

2.2.3.4. Negative predictive value (NPV). It is mathematically expressed as:

NPV = TN / (TN + FN)   (14)

2.2.3.5. Total accuracy (TA). The total accuracy is computed as:

TA = (TP + TN) / (TP + FP + FN + TN)   (15)
2.2.4. Training/testing data formulation
The jack-knife k-fold cross-validation (CV) technique was applied for training/testing data formulation and parameter optimization. In this research, 2-, 4-, 5-, and 10-fold CV were used to evaluate the performance of the classifiers for the different feature-extracting strategies. The highest performance was obtained using 10-fold CV, in which the data are divided into 10 folds: 9 folds participate in training, and the classes of the samples in the remaining fold are predicted based on the training performed on the 9 folds. For the trained models, the test samples in the test fold are entirely unseen. The process is repeated 10 times so that every sample is predicted once; a similar approach is applied for the other CVs. Finally, the predicted labels of the unseen samples are used to determine the classification accuracy.
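A sketch of this protocol using scikit-learn's StratifiedKFold (the SVM-RBF classifier and the 3048-dimensional fused features are illustrative stand-ins):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X = np.random.rand(100, 3048)          # e.g. fused deep features
y = np.random.randint(0, 2, 100)       # cancer (1) vs. normal (0)

predictions = np.empty_like(y)
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    clf = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])
    # Test-fold samples are unseen by the trained model.
    predictions[test_idx] = clf.predict(X[test_idx])

accuracy = np.mean(predictions == y)   # accuracy over all unseen predictions
print(accuracy)
```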
2.2.5. Receiver operating characteristic (ROC) curve
The ROC curve plots the true positive rate (TPR), i.e. sensitivity, against the false positive rate (FPR), i.e. 1 - specificity. The mean feature values for one class of subjects are labeled 1, and for the other class 0; this vector is then passed to the ROC function, which evaluates each sample value against the sensitivity and specificity values. The ROC curve is one of the standard ways to diagnose and visualize the performance of a classifier [85]. The TPR is plotted on the y-axis and the FPR on the x-axis. The area under the curve (AUC) gives the covered portion of a unit square, so its value lies between 0 and 1; an AUC > 0.5 indicates separation between the classes, and the higher the AUC, the better the diagnostic system. TPR is the number of correctly predicted positive cases divided by the total number of positive cases, while FPR is the number of negative cases predicted as positive divided by the total number of negative cases.
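A minimal example of computing the ROC curve and AUC with scikit-learn, using made-up scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0])              # 1 = cancer, 0 = normal
scores = np.array([0.1, 0.4, 0.8, 0.9, 0.6, 0.3])  # classifier scores

fpr, tpr, thresholds = roc_curve(y_true, scores)   # FPR on x, TPR on y
auc = roc_auc_score(y_true, scores)                # area under the curve
print(f"AUC = {auc:.3f}")                          # AUC > 0.5: separation
```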
3. Results
In this research, we employed DL CNN models using a TL approach to detect breast cancer. We also extracted multimodal features such as texture, morphological, SIFT, EFDs, and entropy features from the mammograms and applied ML classifiers such as the Bayesian approach, Support Vector Machine (SVM) kernels (polynomial, RBF, Gaussian), and Decision Trees. Using the TL approach, we trained the GoogleNet and AlexNet pre-trained models with 500 breast cancer and 399 normal mammograms; the features were then extracted using the softmax layer. The performance was evaluated in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), total accuracy (TA), false positive rate (FPR), and area under the receiver operating characteristic curve (AUC), as reflected in Table 2 and Figures 3–6. For the ML methods, four stages, namely pre-processing, feature extraction, training/test data formulation, and classification of the images into normal and cancer/malignant using SVM, Decision Tree, and Bayesian classifiers, were employed as detailed in [50]. The texture, morphological, entropy, SIFT, and EFDs features were extracted as discussed in [31,86,87]. In the DL TL approaches, we resized the images according to the network requirements and then trained the GoogleNet and AlexNet pre-trained models with the new set of cancer images.
Figure 3. Transfer learning-based proposed framework for detection of masses and microcalcification
using mammographic images.
Figure 4. Performance evaluation using ML and DL methods.
Using ML classifiers with Naïve Bayes, the highest performance in terms of total accuracy (TA) was obtained with the SIFT features (TA = 57.54%), followed by entropy (TA = 56.06%) and texture, morphological, and EFDs features (TA = 55.84%). The other performance metrics for the Bayes classifier are reflected in Table 2. Using the SVM polynomial classifier, the highest performance was obtained with texture features (TA = 82.65%), followed by morphological and entropy (TA = 82.42%), EFDs (TA = 77.42%), and SIFT (TA = 67.49%). SVM RBF gave the highest performance with entropy (TA = 85.21%), followed by morphological (TA = 84.20%), texture (TA = 83.98%), SIFT (TA = 73.68%), and EFDs (TA = 72.75%). Moreover, using SVM Gaussian, the highest performance was obtained with entropy (TA = 84.87%), followed by morphological (TA = 83.43%), texture (TA = 83.31%), SIFT (TA = 74.39%), and EFDs (TA = 73.75%). The Decision Tree classifier gave the highest performance with entropy (TA = 85.65%), followed by morphological (TA = 84.87%), SIFT (TA = 74.04%), texture (TA = 55.17%), and EFDs (TA = 47.16%). Using the DL CNN models, the highest performance was obtained with GoogleNet with default parameters and AlexNet with optimized parameters (TA = 99.26%), followed by AlexNet with default parameters (TA = 98.89%) and GoogleNet with optimized parameters (TA = 98.15%). The other performance metrics in terms of sensitivity, specificity, PPV, NPV, FPR, and AUC are reflected in Table 2.

Figure 5. Performance measure in the form of AUC using ML methods with (a) entropy features and (b) texture features, and DL methods (c) AlexNet and (d) GoogleNet.
Figure 4 depicts the evaluation performance of the ML classifiers and CNN methods for detecting breast cancer. For ML, different features were extracted, such as texture, morphology, entropy, SIFT, and EFDs; the best-performing feature set for each classifier is compared with the CNN methods. Using the Bayes classifier, the SIFT features performed best, with sensitivity (57.54%), specificity (43.81%), PPV (75.68%), NPV (81.62%), TA (57.54%), FPR (0.5619), and AUC (0.5088). Using the SVM polynomial kernel, the texture features gave the highest performance, with sensitivity (82.65%), specificity (82.46%), TA (82.65%), and AUC (0.5045). SVM RBF gave the highest performance using entropy features, obtaining sensitivity (85.21%), specificity (83.95%), TA (85.21%), and AUC (0.8857). Likewise, SVM Gaussian with entropy features gave the highest performance, with sensitivity (84.87%), specificity (83.07%), TA (84.87%), and AUC (0.8779). The Decision Tree classifier gave the highest performance using entropy features, with sensitivity (85.65%), specificity (84.75%), TA (85.65%), and AUC (0.9173). The performance of the CNN methods was evaluated using GoogleNet and AlexNet with default and optimized parameters. DL GoogleNet with default parameters gave sensitivity (99.26%), specificity (99.24%), PPV (99.26%), TA (99.26%), FPR (0.0076), and AUC (0.9998). GoogleNet with optimized parameters gave sensitivity (98.15%), specificity (98.19%), PPV (98.15%), NPV (98.03%), TA (98.15%), FPR (0.0181), and AUC (0.9983). Similarly, the DL CNN AlexNet method with default (auto) parameters gave sensitivity (98.89%), specificity (98.94%), PPV (98.89%), NPV (98.78%), TA (98.89%), FPR (0.0106), and AUC (0.9981). Moreover, AlexNet with optimized parameters gave sensitivity (99.26%), specificity (99.07%), PPV (99.27%), NPV (99.42%), TA (99.26%), FPR (0.0093), and AUC (0.9996).

Figure 6. Performance evaluation using GoogleNet with initial parameters and 378 iterations.
Figure 5 depicts the performance evaluation in terms of AUC for separating breast cancer subjects from normal subjects using the best-performing ML classifiers with different feature sets and the CNN methods. Using entropy features, the highest separation was obtained using the Decision Tree (AUC = 0.9173), followed by SVM RBF (AUC = 0.8857), SVM Gaussian (AUC = 0.8779), and Naïve Bayes and SVM polynomial (AUC = 0.507). Similarly, with texture features, the highest separation was obtained using SVM RBF (AUC = 0.8968), followed by SVM Gaussian (AUC = 0.8918), Decision Tree (AUC = 0.6878), and Naïve Bayes and SVM polynomial (AUC = 0.5045), as reflected in Figure 5(a–b). In terms of AUC, DL GoogleNet with default parameters obtained AUC = 0.9998 and AlexNet with an optimized set of parameters AUC = 0.9996, as reflected in Table 2.
Table 2. Performance evaluation based on Different extracted features using ML Classifiers and TL
Approaches using DL Methods.
Features Sensitivity Specificity PPV NPV TA FPR AUC
Bayes
Texture 0.5584 0.4466 0.7538 0.8036 0.5584 0.5534 0.5045
Morphological 0.5584 0.4466 0.7538 0.8036 0.5584 0.5534 0.5045
SIFT 0.5754 0.4381 0.7568 0.8162 0.5754 0.5619 0.5088
EFDs 0.5584 0.4466 0.7538 0.8036 0.5584 0.5534 0.5045
Entropy 0.5606 0.4494 0.7545 0.8041 0.5606 0.5506 0.507
SVM polynomial
Texture 0.8265 0.8246 0.8271 0.821 0.8265 0.1754 0.5045
Morphological 0.8242 0.8213 0.8246 0.8191 0.8242 0.1787 0.5045
SIFT 0.6749 0.6547 0.6729 0.6624 0.6749 0.3453 0.5088
EFDs 0.7742 0.7712 0.775 0.7677 0.7742 0.2288 0.5045
Entropy 0.8242 0.8213 0.8246 0.8191 0.8242 0.1787 0.507
SVM RBF
Texture 0.8398 0.8317 0.8396 0.8383 0.8398 0.1683 0.8968
Morphological 0.8420 0.8375 0.842 0.8383 0.8420 0.1625 0.9069
SIFT 0.7368 0.7248 0.7364 0.7268 0.7368 0.2752 0.7948
EFDs 0.7275 0.6929 0.7343 0.7406 0.7275 0.3071 0.7940
Entropy 0.8521 0.8359 0.8546 0.8597 0.8521 0.1641 0.8857
SVM Gaussian
Texture 0.8331 0.8269 0.8329 0.8300 0.8331 0.1731 0.8918
Morphological 0.8343 0.8318 0.8346 0.8292 0.8343 0.1682 0.9109
SIFT 0.7439 0.7274 0.7427 0.7356 0.7439 0.2726 0.7990
EFDs 0.7375 0.7055 0.7433 0.7492 0.7375 0.2945 0.7945
Entropy 0.8487 0.8307 0.8522 0.8585 0.8487 0.1693 0.8779
Decision tree
Texture 0.5517 0.6013 0.6028 0.5814 0.5517 0.3987 0.6878
Morphological 0.8487 0.8443 0.8487 0.8451 0.8487 0.1557 0.9117
SIFT 0.7404 0.7235 0.7391 0.732 0.7404 0.2765 0.8039
EFDs 0.4716 0.5566 0.5412 0.5232 0.4716 0.4434 0.5175
Entropy 0.8565 0.8475 0.8565 0.8567 0.8565 0.1525 0.9173
DL
GoogleNet AutoP 0.9926 0.9924 0.9926 0.9924 0.9926 0.0076 0.9998
GoogleNet DiffP 0.9815 0.9819 0.9815 0.9803 0.9815 0.0181 0.9983
AlexNet AutoP 0.9889 0.9894 0.9889 0.9878 0.9889 0.0106 0.9981
AlexNet DiffP 0.9926 0.9907 0.9927 0.9942 0.9926 0.0093 0.9996
Legends: AutoP (Auto/default parameters), DiffP (Different/Optimized Parameters).
Figure 6 depicts the performance of GoogleNet with default parameters over 6 epochs and 378 iterations. For both training and validation, accuracy was lower in the 1st and 2nd epochs, with correspondingly higher loss. The accuracy became higher in later iterations and epochs as the loss decreased. After the 2nd epoch, accuracy held almost steady near 100%, with a loss below 0.3, as can be observed in Figure 6.
Figure 7(a–b) shows the loss and accuracy at different iterations obtained using GoogleNet. In the initial iterations, the mini-batch and validation loss values were high, and they decreased in later iterations. As shown in Figure 7(a), the mini-batch loss at selected iterations was 0.8184 at the 1st iteration, 0.2545 at the 10th, 0.2597 at the 20th, 0.2059 at the 45th, and 0.0422 at the 55th. Similarly, the validation loss at selected iterations was 0.7308 at the 1st iteration, 0.3007 at the 10th, 0.1162 at the 20th, 0.0685 at the 45th, and 0.0669 at the 55th. Moreover, the accuracy at selected iterations using GoogleNet is reflected in Figure 7(b). The validation accuracy was 40% at the 1st iteration, 90% at the 10th, 85% at the 20th, 90% at the 45th, and 100% at the 55th. Similarly, the mini-batch accuracy was 36.30% at the 1st iteration, 84.44% at the 10th, 96.60% at the 20th, and 98.15% at the 45th and 55th iterations.

Figure 7. Performance measure using GoogleNet: (a) loss, (b) accuracy.
4. Discussions
The CNN uses a convolution operation in the convolution layer, which serves as a detection filter for the presence of a particular feature or pattern in the original data. Instead of being assigned a priori, as in conventional image processing, the parameters of such filters are learned from training data and are specialized to solve the problem at hand. This means that the lower layers of a CNN detect features that are common to most image recognition tasks, such as edges and curves [67]. Convolutional Neural Networks (CNNs) have had the greatest impact within the field of health informatics. Their architecture can be described as an interleaved set of feed-forward layers implementing convolutional filters, followed by reduction, rectification, or pooling layers; each layer in the network produces a higher-level abstract feature [88]. In CNNs, the weights of the network are shared in such a way that the network performs convolution operations on images. This way, the model does not need to learn separate detectors for the same object occurring at different positions in an image, making the network equivariant with respect to translations of the input. It also drastically reduces the number of parameters that need to be learned (i.e. the number of weights no longer depends on the size of the input image).
In DL, the first CNN to win the ILSVRC, which also made CNNs very popular, was the AlexNet architecture [33]. This architecture comprises 5 convolutional layers, max-pooling layers, dropout layers, and three fully connected layers, and employs ReLU as the activation function. It obtained a top-5 error rate of 15.6%, i.e. the rate of failing to classify an image within the five most likely classes. AlexNet was improved the next year by its authors, who modified its parameters and achieved a top-5 error rate of 11.2% [59]. In 2014, VGGNet [89], even though it did not win the competition, showed that it was possible to reduce the number of parameters while increasing the depth of the network, achieving better performance than the architectures mentioned above with an error rate of 7.3%. This architecture is composed of more convolutional layers than AlexNet, 13 exactly, which are smaller in terms of filter dimensions, leading to a reduction in parameters while being able to learn higher-level features than previous CNNs.
Another essential architecture, the winner of ILSVRC 2014 with an error rate of 6.7%, is GoogleNet [59,89]. It changed the way CNN architectures are structured, which until then stacked single layers one upon another sequentially, by introducing the inception module. The architecture is modularized, and its main block is the inception module, which is composed of convolutional layers arranged in parallel. GoogleNet has 122 layers, but not all are sequential: as opposed to AlexNet, parts of the network are executed in parallel, mainly its inception modules. Each of its nine inception modules is a network within the network layer, leading to over 100 layers in total. GoogleNet was trained on 'a few high-end GPUs within a week' [14].
In the present study, we first extracted hand-crafted features and fed them to different traditional machine learning (ML) algorithms. For the ML techniques, different features such as texture, morphological, entropy-based, SIFT, and EFDs features were extracted from breast cancer mammograms. In the second phase, CNN methods utilizing a TL approach were employed, in which the GoogleNet and AlexNet pre-trained models were trained. Deep learning methods are more robust when the data volume is large. Moreover, deep learning models automate the feature engineering process, extracting high-level characteristics directly from the data; this capability decreases the effort and time needed to construct a feature extractor for each problem. GoogleNet was retrained on the new set of cancer images. The weights of the earlier layers in the network were frozen by setting their learning rate to zero; the parameters were not updated while the training layers were frozen, which helped to improve the network performance significantly and to avoid overfitting. For each model, we used default and optimized parameters for evaluating the performance. A deep learning model with a transfer learning approach effectively utilizes previously learned model knowledge to solve a new task with fine-tuning or minimal training; the deep transfer learning (DTL) approach is also helpful in addressing computational issues. Applying the traditional machine learning algorithms with hand-crafted features, Naïve Bayes yielded its highest accuracy (57.54%) with SIFT features, SVM polynomial yielded its highest accuracy (82.65%) with texture features, SVM RBF provided an accuracy of 85.21% with entropy features, SVM Gaussian with entropy features provided an accuracy of 84.87%, and the Decision Tree yielded an accuracy of 85.65% with entropy features. The deep learning models with the transfer learning approach improved the classification performance: GoogleNet with default parameters yielded accuracy (99.26%) and AUC (0.9998), and AlexNet with optimized parameters yielded accuracy (99.26%) and AUC (0.9996).
5. Conclusion
In this research, CNN models were employed, and the results were compared with ML classification techniques such as SVM kernels, the Bayesian approach, and Decision Trees to distinguish cancer mammograms from those of normal subjects. Mass detection is difficult due to low image contrast, and microcalcification detection due to the large variation in size and shape; multimodal features were therefore extracted to distinguish the cancer mammograms effectively. We extracted texture, morphological, entropy-based, SIFT, and EFDs features for training and validating the ML classifiers. A 10-fold cross-validation was used to train and test the image database. The performance was measured based on specificity, sensitivity, PPV, NPV, FPR, and AUC. CNN GoogleNet with default parameters and AlexNet with optimized parameters gave the highest performance (TA and sensitivity = 99.26%; AUC = 0.9998 and 0.9996, respectively), followed by the Decision Tree (TA = 85.65%, AUC = 0.9173) and SVM RBF (TA = 85.21%, AUC = 0.8857). Among the ML classifiers, the entropy-based features gave higher performance evaluation measures than the other features extracted from the breast cancer mammograms. The detection performance of the deep learning methods with the transfer learning approach improved upon the traditional machine learning algorithms owing to their dynamic feature engineering characteristics. Thus, the proposed approach is more robust for improving breast cancer detection from mammograms and improving healthcare systems.
5.1. Limitations and future directions
The present study was focused to apply machine learning methods with diverse hand-
crafted features based approaches and deep learning methods. Though researchers are still
working on multiple aspects of feature-extracting strategies to improve the classification
performance of deep learning algorithms. In this context, a lightweight deep learning architecture will be utilized, using a minimum number of layers for optimized MRI scans with empirically controlled unknown parameters that generate dynamic features. Similarly, attention mechanisms will be used, which focus on the important regions of an image by increasing the weight of those locations, thereby compensating for the loss of spatial information while enriching the feature information. Another future direction is to collect a primary dataset for better BC control, containing the clinical parameters and demographic profiles of the patients as well as pathological control response, survival, and progression. We will also utilize hybrid deep learning methods and parametric optimization using grid search, Bayesian optimization, and genetic algorithms, as sketched below, to further improve the classification performance.
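As one example of the parametric optimization mentioned above, a grid search over SVM hyper-parameters can be set up with scikit-learn as follows; the feature matrix, parameter ranges, and scoring choice are illustrative assumptions, not the study's actual configuration.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for a matrix of extracted mammogram features.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Candidate hyper-parameter values to search exhaustively.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", 0.01, 0.001],
    "kernel": ["rbf", "poly"],
}

# Evaluate every combination with stratified 10-fold cross-validation,
# keeping the combination with the best mean AUC.
search = GridSearchCV(
    SVC(probability=True),
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, search.best_score_)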
Acknowledgement
This study is supported via funding from Prince Sattam Bin Abdulaziz University, project number PSAU/2023/R/1444. The authors would also like to thank the Deanship of Scientific Research at Shaqra University for its support.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
[1] Forouzanfar MH, et al. Breast and cervical cancer in 187 countries between 1980 and 2010: a
systematic analysis. Lancet. 2011;378(9801):1461–1484.
[2] Jemal A, et al. Global cancer statistics. CA Cancer J Clin. 2011;61(2):69–90.
[3] Dheeba J, Singh NA, Selvi ST. Computer-aided detection of breast cancer on mammo-
grams: a swarm intelligence optimized wavelet neural network approach. J Biomed Inform.
2014;49:45–52.
[4] DeSantis CE, et al. Breast cancer statistics, 2015: convergence of incidence rates between black
and white women. CA Cancer J Clin. 2016;66(1):31–42.
[5] Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. 2019;69(1):7–34.
[6] Wirén S, et al. Pooled cohort study on height and risk of cancer and cancer death. Cancer Causes
Contr. 2014;25(2):151–159.
[7] Walter RB, et al. Height as an explanatory factor for sex differences in human cancer. J Natl Cancer
Inst. 2013;105(12):860–868.
[8] Ardakani AA, Gharbali A, Mohammadi A. Classification of breast tumors using sonographic
texture analysis. J Ultrasound Med. 2015;34(2):225–231.
[9] Sprague BL, et al. Variation in mammographic breast density assessments among radiologists in
clinical practice: a multicenter observational study. Ann Intern Med. 2016;165(7):457–464.
[10] Freer PE. Mammographic breast density: impact on breast cancer risk and implications for
screening. Radiographics. 2015;35(2):302–315.
[11] Acharya UR, et al. Data mining framework for breast lesion classification in shear wave ultra-
sound: a hybrid feature paradigm. Biomed Signal Process Contr. 2017;33:400–410.
[12] Zhang L, et al. Identifying ultrasound and clinical features of breast cancer molecular subtypes
by ensemble decision. Sci Rep. 2015;5(1):1–14.
[13] Sathish D, et al. Medical imaging techniques and computer aided diagnostic approaches for the
detection of breast cancer with an emphasis on thermography-a review. Int J Med Eng Inform.
2016;8(3):275–299.
[14] Machida Y, et al. Single focus on breast magnetic resonance imaging: diagnosis based on kinetic
pattern and patient age. Acta Radiol. 2017;58(6):652–659.
[15] Kolb TM, Lichy J, Newhouse JH. Comparison of the performance of screening mammography,
physical examination, and breast US and evaluation of factors that influence them: an analysis
of 27,825 patient evaluations. Radiology. 2002;225(1):165–175.
[16] Cheng H-D, et al. Approaches for automated detection and classification of masses in mammo-
grams. Pattern Recognit. 2006;39(4):646–668.
[17] Skaane P, Engedal K. Analysis of sonographic features in the differentiation of fibroadenoma and
invasive ductal carcinoma. Am J Roentgenol. 1998;170(1):109–114.
[18] Doi K. Computer-aided diagnosis: potential usefulness in diagnostic radiology and telemedicine.
In Proceedings of the National Forum: Military Telemedicine On-Line Today Research, Practice,
and Opportunities. 1995. IEEE.
[19] Hussain L, et al. Spatial wavelet-based coherence and coupling in EEG signals with eye open and
closed during resting state. IEEE Access. 2018;6:37003–37022.
[20] Hussain L, et al. Arrhythmia detection by extracting hybrid features based on refined Fuzzy
entropy (FuzEn) approach and employing machine learning techniques. Waves Random Com-
plex Media. 2020;30(4):656–686.
[21] Hussain L, et al. Regression analysis for detecting epileptic seizure with different feature extract-
ing strategies. Biomed Eng Biomed Tech. 2019;64(6):619–642.
[22] Karahaliou AN, et al. Breast cancer diagnosis: analyzing texture of tissue surrounding microcal-
cifications. IEEE Trans Inf Technol Biomed. 2008;12(6):731–738.
[23] Kupinski MA, Giger ML. Automated seeded lesion segmentation on digital mammograms. IEEE
Trans Med Imaging. 1998;17(4):510–517.
[24] Sahiner B, et al. Improvement of mammographic mass characterization using spiculation mea-
sures and morphological features. Med Phys. 2001;28(7):1455–1465.
[25] Zhen L, Chan AK. An artificial intelligent algorithm for tumor detection in screening mammo-
gram. IEEE Trans Med Imaging. 2001;20(7):559–567.
[26] Caldwell CB, et al. Characterisation of mammographic parenchymal pattern by fractal dimension. Phys Med Biol. 1990;35(2):235.
[27] Li H, Liu KR, Lo S-C. Fractal modeling and segmentation for the enhancement of microcalcifica-
tions in digital mammograms. IEEE Trans Med Imaging. 1997;16(6):785–798.
[28] Chen D-R, et al. Classification of breast ultrasound images using fractal feature. Clin Imaging.
2005;29(4):235–245.
[29] Hussain L, et al. Applying Bayesian network approach to determine the association between
morphological features extracted from prostate cancer images. IEEE Access. 2018;7:1586–1601.
[30] Qureshi SA, et al. Intelligent ultra-light deep learning model for multi-class brain tumor detec-
tion. Appl Sci. 2022;12(8):3715.
[31] Hussain L, et al. Prostate cancer detection using machine learning techniques by employing
combination of features extracting strategies. Cancer Biomark. 2018;21(2):393–413.
[32] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444.
[33] Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural
networks. Commun ACM. 2017;60(6):84–90.
[34] Delphia AA, Kamarasan M, Sathiamoorthy S. Image processing for identification of breast cancer:
a literature survey. Asian J Electr Sci. 2018;7(2):28–37.
[35] Kupinski MA, et al. Ideal observer approximation using Bayesian classification neural networks.
IEEE Trans Med Imaging. 2001;20(9):886–899.
[36] Lyons L. Statistical problems in particle physics, astrophysics and cosmology: PHYSTAT05,
Oxford, UK, 12–15 September 2005. 2006: Imperial College Press.
[37] Specht DF. Probabilistic neural networks. Neural Netw. 1990;3(1):109–118.
[38] Hamad YA, Simonov K, Naeem MB. Breast cancer detection and classification using artificial neu-
ral networks. In 2018 1st Annual International Conference on Information and Sciences (AiCIS).
2018. IEEE.
[39] Zheng B, Qian W, Clarke LP. Digital mammography: mixed feature neural network with
spectral entropy decision for detection of microcalcifications. IEEE Trans Med Imaging.
1996;15(5):589–597.
[40] Nahid A-A, Kong Y. Involvement of machine learning for breast cancer image classification: a
survey. Comput Math Methods Med. 2017;2017:3781951–3781951.
[41] Bhandare A, et al. Applications of convolutional neural networks. Int J Comp Sci Inform Technol.
2016;7(5):2206–2215.
[42] Lo S-CB, et al. A multiple circular path convolution neural network system for detection of
mammographic masses. IEEE Trans Med Imaging. 2002;21(2):150–158.
[43] Sahiner B, et al. Classification of mass and normal breast tissue: a convolution neural network
classifier with spatial domain and texture images. IEEE Trans Med Imaging. 1996;15(5):598–610.
[44] Jiao Z, et al. A deep feature based framework for breast masses classification. Neurocomputing.
2016;197:221–231.
[45] Fonseca P, et al. Automatic breast density classification using a convolutional neural network
architecture search procedure. In Medical imaging 2015: computer-aided diagnosis. 2015. SPIE.
[46] Su H, et al. Region segmentation in histopathological breast cancer images using deep convolu-
tional neural network. In 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI).
2015. IEEE.
[47] Huynh BQ, Li H, Giger ML. Digital mammographic tumor classification using transfer learning
from deep convolutional neural networks. J Med Imaging. 2016;3(3):034501.
[48] Arevalo J, et al. Representation learning for mammography mass lesion classification with
convolutional neural networks. Comput Methods Programs Biomed. 2016;127:248–257.
[49] Rezaeilouyeh H, Mollahosseini A, Mahoor MH. Microscopic medical image classification frame-
work via deep learning and Shearlet transform. J Med Imaging. 2016;3(4):044501.
[50] Jaffar MA. Deep learning based computer aided diagnosis system for breast mammograms. Int
J Adv Comp Sci Appl. 2017;8:7.
[51] Jadoon MM, et al. Three-class mammogram classification based on descriptive CNN features.
BioMed Res Int. 2017;2017:3640901–3640901.
[52] Gastounioti A, et al. Using convolutional neural networks for enhanced capture of breast
parenchymal complexity patterns associated with breast cancer risk. Acad Radiol. 2018;25(8):
977–984.
[53] Wang H, et al. Breast mass classification via deeply integrating the contextual information from
multi-view data. Pattern Recognit. 2018;80:42–52.
[54] Zhu W, et al. Adversarial deep structured nets for mass segmentation from mammograms. In
2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018). 2018. IEEE.
[55] Ribli D, et al. Detecting and classifying lesions in mammograms with deep learning. Sci Rep.
2018;8(1):1–7.
[56] Chiao J-Y, et al. Detection and classification the breast tumors using mask R-CNN on sonograms.
Medicine. 2019;98:19.
[57] Nahid A-A, Mehrabi MA, Kong Y. Histopathological breast cancer image classification by deep
neural network techniques guided by local clustering. BioMed Res Int. 2018;2018:2362108–
2362108.
[58] Szegedy C, et al. Rethinking the inception architecture for computer vision. In Proceedings of the
IEEE conference on computer vision and pattern recognition. 2016.
[59] Shin H-C, et al. Deep convolutional neural networks for computer-aided detection: CNN archi-
tectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging. 2016;35(5):
1285–1298.
[60] Chen H, et al. Standard plane localization in fetal ultrasound via domain transferred deep neural
networks. IEEE J Biomed Health Inform. 2015;19(5):1627–1636.
[61] Heath M, et al. Current status of the digital database for screening mammography. In: Karssemei-
jer N, Thijssen M, Hendriks J, et al., editors. Digital mammography. Dordrecht: Springer; 1998. p.
457–460.
[62] Lévy D, Jain A. Breast mass classification from mammograms using deep convolutional neural
networks. arXiv preprint arXiv:1612.00542, 2016.
[63] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal
covariate shift. In International conference on machine learning. 2015. PMLR.
[64] Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks.
In Proceedings of the thirteenth international conference on artificial intelligence and statistics.
2010. JMLR Workshop and Conference Proceedings.
[65] Chen T, et al. Improving sentiment analysis via sentence type classification using BiLSTM-CRF
and CNN. Expert Syst Appl. 2017;72:221–230.
[66] Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–1359.
[67] Yosinski J, et al. How transferable are features in deep neural networks? Adv Neural Inf Process
Syst. 2014;27:1792.
[68] Deng J, et al. Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on
computer vision and pattern recognition. 2009. IEEE.
[69] Zhu Z, et al. Extreme weather recognition using convolutional neural networks. In 2016 IEEE
International Symposium on Multimedia (ISM). 2016. IEEE.
[70] Elhoseiny M, Huang S, Elgammal A. Weather classification with deep convolutional neural
networks. In 2015 IEEE International Conference on Image Processing (ICIP). 2015. IEEE.
[71] Soekhoe D, Putten PVD, Plaat A. On the impact of data set size in transfer learning using deep
neural networks. In International symposium on intelligent data analysis. 2016. Springer.
[72] Chu B, et al. Best practices for fine-tuning visual classifiers to new domains. In European confer-
ence on computer vision. 2016. Springer.
[73] Kim KG. Book review: deep learning. Healthc Inform Res. 2016;22(4):351–354.
[74] Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics. 2011. JMLR Workshop and Conference Proceedings.
[75] Goodfellow I, Bengio Y, Courville A. Convolutional networks. In: Goodfellow I, Bengio Y, Courville
A, editors. Deep learning. Cambridge: MIT Press; 2016. p. 330–372.
[76] Singh RG, Kishore N. The impact of transformation function on the classification ability of
complex valued extreme learning machines. In 2013 International Conference on Control, Com-
puting, Communication and Materials (ICCCCM). 2013. IEEE.
[77] Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Mon-
tavon G, Orr GB, Müller KR, editors. Neural networks: tricks of the trade. Berlin, Heidelberg:
Springer; 2012. p. 437–478.
[78] Maas AL, Hannun AY, Ng AY. Rectifier nonlinearities improve neural network acoustic models. In
Proc. ICML. 2013. Atlanta, Georgia, USA.
[79] Tóth L. Phone recognition with deep sparse rectifier neural networks. In 2013 IEEE International
Conference on Acoustics, Speech and Signal Processing. 2013. IEEE.
[80] Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. Haifa: ICML;
2010.
[81] Jarrett K, et al. What is the best multi-stage architecture for object recognition? In 2009 IEEE 12th
international conference on computer vision. 2009. IEEE.
[82] Lai M. Deep learning for medical image segmentation. arXiv preprint arXiv:1505.02000, 2015.
[83] Rosasco L, et al. Are loss functions all the same? Neural Comput. 2004;16(5):1063–1076.
[84] Boyd S, Vandenberghe L. Convex optimization. Cambridge: Cambridge University Press; 2004.
[85] Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test
evaluation. Caspian J Intern Med. 2013;4(2):627.
[86] Mishra S, Panda M. A histogram-based classification of image database using scale invariant
features. Int J Image Graphics Signal Proc. 2017;9(6):55.
[87] Hussain L. Detecting epileptic seizure with different feature extracting strategies using robust
machine learning classification techniques by applying advance parameter optimization
approach. Cogn Neurodyn. 2018;12(3):271–294.
[88] Ravì D, et al. Deep learning for health informatics. IEEE J Biomed Health Inform. 2016;21(1):4–21.
[89] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556, 2014.

More Related Content

Similar to Hussain et al BC Deep Learning March 2023.pdf

A Review on Data Mining Techniques for Prediction of Breast Cancer Recurrence
A Review on Data Mining Techniques for Prediction of Breast Cancer RecurrenceA Review on Data Mining Techniques for Prediction of Breast Cancer Recurrence
A Review on Data Mining Techniques for Prediction of Breast Cancer RecurrenceDr. Amarjeet Singh
 
Machine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer DiagnosisMachine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer DiagnosisPramod Sharma
 
Breast cancer diagnosis via data mining performance analysis of seven differe...
Breast cancer diagnosis via data mining performance analysis of seven differe...Breast cancer diagnosis via data mining performance analysis of seven differe...
Breast cancer diagnosis via data mining performance analysis of seven differe...cseij
 
Image processing and machine learning techniques used in computer-aided dete...
Image processing and machine learning techniques  used in computer-aided dete...Image processing and machine learning techniques  used in computer-aided dete...
Image processing and machine learning techniques used in computer-aided dete...IJECEIAES
 
Applying Deep Learning to Transform Breast Cancer Diagnosis
Applying Deep Learning to Transform Breast Cancer DiagnosisApplying Deep Learning to Transform Breast Cancer Diagnosis
Applying Deep Learning to Transform Breast Cancer DiagnosisCognizant
 
Toward Integrated Clinical and Gene Expression Profiles for Breast Cancer Pro...
Toward Integrated Clinical and Gene Expression Profiles for Breast Cancer Pro...Toward Integrated Clinical and Gene Expression Profiles for Breast Cancer Pro...
Toward Integrated Clinical and Gene Expression Profiles for Breast Cancer Pro...CSCJournals
 
A Comparative Study on the Methods Used for the Detection of Breast Cancer
A Comparative Study on the Methods Used for the Detection of Breast CancerA Comparative Study on the Methods Used for the Detection of Breast Cancer
A Comparative Study on the Methods Used for the Detection of Breast Cancerrahulmonikasharma
 
IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...
IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...
IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...IRJET Journal
 
Performance Evaluation using Supervised Learning Algorithms for Breast Cancer...
Performance Evaluation using Supervised Learning Algorithms for Breast Cancer...Performance Evaluation using Supervised Learning Algorithms for Breast Cancer...
Performance Evaluation using Supervised Learning Algorithms for Breast Cancer...IRJET Journal
 
Role of Tomosynthesis in Assessing the Size of the Breast Lesion
Role of Tomosynthesis in Assessing the Size of the Breast LesionRole of Tomosynthesis in Assessing the Size of the Breast Lesion
Role of Tomosynthesis in Assessing the Size of the Breast LesionApollo Hospitals
 
A Progressive Review on Early Stage Breast Cancer Detection
A Progressive Review on Early Stage Breast Cancer DetectionA Progressive Review on Early Stage Breast Cancer Detection
A Progressive Review on Early Stage Breast Cancer DetectionIRJET Journal
 
Logistic Regression Model for Predicting the Malignancy of Breast Cancer
Logistic Regression Model for Predicting the Malignancy of Breast CancerLogistic Regression Model for Predicting the Malignancy of Breast Cancer
Logistic Regression Model for Predicting the Malignancy of Breast CancerIRJET Journal
 
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...mlaij
 
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...mlaij
 
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...mlaij
 
Comparative analysis on bayesian classification for breast cancer problem
Comparative analysis on bayesian classification for breast cancer problemComparative analysis on bayesian classification for breast cancer problem
Comparative analysis on bayesian classification for breast cancer problemjournalBEEI
 

Similar to Hussain et al BC Deep Learning March 2023.pdf (20)

A Review on Data Mining Techniques for Prediction of Breast Cancer Recurrence
A Review on Data Mining Techniques for Prediction of Breast Cancer RecurrenceA Review on Data Mining Techniques for Prediction of Breast Cancer Recurrence
A Review on Data Mining Techniques for Prediction of Breast Cancer Recurrence
 
Machine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer DiagnosisMachine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer Diagnosis
 
Breast cancer diagnosis via data mining performance analysis of seven differe...
Breast cancer diagnosis via data mining performance analysis of seven differe...Breast cancer diagnosis via data mining performance analysis of seven differe...
Breast cancer diagnosis via data mining performance analysis of seven differe...
 
Image processing and machine learning techniques used in computer-aided dete...
Image processing and machine learning techniques  used in computer-aided dete...Image processing and machine learning techniques  used in computer-aided dete...
Image processing and machine learning techniques used in computer-aided dete...
 
Applying Deep Learning to Transform Breast Cancer Diagnosis
Applying Deep Learning to Transform Breast Cancer DiagnosisApplying Deep Learning to Transform Breast Cancer Diagnosis
Applying Deep Learning to Transform Breast Cancer Diagnosis
 
Toward Integrated Clinical and Gene Expression Profiles for Breast Cancer Pro...
Toward Integrated Clinical and Gene Expression Profiles for Breast Cancer Pro...Toward Integrated Clinical and Gene Expression Profiles for Breast Cancer Pro...
Toward Integrated Clinical and Gene Expression Profiles for Breast Cancer Pro...
 
A Comparative Study on the Methods Used for the Detection of Breast Cancer
A Comparative Study on the Methods Used for the Detection of Breast CancerA Comparative Study on the Methods Used for the Detection of Breast Cancer
A Comparative Study on the Methods Used for the Detection of Breast Cancer
 
IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...
IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...
IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...
 
Performance Evaluation using Supervised Learning Algorithms for Breast Cancer...
Performance Evaluation using Supervised Learning Algorithms for Breast Cancer...Performance Evaluation using Supervised Learning Algorithms for Breast Cancer...
Performance Evaluation using Supervised Learning Algorithms for Breast Cancer...
 
Li2019
Li2019Li2019
Li2019
 
Role of Tomosynthesis in Assessing the Size of the Breast Lesion
Role of Tomosynthesis in Assessing the Size of the Breast LesionRole of Tomosynthesis in Assessing the Size of the Breast Lesion
Role of Tomosynthesis in Assessing the Size of the Breast Lesion
 
A Progressive Review on Early Stage Breast Cancer Detection
A Progressive Review on Early Stage Breast Cancer DetectionA Progressive Review on Early Stage Breast Cancer Detection
A Progressive Review on Early Stage Breast Cancer Detection
 
Logistic Regression Model for Predicting the Malignancy of Breast Cancer
Logistic Regression Model for Predicting the Malignancy of Breast CancerLogistic Regression Model for Predicting the Malignancy of Breast Cancer
Logistic Regression Model for Predicting the Malignancy of Breast Cancer
 
Review_1.pdf
Review_1.pdfReview_1.pdf
Review_1.pdf
 
journals public
journals publicjournals public
journals public
 
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
 
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...
 
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
 
Comparative analysis on bayesian classification for breast cancer problem
Comparative analysis on bayesian classification for breast cancer problemComparative analysis on bayesian classification for breast cancer problem
Comparative analysis on bayesian classification for breast cancer problem
 
1. 501099
1. 5010991. 501099
1. 501099
 

More from LallHussain

ch15BayesNet.ppt
ch15BayesNet.pptch15BayesNet.ppt
ch15BayesNet.pptLallHussain
 
Bioinformatics Assignment No 02 (1).pdf
Bioinformatics Assignment No 02 (1).pdfBioinformatics Assignment No 02 (1).pdf
Bioinformatics Assignment No 02 (1).pdfLallHussain
 
Hussain et al PCR Breast.pdf
Hussain et al PCR Breast.pdfHussain et al PCR Breast.pdf
Hussain et al PCR Breast.pdfLallHussain
 
BUSINESS-LETTER.ppt
BUSINESS-LETTER.pptBUSINESS-LETTER.ppt
BUSINESS-LETTER.pptLallHussain
 
Radiomic Features.pdf
Radiomic Features.pdfRadiomic Features.pdf
Radiomic Features.pdfLallHussain
 

More from LallHussain (6)

ch15BayesNet.ppt
ch15BayesNet.pptch15BayesNet.ppt
ch15BayesNet.ppt
 
Bioinformatics Assignment No 02 (1).pdf
Bioinformatics Assignment No 02 (1).pdfBioinformatics Assignment No 02 (1).pdf
Bioinformatics Assignment No 02 (1).pdf
 
Hussain et al PCR Breast.pdf
Hussain et al PCR Breast.pdfHussain et al PCR Breast.pdf
Hussain et al PCR Breast.pdf
 
BUSINESS-LETTER.ppt
BUSINESS-LETTER.pptBUSINESS-LETTER.ppt
BUSINESS-LETTER.ppt
 
Radiomic Features.pdf
Radiomic Features.pdfRadiomic Features.pdf
Radiomic Features.pdf
 
1-intro.ppt
1-intro.ppt1-intro.ppt
1-intro.ppt
 

Recently uploaded

Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...
Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...
Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...parulsinha
 
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...chandars293
 
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...GENUINE ESCORT AGENCY
 
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...parulsinha
 
Top Rated Bangalore Call Girls Majestic ⟟ 9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Majestic ⟟  9332606886 ⟟ Call Me For Genuine S...Top Rated Bangalore Call Girls Majestic ⟟  9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Majestic ⟟ 9332606886 ⟟ Call Me For Genuine S...narwatsonia7
 
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service AvailableTrichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service AvailableGENUINE ESCORT AGENCY
 
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Dipal Arora
 
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Ishani Gupta
 
Call Girls Shimla Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Shimla Just Call 8617370543 Top Class Call Girl Service AvailableCall Girls Shimla Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Shimla Just Call 8617370543 Top Class Call Girl Service AvailableDipal Arora
 
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...Dipal Arora
 
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...Sheetaleventcompany
 
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...vidya singh
 
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...adilkhan87451
 
Call Girls Kakinada Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kakinada Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Kakinada Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kakinada Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...parulsinha
 
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...parulsinha
 
Call Girls Raipur Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Raipur Just Call 9630942363 Top Class Call Girl Service AvailableCall Girls Raipur Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Raipur Just Call 9630942363 Top Class Call Girl Service AvailableGENUINE ESCORT AGENCY
 
Call Girls Vadodara Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Vadodara Just Call 8617370543 Top Class Call Girl Service AvailableCall Girls Vadodara Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Vadodara Just Call 8617370543 Top Class Call Girl Service AvailableDipal Arora
 

Recently uploaded (20)

Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...
Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...
Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...
 
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 9332606886 𖠋 Will You Mis...
 
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
 
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
 
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
 
Top Rated Bangalore Call Girls Majestic ⟟ 9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Majestic ⟟  9332606886 ⟟ Call Me For Genuine S...Top Rated Bangalore Call Girls Majestic ⟟  9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Majestic ⟟ 9332606886 ⟟ Call Me For Genuine S...
 
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service AvailableTrichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
Trichy Call Girls Book Now 9630942363 Top Class Trichy Escort Service Available
 
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
 
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 8250077686 Top Class Call Girl Service Available
 
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
 
Call Girls Shimla Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Shimla Just Call 8617370543 Top Class Call Girl Service AvailableCall Girls Shimla Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Shimla Just Call 8617370543 Top Class Call Girl Service Available
 
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
 
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
 
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
 
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
Russian Call Girls Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service...
 
Call Girls Kakinada Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kakinada Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Kakinada Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kakinada Just Call 9907093804 Top Class Call Girl Service Available
 
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
 
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
 
Call Girls Raipur Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Raipur Just Call 9630942363 Top Class Call Girl Service AvailableCall Girls Raipur Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Raipur Just Call 9630942363 Top Class Call Girl Service Available
 
Call Girls Vadodara Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Vadodara Just Call 8617370543 Top Class Call Girl Service AvailableCall Girls Vadodara Just Call 8617370543 Top Class Call Girl Service Available
Call Girls Vadodara Just Call 8617370543 Top Class Call Girl Service Available
 

Hussain et al BC Deep Learning March 2023.pdf

  • 1. Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=twrm20 Waves in Random and Complex Media ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/twrm20 Deep convolutional neural networks accurately predict breast cancer using mammograms Lal Hussain, Sara Ansari, Mamoona Shabir, Shahzad Ahmad Qureshi, Amjad Aldweesh, Abdulfattah Omar, Zahoor Iqbal & Syed Ahmed Chan Bukhari To cite this article: Lal Hussain, Sara Ansari, Mamoona Shabir, Shahzad Ahmad Qureshi, Amjad Aldweesh, Abdulfattah Omar, Zahoor Iqbal & Syed Ahmed Chan Bukhari (2023): Deep convolutional neural networks accurately predict breast cancer using mammograms, Waves in Random and Complex Media, DOI: 10.1080/17455030.2023.2189485 To link to this article: https://doi.org/10.1080/17455030.2023.2189485 Published online: 14 Mar 2023. Submit your article to this journal View related articles View Crossmark data
  • 2. WAVES IN RANDOM AND COMPLEX MEDIA https://doi.org/10.1080/17455030.2023.2189485 Deep convolutional neural networks accurately predict breast cancer using mammograms Lal Hussaina,b, Sara Ansaric, Mamoona Shabird, Shahzad Ahmad Qureshie, Amjad Aldweeshf, Abdulfattah Omarg, Zahoor Iqbalh and Syed Ahmed Chan Bukharii aDepartment of Computer Science & IT, Neelum Campus, The University of Azad Jammu and Kashmir, Muzaffarabad, Pakistan; bDepartment of Computer Science & IT, King Abdullah Campus, The University of Azad Jammu and Kashmir, Muzaffarabad, Pakistan; cThe Children’s Hospital, University of Child Sciences, Lahore, Pakistan; dServices Institute of Medical Sciences, Lahore, Pakistan; eDepartment of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad, Pakistan; fCollege of Computer Science and Information Technology, Shaqra University, Shaqra, Saudi Arabia; gDepartment of English, College of Science & Humanities, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia; hDepartment of Mathematics, Quaid-i-Azam University, Islamabad, Pakistan; iHealthcare Informatics, St. John’s University, Queens, NY, USA ABSTRACT Breast cancer in women is the most frequently diagnosed and major leading cause of cancer deaths. Due to the complex nature of micro- calcification and masses, radiologists fail to diagnose breast can- cer properly. In this research paper, we have employed a novel Deep Convolutional Neural Network (DCNN) model using a transfer learning strategy and compared the results with Machine Learning (ML) techniques such as Support vector machine (SVM) kernels and Decision Trees based on different features extracting strategies to distinguish cancer mammograms from normal subjects. In this study, we first extracted the hand-crafted features such as as texture, mor- phological, entropy-based, scale-invariant feature transform (SIFT), and elliptic Fourier descriptors (EFDs) and fed into machine learn- ing algorithm for classification. We then utilized the deep learning algorithms with transfer learning approach. The deep learning mod- els yielded the highest detection performance with default and optimized parameters i.e. GoogleNet yielded accuracy (99.26%), AUC (0.9998) with default parameters and AlexNet yielded accuracy (99.26%), AUC (0.9996) with optimized parameters. The results reveal that proposed approach is more robust for early detection of breast mammogramswhichcanbebestutilizedforimproveddiagnosisand prognosis. ARTICLE HISTORY Received 29 November 2021 Accepted 20 February 2023 KEYWORDS Breast cancer; deep learning (DL); convolutional neural network (CNN); GoogleNet; AlexNet; support vector machine (SVM); scale invariant feature transform (SIFT) 1. Introduction Breast cancer is among women most frequently diagnosed cancers. In developing coun- tries, breast cancer accounts for 23% of the total cancer cases, and 1.6 million new cases of breast cancer are estimated worldwide, affecting women [1–3]. Breast cancer accounts for nearly one in three cancers among US women excluding skin cancer and is the second CONTACT Lal Hussain lall_hussain2008@live.com; Amjad Aldweesh a.aldweesh@su.edu.sa © 2023 Informa UK Limited, trading as Taylor & Francis Group
  • 3. 2 L. HUSSAIN ET AL. leading cause of cancer death among women after lung cancer [4]. In 2016, about 29% of deaths were accounted in females due to breast cancer in the United States State. In 2016, it was estimated that 595,690 Americans would die from cancer, corresponding to 1600 deaths per day [5]. The most common causes of cancer deaths are lung and bronchus, prostate, and colorectal cancers in men, and for women, these include lung and bronchus, breast, and colorectal cancers. The invasive cancer lifetime probability of being diagnosed in men (42%) is higher than in women (38%). This may be reflected due to external dif- ferences in environmental exposure, endogenous hormones, and complex interaction between these influences. Cancer incidences and deaths in both men and women are asso- ciated with an adult height determined by genetics and childhood nutrition accounting for 1/3 of 6 differences in cancer risk [5,6]. The cancer risk for adults younger than 50 years is higher in women (5.4%) than for men (3.4%) because of the relatively high burden of breast, genital, and thyroid cancers in young women [7]. The early diagnosis and detection of breast cancer can decrease the death rate and provide means for prompt treatment. Breast cancer is diagnosed and detected using a com- bination of approaches, including imaging, physical examination, and biopsy [8]. One of the imaging techniques used to detect breast cancer is mammography, where X-rays are used to create images, known as mammograms, of the breast. Radiologists are trained to read mammograms to detect the signs of breast cancer. The effectiveness of the screen- ing process can rely on radiologists’ explanations [9]. Patients affected by palpable breast cancer may have a sonogram and mammogram examination with both normal and benign or nonspecific appearance [10]. The biopsy is used to confirm the symptoms of breast can- cer, but it is an invasive surgical operation causing a psychological and physical impact on patients. To avoid unnecessary biopsies, researchers have devised and investigated var- ious computer-aided diagnosis (CAD) systems [3,11] providing stable detection rates by identifying ultrasound & clinical features [12], using data mining classification techniques, medical imaging and computer-aided diagnostics [13], and breast magnetic resonance imaging (MRI) [14]. As far as mammography is concerned, the research evidence that radiologists may miss up to 30% of breast cancers depending on the density of the breasts [15]. The mammo- grams in breast cancer have been evaluated using two powerful indicators: masses and micro-calcifications. Mass detection is more challenging than micro-calcification, not due to the large variation in size and shape in which masses can appear in mammograms but also because masses often exhibit poor image contrast [16]. Radiologists read mam- mograms based on their experience, training, and subjective criteria. There may be a 65–75% inter-observer variation rate even by the trained experts [17]. Hence, computer- aided diagnosis (CAD) may help radiologists to interpret mammograms to detect and classify masses. The literature also reveals that about 65–90% of the biopsies of suspected cancers turned out to be benign. Thus, it is essentially to develop techniques that can dis- tinguish the malignant and benign lesions. The combination of computer-aided diagnosis (CAD), expert knowledge, and Machine Learning (ML) techniques would greatly improve detection accuracy. 
The detection accuracy without CAD was obtained below 80%, and with computer-aided diagnosis (CAD) above 90% [18]. CAD can automatically identify the area of abnormal contrast, calling the radiologist towards suspicious regions. Thus, mammograms with computer-aided diagnosis (CAD) will improve the detection of can- cer. The cancer masses and micro-calcifications in many cases are hidden in the intense
  • 4. WAVES IN RANDOM AND COMPLEX MEDIA 3 breast tissues, especially in younger women, that are complex to detect and diagnose cancer [3]. Features extraction is an important step to detect any pathologies from physiological and neurophysiological systems. Likewise, time–frequency representation methods were employed by [19] to determine the correlation and coupling between the brain waves dur- ing resting states. Hussain et al. [20] extracted multimodal features based on fuzzy entropy to detect arrhythmia, which outperformed the traditional features extracting approaches and hybrid features [21] by employing regression methods to detect and predict epilep- tic seizures. Moreover, to distinguish normal images from malignant subjects, researchers extracted different imaging-related features. Karahaliou et al. [22] used a probabilistic neu- ral network to diagnose breast cancer by extracting multi-scale texture properties of the tissue surrounding the micro-calcifications. In the past few decades, other approaches have also been used to detect and diagnose breast cancer, viz., a probabilistic algorithm and radial gradient index-based algorithm [23], Convolution Neural Network (CNN) classifier [24], and a mixed feature-based neural network [25], fractal geometry and analysis using digital mammograms [26–28], and a method for automated segmentation of individual micro-calcifications in a region of interest (ROI). Recently, Hussain et al. [29] computed the associations between the morphological features extracted from the prostate cancer images and found very stronger associations among the features. In the past, researchers employed different hand-crafted feature-extracting strategies such as texture, morphology, gray level co-occurrence matrix, histogram of oriented gra- dients, scale-invariant feature transform, or a hybrid of these features for a brain tumor, prostate cancer, and arrhythmia detection using ML and DL techniques [20,30,31]. The existing techniques have some limitations; the graph-based techniques are competitively expensive. The other computer-aided diagnosis (CAD) techniques based on texture fea- tures exploited general texture features for classification and fail to provide the background knowledge of morphological features. The machine learning methods based on differ- ent feature-extracting strategies have limitations as different researchers employ different feature-extracting methods. However, these classifiers are not fine-tuned for challenging contrast existing in features. With the advent of modern computational systems, ML-related Artificial Intelligence application and graphical processing units (GPU) embedded processors have achieved exponential growth by developing novel models and methodologies which is currently knownasDL[32].TheDL-basedConvolutionNeuralNetwork(CNN)modeladoptsthearchi- tecture of an artificial neural network that contains a much larger number of processing layers which is contrary to the shallower architecture. CNN’s drastically reduce the struc- tural elements (i.e. neurons) in comparison to traditional feedforward neural networks [32]. For image processing, different baseline architectures of CNNs have been developed and successfully applied to complicated image-processing tasks. 
The breast cancer diagnosis has accompanied classification and segmentation perfor- mance improvement due to the representation learning, a characteristic of DL, due to its auto-feature extraction proficiency as compared with the handpicked feature extraction requirement in ML [33]. The learning phase is characterized by the flow of information exhibiting the capability of self-leering [34]. In DL, the Bayesian framework determines uncertainty in the model output using a Bayesian neural network [35,36]. Donald F. Specht introduced a probabilistic neural network (PNN), using the Bayesian classification theory,
  • 5. 4 L. HUSSAIN ET AL. consisting of three layers, viz. Input, Radial Basis, and Competitive layers [37,38]. PNN has been used to categorize mammography images into normal, benign, and malignant classes. The discrete wavelet transforms been used to find the input feature vector as handpicked features. They used seventy-five mammograms in their study and claimed an accuracy of 90%. Zhang, Lin, et al. [39] introduced a three-stage neural network method to alleviate the false positive rate of microcalcification in mammographic images. The microcalcification was detected in the first stage, followed by the second stage, where the FP detection was reduced from the first stage output. Lastly, in the third stage, the Kalman filter-based back propagation neural network isolated the microcalcifications in the mammograms. The DL networks using CNN achieved outclass performance for the detection and clas- sification of masses and microcalcifications. In this context, Fukushima et al introduced a light-weight CNN, known as ‘Recognition’, for medical image analysis [40,41]. Lo et al. [42] introduced a CNN with multiple circular paths where information was first collected from the suspected regions of mammograms, followed by processing as features using CNN. Sahiner et al. [43] proposed a CNN for mammography where selected regions, extracted by either averaging or subsampling were input to the CNN. Jiao et al. [44] classified breast masses using a DL-based strategy where intensity-based features were combined with CNN-extracted features using mammograms. Fonseca et al. [45] used CNN with an SVM classifier for the classification of breast cancer. Su et al. [46] introduced a rapid CNN method for breast cancer categorization where the semantic segmentation was carried out to reduce redundant information at the cost of higher com- plexity of the CNN model. Huynh et al. [47] used CNN by transfer learning to classify masses and microcalcification. Arevalo et al. [48] introduced a method that did not use hand crafted features where CNN was used to learn the data representation in a supervised learning manner from biopsy images of 344 breast cancer patients. Rezaeilouyeh et al. [49] proposed a microscopic breast cancer classification model using CNN where the shearlet transform-based images were obtained as the feature vectors. Subsequently, the shearlet coefficients were input to the CNN for classification. Jaffar [50] proposed a method that was based on the enhancement as preprocessing of mammo- grams, followed by CNN for feature extraction. The features were used to train the SVM classifier. Jadoon et al. [51] introduced a dual deep neural networks-based classification model for classes, viz. benign, malignant and normal. These algorithms were convolutional neural network-discrete wavelet and convolutional neural network-curvelet transform. The features extracted from discrete wavelet and curvelet transform based coefficients were fused and fed to the CNN. The CNN was trained on softmax and SVM for classification. Gastounioti et al. [52] used an ensemble classifier for breast cancer categorization. The textural feature maps, obtained from lattice-based methods, were fed to the CNN for multi-class categorization. Wang et al. [53] proposed a hybrid approach for breast can- cer classification into benign and malignant classes. The cropping and clinical features are extracted using multi-view patches of mammograms. 
Finally, the CNN was trained using multiple features to focus on the regions related to semantic-based lesions. Zhu et al. [54] introduced a combination of a fully convolutional network to segment the masses within mammograms by using a conditional random field. The method estimated ROIs on empir- ical basis with prior information on positions that helped to improve the prediction of ROIs.
  • 6. WAVES IN RANDOM AND COMPLEX MEDIA 5 Ribli et al. [55] introduced Faster Regions with Convolutional Neural Networks (R-CNN) forbreastcancerclassificationasbenignandmalignantcases.InFasterR-CNN,theROIpool- ing method was used to extract the features that are fed to the VGG-16 model. The output of the method resulted as bounding boxes with a confidence score that decides the class of cancer. Chiao et al. [56] proposed an improved version of the region proposal network called Mask R-CNN that was used for the detection and segmentation of cancer regions in mammograms. The Mask R-CNN method used the ROI alignment technique. After the fea- ture extraction from the ROI Align method, CNN was used for detection and classification processes.Nahidetal.[57]usedLSTMfortheclassificationofmicrocalcificationsandmasses by transforming mammograms into 1D-vector format, followed by conversion into time- series data. A total of 7909 images were used from the BreakHis histopathological dataset which were evaluated on SVM and Softmax at the decision layer. In contrast, the DL convolution neural network models with TL approaches are fine- tuned to optimize the parameters by minimizing the error. In this study, we have tested the generalization of the breast cancer mammographic images through AlexNet [33], and GoogleNet [58] as pre-trained CNN models using a TL approach verified in literature [59,60] in the most widely used imaging datasets. The features and training data were desired to lie within the same feature space. Transfer learning has the capability to allow the users to extract pre-known expertise and apply it on the new domain by reducing overall computa- tional time with the images lying in the combined feature space of two known TL methods on a broader spectrum with marked discrimination in feature space. The widened solution space, using the feature fusion, has resulted in the outclass performance. 2. Methods 2.1. Datasets Datasets were taken from publicly available databases provided by the University of South Florida [61] available online at (http://marathon.csee.usf.edu/Mammography/Database. html). In DDSM images, suspicious regions of interest are marked by experienced radi- ologists, and BI-RADS information is also annotated for each abnormal region. In our experiment, we used mass instance images digitized by LUMYSIS. This dataset contains approximately 2500 studies. We used the latest volumes of the DDMS database, i.e. 12 nor- mal volumes and 15 cancer volumes, 15 containing a total of 899 images, including 500 cancer images having 105 cases and 399 normal subject images having 100 cases. 2.2. Convolutional neural network Due to the outclass performance, CNNs have been used for breast cancer classification [62]. An end-to-end CNN architecture was applied to classify the cancer images directly to. To obtain high performance, we require a careful combination of pre-processing, TL, and data augmentation. In this proposed work, the performance was evaluated using two net- work architectures of CNN, namely AlexNet [33] and GoogleNet [58]. For both networks, the same architecture was used only replacing the last fully connected (FC) layer to output two classes. From GoogleNet, two auxiliary classifiers were removed. We also used batch normalization to regularize the data flowing between neural network layers reducing the
Input images of size 224 × 224 × 3 were supplied to the network. The CNN consists of convolution blocks composed of 3 × 3 convolutions with Batch Norm, ReLU, and Max Pooling, with 32, 32, and 64 filters respectively, followed by three fully connected layers of size 128, 64, and 2. The final layer is a softmax layer for binary classification. In this study, we used default and optimized parameters: Xavier's weight initialization [64], the ReLU activation function, and Adam's update rule [62]. For the default settings we used a base learning rate of 10^-4 and mini-batch sizes of 20 and 64, while for the optimized settings we used a momentum of 0.9, an initial learning rate of 0.001, a learning-rate drop factor of 0.1, L2 regularization of 0.004, a batch size of 20, and 2 epochs.

Consider an output y (e.g. the object depicted in an image) modeled as y = f(x, θ). Since the model is not known in advance, our aim is to use a generic model, described by a set of parameters θ, that is specialized to the target task. This can be done with a supervised ML approach by presenting the model with a set of input-label pairs (x, y) and iteratively updating its parameters so that the obtained output approaches the associated labels. To quantify the difference between the label ŷ predicted by the model and the desired label y, a loss function L(y, ŷ) is employed. The main purpose of the learning process is to select the parameter values θ that minimize this function, using an optimization method from the family of gradient descent algorithms.
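For concreteness, the block structure described above (three 3 × 3 convolution blocks with 32/32/64 filters, each with batch normalization, ReLU, and max pooling, then FC layers of 128, 64, and 2 with a softmax output) could be written as a MATLAB layer array roughly as follows; this is an illustrative sketch, not the authors' exact definition:

```matlab
layers = [
    imageInputLayer([224 224 3])
    convolution2dLayer(3, 32, 'Padding', 'same')   % block 1: 32 filters
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 32, 'Padding', 'same')   % block 2: 32 filters
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 64, 'Padding', 'same')   % block 3: 64 filters
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(128)
    fullyConnectedLayer(64)
    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer];   % MATLAB's required training wrapper for softmax
```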
2.2.1. Deep learning ResNet101

ResNet101, named after the 101 layers of its residual network, is a modified version of the ResNet50 architecture. The ResNet model was originally proposed by He et al. in 2016 [32]. ResNet is an abbreviation for residual networks, and it has been employed to solve numerous computer vision problems. It is one of the deepest convolutional neural network architectures used at large scale and has served a wide range of applications on the ImageNet dataset (object detection and recognition, and various classification tasks). Generally, the multiple layers of a CNN are interconnected in a specified manner and are trained to perform various tasks. The basic idea behind the ResNet architecture is the residual connection, across which gradients can pass so that the chain rule does not drive them to zero [32]. ResNet101 has 104 convolutional layers organized into 33 blocks. Twenty-nine of the 33 blocks use the output of previous blocks directly, which is known as a residual connection; these residual connections supply the first operand of the summation operator at the end of each block. The remaining four blocks receive the output of the previous block as input and pass it through a convolutional layer with a filter size of 1 × 1 and a stride of 1, followed by a normalization layer; the normalized output is then transferred to the summation operator at the output of that block. The depth of each block may vary according to its density [65]. The general architecture of ResNet101 is shown in Figure 1, and Figures 6 and 7 show the replaced layers of ResNet101 before and after fine-tuning.

Figure 1. ResNet101 overall architecture.

The hyper-parameter settings found empirically for ResNet101 are listed in Table 1. The hyper-parameters of the CNN models were adjusted heuristically to facilitate the convergence of the loss function during training. The Adam optimizer was chosen because of the parameter-specific, adaptive nature of its learning rates. The initial learning rate was chosen as 0.0001 for ResNet101: a large learning rate may prevent the loss function from converging and can cause overshoots, whereas an extremely small learning rate drastically increases the training time. Mini-batch sizes of 10 and 12 were set according to the training speed and computational requirements, since extremely large batch sizes adversely affect model quality.

Table 1. Empirically tuned set of parameters.

Model                     Parameter               Value
ResNet101 (TL Deep CNN)   Optimizer               Adam
                          Momentum                0.90
                          Initial learning rate   0.0001
                          L2 regularization       0.00004
                          Max epochs              10
                          Minibatch size          12
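The Table 1 settings map onto MATLAB's trainingOptions roughly as below. Note, as an implementation detail, that the 'adam' solver exposes its momentum-like term as GradientDecayFactor (the 'Momentum' option belongs to 'sgdm'), so this is a sketch under that assumption:

```matlab
% Training options mirroring Table 1 (illustrative sketch)
options = trainingOptions('adam', ...
    'GradientDecayFactor', 0.90, ...   % plays the role of the momentum term
    'InitialLearnRate',    1e-4, ...
    'L2Regularization',    4e-5, ...
    'MaxEpochs',           10, ...
    'MiniBatchSize',       12, ...
    'Shuffle',             'every-epoch', ...
    'Plots',               'training-progress');
% net = trainNetwork(augTrain, lgraph, options);  % augTrain: a hypothetical
%                                                 % augmentedImageDatastore
```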
2.2.2. GoogleNet

GoogleNet was retrained on the new set of cancer images. The weights of the earlier layers in the network were frozen by setting their learning rates to zero. While the training layers were frozen, their parameters were not updated because the gradients of these layers were not computed; this helped to improve the network performance significantly and also helps to avoid overfitting the new dataset. The first 110 layers of GoogleNet include the inception modules. Using freezeWeights(), the learning rates of the first 110 layers were set to zero, and the layers were then reconnected in their original order using the createLgraph() connection function. Figure 2 illustrates the schematic diagram of the GoogleNet model.
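freezeWeights() and the layer-graph reconnection helper referenced above ship with MathWorks' transfer-learning example rather than being core toolbox built-ins; assuming those helpers are on the path, the freezing step looks roughly like this:

```matlab
% Freeze the first 110 layers (the inception modules) of the modified graph
layers = lgraph.Layers;
connections = lgraph.Connections;
layers(1:110) = freezeWeights(layers(1:110));   % zero the learn-rate factors
% Reconnect the layers in their original order
lgraph = createLgraphUsingConnections(layers, connections);
```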
Figure 2. Schematic diagram of GoogleNet architecture.

2.2.2.1. Train network framework. The networks require input images of size 224 × 224 × 3 for GoogleNet and 227 × 227 × 3 for AlexNet, whereas the images in the dataset have varying sizes; we therefore used the imresize() function to resize the images to the required input size. The TL-based framework adopts ResNet-101 (2048 features) and GoogleNet (1000 features) using mammograms; after fusion, 3048 features were used for each image. The entire dataset was fed to the cross-validation (10-fold) stage, and the optimized model was used to determine the performance on the test instances in discriminating the healthy from the diseased subjects.
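A sketch of this resize-extract-fuse step is shown below, assuming MathWorks' pre-trained resnet101 and googlenet. The layer names 'pool5' (ResNet-101's 2048-dimensional global pooling output) and 'loss3-classifier' (GoogleNet's 1000-dimensional classifier) come from those models, and the variable names are illustrative:

```matlab
% Resize on the fly to 224x224x3 and extract deep features from both nets
augds = augmentedImageDatastore([224 224], imds);
netR  = resnet101;
netG  = googlenet;
featR = activations(netR, augds, 'pool5', 'OutputAs', 'rows');            % 2048-D
featG = activations(netG, augds, 'loss3-classifier', 'OutputAs', 'rows'); % 1000-D
fused = [featR, featG];   % 3048 features per image, fed to 10-fold CV
```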
2.2.2.2. Transfer learning (TL) approach. We applied the TL approach using the GoogleNet and AlexNet CNNs pre-trained on ImageNet, comprising inception, convolution, and fully connected layers. The fully connected layers require a fixed input image size, while the convolution layers can work with arbitrary input sizes. To avoid overfitting during training, the images are resized to 224 × 224 × 3 for GoogleNet and 227 × 227 × 3 for AlexNet. Moreover, for GoogleNet we modified the dimension of the last fully connected layer from 1000 to 2. The last fully connected layer was also completely re-initialized at random, while all other layers retained their weights from pre-training. The shallow layers capture general, low-level image features, while the deeper layers are high-level and task specific; thus the learning rate of the deeper layers should be larger than that of the shallow layers. The batch size was set to 20, with an initial learning rate of 10^-4 and a maximum of 6 epochs over 378 iterations.

Training an entire CNN from scratch can be cumbersome because a small dataset may cause overfitting. To tackle this kind of problem, a TL technique is employed. TL can solve a new problem using previously learned knowledge, extracting knowledge from source tasks and applying it to a target task via the concepts of task T and domain D.

Consider a domain D = {χ, P(X)} comprising a feature space χ and a marginal probability distribution P(X), where X = {x_1, x_2, ..., x_n} ∈ χ. For a domain D = {χ, P(X)}, a task T = {γ, f(·)} comprises a label space γ and an objective predictive function f(·) learned from the training data, which consists of pairs {x_i, y_i} with x_i ∈ χ and y_i ∈ γ; f(·) predicts the corresponding label f(x) of a new instance x. Given a source domain D_s with source task T_s and a target domain D_t with target task T_t, the TL approach aims to improve the learning of the target predictive function f_t(·) in D_t using the knowledge in D_s and T_s, where D_s ≠ D_t or T_s ≠ T_t [66].

Various approaches have been employed to apply TL to CNNs [67]. For a CNN previously trained on another task, say image classification on the ImageNet dataset [68], two approaches can be distinguished: (a) fine tuning, in which the network parameters are retrained by backpropagating the error through the whole network [69]; and (b) freezing layers, in which most of the transferred features remain unchanged during training on the new task. The first layer contains the most generic features, common to many problems, while subsequent layers progressively become more specific to the target dataset [70].

Applying the proper type of TL to a specific task requires several factors to be taken into consideration. The most important are the dataset size [71] and its similarity to the dataset used to train the original network [72], viz. ImageNet. When the new dataset is smaller than the original one, the freezing-layer approach is most feasible, because the low-level features remain relevant for the target dataset; a smaller dataset may lead to overfitting when fine-tuning is employed, which suggests reserving fine-tuning for cases where more data are available. The fine-tuning approach is also suitable when the available dataset differs from the original one.

2.2.2.3. Convolutional layer. The convolutional layer is the main building block of a CNN. In a basic CNN, the convolution filter is a generalized linear model (GLM) for the underlying local image patch; it works well at low levels of abstraction, where the instances of the latent concepts are linearly separable. The layer has learnable filters: 3D arrays of numerical values that are spatially smaller than the input. The width and height are fixed by design choice, while the depth matches the number of input channels, i.e. the number of 2D inputs to the layer. During the forward pass, these filters slide across the height and width of the input; the sliding operation translates mathematically into a dot product between the filter and the input at each position. The 2D output is called an activation map, and it is stacked along the depth dimension with the other activation maps to form the output volume. The spatial size of the output is controlled by zero-padding techniques. For convolutional layer l, the output of the ith filter, denoted y_i^l, with C_{l-1} feature maps in the previous layer, is expressed as:

y_i^l = s\left( \sum_{j=1}^{C_{l-1}} f_{i,j}^l * y_j^{l-1} + b^l \right)    (1)

For layer l, the bias vector is denoted by b^l, the ith filter of the convolution layer is denoted by f_{i,j}^l, which connects to the jth feature map of layer l-1, and the activation function is represented by s. A convolution operation is also employed during the backward pass, but the filters are flipped spatially along both the height and width axes. Using the backpropagation algorithm, the parameters f_{i,j}^l are updated and learned by the network. In this way, the network is capable of learning various types of filters, with specialized properties, to solve many kinds of tasks.
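As a toy numerical illustration of the sliding dot product in Eq. (1), assuming a single input channel, zero bias, unit stride, no padding, and an identity activation:

```matlab
% A single 3x3 filter sliding over a 5x5 input ('valid' = no zero padding);
% conv2 flips the kernel, so pre-flipping it yields the cross-correlation
% that CNN convolutional layers actually compute
x = magic(5);                          % toy 5x5 input map
f = [1 0 -1; 1 0 -1; 1 0 -1];          % vertical-edge filter
y = conv2(x, rot90(f, 2), 'valid');    % 3x3 activation map (Eq. (1), s = identity)
```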
2.2.2.4. Pooling layer. The convolution layer is followed by the pooling layer, whose major function is to reduce the spatial size of the input and to operate independently on every depth slice. This layer is nonparametric and consists of filters that slide over the input with a fixed stride to produce the output [32,73]. The common filter functions are Max Pooling and Average Pooling.

2.2.2.5. Fully connected layer. To convert the combined features into class scores, at least one fully connected (FC) layer is present in a CNN before the network output. In this layer, each neuron is connected to all neurons of the preceding layer, following a mesh topology. The main function of this layer is to learn parameters (biases and weights) that map the input layer to the corresponding output layer. The output y^l of FC layer l is computed as:

y^l = s\left( y^{l-1} W^l + b^l \right)    (2)

where W^l and b^l denote the weights and bias vector of layer l, and s represents the activation function. FC layers, contrary to convolution layers, do not support parameter sharing; because of this, the number of learnable parameters of the CNN increases substantially.

2.2.2.6. Activation function. The nonlinearity that lets the network learn more complex functions is provided by the activation function. In the DL framework, the nonlinear transformation from input to output is performed by the activation functions of the nonlinear layers in combination with the other layers [74,75]. An appropriate activation function is therefore required for a better feature-extracting strategy [33,76,77]. A brief overview of the most commonly used activation functions g(·) follows.

The sigmoid function is given by g(a) = 1/(1 + e^{-a}), where a denotes the input from the preceding layer. The sigmoid transforms its input to values in the range 0 to 1 and is commonly used to produce a Bernoulli distribution:

\tilde{g} = \begin{cases} 0, & \text{if } g(a) \le 0.5 \\ 1, & \text{if } g(a) > 0.5 \end{cases}    (3)

The hyperbolic tangent function is given by g(a) = tanh(a) = (e^a - e^{-a})/(e^a + e^{-a}); its derivative, g' = 1 - g^2, makes it convenient to work with backpropagation algorithms.

The Softmax function is given by g(a)_i = e^{a_i} / \sum_j e^{a_j}. It is commonly used as the final output layer and can be considered a probability distribution over the categories.
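These activations are one-liners in MATLAB; in the sketch below, the max-subtraction inside the softmax is a standard numerical-stability trick rather than part of the definition above:

```matlab
sigmoid   = @(a) 1 ./ (1 + exp(-a));                       % squashes to (0, 1)
% tanh(a) is built in; its derivative is 1 - tanh(a).^2
softmaxFn = @(a) exp(a - max(a)) ./ sum(exp(a - max(a)));  % outputs sum to 1
p = softmaxFn([2.0 0.5 -1.0]);                             % p ≈ [0.786 0.175 0.039]
```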
The Rectified Linear Unit (ReLU) is the most widely used activation function, given by g(a) = max(0, a). Under gradient-based algorithms, ReLU confers the optimization-friendly properties of linear models; it is easy to implement and greatly accelerates the convergence of optimization methods [32,73]. Superior performance has been shown using this activation function and its variants, and it is so far the most popular activation function in DL [77-80]. The gradient diffusion problem can also be alleviated using the ReLU function [74,81,82].

The Softplus function, a variant of ReLU, is given by g(a) = log(1 + e^a); it is a smooth approximation of ReLU.

The absolute value rectification function, g(a) = |a|, is used when taking average values in the pooling layers of CNNs [81], since it prevents negative and positive features from cancelling out.

The Maxout function is given by g(x) = max_i(b_i + w_i · x). In this case, the weight matrix is a three-dimensional array whose third dimension corresponds to the connections between neighboring layers [75].

2.2.2.7. Optimization objective. The objective function is composed of a loss function and a regularization term. The loss function measures the discrepancy between the output of the network f(x|θ) and the expected result y, for example the true class labels in classification tasks and the true values in regression tasks. Regularization is the strategy of reducing the test error so that the learning algorithm performs well not only on the training data but also on unseen data [74,75]; to prevent overly complex models, regularization terms apply penalties to the parameters. Denoting the loss function by L(f(θ), y) and the regularization term by Ω(θ), the optimization objective is defined as:

\tilde{L}(X, y, \theta) = L(f(\theta), y) + \alpha \, \Omega(\theta)    (4)

where α balances the two components. Pragmatically, the loss function is usually computed across randomly sampled training examples rather than the data-generating distribution, because the latter is unknown.

2.2.2.8. Loss function. Most networks use the cross entropy between the model distribution and the training data as the loss function. The commonly used cross entropy is the negative conditional log-likelihood, L(f(θ), y) = -log P(y|x, θ), which represents the family of loss functions corresponding to the distribution of y given the value of the input variable x. Consider the following commonly used loss functions. Suppose y is continuous and has a Gaussian distribution given x. The loss function is:

L(f(\theta), y) = -\log \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} (y - f)^2 \right) \right]    (5)

= \frac{1}{2\sigma^2} (y - f)^2 + \frac{1}{2} \log(2\pi\sigma^2)    (6)

This is equivalent to the squared error, which was the most commonly used loss function in the 1980s [74,75]; however, it penalizes outliers excessively, leading to slower convergence rates [83].
If the output variable y follows a Bernoulli distribution, the loss function is:

L(f(\theta), y) = -y \log f(\theta) - (1 - y) \log(1 - f(\theta))    (7)

Where y is discrete with k possible values, e.g. y ∈ {1, 2, 3, ..., k}, we can use the Softmax value as the probability over the categories, and the loss function becomes:

L(f(\theta), y) = -\log \frac{e^{a_y}}{\sum_j e^{a_j}}    (8)

= -a_y + \log \left( \sum_j e^{a_j} \right)    (9)

2.2.2.9. Regularization term. For regularization, the L2 penalty is commonly used; it contributes to the convexity of the optimization objective, driving convergence toward the minimum of the solution, as can be analyzed via the Hessian matrix [66,84]. The L2 regularization term is defined as:

\Omega(\theta) = \frac{1}{2} \|\omega\|^2    (10)

where ω denotes the weights connecting the units of the network.

2.2.3. Performance evaluation parameters

Breast cancer and normal subjects are classified using ML classifiers, and performance is measured by computing sensitivity, specificity, PPV, NPV, and total accuracy.

2.2.3.1. Sensitivity. Sensitivity measures the proportion of people who test positive for the disease among those who actually have the disease. Mathematically:

Sensitivity = TP / (TP + FN)    (11)

2.2.3.2. Specificity. Specificity measures the proportion of negatives that are correctly identified. Mathematically:

Specificity = TN / (TN + FP)    (12)

2.2.3.3. Positive predictive value (PPV). It is expressed mathematically as:

PPV = TP / (TP + FP)    (13)

2.2.3.4. Negative predictive value (NPV). It is expressed mathematically as:

NPV = TN / (TN + FN)    (14)

2.2.3.5. Total accuracy (TA). The total accuracy is computed as:

TA = (TP + TN) / (TP + FP + FN + TN)    (15)
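Given a 2 × 2 confusion matrix, Eqs. (11)-(15) reduce to a few lines; a sketch with illustrative variable names, assuming numeric labels with 1 = cancer (positive) and 0 = normal:

```matlab
% Tally TP, FN, FP, TN from true vs. predicted labels
cm = confusionmat(trueLabels, predLabels, 'Order', [1 0]);
TP = cm(1,1); FN = cm(1,2); FP = cm(2,1); TN = cm(2,2);
sens = TP / (TP + FN);                   % Eq. (11)
spec = TN / (TN + FP);                   % Eq. (12)
ppv  = TP / (TP + FP);                   % Eq. (13)
npv  = TN / (TN + FN);                   % Eq. (14)
ta   = (TP + TN) / (TP + FP + FN + TN);  % Eq. (15)
```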
2.2.4. Training/testing data formulation

The jack-knife k-fold cross-validation (CV) technique was applied for training/testing data formulation and parameter optimization. In this research, 2-, 4-, 5-, and 10-fold CVs were used to evaluate the performance of the classifiers for the different feature-extraction strategies. The highest performance was obtained using 10-fold CV, where the data are divided into 10 folds: nine folds participate in training, and the classes of the samples in the remaining fold are predicted from the model trained on those nine folds. The test samples in the test fold are entirely unseen by the trained models. The whole process is repeated 10 times, so that the class of each sample is predicted once; a similar approach is applied to the other CVs. Finally, the predicted labels of the unseen samples are used to determine the classification accuracy.

2.2.5. Receiver operating characteristic (ROC) curve

The ROC curve plots the true positive rate (TPR), i.e. sensitivity, against the false positive rate (FPR), i.e. 1 - specificity, for the cancer and normal subjects. The mean feature values of cancer subjects are labeled 1 and those of normal subjects are labeled 0; this vector is passed to the ROC function, which plots each sample value against the specificity and sensitivity values. The ROC curve is one of the standard ways to visualize and measure the performance of a classifier [85]. The TPR is plotted on the y-axis and the FPR on the x-axis. The area under the curve (AUC) represents a fraction of a unit square, so its value lies between 0 and 1; an AUC of 0.5 indicates chance-level separation, and a higher AUC indicates a better diagnostic system. The TPR is the number of correctly predicted positive cases divided by the total number of positive cases, while the FPR is the number of negative cases predicted as positive divided by the total number of negative cases.

3. Results

In this research, we employed DL CNN models using a TL approach to detect breast cancer. We also extracted multimodal features (texture, morphological, SIFT, EFDs, and entropy) from the mammograms and applied ML classifiers such as the Bayesian approach, Support Vector Machine (SVM) kernels (Polynomial, RBF, and Gaussian), and Decision Tree. Using the TL approach, we trained the GoogleNet and AlexNet pre-trained models with 500 breast cancer and 399 normal mammograms; the features were then extracted using the Softmax layer. The performance was evaluated in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), total accuracy (TA), false positive rate (FPR), and area under the receiver operating characteristic curve (AUC), as reflected in Table 2 and Figures 3-6. For the ML methods, the stages of pre-processing, feature extraction, training/test data formulation, and classification of images into normal and cancer/malignant classes using SVM, Decision Tree, and Bayesian classifiers were employed, as detailed in [50]. The texture, morphological, entropy, SIFT, and EFDs features were extracted as discussed in [31,86,87]. In the DL TL approaches, we resized the images according to the network requirements and then trained the GoogleNet and AlexNet pre-trained models on the new set of cancer images.
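A sketch of the 10-fold split and the ROC/AUC computation of Sections 2.2.4-2.2.5, assuming numeric labels (1 = cancer, 0 = normal), the fused feature matrix from Section 2.2.2.1, and an SVM as a stand-in classifier:

```matlab
cvp = cvpartition(labels, 'KFold', 10);      % stratified 10-fold partition
scores = zeros(numel(labels), 1);
for k = 1:10
    trIdx = training(cvp, k);  teIdx = test(cvp, k);
    mdl = fitcsvm(fused(trIdx,:), labels(trIdx));  % e.g. an SVM on fused features
    [~, s] = predict(mdl, fused(teIdx,:));
    scores(teIdx) = s(:, 2);                 % score column for the positive class
end
[fpr, tpr, ~, auc] = perfcurve(labels, scores, 1);
plot(fpr, tpr); xlabel('FPR (1 - specificity)'); ylabel('TPR (sensitivity)');
```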
Figure 3. Transfer learning-based proposed framework for detection of masses and microcalcifications using mammographic images.

Figure 4. Performance evaluation using ML and DL methods.

Using the ML classifiers, with Naïve Bayes the highest total accuracy (TA) was obtained with the SIFT features (TA = 57.54%), followed by entropy (TA = 56.06%) and texture, morphological, and EFDs (TA = 55.84%); the other performance metrics for the Bayes classifier are reflected in Table 2. Using the SVM polynomial classifier, the highest performance was obtained with the texture features (TA = 82.65%), followed by morphological and entropy (TA = 82.42%), EFDs (TA = 77.42%), and SIFT (TA = 67.49%).
Figure 5. Performance measure in the form of AUC using ML methods with (a) entropy features and (b) texture features, and DL methods (c) AlexNet and (d) GoogleNet.

The SVM RBF gives the highest performance with entropy (TA = 85.21%), followed by morphological (TA = 84.20%), texture (TA = 83.98%), SIFT (TA = 73.68%), and EFDs (TA = 72.75%). Moreover, using SVM Gaussian, the highest performance was obtained with entropy (TA = 84.87%), followed by morphological (TA = 83.43%), texture (TA = 83.31%), SIFT (TA = 74.39%), and EFDs (TA = 73.75%). The ML Decision Tree classifier gives the highest performance with entropy (TA = 85.65%), followed by morphological (TA = 84.87%), SIFT (TA = 74.04%), texture (TA = 55.17%), and EFDs (TA = 47.16%). Using the DL-CNN models, the highest performance was obtained using GoogleNet with default parameters and AlexNet with optimized parameters (TA = 99.26%), followed by AlexNet with default parameters (TA = 98.89%) and GoogleNet with optimized parameters (TA = 98.15%). The other performance metrics in terms of sensitivity, specificity, PPV, NPV, FPR, and AUC are reflected in Table 2.

Figure 4 depicts the evaluation performance of the ML classifiers and CNN methods in detecting breast cancer. For ML, different features were extracted (texture, morphological, entropy, SIFT, and EFDs), and the best-performing combinations are compared with the CNN methods. Using the Bayes classifier, the SIFT features performed best, with sensitivity (57.54%), specificity (43.81%), PPV (75.68%), NPV (81.62%), TA (57.54%), FPR (0.5619), and AUC (0.5088). Using the SVM polynomial kernel, the texture features give the highest performance, with sensitivity (82.65%), specificity (82.46%), TA (82.65%), and AUC (0.5045).
Figure 6. Performance evaluation using GoogleNet with initial parameters and 378 iterations.

SVM RBF gives the highest performance using the entropy features, obtaining sensitivity (85.21%), specificity (83.59%), TA (85.21%), and AUC (0.8857). Likewise, SVM Gaussian with entropy features gives the highest performance, with sensitivity (84.87%), specificity (83.07%), TA (84.87%), and AUC (0.8779). The Decision Tree classifier gives the highest performance using entropy features, with sensitivity (85.65%), specificity (84.75%), TA (85.65%), and AUC (0.9173). The performance of the CNN methods was evaluated using GoogleNet and AlexNet with default and optimized parameters. DL GoogleNet with default parameters gives sensitivity (99.26%), specificity (99.24%), PPV (99.26%), TA (99.26%), FPR (0.0076), and AUC (0.9998). GoogleNet with optimized parameters gives sensitivity (98.15%), specificity (98.19%), PPV (98.15%), NPV (98.03%), TA (98.15%), FPR (0.0181), and AUC (0.9983). Similarly, the DL CNN AlexNet method with default (auto) parameters gives sensitivity (98.89%), specificity (98.94%), PPV (98.89%), NPV (98.78%), TA (98.89%), FPR (0.0106), and AUC (0.9981). Moreover, AlexNet with optimized parameters gives sensitivity (99.26%), specificity (99.07%), PPV (99.27%), NPV (99.42%), TA (99.26%), FPR (0.0093), and AUC (0.9996).

Figure 5 depicts the performance, in terms of AUC, in separating breast cancer subjects from normal subjects using the best-performing ML classifier/feature combinations and the CNN methods. Using entropy features, the highest separation was obtained using the Decision Tree (AUC = 0.9173), followed by SVM RBF (AUC = 0.8857), SVM Gaussian (AUC = 0.8779), and Naïve Bayes and SVM polynomial (AUC = 0.507). Similarly, with texture features, the highest separation was obtained using SVM RBF (AUC = 0.8968), followed by SVM Gaussian (AUC = 0.8918), Decision Tree (AUC = 0.6878), and Naïve Bayes and SVM polynomial (AUC = 0.5045), as reflected in Figure 5(a-b). The performance in terms of AUC using DL GoogleNet with default parameters was (AUC = 0.9998) and AlexNet with an optimized set of parameters (AUC = 0.9996).
Table 2. Performance evaluation based on different extracted features using ML classifiers and TL approaches using DL methods.

Classifier / Features   Sensitivity  Specificity  PPV     NPV     TA      FPR     AUC
Bayes
  Texture               0.5584       0.4466       0.7538  0.8036  0.5584  0.5534  0.5045
  Morphological         0.5584       0.4466       0.7538  0.8036  0.5584  0.5534  0.5045
  SIFT                  0.5754       0.4381       0.7568  0.8162  0.5754  0.5619  0.5088
  EFDs                  0.5584       0.4466       0.7538  0.8036  0.5584  0.5534  0.5045
  Entropy               0.5606       0.4494       0.7545  0.8041  0.5606  0.5506  0.5070
SVM polynomial
  Texture               0.8265       0.8246       0.8271  0.8210  0.8265  0.1754  0.5045
  Morphological         0.8242       0.8213       0.8246  0.8191  0.8242  0.1787  0.5045
  SIFT                  0.6749       0.6547       0.6729  0.6624  0.6749  0.3453  0.5088
  EFDs                  0.7742       0.7712       0.7750  0.7677  0.7742  0.2288  0.5045
  Entropy               0.8242       0.8213       0.8246  0.8191  0.8242  0.1787  0.5070
SVM RBF
  Texture               0.8398       0.8317       0.8396  0.8383  0.8398  0.1683  0.8968
  Morphological         0.8420       0.8375       0.8420  0.8383  0.8420  0.1625  0.9069
  SIFT                  0.7368       0.7248       0.7364  0.7268  0.7368  0.2752  0.7948
  EFDs                  0.7275       0.6929       0.7343  0.7406  0.7275  0.3071  0.7940
  Entropy               0.8521       0.8359       0.8546  0.8597  0.8521  0.1641  0.8857
SVM Gaussian
  Texture               0.8331       0.8269       0.8329  0.8300  0.8331  0.1731  0.8918
  Morphological         0.8343       0.8318       0.8346  0.8292  0.8343  0.1682  0.9109
  SIFT                  0.7439       0.7274       0.7427  0.7356  0.7439  0.2726  0.7990
  EFDs                  0.7375       0.7055       0.7433  0.7492  0.7375  0.2945  0.7945
  Entropy               0.8487       0.8307       0.8522  0.8585  0.8487  0.1693  0.8779
Decision tree
  Texture               0.5517       0.6013       0.6028  0.5814  0.5517  0.3987  0.6878
  Morphological         0.8487       0.8443       0.8487  0.8451  0.8487  0.1557  0.9117
  SIFT                  0.7404       0.7235       0.7391  0.7320  0.7404  0.2765  0.8039
  EFDs                  0.4716       0.5566       0.5412  0.5232  0.4716  0.4434  0.5175
  Entropy               0.8565       0.8475       0.8565  0.8567  0.8565  0.1525  0.9173
DL
  GoogleNet AutoP       0.9926       0.9924       0.9926  0.9924  0.9926  0.0076  0.9998
  GoogleNet DiffP       0.9815       0.9819       0.9815  0.9803  0.9815  0.0181  0.9983
  AlexNet AutoP         0.9889       0.9894       0.9889  0.9878  0.9889  0.0106  0.9981
  AlexNet DiffP         0.9926       0.9907       0.9927  0.9942  0.9926  0.0093  0.9996

Legend: AutoP (auto/default parameters), DiffP (different/optimized parameters).

Figure 6 depicts the performance of GoogleNet with default parameters over 6 epochs and 378 iterations. For both training and validation, the accuracy was lower in the 1st and 2nd epochs, with correspondingly higher loss; the accuracy increases over later iterations and epochs as the loss decreases. After the 2nd epoch, the accuracy holds nearly steady close to 100%, with a loss below 0.3, as can be observed in Figure 6. Figure 6(a-b) shows the loss and accuracy over the iterations obtained using GoogleNet: in the initial iterations, the mini-batch and validation loss values were high and then decreased at later iterations. As shown in Figure 7(a), the mini-batch loss at selected iterations using GoogleNet was 0.8184 (1st iteration), 0.2545 (10th), 0.2597 (20th), 0.2059 (45th), and 0.0422 (55th).
Figure 7. Performance measure using GoogleNet: (a) loss, (b) accuracy.

Similarly, the validation loss at selected iterations was 0.7308 (1st iteration), 0.3007 (10th), 0.1162 (20th), 0.0685 (45th), and 0.0669 (55th). Moreover, the accuracy at selected iterations using GoogleNet is reflected in Figure 7(b): the validation accuracy was 40% (1st iteration), 90% (10th), 85% (20th), 90% (45th), and 100% (55th). Similarly, the mini-batch accuracy was 36.30% (1st iteration), 84.44% (10th), 96.60% (20th), and 98.15% (45th and 55th).
4. Discussion

A CNN applies a convolution operation in its convolutional layers, which serve as detection filters for the presence of particular features or patterns in the original data. Instead of being assigned a priori, as in conventional image processing, the parameters of such filters are learned from training data and become specialized to the problem at hand. Consequently, the lower layers of a CNN detect features that are common to most image recognition tasks, such as edges and curves [67]. Convolutional Neural Networks (CNNs) have had the greatest impact within the field of health informatics. Their architecture can be defined as an interleaved set of feed-forward layers implementing convolutional filters followed by reduction, rectification, or pooling layers, with each layer in the network originating a higher-level abstract feature [88]. The weights in a CNN are shared in such a way that the network performs convolution operations on images; the model therefore does not need to learn separate detectors for the same object occurring at different positions in an image, making the network equivariant with respect to translations of the input. This also drastically reduces the number of parameters to be learned (the number of weights no longer depends on the size of the input image).

In DL, the first CNN to win the ILSVRC, which also made CNNs very popular, was the AlexNet architecture [33]. This architecture comprises five convolutional layers, max-pooling layers, dropout layers, and three fully connected layers, and it employs ReLU as its activation function. It obtained a top-5 error rate of 15.6%, i.e. the rate at which the correct class is missing from the five most probable predictions. AlexNet was improved the following year by its authors, who modified its parameters and achieved a top-5 error rate of 11.2% [59]. In 2014, VGGNet [89], though it did not win the competition, showed that it is possible to reduce the number of parameters while increasing the depth of the network, achieving better performance than the architectures mentioned above, with an error rate of 7.3%. This architecture is composed of more convolutional layers than AlexNet (13 exactly), which are smaller in terms of filter dimensions, leading to fewer parameters while learning more high-level features than previous CNNs.

Another essential architecture, the winner of the ILSVRC 2014 with an error rate of 6.7%, is GoogleNet [59,89]. It changed the way CNN architectures are structured, which previously stacked single layers one upon another sequentially, by introducing the inception module. The architecture is modularized, and its main building block, the inception module, is composed of convolutional layers arranged in parallel. GoogleNet has 122 layers, not all sequential as in AlexNet: parts of the network execute in parallel, mainly within its inception modules. Each of its nine inception modules is a network within the network, leading to over 100 layers in total. GoogleNet was trained on 'a few high-end GPUs within a week' [14].

In the present study, we first extracted hand-crafted features and fed them to different traditional machine learning (ML) algorithms.
For the ML techniques, different features (texture, morphological, entropy-based, SIFT, and EFDs) were extracted from the breast cancer mammograms. In the second phase, CNN methods utilizing a TL approach were employed, in which the GoogleNet and AlexNet pre-trained models were fine-tuned.
Deep learning methods are more robust when the data volume is large. Moreover, deep learning models subsume the feature-engineering process, extracting high-level characteristics directly from the data rather than relying on domain knowledge; this capability decreases the effort and time needed to construct a feature extractor for each problem. GoogleNet was retrained on the new set of cancer images. The weights of the earlier layers in the network were frozen by setting the learning rate to zero; the parameters were thus not updated while the training layers were frozen, which helped to improve the network performance significantly and also helped to avoid overfitting. For each model, we used both default and optimized parameters when evaluating the performance. A deep learning model with a transfer learning approach effectively reuses previously learned knowledge to solve the new task with fine-tuning or minimal training; this deep transfer learning (DTL) approach also helps to address the computational issues.

Applying the traditional machine learning algorithms to the hand-crafted features, Naïve Bayes yielded its highest accuracy (57.54%) with the SIFT features, SVM polynomial its highest accuracy (82.65%) with the texture features, SVM RBF an accuracy of 85.21% with the entropy features, SVM Gaussian an accuracy of 84.87% with the entropy features, and the Decision Tree an accuracy of 85.65% with the entropy features. The deep learning models with the transfer learning approach improved the classification performance: GoogleNet with default parameters yielded accuracy (99.26%) and AUC (0.9998), and AlexNet with optimized parameters yielded accuracy (99.26%) and AUC (0.9996).

5. Conclusion

In this research, CNN models were employed and the results compared with ML classification techniques such as SVM kernels, the Bayesian approach, and Decision Tree to distinguish cancer mammograms from those of normal subjects. Mass detection is difficult due to low image contrast, and microcalcification detection due to the large variation in size and shape, so multimodal features were extracted to distinguish the cancer mammograms effectively. We extracted texture, morphological, entropy-based, SIFT, and EFDs features for training and validating the ML classifiers. A 10-fold cross-validation was used to train and test the image database. The performance was measured in terms of specificity, sensitivity, PPV, NPV, FPR, and AUC. The CNN GoogleNet with default parameters and AlexNet with optimized parameters give the highest performance (TA and sensitivity = 99.26%; AUC = 0.9998 and 0.9996, respectively), followed by the Decision Tree (TA = 85.65%, AUC = 0.9173) and SVM RBF (TA = 85.21%, AUC = 0.8857). Among the ML classifiers, the entropy-based features give the highest performance evaluation measures of all the features extracted from the breast cancer mammograms. The detection performance of the deep learning methods with the transfer learning approach improved upon the traditional machine learning algorithms owing to their dynamic feature-engineering characteristics. The proposed approach is thus more robust for improving the detection of breast cancer in mammograms and improving healthcare systems.

5.1. Limitations and future directions

The present study focused on applying machine learning methods with diverse hand-crafted feature-based approaches alongside deep learning methods.
Researchers are still working on multiple aspects of feature-extraction strategies to improve classification performance using deep learning algorithms.
In this context, a light deep learning architecture will be utilized, using a minimal number of layers for optimized MRI scans, with empirically controlled unknown parameters generating dynamic features. Similarly, attention mechanisms, which are increasingly used to focus on the important regions of an image by increasing the weight of informative locations, will be used to compensate for the loss of spatial information while improving the feature information. Another future direction is to collect a primary dataset for better BC control, containing the clinical parameters and demographic profiles of the patients as well as pathological control response, survival, and progression. We will also utilize hybrid deep learning methods and parametric optimization using grid search, Bayesian optimization, and genetic algorithms to further improve the classification performance.

Acknowledgement

This study is supported via funding from Prince Sattam Bin Abdulaziz University, project number PSAU/2023/R/1444. The authors would also like to thank the Deanship of Scientific Research at Shaqra University for its support.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

[1] Forouzanfar MH, et al. Breast and cervical cancer in 187 countries between 1980 and 2010: a systematic analysis. Lancet. 2011;378(9801):1461–1484.
[2] Jemal A, et al. Global cancer statistics. CA Cancer J Clin. 2011;61(2):69–90.
[3] Dheeba J, Singh NA, Selvi ST. Computer-aided detection of breast cancer on mammograms: a swarm intelligence optimized wavelet neural network approach. J Biomed Inform. 2014;49:45–52.
[4] DeSantis CE, et al. Breast cancer statistics, 2015: convergence of incidence rates between black and white women. CA Cancer J Clin. 2016;66(1):31–42.
[5] Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. 2019;69(1):7–34.
[6] Wirén S, et al. Pooled cohort study on height and risk of cancer and cancer death. Cancer Causes Contr. 2014;25(2):151–159.
[7] Walter RB, et al. Height as an explanatory factor for sex differences in human cancer. J Natl Cancer Inst. 2013;105(12):860–868.
[8] Ardakani AA, Gharbali A, Mohammadi A. Classification of breast tumors using sonographic texture analysis. J Ultrasound Med. 2015;34(2):225–231.
[9] Sprague BL, et al. Variation in mammographic breast density assessments among radiologists in clinical practice: a multicenter observational study. Ann Intern Med. 2016;165(7):457–464.
[10] Freer PE. Mammographic breast density: impact on breast cancer risk and implications for screening. Radiographics. 2015;35(2):302–315.
[11] Acharya UR, et al. Data mining framework for breast lesion classification in shear wave ultrasound: a hybrid feature paradigm. Biomed Signal Process Contr. 2017;33:400–410.
[12] Zhang L, et al. Identifying ultrasound and clinical features of breast cancer molecular subtypes by ensemble decision. Sci Rep. 2015;5(1):1–14.
[13] Sathish D, et al. Medical imaging techniques and computer aided diagnostic approaches for the detection of breast cancer with an emphasis on thermography: a review. Int J Med Eng Inform. 2016;8(3):275–299.
[14] Machida Y, et al. Single focus on breast magnetic resonance imaging: diagnosis based on kinetic pattern and patient age. Acta Radiol. 2017;58(6):652–659.
[15] Kolb TM, Lichy J, Newhouse JH. Comparison of the performance of screening mammography, physical examination, and breast US and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology. 2002;225(1):165–175.
[16] Cheng H-D, et al. Approaches for automated detection and classification of masses in mammograms. Pattern Recognit. 2006;39(4):646–668.
[17] Skaane P, Engedal K. Analysis of sonographic features in the differentiation of fibroadenoma and invasive ductal carcinoma. Am J Roentgenol. 1998;170(1):109–114.
[18] Doi K. Computer-aided diagnosis: potential usefulness in diagnostic radiology and telemedicine. In Proceedings of the National Forum: Military Telemedicine On-Line Today Research, Practice, and Opportunities. 1995. IEEE.
[19] Hussain L, et al. Spatial wavelet-based coherence and coupling in EEG signals with eye open and closed during resting state. IEEE Access. 2018;6:37003–37022.
[20] Hussain L, et al. Arrhythmia detection by extracting hybrid features based on refined Fuzzy entropy (FuzEn) approach and employing machine learning techniques. Waves Random Complex Media. 2020;30(4):656–686.
[21] Hussain L, et al. Regression analysis for detecting epileptic seizure with different feature extracting strategies. Biomed Eng Biomed Tech. 2019;64(6):619–642.
[22] Karahaliou AN, et al. Breast cancer diagnosis: analyzing texture of tissue surrounding microcalcifications. IEEE Trans Inf Technol Biomed. 2008;12(6):731–738.
[23] Kupinski MA, Giger ML. Automated seeded lesion segmentation on digital mammograms. IEEE Trans Med Imaging. 1998;17(4):510–517.
[24] Sahiner B, et al. Improvement of mammographic mass characterization using spiculation measures and morphological features. Med Phys. 2001;28(7):1455–1465.
[25] Zhen L, Chan AK. An artificial intelligent algorithm for tumor detection in screening mammogram. IEEE Trans Med Imaging. 2001;20(7):559–567.
[26] Caldwell CB, et al. Characterisation of mammographic parenchymal pattern by fractal dimension. Phys Med Biol. 1990;35(2):235.
[27] Li H, Liu KR, Lo S-C. Fractal modeling and segmentation for the enhancement of microcalcifications in digital mammograms. IEEE Trans Med Imaging. 1997;16(6):785–798.
[28] Chen D-R, et al. Classification of breast ultrasound images using fractal feature. Clin Imaging. 2005;29(4):235–245.
[29] Hussain L, et al. Applying Bayesian network approach to determine the association between morphological features extracted from prostate cancer images. IEEE Access. 2018;7:1586–1601.
[30] Qureshi SA, et al. Intelligent ultra-light deep learning model for multi-class brain tumor detection. Appl Sci. 2022;12(8):3715.
[31] Hussain L, et al. Prostate cancer detection using machine learning techniques by employing combination of features extracting strategies. Cancer Biomark. 2018;21(2):393–413.
[32] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444.
[33] Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90.
[34] Delphia AA, Kamarasan M, Sathiamoorthy S. Image processing for identification of breast cancer: a literature survey. Asian J Electr Sci. 2018;7(2):28–37.
[35] Kupinski MA, et al. Ideal observer approximation using Bayesian classification neural networks. IEEE Trans Med Imaging. 2001;20(9):886–899.
[36] Lyons L. Statistical problems in particle physics, astrophysics and cosmology: PHYSTAT05, Oxford, UK, 12–15 September 2005. Imperial College Press; 2006.
[37] Specht DF. Probabilistic neural networks. Neural Netw. 1990;3(1):109–118.
[38] Hamad YA, Simonov K, Naeem MB. Breast cancer detection and classification using artificial neural networks. In 2018 1st Annual International Conference on Information and Sciences (AiCIS). 2018. IEEE.
[39] Zheng B, Qian W, Clarke LP. Digital mammography: mixed feature neural network with spectral entropy decision for detection of microcalcifications. IEEE Trans Med Imaging. 1996;15(5):589–597.
[40] Nahid A-A, Kong Y. Involvement of machine learning for breast cancer image classification: a survey. Comput Math Methods Med. 2017;2017:3781951.
[41] Bhandare A, et al. Applications of convolutional neural networks. Int J Comp Sci Inform Technol. 2016;7(5):2206–2215.
[42] Lo S-CB, et al. A multiple circular path convolution neural network system for detection of mammographic masses. IEEE Trans Med Imaging. 2002;21(2):150–158.
[43] Sahiner B, et al. Classification of mass and normal breast tissue: a convolution neural network classifier with spatial domain and texture images. IEEE Trans Med Imaging. 1996;15(5):598–610.
[44] Jiao Z, et al. A deep feature based framework for breast masses classification. Neurocomputing. 2016;197:221–231.
[45] Fonseca P, et al. Automatic breast density classification using a convolutional neural network architecture search procedure. In Medical Imaging 2015: Computer-Aided Diagnosis. 2015. SPIE.
[46] Su H, et al. Region segmentation in histopathological breast cancer images using deep convolutional neural network. In 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI). 2015. IEEE.
[47] Huynh BQ, Li H, Giger ML. Digital mammographic tumor classification using transfer learning from deep convolutional neural networks. J Med Imaging. 2016;3(3):034501.
[48] Arevalo J, et al. Representation learning for mammography mass lesion classification with convolutional neural networks. Comput Methods Programs Biomed. 2016;127:248–257.
[49] Rezaeilouyeh H, Mollahosseini A, Mahoor MH. Microscopic medical image classification framework via deep learning and Shearlet transform. J Med Imaging. 2016;3(4):044501.
[50] Jaffar MA. Deep learning based computer aided diagnosis system for breast mammograms. Int J Adv Comp Sci Appl. 2017;8:7.
[51] Jadoon MM, et al. Three-class mammogram classification based on descriptive CNN features. BioMed Res Int. 2017;2017:3640901.
[52] Gastounioti A, et al. Using convolutional neural networks for enhanced capture of breast parenchymal complexity patterns associated with breast cancer risk. Acad Radiol. 2018;25(8):977–984.
[53] Wang H, et al. Breast mass classification via deeply integrating the contextual information from multi-view data. Pattern Recognit. 2018;80:42–52.
[54] Zhu W, et al. Adversarial deep structured nets for mass segmentation from mammograms. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). 2018. IEEE.
[55] Ribli D, et al. Detecting and classifying lesions in mammograms with deep learning. Sci Rep. 2018;8(1):1–7.
[56] Chiao J-Y, et al. Detection and classification of the breast tumors using mask R-CNN on sonograms. Medicine. 2019;98:19.
[57] Nahid A-A, Mehrabi MA, Kong Y. Histopathological breast cancer image classification by deep neural network techniques guided by local clustering. BioMed Res Int. 2018;2018:2362108.
[58] Szegedy C, et al. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[59] Shin H-C, et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging. 2016;35(5):1285–1298.
[60] Chen H, et al. Standard plane localization in fetal ultrasound via domain transferred deep neural networks. IEEE J Biomed Health Inform. 2015;19(5):1627–1636.
[61] Heath M, et al. Current status of the digital database for screening mammography. In: Karssemeijer N, Thijssen M, Hendriks J, et al., editors. Digital mammography. Dordrecht: Springer; 1998. p. 457–460.
[62] Lévy D, Jain A. Breast mass classification from mammograms using deep convolutional neural networks. arXiv preprint arXiv:1612.00542, 2016.
[63] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning. 2015. PMLR.
[64] Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 2010. JMLR Workshop and Conference Proceedings.
[65] Chen T, et al. Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN. Expert Syst Appl. 2017;72:221–230.
[66] Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–1359.
[67] Yosinski J, et al. How transferable are features in deep neural networks? Adv Neural Inf Process Syst. 2014;27:1792.
[68] Deng J, et al. Imagenet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009. IEEE.
[69] Zhu Z, et al. Extreme weather recognition using convolutional neural networks. In 2016 IEEE International Symposium on Multimedia (ISM). 2016. IEEE.
[70] Elhoseiny M, Huang S, Elgammal A. Weather classification with deep convolutional neural networks. In 2015 IEEE International Conference on Image Processing (ICIP). 2015. IEEE.
[71] Soekhoe D, Putten PVD, Plaat A. On the impact of data set size in transfer learning using deep neural networks. In International Symposium on Intelligent Data Analysis. 2016. Springer.
[72] Chu B, et al. Best practices for fine-tuning visual classifiers to new domains. In European Conference on Computer Vision. 2016. Springer.
[73] Kim KG. Book review: deep learning. Healthc Inform Res. 2016;22(4):351–354.
[74] Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 2011. JMLR Workshop and Conference Proceedings.
[75] Goodfellow I, Bengio Y, Courville A. Convolutional networks. In: Goodfellow I, Bengio Y, Courville A, editors. Deep learning. Cambridge: MIT Press; 2016. p. 330–372.
[76] Singh RG, Kishore N. The impact of transformation function on the classification ability of complex valued extreme learning machines. In 2013 International Conference on Control, Computing, Communication and Materials (ICCCCM). 2013. IEEE.
[77] Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Montavon G, Orr GB, Müller KR, editors. Neural networks: tricks of the trade. Berlin, Heidelberg: Springer; 2012. p. 437–478.
[78] Maas AL, Hannun AY, Ng AY. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML. 2013. Atlanta, Georgia, USA.
[79] Tóth L. Phone recognition with deep sparse rectifier neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 2013. IEEE.
[80] Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. Haifa: ICML; 2010.
[81] Jarrett K, et al. What is the best multi-stage architecture for object recognition? In 2009 IEEE 12th International Conference on Computer Vision. 2009. IEEE.
[82] Lai M. Deep learning for medical image segmentation. arXiv preprint arXiv:1505.02000, 2015.
[83] Rosasco L, et al. Are loss functions all the same? Neural Comput. 2004;16(5):1063–1076.
[84] Boyd S, Boyd SP, Vandenberghe L. Convex optimization. Los Angeles: Cambridge University Press; 2004.
[85] Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med. 2013;4(2):627.
[86] Mishra S, Panda M. A histogram-based classification of image database using scale invariant features. Int J Image Graphics Signal Proc. 2017;9(6):55.
[87] Hussain L. Detecting epileptic seizure with different feature extracting strategies using robust machine learning classification techniques by applying advance parameter optimization approach. Cogn Neurodyn. 2018;12(3):271–294.
[88] Ravì D, et al. Deep learning for health informatics. IEEE J Biomed Health Inform. 2016;21(1):4–21.
[89] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.