EARLY DETECTION AND CLASSIFICATION OF
BREAST CANCER USING MAMMOGRAMS AND
MACHINE LEARNING ALGORITHMS
Mr. P. Narasimhaiah
Research Scholar
ROLL NO: 1012002203
Department of CSE
YSR Engineering College of Yogivemana University
Proddatur
Andhra Pradesh
SUPERVISOR
Prof. Dr. C. NAGARAJU
Department of CSE
YSR Engineering College of Yogivemana University
CONTENT
1. Introduction
1.1. Motivation
1.2. Tumor
1.3. Socio economic relevance of the Breast cancer
1.4. Mammography Database
2. Literature Survey
3. Research gaps
4. Objectives
5. Research Work
6. Objective 2 & 3
7. Objective 2, 3 & 4
8. Objectives 1, 2, 3, & 4
1. Introduction
1.1 Motivation
• Breast cancer comprises 16% of all cancers in women (WHO report).
• The number of breast cancer cases is higher in developed countries, while the incidence rates are low.
• In contrast, the incidence rates are high in developing nations, while the number of cases is currently relatively low.
• 1.15 million new cases.
• Incidence is increasing in most countries.
• 470,000 deaths.
• Breast cancer is also increasing in India. An estimated 1,00,000 to 1,25,000 new breast cancer cases occur in India every year.
• The breast cancer survival rate in India is 66%.
• The number of breast cancer cases in India is estimated to double by 2025.
• 75% of Indian women with breast cancer are below 50 years of age.
• Early detection of breast cancer can improve the chances of successful treatment and recovery.
1.2 About Tumours
A tumour is an abnormal mass of tissue that forms when cells grow and divide more than they should or do not die when they should.
A tumour is a mass or lump of tissue that may resemble swelling.
Tumour Types
There are three main types of tumour:
• Benign tumours are not cancerous. They either cannot spread or grow, or they do so very slowly. If a doctor removes them, they do not generally return.
• Premalignant tumours are not yet cancerous, but they have the potential to become malignant.
• Malignant tumours are cancerous. The cells can grow and spread to other parts of the body.
It is not always clear how a tumour will act in the future. Some benign tumours can become premalignant and then malignant. For this reason, it is best to monitor any growth.
1.3 Socio-economic relevance of breast cancer
• The survival rate of breast cancer is low in countries where the literacy rate is low.
• Diagnosis, treatment, and prognosis were poorer for breast cancer patients with low socio-economic status.
• Women with low socio-economic status were less likely to get breast screening.
• Women with high socio-economic status were more likely to be at risk of breast cancer.
• Racial differences in breast cancer survival are also large.
1.4 Mammography Database
• Digital Database for Screening Mammography (DDSM)
• INbreast Mammographic Database
• National Mammography Database (NMD)
• MIAS Mammography
• Chinese Mammography Database (CMD)
• LLNL/UCSF Database
• NIJMEGEN Database
• MammoGRID
• VinDr Mammogram Dataset
• Washington University Digital Mammography Database
2. Literature Survey
• Breast cancer is a kind of cancer in which cells in the breast tissue divide and grow in an abnormal manner.
• Breast cancer is the second leading cause of cancer deaths among women.
• Every two minutes a woman is diagnosed with breast cancer.
• The chance that breast cancer will be responsible for a woman's death is about 1 in 36 (about 3%).
• Treatment includes surgery, drugs (hormonal therapy and chemotherapy), and radiation.
Survey of Previous Works
1. Work: Application of Gabor wavelet and Locality Sensitive Discriminant Analysis for automated identification of breast cancer using digitized mammogram images
   Publisher: Elsevier
   Method: Analysis of Variance (ANOVA); Decision Tree (DT) classifier
   Results: An automated diagnosis of breast cancer with a huge sample size
   Future scope: The proposed work can be extended to diagnose other diseases like diabetes retinopathy, coronary artery disease, brain tumor, and tuberculosis.
2. Work: An Optimized Framework for Breast Cancer Classification Using Machine Learning
   Publisher: BioMed Research International, 2022
   Method: ML based on BOTPE (Bayesian Optimization with Tree Structured Parzen Estimator); Machine Learning Model-Based HPO (Hyperparameter Optimization)
   Results: The LightGBM classifier outperformed the other four classifiers in accuracy, precision, recall, and F1-score.
   Future scope: Reduce misclassification of breast lesions, resulting in a high false-positive rate.
3. Work: Machine learning models in breast cancer survival prediction
   Publisher: Technology and Health Care 24
   Method: Machine Learning
   Results: Prediction of breast cancer survival.
   Future scope: Feature importance can be estimated during training for little additional computation.
4. Work: Character-based Convolutional Grid Neural Network for Breast Cancer Classification
   Publisher: IEEE
   Method: CNN
   Results: It can achieve a state-of-the-art performance with minimal computational cost.
   Future scope: Using the CharCNN and residual layers for representation learning remains an avenue for future work.
5. Work: Computer-Aided Detection and Diagnosis of Breast Cancer With Mammography
   Publisher: IEEE Transactions
   Method: Development of CAD systems
   Results: Image enhancement
   Future scope: To develop more effective CAD systems for further image enhancement.
6. Work: Breast cancer detection by leveraging Machine Learning
   Publisher: Elsevier
   Method: Deep Neural Network with Support Value
   Results: To meet the better performance, efficiency, and quality of images
   Future scope: To meet the better performance, efficiency, and quality of images using Deep learning.
7. Work: A Multimodal Deep Neural Network for Human Breast Cancer Prognosis Prediction by Integrating Multi-Dimensional Data
   Publisher: IEEE/ACM TRANSACTIONS, 2019
   Method: MDNNMD Prediction Model for Multi-Dimension Data
   Results: MDNNMD achieves competitive or better performance than SVM, RF and LR.
   Future scope: MDNNMD still has some avenues for further investigation predicting survival time of breast cancer.
8. Work: Multi-task fusion for improving mammography screening data classification
   Publisher: IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021
   Method: Fusion prediction
   Results: Classifying images, model F reaches an AUC score of 0.921 on test data with TPR = 0.881 and specificity = 0.802
   Future scope: A full-field digital mammography dataset would be the logical next step and potentially boost performance when trained on a larger data resource.
9. Work: Meta-learning Based Breast Abnormality Classification on Screening Mammograms
   Publisher: ICCEA-2021
   Method: The idea of meta-learning, also known as "learn to learn"
   Results: The meta-learning-based Mass and Calcification classifier reached the accuracy of 78% and the contrast experiment using …
   Future scope: The possible future analysis on applying meta-learning on cross-category classification with medical images.
10. Work: Anomaly Detection of Calcifications in Mammography Based on 11,000 Negative Cases
   Publisher: IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 69, NO. 5, MAY 2022
   Method: Annotation-efficient, semi-supervised
   Results: Detect calcifications on digital mammograms; although trained on negative images, the performance is promising
   Future scope: Investigate more fundamental properties of anomaly detection that could potentially generalize beyond the present task to other areas of medical imaging.
11. Work: Real-Time Ultrasound Detection of Breast Microcalcifications Using Multifocus Twinkling Artifact Imaging.
   Publisher: IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 41, NO. 5, MAY 2022.
   Method: Real-time multifocus twinkling artifact (MF-TA).
   Results: Exploits time-varying TAs arising from acoustic random scattering on MCs with rough or irregular surfaces.
   Future scope: The biggest drawback of US is the low visibility of MCs, and it is a major limitation in screening for early breast cancer.
12. Work: Multidirectional Gabor Filter-Based Approach for Pectoral Muscle Boundary Detection.
   Publisher: IEEE TRANSACTIONS ON RADIATION AND PLASMA MEDICAL SCIENCES, VOL. 6, NO. 4, APRIL 2022.
   Method: Multidirectional Gabor filter (MDGF)-based approach for PMB detection.
   Results: It eliminates the need for straight-line estimation and is able to detect PMB with high accuracy even if it deviates highly from its usual straight-line estimate.
   Future scope: The performance of any intensity-based approach in detecting such fuzzy textural edges is poor.
13. Work: WDCCNet: Weighted Double-Classifier Constraint Neural Network for Mammographic Image Classification.
   Publisher: IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 41, NO. 3, MARCH 2022.
   Method: Double-classifier network architecture that constrains the extracted features.
   Results: Easily applied to an existing convolutional neural network to improve mammographic image classification performance.
   Future scope: Two classifiers' decision boundaries to constrain the distribution of feature vectors with small intra-class variance and large inter-class variance.
3. Research Gaps
1. Present research on breast cancer has reduced the mortality rate, but it has not eliminated the disease.
2. In present research, the accuracy of cancer detection is reduced by the presence of noise, uneven illumination, labels, and pectoral muscles.
3. Present breast cancer detection systems assist the radiologist, but not with high accuracy.
4. In present research, misclassification of breast cancer leads to high false-positive and false-negative rates.
5. To improve the performance of breast cancer detection, important features should be identified before training the model.
6. An effective CAD system is needed to enhance mammographic images.
7. Multi-field digital mammograms could be used in place of digital mammograms to boost performance.
8. Fundamental properties of anomaly detection should be investigated so that it can be extended to other medical images.
9. Meta-learning could be applied to classify other medical images.
10. Intensity-based approaches perform poorly in detecting fuzzy textural edges in the pectoralis.
11. Classification is constrained if the feature vector has small intra-class variance and large inter-class variance.
4. Objectives of Research
1. A Hough transformation based GLCM feature extraction method (HTGFEM) is implemented to remove the pectoralis and artefacts from mammograms.
2. A novel geometric pattern based feature extraction and ANN (GPFEA) technique is designed and developed to eliminate noise and enhance mammograms with uneven illumination and low contrast.
3. A Gabor filter based ensemble ML method (GFEMLM) is implemented to classify breast masses as benign or malignant with high accuracy.
5. Research Papers
1. P. Narasimhaiah, C. Naga Raju, "Machine Learning Technique for Prediction of Breast Cancer", published in International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume: 11, Issue: 7s, DOI: https://doi.org/10.17762/ijritcc.v11i7s.7012, Published: 05 June 2023, PAGE NO: 368-380, indexed in Scopus.
2. P. Narasimhaiah, C. Nagaraju, "Breast Cancer Screening Tool Using Gabor Filter-Based Ensemble Machine Learning Algorithms", published in International Journal of INTELLIGENT SYSTEMS AND APPLICATIONS IN ENGINEERING, ISSN: 2147-67992, Volume: 11, Issue: 2, Published: 17.02.2023, PAGE NO: 936-947, indexed in Scopus.
3. P. Narasimhaiah, C. Nagaraju, "Breast Mass Prediction Based on Geometric Pattern Features Using ANN and Mammograms", published in NOVYI MIR Research Journal, ISSN NO: 0130-7673, DOI: 16.10098.NMRJ.2022.V8I7.256342.37693, VOLUME 8, ISSUE 7, Published: July 2023, PAGE NO: 163-172, indexed in WOS.
6. Implementation of Objective 1
• A Hough transformation based GLCM feature extraction method (HTGFEM) is developed to remove the pectoralis and artefacts from mammograms.
• The Hough transformation is used to determine the boundary between the breast and the pectoralis.
• Opening and closing operations are used to eliminate bright spots and artefacts.
• Using the GLCM, six properties are computed to construct the feature database.
6.1 Proposed method system diagram
6.2 Proposed Method
• Pre-processing: the quality of the database is improved by applying the following pre-processing operations to it.
  • Noise removal: noise is removed from the mammograms using an adaptive median filter.
  • Pectoral removal: pectoral muscles are identified and removed using the Hough transformation.
  • Label removal: abnormal bright spots are removed using opening operations, and labels are removed using closing operations.
  • Canny edge detection: the boundary of the abnormal region is determined using the Canny edge operator.
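The pectoral-removal step relies on a Hough transformation to find a straight boundary. The following is a minimal sketch of line detection by Hough voting, not the implementation used in this work; the function name `hough_peak`, the one-degree angle step, and the synthetic vertical edge are assumptions made for illustration.

```python
import math

def hough_peak(edge_points, max_rho, n_theta=180):
    """Return (rho, theta_index) of the strongest straight line.

    Every edge pixel (x, y) votes for each line
    rho = x*cos(theta) + y*sin(theta) that passes through it;
    the accumulator cell with the most votes is the detected line.
    """
    votes = {}
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            if abs(rho) <= max_rho:
                votes[(rho, t)] = votes.get((rho, t), 0) + 1
    return max(votes, key=votes.get)

# Hypothetical edge map: a vertical boundary at x = 5, 100 pixels tall.
edge_points = [(5, y) for y in range(100)]
rho, theta_deg = hough_peak(edge_points, max_rho=150)
```

With `n_theta=180` the returned angle index is in degrees. In a real mammogram the edge points would come from an edge detector, and the pixels on the pectoral side of the detected line would then be masked out.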
• GLCM (Grey Level Co-occurrence Matrix)
The GLCM is computed on the mammographic image in which the abnormal boundary has been determined, to obtain features describing similar grey values.
The database is created by calculating the contrast, dissimilarity, homogeneity, ASM, energy, and correlation properties from the final GLCM.
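The six GLCM properties named above can be sketched as follows, using their usual textbook definitions. This is an illustrative pure-Python version under stated assumptions: a single pixel offset, a symmetric normalised matrix, and a small hypothetical 4-level image. The offsets, grey-level count, and normalisation used in this work are not specified here.

```python
import math

def glcm_features(image, levels, dx=1, dy=0):
    """Build a symmetric, normalised GLCM for offset (dx, dy) and derive six properties."""
    glcm = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                glcm[i][j] += 1
                glcm[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, glcm))
    cells = [(i, j, glcm[i][j] / total) for i in range(levels) for j in range(levels)]

    mu_i = sum(i * p for i, j, p in cells)
    mu_j = sum(j * p for i, j, p in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p for i, j, p in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p for i, j, p in cells))
    asm = sum(p * p for _, _, p in cells)
    return {
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        "dissimilarity": sum(abs(i - j) * p for i, j, p in cells),
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "ASM": asm,
        "energy": math.sqrt(asm),
        "correlation": sum((i - mu_i) * (j - mu_j) * p for i, j, p in cells) / (sd_i * sd_j),
    }

# Hypothetical 4-level image, horizontal neighbour offset.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = glcm_features(image, levels=4)
```

One feature row per mammogram, plus its class label, would then form the feature database.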
Database of 12 mammogram images:
Contrast Dissimilarity Homogeneity ASM Energy Correlation Label
1037.875212 9.914760 1.836836 1.626391 1.803531 0.466991 1
1309.909322 12.955791 1.777469 1.500280 1.732201 0.508517 1
3433.140819 34.837429 1.431750 0.907204 1.346868 0.794332 1
2989.287924 31.638912 1.455632 0.949860 1.377985 0.780013 1
1673.154025 16.694562 1.727541 1.405416 1.676446 0.539076 1
1245.323446 12.961017 1.784734 1.512344 1.739076 0.735289 1
914.944421 8.954308 1.845410 1.639447 1.810736 0.460411 2
1148.243291 11.081992 1.811778 1.569226 1.771545 0.465053 2
964.208263 9.590466 1.841341 1.627621 1.804199 0.527467 2
1394.526554 14.196328 1.750392 1.439865 1.696928 0.568649 2
1478.695550 15.536935 1.722486 1.361569 1.650141 0.577631 2
1567.289548 16.577966 1.699204 1.318028 1.623554 0.602922 2
• Multiple Linear Regression (MLR) Model
  • The MLR model is trained using the training database.
  • The trained model classifies abnormal regions as benign or malignant.
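In its standard form, a multiple linear regression model over the six GLCM properties can be written as below. The symbols are assumptions and are not taken from the slides: y is the numeric class label from the Label column, x_1 to x_6 are contrast, dissimilarity, homogeneity, ASM, energy, and correlation, the beta terms are coefficients fitted on the training database, and epsilon is the residual.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_6 x_6 + \varepsilon
```

A predicted y would then be mapped to the nearer of the two class labels; that mapping rule is likewise an assumption.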
6.3 Experimental results and Discussion
• Two publicly available datasets are considered for experimentation: the INbreast dataset and the MIAS dataset.
• A confusion matrix is used to analyse the performance of the MLR model.
• From the confusion matrix, the statistical parameters precision, recall, accuracy, f1-score, T-test, and F-test are computed to analyse the performance.
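As a sketch of how such parameters follow from a confusion matrix, the snippet below computes precision, recall, F1-score, specificity, and accuracy from the four cell counts, and MAE, MSE, and RMSE from predicted values. The function names and the example counts are hypothetical; the T-test and F-test are not covered.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Derive standard metrics from the four cells of a binary confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "f1": f1, "accuracy": accuracy}

def regression_errors(actual, predicted):
    """Mean absolute error, mean squared error and root mean squared error."""
    diffs = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}

# Hypothetical counts: one true positive and one false positive.
metrics = classification_metrics(tp=1, fp=1, fn=0, tn=0)
errors = regression_errors(actual=[1, 2], predicted=[1.5, 1.5])
```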
Sample size% Precision Recall F1-Score MAE MSE RMSE T-test F-test Accuracy
0.10 0.50 1.00 0.67 0.503 0.334 0.00 0.565 0.0 0.50
0.15 0.50 1.00 0.67 0.503 0.334 0.00 0.568 0.0 0.50
0.20 0.50 0.50 0.50 0.552 0.393 0.627 -0.12 0.6 0.33
0.25 0.50 0.50 0.50 0.552 0.393 0.627 -0.12 0.6 0.33
0.30 0.67 0.67 0.67 0.454 0.318 0.564 -0.20 0.73 0.50
0.35 0.33 1.00 0.50 2.672 15.75 3.968 1.38 48.30 0.40
0.40 0.33 1.00 0.50 2.672 15.75 3.968 1.38 48.30 0.40
0.45 0.33 1.00 0.50 5.411 60.64 7.787 1.97 142.96 0.17
0.50 0.33 1.00 0.50 5.411 60.64 7.787 1.97 142.96 0.17
0.55 0.33 0.33 0.33 1.077 1.886 1.376 1.26 6.85 0.14
0.60 0.00 0.00 0.00 1.451 3.041 1.743 -0.46 16.26 0.12
0.65 0.00 0.00 0.00 1.45 3.04 1.74 -0.46 16.26 0.12
0.70 1.00 0.50 0.67 0.531 0.387 0.622 -0.46 16.26 0.56
0.75 1.00 0.50 0.67 0.531 0.387 0.622 -1.41 0.213 0.56
0.80 1.00 0.56 0.71 0.437 0.293 0.541 -1.03 0.213 0.60
0.85 1.00 0.45 0.62 0.545 0.545 0.338 -3.46 0.418 0.45
6.4 Graphs
[Figure: Relationship between actual values and predicted values]
[Figure: Relationship between sample size and accuracy]
[Figure: Relationship between precision and accuracy]
[Figure: Relationship between recall and accuracy]
[Figure: Relationship between MAE and accuracy]
[Figure: Relationship between F1-score and accuracy]
[Figure: Relationship between MSE and accuracy]
[Figure: Relationship between RMSE and accuracy]
[Figure: Relationship between F-test and accuracy]
6.5 Conclusion
• The proposed HTGFEM is implemented and tested with benchmark datasets.
• This method produced better classification results in the presence of noise and unwanted regions such as pectoral muscles and artefacts.
• The accuracy of the proposed HTGFEM is 60% on the mammograms considered for the experimentation, which have high noise, low-contrast artefacts, and ambiguous pectoral muscles.
• The next research work further improves the performance of classification by further enhancing the pre-processing.
7. Implementation of Objective 2
• The Gabor filter based ensemble ML method (GFEMLM) is implemented to enhance the mammogram in the presence of uneven illumination and low contrast.
• The Otsu threshold method is used to find different thresholds of the mammogram. From these thresholds, a hard threshold is determined. The breast region is selected as the largest connected component.
• The fuzzy textural edge of the pectoral muscle boundary (PMB) is determined using a Gabor filter, and the PMB is removed.
• Noise is removed using an adaptive median filter (AMF).
• Classification of the breast mass is done using ensemble ML algorithms.
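The Otsu step mentioned above can be sketched as follows. This is a simplified single-threshold version that maximises the between-class variance of the grey-level histogram; the work itself refers to several thresholds, so the single-threshold form, the function name, and the two-level pixel list are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that best separates pixels <= t from pixels > t."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]          # number of pixels in the dark class
        sum0 += t * hist[t]    # grey-level sum of the dark class
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between_var = w0 * w1 * (m0 - m1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Hypothetical image flattened to a list: dark background and bright tissue.
pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(pixels)
hard_threshold = 0.5 * threshold  # the scaling by 0.5 follows the artefact-removal rule in 7.2
```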
7.1 System diagram
7.2 Proposed Method
• Pre-processing: to achieve better performance, the database should be pre-processed.
• The pre-processing of the mammographic database involves:
  • Change of orientation: instead of developing the proposed method for each orientation, it is developed only for the left orientation, and right-oriented mammograms are changed to left by flipping.
  • Artefact removal: the hard threshold of the mammogram is determined as 0.5 * lowest threshold value. The mammogram region is then selected as the largest connected area, which eliminates different kinds of artefacts.
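The orientation change and artefact removal described above can be sketched as follows: the image is mirrored, pixels above the hard threshold are grouped into 4-connected components, and only the largest component is kept. The function names, the choice of 4-connectivity, and the tiny example image are assumptions for illustration.

```python
from collections import deque

def flip_left_right(image):
    """Mirror the image so that a right-oriented mammogram becomes left-oriented."""
    return [row[::-1] for row in image]

def largest_component(image, threshold):
    """Return a binary mask of the largest 4-connected region above the threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    mask = [[0] * cols for _ in range(rows)]
    for y, x in best:
        mask[y][x] = 1
    return mask

# Hypothetical image: a 7-pixel breast region and a 2-pixel label artefact.
image = [
    [0,  0,  0,  0, 0, 90],
    [0, 80, 80,  0, 0, 90],
    [0, 80, 80, 80, 0,  0],
    [0, 80, 80,  0, 0,  0],
]
mask = largest_component(image, threshold=5)
flipped = flip_left_right(image)
```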
  • Image enhancement: the contrast-limited adaptive histogram equalization (CLAHE) technique is used to increase the contrast of the mammogram.
  • Noise removal: an adaptive median filter is used to de-noise the mammographic image.
  • Pectoral muscle boundary detection and elimination: a pool of multidirectional Gabor filters is used to detect the fuzzy textural edge of the PMB, and the pectoralis is eliminated.
  • Feature extraction: a bank of Gabor filters is used to extract textural features.
  • To improve the performance, Sobel edge, Roberts edge, Scharr edge, Prewitt edge, Gaussian, and median filters are used in addition to the Gabor features.
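A bank of Gabor filters, as used for feature extraction above, can be sketched with the common real-valued Gabor form: a Gaussian envelope multiplied by a cosine carrier along the rotated axis. The parameter names (`sigma`, `theta`, `lambd`, `gamma`, `psi`), their values, and the striped test patch are assumptions for illustration and are not the settings used in this work.

```python
import math

def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real Gabor kernel: exp(-(x'^2 + gamma^2 y'^2) / (2 sigma^2)) * cos(2 pi x'/lambd + psi)."""
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * cos_t + y * sin_t    # coordinate along the filter orientation
            yr = -x * sin_t + y * cos_t   # coordinate across it
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel

def gabor_bank(n_orientations=4, size=9, sigma=2.0, lambd=4.0):
    """One kernel per orientation, evenly spaced over 180 degrees."""
    return [gabor_kernel(size, sigma, math.pi * k / n_orientations, lambd)
            for k in range(n_orientations)]

def response(kernel, patch):
    """Filter response of a patch that has the same size as the kernel."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))

bank = gabor_bank()
# Hypothetical 9x9 patch with vertical stripes of period 4 pixels.
patch = [[math.cos(2 * math.pi * (x - 4) / 4) for x in range(9)] for _ in range(9)]
features = [response(kernel, patch) for kernel in bank]
```

The kernel whose orientation matches the stripes gives the strongest response, which is what makes the bank useful as a texture descriptor.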
7.3 Machine learning models
• In the proposed work, breast masses are classified as normal, benign, or malignant using the following ensemble ML methods:
  • Random decision forest (RF): a combination of a large number of decision tree classifiers, each of which is applied to a subset of the dataset.
  • Light gradient boosting machine (LightGBM): based on the advanced ensemble technique of boosting; it takes less time to process even if the dataset is huge.
  • Extreme gradient boost (XGBoost): a scalable, distributed gradient-boosting method that reduces overfitting of the data.
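The idea behind the random forest, many simple trees trained on different subsets and combined by voting, can be sketched with bagged decision stumps. This is a deliberately simplified stand-in and not RF, LightGBM, or XGBoost themselves; the single feature, its values, the labels, and all function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Fit a one-feature stump: predict `left` if x <= threshold, else `right`."""
    labels = [y for _, y in samples]
    majority = Counter(labels).most_common(1)[0][0]
    classes = sorted(set(labels))
    values = sorted({x for x, _ in samples})
    best = (len(samples) + 1, None, majority, majority)  # errors, threshold, left, right
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for left in classes:
            for right in classes:
                errors = sum((left if x <= threshold else right) != y for x, y in samples)
                if errors < best[0]:
                    best = (errors, threshold, left, right)
    return best[1:]

def predict_stump(stump, x):
    threshold, left, right = stump
    if threshold is None:  # the bootstrap sample held a single feature value
        return left
    return left if x <= threshold else right

def fit_bagged_stumps(samples, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(samples) for _ in samples]) for _ in range(n_trees)]

def predict_ensemble(stumps, x):
    """Majority vote over all stumps."""
    return Counter(predict_stump(s, x) for s in stumps).most_common(1)[0][0]

# Hypothetical one-feature training set.
samples = [(1, "benign"), (2, "benign"), (3, "benign"), (4, "benign"),
           (10, "malignant"), (11, "malignant"), (12, "malignant"), (13, "malignant")]
stumps = fit_bagged_stumps(samples)
low_prediction = predict_ensemble(stumps, 2.5)
high_prediction = predict_ensemble(stumps, 11.5)
```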
7.4 Experimental results and discussion
• The proposed system is tested on the publicly available MIAS dataset.
• The ensemble classification methods are evaluated using the statistical parameters precision, recall, accuracy, f1-score, T-test, and F-test, which are computed from the confusion matrix (CM).
• The first table below gives the performance metrics of the three ensemble machine learning methods RF, LGBM, and XGBoost.
• The second table below gives the performance metrics of the RF ensemble learning method for different sample sizes.
Metrics/Models Accuracy Error Rate Precision Recall F1 score Sensitivity Specificity
RF 0.9798 0.0202 0.9798 0.9798 0.9798 0.98 0.9898
LGBM 0.9798 0.0202 0.9798 0.9798 0.9798 0.98 0.9898
XGBoost 0.9798 0.0202 0.9798 0.9798 0.9798 0.98 0.9898
Data samples Accuracy Error Rate Precision Recall F1 score Sensitivity Specificity
15 0.95 0.05 0.95 0.95 0.95 0.95 0.975
20 0.953 0.047 0.953 0.953 0.953 0.953 0.9764
25 0.96 0.04 0.96 0.96 0.96 0.96 0.98
30 0.9666 0.0334 0.9666 0.9666 0.9666 0.9666 0.98333
35 0.9714 0.0286 0.9714 0.9714 0.9714 0.9714 0.9857
40 0.9798 0.0202 0.9798 0.9798 0.9798 0.98 0.9898
7.5 Graphs
7.6 Conclusion
• The proposed GFEMLM technique used the XGBoost, LGBM, and RF ensemble machine learning algorithms to categorise breast tissue as normal, benign, or malignant.
• The experimental results show that all three algorithms have the same accuracy.
• When the individual class performance of the three machine learning algorithms is compared, RF is the best.
8. Implementation of Objective 3
GFEMLM and a novel geometric pattern based ANN (GPANN) technique are designed and developed to classify breast masses as benign or malignant with high accuracy.
8.1 System flow diagram
8.2 Proposed Method
• Noise elimination: multiplicative noise such as speckle noise can be eliminated using a nonlinear median filter.
• Artefact elimination: the threshold binary connected component (TBCC) algorithm is used to eliminate artefacts.
• Pectoral muscle elimination: the segmented global and grey threshold connected component labelling (SGGTCCL) algorithm is used to remove the pectoralis.
• Mass boundary detection: the mass boundaries in mammographic images can be detected using a Canny edge detector.
• Extracting geometric shape features: the features are the counts of various geometric shapes, such as lines, triangles, and squares, in a mammogram.
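The noise-elimination step above can be sketched as a plain median filter: each pixel is replaced by the median of its neighbourhood, which suppresses isolated outliers while keeping edges. The 3×3 window, the truncated windows at the image border, and the example image are assumptions for illustration.

```python
def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            window = sorted(
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            )
            out[r][c] = window[len(window) // 2]  # windows are truncated at the border
    return out

# Hypothetical 5x5 image with a single noisy pixel in the centre.
noisy = [[50] * 5 for _ in range(5)]
noisy[2][2] = 255
clean = median_filter(noisy)
```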
ANN model
• In this proposed work, an artificial neural network learning model is used to classify breast masses.
8.3 Experimental results and discussion
• The publicly available benchmark dataset All MIAS is considered for experimentation in this work.
• The performance of the proposed method is analysed using the accuracy, error rate, precision, recall, f1 score, specificity, and sensitivity measures computed from the confusion matrix.
• The confusion matrices of the existing method, based on statistical and textural features, and of the newly proposed method, based on geometric pattern features, are shown in the Figures below.
• The various statistical performance measures of the existing method and the proposed method are shown in the Table below.
The following graph compares all these statistical parameters for the existing and proposed methods. In terms of all these parameters, the graph shows that the proposed methodology performs much better than the existing method.
8.4 Conclusion
In this proposed work, geometric pattern features are extracted, and based on these features the breast tumour is categorised as benign or malignant.
An ANN is used to predict the breast mass as benign or malignant. Its accuracy is 86.67%, which is better than that of works based on prediction with statistical and texture features.
03/12/25
Dr.C.Naga Raju, Yogivemana
University,Proddatur,9949218570 44
Dr.
C.N
aga
Raj
u,
Yog
ive
man
a
Uni
vers
ity,
Pro
ddat
ur,9
949
218
570
03/12/25 45

EARLY DETECTION AND CLASSIFICATION OF BREAST CANCER USING MAMMOGRAMS AND MACHINE LEARNING ALGORITHMS

  • 1.
    1 EARLY DETECTION ANDCLASSIFICATION OF BREAST CANCER USING MAMMOGRAMS AND MACHINE LEARNING ALGORITHMS Mr. P. Narasimhaiah Research Scholar ROLL NO: 1012002203 Department of CSE YSR Engineering College of Yogivemana University Proddatur Andhra Pradesh SUPERVISOR Prof. Dr. C. NAGARAJU Department of CSE YSR Engineering College of Yogivemana University
  • 2.
    CONTENT 1.Introduction 1.1. Motivation 1.2. Tumor 1.3.Socio economic relevance of the Breast cancer 1.4. Mammography Database 2. Literature Survey 3. Research gaps 4. Objectives 5. Research Work 6. Objective 2 & 3 7. Objective 2, 3 & 4 8. Objectives 1, 2, 3, & 4 2
  • 3.
    1. 1. Introduction 1.1 Motivation •BreastCancer comprises 16% of all cancers in women- WHO Report. •The number of cases of Breast Cancer is more in developed countries while the incident rates are low. •On the contrary the incidence rates are high in developing nations while the number of cases currently are relatively low. • 1.15 million new cases. • Incidence increasing in most countries •470 000 deaths. 3
  • 4.
    Continued…  Breast canceris increasing in India also. There are an estimated 1,00,000 to 1,25,000 new breast cancer cases in India every year.  Breast cancer survival rate in India is 66%.  The number of breast cancer cases in India is estimated to double by 2025.  75% breast cancer Indian women are below 50 aged.  Early detection of breast cancer can improve the chances of successful treatment and recovery. 4
  • 5.
    1.2 About Tumours Anabnormal mass of tissue that forms when cells grow and divide more than they should or do not die when they should. A tumour is a mass or lump of tissue that may resemble swelling. Tumour Types Mainly Three types of tumour are there: Benign are not cancerous. They either cannot spread or grow, or they do so very slowly. If a doctor removes them, they do not generally return. 5
  • 6.
    Continued… … Premalignant tumors arenot yet cancerous, but they have the potential to become malignant. Malignant tumors are cancerous. The cells can grow and spread to other parts of the body. It is not always clear how a tumor will act in the future. Some benign tumors can become premalignant and then malignant. For this reason, it is best to monitor any growth. 6
  • 7.
    • The survivalrate of breast cancer is low in countries where literacy rate is low. • Diagnosis, treatment and prognosis of breast cancer patients with low socioeconomic status were lower and poorer. • Low socio economic status women were less likely to get breast screening. • High socio economic status women were more likely to have breast cancer risk. • Racial differences in survival of breast cancer are also large. 1.3. Socio economic relevance of the Breast cancer
  • 8.
    1. 4. MammographyDatabase  Digital Database For Screening Mammography(DDSM)  Inbreast Mammographic Database.  National Mammography Database (NMD).  MIAS Mammography.  Chinese Mammography Database (CMD).  LLNL/UCSF Database.  NIJMEGEN Database.  MammoGRID.  Vin Dr Mammogram Dataset.  Washington University Digital Mammography Database. 8
  • 9.
    2. Literature Survey Breast Cancer is a kind of cancer where cells in the chest tissue split and cultivate in unusual manner.  Breast cancer is the second leading cause of cancer deaths among women.  Every two minutes a woman is diagnosed with breast cancer.  The chance that breast cancer will be responsible for a woman's death is about 1 in 36 (about 3%).  Treatment includes surgery, drugs (hormonal therapy and chemotherapy, and radiation). 9
  • 10.
    Survey of PreviousWorks S. No Work Publisher Method Results Feature Scope 1 Application of Gabor wavelet and Locality Sensitive DiscriminantAnalysis for automated identification of breast cancer using digitized mammogram images Elsevier Analysis of Variance (ANOVA) Decision Tree (DT) classifier An automated diagnosis of breast cancer with a huge sample size The proposed work can be extended to diagnose other diseases like diabetes retinopathy, coronary artery disease, brain tumor, and tuberculosis. 2 An Optimized Framework for Breast Cancer Classification Using Machine Learning BioMed Research International 2022 ML based on BOTPE( Bayesian Optimization with Tree Structured Parzen Estimator) Machine Learning Model-Based HPO(Hyperparamet er Optimization) the LightGBM classifier outperformed the other four classifiers in accuracy, precision, recall, and –score. Future work Reduce Misclassification of breast lesions, resulting in a high false-positive rate. 3 Machine learning models in breast cancer survival prediction Technology and Health Care 24 Machine Learning prediction of breast cancer survival. Feature importance can be estimated during training for little additional computation. 4 Character-based Convolutional Grid Neural Network for Breast Cancer Classification IEEE CNN It can achieve a state-of the- art performance with minimal computational cost. Using the CharCNN and residual layers for representation learning remains an avenue for future work. 5 Computer-Aided Detection and Diagnosis of Breast Cancer With Mammography IEEE Transactions Development of CAD systems Image enhancement To develop more effective CAD systems for further image enhancement. 6 Breast cancer detection by leveraging Machine Learning Elsevier Deep Neural Network with Support Value To meet the better performance, efficiency, and quality of images To meet the better performance, efficiency, and quality of images using Deep learning.
  • 11.
    7 A MultimodalDeep Neural Network for Human Breast Cancer Prognosis Prediction by Integrating Multi- Dimensional Data IEEE/ACM TRANSACTI ONS 2019 MDNNMD Prediction Model for Multi- Dimension Data MDNNMD achieves competitive or better performance than SVM, RF and LR.3. MDNNMD still has some avenues for further investigation predicting survival time of breast cancer. 8 Multi-task fusion for improving mammography screening data classification IEEE TRANSACTIO NS ON MEDICAL IMAGING 2021 fusion prediction classifying images model F reaches an AUC score of 0.921 on test data with TPR = 0.881 and specificity = 0.802 full-field digital mammography dataset would be the logical next step and potentially boost performance when trained on a larger data resource. 9 Meta-learning Based Breast Abnormality Classification on Screening Mammograms ICCEA-2021 The idea of meta- learning, also known as “learn to learn” The meta-learning- based Mass and Calcification classifier reached the accuracy of 78% and the contrast experiment using The possible future analysis on applying meta-learning on cross- category classification with medical images. 10 Anomaly Detection of Calcifications in Mammography Based on 11,000 Negative Cases IEEE TRANSACTI ONS ON BIOMEDICA L ENGINEERI NG, VOL. 69, NO. 5, MAY 2022 annotation- efficient, semi-supervised detect calcifications on digital mammograms although trained on negative images, the performance is promising investigate more fundamental properties of anomaly detection that could potentially generalize beyond the present task to other areas of medical imaging.
  • 12.
    12 11 Real-Time UltrasoundDetection of Breast Microcalcifications Using Multifocus Twinkling Artifact Imaging. IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 41, NO. 5, MAY 2022. Real-time multifocus twinkling artifact (MF- TA). Exploits time-varying TAs arising from acoustic random scattering on MCs with rough or irregular surfaces. The biggest drawback of US is the low visibility of MCs, and it is a major limitation in screening for early breast cancer. 12 Multidirectional Gabor Filter- Based Approach for Pectoral Muscle Boundary Detection. IEEE TRANSACTIONS ON RADIATION AND PLASMA MEDICAL SCIENCES, VOL. 6, NO. 4, APRIL 2022. Multidirectional Gabor filter (MDGF)- based approach for PMB detection. It eliminates the need for straight-line estimation and is able to detect PMB with high accuracy even if it deviates highly from its usual straight- line estimate. The performance of any intensity-based approach in detecting such fuzzy textural edges is poor. 13 WDCCNet: Weighted Double- Classifier Constraint Neural Network for Mammographic Image Classification. IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 41, NO. 3, MARCH 2022. Double-classifier network architecture that constrains the extracted features. Easily applied to an existing convolutional neural network to improve mammographic image classification performance. Two classifiers decision boundaries to constrain the distribution of feature vectors with small intra-class variance and large inter-class variance.
  • 13.
    3. Research Gaps
    1. Present research on breast cancer has reduced the mortality rate but has not eliminated the disease.
    2. In present research, the accuracy of cancer detection declines due to the presence of noise, uneven illumination, labels and pectoral muscles.
    3. Present breast cancer detection systems assist the radiologist, but not with high accuracy.
    4. In present research, misclassification of breast cancer leads to high false-positive and false-negative rates.
    5. To improve the performance of breast cancer detection, important features should be identified before training the model.
  • 14.
    Continued…
    6. Develop an effective CAD system to enhance mammographic images.
    7. In place of digital mammograms, use full-field digital mammograms to boost performance.
    8. Investigate fundamental properties of anomaly detection so that it can be extended to other medical images.
    9. Apply meta-learning to classify other medical images.
    10. Intensity-based approaches perform poorly in detecting the fuzzy textural edges of the pectoralis.
    11. Constraint on classification when the feature vectors have small intra-class variance and large inter-class variance.
  • 15.
    4. Objectives of Research
    1. A Hough transformation based GLCM feature extraction method (HTGFEM) is implemented to remove the pectoralis and artefacts from mammograms.
    2. A novel geometric pattern based feature extraction and ANN (GPFEA) technique is designed and developed to eliminate noise and enhance mammograms with uneven illumination and low contrast.
    3. A Gabor filter based ensemble ML method (GFEMLM) is implemented to classify breast masses as benign or malignant with high accuracy.
  • 16.
    5. Research Papers
    1. P. Narasimhaiah, C. Naga Raju, "Machine Learning Technique for Prediction of Breast Cancer", International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume 11, Issue 7s, DOI: https://doi.org/10.17762/ijritcc.v11i7s.7012, published 05 June 2023, pages 368-380, indexed in Scopus.
    2. P. Narasimhaiah, C. Nagaraju, "Breast Cancer Screening Tool Using Gabor Filter-Based Ensemble Machine Learning Algorithms", International Journal of Intelligent Systems and Applications in Engineering, ISSN: 2147-6799, Volume 11, Issue 2, published 17.02.2023, pages 936-947, indexed in Scopus.
    3. P. Narasimhaiah, C. Nagaraju, "Breast Mass Prediction Based on Geometric Pattern Features Using ANN and Mammograms", NOVYI MIR Research Journal, ISSN: 0130-7673, DOI: 16.10098.NMRJ.2022.V8I7.256342.37693, Volume 8, Issue 7, published July 2023, pages 163-172, indexed in WOS.
  • 17.
    6. Implementation of Objective 1
    • A Hough transformation based GLCM feature extraction method (HTGFEM) is developed to remove the pectoralis and artefacts from mammograms.
    • The Hough transformation is used to determine the boundary between the breast and the pectoralis.
    • Opening and closing operations are used to eliminate bright spots and artefacts.
    • Using the GLCM, six properties are computed to construct the feature database.
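The Hough step can be illustrated with a minimal voting sketch. It assumes the edge pixels of the breast/pectoral boundary are already available as (row, column) pairs. The function name `hough_strongest_line`, the 1-degree angular resolution and the toy edge list are illustrative assumptions, not the author's implementation.

```python
import math

def hough_strongest_line(edge_pixels, height, width, n_theta=180):
    """Return (rho, theta_degrees, votes) of the accumulator cell with most votes.

    A line is parameterised as rho = x*cos(theta) + y*sin(theta),
    with x the column index and y the row index.
    """
    max_rho = int(math.ceil(math.hypot(height, width)))
    n_rho = 2 * max_rho + 1
    # acc[t][rho + max_rho] counts how many edge pixels lie on that line
    acc = [[0] * n_rho for _ in range(n_theta)]
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    for y, x in edge_pixels:
        for t, theta in enumerate(thetas):
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[t][rho + max_rho] += 1
    # Ties are broken towards the larger angle, then the larger rho
    votes, t, r = max(
        (acc[t][r], t, r) for t in range(n_theta) for r in range(n_rho)
    )
    return r - max_rho, 180.0 * t / n_theta, votes

# Toy boundary: pixels on the diagonal x + y = 10 of an 11 x 11 image
edges = [(y, 10 - y) for y in range(11)]
rho, theta_deg, votes = hough_strongest_line(edges, 11, 11)
```

On this toy diagonal every pixel votes for the same cell, so the peak lies close to 45 degrees. Because the accumulator is coarse, neighbouring angles can tie with the true one.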
  • 18.
    6.1 Proposed method system diagram
  • 19.
    6.2 Proposed Method
    Pre-processing
    • The quality of the database is improved by applying pre-processing to it.
    • The pre-processing operations are:
        • Noise removal: noise in the mammograms is removed using an adaptive median filter.
        • Pectoral removal: pectoral muscles are identified and removed using the Hough transformation.
        • Label removal: abnormal bright spots are removed using opening operations and labels are removed using closing operations.
        • Canny edge detection: the boundary of the abnormal region is determined using the Canny edge operator.
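How an opening operation removes small bright spots can be sketched on a binary image. This is a dependency-free illustration with an assumed 3 x 3 square structuring element; the slides do not state which structuring element or library was used, and the function names are hypothetical.

```python
def erode(img):
    """3x3 binary erosion; pixels outside the image count as background."""
    h, w = len(img), len(img[0])
    return [
        [
            int(all(
                0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            ))
            for c in range(w)
        ]
        for r in range(h)
    ]

def dilate(img):
    """3x3 binary dilation."""
    h, w = len(img), len(img[0])
    return [
        [
            int(any(
                0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            ))
            for c in range(w)
        ]
        for r in range(h)
    ]

def opening(img):
    """Erosion followed by dilation: removes objects smaller than the element."""
    return dilate(erode(img))

# Toy image: a 4 x 4 bright region plus one isolated bright pixel
img = [[0] * 7 for _ in range(7)]
for r in range(2, 6):
    for c in range(2, 6):
        img[r][c] = 1
img[0][0] = 1
opened = opening(img)  # the isolated pixel is removed, the 4 x 4 region is kept
```

Closing is the dual operation (dilation followed by erosion) and fills small dark gaps instead of removing small bright objects.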
  • 20.
    Continued…
    • GLCM: Grey Level Co-occurrence Matrix
    The GLCM is computed on the mammographic image in which the abnormal boundary has been determined, to obtain features describing similar grey values.
    The database is created by calculating the contrast, dissimilarity, homogeneity, ASM, energy and correlation properties from the final GLCM.
    Database of 12 mammogram images:

    Contrast      Dissimilarity  Homogeneity  ASM       Energy    Correlation  Label
    1037.875212    9.914760      1.836836     1.626391  1.803531  0.466991     1
    1309.909322   12.955791      1.777469     1.500280  1.732201  0.508517     1
    3433.140819   34.837429      1.431750     0.907204  1.346868  0.794332     1
    2989.287924   31.638912      1.455632     0.949860  1.377985  0.780013     1
    1673.154025   16.694562      1.727541     1.405416  1.676446  0.539076     1
    1245.323446   12.961017      1.784734     1.512344  1.739076  0.735289     1
     914.944421    8.954308      1.845410     1.639447  1.810736  0.460411     2
    1148.243291   11.081992      1.811778     1.569226  1.771545  0.465053     2
     964.208263    9.590466      1.841341     1.627621  1.804199  0.527467     2
    1394.526554   14.196328      1.750392     1.439865  1.696928  0.568649     2
    1478.695550   15.536935      1.722486     1.361569  1.650141  0.577631     2
    1567.289548   16.577966      1.699204     1.318028  1.623554  0.602922     2
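The six GLCM properties can be sketched as follows for a single pixel offset. The definitions used here are the common textbook ones, and the function name, the tiny 4-level image and the horizontal offset are assumptions for illustration. With one normalised matrix, homogeneity, ASM and energy cannot exceed 1, so the values above 1 in the database suggest that the properties were accumulated over more than one offset or angle; the slides do not say which.

```python
import math

def glcm_features(img, levels, offset=(0, 1)):
    """Symmetric, normalised GLCM for one offset, plus six texture properties."""
    dr, dc = offset
    h, w = len(img), len(img[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                i, j = img[r][c], img[r2][c2]
                counts[i][j] += 1.0  # count the pair ...
                counts[j][i] += 1.0  # ... and its mirror (symmetric matrix)
    total = sum(map(sum, counts))
    cells = [(i, j, counts[i][j] / total)
             for i in range(levels) for j in range(levels)]
    mu_i = sum(i * p for i, j, p in cells)
    mu_j = sum(j * p for i, j, p in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p for i, j, p in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p for i, j, p in cells))
    cov = sum((i - mu_i) * (j - mu_j) * p for i, j, p in cells)
    asm = sum(p * p for _, _, p in cells)
    return {
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        "dissimilarity": sum(abs(i - j) * p for i, j, p in cells),
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "ASM": asm,
        "energy": math.sqrt(asm),
        "correlation": cov / (sd_i * sd_j) if sd_i * sd_j > 0 else 1.0,
    }

# Tiny 4-level image, horizontal neighbour offset
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = glcm_features(toy, levels=4)
```

A real mammogram would first be quantised to a fixed number of grey levels, since the matrix size grows with the square of `levels`.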
  • 21.
    Continued…
    • Multi Linear Regression Model
    • Mathematical representation of the multi linear regression model: [equation not preserved]
    • The MLR model is trained using the training database.
    • The trained model classifies abnormal regions as benign or malignant.
    [Figure panels: Malignant, Benign]
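For orientation, the standard form of a multiple linear regression model over the six GLCM features is given below. This is the textbook formulation, stated as an assumption about what the slide showed; the symbols are not taken from the source.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_6 x_6 ,
\qquad
\hat{\boldsymbol{\beta}} = \left( X^{\top} X \right)^{-1} X^{\top} \mathbf{y}
```

Here $x_1, \dots, x_6$ would be contrast, dissimilarity, homogeneity, ASM, energy and correlation, $X$ is the feature matrix with a leading column of ones, and $\mathbf{y}$ holds the class labels (1 or 2 in the database). Since $\hat{y}$ is continuous, it has to be mapped to a class, for example by rounding to the nearest label; the slides do not state which rule was used.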
  • 22.
    6.3 Experimental results and Discussion
    • For experimentation, two publicly available datasets are considered: the INbreast dataset and the MIAS dataset.
    • A confusion matrix is used to analyse the performance of the MLR model.
    • From the confusion matrix, the statistical parameters precision, recall, accuracy, F1-score, T-test and F-test are computed to analyse the performance.

    Sample size  Precision  Recall  F1-Score  MAE    MSE    RMSE   T-test  F-test  Accuracy
    0.10         0.50       1.00    0.67      0.503  0.334  0.00    0.565    0.0   0.50
    0.15         0.50       1.00    0.67      0.503  0.334  0.00    0.568    0.0   0.50
    0.20         0.50       0.50    0.50      0.552  0.393  0.627  -0.12     0.6   0.33
    0.25         0.50       0.50    0.50      0.552  0.393  0.627  -0.12     0.6   0.33
    0.30         0.67       0.67    0.67      0.454  0.318  0.564  -0.20     0.73  0.50
    0.35         0.33       1.00    0.50      2.672  15.75  3.968   1.38    48.30  0.40
    0.40         0.33       1.00    0.50      2.672  15.75  3.968   1.38    48.30  0.40
    0.45         0.33       1.00    0.50      5.411  60.64  7.787   1.97   142.96  0.17
    0.50         0.33       1.00    0.50      5.411  60.64  7.787   1.97   142.96  0.17
    0.55         0.33       0.33    0.33      1.077  1.886  1.376   1.26     6.85  0.14
    0.60         0.00       0.00    0.00      1.451  3.041  1.743  -0.46    16.26  0.12
    0.65         0.00       0.00    0.00      1.45   3.04   1.74   -0.46    16.26  0.12
    0.70         1.00       0.50    0.67      0.531  0.387  0.622  -0.46    16.26  0.56
    0.75         1.00       0.50    0.67      0.531  0.387  0.622  -1.41    0.213  0.56
    0.80         1.00       0.56    0.71      0.437  0.293  0.541  -1.03    0.213  0.60
    0.85         1.00       0.45    0.62      0.545  0.545  0.338  -3.46    0.418  0.45
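The confusion-matrix metrics and the error measures in the table follow standard definitions, sketched below for the two-class case. The function names and the example counts are hypothetical. Because RMSE is the square root of MSE by definition, that identity can also be used to cross-check rows of a results table.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

def regression_errors(y_true, y_pred):
    """MAE, MSE and RMSE between true labels and continuous predictions."""
    n = len(y_true)
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}

# Hypothetical counts: one true positive and one false positive
m = classification_metrics(tp=1, fp=1, fn=0, tn=0)
# Hypothetical regression outputs for labels 1 and 2
e = regression_errors([1, 2, 2], [1.5, 2.0, 1.0])
```

With one true positive and one false positive, precision is 0.50, recall is 1.00 and the F1-score rounds to 0.67, the same combination that appears in the first rows of the table.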
  • 23.
    6.4 Graphs
    [Figures: relationship between actual values and predicted values; between sample size and accuracy; between precision and accuracy; between recall and accuracy.]
  • 24.
    Continued…
    [Figures: relationship between MAE and accuracy; between F1-score and accuracy; between MSE and accuracy; between RMSE and accuracy; between F-test and accuracy.]
  • 25.
    6.5 Conclusion
    • The proposed HTGFEM is implemented and tested with a benchmark dataset.
    • For classification, this method produced better results in the presence of noise and unwanted regions such as pectoral muscles and artefacts.
    • The accuracy of the proposed HTGFEM is 60% on the mammograms with high noise, low-contrast artefacts and ambiguous pectoral muscles that were considered for the experimentation.
    • The next research work further improves the classification performance by enhancing the pre-processing.
  • 26.
    7. Implementation of Objective 2
    • A Gabor filter based ensemble ML method (GFEMLM) is implemented to enhance the mammogram in the presence of uneven illumination and low contrast.
    • The Otsu threshold method is used to find different thresholds of the mammogram. From these thresholds the hard threshold is determined. The breast region is selected as the largest connected component.
    • The fuzzy textural edge of the PMB is determined using Gabor filters and the PMB is removed.
    • Noise is removed using an AMF.
    • Classification of breast masses is done using ensemble ML algorithms.
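The Otsu step can be sketched with a single-threshold version that maximises the between-class variance of the grey-level histogram. The slides refer to several thresholds (a multi-level variant), so this is a simplified illustration under that assumption; the function name and the toy pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises between-class variance.

    Pixels with value <= t form the background class, the rest the foreground.
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the background class
    sum0 = 0    # sum of grey values in the background class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Toy bimodal image: dark background (10) and bright tissue (200)
t = otsu_threshold([10] * 50 + [200] * 50)
```

For the toy data any threshold between the two grey values separates the classes equally well, so the sketch returns the first such value.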
  • 27.
  • 28.
    7.2 Proposed Method
    Pre-processing: to achieve better performance, the database should be pre-processed.
    • The pre-processing of the mammographic database involves:
        • Change of orientation: instead of developing the proposed method for each orientation, it is developed only for the left orientation, and right-oriented mammograms are changed to left by flipping.
        • Artefact removal: the hard threshold of the mammogram is determined as 0.5 * lowest threshold value. The mammogram region is then selected as the largest connected area, which eliminates the different kinds of artefacts.
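The orientation change and the artefact removal can be sketched together: flip the image, apply the hard threshold, and keep only the largest connected region. The 4-connectivity, the function names and the assumed lowest Otsu threshold of 80 are illustrative choices that the slides do not specify.

```python
from collections import deque

def flip_left_right(img):
    """Mirror the image horizontally (right orientation -> left orientation)."""
    return [row[::-1] for row in img]

def largest_component_mask(img, threshold):
    """Binary mask of the largest 4-connected region with values above threshold."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or img[r][c] <= threshold:
                continue
            # Breadth-first search over one connected component
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and img[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    mask = [[0] * w for _ in range(h)]
    for y, x in best:
        mask[y][x] = 1
    return mask

# Toy mammogram: breast tissue on the left, a bright label pixel on the right
toy = [
    [90, 90, 0, 0, 0],
    [90, 90, 0, 0, 250],
    [90, 90, 0, 0, 0],
]
lowest_otsu_threshold = 80                     # assumed value for illustration
hard_threshold = 0.5 * lowest_otsu_threshold   # rule stated on the slide
mask = largest_component_mask(toy, hard_threshold)
```

The isolated label pixel is brighter than the tissue but forms a smaller component, so it is dropped from the mask.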
  • 29.
    Continued…
    • Image enhancement: the contrast-limited adaptive histogram equalization (CLAHE) technique is used to increase the contrast of the mammogram.
    • Noise removal: an adaptive median filter is used to de-noise the mammographic image.
    • Pectoral muscle boundary detection and elimination: a pool of multidirectional Gabor filters is used to detect the fuzzy textural edge of the PMB, and the pectoralis is eliminated.
    • Feature extraction: a bank of Gabor filters is used to extract textural features. Mathematical representation: [equation not preserved]
    • To improve the performance, Sobel edge, Roberts edge, Scharr edge, Prewitt edge, Gaussian and median filters are used in addition to the Gabor features.
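A bank of multidirectional Gabor filters can be sketched with the usual real-valued Gabor function, a Gaussian envelope multiplied by a cosine carrier. The formula in the docstring is the standard one and is an assumption about what the slide showed; the kernel size, sigma, wavelength and number of orientations are illustrative values, not the author's settings.

```python
import math

def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter as a size x size list of lists.

    g(x, y) = exp(-(x'^2 + gamma^2 * y'^2) / (2 * sigma^2)) * cos(2*pi*x'/lambd + psi)
    with x' = x*cos(theta) + y*sin(theta), y' = -x*sin(theta) + y*cos(theta).
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def gabor_bank(n_orientations=4, size=9, sigma=2.0, lambd=4.0):
    """One kernel per orientation, evenly spaced over 180 degrees."""
    return [
        gabor_kernel(size, sigma, math.pi * k / n_orientations, lambd)
        for k in range(n_orientations)
    ]

bank = gabor_bank()  # orientations 0, 45, 90 and 135 degrees
```

Each kernel would be convolved with the mammogram, and the responses across orientations serve as textural features or as evidence for the pectoral muscle boundary.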
  • 30.
    7.3 Machine learning models
    • In the proposed work, the breast masses are classified as normal, benign or malignant using the following ensemble ML methods:
        • Random decision forest (RF): a combination of a large number of decision tree classifiers, each applied to a subset of the dataset.
        • Light gradient boosting machine (LightGBM): based on the advanced ensemble technique of boosting; it takes less time to process even when the dataset is huge.
        • Extreme gradient boost (XGBoost): a scalable, distributed gradient-boosting method that reduces overfitting.
  • 31.
    7.4 Experimental results and Discussion
    • The proposed system is evaluated on the publicly available MIAS dataset.
    • The ensemble classification methods are evaluated using the statistical parameters precision, recall, accuracy, F1-score, T-test and F-test, computed from the confusion matrix (CM).
    • The table below illustrates the performance metrics of the three ensemble machine learning methods RF, LGBM and XGBoost.

    Metrics/Models  Accuracy  Error Rate  Precision  Recall  F1 score  Sensitivity  Specificity
    RF              0.9798    0.0202      0.9798     0.9798  0.9798    0.98         0.9898
    LGBM            0.9798    0.0202      0.9798     0.9798  0.9798    0.98         0.9898
    XGBoost         0.9798    0.0202      0.9798     0.9798  0.9798    0.98         0.9898

    • The table below illustrates the various performance metrics of the RF ensemble learning method for different sample sizes.

    Data samples  Accuracy  Error Rate  Precision  Recall  F1 score  Sensitivity  Specificity
    15            0.95      0.05        0.95       0.95    0.95      0.95         0.975
    20            0.953     0.047       0.953      0.953   0.953     0.953        0.9764
    25            0.96      0.04        0.96       0.96    0.96      0.96         0.98
    30            0.9666    0.0334      0.9666     0.9666  0.9666    0.9666       0.98333
    35            0.9714    0.0286      0.9714     0.9714  0.9714    0.9714       0.9857
    40            0.9798    0.0202      0.9798     0.9798  0.9798    0.98         0.9898
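For a three-class problem (normal, benign, malignant), the metrics are obtained by treating each class one-vs-rest and then averaging. The sketch below uses micro-averaging, where the per-class counts are pooled; under that scheme precision, recall and accuracy coincide, which is the pattern seen in the tables, although the slides do not state which averaging was used. The confusion matrix is hypothetical.

```python
def micro_averaged_metrics(cm):
    """Micro-averaged metrics from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(map(sum, cm))
    tp = fp = fn = tn = 0
    for k in range(n):
        tp_k = cm[k][k]
        fn_k = sum(cm[k]) - tp_k                        # class k missed
        fp_k = sum(cm[r][k] for r in range(n)) - tp_k   # others called k
        tn_k = total - tp_k - fn_k - fp_k
        tp, fp, fn, tn = tp + tp_k, fp + fp_k, fn + fn_k, tn + tn_k
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to sensitivity
    return {
        "accuracy": tp / total,
        "error_rate": 1 - tp / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
    }

# Hypothetical 3-class confusion matrix with 20 samples and one error
cm = [
    [7, 0, 0],
    [0, 6, 1],
    [0, 0, 6],
]
metrics = micro_averaged_metrics(cm)
```

With one error in 20 samples the sketch gives an accuracy of 0.95 and a specificity of 0.975. More generally, for three classes the micro-averaged specificity equals 1 minus half the error rate.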
  • 32.
  • 33.
  • 34.
    7.6 Conclusion
    • The proposed GFEMLM technique used the XGBoost, LGBM and RF ensemble machine learning algorithms to categorise the breast tissue as normal, benign or malignant.
    • The experimental results show that all three algorithms have the same accuracy.
    • When the individual class performance of the three machine learning algorithms is compared, RF is the best.
  • 35.
    8. Implementation of Objective 3
    GFEMLM and a novel geometric pattern based ANN (GPANN) technique are designed and developed to classify breast masses as benign or malignant with high accuracy.
  • 36.
    8.1 System flow diagram
  • 37.
    8.2 Proposed Method
    • Noise elimination: multiplicative noise such as speckle noise is eliminated using a nonlinear median filter.
    • Artefact elimination: the threshold binary connected component (TBCC) algorithm is used to eliminate artefacts.
    • Pectoral muscle elimination: the segmented global and grey threshold connected component labelling (SGGTCCL) algorithm is used to remove the pectoralis.
    • Mass boundary detection: the mass boundaries in mammographic images are detected using a Canny edge detector.
    • Extraction of geometric shape features: the counts of various geometric shapes such as lines, triangles and squares in a mammogram are used as features.
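The nonlinear median filter used for noise elimination can be sketched with a fixed 3 x 3 window. The window size and the border handling (edge pixels are replicated) are assumptions; the slides do not give these details, and the function name is hypothetical.

```python
from statistics import median

def median_filter3(img):
    """3x3 median filter; coordinates are clamped so edge pixels are replicated."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            ]
            out[r][c] = median(window)
    return out

# Toy image: uniform tissue with one noisy bright pixel in the centre
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
clean = median_filter3(noisy)
```

Unlike an averaging filter, the median discards the outlier completely instead of spreading it over the neighbouring pixels, which is why it preserves mass boundaries better.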
  • 38.
  • 39.
    Continued…
    ANN model
    • In the proposed work, an artificial neural network learning model is used to classify breast masses.
  • 40.
    8.3 Experimental results and discussion
    • In this work, the publicly available benchmark dataset All MIAS is considered for experimentation.
    • The performance of the proposed method is analysed using the accuracy, error rate, precision, recall, F1 score, specificity and sensitivity measures computed from the confusion matrix.
    • The confusion matrices of the existing method, based on statistical and textural features, and of the newly proposed method, based on geometric pattern features, are shown in the figures below.
    • The various statistical performance measures of the existing method and the proposed method are shown in the table below.
  • 41.
    Continue… The following graphshows the comparison of all these statistical parameters of present and proposed method . From this graph in terms of all these parameters, the proposed methodology is much better than that of the present method. 41
  • 42.
    8.4 Conclusion
    In the proposed work, geometric pattern features are extracted, and based on these features the breast tumour is categorised as benign or malignant. An ANN is used to predict the breast mass as benign or malignant. Its accuracy is 86.67%, which is better than that of works based on prediction with statistical and texture features.
  • 43.
    8.5 References
    1. American Journal of Roentgenology (AJR) (2010) "Computer-aided detection improves early breast cancer identification", available at http://www.ajronline.org (accessed on 5 February 2010).
    2. Abdullahi Isa, Iliyas Ibrahim Iliyas and Muhammad Lefami Zarma, "Computational Intelligence Approaches for Enhancing Biomedical Image Processing Applications Based on Breast Cancer", Biomedical Signal and Image Processing - Advanced Imaging Technology and Application, December 2022.
    3. C. Kaushal, S. Bhat, D. Koundal, A. Singla, "Recent Trends in Computer Assisted Diagnosis (CAD) System for Breast Cancer Diagnosis Using Histopathological Images", IRBM, Volume 40, Issue 4, August 2019, pages 211-227.
    4. Bushra Mughal, Muhammad Sharif and Nazeer Muhammad, "Bi-model processing for early detection of breast tumour in CAD system", The European Physical Journal Plus, published 15 June 2017.
    5. Osta, H., Qahwaji, R. and Ipson, S. (2008) "Comparisons of feature selection methods using discrete wavelet transforms and support vector machines for mammogram images", in 5th International Multi-Conference on Systems, Signals and Devices, pp. 1-6.
  • 44.
    Continued…
    6. Babymol Kurian, VL Jyothi, "Breast cancer prediction using an optimal machine learning technique for next generation sequences", 2021, Vol. 29(1), pages 49-57.
    7. Maleika Heenaye-Mamode Khan, Nazmeen Boodoo-Jahangeer, Wasiimah Dullull, Shaista Nathire, Xiaohong Gao, G. R. Sinha, Kapil Kumar Nagwanshi, "Multi-class classification of breast cancer abnormalities using Deep Convolutional Neural Network (CNN)", August 26, 2021.
    8. Qingji Tian, Yongtang, Xiang Ren, Navid Razmjooy, "A New optimized sequential method for lung tumor diagnosis based on deep learning and converged search and rescue algorithm", Biomedical Signal Processing and Control, Volume 68, July 2021, 102761.
    9. Zhiqiang Guo, Lina Xu, Yujuan Si, Navid Razmjooy, "Novel computer-aided lung cancer detection based on convolutional neural network-based and feature-based classifiers using metaheuristics", International Journal of Imaging Systems and Technology, 05 June 2021.
    10. Shen, L., Margolies, L.R., Rothstein, J.H. et al., "Deep Learning to Improve Breast Cancer Detection on Screening Mammography", Sci Rep 9, 12495 (2019).
    03/12/25 Dr. C. Naga Raju, Yogivemana University, Proddatur, 9949218570
  • 45.