Lung cancer prediction using CNN with VGG16
Prithwish Raymahapatra¹, Dr. Avijit Kumar Chaudhuri², Sulekha Das³
¹ UG, Computer Science and Engineering, Techno Engineering College Banipur, Habra, Kolkata
² Associate Professor, Computer Science and Engineering, Techno Engineering College Banipur, Habra, Kolkata
³ Assistant Professor, Computer Science and Engineering, Techno Engineering College Banipur, Habra, Kolkata
¹ prithsray@gmail.com, ² c.avijit@gmail.com, ³ shu7773sea@gmail.com
ORCID: ¹ 0000-0001-7147-9491, ² 0000-0002-5310-3180, ³ 0000-0002-6641-3268
Abstract
The objective of this research work is to classify lung cancer CT scan images into 4 different classes using a Convolutional Neural Network (CNN) with the VGG16 architecture. The 4 classes are Large cell carcinoma, Squamous cell carcinoma, Adenocarcinoma, and Normal lung images. The dataset used for this research is the publicly available Chest CT-Scan Images Dataset on Kaggle, containing 1000 images. The methodology followed in this project includes data pre-processing, model building, and evaluation. The dataset is pre-processed by resizing the images to 224x224 and normalizing the pixel values. The VGG16 architecture is used to build the CNN model, which is trained on the pre-processed image data for 12 epochs with a batch size of 64 and a learning rate of 0.001. The model is evaluated using the area under the receiver operating characteristic curve (AUC) metric and achieves an AUC of 0.94 for classifying lung cancer CT scan images into the 4 classes. In conclusion, the CNN model with the VGG16 architecture is an effective approach for this task; it achieves a high accuracy of 0.94 in identifying different types of lung cancer, which could potentially aid in early diagnosis and treatment.
The proposed model is tested on the "Chest CT-Scan images Dataset" from Kaggle, which is available at https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images .
Keywords - CNN, VGG16, AUC, Lung Cancer
Introduction:
Lung cancer CT scan image classification using a convolutional neural network (CNN) is a challenging task in medical image analysis. In this article, we use the VGG16 model for this classification task and obtain our best results with it; to our knowledge, VGG16 has not previously been applied to this dataset. The goal of this project is to classify lung cancer CT scan images into 4 different classes: Large cell carcinoma, Squamous cell carcinoma, Adenocarcinoma, and Normal lung images. We use the area under the receiver operating characteristic curve (AUC) as the evaluation metric for our model.
AUC, or Area Under the Curve, is an important metric in machine learning (ML) for evaluating binary classification models. It measures how well a given model distinguishes between positive and negative samples.
In a binary classification problem, the model predicts a probability score for each sample, and the AUC represents the probability that the model will rank a randomly chosen positive sample higher than a randomly chosen negative sample. The AUC ranges from 0 to 1: an AUC of 0.5 indicates that the model is no better than random guessing, while an AUC of 1 indicates a perfect model. AUC is a popular metric because it is more robust than accuracy when dealing with imbalanced datasets, and it provides a comprehensive evaluation of the model's performance by taking both false positive and false negative rates into account.
Furthermore, AUC is useful for comparing the performance of different models, as it is independent of the classification threshold used to make predictions. This allows a fair comparison of models even if they operate at different thresholds.
Overall, AUC is a valuable metric that provides insight into the performance of binary classifiers and can help in selecting the best model for a given problem.
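This ranking interpretation can be computed directly. A minimal illustrative sketch (the scores and labels are made up for demonstration, not taken from the paper): AUC is estimated as the fraction of positive/negative pairs the model ranks correctly, with ties counted as half.

```python
def pairwise_auc(scores, labels):
    """Estimate AUC as the probability that a randomly chosen positive
    sample is scored higher than a randomly chosen negative sample.
    Ties count as half a correct ranking."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative scores: positives generally rank above negatives.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1,   1,   0,   1,   0,   0]
print(pairwise_auc(scores, labels))  # 8 of 9 pairs ranked correctly -> 0.888...
```

A perfect ranking gives 1.0 and a random one about 0.5, matching the interpretation above.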
To accomplish this, we first pre-process the dataset, which consists of lung cancer CT scan images. We then train the model using transfer learning with weights pre-trained on ImageNet, fine-tune it on our dataset, and finally evaluate its performance on the test set using the AUC metric. The final output of the project is a model that can accurately classify lung cancer CT scan images into one of the four classes with high accuracy and AUC, and that could serve as an effective tool for early diagnosis and treatment of lung cancer.
Relevant Literature:

Author | Features/Methods | Performance
Rehman, Amjad, et al. (2021) [2] | SVM, KNN | Accuracy: 93%; Sensitivity: 86%; Specificity: 95.4%
Toğaçar, Mesut, Burhan Ergen, and Zafer Cömert (2020) [3] | LR, LDA, DT, SVM, KNN | Accuracy: 98.74%; Sensitivity: 98.35%; Specificity: 99.12%; Precision: 99.12%; F1-Score: 98.74%
Nasser, Ibrahim M., and Samy S. Abu-Naser (2019) [4] | ANN | Accuracy: 96.67%
Bhatia, Siddharth, Yash Sinha, and Lavika Goel (2019) [5] | XGBoost and Random Forest | Accuracy: 84%
Makaju, Suren, et al. (2018) [6] | SVM | Accuracy: 92%; Sensitivity: 100%; Specificity: 50%
Taher, Fatma, and Rachid Sammouda (2011) [7] | HNN | Accuracy: 98%; Sensitivity: 83%; Specificity: 99%
Tekade, Ruchita, and K. Rajeswari (2018) [8] | CNN | Accuracy: 95.66%
Alam, Janee, Sabrina Alam, and Alamgir Hossan (2018) [9] | Multiclass SVM | Accuracy (detection): 97%; Accuracy (prediction): 87%
Table 1: Literature Study
Table 1 summarizes previous lung cancer detection research.
Methodology:
 Dataset: The first step of this research work was selecting the dataset. We chose a lung cancer CT scan image dataset containing 1000 CT scan images of human lungs, classified into 4 classes: Large cell carcinoma, Squamous cell carcinoma, Adenocarcinoma, and Normal lung images. We chose this dataset because it contains a large number of sample images and a substantial amount of research has already been done on it with remarkable results. The dataset was divided into three parts: a training set of 613 images, a testing set of 315 images, and a validation set of 72 images. Each of the three sets contains all 4 classes.

Class | Training Set | Testing Set | Validation Set
Large cell carcinoma | 115 | 51 | 21
Squamous cell carcinoma | 155 | 90 | 15
Adenocarcinoma | 195 | 120 | 23
Normal | 148 | 54 | 13
Table 2: Train Test Validation Divisions
Table 2 shows the training, testing, and validation division of the data used in this article.
Research Method:
 Convolutional Neural Network: CNN [23] stands for Convolutional Neural Network, a type of deep neural network commonly used in image and video recognition tasks. The key characteristic of a CNN is that it can automatically learn and extract features from raw data, in this case images or videos. These features are learned through convolution, in which the network applies a set of filters or kernels to the input image to identify patterns and structures in the data. The output of the convolutional layers is passed through a series of pooling layers, which reduce the spatial size of the features and help the network generalize to new images. After the pooling layers, the resulting features are flattened into a vector and fed into fully connected layers, where the network makes predictions based on the learned features. CNNs have proven highly effective in a range of computer vision tasks, including image classification, object detection, and segmentation.
 VGG16: VGG16 [13] is a convolutional neural network (CNN) architecture that won the 2014 ILSVRC (ImageNet) competition, and it is still considered one of the best machine vision model architectures. Its distinguishing feature is its simplicity: instead of a large number of hyper-parameters, it consistently uses 3x3 convolution filters with stride 1 and the same padding, followed by 2x2 max-pooling layers with stride 2, throughout the architecture. At the end, there are 2 fully connected layers (FCs) and a softmax output layer. The 16 in VGG16 refers to its 16 layers with weights. The network is quite large, with about 138 million parameters.
Figure 1: VGG16 Architecture
Figure 1 shows the VGG16 Architecture that has been used in this article
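A brief Keras sketch (not the paper's code) that confirms the parameter count quoted above; weights=None builds the architecture without downloading any weights, whereas the paper's setup would load the ImageNet weights:

```python
from tensorflow.keras.applications import VGG16

# Full VGG16 with the 1000-class ImageNet head: about 138 million parameters.
full = VGG16(weights=None, input_shape=(224, 224, 3))
print(f"{full.count_params():,}")  # 138,357,544

# Convolutional base only (include_top=False), as used for transfer learning.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
print(f"{base.count_params():,}")  # 14,714,688
```

Almost all of the 138 million parameters sit in the fully connected layers, which is why transfer learning usually keeps only the much smaller convolutional base.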
 ImageNet: The ImageNet [22] weights for VGG16 are pre-trained weights learned on the large-scale ImageNet dataset, and they are often used as a starting point for transfer learning in computer vision tasks. They include the weights for all the layers in the network, as well as the biases for the fully connected layers.
 ImageDataGenerator: In Keras, the ImageDataGenerator [21] class is used for image generation and data augmentation. It provides a set of functions for pre-processing and augmenting the input images, and it generates batches of tensor image data with real-time data augmentation. This allows deep learning models to be trained on a large dataset without loading all the images into memory at once: the ImageDataGenerator loads the images in batches and applies the various image transformations on the fly.
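A short sketch of this kind of on-the-fly augmentation; random arrays stand in for the CT images, and the particular transform values are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixels to [0, 1] and apply light on-the-fly augmentation.
datagen = ImageDataGenerator(rescale=1.0 / 255,
                             rotation_range=10,
                             horizontal_flip=True)

# Random arrays stand in for a batch of 224x224 RGB CT images,
# with one-hot labels for the 4 classes.
x = np.random.randint(0, 256, size=(8, 224, 224, 3)).astype("float32")
y = np.eye(4)[np.random.randint(0, 4, size=8)]

batch_x, batch_y = next(datagen.flow(x, y, batch_size=4))
print(batch_x.shape, batch_y.shape)  # (4, 224, 224, 3) (4, 4)
```

With real images on disk, flow_from_directory would be used instead of flow, so only one batch is ever held in memory at a time.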
PRIMARY WORK: After selecting the lung cancer CT scan image dataset from Kaggle described above, we used the VGG16 model, introduced in 2014 and still one of the strongest CNN architectures available; it is preferred in many classification tasks over less discriminative models such as AlexNet.
After training the model on the dataset, we tested it on the testing set and obtained remarkable classification results. The various performance measures of this research, i.e. accuracy, recall, precision, specificity, F1-score, and AUC, are reported later.
Confusion Matrix: A confusion matrix [20], also called an error matrix, is a table in which the results of a classification model on the test data are summarized. It is the quickest way to see and understand a model's results. The confusion matrix has four entries: TP, TN, FP, and FN. TP stands for 'true positive', the number of positive samples classified correctly. TN stands for 'true negative', the number of negative samples classified correctly. FP stands for 'false positive', a sample whose real value is negative but which is predicted as positive; FP is called a TYPE 1 ERROR. FN stands for 'false negative', a sample whose real value is positive but which is predicted as negative; FN is also called a TYPE 2 ERROR.
Figure 2: Confusion Matrix
Figure 2 shows the Confusion Matrix that has been used in this article
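The four counts can be tallied directly for any one class treated as "positive". A small illustrative sketch (the toy labels below are made up, not the paper's data):

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN for one class treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))  # Type 1 error
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))  # Type 2 error
    return tp, tn, fp, fn

# Toy example with the paper's four classes abbreviated.
y_true = ["adeno", "normal", "large", "adeno", "squam", "adeno"]
y_pred = ["adeno", "normal", "adeno", "adeno", "squam", "normal"]
print(confusion_counts(y_true, y_pred, positive="adeno"))  # (2, 2, 1, 1)
```

For a 4-class problem like this one, the counts are computed once per class in this one-vs-rest fashion and then averaged.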
DEVELOPING EQUATION OF CONFUSION MATRIX:
Let’s take-
TP= TRUE POSITIVE
TN= TRUE NEGATIVE
FP= FALSE POSITIVE
FN= FALSE NEGATIVE
FPR= FALSE POSITIVE RATE
Table 3: Accuracy Metrics
Table 3 shows the various accuracy metrics.

Accuracy = (TP + TN) / (TP + TN + FP + FN). The ratio of the number of correct predictions to the total number of predictions made by the model.

Sensitivity = TP / (TP + FN). The ratio of the number of correctly predicted positive samples to the total number of actual positive samples.

Specificity = TN / (TN + FP). The ratio of the number of correctly predicted negative samples to the total number of actual negative samples.

Precision = TP / (TP + FP). The fraction of predicted positive cases that actually turn out to be positive.

Recall = TP / (TP + FN). The ratio of the number of positive samples correctly classified as positive to the total number of positive samples. Recall measures the ability of a model to detect positive samples: the higher the recall, the more positive samples are found.

FPR = FP / (TN + FP). The probability of falsely rejecting the null hypothesis, i.e. the fraction of negative samples incorrectly predicted as positive.

F1-Score = (2 * Recall * Precision) / (Recall + Precision). The harmonic mean of precision and recall; its maximum value is 1 and its minimum value is 0.

AUC = (1 - FPR + Recall) / 2. AUC [14] stands for Area Under the ROC Curve, a popular evaluation metric in machine learning for binary classification problems. The ROC (Receiver Operating Characteristic) curve is a graphical representation of the performance of a binary classifier, and the AUC measures the area under this curve. The classifier tries to predict whether an input belongs to the positive or negative class, and the ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) for different classification thresholds. The TPR is the ratio of correctly predicted positive samples to the total number of actual positive samples, and the FPR is the ratio of incorrectly predicted positive samples to the total number of actual negative samples. The AUC ranges from 0 to 1, with higher values indicating better performance: a perfect classifier has an AUC of 1, while a completely random classifier has an AUC of 0.5. The AUC is a useful evaluation metric because it takes into account all possible classification thresholds and provides a single number to compare the performance of different classifiers. The formula above is the area under the ROC curve built from a single operating point (FPR, TPR).
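The formulas in Table 3 can be applied directly to the four counts. A minimal sketch with example counts (illustrative numbers, not the paper's experimental results); the AUC line uses Table 3's single-threshold formula:

```python
def metrics(tp, tn, fp, fn):
    """Compute the Table 3 metrics from the four confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    recall      = tp / (tp + fn)                 # also called sensitivity
    specificity = tn / (tn + fp)
    precision   = tp / (tp + fp)
    fpr         = fp / (tn + fp)
    f1          = 2 * recall * precision / (recall + precision)
    auc         = (1 - fpr + recall) / 2         # single-threshold AUC
    return accuracy, precision, recall, specificity, f1, auc

# Example counts (illustrative, not taken from the paper's experiments).
acc, prec, rec, spec, f1, auc = metrics(tp=90, tn=85, fp=15, fn=10)
print(round(acc, 3), round(f1, 3), round(auc, 3))  # 0.875 0.878 0.875
```

Note that with balanced counts like these, accuracy and the single-threshold AUC coincide; they diverge on imbalanced data, which is why the paper prefers AUC.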
Procedure:
 Define the model architecture using the VGG16 pre-trained model as a base and add
new classifier layers on top.
 Load the pre-trained weights for the VGG16 model.
 Freeze all the layers of the VGG16 model to prevent them from being updated during
training.
 Add new fully connected classifier layers with appropriate activation functions and
kernel initializers.
 Compile the model with appropriate optimizer and loss function, and evaluate using
relevant metrics like accuracy, precision, recall, AUC, and F1 score.
 Augment the data using ImageDataGenerator to increase the size of the training
dataset.
 Fit the model to the augmented data and evaluate the model on the test data.
 Calculate and print relevant metrics like accuracy, precision, recall, specificity, and
F1 score for the test dataset.
 Calculate and print the AUC (Area under the Curve) score.
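The steps above can be sketched in Keras as follows. This is a minimal illustrative sketch, not the paper's exact code: the 128-unit layer, initializer, optimizer, and loss are assumptions, and weights=None is used so the snippet runs without downloading the ImageNet weights the paper's setup relies on.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.metrics import AUC
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Steps 1-3: VGG16 base without its classifier head; freeze every layer.
# weights=None avoids the download; the paper would use weights="imagenet".
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False

# Step 4: new fully connected classifier layers for the 4 classes
# (the 128-unit layer and he_uniform initializer are illustrative choices).
x = Flatten()(base.output)
x = Dense(128, activation="relu", kernel_initializer="he_uniform")(x)
out = Dense(4, activation="sigmoid")(x)  # sigmoid head, as in the paper
model = Model(inputs=base.input, outputs=out)

# Step 5: compile with the paper's learning rate of 0.001.
model.compile(optimizer=Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy", AUC(name="auc")])

print(model.output_shape)  # (None, 4)
```

Fitting would then use the augmented generators from ImageDataGenerator (steps 6-7), after which the remaining metrics are computed on the test predictions.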
FLOWCHART:
Figure 3: Flowchart
Figure 3 shows the Flowchart of the process that has been used in this article to get the
desired result.
RESULTS AND DISCUSSION
After evaluating the VGG16 model on this dataset, we obtained the results given below.
Epochs | Accuracy (%) | Precision (%) | Recall (%) | AUC (%) | F1-Score (%)
9 | 91.69 | 70.70 | 94.76 | 92.88 | 80.98
10 | 92.18 | 79.68 | 87.93 | 90.68 | 83.60
12 | 94.62 | 84.76 | 93.13 | 94.10 | 88.75
15 | 93.75 | 81.25 | 92.85 | 93.42 | 86.66
Table 4: Comparison Table
Table 4 compares the results obtained with different numbers of training epochs.
COMPARISON:
ACCURACY:
Figure 4: Graph of Accuracy
Figure 4 compares the accuracy obtained with the different numbers of epochs.
F1-SCORE:
Figure 5: Graph of F1-Score
Figure 5 compares the F1-score obtained with the different numbers of epochs.
AUC:
Figure 6: Graph of AUC
Figure 6 compares the AUC obtained with the different numbers of epochs.
CONCLUSION
Author | Features/Methods | Performance
Rehman, Amjad, et al. [2] | SVM, KNN | Accuracy: 93%; Sensitivity: 86%; Specificity: 95.4%
Proposed architecture | VGG16 | Accuracy: 94.62%; Precision: 84.76%; Recall: 93.13%; F1-Score: 88.75%; AUC: 94.10%
Table 5: Lung Cancer CT scan Dataset Performance Comparison
Table 5 shows a comparative study of different metrics on the same dataset.
This article focuses on the identification and classification of lung cancer CT scan images into their respective classes, i.e. Large cell carcinoma, Squamous cell carcinoma, Adenocarcinoma, and Normal lung images, using a transfer learning approach with a Convolutional Neural Network (CNN) based on the VGG16 architecture with sigmoid and ReLU activation functions. The AUC of the model, which reflects how efficiently and accurately it classifies these images, is 0.94, showing that the model is highly effective at this task. The model achieves an accuracy of 0.94 in identifying the different types of lung cancer CT scan images, which could potentially aid in the early diagnosis and treatment of lung cancer.
FUTURE SCOPE
As the AUC of this model is very high, the model can be applied in the future to other disease datasets and beyond. We plan to collect data from various healthcare agencies and hospitals and to try new approaches on this dataset and similar ones, including reinforcement learning techniques, to obtain better results and improve model performance for the early detection of such diseases.
REFERENCES:
[1] https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images
[2] Rehman, A., Kashif, M., Abunadi, I., & Ayesha, N. (2021, April). Lung cancer detection
and classification from chest CT scans using machine learning techniques. In 2021 1st
International Conference on Artificial Intelligence and Data Analytics (CAIDA) (pp. 101-
104). IEEE.
[3] Toğaçar, M., Ergen, B., & Cömert, Z. (2020). Detection of lung cancer on chest CT
images using minimum redundancy maximum relevance feature selection method with
convolutional neural networks. Biocybernetics and Biomedical Engineering, 40(1), 23-39.
[4] Nasser, I. M., & Abu-Naser, S. S. (2019). Lung cancer detection using artificial neural
network. International Journal of Engineering and Information Systems (IJEAIS), 3(3), 17-
23.
[5] Bhatia, S., Sinha, Y., & Goel, L. (2019). Lung cancer detection: a deep learning approach.
In Soft Computing for Problem Solving: SocProS 2017, Volume 2 (pp. 699-705). Springer
Singapore.
[6] Makaju, S., Prasad, P. W. C., Alsadoon, A., Singh, A. K., & Elchouemi, A. (2018). Lung
cancer detection using CT scan images. Procedia Computer Science, 125, 107-114.
[7] Taher, F., & Sammouda, R. (2011, February). Lung cancer detection by using artificial
neural network and fuzzy clustering methods. In 2011 IEEE GCC conference and exhibition
(GCC) (pp. 295-298). IEEE.
[8] Tekade, R., & Rajeswari, K. (2018, August). Lung cancer detection and classification
using deep learning. In 2018 fourth international conference on computing communication
control and automation (ICCUBEA) (pp. 1-5). IEEE.
[9] Alam, J., Alam, S., & Hossan, A. (2018, February). Multi-stage lung cancer detection and prediction using multi-class SVM classifier. In 2018 International conference on computer, communication, chemical, material and electronic engineering (IC4ME2) (pp. 1-4). IEEE.
[10] Shakeel, P. M., Burhanuddin, M. A., & Desa, M. I. (2022). Automatic lung cancer
detection from CT image using improved deep neural network and ensemble classifier.
Neural Computing and Applications, 1-14.
[11] Shakeel, P. M., Burhanuddin, M. A., & Desa, M. I. (2019). Lung cancer detection from
CT image using improved profuse clustering and deep learning instantaneously trained neural
networks. Measurement, 145, 702-712.
[12] Zhang, C., Sun, X., Dang, K., Li, K., Guo, X. W., Chang, J., ... & Zhong, W. Z. (2019).
Toward an expert level of lung cancer detection and classification using a deep convolutional
neural network. The oncologist, 24(9), 1159-1165.
[13] Qassim, H., Verma, A., & Feinzimer, D. (2018, January). Compressed residual-VGG16
CNN model for big data places image recognition. In 2018 IEEE 8th annual computing and
communication workshop and conference (CCWC) (pp. 169-175). IEEE.
[14] Pal, S. S., Raymahapatra, P., Paul, S., Dolui, S., Chaudhuri, A. K., & Das, S. A Novel
Brain Tumor Classification Model Using Machine Learning Techniques.
[15] Saha, S., Mondal, J., Arnam Ghosh, M., Das, S., & Chaudhuri, A. K. Prediction on the
Combine Effect of Population, Education, and Unemployment on Criminal Activity Using
Machine Learning.
[16] Dey, R., Bose, S., Ghosh, N., Chakraborty, S., Kumar, A., & Chaudhuri, S. D. An
Extensive Review on Cancer Detection using Machine Learning Algorithms.
[17] Ray, A., & Chaudhuri, A. K. (2021). Smart healthcare disease diagnosis and patient
management: Innovation, improvement and skill development. Machine Learning with
Applications, 3, 100011.
[18] Chaudhuri, A. K., Banerjee, D. K., & Das, A. (2021). A Dataset Centric Feature
Selection and Stacked Model to Detect Breast Cancer. International Journal of Intelligent
Systems and Applications (IJISA), 13(4), 24-37.
[19] Chaudhuri, A. K., Ray, A., Banerjee, D. K., & Das, A. (2021). An Enhanced Random
Forest Model for Detecting Effects on Organs after Recovering from Dengue. methods, 8(8).
[20] Pal, S. S., Paul, S., Dey, R., Das, S., & Chaudhuri, A. K. Determining the probability of
poverty levels of the Indigenous Americans and Black Americans in the US using Multiple
Regression.
[21] Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation for
deep learning. Journal of big data, 6(1), 1-48.
[22] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). Imagenet classification with deep
convolutional neural networks. Communications of the ACM, 60(6), 84-90.
[23] Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 1440-1448).
[24] Pham, L., Le, C., Ngo, D., Nguyen, A., Lampert, J., Schindler, A., & McLoughlin, I.
(2023). A Light-weight Deep Learning Model for Remote Sensing Image Classification.
arXiv preprint arXiv:2302.13028.

researchpaper_2023_Lungs_Cancer.pdfdfgdgfhdf

  • 1.
    Lung cancer predictionusing CNN with VGG16 Prithwish Raymahapatra1, , Dr. Avijit Kumar Chaudhuri2 , Sulekha Das3 1 UG - Computer Science and Engineering, Techno Engineering College Banipur, Habra, Kolkata 2 Associate Professor, Computer Science, and Engineering, Techno Engineering College Banipur, Habra, Kolkata 3 Assistant Professor, Computer Science, and Engineering, Techno Engineering College Banipur, Habra, Kolkata 1 prithsray@gmail.com, 2 c.avijit@gmail.com, 3 shu7773sea@gmail.com 1 0000-0001-7147-9491, 2 0000-0002-5310-3180, 3 0000-0002-6641-3268 Abstract The objective of this research work is to classify Lungs Cancer CT Scan images into 4 different classes using Convolutional Neural Network (CNN) with VGG16 architecture. The 4 classes are Large cell carcinoma, Squamous cell carcinoma, Adenocarcinoma, and Normal Lungs images. The dataset used for this research is a publicly available Chest CT-Scan images Dataset with 1000 images at Kaggle. The methodology followed in this project includes data pre-processing, model building, and evaluation. The dataset is pre-processed by resizing the images to 224x224 and normalizing the pixel values. The VGG16 architecture has been used to build the CNN model, and it is trained on the pre-processed image data for 12 epochs with a batch size of 64 and learning rate of 0.001. The model is evaluated using the area under the receiver operating characteristic curve (AUC) metric. The results of this project show that the CNN model with VGG16 architecture achieves an AUC of 0.94 for classifying Lungs cancer CT scan images into 4 different classes. In conclusion, the CNN model with VGG16 architecture is an effective approach for classifying Lung Cancer CT Scan images into 4 different classes. The model achieves high accuracy of 0.94 in identifying different types of Lung cancer, which could potentially aid in early diagnosis and treatment of Lung cancer. 
The proposed model is tested on the “Chest CT-Scan images Dataset” from kaggle which is available at https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images . Keywords - CNN, VGG16, AUC, Lungs Cancer
  • 2.
    Introduction: Lungs Cancer CTscan image classification using a convolutional neural network (CNN) is a challenging task in medical image analysis. In this Article, we have used VGG16 model for this image classification tasks, and got the best result on this model. The VGG16 is a recently developed model that hasn’t been used on this dataset. The goal of this project is to classify Lungs Cancer CT scan images into 4 different classes: Large cell carcinoma, Squamous cell carcinoma, Adenocarcinoma, and Normal Lungs images. We will use the Area Under the Operating Characteristic Curve (AUC) of the receiver as the evaluation metric for our model. AUC, or Area Under the Curve, is an important metric used in machine learning(ML) to evaluate the performances of binary classification models. It measures the performance of the given model by distinguishing between positive and negative samples. In a problem that is binary classified, the model predicts a probability score for each sample, and the AUC represents the probability that the model will rank a positive sample that is randomly chosen higher than a negative sample that is also randomly chosen. The AUC ranges from 0 to 1, where an AUC of 0.5 shows that the model is not good than random guessing, while a model whose AUC value is 1, indicates that it’s a perfect model. AUC is a popular metric because it is more robust than accuracy when dealing with imbalanced datasets. It also provides a comprehensive evaluation of the model's performance, taking into account both false positive and false negative rates. Furthermore, AUC is useful for comparing the performance of different models, as it is independent of the classification threshold used to make predictions. This allows for a fair comparison of models even if they have different thresholds. 
Overall, we can imply that AUC is a valuable metric in any machine learning technique that provides insight into the performance of models that are binary classified and can help in selecting the best model for a given problem. To accomplish this, we will first pre-process the dataset, which consists of Lungs cancer CT scan images. We will then train the models using transfer learning with pre-trained weights on ImageNet. Next, we will fine-tune the model on our dataset, followed by evaluating the model's performance on the test set using the AUC metric. The final output of our project will be a model that can accurately classify Lungs cancer CT scan images into one of the four classes with high accuracy and AUC. This model can be used as an effective tool for early diagnosis and treatment of Lungs cancer.
  • 3.
    Relevant Literature : AuthorFeature/methods Performance Rehman, Amjad, et al. (2021) [2] SVM,KNN Accuracy: 93% Sensitivity : 86% Specificity : 95.4 % Toğaçar, Mesut, Burhan Ergen, and Zafer Cömert. (2020) [3] LR,LDA,DT,SVM,KNN, Accuracy: 98.74% Sensitivity : 98.35% Specificity : 99.12 % Precision : 99.12% F1-Score : 98.74% Nasser, Ibrahim M., and Samy S. Abu-Naser.(2019) [4] ANN Accuracy: 96.67% Bhatia, Siddharth, Yash Sinha, and Lavika Goel.(2019) [5] XGBoost and Random Forest Accuracy: 84% Makaju, Suren, et al.(2018) [6] SVM Accuracy: 92% Sensitivity : 100% Specificity : 50% Taher, Fatma, and Rachid Sammouda.(2011) [7] HNN Accuracy: 98% Sensitivity : 83% Specificity : 99% Tekade, Ruchita, and K. Rajeswari.(2018) [8] CNN Accuracy: 95.66% Alam, Janee, Sabrina Alam, and Alamgir Hossan.(2018) [9] Multiclass SVM Accuracy (detection) : 97% Accuracy (prediction) : 87% Table 1: Literature Study Table 1 shows the literature review of various skin cancer research work
  • 4.
    Methodology:  Dataset: Thefirst step of this whole research work was selecting the dataset. In this case, we have chosen a dataset which is a Lungs Cancer CT scan images datset. This dataset contains 1000 CT scan images of human lungs, classified into 4 classes: Large cell carcinoma, Squamous cell carcinoma, Adenocarcinoma, and Normal Lungs images. The particular reason behind working with this dataset is that this dataset consists of a lot of sample images and a lot of research work has already been done with this dataset and got remarkable results. The dataset was divided into two parts: the training set, the testing set and the validation set. The training set contains a total of 613 images, the testing set consists of 315 images and validation set consist of 72 images. Each of the two consists of all the 4 classes i.e. Large cell carcinoma, Squamous cell carcinoma, Adenocarcinoma, and Normal Lungs images. Training Set Testing Set Validation Set Large cell carcinoma 115 51 21 Squamous cell carcinoma 155 90 15 Adenocarcinoma 195 120 23 Normal 148 54 13 Table 2: Train Test Validation Divisions Table 2 shows the Training, testing, validation division of the data that have been used in this article. Research Method:  Convolutional Neural Network: CNN[23] stands for Convolutional Neural Network, which is a type of deep neural network i.e. deep learning commonly used in image and video recognition and also to process any tasks. The key characteristic of CNN is that it can automatically learn and extract features from the raw data, in this case, images or videos. These features are learned through a process of convolution, where the network applies a complete set of filters or kernels to the input image to identify patterns and structures in the data. The output of the convolutional layers is then passed through a series of pooling layers, which reduce the spatial size of the features and help to increase the network's ability to generalize to new images. 
After the pooling layers, the resulting features are flattened into a vector and fed into the fully connected layer, where the network can make predictions based on the learned features. CNNs have proven to be highly effective in a range of computer vision tasks, including classification of the image, the detection of objects, and segmentation.
  • 5.
     VGG16: VGG16[13] is a convolutional neural network (CNN) architecture used to win the 2014 ILSVR (ImageNet) competition. Today it is considered one of the excellent machine vision model architectures. The great feature of VGG16 is that it avoids lots of hyper parameters, we focused on 3x3 filter convolution layers in step 1, and always used the same padding and max pool layers of 2x2 filters in step 2. Maximum pool layers consistently throughout the architecture. In the end, there are 2 Fully Connected Layers (FCs) and a softmax for the output layer. The 16 in VGG16 means there are 16 layers with weights. This network is quite large and has about 138 million parameters. Figure 1: VGG16 Architecture Figure 1 shows the VGG16 Architecture that has been used in this article  ImageNet: The ImageNet [22] weights for VGG16 are pre-trained weights that have been learned on the large-scale ImageNet dataset. These weights are often used as starting point for transfer learning in computer vision tasks This includes the weights for all the layers in the network, as well as the biases for the fully connected layers.  ImageDataGenerator: In Keras, the ImageDataGenerator[21] class is used for image generation and data augmentation. This class provides a set of functions for pre- processing and data augmentation on the input images. It generates batches of tensor image data using real-time data augmentation. This allows you to train deep learning models on a large dataset without having to load all the images into memory at once. Instead, the ImageDataGenerator loads the images in batches and applies various image transformations on the fly.
PRIMARY WORK: The first step of this research work was selecting the dataset. We have chosen the Lungs Cancer CT scan images dataset from Kaggle, which contains 1000 CT scan images of human lungs classified into 4 classes: Large cell carcinoma, Squamous cell carcinoma, Adenocarcinoma, and Normal Lungs images. The particular reason for working with this dataset is that it has a large number of sample images, and a lot of research has already been done on it with remarkable results. After selecting the dataset, we used the VGG16 model, released in 2014, which is one of the best CNN models available and is used in many classification tasks in preference to less discriminative models such as AlexNet. After training the model on the dataset, we tested it on the testing set and obtained remarkable classification results. The various performance measures, i.e. accuracy, recall, precision, specificity, F1-score, and AUC, are reported later.

Confusion Matrix: A confusion matrix [20], also called an error matrix, is a table in which we put the results of the model on the test data. The confusion matrix is the quickest way to see and understand the result of a model. It involves four quantities: TP, TN, FP, and FN. TP stands for 'true positive', the number of positive samples classified correctly. TN stands for 'true negative', the number of negative samples classified correctly. FP stands for 'false positive', where the real value is negative but is predicted as positive; an FP is called a TYPE 1 ERROR. FN stands for 'false negative', where the real value is positive but is predicted as negative; an FN is also called a TYPE 2 ERROR.
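As a small self-contained illustration (the labels below are hypothetical, not the paper's data), the four confusion-matrix counts and the standard metrics derived from them can be computed for one class treated as 'positive':

```python
# Hypothetical ground-truth and predicted labels for eight scans; "adeno"
# is treated as the positive class and the other three classes as negative.
y_true = ["adeno", "normal", "adeno", "large", "adeno", "squamous", "adeno", "normal"]
y_pred = ["adeno", "adeno",  "adeno", "large", "normal", "squamous", "adeno", "normal"]

pos = "adeno"
tp = sum(t == pos and p == pos for t, p in zip(y_true, y_pred))  # true positives
tn = sum(t != pos and p != pos for t, p in zip(y_true, y_pred))  # true negatives
fp = sum(t != pos and p == pos for t, p in zip(y_true, y_pred))  # Type 1 errors
fn = sum(t == pos and p != pos for t, p in zip(y_true, y_pred))  # Type 2 errors

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)                  # also called sensitivity
specificity = tn / (tn + fp)
fpr         = fp / (tn + fp)                  # false positive rate
f1          = 2 * recall * precision / (recall + precision)
auc_approx  = 1 / 2 - fpr / 2 + recall / 2    # single-threshold AUC approximation

print(tp, tn, fp, fn)  # -> 3 3 1 1
```

For a 4-class problem such as this one, the same counts are computed once per class in a one-vs-rest fashion.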
Figure 2: Confusion Matrix. Figure 2 shows the confusion matrix that has been used in this article.

DEVELOPING EQUATIONS FROM THE CONFUSION MATRIX: Let
TP = TRUE POSITIVE
TN = TRUE NEGATIVE
FP = FALSE POSITIVE
FN = FALSE NEGATIVE
FPR = FALSE POSITIVE RATE

Table 3: Accuracy Metrics. Table 3 shows the various accuracy metrics.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
It represents the ratio of the number of correct predictions the model makes to the total number of predictions.

Sensitivity = TP / (TP + FN)
We define it as the ratio of the number of positive samples correctly predicted to the total number of actual positive samples.

Specificity = TN / (TN + FP)
We define it as the ratio of the number of negative samples correctly predicted to the total number of actual negative samples.

Precision = TP / (TP + FP)
Precision indicates how many of the cases predicted positive actually turned out to be positive.

Recall = TP / (TP + FN)
Recall is the ratio of the number of positive samples correctly classified as positive to the total number of positive samples. Recall measures the ability of a model to detect positive samples; the higher the recall, the more positive samples are found.

FPR = FP / (TN + FP)
It is the probability of falsely rejecting the null hypothesis.

F1_Score = (2 * Recall * Precision) / (Recall + Precision)
The F1 score is a measure of accuracy: the harmonic mean of precision and recall. Its maximum value is 1 and its minimum value is 0.

AUC = 1/2 - FPR/2 + Recall/2
AUC [14] stands for Area Under the ROC Curve, which is a popular evaluation metric in machine learning for binary classification problems. The ROC (Receiver Operating
Characteristic) curve is a graphical representation of the performance of a binary classifier, and the AUC measures the area under this curve. In a binary classification problem, the classifier tries to predict whether an input belongs to the positive or the negative class. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) for different classification thresholds. The TPR is the ratio of correctly predicted positive samples to the total number of actual positive samples, and the FPR is the ratio of incorrectly predicted positive samples to the total number of actual negative samples. The AUC ranges from 0 to 1, with higher values indicating better performance: a perfect classifier has an AUC of 1, while a completely random classifier has an AUC of 0.5. The AUC is a useful evaluation metric because it takes all possible classification thresholds into account and provides a single number with which to compare different classifiers.

Procedure:
• Define the model architecture using the VGG16 pre-trained model as a base and add new classifier layers on top.
• Load the pre-trained ImageNet weights for the VGG16 model.
• Freeze all the layers of the VGG16 base to prevent them from being updated during training.
• Add new fully connected classifier layers with appropriate activation functions and kernel initializers.
• Compile the model with an appropriate optimizer and loss function, and evaluate using relevant metrics such as accuracy, precision, recall, AUC, and F1 score.
• Augment the data using ImageDataGenerator to increase the effective size of the training dataset.
• Fit the model to the augmented data and evaluate it on the test data.
• Calculate and print accuracy, precision, recall, specificity, and F1 score for the test dataset.
• Calculate and print the AUC (Area Under the Curve) score.
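The steps above can be sketched in Keras. This is a minimal sketch, not the authors' exact code: the hidden-layer width (256), the Adam optimizer, and the directory paths are assumptions, while the relu/sigmoid activations, 224x224 input size, batch size of 64, learning rate of 0.001, and 12 epochs follow the paper. The paper uses `weights="imagenet"`; `weights=None` is used here only so the sketch runs without downloading the large weight file.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load the VGG16 base without its ImageNet classifier head.
# The paper uses weights="imagenet"; None avoids the large download here.
base = tf.keras.applications.VGG16(
    weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional base

# New classifier head; the 256-unit width is an assumption,
# relu/sigmoid activations follow the paper.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu", kernel_initializer="he_uniform"),
    layers.Dense(4, activation="sigmoid"),  # 4 lung-cancer classes
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # optimizer choice assumed
    loss="categorical_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc"),
             tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Real-time augmentation; the directory path below is a placeholder.
train_gen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)
# train_it = train_gen.flow_from_directory("data/train", target_size=(224, 224),
#                                          batch_size=64, class_mode="categorical")
# model.fit(train_it, epochs=12)
```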
FLOWCHART: Figure 3: Flowchart. Figure 3 shows the flowchart of the process that has been used in this article to get the desired result.

RESULTS AND DISCUSSION
After analysing the VGG16 model on this dataset we get the results given below.

Epochs | Accuracy (%) | Precision (%) | Recall (%) | AUC (%) | F1-Score (%)
9      | 91.69        | 70.70         | 94.76      | 92.88   | 80.98
10     | 92.18        | 79.68         | 87.93      | 90.68   | 83.60
12     | 94.62        | 84.76         | 93.13      | 94.10   | 88.75
15     | 93.75        | 81.25         | 92.85      | 93.42   | 86.66

Table 4: Comparisons Table. Table 4 shows the comparison of results from the different epoch settings used to get the desired results.
COMPARISON:

ACCURACY: Figure 4: Graph of Accuracy. Figure 4 shows the accuracy differences between the different epoch settings (9, 10, 12, and 15) obtained in this article.

F1_SCORE: Figure 5: Graph of F1-Score. Figure 5 shows the F1-score differences between the different epoch settings (9, 10, 12, and 15) obtained in this article.
AUC: Figure 6: Graph of AUC. Figure 6 shows the AUC differences between the different epoch settings (9, 10, 12, and 15) obtained in this article.

CONCLUSION

Author                | Feature/methods | Performance
Rehman, Amjad, et al. | SVM, KNN        | Accuracy: 93%, Sensitivity: 86%, Specificity: 95.4%
Proposed architecture | VGG16           | Accuracy: 94.62%, Precision: 84.76%, Recall: 93.13%, F1-Score: 88.75%, AUC: 94.10%

Table 5: Lung Cancer CT scan Dataset Performance Comparison. Table 5 shows the comparative study of different metrics on the same dataset.

This article focuses on the identification and classification of lung cancer CT scan images into their respective classes, i.e. Large cell carcinoma, Squamous cell carcinoma, Adenocarcinoma, and Normal Lungs images, by a transfer learning approach using a Convolutional Neural Network (CNN) with the VGG16 architecture and sigmoid and relu activation functions, and on calculating the AUC of the model, which depicts how efficiently the model works and how accurately it classifies those images. The AUC of this model is 0.94, which shows that the model is highly effective at classifying those images. The model achieves an accuracy of 0.94 in identifying different types of lung cancer CT scan images, which could potentially aid in early diagnosis and treatment of lung cancer.

FUTURE SCOPE
As the AUC of this model is very high, the model can be applied in the future to other disease datasets and to other datasets in general. In the future, we will collect data from various healthcare agencies and hospitals, and we will try new approaches on this dataset and other datasets like it, with the help of reinforcement learning techniques, to obtain better results and to increase the performance of such models for the early detection of these types of diseases.
REFERENCES:
[1] https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images
[2] Rehman, A., Kashif, M., Abunadi, I., & Ayesha, N. (2021, April). Lung cancer detection and classification from chest CT scans using machine learning techniques. In 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA) (pp. 101-104). IEEE.
[3] Toğaçar, M., Ergen, B., & Cömert, Z. (2020). Detection of lung cancer on chest CT images using minimum redundancy maximum relevance feature selection method with convolutional neural networks. Biocybernetics and Biomedical Engineering, 40(1), 23-39.
[4] Nasser, I. M., & Abu-Naser, S. S. (2019). Lung cancer detection using artificial neural network. International Journal of Engineering and Information Systems (IJEAIS), 3(3), 17-23.
[5] Bhatia, S., Sinha, Y., & Goel, L. (2019). Lung cancer detection: a deep learning approach. In Soft Computing for Problem Solving: SocProS 2017, Volume 2 (pp. 699-705). Springer Singapore.
[6] Makaju, S., Prasad, P. W. C., Alsadoon, A., Singh, A. K., & Elchouemi, A. (2018). Lung cancer detection using CT scan images. Procedia Computer Science, 125, 107-114.
[7] Taher, F., & Sammouda, R. (2011, February). Lung cancer detection by using artificial neural network and fuzzy clustering methods. In 2011 IEEE GCC Conference and Exhibition (GCC) (pp. 295-298). IEEE.
[8] Tekade, R., & Rajeswari, K. (2018, August). Lung cancer detection and classification using deep learning. In 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA) (pp. 1-5). IEEE.
[9] Alam, J., Alam, S., & Hossan, A. (2018, February). Multi-stage lung cancer detection and prediction using multi-class SVM classifier. In 2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering (IC4ME2) (pp. 1-4). IEEE.
[10] Shakeel, P. M., Burhanuddin, M. A., & Desa, M. I. (2022). Automatic lung cancer detection from CT image using improved deep neural network and ensemble classifier. Neural Computing and Applications, 1-14.
[11] Shakeel, P. M., Burhanuddin, M. A., & Desa, M. I. (2019). Lung cancer detection from CT image using improved profuse clustering and deep learning instantaneously trained neural networks. Measurement, 145, 702-712.
[12] Zhang, C., Sun, X., Dang, K., Li, K., Guo, X. W., Chang, J., ... & Zhong, W. Z. (2019). Toward an expert level of lung cancer detection and classification using a deep convolutional neural network. The Oncologist, 24(9), 1159-1165.
[13] Qassim, H., Verma, A., & Feinzimer, D. (2018, January). Compressed residual-VGG16 CNN model for big data places image recognition. In 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC) (pp. 169-175). IEEE.
[14] Pal, S. S., Raymahapatra, P., Paul, S., Dolui, S., Chaudhuri, A. K., & Das, S. A Novel Brain Tumor Classification Model Using Machine Learning Techniques.
[15] Saha, S., Mondal, J., Arnam Ghosh, M., Das, S., & Chaudhuri, A. K. Prediction on the Combine Effect of Population, Education, and Unemployment on Criminal Activity Using Machine Learning.
[16] Dey, R., Bose, S., Ghosh, N., Chakraborty, S., Kumar, A., & Chaudhuri, S. D. An Extensive Review on Cancer Detection using Machine Learning Algorithms.
[17] Ray, A., & Chaudhuri, A. K. (2021). Smart healthcare disease diagnosis and patient management: Innovation, improvement and skill development. Machine Learning with Applications, 3, 100011.
[18] Chaudhuri, A. K., Banerjee, D. K., & Das, A. (2021). A dataset centric feature selection and stacked model to detect breast cancer. International Journal of Intelligent Systems and Applications (IJISA), 13(4), 24-37.
[19] Chaudhuri, A. K., Ray, A., Banerjee, D. K., & Das, A. (2021). An enhanced random forest model for detecting effects on organs after recovering from dengue. Methods, 8(8).
[20] Pal, S. S., Paul, S., Dey, R., Das, S., & Chaudhuri, A. K. Determining the Probability of Poverty Levels of the Indigenous Americans and Black Americans in the US using Multiple Regression.
[21] Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep learning. Journal of Big Data, 6(1), 1-48.
[22] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84-90.
[23] Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1440-1448).
[24] Pham, L., Le, C., Ngo, D., Nguyen, A., Lampert, J., Schindler, A., & McLoughlin, I. (2023). A light-weight deep learning model for remote sensing image classification. arXiv preprint arXiv:2302.13028.