EXCEL ENGINEERING COLLEGE
(AUTONOMOUS)
A
Project Report
On
“Skin Disease Classification from Image”
SUBMITTED BY
Sura Vishnu Vardhan Reddy
(Regd.No:730921106108)
LinkedIn Id: https://www.linkedin.com/in/sura-vishnu-vardhan-reddy-b68729269
GitHub Id: https://github.com/vishnu6643
SUBMITTED TO
Pegasus Aerospace System
Erode, Tamil Nadu, Pin – 638002
Skin Disease Classification from Image
Abstract
Skin diseases are among the most common health ailments and have affected people for ages. Their identification relies largely on the expertise of doctors and on skin biopsy results, which is a time-consuming process. An automated, computer-based system for skin disease identification and classification from images is needed both to improve diagnostic accuracy and to compensate for the scarcity of human experts. Classifying a skin disease from an image is a challenging task that depends heavily on the features chosen to represent the disease. Many skin diseases share highly similar visual characteristics, which makes the selection of useful image features even harder. Accurate analysis of such diseases from images would improve diagnosis, shorten diagnostic time, and lead to better and more cost-effective treatment for patients. This paper presents a survey of the different methods and techniques for skin disease classification, namely traditional (handcrafted-feature-based) techniques and deep learning-based techniques.
Keywords— Skin diseases, lesions, classification, deep learning, CNN, SVM.
I – Introduction
The skin is the largest organ of the human body: an adult carries around 3.6 kg and 2 square metres of it. Skin acts as a waterproof, insulating shield, guarding the body against extremes of temperature, damaging UV light, and harmful chemicals. Growing at a rate of 10-12%, the population affected by skin disease across India was estimated at nearly 15.1 crore in 2013 and increased to 18.8 crore by 2015 [38]. According to statistics provided by the World Health Organization [39], around 13 million cases of melanoma skin cancer occur globally each year, which shows that skin diseases are growing very rapidly. Many factors can trigger a disease, such as UV light, pollution, poor immunity, and an unhealthy lifestyle. Skin lesions (spots) are classified into two major categories: benign and malignant. Most skin lesions are benign in nature, i.e. gentle and non-dangerous, whereas those that endanger the patient's health, such as melanoma skin cancer, are malignant. Diagnosing a skin disease from an image is a challenging problem because so many skin diseases exist. Researchers report the following problems during skin disease classification: 1) a disease may present many lesion types; 2) many diseases share similar visual characteristics, which often confuses even dermatologists inspecting them visually; 3) varying skin colors and skin types (age) introduce further difficulty for computer-based diagnosis. Therefore, selecting relevant features for such diseases is very important if a computer-based system is to identify them correctly. The success of an automatic system relies on how accurately it performs, and it requires both image processing and machine learning tasks.
Medical science offers many technologies for diagnosing skin diseases, but computer-based automatic diagnosis is particularly useful for medical decision support and makes the entire process faster. For example, if such an automated system were deployed in healthcare centres, patients would not have to suffer unnecessarily because of the unavailability of experts. Furthermore, it is a non-invasive and therefore painless method of diagnosis. As per 2015 statistics for India [38], about 6,000 dermatologists serve a population of approximately 121 crore. This means that only 0.49 dermatologists are available per 100,000 people in India, compared to 3.2 in many states of the US [38].
Due to recent advances in technology, a large amount of medical data is produced daily, and these data contain valuable and crucial information about patients. Image-based artificial intelligence is becoming increasingly popular for certain diseases, especially skin diseases. The diagnostic accuracy of a computer-based system relies heavily on the selection of relevant features, the classifier used, and the availability of a dataset as well as the number of images on which the model has been trained. Nowadays, Convolutional Neural Networks (CNNs) are widely used for pattern recognition and classification tasks. To better understand the various works done by researchers, we carried out a survey of the different approaches used for the classification of skin diseases.
The remainder of this paper is divided into four sections. Section II presents the background knowledge: types of images, and the use of traditional and deep learning-based approaches for skin disease classification. Section III presents a survey of traditional (feature-extraction-based) methods as well as CNN-based approaches for skin disease identification and classification. Section IV presents the analysis and findings for both families of methods, and finally, Section V presents the conclusion.
II - BACKGROUND KNOWLEDGE
This section is divided into three parts: skin disease image types, the general process of skin disease classification using traditional techniques, and the corresponding process using deep learning-based techniques.
A. Clinical and Dermoscopic Images
A clinical image is an image of the patient's affected body part, such as an injury or skin lesion, or it can be a diagnostic image. It is captured with an ordinary or digital camera, so its lighting, resolution, and viewing angle vary with the camera used. For computer-aided diagnosis, dermoscopic images are more useful. These images are produced using a dermoscope [16], an instrument used by dermatologists to analyse skin lesions. The dermoscope usually provides uniform illumination and higher contrast; because of its bright illumination, lesions are clear enough for visualization and recognition. Furthermore, dermoscopic images are easier to process because they contain less noise. Fig. 1 (a) illustrates how a dermoscopic image is captured, (b) presents a dermoscopic image, and (c) shows a clinical image.
B. Skin Disease Classification using Traditional Approach
In the traditional approach, handcrafted features are fed into a conventional classifier. Fig. 2 shows the general process of skin disease classification using the traditional approach.
1) Input Image
Skin disease image databases for many diseases are freely available; however, some are only fully or partially open source while others are commercial. The input image can be dermoscopic or clinical, depending on the dataset used. Table I lists the availability and details of the most widely used datasets.
2) Image Pre-processing
Image pre-processing is an important step because an image may contain various artifacts such as dermoscopic gel, air bubbles, and hairs. Clinical images require more pre-processing than dermoscopic ones, because parameters such as resolution, lighting conditions, illumination, capture angle, and the size of the skin area covered vary with the person capturing the image; such variation can create problems in the subsequent stages.
Skin hairs can be removed using filters such as the median, average, or Gaussian filter, morphological operations such as erosion and dilation, binary thresholding, or software such as DullRazor. For low-contrast images, lesion or contrast enhancement algorithms are useful: contrast enhancement with histogram equalization, one of the most used techniques in the literature, provides better visualization by distributing pixel intensities uniformly across the image. For salt-and-pepper noise, a median or mean filter gives better noise removal results.
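The snippet below is a minimal pre-processing sketch of these steps with OpenCV; the input path is a placeholder, and the kernel and threshold values are illustrative assumptions, not a fixed pipeline.

# A minimal pre-processing sketch (OpenCV assumed); 'lesion.jpg' is a placeholder path.
import cv2
import numpy as np

img = cv2.imread('lesion.jpg')                      # hypothetical input image (BGR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# DullRazor-style hair removal: blackhat highlights thin dark hairs,
# thresholding gives a hair mask, inpainting fills the masked pixels.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
_, hair_mask = cv2.threshold(blackhat, 10, 255, cv2.THRESH_BINARY)
clean = cv2.inpaint(img, hair_mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)

# Median filter suppresses salt-and-pepper noise.
denoised = cv2.medianBlur(clean, 3)

# Histogram equalization on the luminance channel improves low contrast.
ycrcb = cv2.cvtColor(denoised, cv2.COLOR_BGR2YCrCb)
ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
enhanced = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)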
TABLE I. WIDELY USED SKIN DISEASE IMAGE DATASETS

| Dataset | Image type | No. of images | Classes | Open source |
| DermNet NZ image library | Clinical | 20000+ | - | Partially |
| Dermofit Image Library | Dermoscopic | 1300 | 10 | Yes |
| ISBI-2016 | Dermoscopic | 1279 | 2 | Yes |
| ISBI-2017 | Dermoscopic | 2750 | 2 | Yes |
| HAM10000 | Dermoscopic | 10015 | 7 | Yes |
| Stanford Hospital | Clinical | - | - | No |
| Peking Union Medical College clinical database | Dermoscopic | 28000 | - | No |
| IRMA Dataset | Dermoscopic | 747 | 2 | Not available |
| PH2 | Dermoscopic | 200 | 2 | Yes |
| MED-NODE | Clinical | 170 | 2 | Yes |
| DermQuest | Clinical | 22500 | - | Yes |
| Hospital Pedro Hispano, Matosinhos | Dermoscopic | 200 | 3 | No |
| SD-198 | Clinical and Dermoscopic | 6584 | 198 | Yes |
3) Image Segmentation
Image segmentation extracts the disease-affected area from the normal skin and can play a very important role in skin disease detection [16]. It can be carried out in three ways: 1) pixel-based, 2) edge-based, and 3) region-based segmentation. In pixel-based segmentation, each pixel of the image is assigned to a homogeneous region or to an object, which can be done with binary thresholding or a variant of it. Edge-based methods detect and link edge pixels to form the bounding shape of the skin lesion; examples include the Roberts, Prewitt, Sobel, and Canny operators, adaptive snakes, and gradient vector flow. Region-based methods rely on similar intensity patterns within neighbouring pixels and are based on continuity; examples are region growing, merging and splitting, and the watershed algorithm.
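Continuing from the pre-processing sketch above, here is a minimal illustration of the pixel-based and edge-based routes (OpenCV 4 return conventions and the earlier `enhanced` image are assumed):

# Two of the segmentation routes described above, as a sketch.
import cv2

gray = cv2.cvtColor(enhanced, cv2.COLOR_BGR2GRAY)   # 'enhanced' from the pre-processing sketch

# 1) Pixel-based: Otsu's threshold separates (darker) lesion pixels from skin.
_, lesion_mask = cv2.threshold(gray, 0, 255,
                               cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# 2) Edge-based: Canny edges outline the lesion boundary; contours link them.
edges = cv2.Canny(gray, 50, 150)
contours, _ = cv2.findContours(lesion_mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)        # keep the largest region as the lesion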
4) Feature Extraction
The most prominent features used to describe and identify skin diseases visually are color and texture. Color information plays an important role in distinguishing one disease from another and can be extracted with techniques such as color histograms, color correlograms, and color descriptors. Texture information conveys the complex visual patterns of the skin lesions and spatially organized properties such as brightness, color, shape, and size; image texture is essentially a function of the variation in pixel intensity. GLCM, local binary patterns, and SIFT are some techniques researchers use to obtain texture information from the image. In addition to color and texture, each lesion may have a different shape and size depending on the type of disease and its severity.
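A small sketch of such color and texture descriptors, assuming scikit-image is installed and reusing `img`, `gray`, and `lesion_mask` from the sketches above (bin counts and GLCM settings are illustrative):

# Illustrative color and texture descriptors.
import numpy as np
import cv2
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

# Color: a 3-channel histogram restricted to the lesion region.
color_hist = cv2.calcHist([img], [0, 1, 2], lesion_mask, [8, 8, 8],
                          [0, 256, 0, 256, 0, 256]).flatten()

# Texture: GLCM statistics and a local binary pattern histogram.
glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
glcm_feats = [graycoprops(glcm, p)[0, 0] for p in ('contrast', 'homogeneity', 'energy')]

lbp = local_binary_pattern(gray, P=8, R=1, method='uniform')
lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10))

feature_vector = np.concatenate([color_hist, glcm_feats, lbp_hist])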
5) Classification
Classification is a supervised machine learning task: it requires a labelled dataset in order to map the data into specific groups or classes. Various classification algorithms have been used to classify skin disease images, such as support vector machines, feed-forward neural networks, back-propagation neural networks, k-nearest neighbours, and decision trees.
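A minimal classification sketch, assuming a feature matrix X (one row per image, built as above) and an integer label vector y have been assembled for a labelled dataset; the SVM hyperparameters are illustrative:

# Feeding handcrafted feature vectors to an SVM classifier (sketch).
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=10, gamma='scale'))
clf.fit(X_tr, y_tr)
print('accuracy:', clf.score(X_te, y_te))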
C. Skin Disease Classification using Deep Learning-based Approach
Deep learning is a branch of machine learning inspired by the structure and function of the human brain, commonly known as neural networks. Convolutional Neural Networks (CNNs) are a class of deep learning algorithms used mostly for analysing visual content such as images and videos. With the development of CNNs, dramatic improvements have been observed in many classification problems in medical image analysis. The basic process of CNN-based skin disease image classification is presented below.
The process starts with data acquisition. The input to the CNN can be a dermoscopic or clinical image, which is pre-processed if needed; the next step is data augmentation, which yields enough training samples to train the model. Finally, the data are fed into the CNN, which performs feature extraction and classification on its own. A CNN typically consists of convolution layers, in which a number of filters perform convolution operations on the image and generate feature maps. These feature maps are down-sampled by pooling layers. Finally, the fully connected layer receives all the connections from the previous layer and performs the classification.
Many researchers have used CNNs for skin disease classification via transfer learning or fine-tuning of pre-trained models such as Inception V3, ResNet, and the VGG architectures. In transfer learning, only the weights of the newly added classification layers are optimized; the weights of the original model remain as they are. In fine-tuning, the parameters of a trained model are altered very carefully while validating the model on a smaller dataset that was not part of the training set. Moreover, the hyperparameters of the CNN must be tracked, otherwise the model may over-fit. Over-fitting means the model has learned too well: it has also learned irrelevant information and noise, which may yield good training accuracy but poor testing accuracy.
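A minimal Keras sketch of this transfer-learning setup (Inception V3 as the frozen base; the 7-class head and all hyperparameters are illustrative assumptions):

# Transfer learning: freeze the pre-trained base, train only the new head.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet',
                                         input_shape=(299, 299, 3), pooling='avg')
base.trainable = False                    # original weights stay fixed

head = tf.keras.layers.Dense(7, activation='softmax')(base.output)
model = tf.keras.Model(base.input, head)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# For fine-tuning instead, unfreeze the base and use a small learning rate:
# base.trainable = True
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss='categorical_crossentropy')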
III. SURVEY OF LITERATURE
This section presents a survey of both traditional and deep learning-based skin disease identification and classification approaches. Tables II and III analyse the major works in the two aforementioned families: traditional/handcrafted-feature-based techniques and deep learning techniques for classifying skin disease from images.
A. Survey on Traditional Techniques for Skin Disease Image Classification
Amarathunga et al. developed an expert system limited to classifying three diseases. The system consists of two separate units: an image processing unit and a data processing unit. The image processing unit is responsible for image acquisition, pre-processing for noise removal, segmentation, and feature extraction from the skin disease images, whereas the data processing unit performs the data mining or classification task. The authors tested five classification algorithms, namely AdaBoost, BayesNet, J48, MLP, and NaiveBayes; of these, the MLP classifier gave the best results. However, the data source of the images and the attributes considered for disease classification are not mentioned.
Chakraborty et al. [3] proposed a hybrid model using the multi-objective optimization algorithm NSGA-II and an ANN to diagnose whether a skin lesion is benign or malignant. A bag-of-features approach, generated using SIFT, is applied to classify the skin lesions: the SIFT algorithm identifies and locates key points in the input image and generates the feature vector. To handle the large number of keypoints, k-means clustering was used to obtain representative keypoints, with each cluster contributing some representatives; these form the generated bag of features. These features are then fed to the hybrid classifier, where NSGA-II is used to train the ANN. The authors [3] also compared the model's accuracy with ANN-PSO (ANN trained with particle swarm optimization) and ANN-CS (ANN trained with cuckoo search).
A spatial and frequency domain-based technique was used by Chatterjee et al. [10] to identify whether a skin lesion is benign or malignant; malignant lesions are further classified into the subcategories melanocytic and epidermal. A cross-correlation technique is used to extract regional features that are invariant to light intensity and illumination changes, and cross-spectrum-based frequency domain analysis retrieves more detailed features of the skin lesions. For classification, an SVM was used with three non-linear kernels [10], of which the RBF kernel gave promising accuracy compared to the others.
TABLE II. SURVEY OF TRADITIONAL TECHNIQUES FOR SKIN DISEASE CLASSIFICATION

| References | Disease | Image type | No. of images | Pre-processing | Segmentation | Feature extraction | Classifier | Performance measure |
| Amarathunga | Eczema, impetigo, melanoma | Clinical | - | Y | Thresholding | - | MLP | Accuracy: 90% |
| Chakraborty | BCC, SA | Dermoscopic | - | - | Thresholding | SIFT | NN-NSGA-II | Accuracy: 90.56%; Precision: 88.26%; Recall: 93.64%; F-measure: 90.87% |
| Manerkar | Warts, benign & malignant skin cancer | Clinical | 45 | Y | C-means clustering and watershed algorithm | GLCM and IQA | SVM | Accuracy: 96-98% |
| Zaqout | Benign, malignant or suspicious lesions | Dermoscopic | 200 | Y | Thresholding | ABCD rule implementation using entropy, bifold, color and diameter | TDS | Accuracy: 90%; Sensitivity: 85%; Specificity: 92.22% |
| Chatterjee | Melanoma, nevus, BCC, SK | Dermoscopic | 6,838 | - | - | Cross-correlation, cross-spectrum | SVM | Accuracy: 98.79%; Sensitivity: 99.01%; Specificity: 95.35% |
| Arifin | Acne, eczema, psoriasis, tinea, vitiligo, scabies | Clinical | 704 | Y | Thresholding | GLCM | Feed-forward back-propagation ANN | Accuracy: 94.04% |
| Monisha | BCC, SA, lentigo simplex | Dermoscopic | - | Y | GMM | GLCM, DRLBP & GRLTP | NSGA-II-PNN | - |

a. Disease: SK - seborrheic keratosis, BCC - basal cell carcinoma, SA - skin angioma. Classifier: TDS (Total Dermoscopic Score = Asymmetry*1.3 + Border irregularity*0.1 + Color*0.5 + Diameter*0.5), NSGA-II - Non-dominated Sorting Genetic Algorithm. Feature extraction: GMM - Gaussian Mixture Model, GLCM - grey-level co-occurrence matrix, IQA - image quality assessment, PNN - probabilistic neural network.
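As a worked illustration of the TDS formula (scores hypothetical): a lesion with Asymmetry = 2, Border = 5, Color = 4, Diameter = 3 gives TDS = 2(1.3) + 5(0.1) + 4(0.5) + 3(0.5) = 6.6; under the standard ABCD rule, a TDS above about 5.45 is read as suspicious of melanoma, while values below about 4.75 are usually considered benign.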
B. Survey on Deep Learning-based Techniques for Skin Disease Image Classification
Esteva et al. were the first to report that a convolutional neural network (CNN) image classifier can match the performance of 21 board-certified dermatologists in identifying malignant lesions. A 3-way disease partition algorithm was designed to classify a given skin lesion as malignant, benign, or non-neoplastic, and a 9-way partition classified a lesion into one of nine categories. The state-of-the-art Inception V3 CNN architecture was used for skin lesion classification, and the authors concluded that a CNN can outperform human experts if it is trained with enough data.
Zhang et al. also used the Inception V3 architecture, with a modified final layer, to classify four diseases. The model was trained on two nearly similar datasets of dermoscopic images. The authors concluded that misclassification can occur when multiple disease lesions are present in a single skin image.
Sun et al. proposed both handcrafted-feature-based and CNN-based approaches for classifying clinical images. They trained four CNN configurations, namely CaffeNet, fine-tuned CaffeNet, VGGNet, and fine-tuned VGGNet, of which the fine-tuned VGGNet gave the best accuracy. Its accuracy was similar to that of handcrafted features generated by several methods (SIFT, HOG, LBP, and color histograms) combined with an SVM classifier. The architecture and the use of a benchmark dataset play an important role in achieving good accuracy for skin disease image classification.
Gessert et al. introduced a patch-based method to capture fine-grained differences between skin lesions in high-resolution images. Each high-resolution image is divided into 5, 9, or 16 crops (patches), and these patches are fed to standard CNN architectures. The authors used three architectures, Inception V3, DenseNet, and SE-ResNeXt50, to predict the disease from the high-resolution image patches.
Rehman et al. proposed a CNN architecture with 16 filters of kernel size 7x7 and pooling layers for down-sampling. The model was trained on malignant and benign disease categories, namely melanoma, seborrheic keratosis, and nevus. The RGB channels of the segmented image are normalized to zero mean and unit variance, and this normalized matrix is fed to the CNN for feature extraction; the fully connected part consists of a 3-layer ANN classifier that labels the skin lesion as benign or malignant.
Kulhalli et al. proposed 5-stage, 3-stage, and 2-stage hierarchical approaches to classify seven diseases using the Inception V3 CNN architecture. The authors addressed the class imbalance problem with image augmentation to balance the classes. The 5-stage classifier gave better results than the 2- and 3-stage hierarchical classifiers. The authors further suggested that fine-tuning the model and using ensemble-based methods might improve classification performance.
IV. ANALYSIS & FINDINGS
Both traditional and CNN-based approaches are useful for classifying skin diseases. Traditional methods require an appropriate feature extraction and segmentation method for the diseases at hand. It is important to identify the relevant features and discard irrelevant ones, since classification depends on the selected features; if irrelevant features are selected, misclassification may follow. Unlike CNNs, however, the traditional approach does not require a large dataset.
A CNN learns the features of skin diseases automatically: it selects its filters intelligently, in contrast to the manual filter selection of the traditional approach, so no separate feature extraction method is needed. Pre-trained models can be used to classify skin diseases, but they are heavy in terms of 1) the number of parameters, 2) the number of layers, 3) the selection and fine-tuning of the appropriate pre-trained model, and 4) the need for further training, since they were not originally trained on skin disease images.
A CNN can also be designed from scratch. The following criteria are important whenever a CNN architecture is designed to classify skin diseases:
1) Dataset: The availability of a large dataset is very important, since a CNN learns much more efficiently when trained with enough data. Large datasets of clinical images are available at [31], [32]; for dermoscopic images, large datasets are published by ISIC [27].
2) Hyperparameters of the CNN: The network structure is determined by hyperparameters, which must be set before training. They include the number of hidden layers, dropout, kernel size, number of kernels, batch size, number of epochs, activation function, learning rate, etc.
3) Computational power: The main challenge in training a CNN is the availability of computational resources. A CNN has thousands of trainable parameters, so it is computationally costlier than the traditional way of classifying skin diseases. GPU availability is a must for training, and the training time grows with the size of the dataset used to train the model.
TABLE III. SURVEY OF DEEP LEARNING BASED SKIN DISEASE CLASSIFICATION

| Reference | Disease classes | Image type | No. of images | Dataset | Additional (pre-processing / segmentation) | CNN architecture | Performance measures |
| Sun et al. [24] | Wide variety | Clinical | 6,584 / 5,619 | SD-198 [34] / SD-128 [24] | - | Fine-tuned VGG19 | Accuracy: 50.27% |
| Esteva et al. [4] | Malignant and benign skin lesions | Clinical / Dermoscopic | 129,450 / 3,374 | ISIC [27], Edinburgh Dermofit Library [33], Stanford Hospital [4] | - | Inception V3 with PA (partition algorithm) | Accuracy: 72.1 ± 0.9% |
| Zhang et al. [5] | Melanocytic nevus, SK, BCC, psoriasis | Dermoscopic | 1,067 / 522 | Dataset A [5] / Dataset B [5] | - | Inception V3 | Dataset A accuracy: 87.25 ± 2.24%; Dataset B accuracy: 86.63 ± 5.78% |
| Rehman et al. [22] | Malignant and benign skin lesions | Dermoscopic | 379 | ISBI-2016 [27] | Segmentation using generalized Gaussian distribution | CNN with conv: 16 filters of 7x7; pooling layer: 16; FC: 100x50x5 | Accuracy: 98.32%; Sensitivity: 98.15%; Specificity: 98.41% |
| Brinker et al. [6] | Melanoma and nevi | Clinical / Dermoscopic | - / 12,378 | HAM10000 [27] | - | ResNet50 | Mean sensitivity: 89.4%; Mean specificity: 64.4%; ROC: 0.769 |
| Kulhalli et al. [7] | Melanoma, nevi, SK, akiec, BCC, DF, BKL | Dermoscopic | 10,015 | HAM10000 [27] | - | Inception V3 | Normalized F1 score: 0.93 |
| Khan et al. [8] | Melanoma vs other | Dermoscopic | 1,279 / 2,790 / 10,000 | ISBI-16 [27] / ISBI-17 [27] / HAM10000 [27] | Lesion enhancement | ResNet50 and ResNet101 | Accuracy - ISBI 2016: 90.20%; ISBI 2017: 95.60%; HAM10000: 89.8% |
SKIN DISEASE CLASSIFICATION
This project applies deep learning to predict various skin diseases. The main objective is to achieve maximum accuracy in skin disease prediction. Deep learning techniques help detect skin disease at an early stage, and feature extraction plays a key role in the classification of skin diseases. Using deep learning algorithms reduces the need for human labour, such as manual feature extraction and data reconstruction for classification. Moreover, Explainable AI is used to interpret the decisions made by our model.
ABOUT THE DATASET
HAM10000 ("Human Against Machine with 10000 training images") dataset - a large
collection of multi-sources dermatoscopic images of pigmented lesions. The dermatoscopic
images are collected from different populations, acquired and stored by different modalities.
The final dataset consists of 10015 dermatoscopic images.
It covers 7 different classes of pigmented skin lesions, listed below:
• Melanocytic nevi
• Melanoma
• Benign keratosis-like lesions
• Basal cell carcinoma
• Actinic keratoses
• Vascular lesions
• Dermatofibroma
Importing Libraries
#Importing required libraries
import matplotlib.pyplot as plt
from PIL import Image
import seaborn as sns
import numpy as np
import pandas as pd
import os
from tensorflow.keras.utils import to_categorical
from glob import glob
✓ HAM10000_metadata.csv is the main CSV file containing the metadata for all training images; its features are:
1. lesion_id
2. image_id
3. dx
4. dx_type
5. age
6. sex
7. localization
Reading the Data from the Dataset
# Reading the data from HAM10000_metadata.csv
df = pd.read_csv('../input/skin-cancer-mnist-ham10000/HAM10000_metadata.csv')
df.head()
df.dtypes
lesion_id object
image_id object
dx object
dx_type object
age float64
sex object
localization object
dtype: object
lesion_id image_id dx dx_type age sex localization
0 HAM_0000118 ISIC_0027419 bkl histo 80.0 male scalp
1 HAM_0000118 ISIC_0025030 bkl histo 80.0 male scalp
2 HAM_0002730 ISIC_0026769 bkl histo 80.0 male scalp
3 HAM_0002730 ISIC_0025661 bkl histo 80.0 male scalp
4 HAM_0001466 ISIC_0031633 bkl histo 75.0 male ear
df.describe()
Data Cleaning
Removing NULL values and performing visualizations to gain insights into the dataset: univariate and bivariate analysis.
df.isnull().sum()
lesion_id 0
image_id 0
dx 0
dx_type 0
age 57
sex 0
localization 0
dtype: int64
The feature 'age' contains 57 null records. We replace them with the mean of 'age', since dropping 57 records would lose data.
age
count 9958.000000
mean 51.863828
std 16.968614
min 0.000000
25% 40.000000
50% 50.000000
75% 65.000000
max 85.000000
df['age'].fillna(int(df['age'].mean()),inplace=True)
df.isnull().sum()
lesion_id 0
image_id 0
dx 0
dx_type 0
age 0
sex 0
localization 0
dtype: int64
lesion_type_dict = {
'nv': 'Melanocytic nevi',
'mel': 'Melanoma',
'bkl': 'Benign keratosis-like lesions ',
'bcc': 'Basal cell carcinoma',
'akiec': 'Actinic keratoses',
'vasc': 'Vascular lesions',
'df': 'Dermatofibroma'
}
base_skin_dir = '../input/skin-cancer-mnist-ham10000'
# Merge images from both folders into one dictionary
imageid_path_dict = {os.path.splitext(os.path.basename(x))[0]: x
for x in glob(os.path.join(base_skin_dir, '*', '*.jpg'))}
df['path'] = df['image_id'].map(imageid_path_dict.get)
df['cell_type'] = df['dx'].map(lesion_type_dict.get)
df['cell_type_idx'] = pd.Categorical(df['cell_type']).codes
df.head()
   lesion_id    image_id      dx   dx_type  age   sex   localization  path                                                cell_type                      cell_type_idx
0  HAM_0000118  ISIC_0027419  bkl  histo    80.0  male  scalp         ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions  2
1  HAM_0000118  ISIC_0025030  bkl  histo    80.0  male  scalp         ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions  2
2  HAM_0002730  ISIC_0026769  bkl  histo    80.0  male  scalp         ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions  2
3  HAM_0002730  ISIC_0025661  bkl  histo    80.0  male  scalp         ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions  2
4  HAM_0001466  ISIC_0031633  bkl  histo    75.0  male  ear           ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions  2
Image Preprocessing
df['image'] = df['path'].map(lambda x: np.asarray(Image.open(x).resize((125,100))))
n_samples = 5
fig, m_axs = plt.subplots(7, n_samples, figsize = (4*n_samples, 3*7))
for n_axs, (type_name, type_rows) in zip(m_axs,
df.sort_values(['cell_type']).groupby('cell_type')):
n_axs[0].set_title(type_name)
for c_ax, (_, c_row) in zip(n_axs, type_rows.sample(n_samples,
random_state=2018).iterrows()):
c_ax.imshow(c_row['image'])
c_ax.axis('off')
fig.savefig('category_samples.png', dpi=300)
The images are resized because the original dimensions of 450 x 600 x 3 take a long time to process in a neural network.
# See the image size distribution - should just return one row (all images are uniform)
df['image'].map(lambda x: x.shape).value_counts()
(100, 125, 3) 10015
Name: image, dtype: int64
Exploratory Data Analysis
Exploratory data analysis can help detect obvious errors, identify outliers in datasets,
understand relationships, unearth important factors, find patterns within data, and provide
new insights.
df= df[df['age'] != 0]
df= df[df['sex'] != 'unknown']
plt.figure(figsize=(20,10))
plt.subplots_adjust(left=0.125, bottom=1, right=0.9, top=2, hspace=0.2)
plt.subplot(2,4,1)
plt.title("AGE",fontsize=15)
plt.ylabel("Count")
df['age'].value_counts().plot.bar()
plt.subplot(2,4,2)
plt.title("GENDER",fontsize=15)
plt.ylabel("Count")
df['sex'].value_counts().plot.bar()
plt.subplot(2,4,3)
plt.title("localization",fontsize=15)
plt.ylabel("Count")
plt.xticks(rotation=45)
df['localization'].value_counts().plot.bar()
plt.subplot(2,4,4)
plt.title("CELL TYPE",fontsize=15)
plt.ylabel("Count")
df['cell_type'].value_counts().plot.bar()
<AxesSubplot:title={'center':'CELL TYPE'}, ylabel='Count'>
1. Skin diseases peak in people aged around 45 and are least common at ages 10 and below; the probability of having a skin disease increases with age.
2. Skin diseases are more prominent in men than in women and other genders.
3. Skin diseases appear most often on the back of the body and least on the acral surfaces (such as limbs, fingers, or ears).
4. The most common disease is melanocytic nevi, while the least common is dermatofibroma.
plt.figure(figsize=(15,10))
plt.subplot(1,2,1)
df['dx'].value_counts().plot.pie(autopct="%1.1f%%")
plt.subplot(1,2,2)
df['dx_type'].value_counts().plot.pie(autopct="%1.1f%%")
plt.show()
1. Type of skin disease:
• nv: Melanocytic nevi - 69.9%
• mel: Melanoma - 11.1 %
• bkl: Benign keratosis-like lesions - 11.0%
• bcc: Basal cell carcinoma - 5.1%
• akiec: Actinic keratoses- 3.3%
• vasc: Vascular lesions-1.4%
• df: Dermatofibroma - 1.1%
2. How the skin disease was discovered:
• histo - histopathology - 53.3%
• follow_up - follow up examination - 37.0%
• consensus - expert consensus - 9.0%
• confocal - confirmation by in-vivo confocal microscopy - 0.7%
BIVARIATE ANALYSIS
plt.figure(figsize=(25,10))
plt.title('LOCALIZATION VS GENDER',fontsize = 15)
sns.countplot(y='localization', hue='sex',data=df)
<AxesSubplot:title={'center':'LOCALIZATION VS GENDER'}, xlabel='count', ylabel='localization'>
• The back area is the most affected region, more prominently in men.
• Infection of the lower extremity of the body is more visible in women.
• Some unknown regions also show infections, visible in men, women, and other genders.
• The acral surfaces show the fewest infection cases, and only in men; other gender groups do not show this kind of infection.
plt.figure(figsize=(25,10))
plt.title('LOCALIZATION VS CELL TYPE',fontsize = 15)
sns.countplot(y='localization', hue='cell_type',data=df)
<AxesSubplot:title={'center':'LOCALIZATION VS CELL TYPE'}, xlabel='count', ylabel='localization'>
• The face is infected the most by Benign keratosis-like lesions.
• Body parts(except face) are infected the most by Melanocytic nevi.
plt.figure(figsize=(25,10))
plt.subplot(131)
plt.title('AGE VS CELL TYPE',fontsize = 15)
sns.countplot(y='age', hue='cell_type',data=df)
plt.subplot(132)
plt.title('GENDER VS CELL TYPE',fontsize = 15)
sns.countplot(y='sex', hue='cell_type',data=df)
<AxesSubplot:title={'center':'GENDER VS CELL TYPE'}, xlabel='count', ylabel='sex'>
1. The age group between 0 and 75 years is affected most by melanocytic nevi, whereas people aged 80-90 are affected more by benign keratosis-like lesions.
2. All gender groups are affected most by melanocytic nevi.
from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
ANN
features=df.drop(columns=['cell_type_idx'],axis=1)
target=df['cell_type_idx']
features.head()
   lesion_id    image_id      dx   dx_type  age   sex   localization  path                                                cell_type                      image
0  HAM_0000118  ISIC_0027419  bkl  histo    80.0  male  scalp         ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions  [[[189, 152, 194], [192, 156, 198], [191, 154,...
1  HAM_0000118  ISIC_0025030  bkl  histo    80.0  male  scalp         ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions  [[[24, 13, 22], [24, 14, 22], [24, 14, 26], [2...
2  HAM_0002730  ISIC_0026769  bkl  histo    80.0  male  scalp         ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions  [[[186, 127, 135], [189, 133, 145], [192, 135,...
3  HAM_0002730  ISIC_0025661  bkl  histo    80.0  male  scalp         ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions  [[[24, 11, 17], [24, 11, 20], [30, 15, 25], [4...
x_train_o, x_test_o, y_train_o, y_test_o = train_test_split(features, target,
test_size=0.25,random_state=666)
tf.unique(x_train_o.cell_type.values)
Unique(y=<tf.Tensor: shape=(7,), dtype=string, numpy=array([b'Melanocytic nevi', b'Basal cell carcinoma', b'Melanoma', b'Actinic keratoses', b'Vascular lesions', b'Benign keratosis-like lesions ', b'Dermatofibroma'], dtype=object)>, idx=<tf.Tensor: shape=(7440,), dtype=int32, numpy=array([0, 1, 2, ..., 1, 0, 0], dtype=int32)>)
x_train = np.asarray(x_train_o['image'].tolist())
x_test = np.asarray(x_test_o['image'].tolist())
x_train_mean = np.mean(x_train)
x_train_std = np.std(x_train)
x_test_mean = np.mean(x_test)
x_test_std = np.std(x_test)
x_train = (x_train - x_train_mean)/x_train_std
x_test = (x_test - x_test_mean)/x_test_std
# Perform one-hot encoding on the labels
y_train = to_categorical(y_train_o, num_classes = 7)
y_test = to_categorical(y_test_o, num_classes = 7)
y_test
array([[0., 0., 0., ..., 1., 0., 0.],
[0., 0., 0., ..., 1., 0., 0.],
[0., 0., 0., ..., 1., 0., 0.],
...,
[0., 0., 0., ..., 1., 0., 0.],
[0., 0., 0., ..., 0., 1., 0.],
[0., 0., 0., ..., 1., 0., 0.]], dtype=float32)
x_train, x_validate, y_train, y_validate = train_test_split(x_train, y_train, test_size = 0.1,
random_state = 999)
# Reshape image in 3 dimensions (height = 100, width = 125 , canal = 3)
x_train = x_train.reshape(x_train.shape[0], *(100, 125, 3))
x_test = x_test.reshape(x_test.shape[0], *(100, 125, 3))
x_validate = x_validate.reshape(x_validate.shape[0], *(100, 125, 3))
# Flatten each image for the fully connected ANN
x_train = x_train.reshape(x_train.shape[0], 125*100*3)
x_test = x_test.reshape(x_test.shape[0], 125*100*3)
print(x_train.shape)
print(x_test.shape)
(6696, 37500)
(2481, 37500)
# define the keras model
model = Sequential()
model.add(Dense(units= 64, kernel_initializer = 'uniform', activation = 'relu', input_dim =
37500))
model.add(Dense(units= 64, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units= 64, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units= 64, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 7, kernel_initializer = 'uniform', activation = 'softmax'))
optimizer = tf.keras.optimizers.Adam(learning_rate = 0.00075,
beta_1 = 0.9,
beta_2 = 0.999,
epsilon = 1e-8)
# compile the keras model
model.compile(optimizer = optimizer, loss = 'categorical_crossentropy', metrics =
['accuracy'])
# fit the keras model on the dataset
history = model.fit(x_train, y_train, batch_size = 10, epochs = 50)
accuracy = model.evaluate(x_test, y_test, verbose=1)[1]
print("Test: accuracy = ",accuracy*100,"%")
(Per-epoch training output omitted.)
from keras.utils.vis_utils import plot_model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
CNN
CNN is well suited to image classification because of parameter sharing and dimensionality reduction. Since the same filter weights are reused across the whole image, the number of parameters, and with it the computation, is greatly reduced, as the short comparison below illustrates.
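As a rough illustration of parameter sharing (input sizes chosen to match this notebook's 100x125x3 images; the layer widths are otherwise hypothetical), compare a small convolution layer with a dense layer over the flattened image:

# Parameter sharing: a Conv2D layer reuses the same 3x3x3 kernels everywhere,
# while a Dense layer over the flattened image learns a weight per pixel.
from tensorflow.keras import layers, Input, Model

inp = Input(shape=(100, 125, 3))
conv = Model(inp, layers.Conv2D(32, 3, padding='same')(inp))
print(conv.count_params())    # 3*3*3*32 + 32 = 896 shared weights

flat = Input(shape=(100 * 125 * 3,))
dense = Model(flat, layers.Dense(32)(flat))
print(dense.count_params())   # 37500*32 + 32 = 1,200,032 weights

Note that 896 is exactly the parameter count of the first conv2d layer in the model summary further below.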
Since the data are limited, we apply data augmentation using ImageDataGenerator, which generates augmented images in real time while the model trains. Random transformations can be applied to each training image as it is passed to the model.
The CNN model is a repeated network of the following layers:
1. Convolutional
2. Pooling
3. Dropout
4. Flatten
5. Dense
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization, Conv2D, MaxPool2D
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Set the CNN model
# The architecture is: In -> [[Conv2D->relu]*2 -> MaxPool2D -> Dropout]*3 -> Flatten -> Dense*2 -> Dropout -> Out
input_shape = (100, 125, 3)
num_classes = 7
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',padding = 'Same',input_shape=input_shape))
model.add(Conv2D(32,kernel_size=(3, 3), activation='relu',padding = 'Same',))
model.add(MaxPool2D(pool_size = (2, 2)))
model.add(Dropout(0.16))
model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',padding = 'Same'))
model.add(Conv2D(32,kernel_size=(3, 3), activation='relu',padding = 'Same',))
model.add(MaxPool2D(pool_size = (2, 2)))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3, 3), activation='relu',padding = 'same'))
model.add(Conv2D(64, (3, 3), activation='relu',padding = 'Same'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(num_classes, activation='softmax'))
model.summary()
Data augmentation using ImageDataGenerator is applied before model training (see below).
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 100, 125, 32) 896
_________________________________________________________________
conv2d_1 (Conv2D) (None, 100, 125, 32) 9248
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 50, 62, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 50, 62, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 50, 62, 32) 9248
_________________________________________________________________
conv2d_3 (Conv2D) (None, 50, 62, 32) 9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 25, 31, 32) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 25, 31, 32) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 25, 31, 64) 18496
_________________________________________________________________
conv2d_5 (Conv2D) (None, 25, 31, 64) 36928
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 15, 64) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 12, 15, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 11520) 0
_________________________________________________________________
dense_5 (Dense) (None, 256) 2949376
_________________________________________________________________
dense_6 (Dense) (None, 128) 32896
_________________________________________________________________
dropout_3 (Dropout) (None, 128) 0
_________________________________________________________________
dense_7 (Dense) (None, 7) 903
=================================================================
Total params: 3,067,239
Trainable params: 3,067,239
Non-trainable params: 0
# Define the optimizer
optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0,
amsgrad=False)
# Compile the model
model.compile(optimizer = optimizer , loss = "categorical_crossentropy",
metrics=["accuracy"])
# Set a learning rate annealer
learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy',
patience=4,
verbose=1,
factor=0.5,
min_lr=0.00001)
x_train, x_validate, y_train, y_validate = train_test_split(x_train, y_train, test_size = 0.1,
random_state = 999)
# Reshape image in 3 dimensions (height = 100, width = 125 , canal = 3)
x_train = x_train.reshape(x_train.shape[0], *(100, 125, 3))
x_test = x_test.reshape(x_test.shape[0], *(100, 125, 3))
x_validate = x_validate.reshape(x_validate.shape[0], *(100, 125, 3))
# With data augmentation to prevent overfitting
datagen = ImageDataGenerator(
featurewise_center=False, # set input mean to 0 over the dataset
samplewise_center=False, # set each sample mean to 0
featurewise_std_normalization=False, # divide inputs by std of the dataset
samplewise_std_normalization=False, # divide each input by its std
zca_whitening=False, # apply ZCA whitening
rotation_range=10, # randomly rotate images by up to 10 degrees
zoom_range = 0.1, # Randomly zoom image
width_shift_range=0.12, # randomly shift images horizontally (fraction of total width)
height_shift_range=0.12, # randomly shift images vertically (fraction of total height)
horizontal_flip=True, # randomly flip images
vertical_flip=True) # randomly flip images
datagen.fit(x_train)
# Fit the model
epochs = 60
batch_size = 16
history = model.fit_generator(datagen.flow(x_train,y_train, batch_size=batch_size),
epochs = epochs, validation_data = (x_validate,y_validate),
verbose = 1, steps_per_epoch=x_train.shape[0] // batch_size
, callbacks=[learning_rate_reduction])
from tensorflow.keras.metrics import Recall
from sklearn.metrics import classification_report,confusion_matrix
from keras.utils.vis_utils import plot_model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
loss, accuracy = model.evaluate(x_test, y_test, verbose=1)
loss_v, accuracy_v = model.evaluate(x_validate, y_validate, verbose=1)
print("Validation: accuracy = %f ; loss_v = %f" % (accuracy_v, loss_v))
print("Test: accuracy = %f ; loss = %f" % (accuracy, loss))
model.save("model.h5")
78/78 [==============================] - 1s 8ms/step - loss: 0.6185 - accuracy: 0.7686
21/21 [==============================] - 0s 7ms/step - loss: 0.6881 - accuracy: 0.7433
Validation: accuracy = 0.743284 ; loss_v = 0.688070
Test: accuracy = 0.768642 ; loss = 0.618472
import itertools
# Function to plot confusion matrix
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
# Predict the values from the validation dataset
Y_pred = model.predict(x_validate)
# Convert predictions classes to one hot vectors
Y_pred_classes = np.argmax(Y_pred,axis = 1)
# Convert validation observations to one hot vectors
Y_true = np.argmax(y_validate,axis = 1)
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
# Predict the values from the validation dataset
Y_pred = model.predict(x_test)
# Convert predictions classes to one hot vectors
Y_pred_classes = np.argmax(Y_pred,axis = 1)
# Convert validation observations to one hot vectors
Y_true = np.argmax(y_test,axis = 1)
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes = range(7))
label_frac_error = 1 - np.diag(confusion_mtx) / np.sum(confusion_mtx, axis=1)
plt.bar(np.arange(7),label_frac_error)
plt.xlabel('True Label')
plt.ylabel('Fraction classified incorrectly')
Text(0, 0.5, 'Fraction classified incorrectly')
# Function to plot the model's training/validation loss and accuracy
# def plot_model_history(model_history):
#     fig, axs = plt.subplots(1, 2, figsize=(15, 5))
#     # summarize history for accuracy
#     axs[0].plot(range(1, len(model_history.history['accuracy']) + 1), model_history.history['accuracy'])
#     axs[0].plot(range(1, len(model_history.history['val_accuracy']) + 1), model_history.history['val_accuracy'])
#     axs[0].set_title('Model Accuracy')
#     axs[0].set_ylabel('Accuracy')
#     axs[0].set_xlabel('Epoch')
#     axs[0].set_xticks(np.arange(1, len(model_history.history['accuracy']) + 1), len(model_history.history['accuracy']) / 10)
#     axs[0].legend(['train', 'val'], loc='best')
#     # summarize history for loss
#     axs[1].plot(range(1, len(model_history.history['loss']) + 1), model_history.history['loss'])
#     axs[1].plot(range(1, len(model_history.history['val_loss']) + 1), model_history.history['val_loss'])
#     axs[1].set_title('Model Loss')
#     axs[1].set_ylabel('Loss')
#     axs[1].set_xlabel('Epoch')
#     axs[1].set_xticks(np.arange(1, len(model_history.history['loss']) + 1), len(model_history.history['loss']) / 10)
#     axs[1].legend(['train', 'val'], loc='best')
#     plt.show()
# plot_model_history(history)
Transfer Learning
Why MobileNet?
MobileNet significantly reduces the number of parameters compared to a network of regular convolutions with the same depth, resulting in lightweight deep neural networks. The two layer types used here in addition to those of the CNN above are batch normalization and zero padding. A rough parameter comparison is sketched below.
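As a hedged sketch of why MobileNet's depthwise separable convolutions are cheaper than regular ones (the layer sizes below are illustrative, not MobileNet's actual configuration):

# Regular vs depthwise separable convolution: same in/out shapes, far fewer weights.
from tensorflow.keras import layers, Input, Model

inp = Input(shape=(100, 125, 32))
regular = Model(inp, layers.Conv2D(64, 3, padding='same')(inp))
separable = Model(inp, layers.SeparableConv2D(64, 3, padding='same')(inp))
print(regular.count_params())    # 3*3*32*64 + 64 = 18,496
print(separable.count_params())  # 3*3*32 + 32*64 + 64 = 2,400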
import tensorflow
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
# Resize to width 125 x height 100 so the arrays match the modified MobileNet input below
df['image'] = df['path'].map(lambda x: np.asarray(Image.open(x).resize((125,100))))
features=df.drop(columns=['cell_type_idx'],axis=1)
target=df['cell_type_idx']
x_train_o, x_test_o, y_train_o, y_test_o = train_test_split(features, target,
test_size=0.25,random_state=666)
tf.unique(x_train_o.cell_type.values)
Unique(y=<tf.Tensor: shape=(7,), dtype=string, numpy=array([b'Melanocytic nevi', b'Basal cell carcinoma', b'Melanoma', b'Vascular lesions', b'Benign keratosis-like lesions ', b'Actinic keratoses', b'Dermatofibroma'], dtype=object)>, idx=<tf.Tensor: shape=(7511,), dtype=int32, numpy=array([0, 1, 0, ..., 1, 0, 0], dtype=int32)>)
x_train = np.asarray(x_train_o['image'].tolist())
x_test = np.asarray(x_test_o['image'].tolist())
x_train_mean = np.mean(x_train)
x_train_std = np.std(x_train)
x_test_mean = np.mean(x_test)
x_test_std = np.std(x_test)
x_train = (x_train - x_train_mean)/x_train_std
x_test = (x_test - x_test_mean)/x_test_std
# Perform one-hot encoding on the labels
y_train = to_categorical(y_train_o, num_classes = 7)
y_test = to_categorical(y_test_o, num_classes = 7)
y_test
Due to the limited dataset, a pretrained MobileNet model is used.
x_train, x_validate, y_train, y_validate = train_test_split(x_train, y_train, test_size = 0.1,
random_state = 999)
# Reshape images in 3 dimensions (height = 100, width = 125, channels = 3)
# to match the modified MobileNet input shape set below
x_train = x_train.reshape(x_train.shape[0], *(100, 125, 3))
x_test = x_test.reshape(x_test.shape[0], *(100, 125, 3))
x_validate = x_validate.reshape(x_validate.shape[0], *(100, 125, 3))
print(x_train.shape)
# create a copy of a mobilenet model
mobile = tensorflow.keras.applications.mobilenet.MobileNet()
mobile.summary()
def change_model(model, new_input_shape=(None, 40, 40, 3), custom_objects=None):
    # replace input shape of first layer
    config = model.layers[0].get_config()
    config['batch_input_shape'] = new_input_shape
    model._layers[0] = model.layers[0].from_config(config)

    # rebuild model architecture by exporting and importing via json
    new_model = tensorflow.keras.models.model_from_json(model.to_json(), custom_objects=custom_objects)

    # copy weights from old model to new one
    for layer in new_model._layers:
        try:
            layer.set_weights(model.get_layer(name=layer.name).get_weights())
            print("Loaded layer {}".format(layer.name))
        except Exception:
            print("Could not transfer weights for layer {}".format(layer.name))
    return new_model
new_model = change_model(mobile, new_input_shape=[None] + [100,125,3])
new_model.summary()
# CREATE THE MODEL ARCHITECTURE
# Exclude the last 5 layers of the above model.
# This will include all layers up to and including global_average_pooling2d_1
x = new_model.layers[-6].output
# Create a new dense layer for predictions
# 7 corresponds to the number of classes
x = Dropout(0.25)(x)
predictions = Dense(7, activation='softmax')(x)
# inputs=new_model.input selects the input layer, outputs=predictions refers to the
# dense layer we created above.
model = Model(inputs=new_model.input, outputs=predictions)
# We need to choose how many layers we actually want to be trained.
# Here we are freezing the weights of all layers except the
# last 23 layers in the new model.
# The last 23 layers of the model will be trained.
for layer in model.layers[:-23]:
layer.trainable = False
# Define Top2 and Top3 Accuracy
from tensorflow.keras.metrics import categorical_accuracy, top_k_categorical_accuracy
def top_3_accuracy(y_true, y_pred):
return top_k_categorical_accuracy(y_true, y_pred, k=3)
def top_2_accuracy(y_true, y_pred):
return top_k_categorical_accuracy(y_true, y_pred, k=2)
model.compile(Adam(lr=0.01), loss='categorical_crossentropy',
metrics=[categorical_accuracy, top_2_accuracy, top_3_accuracy])
# Add class weights to try to make the model more sensitive to melanoma.
# cell_type_idx comes from pd.Categorical, so the indices follow the
# alphabetical order of the full class names.
class_weights = {
    0: 1.0,  # Actinic keratoses (akiec)
    1: 1.0,  # Basal cell carcinoma (bcc)
    2: 1.0,  # Benign keratosis-like lesions (bkl)
    3: 1.0,  # Dermatofibroma (df)
    4: 1.0,  # Melanocytic nevi (nv)
    5: 3.0,  # Melanoma (mel) - make the model more sensitive to melanoma
    6: 1.0,  # Vascular lesions (vasc)
}
filepath = "model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='val_top_3_accuracy', verbose=1,
save_best_only=True, mode='max')
reduce_lr = ReduceLROnPlateau(monitor='val_top_3_accuracy', factor=0.5, patience=2,
verbose=1, mode='max', min_lr=0.00001)
callbacks_list = [checkpoint, reduce_lr]
history = model.fit_generator(datagen.flow(x_train,y_train, batch_size=batch_size),
class_weight=class_weights,
validation_data=(x_validate,y_validate),steps_per_epoch=x_train.shape[0] //
batch_size,
epochs=10, verbose=1,
callbacks=callbacks_list)
from keras.utils.vis_utils import plot_model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
# get the metric names so we can use evaluate_generator
model.metrics_names
# Here the last epoch will be used.
val_loss, val_cat_acc, val_top_2_acc, val_top_3_acc = model.evaluate(datagen.flow(x_test, y_test, batch_size=16))
print('val_loss:', val_loss)
print('val_cat_acc:', val_cat_acc)
print('val_top_2_acc:', val_top_2_acc)
print('val_top_3_acc:', val_top_3_acc)
# Here the best epoch will be used.
model.load_weights('model.h5')
val_loss, val_cat_acc, val_top_2_acc, val_top_3_acc = model.evaluate_generator(datagen.flow(x_test, y_test, batch_size=16))
print('val_loss:', val_loss)
print('val_cat_acc:', val_cat_acc)
print('val_top_2_acc:', val_top_2_acc)
print('val_top_3_acc:', val_top_3_acc)
Plot the Training Curves
# display the loss and accuracy curves
import matplotlib.pyplot as plt
acc = history.history['categorical_accuracy']
val_acc = history.history['val_categorical_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
train_top2_acc = history.history['top_2_accuracy']
val_top2_acc = history.history['val_top_2_accuracy']
train_top3_acc = history.history['top_3_accuracy']
val_top3_acc = history.history['val_top_3_accuracy']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.figure()
plt.plot(epochs, acc, 'bo', label='Training cat acc')
plt.plot(epochs, val_acc, 'b', label='Validation cat acc')
plt.title('Training and validation cat accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, train_top2_acc, 'bo', label='Training top2 acc')
plt.plot(epochs, val_top2_acc, 'b', label='Validation top2 acc')
plt.title('Training and validation top2 accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, train_top3_acc, 'bo', label='Training top3 acc')
plt.plot(epochs, val_top3_acc, 'b', label='Validation top3 acc')
plt.title('Training and validation top3 accuracy')
plt.legend()
plt.show()
accuracy = model.evaluate(x_test, y_test,verbose=1)[1]
accuracy_v = model.evaluate(x_validate, y_validate)[1]
print("Validation: accuracy = ", accuracy_v)
print("Test: accuracy = ",accuracy)
model.save("model.h5")
# make a prediction
predictions = model.predict_generator(datagen.flow(x_test,y_test, batch_size=16),
verbose=1)
predictions.shape
test_batches = datagen.flow(x_test,y_test, batch_size=16)
test_batches
Create a Confusion Matrix
# Source: scikit-learn website
# http://scikit-learn.org/stable/auto_examples/model_selection/
# plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()
# Predict the values from the validation dataset
Y_pred = model.predict(x_validate)
# Convert predictions classes to one hot vectors
Y_pred_classes = np.argmax(Y_pred,axis = 1)
# Convert validation observations to one hot vectors
Y_true = np.argmax(y_validate,axis = 1)
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes = range(7))
# Predict the values from the test dataset
Y_pred = model.predict(x_test)
# Convert prediction probabilities to class indices
Y_pred_classes = np.argmax(Y_pred, axis=1)
# Convert one-hot test labels to class indices
Y_true = np.argmax(y_test, axis=1)
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes = range(7))
y_pred = model.predict(x_test)
y_pred = y_pred > 0.5  # threshold probabilities into a binary indicator matrix for the report
cm_plot_labels = ['akiec', 'bcc', 'bkl', 'df', 'mel','nv', 'vasc']
from sklearn.metrics import classification_report
# Generate a classification report
report = classification_report(y_test, y_pred, target_names=cm_plot_labels)
print(report)
model.save("mobilenet_model.h5")
Generate the Classification Report
tile_df = df.copy()
tile_df.drop('lesion_id', inplace=True, axis=1)
tile_df.drop('image_id', inplace=True, axis=1)
tile_df.drop('cell_type', inplace=True, axis=1)
tile_df.drop('path', inplace=True, axis=1)
tile_df.drop('dx', inplace=True, axis=1)
tile_df.head()
X = tile_df.drop(['cell_type_idx'],axis=1).values
y = tile_df['cell_type_idx'].values
X_train,X_test,y_train,y_test = train_test_split(X,y,random_state=0)
pip install alibi
pip install shap
import shap
shap.initjs()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from alibi.explainers import KernelShap
from scipy.special import logit
Techniques applied: LIME, PDP, SHAP, etc.
  dx_type   age   sex   localization  cell_type_idx
0 histo     80.0  male  scalp         2
1 histo     80.0  male  scalp         2
2 histo     80.0  male  scalp         2
3 histo     80.0  male  scalp         2
4 histo     75.0  male  ear           2
from sklearn.metrics import confusion_matrix, plot_confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
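# Note: the .map() calls below assign integer codes (label encoding); despite
# the "_onehot" suffix in the column names, these are not one-hot vectors.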
tile_df['localization_onehot'] = tile_df.localization.map({'scalp':0, 'ear':1, 'face':2,
'neck':3,'back':4, 'trunk':5, 'chest':6,
'upper extremity':7, 'abdomen':8, 'lower extremity':9,
'genital':10, 'hand':11, 'foot':12, 'acral':13, 'unknown':14})
tile_df.head()
  dx_type   age   sex   localization  cell_type_idx  localization_onehot
0 histo     80.0  male  scalp         2              0
1 histo     80.0  male  scalp         2              0
2 histo     80.0  male  scalp         2              0
3 histo     80.0  male  scalp         2              0
4 histo     75.0  male  ear           2              1
tile_df['dx_type_onehot'] = tile_df.dx_type.map({'confocal': 0, 'consensus': 1,
                                                 'follow_up': 2, 'histo': 3})
tile_df.head()
  dx_type   age   sex   localization  cell_type_idx  localization_onehot  dx_type_onehot
0 histo     80.0  male  scalp         2              0                    3
1 histo     80.0  male  scalp         2              0                    3
2 histo     80.0  male  scalp         2              0                    3
3 histo     80.0  male  scalp         2              0                    3
4 histo     75.0  male  ear           2              1                    3
tile_df['gender_male'] = tile_df.sex.map({'female':0, 'male':1, 'unknown':2})
tile_df.head()
  dx_type   age   sex   localization  cell_type_idx  localization_onehot  dx_type_onehot  gender_male
0 histo     80.0  male  scalp         2              0                    3               1
1 histo     80.0  male  scalp         2              0                    3               1
2 histo     80.0  male  scalp         2              0                    3               1
3 histo     80.0  male  scalp         2              0                    3               1
4 histo     75.0  male  ear           2              1                    3               1
tile_df.columns
Index(['dx_type', 'age', 'sex', 'localization', 'cell_type_idx',
'localization_onehot', 'dx_type_onehot', 'gender_male'],
dtype='object')
features = ['age', 'localization_onehot', 'dx_type_onehot','gender_male']
X = tile_df[features]
y = tile_df['cell_type_idx'].values
X_train,X_test,y_train,y_test = train_test_split(X,y,random_state=0)
from xgboost import XGBClassifier
from sklearn.ensemble import RandomForestClassifier
model = XGBClassifier(random_state=1)
model = model.fit(X_train, y_train)
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
Accuracy: 72.16%
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
print('Expected Value: ', explainer.expected_value)
Expected Value: [-0.6287137, -0.21934628, 0.4661603, -1.7456617, 2.6632032, 0.5190712, -1.2845858]
shap.summary_plot(shap_values, X_test, plot_type="bar")
shap.summary_plot(shap_values[0], X_test)
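SHAP values are additive: for every sample, the expected value for a class plus the sum of that class's SHAP values reproduces the model's raw (pre-softmax) margin for that sample. A quick sanity check of this property, as a sketch assuming the explainer and shap_values computed above:
i = 0  # any test-set row index
# For tree ensembles, expected_value[c] + shap_values[c][i].sum()
# equals the raw margin of class c for sample i.
reconstructed_margin = explainer.expected_value[0] + shap_values[0][i].sum()
print("Reconstructed class-0 margin for sample", i, ":", reconstructed_margin)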
from sklearn.preprocessing import LabelEncoder
## Preprocess training and test target (y) after having performed train-test split
le = LabelEncoder()
y_multi_train = pd.Series(le.fit_transform(y_train))
y_multi_test = pd.Series(le.transform(y_test))
## Check classes
le.classes_
array([0, 1, 2, 3, 4, 5, 6], dtype=int8)
shap.initjs()
shap.dependence_plot('dx_type_onehot', interaction_index='age',
shap_values=shap_values[0],
features=X_test,
display_features=X_test)
shap.initjs()
shap.force_plot(explainer.expected_value[0], shap_values[0][:100,:], X_test.iloc[:100,:])
shap.initjs()
shap.force_plot(explainer.expected_value[0], shap_values[0][15,:], X_test.iloc[15,:])
Feature Importance:
Feature importance measures the increase in the model's prediction error after we
permute a feature's values.
A feature is "important" if shuffling its values increases the model error, because in that case
the model relied on the feature for its predictions.
A feature is "unimportant" if shuffling its values leaves the model error unchanged, because
in that case the model ignored the feature when predicting.
Now install eli5:
pip install eli5
import eli5
from eli5.sklearn import PermutationImportance
eli5.show_weights(model.get_booster(), top=15)
Weight   Feature
0.8239   dx_type_onehot
0.0748   age
0.0667   localization_onehot
0.0346   gender_male
tgt = 6
print('Reference:', y_test[tgt])
print('Predicted:', predictions[tgt])
eli5.show_prediction(model.get_booster(), X_test.iloc[tgt],
                     feature_names=features, show_feature_values=True)
Reference: 4
Predicted: 4
Per-class explanation from eli5.show_prediction (probability, raw score, and top feature contributions for each of the 7 classes):

Class  Probability  Score
y=0    0.000        -7.697
y=1    0.000        -5.861
y=2    0.000        -1.815
y=3    0.000        -2.777
y=4    1.000         7.013
y=5    0.000        -5.009
y=6    0.000        -4.376

For each class, eli5 lists the signed contribution of every feature (age, localization_onehot, dx_type_onehot, gender_male and the <BIAS> term) toward that class's score. For the predicted class y=4 the breakdown is:

Contribution  Feature              Value
+4.751        dx_type_onehot       2.000
+2.163        <BIAS>               1.000
+0.064        age                  50.000
+0.041        gender_male          0.000
-0.006        localization_onehot  9.000
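The feature importance text above describes permutation importance, but eli5.show_weights(model.get_booster()) reports XGBoost's built-in importances, and the imported PermutationImportance wrapper is never applied. A minimal sketch of permutation importance itself, assuming the fitted model and the X_test/y_test split created earlier:
# Shuffle each feature column in X_test repeatedly and measure the score drop.
perm = PermutationImportance(model, random_state=1).fit(X_test, y_test)
eli5.show_weights(perm, feature_names=features)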
PDP:
The partial dependence plot (PDP) shows the marginal effect that one or two features have
on the predicted outcome of a machine learning model.
A partial dependence plot can show whether the relationship between the target and a
feature is linear, monotonic or more complex.
For each of the categories, we get a PDP estimate by forcing all data instances to take the
same category.
pip install pdpbox
from pdpbox import pdp, get_dataset, info_plots
pdp_feat_67_rf = pdp.pdp_isolate(model=model,
                                 dataset=X_train,
                                 model_features=features,
                                 feature='dx_type_onehot')
fig, axes = pdp.pdp_plot(pdp_isolate_out=pdp_feat_67_rf,
                         feature_name='type of diagnosis',
                         center=True,
                         x_quantile=True,
                         ncols=3,
                         plot_lines=True,
                         frac_to_plot=100)
The PDP (Partial Dependence Plot) shows the relationship between an increase or decrease in one feature and the
model's prediction.
For example: in figure 1 (class 0), we observe that the chance of the skin disease belonging to class 0 increases when the
value of dx_type_onehot changes from 2 (follow-up) to 3 (histopathology).
Similarly, in figure 5 (class 4), we observe that the probability of the skin disease belonging to class 4 is extremely high when
the value of dx_type_onehot lies between 0 and 2, and decreases comparatively when it lies between 2 and 3.
Likewise, the probability of the skin disease belonging to class 6 is extremely low when the value of dx_type_onehot lies
between 0 and 2 (confocal, consensus and follow-up), and increases comparatively when it changes from 2 to 3.
LIME
Step 1: Generate random perturbations of the input image
Step 2: Predict the class for each perturbation
Step 3: Compute weights (importance) for the perturbations
Step 4: Fit an interpretable linear model using the perturbations, predictions and weights
import skimage.io
import skimage.segmentation
np.random.seed(222)
Xi = x_test[3]
preds = model.predict(Xi[np.newaxis,:,:,:])
top_pred_classes = preds[0].argsort()[-5:][::-1] # Save ids of top 5 classes
top_pred_classes
print(y_test[3])
skimage.io.imshow(Xi)
#Generate segmentation for image
superpixels = skimage.segmentation.quickshift(Xi, kernel_size=4,max_dist=200, ratio=0.2)
num_superpixels = np.unique(superpixels).shape[0]
skimage.io.imshow(skimage.segmentation.mark_boundaries(Xi, superpixels))
print("The number of super pixels generated")
num_superpixels
LIME is a technique that explains how the input features of a machine learning model
affect its predictions. For instance, for image classification tasks, LIME finds the region of
an image (a set of superpixels) with the strongest association with a prediction label.
LIME creates explanations by generating a new dataset of random perturbations (with
their respective predictions) around the instance being explained and then fitting a
weighted local surrogate model, i.e., a simple model that explains the individual prediction.
#Generate perturbations
num_perturb = 150
perturbations = np.random.binomial(1, 0.5, size=(num_perturb, num_superpixels))
#Create function to apply perturbations to images
import copy
def perturb_image(img, perturbation, segments):
    # keep only the superpixels switched "on" in this perturbation
    active_pixels = np.where(perturbation == 1)[0]
    mask = np.zeros(segments.shape)
    for active in active_pixels:
        mask[segments == active] = 1
    perturbed_image = copy.deepcopy(img)
    perturbed_image = perturbed_image * mask[:, :, np.newaxis]
    return perturbed_image
#Show example of perturbations
print(perturbations[0])
predictions = []
for pert in perturbations:
    perturbed_img = perturb_image(Xi, pert, superpixels)
    pred = model.predict(perturbed_img[np.newaxis, :, :, :])
    predictions.append(pred)
predictions = np.array(predictions)
print(predictions.shape)
skimage.io.imshow(perturb_image(Xi,perturbations[0],superpixels))
skimage.io.imshow(perturb_image(Xi,perturbations[11],superpixels))
skimage.io.imshow(perturb_image(Xi,perturbations[2],superpixels))
#Compute distances to original image
import sklearn.metrics
original_image = np.ones(num_superpixels)[np.newaxis,:] #Perturbation with all superpixels enabled
distances = sklearn.metrics.pairwise_distances(perturbations,original_image, metric='cosine').ravel()
print(distances.shape)
#Transform distances to a value between 0 and 1 (weights) using a kernel function
kernel_width = 0.25
weights = np.sqrt(np.exp(-(distances**2)/kernel_width**2)) #Kernel function
print(weights.shape)
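As a worked check of this kernel (assuming kernel_width = 0.25 as above): a perturbation at cosine distance 0.2 from the original image receives weight sqrt(exp(-0.2**2 / 0.25**2)) ~ 0.73, while one at distance 0.5 receives ~ 0.14, so perturbations closer to the original image dominate the surrogate fit.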
#Estimate linear model
from sklearn.linear_model import LinearRegression
class_to_explain = 4
simpler_model = LinearRegression()
simpler_model.fit(X=perturbations, y=predictions[:,:,class_to_explain],
sample_weight=weights)
coeff = simpler_model.coef_[0]
#Use coefficients from linear model to extract top features
num_top_features = 4
top_features = np.argsort(coeff)[-num_top_features:]
#Show only the superpixels corresponding to the top features
mask = np.zeros(num_superpixels)
mask[top_features]= True #Activate top superpixels
skimage.io.imshow(perturb_image(Xi,mask,superpixels))
Conclusion
This paper focused on various techniques for the classification of skin diseases. Automating
the process of skin disease identification and classification can be very helpful and also shortens the
time needed for diagnosis. This paper presented a survey of traditional (feature extraction based) and
CNN based approaches for skin disease classification. From the study it is concluded that, for the
traditional approach, the feature selection process is time consuming and the selection of relevant
features is very important. The deep learning algorithm CNN, by contrast, learns the features
automatically and efficiently; for feature extraction, CNN selects its filters intelligently as compared
with manually designed ones. Pre-trained models such as Inception v3, ResNet, VGG16, VGG19 and
AlexNet are trained on very large datasets with millions of general images and can be used with
transfer learning or fine-tuning. However, such a pre-trained model still has to be adapted to skin
disease images, since its original training data typically does not include them. Also, a CNN needs a
fairly large dataset for training so that it can learn effectively, compared with the traditional way of
skin disease classification.
References
[1] D.A. Okuboyejo, O.O. Olugbara and S.A. Odunaike, “Automating skin disease diagnosis using image classification,” In proceedings of
the world congress on engineering and computer science 2013 Oct 23, Vol. 2, pp. 850-854.
[2] A.A. Amarathunga, E.P. Ellawala, G.N. Abeysekar and C.R Amalraj, “Expert system for diagnosis of skin diseases,” International Journal
of Scientific & Technology Research, 2015 Jan 4;4(01):174-8.
[3] S. Chakraborty, K. Mali, S. Chatterjee, S. Anand, A. Basu, S. Banerjee, M. Das and A. Bhattacharya, “Image based skin disease detection
using hybrid neural network coupled bag-of-features,” In 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile
Communication Conference (UEMCON), 2017 Oct 19, pp. 242-246. IEEE.
[4] A. Esteva, B. Kuprel, R.A. Novoa, J. Ko, S.M. Swetter, H.M. Blau and S. Thrun, “Dermatologist-level classification of skin cancer with deep
neural networks,” Nature, 2017 Feb;542(7639):115-8.
[5] X. Zhang, S. Wang, J. Liu and C. Tao, “Towards improving diagnosis of skin diseases by combining deep neural network and human
knowledge,” BMC medical informatics and decision making, 2018 Jul;18(2):59.
[6] T.J. Brinker, A. Hekler, J.S. Utikal, N. Grabe, D. Schadendorf, J. Klode, C. Berking, T. Steeb, A.H. Enk and C. von Kalle, “Skin cancer
classification using convolutional neural networks: systematic review,” Journal of medical Internet research, 2018;20(10):e11936.
[7] R. Kulhalli, C. Savadikar and B. Garware, “A Hierarchical Approach to Skin Lesion Classification,” In Proceedings of the ACM India Joint
International Conference on Data Science and Management of Data 2019 Jan 3 (pp. 245-250).
[8] M.A. Khan, M.Y. Javed, M. Sharif, T. Saba and A. Rehman, “Multimodel deep neural network based features extraction and optimal
selection approach for skin lesion classification,” In 2019 international conference on computer and information sciences (ICCIS) 2019
Apr 3 (pp. 1-7) IEEE.
[9] J. Premaladha, S. Sujitha, M.L. Priya and K.S. Ravichandran, “A survey on melanoma diagnosis using image processing and soft
computing techniques,” Research Journal of Information Technology, 2014 May;6(2):65-80.
[10] S. Chatterjee, D. Dey, S. Munshi and S. Gorai, “Extraction of features from cross correlation in space and frequency domains for
classification of skin lesions,” Biomedical Signal Processing and Control, 2019 Aug 1,53:101581.
[11] M.S. Manerkar, U. Snekhalatha, S. Harsh, J. Saxena, S.P. Sarma and M. Anburajan, “Automated skin disease segmentation and
classification using multi-class SVM classifier”. 2016.
[12] N. Codella, J. Cai, M. Abedini, R. Garnavi, A. Halpern and J.R. Smith, “Deep learning, sparse coding, and SVM for melanoma recognition
in dermoscopy images,” In International workshop on machine learning in medical imaging, 2015 Oct 5 (pp. 118-126), Springer, Cham.
[13] P.M. Burlina, N.J. Joshi, E. Ng, S.D. Billings, A.W. Rebman and J.N. Aucott, “Automated detection of erythema migrans and other
confounding skin lesions via deep learning,” Computers in biology and medicine, 2019 Feb 1, 105:151-6.
[14] I. Zaqout, “Diagnosis of skin lesions based on dermoscopic images using image processing techniques,” In Pattern
Recognition - Selected Methods and Applications, 2019 Jul 15, IntechOpen.
[15] V.B. Kumar, S.S. Kumar and V. Saboo, “Dermatological disease detection using image processing and machine learning,” In 2016 Third
International Conference on Artificial Intelligence and Pattern Recognition, (AIPR) 2016 Sep 19 (pp. 1-6). IEEE.
[16] E. Jana, R. Subban and S. Saraswathi, “Research on Skin Cancer Cell Detection using Image Processing,” In 2017 IEEE International
Conference on Computational Intelligence and Computing Research (ICCIC), 2017 Dec 14, (pp. 1-8), IEEE.
[17] M. Monisha, A. Suresh and M.R. Rashmi, “Artificial intelligence based skin classification using GMM,” Journal of medical systems, 2019
Jan 1, 43(1):3.
[18] N.C. Codella, D. Gutman, M.E. Celebi, B. Helba, M.A. Marchetti, S.W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittler and A. Halpern,
“Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi),
hosted by the international skin imaging collaboration (isic),” In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI
2018), 2018 Apr 4, (pp. 168-172), IEEE.
[…] https://www.cancercenter.com/cancer-types/melanoma/symptoms
[…] https://www.mayoclinic.org/diseases-conditions
[…] https://www.isic-archive.com
[…] https://sites.google.com/site/robustmelanomascreening/dataset
[…] https://www.dropbox.com/s/k88qukc20ljnbuo/PH2Dataset.rar
[30] http://www.cs.rug.nl/~imaging/databases/melanoma_naevi/
[31] https://www.derm101.com/image-library/?match=IN
[…] N. Yadav, V.K. Narang and U. Shrivastava, “Skin diseases detection models using image processing: A survey,” International Journal of
Computer Applications, 2016 Mar, 137(12):34-9.
[…] N. Gessert, T. Sentker, F. Madesta, R. Schmitz, H. Kniep, I. Baltruschat, R. Werner and A. Schlaefer, “Skin Lesion Classification Using
CNNs with Patch-Based Attention and Diagnosis-Guided Loss Weighting,” IEEE Transactions on Biomedical Engineering, 2019 May 9.
[38] https://www.biospectrumindia.com/news/73/8437/skin-diseases-togrow-in-india-by-2015-report.html
[39] https://www.who.int/uv/faq/skincancer/en/index1.html
[…] https://towardsdatascience.com
[…] www.analyticsvidhya.com
More Related Content

Similar to Vishnu Vardhan Project.pdf

Smartphone app for Skin Cancer Diagnostics
Smartphone app for Skin Cancer Diagnostics Smartphone app for Skin Cancer Diagnostics
Smartphone app for Skin Cancer Diagnostics Iowa State University
 
Common Skin Disease Diagnosis and Prediction: A Review
Common Skin Disease Diagnosis and Prediction: A ReviewCommon Skin Disease Diagnosis and Prediction: A Review
Common Skin Disease Diagnosis and Prediction: A ReviewIRJET Journal
 
Computer Vision in Health Care (1).pptx
Computer Vision in Health Care (1).pptxComputer Vision in Health Care (1).pptx
Computer Vision in Health Care (1).pptxAishwaryaKulal1
 
A review of human skin detection applications based on image processing
A review of human skin detection applications based on image processingA review of human skin detection applications based on image processing
A review of human skin detection applications based on image processingjournalBEEI
 
Skin Cancer Detection Application
Skin Cancer Detection ApplicationSkin Cancer Detection Application
Skin Cancer Detection ApplicationIRJET Journal
 
Detection of Skin Cancer Based on Skin Lesion Images UsingDeep Learning
Detection of Skin Cancer Based on Skin Lesion Images UsingDeep LearningDetection of Skin Cancer Based on Skin Lesion Images UsingDeep Learning
Detection of Skin Cancer Based on Skin Lesion Images UsingDeep LearningIRJET Journal
 
Skin Cancer Detection using Digital Image Processing and Implementation using...
Skin Cancer Detection using Digital Image Processing and Implementation using...Skin Cancer Detection using Digital Image Processing and Implementation using...
Skin Cancer Detection using Digital Image Processing and Implementation using...ijtsrd
 
SKIN CANCER DETECTION AND SEVERITY PREDICTION USING DEEP LEARNING
SKIN CANCER DETECTION AND SEVERITY PREDICTION USING DEEP LEARNINGSKIN CANCER DETECTION AND SEVERITY PREDICTION USING DEEP LEARNING
SKIN CANCER DETECTION AND SEVERITY PREDICTION USING DEEP LEARNINGIRJET Journal
 
Diagnosis of Burn Images using Template Matching, k-Nearest Neighbor and Arti...
Diagnosis of Burn Images using Template Matching, k-Nearest Neighbor and Arti...Diagnosis of Burn Images using Template Matching, k-Nearest Neighbor and Arti...
Diagnosis of Burn Images using Template Matching, k-Nearest Neighbor and Arti...CSCJournals
 
SKin lesion detection using ml approach.pptx
SKin lesion detection using ml approach.pptxSKin lesion detection using ml approach.pptx
SKin lesion detection using ml approach.pptxPrachiPancholi5
 
IRJET- Skin Cancer Detection using Digital Image Processing
IRJET- Skin Cancer Detection using Digital Image ProcessingIRJET- Skin Cancer Detection using Digital Image Processing
IRJET- Skin Cancer Detection using Digital Image ProcessingIRJET Journal
 
DETECTION AND CLASSIFICATION OF SKIN DISEASE USING DEEP LEARNING
DETECTION AND CLASSIFICATION OF SKIN DISEASE USING DEEP LEARNINGDETECTION AND CLASSIFICATION OF SKIN DISEASE USING DEEP LEARNING
DETECTION AND CLASSIFICATION OF SKIN DISEASE USING DEEP LEARNINGIRJET Journal
 
IRJET- Cancer Detection Techniques - A Review
IRJET- Cancer Detection Techniques - A ReviewIRJET- Cancer Detection Techniques - A Review
IRJET- Cancer Detection Techniques - A ReviewIRJET Journal
 
A Review of Super Resolution and Tumor Detection Techniques in Medical Imaging
A Review of Super Resolution and Tumor Detection Techniques in Medical ImagingA Review of Super Resolution and Tumor Detection Techniques in Medical Imaging
A Review of Super Resolution and Tumor Detection Techniques in Medical Imagingijtsrd
 
Skin Cancer Detection and Classification
Skin Cancer Detection and ClassificationSkin Cancer Detection and Classification
Skin Cancer Detection and ClassificationDr. Amarjeet Singh
 
An Innovative Approach for Automated Skin Disease Identification through Adva...
An Innovative Approach for Automated Skin Disease Identification through Adva...An Innovative Approach for Automated Skin Disease Identification through Adva...
An Innovative Approach for Automated Skin Disease Identification through Adva...IRJET Journal
 
Melanoma Skin Cancer Detection using Deep Learning
Melanoma Skin Cancer Detection using Deep LearningMelanoma Skin Cancer Detection using Deep Learning
Melanoma Skin Cancer Detection using Deep LearningIRJET Journal
 
A Survey On Melanoma: Skin Cancer Through Computerized Diagnosis
A Survey On Melanoma: Skin Cancer Through Computerized DiagnosisA Survey On Melanoma: Skin Cancer Through Computerized Diagnosis
A Survey On Melanoma: Skin Cancer Through Computerized DiagnosisChristo Ananth
 
Comparing the performance of linear regression versus deep learning on detect...
Comparing the performance of linear regression versus deep learning on detect...Comparing the performance of linear regression versus deep learning on detect...
Comparing the performance of linear regression versus deep learning on detect...journalBEEI
 

Similar to Vishnu Vardhan Project.pdf (20)

Smartphone app for Skin Cancer Diagnostics
Smartphone app for Skin Cancer Diagnostics Smartphone app for Skin Cancer Diagnostics
Smartphone app for Skin Cancer Diagnostics
 
Skin Cancer Diagnostics.pdf
Skin Cancer Diagnostics.pdfSkin Cancer Diagnostics.pdf
Skin Cancer Diagnostics.pdf
 
Common Skin Disease Diagnosis and Prediction: A Review
Common Skin Disease Diagnosis and Prediction: A ReviewCommon Skin Disease Diagnosis and Prediction: A Review
Common Skin Disease Diagnosis and Prediction: A Review
 
Computer Vision in Health Care (1).pptx
Computer Vision in Health Care (1).pptxComputer Vision in Health Care (1).pptx
Computer Vision in Health Care (1).pptx
 
A review of human skin detection applications based on image processing
A review of human skin detection applications based on image processingA review of human skin detection applications based on image processing
A review of human skin detection applications based on image processing
 
Skin Cancer Detection Application
Skin Cancer Detection ApplicationSkin Cancer Detection Application
Skin Cancer Detection Application
 
Detection of Skin Cancer Based on Skin Lesion Images UsingDeep Learning
Detection of Skin Cancer Based on Skin Lesion Images UsingDeep LearningDetection of Skin Cancer Based on Skin Lesion Images UsingDeep Learning
Detection of Skin Cancer Based on Skin Lesion Images UsingDeep Learning
 
Skin Cancer Detection using Digital Image Processing and Implementation using...
Skin Cancer Detection using Digital Image Processing and Implementation using...Skin Cancer Detection using Digital Image Processing and Implementation using...
Skin Cancer Detection using Digital Image Processing and Implementation using...
 
SKIN CANCER DETECTION AND SEVERITY PREDICTION USING DEEP LEARNING
SKIN CANCER DETECTION AND SEVERITY PREDICTION USING DEEP LEARNINGSKIN CANCER DETECTION AND SEVERITY PREDICTION USING DEEP LEARNING
SKIN CANCER DETECTION AND SEVERITY PREDICTION USING DEEP LEARNING
 
Diagnosis of Burn Images using Template Matching, k-Nearest Neighbor and Arti...
Diagnosis of Burn Images using Template Matching, k-Nearest Neighbor and Arti...Diagnosis of Burn Images using Template Matching, k-Nearest Neighbor and Arti...
Diagnosis of Burn Images using Template Matching, k-Nearest Neighbor and Arti...
 
SKin lesion detection using ml approach.pptx
SKin lesion detection using ml approach.pptxSKin lesion detection using ml approach.pptx
SKin lesion detection using ml approach.pptx
 
IRJET- Skin Cancer Detection using Digital Image Processing
IRJET- Skin Cancer Detection using Digital Image ProcessingIRJET- Skin Cancer Detection using Digital Image Processing
IRJET- Skin Cancer Detection using Digital Image Processing
 
DETECTION AND CLASSIFICATION OF SKIN DISEASE USING DEEP LEARNING
DETECTION AND CLASSIFICATION OF SKIN DISEASE USING DEEP LEARNINGDETECTION AND CLASSIFICATION OF SKIN DISEASE USING DEEP LEARNING
DETECTION AND CLASSIFICATION OF SKIN DISEASE USING DEEP LEARNING
 
IRJET- Cancer Detection Techniques - A Review
IRJET- Cancer Detection Techniques - A ReviewIRJET- Cancer Detection Techniques - A Review
IRJET- Cancer Detection Techniques - A Review
 
A Review of Super Resolution and Tumor Detection Techniques in Medical Imaging
A Review of Super Resolution and Tumor Detection Techniques in Medical ImagingA Review of Super Resolution and Tumor Detection Techniques in Medical Imaging
A Review of Super Resolution and Tumor Detection Techniques in Medical Imaging
 
Skin Cancer Detection and Classification
Skin Cancer Detection and ClassificationSkin Cancer Detection and Classification
Skin Cancer Detection and Classification
 
An Innovative Approach for Automated Skin Disease Identification through Adva...
An Innovative Approach for Automated Skin Disease Identification through Adva...An Innovative Approach for Automated Skin Disease Identification through Adva...
An Innovative Approach for Automated Skin Disease Identification through Adva...
 
Melanoma Skin Cancer Detection using Deep Learning
Melanoma Skin Cancer Detection using Deep LearningMelanoma Skin Cancer Detection using Deep Learning
Melanoma Skin Cancer Detection using Deep Learning
 
A Survey On Melanoma: Skin Cancer Through Computerized Diagnosis
A Survey On Melanoma: Skin Cancer Through Computerized DiagnosisA Survey On Melanoma: Skin Cancer Through Computerized Diagnosis
A Survey On Melanoma: Skin Cancer Through Computerized Diagnosis
 
Comparing the performance of linear regression versus deep learning on detect...
Comparing the performance of linear regression versus deep learning on detect...Comparing the performance of linear regression versus deep learning on detect...
Comparing the performance of linear regression versus deep learning on detect...
 

Recently uploaded

Kollam call girls Mallu aunty service 7877702510
Kollam call girls Mallu aunty service 7877702510Kollam call girls Mallu aunty service 7877702510
Kollam call girls Mallu aunty service 7877702510Vipesco
 
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...Anamika Rawat
 
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In AhmedabadGENUINE ESCORT AGENCY
 
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...khalifaescort01
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...tanya dube
 
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...Anamika Rawat
 
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service AvailableCall Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Availableperfect solution
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...chandars293
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋TANUJA PANDEY
 
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls ServiceGENUINE ESCORT AGENCY
 
Call Girls Amritsar Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Amritsar Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Amritsar Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Amritsar Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Mumbai Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Mumbai Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Mumbai Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Mumbai Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Ishani Gupta
 
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...parulsinha
 
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service AvailableCall Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service AvailableJanvi Singh
 
Most Beautiful Call Girl in Bangalore Contact on Whatsapp
Most Beautiful Call Girl in Bangalore Contact on WhatsappMost Beautiful Call Girl in Bangalore Contact on Whatsapp
Most Beautiful Call Girl in Bangalore Contact on WhatsappInaaya Sharma
 
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...khalifaescort01
 

Recently uploaded (20)

Kollam call girls Mallu aunty service 7877702510
Kollam call girls Mallu aunty service 7877702510Kollam call girls Mallu aunty service 7877702510
Kollam call girls Mallu aunty service 7877702510
 
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
 
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
 
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
 
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
 
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service AvailableCall Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
 
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
 
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
 
Call Girls Amritsar Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Amritsar Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Amritsar Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Amritsar Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Mumbai Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Mumbai Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Mumbai Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Mumbai Just Call 8250077686 Top Class Call Girl Service Available
 
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
 
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
 
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service AvailableCall Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
 
Most Beautiful Call Girl in Bangalore Contact on Whatsapp
Most Beautiful Call Girl in Bangalore Contact on WhatsappMost Beautiful Call Girl in Bangalore Contact on Whatsapp
Most Beautiful Call Girl in Bangalore Contact on Whatsapp
 
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
 

Vishnu Vardhan Project.pdf

  • 1. EXCEL ENGINEERING COLLEGE (AUTONOMOUS) A Project Report On “Skin Disease Classification from Image” SUBMITTED BY Sura Vishnu Vardhan Reddy (Regd.No:730921106108) LinkedIn Id: https://www.linkedin.com/in/sura-vishnu-vardhan-reddy-b68729269 GitHub Id: https://github.com/vishnu6643 SUBMITTED TO Pegasus Aerospace System Erode, Tamil Nadu, Pin – 638002
  • 2. All content following this page was uploaded by Sura Vishnu Vardhan Reddy on 25 April 2023. The user has requested enhancement of the downloaded file.
  • 3. Skin Disease Classification from Image Abstract Skin diseases are one of the most common types of health illnesses faced by the people for ages. The identification of skin disease mostly relies on the expertise of the doctors and skin biopsy results, which is a time-consuming process. An automated computer-based system for skin disease identification and classification through images is needed to improve the diagnostic accuracy as well as to handle the scarcity of human experts. Classification of skin disease from an image is a crucial task and highly depends on the features of the diseases considered in order to classify it correctly. Many skin diseases have highly similar visual characteristics, which add more challenges to the selection of useful features from the image. The accurate analysis of such diseases from the image would improve the diagnosis, accelerates the diagnostic time and leads to better and cost-effective treatment for patients. This paper presents the survey of different methods and techniques for skin disease classification namely; traditional or handcrafted feature-based as well as deep learning-based techniques. Keywords— Skin diseases, lesions, classification, deep learning, CNN, SVM. I – Introduction The largest organ of human body is “Skin”, an adult carry around 3.6 kg and 2 square meters of it. Skin acts as a waterproof, insulating shield, guarding the body against extremes of temperature, damaging UV lights, and harmful chemicals. With the rate of 10-12%, the population affected across India from skin disease is estimated at nearly 15.1 Crore in 2013 and which increases to 18.8 crores by 2015[38]. According to statistics provided by the World Health Organization [39] around 13 million melanoma skin cancer occurs globally each year, which shows skin diseases are growing very rapidly. There are many factors responsible for a disease to occur such as UV lights, pollution, poor immunity, and an unhealthy lifestyle. There are two major categories in which the lesions (spot) of skin disease are classified; benign and malignant skin lesions. Most of the skin lesions are benign in nature which is gentle and non- dangerous, whereas those which are dangerous for patient’s health and evil in nature are malignant skin lesions such as melanoma skin cancer. Diagnosis of skin disease from an image is a challenging problem as there exist many skin diseases. Researchers reported following problems during skin disease classification: 1) A disease may have many lesion types. 2) Many diseases may have a similar visual characteristic, which is often confusing for the dermatologist as well to identify the disease by visual inspection. 3) The varying skin colors and skin type (age) introduce more difficulty in computer-based diagnosis. Therefore, relevant feature selection for such diseases is very important in computer-based diagnosis in order to identify it correctly. The success of an automatic system rely on how accurately the system performs and does need image processing as well as machine learning tasks.
  • 4. There are many technologies available in the medical science for diagnosis of skin diseases. But the computer based automatic diagnosis is quite more useful for medical decision support and makes the entire process fast. For example, if such automated system is implemented in the healthcare centres, then patient does not have to suffer unnecessarily due to unavailability of experts. Further, it is non-invasive method of diagnosis therefore it is not painful. As per 2015 statistics of India [38], for approximately 121 crore of population there are about 6,000 dermatologists providing services in India. This means that for every 100,000 people, only 0.49 dermatologists are available in India as compared to 3.2 in many states of the US [38]. Due to recent advances in the technology large amount of medical data is produced daily and these data contains valuable and crucial information about the patients. The image based artificial intelligence is becoming more popular for certain diseases especially skin diseases. The diagnostic accuracy for computer-based system highly relies on the selection of relevant feature, classifier used and the availability of dataset as well as number of images on which the model has been trained. Now a day’s for pattern recognition and classification tasks the Convolution Neural Networks (CNN) are highly used. For better understanding of various works done by the researchers, we carry out a survey on different approaches used for the classification of the skin diseases. This paper is divided into four sections. Section II presents the background knowledge; type of images, and usage of traditional and deep learning based approaches for skin disease classification. Section III presents a survey on traditional or feature extraction based methods as well as CNN based approaches for skin disease identification and classification. Section IV presents the analysis and findings of traditional and CNN based methods and finally, Section V presents the conclusion
  • 5. II - BACKGROUND KNOWLEDGE This section is divided into three parts: Skin disease Image type, general process for skin disease classification using traditional techniques and using deep learning-based techniques. A. Clinical and Dermoscopic Images A clinical image is said to be the image of the patient's affected body part- such as an injury, skin lesion or it can be diagnostic image. The image is captured with normal or digital camera. This type of image may have different lightening, resolution and different angle depend on the type of camera used for capturing the image. For computer aided diagnosis, dermoscopic images are more useful. These images are produced using dermoscope [16], which is an instrument used by dermatologist to analyse the skin lesions. The dermoscope usually has uniform illumination and more contrast. As the device has bright illumination, the lesions are clear enough for visualization and recognition. Furthermore, processing of dermoscopic images become easy because the images have less noise. Fig.1 (a) illustrates the way to capture dermoscopic image, (b) presents the dermoscopic image and (c) shows the clinical image. (a) (d) (c) B. Skin Disease Classification using Traditional Approach; In the traditional approach, the handcrafted features are fed into the conventional classifier. Fig. 2 shows the general process of skin disease classification using the traditional approach. 1) Input Image Input Image Skin disease image databases for many diseases are available freely. However, some are fully or partially open source and others are commercially available. The input image can be of type dermoscopic or clinical based on the dataset used. Table I contains the information about the availability and details of various datasets. The widely used datasets are mentioned in the table. 2) Image pre-processing Image pre-processing is an important step and it is required because an image may contain many noises such as dermoscopic gel, air bubbles, and hairs.
  • 6. However, clinical images require more pre-processing as compared to dermoscopic because of parameters such as resolution, lightening condition, illumination, angle of image captured, size of skin area covered may vary and depends on the person who is capturing the image. These captured images could create problems in the subsequent stages. The skin hairs can be removed using different filters such as; median, average or Gaussian filter, morphological operations such as erosion and dilation, binary thresholding and software such as Dull Razor. For low contrast images; lesion or contrast enhancement algorithms are useful. The contrast enhancement with histogram equalization provides better visualization by uniform distribution of pixel intensity across the image and it is one of the most used techniques in literature. For salt and pepper kind of noise; a median or mean filter can give better noise removal results. Dataset Image No. Images Classes Open source Derm Net NZ image library. Clinical 20000+ - Partially Dermofit Image Library. Dermoscopic 1300 10 Yes ISBI-2016. Dermoscopic 1279 2 Yes ISBI – 2017 Dermoscopic 2750 2 Yes Ham10000 Dermoscopic 10015 7 Yes Stanford Hospital Clinical - - No Pecking Union medical college clinical database Dermoscopic 28000 - No IRMA Dataset Dermoscopic 747 2 Not Available PH2 Dermoscopic 200 2 Yes MED-NODE Clinical 170 2 Yes DermQuest Clinical 22500 - Yes Hospital Pedro Hispano, Matosinhos Dermoscopic 200 3 No SD-198 Clinical and Dermoscopic 6584 198 Yes
  • 7. 3) Image Segmentation Image segmentation extracts the disease affected area from the normal skin and can play very important role in skin disease detection [16]. Image segmentation can be carried out by three ways: 1) pixel-based, 2) edge-based, and 3) region-based segmentation. In pixel-based segmentation, each pixel of an image is identified to be the part of a homogeneous region or to an object. This can be done using binary thresholding or variant of it. The edge-based method detects and links edge pixels to form the bounding shape of the skin lesions. For example, Robert, Prewitt, Sobel and Canny operators, adaptive snake or gradient vector flow can be used. The Region-based methods rely on similar patterns in the intensity values within the neighborhood pixels and are based on continuity. The examples are region growing, merging and splitting, and Watershed algorithm. 4) Feature Extraction The most prominent features which are used to describe and identify skin diseases visually are its color and texture information. The color information plays an important role to distinguish one disease from another. These color features can be extracted using various techniques such as color histograms, color correlograms, color descriptors, GLCM. The texture information conveys the complex visual patterns of the skin lesions and spatially organized entities such as brightness, color, shape, and size. Image texture is basically a function of variation in pixel intensity. GLCM, local binary pattern, SIFT are some techniques used by researchers to get the texture information from the image. In addition to color and texture, each lesion may have different shapes and sizes based on the type of the disease and its severity. 5) Classification Classification is a supervised learning approach for machine learning task. It requires labelled dataset to map the data into specific groups or classes. There are various classification algorithms used to classify the skin disease images such as support vector machine, feed forward neural network, back propagation neural network, k-nearest neighbour, decision trees, etc.
  • 8. Deep Learning is a part of machine learning algorithm inspired by the structure and function of human brain commonly known as neural networks. Convolution Neural Networks (CNN) is a class of deep learning algorithm which is mostly used for analysing the visual contents such as images and videos. With the development of CNN, there has been dramatic improvement observed to solve many classification-based problems in medical image analysis... The basic process for CNN based skin disease image classification is presented. CNN based approach of skin disease Classification The process starts with data acquisition. Input to the CNN can be dermoscopic or clinical image, which can be pre-processed if needed; the next step is data augmentation. This results in enough training samples to train the model. Finally, the data is fed into the CNN which performs feature extraction and classification by its own. A CNN typically consists of convolution layer in which numbers of filters perform convolution operation on the image and generates feature maps. These feature maps are further down sampled by pooling layers. Finally, the fully connected layer has all the connection from previous layer and does the classification accordingly. Many researchers have used CNN for skin disease classification via transfer learning or fine-tuning of pre-trained models like Inception v3, ResNet, VGG architecture and many more. In transfer learning only weights are optimized if new classification layers have to be added. However, the weights of the original model remain as it is. In fine-tuning the parameters of a trained model must be altered very carefully while trying to validate that model for a dataset with a smaller number of images which does not belong to the train set. Moreover, we need to keep track of the hyper parameters of CNN otherwise the model may have problem of over-fitting. Over-fitting means model learned too well, i.e., it also learns irrelevant information and noise as well which may result in good training accuracy but poor testing accuracy. III. SURVEY OF LITERATURE This section presents a survey on both traditional and deep learning-based skin disease identification and classification approaches. Table II and III analyses all major works for both the aforementioned techniques; traditional/handcrafted feature-based techniques and deep learning techniques for classification of skin disease from images. C. Skin Disease Classification using Deep Learning based Approach
  • 9. Amarathunga et al. It have come up with expert system limited to classify three diseases. The system consists of two separate units namely; data processing and Image processing unit. The data processing unit was responsible for image acquisition, pre-processing for noise removal, segmentation and feature extraction from the skin disease images whereas data processing unit was employed for data mining task or classification. Five classification algorithms were tested by the authors namely; AdaBoost, BayesNet, J48, MLP and NaiveBayes. Out of these five the MLP classifier gave better results as compared to other classifiers. However, the data source of images and attributes considered for disease classification is not mentioned. Chakraborty et al. [3] have proposed a hybrid model using multi objective optimization algorithm NSGA-II and ANN for diagnosis of skin lesion being benign or malignant. The bagof-features approach is applied to classify the skin lesions and are generated using SIFT. SIFT algorithm identifies and locates the key points from the input image and generates the feature vector. Also, to handle large number of keypoints kmeans clustering algorithm was used to get representative keypoints where each cluster contains some representative keypoints and these are the generated bag-of- features. These features are then fed to the hybrid classifier where NSGA-II is used to train the ANN. Authors [3] also compared the model’s accuracy with ANN-PSO (ANN trained with particle swarm optimization) and ANN-CS (ANN trained with cukoo search.) The spatial and frequency domain-based technique is used by Chatterjee et al. It is for identification of skin lesion being benign or malignant. The malignant lesions are further classified into subcategories namely; melanocytic or epidermal skin lesions. The cross-correlation technique is used to extract regional features which are invariant to light intensity and illumination changes. Also, the cross spectrum-based frequency domain analysis has been used for retrieving more detailed features of skin lesions. For classification the SVM classifier was used with three non-linear kernels [10] out of which SVM with RBF kernel gave promising accuracy as compared to other kernels. A. Survey on Traditional Techniques for Skin Disease Image Classification
  • 10. TABLE II. SURVEY OF TRADITIONAL TECHNIQUES FOR SKIN DISEASE CLASSIFICATION
Amarathunga [2]. Diseases: eczema, impetigo, melanoma. Image type: clinical. No. of images: not reported. Pre-processing: yes. Segmentation: thresholding. Feature extraction: not reported. Classifier: MLP. Performance: accuracy 90%.
Chakraborty [3]. Diseases: BCC, SA. Image type: dermoscopic. No. of images: not reported. Pre-processing: no. Segmentation: thresholding. Feature extraction: SIFT. Classifier: NN-NSGA-II. Performance: accuracy 90.56%, precision 88.26%, recall 93.64%, F-measure 90.87%.
Manerkar [11]. Diseases: warts, benign & malignant skin cancer. Image type: clinical. No. of images: 45. Pre-processing: yes. Segmentation: C-means clustering and watershed algorithm. Feature extraction: GLCM and IQA. Classifier: SVM. Performance: accuracy 96-98%.
Zaqout [14]. Diseases: benign, malignant or suspicious lesions. Image type: dermoscopic. No. of images: 200. Pre-processing: yes. Segmentation: thresholding. Feature extraction: ABCD rule implementation using entropy, bifold, color and diameter. Classifier: TDS. Performance: accuracy 90%, sensitivity 85%, specificity 92.22%.
Chatterjee [10]. Diseases: melanoma, nevus, BCC, SK. Image type: dermoscopic. No. of images: 6,838. Pre-processing: no. Segmentation: none. Feature extraction: cross-correlation, cross spectrum. Classifier: SVM. Performance: accuracy 98.79%, sensitivity 99.01%, specificity 95.35%.
Arifin. Diseases: acne, eczema, psoriasis, tinea, vitiligo, scabies. Image type: clinical. No. of images: 704. Pre-processing: yes. Segmentation: thresholding. Feature extraction: GLCM. Classifier: feedforward backpropagation ANN. Performance: accuracy 94.04%.
Monisha [17]. Diseases: BCC, SA, lentigo simplex. Image type: dermoscopic. No. of images: not reported. Pre-processing: yes. Segmentation: GMM. Feature extraction: GLCM, DRLBP & GRLTP. Classifier: NSGA-II-PNN. Performance: not reported.
a. Disease: SK = seborrheic keratosis, BCC = basal cell carcinoma, SA = skin angioma. Classifier: TDS (Total Dermoscopic Score = asymmetry x 1.3 + border irregularity x 0.1 + color x 0.5 + diameter x 0.5), NSGA-II = Non-dominated Sorting Genetic Algorithm, PNN = probabilistic neural network. Feature extraction: GMM = Gaussian mixture model, GLCM = grey-level co-occurrence matrix, IQA = image quality assessment.
  • 11. B. Survey on Deep Learning-based Approaches for Skin Disease Image Classification
Esteva et al. [4] were the first to report that a convolutional neural network (CNN) image classifier can achieve performance similar to that of 21 board-certified dermatologists for identification of malignant lesions. A 3-way disease partition algorithm was designed to classify a given skin lesion as malignant, benign or non-neoplastic. A 9-way disease partition was also performed to classify a given lesion into one of nine categories. The state-of-the-art Inception v3 CNN architecture was used for skin lesion classification, and the authors concluded that a CNN can outperform human experts if it is trained with enough data. Zhang et al. [5] also used the Inception v3 architecture with a modified final layer to classify four diseases. The model was trained on two nearly similar datasets of dermoscopic images. The authors concluded that misclassification can occur due to the presence of multiple disease lesions in a single skin image. Sun et al. [24] proposed both handcrafted feature-based and CNN-based approaches for classification of clinical images. They trained four CNN architectures, namely CaffeNet, fine-tuned CaffeNet, VGGNet and fine-tuned VGGNet. Out of these four, the fine-tuned VGGNet gave quite good accuracy. The accuracy of VGGNet was similar to that of the handcrafted features, which were generated by seven different methods including SIFT, HOG, LBP and color histograms with an SVM classifier. However, the architecture and the use of a benchmark dataset play an important role in achieving good accuracy for skin disease image classification. Gessert et al. introduced a patch-based method (sketched at the end of this subsection) to obtain fine-grained differences between various skin lesions from high-resolution images. The high-resolution images are divided into 5, 9 or 16 crops or patches, and these image patches are fed to standard CNN architectures. Three architectures were used by the authors, namely Inception v3, DenseNet and SE-ResNeXt50, for prediction of disease from high-resolution image patches. Rehman et al. [22] proposed a CNN architecture with 16 different filters of 7x7 kernel size and pooling layers for down-sampling. The proposed model was trained on malignant and benign categories of diseases, namely melanoma, seborrheic keratosis and nevus. The RGB channels of the segmented image are normalized to zero mean and unit variance. This normalized matrix is fed to the CNN for feature extraction; the fully connected part consists of a 3-layer ANN classifier which classifies the skin lesion as benign or malignant. Kulhalli et al. [7] proposed 5-stage, 3-stage and 2-stage hierarchical approaches to classify seven diseases using the Inception v3 CNN architecture. The authors addressed the class imbalance problem by using image augmentation to balance the category classes. The 5-stage classifier gave better results than the 2- and 3-stage hierarchical classifiers. Further, the authors suggested that the model could be fine-tuned further and that ensemble-based methods might help to improve the classification performance.
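To make the patch-based idea concrete, a small sketch that tiles a high-resolution image into an n x n grid of crops, each of which could then be fed to a standard CNN; the grid tiling and the idea of averaging per-patch predictions are general assumptions, not the exact procedure of Gessert et al.:

# Sketch: tile a high-resolution image into an n x n grid of equal crops,
# e.g. n=3 gives the "9 crops" setting mentioned above. Averaging the
# per-patch predictions afterwards is one simple way to combine them.
import numpy as np

def grid_crops(image, n):
    """Split an (H, W, C) image into n*n non-overlapping crops."""
    h, w = image.shape[0] // n, image.shape[1] // n
    return np.stack([image[i*h:(i+1)*h, j*w:(j+1)*w]
                     for i in range(n) for j in range(n)])

image = np.zeros((600, 450, 3), dtype=np.uint8)  # HAM10000-sized dummy image
patches = grid_crops(image, 3)
print(patches.shape)  # (9, 200, 150, 3) -- nine patches for the CNN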
  • 12. IV. ANALYSIS & FINDINGS
Both traditional and CNN-based approaches are useful for the classification of skin diseases. The traditional methods require appropriate feature extraction and segmentation methods for skin diseases. Further, it is important to identify the relevant features and discard irrelevant ones, as the classification often depends on the features selected; if irrelevant features are selected, misclassification may result. However, contrary to CNNs, the traditional approach does not require a large dataset. A CNN can learn the features of the skin diseases automatically. It selects its filters intelligently, in contrast to the manual selection of filters in the traditional approach for extracting the relevant features from the images. Therefore, no separate feature extraction method is needed in a CNN-based approach. Pre-trained models can be used to classify skin diseases, but these models are heavy in terms of: 1) the number of parameters, 2) the number of layers, 3) the selection and fine-tuning of an appropriate pre-trained model, and 4) the retraining required, since such models have not been trained on skin disease images. Alternatively, a CNN can be designed from scratch. The following criteria are important whenever a CNN architecture is designed to classify skin diseases:
1) Dataset: The availability of a large dataset is very important, as a CNN learns much more efficiently when it is trained with enough data. Large datasets of clinical images are available at [31], [32]; for dermoscopic images, large datasets are published by ISIC [27].
2) Hyperparameters of the CNN: The network structure is determined by the hyperparameters, which must be set before training the CNN. The parameters which define the network structure and training regime include the number of hidden layers, dropout, kernel size, number of kernels, batch size, number of epochs, activation function, learning rate, etc. (see the sketch after this list).
3) Computational power: The main challenge of training a CNN is the availability of computational resources. There are thousands of trainable parameters in a CNN; therefore, it is computationally costly compared to the traditional way of classifying skin disease. A GPU is practically a must for training a CNN. The training time is also longer and depends on the size of the dataset used to train the model.
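As an illustration of point 2, a hedged sketch of where those hyperparameters typically surface in a Keras workflow; every value below is a placeholder to be tuned, not a recommendation:

# Sketch: where the hyperparameters listed above appear in a Keras workflow.
# Every value here is a placeholder, not a recommendation.
import tensorflow as tf

hp = dict(kernel_size=(3, 3), n_kernels=32, dropout=0.25,
          activation='relu', learning_rate=1e-4, batch_size=16, epochs=50)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(hp['n_kernels'], hp['kernel_size'],
                           activation=hp['activation'],
                           input_shape=(100, 125, 3)),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Dropout(hp['dropout']),        # regularisation
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(7, activation='softmax'),
])
model.compile(tf.keras.optimizers.Adam(hp['learning_rate']),
              loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(..., batch_size=hp['batch_size'], epochs=hp['epochs'])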
  • 13. TABLE III. SURVEY OF DEEP LEARNING BASED SKIN DISEASE CLASSIFICATION
Sun et al. [24]. Disease classes: wide variety. Image type: clinical. No. of images: 6,584 (SD-198 [34]) and 5,619 (SD-128 [24]). CNN architecture: fine-tuned VGG19. Performance: accuracy 50.27%.
Esteva et al. [4]. Disease classes: malignant and benign skin lesions. Image type: clinical (129,450) and dermoscopic (3,374). Datasets: ISIC [27], Edinburgh Dermofit Library [33], Stanford Hospital [4]. CNN architecture: Inception V3 with PA (partition algorithm). Performance: accuracy 72.1 ± 0.9%.
Zhang et al. [5]. Disease classes: melanocytic nevus, SK, BCC, psoriasis. Image type: dermoscopic. No. of images: 1,067 (Dataset A [5]) and 522 (Dataset B [5]). CNN architecture: Inception v3. Performance: accuracy 87.25 ± 2.24% (Dataset A), 86.63 ± 5.78% (Dataset B).
Rehman et al. [22]. Disease classes: malignant and benign skin lesions. Image type: dermoscopic. No. of images: 379 (ISIC-2016 [27]). Additional: segmentation using generalized Gaussian distribution. CNN architecture: CNN with conv: 16 filters of 7x7; pooling layer: 16; FC: 100x50x5. Performance: accuracy 98.32%, sensitivity 98.15%, specificity 98.41%.
Brinker et al. [6]. Disease classes: melanoma and nevi. Image type: clinical and dermoscopic. No. of images: 12,378 (HAM10000 [27]). CNN architecture: ResNet50. Performance: mean sensitivity 89.4%, mean specificity 64.4%, ROC 0.769.
Kulhalli et al. [7]. Disease classes: melanoma, nevi, SK, akiec, BCC, DF, BKL. Image type: dermoscopic. No. of images: 10,015 (HAM10000 [27]). CNN architecture: InceptionV3. Performance: normalized F1 score 0.93.
Khan et al. [8]. Disease classes: melanoma vs. others. Image type: dermoscopic. No. of images: 1,279 (ISBI-16 [27]), 2,790 (ISBI-17 [27]), 10,000 (HAM10000 [27]). Additional: lesion enhancement. CNN architecture: ResNet50 and ResNet101. Performance: accuracy 90.20% (ISBI 2016), 95.60% (ISBI 2017), 89.8% (HAM10000).
  • 14. SKIN DISEASE CLASSIFICATION
This project applies deep learning to predict various skin diseases. The main objective is to achieve maximum accuracy in skin disease prediction. Deep learning techniques help detect skin disease at an initial stage, and feature extraction plays a key role in the classification of skin diseases. Using deep learning algorithms reduces the need for human labor such as manual feature extraction and data reconstruction for classification. Moreover, Explainable AI is used to interpret the decisions made by our model.

ABOUT THE DATASET
The HAM10000 ("Human Against Machine with 10000 training images") dataset is a large collection of multi-source dermatoscopic images of pigmented lesions. The dermatoscopic images were collected from different populations and acquired and stored by different modalities. The final dataset consists of 10,015 dermatoscopic images. It has 7 different classes of skin lesions, listed below:
• Melanocytic nevi
• Melanoma
• Benign keratosis-like lesions
• Basal cell carcinoma
• Actinic keratoses
• Vascular lesions
• Dermatofibroma

Importing Libraries

#Importing required libraries
import matplotlib.pyplot as plt
from PIL import Image
import seaborn as sns
import numpy as np
import pandas as pd
import os
from tensorflow.keras.utils import to_categorical
from glob import glob

✓ HAM10000_metadata.csv is the main csv file that includes the data for all training images; its features are:
1. lesion_id
2. image_id
3. dx
4. dx_type
5. age
6. sex
7. localization
  • 15. Reading the Data from the Dataset

# Reading the metadata from HAM10000_metadata.csv
df = pd.read_csv('../input/skin-cancer-mnist-ham10000/HAM10000_metadata.csv')
df.head()
df.dtypes

lesion_id        object
image_id         object
dx               object
dx_type          object
age             float64
sex              object
localization     object
dtype: object

     lesion_id      image_id   dx dx_type   age   sex localization
0  HAM_0000118  ISIC_0027419  bkl   histo  80.0  male        scalp
1  HAM_0000118  ISIC_0025030  bkl   histo  80.0  male        scalp
2  HAM_0002730  ISIC_0026769  bkl   histo  80.0  male        scalp
3  HAM_0002730  ISIC_0025661  bkl   histo  80.0  male        scalp
4  HAM_0001466  ISIC_0031633  bkl   histo  75.0  male          ear
  • 16. df.describe()

              age
count  9958.000000
mean     51.863828
std      16.968614
min       0.000000
25%      40.000000
50%      50.000000
75%      65.000000
max      85.000000

Data Cleaning
Removing NULL values and performing visualizations to gain insights into the dataset: univariate and bivariate analysis.

df.isnull().sum()

lesion_id        0
image_id         0
dx               0
dx_type          0
age             57
sex              0
localization     0
dtype: int64

The feature 'age' contains 57 null records, so we replace them with the mean of 'age', since dropping 57 records would lead to loss of data.
  • 17. df['age'].fillna(int(df['age'].mean()),inplace=True) df.isnull().sum() lesion_id 0 image_id 0 dx 0 dx_type 0 age 0 sex 0 localization 0 dtype: int64 lesion_type_dict = { 'nv': 'Melanocytic nevi', 'mel': 'Melanoma', 'bkl': 'Benign keratosis-like lesions ', 'bcc': 'Basal cell carcinoma', 'akiec': 'Actinic keratoses', 'vasc': 'Vascular lesions', 'df': 'Dermatofibroma' } base_skin_dir = '../input/skin-cancer-mnist-ham10000' # Merge images from both folders into one dictionary imageid_path_dict = {os.path.splitext(os.path.basename(x))[0]: x for x in glob(os.path.join(base_skin_dir, '*', '*.jpg'))} df['path'] = df['image_id'].map(imageid_path_dict.get) df['cell_type'] = df['dx'].map(lesion_type_dict.get) df['cell_type_idx'] = pd.Categorical(df['cell_type']).codes df.head()
  • 18. df.head() now also contains the image path, the readable cell type and its categorical code:

     lesion_id      image_id   dx dx_type   age   sex localization  path                                                cell_type                      cell_type_idx
0  HAM_0000118  ISIC_0027419  bkl   histo  80.0  male        scalp  ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions              2
1  HAM_0000118  ISIC_0025030  bkl   histo  80.0  male        scalp  ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions              2
2  HAM_0002730  ISIC_0026769  bkl   histo  80.0  male        scalp  ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions              2
3  HAM_0002730  ISIC_0025661  bkl   histo  80.0  male        scalp  ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions              2
4  HAM_0001466  ISIC_0031633  bkl   histo  75.0  male          ear  ../input/skin-cancer-mnist-ham10000/ham10000_i...  Benign keratosis-like lesions              2

Image Preprocessing

df['image'] = df['path'].map(lambda x: np.asarray(Image.open(x).resize((125,100))))

n_samples = 5
fig, m_axs = plt.subplots(7, n_samples, figsize=(4*n_samples, 3*7))
for n_axs, (type_name, type_rows) in zip(m_axs, df.sort_values(['cell_type']).groupby('cell_type')):
    n_axs[0].set_title(type_name)
    for c_ax, (_, c_row) in zip(n_axs, type_rows.sample(n_samples, random_state=2018).iterrows()):
        c_ax.imshow(c_row['image'])
        c_ax.axis('off')
fig.savefig('category_samples.png', dpi=300)

The images are resized because the original dimensions of 450 x 600 x 3 take a long time to process in neural networks.
  • 19. # See the image size distribution - should just return one row (all images are uniform) df['image'].map(lambda x: x.shape).value_counts() (100, 125, 3) 10015 Name: image, dtype: int64 Exploratory Data Analysis Exploratory data analysis can help detect obvious errors, identify outliers in datasets, understand relationships, unearth important factors, find patterns within data, and provide new insights.
  • 20.
df = df[df['age'] != 0]
df = df[df['sex'] != 'unknown']

plt.figure(figsize=(20,10))
plt.subplots_adjust(left=0.125, bottom=1, right=0.9, top=2, hspace=0.2)

plt.subplot(2,4,1)
plt.title("AGE", fontsize=15)
plt.ylabel("Count")
df['age'].value_counts().plot.bar()

plt.subplot(2,4,2)
plt.title("GENDER", fontsize=15)
plt.ylabel("Count")
df['sex'].value_counts().plot.bar()

plt.subplot(2,4,3)
plt.title("localization", fontsize=15)
plt.ylabel("Count")
plt.xticks(rotation=45)
df['localization'].value_counts().plot.bar()

plt.subplot(2,4,4)
plt.title("CELL TYPE", fontsize=15)
plt.ylabel("Count")
df['cell_type'].value_counts().plot.bar()

<AxesSubplot:title={'center':'CELL TYPE'}, ylabel='Count'>

1. Skin diseases are found to peak in people aged around 45 and are least common at ages 10 and below. We also observe that the probability of having a skin disease increases with age.
2. Skin diseases are more prominent in men than in women and other genders.
3. Skin diseases are most visible on the "back" of the body and least on the "acral surfaces" (such as limbs, fingers, or ears).
4. The most frequently found disease is Melanocytic nevi, while the least frequent is Dermatofibroma.
  • 21.
plt.figure(figsize=(15,10))
plt.subplot(1,2,1)
df['dx'].value_counts().plot.pie(autopct="%1.1f%%")
plt.subplot(1,2,2)
df['dx_type'].value_counts().plot.pie(autopct="%1.1f%%")
plt.show()

1. Type of skin disease:
• nv: Melanocytic nevi - 69.9%
• mel: Melanoma - 11.1%
• bkl: Benign keratosis-like lesions - 11.0%
• bcc: Basal cell carcinoma - 5.1%
• akiec: Actinic keratoses - 3.3%
• vasc: Vascular lesions - 1.4%
• df: Dermatofibroma - 1.1%
2. How the skin disease was discovered:
• histo - histopathology - 53.3%
• follow_up - follow-up examination - 37.0%
• consensus - expert consensus - 9.0%
• confocal - confirmation by in-vivo confocal microscopy - 0.7%
  • 22. BIVARIATE ANALYSIS

plt.figure(figsize=(25,10))
plt.title('LOCALIZATION VS GENDER', fontsize=15)
sns.countplot(y='localization', hue='sex', data=df)

<AxesSubplot:title={'center':'LOCALIZATION VS GENDER'}, xlabel='count', ylabel='localization'>

• The back is the most affected area, and infections there are more prominent in men.
• Infections on the lower extremity of the body are more visible in women.
• Some unknown regions also show infections, visible in men, women and other genders.
• The acral surfaces show the fewest infection cases, and only in men; other gender groups do not show this kind of infection.

plt.figure(figsize=(25,10))
plt.title('LOCALIZATION VS CELL TYPE', fontsize=15)
sns.countplot(y='localization', hue='cell_type', data=df)

<AxesSubplot:title={'center':'LOCALIZATION VS CELL TYPE'}, xlabel='count', ylabel='localization'>
  • 23.
• The face is affected the most by Benign keratosis-like lesions.
• Body parts other than the face are affected the most by Melanocytic nevi.

plt.figure(figsize=(25,10))
plt.subplot(131)
plt.title('AGE VS CELL TYPE', fontsize=15)
sns.countplot(y='age', hue='cell_type', data=df)
plt.subplot(132)
plt.title('GENDER VS CELL TYPE', fontsize=15)
sns.countplot(y='sex', hue='cell_type', data=df)

<AxesSubplot:title={'center':'GENDER VS CELL TYPE'}, xlabel='count', ylabel='sex'>

1. The age group between 0 and 75 years is affected the most by Melanocytic nevi, whereas people aged 80-90 are affected more by Benign keratosis-like lesions.
2. All gender groups are affected the most by Melanocytic nevi.

from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
  • 24. ANN

features = df.drop(columns=['cell_type_idx'], axis=1)
target = df['cell_type_idx']
features.head()

features.head() shows the metadata columns (lesion_id, image_id, dx, dx_type, age, sex, localization, path, cell_type) together with the 'image' column holding the resized pixel arrays, e.g. row 0: HAM_0000118, ISIC_0027419, bkl, histo, 80.0, male, scalp, ../input/skin-cancer-mnist-ham10000/ham10000_i..., Benign keratosis-like lesions, [[[189, 152, 194], [192, 156, 198], [191, 154, ...
  • 25.
x_train_o, x_test_o, y_train_o, y_test_o = train_test_split(features, target, test_size=0.25, random_state=666)
tf.unique(x_train_o.cell_type.values)

Unique(y=<tf.Tensor: shape=(7,), dtype=string, numpy=array([b'Melanocytic nevi', b'Basal cell carcinoma', b'Melanoma', b'Actinic keratoses', b'Vascular lesions', b'Benign keratosis-like lesions ', b'Dermatofibroma'], dtype=object)>, idx=<tf.Tensor: shape=(7440,), dtype=int32, numpy=array([0, 1, 2, ..., 1, 0, 0], dtype=int32)>)

x_train = np.asarray(x_train_o['image'].tolist())
x_test = np.asarray(x_test_o['image'].tolist())

x_train_mean = np.mean(x_train)
x_train_std = np.std(x_train)
x_test_mean = np.mean(x_test)
x_test_std = np.std(x_test)

x_train = (x_train - x_train_mean)/x_train_std
x_test = (x_test - x_test_mean)/x_test_std

# Perform one-hot encoding on the labels
y_train = to_categorical(y_train_o, num_classes=7)
y_test = to_categorical(y_test_o, num_classes=7)
y_test

array([[0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 1., 0., 0.]], dtype=float32)

x_train, x_validate, y_train, y_validate = train_test_split(x_train, y_train, test_size=0.1, random_state=999)

# Reshape images in 3 dimensions (height = 100, width = 125, channels = 3)
x_train = x_train.reshape(x_train.shape[0], *(100, 125, 3))
x_test = x_test.reshape(x_test.shape[0], *(100, 125, 3))
x_validate = x_validate.reshape(x_validate.shape[0], *(100, 125, 3))
  • 26.
# Flatten each 100 x 125 x 3 image into a single 37,500-dimensional vector for the dense network
x_train = x_train.reshape(6696, 125*100*3)
x_test = x_test.reshape(2481, 125*100*3)
print(x_train.shape)
print(x_test.shape)

(6696, 37500)
(2481, 37500)

# define the keras model
model = Sequential()
model.add(Dense(units=64, kernel_initializer='uniform', activation='relu', input_dim=37500))
model.add(Dense(units=64, kernel_initializer='uniform', activation='relu'))
model.add(Dense(units=64, kernel_initializer='uniform', activation='relu'))
model.add(Dense(units=64, kernel_initializer='uniform', activation='relu'))
model.add(Dense(units=7, kernel_initializer='uniform', activation='softmax'))

optimizer = tf.keras.optimizers.Adam(learning_rate=0.00075, beta_1=0.9, beta_2=0.999, epsilon=1e-8)

# compile the keras model
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

# fit the keras model on the dataset
history = model.fit(x_train, y_train, batch_size=10, epochs=50)

accuracy = model.evaluate(x_test, y_test, verbose=1)[1]
print("Test: accuracy = ", accuracy*100, "%")

(Per-epoch training output omitted.)

from keras.utils.vis_utils import plot_model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
  • 27. CNN
A CNN is well suited to image classification. It improves on the dense network above thanks to two properties: parameter sharing and dimensionality reduction. Because of parameter sharing in a CNN, the number of parameters is reduced and thus the computation decreases, as the comparison below shows.
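A quick way to see the parameter-sharing claim is to compare a single convolutional layer with a dense layer on the same input; the sizes below match this project's 100 x 125 x 3 images, and the 64-unit dense layer is an arbitrary comparison point:

# Sketch: parameter sharing in a convolution vs a fully connected layer
# on the same 100 x 125 x 3 input. The dense width of 64 is illustrative.
import tensorflow as tf

conv = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), input_shape=(100, 125, 3))])
dense = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(100, 125, 3)),
    tf.keras.layers.Dense(64)])

print(conv.count_params())   # 896     (3*3*3 weights per filter, shared over the image)
print(dense.count_params())  # 2400064 (37500 inputs * 64 units + 64 biases)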
  • 28. Since the data is limited, we apply data augmentation using ImageDataGenerator, which generates augmented images in real time while the model is training; random transformations can be applied to each training image as it is passed to the model (the augmentation itself is set up just before training, below). The CNN model is a repeated stack of the following layer types:
1. Convolutional
2. Pooling
3. Dropout
4. Flatten
5. Dense

from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization, Conv2D, MaxPool2D
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Set the CNN model
# The architecture is In -> [[Conv2D->relu]*2 -> MaxPool2D -> Dropout]*3 -> Flatten -> Dense*2 -> Dropout -> Out
input_shape = (100, 125, 3)
num_classes = 7

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='Same', input_shape=input_shape))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='Same'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.16))

model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='Same'))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='Same'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.20))

model.add(Conv2D(64, (3, 3), activation='relu', padding='Same'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='Same'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(num_classes, activation='softmax'))

model.summary()
  • 29. Model: "sequential_1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 100, 125, 32) 896 _________________________________________________________________ conv2d_1 (Conv2D) (None, 100, 125, 32) 9248 _________________________________________________________________ max_pooling2d (MaxPooling2D) (None, 50, 62, 32) 0 _________________________________________________________________ dropout (Dropout) (None, 50, 62, 32) 0 _________________________________________________________________ conv2d_2 (Conv2D) (None, 50, 62, 32) 9248 _________________________________________________________________ conv2d_3 (Conv2D) (None, 50, 62, 32) 9248 _________________________________________________________________ max_pooling2d_1 (MaxPooling2 (None, 25, 31, 32) 0 _________________________________________________________________ dropout_1 (Dropout) (None, 25, 31, 32) 0 _________________________________________________________________ conv2d_4 (Conv2D) (None, 25, 31, 64) 18496 _________________________________________________________________ conv2d_5 (Conv2D) (None, 25, 31, 64) 36928 _________________________________________________________________ max_pooling2d_2 (MaxPooling2 (None, 12, 15, 64) 0 _________________________________________________________________ dropout_2 (Dropout) (None, 12, 15, 64) 0 _________________________________________________________________ flatten (Flatten) (None, 11520) 0 _________________________________________________________________ dense_5 (Dense) (None, 256) 2949376 _________________________________________________________________ dense_6 (Dense) (None, 128) 32896 _________________________________________________________________ dropout_3 (Dropout) (None, 128) 0 _________________________________________________________________ dense_7 (Dense) (None, 7) 903 ================================================================= Total params: 3,067,239 Trainable params: 3,067,239 Non-trainable params: 0 # Define the optimizer optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False) # Compile the model model.compile(optimizer = optimizer , loss = "categorical_crossentropy", metrics=["accuracy"]) # Set a learning rate annealer learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy', patience=4, verbose=1, factor=0.5, min_lr=0.00001)
  • 30.
x_train, x_validate, y_train, y_validate = train_test_split(x_train, y_train, test_size=0.1, random_state=999)

# Reshape images in 3 dimensions (height = 100, width = 125, channels = 3)
x_train = x_train.reshape(x_train.shape[0], *(100, 125, 3))
x_test = x_test.reshape(x_test.shape[0], *(100, 125, 3))
x_validate = x_validate.reshape(x_validate.shape[0], *(100, 125, 3))

# With data augmentation to prevent overfitting
datagen = ImageDataGenerator(
    featurewise_center=False,             # set input mean to 0 over the dataset
    samplewise_center=False,              # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,   # divide each input by its std
    zca_whitening=False,                  # apply ZCA whitening
    rotation_range=10,                    # randomly rotate images by up to 10 degrees
    zoom_range=0.1,                       # randomly zoom images
    width_shift_range=0.12,               # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.12,              # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,                 # randomly flip images horizontally
    vertical_flip=True)                   # randomly flip images vertically

datagen.fit(x_train)

# Fit the model
epochs = 60
batch_size = 16
history = model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                              epochs=epochs,
                              validation_data=(x_validate, y_validate),
                              verbose=1,
                              steps_per_epoch=x_train.shape[0] // batch_size,
                              callbacks=[learning_rate_reduction])

from tensorflow.keras.metrics import Recall
from sklearn.metrics import classification_report, confusion_matrix
from keras.utils.vis_utils import plot_model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
  • 31. (Output: CNN architecture diagram rendered by plot_model.)
  • 32.
loss, accuracy = model.evaluate(x_test, y_test, verbose=1)
loss_v, accuracy_v = model.evaluate(x_validate, y_validate, verbose=1)
print("Validation: accuracy = %f ; loss_v = %f" % (accuracy_v, loss_v))
print("Test: accuracy = %f ; loss = %f" % (accuracy, loss))
model.save("model.h5")

78/78 [==============================] - 1s 8ms/step - loss: 0.6185 - accuracy: 0.7686
21/21 [==============================] - 0s 7ms/step - loss: 0.6881 - accuracy: 0.7433
Validation: accuracy = 0.743284 ; loss_v = 0.688070
Test: accuracy = 0.768642 ; loss = 0.618472

import itertools

# Function to plot confusion matrix
def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# Predict the values from the validation dataset
Y_pred = model.predict(x_validate)
# Convert prediction probabilities to class labels
Y_pred_classes = np.argmax(Y_pred, axis=1)
# Convert one-hot validation labels back to class labels
Y_true = np.argmax(y_validate, axis=1)
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
  • 33.
# Predict the values from the test dataset
Y_pred = model.predict(x_test)
# Convert prediction probabilities to class labels
Y_pred_classes = np.argmax(Y_pred, axis=1)
# Convert one-hot test labels back to class labels
Y_true = np.argmax(y_test, axis=1)
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes=range(7))
  • 34.
label_frac_error = 1 - np.diag(confusion_mtx) / np.sum(confusion_mtx, axis=1)
plt.bar(np.arange(7), label_frac_error)
plt.xlabel('True Label')
plt.ylabel('Fraction classified incorrectly')

Text(0, 0.5, 'Fraction classified incorrectly')

# # Function to plot the model's training/validation loss and accuracy
# def plot_model_history(model_history):
#     fig, axs = plt.subplots(1, 2, figsize=(15, 5))
#     # summarize history for accuracy
#     axs[0].plot(range(1, len(model_history.history['accuracy']) + 1), model_history.history['accuracy'])
#     axs[0].plot(range(1, len(model_history.history['val_accuracy']) + 1), model_history.history['val_accuracy'])
#     axs[0].set_title('Model Accuracy')
#     axs[0].set_ylabel('Accuracy')
#     axs[0].set_xlabel('Epoch')
#     axs[0].set_xticks(np.arange(1, len(model_history.history['accuracy']) + 1), len(model_history.history['accuracy']) / 10)
#     axs[0].legend(['train', 'val'], loc='best')
#     # summarize history for loss
#     axs[1].plot(range(1, len(model_history.history['loss']) + 1), model_history.history['loss'])
#     axs[1].plot(range(1, len(model_history.history['val_loss']) + 1), model_history.history['val_loss'])
#     axs[1].set_title('Model Loss')
#     axs[1].set_ylabel('Loss')
#     axs[1].set_xlabel('Epoch')
#     axs[1].set_xticks(np.arange(1, len(model_history.history['loss']) + 1), len(model_history.history['loss']) / 10)
#     axs[1].legend(['train', 'val'], loc='best')
#     plt.show()
# plot_model_history(history)
  • 35. Transfer Learning
Why MobileNet? MobileNet significantly reduces the number of parameters compared to a network with regular convolutions of the same depth, resulting in lightweight deep neural networks (see the parameter comparison after this slide's code). The two layer types used here in addition to those already used in the CNN are:
1. Batch Normalization
2. Zero Padding

import tensorflow
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

# Reload the images at 125 x 100 so the arrays match the (100, 125, 3)
# input shape the MobileNet is converted to below
df['image'] = df['path'].map(lambda x: np.asarray(Image.open(x).resize((125,100))))

features = df.drop(columns=['cell_type_idx'], axis=1)
target = df['cell_type_idx']

x_train_o, x_test_o, y_train_o, y_test_o = train_test_split(features, target, test_size=0.25, random_state=666)
tf.unique(x_train_o.cell_type.values)

Unique(y=<tf.Tensor: shape=(7,), dtype=string, numpy=array([b'Melanocytic nevi', b'Basal cell carcinoma', b'Melanoma', b'Vascular lesions', b'Benign keratosis-like lesions ', b'Actinic keratoses', b'Dermatofibroma'], dtype=object)>, idx=<tf.Tensor: shape=(7511,), dtype=int32, numpy=array([0, 1, 0, ..., 1, 0, 0], dtype=int32)>)

x_train = np.asarray(x_train_o['image'].tolist())
x_test = np.asarray(x_test_o['image'].tolist())

x_train_mean = np.mean(x_train)
x_train_std = np.std(x_train)
x_test_mean = np.mean(x_test)
x_test_std = np.std(x_test)

x_train = (x_train - x_train_mean)/x_train_std
x_test = (x_test - x_test_mean)/x_test_std

# Perform one-hot encoding on the labels
y_train = to_categorical(y_train_o, num_classes=7)
y_test = to_categorical(y_test_o, num_classes=7)
y_test

Due to the limited dataset, a pretrained MobileNet model is used.
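The parameter saving mentioned above comes from MobileNet's depthwise separable convolutions; a minimal sketch comparing a standard convolution with its separable counterpart (the layer sizes are illustrative, chosen only to make the counts easy to check):

# Sketch: why MobileNet is light -- a depthwise separable convolution uses
# far fewer parameters than a standard convolution of the same shape.
import tensorflow as tf

standard = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), input_shape=(100, 125, 32))])
separable = tf.keras.Sequential([
    tf.keras.layers.SeparableConv2D(64, (3, 3), input_shape=(100, 125, 32))])

print(standard.count_params())   # 18496 = 3*3*32*64 + 64 biases
print(separable.count_params())  # 2400  = 3*3*32 (depthwise) + 32*64 (pointwise) + 64 biases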
  • 36.
x_train, x_validate, y_train, y_validate = train_test_split(x_train, y_train, test_size=0.1, random_state=999)

# Reshape images in 3 dimensions (height = 100, width = 125, channels = 3),
# matching the input shape given to change_model below
x_train = x_train.reshape(x_train.shape[0], *(100, 125, 3))
x_test = x_test.reshape(x_test.shape[0], *(100, 125, 3))
x_validate = x_validate.reshape(x_validate.shape[0], *(100, 125, 3))
print(x_train.shape)

# create a copy of a mobilenet model
mobile = tensorflow.keras.applications.mobilenet.MobileNet()
mobile.summary()

def change_model(model, new_input_shape=(None, 40, 40, 3), custom_objects=None):
    # replace input shape of first layer (relies on the private _layers attribute)
    config = model.layers[0].get_config()
    config['batch_input_shape'] = new_input_shape
    model._layers[0] = model.layers[0].from_config(config)

    # rebuild model architecture by exporting and importing via json
    new_model = tensorflow.keras.models.model_from_json(model.to_json(), custom_objects=custom_objects)

    # copy weights from old model to new one
    for layer in new_model._layers:
        try:
            layer.set_weights(model.get_layer(name=layer.name).get_weights())
            print("Loaded layer {}".format(layer.name))
        except Exception:
            print("Could not transfer weights for layer {}".format(layer.name))
    return new_model

new_model = change_model(mobile, new_input_shape=[None] + [100, 125, 3])
new_model.summary()
  • 37.
# CREATE THE MODEL ARCHITECTURE
# Exclude the last 5 layers of the above model.
# This will include all layers up to and including global_average_pooling2d_1
x = new_model.layers[-6].output

# Create a new dense layer for predictions
# 7 corresponds to the number of classes
x = Dropout(0.25)(x)
predictions = Dense(7, activation='softmax')(x)

# inputs=new_model.input selects the input layer, outputs=predictions refers to the
# dense layer we created above.
model = Model(inputs=new_model.input, outputs=predictions)

# We need to choose how many layers we actually want to be trained.
# Here we are freezing the weights of all layers except the
# last 23 layers in the new model.
# The last 23 layers of the model will be trained.
for layer in model.layers[:-23]:
    layer.trainable = False

# Define Top2 and Top3 Accuracy
from tensorflow.keras.metrics import categorical_accuracy, top_k_categorical_accuracy

def top_3_accuracy(y_true, y_pred):
    return top_k_categorical_accuracy(y_true, y_pred, k=3)

def top_2_accuracy(y_true, y_pred):
    return top_k_categorical_accuracy(y_true, y_pred, k=2)

model.compile(Adam(lr=0.01), loss='categorical_crossentropy',
              metrics=[categorical_accuracy, top_2_accuracy, top_3_accuracy])

# Add weights to try to make the model more sensitive to melanoma.
# Note: cell_type_idx comes from pd.Categorical over the full class names,
# so the indices run alphabetically: 4 is Melanocytic nevi and 5 is Melanoma.
class_weights = {
    0: 1.0,  # Actinic keratoses (akiec)
    1: 1.0,  # Basal cell carcinoma (bcc)
    2: 1.0,  # Benign keratosis-like lesions (bkl)
    3: 1.0,  # Dermatofibroma (df)
    4: 1.0,  # Melanocytic nevi (nv)
    5: 3.0,  # Melanoma (mel) -- up-weighted for sensitivity
    6: 1.0,  # Vascular lesions (vasc)
}
  • 38.
filepath = "model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='val_top_3_accuracy', verbose=1, save_best_only=True, mode='max')
reduce_lr = ReduceLROnPlateau(monitor='val_top_3_accuracy', factor=0.5, patience=2, verbose=1, mode='max', min_lr=0.00001)
callbacks_list = [checkpoint, reduce_lr]

history = model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                              class_weight=class_weights,
                              validation_data=(x_validate, y_validate),
                              steps_per_epoch=x_train.shape[0] // batch_size,
                              epochs=10, verbose=1,
                              callbacks=callbacks_list)

from keras.utils.vis_utils import plot_model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

# get the metric names so we can use evaluate_generator
model.metrics_names

# Here the last epoch will be used.
val_loss, val_cat_acc, val_top_2_acc, val_top_3_acc = model.evaluate(datagen.flow(x_test, y_test, batch_size=16))
print('val_loss:', val_loss)
print('val_cat_acc:', val_cat_acc)
print('val_top_2_acc:', val_top_2_acc)
print('val_top_3_acc:', val_top_3_acc)

# Here the best epoch will be used.
model.load_weights('model.h5')
val_loss, val_cat_acc, val_top_2_acc, val_top_3_acc = model.evaluate_generator(datagen.flow(x_test, y_test, batch_size=16))
print('val_loss:', val_loss)
print('val_cat_acc:', val_cat_acc)
print('val_top_2_acc:', val_top_2_acc)
print('val_top_3_acc:', val_top_3_acc)
  • 39. Plot the Training Curves

# display the loss and accuracy curves
import matplotlib.pyplot as plt

acc = history.history['categorical_accuracy']
val_acc = history.history['val_categorical_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

train_top2_acc = history.history['top_2_accuracy']
val_top2_acc = history.history['val_top_2_accuracy']
train_top3_acc = history.history['top_3_accuracy']
val_top3_acc = history.history['val_top_3_accuracy']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.figure()

plt.plot(epochs, acc, 'bo', label='Training cat acc')
plt.plot(epochs, val_acc, 'b', label='Validation cat acc')
plt.title('Training and validation cat accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, train_top2_acc, 'bo', label='Training top2 acc')
plt.plot(epochs, val_top2_acc, 'b', label='Validation top2 acc')
plt.title('Training and validation top2 accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, train_top3_acc, 'bo', label='Training top3 acc')
plt.plot(epochs, val_top3_acc, 'b', label='Validation top3 acc')
plt.title('Training and validation top3 accuracy')
plt.legend()
plt.show()
  • 40. Create a Confusion Matrix

accuracy = model.evaluate(x_test, y_test, verbose=1)[1]
accuracy_v = model.evaluate(x_validate, y_validate)[1]
print("Validation: accuracy = ", accuracy_v)
print("Test: accuracy = ", accuracy)
model.save("model.h5")

# make a prediction
predictions = model.predict_generator(datagen.flow(x_test, y_test, batch_size=16), verbose=1)
predictions.shape

test_batches = datagen.flow(x_test, y_test, batch_size=16)
test_batches

# Source: Scikit Learn website
# http://scikit-learn.org/stable/auto_examples/
# model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-
# selection-plot-confusion-matrix-py
def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()

# Predict the values from the validation dataset
Y_pred = model.predict(x_validate)
# Convert prediction probabilities to class labels
  • 42.
Y_pred_classes = np.argmax(Y_pred, axis=1)
# Convert one-hot validation labels back to class labels
Y_true = np.argmax(y_validate, axis=1)
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes=range(7))

# Predict the values from the test dataset
Y_pred = model.predict(x_test)
# Convert prediction probabilities to class labels
Y_pred_classes = np.argmax(Y_pred, axis=1)
# Convert one-hot test labels back to class labels
Y_true = np.argmax(y_test, axis=1)
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes=range(7))

Generate the Classification Report

y_pred = model.predict(x_test)
y_pred = y_pred > 0.5

# index order follows pd.Categorical over the full class names (alphabetical)
cm_plot_labels = ['akiec', 'bcc', 'bkl', 'df', 'nv', 'mel', 'vasc']

from sklearn.metrics import classification_report
# Generate a classification report
report = classification_report(y_test, y_pred, target_names=cm_plot_labels)
print(report)

model.save("mobilenet_model.h5")
  • 43. Techniques applied: LIME, PDP, SHAP, etc.

tile_df = df.copy()
tile_df.drop('lesion_id', inplace=True, axis=1)
tile_df.drop('image_id', inplace=True, axis=1)
tile_df.drop('cell_type', inplace=True, axis=1)
tile_df.drop('path', inplace=True, axis=1)
tile_df.drop('dx', inplace=True, axis=1)
tile_df.head()

  dx_type   age   sex localization  cell_type_idx
0   histo  80.0  male        scalp              2
1   histo  80.0  male        scalp              2
2   histo  80.0  male        scalp              2
3   histo  80.0  male        scalp              2
4   histo  75.0  male          ear              2

X = tile_df.drop(['cell_type_idx'], axis=1).values
y = tile_df['cell_type_idx'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pip install alibi
pip install shap

import shap
shap.initjs()

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from alibi.explainers import KernelShap
from scipy.special import logit
  • 44.
from sklearn.metrics import confusion_matrix, plot_confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# (integer label encoding, despite the _onehot name)
tile_df['localization_onehot'] = tile_df.localization.map({'scalp':0, 'ear':1, 'face':2, 'neck':3, 'back':4, 'trunk':5, 'chest':6, 'upper extremity':7, 'abdomen':8, 'lower extremity':9, 'genital':10, 'hand':11, 'foot':12, 'acral':13, 'unknown':14})
tile_df.head()

  dx_type   age   sex localization  cell_type_idx  localization_onehot
0   histo  80.0  male        scalp              2                    0
1   histo  80.0  male        scalp              2                    0
2   histo  80.0  male        scalp              2                    0
3   histo  80.0  male        scalp              2                    0
4   histo  75.0  male          ear              2                    1

tile_df['dx_type_onehot'] = tile_df.dx_type.map({'confocal':0, 'consensus':1, 'follow_up':2, 'histo':3})
tile_df.head()

  dx_type   age   sex localization  cell_type_idx  localization_onehot  dx_type_onehot
0   histo  80.0  male        scalp              2                    0               3
1   histo  80.0  male        scalp              2                    0               3
2   histo  80.0  male        scalp              2                    0               3
3   histo  80.0  male        scalp              2                    0               3
4   histo  75.0  male          ear              2                    1               3
  • 45.
tile_df['gender_male'] = tile_df.sex.map({'female':0, 'male':1, 'unknown':2})
tile_df.head()

  dx_type   age   sex localization  cell_type_idx  localization_onehot  dx_type_onehot  gender_male
0   histo  80.0  male        scalp              2                    0               3            1
1   histo  80.0  male        scalp              2                    0               3            1
2   histo  80.0  male        scalp              2                    0               3            1
3   histo  80.0  male        scalp              2                    0               3            1
4   histo  75.0  male          ear              2                    1               3            1

tile_df.columns

Index(['dx_type', 'age', 'sex', 'localization', 'cell_type_idx',
       'localization_onehot', 'dx_type_onehot', 'gender_male'],
      dtype='object')

features = ['age', 'localization_onehot', 'dx_type_onehot', 'gender_male']
X = tile_df[features]
y = tile_df['cell_type_idx'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

from xgboost import XGBClassifier
from sklearn.ensemble import RandomForestClassifier

model = XGBClassifier(random_state=1)
model = model.fit(X_train, y_train)
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
  • 46. from sklearn.metrics import accuracy_score accuracy = accuracy_score(y_test, predictions) print("Accuracy: %.2f%%" % (accuracy * 100.0)) Accuracy: 72.16% explainer = shap.TreeExplainer(model) shap_values = explainer.shap_values(X_test) print('Expected Value: ', explainer.expected_value) Expected Value: [-0.6287137, -0.21934628, 0.4661603, -1.7456617, 2.6632032, 0.5190712, -1.2845858] shap.summary_plot(shap_values, X_test, plot_type="bar") shap.summary_plot(shap_values[0], X_test) from sklearn.preprocessing import LabelEncoder
  • 47. ## Preprocess training and test target (y) after having performed train-test split le = LabelEncoder() y_multi_train = pd.Series(le.fit_transform(y_train)) y_multi_test = pd.Series(le.transform(y_test)) ## Check classes le.classes_ array([0, 1, 2, 3, 4, 5, 6], dtype=int8) shap.initjs() shap.dependence_plot('dx_type_onehot', interaction_index='age', shap_values=shap_values[0], features=X_test, display_features=X_test) shap.initjs() shap.force_plot(explainer.expected_value[0], shap_values[0][:100,:], X_test.iloc[:100,:])
  • 48.
shap.initjs()
shap.force_plot(explainer.expected_value[0], shap_values[0][15,:], X_test.iloc[15,:])

Feature Importance: Feature importance measures the increase in the prediction error of the model after permuting a feature's values. A feature is "important" if shuffling its values increases the model error, because in that case the model relied on the feature for the prediction. A feature is "unimportant" if shuffling its values leaves the model error unchanged, because in that case the model ignored the feature for the prediction. (A sketch using eli5's PermutationImportance follows after the output below.)

Now install eli5:

pip install eli5

import eli5
from eli5.sklearn import PermutationImportance

eli5.show_weights(model.get_booster(), top=15)

Weight   Feature
0.8239   dx_type_onehot
0.0748   age
0.0667   localization_onehot
0.0346   gender_male

tgt = 6
print('Reference:', y_test[tgt])
print('Predicted:', predictions[tgt])
eli5.show_prediction(model.get_booster(), X_test.iloc[tgt], feature_names=features, show_feature_values=True)
  • 49.
Reference: 4
Predicted: 4

(Condensed eli5.show_prediction output: per-class scores were y=0: -7.697, y=1: -5.861, y=2: -1.815, y=3: -2.777, y=4: 7.013 with probability 1.000, y=5: -5.009, y=6: -4.376; for every class the dominant contribution comes from dx_type_onehot, with smaller contributions from age, localization_onehot and gender_male.)
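The weights shown above come from XGBoost's built-in importances; to compute the permutation importance described in the Feature Importance paragraph, eli5's PermutationImportance wrapper can be used. A minimal sketch, reusing the model, X_test, y_test and features defined earlier:

# Sketch: permutation importance as described above -- shuffle one feature
# at a time and measure the drop in score on held-out data.
from eli5.sklearn import PermutationImportance
import eli5

perm = PermutationImportance(model, random_state=1).fit(X_test, y_test)
eli5.show_weights(perm, feature_names=features)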
  • 50. PDP: The partial dependence plot shows the marginal effect that one or two features have on the predicted outcome of a machine learning model. A partial dependence plot can show whether the relationship between the target and a feature is linear, monotonic or more complex. For each of the categories, we get a PDP estimate by forcing all data instances to have the same category.

pip install pdpbox

from pdpbox import pdp, get_dataset, info_plots

pdp_feat_67_rf = pdp.pdp_isolate(model=model, dataset=X_train, model_features=features, feature='dx_type_onehot')
fig, axes = pdp.pdp_plot(pdp_isolate_out=pdp_feat_67_rf, feature_name='type of diagnosis', center=True,
                         x_quantile=True, ncols=3, plot_lines=True, frac_to_plot=100)

The PDP (partial dependence plot) shows the relation between an increase or decrease of one feature and the prediction of the model. For example, in figure 1 (class 0) we observe that the chance of the skin disease belonging to class 0 increases when the value of dx_type_onehot changes from 2 (follow-up) to 3 (histopathology). Similarly, in figure 5 (class 4), the probability of the skin disease belonging to class 4 is very high when dx_type_onehot lies between 0 and 2, and decreases comparatively when it lies between 2 and 3. Likewise, the probability of the skin disease belonging to class 6 is very low when dx_type_onehot lies between 0 and 2 (confocal, consensus and follow-up), and increases comparatively when it changes from 2 to 3.
  • 51. LIME
LIME is a technique that explains how the input features of a machine learning model affect its predictions. For image classification tasks, LIME finds the region of an image (a set of superpixels) with the strongest association with a prediction label. LIME creates explanations by generating a new dataset of random perturbations (with their respective predictions) around the instance being explained, and then fitting a weighted local surrogate model, i.e. a simple model that explains individual predictions.

Step 1: Generate random perturbations for the input image
Step 2: Predict the class for each perturbation
Step 3: Compute weights (importance) for the perturbations
Step 4: Fit an interpretable linear model using the perturbations, predictions and weights

import skimage.io
import skimage.segmentation

np.random.seed(222)
Xi = x_test[3]
preds = model.predict(Xi[np.newaxis,:,:,:])
top_pred_classes = preds[0].argsort()[-5:][::-1]  # Save ids of top 5 classes
top_pred_classes
print(y_test[3])
skimage.io.imshow(Xi)

# Generate superpixel segmentation for the image
superpixels = skimage.segmentation.quickshift(Xi, kernel_size=4, max_dist=200, ratio=0.2)
num_superpixels = np.unique(superpixels).shape[0]
skimage.io.imshow(skimage.segmentation.mark_boundaries(Xi, superpixels))
print("The number of superpixels generated")
num_superpixels
  • 52. #Generate perturbations num_perturb = 150 perturbations = np.random.binomial(1, 0.5, size=(num_perturb, num_superpixels)) #Create function to apply perturbations to images import copy def perturb_image(img,perturbation,segments): active_pixels = np.where(perturbation == 1)[0] mask = np.zeros(segments.shape) for active in active_pixels: mask[segments == active] = 1 perturbed_image = copy.deepcopy(img) perturbed_image = perturbed_image*mask[:,:,np.newaxis] return perturbed_image #Show example of perturbations print(perturbations[0]) predictions = [] for pert in perturbations: perturbed_img = perturb_image(Xi,pert,superpixels) pred = model.predict(perturbed_img[np.newaxis,:,:,:]) predictions.append(pred) predictions = np.array(predictions) print(predictions.shape) skimage.io.imshow(perturb_image(Xi,perturbations[0],superpixels)) skimage.io.imshow(perturb_image(Xi,perturbations[11],superpixels)) skimage.io.imshow(perturb_image(Xi,perturbations[2],superpixels)) #Compute distances to original image import sklearn.metrics original_image = np.ones(num_superpixels)[np.newaxis,:] #Perturbation with all superpixels enabled distances = sklearn.metrics.pairwise_distances(perturbations,original_image, metric='cosine').ravel() print(distances.shape) #Transform distances to a value between 0 an 1 (weights) using a kernel function kernel_width = 0.25 weights = np.sqrt(np.exp(-(distances**2)/kernel_width**2)) #Kernel function print(weights.shape)
  • 53.
# Estimate linear model
from sklearn.linear_model import LinearRegression

class_to_explain = 4
simpler_model = LinearRegression()
simpler_model.fit(X=perturbations, y=predictions[:,:,class_to_explain], sample_weight=weights)
coeff = simpler_model.coef_[0]

# Use coefficients from the linear model to extract top features
num_top_features = 4
top_features = np.argsort(coeff)[-num_top_features:]

# Show only the superpixels corresponding to the top features
mask = np.zeros(num_superpixels)
mask[top_features] = True  # Activate top superpixels
skimage.io.imshow(perturb_image(Xi, mask, superpixels))

Conclusion
This paper focused on various techniques for the classification of skin diseases. Automating the process of skin disease identification and classification can be very helpful and reduces the time needed for diagnosis. The paper presented a survey of traditional, feature extraction-based approaches and CNN-based approaches for skin disease classification. From the study it is concluded that, for the traditional approach, the feature selection process is time consuming and the selection of relevant features is very important. The deep learning algorithm CNN, in contrast, learns features automatically and efficiently; for feature extraction, a CNN selects its filters intelligently compared with manually designed ones. Pre-trained models like Inception v3, ResNet, VGG16, VGG19, AlexNet, etc. are trained on very large datasets of millions of general images and can be used with transfer learning or fine-tuning. However, a pre-trained model has to be retrained if it was not previously trained on skin disease images. Also, a CNN needs a fairly large dataset for training so that it can learn effectively, compared to the traditional way of skin disease classification.
  • 54. References
[1] D.A. Okuboyejo, O.O. Olugbara and S.A. Odunaike, "Automating skin disease diagnosis using image classification," in Proceedings of the World Congress on Engineering and Computer Science, Oct. 2013, vol. 2, pp. 850-854.
[2] A.A. Amarathunga, E.P. Ellawala, G.N. Abeysekara and C.R. Amalraj, "Expert system for diagnosis of skin diseases," International Journal of Scientific & Technology Research, 4(1):174-178, Jan. 2015.
[3] S. Chakraborty, K. Mali, S. Chatterjee, S. Anand, A. Basu, S. Banerjee, M. Das and A. Bhattacharya, "Image based skin disease detection using hybrid neural network coupled bag-of-features," in 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), Oct. 2017, pp. 242-246, IEEE.
[4] A. Esteva, B. Kuprel, R.A. Novoa, J. Ko, S.M. Swetter, H.M. Blau and S. Thrun, "Dermatologist-level classification of skin cancer with deep neural networks," Nature, 542(7639):115-118, Feb. 2017.
[5] X. Zhang, S. Wang, J. Liu and C. Tao, "Towards improving diagnosis of skin diseases by combining deep neural network and human knowledge," BMC Medical Informatics and Decision Making, 18(2):59, Jul. 2018.
[6] T.J. Brinker, A. Hekler, J.S. Utikal, N. Grabe, D. Schadendorf, J. Klode, C. Berking, T. Steeb, A.H. Enk and C. von Kalle, "Skin cancer classification using convolutional neural networks: systematic review," Journal of Medical Internet Research, 20(10):e11936, 2018.
[7] R. Kulhalli, C. Savadikar and B. Garware, "A hierarchical approach to skin lesion classification," in Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, Jan. 2019, pp. 245-250.
[8] M.A. Khan, M.Y. Javed, M. Sharif, T. Saba and A. Rehman, "Multi-model deep neural network based features extraction and optimal selection approach for skin lesion classification," in 2019 International Conference on Computer and Information Sciences (ICCIS), Apr. 2019, pp. 1-7, IEEE.
[9] J. Premaladha, S. Sujitha, M.L. Priya and K.S. Ravichandran, "A survey on melanoma diagnosis using image processing and soft computing techniques," Research Journal of Information Technology, 6(2):65-80, May 2014.
[10] S. Chatterjee, D. Dey, S. Munshi and S. Gorai, "Extraction of features from cross correlation in space and frequency domains for classification of skin lesions," Biomedical Signal Processing and Control, 53:101581, Aug. 2019.
[11] M.S. Manerkar, U. Snekhalatha, S. Harsh, J. Saxena, S.P. Sarma and M. Anburajan, "Automated skin disease segmentation and classification using multi-class SVM classifier," 2016.
[12] N. Codella, J. Cai, M. Abedini, R. Garnavi, A. Halpern and J.R. Smith, "Deep learning, sparse coding, and SVM for melanoma recognition in dermoscopy images," in International Workshop on Machine Learning in Medical Imaging, Oct. 2015, pp. 118-126, Springer, Cham.
[13] P.M. Burlina, N.J. Joshi, E. Ng, S.D. Billings, A.W. Rebman and J.N. Aucott, "Automated detection of erythema migrans and other confounding skin lesions via deep learning," Computers in Biology and Medicine, 105:151-156, Feb. 2019.
[14] I. Zaqout, "Diagnosis of skin lesions based on dermoscopic images using image processing techniques," in Pattern Recognition: Selected Methods and Applications, Jul. 2019, IntechOpen.
[15] V.B. Kumar, S.S. Kumar and V. Saboo, "Dermatological disease detection using image processing and machine learning," in 2016 Third International Conference on Artificial Intelligence and Pattern Recognition (AIPR), Sep. 2016, pp. 1-6, IEEE.
[16] E. Jana, R. Subban and S. Saraswathi, "Research on skin cancer cell detection using image processing," in 2017 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Dec. 2017, pp. 1-8, IEEE.
[17] M. Monisha, A. Suresh and M.R. Rashmi, "Artificial intelligence based skin classification using GMM," Journal of Medical Systems, 43(1):3, Jan. 2019.
[18] N.C. Codella, D. Gutman, M.E. Celebi, B. Helba, M.A. Marchetti, S.W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittler and A. Halpern, "Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC)," in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Apr. 2018, pp. 168-172, IEEE.
[25] https://www.cancercenter.com/cancer-types/melanoma/symptoms
[26] https://www.mayoclinic.org/diseases-conditions
[27] https://www.isic-archive.com
[28] https://sites.google.com/site/robustmelanomascreening/dataset
[29] https://www.dropbox.com/s/k88qukc20ljnbuo/PH2Dataset.rar
[30] http://www.cs.rug.nl/~imaging/databases/melanoma_naevi/
[31] https://www.derm101.com/image-library/?match=IN
[36] N. Yadav, V.K. Narang and U. Shrivastava, "Skin diseases detection models using image processing: A survey," International Journal of Computer Applications, 137(12):34-39, Mar. 2016.
[37] N. Gessert, T. Sentker, F. Madesta, R. Schmitz, H. Kniep, I. Baltruschat, R. Werner and A. Schlaefer, "Skin lesion classification using CNNs with patch-based attention and diagnosis-guided loss weighting," IEEE Transactions on Biomedical Engineering, May 2019.
[38] https://www.biospectrumindia.com/news/73/8437/skin-diseases-togrow-in-india-by-2015-report.html
[39] https://www.who.int/uv/faq/skincancer/en/index1.html
[40] https://towardsdatascience.com
[41] www.analyticsvidhya.com