PREDICTION OF COVID-19 USING
MACHINE LEARNING TECHNIQUES
Narenraj Vivekanandan
Dept. of Electrical Engineering
National Institute of Technology
Calicut, India
naren.raj.vivek7@gmail.com
Mohamed Ashiq Rahman S
Dept. of Electrical Engineering
National Institute of Technology
Calicut, India
ashiqxq@gmail.com
Vedant Mahalle
Dept. of Electrical Engineering
National Institute of Technology
Calicut, India
vedantmahalle21@gmail.com
Sharath M Nair
Dept. of Electrical Engineering
National Institute of Technology
Calicut, India
sharathmappu99@gmail.com
Abstract​—Numerous techniques have been proposed by
the WHO and other medical authorities for the diagnosis of
COVID-19. The most popular diagnostic method is reverse
transcription polymerase chain reaction (RT-PCR). Other
clinical diagnosis techniques involve antibody tests. Other
research has focused on classifying COVID vs. non-COVID cases
from chest X-ray images. However, many of these classifications
are done directly over the images, which can lead to increased
overfitting. We propose a different model that employs wavelet
entropy to extract features from the chest X-ray images and
then classify them. The proposed technique extracts
space-frequency features from chest X-ray images using the
Discrete Wavelet Transform and reduces their dimensionality
using Shannon entropy. The resulting vector is used to train
standard machine learning classifiers: Logistic Regression,
Support Vector Machine, Decision Tree classifier and Gaussian
Naive Bayes. A Convolutional Neural Network applied directly to
the images is examined as well.
Keywords— wavelet transform, entropy, logistic regression,
naive bayes, decision tree, support vector machine
I. INTRODUCTION
Rapid and reliable diagnosis of COVID-19 is one of the
foremost challenges we face today. It matters most for
patients who may be critically ill and need medical care.
SARS-CoV-2, the virus that causes COVID-19, mainly affects
the lungs of the infected person, most commonly causing
severe respiratory illness and pneumonia. These effects can
often be diagnosed by examining chest X-ray (CXR) images.
Previous studies have reported that machine learning models
can read X-ray images more accurately than the human eye.
Diagnosing COVID-19 from CXR images can also be more
reliable and rapid than RT-PCR or antigen tests. We have
built a machine learning model that can help the medical
community diagnose COVID-19 quickly from CXR images using
a pretrained model. We use the Discrete Wavelet Transform
(DWT) for feature extraction, as studies have shown that
wavelet transforms are excellent at detecting edges and
distinguishing frequencies. We then train different
classifiers on these features, namely Logistic Regression,
Support Vector Machine, Decision Tree Classifier and Naive
Bayes, and study the results. We also examine the effect of
using a CNN to classify the images without feature
extraction.
II. BASIC CONCEPTS
A. Discrete Wavelet Transform
The discrete wavelet transform (DWT) gives a multi-scale
(frequency) representation of a function. Using wavelets,
the image data can be analyzed at multiple resolutions. The
wavelet transform is good at capturing fine details because
it retains the high-frequency components. The 1D-DWT of a
signal x is calculated by passing it through a pair of
high-pass and low-pass filters (quadrature mirror filters)
with impulse responses h and g respectively.
Fig 1. Filter representation of Wavelet Transform.
The approximation coefficients are given by
$$ y_{\text{low}}[n] = \sum_{k=-\infty}^{\infty} x[k]\, g[2n-k] \tag{2} $$
The detail coefficients are given by
$$ y_{\text{high}}[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[2n-k] \tag{3} $$
At every decomposition level, since half of the frequency
band is discarded, half of the samples can be discarded as
well, as per the Nyquist criterion.
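The filter-and-downsample step in (2) and (3) can be sketched in a few lines of Python. This is only an illustration: it assumes the Haar wavelet (the text does not state which wavelet was used), for which g and h have two taps each, and it aligns the taps with sample pairs, which matches (2) and (3) up to a one-sample shift. The function name `haar_dwt_1d` and the sample signal are hypothetical.

```python
import math

def haar_dwt_1d(x):
    """One level of the 1D Haar DWT for an even-length sequence.

    Returns (approximation, detail), each half as long as the input.
    """
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    s = 1.0 / math.sqrt(2.0)  # Haar filter tap magnitude
    # Low-pass filter followed by downsampling by 2: scaled pairwise sums.
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    # High-pass filter followed by downsampling by 2: scaled pairwise differences.
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

# Made-up signal for illustration only.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_dwt_1d(signal)
```

Each output holds half as many samples as the input, and the detail coefficient is zero wherever two neighbouring samples are equal, which is why the detail band responds to edges.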
The one-dimensional discrete wavelet transform (1D-DWT) can
be extended to two dimensions (2D-DWT) by processing along
the x and y axes using low-pass filters (expanded wavelets)
and high-pass filters (shrunken wavelets). Four image
sub-bands (LL1, LH1, HL1, HH1) are generated after the
level-1 decomposition. The LL1 sub-band, which contains the
low-frequency components, can be regarded as the
approximation component of the image, while the LH1, HL1
and HH1 sub-bands, which contain the relatively
higher-frequency portions of the image, hold its more
detailed components.
Fig 2. 2-level wavelet decomposition
Working on the assumption that most of the image data is
contained in the LL1 sub-band, it can be further decomposed
to level 2, giving 7 sub-bands (LL2, HL2, LH2, HH2, HL1,
LH1, HH1).
B. Entropy
The major disadvantage of the discrete wavelet transform
technique is the curse of dimensionality. Too many features
result in increased computation time and excessive memory
use. To overcome this, we have to reduce the number of
coefficients. We therefore employ an additional parameter,
entropy, to reduce the dimension by averaging out the
inter-related variables while retaining sufficient
information. In information theory, entropy is the lower
limit to which information can be compressed without loss.
Shannon defined the entropy H of a discrete random variable
X with values {x1, x2, ..., xn} and probability mass
function P(X) as:
$$ H(X) = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i) \tag{4} $$
where b is the base of the logarithm (b = 2 gives the
entropy in bits).
Shannon's entropy thus quantifies the amount of information
available in a variable. Its value is the absolute minimum
amount of storage required to succinctly capture that
information.
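As a small worked illustration of (4), the sketch below computes the entropy of two made-up distributions. The function name `shannon_entropy` and the example probabilities are hypothetical, and base 2 is assumed so that the result is in bits.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution given as a list of probabilities.

    Terms with zero probability contribute nothing to the sum.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

# Four equally likely values: the hardest case to compress.
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
# One value dominates: far less information per observation.
skewed = shannon_entropy([0.97, 0.01, 0.01, 0.01])
```

The uniform case needs 2 bits per value, while the skewed case needs much less, which is the sense in which a single entropy number summarises how spread out a set of values is.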
C. Feature Extraction
A 256×256 image yields 65536 coefficients. With the entropy
parameter, however, the number of features is reduced to 7
entropy values, one for each sub-band of the 2-level 2D
wavelet transform of the image. This makes the subsequent
classification computationally efficient.
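The whole feature-extraction step can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the Haar wavelet, treats each sub-band's normalised squared coefficients (its energy shares) as the probability mass fed to the entropy formula, and uses one common naming convention for the LH and HL bands. All function names and the toy 8×8 image are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)  # Haar filter tap magnitude (assumed wavelet)

def haar_split(v):
    """Split an even-length sequence into low-pass and high-pass halves."""
    low = [(v[i] + v[i + 1]) * S for i in range(0, len(v), 2)]
    high = [(v[i] - v[i + 1]) * S for i in range(0, len(v), 2)]
    return low, high

def transpose(m):
    return [list(col) for col in zip(*m)]

def split_rows(m):
    """Apply haar_split to every row and return the (low, high) matrices."""
    pairs = [haar_split(row) for row in m]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def haar_dwt_2d(img):
    """One 2D level: filter along the rows, then along the columns."""
    row_low, row_high = split_rows(img)
    ll, lh = (transpose(m) for m in split_rows(transpose(row_low)))
    hl, hh = (transpose(m) for m in split_rows(transpose(row_high)))
    return ll, lh, hl, hh

def band_entropy(band):
    """Shannon entropy (bits) of a sub-band's normalised energy distribution."""
    energies = [c * c for row in band for c in row]
    total = sum(energies)
    if total == 0.0:
        return 0.0  # an all-zero band carries no information
    return -sum((e / total) * math.log2(e / total) for e in energies if e > 0.0)

def wavelet_entropy_features(img):
    """Two-level decomposition followed by one entropy value per sub-band."""
    ll1, lh1, hl1, hh1 = haar_dwt_2d(img)
    ll2, lh2, hl2, hh2 = haar_dwt_2d(ll1)
    return [band_entropy(b) for b in (ll2, lh2, hl2, hh2, lh1, hl1, hh1)]

# Toy 8x8 "image" with made-up pixel values; a real CXR would be 256x256.
image = [[float((r * 8 + c) % 7 + 1) for c in range(8)] for r in range(8)]
features = wavelet_entropy_features(image)
```

Whatever the image size, `features` always has 7 entries, which is what keeps the classifiers downstream small.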
III. MACHINE LEARNING MODELS
A. Naive Bayes
Naive Bayes is a family of probabilistic algorithms that use
probability theory and Bayes' theorem to predict an event.
They are probabilistic in that, for a given data point, they
compute the likelihood of each label and then output the
label with the highest one. These probabilities are obtained
from Bayes' theorem, which describes the probability of an
event based on prior knowledge of conditions that may be
related to it.
Abstractly, Naive Bayes is a conditional probability model.
We are given a problem sample X to be classified, where
$$ X = \{x_1, x_2, x_3, \ldots, x_n\} \tag{5} $$
and X represents n features (independent variables). The
probability estimated by the model is that of a dependent
class C with a small number of outcomes (here Covid
positive/negative), conditional on the feature vector X:
$$ P(C_k \mid x_1, x_2, x_3, \ldots, x_n) \tag{6} $$
If a feature can take on a large number of values, or the
number of features n is large, basing such a model on
probability tables is impractical. Using Bayes' theorem,
the conditional probability can instead be written as
$$ p(C_k \mid x) = \frac{p(C_k)\, p(x \mid C_k)}{p(x)} \tag{7} $$
The posterior probability thus combines both sources of
information, the prior and the likelihood. Since the feature
values are given, the denominator p(x) does not depend on
the class; it is a constant and is not considered in
practice.
Now, assuming conditional independence of the features,
i.e. that each feature x_i is independent of the others
given the class, the joint model can be expressed as
$$ P(C_k \mid x_1, x_2, x_3, \ldots, x_n) \propto p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) \tag{8} $$
where p(x_i | C_k) can be estimated from the training
sample.
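A minimal sketch of the decision rule in (8) is given below. It assumes a Gaussian model for each p(x_i | C_k), works with logarithms so that the product becomes a sum, and uses a small made-up two-feature dataset; the function names are hypothetical and this is not the authors' implementation.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate the prior p(C_k) and a per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(samples), means, variances)
    return model

def log_gaussian(x, mean, var):
    """Log of the Gaussian density, used here as log p(x_i | C_k)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def predict(model, x):
    """Return the class that maximises log p(C_k) + sum_i log p(x_i | C_k)."""
    def score(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_gaussian(xi, m, v) for xi, m, v in zip(x, means, variances))
    return max(model, key=score)

# Made-up training data: two features per sample, labels 0 and 1.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
label = predict(model, [1.1, 2.0])
```

The denominator p(x) from (7) never appears in `predict`, because it is the same for every class and does not change which score is largest.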
B. Logistic Regression
The name logistic regression comes from the logistic
function, or sigmoid function, used as the activation
function. The sigmoid function has a range of 0 to 1, so it
is widely used in models that require a probability estimate
as output.
Logistic regression is a statistical model that in its basic
form uses a logistic function to model a binary dependent
variable. In regression, the parameters corresponding to the
most accurate probability are estimated.
Let X be an n×d matrix, where n is the number of samples and
d is the number of features or independent attributes, and
let y be a binary outcome vector. y is an n×1 matrix holding
the label for each 1×d row in X.
For a single sample x (a row of X), a linear model
describing this problem has the form
$$ z = w^T x + b \tag{9} $$
$$ \hat{y} = a = \sigma(z) \tag{10} $$
$$ L(a, y) = -\left( y \log a + (1 - y) \log(1 - a) \right) \tag{11} $$
where a is the sigmoid of z and represents the probability
of the class occurring given a sample in X, and y is the
ground truth (0 or 1). L is the loss function, which relates
y and a. The objective of the regression is to estimate the
parameters w and b that minimise the loss function as far as
possible. This can be done using gradient descent.
In gradient descent, we repeatedly adjust the parameters w
and b by a step proportional to dw and db until the optimal
parameters are reached. Here dw is the derivative of the
loss function with respect to the parameter w and db is the
derivative of the loss function with respect to the
parameter b:
$$ dz = a - y \tag{12} $$
$$ dw = x \cdot dz \tag{13} $$
$$ db = dz \tag{14} $$
$$ w = w - \alpha \cdot dw \tag{15} $$
$$ b = b - \alpha \cdot db \tag{16} $$
where α is the learning rate of the algorithm.
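The update rules (9) to (16) translate almost line by line into code. The sketch below is illustrative only: it averages the gradients over all samples (batch gradient descent), and the learning rate, epoch count, function names and the centred one-feature toy data are all assumptions rather than values from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(X, y, w, b):
    """Average of the loss in (11) over all samples."""
    total = 0.0
    for x, target in zip(X, y):
        a = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        total -= target * math.log(a) + (1 - target) * math.log(1.0 - a)
    return total / len(X)

def train_logistic(X, y, alpha=0.1, epochs=500):
    """Batch gradient descent on the logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        dw, db = [0.0] * d, 0.0
        for x, target in zip(X, y):
            a = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)  # (9), (10)
            dz = a - target                                        # (12)
            for j in range(d):
                dw[j] += x[j] * dz / n                             # (13), averaged
            db += dz / n                                           # (14), averaged
        w = [wj - alpha * dwj for wj, dwj in zip(w, dw)]           # (15)
        b -= alpha * db                                            # (16)
    return w, b

# Made-up, already centred one-feature data: negatives left of 0, positives right.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
prob_positive = sigmoid(w[0] * 1.5 + b)
```

Because the update subtracts α·dw, the parameters move against the gradient, so the loss after training is lower than at the all-zero starting point.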
C. Decision Tree Classifier
The decision tree algorithm belongs to the class of
supervised machine learning algorithms. The goal of the
classifier is to build an optimal decision tree from a given
set of features and labels, so that it can predict the label
of a new set of features by traversing down the tree.
A decision tree consists of a root node (which is the best
predictor), a set of inner nodes and leaf nodes. Leaf nodes
correspond to the different classes the data belongs to,
whereas the root node and the inner nodes correspond to the
features extracted from the dataset.
The performance of the classifier depends on how well the
tree is constructed from the training data. The process of
building a decision tree is recursive. It begins at the root
node and continues to split the dataset into subsets
depending on the number of classes. The feature that best
predicts a particular subset is placed at the corresponding
inner node of the tree.
A common metric for deciding which feature is the best
predictor of a subset is the Gini impurity of that subset.
Gini impurity measures how often a random element from the
dataset would be misclassified if it were randomly labeled
according to the distribution of classes in the subset.
The Gini impurity can be calculated by summing, over the
classes, the probability p_i of class i being chosen times
the probability of misclassifying that item, which is
1 − p_i.
To compute the Gini impurity of a subset with J classes:
$$ G(p) = \sum_{i=1}^{J} p_i \sum_{k \ne i} p_k = \sum_{i=1}^{J} p_i (1 - p_i) = \sum_{i=1}^{J} p_i - \sum_{i=1}^{J} p_i^2 $$
Hence,
$$ G(p) = 1 - \sum_{i=1}^{J} p_i^2 \tag{17} $$
where p_i can be estimated in each subset.
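Equation (17) is short enough to check by hand, and the sketch below does so in code. The size-weighted `split_impurity` helper reflects how CART-style trees usually compare candidate splits; it is standard practice rather than something described in the text, and both function names are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_i^2), with p_i estimated from label frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate split into two subsets."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

mixed = gini_impurity([0, 0, 1, 1])           # two classes, evenly mixed
pure = gini_impurity([1, 1, 1])               # a single class
good_split = split_impurity([0, 0], [1, 1])   # separates the classes
bad_split = split_impurity([0, 1], [0, 1])    # leaves both sides mixed
```

A split that separates the classes drives the weighted impurity to zero, so the feature producing it would be preferred at that node.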
D. Support Vector Machine
The Support Vector Machine (SVM) is a machine learning
classifier that takes multi-dimensional data vectors and the
classes/labels they belong to, and establishes a boundary,
called the decision boundary, between the various classes,
so that new data can be classified simply by checking which
side of the boundary it falls on. A maximum margin
classifier places this boundary as far as possible from the
nearest data points of each class.
However, depending on the parameters, a maximum margin
classifier may not always lead to an optimal decision
boundary: if there are errors on either side of the
boundary, the boundary may end up very close to some data
points. Hence, it is sometimes important to allow
misclassifications in order to find the optimal boundary. A
classifier that allows some misclassification to find the
most optimal boundary with maximum margin is called a soft
margin classifier or a support vector classifier.
Mathematically, the aim of the support vector machine is to
minimize
$$ \frac{1}{2} \|w\|^2 $$
in relation with eq. (18) and subject to eq. (19):
$$ Y = W^T X + B \tag{18} $$
$$ |Y_i - \langle w, x_i \rangle - b| \le \varepsilon \tag{19} $$
Again, a linear support vector classifier may not always be
optimal for a dataset with complex features. Hence
different kernel functions exist with which we can find the
maximal margin hyperplane. Some of the more common kernels
are linear kernels, polynomial kernels and RBF kernels.
Kernels like the polynomial kernel work in higher dimensions
to find the best support vector classifier, while radial
basis function (RBF) kernels, also known as Gaussian
kernels, are functions based on the absolute distance from
a data point (r = ||x − xi||). The RBF kernel between two
data points x and x′ is defined by
$$ K(x, x') = e^{-\gamma \|x - x'\|^2} \tag{20} $$
where ||x − x′||² is the squared Euclidean distance, γ is a
specified parameter and K(x, x′) is the resulting kernel
value.
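The kernel in (20) is a one-line computation, sketched below with made-up points. The function name `rbf_kernel` and the sample vectors are hypothetical.

```python
import math

def rbf_kernel(x, x_prime, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)

same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)   # identical points
near = rbf_kernel([0.0, 0.0], [0.3, 0.4], gamma=0.5)   # distance 0.5
far = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.5)    # distance 5
```

The value is 1 for identical points and decays towards 0 as the points move apart. A smaller γ makes that decay slower, so distant points still count as similar and the decision boundary is smoother.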
IV. CONVOLUTIONAL NEURAL NETWORKS
A Convolutional Neural Network is a deep learning neural
network that is used to analyze visual imagery. It consists
of several layers in the order: input layer, hidden layers
(convolution layers, pooling layers and fully connected
layers) and output layer. A ConvNet learns features by
applying appropriate kernel filters. As the number of
parameters is reduced and the weights are updated, the
network is able to generalise very well on the image
dataset. Its role is to reduce the images to a form that is
easier to process, without losing the features that are
essential for an accurate prediction.
The convolution operation is a mathematical operation
applied to the input images to capture features such as
edges. A pooling layer almost always follows a
convolutional layer and is used to reduce the spatial size
of the matrix. This dimensionality reduction lowers the
computational power necessary for model training.
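The two operations can be illustrated on a tiny image. The sketch below is not the network used in the paper: it applies one fixed, hand-picked vertical-edge kernel (a trained ConvNet would learn its kernels), uses no padding and a stride of 1, and, like most deep learning libraries, does not flip the kernel. All names and values are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]

# 6x6 image with a vertical edge: dark (0) on the left, bright (1) on the right.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
vertical_edge = [[-1.0, 0.0, 1.0],
                 [-1.0, 0.0, 1.0],
                 [-1.0, 0.0, 1.0]]

feature_map = conv2d_valid(image, vertical_edge)  # 4x4, responds at the edge
pooled = max_pool(feature_map)                    # 2x2 summary of the response
```

The feature map is non-zero only in the columns that straddle the edge, and pooling shrinks the 4×4 map to 2×2 while keeping that response.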
By applying the above techniques, we obtain a convolved
matrix that captures several features of the input images.
We then flatten the matrix and employ a neural network for
classification. The flattened matrix holds values that are
non-linear combinations of features. To learn these
combinations and make accurate predictions, we use a
fully-connected layer, which in this case is a multi-layer
perceptron. Backpropagation is applied at every training
iteration. After some epochs, the model classifies the
image into one of two classes using a sigmoid classifier.
V. CLASSIFICATION & COMPARISON
A. Dataset
We used the publicly available COVIDx dataset from the
COVID-Net Open Source Initiative by Linda Wang and
Alexander Wong of the Department of Systems Design
Engineering, University of Waterloo, Canada. This is a
standard, labelled dataset. It contains 14904 Non-Covid
images and 594 Covid images.
Fig 3. Covid -ve CXR images
The images were read and converted to an integer
representation using the cv2 module. The obtained values
were scaled uniformly to avoid zero values that may lead to
division by zero. The images were then transformed to a
7-feature vector using DWT and entropy.
Fig 4. Covid +ve CXR images
B. Result and Analysis
Choosing appropriate parameters is essential to arriving at
the best classification model, so we used hyper-parameter
tuning to validate our models at different parameter values.
The Naive Bayes classifier turned out to be insensitive to
its major parameters, such as the prior probability.
Logistic regression performed better with the penalty set to
'l2', which uses the ridge method, and the solver set to
'newton-cg', which uses second-order derivatives for
optimization. The DTC performed best with the criterion
parameter set to 'entropy' rather than 'gini'. The SVM
performed best with the parameter 'C' set to 63; this
parameter is inversely related to the amount of
misclassification allowed. The SVM kernel was set to 'RBF',
as expected, which allows classification to work in an
infinite-dimensional space. The gamma parameter, which
defines the curvature of the RBF kernel, was set to 0.001,
thus allowing less curvature.
We compared the features obtained from the DWT+entropy
technique using the Decision Tree classifier, Logistic
Regression classifier, Naive Bayes classifier and Support
Vector Machine. The classification parameters were obtained
by hyper-parameter tuning. Alternatively, we used the image
directly, without any other feature extraction, in the
Convolutional Neural Network based classifier.
TABLE I. CLASSIFICATION COMPARISON
Method             Precision   Recall    F1-Score   Accuracy
CNN                NA          NA        NA         83.44%
DWT+ENTROPY+SVM    0.9162      0.867     0.8854     99.13%
DWT+DTC            0.855       0.8538    0.8494     98.85%
DWT+LRC            0.909       0.4298    0.5556     97.59%
DWT+NBC            0.907       0.7932    0.8414     98.86%
The f1-score is chosen as the appropriate classification
metric since we were dealing with an imbalanced dataset.
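The effect of the class imbalance can be made concrete with a short calculation. The sketch below uses the class counts quoted in the dataset section (594 Covid and 14904 Non-Covid images) for a degenerate classifier that labels every image Non-Covid. It is applied to the full dataset counts purely for illustration, since the test split is not stated, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# A classifier that never predicts Covid: every Covid image is a false negative.
precision, recall, f1, accuracy = classification_metrics(tp=0, fp=0, fn=594, tn=14904)
```

Such a classifier reaches an accuracy of about 96% while detecting no Covid case at all, and its F1-score of 0 exposes this. That is the reason accuracy alone is not a reliable yardstick here.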
As is evident from the scores in Table I, the support
vector machine classifier did the best job at
classification, with a mean F1-score of 0.8854. It also came
ahead on all other classification metrics.
Logistic regression performed worst, with an F1-score of
0.5556, which is only marginally better than random
prediction. This underwhelming performance can be attributed
to the linearly inseparable nature of the feature set, which
logistic regression cannot separate.
VI. CONCLUSION
In this paper, we compared ML classification algorithms for
predicting COVID-19 from the feature set extracted using
wavelet entropy.
Although the entropy values and other hyper-parameters used
in the classification are difficult to interpret, the
proposed method using SVM gives good classification results.
The classification metrics can be improved by training with
more images and by more robust hyper-parameter tuning.
Alternatively, techniques other than entropy can be used for
dimensionality reduction.
The model can also be extended to cover more diseases that
can be diagnosed from CXR images, so that in future it
becomes a multi-disease classification model.
ACKNOWLEDGMENT
The work was done under the guidance of Dr. Shihabudeen
K.V, Assistant Professor at the National Institute of
Technology, Calicut.

Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Series of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.pptSeries of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.ppt
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
sieving analysis and results interpretation
sieving analysis and results interpretationsieving analysis and results interpretation
sieving analysis and results interpretation
 
Low power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniquesLow power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniques
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
Exception Handling notes in java exception
Exception Handling notes in java exceptionException Handling notes in java exception
Exception Handling notes in java exception
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
 

Conference_paper.pdf

PREDICTION OF COVID-19 USING MACHINE LEARNING TECHNIQUES

Narenraj Vivekanandan
Dept. of Electrical Engineering
National Institute of Technology
Calicut, India
naren.raj.vivek7@gmail.com

Mohamed Ashiq Rahman S
Dept. of Electrical Engineering
National Institute of Technology
Calicut, India
ashiqxq@gmail.com

Vedant Mahalle
Dept. of Electrical Engineering
National Institute of Technology
Calicut, India
vedantmahalle21@gmail.com

Sharath M Nair
Dept. of Electrical Engineering
National Institute of Technology
Calicut, India
sharathmappu99@gmail.com

Abstract: Numerous techniques have been proposed by the WHO and other medical authorities for the diagnosis of COVID-19. The most popular diagnostic method is reverse transcription polymerase chain reaction (RT-PCR). Other clinical diagnosis techniques involve antibody tests. Other research has focused on classifying COVID versus non-COVID cases from chest X-ray images. However, many of these classifiers operate directly on the images, which increases the risk of overfitting. We propose a different model that employs wavelet entropy to extract features from chest X-ray images and then classify them. The proposed technique extracts space-frequency features from chest X-ray images using the Discrete Wavelet Transform, reduces their dimensionality using Shannon entropy, and uses the resulting vector to train standard machine learning classifiers: Logistic Regression, Support Vector Machine, Decision Tree classifier and Gaussian Naive Bayes. A Convolutional Neural Network is also evaluated.

Keywords: wavelet transform, entropy, logistic regression, naive bayes, decision tree, support vector machine

I. INTRODUCTION

Rapid and reliable diagnosis of COVID-19 is one of the foremost challenges we face today. It is most important for patients who may be critical and need medical care. The main effect of SARS-CoV-2, the virus that causes COVID-19, is on the lungs of the infected person. The most common effects of the virus are severe respiratory illness and pneumonia. These effects can commonly be diagnosed by examining chest X-ray (CXR) images. Previous studies have reported that machine learning models can read X-ray images more accurately than the human eye. Diagnosing COVID-19 with CXR images is much more reliable and rapid than the RT-PCR test or antigen tests.

We have built a machine learning model which can help the medical community with speedy diagnosis of COVID-19 from CXR images using a pretrained model. We use the Discrete Wavelet Transform (DWT) for feature extraction, as studies have shown that wavelet transforms are excellent at detecting edges and distinguishing frequencies. We then train different classifiers on these features (Logistic Regression, Support Vector Machine, Decision Tree Classifier and Naive Bayes) and study the results. We also examine the effect of using a CNN to classify the images without feature extraction.

II. BASIC CONCEPTS

A. Discrete Wavelet Transform

The discrete wavelet transform (DWT) is used to obtain a multi-scale (frequency) representation of a function. Using wavelets, the image data can be analyzed at multiple resolutions. The wavelet transform is better at capturing fine details because of its high-frequency components. The 1D-DWT of a signal x is calculated by passing it through a pair of high-pass and low-pass filters (quadrature mirror filters) with impulse responses h and g respectively.

Fig 1. Filter representation of Wavelet Transform.

The approximation coefficients are given by

$$y_{\text{low}}[n] = \sum_{k=-\infty}^{\infty} x[k]\, g[2n - k] \qquad (2)$$

The detail coefficients are given by

$$y_{\text{high}}[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[2n - k] \qquad (3)$$
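As a minimal sketch of the filter-and-downsample step in equations (2) and (3), the following computes one decomposition level of a 1D signal. The Haar filter pair is an assumption made for illustration, since the paper does not name its wavelet. The helper names are hypothetical, and the output index is shifted by one sample (2n + 1 - k instead of 2n - k) so that each coefficient pairs x[2n] with x[2n+1].

```python
import math

SQRT2 = math.sqrt(2.0)
G = [1 / SQRT2, 1 / SQRT2]    # low-pass impulse response g (Haar, assumed)
H = [-1 / SQRT2, 1 / SQRT2]   # high-pass impulse response h (Haar, assumed)


def filter_and_downsample(x, taps):
    """Convolve x with taps and keep every second output sample."""
    out = []
    for n in range(len(x) // 2):
        m = 2 * n + 1  # odd-indexed convolution outputs: pairs x[2n] with x[2n+1]
        acc = 0.0
        for k in range(len(x)):
            j = m - k
            if 0 <= j < len(taps):
                acc += x[k] * taps[j]
        out.append(acc)
    return out


def dwt_level1(x):
    """Return (approximation, detail) coefficients of a 1D signal."""
    return filter_and_downsample(x, G), filter_and_downsample(x, H)


signal = [4.0, 6.0, 10.0, 12.0]
approx, detail = dwt_level1(signal)
print(approx, detail)
```

Each output list has half as many samples as the input, and with this orthonormal filter pair the total energy of the two outputs equals the energy of the input signal.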
At every decomposition level, each filter output retains only half of the frequency band, so half of the samples can be discarded according to the Nyquist criterion.

The one-dimensional discrete wavelet transform (1D-DWT) can be extended to two dimensions (2D-DWT) by processing along the x and y axes using low-pass filters (expanded wavelets) and high-pass filters (shrunken wavelets). After the level-1 decomposition, four sub-band images (HH1, LH1, HL1, LL1) are generated at each scale. The LL1 sub-band (also written A1), which contains the low-frequency components, can be regarded as the approximation component of the image, while the LH, HL and HH sub-bands, which contain the relatively higher-frequency portions, hold the more detailed components of the image. Working on the assumption that most of the image information is contained in the LL1 sub-band, LL1 can be further decomposed to level 2, giving 7 sub-bands in total (LL2, HL2, LH2, HH2, HL1, LH1, HH1).

Fig 2. 2-level wavelet decomposition

B. Entropy

The major disadvantage of the discrete wavelet transform is the curse of dimensionality. Too many features result in increased computation time and excessive memory use. To overcome this disadvantage, we have to reduce the number of coefficients. We therefore employ an additional parameter, entropy, to reduce the dimension by averaging out the inter-related variables while retaining sufficient information.

In information theory, entropy is the lower limit to which information can be compressed without loss. Shannon defined the entropy H of a discrete random variable X with values {x1, x2, ..., xn} and probability mass function P(X) as

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i) \qquad (4)$$

Shannon's entropy thus quantifies the amount of information available in a variable. It can be interpreted as the absolute minimum amount of storage required to capture that information succinctly.

C. Feature Extraction

A 256×256 image can yield 65536 coefficients. With the entropy parameter, the number of features is reduced to 7 entropy values, one for each sub-band of the 2-level 2D wavelet transform of the image. This makes the method computationally efficient.

III. MACHINE LEARNING MODELS

A. Naive Bayes

Naive Bayes is a family of probabilistic algorithms that use probability theory and Bayes' theorem to predict an event. They are probabilistic in the sense that they compute, for a given sample, the probability of each label and then output the label with the highest probability. They obtain these probabilities using Bayes' theorem, which describes the probability of an event based on prior knowledge of conditions that may be related to it.

Abstractly, Naive Bayes is a conditional probability model. We are given a sample X to be classified, where

$$X = \{x_1, x_2, x_3, \ldots, x_n\} \qquad (5)$$

and X represents n features (independent variables). The quantity estimated by the model is the probability of a class C with a small number of outcomes (here, Covid positive or negative), conditional on the feature vector X:

$$P(C_k \mid x_1, x_2, x_3, \ldots, x_n) \qquad (6)$$

If a feature can take on a large number of values, or the number of features n is large, basing such a model on probability tables is impractical. Using Bayes' theorem, the conditional probability can be rewritten as

$$p(C_k \mid x) = \frac{p(C_k)\, p(x \mid C_k)}{p(x)} \qquad (7)$$

The posterior probability thus combines both sources of information, the prior and the likelihood. Since the feature values are given, the denominator does not depend on the class; it is effectively a constant and is not considered in practice. Assuming conditional independence of the features, i.e. that each feature x_i is independent of the others given the class, the joint model can be expressed as

$$P(C_k \mid x_1, x_2, x_3, \ldots, x_n) \propto p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) \qquad (8)$$

where p(x_i | C_k) can be estimated from the training samples.

B. Logistic Regression

The name logistic regression comes from the logistic function, or sigmoid function, used as the activation function. The sigmoid function has a range of 0 to 1, so it is widely used in models that require a probability estimate as an output. Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable. In the regression, the parameters that give the most accurate probabilities are estimated.

Let X be an n×d matrix, where n is the number of samples and d is the number of features (independent attributes), and let y be a vector of binary outcomes. y is an n×1 matrix whose entries are the labels for the corresponding 1×d rows of X.
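In this work each 1×d row of X would be the 7-element wavelet-entropy vector of Section II.C. As a rough sketch of how one such row might be computed, the code below applies a 2-level 2D Haar decomposition and takes the Shannon entropy of each of the 7 sub-bands. Several points are assumptions rather than the authors' stated method: the Haar wavelet, the use of normalised squared coefficients as the probabilities in equation (4), the base-2 logarithm, the sub-band naming convention, and all function names.

```python
import math


def haar_rows(img):
    """One Haar step along each row: returns (low, high) half-width images."""
    s = math.sqrt(2.0)
    low = [[(r[2 * j] + r[2 * j + 1]) / s for j in range(len(r) // 2)] for r in img]
    high = [[(r[2 * j] - r[2 * j + 1]) / s for j in range(len(r) // 2)] for r in img]
    return low, high


def transpose(img):
    return [list(col) for col in zip(*img)]


def haar2d(img):
    """One 2D Haar level: returns the sub-bands (LL, LH, HL, HH)."""
    low, high = haar_rows(img)             # filter along the x axis
    ll, lh = haar_rows(transpose(low))     # then along the y axis
    hl, hh = haar_rows(transpose(high))
    return tuple(transpose(band) for band in (ll, lh, hl, hh))


def shannon_entropy(band, base=2.0):
    """Entropy of the normalised coefficient energies of one sub-band (assumed)."""
    energies = [c * c for row in band for c in row]
    total = sum(energies)
    if total == 0.0:
        return 0.0
    return -sum((e / total) * math.log(e / total, base) for e in energies if e > 0.0)


def wavelet_entropy_features(img):
    """Map an image with even side lengths to 7 entropy values."""
    ll1, lh1, hl1, hh1 = haar2d(img)
    ll2, lh2, hl2, hh2 = haar2d(ll1)       # decompose LL1 again for level 2
    bands = (ll2, hl2, lh2, hh2, hl1, lh1, hh1)
    return [shannon_entropy(band) for band in bands]


# Toy 8x8 "image" with a deterministic pattern, standing in for a CXR image.
image = [[float((3 * r + 5 * c) % 7 + 1) for c in range(8)] for r in range(8)]
features = wavelet_entropy_features(image)
print(len(features))
```

Stacking one such vector per image gives the n×7 matrix X. In practice a wavelet library would likely replace the hand-written Haar step.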
A linear model describing this problem has the form

$$z = w^{T} x + b \qquad (9)$$

$$\hat{y} = a = \sigma(z) \qquad (10)$$

$$L(a, y) = -\big(y \log a + (1 - y) \log(1 - a)\big) \qquad (11)$$

where a is the sigmoid of z and represents the probability of a class given a sample x from X, and y is the ground truth (0 or 1). L is the loss function, which relates y and a. The objective of the regression is to estimate the weight vector w and the bias b that minimise the loss function as far as possible. This can be done using gradient descent. In gradient descent, we repeatedly adjust the parameters w and b by a step proportional to dw and db until the optimal parameters are reached. Here dw is the derivative of the loss function with respect to w and db is the derivative of the loss function with respect to b:

$$dz = a - y \qquad (12)$$

$$dw = x \cdot dz \qquad (13)$$

$$db = dz \qquad (14)$$

$$w = w - \alpha \, dw \qquad (15)$$

$$b = b - \alpha \, db \qquad (16)$$

where α is the learning rate of the algorithm.

C. Decision Tree Classifier

The decision tree algorithm belongs to the class of supervised machine learning algorithms. The goal of the classifier is to create an optimal decision tree from the given set of features and labels, so that it can predict the label of a new set of features by traversing the tree from the root downwards. A decision tree consists of a root node (the best predictor), a set of inner nodes and leaf nodes. Leaf nodes correspond to the classes in the dataset, whereas the root node and the inner nodes correspond to the features extracted from the dataset. The performance of the classifier depends on how well the tree is constructed from the training data.

The process of building a decision tree is recursive. It begins at the root node and continues to split the dataset into subsets depending on the number of classes. The feature that best predicts a particular sub dataset takes the place of the corresponding inner node in the tree.
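The recursive construction described above can be sketched as follows. This is an illustrative toy, not the implementation used for the experiments. It assumes binary threshold splits on numeric features and uses Gini impurity (equation (17)) as the split criterion. The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_i p_i^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels):
    """Recursively split until a node is pure or no split reduces impurity."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left]),
        "right": build_tree([r for r, _ in right], [y for _, y in right]),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Toy data: two made-up features per sample, label 1 = positive.
X = [[0.2, 1.0], [0.4, 0.8], [0.6, 0.3], [0.9, 0.1]]
y = [0, 0, 1, 1]
tree = build_tree(X, y)
print(tree, predict(tree, [0.7, 0.2]))
```

Each inner node stores the feature and threshold that best separate its subset, and prediction follows one root-to-leaf path.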
A common metric for deciding which feature is the best predictor of a sub dataset is the Gini impurity of that sub dataset. Gini impurity measures how often a randomly chosen element from the dataset would be misclassified if it were labelled randomly according to the distribution of classes in the subset. It is calculated by summing, over all classes i, the probability p_i of an item of class i being chosen multiplied by the probability of misclassifying that item, which is 1 − p_i. For a sub dataset with J classes:

$$G(p) = \sum_{i=1}^{J} p_i \sum_{k \neq i} p_k = \sum_{i=1}^{J} p_i (1 - p_i) = \sum_{i=1}^{J} p_i - \sum_{i=1}^{J} p_i^{2}$$

Hence,

$$G(p) = 1 - \sum_{i=1}^{J} p_i^{2} \qquad (17)$$

where p_i can be estimated in each sub dataset.

D. Support Vector Machine

The Support Vector Machine (SVM) is a machine learning classifier that takes multi-dimensional data vectors and the class labels they belong to and establishes a boundary, called the decision boundary, between the classes, so that new data can be classified simply by checking which side of the boundary it falls on. A maximum margin classifier places this boundary as far as possible from the nearest points of each class. However, depending on the parameters, a maximum margin classifier may not always lead to an optimal decision boundary: if there are errors on either side of the boundary, the boundary may end up very close to some data points. Hence it is sometimes important to allow misclassifications in order to find the optimal boundary. A classifier that allows some misclassification to find the best boundary with maximum margin is called a soft margin classifier or a support vector classifier.

Mathematically, the aim of the support vector machine is to minimise $\frac{1}{2}\lVert w \rVert^{2}$ in relation to equation (18) and subject to equation (19):

$$Y = W^{T} X + B \qquad (18)$$

$$\lvert y_i - \langle w, x_i \rangle - b \rvert \le \varepsilon \qquad (19)$$

Again, a linear support vector classifier may not always be optimal for a dataset with complex features. Hence different kernel functions exist, using which we can find the maximal margin hyperplane. Some of the more common kernels are linear kernels, polynomial kernels and RBF kernels.

Kernels such as the polynomial kernel work in higher dimensions to find the best support vector classifier, while radial basis function (RBF) kernels, also known as Gaussian kernels, are functions of the distance from a data point (r = ||x − x_i||). The RBF kernel between two data points x and x′ is defined by

$$K(x, x') = e^{-\gamma \lVert x - x' \rVert^{2}} \qquad (20)$$

where ||x − x′||² is the squared Euclidean distance, γ is a user-specified parameter, and x and x′ are feature vectors.
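Equation (20) can be evaluated directly, as the short sketch below shows. The function name and the sample vectors are made up for illustration. The value γ = 0.001 is the one reported for the tuned SVM in Section V, and γ = 1.0 is included only for contrast.

```python
import math


def rbf_kernel(x, x_prime, gamma):
    """K(x, x') = exp(-gamma * ||x - x'||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 5.0]                      # squared distance to a is 1 + 4 + 4 = 9

same = rbf_kernel(a, a, gamma=0.001)     # identical points
near = rbf_kernel(a, b, gamma=0.001)     # small gamma: similarity decays slowly
far = rbf_kernel(a, b, gamma=1.0)        # large gamma: similarity decays quickly
print(same, near, far)
```

The kernel equals 1 for identical points and decays towards 0 with distance. A smaller γ makes the decay slower, so each training point influences a wider region.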
IV. CONVOLUTIONAL NEURAL NETWORKS

A Convolutional Neural Network (CNN) is a deep learning neural network that is used to analyze visual imagery. It consists of several layers in the following order: an input layer, hidden layers (convolution layers, pooling layers and fully connected layers) and an output layer. The ConvNet learns features by applying appropriate kernel filters. Because the number of parameters is reduced and the weights are updated during training, the network is able to generalise very well on the image dataset. Its role is to reduce the images to a form that is easier to process, without losing the features that are essential for an accurate prediction.

The convolution operation is a mathematical operation applied to the input images to capture features such as edges. A pooling layer almost always follows a convolutional layer and is used to reduce the spatial size of the matrix. This dimensionality reduction lowers the computational power needed for model training.

Applying the above techniques yields a convolved matrix that encodes several features of the input images. We then flatten the matrix and use a neural network for classification. The flattened matrix contains values that are non-linear combinations of features; to learn these combinations and make accurate predictions, we use a fully connected layer, which in this case is a multi-layer perceptron. Backpropagation is applied at every iteration of training. After a number of epochs, the model classifies the image into one of two classes using a sigmoid output.

V. CLASSIFICATION & COMPARISON

A. Dataset

We used the publicly available COVIDx dataset from the COVID-Net Open Source Initiative by Linda Wang and Alexander Wong of the Department of Systems Design Engineering, University of Waterloo, Canada. This is a standard, labelled dataset. It contains 14904 non-Covid images and 594 Covid images.

Fig 3. Covid-negative CXR images

Fig 4. Covid-positive CXR images

The images were read and converted to an integer representation using the cv2 module. The values obtained were scaled uniformly to avoid zero values that could lead to division by zero. The images were then transformed to a 7-feature vector using DWT and entropy.

B. Result and Analysis

Choosing appropriate parameters is essential to arriving at the best classification model, so we used hyper-parameter tuning to validate our models at different parameter values. The Naive Bayes classifier turned out to be insensitive to its main parameters, such as the prior probabilities. Logistic regression performed better with the penalty set to 'l2' (ridge regularisation) and the solver set to 'newton-cg', which uses second-order derivatives in the optimisation. The DTC performed best with the criterion parameter set to 'entropy' rather than 'gini'. The SVM performed best with the parameter 'C' set to 63; this parameter is inversely proportional to the proportion of misclassification allowed. The SVM kernel was set to 'RBF', as expected, which lets the classifier work in an infinite-dimensional space. The gamma parameter, which defines the curvature of the RBF kernel, was set to 0.001, thus allowing less curvature.

We compared the features obtained from the DWT+entropy technique using the Decision Tree classifier, Logistic Regression classifier, Naive Bayes classifier and Support Vector Machine, with the classification parameters obtained by hyper-parameter tuning. Alternatively, we used the images directly, without any other feature extraction, in the Convolutional Neural Network based classifier.

TABLE I. CLASSIFICATION COMPARISON

Method             Precision   Recall   F1-score   Accuracy
CNN                NA          NA       NA         83.44%
DWT+ENTROPY+SVM    0.9162      0.867    0.8854     99.13%
DWT+DTC            0.855       0.8538   0.8494     98.85%
DWT+LRC            0.909       0.4298   0.5556     97.59%
DWT+NBC            0.907       0.7932   0.8414     98.86%
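To illustrate why accuracy alone can be misleading on a dataset this imbalanced, the sketch below computes precision, recall, F1-score and accuracy from confusion-matrix counts. It uses the class counts from Section V.A together with a hypothetical classifier that labels every image as non-Covid. The function name is illustrative, and the numbers it produces are not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, f1, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


N_NEG, N_POS = 14904, 594   # class counts reported in Section V.A

# Hypothetical classifier that predicts "non-Covid" for every image:
# no true or false positives, every Covid image is a false negative.
p, r, f1, acc = classification_metrics(tp=0, fp=0, fn=N_POS, tn=N_NEG)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f} accuracy={acc:.4f}")
```

Such a classifier never detects a positive case, yet its accuracy is above 96%, while its F1-score is 0.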
The F1-score was chosen as the appropriate classification metric since we were dealing with an imbalanced dataset. As is evident from the scores in Table I, the support vector machine classifier did the best job at classification, with a mean F1-score of 0.8854. The support vector machine came out ahead in all the other classification metrics as well. Logistic regression performed worst, with an F1-score of 0.5556, which is only marginally better than random prediction. This underwhelming performance can be attributed to the linearly inseparable nature of the feature set, which a linear model such as logistic regression cannot separate.

VI. CONCLUSION

In this paper, we compared ML classification algorithms for predicting COVID-19 from a feature set extracted using wavelet entropy. Although the entropy values and the other hyper-parameters used in the classification are difficult to interpret, the proposed method using SVM gives good classification results. The classification metrics could be improved by training with more images and by more robust hyper-parameter tuning. Alternatively, techniques other than entropy could be used for dimensionality reduction. The model could also be extended to other diseases that can be diagnosed from CXR images, so that in future it becomes a multi-disease classification model.

ACKNOWLEDGMENT

The work was done under the guidance of Dr. Shihabudeen K.V, Assistant Professor at National Institute of Technology, Calicut.