BOOST MODEL ACCURACY OF IMBALANCED COVID-19 MORTALITY PREDICTION USING
GAN-BASED OVERSAMPLING TECHNIQUE
CONTENTS
• Abstract
• Introduction
• GAN
• Data Preprocessing
• Data Analysis
• Evaluation Metrics
• Model Comparison
• Conclusion
• References
ABSTRACT
The model uses COVID-19 patients' geographical, travel, health, and
demographic data to predict the severity of the case and the possible
outcome: recovery or death. The data analysis reveals a positive correlation
between patients' gender and deaths, and also indicates that the majority of
patients are aged between 20 and 70 years. This paper proposes a fine-tuned
Random Forest model boosted by the AdaBoost algorithm.
INTRODUCTION
• We study solved cases and data from forums and published research to
understand their methodology, and try to improve accuracy or reduce the error
with additional steps.
• Conventional methods such as Random Oversampling (ROS), Synthetic Minority
Oversampling Technique (SMOTE), and others can be applied to imbalanced data.
• The models in this study were trained using 222 patient records with 13
features.
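As a point of comparison for the GAN-based approach, Random Oversampling can be sketched in a few lines. This is a minimal illustration, not the deck's implementation: the function name `random_oversample` and the toy rows are hypothetical, and the labels 0 (recovered) and 1 (deceased) are assumed for the example only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 6 recovered (0) and 2 deceased (1) patients, one feature (age).
rows = [[34], [51], [29], [45], [62], [38], [77], [81]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Because ROS only repeats existing minority rows, a classifier can memorize them, which is the overfitting risk that GAN-based oversampling is meant to reduce.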
GENERATIVE ADVERSARIAL NETWORKS (GAN)
• Generative adversarial networks are based on a game-theoretic scenario in
which the generator network must compete against an adversary.
• Because a GAN learns to mimic the distribution of data, it is applied in various
fields such as music, video, and natural language, and more recently to
imbalanced data problems.
GENERATIVE ADVERSARIAL NETWORKS (GAN)
• Oversampling based on Generative Adversarial Networks (GAN) overcomes the
limitations of conventional methods, such as overfitting, and allows the
development of a highly accurate prediction model for imbalanced data.
FIG 1: GAN BASED OVERSAMPLING
https://cdn.Analytics.Com/wp-content/uploads/2020/10/image2-2.Png
HOW DOES A GAN GENERATE SYNTHETIC DATA?
• Two neural networks compete against each other to learn the target distribution
and generate artificial data.
• A generator network G: generates samples that resemble the training samples to
fool the discriminator.
• A discriminator network D: discriminates between training samples and
generated samples.
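The competition between the two networks is usually written as the standard GAN minimax objective. The form below is the commonly cited one, not something stated on the slides; the symbols z (latent noise), p_z (noise distribution), and p_data (distribution of the real records) are assumptions introduced here, while G and D are the generator and discriminator named above.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is rewarded for assigning high probability to real samples and low probability to generated ones, while G is rewarded when D(G(z)) moves toward 1.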
DATASET
Column   | Description                                    | Values (for categorical variables)           | Type
id       | Patient Id                                     | NA                                           | Numeric
location | The location where the patient belongs to      | Multiple cities located throughout the world | String, Categorical
country  | Patient’s native country                       | Multiple countries                           | String, Categorical
gender   | Patient’s gender                               | Male, Female                                 | String, Categorical
age      | Patient’s age                                  | NA                                           | Numeric
sym_on   | The date patient started noticing the symptoms | NA                                           | Date
DATA PRE-PROCESSING
• The dataset contains columns of Date, String, and Numeric type, including
several categorical variables.
• Since the ML model requires all input data to be in numeric form, we
label-encoded the categorical variables.
• Label encoding assigns a number to every unique categorical value in a column.
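The label-encoding step can be sketched as follows. This is a minimal pure-Python illustration rather than the deck's code: the helper `label_encode` and the sample values are hypothetical, and only the column names `gender` and `country` come from the dataset table. Codes are assigned in sorted order of the unique values, similar in spirit to scikit-learn's `LabelEncoder`.

```python
def label_encode(values):
    """Map each unique categorical value to an integer code, in sorted order."""
    classes = sorted(set(values))
    mapping = {value: code for code, value in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Hypothetical sample values for two categorical columns of the dataset.
gender = ["Male", "Female", "Female", "Male"]
country = ["China", "France", "China", "Japan"]

gender_codes, gender_map = label_encode(gender)
country_codes, country_map = label_encode(country)
print(gender_codes, gender_map)    # [1, 0, 0, 1] {'Female': 0, 'Male': 1}
print(country_codes, country_map)  # [0, 1, 0, 2] {'China': 0, 'France': 1, 'Japan': 2}
```

The integer codes carry no real ordering between categories, so the encoding suits tree-based models such as Random Forest better than models that treat the codes as magnitudes.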
DEFINING GENERATOR
• The generator takes a point from the latent space as input and generates a new
synthetic sample.
• Using the leaky rectified linear activation unit (LeakyReLU) in both the
generator and the discriminator is good practice, because it lets some negative
values pass through instead of zeroing them.
• LeakyReLU is used with the recommended slope of 0.2 and the “he uniform”
weight initializer.
• In the output layer, the softmax activation function is used for categorical
variables and sigmoid is used for continuous variables.
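The generator's forward pass can be sketched without any deep-learning library. This is an untrained toy under stated assumptions, not the deck's network: the layer sizes, the absence of bias terms, and every function name are hypothetical, and the sigmoid output assumes continuous features have been scaled to the range 0 to 1. He uniform initialization draws weights from U(-limit, limit) with limit = sqrt(6 / fan_in).

```python
import math
import random


def he_uniform(fan_in, fan_out, rng):
    """He uniform init: weights from U(-limit, limit), limit = sqrt(6 / fan_in)."""
    limit = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


def dense(x, weights):
    """Dense layer without bias: x has fan_in entries, the result has fan_out."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) for j in range(len(weights[0]))]


def leaky_relu(x, alpha=0.2):
    """Pass positives unchanged; scale negatives by alpha instead of zeroing them."""
    return [v if v > 0 else alpha * v for v in x]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(x):
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]  # subtract the max for numerical stability
    total = sum(exps)
    return [v / total for v in exps]


def generate(z, w_hidden, w_out, n_continuous):
    """Map a latent vector z to one synthetic record."""
    hidden = leaky_relu(dense(z, w_hidden))
    raw = dense(hidden, w_out)
    continuous = [sigmoid(v) for v in raw[:n_continuous]]  # e.g. a scaled age
    categorical = softmax(raw[n_continuous:])              # e.g. gender probabilities
    return continuous + categorical


rng = random.Random(0)
latent_dim, hidden_dim = 8, 16          # hypothetical sizes
w_hidden = he_uniform(latent_dim, hidden_dim, rng)
w_out = he_uniform(hidden_dim, 3, rng)  # 1 continuous output + 2 category probabilities
z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
sample = generate(z, w_hidden, w_out, n_continuous=1)
print(sample)
```

Until the generator is trained against a discriminator, the output is just structured noise: one value between 0 and 1 followed by two probabilities that sum to 1.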
DEFINING DISCRIMINATOR
• The discriminator model will take a sample from our data, such as a vector, and
output a classification prediction as to whether the sample is real or fake.
• This is a binary classification problem, so sigmoid activation is used in the
output layer and binary cross-entropy loss function is used in model
compilation.
• The Adam optimization algorithm with the learning rate LR of 0.0002 and the
recommended beta1 momentum value of 0.5 is used.
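The discriminator's loss and optimizer settings can be illustrated on a toy single-layer model. This is a sketch under assumptions, not the deck's discriminator: the two-feature sample, the starting weights, and the class name `Adam` are hypothetical, and beta2 = 0.999 and eps = 1e-8 are the commonly used Adam defaults rather than values given on the slide. Only the learning rate 0.0002 and beta1 = 0.5 come from the text above.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Loss for one sample: y_true is 1 for real, 0 for generated."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


class Adam:
    """Adam update for a flat list of parameters."""

    def __init__(self, n_params, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (momentum) estimates
        self.v = [0.0] * n_params  # second-moment estimates
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        updated = []
        for i, (w, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Hypothetical single-layer discriminator scoring one sample labeled "real" (y = 1).
weights = [0.1, -0.3]
x, y = [0.5, 1.0], 1
p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
loss_before = binary_cross_entropy(y, p)
grads = [(p - y) * xi for xi in x]  # gradient of the loss through the sigmoid
weights = Adam(len(weights)).step(weights, grads)
p_after = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
loss_after = binary_cross_entropy(y, p_after)
print(loss_before, loss_after)
```

On the first step Adam moves each weight by roughly the learning rate in the direction that lowers the loss, so the small rate of 0.0002 keeps the discriminator from outpacing the generator too quickly.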
DATA ANALYSIS
• Fever, cough, cold, fatigue, body pain, and malaise were the most common
symptoms noticed in patients whose data is available in this dataset.
• Correlation between features of the dataset provides crucial information about
the features and the degree of influence they have over the target value.
FIG 2 : SYMPTOMS IN PATIENTS
https://www.Ncbi.v/pmc/articles/PMC7350612/figure/F1/
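A correlation between a feature and the target can be computed with the Pearson coefficient. The sketch below is illustrative only: the function name `pearson` and the age and outcome values are hypothetical and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: patient age and outcome (0 = recovered, 1 = deceased).
age = [25, 40, 58, 63, 71, 80]
death = [0, 0, 0, 1, 1, 1]
r = pearson(age, death)
print(round(r, 3))
```

A coefficient near +1 or -1 suggests the feature has a strong linear relationship with the target, while a value near 0 suggests little linear influence.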
EVALUATION METRICS
• To evaluate the model, we use the following evaluation metrics.
• ACCURACY: Given a dataset consisting of (TP + TN + FP + FN) data points, where
TP, TN, FP, and FN are the counts of true positives, true negatives, false
positives, and false negatives, the accuracy is equal to the ratio of total
correct predictions (TP + TN) by the classifier to the total data points.
Accuracy is an important measure used to assess the performance of the
classification model.
• $\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$, with $0.0 \le \text{Accuracy} \le 1.0$
PRECISION
• Precision is equal to the ratio of the True Positive (TP) samples to the sum of
True Positive (TP) and False Positive (FP) samples.
• Precision is also a key metric to identify the number of correctly classified
patients in an imbalanced class dataset.
• $\text{Precision} = \dfrac{TP}{TP + FP}$
RECALL
• Recall is equal to the ratio of the True Positive (TP) samples to the sum of True
Positive (TP) and False Negative (FN) samples.
• Recall is a significant metric to identify the number of correctly classified patients
in an imbalanced class dataset out of all the patients that could have been
correctly predicted.
• $\text{Recall} = \dfrac{TP}{TP + FN}$
F1 SCORE
• F1 Score is equal to the harmonic mean of Recall and Precision value.
• The F1 Score balances Precision and Recall, so a model scores well only when
both are high, which gives a fairer evaluation of the model's performance in
classifying COVID-19 patients on imbalanced data.
• This is the most significant measure that we will be using to evaluate the model.
• $\text{F1 Score} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
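The four metrics defined above can be computed directly from the confusion-matrix counts. This is a minimal sketch: the function name `classification_metrics` and the counts passed to it are hypothetical and do not come from the study's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators, e.g. a model that never predicts positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for an imbalanced test set of 100 patients, 13 of them deceased.
metrics = classification_metrics(tp=8, tn=85, fp=2, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.93 even though 5 of the 13 positive cases are missed (recall of about 0.615), which is why recall and F1 matter more than accuracy on imbalanced data.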
MODEL COMPARISON
• The model performance is tested on the actual (original) split test data.
• After splitting the original data into train and test, generated data from GAN
is added to the train data to compare the performance with the base model.
FIG 3 : COMPARISON OF VARIOUS MODELS
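The evaluation protocol described above, splitting first and augmenting only the training portion, can be sketched as follows. This is an illustration under assumptions: the function `split_then_augment`, the 20% test fraction, and all the data values are hypothetical, and the synthetic rows merely stand in for GAN output.

```python
import random


def split_then_augment(rows, labels, synthetic_rows, synthetic_label,
                       test_fraction=0.2, seed=0):
    """Hold out a test set from the original data, then add synthetic rows to train only."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(rows[i], labels[i]) for i in train_idx]
    test = [(rows[i], labels[i]) for i in test_idx]
    # Synthetic minority samples never enter the test set.
    augmented_train = train + [(r, synthetic_label) for r in synthetic_rows]
    return train, augmented_train, test


# Hypothetical data: 10 original records (one feature) with 2 minority-class cases.
rows = [[age] for age in range(20, 70, 5)]
labels = [0] * 8 + [1] * 2
synthetic_rows = [[72], [68], [75]]  # stand-ins for GAN-generated minority samples
train, augmented_train, test = split_then_augment(
    rows, labels, synthetic_rows, synthetic_label=1)
print(len(train), len(augmented_train), len(test))  # 8 11 2
```

Training the base model on `train` and the augmented model on `augmented_train`, then scoring both on the same `test`, keeps the comparison fair because the test set contains only original records.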
MODEL COMPARISON
Metric          | Score of Base Model* | Score with Augmented Generated Data
Recall Score    | 0.75                 | 0.83
Precision Score | 1                    | 1
F1 Score        | 0.86                 | 0.9
Accuracy        | 0.9                  | 0.95
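As a quick consistency check, the F1 scores in the table can be recomputed from the reported precision and recall using the harmonic-mean formula. The helper name `f1_score` is hypothetical; the input values are the ones reported in the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


base_f1 = f1_score(precision=1.0, recall=0.75)
augmented_f1 = f1_score(precision=1.0, recall=0.83)
print(round(base_f1, 2), round(augmented_f1, 2))  # 0.86 0.91
```

The base-model value matches the table, and the augmented value agrees with the reported 0.9 to one decimal place. The gain in F1 comes entirely from the higher recall, since precision is 1 in both cases.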
CONCLUSION
The proposed model provides a more accurate and robust result than the base
model, showing that GAN-based oversampling overcomes the limitations of the
imbalanced data and appropriately inflates the minority class.
REFERENCES
[1] WHO Situation Report-94 Coronavirus disease 2019 (COVID-19) (2020).
[2] Sujatha R, Chatterjee JM, Hassanien AE. (2020).
[3] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017).
[4] Kathiresan S, Sait ARW, Gupta D, Lakshmanaprabu SK, Pandey HM (2020).
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition.
[5] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely
connected convolutional networks.
DATA AVAILABILITY STATEMENT
• Novel Corona Virus 2019 Dataset (accessed April 23, 2020).
• Bayes C, Valdivieso L. Modelling death rates due to COVID-19: a Bayesian
approach. arXiv (accessed May 5, 2020).
• The datasets presented in the study can be found in online repositories.
• GitHub repository link:
https://github.com/bindhu520/Boost-Model-Accuracy-of-Imbalanced-COVID-19-Mortality-Prediction-Using-GAN-based-Oversampling-Techni
THANK YOU

More Related Content

What's hot

A Review on Feature Selection Methods For Classification Tasks
A Review on Feature Selection Methods For Classification TasksA Review on Feature Selection Methods For Classification Tasks
A Review on Feature Selection Methods For Classification TasksEditor IJCATR
 
Evaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data setsEvaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data setsAlexander Decker
 
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...Pubrica
 
Errors in chemical analyses
Errors in chemical analysesErrors in chemical analyses
Errors in chemical analysesGrace de Jesus
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysisWansuklangk
 
Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingGalit Shmueli
 
JSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerJSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerDennis Sweitzer
 
A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...IJMER
 
Sensitivity analysis
Sensitivity analysisSensitivity analysis
Sensitivity analysisSasquatch S
 
Statistical Modeling: The Two Cultures
Statistical Modeling: The Two CulturesStatistical Modeling: The Two Cultures
Statistical Modeling: The Two CulturesChristoph Molnar
 
Specification based or black box techniques
Specification based or black box techniques Specification based or black box techniques
Specification based or black box techniques Muhammad Ibnu Wardana
 
Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)Andika Mardanu
 

What's hot (17)

Comparison and evaluation of alternative designs
Comparison and evaluation of alternative designsComparison and evaluation of alternative designs
Comparison and evaluation of alternative designs
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Pca analysis
Pca analysisPca analysis
Pca analysis
 
Dt33726730
Dt33726730Dt33726730
Dt33726730
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
A Review on Feature Selection Methods For Classification Tasks
A Review on Feature Selection Methods For Classification TasksA Review on Feature Selection Methods For Classification Tasks
A Review on Feature Selection Methods For Classification Tasks
 
Evaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data setsEvaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data sets
 
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...
Metasem: An R Package For Meta-Analysis Using Structural Equation Modelling: ...
 
Errors in chemical analyses
Errors in chemical analysesErrors in chemical analyses
Errors in chemical analyses
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, Describing
 
JSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerJSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzer
 
A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...
 
Sensitivity analysis
Sensitivity analysisSensitivity analysis
Sensitivity analysis
 
Statistical Modeling: The Two Cultures
Statistical Modeling: The Two CulturesStatistical Modeling: The Two Cultures
Statistical Modeling: The Two Cultures
 
Specification based or black box techniques
Specification based or black box techniques Specification based or black box techniques
Specification based or black box techniques
 
Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)Specification based or black box techniques (andika m)
Specification based or black box techniques (andika m)
 

Similar to Boost model accuracy of imbalanced covid 19 mortality prediction

CSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning ProjectCSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning Projectbutest
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
Predicting Life Expectancy of Hepatitis B Patients
Predicting Life Expectancy of Hepatitis B PatientsPredicting Life Expectancy of Hepatitis B Patients
Predicting Life Expectancy of Hepatitis B Patientsnabeelali11101999
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...cscpconf
 
Analysis of Surveillance Data
Analysis of Surveillance DataAnalysis of Surveillance Data
Analysis of Surveillance DataPerez Eric
 
Detecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.PptDetecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.Pptbarthriley
 
Bayesian Estimation of Reproductive Number for Tuberculosis in India
Bayesian Estimation of Reproductive Number for Tuberculosis in IndiaBayesian Estimation of Reproductive Number for Tuberculosis in India
Bayesian Estimation of Reproductive Number for Tuberculosis in Indiaarjun_bhardwaj
 
KG_based pharma marketing.pptx
KG_based pharma marketing.pptxKG_based pharma marketing.pptx
KG_based pharma marketing.pptxSridhar Nomula
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxrajalakshmi5921
 
Performance of the classification algorithm
Performance of the classification algorithmPerformance of the classification algorithm
Performance of the classification algorithmHoopeer Hoopeer
 
Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications Salford Systems
 
2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response Analysis2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response AnalysisAlejandro Jaramillo
 
IRJET- Heart Disease Prediction System
IRJET- Heart Disease Prediction SystemIRJET- Heart Disease Prediction System
IRJET- Heart Disease Prediction SystemIRJET Journal
 
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Seval Çapraz
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...ThinkInnovation
 
Exploratory Data Analysis and Machine Learning.pptx
Exploratory Data Analysis and Machine Learning.pptxExploratory Data Analysis and Machine Learning.pptx
Exploratory Data Analysis and Machine Learning.pptxAraniNavaratnarajah2
 
2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments2014 IIAG Imputation Assessments
2014 IIAG Imputation AssessmentsDr Lendy Spires
 
Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...IJECEIAES
 

Similar to Boost model accuracy of imbalanced covid 19 mortality prediction (20)

CSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning ProjectCSCI 6505 Machine Learning Project
CSCI 6505 Machine Learning Project
 
cadd.pptx
cadd.pptxcadd.pptx
cadd.pptx
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
Predicting Life Expectancy of Hepatitis B Patients
Predicting Life Expectancy of Hepatitis B PatientsPredicting Life Expectancy of Hepatitis B Patients
Predicting Life Expectancy of Hepatitis B Patients
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
 
Analysis of Surveillance Data
Analysis of Surveillance DataAnalysis of Surveillance Data
Analysis of Surveillance Data
 
Detecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.PptDetecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
Detecting Dif Between Conventional And Computerized Adaptive Testing.Ppt
 
Bayesian Estimation of Reproductive Number for Tuberculosis in India
Bayesian Estimation of Reproductive Number for Tuberculosis in IndiaBayesian Estimation of Reproductive Number for Tuberculosis in India
Bayesian Estimation of Reproductive Number for Tuberculosis in India
 
KG_based pharma marketing.pptx
KG_based pharma marketing.pptxKG_based pharma marketing.pptx
KG_based pharma marketing.pptx
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 
Performance of the classification algorithm
Performance of the classification algorithmPerformance of the classification algorithm
Performance of the classification algorithm
 
Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications
 
JEDM_RR_JF_Final
JEDM_RR_JF_FinalJEDM_RR_JF_Final
JEDM_RR_JF_Final
 
2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response Analysis2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response Analysis
 
IRJET- Heart Disease Prediction System
IRJET- Heart Disease Prediction SystemIRJET- Heart Disease Prediction System
IRJET- Heart Disease Prediction System
 
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
Statistical Data Analysis on a Data Set (Diabetes 130-US hospitals for years ...
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
 
Exploratory Data Analysis and Machine Learning.pptx
Exploratory Data Analysis and Machine Learning.pptxExploratory Data Analysis and Machine Learning.pptx
Exploratory Data Analysis and Machine Learning.pptx
 
2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments
 
Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...
 

More from BindhuBhargaviTalasi (20)

Inheritance
InheritanceInheritance
Inheritance
 
Blood relations
Blood relationsBlood relations
Blood relations
 
Battery
BatteryBattery
Battery
 
Batteries
BatteriesBatteries
Batteries
 
Water
WaterWater
Water
 
Stories
StoriesStories
Stories
 
Predicates
PredicatesPredicates
Predicates
 
Mathematical foundations of computer science
Mathematical foundations of computer scienceMathematical foundations of computer science
Mathematical foundations of computer science
 
Jdbc
JdbcJdbc
Jdbc
 
Blue jacking
Blue jackingBlue jacking
Blue jacking
 
Mathematical foundations of computer science
Mathematical foundations of computer scienceMathematical foundations of computer science
Mathematical foundations of computer science
 
Algebraic structures
Algebraic structuresAlgebraic structures
Algebraic structures
 
Bike sharing prediction
Bike sharing predictionBike sharing prediction
Bike sharing prediction
 
Travel agency
Travel agencyTravel agency
Travel agency
 
Functions
FunctionsFunctions
Functions
 
Introduction to set theory
Introduction to set theoryIntroduction to set theory
Introduction to set theory
 
Library system
Library systemLibrary system
Library system
 
Data analytics
Data analyticsData analytics
Data analytics
 
Agristore
AgristoreAgristore
Agristore
 
Collection framework
Collection frameworkCollection framework
Collection framework
 

Recently uploaded

The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docxThe Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docxCenterEnamel
 
Construction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxConstruction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxwendy cai
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwoodseandesed
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdfAhmedHussein950959
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectRased Khan
 
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationKIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationDr. Radhey Shyam
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdfKamal Acharya
 
Peek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfPeek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfAyahmorsy
 
Fruit shop management system project report.pdf
Fruit shop management system project report.pdfFruit shop management system project report.pdf
Fruit shop management system project report.pdfKamal Acharya
 
Furniture showroom management system project.pdf
Furniture showroom management system project.pdfFurniture showroom management system project.pdf
Furniture showroom management system project.pdfKamal Acharya
 
Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdfKamal Acharya
 
Digital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfDigital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfAbrahamGadissa
 
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data StreamKIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data StreamDr. Radhey Shyam
 
Online resume builder management system project report.pdf
Online resume builder management system project report.pdfOnline resume builder management system project report.pdf
Online resume builder management system project report.pdfKamal Acharya
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringC Sai Kiran
 
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and ClusteringKIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and ClusteringDr. Radhey Shyam
 
NO1 Pandit Amil Baba In Bahawalpur, Sargodha, Sialkot, Sheikhupura, Rahim Yar...
NO1 Pandit Amil Baba In Bahawalpur, Sargodha, Sialkot, Sheikhupura, Rahim Yar...NO1 Pandit Amil Baba In Bahawalpur, Sargodha, Sialkot, Sheikhupura, Rahim Yar...
NO1 Pandit Amil Baba In Bahawalpur, Sargodha, Sialkot, Sheikhupura, Rahim Yar...Amil baba
 
fluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answerfluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answerapareshmondalnita
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industriesMuhammadTufail242431
 

Recently uploaded (20)

The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docxThe Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
 
Construction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxConstruction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptx
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker project
 
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationKIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
Peek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfPeek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdf
 
Fruit shop management system project report.pdf
Fruit shop management system project report.pdfFruit shop management system project report.pdf
Fruit shop management system project report.pdf
 
Furniture showroom management system project.pdf
Furniture showroom management system project.pdfFurniture showroom management system project.pdf
Furniture showroom management system project.pdf
 
Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdf
 
Digital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfDigital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdf
 

Boost model accuracy of imbalanced covid 19 mortality prediction

  • 1. BOOST MODEL ACCURACY OF IMBALANCED COVID-19 MORTALITY PREDICTION USING GAN-BASED OVERSAMPLING TECHNIQUE.
  • 2. CONTENTS • Abstract • Introduction • GAN • Data Preprocessing • Data Analysis • Evaluation Metrics • Model Comparison • Conclusion • References
  • 3. ABSTRACT The model uses the COVID-19 patient's geographical, travel, health, and demographic data to predict the severity of the case and the possible outcome, recovery, or death. The data analysis reveals a positive correlation between patients' gender and deaths, and also indicates that the majority of patients are aged between 20 and 70 years. This paper proposes a fine-tuned Random Forest model boosted by the AdaBoost algorithm.
  • 4. INTRODUCTION • We study solved cases and data from forums and published research to understand their methodology, and try to improve accuracy or reduce the error with additional steps. • Conventional methods such as Random Oversampling (ROS), the Synthetic Minority Oversampling Technique (SMOTE), and others can be applied to imbalanced data. • The models in the study were trained using 222 patient records with 13 features.
  • 5. GENERATIVE ADVERSARIAL NETWORKS (GAN) • Generative adversarial networks are based on a game-theoretic scenario in which the generator network must compete against an adversary. • Because a GAN learns to mimic the distribution of the data, it has been applied in various fields such as music, video, and natural language, and more recently to imbalanced data problems.
  • 6. GENERATIVE ADVERSARIAL NETWORKS (GAN) • Oversampling based on Generative Adversarial Networks (GAN) overcomes limitations of conventional methods, such as overfitting, and allows the development of a highly accurate prediction model for imbalanced data. FIG 1: GAN-BASED OVERSAMPLING https://cdn.Analytics.Com/wp-content/uploads/2020/10/image2-2.Png
  • 7. HOW DOES A GAN GENERATE SYNTHETIC DATA? • Two neural networks compete against each other to learn the target distribution and generate artificial data. • A generator network G: mimics training samples to fool the discriminator. • A discriminator network D: discriminates between training samples and generated samples.
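The competition between the two networks is commonly written as a minimax objective. The slides do not state it, so the formula below is the standard formulation rather than something taken from this deck, and the symbols G (generator), p_data (real data distribution), and p_z (latent noise distribution) are notation assumed here for illustration.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator D tries to maximize this value by scoring real samples near 1 and generated samples near 0, while the generator tries to minimize it by producing samples that D scores as real.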
  • 8. DATASET
      Column | Description | Values (for categorical variables) | Type
      id | Patient Id | NA | Numeric
      location | The location where the patient belongs to | Multiple cities located throughout the world | String, Categorical
      country | Patient’s native country | Multiple countries | String, Categorical
      gender | Patient’s gender | Male, Female | String, Categorical
      age | Patient’s age | NA | Numeric
      sym_on | The date patient started noticing the symptoms | NA | Date
  • 9. DATA PRE-PROCESSING • The dataset consists of columns of Date, String, and Numeric type, and it also contains categorical variables. • Since the ML model requires all input data to be in numeric form, we performed label-encoding of the categorical variables. • This assigns a number to every unique categorical value in the column.
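Label-encoding as described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' preprocessing code: the helper name `label_encode` is hypothetical, and assigning integers in sorted order is an assumption chosen to mirror the convention used by common encoders.

```python
def label_encode(values):
    """Map each unique category to an integer, assigned in sorted order."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Example with the 'gender' column from the dataset description.
gender = ["Male", "Female", "Female", "Male"]
encoded, mapping = label_encode(gender)
print(encoded)   # integer codes, one per row
print(mapping)   # category -> integer lookup
```

Keeping the returned mapping makes it possible to apply the same codes to the test split and to decode predictions later.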
  • 10. DEFINING GENERATOR • The generator takes input from latent space and generates new synthetic samples. Using the leaky rectified linear activation unit (LeakyReLU) in both the generator and the discriminator model is good practice, because it lets a scaled-down portion of negative values pass through instead of zeroing them. • It is used with the default recommended slope of 0.2 and the appropriate weight initializer “he uniform”. • In the output layer, the SoftMax activation function is used for categorical variables and sigmoid is used for continuous variables.
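The three activation functions named on this slide can be written out directly to show what each one does to a value. The sketch below uses only the Python standard library and is for illustration; it is not the generator itself, and the function names are chosen here rather than taken from the slides.

```python
import math


def leaky_relu(x, alpha=0.2):
    """Pass positive values unchanged; scale negative values by alpha."""
    return x if x > 0 else alpha * x


def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(leaky_relu(-1.0))          # negative input is scaled by 0.2
print(sigmoid(0.0))              # midpoint of the sigmoid
print(softmax([1.0, 2.0, 3.0]))  # probabilities over three categories
```

With a slope of 0.2, negative pre-activations still contribute a non-zero gradient, which is the usual reason LeakyReLU is preferred over plain ReLU in GANs.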
  • 11. DEFINING DISCRIMINATOR • The discriminator model will take a sample from our data, such as a vector, and output a classification prediction as to whether the sample is real or fake. • This is a binary classification problem, so sigmoid activation is used in the output layer and binary cross-entropy loss function is used in model compilation. • The Adam optimization algorithm with the learning rate LR of 0.0002 and the recommended beta1 momentum value of 0.5 is used.
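The binary cross-entropy loss mentioned above can be illustrated with a small standard-library sketch. The labels and predicted probabilities are made-up values for illustration, and the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 1, 0, 0]              # 1 = real sample, 0 = generated sample
confident = [0.9, 0.8, 0.1, 0.2]   # discriminator separates the classes well
unsure = [0.5, 0.5, 0.5, 0.5]      # discriminator cannot tell them apart

print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, unsure))
```

A discriminator that outputs 0.5 for every sample has a loss of ln 2, which is roughly the value the loss drifts toward when the generator's samples become hard to distinguish from real data.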
  • 12. DATA ANALYSIS • Fever, cough, cold, fatigue, body pain, and malaise were the most common symptoms noticed in patients whose data is available in this dataset. • Correlation between features of the dataset provides crucial information about the features and the degree of influence they have over the target value. FIG 2: SYMPTOMS IN PATIENTS https://www.Ncbi.v/pmc/articles/PMC7350612/figure/F1/
  • 13. EVALUATION METRICS • The model is evaluated using the following metrics, where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives. • ACCURACY: Given a dataset consisting of (TP + TN + FP + FN) data points, the accuracy is equal to the ratio of total correct predictions (TP + TN) by the classifier to the total data points. Accuracy is an important measure used to assess the performance of the classification model. • $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$, with $0.0 \le \text{Accuracy} \le 1.0$
  • 14. PRECISION • Precision is equal to the ratio of the True Positive (TP) samples to the sum of True Positive (TP) and False Positive (FP) samples. • Precision is also a key metric to identify the number of correctly classified patients in an imbalanced class dataset. • $\text{Precision} = \frac{TP}{TP + FP}$
  • 15. RECALL • Recall is equal to the ratio of the True Positive (TP) samples to the sum of True Positive (TP) and False Negative (FN) samples. • Recall is a significant metric to identify the number of correctly classified patients in an imbalanced class dataset out of all the patients that could have been correctly predicted. • $\text{Recall} = \frac{TP}{TP + FN}$
  • 16. F1 SCORE • F1 Score is equal to the harmonic mean of the Recall and Precision values. • The F1 Score balances Precision and Recall, which makes it a more reliable measure than accuracy alone of the model's performance in classifying COVID-19 patients on imbalanced data. • This is the most significant measure that we will be using to evaluate the model. • $\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
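The four metrics defined on the preceding slides can be computed together from the confusion-matrix counts. The sketch below is illustrative only: the function name is chosen here, and the counts in the example are hypothetical values, not counts reported in the slides.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 3 deaths predicted correctly, 1 missed, no false alarms.
m = classification_metrics(tp=3, tn=6, fp=0, fn=1)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

The zero-denominator guards matter on imbalanced data, where a classifier may predict no positives at all and precision would otherwise be undefined.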
  • 18. MODEL COMPARISON • The model performance is tested on the actual (original) split test data. • After splitting the original data into train and test sets, the data generated by the GAN is added to the train data only, to compare the performance with the base model. FIG 3: COMPARISON OF VARIOUS MODELS
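The order of operations described here (split first, then add synthetic rows to the training portion only) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the 20% test fraction, the row layout, and the `generated` rows standing in for GAN output are all hypothetical, and no GAN is trained in this snippet.

```python
import random


def split_then_augment(rows, synthetic_rows, test_fraction=0.2, seed=0):
    """Split original rows into train/test, then add synthetic rows to train only."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]                             # original data only
    train = shuffled[n_test:] + list(synthetic_rows)     # original + synthetic
    return train, test


# 50 original records with a rare positive class (death = 1).
original = [{"id": i, "death": int(i % 10 == 0), "synthetic": False} for i in range(50)]
# Placeholder rows standing in for minority-class samples produced by a GAN.
generated = [{"id": None, "death": 1, "synthetic": True} for _ in range(20)]

train, test = split_then_augment(original, generated)
print(len(train), len(test))
print(sum(r["death"] for r in train), "minority rows in train")
```

Keeping synthetic rows out of the test set means the reported scores reflect performance on real patients rather than on samples the generator itself produced.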
  • 19. MODEL COMPARISON
      Metric | Score of Base Model* | Score with Augmented Generated Data
      Recall Score | 0.75 | 0.83
      Precision Score | 1 | 1
      F1 Score | 0.86 | 0.9
      Accuracy | 0.9 | 0.95
  • 20. CONCLUSION The proposed model provides a more accurate and robust result than the base model, showing that GAN-based oversampling overcomes the limitations of the imbalanced data by appropriately inflating the minority class.
  • 21. REFERENCES [1] WHO Situation Report-94 Coronavirus disease 2019 (COVID-19) (2020). [2] Sujatha R, Chatterjee JM, Hassanien AE. (2020). [3] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017). [4] Kathiresan S, Sait ARW, Gupta D, Lakshmanaprabu SK, Pandey HM (2020). He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. [5] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks.
  • 22. DATA AVAILABILITY STATEMENT • Novel Corona Virus 2019 Dataset (accessed April 23, 2020). • Bayes C, Valdivieso L. Modelling death rates due to COVID-19: a Bayesian approach. arXiv (accessed May 5, 2020). • The datasets presented in the study can be found in online repositories. • GitHub repository link: https://github.com/bindhu520/Boost-Model-Accuracy-of-Imbalanced-COVID-19-Mortality-Prediction-Using-GAN-based-Oversampling-Techni