Analysis and Early Prediction of
Sepsis using Clinical Data
By Anushree Ankola
Advisor: Dr. Anand Panangadan
Reviewer : Prof. Tseng Chen James
Agenda
 Sepsis – Affects and Symptoms
 Objective
 Challenge Dataset
 Procedure
 EDA – missing values
 Feature Engineering
 EDA – Dataset Imbalance
 Choosing Accuracy metric
 Decision Tree for prediction
 Future Scope 1 - Using XGBoost for prediction
 Further research and findings
 Conclusion
 Reference
Sepsis - Statistics
Sepsis - Affect and Symptoms
 Affects:
• very young children,
• older adults,
• people with chronic diseases,
• and those with a weakened immune system
 Sepsis can be difficult to diagnose because it occurs quickly and can be confused
with other conditions. Watch for a combination of the following symptoms.
 S Shivering, fever, or very cold
E Extreme pain or general discomfort (“worst ever”)
P Pale or discolored skin
S Sleepy, difficult to rouse, confused
I “I feel like I might die!”
S Short of breath
Objective
 The goal of the analysis is the early detection of sepsis using physiological data.
 The early prediction of sepsis is potentially life-saving, and we aim to predict
sepsis 6 hours before the clinical prediction of sepsis.
 Late prediction of sepsis is potentially life-threatening and also consumes heavy
hospital resources.
 Predicting sepsis in non-sepsis patients, or predicting it very early in sepsis
patients, consumes only limited resources, so we assume the cost of an early
prediction to be minimal while its benefit could be revolutionary.
Challenge Dataset
 The data used in the competition comes from ICU patients in two separate hospital
systems and was obtained from PhysioNet.
 The data is split into a 70% training set and a 30% testing set. Part of the training
set is held out for validation.
 The original data for each patient is contained within a single pipe-delimited text
file. Each file has the same header and each row represents a single hour's
worth of data. Each hospital has 20,000 patients and hence 20,000 files.
 Available patient covariates consist of Demographics, Vital Signs, and Laboratory
values
 Features:
• 8 Vital Signs: Heart Rate, Temperature, Blood Pressure, Respiratory Rate, etc.
• 26 Laboratory Values: Platelet Count, Glucose, Calcium, etc.
• 6 Demographics: Age, Gender, Time in ICU, Hospital Admit Time, etc.
 1 Label:
• 0 (Non-sepsis) and 1 (Sepsis)
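The 70/30 split described above can be sketched as follows. This is only an illustration: it assumes the split is stratified by label so that the rare sepsis class appears in both sets, which the slides do not state, and the helper name stratified_split is hypothetical.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels with the same 98% / 2% balance reported in the slides.
labels = [0] * 980 + [1] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(sum(labels[i] for i in test_idx), "sepsis rows in the test set")
```

Splitting each class on its own keeps roughly the same sepsis rate in the training and testing sets.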
Sepsis Data
Assumptions
 Combined the dataset by appending all the patient files
 Total files: 43,765 psv files
 Shape of the combined data: (1,552,287 rows * 41 columns)
 The dataset is not time dependent.
 2 approaches to handle this:
1. Add a time component and a patient ID
2. Ignore the time component and consider each row independently
 We follow the 2nd approach. Reason: sepsis can be predicted without past patient
data, which is more robust and needs fewer resources.
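Appending the per-patient files into one table could look like the sketch below. It uses only the standard csv module; the two in-memory "files" and their values are made up for illustration, and the function name combine_patient_files is hypothetical.

```python
import csv
import io

def combine_patient_files(files):
    """Append the hourly rows of every pipe-delimited patient file."""
    rows = []
    for handle in files:
        reader = csv.DictReader(handle, delimiter="|")
        rows.extend(reader)  # one dict per hourly row, keyed by header
    return rows

# Two toy "files" standing in for real .psv files (values are invented).
patient_a = io.StringIO("HR|Temp|SepsisLabel\n80|36.8|0\n85|NaN|0\n")
patient_b = io.StringIO("HR|Temp|SepsisLabel\n110|38.9|1\n")

combined = combine_patient_files([patient_a, patient_b])
print(len(combined), "rows,", len(combined[0]), "columns")
```

With real data, each element of the list would be a file opened with open(path, newline=""). Because rows are simply appended, the combined table keeps no patient identifier, which matches the second approach above.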
Procedure
• COMBINE ALL DATA
• NON-TIME DEPENDENT APPROACH
• HANDLING MISSING VALUES
• HANDLING DATA IMBALANCE
• BASELINE PREDICTION
• FEATURE ENGINEERING
EDA - Handling
Missing Values
 Most laboratory features have missing
values (Fig)
 Missingness exceeds 90% for some features
in the dataset
 2 steps to handle:
• Remove features with missingness > 92%
• Categorically encode features to handle
missingness.
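The first step, removing features with more than 92% missing values, can be sketched as below. The table layout (a dict of column lists) and the toy values are assumptions made for illustration; only the 92% threshold comes from the slides.

```python
import math

def missing_fraction(column):
    """Fraction of entries in a column that are None or NaN."""
    missing = sum(
        1 for value in column
        if value is None or (isinstance(value, float) and math.isnan(value))
    )
    return missing / len(column)

def drop_sparse_features(table, threshold=0.92):
    """Keep only the columns whose missing fraction is at most `threshold`."""
    return {
        name: column
        for name, column in table.items()
        if missing_fraction(column) <= threshold
    }

nan = float("nan")
toy_table = {
    "HR": [80.0, 85.0, nan, 90.0] * 25,    # 25% missing, kept
    "Lactate": [nan] * 95 + [1.2] * 5,     # 95% missing, dropped
}
kept = drop_sparse_features(toy_table)
print(sorted(kept))
```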
Feature Selection – Part 1
 Two approaches employed for feature selection:
1. Checked the correlation of each feature with the presence of sepsis
2. Read health magazines and research journals such as
• US National Library of Medicine, National Institutes of Health
• Centers for Disease Control and Prevention
• Sepsis - The American Journal of Medicine
and picked out the most frequently named indicators of sepsis
 Outcome: Heart rate, Pulse Oximetry, Body temperature, Blood
Pressure (SBP, DBP), Mean Arterial Pressure, Respiration rate, Fraction of
inspired oxygen, Age, Gender, Hospital Admission Time and ICU
length of stay.
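The correlation check in the first approach might look like the sketch below. The slides do not say which correlation measure was used; Pearson correlation between a feature and the 0/1 label is assumed here, and the heart-rate values are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)

# Invented readings: higher heart rates coincide with the sepsis label.
heart_rate = [70, 75, 80, 110, 120, 72]
sepsis_label = [0, 0, 0, 1, 1, 0]

r = pearson(heart_rate, sepsis_label)
print(round(r, 3))
```

Features whose correlation with the label is close to zero would be candidates for removal, subject to the clinical reading in the second approach.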
Feature Engineering & label encoding
 Developed 8 new features, described below:
1. new_age: has 3 categorical values – old, young and adult
2. new_hr, new_temp, new_o2sat, new_bp, new_resp, new_map, new_fio2: each has 3
categorical values – normal, abnormal and missing
 Next, performed feature selection again on them and selected all of the above features,
plus Gender, Hospital Admission Time and ICU length of Stay for further
processing as a training set
 All of these are categorical values. They are encoded so that it is easier to run an ML
algorithm.
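A feature such as new_hr, followed by label encoding, can be sketched as below. The slides do not give the cut-offs that separate normal from abnormal, so the heart-rate range of 60 to 100 beats per minute is an assumed placeholder, and the helper names are hypothetical.

```python
import math

def bin_vital(value, low, high):
    """Map a reading to 'missing', 'normal' or 'abnormal'."""
    if value is None or math.isnan(value):
        return "missing"
    return "normal" if low <= value <= high else "abnormal"

def label_encode(values):
    """Give each distinct category an integer code (alphabetical order)."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

# Assumed reference range for heart rate; not taken from the slides.
HR_RANGE = (60, 100)

heart_rates = [72.0, 130.0, float("nan"), 55.0]
new_hr = [bin_vital(v, *HR_RANGE) for v in heart_rates]
encoded, mapping = label_encode(new_hr)

print(new_hr)    # ['normal', 'abnormal', 'missing', 'abnormal']
print(encoded)   # [2, 0, 1, 0]
```

Treating "missing" as its own category is what lets the model use rows with missing readings without imputing a value.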
EDA – Handling Data
Imbalance
 98% of patients do not have sepsis and 2%
have sepsis.
 Problem with Accuracy
 Ways to deal with Imbalance:
• Undersampling
• Oversampling
• Using a good algorithm
• Using Balanced Bagging Classifier
 Which is better?
• Balanced Bagging Classifier with Decision Trees
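The problem with accuracy on a 98% / 2% split can be seen with a small worked example. The sketch assumes a trivial model that always predicts "no sepsis"; the 100 toy labels mirror the class balance reported above.

```python
# 98 non-sepsis rows and 2 sepsis rows, as in the reported class balance.
labels = [0] * 98 + [1] * 2

# A trivial model that never predicts sepsis.
predictions = [0] * len(labels)

correct = sum(1 for p, y in zip(predictions, labels) if p == y)
accuracy = correct / len(labels)
sepsis_found = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)

print(accuracy)       # 0.98
print(sepsis_found)   # 0
```

The model scores 98% accuracy while detecting none of the sepsis cases, which is why accuracy alone is misleading here.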
Training Data with Decision
Trees
 Pre-work:
• Common classification metrics such as the accuracy score are not useful
because the data is imbalanced
• Precision is defined as the fraction of relevant examples (true positives)
among all of the examples which were predicted to belong in a certain
class.
Precision = (true positives) / (true positives + false positives)
• Recall is defined as the fraction of examples which were correctly predicted to
belong to a class with respect to all of the examples that truly belong in
the class.
Recall = (true positives) / (true positives + false negatives)
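The two formulas above translate directly into code. The sketch below counts true positives, false positives and false negatives for the sepsis class; the function name precision_recall and the toy predictions are made up for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the `positive` class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# 3 true sepsis rows; the model flags 5 rows, 2 of them correctly.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(precision)          # 0.4  (2 of 5 flagged rows are sepsis)
print(round(recall, 3))   # 0.667 (2 of 3 sepsis rows are found)
```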
Training Data with
Decision Trees
 Using the Balanced Bagging Classifier from the
imblearn library, which automatically creates
balanced samples of the input data.
 The classifier has the parameter 'ratio', which controls
how the data is sampled. I have used 'majority',
which resamples only the majority class
 From the figure, although the ROC curve seems
promising, the P-R curve shows that the model is not
great at classifying.
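The idea of resampling only the majority class for each bag can be sketched without the library. This is a simplified stand-in, not imblearn's implementation: each bag keeps every minority row and draws an equal number of majority rows at random. In recent imblearn releases the 'ratio' parameter is named sampling_strategy. The function name balanced_bags is hypothetical.

```python
import random
from collections import Counter

def balanced_bags(labels, n_bags=3, seed=0):
    """Build index lists where both classes appear equally often."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]

    bags = []
    for _ in range(n_bags):
        # Under-sample the majority class down to the minority size.
        sampled_majority = rng.sample(majority_idx, len(minority_idx))
        bags.append(sorted(minority_idx + sampled_majority))
    return bags

labels = [0] * 98 + [1] * 2
bags = balanced_bags(labels)
for bag in bags:
    print(bag, [labels[i] for i in bag])
```

In a bagging ensemble, one decision tree would be trained on each bag and the trees' predictions combined, so every tree sees a 50/50 class balance.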
Training the data with XGBoost
XGBoost - eXtreme Gradient Boosting
• Boosting: a method that converts
weak learners into strong learners
• A boosting algorithm like XGBoost adds weak learners
to the model sequentially, fitting each new learner to the
errors of the ensemble so far. This reduces the bias of
the model and typically improves accuracy.
• Benefits of XGBoost: Highly scalable/parallelizable,
quick to execute, and typically outperforms other
algorithms.
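The sequential idea behind boosting can be shown with a toy example. This is not XGBoost itself: it is a bare-bones gradient boosting loop with one-split decision stumps and squared-error loss on a single invented feature, chosen only to show how each new weak learner is fit to the remaining errors.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def boost(xs, ys, n_rounds, learning_rate=0.5):
    """Add stumps one at a time, each fit to the current residuals."""
    predictions = [sum(ys) / len(ys)] * len(xs)  # start from the mean
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return predictions

def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 1, 0, 1, 1]
one_round = boost(xs, ys, n_rounds=1)
many_rounds = boost(xs, ys, n_rounds=20)
print(mse(ys, one_round), mse(ys, many_rounds))
```

Each stump on its own is a weak learner, but the training error keeps falling as more are added, which is the bias reduction described above.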
Further Research and Findings
 Time-component approach; needs a domain expert
 PCA for understanding the variables better
 Using SMOTE for handling imbalance
 Work further on XGBoost
 Better feature engineering
 Ways to reduce hospital stay time
Learning Curve with the Project
 Python – Object Oriented Structure and Programming
 Libraries heavily used – Sklearn, Matplotlib
 Built on Jupyter Notebook
Conclusion
 We have handled the missingness and imbalance in the large dataset
 We removed features with more than 92% missing values
 Performed feature engineering (8 new features) and selected important features
 We aimed to predict the onset of sepsis 6 hours early, and so far the Machine
Learning model employed seems to classify it only partially
 The project can be continued with further research on the importance of
the features and better model building, under the guidance of a good health
science domain expert.
References
[1] https://www.physionet.org/content/challenge-2019/1.0.0/
[2] https://www.datacamp.com/community/tutorials/decision-tree-classification-python
[3] https://towardsdatascience.com/using-bagging-and-boosting-to-improve-classification-tree-accuracy-6d3bb6c95e5b
[4] https://towardsdatascience.com/early-detection-of-sepsis-using-physiological-data-78d5f31fab9d
[5] https://iopscience.iop.org/article/10.1088/1757-899X/428/1/012004
[6] https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/
[7] https://www.cdc.gov/
[8] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6429642/
[9] http://www.erogol.com/fighting-class-unbalance-supervised-ml-problem/
Thank You
 I would like to thank my advisor Dr.
Anand Panangadan for helping me
with the project
 I would like to thank my friends at
Edward Life Sciences for advising me
on ways to approach the problem
 I would like to thank my university for giving
me the necessary skills to attempt and
complete the project

More Related Content

Similar to Final_Presentation.pptx

Enabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse EnvironmentsEnabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse EnvironmentsLuis Marco Ruiz
 
Enabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse EnvironmentsEnabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse EnvironmentsLuis Marco Ruiz
 
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...IJARIIT
 
Disease prediction and doctor recommendation system
Disease prediction and doctor recommendation systemDisease prediction and doctor recommendation system
Disease prediction and doctor recommendation systemsabafarheen
 
SHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPSHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPAlAcademia Tsr
 
Health Analyzer System
Health Analyzer SystemHealth Analyzer System
Health Analyzer SystemIRJET Journal
 
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Mark Wilkinson
 
Zeroth review major project (1).pptx
Zeroth review major project (1).pptxZeroth review major project (1).pptx
Zeroth review major project (1).pptxShreyaBharadwaj7
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Clinical_Decision_Support_For_Heart_Disease
Clinical_Decision_Support_For_Heart_DiseaseClinical_Decision_Support_For_Heart_Disease
Clinical_Decision_Support_For_Heart_DiseaseSunil Kakade
 
Case Based Medical Diagnosis of Occupational Chronic Lung Diseases From Their...
Case Based Medical Diagnosis of Occupational Chronic Lung Diseases From Their...Case Based Medical Diagnosis of Occupational Chronic Lung Diseases From Their...
Case Based Medical Diagnosis of Occupational Chronic Lung Diseases From Their...CSCJournals
 
Disease Prediction And Doctor Appointment system
Disease Prediction And Doctor Appointment  systemDisease Prediction And Doctor Appointment  system
Disease Prediction And Doctor Appointment systemKOYELMAJUMDAR1
 
Proposed Model for Chest Disease Prediction using Data Analytics
Proposed Model for Chest Disease Prediction using Data AnalyticsProposed Model for Chest Disease Prediction using Data Analytics
Proposed Model for Chest Disease Prediction using Data Analyticsvivatechijri
 
Evaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk predictionEvaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk predictionEwout Steyerberg
 
ICU Patient Deterioration Prediction : A Data-Mining Approach
ICU Patient Deterioration Prediction : A Data-Mining ApproachICU Patient Deterioration Prediction : A Data-Mining Approach
ICU Patient Deterioration Prediction : A Data-Mining Approachcsandit
 
ICU PATIENT DETERIORATION PREDICTION: A DATA-MINING APPROACH
ICU PATIENT DETERIORATION PREDICTION: A DATA-MINING APPROACHICU PATIENT DETERIORATION PREDICTION: A DATA-MINING APPROACH
ICU PATIENT DETERIORATION PREDICTION: A DATA-MINING APPROACHcscpconf
 
AI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision MedicineAI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision MedicineSean Yu
 
Audit and stat for medical professionals
Audit and stat for medical professionalsAudit and stat for medical professionals
Audit and stat for medical professionalsNadir Mehmood
 

Similar to Final_Presentation.pptx (20)

Enabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse EnvironmentsEnabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse Environments
 
Enabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse EnvironmentsEnabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse Environments
 
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...
 
Disease prediction and doctor recommendation system
Disease prediction and doctor recommendation systemDisease prediction and doctor recommendation system
Disease prediction and doctor recommendation system
 
SHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPSHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLP
 
Health Analyzer System
Health Analyzer SystemHealth Analyzer System
Health Analyzer System
 
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
 
Zeroth review major project (1).pptx
Zeroth review major project (1).pptxZeroth review major project (1).pptx
Zeroth review major project (1).pptx
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Clinical_Decision_Support_For_Heart_Disease
Clinical_Decision_Support_For_Heart_DiseaseClinical_Decision_Support_For_Heart_Disease
Clinical_Decision_Support_For_Heart_Disease
 
Case Based Medical Diagnosis of Occupational Chronic Lung Diseases From Their...
Case Based Medical Diagnosis of Occupational Chronic Lung Diseases From Their...Case Based Medical Diagnosis of Occupational Chronic Lung Diseases From Their...
Case Based Medical Diagnosis of Occupational Chronic Lung Diseases From Their...
 
Disease Prediction And Doctor Appointment system
Disease Prediction And Doctor Appointment  systemDisease Prediction And Doctor Appointment  system
Disease Prediction And Doctor Appointment system
 
BFRG AI Investor Aug 2023
BFRG AI Investor Aug 2023BFRG AI Investor Aug 2023
BFRG AI Investor Aug 2023
 
Proposed Model for Chest Disease Prediction using Data Analytics
Proposed Model for Chest Disease Prediction using Data AnalyticsProposed Model for Chest Disease Prediction using Data Analytics
Proposed Model for Chest Disease Prediction using Data Analytics
 
Evaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk predictionEvaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk prediction
 
ICU Patient Deterioration Prediction : A Data-Mining Approach
ICU Patient Deterioration Prediction : A Data-Mining ApproachICU Patient Deterioration Prediction : A Data-Mining Approach
ICU Patient Deterioration Prediction : A Data-Mining Approach
 
ICU PATIENT DETERIORATION PREDICTION: A DATA-MINING APPROACH
ICU PATIENT DETERIORATION PREDICTION: A DATA-MINING APPROACHICU PATIENT DETERIORATION PREDICTION: A DATA-MINING APPROACH
ICU PATIENT DETERIORATION PREDICTION: A DATA-MINING APPROACH
 
AI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision MedicineAI-powered Medical Imaging Analysis for Precision Medicine
AI-powered Medical Imaging Analysis for Precision Medicine
 
Audit and stat for medical professionals
Audit and stat for medical professionalsAudit and stat for medical professionals
Audit and stat for medical professionals
 

Recently uploaded

Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetCall Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meetpriyashah722354
 
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7Miss joya
 
Basics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptxBasics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptxAyush Gupta
 
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅gragmanisha42
 
Hot Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In Chandigarh
Hot  Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In ChandigarhHot  Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In Chandigarh
Hot Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In ChandigarhVip call girls In Chandigarh
 
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service DehradunDehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service DehradunNiamh verma
 
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetChandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meetpriyashah722354
 
Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋Sheetaleventcompany
 
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...High Profile Call Girls Chandigarh Aarushi
 
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...Gfnyt.com
 
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591adityaroy0215
 
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012Call Girls Service Gurgaon
 
💚😋Kolkata Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Kolkata Escort Service Call Girls, ₹5000 To 25K With AC💚😋💚😋Kolkata Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Kolkata Escort Service Call Girls, ₹5000 To 25K With AC💚😋Sheetaleventcompany
 
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service GurgaonCall Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service GurgaonCall Girls Service Gurgaon
 
Hot Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In Ludhiana
Hot  Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In LudhianaHot  Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In Ludhiana
Hot Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In LudhianaRussian Call Girls in Ludhiana
 

Recently uploaded (20)

Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetCall Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
 
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
 
Basics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptxBasics of Anatomy- Language of Anatomy.pptx
Basics of Anatomy- Language of Anatomy.pptx
 
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
 
Hot Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In Chandigarh
Hot  Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In ChandigarhHot  Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In Chandigarh
Hot Call Girl In Chandigarh 👅🥵 9053'900678 Call Girls Service In Chandigarh
 
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service DehradunDehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
Dehradun Call Girls Service ❤️🍑 9675010100 👄🫦Independent Escort Service Dehradun
 
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetChandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
 
Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls in Lucknow Esha 🔝 8923113531 🔝 🎶 Independent Escort Service Lucknow
Call Girls in Lucknow Esha 🔝 8923113531  🔝 🎶 Independent Escort Service LucknowCall Girls in Lucknow Esha 🔝 8923113531  🔝 🎶 Independent Escort Service Lucknow
Call Girls in Lucknow Esha 🔝 8923113531 🔝 🎶 Independent Escort Service Lucknow
 
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
 
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
Russian Call Girls in Chandigarh Ojaswi ❤️🍑 9907093804 👄🫦 Independent Escort ...
 
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
 
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
 
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
 
Call Girl Lucknow Gauri 🔝 8923113531 🔝 🎶 Independent Escort Service Lucknow
Call Girl Lucknow Gauri 🔝 8923113531  🔝 🎶 Independent Escort Service LucknowCall Girl Lucknow Gauri 🔝 8923113531  🔝 🎶 Independent Escort Service Lucknow
Call Girl Lucknow Gauri 🔝 8923113531 🔝 🎶 Independent Escort Service Lucknow
 
💚😋Kolkata Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Kolkata Escort Service Call Girls, ₹5000 To 25K With AC💚😋💚😋Kolkata Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Kolkata Escort Service Call Girls, ₹5000 To 25K With AC💚😋
 
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service GurgaonCall Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
 
Hot Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In Ludhiana
Hot  Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In LudhianaHot  Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In Ludhiana
Hot Call Girl In Ludhiana 👅🥵 9053'900678 Call Girls Service In Ludhiana
 
College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...
College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...
College Call Girls Dehradun Kavya 🔝 7001305949 🔝 📍 Independent Escort Service...
 
Call Girl Guwahati Aashi 👉 7001305949 👈 🔝 Independent Escort Service Guwahati
Call Girl Guwahati Aashi 👉 7001305949 👈 🔝 Independent Escort Service GuwahatiCall Girl Guwahati Aashi 👉 7001305949 👈 🔝 Independent Escort Service Guwahati
Call Girl Guwahati Aashi 👉 7001305949 👈 🔝 Independent Escort Service Guwahati
 

Final_Presentation.pptx

  • 1. Analysis and Early Prediction of Sepsis using Clinical Data By Anushree Ankola Advisor: Dr. Anand Panangadan Reviewer : Prof. Tseng Chen James
  • 2. Agenda  Sepsis – Affects and Symptoms  Objective  Challenge Dataset  Procedure  EDA – missing values  Feature Engineering  EDA – Dataset Imbalance  Choosing Accuracy metric  Decision Tree for prediction  Future Scope 1 - Using XGBoost for prediction  Further research and findings  Conclusion  Reference
  • 4. Sepsis - Affect and Symptoms  Affects: • very young children, • older adults, • people with chronic diseases, • and those with weakened immune system  Sepsis can be difficult to diagnose because it occurs quickly and can be confused with other conditions. Watch for a combination of the following symptoms.  S Shivering, fever, or very cold E Extreme pain or general discomfort (“worst ever”) P Pale or discolored skin S Sleepy, difficult to rouse, confused I “I feel like I might die!” S Short of breath
  • 5. Objective  Goal of the analysis is the early detection of sepsis using physiological data.  The early prediction of sepsis is potentially life-saving, and we aim to predict sepsis 6 hours before the clinical prediction of sepsis.  Late prediction of sepsis is potentially life-threatening, and also consumes heavy hospital resources.  By predicting sepsis in non-sepsis patients or predicting sepsis very early in sepsis patients consumes limited resources and we can assume the risk of prediction to be minimal but revolutionary.
  • 6. Challenge Dataset  Data used in the competition is sourced from ICU patients in two separate hospital systems and is obtained from Physionet.  The data will be split into 70% Training and 30 % testing set. The training set will be split for validating the training set.  The original data for each patient will be contained within a single pipe-delimited text file. Each file will have the same header and each row will represent a single hour's worth of data. Each hospital have 20,000 patients and hence 20,000 files.  Available patient co-variates consist of Demographics, Vital Signs, and Laboratory values  Features: • 8 Vital Signs : Heart Rate, Temperature , Blood Pressure, Respiratory rate, • 26 Laboratory Values : Platelet Count, Glucose , Calcium etc • 6 Demographics : Age, Gender, Time in ICU , Hospital Admit time  1 Label : • 0 (Non-sepsis) and 1 (Sepsis)
  • 8. Assumptions  Combined dataset by appending all the patient files  Total files: 43,765 psv files  Shape of original file: (1552287 * 41)  The dataset is not time dependent.  2 approaches to solve it: 1. Add a time component and patient ID 2. Ignoring time component and consider each row independently  Following 2nd approach. Reason: Can predict sepsis without past patient data. More robust and need less resources.
  • 9. Procedure COMBINE ALL DATA NON-TIME DEPENDENT APPROACH HANDLING MISSING VALUES HANDLING DATA IMBALANCE BASELINE PREDICTION FEATURE ENGINEERING
  • 10. EDA - Handling Missing Values  Most of Laboratory Data are having missing values (Fig)  There are more than 90% of missingness in the dataset  2 steps to handle: • Remove features with missingness > 92% • Categorically encode features to handle missingness.
  • 11. Feature Selection – Part 1  Two Approaches employed for Feature Selection: 1. Checked correlation of features contributing to the presence of Sepsis 2. Read health magazines and Research journals such as • US National Library of Medicine, National Institutes of Health • Centers for Disease Control and Prevention • Sepsis - The American Journal of Medicine and filtered out the most named indicator of Sepsis  Outcome: Heart rate, Pulse Oximetry, Body temperature, Blood Pressure (SBP, DBP), Mean Arterial Pressure, Respiration rate, Frac of inspired oxygen, Age, Gender, Hospital Admission Time and ICU length of stay.
  • 12. Feature Engineering & Label Encoding  Developed 8 new features, described below: 1. new_age: has 3 categorical values – old, young and adult. 2. new_hr, new_temp, new_o2sat, new_bp, new_resp, new_map, new_fio2: each has 3 categorical values – normal, abnormal and missing.  Next, performed feature selection again and selected all of the above features, plus Gender, Hospital Admission Time and ICU Length of Stay, for further processing as the training set.
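A hedged sketch of how a vital sign could be turned into the normal / abnormal / missing categories and then label-encoded. The reference ranges in `NORMAL_RANGES`, the integer codes, and the helper names are assumptions made for illustration; the deck does not state the cut-offs it used.

```python
import math

# Hypothetical reference ranges, for illustration only (not the deck's cut-offs).
NORMAL_RANGES = {
    "HR": (60.0, 100.0),   # beats per minute
    "Temp": (36.0, 38.0),  # degrees Celsius
    "Resp": (12.0, 20.0),  # breaths per minute
}

# Hypothetical integer codes for the three categories.
CODES = {"normal": 0, "abnormal": 1, "missing": 2}

def categorize(value, low, high):
    """Map a raw measurement to 'normal', 'abnormal' or 'missing'."""
    if value is None or math.isnan(value):
        return "missing"
    return "normal" if low <= value <= high else "abnormal"

def engineer(row):
    """Build the new_* categorical features for one hourly record."""
    return {
        "new_" + name.lower(): categorize(row.get(name), low, high)
        for name, (low, high) in NORMAL_RANGES.items()
    }

row = {"HR": 120.0, "Temp": float("nan"), "Resp": 16.0}
features = engineer(row)
encoded = {name: CODES[value] for name, value in features.items()}
```

Treating "missing" as its own category lets the model use the fact that a measurement was not taken, instead of imputing a value.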
  • 13. All of these are categorical values. They are encoded so that it is easier to run an ML algorithm.
  • 14. EDA – Handling Data Imbalance  98% of patients do not have sepsis and 2% have sepsis.  Problem with accuracy as a metric.  Ways to deal with imbalance: • Undersampling • Oversampling • Using a good algorithm • Using a Balanced Bagging Classifier  Which is better? • Balanced Bagging Classifier with Decision Trees
  • 15. Training Data with Decision Trees  Pre-work: • Common classification metrics such as the accuracy score are not useful because the data is imbalanced. • Precision is defined as the fraction of relevant examples (true positives) among all of the examples that were predicted to belong to a certain class. Precision = (true positives) / (true positives + false positives) • Recall is defined as the fraction of the examples that truly belong to a class which were predicted to belong to that class. Recall = (true positives) / (true positives + false negatives)
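To make the problem with accuracy concrete, the sketch below scores a classifier that always predicts "non-sepsis" on a toy set with the 98% / 2% split mentioned earlier. The label lists and function names are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision(tp, fp):
    """true positives / (true positives + false positives); 0.0 if undefined."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """true positives / (true positives + false negatives); 0.0 if undefined."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy data: 2 sepsis records out of 100, and a model that never predicts sepsis.
y_true = [1] * 2 + [0] * 98
y_pred = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
model_precision = precision(tp, fp)
model_recall = recall(tp, fn)
```

The accuracy is high even though not a single sepsis record is found, which is why precision and recall are the more informative metrics here.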
  • 16. Training Data with Decision Trees  Used the Balanced Bagging Classifier from the imblearn library, which automatically creates balanced samples of the input data.  It has the parameter 'ratio', which controls how the data is sampled (later imblearn releases rename it 'sampling_strategy'). I used 'majority', which resamples the majority class.  From the figure, although the ROC curve seems promising, the P-R curve shows that the model is not good at classifying.
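The idea behind balanced bagging can be sketched without the library: each bag is built by shrinking the majority class to the size of the minority class, and one decision tree would then be trained per bag. This is a simplified illustration, not imblearn's implementation; the function name and the assumption that class 1 is the smaller class are mine.

```python
import random

def balanced_bootstrap(labels, rng):
    """Return row indices for one bag with equal numbers of both classes.

    Assumes binary labels where class 1 is the minority class.
    """
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Bootstrap the minority class (sampling with replacement) ...
    sampled_minority = [rng.choice(minority) for _ in minority]
    # ... and under-sample the majority class down to the same size.
    sampled_majority = rng.sample(majority, len(minority))
    return sampled_minority + sampled_majority

labels = [1] * 2 + [0] * 98
rng = random.Random(0)  # fixed seed so the bags are reproducible
bags = [balanced_bootstrap(labels, rng) for _ in range(5)]
```

Because every bag sees a different subset of the majority class, the ensemble as a whole still uses most of the non-sepsis records while each tree trains on balanced data.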
  • 17. Training the Data with XGBoost  XGBoost - eXtreme Gradient Boosting • Boosting: a method that converts weak learners into strong learners. • A boosting algorithm like XGBoost adds models sequentially, with each new weak learner correcting the errors of the ones before it. This reduces the bias of the model and typically improves accuracy. • Benefits of XGBoost: highly scalable and parallelizable, quick to execute, and typically outperforms other algorithms.
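The sequential nature of boosting can be illustrated with a very small gradient-boosting loop on one feature, where each round fits a decision stump to the errors left by the previous rounds. This is plain gradient boosting with squared loss written for illustration; it is not XGBoost itself, and the toy data, learning rate, and function names are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def boost(xs, ys, rounds, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    predictions = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return predictions

def mse(ys, predictions):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
mse_after_1 = mse(ys, boost(xs, ys, rounds=1))
mse_after_20 = mse(ys, boost(xs, ys, rounds=20))
```

Comparing `mse_after_1` with `mse_after_20` shows the training error falling as more weak learners are added, which is the bias reduction described above.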
  • 18. Further Research and Findings  Time-component approach; needs a domain expert  PCA for understanding the variables better  Using SMOTE for handling imbalance  Work further on XGBoost  Better feature engineering  Ways to reduce hospital stay time  Learning Curve with the Project  Python – object-oriented structure and programming  Libraries heavily used – Sklearn, Matplotlib  Built on Jupyter Notebook
  • 19. Conclusion  We handled the missingness and imbalance in the large dataset.  We removed features with more than 92% missing values.  We performed feature engineering (8 new features) and selected important features.  We aimed to predict the onset of sepsis 6 hours in advance, and so far the machine learning model employed seems to classify it only partially.  The project has scope for further research on the importance of the features and on better model building, under the guidance of a good health science domain expert.
  • 20. References [1] https://www.physionet.org/content/challenge-2019/1.0.0/ [2] https://www.datacamp.com/community/tutorials/decision-tree-classification-python [3] https://towardsdatascience.com/using-bagging-and-boosting-to-improve-classification-tree-accuracy-6d3bb6c95e5b [4] https://towardsdatascience.com/early-detection-of-sepsis-using-physiological-data-78d5f31fab9d [5] https://iopscience.iop.org/article/10.1088/1757-899X/428/1/012004 [6] https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/ [7] https://www.cdc.gov/ [8] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6429642/ [9] http://www.erogol.com/fighting-class-unbalance-supervised-ml-problem/
  • 21. Thank You  I would like to thank my advisor Dr. Anand Panangadan for helping me with the project.  I would like to thank my friends at Edward Life Sciences for advising me on ways to approach the problem.  I would like to thank my university for giving me the necessary skills to attempt and complete the project.

Editor's Notes

  1. Thank you so much for attending my presentation. I welcome you both. If you have any questions during my presentation please stop me and ask and I will try my best to answer them. My final year project is Analysis and Prediction of Sepsis using Clinical Data
  2. The agenda for today’s presentation is as follows. First I will talk about sepsis: its statistics, effects and symptoms. Then I will cover the objective of the project, the challenge dataset, the procedure I took to solve the problem, the exploratory data analysis with my intuitions and findings and how they shaped the course of the project, handling data imbalance and missingness, and choosing the right accuracy metric. Finally I will cover building the prediction models, the future scope of the project and the conclusion.
  3. What is sepsis? Sepsis is a potentially life-threatening condition caused by the body’s response to an infection. In a usual case, the body releases chemicals into the bloodstream to neutralise an infection. Sepsis occurs when the body’s response to these chemicals is out of balance, triggering changes that can damage multiple organ systems. Sepsis is caused by infection and can happen to anyone. Sepsis is most common and most dangerous in: older adults; pregnant women; children younger than 1; people who have chronic conditions, such as diabetes, kidney or lung disease, or cancer; and people who have weakened immune systems. Statistics: In the USA, 270,000 people die from sepsis each year. Internationally, 6 million people die from sepsis each year. US hospitals spend 24 billion each year on sepsis (13% of the health budget). Each hour of delay in treatment can increase mortality by roughly 4–8%. Source: https://www.mayoclinic.org/diseases-conditions/sepsis/symptoms-causes/syc-20351214
  4. The Challenge data repository contains one file per patient (e.g., training/p00101.psv). Each training data file provides a table with measurements over time. Each column of the table provides a sequence of measurements over time (e.g., heart rate over several hours), where the header of the column describes the measurement. Each row of the table provides a collection of measurements at the same time (e.g., heart rate and oxygen level at the same time). Features: Vital Signs: Heart Rate, Temperature, Blood Pressure, Respiratory Rate, End Tidal Carbon Dioxide. Laboratory Values: Platelet Count, Glucose, Calcium, etc. Demographics: Age, Gender, Time in ICU, Hospital Admit Time. Label: 0 (Non-sepsis) and 1 (Sepsis). Hence we can see that this is a binary classification problem.
  5. I will explain the relevant features later
  6. The data for the problem is an hourly time-sequence record for each patient. But the records do not have a time label associated with them, which opens the scope for interpreting it as a non-temporal problem (ignoring the time component). There are two ways in which one can approach this problem. Temporal approach: take into account the time component of the data; sepsis is diagnosed for each patient at each hour using the past data. Non-temporal approach: ignore the time component and treat the records as independent and identically distributed. This approach would help in predicting sepsis at each hour for any patient (with or without past patient data).
  7. Plan of Action: The data for the problem is an hourly time-sequence record for each patient. But the records do not have a time label associated with them, which opens the scope for interpreting it as a non-temporal problem (ignoring the time component). There are two ways in which one can approach this problem. Temporal approach: take into account the time component of the data; sepsis is diagnosed for each patient at each hour using the past data. Non-temporal approach: ignore the time component and treat the records as independent and identically distributed. This approach would help in predicting sepsis at each hour for any patient (with or without past patient data).
  8. 1. Age: three categories. Child: age less than 10 years. Adult: age more than 10 years and less than 60 years. Senior: age more than 60 years.
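The age bucketing described in this note could be written as below. The note does not say where ages of exactly 10 or exactly 60 fall, so placing both in the adult group is an assumption, as is the function name.

```python
def age_category(age):
    """Bucket an age in years into 'child', 'adult' or 'senior'."""
    if age < 10:
        return "child"
    if age <= 60:  # assumption: boundary ages 10 and 60 count as adult
        return "adult"
    return "senior"

categories = [age_category(a) for a in (5, 45, 72)]
```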
  9. Non-Temporal Approach: In this approach we ignore the time component associated with each patient's hourly records and treat them as independent and identically distributed. Train-Validation-Test Split: The data repository has data from two hospitals and a total of 40 thousand patients. The actual number of records is higher, as a patient could have stayed in the hospital for a variable amount of time. These records are split into train, validation and test sets. While splitting, I made sure that each patient is fully contained in exactly one of the splits. Train: 30K patients. Test: 5K patients. Validation: 5K patients. Note: the script to divide the data into the train-test-validation split can be found here: https://github.com/kskaran94/Sepsis_Identification Exploratory Data Analysis: After performing descriptive data analysis on the train data, the following concern stood out. Extremely imbalanced data: as we can see from the bar plot, the records are extremely imbalanced (less than 1% vs. 99%+), with the minority class being Sepsis (1).
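A minimal sketch of a patient-wise split in which every patient lands in exactly one of train, validation or test. It is not the linked script; the function name, the fixed seed, and the toy patient IDs are assumptions, and the fractions are chosen to match the 30K / 5K / 5K proportions in this note.

```python
import random

def split_patients(patient_ids, train_frac=0.75, val_frac=0.125, seed=0):
    """Assign each patient ID to exactly one of train / validation / test."""
    ids = sorted(set(patient_ids))     # de-duplicate and fix the starting order
    random.Random(seed).shuffle(ids)   # reproducible shuffle
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    return {
        "train": set(ids[:n_train]),
        "validation": set(ids[n_train:n_train + n_val]),
        "test": set(ids[n_train + n_val:]),
    }

# Toy example with 40 patients instead of 40 thousand.
patient_ids = ["p%05d" % i for i in range(40)]
splits = split_patients(patient_ids)
```

Rows are then routed by looking up their patient ID, so hourly records from one patient never leak between the training and evaluation sets.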
  10. Attribute Selection Measure: An attribute selection measure (ASM) is a heuristic for selecting the splitting criterion that partitions the data in the best possible manner. It is also known as a splitting rule because it helps determine breakpoints for tuples at a given node. The ASM ranks each feature (or attribute) by how well it explains the given dataset, and the attribute with the best score is selected as the splitting attribute. In the case of a continuous-valued attribute, split points for the branches also need to be defined. The most popular selection measures are Information Gain, Gain Ratio, and Gini Index. Entropy: refers to the impurity in a group of examples. Information Gain: the difference between the entropy before the split and the average entropy after the split of the dataset on the given attribute values. Gain Ratio: an extension of information gain that handles the issue of bias by normalizing the information gain using Split Info. Gini Index: considers a binary split for each attribute and computes a weighted sum of the impurity of each partition.
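The measures in this note can be computed directly from class labels. The sketch below shows entropy, the Gini impurity of a single group, and information gain for a candidate split; the toy labels and function names are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a group of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gini(labels):
    """Gini impurity of a group of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy before the split minus the size-weighted entropy after it."""
    total = len(parent)
    after = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - after

parent = [1, 1, 0, 0]
perfect_split = [[1, 1], [0, 0]]  # each child is pure
useless_split = [[1, 0], [1, 0]]  # each child is as mixed as the parent

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

A decision tree evaluates candidate splits this way and keeps the one with the highest gain (or, with the Gini criterion, the lowest weighted impurity).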