LUNG CANCER RISK PREDICTION MODELS
Thao Ngo
INTRODUCTION
• Lung cancer is the leading cause of cancer death in the US, with an estimated 234,030 new cases and 154,050 deaths in 2018.
• Early detection using low-dose computed tomography (CT) screening of high-risk individuals can reduce lung cancer mortality by 20%.
• The current CT screening criteria are adults aged 55-77 who currently smoke and have a 30 pack-year smoking history, but these simple criteria are relatively ineffective.
• Many studies suggest that lung cancer risk prediction models could lead to more effective screening programs than the current screening criteria.
PROJECT PURPOSE
• Develop two risk prediction models for lung cancer using classification algorithms in R.
    Decision Tree – Classification and Regression Tree (CART)
    Neural Network – Artificial Neural Network (ANN)
• Select the better model based on their performance metrics.
• Identify the major risk factors associated with lung cancer.
DATA DESCRIPTION
• Data is a subset of the National Lung Screening Trial cohort
• 1000 randomized participants
• 22 attributes are potential risk factors and symptoms of lung cancer
• Each observation has one of 3 possible classes: Low, Medium, High

Variable                    Type          Range
Patient ID                  Character
Age                         Numeric       14-73
Gender                      Binary        1-2
Smoking                     Numeric       1-8
Passive Smoking             Numeric       1-8
Air Pollution               Numeric       1-8
Occupational Hazards        Numeric       1-8
Genetic Risk                Numeric       1-7
Alcohol Use                 Numeric       1-7
Chronic Lung Disease        Numeric       1-7
Dust Allergy                Numeric       1-7
Diet Balance                Numeric       1-7
Chest Pain                  Numeric       1-9
Short Breath                Numeric       1-9
Fatigue                     Numeric       1-9
Bloody Coughing             Numeric       1-9
Wheezing                    Numeric       1-7
Swallowing Difficulty       Numeric       1-7
Clubbing of finger nails    Numeric       1-7
Weight Loss                 Numeric       1-7
Frequent Cold               Numeric       1-7
Dry Cough                   Numeric       1-7
Clubbing of finger nails    Numeric       1-9
Levels                      Chr/Binary    High, Medium, Low
DATA PREPARATION
MODELING
PERFORMANCE METRICS
Accuracy
• Accuracy = (true positives + true negatives) / (positives + negatives)
Sensitivity (True Positive Rate)
• Sensitivity = true positives / (true positives + false negatives)
Specificity (True Negative Rate)
• Specificity = true negatives / (true negatives + false positives)
Precision (Positive Predictive Value)
• Precision = true positives / (true positives + false positives)
Receiver Operating Characteristic (ROC) Area
• A model's ability to discriminate between positive and negative classes
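With three classes (Low, Medium, High), each metric is presumably computed per class by treating that class as positive and the other two as negative. A minimal sketch of that one-vs-rest calculation follows. It is written in Python for illustration only (the project itself used R), and the function name and the sample labels are hypothetical, not taken from the project.

```python
from collections import Counter


def one_vs_rest_metrics(actual, predicted, positive):
    """Compute the four metrics with `positive` as the positive class.

    Every other label counts as negative. Assumes each denominator is
    non-zero, i.e. the positive class is both present and predicted.
    """
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["tp"] += 1  # true positive
        elif a == positive:
            counts["fn"] += 1  # false negative: a missed positive case
        elif p == positive:
            counts["fp"] += 1  # false positive
        else:
            counts["tn"] += 1  # true negative
    tp, tn, fp, fn = counts["tp"], counts["tn"], counts["fp"], counts["fn"]
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical labels for illustration; not the project's data.
actual = ["High", "High", "High", "Low", "Low", "Medium", "Medium", "Medium"]
predicted = ["High", "High", "Low", "Low", "Low", "Medium", "High", "Medium"]

metrics_high = one_vs_rest_metrics(actual, predicted, "High")
for name, value in metrics_high.items():
    print(f"{name}: {value:.4f}")
```

Calling the function once per class gives one row of metrics per class, which is the layout used in the result tables.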
RESULT ANALYSIS
Decision Tree (CART)
Class     Accuracy  Sensitivity  Specificity  Precision  ROC area
High      0.9832    0.9541       1            1          0.9721
Low       0.9731    1            0.9615       0.9184     0.9342
Medium    0.9899    0.9697       1            1          0.9573
RESULT ANALYSIS
Neural Network (ANN)
Class             Accuracy  Sensitivity  Specificity  Precision  ROC area
High (black)      0.9899    1            0.9841       0.9732     0.9636
Low (red)         0.9592    1            0.8990       0.8108     0.8894
Medium (green)    0.9194    0.7576       1            1          0.9039
MODEL EVALUATION
Model                          Accuracy  Sensitivity  Specificity  Precision  ROC area
Decision Tree (High level)     0.9832    0.9541       1            1          0.9721
Neural Network (High level)    0.9899    1            0.9841       0.9732     0.9636
DISCUSSION
• In a medical test, a false negative is more dangerous than a false positive, so the final risk prediction model is the Artificial Neural Network, which has 100% sensitivity (0% false negatives) for the High class, compared with 95.41% sensitivity (4.59% false negatives) for the Decision Tree.
• Based on the variable importance results, the most significant risk factors for lung cancer are Air Pollution, Age, Smoking, Passive Smoking, and Alcohol Use.
• Future improvements
    Improve model performance by fine-tuning the model parameters.
    Reduce input features to prevent overfitting.
    Increase the amount of input data for better model performance.
    Try other classification algorithms for better model selection (Support Vector Machine, Random Forest).
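The selection rule in the first bullet comes down to simple arithmetic: the false negative rate is 1 minus sensitivity, and the model with the lower rate is preferred. A small Python sketch of that comparison, using only the High-class sensitivities reported in the MODEL EVALUATION table (the variable names are illustrative, and the project itself used R):

```python
# High-class sensitivity as reported in the MODEL EVALUATION table.
sensitivity = {"Decision Tree": 0.9541, "Neural Network": 1.0}

# The false negative rate is the complement of sensitivity.
false_negative_rate = {name: 1.0 - s for name, s in sensitivity.items()}

# Prefer the model that misses the fewest true High-risk cases.
selected = min(false_negative_rate, key=false_negative_rate.get)

for name, fnr in false_negative_rate.items():
    print(f"{name}: {fnr:.2%} false negatives")
print("Selected:", selected)
```

This ranks the models on sensitivity alone. The Decision Tree has the higher specificity and precision in the same table, so a different weighting of false positives could change the choice.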
CONCLUSION
• The project developed a risk prediction model for lung cancer and identified the top 5 risk factors associated with lung cancer using classification methods in R packages.
• Using risk prediction models to select high-risk individuals for lung cancer screening would be superior to the current selection criteria.
• Avoiding the major risk factors may help prevent lung cancer and lower its incidence.
• The results are promising for applying lung cancer risk prediction models to selective screening.
