LUNG CANCER RISK PREDICTION MODELS
Thao Ngo
INTRODUCTION
• Lung cancer is the leading cause of cancer death in the US, with an estimated
234,030 new cases and 154,050 deaths in 2018.
• Early detection using low-dose computed tomography (CT) screening of high-risk
individuals can reduce lung cancer mortality by 20%.
• The current CT screening criteria are adults aged 55-77 who currently smoke and have a
30 pack-year smoking history, but these simple criteria are relatively ineffective.
• Many studies suggest that lung cancer risk prediction models could lead
to more effective screening programs than the current screening criteria.
PROJECT PURPOSE
• Develop two risk prediction models for lung cancer using classification
algorithms in R:
  Decision Tree: Classification and Regression Tree (CART)
  Neural Network: Artificial Neural Network (ANN)
• Select the better model based on performance metrics.
• Identify the major risk factors associated with lung cancer.
DATA DESCRIPTION
• Data is a subset of the National Lung Screening Trial cohort
• 1000 randomized participants
• 22 attributes are potential risk factors and symptoms of lung cancer
• Each observation has one of 3 possible classes: Low, Medium, High

Variable                    Type         Values
Patient ID                  Character
Age                         Numeric      14-73
Gender                      Binary       1-2
Smoking                     Numeric      1-8
Passive Smoking             Numeric      1-8
Air Pollution               Numeric      1-8
Occupational Hazards        Numeric      1-8
Genetic Risk                Numeric      1-7
Alcohol Use                 Numeric      1-7
Chronic Lung Disease        Numeric      1-7
Dust Allergy                Numeric      1-7
Diet Balance                Numeric      1-7
Chest Pain                  Numeric      1-9
Short Breath                Numeric      1-9
Fatigue                     Numeric      1-9
Bloody Coughing             Numeric      1-9
Wheezing                    Numeric      1-7
Swallowing Difficulty       Numeric      1-7
Clubbing of finger nails    Numeric      1-7
Weight Loss                 Numeric      1-7
Frequent Cold               Numeric      1-7
Dry Cough                   Numeric      1-7
Clubbing of finger nails    Numeric      1-9
Levels                      Chr/Binary   High, Medium, Low
DATA PREPARATION
MODELING
PERFORMANCE METRICS
Accuracy
• Accuracy = (true positives + true negatives) / (positives + negatives)
Sensitivity (True Positive Rate)
• Sensitivity = true positives / (true positives + false negatives)
Specificity (True Negative Rate)
• Specificity = true negatives / (true negatives + false positives)
Precision (Positive Predictive Value)
• Precision = true positives / (true positives + false positives)
Receiver Operating Characteristic (ROC) Area
• A model's ability to discriminate between the positive and negative classes
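With three classes (Low, Medium, High), each metric above is usually computed per class by treating that class as "positive" and the other two as "negative". The sketch below illustrates that one-vs-rest calculation in Python rather than in the R used for the project. The function name and the confusion-matrix counts are made up for illustration and are not the project's data.

```python
from typing import Dict, List


def ratio(numerator: int, denominator: int) -> float:
    """Divide, returning NaN when the denominator is zero (e.g. a class never predicted)."""
    return numerator / denominator if denominator else float("nan")


def one_vs_rest_metrics(labels: List[str], matrix: List[List[int]]) -> Dict[str, Dict[str, float]]:
    """Per-class metrics from a multiclass confusion matrix.

    matrix[i][j] is the number of observations whose actual class is
    labels[i] and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]                          # actual k, predicted k
        fn = sum(matrix[k]) - tp                   # actual k, predicted something else
        fp = sum(row[k] for row in matrix) - tp    # actual something else, predicted k
        tn = total - tp - fn - fp                  # everything not involving class k
        results[label] = {
            "accuracy": ratio(tp + tn, total),
            "sensitivity": ratio(tp, tp + fn),
            "specificity": ratio(tn, tn + fp),
            "precision": ratio(tp, tp + fp),
        }
    return results


# Hypothetical counts for illustration only (rows = actual, columns = predicted).
labels = ["High", "Low", "Medium"]
matrix = [
    [40, 0, 2],
    [0, 30, 1],
    [3, 4, 20],
]

metrics = one_vs_rest_metrics(labels, matrix)
for label in labels:
    print(label, {name: round(value, 4) for name, value in metrics[label].items()})
```

ROC area is not included in this sketch because it needs predicted scores or probabilities, not just the predicted class.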
RESULT ANALYSIS
Decision Tree (CART)
Class     Accuracy   Sensitivity   Specificity   Precision   ROC area
High      .9832      .9541         1             1           .9721
Low       .9731      1             .9615         .9184       .9342
Medium    .9899      .9697         1             1           .9573
RESULT ANALYSIS
Neural Network (ANN)
Class            Accuracy   Sensitivity   Specificity   Precision   ROC area
High (black)     .9899      1             .9841         .9732       .9636
Low (red)        .9592      1             .8990         .8108       .8894
Medium (green)   .9194      .7576         1             1           .9039
MODEL EVALUATION
Model                         Accuracy   Sensitivity   Specificity   Precision   ROC area
Decision Tree (High level)    .9832      .9541         1             1           .9721
Neural Network (High level)   .9899      1             .9841         .9732       .9636
DISCUSSION
• In medical testing, a false negative is more dangerous than a false positive, so the final risk prediction
model is the Artificial Neural Network, which has 100% sensitivity (0% false negatives) for the High class,
compared with the Decision Tree's 95.41% sensitivity (4.59% false negatives).
• Based on the variable importance results, the most significant risk factors for lung cancer are Air Pollution,
Age, Smoking, Passive Smoking, and Alcohol Use.
• Future improvements
  Improve model performance by fine-tuning the model parameters.
  Reduce input features to prevent overfitting.
  Increase the amount of input data for better model performance.
  Try other classification algorithms (Support Vector Machine, Random Forest) to widen model selection.
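The selection rule in the discussion (prefer the model with the fewest false negatives) can be written down explicitly. The sketch below is in Python, not the project's R, and uses the High-class figures from the model evaluation table. The function names and the accuracy tie-breaker are assumptions added for illustration.

```python
# High-class metrics copied from the MODEL EVALUATION table.
models = {
    "Decision Tree (CART)": {
        "accuracy": 0.9832, "sensitivity": 0.9541, "specificity": 1.0,
        "precision": 1.0, "roc_area": 0.9721,
    },
    "Neural Network (ANN)": {
        "accuracy": 0.9899, "sensitivity": 1.0, "specificity": 0.9841,
        "precision": 0.9732, "roc_area": 0.9636,
    },
}


def false_negative_rate(sensitivity: float) -> float:
    """False negative rate is the complement of sensitivity: FN / (TP + FN)."""
    return 1.0 - sensitivity


def pick_model(candidates: dict, primary: str = "sensitivity", tie_breaker: str = "accuracy") -> str:
    """Return the model name with the highest primary metric.

    The tie-breaker is an assumption of this sketch, not part of the project.
    """
    return max(candidates, key=lambda name: (candidates[name][primary], candidates[name][tie_breaker]))


for name, scores in models.items():
    print(f"{name}: false negative rate = {false_negative_rate(scores['sensitivity']):.2%}")

best = pick_model(models)
print("Selected on sensitivity:", best)
```

Ranking on a different primary metric, for example `pick_model(models, primary="roc_area")`, would pick the Decision Tree from the same table, which is why the criterion is worth stating explicitly.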
CONCLUSION
• The project developed a risk prediction model for lung cancer and identified the top
5 risk factors associated with lung cancer using classification methods from R packages.
• Using risk prediction models to select high-risk individuals for lung cancer screening
would be superior to the current selection criteria.
• Avoiding the major risk factors may help to prevent lung cancer and lower its risk.
• The results are promising for the application of lung cancer risk
prediction models to selective screening.
REFERENCES
• American Lung Association http://www.lung.org
• National Lung Screening Trials https://www.cancer.gov/types/lung/research/nlst
• Fitting a neural network in R https://www.r-bloggers.com
• Classification And Regression Trees for Machine Learning https://machinelearningmastery.com
• Machine Learning in Medicine, Rahul C. Deo, Circulation. 2015;132:1920-1930, November 16, 2015
• Evaluation of Classification Model Accuracy: Essentials http://www.sthda.com/english/articles
• Cross-Validation for Predictive Analytics using R http://www.milanor.net/blog/cross-validation-for-predictive-analytics-using-r/
• Ideas on interpreting machine learning, Patrick Hall, Wen Phan, SriSatish Ambati, March 15, 2017
• R packages https://cran.r-project.org/web/packages

Lung Cancer Risk Prediction Models

  • 2. INTRODUCTION • Lung Cancer is the number one cause of all cancer deaths in the US, estimated 234,030 new cases and 154,050 deaths in 2018. • Early detection using low-dose computed tomography (CT) Screening on high risk individuals can reduce lung cancer mortality by 20%. • The current CT screening criteria are 55-77 years old adults, currently smoking, and 30 pack-year smoking history, but these simple criteria are relatively ineffective. • Many researches suggest that using lung cancer risk prediction models could lead to more effective screening programs compared to the current screening criteria.
  • 3. • Develop two risk prediction models for Lung Cancer using classification algorithms in R. Decision Tree – Classification and Regression Tree ( CART) Neural Network – Artificial Neural Network (ANN) • Select the better model base on their performance metrics. • Identify the major risk factors associated with lung cancer. PROJECT PURPOSE
  • 4. DATA DESCRIPTION
    • Data is a subset of the National Lung Screening Trial cohort
    • 1000 randomized participants
    • 22 attributes are potential risk factors and symptoms of lung cancer
    • Each observation has one of 3 possible classes: Low, Medium, High

    Variable                    Characteristic
    Patient ID                  Character
    Age                         Numeric 14-73
    Gender                      Binary 1-2
    Smoking                     Numeric 1-8
    Passive Smoking             Numeric 1-8
    Air Pollution               Numeric 1-8
    Occupational Hazards        Numeric 1-8
    Genetic Risk                Numeric 1-7
    Alcohol Use                 Numeric 1-7
    Chronic Lung Disease        Numeric 1-7
    Dust Allergy                Numeric 1-7
    Diet Balance                Numeric 1-7
    Chest Pain                  Numeric 1-9
    Short Breath                Numeric 1-9
    Fatigue                     Numeric 1-9
    Bloody Coughing             Numeric 1-9
    Wheezing                    Numeric 1-7
    Swallowing Difficulty       Numeric 1-7
    Clubbing of finger nails    Numeric 1-7
    Weight Loss                 Numeric 1-7
    Frequent Cold               Numeric 1-7
    Dry Cough                   Numeric 1-7
    Clubbing of finger nails    Numeric 1-9
    Levels                      Chr/Binary: High, Medium, Low
  • 7. PERFORMANCE METRICS
    • Accuracy = (true positives + true negatives) / (positives + negatives)
    • Sensitivity (True Positive Rate) = true positives / (true positives + false negatives)
    • Specificity (True Negative Rate) = true negatives / (true negatives + false positives)
    • Precision (Positive Predictive Value) = true positives / (true positives + false positives)
    • Receiver Operating Characteristic (ROC) area: a model's ability to discriminate between the positive and negative classes
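The formulas on this slide can be sketched in a few lines of code. The deck's models were built in R; the Python below is only an illustrative sketch, and the function name `one_vs_rest_metrics` and the toy labels are assumptions made up for the example, not the project's code or data. Because the outcome has three classes, each class is scored one-vs-rest: the chosen class counts as positive and the other two as negative.

```python
from collections import Counter

def one_vs_rest_metrics(actual, predicted, positive):
    """Score one class against all others using the slide's formulas.

    `positive` is the class treated as positive; every other label is negative.
    (Hypothetical helper for illustration only.)
    """
    # Count (actual is positive, predicted is positive) pairs.
    counts = Counter((a == positive, p == positive)
                     for a, p in zip(actual, predicted))
    tp = counts[(True, True)]    # true positives
    fn = counts[(True, False)]   # false negatives
    fp = counts[(False, True)]   # false positives
    tn = counts[(False, False)]  # true negatives

    def ratio(num, den):
        # Guard against an empty denominator (e.g. class never predicted).
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }

# Toy labels invented for the example (not the project's data).
actual    = ["High", "High", "High", "Low", "Low", "Medium", "Medium", "Medium"]
predicted = ["High", "High", "Low",  "Low", "Low", "Medium", "Medium", "Low"]

high = one_vs_rest_metrics(actual, predicted, "High")
low = one_vs_rest_metrics(actual, predicted, "Low")
print(high)
print(low)
```

In the toy data one High case is predicted as Low, so sensitivity for High falls below 1 while its specificity and precision stay at 1, the same pattern the Decision Tree shows for the High class in the results.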
  • 8. RESULT ANALYSIS: Decision Tree (CART)

    Class     Accuracy  Sensitivity  Specificity  Precision  ROC area
    High      .9832     .9541        1            1          .9721
    Low       .9731     1            .9615        .9184      .9342
    Medium    .9899     .9697        1            1          .9573
  • 9. RESULT ANALYSIS: Neural Network (ANN)

    Class            Accuracy  Sensitivity  Specificity  Precision  ROC area
    High (black)     .9899     1            .9841        .9732      .9636
    Low (red)        .9592     1            .8990        .8108      .8894
    Medium (green)   .9194     .7576        1            1          .9039
  • 10. MODEL EVALUATION

    Model                        Accuracy  Sensitivity  Specificity  Precision  ROC area
    Decision Tree (High Level)   .9832     .9541        1            1          .9721
    Neural Network (High Level)  .9899     1            .9841        .9732      .9636
  • 11. DISCUSSION
    • In medical testing, a false negative is more dangerous than a false positive, so the final risk prediction model is the Artificial Neural Network, which has 100% sensitivity (0% false negative rate) for the High class, compared with 95.41% sensitivity (4.59% false negative rate) for the Decision Tree.
    • Based on the Variable Importance results, the most significant risk factors for lung cancer are Air Pollution, Age, Smoking, Passive Smoking, and Alcohol Use.
    • Future improvements:
      Improve model performance by fine-tuning the model parameters.
      Reduce input features to prevent overfitting.
      Increase the amount of input data for better model performance.
      Try other classification algorithms for better model selection (Support Vector Machine, Random Forest).
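The selection rule in the discussion (prefer the model with fewer false negatives) can be written out as a short sketch. The sensitivity, specificity, and precision values are the High-class figures from the model evaluation slide; the helper names `false_negative_rate` and `pick_by_sensitivity` are hypothetical, and Python is used here only for illustration, since the project itself was done in R.

```python
# High-class figures copied from the model evaluation slide.
models = {
    "Decision Tree (CART)": {"sensitivity": 0.9541, "specificity": 1.0, "precision": 1.0},
    "Neural Network (ANN)": {"sensitivity": 1.0, "specificity": 0.9841, "precision": 0.9732},
}

def false_negative_rate(sensitivity):
    """False negative rate is the complement of sensitivity."""
    return 1.0 - sensitivity

def pick_by_sensitivity(candidates):
    """Return the model name with the highest sensitivity (hypothetical rule helper)."""
    return max(candidates, key=lambda name: candidates[name]["sensitivity"])

# False negative rate per model, rounded to four decimals for display.
fnr = {name: round(false_negative_rate(m["sensitivity"]), 4)
       for name, m in models.items()}
chosen = pick_by_sensitivity(models)

print(fnr)
print(chosen)
```

Ranking on sensitivity alone ignores the Neural Network's lower specificity and precision; that trade-off is accepted here because a missed high-risk case is treated as the costlier error.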
  • 12. CONCLUSION
    • The project developed a risk prediction model for lung cancer and identified the top 5 risk factors associated with lung cancer using classification methods in R packages.
    • Using risk prediction models to select high-risk individuals for lung cancer screening would be superior to the current selection criteria.
    • Avoiding the major risk factors may help prevent lung cancer and lower its incidence.
    • The results are promising for the application of lung cancer risk prediction models to selective screening.