Lung Cancer Risk Prediction Models

LUNG CANCER
RISK PREDICTION
MODELS
Thao Ngo

INTRODUCTION
• Lung Cancer is the number one cause of all cancer deaths in the US, estimated
234,030 new cases and 154,050 deaths in 2018.
• Early detection using low-dose computed tomography (CT) Screening on high risk
individuals can reduce lung cancer mortality by 20%.
• The current CT screening criteria are 55-77 years old adults, currently smoking, and
30 pack-year smoking history, but these simple criteria are relatively ineffective.
• Many researches suggest that using lung cancer risk prediction models could lead
to more effective screening programs compared to the current screening criteria.

• Develop two risk prediction models for Lung Cancer using classification
algorithms in R.
Decision Tree – Classification and Regression Tree ( CART)
Neural Network – Artificial Neural Network (ANN)
• Select the better model base on their performance metrics.
• Identify the major risk factors associated with lung cancer.
PROJECT PURPOSE

Variables Characteristic
Patient ID Character
Age Numeric 14-73
Gender Binary 1-2
Smoking Numeric 1-8
Passive Smoking Numeric 1-8
Air Pollution Numeric 1-8
Occupational Hazards Numeric 1-8
Genetic Risk Numeric 1-7
Alcohol Use Numeric 1-7
Chronic Lung Disease Numeric 1-7
Dust Allergy Numeric 1-7
Diet Balance Numeric 1-7
Chest Pain Numeric 1-9
Short Breath Numeric 1-9
Fatigue Numeric 1-9
Bloody Coughing Numeric 1-9
Wheezing Numeric 1-7
Swallowing Difficulty Numeric 1-7
Clubbing of finger nails Numeric 1-7
Weight Loss Numeric 1-7
Frequent Cold Numeric 1-7
Dry Cough Numeric 1-7
Clubbing of finger nails Numeric 1-9
Levels Chr /Binary High, Medium, Low
DATA
DESCRIPTION
• Data is a subset of the National Lung
Screening Trial Cohort
• 1000 randomized participants
• 22 attributes are potential risk
factors and symptoms of lung
cancer
• Each observation has one of 3
possible classes: Low, Medium, High

Accuracy
• Accuracy = (true positive + true negative) / (positive +
negative)
Sensitivity (True Positive Rate)
• Sensitivity= true positives/(true positive + false negative)
Specificity (True Negative Rate)
• Specificity=true negatives/(true negative + false positives)
Precision (Positive Predictive Value)
• Precision= true positive/( true positive +false
positive)
Receiver Operating Characteristic (ROC) Area
• a model ability to discriminate between positive and
negative classes
PERFORMANCE
METRICS

Decision Tree (CART)
RESULT ANALYSIS
Class Accuracy Sensitivity Specificity Precision ROC area
High .9832 .9541 1 1 .9721
Low .9731 1 0.9615 0.9184 .9342
Medium .9899 .9697 1 1 .9573

RESULT ANALYSIS
Neural Network (ANN)
Class Accuracy Sensitivity Specificity Precision ROC area
High(black) .9899 1 .9841 .9732 .9636
Low(red) .9592 1 8990 .8108 .8894
Medium(green) .9194 .7576 1 1 .9039

MODEL EVALUATION
Models Accuracy Sensitivity Specificity Precision ROC Area
Decision Tree
(High Level)
.9832 .9541 1 1 .9721
Neural Network
(High Level)
.9899 1 .9841 .9732 .9636

DISCUSSION
• In medical test, False Negative is more dangerous than False Positive, so Finale risk prediction model is
Artificial Neural Network model which has 100% Sensitivity (0% False Negative) compared to Decision
Tree 95.41% Sensitivity (4.59% False Negative).
• Based on Variable Importance result, the most significant risk factors for lung cancer are Air Pollution,
Age, Smoking, Passive Smoking, and Alcohol Use.
• Future improvements
Improve the model performance by fine-tuning the model parameters
Reduce input features to prevent overfitting.
Increase data inputs for better model performance.
Use different classification algorithms for better selection ( Support Vector Machine, RandomForest)

• The project has developed the risk prediction model for Lung Cancer and identified top
5 risk factors associated with Lung cancer using classification methods in R packages.
• Using risk prediction models to select high-risk individuals for lung cancer screening
would be more superior to current selection criteria.
• Avoiding the major risk factors may help to prevent and lower lung cancer.
• The project shows that the results are promising for the application of lung cancer risk
prediction models for selective screening.
CONCLUSION

• American Lung Association http://www.lung.org
• National Lung Screening Trials https://www.cancer.gov/types/lung/research/nlst
• Fitting a neural network in R https://www.r-bloggers.com
• Classification And Regression Trees for Machine Learning https://machinelearningmastery.com
• Machine Learning in Medicine, Rahul C. Deo, Circulation. 2015;132:1920-1930, November 16,
2015
• Evaluation of Classification Model Accuracy: Essentials http://www.sthda.com/english/articles
• Cross-Validation for Predictive Analytics using R http://www.milanor.net/blog/cross-validation-
for-predictive-analytics-using-r/
• Ideas on interpreting machine learning Patrick Hall, Wen Phan, SriSatish Ambati,March 15, 2017
• R packages https://cran.r-project.org/web/packages
REFERENCES

Lung Cancer Risk Prediction Models

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Lung Cancer Risk Prediction Models

Similar to Lung Cancer Risk Prediction Models (20)

Recently uploaded

Recently uploaded (20)

Lung Cancer Risk Prediction Models