LUNG CANCER DETECTION USING MACHINE LEARNING AND DEEP LEARNING TECHNIQUES
BY HAMZA SHAUKAT
INTRODUCTION
• Cancer is one of the deadliest diseases that humans have ever encountered
• Lung cancer is the most common cause of cancer death in men and women
• Tobacco use accounts for 87% of lung cancer
• In 70% of lung cancer patients the disease has already spread to other organs
• Not easy to detect in the early stages
• A menace for humankind
COMPUTER AIDED DIAGNOSIS
• With advances in technology, computers are now used in the medical
field to diagnose diseases by reading the information in medical images
such as CT and MR images.
• Computers perform these time-consuming and repetitive tasks more efficiently
• These techniques have revolutionized the medical world and taken a burden off
the shoulders of medical experts, especially in decision making
OBJECTIVES
• Develop an application that predicts lung cancer from CT images using machine
learning and deep learning techniques.
• Increase the accuracy of lung cancer prediction.
• Reduce the time and cost required by excessive medical tests.
• Compare the different algorithms and analyze the results.
DATASET
• THE DATASET WAS TAKEN FROM THE CANCER IMAGING
ARCHIVE
• IT CONTAINED CT IMAGES OF 1500 PATIENTS AND A CSV FILE
WHICH HAD INFORMATION ABOUT THE PATIENTS
PRE-PROCESSING
• THE IMAGES WERE CONVERTED FROM DICOM TO JPG FORMAT
• CORRUPTED IMAGES WERE REMOVED FROM THE DATASET
• THE IMAGES WERE RESIZED FROM (512X512) TO (256X256)
• THE IMAGES WERE CONVERTED INTO GRAYSCALE
• THE DIMENSIONS OF EACH IMAGE WERE EXPANDED USING np.expand_dims()
• ALL IMAGES WERE APPENDED AND SAVED IN THE FORM OF A NUMPY
ARRAY
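The in-memory part of these steps can be sketched as follows. This is a minimal illustration, not the code used in the project: the DICOM-to-JPG conversion and the corrupted-image check are omitted, synthetic arrays stand in for decoded CT slices, the resize is done by 2x2 block averaging (the original resizing method is not stated), and the function names and the choice of axis for np.expand_dims() are assumptions.

```python
import numpy as np

def resize_half(img):
    """Halve each side by averaging non-overlapping 2x2 blocks (512x512 -> 256x256)."""
    h, w = img.shape[:2]
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def to_grayscale(rgb):
    """Collapse an (H, W, 3) image to (H, W) using ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights

def preprocess(rgb):
    gray = to_grayscale(resize_half(rgb))
    # Add a trailing channel axis: (256, 256) -> (256, 256, 1).
    # Which axis the original code expanded is an assumption.
    return np.expand_dims(gray, axis=-1)

# Synthetic stand-ins for decoded CT slices (hypothetical data).
rng = np.random.default_rng(0)
slices = [rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8) for _ in range(4)]

# Append every processed image and store the result as one NumPy array.
X = np.array([preprocess(s) for s in slices])
```

The resulting array has one entry per image, so it can be written to disk with np.save() and reloaded later without repeating the conversion.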
PRE-PROCESSING
• THE VARIABLE X CONTAINED THE IMAGES
• A LIST NAMED Y WAS ALSO INITIALIZED TO HOLD THE
CLASS OF EACH IMAGE, I.E. 0 OR 1
FEATURE EXTRACTION USING HOG
• HOG FEATURE DESCRIPTOR WAS USED TO EXTRACT USEFUL
FEATURES OF THE IMAGES
• HOG CAPTURES THE MAGNITUDE AND DIRECTION OF THE IMAGE
GRADIENTS, WHICH DESCRIBE THE EDGES
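To make the idea concrete, here is a simplified NumPy sketch of the core of HOG: per-cell histograms of gradient direction, weighted by gradient magnitude. It is an illustration only. Block normalization is left out, the function name simple_hog and the cell and bin sizes are assumptions, and the slides do not say which HOG implementation was actually used (scikit-image's skimage.feature.hog is one common choice).

```python
import numpy as np

def simple_hog(gray, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient direction, weighted by magnitude."""
    gy, gx = np.gradient(gray.astype(np.float64))  # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    # Fold directions into [0, 180) so opposite gradients share a bin.
    direction = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((direction / (180.0 / bins)).astype(int), bins - 1)

    rows, cols = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            window = (slice(r * cell, (r + 1) * cell),
                      slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bin_idx[window].ravel(),
                                     weights=magnitude[window].ravel(),
                                     minlength=bins)
    return hist.ravel()  # one flat feature vector per image

# A 16x16 image with a single vertical edge: dark left half, bright right half.
edge = np.zeros((16, 16))
edge[:, 8:] = 1.0
features = simple_hog(edge)
```

For the vertical edge, all of the gradient energy falls into the first direction bin, because the intensity changes only along the horizontal axis.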
MACHINE LEARNING
• THE NECESSARY LIBRARIES, SUCH AS SCIKIT-LEARN (sklearn), WERE IMPORTED
• THE IMAGES WERE SPLIT 80/20: 80% FOR TRAINING THE
MACHINE LEARNING MODELS AND 20% FOR TESTING THEM
• THIS WAS DONE USING THE train_test_split FUNCTION
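A minimal sketch of the 80/20 split with scikit-learn's train_test_split is shown here. The random feature vectors and labels are synthetic stand-ins, and random_state and stratify are assumptions added for reproducibility and class balance; the slides do not say whether they were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 100 feature vectors with balanced 0/1 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 36))
y = np.array([0, 1] * 50)

# test_size=0.2 keeps 80% of the samples for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (80, 36) (20, 36)
```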
TRAINING
• ALL THE MACHINE LEARNING MODELS WERE TRAINED ON 80 %
OF THE DATASET
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
SIMILARLY 12 OTHER MODELS WERE TRAINED ON THIS DATA
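Training several classifiers on the same split can be sketched as a loop over a dictionary of estimators. Only five of the classifiers from the comparison are shown; the data is synthetic, and the default hyperparameters, random_state values, and max_iter setting are assumptions rather than the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data standing in for the HOG feature vectors.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "lda": LinearDiscriminantAnalysis(),
    "knn": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(random_state=0),
    "gaussian_nb": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Every estimator exposes the same fit() interface, so one loop trains them all.
for name, model in models.items():
    model.fit(X_train, y_train)

# Accuracy on the held-out 20% for each trained model.
scores = {name: model.score(X_test, y_test) for name, model in models.items()}
```

The remaining classifiers can be added to the dictionary in the same way, since they share the fit() and score() methods.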
SAVING TRAINED MODELS
• EACH TRAINED MODEL WAS SAVED WITH THE PICKLE LIBRARY AS A
.PKL FILE SO THAT IT CAN BE USED LATER FOR
LUNG CANCER DETECTION
import pickle
with open('lda.pkl', 'wb') as file:
    pickle.dump(lda, file)
TESTING
• THE TRAINED MODELS WERE LOADED USING PICKLE LIBRARY
• MODELS WERE TESTED ON UNSEEN DATA
• A CONFUSION MATRIX AND A CLASSIFICATION REPORT WERE
GENERATED FOR EVERY MODEL AND SAVED AS AN
IMAGE AND A TEXT FILE RESPECTIVELY
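The load-and-evaluate step can be sketched as follows, under stated assumptions: the data is synthetic, the model is trained and pickled inside the example so that it runs on its own, and the file names and temporary directory are hypothetical. Saving the confusion matrix as an image (for example with a plotting library) is omitted.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the HOG feature vectors.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train and pickle a model so the example is self-contained.
workdir = tempfile.mkdtemp()
model_path = os.path.join(workdir, "lda.pkl")
with open(model_path, "wb") as f:
    pickle.dump(LinearDiscriminantAnalysis().fit(X_train, y_train), f)

# Load the saved model and evaluate it on the unseen 20%.
with open(model_path, "rb") as f:
    model = pickle.load(f)

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])  # rows: true class, columns: predicted
report = classification_report(y_test, y_pred)

# Save the classification report as a text file.
report_path = os.path.join(workdir, "lda_report.txt")
with open(report_path, "w") as f:
    f.write(report)
```

Only load pickle files from a trusted source, because unpickling can execute arbitrary code.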
RESULTS
THE RESULTS OF ALL THE ALGORITHMS THAT I APPLIED ARE
GIVEN BELOW FOR COMPARISON AND ANALYSIS
THE CONFUSION MATRICES AND CLASSIFICATION REPORTS ARE
PROVIDED IN THE DOCUMENTATION
RESULTS
COMPARATIVE ANALYSIS
S.NO  CLASSIFIER           PRECISION  RECALL  F1-SCORE  ACCURACY
1     SUPPORT VECTOR       0.67       0.64    0.65      66%
2     DECISION TREE        0.66       0.66    0.66      66%
3     RANDOM FOREST        0.76       0.76    0.76      76%
4     BERNOULLI-NB         0.57       0.66    0.56      57%
5     GAUSSIAN-NB          0.59       0.58    0.58      58%
6     K-NEAREST NEIGHBORS  0.80       0.79    0.79      79%
7     LDA                  0.65       0.65    0.65      65%
8     LOGISTIC REGRESSION  0.69       0.69    0.69      65%
9     NEAREST CENTROID     0.58       0.58    0.57      58%
10    PASSIVE AGGRESSIVE   0.69       0.69    0.69      69%
11    PERCEPTRON           0.68       0.67    0.67      67%
12    RIDGE                0.66       0.66    0.66      66%
13    SGD                  0.68       0.68    0.68      68%
14    CNN                  0.86       0.85    0.85      85%
CONCLUSION
• KNN (79%) AND THE DEEP LEARNING MODEL CNN (85%) PROVIDED THE BEST
ACCURACY FOR LUNG CANCER DETECTION FROM CT IMAGES
• THESE RESULTS CAN BE IMPROVED BY USING A BIGGER DATASET
OR OTHER PERFORMANCE ENHANCING TECHNIQUES
TESTING A CT IMAGE FOR LUNG CANCER
• I DEVELOPED A SIMPLE APPLICATION FOR CANCER DETECTION
USING THE TKINTER LIBRARY OF PYTHON
• THE APPLICATION ALLOWS THE USER TO SELECT ANY ALGORITHM
AND AN IMAGE, AND SHOWS THE PREDICTION FOR THAT IMAGE
• IT ALSO SHOWS THE CONFUSION MATRIX AND
CLASSIFICATION REPORT OF THE SELECTED MODEL
APPLICATION
THE END
