Aditya Bhattacharya Chest XRay Image Analysis Using Deep Learning
1. USING DEEP LEARNING TO IDENTIFY MEDICAL CONDITIONS RELATED TO
THORAX REGION FROM RADIOGRAPHIC X-RAY IMAGES
ADITYA BHATTACHARYA
LEAD ML ENGINEER, WEST PHARMACEUTICALS
AI RESEARCHER, MUST RESEARCH
2. CONTENT
INTRODUCTION
ABOUT THE DATA
EXPLORATORY DATA ANALYSIS
DEEP LEARNING MODELS USED AND ARCHITECTURAL FLOW DIAGRAM
APPROACHES AND METHODOLOGY
EVALUATION METRICS
RESULTS AND CONCLUSION
IMPROVEMENTS AND FUTURE WORKS
3. INTRODUCTION
In recent times, Artificial Intelligence and Computer Vision based methods have been used extensively in Computer-Aided Diagnosis, which aims to help
doctors diagnose disease quickly and in an error-free manner.
Chest X-Rays play an important role in detecting Thorax diseases. Automated analysis of Chest X-ray images with deep learning-based approaches
can diagnose various pathologies and helps overcome the cost, time and error-proneness of manual analysis.
Various research works have been published with different approaches, incorporating new ideas and frameworks. Some of them are:
Hongyu Wang and Yong Xia’s paper (reference 1) incorporates an attention mechanism into a Deep CNN, proposing the ChestNet model.
The paper by Z. Ge et al. (reference 2) proposes a novel error function, Multi-label Softmax Loss (MSML), to address the
problems of multiple labels and imbalanced data.
The paper by J. Irvin et al. (reference 3) describes the design of a labeler that automatically detects the presence of 14 pathologies in radiology reports,
capturing uncertainties inherent in radiograph interpretation.
The paper by P. Rajpurkar et al. (reference 4) describes the development of CheXNet, an algorithm that can detect pneumonia from chest X-rays at a level
exceeding practicing radiologists.
In this project, we seek to replicate various aspects of the above papers and try to improve the results by incorporating a new framework that
includes an attention mechanism, data augmentation, deep learning models and ensemble methods.
4. ABOUT THE DATA
We used the CheXpert dataset, a large public dataset of chest radiographs from Stanford Hospital, collected between October 2002 and July
2017 (reference 10).
Experiments within this project were conducted using the downsampled (lower resolution) CheXpert dataset downloaded from:
https://us13.mailchimp.com/mctx/click?url=http%3A%2F%2Fdownload.cs.stanford.edu%2Fdeep%2FCheXpert-v1.0-small.zip&xid=e8c89d129f&uid=55365305&pool=&subject=
It consists of 224,316 chest radiographs of 65,240 patients.
The dataset labels each of 14 pathological observations as positive, negative, or uncertain.
5. EXPLORATORY DATA ANALYSIS
The CheXpert dataset has more frontal chest images than lateral images. In terms of gender and age group distributions, most of the data belongs to male patients, concentrated in the 45 to 90 years age group.
There is a significant number of uncertain labels in various classes, which posed a major challenge in coming up with a model that works across classes.
6. EXPLORATORY DATA ANALYSIS (CONT.)
The figure on this slide shows imbalance among the classes, along with a significant number of uncertain labels in various classes.
Further, we observed an uneven distribution of positive and negative labels across the 14 medical conditions, along with many missing or null values.
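The label counts behind this analysis can be reproduced with a few lines of pandas. This is a minimal sketch, assuming the CheXpert CSV convention of 1.0 for positive, 0.0 for negative, -1.0 for uncertain and blank for unmentioned; the helper name `label_distribution` and the toy rows are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

def label_distribution(df, conditions):
    """Count positive, negative, uncertain and missing labels per condition."""
    rows = []
    for condition in conditions:
        col = df[condition]
        rows.append({
            "positive": int((col == 1).sum()),
            "negative": int((col == 0).sum()),
            "uncertain": int((col == -1).sum()),
            "missing": int(col.isna().sum()),
        })
    return pd.DataFrame(rows, index=list(conditions))

# Toy stand-in for the label CSV (hypothetical values, not real data).
toy = pd.DataFrame({
    "Edema": [1.0, 0.0, -1.0, np.nan],
    "Pneumonia": [np.nan, np.nan, 1.0, -1.0],
})
dist = label_distribution(toy, ["Edema", "Pneumonia"])
print(dist)
```

Plotting `dist` as a stacked bar chart gives one view of the class imbalance and uncertain-label share per condition.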
7. ARCHITECTURAL FLOW DIAGRAM
Training CXRs
The CheXpert dataset consists of 224,316 chest X-ray images of 65,240 patients for 14 pathological observations.
Test CXR
Exploratory Data Analysis
Exploratory Data Analysis helps to identify patterns, spot anomalies, test hypotheses and check assumptions using summary statistics and graphical representations.
Data Preprocessing
In this stage we convert raw data into a clean data set. Data preprocessing includes steps like data scaling and data augmentation.
Feature Extraction
Deep Learning models (like LightNet-7, DenseNet121, Hybrid Model) extract features from the pre-processed image data. We tried the concept of Transfer Learning for this project.
Train Classifier
We use the training data to fit the model, with a train-to-test ratio of 80:20.
Predictive Model
The model is now ready. It is capable of making predictions on an unseen input X-ray image.
Target Label
The final output/pathology class that we are trying to predict.
Pathology Label
14 pathological observations.
Performance Evaluation
Evaluation of the predictive model's performance using various metrics.
8. MODELLING APPROACH
The following models were tried for the project:
1. LightNet-7: a 7-layered Deep Neural Network.
2. DenseNet121 from scratch.
3. DenseNet121 using pre-trained ImageNet weights.
4. Hybrid model with Random Forests (including extended features like Age-group, gender and type of image).
5. Hybrid model with AdaBoost (including extended features like Age-group, gender and type of image).
6. Hybrid model with XGBoost (including extended features like Age-group, gender and type of image).
9. DEEP LEARNING MODELS USED – 7-LAYERED DNN (LIGHTNET-7*)
Conv2D() -> Max Pool -> DropOut(0.5)
Conv2D() -> Max Pool -> DropOut(0.5)
Conv2D() -> Max Pool -> DropOut(0.5)
Conv2D() -> Max Pool -> DropOut(0.5)
Flatten()
Dense(128,relu) -> DropOut(0.5)
Dense(64,relu) -> DropOut(0.5)
Dense(2,softmax)
[Figure: LightNet-7 architecture. Input -> 4 x (Conv2D -> Max Pool) -> Flattened -> Fully Connected Neural Network + ReLU Activation with Dropout -> Prediction]
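The LightNet-7 layer listing (four Conv2D/Max Pool/Dropout stages, then Dense(128), Dense(64) and a 2-way softmax) can be sketched in Keras as follows. The slides do not give filter counts, kernel sizes, pooling sizes or the input shape, so the values used here are assumptions chosen only to make the sketch runnable.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lightnet7(input_shape=(224, 224, 1), filters=(32, 64, 128, 128)):
    """Sketch of the 4-conv + 3-dense stack from the slide.

    input_shape, filters, the 3x3 kernels and 2x2 pooling are assumptions;
    only the layer order, dropout rate and dense sizes come from the slide.
    """
    model = keras.Sequential(name="lightnet7_sketch")
    model.add(keras.Input(shape=input_shape))
    for n_filters in filters:
        model.add(layers.Conv2D(n_filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Dropout(0.5))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.5))
    # Two outputs: one binary (positive/negative) classifier per condition.
    model.add(layers.Dense(2, activation="softmax"))
    return model
```

Calling `build_lightnet7().summary()` prints the resulting layer stack; the four convolutional and three dense layers together give the seven weighted layers the name suggests.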
11. DEEP LEARNING MODELS USED – DENSENET121 (TRANSFER LEARNING)
[Figure: DenseNet121 transfer learning. Input -> Input Layer -> Pre Trained Model (Pre Trained Generic Feature Detection) -> Classification Layer (Fully Connected Neural Network + ReLU Activation with Dropout) -> Prediction]
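A transfer-learning setup of this shape can be sketched with the DenseNet121 model that ships with Keras Applications. This is an illustration under stated assumptions, not the project's actual code: the head sizes (128 units, dropout 0.5) and the input shape are guesses, and the whole convolutional base is frozen here.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_densenet_classifier(input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen DenseNet121 feature extractor with a small dense head (sketch)."""
    # ImageNet weights expect 3 channels, so grayscale X-rays would need
    # their single channel repeated three times before being fed in.
    base = keras.applications.DenseNet121(
        include_top=False, weights=weights,
        input_shape=input_shape, pooling="avg")
    base.trainable = False  # keep the pre-trained feature detectors fixed

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.Dense(128, activation="relu")(x)   # head size is an assumption
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(2, activation="softmax")(x)
    return keras.Model(inputs, outputs, name="densenet121_transfer_sketch")
```

Passing `weights=None` builds the same graph with random initialisation, which corresponds to the "DenseNet121 from scratch" variant if `base.trainable` is left as `True`.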
12. HYBRID MODEL – DEEP NEURAL NETWORK WITH MACHINE LEARNING CLASSIFICATION ALGORITHM
[Figure: Hybrid model. Input -> Input Layer -> Pre Trained Model (Pre Trained Generic Feature Detection) -> Multiclass Classification Model (Machine Learning Models like AdaBoost, XGBoost or Random Forest) -> Prediction]
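The hybrid idea, where network features are passed to a classical classifier together with extended features, can be sketched with scikit-learn. The feature arrays below are random stand-ins for real DenseNet121 outputs, and the encoding of age group, gender and image type as three numeric columns is an assumption for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_hybrid_classifier(cnn_features, extended_features, labels, seed=0):
    """Fit a Random Forest on CNN features concatenated with extended features."""
    X = np.hstack([cnn_features, extended_features])
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X, labels)
    return clf

# Toy stand-in data (hypothetical, not real features or labels).
rng = np.random.default_rng(0)
cnn_features = rng.normal(size=(40, 16))        # placeholder for extracted image features
extended = rng.integers(0, 2, size=(40, 3))     # encoded age group, gender, image type
labels = np.arange(40) % 2                      # alternating binary labels

clf = fit_hybrid_classifier(cnn_features, extended, labels)
probs = clf.predict_proba(np.hstack([cnn_features, extended]))
```

`AdaBoostClassifier` from `sklearn.ensemble` or `XGBClassifier` from the `xgboost` package could be swapped in for the Random Forest with the same `fit` and `predict_proba` calls.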
13. APPROACHES AND METHODOLOGY
Below we discuss the approach and methodology adopted for this project:
We tried individual class-wise binary classification, as training one model jointly for all 14 disease conditions didn’t yield the expected results.
With this approach, we came across the imbalanced class problem for almost all 14 disease conditions. Further, we observed an uneven distribution of
positive and negative classes across the 14 diseases, along with missing or null values.
To handle the imbalanced class problem, we up-sampled the minority class using the SMOTE approach, which improved our results significantly.
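For readers unfamiliar with SMOTE, its core step is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The NumPy sketch below shows only that step under simplifying assumptions (brute-force neighbour search, flattened feature vectors); the function name is hypothetical, and in practice the `SMOTE` class from the imbalanced-learn package is the usual choice.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from minority-class rows X_min."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Brute-force pairwise squared distances (O(n^2) memory; fine for a sketch).
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)               # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)        # random minority samples
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Toy minority class with 10 samples and 4 features (hypothetical data).
X_min = np.random.default_rng(1).normal(size=(10, 4))
X_new = smote_sketch(X_min, n_new=25)
```

Because every synthetic row lies on a line segment between two real minority rows, the new samples stay inside the range already covered by the minority class.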
We will now take a deeper look at the other approaches used in the project:
Data Pre-processing through Data Scaling and Data Augmentation. With Data Scaling, we created image tensors from the input images before feeding them to the DNN. Each
input image was converted into a scaled image array with values between 0 and 1, and the list of all image arrays was finally converted to a numpy
array. Also, to overcome overfitting, Data Augmentation was applied to the input images to generate virtual images through transformations like flipping,
shearing and zooming.
Feature Extraction using Transfer Learning, where we used the weights of a pre-trained DenseNet-121, trained on a vast set of images. We used its initial layers
to extract features from our images.
We also tried a Hybrid Model, where the DNN was used only for feature extraction and classical machine learning and ensemble algorithms like Random
Forest, AdaBoost and XGBoost were used for the final classification.
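The scaling and augmentation steps described above can be sketched in NumPy. This is an illustration, assuming 8-bit grayscale inputs of equal size; the function names are hypothetical, the zoom uses a simple centre crop with nearest-neighbour resampling, and shearing is left out to keep the example short.

```python
import numpy as np

def scale_images(images):
    """Stack a list of uint8 images into one float32 array with values in [0, 1]."""
    return np.stack(images).astype("float32") / 255.0

def flip_horizontal(batch):
    """Mirror every image in a (n, height, width[, channels]) batch left to right."""
    return batch[:, :, ::-1]

def zoom_center(batch, factor=1.2):
    """Zoom in by cropping the centre and resampling back to the original size."""
    h, w = batch.shape[1:3]
    crop_h, crop_w = int(round(h / factor)), int(round(w / factor))
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    # Nearest-neighbour index maps from output pixels to cropped input pixels.
    rows = top + (np.arange(h) * crop_h // h)
    cols = left + (np.arange(w) * crop_w // w)
    return batch[:, rows][:, :, cols]

# Toy images (hypothetical): one all-white and one all-black 8x8 image.
images = [np.full((8, 8), 255, dtype=np.uint8), np.zeros((8, 8), dtype=np.uint8)]
batch = scale_images(images)
augmented = np.concatenate([batch, flip_horizontal(batch), zoom_center(batch, 2.0)])
```

Each transformation keeps the image shape unchanged, so the augmented copies can be concatenated with the originals and fed to the same network.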
14. EVALUATION METRICS
The models’ performance was evaluated using different metrics like Accuracy, Precision, Recall, F1 Scores and AUROC scores.
Two additional metrics, Hamming Loss and Model Training and Prediction Performance Rate (execution time for each iteration), were also measured.
Disease                       Best Model     AUC    Accuracy
No Finding                    LightNet-7     0.5    84%
Enlarged Cardiomediastinum    LightNet-7     0.79   79%
Cardiomegaly                  LightNet-7     0.71   74%
Lung Lesion                   DenseNet121    0.74   47%
Lung Opacity                  LightNet-7     0.72   72%
Edema                         LightNet-7     0.77   79%
Consolidation                 LightNet-7     0.81   68%
Pneumonia                     LightNet-7     0.69   64%
Atelectasis                   DenseNet121    0.59   54%
Pneumothorax                  LightNet-7     0.43   60%
Pleural Effusion              DenseNet121    0.69   72%
Pleural Other                 DenseNet121    0.85   71%
Fracture                      LightNet-7     0.73   45%
Support Devices               LightNet-7     0.5    54%
Accuracy and AUC-ROC scores on the validation dataset using LightNet-7, DenseNet121 and the Hybrid Model were considered, and the best results are shown
in the table above.
LightNet-7 was the fastest of the 6 models tried and the most accurate in most cases.
The ensemble methods were very slow. They improved accuracy on the training data, but on the validation dataset their results were not as good as
those of LightNet-7 and DenseNet121.
Our results were slightly better than the results of Wang, Guendel and Yao, but
slightly worse than Rajpurkar.
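The metrics named on this slide are all available in scikit-learn. The sketch below shows one way to compute them for a single binary classifier; the helper name, the 0.5 decision threshold and the four toy predictions are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, hamming_loss,
                             precision_score, recall_score, roc_auc_score)

def evaluate_binary(y_true, y_prob, threshold=0.5):
    """Compute the slide's metrics from true labels and predicted probabilities."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)   # hard labels for threshold metrics
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "auroc": roc_auc_score(y_true, y_prob),  # uses probabilities, not hard labels
        "hamming_loss": hamming_loss(y_true, y_pred),
    }

# Toy example (hypothetical values): two negatives and two positives.
scores = evaluate_binary([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9])
print(scores)
```

Accuracy depends on the chosen threshold while AUROC does not, which is one reason the two columns can disagree. An AUROC of 0.5 corresponds to ranking no better than chance.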
15. RESULTS AND CONCLUSION
Through this project, we tried to come up with a framework to classify 14 thoracic diseases using the CheXpert dataset, splitting it in a 70-10-20
ratio.
In this framework, we used a pre-trained DenseNet121 for Feature Extraction. For Classification, we experimented with a CNN
model, VGG16, DenseNet121, ResNet models and the concept of Transfer Learning. We also tried classical Machine Learning
approaches like Random Forests, AdaBoost and XGBoost for classification.
For evaluating the models' performance, we used metrics like Confusion Matrices, Accuracy, Precision, Recall, F1 Scores, AUROC Scores
and Hamming Loss, although we presented just Accuracy and AUC in this presentation.
Hybrid models improve in accuracy with more training but do not improve in AUC (especially on the validation set), and they are slower than
the others.
As shown in the previous slide, our custom LightNet-7 worked better than the others in most cases. DenseNet121 was the second best.
LightNet-7 also proved to be faster to train than the other models.
16. CHALLENGES FACED
How to deal with null values in the classification labels was a challenge. Based on initial experiments and exploratory analysis, it
was decided to simply drop the null values.
Dealing with uncertain classes was one of the initial challenges; the U-Ignore approach mentioned in the CheXpert paper
(reference 5) was followed for this project.
We faced the challenge of imbalanced classes for almost all medical conditions. To overcome this, we applied balanced class weights and up-sampling
with SMOTE, where the latter worked out better.
We applied Data Augmentation and regularization techniques like Drop-Out to fight overfitting.
Due to infrastructure limitations, like the unavailability of GPUs and high VM cost, we couldn’t train our models on the complete dataset.
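For a single condition, dropping null values and ignoring uncertain labels both amount to a row filter before training the binary classifier. The pandas sketch below is one possible reading of those two steps, assuming the CheXpert CSV convention of -1.0 for uncertain and blank for unmentioned; the helper name, the `Path` column values and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

def binary_labels_u_ignore(df, condition):
    """Keep only rows whose label for `condition` is a definite 0 or 1."""
    col = df[condition]
    keep = col.notna() & (col != -1)     # drop nulls and uncertain (-1) labels
    out = df.loc[keep].copy()
    out[condition] = out[condition].astype(int)
    return out

# Toy stand-in for the label CSV (hypothetical values).
toy = pd.DataFrame({
    "Path": ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"],
    "Pneumonia": [1.0, 0.0, -1.0, np.nan, 1.0],
})
clean = binary_labels_u_ignore(toy, "Pneumonia")
```

Because the filter is applied per condition, each of the 14 binary classifiers ends up with its own subset of images, which is part of why the class balance differs from condition to condition.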
17. IMPROVEMENTS AND FUTURE WORKS
The SMOTE approach to handling imbalanced data took very long, as it tries to oversample from the entire dataset. Instead, we would try to
come up with an algorithm which oversamples from random batches of the dataset.
Another future improvement can be specific localization with the help of Region-based CNN (R-CNN).
We would like to try other approaches like U-Ones, U-Zeros and U-SelfTrained, mentioned in the CheXpert paper (reference 5), on the current models
to handle uncertain labels and see which approach works better for the various disease classes.
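The two simplest of these policies replace every uncertain label with a fixed value: positive for U-Ones, negative for U-Zeros. A minimal NumPy sketch follows, assuming uncertain labels are encoded as -1; the function name is hypothetical. U-SelfTrained is not sketched because it relies on predictions from an already trained model.

```python
import numpy as np

def apply_uncertainty_policy(labels, policy):
    """Map uncertain (-1) labels to 1 (U-Ones) or 0 (U-Zeros); others unchanged."""
    labels = np.asarray(labels, dtype=float)
    if policy == "U-Ones":
        return np.where(labels == -1, 1.0, labels)
    if policy == "U-Zeros":
        return np.where(labels == -1, 0.0, labels)
    raise ValueError(f"unknown policy: {policy}")

# Toy labels (hypothetical): positive, negative, uncertain, uncertain.
toy_labels = [1, 0, -1, -1]
as_ones = apply_uncertainty_policy(toy_labels, "U-Ones")
as_zeros = apply_uncertainty_policy(toy_labels, "U-Zeros")
```

Unlike dropping uncertain rows, both policies keep every image in the training set, so comparing them per condition would show whether the extra data outweighs the added label noise.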