Pneumothorax Detection Using Deep Convolutional Neural Networks
Matt DiNardo, Jennifer Leong, Michael Sebetich, Eric Sheng
Georgia Institute of Technology, Atlanta, GA
Abstract
Pneumothorax (PTX), more commonly known as “collapsed lung”, is a potentially life-threatening
condition in which air enters the cavity between a patient’s lung and chest wall, inhibiting their ability to
breathe. If not treated in a timely manner, typically through the insertion of a chest tube, PTX can result in
hypoxia (oxygen deprivation), neurological damage, and, in the worst case, death. Diagnosis of PTX is
currently a manual process performed by a radiologist via inspection of a patient’s chest x-ray image. In
this paper we explore the potential of Deep Convolutional Neural Networks (CNNs) to automatically
diagnose PTX in chest x-rays, with the hope of reducing the time to diagnosis of this condition. We also present
heat mapping based on network activations as a novel technique to visualize the network's performance on
individually classified images. The NIH ChestX-ray8 dataset, which is labeled and contains over 100,000
anonymized chest x-rays from over 30,000 patients, was used to train a Deep CNN. The final trained CNN
consists of 5 convolutional layers, 4 pooling layers and 3 dropout layers. This network has a prediction
accuracy of 78.5% and an area under the ROC curve (AUC) of 0.86 on the validation dataset. These results are encouraging and
indicate that with further development Deep Learning has the potential to be clinically useful for
automated Pneumothorax detection. A video presentation of this analysis can be found on YouTube at
this URL: https://youtu.be/GQjjMoDVBRE
1 Introduction
In American hospitals between 2000 and 2002, 1.011 per 1,000 hospitalized at-risk patients (a
total of 33,571) developed iatrogenic pneumothorax (pneumothorax caused by medical care), and 18.57% of these cases
resulted in death [1]. More broadly, it was estimated that in 2000 iatrogenic PTX occurred in
0.698 per 1,000 hospital discharges in the US [2]. Other types of PTX, such as Primary Spontaneous PTX (PSP),
occur at lower rates (7.4 per 100,000 population in the US), but can also threaten patient safety if not
treated. The rate of PSP is estimated to be higher in the UK, at 37 per 100,000 population per year [3].
A clinician may suspect pneumothorax if a patient complains of sharp, stabbing chest pains
accompanied by a popping sensation [7]. However, PTX can be confirmed most directly by a review of the
patient’s chest x-ray. To diagnose PTX, radiologists look for telltale signs of the condition in the x-ray,
such as a pleural white line and an absence of lung markings in a section of the chest cavity [6]. Given that
this is a visual detection task based on the examination of an image, we feel a Deep Learning algorithm
has the potential to learn how to identify potential cases of PTX. Such an algorithm, if deployed in a
clinical setting as a tool for triaging patient care, could detect PTX in an x-ray automatically and notify an
attending physician or radiologist of the condition for immediate follow-up. This could result in significantly
faster treatment times for patients with PTX than is possible today.
Given the potential for improving patient health by assisting in the early diagnosis of PTX, as well as
advances in recent years in the performance of Deep Learning algorithms such as Deep CNNs for image
classification tasks, we explore in this paper the development and training of such an algorithm for
pneumothorax detection. The goal of this algorithm is to accurately and consistently detect the occurrence
of PTX in a chest x-ray for the purpose of triaging the most critical patients for clinicians. We feel that now
is an opportune time to explore such a project, a view supported by other work currently being conducted
on this topic through partnerships such as the one between GE and UCSF [5], by academic research such as
CheXNet [14], and by other related studies on Deep Learning for medical image analysis [15, 16, 17].
2 Methodology
In this section, we will discuss the methodology used to develop the CNN solution for PTX diagnosis. The
key components of the methodology are: curation of datasets, neural network architecture design, and
model training and evaluation practices.
Data & Statistics
For this study we used the NIH chest x-ray dataset, a labeled dataset that was recently made available to
the public [8, 9]. The dataset includes 108,948 chest x-rays, representing 32,717 patients, with labels
generated by applying NLP to the corresponding radiological reports. 5,302 of the images have a finding
label of Pneumothorax, representing 1,487 patients and ~5% of all images. The images are grayscale
and were all resized by the data publisher to a resolution of 1024x1024. The original image size and
details on pixel spacing are included in the labeled dataset provided with the images. Labels are also
provided to indicate the conditions present in each image, including Pneumothorax as well as others
(Hernia, Mass, Cardiomegaly, and so on).
Figure 1: Examples of Chest X-Rays with Various Labels from NIH CXR8 Dataset. Panels: No Finding; Single Finding: Infiltration; Single Finding: Pneumothorax; Multiple Findings (Including PTX); Multiple Findings (no PTX).
This dataset presents several potentially serious challenges to training an effective classification model.
First, images were automatically labeled by an NLP algorithm that did not take into account whether PTX
had already been treated before the image was taken. In many examples reviewed manually we can see a
chest tube inserted into the patient, indicating that the PTX has already been treated and creating false positives in
the dataset. Second, patient orientation is not uniform across images. Most x-rays are front facing but
some are taken from the lateral axis of the patient. In other cases images are flipped upside down, with
the neck of the patient at the bottom of the image and the lungs above it, whereas most images are upright.
Finally, fewer than 5% of images are classified as PTX, leaving relatively few positive cases to work with.
Figure 2: Frequency of Findings and Distribution of Data by Finding Multiplicity
Acknowledging these challenges, we chose to construct our initial training dataset by combining the
pneumothorax-labeled images with an equal number of randomly selected non-pneumothorax-labeled
images. Our rationale was to include all of the available pneumothorax images as well as enough
non-pneumothorax images to provide sufficient training without non-pneumothorax images overwhelming
the model. We created a PySpark tool to resize the images to 256x256 pixels in order to reduce
the dimensionality of the input data while retaining enough information to detect subtle features of the
x-ray images.
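The balancing step described above can be illustrated with a short sketch. It is not the PySpark tool used in this work; the function name `build_balanced_subset`, the in-memory list of label sets, and the fixed random seed are all assumptions made for illustration.

```python
import random


def build_balanced_subset(labels, positive="Pneumothorax", seed=0):
    """Return indices of every positive image plus an equal number of
    randomly chosen negative images.

    `labels` holds one set of finding labels per image (a hypothetical
    in-memory stand-in for the dataset's label file).
    """
    positives = [i for i, findings in enumerate(labels) if positive in findings]
    negatives = [i for i, findings in enumerate(labels) if positive not in findings]
    rng = random.Random(seed)
    # Undersample the majority class so both classes are equally represented.
    sampled_negatives = rng.sample(negatives, k=min(len(positives), len(negatives)))
    subset = positives + sampled_negatives
    rng.shuffle(subset)
    return subset


# Toy example: 3 PTX-labeled images among 10.
toy_labels = [
    {"Pneumothorax"}, {"No Finding"}, {"Mass"}, {"Pneumothorax", "Mass"},
    {"No Finding"}, {"Hernia"}, {"Pneumothorax"}, {"No Finding"},
    {"Cardiomegaly"}, {"No Finding"},
]
subset = build_balanced_subset(toy_labels)
```

In this toy case the subset contains the three PTX-labeled images and three randomly drawn non-PTX images, mirroring the roughly 50/50 split used for training.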
CNN Architecture
Figure 3: Neural network architecture
After experimenting with different CNN architectures, we settled on the one depicted in Figure 3. This
neural network begins with 2 convolution layers (each with 64 channels and a 5x5 kernel), whose output is fed
into a max pooling layer with a 2x2 kernel and a 2x2 stride. The ReLU activation function is used for these and all
other convolution layers. A dropout layer (p=0.1) is then introduced for regularization, to
prevent overfitting. Subsequently, the output is fed into a 128-channel convolution layer, followed
by a max pooling layer and then a dropout layer (p=0.2). The output of that is then fed into the next convolution
layer (256 channels), followed, again, by max pooling and dropout (p=0.3). The last convolution layer
has 512 channels and is followed by another max pooling layer, which this time is connected to a “Global Average
Pooling” layer that enables the creation of heatmap outputs. The heatmap is generated by taking the dot product of the
feature maps entering this layer with the weights of the output layer and then normalizing the result.
This design not only allows the CNN to perform classification on the input images, but
also uses heat mapping to draw physicians’ attention to the specific areas of the x-ray where the
detection of PTX occurs. The heatmap would also be useful for future deep learning researchers
exploring options to further increase model accuracy, by visually examining where in images the
inaccuracies are coming from.
We chose the PyTorch deep learning framework to construct our model. An NVIDIA GTX 1080 GPU was
used to train our CNN.
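To make the layer ordering concrete, the following is a minimal PyTorch sketch of the architecture described in this section. It is not the trained model itself. Several details are assumptions because the text does not state them: a single-channel input, 5x5 kernels with padding 2 on every convolution layer (the text gives the kernel size only for the first two), two output classes with index 1 standing for PTX, and min-max scaling as the heatmap normalization. The class name `PTXNet` is hypothetical.

```python
import torch
import torch.nn as nn


class PTXNet(nn.Module):
    """Sketch: 5 conv layers, 4 max-pool layers, 3 dropout layers, then GAP."""

    def __init__(self, num_classes: int = 2):
        super().__init__()

        def conv(c_in, c_out):
            # Assumed: 5x5 kernel with padding 2 so spatial size is preserved.
            return [nn.Conv2d(c_in, c_out, kernel_size=5, padding=2),
                    nn.ReLU(inplace=True)]

        self.features = nn.Sequential(
            *conv(1, 64), *conv(64, 64), nn.MaxPool2d(2, 2), nn.Dropout(0.1),
            *conv(64, 128), nn.MaxPool2d(2, 2), nn.Dropout(0.2),
            *conv(128, 256), nn.MaxPool2d(2, 2), nn.Dropout(0.3),
            *conv(256, 512), nn.MaxPool2d(2, 2),
        )
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        fmap = self.features(x)          # (N, 512, H/16, W/16)
        pooled = fmap.mean(dim=(2, 3))   # global average pooling -> (N, 512)
        return self.classifier(pooled), fmap

    def heatmap(self, fmap, class_idx: int = 1):
        """Weight each feature map by the output-layer weight of one class,
        sum over channels, and min-max normalize to [0, 1]."""
        weights = self.classifier.weight[class_idx]             # (512,)
        cam = torch.einsum("c,nchw->nhw", weights, fmap)
        cam = cam - cam.amin(dim=(1, 2), keepdim=True)
        return cam / cam.amax(dim=(1, 2), keepdim=True).clamp_min(1e-8)
```

With a 256x256 input, the four pooling layers reduce the feature maps to 16x16, so under these assumptions the heatmap would need to be upsampled before being overlaid on the x-ray.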
Training and Validation of the Model
We used the Cross Entropy Loss criterion with Stochastic Gradient Descent (SGD). The learning rate was 0.4,
and we added momentum to make training faster, with less noisy estimates of the gradient. We also
normalized each individual image to zero mean and unit variance.
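A single training step under this setup might look like the sketch below. The learning rate of 0.4 comes from the text; the momentum value of 0.9, the label encoding (0 = Not PTX, 1 = PTX), the tiny stand-in linear classifier, and the random 16x16 images are assumptions chosen only so that the snippet runs quickly.

```python
import torch
import torch.nn as nn


def normalize_image(img: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Shift and scale a single image to zero mean and unit variance."""
    centered = img - img.mean()
    std = centered.pow(2).mean().sqrt()
    return centered / (std + eps)


torch.manual_seed(0)

# Stand-in classifier (assumption); the CNN described above would go here.
model = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 2))
criterion = nn.CrossEntropyLoss()
# lr=0.4 is the reported value; momentum=0.9 is an assumed value.
optimizer = torch.optim.SGD(model.parameters(), lr=0.4, momentum=0.9)

# Random stand-in batch of 15 single-channel images, normalized one by one.
raw_images = torch.rand(15, 1, 16, 16) * 255
images = torch.stack([normalize_image(im) for im in raw_images])
targets = torch.randint(0, 2, (15,))  # assumed encoding: 0 = Not PTX, 1 = PTX

optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
```

Normalizing each image independently, rather than with dataset-wide statistics, removes differences in overall exposure between x-rays before they reach the network.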
Our training set consisted of approximately 8,000 images, split roughly 50/50 between the two classes
(PTX vs. Not PTX). Every 20 training steps, we measured the running validation accuracy and loss with a batch
size of 40 on the validation set. After each full epoch we tracked the accuracy and loss on the full validation set (950
images). We used the validation loss on the full set to decide when to stop updating the weights. The ROC
was measured on a separate test set of 950 images (1,900 test and validation images in total).
The key metrics we monitored during training were training and validation set loss and accuracy. We
monitored the training progress through TensorboardX to look for signs of overfitting, where the training
set loss and accuracy continue to improve but those of the validation set start to worsen with additional
epochs. This started to occur beyond 120 epochs, so we applied early stopping
at that point.
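One common way to automate this kind of stopping decision is a patience-based rule, sketched below. The text does not say that such a rule was used here, so the `EarlyStopping` class, the patience of 3 epochs, and the synthetic loss values are all assumptions for illustration.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Synthetic validation losses: they improve through epoch 4, then worsen.
val_losses = [0.70, 0.62, 0.58, 0.55, 0.54, 0.56, 0.57, 0.59, 0.60]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.update(epoch, val_loss):
        stopped_at = epoch
        break
```

In practice the weights saved at `best_epoch` would be the ones kept, since later epochs only fit the training set more closely.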
We also trained a model that achieved a better ROC AUC, but because the heat maps showed that it focused
more on non-lung features such as medical equipment and x-ray annotations, we decided to use the
current model, which focused on lung features.
3 Experimental Results
Our model obtains an area under the ROC curve (AUC) of 0.86 and classifies images with 78.5%
accuracy (Figure 5). Our CNN implementation also provides the key advantage of heatmap functionality
that isolates and calls attention to the areas indicative of PTX (Figure 6).
Figure 4: Validation Accuracy and Loss of Trained Model (on test set) over Epochs. Training batch size was 15 and loss was
assessed at each timestep. Validation batch size was 40 and was assessed every 20 steps.
Although the accuracy of our current model is lower than the state-of-the-art classification
capability for PTX detection on x-ray images [12, 13], it is still at a comparable level. Figure 4 above shows
that our current accuracy is 78.5%, versus 84.8% for the state of the art; Figure 5 below shows the
corresponding ROC curve, with an AUC of 0.86 versus 0.91 for the state of the art.
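For readers less familiar with the two metrics compared here, the sketch below shows how accuracy and ROC AUC can be computed from labels and predicted scores. The 0.5 decision threshold and the toy labels and scores are assumptions; they are not data from this study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data (1 = PTX, 0 = Not PTX); scores are predicted PTX probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

print(roc_auc(labels, scores))   # 0.9375
print(accuracy(labels, scores))  # 0.75
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two figures can differ as much as they do here.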
Figure 5: ROC Curve (on test set) and Normalized confusion matrix of Trained Deep CNN for PTX Detection
One key highlight of our solution is that, among the x-ray images in the training set that are labeled as PTX, a
large portion contain chest tubes, an apparatus inserted into the patient’s chest cavity to treat
PTX. Our solution, surprisingly, was not “distracted” by the presence of these chest tubes, and was able
not only to classify correctly but also to highlight areas of the image that show signs of PTX, other than the
chest tubes themselves. We regard this as a significant feature of the model because the main purpose
of this classifier is to identify cases of PTX before any treatment. The presence of a chest tube
indicates the x-ray was taken post-treatment, so if a classifier were to depend heavily on the presence of a
chest tube (a post-treatment feature) for classification, its usefulness would be largely diminished.
Another takeaway from our experimental results is that our approach to generating a heat map using the
activations of our CNN is useful for visualizing the areas of an x-ray that match learned markers
for Pneumothorax diagnosis. As can be seen in Figure 6, this provides a visual mechanism for clinicians to
understand why a PTX detection algorithm believes the condition is present in a given x-ray. It also
provides a critical tool for other researchers to interpret the findings and performance of their models on
particular image examples. We feel that this approach is novel and moves the needle forward in the
development of Deep Learning models for PTX diagnosis.
Figure 6: Left - Original input x-ray image diagnosed as a case of PTX. Right - Post classification version of the same image,
correctly classified as PTX, presented in a heatmap gradient, highlighting signs of PTX.
4 Further Research
Although our experimental results were approaching the performance of state of the art models previously
reported, we feel there are two primary areas of exploration that will further improve model performance
and make automated PTX detection a viable tool for application in a clinical setting. First, additional
training data should be generated by leveraging our PySpark image translation tool to create new images
from rotations and reflections of existing training data. This additional data will allow the learned model to
deal with more variations in x-ray data in the future, thus increasing its ability to generalize well. Second,
work should be done to isolate the lung cavities and the areas immediately surrounding them from the
rest of the x-ray in training and in deployment. This will help models to avoid learning to classify x-rays as
having PTX present because of the presence of medical devices and chest tubes in the image, reducing
false positives and allowing the model to be applied in a real clinical setting.
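The first proposal, generating extra training images from rotations and reflections, can be sketched as follows. This is not the PySpark image tool itself. Images are represented as nested lists of pixel values, and rotations are restricted to multiples of 90 degrees so that no interpolation is needed; both are simplifying assumptions, as are the function names.

```python
def reflect_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image, its horizontal reflection, and its
    90, 180 and 270 degree rotations."""
    rot90 = rotate_90(image)
    rot180 = rotate_90(rot90)
    rot270 = rotate_90(rot180)
    return [image, reflect_horizontal(image), rot90, rot180, rot270]


# Toy 2x2 "image" to show the transforms.
toy_image = [[1, 2],
             [3, 4]]
variants = augment(toy_image)
```

The 180 degree rotation is the variant most directly related to the upside-down images noted in the dataset description; small-angle rotations would require an image library with interpolation.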
5 Conclusion
Our work demonstrates the high potential value of applying Deep CNN architectures to the problem of
automatically detecting Pneumothorax in chest x-ray images. Our heatmap visualization technique
represents a novel approach to the explainability problem of interpreting the activation mappings of neural
networks to better understand how a network classifies an x-ray. We believe that the combination of
additional performance improvements aimed at increasing prediction accuracy with our approach to
visualization will ultimately lead towards the development of models that can be applied in a clinical
setting to Pneumothorax detection. Having such an automated system in place could help to alert clinicians
to the presence of this life-threatening condition in a timely manner, reducing the amount of time it takes
for a patient to receive treatment as well as reducing morbidity and mortality in patient populations across
the globe.
References
1. HealthGrades Patient Safety Survey. July 2004
2. Agency for Healthcare Research & Quality National Healthcare Quality Report. May 2013
3. Light RW. Pleural Diseases, 6th ed. LWW. May 2013
4. Daley. Pneumothorax Clinical Presentation. Sept 2015
5. UCSF, GE Healthcare Launch Deep Learning Partnership to Advance Care Globally. Nov 2016
6. Radiology Masterclass. Accessed 18 Nov 2018
7. Bhatnagar. How not to miss pneumothorax. Jan 2014
8. NIH. ChestX-ray8 Dataset Press Release. Sept 2017
9. Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, Ronald M. Summers,
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised
Classification and Localization of Common Thorax Diseases. May 2017
10. Light RW. Primary spontaneous pneumothorax in adults. May 2018
11. Zhou, Khosla, Lapedriza, Oliva, Torralba. Learning Deep Features for Discriminative Localization.
Dec 2015
12. Tae Joon Jun, Dohyeun Kim, and Daeyoung Kim. Automated diagnosis of pneumothorax using an
ensemble of convolutional neural networks with multi-sized chest radiography images.
13. Hamilton, Mueller, Alsaker. Incorporating a Spatial Prior into Nonlinear D-Bar EIT imaging for
Complex Admittivities. May 2016
14. Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding,
Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, Andrew Y. Ng. CheXNet:
Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. Dec 2017
15. Christian S. Perone, Pedro Ballester, Rodrigo C. Barros, Julien Cohen-Adad. Unsupervised domain
adaptation for medical imaging segmentation with self-ensembling. Nov 2018.
16. Emre Eğriboz, Furkan Kaynar, Songül Varli Albayrak, Benan Müsellim, Tuba Selçuk. Finding and
Following of Honeycombing Regions in Computed Tomography Lung Images by Deep Learning. Nov
2018.
17. Dwarikanath Mahapatra, Behzad Bozorgtabar, Jean-Philippe Thiran, Mauricio Reyes. Efficient Active
Learning for Image Classification and Segmentation using a Sample Selection and Conditional
Generative Adversarial Network. June 2018.