2. Dataset (MIMIC-CXR)
• The total size of the dataset is 4.6 TB.
• The full dataset is too large to handle, so we downloaded only the p13
folder, which has a total size of 274 GB.
• We downloaded the MIMIC-CXR data for the year 2013 using wget from
https://physionet.org/files/mimic-cxr/2.0.0/files/p13/.
3. FINAL REPORT
EXAMINATION: CHEST (PA AND LAT)
INDICATION: ___ year old man with pleural effusion //
eval eval
COMPARISON: Prior chest radiographs since ___ most
recently ___.
IMPRESSION: Moderate to large right pleural effusion
has increased since ___. No pneumothorax.
Atelectasis at the left base in the left upper lobe have
not improved since ___. Heart size indeterminate.
Right subclavian infusion catheter ends in the region of
the superior cavoatrial junction. No pneumothorax.
Severe thoracolumbar scoliosis alters the thoracic
anatomy.
4. Multiple images for a single caption
• Some captions (reports) had multiple images, so for convenience we
kept only one image per caption.
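The selection step can be sketched as follows. This is a minimal sketch, not the code used for the slides: it assumes each report `sXXXX.txt` sits next to a folder `sXXXX/` holding that study's .dcm files, and it keeps the first image by file name, which is an arbitrary choice. The function name `pick_one_image_per_report` is hypothetical.

```python
from pathlib import Path


def pick_one_image_per_report(root):
    """Return (image_path, report_path) pairs, one image per report.

    Assumption: a report `sXXXX.txt` has a sibling folder `sXXXX/`
    containing the .dcm images of that study.
    """
    pairs = []
    for report in sorted(Path(root).rglob("s*.txt")):
        study_dir = report.with_suffix("")  # .../s500.txt -> .../s500
        images = sorted(study_dir.glob("*.dcm")) if study_dir.is_dir() else []
        if images:
            # Keep only the first image (by file name) for this report.
            pairs.append((images[0], report))
    return pairs
```

Reports without any image folder are skipped, so every returned pair has exactly one image and one caption source.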
5. Conversion
• The images downloaded from the dataset were in .dcm (DICOM) format, so
we used Python to convert all of them from .dcm to .jpg.
• The conversion reduced the total size of the images from 62.4 GB to 1.73 GB.
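The slides do not name the libraries used for the conversion. A minimal sketch, assuming the third-party packages pydicom and Pillow are installed, could look like this. The helper names `jpg_path_for` and `convert_dcm_to_jpg` are hypothetical, and the simple min-max rescaling ignores DICOM details such as inverted (MONOCHROME1) images.

```python
from pathlib import Path


def jpg_path_for(dcm_path, out_dir):
    """Map e.g. scans/abc.dcm to out_dir/abc.jpg."""
    return Path(out_dir) / (Path(dcm_path).stem + ".jpg")


def convert_dcm_to_jpg(dcm_path, out_dir):
    """Read one DICOM file and save it as an 8-bit grayscale JPEG."""
    # Third-party imports are local so the path helper works without them.
    import pydicom
    from PIL import Image

    pixels = pydicom.dcmread(dcm_path).pixel_array.astype("float64")
    # Rescale raw intensities to the 0-255 range that JPEG expects.
    low, high = pixels.min(), pixels.max()
    scale = 255.0 / (high - low) if high > low else 0.0
    pixels = (pixels - low) * scale

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    out_path = jpg_path_for(dcm_path, out_dir)
    Image.fromarray(pixels.astype("uint8")).save(out_path)
    return out_path
```

Reducing each image to 8-bit JPEG is lossy, which is consistent with the large drop in total size reported above.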
6. Naming
• So that each image could be matched to its caption during processing,
we gave each X-ray image and its caption file the same name.
• There were 4454 images in total.
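One way to give each image and its caption a shared name is sketched below. This is an illustration only: the `Image<N>` naming scheme and the function `save_with_shared_names` are assumptions, and files are copied rather than renamed so the originals stay untouched.

```python
import shutil
from pathlib import Path


def save_with_shared_names(pairs, out_dir, prefix="Image"):
    """Copy each (image, caption) pair to out_dir under one shared base name."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = []
    for index, (image_path, caption_path) in enumerate(pairs, start=1):
        name = f"{prefix}{index}"  # e.g. Image1, Image2, ...
        shutil.copy(image_path, out_dir / (name + Path(image_path).suffix))
        shutil.copy(caption_path, out_dir / (name + ".txt"))
        names.append(name)
    return names
```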
7. Copying impression only from the text
• Each .txt file contained the full report, but we needed only the
impression.
• Here, the impression is the summary of the findings.
• We used Python to copy only the contents of the impression section; any
.txt file without an impression was saved as ‘no impression’.
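The extraction can be sketched with a regular expression. This is a minimal sketch that assumes the impression is the last section of the report and is introduced by the label `IMPRESSION:`, as in the sample report; `extract_impression` is a hypothetical name.

```python
import re


def extract_impression(report_text):
    """Return the text after 'IMPRESSION:', or 'no impression' if absent."""
    match = re.search(r"IMPRESSION:(.*)", report_text, flags=re.DOTALL)
    if match is None:
        return "no impression"
    # Join wrapped lines and collapse repeated whitespace into single spaces.
    text = " ".join(match.group(1).split())
    return text if text else "no impression"
```

If a report has further sections after the impression, the pattern would need to stop at the next section label instead of the end of the text.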
8. Image1.jpg,Moderate to large right pleural effusion has increased since ___. No pneumothorax. Atelectasis at the left base in
the left upper lobe have not improved since ___. Heart size indeterminate. Right subclavian infusion catheter ends in the
region of the superior cavoatrial junction. No pneumothorax. Severe thoracolumbar scoliosis alters the thoracic anatomy.
9. Discarding unwanted captions
• The .txt files with no impression were discarded, along with their
corresponding images.
• The same was done for .txt files whose impression was ‘as above’.
• This left a total of 3378 images and captions.
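The filtering rule can be sketched as below. It assumes a caption is discarded only when its whole text is ‘no impression’ or ‘as above’ (ignoring case and a trailing period); the slides do not say whether captions that merely contain these phrases were also dropped. Both function names are hypothetical.

```python
UNWANTED = {"", "no impression", "as above"}


def keep_caption(caption):
    """Return True if the caption carries usable text."""
    text = caption.strip().lower().rstrip(".")
    return text not in UNWANTED


def filter_captions(captions):
    """captions maps a shared base name (e.g. 'Image1') to its caption text."""
    return {name: text for name, text in captions.items() if keep_caption(text)}
```

The names missing from the returned dictionary identify the images that should be removed as well.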
11. Flickr dataset
• Initially, we ran a VGG16 and LSTM model on the Flickr8k dataset.
• The BLEU scores obtained for the model were:
• BLEU-1: 0.556621
• BLEU-2: 0.327213
12. Using MIMIC-CXR dataset
• We used 3378 images and captions from the MIMIC-CXR dataset.
• Each image had its own .txt file, so we used Python to combine all the
.txt files into a single .txt file.
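Combining the caption files can be sketched as follows. The output format of one `name.jpg,caption` line per image is an assumption based on the sample caption line shown on slide 8, and `merge_captions` is a hypothetical name.

```python
from pathlib import Path


def merge_captions(caption_dir, out_file):
    """Write one 'name.jpg,caption' line per .txt file; return the line count.

    out_file should live outside caption_dir so that it is not picked up
    as a caption on a later run.
    """
    lines = []
    for path in sorted(Path(caption_dir).glob("*.txt")):
        # Collapse line breaks so each caption fits on a single line.
        caption = " ".join(path.read_text(encoding="utf-8").split())
        lines.append(f"{path.stem}.jpg,{caption}")
    Path(out_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```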
14. Model
• We used VGG16 and LSTM for the model; the BLEU scores obtained
were:
• BLEU-1: 0.177608
• BLEU-2: 0.0602426
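The slides do not say how the BLEU scores were computed. As an illustration of what BLEU-1 and BLEU-2 measure, here is a minimal sketch of corpus-level BLEU, assuming one reference caption per image and uniform weights over n-gram orders (so BLEU-2 combines unigram and bigram precision). The function `bleu_score` is illustrative, not a library call.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_score(references, hypotheses, max_n):
    """Corpus BLEU up to max_n, with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for outputs shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

Calling `bleu_score(refs, hyps, 1)` and `bleu_score(refs, hyps, 2)` on tokenized captions gives the two figures; BLEU-2 can never exceed what the bigram overlap allows, which is why it is lower than BLEU-1 in both experiments.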