Assessing Explainability in Deep Learning for Medical Image Analysis
Yusuf Brima
AI & Data Science Group,
Department of Bioinformatics,
Fraunhofer Institute for Algorithms and Scientific Computing (SCAI)
December 10th, 2024
Talk Outline
● Introduction
● Significance of Explainability in Medical Image Analysis
● Experimental Setup
● Evaluation Metrics
● Results
● Key Takeaways
Introduction
(Figure: example imaging modalities: Colon Pathology, Chest X-Ray, Dermatoscope, Retinal OCT, Chest X-Ray, Fundus Camera, Breast Ultrasound, Blood Cell Microscope, Kidney Cortex Microscope, Abdominal CT.)
Background:
● The promise of deep learning (DL) in medical image analysis.
● The challenge of explainability in DL models.
Research Question
How well do state-of-the-art (SoTA) explainable deep learning techniques work for medical image analysis tasks?
Research Objective
To assess image-based saliency methods using objective quantitative analyses.
Brima, Y., Atemkeng, M. Evaluation of explainability in deep learning for medical image analysis. BioData Mining 17, 18 (2024). https://doi.org/10.1186/s13040-024-00370-4
Significance of Explainability in Medical Imaging
● Trust and Adoption:
○ Essential for clinical adoption of AI in safety-critical settings.
● Research Gap:
○ Existing methods rely on visual inspection without quantitative analysis.
○ Hence the need for an objective measure of the performance of saliency methods.
Experimental Setup
● Datasets:
○ Brain Tumor MRI dataset (with ground-truth annotation).
○ COVID-19 Chest X-ray dataset (without ground-truth annotation).
● Models Used:
○ Various CNN architectures (e.g., ResNet, DenseNet, InceptionResNetV2); a minimal fine-tuning sketch follows after this list.
● Tasks:
○ Multi-task classification.
○ Model explainability.
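Below is a minimal sketch of how one of the CNN backbones could be fine-tuned for the classification task. The data directory layout, class count, and hyperparameters are illustrative assumptions for this sketch, not the exact setup used in the paper.

import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Assumptions (illustrative only): four classes, images arranged in
# class-named sub-folders under data/brain_mri/train.
NUM_CLASSES = 4
device = "cuda" if torch.cuda.is_available() else "cpu"

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # ImageNet statistics
])
train_set = datasets.ImageFolder("data/brain_mri/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# ImageNet-pretrained backbone (torchvision >= 0.13) with a new classification head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
model = model.to(device)

optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):  # epoch count is arbitrary for this sketch
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimiser.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimiser.step()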
Model Training and Attribution
(Figure: two-stage pipeline: (1) model training; (2) attribution with a saliency method, e.g., ScoreCAM; followed by quantitative and qualitative assessment.)
A minimal ScoreCAM sketch follows below.
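The ScoreCAM sketch below assumes a trained PyTorch CNN and a chosen target convolutional layer. It follows the common simplified formulation in which each upsampled, normalised activation map is scored by the target-class softmax of the masked input (the original method scores the increase in confidence relative to a baseline).

import torch
import torch.nn.functional as F

def score_cam(model, image, target_class, target_layer, batch_size=16):
    # image: (1, C, H, W) preprocessed tensor; target_layer: e.g. model.layer4 of a ResNet.
    model.eval()
    acts = {}

    def hook(_module, _inputs, output):
        acts["maps"] = output.detach()

    handle = target_layer.register_forward_hook(hook)
    with torch.no_grad():
        model(image)
    handle.remove()

    maps = acts["maps"][0]                                   # (K, h, w) activation maps
    H, W = image.shape[-2:]
    up = F.interpolate(maps.unsqueeze(1), size=(H, W),
                       mode="bilinear", align_corners=False).squeeze(1)
    lo = up.flatten(1).min(dim=1).values.view(-1, 1, 1)
    hi = up.flatten(1).max(dim=1).values.view(-1, 1, 1)
    masks = (up - lo) / (hi - lo + 1e-8)                     # each map normalised to [0, 1]

    # Weight each map by the target-class probability of the correspondingly masked input.
    weights = torch.zeros(masks.shape[0], device=image.device)
    with torch.no_grad():
        for s in range(0, masks.shape[0], batch_size):
            batch = masks[s:s + batch_size].unsqueeze(1)     # (b, 1, H, W)
            logits = model(image * batch)                    # mask broadcast over channels
            weights[s:s + batch_size] = F.softmax(logits, dim=1)[:, target_class]

    cam = torch.relu((weights.view(-1, 1, 1) * masks).sum(dim=0))
    return cam / (cam.max() + 1e-8)                          # (H, W) saliency map in [0, 1]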
Saliency Methods
● Gradient-based (e.g., GradCAM, Integrated Gradients); a minimal GradCAM sketch follows after this list.
● Gradient-free (e.g., ScoreCAM).
● Other methods:
○ Concept learning often requires extensive annotation to define concepts accurately and risks information leakage.
○ LRP provides information about the relevance of input features to the model's output, but it may not reveal all aspects of the model's behavior.
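For the gradient-based family, a minimal GradCAM sketch, again assuming a trained PyTorch CNN and a chosen target convolutional layer (e.g., the last convolutional block of a ResNet):

import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, target_layer):
    # image: (1, C, H, W) preprocessed tensor; target_layer: e.g. model.layer4 of a ResNet.
    model.eval()
    store = {}

    def forward_hook(_module, _inputs, output):
        store["acts"] = output

    def backward_hook(_module, _grad_in, grad_out):
        store["grads"] = grad_out[0]

    h1 = target_layer.register_forward_hook(forward_hook)
    h2 = target_layer.register_full_backward_hook(backward_hook)

    logits = model(image)
    model.zero_grad()
    logits[0, target_class].backward()          # gradient of the target-class score
    h1.remove(); h2.remove()

    acts = store["acts"][0]                     # (K, h, w) activations
    grads = store["grads"][0]                   # (K, h, w) gradients
    weights = grads.mean(dim=(1, 2))            # global-average-pooled gradients
    cam = torch.relu((weights.view(-1, 1, 1) * acts).sum(dim=0))
    cam = F.interpolate(cam[None, None], size=image.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return (cam / (cam.max() + 1e-8)).detach()  # (H, W) saliency map in [0, 1]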
Evaluation Metrics
● Qualitative Assessment:
○ Visual inspection of attribution maps.
● Quantitative Assessment using Performance Information Curves (PICs)1:
○ Accuracy Information Curves (AICs) measure the relationship between saliency map intensity and model accuracy.
○ Softmax Information Curves (SICs) measure the correlation between saliency map intensity and the model's output probabilities (softmax scores).
A minimal sketch of how the per-image AIC/SIC data points can be collected follows below.
1 Kapishnikov A, Venugopalan S, Avci B, Wedin B, Terry M, Bolukbasi T. Guided integrated gradients: An adaptive path method for removing noise. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu; 2021. pp. 5050–8.
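The sketch below is a simplified illustration of collecting per-image (information level, score) pairs behind the AIC and SIC. The callables predict_proba (returning the model's softmax vector), blur, and entropy_fn are assumptions of this sketch; the exact aggregation in Kapishnikov et al. may differ.

import numpy as np

def pic_points(image, saliency, label, predict_proba, blur, entropy_fn,
               fractions=np.linspace(0.0, 1.0, 11)):
    # image, saliency: 2-D arrays of the same shape; label: ground-truth class index.
    blurred = blur(image)                                 # fully blurred starting point
    order = np.argsort(saliency.ravel())[::-1]            # pixel indices, most salient first
    full_info = entropy_fn(image)
    points = []
    for f in fractions:
        keep = order[: int(f * order.size)]
        restored = blurred.copy().ravel()
        restored[keep] = image.ravel()[keep]              # reintroduce the most salient pixels
        restored = restored.reshape(image.shape)
        info = entropy_fn(restored) / (full_info + 1e-8)  # normalised information (x-axis)
        proba = predict_proba(restored)                   # model softmax vector
        acc = float(np.argmax(proba) == label)            # y-value for the AIC
        softmax_score = float(proba[label])               # y-value for the SIC
        points.append((info, acc, softmax_score))
    return points  # pooled over images and binned to form the PICs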
(Flowchart: the image is first fully blurred; the saliency method then reintroduces the most important pixels step by step. At each step the image entropy is computed and the model performs classification; once the image is fully restored, the AIC is plotted. Two variants are shown: a Shannon-entropy-based one, computed over the total intensity levels, and a buffer-size-based one, which estimates image entropy from the buffer length of the image after lossless compression.)
A minimal sketch of both entropy estimators follows below.
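A minimal sketch of the two entropy estimators, assuming 8-bit grayscale images as NumPy arrays; PNG is used here as the lossless codec for the buffer-size variant.

import io
import numpy as np
from PIL import Image

def shannon_entropy(img, levels=256):
    # Histogram-based Shannon entropy over the intensity levels (bits per pixel).
    hist, _ = np.histogram(img, bins=levels, range=(0, levels - 1))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def buffer_size_entropy(img):
    # Proxy for image information: buffer length (bytes) of the image after
    # lossless compression; heavily blurred images compress better.
    buf = io.BytesIO()
    Image.fromarray(np.asarray(img, dtype=np.uint8)).save(buf, format="PNG")
    return buf.getbuffer().nbytes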
Image Blurring and Saliency Assessment for XRAI
Image Blurring and Saliency Assessment for GradCAM
Task-based Evaluation
(Figure: task-based evaluation results on the Brain MRI and Chest X-ray datasets.)
Attribution on the Brain MRI
Comparison
(Figure: our method compared with an existing method.)
Qualitative Assessment on Chest X-ray
Ours on Chest X-ray
(Figure: our attribution results on Chest X-ray compared with an existing method.)
Key Takeaways
● Enhanced Explainability: The proposed saliency-driven framework effectively combines qualitative and quantitative analyses to assess the explainability of deep learning models in medical imaging.
● Critical Role of PICs: PICs provide a valuable objective assessment of image-based saliency methods.
● Model-Specific Effectiveness: Saliency methods such as ScoreCAM and XRAI were found to be particularly effective.
Limitations and Future Directions
● Limitations
○ The study used 2-D images; however, real-world medical images are often 3-D volumes (voxels).
○ Further analysis on other medical imaging modalities is also required.
● Future directions
○ Alternative entropy estimation approaches, given the quantization trade-off of the histogram approach.
○ 3-D voxel-based saliency methods.
○ Clinical translation.
Thank you!
