This document proposes two interpretability evaluation frameworks: ROAR (RemOve And Retrain) and KAR (Keep And Retrain). ROAR evaluates a feature-importance estimator by removing the features a trained model's saliency map ranks most important from every input, regenerating the training and test sets, and retraining the model; a sharp drop in retrained accuracy indicates the estimator identified genuinely informative features. KAR is the complement: it keeps only the salient features, removes the rest, and retrains. The document first reviews common interpretability methods such as gradients, integrated gradients, and ensembling methods, then explains the ROAR and KAR mechanisms with example results, concluding that the frameworks can validate feature-importance estimates and model reliability.
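The masking step at the core of both procedures can be sketched as follows. This is a minimal illustration, not the document's implementation: the function name `roar_mask`, the per-feature-mean fill value, and the array shapes are all assumptions made for the example.

```python
import numpy as np

def roar_mask(X, saliency, fraction, keep=False):
    """Mask features per example based on a saliency ranking.

    With keep=False (ROAR), the top `fraction` most salient features are
    replaced by an uninformative value (here, the per-feature mean over
    the dataset). With keep=True (KAR), everything *except* the top
    features is replaced instead.

    X        -- (n_examples, n_features) input matrix
    saliency -- (n_examples, n_features) importance scores
    """
    fill = X.mean(axis=0)                      # per-feature mean, shape (n_features,)
    k = int(round(fraction * X.shape[1]))      # number of features ranked "salient"
    order = np.argsort(-saliency, axis=1)      # columns sorted by descending saliency
    # ROAR masks the top-k columns; KAR masks the remaining columns.
    cols = order[:, k:] if keep else order[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    X_masked = X.astype(float).copy()
    X_masked[rows, cols] = fill[cols]          # overwrite with uninformative values
    return X_masked
```

In a full evaluation, this masking would be applied at several fractions to both the training and test sets, with the model retrained from scratch on each masked training set; the retraining step is what distinguishes ROAR/KAR from simply occluding inputs at test time.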