This document summarizes a presentation on explanation in machine learning and its reliability. It discusses two typical types of explanations: saliency maps, which highlight the regions of an input that influenced a prediction, and similar examples, which provide instances from a database that resemble the input. The reliability of explanations has become a key concern, as explanations may not be valid or could be used maliciously. The talk reviews research evaluating the faithfulness and plausibility of explanations, including sanity checks such as the model parameter randomization test. It concludes that generating fake explanations could allow unfair models to appear fair, a risk known as "fairwashing" that more research is needed to address.
Explanation in Machine Learning and Its Reliability
1. NeurIPS Meetup Japan 2021, Satoshi Hara
Explanation in ML
and Its Reliability
Satoshi Hara
Osaka University
1
NeurIPS Meetup Japan 2021
2. NeurIPS Meetup Japan 2021, Satoshi Hara
“Explanation” in ML
◼ Most ML models are highly complex, or “black-box”.
◼ “Explanation in ML”: obtain some useful information
from the model (in addition to the prediction).
2
Preliminary
[Illustration] A model that can explain answers the patient’s “Why?” with “Your XX score is too high,” while a black-box model can only answer “I don’t know.”
3. NeurIPS Meetup Japan 2021, Satoshi Hara
[Typical Explanation 1] Saliency Map
◼ Generate heatmaps showing where the model focused
when making predictions.
3
Preliminary
The outline of the zebra
seems to be relevant.
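To make this concrete, a saliency map of this kind can be computed from the input gradient alone. Below is a minimal sketch of the vanilla Gradient/Saliency method [Simonyan+,2014] in PyTorch; the pretrained ResNet and the random input are placeholders for illustration, not the actual model and images from the talk.

```python
# Vanilla gradient saliency [Simonyan+,2014]: a minimal sketch.
# The ResNet and the random "image" are placeholders for illustration.
import torch
import torchvision.models as models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
image = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in input

logits = model(image)
score = logits[0, logits.argmax()]   # score of the top predicted class
score.backward()                     # gradient of that score w.r.t. pixels

# Heatmap: gradient magnitude, max over the three color channels.
saliency = image.grad.abs().max(dim=1)[0].squeeze()  # shape (224, 224)
```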
4. NeurIPS Meetup Japan 2021, Satoshi Hara
[Typical Explanation 2] Similar Examples
◼ Provide some similar examples to the input of interest.
4
[Illustration] For an input predicted as “Lapwing”, similar examples are provided from a database; the images look similar, so the prediction “Lapwing” will be correct.
Preliminary
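A similar-example explanation of this kind can be sketched as nearest-neighbor retrieval in some embedding space. The sketch below assumes the database items and the input have already been embedded (e.g., by the classifier’s penultimate layer); the embedding and all names here are illustrative assumptions, not the talk’s method.

```python
# Similar-example explanation as nearest-neighbor retrieval (a sketch).
import numpy as np

def most_similar(input_emb, database_embs, k=3):
    """Indices of the k database embeddings most cosine-similar to the input."""
    a = input_emb / np.linalg.norm(input_emb)
    B = database_embs / np.linalg.norm(database_embs, axis=1, keepdims=True)
    return np.argsort(-(B @ a))[:k]

# Toy usage: 1000 database items with 128-dimensional embeddings.
rng = np.random.default_rng(0)
db, x = rng.normal(size=(1000, 128)), rng.normal(size=128)
print(most_similar(x, db))  # indices of the 3 most similar examples
```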
5. NeurIPS Meetup Japan 2021, Satoshi Hara
History of “Explanation”
◼ History of Saliency Map
5
Preliminary
• Dawn: Saliency [Simonyan+,2014], Occlusion [Zeiler+,2014], GuidedBP [Springenberg+,2014]
• Exponential growth of saliency map algorithms: LRP [Bach+,2015], CAM [Zhou+,2016], LIME [Ribeiro+,2016], DeepLIFT [Shrikumar+,2017], Grad-CAM [Selvaraju+,2017], IntGrad [Sundararajan+,2017], SHAP [Lundberg+,2017], SmoothGrad [Smilkov+,2017], DeepTaylor [Montavon+,2017]
• Evaluation methods: MoRF/Deletion Metric [Bach+,2015; Vitali+,2018], LeRF/Insertion Metric [Arras+,2017; Vitali+,2018], Sensitivity [Kindermans+,2017], Sanity Check [Adebayo+,2018], ROAR [Hooker+,2019]
• Attack & Manipulation: Fairwashing [Aivodji+,2019], Manipulation [Dombrowski+,2019]
[Chart] The papers on “Explanation” increased exponentially: hits for “Interpretable Machine Learning” and “Explainable AI” on Web of Science grew from near zero in 2008 to roughly 800 per year by 2022.
6. NeurIPS Meetup Japan 2021, Satoshi Hara
History of “Explanation”
◼ History of Saliency Map (the same timeline as the previous slide)
6
Preliminary
The reliability of “Explanation” has been raised as a crucial concern:
• Are the explanations truly valid?
• How malicious can we be with “Explanation”?
7. NeurIPS Meetup Japan 2021, Satoshi Hara
Technical / Social Reliability of “Explanation”
Technical Reliability “Is the explanation valid?”
What we care:
• Do the algorithms output valid “Explanation”?
Research Question:
• How can we evaluate the validity of “Explanation”?
Social Reliability “Does explanation harm society?”
What we care:
• What will happen if we introduce “Explanation” to society?
Research Question:
• Are there any malicious use cases of “Explanation”?
7
Technical Reliability
8. NeurIPS Meetup Japan 2021, Satoshi Hara
Faithfulness & Plausibility of “Explanation”
◼ Faithfulness [Lakkaraju+’19; Jacovi+’20]
• Does “Explanation” reflect the model’s reasoning process?
- Our interest is “How and why the model predicted that way.”
• Any “Explanation” irrelevant to the reasoning process is invalid.
- e.g. “Explanation” outputs something independent of the model.
◼ Plausibility [Lage+’19; Strout+’19]
• Does “Explanation” make sense to the users?
• Any “Explanation” unacceptable to the users is not ideal.
- e.g. Entire program code; Very noisy saliency map.
8
Technical Reliability
9. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of “Explanation”
◼ Based on Faithfulness
• Sanity Checks for Saliency Maps, NeurIPS’18.
- Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim
• An epoch-making paper by Google Brain.
• Evaluation of Faithfulness for saliency maps.
◼ Based on Plausibility
• Evaluation of Similarity-based Explanations, ICLR’21.
- Kazuaki Hanawa, Sho Yokoi, Satoshi Hara, Kentaro Inui
• Evaluation of Plausibility for similarity-based explanations.
9
10. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of Saliency Map
◼ Plausibility
• All the maps look more or less plausible.
• Gradient and IntegratedGrad are a bit noisy.
◼ Faithfulness?
10
Technical Reliability
The outline of the zebra
seems to be relevant.
11. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of Faithfulness is Not Possible.
◼ Faithfulness
• Does “Explanation” reflect the model’s reasoning process?
→ The reasoning process is unknown; we cannot compare with the ground truth.
◼ Alternative: Sanity Check
• Check a necessary condition for faithful “Explanation”.
[Remark] Passing the sanity check alone does not guarantee faithfulness.
◼ Q. What is the necessary condition?
• “Explanation” is model-dependent.
- Any “Explanation” irrelevant to the reasoning process is invalid.
11
Technical Reliability
12. NeurIPS Meetup Japan 2021, Satoshi Hara
Model Parameter Randomization Test
◼ Compare the “Explanation” of two models with different
reasoning processes.
• Faithful “Explanation” → Outputs are different.
• Non-Faithful “Explanation” → Outputs can be identical.
12
Technical Reliability
[Setup] Model 1: fully trained. Model 2: randomly initialized.
[Assumption] These models have different reasoning processes.
• “Explanation” by Algo. 1 differs between the two models.
→ Satisfies the necessary condition: passed the sanity check.
• “Explanation” by Algo. 2 is identical for the two models.
→ Violates the necessary condition: failed the sanity check.
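In code, the test reduces to comparing an explanation computed on the trained model with one computed on a weight-randomized copy. Below is a hedged sketch: `explain` stands for any attribution method returning a flat vector (an assumed interface), and rank correlation is one possible similarity measure, as used in [Adebayo+,2018].

```python
# Model Parameter Randomization Test (a sketch).
import copy
import torch
from scipy.stats import spearmanr

def randomization_test(model, x, explain):
    """explain(model, x) -> flat attribution vector (assumed interface)."""
    e_trained = explain(model, x)            # attribution, trained model

    randomized = copy.deepcopy(model)
    for p in randomized.parameters():        # destroy the learned reasoning
        torch.nn.init.normal_(p, std=0.01)

    e_random = explain(randomized, x)        # attribution, random model
    rho, _ = spearmanr(e_trained, e_random)  # rank similarity of the two maps
    return rho  # rho close to 1: explanation ignores the model -> test failed
```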
13. NeurIPS Meetup Japan 2021, Satoshi Hara
Model Parameter Randomization Test
◼ Model 2: DNN with last few layers randomized.
• Saliency Maps of Guided Backprop and Guided GradCAM are
invariant against model randomization.
→ They violate the necessary condition for faithfulness.
13
[Figure] Saliency maps for Model 1 vs. Model 2. [Ref] Sanity Checks for Saliency Maps
Technical Reliability
14. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of “Explanation”
◼ Based on Faithfulness
• Sanity Checks for Saliency Maps, NeurIPS’18.
- Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim
• An epoch-making paper by Google Brain.
• Evaluation of Faithfulness for saliency maps.
◼ Based on Plausibility
• Evaluation of Similarity-based Explanations, ICLR’21.
- Kazuaki Hanawa, Sho Yokoi, Satoshi Hara, Kentaro Inui
• Evaluation of Plausibility for similarity-based explanations.
14
15. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of Similarity-based Explanation
◼ Faithfulness
• We can use Model Parameter Randomization Test.
◼ Plausibility?
15
[Illustration] The same similar-example explanation as before: similar “Lapwing” images retrieved from the database support the prediction.
Technical Reliability
16. NeurIPS Meetup Japan 2021, Satoshi Hara
Plausibility in Similarity-based Explanation
◼ Example
• Explanation B won’t be acceptable to the users.
- Plausibility of Explanation A > Plausibility of Explanation B
16
[Illustration] For an input predicted as “frog”, Explanation A retrieves a “frog” image from the database, while Explanation B retrieves a “truck” image.
Technical Reliability
17. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of Plausibility is Not Possible.
◼ There is no universal criterion that determines
acceptability to the users.
◼ Alternative: Sanity Check
• Check a necessary condition for plausible “Explanation”.
◼ Q. What is the necessary condition?
• The retrieved similar instance should belong to the same class.
17
[Illustration] “This is a cat because a similar example is a cat” is plausible;
“This is a cat because a similar example is a dog” is non-plausible.
→ Identical Class Test
Technical Reliability
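The Identical Class Test itself is straightforward to implement: for each test instance, retrieve the most similar training instance under the explanation’s similarity measure and check whether its label matches the prediction. The sketch below uses a placeholder dot-product similarity; any of the measures on the next slide could be plugged in.

```python
# Identical Class Test (a sketch). "similarity" is any pairwise score.
import numpy as np

def identical_class_test(X_test, y_pred, X_train, y_train, similarity):
    passed = 0
    for x, y in zip(X_test, y_pred):
        scores = np.array([similarity(x, xt) for xt in X_train])
        if y_train[np.argmax(scores)] == y:  # top-1 neighbor has same class?
            passed += 1
    return passed / len(X_test)  # fraction of test instances that pass

dot = lambda a, b: float(a @ b)  # stand-in similarity measure
```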
18. NeurIPS Meetup Japan 2021, Satoshi Hara
Identical Class Test
18
[Figure] Fraction of test instances that passed the Identical Class Test for
each similarity measure: dot product, cosine, and L2 distance (each on the
last layer and on all layers), Influence Function, Relative IF, Fisher Kernel,
and the dot product and cosine of the parameter gradient; evaluated on
CIFAR10 + CNN (image classification) and AGNews + Bi-LSTM (text classification).
Cosine similarity of the parameter gradient performed almost perfectly.
Technical Reliability
19. NeurIPS Meetup Japan 2021, Satoshi Hara
Cosine of Parameter Gradient
• $\mathrm{GC}(z, z') = \dfrac{\langle \nabla_\theta \ell(y, f_\theta(x)),\ \nabla_\theta \ell(y', f_\theta(x')) \rangle}{\lVert \nabla_\theta \ell(y, f_\theta(x)) \rVert \, \lVert \nabla_\theta \ell(y', f_\theta(x')) \rVert}$
19
[Figure] Examples of retrieved similar images for inputs labeled “Sussex spaniel”, “beer bottle”, and “mobile house”.
Technical Reliability
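A sketch of this gradient cosine in PyTorch follows; `model` and `loss_fn` are placeholders standing in for the classifier $f_\theta$ and the loss $\ell$.

```python
# Cosine of parameter gradients GC(z, z') (a sketch).
import torch

def grad_cos(model, loss_fn, x1, y1, x2, y2):
    """Cosine similarity of per-example loss gradients w.r.t. the parameters."""
    def flat_grad(x, y):
        loss = loss_fn(model(x), y)  # l(y, f_theta(x))
        grads = torch.autograd.grad(loss, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])

    g1, g2 = flat_grad(x1, y1), flat_grad(x2, y2)
    return torch.dot(g1, g2) / (g1.norm() * g2.norm())
```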
20. NeurIPS Meetup Japan 2021, Satoshi Hara
Technical / Social Reliability of “Explanation”
Technical Reliability “Is the explanation valid?”
What we care:
• Do the algorithms output valid “Explanation”?
Research Question:
• How can we evaluate the validity of “Explanation”?
Social Reliability “Does explanation harm society?”
What we care:
• What will happen if we introduce “Explanation” to society?
Research Question:
• Are there any malicious use cases of “Explanation”?
20
Social Reliability
21. NeurIPS Meetup Japan 2021, Satoshi Hara
Malicious Use Cases of “Explanation”
◼ Q. Are there malicious use cases of “Explanation”?
A. Some may try to deceive people
by providing fake explanations.
◼ Q. When and why can fake explanations be used?
A. Fake explanations can make models look better,
e.g., by pretending that the models are fair.
◼ Q. Why do we need to research fake explanations?
Are you evil?
A. We need to know how malicious one can be with fake
explanations. Otherwise, we cannot defend against
possible malicious uses.
21
Social Reliability
22. NeurIPS Meetup Japan 2021, Satoshi Hara
Fake “Explanation” for Fairness
◼ Fairness in ML
• Models can be biased towards gender, race, etc.
• Ensuring fairness of the models is crucial nowadays.
◼ What if we cannot detect the use of unfair models?
• Some may use unfair models.
- Unfair models are typically more accurate than the fair ones.
22
Social Reliability
“Our model is the most accurate one in this business field.”
(because of the use of an unfair yet accurate model)
“Moreover, our model is fair, without any bias.”
(by showing a fake explanation)
23. NeurIPS Meetup Japan 2021, Satoshi Hara
Fake “Explanation” for Fairness
◼ Fake “Explanation” by Surrogate Models
• Fairwashing: the risk of rationalization, ICML’19.
- Ulrich Aïvodji, Hiromi Arai, Olivier Fortineau, Sébastien Gambs, Satoshi Hara, Alain Tapp
• Characterizing the risk of fairwashing, NeurIPS’21.
- Ulrich Aïvodji, Hiromi Arai, Sébastien Gambs, Satoshi Hara
◼ Fake “Explanation” by Examples
• Faking Fairness via Stealthily Biased Sampling, AAAI’20.
- Kazuto Fukuchi, Satoshi Hara, Takanori Maehara
◼ Ref.
• It’s Too Easy to Hide Bias in Deep-Learning Systems,
IEEE Spectrum, 2021.
23
24. NeurIPS Meetup Japan 2021, Satoshi Hara
The risk of “Fairwashing”
◼ Explaining fairness
24
an honest explanation
Your loan application is rejected
because your gender is …
Unfair AI: reject applicants
based on their gender.
Social Reliability
25. NeurIPS Meetup Japan 2021, Satoshi Hara
The risk of “Fairwashing”
◼ Explaining fairness
25
a dishonest explanation
Your loan application is rejected
because your income is low.
Unfair AI: reject applicants
based on their gender.
Social Reliability
26. NeurIPS Meetup Japan 2021, Satoshi Hara
The risk of “Fairwashing”
◼ Explaining fairness
26
Unfair AI: reject applicants based on their gender.
A dishonest explanation: “Your loan application is rejected because your income is low.”
“Fairwashing”
Malicious decision-makers can disclose a fake
explanation to rationalize their unfair decisions.
Social Reliability
27. NeurIPS Meetup Japan 2021, Satoshi Hara
The risk of “Fairwashing”
◼ Explaining fairness
27
Unfair AI: reject applicants based on their gender.
A dishonest explanation: “Your loan application is rejected because your income is low.”
“Fairwashing”: malicious decision-makers can disclose a fake
explanation to rationalize their unfair decisions.
This Study: LaundryML
• It is possible to systematically generate fake explanations.
• Raise awareness of the risk of “Fairwashing”.
Social Reliability
28. NeurIPS Meetup Japan 2021, Satoshi Hara
LaundryML: systematically generating fake explanations
◼ The idea
Generate many explanations,
and pick one that is useful for “Fairwashing”.
◼ “Many explanations”
• Use “Model Enumeration” [Hara & Maehara’17; Hara & Ishihata’18]
• Enumerate explanation models.
◼ “Pick one”
• Use fairness metrics such as demographic parity (DP).
• Pick the explanation most faithful to the model, with DP less
than a threshold. (A sketch of this selection step follows below.)
28
Social Reliability
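As referenced above, here is a sketch of the selection step, assuming a pool of enumerated surrogate models exposing a scikit-learn-style `predict` (an assumption of this sketch, not the paper’s exact interface): keep the candidates whose demographic parity gap is below a threshold, and return the one most faithful to the black-box predictions.

```python
# LaundryML-style selection over enumerated explanation models (a sketch).
import numpy as np

def dp_gap(y_hat, sensitive):
    """Absolute demographic parity gap between the two sensitive groups."""
    y_hat, s = np.asarray(y_hat), np.asarray(sensitive)
    return abs(y_hat[s == 1].mean() - y_hat[s == 0].mean())

def pick_fairwashing_explanation(candidates, X, y_blackbox, sensitive, eps=0.05):
    best, best_fidelity = None, -1.0
    for model in candidates:                  # enumerated rule lists, etc.
        y_hat = model.predict(X)
        if dp_gap(y_hat, sensitive) >= eps:   # not "fair-looking" enough
            continue
        fidelity = np.mean(y_hat == y_blackbox)  # faithfulness to black box
        if fidelity > best_fidelity:
            best, best_fidelity = model, fidelity
    return best, best_fidelity
```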
29. NeurIPS Meetup Japan 2021, Satoshi Hara
Result
◼ “Fairwashing” for decisions on Adult dataset
• The feature importance that FairML assigns to “gender” has dropped.
29
[Figure] FairML feature importances: “gender” is prominent in a naïve explanation but drops in a fake explanation.
Social Reliability
30. NeurIPS Meetup Japan 2021, Satoshi Hara
Result
◼ “Fairwashing” for decisions on Adult dataset
• The feature importance that FairML assigns to “gender” has dropped.
30
[Figure] FairML feature importances: “gender” is prominent in a naïve explanation but drops in a false explanation.
Fake Explanation:
if      capital gain > 7056        then high-income
else if marital = single           then low-income
else if education = HS-grad        then low-income
else if occupation = other         then low-income
else if occupation = white-collar  then high-income
else                               low-income
Social Reliability
31. NeurIPS Meetup Japan 2021, Satoshi Hara
Fake “Explanation” for Fairness
◼ Fake “Explanation” by Surrogate Models
• Fairwashing: the risk of rationalization, ICML’19.
- Ulrich Aïvodji, Hiromi Arai, Olivier Fortineau, Sébastien Gambs, Satoshi Hara, Alain Tapp
• Characterizing the risk of fairwashing, NeurIPS’21.
- Ulrich Aïvodji, Hiromi Arai, Sébastien Gambs, Satoshi Hara
◼ Fake “Explanation” by Examples
• Faking Fairness via Stealthily Biased Sampling, AAAI’20.
- Kazuto Fukuchi, Satoshi Hara, Takanori Maehara
◼ Ref.
• It’s Too Easy to Hide Bias in Deep-Learning Systems,
IEEE Spectrum, 2021.
31
32. NeurIPS Meetup Japan 2021, Satoshi Hara
Fairness Metrics
◼ Quantifying fairness of the models
• Several metrics + toolboxes
- FairML, AI Fairness 360 [Bellamy+’19], Aequitas [Saleiro+’18]
32
AI Fairness 360
Social Reliability
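For reference, the demographic parity (DP) metric used throughout this talk is simple to compute by hand; toolboxes like AI Fairness 360 provide this and many other metrics.

```python
# Demographic parity difference: P(Y_hat=1 | S=1) - P(Y_hat=1 | S=0).
import numpy as np

def demographic_parity(y_hat, sensitive):
    y_hat, s = np.asarray(y_hat), np.asarray(sensitive)
    return y_hat[s == 1].mean() - y_hat[s == 0].mean()

print(demographic_parity([1, 0, 1, 1], [1, 1, 0, 0]))  # -0.5
```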
33. NeurIPS Meetup Japan 2021, Satoshi Hara
Fake Fairness Metrics
33
[Illustration] A malicious party runs a service with an unfair model and presents a fairness metric as evidence. “Is this a fake metric?”
• There is no guarantee that the metrics are computed appropriately.
• It is impossible to determine whether the metric is fake or not.
• A metric alone is not valid evidence of fairness.
Social Reliability
34. NeurIPS Meetup Japan 2021, Satoshi Hara
Avoiding Fake Fairness Metrics
34
[Illustration] The malicious party instead provides benchmark data as evidence: “The fairness metric computed on the benchmark is fair!”
• The metric is reproducible using the benchmark data.
• We can avoid fakes!
Social Reliability
35. NeurIPS Meetup Japan 2021, Satoshi Hara
(Failed) Avoiding Fake Fairness Metrics
35
[Illustration] The same setup as the previous slide, but:
The benchmark data itself can be fake.
Social Reliability
36. NeurIPS Meetup Japan 2021, Satoshi Hara
Generating Fake Benchmark
◼ Subsample the benchmark dataset $S$
from the original dataset $D$.
◼ “Ideal” Fake Benchmark Dataset $S$
• Fairness: the fairness metric computed on $S$ is fair.
• Stealthiness: the distribution of $S$ is close to that of $D$.
36
[Illustration] The benchmark $S$ is subsampled from the original dataset $D$ so that its contingency table is “fair” (fairness) while its distribution stays close to $D$ (stealthiness).
Social Reliability
37. NeurIPS Meetup Japan 2021, Satoshi Hara
Generating Fake Benchmark
◼ Optimization of $S$ as an LP (Min-Cost Flow):
$\min_S W(S, D) \quad \text{s.t.} \quad C(S) = C_T$
• Stealthiness: minimize the distribution difference $W(S, D)$.
• Fairness: constrain the contingency table $C(S)$ to the fair target $C_T$.
◼ Detection of the fake benchmark using a statistical test
(goodness-of-fit against reference data $D'$).
• Minimizing the distribution difference ≈ small detection probability.
• The probability of rejecting $p(S) = p(D')$ with the KS test is
at most $O(|S|^\alpha \times \text{distribution difference})$.
37
Social Reliability
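A much-simplified stand-in for the LP above: fix a fair target contingency table and sample uniformly within each (sensitive attribute, outcome) cell. This is not the paper’s min-cost-flow solution, but it illustrates how a benchmark can match a fair table while staying close to the original distribution.

```python
# Stealthily biased sampling, greatly simplified (a sketch).
import numpy as np

def fake_benchmark(X, s, y, target_counts, seed=0):
    """target_counts[(s_val, y_val)] = rows required in that cell of C_T.

    X, s, y are NumPy arrays: features, sensitive attribute, outcome.
    Sampling uniformly within each cell keeps S close to D's distribution.
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for (sv, yv), n in target_counts.items():
        cell = np.where((s == sv) & (y == yv))[0]      # rows in this cell
        chosen.append(rng.choice(cell, size=n, replace=False))
    return X[np.concatenate(chosen)]
```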
38. NeurIPS Meetup Japan 2021, Satoshi Hara
Undetectability of Fake Benchmark
38
[Figure] Fairness metric (DP) and distribution difference vs. positive cases in the contingency table, on COMPAS and Adult, comparing random sampling, case-control sampling, and the proposed sampling.
• The proposed sampling resulted in a fairer metric (= achieved fake fairness).
• The proposed sampling attained a distribution almost identical to the original one (= undetectable).
Social Reliability
39. NeurIPS Meetup Japan 2021, Satoshi Hara
Technical / Social Reliability of “Explanation”
Technical Reliability “Is the explanation valid?”
What we care:
• Do the algorithms output valid “Explanation”?
Research Question:
• How can we evaluate the validity of “Explanation”?
Social Reliability “Does explanation harm society?”
What we care:
• What will happen if we introduce “Explanation” to society?
Research Question:
• Are there any malicious use cases of “Explanation”?
39
Summary
40. NeurIPS Meetup Japan 2021, Satoshi Hara
Technical / Social Reliability of “Explanation”
Technical Reliability “Is the explanation valid?”
What we care:
• Do the algorithms output valid “Explanation”?
Research Question:
• How can we evaluate the validity of “Explanation”?
Social Reliability “Does explanation harm society?”
What we care:
• What will happen if we introduce “Explanation” to society?
Research Question:
• Are there any malicious use cases of “Explanation”?
40
Summary
How can we evaluate the validity of “Explanation”?
Which evaluation is good for which “Explanation”?
When can “Explanation” be used maliciously?
Can we detect malicious use cases?