Interpreting complex machine learning models can be difficult, and given an interpretation, its meaningfulness and reliability are hard to evaluate. Moreover, depending on the purpose (debugging, ...), some techniques in the literature may be more appropriate than others. How should one choose the best approach in the landscape of existing techniques?
This talk is organized as a virtual "walk" through different techniques for interpreting machine learning, and particularly deep learning. Moving from the inside out, we will first cover techniques (such as gradient ascent and deconvolution) for interpreting the internal state of the model, namely its neurons, channels and layer activations. We will then focus on the model behavior from the outside. The model output, for instance, can be explained by attributing the final decisions to subsets of input pixels (as in saliency, occlusion and class activation maps) or to higher-level concepts, such as object size, scale and texture. Concept-based attribution, in particular, has been our research focus over the last years, allowing us to explain deep learning in simple terms to clinicians. For this, digital pathology and retinopathy were our main application domains. In addition, concept-based interpretability helped us explain internal CNN mechanisms such as the encoding of scale and memorization of input-label pairs.
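As a flavor of the pixel-attribution techniques mentioned above (occlusion in particular), here is a minimal sketch; the model is a stand-in function, and the patch size and baseline value are illustrative assumptions, not choices from the talk.

```python
# Minimal sketch of occlusion-based attribution: slide a patch over the
# image and record how much the class score drops when that region is
# hidden. `model` stands in for any classifier returning a class score.
import numpy as np

def occlusion_map(model, image, patch=8, baseline=0.0):
    h, w = image.shape[:2]
    base_score = model(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            # A large score drop means this region mattered for the decision.
            heat[i // patch, j // patch] = base_score - model(occluded)
    return heat

# Toy usage: a "model" that just sums the top-left quadrant of the image.
toy_model = lambda img: float(img[:16, :16].sum())
print(occlusion_map(toy_model, np.ones((32, 32))))
```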
The importance of model fairness and interpretability in AI systems - Francesca Lazzeri, PhD
Machine learning model fairness and interpretability are critical for data scientists, researchers and developers to explain their models and understand the value and accuracy of their findings. Interpretability is also important to debug machine learning models and make informed decisions about how to improve them.
In this session, Francesca will go over a few methods and tools that enable you to "unpack" machine learning models, gain insights into how and why they produce specific results, assess your AI systems' fairness, and mitigate any observed fairness issues.
Using open-source fairness and interpretability packages, attendees will learn how to:
- Explain model predictions by generating feature importance values for the entire model and/or individual data points.
- Achieve model interpretability on real-world datasets at scale, during training and inference.
- Use an interactive visualization dashboard to discover patterns in data and explanations at training time.
- Leverage additional interactive visualizations to assess which groups of users might be negatively impacted by a model and compare multiple models in terms of their fairness and performance.
This document discusses improving the interpretability of RASA NLU models through machine learning techniques. It introduces interpretable machine learning and how tools like ScatterText and LIME can be used to analyze RASA NLU training data and models. These techniques help identify confusing intents, common words between intents, and explain model predictions. The goal is to troubleshoot models and refine training data to improve natural language understanding.
Part of the ongoing effort with Skater for enabling better Model Interpretation for Deep Neural Network models presented at the AI Conference.
https://conferences.oreilly.com/artificial-intelligence/ai-ny/public/schedule/detail/65118
Human in the loop: Bayesian Rules Enabling Explainable AI - Pramit Choudhary
The document provides an overview of a presentation on enabling explainable artificial intelligence through Bayesian rule lists. Some key points:
- The presentation will cover challenges with model opacity, defining interpretability, and how Bayesian rule lists can be used to build naturally interpretable models through rule extraction.
- Bayesian rule lists work well for tabular datasets and generate human-understandable "if-then-else" rules. They aim to optimize over pre-mined frequent patterns to construct an ordered set of conditional statements; a minimal sketch of such a rule list follows after this list.
- There is often a tension between model performance and interpretability. Bayesian rule lists can achieve accuracy comparable to more opaque models like random forests on benchmark datasets while maintaining interpretability.
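To make the "if-then-else" structure concrete, here is a minimal sketch of an ordered rule list used as a predictor. The Titanic-style fields, rules, and probabilities are hypothetical illustrations, not rules actually mined by a Bayesian rule list.

```python
# Minimal sketch of an ordered "if-then-else" rule list used as a predictor.
# The rules and probabilities below are hypothetical, not mined from data.
def rule_list_predict(passenger: dict) -> float:
    """Return P(survived) from an ordered list of conditional rules."""
    if passenger["sex"] == "female" and passenger["pclass"] <= 2:
        return 0.95                     # the first matching rule wins
    elif passenger["age"] < 10:
        return 0.80
    elif passenger["pclass"] == 3:
        return 0.20
    else:
        return 0.40                     # default rule when nothing matches

print(rule_list_predict({"sex": "male", "age": 8, "pclass": 3}))  # 0.8
```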
1. The document discusses model interpretation and techniques for interpreting machine learning models, especially deep neural networks.
2. It describes what model interpretation is, its importance and benefits, and provides examples of interpretability algorithms like dimensionality reduction, manifold learning, and visualization techniques.
3. The document aims to help make machine learning models more transparent and understandable to humans in order to build trust and improve model evaluation, debugging and feature engineering.
Interpretable machine learning: Methods for understanding complex models - Manojit Nandi
1. Interpretability helps understand complex machine learning models by explaining their outcomes based on inputs. Higher predictive accuracy often reduces interpretability.
2. Methods like LIME and SHAP attribute model outcomes to input features through local surrogate models and game theory; the local-surrogate idea is sketched after this list.
3. Recourse analysis identifies actions individuals could take to improve outcomes from automated decisions.
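Here is a minimal sketch of the local-surrogate idea behind LIME, assuming a stand-in black-box function; a real application would query the trained model instead.

```python
# Minimal sketch of a local surrogate (the idea behind LIME): perturb an
# instance, query the black box, and fit a distance-weighted linear model
# whose coefficients act as the local explanation.
import numpy as np
from sklearn.linear_model import Ridge

def black_box(X):                       # stand-in for any opaque model
    return (X[:, 0] * X[:, 1] > 0.5).astype(float)

x = np.array([1.0, 0.8, -0.3])          # instance to explain
rng = np.random.default_rng(0)
Z = x + rng.normal(scale=0.3, size=(500, 3))      # perturbed neighbors
weights = np.exp(-np.sum((Z - x) ** 2, axis=1))   # closer = more weight

surrogate = Ridge().fit(Z, black_box(Z), sample_weight=weights)
print(surrogate.coef_)                  # local feature attributions at x
```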
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer... - Madhav Mishra
The document discusses various topics related to evolutionary computation and artificial intelligence, including:
- Evolutionary computation concepts like genetic algorithms, genetic programming, evolutionary programming, and swarm intelligence approaches like ant colony optimization and particle swarm optimization.
- The use of intelligent agents in artificial intelligence and differences between single and multi-agent systems.
- Soft computing techniques involving fuzzy logic, machine learning, probabilistic reasoning and other approaches.
- Specific concepts discussed in more depth include genetic algorithms, genetic programming, swarm intelligence, ant colony optimization, and metaheuristics.
This was presented at the London Artificial Intelligence & Deep Learning Meetup.
https://www.meetup.com/London-Artificial-Intelligence-Deep-Learning/events/245251725/
Enjoy the recording: https://youtu.be/CY3t11vuuOM.
- - -
Kasia discussed complexities of interpreting black-box algorithms and how these may affect some industries. She presented the most popular methods of interpreting machine learning classifiers, for example feature importance, partial dependence plots, and Bayesian networks. Finally, she introduced the Local Interpretable Model-Agnostic Explanations (LIME) framework for explaining predictions of black-box learners (including text- and image-based models), using breast cancer data as a specific case scenario.
Kasia Kulma is a Data Scientist at Aviva with a soft spot for R. She obtained a PhD (Uppsala University, Sweden) in evolutionary biology in 2013 and has been working on all things data ever since. For example, she has built recommender systems, customer segmentations, and predictive models, and now she is leading an NLP project at the UK’s leading insurer. In her spare time she tries to relax by hiking & camping, but if that doesn’t work ;) she co-organizes R-Ladies meetups and writes a data science blog, R-tastic (https://kkulma.github.io/).
https://www.linkedin.com/in/kasia-kulma-phd-7695b923/
This document provides an overview of machine learning applications in natural language processing and text classification. It discusses common machine learning tasks like part-of-speech tagging, named entity extraction, and text classification. Popular machine learning algorithms for classification are described, including k-nearest neighbors, Rocchio classification, support vector machines, bagging, and boosting. The document argues that machine learning can be used to solve complex real-world problems and that text processing is one area with many potential applications of these techniques.
Introduction to machine learning and model building using linear regression - Girish Gore
A basic introduction to machine learning and a kick-start to the model-building process using linear regression. It covers fundamentals of the data science field of machine learning, focusing on the supervised learning method of linear regression. Importantly, it covers this using the R language and shows how to interpret the linear regression results of a model. Interpretation of results, tuning, and accuracy metrics such as RMSE (root mean squared error) are covered here.
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer... - Madhav Mishra
The document discusses machine learning paradigms including supervised learning, unsupervised learning, clustering, artificial neural networks, and more. It then discusses how supervised machine learning works using labeled training data for tasks like classification and regression. Unsupervised learning is described as using unlabeled data to find patterns and group data. Semi-supervised learning uses some labeled and some unlabeled data. Reinforcement learning provides rewards or punishments to achieve goals. Inductive learning infers functions from examples to make predictions for new examples.
This document introduces Factorization Machines, a general model that can mimic many successful factorization models. Factorization Machines allow feature vectors to be easily input and enjoy benefits of factorizing interactions between variables. The model has properties like expressiveness, multi-linearity, and scalable complexity. It relates to models like matrix factorization, tensor factorization, SVD++, and nearest neighbor models. Experiments show Factorization Machines outperform other models on rating prediction, context-aware recommendation, and tag recommendation tasks.
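For reference, the second-order Factorization Machine model is usually written with a global bias, linear weights, and factorized pairwise interactions, where the $\mathbf{v}_i$ are learned factor vectors:

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
```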
This document summarizes a conference paper published at ICLR 2020 that proposes a method called Plug and Play Language Models (PPLM) for controlled text generation using pretrained language models. PPLM allows controlling attributes of generated text like topic or sentiment without retraining the language model by combining it with simple attribute classifiers that guide the text generation process. The paper presents PPLM as a simple alternative to retraining language models that is more efficient and practical for controlled text generation.
This material summarizes the Counterfactual Explanation session held during the 18th cohort of 풀잎스쿨, as part of the study track "설명가능한 인공지능 기획!" ("Planning Explainable AI!").
It was compiled based on papers, YouTube videos, and the following resource:
https://christophm.github.io/interpretable-ml-book/
The document examines using a nearest neighbor algorithm to rate men's suits based on color combinations. It trained the algorithm on 135 outfits rated as good, mediocre, or bad, and then tested it on 30 outfits rated by a human. When trained on 135 outfits, the algorithm incorrectly rated 36.7% of test outfits; when trained on only 68 outfits, it incorrectly rated 50%, showing that more training data improves accuracy. It also tested using HSL color representation instead of RGB, with similar results.
This document provides an introduction to machine learning. It discusses how children learn through explanations from parents, examples, and reinforcement. It then defines machine learning as programs that improve their performance on tasks through experience. The document outlines typical machine learning tasks including supervised learning, unsupervised learning, and reinforcement learning, provides examples of each type of learning, and discusses evaluation methods for supervised learning models.
Explainable AI - making ML and DL models more interpretable - Aditya Bhattacharya
The document discusses explainable AI (XAI) and making machine learning and deep learning models more interpretable. It covers the necessity and principles of XAI, popular model-agnostic XAI methods for ML and DL models, frameworks like LIME, SHAP, ELI5 and SKATER, and research questions around evolving XAI to be understandable by non-experts. The key topics covered are model-agnostic XAI, surrogate models, influence methods, visualizations and evaluating descriptive accuracy of explanations.
Deciphering AI - Unlocking the Black Box of AIML with State-of-the-Art Techno... - Analytics India Magazine
Most organizations understand the predictive power and the potential gains from AI/ML, but AI and ML are still a black-box technology for them. While deep learning and neural networks can provide excellent inputs to businesses, leaders are challenged to use them because of the complete blind faith required to ‘trust’ AI. In this talk we will use the latest technological developments from researchers, the US defense department, and the industry to unbox the black box and give businesses a clear understanding of the policy levers that they can pull, why, and by how much, to make effective decisions.
The document summarizes key concepts in machine learning, including defining learning, types of learning (induction vs discovery, guided learning vs learning from raw data, etc.), generalisation and specialisation, and some simple learning algorithms like Find-S and the candidate elimination algorithm. It discusses how learning can be viewed as searching a generalisation hierarchy to find a hypothesis that covers the examples. The candidate elimination algorithm maintains the version space - the set of hypotheses consistent with the training examples - by updating the general and specific boundaries as new examples are processed.
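As an illustration of the Find-S algorithm mentioned above, here is a minimal sketch on a toy dataset; the attributes and examples are hypothetical assumptions, not taken from the document.

```python
# Minimal sketch of Find-S: start from the most specific hypothesis and
# generalize it over the positive examples only (negatives are ignored).
def find_s(examples):
    hypothesis = None
    for attributes, label in examples:
        if label != "yes":
            continue                         # Find-S ignores negatives
        if hypothesis is None:
            hypothesis = list(attributes)    # first positive: most specific
        else:
            hypothesis = [h if h == a else "?"   # generalize mismatches
                          for h, a in zip(hypothesis, attributes)]
    return hypothesis

data = [(("sunny", "warm", "high"), "yes"),
        (("rainy", "cold", "high"), "no"),
        (("sunny", "warm", "low"), "yes")]
print(find_s(data))   # ['sunny', 'warm', '?']
```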
A short presentation for beginners introducing machine learning: what it is, how it works, the popular machine learning techniques and learning models (supervised, unsupervised, semi-supervised, reinforcement learning), and how they work, with various industry use cases and popular examples.
Applied Artificial Intelligence Unit 5 Semester 3 MSc IT Part 2 Mumbai Univer... - Madhav Mishra
The document discusses various topics in natural language processing and knowledge representation techniques, including conceptual dependency theory, script structures, the CYC theory, case grammars, and the semantic web. It provides information on each topic through a series of slides by Madhav Mishra, describing things like the components of scripts, features and examples of CYC knowledge base, how semantic web uses XML, RDF and ontologies, and an overview of case grammars and their use of functional relationships between nouns and verbs.
This was presented to software developers with the goal of introducing them to the basic machine learning workflow, code snippets, possibilities and the state of the art in NLP, and giving some clues on where to get started.
Machine Learning for Dummies (without mathematics) - ActiveEon
It presents an introduction and the basic concepts of machine learning without mathematics. This is a short presentation for beginners in machine learning.
1. The document discusses machine learning and provides an overview of the seven steps of machine learning including gathering data, preparing data, choosing a model, training the model, evaluating the model, tuning hyperparameters, and making predictions.
2. It describes tips for data preparation such as exploring data for trends and issues, formatting data consistently, and handling missing values, outliers, and imbalanced data.
3. Techniques for outlier removal are discussed, including clustering-based, nearest-neighbor-based, density-based, graphical, and statistical approaches (a statistical one is sketched below). Limitations and challenges of outlier removal are noted.
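As one concrete example of the statistical approach, a minimal z-score filter; the tiny dataset and the threshold of 2 are illustrative choices (3 is also common, but it is too loose for this few points).

```python
# Minimal sketch of statistical outlier removal: drop points whose
# z-score exceeds a threshold.
import numpy as np

data = np.array([9.8, 10.1, 10.3, 9.9, 42.0, 10.0])
z = np.abs((data - data.mean()) / data.std())
print(data[z < 2.0])   # the extreme value 42.0 is dropped
```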
Machine Learning: Generative and Discriminative Models - butest
The document discusses machine learning models, specifically generative and discriminative models. It provides examples of generative models like Naive Bayes classifiers and hidden Markov models. Discriminative models discussed include logistic regression and conditional random fields. The document contrasts how generative models estimate class-conditional probabilities while discriminative models directly estimate posterior probabilities. It also compares how hidden Markov models model sequential data generatively while conditional random fields model sequential data discriminatively.
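In formulas: a generative classifier estimates the class-conditional likelihood $p(x \mid y)$ and the prior $p(y)$, and obtains the posterior via Bayes' rule, while a discriminative classifier fits the posterior $p(y \mid x)$ directly:

```latex
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')}
```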
Natural Language Processing - Basics / Non-Technical - Dhruv Gohil
This document provides an overview of natural language processing (NLP) and discusses several NLP applications. It introduces NLP and how it helps computers understand human language through examples like Apple's Siri and Google Now. It then summarizes popular NLP toolkits and describes applications including text summarization, information extraction, sentiment analysis, and dialog systems. The document concludes by discussing NLP system development, testing, and evaluation.
1) The document summarizes a symposium on interpretable AI, discussing various methods for making AI systems more transparent and understandable to humans.
2) It outlines different categories of interpretability techniques including understanding data distributions, mechanistic interpretability of models, and concept-based explanations of outcomes.
3) Moving forward, the document suggests expanding interpretability research into non-tabular high dimensional data, incorporating causality, and using interpretability methods for data and knowledge discovery.
Invited talk at the ExUM workshop at the UMAP 2022 conference.
Abstract:
Explainability has become an important topic both in Data Science and AI in general and in recommender systems in particular, as algorithms have become much less inherently explainable. However, explainability has different interpretations and goals in different fields. For example, interpretability and explainability tools in machine learning are predominantly developed for Data Scientists to understand and scrutinize their models. Current tools are therefore often quite technical and not very ‘user-friendly’. I will illustrate this with our recent work on improving the explainability of model-agnostic tools such as LIME and SHAP. Another stream of research on explainability in the HCI and XAI fields focuses more on users’ needs for explainability, such as contrastive and selective explanations and explanations that fit with the mental models and beliefs of the user. However, how to satisfy those needs is still an open question. Based on recent work in interactive AI and machine learning, I will propose that explainability goes together with interactivity, and will illustrate this with examples from our own work in music genre exploration, which combines visualizations and interactive tools to help users understand and tune our exploration model.
Personalized Retweet Prediction in Twitter - Liangjie Hong
This document proposes a method to predict which tweets a user is likely to retweet from their friends on Twitter. It discusses related work on generic and personalized tweet prediction. The proposed method uses factorization machines with a weighted approximate ranking pairwise loss function to model users' historical retweeting behaviors through collaborative filtering and content features. Experiments on a dataset of 0.7M users and their tweets show the proposed method outperforms baselines that use matrix factorization and other techniques. Topic modeling is also applied to identify topics in tweets.
A Categorisation of Post-hoc Explanations for Predictive Models - Jane Dane
This work is highly influenced by work previously completed by Zachary Lipton in The Mythos of Model Interpretability. Essentially, we argue that as long as there is no consensus and formal standardisation of what people mean by interpretability, pragmatic and influential progress in this direction will be blocked. At the time of the presentation there was no consensus on validation metrics, datasets or methodologies to evaluate and compare interpretability methods in the literature. We strongly emphasised the need for an axiomatic and formal approach, relating to earlier efforts in interpretability in fuzzy systems, in order to enforce the healthy habit of thinking about formal definitions and standardisations.
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone - James Anderson
If Artificial Intelligence (AI) is a black box, how can a human comprehend and trust the results of Machine Learning (ML) algorithms? Explainable AI (XAI) tries to shed light into that AI black box so humans can trust what is going on. Our speaker Meg Dickey-Kurdziolek is currently a UX Researcher for Google Cloud AI and Industry Solutions, where she focuses her research on Explainable AI and Model Understanding. Recording of the presentation: https://youtu.be/6N2DNN_HDWU
Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Un... - AI Frontiers
In this tutorial I will introduce recent work in applying weak supervision and reinforcement learning to Question Answering (QA) systems. Specifically, we discuss the semantic parsing task, in which natural language queries are converted to computation steps on knowledge graphs or data tables to produce the expected answers. State-of-the-art results can be achieved by a novel memory structure for sequence models and improvements in reinforcement learning algorithms. Related code and experiment setup can be found at https://github.com/crazydonkey200/neural-symbolic-machines. Related paper: https://openreview.net/pdf?id=SyK00v5xx.
Building AI Applications using Knowledge Graphs - Andre Freitas
This document provides an overview of building AI applications using knowledge graphs. It discusses the goals of the tutorial, which are to provide a broad view of multiple perspectives on knowledge graphs and show how knowledge graphs can form the foundation for building AI systems. The tutorial focuses on contemporary and emerging perspectives through exemplar approaches and infrastructures, rather than providing an exhaustive survey. It also notes that the tutorial is not a standard academic tutorial and takes a big picture view rather than being a comprehensive survey.
Bridging the gap between AI and UI - DSI Vienna - full version - Liad Magen
This is a summary of the latest research on model interpretability, including recurrent neural networks (RNNs) for natural language processing (NLP) and what is inside an RNN.
In addition, it contains suggestions for improving machine-learning-based user interfaces to engage users and encourage them to contribute data that adapts the models to them.
The document summarizes an agenda for a session on evaluation frameworks. The session will include presentations on experiences with evaluation, the Group Concept Mapping method, and an initial version of the evaluation framework with examples of criteria. There will also be discussions on suitable evaluation criteria, methods, and experts to involve in developing the framework. The objectives are to increase awareness of the evaluation task and collect suitable evaluation indicators.
Presenting the landscape of AI/ML in 2023: a quick summary of the last 10 years of progress, the current situation, and a look at things happening behind the scenes.
Feature Fusion and Classifier Ensemble Technique for Robust Face Recognition - CSCJournals
Face recognition is an important part of the broader biometric security systems research. In the past, researchers have explored either the Feature Space or the Classifier Space at a time to achieve efficient face recognition. In this work, both the Feature Space optimization as well as the Classifier Space optimization have been used to achieve improved results. The efficient technique of Feature Fusion in the Feature Space and Classifier Ensemble technique in the Classifier Space have been used to achieve robust and efficient face recognition. In the Feature Space, the Discrete Wavelet Transform (DWT) and the Histogram of Oriented Gradient (HOG) features have been extracted from face images and these have been used for classification purposes after Feature Fusion using the Principal Component Analysis (PCA). In the Classifier Space, a Classifier Ensemble has been used, utilizing the bagging technique for ensemble training, instead of a single classifier for efficient classification. Proper selections of various parameters of the DWT, HOG features and the Classification Ensemble have been considered to achieve optimum performance. The proposed classification technique has been applied to the AT&T (ORL) and Yale benchmark face recognition databases, and we have achieved excellent results of 99.78% and 97.72% classification accuracy respectively. The proposed Feature Fusion and Classifier Ensemble technique has been subjected to sensitivity analysis and it has been found to be robust under reduced spatial resolution conditions.
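A minimal sketch of the fusion-plus-ensemble pipeline described above, with random stand-in images instead of the ORL/Yale databases; all parameters here are illustrative choices, not the paper's settings.

```python
# Minimal sketch: fuse DWT and HOG features, compress with PCA, and
# classify with a bagging ensemble. Random images stand in for real faces.
import numpy as np
import pywt
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)
images = rng.random((40, 32, 32))          # stand-in face images
labels = np.repeat(np.arange(8), 5)        # 8 subjects, 5 images each

def features(img):
    cA, _ = pywt.dwt2(img, "haar")         # low-frequency DWT band
    h = hog(img, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return np.concatenate([cA.ravel(), h]) # feature fusion

X = np.array([features(im) for im in images])
X = PCA(n_components=20).fit_transform(X)  # reduce the fused features
clf = BaggingClassifier(n_estimators=25).fit(X, labels)
print(clf.score(X, labels))                # training accuracy only
```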
Automating Software Development Using Artificial Intelligence (AI) - Jeremy Bradbury
In recent years, traditional software development activities have been enhanced through the use of Artificial Intelligence (AI) techniques including genetic algorithms, machine learning and deep learning. The use cases for AI in software development have ranged from developer recommendations to complete automation of software developer activities. To demonstrate the breadth of application, I will present several recent examples of how AI can be leveraged to automate software development. First, I will present an approach to predicting future code changes in GitHub projects using historical data and machine learning. Next, I will present our framework for repairing multi-threaded software bugs using genetic algorithms. I will conclude with a broad discussion of the impact AI is having on software development.
Artificial Intelligence power point presentation document - David Raj Kanthi
This document provides a certificate for a seminar report on the topic of artificial intelligence. It was completed by a student in partial fulfillment of an M.C.A. degree program in 2016-2017. The document includes an acknowledgment, declaration, abstract, and index sections that provide information about the student, guide, and overall content covered in the seminar report on artificial intelligence.
These are slides from an Arithmer seminar given by Dr. Masaaki Uesaka at Arithmer Inc.
It is a summary of recent methods for quality assurance of machine learning models.
The Arithmer seminar is held weekly, with professionals from within the company giving lectures on their respective areas of expertise.
Arithmer is a mathematics company that began at the Graduate School of Mathematical Sciences of the University of Tokyo. We apply modern mathematics to bring advanced AI systems into solutions across many fields, and our research has the capability of providing solutions to tough, complex issues. We believe it is our job to think about how to use AI skillfully to make work more efficient and to produce results that are useful to people and society.
Incorporating Word Embeddings into Open Directory Project Based Large-Scale C... - Korea University
1. The document proposes two novel joint models that incorporate word embeddings into an Open Directory Project (ODP)-based classification framework to improve large-scale text classification.
2. The models generate category vectors that represent the semantics of ODP categories using both the explicit ODP taxonomy and implicit word2vec representations.
3. Experiments on real-world datasets demonstrate the models outperform state-of-the-art baselines, validating the efficacy of jointly leveraging explicit and implicit representations for large-scale text classification.
GDG Community Day 2023 - Interpretable ML in production - SARADINDU SENGUPTA
Validating an ML model with train-test accuracy metrics offers an initial understanding of viability, but generating inferences consistent with contextual business goals requires understanding how the deployed model works on data of a different nature and how it will behave under soft data drift.
In this talk, I will go through different explainability methods, how to employ them, and how the choice of model type affects interpretability in production inference.
THE IMPACT OF USING VISUAL PROGRAMMING ENVIRONMENT TOWARDS COLLEGE STUDENTS’ ... - ijma
ABSTRACT
This study aimed to identify the impact of using a visual programming environment on college students’ achievement and understanding when learning computer programming. In this quasi-experimental study, 91 students were divided systematically into an experimental group (53 students) and a control group (38 students). The experimental group was exposed to a visual programming environment, while the control group used an ordinary text-based programming environment. Data was collected using pre-test and post-test, then analysed using paired t-test, independent sample t-test and thematic content analysis. A significant increase in the students’ achievement was recorded during the paired t-test for both groups. However, there is no significant difference in the students’ achievement between the groups. Surprisingly, the thematic analysis showed that students’ understanding in the experimental group improved relatively better than in the control group. Thus, we conclude that visual programming environments have a better impact on students’ understanding.
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo... - Codiax
This document provides an overview of Generative Adversarial Networks (GANs) in 3 sections. It begins by briefly discussing supervised and unsupervised machine learning. It then explains that GANs use two neural networks, a generator and discriminator, that compete against each other in a game theoretic setup. The generator learns to produce more realistic samples while the discriminator learns to better distinguish real and fake samples. Popular GAN architectures like CycleGAN and BigGAN are also summarized.
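The game-theoretic setup can be summarized by the standard GAN minimax objective, in which the discriminator D and generator G are trained adversarially:

```latex
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```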
Deep Learning & NLP: Graphs to the Rescue! - Roelof Pieters
This document provides an overview of deep learning and natural language processing techniques. It begins with a history of machine learning and how deep learning advanced beyond early neural networks using methods like backpropagation. Deep learning methods like convolutional neural networks and word embeddings are discussed in the context of natural language processing tasks. Finally, the document proposes some graph-based approaches to combining deep learning with NLP, such as encoding language structures in graphs or using finite state graphs trained with genetic algorithms.
Rsqrd AI: Recent Advances in Explainable Machine Learning Research - Sanjana Chowdhury
In this talk, Bernease Herman speaks about recent explainable ML research
Presented on 06/06/2019
**These slides are from a talk given at Rsqrd AI. Learn more at rsqrdai.org**
5. “Interpretability is defined as the ability to explain or to present in understandable terms to a human*.”
* not all humans are familiar with Machine Learning
[Kim et al., 2018]
6. “The goal of interpretability is to describe the internals of a system in a way that is understandable to humans*.”
* not all humans are familiar with Machine Learning
[Gilpin et al., 2019]
8. Interpretability as a human-centric "translation" problem [Kim et al., 2018]
An explanation in the model representation space (input pixels, activations) is translated into an explanation in the human representation space (a visualization, a concept, a sentence, an important factor).
10. Why do we need interpretability?
Trained CNN: "It's a cat"
11. Why do we need interpretability?
Trained CNN: "It's a cat" ... YAY!
12. Why do we need interpretability?
Trained CNN: "It's a cat"
13. Why do we need interpretability?
Trained CNN: "It's a cat" ... Oh, but why?
14. Why do we need interpretability?
Trained CNN: "It's a cat" ...
If you want to know more about networks being easily fooled: [Szegedy et al., 2013], [Nguyen et al., 2015], [Papernot et al., 2016], [Moosavi-Dezfooli et al., 2017]
15. Why do we need interpretability? [Gilpin et al., 2019]
Why is the model working? Why is it not?
Why is the output like this? Why is it not something else?
Why should we trust the model?
Explain and defend actions, gain trust ... and develop better models!
17. Where do we need interpretability?
HEALTH, ROBOTICS, ASSISTED DRIVING, LAW, SOCIAL SCIENCES, FINANCE
High-risk applications also demand accountability, transparency, fairness and trust [FAccT conference].
18. Where is it needed? Why is it needed?
Safety, science, debugging, aligning objectives.
Where is it not needed? Why is it not needed?
For already well-studied problems such as privacy and robustness; one motivation does not cover it all ... [Kim B., Hooker S.]
19. How do we achieve interpretability?
Interpretability is challenging and trending.
21. Our goal today is the how:
gain a clearer understanding, knowing what to apply and where.
22. Outline (inside out)
1. Inherently interpretable models
a. (Generalized) linear regression
b. Decision trees and rules
2. Interpreting complex models
a. From inside (opening the black box)
b. From outside (black-box)
3. What else? Use interpretability to develop better models.
4. Q&A
I will not talk about dimensionality reduction.
24. Linear Regression
The output is a weighted sum of the features.
A linear increase in a feature translates into a proportional effect on the outcome.
No interactions between features.
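A minimal sketch of why the weights are the explanation, assuming scikit-learn and toy data (the feature names are illustrative):
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # three toy features
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
# Each coefficient is the change in the output for a unit change in that
# feature, holding the others fixed: the interpretation comes for free.
for name, w in zip(["f0", "f1", "f2"], model.coef_):
    print(f"{name}: weight = {w:+.2f}")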
25. Generalized Linear Regression
Family: normal, link: identity. Family: binomial, link: logit.
The interpretation comes mostly from assumptions on the data-generation process.
Complexity ~ generalization.
We can plug in arbitrary distributions according to the data-generation process.
Generalized Additive Models [Caruana et al., 2015]
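A minimal GLM sketch, assuming statsmodels and toy binary data; with the binomial family and its (default) logit link, the coefficients read as log-odds ratios:
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))  # intercept + two toy features
p = 1 / (1 + np.exp(-(X @ np.array([0.5, 1.5, -1.0]))))
y = rng.binomial(1, p)

# Binomial family, logit link.
glm = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(glm.summary())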
26. Decision trees and rules
Car model example: Autonomy >= 350? Yes. Is it electric? Yes. Ah, it's a Tesla.
Trackable and explainable decisions. Good for data interactions!
Continuous features enter only through sharp threshold splits (step functions).
Small changes in the data can lead to a very different tree.
Complexity ~ depth [Kim B.]
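A minimal sketch with scikit-learn: the printed tree is the explanation, since every prediction is a root-to-leaf path of threshold tests.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
# Print the learned decision rules as nested IF/THEN tests.
print(export_text(tree, feature_names=list(iris.feature_names)))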
27. Decision trees and rules
IF
PERSON CAPACITY < 2 &&
PRICE = 'high' &&
ELECTRIC = False &&
COMPANY LOGO = 'horse'
THEN
car is a FERRARI
Intrinsic explanation. Sparse. Efficient.
Not for regression; only categorical features.
Complexity ~ #rules [Kim B.]
29. Helpful terminology [Lipton, 2016]
Local = true for a specific instance vs. Global = true for an entire set of inputs (e.g. a class).
Model-specific = built-in model analysis vs. Post-hoc = applicable to any model.
30. 2. Interpretability of deep learning inside out
Inside:
- Database search: MMD [Kim et al., 2016], IF [Koh et al., 2017]
- Geometric approaches
- Visualization: Deconv [Zeiler et al., 2013], AM [Erhan et al., 2009], Dissection [Bau et al., 2017], SVCCA [Raghu et al., 2017]
Out:
- Surrogates
- Attribution to features: LIME [Ribeiro et al., 2016], SHAP [Lundberg et al., 2017], Saliency [Simonyan et al., 2013], CAM [Zhou et al., 2016], LRP [Binder et al., 2016]
- Attribution to concepts: TCAV [Kim et al., 2018], RCVs [Graziani et al., 2018]
32. Database search: Maximum Mean Discrepancy [Kim et al., 2016]
What examples explain the data or the model?
Prototype = a representative of all the data.
Criticism = an under-represented part.
Global, post-hoc.
33. Influential Instances: Influence Functions [Koh et al., 2017] (best paper award!)
Deleting one of these instances would strongly affect learning.
Global, post-hoc.
35. Deconvolutions
What is a neuron, a channel or a layer looking for?
Deconvolution: inverting the convolution operations [Zeiler et al., 2013].
Figure credits: Stanford CS230 (2018, YouTube).
Local, post-hoc.
36. Gradient Ascent
What is a neuron, a channel or a layer looking for?
Gradient ascent on the input until a chosen activation is maximized [Erhan et al., 2009], [Olah et al., 2019]; see the Lucid toolbox.
Global, post-hoc.
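A minimal activation-maximization sketch in PyTorch; the pretrained AlexNet, the layer slice and the channel index are illustrative choices, not the original authors' setup:
import torch
from torchvision import models

model = models.alexnet(weights="IMAGENET1K_V1").eval()
img = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
opt = torch.optim.Adam([img], lr=0.05)

target_channel = 10
for _ in range(200):
    opt.zero_grad()
    acts = model.features[:6](img)          # activations of an early conv block
    loss = -acts[0, target_channel].mean()  # ascend the channel's mean activation
    loss.backward()
    opt.step()
# `img` now approximates what channel 10 is looking for (regularizers such as
# jitter and blur, as in Lucid, give cleaner visualizations).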
37. Network Dissection [Bau et al., 2017]
What is a neuron, a channel or a layer looking for?
A set of segmented regions for ~1K concepts: color, texture, material, object, scene.
Early training finds concepts; late training improves them.
Global, post-hoc.
38. Singular Vector Canonical Correlation Analysis
Can we compress what a layer has learned? [Raghu et al., 2017]
Take the responses of the layer to all the data, then apply Singular Value Decomposition followed by Canonical Correlation Analysis.
This allows comparisons of layers and architectures, and insights on training dynamics.
Global, post-hoc.
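A rough SVCCA sketch with numpy and scikit-learn, assuming two activation matrices of shape [n_datapoints, n_neurons]; the random "layers" are stand-ins:
import numpy as np
from sklearn.cross_decomposition import CCA

def svcca(acts1, acts2, keep=10):
    # 1) SVD each layer's (centered) activations, keep the top directions.
    def top_directions(acts, k):
        acts = acts - acts.mean(axis=0)
        _, _, vt = np.linalg.svd(acts, full_matrices=False)
        return acts @ vt[:k].T
    a, b = top_directions(acts1, keep), top_directions(acts2, keep)
    # 2) CCA between the reduced subspaces; the mean correlation is the score.
    x, y = CCA(n_components=keep, max_iter=2000).fit(a, b).transform(a, b)
    return float(np.mean([np.corrcoef(x[:, i], y[:, i])[0, 1] for i in range(keep)]))

rng = np.random.default_rng(0)
acts1 = rng.normal(size=(500, 64))
acts2 = acts1 @ rng.normal(size=(64, 128))  # a linearly related "layer"
print(svcca(acts1, acts2))                  # close to 1.0 for related layers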
39. 2. Interpretability of deep learning inside out
(The same inside/out map as slide 30; we now move from the inside techniques to the outside ones: surrogates, attribution to features and attribution to concepts.)
40. Surrogate models
The replacement is an interpretable model trained on the data and on the black-box predictions.
A complex decision function is mimicked, e.g., by a linear surrogate (R^2 = 0.7).
Flexibility through different surrogates; very approximate, though.
Global, post-hoc.
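A minimal global-surrogate sketch: fit an interpretable model to the black-box predictions and report how faithfully it mimics them. The choice of black box and surrogate below is illustrative:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=5, random_state=0)
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

# Train the surrogate on the *black-box predictions*, not on the true labels.
y_bb = black_box.predict(X)
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_bb)
print("fidelity R^2:", r2_score(y_bb, surrogate.predict(X)))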
41. Local Interpretable Model-agnostic Explanations (LIME) [Ribeiro et al., 2016]
The replacement is an interpretable model trained on the data and on the black-box predictions, to explain each prediction individually.
Local linear surrogate.
Flexible and universal (post-hoc).
The size of the local neighborhood is undefined, and the sampling of local instances is not very robust.
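A hand-rolled sketch of the LIME idea for tabular data (not the lime library's API): perturb around one instance, weight the samples by proximity, and fit a locally weighted linear model. The kernel width and the toy black box are illustrative:
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(predict_fn, x, n_samples=1000, width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=width, size=(n_samples, x.size))  # local neighborhood
    y = predict_fn(Z)                                          # black-box outputs
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / (2 * width ** 2))                   # proximity kernel
    return Ridge(alpha=1.0).fit(Z, y, sample_weight=w).coef_   # local feature weights

f = lambda Z: np.sin(Z[:, 0]) + Z[:, 1] ** 2                   # toy black box
print(lime_explain(f, np.array([0.5, 1.0])))
# Near x, sin behaves roughly like its slope cos(0.5) ~ 0.88 and x1^2 like 2*x1 = 2.0.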
46. SHapley Additive exPlanations [Lundberg et al., 2017]
A game-theoretic approach of competing features: it attributes to each input feature the change in the expected model prediction when conditioning on that feature.
A unifying framework; direct for categorical features.
Only qualitative on images; the abstraction is difficult.
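A brute-force Shapley sketch for a model with only a few features: enumerate all coalitions, replacing "absent" features with a background value. The toy model is illustrative; libraries such as shap approximate this efficiently:
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    n = x.size
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                z_with, z_without = background.copy(), background.copy()
                idx = list(S)
                z_with[idx + [i]] = x[idx + [i]]
                z_without[idx] = x[idx]
                # Shapley weight of a coalition of size k.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (f(z_with) - f(z_without))
    return phi

f = lambda z: 2 * z[0] + z[1] * z[2]             # toy model
x, bg = np.array([1.0, 1.0, 1.0]), np.zeros(3)
print(shapley_values(f, x, bg))                  # attributions sum to f(x) - f(bg)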
49. Saliency**
Slide credits: Hooker S.
** Saliency was recently shown to be unstable and unreliable in edge cases, such as in randomized networks or when compared to random attribution maps (see Remove and Retrain, ROAR).
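A minimal gradient-saliency sketch in PyTorch; the pretrained network and the random stand-in input are illustrative (in practice `img` is a preprocessed image):
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
img = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in input

score = model(img)[0].max()              # top-class score
score.backward()
saliency = img.grad.abs().max(dim=1)[0]  # per-pixel max over channels
# `saliency` highlights the pixels whose small changes most affect the score.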
50. Class Activation Mapping [Zhou et al., 2016]
The importance of the image regions is given by the projection of the output layer's weights onto the last convolutional layer's feature maps.
A greatly successful technique for its transparency and directness.
Only qualitative evaluation; little focus on multiple instances of the same object.
Local, post-hoc.
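A minimal CAM sketch in PyTorch for a ResNet-style network, where the output layer sits directly on top of global average pooling (the stand-in input is illustrative):
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
img = torch.randn(1, 3, 224, 224)  # stand-in preprocessed image

feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(maps=o))
cls = model(img).argmax(dim=1).item()

# Project the output layer's weights for class `cls` onto the feature maps.
w = model.fc.weight[cls]                              # [512]
cam = torch.einsum("c,chw->hw", w, feats["maps"][0])  # [7, 7]
cam = torch.relu(cam)                                 # keep positive evidence
cam = cam / (cam.max() + 1e-8)                        # normalize; upsample to overlay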
52. Concept Activation Vectors [Kim et al., 2018]
What about the relevance of the "striped" texture in the classification of a zebra?
We collect examples of a concept, e.g. the "striped" texture, and take the internal activations (unrolled).
A linear classification of "striped" vs. "random" activations gives the vector of the "striped" texture.
The directional derivative ∂output/∂vector acts as a generalized saliency.
Local, global, post-hoc.
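A minimal CAV/TCAV sketch with numpy and scikit-learn; the activations and gradients below are random stand-ins for what would be collected from a chosen CNN layer:
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts_concept = rng.normal(loc=1.0, size=(100, 64))  # activations of "striped" images
acts_random = rng.normal(loc=0.0, size=(100, 64))   # activations of random images

# 1) The CAV is the normal to the hyperplane separating concept vs. random.
X = np.vstack([acts_concept, acts_random])
y = np.array([1] * 100 + [0] * 100)
cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
cav /= np.linalg.norm(cav)

# 2) Directional derivative of the class score along the CAV, per input;
#    the TCAV score is the fraction of inputs where it is positive.
grads = rng.normal(size=(50, 64))  # d(zebra score)/d(activations), stand-in
print("TCAV score:", float(np.mean(grads @ cav > 0)))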
53. Out
What if the concept is non-binary, such as tumor extension, patient age, or color?
55. Regression Concept Vectors [Graziani et al., 2018] (best paper award, iMIMIC, MICCAI 2018!)
Segmentation (manual or automatic) yields handcrafted features: texture descriptors, shape, size, ...
Take the internal activations (aggregated) and fit a linear regression of the concept measures: this gives, e.g., the vector of "size".
The directional derivative ∂output/∂vector acts as a generalized saliency.
Local, global, post-hoc.
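A minimal RCV sketch: regress a continuous concept measure (say, "area") on the layer activations; the regression weights give the concept direction. The activations, measures and gradients are random stand-ins:
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 64))  # layer activations, stand-in
area = acts @ rng.normal(size=64) + rng.normal(scale=0.1, size=200)  # toy measure

reg = LinearRegression().fit(acts, area)
rcv = reg.coef_ / np.linalg.norm(reg.coef_)  # direction of increasing "area"
print("R^2:", reg.score(acts, area))         # keep only well-fit concepts

# Sensitivity = directional derivative of the output along the RCV.
grads = rng.normal(size=(50, 64))            # d(output)/d(activations), stand-in
sensitivity = grads @ rcv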
56. Application to health
1. Modeling of visual concepts: the Nottingham grading guidelines (nuclei pleomorphism, tubular formation, mitotic count, enlarged nuclei, vesicular appearance, multiple nucleoli) are turned into measurable concepts: segmentation size (area) and image texture descriptors (contrast, ASM, correlation).
2. CNN explanation: for a black-box state-of-the-art model predicting tumor probability from an image, the concept measures (contrast, ASM, correlation, area) are ranked from high to low relevance for the positive and the negative class.
57. Application to health: retinopathy of prematurity [Graziani et al., 2019], [Yeche et al., 2019]
Radiomics concept measures (curvature mean/median, cti mean/median, average point diameter mean, average segment diameter median) receive individual relevance scores in [-1, 1] on raw and segmented images, alongside the predicted probabilities for normal, pre-plus and plus disease (e.g. GT: normal; prediction: normal, with pn = 0.99, ppre = 0.009, pplus = 0.0).
Image credits: Yeche et al., Springer.
59. Applications to computer vision: interpreting intentionally flawed models [Graziani et al., 2019]
For a model trained on a flawed objective (image blue-ness), the analysis shows that color and texture were used to reduce this loss!
60. Our goal today is the how:
gain a clearer understanding. What to apply, and where?
61. What to apply and where? What do you need most (in deep learning)?
- Understand each component: individually → Gradient Ascent; interactions and comparisons → geometry-based approaches.
- Global understanding → surrogate models.
- User-friendly explanations: from a dataset of conceptual examples → CAVs; measuring attributes on images → RCVs.
- Visual explanations on a single input: before the decision layer → Class Activation Maps; layerwise → Layer-wise Relevance Propagation.
64. Can we use interpretability for better control and development?
"It's a cat: it has pointy ears ... and whiskers!" But cats DO NOT have penguin legs ...
Robustness to adversarial examples.
65. Can we use interpretability for better control and development?
Interpretability analysis, prior knowledge and user feedback provide additional targets for our model output:
desired features → multi-task learning;
undesired features → adversarial learning.
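A minimal sketch of using concepts as extra training signals in PyTorch; the heads, targets and weighting factors are hypothetical, and a proper adversarial setup would add a gradient-reversal layer or a separate discriminator:
import torch.nn.functional as F

def combined_loss(logits, y, desired_pred, desired_target,
                  undesired_pred, undesired_target,
                  lam_mt=0.1, lam_adv=0.1):
    task = F.cross_entropy(logits, y)
    # Multi-task term: encourage encoding of a *desired* concept.
    multi_task = F.mse_loss(desired_pred, desired_target)
    # Adversarial term: penalize predictability of an *undesired* concept.
    adversarial = -F.mse_loss(undesired_pred, undesired_target)
    return task + lam_mt * multi_task + lam_adv * adversarial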
66. TAKE AWAY
ML interpretability is human-centric and multi-faceted, and should be tailored to a precise scope.
Cartography to navigate interpretability (slide 121).
Growing conferences and workshops: FAccT, Tutorial on Interpretable Machine Learning, NeurIPS Interpretable ML, ICML Interpretable ML, DL summer schools, AISTATS, CVPR, ICLR, ECCV, ICCV, ECML, KDD, Workshop on Interpreting and Explaining Visual AI Models, Tutorial on Interpretable & Transparent Deep Learning, WHI 2020 (virtual this year) ... and many others!
Some interesting people and projects: B. Zhou (Torralba, MIT), B. Kim (Google Brain), G. Montavon (heatmapping.org), Ruth C. Fong (Harvard & Microsoft), Finale Doshi-Velez (Harvard), A. Weller (Cambridge), S. Lundberg (Microsoft), DARPA's XAI Explainable Artificial Intelligence ... and many others!
Pic credits: bannerengineering.com
67. Thank you!
ML interpretability for healthcare, our vision (diagram): linear probing of the internal representation for post-hoc interpretation, via classification (TCAV [1], UBS [2]) or regression (ours, Br [3,4,5]); feedback to the DL model through concepts; modeling of prior knowledge (e.g. area, contrast); update of the objective function; handcrafted ML features.
mara.graziani@hevs.ch
@mormontre
@maragraziani