Machine Learning Models: Explaining their Decisions, and Validating the Explanations
1.
1/45
Explaining Machine Learning Decisions
Grégoire Montavon, TU Berlin
Joint work with: Wojciech Samek, Klaus-Robert Müller,
Sebastian Lapuschkin, Alexander Binder
18/09/2018 Intl. Workshop ML & AI, Telecom ParisTech
2.
2/45
From ML Successes to Applications
Autonomous Driving
Medical Diagnosis
Networks (smart grids, etc.)
Visual Reasoning
AlphaGo beats the human Go champion
Deep net outperforms humans in image classification
3.
3/45
Can we interpret what a ML model has learned?
4.
4/45
First, we need to define what we want from interpretable ML.
5.
5/45
Understanding Deep Nets: Two Views
View 1: Understanding what mechanism the network uses to solve a problem or implement a function.
View 2: Understanding how the network relates the input to the output variables.
7.
Interpreting Predicted Classes
Image from Simonyan’13
Example: “What does a goose typically look like according to the neural network?”
goose
non-goose
8.
8/45
Explaining Individual Decisions
Images from Lapuschkin’16
Example: “Why is a given image classified as a sheep?”
sheep
non-sheep
9.
9/45
Example: Autonomous Driving [Bojarski’17]
Figure: PilotNet input, its steering decision, and the corresponding explanation.
Bojarski et al. 2017. Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car
10.
10/45
Example: Pascal VOC Classification [Lapuschkin’16]
Comparing performance on Pascal VOC 2007: Fisher Vector classifier vs. deep net pretrained on ImageNet.
Figure columns: Fisher classifier | (pretrained) deep net.
Lapuschkin et al. 2016. Analyzing Classifiers: Fisher Vectors and Deep Neural Networks
11.
11/45
Example: Pascal VOC Classification [Lapuschkin’16]
Lapuschkin et al. 2016. Analyzing Classifiers: Fisher Vectors and Deep Neural Networks
12.
12/45
Example: Medical Diagnosis [Binder’18]
A: Invasive breast cancer, H&E stain; B: Normal mammary glands and fibrous tissue, H&E stain; C: Diffuse carcinoma infiltrate in fibrous tissue, hematoxylin stain.
Binder et al. 2018. Towards computational fluorescence microscopy: Machine learning-based integrated prediction of morphological and molecular tumor profiles
13.
13/45
Example: Quantum Chemistry [Schütt’17]
Diagram: molecular structure (e.g. atom positions) → molecular electronic properties (e.g. atomization energy), computed either by DFT calculation of the stationary Schrödinger equation (PBE0, Perdew’86) or by a deep tensor neural network (DTNN, Schütt’17), which additionally provides interpretable insight.
Schütt et al. 2017: Quantum-Chemical Insights from Deep Tensor Neural Networks
15.
15/45
Explaining by Decomposing
The importance of a variable is the share of the function score that can be attributed to it.
Decomposition property: the relevance that the DNN assigns to the input variables sums to the function score.
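In symbols (a minimal sketch; the notation R_i for the relevance of input variable x_i is assumed, not shown on the slide):

\sum_i R_i = f(x)

i.e. the relevance scores jointly account for the full prediction score f(x).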
16.
16/45
Explaining Linear Models
A simple method:
17.
17/45
Explaining Linear Models
Taylor decomposition approach:
Insight: the explanation depends on the root point.
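A sketch of this approach (root point denoted \tilde{x}; notation assumed): expand f to first order around \tilde{x} and read off the relevance from the first-order terms,

f(x) = f(\tilde{x}) + \sum_i (x_i - \tilde{x}_i)\,[\nabla f(\tilde{x})]_i, \qquad R_i = (x_i - \tilde{x}_i)\,[\nabla f(\tilde{x})]_i

For a linear model the expansion is exact, and different choices of \tilde{x} yield different relevance scores, hence the insight above.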
18.
18/45
Explaining Nonlinear Models
Second-order terms are hard to interpret and can be very large.
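For a nonlinear f the expansion continues beyond first order; as a sketch with the same assumed notation:

f(x) = f(\tilde{x}) + \sum_i (x_i - \tilde{x}_i)\,[\nabla f(\tilde{x})]_i + \tfrac{1}{2} \sum_{i,j} (x_i - \tilde{x}_i)(x_j - \tilde{x}_j)\,[\nabla^2 f(\tilde{x})]_{ij} + \dots

The second- and higher-order terms couple pairs of variables, so they cannot be cleanly attributed to individual inputs.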
20.
20/45
Explaining Nonlinear Models by Propagation
Is there an underlying mathematical framework?
21.
21/45
Deep Taylor Decomposition (DTD) [Montavon’17]
Question: suppose that we have propagated LRP scores (“relevance”) down to a given layer. How should the relevance be propagated one layer further?
Key idea: Let’s use Taylor expansions for this.
22.
22/45
DTD Step 1: The Structure of Relevance
Observation: relevance at each layer is a product of the activation and an approximately constant term.
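Written out (a sketch; indices assumed), the observation is that the relevance of neuron j takes the form

R_j = a_j\, c_j, \qquad c_j \approx \text{const.}

so R_j inherits the structure of the activation a_j, which is what makes a Taylor expansion of the relevance with respect to the lower-layer activations tractable.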
25.
25/45
DTD Step 2: Taylor Expansion
Taylor expansion at root point:
Relevance can now be backward-propagated.
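A sketch of this step (notation assumed): expand the relevance R_k of an upper-layer neuron as a function of the lower-layer activations a at a root point \tilde{a}, and redistribute the first-order terms:

R_k(a) \approx R_k(\tilde{a}) + \sum_j (a_j - \tilde{a}_j)\, \big[\partial R_k / \partial a_j\big]_{\tilde{a}}, \qquad R_j = \sum_k (a_j - \tilde{a}_j)\, \big[\partial R_k / \partial a_j\big]_{\tilde{a}}

Each lower-layer neuron j thus receives the share of R_k carried by its first-order term.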
26.
26/45
DTD Step 3: Choosing the Root Point
Choice of root point (Deep Taylor generic):
1. nearest root
2. rescaled excitations ✔ (same as LRP-α1β0)
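For a ReLU layer z_k = \max(0, \sum_j a_j w_{jk} + b_k), choosing the root point by rescaling the excitations yields (as a sketch, with assumed notation) the propagation rule

R_j = \sum_k \frac{a_j w_{jk}^{+}}{\sum_{j'} a_{j'} w_{j'k}^{+}}\, R_k, \qquad w_{jk}^{+} = \max(0, w_{jk})

which is the LRP-α1β0 rule mentioned above.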
27.
27/45
DTD: Choosing the Root Point
Choice of root point (Deep Taylor generic) adapted to the pixel domain.
28.
28/45
DTD: Choosing the Root Point
Choice of root point (Deep Taylor generic) adapted to an embedding domain. (Image source: TensorFlow tutorial)
29.
29/45
DTD: Application to Pooling Layers
A sum-pooling layer over positive activations is equivalent to a ReLU layer with weights 1.
A p-norm pooling layer can be approximated as a sum-pooling layer multiplied by a ratio of norms that we treat as constant [Montavon’17].
→ Treat pooling layers as ReLU detection layers.
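As a worked sketch of the two statements above (assuming activations a_j ≥ 0):

\sum_j a_j = \max\Big(0, \sum_j 1 \cdot a_j\Big), \qquad \|a\|_p = \Big(\sum_j a_j\Big) \cdot \frac{\|a\|_p}{\|a\|_1}

The first identity is the sum-pooling/ReLU equivalence; in the second, the ratio of norms is treated as constant, so both pooling types can reuse the ReLU-layer propagation rules.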
30.
30/45
Deep Taylor Decomposition on ConvNets
Diagram: forward pass through the ConvNet, then a backward pass applying LRP-α1β0* at each hidden layer and the DTD pixel rule at the input layer.
* For top layers, other rules may improve selectivity.
31.
31/45
Implementing Propagation Rules
From a sequence of element-wise computations to a sequence of vector computations.
Example: LRP-α1β0:
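Starting from the element-wise LRP-α1β0 rule sketched earlier, the vector form (a sketch; W^{+} denotes the element-wise positive part of the weight matrix, ⊘ and ⊙ element-wise division and multiplication, notation assumed) consists of four vector computations:

z = W^{+\top} a, \qquad s = R \oslash z, \qquad c = W^{+} s, \qquad R' = a \odot c

where R' is the relevance redistributed to the lower layer.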
32.
32/45
Implementing Propagation Rules
Example: LRP-α1β0: code that reuses the forward and gradient computations:
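The slide’s code is not in the transcript; below is a minimal PyTorch sketch of the same pattern for a single dense layer (function and variable names are mine, and the small epsilon is an assumed numerical stabilizer):

```python
import torch

def lrp_alpha1_beta0(layer, a, R, eps=1e-9):
    """Redistribute relevance R through a dense `layer` with the LRP-alpha1-beta0
    rule, reusing the forward pass and autograd for the backward step.
    a: input activations (assumed non-negative), R: relevance from the layer above."""
    a = a.clone().detach().requires_grad_(True)

    # Copy of the layer that keeps only the positive (excitatory) weights.
    pos = torch.nn.Linear(layer.in_features, layer.out_features, bias=False)
    pos.weight.data = layer.weight.data.clamp(min=0)

    z = pos(a) + eps                  # step 1: forward pass through positive weights
    s = (R / z).detach()              # step 2: element-wise division
    (z * s).sum().backward()          # step 3: backward (gradient) pass
    c = a.grad                        # ... gradient of <z, s> w.r.t. the inputs
    return (a * c).detach()           # step 4: element-wise product

# Hypothetical usage on a toy layer:
layer = torch.nn.Linear(6, 3)
a = torch.rand(1, 6)                  # activations entering the layer
R = torch.rand(1, 3)                  # relevance arriving from the layer above
print(lrp_alpha1_beta0(layer, a, R))  # relevance of the 6 inputs (sums ≈ R.sum())
```

The four commented steps mirror the vector computations above; the same structure carries over to convolutional layers by building a positive-weight copy of the convolution instead of the dense layer.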
34.
34/45
Implementation on Large-Scale Models [Alber’18]
https://github.com/albermax/innvestigate
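A usage sketch (assumptions: an iNNvestigate version compatible with tf.keras, and a model whose final softmax has been removed; the tiny model below is only a stand-in, not from the slides):

```python
import numpy as np
import tensorflow as tf
import innvestigate

# Stand-in model; in practice, e.g. a pretrained ImageNet network without softmax.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(3),
])

# LRP-alpha1-beta0 analyzer; `analyze` returns one relevance map per input sample.
analyzer = innvestigate.create_analyzer("lrp.alpha_1_beta_0", model)
x = np.random.rand(4, 8).astype("float32")
relevance = analyzer.analyze(x)
print(relevance.shape)  # (4, 8)
```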
35.
35/45
DTD for Kernel Models [Kauffmann’18]
1. Build a neural network equivalent of the One-Class SVM (Gaussian/Laplace kernel, Student kernel) whose output is the outlier score.
2. Compute its deep Taylor decomposition.
36.
36/45
DTD: Choosing the Root Point (Revisited)
Choice of root point (Deep Taylor generic):
1. nearest root
2. rescaled excitations → LRP-α1β0 ✔
3. rescaled activation → another rule ✔
37.
37/45
Selecting the Explanation Technique
How to select the best root points?
38.
38/45
Selecting the Explanation Technique
Which rule leads to the best explanation?
39.
39/45
Selecting the Explanation Technique
What axioms should an explanation satisfy?
40.
40/45
Selection based on Axioms
Division by zero → scores explode.
LRP-α1β0
41.
41/45
Selection based on Axioms
Discontinuous step function for b_j = 0.
LRP-α1β0
42.
42/45
Explainable ML: Challenges
Validating explanations:
● Underlying mathematical framework
● Axioms of an explanation
● Perturbation analysis [Samek’17]
● Similarity to ground truth
● Human perception
43.
43/45
Explainable ML: Opportunities
Using explanations:
● Human interaction
● Designing better ML algorithms?
● Extracting new domain knowledge
● Detecting unexpected ML behavior
● Finding weaknesses of a dataset
44.
44/45
Check our webpage
with interactive demos, software, tutorials, ...
and our tutorial paper:
Montavon, G., Samek, W., Müller, K.-R. Methods for Interpreting and Understanding Deep Neural Networks. Digital Signal Processing, 2018
45.
45/45
References
● [Alber’18] Alber, M., Lapuschkin, S., Seegerer, P., Hägele, M., Schütt, K., Montavon, G., Samek, W., Müller, K.-R., Dähne, S., Kindermans, P.-J. iNNvestigate neural networks. CoRR abs/1808.04260, 2018
● [Bach’15] Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., Samek, W. On pixel-wise explanations for nonlinear classifier decisions by layer-wise relevance propagation. PLOS ONE 10 (7), 2015
● [Binder’18] Binder, A., Bockmayr, M., …, Müller, K.-R., Klauschen, F. Towards computational fluorescence microscopy: Machine learning-based integrated prediction of morphological and molecular tumor profiles. CoRR abs/1805.11178, 2018
● [Bojarski’17] Bojarski, M., Yeres, P., Choromanska, A., Choromanski, K., Firner, B., Jackel, L. D., Muller, U. Explaining how a deep neural network trained with end-to-end learning steers a car. CoRR abs/1704.07911, 2017
● [Kauffmann’18] Kauffmann, J., Müller, K.-R., Montavon, G. Towards explaining anomalies: A deep Taylor decomposition of one-class models. CoRR abs/1805.06230, 2018
● [Lapuschkin’16] Lapuschkin, S., Binder, A., Montavon, G., Müller, K.-R., Samek, W. Analyzing classifiers: Fisher vectors and deep neural networks. CVPR 2016
● [Montavon’17] Montavon, G., Lapuschkin, S., Binder, A., Samek, W., Müller, K.-R. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition 65, 211–222, 2017
● [Samek’17] Samek, W., Binder, A., Montavon, G., Lapuschkin, S., Müller, K.-R. Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems, 2017
● [Schütt’17] Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K.-R., Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nature Communications 8, 13890, 2017
● [Simonyan’13] Simonyan, K., Vedaldi, A., Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv, 2013