Grégoire Montavon (TU Berlin) at the International Workshop on Machine Learning and Artificial Intelligence at Télécom ParisTech


- 1/45 Explaining Machine Learning Decisions Grégoire Montavon, TU Berlin Joint work with: Wojciech Samek, Klaus-Robert Müller, Sebastian Lapuschkin, Alexander Binder 18/09/2018 Intl. Workshop ML & AI, Telecom ParisTech
- 2/45 From ML Successes to Applications Autonomous Driving Medical Diagnosis Networks (smart grids, etc.) Visual Reasoning AlphaGo beats Go human champ Deep Net outperforms humans in image classification
- 3/45 Can we interpret what a ML model has learned?
- 4/45 First, we need to define what we want from interpretable ML.
- 5/45 Understanding Deep Nets: Two Views Understanding what mechanism the network uses to solve a problem or implement a function. Understanding how the network relates the input to the output variables.
- 6/45 interpreting predicted classes explaining individual decisions
- 7/45 Interpreting Predicted Classes Image from Simonyan’13 Example: “What does a goose typically look like according to the neural network?” goose non-goose
- 8/45 Explaining Individual Decisions Images from Lapuschkin’16 Example: “Why is a given image classified as a sheep?” sheep non-sheep
- 9/45 Example: Autonomous Driving [Bojarski’17] Input: Decision Bojarski et al. 2017 “Explaining How a Deep Neural Network Trained with End-to- End Learning Steers a Car” PilotNet Explanation:
- 10/45 Example: Pascal VOC Classification [Lapuschkin’16] Comparing Performance on Pascal VOC 2007 (Fisher Vector Classifier vs. DeepNet pretrained on ImageNet) Fisher classifier (pretrained) deep net Lapuschkin et al. 2016. Analyzing Classifiers: Fisher Vectors and Deep Neural Networks
- 11/45 Example: Pascal VOC Classification [Lapuschkin’16] Lapuschkin et al. 2016. Analyzing Classifiers: Fisher Vectors and Deep Neural Networks
- 12/45 Example: Medical Diagnosis [Binder’18] A: Invasive breast cancer, H&E stain; B: Normal mammary glands and fibrous tissue, H&E stain; C: Diffuse carcinoma infiltrate in fibrous tissue, Hematoxylin stain Binder et al. 2018 “Towards computational fluorescence microscopy: Machine learning- based integrated prediction of morphological and molecular tumor profiles”
- 13/45 Example: Quantum Chemistry [Schütt’17] DFT calculation of the stationary Schrödinger Equation DTNN, Schütt’17 molecular structure (e.g. atom positions) molecular electronic properties (e.g. atomization energy) PBE0, Perdew’86 interpretable insight Schütt et al. 2017: Quantum-Chemical Insights from Deep Tensor Neural Networks
- 14/45 Examples of Explanation Methods
- 15/45 Explaining by Decomposing Importance of a variable is the share of the function score that can be attributed to it. Decomposition property: input DNN
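The decomposition property stated above can be written compactly: each input variable xᵢ receives a relevance score Rᵢ, and the scores sum to the function output,

```latex
f(x) = \sum_{i=1}^{d} R_i
```

so the importance of a variable is exactly its share of the prediction score.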
- 16/45 Explaining Linear Models A simple method:
- 17/45 Explaining Linear Models Taylor decomposition approach: Insight: the explanation depends on the root point.
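For a linear model f(x) = w·x + b, a first-order Taylor expansion at a root point x̃ (a point where f(x̃) = 0) gives the attribution Rᵢ = wᵢ(xᵢ − x̃ᵢ), which sums exactly to f(x). A minimal sketch, with illustrative weights and a root point found along the segment from the origin to x (one possible choice of root):

```python
import numpy as np

# Illustrative linear model f(x) = w.x + b (values are not from the talk).
w = np.array([2.0, -1.0, 0.5])
b = -1.0
x = np.array([1.0, 2.0, 4.0])

f = w @ x + b                      # the prediction to be explained

# Simple method: attribute w_i * x_i to each input variable.
R_simple = w * x

# Taylor decomposition: expand f around a root point x_root with f(x_root) = 0.
# Searching along the segment t*x: f(t*x) = t*(w@x) + b = 0  =>  t = -b/(w@x).
t = -b / (w @ x)
x_root = t * x
R_taylor = w * (x - x_root)        # R_i = w_i * (x_i - x_root_i)

assert np.isclose(w @ x_root + b, 0.0)   # x_root is indeed a root
assert np.isclose(R_taylor.sum(), f)     # decomposition property holds exactly
```

A different root point yields a different (equally valid) decomposition, which is the "insight" on this slide: the explanation depends on the root point.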
- 18/45 Explaining Nonlinear Models second-order terms are hard to interpret and can be very large
- 19/45 Explaining Nonlinear Models by Propagation Layer-Wise Relevance Propagation (LRP) [Bach’15]
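In generic form (following [Bach’15]; here z_{ij} denotes the contribution of lower-layer neuron i to neuron j, e.g. z_{ij} = a_i w_{ij}), an LRP rule redistributes the relevance R_j of each neuron onto its inputs in proportion to their contributions:

```latex
R_i = \sum_j \frac{z_{ij}}{\sum_{i'} z_{i'j}}\, R_j ,
\qquad \text{with conservation} \quad \sum_i R_i = \sum_j R_j .
```

Applying this rule layer by layer from the output back to the input produces a pixel-wise decomposition of the prediction.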
- 20/45 Explaining Nonlinear Models by Propagation Is there an underlying mathematical framework?
- 21/45 Deep Taylor Decomposition (DTD) [Montavon’17] Question: Suppose that we have propagated LRP scores (“relevance”) until a given layer. How should it be propagated one layer further? Key idea: Let’s use Taylor expansions for this.
- 22/45 DTD Step 1: The Structure of Relevance Observation: Relevance at each layer is a product of the activation and an approximately constant term.
- 23/45 DTD Step 1: The Structure of Relevance
- 24/45 DTD Step 2: Taylor Expansion
- 25/45 DTD Step 2: Taylor Expansion Taylor expansion at root point: Relevance can now be backward propagated
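Following [Montavon’17], if R_j is the relevance of neuron j viewed as a function of the lower-layer activations {a_i}, a first-order Taylor expansion at a root point {ã_i} (chosen so that R_j(ã) ≈ 0) produces messages that are summed at each lower-layer neuron:

```latex
R_j(a) \approx \sum_i
\underbrace{\left.\frac{\partial R_j}{\partial a_i}\right|_{\tilde a} (a_i - \tilde a_i)}_{R_{i \leftarrow j}} ,
\qquad
R_i = \sum_j R_{i \leftarrow j} .
```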
- 26/45 DTD Step 3: Choosing the Root Point (same as LRP-α1 β0 ) (Deep Taylor generic) ✔ 1. nearest root 2. rescaled excitations Choice of root point
- 27/45 DTD: Choosing the Root Point (Deep Taylor generic) Pixels domain Choice of root point
- 28/45 DTD: Choosing the Root Point (Deep Taylor generic) Choice of root point Embedding: image source: Tensorflow tutorial
- 29/45 DTD: Application to Pooling Layers A sum-pooling layer over positive activations is equivalent to a ReLU layer with weights 1. A p-norm pooling layer can be approximated as a sum-pooling layer multiplied by a ratio of norms that we treat as constant [Montavon’17]. → Treat pooling layers as ReLU detection layers
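The two equivalences on this slide can be checked numerically. A small sketch with illustrative activation values:

```python
import numpy as np

a = np.array([0.3, 1.2, 0.0, 2.5])   # positive (post-ReLU) activations

# A sum-pooling layer over positive activations ...
pooled = a.sum()

# ... is equivalent to a ReLU neuron with all weights equal to 1 and no bias:
relu_out = max(0.0, np.ones_like(a) @ a)
assert np.isclose(pooled, relu_out)

# A p-norm pooling layer equals sum-pooling times a ratio of norms, which
# deep Taylor decomposition treats as a constant [Montavon'17]:
p = 2
pnorm = (a ** p).sum() ** (1.0 / p)
ratio = pnorm / a.sum()              # treated as constant during propagation
assert np.isclose(ratio * pooled, pnorm)
```

This is why pooling layers can simply be treated as ReLU detection layers during relevance propagation.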
- 30/45 Deep Taylor Decomposition on ConvNets LRP-α1 β0 DTD for pixels LRP-α1 β0 LRP-α1 β0 LRP-α1 β0 LRP-α1 β0 LRP-α1 β0 * * For top layers, other rules may improve selectivity forward pass backward pass
- 31/45 Implementing Propagation Rules Sequence of element-wise computations Sequence of vector computations Example: LRP-α1 β0 :
- 32/45 Implementing Propagation Rules Example: LRP-α1 β0 : Code that reuses forward and gradient computations:
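The four-step pattern alluded to on these slides (forward pass, element-wise division, backward/gradient-like pass, element-wise product) can be sketched for LRP-α1β0 on a single dense ReLU layer; in a real framework, steps 1 and 3 would reuse the library's forward and gradient computations:

```python
import numpy as np

def lrp_alpha1_beta0(a, W, R):
    """One backward LRP-alpha1beta0 step through a dense ReLU layer.

    a : activations entering the layer, shape (d_in,)  (assumed >= 0)
    W : weight matrix, shape (d_in, d_out)
    R : relevance of the layer's outputs, shape (d_out,)
    """
    Wp = np.maximum(W, 0.0)          # keep only positive ("excitatory") weights
    z = a @ Wp + 1e-9                # step 1: forward pass with positive weights
    s = R / z                        # step 2: element-wise division
    c = s @ Wp.T                     # step 3: backward pass (gradient-like step)
    return a * c                     # step 4: redistribute onto the inputs

rng = np.random.default_rng(0)
a = rng.random(5) + 0.1              # strictly positive activations
W = rng.normal(size=(5, 3))
W[0, :] = np.abs(W[0, :])            # ensure each output sees a positive weight
R = rng.random(3)

R_prev = lrp_alpha1_beta0(a, W, R)
assert np.isclose(R_prev.sum(), R.sum())   # relevance is conserved
```

Only matrix products and element-wise operations are involved, which is why the rule scales to large models at roughly the cost of one extra backward pass per layer.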
- 33/45 How Deep Taylor / LRP Scales
- 34/45 Implementation on Large-Scale Models [Alber’18] https://github.com/albermax/innvestigate
- 35/45 DTD for Kernel Models [Kauffmann’18] 1. Build a neural network equivalent of the One-Class SVM: Gaussian/Laplace Kernel Student Kernel 2. Compute its deep Taylor decomposition Outlier score
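The "neuralization" step can be sketched for the Gaussian-kernel case: the kernel expansion becomes a distance layer followed by a log-sum-exp (soft min) pooling, giving a two-layer network whose output is the outlier score. Support vectors `U`, coefficients `alpha`, and `gamma` below are illustrative placeholders, not values from [Kauffmann’18]:

```python
import numpy as np

# Hypothetical support vectors and coefficients of a trained
# Gaussian-kernel one-class SVM (illustrative values, not learned).
U = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
alpha = np.array([0.5, 0.3, 0.2])
gamma = 1.0

def outlier_score(x):
    # Kernel expansion: f(x) = sum_k alpha_k * exp(-gamma * ||x - u_k||^2).
    d2 = ((x - U) ** 2).sum(axis=1)          # layer 1: squared distances
    # o(x) = -log f(x) is a soft min-pooling over h_k = gamma*d2_k - log(alpha_k):
    h = gamma * d2 - np.log(alpha)
    return -np.log(np.exp(-h).sum())         # layer 2: log-sum-exp (soft min)

# Points far from all support vectors receive a larger outlier score:
assert outlier_score(np.array([5.0, 5.0])) > outlier_score(np.array([0.5, 0.5]))
```

Once the model is written in this layered form, deep Taylor decomposition can be applied to it like to any other network, attributing the outlier score to input features.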
- 36/45 DTD: Choosing the Root Point (Revisited) ✔ ✔ 1. nearest root 3. rescaled activation 2. rescaled excitations Choice of root point 2. LRP-α1 β0 (Deep Taylor generic) 3. Another rule
- 37/45 Selecting the Explanation Technique How to select the best root points?
- 38/45 Selecting the Explanation Technique Which rule leads to the best explanation?
- 39/45 Selecting the Explanation Technique What axioms should an explanation satisfy?
- 40/45 Selection based on Axioms division by zero → scores explode. LRP-α1 β0
- 41/45 Selection based on Axioms discontinuous step function for bj = 0 LRP-α1 β0
- 42/45 Explainable ML: Challenges Validating Explanations Underlying mathematical framework Axioms of an explanation Perturbation analysis [Samek’17] Similarity to ground truth Human perception
- 43/45 Explainable ML: Opportunities Using Explanations Human interaction Designing better ML algorithms? Extracting new domain knowledge Detecting unexpected ML behavior Finding weaknesses of a dataset
- 44/45 Check our webpage with interactive demos, software, tutorials, ... and our tutorial paper: Montavon, G., Samek, W., Müller, K.-R. Methods for Interpreting and Understanding Deep Neural Networks, Digital Signal Processing, 2018
- 45/45 References
● [Alber’18] Alber, M., Lapuschkin, S., Seegerer, P., Hägele, M., Schütt, K., Montavon, G., Samek, W., Müller, K.-R., Dähne, S., Kindermans, P.-J. iNNvestigate neural networks. CoRR abs/1808.04260, 2018
● [Bach’15] Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., Samek, W. On pixel-wise explanations for nonlinear classifier decisions by layer-wise relevance propagation. PLOS ONE 10 (7), 2015
● [Binder’18] Binder, A., Bockmayr, M., …, Müller, K.-R., Klauschen, F. Towards computational fluorescence microscopy: Machine learning-based integrated prediction of morphological and molecular tumor profiles. CoRR abs/1805.11178, 2018
● [Bojarski’17] Bojarski, M., Yeres, P., Choromanska, A., Choromanski, K., Firner, B., Jackel, L. D., Muller, U. Explaining how a deep neural network trained with end-to-end learning steers a car. CoRR abs/1704.07911, 2017
● [Kauffmann’18] Kauffmann, J., Müller, K.-R., Montavon, G. Towards explaining anomalies: A deep Taylor decomposition of one-class models. CoRR abs/1805.06230, 2018
● [Lapuschkin’16] Lapuschkin, S., Binder, A., Montavon, G., Müller, K.-R., Samek, W. Analyzing classifiers: Fisher vectors and deep neural networks. CVPR 2016
● [Montavon’17] Montavon, G., Lapuschkin, S., Binder, A., Samek, W., Müller, K.-R. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition 65, 211–222, 2017
● [Samek’17] Samek, W., Binder, A., Montavon, G., Lapuschkin, S., Müller, K.-R. Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems, 2017
● [Schütt’17] Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K.-R., Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nature Communications 8, 13890, 2017
● [Simonyan’13] Simonyan, K., Vedaldi, A., Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv 2013
