Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Machine Learning Models: Explaining their Decisions, and Validating the Explanations

Gregoire Montavon (TU Berlin) at the International Workshop Machine Learning and Artificial intelligence at Télécom ParisTech

  • Be the first to comment

  • Be the first to like this

Machine Learning Models: Explaining their Decisions, and Validating the Explanations

  1. 1. 1/45 Explaining Machine Learning Decisions Grégoire Montavon, TU Berlin Joint work with: Wojciech Samek, Klaus-Robert Müller, Sebastian Lapuschkin, Alexander Binder 18/09/2018 Intl. Workshop ML & AI, Telecom ParisTech
  2. 2. 2/45 From ML Successes to Applications Autonomous Driving Medical Diagnosis Networks (smart grids, etc.) Visual Reasoning AlphaGo beats Go human champ Deep Net outperforms humans in image classification
  3. 3. 3/45 Can we interpret what a ML model has learned?
  4. 4. 4/45 First, we need to define what we want from interpretable ML.
  5. 5. 5/45 Understanding Deep Nets: Two Views Understanding what mechanism the network uses to solve a problem or implement a function. Understanding how the network relates the input to the output variables.
  6. 6. 6/45 interpreting predicted classes explaining individual decisions
  7. 7. Interpreting Predicted Classes Image from Symonian’13 Example: “How does a goose typically look like according to the neural network?” goose non-goose
  8. 8. 8/45 Explaining Individual Decisions Images from Lapuschkin’16 Example: “Why is a given image classified as a sheep?” sheep non-sheep
  9. 9. 9/45 Example: Autonomous Driving [Bojarski’17] Input: Decision Bojarski et al. 2017 “Explaining How a Deep Neural Network Trained with End-to- End Learning Steers a Car” PilotNet Explanation:
  10. 10. 10/45 Example: Pascal VOC Classification [Lapuschkin’16] Comparing Performance on Pascal VOC 2007 (Fisher Vector Classifier vs. DeepNet pretrained on ImageNet) Fisher classifier (pretrained) deep net Lapuschkin et al. 2016. Analyzing Classifiers: Fisher Vectors and Deep Neural Networks
  11. 11. 11/45 Example: Pascal VOC Classification [Lapuschkin’16] Lapuschkin et al. 2016. Analyzing Classifiers: Fisher Vectors and Deep Neural Networks
  12. 12. 12/45 Example: Medical Diagnosis [Binder’18] A: Invasive breast cancer, H&E stain; B: Normal mammary glands and fibrous tissue, H&E stain; C: Diffuse carcinoma infiltrate in fibrous tissue, Hematoxylin stain Binder et al. 2018 “Towards computational fluorescence microscopy: Machine learning- based integrated prediction of morphological and molecular tumor profiles”
  13. 13. 13/45 Example: Quantum Chemistry [Schütt’17] DFT calculation of the stationary Schrödinger Equation DTNN, Schütt’17 molecular structure (e.g. atoms positions) molecular electronic properties (e.g. atomization energy) PBE0, Pedrew’86 interpretable insight Schütt et al. 2017: Quantum-Chemical Insights from Deep Tensor Neural Networks
  14. 14. 14/45 Examples of Explanation Methods
  15. 15. 15/45 Explaining by Decomposing Importance of a variable is the share of the function score that can be attributed to it. Decomposition property: input DNN
  16. 16. 16/45 Explaning Linear Models A simple method:
  17. 17. 17/45 Explaning Linear Models Taylor decomposition approach: Insight: explanation depends on the root point.
  18. 18. 18/45 Explaining Nonlinear Models second-order terms are hard to interpret and can be very large
  19. 19. 19/45 Explaining Nonlinear Models by Propagation Layer-Wise Relevance Propagation (LRP) [Bach’15]
  20. 20. 20/45 Explaining Nonlinear Models by Propagation Is there an underlying mathematical framework?
  21. 21. 21/45 Deep Taylor Decomposition (DTD) [Montavon’17] Question: Suppose that we have propagated LRP scores (“relevance”) until a given layer. How should it be propagated one layer further? Key idea: Let’s use Taylor expansions for this.
  22. 22. 22/45 DTD Step 1: The Structure of Relevance Observation: Relevance at each layer is a product of the activation and an approximately constant term.
  23. 23. 23/45 DTD Step 1: The Stucture of Relevance
  24. 24. 24/45 DTD Step 2: Taylor Expansion
  25. 25. 25/45 DTD Step 2: Taylor Expansion Taylor expansion at root point: Relevance can now be backward propagated
  26. 26. 26/45 DTD Step 3: Choosing the Root Point (same as LRP-α1 β0 ) (Deep Taylor generic) ✔ 1. nearest root 2. rescaled excitations Choice of root point
  27. 27. 27/45 DTD: Choosing the Root Point (Deep Taylor generic) Pixels domain Choice of root point
  28. 28. 28/45 DTD: Choosing the Root Point (Deep Taylor generic) Choice of root point Embedding: image source: Tensorflow tutorial
  29. 29. 29/45 DTD: Application to Pooling Layers A sum-pooling layer over positive activations is equivalent to a ReLU layer with weights 1. A p-norm pooling layer can be approximated as a sum-pooling layer multiplied by a ratio of norms that we treat as constant [Montavon’17]. → Treat pooling layers as ReLU detection layers
  30. 30. 30/45 Deep Taylor Decomposition on ConvNets LRP-α1 β0 DTDfor pixels LRP-α1 β0 LRP-α1 β0 LRP-α1 β0 LRP-α1 β0 LRP-α1 β0 * * For top-layers, other rules may improve selectivity forward pass backward pass
  31. 31. 31/45 Implementing Propagation Rules Sequence of element-wise computations Sequence of vector computations Example: LRP-α1 β0 :
  32. 32. 32/45 Implementing Propagation Rules Example: LRP-α1 β0 : Code that reuses forward and gradient computations:
  33. 33. 33/45 How Deep Taylor / LRP Scales
  34. 34. 34/45 Implementation on Large-Scale Models [Alber’18]
  35. 35. 35/45 DTD for Kernel Models [Kauffmann’18] 1. Build a neural network equivalent of the One-Class SVM: Gaussian/Laplace Kernel Student Kernel 2. Computes its deep Taylor decomposition Outlier score
  36. 36. 36/45 DTD: Choosing the Root Point (Revisited) ✔ ✔ 1. nearest root 3. rescaled activation 2. rescaled excitations Choice of root point 2. LRP-α1 β0 (Deep Taylor generic) 3. Another rule
  37. 37. 37/45 Selecting the Explanation Technique How to select the best root points?
  38. 38. 38/45 Selecting the Explanation Technique Which rule leads to the best explanation?
  39. 39. 39/45 Selecting the Explanation Technique What axioms should an explanation satisfy?
  40. 40. 40/45 Selection based on Axioms division by zero → scores explode. LRP-α1 β0
  41. 41. 41/45 Selection based on Axioms discontinuous step function for bj = 0 LRP-α1 β0
  42. 42. 42/45 Explainable ML: Challenges Validating Explanations Underlying mathematical framework Axioms of an explanation Perturbation analysis [Samek’17] Similarity to ground truth Human perception
  43. 43. 43/45 Explainable ML: Opportunities Using Explanations Human interaction Designing better ML algorithms? Extracting new domain knowledge Detecting unexpected ML behavior Finding weaknesses of a dataset
  44. 44. 44/45 Check our webpage with interactive demos, software, tutorials, ... and our tutorial paper: Montavon, G., Samek, W., Müller, K.-R. Methods for Interpreting and Understanding Deep Neural Networks, Digital Signal Processing, 2018
  45. 45. 45/45 References ● [Alber’18] Alber, M. Lapuschkin, S., Seegerer, P., Hägele, M., Schütt, K., Montavon, G., Samek, W., Müller, K.-R., Dähne, S., Kindermans. P.-J. iNNvestigate neural networks. CoRR abs/1808.04260, 2018 ● [Bach’15] Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., Samek, W. On pixel-wise explanations for nonlinear classifier decisions by layer-wise relevance propagation. PLOS ONE 10 (7), 2015 ● [Binder’18] Binder, A., Bockmayr, M., …, Müller, K.-R., Klauschen, F. Towards computational fluorescence microscopy: Machine learning-based integrated prediction of morphological and molecular tumor profiles CoRR abs/1805.11178, 2018 ● [Bojarski’17] Bojarski, M., Yeres, P., Choromanska, A., Choromanski, K., Firner, B., Jackel, L. D., Muller, U. Explaining how a deep neural network trained with end-to-end learning steers a car. CoRR abs/1704.07911, 2017 ● [Kauffmann’18] Kauffmann, J. Müller, K.-R., Montavon, G., Towards Explaining Anomalies: A Deep Taylor Decomposition of One-Class Models CoRR abs/1805.06230, 2018 ● [Lapuschkin’16] Lapuschkin, S., Binder, A., Montavon, G., Müller, K.-R., Samek, W. Analyzing classifiers: Fisher vectors and deep neural networks. CVPR 2016 ● [Montavon’17] Montavon, G., Lapuschkin, S., Binder, A., Samek, W., Müller, K.-R. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition 65, 211–222, 2017 ● [Samek’17] Samek, W., Binder, A., Montavon, G., Lapuschkin, S., Müller, K.-R. Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems, 2017 ● [Schütt’17] Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K.-R., Tkatchenko, A. Quantum-Chemical Insights from Deep Tensor Neural Networks, Nature Communications 8, 13890, 2017 ● [Symonian’13] Symonian, K. Vedaldi, A. Zisserman, A. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. ArXiv 2013