Ideas on Machine Learning Interpretability
Patrick Hall, Wen Phan, SriSatish Ambati and the H2O.ai team
February 2017
Contents
Part 1: Seeing your data
• Glyphs
• Correlation graphs
• 2-D projections
• Partial dependence plots
• Residual analysis
Part 2: Using machine learning in regulated industry
• OLS regression alternatives
• Build toward ML model benchmarks
• ML in traditional analytics processes
• Small, interpretable ensembles
• Monotonicity constraints
Part 3: Understanding complex ML models
• Surrogate models
• LIME
• Maximum activation analysis
• Sensitivity analysis
• Variable importance measures
• TreeInterpreter
A framework for interpretability
Complexity of learned functions:
• Linear, monotonic
• Nonlinear, monotonic
• Nonlinear, non-monotonic
Enhancing trust and understanding:
the mechanisms and results of an
interpretable model should be both
transparent AND dependable.
Scope of interpretability:
Global vs. local
Application domain:
Model-agnostic vs. model-specific
Part 1: Seeing your data
Glyphs
Variables and their values can be
represented by small pictures, called
"glyphs," whose visual attributes
encode the data. Here the four
variables are identified by their
position within a square and their
values are encoded by color.
(Figure: glyph legend. os_type: iPhone, OSX, Win, Android, Linux; os_version: newest to oldest; agent_type: Opera, Safari, IE, Firefox, Others, bot; agent_version: newest to oldest.)
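A minimal sketch of how such glyphs might be drawn with matplotlib; the records, integer codes, color map, and cell layout below are illustrative assumptions, not taken from the deck.

```python
# Each record becomes a 2x2 square whose cells are os_type, os_version,
# agent_type, agent_version, and each cell's color encodes that value.
import numpy as np
import matplotlib.pyplot as plt

records = [
    {"os_type": 0, "os_version": 2, "agent_type": 1, "agent_version": 0},
    {"os_type": 3, "os_version": 0, "agent_type": 4, "agent_version": 2},
]

fig, axes = plt.subplots(1, len(records), figsize=(2 * len(records), 2))
for ax, rec in zip(axes, records):
    # fixed positions in the square: top row = OS, bottom row = agent
    grid = np.array([[rec["os_type"], rec["os_version"]],
                     [rec["agent_type"], rec["agent_version"]]])
    ax.imshow(grid, cmap="tab10", vmin=0, vmax=9)  # color encodes the value
    ax.set_xticks([]); ax.set_yticks([])
plt.show()
```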
Correlation graphs
The nodes of this graph are the
variables in a data set, and the
weight of the edge between two
nodes is the absolute value of
their pairwise Pearson
correlation.
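A minimal sketch of building such a graph, assuming pandas for the correlations and networkx for the layout; the California housing data, the 0.1 edge threshold, and the edge-width scaling are illustrative choices.

```python
import matplotlib.pyplot as plt
import networkx as nx
from sklearn.datasets import fetch_california_housing

# any data set with numeric columns works the same way
df = fetch_california_housing(as_frame=True).frame
corr = df.corr(method="pearson").abs()      # pairwise |Pearson correlation|

G = nx.Graph()
G.add_nodes_from(corr.columns)
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if corr.loc[a, b] > 0.1:            # drop near-zero edges for readability
            G.add_edge(a, b, weight=corr.loc[a, b])

pos = nx.spring_layout(G, weight="weight", seed=0)
widths = [3 * G[u][v]["weight"] for u, v in G.edges()]
nx.draw_networkx(G, pos, width=widths, node_color="lightgray")
plt.show()
```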
2-D projections
784 dimensions to 2 dimensions with PCA
784 dimensions to 2 dimensions with an autoencoder network
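A minimal sketch of the PCA half of this comparison, using scikit-learn's 64-pixel digits data in place of the 784-pixel MNIST images (an illustrative substitution); the autoencoder projection would require a deep learning library such as H2O or Keras.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)          # 64-dimensional pixel vectors
Z = PCA(n_components=2).fit_transform(X)     # linear projection to 2 dimensions

plt.scatter(Z[:, 0], Z[:, 1], c=y, cmap="tab10", s=8)
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.colorbar(label="digit")
plt.show()
```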
Partial dependence plots
Source: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
HomeValue ~ MedInc + AveOccup + HouseAge + AveRooms
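A minimal sketch of one-dimensional partial dependence for the formula above, assuming scikit-learn's California housing data and the newer PartialDependenceDisplay API; the gradient boosting regressor is an illustrative stand-in, since the deck does not name the model.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target                 # target is median home value

model = GradientBoostingRegressor().fit(X, y)
# average model response as each variable varies, holding the others at data values
PartialDependenceDisplay.from_estimator(
    model, X, features=["MedInc", "AveOccup", "HouseAge", "AveRooms"])
plt.show()
```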
Residual analysis
Residuals from a machine
learning model should be
randomly distributed;
obvious patterns in the residuals
can indicate problems with
data preparation or model
specification.
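A minimal sketch of a residual plot; the random forest and the California housing data are illustrative placeholders for whatever model is being checked.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
resid = y_te - pred

plt.scatter(pred, resid, s=5, alpha=0.5)      # look for structure, not noise
plt.axhline(0, color="red")
plt.xlabel("Predicted"); plt.ylabel("Residual")
plt.show()
```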
Part 2: Using ML in regulated industry
OLS regression alternatives
• Penalized regression
• Generalized additive models
• Quantile regression
Source: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
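A minimal sketch of the first alternative, penalized (elastic net) regression with scikit-learn; GAMs and quantile regression would need other tooling (e.g. pygam or statsmodels), and the data set here is illustrative.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = fetch_california_housing(return_X_y=True)

# L1 + L2 penalties shrink coefficients and zero out weak ones,
# keeping the model linear and readable while limiting overfitting
model = make_pipeline(StandardScaler(),
                      ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5))
model.fit(X, y)
print(model[-1].coef_)
```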
Build toward ML model benchmarks
Incorporate interactions and piecewise linear components to increase the accuracy of linear models
relative to machine learning models.
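A minimal sketch of the idea: add an interaction term and a piecewise linear (hinge) term to a linear model, then compare it against a machine learning benchmark. The specific terms, the 5.0 threshold, and the models are illustrative.

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

data = fetch_california_housing(as_frame=True)
X, y = data.data.copy(), data.target

# hand-crafted interaction and hinge terms keep the model linear in its coefficients
X["MedInc_x_AveRooms"] = X["MedInc"] * X["AveRooms"]
X["MedInc_over_5"] = np.maximum(X["MedInc"] - 5.0, 0.0)

lin = cross_val_score(LinearRegression(), X, y, cv=5).mean()
gbm = cross_val_score(GradientBoostingRegressor(), X, y, cv=5).mean()
print(f"augmented linear R^2: {lin:.3f}  vs  GBM benchmark: {gbm:.3f}")
```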
Machine learning in traditional analytics processes
Small, interpretable ensembles
Monotonicity constraints
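A minimal sketch of one way to impose monotonicity, assuming XGBoost's monotone_constraints parameter (+1 increasing, -1 decreasing, 0 unconstrained); the data set and the constrained feature are illustrative.

```python
import xgboost as xgb
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target

# force predictions to be non-decreasing in MedInc, unconstrained elsewhere
constraints = "(" + ",".join("1" if c == "MedInc" else "0" for c in X.columns) + ")"
model = xgb.XGBRegressor(monotone_constraints=constraints)
model.fit(X, y)
```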
Part 3: Understanding complex machine learning models
Surrogate models
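A minimal sketch of a global surrogate: fit a shallow, interpretable model to the predictions of a complex model. The gradient boosting model, tree depth, and data set are illustrative assumptions.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

data = fetch_california_housing()
X, y = data.data, data.target
complex_model = GradientBoostingRegressor().fit(X, y)

# the surrogate learns the complex model's predictions, not the original labels
surrogate = DecisionTreeRegressor(max_depth=3)
surrogate.fit(X, complex_model.predict(X))
print(export_text(surrogate, feature_names=list(data.feature_names)))
```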
Local interpretable model-agnostic explanations (LIME)
Source: https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime
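A minimal sketch using the open-source lime package the article above describes; the random forest, the data set, and the explained row are illustrative.

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

data = fetch_california_housing(as_frame=True)
X, y = data.data.values, data.target.values
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(X, feature_names=list(data.data.columns),
                                 mode="regression")
# fit a local linear model around one row and report its largest coefficients
exp = explainer.explain_instance(X[0], model.predict, num_features=5)
print(exp.as_list())
```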
Maximum activation analysis
Sensitivity analysis
Data distributions shift over time. How will your model handle these shifts?
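A minimal sketch of a simple sensitivity check: simulate a shift in one variable and watch how the predictions move. The shifted feature, the 10% drift, the data set, and the model are illustrative.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor

data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target
model = GradientBoostingRegressor().fit(X, y)

X_shift = X.copy()
X_shift["MedInc"] *= 1.10                     # e.g. incomes drift upward by 10%
delta = model.predict(X_shift) - model.predict(X)
print(f"mean prediction change: {delta.mean():+.3f}")
```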
Variable importance measures
Global variable
importance indicates the
impact of a variable on
the model for the entire
training data set.
Local variable
importance can indicate
the impact of a variable
for each decision a model
makes – similar to reason
codes.
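A minimal sketch of one global measure, permutation importance from scikit-learn; the deck does not prescribe an estimator, so the random forest and the data set here are illustrative.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# importance = drop in score when a column's values are shuffled
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
for name, imp in sorted(zip(X.columns, result.importances_mean),
                        key=lambda t: -t[1]):
    print(f"{name:>12s}  {imp:.3f}")
```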
TreeInterpreter
Source: http://blog.datadive.net/interpreting-random-forests/
TreeInterpreter decomposes
decision tree and random forest
predictions into a bias term (the
overall training average) plus one
contribution term per feature.
This slide portrays the
decomposition of the decision
path into bias and individual
contributions for a single
decision tree; treeinterpreter
itself simply prints a ranked list
of the bias and the individual
feature contributions for a given
prediction.
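A minimal sketch using the treeinterpreter package from the blog post above; the random forest and the data set are illustrative. Each prediction equals the bias plus the sum of the feature contributions.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from treeinterpreter import treeinterpreter as ti

data = fetch_california_housing(as_frame=True)
X, y = data.data.values, data.target.values
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# decompose one prediction into bias + per-feature contributions
prediction, bias, contributions = ti.predict(rf, X[:1])
print("bias (training average):", bias[0])
for name, c in sorted(zip(data.data.columns, contributions[0]),
                      key=lambda t: -abs(t[1])):
    print(f"{name:>12s}  {c:+.3f}")
```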
