Machine Learning Interpretation Lab using Aquarium
Confidential2
Lab Session Initiation
Link to Aquarium: http://aquarium.h2o.ai
License key: bit.ly/diveintoh2o
LOGGING IN TO THE LAB
Info
Pramit Choudhary
@MaverickPramit
www.linkedin.com/in/pramitc/
pramit.choudhary@h2o.ai
● Lead Data Scientist (ML Scientist) at h2o.ai
● Previously Lead Data Scientist at DataScience.com (acquired by Oracle)
● Exploring better ways to evaluate, extract, and explain the learned decision policies of predictive models at h2o.ai
● For some time, also used machine learning algorithms to help people find love at eHarmony
Machine Learning workflow
Figure: An improved ML workflow showing where MLI fits into a typical predictive-analytics setup
**Reference: Adapted from “Learning From Data” – Professor Yaser Abu-Mostafa
Model Interpretation
Learning from model inference
Driverless AI Components
Automatic
Visualization
Machine Learning
Interpretability
Machine Learning
Experimentation
Project
Management
ML
Recipes
Scoring Pipeline
and Deployment
Feature
Engineering
“Bringing the human into the
loop”
Problems Addressed by Driverless AI
• Supervised Learning
– Regression
– Classification
– Binary
– Multinomial
• Tabular structured data
– Numeric
– Categorical
– Time/Date
– Text
– Missing values are allowed
• Independent and identically distributed (IID) rows
• Time-series
– Single time-series
– Grouped time-series
– Time-series problems with a gap
between training and testing to
account for time to deploy
• NLP
– Categorization
– Sentiment
What is Model Interpretation?
• Ability to explain and present a model in a way that is understandable to humans.
-- Finale Doshi-Velez and Been Kim. “Towards a Rigorous Science of Interpretable Machine Learning.” arXiv preprint, 2017. https://arxiv.org/pdf/1702.08608.pdf
• “Why did you do A?”, “Why DIDN’T you do B?”, “Why CAN’T you do C?”
-- Gilpin, Leilani H., et al. “Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning.” arXiv preprint arXiv:1806.00069 (2018).
• A model’s result is self-descriptive and needs no further explanation; it is expressed in terms of inputs and outputs. “Mapping an abstract concept (e.g., a predicted class) into a domain that the human can make sense of.”
-- Grégoire Montavon et al. “Methods for Interpreting and Understanding Deep Neural Networks”
“A collection of visual and/or interactive artifacts
that provide a user with sufficient description of
the model behavior to accurately perform tasks
like evaluation, trusting, predicting, or improving
the model.”
-- Assistant Professor Sameer Singh, University of California, Irvine
Motives for Model Interpretation
Model Maker (Producer)
1. Helps in defining hypotheses for the problem.
2. Debugging and improving an ML system – e.g., catching mismatched objectives.
3. Exploring and discovering latent or hidden feature interactions (useful for feature engineering/selection and for resolving preconceptions).
4. Understanding model variability to avoid over-fitting.
5. Helps in model comparison.
6. Building domain knowledge about a particular use case.
7. Bringing transparency to decision making, to enable trust and safety – is the ML system making sound decisions?
Model Breaker (Consumer)
1. Ability to share the explanations with consumers of the predictive model.
2. Explain the model/algorithm.
3. Explain the key features driving the KPI.
4. Verify and validate the accountability of ML systems, e.g., causes of false positives in credit scoring or insurance-claim fraud (identifying spurious correlations is easier for humans).
5. Identify blind spots to prevent adversarial attacks or fix dataset errors.
6. Right to explanation: comply with data-protection laws and regulations, e.g., the EU’s GDPR.
How will it help?
Currently:
ML use case = data munging + data scientist expertise + model building with available computation
How can we scale this 10x?
ML use case = data munging + data scientist expertise + model building with available computation
+ Driverless Machine Learning + Machine Learning Interpretation (MLI)
Scope of Interpretation
Global Interpretation
Being able to explain the conditional interaction
between dependent (response) variables and
independent (predictor or explanatory)
variables based on the complete dataset.
Helpful in explaining the context of the decision
classification.
Local Interpretation
Being able to explain the conditional
interaction between dependent (response)
variables and independent (predictor or
explanatory) variables for a single row or a
subset of rows. Helpful in identifying local
trends and intuitions.
Datasets
• Supervised learning problem:
– Titanic dataset: https://www.kaggle.com/c/titanic/data
• Time-series problem:
– Walmart sales dataset: https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/data
Algorithms
1. Native Interpretation
2. Post-hoc Interpretation
Experiment Settings
Dials: Time, Accuracy, Interpretability
• Relative interpretability – higher values favor more interpretable models
• The higher the interpretability setting, the lower the complexity of the engineered features and of the final model(s)
Native Interpretability

| Interpretability | Ensemble Level | Target Transformation | Feature Engineering | Feature Pre-Pruning | Monotonicity Constraints |
|---|---|---|---|---|---|
| 1–3 | <= 3 | | | None | Disabled |
| 4 | <= 3 | Inverse | | None | Disabled |
| 5 | <= 3 | Anscombe | Clustering (ID, distance), Truncated SVD | None | Disabled |
| 6 | <= 2 | Logit, Sigmoid | | Feature selection | Disabled |
| 7 | <= 2 | | Frequency Encoding | Feature selection | Enabled |
| 8 | <= 1 | 4th Root | | Feature selection | Enabled |
| 9 | <= 1 | Square, Square Root | Bulk Interactions (add, subtract, multiply, divide), Weight of Evidence | Feature selection | Enabled |
| 10 | 0 | Identity, Unit Box, Log | Date Decompositions, Number Encoding, Target Encoding, Text (TF-IDF, Frequency) | Feature selection | Enabled |
Good start
Post-hoc Interpretation/Evaluation
Reason Codes using K-LIME
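K-LIME's core idea – cluster the feature space, then fit one interpretable linear surrogate per cluster against the complex model's predictions, so local coefficients act as reason codes – can be sketched as follows. The data, cluster count, and choice of models are illustrative assumptions, not Driverless AI's actual implementation:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] ** 2 + 2 * X[:, 1] + rng.normal(scale=0.1, size=500)

# "Black-box" model whose predictions we want to explain.
model = GradientBoostingRegressor(random_state=0).fit(X, y)
preds = model.predict(X)

# K-LIME sketch: cluster the rows, then fit a linear surrogate per cluster
# on the black-box predictions (not on the original target).
k = 4
clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
surrogates = {}
for c in range(k):
    mask = clusters == c
    surrogates[c] = Ridge(alpha=1.0).fit(X[mask], preds[mask])

# Reason codes for one row: local coefficient * feature value,
# ranked by absolute contribution.
row = X[0]
c = int(clusters[0])
coefs = surrogates[c].coef_
reason_codes = sorted(zip(["f0", "f1", "f2"], coefs * row),
                      key=lambda t: -abs(t[1]))
print(reason_codes[0][0])  # feature contributing most, locally
```

Because each surrogate is linear, its coefficients are directly readable, while clustering keeps each surrogate faithful to only a local region of the model.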
PDP/ICE
p(HouseAge, Avg. Occupants per Household) vs. Avg. House Value: one can observe that once average occupancy > 2, HouseAge does not seem to have much effect on the average house value
Image credit: Patrick Hall, Navdeep Gill, h2o.ai team
• Helps in visually understanding the interaction impact of two independent features in a low-dimensional space
• Helps in understanding the average partial dependence of the target function f(Y|X) on a subset of features by marginalizing over the rest of the features (the complement set of features)
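The marginalization described above can be computed by brute force: fix the feature of interest at each grid value, predict for every row, and average. A minimal sketch on synthetic data (the model and data are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(400, 2))
y = 3 * X[:, 0] + rng.normal(scale=0.05, size=400)  # target depends only on f0

model = RandomForestRegressor(random_state=0).fit(X, y)

def partial_dependence(model, X, feature, grid):
    """Average prediction with `feature` forced to each grid value,
    marginalizing over the remaining (complement) features."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        pd_vals.append(model.predict(Xv).mean())
    return np.array(pd_vals)

grid = np.linspace(-1, 1, 5)
pd_f0 = partial_dependence(model, X, 0, grid)
pd_f1 = partial_dependence(model, X, 1, grid)
# f0 drives the target, so its PD curve should vary far more than f1's.
print(bool(np.ptp(pd_f0) > np.ptp(pd_f1)))  # True
```

Keeping the per-row predictions instead of averaging them would give the ICE curves; the PD curve is just their mean.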
Leave One Covariate Out (LOCO)
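A minimal LOCO sketch: refit the model without each covariate and treat the per-row change in prediction as that covariate's local importance. This refitting variant is one of several LOCO formulations; Driverless AI's exact procedure may differ, and the linear model and data here are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = 4 * X[:, 0] + 1 * X[:, 1] + rng.normal(scale=0.1, size=300)  # f2 is noise

full = LinearRegression().fit(X, y)
base = full.predict(X)

# LOCO: refit without each covariate j; the per-row absolute change in
# prediction is j's local importance.
loco = {}
for j in range(X.shape[1]):
    keep = [c for c in range(X.shape[1]) if c != j]
    reduced = LinearRegression().fit(X[:, keep], y)
    loco[j] = np.abs(base - reduced.predict(X[:, keep]))

mean_imp = {j: v.mean() for j, v in loco.items()}
print(mean_imp[2] < mean_imp[0])  # the noise feature matters least
```

Unlike permutation importance, LOCO produces a full vector of per-row importances, which is why it fits naturally into local interpretation.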
Shapley Values
• The explanation idea is borrowed from coalitional game theory
• The initial idea was proposed by Lloyd S. Shapley (1953)
-- Lloyd S. Shapley, Alvin E. Roth, et al. The Shapley Value: Essays in Honor of Lloyd S. Shapley. Cambridge University Press, 1988. URL: http://www.library.fa.ru/files/Roth2.pdf
• Idea:
– A feature’s contribution is computed by observing the difference between the model’s initial prediction and its average prediction after perturbing the feature space
– One has to be careful with dependent (correlated) features while perturbing
• Defined as the weighted average of a feature’s marginal contributions over all possible coalitions S of the feature set N:
φ_i(v) = Σ_{S ⊆ N\{i}} [|S|! (|N| − |S| − 1)! / |N|!] (v(S ∪ {i}) − v(S))
-- Erik Štrumbelj and Igor Kononenko. “An Efficient Explanation of Individual Classifications Using Game Theory.” Journal of Machine Learning Research, 11(Jan):1–18, 2010. URL: http://www.jmlr.org/papers/volume11/strumbelj10a/strumbelj10a.pdf
-- Scott M. Lundberg and Su-In Lee. “A Unified Approach to Interpreting Model Predictions.” URL: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
• Scope of interpretation: helps in computing globally and locally faithful feature importance
• Flexible: can be adopted in different forms – model-agnostic or model-specific approximations (Tree SHAP for tree ensembles, Linear SHAP)
Surrogate Decision Trees
• If the original learned decision function (e.g., of a DNN) is denoted g, with predictions g(X) = Ŷ, then a surrogate tree h_tree can be learned such that h_tree(X, Ŷ) ≈ g(X)
• The faithfulness of the decisions generated by the surrogate decision tree depends on how precisely the tree surrogate captures the decisions learned by the original estimator g
-- Craven, M., & Shavlik, J. W. (1996). “Extracting Tree-Structured Representations of Trained Networks.” In Advances in Neural Information Processing Systems (pp. 24–30).
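The h_tree(X, Ŷ) ≈ g(X) relationship can be sketched in a few lines: train the surrogate on the black-box's predictions, not on the true labels, and measure fidelity as agreement with g. The models and data here are illustrative (a gradient-boosted classifier stands in for the original estimator g):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)

# g: the original "black-box" estimator.
g = GradientBoostingClassifier(random_state=0).fit(X, y)
y_hat = g.predict(X)

# h_tree: a shallow surrogate trained on (X, y_hat), NOT on y.
h_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_hat)

# Fidelity: how often the surrogate reproduces g's decisions.
fidelity = (h_tree.predict(X) == y_hat).mean()
print(fidelity)  # high fidelity => the tree is a faithful summary of g
```

Keeping the surrogate shallow is the point: a depth-3 tree can be read as a handful of human-checkable rules, and the fidelity score quantifies exactly the faithfulness caveat in the bullet above.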
Exploring: Capturing Interactions
• Capture feature interactions
• The partial dependence function could be extended to capture feature interactions by applying statistical tests (e.g., Friedman’s H-statistic)
Image: H-statistic scores for all possible pairs of features, ordered in terms of importance
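The pairwise H-statistic pictured above can be computed from partial dependence alone: if two features do not interact, their joint PD decomposes into the sum of their individual PDs, and H² measures the variance that this decomposition fails to explain. A brute-force sketch (model, data, and sample size are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n = 100
X = rng.uniform(-1, 1, size=(n, 3))
y = X[:, 0] * X[:, 1] + X[:, 2]          # f0 and f1 interact; f2 is additive
model = RandomForestRegressor(random_state=0).fit(X, y)

def pd_at_rows(model, X, cols):
    """Partial dependence of `cols`, evaluated at each row's own values."""
    out = np.empty(len(X))
    for i in range(len(X)):
        Xi = X.copy()
        Xi[:, cols] = X[i, cols]          # fix `cols` to row i's values
        out[i] = model.predict(Xi).mean() # marginalize the other features
    return out - out.mean()               # center, as the H-statistic requires

def h_squared(model, X, j, k):
    """Friedman's pairwise H²: share of joint-PD variance not explained
    by the sum of the one-way PDs."""
    pd_j = pd_at_rows(model, X, [j])
    pd_k = pd_at_rows(model, X, [k])
    pd_jk = pd_at_rows(model, X, [j, k])
    return ((pd_jk - pd_j - pd_k) ** 2).sum() / (pd_jk ** 2).sum()

h01 = h_squared(model, X, 0, 1)
h02 = h_squared(model, X, 0, 2)
print(h01 > h02)  # the truly interacting pair scores higher
```

Ranking all pairs by H² reproduces the ordering shown in the slide's image.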
Time Series
What is a Time Series Problem?
Figure: Three “Sales over time” charts (Dec 2017 – Feb 2018) contrasting a nonlinear (seasonal) relationship with a linear relationship
Time Groups
Figure: “Sales per day (all groups)” vs. “sales by group” (group 1, group 2, group 3), Dec 2017 – Mar 2018, with the underlying data:

| time | group | sales |
|---|---|---|
| 01/01/2018 | group1 | 30 |
| 01/01/2018 | group2 | 100 |
| 01/01/2018 | group3 | 10 |
| 02/01/2018 | group1 | 60.2 |
| 02/01/2018 | group2 | 200.2 |
| 02/01/2018 | group3 | 20.2 |
| 03/01/2018 | group1 | 90.3 |
| 03/01/2018 | group2 | 300.3 |
| 03/01/2018 | group3 | 30.3 |
| 04/01/2018 | group1 | 120.4 |
| 04/01/2018 | group2 | 400.4 |
| 04/01/2018 | group3 | 40.4 |
MLI for Time Series
Can we think?
An interactive playground to simulate, infer, and evaluate all forms of model what-if (scenario) analysis, identifying counterfactuals to establish model robustness
Image reference: Rajeswaran et al. (http://arxiv.org/abs/1703.02660)
Template
Image credit: Patrick Hall
• https://www.fatml.org/resources/principles-for-accountable-algorithms
• https://www.darpa.mil/program/explainable-artificial-intelligence
• https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
• https://twitter.com/jpatrickhall/status/1073572209988255744
• https://github.com/Castro38/automatic-ml-intro-tutorial-h2oai/blob/master/automatic-ml-intro-tutorial.md
• https://github.com/h2oai/mli-resources
• https://github.com/jphall663/awesome-machine-learning-interpretability
• https://www.h2o.ai/blog/what-is-your-ai-thinking-part-1/
Questions?
Driverless AI – Machine Learning Interpretability
Gain confidence in models before deploying them!

Get hands-on with Explainable AI at the Machine Learning Interpretability (MLI) Gym!
