Behind The Black Box:
How To Understand Any ML Model Using SHAP
Welcome!
I am Jonathan Bechtel
- MS Analytics @ Georgia Tech
- Head Data Science Instructor @
General Assembly
- Private Consultant To Help
Companies Work Through Data
Projects
- Learn More About Me:
www.jonathanbech.tel
2
Our Agenda
⊹ Describe A Problem: The tension between
model accuracy and interpretability
⊹ Study A Solution: Shapley sampling and its
ability to describe any model’s prediction
⊹ Learn New Software: Take a deep dive into
the SHAP library and its different uses
⊹ Get Hands On Practice: We’ll have small
breaks for knowledge checks & coding practice
3
Model Interpretability:
The Fundamental Tradeoff
1
Simple Models
- Less accurate
- Extract shallow
patterns from data
- Generate model
interpretations that are
straightforward
Accuracy Vs. Interpretability
Complicated Models
- More accurate
- Extract perceptual,
non-linear patterns
- No straightforward
way to map how inputs
contribute to output
5
Simple Models
- Linear Regression
- Logistic Regression
- Naïve Bayes
- Exponential Smoothing
Accuracy Vs. Interpretability
Complicated Models
- Neural Networks
- Tree Based Ensembles:
- Random Forests
- Gradient Boosting
Machines
6
The Case For Complicated Models
The emergence of large datasets and cheap compute power has
made the enhanced pattern recognition capabilities of complicated
models more practical and relevant than ever before
7
Case In Point: An Empirical Comparison of
Supervised Learning Algorithms Using
Different Performance Metrics
8
https://bit.ly/3b8b1ug
8 Datasets
Testing Different Classification Problems
30 Different models
From Linear Models to Deep Neural Networks
9 Metrics
To Capture Different Aspects of Model Performance
9
10
But They Are Often Underused…
Because they lack an easy way to be understood by people who
need to use their outputs in decision making
Simpler models are often used as substitutes because they have
methods of explanation that are more tractable, despite their faults
11
12
Understanding Black Box
Models:
An Ongoing Struggle
2
14
Feature Importance Has Major
Shortcomings
- What direction does each feature move the model in?
- What impact do individual values have on the model?
- How do model features impact a prediction at a local level?
15
Simple Models: Model → Prediction → Explanation
Complicated Models: Model → Prediction, then ExplanatoryModel(Model) → Explanation
Complicated models need a separate model to
study the relationship between their inputs and
outputs
What Properties Should A Model Explainer
Have?
16
Three properties: local accuracy, missingness, and consistency
17
1). Start with your base prediction
2). Add up the
contributions of
all of your
features
3). And that should
add up to your model’s
final prediction
Local Accuracy
For live demonstration: https://bit.ly/policexray
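The three steps above boil down to a simple additivity equation. A minimal sketch, with hypothetical feature names and numbers purely for illustration:

```python
# Local accuracy as arithmetic: the base (average) prediction plus every
# feature's contribution must reconstruct the model's final prediction.
base_value = 22.5                                       # step 1: base prediction
contributions = {"RM": 3.1, "LSTAT": -1.8, "CRIM": -0.4}

prediction = base_value + sum(contributions.values())   # steps 2 and 3
print(round(prediction, 1))  # 23.4
```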
18
Missingness
Since the contribution
of the Quarter
column is 0,
removing it should
not impact the
model’s prediction
For live demonstration: https://bit.ly/policexray
19
Consistency
Since we increased
the value of Age from
40 to 50 and held
everything else
constant, its
contribution should not
decrease
For live demonstration: https://bit.ly/policexray
SHapley Additive exPlanations (SHAP)
A Unified Way to Understand Any Model
3
21
SHAP is a method for deriving the contributions of
individual factors for any model
Its Main Parts:
Game Theory
How do players in a game collaborate with one another to achieve
payouts for their contributions?
Permutation
Selectively changing the arrangement of items in a system to measure
their impact against one another.
22
Whither Game Theory?
Every column is a ‘participant’ in the game.
The ‘prize’ they are competing for is the model’s prediction.
23
Whither Game Theory?
The contribution of each feature to a prediction is the ‘payout’ they receive for their efforts.
24
Permutation
[Diagram: values from Sample 1 and Sample 2 are spliced together into FrankenSample 1 and FrankenSample 2, with the values from the RM column deliberately kept different between them]
25
Permutation
By repeatedly shuffling different combinations of columns from
different samples and deliberately holding one value separate between
them, we can eventually find the expected contribution of each column
for a particular value in a model by averaging the differences
in their predictions
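The shuffling procedure above can be sketched as a short Monte Carlo loop. This is an illustrative toy (a hand-written linear model and a tiny two-row background set are invented here), not the library's actual implementation:

```python
import random

def model(x):
    # Stand-in black box: 3*x0 + 2*x1 + x2
    return 3 * x[0] + 2 * x[1] + x[2]

def shapley_sample(model, x, background, feature, n_iter=2000, seed=0):
    """Estimate one feature's contribution by repeatedly building
    'Franken-samples' that agree everywhere except the target feature,
    then averaging the difference in the model's predictions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iter):
        z = rng.choice(background)               # random reference sample
        mask = [rng.random() < 0.5 for _ in x]   # random coalition of columns
        with_f = [xi if (m or i == feature) else zi
                  for i, (xi, zi, m) in enumerate(zip(x, z, mask))]
        without_f = list(with_f)
        without_f[feature] = z[feature]          # only the target feature differs
        total += model(with_f) - model(without_f)
    return total / n_iter

background = [[0, 0, 0], [1, 1, 1]]
x = [2.0, 1.0, 4.0]
# For this linear model the contribution of x0 converges to 3 * (x0 - E[z0])
print(round(shapley_sample(model, x, background, feature=0), 2))
```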
26
Please visit the link at
https://bit.ly/shapley_sampling to
continue!
27
Why We Like Shapley Sampling
Values
- They satisfy the model explanation criteria: local
accuracy, missingness, and consistency
- They are model agnostic: they work regardless of the type of
model you use
- They capture interaction effects: always compare change of
a single value against a random combination of others
28
Why We Don’t Like Shapley
Sampling Values
- They’re expensive: calculating a shapley value for every single
combination of columns for every sample scales exponentially, on the order of 2^k model evaluations for k columns
- Don’t scale for certain models: could you calculate a
shapley value for every pixel in an image? No, probably not
- They require access to your data: you must be able to
examine your training set extensively in order to derive them
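The cost point above comes from counting coalitions: every subset of the columns is a coalition that exact computation must evaluate, and the count doubles with each added column. A quick sketch:

```python
from itertools import combinations

def n_coalitions(k):
    # Enumerate every subset of k columns, of every size from 0 to k.
    return sum(1 for r in range(k + 1) for _ in combinations(range(k), r))

for k in (4, 10, 20):
    print(k, n_coalitions(k))   # equals 2**k: 16, 1024, 1048576
```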
29
[Diagram: SHAP at the center, connected to Deep SHAP, Tree SHAP, Kernel SHAP, Shapley Sampling, and LIME]
The SHAP technique couples the benefits of shapley sampling with
performant methods of understanding specific models
“ The SHAP framework identifies the class
of additive feature importance methods
(which includes six previous methods)
and shows there is a unique solution in
this class that adheres to desirable
properties.
- Scott Lundberg, “A Unified Approach to Interpreting Model
Predictions”, https://bit.ly/shap_paper
30
31
Please visit the link at
https://bit.ly/shap_demo to
continue!
Different
Applications For SHAP
4
33
Interaction Effects
If called on, SHAP can decompose a SHAP value into a column’s
independent (main) effect plus the contributing effect of each of the
other columns that interacted with it
Example SHAP value for RM: 6.4
6.4 (total value) = 7.3 (main effect) − 2.4 (LSTAT effect) + 4.6 (CRIM effect) − 3.1 (TAX effect)
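The decomposition is purely additive, so it can be checked by hand (numbers taken from the slide):

```python
# Total SHAP value for RM = its main effect plus the interaction
# effects shared with each of the other columns.
main_effect = 7.3
interaction_effects = {"LSTAT": -2.4, "CRIM": 4.6, "TAX": -3.1}

total = main_effect + sum(interaction_effects.values())
print(round(total, 1))  # 6.4
```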
34
Example Interaction Matrix for one sample
Independent effect of a
column’s shap value across
the diagonal
Contributing effect of each
additional column for that
shap value
Add up the values in each
row to get the total shap
value for that column
35
Low values of LSTAT counteract the
impact of high RM
RM and TAX are functionally
independent
36
Classification
exp(-3.294) = .037 = model prediction
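For classifiers, the summed SHAP values live in a transformed space rather than probability space. Depending on the model's link function, they may be inverted with exp (as on this slide) or with the sigmoid (for log-odds); with strongly negative values the two nearly coincide:

```python
import math

log_value = -3.294                      # sum of base value + SHAP contributions

approx = math.exp(log_value)            # the slide's transform
sigmoid = 1 / (1 + math.exp(-log_value))  # inverse of log-odds

print(round(approx, 3))   # 0.037
print(round(sigmoid, 3))  # 0.036
```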
37
Deep Learning
SHAP uses the DeepLIFT algorithm to allow for the
computation of shapley values in a way that is well suited for
different types of neural networks
This means out-of-the-box functionality for image classifiers
with convolutional neural networks and language models with
transformers
38
SHAP + Deep Learning
Impact of removing individual words on weight
vectors associated with them
Impact of removing individual words on resulting
model probability
Final Points
40
Model Inference vs. Causal Inference
SHAP values tell you about patterns inside your
data, but do not provide a genuine counterfactual
SHAP is two tools wrapped up in one
Computational
A fast and exact way
to calculate shapley
values for a wide
variety of models
Graphing
Helpful charts that
use matplotlib to
make shap values
more digestible
41
It is perfectly fine to use shap for any combination of these
as suits your needs
Sometimes the graphing section can be a little buggy:
- Inconsistent support for different models
- No pytorch support yet
Despite some limitations and bugs, shap has
quickly established itself as the most widely
used tool for interpretable ML
42
In Approximately 2 Years:
14,400
GitHub Stars
2,200
Forks
Extra Resources
Because Learning Is Fun
5
44
Useful Links
Main github repo: https://github.com/slundberg/shap
Original paper: https://arxiv.org/abs/1705.07874
Paper for TreeShap: https://arxiv.org/abs/1905.04610
Useful reading for interpretable ML: https://christophm.github.io/interpretable-ml-book/
Kaggle course: https://www.kaggle.com/dansbecker/shap-values
Live production example of shap: https://www.policexray.com/home/ (disclosure: I wrote it)
Thanks!
Any questions?
You can find me at:
⊹ jonathan@jonathanbech.tel
⊹ www.jonathanbech.tel
45
More Related Content

What's hot

DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictions
Anton Kulesh
 
Explainability and bias in AI
Explainability and bias in AIExplainability and bias in AI
Explainability and bias in AI
Bill Liu
 
DataRobotによる予測モデルを用いた シミュレーションと最適化(抜粋)
DataRobotによる予測モデルを用いた シミュレーションと最適化(抜粋)DataRobotによる予測モデルを用いた シミュレーションと最適化(抜粋)
DataRobotによる予測モデルを用いた シミュレーションと最適化(抜粋)
Yuya Yamamoto
 
BlackBox モデルの説明性・解釈性技術の実装
BlackBox モデルの説明性・解釈性技術の実装BlackBox モデルの説明性・解釈性技術の実装
BlackBox モデルの説明性・解釈性技術の実装
Deep Learning Lab(ディープラーニング・ラボ)
 
Jsai
JsaiJsai
機械学習を使った時系列売上予測
機械学習を使った時系列売上予測機械学習を使った時系列売上予測
機械学習を使った時系列売上予測
DataRobotJP
 
Shap
ShapShap
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)
Hayim Makabee
 
深層ニューラルネットワークの積分表現(Deepを定式化する数学)
深層ニューラルネットワークの積分表現(Deepを定式化する数学)深層ニューラルネットワークの積分表現(Deepを定式化する数学)
深層ニューラルネットワークの積分表現(Deepを定式化する数学)
Katsuya Ito
 
ICML 2020 最適輸送まとめ
ICML 2020 最適輸送まとめICML 2020 最適輸送まとめ
ICML 2020 最適輸送まとめ
ohken
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
Arithmer Inc.
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
ananth
 
ファクター投資と機械学習
ファクター投資と機械学習ファクター投資と機械学習
ファクター投資と機械学習
Kei Nakagawa
 
Data-Centric AIの紹介
Data-Centric AIの紹介Data-Centric AIの紹介
Data-Centric AIの紹介
Kazuyuki Miyazawa
 
[DL輪読会]A Hierarchical Latent Vector Model for Learning Long-Term Structure in...
[DL輪読会]A Hierarchical Latent Vector Model for Learning Long-Term Structure in...[DL輪読会]A Hierarchical Latent Vector Model for Learning Long-Term Structure in...
[DL輪読会]A Hierarchical Latent Vector Model for Learning Long-Term Structure in...
Deep Learning JP
 
Active Learning と Bayesian Neural Network
Active Learning と Bayesian Neural NetworkActive Learning と Bayesian Neural Network
Active Learning と Bayesian Neural Network
Naoki Matsunaga
 
Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective
Saurabh Kaushik
 
GARCHSKモデルを用いた条件付き固有モーメントの実証分析
GARCHSKモデルを用いた条件付き固有モーメントの実証分析GARCHSKモデルを用いた条件付き固有モーメントの実証分析
GARCHSKモデルを用いた条件付き固有モーメントの実証分析
Kei Nakagawa
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
Universitat Politècnica de Catalunya
 
勾配ブースティングの基礎と最新の動向 (MIRU2020 Tutorial)
勾配ブースティングの基礎と最新の動向 (MIRU2020 Tutorial)勾配ブースティングの基礎と最新の動向 (MIRU2020 Tutorial)
勾配ブースティングの基礎と最新の動向 (MIRU2020 Tutorial)
RyuichiKanoh
 

What's hot (20)

DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictions
 
Explainability and bias in AI
Explainability and bias in AIExplainability and bias in AI
Explainability and bias in AI
 
DataRobotによる予測モデルを用いた シミュレーションと最適化(抜粋)
DataRobotによる予測モデルを用いた シミュレーションと最適化(抜粋)DataRobotによる予測モデルを用いた シミュレーションと最適化(抜粋)
DataRobotによる予測モデルを用いた シミュレーションと最適化(抜粋)
 
BlackBox モデルの説明性・解釈性技術の実装
BlackBox モデルの説明性・解釈性技術の実装BlackBox モデルの説明性・解釈性技術の実装
BlackBox モデルの説明性・解釈性技術の実装
 
Jsai
JsaiJsai
Jsai
 
機械学習を使った時系列売上予測
機械学習を使った時系列売上予測機械学習を使った時系列売上予測
機械学習を使った時系列売上予測
 
Shap
ShapShap
Shap
 
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)
 
深層ニューラルネットワークの積分表現(Deepを定式化する数学)
深層ニューラルネットワークの積分表現(Deepを定式化する数学)深層ニューラルネットワークの積分表現(Deepを定式化する数学)
深層ニューラルネットワークの積分表現(Deepを定式化する数学)
 
ICML 2020 最適輸送まとめ
ICML 2020 最適輸送まとめICML 2020 最適輸送まとめ
ICML 2020 最適輸送まとめ
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
ファクター投資と機械学習
ファクター投資と機械学習ファクター投資と機械学習
ファクター投資と機械学習
 
Data-Centric AIの紹介
Data-Centric AIの紹介Data-Centric AIの紹介
Data-Centric AIの紹介
 
[DL輪読会]A Hierarchical Latent Vector Model for Learning Long-Term Structure in...
[DL輪読会]A Hierarchical Latent Vector Model for Learning Long-Term Structure in...[DL輪読会]A Hierarchical Latent Vector Model for Learning Long-Term Structure in...
[DL輪読会]A Hierarchical Latent Vector Model for Learning Long-Term Structure in...
 
Active Learning と Bayesian Neural Network
Active Learning と Bayesian Neural NetworkActive Learning と Bayesian Neural Network
Active Learning と Bayesian Neural Network
 
Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective
 
GARCHSKモデルを用いた条件付き固有モーメントの実証分析
GARCHSKモデルを用いた条件付き固有モーメントの実証分析GARCHSKモデルを用いた条件付き固有モーメントの実証分析
GARCHSKモデルを用いた条件付き固有モーメントの実証分析
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
勾配ブースティングの基礎と最新の動向 (MIRU2020 Tutorial)
勾配ブースティングの基礎と最新の動向 (MIRU2020 Tutorial)勾配ブースティングの基礎と最新の動向 (MIRU2020 Tutorial)
勾配ブースティングの基礎と最新の動向 (MIRU2020 Tutorial)
 

Similar to Understanding Black Box Models with Shapley Values

A Unified Approach to Interpreting Model Predictions (SHAP)
A Unified Approach to Interpreting Model Predictions (SHAP)A Unified Approach to Interpreting Model Predictions (SHAP)
A Unified Approach to Interpreting Model Predictions (SHAP)
Rama Irsheidat
 
Interpretable ML
Interpretable MLInterpretable ML
Interpretable ML
Mayur Sand
 
C3 w5
C3 w5C3 w5
Shapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley DiscussionShapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley Discussion
Tushar Tank
 
Steering Model Selection with Visual Diagnostics
Steering Model Selection with Visual DiagnosticsSteering Model Selection with Visual Diagnostics
Steering Model Selection with Visual Diagnostics
Melissa Moody
 
WIA 2019 - Steering Model Selection with Visual Diagnostics
WIA 2019 - Steering Model Selection with Visual DiagnosticsWIA 2019 - Steering Model Selection with Visual Diagnostics
WIA 2019 - Steering Model Selection with Visual Diagnostics
Women in Analytics Conference
 
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Rebecca Bilbro
 
STAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptxSTAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptx
JishanAhmed24
 
ODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AIODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AI
Aditya Bhattacharya
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
Krishnaram Kenthapadi
 
Explainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableExplainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretable
Aditya Bhattacharya
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
Alluxio, Inc.
 
An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!
Mansour Saffar
 
Towards Human-Centered Machine Learning
Towards Human-Centered Machine LearningTowards Human-Centered Machine Learning
Towards Human-Centered Machine Learning
Sri Ambati
 
Improving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsImproving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph Algorithms
Neo4j
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
SOUMIT KAR
 
ML-Unit-4.pdf
ML-Unit-4.pdfML-Unit-4.pdf
ML-Unit-4.pdf
AnushaSharma81
 
Machine Learning Explainability.pptx
Machine Learning Explainability.pptxMachine Learning Explainability.pptx
Machine Learning Explainability.pptx
ShehnazIslam1
 
The importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systemsThe importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systems
Francesca Lazzeri, PhD
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
Jinwon Lee
 

Similar to Understanding Black Box Models with Shapley Values (20)

A Unified Approach to Interpreting Model Predictions (SHAP)
A Unified Approach to Interpreting Model Predictions (SHAP)A Unified Approach to Interpreting Model Predictions (SHAP)
A Unified Approach to Interpreting Model Predictions (SHAP)
 
Interpretable ML
Interpretable MLInterpretable ML
Interpretable ML
 
C3 w5
C3 w5C3 w5
C3 w5
 
Shapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley DiscussionShapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley Discussion
 
Steering Model Selection with Visual Diagnostics
Steering Model Selection with Visual DiagnosticsSteering Model Selection with Visual Diagnostics
Steering Model Selection with Visual Diagnostics
 
WIA 2019 - Steering Model Selection with Visual Diagnostics
WIA 2019 - Steering Model Selection with Visual DiagnosticsWIA 2019 - Steering Model Selection with Visual Diagnostics
WIA 2019 - Steering Model Selection with Visual Diagnostics
 
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
 
STAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptxSTAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptx
 
ODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AIODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AI
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
 
Explainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableExplainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretable
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!
 
Towards Human-Centered Machine Learning
Towards Human-Centered Machine LearningTowards Human-Centered Machine Learning
Towards Human-Centered Machine Learning
 
Improving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsImproving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph Algorithms
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
 
ML-Unit-4.pdf
ML-Unit-4.pdfML-Unit-4.pdf
ML-Unit-4.pdf
 
Machine Learning Explainability.pptx
Machine Learning Explainability.pptxMachine Learning Explainability.pptx
Machine Learning Explainability.pptx
 
The importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systemsThe importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systems
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
 

Recently uploaded

一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 

Recently uploaded (20)

一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 

Understanding Black Box Models with Shapley Values

  • 1. Behind The Black BoX: How To Understand Any Ml Model Using SHAP
  • 2. Welcome! I am Jonathan Bechtel - MS Analytics @ Georgia Tech - Head Data Science Instructor @ General Assembly - Private Consultant To Help Companies Work Through Data Projects - Learn More About Me: www.jonathanbech.tel 2
  • 3. Our Agenda ⊹ Describe A Problem: The tension between model accuracy and interpretability ⊹ Study A Solution: Shapley sampling and its ability to describe any model’s prediction ⊹ Learn New Software: Take a deep dive into the SHAP library and its different uses ⊹ Get Hands On Practice: We’ll have small breaks for knowledge checks & coding practice 3
  • 5. Simple Models - Less accurate - Extract shallow patterns from data - Generate model interpretations that are straightforward Accuracy Vs. Interpretability Complicated Models - More accurate - Extract perceptual, non-linear patterns - No straightforward way to map how inputs contribute to output 5
  • 6. Simple Models - Linear Regression - Logistic Regression - Naïve Bayes - Exponential Smoothing Accuracy Vs. Interpretability Complicated Models - Neural Networks - Tree Based Ensembles: - Random Forests - Gradient Boosting Machines 6
  • 7. The Case For Complicated Models The emergence of large datasets and cheap compute power has made the enhanced pattern recognition capabilities of complicated models more practical and relevant than ever before 7
  • 8. Case In Point: An Empirical Comparison of Supervised Learning Algorithms Using Different Performance Metrics 8 https://bit.ly/3b8b1ug
  • 9. 8 Datasets Testing Different Classification Problems 30 Different models From Linear Models to Deep Neural Networks 9 Metrics To Capture Different Aspects of Model Performance 9
  • 10. 10
  • 11. But They Are Often Underused…. Because they lack an easy way to be understood by people who need to use their outputs in decision making Simpler models are often used as substitutes because they have methods of explanation that are more tractable, despite their faults 11
  • 14. 14 Feature Importance Has Major Shortcomings: What direction does each feature move the model in? What impact do individual values have on the model? How do model features impact a prediction at a local level?
  • 15. 15 Simple Models Model Prediction Explanation Complicated Models Model Prediction ExplanatoryModel(Model) Explanation Complicated models need a separate model to study the relationship between their inputs and outputs
  • 16. What Properties Should A Model Explainer have? 16 1 2 3
  • 17. 17 1). Start with your base prediction 2). Add up the contributions of all of your features 3). The result should equal your model’s final prediction Local Accuracy For live demonstration: https://bit.ly/policexray
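The three steps above can be checked numerically. A minimal sketch (the data and variable names here are invented for illustration): for a linear model the exact Shapley values have a closed form, coefficient × (value − column mean), so base value plus contributions reproduces the prediction exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: a linear model is the one case where exact Shapley values
# have a simple closed form, coef_i * (x_i - mean(x_i))
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0
model = LinearRegression().fit(X, y)

# 1). The base prediction is the model's average output over the data
base_value = model.predict(X).mean()

# 2). Each feature's contribution for one sample
x = X[0]
contributions = model.coef_ * (x - X.mean(axis=0))

# 3). Base + contributions reproduces the model's final prediction
reconstructed = base_value + contributions.sum()
prediction = model.predict(x.reshape(1, -1))[0]
```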
  • 18. 18 Missingness Since the contribution of the Quarter column is 0, removing it should not impact the model’s prediction For live demonstration: https://bit.ly/policexray
  • 19. 19 Consistency Since we increased the value of Age from 40 to 50 and held everything else constant, its contribution should not decrease For live demonstration: https://bit.ly/policexray
  • 20. SHapley Additive exPlanations A Unified Way to Understand Any Model 3
  • 21. 21 SHAP is a method for deriving the contributions of individual factors for any model Its Main Parts: Game Theory How do players in a game collaborate with one another to achieve payouts for their contributions? Permutation Selectively changing the arrangement of items in a system to measure their impact against one another.
  • 22. 22 Why Game Theory? Every column is a ‘participant’ in the game The ‘prize’ they are competing for is the model’s prediction
  • 23. 23 Why Game Theory? The contribution of each feature to a prediction is the ‘payout’ they receive for their efforts
  • 24. 24 Permutation Sample 1 Sample 2 FrankenSample1 FrankenSample2 Values from sample 1 Values from sample 2 Values from RM column deliberately kept different
  • 25. 25 Permutation By repeatedly shuffling different combinations of columns from different samples and deliberately holding one value separate between them, we can eventually find the expected contribution of each column for a particular value in a model by averaging the differences in their predictions
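The franken-sample procedure can be sketched directly. This is a hedged, minimal implementation: `sampled_shap_value` and the toy model below are my own names for illustration, not the SHAP library's API.

```python
import numpy as np

def sampled_shap_value(f, x, X_background, j, n_iter=500, seed=0):
    """Monte Carlo estimate of feature j's Shapley value for sample x."""
    rng = np.random.default_rng(seed)
    k = x.shape[0]
    total = 0.0
    for _ in range(n_iter):
        z = X_background[rng.integers(len(X_background))]  # random second sample
        order = rng.permutation(k)                         # random feature ordering
        pos = int(np.where(order == j)[0][0])
        # Franken-samples: features up to j come from x, the rest from z,
        # with the value of column j deliberately kept different
        with_j = np.where(np.isin(np.arange(k), order[:pos + 1]), x, z)
        without_j = np.where(np.isin(np.arange(k), order[:pos]), x, z)
        total += f(with_j) - f(without_j)
    return total / n_iter  # average difference in predictions

# With a linear model and an all-zeros background the estimate is exact
f = lambda v: 3 * v[0] + 2 * v[1]
phi_0 = sampled_shap_value(f, np.array([1.0, 2.0]), np.zeros((10, 2)), j=0)
```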
  • 26. 26 Please visit the link at https://bit.ly/shapley_sampling to continue!
  • 27. 27 Why We Like Shapley Sampling Values - They satisfy the model explanation criteria: local accuracy, missingness, and consistency - They are model agnostic: they work regardless of the type of model you use - They capture interaction effects: they always compare the change of a single value against random combinations of the others
  • 28. 28 Why We Don’t Like Shapley Sampling Values - They’re expensive: calculating an exact Shapley value means evaluating every combination of columns for every sample, which scales exponentially (2^k coalitions for k features) - They don’t scale for certain models: could you calculate a Shapley value for every pixel in an image? No, probably not - They require access to your data: you must be able to examine your training set extensively in order to derive them
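To make the exponential cost concrete, here is a hedged sketch of the exact computation (the `exact_shap` helper is hypothetical, not part of any library). The inner loop enumerates every coalition of the other k−1 features, which is why the approach becomes intractable as k grows.

```python
import numpy as np
from itertools import combinations
from math import factorial

def exact_shap(f, x, z, j):
    """Exact Shapley value of feature j, using a single background sample z
    to stand in for 'feature absent'. Enumerates all 2^(k-1) coalitions."""
    k = len(x)
    others = [i for i in range(k) if i != j]
    phi = 0.0
    for size in range(k):
        for S in combinations(others, size):
            mask = np.zeros(k, dtype=bool)
            mask[list(S)] = True
            base = np.where(mask, x, z)   # coalition S from x, the rest from z
            with_j = base.copy()
            with_j[j] = x[j]              # add feature j to the coalition
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            phi += weight * (f(with_j) - f(base))
    return phi

f = lambda v: 3 * v[0] + 2 * v[1] + v[2]
x, z = np.ones(3), np.zeros(3)
phi_0 = exact_shap(f, x, z, j=0)   # for a linear model: 3 * (x[0] - z[0])
```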
  • 29. 29 SHAP: Deep SHAP, Tree SHAP, Kernel SHAP, Shapley Sampling, LIME. The SHAP technique couples the benefits of Shapley sampling with performant methods of understanding specific models
  • 30. “ The SHAP framework identifies the class of additive feature importance methods (which includes six previous methods) and shows there is a unique solution in this class that adheres to desirable properties. - Scott Lundberg, “A Unified Approach to Interpreting Model Predictions”, https://bit.ly/shap_paper 30
  • 31. 31 Please visit the link at https://bit.ly/shap_demo to continue!
  • 33. 33 Interaction Effects If called on, SHAP can decompose a SHAP value into its main effect plus the contributing effect of each of the other columns. Example SHAP value for RM: 6.4 = 7.3 (main effect) − 2.4 (LSTAT effect) + 4.6 (CRIM effect) − 3.1 (TAX effect)
  • 34. 34 Example Interaction Matrix for one sample Independent effect of a column’s shap value across the diagonal Contributing effect of each additional column for that shap value Add up the values in each row to get the total shap value for that column
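The row-sum bookkeeping can be checked with a toy matrix. The RM row reuses the worked numbers from the slides above; the other rows are purely hypothetical values chosen to keep the matrix symmetric.

```python
import numpy as np

features = ["RM", "LSTAT", "CRIM", "TAX"]

# Hypothetical SHAP interaction matrix for one sample. Main effects sit on
# the diagonal; off-diagonal entries are symmetric pairwise contributions.
interaction = np.array([
    [ 7.3, -2.4,  4.6, -3.1],   # RM row: main effect 7.3 on the diagonal
    [-2.4,  1.0,  0.2,  0.1],   # LSTAT
    [ 4.6,  0.2, -0.5,  0.0],   # CRIM
    [-3.1,  0.1,  0.0,  2.0],   # TAX
])

# Adding up the values in each row gives the total SHAP value per column
total_shap = interaction.sum(axis=1)
rm_value = total_shap[features.index("RM")]   # 7.3 - 2.4 + 4.6 - 3.1 = 6.4
```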
  • 35. 35 Low values of LSTAT counteract the impact of high RM. RM and TAX are functionally independent.
  • 36. 36 Classification exp(-3.294) = .037 = model prediction
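One subtlety worth hedging here: for classifiers whose output is a margin, SHAP values sum to log-odds, and exponentiating gives odds rather than a probability. The two nearly coincide for small values, which is why exp(-3.294) ≈ .037 reads as the model prediction. A quick check (the -3.294 figure is taken from the slide):

```python
import math

# SHAP values for a log-odds classifier sum to the margin, not a probability
log_odds = -3.294                 # base value + SHAP contributions

odds = math.exp(log_odds)         # p / (1 - p)
probability = odds / (1 + odds)   # exact probability via the logistic link
# For small values, odds ~= probability, so exp(-3.294) ~= .037 is a close read
```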
  • 37. 37 Deep Learning SHAP uses the DeepLIFT algorithm to allow for the computation of Shapley values in a way that is well suited to different types of neural networks This means out-of-the-box functionality for image classifiers with convolutional neural networks and language models with transformers
  • 38. 38 SHAP + Deep Learning Impact of removing individual words on the weight vectors associated with them. Impact of removing individual words on the resulting model probability.
  • 40. 40 Model Inference vs. Causal Inference SHAP values tell you about patterns inside your data, but do not provide a genuine counterfactual
  • 41. SHAP is two tools wrapped up in one Computational A fast and exact way to calculate Shapley values for a wide variety of models Graphing Helpful charts that use matplotlib to make SHAP values more digestible 41 It is perfectly fine to use SHAP for any combination of these as suits your needs Sometimes the graphing section can be a little buggy: - Inconsistent support for different models - No PyTorch support yet
  • 42. Despite some limitations and bugs, SHAP has quickly established itself as the most widely used tool for interpretable ML 42 In Approximately 2 Years: 14,400 GitHub Stars 2,200 Forks
  • 44. 44 Useful Links Main github repo: https://github.com/slundberg/shap Original paper: https://arxiv.org/abs/1705.07874 Paper for TreeShap: https://arxiv.org/abs/1905.04610 Useful reading for interpretable ML: https://christophm.github.io/interpretable-ml-book/ Kaggle course: https://www.kaggle.com/dansbecker/shap-values Live production example of shap: https://www.policexray.com/home/ (disclosure: I wrote it)
  • 45. Thanks! Any questions? You can find me at: ⊹ jonathan@jonathanbech.tel ⊹ www.jonathanbech.tel 45