3. What’s all the fuss about?
Shapley
● A game-theory approach to assigning “credit” to the members of a cooperative group
● Shapley values calculate the
importance of a feature by
comparing what a model predicts
with and without the feature.
However, since the order in which a
model sees features can affect its
predictions, this is done in every
possible order, so that the features
are fairly compared. source
SHAP
● What Shapley does is quantify the contribution that
each player brings to the game. What SHAP does is
quantify the contribution that each feature brings to
the prediction made by the model.
● One game: one observation. SHAP is local
● Lundberg, Scott M., and Su-In Lee. “A unified
approach to interpreting model predictions.” Advances
in Neural Information Processing Systems (2017)
● Implementation of Shapley values (TreeSHAP, KernelSHAP)
● Connects LIME and Shapley values
● One line of Python gives you feature explanations
9. ● Imagine a machine learning model that
predicts the income of a person
knowing age, gender and job of the
person.
● Shapley values are based on the idea
that the outcome of each possible
combination (or coalition) of players
should be considered to determine the
importance of a single player. In our
case, this corresponds to each possible
combination of f features (f going from
0 to F, F being the number of all
features available, in our example 3).
● In math, this is called a “power set” and
can be represented as a tree. h/t this article
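The power set of the three features can be enumerated directly; a minimal sketch using only the standard library, with feature names taken from the income example above:

```python
from itertools import chain, combinations

def power_set(features):
    """All 2^n coalitions (subsets) of the feature set, from () to the full set."""
    return list(chain.from_iterable(
        combinations(features, r) for r in range(len(features) + 1)
    ))

coalitions = power_set(["age", "gender", "job"])
print(len(coalitions))  # 2^3 = 8 coalitions
```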
10. ● Cardinality of a power set is 2 ^ n,
where n is the number of elements of
the original set.
● SHAP requires training a distinct predictive
model for each distinct coalition in the
power set (2^F models)
● The models are otherwise completely
equivalent: same hyperparameters and same
training data (the full dataset). The only thing
that changes is the set of features
included in the model.
● Imagine that we have already trained
our 8 models on the same training data.
Take a new observation (let us call it x₀)
and see what the 8 different models
predict for that same observation x₀.
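The “one model per coalition” setup can be sketched as follows, assuming scikit-learn and a made-up toy income dataset (all numbers are illustrative, not from the talk):

```python
import numpy as np
from itertools import chain, combinations
from sklearn.linear_model import LinearRegression

# Toy stand-in for the income example: columns are age, gender, job (encoded).
X = np.array([[25, 0, 1], [32, 1, 2], [47, 0, 0], [51, 1, 1],
              [29, 1, 0], [38, 0, 2], [44, 1, 2], [56, 0, 0]], dtype=float)
y = np.array([30, 45, 52, 60, 33, 48, 58, 62], dtype=float)  # income (k$)

features = [0, 1, 2]  # column indices for age, gender, job
coalitions = list(chain.from_iterable(
    combinations(features, r) for r in range(len(features) + 1)))

# One model per coalition: identical hyperparameters, same training data;
# only the included columns change. The empty coalition predicts the mean.
x0 = np.array([40.0, 1.0, 1.0])  # the new observation x0
predictions = {}
for S in coalitions:
    if S:
        model = LinearRegression().fit(X[:, S], y)
        predictions[S] = float(model.predict(x0[list(S)].reshape(1, -1))[0])
    else:
        predictions[S] = float(y.mean())

print(len(predictions))  # 8 predictions for x0, one per coalition
```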
11. ● Two nodes connected by an edge differ
by just one feature; the gap between
the predictions of the two connected nodes
is due to the additional feature. This is called
the “marginal contribution” of a feature.
● Each edge represents the marginal
contribution brought by a feature
● The overall effect of Age on the final model
(i.e. the SHAP value of Age for x₀) is a
weighted summary of these contributions
● Consider the marginal contribution of Age
in all the models (the edges highlighted in
red)
● How does SHAP figure out the weights?
Next section!
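The edge bookkeeping can be sketched with a toy table of the 8 model predictions for x₀ (all numbers here are hypothetical, for illustration only):

```python
# Hypothetical predictions of the 8 coalition models for x0.
preds = {
    frozenset(): 50.0,
    frozenset({"age"}): 54.0,
    frozenset({"gender"}): 51.0,
    frozenset({"job"}): 55.0,
    frozenset({"age", "gender"}): 56.0,
    frozenset({"age", "job"}): 60.0,
    frozenset({"gender", "job"}): 57.0,
    frozenset({"age", "gender", "job"}): 63.0,
}

# Each edge S -> S ∪ {age} carries the marginal contribution of Age
# to coalition S (the red edges in the tree).
marginals = {
    S: preds[S | {"age"}] - preds[S]
    for S in preds if "age" not in S
}
for S, mc in sorted(marginals.items(), key=lambda kv: len(kv[0])):
    print(sorted(S), mc)
```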
18. Shapley Equation
For a subset S not containing feature i, the weight is the
product of the number of permutations of S and the number of
permutations of the complement N \ (S ∪ {i}), divided by the
total number of permutations of N:

φᵢ(v) = Σ_{S ⊆ N\{i}} [ |S|! · (|N| − |S| − 1)! / |N|! ] · ( v(S ∪ {i}) − v(S) )
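The equation can be checked with a brute-force sum over the power set; a minimal sketch with an illustrative two-player value function (the payoff numbers are made up):

```python
from itertools import chain, combinations
from math import factorial

def shapley_value(value, players, i):
    """Exact Shapley value of player i for value function value(S).

    weight(S) = |S|! * (|N| - |S| - 1)! / |N|!  -- permutations of S times
    permutations of N \\ (S ∪ {i}), over all |N|! orderings.
    """
    n = len(players)
    others = [p for p in players if p != i]
    phi = 0.0
    for S in chain.from_iterable(combinations(others, r) for r in range(n)):
        w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
        phi += w * (value(frozenset(S) | {i}) - value(frozenset(S)))
    return phi

# Toy value function: a coalition's payoff (illustrative numbers).
v = {frozenset(): 0.0, frozenset("a"): 4.0, frozenset("b"): 3.0,
     frozenset("ab"): 10.0}
phi_a = shapley_value(lambda S: v[S], ["a", "b"], "a")
print(phi_a)  # 0.5*(4-0) + 0.5*(10-3) = 5.5
```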
20. Shapley in ML
● Shapley value is computed by perturbing
input features and seeing how changes to
the input features correspond to the final
model prediction.
● Shapley value = the average marginal
contribution of a feature to the overall
model score
● For ML models, it’s not possible to simply
“exclude” a feature when determining a
prediction.
● The formulation of Shapley values within
an ML context simulates “excluded”
features by sampling from the empirical
distribution of the feature’s values and
averaging over multiple samples (Monte
Carlo with other data samples’ features:
FrankenFeatures!)
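The Monte Carlo “FrankenFeatures” idea can be sketched as a permutation-sampling estimator: excluded features are filled in from a random background row, and marginal contributions are averaged over random orderings. The toy linear model and data below are assumptions for illustration:

```python
import random

def mc_shap(predict, x, background, i, n_features, n_samples=2000, seed=0):
    """Monte Carlo Shapley estimate for feature i of instance x.

    "Excluded" features are borrowed from a random background row
    (the FrankenFeatures trick), averaged over random feature orderings.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice(background)   # donor row for "excluded" features
        order = list(range(n_features))
        rng.shuffle(order)           # random feature ordering
        pos = order.index(i)
        # Features up to and including i in the ordering come from x,
        # the rest from the donor row z.
        with_i = [x[j] if order.index(j) <= pos else z[j]
                  for j in range(n_features)]
        without_i = list(with_i)
        without_i[i] = z[i]          # flip only feature i to the donor's value
        total += predict(with_i) - predict(without_i)
    return total / n_samples

# Toy additive model: for a linear model, phi_j should approach
# w_j * (x_j - mean of the background for feature j).
predict = lambda row: 2.0 * row[0] + 1.0 * row[1]
background = [[0.0, 0.0], [1.0, 1.0]]   # empirical distribution (mean 0.5)
x = [2.0, 3.0]
phi0 = mc_shap(predict, x, background, i=0, n_features=2)
print(round(phi0, 2))  # expected near 2*(2.0 - 0.5) = 3.0
```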
21. Shap Package
Explainers for
● Tree models (e.g. XGBoost)
● Deep explainer (neural nets)
● Linear explainer (regression)
25. Advantages / Disadvantages
● Everyone likes explainability
● The SHAP Python package takes two lines and is
fairly fast (especially for tree-based
models)
● Model agnostic (black box)
● Performed on each data point - so we
get granularity to a single point, and can
aggregate over the whole model or
subsets of data.
● Brute-force calculation is combinatorial;
SHAP does some fancy Monte Carlo-like
approximation, especially when the model
structure (think trees) is known, but it is
still a compute beast
● Stakeholders (who have not heard
Junlin’s talk yet) will mistake SHAP
analysis for causation, when it only shows
correlation
● SHAP may make predictions on
unrealistic data
● There is no native Spark version (so you
have to convert PySpark DataFrames to
pandas)