1) The document discusses various methods for interpreting machine learning models, including global and local surrogate models, feature importance plots, Shapley values, partial dependence plots, and individual conditional expectation plots.
2) It explains that interpretability refers to how understandable the reasons for a model's predictions are to humans. Interpretability methods can provide global explanations of entire models or local explanations of individual predictions.
3) The document advocates that improving interpretability is important for addressing issues like bias in machine learning systems and increasing trust in applications used for high-stakes decisions like criminal justice.
1. Interpretability: Challenging the Black Box of Machine Learning
Ankit Tewari
Research Data Scientist
Knowledge Engineering and Machine Learning Group (KEMLG)
Biomedical and Biophysical Signal Processing Group (B2S LAB)
Universitat Politecnica de Catalunya (UPC)
November 10, 2018
Smart City Week: City, Society and Technology
2. Lunchtime, Storytime!
1. Amazon's AI-based recruitment tool that favored men for technical jobs: it penalized resumes that included the word "women's", as in "women's chess club captain"; https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine
2. Racial and gender bias in an AI-based criminal justice system: ProPublica compared COMPAS's risk assessments for 7,000 people arrested in a Florida county with how often they reoffended; https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
4. Solutions?
While there are many reasons such biases appear in our machine learning systems, there are fairly straightforward mechanisms to address them. But remember, straightforward is not always simple!
Data preprocessing techniques for classification without discrimination (statistical parity)
Discrimination-aware machine learning models
and many more approaches!
However, our discussion focuses on examining whether, and how much, a system is biased by explaining the predictions it makes.
5. Prediction Accuracy versus Explainability
Remember, nothing comes free of cost. Good accuracy often comes with a complex model that is not interpretable.
6. The Smarter the System, the Blacker the Box Gets!
7. The intolerable silence!
The silence of your lover is different from the silence of your computer. It marks the barrier between tolerance and intolerance!
8. Interpretability: The ray of hope :)
Definition: Interpretability is the degree to which a human
can understand the cause of a decision. It is the degree to
which a human can consistently predict the model’s result.
The higher the interpretability of a model, the easier it is for
someone to comprehend why certain decisions (read: predictions)
were made.
9. Interpretability versus Interpretation
While interpretability is a measure of the extent to which a machine learning model can be explained, an interpretation is the explanation associated with the model's predictions.
1. Importance and Scope
2. Taxonomy of Interpretability Methods
10. Taxonomy of Interpretability Methods
Intrinsic or post hoc?
Intrinsic interpretability means selecting and training a
machine learning model that is considered to be intrinsically
interpretable (for example short decision trees). Post hoc
interpretability means selecting and training a black box
model (for example a neural network) and applying
interpretability methods after the training (for example
measuring the feature importance).
Model-specific or model-agnostic?
Model-specific interpretation tools are limited to specific
model classes. Model-agnostic tools can be used on any
machine learning model and are usually post hoc.
Local or Global?
Does the interpretation method explain a single prediction or
the entire model behavior?
11. Model Agnostic Methods for Interpretability
Global Surrogate Models
Local Surrogate Models (LIME)
Feature Importance Plot
Shapley Values
Partial Dependence Plots (PDP)
Individual Conditional Expectation (ICE)
12. Global Surrogate Models
We want to approximate our black box prediction function ˆf (x) as closely
as possible with the surrogate model prediction function ˆg(x), under the
constraint that is interpretable. We can make use of any interpretable
model, say, linear regression model
ˆg(x) = β0 + β1x1 + · · · + βP xP (1)
Now,the idea is to fit ˆf (x) on the dataset and obtain predictions ˆy.
Then, we train the ˆg(x) using ˆy as the target. The obtained surrogate
model ˆg can be used to interpret the blackbox model ˆf .
We can also measure how well the surrogate model fits the original black-box model, for example with the R-squared measure:

$$R^2 = 1 - \frac{\text{SSE}}{\text{SST}} = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i^{*} - \hat{y}_i)^2}{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}$$

where $\hat{y}_i^{*}$ is the prediction of the surrogate model and $\hat{y}_i$ the prediction of the black-box model for instance i.
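A minimal sketch of this recipe in Python, assuming scikit-learn; the synthetic dataset and the random-forest black box are illustrative stand-ins, not the talk's actual setup:

```python
# Global surrogate: fit an interpretable model to a black box's predictions.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)

# 1. Fit the black-box model f_hat and obtain its predictions y_hat.
f_hat = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
y_hat = f_hat.predict(X)

# 2. Train the interpretable surrogate g_hat with y_hat as the target.
g_hat = LinearRegression().fit(X, y_hat)

# 3. Measure how well g_hat mimics f_hat: R^2 against the black-box
#    predictions y_hat, not against the true targets y.
print("Surrogate R^2 vs. black box:", r2_score(y_hat, g_hat.predict(X)))
print("Surrogate coefficients:", g_hat.coef_)
```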
13. The terminal nodes of a surrogate tree that approximates the behaviour of a support vector machine trained on the bike rental dataset. The distributions in the nodes show that the surrogate tree predicts a higher number of rented bikes when the temperature is above around 13 degrees Celsius and when the day is later in the two-year period (cut point at 435 days).
14. Local Surrogate Model (LIME)
Intuitively, local surrogate models attempt to explain a single instance in the same way that global surrogate models explain the entire model. Mathematically, a local surrogate model can be described as:
$$\text{explanation}(x) = \arg\min_{g \in G} L(f, g, \pi_x) + \Omega(g)$$
The explanation model for instance x is the model g (e.g. linear
regression model) that minimizes loss L (e.g. mean squared error),
which measures how close the explanation is to the prediction of
the original model f (e.g. an xgboost model), while the model
complexity Ω(g) is kept low (e.g. favor fewer features).
15. Local Surrogate Model (LIME)
We can describe the recipe for fitting local surrogate models as follows (a code sketch follows the list):
We first choose our instance (observation) of interest, for which we want an explanation of its black-box prediction.
Then we perturb our dataset and get the black-box predictions for these new data points.
We then weight the new samples by their proximity to the instance of interest, to allow the model to learn locally.
Finally, we fit a weighted, interpretable model on the dataset of variations and explain the prediction by interpreting the local model.
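A minimal sketch of this recipe, written from scratch rather than with the official lime package. It assumes a fitted binary classifier f_hat exposing predict_proba, a training matrix X, and an instance x of interest; the exponential proximity kernel and the ridge regression are illustrative choices:

```python
# LIME-style local surrogate: perturb, query the black box, weight, fit.
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(f_hat, X, x, n_samples=5000, kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    # 1. Perturb: sample new points around the training feature distribution.
    Z = rng.normal(mu, sigma, size=(n_samples, X.shape[1]))
    # 2. Get the black-box predictions for the perturbed points.
    y_z = f_hat.predict_proba(Z)[:, 1]
    # 3. Weight samples by proximity to x (exponential kernel on distance).
    d = np.linalg.norm((Z - x) / sigma, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # 4. Fit a weighted, interpretable model locally and read off the effects.
    g = Ridge(alpha=1.0).fit(Z, y_z, sample_weight=w)
    return g.coef_  # local feature effects around x
```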
17. Local Surrogate Model (LIME)
A) The plot displays the decision boundaries learned by a machine
learning model. In this case it was a Random Forest, but it does
not matter, because LIME is model-agnostic.
B) The yellow point is the instance of interest, which we want to
explain. The black dots are data sampled from a normal
distribution around the means of the features in the training
sample. This needs to be done only once and can be reused for
other explanations.
C) Introducing locality by giving points near the instance of
interest higher weights.
D) The colours and signs of the grid display the classifications of the locally learned model from the weighted samples. The white line marks the decision boundary (P(class) = 0.5) at which the classification of the local model changes.
18. Local Surrogate Model (LIME)
Application of LIME to a counter-terrorism dataset, from an ongoing project that aims to measure the fingerprints of terrorist outfits across the globe.
19. Feature Importance
A feature's importance is the increase in the model's prediction error after we permute the feature's values (which breaks the relationship between the feature and the outcome).
Just like global surrogate models, it provides a salient overview of how the model behaves globally.
20. Feature Importance
Feature Importance
Input: trained model $\hat{f}$, feature matrix $X$, target vector $Y$, error measure $L(Y, \hat{Y})$
1. Estimate the original model error $e_{\text{orig}}(\hat{f}) = L(Y, \hat{f}(X))$ (e.g. mean squared error).
2. For each feature $j \in \{1, \dots, p\}$ do:
- Generate a feature matrix $X_{\text{perm},j}$ by permuting feature $X_j$ in $X$. This breaks the association between $X_j$ and $Y$.
- Estimate the error $e_{\text{perm}}(\hat{f}) = L(Y, \hat{f}(X_{\text{perm},j}))$ based on the predictions on the permuted data.
- Calculate the permutation feature importance $FI_j = e_{\text{perm}}(\hat{f}) / e_{\text{orig}}(\hat{f})$. Alternatively, the difference can be used: $FI_j = e_{\text{perm}}(\hat{f}) - e_{\text{orig}}(\hat{f})$.
3. Sort features by descending $FI_j$.
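The algorithm translates almost line for line into Python. A minimal sketch, assuming a fitted model f_hat, arrays X and y, and mean squared error as the error measure L:

```python
# Permutation feature importance, ratio version (FI_j = e_perm / e_orig).
import numpy as np
from sklearn.metrics import mean_squared_error

def permutation_importance(f_hat, X, y, seed=0):
    rng = np.random.default_rng(seed)
    e_orig = mean_squared_error(y, f_hat.predict(X))        # step 1
    fi = np.zeros(X.shape[1])
    for j in range(X.shape[1]):                             # step 2
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])        # break X_j <-> y
        e_perm = mean_squared_error(y, f_hat.predict(X_perm))
        fi[j] = e_perm / e_orig
    order = np.argsort(-fi)                                 # step 3: descending
    return order, fi[order]
```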
22. Shapley Values
The Shapley value is the average marginal contribution of a
feature value over all possible coalitions.
Predictions can be explained by assuming that each
feature is a ’player’ in a game where the prediction is
the payout. The Shapley value - a method from
coalitional game theory - tells us how to fairly distribute
the ’payout’ among the features.
The interpretation of the Shapley value $\phi_{ij}$ for feature j and instance i is: the feature value $x_{ij}$ contributed $\phi_{ij}$ towards the prediction for instance i, compared to the average prediction for the dataset. The Shapley value works for both classification (if we deal with probabilities) and regression. We use the Shapley value to analyse the predictions of a Random Forest model predicting absenteeism at the workplace.
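Exact Shapley values require averaging over all coalitions, which is exponential in the number of features; a common workaround is Monte Carlo sampling over random feature orderings. A hedged sketch of that approximation, assuming a fitted binary classifier f_hat with predict_proba, a background data matrix X, an instance x, and a feature index j:

```python
# Monte Carlo approximation of a single Shapley value phi_ij.
import numpy as np

def shapley_value(f_hat, X, x, j, n_iter=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    phi = 0.0
    for _ in range(n_iter):
        z = X[rng.integers(n)]           # random background instance
        order = rng.permutation(p)       # random order in which features join
        pos = int(np.where(order == j)[0][0])
        before = order[:pos]             # features that joined before j
        x_plus, x_minus = z.copy(), z.copy()
        x_plus[before] = x[before]       # coalition members take x's values
        x_minus[before] = x[before]
        x_plus[j] = x[j]                 # once with feature j set to x_j;
        #                                  x_minus keeps z_j, i.e. the same
        #                                  coalition without feature j's value
        phi += (f_hat.predict_proba(x_plus[None])[0, 1]
                - f_hat.predict_proba(x_minus[None])[0, 1])
    return phi / n_iter                  # average marginal contribution
```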
24. Partial Dependence Plot (PDP)
The partial dependence plot (PDP or PD plot) shows the
marginal effect of a feature on the predicted outcome of a
previously fit model (J. H. Friedman). The prediction function is
fixed at a few values of the chosen features and averaged over the
other features.
In practice, the set of features $x_S$ usually contains only one feature, or at most two, because one feature produces 2D plots and two features produce 3D plots. Everything beyond that is quite tricky; even 3D on a 2D page or monitor is already challenging.
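Concretely, Friedman's partial dependence function for a feature subset $x_S$ with complement $x_C$, together with its Monte Carlo estimate over the $n$ training instances, reads:

$$\hat{f}_{x_S}(x_S) = E_{x_C}\big[\hat{f}(x_S, x_C)\big] \approx \frac{1}{n} \sum_{i=1}^{n} \hat{f}\big(x_S, x_C^{(i)}\big)$$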
26. Individual Conditional Expectation (ICE)
For a chosen feature, Individual Conditional Expectation (ICE)
plots draw one line per instance, representing how the instance’s
prediction changes when the feature changes.
27. Individual Conditional Expectation (ICE)
An ICE plot visualizes the dependence of the predicted response on
a feature for EACH instance separately, resulting in multiple lines,
one for each instance, compared to one line in partial dependence
plots. A PDP is the average of the lines of an ICE plot.
The values for a line (one instance) can be computed by keeping all other features fixed, creating variants of the instance by replacing the feature's value with values from a grid, and letting the black box make predictions on these newly created instances. The result is a set of points for the instance, with the feature values from the grid and the respective predictions (see the sketch below).
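A minimal sketch of this computation, assuming a fitted model f_hat, a data matrix X, a feature index j, and a grid of values for that feature:

```python
# ICE: one prediction curve per instance over a grid of feature values.
import numpy as np

def ice_lines(f_hat, X, j, grid):
    lines = np.empty((X.shape[0], len(grid)))
    for k, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, j] = v                       # set feature j to a grid value,
        lines[:, k] = f_hat.predict(X_mod)    # all other features unchanged
    return lines                              # one row (one line) per instance

# The PDP of a feature is then simply the average of its ICE lines:
# pdp = ice_lines(f_hat, X, j, grid).mean(axis=0)
```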
29. Evaluating the Interpretability
Application Level Evaluation: Put the explanation into the
product and let the end user test it.
Human Level Evaluation: a simplified application level evaluation. The difference is that these experiments are conducted not with domain experts but with lay humans. An example would be to show users different explanations and let them choose the best one.
Functional Level Evaluation: This works best when the class of models used has already been evaluated by someone else in a human level evaluation. For example, it might be known that end users understand decision trees. In this case, a proxy for explanation quality might be the depth of the tree: shorter trees would get a better explainability rating.
30. Questions?
Thank you so much for being a part of this talk. You can also write to me at ankitt.nic@gmail.com :)