Interpreting Machine Learning Models:
State-of-the-art, Challenges, Opportunities
Hima Lakkaraju
Schedule for Today
• 9:00am to 10:20am: Introduction; Overview of Inherently Interpretable Models
• 10:20am to 10:40am: Break
• 10:40am to 12:00pm: Overview of Post hoc Explanation Methods
• 12:00pm to 1:00pm: Lunch
• 1:05pm to 1:25pm: Breakout Groups
• 1:25pm to 2:45pm: Evaluating and Analyzing Model Interpretations and Explanations
• 2:45pm to 3:00pm: Break
• 3:00pm to 4:00pm: Analyzing Model Interpretations and Explanations, and Future Research Directions
Motivation
3
Machine Learning is EVERYWHERE!!
[ Weller 2017 ]
Is Model Understanding Needed Everywhere?
4
[ Weller 2017 ]
When and Why Model Understanding?
• Not all applications require model understanding
• E.g., ad/product/friend recommendations
• No human intervention
• Model understanding not needed because:
• Little to no consequences for incorrect predictions
• Problem is well studied and models are extensively validated in real-
world applications 🡪 trust model predictions
5
[ Weller 2017, Lipton 2017, Doshi-Velez and Kim 2016 ]
When and Why Model Understanding?
ML is increasingly being employed in complex high-stakes settings
6
When and Why Model Understanding?
• High-stakes decision-making settings
• Impact on human lives/health/finances
• Settings relatively less well studied, models not extensively validated
• Accuracy alone is no longer enough
• Train/test data may not be representative of data encountered in
practice
• Auxiliary criteria are also critical:
• Nondiscrimination
• Right to explanation
• Safety
When and Why Model Understanding?
• Auxiliary criteria are often hard to quantify (completely)
• E.g.: Impossible to predict/enumerate all scenarios violating safety of an
autonomous car
• Incompleteness in problem formalization
• Hinders optimization and evaluation
• Incompleteness ≠ Uncertainty; Uncertainty can be quantified
When and Why Model Understanding?
9
Model understanding becomes critical when:
(i) models are not extensively validated in applications;
train/test data not representative of real-world data
(ii) key criteria are hard to quantify, and we need to rely on
a “you will know it when you see it” approach
Example: Why Model Understanding?
10
Predictive
Model
Input
Prediction = Siberian Husky
Model Understanding
This model is
relying on incorrect
features to make
this prediction!! Let
me fix the model
Model understanding facilitates debugging.
Example: Why Model Understanding?
11
Predictive
Model
Defendant Details
Prediction = Risky to Release
Model Understanding
Race
Crimes
Gender
This prediction is
biased. Race and
gender are being
used to make the
prediction!!
Model understanding facilitates bias detection.
[ Larson et. al. 2016 ]
Example: Why Model Understanding?
12
Predictive
Model
Loan Applicant Details
Prediction = Denied Loan
Model Understanding
Increase salary by
50K + pay credit
card bills on time
for next 3 months
to get a loan
Loan Applicant
I have some means
for recourse. Let me
go and work on my
promotion and pay
my bills on time.
Model understanding helps provide recourse to individuals
who are adversely affected by model predictions.
Example: Why Model Understanding?
13
Predictive
Model
Patient Data
Model Understanding This model is using
irrelevant features when
predicting on female
subpopulation. I should
not trust its predictions
for that group.
Predictions
[Table: patient records, e.g., (25, Female, Cold), (32, Male, No symptoms), (31, Male, Cough), … with the model’s predictions (Healthy / Sick)]
If gender = female,
if ID_num > 200, then sick
If gender = male,
if cold = true and cough = true, then sick
Model understanding helps assess if and when to trust
model predictions when making decisions.
Example: Why Model Understanding?
14
Predictive
Model
Patient Data
Model Understanding
This model is using
irrelevant features when
predicting on female
subpopulation. This
cannot be approved!
Predictions
[Table: patient records, e.g., (25, Female, Cold), (32, Male, No symptoms), (31, Male, Cough), … with the model’s predictions (Healthy / Sick)]
If gender = female,
if ID_num > 200, then sick
If gender = male,
if cold = true and cough = true, then sick
Model understanding allows us to vet models to determine
if they are suitable for deployment in real world.
Summary: Why Model Understanding?
15
Debugging
Bias Detection
Recourse
If and when to trust model predictions
Vet models to assess suitability for
deployment
Utility
End users (e.g., loan applicants)
Decision makers (e.g., doctors, judges)
Regulatory agencies (e.g., FDA, European
commission)
Researchers and engineers
Stakeholders
Achieving Model Understanding
Take 1: Build inherently interpretable predictive models
16
[ Letham and Rudin 2015; Lakkaraju et. al. 2016 ]
Achieving Model Understanding
Take 2: Explain pre-built models in a post-hoc manner
17
Explainer
[ Ribeiro et. al. 2016, Ribeiro et al. 2018; Lakkaraju et. al. 2019 ]
Inherently Interpretable Models vs.
Post hoc Explanations
In certain settings, accuracy-interpretability trade offs may exist.
18
Example
[ Cireşan et. al. 2012, Caruana et. al. 2006, Frosst et. al. 2017, Stewart 2020 ]
Inherently Interpretable Models vs.
Post hoc Explanations
19
complex models might
achieve higher accuracy
can build interpretable +
accurate models
Sometimes, you don’t have enough data to build your model from scratch.
And, all you have is a (proprietary) black box!
Inherently Interpretable Models vs.
Post hoc Explanations
20
[ Ribeiro et. al. 2016 ]
Inherently Interpretable Models vs.
Post hoc Explanations
21
If you can build an interpretable model which is also adequately
accurate for your setting, DO IT!
Otherwise, post hoc explanations come to the rescue!
Agenda
❑ Inherently Interpretable Models
❑ Post hoc Explanation Methods
❑ Evaluating Model Interpretations/Explanations
❑ Empirically & Theoretically Analyzing Interpretations/Explanations
❑ Future of Model Understanding
22
Agenda
23
❑ Inherently Interpretable Models
❑ Post hoc Explanation Methods
❑ Evaluating Model Interpretations/Explanations
❑ Empirically & Theoretically Analyzing Interpretations/Explanations
❑ Future of Model Understanding
Inherently Interpretable Models
• Rule Based Models
• Risk Scores
• Generalized Additive Models
• Prototype Based Models
Inherently Interpretable Models
• Rule Based Models
• Risk Scores
• Generalized Additive Models
• Prototype Based Models
• Attention Based Models
Bayesian Rule Lists
• A rule list classifier for stroke prediction
[Letham et. al. 2016]
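To make the if/else-if structure concrete, here is a minimal sketch of how a learned rule list is applied at prediction time; the antecedents and risk estimates are illustrative placeholders, not the actual stroke-prediction model from Letham et al.

# Minimal sketch of applying a learned rule list (if / else-if / else).
# The antecedents and risk estimates below are illustrative placeholders.

def rule_list_predict(patient):
    """Walk the ordered rules and return the first matching risk estimate."""
    if patient["hemiplegia"] and patient["age"] > 60:
        return 0.58          # risk estimate if the first antecedent fires
    elif patient["cerebrovascular_disorder"]:
        return 0.47
    elif patient["transient_ischaemic_attack"]:
        return 0.23
    else:
        return 0.05          # default rule

print(rule_list_predict({"hemiplegia": True, "age": 72,
                         "cerebrovascular_disorder": False,
                         "transient_ischaemic_attack": False}))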
Bayesian Rule Lists
• A generative model designed to produce rule lists (if/else-if) that
strike a balance between accuracy, interpretability, and
computation
• What about using other similar models?
• Decision trees (CART, C5.0 etc.)
• They employ greedy construction methods
• Not computationally demanding but affects quality of solution – both
accuracy and interpretability
27
Bayesian Rule Lists: Generative Model
28
is a set of pre-mined
antecedents
Model parameters are inferred using the Metropolis-Hastings algorithm which is a
Markov Chain Monte Carlo (MCMC) Sampling method
Pre-mined Antecedents
• A major source of practical feasibility: pre-mined antecedents
• Reduces model space
• Complexity of problem depends on number of pre-mined antecedents
• As long as pre-mined set is expressive, accurate decision list can
be found + smaller model space means better generalization
(Vapnik, 1995)
29
Interpretable Decision Sets
• A decision set classifier for disease diagnosis
[Lakkaraju et. al. 2016]
Interpretable Decision Sets: Desiderata
• Optimize for the following criteria
• Recall
• Precision
• Distinctness
• Parsimony
• Class Coverage
• Recall and Precision 🡪 Accurate predictions
• Distinctness, Parsimony, and Class Coverage 🡪 Interpretability
IDS: Objective Function
32
IDS: Objective Function
33
IDS: Objective Function
34
IDS: Objective Function
35
IDS: Objective Function
36
IDS: Objective Function
37
IDS: Optimization Procedure
38
• The problem is a non-normal, non-monotone, submodular
optimization problem
• Maximizing a non-monotone submodular function is NP-hard
• Local search method which iteratively adds and removes
elements until convergence
• Provides a 2/5 approximation
Inherently Interpretable Models
• Rule Based Models
• Risk Scores
• Generalized Additive Models
• Prototype Based Models
• Attention Based Models
Risk Scores: Motivation
• Risk scores are widely used in medicine and criminal justice
• E.g., assess risk of mortality in ICU, assess the risk of recidivism
• Adoption 🡪 decision makers find them easy to understand
• Until very recently, risk scores were constructed manually by
domain experts. Can we learn these in a data-driven fashion?
40
Risk Scores: Examples
41
• Recidivism
• Loan Default
[Ustun and Rudin, 2016]
Objective function to learn risk scores
42
The above objective turns out to be a mixed integer program, and is optimized using a cutting plane
method and a branch-and-bound technique.
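As an illustration of how a learned risk score is used at decision time, here is a minimal sketch with made-up integer points and a logistic mapping from total points to risk; it is not the Ustun and Rudin model.

import math

# Illustrative risk score with made-up integer points (not a learned model).
# Each condition contributes a small integer; the total is mapped to a risk
# estimate, e.g., via a logistic function calibrated during training.
POINTS = {
    "age_ge_75": 2,
    "prior_offenses_ge_3": 3,
    "employed": -1,
}

def risk(features, intercept=-3.0):
    total = sum(p for name, p in POINTS.items() if features.get(name))
    return 1.0 / (1.0 + math.exp(-(intercept + total)))

print(risk({"age_ge_75": True, "prior_offenses_ge_3": True, "employed": False}))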
Inherently Interpretable Models
• Rule Based Models
• Risk Scores
• Generalized Additive Models
• Prototype Based Models
• Attention Based Models
Generalized Additive Models (GAMs)
44
[Lou et. al., 2012; Caruana et. al., 2015]
Formulation and Characteristics of GAMs
45
g(E[y]) = f1(x1) + f2(x2) + … + fn(xn)
g is a link function; e.g., the identity function in the case of regression,
and the logit log(y / (1 − y)) in the case of classification;
each fi is a shape function
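A hedged sketch of fitting a GAM-style glass-box model, assuming the interpret package (linked later in this tutorial) is installed; its Explainable Boosting Machine learns the per-feature shape functions with boosted trees rather than splines.

# Sketch: fit a GAM-style glass-box model and inspect its shape functions.
# Assumes the `interpret` package is installed.
from sklearn.datasets import load_breast_cancer
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
ebm = ExplainableBoostingClassifier(interactions=0)  # pure GAM: no pairwise terms
ebm.fit(X, y)
show(ebm.explain_global())  # one shape function per feature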
GAMs and GA2Ms
• While GAMs model first order terms, GA2Ms model second order
feature interactions as well.
GAMs and GA2Ms
• Learning:
• Represent each component as a spline
• Least squares formulation; Optimization problem to balance
smoothness and empirical error
• GA2Ms: Build a GAM first, and then detect and rank all possible
pairs of interactions in the residual
• Choose top k pairs
• k determined by CV
47
Inherently Interpretable Models
• Rule Based Models
• Risk Scores
• Generalized Additive Models
• Prototype Based Models
• Attention Based Models
Prototype Selection for Interpretable
Classification
• The goal is to identify K prototypes (instances) from the data such
that a new instance, when assigned the same label as its closest
prototype, is correctly classified with high probability
• Let each instance “cover” the ϵ-neighborhood around it.
• Once we define the neighborhood covered by each instance, this
problem becomes similar to the problem of finding rule sets, and
can be solved analogously.
[Bien et. al., 2012]
Prototype Selection for Interpretable
Classification
Prototype Layers in Deep Learning Models
[Li et. al. 2017, Chen et. al. 2019]
Prototype Layers in Deep Learning Models
Prototype Layers in Deep Learning Models
Prototype Layers in Deep Learning Models
Prototype Layers in Deep Learning Models
Inherently Interpretable Models
• Rule Based Models
• Risk Scores
• Generalized Additive Models
• Prototype Based Models
• Attention Based Models
Attention Layers in Deep Learning Models
• Let us consider the example of machine translation
[Bahdanau et. al. 2016; Xu et. al. 2015]
[Diagram: Input tokens (I, am, Bob) → Encoder hidden states (h1, h2, h3) → single Context Vector C → Decoder states (s1, s2, s3) → Output tokens (Je, suis, Bob)]
Attention Layers in Deep Learning Models
• Let us consider the example of machine translation
[Bahdanau et. al. 2016; Xu et. al. 2015]
[Diagram: Input tokens (I, am, Bob) → Encoder hidden states (h1, h2, h3) → per-step context vectors (c1, c2, c3) computed via attention → Decoder states (s1, s2, s3) → Output tokens (Je, suis, Bob)]
Attention Layers in Deep Learning Models
• The context vector corresponding to decoder state si can be written as ci = Σj αij hj
• αij captures the attention placed on input token j when
determining the decoder hidden state si; it can be computed as a
softmax of the “match” between si-1 and hj
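In equation form (following Bahdanau et al.’s additive attention), the context vector, attention weights, and alignment scores are:

c_i = \sum_j \alpha_{ij} h_j, \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})}, \qquad
e_{ij} = a(s_{i-1}, h_j)

where a(\cdot,\cdot) is a small learned alignment (“match”) network.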
Inherently Interpretable Models
• Rule Based Models
• Risk Scores
• Generalized Additive Models
• Prototype Based Models
• Attention Based Models
Agenda
61
❑ Inherently Interpretable Models
❑ Post hoc Explanation Methods
❑ Evaluating Model Interpretations/Explanations
❑ Empirically & Theoretically Analyzing Interpretations/Explanations
❑ Future of Model Understanding
What is an Explanation?
62
What is an Explanation?
Definition: Interpretable description of the model behavior
63
Classifier User
Explanation
Faithful Understandable
Summarize with a program/rule/tree
Classifier
What is an Explanation?
Definition: Interpretable description of the model behavior
64
User
Send all the model parameters θ?
Send many example predictions?
Select most important features/points
Describe how to flip the model prediction
...
[ Lipton 2016 ]
Local Explanations vs. Global Explanations
65
Local explanations: explain individual predictions; help unearth biases in the local neighborhood of a given instance; help vet if individual predictions are being made for the right reasons.
Global explanations: explain the complete behavior of the model; help shed light on big-picture biases affecting larger subgroups; help vet if the model, at a high level, is suitable for deployment.
Local Explanations
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Approaches for Post hoc Explainability
66
Global Explanations
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
Local Explanations
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Approaches for Post hoc Explainability
67
Global Explanations
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
LIME: Local Interpretable Model-Agnostic
Explanations
68
1. Sample points around xi
[ Ribeiro et al. 2016 ]
LIME: Local Interpretable Model-Agnostic
Explanations
69
1. Sample points around xi
2. Use model to predict labels for each sample
LIME: Local Interpretable Model-Agnostic
Explanations
70
1. Sample points around xi
2. Use model to predict labels for each sample
3. Weigh samples according to distance to xi
LIME: Local Interpretable Model-Agnostic
Explanations
71
1. Sample points around xi
2. Use model to predict labels for each sample
3. Weigh samples according to distance to xi
4. Learn simple linear model on weighted
samples
LIME: Local Interpretable Model-Agnostic
Explanations
72
1. Sample points around xi
2. Use model to predict labels for each sample
3. Weigh samples according to distance to xi
4. Learn simple linear model on weighted
samples
5. Use simple linear model to explain
Only 1 mistake!
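A minimal sketch of these five steps for tabular data, using numpy and scikit-learn; the Gaussian perturbation scheme and the exponential distance kernel are simplifying assumptions rather than the exact choices made inside the LIME package.

# Sketch of LIME's five steps for a tabular instance (simplified: Gaussian
# perturbations, exponential distance kernel, ridge surrogate).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x_i = X[0]
rng = np.random.default_rng(0)

# 1. Sample points around x_i
Z = x_i + rng.normal(scale=0.5, size=(1000, X.shape[1]))
# 2. Use the black-box model to predict labels for each sample
labels = black_box.predict_proba(Z)[:, 1]
# 3. Weigh samples according to their distance to x_i
dists = np.linalg.norm(Z - x_i, axis=1)
weights = np.exp(-(dists ** 2) / 0.75)
# 4. Learn a simple linear model on the weighted samples
surrogate = Ridge(alpha=1.0).fit(Z, labels, sample_weight=weights)
# 5. Use the linear model's coefficients to explain the prediction for x_i
print(dict(enumerate(surrogate.coef_.round(3))))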
Predict Wolf vs Husky
73
[ Ribeiro et al. 2016 ]
Predict Wolf vs Husky
We’ve built a great snow detector…
74
[ Ribeiro et al. 2016 ]
SHAP: Shapley Values as Importance
Marginal contribution of each feature towards the prediction,
averaged over all possible permutations.
75
Example: with feature xi included in the feature set O, P(y) = 0.9; with xi removed (O/xi), P(y) = 0.8; marginal contribution M(xi, O) = 0.9 − 0.8 = 0.1
Attributes the prediction to each of the features.
[ Lundberg & Lee 2017 ]
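The permutation-averaging definition can be sketched directly for a model with only a handful of features; this brute-force version, which “removes” absent features by mean imputation, is illustrative only (KernelSHAP and TreeSHAP avoid the exponential cost).

# Brute-force Shapley values: average each feature's marginal contribution
# over all permutations; absent features are replaced by their mean value.
import numpy as np
from itertools import permutations
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)
background = X.mean(axis=0)          # reference values for "absent" features
x_i = X[0]

def value(present):
    z = background.copy()
    z[list(present)] = x_i[list(present)]
    return model.predict_proba(z.reshape(1, -1))[0, 1]

n = X.shape[1]
phi = np.zeros(n)
perms = list(permutations(range(n)))
for order in perms:
    included = []
    for j in order:
        before = value(included)
        included.append(j)
        phi[j] += value(included) - before   # marginal contribution of j
phi /= len(perms)
print(phi.round(3))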
Local Explanations
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Approaches for Post hoc Explainability
76
Global Explanations
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
Anchors
[ Ribeiro et al. 2018 ]
• Perturb a given instance x to generate local
neighborhood
• Identify an “anchor” rule which has the maximum
coverage of the local neighborhood and also achieves a
high precision.
Salary Prediction
[ Ribeiro et al. 2018 ]
Local Explanations
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Approaches for Post hoc Explainability
79
Global Explanations
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
Saliency Map Overview
80
Input Model Predictions
Junco Bird
81
What parts of the input are most relevant for the model’s prediction: ‘Junco Bird’?
Input Model Predictions
Junco Bird
Saliency Map Overview
82
What parts of the input are most relevant for the model’s prediction: ‘Junco Bird’?
Input Model Predictions
Junco Bird
Saliency Map Overview
Modern DNN Setting
83
Model
class specific logit
Input Model Predictions
Junco Bird
Input-Gradient
84
Input-Gradient: the gradient of the class-specific logit with respect to the input;
it has the same dimension as the input.
Input Model Predictions
Junco Bird
Baehrens et. al. 2010; Simonyan et. al. 2014 .
Input-Gradient
85
Input Model Predictions
Junco Bird
Baehrens et. al. 2010; Simonyan et. al. 2014 .
Input-Gradient
Logit
Visualize as a heatmap
Input
Input-Gradient
86
Input Model Predictions
Junco Bird
Input-Gradient
Logit
Input
Challenges
● Visually noisy & difficult to
interpret.
● ‘Gradient saturation.’
Shrikumar et. al. 2017.
Baehrens et. al. 2010; Simonyan et. al. 2014 .
SmoothGrad
87
Input Model Predictions
Junco Bird
Smilkov et. al. 2017
SmoothGrad
Gaussian noise
Average Input-gradient of
‘noisy’ inputs.
SmoothGrad
88
Input Model Predictions
Junco Bird
Smilkov et. al. 2017
SmoothGrad
Gaussian noise
Average Input-gradient of
‘noisy’ inputs.
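A hedged sketch of the input-gradient and its SmoothGrad variant in PyTorch; `model` is assumed to be any differentiable classifier that maps a batched image tensor to class logits, and the noise scale and sample count are illustrative choices.

# Sketch: input-gradient saliency and SmoothGrad for a differentiable
# classifier `model` that maps an image tensor to class logits.
import torch

def input_gradient(model, x, class_idx):
    x = x.clone().requires_grad_(True)
    logit = model(x.unsqueeze(0))[0, class_idx]   # class-specific logit
    logit.backward()
    return x.grad                                  # same shape as the input

def smoothgrad(model, x, class_idx, n=25, sigma=0.15):
    grads = torch.zeros_like(x)
    for _ in range(n):
        noisy = x + sigma * torch.randn_like(x)    # add Gaussian noise
        grads += input_gradient(model, noisy, class_idx)
    return grads / n                               # average "noisy" gradients

# Usage (illustrative): pass any torchvision classifier in eval() mode and an
# input tensor; visualize the absolute values of the result as a heatmap.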
Integrated Gradients
89
Input Model Predictions
Junco Bird
Baseline input
Path integral: ‘sum’ of
interpolated gradients
Sundararajan et. al. 2017
Integrated Gradients
90
Input Model Predictions
Junco Bird
Path integral: ‘sum’ of
interpolated gradients
Sundararajan et. al. 2017
Baseline input
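A sketch of the Riemann-sum approximation to this path integral in PyTorch; the all-zeros baseline and 50 interpolation steps are common but assumed choices, and `model` is again any differentiable classifier returning logits.

# Sketch: Integrated Gradients as a Riemann-sum approximation of the path
# integral from a baseline (here all zeros) to the input.
import torch

def integrated_gradients(model, x, class_idx, baseline=None, steps=50):
    baseline = torch.zeros_like(x) if baseline is None else baseline
    total = torch.zeros_like(x)
    for k in range(1, steps + 1):
        point = (baseline + (k / steps) * (x - baseline)).requires_grad_(True)
        logit = model(point.unsqueeze(0))[0, class_idx]
        grad = torch.autograd.grad(logit, point)[0]   # gradient at this step
        total += grad
    return (x - baseline) * total / steps             # same shape as the input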
Gradient-Input
91
Input Model Predictions
Junco Bird
Gradient-Input
Input gradient
Input
Element-wise product of input-
gradient and input.
Shrikumar et. al. 2017, Ancona et. al. 2018.
Gradient-Input
92
Input Model Predictions
Junco Bird
Gradient-Input
logit gradient
Input
Shrikumar et. al. 2017, Ancona et. al. 2018.
Element-wise product of input-
gradient and input.
Local Explanations
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Approaches for Post hoc Explainability
93
Global Explanations
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
Prototypes/Example Based Post hoc Explanations
Use examples (synthetic or natural) to explain individual predictions
◆ Influence Functions (Koh & Liang 2017)
● Identify instances in the training set that are responsible for the prediction of a
given test instance
◆ Activation Maximization (Erhan et al. 2009)
● Identify examples (synthetic or natural) that strongly activate a function (neuron)
of interest
94
95
Input Model Predictions
Junco Bird
Training Point Ranking via Influence Functions
Which training data points have the most ‘influence’ on the test loss?
96
Input Model Predictions
Junco Bird
Which training data points have the most ‘influence’ on the test loss?
Training Point Ranking via Influence Functions
97
Training Point Ranking via Influence Functions
Influence Function: classic tool used in robust statistics for assessing
the effect of a sample on regression parameters (Cook & Weisberg, 1980).
Instead of refitting model for every data point, Cook’s distance provides analytical
alternative.
98
Training Point Ranking via Influence Functions
Koh & Liang 2017
Training sample point
Koh & Liang (2017) extend the ‘Cook’s distance’ insight to modern machine learning setting.
99
Training Point Ranking via Influence Functions
Koh & Liang 2017
Training sample point
ERM Solution
Koh & Liang (2017) extend the ‘Cook’s distance’ insight to modern machine learning setting.
UpWeighted ERM Solution
100
Training Point Ranking via Influence Functions
Koh & Liang 2017
Training sample point
ERM Solution
Koh & Liang (2017) extend the ‘Cook’s distance’ insight to modern machine learning setting.
UpWeighted ERM Solution
Influence of Training Point on Parameters
Influence of Training Point on Test-Input’s loss
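Written out (following Koh & Liang 2017), the two influence quantities named above are:

\mathcal{I}_{\text{up,params}}(z) \;=\; \frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\Big|_{\epsilon=0} \;=\; -H_{\hat{\theta}}^{-1}\,\nabla_\theta L(z,\hat{\theta})

\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) \;=\; -\nabla_\theta L(z_{\text{test}},\hat{\theta})^{\top}\, H_{\hat{\theta}}^{-1}\, \nabla_\theta L(z,\hat{\theta})

where H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2 L(z_i,\hat{\theta}) is the Hessian of the training loss at the ERM solution \hat{\theta}.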
101
Training Point Ranking via Influence Functions
Applications:
• compute self-influence to identify mislabelled
examples;
• diagnose possible domain mismatch;
• craft training-time poisoning examples.
[ Koh & Liang 2017 ]
102
Challenges and Other Approaches
Influence function Challenges:
1. Scalability: computing Hessian-vector products can be tedious in
practice.
2. Non-convexity: possibly loose approximation for deeper networks
(Basu et. al. 2020).
103
Challenges and Other Approaches
Influence function Challenges:
1. Scalability: computing Hessian-vector products can be tedious in
practice.
2. Non-convexity: possibly loose approximation for ‘deeper’ networks
(Basu et. al. 2020).
Alternatives:
• Representer Points (Yeh et. al. 2018).
• TracIn (Pruthi et. al., NeurIPS 2020).
Activation Maximization
104
These approaches identify examples, synthetic or natural, that
strongly activate a function (neuron) of interest.
Activation Maximization
105
These approaches identify examples, synthetic or natural, that
strongly activate a function (neuron) of interest.
Implementation Flavors:
• Search for natural examples within a specified set
(training or validation corpus) that strongly activate a
neuron of interest;
• Synthesize examples, typically via gradient descent,
that strongly activate a neuron of interest.
Feature Visualization
106
Olah et. al. 2017
Local Explanations
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Approaches for Post hoc Explainability
107
Global Explanations
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
Counterfactual Explanations
What features need to be changed and by how much to flip a model’s prediction?
108
[Goyal et. al., 2019]
Counterfactual Explanations
As ML models increasingly deployed to make high-stakes decisions
(e.g., loan applications), it becomes important to provide recourse
to affected individuals.
109
Counterfactual Explanations
What features need to be changed and by
how much to flip a model’s prediction ?
(i.e., to reverse an unfavorable outcome).
Predictive
Model
Deny Loan
Loan Application
Counterfactual Explanations
110
Recourse: Increase your salary by 5K & pay your credit card bills on time for next 3 months
f(x)
Applicant
Counterfactual Generation
Algorithm
Generating Counterfactual Explanations:
Intuition
111
Proposed solutions differ on:
1. How to choose among
candidate counterfactuals?
2. How much access is needed to
the underlying predictive model?
Take 1: Minimum Distance Counterfactuals
112
argmin_x′ max_λ λ (f(x′) − y′)² + d(x, x′)
where x is the original instance, x′ the counterfactual, f the predictive model, y′ the desired outcome, and d a distance metric.
[ Wachter et. al., 2018 ]
Choice of distance metric dictates what kinds of counterfactuals are chosen.
Wachter et. al. use normalized Manhattan distance.
Take 1: Minimum Distance Counterfactuals
113
Wachter et. al. solve a differentiable, unconstrained version of the objective
using ADAM optimization algorithm with random restarts.
This method requires access to gradients of the underlying predictive model.
[ Wachter et. al., 2018 ]
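A hedged sketch of this kind of gradient-based counterfactual search in PyTorch; the fixed λ, step count, and plain L1 distance are simplifying assumptions (Wachter et al. instead grow λ iteratively and use ADAM with random restarts), and `model` is assumed to be differentiable and to return a single score per instance.

# Sketch: gradient-based search for a minimum-distance counterfactual.
# Simplifications: fixed lambda and L1 distance; `model` returns a
# (batch, 1) score tensor and is differentiable.
import torch

def counterfactual(model, x, target=1.0, lam=1.0, lr=0.05, steps=500):
    x_cf = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = model(x_cf.unsqueeze(0))[0, 0]                  # predicted score
        loss = lam * (pred - target) ** 2 + (x_cf - x).abs().sum()
        loss.backward()
        opt.step()
    return x_cf.detach()                                       # candidate counterfactual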
Take 1: Minimum Distance Counterfactuals
114
Not feasible to act upon these features!
Take 2: Feasible and Least Cost Counterfactuals
• is the set of feasible counterfactuals (input by end user)
• E.g., changes to race, gender are not feasible
• Cost is modeled as total log-percentile shift
• Changes become harder when starting off from a higher percentile value
115
[ Ustun et. al., 2019 ]
• Ustun et. al. only consider the case where the model is a linear classifier
• Objective formulated as an IP and optimized using CPLEX
• Requires complete access to the linear classifier i.e., weight vector
116
Take 2: Feasible and Least Cost Counterfactuals
[ Ustun et. al., 2019 ]
117
Question: What if we have a black box or a non-linear classifier?
Answer: generate a local linear model approximation (e.g., using LIME)
and then apply Ustun et. al.’s framework
Take 2: Feasible and Least Cost Counterfactuals
[ Ustun et. al., 2019 ]
118
Changing one feature without affecting another might not be possible!
Take 2: Feasible and Least Cost Counterfactuals
[ Ustun et. al., 2019 ]
After 1 year
Take 3: Causally Feasible Counterfactuals
119
Recourse:
Reduce current debt
from 3250$ to 1000$
My current debt has
reduced to 1000$.
Please give me loan.
Loan Applicant
f(x)
Your age increased by 1
year and the recourse is
no longer valid! Sorry!
Important to account for feature interactions when generating counterfactuals!
But how?!
Loan Applicant Predictive Model
[ Mahajan et. al., 2019, Karimi et. al. 2020 ]
120
Take 3: Causally Feasible Counterfactuals
[ Ustun et. al., 2019 ]
is the set of causally feasible counterfactuals permitted according to a given
Structural Causal Model (SCM).
Question: What if we don’t have access to the structural causal model?
Counterfactuals on Data Manifold
121
• Generated counterfactuals should lie on the data manifold
• Construct Variational Autoencoders (VAEs) to map input instances to latent
space
• Search for counterfactuals in the latent space
• Once a counterfactual is found, map it back to the input space using the
decoder
[ Verma et. al., 2020, Pawelczyk et. al., 2020]
Local Explanations
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Approaches for Post hoc Explainability
122
Global Explanations
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
Global Explanations
● Explain the complete behavior of a given (black box) model
○ Provide a bird’s eye view of model behavior
● Help detect big picture model biases persistent across larger subgroups
of the population
○ Impractical to manually inspect local explanations of several instances to
ascertain big picture biases!
● Global explanations are complementary to local explanations
123
Local vs. Global Explanations
124
Explain individual predictions
Help unearth biases in the local
neighborhood of a given instance
Help vet if individual predictions are
being made for the right reasons
Explain complete behavior of the model
Help shed light on big picture biases
affecting larger subgroups
Help vet if the model, at a high level, is
suitable for deployment
Local Explanations
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Approaches for Post hoc Explainability
125
Global Explanations
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
Global Explanation as a Collection of Local Explanations
How to generate a global explanation of a (black box) model?
• Generate a local explanation for every instance in the data using
one of the approaches discussed earlier
• Pick a subset of k local explanations to constitute the global
explanation
126
What local explanation technique to use?
How to choose the subset of k local explanations?
Global Explanations from Local Feature Importances: SP-LIME
LIME explains a single prediction
local behavior for a single instance
Can’t examine all explanations
Instead pick k explanations to show to the user
Diverse
Should not be redundant in
their descriptions
Representative
Should summarize the
model’s global behavior
Single explanation
SP-LIME uses submodular optimization
and greedily picks k explanations
83
Model Agnostic
[ Ribeiro et al. 2016 ]
84
Global Explanations from Local Rule Sets: SP-Anchor
• Use Anchors algorithm discussed earlier to obtain local rule sets
for every instance in the data
• Use the same procedure to greedily select a subset of k local rule
sets to correspond to the global explanation
[ Ribeiro et al. 2018 ]
Local Explanations
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Approaches for Post hoc Explainability
129
Global Explanations
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
130
Representation Based Approaches
• Derive model understanding by analyzing intermediate representations of a DNN.
• Determine model’s reliance on ‘concepts’ that are semantically meaningful to
humans.
Representation Based Explanations
131
[Kim et. al., 2018]
Zebra
(0.97)
How important is the notion of “stripes” for this prediction?
Representation Based Explanations: TCAV
132
Examples of the concept “stripes”
Random examples
Train a linear classifier to separate
activations
The vector orthogonal to the decision boundary pointing
towards the “stripes” class quantifies the concept “stripes”
Compute derivatives by leveraging this vector to determine
the importance of the notion of stripes for any given
prediction
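A rough sketch of the core TCAV computation; it assumes you have already collected layer activations for concept and random examples, and that per-example gradients of the class logit with respect to those activations are available (computing them requires hooks into the network, omitted here).

# Sketch of TCAV's core computation, given precomputed activations and
# precomputed gradients of the class logit w.r.t. those activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    X = np.vstack([concept_acts, random_acts])
    y = np.r_[np.ones(len(concept_acts)), np.zeros(len(random_acts))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    w = clf.coef_[0]
    return w / np.linalg.norm(w)     # direction pointing toward the concept

def tcav_score(cav, logit_grads):
    # Fraction of inputs whose directional derivative along the CAV is positive.
    return float(np.mean(logit_grads @ cav > 0))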
Local Explanations
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Approaches for Post hoc Explainability
134
Global Explanations
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
Model Distillation for Generating Global Explanations
135
[Diagram: Data (feature vectors v1, v2, …) → Predictive Model f(x) → Model Predictions (Label 1, Label 2, …) → Explainer: a simpler, interpretable model which is optimized to mimic the model predictions]
Generalized Additive Models as Global Explanations
136
[Diagram: Data → Black Box Model → Model Predictions → Explainer (a GAM fit to the black box’s predictions)]
[Tan et. al., 2019]
Model Agnostic
Generalized Additive Models as Global Explanations:
Shape Functions for Predicting Bike Demand
137
[Tan et. al., 2019]
Generalized Additive Models as Global Explanations:
Shape Functions for Predicting Bike Demand
How does bike demand vary as a function of temperature?
138
[Tan et. al., 2019]
Generalized Additive Models as Global Explanations
Generalized Additive Model (GAM) :
139
[Tan et. al., 2019]
Shape functions of
individual features
Higher order
feature interaction
terms
Fit this model to the predictions of the black box to obtain the shape functions.
ŷ =
Decision Trees as Global Explanations
140
[Diagram: Data → Black Box Model → Model Predictions → Explainer (a decision tree fit to the black box’s predictions)]
[ Bastani et. al., 2019 ]
Model Agnostic
Customizable Decision Sets as Global Explanations
141
[Diagram: Data → Black Box Model → Model Predictions → Explainer (a decision set fit to the black box’s predictions)]
Model Agnostic
[ Lakkaraju et. al., 2019 ]
142
Customizable Decision Sets as Global Explanations
Subgroup Descriptor
Decision Logic
[ Lakkaraju et. al., 2019 ]
143
Customizable Decision Sets as Global Explanations
Explain how the model
behaves across patient
subgroups with different
values of smoking and
exercise
[ Lakkaraju et. al., 2019 ]
144
Customizable Decision Sets as Global Explanations:
Desiderata & Optimization Problem
Fidelity
Describe model behavior accurately
Unambiguity
No contradicting explanations
Simplicity
Users should be able to look at the explanation
and reason about model behavior
Customizability
Users should be able to understand model
behavior across various subgroups of
interest
Fidelity
Minimize number of instances for which
explanation’s label ≠ model prediction
Unambiguity
Minimize the number of duplicate rules
applicable to each instance
Simplicity
Minimize the number of conditions in rules;
Constraints on number of rules & subgroups;
Customizability
Outer rules should only comprise features
of user interest (candidate set restricted)
[ Lakkaraju et. al., 2019 ]
● The complete optimization problem is non-negative, non-normal,
non-monotone, and submodular with matroid constraints
● Solved using the well-known smooth local search algorithm (Feige
et. al., 2007) with best known optimality guarantees.
145
Customizable Decision Sets as Global Explanations
[ Lakkaraju et. al., 2019 ]
Local Explanations
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Approaches for Post hoc Explainability
146
Global Explanations
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
Predictive
Model
Counterfactual Explanations
147
f(x)
Counterfactual Generation
Algorithm
DENIED
LOANS
RECOURSES
How do recourses permitted by the model vary
across various racial & gender subgroups?
Are there any biases against certain
demographics?
[ Rawal and Lakkaraju, 2020 ]
Decision Maker
(or) Regulatory Authority
Predictive
Model
Customizable Global Summaries of Counterfactuals
148
f(x)
Algorithm for generating
global summaries of
counterfactuals
DENIED
LOANS
How do recourses permitted by the model vary
across various racial & gender subgroups?
Are there any biases against certain
demographics?
[ Rawal and Lakkaraju, 2020 ]
149
Omg! This model is biased. It requires
certain demographics to “act upon” a lot
more features than others.
Subgroup Descriptor
Recourse Rules
Customizable Global Summaries of Counterfactuals
[ Rawal and Lakkaraju, 2020 ]
150
Customizable Global Summaries of Counterfactuals:
Desiderata & Optimization Problem
Recourse Correctness
Prescribed recourses should obtain desirable outcomes
Recourse Correctness
Minimize number of applicants for whom prescribed recourse
does not lead to desired outcome
Recourse Coverage
Minimize number of applicants for whom recourse does not exist
(i.e., satisfy no rule).
Minimal Recourse Costs
Minimize total feature costs as well as magnitude of changes
in feature values
Interpretability of Summaries
Constraints on # of rules, # of conditions in rules & # of subgroups
Recourse Coverage
(Almost all) applicants should be provided with recourses
Minimal Recourse Costs
Acting upon a prescribed recourse
should not be impractical or terribly expensive
Interpretability of Summaries
Summaries should be readily understandable to
stakeholders (e.g., decision makers/regulatory authorities).
Customizability
Stakeholders should be able to understand model behavior
across various subgroups of interest
Customizability
Outer rules should only comprise features of stakeholder interest
(candidate set restricted)
[ Rawal and Lakkaraju, 2020 ]
● The complete optimization problem is non-negative, non-normal,
non-monotone, and submodular with matroid constraints
● Solved using the well-known smooth local search algorithm (Feige
et. al., 2007) with best known optimality guarantees.
151
Customizable Global Summaries of Counterfactuals
[ Rawal and Lakkaraju, 2020 ]
Breakout Groups
• What concepts/ideas/approaches from our morning discussion stood out to you ?
• We discussed different basic units of interpretation -- prototypes, rules, risk
scores, shape functions (GAMs), feature importances
• Are some of these more suited to certain data modalities (e.g., tabular, images,
text) than others?
• What could be some potential vulnerabilities/drawbacks of inherently
interpretable models and post hoc explanation methods?
• Given the diversity of the methods we discussed, how do we go about evaluating
inherently interpretable models and post hoc explanation methods?
Agenda
153
❑ Inherently Interpretable Models
❑ Post hoc Explanation Methods
❑ Evaluating Model Interpretations/Explanations
❑ Empirically & Theoretically Analyzing Interpretations/Explanations
❑ Future of Model Understanding
Evaluating Model
Interpretations/Explanations
• Evaluating the meaningfulness or correctness of explanations
• Diverse ways of doing this depending on the type of model
interpretation/explanation
• Evaluating the interpretability of explanations
Evaluating Interpretability
155
[ Doshi-Velez and Kim, 2017 ]
Evaluating Interpretability
• Functionally-grounded evaluation: Quantitative metrics – e.g.,
number of rules, prototypes --> lower is better!
• Human-grounded evaluation: binary forced choice, forward
simulation/prediction, counterfactual simulation
• Application-grounded evaluation: Domain expert with exact
application task or simpler/partial task
Evaluating Inherently Interpretable
Models
• Evaluating the accuracy of the resulting model
• Evaluating the interpretability of the resulting model
• Do we need to evaluate the “correctness” or “meaningfulness” of
the resulting interpretations?
Evaluating Bayesian Rule Lists
• A rule list classifier for stroke prediction
[Letham et. al. 2016]
Evaluating Interpretable Decision Sets
• A decision set classifier for disease diagnosis
[Lakkaraju et. al. 2016]
Evaluating Interpretability of Bayesian
Rule Lists and Interpretable Decision Sets
• Number of rules, predicates etc. 🡪 lower is better!
• User studies to compare interpretable decision sets to Bayesian
Decision Lists (Letham et. al.)
• Each user is randomly assigned one of the two models
• 10 objective and 2 descriptive questions per user
160
Interface for Objective Questions
161
Interface for Descriptive Questions
162
User Study Results
163
Task         Metric                    Our Approach   Bayesian Decision Lists
Descriptive  Human Accuracy            0.81           0.17
             Avg. Time Spent (secs.)   113.4          396.86
             Avg. # of Words           31.11          120.57
Objective    Human Accuracy            0.97           0.82
             Avg. Time Spent (secs.)   28.18          36.34
Objective Questions: 17% more accurate, 22% faster;
Descriptive Questions: 74% fewer words, 71% faster.
Evaluating Prototype and Attention Layers
• Are prototypes and attention weights always meaningful?
• Do attention weights correlate with other measures of feature
importance? E.g., gradients
• Would alternative attention weights yield different predictions?
[Jain and Wallace, 2019]
Evaluating Post hoc Explanations
• Evaluating the faithfulness (or correctness) of post hoc
explanations
• Evaluating the stability of post hoc explanations
• Evaluating the fairness of post hoc explanations
• Evaluating the interpretability of post hoc explanations
[Agarwal et. al., 2022]
Evaluating Faithfulness of Post hoc
Explanations – Ground Truth
166
Evaluating Faithfulness of Post hoc
Explanations – Ground Truth
167
Spearman rank correlation coefficient computed over features of
interest
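For example, with scipy this ground-truth comparison is essentially a one-liner; the importance vectors below are placeholders.

# Rank agreement between ground-truth feature importances and an explanation.
from scipy.stats import spearmanr

ground_truth = [0.7, 0.1, 0.0, 0.2]     # placeholder "true" importances
explanation = [0.6, 0.2, 0.05, 0.15]    # placeholder attribution scores
rho, _ = spearmanr(ground_truth, explanation)
print(rho)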
Evaluating Faithfulness of Post hoc
Explanations – Explanations as Models
168
• If the explanation is itself a model (e.g., linear model fit by LIME),
we can compute the fraction of instances for which the labels
assigned by explanation model match those assigned by the
underlying model
Evaluating Faithfulness of Post hoc
Explanations
• What if we do not have any ground truth?
• What if explanations cannot be considered as models that output
predictions?
How important are selected features?
• Deletion: remove important features and see what happens..
170
% of Pixels deleted
Prediction
Probability
[ Qi, Khorram, Fuxin, 2020 ]
How important are selected features?
• Deletion: remove important features and see what happens..
171
% of Pixels deleted
Prediction
Probability
[ Qi, Khorram, Fuxin, 2020 ]
How important are selected features?
• Deletion: remove important features and see what happens..
172
% of Pixels deleted
Prediction
Probability
[ Qi, Khorram, Fuxin, 2020 ]
How important are selected features?
• Deletion: remove important features and see what happens..
173
% of Pixels deleted
Prediction
Probability
[ Qi, Khorram, Fuxin, 2020 ]
How important are selected features?
• Deletion: remove important features and see what happens..
174
% of Pixels deleted
Prediction
Probability
[ Qi, Khorram, Fuxin, 2020 ]
How important are selected features?
• Deletion: remove important features and see what happens..
175
% of Pixels deleted
Prediction
Probability
[ Qi, Khorram, Fuxin, 2020 ]
How important are selected features?
• Insertion: add important features and see what happens..
176
% of Pixels inserted
Prediction
Probability
[ Qi, Khorram, Fuxin, 2020 ]
How important are selected features?
• Insertion: add important features and see what happens..
177
Prediction
Probability
% of Pixels inserted
[ Qi, Khorram, Fuxin, 2020 ]
How important are selected features?
• Insertion: add important features and see what happens..
178
Prediction
Probability
% of Pixels inserted
[ Qi, Khorram, Fuxin, 2020 ]
How important are selected features?
• Insertion: add important features and see what happens..
179
Prediction
Probability
% of Pixels inserted
[ Qi, Khorram, Fuxin, 2020 ]
How important are selected features?
• Insertion: add important features and see what happens..
180
Prediction
Probability
% of Pixels inserted
[ Qi, Khorram, Fuxin, 2020 ]
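A minimal sketch of the deletion metric for tabular data (image pixels would be handled analogously by masking patches in order of saliency); replacing deleted features with a background value such as the training mean is an assumed imputation choice.

# Sketch of the deletion metric: remove features in order of importance
# (here, replace them with a background value) and track how the predicted
# probability of the original class degrades.
import numpy as np

def deletion_curve(model, x, importances, background):
    order = np.argsort(-np.asarray(importances))   # most important first
    cls = model.predict_proba(x.reshape(1, -1)).argmax()
    x_cur = x.astype(float).copy()
    probs = [model.predict_proba(x_cur.reshape(1, -1))[0, cls]]
    for j in order:
        x_cur[j] = background[j]                   # "delete" feature j
        probs.append(model.predict_proba(x_cur.reshape(1, -1))[0, cls])
    return probs   # a faithful explanation should make this drop quickly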
Evaluating Stability of Post hoc
Explanations
• Are post hoc explanations unstable w.r.t. small input
perturbations?
[Alvarez-Melis, 2018; Agarwal et. al., 2022]
Local Lipschitz constant:
L(x) = max over x′ with ||x − x′|| ≤ ε of ||E(x) − E(x′)|| / ||x − x′||
where x is the input and E(·) is the post hoc explanation function
Evaluating Stability of Post hoc
Explanations
• What if the underlying model itself is unstable?
• Relative Output Stability: Denominator accounts for changes in
the prediction probabilities
• Relative Representation Stability: Denominator accounts for
changes in the intermediate representations of the underlying
model
[Agarwal et. al., 2022]
Evaluating Fairness of Post hoc
Explanations
• Compute mean faithfulness/stability metrics for instances from
majority and minority groups (e.g., race A vs. race B, male vs.
female)
• If the difference between the two means is statistically
significant, then there is unfairness in the post hoc explanations
• Why/when can such unfairness occur?
[Dai et. al., 2022]
Evaluating Interpretability of Post hoc
Explanations
184
[ Doshi-Velez and Kim, 2017 ]
Predicting Behavior (“Simulation”)
Classifier
Predictions &
Explanations
Show to user
Data
Predictions
New
Data
User guesses what
the classifier would do
on new data
185
[ Ribeiro et al. 2018, Hase and Bansal 2020 ]
Compare Accuracy
Predicting Behavior (“Simulation”)
186
[ Poursabzi-Sangdeh et al. 2018 ]
Human-AI Collaboration
• Are Explanations Useful for Making Decisions?
• For tasks where the algorithms are not reliable by themselves
187
[ Lai and Tan, 2019 ]
Human-AI Collaboration
• Deception Detection: Identify fake reviews online
• Are Humans better detectors with explanations?
188
[ Lai and Tan, 2019 ]
https://machineintheloop.com/deception/
Can we improve the accuracy of decisions
using feature attribution-based explanations?
• Prediction Problem: Is a given patient likely to be diagnosed with
breast cancer within 2 years?
• User studies carried out with about 78 doctors (Residents,
Internal Medicine)
• Each doctor looks at 10 patient records from historical data and
makes predictions for each of them.
189
Lakkaraju et. al.,
2022
190
[Figure: decision accuracies of 78.32%, 93.11%, and 82.02% are compared; the model’s prediction “At Risk (0.91)” is shown together with the important features ESR, Family Risk, and Chronic Health Conditions, and the model’s accuracy is indicated for reference]
Can we improve the accuracy of decisions
using feature attribution-based explanations?
191
[Figure: the same comparison when the explanation instead highlights spurious important features (Appointment time, Appointment day, Zip code, Doctor ID > 150) alongside the prediction “At Risk (0.91)”; accuracies of 78.32%, 82.02%, and 93.11% are shown]
Can we improve the accuracy of decisions
using feature attribution-based explanations?
Challenges of Evaluating Interpretable
Models/Post hoc Explanation Methods
●Evaluating interpretations/explanations still an ongoing endeavor
●Parameter settings heavily influence the resulting
interpretations/explanations
●Diversity of explanation/interpretation methods 🡪 diverse metrics
●User studies are not consistent
○Affected by choice of: UI, phrasing, visualization, population, incentives, …
●All the above leading to conflicting findings
192
●Interpretable models: https://github.com/interpretml/interpret
●Post hoc explanation methods: OpenXAI: https://open-xai.github.io/ -- 22
metrics (faithfulness, stability, fairness); public dashboards comparing
various explanation methods on different metrics; 11 lines of code to evaluate
explanation quality
●Other XAI libraries: Captum, Quantus, shap bench, ERASER (NLP)
Open Source Tools for Quantitative
Evaluation
Agenda
194
❑ Inherently Interpretable Models
❑ Post hoc Explanation Methods
❑ Evaluating Model Interpretations/Explanations
❑ Empirically & Theoretically Analyzing Interpretations/Explanations
❑ Future of Model Understanding
Empirically Analyzing
Interpretations/Explanations
•Lot of recent focus on analyzing the behavior of post hoc explanation methods.
•Empirical studies analyzing the faithfulness, stability, fairness, adversarial
vulnerabilities, and utility of post hoc explanation methods.
•Several studies demonstrate limitations of existing post hoc methods.
Limitations: Faithfulness
196
Gradient ⊙
Input
Guided Backprop
Guided GradCAM
Model parameter randomization test
Adebayo, Julius, et al. "Sanity checks for saliency maps." NeurIPS, 2018.
Limitations: Faithfulness
197
Gradient ⊙
Input
Guided Backprop
Guided GradCAM
Model parameter randomization test
Adebayo, Julius, et al. "Sanity checks for saliency maps." NeurIPS, 2018.
Limitations: Faithfulness
198
Adebayo, Julius, et al. "Sanity checks for saliency maps." NeurIPS, 2018.
Gradient ⊙
Input
Guided Backprop
Guided GradCAM
Model parameter randomization test
Limitations: Faithfulness
199
Randomizing class labels of instances
also didn’t impact explanations!
Adebayo, Julius, et al. "Sanity checks for saliency maps." NeurIPS, 2018.
Limitations: Stability
200
Are post-hoc explanations unstable w.r.t. small, non-adversarial input perturbations?
Local Lipschitz constant:
L(x) = max over x′ with ||x − x′|| ≤ ε of ||E(x; model, hyperparameters) − E(x′; model, hyperparameters)|| / ||x − x′||
where x is the input and E(·) is the explanation function: LIME, SHAP, Gradient, etc.
Alvarez-Melis, David et al. "On the robustness of interpretability methods." WHI, ICML, 2018.
Limitations: Stability
201
● Perturbation approaches like LIME can
be unstable.
Estimate for 100 tests for an MNIST Model.
Are post-hoc explanations unstable wrt small non-
adversarial input perturbation?
Alvarez-Melis, David et al. "On the robustness of interpretability methods." WHI, ICML, 2018.
202
[Slack et. al., 2020]
Many = 250 perturbations; Few = 25 perturbations;
When you repeatedly run LIME on the same instance, you get different explanations (blue region)
Problem with having too few perturbations?
If so, what is the optimal number of
perturbations?
Limitations: Stability – Problem is Worse!
203
Dombrowski et. al. 2019
Post-hoc Explanations are Fragile
Post-hoc explanations can be easily manipulated.
204
Dombrowski et. al. 2019
Post-hoc Explanations are Fragile
Post-hoc explanations can be easily manipulated.
205
Post-hoc explanations can be easily manipulated.
Dombrowski et. al. 2019
Post-hoc Explanations are Fragile
206
Post-hoc explanations can be easily manipulated.
Dombrowski et. al. 2019
Post-hoc Explanations are Fragile
207
Adversarial Attacks on Explanations
Adversarial Attack of Ghorbani et. al. 2018
Minimally modify the input with a small perturbation without
changing the model prediction.
208
Adversarial Attacks on Explanations
Adversarial Attack of Ghorbani et. al. 2018
Minimally modify the input with a small perturbation without
changing the model prediction.
209
Adversarial Attacks on Explanations
Adversarial Attack of Ghorbani et. al. 2018
Minimally modify the input with a small perturbation without
changing the model prediction.
210
Scaffolding attack used to hide classifier dependence on gender.
Slack and Hilgard et. al. 2020
Adversarial Classifiers to fool LIME & SHAP
Vulnerabilities of LIME/SHAP: Intuition
211
Several perturbed data points are out of distribution
(OOD)!
Vulnerabilities of LIME/SHAP: Intuition
212
Adversaries can exploit this and build a classifier that is
biased on in-sample data points and unbiased on OOD
samples!
Building Adversarial Classifiers
• Setting:
• Adversary wants to deploy a biased classifier f in real
world.
• E.g., uses only race to make decisions
• Adversary must provide black box access to customers
and regulators who may use post hoc techniques
(GDPR).
• Goal of adversary is to fool post hoc explanation
techniques and hide underlying biases of f
213
Building Adversarial Classifiers
• Input: Adversary provides us with the biased classifier f, an
input dataset X sampled from real world input distribution Xdist
• Output: Scaffolded classifier e which behaves exactly like f when
making predictions on instances sampled from Xdist but will not
reveal underlying biases of f when probed with perturbation-
based post hoc explanation techniques.
• e is the adversarial classifier
214
Building Adversarial Classifiers
• The adversarial classifier e can be defined as: e(x) = f(x) if x is in-distribution (x ∈ Xdist), and ψ(x) otherwise
• f is the biased classifier input by the adversary.
• ψ is the unbiased classifier (e.g., only uses features uncorrelated
to sensitive attributes)
215
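A rough sketch of the scaffolding idea; the OOD detector is a generic stand-in (any classifier trained to separate real data from LIME/SHAP-style perturbations), and the biased and unbiased models f and psi are placeholders passed in by the caller.

# Sketch of the scaffolding classifier: behave like the biased model f on
# in-distribution points and like the unbiased model psi on the
# out-of-distribution perturbations that LIME/SHAP generate.
import numpy as np

class AdversarialClassifier:
    def __init__(self, f, psi, ood_detector):
        self.f, self.psi, self.ood = f, psi, ood_detector

    def predict(self, X):
        is_ood = self.ood.predict(X).astype(bool)   # 1 = looks like a perturbation
        return np.where(is_ood, self.psi.predict(X), self.f.predict(X))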
216
Limitations: Stability
Post-hoc explanations can be unstable to small, non-adversarial,
perturbations to the input.
Alvarez et. al. 2018.
217
Limitations: Stability
Post-hoc explanations can be unstable to small, non-adversarial,
perturbations to the input.
‘Local Lipschitz Constant’
L(x) = max over x′ with ||x − x′|| ≤ ε of ||E(x) − E(x′)|| / ||x − x′||
where x is the input and E(·) is the explanation function: LIME, SHAP, Gradient, etc.
Alvarez et. al. 2018.
218
Limitations: Stability
● Perturbation approaches like LIME
can be unstable.
Alvarez et. al. 2018.
Estimate for 100 tests for an MNIST Model.
219
Sensitivity to Hyperparameters
Bansal, Agarwal, & Nguyen, 2020.
Explanations can be highly
sensitive to hyperparameters
such as random seed, number
of perturbations, patch size, etc.
220
Utility: High fidelity explanations can mislead
Lakkaraju & Bastani 2019.
In a bail adjudication task, misleading high-fidelity explanations
improve end-user (domain experts) trust.
True Classifier relies on race
221
Lakkaraju & Bastani 2019.
True Classifier relies on race High fidelity ‘misleading’ explanation
In a bail adjudication task, misleading high-fidelity explanations
improve end-user (domain experts) trust.
Utility: High fidelity explanations can mislead
Utility: Post hoc Explanations Instill Over Trust
• Domain experts and end users seem to be over trusting
explanations & the underlying models based on explanations
• Data scientists over trusted explanations without even comprehending
them -- “Participants trusted the tools because of their visualizations and
their public availability”
222
[Kaur et. al., 2020; Bucinca et. al., 2020]
223
Responses from Data Scientists Using Explainability Tools
(GAM and SHAP)
[Kaur et. al., 2020]
224
Utility: Explanations for Debugging
Poursabzi-Sangdeh et. al. 2019
In a housing price prediction task, Amazon mechanical turkers are
unable to use linear model coefficients to diagnose model mistakes.
225
Adebayo et. al., 2020.
In a dog breeds classification task, users familiar with machine
learning rely on labels, instead of saliency maps, for diagnosing
model errors.
Utility: Explanations for Debugging
226
Adebayo et. al., 2020.
In a dog breeds classification task, users familiar with machine
learning rely on labels, instead of saliency maps, for diagnosing
model errors.
Utility: Explanations for Debugging
227
Conflicting Evidence on Utility of Explanations
● Mixed evidence:
● simulation and benchmark studies show that
explanations are useful for debugging;
● however, recent user studies show limited utility in
practice.
228
● Mixed evidence:
● simulation and benchmark studies show that
explanations are useful for debugging;
● however, recent user studies show limited utility in
practice.
● Rigorous user studies and pilots with end-users can
continue to help provide feedback to researchers on what
to address (see: Alqaraawi et. al. 2020, Bhatt et. al. 2020 &
Kaur et. al. 2020).
Conflicting Evidence on Utility of Explanations
Utility: Disagreement Problem in XAI
• Study to understand:
• if and how often feature attribution based explanation methods disagree
with each other in practice
• What constitutes disagreement between these explanations, and how to
formalize the notion of explanation disagreement based on practitioner
inputs?
• How do practitioners resolve explanation disagreement?
229
Krishna and Han et. al., 2022
• 30 minute semi-structured interviews with 25 data scientists
• 84% of participants said they often encountered disagreement
between explanation methods
• Characterizing disagreement:
• Top features are different
• Ordering among top features is different
• Direction of top feature contributions is different
• Relative ordering of features of interest is different
230
Practitioner Inputs on Explanation Disagreement
• Online user study where 25 users were shown explanations that
disagree and asked to make a choice, and explain why
• Practitioners are choosing methods due to:
• Associated theory or publication time (33%)
• Explanations matching human intuition better (32%)
• Type of data (23%)
• E.g., LIME or SHAP are better for tabular data
231
How do Practitioners Resolve Disagreements?
232
How do Practitioners Resolve Disagreements?
233
Empirical Analysis: Summary
● Faithfulness/Fidelity
■ Some explanation methods do not ‘reflect’ the underlying model.
● Fragility
■ Post-hoc explanations can be easily manipulated.
● Stability
■ Slight changes to inputs can cause large changes in explanations.
● Useful in practice?
Theoretically Analyzing Interpretable
Models
▪ Two main classes of theoretical results:
▪ Interpretable models learned using certain algorithms are certifiably
optimal
▪ E.g., rule lists (Angelino et. al., 2018)
▪ No accuracy-interpretability tradeoffs in certain settings
▪ E.g., reinforcement learning for mazes (Mansour et. al., 2022)
Theoretical Analysis of Tabular LIME w.r.t. Linear Models
• Theoretical analysis of LIME
• “black box” is a linear model
• data is tabular and discretized
• Obtained closed-form solutions of the average coefficients of the “surrogate” model
(explanation output by LIME)
• The coefficients obtained are proportional to the gradient of the function to be
explained
• Local error of surrogate model is bounded away from zero with high probability
235
[Garreau et. al., 2020]
Unification and Robustness of LIME and SmoothGrad
• C-LIME (a continuous variant of LIME) and SmoothGrad converge to the same
explanation in expectation
• At expectation, the resulting explanations are provably robust according to the
notion of Lipschitz continuity
• Finite sample complexity bounds for the number of perturbed samples required for
SmoothGrad and C-LIME to converge to their expected output
236
[Agarwal et. al., 2020]
Function Approximation Perspective to Characterizing
Post hoc Explanation Methods
237
[Han et. al., 2022]
• Various feature attribution methods (e.g., LIME, C-LIME,
KernelSHAP, Occlusion, Vanilla Gradients, Gradient times Input,
SmoothGrad, Integrated Gradients) are essentially local linear
function approximations.
• But…
Function Approximation Perspective to Characterizing
Post hoc Explanation Methods
238
[Han et. al., 2022]
• But, they adopt different loss functions, and local neighborhoods
Function Approximation Perspective to
Characterizing Post hoc Explanation Methods
• No Free Lunch Theorem for Explanation Methods: No single
method can perform optimally across all neighborhoods
239
Agenda
240
❑ Inherently Interpretable Models
❑ Post hoc Explanation Methods
❑ Evaluating Model Interpretations/Explanations
❑ Empirically & Theoretically Analyzing Interpretations/Explanations
❑ Future of Model Understanding
Future of Model Understanding
241
Methods for More Reliable
Post hoc Explanations
Theoretical Analysis of the Behavior of
Interpretable Models & Explanation Methods
Model Understanding
Beyond Classification
Intersections with Model Privacy
Intersections with Model Fairness
Empirical Evaluation of the Correctness &
Utility of Model Interpretations/Explanations
Intersections with Model Robustness
Characterizing Similarities and Differences
Between Various Methods
New Interfaces, Tools, Benchmarks for Model Understanding
Methods for More Reliable Post hoc
Explanations
242
❑ We have seen several limitations in the behavior of post hoc
explanation methods – e.g., unstable, inconsistent, fragile, not
faithful
❑ While there are already attempts to address some of these
limitations, more work is needed
Challenges with LIME: Stability
• Perturbation approaches like LIME/SHAP are unstable
243
Alvarez-Melis,
2018
Challenges with LIME: Consistency
244
Slack et. al.,
2020
Many = 250 perturbations; Few = 25 perturbations;
When you repeatedly run LIME on the same instance,
you get different explanations (blue region)
Challenges with LIME: Consistency
245
Problem with having too few perturbations?
What is the optimal number of perturbations?
Can we just use a very large number of
perturbations?
Challenges with LIME: Scalability
• Querying complex models (e.g., Inception Network, ResNet,
AlexNet) repeatedly for labels can be computationally
prohibitive
• Large number of perturbations 🡪 Large number of model queries
246
Generating reliable explanations using LIME can be
computationally expensive!
Explanations with Guarantees: BayesLIME
and BayesSHAP
• Intuition: Instead of point estimates of feature importances,
define these as distributions
247
BayesLIME and BayesSHAP
• Construct a Bayesian locally weighted regression that can
accommodate LIME/SHAP weighting functions
248
Feature
Importances
Perturbations
Weighting
Function
Black Box
Predictions
Priors on feature importances and feature importance
uncertainty
• Conjugacy results in following posteriors
• We can compute all parameters in closed form
BayesLIME and BayesSHAP: Inference
249
These are the same
equations used in
LIME & SHAP!
Estimating the Required Number of Perturbations
250
Estimate required number of
perturbations for user specified
uncertainty level.
Improving Efficiency: Focused Sampling
• Instead of sampling perturbations randomly and querying the
black box, choose points the learning algorithm is most uncertain
about and only query their labels from the black box.
251
This approach allows us to construct explanations with user-defined levels of confidence in an efficient manner!
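A sketch of the selection step (building on the posterior snippet above; function and variable names are hypothetical, and the paper's actual acquisition rule may differ):

import numpy as np

def focused_sampling_step(Z_candidates, cov, top_k=10):
    # Predictive variance of each as-yet-unlabeled perturbation under the current
    # posterior over feature importances; only the most uncertain candidates are
    # sent to the black box for labeling.
    pred_var = np.einsum('ij,jk,ik->i', Z_candidates, cov, Z_candidates)
    return np.argsort(-pred_var)[:top_k]   # indices of perturbations to query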
Other Questions
• Can we construct post hoc explanations that are provably robust
to various adversarial attacks discussed earlier?
• Can we construct post hoc explanations that can guarantee
faithfulness, stability, and fairness simultaneously?
252
Future of Model Understanding
253
Theoretical Analysis of the Behavior of
Explanations/Models
• We discussed some of the recent theoretical results earlier. Despite these, several
important questions remain unanswered
• Can we characterize the conditions under which each post hoc explanation method
(un)successfully captures the behavior of the underlying model?
• Given the properties of the underlying model and data distribution, can we theoretically determine which explanation method should be employed?
• Can we theoretically analyze the nature of the prototypes/attention weights learned by
deep nets with added layers? When are these meaningful/when are they spurious?
254
Future of Model Understanding
255
Empirical Analysis of Correctness/Utility
• While there is already a lot of work on empirical analysis of correctness/utility for
post hoc explanation methods, there is still no clear characterization of which
methods (if any) are correct/useful under what conditions.
• There is even less work on the empirical analysis of the correctness/utility of the interpretations generated by inherently interpretable models. For instance, are the prototypes generated by adding prototype layers correct/meaningful? Can they be leveraged in any real-world applications? What about attention weights?
256
Future of Model Understanding
257
Characterizing Similarities and Differences
• Several post hoc explanation methods exist, each employing different algorithms and different definitions of what constitutes an explanation. Under what conditions do these methods generate similar outputs (e.g., top-K features)? A simple agreement check is sketched below.
• Multiple interpretable models output natural/synthetic prototypes (e.g., Li et. al., Chen et. al.). When do they generate similar answers, and why?
258
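A simple starting point for such comparisons (illustrative code; the attribution vectors below are made up, and a real study would plug in actual outputs of LIME, SHAP, gradients, etc.):

import numpy as np

def topk_agreement(attr_a, attr_b, k=5):
    # Fraction of overlap between the k features each method ranks highest
    # (by absolute attribution value).
    top_a = set(np.argsort(-np.abs(attr_a))[:k])
    top_b = set(np.argsort(-np.abs(attr_b))[:k])
    return len(top_a & top_b) / k

# Toy example for a 10-feature model (hypothetical attributions):
attr_method_1 = np.array([0.40, -0.10, 0.00, 0.30, 0.05, -0.20, 0.00, 0.01, 0.00, 0.10])
attr_method_2 = np.array([0.35, -0.05, 0.02, 0.25, 0.00, -0.30, 0.00, 0.00, 0.10, 0.05])
print(topk_agreement(attr_method_1, attr_method_2, k=3))   # 1.0: the top-3 sets coincide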
Future of Model Understanding
259
Model Understanding Beyond
Classification
• How to think about interpretability in the context of large language models and
foundation models? What is even feasible here?
• Already active work on interpretability in RL and GNNs. However, very little
research on analyzing the correctness/utility of these explanations.
• Given that primitive interpretable models/post hoc explanations suffer from so
many limitations, how to ensure explanations for more complex models are
reliable?
260
[Coppens et. al., 2019, Amir et. al. 2018]
[Ying et. al., 2019]
Future of Model Understanding
261
Intersections with Model Robustness
• Are inherently interpretable models with prototype/attention layers more robust
than those without these layers? If so, why?
• Are there any inherent trade-offs between (certain kinds of) model interpretability
and model robustness? Or do these aspects reinforce each other?
• Prior works show that counterfactual explanation generation algorithms output
adversarial examples. What is the impact of adversarially robust models on these
explanations? [Pawelczyk et. al., 2022]
262
Future of Model Understanding
263
Intersections with Model Fairness
• It is often hypothesized that model interpretations and explanations help unearth the unfairness and biases of underlying models. However, there is little to no empirical research demonstrating this.
• We need more empirical evaluations and user studies to determine how interpretations and explanations can complement statistical notions of fairness (e.g., the demographic parity gap sketched below) in identifying racial/gender biases.
• How does the (statistical) fairness of inherently interpretable models compare with that of vanilla models? Are there any inherent trade-offs between (certain kinds of) model interpretability and model fairness? Or do these aspects reinforce each other?
264
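For reference, one of the statistical notions alluded to above can be computed in a few lines (a sketch; y_pred denotes binary model decisions and group a binary protected attribute, both hypothetical names):

import numpy as np

def demographic_parity_gap(y_pred, group):
    # Absolute difference in positive-decision rates between the two groups;
    # explanations would then be used to probe why any such gap arises.
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 1].mean() - y_pred[group == 0].mean())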
Future of Model Understanding
265
Intersections with Differential Privacy
● Model interpretations and explanations could potentially expose
sensitive information from the datasets.
● Little to no research on the privacy implications of interpretable
models and/or explanations. What kinds of privacy attacks (e.g.,
membership inference, model inversion etc.) are enabled?
● Do differentially private models help thwart these attacks? If so, under what conditions? Should we construct differentially private explanations?
266
[Harder et. al., 2020; Patel et. al. 2020]
Future of Model Understanding
267
New Interfaces, Tools, Benchmarks for
Model Understanding
● Can we construct more interactive interfaces for end users to
engage with models? What would be the nature of such
interactions? [demo]
● As model interpretations and explanations are employed in different settings, we need to develop new benchmarks and tools for comparing the faithfulness, stability, fairness, and utility of various methods (one toy primitive is sketched below). How do we enable that?
268
[Lakkaraju et. al., 2022, Slack et. al., 2022]
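As one example of the kind of primitive such benchmarks are built from (a toy deletion-style faithfulness check, not a prescribed metric from the works cited above; predict_fn maps a batch of inputs to the score of the class being explained):

import numpy as np

def deletion_faithfulness(predict_fn, x, attributions, k=5, baseline=0.0):
    # Replace the k features the explanation ranks highest with a baseline value
    # and measure how much the model's score drops; larger drops suggest the
    # explanation highlights features the model actually relies on.
    x = np.asarray(x, dtype=float)
    topk = np.argsort(-np.abs(attributions))[:k]
    x_masked = x.copy()
    x_masked[topk] = baseline
    return float(predict_fn(x[None, :])[0] - predict_fn(x_masked[None, :])[0])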
Some Parting Thoughts..
● There has been renewed interest in model understanding over the past
half decade, thanks to ML models being deployed in healthcare and other
high-stakes settings
● As ML models continue to get increasingly complex and find more applications, the need for model understanding is only going to rise
● Lots of interesting and open problems waiting to be solved
● You can approach the field of XAI from diverse perspectives: theory,
algorithms, HCI, or interdisciplinary research – there is room for
everyone! ☺
269
Thank You!
• Email: hlakkaraju@hbs.edu; hlakkaraju@seas.harvard.edu;
• Course on interpretability and explainability: https://interpretable-ml-class.github.io/
• More tutorials on interpretability and explainability: https://explainml-tutorial.github.io/
• Trustworthy ML Initiative: https://www.trustworthyml.org/
• Lots of resources and seminar series on topics related to explainability, fairness, adversarial
robustness, differential privacy, causality etc.
• Acknowledgements: Special thanks to Julius Adebayo, Chirag Agarwal, Shalmali Joshi, and Sameer
Singh for co-developing and co-presenting sub-parts of this tutorial at NeurIPS, AAAI, and FAccT
conferences.
270
  • 1. Interpreting Machine Learning Models: State-of-the-art, Challenges, Opportunities Hima Lakkaraju
  • 2. Schedule for Today • 9am to 1020am: Introduction; Overview of Inherently Interpretable Models • 1020am to 1040am: Break • 1040am to 12pm: Overview of Post hoc Explanation Methods • 12pm to 1pm: Lunch • 105pm to 125pm: Breakout Groups • 125pm to 245pm: Evaluating and Analyzing Model Interpretations and Explanations • 245pm to 3pm: Break • 3pm to 4pm: Analyzing Model Interpretations and Explanations, and Future Research Directions
  • 3. Motivation 3 Machine Learning is EVERYWHERE!! [ Weller 2017 ]
  • 4. Is Model Understanding Needed Everywhere? 4 [ Weller 2017 ]
  • 5. When and Why Model Understanding? • Not all applications require model understanding • E.g., ad/product/friend recommendations • No human intervention • Model understanding not needed because: • Little to no consequences for incorrect predictions • Problem is well studied and models are extensively validated in real- world applications 🡪 trust model predictions 5 [ Weller 2017, Lipton 2017, Doshi-Velez and Kim 2016 ]
  • 6. When and Why Model Understanding? ML is increasingly being employed in complex high-stakes settings 6
  • 7. When and Why Model Understanding? • High-stakes decision-making settings • Impact on human lives/health/finances • Settings relatively less well studied, models not extensively validated • Accuracy alone is no longer enough • Train/test data may not be representative of data encountered in practice • Auxiliary criteria are also critical: • Nondiscrimination • Right to explanation • Safety
  • 8. When and Why Model Understanding? • Auxiliary criteria are often hard to quantify (completely) • E.g.: Impossible to predict/enumerate all scenarios violating safety of an autonomous car • Incompleteness in problem formalization • Hinders optimization and evaluation • Incompleteness ≠ Uncertainty; Uncertainty can be quantified
  • 9. When and Why Model Understanding? 9 Model understanding becomes critical when: (i) models not extensively validated in applications; train/test data not representative of real time data (i) key criteria are hard to quantify, and we need to rely on a “you will know it when you see it” approach
  • 10. Example: Why Model Understanding? 10 Predictive Model Input Prediction = Siberian Husky Model Understanding This model is relying on incorrect features to make this prediction!! Let me fix the model Model understanding facilitates debugging.
  • 11. Example: Why Model Understanding? 11 Predictive Model Defendant Details Prediction = Risky to Release Model Understanding Race Crimes Gender This prediction is biased. Race and gender are being used to make the prediction!! Model understanding facilitates bias detection. [ Larson et. al. 2016 ]
  • 12. Example: Why Model Understanding? 12 Predictive Model Loan Applicant Details Prediction = Denied Loan Model Understanding Increase salary by 50K + pay credit card bills on time for next 3 months to get a loan Loan Applicant I have some means for recourse. Let me go and work on my promotion and pay my bills on time. Model understanding helps provide recourse to individuals who are adversely affected by model predictions.
  • 13. Example: Why Model Understanding? 13 Predictive Model Patient Data Model Understanding This model is using irrelevant features when predicting on female subpopulation. I should not trust its predictions for that group. Predictions 25, Female, Cold 32, Male, No 31, Male, Cough . . . . Healthy Sick Sick . . Healthy Healthy Sick If gender = female, if ID_num > 200, then sick If gender = male, if cold = true and cough = true, then sick Model understanding helps assess if and when to trust model predictions when making decisions.
  • 14. Example: Why Model Understanding? 14 Predictive Model Patient Data Model Understanding This model is using irrelevant features when predicting on female subpopulation. This cannot be approved! Predictions 25, Female, Cold 32, Male, No 31, Male, Cough . . . . Healthy Sick Sick . . Healthy Healthy Sick If gender = female, if ID_num > 200, then sick If gender = male, if cold = true and cough = true, then sick Model understanding allows us to vet models to determine if they are suitable for deployment in real world.
  • 15. Summary: Why Model Understanding? 15 Debugging Bias Detection Recourse If and when to trust model predictions Vet models to assess suitability for deployment Utility End users (e.g., loan applicants) Decision makers (e.g., doctors, judges) Regulatory agencies (e.g., FDA, European commission) Researchers and engineers Stakeholders
  • 16. Achieving Model Understanding Take 1: Build inherently interpretable predictive models 16 [ Letham and Rudin 2015; Lakkaraju et. al. 2016 ]
  • 17. Achieving Model Understanding Take 2: Explain pre-built models in a post-hoc manner 17 Explainer [ Ribeiro et. al. 2016, Ribeiro et al. 2018; Lakkaraju et. al. 2019 ]
  • 18. Inherently Interpretable Models vs. Post hoc Explanations In certain settings, accuracy-interpretability trade offs may exist. 18 Example [ Cireşan et. al. 2012, Caruana et. al. 2006, Frosst et. al. 2017, Stewart 2020 ]
  • 19. Inherently Interpretable Models vs. Post hoc Explanations 19 complex models might achieve higher accuracy can build interpretable + accurate models
  • 20. Sometimes, you don’t have enough data to build your model from scratch. And, all you have is a (proprietary) black box! Inherently Interpretable Models vs. Post hoc Explanations 20 [ Ribeiro et. al. 2016 ]
  • 21. Inherently Interpretable Models vs. Post hoc Explanations 21 If you can build an interpretable model which is also adequately accurate for your setting, DO IT! Otherwise, post hoc explanations come to the rescue!
  • 22. Agenda ❑ Inherently Interpretable Models ❑ Post hoc Explanation Methods ❑ Evaluating Model Interpretations/Explanations ❑ Empirically & Theoretically Analyzing Interpretations/Explanations ❑ Future of Model Understanding 22
  • 23. Agenda 23 ❑ Inherently Interpretable Models ❑ Post hoc Explanation Methods ❑ Evaluating Model Interpretations/Explanations ❑ Empirically & Theoretically Analyzing Interpretations/Explanations ❑ Future of Model Understanding
  • 24. Inherently Interpretable Models • Rule Based Models • Risk Scores • Generalized Additive Models • Prototype Based Models
  • 25. Inherently Interpretable Models • Rule Based Models • Risk Scores • Generalized Additive Models • Prototype Based Models • Attention Based Models
  • 26. Bayesian Rule Lists • A rule list classifier for stroke prediction [Letham et. al. 2016]
  • 27. Bayesian Rule Lists • A generative model designed to produce rule lists (if/else-if) that strike a balance between accuracy, interpretability, and computation • What about using other similar models? • Decision trees (CART, C5.0 etc.) • They employ greedy construction methods • Not computationally demanding but affects quality of solution – both accuracy and interpretability 27
  • 28. Bayesian Rule Lists: Generative Model 28 is a set of pre-mined antecedents Model parameters are inferred using the Metropolis-Hastings algorithm which is a Markov Chain Monte Carlo (MCMC) Sampling method
  • 29. Pre-mined Antecedents • A major source of practical feasibility: pre-mined antecedents • Reduces model space • Complexity of problem depends on number of pre-mined antecedents • As long as pre-mined set is expressive, accurate decision list can be found + smaller model space means better generalization (Vapnik, 1995) 29
  • 30. Interpretable Decision Sets • A decision set classifier for disease diagnosis [Lakkaraju et. al. 2016]
  • 31. Interpretable Decision Sets: Desiderata • Optimize for the following criteria • Recall • Precision • Distinctness • Parsimony • Class Coverage • Recall and Precision 🡪 Accurate predications • Distinctness, Parsimony, and Class Coverage 🡪Interpretability
  • 38. IDS: Optimization Procedure 38 • The problem is a non-normal, non-monotone, submodular optimization problem • Maximizing a non-monotone submodular function is NP-hard • Local search method which iteratively adds and removes elements until convergence • Provides a 2/5 approximation
  • 39. Inherently Interpretable Models • Rule Based Models • Risk Scores • Generalized Additive Models • Prototype Based Models • Attention Based Models
  • 40. Risk Scores: Motivation • Risk scores are widely used in medicine and criminal justice • E.g., assess risk of mortality in ICU, assess the risk of recidivism • Adoption 🡪 decision makers find them easy to understand • Until very recently, risk scores were constructed manually by domain experts. Can we learn these in a data-driven fashion? 40
  • 41. Risk Scores: Examples 41 • Recidivism • Loan Default [Ustun and Rudin, 2016]
  • 42. Objective function to learn risk scores 42 Above turns out to be a mixed integer program, and is optimized using a cutting plane method and a branch-and-bound technique.
  • 43. Inherently Interpretable Models • Rule Based Models • Risk Scores • Generalized Additive Models • Prototype Based Models • Attention Based Models
  • 44. Generalized Additive Models (GAMs) 44 [Lou et. al., 2012; Caruana et. al., 2015]
  • 45. Formulation and Characteristics of GAMs 45 g is a link function; E.g., identity function in case of regression; log (y/1 – y) in case of classification; fi is a shape function
  • 46. GAMs and GA2Ms • While GAMs model first order terms, GA2Ms model second order feature interactions as well.
  • 47. GAMs and GA2Ms • Learning: • Represent each component as a spline • Least squares formulation; Optimization problem to balance smoothness and empirical error • GA2Ms: Build GAM first and and then detect and rank all possible pairs of interactions in the residual • Choose top k pairs • k determined by CV 47
  • 48. Inherently Interpretable Models • Rule Based Models • Risk Scores • Generalized Additive Models • Prototype Based Models • Attention Based Models
  • 49. Prototype Selection for Interpretable Classification • The goal here is to identify K prototypes (instances) from the data s.t. a new instance which will be assigned the same label as the closest prototype will be correctly classified (with a high probability) • Let each instance “cover” the ϵ - neighborhood around it. • Once we define the neighborhood covered by each instance, this problem becomes similar to the problem of finding rule sets, and can be solved analogously. [Bien et. al., 2012]
  • 50. Prototype Selection for Interpretable Classification
  • 51. Prototype Layers in Deep Learning Models [Li et. al. 2017, Chen et. al. 2019]
  • 52. Prototype Layers in Deep Learning Models
  • 53. Prototype Layers in Deep Learning Models
  • 54. Prototype Layers in Deep Learning Models
  • 55. Prototype Layers in Deep Learning Models
  • 56. Inherently Interpretable Models • Rule Based Models • Risk Scores • Generalized Additive Models • Prototype Based Models • Attention Based Models
  • 57. Attention Layers in Deep Learning Models • Let us consider the example of machine translation [Bahdanau et. al. 2016; Xu et. al. 2015] Input Encoder Context Vector Decoder Outputs I am Bob h1 h2 h3 C s1 s2 s3 Je suis Bob
  • 58. Attention Layers in Deep Learning Models • Let us consider the example of machine translation [Bahdanau et. al. 2016; Xu et. al. 2015] Input Encoder Context Vector Decoder Outputs I am Bob h1 h2 h3 s1 s2 s3 Je suis Bob c1 c2 c3
  • 59. Attention Layers in Deep Learning Models • Context vector corresponding to si can be written as follows: • captures the attention placed on input token j when determining the decoder hidden state si; it can be computed as a softmax of the “match” between si-1 and hj
  • 60. Inherently Interpretable Models • Rule Based Models • Risk Scores • Generalized Additive Models • Prototype Based Models • Attention Based Models
  • 61. Agenda 61 ❑ Inherently Interpretable Models ❑ Post hoc Explanation Methods ❑ Evaluating Model Interpretations/Explanations ❑ Empirically & Theoretically Analyzing Interpretations/Explanations ❑ Future of Model Understanding
  • 62. What is an Explanation? 62
  • 63. What is an Explanation? Definition: Interpretable description of the model behavior 63 Classifier User Explanation Faithful Understandable
  • 64. Summarize with a program/rule/tree Classifier What is an Explanation? Definition: Interpretable description of the model behavior 64 User Send all the model parameters θ? Send many example predictions? Select most important features/points Describe how to flip the model prediction ... [ Lipton 2016 ]
  • 65. Local Explanations vs. Global Explanations 65 Explain individual predictions Explain complete behavior of the model Help unearth biases in the local neighborhood of a given instance Help shed light on big picture biases affecting larger subgroups Help vet if individual predictions are being made for the right reasons Help vet if the model, at a high level, is suitable for deployment
  • 66. Local Explanations • Feature Importances • Rule Based • Saliency Maps • Prototypes/Example Based • Counterfactuals Approaches for Post hoc Explainability 66 Global Explanations • Collection of Local Explanations • Representation Based • Model Distillation • Summaries of Counterfactuals
  • 67. Local Explanations • Feature Importances • Rule Based • Saliency Maps • Prototypes/Example Based • Counterfactuals Approaches for Post hoc Explainability 67 Global Explanations • Collection of Local Explanations • Representation Based • Model Distillation • Summaries of Counterfactuals
  • 68. LIME: Local Interpretable Model-Agnostic Explanations 68 1. Sample points around xi [ Ribeiro et al. 2016 ]
  • 69. LIME: Local Interpretable Model-Agnostic Explanations 69 1. Sample points around xi 2. Use model to predict labels for each sample
  • 70. LIME: Local Interpretable Model-Agnostic Explanations 70 1. Sample points around xi 2. Use model to predict labels for each sample 3. Weigh samples according to distance to xi
  • 71. LIME: Local Interpretable Model-Agnostic Explanations 71 1. Sample points around xi 2. Use model to predict labels for each sample 3. Weigh samples according to distance to xi 4. Learn simple linear model on weighted samples
  • 72. LIME: Local Interpretable Model-Agnostic Explanations 72 1. Sample points around xi 2. Use model to predict labels for each sample 3. Weigh samples according to distance to xi 4. Learn simple linear model on weighted samples 5. Use simple linear model to explain
  • 73. Only 1 mistake! Predict Wolf vs Husky 73 [ Ribeiro et al. 2016 ]
  • 74. Predict Wolf vs Husky We’ve built a great snow detector… 74 [ Ribeiro et al. 2016 ]
  • 75. SHAP: Shapley Values as Importance Marginal contribution of each feature towards the prediction, averaged over all possible permutations. 75 xi P(y) = 0.9 xi P(y) = 0.8 M(xi, O) = 0.1 O O/xi Attributes the prediction to each of the features. [ Lundberg & Lee 2017 ]
  • 76. Local Explanations • Feature Importances • Rule Based • Saliency Maps • Prototypes/Example Based • Counterfactuals Approaches for Post hoc Explainability 76 Global Explanations • Collection of Local Explanations • Representation Based • Model Distillation • Summaries of Counterfactuals
  • 77. Anchors [ Ribeiro et al. 2018 ] • Perturb a given instance x to generate local neighborhood • Identify an “anchor” rule which has the maximum coverage of the local neighborhood and also achieves a high precision.
  • 79. Local Explanations • Feature Importances • Rule Based • Saliency Maps • Prototypes/Example Based • Counterfactuals Approaches for Post hoc Explainability 79 Global Explanations • Collection of Local Explanations • Representation Based • Model Distillation • Summaries of Counterfactuals
  • 80. Saliency Map Overview 80 Input Model Predictions Junco Bird
  • 81. 81 What parts of the input are most relevant for the model’s prediction: ‘Junco Bird’? Input Model Predictions Junco Bird Saliency Map Overview
  • 82. 82 What parts of the input are most relevant for the model’s prediction: ‘Junco Bird’? Input Model Predictions Junco Bird Saliency Map Overview
  • 83. Modern DNN Setting 83 Model class specific logit Input Model Predictions Junco Bird
  • 84. Input-Gradient 84 Input-Gradient Logit Input Same dimension as the input. Input Model Predictions Junco Bird Baehrens et. al. 2010; Simonyan et. al. 2014 .
  • 85. Input-Gradient 85 Input Model Predictions Junco Bird Baehrens et. al. 2010; Simonyan et. al. 2014 . Input-Gradient Logit Visualize as a heatmap Input
  • 86. Input-Gradient 86 Input Model Predictions Junco Bird Input-Gradient Logit Input Challenges ● Visually noisy & difficult to interpret. ● ‘Gradient saturation.’ Shrikumar et. al. 2017. Baehrens et. al. 2010; Simonyan et. al. 2014 .
  • 87. SmoothGrad 87 Input Model Predictions Junco Bird Smilkov et. al. 2017 SmoothGrad Gaussian noise Average Input-gradient of ‘noisy’ inputs.
  • 88. SmoothGrad 88 Input Model Predictions Junco Bird Smilkov et. al. 2017 SmoothGrad Gaussian noise Average Input-gradient of ‘noisy’ inputs.
  • 89. Integrated Gradients 89 Input Model Predictions Junco Bird Baseline input Path integral: ‘sum’ of interpolated gradients Sundararajan et. al. 2017
  • 90. Integrated Gradients 90 Input Model Predictions Junco Bird Path integral: ‘sum’ of interpolated gradients Sundararajan et. al. 2017 Baseline input
  • 91. Gradient-Input 91 Input Model Predictions Junco Bird Gradient-Input Input gradient Input Element-wise product of input- gradient and input. Shrikumar et. al. 2017, Ancona et. al. 2018.
  • 92. Gradient-Input 92 Input Model Predictions Junco Bird Gradient-Input logit gradient Input Shrikumar et. al. 2017, Ancona et. al. 2018. Element-wise product of input- gradient and input.
  • 93. Local Explanations • Feature Importances • Rule Based • Saliency Maps • Prototypes/Example Based • Counterfactuals Approaches for Post hoc Explainability 93 Global Explanations • Collection of Local Explanations • Representation Based • Model Distillation • Summaries of Counterfactuals
  • 94. Prototypes/Example Based Post hoc Explanations Use examples (synthetic or natural) to explain individual predictions ◆ Influence Functions (Koh & Liang 2017) ● Identify instances in the training set that are responsible for the prediction of a given test instance ◆ Activation Maximization (Erhan et al. 2009) ● Identify examples (synthetic or natural) that strongly activate a function (neuron) of interest 94
  • 95. 95 Input Model Predictions Junco Bird Training Point Ranking via Influence Functions Which training data points have the most ‘influence’ on the test loss?
  • 96. 96 Input Model Predictions Junco Bird Which training data points have the most ‘influence’ on the test loss? Training Point Ranking via Influence Functions
  • 97. 97 Training Point Ranking via Influence Functions Influence Function: classic tool used in robust statistics for assessing the effect of a sample on regression parameters (Cook & Weisberg, 1980). Instead of refitting model for every data point, Cook’s distance provides analytical alternative.
  • 98. 98 Training Point Ranking via Influence Functions Koh & Liang 2017 Training sample point Koh & Liang (2017) extend the ‘Cook’s distance’ insight to modern machine learning setting.
  • 99. 99 Training Point Ranking via Influence Functions Koh & Liang 2017 Training sample point ERM Solution Koh & Liang (2017) extend the ‘Cook’s distance’ insight to modern machine learning setting. UpWeighted ERM Solution
  • 100. 100 Training Point Ranking via Influence Functions Koh & Liang 2017 Training sample point ERM Solution Koh & Liang (2017) extend the ‘Cook’s distance’ insight to modern machine learning setting. UpWeighted ERM Solution Influence of Training Point on Parameters Influence of Training Point on Test-Input’s loss
  • 101. 101 Training Point Ranking via Influence Functions Applications: • compute self-influence to identify mislabelled examples; • diagnose possible domain mismatch; • craft training-time poisoning examples. [ Koh & Liang 2017 ]
  • 102. 102 Challenges and Other Approaches Influence function Challenges: 1. scalability: computing hessian-vector products can be tedious in practice. 1. non-convexity: possibly loose approximation for deeper networks (Basu et. al. 2020).
  • 103. 103 Challenges and Other Approaches Influence function Challenges: 1. scalability: computing hessian-vector products can be tedious in practice. 1. non-convexity: possibly loose approximation for ‘deeper’ networks (Basu et. al. 2020). Alternatives: • Representer Points (Yeh et. al. 2018). • TracIn (Pruthi et. al. appearing at NeuRIPs 2020).
  • 104. Activation Maximization 104 These approaches identify examples, synthetic or natural, that strongly activate a function (neuron) of interest.
  • 105. Activation Maximization 105 These approaches identify examples, synthetic or natural, that strongly activate a function (neuron) of interest. Implementation Flavors: • Search for natural examples within a specified set (training or validation corpus) that strongly activate a neuron of interest; • Synthesize examples, typically via gradient descent, that strongly activate a neuron of interest.
  • 107. Local Explanations • Feature Importances • Rule Based • Saliency Maps • Prototypes/Example Based • Counterfactuals Approaches for Post hoc Explainability 107 Global Explanations • Collection of Local Explanations • Representation Based • Model Distillation • Summaries of Counterfactuals
  • 108. Counterfactual Explanations What features need to be changed and by how much to flip a model’s prediction? 108 [Goyal et. al., 2019]
  • 109. Counterfactual Explanations As ML models increasingly deployed to make high-stakes decisions (e.g., loan applications), it becomes important to provide recourse to affected individuals. 109 Counterfactual Explanations What features need to be changed and by how much to flip a model’s prediction ? (i.e., to reverse an unfavorable outcome).
  • 110. Predictive Model Deny Loan Loan Application Counterfactual Explanations 110 Recourse: Increase your salary by 5K & pay your credit card bills on time for next 3 months f(x) Applicant Counterfactual Generation Algorithm
• 111. Generating Counterfactual Explanations: Intuition 111 Proposed solutions differ on: 1. How to choose among candidate counterfactuals? 2. How much access is needed to the underlying predictive model?
• 112. Take 1: Minimum Distance Counterfactuals 112 [ Wachter et. al., 2018 ] [Objective on slide relates the original instance, the counterfactual, the predictive model, the desired outcome, and a distance metric; reconstructed below.] Choice of distance metric dictates what kinds of counterfactuals are chosen. Wachter et. al. use normalized Manhattan distance.
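A reconstruction of the objective sketched on this slide (Wachter et. al. 2018), where x is the original instance, x' the candidate counterfactual, f the predictive model, y' the desired outcome, and d the distance metric; the normalized Manhattan distance weights each feature by its inverse median absolute deviation (MAD):

```latex
\arg\min_{x'} \; \max_{\lambda} \;\; \lambda \,\big(f(x') - y'\big)^{2} \;+\; d(x, x'),
\qquad
d(x, x') \;=\; \sum_{k} \frac{|x_{k} - x'_{k}|}{\mathrm{MAD}_{k}}
```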
• 113. Take 1: Minimum Distance Counterfactuals 113 Wachter et. al. solve a differentiable, unconstrained version of the objective using the Adam optimization algorithm with random restarts. This method requires access to the gradients of the underlying predictive model. [ Wachter et. al., 2018 ]
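A minimal sketch of this optimization loop, assuming a differentiable torch model that outputs the probability of the desired class; this is an illustrative reimplementation, not Wachter et. al.’s code:

```python
import torch

def find_counterfactual(model, x, target=1.0, lam=1.0, n_restarts=5,
                        steps=500, lr=0.01, noise=0.1):
    x = x.detach()
    best, best_obj = None, float("inf")
    for _ in range(n_restarts):                                    # random restarts
        x_cf = (x + noise * torch.randn_like(x)).requires_grad_(True)
        opt = torch.optim.Adam([x_cf], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            pred_term = lam * (model(x_cf) - target).pow(2).sum()  # push f(x') toward y'
            dist_term = (x_cf - x).abs().sum()                     # Manhattan distance d(x, x')
            obj = pred_term + dist_term
            obj.backward()
            opt.step()
        if obj.item() < best_obj:                                  # keep the best restart
            best, best_obj = x_cf.detach().clone(), obj.item()
    return best
```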
  • 114. Take 1: Minimum Distance Counterfactuals 114 Not feasible to act upon these features!
• 115. Take 2: Feasible and Least Cost Counterfactuals • The feasible set of counterfactuals is specified by the end user • E.g., changes to race or gender are not feasible • Cost is modeled as a total log-percentile shift • Changes become harder when starting from a higher percentile value 115 [ Ustun et. al., 2019 ]
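One plausible way to instantiate a total log-percentile-shift cost, sketched under the assumption that shifting a feature already sitting at a high percentile should cost more; the exact functional form in Ustun et. al. (2019) may differ:

```python
import numpy as np
from scipy.stats import percentileofscore

def log_percentile_shift_cost(x, x_cf, train_data, eps=1e-4):
    cost = 0.0
    for j in range(len(x)):
        q_old = percentileofscore(train_data[:, j], x[j]) / 100.0     # percentile of original value
        q_new = percentileofscore(train_data[:, j], x_cf[j]) / 100.0  # percentile after the change
        # shifts near the top of the distribution incur larger cost
        cost += abs(np.log((1.0 - q_new) + eps) - np.log((1.0 - q_old) + eps))
    return cost
```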
• 116. • Ustun et. al. only consider the case where the model is a linear classifier • The objective is formulated as an integer program (IP) and optimized using CPLEX • Requires complete access to the linear classifier, i.e., the weight vector 116 Take 2: Feasible and Least Cost Counterfactuals [ Ustun et. al., 2019 ]
  • 117. 117 Question: What if we have a black box or a non-linear classifier? Answer: generate a local linear model approximation (e.g., using LIME) and then apply Ustun et. al.’s framework Take 2: Feasible and Least Cost Counterfactuals [ Ustun et. al., 2019 ]
  • 118. 118 Changing one feature without affecting another might not be possible! Take 2: Feasible and Least Cost Counterfactuals [ Ustun et. al., 2019 ]
• 119. After 1 year Take 3: Causally Feasible Counterfactuals 119 Recourse: Reduce current debt from $3250 to $1000 My current debt has reduced to $1000. Please give me a loan. Loan Applicant f(x) Your age has increased by 1 year and the recourse is no longer valid! Sorry! It is important to account for feature interactions when generating counterfactuals! But how?! Loan Applicant Predictive Model [ Mahajan et. al., 2019, Karimi et. al. 2020 ]
• 120. 120 Take 3: Causally Feasible Counterfactuals [ Ustun et. al., 2019 ] The feasible set is now the set of causally feasible counterfactuals permitted according to a given Structural Causal Model (SCM). Question: What if we don’t have access to the structural causal model?
  • 121. Counterfactuals on Data Manifold 121 • Generated counterfactuals should lie on the data manifold • Construct Variational Autoencoders (VAEs) to map input instances to latent space • Search for counterfactuals in the latent space • Once a counterfactual is found, map it back to the input space using the decoder [ Verma et. al., 2020, Pawelczyk et. al., 2020]
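A minimal sketch of this idea, assuming a trained VAE exposing `encode` (returning a latent code) and `decode` methods; these names and the simple latent-distance penalty are illustrative assumptions, not the exact formulations of Verma et. al. or Pawelczyk et. al.:

```python
import torch

def latent_counterfactual(model, vae, x, target=1.0, lam=1.0, steps=300, lr=0.05):
    with torch.no_grad():
        z0 = vae.encode(x)                      # map the input to latent space
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_cf = vae.decode(z)                    # decoded point stays close to the data manifold
        loss = lam * (model(x_cf) - target).pow(2).sum() + (z - z0).pow(2).sum()
        loss.backward()
        opt.step()
    return vae.decode(z).detach()               # map the counterfactual back to input space
```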
  • 122. Local Explanations • Feature Importances • Rule Based • Saliency Maps • Prototypes/Example Based • Counterfactuals Approaches for Post hoc Explainability 122 Global Explanations • Collection of Local Explanations • Representation Based • Model Distillation • Summaries of Counterfactuals
  • 123. Global Explanations ● Explain the complete behavior of a given (black box) model ○ Provide a bird’s eye view of model behavior ● Help detect big picture model biases persistent across larger subgroups of the population ○ Impractical to manually inspect local explanations of several instances to ascertain big picture biases! ● Global explanations are complementary to local explanations 123
  • 124. Local vs. Global Explanations 124 Explain individual predictions Help unearth biases in the local neighborhood of a given instance Help vet if individual predictions are being made for the right reasons Explain complete behavior of the model Help shed light on big picture biases affecting larger subgroups Help vet if the model, at a high level, is suitable for deployment
  • 125. Local Explanations • Feature Importances • Rule Based • Saliency Maps • Prototypes/Example Based • Counterfactuals Approaches for Post hoc Explainability 125 Global Explanations • Collection of Local Explanations • Representation Based • Model Distillation • Summaries of Counterfactuals
  • 126. Global Explanation as a Collection of Local Explanations How to generate a global explanation of a (black box) model? • Generate a local explanation for every instance in the data using one of the approaches discussed earlier • Pick a subset of k local explanations to constitute the global explanation 126 What local explanation technique to use? How to choose the subset of k local explanations?
• 127. Global Explanations from Local Feature Importances: SP-LIME LIME explains a single prediction (the local behavior for a single instance). We can’t examine all explanations, so instead pick k explanations to show to the user. Representative: should summarize the model’s global behavior. Diverse: should not be redundant in their descriptions. SP-LIME uses submodular optimization and greedily picks k explanations 83 Model Agnostic [ Ribeiro et al. 2016 ]
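A sketch of the greedy submodular pick, assuming W[i, j] holds the absolute LIME weight of feature j in the local explanation of instance i; this is an illustrative reimplementation of the idea, not the LIME library's code:

```python
import numpy as np

def submodular_pick(W, k):
    importance = np.sqrt(np.abs(W).sum(axis=0))      # global importance of each feature
    chosen, covered = [], np.zeros(W.shape[1], dtype=bool)
    for _ in range(k):
        best_gain, best_i = -1.0, None
        for i in range(W.shape[0]):
            if i in chosen:
                continue
            new_cov = covered | (np.abs(W[i]) > 0)   # features this explanation would add
            gain = importance[new_cov].sum() - importance[covered].sum()
            if gain > best_gain:                     # greedy: largest marginal coverage gain
                best_gain, best_i = gain, i
        chosen.append(best_i)
        covered |= np.abs(W[best_i]) > 0
    return chosen                                    # indices of the k representative explanations
```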
  • 128. 84 Global Explanations from Local Rule Sets: SP-Anchor • Use Anchors algorithm discussed earlier to obtain local rule sets for every instance in the data • Use the same procedure to greedily select a subset of k local rule sets to correspond to the global explanation [ Ribeiro et al. 2018 ]
  • 129. Local Explanations • Feature Importances • Rule Based • Saliency Maps • Prototypes/Example Based • Counterfactuals Approaches for Post hoc Explainability 129 Global Explanations • Collection of Local Explanations • Representation Based • Model Distillation • Summaries of Counterfactuals
  • 130. 130 Representation Based Approaches • Derive model understanding by analyzing intermediate representations of a DNN. • Determine model’s reliance on ‘concepts’ that are semantically meaningful to humans.
  • 131. Representation Based Explanations 131 [Kim et. al., 2018] Zebra (0.97) How important is the notion of “stripes” for this prediction?
• 132. Representation Based Explanations: TCAV 132 Examples of the concept “stripes” Random examples Train a linear classifier to separate the activations of concept examples from those of random examples. The vector orthogonal to the decision boundary, pointing towards the “stripes” class, quantifies the concept “stripes”. Compute directional derivatives along this vector to determine the importance of the notion of stripes for any given prediction.
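A compact sketch of the core TCAV computation: fit a linear probe on layer activations, take the vector normal to its decision boundary as the concept activation vector (CAV), and score how often the directional derivative of the class logit along the CAV is positive. Array names are placeholder assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]
    return v / np.linalg.norm(v)                     # unit-norm CAV for "stripes"

def tcav_score(class_logit_grads, cav):
    # class_logit_grads[i] = gradient of the class logit wrt the layer activations for example i
    directional_derivatives = class_logit_grads @ cav
    return float((directional_derivatives > 0).mean())
```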
  • 133. Local Explanations • Feature Importances • Rule Based • Saliency Maps • Prototypes/Example Based • Counterfactuals Approaches for Post hoc Explainability 134 Global Explanations • Collection of Local Explanations • Representation Based • Model Distillation • Summaries of Counterfactuals
• 134. Model Distillation for Generating Global Explanations 135 [Figure: data (feature vectors) and the predictive model’s predictions f(x) are fed to an explainer] Explainer: a simpler, interpretable model which is optimized to mimic the model’s predictions
• 135. Generalized Additive Models as Global Explanations 136 [Figure: data and black-box model predictions are fed to an explainer] [Tan et. al., 2019] Model Agnostic
  • 136. Generalized Additive Models as Global Explanations: Shape Functions for Predicting Bike Demand 137 [Tan et. al., 2019]
  • 137. Generalized Additive Models as Global Explanations: Shape Functions for Predicting Bike Demand How does bike demand vary as a function of temperature? 138 [Tan et. al., 2019]
• 138. Generalized Additive Models as Global Explanations Generalized Additive Model (GAM): ŷ = β₀ + Σⱼ fⱼ(xⱼ) + Σ fᵢⱼ(xᵢ, xⱼ) over pairs of features, where the fⱼ are shape functions of individual features and the fᵢⱼ are higher-order feature interaction terms. Fit this model to the predictions of the black box to obtain the shape functions. 139 [Tan et. al., 2019]
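A minimal distillation sketch: fit an additive explainer to the black box's predictions rather than to the true labels. Here the interpret package's EBM stands in for the GAM; Tan et. al.'s setup may use a different additive model implementation:

```python
from interpret.glassbox import ExplainableBoostingRegressor

def distill_to_gam(black_box, X):
    y_hat = black_box.predict(X)          # labels come from the model, not the data
    gam = ExplainableBoostingRegressor()
    gam.fit(X, y_hat)                     # mimic the black box with an additive model
    return gam                            # gam.explain_global() exposes the shape functions
```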
• 139. Decision Trees as Global Explanations 140 [Figure: data and black-box model predictions are fed to an explainer (a decision tree)] [ Bastani et. al., 2019 ] Model Agnostic
• 140. Customizable Decision Sets as Global Explanations 141 [Figure: data and black-box model predictions are fed to an explainer (a decision set)] Model Agnostic [ Lakkaraju et. al., 2019 ]
  • 141. 142 Customizable Decision Sets as Global Explanations Subgroup Descriptor Decision Logic [ Lakkaraju et. al., 2019 ]
  • 142. 143 Customizable Decision Sets as Global Explanations Explain how the model behaves across patient subgroups with different values of smoking and exercise [ Lakkaraju et. al., 2019 ]
• 143. 144 Customizable Decision Sets as Global Explanations: Desiderata & Optimization Problem Fidelity: describe model behavior accurately (minimize the number of instances for which the explanation’s label ≠ the model’s prediction). Unambiguity: no contradicting explanations (minimize the number of duplicate rules applicable to each instance). Simplicity: users should be able to look at the explanation and reason about model behavior (minimize the number of conditions in rules; constraints on the number of rules & subgroups). Customizability: users should be able to understand model behavior across various subgroups of interest (outer rules only comprise features of user interest; candidate set restricted). [ Lakkaraju et. al., 2019 ]
  • 144. ● The complete optimization problem is non-negative, non-normal, non-monotone, and submodular with matroid constraints ● Solved using the well-known smooth local search algorithm (Feige et. al., 2007) with best known optimality guarantees. 145 Customizable Decision Sets as Global Explanations [ Lakkaraju et. al., 2019 ]
  • 145. Local Explanations • Feature Importances • Rule Based • Saliency Maps • Prototypes/Example Based • Counterfactuals Approaches for Post hoc Explainability 146 Global Explanations • Collection of Local Explanations • Representation Based • Model Distillation • Summaries of Counterfactuals
  • 146. Predictive Model Counterfactual Explanations 147 f(x) Counterfactual Generation Algorithm DENIED LOANS RECOURSES How do recourses permitted by the model vary across various racial & gender subgroups? Are there any biases against certain demographics? [ Rawal and Lakkaraju, 2020 ] Decision Maker (or) Regulatory Authority
  • 147. Predictive Model Customizable Global Summaries of Counterfactuals 148 f(x) Algorithm for generating global summaries of counterfactuals DENIED LOANS How do recourses permitted by the model vary across various racial & gender subgroups? Are there any biases against certain demographics? [ Rawal and Lakkaraju, 2020 ]
• 148. 149 Omg! This model is biased. It requires certain demographics to “act upon” a lot more features than others. Subgroup Descriptor Recourse Rules Customizable Global Summaries of Counterfactuals [ Rawal and Lakkaraju, 2020 ]
• 149. 150 Customizable Global Summaries of Counterfactuals: Desiderata & Optimization Problem Recourse Correctness: prescribed recourses should obtain desirable outcomes (minimize the number of applicants for whom the prescribed recourse does not lead to the desired outcome). Recourse Coverage: (almost) all applicants should be provided with recourses (minimize the number of applicants for whom no recourse exists, i.e., who satisfy no rule). Minimal Recourse Costs: acting upon a prescribed recourse should not be impractical or terribly expensive (minimize total feature costs as well as the magnitude of changes in feature values). Interpretability of Summaries: summaries should be readily understandable to stakeholders, e.g., decision makers/regulatory authorities (constraints on the number of rules, conditions per rule, and subgroups). Customizability: stakeholders should be able to understand model behavior across various subgroups of interest (outer rules only comprise features of stakeholder interest; candidate set restricted). [ Rawal and Lakkaraju, 2020 ]
  • 150. ● The complete optimization problem is non-negative, non-normal, non-monotone, and submodular with matroid constraints ● Solved using the well-known smooth local search algorithm (Feige et. al., 2007) with best known optimality guarantees. 151 Customizable Global Summaries of Counterfactuals [ Rawal and Lakkaraju, 2020 ]
• 151. Breakout Groups • What concepts/ideas/approaches from our morning discussion stood out to you? • We discussed different basic units of interpretation -- prototypes, rules, risk scores, shape functions (GAMs), feature importances • Are some of these more suited to certain data modalities (e.g., tabular, images, text) than others? • What could be some potential vulnerabilities/drawbacks of inherently interpretable models and post hoc explanation methods? • Given the diversity of the methods we discussed, how do we go about evaluating inherently interpretable models and post hoc explanation methods?
  • 152. Agenda 153 ❑ Inherently Interpretable Models ❑ Post hoc Explanation Methods ❑ Evaluating Model Interpretations/Explanations ❑ Empirically & Theoretically Analyzing Interpretations/Explanations ❑ Future of Model Understanding
  • 153. Evaluating Model Interpretations/Explanations • Evaluating the meaningfulness or correctness of explanations • Diverse ways of doing this depending on the type of model interpretation/explanation • Evaluating the interpretability of explanations
• 155. Evaluating Interpretability • Functionally-grounded evaluation: quantitative metrics, e.g., number of rules or prototypes (lower is better!) • Human-grounded evaluation: binary forced choice, forward simulation/prediction, counterfactual simulation • Application-grounded evaluation: domain expert with the exact application task or a simpler/partial task
  • 156. Evaluating Inherently Interpretable Models • Evaluating the accuracy of the resulting model • Evaluating the interpretability of the resulting model • Do we need to evaluate the “correctness” or “meaningfulness” of the resulting interpretations?
  • 157. Evaluating Bayesian Rule Lists • A rule list classifier for stroke prediction [Letham et. al. 2016]
  • 158. Evaluating Interpretable Decision Sets • A decision set classifier for disease diagnosis [Lakkaraju et. al. 2016]
  • 159. Evaluating Interpretability of Bayesian Rule Lists and Interpretable Decision Sets • Number of rules, predicates etc. 🡪 lower is better! • User studies to compare interpretable decision sets to Bayesian Decision Lists (Letham et. al.) • Each user is randomly assigned one of the two models • 10 objective and 2 descriptive questions per user 160
  • 160. Interface for Objective Questions 161
  • 161. Interface for Descriptive Questions 162
• 162. User Study Results 163 Descriptive task: human accuracy 0.81 (our approach) vs. 0.17 (Bayesian Decision Lists); avg. time spent 113.4 vs. 396.86 secs.; avg. # of words 31.11 vs. 120.57. Objective task: human accuracy 0.97 vs. 0.82; avg. time spent 28.18 vs. 36.34 secs. Objective Questions: 17% more accurate, 22% faster; Descriptive Questions: 74% fewer words, 71% faster.
  • 163. Evaluating Prototype and Attention Layers • Are prototypes and attention weights always meaningful? • Do attention weights correlate with other measures of feature importance? E.g., gradients • Would alternative attention weights yield different predictions? [Jain and Wallace, 2019]
  • 164. Evaluating Post hoc Explanations • Evaluating the faithfulness (or correctness) of post hoc explanations • Evaluating the stability of post hoc explanations • Evaluating the fairness of post hoc explanations • Evaluating the interpretability of post hoc explanations [Agarwal et. al., 2022]
  • 165. Evaluating Faithfulness of Post hoc Explanations – Ground Truth 166
  • 166. Evaluating Faithfulness of Post hoc Explanations – Ground Truth 167 Spearman rank correlation coefficient computed over features of interest
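A small sketch of this metric, assuming ground-truth importances are available (e.g., the coefficients of a known linear model or the planted relevant features of a synthetic dataset):

```python
from scipy.stats import spearmanr

def rank_faithfulness(explained_importance, true_importance):
    corr, _ = spearmanr(explained_importance, true_importance)
    return corr      # 1.0 = identical ranking of the features of interest
```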
  • 167. Evaluating Faithfulness of Post hoc Explanations – Explanations as Models 168 • If the explanation is itself a model (e.g., linear model fit by LIME), we can compute the fraction of instances for which the labels assigned by explanation model match those assigned by the underlying model
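A sketch of that agreement measure, assuming both the surrogate explanation model and the black box expose a `predict` method:

```python
import numpy as np

def label_agreement(surrogate, black_box, X):
    # fraction of instances where the surrogate's label matches the black box's label
    return float(np.mean(surrogate.predict(X) == black_box.predict(X)))
```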
  • 168. Evaluating Faithfulness of Post hoc Explanations • What if we do not have any ground truth? • What if explanations cannot be considered as models that output predictions?
• 169. How important are selected features? • Deletion: remove important features and see what happens.. 170 [Plot: prediction probability vs. % of pixels deleted] [ Qi, Khorram, Fuxin, 2020 ]
• 175. How important are selected features? • Insertion: add important features and see what happens.. 176 [Plot: prediction probability vs. % of pixels inserted] [ Qi, Khorram, Fuxin, 2020 ]
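A hedged sketch of the deletion metric (insertion is the reverse: start from a blank or blurred image and add the most salient pixels back). It assumes `image` and `saliency` are numpy arrays of the same shape and `model_prob` returns the predicted class probability for an image:

```python
import numpy as np

def deletion_curve(model_prob, image, saliency, steps=50, baseline=0.0):
    order = np.argsort(saliency.ravel())[::-1]       # most salient pixels first
    img = image.copy()
    probs = [model_prob(img)]
    chunk = max(1, order.size // steps)
    for start in range(0, order.size, chunk):
        img.reshape(-1)[order[start:start + chunk]] = baseline   # "delete" a batch of pixels
        probs.append(model_prob(img))
    return np.array(probs)   # a sharper drop (smaller area under the curve) = better explanation
```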
• 180. Evaluating Stability of Post hoc Explanations • Are post hoc explanations unstable w.r.t. small input perturbations? Measured via the local Lipschitz constant of the explanation with respect to the input, i.e., the maximum change in the explanation over a small neighborhood of the input (reconstructed below). [Alvarez-Melis, 2018; Agarwal et. al., 2022]
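A reconstruction of the local Lipschitz (in)stability measure referenced on this slide, with E the explanation function and B_ε(x) a small neighborhood around the input x (in the spirit of Alvarez-Melis & Jaakkola 2018):

```latex
\hat{L}(x) \;=\; \max_{x' \in B_{\epsilon}(x)} \; \frac{\lVert E(x) - E(x') \rVert_{2}}{\lVert x - x' \rVert_{2}}
```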
  • 181. Evaluating Stability of Post hoc Explanations • What if the underlying model itself is unstable? • Relative Output Stability: Denominator accounts for changes in the prediction probabilities • Relative Representation Stability: Denominator accounts for changes in the intermediate representations of the underlying model [Agarwal et. al., 2022]
  • 182. Evaluating Fairness of Post hoc Explanations • Compute mean faithfulness/stability metrics for instances from majority and minority groups (e.g., race A vs. race B, male vs. female) • If the difference between the two means is statistically significant, then there is unfairness in the post hoc explanations • Why/when can such unfairness occur? [Dai et. al., 2022]
  • 183. Evaluating Interpretability of Post hoc Explanations 184 [ Doshi-Velez and Kim, 2017 ]
• 184. Predicting Behavior (“Simulation”) 185 Show the user a classifier’s predictions and explanations on some data; the user then guesses what the classifier would do on new data; compare the user’s guesses to the classifier’s actual predictions (accuracy). [ Ribeiro et al. 2018, Hase and Bansal 2020 ]
  • 185. Predicting Behavior (“Simulation”) 186 [ Poursabzi-Sangdeh et al. 2018 ]
  • 186. Human-AI Collaboration • Are Explanations Useful for Making Decisions? • For tasks where the algorithms are not reliable by themselves 187 [ Lai and Tan, 2019 ]
  • 187. Human-AI Collaboration • Deception Detection: Identify fake reviews online • Are Humans better detectors with explanations? 188 [ Lai and Tan, 2019 ] https://machineintheloop.com/deception/
  • 188. Can we improve the accuracy of decisions using feature attribution-based explanations? • Prediction Problem: Is a given patient likely to be diagnosed with breast cancer within 2 years? • User studies carried out with about 78 doctors (Residents, Internal Medicine) • Each doctor looks at 10 patient records from historical data and makes predictions for each of them. 189 Lakkaraju et. al., 2022
  • 189. 190 Accuracy: 78.32% Accuracy: 93.11% Accuracy: 82.02% + At Risk (0.91) + At Risk (0.91) Important Features ESR Family Risk Chronic Health Conditions Model Accuracy: Can we improve the accuracy of decisions using feature attribution-based explanations?
  • 190. 191 Accuracy: 78.32% Accuracy: 82.02% + At Risk (0.91) + At Risk (0.91) Model Accuracy: Important Features Appointment time Appointment day Zip code Doctor ID > 150 Accuracy: 93.11% Can we improve the accuracy of decisions using feature attribution-based explanations?
  • 191. Challenges of Evaluating Interpretable Models/Post hoc Explanation Methods ●Evaluating interpretations/explanations still an ongoing endeavor ●Parameter settings heavily influence the resulting interpretations/explanations ●Diversity of explanation/interpretation methods 🡪 diverse metrics ●User studies are not consistent ○Affected by choice of: UI, phrasing, visualization, population, incentives, … ●All the above leading to conflicting findings 192
• 192. ●Interpretable models: https://github.com/interpretml/interpret ●Post hoc explanation methods: OpenXAI: https://open-xai.github.io/ -- 22 metrics (faithfulness, stability, fairness); public dashboards comparing various explanation methods on different metrics; 11 lines of code to evaluate explanation quality ●Other XAI libraries: Captum, quantus, shap bench, ERASER (NLP) Open Source Tools for Quantitative Evaluation
  • 193. Agenda 194 ❑ Inherently Interpretable Models ❑ Post hoc Explanation Methods ❑ Evaluating Model Interpretations/Explanations ❑ Empirically & Theoretically Analyzing Interpretations/Explanations ❑ Future of Model Understanding
• 194. Empirically Analyzing Interpretations/Explanations •A lot of recent focus on analyzing the behavior of post hoc explanation methods. •Empirical studies analyzing the faithfulness, stability, fairness, adversarial vulnerabilities, and utility of post hoc explanation methods. •Several studies demonstrate limitations of existing post hoc methods.
  • 195. Limitations: Faithfulness 196 Gradient ⊙ Input Guided Backprop Guided GradCAM Model parameter randomization test Adebayo, Julius, et al. "Sanity checks for saliency maps." NeurIPS, 2018.
  • 198. Limitations: Faithfulness 199 Randomizing class labels of instances also didn’t impact explanations! Adebayo, Julius, et al. "Sanity checks for saliency maps." NeurIPS, 2018.
• 199. Limitations: Stability 200 Are post-hoc explanations unstable w.r.t. small non-adversarial input perturbations? input model hyperparameters Local Lipschitz Constant Input Explanation function: LIME, SHAP, Gradient, etc. Alvarez-Melis, David et al. "On the robustness of interpretability methods." WHI, ICML, 2018.
  • 200. Limitations: Stability 201 ● Perturbation approaches like LIME can be unstable. Estimate for 100 tests for an MNIST Model. Are post-hoc explanations unstable wrt small non- adversarial input perturbation? Alvarez-Melis, David et al. "On the robustness of interpretability methods." WHI, ICML, 2018.
• 201. 202 [Slack et. al., 2020] Many = 250 perturbations; Few = 25 perturbations; When you repeatedly run LIME on the same instance, you get different explanations (blue region). Is the problem having too few perturbations? If so, what is the optimal number of perturbations? Limitations: Stability – Problem is Worse!
  • 202. 203 Dombrowski et. al. 2019 Post-hoc Explanations are Fragile Post-hoc explanations can be easily manipulated.
  • 206. 207 Adversarial Attacks on Explanations Adversarial Attack of Ghorbani et. al. 2018 Minimally modify the input with a small perturbation without changing the model prediction.
  • 209. 210 Scaffolding attack used to hide classifier dependence on gender. Slack and Hilgard et. al. 2020 Adversarial Classifiers to fool LIME & SHAP
  • 210. Vulnerabilities of LIME/SHAP: Intuition 211 Several perturbed data points are out of distribution (OOD)!
  • 211. Vulnerabilities of LIME/SHAP: Intuition 212 Adversaries can exploit this and build a classifier that is biased on in-sample data points and unbiased on OOD samples!
  • 212. Building Adversarial Classifiers • Setting: • Adversary wants to deploy a biased classifier f in real world. • E.g., uses only race to make decisions • Adversary must provide black box access to customers and regulators who may use post hoc techniques (GDPR). • Goal of adversary is to fool post hoc explanation techniques and hide underlying biases of f 213
  • 213. Building Adversarial Classifiers • Input: Adversary provides us with the biased classifier f, an input dataset X sampled from real world input distribution Xdist • Output: Scaffolded classifier e which behaves exactly like f when making predictions on instances sampled from Xdist but will not reveal underlying biases of f when probed with perturbation- based post hoc explanation techniques. • e is the adversarial classifier 214
• 214. Building Adversarial Classifiers • The adversarial classifier e can be defined as: e(x) = f(x) if x is detected as coming from the input distribution Xdist, and ψ(x) otherwise • f is the biased classifier input by the adversary • ψ is the unbiased classifier (e.g., only uses features uncorrelated to sensitive attributes) 215
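A sketch of the scaffolding construction; `is_ood` stands in for the out-of-distribution detector the adversary trains to distinguish real data points from LIME/SHAP perturbations:

```python
def adversarial_classifier(x, f, psi, is_ood):
    # behave like the biased classifier f on real data, and like the unbiased
    # classifier psi on off-manifold points generated by perturbation-based explainers
    return psi(x) if is_ood(x) else f(x)
```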
• 215. 216 Limitations: Stability Post-hoc explanations can be unstable to small, non-adversarial perturbations to the input. Alvarez et. al. 2018.
• 216. 217 Limitations: Stability Post-hoc explanations can be unstable to small, non-adversarial perturbations to the input. ‘Local Lipschitz Constant’ Input Explanation function: LIME, SHAP, Gradient, etc. Alvarez et. al. 2018.
  • 217. 218 Limitations: Stability ● Perturbation approaches like LIME can be unstable. Alvarez et. al. 2018. Estimate for 100 tests for an MNIST Model.
  • 218. 219 Sensitivity to Hyperparameters Bansal, Agarwal, & Nguyen, 2020. Explanations can be highly sensitive to hyperparameters such as random seed, number of perturbations, patch size, etc.
  • 219. 220 Utility: High fidelity explanations can mislead Lakkaraju & Bastani 2019. In a bail adjudication task, misleading high-fidelity explanations improve end-user (domain experts) trust. True Classifier relies on race
  • 220. 221 Lakkaraju & Bastani 2019. True Classifier relies on race High fidelity ‘misleading’ explanation In a bail adjudication task, misleading high-fidelity explanations improve end-user (domain experts) trust. Utility: High fidelity explanations can mislead
• 221. Utility: Post hoc Explanations Instill Over-Trust • Domain experts and end users seem to over-trust explanations and the underlying models based on those explanations • Data scientists over-trusted explanations without even comprehending them -- “Participants trusted the tools because of their visualizations and their public availability” 222 [Kaur et. al., 2020; Bucinca et. al., 2020]
  • 222. 223 Responses from Data Scientists Using Explainability Tools (GAM and SHAP) [Kaur et. al., 2020]
  • 223. 224 Utility: Explanations for Debugging Poursabzi-Sangdeh et. al. 2019 In a housing price prediction task, Amazon mechanical turkers are unable to use linear model coefficients to diagnose model mistakes.
  • 224. 225 Adebayo et. al., 2020. In a dog breeds classification task, users familiar with machine learning rely on labels, instead of saliency maps, for diagnosing model errors. Utility: Explanations for Debugging
  • 227. 228 ● Mixed evidence: ● simulation and benchmark studies show that explanations are useful for debugging; ● however, recent user studies show limited utility in practice. ● Rigorous user studies and pilots with end-users can continue to help provide feedback to researchers on what to address (see: Alqaraawi et. al. 2020, Bhatt et. al. 2020 & Kaur et. al. 2020). Conflicting Evidence on Utility of Explanations
  • 228. Utility: Disagreement Problem in XAI • Study to understand: • if and how often feature attribution based explanation methods disagree with each other in practice • What constitutes disagreement between these explanations, and how to formalize the notion of explanation disagreement based on practitioner inputs? • How do practitioners resolve explanation disagreement? 229 Krishna and Han et. al., 2022
  • 229. • 30 minute semi-structured interviews with 25 data scientists • 84% of participants said they often encountered disagreement between explanation methods • Characterizing disagreement: • Top features are different • Ordering among top features is different • Direction of top feature contributions is different • Relative ordering of features of interest is different 230 Practitioner Inputs on Explanation Disagreement
  • 230. • Online user study where 25 users were shown explanations that disagree and asked to make a choice, and explain why • Practitioners are choosing methods due to: • Associated theory or publication time (33%) • Explanations matching human intuition better (32%) • Type of data (23%) • E.g., LIME or SHAP are better for tabular data 231 How do Practitioners Resolve Disagreements?
  • 231. 232 How do Practitioners Resolve Disagreements?
  • 232. 233 Empirical Analysis: Summary ● Faithfulness/Fidelity ■ Some explanation methods do not ‘reflect’ the underlying model. ● Fragility ■ Post-hoc explanations can be easily manipulated. ● Stability ■ Slight changes to inputs can cause large changes in explanations. ● Useful in practice?
  • 233. Theoretically Analyzing Interpretable Models ▪ Two main classes of theoretical results: ▪ Interpretable models learned using certain algorithms are certifiably optimal ▪ E.g., rule lists (Angelino et. al., 2018) ▪ No accuracy-interpretability tradeoffs in certain settings ▪ E.g., reinforcement learning for mazes (Mansour et. al., 2022)
  • 234. Theoretical Analysis of Tabular LIME w.r.t. Linear Models • Theoretical analysis of LIME • “black box” is a linear model • data is tabular and discretized • Obtained closed-form solutions of the average coefficients of the “surrogate” model (explanation output by LIME) • The coefficients obtained are proportional to the gradient of the function to be explained • Local error of surrogate model is bounded away from zero with high probability 235 [Garreau et. al., 2020]
  • 235. Unification and Robustness of LIME and SmoothGrad • C-LIME (a continuous variant of LIME) and SmoothGrad converge to the same explanation in expectation • At expectation, the resulting explanations are provably robust according to the notion of Lipschitz continuity • Finite sample complexity bounds for the number of perturbed samples required for SmoothGrad and C-LIME to converge to their expected output 236 [Agarwal et. al., 2020]
  • 236. Function Approximation Perspective to Characterizing Post hoc Explanation Methods 237 [Han et. al., 2022] • Various feature attribution methods (e.g., LIME, C-LIME, KernelSHAP, Occlusion, Vanilla Gradients, Gradient times Input, SmoothGrad, Integrated Gradients) are essentially local linear function approximations. • But…
  • 237. Function Approximation Perspective to Characterizing Post hoc Explanation Methods 238 [Han et. al., 2022] • But, they adopt different loss functions, and local neighborhoods
  • 238. Function Approximation Perspective to Characterizing Post hoc Explanation Methods • No Free Lunch Theorem for Explanation Methods: No single method can perform optimally across all neighborhoods 239
  • 239. Agenda 240 ❑ Inherently Interpretable Models ❑ Post hoc Explanation Methods ❑ Evaluating Model Interpretations/Explanations ❑ Empirically & Theoretically Analyzing Interpretations/Explanations ❑ Future of Model Understanding
  • 240. Future of Model Understanding 241 Methods for More Reliable Post hoc Explanations Theoretical Analysis of the Behavior of Interpretable Models & Explanation Methods Model Understanding Beyond Classification Intersections with Model Privacy Intersections with Model Fairness Empirical Evaluation of the Correctness & Utility of Model Interpretations/Explanations Intersections with Model Robustness Characterizing Similarities and Differences Between Various Methods New Interfaces, Tools, Benchmarks for Model Understanding
  • 241. Methods for More Reliable Post hoc Explanations 242 ❑ We have seen several limitations in the behavior of post hoc explanation methods – e.g., unstable, inconsistent, fragile, not faithful ❑ While there are already attempts to address some of these limitations, more work is needed
  • 242. Challenges with LIME: Stability • Perturbation approaches like LIME/SHAP are unstable 243 Alvarez-Melis, 2018
  • 243. Challenges with LIME: Consistency 244 Slack et. al., 2020 Many = 250 perturbations; Few = 25 perturbations; When you repeatedly run LIME on the same instance, you get different explanations (blue region)
  • 244. Challenges with LIME: Consistency 245 Problem with having too few perturbations? What is the optimal number of perturbations? Can we just use a very large number of perturbations?
  • 245. Challenges with LIME: Scalability • Querying complex models (e.g., Inception Network, ResNet, AlexNet) repeatedly for labels can be computationally prohibitive • Large number of perturbations 🡪 Large number of model queries 246 Generating reliable explanations using LIME can be computationally expensive!
  • 246. Explanations with Guarantees: BayesLIME and BayesSHAP • Intuition: Instead of point estimates of feature importances, define these as distributions 247
  • 247. BayesLIME and BayesSHAP • Construct a Bayesian locally weighted regression that can accommodate LIME/SHAP weighting functions 248 Feature Importances Perturbations Weighting Function Black Box Predictions Priors on feature importances and feature importance uncertainty
• 248. • Conjugacy results in the following posteriors • We can compute all parameters in closed form BayesLIME and BayesSHAP: Inference 249 These are the same equations used in LIME & SHAP!
  • 249. Estimating the Required Number of Perturbations 250 Estimate required number of perturbations for user specified uncertainty level.
  • 250. Improving Efficiency: Focused Sampling • Instead of sampling perturbations randomly and querying the black box, choose points the learning algorithm is most uncertain about and only query their labels from the black box. 251 This approach allows us to construct explanations with user defined levels of confidence in an efficient manner!
  • 251. Other Questions • Can we construct post hoc explanations that are provably robust to various adversarial attacks discussed earlier? • Can we construct post hoc explanations that can guarantee faithfulness, stability, and fairness simultaneously? 252
  • 252. Future of Model Understanding 253 Methods for More Reliable Post hoc Explanations Theoretical Analysis of the Behavior of Interpretable Models & Explanation Methods Model Understanding Beyond Classification Intersections with Model Privacy Intersections with Model Fairness Empirical Evaluation of the Correctness & Utility of Model Interpretations/Explanations Intersections with Model Robustness Characterizing Similarities and Differences Between Various Methods New Interfaces, Tools, Benchmarks for Model Understanding
  • 253. Theoretical Analysis of the Behavior of Explanations/Models • We discussed some of the recent theoretical results earlier. Despite these, several important questions remain unanswered • Can we characterize the conditions under which each post hoc explanation method (un)successfully captures the behavior of the underlying model? • Given the properties of the underlying model, data distribution, can we theoretically determine which explanation method should be employed? • Can we theoretically analyze the nature of the prototypes/attention weights learned by deep nets with added layers? When are these meaningful/when are they spurious? 254
  • 254. Future of Model Understanding 255 Methods for More Reliable Post hoc Explanations Theoretical Analysis of the Behavior of Interpretable Models & Explanation Methods Model Understanding Beyond Classification Intersections with Model Privacy Intersections with Model Fairness Empirical Evaluation of the Correctness & Utility of Model Interpretations/Explanations Intersections with Model Robustness Characterizing Similarities and Differences Between Various Methods New Interfaces, Tools, Benchmarks for Model Understanding
  • 255. Empirical Analysis of Correctness/Utility • While there is already a lot of work on empirical analysis of correctness/utility for post hoc explanation methods, there is still no clear characterization of which methods (if any) are correct/useful under what conditions. • There is even less work on the empirical analysis of the correctness/utility of the interpretations generated by inherently interpretable models. For instance, are the prototypes generated by adding prototype layers correct/meaningful? Can they be leveraged in any real world applications? What about attention weights? 256
  • 256. Future of Model Understanding 257 Methods for More Reliable Post hoc Explanations Theoretical Analysis of the Behavior of Interpretable Models & Explanation Methods Model Understanding Beyond Classification Intersections with Model Privacy Intersections with Model Fairness Empirical Evaluation of the Correctness & Utility of Model Interpretations/Explanations Intersections with Model Robustness Characterizing Similarities and Differences Between Various Methods New Interfaces, Tools, Benchmarks for Model Understanding
• 257. Characterizing Similarities and Differences • Several post hoc explanation methods exist, employing diverse algorithms and definitions of what constitutes an explanation. Under what conditions do these methods generate similar outputs (e.g., top-K features)? • Multiple interpretable models output natural/synthetic prototypes (e.g., Li et. al., Chen et. al., etc.). When do they generate similar answers and why? 258
  • 258. Future of Model Understanding 259 Methods for More Reliable Post hoc Explanations Theoretical Analysis of the Behavior of Interpretable Models & Explanation Methods Model Understanding Beyond Classification Intersections with Model Privacy Intersections with Model Fairness Empirical Evaluation of the Correctness & Utility of Model Interpretations/Explanations Intersections with Model Robustness Characterizing Similarities and Differences Between Various Methods New Interfaces, Tools, Benchmarks for Model Understanding
  • 259. Model Understanding Beyond Classification • How to think about interpretability in the context of large language models and foundation models? What is even feasible here? • Already active work on interpretability in RL and GNNs. However, very little research on analyzing the correctness/utility of these explanations. • Given that primitive interpretable models/post hoc explanations suffer from so many limitations, how to ensure explanations for more complex models are reliable? 260 [Coppens et. al., 2019, Amir et. al. 2018] [Ying et. al., 2019]
  • 260. Future of Model Understanding 261 Methods for More Reliable Post hoc Explanations Theoretical Analysis of the Behavior of Interpretable Models & Explanation Methods Model Understanding Beyond Classification Intersections with Model Privacy Intersections with Model Fairness Empirical Evaluation of the Correctness & Utility of Model Interpretations/Explanations Intersections with Model Robustness Characterizing Similarities and Differences Between Various Methods New Interfaces, Tools, Benchmarks for Model Understanding
  • 261. Intersections with Model Robustness • Are inherently interpretable models with prototype/attention layers more robust than those without these layers? If so, why? • Are there any inherent trade-offs between (certain kinds of) model interpretability and model robustness? Or do these aspects reinforce each other? • Prior works show that counterfactual explanation generation algorithms output adversarial examples. What is the impact of adversarially robust models on these explanations? [Pawelczyk et. al., 2022] 262
  • 262. Future of Model Understanding 263 Methods for More Reliable Post hoc Explanations Theoretical Analysis of the Behavior of Interpretable Models & Explanation Methods Model Understanding Beyond Classification Intersections with Model Privacy Intersections with Model Fairness Empirical Evaluation of the Correctness & Utility of Model Interpretations/Explanations Intersections with Model Robustness Characterizing Similarities and Differences Between Various Methods New Interfaces, Tools, Benchmarks for Model Understanding
• 263. Intersections with Model Fairness • It is often hypothesized that model interpretations and explanations help unearth the unfairness and biases of underlying models. However, there is little to no empirical research demonstrating this. • Conducting more empirical evaluations and user studies to determine how interpretations and explanations can complement statistical notions of fairness in identifying racial/gender biases • How does the (statistical) fairness of inherently interpretable models compare with that of vanilla models? Are there any inherent trade-offs between (certain kinds of) model interpretability and model fairness? Or do these aspects reinforce each other? 264
  • 264. Future of Model Understanding 265 Methods for More Reliable Post hoc Explanations Theoretical Analysis of the Behavior of Interpretable Models & Explanation Methods Model Understanding Beyond Classification Intersections with Model Privacy Intersections with Model Fairness Empirical Evaluation of the Correctness & Utility of Model Interpretations/Explanations Intersections with Model Robustness Characterizing Similarities and Differences Between Various Methods New Interfaces, Tools, Benchmarks for Model Understanding
• 265. Intersections with Differential Privacy ● Model interpretations and explanations could potentially expose sensitive information from the datasets. ● Little to no research on the privacy implications of interpretable models and/or explanations. What kinds of privacy attacks (e.g., membership inference, model inversion, etc.) are enabled? ● Do differentially private models help thwart these attacks? If so, under what conditions? Should we construct differentially private explanations? 266 [Harder et. al., 2020; Patel et. al. 2020]
  • 266. Future of Model Understanding 267 Methods for More Reliable Post hoc Explanations Theoretical Analysis of the Behavior of Interpretable Models & Explanation Methods Model Understanding Beyond Classification Intersections with Model Privacy Intersections with Model Fairness Empirical Evaluation of the Correctness & Utility of Model Interpretations/Explanations Intersections with Model Robustness Characterizing Similarities and Differences Between Various Methods New Interfaces, Tools, Benchmarks for Model Understanding
  • 267. New Interfaces, Tools, Benchmarks for Model Understanding ● Can we construct more interactive interfaces for end users to engage with models? What would be the nature of such interactions? [demo] ● As model interpretations and explanations are employed in different settings, we need to develop new benchmarks and tools for enabling comparison of faithfulness, stability, fairness, utility of various methods. How to enable that? 268 [Lakkaraju et. al., 2022, Slack et. al., 2022]
• 268. Some Parting Thoughts.. ● There has been renewed interest in model understanding over the past half decade, thanks to ML models being deployed in healthcare and other high-stakes settings ● As ML models continue to get increasingly complex and continue to find more applications, the need for model understanding is only going to grow ● Lots of interesting and open problems waiting to be solved ● You can approach the field of XAI from diverse perspectives: theory, algorithms, HCI, or interdisciplinary research – there is room for everyone! ☺ 269
  • 269. Thank You! • Email: hlakkaraju@hbs.edu; hlakkaraju@seas.harvard.edu; • Course on interpretability and explainability: https://interpretable-ml-class.github.io/ • More tutorials on interpretability and explainability: https://explainml-tutorial.github.io/ • Trustworthy ML Initiative: https://www.trustworthyml.org/ • Lots of resources and seminar series on topics related to explainability, fairness, adversarial robustness, differential privacy, causality etc. • Acknowledgements: Special thanks to Julius Adebayo, Chirag Agarwal, Shalmali Joshi, and Sameer Singh for co-developing and co-presenting sub-parts of this tutorial at NeurIPS, AAAI, and FAccT conferences. 270