Manipulating and measuring model interpretability

Forough Poursabzi-Sangdeh, Dan Goldstein, Jake Hofman, Jenn Wortman Vaughan, Hanna Wallach
Microsoft Research NYC

INTERPRETABLE MACHINE LEARNING
• Simple models, e.g., generalized additive models (Lou et al. 2012 and 2013)
• Post-hoc explanations, e.g., LIME (Ribeiro et al. 2016)

INTERPRETABILITY?
• What makes a model or explanation interpretable?

DIFFERENT SCENARIOS, DIFFERENT PEOPLE, DIFFERENT NEEDS
• Scenarios: explain a prediction, understand model, make better decisions, debug model, de-bias model, inspire trust
• People: CEOs, data scientists, laypeople, regulators
• Approaches: A, B, C

INTERPRETABILITY AS A LATENT PROPERTY
• Properties of model and system design: number of features, linearity, black-box vs. clear, visualizations, types of features, …
• Interpretability (latent)
• Properties of human behavior: trust, ability to debug, ability to simulate, ability to explain, ability to detect mistakes, …
• We need interdisciplinary approaches

FOCUS ON LAYPEOPLE
• Randomized human-subject experiments

USER EXPERIMENT, PREDICTIVE TASK
• Predict the price of apartments in NYC with the help of a model

EXPERIMENTAL CONDITIONS
[Figure: the CLEAR-2 feature and BB-2 feature conditions]
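The four condition labels that appear in this deck (CLEAR-2, CLEAR-8, BB-2, BB-8) can be read as a 2×2 design crossing transparency with number of features. The following is a minimal sketch of enumerating those conditions and assigning participants to them, assuming uniform random between-subjects assignment. The function name `assign_conditions`, the participant IDs, and the seed are hypothetical and not taken from the study.

```python
import itertools
import random

# Factor levels as named on the slides: clear vs. black-box (BB), 2 vs. 8 features.
TRANSPARENCY = ("CLEAR", "BB")
NUM_FEATURES = (2, 8)

# Cross the two factors to get the four condition labels.
CONDITIONS = [f"{t}-{n}" for t, n in itertools.product(TRANSPARENCY, NUM_FEATURES)]


def assign_conditions(participant_ids, seed=0):
    """Assign each participant to one condition uniformly at random (assumed scheme)."""
    rng = random.Random(seed)
    return {pid: rng.choice(CONDITIONS) for pid in participant_ids}


if __name__ == "__main__":
    print(CONDITIONS)
    # Hypothetical participant IDs, for illustration only.
    print(assign_conditions(["p1", "p2", "p3"]))
```

Fixing the seed keeps the assignment reproducible, which is one common way to keep such a design auditable.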

TIGHTLY CONTROLLED EXPERIMENTS

USER INTERFACE AND INTERACTIONS
• Training phase: participants get familiar with the model
• Testing phase step 1: simulate the model’s prediction
• Testing phase step 2: observe the model’s prediction and guess the price (predict the actual selling price)
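Each testing trial yields two participant responses: a simulation of the model's prediction (step 1) and a final price guess (step 2). The sketch below shows how the two outcome measures reported in this talk could be computed per trial. The symbols m, u_m, and u_a are read from the figure labels in this deck. The absolute-difference definitions are an assumption, and the `Trial` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    model_prediction: float  # m: the model's predicted price
    simulated: float         # u_m: participant's guess of what the model predicts (step 1)
    final: float             # u_a: participant's own price guess after seeing m (step 2)


def simulation_error(t: Trial) -> float:
    """Assumed definition: |m - u_m|."""
    return abs(t.model_prediction - t.simulated)


def deviation(t: Trial) -> float:
    """Assumed definition: |m - u_a|, used here as a proxy for (lack of) trust."""
    return abs(t.model_prediction - t.final)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not data from the study.
    t = Trial(model_prediction=1_000_000, simulated=900_000, final=1_050_000)
    print(simulation_error(t), deviation(t))
```

Under these assumed definitions, a smaller simulation error means the participant anticipated the model better, and a smaller deviation means the participant's final guess stayed closer to the model's prediction.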

PRE-REGISTERED HYPOTHESES
• CLEAR-2 feature will be easiest for participants to simulate
• Participants will trust CLEAR-2 feature more than BB-8 feature
• Participants’ behaviors will vary when they see unusual examples where the model makes
inaccurate predictions
https://aspredicted.org/xy5s6.pdf

SIMULATION ERROR
• CLEAR-2 feature will be easiest for participants to simulate
[Figure: mean simulation error ($0k to $200k) by condition: CLEAR-2, CLEAR-8, BB-2, BB-8; diagram labels: m, u_m]

TRUST (DEVIATION FROM THE MODEL)
• Participants will trust CLEAR-2 feature more than BB-8 feature
[Figure: mean deviation from the model ($0k to $150k); diagram labels: m, u_a]

DETECTION OF MISTAKES
• Participants’ behaviors will vary when they see unusual examples where the model makes inaccurate predictions
[Figure: Apartment 12 (1 bed, 3 bath); axis from $0k to $300k, axis label truncated ("… for apartment 12"); diagram labels: m, u_a]
• When participants see unusual examples, they are less likely to correct inaccurate predictions made by clear models than by black-box models

CONJECTURE: ANCHORING EFFECT
• Potential anchor: user’s simulation of the model’s prediction
• We remove potential anchors

PRE-REGISTERED HYPOTHESES
• Explicit attention checks on unusual inputs will affect participants’ ability to detect the model’s mistakes
• Model transparency affects participants’ ability to detect the model’s mistakes, both with and without attention checks
https://aspredicted.org/5xy8y.pdf

[Figure: mean participant prediction ($0M to $1.5M), with and without attention checks, for Apartment 6 (1 bed, 3 bath, 726 sq ft) and Apartment 8 (1 bed, 3 bath, 350 sq ft); legend: model's prediction, CLEAR, BB]
• No attention checks: clear models lower users’ ability to correct the model’s mistakes
• Attention checks improve users’ ability to correct the model’s mistakes
• With attention checks, there is no difference between clear and black-box models
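The comparison above comes down to averaging participant predictions within each cell of a model type × attention check design and comparing those averages to the model's prediction. Here is a minimal sketch of that aggregation, assuming responses are stored as (model type, attention check, prediction) tuples. The function name `mean_prediction_by_cell` and every number below are hypothetical placeholders, not data or results from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_prediction_by_cell(responses):
    """Average participant predictions per (model type, attention check) cell."""
    cells = defaultdict(list)
    for model_type, attention_check, prediction in responses:
        cells[(model_type, attention_check)].append(prediction)
    return {cell: mean(values) for cell, values in cells.items()}


if __name__ == "__main__":
    # Hypothetical placeholder responses (in dollars), NOT data from the study.
    responses = [
        ("CLEAR", False, 1_200_000),
        ("CLEAR", False, 1_000_000),
        ("BB", False, 900_000),
        ("CLEAR", True, 800_000),
        ("BB", True, 800_000),
    ]
    for cell, avg in sorted(mean_prediction_by_cell(responses).items()):
        print(cell, avg)
```

A real analysis would also need uncertainty estimates (for example, confidence intervals per cell) before drawing conclusions from differences between cells.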

SUMMARY OF RESULTS
• A clear model with a small number of features is easier for participants to simulate
- People have a better understanding of simple and transparent models
• No significant difference in participants’ trust in the model
- Contrary to intuition, people do not necessarily trust simple and transparent models more
• Participants were less able to correct inaccurate predictions of a clear model than of a black-box model
- Too much transparency can be harmful
- Design implications (e.g., highlighting unusual inputs, displaying model internals on demand)

TAKEAWAYS
• Interpretability is not a purely computational problem
- We need interdisciplinary research to understand interpretability
• Our surprising results underscore that interpretability research is much more complicated than intuition suggests
- We need more empirical studies
- Other scenarios, domains, models, factors, outcomes

Thanks!
https://csel.cs.colorado.edu/~fopo5620/
forough.poursabzi@microsoft.com
