Manipulating and Measuring Model Interpretability
Microsoft Research NYC
Forough Poursabzi-Sangdeh, Dan Goldstein, Jake Hofman, Jenn Wortman Vaughan, Hanna Wallach
INTERPRETABLE MACHINE LEARNING
• Simple models, e.g., generalized additive models (Lou et al. 2012 and 2013)
• Post-hoc explanations, e.g., LIME (Ribeiro et al. 2016)
INTERPRETABILITY?
• What makes a model or explanation interpretable?
DIFFERENT SCENARIOS, DIFFERENT PEOPLE, DIFFERENT NEEDS
• Needs: explain a prediction, understand model, make better decisions, debug model, de-bias model, inspire trust
• People: CEOs, data scientists, laypeople, regulators
• Approaches: Approach A, Approach B, Approach C
INTERPRETABILITY AS A LATENT PROPERTY
• Properties of model and system design: number of features, linearity, black-box vs. clear, visualizations, types of features, …
• Interpretability (latent)
• Properties of human behavior: trust, ability to debug, ability to simulate, ability to explain, ability to detect mistakes, …
• We need interdisciplinary approaches
FOCUS ON LAYPEOPLE
• Randomized human-subject experiments
- Manipulate properties of model and system design (number of features, linearity, black-box vs. clear, visualizations, types of features, …)
- Measure properties of human behavior (trust, ability to debug, ability to simulate, ability to explain, ability to detect mistakes, …)
USER EXPERIMENT, PREDICTIVE TASK
• Predict the price of apartments in NYC with the help of a model
EXPERIMENTAL CONDITIONS
• Tightly controlled experiments with four conditions, crossing clear vs. black-box (BB) models with 2 vs. 8 features:
- CLEAR-2 feature, BB-2 feature
- CLEAR-8 feature, BB-8 feature
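A randomized experiment over these four conditions needs some way to assign each participant to exactly one of them. The sketch below shows one possible scheme, a shuffled round-robin that keeps group sizes balanced. The slides do not describe the actual assignment procedure, so the function name `assign_conditions`, the seed, and the participant ids are all hypothetical.

```python
import random
from collections import Counter
from itertools import product

# The four conditions named on the slide: transparency x number of features.
CONDITIONS = [f"{t}-{k}" for t, k in product(("CLEAR", "BB"), (2, 8))]


def assign_conditions(participant_ids, seed=0):
    """Randomly assign participants to conditions with balanced group sizes.

    Hypothetical helper: the slides do not say how assignment was done.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Deal shuffled participants round-robin so group sizes differ by at most 1.
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}


assignment = assign_conditions(range(20), seed=42)
print(Counter(assignment.values()))
```

Shuffling first and then dealing round-robin keeps the assignment random while avoiding the unequal group sizes that independent coin flips can produce in small samples.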
USER INTERFACE AND INTERACTIONS
• Training phase: participants get familiar with the model
• Testing phase step 1: simulate the model’s prediction
• Testing phase step 2: observe the model’s prediction and guess the price (predict the actual selling price)
PRE-REGISTERED HYPOTHESES
• CLEAR-2 feature will be easiest for participants to simulate
• Participants will trust CLEAR-2 feature more than BB-8 feature
• Participants’ behaviors will vary when they see unusual examples where the model makes inaccurate predictions
https://aspredicted.org/xy5s6.pdf
SIMULATION ERROR
• Hypothesis: CLEAR-2 feature will be easiest for participants to simulate
• Simulation error: |m - u_m|, where m is the model’s prediction and u_m is the participant’s simulation of the model’s prediction
[Figure: mean simulation error ($0k to $200k) for CLEAR-2, CLEAR-8, BB-2, BB-8]
TRUST (DEVIATION FROM THE MODEL)
• Hypothesis: Participants will trust CLEAR-2 feature more than BB-8 feature
• Deviation: |m - u_a|, where m is the model’s prediction and u_a is the participant’s own prediction of the price
[Figure: mean deviation from the model ($0k to $150k) for CLEAR-2, CLEAR-8, BB-2, BB-8]
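The two outcome measures can be computed per condition in a few lines. The sketch below assumes the slide labels m, u_m, and u_a stand for the model's prediction, the participant's simulation of that prediction, and the participant's own price prediction. The records and the helper `mean_errors` are invented for illustration and are not study data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (condition, m, u_m, u_a) in dollars, where
#   m   = the model's prediction,
#   u_m = the participant's simulation of the model's prediction,
#   u_a = the participant's own prediction of the price.
# The numbers are made up for illustration, not study data.
records = [
    ("CLEAR-2", 1_000_000, 950_000, 1_100_000),
    ("CLEAR-2", 800_000, 820_000, 780_000),
    ("BB-8", 1_000_000, 700_000, 1_150_000),
    ("BB-8", 800_000, 1_000_000, 850_000),
]


def mean_errors(rows):
    """Return {condition: (mean simulation error, mean deviation from the model)}."""
    sim, dev = defaultdict(list), defaultdict(list)
    for condition, m, u_m, u_a in rows:
        sim[condition].append(abs(m - u_m))  # simulation error |m - u_m|
        dev[condition].append(abs(m - u_a))  # deviation from the model |m - u_a|
    return {c: (mean(sim[c]), mean(dev[c])) for c in sim}


for condition, (sim_err, deviation) in mean_errors(records).items():
    print(f"{condition}: simulation error ${sim_err:,.0f}, deviation ${deviation:,.0f}")
```

A smaller mean deviation is read here as greater trust, since the participant's final answer stays closer to what the model predicted.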
WEIRD APARTMENT
DETECTION OF MISTAKES
• Hypothesis: Participants’ behaviors will vary when they see unusual examples where the model makes inaccurate predictions
[Figure: mean deviation from the model for apartment 12 (1 bed, 3 bath), $0k to $300k, for CLEAR-2, CLEAR-8, BB-2, BB-8]
• When participants see unusual examples, they are less likely to correct inaccurate predictions made by clear models than those made by black-box models
WHAT IS UP WITH THIS?
CONJECTURE: VISUAL OVERLOAD
CONJECTURE: ANCHORING EFFECT
• User’s simulation of the model’s prediction
EXPLICIT ATTENTION CHECK
USER INTERFACE AND INTERACTIONS
• We remove potential anchors
PRE-REGISTERED HYPOTHESES
• Explicit attention checks on unusual inputs will affect participants’ ability to detect the model’s mistakes
• Model transparency affects participants’ ability to detect the model’s mistakes, both with and without attention checks
https://aspredicted.org/5xy8y.pdf
DETECTION OF MISTAKES
[Figure: mean participant prediction ($0M to $1.5M), with and without attention check, for CLEAR and BB models, shown against the model's prediction. Panels: Apartment 6 (1 bed, 3 bath, 726 sq ft) and Apartment 8 (1 bed, 3 bath, 350 sq ft)]
• Attention checks improve users’ ability to correct model’s mistakes
• No attention checks: clear models lower users’ ability to correct model’s mistakes
• With attention checks, there is no difference between clear and black-box
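The comparison in the figure comes down to averaging participant predictions within each cell of the design (model type by attention check) and seeing how far each average sits from the model's prediction. The following is a minimal sketch of that tabulation. The responses, the value of `MODEL_PREDICTION`, and the helper `mean_prediction_by_cell` are hypothetical and do not reproduce the study's numbers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical responses for one unusual apartment:
# (model type, attention check shown?, participant's price prediction in dollars).
# Values are invented for illustration, not study data.
MODEL_PREDICTION = 1_200_000  # assumed model output for this apartment
responses = [
    ("CLEAR", False, 1_150_000),
    ("CLEAR", False, 1_250_000),
    ("CLEAR", True, 900_000),
    ("BB", False, 1_000_000),
    ("BB", True, 900_000),
    ("BB", True, 800_000),
]


def mean_prediction_by_cell(rows):
    """Average participant prediction for each (model type, attention check) cell."""
    cells = defaultdict(list)
    for model_type, checked, prediction in rows:
        cells[(model_type, checked)].append(prediction)
    return {cell: mean(values) for cell, values in cells.items()}


for (model_type, checked), avg in sorted(mean_prediction_by_cell(responses).items()):
    label = "with attention check" if checked else "no attention check"
    gap = abs(MODEL_PREDICTION - avg)
    print(f"{model_type}, {label}: mean ${avg:,.0f}, gap to model ${gap:,.0f}")
```

When the model's prediction for an unusual apartment is inaccurate, a cell whose mean stays close to that prediction suggests participants did not correct the mistake.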
SUMMARY OF RESULTS
• A clear model with a small number of features is easier for participants to simulate
- People have a better understanding of simple and transparent models
• No significant difference in participants’ trust in the model
- Contrary to intuition, people do not necessarily trust simple and transparent models more
• Participants were less able to correct inaccurate predictions of a clear model than of a black-box model
- Too much transparency can be harmful
- Design implications (e.g., highlighting unusual inputs, displaying model internals on demand)
TAKEAWAYS
• Interpretability is not a purely computational problem
- We need interdisciplinary research to understand interpretability
• Our surprising results underscore that interpretability research is much more complicated
- We need more empirical studies
- Other scenarios, domains, models, factors, outcomes
https://csel.cs.colorado.edu/~fopo5620/
forough.poursabzi@microsoft.com
Thanks!

More Related Content

Similar to Manipulating and measuring model interpretability

MUMS: Transition & SPUQ Workshop - Stochastic Simulators: Issues, Methods, Un...
MUMS: Transition & SPUQ Workshop - Stochastic Simulators: Issues, Methods, Un...MUMS: Transition & SPUQ Workshop - Stochastic Simulators: Issues, Methods, Un...
MUMS: Transition & SPUQ Workshop - Stochastic Simulators: Issues, Methods, Un...
The Statistical and Applied Mathematical Sciences Institute
 
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Pramit Choudhary
 
Causal inference-for-profit | Dan McKinley | DN18
Causal inference-for-profit | Dan McKinley | DN18Causal inference-for-profit | Dan McKinley | DN18
Causal inference-for-profit | Dan McKinley | DN18
DataconomyGmbH
 
DN18 | A/B Testing: Lessons Learned | Dan McKinley | Mailchimp
DN18 | A/B Testing: Lessons Learned | Dan McKinley | MailchimpDN18 | A/B Testing: Lessons Learned | Dan McKinley | Mailchimp
DN18 | A/B Testing: Lessons Learned | Dan McKinley | Mailchimp
Dataconomy Media
 
Sattose 2020 presentation
Sattose 2020 presentationSattose 2020 presentation
Sattose 2020 presentation
Céline Deknop
 
FairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment FrameworkFairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment Framework
maniopas
 
Engine90 crawford-decision-making (1)
Engine90 crawford-decision-making (1)Engine90 crawford-decision-making (1)
Engine90 crawford-decision-making (1)
Divyansh Dokania
 
Avi-newmans_fast_community_detection.pptx
Avi-newmans_fast_community_detection.pptxAvi-newmans_fast_community_detection.pptx
Avi-newmans_fast_community_detection.pptx
ssuser3fa333
 
HOP-Rec_RecSys18
HOP-Rec_RecSys18HOP-Rec_RecSys18
HOP-Rec_RecSys18
Matt Yang
 
[SAC2014]Splitting Approaches for Context-Aware Recommendation: An Empirical ...
[SAC2014]Splitting Approaches for Context-Aware Recommendation: An Empirical ...[SAC2014]Splitting Approaches for Context-Aware Recommendation: An Empirical ...
[SAC2014]Splitting Approaches for Context-Aware Recommendation: An Empirical ...YONG ZHENG
 
House price prediction
House price predictionHouse price prediction
House price prediction
AdityaKumar1505
 
ch06 solution design of experiment for eng
ch06 solution design of experiment for engch06 solution design of experiment for eng
ch06 solution design of experiment for eng
ashaby
 
How to focus - design your new app in 60 minutes!
How to focus - design your new app in 60 minutes!How to focus - design your new app in 60 minutes!
How to focus - design your new app in 60 minutes!
Zach Pousman
 
BSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, EvaluationsBSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, Evaluations
BigML, Inc
 
Personalised Recommendations in E-Commerce
Personalised Recommendations in E-CommercePersonalised Recommendations in E-Commerce
Personalised Recommendations in E-CommerceWing Yung Chan
 
Design of Engineering Experiments Part 5
Design of Engineering Experiments Part 5Design of Engineering Experiments Part 5
Design of Engineering Experiments Part 5
Stats Statswork
 
Data Science Popup Austin: Predicting Customer Behavior & Enhancing Customer ...
Data Science Popup Austin: Predicting Customer Behavior & Enhancing Customer ...Data Science Popup Austin: Predicting Customer Behavior & Enhancing Customer ...
Data Science Popup Austin: Predicting Customer Behavior & Enhancing Customer ...
Domino Data Lab
 
Study on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit ScoringStudy on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit Scoring
harmonylab
 
Using Diversity for Automated Boundary Value Testing
Using Diversity for Automated Boundary Value TestingUsing Diversity for Automated Boundary Value Testing
Using Diversity for Automated Boundary Value Testing
Felix Dobslaw
 

Similar to Manipulating and measuring model interpretability (20)

MUMS: Transition & SPUQ Workshop - Stochastic Simulators: Issues, Methods, Un...
MUMS: Transition & SPUQ Workshop - Stochastic Simulators: Issues, Methods, Un...MUMS: Transition & SPUQ Workshop - Stochastic Simulators: Issues, Methods, Un...
MUMS: Transition & SPUQ Workshop - Stochastic Simulators: Issues, Methods, Un...
 
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
 
Causal inference-for-profit | Dan McKinley | DN18
Causal inference-for-profit | Dan McKinley | DN18Causal inference-for-profit | Dan McKinley | DN18
Causal inference-for-profit | Dan McKinley | DN18
 
DN18 | A/B Testing: Lessons Learned | Dan McKinley | Mailchimp
DN18 | A/B Testing: Lessons Learned | Dan McKinley | MailchimpDN18 | A/B Testing: Lessons Learned | Dan McKinley | Mailchimp
DN18 | A/B Testing: Lessons Learned | Dan McKinley | Mailchimp
 
Sattose 2020 presentation
Sattose 2020 presentationSattose 2020 presentation
Sattose 2020 presentation
 
FairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment FrameworkFairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment Framework
 
UXD lesson 1 - Intro To UX
UXD lesson 1 - Intro To UXUXD lesson 1 - Intro To UX
UXD lesson 1 - Intro To UX
 
Engine90 crawford-decision-making (1)
Engine90 crawford-decision-making (1)Engine90 crawford-decision-making (1)
Engine90 crawford-decision-making (1)
 
Avi-newmans_fast_community_detection.pptx
Avi-newmans_fast_community_detection.pptxAvi-newmans_fast_community_detection.pptx
Avi-newmans_fast_community_detection.pptx
 
HOP-Rec_RecSys18
HOP-Rec_RecSys18HOP-Rec_RecSys18
HOP-Rec_RecSys18
 
[SAC2014]Splitting Approaches for Context-Aware Recommendation: An Empirical ...
[SAC2014]Splitting Approaches for Context-Aware Recommendation: An Empirical ...[SAC2014]Splitting Approaches for Context-Aware Recommendation: An Empirical ...
[SAC2014]Splitting Approaches for Context-Aware Recommendation: An Empirical ...
 
House price prediction
House price predictionHouse price prediction
House price prediction
 
ch06 solution design of experiment for eng
ch06 solution design of experiment for engch06 solution design of experiment for eng
ch06 solution design of experiment for eng
 
How to focus - design your new app in 60 minutes!
How to focus - design your new app in 60 minutes!How to focus - design your new app in 60 minutes!
How to focus - design your new app in 60 minutes!
 
BSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, EvaluationsBSSML17 - Introduction, Models, Evaluations
BSSML17 - Introduction, Models, Evaluations
 
Personalised Recommendations in E-Commerce
Personalised Recommendations in E-CommercePersonalised Recommendations in E-Commerce
Personalised Recommendations in E-Commerce
 
Design of Engineering Experiments Part 5
Design of Engineering Experiments Part 5Design of Engineering Experiments Part 5
Design of Engineering Experiments Part 5
 
Data Science Popup Austin: Predicting Customer Behavior & Enhancing Customer ...
Data Science Popup Austin: Predicting Customer Behavior & Enhancing Customer ...Data Science Popup Austin: Predicting Customer Behavior & Enhancing Customer ...
Data Science Popup Austin: Predicting Customer Behavior & Enhancing Customer ...
 
Study on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit ScoringStudy on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit Scoring
 
Using Diversity for Automated Boundary Value Testing
Using Diversity for Automated Boundary Value TestingUsing Diversity for Automated Boundary Value Testing
Using Diversity for Automated Boundary Value Testing
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
MLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
MLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
MLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
MLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
MLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
MLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
MLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
MLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
MLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
MLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
MLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...



Manipulating and measuring model interpretability

  • 1. Manipulating and Measuring Model Interpretability. Forough Poursabzi-Sangdeh, Dan Goldstein, Jake Hofman, Jenn Wortman Vaughan, Hanna Wallach (Microsoft Research NYC)
  • 2. INTERPRETABLE MACHINE LEARNING
  • 3. INTERPRETABLE MACHINE LEARNING. Simple models, e.g., generalized additive models (Lou et al. 2012 and 2013)
  • 4. INTERPRETABLE MACHINE LEARNING. Simple models, e.g., generalized additive models (Lou et al. 2012 and 2013). Post-hoc explanations, e.g., LIME (Ribeiro et al. 2016)
  • 5. INTERPRETABILITY? What makes a model or explanation interpretable?
  • 6. DIFFERENT SCENARIOS, DIFFERENT PEOPLE, DIFFERENT NEEDS. Scenarios: explain a prediction, understand model, make better decisions, debug model, de-bias model, inspire trust. People: CEOs, data scientists, laypeople, regulators. Approaches: Approach A, Approach B, Approach C
  • 8-12. INTERPRETABILITY AS A LATENT PROPERTY. Properties of model and system design: number of features, linearity, black-box vs. clear, visualizations, types of features, … Properties of human behavior: trust, ability to debug, ability to simulate, ability to explain, ability to detect mistakes, … We need interdisciplinary approaches
  • 13. FOCUS ON LAYPEOPLE. Properties of model and system design: number of features, linearity, black-box vs. clear, visualizations, types of features, … Properties of human behavior: trust, ability to debug, ability to simulate, ability to explain, ability to detect mistakes, … Randomized human-subject experiments
  • 14. USER EXPERIMENT, PREDICTIVE TASK. Predict the price of apartments in NYC with the help of a model
  • 19. EXPERIMENTAL CONDITIONS. CLEAR-2 feature, BB-2 feature, CLEAR-8 feature, BB-8 feature
  • 20-22. TIGHTLY CONTROLLED EXPERIMENTS. CLEAR-2 feature, BB-2 feature, CLEAR-8 feature, BB-8 feature
  • 23. USER INTERFACE AND INTERACTIONS. Training phase: participants get familiar with the model. Testing phase, step 1: simulate the model’s prediction
  • 24. USER INTERFACE AND INTERACTIONS. Testing phase, step 2: observe the model’s prediction and guess the price (predict actual selling price)
  • 25. PRE-REGISTERED HYPOTHESES (https://aspredicted.org/xy5s6.pdf)
      • CLEAR-2 feature will be easiest for participants to simulate
      • Participants will trust CLEAR-2 feature more than BB-8 feature
      • Participants’ behaviors will vary when they see unusual examples where the model makes inaccurate predictions
  • 26-29. SIMULATION ERROR. Hypothesis: CLEAR-2 feature will be easiest for participants to simulate. [Figure: bar chart of mean simulation error, $0k to $200k, for CLEAR-2, CLEAR-8, BB-2, BB-8; diagram annotated with m and u_m]
  • 30-33. TRUST (DEVIATION FROM THE MODEL). Hypothesis: participants will trust CLEAR-2 feature more than BB-8 feature. [Figure: bar chart of mean deviation from the model, $0k to $150k, for CLEAR-2, CLEAR-8, BB-2, BB-8; diagram annotated with m and u_a]
  • 35-38. DETECTION OF MISTAKES. Hypothesis: participants’ behaviors will vary when they see unusual examples where the model makes inaccurate predictions. [Figure: bar chart of mean deviation from the model for apartment 12 (1 bed, 3 bath), $0k to $300k, for CLEAR-2, CLEAR-8, BB-2, BB-8; diagram annotated with m and u_a] When participants see unusual examples, they are less likely to correct inaccurate predictions made by clear models than those made by black-box models
  • 39. WHAT IS UP WITH THIS?
  • 43. CONJECTURE: ANCHORING EFFECT User’s simulation of the model’s prediction
  • 45. USER INTERFACE AND INTERACTIONS. We remove potential anchors
  • 46. PRE-REGISTERED HYPOTHESES (https://aspredicted.org/5xy8y.pdf)
      • Explicit attention checks on unusual inputs will affect participants’ abilities in detecting model’s mistakes
      • Model transparency affects participants’ abilities in detecting model’s mistakes, both with and without attention checks
  • 47. DETECTION OF MISTAKES. [Figure: mean participant prediction, $0M to $1.5M, against the model's prediction, for CLEAR and BB, with and without attention check; panels: Apartment 6 (1 bed, 3 bath, 726 sq ft) and Apartment 8 (1 bed, 3 bath, 350 sq ft)]
  • 48-50. DETECTION OF MISTAKES (same figure). No attention checks: clear models lower users’ ability to correct model’s mistakes
  • 51-53. DETECTION OF MISTAKES (same figure). Attention checks improve users’ ability to correct model’s mistakes
  • 54-55. DETECTION OF MISTAKES (same figure). With attention checks, there is no difference between clear and black-box
  • 56. SUMMARY OF RESULTS
      • A clear model with a small number of features is easier for participants to simulate
          - People have a better understanding of simple and transparent models
      • No significant difference in participants’ trust in the model
          - Contrary to intuition, people do not necessarily trust simple and transparent models more
      • Participants were less able to correct inaccurate predictions of a clear model than a black-box model
          - Too much transparency can be harmful
          - Design implications (e.g., highlighting unusual inputs, display model internals on demand)
  • 57. TAKEAWAYS
      • Interpretability is not a purely computational problem
          - We need interdisciplinary research to understand interpretability
      • Our surprising results underscore that interpretability research is much more complicated
          - We need more empirical studies
          - Other scenarios, domains, models, factors, outcomes
  • 58. Thanks! https://csel.cs.colorado.edu/~fopo5620/ forough.poursabzi@microsoft.com
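
The two measures plotted on the SIMULATION ERROR and TRUST slides could be computed roughly as below. This is a minimal sketch, not the authors' analysis code. It assumes (the slides do not spell this out) that m is the model's prediction, u_m is the participant's simulation of it, u_a is the participant's own price prediction, and that each measure is an absolute difference averaged per condition. The function name `mean_errors` and the records in `RESPONSES` are hypothetical, and the dollar values are made up for illustration; they are not study data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, NOT study data:
# (condition, model prediction m, participant's simulation u_m,
#  participant's own prediction u_a), all in dollars.
RESPONSES = [
    ("CLEAR-2", 1_000_000, 950_000, 1_100_000),
    ("CLEAR-2", 800_000, 800_000, 750_000),
    ("BB-8", 1_000_000, 700_000, 1_200_000),
    ("BB-8", 800_000, 1_000_000, 900_000),
]


def mean_errors(responses):
    """Return {condition: (mean simulation error, mean deviation from the model)}.

    Assumed definitions: simulation error = |m - u_m|, deviation = |m - u_a|.
    """
    simulation = defaultdict(list)
    deviation = defaultdict(list)
    for condition, m, u_m, u_a in responses:
        simulation[condition].append(abs(m - u_m))
        deviation[condition].append(abs(m - u_a))
    return {c: (mean(simulation[c]), mean(deviation[c])) for c in simulation}


if __name__ == "__main__":
    for condition, (sim_err, dev) in mean_errors(RESPONSES).items():
        print(f"{condition}: simulation error ${sim_err:,.0f}, deviation ${dev:,.0f}")
```

Under these assumed definitions, a smaller mean deviation means participants followed the model's prediction more closely, which is how the TRUST slides read the chart.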