Using game theory to explain the output of any machine learning model
April 18th, eScience Seminar
Scott Lundberg, Su-In Lee
How much money is someone likely to make?
model → 31% chance of making > $50K annually
Why?!
No loan
               Interpretable   Accurate
Complex model       ✘              ✔
Simple model        ✔              ✘
Interpretable, accurate: choose one.
Problem: Explaining complex models is hard!
Idea: Don’t explain the whole model, just one prediction.
Complex models are inherently complex! But a single prediction involves only a small piece of that complexity.
[Figure: model output value as a function of input value]
Goal: Model-agnostic interpretability
What if we could view the model as a black box (data → model → prediction)…
…and yet still be able to explain predictions (magic → explanation)?
Interpretable, accurate: choose two!
If only we had this magic box… (data → model → prediction, magic → explanation)
• Predictions from any complex model could be explained.
• Prediction would be decoupled from explanation, reducing method lock-in.
• Explanations from very different model types could be easily compared.
• …and $!
So let’s build it! (data → model → prediction, magic → explanation)
How much money is someone likely to make?
model → 31% chance of making > $50K annually
Capital losses $0
Weekly hours 40
Occupation Protective-serv
Capital gains $0
Age 28
Marital status Married-civ-spouse
How did we get here? (chance of making > $50K annually; axis from 15% to 40%)
Base rate: 26%. Model prediction: 31%.

When no attributes are given to the model, the prediction is the base rate:
model → 26% chance of making > $50K annually
(Occupation Exec-managerial, Age 37, Relationship Wife, Years in school 14, Sex Female, Marital status Married-civ-spouse: none shown to the model)

Giving the model only one attribute moves the prediction:
model → 25% if we only know they had no capital losses
(Capital losses $0, Weekly hours 40, Occupation Protective-serv, Capital gains $0, Age 28, Marital status Married-civ-spouse)

Adding one attribute at a time, starting from the 26% base rate on the 15%–40% axis:
• No capital losses → 25%
• Works in police/fire (Protective-serv) → 24%
• Works 40 hr/week
• No capital gains
• Age (no effect)
• Married
…ending at the 31% model prediction, with each bar labeled by its input (Married, No capital gains, 40 hr, P/F, Cap loss).
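The additive walk above can be sketched in a few lines. The base rate (26%), the first two steps (25%, 24%), the zero effect of age, and the final prediction (31%) come from the slides; the remaining step sizes are hypothetical fillers chosen only so the walk lands in the right place. The point is that the per-feature contributions sum exactly from the base rate to the prediction.

```python
# Sketch: an additive explanation assigns each feature a signed
# contribution such that base_rate + sum(contributions) == prediction.
base_rate = 0.26
phi = {
    "no capital losses": -0.01,  # 26% -> 25% (from the slides)
    "police/fire":       -0.01,  # 25% -> 24% (from the slides)
    "works 40 hr/week":  +0.01,  # hypothetical filler
    "no capital gains":  -0.01,  # hypothetical filler
    "age 28":             0.00,  # no effect (from the slides)
    "married":           +0.07,  # hypothetical filler
}
prediction = base_rate + sum(phi.values())
print(round(prediction, 2))  # 0.31: contributions sum to the model prediction
```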
[Figure: explanations for many samples. (A) Predicted probability of making ≥ $50K; (B) samples sorted by ES value similarity; bar width is equal to the ES value for that input. Clusters labeled: Large capital gain, Large capital loss, Young and single, Young and married, Highly educated and married, Highly educated and single, Divorced women, Married with typical education.]
Samples clustered by explanation similarity
Explaining a single prediction (chance of making > $50K annually, on the 15%–40% axis) from a model with 500 decision trees: a unique optimal explanation under basic axioms from cooperative game theory.
Some notation…
data → x
model → f
prediction → f(x)
magic explanation → gx
What is an explanation anyway? It is an interpretable function that approximates f!

Car salesman example: predict the probability of buying a car from Age, Weight, and Is student.
Imagine the explanation gx is a linear model of x, using “interpretable” inputs.
Interpretable input features: a mapping from the original inputs x to interpretable inputs x′.

Car salesman example: imagine the explanation gx is a linear model of x′, where the binary interpretable inputs indicate
Age known to be 25
Weight known to be 150
Is student known to be 1
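A tiny sketch of this idea, with hypothetical φ values and the car-salesman features: gx is a linear model over binary inputs x′, where x′_i = 1 means that feature’s value is observed and 0 means it is missing.

```python
# g(x') = phi_0 + sum_i phi_i * x'_i, with x'_i in {0, 1}.
# All phi values below are hypothetical, for illustration only.
phi_0 = 0.10                                                     # nothing observed
phi = {"age_25": 0.05, "weight_150": -0.02, "is_student": 0.12}  # hypothetical

def g(x_prime):
    """Linear explanation model over binary 'interpretable' inputs."""
    return phi_0 + sum(phi[name] * on for name, on in x_prime.items())

all_observed = g({"age_25": 1, "weight_150": 1, "is_student": 1})
none_observed = g({"age_25": 0, "weight_150": 0, "is_student": 0})
print(round(all_observed, 2), round(none_observed, 2))  # 0.25 0.1
```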
Some notation… the magic box is a method m that maps the model f and the input x to the explanation gx.
We are going to derive ‘m’ from basic axioms!
• Axiom 1 (Binarization)
• Axiom 2 (Linearity)
SHAP class of explanation methods
All methods that satisfy Axioms 1 and 2 are in the Shapley Additive Explanation (SHAP) class: the inputs are binarized, where 0 means “missing” and 1 means “observed”, and an explanation is a linear model of those binary inputs.
Given two natural axioms, there is only one
possible magic box in the SHAP class!
‘m’ is uniquely determined for all methods in the SHAP class under two axioms:
Efficiency axiom: g correctly reproduces the original prediction.

Monotonicity axiom: If we make a new model f′(x) that is larger than f(x) whenever x′_i = 1, then φ_i(f′, x) ≥ φ_i(f, x).
The i’th SHAP value for f, φ_i(f, x), is built from the output value differences when adding input feature x′_i (here the second input, present on the left and absent on the right of each pair):

f([0, 1, 1, 0, 0]) − f([0, 0, 1, 0, 0]) =
f([0, 1, 1, 1, 1]) − f([0, 0, 1, 1, 1]) =
f([0, 1, 0, 0, 0]) − f([0, 0, 0, 0, 0]) =
f([1, 1, 1, 0, 1]) − f([1, 0, 1, 0, 1]) =
f([1, 1, 1, 1, 1]) − f([1, 0, 1, 1, 1]) =
f([1, 1, 0, 0, 1]) − f([1, 0, 0, 0, 1]) =

φ_i(f, x) = a weighted combination of these differences
The same differences for the new model f′ give the i’th SHAP value for f′:

f′([0, 1, 1, 0, 0]) − f′([0, 0, 1, 0, 0]) =
f′([0, 1, 1, 1, 1]) − f′([0, 0, 1, 1, 1]) =
f′([0, 1, 0, 0, 0]) − f′([0, 0, 0, 0, 0]) =
f′([1, 1, 1, 0, 1]) − f′([1, 0, 1, 0, 1]) =
f′([1, 1, 1, 1, 1]) − f′([1, 0, 1, 1, 1]) =
f′([1, 1, 0, 0, 1]) − f′([1, 0, 0, 0, 1]) =

φ_i(f′, x) = a weighted combination of these differences
Proofs from coalitional game theory show there is only one possible set of values φ_i that satisfy these axioms: the Shapley values.
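The classical Shapley value formula behind this uniqueness result can be brute-forced for small numbers of features. A minimal sketch, with a made-up set function standing in for the model evaluated on observed feature subsets; it also checks the efficiency property, that the φ_i sum to f(all observed) − f(none observed).

```python
from itertools import combinations
from math import factorial

def shapley_values(f, M):
    """Exact Shapley values for a set function f over features {0, ..., M-1}:
    phi_i = sum over subsets S not containing i of
            |S|! * (M - |S| - 1)! / M! * (f(S ∪ {i}) - f(S))."""
    phi = []
    for i in range(M):
        others = [j for j in range(M) if j != i]
        total = 0.0
        for size in range(M):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(M - size - 1) / factorial(M)
                total += weight * (f(set(S) | {i}) - f(set(S)))
        phi.append(total)
    return phi

# Made-up stand-in for the model's output given that the features in S
# are observed; features 0 and 1 interact.
f = lambda S: 1.0 * (0 in S) + 2.0 * (1 in S) + 3.0 * (0 in S and 1 in S)

phi = shapley_values(f, 2)
print(phi)                                # [2.5, 3.5]
print(sum(phi) == f({0, 1}) - f(set()))  # True: efficiency holds
```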
Surprising unity! The SHAP class is large, spanning model-agnostic, neural network, and linear methods:

LIME: approximates the complex model near a given prediction. (Ribeiro et al. 2016)
Shapley value sampling / Quantitative Input Influence: feature importance for a given prediction using game theory. (Štrumbelj et al. 2014, Datta et al. 2016)
DeepLIFT: difference-from-a-reference explanations of neural networks. (Shrikumar et al. 2016)
Layer-wise relevance propagation: backpropagates neural network explanations. (Bach et al. 2015)
Shapley regression values: explains linear models in the presence of collinearity. (Grömping et al. 2012)
Surprising unity! All of the methods above fall within the SHAP class, and the SHAP class has one optimum, in the sense that it is the only set of additive values satisfying several desirable properties.
The SHAP class unifies in three ways:
1. Extends Shapley value sampling and Quantitative Input Influence.
2. Provides theoretically justified improvements and motivation for other methods.
3. Adapts other methods to improve Shapley value estimation performance.
Theoretically justified improvements to LIME

The LIME formalism fits a simple interpretable model to a complex model locally:

ξ = argmin_{g ∈ 𝒢} L(f, g, π_x′) + Ω(g)

where 𝒢 is a class of interpretable models, L is the loss function forcing g to well approximate f, π_x′ is a kernel specifying what “local” means, and Ω is an optional regularization of g.

But how do we pick 𝒢, L, Ω, and π_x′?
Surprise: if 𝒢 is linear models and x′ is binary, then we are in the SHAP class! This means the Shapley values are the only possible solution satisfying efficiency and monotonicity.
Great! But what about the parameters L, Ω, and π_x′? We found a kernel and loss function that cause a local approximation to reproduce the Shapley values.
The Shapley kernel

Let f_x(S) = f(h_x(1_S)): the model evaluated with only the features in the subset S observed.

Symmetry: If for all subsets S that do not contain i or j,
f_x(S ∪ {i}) = f_x(S ∪ {j})    (4)
then φ_i(f, x) = φ_j(f, x). This states that if two features contribute equally to the model, their effects must be the same.

Monotonicity: For any two models f and f′, if for all subsets S that do not contain i,
f_x(S ∪ {i}) − f_x(S) ≥ f′_x(S ∪ {i}) − f′_x(S)    (5)
then φ_i(f, x) ≥ φ_i(f′, x). This states that if observing a feature increases f more than f′ in all situations, then that feature’s effect should be larger for f than for f′.

Violating any of these axioms would lead to potentially confusing behavior. In 1985, Peyton Young showed that there is only one set of values that satisfies the above assumptions, and they are the Shapley values [7, 4]. ES values are Shapley values of expected value functions, therefore they are the unique solution to Equation 1 that conforms to Equation 2 and satisfies the three axioms above. This uniqueness of ES values holds over a large class of possible models, including the examples used in the paper that originally proposed this formalism [3].

There are specific forms of x′, L, and Ω that lead to Shapley values as the solution, and they are:

Ω(g) = 0
π_x′(z′) = (M − 1) / ((M choose |z′|) · |z′| · (M − |z′|))
L(f, g, π_x′) = Σ_{z′ ∈ Z} [f(h_x⁻¹(z′)) − g(z′)]² π_x′(z′)    (6)

It is important to note that π_x′(z′) = ∞ when |z′| ∈ {0, M}, which enforces φ_0 = f_x(∅) and f(x) = Σ_i φ_i. In practice these infinite weights can be avoided during optimization by analytically eliminating two variables using these constraints. Figure 2A compares our Shapley kernel with kernels chosen heuristically. The intuitive connection between linear regression and classical Shapley value estimates is that classical estimates are computed as the mean of many sampled contributions; since the mean is also the best least squares point estimate for a set of data points, it is natural to search for a weighting kernel that causes linear least squares regression to recapitulate the Shapley values.

There is no other kernel that satisfies the axioms and produces a different result.
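The Shapley kernel weight from Equation 6 is easy to tabulate. A small sketch (helper name is my own) showing that the weights are symmetric in coalition size, blow up at |z′| ∈ {0, M}, and are smallest for mid-sized coalitions:

```python
from math import comb, inf

def shapley_kernel(M, s):
    """pi_x'(z') = (M - 1) / (C(M, |z'|) * |z'| * (M - |z'|)) for |z'| = s.
    Infinite at s in {0, M}: those weights enforce phi_0 = f_x(empty set)
    and f(x) = sum of the phi_i."""
    if s == 0 or s == M:
        return inf
    return (M - 1) / (comb(M, s) * s * (M - s))

M = 4
weights = [shapley_kernel(M, s) for s in range(M + 1)]
print(weights)  # [inf, 0.25, 0.125, 0.25, inf]
```

In practice the two infinite-weight rows are eliminated analytically, as noted above, leaving a plain weighted least squares problem.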
Thanks!
Remember to ask why…
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 

eScience SHAP talk

  • 1. Using game theory to explain the output of any machine learning model April 18th, eScience Seminar Scott Lundberg, Su-In Lee
  • 2. How much money is someone likely to make? model 31% chance of making > $50K annually Why?! No loan
  • 3. Interpretable vs. accurate: a complex model is accurate (✔) but not interpretable (✘); a simple model is interpretable (✔) but not accurate (✘). Interpretable, accurate: choose one. Problem: explaining complex models is hard!
  • 4. Idea: don’t explain the whole model, just one prediction. Complex models are inherently complex! But a single prediction involves only a small piece of that complexity. (Figure: output value vs. input value.)
  • 5. Goal: model-agnostic interpretability. What if we could view the model as a black box… …and yet still be able to explain predictions? (Diagram: model + data → prediction; magic box → explanation.) Interpretable, accurate: choose two!
  • 6. If only we had this magic box… Predictions from any complex model could be explained. Prediction would be decoupled from explanation, reducing method lock-in. Explanations from very different model types could be easily compared.
  • 7. So let’s build it! (Diagram: model + data → prediction; magic box → explanation.)
  • 8. How much money is someone likely to make? model 31% chance of making > $50K annually
  • 9. How much money is someone likely to make? model 31% chance of making > $50K annually Capital losses $0 Weekly hours 40 Occupation Protective-serv Capital gains $0 Age 28 Marital status Married-civ-spouse
  • 10. How did we get here? Chance of making > $50K annually: base rate 26%, model prediction 31% (shown on an axis from 15% to 40%).
  • 11. model 26% chance of making > $50K annually Occupation Exec-managerial Age 37 Relationship Wife Years in school 14 Sex Female Marital status Married-civ-spouse No attributes are given to the model Base rate
  • 12. model 25% chance of making > $50K annually Capital losses $0 Weekly hours 40 Occupation Protective-serv Capital gains $0 Age 28 Marital status Married-civ-spouse
  • 13. Chance of making > $50K annually (base rate 26%): model prediction if we only know they had no capital losses, 25%.
  • 14. Prediction if we know they had no capital losses and work in police/fire: 24%.
  • 15. Next attribute added: works 40 hr/week.
  • 16. Next attribute added: no capital gains.
  • 17. Next attribute added: age (no effect).
  • 18. Next attribute added: married.
  • 19. (Effects drawn on the 15%–40% axis.)
  • 20. (Effects drawn on the 15%–40% axis.)
  • 21. All effects together: Married, No capital gains, 40 hr/week, Police/fire (P/F), Capital loss.
  • 22. Final prediction of the chance of making > $50K annually on the 15%–40% axis.
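The walk above is a simple additive decomposition: start from the base rate and add one effect per feature until the model prediction is reached. In the sketch below, only the 26% base rate, the 25% and 24% intermediate steps, the zero age effect, and the 31% final prediction come from the slides; the remaining per-feature contributions are made-up stand-ins chosen so the sum works out.

```python
# Additive walk from the base rate to the model prediction.
base_rate = 0.26
contributions = [
    ("No capital losses", -0.01),   # 26% -> 25% (slide 13)
    ("Police/fire",       -0.01),   # 25% -> 24% (slide 14)
    ("Works 40 hr/week",  -0.02),   # illustrative
    ("No capital gains",  -0.03),   # illustrative
    ("Age 28",             0.00),   # no effect (slide 17)
    ("Married",           +0.12),   # illustrative
]

prediction = base_rate
for name, effect in contributions:
    prediction += effect
    print(f"{name:18s} -> {prediction:.2f}")

print(f"final prediction: {prediction:.2%}")  # 31.00%
```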
  • 24. (Figure: samples clustered by explanation similarity.) Samples are sorted by ES-value similarity (panels A and B); bar width equals the ES value for that input, and the y-axis is the predicted probability of making >= $50K. Visible clusters: large capital gain, large capital loss, young and single, young and married, highly educated and single, highly educated and married, divorced women, and married with typical education.
  • 25. Explaining a single prediction (chance of making > $50K annually) from a model with 500 decision trees: the unique optimal explanation under basic axioms from cooperative game theory.
  • 30. Some notation: the model f maps input x to output f(x); the magic box maps them to an explanation model g_x. What is an explanation anyway? It is an interpretable function that approximates f!
  • 31. Car salesman example: imagine the explanation g_x is a linear model of x, with inputs Age, Weight, and Is student, and output the probability of buying a car.
  • 32. Use “interpretable” inputs: the magic box works on interpretable input features x′, together with a mapping from x′ back to the model’s original inputs.
  • 33. Car salesman example: imagine the explanation g_x is a linear model of x′, where Age is known to be 25, Weight is known to be 150, and Is student is known to be 1.
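The mapping from interpretable binary inputs back to original inputs can be sketched as follows. This is a minimal illustration, not the talk’s exact construction: z′_i = 1 keeps the known value, z′_i = 0 swaps in a background value standing for “missing”; the instance and background values are made up.

```python
x = {"age": 25, "weight": 150, "is_student": 1}           # instance to explain
background = {"age": 40, "weight": 170, "is_student": 0}  # "missing" stand-ins

def h_x(z_prime):
    """Map binary interpretable inputs z' back to the original feature space."""
    return {key: (x[key] if z else background[key])
            for key, z in zip(x, z_prime)}

# Age observed, weight "missing", is_student observed:
print(h_x([1, 0, 1]))  # → {'age': 25, 'weight': 170, 'is_student': 1}
```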
  • 34. Some notation: the method m maps the model f, the input x, and the output f(x) to the explanation model g_x. We are going to derive ‘m’ from basic axioms!
  • 35. SHAP class of explanation methods. • Axiom 1 (Binarization): inputs are binarized, where 0 means “missing” and 1 means “observed”. • Axiom 2 (Linearity): an explanation is a linear model. All methods that satisfy Axioms 1 and 2 are in the Shapley Additive Explanation (SHAP) class.
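The two axioms above can be written down directly: an explanation is a linear model over binary inputs z′, where z′_i = 1 means feature i is observed and 0 means it is missing. The coefficient values below are illustrative, not from the talk.

```python
def g(z_prime, phi0, phi):
    """Additive explanation: phi0 is the base rate, phi[i] feature i's effect."""
    return phi0 + sum(p * z for p, z in zip(phi, z_prime))

phi0, phi = 0.26, [0.12, -0.03, -0.04]
print(round(g([0, 0, 0], phi0, phi), 2))  # nothing observed -> base rate, 0.26
print(round(g([1, 1, 1], phi0, phi), 2))  # everything observed -> 0.31
```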
  • 36. Given two natural axioms, there is only one possible magic box in the SHAP class! ‘m’ is uniquely determined for all methods in the SHAP class under two axioms.
  • 37. Efficiency axiom: g correctly reproduces the original prediction
  • 38. Monotonicity axiom: if we make a new model f′(x) that is larger than f(x) whenever x′_i = 1, then φ_i(f′, x) ≥ φ_i(f, x). The i’th SHAP value φ_i(f, x) is built from output value differences when adding x′_i, e.g. for input feature i in the second position: f([0,1,1,0,0]) − f([0,0,1,0,0]), f([0,1,1,1,1]) − f([0,0,1,1,1]), f([0,1,0,0,0]) − f([0,0,0,0,0]), f([1,1,1,0,1]) − f([1,0,1,0,1]), f([1,1,1,1,1]) − f([1,0,1,1,1]), f([1,1,0,0,1]) − f([1,0,0,0,1]).
  • 39. Monotonicity axiom (continued): the same output value differences computed for f′, e.g. f′([0,1,1,0,0]) − f′([0,0,1,0,0]) and so on over the same subsets, give the i’th SHAP value φ_i(f′, x).
  • 40. Proofs from coalitional game theory show there is only one possible set of values φ_i that satisfies these axioms. They are the Shapley values.
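The Shapley values can be computed exactly by enumerating subsets, which makes the uniqueness claim concrete. Below is a minimal sketch over a toy value function with illustrative numbers (a 0.26 base rate, two features, and a small interaction); the function names are hypothetical, not the talk’s code.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, n):
    """Exact Shapley values for a set function f over players 0..n-1.

    f maps a frozenset of observed feature indices to the model's
    expected output given only those features.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                S = frozenset(subset)
                weight = (factorial(len(S)) * factorial(n - len(S) - 1)
                          / factorial(n))
                phi[i] += weight * (f(S | {i}) - f(S))
    return phi

# Toy value function: feature 0 adds 0.12, feature 1 subtracts 0.03,
# plus a 0.02 interaction when both are observed (illustrative numbers).
def f(S):
    out = 0.26
    if 0 in S: out += 0.12
    if 1 in S: out -= 0.03
    if {0, 1} <= S: out += 0.02
    return out

phi = shapley_values(f, 2)
print([round(p, 3) for p in phi])   # → [0.13, -0.02]: interaction split evenly
print(round(0.26 + sum(phi), 2))    # → 0.37, reproducing f({0, 1}) (efficiency)
```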
  • 41. The SHAP class is large. Model agnostic: LIME approximates the complex model near a given prediction (Ribeiro et al. 2016); Shapley value sampling / Quantitative Input Influence computes feature importance for a given prediction using game theory (Štrumbelj et al. 2014, Datta et al. 2016). Neural networks: DeepLIFT explains neural networks by differences from a reference (Shrikumar et al. 2016); Layer-wise relevance propagation back-propagates neural network explanations (Bach et al. 2015). Linear: Shapley regression values explain linear models in the presence of collinearity (Gromping et al. 2012).
  • 42. Surprising unity! LIME, DeepLIFT, Layer-wise relevance propagation, Shapley regression values, and Shapley value sampling / Quantitative Input Influence all lie in the SHAP class, which has one optimum, in the sense that it is the only set of additive values satisfying several desirable properties.
  • 44. The SHAP class (Shapley/QII ⊂ SHAP) unifies in three ways: 1. Extends Shapley value sampling and Quantitative Input Influence. 2. Provides theoretically justified improvements and motivation for other methods. 3. Adapts other methods to improve Shapley value estimation performance.
  • 45. Theoretically justified improvements to LIME.
  • 46. The LIME formalism fits a simple interpretable model to a complex model locally: 𝒢 is a class of interpretable models, L is the loss function forcing g to approximate f well, π_x′ is the kernel specifying what ‘local’ means, and Ω is optional regularization of g. But how do we pick 𝒢, L, Ω, and π_x′?
  • 47. Surprise: if 𝒢 is the class of linear models and x′ is binary, then we are in the SHAP class! This means the Shapley values are the only possible solution satisfying efficiency and monotonicity. Great! But what about the parameters L, Ω, and π_x′?
  • 48. We found a kernel and loss function that cause a local approximation to reproduce the Shapley values. Writing f_x(S) = f(h_x(1_S)), the specific forms of Ω, π_x′, and L that lead to Shapley values as the solution are:

        Ω(g) = 0
        π_x′(z′) = (M − 1) / [ (M choose |z′|) · |z′| · (M − |z′|) ]
        L(f, g, π_x′) = Σ_{z′ ∈ Z} [ f(h_x⁻¹(z′)) − g(z′) ]² · π_x′(z′)

    Note that π_x′(z′) = ∞ when |z′| ∈ {0, M}, which enforces φ_0 = f_x(∅) and that the φ_i sum to the prediction; in practice these infinite weights can be avoided during optimization by analytically eliminating two variables using these constraints. The intuitive connection between linear regression and classical Shapley value estimates is that the classical estimates are computed as the mean of many differences, and the mean is also the best least-squares point estimate for a set of data points. There is no other kernel that satisfies the axioms and produces a different result.
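The Shapley-kernel regression can be sketched end to end for a tiny model. Everything below is illustrative: the toy model’s coefficients are made up, and the 1e6 weight is a practical stand-in for the infinite weights on the empty and full coalitions.

```python
from itertools import product
from math import comb

M = 3  # number of simplified binary features

def f(z):
    """Toy model on binary inputs; all coefficients are illustrative."""
    return 0.26 + 0.12*z[0] - 0.03*z[1] - 0.04*z[2] + 0.02*z[0]*z[1]

def shapley_kernel(m, s):
    # The kernel is infinite at |z'| in {0, m}; a large finite weight
    # approximately enforces the two constraints instead.
    if s == 0 or s == m:
        return 1e6
    return (m - 1) / (comb(m, s) * s * (m - s))

# Weighted least squares for g(z') = phi0 + sum_i phi_i * z'_i.
rows, targets, weights = [], [], []
for z in product([0, 1], repeat=M):
    rows.append([1.0] + [float(zi) for zi in z])
    targets.append(f(z))
    weights.append(shapley_kernel(M, sum(z)))

# Solve the normal equations A^T W A beta = A^T W y by Gauss-Jordan elimination.
n = M + 1
ata = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(n)]
       for i in range(n)]
aty = [sum(w * r[i] * y for r, y, w in zip(rows, targets, weights))
       for i in range(n)]
for col in range(n):
    piv = max(range(col, n), key=lambda k: abs(ata[k][col]))
    ata[col], ata[piv] = ata[piv], ata[col]
    aty[col], aty[piv] = aty[piv], aty[col]
    for r in range(n):
        if r != col:
            factor = ata[r][col] / ata[col][col]
            ata[r] = [a - factor * b for a, b in zip(ata[r], ata[col])]
            aty[r] -= factor * aty[col]
phi = [aty[i] / ata[i][i] for i in range(n)]

# The regression recovers the Shapley values: the 0.02 interaction between
# features 0 and 1 is split evenly between them.
print([round(p, 4) for p in phi])  # ≈ [0.26, 0.13, -0.02, -0.04]
```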

Editor's Notes

  1. Animate the the check boxes and X’s one at a time as I explain them.
  2. Explain that this is one feature input, don’t mention approximating with simpler model yet
  3. Not clear that the explanation matches the prediction (could be the whole model)
  4. Even ones in the future.
  5. Even ones in the future.
  6. Mention they have seen this figure before. Erase small text
  7. Mention they have seen this figure before. Erase small text
  8. Mention they have seen this figure before. Erase small text
  9. Mention they have seen this figure before. Erase small text
  10. Mention they have seen this figure before. Erase small text
  11. Mention they have seen this figure before. Erase small text
  12. Mention they have seen this figure before. Erase small text
  13. Mention they have seen this figure before. Erase small text
  14. Mention they have seen this figure before. Erase small text
  15. Mention they have seen this figure before. Erase small text
  16. Mention they have seen this figure before. Erase small text
  17. Mention they have seen this figure before. Erase small text
  18. Mention they have seen this figure before. Erase small text
  19. Even ones in the future.
  20. Even ones in the future.
  21. Even ones in the future.
  22. Even ones in the future.
  23. Even ones in the future.
  24. Even ones in the future.
  25. Even ones in the future.
  26. Even ones in the future.
  27. Even ones in the future.
  28. Even ones in the future.
  29. Even ones in the future.
  30. Shorten all the text blurbs
  31. Highlight the columns with a box
  32. Make title shorter
  33. Make title shorter
  34. Mention what collienarity is
  35. Same size boxes? Citations
  36. Same size boxes? Citations
  37. Simplify by only explaining the top arrows as a group, and the bottom arrows as a group.
  38. Simplify by only explaining the top arrows as a group, and the bottom arrows as a group.
  39. Make title shorter
  40. Be better about explaining why the symmetry is good.