Interpretability: Challenging the Black Box of
Machine Learning
Ankit Tewari
Research Data Scientist
Knowledge Engineering and Machine Learning Group (KEMLG)
Biomedical and Biophysical Signal Processing Group (B2S LAB)
Universitat Politecnica de Catalunya (UPC)
November 10, 2018
Smart City Week: City, Society and Technology
Lunchtime, Storytime!
1. Amazon's AI-based recruitment tool that favored men for technical jobs: it penalized resumes that included the word "women's", as in "women's chess club captain";
https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine
2. Racial and Gender Bias in an AI-based Criminal Justice System: ProPublica compared COMPAS's risk assessments for 7,000 people arrested in a Florida county with how often they reoffended;
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
'COMPAS Software Results', Julia Angwin et al. (2016)
Solutions?
While there are many reasons such biases appear in our machine
learning systems, there are fairly straightforward mechanisms to
address them. But remember, straightforward is not always simple!
Data preprocessing techniques for classification without
discrimination (statistical parity)
Discrimination-aware Machine Learning Models
and many more approaches!
However, our discussion focuses on examining whether, and to what
extent, a system is biased by explaining the predictions made
by the system.
Prediction Accuracy versus Explainability
Remember, nothing comes free of cost: good accuracy often comes
with a complex model that is not interpretable.
Smarter the System, the more Black the Box gets!
Remember, nothing comes free of cost: good accuracy often comes
with a complex model that is not interpretable.
The intolerable silence!
The silence of your lover is different from the silence of your computer.
It signifies the barrier between tolerance and intolerance!
Interpretability: The ray of hope :)
Definition: Interpretability is the degree to which a human
can understand the cause of a decision. It is the degree to
which a human can consistently predict the model’s result.
The higher the interpretability of a model, the easier it is for
someone to comprehend why certain decisions (read: predictions)
were made.
Interpretability versus Interpretation
While interpretability is a measure of the extent to which a
machine learning model can be explained, the interpretation
is the explanation associated with the model’s predictions.
1. Importance and Scope
2. Taxonomy of Interpretability Methods
Taxonomy of Interpretability Models
Intrinsic or post hoc?
Intrinsic interpretability means selecting and training a
machine learning model that is considered to be intrinsically
interpretable (for example short decision trees). Post hoc
interpretability means selecting and training a black box
model (for example a neural network) and applying
interpretability methods after the training (for example
measuring the feature importance).
Model-specific or model-agnostic?
Model-specific interpretation tools are limited to specific
model classes. Model-agnostic tools can be used on any
machine learning model and are usually post hoc.
Local or Global?
Does the interpretation method explain a single prediction or
the entire model behavior?
Model Agnostic Methods for Interpretability
Global Surrogate Models
Local Surrogate Models (LIME)
Feature Importance Plot
Shapley Values
Partial Dependence Plot (PDP)
Individual Conditional Expectation (ICE)
Global Surrogate Models
We want to approximate our black box prediction function \hat{f}(x) as closely
as possible with the surrogate model prediction function \hat{g}(x), under the
constraint that \hat{g} is interpretable. We can make use of any interpretable
model, say, a linear regression model

\hat{g}(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_P x_P    (1)

Now, the idea is to fit \hat{f}(x) on the dataset and obtain predictions \hat{y}.
Then, we train \hat{g}(x) using \hat{y} as the target. The obtained surrogate
model \hat{g} can be used to interpret the black box model \hat{f}.
We can also measure how well the surrogate model fits the original black
box model, for example with the R-squared measure:

R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i^{*} - \hat{y}_i)^2}{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}

where \hat{y}_i^{*} are the predictions of the surrogate model and \hat{y}_i the
predictions of the black box model.
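As a quick illustration, here is a minimal sketch of the global surrogate recipe in Python with scikit-learn. The synthetic data and the choice of black box (a gradient boosting regressor) are assumptions for the example, not the models used in the talk.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the real data (illustrative assumption)
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

# Step 1: fit the black box f_hat and obtain its predictions y_hat
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)
y_hat = black_box.predict(X)

# Step 2: train the interpretable surrogate g_hat on the same inputs,
# using the black box predictions y_hat as the target
surrogate = LinearRegression().fit(X, y_hat)
y_hat_star = surrogate.predict(X)

# Step 3: R-squared between surrogate and black box predictions (equation above)
r2 = 1 - np.sum((y_hat_star - y_hat) ** 2) / np.sum((y_hat - y_hat.mean()) ** 2)
print(f"Surrogate R^2 w.r.t. the black box: {r2:.3f}")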
The terminal nodes of a surrogate tree that approximates the
behaviour of a support vector machine trained on the bike rental
dataset. The distributions in the nodes show that the surrogate
tree predicts a higher number of rented bikes when the temperature is
above around 13 degrees (Celsius) and when the day was later in
the 2 year period (cut point at 435 days).
Local Surrogate Model (LIME)
Intuitively, local surrogate models attempt to explain a single
instance in the same way the global surrogate models explain the whole model.
Mathematically, the local surrogate model can be described as

\text{explanation}(x) = \arg\min_{g \in G} L(f, g, \pi_x) + \Omega(g)
The explanation model for instance x is the model g (e.g. linear
regression model) that minimizes loss L (e.g. mean squared error),
which measures how close the explanation is to the prediction of
the original model f (e.g. an xgboost model), while the model
complexity Ω(g) is kept low (e.g. favor fewer features).
Local Surrogate Model (LIME)
We can describe the recipe for fitting local surrogate models as follows:
We first choose our instance (observation) of interest for which we
want to have an explanation of its black box prediction
Then we perturb our dataset and get the black box predictions for
these new data points
We then weight the new samples by their proximity to the instance
of interest to allow the model to learn locally
Finally, we fit a weighted, interpretable model on the dataset with
the variations and explain the prediction by interpreting the local model, as sketched below
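The following is a minimal, self-contained sketch of this recipe for tabular data, written directly with NumPy and scikit-learn rather than the LIME package; the synthetic data, the choice of black box, the Gaussian perturbation scheme and the exponential proximity kernel are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Synthetic data and black box (illustrative assumptions)
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x_star = X[0]                                   # 1. instance of interest

# 2. Perturb the data around the instance and query the black box
rng = np.random.default_rng(0)
Z = x_star + rng.normal(scale=X.std(axis=0), size=(1000, X.shape[1]))
p_z = black_box.predict_proba(Z)[:, 1]          # predicted probability of class 1

# 3. Weight the perturbed samples by proximity to x_star (exponential kernel)
dist = np.linalg.norm(Z - x_star, axis=1)
kernel_width = 0.75 * np.sqrt(X.shape[1])       # assumed kernel width
weights = np.exp(-(dist ** 2) / kernel_width ** 2)

# 4. Fit a weighted, interpretable model locally and read off the explanation
local_model = Ridge(alpha=1.0).fit(Z, p_z, sample_weight=weights)
for j, coef in enumerate(local_model.coef_):
    print(f"feature x{j}: local weight {coef:+.3f}")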
Local Surrogate Model (LIME)
A) The plot displays the decision boundaries learned by a machine
learning model. In this case it was a Random Forest, but it does
not matter, because LIME is model-agnostic.
B) The yellow point is the instance of interest, which we want to
explain. The black dots are data sampled from a normal
distribution around the means of the features in the training
sample. This needs to be done only once and can be reused for
other explanations.
C) Introducing locality by giving points near the instance of
interest higher weights.
D) The colours and signs of the grid display the classifications of
the locally learned model from the weighted samples. The white
line marks the decision boundary (P(class) = 0.5) at which the
classification of the local model changes.
Local Surrogate Model (LIME)
Application of LIME to a counter-terrorism dataset, part of an
ongoing project that aims to measure the fingerprints of terrorist
outfits across the globe.
Feature Importance
A feature's importance is the increase in the model's prediction
error after we permute the feature's values (which breaks the
relationship between the feature and the outcome).
Just like the global surrogate models, it provides a salient overview
of how the model is behaving globally.
Feature Importance
Feature Importance
Input: Trained model \hat{f}, feature matrix X, target vector Y, error
measure L(Y, \hat{Y})
1. Estimate the original model error e_{orig}(\hat{f}) = L(Y, \hat{f}(X)) (e.g. mean
squared error)
2. For each feature j \in \{1, ..., p\} do:
Generate feature matrix X_{perm,j} by permuting feature X_j in X. This
breaks the association between X_j and Y.
Estimate the error e_{perm,j}(\hat{f}) = L(Y, \hat{f}(X_{perm,j})) based on the predictions
for the permuted data.
Calculate the permutation feature importance FI_j = e_{perm,j}(\hat{f}) / e_{orig}(\hat{f}).
Alternatively, the difference can be used: FI_j = e_{perm,j}(\hat{f}) - e_{orig}(\hat{f})
3. Sort features by descending FI.
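A minimal sketch of this algorithm, assuming scikit-learn, a synthetic regression dataset and mean squared error as the error measure L:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=400, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)    # trained model f_hat

# 1. Original model error e_orig (MSE as the error measure L)
e_orig = mean_squared_error(y, model.predict(X))

# 2. Permute each feature in turn and recompute the error
rng = np.random.default_rng(0)
importances = {}
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])           # break link between X_j and Y
    e_perm = mean_squared_error(y, model.predict(X_perm))
    importances[f"x{j}"] = e_perm / e_orig                  # FI_j as a ratio

# 3. Sort features by descending importance
for name, fi in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: FI = {fi:.2f}")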
Feature Importance
Shapley Values
The Shapley value is the average marginal contribution of a
feature value over all possible coalitions.
Predictions can be explained by assuming that each
feature is a ’player’ in a game where the prediction is
the payout. The Shapley value - a method from
coalitional game theory - tells us how to fairly distribute
the ’payout’ among the features.
The interpretation of the Shapley value \phi_{ij} for feature j
and instance i is: the feature value x_{ij} contributed \phi_{ij}
towards the prediction for instance i compared to the
average prediction for the dataset. The Shapley value
works for both classification (if we deal with probabilities) and
regression. We use the Shapley value to analyse the
predictions of a Random Forest model predicting
absenteeism at the workplace.
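Exact Shapley values require summing over all coalitions, which is exponential in the number of features, so in practice they are approximated by sampling. Below is a rough Monte Carlo sketch in the spirit of Strumbelj and Kononenko's sampling estimator; the model, data and number of samples are assumptions for illustration.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=400, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

def shapley_value(model, X, x, j, n_samples=200, seed=0):
    """Monte Carlo estimate of the contribution of feature j to the prediction for x."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    phi = 0.0
    for _ in range(n_samples):
        z = X[rng.integers(n)]                 # random background instance
        order = rng.permutation(p)             # random coalition (feature order)
        pos = int(np.where(order == j)[0][0])
        x_with, x_without = z.copy(), z.copy()
        x_with[order[:pos + 1]] = x[order[:pos + 1]]   # coalition including feature j
        x_without[order[:pos]] = x[order[:pos]]        # same coalition without feature j
        phi += model.predict(x_with[None])[0] - model.predict(x_without[None])[0]
    return phi / n_samples                     # average marginal contribution

x_star = X[0]
print({f"x{j}": round(shapley_value(model, X, x_star, j), 2) for j in range(X.shape[1])})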
Shapley Values
Partial Dependence Plot (PDP)
The partial dependence plot (PDP or PD plot) shows the
marginal effect of a feature on the predicted outcome of a
previously fit model (J. H. Friedman). The prediction function is
fixed at a few values of the chosen features and averaged over the
other features.
In practice, the set of features X_S usually only contains one feature
or a maximum of two, because one feature produces 2D plots and
two features produce 3D plots. Everything beyond that is quite
tricky. Even 3D on a 2D paper or monitor is already challenging.
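A minimal sketch of how a one-feature partial dependence curve can be computed by brute force; the model, data and grid resolution are assumptions.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=400, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

j = 0                                                   # the chosen feature x_S
grid = np.linspace(X[:, j].min(), X[:, j].max(), 20)    # grid of values for x_S

pdp = []
for v in grid:
    X_mod = X.copy()
    X_mod[:, j] = v                                     # fix x_S at the grid value
    pdp.append(model.predict(X_mod).mean())             # average over the other features

for v, p in zip(grid, pdp):
    print(f"x{j} = {v:7.2f} -> average prediction {p:9.2f}")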
Partial Dependence Plot (PDP)
Individual Conditional Expectation (ICE)
For a chosen feature, Individual Conditional Expectation (ICE)
plots draw one line per instance, representing how the instance’s
prediction changes when the feature changes.
Individual Conditional Expectation (ICE)
An ICE plot visualizes the dependence of the predicted response on
a feature for EACH instance separately, resulting in multiple lines,
one for each instance, compared to one line in partial dependence
plots. A PDP is the average of the lines of an ICE plot.
The values for a line (and one instance) can be computed by
leaving all other features the same, creating variants of this
instance by replacing the feature's value with values from a grid
and letting the black box make the predictions with these newly
created instances. The result is a set of points for an instance with
the feature value from the grid and the respective predictions.
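A minimal sketch of the ICE computation described above, reusing the same brute-force idea as the PDP sketch: one row of the resulting matrix is one ICE line, and averaging the rows recovers the PDP. Model, data and grid size are again assumptions.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

j = 0                                                   # the chosen feature
grid = np.linspace(X[:, j].min(), X[:, j].max(), 20)

# ice[i, k] = prediction for instance i with feature j replaced by grid[k],
# all other feature values left unchanged
ice = np.empty((X.shape[0], grid.size))
for k, v in enumerate(grid):
    X_mod = X.copy()
    X_mod[:, j] = v
    ice[:, k] = model.predict(X_mod)

pdp = ice.mean(axis=0)          # the PDP is the average of the ICE lines
print("ICE matrix:", ice.shape, "| PDP curve:", pdp.shape)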
Individual Conditional Expectation (ICE)
Evaluating the Interpretability
Application Level Evaluation: Put the explanation into the
product and let the end user test it.
Human Level Evaluation: is a simplified application level
evaluation. The difference is that these experiments are not
conducted with the domain experts, but with lay humans. An
example would be to show a user different explanations and
the human would choose the best.
Functional Level Evaluation: This works best when the
class of models used was already evaluated by someone else in
a human level evaluation. For example it might be known that
the end users understand decision trees. In this case, a proxy
for explanation quality might be the depth of the tree. Shorter
trees would get a better explainability rating.
Questions?
Thank you so much for being a part of this talk. You can also
write to me at ankitt.nic@gmail.com :)