Interpretability: Challenging the Black Box of
Machine Learning
Ankit Tewari
Research Data Scientist
Knowledge Engineering and Machine Learning Group (KEMLG)
Biomedical and Biophysical Signal Processing Group (B2S LAB)
Universitat Politecnica de Catalunya (UPC)
November 10, 2018
Smart City Week: City, Society and Technology
Lunchtime, Storytime!
1. Amazon's AI-based recruitment tool that favored men for
technical jobs: it penalized resumes that included the word
"women's", as in "women's chess club captain";
https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine
2. Racial and Gender Bias in AI based Criminal Justice
System: ProPublica compared COMPAS's risk assessments for
7,000 people arrested in a Florida county with how often they
reoffended;
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
"COMPAS Software Results", Julia Angwin et al. (2016)
Solutions?
While there are many reasons such biases appear in our
machine learning systems, there are fairly straightforward
mechanisms to address them. But remember, straightforward is not
always simple!
Data preprocessing techniques for classification without
discrimination. (statistical parity)
Discrimination aware Machine Learning Models
and many more approaches!
However, our discussion focuses on examining whether, and how
much, a system is biased by explaining the predictions made
by the system.
Prediction Accuracy versus Explainability
Remember, nothing comes free of cost: good accuracy often
comes with a complex model, and a complex model is not interpretable.
Smarter the System, the more Black the Box gets!
The intolerable silence!
The silence of your lover is different from the silence of your computer.
It signifies the barrier between tolerance and intolerance!
Interpretability: The ray of hope :)
Definition: Interpretability is the degree to which a human
can understand the cause of a decision. It is the degree to
which a human can consistently predict the model’s result.
The higher the interpretability of a model, the easier it is for
someone to comprehend why certain decisions (read: predictions)
were made.
Interpretability versus Interpretation
While interpretability is a measure of the extent to which a
machine learning model can be explained, the interpretation
is the explanation associated with the model’s predictions.
1. Importance and Scope
2. Taxonomy of Interpretability Methods
Taxonomy of Interpretability Methods
Intrinsic or post hoc?
Intrinsic interpretability means selecting and training a
machine learning model that is considered to be intrinsically
interpretable (for example short decision trees). Post hoc
interpretability means selecting and training a black box
model (for example a neural network) and applying
interpretability methods after the training (for example
measuring the feature importance).
Model-specific or model-agnostic?
Model-specific interpretation tools are limited to specific
model classes. Model-agnostic tools can be used on any
machine learning model and are usually post hoc.
Local or Global?
Does the interpretation method explain a single prediction or
the entire model behavior?
Model Agnostic Methods for Interpretability
Global Surrogate Models
Local Surrogate Models (LIME)
Feature Importance Plot
Shapley Values
Partial Dependence Plot (PDP)
Individual Conditional Expectation (ICE)
Global Surrogate Models
We want to approximate our black box prediction function ˆf(x) as closely
as possible with the surrogate model prediction function ˆg(x), under the
constraint that ˆg is interpretable. We can make use of any interpretable
model, say, a linear regression model:
ˆg(x) = β0 + β1x1 + · · · + βPxP (1)
Now, the idea is to fit ˆf(x) on the dataset and obtain predictions ˆy.
Then we train ˆg(x) using ˆy as the target. The obtained surrogate
model ˆg can be used to interpret the black box model ˆf.
We can also measure how well the surrogate model fits the original black
box model, for example with the R-squared measure:
R² = 1 − SSE/SST = 1 − Σᵢ₌₁ⁿ (ŷ*ᵢ − ŷᵢ)² / Σᵢ₌₁ⁿ (ŷᵢ − ȳ̂)²
where ŷ*ᵢ are the surrogate model's predictions, ŷᵢ the black box
model's predictions, and ȳ̂ the mean of the black box predictions.
The terminal nodes of a surrogate tree that approximates the
behaviour of a support vector machine trained on the bike rental
dataset. The distributions in the nodes show that the surrogate
tree predicts a higher number of rented bikes when the weather is
above around 13 degrees (Celsius) and when the day was later in
the 2 year period (cut point at 435 days).
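The global surrogate recipe can be sketched in a few lines of Python. This is a minimal illustration, not the talk's own code: the random forest stands in for an arbitrary black box, and the data come from scikit-learn's synthetic make_regression (both are assumptions for the sake of the example).

```python
# Global surrogate sketch: fit an interpretable model on the black box's
# predictions, then measure fidelity with R^2.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

black_box = RandomForestRegressor(random_state=0).fit(X, y)  # f_hat
y_hat = black_box.predict(X)                                 # black-box predictions

surrogate = LinearRegression().fit(X, y_hat)                 # g_hat trained on y_hat
y_star = surrogate.predict(X)                                # surrogate predictions

# Fidelity: how well the surrogate reproduces the black box (not the true y)
fidelity = r2_score(y_hat, y_star)
print(f"surrogate fidelity R^2: {fidelity:.3f}")
```

Note that R² is computed against the black box's predictions, not the original targets: a surrogate explains the model, not the data.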
Local Surrogate Model (LIME)
Intuitively, local surrogate models attempt to explain a single
instance in the same way that global surrogate models explain the
entire model. Mathematically, a local surrogate model can be described as:
explanation(x) = arg ming∈G L(f , g, πx ) + Ω(g)
The explanation model for instance x is the model g (e.g. linear
regression model) that minimizes loss L (e.g. mean squared error),
which measures how close the explanation is to the prediction of
the original model f (e.g. an xgboost model), while the model
complexity Ω(g) is kept low (e.g. favor fewer features).
Local Surrogate Model (LIME)
We can describe the recipe for fitting local surrogate models as follows:
We first choose our instance (observations) of interest for which we
want to have an explanation of its black box prediction
Then we perturb our dataset and get the black box predictions for
these new data points
We then weight the new samples by their proximity to the instance
of interest to allow the model to learn locally
Finally, we fit a weighted, interpretable model on the dataset with
the variations and explain prediction by interpreting the local model
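The four steps above can be sketched as a hand-rolled local surrogate. This is a simplified illustration of the idea, not the actual LIME library (which handles sampling, kernels, and feature selection more carefully); the model, the synthetic data, and the RBF proximity kernel are all assumptions made for the example.

```python
# Hand-rolled local surrogate (LIME-style sketch): perturb, predict,
# weight by proximity, fit a weighted linear model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=300, n_features=4, random_state=0)
f = RandomForestRegressor(random_state=0).fit(X, y)  # the "black box"

x0 = X[0]  # instance of interest

# 1. perturb the dataset around the instance of interest
Z = x0 + rng.normal(scale=X.std(axis=0), size=(1000, X.shape[1]))
# 2. get the black-box predictions for the perturbed points
fz = f.predict(Z)
# 3. weight samples by proximity to x0 (RBF kernel on squared distance)
d2 = ((Z - x0) ** 2).sum(axis=1)
w = np.exp(-d2 / d2.mean())
# 4. fit a weighted interpretable model; its coefficients are the explanation
local = Ridge(alpha=1.0).fit(Z, fz, sample_weight=w)
print("local coefficients:", local.coef_)
```

The Ridge coefficients tell us how each feature drives the black box's prediction in the neighbourhood of x0.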
Local Surrogate Model (LIME)
Local Surrogate Model (LIME)
A) The plot displays the decision boundaries learned by a machine
learning model. In this case it was a Random Forest, but it does
not matter, because LIME is model-agnostic.
B) The yellow point is the instance of interest, which we want to
explain. The black dots are data sampled from a normal
distribution around the means of the features in the training
sample. This needs to be done only once and can be reused for
other explanations.
C) Introducing locality by giving points near the instance of
interest higher weights.
D) The colours and signs of the grid display the classifications of
the locally learned model from the weighted samples. The white
line marks the decision boundary (P(class) = 0.5) at which the
classification of the local model changes.
Local Surrogate Model (LIME)
Application of the LIME on a counter-terrorism dataset, an
ongoing project that aims to measure the fingerprints of terrorist
outfits across the globe
Feature Importance
A feature's importance is the increase in the model's prediction
error after we permute the feature's values (which breaks the
relationship between the feature and the outcome).
Just like global surrogate models, it provides a salient overview
of how the model behaves globally.
Feature Importance
Feature Importance
Input: Trained model ˆf, feature matrix X, target vector Y, error
measure L(Y, ˆY)
1. Estimate the original model error eorig(ˆf) = L(Y, ˆf(X)) (e.g. mean
squared error)
2. For each feature j ∈ {1, ..., p} do:
Generate feature matrix Xperm,j by permuting feature Xj in X. This
breaks the association between Xj and Y.
Estimate the error eperm(ˆf) = L(Y, ˆf(Xperm,j)) based on the predictions
on the permuted data.
Calculate the permutation feature importance FIj = eperm(ˆf)/eorig(ˆf).
Alternatively, the difference can be used: FIj = eperm(ˆf) − eorig(ˆf).
3. Sort features by descending FI.
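The algorithm translates almost line by line into Python. A minimal sketch, assuming a scikit-learn random forest trained on synthetic data (both stand-ins for whatever model and dataset are at hand):

```python
# Permutation feature importance, following the three steps above.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=400, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

rng = np.random.default_rng(0)
e_orig = mean_squared_error(y, model.predict(X))  # original model error

importances = {}
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break X_j's link to y
    e_perm = mean_squared_error(y, model.predict(X_perm))
    importances[j] = e_perm / e_orig  # ratio form; e_perm - e_orig also works

# sort features by descending importance
for j, fi in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"feature {j}: FI = {fi:.2f}")
```

scikit-learn ships this as sklearn.inspection.permutation_importance, which also repeats the shuffling several times to average out the randomness.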
Feature Importance
Shapley Values
The Shapley value is the average marginal contribution of a
feature value over all possible coalitions.
Predictions can be explained by assuming that each
feature is a ’player’ in a game where the prediction is
the payout. The Shapley value - a method from
coalitional game theory - tells us how to fairly distribute
the ’payout’ among the features.
The interpretation of the Shapley value φij for feature j
and instance i is: the feature value xij contributed φij
towards the prediction for instance i, compared to the
average prediction for the dataset. The Shapley value
works for both classification (if we deal with probabilities) and
regression. We use the Shapley value to analyse the
predictions of a Random Forest model predicting
absenteeism at the workplace.
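Averaging over all coalitions is exponential in the number of features, so in practice the Shapley value is approximated by Monte Carlo sampling over random feature orderings. The sketch below illustrates that sampling scheme for a single feature; the model and data are stand-ins, and a dedicated library such as shap would be the usual choice in practice.

```python
# Monte Carlo approximation of one Shapley value (a sketch of the
# sampling idea, not a production implementation).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=4, random_state=0)
f = RandomForestRegressor(random_state=0).fit(X, y)

rng = np.random.default_rng(0)
x = X[0]  # instance i whose prediction we explain
j = 2     # feature of interest
M = 200   # number of Monte Carlo samples

phi = 0.0
for _ in range(M):
    z = X[rng.integers(len(X))]          # random background instance
    order = rng.permutation(X.shape[1])  # random feature ordering
    pos = int(np.where(order == j)[0][0])
    # x_plus takes x's values for every feature up to and including j in
    # this ordering and z's values after; x_minus is identical except
    # feature j also comes from z. Their gap is j's marginal contribution.
    x_plus, x_minus = z.copy(), z.copy()
    x_plus[order[:pos + 1]] = x[order[:pos + 1]]
    x_minus[order[:pos]] = x[order[:pos]]
    phi += f.predict(x_plus[None, :])[0] - f.predict(x_minus[None, :])[0]
phi /= M
print(f"approximate Shapley value of feature {j} for instance 0: {phi:.2f}")
```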
Shapley Values
Partial Dependence Plot (PDP)
The partial dependence plot (PDP or PD plot) shows the
marginal effect of a feature on the predicted outcome of a
previously fit model (J. H. Friedman). The prediction function is
fixed at a few values of the chosen features and averaged over the
other features.
In practice, the set of features Xs usually only contains one feature
or a maximum of two, because one feature produces 2D plots and
two features produce 3D plots. Everything beyond that is quite
tricky. Even 3D on a 2D paper or monitor is already challenging.
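Computing a one-feature partial dependence curve by hand makes the "fix and average" idea concrete. A minimal sketch (scikit-learn provides the same computation as sklearn.inspection.partial_dependence); the gradient-boosting model and synthetic data are assumptions for the example:

```python
# One-feature partial dependence curve computed by hand.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

j = 0  # the single feature in the set X_s
grid = np.linspace(X[:, j].min(), X[:, j].max(), 20)

pd_values = []
for v in grid:
    X_mod = X.copy()
    X_mod[:, j] = v                                # fix feature j at grid value v
    pd_values.append(model.predict(X_mod).mean())  # average over other features
pd_values = np.array(pd_values)
print("partial dependence at first three grid points:", pd_values[:3])
```

Plotting pd_values against grid gives the 2D PD plot described above.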
Partial Dependence Plot (PDP)
Individual Conditional Expectation (ICE)
For a chosen feature, Individual Conditional Expectation (ICE)
plots draw one line per instance, representing how the instance’s
prediction changes when the feature changes.
Individual Conditional Expectation (ICE)
An ICE plot visualizes the dependence of the predicted response on
a feature for EACH instance separately, resulting in multiple lines,
one for each instance, compared to one line in partial dependence
plots. A PDP is the average of the lines of an ICE plot.
The values for a line (and one instance) can be computed by
keeping all other features the same, creating variants of the
instance by replacing the feature's value with values from a grid,
and letting the black box make predictions on these newly
created instances. The result is a set of points for an instance, with
the feature values from the grid and the respective predictions.
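The per-instance computation can be sketched as follows; averaging the resulting lines recovers the PDP. The model and data are again stand-ins for illustration:

```python
# ICE curves by hand: one prediction line per instance; their average is the PDP.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

j = 0
grid = np.linspace(X[:, j].min(), X[:, j].max(), 20)

# ice[i, k] = prediction for instance i with feature j replaced by grid[k],
# all other features left unchanged
ice = np.empty((len(X), len(grid)))
for k, v in enumerate(grid):
    X_mod = X.copy()
    X_mod[:, j] = v
    ice[:, k] = model.predict(X_mod)

pdp = ice.mean(axis=0)  # averaging the ICE lines gives the PDP
print("ICE matrix:", ice.shape, "PDP curve:", pdp.shape)
```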
Individual Conditional Expectation (ICE)
Evaluating the Interpretability
Application Level Evaluation: Put the explanation into the
product and let the end user test it.
Human Level Evaluation: A simplified application level
evaluation. The difference is that these experiments are
conducted not with domain experts but with lay humans. An
example would be to show a user different explanations and
have them choose the best.
Functional Level Evaluation: This works best when the
class of models used was already evaluated by someone else in
a human level evaluation. For example it might be known that
the end users understand decision trees. In this case, a proxy
for explanation quality might be the depth of the tree. Shorter
trees would get a better explainability rating.
Questions?
Thank you so much for being a part of this talk. You can also
write to me at ankitt.nic@gmail.com :)

More Related Content

What's hot

Recommendation system
Recommendation systemRecommendation system
Recommendation system
Ding Li
 
Statistical Pattern recognition(1)
Statistical Pattern recognition(1)Statistical Pattern recognition(1)
Statistical Pattern recognition(1)Syed Atif Naseem
 
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MININGANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
csandit
 
Analytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion miningAnalytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion mining
csandit
 
Machine Learning Basics
Machine Learning BasicsMachine Learning Basics
Machine Learning Basics
Humberto Marchezi
 
ML_ Unit_1_PART_A
ML_ Unit_1_PART_AML_ Unit_1_PART_A
ML_ Unit_1_PART_A
Srimatre K
 
Introdution and designing a learning system
Introdution and designing a learning systemIntrodution and designing a learning system
Introdution and designing a learning system
swapnac12
 
Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)butest
 
Fcv hum mach_geman
Fcv hum mach_gemanFcv hum mach_geman
Fcv hum mach_gemanzukun
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CARTXueping Peng
 
Decision Tree - ID3
Decision Tree - ID3Decision Tree - ID3
Decision Tree - ID3Xueping Peng
 
002.decision trees
002.decision trees002.decision trees
002.decision trees
hoangminhdong
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
Vimal Gupta
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classification
csandit
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learningbutest
 
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
Codemotion
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
Student
 

What's hot (17)

Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Statistical Pattern recognition(1)
Statistical Pattern recognition(1)Statistical Pattern recognition(1)
Statistical Pattern recognition(1)
 
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MININGANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
 
Analytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion miningAnalytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion mining
 
Machine Learning Basics
Machine Learning BasicsMachine Learning Basics
Machine Learning Basics
 
ML_ Unit_1_PART_A
ML_ Unit_1_PART_AML_ Unit_1_PART_A
ML_ Unit_1_PART_A
 
Introdution and designing a learning system
Introdution and designing a learning systemIntrodution and designing a learning system
Introdution and designing a learning system
 
Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)
 
Fcv hum mach_geman
Fcv hum mach_gemanFcv hum mach_geman
Fcv hum mach_geman
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Decision Tree - ID3
Decision Tree - ID3Decision Tree - ID3
Decision Tree - ID3
 
002.decision trees
002.decision trees002.decision trees
002.decision trees
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classification
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learning
 
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 

Similar to Citython presentation

Intepretable Machine Learning
Intepretable Machine LearningIntepretable Machine Learning
Intepretable Machine Learning
Ankit Tewari
 
A simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representationsA simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representations
Devansh16
 
Camp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine LearningCamp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine Learning
Krzysztof Kowalczyk
 
Dycops2019
Dycops2019 Dycops2019
Dycops2019
Jéssyca Bessa
 
Person re-identification, PhD Day 2011
Person re-identification, PhD Day 2011Person re-identification, PhD Day 2011
Person re-identification, PhD Day 2011
Riccardo Satta
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorization
midi
 
Software tookits for machine learning and graphical models
Software tookits for machine learning and graphical modelsSoftware tookits for machine learning and graphical models
Software tookits for machine learning and graphical modelsbutest
 
fuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptfuzzy LBP for face recognition ppt
fuzzy LBP for face recognition ppt
Abdullah Gubbi
 
Human Emotion Recognition
Human Emotion RecognitionHuman Emotion Recognition
Human Emotion Recognition
Chaitanya Maddala
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
ijaia
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
Vikash Kumar
 
BEHAVIOR STUDY OF ENTROPY IN A DIGITAL IMAGE THROUGH AN ITERATIVE ALGORITHM O...
BEHAVIOR STUDY OF ENTROPY IN A DIGITAL IMAGE THROUGH AN ITERATIVE ALGORITHM O...BEHAVIOR STUDY OF ENTROPY IN A DIGITAL IMAGE THROUGH AN ITERATIVE ALGORITHM O...
BEHAVIOR STUDY OF ENTROPY IN A DIGITAL IMAGE THROUGH AN ITERATIVE ALGORITHM O...
ijscmcj
 
IMAGE GENERATION FROM CAPTION
IMAGE GENERATION FROM CAPTIONIMAGE GENERATION FROM CAPTION
IMAGE GENERATION FROM CAPTION
ijscai
 
Image Generation from Caption
Image Generation from Caption Image Generation from Caption
Image Generation from Caption
IJSCAI Journal
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
shesnasuneer
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
shesnasuneer
 
Coates p: the use of genetic programing in exploring 3 d design worlds
Coates p: the use of genetic programing in exploring 3 d design worldsCoates p: the use of genetic programing in exploring 3 d design worlds
Coates p: the use of genetic programing in exploring 3 d design worldsArchiLab 7
 
Kernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of movingKernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of movingIAEME Publication
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning
Masahiro Suzuki
 

Similar to Citython presentation (20)

Intepretable Machine Learning
Intepretable Machine LearningIntepretable Machine Learning
Intepretable Machine Learning
 
A simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representationsA simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representations
 
Camp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine LearningCamp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine Learning
 
Dycops2019
Dycops2019 Dycops2019
Dycops2019
 
Person re-identification, PhD Day 2011
Person re-identification, PhD Day 2011Person re-identification, PhD Day 2011
Person re-identification, PhD Day 2011
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorization
 
Software tookits for machine learning and graphical models
Software tookits for machine learning and graphical modelsSoftware tookits for machine learning and graphical models
Software tookits for machine learning and graphical models
 
fuzzy LBP for face recognition ppt
fuzzy LBP for face recognition pptfuzzy LBP for face recognition ppt
fuzzy LBP for face recognition ppt
 
Human Emotion Recognition
Human Emotion RecognitionHuman Emotion Recognition
Human Emotion Recognition
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 
BEHAVIOR STUDY OF ENTROPY IN A DIGITAL IMAGE THROUGH AN ITERATIVE ALGORITHM O...
BEHAVIOR STUDY OF ENTROPY IN A DIGITAL IMAGE THROUGH AN ITERATIVE ALGORITHM O...BEHAVIOR STUDY OF ENTROPY IN A DIGITAL IMAGE THROUGH AN ITERATIVE ALGORITHM O...
BEHAVIOR STUDY OF ENTROPY IN A DIGITAL IMAGE THROUGH AN ITERATIVE ALGORITHM O...
 
IMAGE GENERATION FROM CAPTION
IMAGE GENERATION FROM CAPTIONIMAGE GENERATION FROM CAPTION
IMAGE GENERATION FROM CAPTION
 
Image Generation from Caption
Image Generation from Caption Image Generation from Caption
Image Generation from Caption
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
 
Coates p: the use of genetic programing in exploring 3 d design worlds
Coates p: the use of genetic programing in exploring 3 d design worldsCoates p: the use of genetic programing in exploring 3 d design worlds
Coates p: the use of genetic programing in exploring 3 d design worlds
 
Kernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of movingKernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of moving
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning
 

Recently uploaded

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 

Recently uploaded (20)

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 

Citython presentation

  • 1. Interpretability: Challenging the Black Box of Machine Learning Ankit Tewari Research Data Scientist Knowledge Engineering and Machine Learning Group (KEMLG) Biomedical and Biophysical Signal Processing Group (B2S LAB) Universitat Politecnica de Catalunya (UPC) November 10, 2018 Smart City Week: City, Society and Technology
  • 2. Lunchtime, Storytime! 1. Amazon’s AI based recruitment tool that favored men for technical jobs: it penalized resume files that included the word “women’s”, as in “women’s chess club captain”; https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine 2. Racial and Gender Bias in AI based Criminal Justice System: ProPublica compared COMPAS’s risk assessments for 7,000 people arrested in a Florida county with how often they reoffended; https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  • 3. ‘COMPAS Software Results’, Julia Angwin et al. (2016)
  • 4. Solutions? While there are many reasons such biases creep into our machine learning systems, there are pretty straightforward mechanisms to address them. But remember, straightforward is not always simple! Data preprocessing techniques for classification without discrimination (statistical parity). Discrimination-aware machine learning models. And many more approaches! However, our discussion focuses on examining whether, and how much, a system is biased, by explaining the predictions it makes.
  • 5. Prediction Accuracy versus Explainability Remember, nothing comes free of cost: good accuracy often comes with a complex model, and a complex model is not interpretable.
  • 6. The Smarter the System, the Blacker the Box Gets! Remember, nothing comes free of cost: good accuracy often comes with a complex model, and a complex model is not interpretable.
  • 7. The intolerable silence! The silence of your lover is different from the silence of your computer. It signifies the barrier between tolerance and intolerance!
  • 8. Interpretability: The ray of hope :) Definition: Interpretability is the degree to which a human can understand the cause of a decision. It is the degree to which a human can consistently predict the model’s result. The higher the interpretability of a model, the easier it is for someone to comprehend why certain decisions (read: predictions) were made.
  • 9. Interpretability versus Interpretation While interpretability is a measure of the extent to which a machine learning model can be explained, an interpretation is the explanation associated with the model’s predictions. 1. Importance and Scope 2. Taxonomy of Interpretability Methods
  • 10. Taxonomy of Interpretability Models Intrinsic or post hoc? Intrinsic interpretability means selecting and training a machine learning model that is considered to be intrinsically interpretable (for example short decision trees). Post hoc interpretability means selecting and training a black box model (for example a neural network) and applying interpretability methods after the training (for example measuring the feature importance). Model-specific or model-agnostic? Model-specific interpretation tools are limited to specific model classes. Model-agnostic tools can be used on any machine learning model and are usually post hoc. Local or Global? Does the interpretation method explain a single prediction or the entire model behavior?
  • 11. Model Agnostic Methods for Interpretability Global Surrogate Models. Local Surrogate Models (LIME). Feature Importance Plot. Shapley Values. Partial Dependence Plot (PDP). Individual Conditional Expectation (ICE).
  • 12. Global Surrogate Models We want to approximate our black box prediction function f̂(x) as closely as possible with the surrogate model prediction function ĝ(x), under the constraint that ĝ is interpretable. We can use any interpretable model, say, a linear regression model: ĝ(x) = β₀ + β₁x₁ + · · · + βₚxₚ (1). The idea is to fit f̂(x) on the dataset and obtain the predictions ŷ. Then we train ĝ(x) using ŷ as the target. The obtained surrogate model ĝ can be used to interpret the black box model f̂. We can also measure how well the surrogate fits the original black box model, for example with the R-squared measure: R² = 1 − SSE/SST = 1 − Σᵢ₌₁ⁿ (ŷᵢ* − ŷᵢ)² / Σᵢ₌₁ⁿ (ŷᵢ − mean(ŷ))², where ŷᵢ* are the surrogate’s predictions and mean(ŷ) is the mean of the black box predictions.
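The surrogate recipe above can be sketched in a few lines. This is a minimal illustration, assuming a synthetic regression dataset and a random forest as the hypothetical black box; the fidelity score is the R² of the surrogate against the black box predictions, not against the true labels.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical black box: a random forest fit on synthetic data.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
black_box = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
y_hat = black_box.predict(X)  # black box predictions become the surrogate's target

# Interpretable surrogate g(x): a linear model trained to mimic f(x).
surrogate = LinearRegression().fit(X, y_hat)

# How well does g approximate f? R-squared of g's output against f's output.
fidelity = r2_score(y_hat, surrogate.predict(X))
```

A fidelity close to 1 suggests the linear coefficients are a faithful global summary of the black box; a low value warns that the surrogate's explanation should not be trusted.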
  • 13. The terminal nodes of a surrogate tree that approximates the behaviour of a support vector machine trained on the bike rental dataset. The distributions in the nodes show that the surrogate tree predicts a higher number of rented bikes when the weather is above around 13 degrees (Celsius) and when the day was later in the 2 year period (cut point at 435 days).
  • 14. Local Surrogate Model (LIME) Intuitively, local surrogate models attempt to explain a single instance in the same way that global surrogate models explain the whole model. Mathematically, a local surrogate model can be described as: explanation(x) = argmin_{g ∈ G} L(f, g, πₓ) + Ω(g). The explanation model for instance x is the model g (e.g. a linear regression model) that minimizes the loss L (e.g. mean squared error), which measures how close the explanation is to the prediction of the original model f (e.g. an xgboost model), while the model complexity Ω(g) is kept low (e.g. favor fewer features).
  • 15. Local Surrogate Model (LIME) We can describe the recipe for fitting local surrogate models as follows: We first choose the instance (observation) of interest for which we want an explanation of its black box prediction. Then we perturb our dataset and get the black box predictions for these new data points. We then weight the new samples by their proximity to the instance of interest, so that the model learns locally. Finally, we fit a weighted, interpretable model on the dataset with the variations and explain the prediction by interpreting the local model.
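The four steps above can be sketched by hand, without the LIME library. This is a simplified illustration under several assumptions: a synthetic tabular classifier as the black box, Gaussian perturbations around the instance, an exponential proximity kernel, and a ridge regression as the weighted local model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Hypothetical black box on synthetic data.
X, y = make_classification(n_samples=400, n_features=4, random_state=0)
f = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

x0 = X[0]                                                  # 1. instance of interest
rng = np.random.default_rng(0)
Z = x0 + rng.normal(scale=X.std(axis=0), size=(500, 4))    # 2. perturbed samples
p = f.predict_proba(Z)[:, 1]                               #    black box predictions
d = np.linalg.norm(Z - x0, axis=1)
w = np.exp(-(d ** 2) / (2 * d.std() ** 2))                 # 3. proximity kernel weights
g = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)            # 4. weighted local model

local_attributions = g.coef_  # per-feature effect around x0
```

The signs and magnitudes of `local_attributions` play the role of the LIME explanation for `x0`: they describe the black box only in that neighborhood, not globally.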
  • 17. Local Surrogate Model (LIME) A) The plot displays the decision boundaries learned by a machine learning model. In this case it was a Random Forest, but it does not matter, because LIME is model-agnostic. B) The yellow point is the instance of interest, which we want to explain. The black dots are data sampled from a normal distribution around the means of the features in the training sample. This needs to be done only once and can be reused for other explanations. C) Introducing locality by giving points near the instance of interest higher weights. D) The colours and signs of the grid display the classifications of the locally learned model from the weighted samples. The white line marks the decision boundary (P(class) = 0.5) at which the classification of the local model changes.
  • 18. Local Surrogate Model (LIME) Application of LIME to a counter-terrorism dataset, part of an ongoing project that aims to measure the fingerprints of terrorist outfits across the globe.
  • 19. Feature Importance A feature’s importance is the increase in the model’s prediction error after we permute the feature’s values (which breaks the relationship between the feature and the outcome). Like the global surrogate models, it provides a salient overview of how the model behaves globally.
  • 20. Feature Importance Permutation Feature Importance. Input: trained model f̂, feature matrix X, target vector Y, error measure L(Y, Ŷ). 1. Estimate the original model error e_orig(f̂) = L(Y, f̂(X)) (e.g. mean squared error). 2. For each feature j ∈ {1, ..., p}: generate the feature matrix X_perm,j by permuting feature X_j in X (this breaks the association between X_j and Y); estimate the error e_perm(f̂) = L(Y, f̂(X_perm,j)) based on the predictions on the permuted data; calculate the permutation feature importance FI_j = e_perm(f̂)/e_orig(f̂). Alternatively, the difference can be used: FI_j = e_perm(f̂) − e_orig(f̂). 3. Sort features by descending FI.
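The algorithm above translates almost line for line into code. A minimal sketch, assuming the Friedman #1 synthetic dataset (whose first five features are informative and the rest are noise), a random forest as f̂, and mean squared error as L; it uses the ratio variant of FI.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical model: 10 features, only the first 5 carry signal.
X, y = make_friedman1(n_samples=400, random_state=0)
f = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Step 1: original error e_orig.
e_orig = mean_squared_error(y, f.predict(X))

# Step 2: permute each feature in turn and measure the error increase.
rng = np.random.default_rng(0)
fi = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])        # break the X_j -> Y association
    e_perm = mean_squared_error(y, f.predict(Xp))
    fi.append(e_perm / e_orig)                  # FI_j = e_perm / e_orig

# Step 3: rank features by descending importance.
ranking = np.argsort(fi)[::-1]
```

With this setup the noise features should cluster near FI ≈ 1 (permuting them barely changes the error), while informative features rise above it.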
  • 22. Shapley Values The Shapley value is the average marginal contribution of a feature value over all possible coalitions. Predictions can be explained by assuming that each feature is a ’player’ in a game where the prediction is the payout. The Shapley value - a method from coalitional game theory - tells us how to fairly distribute the ’payout’ among the features. The interpretation of the Shapley value φ_ij for feature j and instance i is: the feature value x_ij contributed φ_ij towards the prediction for instance i, compared with the average prediction for the dataset. The Shapley value works for both classification (if we deal with probabilities) and regression. We use the Shapley value to analyse the predictions of a Random Forest model predicting the absenteeism at workplace.
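Exact Shapley values require evaluating all coalitions, so in practice they are approximated by sampling. A minimal Monte Carlo sketch, assuming a synthetic regression model as the black box: for each sampled permutation, "absent" features are filled in from a randomly drawn background instance, and the marginal contribution of feature j is averaged.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Hypothetical black box on synthetic data.
X, y = make_regression(n_samples=300, n_features=4, random_state=0)
f = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def shapley_value(x, j, n_iter=200):
    """Sampling estimate of the Shapley value of feature j for instance x."""
    rng = np.random.default_rng(0)
    contribs = []
    for _ in range(n_iter):
        z = X[rng.integers(len(X))]              # random background instance
        order = rng.permutation(X.shape[1])      # random coalition order
        pos = np.where(order == j)[0][0]
        in_coalition = np.isin(np.arange(X.shape[1]), order[:pos])
        without_j = np.where(in_coalition, x, z)             # coalition without j
        with_j = without_j.copy()
        with_j[j] = x[j]                                     # add feature j
        contribs.append(f.predict([with_j])[0] - f.predict([without_j])[0])
    return np.mean(contribs)                     # average marginal contribution

phi = shapley_value(X[0], j=2)
```

Libraries such as SHAP implement far more efficient estimators; this sketch only makes the "average marginal contribution over coalitions" definition concrete.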
  • 24. Partial Dependence Plot (PDP) The partial dependence plot (PDP or PD plot) shows the marginal effect of a feature on the predicted outcome of a previously fit model (J. H. Friedman). The prediction function is fixed at a few values of the chosen features and averaged over the other features. In practice, the set of features Xs usually only contains one feature or a maximum of two, because one feature produces 2D plots and two features produce 3D plots. Everything beyond that is quite tricky. Even 3D on a 2D paper or monitor is already challenging.
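The averaging described above can be computed directly: fix the chosen feature at each grid value, leave the other columns at their observed values, and average the model's predictions. A minimal sketch under assumed synthetic data and a gradient boosting model as the previously fit black box.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical previously fit model on synthetic (linear) data.
X, y = make_regression(n_samples=300, n_features=4, random_state=0)
f = GradientBoostingRegressor(random_state=0).fit(X, y)

s = 0                                              # feature of interest x_s
grid = np.linspace(X[:, s].min(), X[:, s].max(), 20)
pd_curve = []
for v in grid:
    Xv = X.copy()
    Xv[:, s] = v                                   # fix x_s at the grid value
    pd_curve.append(f.predict(Xv).mean())          # average over the other features
```

Plotting `grid` against `pd_curve` gives the 2D PDP the slide mentions; with two features the same loop over a 2D grid yields the 3D version.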
  • 26. Individual Conditional Expectation (ICE) For a chosen feature, Individual Conditional Expectation (ICE) plots draw one line per instance, representing how the instance’s prediction changes when the feature changes.
  • 27. Individual Conditional Expectation (ICE) An ICE plot visualizes the dependence of the predicted response on a feature for EACH instance separately, resulting in one line per instance, compared with the single line of a partial dependence plot. A PDP is the average of the lines of an ICE plot. The values for a line (and one instance) can be computed by keeping all other features the same, creating variants of this instance by replacing the feature’s value with values from a grid, and letting the black box make predictions for these newly created instances. The result is a set of points for an instance, with the feature values from the grid and the respective predictions.
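The per-instance procedure above can be sketched as follows, again assuming a synthetic dataset and a gradient boosting model as the black box; note how averaging the ICE matrix over instances recovers the PDP.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical black box on synthetic data.
X, y = make_regression(n_samples=100, n_features=3, random_state=0)
f = GradientBoostingRegressor(random_state=0).fit(X, y)

s = 1                                              # feature of interest
grid = np.linspace(X[:, s].min(), X[:, s].max(), 15)

# One ICE line per instance: sweep feature s over the grid, hold the rest fixed.
ice = np.empty((len(X), len(grid)))
for i, x in enumerate(X):
    variants = np.tile(x, (len(grid), 1))
    variants[:, s] = grid                          # replace feature s with grid values
    ice[i] = f.predict(variants)

pdp = ice.mean(axis=0)                             # averaging ICE lines gives the PDP
```

Each row of `ice` is one line of the plot; heterogeneous shapes across rows reveal interactions that the averaged PDP hides.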
  • 29. Evaluating the Interpretability Application Level Evaluation: Put the explanation into the product and let the end user test it. Human Level Evaluation: is a simplified application level evaluation. The difference is that these experiments are not conducted with the domain experts, but with lay humans. An example would be to show a user different explanations and the human would choose the best. Functional Level Evaluation: This works best when the class of models used was already evaluated by someone else in a human level evaluation. For example it might be known that the end users understand decision trees. In this case, a proxy for explanation quality might be the depth of the tree. Shorter trees would get a better explainability rating.
  • 30. Questions? Thank you so much for being part of this talk. You can also write to me at ankitt.nic@gmail.com :)