Unified Approach to Interpret Machine Learning Models: SHAP + LIME
Layla Yang, Databricks
Overview
• What is a machine learning interpreter?
• Why is it important?
• Why is interpreting ML models difficult?
• Different methodologies for model explanation
• SHAP: SHapley Additive exPlanations
• LIME: Local Interpretable Model-Agnostic Explanations
• Application: real-world use cases
ML interpreter: What is it?
[Diagram: Input Data → Black Box → Output, illustrating the trade-off between accuracy and interpretability]
ML interpreter: What is it?
“Interpretability is the degree to which a human can understand the cause of a decision.”
-- Miller, Tim. “Explanation in Artificial Intelligence: Insights from the Social Sciences.”
Why is it important?
Commercial Drive:
● Many enterprises rely on machine learning models to make important decisions:
○ Loan and credit card applications
○ Buying / selling commodities, stocks or advertisements
○ Diagnosing cancerous vs. benign cells
● Trust
○ It is easier for humans to trust a business that explains its decisions
● Legal Regulation
○ GDPR: the customer has the right to obtain an explanation
Why is it important?
Technical Drive: can you trust your model based on accuracy alone?
● Understand edge cases and circumstances of failure
● Knowing the ‘why’ can help you learn more about the problem and the data
● Improve the model
Why is it important?
Social and ethical aspects:
● Interpretability is a useful debugging tool for detecting bias
○ Sean Owen: What do Developer Salaries Tell Us about the Gender Pay Gap?
○ Minority, gender
● Machine learning models pick up biases and may harm vulnerable groups of people
○ Be cautious about marketing strategies
○ Fairness: e.g. automatic approval or rejection of loan applications
Why is it difficult?
ML creates functions that explicitly or implicitly combine variables (features) in sophisticated ways.
Disaggregating the final prediction into single-feature contributions, and untangling the interactions between features, is very difficult!
Why is it difficult?
Explaining one model out of many good models: if they give different pictures of nature’s mechanism and lead to different conclusions, how do we justify one against another and recommend it to the business?
“Data will often point with almost equal emphasis on several possible models. The question of which one most accurately reflects the data is difficult to resolve.”
-- Leo Breiman, “Statistical Modeling: The Two Cultures”
Different methodologies
● Surrogate model
○ Fit a simple model to a complex model’s predictions (to establish the relationship between inputs and predictions)
● Tree interpreter
○ Feature importance
○ Measures how often and how much a feature was used in the model
○ Reports a relative score for each feature’s importance
○ Gives only a global picture of the effect of features (see the sketch below)
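To make that global-only limitation concrete, here is a minimal sketch (my illustration, not from the slides) of tree feature importance with scikit-learn on a public dataset:
```python
# Global tree feature importance: one relative score per feature across
# the entire dataset -- no per-prediction breakdown.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

data = load_breast_cancer()
model = GradientBoostingClassifier(random_state=0).fit(data.data, data.target)

# feature_importances_ holds relative scores that sum to 1.0
ranked = sorted(zip(data.feature_names, model.feature_importances_),
                key=lambda pair: -pair[1])
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```
These scores say which features mattered on average, but nothing about why a single prediction came out the way it did. That is the gap LIME and SHAP fill.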
LIME (Local Interpretable Model-Agnostic Explanations)
● Model-agnostic! Approximates a black-box model locally with a simple linear surrogate model (see the sketch below)
● The surrogate is learned on perturbations of the original instance; in some cases it is faster than SHAP
● It doesn’t work out-of-the-box on all models
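A minimal sketch of LIME on tabular data, reusing the model and dataset from the previous sketch. LIME perturbs the instance, queries the black box, and fits a locally weighted linear model around that one point:
```python
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)
exp = explainer.explain_instance(
    data.data[0],           # the single instance to explain
    model.predict_proba,    # the black-box prediction function
    num_features=5,         # top features kept in the local surrogate
)
print(exp.as_list())        # [(feature condition, local linear weight), ...]
```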
SHAP (SHapley Additive exPlanation)
● Unifies various model explanation methods: model-agnostic and model-specific approximations
● Based on Shapley values from cooperative game theory; the SHAP framework was introduced by Scott Lundberg
● The Shapley value of a feature is its average marginal contribution to the prediction, taken over all the different situations (coalitions of features) in which it can appear
● SHAP provides multiple explainers for different kinds of models (see the sketch below):
○ TreeExplainer: supports XGBoost, LightGBM, CatBoost and scikit-learn tree models via Tree SHAP
○ DeepExplainer (Deep SHAP): supports TensorFlow and Keras models using DeepLIFT and Shapley values
○ KernelExplainer (Kernel SHAP): applies to any model by combining LIME and Shapley values
○ GradientExplainer: supports TensorFlow and Keras models
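A minimal sketch of TreeExplainer (my own, reusing the model and data from the earlier sketches). The additive property is what makes SHAP values easy to read: the base value plus the per-feature SHAP values reconstructs the model's raw output for each instance:
```python
import numpy as np
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data)

# Additivity check for the first instance (raw/log-odds space for this model)
base = np.ravel(explainer.expected_value)[0]
reconstructed = base + shap_values[0].sum()
print(np.isclose(reconstructed, model.decision_function(data.data[:1])[0]))

# Global view: mean |SHAP value| per feature across the whole population
shap.summary_plot(shap_values, data.data, feature_names=data.feature_names)
```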
SHAP: Important Benefits
Produce explanations at the level of individual inputs
● Traditional feature importance algorithms tell us which features are most important across the entire population
● With individual-level SHAP values, we can pinpoint which factors are most impactful for each customer (see the sketch below)
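A hedged sketch of what the individual-level view looks like in code, reusing `explainer` and `shap_values` from the previous sketch:
```python
import numpy as np

# Rank the factors that most moved the prediction for one customer (row 0)
row = 0
top = np.argsort(-np.abs(shap_values[row]))[:3]
for i in top:
    print(f"{data.feature_names[i]}: {shap_values[row, i]:+.3f}")

# Or draw SHAP's force plot for that single prediction
shap.force_plot(np.ravel(explainer.expected_value)[0], shap_values[row],
                data.data[row], feature_names=list(data.feature_names),
                matplotlib=True)
```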
SHAP: Important Benefits
Can directly relate feature values to the output, which greatly improves interpretation of the results
● SHAP quantifies the impact of a feature in the units of the model target
● Avoids a separate decomposition matrix, with its complex transformations and calculations
Read the impact of a feature in dollars! (see the sketch below)
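As a hypothetical illustration (not from the slides): for a regression model whose target is a price, each SHAP value is itself a price contribution and can be read off directly:
```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

# The California housing target is a price in units of $100,000
housing = fetch_california_housing()
reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(housing.data, housing.target)

vals = shap.TreeExplainer(reg).shap_values(housing.data[:100])
# vals[i, j]: how many $100k feature j added to (or removed from) prediction i
print(dict(zip(housing.feature_names, vals[0].round(3))))
```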
Application
● SHAP explainer: CNN
● LIME: NLP with fastText
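A hedged sketch of the CNN case, with a hypothetical Keras model `cnn` and image tensors `x_train`/`x_test` standing in for the slide's own (unshown) code:
```python
import shap

# DeepExplainer needs a background sample as the DeepLIFT reference
background = x_train[:100]
e = shap.DeepExplainer(cnn, background)
shap_values = e.shap_values(x_test[:5])   # per-pixel attributions per class

# Overlay attributions on the images: red pushes the class score up,
# blue pushes it down
shap.image_plot(shap_values, x_test[:5])
```
For the fastText NLP case, `lime.lime_text.LimeTextExplainer` plays the analogous role, perturbing raw text and fitting a local surrogate over word presence.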
Real-world implementation
Run on a single node vs. a distributed implementation (see the sketch below)
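A hedged sketch of the distributed pattern on Spark (assumed, not the speaker's exact code): fan SHAP computation out across the cluster with `mapInPandas`, assuming `features_df` is a Spark DataFrame of numeric model inputs and `model` is a trained tree model captured in the closure:
```python
import pandas as pd
import shap

def shap_partition(batches):
    # Build the explainer once per task, then score each pandas batch
    explainer = shap.TreeExplainer(model)
    for batch in batches:                 # each batch: a pandas DataFrame
        vals = explainer.shap_values(batch)
        yield pd.DataFrame(vals, columns=batch.columns)

# One SHAP value per input column, so the input schema can be reused as-is
shap_df = features_df.mapInPandas(shap_partition, schema=features_df.schema)
```
The same code that runs on a single node inside `shap_partition` scales out because each partition is explained independently.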