Explainable AI in Industry
FAT* 2020 Tutorial
Krishna Gade (Fiddler Labs), Sahin Cem Geyik (LinkedIn),
Krishnaram Kenthapadi (Amazon AWS AI), Varun Mithal (LinkedIn),
Ankur Taly (Fiddler Labs)
https://sites.google.com/view/fat20-explainable-ai-tutorial
1
Introduction and Motivation
2
Explainable AI - From a Business Perspective
3
What is Explainable AI?
Black-Box AI: Data → Black-Box AI → AI Product → Decision, Recommendation
Confusion with today's AI black box:
● Why did you do that?
● Why did you not do that?
● When do you succeed or fail?
● How do I correct an error?
Explainable AI: Data → Explainable AI → Explainable AI Product → Decision + Explanation, with a feedback loop
Clear & transparent predictions:
● I understand why
● I understand why not
● I know why you succeed or fail
● I understand how to correct errors
Black-box AI creates confusion and doubt
● Business Owner: Can I trust our AI decisions?
● Internal Audit, Regulators: Are these AI system decisions fair?
● Customer Support: How do I answer this customer complaint?
● IT & Operations: How do I monitor and debug this model?
● Data Scientists: Is this the best model that can be built?
● End User (poor decision): Why am I getting this decision? How can I get a better decision?
Black-box AI creates business risk for Industry
Explainable AI - From a Model Perspective
7
Why Explainability: Debug (Mis-)Predictions
8
Top label: “clog”
Why did the network label
this image as “clog”?
9
Why Explainability: Improve ML Model
Credit: Samek, Binder, Tutorial on Interpretable ML, MICCAI’18
Why Explainability: Verify the ML Model / System
10
Credit: Samek, Binder, Tutorial on Interpretable ML, MICCAI’18
11
Why Explainability: Learn New Insights
Credit: Samek, Binder, Tutorial on Interpretable ML, MICCAI’18
12
Why Explainability: Learn Insights in the Sciences
Credit: Samek, Binder, Tutorial on Interpretable ML, MICCAI’18
Explainable AI - From a Regulatory Perspective
13
Why Explainability: Laws against Discrimination
● Citizenship: Immigration Reform and Control Act
● Disability status: Rehabilitation Act of 1973; Americans with Disabilities Act of 1990
● Race: Civil Rights Act of 1964
● Age: Age Discrimination in Employment Act of 1967
● Sex: Equal Pay Act of 1963; Civil Rights Act of 1964
● And more...
14
Fairness · Privacy · Transparency · Explainability
Why Explainability: Growing Global AI Regulation
● GDPR: Article 22 empowers individuals with the right to demand an explanation of how an
automated system made a decision that affects them.
● Algorithmic Accountability Act 2019: Requires companies to assess the risks that automated decision systems pose to privacy or security, as well as the risks of inaccurate, unfair, biased, or discriminatory decisions impacting consumers.
● California Consumer Privacy Act: Requires companies to rethink their approach to capturing,
storing, and sharing personal data to align with the new requirements by January 1, 2020.
● Washington Bill 1655: Establishes guidelines for the use of automated decision systems to
protect consumers, improve transparency, and create more market predictability.
● Massachusetts Bill H.2701: Establishes a commission on automated decision-making,
transparency, fairness, and individual rights.
● Illinois House Bill 3415: States that predictive data analytics used in determining creditworthiness or hiring decisions may not include information that correlates with the applicant's race or ZIP code.
19
SR 11-7 and OCC regulations for Financial Institutions
"Explainability by Design" for AI products
[Diagram: Explainable AI sits at the center of the model lifecycle (Train → QA → Predict → Deploy → A/B Test → Monitor → Debug, with a feedback loop), supporting Model Diagnostics, Root Cause Analytics, Performance Monitoring, Fairness Monitoring, Model Comparison, Cohort Analysis, Explainable Decisions, API Support, Model Launch Signoff, Model Release Management, Model Evaluation, Compliance Testing, Model Debugging, and Model Visualization]
AI @ Scale - Challenges for Explainable AI
21
LinkedIn operates the largest professional network on the Internet
● 645M+ members
● 30M+ companies are represented on LinkedIn
● 90K+ schools listed (high school & college)
● 35K+ skills listed
● 20M+ open jobs on LinkedIn Jobs
● 280B feed updates
The AWS ML Stack
Broadest and most complete set of Machine Learning capabilities
● AI Services: Vision (Amazon Rekognition), Speech (Amazon Polly, Amazon Transcribe +Medical), Text (Amazon Comprehend +Medical, Amazon Translate, Amazon Textract), Search (Amazon Kendra), Chatbots (Amazon Lex), Personalization (Amazon Personalize), Forecasting (Amazon Forecast), Fraud (Amazon Fraud Detector), Development (Amazon CodeGuru), Contact Centers (Contact Lens for Amazon Connect)
● ML Services: Amazon SageMaker (Ground Truth, Augmented AI, SageMaker Neo, built-in algorithms, SageMaker Notebooks, SageMaker Experiments, model tuning, SageMaker Debugger, SageMaker Autopilot, model hosting, SageMaker Model Monitor, SageMaker Studio IDE)
● ML Frameworks & Infrastructure: Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Inferentia, FPGA
Achieving Explainable AI
Approach 1: Post-hoc explain a given AI model
● Individual prediction explanations in terms of input features, influential examples, concepts, local decision rules
● Global prediction explanations of the entire model, in terms of partial dependence plots, global feature importance, global decision rules
Approach 2: Build an interpretable model
● Logistic regression, Decision trees, Decision lists and sets, Generalized Additive
Models (GAMs)
24
Slide credit: https://twitter.com/chandan_singh96/status/1138811752769101825?s=20
Integrated Gradients
Case Studies on Explainable AI
26
Case Study:
Explainability for Debugging ML models
Ankur Taly** (Fiddler labs)
(Joint work with Mukund Sundararajan, Kedar
Dhamdhere, Pramod Mudrakarta)
27
**This research was carried out at Google Research
Achieving Explainable AI
Approach 1: Post-hoc explain a given AI model
● Individual prediction explanations in terms of input features, influential examples, concepts, local decision rules
● Global prediction explanations of the entire model, in terms of partial dependence plots, global feature importance, global decision rules
Approach 2: Build an interpretable model
● Logistic regression, Decision trees, Decision lists and sets, Generalized Additive
Models (GAMs)
28
Top label: “fireboat”
Why did the model label this
image as “fireboat”?
29
Top label: “clog”
Why did the model label this image as “clog”?
Attribute a model’s prediction on an input to features of the input
Examples:
● Attribute an object recognition network’s prediction to its pixels
● Attribute a text sentiment network’s prediction to individual words
● Attribute a lending model’s prediction to its features
A reductive formulation of “why this prediction” but surprisingly useful :-)
The Attribution Problem
Feature*Gradient
Attribution to a feature is feature value times gradient, i.e., xᵢ · ∂y/∂xᵢ
● Gradient captures sensitivity of output w.r.t. feature
● Equivalent to Feature*Coefficient for linear models
○ First-order Taylor approximation of non-linear models
● Popularized by SaliencyMaps [NIPS 2013], Baehrens et al. [JMLR 2010]
32
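As an illustration, here is a minimal sketch of this gradient-times-input attribution for a differentiable model, assuming a TensorFlow 2 / Keras `model` and a single preprocessed input `x` with a leading batch dimension; the function name is our own.

```python
import tensorflow as tf

def gradient_times_input(model, x, class_idx):
    """Attribution_i = x_i * dF_c/dx_i: feature value times the gradient of the class score."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        score = model(x)[:, class_idx]   # scalar score for the target class
    grads = tape.gradient(score, x)      # sensitivity of the score w.r.t. each feature
    return (x * grads).numpy()           # feature value times gradient
```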
[Figure: model score vs. pixel intensity. Gradients are interesting at low intensities but saturate (become uninteresting) near the actual input, so gradients in the vicinity of the input look like noise]
[Figure: the Integrated Gradients path. Inputs are scaled from the baseline (𝛂 = 0.0) to the actual input (𝛂 = 1.0), and gradients are computed at each scaled input along the way]
IG(input, base) ::= (input − base) * ∫₀¹ ∇F(𝛂·input + (1−𝛂)·base) d𝛂
[Figure: original image and its Integrated Gradients attribution]
Integrated Gradients [ICML 2017]
Integrate the gradients along a straight-line path from baseline to input
● Ideally, the baseline is an informationless input for the model
○ E.g., Black image for image models
○ E.g., Empty text or zero embedding vector for text models
● Integrated Gradients explains F(input) - F(baseline) in terms of input
features
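A minimal sketch of Integrated Gradients, approximating the path integral above with a Riemann sum over scaled inputs; `model`, `x`, `baseline`, and the TensorFlow 2 setup are assumptions, and a production implementation would typically batch the scaled inputs rather than loop.

```python
import tensorflow as tf

def integrated_gradients(model, x, baseline, class_idx, steps=50):
    x, baseline = tf.convert_to_tensor(x), tf.convert_to_tensor(baseline)
    total_grads = tf.zeros_like(x)
    for alpha in tf.linspace(0.0, 1.0, steps + 1):
        scaled = baseline + alpha * (x - baseline)     # point on the straight-line path
        with tf.GradientTape() as tape:
            tape.watch(scaled)
            score = model(scaled)[:, class_idx]
        total_grads += tape.gradient(score, scaled)    # accumulate gradients along the path
    avg_grads = total_grads / tf.cast(steps + 1, x.dtype)
    return ((x - baseline) * avg_grads).numpy()        # (input - baseline) * average gradient
```

A quick sanity check is that the attributions should approximately sum to F(input) − F(baseline).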
Aside: Baselines (or norms) are essential to explanations [Kahneman-Miller 86]
E.g., a man suffers from indigestion. His doctor blames it on a stomach ulcer; his wife blames it on eating turnips.
● Both are correct, relative to their baselines.
What is a baseline?
Why is this image labeled as "clog"?
[Figure: original image and Integrated Gradients attribution for the label "clog"]
Detecting an architecture bug
● Deep network [Kearns, 2016] predicts if a molecule binds to a certain DNA site
● Finding: Some atoms had identical attributions despite different connectivity
● There was an implementation bug due to which the convolved bond features did not affect the prediction!
Detecting a data issue
● Deep network predicts various diseases from chest x-rays
● Finding: Attributions fell on radiologist's markings (rather than the pathology)
[Figure: original image and integrated gradients for the top label]
Three Question Answering Tasks
● Tabular QA: Neural Programmer (2017) model, 33.5% accuracy on WikiTableQuestions. Q: How many medals did India win? A: 197
● Visual QA: Kazemi and Elqursh (2017) model, 61.1% on the VQA 1.0 dataset (state of the art = 66.7%). Q: How symmetrical are the white bricks on either side of the building? A: very
● Reading Comprehension: Yu et al. (2018) model, 84.6 F-1 score on SQuAD (state of the art). Passage: "Peyton Manning became the first quarterback ever to lead two different teams to multiple Super Bowls. He is also the oldest quarterback ever to play in a Super Bowl at age 39. The past record was held by John Elway, who led the Broncos to victory in Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager." Q: Name of the quarterback who was 38 in Super Bowl XXXIII? A: John Elway
Robustness question: Do these networks read the question carefully? :-)
Visual Question Answering
Q: How symmetrical are the white bricks on either side of the
building?
A: very
How do we probe the model’s understanding of the
question?
Enter: Attributions
● Baseline: Empty question, but full image
● By design, attribution will not fall on the image
47
Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention)
Accuracy: 61.1% (state of the art: 66.7%)
Attributions
Q: How symmetrical are the white bricks on either side of the
building?
A: very
How symmetrical are the white bricks on
either side of the building?
red: high attribution
blue: negative attribution
gray: near-zero attribution
48
Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention)
Accuracy: 61.1% (state of the art: 66.7%)
Attacking the Model
Q: How symmetrical are the white bricks on either side of the building?
A: very
Q: How asymmetrical are the white bricks on either side of the building?
A: very
Q: How big are the white bricks on either side of the building?
A: very
Q: How fast are the white bricks speaking on either side of the building?
A: very
53
Attributions help assess model robustness and uncover
blind spots!
Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention)
Accuracy: 61.1% (state of the art: 66.7%)
Attack: Subject Ablation
Replace the subject of a question with a low-attribution noun from the vocabulary
● This ought to change the answer but often does not!
Low-attribution nouns: 'tweet', 'childhood', 'copyrights', 'mornings', 'disorder', 'importance', 'critter', 'jumper', 'fits'
What is the man doing? → What is the tweet doing?
How many children are there? → How many tweet are there?
VQA model's response remains the same 75.6% of the time on questions that it originally answered correctly
Many Other Attacks
● Visual QA
○ Prefix concatenation attack (accuracy drop: 61.1% to 19%)
○ Stop word deletion attack (accuracy drop: 61.1% to 52%)
● Tabular QA
○ Prefix concatenation attack (accuracy drop: 33.5% to 11.4%)
○ Stop word deletion attack (accuracy drop: 33.5% to 28.5%)
○ Table row reordering attack (accuracy drop: 33.5% to 23%)
● Paragraph QA
○ Improved paragraph concatenation attacks of Jia and Liang [EMNLP 2017]
Paper: Did the model understand the question? [ACL 2018]
Case Study:
Talent Platform
“Diversity Insights and Fairness-Aware Ranking”
Sahin Cem Geyik, Krishnaram Kenthapadi
56
Guiding Principle: "Diversity by Design"
"Diversity by Design" in LinkedIn's Talent Solutions:
● Insights to Identify Diverse Talent Pools
● Representative Talent Search Results
● Diversity Learning Curriculum
Plan for Diversity
● Identify Diverse Talent Pools
● Inclusive Job Descriptions / Recruiter Outreach
● Representative Ranking for Talent Search
S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search, KDD'19. [Microsoft's AI/ML conference (MLADS'18), Distinguished Contribution Award]
Building Representative Talent Search at LinkedIn (LinkedIn engineering blog)
Intuition for Measuring and Achieving Representativeness
Ideal: Top ranked results should follow a desired distribution on
gender/age/…
E.g., same distribution as the underlying talent pool
Inspired by “Equal Opportunity” definition [Hardt et al, NIPS’16]
Defined measures (skew, divergence) based on this intuition
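As an illustration, here is a minimal sketch of a skew-style measure: the log-ratio of an attribute value's observed proportion in the top-k results to its desired proportion. The exact skew and divergence definitions in the KDD'19 paper may differ; the function name and the smoothing constant are illustrative choices.

```python
import math

def skew_at_k(ranked_attrs, value, desired_prop, k):
    """ranked_attrs: attribute values (e.g., inferred gender) of the ranked candidates, best first."""
    top_k = ranked_attrs[:k]
    observed_prop = sum(1 for a in top_k if a == value) / float(k)
    # > 0 means the value is over-represented in the top k; < 0 means under-represented.
    return math.log((observed_prop + 1e-9) / (desired_prop + 1e-9))
```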
Desired Proportions within the Attribute of Interest
Compute the proportions of the values of the attribute (e.g., gender,
gender-age combination) amongst the set of qualified candidates
● “Qualified candidates” = Set of candidates that match the
search query criteria
● Retrieved by LinkedIn’s Galene search engine
Desired proportions could also be obtained based on legal
mandate / voluntary commitment
Fairness-aware Reranking Algorithm (Simplified)
Partition the set of potential candidates into different buckets for
each attribute value
Rank the candidates in each bucket according to the scores
assigned by the machine-learned model
Merge the ranked lists, balancing the representation requirements
and the selection of highest scored candidates
Representation requirement: Desired distribution on gender/age/…
Algorithmic variants based on how we achieve this balance
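Below is a minimal sketch of this simplified reranking idea: bucket candidates by attribute value, rank within each bucket by model score, and merge greedily toward the desired proportions. It is an illustrative variant under assumed data structures (`candidates`, `desired_props`) and a simple deficit-based merge rule, not LinkedIn's exact production algorithm.

```python
from collections import defaultdict

def fairness_aware_rerank(candidates, desired_props, k):
    """candidates: list of (candidate_id, attr_value, score); desired_props: attr_value -> proportion."""
    buckets = defaultdict(list)
    for cand in candidates:
        buckets[cand[1]].append(cand)
    for attr in buckets:
        buckets[attr].sort(key=lambda c: c[2], reverse=True)   # per-bucket ranking by model score

    reranked, counts = [], defaultdict(int)
    while len(reranked) < k and any(buckets.values()):
        # Pick the attribute value with the largest deficit w.r.t. its desired proportion so far.
        def deficit(attr):
            return desired_props.get(attr, 0.0) * (len(reranked) + 1) - counts[attr]
        best_attr = max((a for a in buckets if buckets[a]), key=deficit)
        chosen = buckets[best_attr].pop(0)                     # highest-scored candidate in that bucket
        reranked.append(chosen)
        counts[best_attr] += 1
    return reranked
```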
Validating Our Approach
Gender Representativeness
● Over 95% of all searches are representative compared to the
qualified population of the search
Business Metrics
● A/B test over LinkedIn Recruiter users for two weeks
● No significant change in business metrics (e.g., # InMails sent
or accepted)
Ramped to 100% of LinkedIn Recruiter users worldwide
Lessons learned
● Post-processing approach desirable
○ Model agnostic: scalable across different model choices for our application
○ Acts as a "fail-safe": robust to application-specific business logic
○ Easier to incorporate as part of existing systems: build a stand-alone service or component for post-processing, with no significant modifications to the existing components
● Complementary to efforts to reduce bias from training data & during model training
● Collaboration/consensus across key stakeholders
Engineering for Fairness in AI Lifecycle
Stages: Problem Formation → Dataset Construction → Algorithm Selection → Training Process → Testing Process → Deployment → Feedback
Guiding questions along the way:
● Is an algorithm an ethical solution to our problem?
● Does our data include enough minority samples? Are there missing/biased features?
● Do we need to apply debiasing algorithms to preprocess our data? Do we need to include fairness constraints in the function?
● Have we evaluated the model using relevant fairness metrics?
● Are we deploying our model on a population that we did not train/test on? Are there unequal effects across users?
● Does the model encourage feedback loops that can produce increasingly unfair outcomes?
Credit: K. Browne & J. Draper
Acknowledgements
LinkedIn Talent Solutions Diversity team, Hire & Careers AI team, Anti-abuse AI team, Data Science
Applied Research team
Special thanks to Deepak Agarwal, Parvez Ahammad, Stuart Ambler, Kinjal Basu, Jenelle Bray, Erik
Buchanan, Bee-Chung Chen, Patrick Cheung, Gil Cottle, Cyrus DiCiccio, Patrick Driscoll, Carlos Faham,
Nadia Fawaz, Priyanka Gariba, Meg Garlinghouse, Gurwinder Gulati, Rob Hallman, Sara Harrington,
Joshua Hartman, Daniel Hewlett, Nicolas Kim, Rachel Kumar, Nicole Li, Heloise Logan, Stephen Lynch,
Divyakumar Menghani, Varun Mithal, Arashpreet Singh Mor, Tanvi Motwani, Preetam Nandy, Lei Ni,
Nitin Panjwani, Igor Perisic, Hema Raghavan, Romer Rosales, Guillaume Saint-Jacques, Badrul Sarwar,
Amir Sepehri, Arun Swami, Ram Swaminathan, Grace Tang, Ketan Thakkar, Sriram Vasudevan,
Janardhanan Vembunarayanan, James Verbus, Xin Wang, Hinkmond Wong, Ya Xu, Lin Yang, Yang Yang,
Chenhui Zhai, Liang Zhang, Yani Zhang
Case Study:
Talent Search
Varun Mithal, Girish Kathalagiri, Sahin Cem Geyik
71
LinkedIn Recruiter
● Recruiter Searches for Candidates
○ Standardized and free-text search criteria
● Retrieval and Ranking
○ Filter candidates using the criteria
○ Rank candidates in multiple levels using ML
models
72
Modeling Approaches
● Pairwise XGBoost
● GLMix
● DNNs via TensorFlow
● Optimization Criteria: inMail Accepts
○ Positive: inMail sent by recruiter, and positively responded by candidate
■ Mutual interest between the recruiter and the candidate
73
Feature Importance in XGBoost
74
How We Utilize Feature Importances for GBDT
● Understanding feature digressions
○ Which features that used to be impactful no longer are?
○ Should we debug feature generation?
● Introducing new features in bulk and identifying effective ones
○ E.g., activity features for the last 3, 6, 12, and 24 hours were introduced (costly to compute)
○ Should we keep all such features?
● Separating the factors that caused an improvement
○ Did an improvement come from a new feature, a new labeling strategy, or a new data source?
○ Did the ordering between features change?
● Shortcoming: A global view, not case by case
75
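A minimal sketch of pulling global feature importances (e.g., by total gain) from a trained XGBoost booster to support the checks above; the pairwise-ranking objective and `dtrain` are placeholders, not the exact production setup.

```python
import xgboost as xgb

# `dtrain` (an xgb.DMatrix with the training data) is assumed to be built elsewhere.
booster = xgb.train(params={"objective": "rank:pairwise"}, dtrain=dtrain, num_boost_round=100)

# Importance by total gain: how much each feature improved the splits it was used in.
gain = booster.get_score(importance_type="gain")
for feature, score in sorted(gain.items(), key=lambda kv: kv[1], reverse=True)[:20]:
    print(f"{feature}\t{score:.3f}")
```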
GLMix Models
● Generalized Linear Mixed Models
○ Global: Linear Model
○ Per-contract: Linear Model
○ Per-recruiter: Linear Model
● Lots of parameters overall
○ For a specific recruiter or contract the weights can be summed up
● Inherently explainable
○ Contribution of a feature is “weight x feature value”
○ Can be examined in a case-by-case manner as well
76
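A minimal sketch of this per-instance reasoning: sum the global, per-contract, and per-recruiter coefficients to get the effective weight of each feature, then multiply by the feature value. The dictionary representation is an assumption for illustration.

```python
def glmix_contributions(features, global_w, contract_w, recruiter_w):
    """features: {name: value}; *_w: {name: coefficient} for each GLMix component."""
    contributions = {}
    for name, value in features.items():
        weight = (global_w.get(name, 0.0)
                  + contract_w.get(name, 0.0)
                  + recruiter_w.get(name, 0.0))   # summed weights for this recruiter/contract
        contributions[name] = weight * value       # "weight x feature value"
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
```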
TensorFlow Models in Recruiter and Explaining Them
● We utilize the Integrated Gradients [ICML 2017] method
● How do we determine the baseline example?
○ Every query creates its own feature values for the same candidate
○ Query match features, time-based features
○ Recruiter affinity, and candidate affinity features
○ A candidate would be scored differently by each query
○ Cannot recommend a “Software Engineer” to a search for a “Forensic Chemist”
○ There is no globally neutral example for comparison!
77
Query-Specific Baseline Selection
● For each query:
○ Score examples by the TF model
○ Rank examples
○ Choose one example as the baseline
○ Compare others to the baseline example
● How to choose the baseline example
○ Last candidate
○ Kth percentile in ranking
○ A random candidate
○ Request by user (answering a question like: “Why was I presented candidate x above
candidate y?”)
78
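A minimal sketch of the idea, assuming a per-query `score_fn` and the `integrated_gradients` helper sketched earlier: rank the retrieved candidates and use the one at a chosen percentile of the ranking as the baseline for attribution.

```python
import numpy as np

def pick_baseline(candidate_features, score_fn, percentile=95):
    """Pick the candidate at the given percentile of the ranking (e.g., a low-scoring one)."""
    scores = np.array([score_fn(f) for f in candidate_features])
    order = np.argsort(-scores)                                  # candidate indices, best first
    pos = min(len(order) - 1, int(len(order) * percentile / 100))
    return candidate_features[order[pos]]

# baseline = pick_baseline(candidates_for_query, score_fn)
# attribution = integrated_gradients(model, candidate_x, baseline, class_idx=0)
```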
Example
79
Example - Detailed
80
Feature Description Difference (1 vs 2) Contribution
Feature………. Description………. -2.0476928 -2.144455602
Feature………. Description………. -2.3223877 1.903594618
Feature………. Description………. 0.11666667 0.2114946752
Feature………. Description………. -2.1442587 0.2060414469
Feature………. Description………. -14 0.1215354111
Feature………. Description………. 1 0.1000282466
Feature………. Description………. -92 -0.085286277
Feature………. Description………. 0.9333333 0.0568533262
Feature………. Description………. -1 -0.051796317
Feature………. Description………. -1 -0.050895940
Pros & Cons
● Explains potentially very complex models
● Case-by-case analysis
○ Why do you think candidate x is a better match for my position?
○ Why do you think I am a better fit for this job?
○ Why am I being shown this ad?
○ Great for debugging real-time problems in production
● Global view is missing
○ Aggregate Contributions can be computed
○ Could be costly to compute
81
Lessons Learned and Next Steps
● Global explanations vs. Case-by-case Explanations
○ Global gives an overview, better for making modeling decisions
○ Case-by-case could be more useful for the non-technical user, better for debugging
● Integrated gradients worked well for us
○ Complex models make it harder for developers to map improvement to effort
○ Use-case gave intuitive results, on top of completely describing score differences
● Next steps
○ Global explanations for Deep Models
82
Case Study:
Model Interpretation for Predictive Models in B2B
Sales Predictions
Jilei Yang, Wei Di, Songtao Guo
83
Problem Setting
● Predictive models in B2B sales prediction
○ E.g.: random forest, gradient boosting, deep neural network, …
○ High accuracy, low interpretability
● Global feature importance → Individual feature reasoning
84
Example
85
Revisiting LIME
● Given a target sample x_k, approximate its prediction pred(x_k) by building a sample-specific linear model:
pred(X) ≈ β_k1·X_1 + β_k2·X_2 + …, X ∈ neighbor(x_k)
● E.g., for company CompanyX: 0.76 ≈ 1.82 × 0.17 + 1.61 × 0.11 + …
86
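A minimal sketch of such a LIME-style local surrogate: perturb around x_k, weight neighbors by proximity, and read off the coefficients of a weighted linear fit. The Gaussian perturbations, proximity kernel, and Ridge regularizer are illustrative choices; `pred` is the black-box scoring function.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_linear_explanation(pred, x_k, n_samples=5000, scale=1.0, kernel_width=1.0):
    rng = np.random.default_rng(0)
    X = x_k + rng.normal(0.0, scale, size=(n_samples, len(x_k)))   # neighborhood samples
    y = np.array([pred(row) for row in X])
    dist = np.linalg.norm(X - x_k, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))           # closer samples count more
    surrogate = Ridge(alpha=1.0).fit(X, y, sample_weight=weights)
    return surrogate.coef_                                          # β_kj: local linear coefficients
```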
xLIME
Piecewise Linear
Regression
Localized Stratified
Sampling
87
Piecewise Linear Regression
Motivation: Separate top positive feature influencers and top negative feature influencers
88
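One way to realize this motivation is to expand each feature into hinge terms around the target value so that a weighted linear fit yields separate slopes below (β⁻) and above (β⁺) the target. The sketch below is an illustrative interpretation under that assumption, not xLIME's exact formulation.

```python
import numpy as np
from sklearn.linear_model import Ridge

def piecewise_local_fit(X, y, x_k, sample_weight=None):
    below = np.minimum(X - x_k, 0.0)     # active when a neighbor's feature is below x_kj
    above = np.maximum(X - x_k, 0.0)     # active when a neighbor's feature is above x_kj
    design = np.hstack([below, above])
    model = Ridge(alpha=1.0).fit(design, y, sample_weight=sample_weight)
    d = X.shape[1]
    beta_minus, beta_plus = model.coef_[:d], model.coef_[d:]
    return beta_minus, beta_plus
```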
Impact of Piecewise Approach
● Target sample x_k = (x_k1, x_k2, ⋯)
● Top feature contributor
○ LIME: large magnitude of β_kj · x_kj
○ xLIME: large magnitude of β_kj⁻ · x_kj
● Top positive feature influencer
○ LIME: large magnitude of β_kj
○ xLIME: large magnitude of negative β_kj⁻ or positive β_kj⁺
● Top negative feature influencer
○ LIME: large magnitude of β_kj
○ xLIME: large magnitude of positive β_kj⁻ or negative β_kj⁺
89
Localized Stratified Sampling: Idea
Method: Sampling based on empirical distribution around target value at each feature level
90
Localized Stratified Sampling: Method
● Sampling based on empirical distribution around the target value for each feature
● For target sample x_k = (x_k1, x_k2, ⋯), sample values of feature j according to p_j(X_j) · N(x_kj, (α·s_j)²)
○ p_j(X_j): empirical distribution
○ x_kj: feature value in the target sample
○ s_j: standard deviation
○ α: interpretable range; tradeoff between interpretable coverage and local accuracy
● In LIME, sampling is according to N(x_j, s_j²)
91
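A minimal sketch of the sampling rule above, assuming access to the training data matrix: each feature is drawn from its empirical distribution, re-weighted by a Gaussian kernel of width α·s_j around the target value. The implementation details are illustrative only.

```python
import numpy as np

def localized_stratified_sample(data, x_k, alpha=0.5, n_samples=5000, seed=0):
    """data: training matrix (n x d); x_k: target sample (length d)."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    samples = np.empty((n_samples, d))
    for j in range(d):
        col = data[:, j]
        s_j = col.std() + 1e-9
        # Empirical distribution of feature j, re-weighted around the target value x_kj.
        w = np.exp(-0.5 * ((col - x_k[j]) / (alpha * s_j)) ** 2)
        w /= w.sum()
        samples[:, j] = rng.choice(col, size=n_samples, p=w)
    return samples
```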
Summary
92
LTS LCP (LinkedIn Career Page) Upsell
● A subset of churn data
○ Total Companies: ~ 19K
○ Company features: 117
● Problem: Estimate whether there will be upsell given a set of features about
the company’s utility from the product
93
Top Feature Contributor
94
95
Top Feature Influencers
96
Key Takeaways
● Looking at the explanation as contributor vs. influencer features is useful
○ Contributor: Which features end up producing the current outcome, case by case
○ Influencer: What needs to change to improve the likelihood, case by case
● xLIME aims to improve on LIME via:
○ Piecewise linear regression: more accurately describes the local point, helps find the correct influencers
○ Localized stratified sampling: a more realistic set of local points
● Better captures the important features
97
Case Study:
Integrated Gradients for Explaining Diabetic
Retinopathy Predictions
Ankur Taly** (Fiddler labs)
(Joint work with Mukund Sundararajan, Kedar Dhamdhere, Pramod Mudrakarta)
98
**This research was carried out at Google Research
Prediction: “proliferative” DR¹
● Proliferative implies vision-threatening
Can we provide an explanation to
the doctor with supporting evidence
for “proliferative” DR?
Retinal Fundus Image
¹Diabetic Retinopathy (DR) is a diabetes complication that affects the eye. Deep networks can predict DR grade from retinal fundus images with high accuracy (AUC ~0.97) [JAMA, 2016].
Retinal Fundus Image
Integrated Gradients for label: “proliferative”
Visualization: Overlay heatmap on green channel
Lesions
Neovascularization
● Hard to see on original image
● Known to be vision-threatening
Do attributions help doctors better diagnose diabetic retinopathy?
Assisted-read study
9 doctors grade 2000 images under three different conditions
A. Image only
B. Image + Model’s prediction scores
C. Image + Model’s prediction scores + Explanation (Integrated Gradients)
Findings:
● Model’s predictions (B) significantly improve accuracy vs. image only (A) (p < 0.001)
● Both forms of assistance (B and C) improved sensitivity without hurting specificity
● Explanations (C) improved accuracy of cases with DR (p < 0.001) but hurt accuracy of
cases without DR (p = 0.006)
● Both B and C increase doctor ↔ model agreement
Paper: Using a deep learning algorithm and integrated gradients explanation to assist
grading for diabetic retinopathy --- Journal of Ophthalmology [2018]
Case Study:
Fiddler’s Explainable AI Engine
Ankur Taly (Fiddler labs)
105
Fiddler's Explainable AI Engine
Mission: Unlock Trust, Visibility and Insights by making AI Explainable in every enterprise
[Diagram: all your data (any data warehouse) and custom models feed the Fiddler Modeling Layer, which delivers explainable AI for everyone via APIs, dashboards, reports, and trusted insights]
Example: Credit Lending in a black-box ML world
Credit Line Increase
Fair lending laws [ECOA, FCRA] require credit decisions to be explainable
[Diagram: a query to the bank's credit lending model (AI system) returns a credit lending score of 0.3 and the request is denied, prompting the questions Why? Why not? How?]
How Can This Help…
Customer Support
Why was a customer loan
rejected?
Bias & Fairness
How is my model doing
across demographics?
Lending LOB
What variables should they
validate with customers on
“borderline” decisions?
Explain individual predictions (using Shapley Values)
Probe the
model on
counterfactuals
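A minimal sketch of such counterfactual probing: re-score the same applicant with one feature changed and inspect how the score moves. The `predict_proba` interface and the feature dictionary (including any specific field names) are assumptions for illustration.

```python
import copy

def counterfactual_delta(model, applicant, feature, new_value):
    """applicant: {feature_name: value}; returns the change in the positive-class score."""
    modified = copy.deepcopy(applicant)
    modified[feature] = new_value
    original_score = model.predict_proba([list(applicant.values())])[0][1]
    modified_score = model.predict_proba([list(modified.values())])[0][1]
    return modified_score - original_score   # how much the decision score moves
```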
How Can This Help…
Customer Support
Why was a customer loan
rejected?
Why was the credit card
limit low?
Why was this transaction
marked as fraud?
Integrating explanations
Classic result in game theory on distributing gain in a coalition game
● Coalition Games
○ Players collaborating to generate some gain (think: revenue)
○ Set function v(S) determining the gain for any subset S of players
● Shapley Values are a fair way to attribute the gain to players based on their
contributions
○ Shapley value of a player is a specific weighted aggregation of its marginal contribution
to all possible subsets of other players
○ Axiomatic guarantee: Unique method under four very simple axioms
Shapley Value [Annals of Mathematical studies,1953]
● Define a coalition game for each model input X
○ Players are the features in the input
○ Gain is the model prediction (output), i.e., gain = F(X)
● Feature attributions are the Shapley values of this game
Challenge: What does it mean for only some players (features) to be present?
Approach: Randomly sample the (absent) feature from a certain distribution
● Different approaches choose different distributions and yield significantly different attributions!
Preprint: The Explanation Game: Explaining Machine Learning Models with Cooperative
Game Theory
Shapley Values for Explaining ML models
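A minimal sketch of estimating these Shapley values by sampling feature permutations, with "absent" features filled in from a single background sample (one of the distribution choices mentioned above); `model_fn`, `x`, and `background` are assumed to be NumPy-friendly, and this is an illustrative estimator rather than Fiddler's exact implementation.

```python
import numpy as np

def sampled_shapley(model_fn, x, background, n_permutations=200, seed=0):
    rng = np.random.default_rng(seed)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_permutations):
        perm = rng.permutation(d)
        current = background.copy()            # start with all features "absent"
        prev_score = model_fn(current[None, :])[0]
        for j in perm:                         # add features one at a time, in random order
            current[j] = x[j]
            score = model_fn(current[None, :])[0]
            phi[j] += score - prev_score       # marginal contribution of feature j
            prev_score = score
    return phi / n_permutations                # attributions sum to F(x) - F(background)
```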
How Can This Help…
Global Explanations
What are the primary feature
drivers of the dataset on my
model?
Region Explanations
How does my model
perform on a certain slice?
Where does the model not
perform well? Is my model
uniformly fair across slices?
Slice & Explain
Model Monitoring: Feature Drift
Investigate Data Drift Impacting Model Performance
[Screenshot: pick a time slice and compare the feature distribution for that slice relative to the training distribution]
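A minimal sketch of one way to quantify such drift, assuming numeric features: compare the histogram of a feature in a recent time slice against its training histogram with Jensen-Shannon divergence. The specific divergence and binning are illustrative choices, not necessarily what Fiddler uses.

```python
import numpy as np

def js_divergence(train_values, slice_values, bins=20):
    lo = min(train_values.min(), slice_values.min())
    hi = max(train_values.max(), slice_values.max())
    p, _ = np.histogram(train_values, bins=bins, range=(lo, hi))
    q, _ = np.histogram(slice_values, bins=bins, range=(lo, hi))
    p = p / p.sum() + 1e-12
    q = q / q.sum() + 1e-12
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)     # 0 means no drift; larger means more drift
```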
How Can This Help…
Operations
Why are there outliers in model
predictions? What caused model
performance to go awry?
Data Science
How can I improve my ML model?
Where does it not do well?
Model Monitoring: Outliers with Explanations
[Screenshot: outlier predictions flagged over time, each with individual explanations]
An Explainable Future
[Diagram: the same "Explainability by Design" lifecycle view as earlier: Explainable AI across Train, QA, Predict, Deploy, A/B Test, Monitor, and Debug with a feedback loop, supporting model diagnostics, root cause analytics, performance and fairness monitoring, model comparison, cohort analysis, explainable decisions, API support, model launch signoff, model release management, model evaluation, compliance testing, model debugging, and model visualization]
Explainability Challenges & Tradeoffs
120
[Tradeoffs among: User Privacy · Transparency · Fairness · Performance]
● Lack of standard interface for ML models
makes pluggable explanations hard
● Explanation needs vary depending on the type of user and the problem at hand.
● The algorithm you employ for explanations
might depend on the use-case, model type,
data format, etc.
● There are trade-offs w.r.t. Explainability,
Performance, Fairness, and Privacy.
Explainability in ML: Broad Challenges
Actionable explanations
Balance between explanations & model secrecy
Robustness of explanations to failure modes (Interaction between ML
components)
Application-specific challenges
Conversational AI systems: contextual explanations
Gradation of explanations
Tools for explanations across AI lifecycle
Pre & post-deployment for ML models
Model developer vs. End user focused
Reflections
● Case studies on explainable AI in
practice
● Need “Explainability by Design” when
building AI products
122
Thanks! Questions?
● Feedback most welcome :-)
○ krishna@fiddler.ai, sgeyik@linkedin.com,
kenthk@amazon.com, vamithal@linkedin.com, ankur@fiddler.ai
● Tutorial website: https://sites.google.com/view/fat20-explainable-ai-
tutorial
● To try Fiddler, please send an email to info@fiddler.ai
123
More Related Content

What's hot

An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!Mansour Saffar
 
Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective Saurabh Kaushik
 
Interpretable machine learning : Methods for understanding complex models
Interpretable machine learning : Methods for understanding complex modelsInterpretable machine learning : Methods for understanding complex models
Interpretable machine learning : Methods for understanding complex modelsManojit Nandi
 
Explainable AI
Explainable AIExplainable AI
Explainable AIDinesh V
 
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Krishnaram Kenthapadi
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictionsAnton Kulesh
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language ProcessingYunyao Li
 
Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...
Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...
Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...Sri Ambati
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
 
Responsible Data Use in AI - core tech pillars
Responsible Data Use in AI - core tech pillarsResponsible Data Use in AI - core tech pillars
Responsible Data Use in AI - core tech pillarsSofus Macskássy
 
Measures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairnessMeasures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairnessManojit Nandi
 
Machine Learning Interpretability / Explainability
Machine Learning Interpretability / ExplainabilityMachine Learning Interpretability / Explainability
Machine Learning Interpretability / ExplainabilityRaouf KESKES
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1DianaGray10
 
Interpretable ML
Interpretable MLInterpretable ML
Interpretable MLMayur Sand
 
Explainable AI: Building trustworthy AI models?
Explainable AI: Building trustworthy AI models? Explainable AI: Building trustworthy AI models?
Explainable AI: Building trustworthy AI models? Raheel Ahmad
 

What's hot (20)

An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!
 
Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective
 
Interpretable machine learning : Methods for understanding complex models
Interpretable machine learning : Methods for understanding complex modelsInterpretable machine learning : Methods for understanding complex models
Interpretable machine learning : Methods for understanding complex models
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictions
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language Processing
 
Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...
Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...
Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
 
Responsible Data Use in AI - core tech pillars
Responsible Data Use in AI - core tech pillarsResponsible Data Use in AI - core tech pillars
Responsible Data Use in AI - core tech pillars
 
Shap
ShapShap
Shap
 
Measures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairnessMeasures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairness
 
Machine Learning Interpretability / Explainability
Machine Learning Interpretability / ExplainabilityMachine Learning Interpretability / Explainability
Machine Learning Interpretability / Explainability
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Responsible AI
Responsible AIResponsible AI
Responsible AI
 
Interpretable ML
Interpretable MLInterpretable ML
Interpretable ML
 
Explainable AI: Building trustworthy AI models?
Explainable AI: Building trustworthy AI models? Explainable AI: Building trustworthy AI models?
Explainable AI: Building trustworthy AI models?
 

Similar to Explainable AI in Industry (FAT* 2020 Tutorial)

Practical Explainable AI: How to build trustworthy, transparent and unbiased ...
Practical Explainable AI: How to build trustworthy, transparent and unbiased ...Practical Explainable AI: How to build trustworthy, transparent and unbiased ...
Practical Explainable AI: How to build trustworthy, transparent and unbiased ...Raheel Ahmad
 
Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020Debmalya Biswas
 
Trusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceTrusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceAnimesh Singh
 
Responsible Generative AI Design Patterns
Responsible Generative AI Design PatternsResponsible Generative AI Design Patterns
Responsible Generative AI Design PatternsDebmalya Biswas
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsKrishnaram Kenthapadi
 
“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...
“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...
“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...Edge AI and Vision Alliance
 
Artificial Intelligence and Machine Learning in Research
Artificial Intelligence and Machine Learning in ResearchArtificial Intelligence and Machine Learning in Research
Artificial Intelligence and Machine Learning in ResearchAmazon Web Services
 
Nick Schmidt, BLDS - Responsible Data Science: Identifying and Fixing Biased ...
Nick Schmidt, BLDS - Responsible Data Science: Identifying and Fixing Biased ...Nick Schmidt, BLDS - Responsible Data Science: Identifying and Fixing Biased ...
Nick Schmidt, BLDS - Responsible Data Science: Identifying and Fixing Biased ...Sri Ambati
 
Regulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with TransparencyRegulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with TransparencyDebmalya Biswas
 
AI Developments and Trends (OECD)
AI Developments and Trends (OECD)AI Developments and Trends (OECD)
AI Developments and Trends (OECD)AnandSRao1962
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language ProcessingYunyao Li
 
AI-900 Slides.pptx
AI-900 Slides.pptxAI-900 Slides.pptx
AI-900 Slides.pptxkprasad8
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGene Leybzon
 
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...Andrew Ly
 
Human in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIHuman in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIPramit Choudhary
 

Similar to Explainable AI in Industry (FAT* 2020 Tutorial) (20)

Practical Explainable AI: How to build trustworthy, transparent and unbiased ...
Practical Explainable AI: How to build trustworthy, transparent and unbiased ...Practical Explainable AI: How to build trustworthy, transparent and unbiased ...
Practical Explainable AI: How to build trustworthy, transparent and unbiased ...
 
Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020
 
Trusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open SourceTrusted, Transparent and Fair AI using Open Source
Trusted, Transparent and Fair AI using Open Source
 
Debugging AI
Debugging AIDebugging AI
Debugging AI
 
Responsible Generative AI Design Patterns
Responsible Generative AI Design PatternsResponsible Generative AI Design Patterns
Responsible Generative AI Design Patterns
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML Systems
 
AI & AWS DeepComposer
AI & AWS DeepComposerAI & AWS DeepComposer
AI & AWS DeepComposer
 
“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...
“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...
“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...
 
Artificial Intelligence and Machine Learning in Research
Artificial Intelligence and Machine Learning in ResearchArtificial Intelligence and Machine Learning in Research
Artificial Intelligence and Machine Learning in Research
 
Nick Schmidt, BLDS - Responsible Data Science: Identifying and Fixing Biased ...
Nick Schmidt, BLDS - Responsible Data Science: Identifying and Fixing Biased ...Nick Schmidt, BLDS - Responsible Data Science: Identifying and Fixing Biased ...
Nick Schmidt, BLDS - Responsible Data Science: Identifying and Fixing Biased ...
 
Regulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with TransparencyRegulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with Transparency
 
AI Developments and Trends (OECD)
AI Developments and Trends (OECD)AI Developments and Trends (OECD)
AI Developments and Trends (OECD)
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language Processing
 
AI-900 Slides.pptx
AI-900 Slides.pptxAI-900 Slides.pptx
AI-900 Slides.pptx
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First Session
 
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
 
Debugging AI
Debugging AIDebugging AI
Debugging AI
 
AI in healthcare
AI in healthcareAI in healthcare
AI in healthcare
 
Human in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIHuman in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AI
 

More from Krishnaram Kenthapadi

Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedPrivacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsKrishnaram Kenthapadi
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...Krishnaram Kenthapadi
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...Krishnaram Kenthapadi
 
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Krishnaram Kenthapadi
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...Krishnaram Kenthapadi
 
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)Krishnaram Kenthapadi
 
Fairness, Transparency, and Privacy in AI @ LinkedIn
Fairness, Transparency, and Privacy in AI @ LinkedInFairness, Transparency, and Privacy in AI @ LinkedIn
Fairness, Transparency, and Privacy in AI @ LinkedInKrishnaram Kenthapadi
 
Privacy-preserving Analytics and Data Mining at LinkedIn
Privacy-preserving Analytics and Data Mining at LinkedInPrivacy-preserving Analytics and Data Mining at LinkedIn
Privacy-preserving Analytics and Data Mining at LinkedInKrishnaram Kenthapadi
 
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...Krishnaram Kenthapadi
 

More from Krishnaram Kenthapadi (11)

Amazon SageMaker Clarify
Amazon SageMaker ClarifyAmazon SageMaker Clarify
Amazon SageMaker Clarify
 
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedPrivacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML Systems
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD...
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW...
 
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial)
 
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...
 
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)
 
Fairness, Transparency, and Privacy in AI @ LinkedIn
Fairness, Transparency, and Privacy in AI @ LinkedInFairness, Transparency, and Privacy in AI @ LinkedIn
Fairness, Transparency, and Privacy in AI @ LinkedIn
 
Privacy-preserving Analytics and Data Mining at LinkedIn
Privacy-preserving Analytics and Data Mining at LinkedInPrivacy-preserving Analytics and Data Mining at LinkedIn
Privacy-preserving Analytics and Data Mining at LinkedIn
 
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
 

Explainable AI in Industry (FAT* 2020 Tutorial)

  • 1. Explainable AI in Industry FAT* 2020 Tutorial Krishna Gade (Fiddler Labs), Sahin Cem Geyik (LinkedIn), Krishnaram Kenthapadi (Amazon AWS AI), Varun Mithal (LinkedIn), Ankur Taly (Fiddler Labs) https://sites.google.com/view/fat20-explainable-ai-tutorial 1
  • 3. Explainable AI - From a Business Perspective 3
  • 4. What is Explainable AI? Data Black-Box AI AI product Confusion with Today’s AI Black Box ● Why did you do that? ● Why did you not do that? ● When do you succeed or fail? ● How do I correct an error? Black Box AI Decision, Recommendation Clear & Transparent Predictions ● I understand why ● I understand why not ● I know why you succeed or fail ● I understand how to correct errors Explainable AI Data Explainable AI Explainable AI Product Decision Explanation Feedback
  • 5. Internal Audit, Regulators IT & Operations Data Scientists Business Owner Can I trust our AI decisions? Are these AI system decisions fair? Customer Support How do I answer this customer complaint? How do I monitor and debug this model? Is this the best model that can be built? Black-box AI Why I am getting this decision? How can I get a better decision? Poor Decision Black-box AI creates confusion and doubt
  • 6. Black-box AI creates business risk for Industry
  • 7. Explainable AI - From a Model Perspective 7
  • 8. Why Explainability: Debug (Mis-)Predictions 8 Top label: “clog” Why did the network label this image as “clog”?
  • 9. 9 Why Explainability: Improve ML Model Credit: Samek, Binder, Tutorial on Interpretable ML, MICCAI’18
  • 10. Why Explainability: Verify the ML Model / System 10 Credit: Samek, Binder, Tutorial on Interpretable ML, MICCAI’18
  • 11. 11 Why Explainability: Learn New Insights Credit: Samek, Binder, Tutorial on Interpretable ML, MICCAI’18
  • 12. 12 Why Explainability: Learn Insights in the Sciences Credit: Samek, Binder, Tutorial on Interpretable ML, MICCAI’18
  • 13. Explainable AI - From a Regulatory Perspective 13
  • 14. Immigration Reform and Control Act Citizenship Rehabilitation Act of 1973; Americans with Disabilities Act of 1990 Disability status Civil Rights Act of 1964 Race Age Discrimination in Employment Act of 1967 Age Equal Pay Act of 1963; Civil Rights Act of 1964 Sex And more... Why Explainability: Laws against Discrimination 14
  • 18. Why Explainability: Growing Global AI Regulation ● GDPR: Article 22 empowers individuals with the right to demand an explanation of how an automated system made a decision that affects them. ● Algorithmic Accountability Act 2019: Requires companies to provide an assessment of the risks posed by the automated decision system to the privacy or security and the risks that contribute to inaccurate, unfair, biased, or discriminatory decisions impacting consumers ● California Consumer Privacy Act: Requires companies to rethink their approach to capturing, storing, and sharing personal data to align with the new requirements by January 1, 2020. ● Washington Bill 1655: Establishes guidelines for the use of automated decision systems to protect consumers, improve transparency, and create more market predictability. ● Massachusetts Bill H.2701: Establishes a commission on automated decision-making, transparency, fairness, and individual rights. ● Illinois House Bill 3415: States predictive data analytics determining creditworthiness or hiring decisions may not include information that correlates with the applicant race or zip code.
  • 19. 19 SR 11-7 and OCC regulations for Financial Institutions
  • 20. Model Diagnostics Root Cause Analytics Performance monitoring Fairness monitoring Model Comparison Cohort Analysis Explainable Decisions API Support Model Launch Signoff Model Release Mgmt Model Evaluation Compliance Testing Model Debugging Model Visualization Explainable AI Train QA Predict Deploy A/B Test Monitor Debug Feedback Loop “Explainability by Design” for AI products
  • 21. AI @ Scale - Challenges for Explainable AI 21
  • 22. LinkedIn operates the largest professional network on the Internet 645M+ members 30M+ companies are represented on LinkedIn 90K+ schools listed (high school & college) 35K+ skills listed 20M+ open jobs on LinkedIn Jobs 280B Feed updates
  • 23. 23 © 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | The AWS ML Stack Broadest and most complete set of Machine Learning capabilities VISION SPEECH TEXT SEARCH NEW CHATBOTS PERSONALIZATION FORECASTING FRAUD NEW DEVELOPMENT NEW CONTACT CENTERS NEW Amazon SageMaker Ground Truth Augmented AI SageMaker Neo Built-in algorithms SageMaker Notebooks NEW SageMaker Experiments NEW Model tuning SageMaker Debugger NEW SageMaker Autopilot NEW Model hosting SageMaker Model Monitor NEW Deep Learning AMIs & Containers GPUs & CPUs Elastic Inference Inferentia FPGA Amazon Rekognition Amazon Polly Amazon Transcribe +Medical Amazon Comprehend +Medical Amazon Translate Amazon Lex Amazon Personalize Amazon Forecast Amazon Fraud Detector Amazon CodeGuru AI SERVICES ML SERVICES ML FRAMEWORKS & INFRASTRUCTURE Amazon Textract Amazon Kendra Contact Lens For Amazon Connect SageMaker Studio IDE NEW NEW NEW NEW NEW
  • 24. Achieving Explainable AI Approach 1: Post-hoc explain a given AI model ● Individual prediction explanations in terms of input features, influential examples, concepts, local decision rules ● Global prediction explanations of the entire model in terms of partial dependence plots, global feature importance, global decision rules Approach 2: Build an interpretable model ● Logistic regression, Decision trees, Decision lists and sets, Generalized Additive Models (GAMs) 24
  • 26. Case Studies on Explainable AI 26
  • 27. Case Study: Explainability for Debugging ML models Ankur Taly** (Fiddler labs) (Joint work with Mukund Sundararajan, Kedar Dhamdhere, Pramod Mudrakarta) 27 **This research was carried out at Google Research
  • 28. Achieving Explainable AI Approach 1: Post-hoc explain a given AI model ● Individual prediction explanations in terms of input features, influential examples, concepts, local decision rules ● Global prediction explanations of the entire model in terms of partial dependence plots, global feature importance, global decision rules Approach 2: Build an interpretable model ● Logistic regression, Decision trees, Decision lists and sets, Generalized Additive Models (GAMs) 28
  • 29. Top label: “fireboat” Why did the model label this image as “fireboat”? 29
  • 30. Top label: “clog” Why did the model label this image as “clog”?
  • 31. Attribute a model’s prediction on an input to features of the input Examples: ● Attribute an object recognition network’s prediction to its pixels ● Attribute a text sentiment network’s prediction to individual words ● Attribute a lending model’s prediction to its features A reductive formulation of “why this prediction” but surprisingly useful :-) The Attribution Problem
  • 32. Feature*Gradient Attribution to a feature is feature value times gradient, i.e., xi* 𝜕y/𝜕xi ● Gradient captures sensitivity of output w.r.t. feature ● Equivalent to Feature*Coefficient for linear models ○ First-order Taylor approximation of non-linear models ● Popularized by SaliencyMaps [NIPS 2013], Baehrens et al. [JMLR 2010] 32
  • 33. Feature*Gradient Attribution to a feature is feature value times gradient, i.e., xi* 𝜕y/𝜕xi ● Gradient captures sensitivity of output w.r.t. feature ● Equivalent to Feature*Coefficient for linear models ○ First-order Taylor approximation of non-linear models ● Popularized by SaliencyMaps [NIPS 2013], Baehrens et al. [JMLR 2010] 33 Gradients in the vicinity of the input seem like noise
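For concreteness, a minimal sketch of this gradient×input attribution in TensorFlow follows; the two-feature toy model and the input values are illustrative assumptions, not part of the tutorial.

    import tensorflow as tf

    # Toy differentiable model standing in for a real network (hypothetical).
    def model(x):
        return tf.nn.sigmoid(3.0 * x[:, 0] - 2.0 * x[:, 1])

    x = tf.constant([[0.8, 0.3]])      # the input being explained
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = model(x)
    grad = tape.gradient(y, x)         # sensitivity of the output w.r.t. each feature
    attribution = x * grad             # feature value times gradient, as on the slide
    print(attribution.numpy())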
  • 34. [Figure: model score vs. input intensity along the scaling path from the baseline (intensity 0.0) to the input (intensity 1.0). Gradients are interesting near the baseline and become uninteresting (saturated) near the input; the row of scaled inputs and their gradients illustrates this.]
  • 35. IG(input, base) ::= (input − base) * ∫₀¹ ∇F(α·input + (1−α)·base) dα Original image Integrated Gradients Integrated Gradients [ICML 2017] Integrate the gradients along a straight-line path from baseline to input
  • 36. ● Ideally, the baseline is an informationless input for the model ○ E.g., Black image for image models ○ E.g., Empty text or zero embedding vector for text models ● Integrated Gradients explains F(input) - F(baseline) in terms of input features Aside: Baselines (or Norms) are essential to explanations [Kahneman-Miller 86] E.g., A man suffers from indigestion. Doctor blames it on a stomach ulcer. Wife blames it on eating turnips. ● Both are correct relative to their baselines. What is a baseline?
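A minimal numerical sketch of the integral above, approximated by averaging gradients at points interpolated between the baseline and the input; the step count, the reuse of the toy model from the earlier sketch, and the all-zero baseline are assumptions for illustration.

    import tensorflow as tf

    def integrated_gradients(model, x, baseline, steps=50):
        # Points along the straight-line path from the baseline to the input.
        alphas = tf.reshape(tf.linspace(0.0, 1.0, steps + 1), (-1, 1))
        interpolated = baseline + alphas * (x - baseline)
        with tf.GradientTape() as tape:
            tape.watch(interpolated)
            preds = model(interpolated)
        grads = tape.gradient(preds, interpolated)   # gradients of the scaled inputs
        avg_grads = tf.reduce_mean(grads, axis=0)    # Riemann approximation of the path integral
        return (x - baseline) * avg_grads            # explains F(input) - F(baseline)

    # Usage with the toy model above; an all-zero baseline plays the role of the
    # "informationless" input: x = tf.constant([[0.8, 0.3]]); base = tf.zeros_like(x)
    # print(integrated_gradients(model, x, base).numpy())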
  • 37. Original image “Clog” Why is this image labeled as “clog”?
  • 38. Original image Integrated Gradients (for label “clog”) “Clog” Why is this image labeled as “clog”?
  • 39. ● Deep network [Kearns, 2016] predicts if a molecule binds to a certain DNA site ● Finding: Some atoms had identical attributions despite different connectivity Detecting an architecture bug
  • 40. ● There was an implementation bug due to which the convolved bond features did not affect the prediction! ● Deep network [Kearns, 2016] predicts if a molecule binds to a certain DNA site ● Finding: Some atoms had identical attributions despite different connectivity Detecting an architecture bug
  • 41. ● Deep network predicts various diseases from chest x-rays Original image Integrated gradients (for top label) Detecting a data issue
  • 42. ● Deep network predicts various diseases from chest x-rays ● Finding: Attributions fell on radiologist’s markings (rather than the pathology) Original image Integrated gradients (for top label) Detecting a data issue
  • 43. Tabular QA Visual QA Reading Comprehension Peyton Manning became the first quarterback ever to lead two different teams to multiple Super Bowls. He is also the oldest quarterback ever to play in a Super Bowl at age 39. The past record was held by John Elway, who led the Broncos to victory in Super Bowl XXXIII at age 38 and is currently Denver’s Executive Vice President of Football Operations and General Manager Kazemi and Elqursh (2017) model. 61.1% on VQA 1.0 dataset (state of the art = 66.7%) Neural Programmer (2017) model 33.5% accuracy on WikiTableQuestions Yu et al (2018) model. 84.6 F-1 score on SQuAD (state of the art) 43 Q: How many medals did India win? A: 197 Q: How symmetrical are the white bricks on either side of the building? A: very Q: Name of the quarterback who was 38 in Super Bowl XXXIII? A: John Elway Three Question Answering Tasks
  • 44. Tabular QA Visual QA Reading Comprehension Peyton Manning became the first quarterback ever to lead two different teams to multiple Super Bowls. He is also the oldest quarterback ever to play in a Super Bowl at age 39. The past record was held by John Elway, who led the Broncos to victory in Super Bowl XXXIII at age 38 and is currently Denver’s Executive Vice President of Football Operations and General Manager Kazemi and Elqursh (2017) model. 61.1% on VQA 1.0 dataset (state of the art = 66.7%) Neural Programmer (2017) model 33.5% accuracy on WikiTableQuestions Yu et al (2018) model. 84.6 F-1 score on SQuAD (state of the art) 44 Q: How many medals did India win? A: 197 Q: How symmetrical are the white bricks on either side of the building? A: very Q: Name of the quarterback who was 38 in Super Bowl XXXIII? A: John Elway Three Question Answering Tasks Robustness question: Do these networks read the question carefully? :-)
  • 45. Visual Question Answering Q: How symmetrical are the white bricks on either side of the building? A: very 45 Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention) Accuracy: 61.1% (state of the art: 66.7%)
  • 46. Visual Question Answering Q: How symmetrical are the white bricks on either side of the building? A: very How do we probe the model’s understanding of the question? 46 Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention) Accuracy: 61.1% (state of the art: 66.7%)
  • 47. Visual Question Answering Q: How symmetrical are the white bricks on either side of the building? A: very How do we probe the model’s understanding of the question? Enter: Attributions ● Baseline: Empty question, but full image ● By design, attribution will not fall on the image 47 Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention) Accuracy: 61.1% (state of the art: 66.7%)
  • 48. Attributions Q: How symmetrical are the white bricks on either side of the building? A: very How symmetrical are the white bricks on either side of the building? red: high attribution blue: negative attribution gray: near-zero attribution 48 Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention) Accuracy: 61.1% (state of the art: 66.7%)
  • 49. Attacking the Model Q: How symmetrical are the white bricks on either side of the building? A: very 49 Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention) Accuracy: 61.1% (state of the art: 66.7%)
  • 50. Attacking the Model Q: How symmetrical are the white bricks on either side of the building? A: very Q: How asymmetrical are the white bricks on either side of the building? A: very 50 Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention) Accuracy: 61.1% (state of the art: 66.7%)
  • 51. Attacking the Model Q: How symmetrical are the white bricks on either side of the building? A: very Q: How asymmetrical are the white bricks on either side of the building? A: very Q: How big are the white bricks on either side of the building? A: very 51 Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention) Accuracy: 61.1% (state of the art: 66.7%)
  • 52. Attacking the Model Q: How symmetrical are the white bricks on either side of the building? A: very Q: How asymmetrical are the white bricks on either side of the building? A: very Q: How big are the white bricks on either side of the building? A: very Q: How fast are the white bricks speaking on either side of the building? A: very 52 Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention) Accuracy: 61.1% (state of the art: 66.7%)
  • 53. Attacking the Model Q: How symmetrical are the white bricks on either side of the building? A: very Q: How asymmetrical are the white bricks on either side of the building? A: very Q: How big are the white bricks on either side of the building? A: very Q: How fast are the white bricks speaking on either side of the building? A: very 53 Attributions help assess model robustness and uncover blind spots! Kazemi and Elqursh (2017) model (ResNet + multi-layer LSTM + Attention) Accuracy: 61.1% (state of the art: 66.7%)
  • 54. Low-attribution nouns 'tweet', 'childhood', 'copyrights', 'mornings', 'disorder', 'importance', 'critter', 'jumper', 'fits' What is the man doing? → What is the tweet doing? How many children are there? → How many tweet are there? VQA model’s response remains the same 75.6% of the time on questions that it originally answered correctly 54 Replace the subject of a question with a low-attribution noun from the vocabulary ● This ought to change the answer but often does not! Attack: Subject Ablation
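A short sketch of how such a subject-ablation check could be scripted; vqa_model and the noun list below are hypothetical placeholders, not the code used in the paper.

    # Replace the question's subject with a low-attribution noun and check the answer.
    LOW_ATTRIBUTION_NOUNS = ["tweet", "childhood", "copyrights", "mornings"]

    def subject_ablation_flips_answer(vqa_model, image, question, subject):
        """vqa_model(image, question) -> answer string (hypothetical interface)."""
        original_answer = vqa_model(image, question)
        for noun in LOW_ATTRIBUTION_NOUNS:
            ablated = question.replace(subject, noun)   # e.g., "the man" -> "the tweet"
            if vqa_model(image, ablated) != original_answer:
                return True    # the model is sensitive to the subject, as it should be
        return False           # the answer never changed: a potential blind spot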
  • 55. ● Visual QA ○ Prefix concatenation attack (accuracy drop: 61.1% to 19%) ○ Stop word deletion attack (accuracy drop: 61.1% to 52%) ● Tabular QA ○ Prefix concatenation attack (accuracy drop: 33.5% to 11.4%) ○ Stop word deletion attack (accuracy drop: 33.5% to 28.5%) ○ Table row reordering attack (accuracy drop: 33.5 to 23%) ● Paragraph QA ○ Improved paragraph concatenation attacks of Jia and Liang from [EMNLP 2017] Paper: Did the model understand the question? [ACL 2018] Many Other Attacks
  • 56. Case Study: Talent Platform “Diversity Insights and Fairness-Aware Ranking” Sahin Cem Geyik, Krishnaram Kenthapadi 56
  • 58. Insights to Identify Diverse Talent Pools Representative Talent Search Results Diversity Learning Curriculum “Diversity by Design” in LinkedIn’s Talent Solutions
  • 62. Inclusive Job Descriptions / Recruiter Outreach
  • 63. Representative Ranking for Talent Search S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search, KDD’19. [Microsoft’s AI/ML conference (MLADS’18). Distinguished Contribution Award] Building Representative Talent Search at LinkedIn (LinkedIn engineering blog)
  • 64. Intuition for Measuring and Achieving Representativeness Ideal: Top ranked results should follow a desired distribution on gender/age/… E.g., same distribution as the underlying talent pool Inspired by “Equal Opportunity” definition [Hardt et al, NIPS’16] Defined measures (skew, divergence) based on this intuition
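One way to turn this intuition into a number is a skew measure along the lines of the KDD'19 paper cited on the previous slide: the log-ratio of an attribute value's share in the top-k results to its desired share. A simplified sketch (the smoothing constant is ours):

    import math

    def skew_at_k(ranked_attribute_values, desired_proportions, value, k):
        """0 means the top-k is representative for `value`; >0 over-represented; <0 under-represented."""
        top_k = ranked_attribute_values[:k]
        observed = top_k.count(value) / k
        desired = desired_proportions[value]
        eps = 1e-6                                   # avoid log(0); a simplification
        return math.log((observed + eps) / (desired + eps))

    # Example: top-10 results against a qualified pool that is 44% "female".
    # skew_at_k(["male"] * 7 + ["female"] * 3, {"female": 0.44, "male": 0.56}, "female", 10)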
  • 65. Desired Proportions within the Attribute of Interest Compute the proportions of the values of the attribute (e.g., gender, gender-age combination) amongst the set of qualified candidates ● “Qualified candidates” = Set of candidates that match the search query criteria ● Retrieved by LinkedIn’s Galene search engine Desired proportions could also be obtained based on legal mandate / voluntary commitment
  • 66. Fairness-aware Reranking Algorithm (Simplified) Partition the set of potential candidates into different buckets for each attribute value Rank the candidates in each bucket according to the scores assigned by the machine-learned model Merge the ranked lists, balancing the representation requirements and the selection of highest scored candidates Representation requirement: Desired distribution on gender/age/… Algorithmic variants based on how we achieve this balance
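An illustrative sketch of such a merge step, as a simplified variant rather than LinkedIn's production algorithm: rank within each attribute bucket by model score, then at each position pick from the bucket that is furthest below its desired share.

    from collections import defaultdict

    def rerank_with_representation(candidates, desired_proportions, k):
        """candidates: list of (score, attribute_value); desired_proportions: value -> fraction."""
        buckets = defaultdict(list)
        for score, attr in sorted(candidates, reverse=True):   # per-bucket ranking by model score
            buckets[attr].append((score, attr))

        reranked, counts = [], defaultdict(int)
        for position in range(1, k + 1):
            nonempty = [a for a in buckets if buckets[a]]
            if not nonempty:
                break
            # Pick the bucket whose representation lags its target the most; break ties by score.
            def deficit(attr):
                return desired_proportions.get(attr, 0.0) * position - counts[attr]
            chosen = max(nonempty, key=lambda a: (deficit(a), buckets[a][0][0]))
            reranked.append(buckets[chosen].pop(0))
            counts[chosen] += 1
        return reranked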
  • 67. Validating Our Approach Gender Representativeness ● Over 95% of all searches are representative compared to the qualified population of the search Business Metrics ● A/B test over LinkedIn Recruiter users for two weeks ● No significant change in business metrics (e.g., # InMails sent or accepted) Ramped to 100% of LinkedIn Recruiter users worldwide
  • 68. Lessons learned • Post-processing approach desirable • Model agnostic • Scalable across different model choices for our application • Acts as a “fail-safe” • Robust to application-specific business logic • Easier to incorporate as part of existing systems • Build a stand-alone service or component for post-processing • No significant modifications to the existing components • Complementary to efforts to reduce bias from training data & during model training • Collaboration/consensus across key stakeholders
  • 69. Engineering for Fairness in AI Lifecycle Problem Formation Dataset Construction Algorithm Selection Training Process Testing Process Deployment Feedback Is an algorithm an ethical solution to our problem? Does our data include enough minority samples? Are there missing/biased features? Do we need to apply debiasing algorithms to preprocess our data? Do we need to include fairness constraints in the function? Have we evaluated the model using relevant fairness metrics? Are we deploying our model on a population that we did not train/test on? Are there unequal effects across users? Does the model encourage feedback loops that can produce increasingly unfair outcomes? Credit: K. Browne & J. Draper
  • 70. Acknowledgements LinkedIn Talent Solutions Diversity team, Hire & Careers AI team, Anti-abuse AI team, Data Science Applied Research team Special thanks to Deepak Agarwal, Parvez Ahammad, Stuart Ambler, Kinjal Basu, Jenelle Bray, Erik Buchanan, Bee-Chung Chen, Patrick Cheung, Gil Cottle, Cyrus DiCiccio, Patrick Driscoll, Carlos Faham, Nadia Fawaz, Priyanka Gariba, Meg Garlinghouse, Gurwinder Gulati, Rob Hallman, Sara Harrington, Joshua Hartman, Daniel Hewlett, Nicolas Kim, Rachel Kumar, Nicole Li, Heloise Logan, Stephen Lynch, Divyakumar Menghani, Varun Mithal, Arashpreet Singh Mor, Tanvi Motwani, Preetam Nandy, Lei Ni, Nitin Panjwani, Igor Perisic, Hema Raghavan, Romer Rosales, Guillaume Saint-Jacques, Badrul Sarwar, Amir Sepehri, Arun Swami, Ram Swaminathan, Grace Tang, Ketan Thakkar, Sriram Vasudevan, Janardhanan Vembunarayanan, James Verbus, Xin Wang, Hinkmond Wong, Ya Xu, Lin Yang, Yang Yang, Chenhui Zhai, Liang Zhang, Yani Zhang
  • 71. Case Study: Talent Search Varun Mithal, Girish Kathalagiri, Sahin Cem Geyik 71
  • 72. LinkedIn Recruiter ● Recruiter Searches for Candidates ○ Standardized and free-text search criteria ● Retrieval and Ranking ○ Filter candidates using the criteria ○ Rank candidates in multiple levels using ML models 72
  • 73. Modeling Approaches ● Pairwise XGBoost ● GLMix ● DNNs via TensorFlow ● Optimization Criteria: inMail Accepts ○ Positive: inMail sent by recruiter, and positively responded by candidate ■ Mutual interest between the recruiter and the candidate 73
  • 74. Feature Importance in XGBoost 74
  • 75. How We Utilize Feature Importances for GBDT ● Understanding feature digressions ○ Which feature that was impactful is no longer impactful? ○ Should we debug feature generation? ● Introducing new features in bulk and identifying effective ones ○ An activity feature for last 3 hours, 6 hours, 12 hours, 24 hours introduced (costly to compute) ○ Should we keep all such features? ● Separating the factors that caused an improvement ○ Did an improvement come from a new feature, a new labeling strategy, or a new data source? ○ Did the ordering between features change? ● Shortcoming: A global view, not case by case 75
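For GBDT models these global importances can be read off the trained booster directly; a small sketch using the public XGBoost API on synthetic data (the feature names are invented):

    import numpy as np
    import xgboost as xgb

    # Synthetic data: the label depends only on the first and third features.
    X = np.random.rand(500, 4)
    y = (X[:, 0] + 0.5 * X[:, 2] > 0.9).astype(int)
    dtrain = xgb.DMatrix(X, label=y,
                         feature_names=["query_match", "activity_3h", "affinity", "noise"])
    booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=20)

    # Global gain-based importances; comparing them across model versions is one way
    # to spot the feature digressions mentioned above.
    print(booster.get_score(importance_type="gain"))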
  • 76. GLMix Models ● Generalized Linear Mixed Models ○ Global: Linear Model ○ Per-contract: Linear Model ○ Per-recruiter: Linear Model ● Lots of parameters overall ○ For a specific recruiter or contract the weights can be summed up ● Inherently explainable ○ Contribution of a feature is “weight x feature value” ○ Can be examined in a case-by-case manner as well 76
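A toy sketch of that "weight x feature value" accounting, with the global, per-contract, and per-recruiter weights summed per feature; all names and numbers below are invented:

    # Effective weight for a given (recruiter, contract) = global + per-contract + per-recruiter.
    global_w    = {"title_match": 1.2, "skill_overlap": 0.8}
    contract_w  = {"title_match": 0.3}       # correction learned for this contract
    recruiter_w = {"skill_overlap": -0.2}    # correction learned for this recruiter

    def contributions(features):
        out = {}
        for name, value in features.items():
            weight = global_w.get(name, 0.0) + contract_w.get(name, 0.0) + recruiter_w.get(name, 0.0)
            out[name] = weight * value       # contribution = weight x feature value
        return out

    print(contributions({"title_match": 1.0, "skill_overlap": 0.6}))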
  • 77. TensorFlow Models in Recruiter and Explaining Them ● We utilize the Integrated Gradients [ICML 2017] method ● How do we determine the baseline example? ○ Every query creates its own feature values for the same candidate ○ Query match features, time-based features ○ Recruiter affinity, and candidate affinity features ○ A candidate would be scored differently by each query ○ Cannot recommend a “Software Engineer” to a search for a “Forensic Chemist” ○ There is no globally neutral example for comparison! 77
  • 78. Query-Specific Baseline Selection ● For each query: ○ Score examples by the TF model ○ Rank examples ○ Choose one example as the baseline ○ Compare others to the baseline example ● How to choose the baseline example ○ Last candidate ○ Kth percentile in ranking ○ A random candidate ○ Request by user (answering a question like: “Why was I presented candidate x above candidate y?”) 78
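A sketch of the percentile-based variant of this per-query baseline choice; tf_model_score stands in for the production scoring model, and the integrated-gradients call is left as a placeholder:

    import numpy as np

    def pick_query_specific_baseline(tf_model_score, candidate_features, percentile=5):
        """Score all candidates retrieved for this query and use a low-percentile one as baseline."""
        scores = np.array([tf_model_score(f) for f in candidate_features])
        order = np.argsort(scores)                                  # ascending by score
        baseline = candidate_features[order[int(len(order) * percentile / 100)]]
        # Each other candidate would then be explained relative to this baseline, e.g.:
        # attributions = integrated_gradients(tf_model_score, candidate, baseline)
        return baseline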
  • 80. Example - Detailed 80 Feature Description Difference (1 vs 2) Contribution Feature………. Description………. -2.0476928 -2.144455602 Feature………. Description………. -2.3223877 1.903594618 Feature………. Description………. 0.11666667 0.2114946752 Feature………. Description………. -2.1442587 0.2060414469 Feature………. Description………. -14 0.1215354111 Feature………. Description………. 1 0.1000282466 Feature………. Description………. -92 -0.085286277 Feature………. Description………. 0.9333333 0.0568533262 Feature………. Description………. -1 -0.051796317 Feature………. Description………. -1 -0.050895940
  • 81. Pros & Cons ● Explains potentially very complex models ● Case-by-case analysis ○ Why do you think candidate x is a better match for my position? ○ Why do you think I am a better fit for this job? ○ Why am I being shown this ad? ○ Great for debugging real-time problems in production ● Global view is missing ○ Aggregate Contributions can be computed ○ Could be costly to compute 81
  • 82. Lessons Learned and Next Steps ● Global explanations vs. Case-by-case Explanations ○ Global gives an overview, better for making modeling decisions ○ Case-by-case could be more useful for the non-technical user, better for debugging ● Integrated gradients worked well for us ○ Complex models make it harder for developers to map improvement to effort ○ Use-case gave intuitive results, on top of completely describing score differences ● Next steps ○ Global explanations for Deep Models 82
  • 83. Case Study: Model Interpretation for Predictive Models in B2B Sales Predictions Jilei Yang, Wei Di, Songtao Guo 83
  • 84. Problem Setting ● Predictive models in B2B sales prediction ○ E.g.: random forest, gradient boosting, deep neural network, … ○ High accuracy, low interpretability ● Global feature importance → Individual feature reasoning 84
  • 86. Revisiting LIME ● Given a target sample x_k, approximate its prediction pred(x_k) by building a sample-specific linear model: pred(X) ≈ β_k1·X_1 + β_k2·X_2 + …, X ∈ neighbor(x_k) ● E.g., for company CompanyX: 0.76 ≈ 1.82 ∗ 0.17 + 1.61 ∗ 0.11 + … 86
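A bare-bones sketch of fitting such a sample-specific linear surrogate; this shows the general local-surrogate idea rather than the LIME library itself (real LIME also weights neighbors by proximity and selects features):

    import numpy as np
    from sklearn.linear_model import Ridge

    def local_linear_explanation(predict_fn, x_k, n_samples=1000, scale=1.0, seed=0):
        """Fit pred(X) ~ sum_j beta_kj * X_j on perturbed neighbors of the target sample x_k."""
        rng = np.random.default_rng(seed)
        X = x_k + rng.normal(0.0, scale, size=(n_samples, x_k.shape[0]))   # neighbor(x_k)
        y = predict_fn(X)
        return Ridge(alpha=1.0).fit(X, y).coef_                            # the beta_kj's

    # Example with a hypothetical black-box scoring function:
    # predict_fn = lambda X: 1.0 / (1.0 + np.exp(-(1.82 * X[:, 0] + 1.61 * X[:, 1])))
    # print(local_linear_explanation(predict_fn, np.array([0.17, 0.11])))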
  • 88. Piecewise Linear Regression Motivation: Separate top positive feature influencers and top negative feature influencers 88
  • 89. Impact of Piecewise Approach ● Target sample x_k = (x_k1, x_k2, ⋯) ● Top feature contributor ○ LIME: large magnitude of β_kj · x_kj ○ xLIME: large magnitude of β_kj− · x_kj ● Top positive feature influencer ○ LIME: large magnitude of β_kj ○ xLIME: large magnitude of negative β_kj− or positive β_kj+ ● Top negative feature influencer ○ LIME: large magnitude of β_kj ○ xLIME: large magnitude of positive β_kj− or negative β_kj+ 89
  • 90. Localized Stratified Sampling: Idea Method: Sampling based on empirical distribution around target value at each feature level 90
  • 91. Localized Stratified Sampling: Method ● Sampling based on empirical distribution around target value for each feature ● For target sample x_k = (x_k1, x_k2, ⋯), sample values of feature j according to p_j(X_j) · N(x_kj, (α · s_j)²) ○ p_j(X_j): empirical distribution. ○ x_kj: feature value in target sample. ○ s_j: standard deviation. ○ α: interpretable range: tradeoff between interpretable coverage and local accuracy. ● In LIME, sampling according to N(x_j, s_j²). 91
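A sketch of this sampling scheme for a single feature: resample observed values (the empirical distribution) with Gaussian weights centered at the target sample's value. The kernel-reweighting implementation is our reading of the formula above.

    import numpy as np

    def localized_stratified_sample(feature_column, x_kj, alpha=0.5, n_samples=1000, seed=0):
        """Sample feature values according to (empirical distribution) x N(x_kj, (alpha*s_j)^2)."""
        rng = np.random.default_rng(seed)
        s_j = feature_column.std()
        kernel = np.exp(-0.5 * ((feature_column - x_kj) / (alpha * s_j)) ** 2)
        probs = kernel / kernel.sum()
        return rng.choice(feature_column, size=n_samples, p=probs)

    # Smaller alpha keeps samples closer to x_kj (more local accuracy, less coverage).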
  • 93. LTS LCP (LinkedIn Career Page) Upsell ● A subset of churn data ○ Total Companies: ~ 19K ○ Company features: 117 ● Problem: Estimate whether there will be upsell given a set of features about the company’s utility from the product 93
  • 95. 95
  • 97. Key Takeaways ● Looking at the explanation as contributor vs. influencer features is useful ○ Contributor: Which features drive the current outcome, case by case ○ Influencer: What needs to change to improve the likelihood, case by case ● xLIME aims to improve on LIME via: ○ Piecewise linear regression: More accurately describes the local point, helps with finding correct influencers ○ Localized stratified sampling: More realistic set of local points ● Better captures the important features 97
  • 98. Case Study: Integrated Gradients for Explaining Diabetic Retinopathy Predictions Ankur Taly** (Fiddler labs) (Joint work with Mukund Sundararajan, Kedar Dhamdhere, Pramod Mudrakarta) 98 **This research was carried out at Google Research
  • 99. Prediction: “proliferative” DR¹ ● Proliferative implies vision-threatening Can we provide an explanation to the doctor with supporting evidence for “proliferative” DR? Retinal Fundus Image ¹Diabetic Retinopathy (DR) is a diabetes complication that affects the eye. Deep networks can predict DR grade from retinal fundus images with high accuracy (AUC ~0.97) [JAMA, 2016].
  • 100. Retinal Fundus Image Integrated Gradients for label: “proliferative” Visualization: Overlay heatmap on green channel
  • 101. Retinal Fundus Image Integrated Gradients for label: “proliferative” Visualization: Overlay heatmap on green channel Lesions Neovascularization ● Hard to see on original image ● Known to be vision-threatening
  • 102. Retinal Fundus Image Integrated Gradients for label: “proliferative” Visualization: Overlay heatmap on green channel Lesions Neovascularization ● Hard to see on original image ● Known to be vision-threatening Do attributions help doctors better diagnose diabetic retinopathy?
  • 103. Assisted-read study 9 doctors grade 2000 images under three different conditions A. Image only B. Image + Model’s prediction scores C. Image + Model’s prediction scores + Explanation (Integrated Gradients)
  • 104. 9 doctors grade 2000 images under three different conditions A. Image only B. Image + Model’s prediction scores C. Image + Model’s prediction scores + Explanation (Integrated Gradients) Findings: ● Model’s predictions (B) significantly improve accuracy vs. image only (A) (p < 0.001) ● Both forms of assistance (B and C) improved sensitivity without hurting specificity ● Explanations (C) improved accuracy of cases with DR (p < 0.001) but hurt accuracy of cases without DR (p = 0.006) ● Both B and C increase doctor ↔ model agreement Paper: Using a deep learning algorithm and integrated gradients explanation to assist grading for diabetic retinopathy --- Journal of Ophthalmology [2018] Assisted-read study
  • 105. Case Study: Fiddler’s Explainable AI Engine Ankur Taly (Fiddler labs) 105
  • 106. All your data Any data warehouse Custom Models Fiddler Modeling Layer Explainable AI for everyone APIs, Dashboards, Reports, Trusted Insights Fiddler’s Explainable AI Engine Mission: Unlock Trust, Visibility and Insights by making AI Explainable in every enterprise
  • 107. Credit Line Increase Fair lending laws [ECOA, FCRA] require credit decisions to be explainable Bank Credit Lending Model Why? Why not? How? ? Request Denied Query AI System Credit Lending Score = 0.3 Example: Credit Lending in a black-box ML world
  • 108. How Can This Help… Customer Support Why was a customer loan rejected? Bias & Fairness How is my model doing across demographics? Lending LOB What variables should they validate with customers on “borderline” decisions? Explain individual predictions (using Shapley Values)
  • 109. How Can This Help… Customer Support Why was a customer loan rejected? Bias & Fairness How is my model doing across demographics? Lending LOB What variables should they validate with customers on “borderline” decisions? Explain individual predictions (using Shapley Values)
  • 110. How Can This Help… Customer Support Why was a customer loan rejected? Bias & Fairness How is my model doing across demographics? Lending LOB What variables should they validate with customers on “borderline” decisions? Explain individual predictions (using Shapley Values) Probe the model on counterfactuals
  • 111. How Can This Help… Customer Support Why was a customer loan rejected? Why was the credit card limit low? Why was this transaction marked as fraud? Integrating explanations
  • 112. Classic result in game theory on distributing gain in a coalition game ● Coalition Games ○ Players collaborating to generate some gain (think: revenue) ○ Set function v(S) determining the gain for any subset S of players ● Shapley Values are a fair way to attribute the gain to players based on their contributions ○ Shapley value of a player is a specific weighted aggregation of its marginal contribution to all possible subsets of other players ○ Axiomatic guarantee: Unique method under four very simple axioms Shapley Value [Annals of Mathematical studies,1953]
  • 113. ● Define a coalition game for each model input X ○ Players are the features in the input ○ Gain is the model prediction (output), i.e., gain = F(X) ● Feature attributions are the Shapley values of this game Shapley Values for Explaining ML models
  • 114. ● Define a coalition game for each model input X ○ Players are the features in the input ○ Gain is the model prediction (output), i.e., gain = F(X) ● Feature attributions are the Shapley values of this game Challenge: What does it mean for only some players (features) to be present? Shapley Values for Explaining ML models
  • 115. ● Define a coalition game for each model input X ○ Players are the features in the input ○ Gain is the model prediction (output), i.e., gain = F(X) ● Feature attributions are the Shapley values of this game Challenge: What does it mean for only some players (features) to be present? Approach: Randomly sample the (absent) feature from a certain distribution ● Different approaches choose different distributions and yield significantly different attributions! Preprint: The Explanation Game: Explaining Machine Learning Models with Cooperative Game Theory Shapley Values for Explaining ML models
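A minimal Monte-Carlo sketch of these Shapley-value attributions, in which "absent" features are filled in from a sampled background row (one concrete choice among the distributions mentioned above):

    import numpy as np

    def sampled_shapley(model_fn, x, background, n_permutations=200, seed=0):
        """model_fn takes a 2-D array and returns scores; x is one input row; background is a 2-D array."""
        rng = np.random.default_rng(seed)
        phi = np.zeros(x.shape[0])
        for _ in range(n_permutations):
            order = rng.permutation(x.shape[0])
            current = background[rng.integers(len(background))].copy()   # all features "absent"
            prev = model_fn(current[None, :])[0]
            for j in order:
                current[j] = x[j]                                        # feature j joins the coalition
                new = model_fn(current[None, :])[0]
                phi[j] += new - prev                                     # marginal contribution
                prev = new
        return phi / n_permutations    # sums roughly to F(x) minus the average background score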
  • 116. How Can This Help… Global Explanations What are the primary feature drivers of the dataset on my model? Region Explanations How does my model perform on a certain slice? Where does the model not perform well? Is my model uniformly fair across slices? Slice & Explain
  • 117. Model Monitoring: Feature Drift Investigate Data Drift Impacting Model Performance Time slice Feature distribution for time slice relative to training distribution
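One common way to quantify such drift of a feature's distribution in a time slice relative to the training distribution is the Population Stability Index; the slide does not prescribe a metric, so the choice here is illustrative.

    import numpy as np

    def population_stability_index(train_values, live_values, bins=10):
        """Bin the training distribution by quantiles, then compare the live slice's bin frequencies."""
        edges = np.quantile(train_values, np.linspace(0.0, 1.0, bins + 1))
        train_frac = np.histogram(np.clip(train_values, edges[0], edges[-1]), bins=edges)[0] / len(train_values)
        live_frac = np.histogram(np.clip(live_values, edges[0], edges[-1]), bins=edges)[0] / len(live_values)
        eps = 1e-6
        return float(np.sum((live_frac - train_frac) * np.log((live_frac + eps) / (train_frac + eps))))

    # Rule of thumb: PSI < 0.1 little drift, 0.1 to 0.25 moderate, above 0.25 investigate.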
  • 118. How Can This Help… Operations Why are there outliers in model predictions? What caused model performance to go awry? Data Science How can I improve my ML model? Where does it not do well? Model Monitoring: Outliers with Explanations Outlier Individual Explanations
  • 119. Model Diagnostics Root Cause Analytics Performance monitoring Fairness monitoring Model Comparison Cohort Analysis Explainable Decisions API Support Model Launch Signoff Model Release Mgmt Model Evaluation Compliance Testing Model Debugging Model Visualization Explainable AI Train QA Predict Deploy A/B Test Monitor Debug Feedback Loop An Explainable Future
  • 120. Explainability Challenges & Tradeoffs 120 User Privacy Transparency Fairness Performance ? ● Lack of a standard interface for ML models makes pluggable explanations hard ● Explanation needs vary depending on the type of user who needs it and on the problem at hand. ● The algorithm you employ for explanations might depend on the use-case, model type, data format, etc. ● There are trade-offs w.r.t. Explainability, Performance, Fairness, and Privacy.
  • 121. Explainability in ML: Broad Challenges Actionable explanations Balance between explanations & model secrecy Robustness of explanations to failure modes (Interaction between ML components) Application-specific challenges Conversational AI systems: contextual explanations Gradation of explanations Tools for explanations across AI lifecycle Pre & post-deployment for ML models Model developer vs. End user focused
  • 122. Reflections ● Case studies on explainable AI in practice ● Need “Explainability by Design” when building AI products 122
  • 123. Thanks! Questions? ● Feedback most welcome :-) ○ krishna@fiddler.ai, sgeyik@linkedin.com, kenthk@amazon.com, vamithal@linkedin.com, ankur@fiddler.ai ● Tutorial website: https://sites.google.com/view/fat20-explainable-ai-tutorial ● To try Fiddler, please send an email to info@fiddler.ai 123

Editor's Notes

  1. Krishna
  2. Models have many interactions within a business, multiple stakeholders, and varying requirements. Businesses need a 360-degree view of all models within a unified modeling universe that allows them to understand how models affect one another and all aspects of business functions across the organization. Worth saying that some of the highlighted concerns are applicable even for decisions not involving humans. Example: “Can I trust our AI decisions?” arises in domains with high stakes, e.g., deciding where to excavate/drill for mining/oil exploration; understanding the predicted impact of certain actions (e.g., reducing carbon emission by xx%) on global warming.
  3. Given the high-stakes associated with safety critical systems (such as autonomous driving and healthcare), explainability is a prerequisite for understanding failure modes, limitations, and safety verification.
  4. Explainability can be useful for learning about new insights (e.g., new go moves or how the brain works) from explaining the behavior of a blackbox model. E.g., can you extract a language dictionary from a MT model?
  5. At the same time, explainability is important from compliance & legal perspective as well. There have been several laws in countries such as the United States that prohibit discrimination based on “protected attributes” such as race, gender, age, disability status, and religion. Many of these laws have their origins in the Civil Rights Movement in the United States (e.g., US Civil Rights Act of 1964). When legal frameworks prohibit the use of such protected attributes in decision making, there are usually two competing approaches on how this is enforced in practice: Disparate Treatment vs. Disparate Impact. Avoiding disparate treatment requires that such attributes should not be actively used as a criterion for decision making and no group of people should be discriminated against because of their membership in some protected group. Avoiding disparate impact requires that the end result of any decision making should result in equal opportunities for members of all protected groups irrespective of how the decision is made. Please see NeurIPS’17 tutorial titled Fairness in machine learning by Solon Barocas and Moritz Hardt for a thorough discussion.
  6. Renewed focus in light of privacy / algorithmic bias / discrimination issues observed recently with algorithmic systems. Examples: EU General Data Protection Regulation (GDPR), which came into effect in May, 2018, and subsequent regulations such as California’s Consumer Privacy Act, which would take effect in January, 2020. The focus is not only around privacy of users (as the name may indicate), but also on related dimensions such as algorithmic bias (or ensuring fairness), transparency, and explainability of decisions. Image credit: https://pixabay.com/en/gdpr-symbol-privacy-icon-security-3499380/
  7. Retain human decision in order to assign responsibility. https://gdpr-info.eu/art-22-gdpr/ Bryce Goodman, Seth Flaxman, European Union regulations on algorithmic decision-making and a “right to explanation” https://arxiv.org/pdf/1606.08813.pdf Renewed focus in light of privacy / algorithmic bias / discrimination issues observed recently with algorithmic systems. Examples: EU General Data Protection Regulation (GDPR), which came into effect in May, 2018, and subsequent regulations such as California’s Consumer Privacy Act, which would take effect in January, 2020. The focus is not only around privacy of users (as the name may indicate), but also on related dimensions such as algorithmic bias (or ensuring fairness), transparency, and explainability of decisions. Image credit: https://pixabay.com/en/gdpr-symbol-privacy-icon-security-3499380/
  8. https://gdpr-info.eu/recitals/no-71/ Renewed focus in light of privacy / algorithmic bias / discrimination issues observed recently with algorithmic systems. Examples: EU General Data Protection Regulation (GDPR), which came into effect in May, 2018, and subsequent regulations such as California’s Consumer Privacy Act, which would take effect in January, 2020. The focus is not only around privacy of users (as the name may indicate), but also on related dimensions such as algorithmic bias (or ensuring fairness), transparency, and explainability of decisions. Image credit: https://pixabay.com/en/gdpr-symbol-privacy-icon-security-3499380/
  9. European Commission Vice-President for the #DigitalSingleMarket.
  10. There are also regulations in the financial industry pertaining to fairness such as Fair Credit Reporting Act (FCRA) and to model governance and risk management. For example, stress testing of models, what happens during recession, etc.
  11. Purpose: Convey the need for adopting an “explainability by design” approach when building AI products; that is, we should take explainability into account while designing and building AI/ML systems, as opposed to viewing it as an afterthought. -- We want explainability at all stages of the ML lifecycle
  12. Challenge: scaling explainability solutions for AI-based services and products across the entire LinkedIn ecosystem, considering the scale of data (2PB+ data processed nearline/offline per day; 200+ A/B experiments per week; billions of parameters across AI models). Largest professional network in the world. It is a platform for every professional to tell their story: who they are, where they work, their skills, etc. Once they have done that, the platform works by infusing intelligence, harnessing the power of this data to help connect talent with opportunity at scale. We have built the Economic Graph. These numbers here reflect some of our entities: companies means company pages, jobs means job postings. The graph has links (this field of study creates these skills, which in turn are used for these jobs, in this company). Professionals are signing up to join LinkedIn at a rate of more than two new members per second. There are more than 40 million students and recent college graduates on LinkedIn. They are LinkedIn's fastest-growing demographic. Our dream is to help people more easily navigate this increasingly challenging 21st century global economy by developing the world's first economic graph, i.e. we want to digitally map the global economy and, in doing so, create economic opportunity for every one of the 3B people in the global workforce. * More than 200+ countries | 1.5K fields of study | 600+ degrees | 24K titles ... Our vision is to develop a profile for every member of the global workforce, all 3 billion of them, every employer in the world, and every open job at each of these companies, to provide every member of the global workforce with transparency into the skills required to obtain those jobs. We want to build a profile for every educational institution or training facility that enables people to acquire those skills, and a publishing platform that enables every individual, every company, and every university to share their professionally-relevant knowledge if they're interested in doing so.
  13. Challenge: Scaling explainability solutions to meet the needs of customers from several different industries, with varied needs and requirements. We think about ML at a macro level as having 3 layers of the stack… At the bottom layer are the frameworks, interfaces, and infrastructure for expert ML practitioners. At the frameworks layer, there are three frameworks that are predominantly used: TensorFlow, PyTorch, and MXNet. TF has the most resonance and the largest community today, and we have a lot of TF experience and usage. 85% of TensorFlow running in the cloud runs on AWS. Very few ML practitioners use only a single framework (only 1 in 10); nine out of ten ML practitioners use more than one framework. New algorithms are being developed all the time in different frameworks that people want to borrow from, or run in that framework, instead of having to port them themselves to a different framework. So we have dedicated teams for all of the major frameworks, so customers have the right tool for the right job.
  14. This criterion distinguishes whether interpretability is achieved by restricting the complexity of the machine learning model (intrinsic) or by applying methods that analyze the model after training (post hoc). Intrinsic interpretability refers to machine learning models that are considered interpretable due to their simple structure, such as short decision trees or sparse linear models. Post hoc interpretability refers to the application of interpretation methods after model training.
  15. Disclaimer: not comprehensive Left pink bubble: methods to build interpretable ML models Bottom left bubble: methods to explain individual predictions of blackbox models Right side: methods to understand model behavior globally
  16. This criterion distinguishes whether interpretability is achieved by restricting the complexity of the machine learning model (intrinsic) or by applying methods that analyze the model after training (post hoc). Intrinsic interpretability refers to machine learning models that are considered interpretable due to their simple structure, such as short decision trees or sparse linear models. Post hoc interpretability refers to the application of interpretation methods after model training.
  17. We ask this question for a trained image classification network.
  18. There are several different formulations of this Why question: explain the function captured by the network in the local vicinity of the input; explain the entire function of the network; explain each prediction in terms of the training examples that influence it. And then there is attribution, which is what we consider.
  19. Explainability helped identify the bug that the bond convolutions were disconnected from the prediction layer. This was subsequently fixed. Notice that while this is a glaring bug, the test accuracy of the original model did not catch it.
  20. Have these models read the question carefully? Can we say “yes”, because they achieve good accuracy? That is the question we wish to address in this paper. Let us start by looking at an example.
  21. Have these models read the question carefully? Can we say “yes”, because they achieve good accuracy? That is the question we wish to address in this paper. Let us start by looking at an example.
  22. Completely changing the question’s meaning still does not change the behaviour of the model. Clearly, the model has not fully grasped the question.
  23. Completely changing the question’s meaning still does not change the behaviour of the model. Clearly, the model has not fully grasped the question.
  24. Completely changing the question’s meaning still does not change the behaviour of the model. Clearly, the model has not fully grasped the question.
  25. Completely changing the question’s meaning still does not change the behaviour of the model. Clearly, the model has not fully grasped the question.
  26. Completely changing the question’s meaning still does not change the behaviour of the model. Clearly, the model has not fully grasped the question.
  27. Completely changing the question’s meaning still does not change the behaviour of the model. Clearly, the model has not fully grasped the question.
  28. Completely changing the question’s meaning still does not change the behaviour of the model. Clearly, the model has not fully grasped the question.
  29. Completely changing the question’s meaning still does not change the behaviour of the model. Clearly, the model has not fully grasped the question.
  30. Completely changing the question’s meaning still does not change the behaviour of the model. Clearly, the model has not fully grasped the question.
  31. You may feel that Integration and Gradients are inverse operations of each other, and we did nothing.
  32. Our product approach is “diversity by design”. This means we’re not building a separate diversity product or diversity filter but rather integrating diversity insights through every step of our customers’ talent strategy. Why diversity? – Diversity tied to company culture (78%) & financial performance (62%) Motivated by LinkedIn’s goal of creating economic opportunity for every member of the global workforce and by a keen interest from our customers in making sure that they are able to source diverse talent, we have adopted a “diversity by design” approach for Talent Search in our LinkedIn Recruiter product. https://business.linkedin.com/talent-solutions/blog/product-updates/2018/linkedin-diversity-insights Building diverse and inclusive teams is the No. 1 talent priority for HR and talent acquisition professionals, according to the Global Recruiting Trends 2018 report. Research shows that diversity is tied directly to company culture and financial performance. In fact, LinkedIn data shows that 78% of companies prioritize diversity to improve culture and 62% do so to boost financial performance.
  33. https://business.linkedin.com/talent-solutions/blog/product-updates/2018/linkedin-diversity-insights Building diverse and inclusive teams is the No. 1 talent priority for HR and talent acquisition professionals, according to the Global Recruiting Trends 2018 report. Research shows that diversity is tied directly to company culture and financial performance. In fact, LinkedIn data shows that 78% of companies prioritize diversity to improve culture and 62% do so to boost financial performance. Our product approach is “diversity by design”. This means we’re not building a separate diversity product or diversity filter but rather integrating diversity insights through every step of our customers’ talent strategy.
  34. Gender insights as part of LinkedIn Talent Insights help our customers plan for diverse teams, set hiring goals and identify talent pools with greater gender representation.
  35. Gender insights as part of LinkedIn Talent Insights help our customers plan for diverse teams, set hiring goals and identify talent pools with greater gender representation. LinkedIn’s customers can gauge the gender representation of their workforce, benchmark against industry peers, and identify teams that have the biggest room for improvement. These insights could be a persuasive tool to set realistic hiring goals that account for greater representation.
  36. The Talent Pool report as part of LinkedIn Talent Insights will include gender insights in the Location, Industry, Title, and Skills tabs to help customers identify talent pools with greater gender representation. These insights are powerful to inspire and influence leaders to hire qualified candidates from new industries or locations their recruiting team may have otherwise overlooked.
  37. Gender insights in Jobs and InMail (a message sent by a recruiter to a candidate) reports help teams understand their InMail acceptance rates and Jobs performance by gender. These insights serve to flag and identify opportunities for more inclusive job descriptions or InMail outreach.
38. References: S. C. Geyik, K. Kenthapadi, Building Representative Talent Search at LinkedIn, LinkedIn engineering blog post, October 2018 (https://engineering.linkedin.com/blog/2018/10/building-representative-talent-search-at-linkedin); S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search, KDD 2019 (https://arxiv.org/abs/1905.01989).

The goal of LinkedIn Talent Search systems is to provide a tool that recruiters and hiring managers can use to source suitable talent for their needs. This is mainly accomplished through the LinkedIn Recruiter product, which recruiters use to identify “talent pools” that are optimized for the likelihood of making a successful hire. The Recruiter product provides a ranked list of candidates corresponding to a search request in the form of a query, a job posting, or an ideal candidate. Given a search request, candidates that match the request are first selected and then ranked based on a variety of factors (such as the similarity of their work experience/skills with the search criteria, job posting location, and the likelihood of a response from an interested candidate) using machine-learned models in multiple passes.

Recruiter search will deliver representative results that reflect the gender distribution of the underlying talent pool. As a simplified example, if the underlying talent pool for Account Executives in the U.S. is 44% women, a search in Recruiter would yield approximately 44% female and 56% male candidates. With representative results, our customers can expect any sourcing they undertake to produce diverse results reflective of the available talent pool, starting with Page 1 in LinkedIn Recruiter.
39. Our measurement and mitigation approach assumes that, in the ideal setting, the set of qualified candidates and the set of top-ranked results for a search request should have the same distribution on the attribute of interest; i.e., the ranked/recommended list should be representative of the qualified list. This assumption can be thought of as based on the definition of equal opportunity from prior research in machine learning: a predictor function satisfies equal opportunity with respect to an attribute and a true outcome if the predictor and the attribute are independent conditional on the true outcome being 1 (favorable).

In our setting, we assume that the LinkedIn members who match the criteria specified by recruiters in a search request are “qualified” for that search request. We can roughly map to the above definition as follows: the predictor function corresponds to whether a candidate is presented in the top-ranked results for the search request, while the true outcome corresponds to whether a candidate matches the search request criteria (or equivalently, is “qualified” for the search request). Satisfying the definition means that whether or not a member is included in the top-ranked results does not depend on the attribute of interest, or, equivalently, that the proportion of members with a given value of the attribute does not vary between the set of qualified candidates and the set of top-ranked results. This requirement can also be thought of as seeking statistical parity between the set of qualified candidates and the set of top-ranked results.

In other words, the operating definition of “equal” or “fair” here is towards achieving ranked results that are representative (in terms of the attribute of interest; in the scope of this work, inferred gender) of the qualified population. We define the qualified population to be the set of candidates (LinkedIn members) that match the criteria set forth in the recruiter’s query (e.g., if the query is “Skill:Java,” then the qualified population is the set of members familiar with computer programming using the Java programming language).
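As a concrete rendering of the definition above (a sketch in standard fairness notation of my choosing, with Y-hat denoting inclusion in the top-ranked results, Y denoting whether the candidate matches the search criteria, and A the attribute of interest), equal opportunity requires:

```latex
% Equal opportunity: the predictor is independent of the attribute,
% conditional on the true outcome being favorable (Y = 1).
P(\hat{Y} = 1 \mid A = a,\; Y = 1) \;=\; P(\hat{Y} = 1 \mid Y = 1)
\quad \text{for all values } a \text{ of } A.
```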
40. We compute the gender proportions in the set of qualified candidates as follows. First, we use LinkedIn’s Galene search engine to obtain the set of qualified candidates that match the criteria specified as part of the search query by the user. We then compute the empirical distribution over genders, which gives the required gender representation constraints; we use this distribution both for computing our representation metrics (after the fact, for evaluation) and for re-ranking (during recommendation).
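A minimal sketch of this step, assuming the qualified-candidate set returned by the search backend is available as a list of records with an inferred-gender field (the function and field names here are hypothetical):

```python
from collections import Counter

def gender_distribution(qualified_candidates, attr="inferred_gender"):
    """Empirical distribution of the attribute over the qualified set."""
    counts = Counter(c[attr] for c in qualified_candidates)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}
```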
41. Re-ranking approach: (1) Partition the set of potential candidates into different gender buckets. (2) Rank the candidates in each gender bucket according to the scores assigned by the machine-learned model. (3) Merge the gender buckets, while obeying representation constraints based on the gender proportions computed from the set of qualified candidates. The merging is designed to keep the gender proportions in the ranked list similar to the corresponding proportions in the set of qualified candidates for every index of recommendation (e.g., for the top result, for the top two results, for the top five results, and so on). Note that the merging preserves the score-based order of candidates within each gender bucket.

We adopted the post-processing approach due to the following practical considerations. First, this approach is agnostic to the specifics of each model and therefore scalable across different model choices for our application. Second, this approach is easier to incorporate as part of existing systems, since we can build a stand-alone service/component for post-processing without significant modifications to the existing components. Finally, our approach aims to ensure that the search results presented to the users of LinkedIn Recruiter are representative of the underlying talent pool.
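A simplified sketch of the bucket-merge step described above, not the production implementation: it assumes candidates are dicts with hypothetical "score" and "inferred_gender" fields, that every candidate’s attribute value appears in the target distribution, and it applies a greedy per-prefix floor constraint in the spirit of the re-ranking described in the KDD 2019 paper referenced earlier:

```python
import math

def representative_merge(candidates, target_props, attr="inferred_gender", score="score"):
    """Greedily merge per-gender buckets so that, at every prefix length k,
    each gender g has at least floor(target_props[g] * k) candidates,
    while preserving score order within each bucket."""
    # Partition into buckets, each sorted by model score (descending).
    # Assumes each candidate's attr value is a key of target_props.
    buckets = {g: sorted([c for c in candidates if c[attr] == g],
                         key=lambda c: c[score], reverse=True)
               for g in target_props}
    ranked, counts = [], {g: 0 for g in target_props}
    total = sum(len(b) for b in buckets.values())
    for k in range(1, total + 1):
        # Genders whose floor constraint would be violated at prefix length k.
        below = [g for g in target_props
                 if buckets[g] and counts[g] < math.floor(target_props[g] * k)]
        if below:
            # Satisfy the constraint: best-scoring candidate among deficient genders.
            g = max(below, key=lambda g: buckets[g][0][score])
        else:
            # Otherwise take the best available candidate overall.
            g = max((g for g in target_props if buckets[g]),
                    key=lambda g: buckets[g][0][score])
        ranked.append(buckets[g].pop(0))
        counts[g] += 1
    return ranked
```

Because candidates are popped from the front of their sorted bucket, the score-based order within each gender bucket is preserved, as noted above.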
42. Our experiments with the gender-representative ranking approach outlined in this blog post have shown that more than 95% of all searches are representative of any gender compared to the qualified population of the search. This shows that the re-ranking approach works well, and we have ongoing efforts at LinkedIn to further improve the representativeness of our ranking algorithms. While the representativeness results demonstrated the effectiveness of our approach, we still ran an A/B test over our recruiter user set for two weeks to see the effect of representative ranking on our standard business metrics. During this evaluation, 50% of the users were assigned to a baseline algorithm, which directly ranks candidates according to the scores generated by the machine-learned model, and 50% of the users were assigned to the representative ranking algorithm. In total, hundreds of thousands of recruiters were included in the A/B testing and validation work that went into this change. We found that the representative ranking approach caused no significant change in our business metrics, such as the number of InMails sent or accepted, meaning that ensuring representation does not negatively impact our customers’ success metrics. Based on the results of our validation of the representative ranking approach, we decided to ramp representative ranking to a wider audience. Currently, 100% of LinkedIn Recruiter users worldwide are presented with candidates who go through the representative ranking process, helping provide equal economic opportunity for all members of the workforce.
43. (Lessons Learned in Practice) We decided to focus on post-processing algorithms due to the following practical considerations which we learned over the course of our investigations. First, applying such a methodology is agnostic to the specifics of each model and therefore scalable across different model choices for the same application and also across other similar applications. Second, in many practical internet applications, domain-specific business logic is typically applied prior to displaying the results from the ML model to the end user (e.g., prune candidates working at the same company as the recruiter), and hence it is more effective to incorporate bias mitigation as the very last step of the pipeline. Third, this approach is easier to incorporate as part of existing systems, compared to modifying the training algorithm or the features, since we can build a stand-alone service or component for post-processing without significant modifications to the existing components. In fact, our experience in practice suggests that post-processing is easier than eliminating bias from training data or during model training (especially due to redundant encoding of attributes of interest and the likelihood of both the model choices and features evolving over time). However, we remark that efforts to eliminate/reduce bias from training data or during model training can still be explored, and can be thought of as complementary to our approach, which functions as a “fail-safe”.
44. Problem formulation: A team in China tried to use photos to predict criminality. Some tasks will definitionally be unethical. Is the algorithm misusable in other contexts?
  45. Cross-functional initiative
  46. Neovascularization is hard to see on the original image.
  47. Neovascularization is hard to see on the original image.
48. End-users may like explanations of individual predictions; some may prefer recourse-based explanations; developers may prefer global explanations. There are many different algorithms: even in the space of Shapley values, there are four different, very popular algorithms that yield different attributions. There are subtle design differences.
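As a minimal illustration of why those design choices matter, the sketch below computes exact Shapley values by brute force for a model with a handful of features. The key assumption is the value function v(S), which defines the model’s output when only the feature subset S is “present” (for example, absent features replaced by a fixed baseline, or averaged over a background set); it is largely this choice that differs across the popular Shapley-based attribution algorithms and leads them to produce different attributions.

```python
from itertools import combinations
from math import factorial

def shapley_values(n_features, v):
    """Exact Shapley values for a value function v over feature subsets.
    v(S) takes a frozenset of feature indices and returns the model's value
    when only those features are considered present. Exponential in
    n_features, so only practical for very small n."""
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                S = frozenset(subset)
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi
```

Two implementations that agree on this formula but plug in different value functions (baseline replacement versus marginalizing over a background distribution, for instance) will generally return different attributions for the same prediction.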