SlideShare a Scribd company logo
What? Why? How? References
Increasing Trust and Understanding in Machine Learning with
Model Debugging
c Patrick Hall∗
H2O.ai
October 21, 2019
∗
This material is shared under a CC By 4.0 license which allows for editing and redistribution, even for commercial purposes.
However, any derivative work should attribute the author and H2O.ai.
1
What? Why? How? References
Contents
What?
Why?
How?
2
What? Why? How? References
What is Model Debugging?
• Model debugging is an emergent discipline focused on discovering and remediating
errors in the internal mechanisms and outputs of machine learning models.†
• Model debugging attempts to test machine learning models like code (because the
models are code).
• Model debugging promotes trust directly and enhances interpretability as a
side-effect.
• Model debugging is similar to regression diagnostics, but for nonlinear machine
learning models.
†
See https://debug-ml-iclr2019.github.io/ for numerous model debugging approaches.
3
What? Why? How? References
Trust and Understanding
Trust and understanding in machine learning are different but complementary goals,
and they are technically feasible today.
4
What? Why? How? References
Why Debug?
• Constrained, monotonic GBM probability of
default (PD) classifier, gmono.
• Grid search over hundreds of models.
• Best model selected by validation-based early
stopping.
• Seemingly well-regularized (row and column
sampling, explicit specification of L1 and L2
penalties).
• No evidence of over- or under-fitting.
• Better validation logloss than benchmark GLM.
• Decision threshold selected by maximization of F1
statistic.
• BUT traditional assessment can be inadequate ...
Validation Confusion Matrix at Threshold:
Actual: 1 Actual: 0
Predicted: 1 1159 827
Predicted: 0 1064 6004
5
What? Why? How? References
Why Debug?
Machine learning models can be inaccurate.
gmono PD classifier over-emphasizes the most important feature, a
customer’s most recent repayment status, PAY_0.
gmono also struggles to predict default for favorable statuses,
−2 ≤ PAY_0 < 2, and often cannot predict on-time payment
when recent payments are late, PAY_0 ≥ 2.
6
What? Why? How? References
Why Debug?
Machine learning models can perpetuate sociological biases [1].
Group disparity metrics are out-of-range for gmono across different marital statuses.
This does not address localized instances of discrimination.
7
What? Why? How? References
Why Debug?
Machine learning models can have security vulnerabilities [2], [5], [6].‡
‡
See https://github.com/jphall663/secure_ML_ideas for full size image and more information.
8
What? Why? How? References
How to Debug Models?
As part of a holistic, low-risk approach to machine learning.§
9
What? Why? How? References
Sensitivity Analysis: Partial Dependence and ICE
• Training data is very sparse for PAY_0 > 2.
• Residuals of partial dependence confirm over-emphasis on PAY_0.
• ICE curves indicate that partial dependence is likely trustworthy and empirically confirm monotonicity, but
also expose adversarial attack vulnerabilities.
• Partial dependence and ICE indicate gmono likely learned very little for PAY_0 ≥ 2.
• PAY_0 = missing gives lowest probability of default.
10
What? Why? How? References
Sensitivity Analysis: Search for Adversarial Examples
Adversary search confirms multiple avenues of attack and exposes a potential flaw in gmono scoring logic: default is predicted for
customer’s who make payments above their credit limit. (Try the cleverhans package for finding adversarial examples.)
11
What? Why? How? References
Sensitivity Analysis: Search for Adversarial Examples
gmono appears to prefer younger, unmarried customers, which should be investigated further with disparate impact analysis - see slide 7
- and could expose the lender to impersonation attacks. (Try the AIF360, aequitas, or themis packages for disparate impact audits.)
12
What? Why? How? References
Sensitivity Analysis: Random Attacks
• In general, random attacks are a viable method to identify software bugs in machine learning pipelines.
(Start here if you don’t know where to start.)
• Random data can apparently elicit all probabilities ∈ [0, 1] from gmono.
• Around the decision threshold, lower probabilities can be attained simply by injecting missing values, yet
another vulnerability to adversarial attack.
13
What? Why? How? References
Residual Analysis: Disparate Accuracy and Errors
For PAY_0:
• Notable change in accuracy and error characteristics for PAY_0 ≥ 2.
• 100% false omission rate for PAY_0 ≥ 2. (Every prediction of non-default is incorrect!)
For SEX, accuracy and error characteristics vary little across individuals represented in the training data. Non-discrimination should be
confirmed by disparate impact analysis.
14
What? Why? How? References
Residual Analysis: Mean Local Feature Contributions
Exact local Shapley feature contributions [4], which are available at scoring time for unlabeled data, are
noticeably different for low and high residual predictions. (Both monotonicity constraints and Shapley values are
available in h2o-3 and XGBoost.)
15
What? Why? How? References
Residual Analysis: Non-Robust Features
Globally important features PAY_3 and PAY_2 are more important, on average, to the loss than to the
predictions. (Shapley contributions to XGBoost logloss can be calculated using the shap package. This is a
time-consuming calculation.)
16
What? Why? How? References
Residual Analysis: Local Contributions to Logloss
Exact, local feature contributions to logloss can be calculated, enabling ranking of
features contributing to logloss residuals for each prediction.
17
What? Why? How? References
Residual Analysis: Modeling Residuals
Decision tree model of gmono DEFAULT_NEXT_MONTH = 1 logloss residuals with
3-fold CV MSE = 0.0070 and R2 = 0.8871.
This tree encodes rules describing when gmono is probably wrong.
18
What? Why? How? References
Benchmark Models: Compare to Linear Models
For a range of probabilities ∈ (∼ 0.2, ∼ 0.6), gmono displays exactly incorrect prediction
behavior as compared to a benchmark GLM.
19
What? Why? How? References
Remediation: for gmono
• Over-emphasis of PAY_0:
• Engineer features for payment trends or stability.
• Missing value injection during training or scoring.
• Sparsity of PAY_0 > 2 training data: Increase observation weights.
• Payments ≥ credit limit: Scoring-time model assertion [3].
• Disparate impact: Adversarial de-biasing [7] or model selection by minimal disparate impact.
• Security vulnerabilities: API throttling, authentication, real-time model monitoring.
• Large logloss importance: Evaluate dropping non-robust features.
• Poor accuracy vs. benchmark GLM: Blend gmono and GLM for probabilities ∈ (∼ 0.2, ∼ 0.6).
• Miscellaneous strategies:
• Local feature importance and decision tree rules can indicate additional scoring-time model
assertions, e.g. alternate treatment for locally non-robust features in known high-residual ranges of
the learned response function.
• Incorporate local feature contributions to logloss into training or scoring processes.
20
What? Why? How? References
Remediation: General Strategies
• Calibration to past data.
• Data collection or simulation for model blindspots.
• Detection and elimination of non-robust features.
• Missing value injection during training or scoring.
• Model or model artifact editing.
21
What? Why? How? References
References
This presentation:
https://www.github.com/jphall663/jsm_2019
Code examples for this presentation:
https://www.github.com/jphall663/interpretable_machine_learning_with_python
https://www.github.com/jphall663/responsible_xai
22
What? Why? How? References
References
[1] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. URL:
http://www.fairmlbook.org. fairmlbook.org, 2018.
[2] Marco Barreno et al. “The Security of Machine Learning.” In: Machine Learning 81.2 (2010). URL:
https://people.eecs.berkeley.edu/~adj/publications/paper-files/SecML-MLJ2010.pdf,
pp. 121–148.
[3] Daniel Kang et al. Debugging Machine Learning Models via Model Assertions. URL:
https://debug-ml-iclr2019.github.io/cameraready/DebugML-19_paper_27.pdf.
[4] Scott M. Lundberg and Su-In Lee. “A Unified Approach to Interpreting Model Predictions.” In: Advances
in Neural Information Processing Systems 30. Ed. by I. Guyon et al. URL: http:
//papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.
Curran Associates, Inc., 2017, pp. 4765–4774.
[5] Reza Shokri et al. “Membership Inference Attacks Against Machine Learning Models.” In: 2017 IEEE
Symposium on Security and Privacy (SP). URL: https://arxiv.org/pdf/1610.05820.pdf. IEEE. 2017,
pp. 3–18.
23
What? Why? How? References
References
[6] Florian Tramèr et al. “Stealing Machine Learning Models via Prediction APIs.” In: 25th {USENIX}
Security Symposium ({USENIX} Security 16). URL:
https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf.
2016, pp. 601–618.
[7] Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. “Mitigating Unwanted Biases with Adversarial
Learning.” In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. URL:
https://arxiv.org/pdf/1801.07593.pdf. ACM. 2018, pp. 335–340.
24

More Related Content

What's hot

Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language Processing
Yunyao Li
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
Krishnaram Kenthapadi
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)
Krishnaram Kenthapadi
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
Krishnaram Kenthapadi
 
Explainable AI in Healthcare
Explainable AI in HealthcareExplainable AI in Healthcare
Explainable AI in Healthcare
vonaurum
 
Measures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairnessMeasures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairness
Manojit Nandi
 
An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!
Mansour Saffar
 
Interpretable machine learning : Methods for understanding complex models
Interpretable machine learning : Methods for understanding complex modelsInterpretable machine learning : Methods for understanding complex models
Interpretable machine learning : Methods for understanding complex models
Manojit Nandi
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
Equifax Ltd
 
Fraud detection ML
Fraud detection MLFraud detection ML
Fraud detection ML
MaatougSelim
 
Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)
Krishnaram Kenthapadi
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AI
Gary Allemann
 
Explainable AI in Industry (AAAI 2020 Tutorial)
Explainable AI in Industry (AAAI 2020 Tutorial)Explainable AI in Industry (AAAI 2020 Tutorial)
Explainable AI in Industry (AAAI 2020 Tutorial)
Krishnaram Kenthapadi
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paper
James by CrowdProcess
 
Fairness in Machine Learning
Fairness in Machine LearningFairness in Machine Learning
Fairness in Machine Learning
Delip Rao
 
GRC 2020 - IIA - ISACA Machine Learning Monitoring, Compliance and Governance
GRC 2020 - IIA - ISACA Machine Learning Monitoring, Compliance and GovernanceGRC 2020 - IIA - ISACA Machine Learning Monitoring, Compliance and Governance
GRC 2020 - IIA - ISACA Machine Learning Monitoring, Compliance and Governance
Andrew Clark
 
How ml can improve purchase conversions
How ml can improve purchase conversionsHow ml can improve purchase conversions
How ml can improve purchase conversions
Sudeep Shukla
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learning
Sandeep Garg
 
Operationalize deep learning models for fraud detection with Azure Machine Le...
Operationalize deep learning models for fraud detection with Azure Machine Le...Operationalize deep learning models for fraud detection with Azure Machine Le...
Operationalize deep learning models for fraud detection with Azure Machine Le...
Francesca Lazzeri, PhD
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine Learning
Scaleway
 

What's hot (20)

Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language Processing
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
 
Explainable AI in Healthcare
Explainable AI in HealthcareExplainable AI in Healthcare
Explainable AI in Healthcare
 
Measures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairnessMeasures and mismeasures of algorithmic fairness
Measures and mismeasures of algorithmic fairness
 
An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!
 
Interpretable machine learning : Methods for understanding complex models
Interpretable machine learning : Methods for understanding complex modelsInterpretable machine learning : Methods for understanding complex models
Interpretable machine learning : Methods for understanding complex models
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Fraud detection ML
Fraud detection MLFraud detection ML
Fraud detection ML
 
Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AI
 
Explainable AI in Industry (AAAI 2020 Tutorial)
Explainable AI in Industry (AAAI 2020 Tutorial)Explainable AI in Industry (AAAI 2020 Tutorial)
Explainable AI in Industry (AAAI 2020 Tutorial)
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paper
 
Fairness in Machine Learning
Fairness in Machine LearningFairness in Machine Learning
Fairness in Machine Learning
 
GRC 2020 - IIA - ISACA Machine Learning Monitoring, Compliance and Governance
GRC 2020 - IIA - ISACA Machine Learning Monitoring, Compliance and GovernanceGRC 2020 - IIA - ISACA Machine Learning Monitoring, Compliance and Governance
GRC 2020 - IIA - ISACA Machine Learning Monitoring, Compliance and Governance
 
How ml can improve purchase conversions
How ml can improve purchase conversionsHow ml can improve purchase conversions
How ml can improve purchase conversions
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learning
 
Operationalize deep learning models for fraud detection with Azure Machine Le...
Operationalize deep learning models for fraud detection with Azure Machine Le...Operationalize deep learning models for fraud detection with Azure Machine Le...
Operationalize deep learning models for fraud detection with Azure Machine Le...
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine Learning
 

Similar to Patrick Hall, H2O.ai - The Case for Model Debugging - H2O World 2019 NYC

Real-world Strategies for Debugging Machine Learning Systems
Real-world Strategies for Debugging Machine Learning SystemsReal-world Strategies for Debugging Machine Learning Systems
Real-world Strategies for Debugging Machine Learning Systems
Databricks
 
Managing machine learning
Managing machine learningManaging machine learning
Managing machine learning
David Murgatroyd
 
Loan Approval Prediction
Loan Approval PredictionLoan Approval Prediction
Loan Approval Prediction
IRJET Journal
 
Online Testing Learning to Rank with Solr Interleaving
Online Testing Learning to Rank with Solr InterleavingOnline Testing Learning to Rank with Solr Interleaving
Online Testing Learning to Rank with Solr Interleaving
Sease
 
vodQA Pune (2019) - Testing AI,ML applications
vodQA Pune (2019) - Testing AI,ML applicationsvodQA Pune (2019) - Testing AI,ML applications
vodQA Pune (2019) - Testing AI,ML applications
vodQA
 
Can AI finally "cure" the Marketing Myopia?
Can AI finally "cure" the Marketing Myopia?Can AI finally "cure" the Marketing Myopia?
Can AI finally "cure" the Marketing Myopia?
Tathagat Varma
 
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
ijsc
 
loanpredictionsystem-210808032534.pptx
loanpredictionsystem-210808032534.pptxloanpredictionsystem-210808032534.pptx
loanpredictionsystem-210808032534.pptx
081JeetPatel
 
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...
IRJET Journal
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Alok Singh
 
Software Design principales
Software Design principalesSoftware Design principales
Software Design principales
ABDEL RAHMAN KARIM
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
Subrat Panda, PhD
 
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.aiPractical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Sri Ambati
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
Roger Barga
 
Towards Responsible AI - KC.pptx
Towards Responsible AI - KC.pptxTowards Responsible AI - KC.pptx
Towards Responsible AI - KC.pptx
Luis775803
 
C3 w5
C3 w5C3 w5
2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx
gdgsurrey
 
STAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptxSTAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptx
JishanAhmed24
 
Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the Enterprise
SigOpt
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code Optimization
IRJET Journal
 

Similar to Patrick Hall, H2O.ai - The Case for Model Debugging - H2O World 2019 NYC (20)

Real-world Strategies for Debugging Machine Learning Systems
Real-world Strategies for Debugging Machine Learning SystemsReal-world Strategies for Debugging Machine Learning Systems
Real-world Strategies for Debugging Machine Learning Systems
 
Managing machine learning
Managing machine learningManaging machine learning
Managing machine learning
 
Loan Approval Prediction
Loan Approval PredictionLoan Approval Prediction
Loan Approval Prediction
 
Online Testing Learning to Rank with Solr Interleaving
Online Testing Learning to Rank with Solr InterleavingOnline Testing Learning to Rank with Solr Interleaving
Online Testing Learning to Rank with Solr Interleaving
 
vodQA Pune (2019) - Testing AI,ML applications
vodQA Pune (2019) - Testing AI,ML applicationsvodQA Pune (2019) - Testing AI,ML applications
vodQA Pune (2019) - Testing AI,ML applications
 
Can AI finally "cure" the Marketing Myopia?
Can AI finally "cure" the Marketing Myopia?Can AI finally "cure" the Marketing Myopia?
Can AI finally "cure" the Marketing Myopia?
 
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
 
loanpredictionsystem-210808032534.pptx
loanpredictionsystem-210808032534.pptxloanpredictionsystem-210808032534.pptx
loanpredictionsystem-210808032534.pptx
 
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
 
Software Design principales
Software Design principalesSoftware Design principales
Software Design principales
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.aiPractical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
Towards Responsible AI - KC.pptx
Towards Responsible AI - KC.pptxTowards Responsible AI - KC.pptx
Towards Responsible AI - KC.pptx
 
C3 w5
C3 w5C3 w5
C3 w5
 
2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx
 
STAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptxSTAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptx
 
Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the Enterprise
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code Optimization
 

More from Sri Ambati

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Sri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
Sri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
Sri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
Sri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
Sri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
Sri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
Sri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
Sri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
Sri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
Sri Ambati
 

More from Sri Ambati (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 

Recently uploaded

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
Globus
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 

Patrick Hall, H2O.ai - The Case for Model Debugging - H2O World 2019 NYC

  • 1. What? Why? How? References Increasing Trust and Understanding in Machine Learning with Model Debugging c Patrick Hall∗ H2O.ai October 21, 2019 ∗ This material is shared under a CC By 4.0 license which allows for editing and redistribution, even for commercial purposes. However, any derivative work should attribute the author and H2O.ai. 1
  • 2. What? Why? How? References Contents What? Why? How? 2
  • 3. What? Why? How? References What is Model Debugging? • Model debugging is an emergent discipline focused on discovering and remediating errors in the internal mechanisms and outputs of machine learning models.† • Model debugging attempts to test machine learning models like code (because the models are code). • Model debugging promotes trust directly and enhances interpretability as a side-effect. • Model debugging is similar to regression diagnostics, but for nonlinear machine learning models. † See https://debug-ml-iclr2019.github.io/ for numerous model debugging approaches. 3
  • 4. What? Why? How? References Trust and Understanding Trust and understanding in machine learning are different but complementary goals, and they are technically feasible today. 4
  • 5. What? Why? How? References Why Debug? • Constrained, monotonic GBM probability of default (PD) classifier, gmono. • Grid search over hundreds of models. • Best model selected by validation-based early stopping. • Seemingly well-regularized (row and column sampling, explicit specification of L1 and L2 penalties). • No evidence of over- or under-fitting. • Better validation logloss than benchmark GLM. • Decision threshold selected by maximization of F1 statistic. • BUT traditional assessment can be inadequate ... Validation Confusion Matrix at Threshold: Actual: 1 Actual: 0 Predicted: 1 1159 827 Predicted: 0 1064 6004 5
  • 6. What? Why? How? References Why Debug? Machine learning models can be inaccurate. gmono PD classifier over-emphasizes the most important feature, a customer’s most recent repayment status, PAY_0. gmono also struggles to predict default for favorable statuses, −2 ≤ PAY_0 < 2, and often cannot predict on-time payment when recent payments are late, PAY_0 ≥ 2. 6
  • 7. What? Why? How? References Why Debug? Machine learning models can perpetuate sociological biases [1]. Group disparity metrics are out-of-range for gmono across different marital statuses. This does not address localized instances of discrimination. 7
  • 8. What? Why? How? References Why Debug? Machine learning models can have security vulnerabilities [2], [5], [6].‡ ‡ See https://github.com/jphall663/secure_ML_ideas for full size image and more information. 8
  • 9. What? Why? How? References How to Debug Models? As part of a holistic, low-risk approach to machine learning.§ 9
  • 10. What? Why? How? References Sensitivity Analysis: Partial Dependence and ICE • Training data is very sparse for PAY_0 > 2. • Residuals of partial dependence confirm over-emphasis on PAY_0. • ICE curves indicate that partial dependence is likely trustworthy and empirically confirm monotonicity, but also expose adversarial attack vulnerabilities. • Partial dependence and ICE indicate gmono likely learned very little for PAY_0 ≥ 2. • PAY_0 = missing gives lowest probability of default. 10
  • 11. What? Why? How? References Sensitivity Analysis: Search for Adversarial Examples Adversary search confirms multiple avenues of attack and exposes a potential flaw in gmono scoring logic: default is predicted for customer’s who make payments above their credit limit. (Try the cleverhans package for finding adversarial examples.) 11
  • 12. What? Why? How? References Sensitivity Analysis: Search for Adversarial Examples gmono appears to prefer younger, unmarried customers, which should be investigated further with disparate impact analysis - see slide 7 - and could expose the lender to impersonation attacks. (Try the AIF360, aequitas, or themis packages for disparate impact audits.) 12
  • 13. What? Why? How? References Sensitivity Analysis: Random Attacks • In general, random attacks are a viable method to identify software bugs in machine learning pipelines. (Start here if you don’t know where to start.) • Random data can apparently elicit all probabilities ∈ [0, 1] from gmono. • Around the decision threshold, lower probabilities can be attained simply by injecting missing values, yet another vulnerability to adversarial attack. 13
  • 14. What? Why? How? References Residual Analysis: Disparate Accuracy and Errors For PAY_0: • Notable change in accuracy and error characteristics for PAY_0 ≥ 2. • 100% false omission rate for PAY_0 ≥ 2. (Every prediction of non-default is incorrect!) For SEX, accuracy and error characteristics vary little across individuals represented in the training data. Non-discrimination should be confirmed by disparate impact analysis. 14
  • 15. What? Why? How? References Residual Analysis: Mean Local Feature Contributions Exact local Shapley feature contributions [4], which are available at scoring time for unlabeled data, are noticeably different for low and high residual predictions. (Both monotonicity constraints and Shapley values are available in h2o-3 and XGBoost.) 15
  • 16. What? Why? How? References Residual Analysis: Non-Robust Features Globally important features PAY_3 and PAY_2 are more important, on average, to the loss than to the predictions. (Shapley contributions to XGBoost logloss can be calculated using the shap package. This is a time-consuming calculation.) 16
  • 17. What? Why? How? References Residual Analysis: Local Contributions to Logloss Exact, local feature contributions to logloss can be calculated, enabling ranking of features contributing to logloss residuals for each prediction. 17
  • 18. What? Why? How? References Residual Analysis: Modeling Residuals Decision tree model of gmono DEFAULT_NEXT_MONTH = 1 logloss residuals with 3-fold CV MSE = 0.0070 and R2 = 0.8871. This tree encodes rules describing when gmono is probably wrong. 18
  • 19. What? Why? How? References Benchmark Models: Compare to Linear Models For a range of probabilities ∈ (∼ 0.2, ∼ 0.6), gmono displays exactly incorrect prediction behavior as compared to a benchmark GLM. 19
  • 20. What? Why? How? References Remediation: for gmono • Over-emphasis of PAY_0: • Engineer features for payment trends or stability. • Missing value injection during training or scoring. • Sparsity of PAY_0 > 2 training data: Increase observation weights. • Payments ≥ credit limit: Scoring-time model assertion [3]. • Disparate impact: Adversarial de-biasing [7] or model selection by minimal disparate impact. • Security vulnerabilities: API throttling, authentication, real-time model monitoring. • Large logloss importance: Evaluate dropping non-robust features. • Poor accuracy vs. benchmark GLM: Blend gmono and GLM for probabilities ∈ (∼ 0.2, ∼ 0.6). • Miscellaneous strategies: • Local feature importance and decision tree rules can indicate additional scoring-time model assertions, e.g. alternate treatment for locally non-robust features in known high-residual ranges of the learned response function. • Incorporate local feature contributions to logloss into training or scoring processes. 20
  • 21. What? Why? How? References Remediation: General Strategies • Calibration to past data. • Data collection or simulation for model blindspots. • Detection and elimination of non-robust features. • Missing value injection during training or scoring. • Model or model artifact editing. 21
  • 22. What? Why? How? References References This presentation: https://www.github.com/jphall663/jsm_2019 Code examples for this presentation: https://www.github.com/jphall663/interpretable_machine_learning_with_python https://www.github.com/jphall663/responsible_xai 22
  • 23. What? Why? How? References References [1] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. URL: http://www.fairmlbook.org. fairmlbook.org, 2018. [2] Marco Barreno et al. “The Security of Machine Learning.” In: Machine Learning 81.2 (2010). URL: https://people.eecs.berkeley.edu/~adj/publications/paper-files/SecML-MLJ2010.pdf, pp. 121–148. [3] Daniel Kang et al. Debugging Machine Learning Models via Model Assertions. URL: https://debug-ml-iclr2019.github.io/cameraready/DebugML-19_paper_27.pdf. [4] Scott M. Lundberg and Su-In Lee. “A Unified Approach to Interpreting Model Predictions.” In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon et al. URL: http: //papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf. Curran Associates, Inc., 2017, pp. 4765–4774. [5] Reza Shokri et al. “Membership Inference Attacks Against Machine Learning Models.” In: 2017 IEEE Symposium on Security and Privacy (SP). URL: https://arxiv.org/pdf/1610.05820.pdf. IEEE. 2017, pp. 3–18. 23
  • 24. What? Why? How? References References [6] Florian Tramèr et al. “Stealing Machine Learning Models via Prediction APIs.” In: 25th {USENIX} Security Symposium ({USENIX} Security 16). URL: https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf. 2016, pp. 601–618. [7] Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. “Mitigating Unwanted Biases with Adversarial Learning.” In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. URL: https://arxiv.org/pdf/1801.07593.pdf. ACM. 2018, pp. 335–340. 24