The past decade has seen tremendous growth in production deployments of machine learning algorithms across applications such as targeted advertising, self-driving cars, speech translation, and medical diagnosis [1]. In these contexts, models make key decisions: predicting the likelihood that a person will commit a future crime, assessing trustworthiness for loan approval, producing medical diagnoses, etc. [2]. Bias based on gender, geographical location, race, and other attributes, along with its negative consequences, has been uncovered in several of these deployments [3], [4]. Industries and governments are reacting, enacting regulations requiring that decisions made by machine learning models be interpretable/explainable [5].
Explainability across the full range of ML and DL algorithms is an unsolved research problem, with many innovations over the last several years and entire conferences devoted to the topic. However, even simple explainability solutions that are considered established in development (training) environments run into additional difficulties when put into live production.
Our design pattern uses a well-known technique for explainability: the Canary model (sometimes called a Surrogate model) [6], [7]. In this approach, a classically non-explainable technique, such as a neural network, is paired with an explainable model that approximates its predictions, such as a decision tree. As long as the predictions match, the Canary model's behavior can be used to provide human-understandable reasoning for the prediction.
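A minimal sketch of the Canary pairing, assuming scikit-learn and synthetic data (the model choices and sizes here are illustrative, not the authors' exact setup): the explainable canary is fitted to the primary model's predictions rather than the raw labels, so it approximates the primary's decision behavior, and agreement between the two is then measured.

```python
# Sketch: pair a non-explainable primary model (MLP) with an
# explainable canary (decision tree) and measure how often they agree.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for production training data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

primary = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                        random_state=0).fit(X, y)

# Fit the canary to the PRIMARY's predictions, not the raw labels,
# so it mimics the primary's decision surface.
canary = DecisionTreeClassifier(max_depth=4, random_state=0)
canary.fit(X, primary.predict(X))

# Fraction of inputs where canary and primary agree; while this stays
# high, the tree's rules serve as a human-readable explanation.
agreement = (canary.predict(X) == primary.predict(X)).mean()
print(f"prediction agreement: {agreement:.2f}")
```

When agreement drops, the canary's explanations can no longer be trusted as a proxy for the primary, which is exactly the production monitoring problem the rest of this deck addresses.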
1. Interpretability and Reproducibility in Production Machine Learning Applications
Sindhu Ghanta, Sriram Subramanian, Swaminathan Sundararaman, Lior Khermosh, Vinay Sridhar, Dulcardo Arteaga, Qianmei Luo, Dhananjoy Das, Nisha Talagala
swami@parallelm.com
3. ML in production
[Diagram: the ML lifecycle from the research sandbox into production, spanning deployment and operations; the data scientist receives predictions, errors, alerts, and warnings back from the production system.]
Check out https://mlops.org
4. Challenges
1) Interpretability of models is a requirement (i.e., model explainability)
• Growing regulatory requirements (SR11-7, OSFI-E23, etc.)
• Complex Data → Complex Models (e.g., Deep Learning models)
• Correlation ≠ Causality
2) Complex models require a “Canary” model for explainability
• Production data does not have labels and can change over time
• Models represent the learnings from the data that they were trained on
3) How to diagnose or recreate production issues?
• Complex dependencies, distribution and heterogeneity, changing state
• Not always possible to recreate the production state
5. Explainability in Production (Canary)
[Diagram: the primary model (complex) and the Canary model (simple) are each trained on the same features and labels.]
6. Explainability in Production (pred. comparison)
Train set Train
RMSE
Inference set
Periodic Flash Linear Constant Poisson
Periodic 0.029 0.029 0.43 0.4 0 0.25
Flash 0.01 0.3 0.01 0.62 0.11 0.77
Linear 0.08 0.5 0.19 0.08 0.62 0.04
Constant 0 0 0 0 0 0
Poisson 0036 0.03 0.01 0.23 1 0.037
Same load Different load
How do you know
that Canary is able to
explain primary
predictions in
production?
Primary: MLP Canary: Decision Tree
Compare predictions
TELCO Dataset
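The comparison in the table above can be sketched as a runtime check: compute the RMSE between primary and canary predictions over a production batch and alert when it exceeds a threshold. This is an illustrative sketch; the threshold value and batch handling are assumptions, not the authors' implementation.

```python
# Sketch: flag canary/primary divergence in production via RMSE
# over a batch of predictions.
import numpy as np

def prediction_rmse(primary_preds, canary_preds):
    """Root-mean-square error between two prediction vectors."""
    p = np.asarray(primary_preds, dtype=float)
    c = np.asarray(canary_preds, dtype=float)
    return float(np.sqrt(np.mean((p - c) ** 2)))

# Assumed threshold; in practice it would be calibrated from the
# train-time RMSE (the "Train RMSE" column in the table above).
THRESHOLD = 0.1

primary_batch = [0.9, 0.1, 0.8, 0.4]
canary_batch = [0.8, 0.2, 0.7, 0.4]
rmse = prediction_rmse(primary_batch, canary_batch)
if rmse > THRESHOLD:
    print(f"ALERT: canary diverged from primary (RMSE={rmse:.3f})")
```

In the table, divergence shows up exactly when the inference-time load differs from the training load (e.g., a Flash-trained pair scored on Poisson traffic), which is why the check must run continuously in production rather than once at training time.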
8. Reproducibility (Challenges & Requirements)
• Complex Dependencies
• Datasets, pipelines, schedule, user actions
• Distribution & Heterogeneity
• Running at different locations, libraries, languages, environments
• Changing temporal State
• Interdependent pipelines running on different schedules
• Newer models can impact the prediction results
9. Reproducibility (Timeline Capture)
• Built a system to capture the entire state of the application
• Datasets, pipelines, models, user actions, logs, events, environment, etc.
• Supports “Auto” & “After the fact” captures and “live” browsing of timelines
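To make the timeline-capture idea concrete, here is a hypothetical illustration (not the authors' actual system): application events such as dataset versions, model versions, and user actions are recorded with timestamps, then filtered ("browsed") or serialized for after-the-fact diagnosis.

```python
# Hypothetical sketch of timeline capture: record timestamped events
# describing application state so a production issue can be browsed
# and diagnosed later.
import json
import time

class Timeline:
    def __init__(self):
        self.events = []

    def capture(self, kind, **details):
        """Record one timestamped event (dataset, model, user action, ...)."""
        self.events.append({"ts": time.time(), "kind": kind,
                            "details": details})

    def browse(self, kind=None):
        """'Live' browsing: return captured events, optionally by kind."""
        return [e for e in self.events if kind is None or e["kind"] == kind]

    def dump(self):
        """Serialize the whole timeline for after-the-fact analysis."""
        return json.dumps(self.events)

tl = Timeline()
tl.capture("dataset", name="telco_v3")
tl.capture("model", name="mlp_primary", version=7)
tl.capture("user_action", user="alice", action="approve_model")
print(len(tl.browse("model")))  # → 1
```

A real system would also have to capture pipeline definitions, library versions, and environment state, since (as the previous slide notes) interdependent pipelines on different schedules make the production state hard to recreate otherwise.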
11. Conclusions
• Production ≠ Sandbox
• Maintaining interpretability in production deployments is challenging!
• Reproducibility is just the first step in this journey
• We built a system that enables reproducibility & explainability for complex ML environments
• Demonstrated using the Canary deployment use case