Challenges and strategies in bringing AI models to production

David Qixiang Chen, PhD
Co-founder, CTO, Director of AI

(Watakabe et al, 2014)
Biomedical research is impossible without biological products

Reproducibility Crisis
Blame it on the antibodies
(Nature 2015)

Observation
Hypothesis
Experiment
Analyze
Theory
■ 2-6 months wasted per project
■ 2M wasted funding
■ 50% of products fail
Sources: “Blame it on the Antibodies”, Nature 2015; “Reproducibility: Standardize antibodies used in research”, Nature 2015

Software Ate The World
Top 10 Most Valuable Companies, 2019 Q1 (market value in USD millions; source: Wikipedia)
MICROSOFT: 904,860
APPLE: 895,670
AMAZON: 874,710
ALPHABET: 818,160
BERKSHIRE HATHAWAY: 493,750
FACEBOOK: 475,730
ALIBABA: 472,940
TENCENT: 440,980
JOHNSON & JOHNSON: 372,230
EXXONMOBIL: 342,170
Chart legend: ■ Software and IT ■ Consumer and Media ■ Finances/banking

What is “Software” Anyway
“Traditional” Computing
■ Deterministic
■ Linear Models
“Artificial Intelligence”
■ Probabilistic
■ Non-linear models

Why Not Biomed
■ Nature is not deterministic
■ Decisions are not clear cut
■ Independent from IT and computing
[Diagram labels: Traditional Computing; Biology; Medical]

[Diagram labels: All Problems; Solvable Tasks; Humans; AI; Trad Compute; AlphaGo; Biology; Medical]

We Are Here
[Diagram labels: All Problems; Humans; AI; Trad Compute; AlphaGo; BenchSci; Biology; Medical]

Challenges of ML Engineering
■ Model Code Organization
■ Data dependency
■ Data/Model Drift
■ Model co-dependency

The New Wall Of Confusion
ML Scientists: “Here’s the BERT on GCN, got accuracy to 99%. Can you deploy it?”
Software Engineers: “What’s a Tensor? Can I npm it?”

The Engineering in ML
■ ML engineering is more than fine-tuning training metrics
■ Run-time efficiency
■ Coding structure for extensibility
■ Deployment scaling
■ Good ML engineers are good software engineers first

Model Code Structure
■ NLP and image tasks often require transforming input data
■ Data transformation at run-time is expensive
■ Model classes should not include this preprocessing logic
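
One way to picture this separation is the minimal sketch below. Everything in it is hypothetical and chosen for illustration: `build_vocab`, `preprocess`, and `BagOfIdsModel` are not from the talk. The point is only that the text-to-ids step runs once, ahead of time, and the model sees numeric input only.

```python
from typing import Dict, List


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each distinct lowercase token to an integer id (0 is reserved for unknown)."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def preprocess(text: str, vocab: Dict[str, int]) -> List[int]:
    """Text-to-ids step, kept outside the model so it can run once and be cached."""
    return [vocab.get(token, 0) for token in text.lower().split()]


class BagOfIdsModel:
    """Toy model (an assumption): one weight per token id, numeric input only."""

    def __init__(self, weights: Dict[int, float]) -> None:
        self.weights = weights

    def forward(self, token_ids: List[int]) -> float:
        return sum(self.weights.get(i, 0.0) for i in token_ids)


texts = ["Antibody binds target", "antibody fails"]
vocab = build_vocab(texts)
cached_inputs = [preprocess(t, vocab) for t in texts]  # done once, ahead of time
model = BagOfIdsModel({vocab["binds"]: 1.0, vocab["fails"]: -1.0})
scores = [model.forward(ids) for ids in cached_inputs]
```

Because the model never parses text, the cached ids can be reused across training runs and the forward pass stays cheap at inference time.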

Use Classes To Encapsulate Models
■ Do use classes to encapsulate model training/prediction and model definitions
■ Separate training and prediction from the model
■ Don’t rely on ad-hoc linear scripts that do everything within a single file
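
A minimal sketch of this layout, in plain Python rather than any particular framework. The class names `LinearModel`, `Trainer`, and `Predictor`, the learning rate, and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple


class LinearModel:
    """Model definition only: parameters and the forward computation."""

    def __init__(self) -> None:
        self.w = 0.0
        self.b = 0.0

    def forward(self, x: float) -> float:
        return self.w * x + self.b


class Trainer:
    """Training logic lives here, not in the model."""

    def __init__(self, model: LinearModel, lr: float = 0.05) -> None:
        self.model = model
        self.lr = lr

    def fit(self, data: List[Tuple[float, float]], epochs: int = 500) -> None:
        for _ in range(epochs):
            for x, y in data:
                error = self.model.forward(x) - y  # gradient of 0.5 * error**2
                self.model.w -= self.lr * error * x
                self.model.b -= self.lr * error


class Predictor:
    """Inference wrapper: only needs a trained model."""

    def __init__(self, model: LinearModel) -> None:
        self.model = model

    def predict(self, xs: List[float]) -> List[float]:
        return [self.model.forward(x) for x in xs]


model = LinearModel()
Trainer(model).fit([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])  # toy data on y = 2x + 1
predictions = Predictor(model).predict([3.0])
```

With this split, the `Trainer` can be replaced or dropped at deployment time without touching the model definition.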

Separate Forward and Loss
■ Separate model forward computation and loss calculation
■ Optimizer and loss can change often during R&D
■ The forward function will be reused for inference, so it needs to be as efficient as possible
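
The sketch below illustrates the idea under simple assumptions: a loss is any function returning its value and its derivative with respect to the prediction, so losses can be swapped without editing the model. `ScaleModel`, `train`, and both loss functions are hypothetical names, not code from the talk.

```python
from typing import Callable, List, Tuple

# A loss returns (value, d_value/d_prediction) so the trainer can apply the chain rule.
Loss = Callable[[float, float], Tuple[float, float]]


def squared_error(pred: float, target: float) -> Tuple[float, float]:
    diff = pred - target
    return 0.5 * diff * diff, diff


def absolute_error(pred: float, target: float) -> Tuple[float, float]:
    diff = pred - target
    sign = (diff > 0) - (diff < 0)
    return abs(diff), float(sign)


class ScaleModel:
    """Forward pass only; knows nothing about losses or optimizers."""

    def __init__(self, w: float = 0.0) -> None:
        self.w = w

    def forward(self, x: float) -> float:
        return self.w * x


def train(model: ScaleModel, data: List[Tuple[float, float]], loss_fn: Loss,
          lr: float, epochs: int) -> None:
    for _ in range(epochs):
        for x, y in data:
            _, dloss_dpred = loss_fn(model.forward(x), y)
            model.w -= lr * dloss_dpred * x  # d_pred/d_w = x


data = [(1.0, 3.0), (2.0, 6.0)]  # toy data on y = 3x
model_a, model_b = ScaleModel(), ScaleModel()
train(model_a, data, squared_error, lr=0.1, epochs=200)
train(model_b, data, absolute_error, lr=0.01, epochs=200)
```

Both runs call the same `forward`, and that same method is what an inference service would call, with no loss code on the path.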

Separate Batching and Single Compute
■ The model assumes tensor I/O only; do not include batching logic within a model
■ In TensorFlow and PyTorch, the data loader is a separate class that can include preprocessing logic and outputs an input batch
■ The data loader should be included in the training class, not the model definition
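
A framework-free sketch of the same division of labor follows. `BatchLoader` and `DoublingModel` are hypothetical stand-ins; in PyTorch the loader role is normally played by `torch.utils.data.DataLoader`, and in TensorFlow by `tf.data.Dataset`.

```python
from typing import Callable, Iterator, List, Sequence


class BatchLoader:
    """Applies preprocessing and groups samples into fixed-size batches."""

    def __init__(self, samples: Sequence[str], preprocess: Callable[[str], float],
                 batch_size: int) -> None:
        self.samples = samples
        self.preprocess = preprocess
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[List[float]]:
        for start in range(0, len(self.samples), self.batch_size):
            chunk = self.samples[start:start + self.batch_size]
            yield [self.preprocess(s) for s in chunk]


class DoublingModel:
    """Stand-in model: numeric batch in, numeric batch out. No batching or parsing logic."""

    def forward(self, batch: List[float]) -> List[float]:
        return [2.0 * x for x in batch]


loader = BatchLoader(["1", "2", "3", "4", "5"], preprocess=float, batch_size=2)
model = DoublingModel()
outputs = [model.forward(batch) for batch in loader]
```

The model works unchanged whether it receives a full batch, a short final batch, or a single-element batch at inference time.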

Data Dependency
■ Source control (Git) tracks stateless logic changes as code
■ ML systems are stateful: their behavior depends on data
[Diagram labels: ML Code; Git; Training Data; Model Weights; Inference Data; Prediction]

Data/Model Drift
“It’s not that I don’t understand, the world changes too fast”
– Cui Jian
■ Model captures training data assumptions
■ If the input changes, the model will break down
■ 1. Data format contract (e.g. strings instead of numbers)
■ 2. Data input distribution (Here be dragons)
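
The first failure mode, a broken data format contract, is the easier one to catch. A minimal sketch, assuming a hypothetical contract with made-up field names `concentration` and `species`:

```python
from typing import Any, Dict, List

# Hypothetical contract (an assumption for illustration): field name -> required type.
CONTRACT = {"concentration": float, "species": str}


def contract_violations(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means the record is acceptable."""
    problems = []
    for field, expected in CONTRACT.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


good = {"concentration": 0.5, "species": "mouse"}
bad = {"concentration": "0.5", "species": "mouse"}  # string instead of number

good_problems = contract_violations(good)
bad_problems = contract_violations(bad)
```

Running such a check at the inference boundary turns a silent wrong prediction into an explicit rejection. The second failure mode, a shift in the input distribution, passes this check untouched, which is why it is the harder of the two.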

But Why?
[Diagram labels: World; Sensor; Model; Action; Input & Labels; Prediction]

Data Dependency
■ Need to track Input Distribution assumptions
■ Metadata about the input distribution should be captured with the model weights
[Diagram labels: Meta; Distribution Monitor; ML Code; Training Data; Model Weights; Inference Data; Prediction]
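
One possible shape for this, sketched with the standard library only. The bundle layout, the function names, and the threshold of 2.0 standard deviations are all assumptions for illustration; a real monitor would likely compare full distributions, not just the mean.

```python
import json
import statistics
from typing import Dict, List


def make_bundle(weights: List[float], training_inputs: List[float]) -> str:
    """Serialize weights together with summary statistics of the training inputs."""
    meta = {
        "input_mean": statistics.fmean(training_inputs),
        "input_stdev": statistics.stdev(training_inputs),
    }
    return json.dumps({"weights": weights, "meta": meta})


def drift_score(bundle_json: str, live_inputs: List[float]) -> float:
    """Shift of the live input mean, in units of the training standard deviation."""
    meta: Dict[str, float] = json.loads(bundle_json)["meta"]
    live_mean = statistics.fmean(live_inputs)
    return abs(live_mean - meta["input_mean"]) / meta["input_stdev"]


def has_drifted(bundle_json: str, live_inputs: List[float],
                threshold: float = 2.0) -> bool:
    # The threshold is an arbitrary illustrative choice, not a recommendation.
    return drift_score(bundle_json, live_inputs) > threshold


bundle = make_bundle(weights=[0.3, -1.2], training_inputs=[1.0, 2.0, 3.0, 4.0, 5.0])
stable = has_drifted(bundle, [2.5, 3.5, 3.0])
shifted = has_drifted(bundle, [9.0, 10.0, 11.0])
```

Because the statistics travel in the same artifact as the weights, whoever deploys the model also deploys the assumptions it was trained under.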

Model Co-dependency
[Diagram: four copies of the ML Code / Training Data / Model Weights unit, with a single Inference Data input and a single Prediction output]

ML systems grow together
■ Real-world systems are composites of many ML deployments
■ A single end-to-end model is not realistic
■ Multiple models are intimately linked by data distribution dependency
■ A change in a top-level model's output distribution will cause failure cascades
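
A toy illustration of such a cascade, and of how recording each stage's input assumption can localize it. The `Stage` class, the stage names, and the ranges are hypothetical; the upstream "retrain" is simulated by changing the output scale.

```python
from typing import Callable, List, Tuple


class Stage:
    """One deployed model plus the input range it was trained on."""

    def __init__(self, name: str, fn: Callable[[float], float],
                 expected_range: Tuple[float, float]) -> None:
        self.name = name
        self.fn = fn
        self.expected_range = expected_range

    def in_range(self, x: float) -> bool:
        low, high = self.expected_range
        return low <= x <= high


def run_pipeline(stages: List[Stage], x: float) -> Tuple[float, List[str]]:
    """Run stages in order and report every stage whose input violated its assumption."""
    warnings = []
    for stage in stages:
        if not stage.in_range(x):
            warnings.append(stage.name)
        x = stage.fn(x)
    return x, warnings


# Hypothetical chain: the ranker was trained on upstream scores in [0, 1].
extractor_v1 = Stage("extractor", lambda x: x / 10.0, (0.0, 10.0))
extractor_v2 = Stage("extractor", lambda x: x * 10.0, (0.0, 10.0))  # new output scale
ranker = Stage("ranker", lambda s: 1.0 if s > 0.5 else 0.0, (0.0, 1.0))

ok_output, ok_warnings = run_pipeline([extractor_v1, ranker], 4.0)
bad_output, bad_warnings = run_pipeline([extractor_v2, ranker], 4.0)
```

Here the ranker's code and weights never changed, yet its prediction flips once the upstream output scale changes. Only the recorded range makes the cause visible.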

Observation from Neuroscience
(Kuner and Flor, 2016)

ML Systems as Single Entity
[Diagram: three copies of the ML Code / Training Data / Model Weights unit between Inference Data and Prediction, with Meta and a Distribution Monitor]

New Strategy Is Needed
■ Combine modular system design with awareness of ML system dependencies
■ Current coding practice only solves part of the problem
■ Better tools are needed to track multiple ML systems based on distribution analysis
■ Rethink engineering roles and organization

Conclusion
■ ML data dependency poses challenges at all levels of system engineering
■ ML system reliability is particularly critical in biomedical domains
■ ML deployment is a different beast from ML R&D
■ ML engineers will require a wider range of expertise
