In this talk, David Aronchick, co-founder of Kubeflow and Microsoft's Head of Open Source ML, talks about designing reproducible and reliable ML pipelines. He speaks about the importance and impact of MLOps and use of metadata in pipelines. He also talks about a library he wrote to help with this problem, MLSpecLib.
**These slides are from a talk given at Rsqrd AI. Learn more at rsqrdai.org**
6. Building a model
Data ingestion
Data analysis
Data transformation
Data validation
Data splitting
Trainer
Model validation
Training at scale
Logging
Roll-out
Serving
Monitoring
7. Ok, but, like, I’m a data scientist. I don’t care about all that.
10. Cowboys and Ranchers Can Be Friends!
Data Scientist
• Quick iteration
• Frameworks they understand
• Best of breed tools
• No management headaches
• Unlimited scale
SRE/ML Engineers
• Reuse of tooling and platforms
• Corporate compliance
• Observability
• Uptime
12. MLOps = ML + DEV + OPS
Experiment (ML): Business Understanding, Data Acquisition, Initial Modeling
Develop (DEV): Modeling + Testing, Continuous Integration, Continuous Deployment
Operate (OPS): Continuous Delivery, Data Feedback Loop, System + Model Monitoring
17. A Small Example of Issues You Can Have…
• Inappropriate HW/SW stack
• Mismatched driver versions
• Crash looping deployment
• Data/model versioning [Nick Walsh]
• Non-standard images/OS version
• Pre-processing code doesn’t match production pre-processing
• Production data doesn’t match training/test data
• Output of the model doesn’t match application expectations
• Hand-coded heuristics better than model [Adam Laiacano]
• Model freshness (train on out-of-date data/input shape changed)
• Test/production statistics/population shape skew
• Overfitting on training/test data
• Bias introduction (or not tested)
• Over/under HW provisioning
• Latency issues
Or It Just Doesn’t Work! At All!
• Permissions/certs
• Failure to obey health checks
• Killed production model before roll out of new/in wrong order
• Thundering herd for new model
• Logging to the wrong location
• Storage for model not allocated properly/accessible by deployment tooling
• Route to artifacts not available for download
• API signature changes not propagated/expected
• Cross-data center latency
• Expected benefit doesn’t materialize (e.g. multiple components in the app change simultaneously)
• Get wrong/no traffic because A/B config didn’t roll out
• No CI/CD; manual changes untracked [Jon Peck]
• Get too much traffic too soon (expected to canary/exponential roll out)
• Outliers not predicted [MikeBSilverman]
• Change was a good change, but didn’t communicate with the rest of the team (so you must roll back)
• No dates! (date to measure impact/improvement against a pre-agreed measure; date scheduled to assess data changes) [Mary Branscombe]
• LACK OF DOCUMENTATION!! (the problem, the testing, the solution, lots more) [Terry Christiani]
• Successful model causes pain elsewhere in the organization (e.g. detecting faults previously missed) [Mark Round]
• Lack of visibility into real-time model behavior (detecting data drift, live data distribution vs train data, etc) [Nick Walsh]
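Several items on this slide (production data not matching training data, statistics skew, data drift) come down to comparing summary statistics of a feature at training time against what the model sees live. The following is a minimal sketch of that idea, not something shown in the talk: the function names, the `income` feature, and the threshold of 3 standard deviations are all illustrative assumptions.

```python
import statistics

def feature_stats(values):
    """Return the mean and population standard deviation of a numeric feature."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}

def mean_shift(train_values, live_values):
    """Distance of the live mean from the training mean, in training std units."""
    train = feature_stats(train_values)
    live = feature_stats(live_values)
    if train["std"] == 0:
        # A constant training feature: any change in the mean counts as a shift.
        return 0.0 if live["mean"] == train["mean"] else float("inf")
    return abs(live["mean"] - train["mean"]) / train["std"]

def has_skew(train_values, live_values, threshold=3.0):
    """Flag the feature when the live mean drifts past the (assumed) threshold."""
    return mean_shift(train_values, live_values) > threshold

# Hypothetical feature values (thousands of dollars), for illustration only.
train_income = [40.0, 50.0, 60.0, 50.0]
live_income = [90.0, 110.0, 100.0, 100.0]

print(has_skew(train_income, live_income))   # live data has drifted
print(has_skew(train_income, train_income))  # identical data, no drift
```

A mean comparison is the simplest possible check; storing the training statistics as metadata is what makes the comparison possible at all once the model is in production.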
18. Does My Model Actually Work?
SRE/ML Engineers
Data Scientist
Laptop
The Cloud
Source Control
Automated Validation & Profiling
Package For Rollout
Explain Model & Look for Bias
Clean/Minimize Code
Sane Deployment
Nice. Nice.
✔
21. MLOps is a Platform and a Philosophy
Even if:
• Every data scientist trained...
• And you had all the tools necessary...
• And they all worked together...
• And your SREs understood ML modeling...
• And and and and ...
You’d still need a permanent, repeatable record of what you did
23. Does My Model Actually Work?
SRE/ML Engineers
Data Scientist
Laptop
The Cloud
Source Control
Automated Validation & Profiling
Package For Rollout
Explain Model & Look for Bias
Clean/Minimize Code
Sane Deployment
Nice. Nice.
✔
What goes here?
25. Metadata is ...
1. A contract for the interface of a service
2. A historical record of the outcome of a process
3. Structured data that allows for (more) reliable automated workflows
4. And much much more...
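To make the "contract for the interface of a service" idea concrete, here is a minimal sketch using only the Python standard library. It is not taken from the talk: `ModelInterface`, `check_request`, and the loan-related field names are hypothetical, chosen to match the loan example used later in the deck.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelInterface:
    """Hypothetical contract describing what a model server accepts and returns."""
    model_name: str
    input_features: tuple
    output_field: str

def check_request(contract, request):
    """Return a list of contract violations for a request dict (empty if valid)."""
    expected = set(contract.input_features)
    received = set(request)
    problems = [f"missing feature: {name}" for name in sorted(expected - received)]
    problems += [f"unexpected feature: {name}" for name in sorted(received - expected)]
    return problems

contract = ModelInterface("loan_approval", ("income", "loan_amount"), "approved")

ok = check_request(contract, {"income": 50000, "loan_amount": 10000})
bad = check_request(contract, {"income": 50000, "zip_code": "98101"})

print(ok)   # no violations
print(bad)  # one missing and one unexpected feature
```

Because the contract is structured data rather than tribal knowledge, the same record can drive automated checks before deployment and serve as documentation afterwards.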
26. Does My Model Actually Work?
SRE/ML Engineers
Data Scientist
Laptop
The Cloud
Source Control
Automated Validation & Profiling
Package For Rollout
Explain Model & Look for Bias
Clean/Minimize Code
Sane Deployment
Nice. Nice.
✔
28. What Did My Customers See?
SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
I’d like a loan, please.
Source Control
29. What Did My Customers See?
SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
No.
Source Control
30. What Did My Customers See?
SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
Ok, but why?
Source Control
31. What Did My Customers See?
SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
Source Control
Uh oh.
Lawyer (repeated 18 times across the slide)
32. It’s Not Just About Explainability!
• Yes, models are complicated
• But, that’s not enough:
• What data did you train on?
• How did you transform/exclude outliers?
• What are the data statistics?
• Did anything change between code and production?
• What model did you actually serve (to this person)?
• Metadata can help!
33. What Did My Customers See?
SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
Source Control
Automated Validation & Profiling
Package For Rollout
Explain Model & Look for Bias
Clean/Minimize Code
Sane Deployment
34. What Did My Customers See?
32c04681d7573
SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
Source Control
Automated Validation & Profiling
Package For Rollout
Explain Model & Look for Bias
Clean/Minimize Code
Sane Deployment
Immutable Metadata Store
b151f8e65b32a c7f4e7607b4b7 0ef1d58921d89 e2e1e994c4251 786c8e57a6d51 9ce88802f0759
32c04681d7573
35. What Did My Customers See?
SRE/ML Engineers
The Cloud
Front End
Model Server
Customer
Source Control
Automated Validation & Profiling
Package For Rollout
Explain Model & Look for Bias
Clean/Minimize Code
Sane Deployment
Immutable Metadata Store
b151f8e65b32a c7f4e7607b4b7 0ef1d58921d89 e2e1e994c4251 786c8e57a6d51 9ce88802f0759
32c04681d7573
Why didn’t I get a loan?
32c04681d7573
36. What Did My Customers See?
SRE/ML Engineers
Front End
Model Server
Customer
Immutable Metadata Store
32c04681d7573
32c04681d7573
Automated Validation & Profiling
Package For Rollout
Explain Model & Look for Bias
Clean/Minimize Code
Sane Deployment
The Cloud
Source Control
b151f8e65b32a c7f4e7607b4b7 0ef1d58921d89 e2e1e994c4251 786c8e57a6d51 9ce88802f0759
32c04681d7573
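The slides above show each pipeline stage writing to an immutable metadata store, with an ID such as 32c04681d7573 travelling with the served model so a customer's question can be traced back to a record. One way to sketch this is a content-addressed store. This is an assumption-laden illustration, not the talk's implementation: the slides do not say how IDs are generated, so SHA-256 truncated to 13 hex characters is used here only to mirror the look of the IDs, and `MetadataStore` and the record fields are hypothetical.

```python
import hashlib
import json

class MetadataStore:
    """Append-only, in-memory store keyed by the hash of each record."""

    def __init__(self):
        self._records = {}

    def put(self, record):
        # Canonical JSON so identical records always hash to the same ID.
        payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
        # Truncated for readability; a real store would likely keep the full digest.
        record_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:13]
        # Existing entries are never overwritten.
        self._records.setdefault(record_id, payload)
        return record_id

    def get(self, record_id):
        # Decode a fresh copy so callers cannot mutate the stored record.
        return json.loads(self._records[record_id])

store = MetadataStore()
model_id = store.put({
    "model": "loan_approval",
    "training_data": "loans-2020-01.csv",  # hypothetical dataset name
    "fairness_tests": "passed",
})

# The model server would attach model_id to every response it sends.
response = {"decision": "No.", "model_id": model_id}

# Later, "Why didn't I get a loan?" starts from the ID on the response.
served_with = store.get(response["model_id"])
print(served_with["training_data"])
```

Keying records by their content hash means a record cannot be silently edited: any change produces a different ID, which is what makes the store usable as a historical record.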
37. Metadata Gives You a Repeatable Record
• What data you trained on
• How you transformed it for training
• What the results of the training were
• What kind of fairness tests you ran
• How those results compared with previous results
• How you rolled it out
• Which version a customer saw
• And, and, and ...
All Automatically!
(Mostly)
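The "All Automatically! (Mostly)" point can be sketched as a decorator that records metadata each time a pipeline step runs, so nobody has to remember to write it down. This is an illustrative sketch rather than anything from the talk: `RUN_LOG`, `fingerprint`, `recorded`, and the toy steps are hypothetical names, and a real pipeline would persist the log instead of keeping it in memory.

```python
import functools
import hashlib
import json
import time

RUN_LOG = []  # hypothetical in-memory record of every step that ran

def fingerprint(value):
    """Short, stable hash of any JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:13]

def recorded(step):
    """Decorator that appends a metadata entry every time a pipeline step runs."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        started = time.time()
        result = step(*args, **kwargs)
        RUN_LOG.append({
            "step": step.__name__,
            "inputs": fingerprint({"args": args, "kwargs": kwargs}),
            "output": fingerprint(result),
            "seconds": time.time() - started,
        })
        return result
    return wrapper

@recorded
def drop_outliers(rows, max_income):
    """Transformation step: exclude rows above an income cutoff."""
    return [row for row in rows if row["income"] <= max_income]

@recorded
def train(rows):
    """Stand-in for real training: the 'model' is just the mean income."""
    return {"mean_income": sum(row["income"] for row in rows) / len(rows)}

rows = [{"income": 40}, {"income": 60}, {"income": 900}]
model = train(drop_outliers(rows, max_income=100))

for entry in RUN_LOG:
    print(entry["step"], entry["inputs"], entry["output"])
```

The "(Mostly)" caveat applies here too: the decorator captures what ran and on what, but choices such as which fairness tests to run still have to be made and recorded deliberately.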
38. Ok, but you can’t possibly expect me to use YAML.
39. Introducing MLSpecLib
A simple, Python-native library for working with schematized objects
• Extends marshmallow (minimum rewriting)
• Comes with some standard schemas in the box
• It started with ML but it works for anything
But wait there’s more!
• Read/write serialized objects natively with Python (using dot notation and everything) - No YAML! No JSON!
• User friendly, trivially extensible schema language - including importing from a remote store
• “Lazy” enforcement (at load/save time only)
• Code-gen for the REALLY lazy (like me)
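The two ideas on this slide that benefit most from an example are dot-notation access and "lazy" enforcement at load/save time. The sketch below illustrates both with the standard library only. It is explicitly not MLSpecLib's API, which the slides do not show: `DATA_VERSION_SCHEMA`, `validate`, `load`, and `save` are hypothetical names, and JSON stands in for whatever serialization format sits underneath.

```python
import json
from types import SimpleNamespace

# Hypothetical schema: field name -> required Python type.
DATA_VERSION_SCHEMA = {"name": str, "rows": int, "source_uri": str}

def validate(obj, schema):
    """Raise ValueError if obj is missing a field or a field has the wrong type."""
    for field, expected in schema.items():
        if not hasattr(obj, field):
            raise ValueError(f"missing field: {field}")
        if not isinstance(getattr(obj, field), expected):
            raise ValueError(f"{field} must be {expected.__name__}")

def load(text, schema):
    """Parse serialized text into a dot-notation object, validating on load."""
    obj = SimpleNamespace(**json.loads(text))
    validate(obj, schema)
    return obj

def save(obj, schema):
    """Validate on save, then serialize."""
    validate(obj, schema)
    return json.dumps(vars(obj), sort_keys=True)

record = load(
    '{"name": "loans", "rows": 1000, "source_uri": "s3://example/loans.csv"}',
    DATA_VERSION_SCHEMA,
)

record.rows = "lots"  # allowed: nothing is checked on assignment
try:
    save(record, DATA_VERSION_SCHEMA)
    save_error = None
except ValueError as err:
    save_error = str(err)  # enforcement happens here, at save time

record.rows = 1200
saved = save(record, DATA_VERSION_SCHEMA)
print(save_error)
print(saved)
```

Checking only at the load and save boundaries keeps the object pleasant to work with in between, while still guaranteeing that whatever reaches the metadata store conforms to the schema.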