
Adopting software design practices for better machine learning


Jeff McGehee, Senior Data Scientist and IoT Practice Lead, Very


  1. Jeff McGehee, Data Scientist, IoT Practice Lead. The Process is Everything: an interpretation of Google’s Rules of Machine Learning.
  2. The Theory
  3. It’s easy to optimize your model loss, but it’s hard to optimize value delivered.
  4. Introducing SoDQoP
     Speed of Delivery:
     ● Time to market
     ● Average speed of new features (agility)
     Quality of Product:
     ● User experience
     ○ Reliability
     ○ Availability
     ○ Ease of use
     ○ Valuable features
  5. SoDQoP Over Time [chart: SoDQoP (0–100) vs. time]
  6. SoDQoP Over Time (cont’d) [chart: SoDQoP (0–100) vs. time]
  7. Aggregate Team SoDQoP [diagram: People, Tooling, Process; labels: GPU, Tensor Processing, Research First, Losers, Winners, Google’s Rules of ML]
  8. Process is the place where machine learning has the most room for improvement.
  9. Process
  10. From (Computer) Science to (Software) Engineering: Scientists make discoveries. Engineers develop predictable processes for aggregating and leveraging those discoveries in the world at large.
  11. Machine Learning Engineering: Build lean. Be clever with machine learning APIs. Make it easy to improve your model.
  12. Build Lean
      In General:
      ● Understand what you’re measuring, and why.
      ● Fail fast.
      ● Take “ship early and ship often” seriously. Don’t waste time chasing a few percentage points of accuracy on a feature that users haven’t even been exposed to yet.
      Things We Do:
      ● Two weeks or less to validate ML as a viable solution.*
      ○ Jupyter, Pandas, scikit-learn, TensorFlow, Keras, Matplotlib/Seaborn.
      ● If you haven’t failed, ship what you have.
      ○ Serverless framework (AWS Lambda, S3, SageMaker, Batch).
      ● Iterate (rapidly) as needed.*
      *Should be led by “Understand what you’re measuring, and why.”
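The “two weeks or less” viability check above can be sketched with scikit-learn: fit a cheap baseline and cross-validate before investing in tuning. The synthetic dataset and the go/no-go threshold below are illustrative assumptions, not from the talk.

```python
# Minimal sketch of a fast "is ML viable here?" check using scikit-learn.
# The synthetic data stands in for a real labeled dataset; the 0.70
# threshold is an illustrative go/no-go criterion.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A simple baseline model: if this can't beat the bar convincingly,
# fail fast and rethink before spending weeks on accuracy chasing.
baseline = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(baseline, X, y, cv=5, scoring="accuracy")

print(f"cv accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
viable = scores.mean() > 0.70  # illustrative threshold
```

The point is the shape of the loop, not the model choice: a cross-validated baseline gives an honest estimate in minutes, which is what makes “ship what you have” a defensible decision.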
  13. Be Clever with ML APIs
      In General:
      ● Don’t reinvent the wheel.
      ● Have a deep understanding of accuracy requirements.
      ● Build custom solutions where they will have the highest impact.
      Things We Do:
      ● Wrap “noisy” API models with a Bayesian inference engine tuned to improve the desired accuracy metric.
      ● Obtain features from API models (e.g. object detection), and pass those to a final model.
  14. Make it easy to improve your model(s)
      In General:
      ● Record your predictions, along with ground truth (if possible).
      ● Build features that allow users to label or correct predictions.
      ● Collect data that might be leveraged for future models.
      Things We Do:
      ● AWS Lambda endpoints for receiving data related to model feedback.
      ● High test coverage to facilitate agility around changing the model.
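The “Lambda endpoint for model feedback” pattern might look roughly like the handler sketch below. The event schema and the in-memory log (standing in for S3 or a database) are assumptions for illustration, not the talk’s actual implementation.

```python
import json
import time

# Sketch of a feedback-collection endpoint in the style of an AWS Lambda
# handler. FEEDBACK_LOG stands in for a durable store (e.g. S3); the
# body fields are an assumed schema.
FEEDBACK_LOG = []

def handler(event, context=None):
    body = json.loads(event["body"])
    record = {
        "prediction_id": body["prediction_id"],
        "predicted": body["predicted"],
        "ground_truth": body.get("ground_truth"),  # may arrive later
        "received_at": time.time(),
    }
    FEEDBACK_LOG.append(record)  # real code: persist to S3 / a DB
    return {"statusCode": 200, "body": json.dumps({"stored": True})}

# Example call, shaped like an API Gateway proxy event:
response = handler({"body": json.dumps({
    "prediction_id": "abc123",
    "predicted": "cat",
    "ground_truth": "dog",
})})
```

Recording the prediction alongside its eventual ground truth is what turns user corrections into labeled training data, which is the whole point of the slide.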
  15. Questions?