ML IN PRODUCTION
Serverless and Painless
Oliver Gindele
@tinyoli
oliver@datatonic.com
22.11.2019 ODSC London
Who is Oliver?
+ Head of Machine Learning
+ PhD in computational physics
Who is datatonic?
We are a strong team of data scientists, machine learning
experts, software engineers and mathematicians.
Our mission is to provide tailor-made systems to help your
organization get smart actionable insights from large data
volumes.
Why is moving models
into production hard?
ML Project Life Cycle
Start a new ML project, then move through four phases: Discover, Model, Build, Deploy
1. Define ML use cases: define specific ML use cases for the project
2. Data exploration: perform exploratory analysis to understand the data
3. Select algorithm: choose the right ML algorithm for the task
4. Data pipeline & feature engineering: create the right features from raw data for the ML task
5. Build ML model: develop the first iteration of the ML model
6. Iterate ML model: refine the ML model to improve performance and efficacy
7. Present results: present results of the model in a way that demonstrates its value to stakeholders
8. Plan for deployment: prepare for deployment in production
9. Operationalize model: deploy and operationalize ML model in production
10. Monitor model: monitor deployed ML model and retrain or rebuild when performance degrades
The hard part!
Needs SE/DevOps skills
The Model
+ MobileNetV2
+ Lots of image augmentation, careful data selection
+ Transfer learning → retrain top layers
+ Small changes to overall architecture
+ Hyperparameter tuning
→ F1 score: 45% → 97%
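For reference, the F1 score quoted above is the harmonic mean of precision and recall. The sketch below assumes a single positive class and uses made-up counts purely for illustration; the slides do not say how the score was averaged across classes, and `f1_score` is a hypothetical helper, not code from the project.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts: harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only, not the project's real confusion matrix.
print(f"F1 = {f1_score(tp=90, fp=10, fn=10):.2f}")
```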
We want:
+ Quick development cycles 🚀
+ Continuous delivery 🔁
+ Ship ML models like “normal”
software 📦
+ As much automation as possible 🤖
Why MLOps?
+ Orchestration of multiple
pipelines
+ Scalable ML Applications
+ Flexible, self-serve R&D
+ Reliable APIs
+ Ongoing data validation
+ Monitoring and validation of
ML predictions
+ Model governance/versioning
+ Continuous Integration and
Deployment
Avoid learning all
this!
The Landscape (a sample)
+ TensorFlow Extended (TFX)
+ Machine-learning Studio
+ AWS SageMaker
+ GCP AI Platform
Current issues with data science platforms
(generalisations, 1/2)
+ Incomplete, missing features
+ Not stable yet, still maturing (and changing!)
+ New custom languages/APIs to learn
+ Vendor Lock-in
Current issues with data science platforms
(generalisations, 2/2)
+ Requires non DS/ML skillset (Docker, Spark, K8s)
+ Experimentation can’t be contained to a platform
+ Data access/data silos not solved
+ Focus on DS/ML but not business value
What now? (2019 View)
Check if one of these tools ticks all your boxes
→ If not: roll your own tailored solution - It’s not that hard
→ Serverless and managed options are your friend!
Google Cloud Platform example tools:
+ Composer: managed Airflow (orchestration)
+ BigQuery: fully managed data warehouse
+ GCS: blob storage
+ AI Platform: managed training, deployment, notebooks
+ Dataflow: managed ETL
Let’s Build a Machine Learning Pipeline
for Computer Vision
🖼 Preprocess images
⚙ Automate model training
and evaluation
🔬 Monitor and track model
performance
🏷 Store and version models
Training images → ❓ → .tflite model → ❓ → Android and iOS apps
Even more challenges on mobile
🏎 Processor speed
📦 Storage space
🔋 Power
⏰ Latency
📴 Disconnections
📡 Bandwidth
Pipeline Overview
Data preparation → Machine Learning
Data preparation
+ Image Upload (GCS)
+ Convert images to TFRecords (Dataflow)
+ Data augmentation (AI Platform)
+ Train/Eval Sets (GCS)
Convert images to TFRecords
🏃 Dataflow is a serverless
runner for Beam pipelines
🌬 Converted 50k jpeg
training images in 10
minutes
🖼 TFRecords are serialized
images stored in a set of
files
🐍 Batch data processing in
Apache Beam (Python)
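"Stored in a set of files" means the records are spread over several shards so that workers can write and read them in parallel. The sketch below only illustrates that idea in plain Python: it assigns each image path to a shard deterministically and builds a file name in the common `-SSSSS-of-NNNNN` style. In a real Beam pipeline the TFRecord sink normally takes care of sharding; `shard_name`, the bucket path and the shard count here are hypothetical.

```python
import zlib


def shard_name(image_path: str, num_shards: int, split: str = "train") -> str:
    """Map an image path to a shard file name, deterministically."""
    # crc32 is stable across runs and machines, unlike Python's built-in hash().
    shard = zlib.crc32(image_path.encode("utf-8")) % num_shards
    return f"{split}-{shard:05d}-of-{num_shards:05d}.tfrecord"


# Hypothetical input paths; the same path always lands in the same shard.
for path in ["gs://my-bucket/images/cat_001.jpg", "gs://my-bucket/images/dog_042.jpg"]:
    print(path, "->", shard_name(path, num_shards=10))
```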
Machine Learning
+ Model training & evaluation (AI Platform)
+ Convert model to TFLite (AI Platform)
+ Store evaluation metrics (BigQuery)
+ Model versions (GCS)
Train, evaluate and deploy model
📈 Serverless scalable model training with AI Platform
🔀 Distributed training & hyperparameter tuning
♻ TensorFlow code can run on AI Platform with no change
🔗 Easily attach GPUs and TPUs
📲 Convert to .tflite (3.5 MB with full integer quantization)
Quantization
https://heartbeat.fritz.ai/8-bit-quantization-and-tensorflow-lite-speeding-up-mobile-inference-with-low-precision-a882dfcafbbd
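A minimal sketch of the idea behind 8-bit quantization, assuming the common affine scheme in which a real value is approximated as scale × (q − zero_point) with q stored as an 8-bit integer. It is plain Python for illustration and is not the TFLite converter; all function names are hypothetical.

```python
def quantization_params(x_min: float, x_max: float, num_bits: int = 8):
    """Pick scale and zero point so [x_min, x_max] maps onto the integer range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The range must contain 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - x_min / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values, scale, zero_point, num_bits: int = 8):
    """Float -> integer codes, clamped to the representable range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Integer codes -> approximate float values."""
    return [scale * (q - zero_point) for q in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # illustrative values only
scale, zp = quantization_params(min(weights), max(weights))
codes = quantize(weights, scale, zp)
print(codes)
print(dequantize(codes, scale, zp))
```

Each weight then takes 1 byte instead of the 4 bytes of a float32, at the cost of a rounding error on the order of the scale.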
Evaluate new models
Automation (Composer)
Data preparation
+ Image Upload (GCS)
+ Convert images to TFRecords (Dataflow)
+ Data augmentation (AI Platform)
+ Train/Eval Sets (GCS)
Machine Learning
+ Model training & evaluation (AI Platform)
+ Convert model to TFLite (AI Platform)
+ Store evaluation metrics (BigQuery)
+ Model versions (GCS)
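The automated pipeline can be thought of as a dependency graph of tasks. The sketch below models it with the standard library's `graphlib` (Python 3.9+) rather than Airflow, to show the ordering an orchestrator such as Composer would enforce. The edges are read off the diagram and are an assumption, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (assumed from the diagram).
PIPELINE = {
    "convert_images_to_tfrecords": {"image_upload"},
    "data_augmentation": {"convert_images_to_tfrecords"},
    "train_eval_sets": {"data_augmentation"},
    "model_training_and_evaluation": {"train_eval_sets"},
    "convert_model_to_tflite": {"model_training_and_evaluation"},
    "store_evaluation_metrics": {"model_training_and_evaluation"},
    "model_versions": {"convert_model_to_tflite"},
}


def execution_order(graph):
    """Return one valid order in which the tasks can run."""
    return list(TopologicalSorter(graph).static_order())


for step, task in enumerate(execution_order(PIPELINE), start=1):
    print(step, task)
```

In Airflow the same structure would be expressed as operators chained with dependencies; the graph here only captures the ordering.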
✅ Serverless ML pipeline built on GCP
✅ Exported to TFLite and deployed to Android and iOS apps
✅ Huge improvement in model performance
Custom MLOps on GCP
+ Orchestration of multiple pipelines ✔
+ Scalable ML Applications ✔
+ Monitoring and validation of ML predictions ✔
+ Ongoing data validation ✔
+ Model Governance ✔
+ Continuous Integration and Deployment ✔
Built with: AI Platform, Composer, BigQuery, GCS, GitLab
Takeaways
+ Building your own custom solution is not that hard!
+ Cloud vendors already offer easy to use, fully
managed and battle tested components
+ New MLOps platforms are maturing rapidly
→ Kubeflow, TFX, MLflow
→ Can’t wait for 2020!
Thank you.
oliver@datatonic.com
@tinyoli
www.datatonic.com
