Webinar video: https://www.youtube.com/watch?v=Y3_fcJBgpMw
Kubeflow and Beyond: Automation of Model Training, Deployment, Testing, Monitoring, and Retraining
Speakers:
Stepan Pushkarev, CTO, Hydrosphere.io, and Ilnur Garifullin, ML Engineer, Hydrosphere.io
Abstract: Very often the workflow of training models and delivering them to the production environment involves a great deal of manual work: building a Docker image and deploying it to a Kubernetes cluster, packaging the model as a Python package and installing it into your Python application, or even editing Java classes with hard-coded weights and recompiling the whole project. Not to mention that all of this should be followed by testing your model's performance. It can hardly be called "continuous delivery" if you do it all manually. Imagine you could run the whole process of assembling/training/deploying/testing/running a model with a single command in your terminal. In this webinar, we present a way to connect data gathering, model training, model deployment, and model testing into a single workflow and run it with a single command.
1. Train and deliver machine learning models to production with a single command
STEPAN PUSHKAREV
ILNUR GARIFULLIN
2. Today’s webinar overview
1. Machine Learning Workflow
2. Tools overview
a. Kubeflow
b. Hydrosphere.io
3. Deep Dive into Automation
a. Steps definition
b. Steps automation
4. ML Workflow
1. Research
2. Data Preparation
3. Model Training
4. Model Cataloguing
5. Model Deployment
6. Model Integration Testing
7. Production Inferencing
8. Model Performance Monitoring
9. Model Maintenance
5. Step 1: Research
● Defining an objective
● Defining requirements
● Defining methods
● Defining data sources
6. Step 2: Data Preparation
● Collecting data
● Preparing data
○ Cleaning
○ Feature engineering
○ Transformation
● Important! The same data preparation code must be reused at inference time (see the sketch below).
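A minimal sketch of what "reused for inferencing" means in practice: the same preprocessing function is applied both to the training data and to incoming production requests, so training and serving cannot drift apart. The function and feature names below are illustrative, not the webinar's code.

import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """Shared preprocessing: flatten 28x28 grayscale images and scale to [0, 1]."""
    return (image.reshape(-1, 784) / 255.0).astype(np.float32)

# Training time: applied to the whole dataset.
#   x_train = preprocess(raw_train_images)
# Inference time: the very same function is applied to each request,
# so the model sees identically prepared features in production.
#   prediction = model.predict(preprocess(request_image))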
7. Step 3: Model Training
● Building the model
● Training the model
● Evaluating the model
● Tuning hyper-parameters
● Versioning training data
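The least obvious bullet above is versioning training data. A hedged illustration (the paths and tagging scheme are assumptions, not the speakers' implementation): fingerprint the training files and record that hash alongside the model, so every trained artifact can be traced back to the exact data it saw.

import hashlib
from pathlib import Path

def dataset_version(data_dir: str) -> str:
    """Return a short, deterministic fingerprint of every file under data_dir."""
    digest = hashlib.sha256()
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            digest.update(path.name.encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()[:12]

# Example: store this tag in the model's metadata next to the trained artifact.
#   version = dataset_version("/data/mnist")   # e.g. "3f1c9a77be02"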
8. Step 4: Model Cataloguing
● Metadata extraction
○ Graph definition
○ Weights
○ Training data version / stats
○ Other dependencies (look-up vocabulary, etc.)
● Indexing model’s binaries
● Versioning a model artifact
● Storing a model in Repository
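A hedged sketch of the DIY flavour of this step (the bucket name, file layout, and metadata fields are assumptions for illustration): write the extracted metadata next to the model, zip both, and push the archive to a repository such as S3.

import json
import shutil
import boto3  # any artifact store would do; S3 via boto3 is just one option

def catalogue_model(model_dir: str, version: str, data_version: str) -> None:
    # Store metadata (the graph definition and weights already live in the model dir).
    metadata = {"name": "mnist", "version": version, "training_data": data_version}
    with open(f"{model_dir}/metadata.json", "w") as f:
        json.dump(metadata, f)
    # Zip model + metadata and index the artifact in the repository.
    archive = shutil.make_archive(f"mnist-{version}", "zip", model_dir)
    boto3.client("s3").upload_file(archive, "my-model-repo", f"mnist/{version}.zip")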
9. Step 5: Model Deployment
● Preparing infrastructure for the model
● Preparing runtime for the model
● Deploying the model server
● Exposing API endpoints to the model
● Model Integration
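For the DIY route detailed later (slides 32-34), "deploying the model server" usually boils down to a small HTTP wrapper like the sketch below, which you then Dockerize and expose through Kubernetes. The route, payload format, export path, and output key are illustrative assumptions; TensorFlow 1.x is assumed.

import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)
# Load the exported SavedModel once at startup (path is an assumption).
predict_fn = tf.contrib.predictor.from_saved_model("models/mnist/export/latest")

@app.route("/predict", methods=["POST"])
def predict():
    image = np.array(request.json["image"], dtype=np.float32)
    result = predict_fn({"image": image.reshape(1, 784)})
    # "classes" assumes the export signature used in the training sketch below.
    return jsonify({"class": int(result["classes"][0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9000)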
10. Step 6: Model Integration Testing
● Performing integration tests
● Replaying a golden data set
● Replaying edge cases
● Replaying recent traffic
● Asserting results
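A hedged sketch of "replay a golden data set and assert results" against a deployed HTTP endpoint. The URL, payload shape, response key, and accuracy threshold are assumptions for illustration, not part of the webinar's tooling.

import json
import requests

ENDPOINT = "http://my-cluster/api/v1/applications/MyPredictionApp"  # assumed URL

def replay_golden_set(path: str, min_accuracy: float = 0.97) -> None:
    correct, total = 0, 0
    with open(path) as f:
        for line in f:  # one JSON record per line: {"image": [...], "label": 3}
            record = json.loads(line)
            response = requests.post(ENDPOINT, json={"image": record["image"]})
            response.raise_for_status()
            correct += int(response.json()["class"] == record["label"])
            total += 1
    accuracy = correct / total
    assert accuracy >= min_accuracy, f"accuracy {accuracy:.3f} below {min_accuracy}"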
11. Step 7: Production Inferencing
● A/B & Canary deployment
● Model scaling
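A toy illustration of the canary idea (not Hydrosphere's or Kubeflow's routing mechanism; the weights and endpoints are assumptions): send a small fraction of production traffic to the challenger model and the rest to the incumbent, then compare their metrics before promoting.

import random
import requests

ROUTES = [
    ("http://my-cluster/apps/mnist-v1", 0.95),  # incumbent model, assumed URL
    ("http://my-cluster/apps/mnist-v2", 0.05),  # canary model, assumed URL
]

def route(payload: dict) -> dict:
    """Pick a backend according to the canary weights and forward the request."""
    urls, weights = zip(*ROUTES)
    url = random.choices(urls, weights=weights, k=1)[0]
    return requests.post(url, json=payload).json()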
12. Step 8: Model Performance Monitoring
● System metrics monitoring
● Model metrics tracking
● Model comparison
● Concept drift monitoring
● Anomaly detection
● Data profiling
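One way to make "concept drift monitoring" concrete (a sketch, not the platform's implementation; the feature arrays and significance level are assumptions): compare the distribution of each production feature against its training distribution with a two-sample Kolmogorov-Smirnov test and alert on the features that moved.

import numpy as np
from scipy import stats

def drift_alerts(train: np.ndarray, prod: np.ndarray, alpha: float = 0.01):
    """Return indices of features whose production distribution drifted
    away from training, according to a two-sample KS test."""
    drifted = []
    for i in range(train.shape[1]):
        statistic, p_value = stats.ks_2samp(train[:, i], prod[:, i])
        if p_value < alpha:
            drifted.append(i)
    return drifted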
15. The Machine Learning Model Management Platform (Hydrosphere.io)
The Machine Learning Toolkit for Kubernetes (Kubeflow)
16. What is Kubeflow?
● Began as a Kubernetes template / blueprint for running TensorFlow
● Evolved into a "toolkit": loosely coupled tools and blueprints for ML on Kubernetes
17. What is Hydrosphere.io?
Hydrosphere.io is a platform for ML model management.
- A focused, value-add "tool" - part of the toolkit
- Open source
- Augments Cataloguing, Deployment, Inferencing, Monitoring, and Maintenance
18. Tools Landscape: a diagram mapping tools (e.g., an orchestrator, ModelDB) onto the workflow stages Research, Data Prep, Training, Cataloguing, Deployment, Integration Testing, Production Inferencing, Performance Monitoring, and Model Maintenance.
21. Step 1: Research
● Objective – given an image of a handwritten digit, predict which digit it is
● Requirements – easy model export
● Tools and Methods – TensorFlow Estimator API
● Data – the MNIST dataset
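To ground these choices, here is a minimal TensorFlow 1.x Estimator sketch of the kind of model the webinar works with; the layer sizes, names, and export call are illustrative, not the speakers' exact code. The Estimator API makes the "easy export" requirement a one-liner producing a SavedModel.

import tensorflow as tf  # TensorFlow 1.x assumed

def model_fn(features, labels, mode):
    # Flatten 28x28 MNIST images and classify with a single dense layer.
    x = tf.reshape(features["image"], [-1, 784])
    logits = tf.layers.dense(x, 10)
    predictions = {"classes": tf.argmax(logits, axis=1),
                   "probabilities": tf.nn.softmax(logits)}
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn, model_dir="models/mnist")
# estimator.train(input_fn=...) / estimator.evaluate(input_fn=...)
# Easy export for serving:
#   estimator.export_savedmodel("models/mnist/export", serving_input_receiver_fn)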
27-29. Step 4: Model Cataloguing
DIY:
- Instrument the training pipeline
- Store metadata
- Zip the model and metadata
- Store in S3, or push to Artifactory, or push to git
ModelDB:
- Python DSL: sync model, sync test data, sync metrics
- Nice UI
Hydrosphere.io:
$ hs upload /models/mnist/
$ hs profile push /data/mnist/
30. Step 4: Model Cataloguing
Hydrosphere.io: version the model, extract metadata, build a model Docker image, store it in the Docker Registry.
$ hs upload /models/mnist/
$ hs profile push /data/mnist/
32-34. Step 5: Model Deployment
DIY:
- Implement a model server (Flask app)
- Look up the model
- Dockerize
- Add Kube configs, tags
- Expose an API (HTTP, gRPC, batch, streaming)
Niche tools:
- TensorFlow Serving
- PyTorch Serving
- Nvidia TensorRT Serving
Hydrosphere.io:
$ hs apply -f - << EOF
kind: Application
name: "MyPredictionApp"
singular:
  model: mnist:1
  runtime: "serving-runtime-python:1.7.0-latest"
EOF
35. Step 5: Model Deployment
The manifest above ties together the application metadata, the runtime (serving-runtime-python), and the model version (mnist:1); applying it launches the model on Kube and exposes HTTP, gRPC, and Kafka APIs.
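Once the application is up, it can be called over plain HTTP. A hedged example: the gateway address and path below are placeholders, since the exact route depends on how the Hydrosphere gateway is exposed in your cluster.

import requests

# Placeholder URL: substitute your cluster's gateway address and application route.
url = "http://<gateway-address>/<application-endpoint>/MyPredictionApp"
payload = {"image": [[0.0] * 784]}  # one flattened 28x28 image

response = requests.post(url, json=payload)
print(response.json())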
36-38. Step 6: Model Integration Testing
DIY:
- Implement a testing script
- Dockerize, add to Kube
- Replay a golden data set
- Replay edge cases
- Replay recent traffic
- Assert results
Hydrosphere Serving (Q2 2019):
$ hs test -f /test/dataset
$ hs test replay anomalies
$ hs test replay <from_date>