End-to-End Machine Learning Pipeline with Docker Enterprise and Kubeflow
Try it out using Docker for Desktop: https://github.com/dockersamples/docker-hub-ml-project
Enjoy!
4. ● Current state:
○ More than 4.6M unlabeled public repositories on Hub
○ Inability to search Hub images based on categories
○ Limited user experience due to lack of context (e.g. backend-
frameworks->databases)
● Proposed Solution:
○ Automate the Docker Hub image categorization (send
description -> return suggested categories/topics)
Use Case - Docker Hub
5. ● How to reliably deploy models created by data scientists to
production?
● How to facilitate the creation and maintenance of end-to-end ML
pipelines?
● How to automate the ML pipeline workflows?
Problem Definition
6. Demonstrate the use of Kubeflow on Docker
Enterprise to train and deploy an ML model.
Project / Solution Scope
7. Personas
DEVOPS ENGINEERS
• Provide and maintain
infrastructure services for
other teams
• Monitoring & Configuration
management
• Deployment Automation &
Security
DATA ENGINEERS
• Develop, test and maintain
data pipelines
• Improve data reliability,
efficiency and quality
• Discover opportunities for
data acquisition
DATA SCIENTISTS
• Answer industry and business
questions
• Prepare data for use in
predictive modeling
• Build ML models to
improve existing product
workflows and UX
DATAOPS ENGINEERS
8. The Kubeflow project is dedicated to making deployments of
machine learning (ML) workflows on Kubernetes simple, portable
and scalable.
Source: github.com/kubeflow/kubeflow
What is Kubeflow?
9. Architecture
[Diagram. Components shown: Kubeflow (TFJob, Jupyter Hub, Tensorflow Hub, Seldon, Argo, Ambassador, Katib); Notebooks (Notebook 1, Notebook 2, ...); Models (Model 1, Model 2, ...); Prometheus; Grafana; Kubernetes + Docker Engine (ENTERPRISE PLATFORM); infrastructure: Cloud VM, Bare Metal, Docker for Desktop.]
10. Seldon-Core Architecture
[Diagram. Components shown: Data scientists and engineers; Argo Job (CI/CD); Docker Trusted Registry; Kubernetes API; Operator; 1..N deployment graphs, each with a Service Orchestrator and Models 1..N; Ambassador (API Gateway); Business Applications connecting over REST or gRPC; ENTERPRISE PLATFORM.]
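As a rough illustration of the "REST or gRPC" path from a business application to a deployed model, here is a minimal sketch that builds (but does not send) a REST prediction request. The gateway address, the deployment name, and the exact URL path are assumptions: the route depends on the Seldon Core version and on how the gateway is configured. The `data.ndarray` wrapper follows Seldon's REST payload format.

```python
import json
from urllib.request import Request


def build_prediction_request(gateway_url, deployment, description):
    """Build (without sending) a REST prediction request for a Seldon deployment."""
    # Seldon's REST payload wraps the inputs in data.ndarray;
    # here the input is a single repository description.
    payload = {"data": {"ndarray": [description]}}
    # Assumed route: the real path depends on the Seldon Core version
    # and on the API gateway configuration.
    url = f"{gateway_url}/seldon/{deployment}/api/v0.1/predictions"
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical gateway address and deployment name.
request = build_prediction_request(
    "http://localhost:8080", "dockerhubclassifier", "Official build of Nginx."
)
```

Passing `request` to `urllib.request.urlopen` would send it to a running gateway; the response format is not shown here.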
11. [Diagram. Elements shown: Developer Machine ("Docker for", label truncated); Version Control; Development CI/CD; Docker Trusted Registry; Non-Production Environments (Docker EE); Production Environments (Docker EE; Datacenter 1, Datacenter 2); Customers.]
Development Process
12. Docker Content Trust
$ docker trust sign dev/dockerhubclassifier:v3
Signing and pushing trust metadata for dev/dockerhubclassifier:v3
The push refers to a repository [docker.io/dev/dockerhubclassifier:v3]
...
v3: digest: sha256:74d4bfa917d55d53c7df3d2ab20a8d926874d61c3da5ef6de15dd2654fc467c4 size: 1357
Signing and pushing trust metadata
Enter passphrase for delegation key with ID 27d42a8:
Successfully signed dev/dockerhubclassifier:v3
16. ● Step 1: Data Collection
○ Collected data from other sources on the internet with tagged content and joined with Hub data
● Step 2: Exploratory Data Analysis
○ Pandas data frames, distribution of categories, observing correlations etc. using several Python packages
● Step 3: Data Cleaning and Transformation
○ Reducing categories, stop word removal, encoding labels, vectorization and general transformations to feed into classifiers.
● Step 4: Model Building
○ Splitting, training, predicting and tuning.
● Step 5: Deploying Model to Production
○ Serving, monitoring and logging at scale
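The cleaning and transformation step (stop word removal, label encoding, vectorization) can be sketched with the standard library alone. This is an illustrative stand-in, not the project's actual code: the stop-word list, the sample descriptions, and the labels are made up, and a real pipeline would more likely use a library vectorizer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def encode_labels(labels):
    """Map each distinct label to an integer id (sorted for determinism)."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def vectorize(documents):
    """Turn documents into bag-of-words count vectors over a sorted vocabulary."""
    tokenized = [tokenize(d) for d in documents]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vectors, vocab


# Made-up sample data for illustration.
descriptions = ["The official build of Nginx", "A database for the web"]
labels = ["web-servers", "databases"]

X, vocab = vectorize(descriptions)
y, label_ids = encode_labels(labels)
```

The resulting `X` and `y` are plain lists that a classifier could consume in the model building step.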
20. Argo Workflow on Docker for Desktop
and Docker Enterprise
Model performance Monitoring
DEMO
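For readers without access to the demo, the following is a minimal sketch of what an Argo Workflow for such a pipeline might look like, built as a Python dictionary and serialized to JSON. The stage names, image references, and `generateName` prefix are hypothetical; only the overall manifest shape (`argoproj.io/v1alpha1`, `entrypoint`, `templates`, `steps`) follows the Argo Workflow format.

```python
import json

# Hypothetical stage names and image references, for illustration only.
STAGES = [
    ("preprocess", "example/hub-preprocess:latest"),
    ("train", "example/hub-train:latest"),
    ("deploy", "example/hub-deploy:latest"),
]


def build_workflow(stages):
    """Return an Argo Workflow manifest that runs the stages one after another."""
    # One container template per stage.
    container_templates = [
        {"name": name, "container": {"image": image}} for name, image in stages
    ]
    # In Argo, each outer list item under `steps` is a group that starts
    # only after the previous group has finished, which gives sequential order.
    pipeline = {
        "name": "pipeline",
        "steps": [[{"name": name, "template": name}] for name, _ in stages],
    }
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "hub-classifier-"},
        "spec": {
            "entrypoint": "pipeline",
            "templates": [pipeline] + container_templates,
        },
    }


workflow = build_workflow(STAGES)
manifest_json = json.dumps(workflow, indent=2)
```

Because JSON is also valid YAML, `manifest_json` could be written to a file and used where a workflow manifest is expected.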
21. Repo owner Mike
Mike decides to be a good person
and labels his repo
With ML, Mike can see the most
relevant categories according to the
repo’s content
and proceeds to choose the
appropriate labels
22. - Simpler workflow with Kubeflow Pipelines
- Stepping stone for future ML projects
- Canary deployments and A/B Testing with Seldon +
Istio
Challenges / Next Steps
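As a sketch of the canary idea listed above, a Seldon deployment can declare two predictors and split traffic between them by percentage. The manifest below is an assumption-laden illustration: the API version string, the image names, and the container name are hypothetical, and the fields a given Seldon Core release accepts should be checked against its documentation.

```python
def build_canary_deployment(name, stable_image, canary_image, canary_percent):
    """Return a SeldonDeployment-style manifest with a stable and a canary predictor."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")

    def predictor(pred_name, image, traffic):
        # Each predictor serves one model container and receives
        # the given percentage of requests.
        return {
            "name": pred_name,
            "replicas": 1,
            "traffic": traffic,
            "componentSpecs": [
                {"spec": {"containers": [{"name": "classifier", "image": image}]}}
            ],
            "graph": {"name": "classifier", "type": "MODEL", "children": []},
        }

    return {
        # Assumed API version; it varies between Seldon Core releases.
        "apiVersion": "machinelearning.seldon.io/v1alpha2",
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "name": name,
            "predictors": [
                predictor("main", stable_image, 100 - canary_percent),
                predictor("canary", canary_image, canary_percent),
            ],
        },
    }


# Hypothetical image names: send 10% of traffic to the new model version.
deployment = build_canary_deployment(
    "dockerhubclassifier", "example/classifier:v3", "example/classifier:v4", 10
)
```

Raising `canary_percent` step by step and regenerating the manifest is one way to shift traffic gradually to the new model.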
23. ● End-to-End Machine Learning Pipeline with Docker for Desktop and Kubeflow
○ https://github.com/dockersamples/docker-hub-ml-project
Contributions to the Community
Docker for Desktop Kubeflow