Enterprise Data Science Workflows on Kubeflow
Use GitOps to deploy and manage your Kubeflow cluster.
Perform an end-to-end data science workflow on Kubeflow.
Stefano Fioravanzo
Yannis Zarkadas
Arrikto
Simplify. Accelerate. Collaborate. arrik.to/odsc20
GitOps and Multi-Tenancy Combined for an
Enterprise Data Science Experience on Kubeflow
Stefano Fioravanzo Yannis Zarkadas
Software Engineer Software Engineer
● How to deploy and manage Kubeflow in a GitOps manner
● How to make sure you run Kubeflow in a secure way
● How to optimize and build production-ready models faster
Why is this important?
✓ Simplify deployment and management of Kubeflow
✓ Accelerate time to production
✓ Collaborate in a secure and isolated manner
What You’ll Learn In This Session
What is Kubeflow
The Kubeflow project is dedicated to making deployments of
machine learning (ML) workflows on Kubernetes simple,
portable, and scalable.
Perception: ML Products are mostly about ML
Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al.
In the canonical diagram, the ML Code is just one small box, surrounded by much larger ones: Configuration, Data Collection, Data Verification, Feature Extraction, Process Management Tools, Analysis Tools, Machine Resource Management, Serving Infrastructure, and Monitoring.
Reality: ML Requires DevOps; lots of it
The same diagram, drawn to scale: the ML Code box shrinks to a sliver next to Configuration, Data Collection, Data Verification, Feature Extraction, Process Management Tools, Analysis Tools, Machine Resource Management, Serving Infrastructure, and Monitoring.
Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al.
Kubeflow components
● Jupyter Notebooks
● Workflow Building: Kale, Fairing, TFX, Airflow, +
● Pipelines: KF Pipelines, HP Tuning, Tensorboard
● Serving: KFServing, Seldon Core, TFServing, +
● Training Operators: TensorFlow, PyTorch, XGBoost, +
● Metadata
● Data Management: Versioning, Reproducibility, Secure Sharing
● Monitoring: Prometheus
● Platforms / clouds: GCP, AWS, Azure, IBM Cloud, OpenShift, on prem
● Kubernetes and scaffolding: Istio, Argo, Prometheus, Spartakus
● ML tools: TensorFlow, PyTorch, scikit-learn, XGBoost, Jupyter, Chainer, MPI, MXNet
● Kubeflow applications: Jupyter notebook web app and controller, hyperparameter tuning (Katib), Kale, Pipelines, Metadata, Kubeflow UI, KFServing, TensorFlow Serving, PyTorch Serving, Seldon Core, training operators (MPI, MXNet, PyTorch, TFJob, XGBoost)
ML workflow
1. Identify problem and collect and analyse data: Jupyter Notebook
2. Choose an ML algorithm and code your model: TensorFlow, scikit-learn, PyTorch, XGBoost
3. Experiment with data and model training: Jupyter Notebook, Kale, Pipelines
4. Tune the model hyperparameters: Katib
5. Serve the model for online/batch prediction: KFServing, TFServing, Seldon Core, NVIDIA TensorRT, PyTorch Serving
Testimonials
● Dyson: “Kubeflow is to data science what a lab notebook is to biomedical
scientists — a way to expedite ideas from the lab to the ‘bedside’ 3x faster, while
ensuring experimental reproducibility.”
● US Bank: “The Kubeflow 1.0 release is a significant milestone as it positions
Kubeflow to be a viable ML Enterprise platform. Kubeflow 1.0 delivers material
productivity enhancements for ML researchers.”
● One Technologies: “With Kubeflow at the heart of our ML platform, our small
company has been able to stack models in production to improve CR, find new
customers, and present the right product to the right customer at the right time.”
Testimonials
● GroupBy: “Kubeflow is helping GroupBy in standardizing ML workflows and
simplifying very complicated deployments!”
● Volvo Cars: “Kubeflow provides a seamless interface to a great set of tools
that together manages the complexity of ML workflows and encourages best
practices. The Data Science and Machine Learning teams at Volvo Cars are
able to iterate and deliver reproducible, production grade services with ease.”
Kubeflow - The Infra Side
● Install
● Manage
● Secure
● Upgrade
What is GitOps
All configuration state is declaratively stored in git.
Imperative vs Declarative
Imperative
1. Create Service
2. Update LoadBalancer
3. Upgrade Deployment
Imperative vs Declarative
Declarative
Desired State (YAML):

apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
  - name: mysql
    image: mysql:7.6

kubectl apply sends the desired state to Kubernetes, which stores it in etcd.
Controller Pattern - The driver behind declarative APIs
Used everywhere in Kubernetes
A controller watches Kubernetes objects, each with a Spec (desired state) and a Status (real state). It loops: Observe the objects, Calculate the difference between desired and real state, and Reconcile by writing to the physical resources.
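The observe/calculate/reconcile loop can be sketched in a few lines of Python; the state shapes and action names below are illustrative, not the Kubernetes API:

```python
# Illustrative sketch of one reconciliation pass: compare desired state (Spec)
# with real state (Status) and compute the writes needed to converge them.
def reconcile(spec: dict, status: dict) -> list:
    actions = []
    desired = spec.get("replicas", 0)
    real = status.get("replicas", 0)
    if real < desired:
        actions += ["create-pod"] * (desired - real)
    elif real > desired:
        actions += ["delete-pod"] * (real - desired)
    return actions

# A controller runs this in a loop: observe, calculate, reconcile, repeat.
print(reconcile({"replicas": 3}, {"replicas": 1}))  # ['create-pod', 'create-pod']
```

Real controllers follow the same shape: the diff is recomputed on every pass, so the system converges even if a previous write failed.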
Why GitOps?
Reproducibility

commit 856df4gdf56g4561d1fg564df5g61v6854df
Author: yanniszark <yanniszark@arrikto.com>
Date: Tuesday, Sep 8 11:24:12 2020 +0200

    Upgrade MySQL to new version.

Applying the same commit to any cluster (K8s + etcd) reproduces the same configuration state.
● Whole configuration state in git, versioned by commits
● Careful! Mutable state still outside of git (e.g., volumes, S3)
○ Need versioning solution for end-to-end reproducibility
○ Arrikto Rok produces data commits for your volumes (e.g., MySQL)
Rollbacks

git log

commit 856df4gdf56g4561d1fg564df5g61v6854df
Author: yanniszark <yanniszark@arrikto.com>
Date: Tuesday, Sep 8 11:24:12 2020 +0200

    Upgrade MySQL to new version.

commit er1f1ef8f1e1rf5641sdfs564d1fsd1f5sd61fgwd
Author: yanniszark <yanniszark@arrikto.com>
Date: Tuesday, Sep 4 15:24:12 2020 +0200

    Increase MySQL read-replicas to 3 for higher availability.

If applying the latest commit leaves the cluster unhealthy, apply the previous commit to roll back.
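Rolling back is just re-applying an older commit; a toy model of that idea (the history list and manifests stand in for git and kubectl):

```python
# Toy model of GitOps rollback: the cluster converges to whatever commit was
# applied last, so rolling back is simply applying the previous commit.
history = []  # stand-in for git log, oldest first

def commit(manifests: dict) -> None:
    history.append(manifests)

def apply(index: int) -> dict:
    return history[index]  # stand-in for `kubectl apply` of that commit

commit({"mysql": "read-replicas: 3"})
commit({"mysql": "new-version"})     # this one turns out to be unhealthy
cluster = apply(-2)                  # roll back: re-apply the previous commit
print(cluster)  # {'mysql': 'read-replicas: 3'}
```

Because git keeps the full history, no extra rollback machinery is needed beyond applying an earlier revision.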
Auditing
git blame
48f078b0 (Yannis Zarkadas 2020-06-11 41) kind: Deployment
48f078b0 (Yannis Zarkadas 2020-06-11 42) metadata:
48f078b0 (Yannis Zarkadas 2020-06-11 43) name: nginx
48f078b0 (Yannis Zarkadas 2020-06-11 46) spec:
48f078b0 (Stefano Fioravanzo 2020-06-11 47) replicas: 1
Rich Ecosystem
● Collaboration through familiar and battle-tested tools
○ Pull Requests and Code Reviews
● Rich offerings
○ GitHub, GitLab, etc.
● Plenty of integrations
○ GitHub Actions, GitLab Pipelines, etc.
Reuse whatever you already know about git!
GitOps Workflow
The Deployer commits the desired state (YAML) to the GitOps repo, then runs kubectl apply to push it to the cluster:

apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
  - name: mysql
    image: mysql:7.6
GitOps Workflow
● What about 3rd-party applications?
● Usually, infrastructure configuration is provided by the vendor
● For example, Kubeflow maintains a “manifests” monorepo with all deployment configurations
Kubeflow developers commit to the upstream manifests repo; your downstream GitOps repo periodically rebases on it, and the Deployer commits to it and runs kubectl apply from there.
GitOps - Managing Configuration
● How do you manage configuration?
○ Use 3rd-party provided configs
○ Apply customer-specific changes
○ Update periodically
● Several tools:
○ helm
○ kustomize
○ ...
● Kubeflow uses kustomize
● We (Arrikto) use kustomize for our deployments

kind: Deployment
metadata:
  name: redis
  namespace: deploy
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: redis
        image: gcr.io/redis:6
Managing Configuration - Helm
● Helm is the most popular tool that uses templating
● Exposes knobs to consumers via values file
● Templating is hard to read
The vendor repo (upstream) provides the Chart; the customer repo (downstream) provides the values.yaml.
{{ if (or (not .Values.persistence.enabled) (eq .Values.persistence.type "pvc")) }}
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ template "grafana.fullname" . }}
namespace: {{ template "grafana.namespace" . }}
labels:
{{- include "grafana.labels" . | nindent 4 }}
{{- if .Values.labels }}
{{ toYaml .Values.labels | indent 4 }}
{{- end }}
{{- with .Values.annotations }}
annotations:
{{ toYaml . | indent 4 }}
{{- end }}
https://github.com/helm/charts/blob/99805df25da220c379ad609fcb7cf20e5e0d4fc0/stable/grafana/templates/deployment.yaml
Managing Configuration - Templating
└── redis
├── base
│ ├── configmap.yaml
│ ├── kustomization.yaml
│ ├── service.yaml
│ └── statefulset.yaml
Managing Configuration - kustomize
resources:
- configmap.yaml
- service.yaml
- statefulset.yaml
kustomization.yaml
● Base configuration
Managing Configuration - kustomize
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: redis
        image: gcr.io/redis:6
kustomize build
redis/base
resources:
- configmap.yaml
- service.yaml
- statefulset.yaml
kustomization.yaml
└── redis
├── base
└── overlays
├── deploy
│ ├── kustomization.yaml
│ └── patches
│ └── replicas.yaml
Managing Configuration - kustomize
bases:
- ../base
namespace: deploy
patches:
- path: patches/replicas.yaml
kustomization.yaml
● Create overlays (variants) to customize
deployment
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 3
patches/replicas.yaml
Managing Configuration - kustomize
kind: Deployment
metadata:
  name: redis
  namespace: deploy
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: redis
        image: gcr.io/redis:6
kustomize build
redis/overlays/deploy
bases:
- ../base
namespace: deploy
patches:
- path: patches/replicas.yaml
kustomization.yaml
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 3
patches/replicas.yaml
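Conceptually, `kustomize build` merges the patch onto the base and applies the overlay's namespace. A toy Python sketch of that merge (real kustomize uses strategic merge patches and is far more capable):

```python
# Recursively merge a patch dict onto a base dict (patch values win).
def deep_merge(base: dict, patch: dict) -> dict:
    out = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

base = {"kind": "Deployment",
        "metadata": {"name": "redis"},
        "spec": {"replicas": 1,
                 "template": {"spec": {"containers": [
                     {"name": "redis", "image": "gcr.io/redis:6"}]}}}}
patch = {"spec": {"replicas": 3}}

built = deep_merge(base, patch)
built["metadata"]["namespace"] = "deploy"  # namespace set by the overlay
```

Note how the base's image and template survive untouched while the patched replicas value wins; that is the property that makes overlays rebase-friendly.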
Managing Configuration - kustomize
The vendor repo (upstream) carries versions v1, v2, v3; the customer repo (downstream) carries the same versions plus a downstream commit d1 with its customizations.
● Update with git rebase
● Separate file == no conflicts
└── redis
    ├── base
    └── overlays
        └── deploy
Managing Configuration - kustomize
● Powerful customization capabilities
● Rebase from upstream to get new updates
● Customizations in separate folders, no conflicts on rebase
The base comes from the upstream repo; consumer customizations live as overlays in the GitOps repo.
● Simplify Kubeflow stack installation, configuration, and management
○ Deploy and manage software in a declarative way
○ Complete visibility of system configuration
● Accelerate the upgrade process by continuously deploying changes to the
cluster
○ Track changes and revert if something goes wrong
● Collaborate better and faster, share knowledge with the whole team
○ Keep using your favorite familiar tools and workflow
Why GitOps in your Kubeflow Deployment
Demo
1. Kubernetes Cluster (EKS) on Amazon Web Services
2. Deploy Rok
3. Deploy Kubeflow
4. Update installation from upstream
Security in Kubeflow
“We observed that this attack affected tens of Kubernetes clusters.”
Multi-User Isolation
Authentication?
Authorization?
Authentication using the OIDC Protocol
● Open and standardized flow built on OAuth 2.0
● Objective: get the user’s identity (username, groups)
● Popular and secure
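The end result of the OIDC flow is an ID token: a JWT whose payload carries the user's identity as claims. A minimal sketch of reading those claims (the token below is fabricated for illustration, and a real deployment must also verify the signature):

```python
import base64
import json

# Decode the payload (second dot-separated part) of a JWT ID token.
# NOTE: this skips signature verification, which a real deployment must do.
def decode_claims(id_token: str) -> dict:
    payload = id_token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

# Build a fake unsigned token for illustration ("e30" is base64 for "{}").
claims = {"sub": "user123", "email": "user@example.com", "groups": ["team-a"]}
payload_b64 = base64.urlsafe_b64encode(
    json.dumps(claims).encode()).decode().rstrip("=")
fake_token = ".".join(["e30", payload_b64, "sig"])

identity = decode_claims(fake_token)
```

The `sub` and `email` claims come from OIDC Core; a `groups` claim is a common extension that identity providers such as Dex can emit.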
Identity Provider
An OIDC provider interface sits in front of the identity provider, which can be LDAP / AD, a static password file, or an external IdP (Google, LinkedIn, …).
Authorization
● Authorization with Role Based Access Control (RBAC)
● Commit RBAC resources in git for reproducibility
Endpoint                                        RBAC Resource   Verb
GET /apis/kubeflow.org/v1/notebooks/{name}      Notebooks       GET
GET /apis/kubeflow.org/v1/notebooks             Notebooks       LIST
POST /apis/kubeflow.org/v1/notebooks            Notebooks       CREATE
DELETE /apis/kubeflow.org/v1/notebooks/{name}   Notebooks       DELETE
GET /apis/kubeflow.org/v1/experiments/{name}    Experiments     GET
Can USER do ACTION on RESOURCE in NAMESPACE?
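That question reduces to a lookup over roles and role bindings; a simplified model in Python (the role, user, and namespace names are made up):

```python
# Simplified model of Kubernetes RBAC: a role grants (resource, verb) pairs,
# and a binding attaches a role to a user within a namespace.
roles = {
    "notebook-editor": {("notebooks", "get"), ("notebooks", "list"),
                        ("notebooks", "create"), ("notebooks", "delete")},
}
bindings = [
    {"user": "stefano", "role": "notebook-editor", "namespace": "team-a"},
]

def can(user: str, verb: str, resource: str, namespace: str) -> bool:
    return any(b["user"] == user
               and b["namespace"] == namespace
               and (resource, verb) in roles[b["role"]]
               for b in bindings)

print(can("stefano", "create", "notebooks", "team-a"))  # True
print(can("stefano", "create", "notebooks", "team-b"))  # False
```

Because roles and bindings are plain Kubernetes resources, committing them to git versions the entire access-control policy along with the rest of the configuration.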
Handling Credentials
● Credentials are kept in Secrets
● Injected into Pods at runtime with PodDefaults
● Applications expect to find secrets in files or environment variables
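On the consumer side, an application simply reads the injected credential; a sketch of that lookup (the variable name and mount path are illustrative, not a Kubeflow convention):

```python
import os

# Look up a credential the way applications typically expect it:
# first as an environment variable, then as a mounted secret file.
def load_credential(name: str, mount_dir: str = "/var/run/secrets") -> str:
    if name in os.environ:                  # e.g. injected via a PodDefault
        return os.environ[name]
    with open(os.path.join(mount_dir, name)) as f:  # or mounted as a file
        return f.read().strip()
```

Keeping this lookup logic out of the training code itself means the same image runs unchanged across namespaces with different credentials.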
Auth Guidelines for Kubeflow
● Guidelines for secure applications in Kubeflow
https://github.com/kubeflow/community/blob/3357efef4947297026111df17e468d9204fa2061/guidelines/auth.md
CI/CD for ML
How can data scientists continually improve
and validate models?
● Develop models and pipelines in Jupyter
● Convert notebook to pipeline using Kale
● Run pipeline using Kubeflow Pipelines
● Explore and debug pipeline using Rok
The notebook-to-pipeline (N2P) critical user journey (CUJ): Develop (Jupyter) → Create Pipeline (Kale) → Run Pipeline (KF Pipelines) → Explore Pipeline (Rok).
This workshop will focus on two essential
aspects:
• Low barrier to entry: deploy a Jupyter
Notebook to Kubeflow Pipelines in the
Cloud using a fully GUI-based approach
• Reproducibility: automatic data
versioning to enable reproducibility and
better collaboration between data
scientists
Data Science with Kubeflow
Building a model is itself a pipeline: Data Ingestion → Data Analysis → Data Transformation → Data Validation → Data Splitting → Trainer → Model Validation → Training At Scale → Logging, followed by Roll-out → Serving → Monitoring.
Kubeflow Pipelines exists because Data Science and ML are inherently pipeline processes
Benefits of running a Notebook as a Pipeline
● The steps of the workflow are clearly defined
● Parallelization & isolation
○ Hyperparameter tuning
● Data versioning
● Different infrastructure requirements
○ Different hardware (GPU/CPU)
Workflow: Before
1. Write your ML code
2. Create Docker images
3. Write KFP DSL code
4. Compile the KFP DSL
5. Upload the pipeline to KFP
6. Run the pipeline
7. Need to amend your ML code? Start over from step 1.
Workflow: After
1. Write your ML code
2. Tag your Notebook cells
3. Run the pipeline at the click of a button
4. Need to amend your ML code? Just edit your Notebook!
A Data Scientist can now reduce the time taken to write ML code and run a pipeline by 70%. That means you can now run 3x as many experiments as you did before. What that really means is that you can deliver work faster to the business and drive more revenue.
Hyperparameter optimization
The two ways of life
● Change the parameters manually
● Use Katib
What is Katib
Katib is a Kubernetes-based system for Hyperparameter Tuning and
Neural Architecture Search. It supports a number of ML frameworks,
including TensorFlow, Apache MXNet, PyTorch, XGBoost, and others.
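What Katib automates at cluster scale can be sketched as a simple search loop; the search space and objective function below are invented for illustration:

```python
import random

random.seed(0)  # deterministic for the example

# Pretend validation accuracy that peaks around lr = 0.01; in Katib the
# objective comes from real training runs, not a formula.
def objective(lr: float, batch_size: int) -> float:
    return 1.0 - abs(lr - 0.01) - 0.001 * batch_size

best_score, best_params = float("-inf"), None
for _ in range(20):  # each iteration corresponds to one Trial
    params = {"lr": random.uniform(1e-4, 1e-1),
              "batch_size": random.choice([16, 32, 64])}
    score = objective(**params)
    if score > best_score:
        best_score, best_params = score, params
```

Katib runs such trials as Kubernetes workloads in parallel and supports smarter algorithms than random search (grid, Bayesian optimization, and more).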
Hyperparameter optimization
Combining the N2P CUJ with Katib
● Configure parameters, search algorithm, and objectives
using a GUI
● Start HP tuning with the click of a button
● Reproducibility of every pipeline and every step
● Run Katib Trials as Pipelines
● Complete visibility of every different Katib Trial
● Caching for faster computation
A data science journey
Agenda
● Explore Kubeflow components
● Explore the ML code of the dog breed identification example
● Convert the notebook to a Kubeflow pipeline
● Explore the accuracy of the various models
● Optimize a model with hyperparameter tuning
● Explore the results of HP tuning

Go to arrik.to/demowfhp to find the Codelab with the step-by-step instructions for this tutorial
KALE – Kubeflow Automated Pipelines Engine
● Python package + JupyterLab extension
● Convert a Jupyter Notebook to a KFP workflow
● No need for Kubeflow SDK
Annotated
Jupyter Notebook
Kale
Conversion Engine
Kale Modules
● Parse: derive pipeline structure
● Analyze: identify dependencies
● Marshal: inject data objects
● Generate: generate & deploy pipeline
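The Analyze step can be sketched as matching the variables a step consumes against those earlier steps produce; the step names and variables below are invented:

```python
# Toy model of Kale's analysis: each notebook step produces and consumes
# variables; a step depends on whichever steps produce what it consumes.
steps = {
    "load_data":  {"produces": {"df"}, "consumes": set()},
    "preprocess": {"produces": {"features"}, "consumes": {"df"}},
    "train":      {"produces": {"model"}, "consumes": {"features"}},
}

def dependencies(steps: dict) -> dict:
    return {name: {other for other, o in steps.items()
                   if o["produces"] & spec["consumes"]}
            for name, spec in steps.items()}

deps = dependencies(steps)
print(deps["train"])  # {'preprocess'}
```

Once the dependency graph is known, the Marshal step can serialize each produced object and inject it into the consuming pipeline step.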
Contribute
github.com/kubeflow-kale
A TFX-style stack maps onto these tools: TFDV, TFTransform, TFDV, Estimators, TFMA, TFServing, with Katib as the tuner and Arrikto Rok underneath for data versioning.
Arrikto Rok
Data Versioning, Packaging, and Sharing
Across teams and cloud boundaries for complete Reproducibility, Provenance, and Portability
Experimentation, Training, and Production environments each run on any storage, with data-aware PVCs provided by the Arrikto CSI driver.
Model Building without Data Management
Step 1: 1. Download data from the Data Lake, 2. Store it locally, 3. Do initial analysis, 4. Upload data to the Lake.
Step 2: 5. Download data from the Lake, 6. Store it locally, 7. Transform data, 8. Upload to the Lake.
Step 3: 9. Download data from the Lake, 10. Store it locally, 11. Train model, 12. Upload.
Model Building with Local Data Management (Rok)
Step 1: 1. Clone disk from snapshot, 2. Do initial analysis, 3. Snapshot.
Step 2: 4. Clone disk of Step 1, 5. Transform data, 6. Snapshot.
Step 3: 7. Clone disk of Step 2, 8. Train model, 9. Snapshot.
Rok stores every snapshot in the Object Store.
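The clone/snapshot pattern can be sketched with plain dicts; this toy model only illustrates the semantics, not Rok itself:

```python
# Toy model of snapshot/clone: a snapshot freezes a volume's contents under a
# name; a clone starts a new, independent volume from a snapshot.
snapshots = {}

def snapshot(name: str, volume: dict) -> None:
    snapshots[name] = dict(volume)   # immutable copy of current contents

def clone(name: str) -> dict:
    return dict(snapshots[name])     # new mutable volume seeded from snapshot

step1 = {"data.csv": "raw"}
snapshot("after-step-1", step1)

step2 = clone("after-step-1")        # Step 2 clones the disk of Step 1
step2["data.csv"] = "transformed"    # and works without touching Step 1's data
snapshot("after-step-2", step2)
```

Because every step starts from a named snapshot, any step (or the whole pipeline) can later be reproduced exactly from the object store.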
Sync State & Data
Location 1 runs Pipeline 1 (Steps 1, 2, 3), with each step snapshotted to the Arrikto object store. By syncing state and data through the object store, Location 2 can start Pipeline 2 right after Step 3 of Pipeline 1 (Steps 4, 5, 6), or reproduce Pipeline 1 entirely as Pipeline 3.
Arrikto Rok snapshots every artifact along the pipeline: Cloned Data → Validation → Validated Data → Preprocessing → Preprocessed Data → Training → Trained Model → Evaluation → Evaluated Model → Deployment → Deployed Model. A failed validation stops the pipeline.
What have we achieved in this tutorial?
● Streamline your ML workflows using intuitive UIs
● Exploit the caching feature to give a boost to your pipeline runs
● Run a pipeline-based hyperparameter tuning workflow starting from your
Jupyter Notebook
● Use Kale as a workflow tool to orchestrate Katib and Kubeflow Pipelines
experiments
● Simplify the deployment and management of Kubeflow using GitOps
● Accelerate the time to production
● Collaborate faster and more easily in a secure and isolated manner
Summary
Just a small sample of
community contributions
● Jupyter manager UI
● Pipelines volume support
● MiniKF
● Auth with Istio + Dex
● On-premise installation
● Linux Kernel
Community
Kubeflow is open
● Open community
● Open design
● Open source
● Open to ideas
Get involved
● github.com/kubeflow
● kubeflow.slack.com
● @kubeflow
● kubeflow-discuss@googlegroups.com
● Community call on Tuesdays
Thank You!
More Info
arrik.to/odsc20
Email Address:
stefano@arrikto.com
yanniszark@arrikto.com
company/arrikto

Yannis Zarkadas. Enterprise data science workflows on kubeflow

  • 1.
    Enterprise Data ScienceWorkflows on Kubeflow Use GitOps to deploy and manage your Kubeflow cluster. Perform an end-to-end data science workflow on Kubeflow. Stefano Fioravanzo Yannis Zarkadas Arrikto
  • 2.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 GitOps and Multi-Tenancy Combined for an Enterprise Data Science Experience on Kubeflow Stefano Fioravanzo Yannis Zarkadas Software Engineer Software Engineer 2
  • 3.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 ● How to deploy and manage Kubeflow in a GitOps manner ● How to make sure you run Kubeflow in a secure way ● How to optimize and build production-ready models faster Why is this important? ✓ Simplify deployment and management of Kubeflow ✓ Accelerate time to production ✓ Collaborate in a secure and isolated manner What You’ll Learn In This Session 3
  • 4.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 What is Kubeflow The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes: simple, portable and scalable. 4
  • 5.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Perception: ML Products are mostly about ML Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al. Configuration Data Collection Data Verification Feature Extraction Process Management Tools Analysis Tools Machine Resource Management Serving Infrastructure Monitoring ML Code 5
  • 6.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Reality: ML Requires DevOps; lots of it Configuration Data Collection Data Verification Feature Extraction Process Management Tools Analysis Tools Machine Resource Management Serving Infrastructure Monitoring ML Code Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al. 6
  • 7.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Kubeflow components 7 Jupyter Notebooks Workflow Building Pipelines Tools Serving Metadata Data Management Kale Fairing TFX Airflow, + KF Pipelines HP Tuning Tensorboard KFServing Seldon Core TFServing, + Training Operators Pytorch XGBoost, + Tensorflow Prometheus Versioning ReproducibilitySecure Sharing
  • 8.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Platforms / clouds GCP AWS IBM CloudAzure OpenShift Istio ML tools PyTorch scikit-learn Jupyter TensorFlow PyTorch Serving TensorFlow Serving XGBoost Kubernetes Argo Prometheus Spartakus Seldon Core Kubeflow applications and scaffolding Chainer MPI MXNet On prem Jupyter notebook web app and controller Hyperparameter tuning (Katib) Kale Pipelines Metadata Training operators: MPI, MXNet, PyTorch, TFJob, XGBoost Kubeflow UI KFServing 8
  • 9.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Platforms / clouds Kubeflow applications and scaffolding ML tools PyTorch scikit-learn Jupyter TensorFlow XGBoost Chainer MPI MXNet GCP AWS IBM CloudAzure OpenShift Istio PyTorch Serving TensorFlow Serving Kubernetes Argo Prometheus Spartakus Seldon Core On prem Jupyter notebook web app and controller Hyperparameter tuning (Katib) Kale Pipelines Metadata Kubeflow UI KFServing Training operators: MPI, MXNet, PyTorch, TFJob, XGBoost 9
  • 10.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Platforms / clouds ML tools PyTorch scikit-learn Jupyter TensorFlow XGBoost Kubeflow applications and scaffolding Chainer MPI MXNet GCP AWS IBM CloudAzure OpenShift Istio PyTorch Serving TensorFlow Serving Kubernetes Argo Prometheus Spartakus Seldon Core On prem Jupyter notebook web app and controller Hyperparameter tuning (Katib) Kale Pipelines Metadata Kubeflow UI KFServing Training operators: MPI, MXNet, PyTorch, TFJob, XGBoost 10
  • 11.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Platforms / clouds ML tools PyTorch scikit-learn Jupyter TensorFlow XGBoost Kubeflow applications and scaffolding Chainer MPI MXNet GCP AWS IBM CloudAzure OpenShift Istio PyTorch Serving TensorFlow Serving Kubernetes Argo Prometheus Spartakus Seldon Core Jupyter notebook web app and controller Hyperparameter tuning (Katib) Kale Pipelines Metadata Kubeflow UI KFServing On prem Training operators: MPI, MXNet, PyTorch, TFJob, XGBoost 11
  • 12.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 ML workflow Identify problem and collect and analyse data Choose an ML algorithm and code your model Experiment with data and model training Tune the model hyperparamet ers Jupyter Notebook Katib TensorFlow scikit-learn PyTorch XGBoost Jupyter Notebook Kale Pipelines KFServing PyTorch TFServing Seldon Core NVIDIA TensorRT Serve the model for online/batch prediction 12
  • 13.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Testimonials ● Dyson: “Kubeflow is to data science what a lab notebook is to biomedical scientists — a way to expedite ideas from the lab to the ‘bedside’ 3x faster, while ensuring experimental reproducibility.” ● US Bank: “The Kubeflow 1.0 release is a significant milestone as it positions Kubeflow to be a viable ML Enterprise platform. Kubeflow 1.0 delivers material productivity enhancements for ML researchers.” ● One Technologies: “With Kubeflow at the heart of our ML platform, our small company has been able to stack models in production to improve CR, find new customers, and present the right product to the right customer at the right time.” 13
  • 14.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Testimonials ● GroupBy: “Kubeflow is helping GroupBy in standardizing ML workflows and simplifying very complicated deployments!” ● Volvo Cars: “Kubeflow provides a seamless interface to a great set of tools that together manages the complexity of ML workflows and encourages best practices. The Data Science and Machine Learning teams at Volvo Cars are able to iterate and deliver reproducible, production grade services with ease.” 14
  • 15.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Kubeflow - The Infra Side ● Install ● Manage ● Secure ● Upgrade
  • 16.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 What is GitOps 16 All configuration state is declaratively stored in git.
  • 17.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Imperative vs Declarative Imperative 1. Create Service 2. Update LoadBalancer 3. Upgrade Deployment
  • 18.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Imperative vs Declarative Declarative Desired State (YAML) K8s kind: Pod metadata: name: mysql spec: image: mysql:7.6 apply etcd
  • 19.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Controller Spec (desired) Status (real) Kubernetes Objects Controller Pattern - The driver behind declarative APIs Used everywhere in Kubernetes Observe Calculate Reconcile Physical ResourcesPhysical ResourcesPhysical Resources write
  • 20.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Why GitOps?
  • 21.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 K8s etcd K8s etcd Reproducibility commit 856df4gdf56g4561d1fg564df5g61v6854df Author: yanniszark <yanniszark@arrikto.com> Date: Tuesday, Sep 8 11:24:12 2020 +0200 Upgrade MySQL to new version. K8s etcd apply ● Whole configuration state in git, versioned by commits ● Careful! Mutable state still outside of git (e.g., volumes, S3) ○ Need versioning solution for end-to-end reproducibility ○ Arrikto Rok produces data commits for your volumes (e.g., MySQL)
  • 22.
    Simplify. Accelerate. Collaborate.arrik.to/odsc20 Rollbacks commit 856df4gdf56g4561d1fg564df5g61v6854df Author: yanniszark <yanniszark@arrikto.com> Date: Tuesday, Sep 8 11:24:12 2020 +0200 Upgrade MySQL to new version. commit er1f1ef8f1e1rf5641sdfs564d1fsd1f5sd61fgwd Author: yanniszark <yanniszark@arrikto.com> Date: Tuesday, Sep 4 15:24:12 2020 +0200 Increase MySQL read-replicas to 3 for higher availability. git log K8s etcd apply apply Unhealthy
Auditing
git blame:
48f078b0 (Yannis Zarkadas    2020-06-11 41) kind: Deployment
48f078b0 (Yannis Zarkadas    2020-06-11 42) metadata:
48f078b0 (Yannis Zarkadas    2020-06-11 43)   name: nginx
48f078b0 (Yannis Zarkadas    2020-06-11 46) spec:
48f078b0 (Stefano Fioravanzo 2020-06-11 47)   replicas: 1
23
Rich Ecosystem
● Collaboration through familiar and battle-tested tools
○ Pull Requests and Code Reviews
● Rich offerings
○ GitHub, GitLab, etc.
● Plenty of integrations
○ GitHub Actions, GitLab Pipelines, etc.
Reuse whatever you already know about git!
24
GitOps Workflow
25
GitOps Workflow
Commit the desired state (YAML) to the GitOps repo; the deployer applies it with kubectl apply:
kind: Pod
metadata:
  name: mysql
spec:
  image: mysql:7.6
26
GitOps Workflow
● What about 3rd-party applications?
● Usually, infrastructure configuration is provided by the vendor
● For example, Kubeflow maintains a “manifests” monorepo with all deployment configurations
Kubeflow developers commit to the upstream manifests repo; you periodically rebase your downstream GitOps repo on it, and the deployer applies the result with kubectl apply.
27
GitOps - Managing Configuration
● How do you manage configuration?
○ Use 3rd-party provided configs
○ Apply customer changes
○ Update periodically
● Several tools:
○ helm
○ kustomize
○ ...
● Kubeflow uses kustomize
● We (Arrikto) use kustomize for our deployments
kind: Deployment
metadata:
  name: redis
  namespace: deploy
spec:
  template:
    spec:
      image: gcr.io/redis:6
      replicas: 3
28
Managing Configuration - Helm
● Helm is the most popular tool that uses templating
● Exposes knobs to consumers via a values file
● Templating is hard to read
The vendor repo (upstream) provides the Chart; the customer repo (downstream) provides values.yaml.
29
Managing Configuration - Templating
{{ if (or (not .Values.persistence.enabled) (eq .Values.persistence.type "pvc")) }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ template "grafana.fullname" . }}
  namespace: {{ template "grafana.namespace" . }}
  labels:
    {{- include "grafana.labels" . | nindent 4 }}
    {{- if .Values.labels }}
{{ toYaml .Values.labels | indent 4 }}
    {{- end }}
  {{- with .Values.annotations }}
  annotations:
{{ toYaml . | indent 4 }}
  {{- end }}
https://github.com/helm/charts/blob/99805df25da220c379ad609fcb7cf20e5e0d4fc0/stable/grafana/templates/deployment.yaml
30
Managing Configuration - kustomize
● Base configuration
└── redis
    ├── base
    │   ├── configmap.yaml
    │   ├── kustomization.yaml
    │   ├── service.yaml
    │   └── statefulset.yaml
kustomization.yaml:
resources:
- configmap.yaml
- service.yaml
- statefulset.yaml
31
Managing Configuration - kustomize
kustomize build redis/base
kustomization.yaml:
resources:
- configmap.yaml
- service.yaml
- statefulset.yaml
Output:
kind: Deployment
metadata:
  name: redis
spec:
  template:
    spec:
      image: gcr.io/redis:6
      replicas: 1
32
Managing Configuration - kustomize
● Create overlays (variants) to customize the deployment
└── redis
    ├── base
    └── overlays
        └── deploy
            ├── kustomization.yaml
            └── patches
                └── replicas.yaml
kustomization.yaml:
bases:
- ../base
namespace: deploy
patches:
- path: patches/replicas.yaml
patches/replicas.yaml:
kind: Deployment
metadata:
  name: redis
spec:
  template:
    spec:
      replicas: 3
33
Managing Configuration - kustomize
kustomize build redis/overlays/deploy
The overlay kustomization.yaml (bases: ../base, namespace: deploy, patches: patches/replicas.yaml) produces:
kind: Deployment
metadata:
  name: redis
  namespace: deploy
spec:
  template:
    spec:
      image: gcr.io/redis:6
      replicas: 3
34
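Conceptually, the overlay build above is a recursive merge of the patch onto the base manifest. The sketch below illustrates that idea in plain Python; it is a simplified stand-in for kustomize's patching, not the real implementation.

```python
# Simplified sketch of what `kustomize build redis/overlays/deploy` computes:
# recursively merge the overlay patch (and the namespace setting) onto the base.
def merge(base: dict, patch: dict) -> dict:
    """Recursively merge patch onto base; patch values win on conflict."""
    out = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


base = {
    "kind": "Deployment",
    "metadata": {"name": "redis"},
    "spec": {"template": {"spec": {"image": "gcr.io/redis:6", "replicas": 1}}},
}
patch = {"spec": {"template": {"spec": {"replicas": 3}}}}       # patches/replicas.yaml
namespace = {"metadata": {"namespace": "deploy"}}               # namespace: deploy

result = merge(merge(base, patch), namespace)
```

Because the patch only names the fields it changes, the untouched fields (the image, the name) survive the merge, which is why downstream customizations rebase cleanly onto upstream updates.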
Managing Configuration - kustomize
(Diagrams: the vendor repo (upstream) advances from v1 to v2 to v3, while the customer repo (downstream) carries its customization commit d1 on top of the upstream history.)
● Update with git rebase
● Separate file == no conflicts
35-37
Managing Configuration - kustomize
└── redis
    ├── base
    └── overlays
        └── deploy
● Powerful customization capabilities
● Rebase from upstream to get new updates
● Customizations in separate folders, no conflicts on rebase
The upstream repo provides the base; consumer customizations live in the GitOps repo's overlays.
38
Why GitOps in your Kubeflow Deployment
● Simplify Kubeflow stack installation, configuration, and management
○ Deploy and manage software in a declarative way
○ Complete visibility of system configuration
● Accelerate the upgrade process by continuously deploying changes to the cluster
○ Track changes and revert if something goes wrong
● Collaborate better and faster, share knowledge with the whole team
○ Keep using your favorite familiar tools and workflows
39
Demo
1. Kubernetes Cluster (EKS) on Amazon Web Services
2. Deploy Rok
3. Deploy Kubeflow
4. Update installation from upstream
40
Security in Kubeflow
“We observed that this attack effected on tens of Kubernetes clusters.”
41
Multi-User Isolation
Authentication? Authorization?
42
Authentication using the OIDC Protocol
● Open and standardized flow built on OAuth 2.0
● Objective: get the user’s identity (username, groups)
● Popular and secure
43
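Concretely, "getting the user's identity" means reading claims out of the OIDC ID token, which is a JWT: three base64url-encoded parts (header.payload.signature). The sketch below only decodes the payload of a toy token to show the claim structure; a real deployment must verify the token's signature against the provider's keys (e.g., with a JWT library) before trusting any claim.

```python
# Hedged sketch: extract identity claims (email, groups) from an OIDC ID token.
# WARNING: this performs NO signature verification; it is for illustration only.
import base64
import json


def claims_from_id_token(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT into its claims dict."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)   # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


# Build a toy, unsigned token just to exercise the decoder.
body = {"email": "user@example.com", "groups": ["data-science"]}
encoded = base64.urlsafe_b64encode(json.dumps(body).encode()).decode().rstrip("=")
token = f"header.{encoded}.signature"

claims = claims_from_id_token(token)
```

The identity (here `email` and `groups`) is what the authorization layer then feeds into its RBAC checks.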
Identity Provider
The OIDC provider interface can back onto:
● LDAP / AD
● Static Password File
● External IdP (Google, LinkedIn, …)
44
Authorization
● Authorization with Role Based Access Control (RBAC)
● Commit RBAC resources in git for reproducibility
Can USER do ACTION on RESOURCE in NAMESPACE?

Endpoint                                        | RBAC Resource | Verb
GET    /apis/kubeflow.org/v1/notebooks/{name}   | Notebooks     | GET
GET    /apis/kubeflow.org/v1/notebooks          | Notebooks     | LIST
POST   /apis/kubeflow.org/v1/notebooks          | Notebooks     | CREATE
DELETE /apis/kubeflow.org/v1/notebooks/{name}   | Notebooks     | DELETE
GET    /apis/kubeflow.org/v1/experiments/{name} | Experiments   | GET
45
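The question "can USER do ACTION on RESOURCE in NAMESPACE?" can be sketched as a lookup over roles and role bindings. The shapes below loosely mirror Kubernetes RBAC (Roles grant verbs on resources; RoleBindings grant roles to users in a namespace), but the role names and users are illustrative.

```python
# Simplified RBAC check: Roles grant verbs on resources;
# RoleBindings attach roles to (user, namespace) pairs.
roles = {
    "notebook-editor": {
        "resources": {"notebooks"},
        "verbs": {"get", "list", "create", "delete"},
    },
    "experiment-viewer": {
        "resources": {"experiments"},
        "verbs": {"get", "list"},
    },
}

# RoleBindings: (user, namespace) -> set of roles granted there
bindings = {
    ("alice", "team-a"): {"notebook-editor"},
    ("bob", "team-a"): {"experiment-viewer"},
}


def allowed(user: str, verb: str, resource: str, namespace: str) -> bool:
    """Can USER do VERB on RESOURCE in NAMESPACE?"""
    for role_name in bindings.get((user, namespace), set()):
        role = roles[role_name]
        if resource in role["resources"] and verb in role["verbs"]:
            return True
    return False
```

Because roles and bindings are plain declarative objects, committing them to git (as the slide suggests) makes the entire authorization policy reproducible and auditable.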
Handling Credentials
● Credentials are kept in Secrets
● Injected into Pods at runtime with PodDefaults
● Applications expect to find secrets in files or environment variables
46
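From the application's point of view, an injected credential simply shows up as an environment variable or a mounted file. The sketch below shows a typical lookup order; the variable name and mount path are hypothetical examples, not fixed Kubeflow conventions (the actual names come from your Secret and PodDefault definitions).

```python
# Sketch: how an application consumes a credential injected into its Pod.
# Try an environment variable first, then fall back to a mounted file.
import os
from typing import Optional


def load_credential(env_var: str, file_path: str) -> Optional[str]:
    value = os.environ.get(env_var)
    if value:
        return value
    try:
        with open(file_path) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


# Simulate the env var a PodDefault might inject (illustrative name and path).
os.environ["S3_ACCESS_KEY"] = "example-key"
cred = load_credential("S3_ACCESS_KEY", "/var/run/secrets/s3/access-key")
```

Keeping the lookup outside the application logic means the same code runs unchanged whether the cluster injects the secret as an env var or as a file.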
Auth Guidelines for Kubeflow
● Guidelines for secure applications in Kubeflow:
https://github.com/kubeflow/community/blob/3357efef4947297026111df17e468d9204fa2061/guidelines/auth.md
47
CI/CD for ML
How can data scientists continually improve and validate models? The Notebook-to-Pipeline (N2P) Critical User Journey (CUJ):
● Develop models and pipelines in Jupyter
● Convert the notebook to a pipeline using Kale
● Run the pipeline using Kubeflow Pipelines
● Explore and debug the pipeline using Rok
48
Data Science with Kubeflow
This workshop will focus on two essential aspects:
● Low barrier to entry: deploy a Jupyter Notebook to Kubeflow Pipelines in the Cloud using a fully GUI-based approach
● Reproducibility: automatic data versioning to enable reproducibility and better collaboration between data scientists
Kubeflow Pipelines exists because Data Science and ML are inherently pipeline processes:
Building a Model (Data Ingestion, Data Analysis, Data Transformation, Data Validation, Data Splitting, Trainer, Model Validation, Training At Scale), then Roll-out, Serving, Monitoring, and Logging.
49
Benefits of running a Notebook as a Pipeline
● The steps of the workflow are clearly defined
● Parallelization & isolation
○ Hyperparameter tuning
● Data versioning
● Different infrastructure requirements
○ Different hardware (GPU/CPU)
51
Workflow
Before:
Write your ML code → Create Docker images → Write KFP DSL code → Compile the KFP DSL → Upload pipeline to KFP → Run the Pipeline → Amend your ML code?
After:
Write your ML code → Tag your Notebook cells → Run the Pipeline at the click of a button → Amend your ML code? Just edit your Notebook!
A Data Scientist can now reduce the time taken to write ML code and run a pipeline by 70%. That means you can now run 3x as many experiments as you did before. What that really means is that you can deliver work faster to the business and drive more revenue.
54
Hyperparameter optimization
The two ways of life:
● Change the parameters manually
● Use Katib
55
What is Katib
Katib is a Kubernetes-based system for Hyperparameter Tuning and Neural Architecture Search. It supports a number of ML frameworks, including TensorFlow, Apache MXNet, PyTorch, XGBoost, and others.
56
Hyperparameter optimization
Combining the N2P CUJ with Katib:
● Configure parameters, search algorithm, and objectives using a GUI
● Start HP tuning with the click of a button
● Reproducibility of every pipeline and every step
● Run Katib Trials as Pipelines
● Complete visibility of every different Katib Trial
● Caching for faster computation
57
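The core loop that a system like Katib automates can be sketched as: sample hyperparameters from a declared search space, run one trial per sample, and keep the best objective value. The objective function below is a cheap stand-in for a real training run, and the parameter names are illustrative.

```python
# Hedged sketch of hyperparameter tuning by random search.
# A real Katib Experiment would launch each trial as a separate (containerized)
# training job; here the "trial" is just a function call.
import random

random.seed(0)

search_space = {
    "learning_rate": (1e-4, 1e-1),    # continuous range
    "batch_size": [16, 32, 64, 128],  # discrete choices
}


def objective(params: dict) -> float:
    # Stand-in for validation accuracy; a real trial trains and evaluates a model.
    return (1.0
            - abs(params["learning_rate"] - 0.01)
            - 0.001 * abs(params["batch_size"] - 64))


def random_search(trials: int):
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        params = {
            "learning_rate": random.uniform(*search_space["learning_rate"]),
            "batch_size": random.choice(search_space["batch_size"]),
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(50)
```

Because each trial is independent, they parallelize naturally, which is exactly why running Katib Trials as isolated pipeline steps (as above) pays off.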
A data science journey
58
Agenda
1. Convert notebook to a Kubeflow pipeline
2. Explore Kubeflow components
3. Explore the ML code of the dog breed identification example
4. Explore the accuracy of the various models
5. Optimize a model with hyperparameter tuning
6. Explore the results of HP tuning
Go to arrik.to/demowfhp to find the Codelab with the step-by-step instructions for this tutorial.
59
KALE – Kubeflow Automated Pipelines Engine
● Python package + JupyterLab extension
● Converts an annotated Jupyter Notebook to a KFP workflow
● No need for the Kubeflow SDK
Annotated Jupyter Notebook → Kale Conversion Engine
63
Kale Modules
● Parse: derive pipeline structure
● Analyze: identify dependencies
● Marshal: inject data objects
● Generate: generate & deploy pipeline
64
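The Parse and Analyze stages above boil down to a classic idea: tagged notebook cells become pipeline steps, and their declared dependencies define the execution order via a topological sort. The sketch below illustrates that idea with hypothetical step names; it is not Kale's actual implementation.

```python
# Illustrative sketch of Kale's Parse/Analyze idea: cells tagged as steps,
# plus their dependencies, yield a valid pipeline execution order.
from graphlib import TopologicalSorter  # Python 3.9+

# step -> set of steps it depends on (as would be declared by cell tags)
steps = {
    "load_data": set(),
    "preprocess": {"load_data"},
    "train": {"preprocess"},
    "evaluate": {"train"},
}

order = list(TopologicalSorter(steps).static_order())
```

The Generate stage then emits one KFP pipeline step per node, wired together in this order, which is how a linear notebook becomes an explicit DAG.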
Contribute
github.com/kubeflow-kale
65
TFDV, TFTransform, TFDV, Estimators, TFMA, TFServing, Katib Tuner, Arrikto Rok
66
Arrikto Rok
Data Versioning, Packaging, and Sharing across teams and cloud boundaries for complete Reproducibility, Provenance, and Portability.
(Diagram: Experimentation, Training, and Production each run on any storage, with data-aware PVCs backed by the Arrikto CSI driver.)
67
Model Building without Data Management
Step 1:
1. Download data from Lake
2. Store it locally
3. Do initial analysis
4. Upload data to Lake
Step 2:
5. Download data from Lake
6. Store it locally
7. Transform data
8. Upload to Lake
Step 3:
9. Download data from Lake
10. Store it locally
11. Train model
12. Upload Model
71
Model Building with Local Data Management (Rok)
Step 1:
1. Clone disk from snapshot
2. Do initial analysis
3. Snapshot
Step 2:
4. Clone disk of Step 1
5. Transform data
6. Snapshot
Step 3:
7. Clone disk of Step 2
8. Train model
9. Snapshot
Rok backs the snapshots with an Object Store.
72
Sync State & Data
(Diagram: Pipeline 1 runs Steps 1-3 at Location 1, snapshotting to the Arrikto Object Store. Pipeline 2 at Location 2 starts after Step 3 of Pipeline 1 and continues with Steps 4-6; Pipeline 3 reproduces Pipeline 1.)
73
(Diagram: a pipeline of Validation → Preprocessing → Training → Evaluation → Deployment, producing Validated Data, Preprocessed Data, Trained Model, Evaluated Model, and Deployed Model; Arrikto Rok versions the data at each stage, validation can Fail, and training runs on Cloned Data.)
74
Summary
What have we achieved in this tutorial?
● Streamline your ML workflows using intuitive UIs
● Exploit the caching feature to give a boost to your pipeline runs
● Run a pipeline-based hyperparameter tuning workflow starting from your Jupyter Notebook
● Use Kale as a workflow tool to orchestrate Katib and Kubeflow Pipelines experiments
● Simplify the deployment and management of Kubeflow using GitOps
● Accelerate the time to production
● Collaborate faster and more easily in a secure and isolated manner
78
Just a small sample of community contributions:
● Jupyter manager UI
● Pipelines volume support
● MiniKF
● Auth with Istio + Dex
● On-premise installation
● Linux Kernel
79
Community
Kubeflow is open:
● Open community
● Open design
● Open source
● Open to ideas
Get involved:
● github.com/kubeflow
● kubeflow.slack.com
● @kubeflow
● kubeflow-discuss@googlegroups.com
● Community call on Tuesdays
80
Thank You!
More Info: arrik.to/odsc20
Email Address: stefano@arrikto.com, yanniszark@arrikto.com
company/arrikto