Kubernetes, Data Science and Machine Learning
Learn more at kublr.com/how-it-works/kublr-platform
Common ML Challenges and Approaches
Common ML challenges:
• Computer vision, natural language processing, speech recognition, predictions, anomaly detection
ML approaches:
• Supervised learning: classification, regression
• Unsupervised learning: clustering
Typical ML Challenges
1. Data source: DB / file storage, batches / streaming
2. Data preparation: data cleansing, data transformation
3. Modelling: model training, optimization
4. Model serving: A/B testing, inferencing
5. Analysis: results exploration, interpretations
Why Use Kubernetes for ML?
Architecture – separation of concerns (dev, ops, infra), useful abstractions; universality
Pluggable and extensible – k8s is a set of open source microservices
Scalability and HA – autoscaling, resource management, self-healing
Container based – isolation, lightweight, few (if any) limitations on applications
Cloud and OS agnostic – Kubernetes + containers
Shared compute – RBAC, Limits, Quotas
On-demand – cloud support, autoscaling, reproducible applications
Frameworks – Great community
Kubernetes and Kublr for ML
• Infrastructure abstraction and scheduling
• DevOps and operational layer: monitoring and logging, observability, HA
• Auto-scaling: HPA and cluster auto-scaler
• Kubernetes operators
• Storage (HDFS, Rook/Ceph)
• Custom resources and GPU
Kubernetes as an Orchestration Platform
Kubernetes
• Infrastructure abstraction
• Orchestration
• Network
• Configuration
• Service discovery
• Ingress
• Persistence
[Diagram labels: Master; K8s master components (etcd, scheduler, API, controller); K8s metadata; Node: Docker, kubelet, app data; K8s node components (overlay network, discovery, connectivity); infrastructure and application containers; overlay network]
Kublr as Operations and DevOps Layer
[Diagram labels: API and UI; K8S clusters (PoC, Dev, Prod) in cloud and data center; log collection; operations; monitoring; authn and authz, SSO, fed; audit; image repo; infrastructure management; backup & DR]
• Security
• Multiple environments
• Hybrid support
• Infrastructure
• Operations
• Monitoring and logs
• Backup and DR
• Container image management
Horizontal Pod Autoscaler
• Cooldown/delay
• Rolling update
• Multiple metrics
• Custom metrics
[Diagram: within Kubernetes, the HPA reads metrics and scales a Deployment between Pod 1 ... Pod N]
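The scaling decision behind this slide can be sketched in a few lines. This is a minimal illustration of the replica formula given in the Kubernetes HPA documentation (desired = ceil(current replicas × current metric / target metric)), not the controller's actual code. The function names and the min/max bounds are hypothetical; the 0.1 tolerance mirrors the documented controller default.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Replica proposal for a single metric, using the documented HPA ratio rule."""
    ratio = current_value / target_value
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def desired_for_metrics(current_replicas, metrics, min_replicas=1, max_replicas=10):
    """With multiple metrics, the largest proposal wins; bounds are applied last.

    `metrics` is a list of (current_value, target_value) pairs.
    The min/max defaults here are illustrative assumptions.
    """
    proposals = [desired_replicas(current_replicas, cur, tgt) for cur, tgt in metrics]
    return max(min_replicas, min(max_replicas, max(proposals)))
```

For example, with 4 replicas and a metric at twice its target, the sketch proposes 8 replicas; a second metric at half its target would propose 2, and the larger value is used. Cooldown/delay behavior is not modeled here.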
Cluster Autoscaler
[Diagram: Kubernetes master and Cluster Autoscaler; node groups 1 ... M, each scaled between Node 1 ... Node N]
Resources:
1. Requested by pods
2. Provided by nodes
• Multiple node groups
• AWS, Azure, GCE
• Cool-down period
• Scheduling rules compliance
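The "requested by pods" versus "provided by nodes" idea can be illustrated with a toy calculation. The sketch below is an assumption-laden simplification: it only looks at CPU requests (in millicores) of pending pods and packs them onto new, empty nodes of one size using first-fit-decreasing. The real Cluster Autoscaler simulates the scheduler and honors scheduling rules, node groups, and cool-down periods, none of which are modeled here. The function name is hypothetical.

```python
def nodes_to_add(pending_cpu_requests, node_cpu_capacity):
    """Estimate how many new nodes are needed to fit the pending pods.

    pending_cpu_requests: CPU requests of unschedulable pods, in millicores.
    node_cpu_capacity: allocatable CPU of one new node, in millicores.
    """
    free = []  # remaining capacity on each hypothetical new node
    for request in sorted(pending_cpu_requests, reverse=True):
        if request > node_cpu_capacity:
            raise ValueError("pod can never fit on this node type")
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] = remaining - request
                break
        else:
            # No existing new node has room: open another one.
            free.append(node_cpu_capacity - request)
    return len(free)
```

Scale-down works in the opposite direction: nodes whose pods could be rescheduled elsewhere become candidates for removal after the cool-down period.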
Operators
Kubernetes objects:
• Deployments
• Pods
• Namespaces
• Persistent Volumes
• Custom Resources
• ...
Operator
• Arbitrary software
• Operations automation
• Management automation
• Cloud native adaptation
• Custom Resources
• Annotations
• ...
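At the core of most operators is a reconcile loop: compare the desired state declared in custom resources with the actual state, then act on the difference. The following is a minimal, library-free sketch of that comparison step. The function name, the dict-based state, and the action tuples are all illustrative assumptions; a real operator would watch the Kubernetes API and apply the changes.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` toward `desired`.

    Both arguments map a resource name to its spec (a plain dict).
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


desired = {"db-0": {"version": "2"}, "db-1": {"version": "2"}}
actual = {"db-0": {"version": "1"}, "db-2": {"version": "1"}}
plan = reconcile(desired, actual)
```

Running the comparison repeatedly until it returns an empty list is what gives operators their self-healing behavior.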
Storage: HDFS and Hadoop
Hadoop/HDFS
• Scheduling tasks close to the data
• Reliable storage
• Established tool stack for data science and ML
Kubernetes
• Infrastructure management and recovery
• Underlying storage management
• Portability, hybrid support
Storage: Rook and Ceph
Rook = Ceph operator
Cloud native Ceph
Custom resources:
• Cluster
• Replica pool
• File system
Supported storage types:
• Block (RBD)
• Filesystem
• Object (S3, OpenStack API)
[1] https://www.youtube.com/watch?v=iwVAvV_lI_Q
Kubernetes, GPU, and Kublr
• Standard Kubernetes resources: CPU, RAM, storage
• Custom resources – GPU, FPGA – via device plugins
• NVIDIA GPUs require a driver and a custom container runtime
• Kublr automates:
  • GPU driver installation
  • NVIDIA container runtime setup
  • NVIDIA device plugin setup
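Once the driver, runtime, and device plugin are in place, a pod asks for a GPU like any other resource. The sketch below builds such a pod manifest as a Python dict and prints it as JSON, which kubectl accepts alongside YAML. The resource name `nvidia.com/gpu` is the one exposed by the NVIDIA device plugin; the pod name, image, and helper function are hypothetical.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a Pod manifest whose single container requests `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": name,
                "image": image,
                # Extended resources such as GPUs are requested under limits.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


# Hypothetical name and image, for illustration only.
manifest = gpu_pod_manifest("train-job", "example.com/ml/train:latest")
print(json.dumps(manifest, indent=2))
```

The scheduler then places the pod only on a node that advertises a free GPU.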
Major ML Stacks Compatible with Kubernetes
• Kubeflow with the TensorFlow TF-operator: github.com/kubeflow/kubeflow
• Spark 2.3.0
• In-house solution (models in Docker containers, run on cloud or on-prem Kubernetes)
• Apache Beam: beam.apache.org
• Other rather “new” open source solutions
• Cloud and other vendor solutions
Kubeflow
• Simplifies scaling and deploying machine learning applications
• Ongoing work to include additional tooling
• Train/serve TensorFlow models in different environments
• Use Jupyter notebooks to manage TensorFlow training jobs
Spark without “Native” Kubernetes Support
A Spark standalone cluster in Kubernetes
Spark 2.3.0
bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///path/to/examples.jar
In-House Solution
• Custom Docker image with ML logic
• Kubernetes as scheduler
• Monitoring tools
• Custom implementation of a distributed task scheduler or framework (e.g. Celery)
Demo | ML and HPA with Custom Metrics
To view the demo, check out our webinar on:
https://goo.gl/vY6HbE
kublr.com/demo
Vlad Penkin
Oleg Chunikhin
Arkadii Ocheretnoi
Thank you!
