SlideShare a Scribd company logo
Kubernetes, Data Science and Machine Learning
Click to add text
Click to add text
Learn more at kublr.com/how-it-works/kublr-platform
Common ML Challenges and Approaches
Common ML challenges:
• Computer vision, natural language processing, speech
recognition, predictions, anomaly detection,
ML approaches:
• Supervised learning
• Classification, Regression
• Unsupervised learning
• Clustering
Typical ML Challenges
1. Data Source
2. Data preparation
3. Modelling
4. Model serving
5. Analysis
DB / File storage
Data Cleansing
Batches /Streaming
Data Transformation
Model Training
A/B Testing
Optimization
Inferencing
Results Exploration Interpretations
Why Using Kubernetes for ML?
Architecture – separation of concerns (dev, ops, infra), useful abstractions; universality
Pluggable and extensible – k8s is a set of open source microservices
Scalability and HA – autoscaling, resource management, self-healing
Container based – isolation, lightweight, few (if any) limitations on applications
Cloud and OS agnostic – Kubernetes + containers
Shared compute – RBAC, Limits, Quotas
On-demand – cloud support, autoscaling, reproducible applications
Frameworks – Great community
Kubernetes and Kublr for ML
• Infrastructure abstraction and scheduling
• DevOps and operational layer: monitoring and logging,
observability, HA
• Auto-scaling: HPA and cluster auto-scaler
• Kubernetes operators
• Storage (HDFS, Rook/Ceph)
• Custom resources and GPU
Kubernetes as an Orchestration Platform
Kubernetes
• Infrastructure abstraction
• Orchestration
• Network
• Configuration
• Service discovery
• Ingress
• Persistence
Master Node
K8s master components:
etcd, scheduler, api,
controller
K8s
metadata
Docker
kubelet
App data
K8s node components:
overlay network,
discovery, connectivity
Infrastructure and
application containers
Infrastructure and
application containers
Overlay
network
Kublr as Operations and DevOps Layer
K8S Clusters
PoC
Dev
Prod
Cloud
Data
center
API UI
Log collection
Operations
Monitoring
Authn and authz, SSO, fed
Audit Image Repo
Infrastructure management
Backup & DR
• Security
• Multiple environments
• Hybrid support
• Infrastructure
• Operations
• Monitoring and logs
• Backup and DR
• Container image
management
Horizontal Pod Autoscaler
• Cooldown/delay
• Rolling update
• Multiple metrics
• Custom metrics
Kubernetes
Deployment
HPA
Pod NPod 1
scale
...
metrics
Cluster Autoscaler
Kubernetes
Node group 1
Cluster
Autoscaler
Node NNode 1
scale
...
Resources
1. Requested by pods
2. Provided by nodes
• Multiple node groups
• AWS, Azure, GCE
• Cool-down period
• Scheduling rules
compliance
Node group M
Node NNode 1 ...
...
Master
Operators Kubernetes
K8S objects:
• Deployments
• Pods
• Namespaces
• Persistent Volumes
• Custom Resources
• ...
Operator
• Arbitrary software
• Operations automation
• Management automation
• Cloud native adaptation
• Custom Resources
• Annotations
• ...
Storage: HDFS and Hadoop
Hadoop/HDFS
• Scheduling tasks close to the data
• Reliable storage
• Established tool stack for data science and ML
Kubernetes
• Infrastructure management and recovery
• Underlying storage management
• Portability, hybrid support
Storage| Rook and Ceph
Rook = Ceph operator
Cloud native Ceph
Custom resources:
• Cluster
• Replica pool
• File system
Supported storage types:
• Block (rdb)
• Filesystem
• Object (S3, OpenStack
API)
[1] https://www.youtube.com/watch?v=iwVAvV_lI_Q
Kubernetes, GPU, and Kublr
• Standard Kubernetes resources: CPU, RAM, storage
• Custom resources – GPU, FPGA – via device plugins
• Nvidia GPU require driver and custom container runtime
• Kublr automates
• GPU driver installation
• Nvidia container runtime setup
• Nvidia device plugin setup
Major ML Stacks Compatible w/ Kubernetes
• Kubernetes TensorFlow TF-operator: github.com/kubeflow/kubeflow
• Spark 2.3.0
• In-house solution (model in Docker containers, run them on cloud or on-
prem Kubernetes )
• beam.apache.org
• Other rather “new” open source solutions
• Cloud and other vendor solutions
Kubeflow
• Simplify scaling and deploy machine learning applications
• Work on including different tooling
• Train/serve TensorFlow models in different environments
• Use Jupyter notebooks to manage TensorFlow training jobs
Spark without “Native " Kubernetes Support
A Spark standalone cluster in Kubernetes
Spark 2.3.0 bin/spark-submit 
--master k8s://https://<k8s-apiserver-
host>:<k8s-apiserver-port> 
--deploy-mode cluster 
--name spark-pi  --class
org.apache.spark.examples.SparkPi 
--conf spark.executor.instances=5 
--conf
spark.kubernetes.container.image=<spark-
image> 
local:///path/to/examples.jar
In-House Solution
• Custom Docker image with ML logic
• Kubernetes as scheduler
• Monitoring tools
• Custom implementation of distributed tasks scheduler or
framework (e.g. Celery)
Demo | ML and HPA with Custom Metrics
Demo
ML and HPA with Custom Metrics
To view the demo, check out our webinar on:
https://goo.gl/vY6HbE
kublr.com/demo
Vlad Penkin
Oleg Chunikhin
Arkadii Ocheretnoi
Thank you!

More Related Content

What's hot

Managing kubernetes deployment with operators
Managing kubernetes deployment with operatorsManaging kubernetes deployment with operators
Managing kubernetes deployment with operators
Cloud Technology Experts
 
Openstack days sv building highly available services using kubernetes (preso)
Openstack days sv   building highly available services using kubernetes (preso)Openstack days sv   building highly available services using kubernetes (preso)
Openstack days sv building highly available services using kubernetes (preso)
Allan Naim
 
Setup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes FederationSetup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes Federation
inwin stack
 
How to Run Kubernetes in Restrictive Environments
How to Run Kubernetes in Restrictive EnvironmentsHow to Run Kubernetes in Restrictive Environments
How to Run Kubernetes in Restrictive Environments
Kublr
 
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)
Kublr
 
[Spark Summit 2017 NA] Apache Spark on Kubernetes
[Spark Summit 2017 NA] Apache Spark on Kubernetes[Spark Summit 2017 NA] Apache Spark on Kubernetes
[Spark Summit 2017 NA] Apache Spark on Kubernetes
Timothy Chen
 
Securing and Automating Kubernetes with Kyverno
Securing and Automating Kubernetes with KyvernoSecuring and Automating Kubernetes with Kyverno
Securing and Automating Kubernetes with Kyverno
Saim Safder
 
Running I/O intensive workloads on Kubernetes, by Nati Shalom
Running I/O intensive workloads on Kubernetes, by Nati ShalomRunning I/O intensive workloads on Kubernetes, by Nati Shalom
Running I/O intensive workloads on Kubernetes, by Nati Shalom
Cloud Native Day Tel Aviv
 
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
KCDItaly
 
Top Considerations For Operating a Kubernetes Environment at Scale
Top Considerations For Operating a Kubernetes Environment at ScaleTop Considerations For Operating a Kubernetes Environment at Scale
Top Considerations For Operating a Kubernetes Environment at Scale
SignalFx
 
Sf bay area Kubernetes meetup dec8 2016 - deployment models
Sf bay area Kubernetes meetup dec8 2016 - deployment modelsSf bay area Kubernetes meetup dec8 2016 - deployment models
Sf bay area Kubernetes meetup dec8 2016 - deployment models
Peter Ss
 
Spinnaker on Kubernetes
Spinnaker on KubernetesSpinnaker on Kubernetes
Spinnaker on Kubernetes
Jinwoong Kim
 
Devops Columbia October 2020 - Gabriel Alix: A Discussion on Terraform
Devops Columbia October 2020 - Gabriel Alix: A Discussion on TerraformDevops Columbia October 2020 - Gabriel Alix: A Discussion on Terraform
Devops Columbia October 2020 - Gabriel Alix: A Discussion on Terraform
Drew Malone
 
Advanced Scheduling in Kubernetes
Advanced Scheduling in KubernetesAdvanced Scheduling in Kubernetes
Advanced Scheduling in Kubernetes
Kublr
 
Tectonic Summit 2016: Multi-Cluster Kubernetes: Planning for Unknowns
Tectonic Summit 2016: Multi-Cluster Kubernetes: Planning for UnknownsTectonic Summit 2016: Multi-Cluster Kubernetes: Planning for Unknowns
Tectonic Summit 2016: Multi-Cluster Kubernetes: Planning for Unknowns
CoreOS
 
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CISecure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Mitchell Pronschinske
 
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
Platform9
 
使用 Prometheus 監控 Kubernetes Cluster
使用 Prometheus 監控 Kubernetes Cluster 使用 Prometheus 監控 Kubernetes Cluster
使用 Prometheus 監控 Kubernetes Cluster
inwin stack
 
OpenStack on Kubernetes (BOS Summit / May 2017 update)
OpenStack on Kubernetes (BOS Summit / May 2017 update)OpenStack on Kubernetes (BOS Summit / May 2017 update)
OpenStack on Kubernetes (BOS Summit / May 2017 update)
rhirschfeld
 
Storage os kubernetes clusters need persistent data
Storage os   kubernetes clusters need persistent dataStorage os   kubernetes clusters need persistent data
Storage os kubernetes clusters need persistent data
LibbySchulze
 

What's hot (20)

Managing kubernetes deployment with operators
Managing kubernetes deployment with operatorsManaging kubernetes deployment with operators
Managing kubernetes deployment with operators
 
Openstack days sv building highly available services using kubernetes (preso)
Openstack days sv   building highly available services using kubernetes (preso)Openstack days sv   building highly available services using kubernetes (preso)
Openstack days sv building highly available services using kubernetes (preso)
 
Setup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes FederationSetup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes Federation
 
How to Run Kubernetes in Restrictive Environments
How to Run Kubernetes in Restrictive EnvironmentsHow to Run Kubernetes in Restrictive Environments
How to Run Kubernetes in Restrictive Environments
 
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)
Canary Releases on Kubernetes with Spinnaker, Istio, & Prometheus (2020)
 
[Spark Summit 2017 NA] Apache Spark on Kubernetes
[Spark Summit 2017 NA] Apache Spark on Kubernetes[Spark Summit 2017 NA] Apache Spark on Kubernetes
[Spark Summit 2017 NA] Apache Spark on Kubernetes
 
Securing and Automating Kubernetes with Kyverno
Securing and Automating Kubernetes with KyvernoSecuring and Automating Kubernetes with Kyverno
Securing and Automating Kubernetes with Kyverno
 
Running I/O intensive workloads on Kubernetes, by Nati Shalom
Running I/O intensive workloads on Kubernetes, by Nati ShalomRunning I/O intensive workloads on Kubernetes, by Nati Shalom
Running I/O intensive workloads on Kubernetes, by Nati Shalom
 
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
 
Top Considerations For Operating a Kubernetes Environment at Scale
Top Considerations For Operating a Kubernetes Environment at ScaleTop Considerations For Operating a Kubernetes Environment at Scale
Top Considerations For Operating a Kubernetes Environment at Scale
 
Sf bay area Kubernetes meetup dec8 2016 - deployment models
Sf bay area Kubernetes meetup dec8 2016 - deployment modelsSf bay area Kubernetes meetup dec8 2016 - deployment models
Sf bay area Kubernetes meetup dec8 2016 - deployment models
 
Spinnaker on Kubernetes
Spinnaker on KubernetesSpinnaker on Kubernetes
Spinnaker on Kubernetes
 
Devops Columbia October 2020 - Gabriel Alix: A Discussion on Terraform
Devops Columbia October 2020 - Gabriel Alix: A Discussion on TerraformDevops Columbia October 2020 - Gabriel Alix: A Discussion on Terraform
Devops Columbia October 2020 - Gabriel Alix: A Discussion on Terraform
 
Advanced Scheduling in Kubernetes
Advanced Scheduling in KubernetesAdvanced Scheduling in Kubernetes
Advanced Scheduling in Kubernetes
 
Tectonic Summit 2016: Multi-Cluster Kubernetes: Planning for Unknowns
Tectonic Summit 2016: Multi-Cluster Kubernetes: Planning for UnknownsTectonic Summit 2016: Multi-Cluster Kubernetes: Planning for Unknowns
Tectonic Summit 2016: Multi-Cluster Kubernetes: Planning for Unknowns
 
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CISecure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
 
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
Cost-effective Compute Clusters with Spot and Pre-emptible Instances - KubeCo...
 
使用 Prometheus 監控 Kubernetes Cluster
使用 Prometheus 監控 Kubernetes Cluster 使用 Prometheus 監控 Kubernetes Cluster
使用 Prometheus 監控 Kubernetes Cluster
 
OpenStack on Kubernetes (BOS Summit / May 2017 update)
OpenStack on Kubernetes (BOS Summit / May 2017 update)OpenStack on Kubernetes (BOS Summit / May 2017 update)
OpenStack on Kubernetes (BOS Summit / May 2017 update)
 
Storage os kubernetes clusters need persistent data
Storage os   kubernetes clusters need persistent dataStorage os   kubernetes clusters need persistent data
Storage os kubernetes clusters need persistent data
 

Similar to Kubernetes data science and machine learning

Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
DataWorks Summit/Hadoop Summit
 
Kubernetes stack reliability
Kubernetes stack reliabilityKubernetes stack reliability
Kubernetes stack reliability
Oleg Chunikhin
 
How Self-Healing Nodes and Infrastructure Management Impact Reliability
How Self-Healing Nodes and Infrastructure Management Impact ReliabilityHow Self-Healing Nodes and Infrastructure Management Impact Reliability
How Self-Healing Nodes and Infrastructure Management Impact Reliability
Kublr
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
DataWorks Summit/Hadoop Summit
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the Experts
DataWorks Summit/Hadoop Summit
 
Hadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the expertsHadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the experts
DataWorks Summit
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on Kubernetes
DataWorks Summit
 
Spark volume requirements 2018
Spark volume requirements 2018Spark volume requirements 2018
Spark volume requirements 2018
Rachit Arora
 
Fb talk arch_summit
Fb talk arch_summitFb talk arch_summit
Fb talk arch_summitdrewz lin
 
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with CrowbarWicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Kamesh Pemmaraju
 
'Cloud-Native' Ecosystem - Aug 2015
'Cloud-Native' Ecosystem - Aug 2015'Cloud-Native' Ecosystem - Aug 2015
'Cloud-Native' Ecosystem - Aug 2015
Lenny Pruss
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
DataWorks Summit
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference Architectures
Kamesh Pemmaraju
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Red_Hat_Storage
 
Storage as a service and OpenStack Cinder
Storage as a service and OpenStack CinderStorage as a service and OpenStack Cinder
Storage as a service and OpenStack Cinderopenstackindia
 
HDF Cloud Services
HDF Cloud ServicesHDF Cloud Services
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Community
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and Basics
Oleg Magazov
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
DataWorks Summit
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
cdmaxime
 

Similar to Kubernetes data science and machine learning (20)

Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
 
Kubernetes stack reliability
Kubernetes stack reliabilityKubernetes stack reliability
Kubernetes stack reliability
 
How Self-Healing Nodes and Infrastructure Management Impact Reliability
How Self-Healing Nodes and Infrastructure Management Impact ReliabilityHow Self-Healing Nodes and Infrastructure Management Impact Reliability
How Self-Healing Nodes and Infrastructure Management Impact Reliability
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the Experts
 
Hadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the expertsHadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the experts
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on Kubernetes
 
Spark volume requirements 2018
Spark volume requirements 2018Spark volume requirements 2018
Spark volume requirements 2018
 
Fb talk arch_summit
Fb talk arch_summitFb talk arch_summit
Fb talk arch_summit
 
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with CrowbarWicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
 
'Cloud-Native' Ecosystem - Aug 2015
'Cloud-Native' Ecosystem - Aug 2015'Cloud-Native' Ecosystem - Aug 2015
'Cloud-Native' Ecosystem - Aug 2015
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference Architectures
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
 
Storage as a service and OpenStack Cinder
Storage as a service and OpenStack CinderStorage as a service and OpenStack Cinder
Storage as a service and OpenStack Cinder
 
HDF Cloud Services
HDF Cloud ServicesHDF Cloud Services
HDF Cloud Services
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and Basics
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 

More from Kublr

Container Runtimes and Tooling, v2
Container Runtimes and Tooling, v2Container Runtimes and Tooling, v2
Container Runtimes and Tooling, v2
Kublr
 
Container Runtimes and Tooling
Container Runtimes and ToolingContainer Runtimes and Tooling
Container Runtimes and Tooling
Kublr
 
Kubernetes in Hybrid Environments with Submariner
Kubernetes in Hybrid Environments with SubmarinerKubernetes in Hybrid Environments with Submariner
Kubernetes in Hybrid Environments with Submariner
Kublr
 
Intro into Rook and Ceph on Kubernetes
Intro into Rook and Ceph on KubernetesIntro into Rook and Ceph on Kubernetes
Intro into Rook and Ceph on Kubernetes
Kublr
 
Hybrid architecture solutions with kubernetes and the cloud native stack
Hybrid architecture solutions with kubernetes and the cloud native stackHybrid architecture solutions with kubernetes and the cloud native stack
Hybrid architecture solutions with kubernetes and the cloud native stack
Kublr
 
Multi-cloud Kubernetes BCDR with Velero
Multi-cloud Kubernetes BCDR with VeleroMulti-cloud Kubernetes BCDR with Velero
Multi-cloud Kubernetes BCDR with Velero
Kublr
 
Kubernetes Networking 101
Kubernetes Networking 101Kubernetes Networking 101
Kubernetes Networking 101
Kublr
 
Kubernetes Ingress 101
Kubernetes Ingress 101Kubernetes Ingress 101
Kubernetes Ingress 101
Kublr
 
Kubernetes persistence 101
Kubernetes persistence 101Kubernetes persistence 101
Kubernetes persistence 101
Kublr
 
Portable CI/CD Environment as Code with Kubernetes, Kublr and Jenkins
Portable CI/CD Environment as Code with Kubernetes, Kublr and JenkinsPortable CI/CD Environment as Code with Kubernetes, Kublr and Jenkins
Portable CI/CD Environment as Code with Kubernetes, Kublr and Jenkins
Kublr
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
Kublr
 
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-step
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-stepSetting up CI/CD Pipeline with Kubernetes and Kublr step by-step
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-step
Kublr
 
Building Portable Applications with Kubernetes
Building Portable Applications with KubernetesBuilding Portable Applications with Kubernetes
Building Portable Applications with Kubernetes
Kublr
 
Introduction to Kubernetes RBAC
Introduction to Kubernetes RBACIntroduction to Kubernetes RBAC
Introduction to Kubernetes RBAC
Kublr
 
Canary Releases on Kubernetes w/ Spinnaker, Istio, and Prometheus
Canary Releases on Kubernetes w/ Spinnaker, Istio, and PrometheusCanary Releases on Kubernetes w/ Spinnaker, Istio, and Prometheus
Canary Releases on Kubernetes w/ Spinnaker, Istio, and Prometheus
Kublr
 

More from Kublr (15)

Container Runtimes and Tooling, v2
Container Runtimes and Tooling, v2Container Runtimes and Tooling, v2
Container Runtimes and Tooling, v2
 
Container Runtimes and Tooling
Container Runtimes and ToolingContainer Runtimes and Tooling
Container Runtimes and Tooling
 
Kubernetes in Hybrid Environments with Submariner
Kubernetes in Hybrid Environments with SubmarinerKubernetes in Hybrid Environments with Submariner
Kubernetes in Hybrid Environments with Submariner
 
Intro into Rook and Ceph on Kubernetes
Intro into Rook and Ceph on KubernetesIntro into Rook and Ceph on Kubernetes
Intro into Rook and Ceph on Kubernetes
 
Hybrid architecture solutions with kubernetes and the cloud native stack
Hybrid architecture solutions with kubernetes and the cloud native stackHybrid architecture solutions with kubernetes and the cloud native stack
Hybrid architecture solutions with kubernetes and the cloud native stack
 
Multi-cloud Kubernetes BCDR with Velero
Multi-cloud Kubernetes BCDR with VeleroMulti-cloud Kubernetes BCDR with Velero
Multi-cloud Kubernetes BCDR with Velero
 
Kubernetes Networking 101
Kubernetes Networking 101Kubernetes Networking 101
Kubernetes Networking 101
 
Kubernetes Ingress 101
Kubernetes Ingress 101Kubernetes Ingress 101
Kubernetes Ingress 101
 
Kubernetes persistence 101
Kubernetes persistence 101Kubernetes persistence 101
Kubernetes persistence 101
 
Portable CI/CD Environment as Code with Kubernetes, Kublr and Jenkins
Portable CI/CD Environment as Code with Kubernetes, Kublr and JenkinsPortable CI/CD Environment as Code with Kubernetes, Kublr and Jenkins
Portable CI/CD Environment as Code with Kubernetes, Kublr and Jenkins
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-step
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-stepSetting up CI/CD Pipeline with Kubernetes and Kublr step by-step
Setting up CI/CD Pipeline with Kubernetes and Kublr step by-step
 
Building Portable Applications with Kubernetes
Building Portable Applications with KubernetesBuilding Portable Applications with Kubernetes
Building Portable Applications with Kubernetes
 
Introduction to Kubernetes RBAC
Introduction to Kubernetes RBACIntroduction to Kubernetes RBAC
Introduction to Kubernetes RBAC
 
Canary Releases on Kubernetes w/ Spinnaker, Istio, and Prometheus
Canary Releases on Kubernetes w/ Spinnaker, Istio, and PrometheusCanary Releases on Kubernetes w/ Spinnaker, Istio, and Prometheus
Canary Releases on Kubernetes w/ Spinnaker, Istio, and Prometheus
 

Recently uploaded

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 

Recently uploaded (20)

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 

Kubernetes data science and machine learning

  • 1. Kubernetes, Data Science and Machine Learning
  • 2. Click to add text Click to add text Learn more at kublr.com/how-it-works/kublr-platform
  • 3. Common ML Challenges and Approaches Common ML challenges: • Computer vision, natural language processing, speech recognition, predictions, anomaly detection, ML approaches: • Supervised learning • Classification, Regression • Unsupervised learning • Clustering
  • 4. Typical ML Challenges 1. Data Source 2. Data preparation 3. Modelling 4. Model serving 5. Analysis DB / File storage Data Cleansing Batches /Streaming Data Transformation Model Training A/B Testing Optimization Inferencing Results Exploration Interpretations
  • 5. Why Using Kubernetes for ML? Architecture – separation of concerns (dev, ops, infra), useful abstractions; universality Pluggable and extensible – k8s is a set of open source microservices Scalability and HA – autoscaling, resource management, self-healing Container based – isolation, lightweight, few (if any) limitations on applications Cloud and OS agnostic – Kubernetes + containers Shared compute – RBAC, Limits, Quotas On-demand – cloud support, autoscaling, reproducible applications Frameworks – Great community
  • 6. Kubernetes and Kublr for ML • Infrastructure abstraction and scheduling • DevOps and operational layer: monitoring and logging, observability, HA • Auto-scaling: HPA and cluster auto-scaler • Kubernetes operators • Storage (HDFS, Rook/Ceph) • Custom resources and GPU
  • 7. Kubernetes as an Orchestration Platform Kubernetes • Infrastructure abstraction • Orchestration • Network • Configuration • Service discovery • Ingress • Persistence Master Node K8s master components: etcd, scheduler, api, controller K8s metadata Docker kubelet App data K8s node components: overlay network, discovery, connectivity Infrastructure and application containers Infrastructure and application containers Overlay network
  • 8. Kublr as Operations and DevOps Layer K8S Clusters PoC Dev Prod Cloud Data center API UI Log collection Operations Monitoring Authn and authz, SSO, fed Audit Image Repo Infrastructure management Backup & DR • Security • Multiple environments • Hybrid support • Infrastructure • Operations • Monitoring and logs • Backup and DR • Container image management
  • 9. Horizontal Pod Autoscaler • Cooldown/delay • Rolling update • Multiple metrics • Custom metrics Kubernetes Deployment HPA Pod NPod 1 scale ... metrics
  • 10. Cluster Autoscaler Kubernetes Node group 1 Cluster Autoscaler Node NNode 1 scale ... Resources 1. Requested by pods 2. Provided by nodes • Multiple node groups • AWS, Azure, GCE • Cool-down period • Scheduling rules compliance Node group M Node NNode 1 ... ... Master
  • 11. Operators Kubernetes K8S objects: • Deployments • Pods • Namespaces • Persistent Volumes • Custom Resources • ... Operator • Arbitrary software • Operations automation • Management automation • Cloud native adaptation • Custom Resources • Annotations • ...
  • 12. Storage: HDFS and Hadoop Hadoop/HDFS • Scheduling tasks close to the data • Reliable storage • Established tool stack for data science and ML Kubernetes • Infrastructure management and recovery • Underlying storage management • Portability, hybrid support
  • 13. Storage| Rook and Ceph Rook = Ceph operator Cloud native Ceph Custom resources: • Cluster • Replica pool • File system Supported storage types: • Block (rdb) • Filesystem • Object (S3, OpenStack API) [1] https://www.youtube.com/watch?v=iwVAvV_lI_Q
  • 14. Kubernetes, GPU, and Kublr • Standard Kubernetes resources: CPU, RAM, storage • Custom resources – GPU, FPGA – via device plugins • Nvidia GPU require driver and custom container runtime • Kublr automates • GPU driver installation • Nvidia container runtime setup • Nvidia device plugin setup
  • 15. Major ML Stacks Compatible w/ Kubernetes • Kubernetes TensorFlow TF-operator: github.com/kubeflow/kubeflow • Spark 2.3.0 • In-house solution (model in Docker containers, run them on cloud or on- prem Kubernetes ) • beam.apache.org • Other rather “new” open source solutions • Cloud and other vendor solutions
  • 16. Kubeflow • Simplify scaling and deploy machine learning applications • Work on including different tooling • Train/serve TensorFlow models in different environments • Use Jupyter notebooks to manage TensorFlow training jobs
  • 17. Spark without “Native " Kubernetes Support A Spark standalone cluster in Kubernetes
  • 18. Spark 2.3.0 bin/spark-submit --master k8s://https://<k8s-apiserver- host>:<k8s-apiserver-port> --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=5 --conf spark.kubernetes.container.image=<spark- image> local:///path/to/examples.jar
  • 19. In-House Solution • Custom Docker image with ML logic • Kubernetes as scheduler • Monitoring tools • Custom implementation of distributed tasks scheduler or framework (e.g. Celery)
  • 20. Demo | ML and HPA with Custom Metrics
  • 21. Demo ML and HPA with Custom Metrics To view the demo, check out our webinar on: https://goo.gl/vY6HbE
  • 23. Vlad Penkin Oleg Chunikhin Arkadii Ocheretnoi Thank you!

Editor's Notes

  1. Oleg C