SlideShare a Scribd company logo
1 of 15
Download to read offline
WORKSHOP
Spark on Kubernetes
Create and set up your Spark cluster on Kubernetes
Leah Kolben, CTO
@leah4kosh
leah@cnvrg.io
whoami
• Developer/Data scientist => CTO
• cnvrg.io = built by data scientists, for data scientists to help teams:
• Get from data to models to production in the most efficient and fast way
• bridge science and engineering
agenda
• Introduction
• What’s spark
• Different spark implementations
• What’s kubernetes
• Spark on Kubernetes – pros vs cons
• Live Workshop
• Summary
What is Spark?
• Unified analytics engine for large-scale data processing
• Faster processing speed of applications due to In-memory cluster computing (100x
improvement)
• Support different workloads – batch, iterative, streaming, interactive SQL etc.
• Support multiple languages and different environments
Spark deployment modes
• Hadoop Yarn
• Apache Mesos
• Kubernetes
• Standalone
Kubernetes - recap
• Provides a runtime environment for Docker containers
• Provides an abstraction layer for containers to run on
• All services are natively load balanced
• Can scale up and down dynamically
• Monitor the health of the containers
• Schedule runs and cronjobs
• Use the same API across EVERY cloud provider and bare metal!
Spark Architecture on Kubernetes
Spark Architecture on Kubernetes
• Spark-submit will be used to submit a spark application using
kubectl:
• Spark will create a spark driver as a pod
• Driver will create executors (pods) and run the application code
• When job is done – terminate jobs and clean resources
(terminate nodes)
Spark on kubernetes
• Kubernetes can manage unified containerized pipelines
• Optimization for resource sharing
• Leverage kubernetes resources: PV, service mesh, scheduling
• Beta stage
Running out first workload
• Use GKE for kubernetes cluster ( with auto scaling enabled)
• Build spark docker image for kubernetes to use
• Run pi.py on the cluster
Let’s do it!
Why use cnvrg to run your spark workloads?
• Leverages the spark & kubernetes to one unified system
• Reproducible jobs: artifacts are linked to workloads
• Monitor your SPARK workload health
• One unified dashboard for all your projects and workloads
• Simple & fast
• Clarity
DEMO
Summary
• Spark is a unified analytics engine for large-scale data processing
• Kubernetes is a platform for containers orchestration
• Overview on Spark different implementaitons
• Overview of spark on Kubernetes architecture
• Overview of spark on Kubernetes: Pros vs. Cons
• Submit a spark job directly on kubernetes cluster
• Manage, monitor and automate your workloads using cnvrg
Thanks!
https://cnvrg.io
info@cnvrg.io
+972-506-660186

More Related Content

What's hot

Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning InfrastructureSigOpt
 
Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda Outlyer
 
Knative, Serverless on Kubernetes, and Openshift
Knative, Serverless on Kubernetes, and OpenshiftKnative, Serverless on Kubernetes, and Openshift
Knative, Serverless on Kubernetes, and OpenshiftChris Suszyński
 
Nextflow and AWS Batch - GCC/BOSC 2018
Nextflow and AWS Batch - GCC/BOSC 2018Nextflow and AWS Batch - GCC/BOSC 2018
Nextflow and AWS Batch - GCC/BOSC 2018Francesco Strozzi
 
Hydrosphere.io for ODSC: Webinar on Kubeflow
Hydrosphere.io for ODSC: Webinar on KubeflowHydrosphere.io for ODSC: Webinar on Kubeflow
Hydrosphere.io for ODSC: Webinar on KubeflowRustem Zakiev
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow ArchitectureGerard Toonstra
 
Cloudsolutionday 2016: Docker & FAAS at getvero.com
Cloudsolutionday 2016: Docker & FAAS at getvero.comCloudsolutionday 2016: Docker & FAAS at getvero.com
Cloudsolutionday 2016: Docker & FAAS at getvero.comAWS Vietnam Community
 
Serverless spark
Serverless sparkServerless spark
Serverless sparkMamathaBusi
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for DatabricksDatabricks
 
TechTalk Webinar Series - Getting Started with Apache OpenWhisk
TechTalk Webinar Series - Getting Started with Apache OpenWhiskTechTalk Webinar Series - Getting Started with Apache OpenWhisk
TechTalk Webinar Series - Getting Started with Apache OpenWhiskJanakiram MSV
 
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...AWS User Group - Thailand
 
Tensorflow London 13: Barbara Fusinska 'Hassle Free, Scalable, Machine Learni...
Tensorflow London 13: Barbara Fusinska 'Hassle Free, Scalable, Machine Learni...Tensorflow London 13: Barbara Fusinska 'Hassle Free, Scalable, Machine Learni...
Tensorflow London 13: Barbara Fusinska 'Hassle Free, Scalable, Machine Learni...Seldon
 
Brendon Foxen (Channel 4) - Speeding up Software Delivery at Channel 4
Brendon Foxen (Channel 4) - Speeding up Software Delivery at Channel 4Brendon Foxen (Channel 4) - Speeding up Software Delivery at Channel 4
Brendon Foxen (Channel 4) - Speeding up Software Delivery at Channel 4Outlyer
 
stackconf 2021 | How we finally migrated an eCommerce-Platform to GCP
stackconf 2021 | How we finally migrated an eCommerce-Platform to GCPstackconf 2021 | How we finally migrated an eCommerce-Platform to GCP
stackconf 2021 | How we finally migrated an eCommerce-Platform to GCPNETWAYS
 
DevOps in real life
DevOps in real lifeDevOps in real life
DevOps in real lifeDataArt
 
Shift Remote AI: Build and deploy PyTorch Models with Azure Machine Learning ...
Shift Remote AI: Build and deploy PyTorch Models with Azure Machine Learning ...Shift Remote AI: Build and deploy PyTorch Models with Azure Machine Learning ...
Shift Remote AI: Build and deploy PyTorch Models with Azure Machine Learning ...Shift Conference
 

What's hot (20)

Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda
 
Knative, Serverless on Kubernetes, and Openshift
Knative, Serverless on Kubernetes, and OpenshiftKnative, Serverless on Kubernetes, and Openshift
Knative, Serverless on Kubernetes, and Openshift
 
Nextflow and AWS Batch - GCC/BOSC 2018
Nextflow and AWS Batch - GCC/BOSC 2018Nextflow and AWS Batch - GCC/BOSC 2018
Nextflow and AWS Batch - GCC/BOSC 2018
 
Hydrosphere.io for ODSC: Webinar on Kubeflow
Hydrosphere.io for ODSC: Webinar on KubeflowHydrosphere.io for ODSC: Webinar on Kubeflow
Hydrosphere.io for ODSC: Webinar on Kubeflow
 
Java & Microservices in Azure
Java & Microservices in AzureJava & Microservices in Azure
Java & Microservices in Azure
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow Architecture
 
Cloudsolutionday 2016: Docker & FAAS at getvero.com
Cloudsolutionday 2016: Docker & FAAS at getvero.comCloudsolutionday 2016: Docker & FAAS at getvero.com
Cloudsolutionday 2016: Docker & FAAS at getvero.com
 
Serverless spark
Serverless sparkServerless spark
Serverless spark
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for Databricks
 
Kubernetes in Adform
Kubernetes in AdformKubernetes in Adform
Kubernetes in Adform
 
TechTalk Webinar Series - Getting Started with Apache OpenWhisk
TechTalk Webinar Series - Getting Started with Apache OpenWhiskTechTalk Webinar Series - Getting Started with Apache OpenWhisk
TechTalk Webinar Series - Getting Started with Apache OpenWhisk
 
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...
 
OpenShift Meetup - Red Hat OpenShift Container Storage explained
OpenShift Meetup - Red Hat OpenShift Container Storage explainedOpenShift Meetup - Red Hat OpenShift Container Storage explained
OpenShift Meetup - Red Hat OpenShift Container Storage explained
 
Tensorflow London 13: Barbara Fusinska 'Hassle Free, Scalable, Machine Learni...
Tensorflow London 13: Barbara Fusinska 'Hassle Free, Scalable, Machine Learni...Tensorflow London 13: Barbara Fusinska 'Hassle Free, Scalable, Machine Learni...
Tensorflow London 13: Barbara Fusinska 'Hassle Free, Scalable, Machine Learni...
 
Brendon Foxen (Channel 4) - Speeding up Software Delivery at Channel 4
Brendon Foxen (Channel 4) - Speeding up Software Delivery at Channel 4Brendon Foxen (Channel 4) - Speeding up Software Delivery at Channel 4
Brendon Foxen (Channel 4) - Speeding up Software Delivery at Channel 4
 
stackconf 2021 | How we finally migrated an eCommerce-Platform to GCP
stackconf 2021 | How we finally migrated an eCommerce-Platform to GCPstackconf 2021 | How we finally migrated an eCommerce-Platform to GCP
stackconf 2021 | How we finally migrated an eCommerce-Platform to GCP
 
DevOps in real life
DevOps in real lifeDevOps in real life
DevOps in real life
 
Shift Remote AI: Build and deploy PyTorch Models with Azure Machine Learning ...
Shift Remote AI: Build and deploy PyTorch Models with Azure Machine Learning ...Shift Remote AI: Build and deploy PyTorch Models with Azure Machine Learning ...
Shift Remote AI: Build and deploy PyTorch Models with Azure Machine Learning ...
 

Similar to Webinar kubernetes and-spark

Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenDatabricks
 
[Spark Summit 2017 NA] Apache Spark on Kubernetes
[Spark Summit 2017 NA] Apache Spark on Kubernetes[Spark Summit 2017 NA] Apache Spark on Kubernetes
[Spark Summit 2017 NA] Apache Spark on KubernetesTimothy Chen
 
Containerized architectures for deep learning
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learningAntje Barth
 
Innovating faster with SBT, Continuous Delivery, and LXC
Innovating faster with SBT, Continuous Delivery, and LXCInnovating faster with SBT, Continuous Delivery, and LXC
Innovating faster with SBT, Continuous Delivery, and LXCkscaldef
 
Containers, Serverless and Functions in a nutshell
Containers, Serverless and Functions in a nutshellContainers, Serverless and Functions in a nutshell
Containers, Serverless and Functions in a nutshellEugene Fedorenko
 
Building Efficient Parallel Testing Platforms with Docker
Building Efficient Parallel Testing Platforms with DockerBuilding Efficient Parallel Testing Platforms with Docker
Building Efficient Parallel Testing Platforms with DockerLaura Frank Tacho
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Spark Summit
 
AI & Machine Learning Pipelines with Knative
AI & Machine Learning Pipelines with KnativeAI & Machine Learning Pipelines with Knative
AI & Machine Learning Pipelines with KnativeAnimesh Singh
 
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Spark Summit
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...DataWorks Summit
 
Kubernetes 101 Workshop
Kubernetes 101 WorkshopKubernetes 101 Workshop
Kubernetes 101 WorkshopVishal Biyani
 
Container Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher KubernetesContainer Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher KubernetesVishal Biyani
 
Container management with docker & kubernetes
Container management with docker & kubernetesContainer management with docker & kubernetes
Container management with docker & kubernetesKasun Rajapakse
 
How kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updatedHow kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updatedShikha Srivastava
 
Getting Started with OpenStack, Red Hat Summit 2016
Getting Started with OpenStack, Red Hat Summit 2016Getting Started with OpenStack, Red Hat Summit 2016
Getting Started with OpenStack, Red Hat Summit 2016Charles Eckel
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetesdatamantra
 
DevJam 2019 - Introduction to Kubernetes
DevJam 2019 - Introduction to KubernetesDevJam 2019 - Introduction to Kubernetes
DevJam 2019 - Introduction to KubernetesRonny Trommer
 

Similar to Webinar kubernetes and-spark (20)

Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
 
[Spark Summit 2017 NA] Apache Spark on Kubernetes
[Spark Summit 2017 NA] Apache Spark on Kubernetes[Spark Summit 2017 NA] Apache Spark on Kubernetes
[Spark Summit 2017 NA] Apache Spark on Kubernetes
 
Containerized architectures for deep learning
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learning
 
Innovating faster with SBT, Continuous Delivery, and LXC
Innovating faster with SBT, Continuous Delivery, and LXCInnovating faster with SBT, Continuous Delivery, and LXC
Innovating faster with SBT, Continuous Delivery, and LXC
 
Containers, Serverless and Functions in a nutshell
Containers, Serverless and Functions in a nutshellContainers, Serverless and Functions in a nutshell
Containers, Serverless and Functions in a nutshell
 
Building Efficient Parallel Testing Platforms with Docker
Building Efficient Parallel Testing Platforms with DockerBuilding Efficient Parallel Testing Platforms with Docker
Building Efficient Parallel Testing Platforms with Docker
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
 
Adf with docker
Adf with dockerAdf with docker
Adf with docker
 
AI & Machine Learning Pipelines with Knative
AI & Machine Learning Pipelines with KnativeAI & Machine Learning Pipelines with Knative
AI & Machine Learning Pipelines with Knative
 
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
 
Kubernetes 101 Workshop
Kubernetes 101 WorkshopKubernetes 101 Workshop
Kubernetes 101 Workshop
 
Container Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher KubernetesContainer Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher Kubernetes
 
Container management with docker & kubernetes
Container management with docker & kubernetesContainer management with docker & kubernetes
Container management with docker & kubernetes
 
CNCF Projects Overview
CNCF Projects OverviewCNCF Projects Overview
CNCF Projects Overview
 
How kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updatedHow kubernetes operators can rescue dev secops in midst of a pandemic updated
How kubernetes operators can rescue dev secops in midst of a pandemic updated
 
Getting Started with OpenStack, Red Hat Summit 2016
Getting Started with OpenStack, Red Hat Summit 2016Getting Started with OpenStack, Red Hat Summit 2016
Getting Started with OpenStack, Red Hat Summit 2016
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
 
How to monitor your ML models in production with Kubernetes
How to monitor your ML models in production with KubernetesHow to monitor your ML models in production with Kubernetes
How to monitor your ML models in production with Kubernetes
 
DevJam 2019 - Introduction to Kubernetes
DevJam 2019 - Introduction to KubernetesDevJam 2019 - Introduction to Kubernetes
DevJam 2019 - Introduction to Kubernetes
 

More from cnvrg.io AI OS - Hands-on ML Workshops (9)

CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
How to use continual learning in your ML models
How to use continual learning in your ML modelsHow to use continual learning in your ML models
How to use continual learning in your ML models
 
How To Build Auto-Adaptive Machine Learning Models with Kubernetes
How To Build Auto-Adaptive Machine Learning Models with KubernetesHow To Build Auto-Adaptive Machine Learning Models with Kubernetes
How To Build Auto-Adaptive Machine Learning Models with Kubernetes
 
MLOps for production-level machine learning
MLOps for production-level machine learningMLOps for production-level machine learning
MLOps for production-level machine learning
 
Continual learning with human in-the-loop
Continual learning with human in-the-loopContinual learning with human in-the-loop
Continual learning with human in-the-loop
 
Build machine learning pipelines from research to production
Build machine learning pipelines from research to productionBuild machine learning pipelines from research to production
Build machine learning pipelines from research to production
 
Why more than half of ML models don't make it to production
Why more than half of ML models don't make it to productionWhy more than half of ML models don't make it to production
Why more than half of ML models don't make it to production
 
Training Machine Learning models directly from GitHub with cnvrg.io MLOps
Training Machine Learning models directly from GitHub with cnvrg.io MLOpsTraining Machine Learning models directly from GitHub with cnvrg.io MLOps
Training Machine Learning models directly from GitHub with cnvrg.io MLOps
 
Scaling MLOps on NVIDIA DGX Systems
Scaling MLOps on NVIDIA DGX SystemsScaling MLOps on NVIDIA DGX Systems
Scaling MLOps on NVIDIA DGX Systems
 

Recently uploaded

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excelysmaelreyes
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 

Recently uploaded (20)

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excel
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 

Webinar kubernetes and-spark

  • 1. WORKSHOP Spark on Kubernetes Create and set up your Spark cluster on Kubernetes Leah Kolben, CTO @leah4kosh leah@cnvrg.io
  • 2. whoami • Developer/Data scientist => CTO • cnvrg.io = built by data scientists, for data scientists to help teams: • Get from data to models to production in the most efficient and fast way • bridge science and engineering
  • 3. agenda • Introduction • What’s spark • Different spark implementations • What’s kubernetes • Spark on Kubernetes – pros vs cons • Live Workshop • Summary
  • 4. What is Spark? • Unified analytics engine for large-scale data processing • Faster processing speed of applications due to In-memory cluster computing (100x improvement) • Support different workloads – batch, iterative, streaming, interactive SQL etc. • Support multiple languages and different environments
  • 5. Spark deployment modes • Hadoop Yarn • Apache Mesos • Kubernetes • Standalone
  • 6. Kubernetes - recap • Provides a runtime environment for Docker containers • Provides an abstraction layer for containers to run on • All services are natively load balanced • Can scale up and down dynamically • Monitor the health of the containers • Schedule runs and cronjobs • Use the same API across EVERY cloud provider and bare metal!
  • 8. Spark Architecture on Kubernetes • Spark-submit will be used to submit a spark application using kubectl: • Spark will create a spark driver as a pod • Driver will create executors (pods) and run the application code • When job is done – terminate jobs and clean resources (terminate nodes)
  • 9. Spark on kubernetes • Kubernetes can manage unified containerized pipelines • Optimization for resource sharing • Leverage kubernetes resources: PV, service mesh, scheduling • Beta stage
  • 10. Running out first workload • Use GKE for kubernetes cluster ( with auto scaling enabled) • Build spark docker image for kubernetes to use • Run pi.py on the cluster
  • 12. Why use cnvrg to run your spark workloads? • Leverages the spark & kubernetes to one unified system • Reproducible jobs: artifacts are linked to workloads • Monitor your SPARK workload health • One unified dashboard for all your projects and workloads • Simple & fast • Clarity
  • 13. DEMO
  • 14. Summary • Spark is a unified analytics engine for large-scale data processing • Kubernetes is a platform for containers orchestration • Overview on Spark different implementaitons • Overview of spark on Kubernetes architecture • Overview of spark on Kubernetes: Pros vs. Cons • Submit a spark job directly on kubernetes cluster • Manage, monitor and automate your workloads using cnvrg