WORKSHOP
Spark on Kubernetes
Create and set up your Spark cluster on Kubernetes
Leah Kolben, CTO
@leah4kosh
leah@cnvrg.io
whoami
• Developer/Data scientist => CTO
• cnvrg.io: built by data scientists, for data scientists, to help teams:
• Get from data to models to production as efficiently and quickly as possible
• Bridge science and engineering
agenda
• Introduction
• What’s Spark
• Different Spark implementations
• What’s Kubernetes
• Spark on Kubernetes: pros vs. cons
• Live Workshop
• Summary
What is Spark?
• Unified analytics engine for large-scale data processing
• Faster applications thanks to in-memory cluster computing (up to 100x faster than Hadoop MapReduce, by Spark's own figures)
• Supports different workloads: batch, iterative, streaming, interactive SQL, etc.
• Supports multiple languages (Scala, Java, Python, R, SQL) and different environments
Spark deployment modes
• Hadoop YARN
• Apache Mesos
• Kubernetes
• Standalone
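In practice, the choice of deployment mode mostly shows up in the `--master` URL passed to `spark-submit`. The sketch below maps each mode listed above to that URL, assuming the default ports from the Spark documentation (7077 for standalone, 5050 for Mesos). Port 443 for the Kubernetes API server is an assumption (it is what GKE typically exposes). The helper name `master_url` and all host names are hypothetical.

```python
from typing import Optional


def master_url(mode: str, host: str = "", port: Optional[int] = None) -> str:
    """Return a spark-submit --master value for one deployment mode (hypothetical helper)."""
    if mode == "yarn":
        # YARN finds the ResourceManager through HADOOP_CONF_DIR / YARN_CONF_DIR,
        # so no host is needed in the URL.
        return "yarn"
    if mode == "standalone":
        return f"spark://{host}:{port or 7077}"
    if mode == "mesos":
        return f"mesos://{host}:{port or 5050}"
    if mode == "kubernetes":
        # The k8s:// prefix wraps the API server URL; 443 is an assumed default.
        return f"k8s://https://{host}:{port or 443}"
    raise ValueError(f"unknown deployment mode: {mode!r}")


print(master_url("kubernetes", "203.0.113.10"))
```

Only the master URL changes between modes. The application code stays the same, which is what makes moving an existing Spark job onto Kubernetes fairly cheap.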
Kubernetes - recap
• Provides a runtime environment for Docker containers
• Provides an abstraction layer for containers to run on
• Services natively load-balance traffic across their pods
• Scales up and down dynamically
• Monitors the health of the containers
• Schedules jobs and CronJobs
• Uses the same API across EVERY cloud provider and on bare metal!
Spark Architecture on Kubernetes
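• spark-submit sends the Spark application directly to the Kubernetes API server (master URL k8s://https://<api-server-host>:<port>); kubectl is used for setup and for inspecting the pods
• Spark creates the Spark driver as a pod
• The driver creates executors (also pods) and runs the application code on them
• When the job is done, the executor pods are terminated and cleaned up; the driver pod stays in the Completed state (keeping its logs) until it is deleted, and with autoscaling the idle nodes can then be removed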
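The submission flow above can be sketched as the argument list for a cluster-mode `spark-submit`. The `--master k8s://…`, `--deploy-mode cluster` and `spark.kubernetes.*` options are the ones documented in Spark's "Running on Kubernetes" guide. The API server address, the image name, the `spark` service account (which needs RBAC rights to create and delete pods) and the in-image path to pi.py are assumptions for illustration. The helper `spark_submit_cmd` is hypothetical.

```python
import shlex
from typing import List


def spark_submit_cmd(
    api_server: str,
    image: str,
    app: str,
    executors: int = 2,
    service_account: str = "spark",  # hypothetical service account name
    name: str = "spark-pi",
) -> List[str]:
    """Build argv for a cluster-mode submission to Kubernetes (hypothetical helper)."""
    return [
        "spark-submit",
        # spark-submit talks to the Kubernetes API server directly.
        "--master", f"k8s://{api_server}",
        # Cluster mode: the driver itself runs as a pod inside the cluster.
        "--deploy-mode", "cluster",
        "--name", name,
        # Number of executor pods the driver asks Kubernetes for.
        "--conf", f"spark.executor.instances={executors}",
        # Image used for both the driver and the executor pods.
        "--conf", f"spark.kubernetes.container.image={image}",
        # Service account the driver uses to create and delete executor pods.
        "--conf", f"spark.kubernetes.authenticate.driver.serviceAccountName={service_account}",
        # local:// means the file is already inside the container image.
        app,
    ]


cmd = spark_submit_cmd(
    api_server="https://203.0.113.10:443",            # assumed API server address
    image="gcr.io/my-project/spark-py:v2.4.0",        # assumed image name
    app="local:///opt/spark/examples/src/main/python/pi.py",  # assumed path in image
)
print(shlex.join(cmd))
```

Building the command as a list instead of one long string avoids quoting mistakes. `shlex.join` (Python 3.8+) prints a copy-pasteable shell line.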
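Spark on Kubernetes: pros vs. cons
• Pro: Kubernetes can manage unified, containerized pipelines
• Pro: better resource sharing between workloads on the same cluster
• Pro: leverages Kubernetes features such as persistent volumes (PVs), service mesh and scheduling
• Con: still at the beta (experimental) stage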
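Spark exposes some of these Kubernetes features through configuration properties. The sketch below builds the `--conf` entries that mount a PersistentVolumeClaim into the driver and executor pods and pin them to a node pool. It assumes the property patterns from Spark's "Running on Kubernetes" guide: `spark.kubernetes.{driver,executor}.volumes.persistentVolumeClaim.<name>.*` and `spark.kubernetes.node.selector.<label>`. The volume name, claim name, mount path and node-pool value are hypothetical, as are both helper functions.

```python
from typing import Dict, List


def k8s_extra_conf(volume_name: str, claim_name: str, mount_path: str,
                   node_labels: Dict[str, str]) -> Dict[str, str]:
    """Spark properties for a PVC mount plus a node selector (hypothetical helper)."""
    conf: Dict[str, str] = {}
    # Mount the same claim into both the driver pod and the executor pods.
    for role in ("driver", "executor"):
        prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume_name}"
        conf[f"{prefix}.mount.path"] = mount_path
        conf[f"{prefix}.options.claimName"] = claim_name
    # Node selectors apply to both driver and executor pods.
    for key, value in node_labels.items():
        conf[f"spark.kubernetes.node.selector.{key}"] = value
    return conf


def as_cli_args(conf: Dict[str, str]) -> List[str]:
    """Flatten the properties into repeated --conf key=value arguments."""
    return [arg for key, value in sorted(conf.items()) for arg in ("--conf", f"{key}={value}")]


conf = k8s_extra_conf("data", "spark-data-pvc", "/data",
                      {"cloud.google.com/gke-nodepool": "spark-pool"})
print(as_cli_args(conf))
```

If executors land on several nodes, a shared claim generally needs a ReadWriteMany-capable volume. A ReadWriteOnce volume can only be attached to one node at a time.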
Running our first workload
• Use GKE (Google Kubernetes Engine) for the Kubernetes cluster, with autoscaling enabled
• Build a Spark Docker image for Kubernetes to use
• Run pi.py on the cluster
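pi.py, the example bundled with Spark, estimates π by Monte Carlo sampling. It counts how many random points in the square [-1, 1]² fall inside the unit circle and multiplies that fraction by 4. The sketch below is modeled on that example's map/reduce logic (assuming n = 100000 × partitions, as in the bundled script) but runs in plain Python without Spark, so the arithmetic can be checked locally. On the cluster, the same map and reduce steps are spread across the executor pods. The helper names `inside` and `estimate_pi` are hypothetical.

```python
import random
from functools import reduce
from operator import add


def inside(rng: random.Random) -> int:
    """One sample: 1 if a random point in [-1, 1]^2 lands inside the unit circle."""
    x = rng.random() * 2 - 1
    y = rng.random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0


def estimate_pi(partitions: int = 2, seed: int = 0) -> float:
    """Local stand-in for pi.py: same sample count, no Spark."""
    n = 100000 * partitions
    rng = random.Random(seed)  # seeded so the local run is reproducible
    # In pi.py this step is distributed: parallelize(range(1, n + 1), partitions)
    # followed by map(f) and reduce(add); here it runs sequentially.
    count = reduce(add, map(lambda _: inside(rng), range(1, n + 1)))
    # Area ratio circle/square is pi/4, so scale the hit fraction by 4.
    return 4.0 * count / n


print("Pi is roughly %f" % estimate_pi())
```

With 200,000 samples the standard error of the estimate is roughly 0.004, so results within a few hundredths of π are expected.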
Let’s do it!
Why use cnvrg to run your Spark workloads?
• Combines Spark and Kubernetes into one unified system
• Reproducible jobs: artifacts are linked to workloads
• Monitor the health of your Spark workloads
• One unified dashboard for all your projects and workloads
• Simple & fast
• Clarity
DEMO
Summary
• Spark is a unified analytics engine for large-scale data processing
• Kubernetes is a platform for container orchestration
• Overview of Spark's different implementations
• Overview of the Spark on Kubernetes architecture
• Overview of Spark on Kubernetes: pros vs. cons
• Submit a Spark job directly to a Kubernetes cluster
• Manage, monitor and automate your workloads using cnvrg
Thanks!
https://cnvrg.io
info@cnvrg.io
+972-506-660186
