Cloud-Native Spark Scheduling with YuniKorn Scheduler
Li Gao
Tech lead and engineer @ Databricks Compute Fabric
Previous tech lead at data infrastructure @ Lyft
Weiwei Yang
Tech Lead @ Cloudera Compute Platform
Apache Hadoop Committer & PMC member
Previous tech lead at Real-time Compute Infra @ Alibaba
Agenda
Li Gao
Why Lyft is choosing Spark on K8s
The need for a custom K8s scheduler for Spark
Weiwei Yang
Spark Scheduling with YuniKorn
Deep Dive into YuniKorn Features
Community and Roadmap
Role of K8s in Lyft’s Data Landscape
Why Choose K8s for Spark
▪ Containerized Spark compute to provide shared resources across
different ML and ETL jobs
▪ Support for multiple Spark versions, Python versions, and version
controlled containers on the shared K8s clusters for both faster iteration
and stable production
▪ A single, unified infrastructure for both the majority of our data compute and
microservices, with advanced, unified observability and resource
isolation support
▪ Fine grained access controls on shared clusters
The Spark K8s Infra @ Lyft
Multi-step creation for a Spark K8s job
[Diagram labels: Resource Labels, Jobs, Cluster Pool, K8s Cluster, Namespace Group, Namespace, Spark CRD, Spark Pods, DataLake]
Problems of existing Spark K8s infrastructure
▪ Complexity of layers of custom K8s controllers to handle the scale of the
Spark jobs
▪ Tight coupling of controller layers amplifies latency issues in
certain cases
▪ Priority queues between jobs, clusters, and namespaces are managed by
multiple layers of controllers to achieve desired performance
Why we need a customized K8s Scheduler
▪ High latency (~100 seconds) using the default scheduler is observed on a single
K8s cluster for large volumes of batch workloads
▪ Large batch fair sharing in the same resource pool is unpredictable with the
default scheduler
▪ Mix of FIFO and FAIR requirements on shared jobs clusters
▪ The need for elastic and hierarchical priority management for jobs in K8s
▪ Richer and online user visibility into the scheduling behavior
▪ Simplified layers of controllers with custom K8s scheduler
Spark Scheduling with YuniKorn
Flavors of Running Spark on K8s
▪ Native Spark on K8s: identify Spark jobs by the pod labels
▪ Spark K8s Operator: identify Spark jobs by CRDs (e.g., SparkApplication)
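A minimal sketch of the label-based approach: group pod metadata by an application label. The pod dictionaries are made-up sample data and `group_pods_by_app` is a hypothetical helper. The label keys `spark-app-selector` and `spark-role` are assumed here to be the ones Spark sets on its K8s pods; check them against your Spark version.

```python
from collections import defaultdict

# Hypothetical sample pods; only the labels matter for this sketch.
PODS = [
    {"name": "job-a-driver", "labels": {"spark-app-selector": "spark-001", "spark-role": "driver"}},
    {"name": "job-a-exec-1", "labels": {"spark-app-selector": "spark-001", "spark-role": "executor"}},
    {"name": "job-b-driver", "labels": {"spark-app-selector": "spark-002", "spark-role": "driver"}},
    {"name": "nginx", "labels": {"app": "web"}},
]


def group_pods_by_app(pods, app_label="spark-app-selector"):
    """Return {app_id: [pod names]} for pods that carry the app label."""
    apps = defaultdict(list)
    for pod in pods:
        app_id = pod["labels"].get(app_label)
        if app_id is not None:  # pods without the label are not Spark pods
            apps[app_id].append(pod["name"])
    return dict(apps)
```

Pods without the label (the `nginx` pod above) are simply ignored, which is how a scheduler could tell Spark jobs apart from other workloads without a CRD.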
Resource Scheduling in K8s
Scheduler workflow in human language: The scheduler picks
up one pod at a time, finds the best-fit node, and then launches
the pod on that node.
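The pod-at-a-time workflow can be sketched roughly as below. This is an illustration only, not the kube-scheduler's code: the node and pod dictionaries, the function names, and the "least leftover capacity" scoring rule are all assumptions made for the example.

```python
def schedule_one(pod, nodes):
    """Pick a node for a single pod, or return None if no node fits."""
    # Filter: keep only nodes with enough free CPU and memory.
    feasible = [
        n for n in nodes
        if n["free_cpu"] >= pod["cpu"] and n["free_mem"] >= pod["mem"]
    ]
    if not feasible:
        return None
    # Score: prefer the node with the least leftover capacity
    # (an assumed "best fit" rule chosen for this sketch).
    best = min(
        feasible,
        key=lambda n: (n["free_cpu"] - pod["cpu"], n["free_mem"] - pod["mem"]),
    )
    # Bind: reserve the resources on the chosen node.
    best["free_cpu"] -= pod["cpu"]
    best["free_mem"] -= pod["mem"]
    return best["name"]


def schedule_all(pods, nodes):
    """Schedule pods strictly one by one, in arrival order."""
    return {pod["name"]: schedule_one(pod, nodes) for pod in pods}
```

Note that nothing in this loop knows which job a pod belongs to, which is the gap the following slides focus on.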
Spark on K8s: the scheduling challenges
▪ Job Scheduling Requirements
▪ Job ordering/queueing
▪ Job level priority
▪ Resource fairness (between jobs / queues)
▪ Gang scheduling
▪ Resource Sharing and Utilization Challenges
▪ Spark driver pods can occupy all resources in a
namespace
▪ Resource competition, deadlock between large jobs
▪ Misbehaving jobs could abuse resources
▪ High throughput
Workload types: Ad-Hoc Queries, Batch Jobs, Workflow (DAG), Streaming
The need for a unified architecture across on-prem, cloud,
multi-cloud, and hybrid cloud
K8s default scheduler was NOT created to tackle these challenges
Apache YuniKorn (Incubating)
What is it:
▪ A standalone resource scheduler for K8s
▪ Focus on building scheduling capabilities to empower Big Data
on K8s
Simple to use:
▪ A stateless service on K8s
▪ A secondary K8s scheduler or replacement of the default
scheduler
Resource Scheduling in YuniKorn (and compare w/ default scheduler)
[Diagram: default scheduler vs. YuniKorn. Components shown: Apps, API Server, ETCD, Kubelet, and the resource scheduler on the master. Default scheduler stages: Filter, Score, Sort, Extensions. YuniKorn stages: Queue Sort, App Sort, Node Sort, with Pluggable Policies, working on Apps, Nodes, and Queues.]
YuniKorn's QUEUE and APP concepts are critical to providing advanced job
scheduling and fine-grained resource management capabilities.
Main differences (YuniKorn vs. Default Scheduler)
Feature: Note
▪ Scheduling at app dimension: The app is a first-class citizen in YuniKorn. YuniKorn schedules apps with respect to, e.g., their submission order, priority, and resource usage.
▪ Job ordering: YuniKorn supports FIFO/FAIR/Priority (WIP) job ordering policies.
▪ Fine-grained resource capacity management: Manage cluster resources with hierarchy queues. A queue provides the guaranteed resources (min) and the resource quota (max).
▪ Resource fairness: Inter-queue resource fairness.
▪ Natively support Big Data workloads: The default scheduler is mainly for long-running services. YuniKorn is designed for Big Data app workloads and natively supports Spark/Flink/Tensorflow, etc.
▪ Scale & Performance: YuniKorn is optimized for performance and is suitable for high-throughput, large-scale environments.
Run Spark with YuniKorn
Submit a Spark job
1) Run spark-submit
2) Create SparkApplication CRD
Scheduling flow for Spark-job-001 (diagram):
▪ Api-server creates the driver pod (driver pod: Pending)
▪ Spark job is registered to YuniKorn in a leaf queue
▪ Sort queues -> sort apps -> select request -> select node
▪ Ask api-server to bind the pod to the node; api-server binds the pod to the assigned node (driver pod: Bound)
▪ Driver pod is started and requests Spark executor pods from api-server; api-server creates the executor pods (executor pods: Pending)
▪ New executors are added as pending requests
▪ Schedule and bind executors (executor pods: Bound)
Job states: Job is starting -> Spark driver is running -> Spark executors are created -> Spark job is running
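The "sort queues -> sort apps -> select request -> select node" cycle from this slide could be sketched as follows. This is not YuniKorn's implementation: the data layout, the function name `pick_allocation`, and the concrete policies (least-used queue first, FIFO apps, most-free node first) are assumptions picked for the example.

```python
# Hypothetical in-memory state for one scheduling cycle.
QUEUES = [
    {"name": "root.a", "guaranteed": 100, "used": 80, "apps": [
        {"name": "app-1", "submit_time": 1,
         "pending": [{"pod": "exec-1", "size": 10}]},
    ]},
    {"name": "root.b", "guaranteed": 100, "used": 20, "apps": [
        {"name": "app-3", "submit_time": 5,
         "pending": [{"pod": "driver-3", "size": 10}]},
        {"name": "app-2", "submit_time": 3,
         "pending": [{"pod": "driver-2", "size": 10}]},
    ]},
]
NODES = [{"name": "n1", "free": 5}, {"name": "n2", "free": 50}]


def pick_allocation(queues, nodes):
    """Return (queue, app, pod, node) for the next allocation, or None."""
    # Sort queues: lowest used/guaranteed ratio first (assumed policy).
    for queue in sorted(queues, key=lambda q: q["used"] / q["guaranteed"]):
        # Sort apps: earliest submission first (assumed FIFO policy).
        for app in sorted(queue["apps"], key=lambda a: a["submit_time"]):
            if not app["pending"]:
                continue
            request = app["pending"][0]  # select request
            # Select node: try nodes with the most free capacity first.
            for node in sorted(nodes, key=lambda n: n["free"], reverse=True):
                if node["free"] >= request["size"]:
                    return queue["name"], app["name"], request["pod"], node["name"]
    return None
```

With the sample data, the least-used queue (`root.b`) goes first, its oldest app (`app-2`) is picked, and the request lands on the node with the most free capacity.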
Deep Dive into YuniKorn Features/Performance
Job Ordering
Why does this matter?
▪ If I submit the job earlier, I want my job to run first
▪ I don’t want my job to be starved while resources are used by others
▪ I have an urgent job, let me run first!
Per queue sorting policy
▪ FIFO - Order jobs by submission time
▪ FAIR - Order jobs by resource consumption
▪ Priority (WIP-0.9) - Order jobs by job-level priorities within the
same queue
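The three per-queue sorting policies differ only in the key used to order jobs. The sketch below uses made-up jobs and field names; in particular, treating FAIR as "least current memory usage first" is a simplifying assumption, not YuniKorn's exact rule.

```python
# Hypothetical jobs in one queue.
JOBS = [
    {"name": "etl", "submit_time": 1, "used_mem": 300, "priority": 1},
    {"name": "adhoc", "submit_time": 2, "used_mem": 50, "priority": 5},
    {"name": "ml", "submit_time": 3, "used_mem": 120, "priority": 3},
]

# One sort key per policy.
SORT_KEYS = {
    "fifo": lambda job: job["submit_time"],     # earliest submission first
    "fair": lambda job: job["used_mem"],        # least resource usage first
    "priority": lambda job: -job["priority"],   # highest priority first
}


def order_jobs(jobs, policy):
    """Return job names in the order the given policy would serve them."""
    return [job["name"] for job in sorted(jobs, key=SORT_KEYS[policy])]
```

The same three jobs come out in three different orders, which is why the policy is configured per queue rather than cluster-wide.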
Resource Quota Management: K8s Namespace ResourceQuota
K8s Namespace Resource Quota
▪ Defines resource limits
▪ Enforced by the quota admission-controller
Problems
▪ Hard to control when resource quotas are overcommitted
▪ Users have no guaranteed resources
▪ Quota could be abused (e.g., by pending pods)
▪ No queueing for jobs…
▪ Low utilization?!
Namespace Resource Quota is suboptimal for
resource sharing among multiple tenants
Resource Quota Management: YuniKorn Queue Capacity
YuniKorn Queue provides an optimal solution to manage resource quotas
▪ A queue can map to one (or more) namespaces automatically
▪ Capacity is elastic from min to max
▪ Honor resource fairness
▪ Quota is only counted for pods which actually consume resources
▪ Enable job queueing
[Diagram: a Namespace mapped to a YuniKorn Queue holding three pods (CPU: 1, Memory: 1024Mi; CPU: 2, Memory: 2048Mi; CPU: 2, Memory: 2048Mi). Queue Max is CPU: 5, Memory: 5120Mi.]
-> better resource sharing, ensure guarantee, enforce max
-> zero config queue mgmt
-> avoid starving jobs/users
-> accurate resource counting, improve utilization
-> jobs can be queued in the scheduler, keep client side logic simple
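A rough sketch of how a queue max can be enforced while pending pods wait in the scheduler without consuming quota. It reuses the numbers from the slide (three pods adding up to the queue max of CPU 5 / 5120Mi). The `Queue` class and its methods are hypothetical, and the guaranteed (min) side is not modeled.

```python
class Queue:
    """Toy queue: pods count against max only once they are started."""

    def __init__(self, max_cpu, max_mem_mi):
        self.max = (max_cpu, max_mem_mi)
        self.used = (0, 0)
        self.pending = []  # pods waiting in the scheduler, not counted

    def submit(self, name, cpu, mem_mi):
        self.pending.append((name, cpu, mem_mi))

    def release(self, cpu, mem_mi):
        """Give resources back when a pod finishes."""
        self.used = (self.used[0] - cpu, self.used[1] - mem_mi)

    def schedule(self):
        """Start pending pods in order while they fit under max."""
        started = []
        while self.pending:
            name, cpu, mem = self.pending[0]
            over_cpu = self.used[0] + cpu > self.max[0]
            over_mem = self.used[1] + mem > self.max[1]
            if over_cpu or over_mem:
                break  # head of the line waits; nothing is rejected
            self.pending.pop(0)
            self.used = (self.used[0] + cpu, self.used[1] + mem)
            started.append(name)
        return started
```

A fourth pod submitted to the full queue stays pending instead of being rejected, and starts as soon as a running pod releases enough resources.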
Resource Fairness in YuniKorn Queues
Queue | Guaranteed Resource (Mem) | Requests (NumOfPods * Mem)
root.default | 500,000 | 1000 * 10
root.search | 400,000 | 500 * 10
root.test | 100,000 | 200 * 10
root.sandbox | 100,000 | 200 * 50
Scheduling workloads with different requests in 4 queues with
different guaranteed resources.
The usage ratios of the queues increased with a similar trend
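One way to see why usage ratios rise together: always allocate the next pod to the queue with the lowest usage / guaranteed ratio. The simulation below uses the queue figures from the table; the greedy rule and the function name `fair_share_run` are assumptions for illustration, not YuniKorn's actual algorithm.

```python
# Guaranteed memory, number of requested pods, and memory per pod,
# taken from the table on this slide.
QUEUES = {
    "root.default": {"guaranteed": 500_000, "pods": 1000, "pod_mem": 10},
    "root.search": {"guaranteed": 400_000, "pods": 500, "pod_mem": 10},
    "root.test": {"guaranteed": 100_000, "pods": 200, "pod_mem": 10},
    "root.sandbox": {"guaranteed": 100_000, "pods": 200, "pod_mem": 50},
}


def fair_share_run(queues, steps):
    """Allocate `steps` pods greedily; return usage ratio per queue."""
    usage = {name: 0 for name in queues}
    pending = {name: q["pods"] for name, q in queues.items()}
    for _ in range(steps):
        candidates = [name for name in queues if pending[name] > 0]
        if not candidates:
            break
        # Serve the queue that is furthest below its guaranteed share.
        name = min(candidates, key=lambda n: usage[n] / queues[n]["guaranteed"])
        usage[name] += queues[name]["pod_mem"]
        pending[name] -= 1
    return {n: usage[n] / queues[n]["guaranteed"] for n in queues}
```

Under this rule, while every queue still has pending pods, the ratios can never drift apart by more than the largest single-pod step (50 / 100,000 for root.sandbox), so all four curves climb at a similar pace.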
Scheduler Throughput Benchmark
Schedule 50,000 pods on 2,000/4,000 nodes.
Compare scheduling throughput (pods per second allocated by scheduler).
[Chart: red line (YuniKorn), green line (Default Scheduler)]
▪ 50k pods on 2k nodes: 617 vs 263 ↑ 134%
▪ 50k pods on 4k nodes: 373 vs 141 ↑ 164%
Detail report:
https://github.com/apache/incubator-yunikorn-core/blob/master/docs/evaluate-perf-function-with-Kubemark.md
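The percentages on this slide follow from the usual relative-improvement formula, (new - old) / old. The sketch below assumes the first pair of numbers (617 vs 263) belongs to the 2k-node run and the second (373 vs 141) to the 4k-node run; the helper name is made up.

```python
import math


def improvement_pct(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


for label, yunikorn, default in [("2k nodes", 617, 263), ("4k nodes", 373, 141)]:
    # Rounded down to a whole percent, as the slide figures appear to be.
    print(label, math.floor(improvement_pct(yunikorn, default)))
```

Rounding down reproduces the 134% and 164% shown on the slide.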
Fully K8s Compatible
▪ Support K8s Predicates
▪ Node selector
▪ Pod affinity/anti-affinity
▪ Taints and tolerations
▪ …
▪ Support PersistentVolumeClaim and PersistentVolume
▪ Volume bindings
▪ Dynamic provisioning
▪ Publishes key scheduling events to K8s event system
▪ Work with cluster autoscaler
▪ Support management commands
▪ cordon nodes
YuniKorn Management Console
Compare YuniKorn with other K8s schedulers
(v = supported, x = not supported)
Capability | K8s default scheduler | Kube-batch | YuniKorn
Resource Sharing: hierarchy queues | x | x | v
Resource Sharing: queue elastic capacity | x | x | v
Resource Fairness: cross queue fairness | x | v | v
Resource Fairness: user level fairness | x | x | v
Resource Fairness: app level fairness | x | v | v
Preemption: basic preemption | v | v | v
Preemption: with fairness | x | x | v
Gang Scheduling | x | v | v* (YUNIKORN-2)
Bin Packing | v | v | v
Throughput | 260 allocs/s (2k nodes) | ? Likely slower than kube-default, from [1] | 610 allocs/s (2k nodes)
[1] https://github.com/kubernetes-sigs/kube-batch/issues/930
Community, Summary and Next
Current Status
▪ Open sourced on July 17, 2019, Apache 2.0 License
▪ Entered the Apache Incubator on Jan 21, 2020
▪ Latest stable version 0.8.0 released on May 4, 2020
▪ Diverse community with members from Alibaba, Cloudera,
Microsoft, LinkedIn, Apple, Tencent, Nvidia and more…
The Community
▪ Deployed in non-production K8s clusters
▪ Launched 100s of large jobs per day on
some of the YuniKorn queues
▪ Reduced our large job scheduler latency by
a factor of ~3x at peak time
▪ K8s cluster overall resource utilization
efficiency (cost per compute) improved
over the default kube-scheduler for mixed
workloads
▪ FIFO and FAIR requests are more frequently
met than before
▪ Shipping with Cloudera Public Cloud
offerings
▪ Provide resource quota management and
advanced job scheduling capabilities for
Spark
▪ Responsible for both micro-service, and
batch jobs scheduling
▪ Running on Cloud with auto-scaling enabled
▪ Deployed on pre-production on-prem
cluster with ~100 nodes
▪ Plan to deploy on 1000+ nodes production
K8s cluster this year
▪ Leverage YuniKorn features such as
hierarchy queues and resource fairness to
run large scale Flink jobs on K8s
▪ Gained x4 scheduling performance
improvements
Roadmap
Current (0.8.0)
● Hierarchy queues
● Cross queue fairness
● Fair/FIFO job ordering policies
● Fair/Bin-packing node sorting policies
● Self queue management
● Pluggable app discovery
● Metrics system and Prometheus integration
Upcoming (0.9.0)
● Gang Scheduling
● Job/task priority support (scheduling & preemption)
● Support Spark dynamic allocation
3rd quarter of 2020
Our Vision - Resource Mgmt for Big Data & ML
▪ Computes: Data Engineering, Realtime Streaming, Machine Learning
▪ Types: Micro services, batch jobs, long running workloads, interactive sessions, model serving
▪ Targets: Multi-tenancy, SLA, Resource Utilization, Cost Mgmt, Budget
Unified Compute Platform for Big Data & ML
Join us in the
YuniKorn Community !!
▪ Project web site: http://yunikorn.apache.org/
▪ Github repo: apache/incubator-yunikorn-core
▪ Mailing list: dev@yunikorn.apache.org
▪ Slack channel:
▪ Bi-weekly/Monthly sync up meetings for different time zones
Thank you!!

 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 

Recently uploaded (20)

Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 

Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler

  • 2. Cloud-Native Spark Scheduling with YuniKorn Scheduler
  • 3. Li Gao Tech lead and engineer @ Databricks Compute Fabric Previous tech lead at data infrastructure @ Lyft Weiwei Yang Tech Lead @ Cloudera Compute Platform Apache Hadoop Committer & PMC member Previous tech lead at Real-time Compute Infra @ Alibaba
  • 4. Agenda Li Gao Why Lyft is choosing Spark on K8s The need for custom k8s scheduler for Spark Weiwei Yang Spark Scheduling with YuniKorn Deep Dive into YuniKorn Features Community and Roadmap
  • 5. Role of K8s in Lyft’s Data Landscape
  • 6. Why Choose K8s for Spark ▪ Containerized Spark compute to provide shared resources across different ML and ETL jobs ▪ Support for multiple Spark versions, Python versions, and version-controlled containers on the shared K8s clusters for both faster iteration and stable production ▪ A single, unified infrastructure for the majority of both our data compute and microservices, with advanced, unified observability and resource isolation support ▪ Fine-grained access controls on shared clusters
  • 7. The Spark K8s Infra @ Lyft
  • 8. Multi-step creation for a Spark K8s job Resource Labels Jobs Cluster Pool K8s Cluster Namespace Group Namespace Spark CRD Spark Pods DataLake
  • 9. Problems of existing Spark K8s infrastructure ▪ Complexity of layers of custom K8s controllers to handle the scale of the Spark jobs ▪ Tight coupling of controller layers amplifies latency issues in certain cases ▪ Priority queues between jobs, clusters, and namespaces are managed by multiple layers of controllers to achieve desired performance
  • 10. Why we need a customized K8s Scheduler ▪ High latency (~100 seconds) using the default scheduler is observed on a single K8s cluster for large volumes of batch workloads ▪ Large batch fair sharing in the same resource pool is unpredictable with the default scheduler ▪ Mix of FIFO and FAIR requirements on shared jobs clusters ▪ The need for an elastic and hierarchical priority management for jobs in K8s ▪ Richer and online user visibility into the scheduling behavior ▪ Simplified layers of controllers with custom K8s scheduler
  • 12. Flavors of Running Spark on K8s ▪ Native Spark on K8s: identify Spark jobs by the pod labels ▪ Spark K8s Operator: identify Spark jobs by CRDs (e.g. SparkApplication)
  • 13. Resource Scheduling in K8s Scheduler workflow in plain language: the scheduler picks up one pod at a time, finds the best-fit node, and then launches the pod on that node.
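
The pod-at-a-time workflow on slide 13 can be sketched roughly as follows. This is a minimal illustration, not the kube-scheduler implementation: the `Node` and `Pod` classes, the resource numbers, and the "most free CPU after placement" score are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: int  # millicores (hypothetical unit for this sketch)
    free_mem: int  # MiB
    pods: list = field(default_factory=list)


@dataclass
class Pod:
    name: str
    cpu: int
    mem: int


def schedule_one(pod, nodes):
    """Place a single pod: filter feasible nodes, score them, bind to the best."""
    feasible = [n for n in nodes if n.free_cpu >= pod.cpu and n.free_mem >= pod.mem]
    if not feasible:
        return None  # the pod stays pending
    # Assumed scoring rule: prefer the node with the most CPU left after placement.
    best = max(feasible, key=lambda n: (n.free_cpu - pod.cpu, n.name))
    best.free_cpu -= pod.cpu
    best.free_mem -= pod.mem
    best.pods.append(pod.name)
    return best.name


nodes = [Node("node-a", 2000, 4096), Node("node-b", 4000, 8192)]
pods = [Pod("driver", 1000, 1024), Pod("exec-1", 3500, 2048), Pod("exec-2", 2000, 2048)]

# Each pod is considered on its own; nothing here knows that the pods form one job.
placements = [schedule_one(p, nodes) for p in pods]
print(placements)
```

Because every decision is made per pod, `exec-1` is left pending while `exec-2` is placed, which is the kind of job-unaware behavior the following slides describe.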
  • 14. Spark on K8s: the scheduling challenges
    ▪ Job Scheduling Requirements
      ▪ Job ordering/queueing
      ▪ Job-level priority
      ▪ Resource fairness (between jobs / queues)
      ▪ Gang scheduling
    ▪ Resource Sharing and Utilization Challenges
      ▪ Spark driver pods can occupy all resources in a namespace
      ▪ Resource competition and deadlock between large jobs
      ▪ Misbehaving jobs could abuse resources
      ▪ High throughput
    Workloads shown: Ad-Hoc Queries, Batch Jobs, Workflow (DAG), Streaming
    The need for a unified architecture for on-prem, cloud, multi-cloud, and hybrid cloud
    The K8s default scheduler was NOT created to tackle these challenges
  • 15. Apache YuniKorn (Incubating) What is it: ▪ A standalone resource scheduler for K8s ▪ Focus on building scheduling capabilities to empower Big Data on K8s Simple to use: ▪ A stateless service on K8s ▪ A secondary K8s scheduler or replacement of the default scheduler
  • 16. Resource Scheduling in YuniKorn (compared with the default scheduler)
    [Diagram labels: Apps, API Server, ETCD, Resource Scheduler, master, Nodes, Queues, Request, Kubelet; Filter, Score, Sort, Extensions; Queue Sort, App Sort, Node Sort, Pluggable Policies; YUNIKORN, Default Scheduler; steps 1, 2, 3]
    YuniKorn's QUEUE and APP concepts are critical to providing advanced job scheduling and fine-grained resource management capabilities
  • 17. Main differences (YuniKorn vs. Default Scheduler)
    Table columns: Feature, Default Scheduler, YUNIKORN, Note (the per-scheduler marks are not present in this transcript)
    ▪ Scheduling at app dimension: The app is a first-class citizen in YuniKorn. YuniKorn schedules apps with respect to, e.g., their submission order, priority, and resource usage.
    ▪ Job ordering: YuniKorn supports FIFO/FAIR/Priority (WIP) job ordering policies.
    ▪ Fine-grained resource capacity management: Manage cluster resources with hierarchy queues; a queue provides the guaranteed resources (min) and the resource quota (max).
    ▪ Resource fairness: Inter-queue resource fairness.
    ▪ Natively support Big Data workloads: The default scheduler is mainly for long-running services. YuniKorn is designed for Big Data app workloads; it natively supports Spark/Flink/Tensorflow, etc.
    ▪ Scale & Performance: YuniKorn is optimized for performance and is suitable for high-throughput, large-scale environments.
  • 18. Run Spark with YuniKorn
    Submit a Spark job: 1) Run spark-submit, or 2) Create a SparkApplication CRD
    Job is Starting
      ▪ Api-server creates the driver pod (Spark driver pod: Pending)
      ▪ Spark job (Spark-job-001) is registered to YuniKorn in a leaf queue
      ▪ Sort queues -> sort apps -> select request -> select node
      ▪ Ask api-server to bind the pod to the node
    Spark driver is running
      ▪ Api-server binds the pod to the assigned node (Spark driver pod: Bound)
      ▪ Driver pod is started and requests Spark executor pods from api-server
    Spark executors are created
      ▪ Driver pod requests executors, api-server creates executor pods (Spark executor pods: Pending)
      ▪ New executors are added as pending requests
    Spark job is running
      ▪ Schedule and bind executors (Spark executor pods: Bound)
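
The "sort queues -> sort apps -> select request -> select node" step on slide 18 can be illustrated with a small sketch. The data layout, the used/guaranteed ratio for queue sorting, FIFO app sorting, and first-fit node selection are assumptions chosen for the example; YuniKorn's actual policies are pluggable and this is not its code.

```python
def pick_allocation(queues, nodes):
    """Return (queue, app, pod, node) for the next allocation, or None."""
    # 1. Sort queues: lowest used/guaranteed ratio first (assumed fair-share rule).
    for q in sorted(queues, key=lambda q: q["used"] / q["guaranteed"]):
        # 2. Sort apps: FIFO by submission time (assumed policy).
        for app in sorted(q["apps"], key=lambda a: a["submitted"]):
            # 3. Select request: the first pending pod of the app.
            if not app["pending"]:
                continue
            pod, mem = app["pending"][0]
            # 4. Select node: first node (by name) with enough free memory.
            for node, free in sorted(nodes.items()):
                if free >= mem:
                    return q["name"], app["id"], pod, node
    return None


# Hypothetical queues, apps, and nodes; memory in MiB.
queues = [
    {"name": "root.etl", "used": 600, "guaranteed": 1000,
     "apps": [{"id": "spark-job-002", "submitted": 5, "pending": [("driver-002", 200)]}]},
    {"name": "root.adhoc", "used": 100, "guaranteed": 500,
     "apps": [{"id": "spark-job-003", "submitted": 9, "pending": [("exec-003-1", 300)]},
              {"id": "spark-job-001", "submitted": 2, "pending": [("driver-001", 400)]}]},
]
nodes = {"node-a": 256, "node-b": 512}

decision = pick_allocation(queues, nodes)
print(decision)
```

Here `root.adhoc` wins the queue sort because it is furthest below its guarantee, `spark-job-001` wins the app sort because it was submitted first, and its driver lands on the only node with enough free memory.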
  • 19. Deep Dive into YuniKorn Features/Performance
  • 20. Job Ordering Why does this matter? ▪ If I submit my job earlier, I want it to run first ▪ I don’t want my job to be starved because resources are used by others ▪ I have an urgent job, let me run first! Per-queue sorting policy ▪ FIFO - Order jobs by submission time ▪ FAIR - Order jobs by resource consumption ▪ Priority (WIP-0.9) - Order jobs by job-level priorities within the same queue
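
As a rough illustration of the FIFO and FAIR policies on slide 20, the sketch below orders the same three jobs both ways. The job records and the `order_jobs` helper are hypothetical, and FAIR is simplified here to "least memory in use goes first".

```python
# Hypothetical jobs in one queue: submission order and current memory usage.
jobs = [
    {"name": "job-a", "submitted": 1, "used_mem": 800},
    {"name": "job-b", "submitted": 2, "used_mem": 100},
    {"name": "job-c", "submitted": 3, "used_mem": 400},
]


def order_jobs(jobs, policy):
    """Return jobs in the order the queue would serve them under a policy."""
    if policy == "fifo":
        # FIFO: earliest submission first.
        return sorted(jobs, key=lambda j: j["submitted"])
    if policy == "fair":
        # FAIR (simplified): the job consuming the least goes first.
        return sorted(jobs, key=lambda j: j["used_mem"])
    raise ValueError(f"unknown policy: {policy}")


fifo_order = [j["name"] for j in order_jobs(jobs, "fifo")]
fair_order = [j["name"] for j in order_jobs(jobs, "fair")]
print(fifo_order)
print(fair_order)
```

Under FIFO the earliest job keeps its place at the head of the queue; under FAIR the heavily served `job-a` drops to the back so the other jobs are not starved.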
  • 21. Resource Quota Management: K8s Namespace ResourceQuota K8s Namespace Resource Quota ▪ Defines resource limits ▪ Enforced by the quota admission-controller Problems ▪ Hard to control when resource quotas are overcommitted ▪ Users have no guaranteed resources ▪ Quota could be abused (e.g. by pending pods) ▪ No queueing for jobs… ▪ Low utilization?! Namespace Resource Quota is suboptimal for supporting resource sharing between multiple tenants
  • 22. Resource Quota Management: YuniKorn Queue Capacity
    YuniKorn Queue provides an optimal solution to manage resource quotas
    ▪ A queue can map to one (or more) namespaces automatically -> zero config queue mgmt
    ▪ Capacity is elastic from min to max -> better resource sharing, ensure guarantee, enforce max
    ▪ Honor resource fairness -> avoid starving jobs/users
    ▪ Quota is only counted for pods which actually consume resources -> accurate resource counting, improve utilization
    ▪ Enable job queueing -> jobs can be queued in the scheduler, keep client side logic simple
    [Diagram: Namespace mapped to a YuniKorn Queue; pods of CPU: 1 Memory: 1024Mi, CPU: 2 Memory: 2048Mi, CPU: 2 Memory: 2048Mi; Queue Max CPU: 5 Memory: 5120Mi]
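
The quota behavior described on slide 22 (only running pods count against the queue max, extra pods wait in the queue) can be sketched with the numbers from the slide's diagram. The `Queue` class, its method names, and the pod names are hypothetical; this is an illustration of the idea, not YuniKorn's implementation.

```python
class Queue:
    """Toy queue: only running pods count against the max; others wait."""

    def __init__(self, max_cpu, max_mem_mi):
        self.max_cpu = max_cpu
        self.max_mem_mi = max_mem_mi
        self.running = []  # (name, cpu, mem_mi), counted against the quota
        self.pending = []  # (name, cpu, mem_mi), queued and not counted

    def used(self):
        return (sum(p[1] for p in self.running), sum(p[2] for p in self.running))

    def submit(self, name, cpu, mem_mi):
        self.pending.append((name, cpu, mem_mi))
        self._admit()

    def finish(self, name):
        self.running = [p for p in self.running if p[0] != name]
        self._admit()

    def _admit(self):
        # Start every pending pod that still fits under the queue max.
        still_pending = []
        for pod in self.pending:
            cpu_used, mem_used = self.used()
            if cpu_used + pod[1] <= self.max_cpu and mem_used + pod[2] <= self.max_mem_mi:
                self.running.append(pod)
            else:
                still_pending.append(pod)
        self.pending = still_pending


# Queue max and pod sizes taken from the slide: max CPU 5, memory 5120Mi.
q = Queue(max_cpu=5, max_mem_mi=5120)
q.submit("pod-1", 1, 1024)
q.submit("pod-2", 2, 2048)
q.submit("pod-3", 2, 2048)
q.submit("pod-4", 1, 1024)  # over the max, so it is queued instead of rejected
waiting = [p[0] for p in q.pending]
used_while_full = q.used()

q.finish("pod-1")  # frees capacity, so pod-4 is admitted
running_after = [p[0] for p in q.running]
print(waiting, used_while_full, running_after)
```

The pending `pod-4` never inflates the quota usage, in contrast with the namespace ResourceQuota problems listed on the previous slide.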
  • 23. Resource Fairness in YuniKorn Queues
    Queue          Guaranteed Resource (Mem)   Requests (NumOfPods * Mem)
    root.default   500,000                     1000 * 10
    root.search    400,000                     500 * 10
    root.test      100,000                     200 * 10
    root.sandbox   100,000                     200 * 50
    Scheduling workloads with different requests in 4 queues with different guaranteed resources. The usage ratios of the queues increased with a similar trend.
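
A small simulation can show why the usage ratios on slide 23 rise together. It uses the queue table from the slide, but the allocation rule (always serve the queue with the lowest used/guaranteed ratio) and the assumption of unlimited cluster capacity are simplifications for illustration, not a description of YuniKorn's exact algorithm.

```python
# Guaranteed memory and requests per queue, taken from the slide's table.
queues = {
    "root.default": {"guaranteed": 500_000, "pods": 1000, "pod_mem": 10},
    "root.search": {"guaranteed": 400_000, "pods": 500, "pod_mem": 10},
    "root.test": {"guaranteed": 100_000, "pods": 200, "pod_mem": 10},
    "root.sandbox": {"guaranteed": 100_000, "pods": 200, "pod_mem": 50},
}

used = {name: 0 for name in queues}
pending = {name: q["pods"] for name, q in queues.items()}
max_spread = 0.0  # largest gap between usage ratios while every queue has demand


def ratio(name):
    return used[name] / queues[name]["guaranteed"]


while any(pending.values()):
    # Assumed fair-share rule: serve the queue furthest below its guarantee.
    candidates = [n for n in queues if pending[n] > 0]
    pick = min(candidates, key=ratio)
    used[pick] += queues[pick]["pod_mem"]
    pending[pick] -= 1
    if all(pending.values()):
        ratios = [ratio(n) for n in queues]
        max_spread = max(max_spread, max(ratios) - min(ratios))

print(used)
print(max_spread)
```

While all four queues still have pending pods, their usage ratios never drift apart by more than one allocation step of the coarsest queue (50 / 100,000 for `root.sandbox`), which matches the "similar trend" the slide reports.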
  • 24. Scheduler Throughput Benchmark
    Schedule 50,000 pods on 2,000/4,000 nodes and compare scheduling throughput (pods per second allocated by the scheduler). Red line: YuniKorn. Green line: Default Scheduler.
    ▪ 50k pods on 2k nodes: 617 vs 263 (↑ 134%)
    ▪ 50k pods on 4k nodes: 373 vs 141 (↑ 164%)
    Detailed report: https://github.com/apache/incubator-yunikorn-core/blob/master/docs/evaluate-perf-function-with-Kubemark.md
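
The improvement percentages on slide 24 follow from the throughput figures as a relative increase over the default scheduler. The helper name below is hypothetical; the inputs are the numbers quoted on the slide.

```python
def improvement_pct(yunikorn, default):
    """Relative throughput gain of YuniKorn over the default scheduler, in percent."""
    return (yunikorn - default) / default * 100


gain_2k = improvement_pct(617, 263)  # 50k pods on 2k nodes
gain_4k = improvement_pct(373, 141)  # 50k pods on 4k nodes
print(f"{gain_2k:.1f}% {gain_4k:.1f}%")
```

The computed gains are slightly above the quoted 134% and 164%, so the slide figures appear to be rounded down.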
  • 25. Fully K8s Compatible ▪ Supports K8s predicates ▪ Node selector ▪ Pod affinity/anti-affinity ▪ Taints and tolerations ▪ … ▪ Supports PersistentVolumeClaim and PersistentVolume ▪ Volume bindings ▪ Dynamic provisioning ▪ Publishes key scheduling events to the K8s event system ▪ Works with the cluster autoscaler ▪ Supports management commands ▪ cordon nodes
  • 27. Compare YuniKorn with other K8s schedulers (v = yes, x = no)
    Capability                                  K8s default scheduler    Kube-batch                                     YuniKorn
    Resource Sharing: Hierarchy queues          x                        x                                              v
    Resource Sharing: Queue elastic capacity    x                        x                                              v
    Resource Fairness: Cross queue fairness     x                        v                                              v
    Resource Fairness: User level fairness      x                        x                                              v
    Resource Fairness: App level fairness       x                        v                                              v
    Preemption: Basic preemption                v                        v                                              v
    Preemption: With fairness                   x                        x                                              v
    Gang Scheduling                             x                        v                                              v* (YUNIKORN-2)
    Bin Packing                                 v                        v                                              v
    Throughput                                  260 allocs/s (2k nodes)  ? Likely slower than kube-default, from [1]    610 allocs/s (2k nodes)
    [1] https://github.com/kubernetes-sigs/kube-batch/issues/930
  • 29. Current Status ▪ Open-sourced on July 17, 2019, under the Apache 2.0 License ▪ Entered the Apache Incubator on Jan 21, 2020 ▪ Latest stable version 0.8.0 released on May 4, 2020 ▪ Diverse community with members from Alibaba, Cloudera, Microsoft, LinkedIn, Apple, Tencent, Nvidia and more…
  • 30. The Community ▪ Deployed in non-production K8s clusters ▪ Launched 100s of large jobs per day on some of the YuniKorn queues ▪ Reduced our large job scheduler latency by a factor of ~3x at peak time ▪ K8s cluster overall resource utilization efficiency (cost per compute) improved over the default kube-scheduler for mixed workloads ▪ FIFO and FAIR requests are more frequently met than before ▪ Shipping with Cloudera Public Cloud offerings ▪ Provide resource quota management and advanced job scheduling capabilities for Spark ▪ Responsible for both micro-service and batch job scheduling ▪ Running on Cloud with auto-scaling enabled ▪ Deployed on a pre-production on-prem cluster with ~100 nodes ▪ Plan to deploy on a 1000+ node production K8s cluster this year ▪ Leverage YuniKorn features such as hierarchy queues and resource fairness to run large-scale Flink jobs on K8s ▪ Gained 4x scheduling performance improvements
  • 31. Roadmap
    Current (0.8.0)
      ● Hierarchy queues
      ● Cross queue fairness
      ● Fair/FIFO job ordering policies
      ● Fair/Bin-packing node sorting policies
      ● Self queue management
      ● Pluggable app discovery
      ● Metrics system and Prometheus integration
    Upcoming (0.9.0), 3rd quarter of 2020
      ● Gang Scheduling
      ● Job/task priority support (scheduling & preemption)
      ● Support Spark dynamic allocation
  • 32. Our Vision - Resource Mgmt for Big Data & ML
    ▪ Computes: Data Engineering, Realtime Streaming, Machine Learning
    ▪ Types: Micro services, batch jobs, long running workloads, interactive sessions, model serving
    ▪ Targets: Multi-tenancy, SLA, Resource Utilization, Cost Mgmt, Budget
    Unified Compute Platform for Big Data & ML
  • 33. Join us in the YuniKorn Community !! ▪ Project web site: http://yunikorn.apache.org/ ▪ Github repo: apache/incubator-yunikorn-core ▪ Mailing list: dev@yunikorn.apache.org ▪ Slack channel: ▪ Bi-weekly/Monthly sync up meetings for different time zones