SlideShare a Scribd company logo
Spark and YARN:
Better Together
Saisai Shao , Jeff Zhang
sshao@hortonworks.com, jzhang@hortonworks.com
August 6, 2016
Saisai Shao
 Member of Technical Staff @ Hortonworks
 Focus on Open Source Big Data area
 Active Apache Spark Contributor
Jeff Zhang
 Member of Technical Staff @ Hortonworks
 Apache Pig/Tez Committer and Spark Contributor
 Open source evangelist
Spark on YARN Recap
Cluster Manager
Overview of Spark Cluster
Driver
Executor ExecutorExecutor
NM NMNM
Spark Running On YARN (Client Mode)
Driver
ResourceManager
Client
Container
Executor
ContainerContainer
Executor Executor
Cont
ainer
AM
NM
NM NMNM
Spark Running On YARN (Cluster Mode)
ResourceManager
Client
Container ContainerContainer
Container
Executor Executor Executor
AM
Driver
Difference Compared to Other Cluster Manager
 Application has to be submitted into a queue
 Jars/files/archives are distributed through distributed cache
 Additional ApplicationMaster
 …
Better Run Spark On YARN
What Do We Concern About ?
 Better use the resources
 Better run on cluster
 Easy to debug
Calculate Container Size
 What is the size of a Container ?
- Memory
- CPU#
spark.executor.memory
spark.yarn.executor.memoryOverhead (offheap)
spark.memory.fraction (0.75)
spark.memory.storage
Fraction
(0.5)
execution
memory
spark.memory.offHeap.size
# Depend on CPU scheduling is enabled or not
container memory = spark executor memory + overhead memory
yarn.scheduler.minimum-allocation-mb <= container memory
<= yarn.nodemanager.resource.memory-mb
container memory will be round to yarn.scheduler.increment-allocation-mb
Calculate Container Size (Cont’d)
 Enable CPU Scheduling
- Capacity Scheduler with DefaultResourceCalculator
(default)
- Only takes memory into account
- CPU requirements are ignored when carrying out
allocations
- The setting of “--executor-cores” is controlled by Spark
itself
- Capacity Scheduler with DominantResourceCalculator
- CPU will also take into account when calculating
- Container vcores = executor corescontainer cores <= nodemanger.resource.cpu-vcores
Isolate Container Resource
Containers should only be allowed to use resource they get allocated, they
should not be affected by other containers on the node
How do we ensure containers don’t exceed their vcore allocation?
What’s stopping an errant container from spawning bunch of threads and consume
all the CPU on the node?
CGroups
With the setting of LinuxContainerExecutor and others, YARN could enable CGroups to constrain the
CPU usage (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html).
Label Based Scheduling
How to specify applications to run on specific nodes?
Label based scheduling is what you want.
To use it:
- Enable node label and label scheduling in YARN side (Hadoop 2.6+)
- Configure node label expression in Spark conf:
- spark.yarn.am.nodeLabelExpression
- spark.yarn.executor.nodeLabelExpression
Dynamic Resource Allocation
How to use the resource more effectively and more resiliently?
Spark supports dynamically requesting or releasing executors according to the current load of jobs.
This is especially useful for long-running applications like Spark shell, Thrift Server, Zeppelin.
To Enable Dynamic Resource Allocation
spark.streaming.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
Resilient to Service Restart/Failure
 Resilient to ResourceManager restart/failure
- RM is a single point of failure
- Configuring yarn.resourcemanager.ha.enabled to
enable RM HA
- Enable yarn.resourcemanager.recovery.enabled
to be resilient to RM change or restart
- Non work preserving RM restart (Hadoop 2.4)
- Work preserving RM restart (Hadoop 2.6)
 Resilient to NodeManager restart
- Enable yarn.nodemanager.recovery.enabled to
be resilient to NM restart (Hadoop 2.6)
Access Kerberized Environment
 Spark on YARN supports accessing Kerberized Hadoop environment by Kerberos.
 It will automatically get tokens from NN if the Hadoop environment is security
enabled.
 To retrieve the delegation tokens for non-HDFS services when security is enabled,
configure spark.yarn.security.tokens.{hive|hbase}.enabled.
 You could also specify --principal and --keytab to let Spark on YARN to do kinit and
token renewal automatically, this is useful for long running service like Spark
Streaming.
Fast Debug YARN Application
 What we usually meet when running Spark on YARN?
- Classpath problem
- Java parameters does not work
- Configuration doesn‘t take affect.
• yarn.nodemanager.delete.debug-delay-sec
• ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}/container_${contid}
Future works of Spark On YARN
More Advanced Dynamic Resource Allocation
 The problem of current algorithm:
- Current algorithm is based on the load of tasks, the more tasks submitted the
more executor will be requested. This will introduce resource starvation for other
applications.
- Current algorithm is an universal algorithm doesn't consider the specific features
of cluster manager.
 To improve current dynamic resource allocation:
- Make current algorithm pluggable, to be adapted to customized algorithms.
- Incorporate more YARN specific features to improve the current algorithm
- Container resizing
- Cooperative preemption
- …
Integrate YARN Application Timeline Server
 Timeline Server - Storage and retrieval of applications' current as well as historic
information in a generic fashion is solved in YARN through the Timeline Server. This
serves two responsibilities:
- Generic information about completed applications
- Per-framework information of running and completed applications
 We’re working on integrating Spark’s history server with YARN application timeline
server.
 Already released with HDP, continue improving ATS in the yarn side to bring in
better performance and stability.
Pluggable Security Services
 Current Problems:
- How to access security services other than HDFS, HBase and Hive
- How to support security keys other than Delegation Token
 To improve current token management mechanism:
- Provide a configurable security credential manager
- Make security credential provider pluggable for different services.
- SPARK-14743
ThanksThanks

More Related Content

What's hot

Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
Solving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQLSolving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQL
Julien Pierre
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
Databricks
 
Spark in yarn managed multi-tenant clusters
Spark in yarn managed multi-tenant clustersSpark in yarn managed multi-tenant clusters
Spark in yarn managed multi-tenant clusters
shareddatamsft
 
Module01
 Module01 Module01
Module01
NPN Training
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
Jen Aman
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
Databricks
 
Koalas: Interoperability Between Koalas and Apache Spark
Koalas: Interoperability Between Koalas and Apache SparkKoalas: Interoperability Between Koalas and Apache Spark
Koalas: Interoperability Between Koalas and Apache Spark
Databricks
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
Evan Chan
 
Learning spark ch07 - Running on a Cluster
Learning spark ch07 - Running on a ClusterLearning spark ch07 - Running on a Cluster
Learning spark ch07 - Running on a Cluster
phanleson
 
Apache spark
Apache sparkApache spark
Apache spark
Prashant Pranay
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo Hadoop
Hortonworks
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
hadooparchbook
 
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Databricks
 
Apache spark
Apache sparkApache spark
Apache spark
Dona Mary Philip
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Databricks
 
Application Timeline Server Past, Present and Future
Application Timeline Server  Past, Present and FutureApplication Timeline Server  Past, Present and Future
Application Timeline Server Past, Present and Future
Naganarasimha Garla
 
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Spark Summit
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
Jen Aman
 

What's hot (20)

Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Solving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQLSolving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQL
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
Spark in yarn managed multi-tenant clusters
Spark in yarn managed multi-tenant clustersSpark in yarn managed multi-tenant clusters
Spark in yarn managed multi-tenant clusters
 
Module01
 Module01 Module01
Module01
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
Koalas: Interoperability Between Koalas and Apache Spark
Koalas: Interoperability Between Koalas and Apache SparkKoalas: Interoperability Between Koalas and Apache Spark
Koalas: Interoperability Between Koalas and Apache Spark
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
 
Learning spark ch07 - Running on a Cluster
Learning spark ch07 - Running on a ClusterLearning spark ch07 - Running on a Cluster
Learning spark ch07 - Running on a Cluster
 
Apache spark
Apache sparkApache spark
Apache spark
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo Hadoop
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
 
Apache spark
Apache sparkApache spark
Apache spark
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
 
Application Timeline Server Past, Present and Future
Application Timeline Server  Past, Present and FutureApplication Timeline Server  Past, Present and Future
Application Timeline Server Past, Present and Future
 
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
 

Similar to Spark & Yarn better together 1.2

Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130
Xuan-Chao Huang
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
Amazon Web Services
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWSAWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
Amazon Web Services
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
DataWorks Summit
 
Data Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRData Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMR
Amazon Web Services
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
datamantra
 
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel AvivData science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
Amazon Web Services
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
DB Tsai
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Edureka!
 
Bring the Spark To Your Eyes
Bring the Spark To Your EyesBring the Spark To Your Eyes
Bring the Spark To Your Eyes
Demi Ben-Ari
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
Databricks
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
Karan Alang
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
Hortonworks
 
Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
Spark Summit
 
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerCloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Databricks
 
YARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOPYARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOP
Omkar Joshi
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache Spark
Venkata Naga Ravi
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
Evan Chan
 
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
MLconf
 

Similar to Spark & Yarn better together 1.2 (20)

Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWSAWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
 
Data Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRData Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMR
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
 
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel AvivData science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
Bring the Spark To Your Eyes
Bring the Spark To Your EyesBring the Spark To Your Eyes
Bring the Spark To Your Eyes
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
 
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerCloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
 
YARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOPYARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOP
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache Spark
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
 

Recently uploaded

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 

Recently uploaded (20)

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 

Spark & Yarn better together 1.2

  • 1. Spark and YARN: Better Together Saisai Shao , Jeff Zhang sshao@hortonworks.com, jzhang@hortonworks.com August 6, 2016
  • 2. Saisai Shao  Member of Technical Staff @ Hortonworks  Focus on Open Source Big Data area  Active Apache Spark Contributor Jeff Zhang  Member of Technical Staff @ Hortonworks  Apache Pig/Tez Committer and Spark Contributor  Open source evangelist
  • 4. Cluster Manager Overview of Spark Cluster Driver Executor ExecutorExecutor
  • 5. NM NMNM Spark Running On YARN (Client Mode) Driver ResourceManager Client Container Executor ContainerContainer Executor Executor Cont ainer AM
  • 6. NM NM NMNM Spark Running On YARN (Cluster Mode) ResourceManager Client Container ContainerContainer Container Executor Executor Executor AM Driver
  • 7. Difference Compared to Other Cluster Manager  Application has to be submitted into a queue  Jars/files/archives are distributed through distributed cache  Additional ApplicationMaster  …
  • 9. What Do We Concern About ?  Better use the resources  Better run on cluster  Easy to debug
  • 10. Calculate Container Size  What is the size of a Container ? - Memory - CPU# spark.executor.memory spark.yarn.executor.memoryOverhead (offheap) spark.memory.fraction (0.75) spark.memory.storage Fraction (0.5) execution memory spark.memory.offHeap.size # Depend on CPU scheduling is enabled or not container memory = spark executor memory + overhead memory yarn.scheduler.minimum-allocation-mb <= container memory <= yarn.nodemanager.resource.memory-mb container memory will be round to yarn.scheduler.increment-allocation-mb
  • 11. Calculate Container Size (Cont’d)  Enable CPU Scheduling - Capacity Scheduler with DefaultResourceCalculator (default) - Only takes memory into account - CPU requirements are ignored when carrying out allocations - The setting of “--executor-cores” is controlled by Spark itself - Capacity Scheduler with DominantResourceCalculator - CPU will also take into account when calculating - Container vcores = executor corescontainer cores <= nodemanger.resource.cpu-vcores
  • 12. Isolate Container Resource Containers should only be allowed to use resource they get allocated, they should not be affected by other containers on the node How do we ensure containers don’t exceed their vcore allocation? What’s stopping an errant container from spawning bunch of threads and consume all the CPU on the node? CGroups With the setting of LinuxContainerExecutor and others, YARN could enable CGroups to constrain the CPU usage (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html).
  • 13. Label Based Scheduling How to specify applications to run on specific nodes? Label based scheduling is what you want. To use it: - Enable node label and label scheduling in YARN side (Hadoop 2.6+) - Configure node label expression in Spark conf: - spark.yarn.am.nodeLabelExpression - spark.yarn.executor.nodeLabelExpression
  • 14. Dynamic Resource Allocation How to use the resource more effectively and more resiliently? Spark supports dynamically requesting or releasing executors according to the current load of jobs. This is especially useful for long-running applications like Spark shell, Thrift Server, Zeppelin. To Enable Dynamic Resource Allocation spark.streaming.dynamicAllocation.enabled true spark.shuffle.service.enabled true <property> <name>yarn.nodemanager.aux-services.spark_shuffle.class</name> <value>org.apache.spark.network.yarn.YarnShuffleService</value> </property>
  • 15. Resilient to Service Restart/Failure  Resilient to ResourceManager restart/failure - RM is a single point of failure - Configuring yarn.resourcemanager.ha.enabled to enable RM HA - Enable yarn.resourcemanager.recovery.enabled to be resilient to RM change or restart - Non work preserving RM restart (Hadoop 2.4) - Work preserving RM restart (Hadoop 2.6)  Resilient to NodeManager restart - Enable yarn.nodemanager.recovery.enabled to be resilient to NM restart (Hadoop 2.6)
  • 16. Access Kerberized Environment  Spark on YARN supports accessing Kerberized Hadoop environment by Kerberos.  It will automatically get tokens from NN if the Hadoop environment is security enabled.  To retrieve the delegation tokens for non-HDFS services when security is enabled, configure spark.yarn.security.tokens.{hive|hbase}.enabled.  You could also specify --principal and --keytab to let Spark on YARN to do kinit and token renewal automatically, this is useful for long running service like Spark Streaming.
  • 17. Fast Debug YARN Application  What we usually meet when running Spark on YARN? - Classpath problem - Java parameters does not work - Configuration doesn‘t take affect. • yarn.nodemanager.delete.debug-delay-sec • ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}/container_${contid}
  • 18. Future works of Spark On YARN
  • 19. More Advanced Dynamic Resource Allocation  The problem of current algorithm: - Current algorithm is based on the load of tasks, the more tasks submitted the more executor will be requested. This will introduce resource starvation for other applications. - Current algorithm is an universal algorithm doesn't consider the specific features of cluster manager.  To improve current dynamic resource allocation: - Make current algorithm pluggable, to be adapted to customized algorithms. - Incorporate more YARN specific features to improve the current algorithm - Container resizing - Cooperative preemption - …
  • 20. Integrate YARN Application Timeline Server  Timeline Server - Storage and retrieval of applications' current as well as historic information in a generic fashion is solved in YARN through the Timeline Server. This serves two responsibilities: - Generic information about completed applications - Per-framework information of running and completed applications  We’re working on integrating Spark’s history server with YARN application timeline server.  Already released with HDP, continue improving ATS in the yarn side to bring in better performance and stability.
  • 21. Pluggable Security Services  Current Problems: - How to access security services other than HDFS, HBase and Hive - How to support security keys other than Delegation Token  To improve current token management mechanism: - Provide a configurable security credential manager - Make security credential provider pluggable for different services. - SPARK-14743