SlideShare a Scribd company logo
1 of 22
Spark and YARN:
Better Together
Saisai Shao , Jeff Zhang
sshao@hortonworks.com, jzhang@hortonworks.com
August 6, 2016
Saisai Shao
 Member of Technical Staff @ Hortonworks
 Focus on Open Source Big Data area
 Active Apache Spark Contributor
Jeff Zhang
 Member of Technical Staff @ Hortonworks
 Apache Pig/Tez Committer and Spark Contributor
 Open source evangelist
Spark on YARN Recap
Cluster Manager
Overview of Spark Cluster
Driver
Executor ExecutorExecutor
NM NMNM
Spark Running On YARN (Client Mode)
Driver
ResourceManager
Client
Container
Executor
ContainerContainer
Executor Executor
Cont
ainer
AM
NM
NM NMNM
Spark Running On YARN (Cluster Mode)
ResourceManager
Client
Container ContainerContainer
Container
Executor Executor Executor
AM
Driver
Difference Compared to Other Cluster Manager
 Application has to be submitted into a queue
 Jars/files/archives are distributed through distributed cache
 Additional ApplicationMaster
 …
Better Run Spark On YARN
What Do We Concern About ?
 Better use the resources
 Better run on cluster
 Easy to debug
Calculate Container Size
 What is the size of a Container ?
- Memory
- CPU#
spark.executor.memory
spark.yarn.executor.memoryOverhead (offheap)
spark.memory.fraction (0.75)
spark.memory.storage
Fraction
(0.5)
execution
memory
spark.memory.offHeap.size
# Depend on CPU scheduling is enabled or not
container memory = spark executor memory + overhead memory
yarn.scheduler.minimum-allocation-mb <= container memory
<= yarn.nodemanager.resource.memory-mb
container memory will be round to yarn.scheduler.increment-allocation-mb
Calculate Container Size (Cont’d)
 Enable CPU Scheduling
- Capacity Scheduler with DefaultResourceCalculator
(default)
- Only takes memory into account
- CPU requirements are ignored when carrying out
allocations
- The setting of “--executor-cores” is controlled by Spark
itself
- Capacity Scheduler with DominantResourceCalculator
- CPU will also take into account when calculating
- Container vcores = executor corescontainer cores <= nodemanger.resource.cpu-vcores
Isolate Container Resource
Containers should only be allowed to use resource they get allocated, they
should not be affected by other containers on the node
How do we ensure containers don’t exceed their vcore allocation?
What’s stopping an errant container from spawning bunch of threads and consume
all the CPU on the node?
CGroups
With the setting of LinuxContainerExecutor and others, YARN could enable CGroups to constrain the
CPU usage (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html).
Label Based Scheduling
How to specify applications to run on specific nodes?
Label based scheduling is what you want.
To use it:
- Enable node label and label scheduling in YARN side (Hadoop 2.6+)
- Configure node label expression in Spark conf:
- spark.yarn.am.nodeLabelExpression
- spark.yarn.executor.nodeLabelExpression
Dynamic Resource Allocation
How to use the resource more effectively and more resiliently?
Spark supports dynamically requesting or releasing executors according to the current load of jobs.
This is especially useful for long-running applications like Spark shell, Thrift Server, Zeppelin.
To Enable Dynamic Resource Allocation
spark.streaming.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
Resilient to Service Restart/Failure
 Resilient to ResourceManager restart/failure
- RM is a single point of failure
- Configuring yarn.resourcemanager.ha.enabled to
enable RM HA
- Enable yarn.resourcemanager.recovery.enabled
to be resilient to RM change or restart
- Non work preserving RM restart (Hadoop 2.4)
- Work preserving RM restart (Hadoop 2.6)
 Resilient to NodeManager restart
- Enable yarn.nodemanager.recovery.enabled to
be resilient to NM restart (Hadoop 2.6)
Access Kerberized Environment
 Spark on YARN supports accessing Kerberized Hadoop environment by Kerberos.
 It will automatically get tokens from NN if the Hadoop environment is security
enabled.
 To retrieve the delegation tokens for non-HDFS services when security is enabled,
configure spark.yarn.security.tokens.{hive|hbase}.enabled.
 You could also specify --principal and --keytab to let Spark on YARN to do kinit and
token renewal automatically, this is useful for long running service like Spark
Streaming.
Fast Debug YARN Application
 What we usually meet when running Spark on YARN?
- Classpath problem
- Java parameters does not work
- Configuration doesn‘t take affect.
• yarn.nodemanager.delete.debug-delay-sec
• ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}/container_${contid}
Future works of Spark On YARN
More Advanced Dynamic Resource Allocation
 The problem of current algorithm:
- Current algorithm is based on the load of tasks, the more tasks submitted the
more executor will be requested. This will introduce resource starvation for other
applications.
- Current algorithm is an universal algorithm doesn't consider the specific features
of cluster manager.
 To improve current dynamic resource allocation:
- Make current algorithm pluggable, to be adapted to customized algorithms.
- Incorporate more YARN specific features to improve the current algorithm
- Container resizing
- Cooperative preemption
- …
Integrate YARN Application Timeline Server
 Timeline Server - Storage and retrieval of applications' current as well as historic
information in a generic fashion is solved in YARN through the Timeline Server. This
serves two responsibilities:
- Generic information about completed applications
- Per-framework information of running and completed applications
 We’re working on integrating Spark’s history server with YARN application timeline
server.
 Already released with HDP, continue improving ATS in the yarn side to bring in
better performance and stability.
Pluggable Security Services
 Current Problems:
- How to access security services other than HDFS, HBase and Hive
- How to support security keys other than Delegation Token
 To improve current token management mechanism:
- Provide a configurable security credential manager
- Make security credential provider pluggable for different services.
- SPARK-14743
ThanksThanks

More Related Content

What's hot

Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Solving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQLSolving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQLJulien Pierre
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache SparkDatabricks
 
Spark in yarn managed multi-tenant clusters
Spark in yarn managed multi-tenant clustersSpark in yarn managed multi-tenant clusters
Spark in yarn managed multi-tenant clustersshareddatamsft
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache SparkJen Aman
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkDatabricks
 
Koalas: Interoperability Between Koalas and Apache Spark
Koalas: Interoperability Between Koalas and Apache SparkKoalas: Interoperability Between Koalas and Apache Spark
Koalas: Interoperability Between Koalas and Apache SparkDatabricks
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkEvan Chan
 
Learning spark ch07 - Running on a Cluster
Learning spark ch07 - Running on a ClusterLearning spark ch07 - Running on a Cluster
Learning spark ch07 - Running on a Clusterphanleson
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopHortonworks
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Databricks
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Databricks
 
Application Timeline Server Past, Present and Future
Application Timeline Server  Past, Present and FutureApplication Timeline Server  Past, Present and Future
Application Timeline Server Past, Present and FutureNaganarasimha Garla
 
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...Spark Summit
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on MesosJen Aman
 

What's hot (20)

Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Solving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQLSolving low latency query over big data with Spark SQL
Solving low latency query over big data with Spark SQL
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
Spark in yarn managed multi-tenant clusters
Spark in yarn managed multi-tenant clustersSpark in yarn managed multi-tenant clusters
Spark in yarn managed multi-tenant clusters
 
Module01
 Module01 Module01
Module01
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
Koalas: Interoperability Between Koalas and Apache Spark
Koalas: Interoperability Between Koalas and Apache SparkKoalas: Interoperability Between Koalas and Apache Spark
Koalas: Interoperability Between Koalas and Apache Spark
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
 
Learning spark ch07 - Running on a Cluster
Learning spark ch07 - Running on a ClusterLearning spark ch07 - Running on a Cluster
Learning spark ch07 - Running on a Cluster
 
Apache spark
Apache sparkApache spark
Apache spark
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo Hadoop
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhus...
 
Apache spark
Apache sparkApache spark
Apache spark
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
 
Application Timeline Server Past, Present and Future
Application Timeline Server  Past, Present and FutureApplication Timeline Server  Past, Present and Future
Application Timeline Server Past, Present and Future
 
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
 

Similar to Spark & Yarn better together 1.2

Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Xuan-Chao Huang
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMRAmazon Web Services
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsDatabricks
 
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWSAWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWSAmazon Web Services
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsDataWorks Summit
 
Data Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRData Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRAmazon Web Services
 
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel AvivData science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel AvivAmazon Web Services
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML ConferenceDB Tsai
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Edureka!
 
Bring the Spark To Your Eyes
Bring the Spark To Your EyesBring the Spark To Your Eyes
Bring the Spark To Your EyesDemi Ben-Ari
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at FacebookDatabricks
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overviewKaran Alang
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanSpark Summit
 
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerCloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerDatabricks
 
YARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOPYARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOPOmkar Joshi
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkVenkata Naga Ravi
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16MLconf
 

Similar to Spark & Yarn better together 1.2 (20)

Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWSAWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
 
Data Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRData Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMR
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
 
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel AvivData science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
Bring the Spark To Your Eyes
Bring the Spark To Your EyesBring the Spark To Your Eyes
Bring the Spark To Your Eyes
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
 
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerCloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
 
YARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOPYARN - way to share cluster BEYOND HADOOP
YARN - way to share cluster BEYOND HADOOP
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache Spark
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
 

Recently uploaded

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Recently uploaded (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Spark & Yarn better together 1.2

  • 1. Spark and YARN: Better Together Saisai Shao , Jeff Zhang sshao@hortonworks.com, jzhang@hortonworks.com August 6, 2016
  • 2. Saisai Shao  Member of Technical Staff @ Hortonworks  Focus on Open Source Big Data area  Active Apache Spark Contributor Jeff Zhang  Member of Technical Staff @ Hortonworks  Apache Pig/Tez Committer and Spark Contributor  Open source evangelist
  • 4. Cluster Manager Overview of Spark Cluster Driver Executor ExecutorExecutor
  • 5. NM NMNM Spark Running On YARN (Client Mode) Driver ResourceManager Client Container Executor ContainerContainer Executor Executor Cont ainer AM
  • 6. NM NM NMNM Spark Running On YARN (Cluster Mode) ResourceManager Client Container ContainerContainer Container Executor Executor Executor AM Driver
  • 7. Difference Compared to Other Cluster Manager  Application has to be submitted into a queue  Jars/files/archives are distributed through distributed cache  Additional ApplicationMaster  …
  • 9. What Do We Concern About ?  Better use the resources  Better run on cluster  Easy to debug
  • 10. Calculate Container Size  What is the size of a Container ? - Memory - CPU# spark.executor.memory spark.yarn.executor.memoryOverhead (offheap) spark.memory.fraction (0.75) spark.memory.storage Fraction (0.5) execution memory spark.memory.offHeap.size # Depend on CPU scheduling is enabled or not container memory = spark executor memory + overhead memory yarn.scheduler.minimum-allocation-mb <= container memory <= yarn.nodemanager.resource.memory-mb container memory will be round to yarn.scheduler.increment-allocation-mb
  • 11. Calculate Container Size (Cont’d)  Enable CPU Scheduling - Capacity Scheduler with DefaultResourceCalculator (default) - Only takes memory into account - CPU requirements are ignored when carrying out allocations - The setting of “--executor-cores” is controlled by Spark itself - Capacity Scheduler with DominantResourceCalculator - CPU will also take into account when calculating - Container vcores = executor corescontainer cores <= nodemanger.resource.cpu-vcores
  • 12. Isolate Container Resource Containers should only be allowed to use resource they get allocated, they should not be affected by other containers on the node How do we ensure containers don’t exceed their vcore allocation? What’s stopping an errant container from spawning bunch of threads and consume all the CPU on the node? CGroups With the setting of LinuxContainerExecutor and others, YARN could enable CGroups to constrain the CPU usage (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html).
  • 13. Label Based Scheduling How to specify applications to run on specific nodes? Label based scheduling is what you want. To use it: - Enable node label and label scheduling in YARN side (Hadoop 2.6+) - Configure node label expression in Spark conf: - spark.yarn.am.nodeLabelExpression - spark.yarn.executor.nodeLabelExpression
  • 14. Dynamic Resource Allocation How to use the resource more effectively and more resiliently? Spark supports dynamically requesting or releasing executors according to the current load of jobs. This is especially useful for long-running applications like Spark shell, Thrift Server, Zeppelin. To Enable Dynamic Resource Allocation spark.streaming.dynamicAllocation.enabled true spark.shuffle.service.enabled true <property> <name>yarn.nodemanager.aux-services.spark_shuffle.class</name> <value>org.apache.spark.network.yarn.YarnShuffleService</value> </property>
  • 15. Resilient to Service Restart/Failure  Resilient to ResourceManager restart/failure - RM is a single point of failure - Configuring yarn.resourcemanager.ha.enabled to enable RM HA - Enable yarn.resourcemanager.recovery.enabled to be resilient to RM change or restart - Non work preserving RM restart (Hadoop 2.4) - Work preserving RM restart (Hadoop 2.6)  Resilient to NodeManager restart - Enable yarn.nodemanager.recovery.enabled to be resilient to NM restart (Hadoop 2.6)
  • 16. Access Kerberized Environment  Spark on YARN supports accessing Kerberized Hadoop environment by Kerberos.  It will automatically get tokens from NN if the Hadoop environment is security enabled.  To retrieve the delegation tokens for non-HDFS services when security is enabled, configure spark.yarn.security.tokens.{hive|hbase}.enabled.  You could also specify --principal and --keytab to let Spark on YARN to do kinit and token renewal automatically, this is useful for long running service like Spark Streaming.
  • 17. Fast Debug YARN Application  What we usually meet when running Spark on YARN? - Classpath problem - Java parameters does not work - Configuration doesn‘t take affect. • yarn.nodemanager.delete.debug-delay-sec • ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}/container_${contid}
  • 18. Future works of Spark On YARN
  • 19. More Advanced Dynamic Resource Allocation  The problem of current algorithm: - Current algorithm is based on the load of tasks, the more tasks submitted the more executor will be requested. This will introduce resource starvation for other applications. - Current algorithm is an universal algorithm doesn't consider the specific features of cluster manager.  To improve current dynamic resource allocation: - Make current algorithm pluggable, to be adapted to customized algorithms. - Incorporate more YARN specific features to improve the current algorithm - Container resizing - Cooperative preemption - …
  • 20. Integrate YARN Application Timeline Server  Timeline Server - Storage and retrieval of applications' current as well as historic information in a generic fashion is solved in YARN through the Timeline Server. This serves two responsibilities: - Generic information about completed applications - Per-framework information of running and completed applications  We’re working on integrating Spark’s history server with YARN application timeline server.  Already released with HDP, continue improving ATS in the yarn side to bring in better performance and stability.
  • 21. Pluggable Security Services  Current Problems: - How to access security services other than HDFS, HBase and Hive - How to support security keys other than Delegation Token  To improve current token management mechanism: - Provide a configurable security credential manager - Make security credential provider pluggable for different services. - SPARK-14743