by Anya Bida and Rachel Warren from Alpine Data
https://spark-summit.org/east-2016/events/spark-tuning-for-enterprise-system-administrators/
Spark offers the promise of speed, but many enterprises are reluctant to make the leap from Hadoop to Spark. Indeed, system administrators face many challenges in tuning Spark performance. This talk is a gentle introduction to Spark tuning for the enterprise system administrator, based on experience assisting two enterprise companies running Spark in yarn-cluster mode. The initial challenges can be categorized in two FAQs. First, with so many Spark tuning parameters, how do I know which parameters are important for which jobs? Second, once I know which Spark tuning parameters I need, how do I enforce them for the various users submitting various jobs to my cluster? This introduction will enable enterprise system administrators to overcome common issues quickly and focus on more advanced Spark tuning challenges. The audience will understand the "cheat-sheet" posted here: http://techsuppdiva.github.io/

Key takeaways:

FAQ 1: With so many Spark tuning parameters, how do I know which parameters are important for which jobs?
Solution 1: The Spark Tuning cheat-sheet! A visualization that guides the system administrator to quickly overcome the most common hurdles to algorithm deployment. http://techsuppdiva.github.io/

FAQ 2: Once I know which Spark tuning parameters I need, how do I enforce them at the user level? Job level? Algorithm level? Project level? Cluster level?
Solution 2: We'll approach these challenges using job and cluster configuration, the Spark context, and third-party tools, of which Alpine will be one example. We'll operationalize Spark parameters according to user, job, algorithm, workflow pipeline, or cluster levels.
3. About Anya / About Rachel
Anya Bida: Operations Engineer
Rachel Warren: Spark & Scala Enthusiast / Data Engineer
About Alpine Data
alpinenow.com
Alpine deploys Spark in production for our enterprise customers.
6. Default != Recommended
Example: By default, spark.executor.memory = 1g.
1g allows small jobs to finish out of the box; Spark assumes you'll increase this parameter.
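For instance, a minimal Scala sketch of raising that default in application code; the 4g value is an illustrative assumption, not a recommendation from the talk:

import org.apache.spark.{SparkConf, SparkContext}

// spark.executor.memory defaults to 1g; most production jobs need more.
// 4g is an assumed example value -- size it to your workload and nodes.
val conf = new SparkConf()
  .setAppName("mySparkApp")
  .set("spark.executor.memory", "4g")
val sc = new SparkContext(conf)

The same setting can also be passed at submit time, e.g. spark-submit --conf spark.executor.memory=4g.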
7. Which parameters are important? How do I configure them?
Default != Recommended
8. The Spark Tuning Cheat-Sheet
[Cheat-sheet diagram; the recoverable tips:]
• Filter* data before an expensive reduce or aggregation; consider* coalesce().
• Use* data structures that require less memory.
• Serialize*: in PySpark, serializing is built-in. Scala/Java? persist(StorageLevel.*_SER); recommended: KryoSerializer (tuning.html#tuning-data-structures).
* Footnotes point to the sections "Optimize partitions.", "GC investigation.", and "Checkpointing."
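A hedged Scala sketch of the serialization tip above, assuming Spark 1.x-era APIs and a hypothetical input path:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Register Kryo, then persist an RDD in serialized form (one of the *_SER levels).
val conf = new SparkConf()
  .setAppName("serializedCache")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

val data = sc.textFile("hdfs:///some/input")  // hypothetical path
data.persist(StorageLevel.MEMORY_ONLY_SER)    // serialized: less memory, more CPU
println(data.count())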
16. What is the memory limit for mySparkApp?
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
Reserve 25% for overhead.
Limitation: the pool allocation, e.g. YARN fair scheduler: <maxResources>8000 mb</maxResources>
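A worked example using the slide's own pool size: with <maxResources>8000 mb</maxResources>,

mySparkApp_mem_limit = 8000 MB x 3/4 = 6000 MB

leaving 2000 MB (25%) of the pool reserved for overhead.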
18. What is the memory limit for mySparkApp?
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
mySparkApp_mem_limit = driver.memory + (executor.memory x dynamicAllocation.maxExecutors)
19. What is the memory limit for mySparkApp?
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
mySparkApp_mem_limit = driver.memory + (executor.memory x dynamicAllocation.maxExecutors)
Limitation: each driver and executor must not be larger than a single node:
executor.memory ~ (yarn.nodemanager.resource.memory-mb - 1 GB) / (# executors per node)
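An illustrative sizing (numbers assumed, not from the slides): on nodes with yarn.nodemanager.resource.memory-mb = 32768 (32 GB) running 4 executors per node,

executor.memory ~ (32768 MB - 1024 MB) / 4 = 7936 MB

so roughly 7g per executor once you round down.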
20. What is the memory limit for mySparkApp?
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
mySparkApp_mem_limit = driver.memory + (executor.memory x dynamicAllocation.maxExecutors)
Limitation: maxExecutors should not exceed the pool allocation, e.g. YARN: <maxResources>8vcores</maxResources>
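Putting the formulas together with assumed values: given the 6000 MB limit derived from the 8000 mb pool above, driver.memory = 1g and executor.memory = 1g allow

dynamicAllocation.maxExecutors <= (6000 MB - 1024 MB) / 1024 MB ~ 4

so a maxExecutors above 4 would push mySparkApp past its pool share.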
21. I want a little more information...
• Top 5 Mistakes When Writing Spark Applications, by Mark Grover and Ted Malaska of Cloudera
  http://www.slideshare.net/hadooparchbook/top-5-mistakes-when-writing-spark-applications
• How-to: Tune Your Apache Spark Jobs (Part 2), by Sandy Ryza of Cloudera
  http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
I want lots more... (see "Further Reading" below)
30. Container is lost.
Symptoms:
• mySparkApp has been running for several hours
• I notice one container fails, then the rest fail one by one
• The first container to fail was the driver
• The driver is a SPOF (single point of failure)
31. Investigate:
• Driver failures are often caused by collecting unbounded data to the driver.
• I verified only bounded data is brought to the driver, but still the driver fails intermittently.
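A hedged illustration of that first bullet (not code from the talk): collect() pulls an entire RDD into driver memory, while take(n) bounds what the driver must hold.

// Assumes an existing SparkContext `sc`; the input path is hypothetical.
val events = sc.textFile("hdfs:///logs/events")

// Risky: materializes the whole RDD on the driver; unbounded input can OOM it.
// val everything = events.collect()

// Safer: bring back only a bounded sample for inspection.
val sample = events.take(100)
sample.foreach(println)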
32. Potential Solution: RDD.checkpoint()
Use in these cases:
• high-traffic cluster
• network blips
• preemption
• disk space nearly full
Function:
• saves the RDD to stable storage (e.g. HDFS or S3)
How-to:
SparkContext.setCheckpointDir(directory: String)
RDD.checkpoint()
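A minimal runnable sketch of that how-to, assuming a hypothetical checkpoint directory on HDFS:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("checkpointDemo"))

// Checkpoints must live on stable storage visible to all nodes, e.g. HDFS.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")  // hypothetical path

val rdd = sc.parallelize(1 to 1000000).map(_ * 2)
rdd.checkpoint()  // marks the RDD; the data is written on the next action
rdd.count()       // action triggers the checkpoint and truncates the lineage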
36. Further Reading:
• Learning Spark, by H. Karau, A. Konwinski, P. Wendell, M. Zaharia, 2015, O'Reilly
  https://databricks.com/blog/2015/02/09/learning-spark-book-available-from-oreilly.html
• Scheduling:
  https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
• Tuning the Spark conf:
  Mark Grover and Ted Malaska (Cloudera): http://www.slideshare.net/hadooparchbook/top-5-mistakes-when-writing-spark-applications
  Sandy Ryza (Cloudera): http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
• Checkpointing:
  http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing
• Troubleshooting:
  Miklos Christine (Databricks): https://spark-summit.org/east-2016/events/operational-tips-for-deploying-spark/
• High Performance Spark, by R. Warren and H. Karau, coming in 2016, O'Reilly
  http://highperformancespark.com/