Apache Spark provides a ‘speculative execution’ feature to handle tasks that run slowly in a stage due to environmental issues such as a slow network or slow disks. If one task is running slowly in a stage, the Spark driver can launch a speculation task for it on a different host. Between the regular task and its speculation task, Spark will take the result from whichever completes successfully first and kill the slower one.
When we first enabled the speculation feature by default for all Spark applications on a large cluster of 10K+ nodes at LinkedIn, we observed that the default values of Spark’s speculation configuration parameters did not work well for LinkedIn’s batch jobs. For example, the system launched too many fruitless speculation tasks (i.e., tasks that were later killed). Moreover, the speculation tasks did not help shorten shuffle stages. To reduce the number of fruitless speculation tasks, we investigated the root causes, enhanced the Spark engine, and tuned the speculation parameters carefully. We analyzed the number of speculation tasks launched, the numbers of fruitful versus fruitless speculation tasks, and their corresponding CPU and memory resource consumption in GB-hours. We were able to reduce average job response times by 13%, decrease the standard deviation of job elapsed times by 40%, and lower total resource consumption by 24% in a heavily utilized, multi-tenant environment on a large cluster. In this talk, we share our experience enabling speculative execution to achieve a good reduction in job elapsed times while keeping the overhead minimal.
3. Speculative Execution
• A stage consists of many parallel tasks, and the stage can only run as fast as its slowest task.
• If one task is running very slowly in a stage, the Spark driver will launch a speculation task for it on a different host.
• Between the regular task and the speculation task, whichever finishes first is used; the slower task is killed.
[Diagram: timeline showing a speculative task launched for a straggling task; the copy that finishes first succeeds (✅) and the slower one is killed (❌).]
4. Default Speculation Parameters
Configuration Parameter      | Default Value | Meaning
spark.speculation            | false         | If set to "true", performs speculative execution of tasks.
spark.speculation.interval   | 100ms         | How often Spark checks for tasks to speculate.
spark.speculation.multiplier | 1.5           | How many times slower than the median a task must be to be considered for speculation.
spark.speculation.quantile   | 0.75          | Fraction of tasks which must be complete before speculation is enabled for a particular stage.
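The following is a minimal Scala sketch of how these parameters interact in the driver's periodic check, loosely modeled on the logic in Spark's TaskSetManager; the function name and its inputs are simplified illustrations, not Spark's actual API.

```scala
// Illustrative sketch only: how quantile, multiplier, and the periodic check
// combine to pick speculation candidates. All names here are hypothetical.
def speculatableTasks(
    numTasks: Int,
    succeededDurationsMs: Seq[Long],   // durations of tasks that finished
    runningTasks: Map[Long, Long],     // taskId -> elapsed runtime so far (ms)
    quantile: Double = 0.75,
    multiplier: Double = 1.5): Set[Long] = {
  // Speculation for a stage kicks in only after `quantile` of its tasks finish.
  if (succeededDurationsMs.size < math.ceil(numTasks * quantile)) return Set.empty
  val sorted = succeededDurationsMs.sorted
  val median = sorted(sorted.size / 2)
  // A running task `multiplier` times slower than the median is a straggler.
  val threshold = median * multiplier
  // The driver re-runs this check every `spark.speculation.interval`; tasks
  // over the threshold become candidates for a copy on another host.
  runningTasks.collect { case (taskId, elapsed) if elapsed > threshold => taskId }.toSet
}
```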
5. Motivation
• Speeds up straggler tasks, at the cost of additional overhead.
• The default configs are too aggressive in most cases.
• Speculating tasks that run for only a few seconds is mostly wasteful.
• Investigate the impact of data skew, overloaded shuffle services, etc., with speculation enabled.
• What is the impact if we enable speculation by default in a large, multi-tenant cluster?
6. Speculative Execution improvements
• Tasks that run for only a few seconds get speculated, wasting resources unnecessarily.
• Solution: prevent tasks that run for only a few seconds from being speculated.
• Internally, we introduced a new Spark configuration (spark.speculation.minRuntimeThreshold) that prevents tasks from being speculated if they have run for less than the minimum threshold time (see the sketch below).
• A similar feature was later added to Apache Spark in SPARK-33741.
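As a rough illustration of the guard (not the actual patch), the earlier speculation check simply gains a floor on its threshold; the config name in the comment is the LinkedIn-internal one from this slide.

```scala
// Illustrative only: a min-runtime floor on the speculation threshold,
// in the spirit of spark.speculation.minRuntimeThreshold / SPARK-33741.
// `elapsedMs` and `medianMs` are hypothetical inputs, as in the earlier sketch.
def shouldSpeculate(
    elapsedMs: Long,
    medianMs: Long,
    multiplier: Double,
    minRuntimeThresholdMs: Long): Boolean = {
  // A running task is speculated only if it is both `multiplier` times slower
  // than the median AND has already run longer than the minimum threshold,
  // so tasks that run for only a few seconds are never speculated.
  val threshold = math.max(medianMs * multiplier, minRuntimeThresholdMs.toDouble)
  elapsedMs > threshold
}
```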
7. Speculative execution metrics
• Additional metrics are required to understand both the usefulness of, and the overhead introduced by, speculative execution.
• The existing onTaskEnd and onTaskStart events in AppStatusListener are enriched to produce speculation summary metrics for a stage.
8. Speculative execution metrics
• Added additional metrics for a stage with speculative execution (see the sketch below), such as:
  • Number of speculated tasks
  • Number of successful speculated tasks
  • Number of killed speculated tasks
  • Number of failed speculated tasks
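A minimal, standalone sketch of how such counters can be derived from task-end events; the class name is hypothetical, and LinkedIn's production version enriches AppStatusListener rather than adding a separate listener.

```scala
import java.util.concurrent.atomic.LongAdder

import org.apache.spark.{Success, TaskKilled}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical listener that tallies speculative-task outcomes per the
// metrics listed above, using the public TaskInfo.speculative flag.
class SpeculationMetricsListener extends SparkListener {
  val launched  = new LongAdder
  val succeeded = new LongAdder
  val killed    = new LongAdder
  val failed    = new LongAdder

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    if (taskEnd.taskInfo.speculative) {      // only count speculation tasks
      launched.increment()
      taskEnd.reason match {
        case Success       => succeeded.increment() // fruitful: beat the original
        case _: TaskKilled => killed.increment()    // original finished first
        case _             => failed.increment()
      }
    }
  }
}

// Registered with: spark.sparkContext.addSparkListener(new SpeculationMetricsListener)
```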
10. Updated Speculation Parameter Values
• Upstream Spark’s default speculation parameter values did not work well for us.
• LinkedIn’s Spark jobs are mainly off-line batch jobs, plus some interactive analytics workloads.
• We set the speculation parameters to the following default values for most LinkedIn applications. Users can still override them per their individual needs.
Configuration Parameter         | Upstream Default | LinkedIn Default
spark.speculation               | false            | true
spark.speculation.interval      | 100ms            | 1 sec
spark.speculation.multiplier    | 1.5              | 4.0
spark.speculation.quantile      | 0.75             | 0.90
spark.speculation.min.threshold | N/A              | 30 sec
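For reference, a minimal sketch of applying these values programmatically, assuming they are not already set cluster-wide via spark-defaults.conf. Note that spark.speculation.min.threshold is the LinkedIn-internal parameter from the table above, not an upstream Spark configuration.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of a job configured with the LinkedIn defaults from the table.
// In practice these live in spark-defaults.conf so individual users can
// still override them per job.
val spark = SparkSession.builder()
  .appName("speculation-tuned-job")
  .config("spark.speculation", "true")
  .config("spark.speculation.interval", "1s")
  .config("spark.speculation.multiplier", "4.0")
  .config("spark.speculation.quantile", "0.90")
  .config("spark.speculation.min.threshold", "30s") // LinkedIn-internal knob
  .getOrCreate()
```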
11. Metrics and Analysis
• We care about ROI (Return On Investment).
• We analyzed:
  • the return, i.e., the performance gain, and
  • the investment, i.e., the additional overhead/cost.
• We measured various metrics for one week on a large cluster with 10K+ machines:
  • a multi-tenant environment with 40K+ Spark applications running daily;
  • dynamic allocation enabled;
  • with resource sharing and contention, performance varies due to transient delays and congestion.
12. Task Level Statistics
• Speculated tasks: 2.73M (total number of launched speculation tasks)
• Fruitful tasks: 1.65M (total number of fruitful speculation tasks)
• Additional tasks: 1.24% (ratio of all launched speculation tasks over all tasks)
• Success rate: 60% (success rate of the speculated tasks)
• Duration delta: 0.32% (ratio of the total duration of all speculation tasks over the total duration of all regular tasks)
• A speculation task is fruitful if it finishes earlier than the corresponding regular task.
• The conservative values of the config parameters lead to a high success rate.
13. Stage Level Statistics
• Total eligible stages: 447K
• Stages with speculation tasks: 184K
• Stages with fruitful speculation tasks: 140K
• A stage is eligible for speculation if its duration is > 30 seconds with at least 10 tasks.
• 41% of the eligible stages (184K of 447K) launched speculation tasks.
• Among the stages that launched speculation tasks, 76% (140K of 184K) received performance benefits.
14. Application Level Statistics
• Total applications: 157K
• Applications with speculation tasks: 59K
• Applications with fruitful speculation tasks: 51K
• 38% of all Spark applications (59K of 157K) launched speculation tasks.
• 87% of those (51K of 59K) benefited from speculative execution.
• Overall, 32% of all Spark applications (51K of 157K) benefited from speculative execution.
15. Case Study
• We analyzed the impact on a mission-critical application.
• It has a total of 29 Spark application flows.
• Some flows run daily; some run hourly.
• Each flow has a well-defined SLA.
• We took measurements of all the flows for:
  • two weeks before enabling speculation, and
  • two weeks after enabling speculation.
16. Case Study Results

Numbers in Minutes                                                  | BEFORE enabling | AFTER enabling | After/Before Ratio
Geometric mean of average elapsed times of all flows                | 7.44            | 6.47           | 87% (decreased by 13%)
Geometric mean of standard deviation of elapsed times for all flows | 2.91            | 1.71           | 59% (decreased by 41%)
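As a side note, a small sketch of the aggregate used in this table: the geometric mean damps the influence of a few unusually long flows relative to an arithmetic mean. The sample values below are hypothetical.

```scala
// Geometric mean: exp of the average of the logs.
def geometricMean(xs: Seq[Double]): Double =
  math.exp(xs.map(math.log).sum / xs.size)

// Hypothetical per-flow average elapsed times, one entry per flow (minutes).
val avgElapsedMinutesPerFlow = Seq(5.2, 11.8, 7.1 /* ... 29 flows in total */)
val gm = geometricMean(avgElapsedMinutesPerFlow)
```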
18. User Guidance: Where speculation can help
• A mapper task is slow because the running executor is too busy and/or the system hangs due to hardware/software issues.
  • We used to occasionally see ‘run-away’ tasks due to system hang issues.
  • After enabling speculation, we rarely see ‘run-away’ tasks.
  • The ‘run-away’ tasks were killed because their corresponding speculation tasks finished earlier.
• The network route is congested somewhere.
• There exists another copy of the data.
  • The regular task normally reaches the ‘NODE_LOCAL’/‘RACK_LOCAL’ copy, while the speculation task usually reaches the ‘ANY’ copy.
  • If the initial task was launched with suboptimal locality, its speculative task can have better locality.
19. User Guidance: Where speculation cannot help
• Data skew.
• Overloaded shuffle services causing reducer task delays.
• Insufficient memory causing tasks to spill.
• The Spark driver does not know the root cause of a task’s slowness when it launches a speculation task.
20. Summary
• At LinkedIn, we further enhanced the Spark engine to monitor speculation statistics.
• We shared our configuration settings for effectively managing speculative execution.
• Depending on your performance goals, you need to decide how much overhead you can tolerate.
• ROI if the speculation parameters are properly set:
  • Investment: a small increase in network messages.
  • Investment: a small overhead in the Spark driver.
  • Return: good savings in executor resources.
  • Return: a good reduction in job elapsed times.
  • Return: a significant reduction in the variation of elapsed times, leading to more predictable and consistent performance.
21. Future Work
• Add intelligence to the Spark driver to decide whether or not to launch speculation tasks.
  • Distinguish between manageable and unmanageable causes.
• On the cloud, we may have virtually unlimited resources; however, we may need to factor in the monetary cost.
  • What is the cost of launching additional executors?
22. Acknowledgement
We want to thank
▪ Eric Baldeschweiler
▪ Sunitha Beeram
▪ LinkedIn Spark Team
for their enlightening discussions and insightful comments.