Best Practices for Enabling Speculative Execution on Large Scale Platforms


Apache Spark provides a ‘speculative execution’ feature to handle tasks in a stage that are slow due to environment issues such as a slow network or slow disks. If one task is running slowly in a stage, the Spark driver can launch a speculation task for it on a different host. Between the regular task and its speculation task, Spark takes the result from whichever task completes successfully first and kills the slower one.



When we first enabled the speculation feature by default for all Spark applications on a large cluster of 10K+ nodes at LinkedIn, we observed that the default values of Spark’s speculation configuration parameters did not work well for LinkedIn’s batch jobs. For example, the system launched too many fruitless speculation tasks (i.e., tasks that were later killed). Moreover, the speculation tasks did not help shorten shuffle stages. To reduce the number of fruitless speculation tasks, we identified the root causes, enhanced the Spark engine, and tuned the speculation parameters carefully. We analyzed the number of speculation tasks launched, the numbers of fruitful versus fruitless speculation tasks, and their corresponding CPU and memory resource consumption in gigabyte-hours. We were able to reduce average job response times by 13%, decrease the standard deviation of job elapsed times by 40%, and lower total resource consumption by 24% in a heavily utilized multi-tenant environment on a large cluster. In this talk, we share our experience enabling speculative execution to achieve a good reduction in job elapsed times while keeping overhead minimal.


  1. Best Practices for Enabling Speculative Execution on Large Scale Platforms (Ron Hu, Venkata Sowrirajan, LinkedIn)
  2. Agenda
     ▪ Motivation
     ▪ Enhancements
     ▪ Configuration
     ▪ Metrics and analysis
     ▪ User guidance
     ▪ Future work
  3. Speculative Execution
     • A stage consists of many parallel tasks, and the stage finishes only as fast as its slowest task.
     • If one task is running very slowly in a stage, the Spark driver re-launches a speculation task for it on a different host.
     • Between the regular task and the speculation task, whichever finishes first is used; the slower task is killed.
     (Diagram: timeline showing a speculative task being launched; the slower attempt is killed ❌ and the faster result is used ✅.)
  4. Default Speculation Parameters
     • spark.speculation (default: false): if set to "true", performs speculative execution of tasks.
     • spark.speculation.interval (default: 100ms): how often Spark will check for tasks to speculate.
     • spark.speculation.multiplier (default: 1.5): how many times slower than the median a task must be to be considered for speculation.
     • spark.speculation.quantile (default: 0.75): fraction of tasks which must be complete before speculation is enabled for a particular stage.
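The quantile/multiplier rule above can be sketched in plain Python. This is a simplified model of the scheduler's periodic speculation check, not Spark's actual code; the function name and event shapes are illustrative:

```python
def should_speculate(finished, running_elapsed, total_tasks,
                     quantile=0.75, multiplier=1.5, min_runtime=0.0):
    """Return the set of running-task ids eligible for speculation.

    finished:        list of completed task durations (seconds)
    running_elapsed: dict of task id -> elapsed seconds so far
    """
    # Speculation is only considered once `quantile` of the stage's tasks have finished.
    if len(finished) < quantile * total_tasks:
        return set()
    durations = sorted(finished)
    median = durations[len(durations) // 2]
    # A running task is a straggler if it exceeds multiplier x median
    # (and, with the min-runtime enhancement, a floor on task runtime).
    threshold = max(multiplier * median, min_runtime)
    return {tid for tid, elapsed in running_elapsed.items() if elapsed > threshold}
```

With 8 of 10 tasks finished and a median duration of 10s, the default multiplier of 1.5 flags a task that has been running for 16s but not one at 12s; raising `min_runtime` to 30s suppresses both.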
  5. Motivation
     • Speculation speeds up straggler tasks, at the cost of additional overhead.
     • The default configs are too aggressive in most cases.
     • Speculating tasks that run for only a few seconds is mostly wasteful.
     • Investigate the impact of data skew, overloaded shuffle services, etc., with speculation enabled.
     • What is the impact of enabling speculation by default in a large multi-tenant cluster?
  6. Speculative Execution Improvements
     • Tasks that run for only a few seconds get speculated, wasting resources unnecessarily.
     • Solution: prevent tasks that run for less than a minimum threshold from being speculated.
     • Internally, we introduced a new Spark configuration (spark.speculation.minRuntimeThreshold) that prevents speculation of tasks running for less than the threshold time.
     • A similar feature was later added to Apache Spark in SPARK-33741.
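For comparison, upstream Spark later gained a similar knob via SPARK-33741 (believed to be named spark.speculation.minTaskRuntime in Spark 3.2+; verify the name against your release). A hedged sketch of enabling it, where your_job.py is a placeholder:

```
# Sketch only; config names should be verified against your Spark version.
spark-submit \
  --conf spark.speculation=true \
  --conf spark.speculation.minTaskRuntime=30s \
  your_job.py
```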
  7. Speculative Execution Metrics
     • Additional metrics are required to understand both the usefulness of, and the overhead introduced by, speculative execution.
     • The existing onTaskEnd and onTaskStart events in AppStatusListener are enriched to produce speculation summary metrics for a stage.
  8. Speculative Execution Metrics (continued)
     • Added per-stage metrics for speculative execution:
       • Number of speculated tasks
       • Number of successful speculated tasks
       • Number of killed speculated tasks
       • Number of failed speculated tasks
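The per-stage summary above can be modeled as a small aggregator over task-end events. This is a plain-Python sketch, not the actual AppStatusListener code; the event field names are illustrative:

```python
from collections import Counter

def speculation_summary(task_end_events):
    """Aggregate task-end events into a per-stage speculation summary.

    Each event is a dict with keys: stage_id, speculative (bool),
    status ("SUCCESS", "KILLED", or "FAILED").
    Returns {stage_id: Counter(numTasks, numSuccessful, numKilled, numFailed)}.
    """
    summary = {}
    for ev in task_end_events:
        if not ev["speculative"]:
            continue  # only speculative task attempts are counted
        c = summary.setdefault(ev["stage_id"], Counter())
        c["numTasks"] += 1
        if ev["status"] == "SUCCESS":
            c["numSuccessful"] += 1
        elif ev["status"] == "KILLED":
            c["numKilled"] += 1
        else:
            c["numFailed"] += 1
    return summary
```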
  9. Speculative Execution Metrics (continued)
     • Speculation summary for a stage, with the additional metrics, built from the existing events. (UI screenshot omitted.)
  10. Updated Speculation Parameter Values
     • Upstream Spark’s default speculation parameter values did not work well for us.
     • LinkedIn’s Spark workload is mainly off-line batch jobs plus some interactive analytics.
     • We set the speculation parameters to the default values below for most LinkedIn applications. Users can still override them per their individual needs.
       • spark.speculation: upstream default false; LinkedIn default true
       • spark.speculation.interval: upstream default 100ms; LinkedIn default 1 sec
       • spark.speculation.multiplier: upstream default 1.5; LinkedIn default 4.0
       • spark.speculation.quantile: upstream default 0.75; LinkedIn default 0.90
       • spark.speculation.min.threshold: no upstream equivalent (N/A); LinkedIn default 30 sec
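Cluster-wide defaults like these would typically live in spark-defaults.conf. A sketch of the values above; note that spark.speculation.min.threshold is LinkedIn-internal and is not recognized by stock Apache Spark:

```
# spark-defaults.conf (sketch of LinkedIn's cluster-wide defaults)
spark.speculation                true
spark.speculation.interval      1s
spark.speculation.multiplier    4.0
spark.speculation.quantile      0.90
# LinkedIn-internal parameter; upstream Spark's analogous knob
# (added in SPARK-33741) has a different name.
spark.speculation.min.threshold 30s
```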
  11. Metrics and Analysis
     • We care about ROI (return on investment). We analyzed both the return (performance gain) and the investment (overhead, or additional cost).
     • We measured various metrics for one week on a large cluster with 10K+ machines: a multi-tenant environment with 40K+ Spark applications running daily, with dynamic allocation enabled.
     • With resource sharing and contention, performance varies due to transient delays and congestion.
  12. Task Level Statistics
     • Speculated tasks: 2.73M speculation tasks launched in total.
     • Fruitful tasks: 1.65M speculation tasks were fruitful, a 60% success rate. (A speculation task is fruitful if it finishes earlier than the corresponding regular task.)
     • Additional tasks: launched speculation tasks were 1.24% of all tasks.
     • Duration delta: the total duration of all speculation tasks was 0.32% of the total duration of all regular tasks.
     • The conservative values in the config parameters lead to the high success rate.
  13. Stage Level Statistics
     • Total eligible stages: 447K. (A stage is eligible for speculation if its duration is over 30 seconds and it has at least 10 tasks.)
     • Stages with speculation tasks: 184K, i.e., 41% of eligible stages launched speculation tasks.
     • Stages with fruitful speculation tasks: 140K, i.e., 76% of the stages that launched speculation tasks received performance benefits.
  14. Application Level Statistics
     • Total applications: 157K.
     • Applications with speculation tasks: 59K, i.e., 38% of all Spark applications launched speculation tasks.
     • Applications with fruitful speculation tasks: 51K, i.e., 87% of the applications that speculated benefited; overall, 32% of all Spark applications benefited from speculative execution.
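The funnel percentages on the stage-level and application-level slides follow directly from the raw counts:

```python
def pct(part, whole):
    """Percentage of part in whole, rounded to the nearest integer."""
    return round(100 * part / whole)

# Stage level: 447K eligible, 184K speculated, 140K fruitful.
assert pct(184, 447) == 41   # eligible stages that launched speculation tasks
assert pct(140, 184) == 76   # of those, stages with fruitful speculation
# Application level: 157K total, 59K speculated, 51K fruitful.
assert pct(59, 157) == 38    # applications that launched speculation tasks
assert pct(51, 59) == 86     # the slide reports 87%, likely from unrounded counts
assert pct(51, 157) == 32    # applications benefiting overall
```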
  15. Case Study
     • We analyzed the impact on a mission-critical application with a total of 29 Spark application flows.
     • Some flows run daily; some run hourly. Each flow has a well-defined SLA.
     • We measured all the flows for two weeks before enabling speculation and two weeks after enabling speculation.
  16. Case Study Results (numbers in minutes)
     • Geometric mean of the average elapsed times of all flows: 7.44 before enabling, 6.47 after; an after/before ratio of 87% (a 13% decrease).
     • Geometric mean of the standard deviation of elapsed times of all flows: 2.91 before enabling, 1.71 after; an after/before ratio of 59% (a 41% decrease).
  17. Resource Consumption Impact
     • Total resource consumption decreased by 24%. (Chart omitted.)
  18. User Guidance: Where Speculation Can Help
     • A mapper task is slow because its executor is too busy, or the system hangs due to hardware/software issues.
       • We used to see ‘run-away’ tasks sometimes, due to system hang issues. After enabling speculation, we rarely see them: the run-away tasks are killed because their corresponding speculation tasks finish earlier.
     • The network route is congested somewhere.
     • There exists another data copy.
       • The regular task normally reads the ‘NODE_LOCAL’/‘RACK_LOCAL’ copy; the speculation task usually reads the ‘ANY’ copy.
       • If the initial task was launched with suboptimal locality, its speculation task can have better locality.
  19. User Guidance: Where Speculation Cannot Help
     • Data skew.
     • Overloaded shuffle services causing reducer task delays.
     • Insufficient memory causing tasks to spill.
     • The Spark driver does not know the root cause of a task’s slowness when it launches a speculation task.
  20. Summary
     • At LinkedIn, we enhanced the Spark engine to monitor speculation statistics.
     • We shared our configuration settings for effectively managing speculative execution.
     • Depending on your performance goals, you need to decide how much overhead you can tolerate.
     • ROI if the speculation parameters are properly set:
       • Investment: a small increase in network messages, and a small overhead in the Spark driver.
       • Return: good savings in executor resources, a good reduction in job elapsed times, and a significant reduction in the variation of elapsed times, leading to more predictable and consistent performance.
  21. Future Work
     • Add intelligence to the Spark driver to decide whether or not to launch speculation tasks.
     • Distinguish between manageable and unmanageable causes.
     • On the cloud we may have effectively unlimited resources, but we need to factor in the monetary cost: what is the cost of launching additional executors?
  22. Acknowledgement
     We want to thank
     ▪ Eric Baldeschweiler
     ▪ Sunitha Beeram
     ▪ the LinkedIn Spark team
     for their enlightening discussions and insightful comments.
  23. Feedback
     Your feedback is important to us. Don’t forget to rate and review the sessions.
