Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale with Chenzhao Guo and Carson Wang

Spark SQL is a very effective distributed SQL engine for OLAP and is widely adopted in Baidu's production environment for many internal BI projects. However, Baidu has also faced many challenges at large scale, including tuning the shuffle parallelism for thousands of jobs, inefficient execution plans, and handling data skew. In this talk, we will explore Intel and Baidu's joint efforts to address these challenges and offer an overview of an adaptive execution mode we implemented for Baidu's Big SQL platform, which is based on Spark SQL. At runtime, adaptive execution can change the execution plan to use a better join strategy and handle skewed joins automatically. It can also change the number of reducers to better fit the data scale. In general, adaptive execution decreases the effort involved in tuning SQL query parameters and improves execution performance by choosing a better execution plan and parallelism at runtime.

We'll also share our experience of using adaptive execution in Baidu's production cluster of thousands of servers, where adaptive execution helps improve the performance of some complex queries by 200%. Further analysis showed that several specific scenarios in Baidu's data analysis can benefit from the optimization of choosing a better join type. We got a 2x performance improvement in a scenario where the user wanted to analyze the cost of 1000+ advertisers on both the web and mobile sides, where each side has a full information table of 10 TB of Parquet files per day. We are now writing probe jobs to detect more such scenarios among our users' current daily jobs. We are also considering exposing a strategy interface, based on the detailed metrics collected from adaptive execution mode, to upper-layer users.

  1. #SAISEco12 Spark SQL Adaptive Execution Unleashes the Power of Cluster in Large Scale. Carson Wang (Intel), carson.wang@intel.com; Chenzhao Guo (Intel), chenzhao.guo@intel.com
  2. #SAISEco12 About me • Chenzhao Guo • Big Data Engineer at Intel • Contributor to Spark*, OAP and HiBench • GitHub: gczsjdy *Other names and brands may be claimed as the property of others.
  3. #SAISEco12 Agenda • Challenges • Adaptive Execution • Performance
  4. #SAISEco12 Agenda • Challenges • Adaptive Execution • Performance
  5. #SAISEco12 Parallelism on reduce side • Influences performance but is hard to tune • Requires manual effort • Too small -> spill, OOM • Too big -> scheduling overhead, many small I/O requests • A single spark.sql.shuffle.partitions setting doesn't fit all stages within an application
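The mismatch described on this slide can be made concrete with a small back-of-the-envelope sketch (not Spark code; the stage sizes and the 200-partition default used below are illustrative assumptions):

```python
# Why one static spark.sql.shuffle.partitions rarely fits every stage:
# the same setting yields tiny tasks for small stages and huge tasks
# for large ones.

def partition_size_mb(stage_input_mb: float, num_partitions: int) -> float:
    """Average data each reduce task processes for a stage."""
    return stage_input_mb / num_partitions

STATIC_PARTITIONS = 200  # the single, application-wide setting

# Two stages of the same query can differ by orders of magnitude.
small_stage_mb = 100       # e.g. after a selective filter
large_stage_mb = 500_000   # e.g. a large shuffle before a join

print(partition_size_mb(small_stage_mb, STATIC_PARTITIONS))  # 0.5 MB/task: scheduling overhead dominates
print(partition_size_mb(large_stage_mb, STATIC_PARTITIONS))  # 2500.0 MB/task: risk of spill/OOM
```

Whatever value is chosen, one of the two stages ends up badly sized, which is exactly the motivation for deciding parallelism per stage at runtime.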
  6. #SAISEco12 Join Strategy Selection • Spark* chooses the Join implementation (which influences performance) based on data size estimation • BroadcastHashJoin (when one side < broadcastThreshold) • SortMergeJoin • ShuffledHashJoin • Problem: the output data size of an intermediate operator can't be accurately estimated at planning time • The most efficient Join operator may not be selected *Other names and brands may be claimed as the property of others.
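The size-based selection this slide describes can be sketched as follows (a simplified illustration, not Spark source; the 10 MB figure mirrors the default of spark.sql.autoBroadcastJoinThreshold but is an assumption here):

```python
# Sketch of size-based join selection: broadcast the small side when
# it fits under the threshold, otherwise fall back to a shuffle join.

BROADCAST_THRESHOLD = 10 * 1024 * 1024  # 10 MB, illustrative

def choose_join(left_size_bytes: int, right_size_bytes: int) -> str:
    if min(left_size_bytes, right_size_bytes) <= BROADCAST_THRESHOLD:
        return "BroadcastHashJoin"  # broadcast the small side, no shuffle
    return "SortMergeJoin"          # shuffle and sort both sides

# The planner only has *estimates*. If it overestimates an intermediate
# result (say, 1 GB when it is really 5 MB), it plans a SortMergeJoin
# even though a broadcast would have been possible.
print(choose_join(10**12, 1_000_000_000))    # planner's estimate -> SortMergeJoin
print(choose_join(10**12, 5 * 1024 * 1024))  # actual size -> BroadcastHashJoin
```

Adaptive execution closes this gap by re-running the decision once the actual sizes are known.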
  7. #SAISEco12 Data Skew in Join • Some partitions' data >> the others' • Poor load balancing slows down the whole query • Common ways to resolve it, each with limitations: • Increase spark.sql.shuffle.partitions • Increase the BroadcastJoin threshold • Add a prefix to the skewed keys (figure: per-task execution time, with one task far longer than the rest)
  8. #SAISEco12 Spark SQL* Execution Diagram *Other names and brands may be claimed as the property of others.
  9. #SAISEco12 Agenda • Challenges • Adaptive Execution • Performance
  10. #SAISEco12 Core Idea • Transfer control (decisions about the physical plan & parallelism) from users at static planning time to the framework at runtime
  11. #SAISEco12 Adaptive Execution during Planning (diagram: a non-AE physical plan of SortMergeJoin over Sort/Exchange subtrees vs. the AE physical plan, where each Exchange subtree becomes a QueryStage and the join's inputs become QueryStageInputs) • QueryStage: takes over control and makes runtime decisions, such as altering SortMergeJoin -> BroadcastHashJoin • QueryStageInput: after the physical plan is modified, embeds the new RDDs into the affected operators
  12. #SAISEco12 Adaptive Execution during Execution (diagram: executing a QueryStage and its QueryStageInputs) 1. Execute child stages & collect runtime information 2. Alter the physical plan choice 3. Handle skewed Joins 4. Determine the reducer number
  13. #SAISEco12 Adaptively Determine Number of Reducers • Enable the feature • spark.sql.adaptive.enabled -> true • Configurations • Target input size for a reducer • Min/max reducer number
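As a concrete illustration, enabling the feature in spark-defaults.conf might look like the following. Only spark.sql.adaptive.enabled appears on the slide; the target-size key name below is an assumption based on the Intel spark-adaptive fork and should be checked against your build's documentation:

```
spark.sql.adaptive.enabled                             true
# target input size per reducer (64 MB); key name is an assumption,
# as are the min/max reducer-count settings mentioned on the slide
spark.sql.adaptive.shuffle.targetPostShuffleInputSize  67108864
```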
  14. #SAISEco12 Adaptively Determine Number of Reducers • Target input size for a reducer -> 64MB • Min-max reducer number -> 1-5 (diagram: map-side partitions merged on the reduce side) • Better load balancing, reduced I/O requests
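The partition merging sketched on this slide can be illustrated with a few lines of plain Python (a simplified sketch; real adaptive execution also enforces the min/max reducer bounds and works on map-output statistics, not a plain list):

```python
# Pack contiguous post-shuffle partitions into reducers until each
# reducer's input is near the target size (64 MB on the slide).

def merge_partitions(sizes_mb, target_mb=64):
    reducers, current, current_size = [], [], 0
    for i, size in enumerate(sizes_mb):
        if current and current_size + size > target_mb:
            reducers.append(current)       # close the current reducer
            current, current_size = [], 0
        current.append(i)                  # assign partition i
        current_size += size
    if current:
        reducers.append(current)
    return reducers

# Five 20 MB partitions become two reducers instead of five tiny tasks.
print(merge_partitions([20, 20, 20, 20, 20]))  # [[0, 1, 2], [3, 4]]
```

Merging small partitions this way is what yields the better load balancing and fewer I/O requests noted on the slide.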
  15. #SAISEco12 Alter Join Implementation at Runtime (diagram: a SortMergeJoin over QueryStageInputs is altered at runtime into a BroadcastHashJoin with a BroadcastExchange once one side turns out to be only 5 MB) • Reduced shuffle
  16. #SAISEco12 Handle Skewed Join at Runtime • Enable the feature • spark.sql.adaptive.skewedJoin.enabled -> true • Configuration • skew factor F, skew size S, skew row count R • A partition is considered skewed iff • its size is larger than the median partition size * F and also larger than S, or • its row count is larger than the median partition row count * F and also larger than R
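The skew rule above translates directly into code. The following sketch implements it verbatim; the F, S and R values used here are illustrative assumptions, not Spark defaults:

```python
# A partition is skewed iff it exceeds the median by factor F
# AND exceeds an absolute floor (size S or row count R).
from statistics import median

def is_skewed(sizes, rowcounts, idx, F=10, S=1_000_000_000, R=10_000_000):
    med_size = median(sizes)
    med_rows = median(rowcounts)
    size_skew = sizes[idx] > med_size * F and sizes[idx] > S
    row_skew = rowcounts[idx] > med_rows * F and rowcounts[idx] > R
    return size_skew or row_skew

sizes = [100, 120, 90, 110, 5_000_000_000]      # bytes per partition
rows  = [1_000, 1_200, 900, 1_100, 50_000_000]  # rows per partition
print([is_skewed(sizes, rows, i) for i in range(5)])
# only partition 4 is > F times the median AND above the absolute floors
```

The absolute floors S and R keep small queries from being flagged: a partition ten times the median is harmless if the median itself is tiny.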
  17. #SAISEco12 Handle Skewed Join at Runtime (diagram: skewed partition 0 of the left table is split by mapper; each per-mapper split is joined with partition 0 of the right table, and the results are unioned) • Better load balancing
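The runtime translation in the diagram can be sketched with plain Python lists standing in for partitions (a conceptual illustration only; real adaptive execution does this with shuffle reads over map outputs):

```python
# One huge skewed task is replaced by one smaller task per mapper;
# each split joins with the full matching right partition, and the
# results are unioned, as in the diagram.

def join(left_rows, right_rows):
    """Naive equi-join on the first element of each row."""
    return [(l, r) for l in left_rows for r in right_rows if l[0] == r[0]]

def skewed_join(left_splits_by_mapper, right_partition):
    out = []
    for split in left_splits_by_mapper:  # N smaller tasks instead of 1 huge one
        out.extend(join(split, right_partition))
    return out                           # the Union in the diagram

left_splits = [[("a", 1)], [("a", 2)], [("b", 3)]]  # one split per mapper
right = [("a", "x"), ("b", "y")]
print(skewed_join(left_splits, right))
# [(('a', 1), ('a', 'x')), (('a', 2), ('a', 'x')), (('b', 3), ('b', 'y'))]
```

Because an equi-join of partition 0 needs every left row paired against every matching right row, splitting only the left side and unioning the results produces exactly the same output while spreading the work across tasks.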
  18. #SAISEco12 Agenda • Challenges • Adaptive Execution • Performance
  19. #SAISEco12 Performance in Baidu* • Performance boost: 50% ~ 200% • Certain BI scenarios hitting the SMJ -> BHJ rule • Long-running applications using Spark as a service • GraphFrames *Other names and brands may be claimed as the property of others.
  20. #SAISEco12 Performance in Alibaba* Cloud • TPC-DS* 1TB • 1 master (32 cores, 64GB RAM) + 6 slaves (d1, 32 cores, 64GB RAM) • Overall performance boost: 1.38x; max performance boost: 3x • Adaptive Execution is already incorporated in their cloud service product *Other names and brands may be claimed as the property of others.
  21. #SAISEco12 Summary • Adaptively determine the reducer number • Alter Join strategy selection at runtime • Handle skewed Joins at runtime • More potential opportunities, since Adaptive Execution provides a runtime optimization framework • https://github.com/Intel-bigdata/spark-adaptive
  22. #SAISEco12 Thank you!
  23. #SAISEco12 Legal Disclaimer No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright ©2018 Intel Corporation.
