Bucketing 2.0: Improve Spark SQL Performance by Removing Shuffle

Bucketing is commonly used in Hive and Spark SQL to improve performance by eliminating Shuffle in Join or group-by-aggregate scenarios. This is ideal for a variety of write-once and read-many datasets at ByteDance.

  1. Bucketing 2.0: Improve Spark SQL Performance by Removing Shuffle ▪ Guo, Jun (jason.guo.vip@gmail.com), Lead of Data Engine Team, ByteDance
  2. Who we are ▪ Data Engine team of ByteDance ▪ Build a one-stop OLAP platform on which users can analyze petabyte-scale data by writing SQL, without caring about the underlying execution engine
  3. What we do ▪ Manage Spark SQL / Presto / Hive workloads ▪ Offer an open API and a self-serve platform ▪ Optimize the Spark SQL / Presto / Hive engines ▪ Design the data architecture for most business lines at ByteDance
  4. Agenda ▪ Spark SQL at ByteDance ▪ What is Bucketing ▪ Spark Bucketing Limitations ▪ Bucketing Optimizations at ByteDance
  5. Spark SQL at ByteDance
  6. Spark SQL at ByteDance ▪ Timeline, 2016 to 2020: small-scale experiments → ad-hoc workload → a few ETL pipelines in production → full-production deployment → main engine in the data warehouse (DW) area
  7. What is Bucketing
  8. What is bucketing ▪ Create a bucketed table ▪ Spark SQL syntax: CREATE TABLE `order` ( order_id bigint, user_id bigint, product bigint, amount bigint ) using parquet clustered by (user_id) sorted by (user_id) into 1024 buckets location '/user/warehouse/test.db/order' ▪ Hive syntax: CREATE TABLE `order` ( order_id bigint, user_id bigint, product bigint, amount bigint ) clustered by (user_id) sorted by (user_id) into 1024 buckets stored as parquet location '/user/warehouse/test.db/order'
  9. What is bucketing ▪ Insert into a bucketed table: INSERT INTO `order` SELECT order_id, user_id, product, amount FROM order_staging
  10. What is bucketing ▪ ShuffledHashJoin without bucketing: TableScan(order) → Exchange(user_id) and TableScan(user) → Exchange(user_id) both feed ShuffledHashJoin
  11. What is bucketing ▪ ShuffledHashJoin with bucketing: TableScan(order) and TableScan(user) feed ShuffledHashJoin directly. There is no Exchange because both tables are already shuffled (bucketed) on user_id.
  12. What is bucketing ▪ SortMergeJoin without bucketing: on each side, TableScan(order) / TableScan(user) → Exchange(user_id) → Sort(user_id), and both sides feed SortMergeJoin
  13. What is bucketing ▪ SortMergeJoin with bucketing: TableScan(order) and TableScan(user) feed SortMergeJoin directly. There is no Exchange or Sort because both tables are already shuffled and sorted on the join key (user_id).
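A minimal Python sketch of why slide 13's plan needs no Exchange or Sort: if both tables are split with the same hash function and bucket count, and each bucket is sorted on the key, bucket i of one table can only match bucket i of the other, so each pair of buckets is merge-joined independently. The bucket count of 4 and the identity "hash" (key mod bucket count) are assumptions for illustration; Spark SQL actually hashes the key with Murmur3.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, small for illustration


def bucket_id(key: int, num_buckets: int) -> int:
    # Stand-in for pmod(hash(key), num_buckets): the key is used as its own hash
    return key % num_buckets


def write_bucketed(rows, num_buckets):
    """Split (key, value) rows into buckets and sort each bucket by key."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    return [sorted(buckets[b]) for b in range(num_buckets)]


def merge_join(left, right):
    """Inner sort-merge join of two lists of (key, value) sorted by key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, pair it with the left run
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


def bucketed_join(left_buckets, right_buckets):
    # Bucket i only meets bucket i: no Exchange, and no Sort because every
    # bucket is already sorted on the join key
    result = []
    for lb, rb in zip(left_buckets, right_buckets):
        result.extend(merge_join(lb, rb))
    return result


orders = [(1, "o1"), (2, "o2"), (5, "o3"), (1, "o4")]  # (user_id, order)
users = [(1, "alice"), (2, "bob"), (7, "carol")]        # (user_id, name)
joined = bucketed_join(write_bucketed(orders, NUM_BUCKETS),
                       write_bucketed(users, NUM_BUCKETS))
print(joined)  # [(1, 'o1', 'alice'), (1, 'o4', 'alice'), (2, 'o2', 'bob')]
```

The result matches a plain join over the whole tables; only the data movement differs.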
  14. Spark Bucketing Limitations
  15. Spark Bucketing Limitations ▪ Small files: after INSERT INTO `order` SELECT order_id, user_id, product, amount FROM order_staging, running hdfs dfs -ls /user/warehouse/test.db/order/_temporary/0/_temporary/attempt_20200519145628_0014_m_000014_0 | wc -l prints 988 for a single task attempt. Each task generates up to 1024 small files, where 1024 is the bucket number, so with M tasks there are up to 1024 × M small files in total. When M is 1024, there can be up to 1024 × 1024, about 1 million, small files.
  16. Spark Bucketing Limitations ▪ Small files: INSERT INTO `order` SELECT order_id, user_id, product, amount FROM order_staging DISTRIBUTE BY user_id. When 1024 is a multiple of M, there are up to 1024 files, where M equals spark.sql.shuffle.partitions.
  17. Spark Bucketing Limitations ▪ Small files: INSERT INTO `order` SELECT order_id, user_id, product, amount FROM order_staging DISTRIBUTE BY user_id. When M is a multiple of 1024, there are up to M files, where M equals spark.sql.shuffle.partitions.
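Slides 15 to 17 can be checked with a little arithmetic. The sketch assumes, as we understand upstream Spark to do, that DISTRIBUTE BY and the bucket writer apply pmod to the same hash value, with M partitions and N buckets respectively; the hash values themselves are simulated. A row with hash h then lands in task h mod M and bucket h mod N, and the number of distinct (task, bucket) pairs, i.e. files, is at most lcm(M, N). That bound reduces to N when N is a multiple of M and to M when M is a multiple of N.

```python
from math import gcd


def max_files_plain_insert(num_tasks: int, num_buckets: int) -> int:
    # Without DISTRIBUTE BY, any task may hold rows of any bucket and writes
    # one file per bucket it sees: at most M * N files
    return num_tasks * num_buckets


def max_files_distribute_by(num_partitions: int, num_buckets: int) -> int:
    # With DISTRIBUTE BY, the (task, bucket) pair of a row is fixed by
    # h mod lcm(M, N), so there are at most lcm(M, N) files
    return num_partitions * num_buckets // gcd(num_partitions, num_buckets)


def simulated_files(num_partitions: int, num_buckets: int, hashes) -> int:
    """Count distinct (task, bucket) pairs, i.e. files, for simulated hashes."""
    return len({(h % num_partitions, h % num_buckets) for h in hashes})


print(max_files_plain_insert(1024, 1024))   # 1048576
print(max_files_distribute_by(256, 1024))   # 1024 (1024 is a multiple of M)
print(max_files_distribute_by(4096, 1024))  # 4096 (M is a multiple of 1024)
print(max_files_distribute_by(200, 1024))   # 25600
```

The last line uses Spark's default spark.sql.shuffle.partitions of 200: since neither 200 nor 1024 divides the other, the bound grows to 25,600 files, so choosing M such that one count divides the other keeps the file count low.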
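  18. Spark Bucketing Limitations ▪ Incompatible across SQL engines ▪ Hive (HiveHash): map tasks (M) shuffle rows to reduce tasks (R), and each reducer writes exactly one bucket file, bucket 0 … bucket (n-1) ▪ Spark SQL (Murmur3): map tasks (M) write bucket files directly, so every task may write its own file for bucket 0 and for each other bucket up to bucket (n-1)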
  19. Spark Bucketing Limitations ▪ Incompatible across SQL engines: for a join on user_id, TableScan(order) → Exchange(user_id) → Sort(user_id) and TableScan(user) → Exchange(user_id) → Sort(user_id) feed SortMergeJoin. Exchange and Sort are required when joining Hive-bucketed tables in Spark SQL, or Spark-bucketed tables in Hive.
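To see why the two engines disagree (slides 18 and 19), the sketch computes the bucket of a BIGINT key both ways. spark_bucket follows our reading of Spark's Murmur3_x86_32.hashLong with Spark's default seed of 42, followed by pmod; hive_bucket follows the classic Hive scheme, (int)(v ^ (v >>> 32)) masked to non-negative and taken modulo the bucket count. Both are Python re-implementations written for illustration, so compare them with Spark's hash() SQL function before relying on exact values.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def _to_int32(x: int) -> int:
    # Reinterpret the low 32 bits as a signed Java int
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k1(k1: int) -> int:
    k1 = (k1 * 0xCC9E2D51) & MASK32
    k1 = _rotl32(k1, 15)
    return (k1 * 0x1B873593) & MASK32


def _mix_h1(h1: int, k1: int) -> int:
    h1 = _rotl32(h1 ^ k1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK32


def _fmix(h1: int, length: int) -> int:
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1


def murmur3_hash_long(value: int, seed: int = 42) -> int:
    """Murmur3 x86_32 of a 64-bit value, modelled on Spark's hashLong."""
    v = value & MASK64
    h1 = _mix_h1(seed & MASK32, _mix_k1(v & MASK32))  # low 32 bits
    h1 = _mix_h1(h1, _mix_k1(v >> 32))                 # high 32 bits
    return _to_int32(_fmix(h1, 8))


def spark_bucket(user_id: int, num_buckets: int) -> int:
    # pmod(murmur3(user_id), numBuckets); Python's % is already non-negative
    return murmur3_hash_long(user_id) % num_buckets


def hive_bucket(user_id: int, num_buckets: int) -> int:
    # (int)(v ^ (v >>> 32)) & Integer.MAX_VALUE, modulo numBuckets
    v = user_id & MASK64
    return ((v ^ (v >> 32)) & 0x7FFFFFFF) % num_buckets


for user_id in (1, 2, 3):
    print(user_id, "Hive bucket:", hive_bucket(user_id, 1024),
          "Spark bucket:", spark_bucket(user_id, 1024))
```

For small non-negative keys the Hive bucket is simply the key modulo the bucket count, while the Murmur3 bucket is scattered, so the same user_id generally lands in different buckets in the two engines.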
  20. Spark Bucketing Limitations ▪ Extra Sort: for a join on user_id, TableScan(order) → Sort(user_id) and TableScan(user) → Sort(user_id) feed SortMergeJoin. Sort is required when joining Spark SQL bucketed tables in Spark SQL, because each bucket may consist of more than one file.
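Slide 20's extra Sort follows from how a bucket is scanned: each file is sorted on its own, but when a bucket has several files (one per writing task) the scan reads them back to back, so the concatenation is not sorted in general. The helper below is a simplified model of that rule (only single-file buckets are treated as sorted), and the file contents are made up.

```python
def is_sorted(rows) -> bool:
    return all(a <= b for a, b in zip(rows, rows[1:]))


def scan_bucket(files):
    """Read the files of one bucket one after another."""
    return [row for f in files for row in f]


def needs_sort(files) -> bool:
    # Simplified rule: only a single-file bucket can promise sorted output
    return len(files) > 1


# Hypothetical bucket 0 written by two tasks; each file is sorted by user_id
bucket0_files = [[1, 5, 9], [2, 3, 7]]
print(all(is_sorted(f) for f in bucket0_files))  # True
print(is_sorted(scan_bucket(bucket0_files)))     # False
print(needs_sort(bucket0_files))                 # True
```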
  21. Spark Bucketing Limitations ▪ Unaligned bucket number: one table is clustered by user_id into 4096 buckets and the other into 1024 buckets. For a join on user_id, an Exchange(user_id) is required on one of the bucketed tables because the bucket numbers differ.
  22. Spark Bucketing Limitations ▪ Join key set is different from bucket key set: both tables are clustered by user_id into 1024 buckets, but the join is on (user_id, location_id), so TableScan(order) and TableScan(user) each need an Exchange(user_id, location_id) before SortMergeJoin. Exchange is required when the bucketing key set is different from the join key set.
  23. Spark Bucketing Limitations ▪ UNION ALL after bucketing: order_mobile and order_web are each clustered by user_id into 1024 buckets, and their UNION ALL is joined on user_id with user, which is also clustered by user_id into 1024 buckets. An Exchange(user_id) is still required, even though the underlying tables are bucketed by the join key (user_id) with the same bucket number as the other side.
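The Exchange on slide 23 is not needed for correctness. In the sketch below, a hypothetical bucket-wise UNION ALL concatenates bucket i of order_mobile with bucket i of order_web, and every row of the result still sits in the bucket its key hashes to, so the union is bucketed exactly like its inputs. The key-mod-bucket-count hash and the bucket count of 4 are stand-ins for illustration.

```python
NUM_BUCKETS = 4  # hypothetical


def bucket_id(user_id: int, num_buckets: int) -> int:
    # Stand-in for pmod(hash(user_id), num_buckets)
    return user_id % num_buckets


def bucketize(user_ids, num_buckets):
    buckets = [[] for _ in range(num_buckets)]
    for uid in user_ids:
        buckets[bucket_id(uid, num_buckets)].append(uid)
    return buckets


def bucket_wise_union(a_buckets, b_buckets):
    """Hypothetical bucket-aware UNION ALL: output bucket i = a[i] + b[i]."""
    if len(a_buckets) != len(b_buckets):
        raise ValueError("inputs must have the same bucket number")
    return [a + b for a, b in zip(a_buckets, b_buckets)]


order_mobile = bucketize([1, 4, 6, 11], NUM_BUCKETS)
order_web = bucketize([2, 3, 9, 12], NUM_BUCKETS)
union = bucket_wise_union(order_mobile, order_web)
# Every row is still in the bucket its user_id hashes to
print(all(bucket_id(uid, NUM_BUCKETS) == i
          for i, bucket in enumerate(union) for uid in bucket))  # True
```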
  24. Bucketing Optimizations at ByteDance
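  25. Bucketing Optimizations at ByteDance ▪ Align Spark bucketing with Hive ▪ Hive (HiveHash): map tasks (M) shuffle rows to reduce tasks (R), and each reducer writes exactly one bucket file, bucket 0 … bucket (n-1) ▪ Spark SQL (Murmur3): map tasks (M) write bucket files directly, so every task may write its own file for bucket 0 and for each other bucket up to bucket (n-1)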
  26. Bucketing Optimizations at ByteDance ▪ Align Spark bucketing with Hive ▪ Spark SQL writes to Hive bucketed tables in the same way as Hive ▪ Override InsertIntoHiveTable#requiredDistribution: HashClusteredDistribution with HiveHash on the bucketing keys ▪ Override InsertIntoHiveTable#requiredOrdering: ascending SortOrder on the bucketing keys
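The two write-side overrides on slide 26 amount to: cluster rows by the Hive hash of the bucketing key into one task per bucket, then sort each task's rows ascending, which yields exactly one sorted file per bucket, as Hive would write it. The Python below models that outcome for a BIGINT key; hive_bucket re-implements the classic Hive scheme and is an assumption of this sketch, not Spark code.

```python
from collections import defaultdict


def hive_bucket(key: int, num_buckets: int) -> int:
    # Classic Hive bucketing of a BIGINT: (int)(v ^ (v >>> 32)), made
    # non-negative, modulo the bucket count
    v = key & 0xFFFFFFFFFFFFFFFF
    return ((v ^ (v >> 32)) & 0x7FFFFFFF) % num_buckets


def hive_compatible_write(keys, num_buckets):
    """Return {bucket: rows}, i.e. one sorted file per non-empty bucket."""
    tasks = defaultdict(list)
    for k in keys:
        # requiredDistribution: HashClusteredDistribution with HiveHash
        tasks[hive_bucket(k, num_buckets)].append(k)
    # requiredOrdering: ascending SortOrder on the bucketing key
    return {b: sorted(rows) for b, rows in tasks.items()}


files = hive_compatible_write([7, 3, 11, 4, 8, 15], num_buckets=4)
print(files)  # {3: [3, 7, 11, 15], 0: [4, 8]}
```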
  27. Bucketing Optimizations at ByteDance ▪ Align Spark bucketing with Hive ▪ Spark SQL reads Hive bucketed tables using their bucketing metadata ▪ Override HiveTableScanExec#outputPartitioning: HashPartitioning with HiveHash ▪ Override HiveTableScanExec#outputOrdering: ascending SortOrder on the bucketing keys
  28. Bucketing Optimizations at ByteDance ▪ With Hive bucketing metadata: each HiveTableScan reports outputPartitioning: HashPartitioning(id, n, HiveHash) and outputOrdering: SortOrder(id), which satisfy the Sort Merge Join's requireChildDistribution: HashClusteredDistribution(id, n, HiveHash) and requireChildOrdering: SortOrder(id), so both HiveTableScans feed the join directly ▪ Without it: each HiveTableScan reports outputPartitioning: UnknownPartitioning and outputOrdering: Nil, while the join requires requireChildDistribution: HashClusteredDistribution(id, n, Murmur3Hash) and requireChildOrdering: SortOrder(id), so each side is planned as HiveTableScan → Exchange → Sort
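Slide 28's decision can be modelled as a comparison between what a scan reports and what the join requires. In this toy planner check the hash function is part of both the partitioning and the required distribution, as the slide's HashPartitioning(id, n, HiveHash) notation suggests; upstream Spark's HashPartitioning has no such field, so the dataclasses below are hypothetical stand-ins rather than Spark's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class HashPartitioning:
    """What a scan reports: rows clustered by hash_fn(keys) into n partitions."""
    keys: Tuple[str, ...]
    num_partitions: int
    hash_fn: str  # e.g. "HiveHash" or "Murmur3Hash"


@dataclass(frozen=True)
class HashClusteredDistribution:
    """What a Sort Merge Join requires from each child."""
    keys: Tuple[str, ...]
    num_partitions: int
    hash_fn: str


def operators_to_add(partitioning: Optional[HashPartitioning],
                     ordering: Sequence[str],
                     required: HashClusteredDistribution,
                     required_ordering: Sequence[str]) -> list:
    """Return the operators needed between a scan and the join."""
    if partitioning is None or (
        (partitioning.keys, partitioning.num_partitions, partitioning.hash_fn)
        != (required.keys, required.num_partitions, required.hash_fn)
    ):
        # UnknownPartitioning or a different hash: shuffle, then sort
        return ["Exchange", "Sort"]
    if list(ordering[:len(required_ordering)]) != list(required_ordering):
        return ["Sort"]
    return []


hive_scan = HashPartitioning(("id",), 1024, "HiveHash")
print(operators_to_add(hive_scan, ["id"],
                       HashClusteredDistribution(("id",), 1024, "HiveHash"), ["id"]))
print(operators_to_add(None, [],
                       HashClusteredDistribution(("id",), 1024, "Murmur3Hash"), ["id"]))
```

The first call corresponds to the left plan on the slide (nothing added) and the second to the right plan (Exchange and Sort on each side).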
  29. Bucketing Optimizations at ByteDance ▪ One-to-Many Bucket Join
  30. Bucketing Optimizations at ByteDance ▪ Table A (3 buckets): bucket 0 (0, 3, 6, 9, 12, 15), bucket 1 (1, 4, 7, 10, 13, 16), bucket 2 (2, 5, 8, 11, 14, 17) ▪ Table B (6 buckets): bucket 0 (0, 6, 12), bucket 1 (1, 7, 13), bucket 2 (2, 8, 14), bucket 3 (3, 9, 15), bucket 4 (4, 10, 16), bucket 5 (5, 11, 17) ▪ Plan: TableScan(Table A) and TableScan(Table B) feed Sort Merge Join, with a Sort before the join
  31. Bucketing Optimizations at ByteDance ▪ Same Table A (3 buckets) and Table B (6 buckets) as slide 30 ▪ Bucket Union: Table A is scanned twice; the first TableScan provides bucket 0, bucket 1 and bucket 2, and the second provides copies bucket 0’, bucket 1’ and bucket 2’ holding the same rows (0, 3, 6, 9, 12, 15), (1, 4, 7, 10, 13, 16) and (2, 5, 8, 11, 14, 17) ▪ The six unioned buckets line up with Table B’s six buckets, so Sort Merge Join runs bucket by bucket without an Exchange
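The Bucket Union on slides 30 and 31 relies on one identity: when B's bucket count (6) is a multiple of A's (3), h mod 3 = (h mod 6) mod 3, so every row in B's bucket j can only match rows in A's bucket j mod 3. The sketch rebuilds the slide's example with key mod bucket count as a stand-in hash; bucket_union is a hypothetical name for the replication step.

```python
def bucket_id(key: int, num_buckets: int) -> int:
    # Stand-in for pmod(hash(key), num_buckets)
    return key % num_buckets


def bucketize(keys, num_buckets):
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[bucket_id(k, num_buckets)].append(k)
    return buckets


def bucket_union(small, target_buckets):
    """Replicate the smaller table's buckets to match target_buckets.

    Output bucket j is small[j % len(small)]; this is only valid when
    target_buckets is a multiple of len(small).
    """
    n = len(small)
    if target_buckets % n != 0:
        raise ValueError("bucket numbers must be multiples of each other")
    return [small[j % n] for j in range(target_buckets)]


table_a = bucketize(range(18), 3)  # Table A: 3 buckets
table_b = bucketize(range(18), 6)  # Table B: 6 buckets
a_prime = bucket_union(table_a, 6)
print(table_b[3], a_prime[3])  # [3, 9, 15] [0, 3, 6, 9, 12, 15]
```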
  32. Bucketing Optimizations at ByteDance ▪ Plan: two TableScans of Table A (3 buckets) → Bucket Union, plus a TableScan of Table B (6 buckets), feeding Sort Merge Join ▪ Join types: B left join A, B left semi join A, B anti join A, B inner join A, B right join A, B full outer join A, B cross join A
  33. Bucketing Optimizations at ByteDance ▪ Plan: each of the two TableScans of Table A (3 buckets) is followed by a Filter before Bucket Union, plus a TableScan of Table B (6 buckets), feeding Sort Merge Join ▪ Join types: B left join A, B left semi join A, B anti join A, B inner join A, B right join A, B full outer join A, B cross join A
  34. Bucketing Optimizations at ByteDance ▪ The filtered plan (two TableScans of Table A, each followed by a Filter → Bucket Union, plus a TableScan of Table B, feeding Sort Merge Join) applied to the example data ▪ Table A (3 buckets): bucket 0 (0, 3, 6, 9, 12, 15), bucket 1 (1, 4, 7, 10, 13, 16), bucket 2 (2, 5, 8, 11, 14, 17), scanned again as bucket 0’, bucket 1’, bucket 2’ ▪ Table B (6 buckets): bucket 0 (0, 6, 12), bucket 1 (1, 7, 13), bucket 2 (2, 8, 14), bucket 3 (3, 9, 15), bucket 4 (4, 10, 16), bucket 5 (5, 11, 17)
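Why add the Filters on slides 33 and 34? Our reading, which the extracted slide text does not state explicitly: without them, every row of A is replicated into two of the six unioned buckets. That is harmless for joins that emit A rows only when they match (B inner join A, B left join A, B left semi join A, B anti join A), but a join that preserves unmatched A rows, such as B full outer join A, then emits a spurious null-extended row for each replica that finds no partner. Filtering bucket j’ down to the rows whose 6-bucket id is j makes each A row appear exactly once. The sketch checks this with key mod bucket count as a stand-in hash and made-up keys that are unique within each table.

```python
from collections import Counter


def bucket_id(key: int, num_buckets: int) -> int:
    # Stand-in for pmod(hash(key), num_buckets)
    return key % num_buckets


def bucketize(keys, num_buckets):
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[bucket_id(k, num_buckets)].append(k)
    return buckets


def bucket_union(small, target_buckets, with_filter):
    """Replicate A's buckets to match B; optionally filter to B's bucketing."""
    out = []
    for j in range(target_buckets):
        rows = small[j % len(small)]
        if with_filter:
            # Keep only rows that would land in bucket j of the larger table
            rows = [k for k in rows if bucket_id(k, target_buckets) == j]
        out.append(rows)
    return out


def inner_join(b_buckets, a_buckets):
    return [(k, k) for bs, as_ in zip(b_buckets, a_buckets) for k in bs if k in as_]


def full_outer_join(b_buckets, a_buckets):
    """Per-bucket full outer join on the key (keys are unique in each table)."""
    out = []
    for bs, as_ in zip(b_buckets, a_buckets):
        out += [(k, k) if k in as_ else (k, None) for k in bs]
        out += [(None, k) for k in as_ if k not in bs]
    return out


A_KEYS, B_KEYS = [0, 3, 4, 5], [0, 1, 2, 4, 7]
a_buckets, b_buckets = bucketize(A_KEYS, 3), bucketize(B_KEYS, 6)
expected_full = Counter([(0, 0), (4, 4), (1, None), (2, None), (7, None),
                         (None, 3), (None, 5)])

plain = bucket_union(a_buckets, 6, with_filter=False)
filtered = bucket_union(a_buckets, 6, with_filter=True)
print(inner_join(b_buckets, plain))                                    # [(0, 0), (4, 4)]
print(Counter(full_outer_join(b_buckets, plain)) == expected_full)     # False
print(Counter(full_outer_join(b_buckets, filtered)) == expected_full)  # True
```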
  35. Bucketing Optimizations at ByteDance ▪ Join on more than bucketing keys ▪ Table X and Table Y are both bucketed by A and joined on (A, B); without the optimization, each side is planned as TableScan → Exchange on (A, B) → Sort on (A, B) before Sort Merge Join on (A, B) ▪ Sample rows (A, B, C) of Table X: (X, 1, 1), (X, 2, 3), (X, 4, 2), (Y, 6, 7), (Y, 7, 3), (Y, 8, 5), (Z, 2, 8), (Z, 4, 3), (Z, 5, 2) ▪ Sample rows (A, B, C) of Table Y: (X, 2, 3), (X, 1, 1), (X, 4, 2), (Y, 8, 5), (Y, 6, 7), (Y, 7, 3), (Z, 2, 8), (Z, 5, 2), (Z, 4, 3)
  36. Bucketing Optimizations at ByteDance ▪ Join on more than bucketing keys ▪ With the optimization, each side is planned as TableScan → Sort on (A, B) before Sort Merge Join on (A, B), with no Exchange, because the join keys (A, B) are a superset of the bucketing keys (A) ▪ The sample rows of Table X and Table Y are the same as on slide 35
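Slide 36 drops the Exchange because rows that agree on (A, B) necessarily agree on A, and rows that agree on A already share a bucket. The sketch replays the slide's sample rows with two buckets and zlib.crc32 of A as a stand-in hash (both are assumptions), joins bucket by bucket on (A, B), and checks the result against a join over the whole tables. For brevity it matches rows with a dictionary lookup after sorting rather than a true merge.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 2  # hypothetical


def bucket_of(a: str) -> int:
    # Stand-in hash on the bucketing key A only
    return zlib.crc32(a.encode()) % NUM_BUCKETS


# (A, B, C) rows from slides 35 and 36; both tables are bucketed by A
table_x = [("X", 1, 1), ("X", 2, 3), ("X", 4, 2), ("Y", 6, 7), ("Y", 7, 3),
           ("Y", 8, 5), ("Z", 2, 8), ("Z", 4, 3), ("Z", 5, 2)]
table_y = [("X", 2, 3), ("X", 1, 1), ("X", 4, 2), ("Y", 8, 5), ("Y", 6, 7),
           ("Y", 7, 3), ("Z", 2, 8), ("Z", 5, 2), ("Z", 4, 3)]


def bucketize(rows):
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[0])].append(row)
    return buckets


def join_on_a_b(x_buckets, y_buckets):
    """Join on (A, B) one bucket at a time, with no re-shuffle on (A, B)."""
    out = []
    for b in range(NUM_BUCKETS):
        y_index = defaultdict(list)
        for a, key_b, c in sorted(y_buckets[b]):  # "Sort on A B"
            y_index[(a, key_b)].append(c)
        for a, key_b, c in sorted(x_buckets[b]):  # "Sort on A B"
            out.extend((a, key_b, c, yc) for yc in y_index[(a, key_b)])
    return out


result = join_on_a_b(bucketize(table_x), bucketize(table_y))
whole = [(a, b, c, yc) for a, b, c in table_x
         for ya, yb, yc in table_y if (a, b) == (ya, yb)]
print(sorted(result) == sorted(whole))  # True
```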
  37. Bucketing Optimizations at ByteDance ▪ Bucketing evolution ▪ Case 1: a non-bucketed table is partitioned by date, and the user wants to convert it to a bucketed table without overhead ▪ Case 2: the bucket number is X, and the user needs to enlarge it to 2X because the data volume has increased
  38. Bucketing Optimizations at ByteDance ▪ Bucketing evolution ▪ Store bucketing information in partition parameters ▪ The table is read as a bucketed table only if all target partitions have the same bucketing information; otherwise, it is read as a non-bucketed table ▪ Reading a bucketed table as a non-bucketed table only affects performance, not correctness
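The read-side rule on slide 38 can be sketched as a single check over the partitions a query selects. BucketSpec and effective_bucket_spec are hypothetical names chosen for this illustration; None stands for a partition without bucketing parameters, such as the older date partitions in Case 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BucketSpec:
    """Bucketing information stored in each partition's parameters."""
    num_buckets: int
    bucket_cols: Tuple[str, ...]
    sort_cols: Tuple[str, ...]


def effective_bucket_spec(specs: List[Optional[BucketSpec]]) -> Optional[BucketSpec]:
    """Spec to read the selected partitions as bucketed, or None.

    None means the partitions are read as a non-bucketed table: the planner
    adds Exchange/Sort, which costs performance but not correctness.
    """
    if not specs or specs[0] is None:
        return None
    return specs[0] if all(s == specs[0] for s in specs) else None


old = None  # partition written before the table was bucketed
x = BucketSpec(1024, ("user_id",), ("user_id",))
two_x = BucketSpec(2048, ("user_id",), ("user_id",))
print(effective_bucket_spec([x, x]))      # both partitions agree: bucketed read
print(effective_bucket_spec([old, x]))    # None
print(effective_bucket_spec([x, two_x]))  # None
```

For Case 2, partitions written after the change carry the 2X spec, so a query that touches only new partitions (or only old ones) still gets a bucketed read, while a query spanning both falls back to a non-bucketed read.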
  39. Feedback ▪ Your feedback is important to us. Don’t forget to rate and review the sessions.