
Spark SQL Bucketing at Facebook

Bucketing is a popular data partitioning technique to pre-shuffle and (optionally) pre-sort data during writes. This is ideal for a variety of write-once, read-many datasets at Facebook, where Spark can automatically avoid expensive shuffles/sorts (when the underlying data is joined/aggregated on its bucketed keys), resulting in substantial savings in both CPU and I/O.

Over the last year, we’ve added a series of optimizations in Apache Spark as a means towards achieving feature parity between Hive and Spark. These include avoiding shuffle/sort when joining/aggregating/inserting on tables with mismatching buckets, allowing users to skip shuffle/sort when writing to bucketed tables, and adding data validators before writing bucketed data, among many others. As a direct consequence of these efforts, we’ve witnessed over 10x growth (spanning 40% of total compute) in queries that read one or more bucketed tables across the entire data warehouse at Facebook.

In this talk, we’ll take a deep dive into the internals of bucketing support in Spark SQL, describe use cases where bucketing is useful, touch upon some of the ongoing work to automatically suggest bucketing tables based on query column lineage, and summarize the lessons learned from developing bucketing support in Spark at Facebook over the last 2 years.



Transcript

  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. Cheng Su, Facebook: Spark SQL Bucketing at Facebook. #UnifiedDataAnalytics #SparkAISummit
  3. About me: Cheng Su • Software Engineer at Facebook (Data Infrastructure Organization) • Working in the Spark team • Previously worked in the Hive/Corona team
  4. Agenda • Spark at Facebook • What is Bucketing • Spark Bucketing Optimizations (JIRA: SPARK-19256) • Bucketing Compatibility across SQL Engines • The Road Ahead
  5. Spark at Facebook
  6. What is Bucketing: pre-shuffle and (optionally) pre-sort data when writing a table, so that shuffle and (optionally) sort can be avoided when reading it. [Diagram: table user(id, info) written as a normal table vs. as a bucketed sorted table, where rows are shuffled on id and sorted on id into bucket files file0, file1, ..., file9]
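The bucketed-sorted write on this slide can be sketched in a few lines of plain Python. This is an illustrative toy, not Spark code: `simple_hash` and `write_bucketed_sorted` are invented names, and the identity hash stands in for the engine's real bucketing hash function.

```python
def simple_hash(key):
    # Placeholder hash: real engines use e.g. Hive hash or murmur3.
    return key

def write_bucketed_sorted(rows, num_buckets):
    """Route each (key, value) row to bucket hash(key) % num_buckets,
    then pre-sort every bucket on the key, as a bucketed write does."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[simple_hash(key) % num_buckets].append((key, value))
    return [sorted(b) for b in buckets]

# Keys 2, 0, 4 land in bucket 0; keys 1, 5 land in bucket 1; each file sorted.
files = write_bucketed_sorted([(2, "a"), (1, "b"), (5, "c"), (0, "d"), (4, "e")], 2)
```

Because every bucket file is both hash-partitioned and sorted, a later read on the bucketed key can skip the shuffle and the sort entirely.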
  7. What is Bucketing (query plan). SQL to create a bucketed table: CREATE TABLE user (id INT, info STRING) CLUSTERED BY (id) SORTED BY (id) INTO 8 BUCKETS. SQL to write the bucketed table: INSERT OVERWRITE TABLE user SELECT id, info FROM ... WHERE ... Query plan to write the bucketed table: TableScan → ShuffleExchange(id, 8, HashFunc) → Sort(id) → InsertIntoTable.
  8. What is Bucketing (write path)
  9. Spark Bucketing Optimizations (join): avoid shuffle and sort when sort-merge-joining bucketed tables. SQL: SELECT ... FROM left L JOIN right R ON L.id = R.id. Plan without bucketing: SortMergeJoin over TableScan → Shuffle(id) → Sort(id) on each side; plan for two bucketed tables with the same buckets: SortMergeJoin directly over TableScan(L) and TableScan(R).
  10. [Diagram] Sort merge join: shuffle both tables, sort both tables, then join by buffering one side and streaming the bigger one.
  11. [Diagram] Sort merge join of bucketed sorted tables: join directly by buffering one side and streaming the bigger one (no shuffle or sort needed).
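The "buffer one, stream the bigger one" step on slides 10 and 11 is a textbook merge join once both sides are key-sorted; a minimal sketch with invented names (illustrative only, not the Spark operator):

```python
def merge_join(left, right):
    """Inner-join two key-sorted lists of (key, value) pairs:
    advance the side with the smaller key; on a match, buffer the
    group of equal right keys and stream the left rows past it."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1  # buffer the right-side group for this key
            while i < len(left) and left[i][0] == lk:
                for jj in range(j, j_end):
                    out.append((lk, left[i][1], right[jj][1]))
                i += 1
            j = j_end
    return out
```

On bucketed sorted tables this runs per bucket pair, with no shuffle or sort in front of it.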
  12. Avoid shuffle when shuffled-hash-joining bucketed tables. SQL: SELECT ... FROM left L JOIN right R ON L.id = R.id. Plan without bucketing: ShuffledHashJoin over TableScan → Shuffle(id) on each side; plan for two bucketed tables with the same buckets: ShuffledHashJoin directly over TableScan(L) and TableScan(R).
  13. [Diagram] Shuffled hash join: shuffle both tables, build a hash table on one side, then join by streaming the bigger one.
  14. [Diagram] Shuffled hash join of bucketed tables: join directly by hashing one side and streaming the bigger one (no shuffle needed).
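Similarly, the shuffled hash join body on slides 13 and 14 boils down to building a hash table on one side and probing it with the other; a hypothetical sketch:

```python
from collections import defaultdict

def hash_join(build_side, stream_side):
    """Inner-join two lists of (key, value) pairs by building a hash
    table on one side and streaming the (usually bigger) other side."""
    table = defaultdict(list)
    for key, value in build_side:
        table[key].append(value)  # build phase
    # Probe phase: each streamed row pairs with every buffered match.
    return [(key, bv, sv) for key, sv in stream_side for bv in table[key]]
```

With matching buckets, each task joins bucket i of L against bucket i of R directly, so the shuffle disappears; no sort is needed on either side.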
  15. Avoid shuffle and sort when joining a non-bucketed table with a bucketed table. SQL: SELECT ... FROM left L JOIN right R ON L.id = R.id. Plan: SortMergeJoin where only the non-bucketed side (L) goes through TableScan → Shuffle(id) → Sort(id); the bucketed side (R) is scanned directly.
  16. [Diagram] Sort merge join of a non-bucketed and a bucketed table: shuffle and sort only the non-bucketed table, then join by buffering one side and streaming the bigger one.
  17. Avoid shuffle and sort when joining bucketed tables with different bucket counts. SQL: SELECT ... FROM left L JOIN right R ON L.id = R.id. Plan to join a 4-buckets table (L) with a 16-buckets table (R): SortMergeJoin over TableScan(L) and SortedCoalesce(4) on top of TableScan(R). SortedCoalesceExec is a physical plan operator that inherits its child's ordering; SortedCoalescedRDD extends CoalescedRDD to read its children RDDs in a sort-merge way (priority queue).
  18. [Diagram] Sort merge join of bucketed sorted tables with different buckets: sorted-coalesce the table with more buckets, then join by buffering one side and streaming the bigger one.
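The priority-queue read that SortedCoalescedRDD performs is essentially a k-way merge of sorted streams; in Python, `heapq.merge` gives the same behavior (sketch with invented names):

```python
import heapq

def sorted_coalesce(bucket_group):
    """Merge several key-sorted bucket streams into one stream that is
    still key-sorted, without re-sorting (a k-way priority-queue merge)."""
    return list(heapq.merge(*bucket_group, key=lambda row: row[0]))

# For hash % 16 coalesced down to 4 buckets, buckets b, b+4, b+8, b+12 of
# the 16-bucket table merge into bucket b of the 4-bucket layout.
merged = sorted_coalesce([[(0, "a"), (4, "b")], [(1, "c"), (9, "d")]])
```

Because the merged output stays sorted, the coalesced side can feed the sort-merge join directly.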
  19. Avoid shuffle and sort when joining bucketed tables with different bucket counts (alternative). Same SQL; plan to join a 4-buckets table (L) with a 16-buckets table (R): SortMergeJoin over Repartition(16) on top of TableScan(L), and TableScan(R). RepartitionWithoutShuffleExec is a physical plan operator that inherits its child's ordering; RepartitionWithoutShuffleRDD divide-read-filters its children RDD partitions.
  20. [Diagram] Sort merge join of bucketed sorted tables with different buckets: divide (repartition without shuffle) the table with fewer buckets, then join by buffering one side and streaming the bigger one.
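The divide direction works because every fine bucket is a subset of exactly one coarse bucket when the bucket counts divide each other: a task for fine bucket b re-reads its parent coarse bucket and keeps only the rows whose hash lands in b, preserving the sort order. A toy sketch (identity hash, invented names):

```python
def divide_bucket(rows, fine_bucket, num_fine_buckets, hash_fn=lambda k: k):
    """Repartition-without-shuffle: filter one coarse, key-sorted bucket
    down to the rows of a single finer bucket; sort order is preserved."""
    return [(k, v) for k, v in rows if hash_fn(k) % num_fine_buckets == fine_bucket]

# Bucket 1 of a 4-bucket table holds keys with hash % 4 == 1; dividing it
# toward 16 buckets, fine bucket 5 keeps only keys with hash % 16 == 5.
coarse = [(1, "a"), (5, "b"), (9, "c"), (13, "d")]
fine5 = divide_bucket(coarse, 5, 16)
```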
  21. Spark Bucketing Optimizations (group-by): avoid shuffle and sort when sort-aggregating bucketed tables. SQL: SELECT ... FROM t GROUP BY id. Plan without bucketing: TableScan(t) → Shuffle(id) → Sort(id) → SortAggregate; plan for a bucketed table: TableScan(t) → SortAggregate.
  22. [Diagram] Sort aggregation: shuffle the table, sort it, then aggregate.
  23. [Diagram] Sort aggregation of a bucketed table: aggregate directly.
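On a bucketed sorted table, sort aggregation degenerates to one streaming pass per bucket, since rows with equal keys are adjacent. A hypothetical sketch summing values per key:

```python
def sort_aggregate(rows):
    """Aggregate key-sorted (key, value) rows in one pass: a group is
    complete as soon as the key changes, so no hash table is needed."""
    result, cur_key, cur_sum = [], None, 0
    for key, value in rows:
        if key != cur_key:
            if cur_key is not None:
                result.append((cur_key, cur_sum))  # emit finished group
            cur_key, cur_sum = key, 0
        cur_sum += value
    if cur_key is not None:
        result.append((cur_key, cur_sum))  # flush the last group
    return result
```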
  24. Avoid shuffle when hash-aggregating bucketed tables. SQL: SELECT ... FROM t GROUP BY id. Plan without bucketing: TableScan(t) → Shuffle(id) → HashAggregate; plan for a bucketed table: TableScan(t) → HashAggregate.
  25. [Diagram] Hash aggregation: shuffle the table, then aggregate.
  26. [Diagram] Hash aggregation of a bucketed table: aggregate directly.
  27. Spark Bucketing Optimizations (union all): avoid shuffle and sort when joining or grouping by on a union-all of bucketed tables. SQL: SELECT ... FROM (SELECT ... FROM L UNION ALL SELECT ... FROM R) GROUP BY id. Plan: SortAggregate over Union of TableScan(L) and TableScan(R); UnionExec is changed to produce a SortedCoalescedRDD instead of a CoalescedRDD.
  28. [Diagram] Aggregate after union-all: union-all both tables, shuffle and sort them, then aggregate.
  29. [Diagram] Aggregate after union-all of bucketed sorted tables: union-all both tables in a sort-merge way, then aggregate directly.
  30. Spark Bucketing Optimizations (filter): filter pushdown for bucketed tables. SQL reading a bucketed table with a filter on the bucketed column (id): SELECT ... FROM t WHERE id = 1, or SELECT ... FROM t WHERE id IN (1, 2, 3). The PushDownBucketFilter physical plan rule extracts the bucketed-column filter from FilterExec, then prunes unnecessary buckets from e.g. HiveTableScanExec (i.e. unrelated buckets are not read at all).
  31. [Diagram] Bucket filter pushdown for SELECT ... FROM t WHERE id = 1: only the required bucket files are read, versus a normal filter that scans every bucket.
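The pruning itself reduces to hashing the filter's constants: for `id = c` or `id IN (...)` on the bucketed column, only the buckets those constants map to can contain matches. A toy sketch (identity hash, invented names):

```python
def buckets_to_read(predicate_keys, num_buckets, hash_fn=lambda k: k):
    """Bucket filter pushdown: return the bucket ids that the filter
    constants hash into; all other bucket files are skipped entirely."""
    return sorted({hash_fn(k) % num_buckets for k in predicate_keys})

# WHERE id IN (1, 2, 3) on an 8-bucket table touches at most 3 of 8 files.
needed = buckets_to_read([1, 2, 3], 8)
```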
  32. Spark Bucketing Optimizations (validation): validate bucketing and sorting before writing bucketed tables. SQL: INSERT OVERWRITE TABLE t SELECT ... FROM ... Plan: TableScan → ShuffleVerifier → SortVerifier → InsertIntoTable(t). ShuffleVerifierExec computes the bucket id for each row on the fly and compares it with the RDD partition id; SortVerifierExec compares the ordering between the current and previous rows.
  33. [Diagram] Shuffle verifier and sort verifier: validate the bucket id, validate the sort order, then write to the table.
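Both verifiers are single-pass checks over each output partition; a combined sketch (invented names, identity hash standing in for the bucketing hash):

```python
def verify_partition(rows, partition_id, num_buckets, hash_fn=lambda k: k):
    """ShuffleVerifier + SortVerifier in one pass: every row must hash to
    this partition's bucket id, and keys must arrive non-decreasing."""
    prev = None
    for key, _ in rows:
        if hash_fn(key) % num_buckets != partition_id:
            return False  # row landed in the wrong bucket
        if prev is not None and key < prev:
            return False  # sort order violated
        prev = key
    return True
```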
  34. Spark Bucketing Optimizations (others) • Sorted-coalesced read of multiple partitions of a bucketed table • Prefer sort-merge-join for bucketed sorted tables • Prefer sort-aggregate for bucketed sorted tables • Avoid shuffle for NULL-safe-equal joins (<=>) on bucketed tables • Allow skipping shuffle and sort before writing a bucketed table • Automatically align dynamic allocation's maximum executors with the number of buckets • Efficient Hive table sampling support
  35. Bucketing Compatibility across SQL Engines • Hive hash is different from murmur3 hash! (bitwise-and with 2^31 - 1 in org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getBucketNumber) • The same bucketing hash function (e.g. Hive hash) should be used across SQL engines (Spark/Presto/Hive) • The bucket counts of all tables should be divisible by each other (e.g. powers of two)
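The getBucketNumber computation the first bullet refers to masks off the sign bit before taking the remainder, so negative hash codes still map to a valid bucket; in Python:

```python
def hive_bucket_number(hash_code, num_buckets):
    """Hive's bucket-id rule: bitwise-and the hash code with 2^31 - 1
    (clearing the Java int sign bit), then take it modulo num_buckets."""
    return (hash_code & 0x7FFFFFFF) % num_buckets

# A negative hash code such as -1 becomes 2147483647 after the mask.
bucket = hive_bucket_number(-1, 8)
```

Any engine writing buckets that another engine will read must agree on both this rule and the underlying hash function, which is exactly the compatibility constraint on this slide.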
  36. Bucketing Compatibility across SQL Engines (continued) • Changing the number of buckets should be easy and painless across compute engines for SQL users • When and what to bucket? Tables where more than one query joins or groups by the same columns
  37. The Road Ahead • Bucketing should be user-transparent • Auto-bucketing project: audit join/group-by column information for all warehouse queries, and recommend bucketed columns and bucket counts based on computational cost models • What are the problems with bucketing? Can we find better data placement, beyond bucketing and partitioning?
  38. DON'T FORGET TO RATE AND REVIEW THE SESSIONS. SEARCH "SPARK + AI SUMMIT".
