Spark SQL Join Improvement at Facebook

Nov. 24, 2020
Spark SQL Join Improvement at Facebook

  1. Spark SQL Join Improvement at Facebook Cheng Su Facebook
  2. About Me ▪ Cheng Su ▪ Software Engineer at Facebook (Data Platform Team) ▪ Apache Spark Contributor (Spark SQL) ▪ Previously worked on Hive & Hadoop at Facebook
  3. Agenda ▪ Overview for Spark SQL Joins ▪ Shuffled Hash Join Improvement (SPARK-32461) ▪ Leverage Bloom Filter for Join (under discussion with community) ▪ Stream-stream Join Improvement (SPARK-32883) ▪ Future Work
  4. Overview for Spark SQL Joins ▪ SQL Join Physical Operators ▪ BroadcastHashJoinExec (broadcast hash join) ▪ ShuffledHashJoinExec (shuffled hash join) ▪ SortMergeJoinExec (sort merge join)
  5. Overview for Spark SQL Joins [Diagram: Data Source A and Data Source B feed into a JOIN]
  6. Broadcast hash join ▪ Ship the smaller data source to all nodes ▪ Stream the other side ▪ One side should be smaller than spark.sql.autoBroadcastJoinThreshold (default 10 MB). ▪ Pros: No shuffle or sort on either side. No skew. ▪ Cons: OOM on driver. [Diagram: the Spark driver broadcasts the hash relation built on source A; each task scans source B and joins it with A]
  7. Sort merge join ▪ Shuffle both sides ▪ Sort both sides, buffer one, stream the bigger one ▪ Pros: Handles large data size well on both sides. ▪ Cons: Needs shuffle and sort. Skew. [Diagram: Scan A and Scan B -> Shuffle -> Sort -> Join]
  8. Shuffled hash join ▪ Shuffle both sides ▪ Hash the smaller one, stream the bigger one ▪ Disabled by default via spark.sql.join.preferSortMergeJoin. ▪ One side should be smaller than (spark.sql.autoBroadcastJoinThreshold * spark.sql.shuffle.partitions) (default 10 MB * 200), and 3x smaller than the other side. ▪ Pros: Handles large data size well on one side. ▪ Cons: Needs shuffle. Skew. OOM on task when building the hash table. [Diagram: Scan A and Scan B -> Shuffle -> Build hash table -> Join]
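
The size rules quoted on slides 6 to 8 can be sketched as a small decision function. This is only an illustration of the thresholds stated on the slides, not Spark's planner code; the function name `choose_join` and its parameters are hypothetical, and the strict "smaller than" comparison follows the slide wording.

```python
# Illustrative only: mirrors the size rules stated on the slides,
# not Spark's actual join-selection implementation.
AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024  # default of spark.sql.autoBroadcastJoinThreshold
SHUFFLE_PARTITIONS = 200                     # default of spark.sql.shuffle.partitions


def choose_join(left_bytes, right_bytes, prefer_sort_merge=True,
                broadcast_threshold=AUTO_BROADCAST_THRESHOLD,
                shuffle_partitions=SHUFFLE_PARTITIONS):
    """Pick a join strategy name from the two input sizes (hypothetical helper)."""
    small, large = sorted((left_bytes, right_bytes))
    # Broadcast hash join: one side fits under the broadcast threshold.
    if small < broadcast_threshold:
        return "broadcast_hash_join"
    # Shuffled hash join: only when sort merge join is not preferred, the
    # smaller side is under threshold * partitions, and it is 3x smaller.
    if (not prefer_sort_merge
            and small < broadcast_threshold * shuffle_partitions
            and small * 3 <= large):
        return "shuffled_hash_join"
    # Everything else falls back to sort merge join.
    return "sort_merge_join"
```

With the defaults, a 500 MB side joined to a 5000 MB side only becomes a shuffled hash join once `prefer_sort_merge` is set to `False`, which corresponds to turning off spark.sql.join.preferSortMergeJoin.
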
  9. Shuffled Hash Join Improvement (SPARK-32461) ▪ Agenda ▪ Code-gen support ▪ Full outer join support ▪ Sort-based fallback mechanism
  10. Code-gen Support for Shuffled Hash Join (SPARK-32421) ▪ Why? ▪ Save compute resource, improve CPU (whole-stage code-gen) ▪ How? ▪ Refactor broadcast hash join code-gen logic ▪ Broadcast hash join already supports code-gen ▪ Refactor into common parent class for BHJ and SHJ - HashJoin.scala ▪ Performance Improvement ▪ 30% run-time improvement compared to non-code-gen for benchmark query ▪ PR Status ▪ Merged, will be available in Spark 3.1
  11. Full Outer Shuffled Hash Join (SPARK-32399) ▪ Why? ▪ Save compute resource, improve CPU and IO ▪ Only sort merge join supports full outer, and sort is very expensive when table is large and needs to spill to disk. ▪ Shuffled hash join does hash table lookup join, instead of sorting. ▪ How? ▪ Need to record non-matched rows from both sides ▪ Stream side: trivial ▪ Build side: non-trivial, need extra data structure (e.g. hash set for matched rows)
  12. Full outer shuffled hash join ▪ Shuffle both sides ▪ Hash the smaller one, stream the bigger one ▪ Hash set for the build side to record matched rows ▪ Iterate the build side hash table and output non-matched rows [Diagram: Scan A and Scan B -> Shuffle -> Build hash table -> Join, with a hash set per join task]
  13. Full Outer Shuffled Hash Join (SPARK-32399) ▪ Performance Improvement ▪ 30% run-time improvement compared to full outer sort merge join for benchmark query ▪ PR Status ▪ Merged, will be available in Spark 3.1
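
The full outer shuffled hash join steps above can be sketched in plain Python for a single partition. This is a minimal illustration, not Spark's implementation: `full_outer_hash_join` and its arguments are hypothetical names, rows are plain tuples, a missing side is represented by `None`, and SQL NULL join-key semantics are ignored.

```python
from collections import defaultdict


def full_outer_hash_join(stream_rows, build_rows, key):
    """Full outer join of one partition; returns (stream_row, build_row) pairs."""
    # Build phase: hash the (smaller) build side on the join key.
    table = defaultdict(list)
    for idx, row in enumerate(build_rows):
        table[key(row)].append((idx, row))

    matched = set()  # hash set recording which build rows found a match
    out = []

    # Probe phase: stream the other side and look up each row.
    for s in stream_rows:
        hits = table.get(key(s), [])
        if not hits:
            out.append((s, None))  # stream side non-match: trivial
        for idx, b in hits:
            matched.add(idx)
            out.append((s, b))

    # Final pass: iterate the hash table and emit build rows never matched.
    for bucket in table.values():
        for idx, b in bucket:
            if idx not in matched:
                out.append((None, b))
    return out
```

The extra `matched` set is the "non-trivial" build side bookkeeping mentioned on slide 11; the stream side needs none because a non-match is known at probe time.
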
  14. Sort-based fallback mechanism for SHJ (SPARK-32634) ▪ Why? ▪ Build side hash table can run out of memory ▪ No fallback, no spill: task failure, then query failure ▪ Hard to enable shuffled hash join by default given the OOM limitation ▪ How? ▪ Introduce a fallback when building the hash table ▪ If memory cannot be acquired to insert the current row into the hash table, stop building the hash table, then sort both sides and do a sort merge join. ▪ PR Status ▪ WIP
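
A rough sketch of the fallback idea for an inner join on one partition follows. It is an assumption-laden illustration, not the SPARK-32634 design: the row-count limit `max_build_rows` stands in for "failed to acquire memory", both inputs are re-readable lists, and the helper names are hypothetical.

```python
def sort_merge_join(left, right, key):
    """Inner join via sort + merge (used here as the fallback path)."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # Pair every left row of this key with the whole right run.
            while i < len(left) and key(left[i]) == kl:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def hash_join_with_fallback(stream_rows, build_rows, key, max_build_rows):
    """Try a hash join; fall back to sort merge join if the build side is too big."""
    table = {}
    for n, row in enumerate(build_rows):
        if n >= max_build_rows:  # stand-in for a failed memory acquisition
            # Stop building the hash table and redo the join sort-based.
            return "sort_merge", sort_merge_join(stream_rows, build_rows, key)
        table.setdefault(key(row), []).append(row)
    out = [(s, b) for s in stream_rows for b in table.get(key(s), [])]
    return "hash", out
```

Both paths return the same joined pairs; only the strategy label and the memory profile differ.
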
  15. Leverage Bloom Filter for Join ▪ Why? ▪ Save compute resource, improve CPU and IO for shuffled hash join and sort merge join ▪ How? ▪ Build a bloom filter on the join key of the smaller side ▪ Use the bloom filter to filter out rows when scanning the larger side ▪ Reduce the amount of data to process in the following stages (less data to shuffle/sort/etc.) ▪ PR Status ▪ Under discussion with community members, will submit JIRA later
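
To make the bloom filter idea concrete, here is a small self-contained sketch. The `BloomFilter` class, its sizing (1024 bits, 3 hashes), and `prefilter_large_side` are hypothetical and chosen for illustration only; the slides do not describe how the filter would be sized or built in Spark.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        data = repr(item).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def prefilter_large_side(small_rows, large_rows, key):
    """Drop large-side rows whose join key cannot be on the small side."""
    bf = BloomFilter()
    for row in small_rows:
        bf.add(key(row))
    return [row for row in large_rows if bf.might_contain(key(row))]
```

Rows that survive the filter still go through the real join, so false positives cost only some extra work and never change the result.
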
  16. Stream-stream Join Improvement (SPARK-32883) ▪ Agenda ▪ Left semi join support ▪ Full outer join support
  17. Quick Refresher on Stream-Stream Join ▪ Join Physical Operator ▪ StreamingSymmetricHashJoinExec (stream-stream join for Structured Streaming)
  18. Stream-stream join ▪ Shuffle both sides ▪ Join both sides by looking up from each other's state store [Diagram: Scan A and Scan B -> Shuffle -> Join, with State Store A and State Store B in each join task]
  19. Left semi stream-stream join (SPARK-32862) ▪ Why? ▪ Left semi is even more popular than left outer (observed on some FB streaming workloads) ▪ Example: get all ad impressions (left side) that have an ad click (right side), without caring what those clicks are ▪ How? ▪ For a left side input row, check if there's a match in the right side state store ▪ If there's a match, output the left side row, but do not put the row in the left side state store (it is not needed there). ▪ If there's no match, output nothing, but put the row in the left side state store (with the "matched" field set to false). ▪ For a right side input row, check if there's a match in the left side state store. ▪ Of the matched left rows in the state store, output those whose "matched" field is false, then set "matched" to true on all of them. Outputting each left side row only on its first match guarantees left semi join semantics. ▪ State store eviction: evict rows from the left/right side state stores below the watermark, same as inner join. ▪ PR Status ▪ Merged, will be available in Spark 3.1
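
The left semi steps above can be sketched with two in-memory dictionaries standing in for the state stores. This is a simplified illustration under stated assumptions: `LeftSemiStreamJoin` is a hypothetical class, the join condition is key equality only, and watermark-based eviction is left out.

```python
from collections import defaultdict


class LeftSemiStreamJoin:
    """In-memory stand-in for the two state stores of a left semi join."""

    def __init__(self):
        self.left_state = defaultdict(list)   # key -> list of [row, matched]
        self.right_state = defaultdict(list)  # key -> list of rows

    def on_left(self, key, row):
        # Match already present on the right: emit now, no need to store.
        if self.right_state.get(key):
            return [row]
        # No match yet: remember the row with matched = False.
        self.left_state[key].append([row, False])
        return []

    def on_right(self, key, row):
        out = []
        # Emit each stored left row only the first time it is matched.
        for entry in self.left_state.get(key, []):
            if not entry[1]:
                entry[1] = True
                out.append(entry[0])
        self.right_state[key].append(row)
        return out
```

In the ad example, a second click on the same ad produces no further output for an impression that was already emitted.
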
  20. Full outer stream-stream join (SPARK-32863) ▪ How? ▪ For a left side input row, check if there's a match in the right side state store. If there's a match, output all matched rows. Put the row in the left side state store. ▪ For a right side input row, check if there's a match in the left side state store. If there's a match, output all matched rows and set the "matched" field of those left side rows to true. Put the row in the right side state store. ▪ When a left side row is evicted from the state store, output the row if its "matched" field is false. ▪ When a right side row is evicted from the state store, output the row if its "matched" field is false. ▪ PR Status ▪ WIP
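
A minimal sketch of the full outer variant follows, again with dictionaries in place of state stores. Several details are assumptions rather than statements from the slides: the class `FullOuterStreamJoin` is hypothetical, "matched" is tracked symmetrically on both sides, each row carries an explicit `event_time`, and eviction removes rows whose time is below the watermark.

```python
from collections import defaultdict


class FullOuterStreamJoin:
    """In-memory stand-in for a full outer stream-stream join."""

    def __init__(self):
        # side -> key -> list of [row, event_time, matched]
        self.state = {"left": defaultdict(list), "right": defaultdict(list)}

    def on_row(self, side, key, row, event_time):
        other = "right" if side == "left" else "left"
        out, matched = [], False
        # Join the new row against everything stored for the other side.
        for entry in self.state[other].get(key, []):
            entry[2] = True
            matched = True
            out.append((row, entry[0]) if side == "left" else (entry[0], row))
        # Always store the row so later rows from the other side can match it.
        self.state[side][key].append([row, event_time, matched])
        return out

    def evict(self, watermark):
        """Drop rows below the watermark; emit the never-matched ones padded with None."""
        out = []
        for side in ("left", "right"):
            for key in list(self.state[side]):
                keep = []
                for row, ts, matched in self.state[side][key]:
                    if ts >= watermark:
                        keep.append([row, ts, matched])
                    elif not matched:
                        out.append((row, None) if side == "left" else (None, row))
                if keep:
                    self.state[side][key] = keep
                else:
                    del self.state[side][key]
        return out
```

Unmatched rows are only emitted at eviction time, because until the watermark passes a matching row could still arrive on the other side.
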
  21. Future Work ▪ History-based optimization (HBO) to select best join strategy ▪ Decide between broadcast hash join, shuffled hash join, and sort merge join based on historical join input size
  22. Summary ▪ Shuffled Hash Join Improvement (SPARK-32461) ▪ Leverage Bloom Filter for Join (under discussion with community) ▪ Stream-stream Join Improvement (SPARK-32883)
  23. Thank you! Your feedback is important to us. Don’t forget to rate and review the sessions.