
Deep Dive: Memory Management in Apache Spark


Memory management is at the heart of any data-intensive system. Spark, in particular, must arbitrate memory allocation between two main use cases: buffering intermediate data for processing (execution) and caching user data (storage). This talk will take a deep dive through the memory management designs adopted in Spark since its inception and discuss their performance and usability implications for the end user.

  1. Deep Dive: Memory Management in Apache Spark. Andrew Or, June 8th, 2016. @andrewor14
  2. students.select("name").orderBy("age").cache().show() Caching, Tungsten, Off-heap, Memory Contention
  3. Efficient memory use is critical to good performance
  4. Memory contention poses three challenges for Apache Spark: How to arbitrate memory between execution and storage? How to arbitrate memory across tasks running in parallel? How to arbitrate memory across operators running within the same task?
  5. Two usages of memory in Apache Spark. Execution: memory used for shuffles, joins, sorts and aggregations. Storage: memory used to cache data that will be reused later.
  6. [Diagram] The values 4, 3, 5, 1, 6, 2 are sorted in execution memory, yielding the iterator 1, 2, 3, 4, 5, 6; take(3) reads the first three results. What if I want the sorted values again?
  7. [Diagram] Without caching, take(4) repeats the entire sort from scratch in execution memory, and so on for every subsequent action.
  8. [Diagram] With cache(), the sorted values are kept in storage memory, so take(3), take(4), take(5), ... reuse them without re-sorting. The sort runs in execution memory; the cached copy lives in storage memory.
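A minimal sketch of this pattern, assuming a running SparkSession named `spark` (the data and app name are illustrative):

```scala
// Caching a sorted result so repeated actions do not re-sort.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cache-demo").getOrCreate()
import spark.implicits._

val nums = Seq(4, 3, 5, 1, 6, 2).toDF("n")

// Without cache(): every action re-runs the sort in execution memory.
val sorted = nums.orderBy("n")
sorted.take(3)   // sorts, returns the first three values
sorted.take(4)   // sorts again from scratch

// With cache(): the computed partitions are kept in storage memory.
val cached = nums.orderBy("n").cache()
cached.take(3)   // sorts once, caching what it computes
cached.take(4)   // reuses the cached blocks instead of re-sorting
```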
  9. Challenge #1: How to arbitrate memory between execution and storage?
  10. Easy, static assignment! [Diagram: the total available memory is split into fixed Execution and Storage regions] Spark 1.0, May 2014
  11. Easy, static assignment! [Diagram: Execution fills its region and spills to disk] Spark 1.0, May 2014
  12. Easy, static assignment! [Diagram: Storage fills its own region] Spark 1.0, May 2014
  13. Easy, static assignment! [Diagram: Storage evicts the LRU block to disk] Spark 1.0, May 2014
  14. Inefficient memory use leads to bad performance
  15. Easy, static assignment! Execution can only use a fraction of the memory, even when there is no storage! Spark 1.0, May 2014
  16. Easy, static assignment! Efficient use of memory required user tuning (see the configuration sketch below). Spark 1.0, May 2014
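Roughly, the tuning in question used the legacy Spark 1.x fractions; the values below are the defaults of that era, cited from memory rather than from the slides:

```scala
// Legacy static split (Spark 1.x): two separately tuned fractions.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Fraction of the heap set aside for cached (storage) blocks.
  .set("spark.storage.memoryFraction", "0.6")
  // Fraction of the heap set aside for shuffle (execution) buffers.
  .set("spark.shuffle.memoryFraction", "0.2")
// A shuffle-heavy job that caches nothing would waste the storage region
// unless the user rebalanced these by hand.
```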
  17. Fast forward to 2016… How could we have done better?
  18. [Diagram: a single region shared by Execution and Storage]
  19. Unified memory management. What happens if there is already storage? Spark 1.6+, Jan 2016
  20. Unified memory management. Evict the LRU block to disk. Spark 1.6+, Jan 2016
  21. Unified memory management. What about the other way round? Spark 1.6+, Jan 2016
  22. Unified memory management. Evict the LRU block to disk. Spark 1.6+, Jan 2016
  23. Design considerations. Why evict storage, not execution? Spilled execution data will always be read back from disk, whereas cached data may not. What if the application relies on caching? Allow the user to specify a minimum unevictable amount of cached data (not a reservation!). Spark 1.6+, Jan 2016
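The two knobs behind this design, as a hedged sketch; the defaults shown are the ones I recall from the Spark 1.6 documentation:

```scala
// Unified memory management knobs (Spark 1.6+).
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Fraction of the heap shared by execution and storage together.
  .set("spark.memory.fraction", "0.75")
  // Portion of that shared region where cached blocks are immune to
  // eviction. It is a floor, not a reservation: execution may still
  // borrow this space while no cached data occupies it.
  .set("spark.memory.storageFraction", "0.5")
```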
  24. Challenge #2: How to arbitrate memory across tasks running in parallel?
  25. Easy, static assignment! The worker machine has 4 cores, so each task gets 1/4 of the total memory (Slot 1, Slot 2, Slot 3, Slot 4).
  26. Alternative: dynamic assignment. The share of each task depends on the number of actively running tasks (N). [Diagram: Task 1 holds all the memory]
  27. Alternative: dynamic assignment. Now another task comes along, so the first task will have to spill.
  28. Alternative: dynamic assignment. Each task is now assigned 1/N of the memory, where N = 2 (Task 1, Task 2).
  29. Alternative: dynamic assignment. Each task is now assigned 1/N of the memory, where N = 4 (Task 1, Task 2, Task 3, Task 4).
  30. Alternative: dynamic assignment. The last remaining task gets all the memory, because N = 1 (Task 3). Spark 1.0+, May 2014
  31. Static vs. dynamic assignment: both are fair and starvation-free; static assignment is simpler, while dynamic assignment handles stragglers better (a sketch of the dynamic rule follows).
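An illustrative sketch of the dynamic 1/N rule, not Spark's actual implementation (the real logic lives in Spark's execution memory pool and additionally guarantees each task at least 1/(2N) before it can be forced to spill):

```scala
// Dynamic per-task arbitration: with N active tasks, a task may hold at
// most 1/N of the execution pool; a grant of 0 means spill or wait.
class ExecutionPoolSketch(poolSize: Long) {
  private val granted = scala.collection.mutable.Map.empty[Long, Long]

  def acquire(taskId: Long, bytes: Long): Long = synchronized {
    granted.getOrElseUpdate(taskId, 0L)
    val n = granted.size                    // number of active tasks
    val maxPerTask = poolSize / n           // each task's 1/N cap
    val grant = math.max(0L, math.min(bytes, maxPerTask - granted(taskId)))
    granted(taskId) += grant
    grant
  }

  // When a task finishes, its share is redistributed implicitly:
  // N shrinks, so the remaining tasks' caps grow.
  def release(taskId: Long): Unit = synchronized { granted -= taskId }
}
```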
  32. Challenge #3: How to arbitrate memory across operators running within the same task?
  33. SELECT age, avg(height) FROM students GROUP BY age ORDER BY avg(height)
      students.groupBy("age").avg("height").orderBy("avg(height)").collect()
      Physical plan: Scan, Project, Aggregate, Sort
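A self-contained sketch of the same query, assuming a SparkSession `spark` and toy data in place of a real students table:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("agg-sort-demo").getOrCreate()
import spark.implicits._

val students = Seq(
  ("ann", 20, 154), ("bob", 20, 174), ("cat", 21, 167), ("dan", 22, 188)
).toDF("name", "age", "height")

students
  .groupBy("age")
  .avg("height")              // Aggregate: hash map keyed by age
  .orderBy("avg(height)")     // Sort: runs within the same task's memory
  .collect()
```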
  34. The worker has 6 pages of memory. Operators: Scan, Project, Aggregate, Sort.
  35. Aggregate builds a map from age to heights: { 20 → [154, 174, 175], 21 → [167, 168, 181], 22 → [155, 166, 188], 23 → [160, 168, 178, 183] }
  36. All 6 pages were used by Aggregate, leaving no memory for Sort!
  37. Solution #1: Reserve a page for each operator (Scan, Project, Aggregate, Sort).
  38. Solution #1 is starvation-free, but still not fair… What if there were more operators?
  39. Solution #2: Cooperative spilling.
  40. [Diagram: Aggregate holds the pages while Sort requests memory]
  41. Solution #2: Sort forces Aggregate to spill a page to free memory.
  42. Solution #2: Sort needs more memory, so it forces Aggregate to spill another page (and so on).
  43. Solution #2: Sort finishes with 3 pages; Aggregate does not have to spill its remaining pages. Spark 1.6+, Jan 2016
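An illustrative sketch of the cooperative-spilling idea; Spark's real mechanism lives in its task memory manager and memory consumer interfaces, and the names below are simplifications of ours:

```scala
// Cooperative spilling: an operator that cannot get memory forces its
// sibling operators in the same task to spill pages to disk.
trait Consumer {
  def memoryUsed: Long
  def spill(bytes: Long): Long   // free up to `bytes`; return amount freed
}

class TaskMemorySketch(var free: Long, consumers: Seq[Consumer]) {
  // e.g. Sort asks for pages; Aggregate is forced to spill until it fits.
  def acquire(requester: Consumer, bytes: Long): Boolean = {
    var needed = bytes - free
    for (c <- consumers if needed > 0 && (c ne requester) && c.memoryUsed > 0) {
      val freed = c.spill(needed)   // writes a page to disk, frees memory
      free += freed
      needed -= freed
    }
    if (bytes <= free) { free -= bytes; true } else false
  }
}
```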
  44. Recap: three sources of contention. How to arbitrate memory ● between execution and storage? ● across tasks running in parallel? ● across operators running within the same task? Instead of statically reserving memory in advance, deal with memory contention when it arises by forcing memory consumers to spill.
  45. Project Tungsten: binary in-memory data representation, cache-aware computation, code generation (next time). Spark 1.4+, Jun 2015
  46. Java objects have large overheads. The string "abcd":
      • Native: 4 bytes with UTF-8 encoding
      • Java: 48 bytes
        – 12-byte header
        – 2 bytes per character (UTF-16 internal representation)
        – 20 bytes of additional overhead
        – 8-byte hash code
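You can observe such overheads with Spark's own size estimator; the exact figure varies by JVM and settings:

```scala
// Estimate the in-memory footprint of a Java object as Spark sees it.
import org.apache.spark.util.SizeEstimator

// Roughly the 48-byte story above: object header, UTF-16 characters,
// array overhead, and the cached hash field, versus 4 bytes as UTF-8.
println(SizeEstimator.estimate("abcd"))
```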
  47. Java-objects-based row format. Schema (Int, String, String): a Row holding an Array that points to BoxedInteger(123), String("data"), and String("bricks"). 5+ objects, high space overhead, expensive hashCode().
  48. Tungsten row format. The tuple (123, "data", "bricks") becomes: a null-tracking bitmap (0x0), the fixed-width value 123, then two offsets to variable-length data: 32L (the 4-byte "data") and 48L (the 6-byte "bricks").
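An illustrative sketch (ours, not Spark source) of the central trick: each variable-length field occupies one fixed 8-byte slot that packs the offset to its bytes together with its length:

```scala
// Pack (offset, length) of a variable-length field into one 8-byte word,
// so the fixed-width region of the row stays fixed-width.
def pack(offset: Int, length: Int): Long =
  (offset.toLong << 32) | (length.toLong & 0xFFFFFFFFL)

def offsetOf(word: Long): Int = (word >>> 32).toInt
def lengthOf(word: Long): Int = word.toInt

val dataField = pack(32, 4)   // "data": starts at byte 32, 4 bytes long
assert(offsetOf(dataField) == 32 && lengthOf(dataField) == 4)
```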
  49. Cache-aware computation, e.g. sorting a list of records. Naive layout: an array of pointers to (key, record) pairs gives poor cache locality, since every comparison chases a pointer. Cache-aware layout: store a key prefix next to each pointer, giving good cache locality.
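A small sketch of that layout idea, assuming ASCII sort keys (the helper names are ours):

```scala
// Cache-aware sort: compare fixed-size key prefixes stored inline with the
// pointers; dereference the full record only to break prefix ties.
final case class Entry(prefix: Long, pointer: Int)  // pointer indexes records

def sortCacheAware(records: Array[String]): Array[String] = {
  // First 8 bytes of the key, left-aligned (correct ordering for ASCII).
  def prefixOf(s: String): Long =
    s.take(8).padTo(8, '\u0000').foldLeft(0L)((acc, ch) => (acc << 8) | (ch & 0xFF))

  val entries = records.indices.map(i => Entry(prefixOf(records(i)), i))
  val sorted = entries.sortWith { (a, b) =>
    if (a.prefix != b.prefix) a.prefix < b.prefix    // compact array stays in cache
    else records(a.pointer) < records(b.pointer)     // rare full comparison
  }
  sorted.map(e => records(e.pointer)).toArray
}
```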
  50. Off-heap memory. Available for execution since Apache Spark 1.6; available for storage since Apache Spark 2.0. Very important for large heaps. Many potential advantages: memory sharing, zero-copy I/O, dynamic allocation.
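Enabling it is a two-setting affair; a hedged sketch, with an arbitrary example size:

```scala
// Turn on off-heap memory for Spark's unified memory manager.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.memory.offHeap.enabled", "true")
  // Size of the off-heap pool; it lives outside the GC-managed heap,
  // so large values do not lengthen garbage collection pauses.
  .set("spark.memory.offHeap.size", "2g")
```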
  51. For more info...
      Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal (https://www.youtube.com/watch?v=5ajs8EIPWGI)
      Spark Performance: What's Next (https://www.youtube.com/watch?v=JX0CdOTWYX4)
      Unified Memory Management (https://issues.apache.org/jira/browse/SPARK-10000)
  52. Databricks Community Edition: a free version of the cloud-based platform, now in beta. More than 8,000 users have registered and created over 61,000 notebooks in different languages. http://www.databricks.com/try
  53. Thank you! andrew@databricks.com / @andrewor14
