
A Developer's View Into Spark's Memory Model with Wenchen Fan

As part of Project Tungsten, we started an ongoing effort to substantially improve the memory and CPU efficiency of Apache Spark's backend execution and push performance closer to the limits of modern hardware. In this talk, we take a deep dive into Apache Spark's unified memory model and discuss how Spark exploits the memory hierarchy and leverages application semantics to manage memory explicitly (both on- and off-heap), eliminating the overheads of the JVM object model and garbage collection.



  1. A Developer's View Into Spark's Memory Model. Wenchen Fan, 2017-06-07
  2. About Me • Software Engineer @ Databricks • Apache Spark Committer • One of the most active Spark contributors
  3. About Databricks • TEAM: Started the Spark project (now Apache Spark) at UC Berkeley in 2009 • MISSION: Make Big Data Simple • PRODUCT: Unified Analytics Platform
  4. [Cluster diagram] Master and Workers 1-3; the Driver talks to Executors on each worker; each Executor contains a Memory Manager and a Thread Pool
  5. Memory Model inside Executor [diagram]: data is held in an internal binary format; operators and the Cache Manager allocate memory pages from the Memory Manager
  6. Memory Model inside Executor (diagram repeated)
  7. Memory Allocation • Allocation happens at page granularity. • Off-heap is supported! • Pages are not fixed-size, but have a lower and an upper bound. • No pooling: pages are freed as soon as no data remains on them.
  8. Why variable-length pages and no pooling? • Pros: • simplifies the implementation (no single record spans multiple pages) • memory is freed immediately, so the OS can reuse it for file buffers, etc. • Cons: • cannot handle a very large single record (very rare in practice) • fragmentation for records bigger than the page-size lower bound (the lower bound is several megabytes, so this is also rare) • allocation overhead (most malloc implementations handle this well)
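The allocation policy above can be sketched in a few lines. This is an illustrative model, not Spark's actual allocator; the class name and the bound values are hypothetical (the real bounds are configuration-dependent):

```python
# Illustrative sketch: variable-length pages with a lower and an upper size
# bound, freed immediately with no pooling. Bound values are hypothetical.
PAGE_MIN = 4 * 1024 * 1024   # assumed lower bound: 4 MB
PAGE_MAX = 64 * 1024 * 1024  # assumed upper bound: 64 MB

class PageAllocator:
    def __init__(self):
        self.pages = {}     # page id -> bytearray backing the page
        self.next_id = 0

    def allocate(self, required):
        """Allocate one page big enough for `required` bytes, clamped to bounds."""
        if required > PAGE_MAX:
            # A single record larger than the upper bound cannot be handled.
            raise ValueError("record larger than the page upper bound")
        size = max(required, PAGE_MIN)
        page_id = self.next_id
        self.next_id += 1
        self.pages[page_id] = bytearray(size)
        return page_id

    def free(self, page_id):
        # No pooling: the memory is released immediately so the OS can
        # reuse it (e.g. for file buffers).
        del self.pages[page_id]

alloc = PageAllocator()
big = alloc.allocate(10 * 1024 * 1024)   # gets a 10 MB page
small = alloc.allocate(100)              # clamped up to the lower bound
```

A record just above the lower bound wastes the remainder of its page (the fragmentation con on the slide), but because the lower bound is several megabytes this is rare in practice.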
  9. Memory Model inside Executor (diagram repeated)
  10. Java-Objects-Based Row Format • the row (123, "data", "bricks") becomes an Array holding BoxedInteger(123), String("data"), String("bricks") • 5+ objects per row • high space overhead • slow value access • expensive hashCode()
  11. Data as objects? No! • It is hard to monitor and control memory usage with many small objects. • Garbage collection becomes the killer. • High serialization cost when transferring data inside the cluster.
  12. Efficient Binary Format • the row (123, "data", "bricks") starts with a null-tracking bitmap word (0x0: no nulls)
  13. Efficient Binary Format • the int 123 is stored inline in its fixed-length slot
  14. Efficient Binary Format • the slot for "data" stores its offset (32) and length (4); the bytes live in the variable-length region
  15. Efficient Binary Format • likewise, "bricks" is stored at offset 40 with length 6
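The layout built up in slides 12-15 can be sketched as follows. This is a simplified, hypothetical re-implementation of the idea, not Spark's actual UnsafeRow code: one 8-byte null-bitmap word, one 8-byte slot per field (ints inline, strings as a packed (offset, length) pair), then the variable-length bytes padded to 8-byte boundaries:

```python
import struct

def encode_row(values):
    """Encode a row in an UnsafeRow-like layout (simplified sketch)."""
    n = len(values)
    null_bits = 0
    fixed = bytearray(8 * n)          # one 8-byte slot per field
    var = bytearray()                 # variable-length region
    var_start = 8 + 8 * n             # var region begins after bitmap + slots
    for i, v in enumerate(values):
        if v is None:
            null_bits |= 1 << i       # mark the field in the null bitmap
        elif isinstance(v, int):
            struct.pack_into("<q", fixed, 8 * i, v)   # int stored inline
        else:                         # str: slot holds (offset, length)
            data = v.encode("utf-8")
            offset = var_start + len(var)
            struct.pack_into("<ii", fixed, 8 * i, offset, len(data))
            var += data
            var += b"\x00" * (-len(data) % 8)   # pad to 8-byte alignment
    return struct.pack("<q", null_bits) + bytes(fixed) + bytes(var)

row = encode_row([123, "data", "bricks"])
# "data" lands at offset 32 (right after the 8-byte bitmap and 3 x 8-byte
# slots) with length 4, and "bricks" at offset 40 with length 6 -- the
# same numbers shown on the slide.
```

Because every field has a fixed-width slot, the n-th value can be located with pure offset arithmetic, with no object graph to traverse and nothing for the garbage collector to track.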
  16. Memory Model inside Executor (diagram repeated)
  17. Operate On Binary [diagram]: operators such as substring and numeric comparison (123 > …) run directly on the binary rows parsed from JSON files, producing binary output ("data" → "da") without deserialization
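As a sketch of operating directly on binary data, the following hypothetical helper performs substring on raw bytes via a stored (offset, length) slot, never materializing a string object for the input (the buffer layout here is illustrative, not Spark's):

```python
import struct

# A hypothetical pre-encoded buffer: an 8-byte slot holding (offset, length),
# followed by the raw UTF-8 bytes of "data".
buf = bytearray(8) + b"data"
struct.pack_into("<ii", buf, 0, 8, 4)   # slot: offset=8, length=4

def substring_binary(buf, slot_offset, start, end):
    """Substring computed directly on the binary buffer: read the
    (offset, length) slot, then slice the raw bytes."""
    off, length = struct.unpack_from("<ii", buf, slot_offset)
    start = min(start, length)
    end = min(end, length)
    return bytes(buf[off + start:off + end])

substring_binary(buf, 0, 0, 2)   # b"da" -- no decode of the full value
```

The output is itself raw bytes, so a downstream operator can keep working in the binary domain.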
  18. How to process binary data more efficiently?
  19. Understanding CPU Cache • Memory keeps getting slower relative to the CPU.
  20. Understanding CPU Cache • Pre-fetch frequently accessed data into the CPU cache.
  21. The 2 most important algorithms in big data are … Sort and Hash!
  22. Naive Sort [diagram]: an array of pointers, each pointing at a record ("bbb", "aaa", "ccc") scattered elsewhere in memory
  23. Naive Sort (diagram repeated: each comparison follows two pointers)
  24. Naive Sort (diagram repeated)
  25. Naive Sort (diagram repeated)
  26. Naive Sort • Each comparison accesses 2 different memory regions, which makes it hard for the CPU cache to pre-fetch data: poor cache locality!
  27. Cache-aware Sort [diagram]: each pointer is stored together with a key-prefix; the records live elsewhere
  28. Cache-aware Sort (diagram repeated)
  29. Cache-aware Sort (diagram repeated)
  30. Cache-aware Sort • Most of the time we just scan the key-prefixes linearly: good cache locality!
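The prefix trick above can be sketched as follows. This is an illustrative model in the spirit of Spark's cache-aware sort, not its actual implementation: keep (key-prefix, pointer) pairs in one contiguous array, compare the inline prefixes first, and dereference the full record only when the prefixes tie:

```python
import functools

# Records scattered "elsewhere in memory"; a list index stands in for a pointer.
records = ["bbb-rest", "aaa-rest", "ccc-rest", "aab-rest"]

# One contiguous array of (key-prefix, pointer) pairs.
PREFIX_LEN = 3
entries = [(r[:PREFIX_LEN], i) for i, r in enumerate(records)]

def compare(a, b):
    if a[0] != b[0]:
        # Prefix comparison: reads only the contiguous entries array,
        # no record dereference -- the cache-friendly common case.
        return -1 if a[0] < b[0] else 1
    # Prefix tie: fall back to dereferencing the full records.
    ra, rb = records[a[1]], records[b[1]]
    return (ra > rb) - (ra < rb)

entries.sort(key=functools.cmp_to_key(compare))
sorted_records = [records[i] for _, i in entries]
```

With well-chosen prefixes, ties are rare, so almost every comparison stays inside the single entries array that the pre-fetcher can stream through linearly.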
  31. Naive Hash Map [diagram]: an array of pointers; each entry points at a (key, value) pair elsewhere in memory: (key1, v1), (key2, v2), (key3, v3)
  32. Naive Hash Map [diagram]: looking up key3
  33. Naive Hash Map [diagram]: hash(key) % size picks a slot
  34. Naive Hash Map [diagram]: dereference the pointer and compare the 2 keys
  35. Naive Hash Map [diagram]: on a mismatch, quadratic probing moves to the next slot
  36. Naive Hash Map [diagram]: dereference again and compare the 2 keys
  37. Naive Hash Map • Each lookup needs many pointer dereferences and key comparisons when hash collisions happen, and jumps between 2 memory regions: bad cache locality!
  38. Cache-aware Hash Map [diagram]: each pointer is stored together with the key's full hash; the (key, value) pairs live elsewhere
  39. Cache-aware Hash Map [diagram]: looking up key3
  40. Cache-aware Hash Map [diagram]: hash(key) % size picks a slot, then compare the full hash
  41. Cache-aware Hash Map [diagram]: on a full-hash mismatch, quadratic probing moves on, still comparing only full hashes
  42. Cache-aware Hash Map [diagram]: only when the full hash matches, dereference the pointer and compare the 2 keys
  43. Cache-aware Hash Map • Each lookup mostly needs only one pointer dereference and one key comparison (full-hash collisions are rare), and accesses data mostly within a single memory region: better cache locality!
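The hash-map walkthrough above can be sketched as follows. This is an illustrative model, not Spark's BytesToBytesMap: an open-addressing table whose slots store the full hash inline next to the key and value, so probing can skip non-matching slots on a hash comparison alone (in a real off-heap layout the key sits behind a pointer, so each skipped comparison avoids a likely cache miss):

```python
class CacheAwareHashMap:
    """Open-addressing map (sketch, no resizing): each slot stores
    (full_hash, key, value); probing compares full hashes first."""

    def __init__(self, capacity=16):        # capacity: power of two
        self.capacity = capacity
        self.slots = [None] * capacity

    def _probe(self, key):
        """Yield slot indices: hash(key) % size, then quadratic probing."""
        h = hash(key)
        i = h % self.capacity
        step = 1
        while True:
            yield h, i
            i = (i + step) % self.capacity  # triangular/quadratic probing
            step += 1

    def put(self, key, value):
        for h, i in self._probe(key):
            slot = self.slots[i]
            if slot is None or (slot[0] == h and slot[1] == key):
                self.slots[i] = (h, key, value)
                return

    def get(self, key):
        for h, i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                return None                  # empty slot: key is absent
            if slot[0] == h and slot[1] == key:
                # Full hash matched first; only then compare the keys.
                return slot[2]
            # Full-hash mismatch: probe on without touching the key.

m = CacheAwareHashMap()
m.put("key1", "v1"); m.put("key2", "v2"); m.put("key3", "v3")
m.get("key3")   # "v3"
```

In this Python sketch the hash comparison and the key comparison cost about the same; the payoff appears in the binary layout the slides describe, where the full hash lives in the probed array but the key does not.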
  44. Recap: Cache-aware data structures • How to improve cache locality … • store a key-prefix with each pointer. • store the key's full hash with each pointer. • Store extra information to keep memory access within a single region.
  45. Memory Model inside Executor (diagram repeated)
  46. Future Work • SPARK-19489: Stable serialization format for external & native code integration. • SPARK-15689: Data Source API v2. • SPARK-15687: Columnar execution engine.
  47. Try Apache Spark in Databricks! • UNIFIED ANALYTICS PLATFORM: collaborative cloud environment; free version (community edition) • DATABRICKS RUNTIME 3.0: Apache Spark optimized for the cloud; caching and optimization layer (DBIO); enterprise security (DBES) • Try for free today: databricks.com
  48. Thank You
