A Developer’s View into Spark's Memory Model with Wenchen Fan
As part of Project Tungsten, we started an ongoing effort to substantially improve the memory and CPU efficiency of Apache Spark’s backend execution and push performance closer to the limits of modern hardware. In this talk, we’ll take a deep dive into Apache Spark’s unified memory model and discuss how Spark exploits the memory hierarchy and leverages application semantics to manage memory explicitly (both on- and off-heap) to eliminate the overheads of the JVM object model and garbage collection.

Published in: Data & Analytics
  1. A Developer’s View Into Spark’s Memory Model. Wenchen Fan, 2017-6-7.
  2. About Me • Software Engineer @ Databricks • Apache Spark Committer • One of the most active Spark contributors
  3. About Databricks. TEAM: started the Spark project (now Apache Spark) at UC Berkeley in 2009. MISSION: Make Big Data Simple. PRODUCT: Unified Analytics Platform.
  4. [Diagram: a Spark cluster with a Master and Workers 1, 2, and 3; Drivers launch Executors on the Workers, and each Executor contains a Memory Manager and a Thread Pool.]
  5. Memory Model inside Executor. [Diagram: incoming data is converted to an internal binary format and processed by operators; the Cache Manager and the operators allocate memory from the Memory Manager and receive memory pages.]
  6. Memory Model inside Executor. [The same diagram, turning to the Memory Manager.]
  7. Memory Allocation • Allocation happens at page granularity. • Off-heap memory is supported! • Pages are not fixed-size, but have a lower and an upper bound. • No pooling: pages are freed as soon as no data remains on them.
  8. Why variable-length pages and no pooling? • Pros: • simplifies the implementation (no single record will span pages) • memory is freed immediately, so the OS can use it for file buffers, etc. • Cons: • cannot handle a very large single record (very rare in practice) • fragmentation for records bigger than the page-size lower bound (the lower bound is several megabytes, so this is also rare) • overhead in allocation (most malloc algorithms should handle this well). A sketch of these rules follows below.
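To make the allocation rules concrete, here is a minimal sketch. The PageAllocator class and the 4 MB / 64 MB bounds are assumptions for illustration only; Spark’s real logic lives in its task memory manager and allocator, which this does not reproduce:

```scala
import java.nio.ByteBuffer

// Hypothetical page allocator illustrating the slide's rules; the class name
// and the size bounds are made up for this sketch.
class PageAllocator(offHeap: Boolean) {
  val MinPageSize: Int = 4 * 1024 * 1024   // assumed lower bound: pages are at least this big
  val MaxPageSize: Int = 64 * 1024 * 1024  // assumed upper bound: no record may exceed this

  // Allocation happens at page granularity: one variable-length page per request.
  def allocatePage(requiredBytes: Int): ByteBuffer = {
    require(requiredBytes <= MaxPageSize, "a single record cannot span pages")
    val size = math.max(requiredBytes, MinPageSize)
    if (offHeap) ByteBuffer.allocateDirect(size)  // off-heap supported
    else ByteBuffer.allocate(size)                // on-heap byte array
  }

  // No pooling: a page with no live data is simply released, so the OS/JVM
  // can reuse the memory (e.g. for file buffers).
  def freePage(page: ByteBuffer): Unit = ()       // drop the reference; nothing is retained
}
```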
  9. Memory Model inside Executor. [The same diagram, turning to the internal binary format.]
  10. Java-Objects-Based Row Format. [Diagram: the row (123, “data”, “bricks”) stored as an Array referencing BoxedInteger(123), String(“data”), and String(“bricks”).] • 5+ objects • high space overhead • slow value access • expensive hashCode()
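In code, this naive representation is nothing more than an array of boxed objects, for example:

```scala
// Object-based row for (123, "data", "bricks"): the Array, the boxed Integer,
// and the two Strings (each with its own char array) add up to 5+ heap objects.
val row: Array[Any] = Array(123, "data", "bricks")

// Every read is a pointer chase plus, for the Int, an unboxing step.
val id: Int = row(0).asInstanceOf[Int]

// hashCode() must hash every element object: expensive.
val h = java.util.Arrays.hashCode(row.asInstanceOf[Array[AnyRef]])
```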
  11. Data objects? No! • It is hard to monitor and control memory usage when we have a lot of objects. • Garbage collection will be the killer. • High serialization cost when transferring data across the cluster.
  12. Efficient Binary Format. [Diagram: the layout for (123, “data”, “bricks”) begins with a null-tracking bitmap at offset 0x0.]
  13. Efficient Binary Format. [Next, the fixed-length field: the integer 123 is stored inline in its slot.]
  14. Efficient Binary Format. [Next, the variable-length field “data”: its slot stores the offset (32) and length (4) of the bytes in the variable-length region.]
  15. Efficient Binary Format. [Finally, “bricks”: its slot stores offset 40 and length 6. The complete layout is the null-tracking bitmap, 123, the (offset, length) pairs (32, 4) and (40, 6), and then the bytes of “data” and “bricks”.]
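Here is a minimal sketch of writing that layout with a plain ByteBuffer. It follows the diagram’s conventions (an 8-byte null bitmap, one 8-byte slot per field, 8-byte alignment of variable-length data); it is a simplified stand-in, not Spark’s actual UnsafeRow code:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

// Encode (id, a, b) as: [8-byte null bitmap | 3 x 8-byte slots | var-length data].
// Fixed-size values sit inline in their slot; strings store (offset << 32 | length).
def encodeRow(id: Int, a: String, b: String): ByteBuffer = {
  val aBytes = a.getBytes(UTF_8)
  val bBytes = b.getBytes(UTF_8)
  val fixedLen = 8 + 3 * 8                          // bitmap + 3 slots = 32 bytes
  def padded(n: Int) = (n + 7) / 8 * 8              // 8-byte alignment
  val buf = ByteBuffer.allocate(fixedLen + padded(aBytes.length) + padded(bBytes.length))

  buf.putLong(0, 0L)                                // null bitmap: nothing is null
  buf.putLong(8, id.toLong)                         // field 0: 123 stored inline

  var varOffset = fixedLen                          // "data" lands at offset 32
  buf.putLong(16, (varOffset.toLong << 32) | aBytes.length)  // slot: (32, 4)
  buf.position(varOffset); buf.put(aBytes)
  varOffset += padded(aBytes.length)                // "bricks" lands at offset 40

  buf.putLong(24, (varOffset.toLong << 32) | bBytes.length)  // slot: (40, 6)
  buf.position(varOffset); buf.put(bBytes)
  buf
}

// encodeRow(123, "data", "bricks") reproduces the offsets and lengths on the slide.
```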
  16. Memory Model inside Executor. [The same diagram, turning to the operators.]
  17. Operate On Binary. [Diagram: operators such as substring work directly on the binary format: an input row holding (123, “data”), e.g. parsed from JSON files, is turned into a new binary row holding (123, “da”) without ever being deserialized into Java objects.]
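Continuing the ByteBuffer sketch above (same hypothetical layout, not Spark’s operator code), a substring can be evaluated straight against the bytes, reading the (offset, length) slot and copying only the prefix:

```scala
// Evaluate substring(field, 0, n) directly on a row produced by encodeRow:
// decode the slot, then slice bytes; the input String is never materialized.
def substringField(row: ByteBuffer, ordinal: Int, n: Int): Array[Byte] = {
  val slot   = row.getLong(8 + ordinal * 8)   // 8-byte slot for this field
  val offset = (slot >>> 32).toInt            // high 32 bits: offset
  val length = slot.toInt                     // low 32 bits: length
  val out = new Array[Byte](math.min(n, length))
  row.position(offset)
  row.get(out)                                // copy just the prefix bytes
  out
}

// new String(substringField(encodeRow(123, "data", "bricks"), 1, 2), "UTF-8") == "da"
```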
  18. How to process binary data more efficiently?
  19. Understanding CPU Cache. Memory is becoming slower and slower relative to the CPU.
  20. Understanding CPU Cache. Pre-fetch frequently accessed data into the CPU cache.
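A tiny way to see the prefetcher at work (an illustrative microbenchmark, not from the talk; absolute numbers vary by machine and JIT warm-up): summing the same array sequentially versus in a shuffled order does identical work but has very different cache behavior:

```scala
import scala.util.Random

// Same additions either way; only the memory access pattern differs.
val n = 1 << 22
val data = Array.fill(n)(Random.nextInt())
val shuffled = Random.shuffle((0 until n).toVector).toArray  // random access order

def time(label: String)(body: => Long): Unit = {
  val t0 = System.nanoTime()
  val sum = body
  println(f"$label%-10s ${(System.nanoTime() - t0) / 1e6}%8.1f ms (sum=$sum)")
}

// Sequential scan: the hardware prefetcher streams the array through the cache.
time("seq")  { var i = 0; var s = 0L; while (i < n) { s += data(i); i += 1 }; s }
// Random order: most accesses miss the cache; the prefetcher cannot help.
time("rand") { var i = 0; var s = 0L; while (i < n) { s += data(shuffled(i)); i += 1 }; s }
```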
  21. The 2 most important algorithms in big data are... Sort and Hash!
  22. Naive Sort. [Diagram: an array of pointers referencing the records “bbb”, “aaa”, “ccc”, which live in a separate memory region.]
  23.–25. Naive Sort. [The same diagram over three slides: each comparison follows two pointers into the record region.]
  26. Naive Sort. Each comparison needs to access two different memory regions, which makes it hard for the CPU cache to pre-fetch data: poor cache locality!
  27. Cache-aware Sort. [Diagram: each pointer is stored next to a key-prefix of the record it points to.]
  28.–29. Cache-aware Sort. [The same diagram over two slides: comparisons read the adjacent key-prefixes instead of chasing the pointers.]
  30. Cache-aware Sort. Most of the time the sort just walks the key-prefixes linearly: good cache locality!
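A minimal sketch of the technique (illustrative only; Spark’s production version is its unsafe external sorter with record-pointer-plus-key-prefix entries, which this does not reproduce): keep a fixed-size prefix beside each pointer, compare prefixes first, and dereference only on a tie:

```scala
// Sort (prefix, index) pairs; the records themselves are touched only when
// two prefixes are equal, which is rare for well-distributed keys.
final case class Entry(prefix: Long, recordIndex: Int)  // index stands in for a pointer

// First 8 bytes of the UTF-8 string, big-endian, so unsigned Long order
// matches the lexicographic order of those bytes.
def keyPrefix(s: String): Long = {
  val bytes = s.getBytes("UTF-8")
  var p = 0L
  var i = 0
  while (i < 8) { p = (p << 8) | (if (i < bytes.length) bytes(i) & 0xffL else 0L); i += 1 }
  p
}

def cacheAwareSort(records: Array[String]): Array[String] = {
  val entries = Array.tabulate(records.length)(i => Entry(keyPrefix(records(i)), i))
  val sorted = entries.sortWith { (a, b) =>
    if (a.prefix != b.prefix)
      java.lang.Long.compareUnsigned(a.prefix, b.prefix) < 0          // linear, cache-friendly
    else
      records(a.recordIndex).compareTo(records(b.recordIndex)) < 0    // rare full-record compare
  }
  sorted.map(e => records(e.recordIndex))
}

// cacheAwareSort(Array("bbb", "aaa", "ccc")) == Array("aaa", "bbb", "ccc")
```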
  31. Naive Hash Map. [Diagram: a bucket array of pointers; each pointer references a (key, value) record such as (key1, v1) stored in a separate region.]
  32. Naive Hash Map. [Diagram: looking up key3 against the bucket array.]
  33. Naive Hash Map. [hash(key) % size selects a bucket.]
  34. Naive Hash Map. [The pointer there is dereferenced and its key compared with key3: a collision.]
  35. Naive Hash Map. [Quadratic probing moves to the next candidate bucket.]
  36. Naive Hash Map. [That pointer is dereferenced and the two keys compared: a match.]
  37. Naive Hash Map. When hash collisions happen, each lookup needs many pointer dereferences and key comparisons, jumping between two memory regions: bad cache locality!
  38. Cache-aware Hash Map. [Diagram: each bucket stores the pointer together with the full hash of its key; the (key, value) records live in a separate region.]
  39. Cache-aware Hash Map. [Diagram: looking up key3.]
  40. Cache-aware Hash Map. [hash(key) % size selects a bucket, and the stored full hash is compared before the pointer is followed.]
  41. Cache-aware Hash Map. [On a full-hash mismatch, quadratic probing moves on, still comparing only the stored full hashes.]
  42. Cache-aware Hash Map. [Only when a full hash matches is the pointer dereferenced and the two keys compared.]
  43. Cache-aware Hash Map. Each lookup usually needs only one pointer dereference and one key comparison (full-hash collisions are rare), and mostly accesses data in a single memory region: better cache locality!
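A minimal open-addressing sketch of the trick (illustrative; Spark’s real structure is BytesToBytesMap, which this does not reproduce): the table keeps each slot’s full hash beside its record index, so probing compares hashes and touches a record only on a hash match:

```scala
import scala.collection.mutable.ArrayBuffer

// Open-addressing map whose table stores (fullHash, recordIndex) side by side;
// the records live in separate arrays. Capacity must be a power of two, and
// this sketch never resizes, so keep it well under capacity.
class CacheAwareMap(capacity: Int) {
  private val hashes  = new Array[Int](capacity)
  private val indices = Array.fill(capacity)(-1)            // -1 marks an empty slot
  private val keys    = ArrayBuffer.empty[String]
  private val values  = ArrayBuffer.empty[String]

  private def fullHash(k: String): Int = k.hashCode | 1     // keep it nonzero for the sketch

  private def probe(k: String, h: Int): Int = {
    var pos  = h & (capacity - 1)                           // hash(key) % size
    var step = 1
    // Compare stored full hashes first: no pointer chase on a mismatch.
    while (indices(pos) != -1 && !(hashes(pos) == h && keys(indices(pos)) == k)) {
      pos = (pos + step) & (capacity - 1)                   // quadratic probing
      step += 1
    }
    pos
  }

  def put(k: String, v: String): Unit = {
    val h = fullHash(k); val pos = probe(k, h)
    if (indices(pos) == -1) { hashes(pos) = h; indices(pos) = keys.length; keys += k; values += v }
    else values(indices(pos)) = v
  }

  def get(k: String): Option[String] = {
    val pos = probe(k, fullHash(k))
    if (indices(pos) == -1) None else Some(values(indices(pos)))
  }
}
```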
  44. Recap: Cache-aware Data Structures. How to improve cache locality: • store a key-prefix with each pointer (sort) • store the key’s full hash with each pointer (hash map). In both cases, store extra information to keep memory accesses within a single region.
  45. Memory Model inside Executor. [The same diagram one last time, closing the tour.]
  46. Future Work • SPARK-19489: stable serialization format for external and native code integration • SPARK-15689: Data Source API v2 • SPARK-15687: columnar execution engine
  47. Try Apache Spark in Databricks! UNIFIED ANALYTICS PLATFORM: • collaborative cloud environment • free version (Community Edition). DATABRICKS RUNTIME 3.0: • Apache Spark, optimized for the cloud • caching and optimization layer (DBIO) • enterprise security (DBES). Try for free today: databricks.com
  48. Thank You