Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

15,156 views

Published on

Presentation at Spark Summit 2015

Published in: Data & Analytics
  • Hi there! Essay Help For Students | Discount 10% for your first order! - Check our website! https://vk.cc/80SakO
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

  1. 1. Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal Josh Rosen (@jshrsn) June 16, 2015
  2. 2. About Databricks Offers a hosted service: •  Spark on EC2 •  Notebooks •  Plot visualizations •  Cluster management •  Scheduled jobs 2 Founded by creators of Spark and remains largest contributor
  3. 3. Goals of Project Tungsten Substantially improve the memory and CPU efficiency of Spark applications . Push performance closer to the limits of modern hardware. 3
  4. 4. In this talk 4 • Motivation: why we’re focusing on compute instead of IO • How Tungsten optimizes memory + CPU • Case study: aggregation • Case study: record sorting • Performance results • Roadmap + next steps
  5. 5. Many big data workloads are now compute bound 5 NSDI’15: •  “Network optimizations can only reduce job completion time by a median of at most 2%.” •  “Optimizing or eliminating disk accesses can only reduce job completion time by a median of at most 19%.” •  We’ve observed similar characteristics in many Databricks Cloud customer workloads.
  6. 6. Why is CPU the new bottleneck? 6 •  Hardware has improved: –  Increasingly large aggregate IO bandwidth, such as 10Gbps links in networks –  High bandwidth SSD’s or striped HDD arrays for storage •  Spark’s IO has been optimized: –  many workloads now avoid significant disk IO by pruning input data that is not needed in a given job –  new shuffle and network layer implementations •  Data formats have improved: –  Parquet, binary data formats •  Serialization and hashing are CPU-bound bottlenecks
  7. 7. How Tungsten improves CPU & memory efficiency •  Memory Management and Binary Processing: leverage application semantics to manage memory explicitly and eliminate the overhead of JVM object model and garbage collection •  Cache-aware computation: algorithms and data structures to exploit memory hierarchy •  Code generation: exploit modern compilers and CPUs; allow efficient operation directly on binary data 7
  8. 8. 8
  9. 9. The overheads of Java objects “abcd” 9 •  Native: 4 bytes with UTF-8 encoding •  Java: 48 bytes java.lang.String object internals: OFFSET SIZE TYPE DESCRIPTION VALUE 0 4 (object header) ... 4 4 (object header) ... 8 4 (object header) ... 12 4 char[] String.value [] 16 4 int String.hash 0 20 4 int String.hash32 0 Instance size: 24 bytes (reported by Instrumentation API) 12 byte object header 8 byte hashcode 20 bytes of overhead + 8 bytes for chars
  10. 10. Garbage collection challenges •  Many big data workloads create objects in ways that are unfriendly to regular Java GC. •  Guest blog on GC tuning: tinyurl.com/db-gc-tuning 10 eden   S0   S1   tenured   permanent   Permanent GenerationOld GenerationYoung Generation Survivor Space
  11. 11. sun.misc.Unsafe 11 •  JVM internal API for directly manipulating memory without safety checks (hence “unsafe”) •  We use this API to build data structures in both on- and off-heap memory Data   structures   with  pointers   Flat  data   structures   Complex   examples  
  12. 12. Java object-based row representation 12 3 fields of type (int, string, string) with value (123, “data”, “bricks”) GenericMutableRow   Array   String(“data”)   String(“bricks”)   5+ objects; high space overhead; expensive hashCode() BoxedInteger(123)  
  13. 13. Tungsten’s UnsafeRow format 13 •  Bit set for tracking null values •  Every column appears in the fixed-length values region: –  Small values are inlined –  For variable-length values (strings), we store a relative offset into the variable- length data section •  Rows are always 8-byte word aligned (size is multiple of 8 bytes) •  Equality comparison and hashing can be performed on raw bytes without requiring additional interpretation null  bit  set  (1  bit/field)     values  (8  bytes  /  field)       variable  length     Offset to var. length data
  14. 14. 6 “bricks” Example of an UnsafeRow 14 0x0 123 32L 48L 4 “data” (123, “data”, “bricks”) Null tracking bitmap Offset to var. length data Offset to var. length data Field lengths
  15. 15. How we encode memory addresses 15 •  Off heap: addresses are raw memory pointers. •  On heap: addresses are base object + offset pairs. •  We use our own “page table” abstraction to enable more compact encoding of on-heap addresses: 0   1   …   N  –  1   Page table Data  page   (Java  object)   page   offset  in  page  
  16. 16. 16 java.util.HashMap …   key  ptr   value  ptr   next   key   value   array •  Huge object overheads •  Poor memory locality •  Size estimation is hard
  17. 17. Memory  page   hc   17 Tungsten’s BytesToBytesMap ptr   …   array •  Low space overheads •  Good memory locality, especially for scans key   value   key   value   key   value   key   value   key   value   key   value  
  18. 18. Code generation •  Generic evaluation of expression logic is very expensive on the JVM –  Virtual function calls –  Branches based on expression type –  Object creation due to primitive boxing –  Memory consumption by boxed primitive objects •  Generating custom bytecode can eliminate these overheads 18 9.33 9.36 36.65 Hand written Code gen Interpreted Projection Evaluating “SELECT a + a + a” (query time in seconds)
  19. 19. Code generation •  Project Tungsten uses the Janino compiler to reduce code generation time. •  Spark 1.5 will greatly expand the number of expressions that support code generation: –  SPARK-8159 19
  20. 20. Example: aggregation optimizations in DataFrames and Spark SQL 20 df.groupBy("department").agg(max("age"), sum("expense"))
  21. 21. Example: aggregation optimizations in DataFrames and Spark SQL 21 Input  Row   Grouping  Key   UnsafeRow   project convert BytesToBytesMap   scan Update   Aggregates   Agg.  Result   update in place probe SPARK-7080
  22. 22. Optimized record sorting in Spark SQL + DataFrames (SPARK-7082) 22 pointer   •  AlphaSort-style prefix sort: –  Store prefixes of sort keys inside the sort pointer array –  During sort, compare prefixes to short-circuit and avoid full record comparisons •  Use this to build external sort-merge join to support joins larger than memory record   Key  prefix   pointer   record   Naïve layout Cache friendly layout
  23. 23. Initial performance results for agg. query 23 0 200 400 600 800 1000 1200 1x 2x 4x 8x 16x Run time (seconds) Data set size (relative) Default Code Gen Tungsten onheap Tungsten offheap
  24. 24. Initial performance results for agg. query 24 0 50 100 150 200 1x 2x 4x 8x 16x Average GC time per node (seconds) Data set size (relative) Default Code Gen Tungsten onheap Tungsten offheap
  25. 25. Project Tungsten Roadmap 25 Spark  1.4   Spark  1.5   Spark  1.6   •  Binary processing for aggregation in Spark SQL / DataFrames •  New Tungsten shuffle manager •  Compression & serialization optimizations •  Optimized code generation •  Optimized sorting in Spark SQL / DataFrames •  End-to-end processing using binary data representations •  External aggregation •  Vectorized / batched processing •  ???
  26. 26. Which Spark jobs can benefit from Tungsten? 26 •  DataFrames –  Java –  Scala –  Python –  R •  Spark SQL queries •  Some Spark RDD API programs, via general serialization + compression optimizations logs.join(! "users,! "logs.userId == users.userId,! ""left_outer") ! .groupBy("userId").agg({"*": "count"})!
  27. 27. How to enable all of Spark 1.4’s Tungsten optimizations 27 spark.sql.codegen = true spark.sql.unsafe.enabled = true spark.shuffle.manager = tungsten-sort Warning!  These  features   are  experimental  in  1.4!  
  28. 28. Thank you. Follow our progress on JIRA: SPARK-7075

×