
IndexedRDD: Efficient Fine-Grained Updates for RDDs



by Ankur Dave

Published in: Data & Analytics


  1. IndexedRDD: Efficient Fine-Grained Updates for RDDs. Ankur Dave, UC Berkeley AMPLab.
  2. Immutability in Spark. Query: a sequence of data-parallel bulk transformations on RDDs.
  3. Immutability in Spark. Transformations are executed as independent tasks. [Figure: map tasks 1-4 and reduce tasks 1-4 plotted by worker ID over time]
  4. Immutability in Spark. Automatic mid-query fault tolerance. [Figure: task timeline by worker ID over time, with map task 2 retried]
  5. Immutability in Spark. Automatic straggler mitigation. [Figure: task timeline by worker ID over time, with map task 2 killed and a replica of map task 2 run in its place] Task replayability: repeated task executions must give the same result. Enforced by input immutability and task determinism.
  6. The Need for Efficient Updates. But applications like streaming aggregation depend on efficient fine-grained updates. Naive solutions involving mutable state sacrifice the advantages of Spark. Instead, we propose to enable efficient updates without sacrificing immutable semantics. Streaming aggregation example: a table of usernames and follower counts (@taylorswift13: 48,182,689; @ankurdave: 206; @mejoeyg: 81) receiving two "@taylorswift13: +1 follower" updates. Incremental algorithms example: Old Ranks + Graph Update = New Ranks.
  7. Existing Solutions. Direct mutation: def f(row: Row) = { row.x += 1; row.y += 1 }. Problem: task failures corrupt state. [Figure: a map task fails partway and is retried; the row goes from x = 0, y = 0 to x = 1, y = 0 to x = 2, y = 1]
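The failure mode on this slide can be reproduced without Spark. The following is a minimal sketch in plain Scala; `Row`, `mutateInPlace`, `pureUpdate`, `runWithRetry`, and the `failMidway` flag are hypothetical names introduced here for illustration, and the "task failure" is just a thrown exception.

```scala
object ReplayDemo {
  // Hypothetical mutable row, mirroring the slide's Row with fields x and y.
  final class Row(var x: Int, var y: Int)

  // In-place update that may crash between the two increments.
  def mutateInPlace(row: Row, failMidway: Boolean): Unit = {
    row.x += 1
    if (failMidway) throw new RuntimeException("simulated task failure")
    row.y += 1
  }

  // Pure alternative: reads an immutable input and returns a new value.
  def pureUpdate(in: (Int, Int), failMidway: Boolean): (Int, Int) = {
    val x = in._1 + 1
    if (failMidway) throw new RuntimeException("simulated task failure")
    (x, in._2 + 1)
  }

  // Run a task once with a simulated failure, then retry it without one.
  def runWithRetry[A](task: Boolean => A): A =
    try task(true) catch { case _: RuntimeException => task(false) }

  // The retry sees the half-applied update: x is incremented twice, y once.
  val mutated: Row = {
    val row = new Row(0, 0)
    runWithRetry(fail => mutateInPlace(row, fail))
    row
  }

  // The retry starts from the unchanged input, so the result is as if no failure occurred.
  val replayed: (Int, Int) = runWithRetry(fail => pureUpdate((0, 0), fail))
}
```

With the mutable row, the retry ends at x = 2, y = 1, matching the slide. The pure version is safe to replay because the failed attempt left nothing behind.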
  8. Existing Solutions. Atomic batched database updates: def f(row: Row) = { BEGIN; row.x += 1; row.y += 1; COMMIT; }. Problem: long-lived snapshots are uncommon in databases. [Figure: timeline with a Pages task and an Ads task; labels: Clicks, Ad Views, Top Ads, Popular Pages]
  9. Existing Solutions. Full copy: def f(row: Row) = { val newRow = row.clone(); newRow.x += 1; newRow.y += 1; newRow }. Problem: very inefficient for small updates.
  10. Tradeoff: immutability (for fault tolerance, straggler mitigation, dataset reusability, parallel recovery, …) vs. fine-grained updates (for streaming aggregation, incremental algorithms, …). Can we have both?
  11. Persistent Data Structures. Support efficient updates without modifying the existing version, using structural sharing (fine-grained copy-on-write).
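Structural sharing is easiest to see on a small tree. The sketch below uses an unbalanced immutable binary search tree, not the radix tree the talk goes on to describe; `Tree`, `Node`, `insert`, and `lookup` are hypothetical names chosen for this example.

```scala
object SharingDemo {
  // Immutable binary search tree: an update copies only the root-to-key path.
  sealed trait Tree
  case object Leaf extends Tree
  final case class Node(left: Tree, key: Int, value: String, right: Tree) extends Tree

  def insert(t: Tree, k: Int, v: String): Tree = t match {
    case Leaf => Node(Leaf, k, v, Leaf)
    case n @ Node(l, key, _, r) =>
      if (k < key) n.copy(left = insert(l, k, v))        // right subtree reused as-is
      else if (k > key) n.copy(right = insert(r, k, v))  // left subtree reused as-is
      else n.copy(value = v)
  }

  def lookup(t: Tree, k: Int): Option[String] = t match {
    case Leaf => None
    case Node(l, key, v, r) =>
      if (k < key) lookup(l, k) else if (k > key) lookup(r, k) else Some(v)
  }

  val v1: Tree = List(50 -> "a", 30 -> "b", 70 -> "c").foldLeft(Leaf: Tree) {
    case (t, (k, v)) => insert(t, k, v)
  }
  // Inserting 80 rebuilds only the nodes on the path 50 -> 70.
  val v2: Tree = insert(v1, 80, "d")

  // True when both versions point at the very same left-subtree object.
  val leftSubtreeShared: Boolean = (v1, v2) match {
    case (Node(l1, _, _, _), Node(l2, _, _, _)) => l1 eq l2
    case _ => false
  }
}
```

Both versions stay readable after the update: `v1` does not contain key 80, `v2` does, and the subtree rooted at 30 exists only once in memory.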
  12. IndexedRDD. RDD-based distributed key-value store supporting immutable semantics and efficient updates. [Figure: stack diagram with IndexedRDD (800 LOC), Spark, and PART (1100 LOC)]
  13. IndexedRDD. [Figure: a cached RDD[T] shown as four partitions, each an Array[T]; an IndexedRDD[K,V] shown as four partitions, each a PART[K,V]]
  14. IndexedRDD API
      class IndexedRDD[K: KeySerializer, V] extends RDD[(K, V)] {
        // Fine-Grained Ops
        def get(k: K): Option[V]
        def multiget(ks: Array[K]): Map[K, V]
        def put(k: K, v: V): IndexedRDD[K, V]
        def multiput(kvs: Map[K, V], merge: (K, V, V) => V): IndexedRDD[K, V]
        def delete(ks: Array[K]): IndexedRDD[K, V]
        // Accelerated RDD Ops
        def filter(pred: (K, V) => Boolean): IndexedRDD[K, V]
        // Specialized Joins
        def fullOuterJoin[V2, W](other: RDD[(K, V2)])(f): IndexedRDD[K, W]
      }
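To make the semantics of the fine-grained operations concrete, here is a single-machine sketch that follows only the method names and shapes listed on the slide. `LocalIndexed` is a hypothetical stand-in backed by an immutable `Map`; it is not the IndexedRDD library. It assumes that `merge` receives the key, the existing value, and the incoming value, which the slide does not spell out.

```scala
object ApiSketch {
  // Hypothetical stand-in for the slide's API; NOT the IndexedRDD implementation.
  final case class LocalIndexed[K, V](data: Map[K, V]) {
    def get(k: K): Option[V] = data.get(k)

    // Keys that are absent are simply left out of the result.
    def multiget(ks: Array[K]): Map[K, V] =
      (for (k <- ks.toSeq; v <- data.get(k)) yield k -> v).toMap

    // Every update returns a new version; the receiver is left unchanged.
    def put(k: K, v: V): LocalIndexed[K, V] = LocalIndexed(data.updated(k, v))

    // Assumption: existing keys are combined as merge(key, oldValue, newValue).
    def multiput(kvs: Map[K, V], merge: (K, V, V) => V): LocalIndexed[K, V] =
      LocalIndexed(kvs.foldLeft(data) { case (acc, (k, v)) =>
        acc.updated(k, acc.get(k).fold(v)(old => merge(k, old, v)))
      })

    def delete(ks: Array[K]): LocalIndexed[K, V] = LocalIndexed(ks.foldLeft(data)(_ - _))
  }

  // Follower counts from the streaming-aggregation slide.
  val v1: LocalIndexed[String, Long] = LocalIndexed(
    Map("@taylorswift13" -> 48182689L, "@ankurdave" -> 206L, "@mejoeyg" -> 81L))

  // Each "+1 follower" event is merged into the existing count.
  val add: (String, Long, Long) => Long = (_, old, delta) => old + delta
  val v2: LocalIndexed[String, Long] =
    v1.multiput(Map("@taylorswift13" -> 1L), add)
      .multiput(Map("@taylorswift13" -> 1L), add)
}
```

After the two events, `v2` holds 48,182,691 followers for @taylorswift13 while `v1` still holds the original count, which is the immutable-update behavior the API is designed around.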
  15. Persistent Adaptive Radix Trees (PART)
  16. Adaptive Radix Tree. Recent work in main-memory indexing for databases. 256-ary radix tree (trie) with node compression. V. Leis, A. Kemper, and T. Neumann. The adaptive radix tree: ARTful indexing for main-memory databases. In ICDE 2013.
  17. Why a Radix Tree? 1. Sorted order traversals (unlike hash tables). 2. Better asymptotic performance than binary search trees for long keys (O(k) vs. O(k log n)). 3. Very efficient union and intersection operations. 4. Predictable performance: no rehashing or rebalancing.
  18. Persistent Adaptive Radix Tree (PART). Adds persistence to the adaptive radix tree using path copying (shadowing) and reference counting. 1100 lines of Java. [Figure: an old and a new tree version]
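Path copying on a radix tree can be sketched in a few lines. The example below is deliberately simplified and is not PART: children are kept in a `SortedMap` instead of ART's adaptive node layouts, and sharing relies on the JVM garbage collector rather than reference counting. `Trie`, `bytes`, and the sample keys are hypothetical.

```scala
import scala.collection.immutable.SortedMap

object TrieSketch {
  // Simplified persistent byte-wise radix trie (one level per key byte).
  final case class Trie[V](value: Option[V], children: SortedMap[Int, Trie[V]]) {
    // Path copying: only the nodes along the key's bytes are rebuilt.
    def put(key: List[Int], v: V): Trie[V] = key match {
      case Nil => copy(value = Some(v))
      case b :: rest =>
        val child = children.getOrElse(b, Trie.empty[V])
        copy(children = children.updated(b, child.put(rest, v)))
    }

    // Lookup walks one node per key byte, independent of how many keys are stored.
    def get(key: List[Int]): Option[V] = key match {
      case Nil => value
      case b :: rest => children.get(b).flatMap(_.get(rest))
    }

    // In-order walk yields keys in sorted byte order.
    def keys(prefix: Vector[Int] = Vector.empty): Vector[Vector[Int]] =
      (if (value.isDefined) Vector(prefix) else Vector.empty) ++
        children.toVector.flatMap { case (b, c) => c.keys(prefix :+ b) }
  }
  object Trie {
    def empty[V]: Trie[V] = Trie(None, SortedMap.empty[Int, Trie[V]])
  }

  // Keys are treated as sequences of unsigned bytes.
  def bytes(s: String): List[Int] = s.getBytes("UTF-8").map(_ & 0xff).toList

  val v1: Trie[Int] = Trie.empty[Int].put(bytes("ant"), 1).put(bytes("bee"), 2)
  val v2: Trie[Int] = v1.put(bytes("bat"), 3)

  // The untouched "a..." subtree is the same object in both versions.
  val sharesSubtree: Boolean = v1.children('a'.toInt) eq v2.children('a'.toInt)

  val sortedKeys: Vector[String] =
    v2.keys().map(k => new String(k.map(_.toByte).toArray, "UTF-8"))
}
```

Inserting "bat" rebuilds the root and the nodes under "b" only. The old version `v1` remains valid, and iterating `v2` returns the keys in sorted order without any extra sorting step.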
  19. Batched Updates. Updates can be performed in place if: 1. the update applies to nodes referenced only by one version, and 2. that version will never be referenced in the future. [Figure: versions v1, v2, and v3 at times t1 and t2]
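One way to picture the two conditions is a batch that copies a node the first time it touches it and mutates that private copy afterwards. The sketch below is an illustration under assumptions, not PART's mechanism: it uses a two-level array of counters instead of a tree, and it tracks ownership with a set of chunk indices instead of reference counts. `Version`, `applyBatch`, and the chunk size of 4 are hypothetical.

```scala
import scala.collection.mutable

object BatchSketch {
  // A version is a two-level array of counters; chunks play the role of tree nodes.
  final class Version(val chunks: Array[Array[Long]]) {
    def apply(i: Int): Long = chunks(i / 4)(i % 4)
  }

  // Applies a batch of increments and returns (new version, number of chunks copied).
  def applyBatch(base: Version, updates: Seq[(Int, Long)]): (Version, Int) = {
    val chunks = base.chunks.clone()      // shallow copy: chunks are still shared with base
    val owned = mutable.Set.empty[Int]    // chunks referenced only by the unpublished version
    for ((i, delta) <- updates) {
      val c = i / 4
      if (!owned(c)) {                    // first touch: copy-on-write
        chunks(c) = chunks(c).clone()
        owned += c
      }
      chunks(c)(i % 4) += delta           // later touches: update in place
    }
    (new Version(chunks), owned.size)
  }

  val v1 = new Version(Array.fill(3)(Array.fill(4)(0L)))
  // Four updates that all land in chunk 0.
  val (v2, copies) = applyBatch(v1, Seq(0 -> 1L, 1 -> 1L, 0 -> 1L, 2 -> 5L))
}
```

The four updates cost a single chunk copy. The intermediate states inside the batch are never visible to anyone, which is why mutating them in place does not break the immutability of `v1` or of the published `v2`.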
  20. Microbenchmarks. [Figure: insert throughput (single-threaded), in M inserts/s, for PART (batch size 1K), PART (batch size 10M), hash table, red-black tree, and B-tree]
  21. Microbenchmarks. [Figures: lookup throughput (M lookups/s); scan throughput (M elements scanned/s); memory usage (GB)]
  22. Scan-Optimized Node Allocation. Use a custom allocator to arrange nodes in scan order when possible: 3.6x faster scans.
  23. Incremental Checkpointing. Tree partitioning: minimize the number of changed files by segregating frequently changed nodes from infrequently changed nodes. [Figure: tree divided into frequently changing and infrequently changing regions]
  24. Incremental Checkpointing
  25. Streaming Word Count. Counting occurrences of 26-character string ids. Synthetic data generated on the slaves (8 r3.2xlarge). Load with 1 billion keys, stream of 100 million keys.
  26. Limitations. 1. GC pauses. Future work: off-heap storage with reference counting. 2. Long keys with high skew.
  27. Thanks! IndexedRDD: Try out the exercise after lunch!