5. Immutability in Spark
[Figure: map tasks 1-4 and reduce tasks 1-4 scheduled across workers (WorkerID) over time; map task 2 is killed and a replica of map task 2 runs instead.]
Automatic straggler mitigation
Task replayability: repeated executions of a task must give the same result
Enforced by input immutability and task determinism
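A minimal sketch of why replayability makes straggler mitigation safe, assuming a task is a pure function of an immutable input partition. The names (ReplayableTask, sumOfSquares, runWithReplica) are hypothetical and are not Spark's API. Because both the original and the replica compute the same value, it does not matter which one finishes first.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

final class ReplayableTask {
    // A deterministic task: its result depends only on its (immutable) input partition.
    static long sumOfSquares(List<Integer> partition) {
        long total = 0;
        for (int v : partition) {
            total += (long) v * v;
        }
        return total;
    }

    // Launch the original task and a speculative replica; accept whichever finishes first.
    // Replayability guarantees that both would return the same value.
    static long runWithReplica(List<Integer> partition)
            throws InterruptedException, ExecutionException {
        CompletableFuture<Long> original =
                CompletableFuture.supplyAsync(() -> sumOfSquares(partition));
        CompletableFuture<Long> replica =
                CompletableFuture.supplyAsync(() -> sumOfSquares(partition));
        Object first = CompletableFuture.anyOf(original, replica).get();
        return (Long) first;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> partition = List.of(1, 2, 3, 4); // unmodifiable input
        System.out.println(runWithReplica(partition)); // prints 30
    }
}
```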
6. The Need for Efficient Updates
• But applications like streaming aggregation
depend on efficient fine-grained updates
• Naive solutions involving mutable state
sacrifice Spark's advantages, such as fault tolerance and straggler mitigation through task replay
• Instead, we propose to enable efficient
updates without sacrificing immutable
semantics
username          # followers
@taylorswift13    48,182,689
@ankurdave        206
@mejoeyg          81
Incoming updates: @taylorswift13: +1 follower; @taylorswift13: +1 follower
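To see why naive immutability is costly for this workload, here is a minimal sketch, assuming the table is a Java map and each update adds one follower. NaiveImmutableAggregation and applyBatch are hypothetical names. Every batch copies the entire table, so even two +1 updates to @taylorswift13 pay for every entry, while the old version stays intact.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NaiveImmutableAggregation {
    // Returns a NEW version of the table; the old version is never modified.
    // Cost: O(n) to copy all n entries, even if the batch touches only one key.
    static Map<String, Long> applyBatch(Map<String, Long> oldVersion, List<String> plusOneFollower) {
        Map<String, Long> next = new HashMap<>(oldVersion); // full copy of the table
        for (String user : plusOneFollower) {
            next.merge(user, 1L, Long::sum);                // +1 follower
        }
        return Map.copyOf(next);                            // freeze the new version
    }

    public static void main(String[] args) {
        Map<String, Long> v1 = Map.of(
                "@taylorswift13", 48_182_689L,
                "@ankurdave", 206L,
                "@mejoeyg", 81L);
        Map<String, Long> v2 = applyBatch(v1, List.of("@taylorswift13", "@taylorswift13"));
        System.out.println(v1.get("@taylorswift13")); // 48182689: old version intact
        System.out.println(v2.get("@taylorswift13")); // 48182691
    }
}
```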
Purely Fu…
• Enable ef… modifying … way
• Immutab… new cop… shares st… (similar t…
• Nodes fr…
• Different…
[Figure: example workloads. Streaming aggregation; incremental algorithms (Old Ranks + Graph Update = New Ranks).]
7. Existing Solutions
Direct mutation:
Problem: Task failures corrupt state
[Figure: a map task increments x, then fails before incrementing y; the retried map task applies f again. State over time: x = 0, y = 0 → x = 1, y = 0 → x = 2, y = 1.]
def f(row: Row) = {
  row.x += 1  // mutates shared state in place
  row.y += 1
}
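The failure above can be reproduced in a small Java sketch. The names (DirectMutationRetry, Row, f, fPure) are hypothetical, and the crash is simulated with an exception thrown between the two writes. The immutable variant, which returns a new row instead of mutating its input, is included only for contrast: its retry reads the unchanged input and produces the correct result.

```java
final class DirectMutationRetry {
    // Mutable row updated in place, as in the slide's f.
    static final class Row {
        int x = 0;
        int y = 0;
    }

    // failAfterX simulates the task crashing between its two writes.
    static void f(Row row, boolean failAfterX) {
        row.x += 1;
        if (failAfterX) {
            throw new IllegalStateException("task failed");
        }
        row.y += 1;
    }

    static Row runWithRetry() {
        Row row = new Row();      // x = 0, y = 0
        try {
            f(row, true);         // attempt 1 leaves x = 1, y = 0, then fails
        } catch (IllegalStateException e) {
            f(row, false);        // the retry re-applies f to already-mutated state
        }
        return row;               // x = 2, y = 1: corrupted
    }

    // Immutable alternative: returns a new row and never touches its input.
    static final class ImmutableRow {
        final int x;
        final int y;
        ImmutableRow(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static ImmutableRow fPure(ImmutableRow row) {
        return new ImmutableRow(row.x + 1, row.y + 1);
    }

    static ImmutableRow runPureWithRetry() {
        ImmutableRow input = new ImmutableRow(0, 0);
        ImmutableRow output;
        try {
            fPure(input);                                  // attempt 1: output lost in the crash
            throw new IllegalStateException("task failed");
        } catch (IllegalStateException e) {
            output = fPure(input);                         // retry reads the unchanged input
        }
        return output;                                     // x = 1, y = 1
    }

    public static void main(String[] args) {
        Row r = runWithRetry();
        System.out.println("x = " + r.x + ", y = " + r.y); // x = 2, y = 1
        ImmutableRow p = runPureWithRetry();
        System.out.println("x = " + p.x + ", y = " + p.y); // x = 1, y = 1
    }
}
```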
8. Existing Solutions
Atomic batched database updates:
Problem: Long-lived snapshots are uncommon in databases
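def f(row: Row) = {
  BEGIN;      // open a transaction
  row.x += 1
  row.y += 1
  COMMIT;     // both increments become visible atomically, or neither does
}
[Figure: a Pages task and an Ads task over time; data labels: Clicks, Ad Views, Top Ads, Popular Pages.]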
16. Adaptive Radix Tree
Recent work in main-memory indexing for databases
256-ary radix tree (trie) with node compression
V. Leis, A. Kemper, and T. Neumann. The adaptive radix tree: ARTful indexing for main-memory databases. In ICDE 2013.
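A minimal sketch of the underlying idea: a 256-ary trie that consumes one key byte per level, so a lookup costs O(k) for a k-byte key regardless of how many keys are stored. It assumes keys are UTF-8 strings and omits ART's adaptive node sizes and path compression, so every node here carries a full 256-slot array; ByteTrie is a hypothetical name, not the ART implementation.

```java
import java.nio.charset.StandardCharsets;

final class ByteTrie<V> {
    private static final class Node<V> {
        @SuppressWarnings("unchecked")
        final Node<V>[] children = (Node<V>[]) new Node[256]; // one slot per byte value
        V value;                                                // non-null if a key ends here
    }

    private final Node<V> root = new Node<>();

    // Walks one node per key byte: O(k) for a k-byte key.
    void put(String key, V value) {
        Node<V> node = root;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            int i = b & 0xFF; // treat the byte as unsigned 0..255
            if (node.children[i] == null) {
                node.children[i] = new Node<>();
            }
            node = node.children[i];
        }
        node.value = value;
    }

    V get(String key) {
        Node<V> node = root;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            node = node.children[b & 0xFF];
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    public static void main(String[] args) {
        ByteTrie<Long> counts = new ByteTrie<>();
        counts.put("@ankurdave", 206L);
        counts.put("@mejoeyg", 81L);
        System.out.println(counts.get("@ankurdave")); // 206
        System.out.println(counts.get("@ankur"));     // null: a prefix, not a stored key
    }
}
```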
17. Why a Radix Tree?
1. Sorted-order traversals (unlike hash tables)
2. Better asymptotic performance than binary search trees for long keys (O(k) vs. O(k log n) for keys of length k and n entries)
3. Very efficient union and intersection operations
4. Predictable performance: no rehashing or rebalancing
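Two of the properties listed above can be illustrated on a plain (uncompressed) 256-ary trie storing a set of keys. This is a hypothetical sketch, not ART: sortedKeys visits child slots in byte order, so keys come out sorted without a separate sort, and intersect descends only into byte slots present in both tries.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrieSet {
    private final TrieSet[] children = new TrieSet[256]; // one slot per byte value
    private boolean isKey;                                // true if a key ends here

    void add(String key) {
        TrieSet node = this;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            int i = b & 0xFF;
            if (node.children[i] == null) {
                node.children[i] = new TrieSet();
            }
            node = node.children[i];
        }
        node.isKey = true;
    }

    // Depth-first traversal over slots 0..255 yields keys in unsigned-byte lexicographic order.
    List<String> sortedKeys() {
        List<String> out = new ArrayList<>();
        collect(this, new byte[0], out);
        return out;
    }

    private static void collect(TrieSet node, byte[] prefix, List<String> out) {
        if (node.isKey) {
            out.add(new String(prefix, StandardCharsets.UTF_8));
        }
        for (int i = 0; i < 256; i++) {
            if (node.children[i] != null) {
                byte[] next = Arrays.copyOf(prefix, prefix.length + 1);
                next[prefix.length] = (byte) i;
                collect(node.children[i], next, out);
            }
        }
    }

    // Walks both tries in lockstep; subtrees present in only one input are never visited.
    static TrieSet intersect(TrieSet a, TrieSet b) {
        TrieSet result = intersectOrNull(a, b);
        return result != null ? result : new TrieSet();
    }

    private static TrieSet intersectOrNull(TrieSet a, TrieSet b) {
        TrieSet result = new TrieSet();
        result.isKey = a.isKey && b.isKey;
        boolean nonEmpty = result.isKey;
        for (int i = 0; i < 256; i++) {
            if (a.children[i] != null && b.children[i] != null) {
                TrieSet child = intersectOrNull(a.children[i], b.children[i]);
                if (child != null) {
                    result.children[i] = child;
                    nonEmpty = true;
                }
            }
        }
        return nonEmpty ? result : null; // prune empty subtrees
    }

    public static void main(String[] args) {
        TrieSet a = new TrieSet();
        for (String k : List.of("banana", "apple", "cherry", "app")) a.add(k);
        TrieSet b = new TrieSet();
        for (String k : List.of("cherry", "apple", "date")) b.add(k);
        System.out.println(a.sortedKeys());               // [app, apple, banana, cherry]
        System.out.println(intersect(a, b).sortedKeys()); // [apple, cherry]
    }
}
```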
18. Persistent Adaptive Radix Tree (PART)
Adds persistence to the adaptive radix tree using path copying (shadowing) and reference counting
1100 lines of Java
[Figure: old and new versions of the tree.]
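Path copying can be sketched on the same kind of uncompressed trie. In this hypothetical PersistentTrie, put copies only the nodes on the root-to-key path and shares every other subtree with the previous version, so the old version remains valid. Reference counting, which PART also uses, is omitted here; the JVM garbage collector reclaims nodes that no version can reach.

```java
import java.nio.charset.StandardCharsets;

final class PersistentTrie {
    // Nodes are never modified after construction, so child arrays can be shared between versions.
    private final PersistentTrie[] children;
    private final Long value;

    private PersistentTrie(PersistentTrie[] children, Long value) {
        this.children = children;
        this.value = value;
    }

    static final PersistentTrie EMPTY = new PersistentTrie(new PersistentTrie[256], null);

    // Returns a new version; `this` (the old version) is unchanged.
    PersistentTrie put(String key, long v) {
        return put(key.getBytes(StandardCharsets.UTF_8), 0, v);
    }

    private PersistentTrie put(byte[] key, int depth, long v) {
        if (depth == key.length) {
            return new PersistentTrie(children, v);  // share the child array as-is
        }
        int i = key[depth] & 0xFF;
        PersistentTrie child = children[i] != null ? children[i] : EMPTY;
        PersistentTrie[] copy = children.clone();    // copy only this node on the path
        copy[i] = child.put(key, depth + 1, v);      // recursively copy the rest of the path
        return new PersistentTrie(copy, value);
    }

    Long get(String key) {
        PersistentTrie node = this;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            node = node.children[b & 0xFF];
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    // True if two versions point at the very same subtree under byte c (structural sharing).
    boolean sharesSubtreeWith(PersistentTrie other, char c) {
        return children[c] != null && children[c] == other.children[c];
    }

    public static void main(String[] args) {
        PersistentTrie v1 = EMPTY.put("ankurdave", 206L).put("mejoeyg", 81L);
        PersistentTrie v2 = v1.put("ankurdave", 207L);
        System.out.println(v1.get("ankurdave") + " " + v2.get("ankurdave")); // 206 207
        System.out.println(v1.sharesSubtreeWith(v2, 'm')); // true: untouched path is shared
        System.out.println(v1.sharesSubtreeWith(v2, 'a')); // false: updated path was copied
    }
}
```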
19. Batched Updates
Updates can be performed in place if:
1. The update applies to nodes referenced by only one version, and
2. That version will never be referenced in the future
[Figure: versions v1, v2, and v3 at times t1 and t2.]
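One way to enforce both conditions is an ownership token, a technique borrowed here from Clojure's transient collections rather than taken from PART. In this hypothetical sketch, each batch gets a fresh token; nodes created during the batch carry that token and are updated in place, while nodes owned by an earlier version are copied once and then belong to the batch. Intermediate versions inside a batch are never published, so only the final version is ever referenced.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class BatchTrie {
    private static final class Node {
        Node[] children;
        Long value;
        final Object owner; // the batch that created this node

        Node(Node[] children, Long value, Object owner) {
            this.children = children;
            this.value = value;
            this.owner = owner;
        }
    }

    private final Node root;
    final int nodesCopied; // existing nodes copied by the batch that produced this version

    private BatchTrie(Node root, int nodesCopied) {
        this.root = root;
        this.nodesCopied = nodesCopied;
    }

    static final BatchTrie EMPTY = new BatchTrie(new Node(new Node[256], null, null), 0);

    // Applies a batch of "+1" updates and publishes a single new version.
    BatchTrie applyBatch(List<String> keysToIncrement) {
        Object token = new Object(); // unique to this batch; never reused afterwards
        int[] copies = {0};
        Node r = root;
        for (String key : keysToIncrement) {
            r = increment(r, key.getBytes(StandardCharsets.UTF_8), 0, token, copies);
        }
        return new BatchTrie(r, copies[0]);
    }

    // In place if this batch created the node (only the unpublished in-progress version
    // references it); otherwise the node belongs to an older version and is copied first.
    private static Node editable(Node n, Object token, int[] copies) {
        if (n.owner == token) {
            return n;
        }
        copies[0]++;
        return new Node(n.children.clone(), n.value, token);
    }

    private static Node increment(Node n, byte[] key, int depth, Object token, int[] copies) {
        Node e = editable(n, token, copies);
        if (depth == key.length) {
            e.value = (e.value == null ? 0L : e.value) + 1;
            return e;
        }
        int i = key[depth] & 0xFF;
        Node child = e.children[i] != null ? e.children[i] : new Node(new Node[256], null, token);
        e.children[i] = increment(child, key, depth + 1, token, copies);
        return e;
    }

    Long get(String key) {
        Node node = root;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            node = node.children[b & 0xFF];
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    public static void main(String[] args) {
        BatchTrie v1 = EMPTY.applyBatch(List.of("ab"));
        BatchTrie v2 = v1.applyBatch(List.of("ab", "ab", "ac"));
        System.out.println(v1.get("ab") + " " + v2.get("ab")); // 1 3
        System.out.println(v2.nodesCopied); // 3: each node on the "ab" path copied once
    }
}
```

Without the token, plain path copying would copy the path once per update rather than once per batch.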
22. Incremental Checkpointing
Tree partitioning: minimize the number of changed files by segregating frequently changed nodes from infrequently changed nodes.
[Figure: the tree split into a frequently changing part and an infrequently changing part.]
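A simplified sketch of the checkpointing side, assuming each partition is written to its own file and that a caller-supplied rule places frequently changing keys in their own partition. It partitions keys rather than tree nodes, and IncrementalCheckpoint, partitionOf, and the "hot:" prefix rule are hypothetical. A checkpoint rewrites only partitions marked dirty since the previous one, so when updates concentrate on hot keys, the files for the cold partitions are left untouched.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

final class IncrementalCheckpoint {
    private final Function<String, Integer> partitionOf; // maps a key to its partition (file) id
    private final Map<Integer, Map<String, Long>> partitions = new HashMap<>();
    private final Set<Integer> dirty = new HashSet<>();   // changed since the last checkpoint

    IncrementalCheckpoint(Function<String, Integer> partitionOf) {
        this.partitionOf = partitionOf;
    }

    void increment(String key) {
        int p = partitionOf.apply(key);
        partitions.computeIfAbsent(p, id -> new TreeMap<>()).merge(key, 1L, Long::sum);
        dirty.add(p);
    }

    long count(String key) {
        Map<String, Long> part = partitions.get(partitionOf.apply(key));
        return part == null ? 0L : part.getOrDefault(key, 0L);
    }

    // Returns the partition files that would be rewritten, then marks everything clean.
    // A real implementation would serialize partitions.get(p) to its own file here.
    SortedSet<Integer> checkpoint() {
        SortedSet<Integer> written = new TreeSet<>(dirty);
        dirty.clear();
        return written;
    }

    public static void main(String[] args) {
        // Hypothetical placement rule: hot keys go to partition 0, others to 1..3.
        IncrementalCheckpoint cp = new IncrementalCheckpoint(
                k -> k.startsWith("hot:") ? 0 : 1 + k.length() % 3);
        for (String k : new String[] {"hot:a", "cold:1", "cold:22", "cold:333"}) {
            cp.increment(k);
        }
        System.out.println(cp.checkpoint()); // [0, 1, 2, 3]: initial load writes every file
        cp.increment("hot:a");
        cp.increment("hot:b");
        System.out.println(cp.checkpoint()); // [0]: only the hot partition is rewritten
    }
}
```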
24. Real App: Streaming Aggregation
Counting occurrences of 26-character string IDs
Synthetic data generated on the slaves (8 r3.2xlarge instances)
Load with 1 billion keys, then a stream of 100 million keys
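A scaled-down sketch of the workload's shape. The lowercase alphabet, the uniform choice of streamed keys from the loaded keys, the sizes, and the seed are assumptions, and counting uses a plain HashMap rather than PART, so this reproduces neither the benchmark setup nor its results. IdCountWorkload and run are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

final class IdCountWorkload {
    static final int ID_LENGTH = 26;

    // One random id of 26 lowercase letters (the alphabet is an assumption).
    static String randomId(Random rng) {
        char[] id = new char[ID_LENGTH];
        for (int i = 0; i < ID_LENGTH; i++) {
            id[i] = (char) ('a' + rng.nextInt(26));
        }
        return new String(id);
    }

    // Load phase: insert loadSize ids with count 1.
    // Stream phase: streamLength occurrences, each drawn from the loaded ids, increment counts.
    static Map<String, Long> run(int loadSize, int streamLength, long seed) {
        Random rng = new Random(seed);
        String[] loaded = new String[loadSize];
        Map<String, Long> counts = new HashMap<>();
        for (int i = 0; i < loadSize; i++) {
            loaded[i] = randomId(rng);
            counts.merge(loaded[i], 1L, Long::sum);
        }
        for (int i = 0; i < streamLength; i++) {
            counts.merge(loaded[rng.nextInt(loadSize)], 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Scaled far down from 1 billion loaded keys and 100 million streamed keys.
        Map<String, Long> counts = run(1_000, 100, 42L);
        long total = counts.values().stream().mapToLong(Long::longValue).sum();
        System.out.println(counts.size() + " distinct ids, " + total + " occurrences");
    }
}
```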