IndexedRDD: Efficient Fine-Grained Updates for RDDs (Ankur Dave, UC Berkeley)


IndexedRDD
Ankur Dave
UC Berkeley AMPLab
Efficient Fine-Grained Updates for RDDs

Immutability in Spark
Query: Sequence of data-parallel bulk transformations on RDDs

[Figure: per-worker timeline (worker ID vs. time) of map tasks 1-4 followed by reduce tasks 1-4]
Transformations are executed as independent tasks

[Figure: the same timeline, with map task 2 re-executed as "map task 2 (retry)"]
Automatic mid-query fault tolerance

[Figure: the same timeline, with a slow map task 2 killed and "map task 2 (replica)" running in its place]
Automatic straggler mitigation
Task replayability: repeated executions of a task must give the same result.
This is enforced by input immutability and task determinism.
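A minimal plain-Scala sketch of replayability (no Spark; `Replay`, `mapTask`, `attempt`, and the simulated crash are hypothetical names for illustration): because the input is an immutable `Vector` and the task is a pure function, a retry after a failure and a speculative replica both produce the same output.

```scala
// Replayability sketch: the task is a pure function of an immutable input
// partition, so every execution of it returns the same result.
object Replay {
  // Immutable input partition: a Vector cannot be modified in place
  val partition: Vector[Int] = Vector(1, 2, 3, 4)

  // Deterministic "map task": its output depends only on its input
  def mapTask(in: Vector[Int]): Vector[Int] = in.map(_ * 10)

  // One execution attempt; crash = true simulates a failure before the
  // output is reported (the hypothetical failure model of this sketch)
  def attempt(in: Vector[Int], crash: Boolean): Option[Vector[Int]] = {
    val out = mapTask(in)
    if (crash) None else Some(out)
  }

  // The first attempt fails, so the same task is re-run on the same input
  def runWithRetry(in: Vector[Int]): Vector[Int] =
    attempt(in, crash = true).orElse(attempt(in, crash = false)).get

  // Straggler mitigation: a speculative replica computes the same result
  def runWithReplica(in: Vector[Int]): (Vector[Int], Vector[Int]) =
    (mapTask(in), mapTask(in))
}
```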

The Need for Efficient Updates
•  But applications like streaming aggregation
depend on efficient fine-grained updates
•  Naive solutions involving mutable state
sacrifice the advantages of Spark
•  Instead, we propose to enable efficient
updates without sacrificing immutable
semantics
username          # followers
@taylorswift13    48,182,689
@ankurdave        206
@mejoeyg          81
Incoming updates: "@taylorswift13: +1 follower", "@taylorswift13: +1 follower"
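To make the update pattern concrete, here is a single-machine sketch that holds the follower table in Scala's immutable `Map` (an illustrative stand-in, not IndexedRDD): each "+1 follower" event yields a new version of the table while the previous version stays readable. The object and method names are hypothetical.

```scala
// Follower-count table from the slide, held in an immutable Map
object Followers {
  val v0: Map[String, Long] = Map(
    "@taylorswift13" -> 48182689L,
    "@ankurdave" -> 206L,
    "@mejoeyg" -> 81L
  )

  // Apply one "+1 follower" event, returning a new version of the table
  def follow(table: Map[String, Long], user: String): Map[String, Long] =
    table.updated(user, table.getOrElse(user, 0L) + 1)

  // The update stream shown on the slide: two events for @taylorswift13
  val events: List[String] = List("@taylorswift13", "@taylorswift13")

  // Fold the stream into a new version; v0 itself is never modified
  val v2: Map[String, Long] = events.foldLeft(v0)(follow)
}
```

For maps larger than a few entries, Scala's immutable `Map` is a hash trie, so `updated` copies only a short path and shares the rest with the previous version.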
Purely Fu…
•  Enable ef… modifying … way
•  Immutab… new cop… shares st… (similar t…
•  Nodes fr…
•  Different…
Streaming Aggregation
Incremental Algorithms
[Figure: Old Ranks + Graph Update = New Ranks]

Existing Solutions
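Direct mutation:
    // Increments the row's fields in place (Row has mutable fields x and y)
    def f(row: Row): Unit = {
      row.x += 1
      row.y += 1
    }
[Figure: a map task runs f, fails partway through, and is retried; the row goes from x = 0, y = 0 to x = 1, y = 0 to x = 2, y = 1]
Problem: Task failures corrupt state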
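The failure on this slide can be reproduced with a small simulation (hypothetical names; the crash is injected by a flag): the first attempt increments `x` and dies before incrementing `y`, and the retry applies both increments again to the already-modified row.

```scala
// Simulation of the direct-mutation problem: a failed attempt leaves its
// input half-updated, and the retry then updates it a second time.
object DirectMutation {
  // Row with mutable fields, as in the slide's example
  final class Row(var x: Int, var y: Int)

  // Runs f on the row in place; failAfterX simulates a crash after x++
  // but before y++. Returns true if the attempt completed.
  def attempt(row: Row, failAfterX: Boolean): Boolean = {
    row.x += 1
    if (failAfterX) return false
    row.y += 1
    true
  }

  // Intended result: x = 1, y = 1 (f applied exactly once)
  def runWithRetry(row: Row): Row = {
    val ok = attempt(row, failAfterX = true)  // first attempt crashes midway
    if (!ok) attempt(row, failAfterX = false) // the scheduler retries the task
    row
  }
}
```

The row ends at x = 2, y = 1 instead of the intended x = 1, y = 1, matching the slide's timeline.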

Existing Solutions
Atomic batched database updates:
    BEGIN;
    row.x += 1;
    row.y += 1;
    COMMIT;
[Figure: a Pages task and an Ads task over time, with datasets Clicks, Ad Views, Top Ads, and Popular Pages]
Problem: Long-lived snapshots are uncommon in databases

Existing Solutions
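Full copy:
    // Copies the whole row, then increments the copy's fields
    // (Row is a case class with mutable fields x and y)
    def f(row: Row): Row = {
      val newRow = row.copy()
      newRow.x += 1
      newRow.y += 1
      newRow
    }
Problem: Very inefficient for small updates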
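Applied to a whole partition rather than a single row, the full-copy approach looks like this sketch (hypothetical `FullCopy.incrementAt`; an `Array` stands in for the partition): the old version survives, but a one-element update costs time and memory proportional to the partition size.

```scala
// Full-copy sketch: updating one element of an Array-backed partition
// copies every element so that the old version stays intact.
object FullCopy {
  // Returns a new array with position i incremented, plus the number of
  // elements that were copied to produce it
  def incrementAt(data: Array[Int], i: Int): (Array[Int], Int) = {
    val out = data.clone() // copies all data.length elements
    out(i) += 1
    (out, data.length)
  }
}
```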

Tradeoff
Immutability (for fault tolerance, straggler mitigation, dataset reusability, parallel recovery, …)
vs.
Fine-grained updates (for streaming aggregation, incremental algorithms, …)
Can we have both?

Persistent Data Structures
Support efficient updates without modifying the existing version, using structural sharing (fine-grained copy-on-write)
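A minimal example of structural sharing, assuming a plain (unbalanced) persistent binary search tree rather than the radix tree used later: an insert copies only the nodes on the root-to-leaf path and reuses every other subtree by reference. The names `PersistentBST`, `insert`, and `get` are illustrative.

```scala
// Persistent (immutable) binary search tree with path copying
object PersistentBST {
  sealed trait Tree
  case object Leaf extends Tree
  final case class Node(left: Tree, key: Int, value: String, right: Tree) extends Tree

  def insert(t: Tree, k: Int, v: String): Tree = t match {
    case Leaf => Node(Leaf, k, v, Leaf)
    case n @ Node(l, key, _, r) =>
      if (k < key) n.copy(left = insert(l, k, v))       // copy node, share right
      else if (k > key) n.copy(right = insert(r, k, v)) // copy node, share left
      else n.copy(value = v)                            // new value, share both
  }

  def get(t: Tree, k: Int): Option[String] = t match {
    case Leaf => None
    case Node(l, key, value, r) =>
      if (k < key) get(l, k) else if (k > key) get(r, k) else Some(value)
  }
}
```

After inserting key 80 into a tree rooted at 50, the new root's left subtree is the very same object as the old root's left subtree; only the right spine was copied.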

IndexedRDD
An RDD-based distributed key-value store supporting immutable semantics and efficient updates
[Figure: system components: IndexedRDD (800 LOC), Spark, PART (1100 LOC)]

IndexedRDD
[Figure: a cached RDD[T] stores each partition as an Array[T]; an IndexedRDD[K,V] stores each partition as a PART[K,V]]

IndexedRDD API
    class IndexedRDD[K: KeySerializer, V] extends RDD[(K, V)] {

      // Fine-Grained Ops ------------------------
      def get(k: K): Option[V]
      def multiget(ks: Array[K]): Map[K, V]
      def put(k: K, v: V): IndexedRDD[K, V]
      def multiput(kvs: Map[K, V],
                   merge: (K, V, V) => V): IndexedRDD[K, V]
      def delete(ks: Array[K]): IndexedRDD[K, V]

      // Accelerated RDD Ops ---------------------
      def filter(pred: (K, V) => Boolean): IndexedRDD[K, V]

      // Specialized Joins -----------------------
      def fullOuterJoin[V2, W](other: RDD[(K, V2)])(f)
        : IndexedRDD[K, W]
    }
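The semantics of the fine-grained operations can be mimicked on a single machine. The sketch below is a hypothetical stand-in (`LocalIndexed` is not part of IndexedRDD, and it is backed by Scala's immutable `TreeMap` instead of PART and Spark) that follows the slide's signatures: every update returns a new version and leaves the receiver unchanged. In this sketch, `multiput` resolves keys that already exist with `merge(key, oldValue, newValue)`; that argument order is an assumption.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical local stand-in mirroring the slide's fine-grained ops.
// Each update returns a new LocalIndexed; existing versions never change.
final class LocalIndexed[K, V] private (private val data: TreeMap[K, V]) {
  def get(k: K): Option[V] = data.get(k)

  // Returns only the requested keys that are present
  def multiget(ks: Array[K]): Map[K, V] =
    ks.iterator.filter(k => data.contains(k)).map(k => k -> data(k)).toMap

  def put(k: K, v: V): LocalIndexed[K, V] = new LocalIndexed(data.updated(k, v))

  // Existing keys are combined as merge(key, oldValue, newValue)
  def multiput(kvs: Map[K, V], merge: (K, V, V) => V): LocalIndexed[K, V] =
    new LocalIndexed(kvs.foldLeft(data) { case (m, (k, v)) =>
      m.updated(k, m.get(k).fold(v)(old => merge(k, old, v)))
    })

  def delete(ks: Array[K]): LocalIndexed[K, V] = new LocalIndexed(data -- ks)

  def filter(pred: (K, V) => Boolean): LocalIndexed[K, V] =
    new LocalIndexed(data.filter { case (k, v) => pred(k, v) })

  def size: Int = data.size
}

object LocalIndexed {
  def apply[K: Ordering, V](kvs: (K, V)*): LocalIndexed[K, V] =
    new LocalIndexed(TreeMap(kvs: _*))
}
```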

Persistent Adaptive Radix Trees (PART)

Adaptive Radix Tree
Recent work in main-memory indexing for databases
256-ary radix tree (trie) with node compression
V. Leis, A. Kemper, and T. Neumann. The adaptive radix tree: ARTful indexing for main-memory databases. In ICDE 2013.

Why a Radix Tree?
1.  Sorted order traversals (unlike hash tables)
2.  Better asymptotic performance than binary search trees for long keys (O(k) vs. O(k log n))
3.  Very efficient union and intersection operations
4.  Predictable performance: no rehashing or rebalancing
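To illustrate points 1 and 2, here is a deliberately simplified radix tree in Scala: an uncompressed trie over the UTF-8 bytes of each key, without ART's adaptive node sizes or path compression (the `ByteTrie` object and its methods are hypothetical). A lookup follows one child per key byte, so its cost grows with the key length k rather than with the number of keys, and an in-order walk returns keys in sorted order.

```scala
import scala.collection.immutable.TreeMap

// Uncompressed byte-wise trie (not ART): one level per key byte
object ByteTrie {
  // A node holds an optional value and children keyed by the next byte
  // (0..255); TreeMap keeps children in ascending byte order
  final case class Node(value: Option[Int], children: TreeMap[Int, Node])
  val empty: Node = Node(None, TreeMap.empty[Int, Node])

  // Unsigned byte values of the key's UTF-8 encoding
  private def bytes(key: String): List[Int] =
    key.getBytes("UTF-8").toList.map(_ & 0xff)

  def insert(root: Node, key: String, v: Int): Node = {
    def go(n: Node, bs: List[Int]): Node = bs match {
      case Nil => n.copy(value = Some(v))
      case b :: rest =>
        val child = n.children.getOrElse(b, empty)
        n.copy(children = n.children.updated(b, go(child, rest)))
    }
    go(root, bytes(key))
  }

  // Follows one child per key byte: O(k) for a k-byte key
  def get(root: Node, key: String): Option[Int] = {
    val node = bytes(key).foldLeft(Option(root))((n, b) => n.flatMap(_.children.get(b)))
    node.flatMap(_.value)
  }

  // In-order walk: a node's own key first, then its children in byte
  // order, so keys come out sorted by their unsigned bytes
  def keys(root: Node): List[String] = {
    def go(n: Node, prefix: Vector[Byte]): List[String] = {
      val here = n.value.map(_ => new String(prefix.toArray, "UTF-8")).toList
      here ++ n.children.toList.flatMap { case (b, c) => go(c, prefix :+ b.toByte) }
    }
    go(root, Vector.empty)
  }
}
```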

Persistent Adaptive Radix Tree (PART)
Adds persistence to the adaptive radix tree using path copying (shadowing) and reference counting
1100 lines of Java
[Figure: path copying: the old and new versions have separate roots and share all nodes off the updated path]
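Path copying can be sketched on a small immutable trie keyed by characters (hypothetical `PathCopy` object; this shows only the copying half, not PART's reference counting or its adaptive nodes). Inserting a key copies one node per character plus the root, and `insert` also returns that count; every subtree off the path is shared with the old version.

```scala
// Path copying on an immutable character trie
object PathCopy {
  final case class Node(value: Option[Int], children: Map[Char, Node])
  val empty: Node = Node(None, Map.empty)

  // Returns the new root and the number of nodes copied or created
  def insert(n: Node, key: String, v: Int): (Node, Int) =
    if (key.isEmpty) (n.copy(value = Some(v)), 1)
    else {
      val c = key.head
      val (newChild, copied) = insert(n.children.getOrElse(c, empty), key.tail, v)
      // Copy this node, pointing at the new child; all other children are shared
      (n.copy(children = n.children.updated(c, newChild)), copied + 1)
    }
}
```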

Batched Updates
Updates can be performed in place if:
1.  The update applies to nodes referenced by only one version, and
2.  That version will never be referenced in the future
[Figure: versions v1, v2, v3 at times t1 and t2]
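One way to realize these two conditions is the "edit token" technique used by Clojure's transient collections, swapped in here for illustration (not necessarily how PART implements it): every node records which batch created it, and within a batch a node created by that same batch is mutated in place, since only the unpublished version under construction references it. Nodes from older versions are copied first. `Batched`, `Token`, and the `allocations` counter are hypothetical.

```scala
// Batched updates on a character trie using per-batch edit tokens
object Batched {
  // Identifies one batch; each node remembers the batch that created it
  final class Token
  final class Node(var value: Option[Int], var children: Map[Char, Node], val owner: Token)

  var allocations = 0 // nodes created so far, to compare strategies

  private def newNode(value: Option[Int], children: Map[Char, Node], owner: Token): Node = {
    allocations += 1
    new Node(value, children, owner)
  }

  // A node owned by this batch is referenced only by the version under
  // construction, so it may be mutated; any older node is copied first
  private def editable(n: Node, t: Token): Node =
    if (n.owner eq t) n else newNode(n.value, n.children, t)

  private def insertInto(n: Node, key: String, v: Int, t: Token): Node = {
    val m = editable(n, t)
    if (key.isEmpty) m.value = Some(v)
    else {
      val c = key.head
      val child = m.children.getOrElse(c, newNode(None, Map.empty, t))
      m.children = m.children.updated(c, insertInto(child, key.tail, v, t))
    }
    m
  }

  // Applies a whole batch under one fresh token and returns the new version;
  // the token is never reused, so the result is never mutated afterwards
  def insertAll(root: Node, kvs: Seq[(String, Int)]): Node = {
    val t = new Token
    kvs.foldLeft(root) { case (r, (k, v)) => insertInto(r, k, v, t) }
  }

  def get(root: Node, key: String): Option[Int] = {
    val node = key.foldLeft(Option(root))((n, c) => n.flatMap(_.children.get(c)))
    node.flatMap(_.value)
  }

  val empty: Node = new Node(None, Map.empty, new Token)
}
```

Inserting 200 keys as one batch allocates fewer nodes than inserting them as 200 single-key batches, because the root and other upper nodes are copied at most once per batch instead of once per key; earlier versions remain readable in both cases.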

Microbenchmarks
[Figure: insert throughput (single threaded), in Minserts/s, for PART (batch size 1K), PART (batch size 10M), hash table, red-black tree, and B-tree]

Microbenchmarks
[Figure: three panels: lookup throughput (Mlookups/s), scan throughput (M elements scanned/s), and memory usage (GB)]

Incremental Checkpointing
Tree partitioning: minimize the number of changed files by segregating frequently changed nodes from infrequently changed nodes.
[Figure: tree split into a frequently changing part and an infrequently changing part]
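A much simplified analog of this idea, under loudly stated assumptions: entries are grouped into named segments that would each be checkpointed to its own file, a segment is marked dirty when any of its entries changes, and a checkpoint rewrites only dirty segments. Here the grouping is by key rather than by tree node, and `IncrementalCheckpoint`, `Segment`, and `checkpoint` are hypothetical; no real files are written.

```scala
// Simplified incremental checkpointing over key segments (not tree nodes)
object IncrementalCheckpoint {
  // A group of entries that would be checkpointed to its own file
  final case class Segment(name: String, data: Map[String, Long], dirty: Boolean)

  // Apply an update to the segment holding the key and mark it dirty
  def update(segs: Vector[Segment], key: String, delta: Long): Vector[Segment] =
    segs.map { s =>
      if (s.data.contains(key))
        s.copy(data = s.data.updated(key, s.data(key) + delta), dirty = true)
      else s
    }

  // "Write" only the dirty segments (here: report their names), then clear flags
  def checkpoint(segs: Vector[Segment]): (Vector[Segment], List[String]) = {
    val written = segs.filter(_.dirty).map(_.name).toList
    (segs.map(_.copy(dirty = false)), written)
  }
}
```

If all updates hit keys placed in the "hot" segment, each checkpoint rewrites that one segment and leaves the cold segments' files alone.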

Real app: Streaming aggregation
Counting occurrences of 26-character string ids
Synthetic data generated on the workers (8 r3.2xlarge)
Load with 1 billion keys, stream of 100 million keys
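A single-machine analog of this workload (hypothetical `StreamingCount` object; not the benchmark code, and at toy scale): synthetic 26-character ids arrive in batches, each batch is pre-aggregated with `groupBy`, and its counts are merged into the running totals by summation, which is the role a summing `merge` function would play in `multiput`.

```scala
import scala.util.Random

// Toy streaming aggregation: count occurrences of string ids in batches
object StreamingCount {
  // Random 26-character lowercase id (synthetic data)
  def randomId(rng: Random): String =
    Iterator.continually(('a' + rng.nextInt(26)).toChar).take(26).mkString

  // Merge one batch into the counts, producing a new version of the table
  def applyBatch(counts: Map[String, Long], batch: Seq[String]): Map[String, Long] =
    batch.groupBy(identity).foldLeft(counts) { case (acc, (id, hits)) =>
      acc.updated(id, acc.getOrElse(id, 0L) + hits.size)
    }
}
```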

Limitations
1.  GC pauses
    Future work: off-heap storage with reference counting
2.  Scan performance
    Future work: layout-aware allocator

Thanks!
ankurd@eecs.berkeley.edu
IndexedRDD: https://github.com/amplab/spark-indexedrdd
Use it in your project:
    resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

    libraryDependencies += "amplab" % "spark-indexedrdd" % "0.1"
