Dataflows: The Abstraction that Powers Big Data

Raúl Castro Fernández
Computer Science PhD Student, Imperial College London
rc3011@doc.ic.ac.uk | @raulcfernandez

"Big Data needs Democratization"

Democratization of Data

Developers and DBAs are no longer the only ones generating, processing and analyzing data. Decision makers, domain scientists, application users, journalists, crowd workers, and everyday consumers, sales, marketing…

+ Everyone has data
+ Many have interesting questions
- Not everyone knows how to analyze it

Bob and the Local Expert

- Barrier of human communication
- Barrier of professional relations

"The limits of my language mean the limits of my world."
Ludwig Wittgenstein, Tractatus Logico-Philosophicus (1922)

First step to democratize Big Data: offer a familiar programming interface

Outline
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs
• Checkpoint-based fault tolerance for SDGs
• Experimental evaluation

Mutable State in a Recommender System

    Matrix userItem = new Matrix();   // User-Item matrix (UI)
    Matrix coOcc = new Matrix();      // Co-Occurrence matrix (CO)

    // Update with new ratings
    void addRating(int user, int item, int rating) {
        userItem.setElement(user, item, rating);
        updateCoOccurrence(coOcc, userItem);
    }

    // Multiply for recommendation
    Vector getRec(int user) {
        Vector userRow = userItem.getRow(user);
        Vector userRec = coOcc.multiply(userRow);
        return userRec;
    }

Example state:

  User-Item matrix (UI)           Co-Occurrence matrix (CO)
           Item-A  Item-B                  Item-A  Item-B
  User-A      4       5           Item-A      1       1
  User-B      0       5           Item-B      1       2

getRec multiplies the user's row of UI (e.g. [0 5] for User-B) by CO to produce the recommendation vector.

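As a usage illustration only (the Recommender wrapper class and the sample ratings are assumptions, not part of the deck), the two methods could be driven like this:

    // Hypothetical driver for the recommender methods shown above.
    public class RecommenderDemo {
        public static void main(String[] args) {
            Recommender rec = new Recommender();    // assumed class holding userItem and coOcc
            rec.addRating(0, 0, 4);                 // User-A rates Item-A with 4
            rec.addRating(0, 1, 5);                 // User-A rates Item-B with 5
            rec.addRating(1, 1, 5);                 // User-B rates Item-B with 5
            Vector recs = rec.getRec(1);            // recommendation vector for User-B
            System.out.println(recs);
        }
    }
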
Challenges When Executing with Big Data

    Matrix userItem = new Matrix();
    Matrix coOcc = new Matrix();

Big Data problem: the matrices become large.
> Mutable state leads to concise algorithms but complicates parallelism and fault tolerance
> Cannot lose state after failure
> Need to manage state to support data-parallelism

Using Current Distributed Dataflow Frameworks

Input data -> dataflow -> Output data

> No mutable state simplifies fault tolerance
> MapReduce: Map and Reduce tasks
> Storm: no support for state
> Spark: immutable RDDs

Imperative Big Data Processing

> Programming distributed dataflow graphs requires learning new programming models

Our goal: run Java programs with mutable state, but with the performance and fault tolerance of distributed dataflow systems.

Stateful Dataflow Graphs: From Imperative Programs to Distributed Dataflows

Program.java -> SDGs: Stateful Dataflow Graphs

> Mutable distributed state in dataflow graphs
> @Annotations help with translation from Java to SDGs
> Checkpoint-based fault tolerance recovers mutable state after failure

Outline: SDG (Stateful Dataflow Graphs) and handling distributed state in SDGs

SDG: Data, State and Computation

> SDGs separate data and state to allow data and pipeline parallelism
  – Task Elements (TEs) process data
  – State Elements (SEs) represent state
  – Dataflows represent data
> Task Elements have local access to State Elements

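The deck describes the TE/SE split but does not show programming interfaces for it. The following is a minimal illustrative sketch, with all names assumed rather than taken from SEEP, of how the separation between computation (TEs) and state (SEs) could look in Java:

    // Illustrative only (all names are assumptions): computation (TEs) kept apart from state (SEs).
    interface StateElement { }                        // marker for runtime-managed mutable state

    interface MatrixSE extends StateElement {         // a matrix-shaped State Element
        void setElement(int row, int col, int value);
        int[] getRow(int row);
    }

    interface TaskElement<IN, OUT> {                  // computation applied to each dataflow item
        OUT process(IN input);
    }

    class Rating {                                    // a dataflow item
        int user, item, value;
    }

    class AddRatingTE implements TaskElement<Rating, Void> {
        private final MatrixSE userItem;              // TEs get local access to SEs
        AddRatingTE(MatrixSE userItem) { this.userItem = userItem; }
        public Void process(Rating r) {
            userItem.setElement(r.user, r.item, r.value);
            return null;
        }
    }
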
Distributed Mutable State

State Elements support two abstractions for distributed mutable state:
– Partitioned SEs: task elements always access state by key
– Partial SEs: task elements can access the complete state

Distributed Mutable State: Partitioned SEs

> Partitioned SEs are split into disjoint partitions: state is partitioned according to a partitioning key, and TEs access it by key
> The dataflow is routed according to a hash function, e.g. hash(msg.id) over key space [0-N], split into partitions [0-k] and [(k+1)-N]

Example: the User-Item matrix (UI) partitioned by key:

           Item-A  Item-B
  User-A      4       5
  User-B      0       5

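A minimal sketch of the key-based routing described above; modulo hashing over a fixed partition count is one common scheme and is an assumption here, not necessarily how SEEP splits the [0-N] key space:

    // Illustrative key-based routing for a Partitioned SE (not the actual SEEP code).
    class PartitionedRouter {
        private final int numPartitions;
        PartitionedRouter(int numPartitions) { this.numPartitions = numPartitions; }

        // Each key deterministically maps to one partition, so a TE instance only
        // ever touches the disjoint slice of state that it owns.
        int partitionFor(int key) {
            return Math.floorMod(Integer.hashCode(key), numPartitions);
        }
    }

    // Usage: route a rating for user 42 to the partition owning that user's row.
    //   int target = new PartitionedRouter(4).partitionFor(42);
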
Distributed Mutable State: Partial SEs

> Partial SEs give each node a local state instance
> Partial SE access by TEs can be local or global
  – Local access: data is sent to one instance
  – Global access: data is sent to all instances

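To make the local/global distinction concrete, here is an illustrative dispatcher; the Node interface and the random choice for local access are assumptions made for the sketch:

    // Illustrative dispatch for Partial SE access (not the actual SEEP code).
    interface Node { void send(Object data); }

    class PartialDispatcher {
        private final java.util.List<Node> nodes;      // each node holds a local partial instance
        private final java.util.Random random = new java.util.Random();
        PartialDispatcher(java.util.List<Node> nodes) { this.nodes = nodes; }

        // Local access: the data only needs one partial instance, so pick any node.
        void sendLocal(Object data) {
            nodes.get(random.nextInt(nodes.size())).send(data);
        }

        // Global access: the result depends on every partial instance, so broadcast.
        void sendGlobal(Object data) {
            for (Node node : nodes) node.send(data);
        }
    }
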
Merging Distributed Mutable State

> Reading all partial SE instances results in a set of partial values: the multiple partial values are collected and combined by merge logic
> Requires application-specific merge logic

Outline: Translating Java programs to SDGs (@Annotations)

From Imperative Code to Execution

Program.java -> Annotated program -> SEEP

> SEEP: data-parallel processing platform [SIGMOD'13]
• Translation occurs in two stages:
  – Static code analysis: from Java to SDG
  – Bytecode rewriting: from SDG to SEEP

Translation Process

Annotated Program.java
  -> Extract TEs, SEs and accesses (SOOT framework)
  -> Live variable analysis (SOOT framework)
  -> TE and SE access code assembly (Javassist)
  -> SEEP runnable

> Extract state and state access patterns through static code analysis
> Generation of runnable code using TE and SE connections

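The deck names Javassist for the bytecode-rewriting stage but shows no code. The snippet below is only a generic illustration of the kind of rewriting Javassist supports (intercepting accesses to a state field); the class and field names come from the earlier recommender example, and the logging replacement is an assumption, not what the SEEP translator actually generates:

    // Generic Javassist sketch: intercept reads of the 'userItem' field inside addRating.
    // This is NOT the SEEP translator, just the style of rewriting it builds on.
    import javassist.CannotCompileException;
    import javassist.ClassPool;
    import javassist.CtClass;
    import javassist.CtMethod;
    import javassist.expr.ExprEditor;
    import javassist.expr.FieldAccess;

    public class RewriteSketch {
        public static byte[] rewrite() throws Exception {
            ClassPool pool = ClassPool.getDefault();
            CtClass cc = pool.get("Program");                 // the user's annotated class
            CtMethod m = cc.getDeclaredMethod("addRating");
            m.instrument(new ExprEditor() {
                @Override
                public void edit(FieldAccess f) throws CannotCompileException {
                    if (f.isReader() && f.getFieldName().equals("userItem")) {
                        // Keep the original read, but make the state access visible.
                        f.replace("{ System.out.println(\"SE access: userItem\"); $_ = $proceed(); }");
                    }
                }
            });
            return cc.toBytecode();                           // bytecode handed to the SEEP runtime
        }
    }
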
Partitioned State Annotation

    @Partitioned Matrix userItem = new SeepMatrix();
    Matrix coOcc = new Matrix();

    void addRating(int user, int item, int rating) {
        userItem.setElement(user, item, rating);
        updateCoOccurrence(coOcc, userItem);
    }

    Vector getRec(int user) {
        Vector userRow = userItem.getRow(user);
        Vector userRec = coOcc.multiply(userRow);
        return userRec;
    }

> The @Partitioned field annotation indicates partitioned state, with the dataflow routed by hash(msg.id)

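The deck uses the annotations without showing their declarations. A plausible minimal set of declarations is sketched below purely for illustration; the retention policy is an assumption, and @Global is omitted because it annotates a use site, which in standard Java would need a TYPE_USE target:

    // Illustrative declarations only; the real SEEP annotation definitions may differ.
    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    @Retention(RetentionPolicy.CLASS)   // kept in the class file for static analysis and bytecode rewriting
    @Target(ElementType.FIELD)
    @interface Partitioned { }

    @Retention(RetentionPolicy.CLASS)
    @Target(ElementType.FIELD)
    @interface Partial { }

    @Retention(RetentionPolicy.CLASS)
    @Target(ElementType.PARAMETER)      // used as in merge(@Collection Vector[] v)
    @interface Collection { }
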
Partial State and Global Annotations

    @Partitioned Matrix userItem = new SeepMatrix();
    @Partial Matrix coOcc = new SeepMatrix();

    void addRating(int user, int item, int rating) {
        userItem.setElement(user, item, rating);
        updateCoOccurrence(@Global coOcc, userItem);
    }

> The @Partial field annotation indicates partial state
> @Global annotates a variable to indicate access to all partial instances

Partial and Collection Annotations

    @Partitioned Matrix userItem = new SeepMatrix();
    @Partial Matrix coOcc = new SeepMatrix();

    Vector getRec(int user) {
        Vector userRow = userItem.getRow(user);
        @Partial Vector puRec = @Global coOcc.multiply(userRow);
        Vector userRec = merge(puRec);
        return userRec;
    }

    Vector merge(@Collection Vector[] v) { /*…*/ }

> The @Collection annotation indicates merge logic

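The body of merge is elided in the slide. One plausible implementation is an element-wise sum of the partial recommendation vectors; the sketch below assumes that policy and uses plain arrays instead of the deck's Vector type:

    // Illustrative merge of partial results from all Partial SE instances.
    // Element-wise addition is an assumed policy; max or average would be other application-specific choices.
    static double[] merge(double[][] partials) {
        double[] merged = new double[partials[0].length];
        for (double[] partial : partials) {
            for (int i = 0; i < merged.length; i++) {
                merged[i] += partial[i];
            }
        }
        return merged;
    }
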
Outline: Checkpoint-based fault tolerance for SDGs (failures)

Challenges of Making SDGs Fault Tolerant

Physical deployment of an SDG: task elements access local in-memory state on the physical nodes (RAM).
> Node failures may lead to state loss

Checkpointing state:
• No updates allowed while state is being checkpointed
• Checkpointing state should not impact the data processing path

State backup:
• Backups are large and cannot be stored in memory
• Large writes to disk through the network have a high cost

Checkpoint Mechanism for Fault Tolerance

Asynchronous, lock-free checkpointing:
1. Freeze mutable state for checkpointing
2. Dirty state supports updates concurrently
3. Reconcile the dirty state

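A minimal sketch of the freeze/dirty-state/reconcile idea for a key-value shaped SE; the data structures, the single-writer assumption and the reconcile-by-overwrite policy are all illustrative assumptions, not the SEEP implementation:

    // Illustrative asynchronous checkpointing with a dirty buffer (single-writer sketch;
    // a real lock-free implementation needs more care with concurrency).
    import java.util.HashMap;
    import java.util.Map;

    class CheckpointableState {
        private final Map<Integer, Integer> state = new HashMap<>();  // frozen during a checkpoint
        private final Map<Integer, Integer> dirty = new HashMap<>();  // absorbs updates meanwhile
        private volatile boolean checkpointing = false;

        void put(int key, int value) {
            if (checkpointing) dirty.put(key, value);   // 2. updates go to the dirty state
            else state.put(key, value);
        }

        Integer get(int key) {
            Integer v = dirty.get(key);                 // dirty state holds the freshest values
            return (v != null) ? v : state.get(key);
        }

        Map<Integer, Integer> startCheckpoint() {
            checkpointing = true;                       // 1. freeze the mutable state
            return state;                               // snapshot can now be written out asynchronously
        }

        void finishCheckpoint() {
            state.putAll(dirty);                        // 3. reconcile the dirty state
            dirty.clear();
            checkpointing = false;
        }
    }
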
Distributed M to N Checkpoint Backup

M to N distributed backup and parallel recovery

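The deck gives no detail beyond the M-to-N phrase. The sketch below shows one plausible way a node's checkpoint could be split into chunks so that several workers restore it in parallel; chunking by key hash is an assumption:

    // Illustrative M-to-N split of checkpoint state for parallel recovery (an assumption).
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class CheckpointChunker {
        // Split one node's checkpoint into n disjoint chunks by key hash, so n recovery
        // workers can each load and restore their piece in parallel.
        static List<Map<Integer, Integer>> split(Map<Integer, Integer> checkpoint, int n) {
            List<Map<Integer, Integer>> chunks = new ArrayList<>();
            for (int i = 0; i < n; i++) chunks.add(new HashMap<>());
            checkpoint.forEach((key, value) ->
                chunks.get(Math.floorMod(Integer.hashCode(key), n)).put(key, value));
            return chunks;
        }
    }
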
Evaluation of SDG Performance

How does mutable state impact performance?
How efficient are translated SDGs?
What is the throughput/latency trade-off?

Experimental set-up:
– Amazon EC2 (c1 and m1 xlarge instances)
– Private cluster (4-core 3.4 GHz Intel Xeon servers with 8 GB RAM)
– Sun Java 7, Ubuntu 12.04, Linux kernel 3.10

Processing with Large Mutable State

> addRating and getRec functions from the recommender algorithm, while changing the read/write ratio
[Plot: throughput (1000 requests/s) and latency (ms) vs. workload (state read/write ratio, 1:5 to 5:1)]

Combines batch and online processing to serve fresh results over large mutable state.

Efficiency of Translated SDG

> Batch-oriented, iterative logistic regression
[Plot: throughput (GB/s) vs. number of nodes (25-100) for SDG and Spark]

Translated SDG achieves performance similar to a non-mutable dataflow.

Latency/Throughput Tradeoff

> Streaming word count query, reporting counts over windows
[Plot: throughput (1000 requests/s) vs. window size (ms, 10-10000) for SDG, Naiad-LowLatency, Naiad-HighThroughput and Streaming Spark]

SDGs achieve high throughput while maintaining low latency.

Summary

Running Java programs with the performance of current distributed dataflow frameworks.

SDG: Stateful Dataflow Graphs
– Abstractions for distributed mutable state
– Annotations to disambiguate types of distributed state and state access
– Checkpoint-based fault tolerance mechanism

https://github.com/lsds/Seep/
https://github.com/raulcf/SEEPng/

Thank you! Any questions?
@raulcfernandez | rc3011@doc.ic.ac.uk

BACKUP SLIDES

Scalability on State Size and Throughput

> Increase state size in a mutable KV store
[Plot: throughput (million requests/s) and latency (ms) vs. aggregated memory (GB, 50-200)]

Support large state without compromising throughput or latency while staying fault tolerant.

Iteration in SDGs

> Local iteration supported by one node
> Iteration across TEs requires a cycle in the dataflow

Types of Annotations

• State annotations: Partitioned, Partial
• State access annotations: Global, Partial, Collection
• Data annotations: Batch, Stream

Overhead of SDG Fault Tolerance

[Plots: latency (ms) vs. state size (GB, No FT and 1-5), and latency (ms) vs. checkpoint frequency (s, 2-10 and No FT)]

The fault tolerance mechanism's impact on performance and latency is small: state size and checkpointing frequency do not affect performance.

Fault Tolerance Overhead

[Plot: throughput (10,000 requests/s) and latency (ms) vs. aggregated memory (MB, 10-2000) for SDG, Naiad-NoDisk and Naiad-Disk]

Recovery Times

[Plot: recovery time (s) vs. state size (GB, 1-4) for 1-to-1, 2-to-1, 1-to-2 and 2-to-2 recovery]

Stragglers

[Plot: throughput (1000 requests/s) and number of nodes over time (s, 0-60)]

Fault Tolerance: Sync. vs. Async.

[Plot: throughput (1000 requests/s) and latency (s) vs. state size (GB, 1-4) for synchronous and asynchronous checkpointing]

Comparison to State-of-the-Art

  System      Large State   Mutable State   Low Latency   Iteration
  MapReduce   n/a           n/a             No            No
  Spark       n/a           n/a             No            Yes
  Storm       n/a           n/a             Yes           No
  Naiad       No            Yes             Yes           Yes
  SDG         Yes           Yes             Yes           Yes

SDGs are the first stateful, fault-tolerant model, enabling execution of imperative code with explicit state.

Characteristics of SDGs

> Runtime data parallelism (elasticity): adaptation to varying workloads and a mechanism against stragglers
> Support for cyclic graphs: efficiently represents iterative algorithms
> Low latency: pipelining tasks decreases latency

Local Expert and Bob

Bob: Hi, I have a query to run on "Big Data".
Expert: Ok, cool, tell me about it.
Bob: I want to know sales per employee on Saturdays.
Expert: … well … ok, come back in 3 days.
Bob: Well, this is actually pretty urgent…
Expert: … 2 days, I'm pretty busy.

2 days later:

Bob: Hi! Do you have the results?
Expert: Yes, here you have your sales last Saturday.
Bob: My sales? I meant all employee sales, and not only last Saturday.
Expert: Oops, sorry for that, give me 2 days…

17th-18th Nov 2014, Madrid (Spain)
