DISTRIBUTING WORK ACROSS CLUSTERS
Adventures With Riak Pipe
Last Updated: October 31, 2013

Susan Potter @SusanPotter
Background

Concepts & Intuitions

Applications

WHOAMI

An app + middleware dev who has felt Ops pain too
BACKGROUND

MODEL/ASSUMPTIONS

MOTIVATING USES

COMMON APPROACHES

WHY DISTRIBUTE? WHY DECENTRALIZE?

→ Probability of system failure
→ Large datasets (data locality)
→ Eliminate SPOFs
→ Commodity hardware

CONCEPTS & INTUITIONS

LAYERS

TERMINOLOGY (CORE)

→ Partition
→ Vnode
→ Hinted Handoff
→ Read Repair

TERMINOLOGY (PIPE)

→ Pipe
→ Fitting ("phase")
→ Queue
→ Worker

EXAMPLE UNIX PIPE

find . -name "*.rb" \
  | xargs grep -E "#.*TODO:" \
  | wc -l
Character-based, through file descriptors

EXAMPLE FUNCTION COMPOSITION

(length . mapToUpper . sanitize) input
Value-based, through functions

EXAMPLE RIAK PIPE

[#fitting_spec{name = fetch_trades,
               module = riskmgr_fetch_trades,
               ...},
 #fitting_spec{name = calc_var,
               module = riskmgr_calc_var,
               ...}]

Message-based, across nodes
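
As a rough mental model of the spec above, a pipe can be pictured as an ordered list of named stages, each consuming the outputs of the previous one. The sketch below is plain Erlang and does not use riak_pipe at all: the module name pipe_sketch, the {Name, Fun} stage tuples, and the stand-in trade data are hypothetical, and everything runs in a single process instead of as messages across nodes.

```erlang
-module(pipe_sketch).
-export([run/2, demo/0]).

%% Run Inputs through Stages in order. Each stage is {Name, Fun}, where
%% Fun maps one input to a list of outputs (zero, one, or many), loosely
%% mirroring how a fitting may emit any number of outputs per input.
run(Stages, Inputs) ->
    lists:foldl(fun({_Name, Fun}, Acc) -> lists:flatmap(Fun, Acc) end,
                Inputs, Stages).

%% Hypothetical two-stage pipe echoing the fetch_trades -> calc_var shape.
%% The "calculation" is a placeholder (absolute value), not a real metric.
demo() ->
    Trades = [{acct1, [100, -40]}, {acct2, [25]}],
    Stages = [{fetch_trades,
               fun(Acct) -> proplists:get_value(Acct, Trades, []) end},
              {calc_var,
               fun(Amount) -> [abs(Amount)] end}],
    run(Stages, [acct1, acct2]).
```

Calling pipe_sketch:demo() feeds two account names in and gets one number per trade out. What the model leaves out is exactly what Riak Pipe adds: each stage running on many vnodes, with queues between them.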

VNODE AND WORKERS

→ vnode: manages lifecycle and queues of workers
→ vnode worker: processes inputs

VNODE WORKER BEHAVIOR

behaviour_info(callbacks) ->
    [{init, 2},
     {process, 3},
     {done, 1}];
behaviour_info(_Other) ->
    undefined.
%% Optionally two more too, not required.
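
To make the three required callbacks concrete, here is a minimal sketch of a module with the same callback names and arities. It is self-contained and deliberately does not depend on riak_pipe: the module name, the collector pid passed to init/2, and the drive/1 function that plays the vnode's role are all assumptions for illustration. A real worker would receive fitting details in init/2 and forward its outputs to the next fitting through riak_pipe.

```erlang
-module(double_worker_sketch).
-export([init/2, process/3, done/1, drive/1]).

-record(state, {partition, collector}).

%% init/2: called once when the worker starts. In this sketch the second
%% argument is just a collector pid (an assumption of the sketch).
init(Partition, Collector) ->
    {ok, #state{partition = Partition, collector = Collector}}.

%% process/3: called once per input. The second argument is ignored here.
process(Input, _Last, #state{collector = C} = State) ->
    C ! {output, Input * 2},
    {ok, State}.

%% done/1: called when no more inputs will arrive.
done(#state{collector = C}) ->
    C ! eoi,
    ok.

%% Hypothetical driver standing in for the vnode: init, feed, finish.
drive(Inputs) ->
    {ok, S0} = init(0, self()),
    S1 = lists:foldl(fun(I, S) -> {ok, S2} = process(I, false, S), S2 end,
                     S0, Inputs),
    ok = done(S1),
    collect([]).

%% Gather outputs, in arrival order, until the end-of-inputs marker.
collect(Acc) ->
    receive
        {output, O} -> collect([O | Acc]);
        eoi -> lists:reverse(Acc)
    end.
```

double_worker_sketch:drive([1, 2, 3]) walks the init, process, done lifecycle once and returns the doubled inputs.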

DISTRIBUTION: CHASHFUN

→ Use a uniformly distributed (random) hash fun to saturate workers
→ See Bryan Fink's previous Pipe presentations
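
The idea behind the first point can be sketched without Riak at all: a hash fun that spreads inputs evenly over partitions keeps every worker busy. The module below is an illustration under stated assumptions, not Riak Core's ring hashing. It uses the standard erlang:phash2/2 over a made-up ring of NumPartitions equal slots, and the names chash_sketch, partition_for/2, and spread/2 are hypothetical.

```erlang
-module(chash_sketch).
-export([partition_for/2, spread/2]).

%% Map any term to one of NumPartitions partitions (0-based).
%% erlang:phash2/2 returns an integer in 0..NumPartitions-1.
partition_for(Input, NumPartitions) ->
    erlang:phash2(Input, NumPartitions).

%% Count how many of Inputs land on each partition, to eyeball how
%% evenly a hash fun spreads the work.
spread(Inputs, NumPartitions) ->
    Zero = [{P, 0} || P <- lists:seq(0, NumPartitions - 1)],
    lists:foldl(fun(I, Acc) ->
                    P = partition_for(I, NumPartitions),
                    {P, N} = lists:keyfind(P, 1, Acc),
                    lists:keyreplace(P, 1, Acc, {P, N + 1})
                end, Zero, Inputs).
```

For example, chash_sketch:spread(lists:seq(1, 1000), 8) returns one {Partition, Count} pair per partition. The closer the counts are to each other, the better the workers are saturated.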

VALIDATION

%% #fitting_spec{module = ..., arg = 43, ...}
%% validate_arg/1
validate_arg(Arg) when is_integer(Arg) -> ok;
validate_arg(Arg) when is_atom(Arg) -> ok;
validate_arg(_) -> {error, "Argument must be a valid integer or atom"}.

LEAKY PIPES

RIAK (CORE) MECHANICS

→ Handoff: migrate vnodes from node to node

% Old node
% Called with last known State of worker
archive/1 % returns {ok, Archive}

% New node
% Called with Archive and State from old node
handoff/2 % returns {ok, NewState}
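
A toy pairing of the two callbacks shows how they fit together. In this sketch the worker state is assumed to be nothing more than a running count of inputs seen; the module name and demo/0 are hypothetical, and the actual transfer between nodes (which Riak Core handles) is not modeled.

```erlang
-module(handoff_sketch).
-export([archive/1, handoff/2, demo/0]).

%% Worker state in this sketch is just a count of inputs seen so far.

%% Old node: package up whatever the new owner needs to carry on.
archive(Count) ->
    {ok, Count}.

%% New node: merge the archive into the state that already exists there.
handoff(Archive, Count) ->
    {ok, Archive + Count}.

%% The old worker had seen 5 inputs; the new worker had already seen 2.
demo() ->
    {ok, Archive} = archive(5),
    handoff(Archive, 2).
```

The design point is that handoff/2 merges instead of overwriting: the worker on the new node may already have done work by the time the archive arrives.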

FAILURES

→ validate_arg: seen above
→ Pipe client process dies: YOLO
→ nval: how many vnodes to ask before failing, default 1
→ vnode worker logic: think using dataflow semantics
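
The nval point can be pictured as a bounded walk down a preference list. The sketch below is a simplified model and not riak_pipe's actual retry or error-reporting code: vnodes are represented as plain funs that return either {ok, Result} or the atom unavailable, and the names nval_sketch and try_preflist/3 are hypothetical.

```erlang
-module(nval_sketch).
-export([try_preflist/3]).

%% Offer Input to at most NVal entries of Preflist, in order.
%% Each entry is a fun that returns {ok, Result} or unavailable.
%% Returns the first success, or {error, preflist_exhausted}.
try_preflist(_Input, _Preflist, 0) ->
    {error, preflist_exhausted};
try_preflist(_Input, [], _NVal) ->
    {error, preflist_exhausted};
try_preflist(Input, [Vnode | Rest], NVal) when NVal > 0 ->
    case Vnode(Input) of
        {ok, Result} -> {ok, Result};
        unavailable  -> try_preflist(Input, Rest, NVal - 1)
    end.
```

With the default nval of 1, the first unavailable vnode is already a failure. Raising nval trades extra attempts for a better chance that the input gets processed.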

EXISTING FITTINGS

→ riak_pipe_w_tee: useful for intermediate results
→ riak_pipe_w_xform: simple delegator, takes a 3-arity function
→ riak_pipe_w_reduce: what you would expect, a simple accumulating reduce
→ riak_kv_pipe_get: take advantage of data locality with a cohosted KV store

APPLICATIONS

KNOWN USES

→ Riak's Map/Reduce
→ Risk metrics
→ Tenant Usage

TROUBLESHOOTING

→ riak_pipe:status/1: provides stats such as fittings, processed, failures, and queue_length
→ riak_pipe_w_crash fitting: used to test Riak Pipe
→ riak_pipe:active_pipelines/1: see all active pipelines
→ riak_pipe_cinfo module: cluster info interrogation module

FURTHER WORK

→ More applications (e.g. genetics, 3rd party APIs with rate limits)
→ Decentralized pipe control
→ Measure "completeness" of result set

RELATED WORK

→ "Pipe" libraries EVERYWHERE (pipe, scalaz-stream)
→ riak_pg
→ Map-Reduce frameworks
→ Staged Event Driven Architecture
→ Event Stream Processing
→ Dataflow multi-stage processing

ROYAL FAIL

http://www.flickr.com/photos/dadavidov/

QUESTIONS

Questions?

Ricon/West 2013: Adventures with Riak Pipe