Composing and Executing Parallel Data-flow Graphs with Shell Pipes


  1. Composing and Executing Parallel Data-flow Graphs with Shell Pipes
     Edward Walker (TACC), Weijia Xu (TACC), Vinoth Chandar (Oracle Corp)
  2. Agenda
     • Motivation
     • Shell language extensions
     • Implementation
     • Experimental evaluation
     • Conclusions
  3. Motivation
     • Distributed memory clusters are becoming pervasive in industry and academia
     • Shells are the default login environment on these systems
     • Shell pipes are commonly used for composing extensible Unix commands
     • There has been no change to the syntax/semantics of shell pipes since their invention over 30 years ago
     • Growing need to compose massively parallel jobs quickly, using existing software
  4. Extending Shells for Parallel Computing
     • Build a simple, powerful coordination layer at the shell
     • The coordination layer transparently manages the parallelism in the workflow
     • The user specifies the parallel computation as a data-flow graph using extensions to the shell
     • Provides the ability to combine different tools and build interesting parallel programs quickly
  5. Shell pipe extensions
     • Pipeline fork:                  A | B on n procs
     • Pipeline join:                  A on n procs | B
     • Pipeline cycles:                (++ n A)
     • Pipeline key-value aggregation: A | B on keys
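A minimal usage sketch of these four forms, assuming a BASH patched with the extensions above (the commands gen, filter, and refine below are placeholders for illustration, not part of the talk):

      # Pipeline fork: one producer feeds 4 parallel instances of the consumer.
      gen | filter on 4 procs

      # Pipeline join: 4 parallel producers feed a single consumer.
      gen on 4 procs | sort

      # Pipeline cycle: feed the stage's output back into its input 3 times.
      (++ 3 refine)

      # Key-value aggregation: route tuples from mappers to reducers by key.
      map on all procs | reduce on keys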
  6. Parallel shell task extensions
     > function foo() { echo "hello world" ; }
     > foo on all procs       # foo() on all CPUs
     > foo on all nodes       # foo() on all nodes
     > foo on 10:2 procs      # stride: 10 tasks, 2 tasks on each node
     > foo on 10:2:2 procs    # span: 10 tasks, 2 tasks on alternate nodes
  7. Composing data-flow graphs
     • Example 1 (the figure on the slide shows A feeding two parallel instances of B, labeled B1 and B2, which both feed C):

       function B1() { :; }    # placeholder body
       function B2() { :; }    # placeholder body
       function B() {
           if (( $_ASPECT_TASKID == 0 )) ; then
               B1
           else
               B2
           fi
       }
       A | B on 2 procs | C
  8. Composing data-flow graphs
     • Example 2 (the figure shows map tasks feeding a key-value distributed hash table, which feeds reduce tasks):

       function map() {
           emit_tuple -k key -v value
       }
       function reduce() {
           consume_tuple -k key -v value
           num=${#value[@]}
           for ((i = 0; i < $num; i++)) ; do
               # process key=$key, value=${value[$i]}
               :
           done
       }
       map on all procs | reduce on keys
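As a concrete, hypothetical instance of this pattern (assuming the extensions above, the emit_tuple/consume_tuple helpers, and one input shard per task named input.<taskid>, which is an assumption), word count could be written roughly as:

      # Hypothetical word count; the input.$_ASPECT_TASKID shards are assumed.
      function map() {
          tr -s '[:space:]' '\n' < input.$_ASPECT_TASKID |
          while read -r word ; do
              emit_tuple -k "$word" -v 1
          done
      }
      function reduce() {
          # All values for $key are aggregated into the value array.
          consume_tuple -k key -v value
          echo "$key ${#value[@]}"
      }
      map on all procs | reduce on keys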
  9. BASH Implementation
  10. Startup Overlay
      • A script may have many instances requiring startup of parallel tasks
      • Motivation for the overlay:
        - Fast startup of parallel shell workers
        - Handles node failures gracefully
      • Two-level hierarchy: sectors and proxies
      • Overlay node addressing: an 8-bit compute node ID (bits 7..0) split into a sector ID field and a proxy ID field
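The slide does not say how many bits each field uses; purely as an illustration, assuming the proxy ID occupies the low 4 bits and the sector ID the high 4 bits, a node address could be decoded like this:

      # Illustrative only: the 4/4 bit split is an assumption, not from the talk.
      node_id=$1
      sector_id=$(( (node_id >> 4) & 0x0F ))
      proxy_id=$((  node_id        & 0x0F ))
      echo "node $node_id -> sector $sector_id, proxy $proxy_id"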
  11. Fault-Tolerance
      • Proxy nodes monitor peers within their sector, and sector heads monitor peer sectors
      • Node 0 maintains a list of available nodes in the overlay in a master_node file
      (Figure: two overlay sectors, each containing proxy nodes with exec processes; node 0 holds the master_node file.)
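The update protocol for master_node is not shown in the talk; one plausible sketch, assuming failures detected within a sector are reported to node 0, is for node 0 simply to rewrite the file without the failed entry:

      # Hypothetical sketch: drop a failed node from the master_node list.
      function remove_failed_node() {
          local failed=$1
          grep -v "^${failed}\$" master_node > master_node.tmp &&
          mv master_node.tmp master_node
      }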
  12. Starting shell workers with the startup overlay
  13. Steps 1-2: BASH spawns the agent; the agent queries master_node and spawns the node I/O multiplexor
      (Figure: the BASH process, the agent, the node I/O MUX, and the two overlay sectors.)
  14. Step 3: the agent invokes the overlay to spawn a CPU I/O multiplexor on each compute node
      (Figure: the overlay forwarding the spawn request to the CPU I/O MUX on a compute node.)
  15. Step 4: the CPU I/O multiplexor spawns a shell worker per CPU on the node
      (Figure: the CPU I/O MUX forking one shell worker per CPU.)
  16. Step 5: the CPU I/O multiplexor calls back to the node I/O multiplexor
      (Figure: the CPU I/O MUX connecting back to the node I/O MUX.)
  17. Implementation of pipeline fork
  18. Step 1: process B pipes its stdin into stdin_file (A | B on N procs)
      (Figure: BASH pipes A's output to the aspect-agent standing in for B; the agent's stdin reader writes it to stdin_file.)
  19. Step 2: the agent constructs a command file for each task
      (Figure: a command dispatcher in the aspect-agent writes per-task command files, each containing "cat stdin_file | B".)
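A hypothetical sketch of this step: the dispatcher writes one command file per task, each containing the pipeline shown in the figure (the cmd.<task> file names and the task count are assumptions):

      # Hypothetical command-file generation for "A | B on N procs".
      N=4
      for (( t = 0; t < N; t++ )) ; do
          printf 'cat stdin_file | B\n' > cmd.$t
      done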
  20. Steps 3-5: execute the command files in shell workers and marshal the results back to the shell
      (Figure: the dispatcher queues command files to the node MUXes, shell workers on the compute nodes run B, and I/O flushers in the aspect-agent marshal stdout back to the shell.)
  21. Step 6: replay command files on failure
      (Figure: a replayer in the aspect-agent re-dispatches a failed task's command file to other shell workers.)
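A minimal sketch of the replay idea, assuming the agent keeps each command file until a worker reports success (run_on_worker is a placeholder, not the real dispatch API):

      # Hypothetical replay loop; run_on_worker stands in for the real queueing.
      for cmd in cmd.* ; do
          until run_on_worker "$cmd" ; do
              echo "worker failed; replaying $cmd on another worker" >&2
          done
      done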
  22. Implementation of key-value aggregation
  23. Step 1: the agent inspects and hashes each key (A | B on keys)
      (Figure: A pipes into the aspect-agent for B, whose key dispatcher inspects incoming tuples.)
  24. Step 2: each key-value pair is routed to a compute node based on the key hash and stored in that node's hash table
      (Figure: the key dispatcher routes tuples through the node MUX to per-node gdbm hash tables, which together form a distributed hash table.)
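The hash function is not specified in the talk; as an illustrative sketch only, a key could be mapped to one of N compute nodes by hashing it (cksum is used here purely as a stand-in) and taking the result modulo N:

      # Illustrative key -> node routing; the real hash function is not specified.
      N=8                                      # number of compute nodes
      key="example_key"
      h=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
      echo "key '$key' routed to node $(( h % N ))"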
  25. Step 3: each node constructs command files to pipe the key-value entries from its hash table into process B
      (Figure: on each compute node, emit_tuple reads the local gdbm table and pipes tuples into a B instance.)
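A hypothetical per-node sketch of this step; list_local_keys and lookup_values are placeholders for however the implementation walks its gdbm table (not shown in the talk):

      # Hypothetical: feed each locally stored key and its values into B.
      list_local_keys | while read -r key ; do
          emit_tuple -k "$key" -v "$(lookup_values "$key")" | B
      done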
  26. Step 4: results from the command-file executions are marshaled back to the shell
      (Figure: output from the per-node B instances flows back through the node MUX and the agent's I/O MUX to the shell's stdout.)
  27. Experimental Evaluation
  28. Startup overlay performance (compared to the default SSH startup mechanism)
  29. Synthetic benchmark I: performance of pipeline join
  30. Synthetic benchmark II: performance of key-value aggregation
  31. TeraSort benchmark: parallel bucket sort
      • Step 1: spawn the data generator in parallel on each compute node, partitioning the data across N nodes; a record belongs to task T if its first 2 bytes fall in the range [2^16 * T / N, 2^16 * (T + 1) / N)
      • Step 2: sort the local data on each node
      • Step 3: merge the results onto the global file system
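A hedged sketch of the step-1 partition test, assuming the first two bytes of each record are read as a 16-bit big-endian integer (the record format and byte order are assumptions; the talk does not show the generator code):

      # Illustrative partition test: task T of N keeps a record if its first two
      # bytes, as a 16-bit value, fall in [2^16*T/N, 2^16*(T+1)/N).
      T=3 ; N=16
      lo=$(( 65536 * T / N ))
      hi=$(( 65536 * (T + 1) / N ))
      while IFS= read -r rec ; do
          v=$(printf '%s' "$rec" | od -An -tu1 -N2 | awk '{ print $1 * 256 + $2 }')
          [ -z "$v" ] && continue              # skip empty records
          if (( v >= lo && v < hi )) ; then
              printf '%s\n' "$rec"
          fi
      done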
  32. TeraSort benchmark: Sorting rate
  33. Related Work
      • Ptolemy: embedded system design
      • Yahoo Pipes: web content filtering
      • Hadoop: Java implementation of MapReduce
      • Dryad: distributed DAG data-flow computation
  34. Conclusion
      • A debugger would be extremely helpful; working on a bashdb implementation
      • A run-time simulator would be helpful to predict performance based on characteristics of the cluster
      • Still thinking about how to incorporate our extensions for named pipes (i.e. mkfifo)
  35. Questions?
