Composing and Executing Parallel Data-flow Graphs with Shell Pipes


  1. Composing and Executing Parallel Data-flow Graphs with Shell Pipes
     Edward Walker (TACC), Weijia Xu (TACC), Vinoth Chandar (Oracle Corp)
  2. Agenda
     • Motivation
     • Shell language extensions
     • Implementation
     • Experimental evaluation
     • Conclusions
  3. Motivation
     • Distributed memory clusters are becoming pervasive in industry and academia
     • Shells are the default login environment on these systems
     • Shell pipes are commonly used for composing extensible Unix commands
     • There has been no change to the syntax/semantics of shell pipes since their invention over 30 years ago
     • Growing need to compose massively parallel jobs quickly, using existing software
  4. Extending Shells for Parallel Computing
     • Build a simple, powerful coordination layer at the shell
     • The coordination layer transparently manages the parallelism in the workflow
     • The user specifies the parallel computation as a data-flow graph using extensions to the shell
     • Provides the ability to combine different tools and build interesting parallel programs quickly
  5. Shell pipe extensions
     • Pipeline fork:                  A | B on n procs
     • Pipeline join:                  A on n procs | B
     • Pipeline cycles:                (++ n A)
     • Pipeline key-value aggregation: A | B on keys
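A minimal usage sketch of these four forms, assuming a BASH patched with the extensions above (the commands gen, filter, and refine below are placeholders for illustration, not part of the talk):

      # Pipeline fork: one producer feeds 4 parallel instances of the consumer.
      gen | filter on 4 procs

      # Pipeline join: 4 parallel producers feed a single consumer.
      gen on 4 procs | sort

      # Pipeline cycle: feed the stage's output back into its input 3 times.
      (++ 3 refine)

      # Key-value aggregation: route tuples from mappers to reducers by key.
      map on all procs | reduce on keys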
  6. Parallel shell task extensions
     > function foo() { echo "hello world" ; }
     > foo on all procs       # foo() on all CPUs
     > foo on all nodes       # foo() on all nodes
     > foo on 10:2 procs      # stride: 10 tasks, 2 tasks on each node
     > foo on 10:2:2 procs    # span: 10 tasks, 2 tasks on alternate nodes
  7. Composing data-flow graphs
     • Example 1 (the figure on the slide shows A feeding two parallel instances of B, labeled B1 and B2, which both feed C):

       function B1() { :; }    # placeholder body
       function B2() { :; }    # placeholder body
       function B() {
           if (( $_ASPECT_TASKID == 0 )) ; then
               B1
           else
               B2
           fi
       }
       A | B on 2 procs | C
  8. Composing data-flow graphs
     • Example 2 (the figure shows map tasks feeding a key-value distributed hash table, which feeds reduce tasks):

       function map() {
           emit_tuple -k key -v value
       }
       function reduce() {
           consume_tuple -k key -v value
           num=${#value[@]}
           for ((i = 0; i < $num; i++)) ; do
               # process key=$key, value=${value[$i]}
               :
           done
       }
       map on all procs | reduce on keys
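As a concrete, hypothetical instance of this pattern (assuming the extensions above, the emit_tuple/consume_tuple helpers, and one input shard per task named input.<taskid>, which is an assumption), word count could be written roughly as:

      # Hypothetical word count; the input.$_ASPECT_TASKID shards are assumed.
      function map() {
          tr -s '[:space:]' '\n' < input.$_ASPECT_TASKID |
          while read -r word ; do
              emit_tuple -k "$word" -v 1
          done
      }
      function reduce() {
          # All values for $key are aggregated into the value array.
          consume_tuple -k key -v value
          echo "$key ${#value[@]}"
      }
      map on all procs | reduce on keys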
  9. BASH Implementation
  10. Startup Overlay
      • A script may have many instances requiring startup of parallel tasks
      • Motivation for the overlay:
        - Fast startup of parallel shell workers
        - Handles node failures gracefully
      • Two-level hierarchy: sectors and proxies
      • Overlay node addressing: an 8-bit compute node ID (bits 7..0) split into a sector ID field and a proxy ID field
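The slide does not say how many bits each field uses; purely as an illustration, assuming the proxy ID occupies the low 4 bits and the sector ID the high 4 bits, a node address could be decoded like this:

      # Illustrative only: the 4/4 bit split is an assumption, not from the talk.
      node_id=$1
      sector_id=$(( (node_id >> 4) & 0x0F ))
      proxy_id=$((  node_id        & 0x0F ))
      echo "node $node_id -> sector $sector_id, proxy $proxy_id"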
  11. Fault-Tolerance
      • Proxy nodes monitor peers within their sector, and sector heads monitor peer sectors
      • Node 0 maintains a list of available nodes in the overlay in a master_node file
      (Figure: two overlay sectors, each containing proxy nodes with exec processes; node 0 holds the master_node file.)
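The update protocol for master_node is not shown in the talk; one plausible sketch, assuming failures detected within a sector are reported to node 0, is for node 0 simply to rewrite the file without the failed entry:

      # Hypothetical sketch: drop a failed node from the master_node list.
      function remove_failed_node() {
          local failed=$1
          grep -v "^${failed}\$" master_node > master_node.tmp &&
          mv master_node.tmp master_node
      }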
  12. Starting shell workers with the startup overlay
  13. Steps 1-2: BASH spawns the agent; the agent queries master_node and spawns the node I/O multiplexor
      (Figure: the BASH process, the agent, the node I/O MUX, and the two overlay sectors.)
  14. Step 3: the agent invokes the overlay to spawn a CPU I/O multiplexor on each compute node
      (Figure: the overlay forwarding the spawn request to the CPU I/O MUX on a compute node.)
  15. Step 4: the CPU I/O multiplexor spawns a shell worker per CPU on the node
      (Figure: the CPU I/O MUX forking one shell worker per CPU.)
  16. Step 5: the CPU I/O multiplexor calls back to the node I/O multiplexor
      (Figure: the CPU I/O MUX connecting back to the node I/O MUX.)
  17. Implementation of pipeline fork
  18. Step 1: process B pipes its stdin into stdin_file (A | B on N procs)
      (Figure: BASH pipes A's output to the aspect-agent standing in for B; the agent's stdin reader writes it to stdin_file.)
  19. Step 2: the agent constructs a command file for each task
      (Figure: a command dispatcher in the aspect-agent writes per-task command files, each containing "cat stdin_file | B".)
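A hypothetical sketch of this step: the dispatcher writes one command file per task, each containing the pipeline shown in the figure (the cmd.<task> file names and the task count are assumptions):

      # Hypothetical command-file generation for "A | B on N procs".
      N=4
      for (( t = 0; t < N; t++ )) ; do
          printf 'cat stdin_file | B\n' > cmd.$t
      done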
  20. Steps 3-5: execute the command files in shell workers and marshal the results back to the shell
      (Figure: the dispatcher queues command files to the node MUXes, shell workers on the compute nodes run B, and I/O flushers in the aspect-agent marshal stdout back to the shell.)
  21. Step 6: replay command files on failure
      (Figure: a replayer in the aspect-agent re-dispatches a failed task's command file to other shell workers.)
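A minimal sketch of the replay idea, assuming the agent keeps each command file until a worker reports success (run_on_worker is a placeholder, not the real dispatch API):

      # Hypothetical replay loop; run_on_worker stands in for the real queueing.
      for cmd in cmd.* ; do
          until run_on_worker "$cmd" ; do
              echo "worker failed; replaying $cmd on another worker" >&2
          done
      done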
  22. Implementation of key-value aggregation
  23. Step 1: the agent inspects and hashes each key (A | B on keys)
      (Figure: A pipes into the aspect-agent for B, whose key dispatcher inspects incoming tuples.)
  24. Step 2: each key-value pair is routed to a compute node based on the key hash and stored in that node's hash table
      (Figure: the key dispatcher routes tuples through the node MUX to per-node gdbm hash tables, which together form a distributed hash table.)
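The hash function is not specified in the talk; as an illustrative sketch only, a key could be mapped to one of N compute nodes by hashing it (cksum is used here purely as a stand-in) and taking the result modulo N:

      # Illustrative key -> node routing; the real hash function is not specified.
      N=8                                      # number of compute nodes
      key="example_key"
      h=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
      echo "key '$key' routed to node $(( h % N ))"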
  25. Step 3: each node constructs command files to pipe the key-value entries from its hash table into process B
      (Figure: on each compute node, emit_tuple reads the local gdbm table and pipes tuples into a B instance.)
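A hypothetical per-node sketch of this step; list_local_keys and lookup_values are placeholders for however the implementation walks its gdbm table (not shown in the talk):

      # Hypothetical: feed each locally stored key and its values into B.
      list_local_keys | while read -r key ; do
          emit_tuple -k "$key" -v "$(lookup_values "$key")" | B
      done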
  26. Step 4: results from the command-file executions are marshaled back to the shell
      (Figure: output from the per-node B instances flows back through the node MUX and the agent's I/O MUX to the shell's stdout.)
  27. Experimental Evaluation
  28. Startup overlay performance (compared to the default SSH startup mechanism)
  29. Synthetic benchmark I: performance of pipeline join
  30. Synthetic benchmark II: performance of key-value aggregation
  31. TeraSort benchmark: parallel bucket sort
      • Step 1: spawn the data generator in parallel on each compute node, partitioning the data across N nodes; a record belongs to task T if its first 2 bytes fall in the range [2^16 * T / N, 2^16 * (T + 1) / N)
      • Step 2: sort the local data on each node
      • Step 3: merge the results onto the global file system
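A hedged sketch of the step-1 partition test, assuming the first two bytes of each record are read as a 16-bit big-endian integer (the record format and byte order are assumptions; the talk does not show the generator code):

      # Illustrative partition test: task T of N keeps a record if its first two
      # bytes, as a 16-bit value, fall in [2^16*T/N, 2^16*(T+1)/N).
      T=3 ; N=16
      lo=$(( 65536 * T / N ))
      hi=$(( 65536 * (T + 1) / N ))
      while IFS= read -r rec ; do
          v=$(printf '%s' "$rec" | od -An -tu1 -N2 | awk '{ print $1 * 256 + $2 }')
          [ -z "$v" ] && continue              # skip empty records
          if (( v >= lo && v < hi )) ; then
              printf '%s\n' "$rec"
          fi
      done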
  32. TeraSort benchmark: Sorting rate
  33. Related Work
      • Ptolemy: embedded system design
      • Yahoo Pipes: web content filtering
      • Hadoop: Java implementation of MapReduce
      • Dryad: distributed DAG data-flow computation
  34. Conclusion
      • A debugger would be extremely helpful; working on a bashdb implementation
      • A run-time simulator would be helpful to predict performance based on characteristics of the cluster
      • Still thinking about how to incorporate our extensions for named pipes (i.e. mkfifo)
  35. Questions?
