© 2010 IBM Corporation
MULTI-PIPES
Eric Van Hensbergen (IBM Research)
Noah Evans (Alcatel-Lucent Bell-Labs)
Pravin Shinde (ETH Zurich)
Tuesday, November 23, 2010
IBM Research
Motivation
Figure 7: The data dependency graph for a portion of the Hartree-Fock
procedure using a traditional formulation.
64,000 Node Torus
Dataflow Oriented HPC Problems
PUSH Pipelines
For more detail, refer to the PODC '09 short paper on the PUSH dataflow shell.
UNIX Model
a | b | c
PUSH Model
a |< b >| c
Problem: Limitations of Traditional Pipes
[Figure: records AAA and BBB written to one pipe appear interleaved (ABABAB) and are read back as ABA and BAB]
Long Packet Pipes
[Figure: records AAA and BBB pass through the pipe intact (AAA BBB) and are read back whole]
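The record-preserving behaviour can be modelled in a few lines of portable C. This is only a sketch of the idea, not the mpipefs implementation: `struct pktpipe`, `pkt_write`, `pkt_read`, and the fixed sizes are hypothetical names and limits chosen for illustration.

```c
#include <assert.h>
#include <string.h>

enum { MAXPKT = 8, PKTSZ = 32 };

/* Hypothetical packet-preserving pipe: a FIFO of whole records. */
struct pktpipe {
	char data[MAXPKT][PKTSZ];
	size_t len[MAXPKT];
	size_t head, count;
};

/* Enqueue one packet as a unit; returns 0, or -1 if full or too long. */
int
pkt_write(struct pktpipe *p, const char *buf, size_t n)
{
	size_t slot;

	assert(p != NULL && buf != NULL);
	if(p->count == MAXPKT || n > PKTSZ)
		return -1;
	slot = (p->head + p->count) % MAXPKT;
	memcpy(p->data[slot], buf, n);
	p->len[slot] = n;
	p->count++;
	return 0;
}

/* Dequeue the oldest packet whole; returns its length, or -1 if the
 * pipe is empty or the caller's buffer is too small. */
long
pkt_read(struct pktpipe *p, char *buf, size_t cap)
{
	size_t n;

	assert(p != NULL && buf != NULL);
	if(p->count == 0 || p->len[p->head] > cap)
		return -1;
	n = p->len[p->head];
	memcpy(buf, p->data[p->head], n);
	p->head = (p->head + 1) % MAXPKT;
	p->count--;
	return (long)n;
}
```

Because each record is stored and returned as a unit, two writers sending AAA and BBB can never be observed as ABA and BAB by the readers.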
Header Control Blocks
Fields: TYPE, SIZE, DESTINATION, PARAMETERS
pwrite(pipefd, buf, sz, ~(0));
Enumerated Pipes
[Figure: messages tagged 1:A and 2:B enter the pipe; A and B are delivered to separate readers]
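A rough model of enumerated delivery in portable C follows. It assumes messages carry a textual `index:payload` tag as drawn on the slide; in mpipefs the destination travels in the header control block instead. `route`, `queue`, and the two-reader limit are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { NREADER = 2, BUFSZ = 64 };

/* Hypothetical per-reader queues; readers are numbered from 1 as on the slide. */
static char queue[NREADER][BUFSZ];

/* Route one "index:payload" message to its reader's queue.
 * Returns the reader index, or -1 if the message is malformed,
 * names an unknown reader, or does not fit. */
int
route(const char *msg)
{
	char *end;
	long idx;

	assert(msg != NULL);
	idx = strtol(msg, &end, 10);
	if(end == msg || *end != ':' || idx < 1 || idx > NREADER)
		return -1;
	end++;	/* skip ':' */
	if(strlen(queue[idx-1]) + strlen(end) >= BUFSZ)
		return -1;
	strcat(queue[idx-1], end);
	return (int)idx;
}
```

Calling `route("1:A")` and then `route("2:B")` leaves A queued for the first reader and B for the second, which is the behaviour the figure depicts.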
Collective Pipes
[Figure: three panels labeled Broadcast (A, A, A), Reduce(+) (B, C, (B+C)), and Allreduce(+) (A, B, C, (A+B+C))]
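The three collectives can be summarized with a small portable C model operating on arrays rather than pipes. It illustrates the semantics only, and says nothing about how mpipefs implements them; `bcast`, `reduce_sum`, and `allreduce_sum` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Broadcast: every one of n readers receives the writer's value. */
void
bcast(long value, long *out, size_t n)
{
	size_t i;

	assert(out != NULL || n == 0);
	for(i = 0; i < n; i++)
		out[i] = value;
}

/* Reduce(+): the values from n writers are summed into one result. */
long
reduce_sum(const long *in, size_t n)
{
	size_t i;
	long sum = 0;

	assert(in != NULL || n == 0);
	for(i = 0; i < n; i++)
		sum += in[i];
	return sum;
}

/* Allreduce(+): a reduce followed by a broadcast of the result. */
void
allreduce_sum(const long *in, long *out, size_t n)
{
	bcast(reduce_sum(in, n), out, n);
}
```

Expressing allreduce as reduce plus broadcast mirrors the usual definition of the operation: every participant ends up with the combined value.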
Splicing Pipes
[Figure: spliceto(b) and splicefrom(b), each drawn as pipes a and b before and after the splice]
Example Simple Invocation
• mpipefs
• mount /srv/mpipe /n/testpipe
• ls -l /n/testpipe
  --rw-rw-rw- M 24 ericvh ericvh 0 Oct 10 18:10 /n/testpipe/data
• echo hello > /n/testpipe/data &
• cat /n/testpipe/data
  hello
Passing Arguments via aname
• mount /srv/mpipe /n/test othername
• ls -l /n/test
  --rw-rw-rw- M 26 ericvh ericvh 0 Oct 10 18:12 /n/test/otherpipe
• mount /srv/mpipe /n/test2 -b bcastpipe
• mount /srv/mpipe /n/test3 -e 5 enumpipe
• ...and so on; see the man page for more details
Example for writing control blocks
#include <u.h>
#include <libc.h>

int
pipewrite(int fd, char *data, ulong size, ulong which)
{
	int n;
	char hdr[255];
	ulong tag = ~0;
	char pkttype = 'p';

	/* the header block is written at offset ~0 */
	n = snprint(hdr, 31, "%c\n%lud\n%lud\n\n", pkttype, size, which);
	n = pwrite(fd, hdr, n+1, tag);	/* n+1 includes the terminating NUL */
	if(n <= 0)
		return n;
	/* the payload follows as an ordinary write */
	return write(fd, data, size);
}
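The receiving side has to undo this formatting. The following is a minimal sketch in portable C (using `snprintf` and `sscanf` rather than the Plan 9 library), assuming the header is newline-separated text consisting of a type character, a size, a destination, and a blank line. `struct pipehdr`, `format_hdr`, and `parse_hdr` are hypothetical names and are not part of mpipefs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical in-memory form of a control block (not from mpipefs). */
struct pipehdr {
	char type;            /* packet type, e.g. 'p' */
	unsigned long size;   /* payload length in bytes */
	unsigned long which;  /* destination index */
};

/* Format a header as "type\nsize\nwhich\n\n"; returns the length
 * written, or -1 if it does not fit in buf. */
int
format_hdr(char *buf, size_t len, const struct pipehdr *h)
{
	int n;

	assert(buf != NULL && h != NULL);
	n = snprintf(buf, len, "%c\n%lu\n%lu\n\n", h->type, h->size, h->which);
	if(n < 0 || (size_t)n >= len)
		return -1;
	return n;
}

/* Parse the same layout; returns 0 on success, -1 on malformed input. */
int
parse_hdr(const char *buf, struct pipehdr *h)
{
	assert(buf != NULL && h != NULL);
	if(sscanf(buf, "%c\n%lu\n%lu", &h->type, &h->size, &h->which) != 3)
		return -1;
	return 0;
}
```

A round trip through `format_hdr` and `parse_hdr` should return the original type, size, and destination, which makes the pair easy to check in isolation before wiring it to a real pipe.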
Larger Example (execfs)
/proc
    /clone
    /###
        /stdin
        /stdout
        /stderr
        /args
        /ctl
        /fd
        /fpregs
        /kregs
        /mem
        /note
        /noteid
        /notepg
        /ns
        /proc
        /profile
        /regs
        /segment
        /status
        /text
        /wait
mount -a /srv/mpipe /proc/### stdin
mount -a /srv/mpipe /proc/### stdout
mount -a /srv/mpipe /proc/### stderr
Really Large Example (gangfs)
/proc
    /gclone
    /status
    /g###
        /stdin
        /stdout
        /stderr
        /ctl
        /ns
        /status
        /wait
mount -a /srv/mpipe /proc/### -b stdin
mount -a /srv/mpipe /proc/### stdout
mount -a /srv/mpipe /proc/### stderr
...and then, after exec from the gangfs clone, the execfs stdins are spliced from g#/stdin
...and the execfs stdouts and stderrs are spliced to g#/stdout and g#/stderr
...and you can use -e # with stdin to get enumerated instead of broadcast pipes
This project is supported in part by the U.S. Department of Energy under Award Number DE-FG02-08ER25851
http://www.research.ibm.com/austin
http://goo.gl/5eFB
Code Available: http://www.bitbucket.org/ericvh/hare/sys/src/cmd/uem
Man Page Available: http://www.bitbucket.org/ericvh/hare/sys/man/4/mpipefs