The Message Passing Interface (MPI) in Layman's Terms
Introduction to the basic concepts of what the Message Passing Interface (MPI) is, and a brief overview of the Open MPI open source software implementation of the MPI specification.

Open MPI
KYOSS presentation
12 Jan 2011
Jeff Squyres

What is the Message Passing Interface (MPI)?
The Book of MPI
A standards document
www.mpi-forum.org

Using MPI
Hardware and software implement the interface in the MPI standard (the book)

MPI implementations
There are many implementations of the MPI standard
Some are closed source
Others are open source

Open MPI
Open MPI is a free, open source implementation of the MPI standard
www.open-mpi.org

So what is MPI for?
Let’s break it down…
Message Passing Interface

1. Message passing
Process A
Process B
Message

1. Message passing
Process A
Process B
Pass it

1. Message passing
Process A
Process B
Message has been passed

1. Message passing
[one process containing Thread A and Thread B]
…as opposed to data that is shared

2. Interface
C programming function calls (Fortran too!)
MPI_Init(argv, argc)
MPI_Send(buf, count, type, dest, tag, comm)
MPI_Recv(buf, count, type, src, tag, comm, status)
MPI_Wait(req, status)
MPI_Test(req, flag, status)
MPI_Comm_dup(in, out)
MPI_Type_size(dtype, size)
MPI_Finalize(void)

Fortran? Really?
What most modern developers associate with “Fortran”

Yes, really
Some of today’s most advanced simulation codes are written in Fortran

Yes, really
Yes, that Intel
Optimized for Nehalem, Westmere, and beyond!

Fortran is great for what it is
A simple language for mathematical expressions and computations
Targeted at scientists and engineers
…not computer scientists or web developers or database developers or …

Back to defining “MPI”…

Putting it back together
Message Passing Interface
“An interface for passing messages”
“C functions for passing messages” (Fortran too!)

C/Fortran functions for message passing
Process A
Process B
MPI_Send(…)

C/Fortran functions for message passing
Process A
Process B
MPI_Recv(…)

Really? Is that all MPI is?
“Can’t I just do that with sockets?”
Yes!
(…and no)

Comparison
(TCP) Sockets:
Connections based on IP addresses and ports
Point-to-point communication
Stream-oriented
Raw data (bytes / octets)
Network-independent
“Slow”
MPI:
Based on peer integer “rank” (e.g., 8)
Point-to-point and collective and one-sided and …
Message oriented
Typed messages
Network independent
Blazing fast

Comparison
Whoa! What are these?
(pointing at the MPI-specific items in the list above)

Peer integer “rank”
[diagram: a group of 12 processes, numbered 0 through 11]

“Collective”: broadcast
[diagram: one rank’s message is delivered to every rank]

“Collective”: scatter
[diagram: one rank’s buffer is split up, with one piece delivered to each rank]

“Collective”: gather
[diagram: each rank contributes a piece, collected into a single rank’s buffer]

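The deck shows scatter and gather only as diagrams; here is a minimal C sketch of my own (not from the slides, and assuming at most 128 ranks for the buffer sizes): rank 0 hands one integer to every rank, each rank doubles its value, and rank 0 gathers the results back.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int i, rank, size, mine;
    int sendbuf[128], recvbuf[128];   /* assumes no more than 128 ranks */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (0 == rank) {
        for (i = 0; i < size; ++i) {
            sendbuf[i] = i;           /* one value per peer */
        }
    }
    /* Root hands out one int to each rank (including itself) */
    MPI_Scatter(sendbuf, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);

    mine *= 2;                        /* "compute" on my piece */

    /* Root collects one int back from each rank */
    MPI_Gather(&mine, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (0 == rank) {
        for (i = 0; i < size; ++i) {
            printf("result from rank %d: %d\n", i, recvbuf[i]);
        }
    }

    MPI_Finalize();
    return 0;
}

Run under mpirun -np 4, this prints the doubled values 0, 2, 4, and 6.
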
“Collective”: reduce
[diagram: every rank contributes a value; the values are combined (e.g., summed) into a single result: 42]

“Collective”: …and others

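Similarly, a small sketch of my own for broadcast and reduce (not from the slides): rank 0 broadcasts the value 42 to everyone, then MPI_Reduce sums every rank's ID back onto rank 0 (with 12 ranks, as in the diagrams, that sum is 66).

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, value, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    value = (0 == rank) ? 42 : -1;
    /* After this call, every rank's "value" is 42 */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Sum everyone's rank ID; only rank 0 gets the answer */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (0 == rank) {
        printf("broadcast value = %d, sum of ranks 0..%d = %d\n",
               value, size - 1, sum);
    }

    MPI_Finalize();
    return 0;
}
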
Messages, not bytes
Entire message is sent and received
Not a stream of individual bytes

Messages, not bytes
Contents: 17 integers, 23 doubles, 98 structs, …or whatever
Not a bunch of bytes!

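To make “typed messages” concrete, a tiny sketch of my own (not from the slides): counts in MPI calls are in elements, not bytes, and MPI_Type_size() reports how many bytes one element of a given MPI datatype occupies on the current platform.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int dbl_size, int_size;

    MPI_Init(&argc, &argv);
    /* Ask MPI how big its typed elements are on this platform */
    MPI_Type_size(MPI_DOUBLE, &dbl_size);
    MPI_Type_size(MPI_INT, &int_size);
    printf("1 MPI_DOUBLE = %d bytes, 1 MPI_INT = %d bytes here\n",
           dbl_size, int_size);
    MPI_Finalize();
    return 0;
}
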
Network independent
MPI_Send(…) / MPI_Recv(…)
Underlying network: Ethernet, TCP, Myrinet, InfiniBand, shared memory, iWARP, RoCE

Network independent
Regardless of underlying network or transport protocol, the application code stays the same

Blazing fast
One microsecond (!)
…more on performance later

What is MPI?
MPI is probably somewhere around here
[diagram: a layered network stack]

What is MPI?
MPI hides all the layers underneath

What is MPI?
A high-level network programming abstraction
Nothing to see here, please move along
(IP addresses, byte streams, and raw bytes are hidden underneath)

So what?
What’s all this message passing stuff got to do with supercomputers?

So what?
Let’s define “supercomputers”

Supercomputers
[photos of supercomputer installations]

Supercomputers
“Nebulae”
National Supercomputing Centre, Shenzhen, China

Supercomputers
“Mare Nostrum” (Our Sea)
Barcelona Supercomputing Center, Spain
Used to be a church

Supercomputers
Notice anything?

Supercomputers
They’re just racks of servers!

Generally speaking…
Supercomputer = lots of processors + lots of RAM + lots of disk

Generally speaking…
Supercomputer = (many) racks of (commodity) high-end servers
(this is one definition; there are others)

So if that’s a supercomputer…
Rack of 36 1U servers

How is it different from my web farm?
Rack of 36 1U servers

Just a bunch of servers?
The difference between supercomputers and web farms and database farms (and …):
All the servers act together to solve a single computational problem

Acting together
Take your computational problem…
[Input → computational problem → Output]

Acting together
…and split it up!

Acting together
Distribute the input data across a bunch of servers

Acting together
Use the network between servers to communicate / coordinate

Acting together
MPI is used for this communication

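A minimal C sketch of the “split it up” idea (my own illustration, using a made-up array of one million values): each rank sums only its own slice, and MPI_Reduce combines the partial sums on rank 0.

#include <stdio.h>
#include <mpi.h>

#define N 1000000                    /* made-up problem size */

int main(int argc, char **argv) {
    static double data[N];           /* stand-in for the input data */
    int i, lo, hi, rank, size;
    double partial = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank works only on its own slice of the array */
    lo = rank * (N / size);
    hi = (rank == size - 1) ? N : lo + (N / size);
    for (i = lo; i < hi; ++i) {
        data[i] = 1.0;               /* pretend this is real input */
        partial += data[i];
    }

    /* Combine the partial sums; rank 0 gets the total (1000000.0) */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (0 == rank) {
        printf("total = %.1f\n", total);
    }

    MPI_Finalize();
    return 0;
}
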
Why go to so much trouble?
Computational problem: many “one processor hour” pieces
1 processor = …a long time…

Why go to so much trouble?
The same problem, split into 21 “one processor hour” pieces
21 processors = ~1 hour (!)
Disclaimer: scaling is rarely perfect

High Performance Computing
HPC = using supercomputers to solve real-world problems that are TOO BIG for laptops, desktops, or individual servers

Why does HPC ♥ MPI?
Network abstraction
Are these cores?

Why does HPC ♥ MPI?
Network abstraction
…or servers?

Why does HPC ♥ MPI?
Message semantics
Array of 10,000 integers

Why does HPC ♥ MPI?
Ultra-low network latency (depending on your network type!)
1 microsecond

1 microsecond = 0.000001 second
From here… to here

Holy smokes!
That’s fast

Let’s get into some details…

MPI Basics
“6 function MPI”
MPI_Init(): startup
MPI_Comm_size(): how many peers?
MPI_Comm_rank(): my unique (ordered) ID
MPI_Send(): send a message
MPI_Recv(): receive a message
MPI_Finalize(): shutdown
Can implement a huge number of parallel applications with just these 6 functions

Let’s see “Hello, World” in MPI

MPI Hello, World

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);                  /* Initialize MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* Who am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* How many peers? */
    printf("Hello, world! I am %d of %d\n", rank, size);
    MPI_Finalize();                          /* Shut down MPI */
    return 0;
}

Compile it with Open MPI

shell$ mpicc hello.c -o hello
shell$

Open MPI comes standard in many Linux and BSD distributions (and OS X)
Hey, what’s that? Where’s gcc?

“Wrapper” compiler
mpicc simply fills in a bunch of compiler command line options for you

shell$ mpicc hello.c -o hello --showme
gcc hello.c -o hello -I/opt/openmpi/include -pthread -L/opt/openmpi/lib -lmpi -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
shell$

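(A side note, not from the slides: Open MPI installs similar wrapper compilers for its other language bindings, such as mpic++ and mpif90. Assuming Open MPI was built with a Fortran compiler and you have a hypothetical hello.f90 source file, compiling Fortran looks the same:)

shell$ mpif90 hello.f90 -o hello_f90
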
Now let’s run it

shell$ mpirun -np 4 hello

Hey, what’s that? Why don’t I just run “./hello”?

mpirun launcher
mpirun launches N copies of your program and “wires them up”

shell$ mpirun -np 4 hello

“-np” = “number of processes”
This command launches a 4 process parallel job

mpirun launcher
Four copies of “hello” are launched
Then they are “wired up” on the network

Now let’s run it

shell$ mpirun -np 4 hello
Hello, world! I am 0 of 4
Hello, world! I am 1 of 4
Hello, world! I am 2 of 4
Hello, world! I am 3 of 4
shell$

By default, all copies run on the local host

Run on multiple servers!

shell$ cat my_hostfile
host1.example.com
host2.example.com
host3.example.com
host4.example.com
shell$

Run on multiple servers!

shell$ cat my_hostfile
host1.example.com
host2.example.com
host3.example.com
host4.example.com
shell$ mpirun --hostfile my_hostfile -np 4 hello
Hello, world! I am 0 of 4    (ran on host1)
Hello, world! I am 1 of 4    (ran on host2)
Hello, world! I am 2 of 4    (ran on host3)
Hello, world! I am 3 of 4    (ran on host4)
shell$

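(Another side note, not from the slides: an Open MPI hostfile entry can also carry a slots=N suffix to say how many processes to start on that host. A small sketch, reusing the hypothetical hostnames above:)

shell$ cat my_hostfile
host1.example.com slots=2
host2.example.com slots=2
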
Run it again

shell$ mpirun --hostfile my_hostfile -np 4 hello
Hello, world! I am 2 of 4
Hello, world! I am 3 of 4
Hello, world! I am 0 of 4
Hello, world! I am 1 of 4
shell$

Hey, why are the numbers out of order?

Standard output re-routing
Each “hello” program’s standard output is intercepted and sent across the network to mpirun
[diagram: four hello processes forwarding their output to mpirun]

Standard output re-routing
But the exact ordering of the received printf’s is non-deterministic

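(One more side note, not from the slides: if you need to tell interleaved lines apart, mpirun can prefix each output line with the rank that produced it; check mpirun --help for the exact option in your Open MPI version, which in recent releases is --tag-output:)

shell$ mpirun --tag-output --hostfile my_hostfile -np 4 hello
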
Printf debugging = Bad
If you can’t rely on output ordering, printf debugging is pretty lousy (!)

Parallel debuggers
Fortunately, there are parallel debuggers and other tools
A parallel debugger attaches to all processes in the MPI job

Now let’s send a simple MPI message

Send a simple message

int rank;
double buffer[SIZE];

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (0 == rank) {
    /* I'm number 0: send the buffer[] array to number 1 */
    /* ...initialize buffer[]... */
    MPI_Send(buffer, SIZE, MPI_DOUBLE, 1, 123,
             MPI_COMM_WORLD);
} else if (1 == rank) {
    /* I'm number 1: receive the buffer[] array from number 0 */
    MPI_Recv(buffer, SIZE, MPI_DOUBLE, 0, 123,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

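If the receiver wants to know what actually arrived, it can pass a real MPI_Status instead of MPI_STATUS_IGNORE. A minimal sketch of my own (not from the slides), reusing buffer[] and SIZE from the slide above plus <stdio.h>:

MPI_Status status;
int count;

MPI_Recv(buffer, SIZE, MPI_DOUBLE, 0, 123, MPI_COMM_WORLD, &status);
/* How many doubles did the sender actually send, and who sent them? */
MPI_Get_count(&status, MPI_DOUBLE, &count);
printf("Received %d doubles from rank %d\n", count, status.MPI_SOURCE);
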
That’s enough MPI for now…

Open MPI
Project founded in 2003 after intense discussions between multiple open source MPI implementations
(PACX-MPI, LAM/MPI, LA-MPI, FT-MPI, Sun CT 6)

Open_MPI_Init()

shell$ svn log -r 1 https://svn.open-mpi.org/svn/ompi
------------------------------------------------------------------------
r1 | jsquyres | 2003-11-22 11:36:58 -0500 (Sat, 22 Nov 2003) | 2 lines

First commit
------------------------------------------------------------------------
shell$

Open_MPI_Current_status()

shell$ svn log -r HEAD https://svn.open-mpi.org/svn/ompi
------------------------------------------------------------------------
r24226 | rhc | 2011-01-11 20:57:47 -0500 (Tue, 11 Jan 2011) | 25 lines

Fixes #2683: Move ORTE DPM compiler warning squash to v1.4
------------------------------------------------------------------------
shell$

Open MPI 2011 Membership
15 members, 11 contributors, 2 partners

Fun stats
ohloh.net says:
517,400 lines of code
30 developers (over time)
“Well-commented source code”
I rank in top-25 ohloh stats for:
C
Automake
Shell script
Fortran (ouch!)

Open MPI has grown
It’s amazing (to me) that the Open MPI project works so well
New features, new releases, new members
Long live Open MPI!

Recap
Defined Message Passing Interface (MPI)
Defined “supercomputers”
Defined High Performance Computing (HPC)
Showed what MPI is
Showed some trivial MPI codes
Discussed Open MPI

Additional Resources
MPI Forum web site: the only site for the official MPI standards
http://www.mpi-forum.org/
NCSA MPI basic and intermediate tutorials (requires a free account)
http://ci-tutor.ncsa.uiuc.edu/login.php
“MPI Mechanic” magazine columns
http://cw.squyres.com/

Additional Resources
Research, Computing, and Engineering (RCE) podcast
http://www.rce-cast.com/
My blog: MPI_BCAST
http://blogs.cisco.com/category/performance/

Questions?