MPI Introduction

Speaker notes:
  • MPI can work with shared memory architectures also
  • Why MPI is still being used
  • Many vendors can compete to provide better implementations
  • A dry topic, but fundamental for understanding the basics
  • Safe – different libraries can work together
  • Different return codes for different functions
  • To start coding we need to use these functions
  • Mention the case where buffer space might not be available
  • The buffer is used only during buffered-mode communication
  • A ready call indicates to the system that a receive has already been posted
  • Built-in collective operations: Reduce, Bcast, datatypes

    1. MPI
       Rohit Banga, Prakher Anand, K Swagat, Manoj Gupta
       Advanced Computer Architecture, Spring 2010
    2. ORGANIZATION
       • Basics of MPI
       • Point to Point Communication
       • Collective Communication
       • Demo
    3. GOALS
       • Explain basics of MPI
       • Start coding today!
       • Keep It Short and Simple
    4. MESSAGE PASSING INTERFACE
       • A message passing library specification
         - An extended message-passing model
         - Not a language or compiler specification
         - Not a specific implementation; there are several implementations (like pthreads)
       • A standard for distributed-memory, message-passing, parallel computing
       • Distributed memory – a shared-nothing approach!
       • Runs over some interconnect technology – TCP, InfiniBand (on our cluster)
    5. GOALS OF MPI SPECIFICATION
       • Provide source code portability
       • Allow efficient implementations
       • Flexible enough to port different algorithms to different hardware environments
       • Support for heterogeneous architectures – processors need not be identical
    6. REASONS FOR USING MPI
       • Standardization – supported on virtually all HPC platforms
       • Portability – the same code runs on another platform
       • Performance – vendor implementations should exploit native hardware features
       • Functionality – 115 routines
       • Availability – a variety of implementations available
    7. BASIC MODEL
       • Communicators and Groups
       • Group
         - An ordered set of processes
         - Each process is associated with a unique integer rank
         - Ranks run from 0 to (N-1) for N processes
         - An object in system memory, accessed by a handle
         - MPI_GROUP_EMPTY
         - MPI_GROUP_NULL
    8. BASIC MODEL (CONTD.)
       • Communicator
         - A group of processes that may communicate with each other
         - MPI messages must specify a communicator
         - An object in memory, accessed through a handle
       • There is a default communicator, defined automatically: MPI_COMM_WORLD, which identifies the group of all processes
    9. COMMUNICATORS
       • Intra-communicator – all processes from the same group
       • Inter-communicator – processes picked from several groups
    10. COMMUNICATORS AND GROUPS
       • For a programmer, group and communicator are one
       • Allow you to organize tasks, based upon function, into task groups
       • Enable collective communication operations (covered later) across a subset of related tasks
       • Safe communications
       • Many communicators may exist at the same time
       • Dynamic – can be created and destroyed at run time
       • A process may be in more than one group/communicator, with a unique rank in every group/communicator it belongs to
       • Used for implementing user-defined virtual topologies
    11. VIRTUAL TOPOLOGIES
       • coord (0,0): rank 0
       • coord (0,1): rank 1
       • coord (1,0): rank 2
       • coord (1,1): rank 3
       • Attach graph topology information to an existing communicator
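       A sketch of how the 2x2 grid above could be built as a Cartesian virtual topology, assuming the program is run with exactly 4 processes (MPI_Cart_create and MPI_Cart_coords are the standard routines for this):

           int dims[2]    = {2, 2};
           int periods[2] = {0, 0};           /* non-periodic in both dimensions */
           int coords[2], rank;
           MPI_Comm grid;

           MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);
           MPI_Comm_rank(grid, &rank);
           MPI_Cart_coords(grid, rank, 2, coords);
           /* e.g. coords (0,1) corresponds to rank 1, as on the slide */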
    12. SEMANTICS
       • Header file
         - #include <mpi.h> (C)
         - include 'mpif.h' (Fortran)
         - Bindings also exist for Java, Python, etc.
       • Format: rc = MPI_Xxxxx(parameter, ...)
       • Example: rc = MPI_Bsend(&buf, count, type, dest, tag, comm)
       • Error code: returned as "rc"; MPI_SUCCESS if successful
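       A minimal sketch of this calling convention, checking the return code of MPI_Init (the error message text is just an illustration):

           #include <stdio.h>
           #include <mpi.h>

           int main(int argc, char **argv)
           {
               int rc = MPI_Init(&argc, &argv);   /* every MPI call returns an error code */
               if (rc != MPI_SUCCESS) {
                   printf("Error starting MPI program. Terminating.\n");
                   MPI_Abort(MPI_COMM_WORLD, rc); /* abort all processes */
               }
               MPI_Finalize();
               return 0;
           }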
    13. MPI PROGRAM STRUCTURE
    14. MPI FUNCTIONS – MINIMAL SUBSET
       • MPI_Init – initialize MPI
       • MPI_Comm_size – size of the group associated with a communicator
       • MPI_Comm_rank – identify the calling process
       • MPI_Send
       • MPI_Recv
       • MPI_Finalize
       • We will discuss the simple ones first; a sketch using all six follows below
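       A minimal sketch tying the six routines together: rank 0 sends one integer to rank 1 (the value 42 is arbitrary):

           #include <mpi.h>
           #include <stdio.h>

           int main(int argc, char **argv)
           {
               int p, rank, token = 0;
               MPI_Status status;

               MPI_Init(&argc, &argv);
               MPI_Comm_size(MPI_COMM_WORLD, &p);
               MPI_Comm_rank(MPI_COMM_WORLD, &rank);

               if (rank == 0 && p > 1) {
                   token = 42;
                   MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);       /* to rank 1 */
               } else if (rank == 1) {
                   MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
                   printf("rank 1 received %d\n", token);
               }

               MPI_Finalize();
               return 0;
           }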
    15. CLASSIFICATION OF MPI ROUTINES
       • Environment Management
         - MPI_Init, MPI_Finalize
       • Point-to-Point Communication
         - MPI_Send, MPI_Recv
       • Collective Communication
         - MPI_Reduce, MPI_Bcast
       • Information on the Processes
         - MPI_Comm_rank, MPI_Get_processor_name
    16. MPI_INIT
       • All MPI programs call this before using other MPI functions
         - int MPI_Init(int *pargc, char ***pargv);
       • Must be called in every MPI program
       • Must be called only once, and before any other MPI function is called
       • Passes the command line arguments to all processes

       int main(int argc, char **argv)
       {
           MPI_Init(&argc, &argv);
           ...
       }
    17. MPI_COMM_SIZE
       • Number of processes in the group associated with a communicator
         - int MPI_Comm_size(MPI_Comm comm, int *psize);
       • Find out how many processes are being used by your application

       int main(int argc, char **argv)
       {
           MPI_Init(&argc, &argv);
           int p;
           MPI_Comm_size(MPI_COMM_WORLD, &p);
           ...
       }
    18. MPI_COMM_RANK
       • Rank of the calling process within the communicator
       • A unique rank between 0 and (p-1); can serve as a task ID
         - int MPI_Comm_rank(MPI_Comm comm, int *rank);
       • A process has a unique rank in each communicator it belongs to
       • Used to identify the work for the processor

       int main(int argc, char **argv)
       {
           MPI_Init(&argc, &argv);
           int p;
           MPI_Comm_size(MPI_COMM_WORLD, &p);
           int rank;
           MPI_Comm_rank(MPI_COMM_WORLD, &rank);
           ...
       }
    19. MPI_FINALIZE
       • Terminates the MPI execution environment
       • Last MPI routine to be called in any MPI program
         - int MPI_Finalize(void);

       int main(int argc, char **argv)
       {
           MPI_Init(&argc, &argv);
           int p;
           MPI_Comm_size(MPI_COMM_WORLD, &p);
           int rank;
           MPI_Comm_rank(MPI_COMM_WORLD, &rank);
           printf("no. of processors: %d\nrank: %d\n", p, rank);
           MPI_Finalize();
       }
    21. HOW TO COMPILE THIS
       • Open MPI implementation on our cluster
       • mpicc -o test_1 test_1.c
       • Works just like gcc
       • mpicc is not a special compiler
         - $ mpicc: gcc: no input files
         - MPI is implemented just as any other library
         - mpicc is just a wrapper around gcc that adds the required command line parameters
    22. HOW TO RUN THIS
       • mpirun -np X test_1
       • Will run X copies of the program in your current run-time environment
       • The -np option specifies the number of copies of the program
    23. MPIRUN
       • Only the rank 0 process can receive standard input
         - mpirun redirects standard input of all other processes to /dev/null
         - Open MPI redirects standard input of mpirun to the standard input of the rank 0 process
       • The node that invoked mpirun need not be the same as the node hosting the MPI_COMM_WORLD rank 0 process
       • mpirun directs the standard output and error of remote nodes to the node that invoked mpirun
       • SIGTERM and SIGKILL kill all processes in the communicator
       • SIGUSR1 and SIGUSR2 are propagated to all processes
       • All other signals are ignored
    24. A NOTE ON IMPLEMENTATION
       • I want to implement my own version of MPI
       • Evidence
       [Diagram: each process's MPI_Init call starting an MPI thread]
    25. SOME MORE FUNCTIONS
       • int MPI_Initialized(int *flag)
         - Checks whether MPI_Init has already been called
         - Why? It is the one call that may safely be made before MPI_Init
       • double MPI_Wtime()
         - Returns the elapsed wall clock time in seconds (double precision) on the calling processor
       • double MPI_Wtick()
         - Returns the resolution, in seconds (double precision), of MPI_Wtime()
       • Message passing functionality
         - That is what MPI is meant for! A small timing sketch follows below.
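       A small timing sketch using these calls (the work being timed is left as a placeholder):

           #include <mpi.h>
           #include <stdio.h>

           int main(int argc, char **argv)
           {
               int flag;
               MPI_Initialized(&flag);        /* safe to call even before MPI_Init */
               if (!flag)
                   MPI_Init(&argc, &argv);

               double t0 = MPI_Wtime();
               /* ... work to be timed ... */
               double t1 = MPI_Wtime();

               printf("elapsed: %f s (clock resolution %g s)\n", t1 - t0, MPI_Wtick());

               MPI_Finalize();
               return 0;
           }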
    26. POINT TO POINT COMMUNICATION
    27. POINT-TO-POINT COMMUNICATION
       • Communication between two, and only two, processes
       • One sending and one receiving
       • Types
         - Synchronous send
         - Blocking send / blocking receive
         - Non-blocking send / non-blocking receive
         - Buffered send
         - Combined send/receive
         - "Ready" send
    28. POINT-TO-POINT COMMUNICATION
       • Processes can be collected into groups
       • Each message is sent in a context, and must be received in the same context
       • A group and a context together form a communicator
       • A process is identified by its rank in the group associated with a communicator
       • Messages are sent with an accompanying user-defined integer tag to assist the receiving process in identifying the message
       • MPI_ANY_TAG matches any tag
    29. POINT-TO-POINT COMMUNICATION
       • How is "data" described?
       • How are processes identified?
       • How does the receiver recognize messages?
       • What does it mean for these operations to complete?
    30. BLOCKING SEND/RECEIVE
       • int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm communicator)
       • buf: pointer to the data to send
       • count: number of elements in the buffer
       • datatype: the kind of data in the buffer
       • dest: rank of the receiver
       • tag: the label of the message
       • communicator: set of processes involved (e.g. MPI_COMM_WORLD)
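       The slide gives only the send side; a fragment sketch of the matching MPI_Recv, assuming an initialized program in which some rank sends up to 100 doubles to this one:

           double buf[100];
           MPI_Status status;
           int count;

           /* receive from any source, with any tag */
           MPI_Recv(buf, 100, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                    MPI_COMM_WORLD, &status);

           /* the status object records what actually arrived */
           MPI_Get_count(&status, MPI_DOUBLE, &count);
           printf("got %d doubles from rank %d with tag %d\n",
                  count, status.MPI_SOURCE, status.MPI_TAG);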
    37. BLOCKING SEND/RECEIVE (CONTD.)
       [Diagram: data moving from the application buffer of process 1 through the system buffers to the application buffer of process 2]
    38. A WORD ABOUT THE SPECIFICATION
       • The user does not know whether the MPI implementation:
         - copies the buffer into an internal buffer, starts the communication, and returns control before all the data has been transferred (buffering)
         - creates links between processors, sends the data, and returns control when all the data has been sent (but NOT received)
         - uses a combination of the above methods
    39. BLOCKING SEND/RECEIVE (CONTD.)
       • A blocking send "returns" once it is safe to modify the application buffer
       • Safe means
         - modifications will not affect the data intended for the receiving task
         - it does not imply that the data was actually received
       • A blocking send can be synchronous, which means there is handshaking with the receiving task to confirm a safe send
       • A blocking send can be asynchronous if a system buffer is used to hold the data for eventual delivery to the receive
       • A blocking receive only "returns" after the data has arrived and is ready for use by the program
    40. NON-BLOCKING SEND/RECEIVE
       • Return almost immediately
       • They simply "request" that the MPI library perform the operation when it is able
       • You cannot predict when that will happen
       • Request a send/receive and start doing other work!
       • It is unsafe to modify the application buffer (your variable space) until you know that the non-blocking operation has completed
       • MPI_Isend(&buf, count, datatype, dest, tag, comm, &request)
       • MPI_Irecv(&buf, count, datatype, source, tag, comm, &request)
    41. NON-BLOCKING SEND/RECEIVE (CONTD.)
       [Diagram: the same application/system buffer picture as in the blocking case]
    42. NON-BLOCKING SEND/RECEIVE (CONTD.)
       • To check whether the send/receive operations have completed:
       • int MPI_Irecv(void *buf, int count, MPI_Datatype type, int source, int tag, MPI_Comm comm, MPI_Request *req);
       • int MPI_Wait(MPI_Request *req, MPI_Status *status);
         - A call to this subroutine causes the code to wait until the communication pointed to by req is complete
         - req is an input/output identifier associated with a communication event (initiated by MPI_ISEND or MPI_IRECV)
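       A complete sketch of this pattern with MPI_Wait, assuming exactly two processes exchanging one integer:

           #include <mpi.h>
           #include <stdio.h>

           int main(int argc, char **argv)
           {
               int rank, other, sendval, recvval;
               MPI_Request sreq, rreq;

               MPI_Init(&argc, &argv);
               MPI_Comm_rank(MPI_COMM_WORLD, &rank);
               other   = 1 - rank;            /* assumes exactly two processes */
               sendval = rank;

               MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &rreq);
               MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &sreq);

               /* ... do useful work here while the messages are in flight ... */

               MPI_Wait(&rreq, MPI_STATUS_IGNORE);   /* recvval is now safe to read */
               MPI_Wait(&sreq, MPI_STATUS_IGNORE);   /* sendval is now safe to modify */
               printf("rank %d got %d\n", rank, recvval);

               MPI_Finalize();
               return 0;
           }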
    43. NON-BLOCKING SEND/RECEIVE (CONTD.)
       • int MPI_Test(MPI_Request *req, int *flag, MPI_Status *status);
         - A call to this subroutine sets flag to true if the communication pointed to by req is complete, and to false otherwise
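       A fragment sketch of polling with MPI_Test, assuming req was returned by an earlier MPI_Isend or MPI_Irecv:

           int done = 0;
           MPI_Status status;

           while (!done) {
               MPI_Test(&req, &done, &status);
               if (!done) {
                   /* ... do a slice of useful work, then test again ... */
               }
           }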
    44. STANDARD MODE
       • Returns when the sender is free to access and overwrite the send buffer
       • The message might be copied directly into the matching receive buffer, or into a temporary system buffer
       • Message buffering decouples the send and receive operations
       • Message buffering can be expensive
       • It is up to MPI to decide whether outgoing messages will be buffered
       • The standard mode send is non-local
    45. SYNCHRONOUS MODE
       • The send can be started whether or not a matching receive was posted
       • The send completes successfully only if a corresponding receive was posted and has already started to receive the message
       • A blocking send and blocking receive in synchronous mode simulate a synchronous communication
       • The synchronous send is non-local
    46. BUFFERED MODE
       • The send operation can be started whether or not a matching receive has been posted
       • It may complete before a matching receive is posted
       • The operation is local
       • MPI must buffer the outgoing message
       • An error will occur if there is insufficient buffer space
       • The amount of available buffer space is controlled by the user
    47. BUFFER MANAGEMENT
       • int MPI_Buffer_attach(void *buffer, int size)
         - Provides MPI with a buffer in the user's memory to be used for buffering outgoing messages
       • int MPI_Buffer_detach(void *buffer_addr, int *size)
         - Detaches the buffer currently associated with MPI

       MPI_Buffer_attach(malloc(BUFFSIZE), BUFFSIZE);
       /* a buffer of BUFFSIZE bytes can now be used by MPI_Bsend */
       MPI_Buffer_detach(&buff, &size);
       /* buffer size reduced to zero */
       MPI_Buffer_attach(buff, size);
       /* buffer of BUFFSIZE bytes available again */
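       A fragment sketch of a buffered-mode send on the sending rank (needs <stdlib.h> for malloc; a matching receive on rank 1 is assumed):

           int size;
           char *buf;
           double data = 3.14;

           /* room for one MPI_DOUBLE plus the per-message overhead */
           MPI_Pack_size(1, MPI_DOUBLE, MPI_COMM_WORLD, &size);
           size += MPI_BSEND_OVERHEAD;

           buf = malloc(size);
           MPI_Buffer_attach(buf, size);

           MPI_Bsend(&data, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);   /* to rank 1 */

           MPI_Buffer_detach(&buf, &size);   /* blocks until the buffered message has been sent */
           free(buf);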
    48. READY MODE
       • A send may be started only if the matching receive has already been posted
       • The user must be sure of this
       • If the receive is not already posted, the operation is erroneous and its outcome is undefined
       • Completion of the send operation does not depend on the status of a matching receive
       • It merely indicates that the send buffer can be reused
       • A ready send could be replaced by a standard send with no effect on the behavior of the program other than performance; a sketch of how the precondition can be guaranteed follows below
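       The sketch mentioned above: the receiver posts its receive and then acknowledges it, so the sender knows the ready send is safe (a fragment, assuming exactly two ranks in an initialized program):

           int value = 0, ack = 0, rank;
           MPI_Request req;
           MPI_Comm_rank(MPI_COMM_WORLD, &rank);

           if (rank == 1) {                        /* receiver */
               MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
               MPI_Send(&ack, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);   /* "my receive is posted" */
               MPI_Wait(&req, MPI_STATUS_IGNORE);
           } else if (rank == 0) {                 /* sender */
               value = 7;
               MPI_Recv(&ack, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
               MPI_Rsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* receive known to be posted */
           }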
    49. ORDER AND FAIRNESS
       • Order
         - MPI messages are non-overtaking
         - This matters when one receive matches two messages
         - It also matters when one sent message matches two receive statements
         - Message-passing code is deterministic, unless the processes are multi-threaded or the wild card MPI_ANY_SOURCE is used in a receive statement
       • Fairness
         - MPI does not guarantee fairness
         - Example: task 0 sends a message to task 2, but task 1 sends a competing message that also matches task 2's receive; only one of the sends will complete
    50. EXAMPLE OF NON-OVERTAKING MESSAGES

       CALL MPI_COMM_RANK(comm, rank, ierr)
       IF (rank.EQ.0) THEN
           CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag, comm, ierr)
           CALL MPI_BSEND(buf2, count, MPI_REAL, 1, tag, comm, ierr)
       ELSE    ! rank.EQ.1
           CALL MPI_RECV(buf1, count, MPI_REAL, 0, MPI_ANY_TAG, comm, status, ierr)
           CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag, comm, status, ierr)
       END IF
    51. EXAMPLE OF INTERTWINED MESSAGES

       CALL MPI_COMM_RANK(comm, rank, ierr)
       IF (rank.EQ.0) THEN
           CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag1, comm, ierr)
           CALL MPI_SSEND(buf2, count, MPI_REAL, 1, tag2, comm, ierr)
       ELSE    ! rank.EQ.1
           CALL MPI_RECV(buf1, count, MPI_REAL, 0, tag2, comm, status, ierr)
           CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag1, comm, status, ierr)
       END IF
    52. DEADLOCK EXAMPLE

       CALL MPI_COMM_RANK(comm, rank, ierr)
       IF (rank.EQ.0) THEN
           CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
           CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
       ELSE    ! rank.EQ.1
           CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
           CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
       END IF
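       One way to avoid this deadlock is the combined send/receive listed among the point-to-point types earlier; a C fragment sketch, assuming two ranks whose rank and buffers are already set up:

           /* MPI_Sendrecv performs the send and the receive together,
              so neither rank blocks waiting for the other to send first */
           double sendval = rank, recvval;
           int other = 1 - rank;             /* assumes exactly two ranks */

           MPI_Sendrecv(&sendval, 1, MPI_DOUBLE, other, 0,
                        &recvval, 1, MPI_DOUBLE, other, 0,
                        MPI_COMM_WORLD, MPI_STATUS_IGNORE);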
    53. EXAMPLE OF BUFFERING

       CALL MPI_COMM_RANK(comm, rank, ierr)
       IF (rank.EQ.0) THEN
           CALL MPI_SEND(buf1, count, MPI_REAL, 1, tag, comm, ierr)
           CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
       ELSE    ! rank.EQ.1
           CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
           CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag, comm, status, ierr)
       END IF
    54. COLLECTIVE COMMUNICATIONS
    55. COLLECTIVE ROUTINES
       • Collective routines provide a higher-level way to organize a parallel program
       • Each process executes the same communication operation
       • Communications involve a group of processes in a communicator
       • Groups and communicators can be constructed "by hand" or using topology routines
       • Tags are not used; different communicators deliver similar functionality
       • No non-blocking collective operations
       • Three classes of operations: synchronization, data movement, collective computation
    56. COLLECTIVE ROUTINES (CONTD.)
       • int MPI_Barrier(MPI_Comm comm)
       • Stops processes until all processes within the communicator reach the barrier
       • Occasionally useful in measuring performance, as sketched below
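       A fragment sketch of the timing idiom (the same pattern appears in the trapezoidal-rule demo later in the deck):

           MPI_Barrier(MPI_COMM_WORLD);      /* every rank starts timing together */
           double t = -MPI_Wtime();
           /* ... phase being measured ... */
           MPI_Barrier(MPI_COMM_WORLD);      /* every rank has finished the phase */
           t += MPI_Wtime();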
    57. COLLECTIVE ROUTINES (CONTD.)
       • int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
       • Broadcast
       • One-to-all communication: the same data is sent from the root process to all the others in the communicator, as sketched below
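       A fragment sketch, assuming rank has already been obtained; the parameter n and its value are just an illustration:

           int n = 0;
           if (rank == 0)
               n = 1024;                     /* e.g. read or computed on the root */
           MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
           /* every rank now has n == 1024 */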
    58. COLLECTIVE ROUTINES (CONTD.)
       • Reduction
       • The reduction operation allows you to:
         - Collect data from each process
         - Reduce the data to a single value
         - Store the result on the root process (MPI_Reduce), or
         - Store the result on all processes (MPI_Allreduce)
       • The reduction function works with arrays
       • Other operations: product, min, max, and, ...
       • Internally it is usually implemented with a binary tree
    59. COLLECTIVE ROUTINES (CONTD.)
       • int MPI_Reduce/MPI_Allreduce(void *snd_buf, void *rcv_buf, int count, MPI_Datatype type, MPI_Op op, int root, MPI_Comm comm)
       • snd_buf: input array
       • rcv_buf: output array
       • count: number of elements in snd_buf and rcv_buf
       • type: MPI type of snd_buf and rcv_buf
       • op: parallel operation to be performed
       • root: MPI rank of the process storing the result (MPI_Reduce only)
       • comm: communicator of the processes involved in the operation
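       A fragment sketch, assuming an initialized program where rank is known; the contributed values are arbitrary:

           double local = rank + 1.0;        /* each rank contributes something */
           double global = 0.0;

           MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
           if (rank == 0)
               printf("sum over all ranks = %f\n", global);

           /* MPI_Allreduce is the same except that there is no root argument
              and every rank receives the result */
           MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);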
    60. MPI OPERATIONS

       MPI_Op         Operator
       MPI_MIN        Minimum
       MPI_SUM        Sum
       MPI_PROD       Product
       MPI_MAX        Maximum
       MPI_LAND       Logical and
       MPI_BAND       Bitwise and
       MPI_LOR        Logical or
       MPI_BOR        Bitwise or
       MPI_LXOR       Logical xor
       MPI_BXOR       Bitwise xor
       MPI_MAXLOC     Max value and location
       MPI_MINLOC     Min value and location
    61. COLLECTIVE ROUTINES (CONTD.)
    62. Learn by Examples
    63. Parallel Trapezoidal Rule

       Output: estimate of the integral from a to b of f(x), using the trapezoidal rule and n trapezoids.

       Algorithm:
       1. Each process calculates "its" interval of integration.
       2. Each process estimates the integral of f(x) over its interval using the trapezoidal rule.
       3a. Each process != 0 sends its integral to process 0.
       3b. Process 0 sums the calculations received from the individual processes and prints the result.

       Notes:
       1. f(x), a, b, and n are all hardwired.
       2. The number of processes (p) should evenly divide the number of trapezoids (n = 1024).
    64. Parallelizing the Trapezoidal Rule

       #include <stdio.h>
       #include "mpi.h"

       main(int argc, char** argv) {
           int my_rank;          /* My process rank */
           int p;                /* The number of processes */
           double a = 0.0;       /* Left endpoint */
           double b = 1.0;       /* Right endpoint */
           int n = 1024;         /* Number of trapezoids */
           double h;             /* Trapezoid base length */
           double local_a;       /* Left endpoint for my process */
           double local_b;       /* Right endpoint for my process */
           int local_n;          /* Number of trapezoids for my calculation */
           double integral;      /* Integral over my interval */
           double total;         /* Total integral */
           int source;           /* Process sending its integral */
           int dest = 0;         /* All messages go to 0 */
           int tag = 0;
           MPI_Status status;
    65. Continued...

           double Trap(double local_a, double local_b, int local_n, double h);  /* Calculate local integral */

           MPI_Init(&argc, &argv);
           MPI_Barrier(MPI_COMM_WORLD);
           double elapsed_time = -MPI_Wtime();

           MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
           MPI_Comm_size(MPI_COMM_WORLD, &p);

           h = (b-a)/n;          /* h is the same for all processes */
           local_n = n/p;        /* So is the number of trapezoids */

           /* Length of each process' interval of integration = local_n*h.
              So my interval starts at: */
           local_a = a + my_rank*local_n*h;
           local_b = local_a + local_n*h;
           integral = Trap(local_a, local_b, local_n, h);
    66. Continued...

           /* Add up the integrals calculated by each process */
           if (my_rank == 0) {
               total = integral;
               for (source = 1; source < p; source++) {
                   MPI_Recv(&integral, 1, MPI_DOUBLE, source, tag, MPI_COMM_WORLD, &status);
                   total = total + integral;
               }  /* End for */
           } else {
               MPI_Send(&integral, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
           }

           MPI_Barrier(MPI_COMM_WORLD);
           elapsed_time += MPI_Wtime();

           /* Print the result */
           if (my_rank == 0) {
               printf("With n = %d trapezoids, our estimate\n", n);
               printf("of the integral from %lf to %lf = %lf\n", a, b, total);
               printf("time taken: %lf\n", elapsed_time);
           }
    67. Continued...

           /* Shut down MPI */
           MPI_Finalize();
       } /* main */

       double Trap(double local_a, double local_b, int local_n, double h) {
           double integral;      /* Store result in integral */
           double x;
           int i;
           double f(double x);   /* Function we're integrating */

           integral = (f(local_a) + f(local_b))/2.0;
           x = local_a;
           for (i = 1; i <= local_n-1; i++) {
               x = x + h;
               integral = integral + f(x);
           }
           integral = integral*h;
           return integral;
       } /* Trap */
    68. Continued...

       double f(double x) {
           double return_val;
           /* Calculate f(x) and store the result in return_val.
              With f(x) = 4/(1+x*x), the integral from 0 to 1 is pi. */
           return_val = 4 / (1+x*x);
           return return_val;
       } /* f */
    69. Program 2

       Every process other than the root generates a random value less than 1 and sends it to the root. The root sums the values and displays the sum.
    70.

       #include <stdio.h>
       #include <mpi.h>
       #include <stdlib.h>
       #include <string.h>
       #include <time.h>

       int main(int argc, char **argv) {
           int myrank, p;
           int tag = 0, dest = 0;
           int i;
           double randIn, randOut;
           int source;
           MPI_Status status;

           MPI_Init(&argc, &argv);
           MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    71.

           MPI_Comm_size(MPI_COMM_WORLD, &p);
           if (myrank == 0) {  /* I am the root */
               double total = 0, average = 0;
               for (source = 1; source < p; source++) {
                   MPI_Recv(&randIn, 1, MPI_DOUBLE, source, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
                   printf("Message from root: From %d received number %f\n", source, randIn);
                   total += randIn;
               }  /* End for */
               average = total/(p-1);
           }  /* End if */
    72.

           else {  /* I am other than the root */
               srand48((long int) myrank);
               randOut = drand48();
               printf("randout = %f, myrank = %d\n", randOut, myrank);
               MPI_Send(&randOut, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
           }  /* End if-else */

           MPI_Finalize();
           return 0;
       }
    73. MPI References
       • The standard itself:
         - http://www.mpi-forum.org
         - All official MPI releases, in both PostScript and HTML
       • Books:
         - Using MPI: Portable Parallel Programming with the Message-Passing Interface, 2nd edition, by Gropp, Lusk, and Skjellum, MIT Press, 1999. Also Using MPI-2, with R. Thakur.
         - MPI: The Complete Reference, 2 vols., MIT Press, 1999.
         - Designing and Building Parallel Programs, by Ian Foster, Addison-Wesley, 1995.
         - Parallel Programming with MPI, by Peter Pacheco, Morgan Kaufmann, 1997.
       • Other information on the web:
         - http://www.mcs.anl.gov/mpi
         - Man pages for Open MPI: http://www.open-mpi.org/doc/v1.4/
         - apropos mpi
    74. THANK YOU
