MPI
Presenter Notes

  • Can work with shared-memory architectures also
  • Why MPI is still being used
  • Many vendors can compete to provide better implementations
  • Boring topic, but fundamental for understanding the basics
  • Safe – different libraries can work together
  • Different return codes for different functions
  • To start coding we need to use these functions
  • Mention the case where buffer space might not be available
  • Buffer is used only during buffered-mode communication
  • A ready call indicates to the system that a receive has already been posted
  • Built-in collective operations: Reduce, Bcast, datatypes

MPI Presentation Transcript

  • MPI Rohit Banga Prakher Anand K Swagat Manoj Gupta Advanced Computer Architecture Spring, 2010
  • ORGANIZATION
    • Basics of MPI
    • Point to Point Communication
    • Collective Communication
    • Demo
  • GOALS
    • Explain basics of MPI
    • Start coding today!
    • Keep It Short and Simple
  • MESSAGE PASSING INTERFACE
    • A message passing library specification
      • Extended message-passing model
      • Not a language or compiler specification
      • Not a specific implementation – several implementations exist (as with pthreads)
    • A standard for distributed-memory, message-passing, parallel computing
    • Distributed Memory – Shared Nothing approach!
    • Some interconnection technology – TCP, InfiniBand (on our cluster)
  • GOALS OF MPI SPECIFICATION
    • Provide source code portability
    • Allow efficient implementations
    • Flexible enough to port different algorithms to different hardware environments
    • Support for heterogeneous architectures – processors not identical
  • REASONS FOR USING MPI
    • Standardization – virtually all HPC platforms
    • Portability – same code runs on another platform
    • Performance – vendor implementations should exploit native hardware features
    • Functionality – 115 routines
    • Availability – a variety of implementations available
  • BASIC MODEL
    • Communicators and Groups
    • Group
      • ordered set of processes
      • each process is associated with a unique integer rank
      • rank from 0 to (N-1) for N processes
      • an object in system memory, accessed by a handle
      • MPI_GROUP_EMPTY – a valid group with no members
      • MPI_GROUP_NULL – an invalid handle, returned when a group is freed
  • BASIC MODEL (CONTD.)
    • Communicator
      • Group of processes that may communicate with each other
      • MPI messages must specify a communicator
      • An object in memory
      • Handle to access the object
    • There is a default communicator, MPI_COMM_WORLD (automatically defined), which identifies the group of all processes
  • COMMUNICATORS
    • Intra-Communicator – All processes from the same group
    • Inter-Communicator – Processes picked up from several groups
  • COMMUNICATOR AND GROUPS
    • For a programmer, group and communicator are one
    • Allow you to organize tasks, based upon function, into task groups
    • Enable collective communication operations (covered later) across a subset of related tasks
    • safe communications
    • Many Communicators at the same time
    • Dynamic – can be created and destroyed at run time
    • Process may be in more than one group/communicator – unique rank in every group/communicator
    • implementing user defined virtual topologies
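  • As a sketch of the ideas above (a process belonging to several communicators, communicators created and destroyed at run time), the following C fragment splits MPI_COMM_WORLD into two sub-communicators with MPI_Comm_split; splitting by even/odd rank is just an illustrative choice.
    /* Split MPI_COMM_WORLD into two sub-communicators (even and odd world
     * ranks). Each process gets a new, independent rank inside its group. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int world_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* color selects the new group, key orders ranks inside it */
        int color = world_rank % 2;
        MPI_Comm sub_comm;
        MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sub_comm);

        int sub_rank;
        MPI_Comm_rank(sub_comm, &sub_rank);
        printf("world rank %d -> group %d, sub rank %d\n",
               world_rank, color, sub_rank);

        MPI_Comm_free(&sub_comm);   /* communicators can be destroyed at run time */
        MPI_Finalize();
        return 0;
    }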
  • VIRTUAL TOPOLOGIES
    • coord (0,0): rank 0
    • coord (0,1): rank 1
    • coord (1,0): rank 2
    • coord (1,1): rank 3
    • Attach graph topology information to an existing communicator
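  • The 2x2 coord-to-rank mapping above can be reproduced with MPI's Cartesian topology routines; a minimal C sketch, assuming the job is run with exactly 4 processes:
    /* Create a 2x2 Cartesian virtual topology and print each process's
     * coordinates and rank. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int dims[2]    = {2, 2};    /* 2 x 2 grid */
        int periods[2] = {0, 0};    /* no wrap-around in either dimension */
        MPI_Comm cart_comm;
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, /*reorder=*/1, &cart_comm);

        if (cart_comm != MPI_COMM_NULL) {   /* processes outside the grid get MPI_COMM_NULL */
            int rank, coords[2];
            MPI_Comm_rank(cart_comm, &rank);
            MPI_Cart_coords(cart_comm, rank, 2, coords);
            printf("coord (%d,%d): rank %d\n", coords[0], coords[1], rank);
            MPI_Comm_free(&cart_comm);
        }
        MPI_Finalize();
        return 0;
    }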
  • SEMANTICS
    • Header file
      • #include <mpi.h> (C)
      • include 'mpif.h' (Fortran)
      • Bindings also exist for Java, Python, etc.
    Format: rc = MPI_Xxxxx(parameter, ... )
    Example: rc = MPI_Bsend(&buf, count, type, dest, tag, comm)
    Error code: returned as "rc" – MPI_SUCCESS if successful
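  • A small C sketch of this calling convention, checking the return code of one call (by default most implementations abort on error, so the error handler is switched to MPI_ERRORS_RETURN first; the variable names are illustrative):
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        /* return error codes instead of aborting the job */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        int size;
        int rc = MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "MPI_Comm_size failed: %s\n", msg);
        }

        MPI_Finalize();
        return 0;
    }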
  • MPI PROGRAM STRUCTURE
  • MPI FUNCTIONS – MINIMAL SUBSET
    • MPI_Init – Initialize MPI
    • MPI_Comm_size – size of group associated with the communicator
    • MPI_Comm_rank – identify the process
    • MPI_Send
    • MPI_Recv
    • MPI_Finalize
      • We will discuss simple ones first
  • CLASSIFICATION OF MPI ROUTINES
    • Environment Management
      • MPI_Init, MPI_Finalize
    • Point-to-Point Communication
      • MPI_Send, MPI_Recv
    • Collective Communication
      • MPI_Reduce, MPI_Bcast
    • Information on the Processes
      • MPI_Comm_rank, MPI_Get_processor_name
  • MPI_INIT
    • All MPI programs call this before using other MPI functions
      • int MPI_Init(int *pargc, char ***pargv);
    • Must be called in every MPI program
    • Must be called only once and before any other MPI functions are called
    • Pass command line arguments to all processes
    int main(int argc, char **argv) { MPI_Init(&argc, &argv); … }
  • MPI_COMM_SIZE
    • Number of processes in the group associated with a communicator
      • int MPI_Comm_size(MPI_Comm comm, int *psize);
    • Find out number of processes being used by your application
    int main(int argc, char **argv) { MPI_Init(&argc, &argv); int p; MPI_Comm_size(MPI_COMM_WORLD, &p); … }
  • MPI_COMM_RANK
    • Rank of the calling process within the communicator
    • Unique Rank between 0 and (p-1)
    • Can be called task ID
      • int MPI_Comm_rank(MPI_Comm comm, int *rank);
    • Unique rank for a process in each communicator it belongs to
    • Used to identify work for the processor
    int main(int argc, char **argv) { MPI_Init(&argc, &argv); int p; MPI_Comm_size(MPI_COMM_WORLD, &p); int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank); … }
  • MPI_FINALIZE
    • Terminates the MPI execution environment
    • Last MPI routine to be called in any MPI program
      • int MPI_Finalize(void);
    int main(int argc, char **argv) { MPI_Init(&argc, &argv); int p; MPI_Comm_size(MPI_COMM_WORLD, &p); int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank); printf("no. of processes: %d\n rank: %d\n", p, rank); MPI_Finalize(); }
  •  
  • HOW TO COMPILE THIS
    • Open MPI implementation on our Cluster
    • mpicc -o test_1 test_1.c
    • Like gcc only
    • mpicc is not a special compiler
      • $ mpicc → gcc: no input files
      • MPI is implemented just like any other library
      • mpicc is just a wrapper around gcc that adds the required command line parameters
  • HOW TO RUN THIS
    • mpirun -np X test_1
    • Runs X copies of the program in your current run-time environment
    • The -np option specifies the number of copies of the program
  • MPIRUN
    • Only rank 0 process can receive standard input.
      • mpirun redirects standard input of all others to /dev/null
      • Open MPI redirects standard input of mpirun to standard input of rank 0 process
    • Node which invoked mpirun need not be the same as the node for the MPI_COMM_WORLD rank 0 process
    • mpirun directs standard output and error of remote nodes to the node that invoked mpirun
    • SIGTERM and SIGKILL kill all processes in the job
    • SIGUSR1 and SIGUSR2 are propagated to all processes
    • All other signals are ignored
  • A NOTE ON IMPLEMENTATION
    • I want to implement my own version of MPI
    • Evidence
    [Diagram: two processes, each calling MPI_Init and running an MPI thread]
  • SOME MORE FUNCTIONS
    • int MPI_Initialized(int *flag)
      • Checks whether MPI_Init has already been called
      • Why?
    • double MPI_Wtime()
      • Returns elapsed wall clock time in seconds (double precision) on the calling processor
    • double MPI_Wtick()
      • Returns the resolution in seconds (double precision) of MPI_Wtime()
    • Message Passing Functionality
      • That is what MPI is meant for!
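  • A short C sketch using these three calls (the one-second sleep just stands in for real work):
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        int flag;
        MPI_Initialized(&flag);          /* has MPI_Init been called already? */
        if (!flag)
            MPI_Init(&argc, &argv);

        double t0 = MPI_Wtime();
        sleep(1);                        /* stand-in for real work */
        double t1 = MPI_Wtime();

        printf("elapsed: %f s (timer resolution %g s)\n", t1 - t0, MPI_Wtick());

        MPI_Finalize();
        return 0;
    }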
  • POINT TO POINT COMMUNICATION
  • POINT-TO-POINT COMMUNICATION
    • Communication between 2 and only 2 processes
    • One sending and one receiving
    • Types
      • Synchronous send
      • Blocking send / blocking receive
      • Non-blocking send / non-blocking receive
      • Buffered send
      • Combined send/receive
      • "Ready" send
  • POINT-TO-POINT COMMUNICATION
    • Processes can be collected into groups
    • Each message is sent in a context, and must be received in the same context
    • A group and context together form a communicator
    • A process is identified by its rank in the group associated with a communicator
    • Messages are sent with an accompanying user-defined integer tag, to assist the receiving process in identifying the message
    • MPI_ANY_TAG matches any tag
  • POINT-TO-POINT COMMUNICATION
    • How is “data” described?
    • How are processes identified?
    • How does the receiver recognize messages?
    • What does it mean for these operations to complete?
  • BLOCKING SEND/RECEIVE
    • int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm communicator)
    • buf: pointer to the data to send
    • count: number of elements in the buffer
    • datatype: the kind of data in the buffer
    • dest: rank of the receiving process
    • tag: the label of the message
    • communicator: set of processes involved (e.g. MPI_COMM_WORLD)
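  • A minimal C example of a blocking send/receive pair, assuming the job is started with at least 2 processes:
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int tag = 0;

        if (rank == 0) {
            int value = 42;
            MPI_Send(&value, 1, MPI_INT, /*dest=*/1, tag, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int value;
            MPI_Status status;
            MPI_Recv(&value, 1, MPI_INT, /*source=*/0, tag, MPI_COMM_WORLD, &status);
            printf("rank 1 received %d from rank %d\n", value, status.MPI_SOURCE);
        }

        MPI_Finalize();
        return 0;
    }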
  • BLOCKING SEND/RECEIVE (CONTD.) [Diagram: the data moves from the sender's application buffer, possibly via system buffers on both processors, to the receiver's application buffer]
  • A WORD ABOUT SPECIFICATION
    • The user does not know whether the MPI implementation:
      • copies the buffer into an internal buffer, starts the communication, and returns control before all the data has been transferred (buffering)
      • creates a link to the destination, sends the data, and returns control when all the data has been sent (but NOT necessarily received)
      • uses a combination of the above methods
  • BLOCKING SEND/RECEIVE (CONTD.)
    • "Returns" after it is safe to modify the application buffer
    • Safe means:
      • modifications will not affect the data intended for the receive task
      • it does not imply that the data was actually received
    • A blocking send can be synchronous, which means there is handshaking with the receive task to confirm a safe send
    • A blocking send can be asynchronous if a system buffer is used to hold the data for eventual delivery to the receiver
    • A blocking receive only "returns" after the data has arrived and is ready for use by the program
  • NON-BLOCKING SEND/RECEIVE
    • Return almost immediately
    • They simply "request" that the MPI library perform the operation when it is able
    • You cannot predict when that will happen
    • Request a send/receive and start doing other work!
    • It is unsafe to modify the application buffer (your variable space) until you know that the non-blocking operation has completed
    • MPI_Isend (&buf,count,datatype,dest,tag,comm,&request)
    • MPI_Irecv (&buf,count,datatype,source,tag,comm,&request)
  • NON-BLOCKING SEND/RECEIVE (CONTD.) [Diagram: the data moves from the sender's application buffer, possibly via system buffers on both processors, to the receiver's application buffer]
    • int MPI_Irecv(void *buf, int count, MPI_Datatype type, int source, int tag, MPI_Comm comm, MPI_Request *req);
  • NON-BLOCKING SEND/RECEIVE (CONTD.)
    • To check whether the send/receive operations have completed:
    • int MPI_Wait(MPI_Request *req, MPI_Status *status);
      • Causes the code to wait until the communication pointed to by req is complete
      • req: input/output, the identifier associated with a communication event (initiated by MPI_ISEND or MPI_IRECV)
    • int MPI_Test(MPI_Request *req, int *flag, MPI_Status *status);
      • Sets flag to true if the communication pointed to by req is complete, and to false otherwise
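  • A C sketch of non-blocking communication: the two lowest ranks exchange an integer, posting MPI_Irecv and MPI_Isend first, leaving room to overlap other work, and waiting before touching the buffers (assumes at least 2 processes):
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank < 2) {
            int partner = 1 - rank;
            int send_val = rank, recv_val = -1;
            MPI_Request reqs[2];

            MPI_Irecv(&recv_val, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
            MPI_Isend(&send_val, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);

            /* ... other computation could overlap the communication here ... */

            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
            printf("rank %d got %d from rank %d\n", rank, recv_val, partner);
        }

        MPI_Finalize();
        return 0;
    }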
  • STANDARD MODE
    • Returns when the sender is free to access and overwrite the send buffer.
    • The message might be copied directly into the matching receive buffer, or into a temporary system buffer.
    • Message buffering decouples the send and receive operations.
    • Message buffering can be expensive.
    • It is up to MPI to decide whether outgoing messages will be buffered.
    • The standard-mode send is non-local.
  • SYNCHRONOUS MODE
    • Send can be started whether or not a matching receive was posted.
    • Send completes successfully only if a corresponding receive was already posted and has already started to receive the message sent.
    • A blocking send plus a blocking receive in synchronous mode simulate a synchronous communication.
    • The synchronous send is non-local.
  • BUFFERED MODE
    • Send operation can be started whether or not a matching receive has been posted.
    • It may complete before a matching receive is posted.
    • The operation is local: its completion does not depend on a matching receive.
    • MPI must buffer the outgoing message.
    • An error occurs if there is insufficient buffer space.
    • The amount of available buffer space is controlled by the user.
  • BUFFER MANAGEMENT
    • int MPI_Buffer_attach( void* buffer, int size)
    • Provides to MPI a buffer in the user's memory to be used for buffering outgoing messages.
    • int MPI_Buffer_detach( void* buffer_addr, int* size)
    • Detach the buffer currently associated with MPI.
    MPI_Buffer_attach(malloc(BUFFSIZE), BUFFSIZE);  /* a buffer of BUFFSIZE bytes can now be used by MPI_Bsend */
    MPI_Buffer_detach(&buff, &size);                /* buffer size reduced to zero */
    MPI_Buffer_attach(buff, size);                  /* buffer of BUFFSIZE bytes available again */
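  • A C sketch of a complete buffered-mode send using these calls (the buffer size and message value are illustrative; MPI_BSEND_OVERHEAD accounts for MPI's bookkeeping space):
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int msg = 7;
            int bufsize = MPI_BSEND_OVERHEAD + sizeof(int);
            char *buf = malloc(bufsize);
            MPI_Buffer_attach(buf, bufsize);

            MPI_Bsend(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

            /* detach waits until buffered messages have been transmitted */
            char *detached;
            int size;
            MPI_Buffer_detach(&detached, &size);
            free(detached);
        } else if (rank == 1) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("received %d via buffered send\n", msg);
        }

        MPI_Finalize();
        return 0;
    }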
  • READY MODE
    • A send may be started only if the matching receive is already posted.
    • The user must be sure of this.
    • If the receive is not already posted, the operation is erroneous and its outcome is undefined.
    • Completion of the send operation does not depend on the status of a matching receive.
    • Merely indicates that the send buffer can be reused.
    • Ready-send could be replaced by a standard-send with no effect on the behavior of the program other than performance.
  • ORDER AND FAIRNESS
    • Order:
      • MPI Messages are non-overtaking .
      • If two messages from the same sender match the same receive, the one sent first is received first.
      • If two receives posted by the same process match the same message, the receive posted first is satisfied first.
      • Message-passing code is deterministic, unless the processes are multi-threaded or the wild-card MPI_ANY_SOURCE is used in a receive statement.
    • Fairness:
      • MPI does not guarantee fairness
      • Example: task 0 sends a message to task 2, but task 1 sends a competing message that matches the same receive on task 2. Only one of the sends will complete that receive, and MPI does not specify which.
  • EXAMPLE OF NON-OVERTAKING MESSAGES
    CALL MPI_COMM_RANK(comm, rank, ierr)
    IF (rank.EQ.0) THEN
        CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag, comm, ierr)
        CALL MPI_BSEND(buf2, count, MPI_REAL, 1, tag, comm, ierr)
    ELSE    ! rank.EQ.1
        CALL MPI_RECV(buf1, count, MPI_REAL, 0, MPI_ANY_TAG, comm, status, ierr)
        CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag, comm, status, ierr)
    END IF
  • EXAMPLE OF INTERTWINED MESSAGES
    CALL MPI_COMM_RANK(comm, rank, ierr)
    IF (rank.EQ.0) THEN
        CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag1, comm, ierr)
        CALL MPI_SSEND(buf2, count, MPI_REAL, 1, tag2, comm, ierr)
    ELSE    ! rank.EQ.1
        CALL MPI_RECV(buf1, count, MPI_REAL, 0, tag2, comm, status, ierr)
        CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag1, comm, status, ierr)
    END IF
  • DEADLOCK EXAMPLE
    CALL MPI_COMM_RANK(comm, rank, ierr)
    IF (rank.EQ.0) THEN
        CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
        CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
    ELSE    ! rank.EQ.1
        CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
        CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
    END IF
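  • One common way around this deadlock is the combined send/receive mentioned earlier; a C sketch using MPI_Sendrecv, assuming exactly 2 processes exchanging one double:
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank < 2) {
            int partner = 1 - rank;
            double sendbuf = (double)rank, recvbuf = -1.0;

            /* send and receive are combined, so the library orders them safely */
            MPI_Sendrecv(&sendbuf, 1, MPI_DOUBLE, partner, 0,
                         &recvbuf, 1, MPI_DOUBLE, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            printf("rank %d received %f from rank %d\n", rank, recvbuf, partner);
        }

        MPI_Finalize();
        return 0;
    }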
  • EXAMPLE OF BUFFERING
    CALL MPI_COMM_RANK(comm, rank, ierr)
    IF (rank.EQ.0) THEN
        CALL MPI_SEND(buf1, count, MPI_REAL, 1, tag, comm, ierr)
        CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
    ELSE    ! rank.EQ.1
        CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
        CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag, comm, status, ierr)
    END IF
  • COLLECTIVE COMMUNICATIONS
  • COLLECTIVE ROUTINES
    • Collective routines provide a higher-level way to organize a parallel program.
    • Each process executes the same communication operations.
    • Communication involving a group of processes in a communicator.
    • Groups and communicators can be constructed “by hand” or using topology routines.
    • Tags are not used; different communicators deliver similar functionality.
    • No non-blocking collective operations.
    • Three classes of operations: synchronization, data movement, collective computation.
  • COLLECTIVE ROUTINES (CONTD.)
    • int MPI_Barrier(MPI_Comm comm)
    • Blocks each process until all processes in the communicator have reached the barrier
    • Occasionally useful in measuring performance
  • COLLECTIVE ROUTINES (CONTD.)
    • int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
    • Broadcast
    • One-to-all communication: same data sent from root process to all others in the communicator
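  • A minimal C sketch of MPI_Bcast, where only the root initially knows the value (the value 1024 is just an example):
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int n = 0;
        if (rank == 0)
            n = 1024;            /* only the root knows the value initially */

        MPI_Bcast(&n, 1, MPI_INT, /*root=*/0, MPI_COMM_WORLD);
        printf("rank %d now has n = %d\n", rank, n);

        MPI_Finalize();
        return 0;
    }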
  • COLLECTIVE ROUTINES (CONTD.)
    • Reduction
    • The reduction operation allows you to:
      • Collect data from each process
      • Reduce the data to a single value
      • Store the result on the root process (MPI_Reduce)
      • or store the result on all processes (MPI_Allreduce)
    • The reduction function works element-wise on arrays
    • Other operations: product, min, max, logical and, …
    • Internally it is usually implemented with a binary tree
  • COLLECTIVE ROUTINES (CONTD.)
    • int MPI_Reduce/MPI_Allreduce(void * snd_buf, void * rcv_buf, int count, MPI_Datatype type, MPI_Op op, int root, MPI_Comm comm)
    • snd_buf: input array
    • rcv_buf: output array (significant only at the root for MPI_Reduce)
    • count: number of elements in snd_buf and rcv_buf
    • type: MPI type of snd_buf and rcv_buf
    • op: parallel operation to be performed
    • root: rank of the process storing the result (MPI_Reduce only)
    • comm: communicator of processes involved in the operation
  • MPI OPERATIONS
    • MPI_MAX – maximum
    • MPI_MIN – minimum
    • MPI_SUM – sum
    • MPI_PROD – product
    • MPI_LAND – logical and
    • MPI_BAND – bitwise and
    • MPI_LOR – logical or
    • MPI_BOR – bitwise or
    • MPI_LXOR – logical xor
    • MPI_BXOR – bitwise xor
    • MPI_MAXLOC – max value and location
    • MPI_MINLOC – min value and location
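  • A minimal C sketch of MPI_Reduce with MPI_SUM: every rank contributes its own rank number and the total ends up on root 0 (MPI_Allreduce would leave the total on every rank instead):
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int local = rank;        /* per-process contribution */
        int total = 0;
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, /*root=*/0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks 0..%d = %d\n", size - 1, total);

        MPI_Finalize();
        return 0;
    }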
  • COLLECTIVE ROUTINES (CONTD.)
  • Learn by Examples
  • Parallel Trapezoidal Rule
    Output: estimate of the integral from a to b of f(x) using the trapezoidal rule and n trapezoids.
    Algorithm:
    1. Each process calculates "its" interval of integration.
    2. Each process estimates the integral of f(x) over its interval using the trapezoidal rule.
    3a. Each process != 0 sends its integral to 0.
    3b. Process 0 sums the calculations received from the individual processes and prints the result.
    Notes:
    1. f(x), a, b, and n are all hardwired.
    2. The number of processes (p) should evenly divide the number of trapezoids (n = 1024).
  • Parallelizing the Trapezoidal Rule
    #include <stdio.h>
    #include "mpi.h"

    main(int argc, char** argv) {
        int my_rank;        /* My process rank */
        int p;              /* The number of processes */
        double a = 0.0;     /* Left endpoint */
        double b = 1.0;     /* Right endpoint */
        int n = 1024;       /* Number of trapezoids */
        double h;           /* Trapezoid base length */
        double local_a;     /* Left endpoint my process */
        double local_b;     /* Right endpoint my process */
        int local_n;        /* Number of trapezoids for my calculation */
        double integral;    /* Integral over my interval */
        double total;       /* Total integral */
        int source;         /* Process sending integral */
        int dest = 0;       /* All messages go to 0 */
        int tag = 0;
        MPI_Status status;
  • Continued…
        double Trap(double local_a, double local_b, int local_n, double h);  /* Calculate local integral */

        MPI_Init(&argc, &argv);
        MPI_Barrier(MPI_COMM_WORLD);
        double elapsed_time = -MPI_Wtime();
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        h = (b-a)/n;        /* h is the same for all processes */
        local_n = n/p;      /* So is the number of trapezoids */

        /* Length of each process' interval of integration = local_n*h.
           So my interval starts at: */
        local_a = a + my_rank*local_n*h;
        local_b = local_a + local_n*h;
        integral = Trap(local_a, local_b, local_n, h);
  • Continued…
        /* Add up the integrals calculated by each process */
        if (my_rank == 0) {
            total = integral;
            for (source = 1; source < p; source++) {
                MPI_Recv(&integral, 1, MPI_DOUBLE, source, tag, MPI_COMM_WORLD, &status);
                total = total + integral;
            } /* End for */
        } else
            MPI_Send(&integral, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);

        MPI_Barrier(MPI_COMM_WORLD);
        elapsed_time += MPI_Wtime();

        /* Print the result */
        if (my_rank == 0) {
            printf("With n = %d trapezoids, our estimate\n", n);
            printf("of the integral from %lf to %lf = %lf\n", a, b, total);
            printf("time taken: %lf\n", elapsed_time);
        }
  • Continued…
        /* Shut down MPI */
        MPI_Finalize();
    } /* main */

    double Trap(double local_a, double local_b, int local_n, double h) {
        double integral;    /* Store result in integral */
        double x;
        int i;
        double f(double x); /* function we're integrating */

        integral = (f(local_a) + f(local_b))/2.0;
        x = local_a;
        for (i = 1; i <= local_n-1; i++) {
            x = x + h;
            integral = integral + f(x);
        }
        integral = integral*h;
        return integral;
    } /* Trap */
  • Continued…
    double f(double x) {
        double return_val;
        /* Calculate f(x). Store calculation in return_val. */
        return_val = 4 / (1+x*x);
        return return_val;
    } /* f */
  • Program 2: each process other than the root generates a random value less than 1 and sends it to the root. The root sums the values up and displays the sum.
  • #include <stdio.h>
    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(int argc, char **argv) {
        int myrank, p;
        int tag = 0, dest = 0;
        int i;
        double randIn, randOut;
        int source;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        if (myrank == 0) {  /* I am the root */
            double total = 0, average = 0;
            for (source = 1; source < p; source++) {
                MPI_Recv(&randIn, 1, MPI_DOUBLE, source, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
                printf("Message from root: From %d received number %f\n", source, randIn);
                total += randIn;
            } /* End for */
            average = total/(p-1);
        } /* End if */
        else {              /* I am other than root */
            srand48((long int) myrank);
            randOut = drand48();
            printf("randout=%f, myrank=%d\n", randOut, myrank);
            MPI_Send(&randOut, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
        } /* End if-else */

        MPI_Finalize();
        return 0;
    }
  • MPI References
    • The Standard itself:
      • at http://www.mpi-forum.org
      • All MPI official releases, in both postscript and HTML
    • Books:
      • Using MPI: Portable Parallel Programming with the Message-Passing Interface, 2nd edition, by Gropp, Lusk, and Skjellum, MIT Press, 1999. Also Using MPI-2, with R. Thakur
      • MPI: The Complete Reference, 2 vols., MIT Press, 1999
      • Designing and Building Parallel Programs, by Ian Foster, Addison-Wesley, 1995
      • Parallel Programming with MPI, by Peter Pacheco, Morgan Kaufmann, 1997
    • Other information on Web:
      • at http://www.mcs.anl.gov/mpi
      • For man pages of Open MPI on the web: http://www.open-mpi.org/doc/v1.4/
      • apropos mpi
  • THANK YOU