Parallel Computing (2)

  1. 1. Introduction to Parallel Computing Part IIb
  2. 2. What is MPI? The Message Passing Interface (MPI) is a standardised interface, of which several implementations have been made. The MPI standard specifies three forms of subroutine interfaces: (1) language-independent notation; (2) Fortran notation; (3) C notation.
  3. 3. MPI Features
     MPI implementations provide:
     • Abstraction of the hardware implementation
     • Synchronous communication
     • Asynchronous communication
     • File operations
     • Time measurement operations
  4. 4. Implementations
     MPICH       Unix / Windows NT
     MPICH-T3E   Cray T3E
     LAM         Unix / SGI Irix / IBM AIX
     Chimp       SunOS / AIX / Irix / HP-UX
     WinMPI      Windows 3.1 (no network required)
  5. 5. Programming with MPI
     Programming with MPI differs from the traditional approach in three respects:
     1. Use of the MPI library
     2. Compiling
     3. Running
  6. 6. Compiling (1) Once a program has been written, compiling it is done a little differently from the normal situation. Although the details differ between MPI implementations, there are two frequently used approaches.
  7. 7. Compiling (2)
     First approach:
       $ gcc myprogram.c -o myexecutable -lmpi
     Second approach:
       $ mpicc myprogram.c -o myexecutable
  8. 8. Running (1) To run an MPI-enabled application we generally use the command 'mpirun':
       $ mpirun -np x myexecutable <parameters>
     where x is the number of processes to use and <parameters> are the arguments to the executable, if any.
  9. 9. Running (2) The 'mpirun' program takes care of creating the processes on the selected processors. By default, 'mpirun' decides which processors to use; this is usually determined by a global configuration file. It is possible to specify processors explicitly, but the specification may be treated only as a hint (see the example below).
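     How hosts are specified is implementation specific; the '-machinefile' flag and the 'hosts' file below are assumptions in the style of MPICH-like launchers, not something prescribed by the slides:
       $ cat hosts                          # one hostname per line (hypothetical file)
       node01
       node02
       $ mpirun -np 4 -machinefile hosts myexecutable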
  10. 10. MPI Programming (1) Implementations of MPI support Fortran, C, or both. Here we only consider programming using the C libraries. The first step in writing a program using MPI is to include the correct header:
       #include "mpi.h"
  11. 11. MPI Programming (2)
     #include "mpi.h"

     int main (int argc, char *argv[])
     {
         ...
         MPI_Init (&argc, &argv);
         ...
         MPI_Finalize ();
         return ...;
     }
  12. 12. MPI_Init
     int MPI_Init (int *argc, char ***argv)
     The MPI_Init procedure should be called before any other MPI procedure (except MPI_Initialized). It must be called exactly once, at program initialisation. It removes the arguments that are used by MPI from the argument array.
  13. 13. MPI_Finalize
     int MPI_Finalize (void)
     This routine cleans up all MPI state. It should be the last MPI routine called in a program; no other MPI routine may be called after MPI_Finalize. Pending communication should be finished before finalisation.
  14. 14. Using multiple processes When an MPI-enabled program is run with multiple processes, each process runs an identical copy of the program, so there must be a way to find out which process we are. This situation is comparable to programming with the 'fork' statement. MPI defines two subroutines that can be used for this.
  15. 15. MPI_Comm_size
     int MPI_Comm_size (MPI_Comm comm, int *size)
     This call returns the number of processes involved in a communicator. To find out how many processes are used in total, call this function with the predefined global communicator MPI_COMM_WORLD.
  16. 16. MPI_Comm_rank
     int MPI_Comm_rank (MPI_Comm comm, int *rank)
     This procedure determines the rank (index) of the calling process in the communicator. Each process is assigned a unique number within a communicator.
  17. 17. MPI_COMM_WORLD MPI communicators are used to specify which processes a communication applies to. A communicator is shared by a group of processes. The predefined MPI_COMM_WORLD applies to all processes. Communicators can be duplicated, created and deleted. For most applications, use of MPI_COMM_WORLD suffices.
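     As a minimal sketch (not from the slides) of duplicating and releasing a communicator with the standard MPI_Comm_dup and MPI_Comm_free calls; the variable name is an assumption:
       MPI_Comm mycomm;                          /* handle for the duplicate          */
       MPI_Comm_dup (MPI_COMM_WORLD, &mycomm);   /* duplicate the global communicator */
       /* ... use mycomm for a separate subsystem or library ... */
       MPI_Comm_free (&mycomm);                  /* release the duplicate             */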
  18. 18. Example 'Hello World!'
     #include <stdio.h>
     #include "mpi.h"

     int main (int argc, char *argv[])
     {
         int size, rank;

         MPI_Init (&argc, &argv);
         MPI_Comm_size (MPI_COMM_WORLD, &size);
         MPI_Comm_rank (MPI_COMM_WORLD, &rank);
         printf ("Hello world! from processor (%d/%d)\n", rank + 1, size);
         MPI_Finalize ();
         return 0;
     }
  19. 19. Running 'Hello World!'
     $ mpicc -o hello hello.c
     $ mpirun -np 3 hello
     Hello world! from processor (1/3)
     Hello world! from processor (2/3)
     Hello world! from processor (3/3)
     $ _
  20. 20. MPI_Send
     int MPI_Send (void *buf, int count, MPI_Datatype datatype, int dest,
                   int tag, MPI_Comm comm)
     Performs a blocking send of a message to dest. The data is found in buf, which contains count elements of datatype. A tag has to be specified to identify the send. The destination dest is the process rank in communicator comm.
  21. 21. MPI_Recv
     int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source,
                   int tag, MPI_Comm comm, MPI_Status *status)
     Performs a blocking receive of a message from source. The buffer must be able to hold count elements of datatype. The status field is filled with status information. MPI_Recv and MPI_Send calls should match: same tag, compatible count and same datatype.
  22. 22. Datatypes
     MPI_CHAR             signed char
     MPI_SHORT            signed short int
     MPI_INT              signed int
     MPI_LONG             signed long int
     MPI_UNSIGNED_CHAR    unsigned char
     MPI_UNSIGNED_SHORT   unsigned short int
     MPI_UNSIGNED         unsigned int
     MPI_UNSIGNED_LONG    unsigned long int
     MPI_FLOAT            float
     MPI_DOUBLE           double
     MPI_LONG_DOUBLE      long double
  23. 23. Example send / receive
     #include <stdio.h>
     #include "mpi.h"

     int main (int argc, char *argv[])
     {
         MPI_Status s;
         int size, rank, i, j;

         MPI_Init (&argc, &argv);
         MPI_Comm_size (MPI_COMM_WORLD, &size);
         MPI_Comm_rank (MPI_COMM_WORLD, &rank);
         if (rank == 0)   // Master process
         {
             printf ("Receiving data . . .\n");
             for (i = 1; i < size; i++)
             {
                 MPI_Recv ((void *)&j, 1, MPI_INT, i, 0xACE5, MPI_COMM_WORLD, &s);
                 printf ("[%d] sent %d\n", i, j);
             }
         }
         else
         {
             j = rank * rank;
             MPI_Send ((void *)&j, 1, MPI_INT, 0, 0xACE5, MPI_COMM_WORLD);
         }
         MPI_Finalize ();
         return 0;
     }
  24. 24. Running send / receive
     $ mpicc -o sendrecv sendrecv.c
     $ mpirun -np 4 sendrecv
     Receiving data . . .
     [1] sent 1
     [2] sent 4
     [3] sent 9
     $ _
  25. 25. MPI_Bcast
     int MPI_Bcast (void *buffer, int count, MPI_Datatype datatype, int root,
                    MPI_Comm comm)
     Broadcasts a message from root to all processes in communicator comm (including root itself). The buffer is used as the source on the root process and as the destination on the others.
  26. 26. MPI_Barrier
     int MPI_Barrier (MPI_Comm comm)
     Blocks until all processes in comm have reached this routine. Use this routine to synchronise processes.
  27. 27. Example broadcast / barrier
     #include <stdio.h>
     #include "mpi.h"

     int main (int argc, char *argv[])
     {
         int rank, i;

         MPI_Init (&argc, &argv);
         MPI_Comm_rank (MPI_COMM_WORLD, &rank);
         if (rank == 0)
             i = 27;
         MPI_Bcast ((void *)&i, 1, MPI_INT, 0, MPI_COMM_WORLD);
         printf ("[%d] i = %d\n", rank, i);
         // Wait for every process to reach this code
         MPI_Barrier (MPI_COMM_WORLD);
         MPI_Finalize ();
         return 0;
     }
  28. 28. Running broadcast / barrier
     $ mpicc -o broadcast broadcast.c
     $ mpirun -np 3 broadcast
     [0] i = 27
     [1] i = 27
     [2] i = 27
     $ _
  29. 29. MPI_Sendrecv
     int MPI_Sendrecv (void *sendbuf, int sendcount, MPI_Datatype sendtype,
                       int dest, int sendtag,
                       void *recvbuf, int recvcount, MPI_Datatype recvtype,
                       int source, int recvtag,
                       MPI_Comm comm, MPI_Status *status)

     int MPI_Sendrecv_replace (void *buf, int count, MPI_Datatype datatype,
                               int dest, int sendtag, int source, int recvtag,
                               MPI_Comm comm, MPI_Status *status)
     Combined send and receive (the second variant uses a single buffer for both).
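     Not on the original slides: a minimal sketch of a ring shift, in which every process passes its rank to its right-hand neighbour while receiving from its left-hand neighbour using a single buffer; the variable names are assumptions and rank/size are obtained as before:
       int value = rank;
       MPI_Status status;

       /* Send to the right neighbour and receive from the left one in one call. */
       MPI_Sendrecv_replace (&value, 1, MPI_INT,
                             (rank + 1) % size, 0,           /* dest, sendtag   */
                             (rank + size - 1) % size, 0,    /* source, recvtag */
                             MPI_COMM_WORLD, &status);
       /* value now holds the rank of the left-hand neighbour. */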
  30. 30. Other useful routines
     • MPI_Scatter
     • MPI_Gather
     • MPI_Type_vector
     • MPI_Type_commit
     • MPI_Reduce / MPI_Allreduce
     • MPI_Op_create
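     The example on the next sheet shows MPI_Scatter and MPI_Reduce; as a hedged sketch (not from the slides), MPI_Gather is the mirror operation of MPI_Scatter, collecting one element from every process into an array on the root. The variable names are assumptions and rank/size are obtained as before:
       int mine, all[64];                    /* assumes at most 64 processes */

       mine = rank * rank;                   /* value contributed by this process */
       MPI_Gather (&mine, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);
       /* On rank 0, all[i] now holds i * i for every process i. */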
  31. 31. Example scatter / reduce
     #include <stdio.h>
     #include "mpi.h"

     int main (int argc, char *argv[])
     {
         int data[] = {1, 2, 3, 4, 5, 6, 7};   // Size must be >= #processors
         int rank, i = -1, j = -1;

         MPI_Init (&argc, &argv);
         MPI_Comm_rank (MPI_COMM_WORLD, &rank);
         MPI_Scatter ((void *)data, 1, MPI_INT,
                      (void *)&i,   1, MPI_INT, 0, MPI_COMM_WORLD);
         printf ("[%d] Received i = %d\n", rank, i);
         MPI_Reduce ((void *)&i, (void *)&j, 1, MPI_INT, MPI_PROD, 0, MPI_COMM_WORLD);
         printf ("[%d] j = %d\n", rank, j);
         MPI_Finalize ();
         return 0;
     }
  32. 32. Running scatter / reduce
     $ mpicc -o scatterreduce scatterreduce.c
     $ mpirun -np 4 scatterreduce
     [0] Received i = 1
     [0] j = 24
     [1] Received i = 2
     [1] j = -1
     [2] Received i = 3
     [2] j = -1
     [3] Received i = 4
     [3] j = -1
     $ _
  33. 33. Some reduce operations
     MPI_MAX    Maximum value
     MPI_MIN    Minimum value
     MPI_SUM    Sum of values
     MPI_PROD   Product of values
     MPI_LAND   Logical AND
     MPI_BAND   Bit-wise AND
     MPI_LOR    Logical OR
     MPI_BOR    Bit-wise OR
     MPI_LXOR   Logical exclusive OR
     MPI_BXOR   Bit-wise exclusive OR
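     Unlike MPI_Reduce in the previous example, where only the root receives the result, MPI_Allreduce delivers the reduced value to every process. A minimal sketch, not from the slides; the variable names are assumptions and rank/size are obtained as before:
       int mine, total;

       mine = rank + 1;                     /* each process contributes rank + 1 */
       MPI_Allreduce (&mine, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
       /* total == 1 + 2 + ... + size on every process */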
  34. 34. Measuring running time
     double MPI_Wtime (void);

     double timeStart, timeEnd;
     ...
     timeStart = MPI_Wtime ();
     // Code to measure time for goes here.
     timeEnd = MPI_Wtime ();
     ...
     printf ("Running time = %f seconds\n", timeEnd - timeStart);
  35. 35. Parallel sorting (1) Sorting a sequence of numbers using the binary-sort method. This method divides a given sequence into two halves (until only one element remains) and sorts both halves recursively. The two halves are then merged together to form a sorted sequence.
  36. 36. Binary sort pseudo-code
     sorted-sequence BinarySort (sequence)
     {
         if (# elements in sequence > 1)
         {
             seqA = first half of sequence
             seqB = second half of sequence
             BinarySort (seqA)
             BinarySort (seqB)
             sorted-sequence = merge (seqA, seqB)
         }
         else
             sorted-sequence = sequence
     }
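     Not part of the original slides: a compact C sketch of the same recursive scheme, written here as an ordinary merge sort on an int array with a scratch buffer; the function names are assumptions:
       #include <string.h>   /* memcpy */

       /* Merge the sorted halves a[lo..mid) and a[mid..hi) using scratch space tmp. */
       static void merge (int *a, int *tmp, int lo, int mid, int hi)
       {
           int i = lo, j = mid, k = lo;

           while (i < mid && j < hi)
               tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
           while (i < mid) tmp[k++] = a[i++];
           while (j < hi)  tmp[k++] = a[j++];
           memcpy (a + lo, tmp + lo, (hi - lo) * sizeof (int));
       }

       /* Sort a[lo..hi) by splitting it, sorting both halves recursively and merging. */
       static void binary_sort (int *a, int *tmp, int lo, int hi)
       {
           if (hi - lo > 1)
           {
               int mid = lo + (hi - lo) / 2;
               binary_sort (a, tmp, lo, mid);
               binary_sort (a, tmp, mid, hi);
               merge (a, tmp, lo, mid, hi);
           }
       }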
  37. 37. Merge two sorted sequences [diagram: the sorted halves 1 2 5 7 and 3 4 6 8 are merged into 1 2 3 4 5 6 7 8]
  38. 38. Example binary sort [diagram: the sequence 1 7 5 2 8 4 6 3 is repeatedly split into halves, which are sorted and merged back into the sorted sequence 1 2 3 4 5 6 7 8]
  39. 39. Parallel sorting (2) This way of dividing the work and gathering the results is quite natural for a parallel implementation. Divide the work in two and give each half to a processor. Have each of these processors divide its work again, until either the data cannot be split any further or no more processors are available.
  40. 40. Implementation problems
     • The number of processors may not be a power of two
     • The number of elements may not be a power of two
     • How to achieve an even workload?
     • The data size may be less than the number of processors
     One way to handle uneven sizes is sketched below.
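     A hedged sketch, not from the slides: compute a per-process count and displacement and distribute with MPI_Scatterv instead of MPI_Scatter, so the remainder is spread over the first few processes. The names n, data, mydata, counts and displs are assumptions, rank/size are obtained as before, and <stdlib.h> is assumed to be included for malloc:
       int *counts = malloc (size * sizeof (int));
       int *displs = malloc (size * sizeof (int));
       int p, offset = 0;

       for (p = 0; p < size; p++)
       {
           counts[p] = n / size + (p < n % size ? 1 : 0);  /* spread the remainder */
           displs[p] = offset;
           offset += counts[p];
       }
       MPI_Scatterv (data, counts, displs, MPI_INT,
                     mydata, counts[rank], MPI_INT, 0, MPI_COMM_WORLD);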
  41. 41. Parallel matrix multiplication We use the following partitioning of the data (p = 4) [diagram: the matrix is divided into four contiguous blocks of rows, owned by P1, P2, P3 and P4 respectively]
  42. 42. Implementation
     1. Master (process 0) reads the data
     2. Master sends the size of the data to the slaves
     3. Slaves allocate memory
     4. Master broadcasts the second matrix to all other processes
     5. Master sends the respective parts of the first matrix to all other processes
     6. Every process performs its local multiplication
     7. All slave processes send back their result
     A sketch of steps 4-7 follows below.
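     Not from the slides: a minimal sketch of steps 4-7, written with collectives rather than the explicit sends of the slides, assuming square n x n matrices of doubles stored row-major, that n is divisible by the number of processes, and that A, B, C, Alocal and Clocal have already been allocated (A and C hold the full matrices on the root only):
       int rows = n / size;                  /* rows of A handled by each process */
       int i, j, k;

       /* Step 4: every process needs the full second matrix. */
       MPI_Bcast (B, n * n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

       /* Step 5: distribute consecutive blocks of rows of the first matrix. */
       MPI_Scatter (A, rows * n, MPI_DOUBLE,
                    Alocal, rows * n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

       /* Step 6: local multiplication of the row block with B. */
       for (i = 0; i < rows; i++)
           for (j = 0; j < n; j++)
           {
               Clocal[i * n + j] = 0.0;
               for (k = 0; k < n; k++)
                   Clocal[i * n + j] += Alocal[i * n + k] * B[k * n + j];
           }

       /* Step 7: collect the row blocks of the result on the master. */
       MPI_Gather (Clocal, rows * n, MPI_DOUBLE,
                   C, rows * n, MPI_DOUBLE, 0, MPI_COMM_WORLD);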
  43. 43. Multiplication 1000 x 1000 [plot: running time in seconds (0-140 s) versus number of processors (up to about 60) for a 1000 x 1000 matrix multiplication, comparing the measured time Tp with the ideal T1 / p]
  44. 44. Multiplication 5000 x 5000 [plot: running time in seconds (0-90000 s) versus number of processors (up to about 35) for a 5000 x 5000 matrix multiplication, comparing the measured time Tp with the ideal T1 / p]
  45. 45. Gaussian elimination We use the following partitioning of the data (p = 4) [diagram: the rows of the matrix are divided among the four processes P1, P2, P3 and P4]
  46. 46. Implementation (1)
     1. Master reads both matrices
     2. Master sends the size of the matrices to the slaves
     3. Slaves calculate their part and allocate memory
     4. Master sends each slave its respective part
     5. Set the sweeping row to 0 in all processes
     6. Sweep the matrix (see the next sheet)
     7. Slaves send back their result
  47. 47. Implementation (2)
     While the sweeping row is not past the final row, do:
     A. Every process decides whether it owns the current sweeping row
     B. The owner sends a copy of the row to every other process
     C. All processes sweep their part of the matrix using the current row
     D. The sweeping row is incremented
     A sketch of this loop follows below.
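     Not from the slides: a rough sketch of steps A-D, assuming each process stores its own rows of the N x (N+1) augmented matrix in the array local, and assuming hypothetical helpers owner(row), local_index(row) and global_index(r) that map between global row numbers and local storage; localRows is the number of locally stored rows and <string.h> is assumed for memcpy:
       double pivotRow[N + 1];              /* current sweeping row (N+1 for the RHS) */
       int row, r, c;

       for (row = 0; row < N; row++)
       {
           /* A + B: the owner copies its row, which is then broadcast to everyone. */
           if (owner (row) == rank)
               memcpy (pivotRow, &local[local_index (row) * (N + 1)],
                       (N + 1) * sizeof (double));
           MPI_Bcast (pivotRow, N + 1, MPI_DOUBLE, owner (row), MPI_COMM_WORLD);

           /* C: eliminate the current column from the locally stored rows below it. */
           for (r = 0; r < localRows; r++)
               if (global_index (r) > row)
               {
                   double factor = local[r * (N + 1) + row] / pivotRow[row];
                   for (c = row; c <= N; c++)
                       local[r * (N + 1) + c] -= factor * pivotRow[c];
               }
           /* D: the loop counter advances the sweeping row. */
       }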
  48. 48. Programming hints
     • Keep it simple!
     • Avoid deadlocks
     • Write robust code, even at the cost of speed
     • Design in advance; debugging is more difficult (printing output works differently)
     • Error handling requires synchronisation: you can't just exit the program
  49. 49. References (1)
     • MPI Forum Home Page
     • Guide to MPI (see also /MPI/)
  50. 50. References (2)
     • Miscellaneous