Operating Systems - Distributed Parallel Computing

  1. Operating Systems, CMPSCI 377: Distributed Parallel Programming
     Emery Berger, University of Massachusetts Amherst
  2. Outline
     Previously:
     - Programming with threads
     - Shared memory, single machine
     Today:
     - Distributed parallel programming
     - Message passing
     Some material adapted from slides by Kathy Yelick, UC Berkeley.
  3. Why Distribute?
     SMP (symmetric multiprocessor): easy to program, but limited
     - The shared bus becomes a bottleneck when processors are not operating on local data
     - Typically fewer than 32 processors
     - Expensive ($$$)
     [Diagram: processors P1..Pn, each with a cache, sharing memory over a network/bus]
  4. Distributed Memory
     Vastly different platforms:
     - Networks of workstations
     - Supercomputers
     - Clusters
  5. Distributed Architectures
     Distributed memory machines: local memory, but no global memory
     - Individual nodes are often SMPs
     - A network interface handles all interprocessor communication (message passing)
     [Diagram: nodes P0..Pn, each with its own memory and network interface (NI), connected by an interconnect]
  6. Message Passing
     Program = a number of independent, communicating processes
     - Each process: a thread plus a local address space only
     - Shared data is partitioned across the processes
     - Processes communicate by send & receive events
     - On a cluster, messages are sent over sockets
     [Diagram: processes P0, P1, ..., Pn on a network, each with local copies of variables s and i, exchanging s via send/receive]
  7. Message Passing
     Pros: efficient
     - Makes data sharing explicit
     - Can communicate only what is strictly necessary for the computation
     - No coherence protocols, etc.
     Cons: difficult
     - Requires manual partitioning: divide up the problem across processors
     - Unnatural model (for some)
     - Deadlock-prone (hurray)
  8. Message Passing Interface
     Library approach to message passing
     - Supports the most common architectural abstractions
     - Vendors supply optimized versions ⇒ the same program runs on different machines, but with (somewhat) different performance
     - Bindings for popular languages: especially Fortran and C; also C++ and Java
  9. MPI execution model
     Spawns multiple copies of the same program (SPMD = single program, multiple data)
     - Each copy is a different "process" (with its own local memory)
     - Copies can act differently by determining which processor "self" corresponds to
  10. An Example

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char * argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("Hello world from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
      }

      % mpirun -np 10 exampleProgram
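      To try this yourself: with a typical MPI installation (e.g., MPICH or Open MPI),
      the program can be compiled with the mpicc wrapper and launched with mpirun;
      exact commands vary by installation, so treat this as a sketch:

      % mpicc example.c -o exampleProgram    # compile with the MPI compiler wrapper
      % mpirun -np 10 ./exampleProgram       # launch 10 copies of the same program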
  11. An Example (same code, annotated): MPI_Init(&argc, &argv) initializes MPI (passes the command-line arguments in)
  12. An Example (same code, annotated): MPI_Comm_size(MPI_COMM_WORLD, &size) returns the # of processors in the "world"
  13. An Example (same code, annotated): MPI_Comm_rank(MPI_COMM_WORLD, &rank) answers "which processor am I?"
  14. An Example (same code, annotated): MPI_Finalize() says we're done sending messages
  15. An Example

      % mpirun -np 10 exampleProgram
      Hello world from process 5 of 10
      Hello world from process 3 of 10
      Hello world from process 9 of 10
      Hello world from process 0 of 10
      Hello world from process 2 of 10
      Hello world from process 4 of 10
      Hello world from process 1 of 10
      Hello world from process 6 of 10
      Hello world from process 8 of 10
      Hello world from process 7 of 10
      %

      // what happened? (the ten processes run concurrently, so their output arrives in an arbitrary order)
  16. Message Passing
      Messages can be sent directly to another processor:
      - MPI_Send, MPI_Recv
      Or to all processors:
      - MPI_Bcast (does the send or the receive, depending on which process calls it)
  17. Send/Recv Example
      Send data from process 0 to all: "pass it along" communication
      Operations:
      - MPI_Send(data, count, MPI_INT, dest, 0, MPI_COMM_WORLD);
      - MPI_Recv(data, count, MPI_INT, source, 0, MPI_COMM_WORLD, &status);
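      For reference, the C prototypes of these two calls are roughly the following
      (the const on the send buffer appeared in later versions of the standard;
      check your MPI implementation's mpi.h):

      int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                   int dest, int tag, MPI_Comm comm);
      int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
                   int source, int tag, MPI_Comm comm, MPI_Status *status);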
  18. Send & Receive

      int main(int argc, char * argv[]) {
        int rank, value, size;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        do {
          if (rank == 0) {
            scanf("%d", &value);
            MPI_Send(&value, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
          } else {
            MPI_Recv(&value, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
            if (rank < size - 1)
              MPI_Send(&value, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
          }
          printf("Process %d got %d\n", rank, value);
        } while (value >= 0);
        MPI_Finalize();
        return 0;
      }

      Send integer input in a ring
  19. Send & Receive (same code, annotated): the fourth argument to MPI_Send, rank + 1, is the send destination
  20. Send & Receive (same code, annotated): the fourth argument to MPI_Recv, rank - 1, is the rank to receive from
  21. Send & Receive (same code, annotated): the 0 passed to both MPI_Send and MPI_Recv is the message tag
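      A caution related to "deadlock-prone" above (not from the original slides):
      MPI_Send may block until a matching receive is posted, so two processes that
      both send before they receive can deadlock. A minimal sketch of the pattern,
      assuming exactly two processes and variables mine/theirs/partner introduced
      here purely for illustration:

      // Risky: if both ranks block in MPI_Send, neither ever reaches MPI_Recv.
      int partner = 1 - rank;               // assumes exactly 2 processes
      MPI_Send(&mine,   1, MPI_INT, partner, 0, MPI_COMM_WORLD);
      MPI_Recv(&theirs, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &status);

      // Safer: let MPI pair up the send and the receive.
      MPI_Sendrecv(&mine,   1, MPI_INT, partner, 0,
                   &theirs, 1, MPI_INT, partner, 0,
                   MPI_COMM_WORLD, &status);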
  22. Exercise
      Compute expensiveComputation(i) on n processors; process 0 computes & prints the sum.
      // MPI_Send(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);

      int main(int argc, char * argv[]) {
        int rank, size;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {
          int sum = 0;
          // fill in: collect a value from each of the other processes
          printf("sum = %d\n", sum);
        } else {
          // fill in: compute and send the result to process 0
        }
        MPI_Finalize();
        return 0;
      }
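      One possible solution, as a sketch (not from the original slides; it assumes
      expensiveComputation is defined elsewhere): each worker sends its result to
      process 0, which receives one value per worker and accumulates the sum.

      #include <stdio.h>
      #include <mpi.h>

      int expensiveComputation(int i);     // assumed to be provided elsewhere

      int main(int argc, char * argv[]) {
        int rank, size;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {
          int sum = expensiveComputation(0);   // process 0 does its own share
          for (int i = 1; i < size; i++) {
            int value;
            // accept the workers' results in whatever order they arrive
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
            sum += value;
          }
          printf("sum = %d\n", sum);
        } else {
          int value = expensiveComputation(rank);
          MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
      }

      A collective operation such as MPI_Reduce can express the same sum in a single
      call; collectives are previewed on slide 29.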
  23. Broadcast
      Send and receive are point-to-point
      Can also broadcast data:
      - The source sends to everyone else
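      For reference, MPI_Bcast's C prototype is roughly as follows; every process in
      the communicator makes the same call, and the root argument determines who sends:

      int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
                    int root, MPI_Comm comm);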
  24. Broadcast

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char * argv[]) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        do {
          if (rank == 0)
            scanf("%d", &value);
          MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
          printf("Process %d got %d\n", rank, value);
        } while (value >= 0);
        MPI_Finalize();
        return 0;
      }

      Repeatedly broadcast input (one integer) to all
  25. Broadcast (same code, annotated): the first argument, &value, is the value to send or receive
  26. Broadcast (same code, annotated): the second argument, 1, is how many to send/receive
  27. Broadcast (same code, annotated): the third argument, MPI_INT, is the datatype
  28. Broadcast (same code, annotated): the fourth argument, 0, is who's "root" for the broadcast
  29. Communication Flavors
      Basic communication:
      - blocking = wait until done
      - point-to-point = from me to you
      - broadcast = from me to everyone
      Non-blocking:
      - Think create & join, fork & wait...
      - MPI_Isend, MPI_Irecv
      - MPI_Wait, MPI_Waitall, MPI_Test
      Collective:
      - e.g., MPI_Bcast
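      A minimal sketch of the non-blocking pattern (not from the original slides),
      assuming exactly two processes that exchange one integer: the transfers are
      started, other work can proceed, and MPI_Waitall blocks until both complete.

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char * argv[]) {
        int rank, mine, theirs;
        MPI_Request requests[2];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int partner = 1 - rank;              // assumes exactly 2 processes
        mine = rank * 100;                   // arbitrary value to exchange
        // start the transfers; neither call waits for completion
        MPI_Isend(&mine,   1, MPI_INT, partner, 0, MPI_COMM_WORLD, &requests[0]);
        MPI_Irecv(&theirs, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &requests[1]);
        // ... do other useful work here ...
        MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);   // now wait for both
        printf("Process %d got %d\n", rank, theirs);
        MPI_Finalize();
        return 0;
      }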
  30. The End
  31. Scaling Limits
      Kernel used in atmospheric models:
      - 99% floating point ops; multiplies/adds
      - Sweeps through memory with little reuse
      - One "copy" of the code running independently on varying numbers of processors
      [Performance chart from Pat Worley, ORNL]