Operating Systems - Distributed Parallel Computing

  1. Operating Systems, CMPSCI 377: Distributed Parallel Programming
     Emery Berger, University of Massachusetts Amherst
  2. Outline
     Previously:
     - Programming with threads
     - Shared memory, single machine
     Today:
     - Distributed parallel programming
     - Message passing
     Some material adapted from slides by Kathy Yelick, UC Berkeley.
  3. Why Distribute?
     SMP (symmetric multiprocessor): easy to program, but limited
     - The shared bus becomes a bottleneck when processors are not operating on local data
     - Typically fewer than 32 processors
     - Expensive ($$$)
     [Diagram: processors P1..Pn, each with a cache, sharing memory over a network/bus]
  4. Distributed Memory
     Vastly different platforms:
     - Networks of workstations
     - Supercomputers
     - Clusters
  5. Distributed Architectures
     Distributed memory machines: local memory, but no global memory
     - Individual nodes are often SMPs
     - A network interface handles all interprocessor communication (message passing)
     [Diagram: nodes P0..Pn, each with its own memory and network interface (NI), connected by an interconnect]
  6. Message Passing
     Program = a number of independent, communicating processes
     - Each process: a thread plus a local address space only
     - Shared data is partitioned across the processes
     - Processes communicate by send & receive events
     - On a cluster, messages are sent over sockets
     [Diagram: processes P0, P1, ..., Pn on a network, each with local copies of variables s and i, exchanging s via send/receive]
  7. Message Passing
     Pros: efficient
     - Makes data sharing explicit
     - Can communicate only what is strictly necessary for the computation
     - No coherence protocols, etc.
     Cons: difficult
     - Requires manual partitioning: divide up the problem across processors
     - Unnatural model (for some)
     - Deadlock-prone (hurray)
  8. Message Passing Interface
     Library approach to message passing
     - Supports the most common architectural abstractions
     - Vendors supply optimized versions ⇒ the same program runs on different machines, but with (somewhat) different performance
     - Bindings for popular languages: especially Fortran and C; also C++ and Java
  9. MPI execution model
     Spawns multiple copies of the same program (SPMD = single program, multiple data)
     - Each copy is a different "process" (with its own local memory)
     - Copies can act differently by determining which processor "self" corresponds to
  10. An Example

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char * argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("Hello world from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
      }

      % mpirun -np 10 exampleProgram
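      To try this yourself: with a typical MPI installation (e.g., MPICH or Open MPI),
      the program can be compiled with the mpicc wrapper and launched with mpirun;
      exact commands vary by installation, so treat this as a sketch:

      % mpicc example.c -o exampleProgram    # compile with the MPI compiler wrapper
      % mpirun -np 10 ./exampleProgram       # launch 10 copies of the same program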
  11. An Example (same code, annotated): MPI_Init(&argc, &argv) initializes MPI (passes the command-line arguments in)
  12. An Example (same code, annotated): MPI_Comm_size(MPI_COMM_WORLD, &size) returns the # of processors in the "world"
  13. An Example (same code, annotated): MPI_Comm_rank(MPI_COMM_WORLD, &rank) answers "which processor am I?"
  14. An Example (same code, annotated): MPI_Finalize() says we're done sending messages
  15. An Example

      % mpirun -np 10 exampleProgram
      Hello world from process 5 of 10
      Hello world from process 3 of 10
      Hello world from process 9 of 10
      Hello world from process 0 of 10
      Hello world from process 2 of 10
      Hello world from process 4 of 10
      Hello world from process 1 of 10
      Hello world from process 6 of 10
      Hello world from process 8 of 10
      Hello world from process 7 of 10
      %

      // what happened? (the ten processes run concurrently, so their output arrives in an arbitrary order)
  16. Message Passing
      Messages can be sent directly to another processor:
      - MPI_Send, MPI_Recv
      Or to all processors:
      - MPI_Bcast (does the send or the receive, depending on which process calls it)
  17. Send/Recv Example
      Send data from process 0 to all: "pass it along" communication
      Operations:
      - MPI_Send(data, count, MPI_INT, dest, 0, MPI_COMM_WORLD);
      - MPI_Recv(data, count, MPI_INT, source, 0, MPI_COMM_WORLD, &status);
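      For reference, the C prototypes of these two calls are roughly the following
      (the const on the send buffer appeared in later versions of the standard;
      check your MPI implementation's mpi.h):

      int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                   int dest, int tag, MPI_Comm comm);
      int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
                   int source, int tag, MPI_Comm comm, MPI_Status *status);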
  18. Send & Receive

      int main(int argc, char * argv[]) {
        int rank, value, size;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        do {
          if (rank == 0) {
            scanf("%d", &value);
            MPI_Send(&value, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
          } else {
            MPI_Recv(&value, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
            if (rank < size - 1)
              MPI_Send(&value, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
          }
          printf("Process %d got %d\n", rank, value);
        } while (value >= 0);
        MPI_Finalize();
        return 0;
      }

      Send integer input in a ring
  19. Send & Receive (same code, annotated): the fourth argument to MPI_Send, rank + 1, is the send destination
  20. Send & Receive (same code, annotated): the fourth argument to MPI_Recv, rank - 1, is the rank to receive from
  21. Send & Receive (same code, annotated): the 0 passed to both MPI_Send and MPI_Recv is the message tag
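      A caution related to "deadlock-prone" above (not from the original slides):
      MPI_Send may block until a matching receive is posted, so two processes that
      both send before they receive can deadlock. A minimal sketch of the pattern,
      assuming exactly two processes and variables mine/theirs/partner introduced
      here purely for illustration:

      // Risky: if both ranks block in MPI_Send, neither ever reaches MPI_Recv.
      int partner = 1 - rank;               // assumes exactly 2 processes
      MPI_Send(&mine,   1, MPI_INT, partner, 0, MPI_COMM_WORLD);
      MPI_Recv(&theirs, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &status);

      // Safer: let MPI pair up the send and the receive.
      MPI_Sendrecv(&mine,   1, MPI_INT, partner, 0,
                   &theirs, 1, MPI_INT, partner, 0,
                   MPI_COMM_WORLD, &status);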
  22. Exercise
      Compute expensiveComputation(i) on n processors; process 0 computes & prints the sum.
      // MPI_Send(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);

      int main(int argc, char * argv[]) {
        int rank, size;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {
          int sum = 0;
          // fill in: collect a value from each of the other processes
          printf("sum = %d\n", sum);
        } else {
          // fill in: compute and send the result to process 0
        }
        MPI_Finalize();
        return 0;
      }
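      One possible solution, as a sketch (not from the original slides; it assumes
      expensiveComputation is defined elsewhere): each worker sends its result to
      process 0, which receives one value per worker and accumulates the sum.

      #include <stdio.h>
      #include <mpi.h>

      int expensiveComputation(int i);     // assumed to be provided elsewhere

      int main(int argc, char * argv[]) {
        int rank, size;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {
          int sum = expensiveComputation(0);   // process 0 does its own share
          for (int i = 1; i < size; i++) {
            int value;
            // accept the workers' results in whatever order they arrive
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
            sum += value;
          }
          printf("sum = %d\n", sum);
        } else {
          int value = expensiveComputation(rank);
          MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
      }

      A collective operation such as MPI_Reduce can express the same sum in a single
      call; collectives are previewed on slide 29.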
  23. Broadcast
      Send and receive are point-to-point
      Can also broadcast data:
      - The source sends to everyone else
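      For reference, MPI_Bcast's C prototype is roughly as follows; every process in
      the communicator makes the same call, and the root argument determines who sends:

      int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
                    int root, MPI_Comm comm);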
  24. Broadcast

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char * argv[]) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        do {
          if (rank == 0)
            scanf("%d", &value);
          MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
          printf("Process %d got %d\n", rank, value);
        } while (value >= 0);
        MPI_Finalize();
        return 0;
      }

      Repeatedly broadcast input (one integer) to all
  25. Broadcast (same code, annotated): the first argument, &value, is the value to send or receive
  26. Broadcast (same code, annotated): the second argument, 1, is how many to send/receive
  27. Broadcast (same code, annotated): the third argument, MPI_INT, is the datatype
  28. Broadcast (same code, annotated): the fourth argument, 0, is who's "root" for the broadcast
  29. Communication Flavors
      Basic communication:
      - blocking = wait until done
      - point-to-point = from me to you
      - broadcast = from me to everyone
      Non-blocking:
      - Think create & join, fork & wait...
      - MPI_Isend, MPI_Irecv
      - MPI_Wait, MPI_Waitall, MPI_Test
      Collective:
      - e.g., MPI_Bcast
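      A minimal sketch of the non-blocking pattern (not from the original slides),
      assuming exactly two processes that exchange one integer: the transfers are
      started, other work can proceed, and MPI_Waitall blocks until both complete.

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char * argv[]) {
        int rank, mine, theirs;
        MPI_Request requests[2];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int partner = 1 - rank;              // assumes exactly 2 processes
        mine = rank * 100;                   // arbitrary value to exchange
        // start the transfers; neither call waits for completion
        MPI_Isend(&mine,   1, MPI_INT, partner, 0, MPI_COMM_WORLD, &requests[0]);
        MPI_Irecv(&theirs, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &requests[1]);
        // ... do other useful work here ...
        MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);   // now wait for both
        printf("Process %d got %d\n", rank, theirs);
        MPI_Finalize();
        return 0;
      }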
  30. The End
  31. Scaling Limits
      Kernel used in atmospheric models:
      - 99% floating point ops; multiplies/adds
      - Sweeps through memory with little reuse
      - One "copy" of the code running independently on varying numbers of processors
      [Performance chart from Pat Worley, ORNL]