1
SAN DIEGO SUPERCOMPUTER CENTER
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE
Point to Point Communications in MPI
• Basic operations of Point to Point (PtoP)
communication and issues of deadlock
• Several steps are involved in the PtoP
communication
• Sending process
– data is copied to the user buffer by the user
– User calls one of the MPI send routines
– System copies the data from the user buffer to the
system buffer
– System sends the data from the system buffer to the
destination processor
2
Point to Point Communications in MPI
• Receiving process
– User calls one of the MPI receive subroutines
– System receives the data from the source process, and
copies it to the system buffer
– System copies the data from the system buffer to the
user buffer
– User uses the data in the user buffer
3
(Diagram: message flow for a blocking send and receive.)
Process 0 (user mode / kernel mode): the user calls the send routine; the system copies
the data from sendbuf to a system buffer (sysbuf), after which sendbuf can be reused;
the system then sends the data from sysbuf to the destination.
Process 1 (user mode / kernel mode): the user calls the receive routine; the system
receives the data from the source into its sysbuf and copies it from sysbuf to recvbuf;
recvbuf then contains valid data.
4
Unidirectional communication
• Blocking send and blocking receive
if (myrank == 0) then
call MPI_Send(…)
elseif (myrank == 1) then
call MPI_Recv(….)
endif
• Non-blocking send and blocking receive
if (myrank == 0) then
call MPI_ISend(…)
call MPI_Wait(…)
else if (myrank == 1) then
call MPI_Recv(….)
endif
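The calls above elide their arguments. As a concrete illustration (not from the original slides), a minimal C version of the blocking send / blocking receive pattern might look like the following sketch; the buffer size, tag, and ranks are arbitrary choices:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int myrank, data[4] = {1, 2, 3, 4};
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        /* blocking send: returns once the data is safely out of the user buffer */
        MPI_Send(data, 4, MPI_INT, 1, 99, MPI_COMM_WORLD);
    } else if (myrank == 1) {
        /* blocking receive: returns once the data has arrived in the user buffer */
        MPI_Recv(data, 4, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d %d %d %d\n", data[0], data[1], data[2], data[3]);
    }
    MPI_Finalize();
    return 0;
}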
5
• Blocking send and non-blocking recv
if (myrank == 0 ) then
call MPI_Send(…..)
elseif (myrank == 1) then
call MPI_Irecv (…)
call MPI_Wait(…)
endif
• Non-blocking send and non-blocking recv
if (myrank == 0 ) then
call MPI_Isend (…)
call MPI_Wait (…)
elseif (myrank == 1) then
call MPI_Irecv (….)
call MPI_Wait(..)
endif
Unidirectional communication
6
Bidirectional communication
• Need to be careful about deadlock when two processes exchange data with
each other
• Deadlock can occur due to incorrect order of send and recv or due to limited
size of the system buffer
(Diagram: Rank 0 and Rank 1 each send from their sendbuf and receive into their recvbuf.)
7
Bidirectional communication
• Case 1 : both processes call send first, then recv
if (myrank == 0 ) then
call MPI_Send(….)
call MPI_Recv (…)
elseif (myrank == 1) then
call MPI_Send(….)
call MPI_Recv(….)
endif
• No deadlock as long as the system buffer is larger than the send buffer
• Deadlock if the system buffer is smaller than the send buffer
• Replacing MPI_Send with MPI_Isend followed immediately by MPI_Wait behaves the same way
• Moral : there may be errors in the code that only show up for
larger problem sizes
8
Bidirectional communication
• The following is free from deadlock
if (myrank == 0 ) then
call MPI_Isend(….)
call MPI_Recv (…)
call MPI_Wait(…)
elseif (myrank == 1) then
call MPI_Isend(….)
call MPI_Recv(….)
call MPI_Wait(….)
endif
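A complete C sketch of this deadlock-free exchange is given below; it is our illustration rather than code from the slides, it assumes exactly two ranks, and the message length and tag are arbitrary:

#include <mpi.h>
#include <stdio.h>

#define N 1000

int main(int argc, char *argv[]) {
    int myrank, other, i;
    double sendbuf[N], recvbuf[N];
    MPI_Request req;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    other = 1 - myrank;                /* assumes exactly two ranks */
    for (i = 0; i < N; i++) sendbuf[i] = myrank;
    /* post the send without blocking, then do the blocking receive */
    MPI_Isend(sendbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &req);
    MPI_Recv(recvbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);
    MPI_Wait(&req, &status);           /* sendbuf may be reused after this */
    printf("rank %d received data from rank %d\n", myrank, other);
    MPI_Finalize();
    return 0;
}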
9
Bidirectional communication
• Case 2 : both processes call recv first, then send
if (myrank == 0 ) then
call MPI_Recv(….)
call MPI_Send (…)
elseif (myrank == 1) then
call MPI_Recv(….)
call MPI_Send(….)
endif
• The above will always lead to deadlock (even if you replace
MPI_Send with MPI_Isend and MPI_Wait)
10
Bidirectional communication
• The following code can be safely executed
if (myrank == 0 ) then
call MPI_Irecv(….)
call MPI_Send (…)
call MPI_Wait(…)
elseif (myrank == 1) then
call MPI_Irecv(….)
call MPI_Send(….)
call MPI_Wait(….)
endif
11
Bidirectional communication
• Case 3 : one process calls send and then recv, and the other
calls recv and then send
if (myrank == 0 ) then
call MPI_Send(….)
call MPI_Recv(…)
elseif (myrank == 1) then
call MPI_Recv(….)
call MPI_Send(….)
endif
• The above is always safe
• You can replace send and recv on both processes with Isend
and Irecv (each followed by MPI_Wait)
12
Scatter and Gather
(Diagram: collective data movement among processes p0–p3.)
• broadcast : A on p0 is copied to every process, so p0–p3 each end up with A
• scatter : A B C D on p0 are split up, with A going to p0, B to p1, C to p2, and D to p3
• gather : the inverse of scatter; A from p0, B from p1, C from p2, and D from p3 are
collected into A B C D on p0
• all gather : every process ends up with the full set A B C D
13
Scatter Operation using MPI_Scatter
• Similar to Broadcast, but sends a section of
an array to each processor
• Data in an array on the root node: A(0) A(1) A(2) … A(N-1)
• Goes to processors: P0 P1 P2 … Pn-1 (element A(i) goes to Pi)
14
MPI_Scatter
• C
– int MPI_Scatter(&sendbuf, sendcnts, sendtype, &recvbuf,
recvcnts, recvtype, root, comm );
• Fortran
– MPI_Scatter(sendbuf,sendcnts,sendtype,
recvbuf,recvcnts,recvtype,root,comm,ierror)
• Parameters
– sendbuf is an array of size (number processors*sendcnts)
– sendcnts number of elements sent to each processor
– recvcnts number of element(s) obtained from the root processor
– recvbuf contains element(s) obtained from the root processor, may
be an array
15
Scatter Operation using MPI_Scatter
• Scatter with Sendcnts = 2
• Data in an array on the root node: A(0) A(1) A(2) A(3) … A(2N-2) A(2N-1)
• Goes to processors P0 P1 P2 … Pn-1: each processor Pi receives A(2i) and A(2i+1)
into its local B(0) and B(1)
16
Gather Operation using MPI_Gather
• Used to collect data from all processors to
the root, inverse of scatter
• Data is collected into an array on root
processor
• Data from the various processors P0 P1 P2 … Pn-1: A0 A1 A2 … An-1
• Goes to an array on the root node: A(0) A(1) A(2) … A(N-1)
17
MPI_Gather
• C
– int MPI_Gather(&sendbuf,sendcnts, sendtype, &recvbuf,
recvcnts,recvtype,root, comm );
• Fortran
– MPI_Gather(sendbuf,sendcnts,sendtype,
recvbuf,recvcnts,recvtype,root,comm,ierror)
• Parameters
– sendcnts number of elements sent from each processor
– sendbuf is an array of size sendcnts
– recvcnts number of elements obtained from each processor
– recvbuf of size recvcnts*number of processors
18
Code for Scatter and Gather
• A parallel program to scatter data using
MPI_Scatter
• Each processor sums the data
• Use MPI_Gather to get the data back to the
root processor
• Root processor prints the global data
• See attached Fortran and C code
19
module mpi
!DEC$ NOFREEFORM
include "mpif.h“
!DEC$ FREEFORM
end module
! This program shows how to use MPI_Scatter and MPI_Gather
! Each processor gets different data from the root processor
! by way of mpi_scatter. The data is summed and then sent back
! to the root processor using MPI_Gather. The root processor
! then prints the global sum.
module global
integer numnodes,myid,mpi_err
integer, parameter :: mpi_root=0
end module
subroutine init
use mpi
use global
implicit none
! do the mpi init stuff
call MPI_INIT( mpi_err )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numnodes, mpi_err )
call MPI_Comm_rank(MPI_COMM_WORLD, myid, mpi_err)
20
end subroutine init
program test1
use mpi
use global
implicit none
integer, allocatable :: myray(:),send_ray(:),back_ray(:)
integer count
integer size,mysize,i,k,j,total
call init
! each processor will get count elements from the root
count=4
allocate(myray(count))
! create the data to be sent on the root
if(myid == mpi_root)then
size=count*numnodes
allocate(send_ray(0:size-1))
allocate(back_ray(0:numnodes-1))
do i=0,size-1
send_ray(i)= i
enddo
endif
21
call MPI_Scatter( send_ray, count, MPI_INTEGER, &
myray, count, MPI_INTEGER, &
mpi_root, MPI_COMM_WORLD,mpi_err)
! each processor does a local sum
total=sum(myray)
write(*,*)"myid= ",myid," total= ",total
! send the local sums back to the root
call MPI_Gather( total, 1, MPI_INTEGER, &
back_ray, 1, MPI_INTEGER, &
mpi_root, MPI_COMM_WORLD,mpi_err)
! the root prints the global sum
if(myid == mpi_root)then
write(*,*)"results from all processors= ",sum(back_ray)
endif
call mpi_finalize(mpi_err)
end program
22
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
/*! This program shows how to use MPI_Scatter and MPI_Gather
! Each processor gets different data from the root processor
! by way of mpi_scatter. The data is summed and then sent back
! to the root processor using MPI_Gather. The root processor
! then prints the global sum. */
/* globals */
int numnodes,myid,mpi_err;
#define mpi_root 0
/* end globals */
void init_it(int *argc, char ***argv);
void init_it(int *argc, char ***argv) {
mpi_err = MPI_Init(argc,argv);
mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid); }
23
int main(int argc,char *argv[]){
int *myray,*send_ray,*back_ray;
int count;
int size,mysize,i,k,j,total;
init_it(&argc,&argv);
/* each processor will get count elements from the root */
count=4;
myray=(int*)malloc(count*sizeof(int));
/* create the data to be sent on the root */
if(myid == mpi_root){
size=count*numnodes;
send_ray=(int*)malloc(size*sizeof(int));
back_ray=(int*)malloc(numnodes*sizeof(int));
for(i=0;i<size;i++)
send_ray[i]=i;
}
/* send different data to each processor */
24
mpi_err = MPI_Scatter( send_ray, count, MPI_INT, myray, count, MPI_INT,
mpi_root, MPI_COMM_WORLD);
/* each processor does a local sum */
total=0;
for(i=0;i<count;i++)
total=total+myray[i];
printf("myid= %d total= %dn ",myid,total);
/* send the local sums back to the root */
mpi_err = MPI_Gather(&total, 1, MPI_INT, back_ray, 1, MPI_INT,
mpi_root, MPI_COMM_WORLD);
/* the root prints the global sum */
if(myid == mpi_root){
total=0;
for(i=0;i<numnodes;i++)
total=total+back_ray[i];
printf("results from all processors= %d n ",total);
}
mpi_err = MPI_Finalize();}
25
Output of previous code on 4 procs
ultra:/work/majumdar/examples/mpi % bsub -q hpc -m ultra -I -n 4 ./a.out
Job <48051> is submitted to queue <hpc>.
<<Waiting for dispatch ...>>
<<Starting on ultra>>
myid= 1 total= 22
myid= 2 total= 38
myid= 3 total= 54
myid= 0 total= 6
results from all processors= 120
( 0 through 15 added up = (15) (15 + 1) /2 = 120)
26
Global Sum with MPI_Reduce
• A 2-D array is spread across processors: each node holds one row in X(0) X(1) X(2)
NODE 0: A0 B0 C0
NODE 1: A1 B1 C1
NODE 2: A2 B2 C2
• After the reduction (sum), NODE 0 holds the column sums in X(0) X(1) X(2):
A0+A1+A2 B0+B1+B2 C0+C1+C2
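The slides show only the data layout; a minimal MPI_Reduce call performing this column-wise sum might look like the sketch below (the array size, values, and variable names are our assumptions):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int myid, i;
    double x[3], sums[3];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    for (i = 0; i < 3; i++) x[i] = myid + i;     /* this node's row */
    /* element-wise sum of x across all nodes; the result lands on node 0 */
    MPI_Reduce(x, sums, 3, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0)
        printf("sums= %g %g %g\n", sums[0], sums[1], sums[2]);
    MPI_Finalize();
    return 0;
}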
27
MPI_Allgather and MPI_Allreduce
• Gather and Reduce come in an "ALL"
variation
• Results are returned to all processors
• The root parameter is missing from the call
• Similar to a gather or reduce followed by a
broadcast
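The deck names MPI_Allgather but never shows its prototype; the following minimal sketch (our own, with an arbitrary fixed-size receive array) illustrates how such a call could look:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int myid, nprocs, i;
    int mine, all[64];                 /* assumes at most 64 processes */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    mine = 100 + myid;                 /* one value contributed per process */
    /* like MPI_Gather, but with no root: every process receives the whole array */
    MPI_Allgather(&mine, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);
    printf("rank %d:", myid);
    for (i = 0; i < nprocs; i++) printf(" %d", all[i]);
    printf("\n");
    MPI_Finalize();
    return 0;
}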
28
Global Sum with MPI_Allreduce
• A 2-D array is spread across processors: each node holds one row in X(0) X(1) X(2)
NODE 0: A0 B0 C0
NODE 1: A1 B1 C1
NODE 2: A2 B2 C2
• After MPI_Allreduce (sum), every node holds the column sums in X(0) X(1) X(2):
A0+A1+A2 B0+B1+B2 C0+C1+C2
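A matching MPI_Allreduce sketch for this picture could look as follows; again the array size and values are illustrative assumptions rather than code from the slides:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int myid, i;
    double x[3], sums[3];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    for (i = 0; i < 3; i++) x[i] = myid + i;     /* this node's row */
    /* same reduction as MPI_Reduce, but with no root: every node gets the sums */
    MPI_Allreduce(x, sums, 3, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    printf("node %d sums= %g %g %g\n", myid, sums[0], sums[1], sums[2]);
    MPI_Finalize();
    return 0;
}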
29
All to All communication with MPI_Alltoall
• Each processor sends and receives data
to/from all others
• C
– int MPI_Alltoall(&sendbuf, sendcnts, sendtype, &recvbuf,
recvcnts, recvtype, comm);
• Fortran
– call MPI_Alltoall(sendbuf,sendcnts,sendtype,
recvbuf,recvcnts,recvtype,comm,ierror)
30
(Diagram: MPI_Alltoall among four processors.)
Before: P0 holds a0 a1 a2 a3, P1 holds b0 b1 b2 b3, P2 holds c0 c1 c2 c3, P3 holds d0 d1 d2 d3
After: P0 holds a0 b0 c0 d0, P1 holds a1 b1 c1 d1, P2 holds a2 b2 c2 d2, P3 holds a3 b3 c3 d3
31
All to All with MPI_Alltoall
• Parameters
– sendcnts # of elements sent to each processor
– sendbuf is an array of size sendcnts*number of processors
– recvcnts # of elements obtained from each processor
– recvbuf is an array of size recvcnts*number of processors
• Note that in the attached example, where the counts are 1, both the send
buffer and the receive buffer are arrays whose size equals the
number of processors
• See attached Fortran and C codes
32
module mpi
!DEC$ NOFREEFORM
include "mpif.h“
!DEC$ FREEFORM
end module
! This program shows how to use MPI_Alltoall. Each processor
! send/rec a different random number to/from other processors.
module global
integer numnodes,myid,mpi_err
integer, parameter :: mpi_root=0
end module
subroutine init
use mpi
use global
implicit none
! do the mpi init stuff
call MPI_INIT( mpi_err )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numnodes, mpi_err )
call MPI_Comm_rank(MPI_COMM_WORLD, myid, mpi_err)
end subroutine init
33
program test1
use mpi
use global
implicit none
integer, allocatable :: scounts(:),rcounts(:)
integer ssize,rsize,i,k,j
real z
call init
! counts and displacement arrays
allocate(scounts(0:numnodes-1))
allocate(rcounts(0:numnodes-1))
call seed_random
! find data to send
do i=0,numnodes-1
call random_number(z)
scounts(i)=nint(10.0*z)+1
Enddo
write(*,*)"myid= ",myid," scounts= ",scounts
34
! send the data
call MPI_alltoall( scounts,1,MPI_INTEGER, &
rcounts,1,MPI_INTEGER, MPI_COMM_WORLD,mpi_err)
write(*,*)"myid= ",myid," rcounts= ",rcounts
call mpi_finalize(mpi_err)
end program
subroutine seed_random
use global
implicit none
integer the_size,j
integer, allocatable :: seed(:)
real z
call random_seed(size=the_size) ! how big is the intrisic seed?
allocate(seed(the_size)) ! allocate space for seed
do j=1,the_size ! create the seed
seed(j)=abs(myid*10)+(j*myid*myid)+100 ! abs is generic
enddo
call random_seed(put=seed) ! assign the seed
deallocate(seed)
end subroutine
35
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
/*! This program shows how to use MPI_Alltoall. Each processor
! send/rec a different random number to/from other processors. */
/* globals */
int numnodes,myid,mpi_err;
#define mpi_root 0
/* end module */
void init_it(int *argc, char ***argv);
void seed_random(int id);
void random_number(float *z);
void init_it(int *argc, char ***argv) {
mpi_err = MPI_Init(argc,argv);
mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
}
36
int main(int argc,char *argv[]){
int *sray,*rray;
int *scounts,*rcounts;
int ssize,rsize,i,k,j;
float z;
init_it(&argc,&argv);
scounts=(int*)malloc(sizeof(int)*numnodes);
rcounts=(int*)malloc(sizeof(int)*numnodes);
/*! seed the random number generator with a
! different number on each processor*/
seed_random(myid);
/* find data to send */
for(i=0;i<numnodes;i++){
random_number(&z);
scounts[i]=(int)(10.0*z)+1;
}
printf("myid= %d scounts=",myid);
for(i=0;i<numnodes;i++)
printf("%d ",scounts[i]);
printf("n");
37
/* send the data */
mpi_err = MPI_Alltoall( scounts,1,MPI_INT,
rcounts,1,MPI_INT, MPI_COMM_WORLD);
printf("myid= %d rcounts=",myid);
for(i=0;i<numnodes;i++)
printf("%d ",rcounts[i]);
printf("n");
mpi_err = MPI_Finalize();}
void seed_random(int id){
srand((unsigned int)id);}
void random_number(float *z){
int i;
i=rand();
*z=(float)i/32767;
}
38
Output of previous code on 4 procs
ultra:/work/majumdar/examples/mpi % bsub -q hpc -m ultra -I -n 4 a.out
Job <48059> is submitted to queue <hpc>.
<<Waiting for dispatch ...>>
<<Starting on ultra>>
myid= 1 scounts= 6 2 4 6
myid= 1 rcounts= 7 2 7 3
myid= 2 scounts= 1 7 4 4
myid= 2 rcounts= 4 4 4 4
myid= 3 scounts= 6 3 4 3
myid= 3 rcounts= 7 6 4 3
myid= 0 scounts= 1 7 4 7
myid= 0 rcounts= 1 6 1 6
--------------------------------------------
1 7 4 7 1 6 1 6
6 2 4 6 7 2 7 3
1 7 4 4 4 4 4 4
6 3 4 3 7 6 4 3
39
The variable or “V” operators
• A collection of very powerful but difficult to
set up global communication routines
• MPI_Gatherv: Gather different amounts of data
from each processor to the root processor
• MPI_Alltoallv: Send and receive different
amounts of data from all processors
• MPI_Allgatherv: Gather different amounts of data
from each processor and send all data to each
• MPI_Scatterv: Send different amounts of data to
each processor from the root processor
• We discuss MPI_Gatherv and MPI_Alltoallv
40
MPI_Gatherv
• C
– int MPI_Gatherv (&sendbuf, sendcnts, sendtype,
&recvbuf, &recvcnts, &rdispls, recvtype, root, comm);
• Fortran
– MPI_Gatherv (sendbuf, sendcnts, sendtype, recvbuf,
recvcnts, rdispls, recvtype, root, comm, ierror)
• Parameters:
– Recvcnts is now an array (one count per sending processor)
– Rdispls is an array of displacements into recvbuf
• See attached codes
41
MPI_Gatherv
(Diagram: MPI_Gatherv with three ranks.)
rank 0 (the root) sends one element of value 1, rank 1 sends two elements of value 2,
and rank 2 sends three elements of value 3 from their sendbufs.
On the root, recvcnts = (1, 2, 3) and rdispls = (0, 1, 3), so recvbuf holds 1 2 2 3 3 3.
42
MPI_Gatherv code
Sample program:
include 'mpif.h'
integer isend(3), irecv(6)
integer ircnt(0:2), idisp(0:2)
data ircnt/1,2,3/ idisp/0,1,3/
call mpi_init(ierr)
call mpi_comm_size(MPI_COMM_WORLD, nprocs,ierr)
call mpi_comm_rank(MPI_COMM_WORLD,myrank,ierr)
do i = 1,myrank+1
isend(i) = myrank+1
enddo
iscnt = myrank + 1
call MPI_GATHERV(isend,iscnt,MPI_INTEGER,irecv,ircnt,idisp,MPI_INTEGER,
& 0,MPI_COMM_WORLD, ierr)
if (myrank .eq. 0) then
print *, 'irecv =', irecv
endif
call MPI_FINALIZE(ierr)
end
Sample execution:
% bsub -q hpc -m ultra -I -n 3 ./a.out
% 0: irecv = 1 2 2 3 3 3
43
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
/*! This program shows how to use MPI_Gatherv. Each processor sends a
! different amount of data to the root processor. We use MPI_Gather
! first to tell the root how much data is going to be sent.*/
/* globals */
int numnodes,myid,mpi_err;
#define mpi_root 0
/* end of globals */
void init_it(int *argc, char ***argv);
void init_it(int *argc, char ***argv) {
mpi_err = MPI_Init(argc,argv);
mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
}
44
int main(int argc,char *argv[]){
int *will_use;
int *myray,*displacements,*counts,*allray;
int size,mysize,i;
init_it(&argc,&argv);
mysize=myid+1;
myray=(int*)malloc(mysize*sizeof(int));
for(i=0;i<mysize;i++)
myray[i]=myid+1;
/* counts and displacement arrays are only required on the root */
if(myid == mpi_root){
counts=(int*)malloc(numnodes*sizeof(int));
45
displacements=(int*)malloc(numnodes*sizeof(int));
}
/* we gather the counts to the root */
mpi_err = MPI_Gather((void*)myray,1,MPI_INT,
(void*)counts, 1,MPI_INT,
mpi_root,MPI_COMM_WORLD);
/* calculate displacements and the size of the recv array */
if(myid == mpi_root){
displacements[0]=0;
for( i=1;i<numnodes;i++){
displacements[i]=counts[i-1]+displacements[i-1];
}
size=0;
for(i=0;i< numnodes;i++)
size=size+counts[i];
allray=(int*)malloc(size*sizeof(int));
}
46
/* different amounts of data from each processor */
/* is gathered to the root */
mpi_err = MPI_Gatherv(myray, mysize, MPI_INT,
allray,counts,displacements,MPI_INT,
mpi_root, MPI_COMM_WORLD);
if(myid == mpi_root){
for(i=0;i<size;i++)
printf("%d ",allray[i]);
printf("n");
}
mpi_err = MPI_Finalize();
}
ultra% bsub -q hpc -m ultra -I -n 3 ./a.out
1 2 2 3 3 3
47
MPI_Alltoallv
• Send and receive different amounts of data
from all processors
• C
– int MPI_Alltoallv (&sendbuf, &sendcnts, &sdispls,
sendtype, &recvbuf, &recvcnts, &rdispls, recvtype,
comm );
• Fortran
– Call MPI_Alltoallv(sendbuf, sendcnts, sdispls,
sendtype, recvbuf, recvcnts, rdispls,recvtype,
comm,ierror);
• See attached code
48
MPI_Alltoallv
(Diagram: MPI_Alltoallv with three ranks.)
Send side: rank 0's sendbuf holds 1 2 2 3 3 3, rank 1's holds 4 5 5 6 6 6, and rank 2's
holds 7 8 8 9 9 9, with sendcnts = (1, 2, 3) and sdispls = (0, 1, 3) on every rank.
Receive side: rank 0's recvbuf ends up with 1 4 7, rank 1's with 2 2 5 5 8 8, and
rank 2's with 3 3 3 6 6 6 9 9 9, laid out according to recvcnts and rdispls (next slide).
49
MPI_Alltoallv
recvcnts (columns = proc #):
            proc 0   proc 1   proc 2
index 0       1        2        3
index 1       1        2        3
index 2       1        2        3

rdispls (columns = proc #):
            proc 0   proc 1   proc 2
index 0       0        0        0
index 1       1        2        3
index 2       2        4        6
50
MPI_Alltoallv
Program alltoallv
include 'mpif.h'
integer isend(6), irecv(9)
integer iscnt(0:2), isdsp(0:2), ircnt(0:2), irdsp(0:2)
data isend/1,2,2,3,3,3/
data iscnt/1,2,3/ isdsp/0,1,3/
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs, ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,myrank, ierr)
do i = 1,6
isend(i) = isend(i) + nprocs*myrank
enddo
do i = 0, nprocs - 1
ircnt(i) = myrank + 1
irdsp(i) = i* (myrank + 1)
enddo
print*, 'isend=', isend
call MP_FLUSH(1)
call MPI_ALLTOALLV(isend,iscnt,isdsp,MPI_INTEGER,irecv, ircnt, irdsp,MPI_INTEGER,
& MPI_COMM_WORLD, ierr)
print*, 'irecv=',irecv
call MPI_FINALIZE(ierr)
end
51
MPI_Alltoallv
Sample execution of mpialltoallv program:
% bsub -q hpc -m ultra -I -n 3
% 0: isend = 1 2 2 3 3 3
1: isend = 4 5 5 6 6 6
2: isend = 7 8 8 9 9 9
0: irecv = 1 4 7 0 0 0 0 0 0
1: irecv = 2 2 5 5 8 8 0 0 0
2: irecv = 3 3 3 6 6 6 9 9 9
52
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
/*
! This program shows how to use MPI_Alltoallv. Each processor
! send/rec a different and random amount of data to/from other
! processors.
! We use MPI_Alltoall to tell how much data is going to be sent.
*/
/* globals */
int numnodes,myid,mpi_err;
#define mpi_root 0
/* end module */
53
void seed_random(int id);
void random_number(float *z);
void init_it(int *argc, char ***argv) {
mpi_err = MPI_Init(argc,argv);
mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
}
int main(int argc,char *argv[]){
int *sray,*rray;
int *sdisp,*scounts,*rdisp,*rcounts;
int ssize,rsize,i,k,j;
float z;
init_it(&argc,&argv);
scounts=(int*)malloc(sizeof(int)*numnodes);
rcounts=(int*)malloc(sizeof(int)*numnodes);
sdisp=(int*)malloc(sizeof(int)*numnodes);
rdisp=(int*)malloc(sizeof(int)*numnodes);
/*
54
! seed the random number generator with a
! different number on each processor
*/
seed_random(myid);
/* find out how much data to send */
for(i=0;i<numnodes;i++){
random_number(&z);
scounts[i]=(int)(10.0*z)+1;
}
printf("myid= %d scounts=",myid);
for(i=0;i<numnodes;i++)
printf("%d ",scounts[i]);
printf("n");
/* tell the other processors how much data is coming */
mpi_err = MPI_Alltoall( scounts,1,MPI_INT,
rcounts,1,MPI_INT,
MPI_COMM_WORLD);
55
/* write(*,*)"myid= ",myid," rcounts= ",rcounts */
/* calculate displacements and the size of the arrays */
sdisp[0]=0;
for(i=1;i<numnodes;i++){
sdisp[i]=scounts[i-1]+sdisp[i-1];
}
rdisp[0]=0;
for(i=1;i<numnodes;i++){
rdisp[i]=rcounts[i-1]+rdisp[i-1];
}
ssize=0;
rsize=0;
for(i=0;i<numnodes;i++){
ssize=ssize+scounts[i];
rsize=rsize+rcounts[i];
}
56
/* allocate send and rec arrays */
sray=(int*)malloc(sizeof(int)*ssize);
rray=(int*)malloc(sizeof(int)*rsize);
for(i=0;i<ssize;i++)
sray[i]=myid;
/* send/rec different amounts of data to/from each processor */
mpi_err = MPI_Alltoallv( sray,scounts,sdisp,MPI_INT,
rray,rcounts,rdisp,MPI_INT,
MPI_COMM_WORLD);
printf("myid= %d rray=",myid);
for(i=0;i<rsize;i++)
printf("%d ",rray[i]);
printf("n");
mpi_err = MPI_Finalize();
}
57
void seed_random(int id) {
srand((unsigned int)id);
}
void random_number(float *z){
int i;
i=rand();
*z=(float)i/32767;
}
Ultra output from 3 procs run:
0:myid= 0 scounts=1 7 4
0:myid= 0 rray=0 1 1 1 1 1 1 2
1:myid= 1 scounts=6 2 4
1:myid= 1 rray=0 0 0 0 0 0 0 1 1 2 2 2 2 2 2 2
2:myid= 2 scounts=1 7 4
2:myid= 2 rray=0 0 0 0 1 1 1 1 2 2 2 2
58
Derived types
• C and Fortran 90 have the ability to define
arbitrary data types that encapsulate reals,
integers, and characters.
• MPI allows you to define message data
types corresponding to your data types
• Can use these data types just as default
types
59
Derived types, Three main classifications:
• Contiguous Vectors: enable you to send
contiguous blocks of the same type of data
lumped together
• Noncontiguous Vectors: enable you to send
noncontiguous blocks of the same type of
data lumped together
• Abstract types: enable you to (carefully)
send C or Fortran 90 structures, don't send
pointers
60
Derived types, how to use them
• Three step process
– Define the type using
• MPI_Type_contiguous for contiguous vectors
• MPI_Type_vector for noncontiguous vectors
• MPI_Type_struct for structures
– Commit the type using
• MPI_Type_commit
– Use in normal communication calls
• MPI_Send(buffer, count, MY_TYPE,
destination,tag, MPI_COMM_WORLD, ierr)
61
MPI_Type_contiguous
• Defines a new data type of length count elements
from your old data type
• C
– MPI_Type_contiguous(int count, old_type, &new_type)
• Fortran
– Call MPI_TYPE_CONTIGUOUS(count, old_type,
new_type, ierror)
• Parameters
– Old_type: your base type
– New_type: a type count elements of Old_type
• See attached codes
62
MPI_TYPE_CONTIGUOUS
Sample program - Fortran:
program type_contiguous
include 'mpif.h'
integer ibuf(20)
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,myrank,ierr)
if (myrank .eq. 0) then
do i = 1,20
ibuf(i) = I
enddo
endif
call MPI_TYPE_CONTIGUOUS(3,MPI_INTEGER,inewtype, ierr)
call MPI_TYPE_COMMIT(inewtype, ierr)
call MPI_BCAST(ibuf,3,inewtype,0,MPI_COMM_WORLD, ierr)
print*, 'ibuf=',ibuf
call MPI_FINALIZE(ierr)
end
Sample execution:
% bsub -q hpc -m ultra -I -n 2 a.out
% 0 : ibuf =1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1: ibuf = 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 0
63
MPI_Type_contiguous
#include <stdio.h>
#include "mpi.h"
#include <math.h>
int main(argc,argv)
int argc;
char *argv[];{
int myid, numprocs, i , buffer[20];
MPI_Status status;
MPI_Datatype inewtype ;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
if (myid == 0) { for (i=0; i<20; i++)
buffer[i]=i ;}
if (myid == 1) { for (i=0; i<20; i++)
buffer[i]=0 ;}
MPI_Type_contiguous(3,MPI_INT,&inewtype);
MPI_Type_commit(&inewtype) ;
MPI_Bcast(buffer,3,inewtype,0,MPI_COMM_WORLD);
for(i=0;i<20;i++)
printf("%d ",buffer[i]);
printf("n");
MPI_Finalize(); }
Output on two processors :
0 1 2 3 4 5 6 7 8 0 0 0 0 0 0 0 0 0 0 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
64
MPI_Type_vector
• Defines a datatype which consists of count blocks,
each of length blocklength elements, with a stride of stride
elements between the starts of consecutive blocks
• C
– MPI_Type_vector(count, blocklength, stride, old_type,
*new_type)
• Fortran
– Call MPI_TYPE_VECTOR(count, blocklength, stride,
old_type, new_type, ierror)
• See attached codes
65
module mpi
!DEC$ NOFREEFORM
include "mpif.h"
!DEC$ FREEFORM
end module
!Shows how to use MPI_Type_vector to send noncontiguous blocks of data
!and MPI_Get_count and MPI_Get_elements to see the number of elements sent
program do_vect
use mpi
! include "mpif.h"
integer , parameter :: size=24
integer myid, ierr,numprocs
real*8 svect(0:size),rvect(0:size)
integer i,bonk1,bonk2,numx,stride,extent
integer MY_TYPE
integer status(MPI_STATUS_SIZE)
call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
stride=5
numx=(size+1)/stride
66
extent = 1
if(myid == 1)write(*,*)"numx=",numx," extent=",extent," stride=",stride
call MPI_Type_vector(numx,extent,stride,MPI_DOUBLE_PRECISION,MY_TYPE,ierr)
call MPI_Type_commit(MY_TYPE, ierr )
if(myid == 0)then
do i=0,size
svect(i)=i
enddo
call MPI_Send(svect,1,MY_TYPE,1,100,MPI_COMM_WORLD,ierr)
endif
if(myid == 1)then
do i=0,size
rvect(i)=-1
enddo
call MPI_Recv(rvect,1,MY_TYPE,0,100,MPI_COMM_WORLD,status,ierr)
endif
if(myid == 1)then
call MPI_Get_count(status,MY_TYPE,bonk1, ierr )
call MPI_Get_elements(status,MPI_DOUBLE_PRECISION,bonk2,ierr)
67
write(*,*)"got ", bonk1," elements of type MY_TYPE"
write(*,*)"which contained ", bonk2," elements of type MPI_DOUBLE_PREC
ISION"
do i=0,size
if(rvect(i) /= -1)write(*,'(i2,f4.0)')i,rvect(i)
enddo
endif
call MPI_Finalize(ierr )
end program
! output
! numx= 5 extent= 1 stride= 5
! got 1 elements of type MY_TYPE
! which contained 5 elements of type MPI_DOUBLE_PRECISION
! 0 0.
! 5 5.
! 10 10.
! 15 15.
! 20 20.
68
/*
Shows how to use MPI_Type_vector to send noncontiguous blocks of data
and MPI_Get_count and MPI_Get_elements to see the number of elements sent
*/
#include <stdio.h>
#include "mpi.h"
#include <math.h>
int main(argc,argv)
int argc;
char *argv[];
{
int myid, numprocs,mpi_err;
#define SIZE 25
double svect[SIZE],rvect[SIZE];
int i,bonk1,bonk2,numx,stride,extent;
MPI_Datatype MPI_LEFT_RITE;
MPI_Status status;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
69
stride=5;
numx=(SIZE+1)/stride;
extent=1;
if(myid == 1){
printf("numx=%d extent=%d stride=%dn",stride,numx,extent,stride);
}
mpi_err=MPI_Type_vector(numx,extent,stride,MPI_DOUBLE,&MPI_LEFT_RITE);
mpi_err=MPI_Type_commit(&MPI_LEFT_RITE);
if(myid == 0){
for (i=0;i<SIZE;i++)
svect[i]=i;
MPI_Send(svect,1,MPI_LEFT_RITE,1,100,MPI_COMM_WORLD);
}
if(myid == 1){
for (i=0;i<SIZE;i++)
rvect[i]=-1;
MPI_Recv(rvect,1,MPI_LEFT_RITE,0,100,MPI_COMM_WORLD,&status);
}
70
if(myid == 1){
MPI_Get_count(&status,MPI_LEFT_RITE,&bonk1);
MPI_Get_elements(&status,MPI_DOUBLE,&bonk2);
printf("got %d elements of type MY_TYPEn",bonk1);
printf("which contained %d elements of type MPI_DOUBLEn",bonk2);
for (i=0;i<SIZE;i++)
if(rvect[i] != -1)printf("%d %gn",i,rvect[i]);
}
MPI_Finalize();
}
/*
output
numx=5 extent=1 stride=5
got 1 elements of type MY_TYPE
which contained 5 elements of type MPI_DOUBLE
0 0
5 5
10 10
15 15
20 20
*/
71
MPI_Type_struct
• Defines an MPI datatype which maps to a
user-defined derived datatype
• C
– int MPI_Type_struct(count, &array_of_blocklengths,
&array_of_displacement, &array_of_types, &newtype);
• Fortran
– Call MPI_TYPE_STRUCT(count, array_of_blocklengths,
array_of_displacement, array_of_types, newtype,ierror)
72
MPI_Type_struct
• Parameters:
– [IN count] # of old types in the new type (integer)
– [IN array_of_blocklengths] how many of each type in
new structure (integer)
– [IN array_of_types] types in new structure (integer)
– [IN array_of_displacement] offset in bytes for the
beginning of each group of types (integer)
– [OUT newtype] new datatype (handle)
– Call MPI_TYPE_STRUCT(count, array_of_blocklengths,
array_of_displacement,array_of_types, newtype,ierror)
– Ierr = MPI_Type_struct(count, &array_of_blocklengths,
&array_of_displacement, &array_of_types, &newtype);
73
Derived Data type Example
Consider the data type or structure consisting of
3 mpi double
10 mpi integer
2 mpi character
Creating the MPI data structure matching this C/Fortran
structure is a three step process
• Fill the descriptor arrays:
B - blocklengths
T - types
D - displacements
• Use MPI_Type_struct to create the MPI data structure
• Commit the new data type using MPI_Type_commit
74
Derived Data type Example
• To create the MPI data structure
matching this C/Fortran structure
– Fill the descriptor arrays:
• B - blocklengths
• T - types
• D - displacements
• Then use MPI_Type_struct
75
Derived Data type Example (continued)
Fortran :
! t contains the types that
! make up the structure
t(1)=MPI_DOUBLE_PRECISION
t(2)=MPI_INTEGER
t(3)=MPI_CHARACTER
! b contains the # of each type
b(1)=3;b(2)=10;b(3)=2
! d contains the byte offset of
! the start of each type
d(1)=0;d(2)=24;d(3)=64
call MPI_TYPE_STRUCT(3,b,d,t, &
MPI_CHARLES,mpi_err)
MPI_CHARLES is our new data type
C :
/* t contains the types that
make up the structure*/
t[0]=MPI_DOUBLE;
t[1]=MPI_INT;
t[2]=MPI_CHAR;
/*b contains the # of each type */
b[0]=3;b[1]=10;b[2]=2;
/* d contains the byte offset of
the start of each type*/
d[0]=0;d[1]=24;d[2]=64;
ierr = MPI_Type_struct(3,b,d,t,
&MPI_CHARLES);
76
MPI_Type_commit
• Before we use the new data type we call
MPI_Type_commit
• C
– MPI_Type_commit(&MPI_CHARLES)
• Fortran
– Call MPI_Type_commit(MPI_CHARLES,ierr)
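The slides stop short of a complete MPI_Type_struct program. The following is a minimal sketch of one, assuming a C struct with 3 doubles, 10 ints, and 2 chars; it uses offsetof to compute the displacements instead of the hard-coded 24 and 64 byte offsets shown earlier, since structure padding is implementation dependent:

#include <mpi.h>
#include <stdio.h>
#include <stddef.h>

/* a C structure matching the example: 3 doubles, 10 ints, 2 chars */
typedef struct {
    double x[3];
    int    n[10];
    char   c[2];
} charles_t;

int main(int argc, char *argv[]) {
    int myid;
    charles_t rec;
    int          b[3] = {3, 10, 2};                        /* blocklengths */
    MPI_Datatype t[3] = {MPI_DOUBLE, MPI_INT, MPI_CHAR};   /* types */
    MPI_Aint     d[3];                                     /* displacements */
    MPI_Datatype MPI_CHARLES;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    d[0] = offsetof(charles_t, x);
    d[1] = offsetof(charles_t, n);
    d[2] = offsetof(charles_t, c);
    /* MPI_Type_create_struct is the newer name for the same operation */
    MPI_Type_struct(3, b, d, t, &MPI_CHARLES);
    MPI_Type_commit(&MPI_CHARLES);

    if (myid == 0) {
        rec.x[0] = 3.14; rec.n[0] = 42; rec.c[0] = 'a';
        MPI_Send(&rec, 1, MPI_CHARLES, 1, 0, MPI_COMM_WORLD);
    } else if (myid == 1) {
        MPI_Recv(&rec, 1, MPI_CHARLES, 0, 0, MPI_COMM_WORLD, &status);
        printf("got x[0]=%g n[0]=%d c[0]=%c\n", rec.x[0], rec.n[0], rec.c[0]);
    }
    MPI_Type_free(&MPI_CHARLES);
    MPI_Finalize();
    return 0;
}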
77
Communicators
• A communicator is a parameter in all MPI
message passing routines
• A communicator is a collection of
processors that can engage in
communication
• MPI_COMM_WORLD is the default
communicator that consists of all processors
• MPI allows you to create subsets of
communicators
78
Why Communicators?
• Isolate communication to a small number of
processors
• Useful for creating libraries
• Different processors can work on different
parts of the problem
• Useful for communicating with "nearest
neighbors"
79
MPI_Comm_split
• Provides a shortcut method to create a
collection of communicators
• All processors with the "same color" will be
in the same communicator
• Index gives rank in new communicator
• Fortran
– call MPI_COMM_SPLIT(OLD_COMM, color, index,
NEW_COMM, mpi_err)
• C
– MPI_Comm_split(OLD_COMM, color, index, &NEW_COMM)
80
MPI_Comm_split
• Split odd and even processors into 2 communicators
Program comm_split
include "mpif.h"
Integer color,zero_one
call MPI_INIT( mpi_err )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numnodes, mpi_err )
call MPI_COMM_RANK( MPI_COMM_WORLD, myid, mpi_err )
color=mod(myid,2) !color is either 1 or 0
call MPI_COMM_SPLIT(MPI_COMM_WORLD,color,myid,NEW_COMM,mpi_err)
call MPI_COMM_RANK( NEW_COMM, new_id, mpi_err )
call MPI_COMM_SIZE( NEW_COMM, new_nodes, mpi_err )
Zero_one = -1
If(new_id==0)Zero_one = color
Call MPI_Bcast(Zero_one,1,MPI_INTEGER,0, NEW_COMM,mpi_err)
If(zero_one==0)write(*,*)"part of even processor communicator"
If(zero_one==1)write(*,*)"part of odd processor communicator"
Write(*,*)"old_id=", myid, "new_id=", new_id
Call MPI_FINALIZE(mpi_err)
End program
81
MPI_Comm_split
• Split odd and even processors into 2 communicators
0: part of even processor communicator
0: old_id= 0 new_id= 0
2: part of even processor communicator
2: old_id= 2 new_id= 1
1: part of odd processor communicator
1: old_id= 1 new_id= 0
3: part of odd processor communicator
3: old_id= 3 new_id= 1
82
#include "mpi.h"
#include <math.h>
int main(argc,argv)
int argc;
char *argv[];
{
int myid, numprocs;
int color,Zero_one,new_id,new_nodes;
MPI_Comm NEW_COMM;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
color=myid % 2;
MPI_Comm_split(MPI_COMM_WORLD,color,myid,&NEW_COMM);
MPI_Comm_rank( NEW_COMM, &new_id);
MPI_Comm_size( NEW_COMM, &new_nodes);
Zero_one = -1;
if(new_id==0)Zero_one = color;
MPI_Bcast(&Zero_one,1,MPI_INT,0, NEW_COMM);
if(Zero_one==0)printf("part of even processor communicator n");
if(Zero_one==1)printf("part of odd processor communicator n");
printf("old_id= %d new_id= %dn", myid, new_id);
MPI_Finalize();
}

More Related Content

Viewers also liked

Concurrent Programming Using The Disruptor - Copenhagen
Concurrent Programming Using The Disruptor - CopenhagenConcurrent Programming Using The Disruptor - Copenhagen
Concurrent Programming Using The Disruptor - CopenhagenTrisha Gee
 
Collaborative Filtering
Collaborative FilteringCollaborative Filtering
Collaborative FilteringTayfun Sen
 
Cis017 6 revision-parallel_2015
Cis017 6 revision-parallel_2015Cis017 6 revision-parallel_2015
Cis017 6 revision-parallel_2015abdullah al-Thani
 
Open MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOFOpen MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOFJeff Squyres
 
الباب الرابع
الباب الرابعالباب الرابع
الباب الرابعtahsal99
 
نظم التشغيل تهاني
نظم التشغيل تهانينظم التشغيل تهاني
نظم التشغيل تهانيtahanisaad
 
MPI Sessions: a proposal to the MPI Forum
MPI Sessions: a proposal to the MPI ForumMPI Sessions: a proposal to the MPI Forum
MPI Sessions: a proposal to the MPI ForumJeff Squyres
 
A Identificação de Riscos Novos e Potencializados em Projetos de Tecnologia d...
A Identificação de Riscos Novos e Potencializados em Projetos de Tecnologia d...A Identificação de Riscos Novos e Potencializados em Projetos de Tecnologia d...
A Identificação de Riscos Novos e Potencializados em Projetos de Tecnologia d...Marcirio Chaves
 
A Multidomain and Multilingual Conceptual Data Model for Online Reviews Repre...
A Multidomain and Multilingual Conceptual Data Model for Online Reviews Repre...A Multidomain and Multilingual Conceptual Data Model for Online Reviews Repre...
A Multidomain and Multilingual Conceptual Data Model for Online Reviews Repre...Marcirio Chaves
 

Viewers also liked (20)

Concurrent Programming Using The Disruptor - Copenhagen
Concurrent Programming Using The Disruptor - CopenhagenConcurrent Programming Using The Disruptor - Copenhagen
Concurrent Programming Using The Disruptor - Copenhagen
 
Collaborative Filtering
Collaborative FilteringCollaborative Filtering
Collaborative Filtering
 
Admission Control Mechanism For Mpls Ds Te
Admission Control Mechanism For Mpls Ds TeAdmission Control Mechanism For Mpls Ds Te
Admission Control Mechanism For Mpls Ds Te
 
Lecture9
Lecture9Lecture9
Lecture9
 
Cis017 6 revision-parallel_2015
Cis017 6 revision-parallel_2015Cis017 6 revision-parallel_2015
Cis017 6 revision-parallel_2015
 
MPI n OpenMP
MPI n OpenMPMPI n OpenMP
MPI n OpenMP
 
Open MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOFOpen MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOF
 
الباب الرابع
الباب الرابعالباب الرابع
الباب الرابع
 
Lecture7
Lecture7Lecture7
Lecture7
 
Rasm uthmani
Rasm uthmaniRasm uthmani
Rasm uthmani
 
Lecture8
Lecture8Lecture8
Lecture8
 
Butterflies
ButterfliesButterflies
Butterflies
 
Mpi Test Suite Multi Threaded
Mpi Test Suite Multi ThreadedMpi Test Suite Multi Threaded
Mpi Test Suite Multi Threaded
 
نظم التشغيل تهاني
نظم التشغيل تهانينظم التشغيل تهاني
نظم التشغيل تهاني
 
Lecture10
Lecture10Lecture10
Lecture10
 
MPI Sessions: a proposal to the MPI Forum
MPI Sessions: a proposal to the MPI ForumMPI Sessions: a proposal to the MPI Forum
MPI Sessions: a proposal to the MPI Forum
 
A Identificação de Riscos Novos e Potencializados em Projetos de Tecnologia d...
A Identificação de Riscos Novos e Potencializados em Projetos de Tecnologia d...A Identificação de Riscos Novos e Potencializados em Projetos de Tecnologia d...
A Identificação de Riscos Novos e Potencializados em Projetos de Tecnologia d...
 
Clustering manual
Clustering manualClustering manual
Clustering manual
 
Lecture2
Lecture2Lecture2
Lecture2
 
A Multidomain and Multilingual Conceptual Data Model for Online Reviews Repre...
A Multidomain and Multilingual Conceptual Data Model for Online Reviews Repre...A Multidomain and Multilingual Conceptual Data Model for Online Reviews Repre...
A Multidomain and Multilingual Conceptual Data Model for Online Reviews Repre...
 

Similar to Lecture11

Lisandro dalcin-mpi4py
Lisandro dalcin-mpi4pyLisandro dalcin-mpi4py
Lisandro dalcin-mpi4pyA Jorge Garcia
 
Parallel programming using MPI
Parallel programming using MPIParallel programming using MPI
Parallel programming using MPIAjit Nayak
 
Programming using MPI and OpenMP
Programming using MPI and OpenMPProgramming using MPI and OpenMP
Programming using MPI and OpenMPDivya Tiwari
 
Collective Communications in MPI
 Collective Communications in MPI Collective Communications in MPI
Collective Communications in MPIHanif Durad
 
Intro to MPI
Intro to MPIIntro to MPI
Intro to MPIjbp4444
 
Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?Davide Carboni
 
CorePy High-Productivity CellB.E. Programming
CorePy High-Productivity CellB.E. ProgrammingCorePy High-Productivity CellB.E. Programming
CorePy High-Productivity CellB.E. ProgrammingSlide_N
 
Artificial Neural Networks on a Tic Tac Toe console application
Artificial Neural Networks on a Tic Tac Toe console applicationArtificial Neural Networks on a Tic Tac Toe console application
Artificial Neural Networks on a Tic Tac Toe console applicationEduardo Gulias Davis
 
MPI message passing interface
MPI message passing interfaceMPI message passing interface
MPI message passing interfaceMohit Raghuvanshi
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterSudhang Shankar
 
running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on androidKoan-Sin Tan
 
I have come code already but I cant quite get the output rig.pdf
I have come code already but I cant quite get the output rig.pdfI have come code already but I cant quite get the output rig.pdf
I have come code already but I cant quite get the output rig.pdfkashishkochhar5
 
Lecture 3
Lecture 3Lecture 3
Lecture 3Mr SMAK
 

Similar to Lecture11 (20)

Lisandro dalcin-mpi4py
Lisandro dalcin-mpi4pyLisandro dalcin-mpi4py
Lisandro dalcin-mpi4py
 
Mpi
Mpi Mpi
Mpi
 
mpi4py.pdf
mpi4py.pdfmpi4py.pdf
mpi4py.pdf
 
Parallel programming using MPI
Parallel programming using MPIParallel programming using MPI
Parallel programming using MPI
 
More mpi4py
More mpi4pyMore mpi4py
More mpi4py
 
Programming using MPI and OpenMP
Programming using MPI and OpenMPProgramming using MPI and OpenMP
Programming using MPI and OpenMP
 
Collective Communications in MPI
 Collective Communications in MPI Collective Communications in MPI
Collective Communications in MPI
 
25-MPI-OpenMP.pptx
25-MPI-OpenMP.pptx25-MPI-OpenMP.pptx
25-MPI-OpenMP.pptx
 
Intro to MPI
Intro to MPIIntro to MPI
Intro to MPI
 
Open MPI 2
Open MPI 2Open MPI 2
Open MPI 2
 
Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?Pysense: wireless sensor computing in Python?
Pysense: wireless sensor computing in Python?
 
CorePy High-Productivity CellB.E. Programming
CorePy High-Productivity CellB.E. ProgrammingCorePy High-Productivity CellB.E. Programming
CorePy High-Productivity CellB.E. Programming
 
Artificial Neural Networks on a Tic Tac Toe console application
Artificial Neural Networks on a Tic Tac Toe console applicationArtificial Neural Networks on a Tic Tac Toe console application
Artificial Neural Networks on a Tic Tac Toe console application
 
MPI message passing interface
MPI message passing interfaceMPI message passing interface
MPI message passing interface
 
Open MPI
Open MPIOpen MPI
Open MPI
 
Introduction to MPI
Introduction to MPIIntroduction to MPI
Introduction to MPI
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC cluster
 
running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
 
I have come code already but I cant quite get the output rig.pdf
I have come code already but I cant quite get the output rig.pdfI have come code already but I cant quite get the output rig.pdf
I have come code already but I cant quite get the output rig.pdf
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 

More from tt_aljobory (18)

Homework 2 sol
Homework 2 solHomework 2 sol
Homework 2 sol
 
Lecture12
Lecture12Lecture12
Lecture12
 
Lecture6
Lecture6Lecture6
Lecture6
 
Lecture5
Lecture5Lecture5
Lecture5
 
Lecture4
Lecture4Lecture4
Lecture4
 
Lecture3
Lecture3Lecture3
Lecture3
 
Lecture1
Lecture1Lecture1
Lecture1
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Good example on ga
Good example on gaGood example on ga
Good example on ga
 
Lect3
Lect3Lect3
Lect3
 
Lect4
Lect4Lect4
Lect4
 
Lect4
Lect4Lect4
Lect4
 
Above theclouds
Above thecloudsAbove theclouds
Above theclouds
 
Inet prog
Inet progInet prog
Inet prog
 
Form
FormForm
Form
 
8051 experiments1
8051 experiments18051 experiments1
8051 experiments1
 
63071507 interrupts-up
63071507 interrupts-up63071507 interrupts-up
63071507 interrupts-up
 
37471656 interrupts
37471656 interrupts37471656 interrupts
37471656 interrupts
 

Lecture11

  • 1. 1 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Point to Point Communications in MPI • Basic operations of Point to Point (PtoP) communication and issues of deadlock • Several steps are involved in the PtoP communication • Sending process – data is copied to the user buffer by the user – User calls one of the MPI send routines – System copies the data from the user buffer to the system buffer – System sends the data from the system buffer to the destination processor
  • 2. 2 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Point to Point Communications in MPI • Receiving process – User calls one of the MPI receive subroutines – System receives the data from the source process, and copies it to the system buffer – System copies the data from the system buffer to the user buffer – User uses the data in the user buffer
  • 3. 3 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE sendbuf Call send routine Now sendbuf can be reused Process 0 : User mode Kernel mode Copying data from sendbuf to systembuf Send data from sysbuf to dest data Process 1 : User mode Kernel mode Call receive routine receive data from src to systembuf Copying data from sysbuf to recvbuf sysbuf sysbuf recvbuf Now recvbuf contains valid data
  • 4. 4 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Unidirectional communication • Blocking send and blocking receive if (myrank == 0) then call MPI_Send(…) elseif (myrank == 1) then call MPI_Recv(….) endif • Non-blocking send and blocking receive if (myrank == 0) then call MPI_ISend(…) call MPI_Wait(…) else if (myrank == 1) then call MPI_Recv(….) endif
  • 5. 5 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE • Blocking send and non-blocking recv if (myrank == 0 ) then call MPI_Send(…..) elseif (myrank == 1) then call MPI_Irecv (…) call MPI_Wait(…) endif • Non-blocking send and non-blocking recv if (myrank == 0 ) then call MPI_Isend (…) call MPI_Wait (…) elseif (myrank == 1) then call MPI_Irecv (….) call MPI_Wait(..) endif Unidirectional communication
  • 6. 6 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Bidirectional communication • Need to be careful about deadlock when two processes exchange data with each other • Deadlock can occur due to incorrect order of send and recv or due to limited size of the system buffer sendbuf recvbuf Rank 0 Rank 1 recvbuf sendbuf
  • 7. 7 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Bidirectional communication • Case 1 : both processes call send first, then recv if (myrank == 0 ) then call MPI_Send(….) call MPI_Recv (…) elseif (myrank == 1) then call MPI_Send(….) call MPI_Recv(….) endif • No deadlock as long as system buffer is larger than send buffer • Deadlock if system buffer is smaller than send buf • If you replace MPI_Send with MPI_Isend and MPI_Wait, it is still the same • Moral : there may be error in coding that only shows up for larger problem size
  • 8. 8 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Bidirectional communication • Following is free from deadlock if (myrank == 0 ) then call MPI_Isend(….) call MPI_Recv (…) call MPI_Wait(…) elseif (myrank == 1) then call MPI_Isend(….) call MPI_Recv(….) call MPI_Wait(….) endif
  • 9. 9 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Bidirectional communication • Case 2 : both processes call recv first, then send if (myrank == 0 ) then call MPI_Recv(….) call MPI_Send (…) elseif (myrank == 1) then call MPI_Recv(….) call MPI_Send(….) endif • The above will always lead to deadlock (even if you replace MPI_Send with MPI_Isend and MPI_Wait)
  • 10. 10 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Bidirectional communication • The following code can be safely executed if (myrank == 0 ) then call MPI_Irecv(….) call MPI_Send (…) call MPI_Wait(…) elseif (myrank == 1) then call MPI_Irecv(….) call MPI_Send(….) call MPI_Wait(….) endif
  • 11. 11 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Bidirectional communication • Case 3 : one process call send and recv in this order, and the other calls in the opposite order if (myrank == 0 ) then call MPI_Send(….) call MPI_Recv(…) elseif (myrank == 1) then call MPI_Recv(….) call MPI_Send(….) endif • The above is always safe • You can replace both send and recv on both processor with Isend and Irecv
  • 12. 12 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Scatter and Gather A Ap0 p1 p2 p3 p0 p1 p2 p3 A A A broadcast scatterA B C D A B C D gather A B C D A B C D A B C D A B C D A B C D all gather p0 p1 p2 p3 p0 p1 p2 p3 p0 p1 p2 p3 p0 p1 p2 p3
  • 13. 13 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Scatter Operation using MPI_Scatter • Similar to Broadcast but sends a section of an array to each processors A(0) A(1) A(2) . . ………. A(N-1) P0 P1 P2 . . . Pn-1 Goes to processors: Data in an array on root node:
  • 14. 14 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE MPI_Scatter • C – int MPI_Scatter(&sendbuf, sendcnts, sendtype, &recvbuf, recvcnts, recvtype, root, comm ); • Fortran – MPI_Scatter(sendbuf,sendcnts,sendtype, recvbuf,recvcnts,recvtype,root,comm,ierror) • Parameters – sendbuf is an array of size (number processors*sendcnts) – sendcnts number of elements sent to each processor – recvcnts number of element(s) obtained from the root processor – recvbuf contains element(s) obtained from the root processor, may be an array
  • 15. 15 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Scatter Operation using MPI_Scatter • Scatter with Sendcnts = 2 A(0) A(2) A(4) . . . A(2N-2) A(1) A(3) A(5) . . . A(2N-1) P0 P1 P2 . . . Pn-1 B(0) B(0) B(0) B(0) B(1) B(1) B(1) B(1) Goes to processors: Data in an array on root node:
  • 16. 16 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Gather Operation using MPI_Gather • Used to collect data from all processors to the root, inverse of scatter • Data is collected into an array on root processor A(0) A(1) A(2) . . . A(N-1) P0 P1 P2 . . . Pn-1 A0 A1 A2 . . . An-1 Data from various Processors: Goes to an array on root node:
  • 17. 17 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE MPI_Gather • C – int MPI_Gather(&sendbuf,sendcnts, sendtype, &recvbuf, recvcnts,recvtype,root, comm ); • Fortran – MPI_Gather(sendbuf,sendcnts,sendtype, recvbuf,recvcnts,recvtype,root,comm,ierror) • Parameters – sendcnts number of elements sent from each processor – sendbuf is an array of size sendcnts – recvcnts number of elements obtained from each processor – recvbuf of size recvcnts*number of processors
  • 18. 18 San DIEGO SUPERCOMPUTER CENTER NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Code for Scatter and Gather • A parallel program to scatter data using MPI_Scatter • Each processor sums the data • Use MPI_Gather to get the data back to the root processor • Root processor prints the global data • See attached Fortran and C code
19
module mpi
!DEC$ NOFREEFORM
      include "mpif.h"
!DEC$ FREEFORM
end module
! This program shows how to use MPI_Scatter and MPI_Gather
! Each processor gets different data from the root processor
! by way of mpi_scatter. The data is summed and then sent back
! to the root processor using MPI_Gather. The root processor
! then prints the global sum.
module global
      integer numnodes,myid,mpi_err
      integer, parameter :: mpi_root=0
end module
subroutine init
      use mpi
      use global
      implicit none
! do the mpi init stuff
      call MPI_INIT( mpi_err )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, numnodes, mpi_err )
      call MPI_Comm_rank(MPI_COMM_WORLD, myid, mpi_err)
20
end subroutine init

program test1
      use mpi
      use global
      implicit none
      integer, allocatable :: myray(:),send_ray(:),back_ray(:)
      integer count
      integer size,mysize,i,k,j,total
      call init
! each processor will get count elements from the root
      count=4
      allocate(myray(count))
! create the data to be sent on the root
      if(myid == mpi_root)then
         size=count*numnodes
         allocate(send_ray(0:size-1))
         allocate(back_ray(0:numnodes-1))
         do i=0,size-1
            send_ray(i)= i
         enddo
      endif
21
      call MPI_Scatter( send_ray, count, MPI_INTEGER, &
                        myray,    count, MPI_INTEGER, &
                        mpi_root, MPI_COMM_WORLD, mpi_err)
! each processor does a local sum
      total=sum(myray)
      write(*,*)"myid= ",myid," total= ",total
! send the local sums back to the root
      call MPI_Gather( total,    1, MPI_INTEGER, &
                       back_ray, 1, MPI_INTEGER, &
                       mpi_root, MPI_COMM_WORLD, mpi_err)
! the root prints the global sum
      if(myid == mpi_root)then
         write(*,*)"results from all processors= ",sum(back_ray)
      endif
      call mpi_finalize(mpi_err)
end program
22
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
/*! This program shows how to use MPI_Scatter and MPI_Gather
! Each processor gets different data from the root processor
! by way of mpi_scatter. The data is summed and then sent back
! to the root processor using MPI_Gather. The root processor
! then prints the global sum. */
/* globals */
int numnodes,myid,mpi_err;
#define mpi_root 0
/* end globals */

void init_it(int *argc, char ***argv);

void init_it(int *argc, char ***argv) {
    mpi_err = MPI_Init(argc,argv);
    mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
    mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
}
23
int main(int argc,char *argv[]){
    int *myray,*send_ray,*back_ray;
    int count;
    int size,mysize,i,k,j,total;
    init_it(&argc,&argv);
/* each processor will get count elements from the root */
    count=4;
    myray=(int*)malloc(count*sizeof(int));
/* create the data to be sent on the root */
    if(myid == mpi_root){
        size=count*numnodes;
        send_ray=(int*)malloc(size*sizeof(int));
        back_ray=(int*)malloc(numnodes*sizeof(int));
        for(i=0;i<size;i++)
            send_ray[i]=i;
    }
/* send different data to each processor */
24
    mpi_err = MPI_Scatter( send_ray, count, MPI_INT,
                           myray,    count, MPI_INT,
                           mpi_root, MPI_COMM_WORLD);
/* each processor does a local sum */
    total=0;
    for(i=0;i<count;i++)
        total=total+myray[i];
    printf("myid= %d total= %d\n",myid,total);
/* send the local sums back to the root */
    mpi_err = MPI_Gather(&total,   1, MPI_INT,
                         back_ray, 1, MPI_INT,
                         mpi_root, MPI_COMM_WORLD);
/* the root prints the global sum */
    if(myid == mpi_root){
        total=0;
        for(i=0;i<numnodes;i++)
            total=total+back_ray[i];
        printf("results from all processors= %d\n",total);
    }
    mpi_err = MPI_Finalize();
}
25
Output of previous code on 4 procs
ultra:/work/majumdar/examples/mpi % bsub -q hpc -m ultra -I -n 4 ./a.out
Job <48051> is submitted to queue <hpc>.
<<Waiting for dispatch ...>>
<<Starting on ultra>>
myid= 1 total= 22
myid= 2 total= 38
myid= 3 total= 54
myid= 0 total= 6
results from all processors= 120
( 0 through 15 added up = (15)(15 + 1)/2 = 120 )
26
Global Sum with MPI_Reduce
• 2d array spread across processors
• Before: node i (i = 0, 1, 2) holds one column in X: X(0)=Ai, X(1)=Bi, X(2)=Ci
• After the reduce: the root (node 0) holds the element-wise sums X(0)=A0+A1+A2, X(1)=B0+B1+B2, X(2)=C0+C1+C2
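• The slide shows the data movement but not the call itself; here is a minimal, hedged C sketch of MPI_Reduce matching the picture (a 3-element vector summed element-wise onto the root). Variable names and values are illustrative assumptions.

#include <mpi.h>
#include <stdio.h>

/* Sketch: every rank contributes a 3-element vector X; MPI_Reduce with
   MPI_SUM leaves the element-wise sums in result[] on the root only. */
int main(int argc, char *argv[]) {
    int myid;
    double X[3], result[3];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    X[0] = myid + 1.0;     /* stand-ins for Ai, Bi, Ci on this rank */
    X[1] = myid + 10.0;
    X[2] = myid + 100.0;

    MPI_Reduce(X, result, 3, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0)
        printf("sums on root: %g %g %g\n", result[0], result[1], result[2]);

    MPI_Finalize();
    return 0;
}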
27
MPI_Allgather and MPI_Allreduce
• Gather and Reduce come in an "ALL" variation
• Results are returned to all processors
• The root parameter is missing from the call
• Similar to a gather or reduce followed by a broadcast
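• For concreteness (not from the original slides), a minimal C sketch of the "ALL" variant: the same element-wise sum as the MPI_Reduce sketch above, but with MPI_Allreduce there is no root argument and every rank receives the result. Names are illustrative.

#include <mpi.h>
#include <stdio.h>

/* Sketch: element-wise sum of a 3-element vector, result on every rank. */
int main(int argc, char *argv[]) {
    int myid;
    double X[3], result[3];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    X[0] = myid + 1.0;
    X[1] = myid + 10.0;
    X[2] = myid + 100.0;

    MPI_Allreduce(X, result, 3, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d sums: %g %g %g\n", myid, result[0], result[1], result[2]);

    MPI_Finalize();
    return 0;
}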
28
Global Sum with MPI_Allreduce
• 2d array spread across processors
• Before: node i (i = 0, 1, 2) holds X(0)=Ai, X(1)=Bi, X(2)=Ci
• After the allreduce: every node holds X(0)=A0+A1+A2, X(1)=B0+B1+B2, X(2)=C0+C1+C2
29
All to All communication with MPI_Alltoall
• Each processor sends and receives data to/from all others
• C
  – int MPI_Alltoall(&sendbuf, sendcnts, sendtype, &recvbuf, recvcnts, recvtype, comm);
• Fortran
  – call MPI_Alltoall(sendbuf, sendcnts, sendtype, recvbuf, recvcnts, recvtype, comm, ierror)
30
MPI_Alltoall (data movement)
• Before: P0 holds a0 a1 a2 a3, P1 holds b0 b1 b2 b3, P2 holds c0 c1 c2 c3, P3 holds d0 d1 d2 d3
• After:  P0 holds a0 b0 c0 d0, P1 holds a1 b1 c1 d1, P2 holds a2 b2 c2 d2, P3 holds a3 b3 c3 d3
31
All to All with MPI_Alltoall
• Parameters
  – sendcnts # of elements sent to each processor
  – sendbuf is an array of size sendcnts*number of processors
  – recvcnts # of elements obtained from each processor
  – recvbuf is an array of size recvcnts*number of processors
• Note that in the example below sendcnts = recvcnts = 1, so both the send buffer and the receive buffer are arrays of size equal to the number of processors
• See attached Fortran and C codes
32
module mpi
!DEC$ NOFREEFORM
      include "mpif.h"
!DEC$ FREEFORM
end module
! This program shows how to use MPI_Alltoall. Each processor
! send/rec a different random number to/from other processors.
module global
      integer numnodes,myid,mpi_err
      integer, parameter :: mpi_root=0
end module
subroutine init
      use mpi
      use global
      implicit none
! do the mpi init stuff
      call MPI_INIT( mpi_err )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, numnodes, mpi_err )
      call MPI_Comm_rank(MPI_COMM_WORLD, myid, mpi_err)
end subroutine init
33
program test1
      use mpi
      use global
      implicit none
      integer, allocatable :: scounts(:),rcounts(:)
      integer ssize,rsize,i,k,j
      real z
      call init
! counts and displacement arrays
      allocate(scounts(0:numnodes-1))
      allocate(rcounts(0:numnodes-1))
      call seed_random
! find data to send
      do i=0,numnodes-1
         call random_number(z)
         scounts(i)=nint(10.0*z)+1
      enddo
      write(*,*)"myid= ",myid," scounts= ",scounts
34
! send the data
      call MPI_alltoall( scounts,1,MPI_INTEGER, &
                         rcounts,1,MPI_INTEGER, &
                         MPI_COMM_WORLD,mpi_err)
      write(*,*)"myid= ",myid," rcounts= ",rcounts
      call mpi_finalize(mpi_err)
end program

subroutine seed_random
      use global
      implicit none
      integer the_size,j
      integer, allocatable :: seed(:)
      real z
      call random_seed(size=the_size)  ! how big is the intrinsic seed?
      allocate(seed(the_size))         ! allocate space for seed
      do j=1,the_size                  ! create the seed
         seed(j)=abs(myid*10)+(j*myid*myid)+100  ! abs is generic
      enddo
      call random_seed(put=seed)       ! assign the seed
      deallocate(seed)
end subroutine
35
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
/*! This program shows how to use MPI_Alltoall. Each processor
! send/rec a different random number to/from other processors. */
/* globals */
int numnodes,myid,mpi_err;
#define mpi_root 0
/* end module */

void init_it(int *argc, char ***argv);
void seed_random(int id);
void random_number(float *z);

void init_it(int *argc, char ***argv) {
    mpi_err = MPI_Init(argc,argv);
    mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
    mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
}
36
int main(int argc,char *argv[]){
    int *sray,*rray;
    int *scounts,*rcounts;
    int ssize,rsize,i,k,j;
    float z;
    init_it(&argc,&argv);
    scounts=(int*)malloc(sizeof(int)*numnodes);
    rcounts=(int*)malloc(sizeof(int)*numnodes);
/*! seed the random number generator with a
! different number on each processor */
    seed_random(myid);
/* find data to send */
    for(i=0;i<numnodes;i++){
        random_number(&z);
        scounts[i]=(int)(10.0*z)+1;
    }
    printf("myid= %d scounts=",myid);
    for(i=0;i<numnodes;i++)
        printf("%d ",scounts[i]);
    printf("\n");
37
/* send the data */
    mpi_err = MPI_Alltoall( scounts,1,MPI_INT,
                            rcounts,1,MPI_INT,
                            MPI_COMM_WORLD);
    printf("myid= %d rcounts=",myid);
    for(i=0;i<numnodes;i++)
        printf("%d ",rcounts[i]);
    printf("\n");
    mpi_err = MPI_Finalize();
}

void seed_random(int id){
    srand((unsigned int)id);
}

void random_number(float *z){
    int i;
    i=rand();
    *z=(float)i/32767;
}
38
Output of previous code on 4 procs
ultra:/work/majumdar/examples/mpi % bsub -q hpc -m ultra -I -n 4 a.out
Job <48059> is submitted to queue <hpc>.
<<Waiting for dispatch ...>>
<<Starting on ultra>>
myid= 1 scounts= 6 2 4 6
myid= 1 rcounts= 7 2 7 3
myid= 2 scounts= 1 7 4 4
myid= 2 rcounts= 4 4 4 4
myid= 3 scounts= 6 3 4 3
myid= 3 rcounts= 7 6 4 3
myid= 0 scounts= 1 7 4 7
myid= 0 rcounts= 1 6 1 6
--------------------------------------------
Send matrix (row i = scounts on rank i):   Receive matrix (row i = rcounts on rank i):
1 7 4 7                                     1 6 1 6
6 2 4 6                                     7 2 7 3
1 7 4 4                                     4 4 4 4
6 3 4 3                                     7 6 4 3
39
The variable or "V" operators
• A collection of very powerful but difficult to set up global communication routines
• MPI_Gatherv: Gather different amounts of data from each processor to the root processor
• MPI_Alltoallv: Send and receive different amounts of data from all processors
• MPI_Allgatherv: Gather different amounts of data from each processor and send all data to each
• MPI_Scatterv: Send different amounts of data to each processor from the root processor
• We discuss MPI_Gatherv and MPI_Alltoallv; a minimal MPI_Scatterv sketch follows below
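• MPI_Scatterv is listed above but not covered further in these slides. As a hedged illustration of the "V" idea (not from the original deck), here is a minimal C sketch in which the root sends i+1 integers to rank i using per-rank counts and displacements; all names are illustrative assumptions.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: root scatters i+1 ints to rank i, mirroring the Gatherv
   example later in the slides but in the opposite direction. */
int main(int argc, char *argv[]) {
    int myid, numnodes, i, mycount;
    int *sendbuf = NULL, *counts = NULL, *displs = NULL, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &numnodes);

    mycount = myid + 1;
    recvbuf = malloc(mycount * sizeof(int));

    if (myid == 0) {   /* counts, displacements and sendbuf matter only on the root */
        counts = malloc(numnodes * sizeof(int));
        displs = malloc(numnodes * sizeof(int));
        for (i = 0; i < numnodes; i++) {
            counts[i] = i + 1;
            displs[i] = (i == 0) ? 0 : displs[i-1] + counts[i-1];
        }
        sendbuf = malloc((displs[numnodes-1] + counts[numnodes-1]) * sizeof(int));
        for (i = 0; i < displs[numnodes-1] + counts[numnodes-1]; i++) sendbuf[i] = i;
    }

    MPI_Scatterv(sendbuf, counts, displs, MPI_INT,
                 recvbuf, mycount, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d received %d ints, first = %d\n", myid, mycount, recvbuf[0]);

    MPI_Finalize();
    return 0;
}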
40
MPI_Gatherv
• C
  – int MPI_Gatherv(&sendbuf, sendcnts, sendtype, &recvbuf, &recvcnts, &rdispls, recvtype, root, comm);
• Fortran
  – MPI_Gatherv(sendbuf, sendcnts, sendtype, recvbuf, recvcnts, rdispls, recvtype, root, comm, ierror)
• Parameters:
  – recvcnts is now an array
  – rdispls is an array of displacements into recvbuf
• See attached codes
41
MPI_Gatherv (data layout)
• sendbuf on each rank: rank 0 (the root) holds {1}, rank 1 holds {2, 2}, rank 2 holds {3, 3, 3}
• On the root: recvcnts = {1, 2, 3} and rdispls = {0, 1, 3}
• recvbuf on the root after the gather (positions 0–5): 1 2 2 3 3 3
42
MPI_Gatherv code
Sample program:
      include 'mpif.h'
      integer isend(3), irecv(6)
      integer ircnt(0:2), idisp(0:2)
      data ircnt/1,2,3/, idisp/0,1,3/
      call mpi_init(ierr)
      call mpi_comm_size(MPI_COMM_WORLD,nprocs,ierr)
      call mpi_comm_rank(MPI_COMM_WORLD,myrank,ierr)
      do i = 1,myrank+1
         isend(i) = myrank+1
      enddo
      iscnt = myrank + 1
      call MPI_GATHERV(isend,iscnt,MPI_INTEGER,irecv,ircnt,idisp,MPI_INTEGER, &
                       0,MPI_COMM_WORLD,ierr)
      if (myrank .eq. 0) then
         print *, 'irecv =', irecv
      endif
      call MPI_FINALIZE(ierr)
      end

Sample execution:
% bsub -q hpc -m ultra -I -n 3 ./a.out
% 0: irecv = 1 2 2 3 3 3
43
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
/*! This program shows how to use MPI_Gatherv. Each processor sends a
! different amount of data to the root processor. We use MPI_Gather
! first to tell the root how much data is going to be sent. */
/* globals */
int numnodes,myid,mpi_err;
#define mpi_root 0
/* end of globals */

void init_it(int *argc, char ***argv);

void init_it(int *argc, char ***argv) {
    mpi_err = MPI_Init(argc,argv);
    mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
    mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
}
44
int main(int argc,char *argv[]){
    int *will_use;
    int *myray,*displacements,*counts,*allray;
    int size,mysize,i;
    init_it(&argc,&argv);
    mysize=myid+1;
    myray=(int*)malloc(mysize*sizeof(int));
    for(i=0;i<mysize;i++)
        myray[i]=myid+1;
/* counts and displacement arrays are only required on the root */
    if(myid == mpi_root){
        counts=(int*)malloc(numnodes*sizeof(int));
45
        displacements=(int*)malloc(numnodes*sizeof(int));
    }
/* we gather the counts to the root */
    mpi_err = MPI_Gather((void*)myray,  1, MPI_INT,
                         (void*)counts, 1, MPI_INT,
                         mpi_root, MPI_COMM_WORLD);
/* calculate displacements and the size of the recv array */
    if(myid == mpi_root){
        displacements[0]=0;
        for(i=1;i<numnodes;i++){
            displacements[i]=counts[i-1]+displacements[i-1];
        }
        size=0;
        for(i=0;i<numnodes;i++)
            size=size+counts[i];
        allray=(int*)malloc(size*sizeof(int));
    }
46
/* different amounts of data from each processor */
/* are gathered to the root */
    mpi_err = MPI_Gatherv(myray, mysize, MPI_INT,
                          allray, counts, displacements, MPI_INT,
                          mpi_root, MPI_COMM_WORLD);
    if(myid == mpi_root){
        for(i=0;i<size;i++)
            printf("%d ",allray[i]);
        printf("\n");
    }
    mpi_err = MPI_Finalize();
}

ultra% bsub -q hpc -m ultra -I -n 3 ./a.out
1 2 2 3 3 3
47
MPI_Alltoallv
• Send and receive different amounts of data from all processors
• C
  – int MPI_Alltoallv(&sendbuf, &sendcnts, &sdispls, sendtype, &recvbuf, &recvcnts, &rdispls, recvtype, comm);
• Fortran
  – call MPI_Alltoallv(sendbuf, sendcnts, sdispls, sendtype, recvbuf, recvcnts, rdispls, recvtype, comm, ierror)
• See attached code
48
MPI_Alltoallv (data layout)
• sendbuf on every rank holds 6 elements; sendcnts = {1, 2, 3} and sdispls = {0, 1, 3} on every rank
  – rank 0 sendbuf: 1 2 2 3 3 3
  – rank 1 sendbuf: 4 5 5 6 6 6
  – rank 2 sendbuf: 7 8 8 9 9 9
• After the call, rank i has received i+1 elements from each rank (recvcnts and rdispls as on the next slide)
  – rank 0 recvbuf: 1 4 7
  – rank 1 recvbuf: 2 2 5 5 8 8
  – rank 2 recvbuf: 3 3 3 6 6 6 9 9 9
49
MPI_Alltoallv
• recvcnts(i) on each processor (columns = processor 0, 1, 2):
  i = 0 :  1  2  3
  i = 1 :  1  2  3
  i = 2 :  1  2  3
• rdispls(i) on each processor (columns = processor 0, 1, 2):
  i = 0 :  0  0  0
  i = 1 :  1  2  3
  i = 2 :  2  4  6
50
MPI_Alltoallv
      program alltoallv
      include 'mpif.h'
      integer isend(6), irecv(9)
      integer iscnt(0:2), isdsp(0:2), ircnt(0:2), irdsp(0:2)
      data isend/1,2,2,3,3,3/
      data iscnt/1,2,3/, isdsp/0,1,3/
      call MPI_INIT(ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD,myrank,ierr)
      do i = 1,6
         isend(i) = isend(i) + nprocs*myrank
      enddo
      do i = 0, nprocs - 1
         ircnt(i) = myrank + 1
         irdsp(i) = i*(myrank + 1)
      enddo
      print*, 'isend=', isend
      call MP_FLUSH(1)
      call MPI_ALLTOALLV(isend,iscnt,isdsp,MPI_INTEGER,irecv,ircnt, &
                         irdsp,MPI_INTEGER,MPI_COMM_WORLD,ierr)
      print*, 'irecv=', irecv
      call MPI_FINALIZE(ierr)
      end
51
MPI_Alltoallv
Sample execution of the alltoallv program:
% bsub -q hpc -m ultra -I -n 3
% 0: isend = 1 2 2 3 3 3
  1: isend = 4 5 5 6 6 6
  2: isend = 7 8 8 9 9 9
  0: irecv = 1 4 7 0 0 0 0 0 0
  1: irecv = 2 2 5 5 8 8 0 0 0
  2: irecv = 3 3 3 6 6 6 9 9 9
52
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
/*
! This program shows how to use MPI_Alltoallv. Each processor
! send/rec a different and random amount of data to/from other
! processors.
! We use MPI_Alltoall to tell how much data is going to be sent.
*/
/* globals */
int numnodes,myid,mpi_err;
#define mpi_root 0
/* end module */
53
void seed_random(int id);
void random_number(float *z);

void init_it(int *argc, char ***argv) {
    mpi_err = MPI_Init(argc,argv);
    mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes );
    mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
}

int main(int argc,char *argv[]){
    int *sray,*rray;
    int *sdisp,*scounts,*rdisp,*rcounts;
    int ssize,rsize,i,k,j;
    float z;
    init_it(&argc,&argv);
    scounts=(int*)malloc(sizeof(int)*numnodes);
    rcounts=(int*)malloc(sizeof(int)*numnodes);
    sdisp=(int*)malloc(sizeof(int)*numnodes);
    rdisp=(int*)malloc(sizeof(int)*numnodes);
/*
54
! seed the random number generator with a
! different number on each processor */
    seed_random(myid);
/* find out how much data to send */
    for(i=0;i<numnodes;i++){
        random_number(&z);
        scounts[i]=(int)(10.0*z)+1;
    }
    printf("myid= %d scounts=",myid);
    for(i=0;i<numnodes;i++)
        printf("%d ",scounts[i]);
    printf("\n");
/* tell the other processors how much data is coming */
    mpi_err = MPI_Alltoall( scounts,1,MPI_INT,
                            rcounts,1,MPI_INT,
                            MPI_COMM_WORLD);
55
/* write(*,*)"myid= ",myid," rcounts= ",rcounts */
/* calculate displacements and the size of the arrays */
    sdisp[0]=0;
    for(i=1;i<numnodes;i++){
        sdisp[i]=scounts[i-1]+sdisp[i-1];
    }
    rdisp[0]=0;
    for(i=1;i<numnodes;i++){
        rdisp[i]=rcounts[i-1]+rdisp[i-1];
    }
    ssize=0;
    rsize=0;
    for(i=0;i<numnodes;i++){
        ssize=ssize+scounts[i];
        rsize=rsize+rcounts[i];
    }
56
/* allocate send and rec arrays */
    sray=(int*)malloc(sizeof(int)*ssize);
    rray=(int*)malloc(sizeof(int)*rsize);
    for(i=0;i<ssize;i++)
        sray[i]=myid;
/* send/rec different amounts of data to/from each processor */
    mpi_err = MPI_Alltoallv( sray,scounts,sdisp,MPI_INT,
                             rray,rcounts,rdisp,MPI_INT,
                             MPI_COMM_WORLD);
    printf("myid= %d rray=",myid);
    for(i=0;i<rsize;i++)
        printf("%d ",rray[i]);
    printf("\n");
    mpi_err = MPI_Finalize();
}
57
void seed_random(int id)
{
    srand((unsigned int)id);
}

void random_number(float *z){
    int i;
    i=rand();
    *z=(float)i/32767;
}

Ultra output from a 3-processor run:
0: myid= 0 scounts=1 7 4
0: myid= 0 rray=0 1 1 1 1 1 1 2
1: myid= 1 scounts=6 2 4
1: myid= 1 rray=0 0 0 0 0 0 0 1 1 2 2 2 2 2 2 2
2: myid= 2 scounts=1 7 4
2: myid= 2 rray=0 0 0 0 1 1 1 1 2 2 2 2
58
Derived types
• C and Fortran 90 have the ability to define arbitrary data types that encapsulate reals, integers, and characters
• MPI allows you to define message data types corresponding to your data types
• These data types can be used just like the default types
59
Derived types, three main classifications:
• Contiguous vectors: enable you to send contiguous blocks of the same type of data lumped together
• Noncontiguous vectors: enable you to send noncontiguous blocks of the same type of data lumped together
• Abstract types: enable you to (carefully) send C or Fortran 90 structures; don't send pointers
60
Derived types, how to use them
• Three step process
  – Define the type using
    • MPI_Type_contiguous for contiguous vectors
    • MPI_Type_vector for noncontiguous vectors
    • MPI_Type_struct for structures
  – Commit the type using
    • MPI_Type_commit
  – Use in normal communication calls
    • MPI_Send(buffer, count, MY_TYPE, destination, tag, MPI_COMM_WORLD, ierr)
61
MPI_Type_contiguous
• Defines a new data type of length count elements from your old data type
• C
  – MPI_Type_contiguous(int count, old_type, &new_type)
• Fortran
  – Call MPI_TYPE_CONTIGUOUS(count, old_type, new_type, ierror)
• Parameters
  – old_type: your base type
  – new_type: a type of count elements of old_type
• See attached codes
62
MPI_TYPE_CONTIGUOUS
Sample program - Fortran:
      program type_contiguous
      include 'mpif.h'
      integer ibuf(20)
      call MPI_INIT(ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD,myrank,ierr)
      if (myrank .eq. 0) then
         do i = 1,20
            ibuf(i) = i
         enddo
      endif
      call MPI_TYPE_CONTIGUOUS(3,MPI_INTEGER,inewtype,ierr)
      call MPI_TYPE_COMMIT(inewtype,ierr)
      call MPI_BCAST(ibuf,3,inewtype,0,MPI_COMM_WORLD,ierr)
      print*, 'ibuf=',ibuf
      call MPI_FINALIZE(ierr)
      end

Sample execution:
% bsub -q hpc -m ultra -I -n 2 a.out
% 0: ibuf = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  1: ibuf = 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 0
63
MPI_Type_contiguous
#include <stdio.h>
#include "mpi.h"
#include <math.h>
int main(argc,argv)
int argc;
char *argv[];
{
    int myid, numprocs, i, buffer[20];
    MPI_Status status;
    MPI_Datatype inewtype;
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    if (myid == 0) {
        for (i=0; i<20; i++) buffer[i]=i;
    }
    if (myid == 1) {
        for (i=0; i<20; i++) buffer[i]=0;
    }
    MPI_Type_contiguous(3,MPI_INT,&inewtype);
    MPI_Type_commit(&inewtype);
    MPI_Bcast(buffer,3,inewtype,0,MPI_COMM_WORLD);
    for(i=0;i<20;i++) printf("%d ",buffer[i]);
    printf("\n");
    MPI_Finalize();
}

Output on two processors:
0 1 2 3 4 5 6 7 8 0 0 0 0 0 0 0 0 0 0 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
64
MPI_Type_vector
• Defines a datatype which consists of count blocks, each of length blocklength, with a stride of stride elements between the starts of consecutive blocks
• C
  – MPI_Type_vector(count, blocklength, stride, old_type, *new_type)
• Fortran
  – Call MPI_TYPE_VECTOR(count, blocklength, stride, old_type, new_type, ierror)
• See attached codes
65
module mpi
!DEC$ NOFREEFORM
      include "mpif.h"
!DEC$ FREEFORM
end module
!Shows how to use MPI_Type_vector to send noncontiguous blocks of data
!and MPI_Get_count and MPI_Get_elements to see the number of elements sent
program do_vect
      use mpi
!     include "mpif.h"
      integer, parameter :: size=24
      integer myid, ierr, numprocs
      real*8 svect(0:size), rvect(0:size)
      integer i, bonk1, bonk2, numx, stride, extent
      integer MY_TYPE
      integer status(MPI_STATUS_SIZE)
      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
      stride=5
      numx=(size+1)/stride
66
      extent = 1
      if(myid == 1)write(*,*)"numx=",numx," extent=",extent," stride=",stride
      call MPI_Type_vector(numx,extent,stride,MPI_DOUBLE_PRECISION,MY_TYPE,ierr)
      call MPI_Type_commit(MY_TYPE, ierr )
      if(myid == 0)then
         do i=0,size
            svect(i)=i
         enddo
         call MPI_Send(svect,1,MY_TYPE,1,100,MPI_COMM_WORLD,ierr)
      endif
      if(myid == 1)then
         do i=0,size
            rvect(i)=-1
         enddo
         call MPI_Recv(rvect,1,MY_TYPE,0,100,MPI_COMM_WORLD,status,ierr)
      endif
      if(myid == 1)then
         call MPI_Get_count(status,MY_TYPE,bonk1, ierr )
         call MPI_Get_elements(status,MPI_DOUBLE_PRECISION,bonk2,ierr)
67
         write(*,*)"got ", bonk1," elements of type MY_TYPE"
         write(*,*)"which contained ", bonk2," elements of type MPI_DOUBLE_PRECISION"
         do i=0,size
            if(rvect(i) /= -1)write(*,'(i2,f4.0)')i,rvect(i)
         enddo
      endif
      call MPI_Finalize(ierr)
end program
! output
! numx= 5 extent= 1 stride= 5
! got 1 elements of type MY_TYPE
! which contained 5 elements of type MPI_DOUBLE_PRECISION
!  0  0.
!  5  5.
! 10 10.
! 15 15.
! 20 20.
68
/* Shows how to use MPI_Type_vector to send noncontiguous blocks of data
   and MPI_Get_count and MPI_Get_elements to see the number of elements sent */
#include <stdio.h>
#include "mpi.h"
#include <math.h>
int main(argc,argv)
int argc;
char *argv[];
{
    int myid, numprocs, mpi_err;
#define SIZE 25
    double svect[SIZE], rvect[SIZE];
    int i, bonk1, bonk2, numx, stride, extent;
    MPI_Datatype MPI_LEFT_RITE;
    MPI_Status status;
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
69
    stride=5;
    numx=(SIZE+1)/stride;
    extent=1;
    if(myid == 1){
        printf("numx=%d extent=%d stride=%d\n",numx,extent,stride);
    }
    mpi_err=MPI_Type_vector(numx,extent,stride,MPI_DOUBLE,&MPI_LEFT_RITE);
    mpi_err=MPI_Type_commit(&MPI_LEFT_RITE);
    if(myid == 0){
        for (i=0;i<SIZE;i++) svect[i]=i;
        MPI_Send(svect,1,MPI_LEFT_RITE,1,100,MPI_COMM_WORLD);
    }
    if(myid == 1){
        for (i=0;i<SIZE;i++) rvect[i]=-1;
        MPI_Recv(rvect,1,MPI_LEFT_RITE,0,100,MPI_COMM_WORLD,&status);
    }
70
    if(myid == 1){
        MPI_Get_count(&status,MPI_LEFT_RITE,&bonk1);
        MPI_Get_elements(&status,MPI_DOUBLE,&bonk2);
        printf("got %d elements of type MY_TYPE\n",bonk1);
        printf("which contained %d elements of type MPI_DOUBLE\n",bonk2);
        for (i=0;i<SIZE;i++)
            if(rvect[i] != -1) printf("%d %g\n",i,rvect[i]);
    }
    MPI_Finalize();
}
/* output
numx=5 extent=1 stride=5
got 1 elements of type MY_TYPE
which contained 5 elements of type MPI_DOUBLE
0 0
5 5
10 10
15 15
20 20
*/
71
MPI_Type_struct
• Defines an MPI datatype which maps to a user defined derived datatype
• C
  – int MPI_Type_struct(count, &array_of_blocklengths, &array_of_displacements, &array_of_types, &newtype);
• Fortran
  – Call MPI_TYPE_STRUCT(count, array_of_blocklengths, array_of_displacements, array_of_types, newtype, ierror)
72
MPI_Type_struct
• Parameters:
  – [IN count] # of old types in the new type (integer)
  – [IN array_of_blocklengths] how many of each type in the new structure (integer)
  – [IN array_of_types] the types in the new structure (integer)
  – [IN array_of_displacements] offset in bytes for the beginning of each group of types (integer)
  – [OUT newtype] new datatype (handle)
• Call MPI_TYPE_STRUCT(count, array_of_blocklengths, array_of_displacements, array_of_types, newtype, ierror)
• ierr = MPI_Type_struct(count, &array_of_blocklengths, &array_of_displacements, &array_of_types, &newtype);
73
Derived Data type Example
• Consider a data type or structure consisting of
  – 3 MPI doubles
  – 10 MPI integers
  – 2 MPI characters
• Creating the MPI data structure matching this C/Fortran structure is a three step process
  – Fill the descriptor arrays:
    • B - blocklengths
    • T - types
    • D - displacements
  – Use MPI_Type_struct to create the MPI data structure
  – Commit the new data type using MPI_Type_commit
74
Derived Data type Example
• To create the MPI data structure matching this C/Fortran structure
  – Fill the descriptor arrays:
    • B - blocklengths
    • T - types
    • D - displacements
  – Then use MPI_Type_struct
75
Derived Data type Example (continued)
Fortran:
! t contains the types that
! make up the structure
      t(1)=MPI_DOUBLE_PRECISION
      t(2)=MPI_INTEGER
      t(3)=MPI_CHARACTER
! b contains the # of each type
      b(1)=3;b(2)=10;b(3)=2
! d contains the byte offset of
! the start of each type
      d(1)=0;d(2)=24;d(3)=64
      call MPI_TYPE_STRUCT(3,b,d,t,MPI_CHARLES,mpi_err)
MPI_CHARLES is our new data type

C:
/* t contains the types that make up the structure */
t[0]=MPI_DOUBLE;
t[1]=MPI_INT;
t[2]=MPI_CHAR;
/* b contains the # of each type */
b[0]=3; b[1]=10; b[2]=2;
/* d contains the byte offset of the start of each type */
d[0]=0; d[1]=24; d[2]=64;
ierr = MPI_Type_struct(3,b,d,t,&MPI_CHARLES);
76
MPI_Type_commit
• Before we use the new data type we call MPI_Type_commit
• C
  – MPI_Type_commit(&MPI_CHARLES)
• Fortran
  – Call MPI_Type_commit(MPI_CHARLES,ierr)
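• The preceding slides describe the three steps but do not show a complete program; here is a hedged C sketch (not from the original deck) that assembles them for the 3-double/10-int/2-char structure. The struct name, the use of offsetof for the displacements, and the broadcast are illustrative assumptions; MPI_Type_create_struct is the modern spelling of the MPI_Type_struct call used in the slides.

#include <mpi.h>
#include <stddef.h>   /* offsetof */
#include <stdio.h>

/* Sketch: build and commit an MPI type for a struct holding
   3 doubles, 10 ints and 2 chars, then broadcast one instance. */
typedef struct {
    double d[3];
    int    i[10];
    char   c[2];
} charles_t;

int main(int argc, char *argv[]) {
    int myid;
    charles_t item;
    int          b[3] = {3, 10, 2};                       /* blocklengths  */
    MPI_Datatype t[3] = {MPI_DOUBLE, MPI_INT, MPI_CHAR};  /* types         */
    MPI_Aint     d[3] = {offsetof(charles_t, d),          /* displacements */
                         offsetof(charles_t, i),
                         offsetof(charles_t, c)};
    MPI_Datatype MPI_CHARLES;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    MPI_Type_create_struct(3, b, d, t, &MPI_CHARLES);
    MPI_Type_commit(&MPI_CHARLES);

    if (myid == 0) item.i[0] = 42;            /* something to send */
    MPI_Bcast(&item, 1, MPI_CHARLES, 0, MPI_COMM_WORLD);
    printf("rank %d sees i[0] = %d\n", myid, item.i[0]);

    MPI_Type_free(&MPI_CHARLES);
    MPI_Finalize();
    return 0;
}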
77
Communicators
• A communicator is a parameter in all MPI message passing routines
• A communicator is a collection of processors that can engage in communication
• MPI_COMM_WORLD is the default communicator that consists of all processors
• MPI allows you to create subsets of communicators
78
Why Communicators?
• Isolate communication to a small number of processors
• Useful for creating libraries
• Different processors can work on different parts of the problem
• Useful for communicating with "nearest neighbors"
79
MPI_Comm_split
• Provides a shortcut method to create a collection of communicators
• All processors with the same "color" will be in the same communicator
• Index gives the rank in the new communicator
• Fortran
  – call MPI_COMM_SPLIT(OLD_COMM, color, index, NEW_COMM, mpi_err)
• C
  – MPI_Comm_split(OLD_COMM, color, index, &NEW_COMM)
80
MPI_Comm_split
• Split odd and even processors into 2 communicators
      program comm_split
      include "mpif.h"
      integer color,zero_one
      call MPI_INIT( mpi_err )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, numnodes, mpi_err )
      call MPI_COMM_RANK( MPI_COMM_WORLD, myid, mpi_err )
      color=mod(myid,2)  ! color is either 1 or 0
      call MPI_COMM_SPLIT(MPI_COMM_WORLD,color,myid,NEW_COMM,mpi_err)
      call MPI_COMM_RANK( NEW_COMM, new_id, mpi_err )
      call MPI_COMM_SIZE( NEW_COMM, new_nodes, mpi_err )
      zero_one = -1
      if(new_id==0) zero_one = color
      call MPI_Bcast(zero_one,1,MPI_INTEGER,0,NEW_COMM,mpi_err)
      if(zero_one==0) write(*,*)"part of even processor communicator"
      if(zero_one==1) write(*,*)"part of odd processor communicator"
      write(*,*)"old_id=", myid, " new_id=", new_id
      call MPI_FINALIZE(mpi_err)
      end program
81
MPI_Comm_split
• Split odd and even processors into 2 communicators
0: part of even processor communicator
0: old_id= 0 new_id= 0
2: part of even processor communicator
2: old_id= 2 new_id= 1
1: part of odd processor communicator
1: old_id= 1 new_id= 0
3: part of odd processor communicator
3: old_id= 3 new_id= 1
82
#include "mpi.h"
#include <stdio.h>
#include <math.h>
int main(argc,argv)
int argc;
char *argv[];
{
    int myid, numprocs;
    int color,Zero_one,new_id,new_nodes;
    MPI_Comm NEW_COMM;
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    color=myid % 2;
    MPI_Comm_split(MPI_COMM_WORLD,color,myid,&NEW_COMM);
    MPI_Comm_rank( NEW_COMM, &new_id);
    MPI_Comm_size( NEW_COMM, &new_nodes);
    Zero_one = -1;
    if(new_id==0) Zero_one = color;
    MPI_Bcast(&Zero_one,1,MPI_INT,0,NEW_COMM);
    if(Zero_one==0) printf("part of even processor communicator\n");
    if(Zero_one==1) printf("part of odd processor communicator\n");
    printf("old_id= %d new_id= %d\n", myid, new_id);
    MPI_Finalize();
}