Distributed Memory Programming with MPI
Slides extended from An Introduction to Parallel Programming by Peter Pacheco
Dilum Bandara
Dilum.Bandara@uom.lk
Distributed Memory Systems
 We discuss developing programs for these systems using MPI
 MPI – Message Passing Interface
 A set of library functions that can be called from C, C++, & Fortran
Copyright © 2010, Elsevier Inc. All rights Reserved
Why MPI?
 Standardized & portable message-passing
system
 One of the oldest libraries
 Widespread adoption
 Minimal requirements on underlying hardware
 Explicit parallelization
 Achieves high performance
 Scales to a large no of processors
 Intellectually demanding
Our First MPI Program
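The program itself isn't reproduced in this transcript; a minimal sketch consistent with the greetings output shown on the Execution slide (every non-zero rank sends a greeting, process 0 prints its own & then receives the rest in rank order) would be:

```c
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    char greeting[100];
    int comm_sz;   /* no of processes */
    int my_rank;   /* my process rank */

    MPI_Init(&argc, &argv);                   /* set up MPI */
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank != 0) {
        sprintf(greeting, "Greetings from process %d of %d !", my_rank, comm_sz);
        MPI_Send(greeting, strlen(greeting) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        printf("Greetings from process %d of %d !\n", my_rank, comm_sz);
        for (int q = 1; q < comm_sz; q++) {   /* receive from each worker */
            MPI_Recv(greeting, 100, MPI_CHAR, q, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("%s\n", greeting);
        }
    }

    MPI_Finalize();                           /* clean up */
    return 0;
}
```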
Compilation
mpicc -g -Wall -o mpi_hello mpi_hello.c
 mpicc – wrapper script to compile
 -g – produce debugging information
 -Wall – turns on all warnings
 -o mpi_hello – create this executable file name (as opposed to default a.out)
 mpi_hello.c – source file
Execution
mpiexec -n <no of processes> <executable>
mpiexec -n 1 ./mpi_hello (run with 1 process)
mpiexec -n 4 ./mpi_hello (run with 4 processes)
Execution (Cont.)
mpiexec -n 1 ./mpi_hello
mpiexec -n 4 ./mpi_hello
Greetings from process 0 of 1 !
Greetings from process 0 of 4 !
Greetings from process 1 of 4 !
Greetings from process 2 of 4 !
Greetings from process 3 of 4 !
MPI Programs
 Need to include the mpi.h header file
 Identifiers defined by MPI start with “MPI_”
 1st letter following underscore is uppercase
 For function names & MPI-defined types
 Helps to avoid confusion
6 Golden MPI Functions
 MPI_Init, MPI_Finalize – set up & clean up
 MPI_Comm_size, MPI_Comm_rank – no of processes & own rank
 MPI_Send, MPI_Recv – send & receive a message
MPI Components
 MPI_Init
 Tells MPI to do all necessary setup
 e.g., allocate storage for message buffers, decide rank of a
process
 argc_p & argv_p are pointers to argc & argv
arguments in main( )
 Function returns error codes
MPI Components (Cont.)
 MPI_Finalize
 Tells MPI we’re done, so clean up anything allocated for this program
Communicators
 Collection of processes that can send messages
to each other
 Messages from other communicators are ignored
 MPI_Init defines a communicator that consists of
all processes created when the program is
started
 Called MPI_COMM_WORLD
Communicators (Cont.)
 MPI_Comm_size – returns no of processes in the communicator
 MPI_Comm_rank – returns my rank (i.e., rank of the process making this call)
Single-Program Multiple-Data (SPMD)
 We compile 1 program
 Process 0 does something different
 Receives messages & prints them while the
other processes do the work
 if-else construct makes our program
SPMD
 We can run this program on any no of
processors
 e.g., 4, 8, 32, 1000, …
Communication
 msg_buf_p, msg_size, msg_type
 Determines content of message
 dest – destination process’s rank
 tag – used to distinguish messages that are identical in content
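The arguments listed above appear in the prototypes of the two point-to-point calls (parameter names here follow the book's naming; the types are the standard MPI ones):

```c
int MPI_Send(void *msg_buf_p, int msg_size, MPI_Datatype msg_type,
             int dest, int tag, MPI_Comm communicator);

int MPI_Recv(void *msg_buf_p, int buf_size, MPI_Datatype buf_type,
             int src, int tag, MPI_Comm communicator,
             MPI_Status *status_p);
```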
Data Types
 MPI defines datatypes corresponding to C types, e.g., MPI_CHAR (signed char), MPI_INT (signed int), MPI_LONG (signed long int), MPI_FLOAT (float), MPI_DOUBLE (double)
Communication (Cont.)
Pass MPI_ANY_SOURCE as src to receive messages (from any source) in order of arrival
Message Matching
A message sent by process q with MPI_Send is received by MPI_Recv on process r only if the communicators match, recv_tag = send_tag (or MPI_ANY_TAG), dest = r, & src = q (or MPI_ANY_SOURCE)
Receiving Messages
 Receiver can get a message without
knowing
 Amount of data in message
 Sender of message
 Tag of message
 How can those be found out?
How Much Data am I Receiving?
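A hedged sketch of how the MPI_Status argument answers those questions (recv_buf is a hypothetical buffer of 100 doubles; MPI_SOURCE, MPI_TAG, & MPI_Get_count are the standard mechanisms):

```c
MPI_Status status;
int count;

/* Receive from any source, with any tag */
MPI_Recv(recv_buf, 100, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

int sender = status.MPI_SOURCE;            /* who sent it? */
int tag    = status.MPI_TAG;               /* with what tag? */
MPI_Get_count(&status, MPI_DOUBLE, &count); /* how many elements arrived? */
```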
Issues With Send & Receive
 Exact behavior is determined by MPI implementation
 MPI_Send may behave differently with regard to buffer size, cutoffs, & blocking
 Cutoff
 if message size < cutoff → buffer
 if message size ≥ cutoff → MPI_Send will block
 MPI_Recv always blocks until a matching message is received
 Message ordering from a single sender is preserved
 Know your implementation
 Don’t make assumptions!
Trapezoidal Rule
Trapezoidal Rule (Cont.)
Serial Pseudo-code
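The serial pseudo-code isn't reproduced in this transcript; a minimal C sketch of the trapezoidal rule is below. With h = (b − a)/n, the approximation is h·[f(a)/2 + f(x₁) + … + f(x_{n−1}) + f(b)/2]. Here f(x) = x² is only an example integrand (an assumption, not from the slides):

```c
/* Example integrand (an assumption) */
double f(double x) {
    return x * x;
}

/* Trapezoidal rule: approximate the integral of f over [a, b]
 * using n trapezoids of width h = (b - a) / n */
double trap(double a, double b, int n) {
    double h = (b - a) / n;
    double approx = (f(a) + f(b)) / 2.0;   /* endpoints count half */
    for (int i = 1; i < n; i++)            /* interior points x_i = a + i*h */
        approx += f(a + i * h);
    return approx * h;
}
```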
Parallel Pseudo-Code
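A hedged sketch of the parallel version: each process integrates its own subinterval, then the workers send their partial results to process 0, which adds them up. It assumes comm_sz evenly divides n; a, b, n are hard-coded here, and f(x) = x² is an example integrand:

```c
#include <stdio.h>
#include <mpi.h>

double f(double x) { return x * x; }   /* example integrand (an assumption) */

double trap(double a, double b, int n, double h) {
    double approx = (f(a) + f(b)) / 2.0;
    for (int i = 1; i < n; i++)
        approx += f(a + i * h);
    return approx * h;
}

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    double a = 0.0, b = 3.0;           /* assumed endpoints */
    int n = 1024;                      /* total no of trapezoids */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    double h = (b - a) / n;            /* h is the same on all processes */
    int local_n = n / comm_sz;         /* assumes comm_sz divides n */
    double local_a = a + my_rank * local_n * h;
    double local_b = local_a + local_n * h;
    double local_int = trap(local_a, local_b, local_n, h);

    if (my_rank != 0) {                /* workers send partial results */
        MPI_Send(&local_int, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else {                           /* process 0 adds them up */
        double total_int = local_int, piece;
        for (int q = 1; q < comm_sz; q++) {
            MPI_Recv(&piece, 1, MPI_DOUBLE, q, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            total_int += piece;
        }
        printf("Integral from %f to %f = %.15e\n", a, b, total_int);
    }

    MPI_Finalize();
    return 0;
}
```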
Tasks & Communications for
Trapezoidal Rule
First Version
First Version (Cont.)
First Version (Cont.)
COLLECTIVE COMMUNICATION
Collective Communication
A tree-structured global sum
Alternative Tree-Structured Global Sum
Which is optimal?
Can we do better?
MPI_Reduce
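The slide's signature isn't reproduced in this transcript; the standard prototype & a typical use (collapsing the receive loop of the trapezoid program into one call, with local_int assumed already computed on every process) look like:

```c
/* Prototype:
 * int MPI_Reduce(void *input_data_p, void *output_data_p, int count,
 *                MPI_Datatype datatype, MPI_Op operator,
 *                int dest_process, MPI_Comm comm); */

double total_int;   /* result is significant only on dest_process (0 here) */

/* Combine every process's local_int with MPI_SUM; process 0 gets the sum */
MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0,
           MPI_COMM_WORLD);
```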
Predefined Reduction Operators
 MPI_MAX, MPI_MIN – maximum, minimum
 MPI_SUM, MPI_PROD – sum, product
 MPI_LAND, MPI_LOR, MPI_LXOR – logical and, or, exclusive or
 MPI_BAND, MPI_BOR, MPI_BXOR – bitwise and, or, exclusive or
 MPI_MAXLOC, MPI_MINLOC – max/min value & its location
Collective vs. Point-to-Point Communications
 All processes in the communicator must call the
same collective function
 e.g., a program that attempts to match a call to
MPI_Reduce on 1 process with a call to MPI_Recv on
another process is erroneous
 Program will hang or crash
 Arguments passed by each process to an MPI
collective communication must be “compatible”
 e.g., if 1 process passes in 0 as dest_process &
another passes in 1, then the outcome of a call to
MPI_Reduce is erroneous
 Program is likely to hang or crash
Collective vs. P-to-P Communications (Cont.)
 output_data_p argument is only used on
dest_process
 However, all of the processes still need to pass in an
actual argument corresponding to output_data_p,
even if it’s just NULL
 Point-to-point communications are matched on
the basis of tags & communicators
 Collective communications don’t use tags
 Matched solely on the basis of communicator & order
in which they’re called
MPI_Allreduce
 Useful when all processes need result of a global
sum to complete some larger computation
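The call mirrors MPI_Reduce but drops the dest_process argument, since every process receives the result (local_int is assumed already computed):

```c
double total_int;   /* after the call, valid on EVERY process */

MPI_Allreduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM,
              MPI_COMM_WORLD);
```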
MPI_Allreduce (Cont.)
Global sum followed by distribution of result
Butterfly-Structured Global Sum
Processes exchange partial results
Broadcast
 Data belonging to a single process is sent to all
of the processes in communicator
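A sketch of broadcasting the trapezoid inputs (a, b, n are assumed input values known only to process 0; my_rank is assumed already set by MPI_Comm_rank):

```c
/* Prototype:
 * int MPI_Bcast(void *data_p, int count, MPI_Datatype datatype,
 *               int source_proc, MPI_Comm comm); */

double a, b;
int n;
if (my_rank == 0) {          /* only process 0 has the input values */
    a = 0.0; b = 3.0; n = 1024;
}
/* After these calls, every process in the communicator has copies */
MPI_Bcast(&a, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(&b, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
```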
Tree-Structured Broadcast
Data Distributions – Compute a Vector Sum
Serial implementation
Partitioning Options
Copyright © 2010, Elsevier Inc. All rights Reserved
 Block partitioning
 Assign blocks of consecutive components to each process
 Cyclic partitioning
 Assign components in a round robin fashion
 Block-cyclic partitioning
 Use a cyclic distribution of blocks of components
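The three schemes can be written as "owner" functions mapping component i to a process rank (a sketch; assumes p divides n in the block case, with b the block size in the block-cyclic case):

```c
/* Block: consecutive chunks of n/p components per process */
int block_owner(int i, int n, int p) {
    return i / (n / p);
}

/* Cyclic: components dealt round-robin, one at a time */
int cyclic_owner(int i, int p) {
    return i % p;
}

/* Block-cyclic: blocks of size b dealt round-robin */
int block_cyclic_owner(int i, int b, int p) {
    return (i / b) % p;
}
```

For n = 12 & p = 3, block gives 000011112222, cyclic gives 012012012012, & block-cyclic with b = 2 gives 001122001122.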
Parallel Implementation
MPI_Scatter
 Can be used in a function that reads in an entire
vector on process 0 but only sends needed
components to other processes
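A sketch of that pattern (assumes my_rank, n, & local_n = n / comm_sz are already set, & that comm_sz divides n; the send buffer is significant only on process 0):

```c
double *a = NULL;
double *local_a = malloc(local_n * sizeof(double));

if (my_rank == 0) {
    a = malloc(n * sizeof(double));
    /* ... read the n components into a on process 0 ... */
}

/* Each process receives its block of local_n components of a */
MPI_Scatter(a, local_n, MPI_DOUBLE,        /* send args: process 0 only */
            local_a, local_n, MPI_DOUBLE,  /* recv args: every process  */
            0, MPI_COMM_WORLD);
```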
Reading & Distributing a Vector
MPI_Gather
 Collect all components of a vector onto process 0
 Then process 0 can process all of the components
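The inverse of MPI_Scatter; a sketch (assumes my_rank, n, local_n, & each process's block local_a are already set):

```c
double *b = NULL;
if (my_rank == 0)
    b = malloc(n * sizeof(double));    /* recv buffer: process 0 only */

/* Blocks are collected into b in rank order: process 0's block first,
 * then process 1's, & so on */
MPI_Gather(local_a, local_n, MPI_DOUBLE,
           b, local_n, MPI_DOUBLE,     /* recv_count = amount PER process */
           0, MPI_COMM_WORLD);
```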
MPI_Allgather
 Concatenates contents of each process’
send_buf_p & stores this in each process’
recv_buf_p
 recv_count is the amount of data being received from
each process
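A sketch (assumes n, local_n, & each process's block local_x are already set); unlike MPI_Gather there is no root argument, since every process receives the concatenated result:

```c
double *x = malloc(n * sizeof(double));   /* full vector, on EVERY process */

MPI_Allgather(local_x, local_n, MPI_DOUBLE,
              x, local_n, MPI_DOUBLE,     /* recv_count = amount from EACH process */
              MPI_COMM_WORLD);
```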
Summary
Source: https://computing.llnl.gov/tutorials/mpi/
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 

Distributed Memory Programming with MPI

  • 1. 1 Distributed Memory Programming with MPI Slides extended from An Introduction to Parallel Programming by Peter Pacheco Dilum Bandara Dilum.Bandara@uom.lk
  • 2. 2 Distributed Memory Systems  We discuss developing programs for these systems using MPI  MPI – Message Passing Interface  A set of library functions that can be called from C, C++, & Fortran Copyright © 2010, Elsevier Inc. All rights Reserved
  • 3. 3 Why MPI?  Standardized & portable message-passing system  One of the oldest libraries  Widespread adoption  Minimal requirements on underlying hardware  Explicit parallelization  Achieves high performance  Scales to a large no of processors  Intellectually demanding Copyright © 2010, Elsevier Inc. All rights Reserved
  • 4. 4 Our First MPI Program Copyright © 2010, Elsevier Inc. All rights Reserved
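The program on this slide did not survive extraction. A minimal sketch consistent with the output shown on the Execution slide ("Greetings from process … of … !"): every process other than 0 sends a greeting string, and process 0 receives and prints them in rank order.

```c
/* mpi_hello.c — sketch of the "greetings" program; process 0 prints
 * its own greeting, then receives and prints one from each other rank. */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define MAX_STRING 100

int main(void) {
    char greeting[MAX_STRING];
    int  comm_sz;   /* number of processes */
    int  my_rank;   /* my process rank     */

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank != 0) {
        sprintf(greeting, "Greetings from process %d of %d !",
                my_rank, comm_sz);
        MPI_Send(greeting, strlen(greeting) + 1, MPI_CHAR,
                 0, 0, MPI_COMM_WORLD);
    } else {
        printf("Greetings from process %d of %d !\n", my_rank, comm_sz);
        for (int q = 1; q < comm_sz; q++) {
            MPI_Recv(greeting, MAX_STRING, MPI_CHAR, q, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", greeting);
        }
    }

    MPI_Finalize();
    return 0;
}
```

Compile and run as shown on the next slides: `mpicc -g -Wall -o mpi_hello mpi_hello.c`, then `mpiexec -n 4 ./mpi_hello`.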
  • 5. 5 Compilation Copyright © 2010, Elsevier Inc. All rights Reserved mpicc -g -Wall -o mpi_hello mpi_hello.c wrapper script to compile turns on all warnings source file create this executable file name (as opposed to default a.out) produce debugging information
  • 6. 6 Execution Copyright © 2010, Elsevier Inc. All rights Reserved mpiexec -n <no of processes> <executable> mpiexec -n 1 ./mpi_hello mpiexec -n 4 ./mpi_hello run with 1 process run with 4 processes
  • 7. 7 Execution Copyright © 2010, Elsevier Inc. All rights Reserved mpiexec -n 1 ./mpi_hello mpiexec -n 4 ./mpi_hello Greetings from process 0 of 1 ! Greetings from process 0 of 4 ! Greetings from process 1 of 4 ! Greetings from process 2 of 4 ! Greetings from process 3 of 4 !
  • 8. 8 MPI Programs  Need to include the mpi.h header file  Identifiers defined by MPI start with “MPI_”  1st letter following underscore is uppercase  For function names & MPI-defined types  Helps to avoid confusion Copyright © 2010, Elsevier Inc. All rights Reserved
  • 9. 9 6 Golden MPI Functions Copyright © 2010, Elsevier Inc. All rights Reserved
  • 10. 10 MPI Components  MPI_Init  Tells MPI to do all necessary setup  e.g., allocate storage for message buffers, decide rank of a process  argc_p & argv_p are pointers to argc & argv arguments in main( )  Function returns error codes Copyright © 2010, Elsevier Inc. All rights Reserved
  • 11. 11  MPI_Finalize  Tells MPI we’re done, so clean up anything allocated for this program MPI Components (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 12. 12 Communicators  Collection of processes that can send messages to each other  Messages from other communicators are ignored  MPI_Init defines a communicator that consists of all processes created when the program is started  Called MPI_COMM_WORLD Copyright © 2010, Elsevier Inc. All rights Reserved
  • 13. 13 Communicators (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved My rank (process making this call) No of processes in the communicator
  • 14. 14 Single-Program Multiple-Data (SPMD)  We compile 1 program  Process 0 does something different  Receives messages & prints them while the other processes do the work  if-else construct makes our program SPMD  We can run this program on any no of processors  e.g., 4, 8, 32, 1000, … Copyright © 2010, Elsevier Inc. All rights Reserved
  • 15. 15 Communication  msg_buf_p, msg_size, msg_type  Determine the content of the message  dest – destination process’s rank  tag – used to distinguish messages that are identical in content Copyright © 2010, Elsevier Inc. All rights Reserved
  • 16. 16 Data Types Copyright © 2010, Elsevier Inc. All rights Reserved
  • 17. 17 Communication (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved MPI_ANY_SOURCE to receive messages (from any source) in order of arrival
  • 18. 18 Message Matching Copyright © 2010, Elsevier Inc. All rights Reserved MPI_Send src = q MPI_Recv dest = r r q
  • 19. 19 Receiving Messages  Receiver can get a message without knowing  Amount of data in message  Sender of message  Tag of message  How can those be found out? Copyright © 2010, Elsevier Inc. All rights Reserved
  • 20. 20 How Much Data am I Receiving? Copyright © 2010, Elsevier Inc. All rights Reserved
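The call on this slide was lost in extraction. A sketch of the pattern (buffer name and size are ours): the MPI_Status object filled in by MPI_Recv answers all three questions from the previous slide — amount of data via MPI_Get_count, sender via MPI_SOURCE, and tag via MPI_TAG.

```c
#include <stdio.h>
#include <mpi.h>

#define MAX_ITEMS 1024  /* assumed buffer capacity */

void receive_and_inspect(double buf[]) {
    MPI_Status status;
    int count;

    /* accept a message from any sender, with any tag */
    MPI_Recv(buf, MAX_ITEMS, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);

    MPI_Get_count(&status, MPI_DOUBLE, &count);  /* how much data arrived */
    printf("got %d doubles from process %d with tag %d\n",
           count, status.MPI_SOURCE, status.MPI_TAG);
}
```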
  • 21. 21  Exact behavior is determined by MPI implementation  MPI_Send may behave differently with regard to buffer size, cutoffs, & blocking  Cutoff  if message size < cutoff, message is buffered  if message size ≥ cutoff, MPI_Send will block  MPI_Recv always blocks until a matching message is received  Messages from a given sender are received in the order they were sent  Know your implementation  Don’t make assumptions! Issues With Send & Receive Copyright © 2010, Elsevier Inc. All rights Reserved
  • 22. 22 Trapezoidal Rule Copyright © 2010, Elsevier Inc. All rights Reserved
  • 23. 23 Trapezoidal Rule (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 24. 24 Serial Pseudo-code Copyright © 2010, Elsevier Inc. All rights Reserved
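The pseudo-code on this slide did not extract. A minimal serial implementation (function and variable names are ours; the integrand is passed as a function pointer): endpoints count half, interior points count once, and the sum is scaled by the trapezoid width h.

```c
/* Serial trapezoidal rule: approximate the integral of f over [a,b]
 * using n trapezoids of width h = (b-a)/n. */
double Trap(double a, double b, int n, double (*f)(double)) {
    double h = (b - a) / n;                /* width of each trapezoid */
    double approx = (f(a) + f(b)) / 2.0;   /* endpoints count half    */
    for (int i = 1; i <= n - 1; i++)
        approx += f(a + i * h);            /* interior points         */
    return approx * h;
}

/* example integrand: f(x) = x*x, whose integral over [0,3] is 9 */
double square(double x) { return x * x; }
```

For example, `Trap(0.0, 3.0, 1000, square)` returns an estimate very close to 9.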
  • 25. 25 Parallel Pseudo-Code Copyright © 2010, Elsevier Inc. All rights Reserved
  • 26. 26 Tasks & Communications for Trapezoidal Rule Copyright © 2010, Elsevier Inc. All rights Reserved
  • 27. 27 First Version Copyright © 2010, Elsevier Inc. All rights Reserved
  • 28. 28 First Version (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 29. 29 First Version (Cont.) Copyright © 2010, Elsevier Inc. All rights Reserved
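The code for slides 27–29 did not extract. A sketch following the structure of the first version (integrand, bounds, and n are our choices): each process computes its own subinterval [local_a, local_b], then process 0 collects the partial sums with point-to-point sends and receives.

```c
/* Parallel trapezoidal rule, "first version": manual collection of
 * partial results on process 0. Assumes comm_sz divides n. */
#include <stdio.h>
#include <mpi.h>

double f(double x) { return x * x; }       /* example integrand */

double Trap(double left, double right, int count, double h) {
    double estimate = (f(left) + f(right)) / 2.0;
    for (int i = 1; i <= count - 1; i++)
        estimate += f(left + i * h);
    return estimate * h;
}

int main(void) {
    int my_rank, comm_sz, n = 1024, local_n;
    double a = 0.0, b = 3.0, h, local_a, local_b;
    double local_int, total_int;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    h = (b - a) / n;          /* h is the same on every process  */
    local_n = n / comm_sz;    /* so is each process's trapezoid count */
    local_a = a + my_rank * local_n * h;
    local_b = local_a + local_n * h;
    local_int = Trap(local_a, local_b, local_n, h);

    if (my_rank != 0) {
        MPI_Send(&local_int, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else {
        total_int = local_int;
        for (int source = 1; source < comm_sz; source++) {
            MPI_Recv(&local_int, 1, MPI_DOUBLE, source, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total_int += local_int;
        }
        printf("With n = %d trapezoids, estimate = %.15e\n", n, total_int);
    }

    MPI_Finalize();
    return 0;
}
```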
  • 30. 30 COLLECTIVE COMMUNICATION Copyright © 2010, Elsevier Inc. All rights Reserved
  • 31. 31 Collective Communication Copyright © 2010, Elsevier Inc. All rights Reserved A tree-structured global sum
  • 32. 32 Alternative Tree-Structured Global Sum Copyright © 2010, Elsevier Inc. All rights Reserved Which is optimal? Can we do better?
  • 33. 33 MPI_Reduce Copyright © 2010, Elsevier Inc. All rights Reserved
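The call on this slide was lost in extraction. A sketch of how it collapses the explicit receive-and-add loop of the first trapezoid version into one collective call (helper name is ours); only the root receives the result.

```c
#include <mpi.h>

/* Global sum via MPI_Reduce: every process contributes local_int;
 * the return value is meaningful only on rank 0 (the root). */
double global_sum(double local_int) {
    double total_int = 0.0;
    MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM,
               0 /* root */, MPI_COMM_WORLD);
    return total_int;
}
```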
  • 34. 34 Predefined Reduction Operators Copyright © 2010, Elsevier Inc. All rights Reserved
  • 35. 35 Collective vs. Point-to-Point Communications  All processes in the communicator must call the same collective function  e.g., a program that attempts to match a call to MPI_Reduce on 1 process with a call to MPI_Recv on another process is erroneous  Program will hang or crash  Arguments passed by each process to an MPI collective communication must be “compatible”  e.g., if 1 process passes in 0 as dest_process & another passes in 1, then the outcome of a call to MPI_Reduce is erroneous  Program is likely to hang or crash Copyright © 2010, Elsevier Inc. All rights Reserved
  • 36. 36 Collective vs. P-to-P Communications (Cont.)  output_data_p argument is only used on dest_process  However, all of the processes still need to pass in an actual argument corresponding to output_data_p, even if it’s just NULL  Point-to-point communications are matched on the basis of tags & communicators  Collective communications don’t use tags  Matched solely on the basis of communicator & order in which they’re called Copyright © 2010, Elsevier Inc. All rights Reserved
  • 37. 37 MPI_Allreduce  Useful when all processes need result of a global sum to complete some larger computation Copyright © 2010, Elsevier Inc. All rights Reserved
  • 38. 38 Copyright © 2010, Elsevier Inc. All rights Reserved Global sum followed by distribution of result MPI_Allreduce (Cont.)
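A sketch of the call (helper name is ours): MPI_Allreduce is MPI_Reduce without a root argument — every process receives the result, so the return value is valid on every rank.

```c
#include <mpi.h>

/* Global sum via MPI_Allreduce: like MPI_Reduce, but the total is
 * delivered to all processes, not just a root. */
double global_sum_all(double local_int) {
    double total_int;
    MPI_Allreduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);
    return total_int;
}
```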
  • 39. 39 Butterfly-Structured Global Sum Copyright © 2010, Elsevier Inc. All rights Reserved Processes exchange partial results
  • 40. 40 Broadcast  Data belonging to a single process is sent to all of the processes in communicator Copyright © 2010, Elsevier Inc. All rights Reserved
  • 41. 41 Tree-Structured Broadcast Copyright © 2010, Elsevier Inc. All rights Reserved
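A sketch of the broadcast pattern in the style of the text's Get_input helper (prompt and parameter names are our choices): process 0 reads the trapezoid-rule inputs, then MPI_Bcast sends each one to every process in the communicator.

```c
#include <stdio.h>
#include <mpi.h>

/* Process 0 reads a, b, and n; all processes leave the call with
 * identical copies of the three values. */
void Get_input(int my_rank, double* a_p, double* b_p, int* n_p) {
    if (my_rank == 0) {
        printf("Enter a, b, and n\n");
        scanf("%lf %lf %d", a_p, b_p, n_p);
    }
    MPI_Bcast(a_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(b_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(n_p, 1, MPI_INT,    0, MPI_COMM_WORLD);
}
```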
  • 42. 42 Data Distributions – Compute a Vector Sum Copyright © 2010, Elsevier Inc. All rights Reserved Serial implementation
  • 43. 43 Partitioning Options Copyright © 2010, Elsevier Inc. All rights Reserved  Block partitioning  Assign blocks of consecutive components to each process  Cyclic partitioning  Assign components in a round robin fashion  Block-cyclic partitioning  Use a cyclic distribution of blocks of components
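The three schemes above can be made concrete with owner functions (helper names are ours, not from the slides): which process owns global component i of an n-vector, for p processes, with block size b.

```c
/* Block: consecutive chunks of n/p components (assumes p divides n). */
int block_owner(int i, int n, int p) { return i / (n / p); }

/* Cyclic: components dealt out round-robin, one at a time. */
int cyclic_owner(int i, int p) { return i % p; }

/* Block-cyclic: blocks of b components dealt out round-robin. */
int block_cyclic_owner(int i, int b, int p) { return (i / b) % p; }
```

With n = 12 and p = 3: block gives owners 0,0,0,0,1,1,1,1,2,2,2,2; cyclic gives 0,1,2,0,1,2,…; block-cyclic with b = 2 gives 0,0,1,1,2,2,0,0,1,1,2,2.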
  • 44. 44 Parallel Implementation Copyright © 2010, Elsevier Inc. All rights Reserved
  • 45. 45 MPI_Scatter  Can be used in a function that reads in an entire vector on process 0 but only sends needed components to other processes Copyright © 2010, Elsevier Inc. All rights Reserved
  • 46. 46 Reading & Distributing a Vector Copyright © 2010, Elsevier Inc. All rights Reserved
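The function on this slide did not extract. A sketch in the style of the text's Read_vector (error checking omitted): process 0 reads the whole n-vector, and MPI_Scatter hands each process its block of local_n = n/comm_sz components. Assumes comm_sz divides n.

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Read an n-vector on process 0 and scatter one block to each process. */
void Read_vector(double local_a[], int local_n, int n, int my_rank,
                 MPI_Comm comm) {
    double* a = NULL;

    if (my_rank == 0) {
        a = malloc(n * sizeof(double));
        printf("Enter the vector\n");
        for (int i = 0; i < n; i++)
            scanf("%lf", &a[i]);
    }
    /* the send buffer a is only significant on the root */
    MPI_Scatter(a, local_n, MPI_DOUBLE,
                local_a, local_n, MPI_DOUBLE, 0, comm);
    if (my_rank == 0) free(a);
}
```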
  • 47. 47 MPI_Gather  Collect all components of a vector onto process 0  Then process 0 can process all of the components Copyright © 2010, Elsevier Inc. All rights Reserved
  • 48. 48 MPI_Allgather  Concatenates contents of each process’ send_buf_p & stores this in each process’ recv_buf_p  recv_count is the amount of data being received from each process Copyright © 2010, Elsevier Inc. All rights Reserved
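A sketch of both gathers (helper names are ours) — the inverse of the scatter above: MPI_Gather concatenates the blocks on process 0 only, while MPI_Allgather delivers the whole vector to every process and therefore takes no root argument.

```c
#include <mpi.h>

/* Collect each process's local_n-element block into x on process 0;
 * the receive arguments are significant only on the root. */
void gather_to_root(double local_x[], double x[], int local_n,
                    MPI_Comm comm) {
    MPI_Gather(local_x, local_n, MPI_DOUBLE,
               x, local_n, MPI_DOUBLE, 0, comm);
}

/* Same concatenation, but every process ends up with the full vector;
 * recv_count (local_n) is the amount received from EACH process. */
void gather_everywhere(double local_x[], double x[], int local_n,
                       MPI_Comm comm) {
    MPI_Allgather(local_x, local_n, MPI_DOUBLE,
                  x, local_n, MPI_DOUBLE, comm);
}
```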
  • 49. 49 Summary Copyright © 2010, Elsevier Inc. All rights Reserved Source: https://computing.llnl.gov/tutorials/mpi/

Editor's Notes

  1. 8 January 2024
  2. Began in Supercomputing ’92
  3. Tag – with 2 messages, the content of one should be printed while the other is used for calculation
  4. MPI_Reduce arguments: sendbuf – address of send buffer (choice); count – no of elements in send buffer (integer); datatype – data type of elements of send buffer (handle); op – reduce operation (handle); root – rank of root process (integer); comm – communicator (handle)