SlideShare a Scribd company logo
Chapter 04
Basic Communication Operation
Processes need to exchange data with
other processes
This exchange of data can significantly impact the efficiency of parallel programs by
introducing interaction delays during their execution
ts + mtw
simple exchange of an m-word message between two processes running on different
nodes of an interconnection network with cut-through routing.
ts: latency or the startup time for the data transfer
tw: per word transfer time
ts & tw inversely proportional to the available bandwidthbetween the nodes
Many interactions in practical parallel programs occur in well-defined patterns
involving more than two processes. Often either all processes
participate together in a single global interaction operation, or
subsets of processes participate in interactions local to each
subset.
Let's Digon Different kind of interations.
Chapter 4.1
One-to-All Broadcast
&
All-to-One Reduction
One-to-All Broadcast
Parallel algorithms often
require a single process to
send identical data to all
other processes or to a
subset of them
Initially, only the source process has the data of size m that needs to be broadcast.
At the termination of the procedure, there are p copies of the initial data – one
belonging to each process.
All-to-One Reduction
Each of the pparticipating processes starts
with a buffer M containing m words. The
data from all processes are combined
through an associative operator and
accumulated at a single destination process
into one buffer of size m
One-to-all broadcast and all-to-one reduction
usage
matrix-vector multiplication
Gaussian elimination
shortest paths
vector inner product
Chapter 4.1.1
Ring or Linear Array
Chapter 4.1.1
Ring or Linear Array
Chapter 4.1.1
A naive way to perform one-to-all broadcast is to sequentially
send p - 1 messages from the source to the other
p - 1 processes.
Chapter 4.1.1
A naive way to perform one-to-all broadcast is to sequentially
send p - 1 messages from the source to the other p -
1 processes.
This is inefficientbecause the source process becomes
a bottleneck
only the connection between a single pair of nodes is used at a time
Chapter 4.1.1
How to solve this bottleneck issue !
Chapter 4.1.1
The Solution is recursive doubling
Chapter 4.1.1
The Solution is recursive doubling
The source process first sends the message to another
process. Now both these processes can simultaneously send
the message to two other processes that are still waiting for the
message. By continuing this procedure until all the processes have
received the data, the message can be broadcast in log p steps.
Chapter 4.1.1
Chapter 4.1.1
Reductionon a linear array can be performed by simply
reversing the direction and the sequence of communication
Chapter 4.1.1
Reductionon a linear array can be performed by simply
reversing the direction and the sequence of communication
Chapter 4.1.1
Consider the problem of multiplying a
matrixwith a vector.
● The n x n matrix is assigned to an n x n (virtual)
processor grid. The vector is assumed to be on the
first row of processors.
● The first step of the product requires a one-to-all
broadcast of the vector element along the
corresponding column of processors. This can be
done concurrently for all n columns.
● The processors compute local product of the vector
element and the local matrix entry.
● In the final step, the results of these products are
accumulated to the first row using n concurrent all-to-
one reduction operations along the columns (using the
sum operation).
Chapter 4.1.2
Mesh
Chapter 4.1.2
Each row and column of a
square mesh of p nodes as
a linear array of √p nodes
Chapter 4.1.2
The linear array communication operation can be made in 2 phases
In the first phase, the operation is
performed along one or all rows by
treating the rows as linear arrays.
1
Chapter 4.1.2
The linear array communication operation can be made in 2 phases
In the second phase, the columns are
treated similarly.
2
Chapter 4.1.2
Consider a one-to-all
broadcast in 2D mesh
Chapter 4.1.2
All the data first going to
√(p-1) nodes in the same
raw
Chapter 4.1.2
Once all the nodes in a
row of the mesh have
acquired the data, they
initiate a one-to-all
broadcast in their
respective columns.
Chapter 4.1.2
At the end of the second
phase, every node in the
mesh has a copy of the
initial message.
Chapter 4.1.3
Hypercube
Chapter 4.1.3
Rows of p1/3
nodes in each
of the 3D mesh would be
treated as linear arrays
Chapter 4.1.3
Likethe mesh
The process is carried out by three phases
If D dimensional mesh we need d steps
Chapter 4.1.4
source processor is the root of
this tree
In the first step, the source sends the data to the right child
(assuming the source is also the left child). The problem has now
been decomposed into two problems with half the number of
processors.
Chapter 4.1.5
Algorithm
Chapter 4.1.5
One-to-all Broadcast
● All of the algorithms described above are adaptations of the same
algorithmic template.
● We illustrate the algorithm for a hypercube, but the algorithm, as has
been seen, can be adapted to other architectures.
●
The hypercube has 2d
nodes and my_id is the label for a node.
● X is the message to be broadcast, which initially resides
at the source node 0 means 000.
● initially resides at the source node 0
● The variable mask helps determine which nodes communicate in a
particular iteration of the loop
s
Algorithm 4.1
works only if node 0 is the source of the broadcast. For an arbitrary source, we must relabel the
nodes of the hypothetical hypercube by XORing the label of each node with the label of the source
node before we apply this procedure.
General Algorithm – whatever source can be...
Cost Analysis
T = (ts + mtw)logp
4.2 All-to-All Broadcast
and Reduction
Generalization of One-to-All-Broadcast
A process sends the same m-word
message to every other process, but
different processes may broadcast
different messages.
Usage of All-to-All-Broadcast
It's used on matrix operations like
matrix multiplications and matrix-vector
multiplication
Dual is All-to-All-reduction
Figure illustration
All-to-All-Broadcast in Linear Array and Ring
While performing all-to-all broadcast on a linear array or a ring, all
communication links can be kept busy
simultaneously until the operation is complete because
each node always has some information that it can pass along to
its neighbor.
● Simplest approach: perform p one-to-all
broadcasts. This is not the most efficient
way, though.
● Each node first sends to one of its neighbors
the data it needs to broadcast.
● In subsequent steps, it forwards the data
received from one of its neighbors to its other
neighbor.
● The algorithm terminates in p-1 steps.
All-to-All Broadcast and Reduction
Algorithm for All-to-All-Broadcast in Ring
In all-to-all reduction, the dual of all-to-all broadcast
All-to-all reduction can be performed by reversing the direction and
sequence of the messages
< Example />
First communication step in all-to-all broadcast would be the last step
of all-to-all reduction.
0 sending msg[1] to 7 instead of receiving it. The only additional step required is that upon receiving a
message, a node must combine it with the local copy of the message that has the same destination as
the received message before forwarding the combined message to the next neighbor.
All-to-All-Broadcast in Mesh
Performed in two phases -
In the first phase :- each row of the mesh performs an all-to-all broadcast using
the procedure for the linear array.
In this phase, all nodes collect √p messages corresponding to the √p
nodes of their respective rows. Each node consolidates this information into a
single message of size m√p.
In second communication phase :- a columnwise all-to-all broadcast of the
consolidated messages.
Algorithm for All-to-All-Broadcast in Mesh
● Generalization of the mesh algorithm to log p dimensions.
● Message size doubles at each of the log p steps.
All-to-All-Broadcast in Hypercube
Cost Analysis
For linear array
T = (ts + mtw)(p-1)
For Mesh
Tfirst = (ts + mtw)(√p-1)
Tsecond = (ts + m√ptw)(√p-1)
Ttotal = (2ts)(√p-1) + twm(p-1)
Cost Analysis
Chapter 4.3
All-Reduce and Prefix Sum Operation
All-Reduce
Each node starts with a buffer of size m
and
The final results of the operation are identical buffers of size
m on each node that are formed by combining the original p
buffers using an associative operator
There is 2 ways to perform all-reduce
A simple method to perform all-reduce is to
perform an all-to-one reduction followed by a one-
to-all broadcast
There is 2 ways to perform all-reduce
There is a faster way to perform all-reduce by
using the communication pattern of all-to-all
broadcast
The Specialty is the message size is not increased here.
That is message size is m (m√p)
……...
Chapter 4.4
Scatter and Gather
Scatter and Gather
●
In the scatter operation, a single node sends a unique
message of size m to every other node (also called a
one-to-all personalized communication).
●
In the gather operation, a single node collects a unique
message from each node.
● While the scatter operation is fundamentally different from
broadcast, the algorithmic structure is similar, except for
differences in message sizes (messages get smaller in
scatter and stay constant in broadcast).
● The gather operation is exactly the inverse of the scatter
operation.
Example of Scatter on Hypercube
In the first communication step, the source transfers half of the messages to
one of its neighbors. In subsequent steps, each node that has some data transfers half
of it to a neighbor that has yet to receive any data. There is a total of log p
communication steps corresponding to the log p dimensions of the hypercube.
The gather operation is simply the reverse of
scatter.

More Related Content

What's hot

Mobile Network Layer
Mobile Network LayerMobile Network Layer
Mobile Network Layer
Rahul Hada
 
Congestion control
Congestion controlCongestion control
Congestion control
Madhusudhan G
 
Shortest path algorithm
Shortest  path algorithmShortest  path algorithm
Shortest path algorithm
Subrata Kumer Paul
 
Network Layer
Network LayerNetwork Layer
Network Layer
Dr Shashikant Athawale
 
Mobile Transport layer
Mobile Transport layerMobile Transport layer
Mobile Transport layer
Pallepati Vasavi
 
AODV routing protocol
AODV routing protocolAODV routing protocol
AODV routing protocol
Varsha Anandani
 
Asynchronous data transfer
Asynchronous  data  transferAsynchronous  data  transfer
Asynchronous data transfer
NancyBeaulah_R
 
Mobile IP
Mobile IPMobile IP
Mobile IP
Mukesh Chinta
 
Three Address code
Three Address code Three Address code
Three Address code
Pooja Dixit
 
Osi reference model
Osi reference modelOsi reference model
Osi reference model
vasanthimuniasamy
 
TCP/IP and UDP protocols
TCP/IP and UDP protocolsTCP/IP and UDP protocols
TCP/IP and UDP protocols
Dawood Faheem Abbasi
 
Chapter 5: Mapping and Scheduling
Chapter  5: Mapping and SchedulingChapter  5: Mapping and Scheduling
Chapter 5: Mapping and Scheduling
Heman Pathak
 
Cache coherence ppt
Cache coherence pptCache coherence ppt
Cache coherence ppt
ArendraSingh2
 
Routing protocols for ad hoc wireless networks
Routing protocols for ad hoc wireless networks Routing protocols for ad hoc wireless networks
Routing protocols for ad hoc wireless networks
Divya Tiwari
 
Basic communication operations - One to all Broadcast
Basic communication operations - One to all BroadcastBasic communication operations - One to all Broadcast
Basic communication operations - One to all Broadcast
RashiJoshi11
 
Multiprocessor system
Multiprocessor system Multiprocessor system
Multiprocessor system
Mr. Vikram Singh Slathia
 
program flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architectureprogram flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architecture
Pankaj Kumar Jain
 
proactive and reactive routing comparisons
proactive and reactive routing comparisonsproactive and reactive routing comparisons
proactive and reactive routing comparisons
ITM Universe - Vadodara
 

What's hot (20)

Mobile Network Layer
Mobile Network LayerMobile Network Layer
Mobile Network Layer
 
Congestion control
Congestion controlCongestion control
Congestion control
 
Shortest path algorithm
Shortest  path algorithmShortest  path algorithm
Shortest path algorithm
 
Network Layer
Network LayerNetwork Layer
Network Layer
 
Mobile Transport layer
Mobile Transport layerMobile Transport layer
Mobile Transport layer
 
AODV routing protocol
AODV routing protocolAODV routing protocol
AODV routing protocol
 
Asynchronous data transfer
Asynchronous  data  transferAsynchronous  data  transfer
Asynchronous data transfer
 
Mobile IP
Mobile IPMobile IP
Mobile IP
 
Three Address code
Three Address code Three Address code
Three Address code
 
Osi reference model
Osi reference modelOsi reference model
Osi reference model
 
TCP/IP and UDP protocols
TCP/IP and UDP protocolsTCP/IP and UDP protocols
TCP/IP and UDP protocols
 
Chapter 5: Mapping and Scheduling
Chapter  5: Mapping and SchedulingChapter  5: Mapping and Scheduling
Chapter 5: Mapping and Scheduling
 
Inter Process Communication
Inter Process CommunicationInter Process Communication
Inter Process Communication
 
Cache coherence ppt
Cache coherence pptCache coherence ppt
Cache coherence ppt
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
Routing protocols for ad hoc wireless networks
Routing protocols for ad hoc wireless networks Routing protocols for ad hoc wireless networks
Routing protocols for ad hoc wireless networks
 
Basic communication operations - One to all Broadcast
Basic communication operations - One to all BroadcastBasic communication operations - One to all Broadcast
Basic communication operations - One to all Broadcast
 
Multiprocessor system
Multiprocessor system Multiprocessor system
Multiprocessor system
 
program flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architectureprogram flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architecture
 
proactive and reactive routing comparisons
proactive and reactive routing comparisonsproactive and reactive routing comparisons
proactive and reactive routing comparisons
 

Viewers also liked

Collective Communications in MPI
 Collective Communications in MPI Collective Communications in MPI
Collective Communications in MPI
Hanif Durad
 
Broadcast in Hypercube
Broadcast in HypercubeBroadcast in Hypercube
Broadcast in Hypercube
Sujith Jay Nair
 
Chapter 4 pc
Chapter 4 pcChapter 4 pc
Chapter 4 pc
Hanif Durad
 
Hypercubes In Hbase
Hypercubes In HbaseHypercubes In Hbase
Hypercubes In HbaseGeorge Ang
 
Analysis and design of a half hypercube interconnection network topology
Analysis and design of a half hypercube interconnection network topologyAnalysis and design of a half hypercube interconnection network topology
Analysis and design of a half hypercube interconnection network topology
Amir Masoud Sefidian
 
Linked Data Hypercubes
Linked Data HypercubesLinked Data Hypercubes
Linked Data Hypercubes
Dave Reynolds
 
Basic Communication
Basic Communication Basic Communication
Basic Communication
Sailesh Bhatia
 
Interconnection Network
Interconnection NetworkInterconnection Network
Interconnection Network
Ali A Jalil
 
Parallel Computing
Parallel ComputingParallel Computing
Parallel Computing
Ameya Waghmare
 
Parallel computing
Parallel computingParallel computing
Parallel computing
Vinay Gupta
 
Les 10 tendances de la communication interne et RH
Les 10 tendances de la communication interne et RH Les 10 tendances de la communication interne et RH
Les 10 tendances de la communication interne et RH
quel progrès ! - NETCO GROUP
 
Point-to-Point Communicationsin MPI
Point-to-Point Communicationsin MPIPoint-to-Point Communicationsin MPI
Point-to-Point Communicationsin MPI
Hanif Durad
 
Communication - Process & Definition Power Point Presentation
Communication - Process & Definition Power Point PresentationCommunication - Process & Definition Power Point Presentation
Communication - Process & Definition Power Point Presentation
Satyaki Chowdhury
 
Process of communication
Process of communicationProcess of communication
Process of communicationSweetp999
 
La communication interne. outil de gestion des ressources humaines.cas de pos...
La communication interne. outil de gestion des ressources humaines.cas de pos...La communication interne. outil de gestion des ressources humaines.cas de pos...
La communication interne. outil de gestion des ressources humaines.cas de pos...yasmine345
 
COMMUNICATION PROCESS,TYPES,MODES,BARRIERS
COMMUNICATION PROCESS,TYPES,MODES,BARRIERSCOMMUNICATION PROCESS,TYPES,MODES,BARRIERS
COMMUNICATION PROCESS,TYPES,MODES,BARRIERS
Sruthi Balaji
 
Communication ppt
Communication pptCommunication ppt
Communication pptTirtha Mal
 
COMMUNICATION POWERPOINT
COMMUNICATION POWERPOINTCOMMUNICATION POWERPOINT
COMMUNICATION POWERPOINT
Andrew Schwartz
 

Viewers also liked (20)

Collective Communications in MPI
 Collective Communications in MPI Collective Communications in MPI
Collective Communications in MPI
 
Broadcast in Hypercube
Broadcast in HypercubeBroadcast in Hypercube
Broadcast in Hypercube
 
Chapter 4 pc
Chapter 4 pcChapter 4 pc
Chapter 4 pc
 
Hypercubes In Hbase
Hypercubes In HbaseHypercubes In Hbase
Hypercubes In Hbase
 
Analysis and design of a half hypercube interconnection network topology
Analysis and design of a half hypercube interconnection network topologyAnalysis and design of a half hypercube interconnection network topology
Analysis and design of a half hypercube interconnection network topology
 
Linked Data Hypercubes
Linked Data HypercubesLinked Data Hypercubes
Linked Data Hypercubes
 
Parallel Computing
Parallel Computing Parallel Computing
Parallel Computing
 
Basic Communication
Basic Communication Basic Communication
Basic Communication
 
Interconnection Network
Interconnection NetworkInterconnection Network
Interconnection Network
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Parallel Computing
Parallel ComputingParallel Computing
Parallel Computing
 
Parallel computing
Parallel computingParallel computing
Parallel computing
 
Les 10 tendances de la communication interne et RH
Les 10 tendances de la communication interne et RH Les 10 tendances de la communication interne et RH
Les 10 tendances de la communication interne et RH
 
Point-to-Point Communicationsin MPI
Point-to-Point Communicationsin MPIPoint-to-Point Communicationsin MPI
Point-to-Point Communicationsin MPI
 
Communication - Process & Definition Power Point Presentation
Communication - Process & Definition Power Point PresentationCommunication - Process & Definition Power Point Presentation
Communication - Process & Definition Power Point Presentation
 
Process of communication
Process of communicationProcess of communication
Process of communication
 
La communication interne. outil de gestion des ressources humaines.cas de pos...
La communication interne. outil de gestion des ressources humaines.cas de pos...La communication interne. outil de gestion des ressources humaines.cas de pos...
La communication interne. outil de gestion des ressources humaines.cas de pos...
 
COMMUNICATION PROCESS,TYPES,MODES,BARRIERS
COMMUNICATION PROCESS,TYPES,MODES,BARRIERSCOMMUNICATION PROCESS,TYPES,MODES,BARRIERS
COMMUNICATION PROCESS,TYPES,MODES,BARRIERS
 
Communication ppt
Communication pptCommunication ppt
Communication ppt
 
COMMUNICATION POWERPOINT
COMMUNICATION POWERPOINTCOMMUNICATION POWERPOINT
COMMUNICATION POWERPOINT
 

Similar to Chapter - 04 Basic Communication Operation

Chap4 slides
Chap4 slidesChap4 slides
Chap4 slides
BaliThorat1
 
chap4_slides.ppt
chap4_slides.pptchap4_slides.ppt
chap4_slides.ppt
StrangerMe2
 
Chap4 slides
Chap4 slidesChap4 slides
Chap4 slides
Sheena Jose
 
Chap4 slides
Chap4 slidesChap4 slides
Chap4 slides
Jothish DL
 
All-Reduce and Prefix-Sum Operations
All-Reduce and Prefix-Sum Operations All-Reduce and Prefix-Sum Operations
All-Reduce and Prefix-Sum Operations
Syed Zaid Irshad
 
Lecturre 07 - Chapter 05 - Basic Communications Operations
Lecturre 07 - Chapter 05 - Basic Communications  OperationsLecturre 07 - Chapter 05 - Basic Communications  Operations
Lecturre 07 - Chapter 05 - Basic Communications Operations
National College of Business Administration & Economics ( NCBA&E)
 
Optimization of Collective Communication in MPICH
Optimization of Collective Communication in MPICH Optimization of Collective Communication in MPICH
Optimization of Collective Communication in MPICH
Lino Possamai
 
Complexity analysis of multilayer perceptron neural network embedded into a w...
Complexity analysis of multilayer perceptron neural network embedded into a w...Complexity analysis of multilayer perceptron neural network embedded into a w...
Complexity analysis of multilayer perceptron neural network embedded into a w...
Amir Shokri
 
Lec 6-bp
Lec 6-bpLec 6-bp
Lec 6-bp
Taymoor Nazmy
 
MANET Routing Protocols , a case study
MANET Routing Protocols , a case studyMANET Routing Protocols , a case study
MANET Routing Protocols , a case study
Rehan Hattab
 
Efficient Of Multi-Hop Relay Algorithm for Efficient Broadcasting In MANETS
Efficient Of Multi-Hop Relay Algorithm for Efficient Broadcasting In MANETSEfficient Of Multi-Hop Relay Algorithm for Efficient Broadcasting In MANETS
Efficient Of Multi-Hop Relay Algorithm for Efficient Broadcasting In MANETS
ijircee
 
D04622933
D04622933D04622933
D04622933
IOSR-JEN
 
Bh36352357
Bh36352357Bh36352357
Bh36352357
IJERA Editor
 
A comparison of efficient algorithms for scheduling parallel data redistribution
A comparison of efficient algorithms for scheduling parallel data redistributionA comparison of efficient algorithms for scheduling parallel data redistribution
A comparison of efficient algorithms for scheduling parallel data redistribution
IJCNCJournal
 
Z02417321735
Z02417321735Z02417321735
Z02417321735
IJMER
 
ARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTOR
ARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTORARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTOR
ARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTOR
ijac123
 
A STUDY OF METHODS FOR TRAINING WITH DIFFERENT DATASETS IN IMAGE CLASSIFICATION
A STUDY OF METHODS FOR TRAINING WITH DIFFERENT DATASETS IN IMAGE CLASSIFICATIONA STUDY OF METHODS FOR TRAINING WITH DIFFERENT DATASETS IN IMAGE CLASSIFICATION
A STUDY OF METHODS FOR TRAINING WITH DIFFERENT DATASETS IN IMAGE CLASSIFICATION
ADEIJ Journal
 
Community Detection on the GPU : NOTES
Community Detection on the GPU : NOTESCommunity Detection on the GPU : NOTES
Community Detection on the GPU : NOTES
Subhajit Sahu
 

Similar to Chapter - 04 Basic Communication Operation (20)

Chap4 slides
Chap4 slidesChap4 slides
Chap4 slides
 
chap4_slides.ppt
chap4_slides.pptchap4_slides.ppt
chap4_slides.ppt
 
Chap4 slides
Chap4 slidesChap4 slides
Chap4 slides
 
Chap4 slides
Chap4 slidesChap4 slides
Chap4 slides
 
Chap4 slides
Chap4 slidesChap4 slides
Chap4 slides
 
Chap4 slides
Chap4 slidesChap4 slides
Chap4 slides
 
All-Reduce and Prefix-Sum Operations
All-Reduce and Prefix-Sum Operations All-Reduce and Prefix-Sum Operations
All-Reduce and Prefix-Sum Operations
 
Lecturre 07 - Chapter 05 - Basic Communications Operations
Lecturre 07 - Chapter 05 - Basic Communications  OperationsLecturre 07 - Chapter 05 - Basic Communications  Operations
Lecturre 07 - Chapter 05 - Basic Communications Operations
 
Optimization of Collective Communication in MPICH
Optimization of Collective Communication in MPICH Optimization of Collective Communication in MPICH
Optimization of Collective Communication in MPICH
 
Complexity analysis of multilayer perceptron neural network embedded into a w...
Complexity analysis of multilayer perceptron neural network embedded into a w...Complexity analysis of multilayer perceptron neural network embedded into a w...
Complexity analysis of multilayer perceptron neural network embedded into a w...
 
Lec 6-bp
Lec 6-bpLec 6-bp
Lec 6-bp
 
MANET Routing Protocols , a case study
MANET Routing Protocols , a case studyMANET Routing Protocols , a case study
MANET Routing Protocols , a case study
 
Efficient Of Multi-Hop Relay Algorithm for Efficient Broadcasting In MANETS
Efficient Of Multi-Hop Relay Algorithm for Efficient Broadcasting In MANETSEfficient Of Multi-Hop Relay Algorithm for Efficient Broadcasting In MANETS
Efficient Of Multi-Hop Relay Algorithm for Efficient Broadcasting In MANETS
 
D04622933
D04622933D04622933
D04622933
 
Bh36352357
Bh36352357Bh36352357
Bh36352357
 
A comparison of efficient algorithms for scheduling parallel data redistribution
A comparison of efficient algorithms for scheduling parallel data redistributionA comparison of efficient algorithms for scheduling parallel data redistribution
A comparison of efficient algorithms for scheduling parallel data redistribution
 
Z02417321735
Z02417321735Z02417321735
Z02417321735
 
ARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTOR
ARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTORARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTOR
ARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTOR
 
A STUDY OF METHODS FOR TRAINING WITH DIFFERENT DATASETS IN IMAGE CLASSIFICATION
A STUDY OF METHODS FOR TRAINING WITH DIFFERENT DATASETS IN IMAGE CLASSIFICATIONA STUDY OF METHODS FOR TRAINING WITH DIFFERENT DATASETS IN IMAGE CLASSIFICATION
A STUDY OF METHODS FOR TRAINING WITH DIFFERENT DATASETS IN IMAGE CLASSIFICATION
 
Community Detection on the GPU : NOTES
Community Detection on the GPU : NOTESCommunity Detection on the GPU : NOTES
Community Detection on the GPU : NOTES
 

Recently uploaded

Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
Vivekanand Anglo Vedic Academy
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
Col Mukteshwar Prasad
 

Recently uploaded (20)

Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 

Chapter - 04 Basic Communication Operation

  • 2. Processes need to exchange data with other processes This exchange of data can significantly impact the efficiency of parallel programs by introducing interaction delays during their execution
  • 3. ts + mtw simple exchange of an m-word message between two processes running on different nodes of an interconnection network with cut-through routing. ts: latency or the startup time for the data transfer tw: per word transfer time ts & tw inversely proportional to the available bandwidthbetween the nodes
  • 4. Many interactions in practical parallel programs occur in well-defined patterns involving more than two processes. Often either all processes participate together in a single global interaction operation, or subsets of processes participate in interactions local to each subset.
  • 5. Let's Digon Different kind of interations.
  • 7. One-to-All Broadcast Parallel algorithms often require a single process to send identical data to all other processes or to a subset of them Initially, only the source process has the data of size m that needs to be broadcast. At the termination of the procedure, there are p copies of the initial data – one belonging to each process.
  • 8. All-to-One Reduction Each of the pparticipating processes starts with a buffer M containing m words. The data from all processes are combined through an associative operator and accumulated at a single destination process into one buffer of size m
  • 9. One-to-all broadcast and all-to-one reduction usage matrix-vector multiplication Gaussian elimination shortest paths vector inner product
  • 10. Chapter 4.1.1 Ring or Linear Array
  • 11. Chapter 4.1.1 Ring or Linear Array
  • 12. Chapter 4.1.1 A naive way to perform one-to-all broadcast is to sequentially send p - 1 messages from the source to the other p - 1 processes.
  • 13. Chapter 4.1.1 A naive way to perform one-to-all broadcast is to sequentially send p - 1 messages from the source to the other p - 1 processes. This is inefficientbecause the source process becomes a bottleneck only the connection between a single pair of nodes is used at a time
  • 14. Chapter 4.1.1 How to solve this bottleneck issue !
  • 15. Chapter 4.1.1 The Solution is recursive doubling
  • 16. Chapter 4.1.1 The Solution is recursive doubling The source process first sends the message to another process. Now both these processes can simultaneously send the message to two other processes that are still waiting for the message. By continuing this procedure until all the processes have received the data, the message can be broadcast in log p steps.
  • 18. Chapter 4.1.1 Reductionon a linear array can be performed by simply reversing the direction and the sequence of communication
  • 19. Chapter 4.1.1 Reductionon a linear array can be performed by simply reversing the direction and the sequence of communication
  • 20. Chapter 4.1.1 Consider the problem of multiplying a matrixwith a vector. ● The n x n matrix is assigned to an n x n (virtual) processor grid. The vector is assumed to be on the first row of processors. ● The first step of the product requires a one-to-all broadcast of the vector element along the corresponding column of processors. This can be done concurrently for all n columns. ● The processors compute local product of the vector element and the local matrix entry. ● In the final step, the results of these products are accumulated to the first row using n concurrent all-to- one reduction operations along the columns (using the sum operation).
  • 22. Chapter 4.1.2 Each row and column of a square mesh of p nodes as a linear array of √p nodes
  • 23. Chapter 4.1.2 The linear array communication operation can be made in 2 phases In the first phase, the operation is performed along one or all rows by treating the rows as linear arrays. 1
  • 24. Chapter 4.1.2 The linear array communication operation can be made in 2 phases In the second phase, the columns are treated similarly. 2
  • 25. Chapter 4.1.2 Consider a one-to-all broadcast in 2D mesh
  • 26. Chapter 4.1.2 All the data first going to √(p-1) nodes in the same raw
  • 27. Chapter 4.1.2 Once all the nodes in a row of the mesh have acquired the data, they initiate a one-to-all broadcast in their respective columns.
  • 28. Chapter 4.1.2 At the end of the second phase, every node in the mesh has a copy of the initial message.
  • 30. Chapter 4.1.3 Rows of p1/3 nodes in each of the 3D mesh would be treated as linear arrays
  • 31. Chapter 4.1.3 Likethe mesh The process is carried out by three phases If D dimensional mesh we need d steps
  • 32. Chapter 4.1.4 source processor is the root of this tree In the first step, the source sends the data to the right child (assuming the source is also the left child). The problem has now been decomposed into two problems with half the number of processors.
  • 34. Chapter 4.1.5 One-to-all Broadcast ● All of the algorithms described above are adaptations of the same algorithmic template. ● We illustrate the algorithm for a hypercube, but the algorithm, as has been seen, can be adapted to other architectures. ● The hypercube has 2d nodes and my_id is the label for a node. ● X is the message to be broadcast, which initially resides at the source node 0 means 000. ● initially resides at the source node 0 ● The variable mask helps determine which nodes communicate in a particular iteration of the loop
  • 35. s Algorithm 4.1 works only if node 0 is the source of the broadcast. For an arbitrary source, we must relabel the nodes of the hypothetical hypercube by XORing the label of each node with the label of the source node before we apply this procedure.
  • 36. General Algorithm – whatever source can be...
  • 37. Cost Analysis T = (ts + mtw)logp
  • 39. Generalization of One-to-All-Broadcast A process sends the same m-word message to every other process, but different processes may broadcast different messages.
  • 40. Usage of All-to-All-Broadcast It's used on matrix operations like matrix multiplications and matrix-vector multiplication
  • 43. All-to-All-Broadcast in Linear Array and Ring While performing all-to-all broadcast on a linear array or a ring, all communication links can be kept busy simultaneously until the operation is complete because each node always has some information that it can pass along to its neighbor.
  • 44. ● Simplest approach: perform p one-to-all broadcasts. This is not the most efficient way, though. ● Each node first sends to one of its neighbors the data it needs to broadcast. ● In subsequent steps, it forwards the data received from one of its neighbors to its other neighbor. ● The algorithm terminates in p-1 steps.
  • 47. In all-to-all reduction, the dual of all-to-all broadcast All-to-all reduction can be performed by reversing the direction and sequence of the messages < Example /> First communication step in all-to-all broadcast would be the last step of all-to-all reduction. 0 sending msg[1] to 7 instead of receiving it. The only additional step required is that upon receiving a message, a node must combine it with the local copy of the message that has the same destination as the received message before forwarding the combined message to the next neighbor.
  • 48.
  • 49. All-to-All-Broadcast in Mesh Performed in two phases - In the first phase :- each row of the mesh performs an all-to-all broadcast using the procedure for the linear array. In this phase, all nodes collect √p messages corresponding to the √p nodes of their respective rows. Each node consolidates this information into a single message of size m√p. In second communication phase :- a columnwise all-to-all broadcast of the consolidated messages.
  • 51. ● Generalization of the mesh algorithm to log p dimensions. ● Message size doubles at each of the log p steps. All-to-All-Broadcast in Hypercube
  • 52.
  • 53. Cost Analysis For linear array T = (ts + mtw)(p-1)
  • 54. For Mesh Tfirst = (ts + mtw)(√p-1) Tsecond = (ts + m√ptw)(√p-1) Ttotal = (2ts)(√p-1) + twm(p-1) Cost Analysis
  • 55. Chapter 4.3 All-Reduce and Prefix Sum Operation
  • 56. All-Reduce Each node starts with a buffer of size m and The final results of the operation are identical buffers of size m on each node that are formed by combining the original p buffers using an associative operator
  • 57. There is 2 ways to perform all-reduce A simple method to perform all-reduce is to perform an all-to-one reduction followed by a one- to-all broadcast
  • 58. There is 2 ways to perform all-reduce There is a faster way to perform all-reduce by using the communication pattern of all-to-all broadcast The Specialty is the message size is not increased here. That is message size is m (m√p)
  • 61. Scatter and Gather ● In the scatter operation, a single node sends a unique message of size m to every other node (also called a one-to-all personalized communication). ● In the gather operation, a single node collects a unique message from each node. ● While the scatter operation is fundamentally different from broadcast, the algorithmic structure is similar, except for differences in message sizes (messages get smaller in scatter and stay constant in broadcast). ● The gather operation is exactly the inverse of the scatter operation.
  • 62.
  • 63. Example of Scatter on Hypercube In the first communication step, the source transfers half of the messages to one of its neighbors. In subsequent steps, each node that has some data transfers half of it to a neighbor that has yet to receive any data. There is a total of log p communication steps corresponding to the log p dimensions of the hypercube.
  • 64. The gather operation is simply the reverse of scatter.