Parallel Computing
Mohamed Zahran (aka Z)
mzahran@cs.nyu.edu
http://www.mzahran.com
CSCI-UA.0480-003
MPI - III
Many slides in this lecture are adapted and slightly modified from:
• Gerassimos Barlas
• Peter S. Pacheco
Collective vs. point-to-point communication
Data distributions
Copyright © 2010, Elsevier Inc.
All rights Reserved
Sequential version
Different partitions of a 12-component vector among 3 processes
• Block: Assign blocks of consecutive components to each process.
• Cyclic: Assign components in a round robin fashion.
• Block-cyclic: Use a cyclic distribution of blocks of components.
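The three distributions above can be captured by simple index formulas. A minimal sketch (not from the slides), assuming n components, p processes, p evenly dividing n, and a block size b for the block-cyclic case:

```c
/* Owner of component i under a block distribution of n items over
   p processes (assumes p evenly divides n, as in the 12-over-3 example). */
int block_owner(int i, int n, int p) { return i / (n / p); }

/* Owner of component i under a cyclic (round-robin) distribution. */
int cyclic_owner(int i, int p) { return i % p; }

/* Owner of component i under a block-cyclic distribution with
   blocks of b consecutive components dealt out round-robin. */
int block_cyclic_owner(int i, int b, int p) { return (i / b) % p; }
```

For the 12-component, 3-process example, component 5 lands on process 1 under block, process 2 under cyclic, and process 2 under block-cyclic with b = 2.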
Parallel implementation of
vector addition
How will you distribute parts of x[] and y[] to processes?
Scatter
• Read an entire vector on process 0
• MPI_Scatter sends the needed
components to each of the other
processes.
# data items going
to each process
Important:
• All arguments are important for the source process (process 0 in our example)
• For all other processes, only recv_buf_p, recv_count, recv_type, src_proc,
and comm are important
Reading and distributing a vector
process 0 itself
also receives data.
• send_buf_p
– is used only by the sender (the source process).
– On the other processes it is ignored, but it must still be a valid argument (it can be NULL there).
– On the source it must have at least communicator size * send_count elements.
• All processes must call MPI_Scatter, not only the sender.
• send_count is the number of data items sent to each process.
• recv_buf_p must have at least send_count elements.
• MPI_Scatter uses a block distribution.
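A minimal sketch of the read-and-scatter pattern described above, with illustrative names (read_and_scatter, local_x, and the scanf stand-in for input are assumptions, not from the slides):

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Sketch: process 0 reads a vector of n doubles, then every process
   receives its block of n/comm_sz components via MPI_Scatter.
   Assumes comm_sz evenly divides n. */
void read_and_scatter(double *local_x, int local_n, int n,
                      int my_rank, MPI_Comm comm) {
    double *x = NULL;
    if (my_rank == 0) {
        x = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++)        /* stand-in for real input */
            scanf("%lf", &x[i]);
    }
    /* Every process makes the call; the send arguments matter
       only on the source process (0 here). */
    MPI_Scatter(x, local_n, MPI_DOUBLE,
                local_x, local_n, MPI_DOUBLE, 0, comm);
    free(x);   /* free(NULL) is a no-op on the other processes */
}
```

Compile with mpicc and run under mpiexec; the sketch is not runnable as a standalone program.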
[Figure: MPI_Scatter with 3 processes. Process 0 starts with the components 0 1 2 3 4 5 6 7 8; after the call, process 0 holds 0 1 2, process 1 holds 3 4 5, and process 2 holds 6 7 8.]
Gather
• MPI_Gather collects all of the components of the vector onto the destination process (dest_proc), ordered by rank.
Important:
• All arguments are important for the destination process.
• For all other processes, only send_buf_p, send_count, send_type, dest_proc,
and comm are important
number of elements
for any single receive
number of elements
in send_buf_p
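A hedged sketch of finishing the earlier vector-addition example with MPI_Gather: each process sums its block, and the destination process assembles the full result in rank order. Function and variable names here are illustrative, not from the slides:

```c
#include <stdlib.h>
#include <mpi.h>

/* Sketch: add distributed blocks of x and y, then gather the
   full result z onto process 0, ordered by rank. */
void parallel_vector_sum(const double *local_x, const double *local_y,
                         int local_n, double *z /* size n, root only */,
                         MPI_Comm comm) {
    double *local_z = malloc(local_n * sizeof(double));
    for (int i = 0; i < local_n; i++)
        local_z[i] = local_x[i] + local_y[i];
    /* All processes call MPI_Gather; recv arguments matter
       only on the destination process. */
    MPI_Gather(local_z, local_n, MPI_DOUBLE,
               z, local_n, MPI_DOUBLE, 0, comm);
    free(local_z);
}
```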
Print a distributed vector (1)
Print a distributed vector (2)
Allgather
• Concatenates the contents of each
process’ send_buf_p and stores this in
each process’ recv_buf_p.
• As usual, recv_count is the amount of data
being received from each process.
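A minimal sketch of the call (names are illustrative): every process contributes its local_n components, and every process ends up with the whole n-component vector, where n = local_n * comm_sz.

```c
#include <mpi.h>

/* Sketch: concatenate every process' send buffer into every
   process' receive buffer, in rank order. */
void gather_everywhere(const double *local_x, int local_n,
                       double *x /* size n, on every process */,
                       MPI_Comm comm) {
    MPI_Allgather(local_x, local_n, MPI_DOUBLE,
                  x, local_n, MPI_DOUBLE, comm);
}
```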
Matrix-vector multiplication
i-th component of y = dot product of the i-th row of A with x.
Matrix-vector multiplication
Pseudo-code Serial Version
C style arrays
stored as
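One possible rendering of the serial version as runnable C, using the row-major 1-D storage just described (each row's dot product with x, as on the earlier slide):

```c
/* Serial matrix-vector multiply, y = A x, with the m-by-n matrix A
   stored row-major in a one-dimensional C array. */
void mat_vect_mult(const double *A, const double *x, double *y,
                   int m, int n) {
    for (int i = 0; i < m; i++) {
        y[i] = 0.0;
        for (int j = 0; j < n; j++)
            y[i] += A[i * n + j] * x[j];  /* dot product of row i with x */
    }
}
```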
Serial matrix-vector multiplication
Let’s assume x[] is distributed among the different processes
An MPI matrix-vector
multiplication function (1)
An MPI matrix-vector
multiplication function (2)
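A sketch in the spirit of Pacheco's MPI function: x is first assembled on every process with MPI_Allgather, then each process computes the components of y for its block of rows. Names and the even-divisibility assumption are illustrative:

```c
#include <stdlib.h>
#include <mpi.h>

/* Sketch: block-row MPI matrix-vector multiply. Each process owns
   local_m rows of A (row-major) and local_n components of x. */
void mpi_mat_vect_mult(const double *local_A, const double *local_x,
                       double *local_y, int local_m, int n, int local_n,
                       MPI_Comm comm) {
    double *x = malloc(n * sizeof(double));
    /* Assemble the full x on every process. */
    MPI_Allgather(local_x, local_n, MPI_DOUBLE,
                  x, local_n, MPI_DOUBLE, comm);
    for (int i = 0; i < local_m; i++) {
        local_y[i] = 0.0;
        for (int j = 0; j < n; j++)
            local_y[i] += local_A[i * n + j] * x[j];
    }
    free(x);
}
```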
Keep in mind …
• In distributed memory systems,
communication is more expensive than
computation.
• Distributing a fixed amount of data
among several messages is more
expensive than sending a single big
message.
Derived datatypes
• Used to represent any collection of data
items
• If a function that sends data knows this
information about a collection of data items,
it can collect the items from memory before
they are sent.
• A function that receives data can distribute
the items into their correct destinations in
memory when they’re received.
Derived datatypes
• A sequence of basic MPI data types
together with a displacement for each
of the data types.
Address in memory where the variables are stored
a and b are double; n is int
displacement from the beginning of the type
(We assume we start with a.)
MPI_Type_create_struct
• Builds a derived datatype that consists
of individual elements that have
different basic types.
Displacements are measured from the address of item 0.
MPI_Aint: an integer type that is big enough to store an address on the system.
count: the number of elements in the type.
Before you start using your new datatype, call MPI_Type_commit. This allows the MPI implementation to optimize its internal representation of the datatype for use in communication functions.
When you are finished with your new type, call MPI_Type_free. This frees any additional storage used by the datatype.
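Putting the pieces together, a sketch closely following Pacheco's Build_mpi_type for the a, b, n example above (treat the details as illustrative):

```c
#include <mpi.h>

/* Sketch: build a derived datatype describing two doubles (a, b)
   and an int (n), then commit it for use in communication. */
void build_mpi_type(double *a_p, double *b_p, int *n_p,
                    MPI_Datatype *input_mpi_t_p) {
    int blocklengths[3] = {1, 1, 1};
    MPI_Datatype types[3] = {MPI_DOUBLE, MPI_DOUBLE, MPI_INT};
    MPI_Aint a_addr, b_addr, n_addr;
    MPI_Aint displacements[3] = {0};   /* measured from item 0 (a) */

    MPI_Get_address(a_p, &a_addr);
    MPI_Get_address(b_p, &b_addr);
    MPI_Get_address(n_p, &n_addr);
    displacements[1] = b_addr - a_addr;
    displacements[2] = n_addr - a_addr;

    MPI_Type_create_struct(3, blocklengths, displacements, types,
                           input_mpi_t_p);
    MPI_Type_commit(input_mpi_t_p);    /* required before first use */
}
/* ... after the last use: MPI_Type_free(&input_mpi_t); */
```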
Example (1)
Example (2)
Example (3)
The receiving end can use the received complex data item as if it were a structure.
MEASURING TIME IN MPI
We have seen in the past …
• time in Linux
• clock() inside your code
• Does MPI offer anything else?
Elapsed parallel time
• MPI_Wtime returns the number of seconds that have elapsed since some arbitrary time in the past.
• It measures elapsed (wall-clock) time for the calling process.
How to Sync Processes?
MPI_Barrier
• Ensures that no process will return from
calling it until every process in the
communicator has started calling it.
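MPI_Barrier and MPI_Wtime combine into the standard MPI timing idiom. A sketch; the MPI_Reduce with MPI_MAX is an addition here, used so the reported time is that of the slowest process:

```c
#include <mpi.h>

/* Sketch: synchronize so all processes start together, time the
   local work, then take the maximum across processes. */
double time_work(MPI_Comm comm) {
    double local_start, local_elapsed, elapsed;
    MPI_Barrier(comm);                 /* line everyone up first */
    local_start = MPI_Wtime();
    /* ... code being timed ... */
    local_elapsed = MPI_Wtime() - local_start;
    MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE,
               MPI_MAX, 0, comm);
    return elapsed;                    /* meaningful on process 0 */
}
```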
Let’s see how we can analyze the
performance of an MPI program
The matrix-vector multiplication
Run-times of serial and parallel
matrix-vector multiplication
(Times are in seconds.)
Speedups of Parallel Matrix-Vector Multiplication
Efficiencies of Parallel Matrix-Vector Multiplication
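The speedups and efficiencies on these slides follow the usual definitions: S = T_serial / T_parallel and E = S / p. A trivial helper sketch:

```c
/* Speedup: how many times faster the parallel run is. */
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

/* Efficiency: speedup per process; 1.0 is perfect scaling. */
double efficiency(double t_serial, double t_parallel, int p) {
    return speedup(t_serial, t_parallel) / p;
}
```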
Conclusions
• Reducing messages is a good
performance strategy!
– Collective vs point-to-point
• Distributing a fixed amount of data
among several messages is more
expensive than sending a single big
message.