SlideShare a Scribd company logo
1 of 38
Download to read offline
Parallel Computing
Mohamed Zahran (aka Z)
mzahran@cs.nyu.edu
http://www.mzahran.com
CSCI-UA.0480-003
MPI - III
Many slides of this
lecture are adopted
and slightly modified from:
• Gerassimos Barlas
• Peter S. Pacheco
Collective
point-to-point
Data distributions
Copyright © 2010, Elsevier Inc.
All rights Reserved
Sequential version
Different partitions of a 12-
component vector among 3 processes
• Block: Assign blocks of consecutive components to each process.
• Cyclic: Assign components in a round robin fashion.
• Block-cyclic: Use a cyclic distribution of blocks of components.
Parallel implementation of
vector addition
Copyright © 2010, Elsevier Inc.
All rights Reserved
How will you distribute parts of x[] and y[] to processes?
Scatter
• Read an entire vector on process 0
• MPI_Scatter sends the needed
components to each of the other
processes.
# data items going
to each process
Important:
• All arguments are important for the source process (process 0 in our example)
• For all other processes, only recv_buf_p, recv_count, recv_type, src_proc,
and comm are important
Reading and distributing a vector
Copyright © 2010, Elsevier Inc.
All rights Reserved
process 0 itself
also receives data.
• send_buf_p
– is not used except by the sender.
– However, it must be defined or NULL on others to make the
code correct.
– Must have at least communicator size * send_count elements
• All processes must call MPI_Scatter, not only the sender.
• send_count the number of data items sent to each process.
• recv_buf_p must have at least send_count elements
• MPI_Scatter uses block distribution
0 21 3 4 5 6 7 0
0 1 2
3 4 5
6 7 8
Process 0
Process 0
Process 1
Process 2
Gather
• MPI_Gather collects all of the
components of the vector onto process
dest process, ordered in rank order.
Important:
• All arguments are important for the destination process.
• For all other processes, only send_buf_p, send_count, send_type, dest_proc,
and comm are important
number of elements
for any single receive
number of elements
in send_buf_p
Print a distributed vector (1)
Copyright © 2010, Elsevier Inc.
All rights Reserved
Print a distributed vector (2)
Copyright © 2010, Elsevier Inc.
All rights Reserved
Allgather
• Concatenates the contents of each
process’ send_buf_p and stores this in
each process’ recv_buf_p.
• As usual, recv_count is the amount of data
being received from each process.
Copyright © 2010, Elsevier Inc.
All rights Reserved
Matrix-vector multiplication
Copyright © 2010, Elsevier Inc.
All rights Reserved
i-th component of y
Dot product of the ith
row of A with x.
Matrix-vector multiplication
Pseudo-code Serial Version
C style arrays
Copyright © 2010, Elsevier Inc.
All rights Reserved
stored as
Serial matrix-vector multiplication
Let’s assume x[] is distributed among the different processes
An MPI matrix-vector
multiplication function (1)
Copyright © 2010, Elsevier Inc.
All rights Reserved
An MPI matrix-vector
multiplication function (2)
Copyright © 2010, Elsevier Inc.
All rights Reserved
Keep in mind …
• In distributed memory systems,
communication is more expensive than
computation.
• Distributing a fixed amount of data
among several messages is more
expensive than sending a single big
message.
Derived datatypes
• Used to represent any collection of data
items
• If a function that sends data knows this
information about a collection of data items,
it can collect the items from memory before
they are sent.
• A function that receives data can distribute
the items into their correct destinations in
memory when they’re received.
Copyright © 2010, Elsevier Inc.
All rights Reserved
Derived datatypes
• A sequence of basic MPI data types
together with a displacement for each
of the data types.
Copyright © 2010, Elsevier Inc.
All rights Reserved
Address in memory where the variables are stored
a and b are double; n is int
displacement from the beginning of the type
(We assume we start with a.)
MPI_Type create_struct
• Builds a derived datatype that consists
of individual elements that have
different basic types.
From the address of item 0an integer type that is big enough
to store an address on the system.
Number of elements in the type
Before you start using your new data type
Allows the MPI implementation to
optimize its internal representation of
the datatype for use in communication
functions.
When you are finished with your new type
This frees any additional storage used.
Copyright © 2010, Elsevier Inc.
All rights Reserved
Example (1)
Copyright © 2010, Elsevier Inc.
All rights Reserved
Example (2)
Copyright © 2010, Elsevier Inc.
All rights Reserved
Example (3)
Copyright © 2010, Elsevier Inc.
All rights Reserved
The receiving end can use
the received complex data
item as if it is a structure.
MEASURING TIME IN MPI
We have seen in the past …
• time in Linux
• clock() inside your code
• Does MPI offer anything else?
Elapsed parallel time
• Returns the number of seconds that
have elapsed since some time in the
past.
Elapsed time for
the calling process
How to Sync Processes?
MPI_Barrier
• Ensures that no process will return from
calling it until every process in the
communicator has started calling it.
Copyright © 2010, Elsevier Inc.
All rights Reserved
Let’s see how we can analyze the
performance of an MPI program
The matrix-vector multiplication
Run-times of serial and parallel
matrix-vector multiplication
Copyright © 2010, Elsevier Inc.
All rights Reserved
(Seconds)
Speedups of Parallel Matrix-
Vector Multiplication
Copyright © 2010, Elsevier Inc.
All rights Reserved
Efficiencies of Parallel Matrix-
Vector Multiplication
Copyright © 2010, Elsevier Inc.
All rights Reserved
Conclusions
• Reducing messages is a good
performance strategy!
– Collective vs point-to-point
• Distributing a fixed amount of data
among several messages is more
expensive than sending a single big
message.
Powered by TCPDF (www.tcpdf.org)Powered by TCPDF (www.tcpdf.org)Powered by TCPDF (www.tcpdf.org)

More Related Content

What's hot

Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...PyData
 
Dynamic Memory Allocation
Dynamic Memory AllocationDynamic Memory Allocation
Dynamic Memory Allocationvaani pathak
 
Introduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowIntroduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowPaolo Tomeo
 
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...MLconf
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewPoo Kuan Hoong
 
Intellectual technologies
Intellectual technologiesIntellectual technologies
Intellectual technologiesPolad Saruxanov
 
Data Structures - Lecture 1 [introduction]
Data Structures - Lecture 1 [introduction]Data Structures - Lecture 1 [introduction]
Data Structures - Lecture 1 [introduction]Muhammad Hammad Waseem
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterSudhang Shankar
 
How to use tensorflow
How to use tensorflowHow to use tensorflow
How to use tensorflowhyunyoung Lee
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16MLconf
 
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...Edureka!
 
Tensorflow windows installation
Tensorflow windows installationTensorflow windows installation
Tensorflow windows installationmarwa Ayad Mohamed
 
An Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureAn Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureMani Goswami
 
Memory allocation
Memory allocationMemory allocation
Memory allocationsanya6900
 

What's hot (20)

Introduction to TensorFlow
Introduction to TensorFlowIntroduction to TensorFlow
Introduction to TensorFlow
 
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
 
Dynamic Memory Allocation
Dynamic Memory AllocationDynamic Memory Allocation
Dynamic Memory Allocation
 
Introduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowIntroduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlow
 
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An Overview
 
Intellectual technologies
Intellectual technologiesIntellectual technologies
Intellectual technologies
 
Data Structures - Lecture 1 [introduction]
Data Structures - Lecture 1 [introduction]Data Structures - Lecture 1 [introduction]
Data Structures - Lecture 1 [introduction]
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC cluster
 
How to use tensorflow
How to use tensorflowHow to use tensorflow
How to use tensorflow
 
Heap Management
Heap ManagementHeap Management
Heap Management
 
INET for Starters
INET for StartersINET for Starters
INET for Starters
 
Scientific Python
Scientific PythonScientific Python
Scientific Python
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
 
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...
 
Tensorflow windows installation
Tensorflow windows installationTensorflow windows installation
Tensorflow windows installation
 
Kaggle tokyo 2018
Kaggle tokyo 2018Kaggle tokyo 2018
Kaggle tokyo 2018
 
C dynamic ppt
C dynamic pptC dynamic ppt
C dynamic ppt
 
An Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureAn Introduction to TensorFlow architecture
An Introduction to TensorFlow architecture
 
Memory allocation
Memory allocationMemory allocation
Memory allocation
 

Similar to MPI - 3

Distributed Memory Programming with MPI
Distributed Memory Programming with MPIDistributed Memory Programming with MPI
Distributed Memory Programming with MPIDilum Bandara
 
6-9-2017-slides-vFinal.pptx
6-9-2017-slides-vFinal.pptx6-9-2017-slides-vFinal.pptx
6-9-2017-slides-vFinal.pptxSimRelokasi2
 
Parallel Computing - Lec 4
Parallel Computing - Lec 4Parallel Computing - Lec 4
Parallel Computing - Lec 4Shah Zaib
 
Debugging With Id
Debugging With IdDebugging With Id
Debugging With Idguest215c4e
 
Programming using MPI and OpenMP
Programming using MPI and OpenMPProgramming using MPI and OpenMP
Programming using MPI and OpenMPDivya Tiwari
 
Safetty systems intro_embedded_c
Safetty systems intro_embedded_cSafetty systems intro_embedded_c
Safetty systems intro_embedded_cMaria Cida Rosa
 
Parallel Computing - Lec 6
Parallel Computing - Lec 6Parallel Computing - Lec 6
Parallel Computing - Lec 6Shah Zaib
 
Inter-Process Communication in distributed systems
Inter-Process Communication in distributed systemsInter-Process Communication in distributed systems
Inter-Process Communication in distributed systemsAya Mahmoud
 
Distributed computing
Distributed computingDistributed computing
Distributed computingDeepak John
 
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニックUnity Technologies Japan K.K.
 
Linux System Programming - Advanced File I/O
Linux System Programming - Advanced File I/OLinux System Programming - Advanced File I/O
Linux System Programming - Advanced File I/OYourHelper1
 
Module-6 process managedf;jsovj;ksdv;sdkvnksdnvldknvlkdfsment.ppt
Module-6 process managedf;jsovj;ksdv;sdkvnksdnvldknvlkdfsment.pptModule-6 process managedf;jsovj;ksdv;sdkvnksdnvldknvlkdfsment.ppt
Module-6 process managedf;jsovj;ksdv;sdkvnksdnvldknvlkdfsment.pptKAnurag2
 
Application of code composer studio in digital signal processing
Application of code composer studio in digital signal processingApplication of code composer studio in digital signal processing
Application of code composer studio in digital signal processingIAEME Publication
 

Similar to MPI - 3 (20)

Distributed Memory Programming with MPI
Distributed Memory Programming with MPIDistributed Memory Programming with MPI
Distributed Memory Programming with MPI
 
MPI - 4
MPI - 4MPI - 4
MPI - 4
 
MPI - 2
MPI - 2MPI - 2
MPI - 2
 
6-9-2017-slides-vFinal.pptx
6-9-2017-slides-vFinal.pptx6-9-2017-slides-vFinal.pptx
6-9-2017-slides-vFinal.pptx
 
Parallel Computing - Lec 4
Parallel Computing - Lec 4Parallel Computing - Lec 4
Parallel Computing - Lec 4
 
Debugging With Id
Debugging With IdDebugging With Id
Debugging With Id
 
Programming using MPI and OpenMP
Programming using MPI and OpenMPProgramming using MPI and OpenMP
Programming using MPI and OpenMP
 
Safetty systems intro_embedded_c
Safetty systems intro_embedded_cSafetty systems intro_embedded_c
Safetty systems intro_embedded_c
 
Parallel Computing - Lec 6
Parallel Computing - Lec 6Parallel Computing - Lec 6
Parallel Computing - Lec 6
 
Inter-Process Communication in distributed systems
Inter-Process Communication in distributed systemsInter-Process Communication in distributed systems
Inter-Process Communication in distributed systems
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
 
Technical Interview
Technical InterviewTechnical Interview
Technical Interview
 
Linux System Programming - Advanced File I/O
Linux System Programming - Advanced File I/OLinux System Programming - Advanced File I/O
Linux System Programming - Advanced File I/O
 
Module-6 process managedf;jsovj;ksdv;sdkvnksdnvldknvlkdfsment.ppt
Module-6 process managedf;jsovj;ksdv;sdkvnksdnvldknvlkdfsment.pptModule-6 process managedf;jsovj;ksdv;sdkvnksdnvldknvlkdfsment.ppt
Module-6 process managedf;jsovj;ksdv;sdkvnksdnvldknvlkdfsment.ppt
 
Chapter 3 - Processes
Chapter 3 - ProcessesChapter 3 - Processes
Chapter 3 - Processes
 
Application of code composer studio in digital signal processing
Application of code composer studio in digital signal processingApplication of code composer studio in digital signal processing
Application of code composer studio in digital signal processing
 
Ch03
Ch03Ch03
Ch03
 
3 processes
3 processes3 processes
3 processes
 
Unit 1
Unit  1Unit  1
Unit 1
 

More from Shah Zaib

Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisParallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisShah Zaib
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5Shah Zaib
 
Parallel Computing - Lec 3
Parallel Computing - Lec 3Parallel Computing - Lec 3
Parallel Computing - Lec 3Shah Zaib
 
Parallel Computing - Lec 2
Parallel Computing - Lec 2Parallel Computing - Lec 2
Parallel Computing - Lec 2Shah Zaib
 
Mpi collective communication operations
Mpi collective communication operationsMpi collective communication operations
Mpi collective communication operationsShah Zaib
 

More from Shah Zaib (6)

Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisParallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
 
MPI - 1
MPI - 1MPI - 1
MPI - 1
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5
 
Parallel Computing - Lec 3
Parallel Computing - Lec 3Parallel Computing - Lec 3
Parallel Computing - Lec 3
 
Parallel Computing - Lec 2
Parallel Computing - Lec 2Parallel Computing - Lec 2
Parallel Computing - Lec 2
 
Mpi collective communication operations
Mpi collective communication operationsMpi collective communication operations
Mpi collective communication operations
 

Recently uploaded

Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...
Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...
Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...samsungultra782445
 
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信oopacde
 
Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammam
Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in DammamAbortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammam
Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammamahmedjiabur940
 
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一uodye
 
怎样办理斯威本科技大学毕业证(SUT毕业证书)成绩单留信认证
怎样办理斯威本科技大学毕业证(SUT毕业证书)成绩单留信认证怎样办理斯威本科技大学毕业证(SUT毕业证书)成绩单留信认证
怎样办理斯威本科技大学毕业证(SUT毕业证书)成绩单留信认证tufbav
 
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证wpkuukw
 
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证ehyxf
 
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证wpkuukw
 
Guwahati Escorts Service Girl ^ 9332606886, WhatsApp Anytime Guwahati
Guwahati Escorts Service Girl ^ 9332606886, WhatsApp Anytime GuwahatiGuwahati Escorts Service Girl ^ 9332606886, WhatsApp Anytime Guwahati
Guwahati Escorts Service Girl ^ 9332606886, WhatsApp Anytime Guwahatimeghakumariji156
 
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...drmarathore
 
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一ougvy
 
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证ehyxf
 
CRISIS COMMUNICATION presentation=-Rishabh(11195)-group ppt (4).pptx
CRISIS COMMUNICATION presentation=-Rishabh(11195)-group ppt (4).pptxCRISIS COMMUNICATION presentation=-Rishabh(11195)-group ppt (4).pptx
CRISIS COMMUNICATION presentation=-Rishabh(11195)-group ppt (4).pptxRishabh332761
 
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制uodye
 
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证ehyxf
 
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Hilti's Latest Battery - Hire Depot.pptx
Hilti's Latest Battery - Hire Depot.pptxHilti's Latest Battery - Hire Depot.pptx
Hilti's Latest Battery - Hire Depot.pptxhiredepot6
 
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证wpkuukw
 
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理uodye
 

Recently uploaded (20)

Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...
Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...
Abortion pills in Jeddah +966572737505 <> buy cytotec <> unwanted kit Saudi A...
 
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信
在线办理(scu毕业证)南十字星大学毕业证电子版学位证书注册证明信
 
Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammam
Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in DammamAbortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammam
Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammam
 
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一
在线制作(UQ毕业证书)昆士兰大学毕业证成绩单原版一比一
 
怎样办理斯威本科技大学毕业证(SUT毕业证书)成绩单留信认证
怎样办理斯威本科技大学毕业证(SUT毕业证书)成绩单留信认证怎样办理斯威本科技大学毕业证(SUT毕业证书)成绩单留信认证
怎样办理斯威本科技大学毕业证(SUT毕业证书)成绩单留信认证
 
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证
一比一定(购)坎特伯雷大学毕业证(UC毕业证)成绩单学位证
 
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证
怎样办理阿德莱德大学毕业证(Adelaide毕业证书)成绩单留信认证
 
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证
一比一定(购)UNITEC理工学院毕业证(UNITEC毕业证)成绩单学位证
 
Guwahati Escorts Service Girl ^ 9332606886, WhatsApp Anytime Guwahati
Guwahati Escorts Service Girl ^ 9332606886, WhatsApp Anytime GuwahatiGuwahati Escorts Service Girl ^ 9332606886, WhatsApp Anytime Guwahati
Guwahati Escorts Service Girl ^ 9332606886, WhatsApp Anytime Guwahati
 
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
Abort pregnancy in research centre+966_505195917 abortion pills in Kuwait cyt...
 
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
 
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证
怎样办理昆士兰大学毕业证(UQ毕业证书)成绩单留信认证
 
CRISIS COMMUNICATION presentation=-Rishabh(11195)-group ppt (4).pptx
CRISIS COMMUNICATION presentation=-Rishabh(11195)-group ppt (4).pptxCRISIS COMMUNICATION presentation=-Rishabh(11195)-group ppt (4).pptx
CRISIS COMMUNICATION presentation=-Rishabh(11195)-group ppt (4).pptx
 
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制
一比一原版(Otago毕业证书)奥塔哥理工学院毕业证成绩单学位证靠谱定制
 
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证
怎样办理圣芭芭拉分校毕业证(UCSB毕业证书)成绩单留信认证
 
Critical Commentary Social Work Ethics.pptx
Critical Commentary Social Work Ethics.pptxCritical Commentary Social Work Ethics.pptx
Critical Commentary Social Work Ethics.pptx
 
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Palghar [ 7014168258 ] Call Me For Genuine Models W...
 
Hilti's Latest Battery - Hire Depot.pptx
Hilti's Latest Battery - Hire Depot.pptxHilti's Latest Battery - Hire Depot.pptx
Hilti's Latest Battery - Hire Depot.pptx
 
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证
一比一定(购)国立南方理工学院毕业证(Southern毕业证)成绩单学位证
 
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
 

MPI - 3

  • 1. Parallel Computing Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com CSCI-UA.0480-003 MPI - III Many slides of this lecture are adopted and slightly modified from: • Gerassimos Barlas • Peter S. Pacheco
  • 3. Data distributions Copyright © 2010, Elsevier Inc. All rights Reserved Sequential version
  • 4. Different partitions of a 12- component vector among 3 processes • Block: Assign blocks of consecutive components to each process. • Cyclic: Assign components in a round robin fashion. • Block-cyclic: Use a cyclic distribution of blocks of components.
  • 5. Parallel implementation of vector addition Copyright © 2010, Elsevier Inc. All rights Reserved How will you distribute parts of x[] and y[] to processes?
  • 6. Scatter • Read an entire vector on process 0 • MPI_Scatter sends the needed components to each of the other processes. # data items going to each process Important: • All arguments are important for the source process (process 0 in our example) • For all other processes, only recv_buf_p, recv_count, recv_type, src_proc, and comm are important
  • 7. Reading and distributing a vector Copyright © 2010, Elsevier Inc. All rights Reserved process 0 itself also receives data.
  • 8. • send_buf_p – is not used except by the sender. – However, it must be defined or NULL on others to make the code correct. – Must have at least communicator size * send_count elements • All processes must call MPI_Scatter, not only the sender. • send_count the number of data items sent to each process. • recv_buf_p must have at least send_count elements • MPI_Scatter uses block distribution
  • 9. 0 21 3 4 5 6 7 0 0 1 2 3 4 5 6 7 8 Process 0 Process 0 Process 1 Process 2
  • 10. Gather • MPI_Gather collects all of the components of the vector onto process dest process, ordered in rank order. Important: • All arguments are important for the destination process. • For all other processes, only send_buf_p, send_count, send_type, dest_proc, and comm are important number of elements for any single receive number of elements in send_buf_p
  • 11. Print a distributed vector (1) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 12. Print a distributed vector (2) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 13. Allgather • Concatenates the contents of each process’ send_buf_p and stores this in each process’ recv_buf_p. • As usual, recv_count is the amount of data being received from each process. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 14. Matrix-vector multiplication Copyright © 2010, Elsevier Inc. All rights Reserved i-th component of y Dot product of the ith row of A with x.
  • 16. C style arrays Copyright © 2010, Elsevier Inc. All rights Reserved stored as
  • 17. Serial matrix-vector multiplication Let’s assume x[] is distributed among the different processes
  • 18. An MPI matrix-vector multiplication function (1) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 19. An MPI matrix-vector multiplication function (2) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 20. Keep in mind … • In distributed memory systems, communication is more expensive than computation. • Distributing a fixed amount of data among several messages is more expensive than sending a single big message.
  • 21. Derived datatypes • Used to represent any collection of data items • If a function that sends data knows this information about a collection of data items, it can collect the items from memory before they are sent. • A function that receives data can distribute the items into their correct destinations in memory when they’re received. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 22. Derived datatypes • A sequence of basic MPI data types together with a displacement for each of the data types. Copyright © 2010, Elsevier Inc. All rights Reserved Address in memory where the variables are stored a and b are double; n is int displacement from the beginning of the type (We assume we start with a.)
  • 23. MPI_Type create_struct • Builds a derived datatype that consists of individual elements that have different basic types. From the address of item 0an integer type that is big enough to store an address on the system. Number of elements in the type
  • 24. Before you start using your new data type Allows the MPI implementation to optimize its internal representation of the datatype for use in communication functions.
  • 25. When you are finished with your new type This frees any additional storage used. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 26. Example (1) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 27. Example (2) Copyright © 2010, Elsevier Inc. All rights Reserved
  • 28. Example (3) Copyright © 2010, Elsevier Inc. All rights Reserved The receiving end can use the received complex data item as if it is a structure.
  • 30. We have seen in the past … • time in Linux • clock() inside your code • Does MPI offer anything else?
  • 31. Elapsed parallel time • Returns the number of seconds that have elapsed since some time in the past. Elapsed time for the calling process
  • 32. How to Sync Processes?
  • 33. MPI_Barrier • Ensures that no process will return from calling it until every process in the communicator has started calling it. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 34. Let’s see how we can analyze the performance of an MPI program The matrix-vector multiplication
  • 35. Run-times of serial and parallel matrix-vector multiplication Copyright © 2010, Elsevier Inc. All rights Reserved (Seconds)
  • 36. Speedups of Parallel Matrix- Vector Multiplication Copyright © 2010, Elsevier Inc. All rights Reserved
  • 37. Efficiencies of Parallel Matrix- Vector Multiplication Copyright © 2010, Elsevier Inc. All rights Reserved
  • 38. Conclusions • Reducing messages is a good performance strategy! – Collective vs point-to-point • Distributing a fixed amount of data among several messages is more expensive than sending a single big message. Powered by TCPDF (www.tcpdf.org)Powered by TCPDF (www.tcpdf.org)Powered by TCPDF (www.tcpdf.org)