Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Parallel Programming
with MPI and OpenMP
Michael J. Quinn
Chapter 7
Performance Analysis

Learning Objectives
• Predict performance of parallel programs
• Understand barriers to higher performance

Outline
• General speedup formula
• Amdahl’s Law
• Gustafson-Barsis’s Law
• Karp-Flatt metric
• Isoefficiency metric

Speedup Formula
\[ \text{Speedup} = \frac{\text{Sequential execution time}}{\text{Parallel execution time}} \]

Execution Time Components
• Inherently sequential computations: σ(n)
• Potentially parallel computations: φ(n)
• Communication operations: κ(n,p)

Speedup Expression
\[ \psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)} \]
Here ψ(n,p) denotes the speedup on a problem of size n using p processors.

[Figure: plot of φ(n)/p]

[Figure: plot of κ(n,p)]

[Figure: plot of φ(n)/p + κ(n,p)]

Speedup Plot
[Figure: speedup curve, annotated “elbowing out”]
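
As a minimal sketch of the speedup expression, the Python function below evaluates the bound (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p)). The cost values (σ = 1, φ = 100) and the linear communication model in `hypothetical_kappa` are illustrative assumptions, not taken from the slides. With a communication term that grows with p, the bound rises and then falls, which is one way to see the "elbowing out" shape.

```python
def speedup_bound(sigma, phi, kappa, p):
    """Upper bound on speedup: (sigma + phi) / (sigma + phi/p + kappa)."""
    return (sigma + phi) / (sigma + phi / p + kappa)


def hypothetical_kappa(p):
    # Assumption for illustration only: communication time grows linearly in p.
    return 0.5 * p


if __name__ == "__main__":
    sigma, phi = 1.0, 100.0  # assumed sequential and parallelizable times
    for p in (1, 2, 4, 8, 16, 32, 64):
        psi = speedup_bound(sigma, phi, hypothetical_kappa(p), p)
        print(f"p={p:2d}  speedup bound = {psi:.2f}")
```

Under these assumed costs the bound peaks near p = 16 and declines afterwards, because the added communication time outweighs the shrinking φ(n)/p term.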

Efficiency
\[ \text{Efficiency} = \frac{\text{Sequential execution time}}{\text{Processors used} \times \text{Parallel execution time}} \]
\[ \text{Efficiency} = \frac{\text{Speedup}}{\text{Processors used}} \]

0 ≤ ε(n,p) ≤ 1
\[ \varepsilon(n,p) \le \frac{\sigma(n) + \varphi(n)}{p\,\sigma(n) + \varphi(n) + p\,\kappa(n,p)} \]
All terms > 0 ⇒ ε(n,p) > 0
Denominator > numerator ⇒ ε(n,p) < 1
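
The efficiency bound can be checked numerically. The sketch below evaluates (σ(n) + φ(n)) / (p σ(n) + φ(n) + p κ(n,p)) and confirms that it equals the speedup bound divided by p. The numeric inputs are assumed values chosen only for illustration.

```python
def speedup_bound(sigma, phi, kappa, p):
    """Upper bound on speedup: (sigma + phi) / (sigma + phi/p + kappa)."""
    return (sigma + phi) / (sigma + phi / p + kappa)


def efficiency_bound(sigma, phi, kappa, p):
    """Upper bound on efficiency: (sigma + phi) / (p*sigma + phi + p*kappa)."""
    return (sigma + phi) / (p * sigma + phi + p * kappa)


if __name__ == "__main__":
    # Assumed costs: sigma = 1, phi = 100, kappa = 2, on p = 8 processors.
    eff = efficiency_bound(1.0, 100.0, 2.0, 8)
    psi = speedup_bound(1.0, 100.0, 2.0, 8)
    print(f"efficiency bound = {eff:.3f}, speedup bound / p = {psi / 8:.3f}")
```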

Amdahl’s Law
\[ \psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)} \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p} \]
Let f = σ(n)/(σ(n) + φ(n)), the inherently sequential fraction of the computation. Then
\[ \psi \le \frac{1}{f + (1 - f)/p} \]

Example 1
• 95% of a program’s execution time occurs inside a loop that can be executed in parallel. What is the maximum speedup we should expect from a parallel version of the program executing on 8 CPUs?
\[ \psi \le \frac{1}{0.05 + (1 - 0.05)/8} \approx 5.9 \]

Example 2
• 20% of a program’s execution time is spent within inherently sequential code. What is the limit to the speedup achievable by a parallel version of the program?
\[ \lim_{p \to \infty} \frac{1}{0.2 + (1 - 0.2)/p} = \frac{1}{0.2} = 5 \]
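
The two Amdahl’s Law examples can be reproduced with a few lines of Python. This is a minimal sketch; the function names `amdahl_speedup` and `amdahl_limit` are chosen here for illustration.

```python
def amdahl_speedup(f, p):
    """Amdahl's Law bound 1 / (f + (1 - f)/p) for serial fraction f on p processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("serial fraction f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


def amdahl_limit(f):
    """Limit of the bound as p grows without bound: 1 / f (requires f > 0)."""
    if not 0.0 < f <= 1.0:
        raise ValueError("serial fraction f must lie in (0, 1]")
    return 1.0 / f


if __name__ == "__main__":
    # Example 1: 95% parallelizable, so f = 0.05, on 8 CPUs.
    print(f"Example 1: {amdahl_speedup(0.05, 8):.1f}")
    # Example 2: 20% inherently sequential, limit as p grows.
    print(f"Example 2: {amdahl_limit(0.2):.1f}")
```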

Pop Quiz
• An oceanographer gives you a serial program and asks you how much faster it might run on 8 processors. You can only find one function amenable to a parallel solution. Benchmarking on a single processor reveals 80% of the execution time is spent inside this function. What is the best speedup a parallel version is likely to achieve on 8 processors?

Pop Quiz
• A computer animation program generates a feature movie frame-by-frame. Each frame can be generated independently and is output to its own file. If it takes 99 seconds to render a frame and 1 second to output it, how much speedup can be achieved by rendering the movie on 100 processors?

Limitations of Amdahl’s Law
• Ignores κ(n,p)
• Overestimates the achievable speedup

Amdahl Effect
• Typically κ(n,p) has lower complexity than φ(n)/p
• As n increases, φ(n)/p dominates κ(n,p)
• As n increases, speedup increases

Illustration of Amdahl Effect
[Figure: speedup versus number of processors, with curves for n = 100, n = 1,000, and n = 10,000]
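
The Amdahl effect can be illustrated with a small numerical model. The cost functions below (constant serial time, Θ(n²) parallelizable work, Θ(n log p) communication) are assumptions made for this sketch and do not come from the slides; any model in which κ(n,p) grows more slowly in n than φ(n)/p should show the same trend.

```python
import math


def model_speedup(n, p):
    """Speedup bound under assumed, purely illustrative cost functions."""
    sigma = 100.0             # assumed constant serial time
    phi = float(n) ** 2       # assumed parallelizable work, Theta(n^2)
    kappa = n * math.log2(p)  # assumed communication time, Theta(n log p)
    return (sigma + phi) / (sigma + phi / p + kappa)


if __name__ == "__main__":
    p = 16
    for n in (100, 1_000, 10_000):
        print(f"n={n:6d}  speedup bound on {p} processors = {model_speedup(n, p):.2f}")
```

For a fixed p, the bound moves closer to p as n grows, because the φ(n)/p term increasingly dominates the communication term.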

Review of Amdahl’s Law
• Treats problem size as a constant
• Shows how execution time decreases as number of processors increases

Another Perspective
• We often use faster computers to solve larger problem instances
• Let’s treat time as a constant and allow problem size to increase with number of processors

Gustafson-Barsis’s Law
\[ \psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p} \]
Let s = σ(n)/(σ(n) + φ(n)/p), the fraction of parallel execution time spent in serial code. Then
\[ \psi \le p + (1 - p)s \]

Gustafson-Barsis’s Law
• Begin with parallel execution time
• Estimate sequential execution time to solve same problem
• Problem size is an increasing function of p
• Predicts scaled speedup

Example 1
• An application running on 10 processors spends 3% of its time in serial code. What is the scaled speedup of the application?
\[ \psi = 10 + (1 - 10)(0.03) = 10 - 0.27 = 9.73 \]
Execution on 1 CPU takes 10 times as long, except that 9 of the processors do not have to execute the serial code.

Example 2
• What is the maximum fraction of a program’s parallel execution time that can be spent in serial code if it is to achieve a scaled speedup of 7 on 8 processors?
\[ 7 = 8 + (1 - 8)s \Rightarrow s \approx 0.14 \]
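
Both Gustafson-Barsis examples follow directly from ψ = p + (1 − p)s. The sketch below evaluates the scaled speedup and also solves the same equation for s; the function names are chosen here for illustration.

```python
def scaled_speedup(s, p):
    """Gustafson-Barsis scaled speedup p + (1 - p) * s.

    s is the fraction of parallel execution time spent in serial code.
    """
    return p + (1 - p) * s


def max_serial_fraction(target_speedup, p):
    """Solve target = p + (1 - p) * s for s (requires p > 1)."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    return (p - target_speedup) / (p - 1)


if __name__ == "__main__":
    # Example 1: 10 processors, 3% of time in serial code.
    print(f"Example 1: {scaled_speedup(0.03, 10):.2f}")
    # Example 2: scaled speedup of 7 on 8 processors.
    print(f"Example 2: s = {max_serial_fraction(7, 8):.2f}")
```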

Pop Quiz
• A parallel program executing on 32 processors spends 5% of its time in sequential code. What is the scaled speedup of this program?

The Karp-Flatt Metric
• Amdahl’s Law and Gustafson-Barsis’s Law ignore κ(n,p)
• They can overestimate speedup or scaled speedup
• Karp and Flatt proposed another metric

Experimentally Determined Serial Fraction
\[ e = \frac{\sigma(n) + \kappa(n,p)}{\sigma(n) + \varphi(n)} \]
• Numerator: inherently serial component of parallel computation + processor communication and synchronization overhead
• Denominator: single processor execution time
In terms of the measured speedup ψ on p processors:
\[ e = \frac{1/\psi - 1/p}{1 - 1/p} \]

Experimentally Determined Serial Fraction
• Takes into account parallel overhead
• Detects other sources of overhead or inefficiency ignored in speedup model
  ◦ Process startup time
  ◦ Process synchronization time
  ◦ Imbalanced workload
  ◦ Architectural overhead

Example 1
p   2     3     4     5     6     7     8
ψ   1.8   2.5   3.1   3.6   4.0   4.4   4.7
e   0.1   0.1   0.1   0.1   0.1   0.1   0.1
What is the primary reason for speedup of only 4.7 on 8 CPUs?
Since e is constant, large serial fraction is the primary reason.

Example 2
p   2      3      4      5      6      7      8
ψ   1.9    2.6    3.2    3.7    4.1    4.5    4.7
e   0.070  0.075  0.080  0.085  0.090  0.095  0.100
What is the primary reason for speedup of only 4.7 on 8 CPUs?
Since e is steadily increasing, overhead is the primary reason.
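
The experimentally determined serial fraction can be computed directly from measured speedups with e = (1/ψ − 1/p) / (1 − 1/p). The sketch below applies it to the Example 1 speedups. Because the tabulated speedups are rounded, values recomputed this way may differ slightly from the tabulated e.

```python
def karp_flatt(psi, p):
    """Experimentally determined serial fraction for speedup psi on p > 1 processors."""
    if p <= 1:
        raise ValueError("the metric is defined for p > 1")
    return (1.0 / psi - 1.0 / p) / (1.0 - 1.0 / p)


# Measured speedups from Example 1, keyed by processor count.
EXAMPLE_1 = {2: 1.8, 3: 2.5, 4: 3.1, 5: 3.6, 6: 4.0, 7: 4.4, 8: 4.7}

if __name__ == "__main__":
    for p, psi in EXAMPLE_1.items():
        print(f"p={p}  speedup={psi}  e={karp_flatt(psi, p):.3f}")
```

A roughly constant e across processor counts points to limited parallelism, while an e that grows with p points to parallel overhead.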

Pop Quiz
• Is this program likely to achieve a speedup of 10 on 12 processors?
p   4     8     12
ψ   3.9   6.5   ?

Isoefficiency Metric
• Parallel system: parallel program executing on a parallel computer
• Scalability of a parallel system: measure of its ability to increase performance as number of processors increases
• A scalable system maintains efficiency as processors are added
• Isoefficiency: way to measure scalability

Isoefficiency Derivation Steps
• Begin with speedup formula
• Compute total amount of overhead
• Assume efficiency remains constant
• Determine relation between sequential execution time and overhead

Deriving Isoefficiency Relation
Determine overhead:
\[ T_0(n,p) = (p - 1)\,\sigma(n) + p\,\kappa(n,p) \]
Substitute overhead into speedup equation:
\[ \psi(n,p) \le \frac{p\,(\sigma(n) + \varphi(n))}{\sigma(n) + \varphi(n) + T_0(n,p)} \]
Substitute T(n,1) = σ(n) + φ(n). Assume efficiency is constant.
Isoefficiency relation:
\[ T(n,1) \ge C\,T_0(n,p) \]
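
The intermediate algebra between the speedup bound and the isoefficiency relation can be written out as follows. This is a sketch of the standard steps, using T(n,1) = σ(n) + φ(n) for the sequential time, T_0(n,p) for the total overhead, and ε(n,p) = ψ(n,p)/p for the efficiency; the constant C = ε/(1 − ε) follows from holding ε fixed.

```latex
\begin{align*}
\varepsilon(n,p) = \frac{\psi(n,p)}{p}
  &\le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n) + T_0(n,p)}
   = \frac{T(n,1)}{T(n,1) + T_0(n,p)}
   = \frac{1}{1 + T_0(n,p)/T(n,1)} \\[4pt]
\Rightarrow\quad \frac{T_0(n,p)}{T(n,1)}
  &\le \frac{1 - \varepsilon(n,p)}{\varepsilon(n,p)} \\[4pt]
\Rightarrow\quad T(n,1)
  &\ge \frac{\varepsilon(n,p)}{1 - \varepsilon(n,p)}\, T_0(n,p)
   = C\, T_0(n,p)
\end{align*}
```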

Scalability Function
• Suppose isoefficiency relation is n ≥ f(p)
• Let M(n) denote memory required for problem of size n
• M(f(p))/p shows how memory usage per processor must increase to maintain same efficiency
• We call M(f(p))/p the scalability function

Meaning of Scalability Function
• To maintain efficiency when increasing p, we must increase n
• Maximum problem size limited by available memory, which is linear in p
• Scalability function shows how memory usage per processor must grow to maintain efficiency
• Scalability function a constant means parallel system is perfectly scalable

Interpreting Scalability Function
[Figure: memory needed per processor versus number of processors, with curves C p log p, C p, C log p, and C, and a horizontal line marking memory size. Below the line: can maintain efficiency. Above the line: cannot maintain efficiency.]
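
The four scalability curves can be compared against a fixed per-processor memory size. In this sketch the constant C = 1, the base-2 logarithm, the memory limit of 64 units, and the candidate processor counts are all assumptions chosen for illustration.

```python
import math

C = 1.0  # assumed constant from the isoefficiency relation

# Scalability functions M(f(p))/p named after the four curves.
CURVES = {
    "C p log p": lambda p: C * p * math.log2(p),
    "C p": lambda p: C * p,
    "C log p": lambda p: C * math.log2(p),
    "C": lambda p: C,
}


def max_processors(curve, memory_per_processor, p_values):
    """Largest p in p_values whose required memory per processor fits, or None."""
    fitting = [p for p in p_values if curve(p) <= memory_per_processor]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    candidates = [2 ** k for k in range(1, 21)]  # p = 2, 4, ..., 2**20
    for name, curve in CURVES.items():
        best = max_processors(curve, 64.0, candidates)
        print(f"{name:10s} efficiency can be maintained up to p = {best}")
```

Under these assumptions the faster-growing curves hit the memory limit at small p, while the constant and logarithmic curves stay within it over the whole tested range.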

Example 1: Reduction
• Sequential algorithm complexity: T(n,1) = Θ(n)
• Parallel algorithm
  ◦ Computational complexity = Θ(n/p)
  ◦ Communication complexity = Θ(log p)
• Parallel overhead: T₀(n,p) = Θ(p log p)

Reduction (continued)
• Isoefficiency relation: n ≥ C p log p
• We ask: To maintain same level of efficiency, how must n increase when p increases?
• M(n) = n
\[ M(Cp \log p)/p = Cp \log p / p = C \log p \]
• The system has good scalability

Example 2: Floyd’s Algorithm
• Sequential time complexity: Θ(n³)
• Parallel computation time: Θ(n³/p)
• Parallel communication time: Θ(n² log p)
• Parallel overhead: T₀(n,p) = Θ(p n² log p)

Floyd’s Algorithm (continued)
• Isoefficiency relation: n³ ≥ C(p n² log p) ⇒ n ≥ C p log p
• M(n) = n²
\[ M(Cp \log p)/p = C^2 p^2 \log^2 p / p = C^2 p \log^2 p \]
• The parallel system has poor scalability

Example 3: Finite Difference
• Sequential time complexity per iteration: Θ(n²)
• Parallel communication complexity per iteration: Θ(n/√p)
• Parallel overhead: Θ(n√p)

Finite Difference (continued)
• Isoefficiency relation: n² ≥ C n √p ⇒ n ≥ C √p
• M(n) = n²
\[ M(C\sqrt{p})/p = C^2 p / p = C^2 \]
• This algorithm is perfectly scalable

Summary (1/3)
• Performance terms
  ◦ Speedup
  ◦ Efficiency
• Model of speedup
  ◦ Serial component
  ◦ Parallel component
  ◦ Communication component

Summary (2/3)
• What prevents linear speedup?
  ◦ Serial operations
  ◦ Communication operations
  ◦ Process start-up
  ◦ Imbalanced workloads
  ◦ Architectural limitations

Summary (3/3)
• Analyzing parallel performance
  ◦ Amdahl’s Law
  ◦ Gustafson-Barsis’s Law
  ◦ Karp-Flatt metric
  ◦ Isoefficiency metric
More Related Content

Similar to chapter7.ppt

Similar to chapter7.ppt (20)

Ch1
Ch1Ch1
Ch1
 
Enterprise application performance - Understanding & Learnings
Enterprise application performance - Understanding & LearningsEnterprise application performance - Understanding & Learnings
Enterprise application performance - Understanding & Learnings
 
Ginsbourg.com - Presentation of Performance & Load Testing Validation 2019
Ginsbourg.com - Presentation of Performance & Load Testing Validation 2019Ginsbourg.com - Presentation of Performance & Load Testing Validation 2019
Ginsbourg.com - Presentation of Performance & Load Testing Validation 2019
 
Speed up your Machine Learning workflows with built-in algorithms - Tel Aviv ...
Speed up your Machine Learning workflows with built-in algorithms - Tel Aviv ...Speed up your Machine Learning workflows with built-in algorithms - Tel Aviv ...
Speed up your Machine Learning workflows with built-in algorithms - Tel Aviv ...
 
Speed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithmsSpeed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithms
 
Debugging Gluon and Apache MXNet (AIM423) - AWS re:Invent 2018
Debugging Gluon and Apache MXNet (AIM423) - AWS re:Invent 2018Debugging Gluon and Apache MXNet (AIM423) - AWS re:Invent 2018
Debugging Gluon and Apache MXNet (AIM423) - AWS re:Invent 2018
 
Miriam RAM studio reliability modelling made easy
Miriam RAM studio reliability modelling made easyMiriam RAM studio reliability modelling made easy
Miriam RAM studio reliability modelling made easy
 
OpenACC Monthly Highlights February 2019
OpenACC Monthly Highlights February 2019OpenACC Monthly Highlights February 2019
OpenACC Monthly Highlights February 2019
 
OpenACC Monthly Highlights February 2019
OpenACC Monthly Highlights February 2019OpenACC Monthly Highlights February 2019
OpenACC Monthly Highlights February 2019
 
Distributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodDistributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with Horovod
 
Accelerate Machine Learning Workloads using Amazon EC2 P3 Instances - SRV201 ...
Accelerate Machine Learning Workloads using Amazon EC2 P3 Instances - SRV201 ...Accelerate Machine Learning Workloads using Amazon EC2 P3 Instances - SRV201 ...
Accelerate Machine Learning Workloads using Amazon EC2 P3 Instances - SRV201 ...
 
A challenge for thread parallelism on OpenFOAM
A challenge for thread parallelism on OpenFOAMA challenge for thread parallelism on OpenFOAM
A challenge for thread parallelism on OpenFOAM
 
Java/Hybris performance monitoring and optimization
Java/Hybris performance monitoring and optimizationJava/Hybris performance monitoring and optimization
Java/Hybris performance monitoring and optimization
 
I07 Simulation
I07 SimulationI07 Simulation
I07 Simulation
 
I07 Simulation
I07 SimulationI07 Simulation
I07 Simulation
 
Performance of processor.ppt
Performance of processor.pptPerformance of processor.ppt
Performance of processor.ppt
 
Fundamentals Performance Testing
Fundamentals Performance TestingFundamentals Performance Testing
Fundamentals Performance Testing
 
Amdahl`s law -Processor performance
Amdahl`s law -Processor performanceAmdahl`s law -Processor performance
Amdahl`s law -Processor performance
 
ch6.ppt
ch6.pptch6.ppt
ch6.ppt
 
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisParallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
 

Recently uploaded

如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
great91
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
pwgnohujw
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
zifhagzkk
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
pwgnohujw
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
yulianti213969
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
aqpto5bt
 
obat aborsi Banjarmasin wa 082135199655 jual obat aborsi cytotec asli di Ban...
obat aborsi Banjarmasin wa 082135199655 jual obat aborsi cytotec asli di  Ban...obat aborsi Banjarmasin wa 082135199655 jual obat aborsi cytotec asli di  Ban...
obat aborsi Banjarmasin wa 082135199655 jual obat aborsi cytotec asli di Ban...
siskavia95
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
jk0tkvfv
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 

Recently uploaded (20)

如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
obat aborsi Banjarmasin wa 082135199655 jual obat aborsi cytotec asli di Ban...
obat aborsi Banjarmasin wa 082135199655 jual obat aborsi cytotec asli di  Ban...obat aborsi Banjarmasin wa 082135199655 jual obat aborsi cytotec asli di  Ban...
obat aborsi Banjarmasin wa 082135199655 jual obat aborsi cytotec asli di Ban...
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 

chapter7.ppt

  • 1. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming with MPI and OpenMP Michael J. Quinn
  • 2. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 7 Performance Analysis
  • 3. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Learning Objectives  Predict performance of parallel programs  Understand barriers to higher performance
  • 4. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Outline  General speedup formula  Amdahl’s Law  Gustafson-Barsis’ Law  Karp-Flatt metric  Isoefficiency metric
  • 5. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Speedup Formula time execution Parallel time execution Sequential Speedup 
  • 6. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Execution Time Components  Inherently sequential computations: (n)  Potentially parallel computations: (n)  Communication operations: (n,p)
  • 7. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Speedup Expression ) , ( / ) ( ) ( ) ( ) ( ) , ( p n p n n n n p n          
  • 8. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. (n)/p
  • 9. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. (n,p)
  • 10. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. (n)/p + (n,p)
  • 11. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Speedup Plot “elbowing out”
  • 12. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Efficiency used Processors Speedup Efficiency time execution Parallel used Processors time execution Sequential Efficiency    Processors Speedup Efficiency time execution Parallel Processors time execution Sequential Efficiency   
  • 13. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 0  (n,p)  1 ) , ( ) ( ) ( ) ( ) ( ) , ( p n p n n p n n p n           All terms > 0  (n,p) > 0 Denominator > numerator  (n,p) < 1
  • 14. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Amdahl’s Law p n n n n p n p n n n n p n / ) ( ) ( ) ( ) ( ) , ( / ) ( ) ( ) ( ) ( ) , (                  Let f = (n)/((n) + (n)) p f f / ) 1 ( 1    
  • 15. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Example 1  95% of a program’s execution time occurs inside a loop that can be executed in parallel. What is the maximum speedup we should expect from a parallel version of the program executing on 8 CPUs? 9 . 5 8 / ) 05 . 0 1 ( 05 . 0 1     
  • 16. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Example 2  20% of a program’s execution time is spent within inherently sequential code. What is the limit to the speedup achievable by a parallel version of the program? 5 2 . 0 1 / ) 2 . 0 1 ( 2 . 0 1 lim       p p
  • 17. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Pop Quiz  An oceanographer gives you a serial program and asks you how much faster it might run on 8 processors. You can only find one function amenable to a parallel solution. Benchmarking on a single processor reveals 80% of the execution time is spent inside this function. What is the best speedup a parallel version is likely to achieve on 8 processors?
  • 18. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Pop Quiz  A computer animation program generates a feature movie frame-by-frame. Each frame can be generated independently and is output to its own file. If it takes 99 seconds to render a frame and 1 second to output it, how much speedup can be achieved by rendering the movie on 100 processors?
  • 19. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Limitations of Amdahl’s Law  Ignores (n,p)  Overestimates speedup achievable
  • 20. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Amdahl Effect  Typically (n,p) has lower complexity than (n)/p  As n increases, (n)/p dominates (n,p)  As n increases, speedup increases
  • 21. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Illustration of Amdahl Effect n = 100 n = 1,000 n = 10,000 Speedup Processors
  • 22. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Review of Amdahl’s Law  Treats problem size as a constant  Shows how execution time decreases as number of processors increases
  • 23. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Another Perspective  We often use faster computers to solve larger problem instances  Let’s treat time as a constant and allow problem size to increase with number of processors
  • 24. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Gustafson-Barsis’s Law p n n n n p n / ) ( ) ( ) ( ) ( ) , (         Let s = (n)/((n)+(n)/p) s p p ) 1 (    
  • 25. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Gustafson-Barsis’s Law  Begin with parallel execution time  Estimate sequential execution time to solve same problem  Problem size is an increasing function of p  Predicts scaled speedup
  • 26. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Example 1  An application running on 10 processors spends 3% of its time in serial code. What is the scaled speedup of the application? 73 . 9 27 . 0 10 ) 03 . 0 )( 10 1 ( 10        Execution on 1 CPU takes 10 times as long… …except 9 do not have to execute serial code
  • 27. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Example 2  What is the maximum fraction of a program’s parallel execution time that can be spent in serial code if it is to achieve a scaled speedup of 7 on 8 processors? 14 . 0 ) 8 1 ( 8 7      s s
  • 28. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Pop Quiz  A parallel program executing on 32 processors spends 5% of its time in sequential code. What is the scaled speedup of this program?
  • 29. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. The Karp-Flatt Metric  Amdahl’s Law and Gustafson-Barsis’ Law ignore (n,p)  They can overestimate speedup or scaled speedup  Karp and Flatt proposed another metric
  • 30. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Experimentally Determined Serial Fraction ) ( ) ( ) , ( ) ( n n p n n e        Inherently serial component of parallel computation + processor communication and synchronization overhead Single processor execution time p p e / 1 1 / 1 / 1    
  • 31. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Experimentally Determined Serial Fraction  Takes into account parallel overhead  Detects other sources of overhead or inefficiency ignored in speedup model  Process startup time  Process synchronization time  Imbalanced workload  Architectural overhead
  • 32. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Example 1 p 2 3 4 5 6 7 1.8 2.5 3.1 3.6 4.0 4.4 8 4.7  What is the primary reason for speedup of only 4.7 on 8 CPUs? e 0.1 0.1 0.1 0.1 0.1 0.1 0.1 Since e is constant, large serial fraction is the primary reason.
  • 33. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Example 2 p 2 3 4 5 6 7 1.9 2.6 3.2 3.7 4.1 4.5 8 4.7  What is the primary reason for speedup of only 4.7 on 8 CPUs? e 0.070 0.075 0.080 0.085 0.090 0.095 0.100 Since e is steadily increasing, overhead is the primary reason.
  • 34. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Pop Quiz  Is this program likely to achieve a speedup of 10 on 12 processors? p 4 3.9 8 6.5  12 ?
  • 35. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Isoefficiency Metric  Parallel system: parallel program executing on a parallel computer  Scalability of a parallel system: measure of its ability to increase performance as number of processors increases  A scalable system maintains efficiency as processors are added  Isoefficiency: way to measure scalability
  • 36. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Isoefficiency Derivation Steps  Begin with speedup formula  Compute total amount of overhead  Assume efficiency remains constant  Determine relation between sequential execution time and overhead
  • 37. Deriving Isoefficiency Relation
      Determine overhead: $T_0(n,p) = (p-1)\,\sigma(n) + p\,\kappa(n,p)$
      Substitute overhead into the speedup equation: $\psi(n,p) \le \dfrac{p\,(\sigma(n) + \varphi(n))}{\sigma(n) + \varphi(n) + T_0(n,p)}$
      Substitute $T(n,1) = \sigma(n) + \varphi(n)$ and assume efficiency is constant.
      Isoefficiency relation: $T(n,1) \ge C\,T_0(n,p)$
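The slide states the result without the intermediate algebra. A worked version of the steps, using the symbols defined earlier in the chapter ($\sigma$ for serial work, $\varphi$ for parallelizable work, $\kappa$ for communication), might look as follows. Writing the constant as $C = \varepsilon/(1-\varepsilon)$, where $\varepsilon$ is the efficiency level being held fixed, is the usual convention and is stated here as an assumption since the slide does not spell it out.

```latex
\begin{align*}
\psi(n,p)
  &\le \frac{\sigma(n)+\varphi(n)}{\sigma(n)+\varphi(n)/p+\kappa(n,p)}
   = \frac{p\,(\sigma(n)+\varphi(n))}{\sigma(n)+\varphi(n)+T_0(n,p)},
  \qquad T_0(n,p) = (p-1)\,\sigma(n)+p\,\kappa(n,p) \\[4pt]
\varepsilon(n,p)
  &= \frac{\psi(n,p)}{p}
   \le \frac{T(n,1)}{T(n,1)+T_0(n,p)}
   = \frac{1}{1+T_0(n,p)/T(n,1)} \\[4pt]
T(n,1)
  &\ge \frac{\varepsilon(n,p)}{1-\varepsilon(n,p)}\,T_0(n,p)
   = C\,T_0(n,p)
\end{align*}
```

The first line multiplies numerator and denominator of the speedup bound by $p$ and regroups $p\,\sigma(n) + \varphi(n) + p\,\kappa(n,p)$ as $\sigma(n) + \varphi(n) + T_0(n,p)$. The last line rearranges the efficiency bound for $T(n,1)$.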
  • 38. Scalability Function
      - Suppose the isoefficiency relation is $n \ge f(p)$
      - Let $M(n)$ denote the memory required for a problem of size $n$
      - $M(f(p))/p$ shows how memory usage per processor must increase to maintain the same efficiency
      - We call $M(f(p))/p$ the scalability function
  • 39. Meaning of Scalability Function
      - To maintain efficiency when increasing p, we must increase n
      - Maximum problem size is limited by available memory, which is linear in p
      - The scalability function shows how memory usage per processor must grow to maintain efficiency
      - A constant scalability function means the parallel system is perfectly scalable
  • 40. Interpreting Scalability Function
      [Figure: memory needed per processor versus number of processors, showing the scalability curves C, C log p, Cp, and Cp log p together with a horizontal line at the available memory size. The region below that line is labeled "Can maintain efficiency" and the region above it "Cannot maintain efficiency".]
  • 41. Example 1: Reduction
      - Sequential algorithm complexity: $T(n,1) = \Theta(n)$
      - Parallel algorithm:
          - Computational complexity: $\Theta(n/p)$
          - Communication complexity: $\Theta(\log p)$
      - Parallel overhead: $T_0(n,p) = \Theta(p \log p)$
  • 42. Reduction (continued)
      - Isoefficiency relation: $n \ge C\,p \log p$
      - We ask: to maintain the same level of efficiency, how must n increase when p increases?
      - $M(n) = n$
      - $M(C\,p \log p)/p = C\,p \log p / p = C \log p$
      - The system has good scalability
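The isoefficiency relation for reduction can be checked numerically. The sketch below assumes a simplified cost model with all constant factors dropped, $T(n,1) = n$ and $T_0(n,p) = p \log_2 p$, and evaluates the efficiency bound $T(n,1)/(T(n,1) + T_0(n,p))$. The model, the constant `C = 4.0`, and the function name are assumptions for illustration only.

```python
import math


def efficiency_bound(n, p):
    """Upper bound on efficiency under the assumed model
    T(n,1) = n and T_0(n,p) = p * log2(p)."""
    t_seq = n
    overhead = p * math.log2(p)
    return t_seq / (t_seq + overhead)


C = 4.0          # illustrative isoefficiency constant
N_FIXED = 1024   # problem size held constant for comparison

for p in (2, 8, 64, 1024):
    n_scaled = C * p * math.log2(p)  # grow n as the isoefficiency relation requires
    print(
        f"p={p:5d}  fixed n: {efficiency_bound(N_FIXED, p):.3f}  "
        f"scaled n: {efficiency_bound(n_scaled, p):.3f}"
    )
```

With n fixed, the bound falls as p grows. With n scaled as $C\,p \log_2 p$, the bound stays at $C/(C+1)$, which is what holding efficiency constant means in this model.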
  • 43. Example 2: Floyd’s Algorithm
      - Sequential time complexity: $\Theta(n^3)$
      - Parallel computation time: $\Theta(n^3/p)$
      - Parallel communication time: $\Theta(n^2 \log p)$
      - Parallel overhead: $T_0(n,p) = \Theta(p\,n^2 \log p)$
  • 44. Floyd’s Algorithm (continued)
      - Isoefficiency relation: $n^3 \ge C\,(p\,n^2 \log p) \Rightarrow n \ge C\,p \log p$
      - $M(n) = n^2$
      - $M(C\,p \log p)/p = C^2 p^2 \log^2 p / p = C^2 p \log^2 p$
      - The parallel system has poor scalability
  • 45. Example 3: Finite Difference
      - Sequential time complexity per iteration: $\Theta(n^2)$
      - Parallel communication complexity per iteration: $\Theta(n/\sqrt{p})$
      - Parallel overhead: $\Theta(n\sqrt{p})$
  • 46. Finite Difference (continued)
      - Isoefficiency relation: $n^2 \ge C\,n\sqrt{p} \Rightarrow n \ge C\sqrt{p}$
      - $M(n) = n^2$
      - $M(C\sqrt{p})/p = C^2 p / p = C^2$
      - This algorithm is perfectly scalable
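The three examples differ only in their isoefficiency function f(p) and memory function M(n), so the scalability function M(f(p))/p can be tabulated side by side. This is a small sketch under stated assumptions: the constant `C = 1.0` is illustrative, logarithms are taken base 2, and the helper name `scalability` is hypothetical.

```python
import math


def scalability(memory, iso_n, p):
    """Memory per processor, M(f(p))/p, needed to hold efficiency constant.

    memory: M(n), memory required for problem size n
    iso_n:  f(p), the problem size required by the isoefficiency relation
    """
    return memory(iso_n(p)) / p


C = 1.0  # illustrative constant; the slides leave C unspecified

# (M(n), f(p)) pairs taken from the three examples
examples = {
    "reduction": (lambda n: n, lambda p: C * p * math.log2(p)),
    "Floyd": (lambda n: n ** 2, lambda p: C * p * math.log2(p)),
    "finite difference": (lambda n: n ** 2, lambda p: C * math.sqrt(p)),
}

for name, (memory, iso_n) in examples.items():
    row = [scalability(memory, iso_n, p) for p in (4, 16, 64)]
    print(f"{name:18s}", ["%.1f" % value for value in row])
```

Memory per processor grows like log p for reduction, like p log² p for Floyd's algorithm, and stays constant for the finite difference method, matching the good, poor, and perfect scalability verdicts above.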
  • 47. Summary (1/3)
      - Performance terms
          - Speedup
          - Efficiency
      - Model of speedup
          - Serial component
          - Parallel component
          - Communication component
  • 48. Summary (2/3)
      - What prevents linear speedup?
          - Serial operations
          - Communication operations
          - Process start-up
          - Imbalanced workloads
          - Architectural limitations
  • 49. Summary (3/3)
      - Analyzing parallel performance
          - Amdahl’s Law
          - Gustafson-Barsis’ Law
          - Karp-Flatt metric
          - Isoefficiency metric