How much parallelism can be achieved in concurrent/parallel programming?
Amdahl’s Law in achievable performance. Costs and complexities of parallel programming. Reporting performance results using confidence intervals and various plots.
1. How Much Parallelism?
CS4532 Concurrent Programming
Dilum Bandara
Dilum.Bandara@uom.lk
Slides slightly adapted from “The Art of Multiprocessor Programming”
by Maurice Herlihy & Nir Shavit, & Dr. Srinath Perera
2. Why Do We Care?
Want as much of the code as possible to
execute concurrently (in parallel)
Larger sequential part implies reduced
performance
Amdahl’s law: this relation is not linear…
5. Example – 1
10 processors
60% concurrent, 40% sequential
How close to 10-fold speedup?
Speedup = 1 / ((1 - 0.6) + 0.6/10) = 2.17
6. Example – 2
10 processors
80% concurrent, 20% sequential
How close to 10-fold speedup?
Speedup = 1 / ((1 - 0.8) + 0.8/10) = 3.57
7. Example – 3
10 processors
90% concurrent, 10% sequential
How close to 10-fold speedup?
Speedup = 1 / ((1 - 0.9) + 0.9/10) = 5.26
8. Example – 4
10 processors
99% concurrent, 1% sequential
How close to 10-fold speedup?
Speedup = 1 / ((1 - 0.99) + 0.99/10) = 9.17
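The four examples above all follow from Amdahl's law, Speedup = 1 / ((1 - p) + p/n), where p is the concurrent fraction and n is the number of processors. A minimal Python sketch that reproduces them (the function name `amdahl_speedup` is illustrative, not from the slides):

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup predicted by Amdahl's law.

    p: fraction of the work that can run concurrently (0..1)
    n: number of processors (>= 1)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The sequential part (1 - p) is unchanged; the concurrent part is split n ways.
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # The four examples from the slides, all on 10 processors.
    for p in (0.6, 0.8, 0.9, 0.99):
        print(f"p = {p:.2f}, n = 10 -> speedup = {amdahl_speedup(p, 10):.2f}")
```

Even at 99% concurrent code, 10 processors fall short of a 10-fold speedup.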
9. Speedup Against No of Processors
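Even with an unlimited number of processors, maximum speedup is limited to
1/(1 - p), where p is the fraction of the computation that can run in parallel
e.g., with only 5% of computation being serial (p = 0.95), maximum
speedup is 20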
Source: http://wiki.ccs.tulane.edu/index.php5/Speedup/Scaling
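The 1/(1 - p) ceiling can be checked numerically. The sketch below assumes the same Amdahl's law formula used in the examples; the processor counts and the helper name `max_speedup` are chosen for illustration only.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup for parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)


def max_speedup(p: float) -> float:
    """Upper bound on speedup as n grows without limit: 1 / (1 - p)."""
    if p >= 1.0:
        return float("inf")  # no serial part, so no bound
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    p = 0.95  # 5% of the computation is serial
    for n in (1, 2, 4, 16, 64, 256, 1024, 65536):
        print(f"n = {n:>5} -> speedup = {amdahl_speedup(p, n):6.2f}")
    print(f"limit as n grows: {max_speedup(p):.2f}")
```

The printed speedups climb quickly at first and then flatten out just below 20, which is the shape of the curve on this slide.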
10. The Moral
Making good use of our multiple processors
(cores) means
Finding ways to effectively parallelize our code
Minimize sequential parts
It’s worth our effort to try & parallelize even the last 10% of
serial code
Reduce idle time in which threads wait without
executing
This is what this course is about…
The % that is not easy to make concurrent, yet may have a
large impact on overall speedup
11. Costs of Parallel Programming
Costs
Task start-up time
Synchronizations
Data communications
Software overhead imposed by parallel compilers, libraries, tools,
operating system, etc.
Task termination time
Parallel programs have efficiency < 1, which means they
waste resources
For small programs, the additional cost will be prohibitive
Parallel programming lets us get faster results at the cost
of efficiency
Lets us solve a 1 CPU-year problem within a day by using more CPUs
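A rough sketch of the trade-off, assuming the usual definition efficiency = speedup / number of processors (the slides do not spell it out). The 0.8 efficiency figure is hypothetical, and treating efficiency as constant is a simplification, since in practice it tends to drop as processors are added.

```python
import math


def efficiency(speedup: float, n: int) -> float:
    """Fraction of the n processors' capacity that turns into useful speedup."""
    return speedup / n


def cpus_needed(target_speedup: float, assumed_efficiency: float) -> int:
    """CPUs required to reach target_speedup at a fixed (assumed) efficiency."""
    return math.ceil(target_speedup / assumed_efficiency)


if __name__ == "__main__":
    # Example 1 from the slides: speedup 2.17 on 10 processors.
    print(f"efficiency = {efficiency(2.17, 10):.3f}")

    # A 1 CPU-year problem finished in a day needs a speedup of 365.
    print("ideal (efficiency 1.0):", cpus_needed(365, 1.0), "CPUs")
    print("hypothetical efficiency 0.8:", cpus_needed(365, 0.8), "CPUs")
```

The gap between the two CPU counts is the resource cost paid for getting the answer sooner.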
12. Complexity
Parallel programs are often more complex than their
serial counterparts
Complexity is measured in terms of programmer
time across the steps of the development lifecycle
Design
Coding
Debugging
Tuning
Maintenance
Parallel programs should yield a significant improvement to
justify these costs
Using parallelism to achieve a 10-20% gain is not useful
13. Performance in General
We can never measure the real performance of a
system
Yet, we still try to do it
To understand a system, 2 readings are required
1. Latency – time to finish 1 instance of the problem
2. Throughput – number of instances that can be finished per
unit of time
Does throughput = 1/latency?
Examples
Water pipe
Car vs. bus
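The car vs. bus example can be put into numbers. All figures below (trip times and seat counts) are hypothetical and only illustrate why throughput is not simply 1/latency once several instances are handled at the same time.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    name: str
    trip_hours: float  # latency: time for one passenger to complete the trip
    capacity: int      # passengers carried per trip

    def latency(self) -> float:
        return self.trip_hours

    def throughput(self) -> float:
        """Passengers delivered per hour, assuming back-to-back one-way trips."""
        return self.capacity / self.trip_hours


if __name__ == "__main__":
    car = Vehicle("car", trip_hours=0.5, capacity=4)    # hypothetical numbers
    bus = Vehicle("bus", trip_hours=0.75, capacity=40)  # hypothetical numbers
    for v in (car, bus):
        print(f"{v.name}: latency = {v.latency():.2f} h, "
              f"throughput = {v.throughput():.1f} passengers/h, "
              f"1/latency = {1 / v.latency():.2f} trips/h")
```

With these numbers the car wins on latency and the bus wins on throughput, so the answer depends on which of the two you are measuring.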
14. Measuring Throughput or Latency
When to measure Latency?
When you have only 1 instance to run
When operation has user waiting on it (user
interactions)
When time sensitive deadlines are involved
e.g., real-time applications like predicting a storm as soon as
possible
When to measure Throughput?
When latency is not important & overall utilization is
more crucial
Sometimes we need both
15. Note on Performance Analysis
When you measure a system, you are taking a
sample
Central Limit Theorem
When we draw n samples from a distribution with
mean µ & variance σ², as the sample size n increases,
the distribution of the sample average
approaches a normal distribution with mean
µ & variance σ²/n, irrespective of the shape of the
original distribution
Confidence Interval + Error Bars
More readings mean a narrower confidence interval
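A minimal sketch of a confidence interval for a measured mean, relying on the normal approximation that the Central Limit Theorem justifies for larger n. The function name and the sample values are hypothetical; for small samples a t-distribution quantile would be more appropriate than z = 1.96.

```python
import math
import statistics


def confidence_interval(samples, z=1.96):
    """Return (low, high) for the mean, using mean +/- z * s / sqrt(n).

    z = 1.96 corresponds to a 95% confidence level under the normal
    approximation.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least 2 samples to estimate the spread")
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    few = [18.0, 22.0] * 5    # 10 hypothetical response times (seconds)
    many = [18.0, 22.0] * 50  # 100 readings with the same spread
    for label, data in (("n = 10", few), ("n = 100", many)):
        low, high = confidence_interval(data)
        print(f"{label}: mean in [{low:.2f}, {high:.2f}]")
```

The half-width is what would be drawn as the error bar on a plot of the mean.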
17. No of Samples?
How many observations n are needed to get an accuracy of
± r% at a confidence level of 100(1 - α)%?
18. Example
Sample mean of response time = 20 s
Sample standard deviation = 5 s
How many repetitions are needed to get
response time accurate within 1 second at
95% confidence?
Required accuracy (r) = 1 in 20 = 5%
z = 1.960 (for 95% confidence)
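The slide's formula did not survive in this transcript. The sketch below assumes the standard normal-approximation result n = (100 · z · s / (r · x̄))², where x̄ is the sample mean, s the sample standard deviation, and r the accuracy in percent; the function name is illustrative.

```python
import math


def required_samples(mean: float, stdev: float, r_percent: float, z: float) -> int:
    """Observations needed for +/- r_percent accuracy at the confidence level of z.

    Assumes the normal-approximation formula n = (100 * z * s / (r * mean))**2,
    rounded up to the next whole observation.
    """
    n = (100.0 * z * stdev / (r_percent * mean)) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Values from the example: mean 20 s, s = 5, r = 5%, z = 1.960.
    print(required_samples(mean=20.0, stdev=5.0, r_percent=5.0, z=1.960))
```

With the example's values the formula gives (9.8)² = 96.04, so 97 repetitions are needed under this assumption.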
19. Data Presentation
Numbers
Average, std, min, max, percentiles
Tables
Enable comparisons
Graphs
Easy to see trends
Enable more complex comparisons
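The summary numbers listed above can be computed with Python's standard library. This is only a sketch: the function name `summarize` is illustrative and the data is synthetic.

```python
import statistics


def summarize(samples):
    """Average, std, min, max, and a few percentiles for a list of readings."""
    if len(samples) < 2:
        raise ValueError("need at least 2 samples")
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99.
    cuts = statistics.quantiles(samples, n=100)
    return {
        "avg": statistics.mean(samples),
        "std": statistics.stdev(samples),
        "min": min(samples),
        "max": max(samples),
        "p50": statistics.median(samples),
        "p95": cuts[94],
        "p99": cuts[98],
    }


if __name__ == "__main__":
    readings = list(range(1, 101))  # synthetic data, not real measurements
    for name, value in summarize(readings).items():
        print(f"{name:>3}: {value:.2f}")
```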
23. Graph Rules
Use a suitable graph type for case under analysis & data
Should have a title or caption
Axes properly titled, with units
Independent variable always goes on x-axis
Time always on x-axis
Range of each axis may be different
Each axis should be just large enough to cover the needed range without
lots of extra space
No need to start at zero
Use a key to explain colors or symbols
Graph should fill available space
Error bars are encouraged to indicate uncertainty in a
measurement