How Much Parallelism?
CS4532 Concurrent Programming
Dilum Bandara
Dilum.Bandara@uom.lk
Slides adapted slightly from “The Art of Multiprocessor Programming”
by Maurice Herlihy & Nir Shavit, & Dr. Srinath Perera
Why Do We Care?
 Want as much of the code as possible to
execute concurrently (in parallel)
 Larger sequential part implies reduced
performance
 Amdahl’s law: this relation is not linear…
Amdahl’s Law

$$\text{Speedup} = \frac{\text{OldExecutionTime}}{\text{NewExecutionTime}}$$

…of computation given n CPUs instead of 1
Amdahl’s Law

$$\text{Speedup} = \frac{1}{(1 - p) + \dfrac{p}{n}}$$

where p = parallel fraction, (1 − p) = sequential fraction, & n = number of processors
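A minimal sketch (not from the slides; the class & method names are our own) that evaluates this formula for the four examples that follow:

```java
/** Amdahl's law: speedup of a program with parallel fraction p on n processors. */
public class Amdahl {

    static double speedup(double p, int n) {
        // Sequential part (1 - p) is unchanged; parallel part p shrinks by a factor of n
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        for (double p : new double[] {0.6, 0.8, 0.9, 0.99}) {
            System.out.printf("p = %.2f, n = 10 -> speedup = %.2f%n", p, speedup(p, 10));
        }
    }
}
```

Running it prints 2.17, 3.57, 5.26, & 9.17, matching Examples 1–4 below.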
Example – 1
 10 processors
 60% concurrent, 40% sequential
 How close to 10-fold speedup?
$$\text{Speedup} = \frac{1}{1 - 0.6 + \dfrac{0.6}{10}} = \frac{1}{0.46} \approx 2.17$$
Example – 2
 10 processors
 80% concurrent, 20% sequential
 How close to 10-fold speedup?
$$\text{Speedup} = \frac{1}{1 - 0.8 + \dfrac{0.8}{10}} = \frac{1}{0.28} \approx 3.57$$
Example – 3
 10 processors
 90% concurrent, 10% sequential
 How close to 10-fold speedup?
$$\text{Speedup} = \frac{1}{1 - 0.9 + \dfrac{0.9}{10}} = \frac{1}{0.19} \approx 5.26$$
Example – 4
 10 processors
 99% concurrent, 1% sequential
 How close to 10-fold speedup?
$$\text{Speedup} = \frac{1}{1 - 0.99 + \dfrac{0.99}{10}} = \frac{1}{0.109} \approx 9.17$$
Speedup Against No of Processors
 Even with an infinite no of processors, maximum speedup is limited to 1/(1 – p) (see the derivation below)
 e.g., with only 5% of computation being serial, maximum speedup is 20
[Figure: speedup vs. no of processors. Source: http://wiki.ccs.tulane.edu/index.php5/Speedup/Scaling]
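The bound follows by letting n grow without limit in Amdahl’s formula (a one-line derivation, not on the slide):

$$\lim_{n \to \infty} \frac{1}{(1 - p) + \dfrac{p}{n}} = \frac{1}{1 - p}, \qquad p = 0.95 \;\Rightarrow\; \text{Speedup} \le \frac{1}{0.05} = 20$$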
The Moral
 Making good use of our multiple processors (cores) means
 Finding ways to effectively parallelize our code
 Minimizing sequential parts
 It’s worth our effort to parallelize even the last 10% of serial code
 Reducing idle time in which threads wait without executing
 This is what this course is about…
 The % that is not easy to make concurrent may yet have a large impact on overall speedup
Costs of Parallel Programming
 Costs
 Task start-up time
 Synchronization
 Data communication
 Software overhead imposed by parallel compilers, libraries, tools, operating system, etc.
 Task termination time
 Parallel programs have efficiency < 1, which means they waste resources (see the worked figure after this list)
 For small programs, the additional cost will be prohibitive
 Parallel programming lets us get faster results at the cost of efficiency
 e.g., solve a 1 CPU-year problem within a day by using more CPUs
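The slides do not define efficiency; assuming the standard speedup-per-processor measure, a worked figure from Example 3:

$$E = \frac{\text{Speedup}}{n} = \frac{5.26}{10} \approx 0.53$$

i.e., nearly half the aggregate CPU capacity goes to overheads & idling, in exchange for the answer arriving about 5× sooner.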
Complexity
 Parallel programs are often more complex than their serial counterparts
 Complexity is measured in terms of the programmer’s time in different steps of the lifecycle
 Design
 Coding
 Debugging
 Tuning
 Maintenance
 They should yield a significant improvement to justify the costs
 Using parallelism to achieve only a 10–20% gain is not useful
Performance in General
 We can never measure the real performance of a system
 Yet, we still try to do it
 To understand a system, 2 readings are required
 1. Latency – time to finish 1 instance of the problem
 2. Throughput – no of instances that can be finished in a unit time
 Does throughput = 1/Latency? (see the illustration after this list)
 Examples
 Water pipe
 Car vs. bus
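A numeric illustration of the car vs. bus example (the numbers are assumptions, not from the slide): a car carries 4 people on a 1-hour trip (latency = 1 h, throughput = 4 people/h), while a bus carries 50 people on a 2-hour trip (latency = 2 h, throughput = 25 people/h). The bus is worse on latency yet far better on throughput, so throughput ≠ 1/Latency whenever work is overlapped or batched.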
Measuring Throughput or Latency
 When to measure Latency?
 When you have only 1 instance to run
 When an operation has a user waiting on it (user interactions)
 When time-sensitive deadlines are involved
 e.g., real-time applications like predicting a storm as soon as possible
 When to measure Throughput?
 When latency is not important & overall utilization is more crucial
 Sometimes we need both
Note on Performance Analysis
 When you measure a system, you are taking a sample
 Central Limit Theorem
 When we draw n samples from a distribution with mean µ & variance σ², then as the sample size n increases, the distribution of the sample average approaches a normal distribution with mean µ & variance σ²/n, irrespective of the shape of the original distribution
 Confidence Interval + Error Bars
 More readings mean a tighter confidence interval
Confidence Interval
[Figure: illustration of a confidence interval around a sample mean]
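A minimal sketch of how such an interval is computed, as mean ± z·s/√n; the sample values & class name are hypothetical, and with this few samples a t-quantile would be more appropriate than z = 1.960:

```java
/** 95% confidence interval for a sample mean (normal approximation). */
public class ConfidenceInterval {
    public static void main(String[] args) {
        double[] samples = {19.2, 20.5, 21.1, 18.7, 20.9, 19.8};  // hypothetical response times (s)
        int n = samples.length;

        double mean = 0;
        for (double x : samples) mean += x;
        mean /= n;

        double var = 0;                       // sample variance (n - 1 in the denominator)
        for (double x : samples) var += (x - mean) * (x - mean);
        double s = Math.sqrt(var / (n - 1));  // sample standard deviation

        double z = 1.960;                     // 95% level, normal approximation
        double half = z * s / Math.sqrt(n);   // half-width of the interval
        System.out.printf("mean = %.2f s, 95%% CI = [%.2f, %.2f]%n",
                          mean, mean - half, mean + half);
    }
}
```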
No of Samples?
 How many observations n are needed to get an accuracy of ±r% and a confidence level of 100(1 – α)%?
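A standard form of this sample-size formula (an assumption on our part: the example below appears to follow Jain’s The Art of Computer Systems Performance Analysis), with z the normal quantile for the chosen confidence level, s the sample standard deviation, & x̄ the sample mean:

$$n = \left( \frac{100 \, z \, s}{r \, \bar{x}} \right)^2$$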
Example
 Sample mean of response time = 20 s
 Sample standard deviation = 5
 How many repetitions are needed to get the response time accurate within 1 second at 95% confidence?
 Required accuracy (r) = 1 in 20 = 5%
 z = 1.960
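Plugging into the sample-size formula above (a worked completion under the same assumption):

$$n = \left( \frac{100 \times 1.960 \times 5}{5 \times 20} \right)^2 = 9.8^2 \approx 96.04 \;\Rightarrow\; 97 \text{ repetitions}$$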
Data Presentation
 Numbers
 Average, std, min, max, percentiles
 Tables
 Enable comparisons
 Graphs
 Easy to see trends
 Enable more complex comparisons
Graphs
[Figure: sample graphs]
Error Bars & Box Plots
[Figure: error bars & box plots]
Box Plots (Cont.)
[Figure: box plots, continued]
Graph Rules
 Use a suitable graph type for the case under analysis & the data
 Should have a title or caption
 Axes properly titled, with units
 Independent variable always goes on the x-axis
 Time always on the x-axis
 Range of each axis may be different
 Ticks should be large enough to cover the needed range without lots of extra space
 No need to start at zero
 Use a key to explain colors or symbols
 Graph should fill the available space
 Error bars are encouraged to indicate uncertainty in a measurement