Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.



Published on

Published in: Education, Technology
  • Be the first to comment

  • Be the first to like this


  1. 1. Performance Analysis in Parallel Programming Ezio Bartocci, PhD SUNY at Stony Brook January 13, 2011
  2. 2. AgendaToday’s agenda Serial Program Performance  An example in the cable  I/O statements  Taking times Parallel Program Performance Analysis  The cost of communication  Amdahls Law  Sources of Overhead  Scalability
  3. 3. Introductions Performance of a Serial ProgramDefinition of Running Time: Example of Trapezoidal Rule:Let be T(n) units the running time of a program. h = (b-a)/n; integral = (f(a)+f(b))/2.0; c1 x = a; for (i=1; i <=n-1; i++){The actual time depends on: x = x + h; integral = integral + f(x); c2  the hardware being used }  the programming language and compiler integral = integral * h; c1  details of the input other than its size T  n  =c1 +c 2  n−1  T  n  ≈k 1 n+k 2 linear polynomial in n k 1 n ≫k 2 T  n ≈ k 1 n a b h
  4. 4. Introductions What about the I/O ?In 33 Mhz 486 PC (linux) and gcc compiler Estimation of the timeA multiplication takes a microsecondprintf about 300 microseconds T  n  =T calc  n  +T i / o  n On faster systems, the ratios may be worse,with the arithmetic time decreasingI/O times remaining the same
  5. 5. Introductions Parallel Program Performance Analysis denotes the runtime of the parallel solution with T  n , p p processes.Speedup of a parallel program: Ambiguous definition T  n Is T  n the fastest known serial program ? S  n , p= or T  n=T  n ,1 T  n , p 0S n , p≤ p S n , p= p linear speedupEfficiency of a parallel program: Efficiency is a measure of process utilitization in a parallel program, relative to the serial program. S n , p T  n 0E n , p≤1E  n , p= = p pT  n , p E n , p=1 linear speedup E n , p1/ p slowdown
  6. 6. Introductions The Cost of CommunicationFor a reasonable estimatation of the performance of a parallelprogram, we should count also the cost of communication. T n , p=T calc n , pT i/ o n , pT comm n , pWhile the cost of sending a single message containig k units of data will be: t s k t ct s is called message latency1 is called bandwidthtc
  7. 7. IntroductionsTaking Timings#include <time.h>....clock_t c1, c2;....c1 = clock();for(j = 0; j < MATRIX_SIZE; j++){ for(i = 0; i < MATRIX_SIZE; i++){ B[i * MATRIX_SIZE + j]= A[j * MATRIX_SIZE + i] + A[j * MATRIX_SIZE + i]; }}c2 = clock();serial_comp = c2 - c1;printf("Time serial computation: %.0f millisecondsn", serial_comp /(CLOCKS_PER_SEC / 1000));
  8. 8. Introductions Amdahls Law0≤r≤1 is the fraction of the program that is perfectly parallelizable T 1 S  p= = 1−r T  rT  / p 1−r r / p dS r = ≥0 S(p) is an increasing function in p dp [1−r  pr ]2 1 lim p  ∞ S  p= 1−r Example: if r =0.5 the maximum speedup is 2if r =0.99 the maximum speedup is 100 even if we use 10000 proc.
  9. 9. Introductions Work and OverheadThe amount of work done by a serial program is simply the runtime: W  n=T   nThe amount of work done by a parallel program is the sum of theAmounts of work done by each process: p−1 W  n , p=∑q=0 W q n , pThus, an alternative definition of efficiency is: T  n W  n E n , p= = pT  n , p W  n , p
  10. 10. Introductions Work and OverheadOverhead is the amount of work done by the parallel program thatis not done by the serial program: T o n , p =W  n , p−W o n= pT  n , p−T  nper-process overhead: difference between the parallel runtime andthe ideal parallel runtime that would be obtained with linear speedup: T o n , p=T  n , p−T  n/ p T o n , p= pT o  n , pMain sources of Overhead: communication, idle time and extracomputation
  11. 11. Introductions ScalabilityAmdahls law define the upper bound on the speedupin terms of efficiency we have: T 1 E= = p1−r T  rT  p 1−r rIn terms of Amdahls law the efficiency achievable by a parallelprogram on a given problem instance will approach zero as thenumber of processes is increased.
  12. 12. Introductions ScalabilityLook this assertion in terms of overhead: T O n , p= pT  n , p−T  nEfficiency can be written as: T  n T  n 1 E n , p= = = pT  n , p T O n , pT  n T O n , p/T  n1The behavior of the efficiency as p is increased is completelydetermined by the overhead function T O n , pA parallel program is scalable if, as the number of processes (p) isincreased, we can find a rate of increase for the problem size (n)so that efficiency remains constant.