
Performances

by Ezio Bartocci, University Assistant at Vienna University of Technology on Aug 15, 2011

Presentation Transcript

• Performance Analysis in Parallel Programming. Ezio Bartocci, PhD, SUNY at Stony Brook. January 13, 2011
• Agenda
Today's agenda:
  Serial program performance
    - An example
    - I/O statements
    - Taking timings
  Parallel program performance analysis
    - The cost of communication
    - Amdahl's Law
    - Sources of overhead
    - Scalability
• Performance of a Serial Program
Definition of running time: let T(n) be the running time of a program on an input of size n. The actual time depends on:
  - the hardware being used
  - the programming language and compiler
  - details of the input other than its size

Example, the trapezoidal rule on [a, b] with n trapezoids of width h:

    h = (b - a) / n;
    integral = (f(a) + f(b)) / 2.0;    /* cost c1 (outside the loop) */
    x = a;
    for (i = 1; i <= n - 1; i++) {     /* each iteration costs c2 */
        x = x + h;
        integral = integral + f(x);
    }
    integral = integral * h;           /* cost c1 */

So T(n) = c1 + c2 * (n - 1), that is, T(n) ~= k1 * n + k2, a linear polynomial in n. When k1 * n >> k2, T(n) ~= k1 * n.
• What about the I/O?
On a 33 MHz 486 PC (Linux) with the gcc compiler, a multiplication takes about a microsecond, while a printf takes about 300 microseconds. Estimation of the time: T(n) = T_calc(n) + T_i/o(n). On faster systems the ratio may be worse, with the arithmetic time decreasing while I/O times remain the same.
• Parallel Program Performance Analysis
T(n, p) denotes the runtime of the parallel solution with p processes.

Speedup of a parallel program: S(n, p) = T(n) / T(n, p). The definition is ambiguous: is T(n) the runtime of the fastest known serial program, or is T(n) = T(n, 1)? In general 0 < S(n, p) <= p, and S(n, p) = p is called linear speedup.

Efficiency of a parallel program: efficiency is a measure of process utilization in a parallel program, relative to the serial program: E(n, p) = S(n, p) / p = T(n) / (p * T(n, p)). In general 0 < E(n, p) <= 1; E(n, p) = 1 means linear speedup, and E(n, p) < 1/p means slowdown (the parallel program is slower than the serial one).
• The Cost of Communication
For a reasonable estimate of the performance of a parallel program, we should also count the cost of communication:

    T(n, p) = T_calc(n, p) + T_i/o(n, p) + T_comm(n, p)

The cost of sending a single message containing k units of data is t_s + k * t_c, where t_s is called the message latency and 1/t_c is called the bandwidth.
• Taking Timings

    #include <stdio.h>
    #include <time.h>

    #define MATRIX_SIZE 1000

    double A[MATRIX_SIZE * MATRIX_SIZE];
    double B[MATRIX_SIZE * MATRIX_SIZE];

    int main(void) {
        clock_t c1, c2;
        double serial_comp;
        int i, j;

        c1 = clock();
        for (j = 0; j < MATRIX_SIZE; j++) {
            for (i = 0; i < MATRIX_SIZE; i++) {
                B[i * MATRIX_SIZE + j] = A[j * MATRIX_SIZE + i] + A[j * MATRIX_SIZE + i];
            }
        }
        c2 = clock();
        serial_comp = (double)(c2 - c1);
        printf("Time serial computation: %.0f milliseconds\n",
               serial_comp / (CLOCKS_PER_SEC / 1000.0));
        return 0;
    }
• Amdahl's Law
Let r, 0 <= r <= 1, be the fraction of the program that is perfectly parallelizable. Then

    S(p) = T / ((1 - r) * T + r * T / p) = 1 / ((1 - r) + r / p)

    dS/dp = r / ((1 - r) * p + r)^2 >= 0, so S(p) is an increasing function of p

    lim_{p -> inf} S(p) = 1 / (1 - r)

Example: if r = 0.5 the maximum speedup is 2; if r = 0.99 the maximum speedup is 100, even if we use 10000 processes.
• Work and Overhead
The amount of work done by a serial program is simply its runtime: W(n) = T(n). The amount of work done by a parallel program is the sum of the amounts of work done by each process:

    W(n, p) = sum_{q=0}^{p-1} W_q(n, p)

Thus, an alternative definition of efficiency is:

    E(n, p) = T(n) / (p * T(n, p)) = W(n) / W(n, p)
• Work and Overhead
Overhead is the amount of work done by the parallel program that is not done by the serial program:

    T_O(n, p) = W(n, p) - W(n) = p * T(n, p) - T(n)

The per-process overhead is the difference between the parallel runtime and the ideal parallel runtime that would be obtained with linear speedup:

    T_o(n, p) = T(n, p) - T(n) / p, so that T_O(n, p) = p * T_o(n, p)

Main sources of overhead: communication, idle time, and extra computation.
• Scalability
Amdahl's law defines an upper bound on the speedup; in terms of efficiency we have:

    E = T / (p * ((1 - r) * T + r * T / p)) = 1 / (p * (1 - r) + r)

In terms of Amdahl's law, the efficiency achievable by a parallel program on a given problem instance approaches zero as the number of processes is increased.
• Scalability
Looking at this assertion in terms of overhead, T_O(n, p) = p * T(n, p) - T(n), efficiency can be written as:

    E(n, p) = T(n) / (p * T(n, p)) = T(n) / (T_O(n, p) + T(n)) = 1 / (T_O(n, p) / T(n) + 1)

The behavior of the efficiency as p is increased is completely determined by the overhead function T_O(n, p). A parallel program is scalable if, as the number of processes p is increased, we can find a rate of increase for the problem size n so that the efficiency remains constant.