1a.1 Parallel Computing and Parallel Computers
ITCS 4/5145 Parallel Programming, UNC-Charlotte, B. Wilkinson, 2012. Jan 9, 2012
1a.2 Parallel Computing
• Using more than one computer, or a computer with more than one processor, to solve a problem.
Motives
• Usually faster computation.
• Very simple idea
– n computers operating simultaneously can achieve the result faster
– it will not be n times faster, for reasons such as parts of the computation that cannot be done in parallel and the overhead of communication and coordination between processors
• Other motives include fault tolerance, a larger amount of available memory, ...
1a.3 Demand for Computational Speed
• Continual demand for greater computational speed from a computer system than is currently possible.
• Areas requiring great computational speed include:
– Numerical modeling
– Simulation of scientific and engineering problems.
• Computations need to be completed within a “reasonable” time period.
1a.4 “Grand Challenge” Problems
Ones that cannot be solved in a reasonable amount of time with today’s computers. Obviously, an execution time of 10 years is always unreasonable.
Grand Challenge Problem Examples
• Modeling large DNA structures
• Global weather forecasting
• Modeling motion of astronomical bodies.
1a.5 Weather Forecasting
• Atmosphere modeled by dividing it into 3-dimensional cells.
• Calculations of each cell repeated many times to model the passage of time.
[Figure: a 3-dimensional cell of the atmosphere, holding values such as temperature, pressure, humidity, etc.]
1a.6 Global Weather Forecasting Example
• Suppose the whole global atmosphere is divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles (10 cells high): about $5 \times 10^{8}$ cells.
• Suppose each calculation requires 200 floating-point operations. In one time step, $10^{11}$ floating-point operations are necessary.
• To forecast the weather over 7 days using 1-minute intervals, a computer operating at 1 Gflops ($10^{9}$ floating-point operations/s) takes $10^{6}$ seconds, or over 10 days.
• To perform the calculation in 5 minutes requires a computer operating at 3.4 Tflops ($3.4 \times 10^{12}$ floating-point operations/s).
• Needs to be about 3,400 times faster ($10^{6}$ s / 300 s).
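The estimate is easy to reproduce. The sketch below simply multiplies out the slide's own assumptions (about $5 \times 10^{8}$ cells, 200 flops per cell per step, 1-minute steps over 7 days); it is arithmetic on those assumed inputs, not a weather model.

```python
# Back-of-envelope check of the forecasting estimate.
# All inputs are the slide's assumptions, not measured values.
CELLS = 5e8                  # 1 x 1 x 1 mile cells, 10 cells high
FLOPS_PER_CELL = 200         # floating-point operations per cell per time step
STEPS = 7 * 24 * 60          # 1-minute time steps over 7 days

flops_per_step = CELLS * FLOPS_PER_CELL       # operations in one time step
total_flops = flops_per_step * STEPS          # operations for the whole forecast


def run_time_seconds(machine_flops: float) -> float:
    """Time to finish the forecast at a sustained rate of machine_flops."""
    return total_flops / machine_flops


t_1gflops = run_time_seconds(1e9)             # time on a 1 Gflops machine
required_rate = total_flops / (5 * 60)        # rate needed to finish in 5 minutes
factor = required_rate / 1e9                  # how much faster than 1 Gflops

print(f"flops per step   : {flops_per_step:.1e}")
print(f"time at 1 Gflops : {t_1gflops:.2e} s = {t_1gflops / 86400:.1f} days")
print(f"rate for 5 min   : {required_rate:.2e} flops/s")
print(f"speed-up needed  : {factor:,.0f} times")
```

With these inputs the run time at 1 Gflops is just over $10^{6}$ s (about 11.7 days), the required rate is about $3.4 \times 10^{12}$ flops/s, and the ratio of the two machine speeds is therefore roughly 3,400.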
1a.7 Modeling Motion of Astronomical Bodies
Each body is attracted to every other body by gravitational forces. The movement of each body is predicted by calculating the total force on each body.
1a.8 Modeling Motion of Astronomical Bodies
• With N bodies, there are N − 1 forces to calculate for each body, or approximately $N^2$ calculations, i.e. $O(N^2)$.*
• After determining the new positions of the bodies, the calculations are repeated, i.e. $N^2 \times T$ calculations, where T is the number of time steps.
* There is an $O(N \log_2 N)$ algorithm, which we will cover in the course.
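A minimal sketch of the direct $O(N^2)$ method: every body sums the force from each of the other N − 1 bodies, so one time step costs N(N − 1) force evaluations and T steps cost T times that. The function names, the 2-D setting and the unit gravitational constant `G=1.0` are illustrative assumptions chosen to keep the example short.

```python
import math


def direct_forces(masses, positions, G=1.0):
    """Total gravitational force on each body by direct summation (2-D).

    Returns (forces, evaluations), where evaluations counts the
    pairwise force calculations performed: N * (N - 1) for N bodies.
    G=1.0 and 2-D positions are simplifying assumptions for this sketch.
    """
    n = len(masses)
    forces = [[0.0, 0.0] for _ in range(n)]
    evaluations = 0
    for i in range(n):
        xi, yi = positions[i]
        for j in range(n):
            if i == j:
                continue  # no self-force, so N - 1 forces per body
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            r = math.hypot(dx, dy)
            magnitude = G * masses[i] * masses[j] / (r * r)
            # Force on body i points toward body j.
            forces[i][0] += magnitude * dx / r
            forces[i][1] += magnitude * dy / r
            evaluations += 1
    return forces, evaluations


def total_evaluations(n_bodies: int, time_steps: int) -> int:
    """Force evaluations for T time steps of the direct method."""
    return n_bodies * (n_bodies - 1) * time_steps


masses = [1.0, 1.0, 2.0, 1.0]
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
forces, evals = direct_forces(masses, positions)
print(evals)                        # 12, i.e. 4 * 3
print(total_evaluations(4, 100))    # 1200 over 100 time steps
```

A real simulation would also update velocities and positions after each force pass; that step is omitted because it does not change the $N^2 \times T$ operation count.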
1a.9
• A galaxy might have, say, $10^{11}$ stars.
• Even if each calculation is done in 1 µs (an extremely optimistic figure), it takes:
– about $3 \times 10^{8}$ years for one iteration using the $N^2$ algorithm, or
– about six weeks for one iteration using the $N \log_2 N$ algorithm,
assuming the calculations take the same time (which may not be true).
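These orders of magnitude can be checked directly. The sketch below evaluates both algorithms for $N = 10^{11}$ at two assumed per-calculation times, 1 µs and 1 ms, and, like the slide, treats every calculation as costing the same; it ignores constant factors hidden in the O-notation.

```python
import math

N = 1e11                                   # stars in the galaxy (slide's figure)
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

n_squared_calcs = N * N                    # direct O(N^2) method
n_log_n_calcs = N * math.log2(N)           # O(N log2 N) method

results = {}
for label, seconds_per_calc in (("1 us", 1e-6), ("1 ms", 1e-3)):
    years_direct = n_squared_calcs * seconds_per_calc / SECONDS_PER_YEAR
    days_fast = n_log_n_calcs * seconds_per_calc / SECONDS_PER_DAY
    results[label] = (years_direct, days_fast)
    print(f"{label} per calc: N^2 -> {years_direct:.1e} years, "
          f"N log2 N -> {days_fast:,.0f} days")
```

With 1 µs per calculation the $N^2$ iteration takes about $3 \times 10^{8}$ years and the $N \log_2 N$ iteration about 42 days; at 1 ms both figures are 1000 times larger.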
1a.10 [Figure: Astrophysical N-body simulation by Scott Linssen (undergraduate UNC-Charlotte student).]
1a.11 Parallel Programming
Programming parallel computers
– Has been around for more than 50 years.
1a.12 Gill writes in 1958:
“... There is therefore nothing new in the idea of parallel programming, but its application to computers. The author cannot believe that there will be any insuperable difficulty in extending it to computers. It is not to be expected that the necessary programming techniques will be worked out overnight. Much experimenting remains to be done. After all, the techniques that are commonly used in programming today were only won at the cost of considerable toil several years ago. In fact the advent of parallel programming may do something to revive the pioneering spirit in programming which seems at the present to be degenerating into a rather dull and routine occupation ...”
Gill, S. (1958), “Parallel Programming,” The Computer Journal, vol. 1, April, pp. 2-10.
1a.13 Potential for parallel computers/parallel programming
1a.14 Speedup Factor
$$S(p) = \frac{\text{Execution time using one processor (best sequential algorithm)}}{\text{Execution time using a multiprocessor with } p \text{ processors}} = \frac{t_s}{t_p}$$
where $t_s$ is the execution time on a single processor and $t_p$ is the execution time on a multiprocessor.
S(p) gives the increase in speed from using the multiprocessor.
Typically the best sequential algorithm is used on the single-processor system. The underlying algorithm for the parallel implementation might be (and usually is) different.
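In code the definition is a single division; the point to carry over is which two times go into it. In this sketch the helper name `speedup` and the timings are hypothetical; $t_s$ should come from the best sequential algorithm, not from the parallel program run on one processor.

```python
def speedup(t_s: float, t_p: float) -> float:
    """S(p) = t_s / t_p.

    t_s: execution time of the best sequential algorithm on one processor.
    t_p: execution time of the parallel program on p processors.
    p does not appear explicitly; it affects S(p) only through t_p.
    """
    if t_s <= 0 or t_p <= 0:
        raise ValueError("execution times must be positive")
    return t_s / t_p


# Hypothetical timings in seconds, for illustration only.
t_s = 120.0   # best sequential algorithm, one processor
t_p = 20.0    # parallel program, p processors
print(f"S(p) = {speedup(t_s, t_p):.1f}")   # S(p) = 6.0
```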
1a.15 Speedup factor can also be cast in terms of computational steps:
$$S(p) = \frac{\text{Number of computational steps using one processor}}{\text{Number of parallel computational steps with } p \text{ processors}}$$
Time complexity can also be extended to parallel computations.
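As an illustration of the step-count form (an example chosen here, not taken from the slides), consider adding n numbers. Sequentially this takes n − 1 additions. With p processors, assumed here to be a power of two dividing n, each processor first adds its own n/p numbers in parallel, then the p partial sums are combined pairwise in a tree. Communication is ignored, which flatters the parallel version.

```python
def sequential_steps(n: int) -> int:
    """Adding n numbers one after another takes n - 1 additions."""
    return n - 1


def parallel_steps(n: int, p: int) -> int:
    """Parallel computational steps to add n numbers on p processors.

    Assumes p is a power of two that divides n, and ignores communication.
    Phase 1: each processor adds its n/p numbers (n/p - 1 steps, in parallel).
    Phase 2: partial sums are combined pairwise, halving each round.
    """
    if p <= 0 or (p & (p - 1)) != 0 or n % p != 0:
        raise ValueError("p must be a power of two that divides n")
    local_steps = n // p - 1
    partial_sums, combine_steps = p, 0
    while partial_sums > 1:
        partial_sums //= 2
        combine_steps += 1
    return local_steps + combine_steps


n = 1024
for p in (1, 2, 4, 8, 16):
    s = sequential_steps(n) / parallel_steps(n, p)
    print(f"p = {p:2d}: S(p) = {s:.2f}")
```

The combining phase adds $\log_2 p$ steps, which is why the step-count speedup stays a little below p.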
1a.16 Maximum Speedup
Maximum speedup is usually p with p processors (linear speedup).
It is possible to get superlinear speedup (greater than p), but usually for a specific reason such as:
• Extra memory in the multiprocessor system
• Nondeterministic algorithm
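1a.18 Speedup factor is given by:
$$S(p) = \frac{t_s}{f\,t_s + (1 - f)\,t_s/p} = \frac{p}{1 + (p - 1)\,f}$$
where f is the fraction of the computation that must be done serially (cannot be divided into concurrent tasks).
This equation is known as Amdahl’s law.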
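The second form of Amdahl’s law follows from the first (with f the serial fraction) by multiplying numerator and denominator by $p/t_s$:

```latex
S(p) = \frac{t_s}{f\,t_s + (1-f)\,t_s/p}
     = \frac{p}{f\,p + (1-f)}
     = \frac{p}{1 + (p-1)\,f}
```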
1a.19 [Figure: Speedup against number of processors, p (up to 20), for f = 0%, 5%, 10% and 20%.]
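The curves in this plot can be regenerated as a table from Amdahl’s law, $S(p) = p / (1 + (p-1) f)$. The sketch below assumes the plotted values of f (0%, 5%, 10%, 20%) and a few processor counts up to 20; the function name `amdahl_speedup` is just a label for this example.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Amdahl's law: S(p) = p / (1 + (p - 1) * f), f = serial fraction."""
    return p / (1 + (p - 1) * f)


processors = [1, 4, 8, 12, 16, 20]
serial_fractions = [0.0, 0.05, 0.10, 0.20]

# Header: one column per serial fraction f.
print("   p" + "".join(f"{f:>9.0%}" for f in serial_fractions))
for p in processors:
    row = "".join(f"{amdahl_speedup(f, p):9.2f}" for f in serial_fractions)
    print(f"{p:4d}{row}")
```

The f = 0% column is the linear-speedup line; at p = 20 the other columns fall to about 10.3, 6.9 and 4.2.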
1a.20 Even with an infinite number of processors, maximum speedup is limited to 1/f.
Example
With only 5% of the computation being serial, maximum speedup is 20, irrespective of the number of processors.
This is a very discouraging result.
Amdahl used this argument to support the design of ultra-high-speed single-processor systems in the 1960s.
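The 1/f limit comes from dividing numerator and denominator of Amdahl’s law by p and letting p grow without bound:

```latex
\lim_{p \to \infty} \frac{p}{1 + (p-1)\,f}
  = \lim_{p \to \infty} \frac{1}{\frac{1}{p} + \left(1 - \frac{1}{p}\right) f}
  = \frac{1}{f},
\qquad f = 0.05 \;\Rightarrow\; \frac{1}{f} = 20
```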
1a.21 Gustafson’s law
Later, Gustafson (1988) described how the conclusion of Amdahl’s law might be overcome by considering the effect of increasing the problem size.
He argued that when a problem is ported onto a multiprocessor system, larger problem sizes can be considered, that is, the same problem but with a larger number of data values.
1a.22 Gustafson’s law
The starting point for Gustafson’s law is the computation on the multiprocessor rather than on the single computer.
In Gustafson’s analysis, the parallel execution time is kept constant, which we assume to be some acceptable time to wait for the solution.
1a.23 Gustafson’s law
The parallel computation is composed of a fraction computed sequentially, say $f'$, and a fraction that contains parallel parts, $1 - f'$.
Gustafson’s so-called scaled speedup factor is given by:
$$S'(p) = \frac{f'\,t_p + (1 - f')\,p\,t_p}{t_p} = p + (1 - p)\,f'$$
$f'$ is the fraction of the computation on the multiprocessor that cannot be parallelized.
$f'$ is different from f used previously, which is the fraction of the computation on a single computer that cannot be parallelized.
The conclusion drawn from Gustafson’s law is an almost linear increase in speedup with increasing number of processors, but the fractional part $f'$ needs to remain small.
1a.24 Gustafson’s law
For example, if $f'$ is 5%, the scaled speedup computes to 19.05 with 20 processors, whereas with Amdahl’s law with f = 5% the speedup computes to 10.26.
Gustafson quotes results obtained in practice of very high speedup, close to linear, on a 1024-processor hypercube.
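Both figures in this example follow from the two formulas, $S(p) = p/(1 + (p-1)f)$ and $S'(p) = p + (1-p)f'$. The sketch evaluates them side by side; the function names are illustrative, and the p = 1024 row is only the formulas evaluated at 5%, not Gustafson’s measured results.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Fixed problem size: f is the serial fraction on one processor."""
    return p / (1 + (p - 1) * f)


def gustafson_scaled_speedup(f_prime: float, p: int) -> float:
    """Fixed parallel time: f_prime is the serial fraction on the multiprocessor."""
    return p + (1 - p) * f_prime


for p in (20, 1024):
    print(f"p = {p:4d}: Gustafson (f' = 5%) {gustafson_scaled_speedup(0.05, p):8.2f}, "
          f"Amdahl (f = 5%) {amdahl_speedup(0.05, p):6.2f}")
```

For p = 20 this prints 19.05 and 10.26. At p = 1024 the scaled speedup keeps growing (about 973) while Amdahl’s fixed-size speedup stays below its limit of 20.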
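1a.28 Worst case for sequential search is when the solution is found in the last sub-space search. Then the parallel version offers the greatest benefit, i.e.
$$S(p) = \frac{\frac{p-1}{p} \times t_s + \Delta t}{\Delta t} \to \infty \quad \text{as } \Delta t \to 0$$
Here the search space is divided into p sub-spaces, each taking $t_s/p$ to search completely, and $\Delta t$ is the time to find the solution once the search reaches the sub-space that holds it.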
1a.29 Least advantage for the parallel version is when the solution is found in the first sub-space search of the sequential search, i.e.
$$S(p) = \frac{\Delta t}{\Delta t} = 1$$
Actual speed-up depends upon which sub-space holds the solution, but could be extremely large.
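The two cases can be explored numerically. This sketch assumes the model behind the formulas: the space is split into p sub-spaces that each take $t_s/p$ to search, the sequential search exhausts x sub-spaces before reaching the one holding the solution, and both versions then need $\Delta t$ to find it. The parameter values (p = 8, $t_s$ = 80) are arbitrary illustrations.

```python
def search_speedup(x: int, p: int, t_s: float, dt: float) -> float:
    """Speed-up of parallel over sequential search.

    x:   number of whole sub-spaces searched sequentially before the one
         holding the solution (0 = first sub-space, p - 1 = last).
    t_s: time to search the whole space sequentially (t_s / p per sub-space).
    dt:  time to find the solution within its sub-space.
    """
    if not 0 <= x <= p - 1:
        raise ValueError("x must lie in 0..p-1")
    sequential_time = x * t_s / p + dt
    parallel_time = dt  # all sub-spaces are searched at once
    return sequential_time / parallel_time


p, t_s = 8, 80.0
for dt in (1.0, 0.1, 0.01):
    first = search_speedup(0, p, t_s, dt)       # solution in first sub-space
    last = search_speedup(p - 1, p, t_s, dt)    # solution in last sub-space
    print(f"dt = {dt:<5}: first sub-space S = {first:.1f}, "
          f"last sub-space S = {last:.1f}")
```

The first-sub-space case always gives S(p) = 1, while the last-sub-space case grows without bound as $\Delta t$ shrinks, exceeding p by a wide margin (superlinear speedup).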
1a.30
• The next question to answer is: how does one construct a computer system with multiple processors to achieve the speed-up?