
Full Introduction to Parallel Computing


  1. Introduction to Parallel Computing. Presented by Supasit Kajkamhaeng
  2. Computational Problem [diagram: a problem is broken down into a series of instructions]
  3. Serial Computing [diagram: the instructions of a problem are executed one after another on a single CPU over time]
  4. What is Parallel Computing? A form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel") [Almasi and Gottlieb, 1989]. [diagram: a problem divided into tasks, with each task's instructions executed on its own CPU]
  5. Patterns of Parallelism
     Data parallelism [Quinn, 2003]: independent tasks apply the same operation to different elements of a data set.
         for i ← 0 to 99 do
             a[i] = b[i] + c[i]
         endfor
     Functional parallelism [Quinn, 2003]: independent tasks apply different operations to different data elements.
         a = 2, b = 3
         m = (a + b) / 2
         n = a² + b²
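A minimal sketch of both patterns in C with OpenMP follows (OpenMP is an assumed toolchain here, suggested only by the Quinn reference; the slides themselves show pseudocode). The parallel `for` expresses the data-parallel loop; `sections` expresses the two independent functional-parallel computations.

```c
#include <stdio.h>
#include <omp.h>

#define N 100

int main(void) {
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

    /* Data parallelism: the same operation (element-wise addition)
       is applied to different elements of the data set. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    /* Functional parallelism: two independent operations on the
       same scalars run in separate sections. */
    double p = 2.0, q = 3.0, m, n;
    #pragma omp parallel sections
    {
        #pragma omp section
        m = (p + q) / 2.0;       /* average */
        #pragma omp section
        n = p * p + q * q;       /* sum of squares */
    }

    printf("a[99] = %.1f, m = %.1f, n = %.1f\n", a[N - 1], m, n);
    return 0;
}
```

Compile with an OpenMP-enabled compiler, e.g. `gcc -fopenmp`.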
  6. Data Communications [diagram: Task 1 and Task 2 exchange data]
  7. Why use Parallel Computing? Reduce computing time: more processors.
  8. Why use Parallel Computing? (1) Solve larger problems: more memory. [diagram: a problem divided into tasks, with each task drawing on its own RAM]
  9. Parallel Computing Systems
     A single machine with multi-core processors: a multithreaded process runs its threads on the cores, sharing the process's memory. [diagram: one process with shared memory and threads spread across the cores of two processors] A sketch of the multithreaded case follows.
     Limits of a single machine: performance, available memory.
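A minimal picture of the shared-memory, multithreaded case in C with POSIX threads (an assumed API; the slides name no threading library): the threads of one process fill disjoint slices of a shared array.

```c
#include <stdio.h>
#include <pthread.h>

#define N 8
#define NTHREADS 4

static double data[N];   /* memory shared by all threads of the process */

/* Each thread writes its own disjoint slice of the shared array. */
static void *worker(void *arg) {
    long t = (long)arg;
    int chunk = N / NTHREADS;
    for (int i = (int)t * chunk; i < ((int)t + 1) * chunk; i++)
        data[i] = (double)i * i;
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);
    printf("data[%d] = %.1f\n", N - 1, data[N - 1]);
    return 0;
}
```

Compile with `-pthread`.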
  10. What is a Cluster? A group of linked computers working together closely so that in many respects they form a single computer, to improve performance and/or availability over that provided by a single computer [Webopedia computer dictionary, 2007]. Two kinds: High-Performance and High-Availability.
  11. Cluster Architecture [architecture diagram]
  12. Message-Passing Model [Quinn, 2003]
      The system is assumed to be a collection of processors, each with its own local memory (a distributed-memory system).
      A processor has direct access only to the instructions and data stored in its local memory.
      An interconnection network supports message passing between processors.
      The MPI standard realizes this model; a minimal sketch follows.
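A minimal two-process exchange in C with MPI (illustrative code, not taken from the slides): rank 0 sends one integer through the interconnection network to rank 1, which receives and prints it.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Rank 0 sees only its own local memory, so the value must
           be sent explicitly over the network. */
        int value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Run under an MPI launcher with at least two processes, e.g. `mpirun -np 2 ./exchange`.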
  13. Performance Metrics for Parallel Computing
      Speedup [Kumar et al., 1994]: how much performance gain is achieved by parallelizing a given application over a sequential implementation.
          Sp = Ts / Tp
      where Ts is the sequential execution time, P is the number of processors, and Tp is the parallel execution time with P processors.
      Example: P = 4, Ts = 40, Tp = 15, giving Sp ≈ 2.67.
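In LaTeX, the definition and the slide's example row, with the usual ideal bound added for context (the bound is not stated on this slide):

```latex
S_p = \frac{T_S}{T_P}, \qquad
S_4 = \frac{40}{15} \approx 2.67, \qquad
\text{ideally } S_p \le p .
```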
  14. Speedup [Eijkhout, 2011] [speedup chart]
  15. Efficiency: a measure of processor utilization [Quinn, 2003].
          Ep = Sp / P
      Examples: P = 4, Sp = 2 gives Ep = 0.5; P = 8, Sp = 3 gives Ep = 0.375.
      In practice, speedup is less than P and efficiency is between zero and one, depending on how effectively the processors are utilized [Eijkhout, 2011].
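The slide's two example rows, worked out in LaTeX:

```latex
E_p = \frac{S_p}{p}, \qquad
E_4 = \frac{2}{4} = 0.5, \qquad
E_8 = \frac{3}{8} = 0.375 .
```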
  16. Effective Factors of Parallel Performance
      Portion of computation [Quinn, 2003]:
          fs, the serial fraction: computations that must be performed sequentially.
          fp, the parallel fraction: computations that can be performed in parallel.
          Sp = Ts / Tp = Ts / (fs·Ts + fp·Ts / P) = 1 / (fs + fp / P)
      Example: Ts = 100, fs = 10%, fp = 90%, so fs·Ts = 10 and fp·Ts = 90.
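Carrying the slide's numbers one step further in LaTeX (the P → ∞ limit is added for context; it is the familiar Amdahl bound):

```latex
S_{10} = \frac{1}{f_s + f_p/P}
       = \frac{1}{0.1 + 0.9/10}
       = \frac{1}{0.19} \approx 5.26,
\qquad
\lim_{P\to\infty} S_p = \frac{1}{f_s} = 10 .
```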
  17. Effective Factors of Parallel Performance (1)
      Parallel overhead [Barney, 2011]: the amount of time required to coordinate parallel tasks, as opposed to doing useful work.
          Task start-up time
          Synchronizations
          Data communications
          Task termination time
      Load balancing, etc.
  18. Effective Factors of Parallel Performance (2)
      Tp = fs·Ts + (1 − fs)·Ts / P + Toverhead
      Sp = Ts / Tp = Ts / (fs·Ts + (1 − fs)·Ts / P + Toverhead)
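A worked instance in LaTeX (the overhead value Toverhead = 2 is an assumption for illustration; the slides give no figure). With Ts = 100, fs = 0.1, and P = 10:

```latex
T_P = 0.1(100) + \frac{(1 - 0.1)(100)}{10} + 2 = 10 + 9 + 2 = 21,
\qquad
S_p = \frac{100}{21} \approx 4.76,
```

compared with about 5.26 when the overhead term is ignored.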
  19. Effective Factors of Parallel Performance (3)
      Fixed problem size: Ts is fixed in
      Sp = Ts / Tp = Ts / (fs·Ts + (1 − fs)·Ts / P + Toverhead)
  20. Effective Factors of Parallel Performance (4)
      Fixed P; growing the problem size raises the speedup: in
      Sp = Ts / (fs·Ts + (1 − fs)·Ts / P + Toverhead),
      the serial fraction shrinks as the parallel work grows.
      Example (2D grid calculations):
          Parallelizable part: 85 mins (85%) → 680 mins (97.84%)
          Serial fraction:     15 mins (15%) → 15 mins (2.16%)
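Checking the slide's percentages in LaTeX, and the speedup ceiling 1/fs they imply (the ceiling is added context):

```latex
f_s^{\text{small}} = \frac{15}{15 + 85} = 15\%, \qquad
f_s^{\text{large}} = \frac{15}{15 + 680} \approx 2.16\%,
```

so the attainable speedup bound 1/fs rises from about 6.7 to about 46.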
  21. Case Study: Hardware Configuration
      Linux cluster (4 compute nodes)
      Each compute node:
          2x Intel Xeon 2.80 GHz (single core)
          4 GB RAM
          Gigabit Ethernet
          CentOS 4.3
  22. Case Study: CFD. Parallel Fluent processing [Junhong, 2004]: run the Fluent solver on two or more CPUs simultaneously to calculate a computational fluid dynamics (CFD) job.
  23. Case Study: CFD (1). Case Test #1 [problem setup]
  24. Case Study: CFD (2). Case Test #1: runtime [chart]
  25. Case Study: CFD (3). Case Test #1: speedup [chart]
  26. Case Study: CFD (4). Case Test #1: efficiency [chart]
  27. Conclusion
      Parallel computing helps to reduce computation time and to solve larger problems than a single computer (sequential computing) can handle.
      To use parallel computers, software is developed with a parallel programming model.
      The performance of parallel computing is measured with speedup and efficiency.
  28. References
      1. G.S. Almasi and A. Gottlieb. 1989. Highly Parallel Computing. The Benjamin-Cummings Publishers, Redwood City, CA.
      2. M.J. Quinn. 2003. Parallel Programming in C with MPI and OpenMP. The McGraw-Hill Companies, Inc., NY.
      3. What is clustering? Webopedia computer dictionary. Retrieved on November 7, 2007.
      4. V. Kumar, A. Grama, A. Gupta, and G. Karypis. 1994. Introduction to Parallel Computing: Design and Analysis of Parallel Algorithms. The Benjamin-Cummings Publishers, Redwood City, CA.
      5. V. Eijkhout. 2011. Introduction to Parallel Computing. Texas Advanced Computing Center (TACC), The University of Texas at Austin.
      6. B. Barney. 2011. Introduction to Parallel Computing. Lawrence Livermore National Laboratory.
      7. W. Junhong. 2004. Parallel Fluent Processing. SVU/Academic Computing, Computer Centre, National University of Singapore.
