Introduction to Parallel Computing


  • Serial computation: the program runs on a single computer with a single Central Processing Unit (CPU). A problem is broken into a discrete series of instructions, which are executed one after another; only one instruction may execute at any moment in time.
  • Multithreading is a widespread programming and execution model that allows multiple threads to exist within the context of a single process. These threads share the process's resources but are able to execute independently. The threaded programming model gives developers a useful abstraction of concurrent execution; perhaps its most interesting application is enabling parallel execution of a single process on a multiprocessor system. Shared memory systems (SMPs, cc-NUMAs) have a single address space. OpenMP is the standard for shared memory programming (compiler directives).
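The notes above describe threads that share a process's memory while executing independently. A minimal sketch of that shared-memory model, using Python's standard `threading` module as a stand-in for OpenMP-style threading (the example itself is not from the deck):

```python
import threading

# Shared memory: both threads see the same list (single address space).
data = [0] * 8

def worker(start, end):
    # Each thread fills in its own slice of the shared array.
    for i in range(start, end):
        data[i] = i * i

# Two threads within one process, splitting the index range.
threads = [threading.Thread(target=worker, args=(0, 4)),
           threading.Thread(target=worker, args=(4, 8))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(data)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because the threads write to disjoint slices, no lock is needed here; overlapping writes would require synchronization.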
  • Clusters vs. MPPs. The key differences between a cluster and an MPP system are:
    o In a cluster, components or layers can change relatively independently of each other, whereas components in MPP systems are much more tightly integrated. For example, a cluster administrator can upgrade the interconnect, say from Fast Ethernet to Gigabit Ethernet, just by adding new network interface cards (NICs) and switches; in most cases the administrator of an MPP system cannot do such an upgrade without upgrading the whole machine.
    o A cluster decouples the development of system software from innovations in the underlying hardware. Cluster management tools and parallel programming libraries can be optimized independently of changes in the node hardware itself. This yields more mature and reliable cluster middleware than the system software layer of an MPP, which requires at least a major rewrite with each generation of the system hardware.
    o An MPP usually has a single system serial number used for software licensing and support tracking; clusters and NOWs have multiple serial numbers, one for each constituent node.
  • MPI is the standard for distributed memory programming (a library of subprogram calls). In computer hardware, shared memory refers to a (typically large) block of random access memory (RAM) that can be accessed by several different central processing units (CPUs) in a multiprocessor system. Shared memory systems (SMPs, cc-NUMAs) have a single address space; distributed memory systems have a separate address space for each processor. Message Passing Interface (MPI) is a standardized and portable message-passing system designed by a group of researchers from academia and industry to run on a wide variety of parallel computers. The standard defines the syntax and semantics of a core of library routines useful to a wide range of users writing portable message-passing programs in Fortran 77 or C. Several well-tested and efficient implementations of MPI exist, including some that are free and in the public domain; these fostered the growth of a parallel software industry and encouraged the development of portable, scalable, large-scale parallel applications. MPI is a library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementors, and users. From a programming perspective, message passing implementations usually comprise a library of subroutines whose calls are embedded in source code; the programmer is responsible for determining all parallelism. Historically, a variety of message passing libraries have been available since the 1980s, but these implementations differed substantially from each other, making it difficult for programmers to develop portable applications. In 1992, the MPI Forum was formed with the primary goal of establishing a standard interface for message passing implementations.
    1. Introduction to Parallel Computing. Presented by Supasit Kajkamhaeng
    2. Computational Problem: a problem is decomposed into a series of instructions (diagram).
    3. Serial Computing: the CPU executes the problem's instructions one at a time (diagram).
    4. What is Parallel Computing? A form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel") [Almasi and Gottlieb, 1989]. (Diagram: the problem is split into tasks, each executed by its own CPU.)
    5. Patterns of Parallelism [Quinn, 2003]. Data parallelism: independent tasks apply the same operation to different elements of a data set, e.g. for i ← 0 to 99 do a[i] = b[i] + c[i] endfor. Functional parallelism: independent tasks apply different operations to different data elements, e.g. given a = 2, b = 3, compute m = (a + b) / 2 and n = a² + b² concurrently.
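Both patterns on this slide can be sketched directly. A Python illustration of the slide's pseudocode (the thread pool and function names are mine, not from the deck):

```python
from concurrent.futures import ThreadPoolExecutor

# Data parallelism: the SAME operation (element-wise addition)
# applied to different elements of the data set, split across workers.
b = list(range(100))
c = list(range(100))

def add_slice(lo, hi):
    return [b[i] + c[i] for i in range(lo, hi)]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Worker 1 handles indices 0..49, worker 2 handles 50..99.
    lo_half, hi_half = pool.map(add_slice, [0, 50], [50, 100])
a = lo_half + hi_half

# Functional parallelism: DIFFERENT operations on the same inputs.
x, y = 2, 3
with ThreadPoolExecutor(max_workers=2) as pool:
    m = pool.submit(lambda: (x + y) / 2)    # mean
    n = pool.submit(lambda: x**2 + y**2)    # sum of squares

print(a[10], m.result(), n.result())  # 20 2.5 13
```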
    6. Data Communications: Task 1 and Task 2 exchange data (diagram).
    7. Why use Parallel Computing? Reduce computing time (more processors).
    8. Why use Parallel Computing? (cont.) Solve larger problems (more memory). (Diagram: tasks distributed across nodes, each with its own RAM.)
    9. Parallel Computing Systems: a single machine with multi-core processors running a multithreaded process in shared memory (diagram). Limits of a single machine: performance, available memory.
    10. What is a Cluster? A group of linked computers, working together so closely that in many respects they form a single computer, to improve performance and/or availability over that provided by a single computer [Webopedia computer dictionary, 2007]. Two flavors: High-Performance and High-Availability.
    11. Cluster Architecture (diagram).
    12. Message-Passing Model [Quinn, 2003]: the system is assumed to be a collection of processors, each with its own local memory (a distributed memory system). A processor has direct access only to the instructions and data stored in its local memory. An interconnection network supports message passing between processors (MPI standard).
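The model on this slide can be mimicked in pure Python with operating-system processes, each with its own address space, exchanging data over a pipe. This is only a stand-in sketch; a real MPI code would use calls such as MPI_Send/MPI_Recv, and the helper names here are mine:

```python
from multiprocessing import get_context

def worker(conn, numbers):
    # Runs in a separate process with its OWN local memory;
    # the only way a result reaches the parent is as a message.
    conn.send(sum(numbers))
    conn.close()

def run_demo():
    ctx = get_context("fork")          # POSIX fork start method
    parent_end, child_end = ctx.Pipe()
    p = ctx.Process(target=worker, args=(child_end, [1, 2, 3, 4]))
    p.start()
    partial = parent_end.recv()        # blocking receive, like MPI_Recv
    p.join()
    return partial

print(run_demo())  # 10
```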
    13. Performance Metrics for Parallel Computing. Speedup [Kumar et al., 1994]: how much performance gain is achieved by parallelizing a given application over a sequential implementation. Sp = Ts / Tp, where Ts is the sequential execution time and Tp is the parallel execution time with P processors. Example: P = 4, Ts = 40, Tp = 15 gives Sp = 2.67.
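The slide's arithmetic can be checked directly (a small helper of my own, not from the deck):

```python
def speedup(t_serial, t_parallel):
    """Sp = Ts / Tp: gain from parallelizing over the sequential run."""
    return t_serial / t_parallel

# Slide example: P = 4 processors, Ts = 40, Tp = 15.
print(round(speedup(40, 15), 2))  # 2.67
```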
    14. Speedup [Eijkhout, 2011] (graph).
    15. Efficiency: a measure of processor utilization [Quinn, 2003]. Ep = Sp / P. Examples: P = 4, Sp = 2 gives Ep = 0.5; P = 8, Sp = 3 gives Ep = 0.375. In practice, speedup is less than P and efficiency is between zero and one, depending on the degree of effectiveness with which the processors are utilized [Eijkhout, 2011].
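The efficiency figures in the slide's table follow directly from the definition (helper name is mine):

```python
def efficiency(sp, p):
    """Ep = Sp / P: fraction of ideal linear speedup actually achieved."""
    return sp / p

# Slide table: (P = 4, Sp = 2) and (P = 8, Sp = 3).
print(efficiency(2, 4))  # 0.5
print(efficiency(3, 8))  # 0.375
```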
    16. Effective Factors of Parallel Performance. Portion of computation [Quinn, 2003]: some computations must be performed sequentially, others can be performed in parallel. With fs the serial fraction and fp the parallel fraction of computation, Sp = Ts / Tp = Ts / (fs·Ts + fp·Ts/P) = 1 / (fs + fp/P). Example: Ts = 100 with fs = 10%, fp = 90% gives fs·Ts = 10 and fp·Ts = 90.
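This is Amdahl's law; a quick sketch using the slide's symbols (the function is a hypothetical helper, assuming fp = 1 − fs):

```python
def amdahl_speedup(fs, p):
    """Sp = 1 / (fs + (1 - fs)/P) for serial fraction fs and P processors."""
    return 1.0 / (fs + (1.0 - fs) / p)

# Slide example: fs = 10%, fp = 90%, here with P = 10 processors.
print(round(amdahl_speedup(0.10, 10), 2))  # 5.26
# As P grows without bound, speedup is capped at 1/fs = 10.
```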
    17. Effective Factors of Parallel Performance (cont.) Parallel overhead [Barney, 2011]: the amount of time required to coordinate parallel tasks, as opposed to doing useful work: o Task start-up time o Synchronizations o Data communications o Task termination time. Plus load balancing, etc.
    18. Effective Factors of Parallel Performance (cont.) Tp = fs·Ts + (1 − fs)·Ts/P + Toverhead, so Sp = Ts / Tp = Ts / (fs·Ts + (1 − fs)·Ts/P + Toverhead).
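The overhead-aware formula above can be evaluated the same way (helper name is mine; the numbers reuse the earlier slide's Ts = 100, fs = 10% with an assumed overhead):

```python
def speedup_with_overhead(ts, fs, p, t_overhead):
    """Sp = Ts / (fs*Ts + (1 - fs)*Ts/P + Toverhead)."""
    tp = fs * ts + (1 - fs) * ts / p + t_overhead
    return ts / tp

# With zero overhead this reduces to Amdahl's law:
print(round(speedup_with_overhead(100, 0.10, 10, 0), 2))  # 5.26
# A hypothetical 5 units of coordination overhead eats into the gain:
print(round(speedup_with_overhead(100, 0.10, 10, 5), 2))  # 4.17
```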
    19. Effective Factors of Parallel Performance (cont.) With a fixed problem size (Ts and fs fixed), Sp = Ts / (fs·Ts + (1 − fs)·Ts/P + Toverhead) is bounded no matter how large P grows.
    20. Effective Factors of Parallel Performance (cont.) With P fixed, increasing the problem size increases speedup. Example, a 2D grid calculation: the parallelizable part grows from 85 mins (85%) to 680 mins (97.84%) while the serial fraction stays at 15 mins, dropping from 15% to 2.16% of the total.
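The percentages in slide 20's table can be reproduced from the raw minutes (helper name is mine):

```python
def serial_fraction(t_serial_part, t_parallel_part):
    """fs = serial time / total time."""
    return t_serial_part / (t_serial_part + t_parallel_part)

# Slide 20: the parallel part grows 85 -> 680 mins, serial stays 15 mins.
print(round(100 * serial_fraction(15, 85), 2))   # 15.0  (small problem)
print(round(100 * serial_fraction(15, 680), 2))  # 2.16  (large problem)
```

Because the serial fraction shrinks as the problem grows, Amdahl's bound 1/fs rises with problem size, which is why scaling the workload with P keeps the processors usefully busy.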
    21. 21. Case Study Hardware Configuration  Linux Cluster (4 compute nodes)  Detail of Compute node o 2x Intel Xeon 2.80 GHz (Single core) o 4 GB RAM o Gigabit Ethernet o CentOS 4.3 21
    22. Case Study: CFD. Parallel Fluent processing [Junhong, 2004]: run the Fluent solver on two or more CPUs simultaneously to calculate a computational fluid dynamics (CFD) job.
    23. Case Study: CFD (cont.) Case Test #1 (setup).
    24. Case Study: CFD (cont.) Case Test #1: runtime (graph).
    25. Case Study: CFD (cont.) Case Test #1: speedup (graph).
    26. Case Study: CFD (cont.) Case Test #1: efficiency (graph).
    27. Conclusion. Parallel computing helps save computation time and solve larger problems beyond what a single computer (sequential computing) provides. To use parallel computers, software is developed with a parallel programming model. The performance of parallel computing is measured with speedup and efficiency.
    28. References:
        1. G.S. Almasi and A. Gottlieb. 1989. Highly Parallel Computing. Benjamin-Cummings, Redwood City, CA.
        2. M.J. Quinn. 2003. Parallel Programming in C with MPI and OpenMP. McGraw-Hill, NY.
        3. "What is clustering?" Webopedia computer dictionary. Retrieved November 7, 2007.
        4. V. Kumar, A. Grama, A. Gupta, and G. Karypis. 1994. Introduction to Parallel Computing: Design and Analysis of Parallel Algorithms. Benjamin-Cummings, Redwood City, CA.
        5. V. Eijkhout. 2011. Introduction to Parallel Computing. Texas Advanced Computing Center (TACC), The University of Texas at Austin.
        6. B. Barney. 2011. Introduction to Parallel Computing. Lawrence Livermore National Laboratory.
        7. W. Junhong. 2004. Parallel Fluent Processing. SVU/Academic Computing, Computer Centre, National University of Singapore.