Heterogeneous Parallel Computing with GPU: From a Dummy for Dummies

Published in: Technology

  1. For Dummies, From a Dummy
     Ngobrol Ilmiah PPIS #116 Desember, 2012
     M. Alfian Amrizal, Tohoku University
  2. • Introduction to Parallel Computing
     • GPU as an Accelerator
  3. Classical science: Nature, Observation, Theory, Physical Experiments
     Modern science: Numerical Simulations
     [Images: blogs.sundaymercury.net, conserve-energy-future.com; SX-9 (Tohoku University)]
  4. Quantum chemistry, Cosmology, CFD, Medicine, Material design
     [Images: autoevolution.com, scidacreview.org, physicsworld.com, albertkents.com, solid.me.tut.ac.jp]
  5. • Supercomputer
         – The most powerful computers that can be built [2]
         – First computer "ENIAC" ⇒ 350 mult/sec (1946)
         – Today's supercomputers > 1,000,000,000 × ENIAC
         – Today's processor speed only ~ 1,000,000 × ENIAC (?)
     ⇒ "Parallel computing"
     [Images: cbc.ca, datacenterknowledge.com, allvoices.com]
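The gap between the two ENIAC ratios on this slide is the argument for parallel computing. A minimal sketch of that arithmetic, using only the slide's rough figures (the variable names are illustrative, and the ratios are order-of-magnitude estimates rather than measurements):

```python
# Rough figures taken from the slide; all are order-of-magnitude estimates.
ENIAC_MULT_PER_SEC = 350              # ENIAC, 1946
SUPERCOMPUTER_RATIO = 1_000_000_000   # supercomputer speed relative to ENIAC
PROCESSOR_RATIO = 1_000_000           # single-processor speed relative to ENIAC

supercomputer_rate = ENIAC_MULT_PER_SEC * SUPERCOMPUTER_RATIO
processor_rate = ENIAC_MULT_PER_SEC * PROCESSOR_RATIO

# The part of the speedup that a single faster processor cannot explain.
parallel_factor = SUPERCOMPUTER_RATIO // PROCESSOR_RATIO

print(f"supercomputer: ~{supercomputer_rate:.1e} mult/sec")
print(f"one processor: ~{processor_rate:.1e} mult/sec")
print(f"factor left for parallelism: ~{parallel_factor}x")
```

On these figures, roughly a factor of 1,000 has to come from running many processors at once rather than from a faster single processor.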
  6. • CPU: The brain of the computer; all data is processed here
     • Memory: The computer's scratch pad; programs are loaded and run here
     • GPU: For graphics processing; used as an accelerator in HPC
     • Storage: Holds data and program files
  7. • The free lunch is over!!
         – Heat
         – Power restriction
         – Transistor size
     CPUs aren't getting any faster
  8. • Multicomputers: distributed-memory parallel computer
     • Multicore: shared-memory parallel computer (e.g. dual core, quad core, etc.)
     [Diagram: Core1, Core2]
  9. • Trends in HPC system design
         – More nodes/processors/cores
         – Deep memory hierarchies
         – Non-uniform interconnect network
         – Accelerators ⇒ today's topic
     Good old days! One proc. / node, one core / proc., uniform network…
     Too complicated… How can we fully exploit the potential?
     [Diagram: system layouts drawn with N, P, C, M, and A blocks]
  10. • Programmers need to learn both hardware and software
      [Figure: Markus Pueschel]
  11. • We need a powerful computer
      • CPU speed cannot be increased anymore
      • Go parallel:
          – Multicomputer
          – Multicore
      • The system's complexity requires programmers to learn both HW and SW
  12. • Introduction to Parallel Computing
      • GPU as Accelerator
  13. (No text on this slide)
  14. • Power is the problem
          – System size is limited by power budget
      • Heterogeneous system is promising
          – CPU + Accelerator (= GPU)
          – CPU and GPU have their own strengths and weaknesses
          – CPU: few cores, high frequency (~GHz)
          – GPU: 1000 cores, low frequency (~MHz)
  15. • Graphics Processing Unit (GPU)
          – Originally developed for quickly generating 2D and 3D graphics, images, and video
          – Highly parallel processor
          – GPU is more power-efficient than CPU [3]
      [Image from nvidia.com]
  16. • CPU and GPU are very different processors
          – CPU: latency-oriented design (= speculative)
          – GPU: throughput-oriented design (= parallel)
  17. • CPU and GPU are very different processors
          – CPU: latency-oriented design (= speculative)
          – GPU: throughput-oriented design (= parallel)
  18. [Timeline figure: the CPU runs task 1, task 2, task 3, and task 4 one after another; the GPU runs task 1 to task 4 side by side along the same time axis]
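The timeline on this slide can be read as a simple cost model. The sketch below is a toy illustration only: the task times and core counts are made-up numbers, and the function names are hypothetical. It assumes the tasks are independent and that each GPU core is slower than the CPU core.

```python
def serial_time(n_tasks, time_per_task):
    """Total time when one core runs the tasks one after another."""
    return n_tasks * time_per_task


def parallel_time(n_tasks, time_per_task, n_cores):
    """Total time when up to n_cores independent tasks run at once."""
    rounds = -(-n_tasks // n_cores)  # ceiling division
    return rounds * time_per_task


N_TASKS = 4
# Assumed numbers: one fast CPU core (1.0 per task) vs four slow GPU cores (2.0 per task).
cpu_total = serial_time(N_TASKS, time_per_task=1.0)
gpu_total = parallel_time(N_TASKS, time_per_task=2.0, n_cores=4)

print(f"CPU: each task takes 1.0, all tasks done after {cpu_total}")
print(f"GPU: each task takes 2.0, all tasks done after {gpu_total}")
```

With these assumed numbers, any single task finishes sooner on the CPU (lower latency), while the whole batch finishes sooner on the GPU (higher throughput).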
  19. • Speculative execution by branch prediction is effective at shortening the execution time, but it makes the hardware complicated

          A = 2;
          B = 3;
          C = A+B;
          D = A*B;
          E = A-B;
          if ( C > 4 ) {
              A = 0;
          }
          B = 0;

      [Diagram labels: E, D, C, ?]
  20. • CPU has a large cache memory and control unit
      • GPUs devote more hardware resources to ALUs
  21. • Many simple cores
          – No speculation features
              • Simplicity to increase the number of cores on a chip
              • Fast context switch due to simplicity of its core design
      [Timeline figure for GPU Core A: comp., memory access, comp., with a context switch to another comp. during the memory access]
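The benefit of a fast context switch can be sketched with a small simulation. This is a toy model under stated assumptions, not a description of any real GPU scheduler: each thread does comp., memory access, comp. as in the figure; the context switch costs nothing; and memory accesses of different threads overlap freely. The function names and the numbers are hypothetical.

```python
import heapq


def time_without_switching(n_threads, comp, mem):
    """The core sits idle during each memory access; threads run back to back."""
    return n_threads * (comp + mem + comp)


def time_with_switching(n_threads, comp, mem):
    """The core switches (at zero cost) to another ready thread during memory accesses."""
    # Heap entries: (time at which the thread can compute, thread id, phase).
    ready = [(0, tid, 0) for tid in range(n_threads)]
    heapq.heapify(ready)
    now = 0
    while ready:
        ready_at, tid, phase = heapq.heappop(ready)
        now = max(now, ready_at) + comp  # wait if nothing is ready, then compute
        if phase == 0:
            # The thread now waits for memory; the core is free for other threads.
            heapq.heappush(ready, (now + mem, tid, 1))
    return now


# Assumed numbers: 4 threads, comp. takes 1 time unit, memory access takes 3.
idle = time_without_switching(4, comp=1, mem=3)
hidden = time_with_switching(4, comp=1, mem=3)
print(f"without switching: {idle}, with switching: {hidden}")
```

In this model the memory latency of one thread is hidden behind the computation of the others, which is why many threads on a simple core can keep it busy.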
  22. • CPU and GPU are very different processors
          – They have their own strengths and weaknesses
              • CPU has a few big cores to shorten the execution time
              • GPU has many simple cores to increase throughput
          – CPU for serial execution and GPU for parallel execution
  23. [1] Levin, E. "Grand challenges to computational science." Communications of the ACM 32(12):1456-1457, December 1989.
      [2] Kaufmann, William J. III, and Larry L. Smarr. Supercomputing and the Transformation.
      [3] Nvidia. "Doing more with less of a scarce resource." http://www.nvidia.com/object/gcr-energy-efficiency.html
