Heterogeneous Parallel Computing with GPU


Presentation by Alfian Amrizal at Ngobrol Ilmiah #1, PPI Sendai, 16 December 2012



  1. For Dummies, From a Dummy. Ngobrol Ilmiah PPIS #1, 16 December 2012. M. Alfian Amrizal, Tohoku University
  2. • Introduction to Parallel Computing
     • GPU as an Accelerator
  3. • Classical science: Nature, Observation, Theory, Physical Experiments
     • Modern science: adds Numerical Simulations (e.g., on the SX-9 supercomputer at Tohoku University)
     (Image sources: blogs.sundaymercury.net, conserve-energy-future.com)
  4. Application areas: quantum chemistry, cosmology, CFD (computational fluid dynamics), medicine, material design
     (Image sources: autoevolution.com, scidacreview.org, physicsworld.com, albertkents.com, solid.me.tut.ac.jp)
  5. • Supercomputer
       – The most powerful computers that can be built [2]
       – The first general-purpose electronic computer, ENIAC (1946): ~350 multiplications/sec
       – Today's supercomputers: > 1,000,000,000 × ENIAC
       – Today's processor speed: only ~1,000,000 × ENIAC (?)
       ⇒ the remaining factor of ~1,000 comes from “parallel computing”
     (Image sources: cbc.ca, datacenterknowledge.com, allvoices.com)
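The slide's rough numbers can be turned into a back-of-the-envelope estimate: if a single processor is only ~10^6 times faster than ENIAC while the whole machine is more than 10^9 times faster, then roughly 10^9 / 10^6 = 1,000 processors must be working at once. A minimal Python sketch of that arithmetic, using only the slide's rounded (and, per its "(?)", uncertain) figures; the constant and function names are illustrative:

```python
# Order-of-magnitude figures quoted on slide 5 (not measurements).
ENIAC_MULT_PER_SEC = 350                # ENIAC, 1946
SUPERCOMPUTER_SPEEDUP = 1_000_000_000   # today's supercomputer vs. ENIAC (lower bound)
PROCESSOR_SPEEDUP = 1_000_000           # one of today's processors vs. ENIAC (rough)


def implied_parallelism(machine_speedup: int, processor_speedup: int) -> int:
    """How many processors must work simultaneously to explain the machine's speedup."""
    return machine_speedup // processor_speedup


supercomputer_rate = ENIAC_MULT_PER_SEC * SUPERCOMPUTER_SPEEDUP
processor_rate = ENIAC_MULT_PER_SEC * PROCESSOR_SPEEDUP
parallelism = implied_parallelism(SUPERCOMPUTER_SPEEDUP, PROCESSOR_SPEEDUP)

print(f"supercomputer: > {supercomputer_rate:.1e} multiplications/s")
print(f"one processor: ~ {processor_rate:.1e} multiplications/s")
print(f"implied parallelism: ~{parallelism} processors")
```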
  6. • CPU: the brain of the computer; all data is processed here
     • Memory: the computer's scratch pad; programs are loaded and run here
     • GPU: for graphics processing; used as an accelerator in HPC
     • Storage: holds data and program files
  7. • The free lunch is over! CPU clock speeds aren't getting any faster, because of:
       – Heat
       – Power restrictions
       – Transistor size
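A common first-order model (general background, not taken from the slide) explains why heat and power block faster clocks: the dynamic power of CMOS logic grows linearly with the clock frequency and with the square of the supply voltage, and higher frequencies usually need a higher voltage. With activity factor α, switched capacitance C, supply voltage V and clock frequency f:

```latex
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f
\qquad\Longrightarrow\qquad
\text{if } V \propto f:\quad
P_{\text{dyn}}(2f) \approx \alpha\, C\, (2V)^{2}\,(2f) = 8\,\alpha\, C\, V^{2} f = 8\,P_{\text{dyn}}(f)
```

Under this simplified assumption, doubling the clock costs about eight times the power (and heat), whereas doubling the number of cores at the same clock costs roughly twice the power, which is the argument for going parallel.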
  8. • Multicomputers: distributed-memory parallel computers
     • Multicore: shared-memory parallel computers, where cores (Core1, Core2, …) share one memory (e.g., dual core, quad core, etc.)
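The difference between the two models shows up in how partial results are combined. The Python sketch below is a toy, not real HPC code: the multicore (shared-memory) version lets threads add into one shared variable guarded by a lock, while the multicomputer (distributed-memory) version is only simulated with threads. Each hypothetical "node" works on its own private copy of the data and can share its result only by sending a message through a queue that stands in for the network (real clusters typically use a library such as MPI for this). All function names are illustrative.

```python
import queue
import threading


def split(data, n_parts):
    """Cut data into n_parts contiguous slices (each slice is a copy)."""
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n_parts)]


def shared_memory_sum(data, n_workers=2):
    """Multicore style: all workers see the same memory and update one total."""
    total = [0]                 # shared variable, visible to every thread
    lock = threading.Lock()     # serializes updates to the shared total

    def worker(part):
        partial = sum(part)
        with lock:
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(p,)) for p in split(data, n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def distributed_memory_sum(data, n_nodes=2):
    """Multicomputer style (simulated): each node owns a private slice and can only
    communicate by sending a message to the root over the 'network' (a queue)."""
    network = queue.Queue()

    def node(rank, local_data):
        network.put((rank, sum(local_data)))  # "send" the partial result

    nodes = [threading.Thread(target=node, args=(r, p))
             for r, p in enumerate(split(data, n_nodes))]
    for t in nodes:
        t.start()
    # The root "receives" exactly one message per node.
    total = sum(network.get()[1] for _ in range(n_nodes))
    for t in nodes:
        t.join()
    return total


data = list(range(1, 101))
print(shared_memory_sum(data, n_workers=4))     # 5050
print(distributed_memory_sum(data, n_nodes=4))  # 5050
```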
  9. • Trends in HPC system design
       – More nodes/processors/cores
       – Deep memory hierarchies
       – Non-uniform interconnect networks
       – Accelerators (today's topic)
     [Figure: "Good old days!" (one processor per node, one core per processor, uniform network) vs. today's systems built from nodes (N), processors (P), cores (C), memories (M) and accelerators (A): "Too complicated…"]
     • How can we fully exploit the potential?
  10. • Programmers need to learn both hardware and software
      (Figure: Markus Pueschel)
  11. • We need a powerful computer
      • CPU speed cannot be increased anymore
      • Go parallel:
        – Multicomputer
        – Multicore
      • The system's complexity requires programmers to learn both HW and SW
  12. • Introduction to Parallel Computing
      • GPU as an Accelerator
  13. (Image-only slide; no text.)
  14. • Power is the problem
        – System size is limited by the power budget
      • Heterogeneous systems are promising
        – CPU + accelerator (= GPU)
        – CPU and GPU have their own strengths and weaknesses
        – CPU: few cores, high clock frequency (~GHz)
        – GPU: ~1000 cores, lower clock frequency (hundreds of MHz)
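A rough way to compare the two kinds of processor is idealized peak throughput: cores × clock frequency × operations per cycle. The core counts and clock rates below are hypothetical round numbers that only mirror the slide's contrast (a few fast cores vs. about 1000 slower ones); they are not the specifications of any particular chip.

```python
def peak_ops_per_sec(cores: int, clock_hz: float, ops_per_cycle: int = 1) -> float:
    """Idealized peak: every core completes ops_per_cycle operations on every cycle.
    Ignores memory bandwidth, stalls, and any serial part of the program."""
    return cores * clock_hz * ops_per_cycle


# Hypothetical round numbers, chosen only to illustrate the slide's contrast.
cpu_peak = peak_ops_per_sec(cores=4, clock_hz=3.0e9)     # few cores, GHz-class clock
gpu_peak = peak_ops_per_sec(cores=1000, clock_hz=0.7e9)  # ~1000 cores, slower clock

print(f"CPU peak: {cpu_peak:.1e} ops/s")
print(f"GPU peak: {gpu_peak:.1e} ops/s ({gpu_peak / cpu_peak:.0f}x the CPU)")
```

Peak figures like these are reached only when every core has independent work; a single serial task still finishes sooner on one high-clock CPU core, which is why the two are paired.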
  15. • Graphics Processing Unit (GPU)
        – Originally developed for quickly generating 2D and 3D graphics, images, and video
        – Highly parallel processor
        – More power-efficient than a CPU for highly parallel workloads [3]
      (Image from nvidia.com)
  16. • CPU and GPU are very different processors
        – CPU: latency-oriented design (= speculative)
        – GPU: throughput-oriented design (= parallel)
  17. • CPU and GPU are very different processors: latency-oriented (CPU) vs. throughput-oriented (GPU) design
  18. [Timeline figure: the CPU runs tasks 1–4 one after another, while the GPU runs tasks 1–4 at the same time. Horizontal axis: time.]
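The timeline can be put in numbers with a toy model. The per-task times and the core count below are hypothetical: a CPU core finishes one task in 1 time unit but runs the tasks one after another, while the GPU needs 3 time units per task but can run up to 1000 tasks at once. The model ignores data transfer and scheduling overheads.

```python
CPU_TASK_TIME = 1.0   # hypothetical: fast, latency-optimized core
GPU_TASK_TIME = 3.0   # hypothetical: each task is slower on a simple GPU core
GPU_CORES = 1000      # hypothetical core count


def cpu_makespan(n_tasks: int) -> float:
    """Latency-oriented: tasks run one after another, each as fast as possible."""
    return n_tasks * CPU_TASK_TIME


def gpu_makespan(n_tasks: int) -> float:
    """Throughput-oriented: tasks run in 'waves' of up to GPU_CORES at a time."""
    waves = -(-n_tasks // GPU_CORES)  # ceiling division
    return waves * GPU_TASK_TIME


for n in (1, 4, 1000):
    print(f"{n:5d} tasks: CPU {cpu_makespan(n):7.1f}   GPU {gpu_makespan(n):4.1f}")
```

With one task the CPU wins (lower latency); with many independent tasks the GPU wins (higher throughput).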
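  19. • Speculative execution by branch prediction is effective at shortening execution time, but it makes the hardware complicated. Example:

          A = 2;
          B = 3;
          C = A + B;
          D = A * B;
          E = A - B;
          if (C > 4) {
              A = 0;
          }
          B = 0;

      [Figure: C, D and E are still being computed when the branch is reached, so its outcome ("?") must be predicted.]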
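To keep the pipeline busy, the CPU guesses the outcome of a branch such as `if (C > 4)` before C is known, executes the guessed path, and throws that work away if the guess was wrong; the prediction logic and the rollback are what make the hardware complicated. The slide does not say which predictor is used. As one classic textbook scheme (not necessarily what any given CPU implements), here is a Python sketch of a 2-bit saturating-counter predictor on a loop-like branch pattern; the function name is hypothetical.

```python
def two_bit_predictor_hits(outcomes):
    """Count correct guesses of a 2-bit saturating-counter branch predictor.

    Counter states 0-1 predict 'not taken', 2-3 predict 'taken'. After each branch
    the counter moves one step toward the real outcome, so a single surprise
    does not immediately flip a strongly held prediction.
    """
    counter = 2  # start in 'weakly taken' (an arbitrary choice)
    hits = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            hits += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return hits


# A typical loop branch: taken 9 times, then not taken once; repeated 10 times.
pattern = ([True] * 9 + [False]) * 10
hits = two_bit_predictor_hits(pattern)
print(f"{hits}/{len(pattern)} branches predicted correctly")  # 90/100
```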
  20. • CPU has a large cache memory and control unit
      • GPUs devote more hardware resources to ALUs
  21. • Many simple cores
        – No speculation features
          • Simplicity allows more cores on a chip
          • Fast context switching, thanks to the simple core design
      [Figure: timeline of GPU Core A alternating computation and memory access; during a memory access the core switches context to another computation instead of waiting.]
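The benefit of cheap context switches can be estimated with a simple utilization model. Suppose each thread alternates `compute` cycles of work with `memory` cycles of waiting, and the core can switch between ready threads for free (an idealization: real switches are cheap but not free, and memory bandwidth is finite). With one thread the core idles during every memory access; with enough threads, some thread is always ready to compute. The cycle counts below are hypothetical.

```python
def core_utilization(n_threads: int, compute: int, memory: int) -> float:
    """Fraction of cycles the core spends computing.

    Each thread repeatedly computes for `compute` cycles, then waits `memory`
    cycles for data. With free context switches the core fills one thread's
    wait with other threads' work, until it is busy on every cycle.
    """
    demand = n_threads * compute / (compute + memory)
    return min(1.0, demand)


COMPUTE, MEMORY = 4, 12  # hypothetical cycle counts
for n in (1, 2, 4, 8):
    print(f"{n} thread(s): core busy {core_utilization(n, COMPUTE, MEMORY):.0%} of the time")
```

In this model, (compute + memory) / compute = 4 threads are enough to keep the core fully busy; beyond that, extra threads add nothing.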
  22. • CPU and GPU are very different processors
        – Each has its own strengths and weaknesses
          • CPU has a few big cores to shorten execution time
          • GPU has many simple cores to increase throughput
        – Use the CPU for serial execution and the GPU for parallel execution
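This division of labor can be sketched in Python, with a thread pool standing in for the GPU's many cores. The serial, control-heavy steps (checking inputs, planning the work split, gathering results) stay on the "CPU" side; the data-parallel part, here the element-wise operation a·x + y (the "SAXPY" pattern often used as a first GPU example), is cut into independent chunks and handed to the workers. This only illustrates the structure: Python threads will not actually speed up this arithmetic, and on real hardware the parallel part would be a GPU kernel (for example, written in CUDA). The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_chunk(a, x, y, start, stop):
    """'Kernel': every element is independent, so chunks can run in parallel."""
    return [a * x[i] + y[i] for i in range(start, stop)]


def heterogeneous_saxpy(a, x, y, n_workers=8):
    # Serial part (CPU): validate inputs and plan how to split the work.
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    chunk = max(1, -(-n // n_workers))  # ceiling division
    ranges = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]

    # Parallel part ("GPU"): many workers process independent chunks.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda r: saxpy_chunk(a, x, y, *r), ranges)
        # Serial part again (CPU): gather the chunks back in order.
        return [value for part in parts for value in part]


x = list(range(10))
y = [1] * 10
print(heterogeneous_saxpy(2, x, y))  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```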
  23. [1] Levin, E. “Grand challenges to computational science.” Communications of the ACM 32(12):1456–1457, December 1989.
      [2] Kaufmann, William J. III, and Larry L. Smarr. Supercomputing and the Transformation.
      [3] Nvidia. “Doing more with less of a scarce resource.” http://www.nvidia.com/object/gcr-energy-efficiency.html
