Heterogeneous Parallel Computing with GPU: From a Dummy for Dummies

  1. For Dummies, from a Dummy
     Ngobrol Ilmiah (Science Chat) PPIS #116, December 2012
     M. Alfian Amrizal, Tohoku University
  2. • Introduction to Parallel Computing
     • GPU as an Accelerator
  3. Classical science: nature, observation, theory, physical experiments
     Modern science: numerical simulations (pictured: SX-9, Tohoku University)
     [Image credits: blogs.sundaymercury.net, conserve-energy-future.com]
  4. Applications: quantum chemistry, cosmology, CFD (computational fluid dynamics), medicine, material design
     [Image credits: autoevolution.com, scidacreview.org, physicsworld.com, albertkents.com, solid.me.tut.ac.jp]
  5. • Supercomputer
       – The most powerful computers that can be built [2]
       – The first general-purpose electronic computer, ENIAC (1946): ~350 multiplications/s
       – Today's supercomputers: > 1,000,000,000 × ENIAC
       – Today's processor speed: only ~1,000,000 × ENIAC (?)
       – The gap is closed by "parallel computing"
     [Image credits: cbc.ca, datacenterknowledge.com, allvoices.com]
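A rough worked version of the slide's arithmetic, taking its order-of-magnitude figures at face value (they are approximate, as the slide's own "(?)" signals):

```latex
\[
\begin{aligned}
\text{ENIAC (1946)} &\approx 350\ \text{mult/s}\\
\text{today's supercomputer} &> 10^{9} \times 350 = 3.5 \times 10^{11}\ \text{mult/s}\\
\text{today's single processor} &\approx 10^{6} \times 350 = 3.5 \times 10^{8}\ \text{mult/s}\\
\text{remaining factor} &\approx \frac{3.5 \times 10^{11}}{3.5 \times 10^{8}} = 10^{3}
\end{aligned}
\]
```

That leftover factor of roughly a thousand cannot come from a faster single processor, so it has to come from running many processors at once.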
  6. • CPU: the brain of the computer; all data is processed here
     • Memory: the computer's scratch pad; programs are loaded and run here
     • GPU: for graphics processing; used as an accelerator in HPC
     • Storage: holds data and program files
  7. • The free lunch is over! CPUs aren't getting any faster (clock frequencies have stalled) because of:
       – Heat
       – Power restrictions
       – Transistor size
  8. • Multicomputers: distributed-memory parallel computers
     • Multicore: shared-memory parallel computers (e.g. dual-core, quad-core)
     [Diagram: Core1, Core2]
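To make the two memory models concrete, here is a small Python sketch that sums a list both ways. It is a hypothetical illustration, not how a real multicomputer is programmed: both versions run as Python threads on one machine, and the "distributed" version only imitates message passing, with a queue standing in for the interconnect (a real multicomputer would use something like MPI across separate nodes). The function names are made up for this example.

```python
"""Toy contrast between shared-memory and distributed-memory parallelism.

Both models are emulated with threads; the distributed version imitates
message passing with a queue and is not real MPI.
"""
import queue
import threading


def shared_memory_sum(data, n_workers=2):
    """Workers write partial sums into one array that all of them can see."""
    partial = [0] * n_workers                     # shared by every thread
    chunk = (len(data) + n_workers - 1) // n_workers

    def work(rank):
        part = data[rank * chunk:(rank + 1) * chunk]
        partial[rank] = sum(part)                 # direct write to shared memory

    threads = [threading.Thread(target=work, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def distributed_memory_sum(data, n_nodes=2):
    """Each 'node' gets a private copy of its slice and sends back a message."""
    chunk = (len(data) + n_nodes - 1) // n_nodes
    mailbox = queue.Queue()                       # stands in for the network

    def node(rank, local_data):
        # local_data is a private copy: no other node can read it directly
        mailbox.put((rank, sum(local_data)))

    threads = [
        threading.Thread(target=node, args=(r, list(data[r * chunk:(r + 1) * chunk])))
        for r in range(n_nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The "root" node gathers exactly one message from each node
    return sum(mailbox.get()[1] for _ in range(n_nodes))


print(shared_memory_sum(list(range(10))), distributed_memory_sum(list(range(10)), n_nodes=3))
```

The point of the contrast: in the shared-memory version every worker writes straight into the same partial array, while in the distributed version the only way a result leaves a node is as an explicit message.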
  9. • Trends in HPC system design
       – More nodes/processors/cores
       – Deep memory hierarchies
       – Non-uniform interconnect networks
       – Accelerators (today's topic)
     [Diagram: "Good old days!": one processor per node, one core per processor, uniform network. Today: many nodes (N), processors (P), cores (C), memories (M), and accelerators (A): "Too complicated..."]
     How can we fully exploit the potential?
 10. • Programmers need to learn both hardware and software
     [Figure: Markus Pueschel]
 11. • We need a powerful computer
     • CPU speed cannot be increased anymore
     • Go parallel:
       – Multicomputer
       – Multicore
     • The system's complexity requires programmers to learn both HW and SW
 12. • Introduction to Parallel Computing
     • GPU as an Accelerator
 14. • Power is the problem
       – System size is limited by the power budget
     • Heterogeneous systems are promising
       – CPU + accelerator (= GPU)
       – CPU and GPU have their own strengths and weaknesses
       – CPU: few cores, high clock frequency (several GHz)
       – GPU: ~1000 cores, lower clock frequency (hundreds of MHz to ~1 GHz)
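One way to see why "many slow cores" can beat "few fast cores" is an idealized peak-rate estimate: cores × clock × operations per cycle. The chip figures below are hypothetical round numbers, not specifications of any real CPU or GPU, and the one-operation-per-cycle default is a simplifying assumption (real cores often do more via SIMD units).

```python
def peak_ops_per_second(cores, clock_hz, ops_per_cycle=1):
    """Idealized peak rate: every core finishes ops_per_cycle operations per cycle."""
    return cores * clock_hz * ops_per_cycle


# Hypothetical chips: 4 cores at 3.0 GHz versus 1000 cores at 0.8 GHz
cpu_parallel = peak_ops_per_second(cores=4, clock_hz=3.0e9)
gpu_parallel = peak_ops_per_second(cores=1000, clock_hz=0.8e9)

# A purely serial task can use only one core on either chip
cpu_serial = peak_ops_per_second(cores=1, clock_hz=3.0e9)
gpu_serial = peak_ops_per_second(cores=1, clock_hz=0.8e9)

print(f"parallel work: GPU/CPU = {gpu_parallel / cpu_parallel:.1f}")
print(f"serial work:   CPU/GPU = {cpu_serial / gpu_serial:.2f}")
```

With these made-up figures the GPU comes out roughly 67× ahead on perfectly parallel work, while a single CPU core is 3.75× faster on serial work, which is the trade-off the slide describes.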
 15. • Graphics Processing Unit (GPU)
       – Originally developed for quickly generating 2D and 3D graphics, images, and video
       – Highly parallel processor
       – More power-efficient than a CPU (more computation per watt) [3]
     [Image from nvidia.com]
 16. • CPU and GPU are very different processors
       – CPU: latency-oriented design (speculative)
       – GPU: throughput-oriented design (parallel)
 18. [Timeline diagram along a time axis]
     CPU: task 1 → task 2 → task 3 → task 4, one after another
     GPU: tasks 1-4 side by side
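The timeline can be turned into numbers with a tiny model. The task times are hypothetical: a fast CPU core is assumed to finish a task in 1 time unit, and a slower GPU core in 2 units. The CPU delivers its first result sooner (lower latency), while the GPU finishes the whole batch sooner (higher throughput).

```python
def sequential_finish_times(n_tasks, task_time):
    """CPU-style: one fast core runs the tasks back to back."""
    return [task_time * (i + 1) for i in range(n_tasks)]


def parallel_finish_times(n_tasks, task_time, n_cores):
    """GPU-style: many slower cores each take a task; extra tasks wait for a second wave."""
    return [task_time * (i // n_cores + 1) for i in range(n_tasks)]


# Hypothetical costs: 1 time unit per task on the CPU, 2 units on a GPU core
cpu = sequential_finish_times(4, task_time=1)
gpu = parallel_finish_times(4, task_time=2, n_cores=4)
print("CPU finish times:", cpu)   # first result early, last result late
print("GPU finish times:", gpu)   # every result arrives together
```

Here the CPU returns task 1 after 1 unit but needs 4 units for all four tasks, while the GPU needs 2 units for any single task yet completes all four in those same 2 units.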
 19. • Speculative execution by branch prediction is effective at shortening execution time, but it makes the hardware complicated. Example:
         A = 2; B = 3;
         C = A + B; D = A * B; E = A - B;
         if (C > 4) { A = 0; }
         B = 0;
     [Diagram: E, D, and C are still being computed when the branch outcome ("?") has to be guessed]
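The branch example above can be replayed as a toy model of speculation. This is a bookkeeping sketch only, under the assumption that "speculating" means running the guessed path on a copy of the registers and committing that copy only if the guess was right. Real processors do this with pipelines and dedicated hardware, which is exactly the complexity the slide mentions. The function name run and its register dictionary are hypothetical.

```python
"""Toy model of branch prediction with speculative execution.

The guessed branch body runs on a copy of the registers before C is known;
the copy is committed only if the guess turns out to be right.
"""


def run(predict_taken):
    regs = {"A": 2, "B": 3}

    # Speculate: execute the branch body before C = A + B has been resolved
    speculative = dict(regs)
    if predict_taken:
        speculative["A"] = 0              # body of: if (C > 4) { A = 0; }

    # Meanwhile, the independent arithmetic completes using the real registers
    regs["C"] = regs["A"] + regs["B"]
    regs["D"] = regs["A"] * regs["B"]
    regs["E"] = regs["A"] - regs["B"]

    taken = regs["C"] > 4                 # branch outcome is finally known
    if taken == predict_taken:
        regs["A"] = speculative["A"]      # correct guess: commit speculative work
        squashed = False
    else:
        squashed = True                   # wrong guess: discard it and redo the branch
        if taken:
            regs["A"] = 0
    regs["B"] = 0
    return regs, squashed


print(run(predict_taken=True))
print(run(predict_taken=False))
```

Both calls end in the same register state; the only difference is whether the speculative work was kept or thrown away, which is why a good predictor saves time but never changes the program's result.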
 20. • CPUs have a large cache memory and control unit
     • GPUs devote more hardware resources to ALUs
 21. • Many simple cores
       – No speculation features
         • Simple cores allow more cores to fit on a chip
         • Simple core design allows fast context switches
     [Timeline diagram: comp. → memory access → comp., versus GPU core A: comp. → memory access → context switch → comp.]
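Why does a fast context switch matter? While one thread waits on memory, the core can run another thread instead of idling. The sketch below simulates one core under idealized, hypothetical assumptions: every thread alternates a fixed number of compute cycles with a fixed memory stall, and switching threads costs zero cycles. The function name and parameters are made up for this illustration.

```python
def core_utilization(n_threads, compute, mem_latency, cycles=10_000):
    """Fraction of cycles a single core spends computing.

    Each hypothetical thread repeats: `compute` busy cycles, then a memory
    access that stalls it for `mem_latency` cycles. The core switches to
    another ready thread only when the current one stalls, and switching is
    assumed to be free (an idealization of a GPU's cheap context switch).
    """
    work_left = [compute] * n_threads     # busy cycles left in each thread's burst
    ready_at = [0] * n_threads            # first cycle at which each thread may run
    current = 0
    busy = 0
    for t in range(cycles):
        # Keep the current thread if it is ready, otherwise pick the next ready one
        for k in range(n_threads):
            cand = (current + k) % n_threads
            if ready_at[cand] <= t:
                current = cand
                break
        else:
            continue                      # every thread is waiting on memory: idle cycle
        busy += 1
        work_left[current] -= 1
        if work_left[current] == 0:       # burst finished: issue a memory access
            work_left[current] = compute
            ready_at[current] = t + 1 + mem_latency
    return busy / cycles


for n in (1, 2, 4):
    print(n, "thread(s):", core_utilization(n, compute=1, mem_latency=3))
```

With one thread the core computes only a quarter of the time (1 busy cycle per 3 stalled ones); with four threads to switch between, every stall is covered by another thread's work and the core stays fully busy.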
 22. • CPU and GPU are very different processors
       – Each has its own strengths and weaknesses
         • CPU has a few big cores to shorten execution time
         • GPU has many simple cores to increase throughput
       – CPU for serial execution, GPU for parallel execution
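The "CPU for serial, GPU for parallel" split can be sketched by emulating how a CUDA kernel assigns work: each thread computes its global index as blockIdx.x * blockDim.x + threadIdx.x and handles one element. The sketch below only reproduces that indexing in plain Python and loops over the threads one by one; on a real GPU they would run in parallel. The helper names (launch, saxpy_kernel, saxpy) are hypothetical, and the SAXPY-style update y = a·x + y is chosen purely as a simple example.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D CUDA-style launch: kernel<<<grid_dim, block_dim>>>(args).

    A real GPU runs the grid_dim * block_dim threads in parallel; this loop
    only shows how each thread gets its indices.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y):
    """One lightweight thread per element: y[i] = a * x[i] + y[i]."""
    i = block_idx * block_dim + thread_idx    # same global-index formula as in CUDA
    if i < n:                                 # guard: the last block may overhang the array
        y[i] = a * x[i] + y[i]


def saxpy(a, x, y, block_dim=4):
    # Serial part ("CPU" side): work out sizes and the launch configuration
    n = len(x)
    grid_dim = (n + block_dim - 1) // block_dim
    out = list(y)
    # Parallel part ("GPU" side): many simple threads, each doing a tiny piece
    launch(saxpy_kernel, grid_dim, block_dim, n, a, x, out)
    return out


print(saxpy(2.0, [1, 2, 3, 4, 5], [10, 10, 10, 10, 10]))
```

With 5 elements and 4 threads per block, two blocks (8 threads) are launched, and the if i < n guard keeps the 3 extra threads from writing past the end of the array.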
 23. [1] Levin, E. “Grand challenges to computational science.” Communications of the ACM 32(12):1456-1457, December 1989.
     [2] Kaufmann, William J. III, and Larry L. Smarr. Supercomputing and the Transformation.
     [3] Nvidia. “Doing more with less of a scarce resource.” http://www.nvidia.com/object/gcr-energy-efficiency.html
