For Dummies
From a Dummy


Ngobrol Ilmiah PPIS #1
16 December 2012
M. Alfian Amrizal
Tohoku University
• Introduction to Parallel Computing
• GPU as an Accelerator




Classical science
  – Nature is studied through observation, theory, and physical experiments
Modern science
  – Adds numerical simulations
[Images: blogs.sundaymercury.net; conserve-energy-future.com; SX-9 (Tohoku University)]
Quantum chemistry, Cosmology, CFD (computational fluid dynamics), Medicine, Material design
[Images: scidacreview.org, physicsworld.com, autoevolution.com, albertkents.com, solid.me.tut.ac.jp]
• Supercomputer
  – The most powerful computers that can be built [2]
  – First general-purpose electronic computer, “ENIAC” (1946): 350 mult/sec
  – Today's supercomputers: > 1,000,000,000 × ENIAC
  – Today's processor speed: only ~ 1,000,000 × ENIAC (?)
  ⇒ “Parallel computing”
[Images: cbc.ca, datacenterknowledge.com, allvoices.com]
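As a rough worked version of the numbers on this slide (taking the slide's own figures at face value, including the one it marks with a question mark):

```latex
\begin{align*}
\text{ENIAC} &\approx 350\ \text{mult/s} \\
\text{supercomputer} &> 10^{9} \times 350\ \text{mult/s} = 3.5 \times 10^{11}\ \text{mult/s} \\
\text{single processor} &\sim 10^{6} \times 350\ \text{mult/s} = 3.5 \times 10^{8}\ \text{mult/s} \\
\frac{\text{supercomputer}}{\text{single processor}} &\gtrsim \frac{10^{9}}{10^{6}} = 10^{3}
\end{align*}
```

The remaining factor of about a thousand cannot come from a faster processor, so it has to come from using many processors at once.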
CPU: The brain of the computer; all data is processed here

Memory: The computer's scratch pad; programs are loaded and run here

GPU: For graphics processing; used as an accelerator in HPC

Storage: Holds data and program files
• The free lunch is over!
  – Heat
  – Power restriction
  – Transistor size
  ⇒ CPUs aren't getting any faster
• Multicomputers
  – Distributed memory parallel computer
• Multicore
  – Shared memory parallel computer (e.g. dual core, quad core, etc.)
  – [Diagram: Core1, Core2]
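A minimal sketch of what "shared memory" means for the programmer, assuming a C compiler with OpenMP support. The function name `sum_array` is hypothetical and not from the slides. Every thread reads the same array directly, so no data has to be sent between cores:

```c
#include <stddef.h>

/* Sum an array on a shared-memory (multicore) machine.
 * All threads read the same `data` array directly; no copies or messages are needed.
 * Compile with OpenMP enabled (for example, -fopenmp with GCC or Clang).
 * Without it the pragma is ignored and the loop simply runs on one core. */
long sum_array(const int *data, size_t n)
{
    long total = 0;
    /* Each thread sums a slice into a private copy of `total`;
     * the reduction clause adds the private copies together at the end. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; ++i) {
        total += data[i];
    }
    return total;
}
```

On a distributed memory multicomputer the same sum would instead need each node to hold its own slice and send its partial result over the network.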
• Trends in HPC system design
     –    More nodes/processors/cores
     –    Deep memory hierarchies
     –    Non-uniform interconnect network
     –    Accelerators ← today's topic

[Figure: system diagrams of growing complexity, drawn from blocks labeled N, P, C, M, and A]
  – Good old days: one proc. / node, one core / proc., uniform network
  – Today: too complicated … How can we fully exploit the potential?
• Programmers need to learn both hardware and software

[Figure: Markus Pueschel]
• We need powerful computers
• CPU speed can no longer be increased
• Go parallel:
  – Multicomputer
  – Multicore
• System complexity requires programmers
  to learn both hardware and software
• Introduction to Parallel Computing
• GPU as an Accelerator
• Power is the problem
  – System size is limited by the power budget
• Heterogeneous systems are promising
  – CPU + accelerator (= GPU)
  – CPUs and GPUs each have their own strengths and
    weaknesses
  – CPU: few cores, high frequency (a few GHz)
  – GPU: ~1000 cores, lower frequency (hundreds of MHz)
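A back-of-the-envelope comparison of the two designs, using assumed round numbers that are not from the slides (a 4-core CPU at 3 GHz and a 1000-core GPU at 0.7 GHz). It treats cores × frequency as a crude proxy for peak throughput and ignores how much work each core does per cycle:

```latex
\begin{align*}
\text{CPU:}\quad & 4 \times 3 \times 10^{9}\ \text{Hz} = 1.2 \times 10^{10}\ \text{core-cycles/s} \\
\text{GPU:}\quad & 1000 \times 0.7 \times 10^{9}\ \text{Hz} = 7 \times 10^{11}\ \text{core-cycles/s} \\
\text{ratio:}\quad & \frac{7 \times 10^{11}}{1.2 \times 10^{10}} \approx 58
\end{align*}
```

The GPU only reaches that advantage when there is enough independent work to keep all of its cores busy; a single serial task still runs faster on the higher-frequency CPU core.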
• Graphics Processing Unit (GPU)
  – Originally developed for quickly generating 2D and
    3D graphics, images, and video
  – Highly parallel processor
  – More power-efficient than a CPU [3]

*Image from nvidia.com
• CPU and GPU are very different
  processors
  – CPU: latency-oriented design (= speculative)
  – GPU: throughput-oriented design (= parallel)
CPU   task 1 task 2 task 3 task 4


          task 1
          task 2
GPU
          task 3
          task 4                    time
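
The timing picture above can be written out as a small worked comparison. The symbols are assumptions for illustration, not from the slides: t_CPU is the time one task takes on a CPU core, t_GPU is the time one task takes on a (slower) GPU core, and the four tasks are independent of each other.

```latex
\begin{align*}
T_{\text{CPU}} &= 4\,t_{\text{CPU}} && \text{(one task after another)} \\
T_{\text{GPU}} &= t_{\text{GPU}} && \text{(all four tasks at once)} \\
T_{\text{GPU}} < T_{\text{CPU}} &\iff t_{\text{GPU}} < 4\,t_{\text{CPU}}
\end{align*}
```

So the GPU can finish first even if each of its cores is slower, as long as it is less than four times slower in this four-task case.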




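• Speculative execution by branch prediction is
  effective at shortening execution time, but
  it makes the hardware complicated

    int A = 2;
    int B = 3;
    int C = A + B;
    int D = A * B;
    int E = A - B;
    if (C > 4)      /* branch: the outcome depends on C */
    {
        A = 0;      /* run speculatively if the branch is predicted taken */
    }
    B = 0;

  Pipeline: E  D  C  ?   (the next instruction to fetch is unknown until C > 4 is resolved)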
• CPUs have large cache memories and
  control units
• GPUs devote more hardware resources
  to ALUs
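• Many simple cores
  – No speculation features
     • Simple cores let more cores fit on a chip
     • Context switches are fast because the core design is simple
  – While one task waits on a memory access, the core switches to
    another task and keeps computing

     GPU Core A    task 1:  comp.  memory access ......  comp.
                   task 2:         comp.  memory access ...
                   task 3:                comp.
                   (context switch at each memory access)      → time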
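A simple model of how many tasks a core needs in order to hide memory latency this way. The symbols are assumptions for illustration, not from the slides: each task computes for t_c cycles and then waits t_m cycles on memory, N tasks share the core, and a context switch is treated as free. While one task waits, the other N − 1 tasks must supply enough computation to fill the gap:

```latex
\begin{align*}
(N - 1)\,t_c &\ge t_m \\
N &\ge 1 + \frac{t_m}{t_c} \\
\text{e.g. } t_c = 10,\ t_m = 400 \text{ cycles} &\implies N \ge 41
\end{align*}
```

The example numbers are hypothetical, but they show why a throughput-oriented core wants many tasks in flight instead of a large cache or speculation hardware.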
• CPU and GPU are very different
  processors
  – Each has its own strengths and weaknesses
    • A CPU has a few big cores to shorten execution
      time
    • A GPU has many simple cores to increase
      throughput
  – CPU for serial execution, GPU for parallel
    execution
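A small sketch of the difference between the two kinds of work, written in plain C so that it runs anywhere. The function names `saxpy_element`, `saxpy`, and `running_sum` are hypothetical and not from the slides, and the loop in `saxpy` only stands in for what a GPU would do with one thread per element:

```c
#include <stddef.h>

/* Data-parallel work: element i depends only on x[i] and y[i].
 * On a GPU, each call could be handled by a different thread. */
static void saxpy_element(size_t i, float a, const float *x, float *y)
{
    y[i] = a * x[i] + y[i];
}

/* Plain C stand-in for launching n independent "threads":
 * the iterations could run in any order, or all at once. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i) {
        saxpy_element(i, a, x, y);
    }
}

/* Serial work: step i needs the result of step i - 1,
 * so written this way it must run one step after another. */
void running_sum(size_t n, float *v)
{
    for (size_t i = 1; i < n; ++i) {
        v[i] += v[i - 1];
    }
}
```

The first function suits many simple cores because no element waits on another; the second, as written, is a chain of dependent steps and gains more from one fast core.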
[1] Levin, E. “Grand challenges to computational
science.” Communications of the ACM
32(12):1456-1457, December 1989.

[2] Kauffmann, William J. III, and Larry L. Smarr.
Supercomputing and the Transformation.

[3] Nvidia. “Doing more with less of a scarce
resource.” http://www.nvidia.com/object/gcr-energy-efficiency.html


Heterogeneous Parallel Computing with GPU: From a Dummy for Dummies
