For Dummies
From a Dummy


Ngobrol Ilmiah PPIS #1
16 December 2012
M. Alfian Amrizal
Tohoku University
• Introduction to Parallel Computing
• GPU as an Accelerator




                                       2
Classical science: Nature, Observation, Theory, Physical Experiments
Modern science: Numerical Simulations
[Figure: SX-9 supercomputer (Tohoku University); images from blogs.sundaymercury.net and conserve-energy-future.com]
                                                                    3
Application fields: Quantum chemistry, Cosmology, CFD, Medicine, Material design
[Images from scidacreview.org, physicsworld.com, autoevolution.com, albertkents.com, solid.me.tut.ac.jp]
                                                                                                              4
• Supercomputer
         –      The most powerful computers that can be built [2]
         –      First computer “ENIAC” ⇒ 350 multiplications/sec (1946)
         –      Today's supercomputers: > 1,000,000,000 × ENIAC
         –      Today's processor speed: only ~1,000,000 × ENIAC (?)

                          “Parallel computing”
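
The gap between the two ratios above is the part that a faster processor alone cannot explain. A rough back-of-the-envelope sketch, using only the round figures from the slide (the function and variable names are illustrative, not from the talk):

```python
# Round figures taken from the slide; all names here are illustrative.
ENIAC_MULT_PER_SEC = 350                 # ENIAC, 1946
SUPERCOMPUTER_VS_ENIAC = 1_000_000_000   # today's supercomputer relative to ENIAC
PROCESSOR_VS_ENIAC = 1_000_000           # today's single processor relative to ENIAC


def parallel_factor(system_ratio: int, processor_ratio: int) -> float:
    """Speedup that must come from something other than a faster processor."""
    return system_ratio / processor_ratio


# Multiplications per second implied for a supercomputer by the slide's ratio.
supercomputer_mult_per_sec = ENIAC_MULT_PER_SEC * SUPERCOMPUTER_VS_ENIAC

factor = parallel_factor(SUPERCOMPUTER_VS_ENIAC, PROCESSOR_VS_ENIAC)
print(f"Implied supercomputer rate: {supercomputer_mult_per_sec:,} mult/sec")
print(f"Speedup left for parallelism: about {factor:,.0f}x")
```

With these round numbers, roughly a factor of a thousand has to come from running many processors at once, which is the point of the "Parallel computing" punchline.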




                            cbc.ca
                                                 datacenterknowledge.com
allvoices.com                                                              5
CPU: The brain of the
computer; all data is
processed here

Memory: The computer's
scratch pad; programs
are loaded and run here


GPU: For graphics
processing; used as an
accelerator in HPC


Storage: Holds data
and program files
                          6
• The free lunch is over!!

                               -Heat
                               -Power restriction
                               -Transistor size
                               CPUs aren't getting
                               any faster




                                             7
• Multicomputers: distributed-memory parallel computers
• Multicore: shared-memory parallel computers
  (e.g. dual core, quad core, etc.)
  [Figure: a multicore chip with Core1 and Core2]
                                                         8
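
The two memory models lead to two programming styles. The sketch below sums the same list both ways. It is only an illustration: Python threads stand in for cores and for nodes, a `queue.Queue` stands in for the interconnect, and all names (`shared_memory_sum`, `message_passing_sum`, `chunks`) are made up for this example.

```python
import queue
import threading

DATA = list(range(1, 101))   # the numbers 1..100
N_WORKERS = 4


def chunks(data, n):
    """Split data into at most n contiguous slices of nearly equal size."""
    size = max(1, (len(data) + n - 1) // n)
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_memory_sum(data, n_workers):
    """Multicore style: every worker updates one variable in shared memory."""
    total = 0
    lock = threading.Lock()

    def worker(part):
        nonlocal total
        partial = sum(part)
        with lock:                    # shared data needs synchronization
            total += partial

    threads = [threading.Thread(target=worker, args=(p,))
               for p in chunks(data, n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(data, n_workers):
    """Multicomputer style: workers keep private data and send results as messages."""
    inbox = queue.Queue()             # stands in for the interconnect network

    def worker(part):
        inbox.put(sum(part))          # explicit "send" of the partial result

    parts = chunks(data, n_workers)
    threads = [threading.Thread(target=worker, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(inbox.get() for _ in parts)   # explicit "receive" of each message


print(shared_memory_sum(DATA, N_WORKERS), message_passing_sum(DATA, N_WORKERS))
```

In the shared-memory version the difficulty is protecting the shared variable; in the message-passing version it is deciding what to send and when.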
• Trends in HPC system design
     –    More nodes/processors/cores
     –    Deep memory hierarchies
     –    Non-uniform interconnect network
     –    Accelerators ← today’s topic
  [Figure: system diagrams built from boxes labelled N, P, C, M and A
   (presumably node, processor, core, memory, accelerator)]
          Good old days!         Today:
          One proc. / node       Too complicated …
          One core / proc.       How can we fully exploit the potential?
          Uniform network…                                                                        9
• Programmers need to learn both Hardware and
  Software




                              Figure: Markus Pueschel
                                                    10
• We need powerful computers
• CPU speed can no longer simply be increased
  (heat, power, transistor size)
• Go parallel:
  – Multicomputer
  – Multicore
• System complexity requires programmers
  to learn both HW and SW


                                       11
• Introduction to Parallel Computing
• GPU as an Accelerator




                                       12
13
• Power is the problem
  – System size is limited by the power budget
• Heterogeneous systems are promising
  – CPU + Accelerator (=GPU)
  – CPU and GPU have their own strengths and
    weaknesses
  – CPU: few cores, high frequency (~GHz)
  – GPU: ~1000 cores, lower frequency (hundreds of MHz)

                                               14
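
Why many slow cores can beat a few fast ones follows from a simple product: peak rate = cores × frequency × operations per cycle. A minimal sketch with hypothetical figures (4 cores at 3 GHz, 1000 cores at 700 MHz, one operation per core per cycle); none of these numbers describe a specific chip:

```python
def peak_ops_per_sec(cores: int, freq_hz: float, ops_per_cycle: int = 1) -> float:
    """Idealized peak rate: every core completes ops_per_cycle operations each cycle."""
    return cores * freq_hz * ops_per_cycle


# Hypothetical figures chosen only to match the slide's orders of magnitude.
cpu_peak = peak_ops_per_sec(cores=4, freq_hz=3e9)        # few cores, ~GHz
gpu_peak = peak_ops_per_sec(cores=1000, freq_hz=700e6)   # many cores, lower clock

print(f"CPU peak: {cpu_peak:.2e} ops/s")
print(f"GPU peak: {gpu_peak:.2e} ops/s")
print(f"GPU/CPU ratio: {gpu_peak / cpu_peak:.1f}")
```

This is a peak figure only. It is reached only when the work can actually be spread over all cores, which is why the CPU remains the better choice for serial code.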
• Graphics Processing Unit (GPU)
      – Originally developed for quickly generating 2D and
        3D graphics, images, and video
      – Highly parallel processor
      – GPU is more power-efficient than CPU[3]




*Image from nvidia.com                                       15
• CPU and GPU are very different
  processors
  – CPU: latency-oriented design (=speculative)
  – GPU: throughput-oriented design (=parallel)


  [Figure: CPU vs GPU]



                                             16
CPU   task 1 → task 2 → task 3 → task 4     (one after another)


GPU   task 1
      task 2
      task 3
      task 4                                (all at the same time)
      ------------------------------------→ time



                                           18
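
The timeline above can be put into numbers. In this sketch the per-task times (1.0 for the fast core, 2.0 for each slow core) are hypothetical and chosen only to show the trade-off between the latency of one task and the time to finish all tasks:

```python
def serial_finish_time(n_tasks: int, time_per_task: float) -> float:
    """One fast core runs the tasks one after another."""
    return n_tasks * time_per_task


def parallel_finish_time(n_tasks: int, time_per_task: float, n_cores: int) -> float:
    """Many cores run one task each; extra tasks wait for the next wave."""
    waves = -(-n_tasks // n_cores)   # ceiling division
    return waves * time_per_task


N_TASKS = 4
cpu_time = serial_finish_time(N_TASKS, time_per_task=1.0)                # fast core
gpu_time = parallel_finish_time(N_TASKS, time_per_task=2.0, n_cores=4)   # slow cores

print(f"CPU: each task takes 1.0, all tasks done at t = {cpu_time}")
print(f"GPU: each task takes 2.0, all tasks done at t = {gpu_time}")
```

Each single task finishes later on the slow cores (worse latency), yet the whole batch finishes sooner (better throughput).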
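• Speculative execution by branch prediction is
      effective at shortening execution time, but
      it makes the hardware complicated


                                       A = 2;
                                       B = 3;
                                       C = A+B;
                                       D = A*B;
                                       E = A-B;
                                       if ( C > 4 )   /* outcome unknown until C is computed */
                                       {
                                         A = 0;       /* may be started speculatively */
                                       }
                                       B = 0;

  [Figure: pipeline holding E, D, C, followed by "?" for the instruction after the branch]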
                                                      19
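
The slide does not say how the processor guesses the outcome of a branch such as `if ( C > 4 )`. One common textbook scheme, used here purely as an assumption, is a 2-bit saturating counter per branch. The sketch counts how often such a predictor guesses right on a made-up branch history; the function name and the history are illustrative:

```python
def simulate_two_bit_predictor(outcomes, state=2):
    """Count correct predictions for a sequence of branch outcomes.

    state is a 2-bit saturating counter:
    0 or 1 predicts "not taken", 2 or 3 predicts "taken".
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# A loop-like branch: taken nine times, then not taken once, repeated ten times.
history = ([True] * 9 + [False]) * 10
hits = simulate_two_bit_predictor(history)
print(f"{hits} of {len(history)} predictions correct")
```

Every wrong guess means speculatively started work must be thrown away, and the bookkeeping needed for that is part of the hardware complexity the slide mentions.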
• CPUs have large cache memories and
  control units
• GPUs devote more hardware resources
  to ALUs




                                        20
• Many simple cores
  – No speculation features
     • Simple cores make it possible to put more cores on a chip
     • Fast context switches thanks to the simple core design


  [Figure: timeline for GPU Core A. While one thread waits for a memory access,
   the core context-switches to the computation of another thread.]




                                                                  21
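
How context switching hides memory latency can be checked with a small cycle-by-cycle model. This is a sketch under simplifying assumptions: every thread alternates a fixed compute burst with a fixed memory wait, a context switch costs zero cycles, and the cycle counts (2 compute, 6 memory) are hypothetical.

```python
def core_utilization(n_threads, compute_cycles, memory_cycles, total_cycles=10_000):
    """Fraction of cycles in which one core does useful computation.

    Each thread repeats: compute_cycles of work, then memory_cycles of waiting.
    Context switches are assumed to be free.
    """
    ready_at = [0] * n_threads                 # first cycle each thread may run again
    remaining = [compute_cycles] * n_threads   # work left in each thread's current burst
    busy = 0
    for cycle in range(total_cycles):
        runnable = [t for t in range(n_threads) if ready_at[t] <= cycle]
        if not runnable:
            continue                           # every thread waits on memory: core idles
        t = runnable[0]                        # switch to any thread that can run
        busy += 1
        remaining[t] -= 1
        if remaining[t] == 0:                  # burst done: start the memory access
            ready_at[t] = cycle + 1 + memory_cycles
            remaining[t] = compute_cycles
    return busy / total_cycles


for n in (1, 2, 4):
    print(n, "thread(s):", core_utilization(n, compute_cycles=2, memory_cycles=6))
```

With one thread the core sits idle during every memory access. With enough threads to cover the wait (four in this setting) there is always some computation ready, so the core stays busy.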
• CPU and GPU are very different
  processors
  – They have their own strengths and weaknesses
    • CPU has few big cores to shorten the execution
      time
    • GPU has many simple cores to increase
      throughput
  – CPU for serial execution and GPU for parallel
    execution

                         22
[1] Levin, E. “Grand challenges to computational
science.” Communications of the ACM
32(12):1456-1457, December 1989.

[2] Kauffmann, William J. III, and Larry L. Smarr.
Supercomputing and the Transformation.

[3] Nvidia. “Doing more with less of a scarce
resource.” http://www.nvidia.com/object/gcr-energy-efficiency.html

                         23

Heterogeneous Parallel Computing with GPU: From a Dummy for Dummies
