THE GPGPU CONTINUUM

Ofer Rosenberg

The GPU continuum workshop, April 25 2013
This is a presentation I gave at our most recent GPGPU workshop, in April 2013.
GPGPU usage is expanding and now forms a continuum from mobile to HPC. At the same time, the question is whether the current GPGPU languages are the right ones (well, no), and whether we are wasting resources re-developing the same SW stack instead of converging.

Transcript of "The GPGPU Continuum"

  1. THE GPGPU CONTINUUM
     Ofer Rosenberg
     The GPU continuum workshop, April 25 2013
  2. CONTENT
     • Intel’s Compute Continuum
     • GPGPU Evolution
     • The GPGPU Continuum
     • Mobile GPGPU challenges
     • GPGPU Continuum challenges
     • Towards the Continuum
  3. INTEL’S “COMPUTE CONTINUUM” FROM IDC 2010
  4. INTEL’S “COMPUTE CONTINUUM” FROM IDC 2010
  5. GPGPU EVOLUTION
     • 2004 – Stanford University: Brook for GPUs
     • 2006 – AMD releases CTM; NVIDIA releases CUDA
     • 2008 – OpenCL 1.0 released
     • GPU peak figures shown: G80 – 346 GFLOPS, R580 – 375 GFLOPS
  6. GPGPU EVOLUTION
     • Nov 2009 – First Hybrid SC in the Top10: Chinese Tianhe-1 (563 TFLOPS)
       • 1,024 Intel Xeon E5450 CPUs
       • 5,120 Radeon 4870 X2 GPUs
     • Nov 2010 – First Hybrid SC reaches #1 on Top500 list: Tianhe-1A (2577 TFLOPS)
       • 14,336 Xeon X5670 CPUs
       • 7,168 Nvidia Tesla M2050 GPUs
     Source: http://www.top500.org/lists/
  7. GPGPU EVOLUTION
     • 2013 – OpenCL on: Nexus 4 (Qualcomm Adreno 320), Nexus 10 (ARM Mali T604)
     • Android 4.2 adds GPU support for Renderscript
     • 2014 – NVIDIA Tegra 5 will support CUDA
     • 2013 – GPGPU Continuum becomes a reality
  8. THE GPGPU CONTINUUM
     Device             Peak performance   Power
     Apple A6 GPU       25 GFLOPS          < 2 W
     AMD G-T16R         46 GFLOPS*         4.5 W
     Intel i7-3770      511 GFLOPS*        77 W
     NVIDIA GTX Titan   4500 GFLOPS        250 W
     ORNL TITAN SC      27 PFLOPS          8200 kW
     * GFLOPS of CPU+GPU
     Take Intel’s vision of the Compute Continuum and aspire to the same for the GPGPU continuum: a common ecosystem built on a common (SW) architecture
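
The figures on slide 8 can be turned into a rough performance-per-watt comparison. The sketch below is not part of the original deck: the function names are made up for illustration, the "< 2W" entry is treated as exactly 2 W (so its result is a lower bound), and the numbers are peak figures from different sources, so the ratios are only indicative.

```python
# Figures copied from slide 8; the derived GFLOPS/W values are illustrative only.
DEVICES = [
    # (name, peak GFLOPS, power in watts)
    ("Apple A6 GPU", 25.0, 2.0),           # slide says "< 2W"; 2 W assumed as an upper bound
    ("AMD G-T16R", 46.0, 4.5),             # CPU+GPU figure
    ("Intel i7-3770", 511.0, 77.0),        # CPU+GPU figure
    ("NVIDIA GTX Titan", 4500.0, 250.0),
    ("ORNL TITAN SC", 27.0e6, 8200.0e3),   # 27 PFLOPS and 8200 kW, converted to GFLOPS and W
]


def gflops_per_watt(gflops, watts):
    """Peak performance divided by power draw."""
    return gflops / watts


def efficiency_table(devices=DEVICES):
    """Map each device name to its GFLOPS/W, rounded to two decimals."""
    return {name: round(gflops_per_watt(g, w), 2) for name, g, w in devices}


if __name__ == "__main__":
    for name, eff in efficiency_table().items():
        print(f"{name:<18} {eff:>6.2f} GFLOPS/W")
```

On these numbers the continuum spans roughly six orders of magnitude in peak performance (25 GFLOPS to 27 PFLOPS), while performance per watt stays within about one order of magnitude.
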
  9. INTRO TO LEADING MOBILE GPU VENDORS
     Imagination PowerVR 543
       • Apple, Samsung, Motorola, Intel
       • Unified Shaders
       • Supports OpenCL 1.1 (E)
       • 38 GFLOPS (Apple’s MP4 ver)
     Vivante GC4000
       • Unified Shaders
       • 4 Cores, SIMD4 each
       • Supports OpenCL 1.2
       • 48 GFLOPS
     Qualcomm Adreno 320
       • Part of Snapdragon S4
       • Unified Shader
       • SIMD4 ?
       • Supports OpenCL 1.1 (E)
       • 50 GFLOPS
     ARM Mali T604
       • 4 Cores
       • Multiple “pipes” per core
       • Supports OpenCL 1.1
       • 68 GFLOPS
     NVIDIA Tegra 4
       • 6 x 4-wide Vertex Shaders
       • 4 x 4-wide Pixel Shaders
       • No GPGPU support
       • 74 GFLOPS
     Source: http://kyokojap.myweb.hinet.net/gpu_gflops/
  10. MOBILE GPGPU CHALLENGES
      • Many Different GPU Architectures
        • Optimizing for each sets high bar on development costs
      • Development Tools
        • Immature (stability, performance)
        • No common SDK / Debugger / Profiler (different per vendor)
      • Ecosystem
        • Lack of libraries, wizards, middleware → Slow & expensive development
      • Distribution Model
        • Driver updates are part of OS distribution (no more per-month updates…)
        • End users are less likely to update version → higher standards on stability & performance of driver release
      • Security – the unspoken issue (hole) …
  11. GPGPU CONTINUUM CHALLENGES
      • Many Different GPU Architectures
        • Optimizing for each sets high bar on development costs
      • Development Tools
        • Immature (stability, performance)
        • No common SDK / Debugger / Profiler (different per vendor)
      • Ecosystem
        • Lack of libraries, wizards, middleware → Slow & expensive development
      • Distribution Model
        • End users are less likely to update version → higher standards on stability & performance of driver release
      • Security – the unspoken issue (hole) …
      These challenges are a barrier to GPGPU adoption across the continuum
  12. TOWARDS THE CONTINUUM (1) - LANGUAGES
      • Welcome to the GPGPU (SW) jungle …
      Diagram: GPU
  13. TOWARDS THE CONTINUUM (1) - LANGUAGES
      • Welcome to the GPGPU (SW) jungle …
      Diagram: GPU surrounded by OpenCL, Renderscript, DirectCompute, CUDA
  14. TOWARDS THE CONTINUUM (1) - LANGUAGES
      • Welcome to the GPGPU (SW) jungle …
      Diagram: GPU surrounded by PyOpenCL, WebCL, Aparapi (Java), OpenCL, OpenACC, Renderscript, DirectCompute, C++ AMP, CUDA, Fortran, NumbaPro (Python)
  15. TOWARDS THE CONTINUUM (1) - LANGUAGES
      • Welcome to the GPGPU (SW) jungle …
      Diagram: GPU surrounded by PyOpenCL, WebCL, Aparapi (Java), OpenCL, OpenACC, Renderscript, DirectCompute, C++ AMP, CUDA, Fortran, NumbaPro (Python)
      A jungle of languages… but are these the right ones?
  16. TOWARDS THE CONTINUUM (1) - LANGUAGES
      • Current GPGPU languages are C/C++ based
        • There are “bindings” to Python, Java, JavaScript – but kernels are still C/C++
      • Current developer trends:
        • Managed languages (Java, C#)
        • Scripting languages (Python, PHP)
      • Higher abstraction & manageability:
        • More room for tools to excel on optimization
        • Mitigate difference between GPU architectures
      GPGPU languages need to evolve
      Chart sources: https://sites.google.com/site/pydatalog/pypl/PyPL-PopularitY-ofProgramming-Language ; data from CodeEval.com, based on 100K+ code samples
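
Slide 16's point that bindings exist "but kernels are still C/C++" can be illustrated with a small sketch. It is not from the deck: the names `VECTOR_ADD_KERNEL` and `vector_add_reference` are hypothetical, and the kernel is not compiled or launched here, so the example runs without a GPU or driver. A plain Python loop stands in for what the kernel computes.

```python
# OpenCL C kernel source, held as a plain string by the Python host program.
# With a binding such as PyOpenCL, text like this is what gets handed to the
# vendor's OpenCL compiler at run time; the Python side only orchestrates.
VECTOR_ADD_KERNEL = """
__kernel void vector_add(__global const float *a,
                         __global const float *b,
                         __global float *out)
{
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
"""


def vector_add_reference(a, b):
    """Pure-Python stand-in for the kernel: one loop iteration per work-item."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = [0.0] * len(a)
    for gid in range(len(a)):  # gid plays the role of get_global_id(0)
        out[gid] = a[gid] + b[gid]
    return out


if __name__ == "__main__":
    print(vector_add_reference([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Even in this tiny case the developer works in two languages at once: the host logic is Python, but the part that would run on the GPU is a C string that Python tooling cannot check or optimize.
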
  17. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
      Diagram: CUDA → LLVM IR → Vendor X IL → Vendor X GPU
  18. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
      Diagram: OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU
  19. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
      • Most GPGPU languages already use the LLVM compilation framework
        • Slight “flavors” of LLVM IR
      • Most languages also possess a similar “API capabilities” set
      Diagram: OpenACC, Renderscript, OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU
  20. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
      • Most GPGPU languages already use the LLVM compilation framework
        • Slight “flavors” of LLVM IR
      • Most languages also possess a similar “API capabilities” set
      • Defining a common stack based on LLVM & common API will:
        • Improve the compiler
        • Increase driver quality & stability
        • Enable unified debugger / profiler
        • …
      Define GPGPU Virtual Machine based on LLVM
      Diagram: OpenACC, Renderscript, OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU
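
One way to see why a common stack saves effort is to count the pieces that have to be built. The sketch below is a simplification that is not in the deck: "Vendor Y" and "Vendor Z" are hypothetical (the slides name only "Vendor X"), and the count treats the LLVM IR as fully shared, whereas the slide itself notes slight "flavors" of the IR.

```python
FRONT_ENDS = ["CUDA", "OpenCL", "OpenACC", "Renderscript"]  # languages shown on the slide
VENDORS = ["Vendor X", "Vendor Y", "Vendor Z"]              # hypothetical; the slide names only "Vendor X"


def separate_stacks(languages, vendors):
    """Every vendor builds its own complete stack for every language."""
    return len(languages) * len(vendors)


def common_stack(languages, vendors):
    """One front-end per language lowers to a shared IR; one back-end per vendor consumes it."""
    return len(languages) + len(vendors)


def lowering_path(language, vendor):
    """Stages a kernel passes through, following the slide's diagram."""
    return [language, "LLVM IR", f"{vendor} IL", f"{vendor} GPU"]


if __name__ == "__main__":
    print("separate stacks:", separate_stacks(FRONT_ENDS, VENDORS))
    print("common stack:   ", common_stack(FRONT_ENDS, VENDORS))
    print(" -> ".join(lowering_path("OpenCL", "Vendor X")))
```

With four languages and three vendors, the separate approach needs 12 language-to-GPU stacks while the shared-IR approach needs 7 components, and the gap widens as either list grows.
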
  21. TAKEAWAYS
      • GPGPU Continuum is here - from Mobile devices to HPC
      • Vision: A common ecosystem built on a common (SW) architecture
      • Challenges: many architectures, immature tools, ecosystem
  22. QUESTIONS
      • Q: What about “Heterogeneous Computing” ?
      • A: Go back, replace each “GPGPU” with “Heterogeneous Computing” – and it all fits…
      • More ?
  23. SOME SOURCES:
      • http://www.nordichardware.com/CPU-Chipset/intel-core-i7-3770k-ivy-bridge-and-the-3d-transistor-is-here/Newgraphics-the-biggest-news-in-Ivy-Bridge.html
      • http://elrond.informatik.tu-freiberg.de/papers/WorldComp2012/PDP2833.pdf
      • http://www.anandtech.com/show/6787/nvidia-tegra-4-architecture-deep-dive-plus-tegra-4i-phoenix-hands-on/5
      • http://www.anandtech.com/show/5077/arms-malit658-gpu-in-2013-up-to-10x-faster-than-mali400
      • http://www.chipdesignmag.com/pallab/2011/06/30/arm-mali-gpu-unifying-graphics-across-platforms/
      • http://en.wikipedia.org/wiki/Adreno#Renaming_to_Adreno
      • http://en.wikipedia.org/wiki/PowerVR#Series_5_.28SGX.29
      • http://en.wikipedia.org/wiki/Mali_(GPU)
      • http://johndayautomotivelectronics.com/?p=12412
      • http://www.cnx-software.com/2013/01/19/gpus-comparison-arm-mali-vs-vivante-gcxxx-vs-powervr-sgx-vs-nvidiageforce-ulp/
      • http://www.brightsideofnews.com/print/2013/1/30/rise-of-vivante-fastest-tablet-gpu-on-the-market.aspx
      • https://www.uplinq.com/2012/schedule/accelerating-your-android-application-renderscript-and-llvm-0
      • http://www.androidauthority.com/adreno-320-features-performance-benchmarks-103269/