The GPGPU Continuum

This is a presentation I gave at the last GPGPU workshop we held, in April 2013.
The usage of GPGPU is expanding and now forms a continuum from mobile devices to HPC. At the same time, the question is whether the GPGPU languages are the right ones (well, no), and whether we are wasting resources re-developing the same SW stack instead of converging on a common one.


Transcript

  • 1. THE GPGPU CONTINUUM
      Ofer Rosenberg
      The GPU continuum workshop, April 25, 2013
  • 2. CONTENT
      • Intel’s Compute Continuum
      • GPGPU Evolution
      • The GPGPU Continuum
      • Mobile GPGPU challenges
      • GPGPU Continuum challenges
      • Towards the Continuum
  • 3. INTEL’S “COMPUTE CONTINUUM” FROM IDC 2010
  • 5. GPGPU EVOLUTION
      • 2004 – Stanford University: Brook for GPUs
      • 2006 – AMD releases CTM; NVIDIA releases CUDA
      • 2008 – OpenCL 1.0 released
      [Images: R580 – 375 GFLOPS; G80 – 346 GFLOPS]
  • 6. GPGPU EVOLUTION
      • Nov 2009 – First hybrid supercomputer (SC) in the Top10: Chinese Tianhe-1 (563 TFLOPS)
          • 1,024 Intel Xeon E5450 CPUs
          • 5,120 Radeon 4870 X2 GPUs
      • Nov 2010 – First hybrid SC reaches #1 on the Top500 list: Tianhe-1A (2577 TFLOPS)
          • 14,336 Xeon X5670 CPUs
          • 7,168 Nvidia Tesla M2050 GPUs
      Source: http://www.top500.org/lists/
  • 7. GPGPU EVOLUTION
      • 2013 – OpenCL on:
          • Nexus 4 (Qualcomm Adreno 320)
          • Nexus 10 (ARM Mali T604)
      • Android 4.2 adds GPU support for RenderScript
      • 2014 – NVIDIA Tegra 5 will support CUDA
      • 2013 – GPGPU Continuum becomes a reality
  • 8. THE GPGPU CONTINUUM
      • Apple A6 GPU: 25 GFLOPS, < 2 W
      • AMD G-T16R: 46 GFLOPS*, 4.5 W
      • Intel i7-3770: 511 GFLOPS*, 77 W
      • NVIDIA GTX Titan: 4500 GFLOPS, 250 W
      • ORNL TITAN SC: 27 PFLOPS, 8200 kW
      * GFLOPS of CPU+GPU
      Take Intel’s vision of the Compute Continuum and aspire to the same for the GPGPU continuum: a common ecosystem built on a common (SW) architecture.
  • 9. INTRO TO LEADING MOBILE GPU VENDORS
      • Imagination PowerVR SGX543
          • Used by Apple, Samsung, Motorola, Intel
          • Unified shaders
          • Supports OpenCL 1.1 (E)
          • 38 GFLOPS (Apple’s MP4 version)
      • Vivante GC4000
          • Unified shaders
          • 4 cores, SIMD4 each
          • Supports OpenCL 1.2
          • 48 GFLOPS
      • Qualcomm Adreno 320
          • Part of Snapdragon S4
          • Unified shaders
          • SIMD4 (?)
          • Supports OpenCL 1.1 (E)
          • 50 GFLOPS
      • ARM Mali T604
          • 4 cores
          • Multiple “pipes” per core
          • Supports OpenCL 1.1
          • 68 GFLOPS
      • NVIDIA Tegra 4
          • 6 × 4-wide vertex shaders
          • 4 × 4-wide pixel shaders
          • No GPGPU support
          • 74 GFLOPS
      (E) = OpenCL Embedded Profile
      Source: http://kyokojap.myweb.hinet.net/gpu_gflops/
  • 10. MOBILE GPGPU CHALLENGES
      • Many Different GPU Architectures
          • Optimizing for each sets a high bar on development costs
      • Development Tools
          • Immature (stability, performance)
          • No common SDK / Debugger / Profiler (different per vendor)
      • Ecosystem
          • Lack of libraries, wizards, middleware → slow & expensive development
      • Distribution Model
          • Driver updates are part of the OS distribution (no more monthly updates…)
          • End users are less likely to update versions → higher standards on stability & performance of driver releases
      • Security – the unspoken issue (hole)…
  • 11. GPGPU CONTINUUM CHALLENGES
      • Many Different GPU Architectures
          • Optimizing for each sets a high bar on development costs
      • Development Tools
          • Immature (stability, performance)
          • No common SDK / Debugger / Profiler (different per vendor)
      • Ecosystem
          • Lack of libraries, wizards, middleware → slow & expensive development
      • Distribution Model
          • End users are less likely to update versions → higher standards on stability & performance of driver releases
      • Security – the unspoken issue (hole)…
      These challenges are a barrier to GPGPU adoption across the continuum.
  • 12. TOWARDS THE CONTINUUM (1) - LANGUAGES
      • Welcome to the GPGPU (SW) jungle…
      [Diagram: GPU]
  • 13. TOWARDS THE CONTINUUM (1) - LANGUAGES
      • Welcome to the GPGPU (SW) jungle…
      [Diagram: GPU at the center, surrounded by OpenCL, RenderScript, DirectCompute, CUDA]
  • 14. TOWARDS THE CONTINUUM (1) - LANGUAGES
      • Welcome to the GPGPU (SW) jungle…
      [Diagram: GPU at the center, surrounded by PyOpenCL, WebCL, Aparapi (Java), OpenCL, OpenACC, RenderScript, DirectCompute, C++ AMP, CUDA, Fortran, NumbaPro (Python)]
  • 15. TOWARDS THE CONTINUUM (1) - LANGUAGES
      • Welcome to the GPGPU (SW) jungle…
      [Diagram: GPU at the center, surrounded by PyOpenCL, WebCL, Aparapi (Java), OpenCL, OpenACC, RenderScript, DirectCompute, C++ AMP, CUDA, Fortran, NumbaPro (Python)]
      A jungle of languages… but are these the right ones?
  • 16. TOWARDS THE CONTINUUM (1) - LANGUAGES
      • Current GPGPU languages are C/C++ based
          • There are “bindings” to Python, Java, JavaScript – but kernels are still C/C++
      • Current developer trends:
          • Managed languages (Java, C#)
          • Scripting languages (Python, PHP)
      • Higher abstraction & manageability:
          • More room for tools to excel at optimization
          • Mitigates differences between GPU architectures
      GPGPU languages need to evolve
      [Charts: https://sites.google.com/site/pydatalog/pypl/PyPL-PopularitY-ofProgramming-Language; data from CodeEval.com, based on 100K+ code samples]
  • 17. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
      [Diagram: CUDA → LLVM IR → Vendor X IL → Vendor X GPU]
  • 18. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
      [Diagram: CUDA, OpenCL → LLVM IR → Vendor X IL → Vendor X GPU]
  • 19. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
      • Most GPGPU languages already use the LLVM compilation framework
          • Slight “flavors” of LLVM IR
      • Most languages also possess a similar set of “API capabilities”
      [Diagram: CUDA, OpenCL, OpenACC, RenderScript → LLVM IR → Vendor X IL → Vendor X GPU]
  • 20. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
      • Most GPGPU languages already use the LLVM compilation framework
          • Slight “flavors” of LLVM IR
      • Most languages also possess a similar set of “API capabilities”
      • Defining a common stack based on LLVM & a common API will:
          • Improve the compiler
          • Increase driver quality & stability
          • Enable a unified debugger / profiler
          • …
      [Diagram: CUDA, OpenCL, OpenACC, RenderScript → LLVM IR → Vendor X IL → Vendor X GPU]
      Define a GPGPU Virtual Machine based on LLVM
  • 21. TAKEAWAYS • GPGPU Continuum is here - from Mobile devices to HPC • Vision: A common ecosystem built on a common (SW) architecture • Challenges: many architectures, immature tools, ecosystem
  • 22. QUESTIONS • Q: What about “Heterogeneous Computing” ? • A: Go back, replace each “GPGPU” with “Heterogeneous Computing” – and it all fits… • More ?
  • 23. SOME SOURCES:
      • http://www.nordichardware.com/CPU-Chipset/intel-core-i7-3770k-ivy-bridge-and-the-3d-transistor-is-here/Newgraphics-the-biggest-news-in-Ivy-Bridge.html
      • http://elrond.informatik.tu-freiberg.de/papers/WorldComp2012/PDP2833.pdf
      • http://www.anandtech.com/show/6787/nvidia-tegra-4-architecture-deep-dive-plus-tegra-4i-phoenix-hands-on/5
      • http://www.anandtech.com/show/5077/arms-malit658-gpu-in-2013-up-to-10x-faster-than-mali400
      • http://www.chipdesignmag.com/pallab/2011/06/30/arm-mali-gpu-unifying-graphics-across-platforms/
      • http://en.wikipedia.org/wiki/Adreno#Renaming_to_Adreno
      • http://en.wikipedia.org/wiki/PowerVR#Series_5_.28SGX.29
      • http://en.wikipedia.org/wiki/Mali_(GPU)
      • http://johndayautomotivelectronics.com/?p=12412
      • http://www.cnx-software.com/2013/01/19/gpus-comparison-arm-mali-vs-vivante-gcxxx-vs-powervr-sgx-vs-nvidiageforce-ulp/
      • http://www.brightsideofnews.com/print/2013/1/30/rise-of-vivante-fastest-tablet-gpu-on-the-market.aspx
      • https://www.uplinq.com/2012/schedule/accelerating-your-android-application-renderscript-and-llvm-0
      • http://www.androidauthority.com/adreno-320-features-performance-benchmarks-103269/
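Slide 8 places five systems on one line from phone to supercomputer. To make that spread concrete, this small Python sketch divides each quoted peak figure by its quoted power. It uses only the numbers on the slide; they are peak, not sustained, values, the AMD and Intel entries are CPU+GPU combined, and the Apple A6 entry uses the "< 2 W" bound, so its result is a lower bound. The names SYSTEMS, gflops_per_watt and efficiency_table are illustrative, not from the talk.

```python
# Figures as quoted on slide 8: (name, peak GFLOPS, power in watts).
# AMD and Intel figures are CPU+GPU combined; the Apple A6 power is only
# given as "< 2 W", so 2 W is used and its efficiency is a lower bound.
SYSTEMS = [
    ("Apple A6 GPU", 25.0, 2.0),
    ("AMD G-T16R", 46.0, 4.5),
    ("Intel i7-3770", 511.0, 77.0),
    ("NVIDIA GTX Titan", 4500.0, 250.0),
    ("ORNL Titan SC", 27e6, 8200e3),  # 27 PFLOPS, 8200 kW
]


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Peak efficiency: peak throughput divided by power draw."""
    return gflops / watts


def efficiency_table(systems):
    """Return (name, GFLOPS/W) pairs rounded to two decimals."""
    return [(name, round(gflops_per_watt(g, w), 2)) for name, g, w in systems]


if __name__ == "__main__":
    for name, eff in efficiency_table(SYSTEMS):
        print(f"{name:<18}{eff:>7.2f} GFLOPS/W")
    peaks = [g for _, g, _ in SYSTEMS]
    # Ratio between the largest and smallest peak figure on the slide.
    print(f"Peak throughput spans a factor of {max(peaks) / min(peaks):,.0f}")
```

On these figures the GTX Titan comes out at 18 GFLOPS/W and the ORNL Titan system at about 3.3 GFLOPS/W, while peak throughput spans roughly six orders of magnitude (25 GFLOPS to 27 PFLOPS), which is the range a common GPGPU software architecture would have to cover.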
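Slide 20 argues for a common stack built on LLVM IR. One way to see why, under a simplifying assumption that is not stated in the talk, is to count toolchain pieces: without a shared IR, every language needs its own path to every vendor's IL (N × M), whereas with a shared IR each language needs one front end and each vendor one back end (N + M). The Python sketch below is a hypothetical cost model; language names are taken from slides 13 and 14, vendor names from slide 9, and the function names are illustrative.

```python
# Hypothetical cost model for slide 20's argument. Names come from slides
# 13-14 (languages) and slide 9 (mobile GPU vendors); the counting rule is an
# illustrative assumption, not data from the talk.
LANGUAGES = [
    "OpenCL", "RenderScript", "DirectCompute", "CUDA", "PyOpenCL",
    "WebCL", "Aparapi", "OpenACC", "C++ AMP", "NumbaPro",
]
VENDORS = ["Imagination", "Vivante", "Qualcomm", "ARM", "NVIDIA"]


def direct_paths(n_languages: int, n_vendors: int) -> int:
    """No shared IR: each language needs its own path to each vendor's IL."""
    return n_languages * n_vendors


def shared_ir_components(n_languages: int, n_vendors: int) -> int:
    """Shared IR (e.g. LLVM IR): one front end per language, one back end per vendor."""
    return n_languages + n_vendors


if __name__ == "__main__":
    n, m = len(LANGUAGES), len(VENDORS)
    print(f"{n} languages x {m} vendors")
    print(f"without a common IR: {direct_paths(n, m)} language-to-vendor paths")
    print(f"with a common IR:    {shared_ir_components(n, m)} front ends + back ends")
```

With 10 languages and 5 vendors this gives 50 paths versus 15 components, and the gap widens with every new language or GPU. In practice bindings such as PyOpenCL and WebCL reuse the OpenCL stack, so the first number overstates the real cost; the point of the sketch is the scaling, not the exact count.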