The GPGPU Continuum

This is a presentation I gave at our most recent GPGPU workshop, in April 2013.
The use of GPGPU is expanding, creating a continuum from mobile to HPC. At the same time, the question is whether the current GPGPU languages are the right ones (well, no), and whether we are wasting resources re-developing the same SW stack instead of converging.

Usage Rights: CC Attribution-NoDerivs License

The GPGPU Continuum Presentation Transcript

  • 1. THE GPGPU CONTINUUM
    • Ofer Rosenberg
    • The GPU continuum workshop, April 25 2013
  • 2. CONTENT
    • Intel’s Compute Continuum
    • GPGPU Evolution
    • The GPGPU Continuum
    • Mobile GPGPU challenges
    • GPGPU Continuum challenges
    • Towards the Continuum
  • 3. INTEL’S “COMPUTE CONTINUUM” FROM IDC 2010
  • 4. INTEL’S “COMPUTE CONTINUUM” FROM IDC 2010
  • 5. GPGPU EVOLUTION
    • 2004 – Stanford University: Brook for GPUs
    • 2006 – AMD releases CTM; NVIDIA releases CUDA
    • 2008 – OpenCL 1.0 released
    • GPU labels on the slide: G80 – 346 GFLOPS; R580 – 375 GFLOPS
  • 6. GPGPU EVOLUTION
    • Nov 2009 – First hybrid supercomputer (SC) in the Top10: Chinese Tianhe-1, 563 TFLOPS
      • 1,024 Intel Xeon E5450 CPUs
      • 5,120 Radeon 4870 X2 GPUs
    • Nov 2010 – First hybrid SC reaches #1 on the Top500 list: Tianhe-1A, 2577 TFLOPS
      • 14,336 Xeon X5670 CPUs
      • 7,168 Nvidia Tesla M2050 GPUs
    • Source: http://www.top500.org/lists/
  • 7. GPGPU EVOLUTION
    • 2013 – OpenCL on:
      • Nexus 4 (Qualcomm Adreno 320)
      • Nexus 10 (ARM Mali T604)
    • Android 4.2 adds GPU support for Renderscript
    • 2014 – NVIDIA Tegra 5 will support CUDA
    • 2013 – GPGPU Continuum becomes a reality
  • 8. THE GPGPU CONTINUUM
    • Apple A6 GPU: 25 GFLOPS, < 2W
    • AMD G-T16R: 46 GFLOPS*, 4.5W
    • Intel i7-3770: 511 GFLOPS*, 77W
    • NVIDIA GTX Titan: 4500 GFLOPS, 250W
    • ORNL TITAN SC: 27 PFLOPS, 8200 KW
    • * GFLOPS of CPU+GPU
    • Take Intel’s vision of the Compute Continuum, and aspire to the same for the GPGPU continuum: a common ecosystem built on a common (SW) architecture
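The span of slide 8 can be made concrete with a little arithmetic. The sketch below divides each peak GFLOPS figure from the slide by its power figure. The helper names are illustrative and not from the presentation, the Apple A6 entry assumes the slide's 2 W upper bound, and the slide mixes GPU-only and CPU+GPU peaks, so the results are only rough.

```python
# Figures copied from slide 8: (device, peak GFLOPS, power in watts).
# Assumption: the Apple A6 entry uses 2 W, the slide's stated upper bound
# ("< 2W"), so its efficiency here is a lower bound.
DEVICES = [
    ("Apple A6 GPU", 25.0, 2.0),
    ("AMD G-T16R", 46.0, 4.5),
    ("Intel i7-3770", 511.0, 77.0),
    ("NVIDIA GTX Titan", 4500.0, 250.0),
    ("ORNL TITAN SC", 27e6, 8200e3),  # 27 PFLOPS and 8200 kW, in GFLOPS and W
]


def gflops_per_watt(gflops, watts):
    """Peak throughput per watt for one device."""
    return gflops / watts


def efficiency_table(devices=DEVICES):
    """Return (device name, GFLOPS per watt rounded to 2 decimals) pairs."""
    return [(name, round(gflops_per_watt(g, w), 2)) for name, g, w in devices]


if __name__ == "__main__":
    for name, eff in efficiency_table():
        print(f"{name:18s} {eff:8.2f} GFLOPS/W")
```

On these figures the devices span roughly six orders of magnitude in peak throughput (25 GFLOPS to 27 PFLOPS), while peak efficiency stays within about one order of magnitude (roughly 3 to 18 GFLOPS/W).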
  • 9. INTRO TO LEADING MOBILE GPU VENDORS
    • Imagination PowerVR 543
      • Apple, Samsung, Motorola, Intel
      • Unified Shaders
      • Supports OpenCL 1.1 (E)
      • 38 GFLOPS (Apple’s MP4 ver)
    • Vivante CG4000
      • Unified Shaders
      • 4 Cores, SIMD4 each
      • Supports OpenCL 1.2
      • 48 GFLOPS
    • Qualcomm Adreno 320
      • Part of Snapdragon S4
      • Unified Shader
      • SIMD4 ?
      • Supports OpenCL 1.1 (E)
      • 50 GFLOPS
    • ARM Mali T604
      • 4 Cores
      • Multiple “pipes” per core
      • Supports OpenCL 1.1
      • 68 GFLOPS
    • NVIDIA Tegra 4
      • 6 X 4-wide Vertex shaders
      • 4 X 4-wide Pixel Shaders
      • No GPGPU support
      • 74 GFLOPS
    • Source: http://kyokojap.myweb.hinet.net/gpu_gflops/
  • 10. MOBILE GPGPU CHALLENGES
    • Many Different GPU Architectures
      • Optimizing for each sets high bar on development costs
    • Development Tools
      • Immature (stability, performance)
      • No common SDK / Debugger / Profiler (different per vendor)
    • Ecosystem
      • Lack of libraries, wizards, middleware → Slow & expensive development
    • Distribution Model
      • Driver updates are part of OS distribution (no more per-month updates…)
      • End users are less likely to update version → higher standards on stability & performance of driver release
    • Security – the unspoken issue (hole) …
  • 11. GPGPU CONTINUUM CHALLENGES
    • Many Different GPU Architectures
      • Optimizing for each sets high bar on development costs
    • Development Tools
      • Immature (stability, performance)
      • No common SDK / Debugger / Profiler (different per vendor)
    • Ecosystem
      • Lack of libraries, wizards, middleware → Slow & expensive development
    • Distribution Model
      • End users are less likely to update version → higher standards on stability & performance of driver release
    • Security – the unspoken issue (hole) …
    • These challenges are a barrier to GPGPU adoption across the continuum
  • 12. TOWARDS THE CONTINUUM (1) - LANGUAGES
    • Welcome to the GPGPU (SW) jungle …
    • [Diagram labels: GPU]
  • 13. TOWARDS THE CONTINUUM (1) - LANGUAGES
    • Welcome to the GPGPU (SW) jungle …
    • [Diagram labels: OpenCL, Renderscript, GPU, DirectCompute, CUDA]
  • 14. TOWARDS THE CONTINUUM (1) - LANGUAGES
    • Welcome to the GPGPU (SW) jungle …
    • [Diagram labels: PyOpenCL, WebCL, Aparapi (Java), OpenCL, OpenACC, Renderscript, GPU, DirectCompute, C++ AMP, CUDA, Fortran, NumbaPro (Python)]
  • 15. TOWARDS THE CONTINUUM (1) - LANGUAGES
    • Welcome to the GPGPU (SW) jungle …
    • [Diagram labels: PyOpenCL, WebCL, Aparapi (Java), OpenCL, OpenACC, Renderscript, GPU, DirectCompute, C++ AMP, CUDA, Fortran, NumbaPro (Python)]
    • A jungle of languages… but are these the right ones?
  • 16. TOWARDS THE CONTINUUM (1) - LANGUAGES
    • Current GPGPU languages are C/C++ based
      • There are bindings to Python, Java, JavaScript – but kernels are still C/C++
    • Current developer trends:
      • Managed languages (Java, C#)
      • Scripting languages (Python, PHP)
    • Higher abstraction & manageability:
      • More room for tools to excel at optimization
      • Mitigate differences between GPU architectures
    • GPGPU languages need to evolve
    • Chart sources: https://sites.google.com/site/pydatalog/pypl/PyPL-PopularitY-ofProgramming-Language; data from CodeEval.com, based on 100K+ code samples
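Slide 16's point that bindings leave the kernel in C/C++ can be illustrated with PyOpenCL, one of the bindings named on slide 14. This is a minimal sketch and does not come from the presentation: the kernel name `vadd` and both helper functions are hypothetical, and `vadd_on_device` assumes that `pyopencl`, `numpy` and a working OpenCL driver are installed. The host code is Python, but the code that runs on the GPU is still an OpenCL C string.

```python
# OpenCL C kernel source: this part stays C even though the host code is Python.
KERNEL_SRC = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out)
{
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""


def vadd_reference(a, b):
    """Pure-Python reference for what the kernel computes (element-wise add)."""
    return [x + y for x, y in zip(a, b)]


def vadd_on_device(a, b):
    """Run KERNEL_SRC through PyOpenCL.

    Assumption: pyopencl, numpy and an OpenCL driver are available; they are
    imported here so the rest of the module works without them.
    """
    import numpy as np
    import pyopencl as cl

    a_np = np.asarray(a, dtype=np.float32)
    b_np = np.asarray(b, dtype=np.float32)

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    # Copy the inputs to device buffers and allocate the output buffer.
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)

    # The C source is compiled at run time by the vendor's OpenCL driver.
    program = cl.Program(ctx, KERNEL_SRC).build()
    program.vadd(queue, a_np.shape, None, a_buf, b_buf, out_buf)

    out_np = np.empty_like(a_np)
    cl.enqueue_copy(queue, out_np, out_buf)
    return out_np.tolist()
```

On a machine with an OpenCL device, `vadd_on_device(a, b)` should agree with `vadd_reference(a, b)` for values that are exactly representable in 32-bit floats.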
  • 17. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
    • [Stack diagram: CUDA → LLVM IR → Vendor X IL → Vendor X GPU]
  • 18. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
    • [Stack diagram: OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU]
  • 19. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
    • Most GPGPU languages already use the LLVM compilation framework
      • Slight “flavors” of LLVM IR
    • Most languages also possess a similar “API capabilities” set
    • [Stack diagram: OpenACC, Renderscript, OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU]
  • 20. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
    • Most GPGPU languages already use the LLVM compilation framework
      • Slight “flavors” of LLVM IR
    • Most languages also possess a similar “API capabilities” set
    • Defining a common stack based on LLVM & a common API will:
      • Improve the compiler
      • Increase driver quality & stability
      • Enable unified debugger / profiler
      • …
    • Define a GPGPU Virtual Machine based on LLVM
    • [Stack diagram: OpenACC, Renderscript, OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU]
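One way to read the case for a common stack is as a simple count. The sketch below is an illustration and not something the slides compute. It takes the four front ends named on the stack diagram and the five mobile GPU vendors from slide 9, then compares a hypothetical worst case, in which every language has its own compiler for every vendor, with a shared IR. In practice not every language targets every vendor, so the first number is only an upper bound.

```python
# Front ends named on the stack diagram and the vendors listed on slide 9.
FRONT_ENDS = ["OpenACC", "Renderscript", "OpenCL", "CUDA"]
VENDORS = ["Imagination", "Vivante", "Qualcomm", "ARM", "NVIDIA"]


def separate_stacks(front_ends, vendors):
    """Hypothetical worst case: one full compiler per (language, vendor) pair."""
    return len(front_ends) * len(vendors)


def common_stack(front_ends, vendors):
    """Shared IR: one front end per language plus one back end per vendor."""
    return len(front_ends) + len(vendors)


if __name__ == "__main__":
    print("separate stacks:", separate_stacks(FRONT_ENDS, VENDORS))
    print("common stack:   ", common_stack(FRONT_ENDS, VENDORS))
```

With these lists the count drops from 20 language-vendor compilers to 9 components, and the gap widens as languages or vendors are added.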
  • 21. TAKEAWAYS
    • GPGPU Continuum is here - from mobile devices to HPC
    • Vision: a common ecosystem built on a common (SW) architecture
    • Challenges: many architectures, immature tools, ecosystem
  • 22. QUESTIONS
    • Q: What about “Heterogeneous Computing”?
    • A: Go back, replace each “GPGPU” with “Heterogeneous Computing” – and it all fits…
    • More?
  • 23. SOME SOURCES:
    • http://www.nordichardware.com/CPU-Chipset/intel-core-i7-3770k-ivy-bridge-and-the-3d-transistor-is-here/Newgraphics-the-biggest-news-in-Ivy-Bridge.html
    • http://elrond.informatik.tu-freiberg.de/papers/WorldComp2012/PDP2833.pdf
    • http://www.anandtech.com/show/6787/nvidia-tegra-4-architecture-deep-dive-plus-tegra-4i-phoenix-hands-on/5
    • http://www.anandtech.com/show/5077/arms-malit658-gpu-in-2013-up-to-10x-faster-than-mali400
    • http://www.chipdesignmag.com/pallab/2011/06/30/arm-mali-gpu-unifying-graphics-across-platforms/
    • http://en.wikipedia.org/wiki/Adreno#Renaming_to_Adreno
    • http://en.wikipedia.org/wiki/PowerVR#Series_5_.28SGX.29
    • http://en.wikipedia.org/wiki/Mali_(GPU)
    • http://johndayautomotivelectronics.com/?p=12412
    • http://www.cnx-software.com/2013/01/19/gpus-comparison-arm-mali-vs-vivante-gcxxx-vs-powervr-sgx-vs-nvidiageforce-ulp/
    • http://www.brightsideofnews.com/print/2013/1/30/rise-of-vivante-fastest-tablet-gpu-on-the-market.aspx
    • https://www.uplinq.com/2012/schedule/accelerating-your-android-application-renderscript-and-llvm-0
    • http://www.androidauthority.com/adreno-320-features-performance-benchmarks-103269/