GPU Ecosystem

This presentation describes the components of a GPU ecosystem for compute, gives an overview of existing ecosystems, and includes a case study on NVIDIA Nsight.

Published in: Technology

  1. GPU Ecosystem: Introduction & Case Study. Ofer Rosenberg, October 2013
  2. Content
       • GPU Ecosystem
       • Ecosystem on Mobile/Embedded Platforms
       • Nsight: tools case study
       • Libraries
  3. GPU Ecosystem
       • Software product development cycle (diagram): Design, Write Code, Debug, Profile, Product
       • The role of the GPU ecosystem is to support, speed up, and improve this cycle for GPU compute
  4. GPU Ecosystem
       • Supports writing code through:
           • IDE integration: compiler, parser, wizards
           • Libraries: math (BLAS, IPP-like, matrix, etc.) and STL-like (Thrust, BOLT)
       • Supports debugging by:
           • Integrating the debugger into the IDE (preferred)
           • Providing usable execution control (breakpoints, pause/resume, etc.)
           • Providing a reliable memory view of the various address spaces
       • Supports profiling by:
           • Providing two levels of profiling: system tracing and kernel profiling
           • System tracing: quick highlighting of hotspots and device optimal access
           • Statistical and timeline-based kernel profiling (using performance counters)
       • (Diagram: Design, Write Code, Debug, Profile)
  5. Ecosystem on Mobile/Embedded Platforms
  6. ARM Mali
       • Part of ARM SoC
       • OpenCL 1.1 Full Profile (Linux, Android)
       • Renderscript (Android only)
       • OpenCL SDK: samples, tutorials, etc.
       • No GPU debugging capability
       • ARM DS-5 (Development Studio 5)
           • Eclipse IDE integration
           • Compiler, debugger (CPU only)
           • System trace: CPU & GPU
           • Deep profiling: CPU & GPU
  7. Intel Haswell GPU
       • Part of Haswell (CPU & GPU)
       • OpenCL 1.2 Full Profile
           • Windows only for now (Linux at alpha stage)
       • OpenCL SDK
           • Samples
           • Tools: Kernel Builder, VS/Eclipse integration, Offline Compiler, GDB support (CPU only)
           • No GPU debugging capability
       • VTune Amplifier XE supports OpenCL (CPU & GPU)
           • System-level tracing (application, memory, kernel launch)
           • Kernel profiling
  8. Intel BayTrail platform (Atom)
       • BayTrail < 13 W, BayTrail-M < 6.5 W
       • Valleyview SoC (Z37xx)
       • GPU is based on Gen7 (same architecture as Ivy Bridge)
       • Same as the previous slide:
           • OpenCL 1.2 (Windows only for now)
           • OpenCL SDK
           • VTune support
               • System-level tracing
               • Kernel profiling
  9. NVIDIA Tegra 5? (codename: Logan)
       • Disclaimer: Logan is due in early 2014. Some of this information is speculation
       • Development boards and samples are available to selected customers
       • Logan SoC: 2 W
           • ARM Cortex-A15 CPU, 4+1 cores (speculated)
           • Kepler-based GPU (verified)
       • CUDA support (verified)
           • CUDA SDK: dozens of samples
           • CUDA libraries: Thrust, cuBLAS, NPP, etc.
       • Nsight (speculated)
           • System trace
           • Profiling, debugging
  10. NSIGHT TOOLS CASE STUDY (diagram: Design, Write Code, Debug, Profile)
  11. Nsight Highlights
       • “NVIDIA® Nsight™ is the ultimate development platform for heterogeneous computing” (taken from the Nsight page)
       • IDE integration
           • Windows: integration with Visual Studio
           • Linux: specialized Eclipse version
       • Debugging, system trace, profiling
           • Graphics (DX, OpenGL)
           • Computing (OpenCL, CUDA, C++ AMP)
           • Profiling only on CUDA kernels
       • Debug/trace/profile information is carefully presented
           • Highly efficient information fields, windows, diagrams
           • Feedback from professional users is evidently taken into account
  12. Debugging
       • Much more than “just integrated” with the IDE
       • Shaped windows showing valuable info (screenshot labels):
           • Assembly (GPU!)
           • Variables across all warps
           • Visible layout of the stopped thread
  13. Debugging: Eclipse Edition
       • The Eclipse integration seems to be deeper than the Visual Studio one
       • Unified CPU/GPU debugging
           • Simultaneous visibility into both CPU and GPU state
       • Multi-GPU support
       • Full GPU debugging
           • Set kernel breakpoints
           • Single-step, run until, etc.
           • View values across multiple GPU threads at the same time
           • Examine thread, warp, block state
           • Source and assembly level debugging
       • Slides from: “CUDA Development Using NVIDIA Nsight, Eclipse Edition” by David Goodwin, SC12
  14. System Trace
  15. Kernel Profiling
       • Choose a kernel to profile
           • Skip N kernels, profile M kernels
       • Choose “experiments”
           • Experiment: a type of profiling/analysis
           • NVIDIA runs each kernel launch dozens of times with the same data
  16. Profiling Results
       • Experiment list
           • Each experiment is a tabbed window
       • Profiling information is presented as graphs, pie charts, diagrams, etc.
           • Hardware counters are turned into easy-to-understand graphics
           • The information targets known hardware bottlenecks, code inefficiencies, etc.
           • Very well presented
  17. Profiling Results
       • The information provides a quick, methodical way to identify performance bottlenecks (screenshot with callouts 1 to 4)
  18. Eclipse Edition: Source Code Editor
       • Project templates
       • CUDA code highlighting
       • CUDA-aware refactoring
       • CUDA-aware code completion and inline help
  19. LIBRARIES EXAMPLES
  20. CUDA Libraries: Part of the SDK
       • cuFFT
       • cuBLAS
       • cuRAND
       • cuSPARSE
       • NPP (like IPP)
       • Math Library
       • Thrust (next slide)
  21. Thrust Library
       • https://developer.nvidia.com/thrust
       • Works on top of CUDA
       • Open-source version is available on GitHub
           • http://thrust.github.io/
       • Presentations:
           • http://on-demand.gputechconf.com/gtc-express/2011/presentations/introductiontothrust.pdf
  22. OPENCL LIBRARIES
  23. CLPP
       • OpenCL Data Parallel Primitives Library (similar to Thrust)
       • Source: https://code.google.com/p/clpp/
       • 7 committers; last commit about 1.5 years ago (as of this presentation)
  24. OpenCL BLAS
       • http://openclblas.sourceforge.net/
       • Code is available here (GPLv2):
           • http://sourceforge.net/projects/openclblas/
  25. ViennaCL
       • BLAS implementation
       • http://viennacl.sourceforge.net/
       • Looks very promising
  26. REFERENCES
  27. Platform links:
       • ARM
           • Developer site: http://malideveloper.arm.com
           • OpenCL tracing: http://malideveloper.arm.com/develop-for-mali/tools/mali-graphics-debugger/
           • DS-5 suite: http://www.arm.com/products/tools/software-tools/ds-5/index.php
           • OpenCL SDK: http://malideveloper.arm.com/develop-for-mali/sdks/mali-opencl-sdk/
           • OpenCL developer guide:
               • Online: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0538e/index.html
               • PDF: http://infocenter.arm.com/help/topic/com.arm.doc.dui0538e/DUI0538E_mali_t600_opencl_dg.pdf
       • NVIDIA
           • http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler
           • http://www.slashgear.com/nvidia-tegra-logan-detailed-with-game-changing-cuda-integration-19274630/
           • http://www.ubergizmo.com/2013/07/nvidia-tegra-5-release-date-specs-news/
  28. Links:
       • Intel
           • OpenCL SDK: http://software.intel.com/en-us/vcsource/tools/opencl-sdk
           • GPA: http://software.intel.com/en-us/vcsource/tools/intel-gpa
           • VTune support for OpenCL: http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe-getting-started-with-opencl-performance-analysis-on-intel-hd-graphics
           • http://www.theinquirer.net/inquirer/news/2266966/intel-releases-opencl-sdk-for-windows-and-linux
           • Haswell Linux support: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NDc
           • OpenCL “Beignet”, open-source Linux compiler:
               • http://software.intel.com/en-us/forums/topic/402118
               • http://linux.slashdot.org/story/13/04/16/014233/intel-releases-new-opencl-implementation-for-gnulinux
           • Atom BayTrail:
               • http://arstechnica.com/gadgets/2013/02/intel-gets-aggressive-with-new-smartphone-and-tablet-chips/
               • http://www.anandtech.com/show/7314/intel-baytrail-preview-intel-atom-z3770-tested
               • http://www.tomshardware.com/reviews/bay-trail-celeron-j1750-performance,3614-6.html
               • http://software.intel.com/en-us/forums/topic/476221
               • http://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors#.22Bay_Trail.22_.2822_nm.29
  29. Nsight Links
       • http://www.nvidia.com/object/nsight.html
       • https://developer.nvidia.com/nsight-visual-studio-edition-videos
       • https://developer.nvidia.com/developer-webinars
       • http://on-demand.gputechconf.com/supercomputing/2012/presentation/SB006-Goodwin-CUDA-Development-Nsight.pdf
       • http://on-demand.gputechconf.com/gtc/2013/presentations/S3011-CUDA-Optimization-With-Nsight-VSE.pdf
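
The Kernel Profiling slide mentions two mechanisms: selecting launches with "skip N kernels, profile M kernels", and running each selected launch repeatedly with the same data. The host-only C++ sketch below illustrates both ideas under stated assumptions. It is not how Nsight is implemented: `LaunchFilter`, `should_profile`, and `replay_for_experiments` are hypothetical names, and the "kernel" is an ordinary function over a vector rather than a GPU kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical launch filter: skips the first `skip` launches,
// then selects the next `profile` launches for profiling.
class LaunchFilter {
public:
    LaunchFilter(std::size_t skip, std::size_t profile)
        : skip_(skip), profile_(profile), seen_(0) {}

    // Call once per kernel launch; returns true if this launch is selected.
    bool should_profile() {
        const std::size_t index = seen_++;
        return index >= skip_ && index < skip_ + profile_;
    }

private:
    std::size_t skip_;
    std::size_t profile_;
    std::size_t seen_;
};

// Replays one launch once per experiment. The input is restored from a
// snapshot before every run, so each experiment observes the same data.
// Returns the number of replays performed.
std::size_t replay_for_experiments(
    const std::function<void(std::vector<float>&)>& kernel,
    std::vector<float>& data,
    std::size_t experiment_count) {
    assert(experiment_count > 0);
    const std::vector<float> snapshot = data;  // saved input state
    std::size_t replays = 0;
    for (std::size_t e = 0; e < experiment_count; ++e) {
        data = snapshot;  // restore the original input
        kernel(data);     // a real tool would collect one counter set here
        ++replays;
    }
    return replays;  // `data` now holds the result of a single run
}
```

Restoring the snapshot before each run is what lets several experiments measure the same work; the cost is that a profiled launch takes roughly as many times longer as there are experiments.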

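The deck describes Thrust as "STL-like". A minimal sketch of that programming style follows, written against the C++ standard library only so that it builds without the CUDA toolkit. The function `sort_and_sum_squares` and the struct `PipelineResult` are hypothetical names for illustration. With Thrust, the same three steps would use `thrust::sort`, `thrust::transform`, and `thrust::reduce` on a `thrust::device_vector`, so that they execute on the GPU.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Result of the illustrative pipeline (hypothetical type).
struct PipelineResult {
    std::vector<int> sorted;
    long long sum_of_squares;
};

// Sort the input, square every element, then sum the squares.
PipelineResult sort_and_sum_squares(std::vector<int> values) {
    // Thrust counterpart: thrust::sort
    std::sort(values.begin(), values.end());

    // Thrust counterpart: thrust::transform
    std::vector<long long> squares(values.size());
    std::transform(values.begin(), values.end(), squares.begin(),
                   [](int v) { return static_cast<long long>(v) * v; });

    // Thrust counterpart: thrust::reduce
    const long long total =
        std::accumulate(squares.begin(), squares.end(), 0LL);

    return PipelineResult{values, total};
}
```

The point of the STL-like design is that the algorithm calls keep the same iterator-based shape; mainly the container type decides whether the work runs on the host or on the device.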