GPU Ecosystem

This presentation describes the components of a GPU ecosystem for compute, gives an overview of existing ecosystems, and includes a case study on NVIDIA Nsight.

Published in: Technology

  1. GPU Ecosystem: Introduction & Case Study. Ofer Rosenberg, October 2013
  2. Content
     • GPU Ecosystem
     • Ecosystem on Mobile/Embedded Platforms
     • Nsight: tools case study
     • Libraries
  3. GPU Ecosystem
     • Software product development cycle (diagram): Design, Write Code, Debug, Profile, Product
     • The role of the GPU ecosystem is to support, speed up, and improve this cycle for GPU compute
  4. GPU Ecosystem
     • Support writing code by:
         • IDE integration: compiler, parser, wizards
         • Libraries: math (BLAS, IPP-like, matrix, etc.), STL-like (Thrust, BOLT)
     • Support debugging by:
         • IDE integration of the debugger (preferred)
         • Providing usable execution control (breakpoints, pause/resume, etc.)
         • Providing a reliable memory view of the various address spaces
     • Support profiling by:
         • Providing two levels of profiling: system tracing and kernel profiling
         • System tracing: quick highlighting of hotspots and device optimal access
         • Statistical and timeline-based kernel profiling (using performance counters)
  5. Ecosystem on Mobile/Embedded Platforms
  6. ARM Mali
     • Part of ARM SoC
     • OpenCL 1.1 Full Profile (Linux, Android)
     • Renderscript (Android only)
     • OpenCL SDK: samples, tutorials, etc.
     • No GPU debugging capability
     • ARM DS-5 (Development Studio 5)
         • Eclipse IDE integration
         • Compiler, debugger (CPU only)
         • System trace: CPU & GPU
         • Deep profiling: CPU & GPU
  7. Intel Haswell GPU
     • Part of Haswell (CPU & GPU)
     • OpenCL 1.2 Full Profile
         • Windows only for now (Linux at alpha stage)
     • OpenCL SDK
         • Samples
         • Tools: Kernel Builder, VS/Eclipse integration, offline compiler, GDB support (CPU only)
         • No GPU debugging capability
     • VTune Amplifier XE supports OpenCL (CPU & GPU)
         • System-level tracing (application, memory, kernel launch)
         • Kernel profiling
  8. Intel BayTrail platform (Atom)
     • BayTrail < 13 W, BayTrail-M < 6.5 W
     • Valleyview SoC (Z37xx)
     • GPU is based on Gen7 (same architecture as Ivy Bridge)
     • Same as the previous slide:
         • OpenCL 1.2 (Windows only for now)
         • OpenCL SDK
         • VTune support
             • System-level tracing
             • Kernel profiling
  9. NVIDIA Tegra 5 ? (codename: Logan)
     • Disclaimer: Logan is due in early 2014. Some of this information is speculation
     • Development boards and samples available to selected customers
     • Logan SoC: 2 W
         • ARM CPU A15 4+1: speculated
         • Kepler-based GPU: verified
     • CUDA support: verified
         • CUDA SDK: dozens of samples
         • CUDA libraries: Thrust, cuBLAS, NPP, etc.
     • NSIGHT: speculated
         • System trace
         • Profiling, debugging
  10. NSIGHT TOOLS CASE STUDY
  11. Nsight Highlights
      • “NVIDIA® Nsight™ is the ultimate development platform for heterogeneous computing” (taken from the Nsight page)
      • IDE integration
          • Windows: integration with Visual Studio
          • Linux: specialized Eclipse version
      • Debugging, system trace, profiling
          • Graphics (DX, OpenGL)
          • Computing (OpenCL, CUDA, C++ AMP)
          • Profiling only on CUDA kernels
      • Debug/trace/profile information is carefully presented
          • Highly efficient information fields, windows, diagrams
          • The influence of feedback from professional users is noticeable
  12. Debugging
      • Much more than “just integrated” with the IDE
      • Purpose-built windows showing valuable info
      • Screenshot callouts: Assembly (GPU!); Variables across all warps; Visible layout of the stopped thread
  13. Debugging: Eclipse edition
      • The Eclipse integration seems to be deeper than the Visual Studio one
      • Unified CPU/GPU debugging
          • Simultaneous visibility into both CPU and GPU state
          • Multi-GPU support
      • Full GPU debugging
          • Set kernel breakpoints
          • Single-step, run until, etc.
          • View values across multiple GPU threads at the same time
          • Examine thread, warp, block state
          • Source- and assembly-level debugging
      • Slides from: “CUDA Development Using NVIDIA Nsight, Eclipse Edition” by David Goodwin, SC12
  14. System Trace
  15. Kernel Profiling
      • Choose a kernel to profile
          • Skip N kernels, profile M kernels
      • Choose “experiments”
          • Experiment: a type of profiling/analysis
          • NVIDIA runs each kernel launch dozens of times with the same data
  16. Profiling Results
      • Experiment list
      • Each experiment is a tabbed window
      • Profiling information is presented as graphs, pie charts, diagrams, etc.
          • HW counters are turned into easy-to-understand graphics
          • Information targets known HW bottlenecks, code inefficiencies, etc.
      • Amazingly well presented…
  17. Profiling Results
      • The information provides a quick, easy, and methodical way to identify performance bottlenecks
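
The kernel-profiling slides describe skipping N kernel launches, profiling the next M, and re-running each profiled launch with the same data. A minimal host-side sketch of that control flow follows. It is not Nsight's implementation: the `Kernel` type, `ProfileRecord`, `run_with_profiling`, and the idea of one replay per experiment are all assumptions made for illustration, and a plain C++ callable stands in for a GPU kernel.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a GPU kernel launch: it mutates a buffer in place.
using Kernel = std::function<void(std::vector<float>&)>;

struct ProfileRecord {
    std::size_t launch_index;  // position of the launch in the application
    std::string experiment;    // experiment this replay was run for
};

// Runs every launch once for its real effect. Launches whose index falls in
// [skip, skip + count) are additionally replayed once per experiment on a
// saved copy of their input, so every experiment sees identical data.
std::vector<ProfileRecord> run_with_profiling(
    const std::vector<Kernel>& launches,
    std::vector<float>& data,
    std::size_t skip,
    std::size_t count,
    const std::vector<std::string>& experiments)
{
    std::vector<ProfileRecord> records;
    for (std::size_t i = 0; i < launches.size(); ++i) {
        const bool profiled = i >= skip && i < skip + count;
        if (profiled) {
            const std::vector<float> saved_input = data;  // snapshot before the launch
            for (const std::string& experiment : experiments) {
                std::vector<float> scratch = saved_input;  // same data for every replay
                launches[i](scratch);  // a real tool would collect counters here
                records.push_back({i, experiment});
            }
        }
        launches[i](data);  // the launch whose result the application keeps
    }
    return records;
}
```

Replaying on a scratch copy keeps the application's own results independent of how many experiments are selected, which is one plausible reason a profiler would re-run a launch on identical input.
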
  18. Eclipse Edition: Source Code Editor
      • Project templates
      • CUDA code highlighting
      • CUDA-aware refactoring
      • CUDA-aware code completion and inline help
  19. LIBRARIES EXAMPLES
  20. CUDA Libraries: part of the SDK
      • cuFFT
      • cuBLAS
      • cuRAND
      • cuSPARSE
      • NPP (like IPP)
      • Math library
      • Thrust (next slide)
  21. Thrust Library
      • https://developer.nvidia.com/thrust
      • Works on top of CUDA
      • Open-source version is available on GitHub
          • http://thrust.github.io/
      • Presentations:
          • http://on-demand.gputechconf.com/gtc-express/2011/presentations/introductiontothrust.pdf
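
The deck calls Thrust an "STL-like" library. To show what that style looks like, here is a small host-only sketch written against the C++ standard library rather than Thrust itself, so it runs without a GPU. The function name `sum_of_squares` is made up for this example. Thrust provides `thrust::transform` and `thrust::reduce` with the same iterator-based calling style, typically applied to a `thrust::device_vector`.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Sum of squares using standard algorithms on the host.
// Illustrative only: this is the STL pattern that Thrust mirrors, not Thrust code.
float sum_of_squares(const std::vector<float>& input)
{
    std::vector<float> squared(input.size());

    // Element-wise step: squared[i] = input[i] * input[i]
    std::transform(input.begin(), input.end(), squared.begin(),
                   [](float x) { return x * x; });

    // Reduction step: add all elements, starting from 0
    return std::accumulate(squared.begin(), squared.end(), 0.0f, std::plus<float>());
}
```

The appeal of the STL-like design is that the two steps read the same way whether the containers live in host or device memory.
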
  22. OPENCL LIBRARIES
  23. CLPP
      • OpenCL Data Parallel Primitives Library (similar to Thrust)
      • Source: https://code.google.com/p/clpp/
      • 7 committers; last commit 1.5 years ago (as of October 2013)
  24. OpenCL BLAS
      • http://openclblas.sourceforge.net/
      • Code is available here (GPLv2):
          • http://sourceforge.net/projects/openclblas/
  25. ViennaCL
      • BLAS implementation
      • http://viennacl.sourceforge.net/
      • Looks very promising
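
Several of the libraries listed above are BLAS implementations. For readers unfamiliar with BLAS, the sketch below is a plain CPU reference for one level-1 BLAS operation, SAXPY, which computes y = alpha * x + y. The function name `saxpy_reference` is made up for this example, and no call from cuBLAS, OpenCL BLAS, or ViennaCL is shown. A GPU library is expected to produce the same result for the same inputs, up to floating-point rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference (CPU, single-threaded) version of the BLAS level-1 SAXPY
// operation: y[i] = alpha * x[i] + y[i] for every element.
void saxpy_reference(float alpha, const std::vector<float>& x, std::vector<float>& y)
{
    assert(x.size() == y.size());  // both vectors must have the same length
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

A reference like this is mainly useful for checking the output of a GPU library on small inputs.
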
  26. REFERENCES
  27. Platform links:
      • ARM
          • Developer site: http://malideveloper.arm.com
          • OpenCL tracing: http://malideveloper.arm.com/develop-for-mali/tools/mali-graphics-debugger/
          • DS-5 suite: http://www.arm.com/products/tools/software-tools/ds-5/index.php
          • OpenCL SDK: http://malideveloper.arm.com/develop-for-mali/sdks/mali-opencl-sdk/
          • OpenCL developer guide:
              • Online: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0538e/index.html
              • PDF: http://infocenter.arm.com/help/topic/com.arm.doc.dui0538e/DUI0538E_mali_t600_opencl_dg.pdf
      • NVIDIA
          • http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler
          • http://www.slashgear.com/nvidia-tegra-logan-detailed-with-game-changing-cuda-integration-19274630/
          • http://www.ubergizmo.com/2013/07/nvidia-tegra-5-release-date-specs-news/
  28. Links:
      • Intel
          • OpenCL SDK: http://software.intel.com/en-us/vcsource/tools/opencl-sdk
          • GPA: http://software.intel.com/en-us/vcsource/tools/intel-gpa
          • VTune support in OpenCL: http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe-getting-started-with-opencl-performance-analysis-on-intel-hd-graphics
          • http://www.theinquirer.net/inquirer/news/2266966/intel-releases-opencl-sdk-for-windows-and-linux
          • Haswell Linux support: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NDc
          • OpenCL “Beignet”, open-source Linux compiler:
              • http://software.intel.com/en-us/forums/topic/402118
              • http://linux.slashdot.org/story/13/04/16/014233/intel-releases-new-opencl-implementation-for-gnulinux
      • Atom BayTrail:
          • http://arstechnica.com/gadgets/2013/02/intel-gets-aggressive-with-new-smartphone-and-tablet-chips/
          • http://www.anandtech.com/show/7314/intel-baytrail-preview-intel-atom-z3770-tested
          • http://www.tomshardware.com/reviews/bay-trail-celeron-j1750-performance,3614-6.html
          • http://software.intel.com/en-us/forums/topic/476221
          • http://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors#.22Bay_Trail.22_.2822_nm.29
  29. NSIGHT Links
      • http://www.nvidia.com/object/nsight.html
      • https://developer.nvidia.com/nsight-visual-studio-edition-videos
      • https://developer.nvidia.com/developer-webinars
      • http://on-demand.gputechconf.com/supercomputing/2012/presentation/SB006-Goodwin-CUDA-Development-Nsight.pdf
      • http://on-demand.gputechconf.com/gtc/2013/presentations/S3011-CUDA-Optimization-With-Nsight-VSE.pdf
