GPU Ecosystem

This presentation describes the components of a GPU ecosystem for compute, provides an overview of existing ecosystems, and contains a case study on NVIDIA Nsight.

© All Rights Reserved

    GPU Ecosystem Presentation Transcript

    • Page 1 GPU Ecosystem: Introduction & Case Study. Ofer Rosenberg, October 2013
    • Page 2 Content
      • GPU Ecosystem
      • Ecosystem on Mobile/Embedded Platforms
      • NSIGHT - Tools case study
      • Libraries
    • Page 3 GPU Ecosystem
      • Software product development cycle (diagram labels): Design, Write Code, Debug, Profile, Product
      • The GPU ecosystem's role is to support, speed up, and improve this cycle for GPU compute
    • Page 4 GPU Ecosystem
      • Support writing code by:
        • IDE integration – Compiler, Parser, Wizards
        • Libraries: Math (BLAS, IPP-like, Matrix, etc.), STL-like (Thrust, BOLT)
      • Support debugging by:
        • IDE integration of the debugger (preferred)
        • Providing usable execution control (breakpoints, pause/resume, etc.)
        • Providing a reliable memory view of the various address spaces
      • Support profiling by:
        • Providing two levels of profiling: System Tracing and Kernel Profiling
        • System Tracing - quick highlighting of hotspots and of how optimally the device is accessed
        • Statistical and timeline-based Kernel Profiling (using performance counters)
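
The system-tracing level described on this slide can be pictured with a small host-side sketch. This is not how any vendor tool is implemented; it only assumes that a trace is a list of named time spans and that the "hotspot" is the name with the largest accumulated time. The `Timeline` class, the stage names, and `fake_stage` are all hypothetical.

```python
import time
from contextlib import contextmanager


class Timeline:
    """Collects (name, start, end) spans, like rows on a system-trace timeline."""

    def __init__(self):
        self.spans = []

    @contextmanager
    def span(self, name):
        # Record wall-clock start and end around the wrapped block.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.spans.append((name, start, time.perf_counter()))

    def totals(self):
        """Sum the time spent per span name."""
        out = {}
        for name, start, end in self.spans:
            out[name] = out.get(name, 0.0) + (end - start)
        return out

    def hotspot(self):
        """Return the span name with the largest accumulated time."""
        totals = self.totals()
        return max(totals, key=totals.get)


def fake_stage(seconds):
    # Hypothetical placeholder for a real memory copy or kernel launch.
    time.sleep(seconds)


timeline = Timeline()
for _ in range(3):
    with timeline.span("copy_to_device"):
        fake_stage(0.001)
    with timeline.span("kernel"):
        fake_stage(0.02)
    with timeline.span("copy_to_host"):
        fake_stage(0.001)

print(timeline.hotspot())
```

Kernel profiling, the second level, would then look inside the span that the trace singles out, which is where hardware performance counters come in.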
    • Page 5 Ecosystem on Mobile/Embedded Platforms
    • Page 6 ARM MALI
      • Part of ARM SoC
      • OpenCL 1.1 Full Profile (Linux, Android)
      • Renderscript (Android only)
      • OpenCL SDK – Samples, Tutorials, etc.
      • No GPU debugging capability
      • ARM DS-5 (Development Studio 5)
        • Eclipse IDE integration
        • Compiler, Debugger (CPU only)
        • System Trace – CPU & GPU
        • Deep Profiling - CPU & GPU
    • Page 7 Intel Haswell GPU
      • Part of Haswell (CPU & GPU)
      • OpenCL 1.2 Full Profile
        • Windows only for now (Linux at alpha stage)
      • OpenCL SDK
        • Samples
        • Tools: Kernel Builder, VS/Eclipse Integration, Offline Compiler, GDB support (CPU only)
        • No GPU debugging capability
      • VTune Amplifier XE supports OpenCL (CPU & GPU)
        • System level tracing (Application, Memory, Kernel launch)
        • Kernel Profiling
    • Page 8 Intel BayTrail platform (Atom)
      • BayTrail < 13W, BayTrail-M < 6.5W
      • Valleyview SoC (Z37xx)
      • GPU is based on Gen7 (same architecture as IvyBridge)
      • Same as previous slide:
        • OpenCL 1.2 (Windows only for now)
        • OpenCL SDK
        • VTune support
          • System level tracing
          • Kernel Profiling
    • Page 9 NVIDIA Tegra 5? (Codename: Logan)
      • Disclaimer: Logan is due early 2014. Part of the information is speculation
      • Development boards and samples available to selected customers
      • Logan SoC – 2W
        • ARM CPU A15 4+1: speculated
        • Kepler-based GPU: verified
        • CUDA support: verified
      • CUDA SDK – Dozens of samples
      • CUDA Libraries: Thrust, cuBLAS, NPP, etc.
      • NSIGHT: speculated
        • System Trace
        • Profiling, Debugging
    • Page 10 NSIGHT TOOLS CASE STUDY (development cycle diagram: Design, Write Code, Debug, Profile)
    • Page 11 Nsight Highlights
      • “NVIDIA® Nsight™ is the ultimate development platform for heterogeneous computing” (taken from the Nsight page)
      • IDE integration
        • Windows – integration with Visual Studio
        • Linux – specialized Eclipse version
      • Debugging, System Trace, Profiling
        • Graphics (DX, OpenGL)
        • Computing (OpenCL, CUDA, C++ AMP)
        • Profiling only on CUDA kernels
      • Debug/Trace/Profile information is carefully presented
        • Highly efficient information fields, windows, diagrams
        • Feedback from professional users is visibly taken into account
    • Page 12 Debugging
      • Much more than “just integrated” with the IDE
      • Purpose-built windows showing valuable info:
        • Assembly (GPU!)
        • Variables across all warps
        • Visible layout of the stopped thread
    • Page 13 Debugging – Eclipse edition
      • The Eclipse integration seems deeper than the Visual Studio one
      • Unified CPU / GPU debugging
        • Simultaneous visibility into both CPU and GPU state
        • Multi-GPU support
      • Full GPU debugging
        • Set kernel breakpoints
        • Single-step, run until, etc.
        • View values across multiple GPU threads at the same time
        • Examine thread, warp, block state
        • Source and assembly level debugging
      • Slides from: “CUDA Development Using NVIDIA Nsight, Eclipse Edition” by David Goodwin, SC12
    • Page 14 System Trace
    • Page 15 Kernel Profiling
      • Choose a kernel to profile
        • Skip N kernels, profile M kernels
      • Choose “experiments”
        • Experiment - a type of profiling/analysis
        • Nsight runs each kernel launch dozens of times with the same data
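
The "skip N, profile M" window and the repeated runs on the same data can be sketched in a few lines. This is an illustration of the idea only, not Nsight's implementation: `select_launches`, `replay`, the stand-in kernel, and the two "experiments" are hypothetical names, and a deep copy of the arguments stands in for whatever mechanism a real tool uses to restore device memory between passes.

```python
import copy


def select_launches(launches, skip, count):
    """Yield (index, launch) for launches inside the profiling window.

    The first `skip` launches are ignored; the next `count` are profiled.
    """
    for index, launch in enumerate(launches):
        if skip <= index < skip + count:
            yield index, launch


def replay(kernel, args, experiments):
    """Run `kernel` once per experiment, each time on a fresh copy of `args`.

    Copying the inputs before every pass lets each experiment observe the
    same data even if the kernel overwrites its buffers in place.
    """
    results = {}
    for name, collect in experiments.items():
        buffers = copy.deepcopy(args)
        kernel(*buffers)
        results[name] = collect(buffers)
    return results


def scale_in_place(data, factor):
    # Stand-in for a GPU kernel that overwrites its input buffer.
    for i in range(len(data)):
        data[i] *= factor


# Five launches of the same kernel with different scale factors.
launches = [(scale_in_place, ([1, 2, 3], k)) for k in range(5)]

# Hypothetical experiments: each reads something different after the run.
experiments = {
    "sum": lambda bufs: sum(bufs[0]),
    "max": lambda bufs: max(bufs[0]),
}

profiled = {
    index: replay(kernel, args, experiments)
    for index, (kernel, args) in select_launches(launches, skip=2, count=2)
}
print(profiled)
```

Running one pass per experiment is a plausible reason for the many repetitions the slide mentions, since a limited set of hardware counters can be sampled in a single pass; that motivation is an assumption, not something the slide states.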
    • Page 16 Profiling Results
      • Experiment list
        • Each experiment is a tabbed window
      • Profiling information is presented as graphs, pie charts, diagrams, etc.
        • HW counters are turned into easy-to-understand graphics
        • Information targets known HW bottlenecks, code inefficiencies, etc.
      • Amazingly well presented…
    • Page 17 Profiling Results
      • The information provides a quick, easy, and methodical way to identify performance bottlenecks (figure callouts: 1, 2, 3, 4)
    • Page 18 Eclipse Edition - Source Code Editor
      • Project templates
      • CUDA code highlighting
      • CUDA-aware refactoring
      • CUDA-aware code completion and inline help
    • Page 19 LIBRARIES EXAMPLES
    • Page 20 CUDA Libraries – Part of the SDK
      • cuFFT
      • cuBLAS
      • cuRAND
      • cuSPARSE
      • NPP (like IPP)
      • Math Library
      • Thrust (next slide)
    • Page 21 Thrust Library
      • https://developer.nvidia.com/thrust
      • Works on top of CUDA
      • Open-source version is available on GitHub
        • http://thrust.github.io/
      • Presentations:
        • http://on-demand.gputechconf.com/gtc-express/2011/presentations/introductiontothrust.pdf
    • Page 22 OPENCL LIBRARIES
    • Page 23 CLPP
      • OpenCL Data Parallel Primitives Library (similar to Thrust)
      • Source: https://code.google.com/p/clpp/
      • 7 committers, last commit about 1.5 years ago (as of October 2013)
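
For readers unfamiliar with the term, a "data parallel primitive" is a small building block such as scan (prefix sum), reduce, or sort. The slide does not list which primitives CLPP ships, so the sketch below only shows the sequential reference semantics of an exclusive scan and one classic use of it, stream compaction. The function names are hypothetical and the code runs on the CPU; a GPU library would compute the same result in parallel.

```python
from itertools import accumulate


def exclusive_scan(values, init=0):
    """Return [init, init+v0, init+v0+v1, ...], same length as `values`."""
    inclusive = list(accumulate(values, initial=init))
    return inclusive[:-1]


def compact(values, keep):
    """Keep the elements for which `keep` is true, preserving order.

    Each kept element's output slot is the exclusive scan of the 0/1 flags,
    which is how the scatter step is usually expressed on a GPU.
    """
    flags = [1 if keep(v) else 0 for v in values]
    offsets = exclusive_scan(flags)
    out = [None] * sum(flags)
    for value, flag, offset in zip(values, flags, offsets):
        if flag:
            out[offset] = value
    return out


print(exclusive_scan([3, 1, 4, 1, 5]))
print(compact([5, 2, 8, 1, 9], lambda v: v > 4))
```

A sequential reference like this is also handy for checking the output of a GPU implementation on small inputs.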
    • Page 24 OpenCL BLAS
      • http://openclblas.sourceforge.net/
      • Code is available here (GPLv2):
        • http://sourceforge.net/projects/openclblas/
    • Page 25 ViennaCL
      • BLAS implementation
      • http://viennacl.sourceforge.net/
      • Looks very promising
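
As a reminder of what a BLAS implementation computes, here is a plain reference sketch of two standard routines: the level-1 `axpy` (y = alpha*x + y) and the level-3 `gemm` (C = alpha*A*B + beta*C). It does not use the API of OpenCL BLAS, ViennaCL, or cuBLAS; the Python signatures are made up for illustration, and unlike real BLAS routines these functions return new lists instead of updating the output argument in place.

```python
def axpy(alpha, x, y):
    """Level 1: return alpha * x + y for equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemm(alpha, a, b, beta, c):
    """Level 3: return alpha * (a @ b) + beta * c for row-major nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [
            alpha * sum(a[i][k] * b[k][j] for k in range(inner)) + beta * c[i][j]
            for j in range(cols)
        ]
        for i in range(rows)
    ]


print(axpy(2, [1, 2, 3], [10, 20, 30]))
print(gemm(1, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0, [[0, 0], [0, 0]]))
```

Small references of this kind are mainly useful for validating a GPU library's results before comparing performance.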
    • Page 26 REFERENCES
    • Page 27 Platform links:
      • ARM
        • Developer site: http://malideveloper.arm.com
        • OpenCL tracing: http://malideveloper.arm.com/develop-for-mali/tools/mali-graphics-debugger/
        • DS-5 suite: http://www.arm.com/products/tools/software-tools/ds-5/index.php
        • OpenCL SDK: http://malideveloper.arm.com/develop-for-mali/sdks/mali-opencl-sdk/
        • OpenCL developer guide:
          • Online: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0538e/index.html
          • PDF: http://infocenter.arm.com/help/topic/com.arm.doc.dui0538e/DUI0538E_mali_t600_opencl_dg.pdf
      • NVIDIA
        • http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler
        • http://www.slashgear.com/nvidia-tegra-logan-detailed-with-game-changing-cuda-integration-19274630/
        • http://www.ubergizmo.com/2013/07/nvidia-tegra-5-release-date-specs-news/
    • Page 28 Links:
      • Intel
        • OpenCL SDK: http://software.intel.com/en-us/vcsource/tools/opencl-sdk
        • GPA: http://software.intel.com/en-us/vcsource/tools/intel-gpa
        • VTune support in OpenCL: http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe-getting-started-with-opencl-performance-analysis-on-intel-hd-graphics
        • http://www.theinquirer.net/inquirer/news/2266966/intel-releases-opencl-sdk-for-windows-and-linux
        • Haswell Linux support: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NDc
        • OpenCL “Beignet” – open source Linux compiler:
          • http://software.intel.com/en-us/forums/topic/402118
          • http://linux.slashdot.org/story/13/04/16/014233/intel-releases-new-opencl-implementation-for-gnulinux
      • ATOM BayTrail:
        • http://arstechnica.com/gadgets/2013/02/intel-gets-aggressive-with-new-smartphone-and-tablet-chips/
        • http://www.anandtech.com/show/7314/intel-baytrail-preview-intel-atom-z3770-tested
        • http://www.tomshardware.com/reviews/bay-trail-celeron-j1750-performance,3614-6.html
        • http://software.intel.com/en-us/forums/topic/476221
        • http://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors#.22Bay_Trail.22_.2822_nm.29
    • Page 29 NSIGHT Links
      • http://www.nvidia.com/object/nsight.html
      • https://developer.nvidia.com/nsight-visual-studio-edition-videos
      • https://developer.nvidia.com/developer-webinars
      • http://on-demand.gputechconf.com/supercomputing/2012/presentation/SB006-Goodwin-CUDA-Development-Nsight.pdf
      • http://on-demand.gputechconf.com/gtc/2013/presentations/S3011-CUDA-Optimization-With-Nsight-VSE.pdf