SlideShare a Scribd company logo
1 of 81
NVIDIA OpenGL in 2012
Talk
 S0023 - NVIDIA OpenGL for 2012

  Mark Kilgard (Principal Software Engineer, NVIDIA)

  Attend this session to get the most out of OpenGL on NVIDIA Quadro and GeForce GPUs. Topics covered include
  the latest advances available for Cg 3.1, the OpenGL Shading Language (GLSL); programmable tessellation;
  improved support for Direct3D conventions; integration with Direct3D and CUDA resources; bindless graphics;
  and more. When you utilize the latest OpenGL innovations from NVIDIA in your graphics applications, you
  benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard.

  Topic Areas: Computer Graphics; Development Tools & Libraries; Visualization; Audio, Image and Video
  Processing

  Level: Intermediate

  Day: Monday, 05/14
  Time: 9:00 am - 10:20 pm
  Location: Room A3
Mark Kilgard
• Principal System Software Engineer
  – OpenGL driver and API evolution
  – Cg (“C for graphics”) shading language
  – GPU-accelerated path rendering
• OpenGL Utility Toolkit (GLUT) implementer
• Author of OpenGL for the X Window System
• Co-author of Cg Tutorial
Outline
• OpenGL’s importance to NVIDIA
• OpenGL API improvements & new features
  –   OpenGL 4.2
  –   Direct3D interoperability
  –   GPU-accelerated path rendering
  –   Kepler Improvements
       • Bindless Textures
• Linux improvements & new features
• Cg 3.1 update
NVIDIA’s OpenGL Leverage

 Cg                  GeForce




                                  Parallel Nsight
      Tegra
                               Quadro


                   OptiX
Example of Hybrid Rendering with OptiX
          OpenGL (Rasterization)




           OptiX (Ray tracing)
Nsight Provides OpenGL Profiling

                                   Configure
                                   Application
                                   Trace
                                   Settings
Parallel Nsight Provides OpenGL Profiling




   Magnified trace options shows specific OpenGL (and Cg) tracing options
Parallel Nsight Provides OpenGL Profiling
Parallel Nsight Provides OpenGL Profiling




   Trace of mix of OpenGL and CUDA shows glFinish & OpenGL draw calls
Only Cross Platform 3D API
OpenGL 3D Graphics API
• cross-platform
• most functional
• peak performance
• open standard
• inter-operable
• well specified & documented
• 20 years of compatibility
OpenGL Spawns Closely Related Standards




 Congratulations: WebGL officially approved, February 2012
               “The web is now 3D enabled”
OpenGL 4 – DirectX 11 Superset                                       Buffer and
                                                                        Event
                                                                       Interop


• Interop with a complete compute solution
    – OpenGL is for graphics – CUDA / OpenCL is for compute


• Shaders can be saved to and loaded from binary blobs
    – Ability to query a binary shader, and save it for reuse later


• Flow of content between desktop and mobile
    – All of OpenGL ES 2.0 capabilities available on desktop
    – EGL on Desktop in the works
    – WebGL bridging desktop and mobile


• Cross platform
    – Mac, Windows, Linux, Android, Solaris, FreeBSD
    – Result of being an open standard
Increasing Functionality of OpenGL

           Tessellation and Compute Features
             4.X

                     Geometry Shaders
                      3.X

                             Vertex and Fragment Shaders
                              2.X
                                           Fixed Function
                                            1.X
Classic OpenGL State Machine
• From 1991-2007
   * vertex & fragment processing got programmable 2001 & 2003




           [source: GL 1.0 specification, 1992]
Complicated from inception
OpenGL 3.x Conceptual Processing Flow (pre-2010)
                                                                                                              uniform/
                                                                  primitive topology,
                                                                                                              parameters
                                                                     transformed                                                                Legend
                                                                      vertex data Geometric primitive
                                                                                                              buffer objects
     Vertex                                     Vertex
                                                                                         assembly &                                                     vertices
    assembly        primitive                 processing
                      batch                                          transformed         processing                                 programmable
                                                                                                                                                         pixels
                                                                                                                                      operations
                      type,                                          vertex
                                                                                                                                                         fragments
                   vertex data                                       attributes                                     point, line,
                                                                                                                    and polygon      fixed-function      filtered texels
                                                                                                 geometry                              operations
                                                             Transform                           texture            fragments                           buffer data
            vertex
                                          transform          feedback                            fetches
            buffer
            objects                       feedback
                                                                                                                   pixels in framebuffer object textures
                                          buffer
                                          objects                                                                                  stenciling, depth testing,
 primitive batch type,                                                        vertex
                                                                                                                                   blending, accumulation
 vertex indices,                                                             texture
 vertex attributes                            texture                       fetches
                                              buffer
                    buffer data,              objects             Texture                                 Fragment               Raster
                      unmap                                                                                                                           Framebuffer
                                                                  mapping                   fragment     processing            operations
                       buffer                                                                  texture
   Command                          Buffer                                                    fetches
    parser                          store               pixel
                    map buffer,                         pack
                     get buffer                         buffer
                       data                             objects                                                 image and bitmap
                                                                         texture
                                                                                                                fragments
                                pixel                                    image
           pixel image or     unpack             Pixel                   specification
           texture image       buffer
           specification                        packing
                              objects                                                                                          OpenGL 3.2
                                                        pixels to pack        image
                                                                              rectangles,
                                                                              bitmaps                      Image
                Pixel                            Pixel
                                                                                                          primitive
              unpacking unpacked              processing
                                 pixels
                                                                                                         processing                          copy pixels,
                                                                                                                                      copy texture image
Patch
     Control point                                                                      Patch tessellation                                 Patch evaluation
                                                   assembly &
      processing          transformed                                 transformed          generation               transformed              processing
                          control points           processing            transformed
                                                                      patch                                       patch, bivariate
tessellation     patch control                                           patch                                         domain
                 points                                                          patch topology, evaluated patch vertex
texture
fetches                                                                   primitive topology,
                                                                             transformed      Geometric primitive                                                  Legend
        Vertex                                           Vertex               vertex data                                                                                  patch data
                                                                                                     assembly &
       assembly            primitive                   processing                                                                                      programmable
                             batch                                           transformed             processing                                                             vertices
                                                                                                                                                         operations
                             type,                                           vertex                                                                                         pixels
                          vertex data                                        attributes                                             point, line,                            fragments
                                                                                                                                    and polygon         fixed-function
                                                                                                            geometry                                                        filtered texels
                                                                      Transform                                                     fragments             operations
                vertex                                                                                      texture
                                                  transform           feedback                              fetches                                                         buffer data
                buffer
                objects                           feedback                                                                      pixels in framebuffer object textures
                                                  buffer
 primitive batch type,                            objects                                                                                            stenciling, depth testing,
                                                                                        vertex
 vertex indices,                                                                       texture                                                       blending, accumulation
 vertex attributes                                    texture                         fetches
                                                      buffer
                           buffer data,               objects              Texture                                      Fragment                     Raster
                             unmap
                                                                                                                                                                         Framebuffer
                                                                           mapping                      fragment       processing                  operations
                              buffer                            pixel                                      texture
    Command                                 Buffer
                                                                pack                                      fetches
     parser                                 store               buffer
                           map buffer,
                            get buffer                          objects
                                                                                 texture                                     image and bitmap
                              data
                                          pixel                                                                              fragments
                                                                                 image
               pixel image or           unpack            Pixel
                                                                                 specification
               texture image             buffer          packing
               specification            objects
                                                                pixels to pack
                                                                                                                                               OpenGL 4.2
                                                                                       image
                                                                                       rectangles,                       Image
                    Pixel                                 Pixel                        bitmaps
                                                                                                                        primitive                                 copy pixels,
                  unpacking          unpacked          processing
                                       pixels
                                                                                                                       processing                          copy texture image
Accelerating OpenGL Innovation                                                                 NOW!

  2004          2005        2006         2007               2008           2009       2010      2011   2012



                              DirectX 10.0   DirectX 10.1                    DirectX 11
         DirectX 9.0c
                                                                                                  It’s
                                                              OpenGL 3.1      OpenGL 3.3
                                                                                   +                cooking!
OpenGL 2.0              OpenGL 2.1                OpenGL 3.0       OpenGL 3.2 OpenGL 4.0



                                                                                  OpenGL 4.1
• OpenGL increased pace of innovation
    - Expect 8 new spec versions in four years
    - Actual implementations following specifications closely
                                                                                             OpenGL 4.2
• OpenGL 4.2 is a superset of DirectX 11 functionality
    - While retaining backwards compatibility
What’s new in OpenGL 4.2?
• OpenGL 4.2 standardized August 2011
  – Immediately shipped by NVIDIA
  – All Kepler and Fermi GPUs support 4.2
• Standardizes:
  – Compressed RGBA texture
  – Shader atomic counters
  – Immutable texture images
  – Further Direct3Disms
  – OpenGL Shading Language (GLSL) 4.20
OpenGL 4.2
• Official compressed RGBA texture formats
  – GL_ARB_texture_compression_bptc
  – S3TC/DXTC was widely implemented as EXT, but was never
    an ARB standard
  – Now, there’s standard RGBA compressed format
• Pixel store modes for compressed block texture
  specification
  – ARB_compressed_texture_pixel_storage
  – New pixel store modes for dealing with sub-rectangle loads of
    blocked compressed texture formats
Details on BPTC
• Four new compressed formats
   – GL_COMPRESSED_RGBA_BPTC_UNORM
      • RGBA format, lower signal-to-noise than S3TC
      • [0,1] component range for low-dynamic range (LDR) images
   – GL_COMPRESSED_SRGB_ALPHA_BPTC_UNORM
      • sRGB version of GL_COMPRESSED_RGBA_BPTC_UNORM
   – GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT
      • Compressed high dynamic range (HDR) format
   – GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT
      • HDR format for magnitude (unsigned) component values
• Each has fixed compression ratio with 4x4 blocks
   – Much more intricate encoding than S3TC format (see specification)
Easy Adoption of BPTC Compression
• If you are loading uncompressed texture data
  – Just use BPTC internal texture format with glTexImage2D
  – Driver will compress the image for you
• And you can read back the compressed image
  – Then subsequent loads can use glTexCompressedTexImage2D
    to be faster
  – However, BPTC compression is hard & takes time to do well
    so driver’s compression is not optimal
     • Consider using nvidia-texture-tools Google code project’s compressor
• Same approach works for DXTC/S3TC
OpenGL 4.2 Compressed Texture Format Table
                                                Fixed lossy
                                        Block   Compression
Token name                              Size    ratio          Range
GL_COMPRESSED_RGBA_BPTC_UNORM           4x4     6:1 RGB        LDR


GL_COMPRESSED_SRGB_ALPHA_BPTC_UNORM     4x4     6:1 sRGB       LDR


GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT     4x4     3:1 for RGB    Unsigned
                                                4:1 for RGBA   HDR

GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT   4x4     3:1 for RGB    Signed
                                                4:1 for RGBA   HDR
OpenGL 4.2 Compressed Texture
Pixel Storage
• New glPixelStorei state settings
  – Unpack state
     • GL_UNPACK_COMPRESSED_BLOCK_WIDTH
     • GL_UNPACK_COMPRESSED_BLOCK_HEIGHT
     • GL_UNPACK_COMPRESSED_BLOCK_DEPTH
     • GL_UNPACK_COMPRESSED_BLOCK_SIZE
  – Pack state, similar just starting GL_PACK_ ...
• Allows sub rectangle update for
  glCompressedTexImage2D,
  glCompressedTexSubImage2D, etc.
OpenGL 4.2 Atomic Counters
• Shaders seek to write/read from buffers need a way to get a
  unique memory location across alls shader instances
   – Prior to atomic counters, no way to coordinate such reads or writes
• New: atomic counters
   – Allow GLSL shaders to write at unique offset within a buffer object
   – Also allow GLSL shader to read at unique offset within a buffer object
   – Counter value stored in a buffer
       • Updated via glBuferSubData, glMapBuffer, etc.
• GLSL declaration of opaque object
   – layout ( binding = 2, offset = 0 ) uniform atomic_uint variable;
• GLSL usage
   – uint prior_value = atomicCounterIncrement(atomic_uint counter);
OpenGL 4.2 Atomic Counters
• More GLSL usage of atomic counters
  – uint updated_value =
      atomicCounterDecrement(atomic_uint counter);
  – uint current_value = atomicCounter(atomic_uint counter);
• Using
  – #extension GL_ARB_shader_atomic_counters : enable
• Expected usage
  – Atomic values are offsets for loads and store operations
  – Using GLSL imageLoad & imageStore built-in functions
  – Also part of OpenGL 4.2
In OpenGL 4.2, Shaders allowed Side-effects

• Very exciting development
  – Prior to 4.2, standard GLSL shaders had their
    explicit outputs and that was it
• In OpenGL 4.2
  – Loads and stores to images and buffers allowed from
    shaders
  – Use atomic operations or atomic counters to
    synchronize those updates
  – You need to manage any required ordering though
OpenGL 4.2 Immutable Texture Storage
• OpenGL allows flexible layout of mipmap levels with a
  texture object
• New commands gives a frozen mipmap layout
  – glTexStorage1D, glTexStorage2D, glTexStorage3D
     • Free of selector
  – glTextureStorage1DEXT, glTextureStorage2DEXT,
    glTextureStorage3DEXT
     • Explicit texture parameter
• Allow driver to assume layout of texture immutable
  – Still can update texels, just not layout
Direct3Dism Concept
• Allows your 3D content to be API agnostic
   – OpenGL supports both OpenGL & Direct3D conventions, so support either style

                                                      OpenGL
Your OpenGL                        Your                 API         OpenGL
 application                      OpenGL
                   content                            OpenGL         driver
  content         authored       application         + Direct3D
                 to OpenGL                          conventions
                conventions

                                                          3D API           hardware
                            Direct3D conventions        interface          interface   GPU
                          supported by OpenGL too


Your Direct3D                       Your             Direct3D
                                                        API         Direct3D
 application                      Direct3D
                   content                            Direct3D       driver
  content         authored       application        conventions
                 to Direct3D
                conventions
Further Direct3Disms in OpenGL 4.2
• Direct3Disms = semantic compatibility with Direct3D
  behaviors
  – Not just same functionality as Direct3D—but same
    functionality with Direct3D-style semantics
• Alignment constraints for mapped buffers
  – ARB_buffer_alignment
  – Makes sure mapped buffers are aligned for SIMD CPU
    instruction access
  – Really just a query for minimum alignment
     • Expect alignment to be no smaller than 64 bytes
OpenGL 4.2 Base Instance
• Extension on instanced rendering
  – For drawing multiple instances of geometry
     • Designed to be compatible with Direct3D
  – See ARB_instanced_arrays
• Allows the specification of
  – base vertex
  – base instance
  – for each instanced draw
• Generalizes instanced rendering
OpenGL 4.2 GLSL Features
• Conservative Depth
   – Permits depth/stencil tests before shading
   – Re-declares gl_FragDepth to restrict depth changes
• Allows restrictions
   – New depth can only increase depth
      • layout (depth_greater) out float gl_FragDepth;
   – New depth can only decrease depth
      • layout (depth_less) out float gl_FragDepth;
• To use
   – #extension GL_ARB_conservative_depth : enable
OpenGL 4.2 GLSL Features
• Miscellaneous GLSL changes
   – Allow line-continuation character using ‘’
   – Use UTF8 for character set, for comments
   – Implicit conversions of return values
   – const keyword allowed on variables with functions
   – No strict order of qualifiers
   – Adds layout qualifier binding for uniform blocks
   – C-style aggregate initializers
   – Allow .length method to return number of components or columns in
     vectors and matrices
   – Allow swizzles on scalars
   – New built-in constants for gl_MinProgramTexelOffset and Max
• All minor improvements for consistency with C/C++ and minimize
  aggravation
OpenGL-related
Linux Improvements
 Support for X Resize, Rotate, and Reflect Extension
    Also known as RandR
    Version 1.2 and 1.3
 OpenGL enables, by default, “Sync to Vertical Blank”
    Locks your glXSwapBuffers to the monitor refresh rates
    Matches Windows default now
    Previously disabled by default
OpenGL-related
Linux Improvements
• Expose additional full-scene antialiasing (FSAA) modes
  – 16x multisample FSAA on all GeForce GPUs
     • 2x2 supersampling of 4x multisampling
  – Ultra high-quality FSAA modes for Quadro GPUs
     • 32x multisample FSAA
        – 2x2 supersampling of 8x multisampling
     • 64x multisample FSAA
        – 4x4 supersampling of 4x multisampling

• Coverage sample FSAA on GeForce 8 series and better
  – 4 color/depth samples + 12 depth samples
Multisampling FSAA Patterns

    aliased      2x multisampling    4x multisampling      8x multisampling
1 sample/pixel    2 samples/pixel     4 samples/pixel       8 samples/pixel




64 bits/pixel      128 bits/pixel      256 bits/pixel       512 bits/pixel




     Assume: 32-bit RGBA + 24-bit Z + 8-bit Stencil = 64 bits/sample
Supersampling FSAA Patterns
2x2 supersampling       2x2 supersampling        4x4 supersampling
of 4x multisampling     of 8x multisampling      of 16x multisampling
16 samples/pixel        32 samples/pixel         64 samples/pixel




   1024 bits/pixel        2048 bits/pixel           4096 bits/pixel


                                      Quadro GPUs

  Assume: 32-bit RGBA + 24-bit Z + 8-bit Stencil = 64 bits/sample
NVIDIA X Server Settings for Linux
Control Panel
GLX Protocol
• Network transparent OpenGL
  – Run OpenGL app on one machine, display the X and
    3D on a different machine

     3D app                                 X server

                                               GLX
        OpenGL                                Server

               GLX
              Client                             OpenGL



                       network connection
OpenGL-related
Linux Improvements
     Official GLX Protocol support for OpenGL extensions
•   GL_ARB_half_float_pixel        •   GL_EXT_point_parameters
•   GL_ARB_transpose_matrix        •   GL_EXT_stencil_two_side
•   GL_EXT_blend_equation_separate •   GL_NV_copy_image
•   GL_EXT_depth_bounds_test       •   GL_NV_half_float
•   GL_EXT_framebuffer_blit        •   GL_NV_occlusion_query
•   GL_EXT_framebuffer_multisample •   GL_NV_point_sprite
•   GL_EXT_packed_depth_stencil    •   GL_NV_register_combiners2
OpenGL-related
Linux Improvements
     Tentative GLX Protocol support for OpenGL extensions
•   GL_ARB_map_buffer_range      • GL_EXT_vertex_attrib_64bit
•   GL_ARB_shader_subroutine     • GL_NV_conditional_render
•   GL_ARB_stencil_two_side      • GL_NV_framebuffer_multisample
•   GL_EXT_texture_integer         _coverage
•   GL_EXT_transform_feedback2   • GL_NV_texture_barrier
                                 • GL_NV_transform_feedback2
Synchronizing X11-based OpenGL Streams
• New extension
  – GL_EXT_x11_sync_object
• Bridges the X Synchronization Extension with
  OpenGL 3.2 “sync” objects (ARB_sync)
• Introduces new OpenGL command
  – GLintptr sync_handle;
  – GLsync glImportSyncEXT (GLenum external_sync_type,
    GLintptr external_sync, GLbitfield flags);
     • external_sync_type must be GL_SYNC_X11_FENCE_EXT
     • flags must be zero
Other Linux Updates
• GL_CLAMP behaves in conformant way now
  – Long-standing work around for original Quake 3
• Enabled 10-bit per component X desktop support
  – GeForce 8 and better GPUs
• Support for 3D Vision Pro stereo now
What is 3D Vision Pro?
• For Professionals
• All of 3D Vision support, plus
   •   Radio frequency (RF) glasses,
       Bidirectional
   •   Query compass, accelerometer,
       battery
   •   Many RF channels – no collision
   •   Up to ~120 feet
   •   No line of sight needed to
       emitter
   •   NVAPI to control
OpenGL Inter-GPU Communication
                                            PCI Express
• NV_copy_image
  extension                                                   GPU1
                           GPU0   destRC             srcRC
   – Quadro-only            Compute/                       Compute/
                                         DMA       DMA
                             Render                         Render
   – (Asynchronous) copy     Engine     Engine    Engine
                                                            Engine
     between GPU’s
   – Does not require        GPU                             GPU
     binding textures or     Memory                          Memory
                                       destTex    srcTex
     state changes
   – App handles           wglCopyImageSubDataNV(srcRC, srcTex,
                             GL_TEXTURE_2D,0, 0, 0, 0, destRC, destTex,
     synchronization         GL_TEXTURE_2D, 0, 0, 0, 0,
                             width, height, 1);
OpenGL Interoperability




                          Multi-GPU
DirectX / OpenGL Interoperability
           Application                 • DirectX interop is a GL API extension
                                         (WGL_NVX_dx_interop)
                                       • Premise: create resources in DirectX,
                                         get access to them within OpenGL
 OpenGL                      DirectX
  driver                      driver   • Read/write support for DirectX 9
                                           textures, render targets and vertex
                                           buffers
                                       •   Implicit synchronization is done
    Texture / Vertex buffer object         between Direct3D and OpenGL
                data                   •   Soon: DirectX 10/11 support

               GPU

              Display
DirectX / OpenGL Interoperability

// create Direct3D device and resources the regular way
direct3D->CreateDevice(..., &dxDevice);

dxDevice->CreateRenderTarget(width, height, D3DFMT_A8R8G8B8,
                             D3DMULTISAMPLE_4_SAMPLES, 0,
                             FALSE, &dxColorBuffer, NULL);

dxDevice->CreateDepthStencilSurface(width, height, D3DFMT_D24S8,
                                    D3DMULTISAMPLE_4_SAMPLES, 0,
                                    FALSE, &dxDepthBuffer, NULL);
DirectX / OpenGL Interoperability

// Register DirectX device for OpenGL interop
HANDLE gl_handleD3D = wglDXOpenDeviceNVX(dxDevice);

// Register DirectX render targets as GL texture objects
GLuint names[2];
HANDLE handles[2];

handles[0] = wglDXRegisterObjectNVX(gl_handleD3D, dxColorBuffer,
        names[0], GL_TEXTURE_2D_MULTISAMPLE, WGL_ACCESS_READ_WRITE_NVX);
handles[1] = wglDXRegisterObjectNVX(gl_handleD3D, dxDepthBuffer,
       names[0], GL_TEXTURE_2D_MULTISAMPLE, WGL_ACCESS_READ_WRITE_NVX);

// Now textures can be used as normal OpenGL textures
DirectX / OpenGL Interoperability

// Rendering example: DirectX and OpenGL rendering to the
// same render target
direct3d_render_pass(); // D3D renders to the render targets as usual

// Lock the render targets for GL access
wglDXLockObjectsNVX(handleD3D, 2, handles);

opengl_render_pass(); // OpenGL renders using the textures as render
                      // targets (e.g., attached to an FBO)

wglDXUnlockObjectsNVX(handleD3D, 2, handles);

direct3d_swap_buffers(); // Direct3D presents the results on the screen
CUDA Interoperability example
           Application             • Interop APIs are extensions to OpenGL
                                     and CUDA
                                   • Multi-card interop 2x faster on Quadro /
                                       Tesla. Transfer between cards, minimal
 OpenGL                  CUDA          CPU involvement
  driver                 driver
                                   •   GeForce requires CPU copy


 Quadro        fast       Tesla
   or                       or
             Interop
 GeForce                 GeForce

 Display
Multi-GPU OpenGL Interoperability
            Application                • Interop using NV_copy_image
                                         OpenGL extension
           GPU affinity                • Transfer directly between cards,
                                         minimal CPU involvement
 OpenGL                    OpenGL
  driver                    driver     • Quadro only

 Quadro          fast      Quadro
                          Off-screen
                Interop   buffer


 Display
What is path rendering?
• A rendering approach
   –   Resolution-independent two-dimensional
       graphics
   –   Occlusion & transparency depend on rendering
       order
        •     So called “Painter’s Algorithm”
   –   Basic primitive is a path to be filled or stroked
        •     Path is a sequence of path commands
        •     Commands are
               – moveto, lineto, curveto, arcto, closepath, etc.
• Standards
   –   Content: PostScript, PDF, TrueType fonts,
       Flash, Scalable Vector Graphics (SVG), HTML5
       Canvas, Silverlight, Office drawings
   –   APIs: Apple Quartz 2D, Khronos OpenVG,
       Microsoft Direct2D, Cairo, Skia, Qt::QPainter,
       Anti-grain Graphics,
What is NV_path_rendering?

•   OpenGL extension to GPU-accelerate path rendering
•   Uses “stencil, then cover” (StC) approach
     – Create a path object
     – Step 1: “Stencil” the path object into the stencil buffer
          •   GPU provides fast stenciling of filled or stroked paths
     – Step 2: “Cover” the path object and stencil test against its coverage stenciled
       by the prior step
          •   Application can configure arbitrary shading during the step
     – More details later
•   Supports the union of functionality of all major path rendering standards
     – Includes all stroking embellishments
     – Includes first-class text and font support
     – Allows this functionality to mix with traditional 3D and programmable shading
Pixel pipeline                          Vertex pipeline          Path pipeline

                        Application
                                                               Path specification


Pixel assembly                          Vertex assembly         Transform path
    (unpack)
                                       Vertex operations
                   transform
                    feedback
                                      Primitive assembly

Pixel operations                      Primitive operations   Fill/Stroke
                                                              Covering


  Pixel pack                             Rasterization
   read            Texture
                                      Fragment operations
   back            memory
                                                                   Fill/Stroke
  Application                          Raster operations           Stenciling

                                         Framebuffer         Display
Talk on Tuesday
• “GPU-Accelerated Path
  Rendering”
  – Tuesday, May 15
  – 14:00-14:50, Room A3
• Teaser scene
Bindless Graphics
• Problem: Binding to different objects (textures,
  buffers) takes a lot of validation time in driver
  – And applications are limited to a small palette of
    bound buffers and textures
  – Approach of OpenGL, but also Direct3D
• Solution: Exposes GPU virtual addresses
  – Let shaders and vertex puller access buffer and
    texture memory by its virtual address!
Prior to Bindless Graphics
• Traditional OpenGL
  – GPU memory reads are “indirected” through
    bindings
    • Limited number of texture units and vertex array
      attributes
  – glBindTexture—for texture images and buffers
  – glBindBuffer—for vertex arrays
Buffer-centric Evolution
• Data moves onto GPU, away from CPU
      – Apps on CPUs just too slow at moving data otherwise
    Array Element Buffer       glBegin, glDrawElements, etc.

       Object (VeBO)                                                 Texture Buffer
                                                                     Object (TexBO)
Vertex Array Buffer Object
                                        Vertex Puller
         (VaBO)                                                                       texel data


    Transform Feedback
       Buffer (XBO)                   Vertex Shading                                                    Pixel Unpack
                                                                                                        Buffer (PuBO)
                 vertex data                                             Texturing
                                   Geometry Shading                  glDrawPixels, glTexImage2D, etc.
 Parameter Buffer
  Object (PaBO)                                                                       glReadPixels,
                                                                                                        Pixel Pack Buffer
                                                                                               etc.
                                          Fragment                         Pixel
                                                                                                             (PpBO)
   Uniform Buffer                         Shading                         Pipeline
                                                                                                               pixel data
   Object (UBO)
parameter data
                                                           Framebuffer
Kepler – Bindless Textures
  • Enormous increase in the number of unique textures
    available to shaders at run-time
  • More different materials and richer texture detail in a
    scene
                texture #0
Shader code     texture #1                    Shader code
                texture #2
                   …
               texture #127
                                                   …

     Pre-Kepler texture binding model     Kepler bindless textures
                                        over 1 million unique textures
Kepler – Bindless Textures
Pre-Kepler texture binding model        Kepler bindless textures

CPU                                   CPU
  Load texture A                        Load textures A, B, C
  Load texture B                        Draw()
  Load texture C
  Bind texture A to slot I                 GPU
  Bind texture B to slot J                   Read from texture A
  Draw()                                     Read from texture B
                                             Read from texture C
     GPU
       Read from texture at slot I
       Read from texture at slot J

CPU
  Bind texture C to slot K            Bindless model reduces CPU
  Draw()
                                      overhead and improves GPU access
      GPU                             efficiency
        Read from texture at slot K
Bindless Textures

  • Apropos for ray-tracing and advanced rendering
    where textures cannot be “bound” in advance



Shader code
Bindless performance benefit




   Numbers obtained with a directed test
More Information on Bindless Texture
• Kepler has new NV_bindless_texture extension
  – Texture companion to
      • NV_vertex_buffer_unified_memory for bindless vertex
        arrays
      • NV_shader_buffer_load for bindless shader buffer reads
• API specification publically available
  – http://developer.download.nvidia.com/opengl/specs/GL_NV_bindless_texture.txt
API Usage to Initialize Bindless Texture
• Make a conventional OpenGL texture object
  – With a 32-bit GLuint name
• Query a 64-bit texture handle from 32-bit texture name
  – GLuint64 glGetTextureHandleNV(GLuint);
• Make handle resident in context’s GPU address space
  – void glMakeTextureHandleResidentNV(GLuint64);
Writing GLSL for Bindless Textures
• Request GLSL to understand bindless textures
  – #version 400 // or later
  – #extension GL_NV_bindless_texture : require
• Declare a sampler in the normal way
  – in sampler2D bindless_texture;
• Alternatively, access bindless samplers in big array:
  – uniform Samplers {
      sampler2D lotsOfSamplers[256];
    }
  – Exciting: 256 samplers exceeds the available texture units!
Update Sampler Uniforms with
Bindless Texture Handle
• Get a location for a sampler or image uniform
  – GLint loc = glGetUniformLocation(program,
    “bindless_texture”);
  – GLint loc_array = glGetUniformLocation(program,
    “lotsOfSamplers”);
• Then set sampler to the bindless texture handle
  – glProgramUniformHandleui64NV(program, location,
    1, &bindless_handle);
Seam-free Cube Map Edges
• Cube maps have edges along each face
   – Traditionally texture mapping hardware
     simply clamps to these seam edges
• Results in “seam” artifacts
   – Particularly when level-of-detail bias is
     large
       • Meaning very blurry levels
       • But seams appear sharply
                                                 seam
• Use
  glEnable( GL_TEXTURE_CUBE_MAP_SEA
  MLESS)
  to mitigate these artifacts on
  context-wide basis
   – Applies to all cube maps
Seamless Cube Maps: Before and After
 • Before: with edge seams   • After: without




     seams
Kepler Provides
Per-texture Seamless Cube Map Support
• Kepler GPUs support
  – GL_AMD_seamless_cubemap_per_texture
  – Provides per-texture object parameters
    • glTexParameteriv(GL_TEXTURE_2D,
         GL_TEXTURE_CUBE_MAP_SEAMLESS_ARB, GL_TRUE);
  – Not sure if you want a mix of seamless and seamed
    cube maps…
    • …but supported anyway
Cg Toolkit 3.1 Update
• Cg 3.0 = big upgrade
   – Introduced Shader Model 5.0 support for OpenGL
       • New gp5vp, gp5gp, and gp5fp profiles correspond to
         Shader Model 5.0 vertex, geometry, and fragment profiles
          – Compiles to assembly text for the NV_gpu_program5 assembly-level
            shading language
   – Also programmable tessellation profiles
       • gp5tcp = NV_gpu_program5 tessellation control program
       • gp5tep = NV_gpu_program5 tessellation evaluation program
   – Also DirectX 11 profiles
• Cg 3.1 refines 3.0 functionality
   – Bug fixes, constant buffer support, better GLSL support
Cg 3.1 OpenGL Shader Model 5.0 Profiles
                 indices        primitive                                         rectangular &              indices             primitive
                               topology &                                           triangluar                                  topology &
                             in-band vertex                                            patch                                  in-band vertex
                                attributes                                                              index to vertex          attributes
             index to vertex
                                                                                                           attribute
                attribute
                                                                                               vertex       puller
                 puller                                                                                                   vertex
                                                                                           attributes                     attributes
                                                             control point                                                                      vertex
                vertex                                         (vertex)                                                                         shader
            attributes                         vertex           shader
                             gp4vp             shader                             gp5vp                                        gp5vp
                                                                                 control
                                      assembled                                  points     tessellation                                assembled
                                                                                                                                         primitive
                                      primitive            level-of-detail                      control (hull)
                                                           parameters                            shader                                         geometry
                                               geometry
                             gp4gp              shader
                                                                                  gp5tcs                                       gp5gp             shader

                                                           tessellation                      patch control
                                     primitive stream                                                                                   primitive stream
                                                            generator                        points &
programmable                   clipping,                                                     parameters                           clipping,
                               setup, &                                                                                           setup, &
                             rasterization                                                                                      rasterization
fixed-function
                                      fragments                                   gp5tes       tessellation
                                                                                                                                         fragments
                                                            fine primitive
                                                                                                                                                fragment
                                                            topology                           evaluation
                              gp4fp            fragment     (u, v) coordinates
                                                                                           (domain) shader                      gp5fp             shader
                                                 shader
Cg 2.x support                                            Cg 3.x adds programmable tessellation
Cg 3.1 profiles
                  OpenGL         OpenGL
 Shader           Shader Model   Shader Model   OpenGL Shading
 domain           4.0            5.0            Language (GLSL)   DirectX 10   DirectX 11
 Vertex (or
 control point)     gp4vp          gp5vp            glslv         vs_4_0        vs_5_0
 Tessellation
 control (hull)       n/a          gp5tcp            n/a            n/a         hs_5_0
 Tessellation         n/a                            n/a
 evaluation                       gp5tep                            n/a         ds_5_0
 (domain)

 Geometry           gp5gp          gp5gp            glslg         gs_4_0        gs_5_0

 Fragment           gp4fp          gp5fp            glslf         ps_4_0        ps_5_0
Cg 3.1 Update
• Various improvements & bug fixes
  – Constant buffer fixes
  – GLSL translation for different GLSL versions
• Support for CENTROID, FLAT, and
  NOPERSPECTIVE fragment program interpolants
• Deprecated functionality
  – Removed Direct3D 8 support
  – Minimum Mac OS X version: 10.5 (Leopard)
Cg Off-line Compiler
Compiles GLSL to Assembly
•   GLSL is “opaque” about the assembly generated
     – Hard to know the quality of the code generated with the driver
     – Cg Toolkit 3.0 is a useful development tool even for GLSL programmers
•   Cg Toolkit’s cgc command-line compiler actually accepts both the Cg (cgc’s
    default) and GLSL (with –oglsl flag) languages
     – Uses the same compiler technology used to compile GLSL within the driver
•   Example command line for compiling earlier tessellation control and
    evaluation GLSL shaders
     – cgc -profile gp5tcp -oglsl -po InputPatchSize=3 -po PATCH_3 filename.glsl
       (assumes 3 input control points and 3 outputs points)
     – cgc -profile gp5tep -oglsl -po PATCH_3 filename.glsl (assumes 3 input control
       points)
Questions?
Other OpenGL-related Sessions at GTC
• S0024: GPU-Accelerated Path Rendering
   – Tuesday, 14:00, Room A3
• S0049: Using the GPU Direct for Video API
   – Tuesday, 15:00, Room J8
• S0356: Optimized Texture Transfers
   – Tuesday, 16:00, Room J2
• S0267A: Mixing Graphics and Compute with Multiple GPUs
   – Tuesday, 17:00, Room J2
• S0353: Programming Multi-GPUs for Scalable Rendering
   – Wednesday, 9:00, Room A1
• S0267B - Mixing Graphics and Compute with Multiple GPUs
   – Thursday, 17:30, Room L

More Related Content

What's hot

CS 354 GPU Architecture
CS 354 GPU ArchitectureCS 354 GPU Architecture
CS 354 GPU ArchitectureMark Kilgard
 
Modern OpenGL Usage: Using Vertex Buffer Objects Well
Modern OpenGL Usage: Using Vertex Buffer Objects Well Modern OpenGL Usage: Using Vertex Buffer Objects Well
Modern OpenGL Usage: Using Vertex Buffer Objects Well Mark Kilgard
 
Programming with NV_path_rendering: An Annex to the SIGGRAPH Asia 2012 paper...
Programming with NV_path_rendering:  An Annex to the SIGGRAPH Asia 2012 paper...Programming with NV_path_rendering:  An Annex to the SIGGRAPH Asia 2012 paper...
Programming with NV_path_rendering: An Annex to the SIGGRAPH Asia 2012 paper...Mark Kilgard
 
OpenGL Shading Language
OpenGL Shading LanguageOpenGL Shading Language
OpenGL Shading LanguageJungsoo Nam
 
Migrating from OpenGL to Vulkan
Migrating from OpenGL to VulkanMigrating from OpenGL to Vulkan
Migrating from OpenGL to VulkanMark Kilgard
 
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingSIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingMark Kilgard
 
OpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUsOpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUsMark Kilgard
 
NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016Mark Kilgard
 
CS 354 Introduction
CS 354 IntroductionCS 354 Introduction
CS 354 IntroductionMark Kilgard
 
CS 354 Vector Graphics & Path Rendering
CS 354 Vector Graphics & Path RenderingCS 354 Vector Graphics & Path Rendering
CS 354 Vector Graphics & Path RenderingMark Kilgard
 
NV_path_rendering Frequently Asked Questions
NV_path_rendering Frequently Asked QuestionsNV_path_rendering Frequently Asked Questions
NV_path_rendering Frequently Asked QuestionsMark Kilgard
 
GPU accelerated path rendering fastforward
GPU accelerated path rendering fastforwardGPU accelerated path rendering fastforward
GPU accelerated path rendering fastforwardMark Kilgard
 
CS 354 Viewing Stuff
CS 354 Viewing StuffCS 354 Viewing Stuff
CS 354 Viewing StuffMark Kilgard
 
CS 354 Programmable Shading
CS 354 Programmable ShadingCS 354 Programmable Shading
CS 354 Programmable ShadingMark Kilgard
 
GPU-accelerated Path Rendering
GPU-accelerated Path RenderingGPU-accelerated Path Rendering
GPU-accelerated Path RenderingMark Kilgard
 
Avoiding 19 Common OpenGL Pitfalls
Avoiding 19 Common OpenGL PitfallsAvoiding 19 Common OpenGL Pitfalls
Avoiding 19 Common OpenGL PitfallsMark Kilgard
 
vkFX: Effect(ive) approach for Vulkan API
vkFX: Effect(ive) approach for Vulkan APIvkFX: Effect(ive) approach for Vulkan API
vkFX: Effect(ive) approach for Vulkan APITristan Lorach
 
Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsHolger Gruen
 

What's hot (20)

CS 354 GPU Architecture
CS 354 GPU ArchitectureCS 354 GPU Architecture
CS 354 GPU Architecture
 
Modern OpenGL Usage: Using Vertex Buffer Objects Well
Modern OpenGL Usage: Using Vertex Buffer Objects Well Modern OpenGL Usage: Using Vertex Buffer Objects Well
Modern OpenGL Usage: Using Vertex Buffer Objects Well
 
OpenGL for 2015
OpenGL for 2015OpenGL for 2015
OpenGL for 2015
 
Programming with NV_path_rendering: An Annex to the SIGGRAPH Asia 2012 paper...
Programming with NV_path_rendering:  An Annex to the SIGGRAPH Asia 2012 paper...Programming with NV_path_rendering:  An Annex to the SIGGRAPH Asia 2012 paper...
Programming with NV_path_rendering: An Annex to the SIGGRAPH Asia 2012 paper...
 
OpenGL Shading Language
OpenGL Shading LanguageOpenGL Shading Language
OpenGL Shading Language
 
Migrating from OpenGL to Vulkan
Migrating from OpenGL to VulkanMigrating from OpenGL to Vulkan
Migrating from OpenGL to Vulkan
 
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingSIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
 
OpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUsOpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUs
 
NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016
 
CS 354 Introduction
CS 354 IntroductionCS 354 Introduction
CS 354 Introduction
 
CS 354 Vector Graphics & Path Rendering
CS 354 Vector Graphics & Path RenderingCS 354 Vector Graphics & Path Rendering
CS 354 Vector Graphics & Path Rendering
 
NV_path_rendering Frequently Asked Questions
NV_path_rendering Frequently Asked QuestionsNV_path_rendering Frequently Asked Questions
NV_path_rendering Frequently Asked Questions
 
GPU accelerated path rendering fastforward
GPU accelerated path rendering fastforwardGPU accelerated path rendering fastforward
GPU accelerated path rendering fastforward
 
CS 354 Viewing Stuff
CS 354 Viewing StuffCS 354 Viewing Stuff
CS 354 Viewing Stuff
 
CS 354 Programmable Shading
CS 354 Programmable ShadingCS 354 Programmable Shading
CS 354 Programmable Shading
 
GPU-accelerated Path Rendering
GPU-accelerated Path RenderingGPU-accelerated Path Rendering
GPU-accelerated Path Rendering
 
Beyond porting
Beyond portingBeyond porting
Beyond porting
 
Avoiding 19 Common OpenGL Pitfalls
Avoiding 19 Common OpenGL PitfallsAvoiding 19 Common OpenGL Pitfalls
Avoiding 19 Common OpenGL Pitfalls
 
vkFX: Effect(ive) approach for Vulkan API
vkFX: Effect(ive) approach for Vulkan APIvkFX: Effect(ive) approach for Vulkan API
vkFX: Effect(ive) approach for Vulkan API
 
Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked Lists
 

Viewers also liked

GTC 2012 Jen-Hsun Huang Keynote
GTC 2012 Jen-Hsun Huang KeynoteGTC 2012 Jen-Hsun Huang Keynote
GTC 2012 Jen-Hsun Huang KeynoteNVIDIA
 
Eternal Engineering Equipment (p.) Ltd., Pune, Vibration Laboratory
 Eternal Engineering Equipment (p.) Ltd., Pune, Vibration Laboratory Eternal Engineering Equipment (p.) Ltd., Pune, Vibration Laboratory
Eternal Engineering Equipment (p.) Ltd., Pune, Vibration LaboratoryIndiaMART InterMESH Limited
 
Interactive Rendering Techniques for Highlighting (3D GeoInfo 2010)
Interactive Rendering Techniques for Highlighting (3D GeoInfo 2010)Interactive Rendering Techniques for Highlighting (3D GeoInfo 2010)
Interactive Rendering Techniques for Highlighting (3D GeoInfo 2010)Matthias Trapp
 
Milestone Highway Construction
Milestone Highway ConstructionMilestone Highway Construction
Milestone Highway Constructioncoreycarr
 
Rancon Foundry: industrial parts supplier in Europe
Rancon Foundry: industrial parts supplier in EuropeRancon Foundry: industrial parts supplier in Europe
Rancon Foundry: industrial parts supplier in EuropeRancon
 
Global Power Systems Pvt. Ltd., Nagpur, Generators Set & Controller
Global Power Systems Pvt. Ltd., Nagpur, Generators Set & ControllerGlobal Power Systems Pvt. Ltd., Nagpur, Generators Set & Controller
Global Power Systems Pvt. Ltd., Nagpur, Generators Set & ControllerIndiaMART InterMESH Limited
 

Viewers also liked (11)

NvFX GTC 2013
NvFX GTC 2013NvFX GTC 2013
NvFX GTC 2013
 
GTC 2012 Jen-Hsun Huang Keynote
GTC 2012 Jen-Hsun Huang KeynoteGTC 2012 Jen-Hsun Huang Keynote
GTC 2012 Jen-Hsun Huang Keynote
 
Machinery industrial parts
Machinery industrial partsMachinery industrial parts
Machinery industrial parts
 
JMM Machining & Construction of parts
JMM Machining & Construction of partsJMM Machining & Construction of parts
JMM Machining & Construction of parts
 
Eternal Engineering Equipment (p.) Ltd., Pune, Vibration Laboratory
 Eternal Engineering Equipment (p.) Ltd., Pune, Vibration Laboratory Eternal Engineering Equipment (p.) Ltd., Pune, Vibration Laboratory
Eternal Engineering Equipment (p.) Ltd., Pune, Vibration Laboratory
 
Interactive Rendering Techniques for Highlighting (3D GeoInfo 2010)
Interactive Rendering Techniques for Highlighting (3D GeoInfo 2010)Interactive Rendering Techniques for Highlighting (3D GeoInfo 2010)
Interactive Rendering Techniques for Highlighting (3D GeoInfo 2010)
 
Regulators and rectifiers
Regulators and rectifiersRegulators and rectifiers
Regulators and rectifiers
 
Milestone Highway Construction
Milestone Highway ConstructionMilestone Highway Construction
Milestone Highway Construction
 
Rancon Foundry: industrial parts supplier in Europe
Rancon Foundry: industrial parts supplier in EuropeRancon Foundry: industrial parts supplier in Europe
Rancon Foundry: industrial parts supplier in Europe
 
Global Power Systems Pvt. Ltd., Nagpur, Generators Set & Controller
Global Power Systems Pvt. Ltd., Nagpur, Generators Set & ControllerGlobal Power Systems Pvt. Ltd., Nagpur, Generators Set & Controller
Global Power Systems Pvt. Ltd., Nagpur, Generators Set & Controller
 
Animation basics
Animation basicsAnimation basics
Animation basics
 

Similar to GTC 2012: NVIDIA OpenGL in 2012

Casing3d opengl
Casing3d openglCasing3d opengl
Casing3d openglgowell
 
2D Games to HPC
2D Games to HPC2D Games to HPC
2D Games to HPCDVClub
 
Incremental pattern matching in the VIATRA2 model transformation framework
Incremental pattern matching in the VIATRA2 model transformation frameworkIncremental pattern matching in the VIATRA2 model transformation framework
Incremental pattern matching in the VIATRA2 model transformation frameworkIstvan Rath
 
Turmeric SOA Cloud Mashups
Turmeric SOA Cloud MashupsTurmeric SOA Cloud Mashups
Turmeric SOA Cloud Mashupskingargyle
 
Modern Graphics Pipeline Overview
Modern Graphics Pipeline OverviewModern Graphics Pipeline Overview
Modern Graphics Pipeline Overviewslantsixgames
 
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...Daniele Gianni
 
libHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservationlibHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservationSoftwarePractice
 
Geo alberta2010 ppt_template
Geo alberta2010 ppt_templateGeo alberta2010 ppt_template
Geo alberta2010 ppt_templatePaul Bekker
 
Seattle Scalability - GigaSpaces / Cassandra
Seattle Scalability - GigaSpaces / CassandraSeattle Scalability - GigaSpaces / Cassandra
Seattle Scalability - GigaSpaces / Cassandraclive boulton
 
Overview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle ModelOverview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle ModelSoftwarePractice
 
NIAR_VRC_2010
NIAR_VRC_2010NIAR_VRC_2010
NIAR_VRC_2010fftoledo
 
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012Big Science, Big Data: Simon Metson at Eduserv Symposium 2012
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012Eduserv
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU deviceKohei KaiGai
 
GeForce 8800 OpenGL Extensions
GeForce 8800 OpenGL ExtensionsGeForce 8800 OpenGL Extensions
GeForce 8800 OpenGL Extensionsicastano
 

Similar to GTC 2012: NVIDIA OpenGL in 2012 (20)

GPU - how can we use it?
GPU - how can we use it?GPU - how can we use it?
GPU - how can we use it?
 
Casing3d opengl
Casing3d openglCasing3d opengl
Casing3d opengl
 
3 d to _hpc
3 d to _hpc3 d to _hpc
3 d to _hpc
 
3 d to_hpc
3 d to_hpc3 d to_hpc
3 d to_hpc
 
2D Games to HPC
2D Games to HPC2D Games to HPC
2D Games to HPC
 
Incremental pattern matching in the VIATRA2 model transformation framework
Incremental pattern matching in the VIATRA2 model transformation frameworkIncremental pattern matching in the VIATRA2 model transformation framework
Incremental pattern matching in the VIATRA2 model transformation framework
 
Ayse
AyseAyse
Ayse
 
Os Raysmith
Os RaysmithOs Raysmith
Os Raysmith
 
Turmeric SOA Cloud Mashups
Turmeric SOA Cloud MashupsTurmeric SOA Cloud Mashups
Turmeric SOA Cloud Mashups
 
Modern Graphics Pipeline Overview
Modern Graphics Pipeline OverviewModern Graphics Pipeline Overview
Modern Graphics Pipeline Overview
 
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...
 
Realizing OpenGL
Realizing OpenGLRealizing OpenGL
Realizing OpenGL
 
libHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservationlibHPC: Software sustainability and reuse through metadata preservation
libHPC: Software sustainability and reuse through metadata preservation
 
Geo alberta2010 ppt_template
Geo alberta2010 ppt_templateGeo alberta2010 ppt_template
Geo alberta2010 ppt_template
 
Seattle Scalability - GigaSpaces / Cassandra
Seattle Scalability - GigaSpaces / CassandraSeattle Scalability - GigaSpaces / Cassandra
Seattle Scalability - GigaSpaces / Cassandra
 
Overview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle ModelOverview of the TriBITS Lifecycle Model
Overview of the TriBITS Lifecycle Model
 
NIAR_VRC_2010
NIAR_VRC_2010NIAR_VRC_2010
NIAR_VRC_2010
 
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012Big Science, Big Data: Simon Metson at Eduserv Symposium 2012
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU device
 
GeForce 8800 OpenGL Extensions
GeForce 8800 OpenGL ExtensionsGeForce 8800 OpenGL Extensions
GeForce 8800 OpenGL Extensions
 

More from Mark Kilgard

D11: a high-performance, protocol-optional, transport-optional, window system...
D11: a high-performance, protocol-optional, transport-optional, window system...D11: a high-performance, protocol-optional, transport-optional, window system...
D11: a high-performance, protocol-optional, transport-optional, window system...Mark Kilgard
 
Computers, Graphics, Engineering, Math, and Video Games for High School Students
Computers, Graphics, Engineering, Math, and Video Games for High School StudentsComputers, Graphics, Engineering, Math, and Video Games for High School Students
Computers, Graphics, Engineering, Math, and Video Games for High School StudentsMark Kilgard
 
NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017Mark Kilgard
 
NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017Mark Kilgard
 
Virtual Reality Features of NVIDIA GPUs
Virtual Reality Features of NVIDIA GPUsVirtual Reality Features of NVIDIA GPUs
Virtual Reality Features of NVIDIA GPUsMark Kilgard
 
EXT_window_rectangles
EXT_window_rectanglesEXT_window_rectangles
EXT_window_rectanglesMark Kilgard
 
Accelerating Vector Graphics Rendering using the Graphics Hardware Pipeline
Accelerating Vector Graphics Rendering using the Graphics Hardware PipelineAccelerating Vector Graphics Rendering using the Graphics Hardware Pipeline
Accelerating Vector Graphics Rendering using the Graphics Hardware PipelineMark Kilgard
 
NV_path rendering Functional Improvements
NV_path rendering Functional ImprovementsNV_path rendering Functional Improvements
NV_path rendering Functional ImprovementsMark Kilgard
 
CS 354 Final Exam Review
CS 354 Final Exam ReviewCS 354 Final Exam Review
CS 354 Final Exam ReviewMark Kilgard
 
CS 354 Surfaces, Programmable Tessellation, and NPR Graphics
CS 354 Surfaces, Programmable Tessellation, and NPR GraphicsCS 354 Surfaces, Programmable Tessellation, and NPR Graphics
CS 354 Surfaces, Programmable Tessellation, and NPR GraphicsMark Kilgard
 
CS 354 Performance Analysis
CS 354 Performance AnalysisCS 354 Performance Analysis
CS 354 Performance AnalysisMark Kilgard
 
CS 354 Acceleration Structures
CS 354 Acceleration StructuresCS 354 Acceleration Structures
CS 354 Acceleration StructuresMark Kilgard
 
CS 354 Global Illumination
CS 354 Global IlluminationCS 354 Global Illumination
CS 354 Global IlluminationMark Kilgard
 
CS 354 Ray Casting & Tracing
CS 354 Ray Casting & TracingCS 354 Ray Casting & Tracing
CS 354 Ray Casting & TracingMark Kilgard
 
CS 354 Bezier Curves
CS 354 Bezier Curves CS 354 Bezier Curves
CS 354 Bezier Curves Mark Kilgard
 

More from Mark Kilgard (16)

D11: a high-performance, protocol-optional, transport-optional, window system...
D11: a high-performance, protocol-optional, transport-optional, window system...D11: a high-performance, protocol-optional, transport-optional, window system...
D11: a high-performance, protocol-optional, transport-optional, window system...
 
Computers, Graphics, Engineering, Math, and Video Games for High School Students
Computers, Graphics, Engineering, Math, and Video Games for High School StudentsComputers, Graphics, Engineering, Math, and Video Games for High School Students
Computers, Graphics, Engineering, Math, and Video Games for High School Students
 
NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017
 
NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017
 
Virtual Reality Features of NVIDIA GPUs
Virtual Reality Features of NVIDIA GPUsVirtual Reality Features of NVIDIA GPUs
Virtual Reality Features of NVIDIA GPUs
 
EXT_window_rectangles
EXT_window_rectanglesEXT_window_rectangles
EXT_window_rectangles
 
Accelerating Vector Graphics Rendering using the Graphics Hardware Pipeline
Accelerating Vector Graphics Rendering using the Graphics Hardware PipelineAccelerating Vector Graphics Rendering using the Graphics Hardware Pipeline
Accelerating Vector Graphics Rendering using the Graphics Hardware Pipeline
 
NV_path rendering Functional Improvements
NV_path rendering Functional ImprovementsNV_path rendering Functional Improvements
NV_path rendering Functional Improvements
 
CS 354 Final Exam Review
CS 354 Final Exam ReviewCS 354 Final Exam Review
CS 354 Final Exam Review
 
CS 354 Surfaces, Programmable Tessellation, and NPR Graphics
CS 354 Surfaces, Programmable Tessellation, and NPR GraphicsCS 354 Surfaces, Programmable Tessellation, and NPR Graphics
CS 354 Surfaces, Programmable Tessellation, and NPR Graphics
 
CS 354 Performance Analysis
CS 354 Performance AnalysisCS 354 Performance Analysis
CS 354 Performance Analysis
 
CS 354 Acceleration Structures
CS 354 Acceleration StructuresCS 354 Acceleration Structures
CS 354 Acceleration Structures
 
CS 354 Global Illumination
CS 354 Global IlluminationCS 354 Global Illumination
CS 354 Global Illumination
 
CS 354 Ray Casting & Tracing
CS 354 Ray Casting & TracingCS 354 Ray Casting & Tracing
CS 354 Ray Casting & Tracing
 
CS 354 Typography
CS 354 TypographyCS 354 Typography
CS 354 Typography
 
CS 354 Bezier Curves
CS 354 Bezier Curves CS 354 Bezier Curves
CS 354 Bezier Curves
 

Recently uploaded

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 

Recently uploaded (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 

GTC 2012: NVIDIA OpenGL in 2012

  • 2. Talk  S0023 - NVIDIA OpenGL for 2012 Mark Kilgard (Principal Software Engineer, NVIDIA) Attend this session to get the most out of OpenGL on NVIDIA Quadro and GeForce GPUs. Topics covered include the latest advances available for Cg 3.1, the OpenGL Shading Language (GLSL); programmable tessellation; improved support for Direct3D conventions; integration with Direct3D and CUDA resources; bindless graphics; and more. When you utilize the latest OpenGL innovations from NVIDIA in your graphics applications, you benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard. Topic Areas: Computer Graphics; Development Tools & Libraries; Visualization; Audio, Image and Video Processing Level: Intermediate Day: Monday, 05/14 Time: 9:00 am - 10:20 pm Location: Room A3
  • 3. Mark Kilgard • Principal System Software Engineer – OpenGL driver and API evolution – Cg (“C for graphics”) shading language – GPU-accelerated path rendering • OpenGL Utility Toolkit (GLUT) implementer • Author of OpenGL for the X Window System • Co-author of Cg Tutorial
  • 4. Outline • OpenGL’s importance to NVIDIA • OpenGL API improvements & new features – OpenGL 4.2 – Direct3D interoperability – GPU-accelerated path rendering – Kepler Improvements • Bindless Textures • Linux improvements & new features • Cg 3.1 update
  • 5. NVIDIA’s OpenGL Leverage Cg GeForce Parallel Nsight Tegra Quadro OptiX
  • 6. Example of Hybrid Rendering with OptiX OpenGL (Rasterization) OptiX (Ray tracing)
  • 7. Nsight Provides OpenGL Profiling Configure Application Trace Settings
  • 8. Parallel Nsight Provides OpenGL Profiling Magnified trace options shows specific OpenGL (and Cg) tracing options
  • 9. Parallel Nsight Provides OpenGL Profiling
  • 10. Parallel Nsight Provides OpenGL Profiling Trace of mix of OpenGL and CUDA shows glFinish & OpenGL draw calls
  • 12. OpenGL 3D Graphics API • cross-platform • most functional • peak performance • open standard • inter-operable • well specified & documented • 20 years of compatibility
  • 13. OpenGL Spawns Closely Related Standards Congratulations: WebGL officially approved, February 2012 “The web is now 3D enabled”
  • 14. OpenGL 4 – DirectX 11 Superset Buffer and Event Interop • Interop with a complete compute solution – OpenGL is for graphics – CUDA / OpenCL is for compute • Shaders can be saved to and loaded from binary blobs – Ability to query a binary shader, and save it for reuse later • Flow of content between desktop and mobile – All of OpenGL ES 2.0 capabilities available on desktop – EGL on Desktop in the works – WebGL bridging desktop and mobile • Cross platform – Mac, Windows, Linux, Android, Solaris, FreeBSD – Result of being an open standard
  • 15. Increasing Functionality of OpenGL Tessellation and Compute Features 4.X Geometry Shaders 3.X Vertex and Fragment Shaders 2.X Fixed Function 1.X
  • 16. Classic OpenGL State Machine • From 1991-2007 * vertex & fragment processing got programmable 2001 & 2003 [source: GL 1.0 specification, 1992]
  • 18. OpenGL 3.x Conceptual Processing Flow (pre-2010) uniform/ primitive topology, parameters transformed Legend vertex data Geometric primitive buffer objects Vertex Vertex assembly & vertices assembly primitive processing batch transformed processing programmable pixels operations type, vertex fragments vertex data attributes point, line, and polygon fixed-function filtered texels geometry operations Transform texture fragments buffer data vertex transform feedback fetches buffer objects feedback pixels in framebuffer object textures buffer objects stenciling, depth testing, primitive batch type, vertex blending, accumulation vertex indices, texture vertex attributes texture fetches buffer buffer data, objects Texture Fragment Raster unmap Framebuffer mapping fragment processing operations buffer texture Command Buffer fetches parser store pixel map buffer, pack get buffer buffer data objects image and bitmap texture fragments pixel image pixel image or unpack Pixel specification texture image buffer specification packing objects OpenGL 3.2 pixels to pack image rectangles, bitmaps Image Pixel Pixel primitive unpacking unpacked processing pixels processing copy pixels, copy texture image
  • 19. Patch Control point Patch tessellation Patch evaluation assembly & processing transformed transformed generation transformed processing control points processing transformed patch patch, bivariate tessellation patch control patch domain points patch topology, evaluated patch vertex texture fetches primitive topology, transformed Geometric primitive Legend Vertex Vertex vertex data patch data assembly & assembly primitive processing programmable batch transformed processing vertices operations type, vertex pixels vertex data attributes point, line, fragments and polygon fixed-function geometry filtered texels Transform fragments operations vertex texture transform feedback fetches buffer data buffer objects feedback pixels in framebuffer object textures buffer primitive batch type, objects stenciling, depth testing, vertex vertex indices, texture blending, accumulation vertex attributes texture fetches buffer buffer data, objects Texture Fragment Raster unmap Framebuffer mapping fragment processing operations buffer pixel texture Command Buffer pack fetches parser store buffer map buffer, get buffer objects texture image and bitmap data pixel fragments image pixel image or unpack Pixel specification texture image buffer packing specification objects pixels to pack OpenGL 4.2 image rectangles, Image Pixel Pixel bitmaps primitive copy pixels, unpacking unpacked processing pixels processing copy texture image
  • 20.
  • 21. Accelerating OpenGL Innovation NOW! 2004 2005 2006 2007 2008 2009 2010 2011 2012 DirectX 10.0 DirectX 10.1 DirectX 11 DirectX 9.0c It’s OpenGL 3.1 OpenGL 3.3 + cooking! OpenGL 2.0 OpenGL 2.1 OpenGL 3.0 OpenGL 3.2 OpenGL 4.0 OpenGL 4.1 • OpenGL increased pace of innovation - Expect 8 new spec versions in four years - Actual implementations following specifications closely OpenGL 4.2 • OpenGL 4.2 is a superset of DirectX 11 functionality - While retaining backwards compatibility
  • 22. What’s new in OpenGL 4.2? • OpenGL 4.2 standardized August 2011 – Immediately shipped by NVIDIA – All Kepler and Fermi GPUs support 4.2 • Standardizes: – Compressed RGBA texture – Shader atomic counters – Immutable texture images – Further Direct3Disms – OpenGL Shading Language (GLSL) 4.20
  • 23. OpenGL 4.2 • Official compressed RGBA texture formats – GL_ARB_texture_compression_bptc – S3TC/DXTC was widely implemented as EXT, but was never an ARB standard – Now, there’s standard RGBA compressed format • Pixel store modes for compressed block texture specification – ARB_compressed_texture_pixel_storage – New pixel store modes for dealing with sub-rectangle loads of blocked compressed texture formats
  • 24. Details on BPTC • Four new compressed formats – GL_COMPRESSED_RGBA_BPTC_UNORM • RGBA format, lower signal-to-noise than S3TC • [0,1] component range for low-dynamic range (LDR) images – GL_COMPRESSED_SRGB_ALPHA_BPTC_UNORM • sRGB version of GL_COMPRESSED_RGBA_BPTC_UNORM – GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT • Compressed high dynamic range (HDR) format – GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT • HDR format for magnitude (unsigned) component values • Each has fixed compression ratio with 4x4 blocks – Much more intricate encoding than S3TC format (see specification)
  • 25. Easy Adoption of BPTC Compression • If you are loading uncompressed texture data – Just use BPTC internal texture format with glTexImage2D – Driver will compress the image for you • And you can read back the compressed image – Then subsequent loads can use glTexCompressedTexImage2D to be faster – However, BPTC compression is hard & takes time to do well so driver’s compression is not optimal • Consider using nvidia-texture-tools Google code project’s compressor • Same approach works for DXTC/S3TC
  • 26. OpenGL 4.2 Compressed Texture Format Table Fixed lossy Block Compression Token name Size ratio Range GL_COMPRESSED_RGBA_BPTC_UNORM 4x4 6:1 RGB LDR GL_COMPRESSED_SRGB_ALPHA_BPTC_UNORM 4x4 6:1 sRGB LDR GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT 4x4 3:1 for RGB Unsigned 4:1 for RGBA HDR GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT 4x4 3:1 for RGB Signed 4:1 for RGBA HDR
  • 27. OpenGL 4.2 Compressed Texture Pixel Storage • New glPixelStorei state settings – Unpack state • GL_UNPACK_COMPRESSED_BLOCK_WIDTH • GL_UNPACK_COMPRESSED_BLOCK_HEIGHT • GL_UNPACK_COMPRESSED_BLOCK_DEPTH • GL_UNPACK_COMPRESSED_BLOCK_SIZE – Pack state, similar just starting GL_PACK_ ... • Allows sub rectangle update for glCompressedTexImage2D, glCompressedTexSubImage2D, etc.
  • 28. OpenGL 4.2 Atomic Counters • Shaders seek to write/read from buffers need a way to get a unique memory location across alls shader instances – Prior to atomic counters, no way to coordinate such reads or writes • New: atomic counters – Allow GLSL shaders to write at unique offset within a buffer object – Also allow GLSL shader to read at unique offset within a buffer object – Counter value stored in a buffer • Updated via glBuferSubData, glMapBuffer, etc. • GLSL declaration of opaque object – layout ( binding = 2, offset = 0 ) uniform atomic_uint variable; • GLSL usage – uint prior_value = atomicCounterIncrement(atomic_uint counter);
  • 29. OpenGL 4.2 Atomic Counters • More GLSL usage of atomic counters – uint updated_value = atomicCounterDecrement(atomic_uint counter); – uint current_value = atomicCounter(atomic_uint counter); • Using – #extension GL_ARB_shader_atomic_counters : enable • Expected usage – Atomic values are offsets for loads and store operations – Using GLSL imageLoad & imageStore built-in functions – Also part of OpenGL 4.2
  • 30. In OpenGL 4.2, Shaders allowed Side-effects • Very exciting development – Prior to 4.2, standard GLSL shaders had their explicit outputs and that was it • In OpenGL 4.2 – Loads and stores to images and buffers allowed from shaders – Use atomic operations or atomic counters to synchronize those updates – You need to manage any required ordering though
  • 31. OpenGL 4.2 Immutable Texture Storage • OpenGL allows flexible layout of mipmap levels with a texture object • New commands gives a frozen mipmap layout – glTexStorage1D, glTexStorage2D, glTexStorage3D • Free of selector – glTextureStorage1DEXT, glTextureStorage2DEXT, glTextureStorage3DEXT • Explicit texture parameter • Allow driver to assume layout of texture immutable – Still can update texels, just not layout
  • 32. Direct3Dism Concept • Allows your 3D content to be API agnostic – OpenGL supports both OpenGL & Direct3D conventions, so support either style OpenGL Your OpenGL Your API OpenGL application OpenGL content OpenGL driver content authored application + Direct3D to OpenGL conventions conventions 3D API hardware Direct3D conventions interface interface GPU supported by OpenGL too Your Direct3D Your Direct3D API Direct3D application Direct3D content Direct3D driver content authored application conventions to Direct3D conventions
  • 33. Further Direct3Disms in OpenGL 4.2 • Direct3Disms = semantic compatibility with Direct3D behaviors – Not just same functionality as Direct3D—but same functionality with Direct3D-style semantics • Alignment constraints for mapped buffers – ARB_buffer_alignment – Makes sure mapped buffers are aligned for SIMD CPU instruction access – Really just a query for minimum alignment • Expect alignment to be no smaller than 64 bytes
  • 34. OpenGL 4.2 Base Instance • Extension on instanced rendering – For drawing multiple instances of geometry • Designed to be compatible with Direct3D – See ARB_instanced_arrays • Allows the specification of – base vertex – base instance – for each instanced draw • Generalizes instanced rendering
  • 35. OpenGL 4.2 GLSL Features • Conservative Depth – Permits depth/stencil tests before shading – Re-declares gl_FragDepth to restrict depth changes • Allows restrictions – New depth can only increase depth • layout (depth_greater) out float gl_FragDepth; – New depth can only decrease depth • layout (depth_less) out float gl_FragDepth; • To use – #extension GL_ARB_conservative_depth : enable
  • 36. OpenGL 4.2 GLSL Features • Miscellaneous GLSL changes – Allow line-continuation character using ‘’ – Use UTF8 for character set, for comments – Implicit conversions of return values – const keyword allowed on variables with functions – No strict order of qualifiers – Adds layout qualifier binding for uniform blocks – C-style aggregate initializers – Allow .length method to return number of components or columns in vectors and matrices – Allow swizzles on scalars – New built-in constants for gl_MinProgramTexelOffset and Max • All minor improvements for consistency with C/C++ and minimize aggravation
  • 37. OpenGL-related Linux Improvements  Support for X Resize, Rotate, and Reflect Extension  Also known as RandR  Version 1.2 and 1.3  OpenGL enables, by default, “Sync to Vertical Blank”  Locks your glXSwapBuffers to the monitor refresh rates  Matches Windows default now  Previously disabled by default
  • 38. OpenGL-related Linux Improvements • Expose additional full-scene antialiasing (FSAA) modes – 16x multisample FSAA on all GeForce GPUs • 2x2 supersampling of 4x multisampling – Ultra high-quality FSAA modes for Quadro GPUs • 32x multisample FSAA – 2x2 supersampling of 8x multisampling • 64x multisample FSAA – 4x4 supersampling of 4x multisampling • Coverage sample FSAA on GeForce 8 series and better – 4 color/depth samples + 12 depth samples
  • 39. Multisampling FSAA Patterns aliased 2x multisampling 4x multisampling 8x multisampling 1 sample/pixel 2 samples/pixel 4 samples/pixel 8 samples/pixel 64 bits/pixel 128 bits/pixel 256 bits/pixel 512 bits/pixel Assume: 32-bit RGBA + 24-bit Z + 8-bit Stencil = 64 bits/sample
  • 40. Supersampling FSAA Patterns 2x2 supersampling 2x2 supersampling 4x4 supersampling of 4x multisampling of 8x multisampling of 16x multisampling 16 samples/pixel 32 samples/pixel 64 samples/pixel 1024 bits/pixel 2048 bits/pixel 4096 bits/pixel Quadro GPUs Assume: 32-bit RGBA + 24-bit Z + 8-bit Stencil = 64 bits/sample
  • 41. NVIDIA X Server Settings for Linux Control Panel
  • 42. GLX Protocol • Network transparent OpenGL – Run OpenGL app on one machine, display the X and 3D on a different machine 3D app X server GLX OpenGL Server GLX Client OpenGL network connection
  • 43. OpenGL-related Linux Improvements Official GLX Protocol support for OpenGL extensions • GL_ARB_half_float_pixel • GL_EXT_point_parameters • GL_ARB_transpose_matrix • GL_EXT_stencil_two_side • GL_EXT_blend_equation_separate • GL_NV_copy_image • GL_EXT_depth_bounds_test • GL_NV_half_float • GL_EXT_framebuffer_blit • GL_NV_occlusion_query • GL_EXT_framebuffer_multisample • GL_NV_point_sprite • GL_EXT_packed_depth_stencil • GL_NV_register_combiners2
  • 44. OpenGL-related Linux Improvements Tentative GLX Protocol support for OpenGL extensions • GL_ARB_map_buffer_range • GL_EXT_vertex_attrib_64bit • GL_ARB_shader_subroutine • GL_NV_conditional_render • GL_ARB_stencil_two_side • GL_NV_framebuffer_multisample • GL_EXT_texture_integer _coverage • GL_EXT_transform_feedback2 • GL_NV_texture_barrier • GL_NV_transform_feedback2
  • 45. Synchronizing X11-based OpenGL Streams • New extension – GL_EXT_x11_sync_object • Bridges the X Synchronization Extension with OpenGL 3.2 “sync” objects (ARB_sync) • Introduces new OpenGL command – GLintptr sync_handle; – GLsync glImportSyncEXT (GLenum external_sync_type, GLintptr external_sync, GLbitfield flags); • external_sync_type must be GL_SYNC_X11_FENCE_EXT • flags must be zero
  • 46. Other Linux Updates • GL_CLAMP behaves in conformant way now – Long-standing work around for original Quake 3 • Enabled 10-bit per component X desktop support – GeForce 8 and better GPUs • Support for 3D Vision Pro stereo now
  • 47. What is 3D Vision Pro? • For Professionals • All of 3D Vision support, plus • Radio frequency (RF) glasses, Bidirectional • Query compass, accelerometer, battery • Many RF channels – no collision • Up to ~120 feet • No line of sight needed to emitter • NVAPI to control
  • 48. OpenGL Inter-GPU Communication PCI Express • NV_copy_image extension GPU1 GPU0 destRC srcRC – Quadro-only Compute/ Compute/ DMA DMA Render Render – (Asynchronous) copy Engine Engine Engine Engine between GPU’s – Does not require GPU GPU binding textures or Memory Memory destTex srcTex state changes – App handles wglCopyImageSubDataNV(srcRC, srcTex, GL_TEXTURE_2D,0, 0, 0, 0, destRC, destTex, synchronization GL_TEXTURE_2D, 0, 0, 0, 0, width, height, 1);
  • 50. DirectX / OpenGL Interoperability Application • DirectX interop is a GL API extension (WGL_NVX_dx_interop) • Premise: create resources in DirectX, get access to them within OpenGL OpenGL DirectX driver driver • Read/write support for DirectX 9 textures, render targets and vertex buffers • Implicit synchronization is done Texture / Vertex buffer object between Direct3D and OpenGL data • Soon: DirectX 10/11 support GPU Display
  • 51. DirectX / OpenGL Interoperability // create Direct3D device and resources the regular way direct3D->CreateDevice(..., &dxDevice); dxDevice->CreateRenderTarget(width, height, D3DFMT_A8R8G8B8, D3DMULTISAMPLE_4_SAMPLES, 0, FALSE, &dxColorBuffer, NULL); dxDevice->CreateDepthStencilSurface(width, height, D3DFMT_D24S8, D3DMULTISAMPLE_4_SAMPLES, 0, FALSE, &dxDepthBuffer, NULL);
  • 52. DirectX / OpenGL Interoperability // Register DirectX device for OpenGL interop HANDLE gl_handleD3D = wglDXOpenDeviceNVX(dxDevice); // Register DirectX render targets as GL texture objects GLuint names[2]; HANDLE handles[2]; handles[0] = wglDXRegisterObjectNVX(gl_handleD3D, dxColorBuffer, names[0], GL_TEXTURE_2D_MULTISAMPLE, WGL_ACCESS_READ_WRITE_NVX); handles[1] = wglDXRegisterObjectNVX(gl_handleD3D, dxDepthBuffer, names[0], GL_TEXTURE_2D_MULTISAMPLE, WGL_ACCESS_READ_WRITE_NVX); // Now textures can be used as normal OpenGL textures
  • 53. DirectX / OpenGL Interoperability // Rendering example: DirectX and OpenGL rendering to the // same render target direct3d_render_pass(); // D3D renders to the render targets as usual // Lock the render targets for GL access wglDXLockObjectsNVX(handleD3D, 2, handles); opengl_render_pass(); // OpenGL renders using the textures as render // targets (e.g., attached to an FBO) wglDXUnlockObjectsNVX(handleD3D, 2, handles); direct3d_swap_buffers(); // Direct3D presents the results on the screen
  • 54. CUDA Interoperability example Application • Interop APIs are extensions to OpenGL and CUDA • Multi-card interop 2x faster on Quadro / Tesla. Transfer between cards, minimal OpenGL CUDA CPU involvement driver driver • GeForce requires CPU copy Quadro fast Tesla or or Interop GeForce GeForce Display
  • 55. Multi-GPU OpenGL Interoperability Application • Interop using NV_copy_image OpenGL extension GPU affinity • Transfer directly between cards, minimal CPU involvement OpenGL OpenGL driver driver • Quadro only Quadro fast Quadro Off-screen Interop buffer Display
  • 56. What is path rendering? • A rendering approach – Resolution-independent two-dimensional graphics – Occlusion & transparency depend on rendering order • So called “Painter’s Algorithm” – Basic primitive is a path to be filled or stroked • Path is a sequence of path commands • Commands are – moveto, lineto, curveto, arcto, closepath, etc. • Standards – Content: PostScript, PDF, TrueType fonts, Flash, Scalable Vector Graphics (SVG), HTML5 Canvas, Silverlight, Office drawings – APIs: Apple Quartz 2D, Khronos OpenVG, Microsoft Direct2D, Cairo, Skia, Qt::QPainter, Anti-grain Graphics,
  • 57. What is NV_path_rendering? • OpenGL extension to GPU-accelerate path rendering • Uses “stencil, then cover” (StC) approach – Create a path object – Step 1: “Stencil” the path object into the stencil buffer • GPU provides fast stenciling of filled or stroked paths – Step 2: “Cover” the path object and stencil test against its coverage stenciled by the prior step • Application can configure arbitrary shading during the step – More details later • Supports the union of functionality of all major path rendering standards – Includes all stroking embellishments – Includes first-class text and font support – Allows this functionality to mix with traditional 3D and programmable shading
  • 58.
  • 59. Pixel pipeline Vertex pipeline Path pipeline Application Path specification Pixel assembly Vertex assembly Transform path (unpack) Vertex operations transform feedback Primitive assembly Pixel operations Primitive operations Fill/Stroke Covering Pixel pack Rasterization read Texture Fragment operations back memory Fill/Stroke Application Raster operations Stenciling Framebuffer Display
  • 60. Talk on Tuesday • “GPU-Accelerated Path Rendering” – Tuesday, May 15 – 14:00-14:50, Room A3 • Teaser scene
  • 61. Bindless Graphics • Problem: Binding to different objects (textures, buffers) takes a lot of validation time in driver – And applications are limited to a small palette of bound buffers and textures – Approach of OpenGL, but also Direct3D • Solution: Exposes GPU virtual addresses – Let shaders and vertex puller access buffer and texture memory by its virtual address!
  • 62. Prior to Bindless Graphics • Traditional OpenGL – GPU memory reads are “indirected” through bindings • Limited number of texture units and vertex array attributes – glBindTexture—for texture images and buffers – glBindBuffer—for vertex arrays
  • 63. Buffer-centric Evolution • Data moves onto GPU, away from CPU – Apps on CPUs just too slow at moving data otherwise Array Element Buffer glBegin, glDrawElements, etc. Object (VeBO) Texture Buffer Object (TexBO) Vertex Array Buffer Object Vertex Puller (VaBO) texel data Transform Feedback Buffer (XBO) Vertex Shading Pixel Unpack Buffer (PuBO) vertex data Texturing Geometry Shading glDrawPixels, glTexImage2D, etc. Parameter Buffer Object (PaBO) glReadPixels, Pixel Pack Buffer etc. Fragment Pixel (PpBO) Uniform Buffer Shading Pipeline pixel data Object (UBO) parameter data Framebuffer
  • 64. Kepler – Bindless Textures • Enormous increase in the number of unique textures available to shaders at run-time • More different materials and richer texture detail in a scene texture #0 Shader code texture #1 Shader code texture #2 … texture #127 … Pre-Kepler texture binding model Kepler bindless textures over 1 million unique textures
  • 65. Kepler – Bindless Textures Pre-Kepler texture binding model Kepler bindless textures CPU CPU Load texture A Load textures A, B, C Load texture B Draw() Load texture C Bind texture A to slot I GPU Bind texture B to slot J Read from texture A Draw() Read from texture B Read from texture C GPU Read from texture at slot I Read from texture at slot J CPU Bind texture C to slot K Bindless model reduces CPU Draw() overhead and improves GPU access GPU efficiency Read from texture at slot K
  • 66. Bindless Textures • Apropos for ray-tracing and advanced rendering where textures cannot be “bound” in advance Shader code
  • 67. Bindless performance benefit Numbers obtained with a directed test
  • 68. More Information on Bindless Texture • Kepler has new NV_bindless_texture extension – Texture companion to • NV_vertex_buffer_unified_memory for bindless vertex arrays • NV_shader_buffer_load for bindless shader buffer reads • API specification publically available – http://developer.download.nvidia.com/opengl/specs/GL_NV_bindless_texture.txt
  • 69. API Usage to Initialize Bindless Texture • Make a conventional OpenGL texture object – With a 32-bit GLuint name • Query a 64-bit texture handle from 32-bit texture name – GLuint64 glGetTextureHandleNV(GLuint); • Make handle resident in context’s GPU address space – void glMakeTextureHandleResidentNV(GLuint64);
  • 70. Writing GLSL for Bindless Textures • Request GLSL to understand bindless textures – #version 400 // or later – #extension GL_NV_bindless_texture : require • Declare a sampler in the normal way – in sampler2D bindless_texture; • Alternatively, access bindless samplers in big array: – uniform Samplers { sampler2D lotsOfSamplers[256]; } – Exciting: 256 samplers exceeds the available texture units!
  • 71. Update Sampler Uniforms with Bindless Texture Handle • Get a location for a sampler or image uniform – GLint loc = glGetUniformLocation(program, “bindless_texture”); – GLint loc_array = glGetUniformLocation(program, “lotsOfSamplers”); • Then set sampler to the bindless texture handle – glProgramUniformHandleui64NV(program, location, 1, &bindless_handle);
  • 72. Seam-free Cube Map Edges • Cube maps have edges along each face – Traditionally texture mapping hardware simply clamps to these seam edges • Results in “seam” artifacts – Particularly when level-of-detail bias is large • Meaning very blurry levels • But seams appear sharply seam • Use glEnable( GL_TEXTURE_CUBE_MAP_SEA MLESS) to mitigate these artifacts on context-wide basis – Applies to all cube maps
  • 73. Seamless Cube Maps: Before and After • Before: with edge seams • After: without seams
  • 74. Kepler Provides Per-texture Seamless Cube Map Support • Kepler GPUs support – GL_AMD_seamless_cubemap_per_texture – Provides per-texture object parameters • glTexParameteriv(GL_TEXTURE_2D, GL_TEXTURE_CUBE_MAP_SEAMLESS_ARB, GL_TRUE); – Not sure if you want a mix of seamless and seamed cube maps… • …but supported anyway
  • 75. Cg Toolkit 3.1 Update • Cg 3.0 = big upgrade – Introduced Shader Model 5.0 support for OpenGL • New gp5vp, gp5gp, and gp5fp profiles correspond to Shader Model 5.0 vertex, geometry, and fragment profiles – Compiles to assembly text for the NV_gpu_program5 assembly-level shading language – Also programmable tessellation profiles • gp5tcp = NV_gpu_program5 tessellation control program • gp5tep = NV_gpu_program5 tessellation evaluation program – Also DirectX 11 profiles • Cg 3.1 refines 3.0 functionality – Bug fixes, constant buffer support, better GLSL support
  • 76. Cg 3.1 OpenGL Shader Model 5.0 Profiles indices primitive rectangular & indices primitive topology & triangluar topology & in-band vertex patch in-band vertex attributes index to vertex attributes index to vertex attribute attribute vertex puller puller vertex attributes attributes control point vertex vertex (vertex) shader attributes vertex shader gp4vp shader gp5vp gp5vp control assembled points tessellation assembled primitive primitive level-of-detail control (hull) parameters shader geometry geometry gp4gp shader gp5tcs gp5gp shader tessellation patch control primitive stream primitive stream generator points & programmable clipping, parameters clipping, setup, & setup, & rasterization rasterization fixed-function fragments gp5tes tessellation fragments fine primitive fragment topology evaluation gp4fp fragment (u, v) coordinates (domain) shader gp5fp shader shader Cg 2.x support Cg 3.x adds programmable tessellation
  • 77. Cg 3.1 profiles OpenGL OpenGL Shader Shader Model Shader Model OpenGL Shading domain 4.0 5.0 Language (GLSL) DirectX 10 DirectX 11 Vertex (or control point) gp4vp gp5vp glslv vs_4_0 vs_5_0 Tessellation control (hull) n/a gp5tcp n/a n/a hs_5_0 Tessellation n/a n/a evaluation gp5tep n/a ds_5_0 (domain) Geometry gp5gp gp5gp glslg gs_4_0 gs_5_0 Fragment gp4fp gp5fp glslf ps_4_0 ps_5_0
  • 78. Cg 3.1 Update • Various improvements & bug fixes – Constant buffer fixes – GLSL translation for different GLSL versions • Support for CENTROID, FLAT, and NOPERSPECTIVE fragment program interpolants • Deprecated functionality – Removed Direct3D 8 support – Minimum Mac OS X version: 10.5 (Leopard)
  • 79. Cg Off-line Compiler Compiles GLSL to Assembly • GLSL is “opaque” about the assembly generated – Hard to know the quality of the code generated with the driver – Cg Toolkit 3.0 is a useful development tool even for GLSL programmers • Cg Toolkit’s cgc command-line compiler actually accepts both the Cg (cgc’s default) and GLSL (with –oglsl flag) languages – Uses the same compiler technology used to compile GLSL within the driver • Example command line for compiling earlier tessellation control and evaluation GLSL shaders – cgc -profile gp5tcp -oglsl -po InputPatchSize=3 -po PATCH_3 filename.glsl (assumes 3 input control points and 3 outputs points) – cgc -profile gp5tep -oglsl -po PATCH_3 filename.glsl (assumes 3 input control points)
  • 81. Other OpenGL-related Sessions at GTC • S0024: GPU-Accelerated Path Rendering – Tuesday, 14:00, Room A3 • S0049: Using the GPU Direct for Video API – Tuesday, 15:00, Room J8 • S0356: Optimized Texture Transfers – Tuesday, 16:00, Room J2 • S0267A: Mixing Graphics and Compute with Multiple GPUs – Tuesday, 17:00, Room J2 • S0353: Programming Multi-GPUs for Scalable Rendering – Wednesday, 9:00, Room A1 • S0267B - Mixing Graphics and Compute with Multiple GPUs – Thursday, 17:30, Room L

Editor's Notes

  1. One way to view the OpenGL offerings from NVIDIA is as a tool to enable awesome visual applications to be developed through a comprehensive stack of software solutions. We offer: Cg: A shading language and effects system to develop rendering effects and techniques, and deploy them on various platforms and APIs, including OpenGL. Scenix: OpenGL based professional scene graph used in many markets today. Automotive styling, visualization, simulation, broadcast graphics, and more. Applications can quickly add features such as stereo, SDI, 30-bit color, scene distribution and interactive ray tracing Optix: Is an interactive ray-tracing engine built on top of CUDA and OpenGL. Hybrid rendering, mixing of traditional graphics rendering and ray tracing, are also enabled. OptiX integrates with SceniX. With Optix, you accelerate an existing renderer, or build a new one yourself. Complex: Maintains interactivity for large scenes as they exceed the limits of a single GPU, allowing massive data sets to be explored, using multiple GPUs in a system. The CompleX engine can be adopted by any product using OpenGL, and can be enabled immediately by an application using SceniX Parallel Nsight is an advanced debugger and analyzer fully integrated with Visual Studio. It enables you to better see what is going on with your OpenGL, CUDA, Direct3D, DirectCompute or OpenCL application.
  2. This is a close-up taken out of the Fermi Rocket Sled demo. This shows how hybrid rendering works. With traditional OpenGL rendering you generate your image, for example using SceniX, and then enhance it by using OptiX to add reflection and shadows. One of our experts on this, Tristan, is available if you have more questions on how you can do this in your application. Tristan, raise your hand please.
  3. OpenGL is the only Cross Platform 3D API. Every major Operating System provides a version or flavor of OpenGL. Windows, Mac OS, iOS, Linux and Android.
  4. After Mark’s deep-dive, I’ll pull you back up into the higher level view of where OpenGL fits in as the 3D graphics API.
  5. OpenGL is a desktop API, as you’ve seen by now. OpenGL ES is the sister API for mobile and embedded devices. By keeping both APIs closely aligned, content can flow from there up into OpenGL enabled platforms and back down to OpenGL ES enabled platforms. WebGL was announced last year. It provides JavaScript bindings to OpenGL ES and provides plug-in less 3D graphics in a web browser. WebGL gives you access to the GPU in the system inside of a browser. Beta implementations of WebGL are already available from Mozilla, Google and Opera. For NVIDIA this means we will support and enhance OpenGL and WebGL on GeForce and OpenGL ES and WebGL on Tegra for the mobile market.
  6. Finally, Siggraph is where we introduce 3D Vision Pro as well. Again, come to the booth for more information. Now on to OpenGL!
  7. OpenGL Interoperability means the ability to share data among different APIs. You can, for example, map a D3D surface as a texture or renderbuffer in OpenGL directly. Or map an OpenGL buffer object as a CUDA memory object. It is also possible to transfer data between two OpenGL contexts in a multi-GPU situation with maximum efficiency.
  8. Here is an example of interop with CUDA. The application makes OpenGL and CUDA calls. For example, process a video stream in CUDA, then transfer it over to OpenGL using interop for more processing and display. This works on a single card as well as multi-card systems. Performance between Quadro cards will be higher than between GeForce cards.
  9. This slide is almost the same as the previous one, except it shows interop between two OpenGL contexts, each talking to a separate graphics card. The application is using GPU affinity to direct rendering to a particular GPU. Again, you need interop to transfer the data from one card to the other. For example, to composite the final frame of a rendering split over many GPUs. Note that both GPU affinity and fast GL to GL interop with NV_copy_image are Quadro only solutions.
  10. NV_path_rendering provides a new third pipeline—in addition to the vertex and pixel pipelines—for rendering pixels
  11. Before Kepler, in order to make a texture available for the GPU to reference it had to be assigned a “slot” in a fixed-size binding table. The number of slots in that table ultimately limits how many unique textures a shader can read from at run time. On Kepler, no such additional setup is necessary - shader can reference textures in memory directly and there is no need to go through the binding tables anymore. This effectively eliminates any limits on the number of unique textures it can use to render a scene.
  12. Bindless textures reduce CPU work and provide more efficient access for the GPU
  13. In advanced rendering apps such as raytracing it is impossible to know in advance which textures a given ray may hit – thus it is impossible to pre-“bind” them. Bindless model solves this problem by allowing shader reference textures directly