Mark Kilgard, July 31
SIGGRAPH 2017, Los Angeles
NVIDIA OpenGL in 2017
40
ARB_polygon_offset_clamp
• Extends OpenGL’s polygon offset feature
•Polygon offset was one of OpenGL’s first
extensions...
41
Motivation & Usage of Polygon Offset
• Motivation of polygon offset
• Depth buffers must quantize depth values
• Typica...
42
Polygon Offset Justified (1)
• Rasterizing triangles generates discretized depth values
• A rasterizer’s depth slope fo...
43
Polygon Offset Justified (2)
• Conceptually, think of interpolated depth as having “error bars”
• Depth rasterization e...
44
Polygon Offset Improved!
• Wait... Sounds fishy but (mostly) works!
• Mostly??
• What can go wrong?
• The depth slope c...
45
Using Polygon Offset Clamp
•Easy API just adds new maximum depth bias clamp value
• GL 1.1: glPolygonOffset(factor, uni...
46
Examples of Light Leaks
Mitigated by Polygon Offset Clamp
BEFORE AFTER
Solid girder’s shadow shows streaks
Animates bad...
47
Examples of Light Leaks
Mitigated by Polygon Offset Clamp
Dots of light within boot’s shadow
Animates badly
Mitigated b...
48
KHR_no_error
• The “no airbags” extension now part of OpenGL 4.6
• Makes OpenGL operation in the presence of GL_INVALID...
49
NVIDIA’s KHR_no_error Advice
• General advice: “Try it before you buy it”
• Not generating errors has a severe side-eff...
50
ARB_shader_atomic_counter_ops
• Completes OpenGL Shading Language (GLSL) support for atomic counters
• Prior ARB_shader...
51
ARB_shader_draw_parameters
• Adds to new GLSL built-in variables to get base vertex and instance
• gl_BaseVertex
• gl_B...
52
ARB_shader_group_vote
• Provides new GLSL built-in functions to compute composite of a set of boolean
conditions across...
53
ARB_shader_group_vote Rationale
• Rationale
• Implementation reality: GPUs run shader invocations using groups of threa...
54
ARB_gl_spirv
• This extension announced at SIGGRAPH 2016
• But was optional
• NVIDIA announced support last year
• Much...
55
OpenGL Driver
GLSL Compilation prior to SPIR-V
shader.vert
shader.geom
shader.frag
your
OpenGL
app
GPU
GLSL Compiler
Fr...
56
OpenGL Driver
GLSL Compiler
Front-end
ARB_gl_spirv Enabled
Offline Compilation of GLSL to SPIR-V
your
OpenGL
app
GPU
sh...
57
Tools to Manipulate SPIR-V
• Open source SPIR-V tools available
• glslang: glslangValidator
• Provides basic GLSL compile...
58
API Usage Differences: Compiling GLSL vs. SPIR-V
glCreateProgram
glShaderSource
glCompileShader
glAttachShader
glCreate...
59
API Usage Differences: Compiling GLSL vs. SPIR-V
glCreateProgram
glShaderBinary
glSpecializeShader
glAttachShader
glCre...
60
ARB_spirv_extensions
• Original ARB_gl_spirv extension only added support for SPIR-V 1.0 concepts that
were part of the...
61
First Set of SPIR-V Extensions
SPIR-V Extension Name Corresponding OpenGL extension or functionality
SPV_KHR_shader_bal...
62
ARB_texture_filter_anisotropic
• Fully compatible with long-standing EXT_texture_filter_anisotropic
• Reduces texture f...
63
64
ARB_transform_feedback_overflow_query
• Adds new query types which can be used to detect overflow of transform feedback...
65
Why OpenGL Core Updates
are Important (1)
• Not just opportunity for new functionality
• A new specification is release...
66
Why OpenGL Core Updates are Important (2)
• Not just opportunity for new functionality
• Opportunity for a new Conforma...
67
Why OpenGL Core Updates are Important (3)
• Not just opportunity for new functionality
• OpenGL Shading Language (GLSL)...
68
OpenGL 4.6’s Resolving of IP Issues & New Open Sourcing of OpenGL
Conformance Suite Benefits Open Source OpenGL Impleme...
69
Credit for OpenGL 4.6
• Khronos relies on its member companies to complete new OpenGL core updates
• Different companie...
70
NVIDIA OpenGL Initiatives in 2017
in addition to OpenGL 4.6
71
GPU “Interop” Usage
•Increasingly applications want to share GPU resources and mix APIs
• Typically sophisticated appli...
72
Past Interop Extensions for OpenGL
•Past interoperability extensions would pair OpenGL concepts to those
of another one...
73
Responding to New Interop Requirements
• Addressing criticism of prior interop extensions...
• In many cases, single-ve...
74
EXT_memory_object & EXT_semaphore
• Vulkan introduces explicit memory and synchronization objects
• EXT_memory_object i...
75
EXT_semaphore
• Introduces new object matching Vulkan-style semaphores
• Basic operations on semaphores
• Object manage...
76
EXT_memory_object
• Introduces new memory object corresponding to Vulkan concept
• Import memory objects with platform-...
77
OpenGL ES Parity
• Mobile developers often target OpenGL ES
• Apple’s iOS and Google’s Android use of ES made the de fa...
78
Oh, 3D developer—you
flatter me noticing my
complete & mature
feature set
With ES parity,
what does she
have that I don...
79
ES Parity Extensions for 2017
Extension name Functionality
EXT_clear_texture Clear texture images & sub-images
EXT_cons...
80
Still ES Lacks Much,
NVIDIA Provides What’s Missing
•The 2017 multi-vendor parity extensions
highlight what’s missing f...
81
NVIDIA’s ES Parity Philosophy
•The idea of “ES Parity” is NOT to turn an ES context into an OpenGL
4.x context
•The ide...
82
NVIDIA ES Parity
Enhancements
Result of NVIDIA’s ES Parity Efforts
Full OpenGL
ES 3.2
ES 3.1
ES 3.0
ES 2.0
Industry’s
m...
83
Perspective of ES Parity
from an OpenGL 4.6 Context
NVIDIA ES Parity
Enhancements
Full OpenGL
ES 3.2
ES 3.1
ES 3.0
ES 2...
84
Miscellaneous NEW Extensions for 2017
• NV_blend_minmax_factor, based on AMD_blend_minmax_factor
• EXT_protected_textur...
85
NV_blend_minmax_factor:
Modulated Min/Max Blending
• Original GL_MIN and GL_MAX blend equations limited
• Both ignore t...
86
NV_blend_minmax_factor
Example
• Blend code
• blendResult = max(sourceColor,
destinationColor × (1−sourceAlpha))
• Code...
87
EGL_EXT_protected_content &
EXT_protected_textures (1)
•Together provide OpenGL protected access control to GPU images
...
88
EGL_EXT_protected_content &
EXT_protected_textures (2)
• Pipeline stages of OpenGL contexts can also be designated prot...
89
EGL_EXT_protected_content Scenarios
• Android 7.0’s secure texture video playback
• Allows secure GPU post-processing o...
90
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
NumberofOpenGLextensionsproposed
Caveats: ext...
91
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
0
10
20
30
40
50
60
1997 1998 1999 2000 2001 ...
92
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
0
10
20
30
40
50
60
1997 1998 1999 2000 2001 ...
93
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
0
10
20
30
40
50
60
1997 1998 1999 2000 2001 ...
94
Cumulative Implemented NVIDIA OpenGL
Extensions Over 20 Years
0
50
100
150
200
250
300
350
400
450
500
1997 1998 1999 2...
95
NVIDIA OpenGL Shader Caching
• NVIDIA OpenGL driver saves GLSL shaders it
compiles
• Cached compiled shaders saved to y...
96
Linux Graphics
Open Source Efforts from NVIDIA
• NVIDIA works to improve graphics support for entire Linux ecosystem
• ...
97
GLVND: GL Vendor-Neutral Dispatch library
• libglvnd
• Arbitrates OpenGL API calls between multiple vendors
• Multiple ...
98
Before GLVND
NVIDIA Proprietary
Linux Driver
Mesa + Nouveau
I control OpenGL
best on NVIDIA
GPUs
But I got
here first!
...
99
GLVND Architecture
libOpenGL
mapi/glapi
libGLdispatch
libGLX
libGL
X11 Server
GLX_EXT_libglvnd
extension
GLX_vendor GLX...
100
NVIDIA’s Support for Wayland
• Wayland
• Intended as simpler replacement for X Window System
• A protocol for a comp...
101
Latest VDPAU Support
• Video Decode and Presentation API for Unix (VDPAU)
• Latest NVIDIA GPUs (GeForce 1080, etc.)
...
102
NVIDIA Codec SDK 8.0
• Two hardware acceleration interfaces:
• NVENCODE API for video encode
acceleration
• NVDECODE...
103
Supported Video Encoding Formats by GPU Generation
* Except GM108
** Except GP100 (is limited to 4K resolution)
8k e...
104
GPU Encoding:
Awesome Performance & Comparable Quality
Bigger is faster for NVENC Comparable peak signal-to-noise ra...
105
Supported Video Decode Formats by GPU Generation
* Except GM108
** Max resolution support is limited to selected Pas...
106
NVDEC to OpenGL to NVENC
NVDEC NVENC
OpenGL
texture object
OpenGL
texture object
OpenGL
texture object
Linux only fo...
107
Proven GPU Codec Technology
•Same underlying technology powers these services
Play your PC games on your PC,
encode ...
108
GLEW Support Available NOW
GLEW = The OpenGL Extension Wrangler Library
Open source library
Pre-built distribution: ...
109
NVIDIA OpenGL in 2017 Provides
OpenGL’s Maximally Available Superset
OpenGL 4.6
Pascal
Extensions
2015 ARB extension...
110
Last Words
•Khronos announces OpenGL 4.6 today! Best OpenGL yet
•Highlights of NVIDIA’s OpenGL support in 2017
• NVI...
111
SIGGRAPH Paper Using OpenGL to Check Out
• How to make shaders modular without giving
up performance
• Open source o...

NVIDIA OpenGL 4.6 in 2017

NVIDIA OpenGL in 2017, presented at SIGGRAPH in Los Angeles, July 31, 2017.

Covers OpenGL 4.6

  1. NVIDIA OpenGL in 2017
      Mark Kilgard, July 31, SIGGRAPH 2017, Los Angeles
  2. My Background: Mark Kilgard
      • Principal System Software Engineer
        • OpenGL driver and API evolution
        • Cg (“C for graphics”) shading language
        • GPU-accelerated path rendering & web browser rendering
      • OpenGL Utility Toolkit (GLUT) implementer
      • Specified and implemented much of OpenGL
      • Author of OpenGL for the X Window System
      • Co-author of Cg Tutorial
      • Worked on OpenGL for over 25 years
  3. OpenGL 4.6 with SPIR-V support announced today
  4. NVIDIA’s OpenGL Leverage
      • Debugging with Nsight
      • Programmable Graphics
      • Android & SHIELD
      • Quadro
      • OptiX
      • GeForce
      • Adobe Creative Cloud
      • Embedded Projects
  5. OpenGL Codebase Leverage
      • Same driver code base supports multiple APIs
      • OpenGL for Embedded, Mobile, and Web
      • Multi-vendor, explicit, low-level graphics from Khronos
  6. NVIDIA’s Shading Compiler Even More Leveraged
      • Various Direct3D versions
      • 3D APIs based on NVIDIA OpenGL driver code base
      • NVIDIA Shading Compiler code base
      • Apple’s proprietary graphics API
      • Proprietary console API
  7. Still the One Truly Common & Open 3D API
      • OS X, Linux, FreeBSD, Solaris, Android, Windows, Embedded Designs
  8. NVIDIA OpenGL in 2017 Provides OpenGL’s Maximally Available Superset
      • Diagram labels: OpenGL 4.6; Pascal Extensions; 2015 ARB extensions; OpenGL 4.5 Core; Maxwell Extensions; Legacy EXT & Other Compatibility Extensions; OpenGL Complete Compatibility; Path Rendering; Multi-GPU, SLI; Approaching Zero Driver Overhead; NVIDIA Multi-generation GPU Initiatives; DirectX inter-op; Vulkan inter-op; ES Enhancements; Full OpenGL ES 3.2
      • Legend: Khronos Standard; Expected Compatibility; NVIDIA Initiatives; GPU Generation Features
  9. OpenGL’s Recent Advancements (timeline: 2014, 2015, 2016)
      • New ARB Extensions: 3 standard extensions, beyond 4.5
        • ARB_sparse_buffer
        • ARB_pipeline_statistics_query
        • ARB_transform_feedback_overflow_query
      • Maxwell Extensions
        • Novel graphics features
        • 14 new extensions
        • Global Illumination & Vector Graphics focus
  10. OpenGL’s Recent Advancements (timeline: 2014, 2015, 2016)
      • New ARB Extensions and Maxwell Extensions as on slide 9
      • New ARB 2015 Extension Pack
        • Shader functionality
          • ARB_ES3_2_compatibility (shading language support)
          • ARB_parallel_shader_compile
          • ARB_gpu_shader_int64
          • ARB_shader_atomic_counter_ops
          • ARB_shader_clock
          • ARB_shader_ballot
        • Graphics pipeline operation
          • ARB_fragment_shader_interlock
          • ARB_sample_locations
          • ARB_post_depth_coverage
          • ARB_ES3_2_compatibility (tessellation bounding box + multisample line width query)
          • ARB_shader_viewport_layer_array
        • Texture mapping functionality
          • ARB_texture_filter_minmax
          • ARB_sparse_texture2
          • ARB_sparse_texture_clamp
  11. OpenGL’s Recent Advancements (timeline: 2014, 2015, 2016)
      • New ARB Extensions, Maxwell Extensions, and New ARB 2015 Extension Pack as on slide 10
      • Pascal Extensions
        • Novel graphics features
        • 5 new extensions
        • Virtual Reality focus
      • OpenGL SPIR-V Support
        • Standard Shader Intermediate Representation
        • ARB_gl_spirv (not core)
        • Vulkan interoperability
  12. Available to Download Today
      • Beta driver with OpenGL 4.6 support, July 31, 2017
  13. For those tracking birthdays...
      • Then: celebrating OpenGL 4.3
      • Now: celebrating OpenGL 4.6
  14. Need a Refresher on 2014, 2015, and 2016 OpenGL?
      • Honestly, NVIDIA exposed lots of functionality in the last 3 years
      • Available @ http://www.slideshare.net/Mark_Kilgard
  15. Introducing OpenGL 4.6
      • Big feature: SPIR-V support required
        • SPIR-V = standard intermediate language for parallel compute and graphics
        • Vulkan 1.0 standard requires expressing shaders in SPIR-V
        • Allows content creators to simplify their shader authoring and management pipelines
        • Previously this was an optional ARB extension, not required for 4.5
        • Includes NEW ARB_spirv_extensions for extending SPIR-V support
      • Genius of AND: OpenGL 4.6 allows either GLSL or SPIR-V, your choice
        • Technically, NVIDIA’s Vulkan 1.0 allows using GLSL directly via an extension
      • Additional new ARB extensions bundled in OpenGL 4.6 for
        • Improving performance
        • Improving rendering quality
        • Resolving outstanding Intellectual Property (IP) issues
      • Callout: support not built-in
  16. Khronos standard extension ARB_gl_spirv
      • OpenGL extension exposing Khronos intermediate language for parallel compute and graphics
      • Khronos extension for OpenGL + SPIR-V
      • ARB extension announced last year, July 22, 2016
      • Allows compiled SPIR-V code to be passed directly to OpenGL driver
      • Accepts SPIR-V output from open source Glslang Khronos Reference compiler
        • https://github.com/KhronosGroup/glslang
      • Other compilers can target SPIR-V too
  17. SPIR-V Ecosystem
      • SPIR-V
        • Khronos defined and controlled cross-API intermediate language
        • Native support for graphics and parallel constructs
        • 32-bit Word Stream
        • Extensible and easily parsed
        • Retains data object and control flow information for effective code generation and translation
      • Diagram labels: LLVM; Third party kernel and shader Languages; OpenCL C++; OpenCL C; GLSL; HLSL; IHV Driver Runtimes; Other Intermediate Forms; SPIR-V Validator; SPIR-V (Dis)Assembler; LLVM to SPIR-V Bi-directional Translator
      • https://github.com/KhronosGroup/SPIR/tree/spirv-1.1 (Open source C++ front-end released)
      • Legend: Khronos has open sourced these tools and translators; Khronos plans to open source these tools soon
      • OpenGL support NEW with ARB_gl_spirv; standard in OpenGL 4.6
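To make the “32-bit Word Stream” and “easily parsed” points concrete, here is a minimal sketch of a header check and instruction walk. It is not from the slides: it assumes the SPIR-V module layout as I understand it from the SPIR-V specification (a five-word header starting with the magic number 0x07230203, then instructions whose first word carries the word count in its high 16 bits and the opcode in its low 16 bits), and it assumes the words are already in host byte order. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SPIRV_MAGIC        0x07230203u  /* first word of every SPIR-V module */
#define SPIRV_HEADER_WORDS 5u           /* magic, version, generator, bound, reserved */

/* Returns 1 if the stream is long enough for a header and starts with the magic number. */
int spirv_header_ok(const uint32_t *words, size_t word_count)
{
    if (words == NULL || word_count < SPIRV_HEADER_WORDS)
        return 0;
    return words[0] == SPIRV_MAGIC;
}

/* Walks the instruction stream after the header.
 * Returns the number of instructions, or -1 if the stream is malformed. */
long spirv_count_instructions(const uint32_t *words, size_t word_count)
{
    if (!spirv_header_ok(words, word_count))
        return -1;

    size_t i = SPIRV_HEADER_WORDS;
    long instructions = 0;
    while (i < word_count) {
        uint32_t length = words[i] >> 16;      /* high 16 bits: instruction word count */
        if (length == 0 || length > word_count - i)
            return -1;                         /* zero-length or truncated instruction */
        i += length;
        instructions++;
    }
    return instructions;
}
```

A tool would typically run a check like this before handing the binary to the driver; real validation is the job of the SPIR-V Validator listed on the slide.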
  18. NVIDIA’s SIGGRAPH Driver Update: OpenGL 4.6 + Multi-vendor Interop + Vulkan Updates & More
      • NVIDIA historically releases a “developer” driver at SIGGRAPH with support for all Khronos standards announced at SIGGRAPH
        • This year too
      • Monday (July 31, 2017) NVIDIA put out a new SIGGRAPH driver
        • OpenGL 4.6 (beta, expected to pass 4.6 Conformance when available)
        • Multi-vendor (EXT) interoperability extensions
          • Finally portable interoperability between OpenGL, Vulkan, OpenCL, etc.
          • Generic: EXT_memory_object, EXT_semaphore
          • Windows: EXT_memory_object_win32, EXT_win32_keyed_mutex, EXT_semaphore_win32
          • Unix: EXT_memory_object_fd, EXT_semaphore_fd
        • Other new extensions
          • NV_blend_minmax_factor, consistent with AMD_blend_minmax_factor
        • Fill in missing ES functionality gaps
          • EXT_clear_texture, EXT_conservative_depth, EXT_shader_group_vote, EXT_texture_compression_bptc, EXT_texture_sRGB_R8, EXT_draw_transform_feedback, OES_viewport_array
          • EXT_clip_cull_distance, ES support for clip planes & cull distances
          • EXT_protected_textures (Tegra & ES only) for protected content
        • For Windows and Linux operating systems
        • Also Vulkan improvements
      • https://developer.nvidia.com/opengl-driver
  19.
  20. What OpenGL 4.6 Packages Together
      • OpenGL evolves by bundling extensions as a core version update
      • OpenGL 4.6 = everything in 4.5 plus these extensions
        • ARB_indirect_parameters
        • ARB_pipeline_statistics_query
        • ARB_polygon_offset_clamp
        • KHR_no_error
        • ARB_shader_atomic_counter_ops (just extends OpenGL Shading Language)
        • ARB_shader_draw_parameters
        • ARB_shader_group_vote (just extends OpenGL Shading Language)
        • ARB_gl_spirv
        • ARB_spirv_extensions
        • ARB_texture_filter_anisotropic
        • ARB_transform_feedback_overflow_query
      • Now you can code for this functionality without ARB or EXT suffixing!
      • Callout: The one technically “brand new” extension; other 4.6 functionality already proven & public
  21. ARB_indirect_parameters: Intro & Review
      • Evolving capability in OpenGL 4.x
        • General idea: allow the GPU to generate its own rendering work
        • Part of AZDO philosophy (AZDO = Approaching Zero Driver Overhead)
        • Big idea: if the GPU generates its own work, the driver overhead on the CPU diminishes
        • Example: compute shader generates sets of meshes, then renders those meshes
        • But we don’t want the GPU to “wait” for the CPU to orchestrate this effort
      • Builds on OpenGL 4.0 and 4.3’s improvements
        • 4.0 added indirect draws: instanced draw call’s parameters sourced from GPU buffer
        • 4.3 added multiple indirect draws: one GL command launches N indirect draws
      • OpenGL 4.6’s breakthrough: ARB_indirect_parameters
        • Now the count of multiple indirect draw batches itself can be sourced from the GPU
  22. Original Ways to Draw
      • Two primary ways to draw with vertex arrays
        • glDrawElements: accepts an array of vertex indexes
        • glDrawArrays: accepts a sequential range of indexes
      • OpenGL 3.1 added instanced versions
        • glDrawElementsInstanced
        • glDrawArraysInstanced
        • Each includes an “instance count” parameter
        • Repeats each draw “instance count” times, changing gl_InstanceID each iteration
  23. Vertex array buffer configuration
        Vertex 0: (x0,y0), (r0,g0,b0)
        Vertex 1: (x1,y1), (r0,g0,b0)
        Vertex 2: (x2,y2), (r0,g0,b0)
        Vertex 3: (x3,y3), (r1,g1,b1)
        Vertex 4: (x4,y4), (r1,g1,b1)
        Vertex 5: (x5,y5), (r1,g1,b1)
      Index buffer (element array) configuration
        0 1 1 2 2 0 3 4 4 5 5 3
      Draw calls
        glDrawArrays(GL_TRIANGLES, 0, 6);
        glDrawElements(GL_LINES, 12, GL_UNSIGNED_INT, 0);
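The index buffer on this slide outlines each triangle with three line segments. As a hedged illustration of how such an element array could be produced, the sketch below generates the same 12 indices for two triangles. The function name is hypothetical and nothing here calls OpenGL; the resulting array is what one would upload to the element array buffer before calling glDrawElements with GL_LINES.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes six line-list indices per triangle (edges v0-v1, v1-v2, v2-v0),
 * assuming triangle t uses vertices 3t, 3t+1, 3t+2.
 * `out` must have room for 6 * triangle_count entries.
 * Returns the number of indices written. */
size_t triangle_outline_indices(uint32_t triangle_count, uint32_t *out)
{
    size_t n = 0;
    for (uint32_t t = 0; t < triangle_count; ++t) {
        uint32_t v = 3u * t;          /* first vertex of this triangle */
        out[n++] = v;     out[n++] = v + 1;
        out[n++] = v + 1; out[n++] = v + 2;
        out[n++] = v + 2; out[n++] = v;
    }
    return n;
}
```

For two triangles this yields 0 1 1 2 2 0 3 4 4 5 5 3, matching the slide’s index buffer and the count of 12 passed to glDrawElements.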
  24. Multi Draw Arrays
      • glMultiDrawArrays & glMultiDrawElements
        • Same as before, but loop over glDrawArrays or glDrawElements
        • Primitive count parameter says how many iterations
        • Each iteration sources non-mode parameters from CPU arrays
      • Fundamentally not more powerful than you writing the loop in your CPU code
        • But establishes a useful pattern for the future...
  25. Instancing
      • GPU draws the same primitive topology, N times
        • Shader or vertex attribute usage can transform & shade each instance differently
      • Loops to output a single set of draw indices multiple times
        • Each iteration outputs a different instance
        • GLSL shaders can access gl_InstanceID to behave differently per instance
      • Instancing alternative to using gl_InstanceID in your shader
        • glVertexAttribDivisor gives a vertex attribute array a divisor
        • When divisor is non-zero, floor(instance / divisor) is used as the index for this array
        • Common usage: when divisor is 1 for a vertex attribute array, the instance ID is used as the index
        • Effectively enables per-instance vertex arrays
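The divisor rule can be written down as a tiny CPU-side model. This is only a sketch of the indexing arithmetic described on the slide, with a hypothetical function name; it ignores any base-instance offset and does not call OpenGL.

```c
#include <stdint.h>

/* Models which array element a vertex attribute fetches.
 * divisor == 0: ordinary per-vertex attribute, indexed by the vertex index.
 * divisor != 0: per-instance attribute, indexed by floor(instance / divisor). */
uint32_t attrib_element_index(uint32_t vertex_index, uint32_t instance, uint32_t divisor)
{
    if (divisor == 0)
        return vertex_index;
    return instance / divisor;   /* unsigned integer division already floors */
}
```

With a divisor of 1 every instance gets its own element; with a divisor of 2, instances 4 and 5 share element 2.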
  26. Power of Instancing
      • Vertex arrays with a single object mesh can render N distinct instances from a single GL command
      • Example image shows (source: In2GPU)
        • Hundreds of instances
        • Drawn from a single mesh
        • Each instance has its own color & translation
      • Observations
        • GPU reads instanced vertex attributes
        • But CPU still launches the N instances
  27. Draw Indirect (OpenGL 4.0)
      • Conventional GL draw calls
        • Require directly passing parameters to each GL draw call to find the indices to source
        • Direct parameter passing means CPU supplies all the draw parameters
        • Causes CPU overhead on each draw
      • Solution: Draw Indirect
        • Sources each batch of draw arrays or draw elements parameters from a GPU buffer
        • Parameters, except for mode, accessed from GL_DRAW_INDIRECT_BUFFER binding
      • Big, non-obvious advantage
        • GPU can generate draw batches itself
        • Say with compute shaders
        • Means GPU can begin to feed itself
  28. Draw Indirect Buffer Layout
      • glDrawArraysIndirect
        • Takes: (GLenum mode, const void *indirect)
        • indirect is GPU buffer offset to four 32-bit words
        • Mimics calling glDrawArraysInstanced(mode, cmd->first, cmd->count, cmd->primCount);
      • glDrawElementsIndirect
        • Takes: (GLenum mode, GLenum type, const void *indirect)
        • indirect is GPU buffer offset to five 32-bit words
        • Mimics calling glDrawElementsInstancedBaseVertex(mode, cmd->count, type, cmd->firstIndex * sizeof-type, cmd->primCount, cmd->baseVertex);
      • BUT cmd pointer indirection happens on the GPU, sourced from a GL buffer object
      • Important: These structures are read by the GPU from GPU buffers
        struct DrawArraysIndirectCommand {
          GLuint count;
          GLuint primCount;
          GLuint first;
          GLuint reservedMustBeZero;
        };
        struct DrawElementsIndirectCommand {
          GLuint count;
          GLuint primCount;
          GLuint firstIndex;
          GLint baseVertex;
          GLuint reservedMustBeZero;
        };
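Since the GPU reads these structures as raw 32-bit words, the CPU-side layout has to match exactly. The sketch below mirrors the slide’s two structs using fixed-width standard types as stand-ins for GLuint and GLint (an assumption, so the snippet builds without OpenGL headers) and checks the four-word and five-word sizes at compile time. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the slide's structs: uint32_t for GLuint, int32_t for GLint. */
typedef struct {
    uint32_t count;
    uint32_t primCount;
    uint32_t first;
    uint32_t reservedMustBeZero;
} DrawArraysIndirectCommand;

typedef struct {
    uint32_t count;
    uint32_t primCount;
    uint32_t firstIndex;
    int32_t  baseVertex;
    uint32_t reservedMustBeZero;
} DrawElementsIndirectCommand;

/* The layouts must be exactly four and five 32-bit words, with no padding. */
_Static_assert(sizeof(DrawArraysIndirectCommand) == 4 * sizeof(uint32_t),
               "DrawArraysIndirectCommand must be four 32-bit words");
_Static_assert(sizeof(DrawElementsIndirectCommand) == 5 * sizeof(uint32_t),
               "DrawElementsIndirectCommand must be five 32-bit words");

/* Fills a command equivalent to glDrawArraysInstanced(mode, first, count, instances). */
DrawArraysIndirectCommand make_arrays_command(uint32_t first, uint32_t count, uint32_t instances)
{
    DrawArraysIndirectCommand cmd = { count, instances, first, 0 };
    return cmd;
}

/* Byte offset into the index buffer implied by firstIndex ("firstIndex * sizeof-type"). */
size_t index_byte_offset(uint32_t first_index, size_t index_size)
{
    return (size_t)first_index * index_size;
}
```

A command built this way would be copied into the buffer bound to GL_DRAW_INDIRECT_BUFFER, or written there directly by a compute shader.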
  29. Multi Draw Indirect (OpenGL 4.3)
      • Now a single GL command can launch multiple draw indirect operations
        • Takes a primitive count (N) for number of draw indirects
        • Performs N draw indirect operations
        • Each operation’s parameters are read from draw indirect buffer binding
        • Stride parameter
      • glMultiDrawArraysIndirect & glMultiDrawElementsIndirect
        • Single CPU command launches N draw indirect operations
        • All the parameters for all the draw indirect operations sourced by GPU
      • Very high leverage: tiny CPU effort can launch enormous amount of rendering
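The stride parameter determines where each of the N commands is read from in the draw indirect buffer. A minimal sketch of that addressing follows; it relies on the OpenGL convention that a stride of zero means the commands are tightly packed, and the function name is hypothetical.

```c
#include <stddef.h>

/* Byte offset of the i-th indirect command within the draw indirect buffer.
 * base_offset:  the `indirect` offset passed to the multi-draw call
 * stride:       bytes between consecutive commands; 0 means tightly packed
 * command_size: size of one command struct (16 for arrays, 20 for elements) */
size_t indirect_command_offset(size_t base_offset, size_t stride,
                               size_t command_size, size_t i)
{
    size_t step = (stride == 0) ? command_size : stride;
    return base_offset + i * step;
}
```

A non-zero stride lets an application interleave its own per-draw data between commands in the same buffer.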
  30. ARB_indirect_parameters
      • Yet another NEW buffer binding
        • glBindBuffer(GL_PARAMETER_BUFFER, buffer); (buffer = your buffer object name)
        • Buffer source for reading the indirect draw count
      • Two NEW commands
        • glMultiDrawArraysIndirectCount
        • glMultiDrawElementsIndirectCount
        • Like glMultiDraw{Arrays/Elements}Indirect except
          • NEW draw count offset parameter is a buffer offset into the NEW current parameter buffer: parameter_buffer[drawcountoffset] → actual drawcount
          • Count clamped by maxdrawcount parameter
      • What’s better about the OpenGL 4.6 version?
        • Free of ARB suffixes in OpenGL 4.6
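The count lookup and clamp can be modeled on the CPU in a few lines. This is a sketch of the arithmetic only, assuming the count is stored as a 32-bit signed integer at the given byte offset; the function name is hypothetical, and treating a negative count as zero draws is a defensive choice of this sketch, not something the slide specifies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models how the draw count is chosen: read a 32-bit count at a byte
 * offset in the parameter buffer, then clamp it to maxdrawcount. */
int32_t effective_draw_count(const unsigned char *parameter_buffer,
                             size_t drawcount_offset, int32_t maxdrawcount)
{
    int32_t count;
    memcpy(&count, parameter_buffer + drawcount_offset, sizeof count);
    if (count < 0)
        return 0;   /* sketch-only safeguard for a corrupt count */
    return count < maxdrawcount ? count : maxdrawcount;
}
```

The maxdrawcount clamp is what lets the CPU bound the work even though it never sees the GPU-written count.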
  31. ARB_indirect_parameters Usage Scenario
      • Correctly-ordered blended dynamic particle system
        • Particles are semi-opaque 3D models, not just points
      • OpenGL compute shader computes particle interactions & what to render
        • Incrementally update particle positions & spin
        • Cull particles outside current view
        • Back-to-front sort of remaining viewable semi-opaque 3D models
        • Write out ordered, un-culled multi draw indirect to GL_DRAW_INDIRECT_BUFFER
        • Write out total of un-culled draw indirect count to GL_PARAMETER_BUFFER
      • Single glMultiDrawElementsIndirectCount command draws particles
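The cull, sort, and write-out steps can be sketched on the CPU to show what ends up in the two buffers. On the slide this work is done by a compute shader on the GPU; the C version below is only a stand-in for the logic. The Particle struct, its fields, and the function names are hypothetical, and uint32_t/int32_t stand in for GLuint/GLint.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-particle record: view-space depth, culling result,
 * and where the particle's mesh lives in the index buffer. */
typedef struct {
    float    view_depth;
    int      visible;
    uint32_t index_count;
    uint32_t first_index;
} Particle;

typedef struct {
    uint32_t count, primCount, firstIndex;
    int32_t  baseVertex;
    uint32_t reservedMustBeZero;
} DrawElementsIndirectCommand;

/* qsort comparator: larger depth (farther away) sorts first, i.e. back to front. */
static int farther_first(const void *a, const void *b)
{
    float da = ((const Particle *)a)->view_depth;
    float db = ((const Particle *)b)->view_depth;
    return (da < db) - (da > db);
}

/* Culls invisible particles in place, sorts the rest back to front, and writes
 * one draw command per survivor into cmds (room for n entries required).
 * Returns the survivor count, the value that would go to GL_PARAMETER_BUFFER. */
uint32_t build_particle_draws(Particle *particles, uint32_t n,
                              DrawElementsIndirectCommand *cmds)
{
    uint32_t kept = 0;
    for (uint32_t i = 0; i < n; ++i)
        if (particles[i].visible)
            particles[kept++] = particles[i];

    qsort(particles, kept, sizeof *particles, farther_first);

    for (uint32_t i = 0; i < kept; ++i) {
        DrawElementsIndirectCommand cmd =
            { particles[i].index_count, 1, particles[i].first_index, 0, 0 };
        cmds[i] = cmd;
    }
    return kept;
}
```

The cmds array corresponds to the contents of GL_DRAW_INDIRECT_BUFFER and the return value to the count in GL_PARAMETER_BUFFER, so a single glMultiDrawElementsIndirectCount call can consume both without a CPU readback.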
  32. ARB_pipeline_statistics_query
      • New query types
        • Shares same API initially used for occlusion queries
        • glBeginQuery, glEndQuery, glGetQueryiv, glGenQueries, glDeleteQueries
        • Original occlusion queries just returned samples passed
        • Prior extensions added queries for transform feedback, conservative rasterization
      • Now extended to return rendering statistics throughout the pipeline
        • Shader invocation counts
        • How many primitives pass through different points in rendering pipeline
      • Useful for performance analysis
        • Without this functionality, very difficult to accurately know how much rendering work you are really creating
        • Particularly for modern OpenGL usage
      • Comparable to statistics available to Direct3D 11
        • Compare with D3D11_QUERY_DATA_PIPELINE_STATISTICS
  33. Available Statistics
      Query token: queried statistic
      • GL_VERTICES_SUBMITTED: # of vertices issued to OpenGL
      • GL_PRIMITIVES_SUBMITTED: # of primitives issued to OpenGL
      • GL_VERTEX_SHADER_INVOCATIONS: # of times a vertex shader is invoked
      • GL_TESS_CONTROL_SHADER_PATCHES: # of patches processed by a tessellation control shader
      • GL_TESS_EVALUATION_SHADER_INVOCATIONS: # of times a tessellation evaluation shader is invoked
      • GL_GEOMETRY_SHADER_INVOCATIONS: # of times a geometry shader is invoked
      • GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED: # of primitives emitted by a geometry shader
      • GL_FRAGMENT_SHADER_INVOCATIONS: # of times a fragment shader is invoked
      • GL_COMPUTE_SHADER_INVOCATIONS: # of times a compute shader is invoked
      • GL_CLIPPING_INPUT_PRIMITIVES: # of primitives that entered primitive clipping
      • GL_CLIPPING_OUTPUT_PRIMITIVES: # of primitives output by primitive clipping
  34. Simple Example Usage
      • Creating a query object
        • GLuint query_object;
        • glGenQueries(1, &query_object);
      • Begin a query, do work, and end the query’s interval
        • glBeginQuery(GL_FRAGMENT_SHADER_INVOCATIONS, query_object);
        • renderLotsOfStuff!
        • glEndQuery(GL_FRAGMENT_SHADER_INVOCATIONS);
      • Later read back to the CPU the query object’s result
        • Ideally not immediately after the rendering so retrieving the query doesn’t stall the pipeline!
        • GLuint64 query_result;
        • glGetQueryObjectui64v(query_object, GL_QUERY_RESULT, &query_result);
      • When done with the query object
        • glDeleteQueries(1, &query_object);
      • Alternatively write query results into a buffer...
  35. Example Writing Query Results to GPU Buffers (callout: create multiple query objects)
      • Create multiple query objects
        • const GLint num_results = 2; // could be larger!
        • GLuint query_object[2];
        • glGenQueries(num_results, query_object);
      • Create GPU buffer object for writing query results into
        • GLuint query_buffer_object;
        • glGenBuffers(1, &query_buffer_object);
        • glBindBuffer(GL_QUERY_BUFFER, query_buffer_object);
        • glBufferData(GL_QUERY_BUFFER, num_results*sizeof(GLuint64), NULL, GL_DYNAMIC_READ);
      • Begin a query, do work, end the query’s interval, and write query results to query buffer offsets
        • glBeginQuery(GL_FRAGMENT_SHADER_INVOCATIONS, query_object[0]);
        • glBeginQuery(GL_CLIPPING_OUTPUT_PRIMITIVES, query_object[1]);
        • renderLotsOfStuff!
        • glEndQuery(GL_CLIPPING_OUTPUT_PRIMITIVES);
        • glEndQuery(GL_FRAGMENT_SHADER_INVOCATIONS);
        • glBindBuffer(GL_QUERY_BUFFER, query_buffer_object);
        • glGetQueryObjectui64v(query_object[0], GL_QUERY_RESULT, (GLuint64 *)(sizeof(GLuint64)*0)); // buffer offset, not a CPU pointer
        • glGetQueryObjectui64v(query_object[1], GL_QUERY_RESULT, (GLuint64 *)(sizeof(GLuint64)*1));
      • Later read the query results from GPU buffer
        • Ideally not immediately after the rendering so retrieving the query doesn’t stall the pipeline!
        • GLuint64 query_result[2];
        • glGetBufferSubData(GL_QUERY_BUFFER, 0, sizeof(query_result), query_result);
      • Cleanup
        • glDeleteBuffers(1, &query_buffer_object);
        • glDeleteQueries(num_results, query_object);
  36. 36. 36 Example Writing Query Results to GPU Buffers (same code as slide 35; highlighted step: create buffer object for GPU to write query results)
  37. 37. 37 Example Writing Query Results to GPU Buffers (same code as slide 35; highlighted step: begin queries, draw, and end them)
  38. 38. 38 Example Writing Query Results to GPU Buffers (same code as slide 35; highlighted step: now have GPU write query results to GPU buffer)
  39. 39. 39 Example Writing Query Results to GPU Buffers (same code as slide 35; highlighted step: later read the GPU buffer’s contents to the CPU)
  40. 40. 40 ARB_polygon_offset_clamp • Extends OpenGL’s polygon offset feature • Polygon offset was one of OpenGL’s first extensions • Standardized by OpenGL 1.1 • Biases rasterized depth (Z) by a constant bias + a bias based on the primitive’s maximum depth slope • What’s NEW in OpenGL 4.6 • Effective depth bias clamped to a specified maximum offset • Used to mitigate second-order light leak artifacts of polygon offset • Long supported by PlayStation 3 and Direct3D • First exposed in OpenGL as multi-vendor EXT extension • EXT_polygon_offset_clamp in 2014 • Adding to OpenGL 4.6 resolves IP issues • Source: Eric Lengyel, Terathon Software
  41. 41. 41 Motivation & Usage of Polygon Offset • Motivation of polygon offset • Depth buffers must quantize depth values • Typically 24-bit fixed-point • Want to rasterize depth-tested geometry • BUT need to disambiguate nearly identical depth values • Rasterizing co-planar geometry, e.g. runway markings • Constructing shadow maps needs Z values to be “pushed back a little” to avoid Z fighting causing self-shadowing artifacts • Figure captions: shadow acne due to Z fighting during shadow map testing; shadow acne avoided using polygon offset; hidden line and silhouette rendering via polygon offset
  42. 42. 42 Polygon Offset Justified (1) • Rasterizing triangles generates discretized depth values • A rasterizer’s depth slope for a triangle determines how Z values vary over the triangle in pixel space • Triangles are “snapped” to sub-pixel fractional positions • Practical requirement, necessary for watertight rasterization • Rasterization hardware operates with finite fixed-point precision • Dealing with Z fighting isn’t as simple as “nudging Z values” a little closer/further • Two triangles logically in the same plane are NOT in the same plane after • floating-point transformation • sub-pixel transformation • discrete depth interpolation • geometric mesh uncertainty – those triangles may appear co-planar, but are they really??
  43. 43. 43 Polygon Offset Justified (2) • Conceptually, think of interpolated depth as having “error bars” • Depth rasterization error isn’t “experimental” but rather “quantization” error • Important: The depth slope tells the maximum amount the depth of a primitive will shift when moving one pixel in X & Y • So if there is uncertainty (read: quantization!) in X & Y, a primitive’s depth slope quantifies the maximum error per pixel shift • Hence polygon offset’s bias should be scaled by the maximum of the X & Y depth slopes • This is what the original OpenGL 1.1 polygon offset functionality does • Bias applied in units of minimum Z buffer precision • Typically a bias of 1 or 2 and slope of 0.5 is enough to mitigate Z fighting • Accounts for half a pixel of Z error • Sounds fishy but (mostly) works! • Think of your rasterized fragments & pixels having error bars for X & Y... and Z!
  44. 44. 44 Polygon Offset Improved! • Wait... Sounds fishy but (mostly) works! • Mostly?? • What can go wrong? • The depth slope can “get large” for geometry viewed edge-on • Gradient magnitude for slope is conservative and can get too large • So “fixing” shadow acne “exposes” light leaks • This is the “too much of a good thing” principle at work • Analogy: Band-aid on a band-aid: • If the bias can sometimes get too large... then • Clamp the maximum depth bias to some largest “reasonable” offset
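  45. 45. 45 Using Polygon Offset Clamp • Easy API just adds new maximum depth bias clamp value • GL 1.1: glPolygonOffset(factor, units); • GL 4.6: glPolygonOffsetClamp(factor, units, clamp); • Changes the OpenGL specification’s equation for depth bias (m = maximum depth slope, r = minimum resolvable depth difference) • WAS: o = m × factor + r × units • NOW: o = m × factor + r × units when clamp is 0 or NaN; o = min(m × factor + r × units, clamp) when clamp > 0; o = max(m × factor + r × units, clamp) when clamp < 0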
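The clamped depth bias can be sketched as a small CPU-side C function. This is an illustrative model, not driver code: it follows my reading of the polygon offset clamp specification, where `m` is the primitive's maximum depth slope and `r` is the minimum resolvable depth difference. The function name `polygon_offset_bias` and the sample values in the usage note are hypothetical.

```c
/* CPU-side model of the depth bias (illustrative only).
 * m      : maximum depth slope of the primitive
 * r      : minimum resolvable depth difference of the depth buffer
 * factor, units, clamp : the glPolygonOffsetClamp arguments */
float polygon_offset_bias(float m, float r,
                          float factor, float units, float clamp)
{
    float bias = m * factor + r * units;     /* the GL 1.1 bias */

    if (clamp != clamp || clamp == 0.0f)     /* NaN or zero: clamping disabled */
        return bias;
    if (clamp > 0.0f)                        /* positive clamp caps the bias from above */
        return bias < clamp ? bias : clamp;
    return bias > clamp ? bias : clamp;      /* negative clamp caps it from below */
}
```

For a primitive viewed nearly edge-on, say `m = 100` with `factor = 0.5`, the unclamped bias is dominated by the slope term; passing a small positive `clamp` bounds it, which is the light-leak mitigation the slide describes.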
  46. 46. 46 Examples of Light Leaks Mitigated by Polygon Offset Clamp • BEFORE: solid girder’s shadow shows streaks; animates badly • AFTER: mitigated by clamping
  47. 47. 47 Examples of Light Leaks Mitigated by Polygon Offset Clamp • BEFORE: dots of light within boot’s shadow; animates badly • AFTER: mitigated by clamping
  48. 48. 48 KHR_no_error • The “no airbags” extension now part of OpenGL 4.6 • Makes OpenGL operation in the presence of GL_INVALID_VALUE, GL_INVALID_OPERATION, etc. undefined • GL_OUT_OF_MEMORY may still be generated but the occurrence might be delayed • Intended to make OpenGL more efficient by obviating error checking & handling • Hmm, not a large overhead in NVIDIA’s driver • Typically error checks are folded into parameter handling • Error checks are typically well-predicted “branches not taken” so cheap on modern CPUs • You must “opt in” to the “no error” semantic at context creation • For EGL, works with eglCreateContext • See the EGL_KHR_create_context_no_error extension • Query the value of GL_CONTEXT_FLAGS for the GL_CONTEXT_FLAG_NO_ERROR_BIT to see if the “no errors” semantic is enabled for a context • WGL_ARB_create_context_no_error and GLX_ARB_create_context_no_error provide WGL and GLX mechanisms for requesting the “no error” semantic for a context.
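The GL_CONTEXT_FLAGS check described above amounts to a single bit test. The sketch below isolates that test so it can be read without a live context; the helper name `context_is_no_error` is hypothetical, and the token value is the one I believe the Khronos headers publish for GL_CONTEXT_FLAG_NO_ERROR_BIT.

```c
/* Token value as found in the Khronos registry headers; guarded in case
 * a real OpenGL header has already defined it. */
#ifndef GL_CONTEXT_FLAG_NO_ERROR_BIT
#define GL_CONTEXT_FLAG_NO_ERROR_BIT 0x00000008
#endif

/* Returns 1 when the context flags word reports the "no error" semantic.
 * In an application the flags word would come from
 * glGetIntegerv(GL_CONTEXT_FLAGS, &flags) on the current context. */
int context_is_no_error(unsigned int context_flags)
{
    return (context_flags & GL_CONTEXT_FLAG_NO_ERROR_BIT) != 0;
}
```

Checking the flag after context creation is worthwhile because requesting the "no error" semantic is only a request; the flag reports what the context actually provides.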
  49. 49. 49 NVIDIA’s KHR_no_error Advice • General advice: “Try it before you buy it” • Not generating errors has a severe side-effect (main effect!): you’re blind to errors! • First confirm there’s some sufficient performance benefit to offset the risk • If you really are worried about API error detection overhead, consider Vulkan • And before you even try it: • Try disabling GL_DEBUG_OUTPUT_SYNCHRONOUS (part of OpenGL 4.3) first • This still detects GL errors but avoids returning errors synchronously • Asynchronous error and debug output helps NVIDIA’s dual-core driver to avoid app-driver synchronization events for errors and debug output • Then OpenGL API overhead can be relegated to another CPU improving performance • Without losing well-defined error handling • NVIDIA’s current “no errors” behavior is to simply hide posting the OpenGL error • So the current benefit of “no errors” is very meager • Errors are still detected and erroneous commands are ignored • Considerations • Expecting your software to work for years? • Is your application’s predictable operation important for your user base? • If yes, then blinding yourself to errors probably isn’t a good idea...
  50. 50. 50 ARB_shader_atomic_counter_ops • Completes OpenGL Shading Language (GLSL) support for atomic counters • Prior ARB_shader_atomic_counters limited to increment, decrement, & query ops • Operates on special atomic_uint variables • New built-in functions for atomic counters • Addition & Subtraction: atomicCounterAdd & atomicCounterSubtract • Minimum & Maximum: atomicCounterMin & atomicCounterMax • Bitwise operators (AND, OR, XOR): atomicCounterAnd, atomicCounterOr, atomicCounterXor • Exchange, Compare & Exchange: atomicCounterExchange, atomicCounterCompSwap • NOTE: Image loads & stores support similar atomics
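As a CPU-side analogy only (this is C11, not GLSL), the semantics of three of these operations can be sketched with `<stdatomic.h>`. The sketch assumes, as I understand the GLSL built-ins to behave, that each operation returns the counter's value from before the operation. The function names `counter_add`, `counter_min`, and `counter_comp_swap` are hypothetical.

```c
#include <stdatomic.h>

/* Add: returns the counter's value prior to the addition. */
unsigned counter_add(atomic_uint *c, unsigned data)
{
    return atomic_fetch_add(c, data);
}

/* Min: C11 has no fetch-min, so retry a compare-exchange until it succeeds
 * or the counter is already no larger than data. */
unsigned counter_min(atomic_uint *c, unsigned data)
{
    unsigned old = atomic_load(c);
    while (old > data && !atomic_compare_exchange_weak(c, &old, data)) {
        /* a failed exchange refreshed 'old' with the current value; retry */
    }
    return old;
}

/* Compare & exchange: store data only if the counter equals compare.
 * Returns the prior value whether or not the store happened. */
unsigned counter_comp_swap(atomic_uint *c, unsigned compare, unsigned data)
{
    unsigned expected = compare;
    atomic_compare_exchange_strong(c, &expected, data);
    return expected;
}
```

On a GPU these are single built-in calls on an `atomic_uint`; the retry loop in `counter_min` exists only because the C11 library lacks a direct equivalent.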
  51. 51. 51 ARB_shader_draw_parameters • Adds two new GLSL built-in variables to get base vertex and instance • gl_BaseVertex • gl_BaseInstance • Useful for offsetting gl_VertexID or gl_InstanceID respectively • Also for glMultiDraw* commands, new GLSL built-in variable • gl_DrawID • glMultiDrawArrays, glMultiDrawArraysIndirect, glMultiDrawArraysIndirectCount • glMultiDrawElements, glMultiDrawElementsIndirect, glMultiDrawElementsIndirectCount • Rationale: lets app treat draw batches programmatically from within an über shader to better minimize state changes
  52. 52. 52 ARB_shader_group_vote • Provides new GLSL built-in functions to compute a composite of a set of boolean conditions across a group of shader invocations • Functions returning a boolean • bool anyInvocation(bool value): all threads return true if value is true for any thread, otherwise false • bool allInvocations(bool value): all threads return true if value is true for all threads, otherwise false • bool allInvocationsEqual(bool value): all threads return true if value is identical (equal) for all threads, otherwise false
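The three vote functions can be modeled on the CPU by treating a thread group as an array holding one boolean per invocation. This is a sketch of the semantics only, under the assumption that every invocation in the group is active; the function names and the array-based "group" are illustrative, not part of any API.

```c
#include <stdbool.h>
#include <stddef.h>

/* true if value is true for at least one invocation in the group */
bool any_invocation(const bool *value, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (value[i])
            return true;
    return false;
}

/* true only if value is true for every invocation in the group */
bool all_invocations(const bool *value, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (!value[i])
            return false;
    return true;
}

/* true if every invocation holds the same value (all true or all false) */
bool all_invocations_equal(const bool *value, size_t count)
{
    for (size_t i = 1; i < count; i++)
        if (value[i] != value[0])
            return false;
    return true;
}
```

In a shader, every invocation in the group receives the same result, which is what lets the whole group take the same branch.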
  53. 53. 53 ARB_shader_group_vote Rationale • Rationale • Implementation reality: GPUs run shader invocations using groups of threads • NVIDIA calls these groups “warps” • Threads run most efficiently when they share the same sequence of instructions • This is called “converged execution” (good), instead of diverged execution (bad) • Group votes can keep threads running converged • Consider this an advanced optimization to your shaders • SPMT (“single program, multiple thread”) execution means shaders run reasonably even when divergence is possible • Example use: It is common for all threads in a shader to need exactly four loop iterations • If all threads can agree they are in the “4 iterations” case, the shader could be written with an unrolled loop in expectation of this common case • Thereby avoiding the loop overhead of the general case
  54. 54. 54 ARB_gl_spirv • This extension was announced at SIGGRAPH 2016 • But was optional • NVIDIA announced support last year • Much more useful to have as a core part of OpenGL 4.6 • And NOW it is!
  55. 55. 55 GLSL Compilation prior to SPIR-V • Diagram: shader.vert, shader.geom, shader.frag → your OpenGL app → OpenGL Driver (GLSL Compiler Front-end → GPU-specific Compiler Back-end) → GPU
  56. 56. 56 ARB_gl_spirv Enabled Offline Compilation of GLSL to SPIR-V • Diagram: shader.vert, shader.geom, shader.frag → glslangValidator or glslc (GLSL Compiler Front-end, offline) → shader.vert.spv, shader.geom.spv, shader.frag.spv → your OpenGL app → OpenGL Driver (SPIR-V Compiler Front-end → GPU-specific Compiler Back-end) → GPU
  57. 57. 57 Tools to Manipulate SPIR-V • Open source SPIR-V tools available • glslang: glslangValidator • Provides basic GLSL compiler that generates OpenGL friendly SPIR-V • Use the -G option for ARB_gl_spirv SPIR-V • https://github.com/KhronosGroup/glslang • SPIRV-Tools: spirv-as, spirv-dis, spirv-stats, etc. • Utilities for assembling, disassembling, or otherwise manipulating SPIR-V binaries • https://github.com/KhronosGroup/SPIRV-Tools • glslc • Compiler front-end matching conventional gcc/clang command line options • Use the --target-env=opengl_compat option • https://github.com/google/shaderc • Your choice: • Build from source • Get pre-compiled from LunarG Vulkan SDK
  58. 58. 58 API Usage Differences: Compiling GLSL vs. SPIR-V (GLSL path) • glCreateProgram • while more shader domains: Read GLSL text from file, glCreateShader, glShaderSource, glCompileShader, glAttachShader • glLinkProgram • while more uniforms to introspect: glGetUniformLocation • while more attributes to introspect: glGetAttribLocation • glUseProgram • glProgramUniform*
  59. 59. 59 API Usage Differences: Compiling GLSL vs. SPIR-V (SPIR-V path) • glCreateProgram • while more shader domains: Read SPIR-V binary blob from file, glCreateShader, glShaderBinary, glSpecializeShader, glAttachShader • glLinkProgram • glUseProgram • while more uniforms to initialize: glProgramUniform* • App assumes locations assigned within the shader, obviating dynamic introspection
  60. 60. 60 ARB_spirv_extensions • Original ARB_gl_spirv extension only added support for SPIR-V 1.0 concepts that were part of the OpenGL 4.5 Core Profile • Many OpenGL ARB and vendor extensions not in OpenGL 4.5 Core add shading language concepts • BUT being defined prior to the existence of SPIR-V support in OpenGL, they lack SPIR-V support for their additional features • Advertising an extension + its SPIR-V extension means the SPIR-V support for that extension is present • So ARB_spirv_extensions adds a mechanism to advertise a driver’s supported SPIR-V extensions: • GLint num_spirv_extensions; glGetIntegerv(GL_NUM_SPIR_V_EXTENSIONS, &num_spirv_extensions); • for (int ndx=0; ndx<num_spirv_extensions; ndx++) { const GLubyte *spirv_extension_name = glGetStringi(GL_SPIR_V_EXTENSIONS, ndx); } • Also defines several SPIR-V extensions...
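A typical use of the advertised list is to ask whether one particular SPIR-V extension is present before loading a module that depends on it. The sketch below shows only the name search, kept separate from the OpenGL calls; the helper `spirv_extension_supported` and its array-of-strings input are hypothetical, and in an application the names would be gathered from `glGetStringi(GL_SPIR_V_EXTENSIONS, ndx)`.

```c
#include <string.h>

/* Returns 1 if 'wanted' appears in the list of advertised SPIR-V
 * extension names, 0 otherwise. NULL entries are skipped. */
int spirv_extension_supported(const char *const *names, int count,
                              const char *wanted)
{
    for (int i = 0; i < count; i++)
        if (names[i] != NULL && strcmp(names[i], wanted) == 0)
            return 1;
    return 0;
}
```

Matching whole names with `strcmp` avoids the classic substring pitfall where a search for one extension name accidentally matches a longer name that contains it.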
  61. 61. 61 First Set of SPIR-V Extensions • SPIR-V Extension Name: Corresponding OpenGL extension or functionality • SPV_KHR_shader_ballot: ARB_shader_ballot • SPV_KHR_shader_draw_parameters: ARB_shader_draw_parameters • SPV_KHR_subgroup_vote: ARB_shader_group_vote • SPV_NV_stereo_view_rendering: NV_stereo_view_rendering • SPV_NV_viewport_array2: NV_viewport_array2 or ARB_shader_viewport_layer_array • SPV_NV_geometry_shader_passthrough: NV_geometry_shader_passthrough • SPV_NV_sample_mask_override_coverage: NV_sample_mask_override_coverage • SPV_AMD_shader_explicit_vertex_parameter: AMD_shader_explicit_vertex_parameter • SPV_AMD_gpu_shader_half_float: AMD_gpu_shader_half_float • SPV_KHR_shader_atomic_counter_ops: ARB_shader_atomic_counter_ops • SPV_KHR_post_depth_coverage: ARB_post_depth_coverage • SPV_KHR_storage_buffer_storage_class: Storage buffer support
  62. 62. 62 ARB_texture_filter_anisotropic • Fully compatible with long-standing EXT_texture_filter_anisotropic • Reduces texture filtering artifacts for surfaces viewed at glancing angles • Simple and easy to use: glTextureParameteri(texture_object, GL_TEXTURE_MAX_ANISOTROPY, 8);
  64. 64. 64 ARB_transform_feedback_overflow_query • Adds new query types which can be used to detect overflow of transform feedback buffers (transform feedback is when OpenGL streams transformed vertices into a GPU buffer) • GL_TRANSFORM_FEEDBACK_OVERFLOW if any stream overflows • GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW if a particular indexed vertex stream overflows • These two NEW query types are also allowed for glBeginConditionalRender for conditional rendering • Allows the graphics pipeline to condition rendering on whether a prior vertex stream operation overflowed • Comparable to Direct3D 11’s D3D11_QUERY_SO_OVERFLOW_PREDICATE* stream-out functionality
  65. 65. 65 Why OpenGL Core Updates are Important (1) • Not just opportunity for new functionality • Coherent Specification • A new specification is released that reconciles all the bundled extensions into a coherent single document • Also gives the OpenGL Working Group the opportunity to better structure OpenGL’s specification • Opportunity to fix typos, improve consistency of terminology, clarify ambiguities, document expected error behavior • Almost two dozen different minor tweaks in 4.6, largely consequential to developers • Future extensions can then be written against a cleanly resolved 4.6 specification • Otherwise, extensions can overlap how they amend the core specification and lead to confusion • Resolving IP Concerns • Ensures new functionality is covered by the Khronos Intellectual Property (IP) Framework • This allows OpenGL implementers, developers, and end-users to confidently depend on the functionality described • Specifically for 4.6, resolves Intellectual Property concerns surrounding both anisotropic texture filtering and polygon offset clamping • Khronos maintains OpenGL, ES, and Vulkan in the same “IP zone”, so ratifying a Khronos standard resolves issues for related standards
  66. 66. 66 Why OpenGL Core Updates are Important (2) • Not just opportunity for new functionality • Conformance Testing (Quality Sheriff) • Opportunity for a new Conformance Test Suite to be released • New tests obviously cover NEW functionality • But also include contributed tests for existing functionality • Without a new core specification, it is hard to enforce stronger conformance testing • Vendors would simply continue certifying with an older, weaker conformance test version • A new core version is a new opportunity to raise the shared quality bar for OpenGL • Developer Comfort Levels • Developers adopt OpenGL features at different levels of comfort • Many developers are happy to use the latest, greatest features as soon as extensions are shipped in drivers • Other developers, often those with long-term support horizons, look for core updates to signal mature standards now ready to be adopted • Example: A graphics researcher and a medical device maker can both use OpenGL, but embrace the features provided at varying rates and at different milestones
  67. 67. 67 Why OpenGL Core Updates are Important (3) • Not just opportunity for new functionality • Advancing the Ecosystem • OpenGL Shading Language (GLSL) gets accompanying revision • So OpenGL 4.6 brings with it an updated GLSL • Like the core API specification, the GLSL specification needs reconciliation of new extensions, typos fixed, clarifications, etc. • As many Vulkan applications express shaders in GLSL and compile them with glslang to generate the SPIR-V that Vulkan expects, updating GLSL helps advance Vulkan • OpenGL core revisions are as much about consolidating OpenGL’s associated ecosystem support as simply adding NEW features to OpenGL
  68. 68. 68 OpenGL 4.6’s Resolving of IP Issues & New Open Sourcing of OpenGL Conformance Suite Benefits Open Source OpenGL Implementation • Khronos using Vulkan’s conformance approach for OpenGL now • See https://github.com/KhronosGroup/VK-GL-CTS • Should help Mesa keep closer to latest official standard, better for OpenGL overall • "OpenGL 4.6 will be the first OpenGL release where conformant open source implementations based on the Mesa project will be deliverable in a reasonable timeframe after release. The open sourcing of the OpenGL conformance test suite and ongoing work between Khronos and X.org will also allow for non-vendor led open source implementations to achieve conformance in the near future." David Airlie, senior principal engineer at Red Hat, and developer on Mesa / X.org projects • Source: Khronos OpenGL 4.6 press release
  69. 69. 69 Credit for OpenGL 4.6 • Khronos relies on its member companies to complete new OpenGL core updates • Different companies drove different features, all free to comment and contribute • Representatives of these companies primarily drove the constituent features of OpenGL 4.6 • See Appendix J of OpenGL 4.6 for comprehensive list of contributor companies and individuals
  70. 70. 70 NVIDIA OpenGL Initiatives in 2017 in addition to OpenGL 4.6
  71. 71. 71 GPU “Interop” Usage • Increasingly applications want to share GPU resources and mix APIs • Typically sophisticated applications • APIs involved might be • Graphics (OpenGL, Vulkan, Direct3D) • Compute (OpenCL, CUDA) • Video encode and decode (VDPAU, NVENC, NVDEC, Windows Media) • Multiple motivations for cross-process GPU resource sharing • Performance (don’t read back to CPU), latency control (VR compositing) • Robustness (isolation) • Security, including protecting digital media assets • Interop = jargon for two things • Sharing GPU resources among different APIs • Sharing GPU resources across process boundaries • For example, a display compositor
  72. 72. 72 Past Interop Extensions for OpenGL • Past interoperability extensions would pair OpenGL concepts to those of one other particular GPU API • Often exposed as proprietary extensions • Typically tied to platform concepts (e.g. Win32 HANDLEs) • Simple when API concepts match (e.g. OpenGL textures to Direct3D Surfaces) • Examples • NV_DX_interop mixes OpenGL and Direct3D 9 • NV_DX_interop2 mixes OpenGL and Direct3D 10 & 11 • NV_vdpau_interop mixes OpenGL and Linux VDPAU video input/output surfaces • Additionally, CUDA & OpenCL have interop to OpenGL • Worked well as designed BUT...
  73. 73. 73 Responding to New Interop Requirements • Addressing criticism of prior interop extensions... • In many cases, single-vendor and proprietary extensions • Can we strive for something multi-vendor? • Overcoming NEW Managed vs. Explicit GPU API philosophy mismatches • Older GPU APIs (e.g. OpenGL, Direct3D 9, 10, 11) manage GPU resources and their underlying memory as one • Older APIs have textures, buffers, and synchronization objects • New GPU APIs (e.g. Vulkan, Direct3D 12) use lower-level mechanisms to manage resources • Newer explicit APIs have explicit memory allocations and semaphores • Noticeable lack of common interop infrastructure • Can there be some common framework for interop? • Isolate platform-specific methods to “import” objects into platform-specific extensions • Windows uses HANDLEs, etc. • POSIX operating systems use file descriptors
  74. 74. 74 EXT_memory_object & EXT_semaphore • Vulkan introduces explicit memory and synchronization objects • EXT_memory_object imports Vulkan explicit memory objects to OpenGL • EXT_semaphore imports Vulkan semaphore objects to OpenGL • Extra interop mechanisms are needed to share GPU objects due to this • Platform-specific extensions specify how to import memory objects & semaphores • For POSIX systems (e.g. Linux), use EXT_memory_object_fd & EXT_semaphore_fd • fd = POSIX file descriptor • For Windows, use EXT_memory_object_win32 & EXT_semaphore_win32 • Uses either Win32’s opaque HANDLE type or KMT share handle • KMT = Kernel-Mode Thunk interface for Windows Display Driver Model (WDDM) • Also for interoperability with Direct3D 11 & 12 • Also EXT_win32_keyed_mutex provides access to the keyed synchronization primitive of Direct3D image objects
  75. 75. 75 EXT_semaphore • Introduces new object matching Vulkan-style semaphores • Basic operations on semaphores • Object management • glGenSemaphoresEXT generates semaphore object names • glDeleteSemaphoresEXT deletes semaphore objects • Parameter setting & querying • glSemaphoreParameterui64vEXT & glGetSemaphoreParameterui64vEXT • Semaphore parameters are platform dependent (e.g. GL_D3D12_FENCE_VALUE_EXT) • Semaphore operations • glSignalSemaphoreEXT signals a semaphore • glWaitSemaphoreEXT waits until something signals the semaphore
  76. 76. 76 EXT_memory_object • Introduces new memory object corresponding to Vulkan concept • Import memory objects with platform-specific API • Then “carve out” managed OpenGL textures and buffers from a memory object • Commands to make textures: glTexStorageMem1DEXT, glTexStorageMem2DEXT, glTexStorageMem3DEXT, glTexStorageMem2DMultisampleEXT, glTexStorageMem3DMultisampleEXT • Also Direct State Access (DSA) versions: glTextureStorageMem2DEXT, etc. • Commands to carve out a buffer: glBufferStorageMemEXT, glNamedBufferStorageMemEXT
  77. 77. 77 OpenGL ES Parity • Mobile developers often target OpenGL ES • Apple’s iOS and Google’s Android use of ES made it the de facto standard graphics API for mobile • Moore’s Law has eliminated the need for ES on NVIDIA products • ES 2.0/3.x is supported along with full OpenGL 4.x feature set • Essentially an ES context “hides” the complete OpenGL 4.x feature set • Good for compatibility and portability to other vendors’ less functional GPUs • Unfortunately ES has been slow to adopt important GPU features • NVIDIA makes sure developers relying on ES contexts don’t have to forgo features missing from ES • NVIDIA works to coordinate multi-vendor EXT extensions to ES • NVIDIA supports fully conformant ES contexts (+ extensions) even on Windows and Linux • NVIDIA’s OpenGL in 2017 adds many ES parity extensions...
  78. 78. 78 Cartoon • OpenGL 4.6 Context: “Oh, 3D developer, you flatter me noticing my complete & mature feature set” • ES 3.2 Context: “With ES parity, what does she have that I don’t?”
  79. 79. 79 ES Parity Extensions for 2017 • Extension name: Functionality • EXT_clear_texture: Clear texture images & sub-images • EXT_conservative_depth: Bound direction of fragment shader depth output • EXT_shader_group_vote: Collective decision making in shaders • EXT_texture_compression_bptc: Compressed texture formats corresponding to Direct3D’s BC6H (for HDR) and BC7 (8-bit RGB & RGBA) formats • EXT_texture_compression_rgtc: One- and two-component texture compression • EXT_texture_sRGB_R8: Single-component (red) sRGB color-space component encoding • EXT_draw_transform_feedback: Adds missing transform feedback API to ES intended for geometry shaders’ variable output vertices • EXT_clip_cull_distance: Clipping and culling planes • OES_viewport_array: Viewport index support for geometry shaders • KHR_parallel_shader_compile: Request multi-threaded GLSL shader compilation
  80. 80. 80 Still ES Lacks Much, NVIDIA Provides What’s Missing • The 2017 multi-vendor parity extensions highlight what’s missing from standard ES 3.2 • Additional major items missing from standard ES 3.2 • Texture views with OES_texture_view missed ES 3.2 inclusion • GPU-accelerated path rendering with NV_path_rendering for ES • BUT NVIDIA’s OpenGL ES context provides these • If ES still isn’t enough, just use an OpenGL 4.6 context • For example, Direct State Access is not in ES contexts
  81. 81. 81 NVIDIA’s ES Parity Philosophy • The idea of “ES Parity” is NOT to turn an ES context into an OpenGL 4.x context • The idea is to expose • Features NVIDIA’s ES developer base has requested • Features that we judge other ES vendors could reasonably support • When Khronos ES vendors broadly agree, we work towards an OES extension – Example: OES_viewport_array • When just a subset of Khronos ES vendors agree, we work for a multi-vendor EXT extension – Example: EXT_clip_cull_distance • As a last resort, when other ES vendors don’t share our interest, we go with NV • Need a feature missing from ES? Speak up • NVIDIA does not expose extensions broadly inconsistent with ES’s philosophy • For example, fixed-function, immediate-mode, and display lists aren’t candidates for ES parity • Developers desiring such functionality are better off with OpenGL 4.x contexts
  82. 82. 82 Result of NVIDIA’s ES Parity Efforts • Diagram layers: NVIDIA ES Parity Enhancements, Full OpenGL ES 3.2, ES 3.1, ES 3.0, ES 2.0 • Industry’s most functional and full-featured ES driver • OSes and Architectures: Android, Linux, Windows, FreeBSD; x86, ARM, IBM PowerPC
  83. 83. 83 Perspective of ES Parity from an OpenGL 4.6 Context • Diagram layers: NVIDIA OpenGL 4.6 with maximally functional extensions, NVIDIA ES Parity Enhancements, Full OpenGL ES 3.2, ES 3.1, ES 3.0, ES 2.0 • Same driver provides ES and 4.6 contexts • Only difference between ES and 4.6 context is ES context disables non-ES usage
  84. 84. 84 Miscellaneous NEW Extensions for 2017 • NV_blend_minmax_factor, based on AMD_blend_minmax_factor • EXT_protected_textures (Tegra & ES only) • Used with EGL’s EGL_EXT_protected_content
  85. 85. 85 NV_blend_minmax_factor: Modulated Min/Max Blending • Original GL_MIN and GL_MAX blend equations limited • Both ignore the blend source and destination blend factors from glBlendFunc • Limitation of original SGI hardware • Conventional min/max blend equations • blendResult = min(sourceColor, destinationColor) • blendResult = max(sourceColor, destinationColor) • AMD_blend_minmax_factor extension generalizes with two new blend equations • GL_FACTOR_MIN_AMD: blendResult = min(sourceColor × sourceFactor, destinationColor × destinationFactor) • GL_FACTOR_MAX_AMD: blendResult = max(sourceColor × sourceFactor, destinationColor × destinationFactor) • NV_blend_minmax_factor provides same capability • Just with a few restrictions, matching blend equation advanced restrictions • Not for use with dual-source blending • Not for mismatched multiple draw buffers • Single-precision floating-point blending done in half-precision • Otherwise compatible with AMD extension (uses same token values)
  86. 86. 86 NV_blend_minmax_factor Example • Blend code • blendResult = max(sourceColor, destinationColor × (1−sourceAlpha)) • Code to configure • glEnable(GL_BLEND); • glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA); • glBlendEquation(GL_FACTOR_MAX_AMD); • Extension supported on Maxwell and later GPU generations Unconventional blending Source: Visual Music Systems
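To make the blend code concrete, here is a small CPU-side sketch of what max(sourceColor, destinationColor × (1−sourceAlpha)) computes for one RGBA pixel, assuming a source factor of GL_ONE and a destination factor of GL_ONE_MINUS_SRC_ALPHA applied to all four channels. The `Rgba` type and `blend_example` name are illustrative only and are not part of any OpenGL API.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } Rgba;

float max_channel(float a, float b) { return a > b ? a : b; }

/* Per channel: max(src * 1, dst * (1 - src.a)). */
Rgba blend_example(Rgba src, Rgba dst)
{
    Rgba out;
    float dst_factor;

    /* Assumption of this sketch: normalized source alpha. */
    assert(src.a >= 0.0f && src.a <= 1.0f);
    dst_factor = 1.0f - src.a;              /* GL_ONE_MINUS_SRC_ALPHA */

    out.r = max_channel(src.r, dst.r * dst_factor);
    out.g = max_channel(src.g, dst.g * dst_factor);
    out.b = max_channel(src.b, dst.b * dst_factor);
    out.a = max_channel(src.a, dst.a * dst_factor);
    return out;
}
```

For a half-transparent source (alpha 0.5), the destination is halved before the comparison, so a bright destination can still win the max in some channels while the source wins in others.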
  87. 87. 87 EGL_EXT_protected_content & EXT_protected_textures (1) • Together provide OpenGL protected access control to GPU images • Intended for managing trust in display compositors and apps • Designed for Android • GL_TEXTURE_PROTECTED_EXT texture parameter • Applies to OpenGL texture objects • And hence applies to framebuffer objects containing texture objects • Boolean, defaults to false (unprotected) unless explicitly specified true • EGL_PROTECTED_CONTENT_EXT attribute • Applies to EGL surfaces and EGLImages • Boolean, defaults to false (unprotected) unless explicitly specified true • Texture objects, EGL surfaces, and EGLImages are all “resources” subject to protection
  88. 88. 88 EGL_EXT_protected_content & EXT_protected_textures (2) • Pipeline stages of OpenGL contexts can also be designated protected and unprotected • Scenario: • display compositor uses a protected context • while apps would use unprotected contexts • Technically different GPU stages can be protected vs. non-protected • General access rules • Protected pipeline stages • Can read any EGL surfaces and images, protected or otherwise • BUT may NOT write non-protected EGL surfaces and images • Non-protected contexts/stages • Can read & write non-protected • BUT may NOT read or write protected content • Expectation: GPU & operating system together enforce resource protection via protected virtual memory mappings
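The general access rules on this slide reduce to two small predicates. The sketch below is only a model of the rules as stated here, with hypothetical function names; it does not call EGL or OpenGL, and actual enforcement is expected to come from the GPU and operating system.

```c
#include <assert.h>
#include <stdbool.h>

/* A protected stage may read anything; a non-protected stage may read
 * only non-protected resources. */
bool can_read(bool stage_protected, bool resource_protected)
{
    return stage_protected || !resource_protected;
}

/* A protected stage may not write non-protected resources, and a
 * non-protected stage may not write protected ones, so a write is
 * allowed only when both sides have the same protection state. */
bool can_write(bool stage_protected, bool resource_protected)
{
    return stage_protected == resource_protected;
}

/* The full truth table, spelled out as a self-check. */
void check_access_table(void)
{
    assert(can_read(true, true)   && can_write(true, true));
    assert(can_read(true, false)  && !can_write(true, false));
    assert(!can_read(false, true) && !can_write(false, true));
    assert(can_read(false, false) && can_write(false, false));
}
```

In the compositor scenario, the protected compositor context can read both protected video frames and ordinary app surfaces, but can only write its result to a protected surface.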
  89. 89. 89 EGL_EXT_protected_content Scenarios • Android 7.0’s secure texture video playback • Allows secure GPU post-processing of protected image content • Supports secure Digital Rights Management (DRM) Source: Google
  90. 90. 90 Implemented NVIDIA OpenGL Extensions by Approximate Initial Proposal Year [Bar chart: number of OpenGL extensions proposed (y-axis, 0 to 60) per year (x-axis, 1997 to 2016)] Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions, difficult to say exactly when an extension was proposed, product release lags extension proposal
  91. 91. 91 Implemented NVIDIA OpenGL Extensions by Approximate Initial Proposal Year [Bar chart: number of OpenGL extensions proposed (y-axis, 0 to 60) per year (x-axis, 1997 to 2016), annotated with OpenGL core version updates: 1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3 & 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6] Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions, difficult to say exactly when an extension was proposed, product release lags extension proposal
  92. 92. 92 Implemented NVIDIA OpenGL Extensions by Approximate Initial Proposal Year [Bar chart: number of OpenGL extensions proposed (y-axis, 0 to 60) per year (x-axis, 1997 to 2016), annotated with: TNT + GeForce, Run-up to DirectX 8, Run-up to DirectX 9, Run-up to DirectX 10, Run-up to DirectX 11, Run-up to DirectX 12] Despite caveats, shows how OpenGL functionality ties to rhythm of GPU architecture & API updates Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions, difficult to say exactly when an extension was proposed, product release lags extension proposal
  93. 93. 93 Implemented NVIDIA OpenGL Extensions by Approximate Initial Proposal Year [Bar chart: number of OpenGL extensions proposed (y-axis, 0 to 60) per year (x-axis, 1997 to 2016), annotated with: GeForce 1,2; GeForce 3,4; GeForce 5,6,7; Tesla development (GeForce 8,9,100,200,300); Fermi development (~GeForce 400); Kepler development (~GeForce 600); Maxwell development (~GeForce 700-900); Pascal development (~GeForce 10)] Despite caveats, shows how OpenGL functionality ties to rhythm of GPU architecture & API updates Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions, difficult to say exactly when an extension was proposed, product release lags extension proposal
  94. 94. 94 Cumulative Implemented NVIDIA OpenGL Extensions Over 20 Years [Chart: cumulative implemented OpenGL extensions proposed (y-axis, 0 to 500) by year (x-axis, 1997 to 2016)] Same data as prior graphs, just integrated over time
  95. 95. 95 NVIDIA OpenGL Shader Caching • NVIDIA OpenGL driver saves GLSL shaders it compiles • Cached compiled shaders saved to your local disk • Next time you compile the same shader, driver loads the post-compiled cached copy • Saves compilation time! • Invalidated on new driver installation • Games can “warm” cache on installation or first play to speed game loading • Available both on Windows and Linux! • Windows location • %LOCALAPPDATA%\NVIDIA\GLCache • For older drivers, used %APPDATA%\NVIDIA\GLCache • Linux location • $HOME/.nv/GLCache • Or $XDG_CACHE_HOME/.nv/ if XDG_CACHE_HOME environment variable set • Following the convention set by the XDG Base Directory Standard • Locations subject to change with future drivers and new conventions
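A tool that wants to locate or clear the Linux shader cache could resolve the directory roughly as follows. This is a sketch under the assumption that the two Linux locations listed on this slide are the whole rule; `gl_cache_dir` is a hypothetical helper, not a driver API, and as the slide notes the locations may change with future drivers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the expected cache directory into out.
 * xdg_cache_home and home are the values of the XDG_CACHE_HOME and HOME
 * environment variables (either may be NULL if unset).
 * Returns 0 on success, -1 if no location can be formed or out is too small. */
int gl_cache_dir(const char *xdg_cache_home, const char *home,
                 char *out, size_t out_size)
{
    int n;

    assert(out != NULL && out_size > 0);

    if (xdg_cache_home != NULL && strlen(xdg_cache_home) > 0) {
        /* XDG_CACHE_HOME set: location given on the slide. */
        n = snprintf(out, out_size, "%s/.nv/", xdg_cache_home);
    } else if (home != NULL && strlen(home) > 0) {
        /* Default location under the home directory. */
        n = snprintf(out, out_size, "%s/.nv/GLCache", home);
    } else {
        return -1;
    }
    /* snprintf reports the untruncated length; treat truncation as failure. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

A caller would typically pass `getenv("XDG_CACHE_HOME")` and `getenv("HOME")` (from `<stdlib.h>`) as the first two arguments.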
  96. 96. 96 Linux Graphics Open Source Efforts from NVIDIA • NVIDIA works to improve graphics support for entire Linux ecosystem • Examples • GL Vendor-Neutral Dispatch (GLVND) • arbitrates vendor-neutral access to OpenGL and EGL/GLX APIs • Wayland support for EGL Streams • Video Decode and Presentation API for Unix (VDPAU) • complete solution for decoding, post-processing, compositing, and displaying compressed or uncompressed video streams • All open source projects
  97. 97. 97 GLVND: GL Vendor-Neutral Dispatch library • libglvnd • Arbitrates OpenGL API calls between multiple vendors • Allows multiple drivers from different vendors to coexist on the same file system • Determines which vendor to dispatch each API call to at runtime • Both GLX and EGL are supported • Any combination with OpenGL and OpenGL ES (1.1, 2.0, 3.x) • NVIDIA open source contribution • https://github.com/NVIDIA/libglvnd
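The idea of dispatching each API call to a vendor at runtime can be illustrated with a toy function-pointer table. This is not libglvnd's actual interface: every name below is hypothetical, and the sketch uses one global "current vendor" where a real dispatcher would track the current context per thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry point, standing in for a full API table. */
typedef const char *(*GetRendererFn)(void);

typedef struct {
    const char   *vendor_name;
    GetRendererFn get_renderer;
} VendorTable;

static const char *vendor_a_renderer(void) { return "renderer from vendor A"; }
static const char *vendor_b_renderer(void) { return "renderer from vendor B"; }

/* Two vendor implementations coexisting in one process. */
static const VendorTable vendors[] = {
    { "vendorA", vendor_a_renderer },
    { "vendorB", vendor_b_renderer },
};

static const VendorTable *current_vendor = NULL;

/* Select the vendor that owns the (hypothetical) current context.
 * Returns 0 on success, -1 if the vendor is unknown (selection unchanged). */
int make_current(const char *vendor_name)
{
    size_t i;
    assert(vendor_name != NULL);
    for (i = 0; i < sizeof vendors / sizeof vendors[0]; i++) {
        if (strcmp(vendors[i].vendor_name, vendor_name) == 0) {
            current_vendor = &vendors[i];
            return 0;
        }
    }
    return -1;
}

/* The vendor-neutral entry point an application would call. */
const char *dispatch_get_renderer(void)
{
    return current_vendor ? current_vendor->get_renderer() : NULL;
}
```

The application only ever links against the neutral entry point; which vendor's code runs is decided when a context is made current.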
  98. 98. 98 Before GLVND NVIDIA Proprietary Linux Driver Mesa + Nouveau I control OpenGL best on NVIDIA GPUs But I got here first! Drivers driving you crazy! I just want my Linux window system to start! pre-GLVND user
  99. 99. 99 GLVND Architecture [Diagram components: OpenGL or ES Application, libOpenGL, libGL, libGLX, libGLdispatch, mapi/glapi, GLX_vendor, GLX_vendor2, X11 Server with GLX_EXT_libglvnd extension]
  100. 100. 100 NVIDIA’s Support for Wayland • Wayland • Intended as simpler replacement for X Window System • A protocol for a compositor to talk to its clients • Plus the C library implementation of that protocol • Depends on a compositor (e.g. Weston) that is the display server • Supports varying window managers (e.g. Mutter for Gnome) • Wayland is supported on NVIDIA GPUs through EGL Streams • Using NVIDIA’s Proprietary OpenGL driver performance & quality • Both Weston and Mutter (used by gnome-shell) currently have EGL Stream support • Although not by default • See https://github.com/NVIDIA/egl-wayland • NVIDIA open source project
  101. 101. 101 Latest VDPAU Support • Video Decode and Presentation API for Unix (VDPAU) • Latest NVIDIA GPUs (GeForce 1080, etc.) • Supports VDPAU Feature Set H • Hardware-accelerated decoding of 8192x8192 (8k) H.265/HEVC video streams • Full support of HEVC Main12 profile
  102. 102. 102 NVIDIA Codec SDK 8.0 • Two hardware acceleration interfaces: • NVENCODE API for video encode acceleration • NVDECODE API for video decode acceleration • Integration already available in FFmpeg/libav • New in 8.0 • 10/12-bit decoding support with HEVC/VP9, enabling end-to-end HDR transcoding • Improved quality via weighted prediction • Support for OpenGL inputs (Linux only) Download for registered developers: https://developer.nvidia.com/designworks/video_codec_sdk/downloads/v8.0 Info: https://developer.nvidia.com/nvidia-video-codec-sdk
  103. 103. 103 Supported Video Encoding Formats by GPU Generation * Except GM108 ** Except GP100 (is limited to 4K resolution) 8k encoding for latest GPUs!
  104. 104. 104 GPU Encoding: Awesome Performance & Comparable Quality Bigger is faster for NVENC Comparable peak signal-to-noise ratio indicates comparable quality
  105. 105. 105 Supported Video Decode Formats by GPU Generation * Except GM108 ** Max resolution support is limited to selected Pascal chips *** VP8 decode support is limited to selected Pascal chips **** VP9 10/12 bit decode support is limited to selected Pascal chips 8k decoding for latest GPUs!
  106. 106. 106 NVDEC to OpenGL to NVENC [Diagram: NVDEC decodes into an OpenGL texture object; rendering to framebuffer objects; NVENC encodes from an OpenGL texture object] Linux only for GL to NVENC For Windows, use OpenGL interop into Direct3D surfaces to encode from Direct3D surfaces
  107. 107. 107 Proven GPU Codec Technology • Same underlying technology powers these services Play your PC games on your PC, encode to the cloud Play your PC game on your PC, decode & play on your SHIELD TV
  108. 108. 108 GLEW Support Available NOW GLEW = The OpenGL Extension Wrangler Library Open source library Pre-built distribution: http://glew.sourceforge.net/ Source code: https://github.com/nigels-com/glew Your one-stop-shop for API support for all OpenGL extension APIs Now released GLEW 2.1 (July 31, 2017) provides API support for OpenGL 4.6 Multi-vendor EXT interoperability extensions All of NVIDIA’s Maxwell & Pascal extensions All other NVIDIA multi-GPU generation initiatives Examples: NV_path_rendering, NV_command_list, NV_gpu_multicast Thanks to Nigel Stewart, GLEW maintainer, for this
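GLEW hides extension detection behind its own queries. For code that does not use GLEW, a common manual approach is to test for a name in a space-separated extension string, such as the one legacy contexts return from `glGetString(GL_EXTENSIONS)`. The sketch below shows only the string matching, with a hypothetical `has_extension` helper; it matches whole tokens so that one extension name being a prefix of another does not produce a false positive.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if name appears as a complete, space-delimited token
 * in ext_list. */
bool has_extension(const char *ext_list, const char *name)
{
    size_t len;
    const char *p;

    assert(ext_list != NULL && name != NULL);

    len = strlen(name);
    if (len == 0 || strchr(name, ' ') != NULL)
        return false;               /* not a valid single extension name */

    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token   = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += len;                   /* keep searching past this partial hit */
    }
    return false;
}
```

An application could use such a check for `GL_NV_blend_minmax_factor` before selecting the factor min/max blend equations, and fall back to another blending path when the name is absent.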
  109. 109. 109 NVIDIA OpenGL in 2017 Provides OpenGL’s Maximally Available Superset OpenGL 4.6 Pascal Extensions 2015 ARB extensions OpenGL 4.5 Core Maxwell Extensions Legacy EXT & Other Compatibility Extensions OpenGL Complete Compatibility Path Rendering Multi-GPU. SLI Approaching Zero Driver Overhead NVIDIA Multi-generation GPU Initiatives DirectX inter-op Vulkan inter-op ES Enhancements Full OpenGL ES 3.2 Khronos Standard Expected Compatibility NVIDIA Initiatives GPU Generation Features
  110. 110. 110 Last Words • Khronos announces OpenGL 4.6 today! Best OpenGL yet • Highlights of NVIDIA’s OpenGL support in 2017 • NVIDIA has OpenGL 4.6 today, developer preview driver available NOW • SPIR-V support standard part of OpenGL now • Multi-vendor EXT interoperability extensions NEW this year • “ES Parity” effort for 2017 • Miscellaneous extensions: protected content, min/max factor blending • Open source graphics contributions from NVIDIA • GLVND, VDPAU for video processing, and Wayland EGL Streams support • GPU-accelerated Encode & Decode
  111. 111. 111 SIGGRAPH Paper Using OpenGL to Check Out • How to make shaders modular without giving up performance • Open source on github • Accompanied by OpenGL and Vulkan demo • Wednesday, 2 August • Los Angeles Convention Center, Room 150/151 • 10:45 am - 12:35 pm
