Mark Kilgard, July 31
SIGGRAPH 2017, Los Angeles
NVIDIA OpenGL in 2017
2
Mark Kilgard
• Principal System Software Engineer
OpenGL driver and API evolution
Cg (“C for graphics”) shading language
GPU-accelerated path rendering & web browser rendering
• OpenGL Utility Toolkit (GLUT) implementer
• Specified and implemented much of OpenGL
• Author of OpenGL for the X Window System
• Co-author of Cg Tutorial
• Worked on OpenGL for over 25 years
My Background
3
OpenGL 4.6 with SPIR-V support announced today
4
NVIDIA’s OpenGL Leverage
Diagram labels: Debugging with Nsight; Programmable Graphics; Android & SHIELD; Quadro; OptiX; GeForce; Adobe Creative Cloud; Embedded Projects
5
OpenGL Codebase Leverage
Same driver code base supports multiple APIs
Diagram labels: OpenGL for Embedded, Mobile, and Web; Multi-vendor, explicit, low-level graphics from Khronos
6
NVIDIA’s Shading Compiler Even More Leveraged
Diagram labels: Various Direct3D versions; 3D APIs based on NVIDIA OpenGL driver code base; NVIDIA Shading Compiler code base; Apple’s proprietary graphics API; Proprietary console API
7
Still the One Truly Common & Open 3D API
Diagram labels: OS X, Linux, FreeBSD, Solaris, Android, Windows, Embedded Designs
8
NVIDIA OpenGL in 2017 Provides
OpenGL’s Maximally Available Superset
Diagram labels: OpenGL 4.6; Pascal Extensions; 2015 ARB extensions; OpenGL 4.5 Core; Maxwell Extensions; Legacy EXT & Other Compatibility Extensions; OpenGL Complete Compatibility; Path Rendering; Multi-GPU, SLI; Approaching Zero Driver Overhead; NVIDIA Multi-generation GPU Initiatives; DirectX inter-op; Vulkan inter-op; ES Enhancements; Full OpenGL ES 3.2
Legend: Khronos Standard; Expected Compatibility; NVIDIA Initiatives; GPU Generation Features
11
OpenGL’s Recent Advancements
2014 2015 2016
New ARB Extensions
3 standard extensions, beyond 4.5
• ARB_sparse_buffer
• ARB_pipeline_statistics_query
• ARB_transform_feedback_overflow_query
Maxwell Extensions
• Novel graphics features
• 14 new extensions
• Global Illumination &
Vector Graphics focus
New ARB 2015 Extension Pack
• Shader functionality
• ARB_ES3_2_compatibility (shading
language support)
• ARB_parallel_shader_compile
• ARB_gpu_shader_int64
• ARB_shader_atomic_counter_ops
• ARB_shader_clock
• ARB_shader_ballot
• Graphics pipeline operation
• ARB_fragment_shader_interlock
• ARB_sample_locations
• ARB_post_depth_coverage
• ARB_ES3_2_compatibility
(tessellation bounding box +
multisample line width query)
• ARB_shader_viewport_layer_array
• Texture mapping
functionality
• ARB_texture_filter_minmax
• ARB_sparse_texture2
• ARB_sparse_texture_clamp
Pascal Extensions
• Novel graphics features
• 5 new extensions
• Virtual Reality focus
OpenGL SPIR-V Support
• Standard Shader
Intermediate Representation
• ARB_gl_spirv (not core)
• Vulkan interoperability
12
Available to Download Today
• Beta driver with OpenGL 4.6 support
July 31, 2017
13
For those tracking birthdays...
Then: celebrating OpenGL 4.3
Now: celebrating OpenGL 4.6
14
Need a Refresher on 2014, 2015, and 2016 OpenGL?
• Honestly, NVIDIA exposed lots of functionality in the last 3 years
Available @ http://www.slideshare.net/Mark_Kilgard
15
Introducing OpenGL 4.6
• Big feature: SPIR-V support required
• SPIR-V = standard intermediate language for parallel compute and graphics
• The Vulkan 1.0 standard requires shaders to be expressed in SPIR-V
• Allows content creators to simplify their shader authoring and management pipelines
• Previously this was an optional ARB extension, not required for 4.5
• Includes NEW ARB_spirv_extensions, which extends SPIR-V support (apps can query which SPIR-V extensions the driver accepts)
• Genius of AND: OpenGL 4.6 allows either GLSL or SPIR-V, your choice
• Technically, NVIDIA’s Vulkan 1.0 allows using GLSL directly via an extension (that support is not built in to Vulkan)
• Additional new ARB extensions bundled in OpenGL 4.6 for
• Improving performance
• Improving rendering quality
• Resolving outstanding Intellectual Property (IP) issues
16
OpenGL extension exposing Khronos intermediate language for parallel compute and graphics
Khronos extension for OpenGL + SPIR-V
ARB extension announced last year
July 22, 2016
Allows compiled SPIR-V code to be passed directly to OpenGL driver
Accepts SPIR-V output from open source Glslang Khronos Reference compiler
https://github.com/KhronosGroup/glslang
Other compilers can target SPIR-V too
Khronos standard extension ARB_gl_spirv
17
SPIR-V Ecosystem
• SPIR-V
• Khronos defined and controlled cross-API intermediate language
• Native support for graphics and parallel constructs
• 32-bit Word Stream
• Extensible and easily parsed
• Retains data object and control flow information for effective code generation and translation
Diagram: source languages (OpenCL C, OpenCL C++, GLSL, HLSL, LLVM, third party kernel and shader languages) and SPIR-V consumers (IHV driver runtimes, other intermediate forms). Tools: SPIR-V Validator, SPIR-V (Dis)Assembler, LLVM to SPIR-V Bi-directional Translator (https://github.com/KhronosGroup/SPIR/tree/spirv-1.1). Open source C++ front-end released.
Legend: Khronos has open sourced these tools and translators; Khronos plans to open source these tools soon
OpenGL support NEW with ARB_gl_spirv, standard in OpenGL 4.6
18
NVIDIA’s SIGGRAPH
Driver Update
• NVIDIA historically releases a “developer” driver at SIGGRAPH with support for all Khronos
standards announced at SIGGRAPH
• This year too 
• Monday (July 31, 2017) NVIDIA put out a new SIGGRAPH driver
• OpenGL 4.6 (beta, expected to pass 4.6 Conformance when available)
• Multi-vendor (EXT) interoperability extensions
• Finally portable interoperability between OpenGL, Vulkan, OpenCL, etc.
• Generic: EXT_memory_object, EXT_semaphore
• Windows: EXT_memory_object_win32, EXT_win32_keyed_mutex, EXT_semaphore_win32
• Unix: EXT_memory_object_fd, EXT_semaphore_fd
• Other new extensions
• NV_blend_minmax_factor, consistent with AMD_blend_minmax_factor
• Fill in missing ES functionality gaps
• EXT_clear_texture, EXT_conservative_depth, EXT_shader_group_vote, EXT_texture_compression_bptc,
EXT_texture_sRGB_R8, EXT_draw_transform_feedback, OES_viewport_array
• EXT_clip_cull_distance, ES support for clip planes & cull distances
• EXT_protected_textures (Tegra & ES only) for protected content
• For Windows and Linux operating systems
• Also Vulkan improvements
OpenGL 4.6 + Multi-vendor Interop + Vulkan Updates & More
https://developer.nvidia.com/opengl-driver
19
20
What OpenGL 4.6 Packages Together
• OpenGL evolves by bundling extensions as a core version update
• OpenGL 4.6 = everything in 4.5 plus these extensions
• ARB_indirect_parameters
• ARB_pipeline_statistics_query
• ARB_polygon_offset_clamp
• KHR_no_error
• ARB_shader_atomic_counter_ops (just extends OpenGL Shading Language)
• ARB_shader_draw_parameters
• ARB_shader_group_vote (just extends OpenGL Shading Language)
• ARB_gl_spirv
• ARB_spirv_extensions (the one technically “brand new” extension; other 4.6 functionality already proven & public)
• ARB_texture_filter_anisotropic
• ARB_transform_feedback_overflow_query
• Now you can code for this functionality without ARB or EXT suffixing!
21
ARB_indirect_parameters: Intro & Review
•Evolving capability in OpenGL 4.x
• General idea: allow the GPU to generate its own rendering work
• Part of AZDO philosophy
• AZDO = Approaching Zero Driver Overhead
• Big idea: If GPU generates its own work, the driver overhead on the CPU diminishes
• Example: compute shader generates sets of meshes; then renders those meshes
• But we don’t want the GPU to “wait” for the CPU to orchestrate this effort
•Builds on OpenGL 4.0 and 4.3’s improvements
• 4.0 added indirect draws: instanced draw call’s parameters sourced from GPU buffer
• 4.3 added multiple indirect draws: one GL command launched N indirect draws
•OpenGL 4.6’s breakthrough: ARB_indirect_parameters
• Now the count of multiple indirect draw batches itself can be sourced from the GPU
22
Original Ways to Draw
• Two primary ways to draw with vertex arrays
• glDrawElements
• Accepts an array of vertex indexes
• glDrawArrays
• Accepts a sequential range of indices
• OpenGL 3.1 added instanced versions
• glDrawElementsInstanced
• glDrawArraysInstanced
• Includes “instance count” parameter
• Repeats each draw “instance count” times, changing gl_InstanceID each iteration
23
Vertex 0 (x0,y0), (r0,g0,b0)
Vertex 1 (x1,y1), (r0,g0,b0)
Vertex 2 (x2,y2), (r0,g0,b0)
Vertex 3 (x3,y3), (r1,g1,b1)
Vertex 4 (x4,y4), (r1,g1,b1)
Vertex 5 (x5,y5), (r1,g1,b1)
Vertex array buffer configuration, drawn as two triangles with glDrawArrays(GL_TRIANGLES, 0, 6);
Index buffer (element array) configuration: 0 1 1 2 2 0 3 4 4 5 5 3, drawn as six lines with glDrawElements(GL_LINES, 12, GL_UNSIGNED_INT, 0);
24
Multi Draw Arrays
• glMultiDrawArrays & glMultiDrawElements
• Same as before, but loop over glDrawArrays or glDrawElements
• Primitive count parameter says how many iterations
• Each iteration sources non-mode parameters from CPU arrays
• Fundamentally not more powerful than you writing the loop in your CPU code
• But establishes a useful pattern for the future...
25
Instancing
• GPU draws the same primitive topology N times
• Shader or vertex attribute usage can transform & shade each instance differently
• Loops to output a single set of draw indices multiple times
• Each iteration outputs a different instance
• GLSL shaders can access gl_InstanceID to behave differently per instance
• Instancing alternative to using gl_InstanceID in your shader
• glVertexAttribDivisor gives a vertex attribute array a divisor
• When divisor is non-zero, floor(instance / divisor) is used for this array
• Common usage: with a divisor of 1, the vertex attribute array is indexed by the instance ID
• Effectively enables per-instance vertex arrays
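The divisor rule above can be modeled on the CPU. This is a minimal sketch, not driver code: the helper name `attrib_element_index` is hypothetical, and base vertex and base instance offsets are deliberately ignored. It shows which element of an attribute array a vertex shader invocation fetches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU-side model of vertex attribute fetch under
 * glVertexAttribDivisor (base vertex/base instance offsets ignored).
 *   divisor == 0: attribute advances per vertex (element = vertex)
 *   divisor  > 0: attribute advances every `divisor` instances
 *                 (element = floor(instance / divisor)) */
static uint32_t attrib_element_index(uint32_t vertex, uint32_t instance,
                                     uint32_t divisor)
{
    if (divisor == 0)
        return vertex;          /* ordinary per-vertex attribute */
    return instance / divisor;  /* unsigned division truncates == floor */
}
```

With a divisor of 1, every vertex of instance 7 fetches element 7, which is how a per-instance color or translation array feeds each instance.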
26
Power of Instancing
• Vertex arrays with a single object mesh can
render N distinct instances from a single GL
command
• Example image shows
• Hundreds of instances
• Draw from single mesh
• Each instance has its own color & translation
• Observations
• GPU reads instanced vertex attributes
• But CPU still launches the N instances
Source: In2GPU
27
Draw Indirect (OpenGL 4.0)
• Conventional GL draw calls
• Require directly passing parameters to each GL draw call to find the indices to source
• Direct parameter passing means CPU supplies all the draw parameters
• Causes CPU overhead on each draw
• Solution: Draw Indirect
• Sources each batch of draw arrays or draw elements parameters from a GPU buffer
• Parameters, except for mode, accessed from GL_DRAW_INDIRECT_BUFFER binding
• Big advantage
• GPU can generate draw batches itself
• Say with compute shaders
• Means GPU can feed itself
28
Draw Indirect Buffer Layout
• glDrawArraysIndirect
• Takes: (GLenum mode, const void *indirect)
• indirect is GPU buffer offset to four 32-bit words
• Mimics calling
glDrawArraysInstanced(mode, cmd->first,
cmd->count, cmd->primCount);
• glDrawElementsIndirect
• Takes: (GLenum mode, GLenum type, const void *indirect)
• indirect is GPU buffer offset to five 32-bit words
• Mimics calling
glDrawElementsInstancedBaseVertex(mode,
cmd->count, type, (const void *)(cmd->firstIndex * size-of-type),
cmd->primCount, cmd->baseVertex);
• BUT cmd pointer indirection happens on the GPU sourced
from a GL buffer object
struct DrawArraysIndirectCommand {
    GLuint count;
    GLuint primCount;
    GLuint first;
    GLuint reservedMustBeZero;
};
struct DrawElementsIndirectCommand {
    GLuint count;
    GLuint primCount;
    GLuint firstIndex;
    GLint  baseVertex;
    GLuint reservedMustBeZero;
};
Important: These structures
are read by the GPU from
GPU buffers
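The two command layouts can be checked at compile time. This sketch assumes C11 and uses `uint32_t`/`int32_t` in place of `GLuint`/`GLint` so it builds without GL headers; `make_elements_cmd` is a hypothetical helper. The static assertions confirm the four- and five-word sizes the GPU expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layouts read by the GPU from GL_DRAW_INDIRECT_BUFFER. The last field is
 * reservedMustBeZero in OpenGL 4.0; OpenGL 4.2+ repurposes it as
 * baseInstance. */
typedef struct {
    uint32_t count;              /* vertices per instance */
    uint32_t primCount;          /* instance count */
    uint32_t first;              /* first vertex */
    uint32_t reservedMustBeZero;
} DrawArraysIndirectCommand;

typedef struct {
    uint32_t count;              /* indices per instance */
    uint32_t primCount;          /* instance count */
    uint32_t firstIndex;         /* counted in indices, not bytes */
    int32_t  baseVertex;         /* added to each fetched index */
    uint32_t reservedMustBeZero;
} DrawElementsIndirectCommand;

/* "four 32-bit words" and "five 32-bit words" */
_Static_assert(sizeof(DrawArraysIndirectCommand) == 16, "4 words");
_Static_assert(sizeof(DrawElementsIndirectCommand) == 20, "5 words");

/* Hypothetical helper: describe one instanced mesh draw. */
static DrawElementsIndirectCommand make_elements_cmd(uint32_t index_count,
        uint32_t instances, uint32_t first_index, int32_t base_vertex)
{
    DrawElementsIndirectCommand cmd = { index_count, instances,
                                        first_index, base_vertex, 0 };
    return cmd;
}
```

In a real program the filled struct would be uploaded with something like `glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof cmd, &cmd, GL_STATIC_DRAW)` and drawn with `glDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, (const void *)0)`, where the last argument is an offset into the bound buffer.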
29
Multi Draw Indirect (OpenGL 4.3)
• Now a single GL command can launch multiple draw indirect operations
• Takes a draw count (N) for number of draw indirects
• Performs N draw indirect operations
• Each operation’s parameters are read from draw indirect buffer binding
• Stride parameter gives the byte distance between consecutive commands (0 means tightly packed)
• glMultiDrawArraysIndirect & glMultiDrawElementsIndirect
• Single CPU command launches N draw indirect operations
• All the parameters for all the draw indirect operations sourced by GPU
• Very high leverage: tiny CPU effort can launch enormous amount of rendering
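The stride rule determines where the GPU reads draw i's parameters. A minimal sketch, assuming a hypothetical helper `indirect_cmd_offset`: a stride of 0 means tightly packed commands, and a non-zero stride must be a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>

/* Byte sizes of the indirect command structures (4 and 5 32-bit words). */
enum { DRAW_ARRAYS_CMD_SIZE = 16, DRAW_ELEMENTS_CMD_SIZE = 20 };

/* Hypothetical model: byte offset of draw i's command within the buffer
 * bound to GL_DRAW_INDIRECT_BUFFER, starting at offset `indirect`. */
static size_t indirect_cmd_offset(size_t indirect, size_t i,
                                  size_t stride, size_t cmd_size)
{
    assert(stride % 4 == 0);                  /* 0 or a multiple of 4 */
    size_t step = stride ? stride : cmd_size; /* 0 => tightly packed */
    return indirect + i * step;
}
```

A call such as `glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, (const void *)0, drawcount, 0)` walks these offsets. A larger stride, such as 32, leaves room for app-defined per-draw data between commands.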
30
ARB_indirect_parameters
• Yet-another new buffer binding
• glBindBuffer(GL_PARAMETER_BUFFER, buffer);
• Buffer source for reading the indirect draw count
• Two new commands
• glMultiDrawArraysIndirectCount
• glMultiDrawElementsIndirectCount
• Like glMultiDraw{Arrays/Elements}Indirect except
• NEW draw count offset parameter is a buffer offset into NEW current parameter buffer
– parameter_buffer[drawcountoffset] → actual drawcount
• Count clamped by maxdrawcount parameter
• What’s better about OpenGL 4.6 version?
• Free of ARB suffixes in OpenGL 4.6
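The count lookup itself is simple. This CPU-side model (the helper name is hypothetical, and on real hardware the read happens on the GPU) reads a 32-bit count at byte offset `drawcount` in the parameter buffer and clamps it to `maxdrawcount`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of how glMultiDraw{Arrays,Elements}IndirectCount
 * picks its draw count from the buffer bound to GL_PARAMETER_BUFFER. */
static uint32_t effective_draw_count(const unsigned char *parameter_buffer,
                                     size_t drawcount, uint32_t maxdrawcount)
{
    uint32_t count;
    assert(drawcount % 4 == 0);  /* offset must be 4-byte aligned */
    memcpy(&count, parameter_buffer + drawcount, sizeof count);
    return count < maxdrawcount ? count : maxdrawcount;  /* clamp */
}
```

The corresponding OpenGL 4.6 call is along the lines of `glMultiDrawElementsIndirectCount(GL_TRIANGLES, GL_UNSIGNED_INT, (const void *)0, (GLintptr)0, maxdrawcount, 0)`, with the command buffer bound to GL_DRAW_INDIRECT_BUFFER and the count buffer bound to GL_PARAMETER_BUFFER.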
31
ARB_indirect_parameters Usage Scenario
• Correctly-ordered blended dynamic particle system
• Particles are semi-opaque 3D models, not just points
• OpenGL compute shader computes particle interactions & what to render
• Incrementally update particle positions & spin
• Cull particles outside current view
• Back-to-front sort of remaining viewable semi-opaque 3D models
• Write out ordered, un-culled multi draw indirect to GL_DRAW_INDIRECT_BUFFER
• Write out total of un-culled draw indirect count to GL_PARAMETER_BUFFER
• Single glMultiDrawElementsIndirectCount command draws particles
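The cull, sort, and write steps of this scenario can be prototyped on the CPU before moving them into a compute shader. The sketch below is a hypothetical stand-in: `Particle`, its fields, and `build_particle_draws` are illustrative names, motion updates are omitted, and each visible particle becomes one single-instance draw. It produces the command array and the count that would go into the two GPU buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same 5-word layout the GPU reads from GL_DRAW_INDIRECT_BUFFER; the last
 * field is reservedMustBeZero in GL 4.0 and baseInstance in GL 4.2+. */
typedef struct {
    uint32_t count, primCount, firstIndex;
    int32_t  baseVertex;
    uint32_t baseInstance;
} DrawElementsIndirectCommand;

/* Hypothetical per-particle state produced by earlier simulation steps. */
typedef struct {
    float    view_depth;        /* distance from the eye */
    int      visible;           /* result of view-frustum culling */
    uint32_t mesh_index_count;
    uint32_t mesh_first_index;
    int32_t  mesh_base_vertex;
} Particle;

/* qsort comparator: larger depth first, i.e. back-to-front. */
static int farther_first(const void *a, const void *b)
{
    float da = ((const Particle *)a)->view_depth;
    float db = ((const Particle *)b)->view_depth;
    return (da < db) - (da > db);
}

/* CPU stand-in for the compute shader: cull, sort back-to-front, and write
 * one single-instance draw per surviving particle into cmds (which must
 * hold n entries). Reorders p in place. Returns the draw count that would
 * be written to GL_PARAMETER_BUFFER. */
static uint32_t build_particle_draws(Particle *p, size_t n,
                                     DrawElementsIndirectCommand *cmds)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)          /* cull: keep visible ones */
        if (p[i].visible)
            p[kept++] = p[i];
    qsort(p, kept, sizeof *p, farther_first);
    for (size_t i = 0; i < kept; i++) {
        DrawElementsIndirectCommand c = { p[i].mesh_index_count, 1,
                                          p[i].mesh_first_index,
                                          p[i].mesh_base_vertex, 0 };
        cmds[i] = c;
    }
    return (uint32_t)kept;
}
```

In the GPU version, a `glMemoryBarrier(GL_COMMAND_BARRIER_BIT)` between the compute dispatch and the draw is typically needed so the commands and count written by the shader are visible to the indirect draw.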
32
ARB_pipeline_statistics_query
•New query types
• Shares same API initially used for occlusion queries
• glBeginQuery, glEndQuery, glGetQueryiv, glGenQueries, glDeleteQueries
• Original occlusion queries just returned samples passed
• Prior extensions added queries for transform feedback, conservative rasterization
•Now extended to return rendering statistics throughout the pipeline
• Shader invocation counts
• How many primitives pass through different points in rendering pipeline
•Useful for performance analysis
• Without this functionality, very difficult to accurately know how much
rendering work you are really creating
• Particularly for modern OpenGL usage
•Comparable to statistics available to Direct3D 11
• Compare with D3D11_QUERY_DATA_PIPELINE_STATISTICS
33
Available Statistics
Query token                             Queried statistic
GL_VERTICES_SUBMITTED                   # of vertices submitted to OpenGL
GL_PRIMITIVES_SUBMITTED                 # of primitives submitted to OpenGL
GL_VERTEX_SHADER_INVOCATIONS            # of times a vertex shader was invoked
GL_TESS_CONTROL_SHADER_PATCHES          # of patches processed by the tessellation control shader stage
GL_TESS_EVALUATION_SHADER_INVOCATIONS   # of times a tessellation evaluation shader was invoked
GL_GEOMETRY_SHADER_INVOCATIONS          # of times a geometry shader was invoked
GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED   # of primitives emitted by a geometry shader
GL_FRAGMENT_SHADER_INVOCATIONS          # of times a fragment shader was invoked
GL_COMPUTE_SHADER_INVOCATIONS           # of times a compute shader was invoked
GL_CLIPPING_INPUT_PRIMITIVES            # of primitives that entered primitive clipping
GL_CLIPPING_OUTPUT_PRIMITIVES           # of primitives output by primitive clipping
34
Simple Example Usage
• Creating a query object
• GLuint query_object;
• glGenQueries(1, &query_object);
• Begin a query, do work, and end the query’s interval
• glBeginQuery(GL_FRAGMENT_SHADER_INVOCATIONS, query_object);
• renderLotsOfStuff!
• glEndQuery(GL_FRAGMENT_SHADER_INVOCATIONS);
• Later read back to the CPU the query object’s result
• Ideally not immediately after the rendering so retrieving query doesn’t stall the pipeline!
• GLuint64 query_result;
• glGetQueryObjectui64v(query_object, GL_QUERY_RESULT, &query_result);
• When done with the query object
• glDeleteQueries(1, &query_object);
• Alternatively write query results into a buffer...
35
• Create multiple query objects
• const GLint num_results = 2; // could be larger!
• GLuint query_object[2];
• glGenQueries(num_results, query_object);
• Create GPU buffer object for the GPU to write query results into
• GLuint query_buffer_object;
• glGenBuffers(1, &query_buffer_object);
• glBindBuffer(GL_QUERY_BUFFER, query_buffer_object);
• glBufferData(GL_QUERY_BUFFER, num_results*sizeof(GLuint64), NULL, GL_DYNAMIC_READ);
• Begin queries, draw, and end them
• glBeginQuery(GL_FRAGMENT_SHADER_INVOCATIONS, query_object[0]);
• glBeginQuery(GL_CLIPPING_OUTPUT_PRIMITIVES, query_object[1]);
• renderLotsOfStuff!
• glEndQuery(GL_CLIPPING_OUTPUT_PRIMITIVES);
• glEndQuery(GL_FRAGMENT_SHADER_INVOCATIONS);
• Now have the GPU write query results to the GPU buffer (with a buffer bound to GL_QUERY_BUFFER, the last parameter is a byte offset into that buffer)
• glBindBuffer(GL_QUERY_BUFFER, query_buffer_object);
• glGetQueryObjectui64v(query_object[0], GL_QUERY_RESULT, (GLuint64 *)(0 * sizeof(GLuint64)));
• glGetQueryObjectui64v(query_object[1], GL_QUERY_RESULT, (GLuint64 *)(1 * sizeof(GLuint64)));
• Later read the GPU buffer’s contents back to the CPU
• Ideally not immediately after the rendering so retrieving query doesn’t stall the pipeline!
• GLuint64 query_result[2];
• glGetBufferSubData(GL_QUERY_BUFFER, 0, sizeof(query_result), query_result);
• Cleanup
• glDeleteBuffers(1, &query_buffer_object);
• glDeleteQueries(num_results, query_object);
Example Writing Query Results to GPU Buffers
40
ARB_polygon_offset_clamp
• Extends OpenGL’s polygon offset feature
•Polygon offset was one of OpenGL’s first
extensions
• Standardized by OpenGL 1.1
• Biases rasterized depth (Z) by constant bias +
bias based on primitive’s depth maximum slope
• What’s NEW in OpenGL 4.6
•Effective depth bias clamped to a specified
maximum offset
•Used to mitigate second-order light leak
artifacts of polygon offset
•Long supported by PlayStation 3 and Direct3D
• First exposed in OpenGL as multi-vendor EXT
extension
•EXT_polygon_offset_clamp in 2014
•Adding to OpenGL 4.6 resolves IP issues
Source: Eric Lengyel, Terathon Software
41
Motivation & Usage of Polygon Offset
• Motivation of polygon offset
• Depth buffers must quantize depth values
• Typically 24-bit fixed-point
• Want to rasterize depth-tested geometry
• BUT need to disambiguate
nearly identical depth values
Rasterizing co-planar geometry, e.g. runway markings
Constructing shadow maps needs Z values to be “pushed back
a little” to avoid Z fighting causing self-shadowing artifacts
Shadow acne due to Z fighting
during shadow map testing
Shadow acne avoided using
polygon offset
Hidden line and silhouette rendering via polygon offset
42
Polygon Offset Justified (1)
• Rasterizing triangles generates discretized depth values
• A rasterizer’s depth slope for a triangle determines how Z values vary over triangle in
pixel space
• Triangles are “snapped” to sub-pixel fractional positions
• Practical requirement, necessary for watertight rasterization
• Rasterization hardware operates with finite fixed-point precision
• Dealing with Z fighting isn’t as simple as “nudging Z values” a little closer/further
• Two triangles logically in the same plane are NOT in the same plane after
• floating-point transformation
• sub-pixel transformation
• discrete depth interpolation
• geometric mesh uncertainty – those triangles may appear co-planar, but are they really??
43
Polygon Offset Justified (2)
• Conceptually, think of interpolated depth as having “error bars”
• Depth rasterization error isn’t “experimental”
but rather “quantization” error
• Important: The depth slope tells the maximum amount the
depth of a primitive will shift moving in pixel X & Y
• So if there is uncertainty (read: quantization!) in X & Y, a primitive’s depth slope quantifies the
maximum error per pixel shift
• Hence polygon offset’s bias should be scaled by the maximum of the X & Y depth slopes
• This is what the original OpenGL 1.1 polygon offset
functionality does
• Bias applied in units of minimum Z buffer precision
• Typically a bias of 1 or 2 and slope of 0.5 is enough
to mitigate Z fighting
• Accounts for half a pixel of Z error
• Sounds fishy but (mostly) works! Think of your rasterized fragments & pixels
having error bars for X & Y... and Z!
44
Polygon Offset Improved!
• Wait... Sounds fishy but (mostly) works!
• Mostly??
• What can go wrong?
• The depth slope can “get large” for geometry viewed edge-on
• Gradient magnitude for slope is conservative and can get too large
• So “fixing” shadow acne “exposes” light leaks
• This is the “too much of a good thing” principle at work
• Analogy: Band-aid on a band-aid:
• If the bias can sometimes get too large... then
• Clamp the maximum depth bias to some largest “reasonable” offset
45
Using Polygon Offset Clamp
•Easy API just adds new maximum depth bias clamp value
• GL 1.1: glPolygonOffset(factor, units)
• GL 4.6: glPolygonOffsetClamp(factor, units, clamp);
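•Changes the OpenGL specification’s equation for depth bias o (m = maximum depth slope, r = minimum resolvable depth difference)
• WAS: o = m × factor + r × units
• NOW: o = min(m × factor + r × units, clamp) if clamp > 0; max(m × factor + r × units, clamp) if clamp < 0; unclamped if clamp = 0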
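The clamped depth bias can be sketched as a small function: o = m × factor + r × units, capped by clamp when clamp is non-zero. This sketch uses the specification's permitted max(|dz/dx|, |dz/dy|) approximation for the maximum depth slope m; r, the minimum resolvable depth difference, is implementation-dependent, so it is passed in. The function name is hypothetical.

```c
#include <assert.h>

/* |v| without <math.h>, so no -lm is needed */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Sketch of the depth offset o for one polygon (hypothetical name).
 *   dzdx, dzdy  window-space depth slopes of the polygon
 *   r           implementation-dependent minimum resolvable difference
 *   clamp       0: no clamp (GL 1.1 glPolygonOffset behavior)
 *              >0: offset capped at clamp; <0: offset floored at clamp */
static double polygon_offset(double dzdx, double dzdy, double r,
                             double factor, double units, double clamp)
{
    double ax = abs_d(dzdx), ay = abs_d(dzdy);
    double m = ax > ay ? ax : ay;      /* max-slope approximation of m */
    double o = m * factor + r * units;
    if (clamp > 0.0)
        return o < clamp ? o : clamp;  /* min(o, clamp) */
    if (clamp < 0.0)
        return o > clamp ? o : clamp;  /* max(o, clamp) */
    return o;
}
```

A polygon seen nearly edge-on (say dz/dx = 1000) makes the slope term explode; with glPolygonOffsetClamp(factor, units, clamp) the bias stops at clamp instead of pushing the shadow caster far enough back to open a light leak.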
46
Examples of Light Leaks
Mitigated by Polygon Offset Clamp
BEFORE AFTER
Solid girder’s shadow shows streaks
Animates badly
Mitigated by clamping
47
Examples of Light Leaks
Mitigated by Polygon Offset Clamp
Dots of light within boot’s shadow
Animates badly
Mitigated by clamping
BEFORE AFTER
48
KHR_no_error
• The “no airbags” extension now part of OpenGL 4.6
• Makes OpenGL operation in the presence of GL_INVALID_VALUE,
GL_INVALID_OPERATION, etc. undefined
• GL_OUT_OF_MEMORY may still be generated but the occurrence might be delayed
• Intended to make OpenGL more efficient by obviating error checking & handling
• Hmm, not a large overhead in NVIDIA’s driver
• Typically error checks are folded into parameter handling
• Error checks are typically well-predicted “branches not taken” so cheap on modern CPUs
• You must “opt in” to the “no error” semantic at context creation
• For EGL, works with eglCreateContext
• See the EGL_KHR_create_context_no_error extension
• Query the value of GL_CONTEXT_FLAGS for the GL_CONTEXT_FLAG_NO_ERROR_BIT to
see if the “no errors” semantic is enabled for a context
• WGL_ARB_create_context_no_error and GLX_ARB_create_context_no_error provide
WGL and GLX mechanisms for requesting the “no error” semantic for a context.
49
NVIDIA’s KHR_no_error Advice
• General advice: “Try it before you buy it”
• Not generating errors has a severe side-effect (main effect!): you’re blind to errors!
• First confirm there’s some sufficient performance benefit to offset the risk
• If you really are worried about API error detection overhead, consider Vulkan
• And before you even try it:
• Try disabling GL_DEBUG_OUTPUT_SYNCHRONOUS (part of OpenGL 4.3) first
• This still detects GL errors but avoids returning errors synchronously
• Asynchronous error and debug output helps NVIDIA’s dual-core driver to avoid app-driver synchronization events for errors and debug output
• Then OpenGL API overhead can be relegated to another CPU improving performance
• Without losing well-defined error handling
• NVIDIA’s current “no errors” behavior is to simply hide posting the OpenGL error
• So the current benefit of “no errors” is very meager
• Errors are still detected and erroneous commands are ignored
• Considerations
• Expecting your software to work for years?
• Is your application’s predictable operation important for your user base?
• If yes, then blinding yourself to errors probably isn’t a good idea...
50
ARB_shader_atomic_counter_ops
• Completes OpenGL Shading Language (GLSL) support for atomic counters
• Prior ARB_shader_atomic_counters limited to increment, decrement, & query ops
• Operates on special atomic_uint variables
• New built-in functions for atomic counters
• Addition & Subtract: atomicCounterAdd & atomicCounterSubtract
• Minimum & Maximum: atomicCounterMin & atomicCounterMax
• Bitwise operators (AND, OR, XOR, etc.): atomicCounterAnd, atomicCounterOr, etc.
• Exchange, Compare & Exchange: atomicCounterExchange, atomicCounterCompSwap
• NOTE: Image loads & stores support similar atomics
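The semantics of these built-ins can be illustrated with C11 atomics on the CPU. This is an analogy, not how a GPU implements atomic counters, and the `counter_*` names are hypothetical. As with the GLSL functions, each returns the counter value from before the operation; min is built from a compare-and-swap loop because C11 has no atomic fetch-min.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* analogy for atomicCounterAdd: returns the value before the add */
static uint32_t counter_add(_Atomic uint32_t *c, uint32_t v)
{
    return atomic_fetch_add(c, v);
}

/* analogy for atomicCounterMin: returns the value before the operation */
static uint32_t counter_min(_Atomic uint32_t *c, uint32_t v)
{
    uint32_t old = atomic_load(c);
    while (v < old && !atomic_compare_exchange_weak(c, &old, v))
        ;   /* on failure `old` is refreshed with the current value */
    return old;
}

/* analogy for atomicCounterCompSwap: store data only if *c == compare */
static uint32_t counter_comp_swap(_Atomic uint32_t *c, uint32_t compare,
                                  uint32_t data)
{
    uint32_t expected = compare;
    atomic_compare_exchange_strong(c, &expected, data);
    return expected;  /* original value, whether or not the swap happened */
}
```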
51
ARB_shader_draw_parameters
• Adds two new GLSL built-in variables to get base vertex and base instance
• gl_BaseVertex
• gl_BaseInstance
• Useful for offsetting gl_VertexID or gl_InstanceID respectively
• Also for glMultiDraw* commands, new GLSL built-in variable
• gl_DrawID
• glMultiDrawArrays, glMultiDrawArraysIndirect, glMultiDrawArraysIndirectCount
• glMultiDrawElements, glMultiDrawElementsIndirect, glMultiDrawElementsIndirectCount
• Rationale: lets an app process draw batches programmatically from within an über shader, minimizing state changes
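A small CPU-side model shows what an über shader sees across a glMultiDrawArraysIndirect batch. The struct mirrors the four-uint indirect command layout (count, instanceCount, first, baseInstance); enumerate_instances is a hypothetical helper, not a GL call. Because gl_InstanceID restarts at 0 for every draw and does not include baseInstance, a shader typically adds gl_BaseInstance to index per-object data, and gl_DrawID identifies which command in the batch is running.

```c
#include <stddef.h>

/* One record of the indirect buffer for glMultiDrawArraysIndirect. */
typedef struct {
    unsigned int count;
    unsigned int instanceCount;
    unsigned int first;
    unsigned int baseInstance;
} DrawArraysIndirectCommand;

/* For every instance the batch would launch, record gl_DrawID and the
   per-object index gl_BaseInstance + gl_InstanceID. Stops at capacity.
   Returns the number of instances written. */
size_t enumerate_instances(const DrawArraysIndirectCommand *cmds, size_t drawCount,
                           unsigned int *outDrawID, unsigned int *outObjectIndex,
                           size_t capacity) {
    size_t n = 0;
    for (size_t drawID = 0; drawID < drawCount; ++drawID) {
        for (unsigned int inst = 0; inst < cmds[drawID].instanceCount; ++inst) {
            if (n == capacity)
                return n;
            outDrawID[n] = (unsigned int)drawID;                  /* gl_DrawID */
            outObjectIndex[n] = cmds[drawID].baseInstance + inst; /* gl_BaseInstance + gl_InstanceID */
            ++n;
        }
    }
    return n;
}
```

With per-object data laid out in a buffer indexed this way, one draw call can cover many objects without rebinding state between them.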
52
ARB_shader_group_vote
• Provides new GLSL built-in functions to compute a composite of a set of boolean conditions across a group of shader invocations
• Functions returning a boolean
• bool anyInvocation(bool value)
all threads return true if value is true for any thread, otherwise false
• bool allInvocations(bool value)
all threads return true if value is true for all threads, otherwise false
• bool allInvocationsEqual(bool value)
all threads return true if value is identical (equal) for all threads, otherwise false
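The vote semantics can be modeled on the CPU by treating one group as an array of per-invocation arguments. This is only an illustrative sketch in C: the function names are hypothetical stand-ins for the GLSL built-ins, and on a GPU the vote covers the currently active invocations of the group, with every invocation receiving the same result.

```c
#include <stdbool.h>
#include <stddef.h>

/* values[i] is the argument passed by invocation i of one group. */

/* Models anyInvocation: true if any invocation passed true. */
bool any_invocation(const bool *values, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (values[i])
            return true;
    return false;
}

/* Models allInvocations: true only if every invocation passed true. */
bool all_invocations(const bool *values, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (!values[i])
            return false;
    return true;
}

/* Models allInvocationsEqual: true if all invocations passed the same value. */
bool all_invocations_equal(const bool *values, size_t n) {
    for (size_t i = 1; i < n; ++i)
        if (values[i] != values[0])
            return false;
    return true;
}
```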
53
ARB_shader_group_vote Rationale
• Rationale
• Implementation reality: GPUs run shader invocations using groups of threads
• NVIDIA calls these groups “warps”
• Threads run most efficiently when they share the same sequence of instructions
• This is called “converged execution” (good), instead of diverged execution (bad)
• Group votes can keep threads running converged
• Consider this an advanced optimization to your shaders
• SPMT (“single program, multiple thread”) execution means shaders still run reasonably even when divergence is possible
• Example use: suppose all threads in a shader commonly need exactly four loop iterations
• If all threads can agree they are in the “4 iterations” case, the shader could take a path with the loop unrolled in expectation of this common case
• Thereby avoiding the loop overhead of the general case
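The “4 iterations” example can be sketched on the CPU as a group-wide decision. In this hypothetical C model, the group first computes the equivalent of allInvocations(iters == 4); if every invocation agrees, all of them take the unrolled path together and stay converged, otherwise all take the general loop. The per-invocation work (summing array entries) is made up for illustration; both paths produce identical results.

```c
#include <stdbool.h>
#include <stddef.h>

/* General case: loop a per-invocation number of times (hypothetical work). */
static int general_loop(const int *data, int iters) {
    int sum = 0;
    for (int i = 0; i < iters; ++i)
        sum += data[i];
    return sum;
}

/* Fast path, valid only when exactly four iterations are needed. */
static int unrolled4(const int *data) {
    return data[0] + data[1] + data[2] + data[3];
}

/* Models one group of n invocations. Returns true if the fast path ran. */
bool run_group(const int *data, const int *iters, int *out, size_t n) {
    bool allFour = true; /* stands in for allInvocations(iters == 4) */
    for (size_t i = 0; i < n; ++i) {
        if (iters[i] != 4) {
            allFour = false;
            break;
        }
    }
    for (size_t i = 0; i < n; ++i)
        out[i] = allFour ? unrolled4(data) : general_loop(data, iters[i]);
    return allFour;
}
```

The vote matters because the branch is uniform across the group: no invocation takes the unrolled path while its neighbors loop, so execution never diverges on this decision.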
54
ARB_gl_spirv
• This extension was announced at SIGGRAPH 2016
• But was optional
• NVIDIA announced support last year
• Much more useful as a core part of OpenGL 4.6
• And NOW it is!
55
GLSL Compilation prior to SPIR-V
[Diagram: your OpenGL app passes GLSL source (shader.vert, shader.geom, shader.frag) to the OpenGL driver, where a GLSL compiler front-end and a GPU-specific compiler back-end produce code for the GPU]
56
Offline Compilation of GLSL to SPIR-V
[Diagram: shader.vert, shader.geom, and shader.frag are compiled offline by glslangValidator or glslc into shader.vert.spv, shader.geom.spv, and shader.frag.spv; your OpenGL app passes the SPIR-V to the ARB_gl_spirv-enabled OpenGL driver, where a SPIR-V compiler front-end (alongside the GLSL compiler front-end) feeds the GPU-specific compiler back-end for the GPU]
57
Tools to Manipulate SPIR-V
• Open source SPIR-V tools available
• glslang: glslangValidator
• Provides basic GLSL compiler that generates OpenGL friendly SPIR-V
• Use the -G option for ARB_gl_spirv SPIR-V
• https://github.com/KhronosGroup/glslang
• SPIRV-Tools: spirv-as, spirv-dis, spirv-stats, etc.
• Utilities for assembling, disassembling, or otherwise manipulating SPIR-V binaries
• https://github.com/KhronosGroup/SPIRV-Tools
• glslc
• Compiler front-end matching conventional gcc/clang command line options
• Use the --target-env=opengl_compat
• https://github.com/google/shaderc
• Your choice:
• Build from source
• Get pre-compiled binaries from the LunarG Vulkan SDK
58
API Usage Differences: Compiling GLSL vs. SPIR-V
[Flowchart, GLSL path:]
• glCreateProgram
• While more shader domains: read GLSL text from file, glCreateShader, glShaderSource, glCompileShader, glAttachShader
• glLinkProgram
• While more uniforms to introspect: glGetUniformLocation
• While more attributes to introspect: glGetAttribLocation
• glUseProgram / glProgramUniform*
59
API Usage Differences: Compiling GLSL vs. SPIR-V
[Flowchart, SPIR-V path:]
• glCreateProgram
• While more shader domains: read SPIR-V binary blob from file, glCreateShader, glShaderBinary, glSpecializeShader, glAttachShader
• glLinkProgram
• glUseProgram / glProgramUniform* while more uniforms to initialize
• App assumes locations assigned within the shader, obviating dynamic introspection
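Because the SPIR-V path hands the driver an opaque binary blob, an app may want to sanity-check the blob it read from disk first. The sketch below is a hypothetical helper, not part of OpenGL: it checks the 5-word SPIR-V header, whose first word is the magic number 0x07230203 and whose second word encodes the version (major in bits 16-23, minor in bits 8-15). It assumes the blob is already in host byte order; a byte-swapped magic number fails the check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPIRV_MAGIC 0x07230203u

/* Returns 1 and reports the SPIR-V version if the header looks valid,
   0 otherwise. */
int spirv_check_header(const void *blob, size_t size,
                       unsigned *verMajor, unsigned *verMinor) {
    uint32_t header[5]; /* magic, version, generator, bound, schema */
    if (size < sizeof header || size % 4 != 0)
        return 0; /* too short, or not a whole number of 32-bit words */
    memcpy(header, blob, sizeof header); /* avoid unaligned word reads */
    if (header[0] != SPIRV_MAGIC)
        return 0; /* not SPIR-V, or wrong byte order */
    *verMajor = (header[1] >> 16) & 0xFFu;
    *verMinor = (header[1] >> 8) & 0xFFu;
    return 1;
}
```

A blob that passes would then go through the glCreateShader, glShaderBinary, glSpecializeShader sequence above, with GL_SHADER_BINARY_FORMAT_SPIR_V as the binary format in OpenGL 4.6 (GL_SHADER_BINARY_FORMAT_SPIR_V_ARB with the ARB extension).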
60
ARB_spirv_extensions
• Original ARB_gl_spirv extension only added support for SPIR-V 1.0 concepts that
were part of the OpenGL 4.5 Core Profile
• Many OpenGL ARB and vendor extensions not in OpenGL 4.5 Core add shading language
concepts
• BUT being defined prior to the existence of SPIR-V support in OpenGL, they lack SPIR-V
support for their additional features
• Advertising an extension + its SPIR-V extension means the SPIR-V support for that
extension is present
• So ARB_spirv_extensions adds a mechanism to advertise a driver’s supported SPIR-V extensions:
• GLint num_spirv_extensions;
glGetIntegerv(GL_NUM_SPIR_V_EXTENSIONS, &num_spirv_extensions);
• for (GLint ndx = 0; ndx < num_spirv_extensions; ndx++) {
const GLubyte *spirv_extension_name = glGetStringi(GL_SPIR_V_EXTENSIONS, ndx);
/* compare spirv_extension_name against the SPIR-V extensions your shaders need */
}
• Also defines several SPIR-V extensions...
61
First Set of SPIR-V Extensions
SPIR-V extension name: corresponding OpenGL extension or functionality
SPV_KHR_shader_ballot: ARB_shader_ballot
SPV_KHR_shader_draw_parameters: ARB_shader_draw_parameters
SPV_KHR_subgroup_vote: ARB_shader_group_vote
SPV_NV_stereo_view_rendering: NV_stereo_view_rendering
SPV_NV_viewport_array2: NV_viewport_array2 or ARB_shader_viewport_layer_array
SPV_NV_geometry_shader_passthrough: NV_geometry_shader_passthrough
SPV_NV_sample_mask_override_coverage: NV_sample_mask_override_coverage
SPV_AMD_shader_explicit_vertex_parameter: AMD_shader_explicit_vertex_parameter
SPV_AMD_gpu_shader_half_float: AMD_gpu_shader_half_float
SPV_KHR_shader_atomic_counter_ops: ARB_shader_atomic_counter_ops
SPV_KHR_post_depth_coverage: ARB_post_depth_coverage
SPV_KHR_storage_buffer_storage_class: Storage buffer support
62
ARB_texture_filter_anisotropic
• Fully compatible with long-standing EXT_texture_filter_anisotropic
• Simple to use:
glTextureParameteri(texture_object, GL_TEXTURE_MAX_ANISOTROPY, 8);
63
64
ARB_transform_feedback_overflow_query
• Adds new query types which can be used to detect overflow of transform feedback
buffers
• GL_TRANSFORM_FEEDBACK_OVERFLOW if any stream overflows
• GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW if a particular indexed vertex stream overflows
• These two NEW query types are also allowed for glBeginConditionalRender for conditional rendering
• Allows the graphics pipeline to condition rendering on whether prior vertex stream operations overflowed
• Comparable to Direct3D 11’s D3D11_QUERY_SO_OVERFLOW_PREDICATE* stream-out functionality
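What “overflow” means here can be illustrated with a deliberately simplified, hypothetical model: one stream, one buffer, fixed-size primitives. Transform feedback records primitives whole, so once the next primitive no longer fits in the buffer it is not recorded, and the stream counts as overflowed. Real OpenGL tracks this per vertex stream and across every bound buffer; this sketch only shows the arithmetic.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t primitivesWritten;
    bool overflowed; /* what an overflow query would report in this model */
} CaptureResult;

/* Simplified single-buffer model of transform feedback capture. */
CaptureResult capture(size_t bufferBytes, size_t bytesPerVertex,
                      size_t verticesPerPrimitive, size_t primitivesEmitted) {
    size_t bytesPerPrimitive = bytesPerVertex * verticesPerPrimitive;
    /* Number of whole primitives that fit in the buffer. */
    size_t fits = bytesPerPrimitive ? bufferBytes / bytesPerPrimitive
                                    : primitivesEmitted;
    CaptureResult r;
    r.primitivesWritten = primitivesEmitted < fits ? primitivesEmitted : fits;
    r.overflowed = primitivesEmitted > fits;
    return r;
}
```

With the query results usable by glBeginConditionalRender, a follow-up pass (for example, one that re-captures into a larger buffer) can be made conditional on overflow without reading the result back to the CPU.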
65
Why OpenGL Core Updates are Important (1)
• Not just opportunity for new functionality
• A new specification is released that reconciles all the
bundled extensions into a coherent single document
• Also gives the OpenGL Working Group an opportunity to better structure OpenGL’s specification
• Opportunity to fix typos, improve consistency of terminology, clarify ambiguities, document expected
error behavior
• Almost two dozen different minor tweaks in 4.6, largely consequential to developers
• Future extensions can then be written against a cleanly resolved 4.6 specification
• Otherwise, extensions can overlap how they amend the core specification and lead to confusion
• Ensures new functionality is covered by the Khronos Intellectual Property (IP) Framework
• This allows OpenGL implementers, developers, and
end-users to confidently depend on the functionality described
• Specifically for 4.6, Intellectual Property concerns surrounding both anisotropic texture filtering and polygon offset clamping were resolved
• Khronos maintains OpenGL, ES, and Vulkan in the
same “IP zone”—so ratifying a Khronos standard resolves issues
for related standards
Coherent Specification
Resolving IP Concerns
66
Why OpenGL Core Updates are Important (2)
• Not just opportunity for new functionality
• Opportunity for a new Conformance Test Suite to be released
• New tests obviously cover NEW functionality
• But also include contributed tests for existing functionality
• Without a new core specification, it is hard to enforce stronger conformance testing
• Vendors would simply continue certifying with an older, weaker conformance test version
• A new core version is a new opportunity to raise the shared quality bar for OpenGL
• Developers adopt OpenGL features at different levels of comfort
• Many developers are happy to use the latest, greatest features
as soon as extensions are shipped in drivers
• Other developers, often those with long-term support horizons,
look for core updates to signal mature standards now ready
to be adopted
• Example: A graphics researcher and a medical device maker can
both use OpenGL, but embrace the features provided at varying
rates and at different milestones
Conformance Testing
QualitySheriff
Developer Comfort Levels
67
Why OpenGL Core Updates are Important (3)
• Not just opportunity for new functionality
• OpenGL Shading Language (GLSL) gets accompanying revision
• So OpenGL 4.6 brings with it an updated GLSL
• Like the core API specification, the GLSL specification needs reconciliation of new
extensions, typos fixed, clarifications, etc.
• As many Vulkan applications express shaders in GLSL and compile them with glslang to
generate the SPIR-V that Vulkan expects, updating GLSL helps advance Vulkan
• OpenGL core revisions are as much about consolidating OpenGL’s associated
ecosystem support as simply adding NEW features to OpenGL
Advancing the Ecosystem
68
OpenGL 4.6’s Resolving of IP Issues & New Open Sourcing of OpenGL Conformance Suite Benefits Open Source OpenGL Implementations
• Khronos is now using Vulkan’s conformance approach for OpenGL
• See https://github.com/KhronosGroup/VK-GL-CTS
• Should help Mesa keep closer to the latest official standard, better for OpenGL overall
“OpenGL 4.6 will be the first OpenGL release where conformant open source implementations based on the Mesa project will be deliverable in a reasonable timeframe after release. The open sourcing of the OpenGL conformance test suite and ongoing work between Khronos and X.org will also allow for non-vendor led open source implementations to achieve conformance in the near future.”
David Airlie, senior principal engineer at Red Hat, and developer on Mesa /
X.org projects
Source: Khronos OpenGL 4.6 press release
69
Credit for OpenGL 4.6
• Khronos relies on its member companies to complete new OpenGL core updates
• Different companies drove different features, all free to comment and contribute
• Representatives of these companies drove the constituent features of OpenGL 4.6
See Appendix J of OpenGL 4.6 for a comprehensive list of contributor companies and individuals
70
GPU “Interop” Usage
• Increasingly, applications want to share GPU resources and mix APIs
• Typically sophisticated applications
• APIs involved might be
• Graphics (OpenGL, Vulkan, Direct3D)
• Compute (OpenCL, CUDA)
• Video encode and decode (VDPAU, NVENC, NVDEC, Windows Media)
• Multiple motivations for cross-process GPU resource sharing
• Performance (don’t read back to CPU), latency control (VR compositing)
• Robustness (isolation)
• Security, including protecting digital media assets
• Interop = jargon for two things
• Sharing GPU resources among different APIs
• Sharing GPU resources across process boundaries
• For example, a display compositor
71
Past Interop Extensions for OpenGL
• Past interoperability extensions paired OpenGL concepts with those of one other particular GPU API
• Often exposed as proprietary extensions
• Typically tied to platform concepts (e.g. Win32 HANDLEs)
• Simple when API concepts match (e.g. OpenGL textures to Direct3D Surfaces)
• Examples
• NV_DX_interop mixed OpenGL and Direct3D 9
• NV_DX_interop2 mixes OpenGL and Direct3D 10 & 11
• NV_vdpau_interop mixes OpenGL and Linux VDPAU video input/output surfaces
• Additionally, CUDA & OpenCL have interop to OpenGL
• Worked well as designed BUT...
72
Responding to New Interop Requirements
• Addressing criticism of prior interop extensions...
• In many cases, single-vendor and proprietary extensions
• Can we strive for something multi-vendor?
• Overcoming NEW Managed vs. Explicit GPU API philosophy mismatches
• Older GPU APIs (e.g. OpenGL, Direct3D 9,10,11) manage GPU resources and their
underlying memory as one
• Older APIs have textures, buffers, and synchronization objects
• New GPU APIs (e.g. Vulkan, Direct3D 12) use lower-level mechanisms to manage resources
• Newer explicit APIs have explicit memory allocations and semaphores
• Noticeable lack of common interop infrastructure
• Can there be some common framework for interop?
• Isolate platform-specific methods to “import” objects into platform-specific extensions
• Windows uses HANDLEs, etc.
• POSIX operating systems use file descriptors
73
EXT_memory_object & EXT_semaphore
• Vulkan introduces explicit memory and synchronization objects
• EXT_memory_object imports Vulkan explicit memory objects to OpenGL
• EXT_semaphore imports Vulkan semaphore objects to OpenGL
• Sharing GPU objects therefore requires extra interop mechanisms
• Platform-specific extensions specify how to import memory objects & semaphores
• For POSIX systems (e.g. Linux), use EXT_memory_object_fd & EXT_semaphore_fd
• fd = POSIX file descriptor
• For Windows, use EXT_memory_object_win32 & EXT_semaphore_win32
• Uses either Win32’s opaque HANDLE type or KMT share handle
• KMT = Kernel-Mode Thunk interface for Windows Display Driver Model (WDDM)
• Also for interoperability with Direct3D 11 & 12
• Also EXT_win32_keyed_mutex provides access to the keyed synchronization primitive
of Direct3D image objects
74
EXT_semaphore
• Introduces new object matching Vulkan-style semaphores
• Basic operations on semaphores
• Object management
• glGenSemaphoresEXT generates semaphore object names
• glDeleteSemaphoresEXT deletes semaphore objects
• Parameter setting & querying
• glSemaphoreParameterui64vEXT & glGetSemaphoreParameterui64vEXT
• Semaphore parameters are platform dependent (e.g. GL_D3D12_FENCE_VALUE_EXT)
• Semaphore operations
• glSignalSemaphoreEXT signals a semaphore
• glWaitSemaphoreEXT waits until something signals the semaphore
75
EXT_memory_object
• Introduces new memory object corresponding to Vulkan concept
• Import memory objects with platform-specific API
• Then “carve out” managed OpenGL textures and buffers from a memory object
• Commands to make textures: glTexStorageMem1DEXT, glTexStorageMem2DEXT,
glTexStorageMem3DEXT, glTexStorageMem2DMultisampleEXT,
glTexStorageMem3DMultisampleEXT
• Also Direct State Access (DSA) versions: glTextureStorageMem2DEXT, etc.
• Commands to carve out a buffer: glBufferStorageMemEXT,
glNamedBufferStorageMemEXT
76
OpenGL ES Parity
• Mobile developers often target OpenGL ES
• Apple’s iOS and Google’s Android made ES the de facto standard graphics API for mobile
• Moore’s Law has eliminated the need for ES on NVIDIA products
• ES 2.0/3.x is supported along with full OpenGL 4.x feature set
• Essentially an ES context “hides” the complete OpenGL 4.x feature set
• Good for compatibility and portability to other vendors’ less functional GPUs
• Unfortunately ES has been slow to adopt important GPU features
• NVIDIA makes sure developers relying on ES contexts don’t have to forgo features missing from ES
• NVIDIA works to coordinate multi-vendor EXT extensions to ES
• NVIDIA supports fully conformant ES contexts (+ extensions) even on Windows and Linux
• NVIDIA’s OpenGL in 2017 adds many ES parity extensions...
77
Oh, 3D developer—you
flatter me noticing my
complete & mature
feature set
With ES parity,
what does she
have that I don’t?
OpenGL 4.6
Context
ES 3.2
Context
78
ES Parity Extensions for 2017
Extension name: functionality
EXT_clear_texture: Clear texture images & sub-images
EXT_conservative_depth: Bound direction of fragment shader depth output
EXT_shader_group_vote: Collective decision making in shaders
EXT_texture_compression_bptc: Compressed texture formats corresponding to Direct3D’s BC6H (for HDR) and BC7 (8-bit RGB & RGBA) formats
EXT_texture_compression_rgtc: One- and two-component texture compression
EXT_texture_sRGB_R8: Single-component (red) sRGB color-space component encoding
EXT_draw_transform_feedback: Adds missing transform feedback API to ES, intended for geometry shaders’ variable output vertices
EXT_clip_cull_distance: Clipping and culling planes
OES_viewport_array: Viewport index support for geometry shaders
KHR_parallel_shader_compile: Request multi-threaded GLSL shader compilation
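BPTC (BC6H, BC7) and RGTC (one-component RGTC is BC4, two-component is BC5) all encode 4x4 texel blocks, so the compressed size of an image, such as the imageSize an app supplies to glCompressedTexImage2D, follows from the block count. This is a minimal sketch: block_compressed_size is a hypothetical helper, and the block sizes assumed are 8 bytes for one-component RGTC and 16 bytes for two-component RGTC, BC6H, and BC7.

```c
#include <stddef.h>

/* Bytes needed for one mip level of a 4x4-block-compressed 2D image.
   Edge blocks that are only partly covered still occupy a full block. */
size_t block_compressed_size(size_t width, size_t height, size_t bytesPerBlock) {
    size_t blocksWide = (width + 3) / 4;
    size_t blocksHigh = (height + 3) / 4;
    return blocksWide * blocksHigh * bytesPerBlock;
}
```

For example, a 256x256 BC7 image is 64 x 64 blocks of 16 bytes, or 65536 bytes, a quarter of its uncompressed RGBA8 size.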
79
Still ES Lacks Much,
NVIDIA Provides What’s Missing
• The 2017 multi-vendor parity extensions highlight what’s missing from standard ES 3.2
• Additional major items missing from standard ES 3.2
• Texture views with OES_texture_view missed ES 3.2 inclusion
• GPU-accelerated path rendering with NV_path_rendering for ES
• BUT NVIDIA’s OpenGL ES context provides these
• If ES still isn’t enough, just use an OpenGL 4.6 context
• For example, Direct State Access is not in ES contexts
80
NVIDIA’s ES Parity Philosophy
• The idea of “ES Parity” is NOT to turn an ES context into an OpenGL 4.x context
• The idea is to expose
• Features NVIDIA’s ES developer base has requested
• Features that we judge other ES vendors could reasonably support
• When Khronos ES vendors broadly agree, we work towards an OES extension
– Example: OES_viewport_array
• When just subset of Khronos ES vendors agree, we work for a multi-vendor EXT extension
– Example: EXT_clip_cull_distance
• As a last resort, when other ES vendors don’t share our interest, we go with NV
• Need a feature missing from ES? Speak up
• NVIDIA does not expose extensions broadly inconsistent with ES’s philosophy
• For example, fixed-function, immediate-mode,
and display lists aren’t candidates for ES parity
• Developers desiring such functionality are better
off with OpenGL 4.x contexts
81
NVIDIA ES Parity
Enhancements
Result of NVIDIA’s ES Parity Efforts
Full OpenGL
ES 3.2
ES 3.1
ES 3.0
ES 2.0
Industry’s
most functional
and full-featured
ES driver
OSes and Architectures
Android, Linux,
Windows, FreeBSD;
x86, ARM, IBM PowerPC
82
Perspective of ES Parity
from an OpenGL 4.6 Context
NVIDIA ES Parity
Enhancements
Full OpenGL
ES 3.2
ES 3.1
ES 3.0
ES 2.0
NVIDIA
OpenGL 4.6
with maximally
functional
extensions
Same
driver provides
ES and 4.6 contexts
Only
difference between
ES and 4.6 context is ES
context disables non-ES usage
83
Miscellaneous NEW Extensions for 2017
• NV_blend_minmax_factor, based on AMD_blend_minmax_factor
• EXT_protected_textures (Tegra & ES only)
• Used with EGL’s EGL_EXT_protected_content
Miscellaneous
2017
84
NV_blend_minmax_factor:
Modulated Min/Max Blending
• Original GL_MIN and GL_MAX blend equations limited
• Both ignore the source and destination blend factors set by glBlendFunc
• Limitation of original SGI hardware
• Conventional min/max blend equations
• blendResult = min(sourceColor, destinationColor)
• blendResult = max(sourceColor, destinationColor)
• AMD_blend_minmax_factor extension generalizes with two new blend equations
• GL_FACTOR_MIN_AMD:
blendResult = min(sourceColor × sourceFactor, destinationColor × destinationFactor)
• GL_FACTOR_MAX_AMD:
blendResult = max(sourceColor × sourceFactor, destinationColor × destinationFactor)
• NV_blend_minmax_factor provides same capability
• Just with a few restrictions, matching blend equation advanced restrictions
• Not for use with dual-source blending
• Not for mismatched multiple draw buffers
• Single-precision floating-point blending done in half-precision
• Otherwise compatible with AMD extension (uses same token values)
85
NV_blend_minmax_factor
Example
• Blend code
• blendResult = max(sourceColor, destinationColor × (1−sourceAlpha))
• Code to configure
• glEnable(GL_BLEND);
• glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
• glBlendEquation(GL_FACTOR_MAX_AMD);
• Extension supported on Maxwell and later GPU
generations
Unconventional blending
Source: Visual Music Systems
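To check the arithmetic of the factor-scaled equations, here is a per-channel CPU emulation in C. It is an illustrative model only, and all names are hypothetical. blend_factor_minmax applies min or max after scaling each side by its blend factor, which is what distinguishes GL_FACTOR_MIN_AMD / GL_FACTOR_MAX_AMD from plain GL_MIN / GL_MAX. blend_example reproduces the configuration above: source factor GL_ONE, destination factor GL_ONE_MINUS_SRC_ALPHA, max equation.

```c
typedef struct { float r, g, b, a; } Color;

static float minf(float x, float y) { return x < y ? x : y; }
static float maxf(float x, float y) { return x > y ? x : y; }

/* result = op(src * srcFactor, dst * dstFactor), per channel,
   where op is max when useMax is nonzero and min otherwise. */
Color blend_factor_minmax(Color src, Color srcFactor,
                          Color dst, Color dstFactor, int useMax) {
    float (*op)(float, float) = useMax ? maxf : minf;
    Color out = {
        op(src.r * srcFactor.r, dst.r * dstFactor.r),
        op(src.g * srcFactor.g, dst.g * dstFactor.g),
        op(src.b * srcFactor.b, dst.b * dstFactor.b),
        op(src.a * srcFactor.a, dst.a * dstFactor.a),
    };
    return out;
}

/* max(src, dst * (1 - srcAlpha)): the glBlendFunc(GL_ONE,
   GL_ONE_MINUS_SRC_ALPHA) setup with the max equation. */
Color blend_example(Color src, Color dst) {
    float k = 1.0f - src.a;
    Color one = {1.0f, 1.0f, 1.0f, 1.0f};
    Color oneMinusSrcAlpha = {k, k, k, k};
    return blend_factor_minmax(src, one, dst, oneMinusSrcAlpha, 1);
}
```

With the source alpha at 0.5, the destination is first halved and only then compared against the source, so a bright destination fades instead of always winning as it would under plain GL_MAX.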
86
EGL_EXT_protected_content &
EXT_protected_textures (1)
• Together provide OpenGL protected access control to GPU images
• Intended for managing trust in display compositors and apps
• Designed for Android
• GL_TEXTURE_PROTECTED_EXT texture parameter
• Applies to OpenGL texture objects
• And hence applies to framebuffer objects containing texture objects
• Boolean, defaults to false (unprotected) unless explicitly specified true
• EGL_PROTECTED_CONTENT_EXT attribute
• Applies to EGL surfaces and EGLImages
• Boolean, defaults to false (unprotected) unless explicitly specified true
• Texture objects, EGL surfaces, and EGLImages are all “resources” subject to protection
87
EGL_EXT_protected_content &
EXT_protected_textures (2)
• Pipeline stages of OpenGL contexts can also be designated protected and
unprotected
• Scenario:
• display compositor uses a protected context
• while apps would use unprotected contexts
• Technically, individual GPU pipeline stages can be designated protected or non-protected
• General access rules
• Protected pipeline stages
• Can read any EGL surfaces and images, protected or otherwise
• BUT may NOT write non-protected EGL surfaces and images
• Non-protected contexts/stages
• Can read & write non-protected
• BUT may NOT read or write protected content
• Expectation: GPU & operating system together enforce resource protection via
protected virtual memory mappings
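The general access rules above form a small decision table, encoded here in C as a hypothetical sketch (may_read and may_write are illustrative names, not EGL or OpenGL entry points). One case is not spelled out on the slide: a protected stage writing a protected resource. This sketch assumes that case is permitted, consistent with a protected compositor producing protected output. In a real system, the GPU and operating system enforce these rules through protected memory mappings, not application code.

```c
#include <stdbool.h>

/* stageProtected: the accessing pipeline stage is protected.
   resourceProtected: the texture, EGL surface, or EGLImage is protected. */

/* Protected stages may read anything; non-protected stages may read
   only non-protected resources. */
bool may_read(bool stageProtected, bool resourceProtected) {
    return stageProtected || !resourceProtected;
}

/* Protected stages may not write non-protected resources, and
   non-protected stages may not write protected ones. Writing a protected
   resource from a protected stage is assumed allowed. */
bool may_write(bool stageProtected, bool resourceProtected) {
    return stageProtected == resourceProtected;
}
```

The asymmetry is the point: protected content can flow into a protected compositor, but nothing protected can be written out to, or read by, an unprotected app.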
88
EGL_EXT_protected_content Scenarios
• Android 7.0’s secure texture video playback
• Allows secure GPU post-processing of protected image content
• Supports secure Digital Rights Management (DRM)
Source: Google
89
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
[Bar chart: number of OpenGL extensions proposed (y-axis, 0 to 60) by year, 1997 to 2016 (x-axis)]
Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions,
difficult to say exactly when an extension was proposed, product release lags extension proposal
90
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
[Same bar chart, annotated with the years of OpenGL core version updates 1.1 through 4.6]
Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions,
difficult to say exactly when an extension was proposed, product release lags extension proposal
91
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
[Same bar chart, annotated with periods: TNT + GeForce; run-up to DirectX 8, 9, 10, 11, and 12]
Despite caveats, shows how OpenGL functionality ties to rhythm of GPU architecture & API updates
Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions,
difficult to say exactly when an extension was proposed, product release lags extension proposal
92
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
[Same bar chart, annotated with NVIDIA GPU development periods: GeForce 1,2; GeForce 3,4; GeForce 5,6,7; Tesla (GeForce 8, 9, 100, 200, 300); Fermi (~GeForce 400); Kepler (~GeForce 600); Maxwell (~GeForce 700-900); Pascal (~GeForce 10)]
Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions,
difficult to say exactly when an extension was proposed, product release lags extension proposal
Despite caveats, shows how OpenGL functionality ties to rhythm of GPU architecture & API updates
93
Cumulative Implemented NVIDIA OpenGL
Extensions Over 20 Years
[Line chart: cumulative implemented OpenGL extensions proposed (y-axis, 0 to 500), 1997 to 2016 (x-axis)]
Same data as prior graphs, just integrated over time
94
NVIDIA OpenGL Shader Caching
• NVIDIA OpenGL driver saves GLSL shaders it
compiles
• Cached compiled shaders saved to your
local disk
• Next time you compile the same shader,
driver loads the post-compiled cached
copy
• Saves compilation time!
• Invalidated on new driver installation
• Games can “warm” cache on installation or
first play to speed game loading
• Available both on Windows and Linux!
• Windows location
• %LOCALAPPDATA%\NVIDIA\GLCache
• For older drivers, used %APPDATA%\NVIDIA\GLCache
• Linux location
• $HOME/.nv/GLCache
• Or $XDG_CACHE_HOME/.nv/ if
XDG_CACHE_HOME environment variable
set
• Following the convention set by the XDG
Base Directory Standard
• Locations subject to change with future
drivers and new conventions
95
Linux Graphics
Open Source Efforts from NVIDIA
• NVIDIA works to improve graphics support for entire Linux ecosystem
• Examples
• GL Vendor-Neutral Dispatch (GLVND)
• arbitrates vendor-neutral access to OpenGL and EGL/GLX APIs
• Wayland support for EGL Streams
• Video Decode and Presentation API for Unix (VDPAU)
• complete solution for decoding, post-processing, compositing, and displaying compressed or
uncompressed video streams
• All open source projects
96
GLVND: GL Vendor-Neutral Dispatch library
• libglvnd
• Arbitrates OpenGL API calls between multiple vendors
• Allows multiple drivers from different vendors to coexist on the same file system
• Determines which vendor to dispatch each API call to at runtime
• Both GLX and EGL are supported
• Any combination with OpenGL and OpenGL ES (1.1, 2.0, 3.x)
• NVIDIA open source contribution
• https://github.com/NVIDIA/libglvnd
97
Before GLVND
NVIDIA Proprietary
Linux Driver
Mesa + Nouveau
I control OpenGL
best on NVIDIA
GPUs
But I got
here first!
Drivers driving you crazy!
I just want my
Linux window
system to start!
pre-GLVND user
98
GLVND Architecture
libOpenGL
mapi/glapi
libGLdispatch
libGLX
libGL
X11 Server
GLX_EXT_libglvnd
extension
GLX_vendor GLX_vendor2
OpenGL or ES Application
99
NVIDIA’s Support for Wayland
• Wayland
• Intended as a simpler replacement for the X Window System
• A protocol for a compositor to talk to its clients
• Plus the C library implementation of that protocol
• Depends on a compositor (e.g. Weston) that is the display server
• Supports varying window managers (e.g. Mutter for Gnome)
• Wayland is supported on NVIDIA GPUs through EGL Streams
• Delivers the performance & quality of NVIDIA’s proprietary OpenGL driver
• Both Weston and Mutter (used by gnome-shell) currently have EGL Stream support
• Although not by default
• See https://github.com/NVIDIA/egl-wayland
• NVIDIA open source project
100
Latest VDPAU Support
• Video Decode and Presentation API for Unix (VDPAU)
• Latest NVIDIA GPUs (GeForce GTX 1080, etc.)
• Supports VDPAU Feature Set H
• Hardware-accelerated decoding of 8192x8192 (8k) H.265/HEVC video streams
• Full support of HEVC Main12 profile
101
NVIDIA Codec SDK 8.0
• Two hardware acceleration interfaces:
• NVENCODE API for video encode
acceleration
• NVDECODE API for video decode
acceleration
• Integration already available in the
FFmpeg/libav
• New in 8.0
• 10/12-bit decoding support with
HEVC/VP9, enabling end-to-end HDR
transcoding
• Improved quality via weighted prediction
• Support for OpenGL inputs (Linux only)
Download for registered developers: https://developer.nvidia.com/designworks/video_codec_sdk/downloads/v8.0
Info: https://developer.nvidia.com/nvidia-video-codec-sdk
102
Supported Video Encoding Formats by GPU Generation
* Except GM108
** Except GP100 (is limited to 4K resolution)
8k encoding for latest GPUs!
103
GPU Encoding:
Awesome Performance & Comparable Quality
[Charts: NVENC encoding speed, where bigger is faster; peak signal-to-noise ratio, where comparable values indicate comparable quality]
104
Supported Video Decode Formats by GPU Generation
* Except GM108
** Max resolution support is limited to selected Pascal chips
*** VP8 decode support is limited to selected Pascal chips
**** VP9 10/12 bit decode support is limited to select Pascal chips
8k decoding for latest GPUs!
105
NVDEC to OpenGL to NVENC
[Diagram: NVDEC decodes into an OpenGL texture object; OpenGL renders to framebuffer objects with texture attachments; NVENC encodes from an OpenGL texture object]
• Linux only for GL to NVENC
• For Windows, use OpenGL interop into Direct3D surfaces to encode from Direct3D surfaces
106
Proven GPU Codec Technology
• Same underlying technology powers these services
Play your PC games on your PC,
encode to the cloud
Play your PC game on your PC,
decode & play on your SHIELD TV
107
GLEW Support Available NOW
GLEW = The OpenGL Extension Wrangler Library
Open source library
Pre-built distribution: http://glew.sourceforge.net/
Source code: https://github.com/nigels-com/glew
Your one-stop-shop for API support for all OpenGL extension APIs
Newly released GLEW 2.1 (July 31, 2017) provides API support for
OpenGL 4.6
Multi-vendor EXT interoperability extensions
All of NVIDIA’s Maxwell & Pascal extensions
All other NVIDIA multi-GPU generation initiatives
Examples: NV_path_rendering, NV_command_list, NV_gpu_multicast
Thanks to Nigel Stewart, GLEW maintainer, for this
108
NVIDIA OpenGL in 2017 Provides
OpenGL’s Maximally Available Superset
OpenGL 4.6
Pascal
Extensions
2015 ARB extensions
OpenGL 4.5
Core
Maxwell
Extensions
Legacy EXT & Other
Compatibility Extensions
OpenGL Complete
Compatibility
Path Rendering
Multi-GPU.
SLI
Approaching Zero
Driver Overhead
NVIDIA Multi-generation
GPU Initiatives
DirectX inter-op
Vulkan inter-op
ES Enhancements
Full OpenGL
ES 3.2
Khronos Standard
Expected Compatibility
NVIDIA Initiatives
GPU Generation Features
109
Last Words
• Khronos announces OpenGL 4.6 today! Best OpenGL yet
• Highlights of NVIDIA’s OpenGL support in 2017
• NVIDIA has OpenGL 4.6 today, developer preview driver available NOW
• SPIR-V support standard part of OpenGL now
• Multi-vendor EXT interoperability extensions NEW this year
• “ES Parity” effort for 2017
• Miscellaneous extensions: protected content, min/max factor blending
• Open source graphics contributions from NVIDIA
• GLVND, VDPAU for video processing, and Wayland EGL Streams support
• GPU-accelerated Encode & Decode
110
SIGGRAPH Paper Using OpenGL to Check Out
• How to make shaders modular without giving
up performance
• Open source on github
• Accompanied by OpenGL and Vulkan demo
• Wednesday, 2 August
• Los Angeles Convention Center, Room 150/151
• 10:45 am - 12:35 pm
NVIDIA OpenGL 4.6 in 2017

More Related Content

What's hot

The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014) (Philip Hammer)
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14 (AMD Developer Central)
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems (Electronic Arts / DICE)
Advanced Scenegraph Rendering Pipeline (Narann29)
Approaching zero driver overhead (Cass Everitt)
SIGGRAPH Asia 2008 Modern OpenGL (Mark Kilgard)
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite (Electronic Arts / DICE)
Player Traversal Mechanics in the Vast World of Horizon Zero Dawn (Guerrilla)
Triangle Visibility buffer (Wolfgang Engel)
Siggraph 2016 - Vulkan and nvidia : the essentials (Tristan Lorach)
NVIDIA's OpenGL Functionality (Mark Kilgard)
A Bit More Deferred Cry Engine3 (guest11b095)
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead (Tristan Lorach)
Checkerboard Rendering in Dark Souls: Remastered by QLOC (QLOC)
Secrets of CryENGINE 3 Graphics Technology (Tiago Sousa)
Bindless Deferred Decals in The Surge 2 (Philip Hammer)
Rendering AAA-Quality Characters of Project A1 (Ki Hyunwoo)
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas (AMD Developer Central)

Similar to NVIDIA OpenGL 4.6 in 2017

"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono... (Edge AI and Vision Alliance)
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati... (Edge AI and Vision Alliance)
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre... (Edge AI and Vision Alliance)
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from... (Edge AI and Vision Alliance)
"An Update on Open Standard APIs for Vision Processing," a Presentation from ... (Edge AI and Vision Alliance)
Introduction of openGL (Gary Yeh)
Massive parallelism for heterogeneous computing in C++ for driverless auto... (CEE-SEC(R))
Intel Parallel Studio XE 2016: introduction to the new-version features of the network development toolkit (now available; price inquiries welcome) (Cheer Chain Enterprise Co., Ltd.)
Codasip application class RISC-V processor solutions (RISC-V International)
Tech Days 2015: AdaCore Roadmap (AdaCore)
Raffaele Rialdi (CodeFest)
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta... (Edge AI and Vision Alliance)
Domain-Specific Languages for Product Modeling (CWG 2011 Cologne, SAP Configu... (Tim Geisler)
(Costless) Software Abstractions for Parallel Architectures (Joel Falcou)
0xdroid -- community-developed Android distribution by 0xlab (National Cheng Kung University)
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen... (Edge AI and Vision Alliance)
"New Standards for Embedded Vision and Neural Networks," a Presentation from ... (Edge AI and Vision Alliance)
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic... (Edge AI and Vision Alliance)

More from Mark Kilgard

D11: a high-performance, protocol-optional, transport-optional, window system...
Computers, Graphics, Engineering, Math, and Video Games for High School Students
NVIDIA OpenGL in 2016
Virtual Reality Features of NVIDIA GPUs
Migrating from OpenGL to Vulkan
EXT_window_rectangles
Slides: Accelerating Vector Graphics Rendering using the Graphics Hardware Pi...
Accelerating Vector Graphics Rendering using the Graphics Hardware Pipeline
NV_path rendering Functional Improvements
SIGGRAPH Asia 2012: GPU-accelerated Path Rendering
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
Programming with NV_path_rendering: An Annex to the SIGGRAPH Asia 2012 paper...
GPU accelerated path rendering fastforward
GPU-accelerated Path Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: NVIDIA OpenGL for 2012
GTC 2012: GPU-Accelerated Path Rendering
GTC 2012: NVIDIA OpenGL in 2012
CS 354 Final Exam Review

NVIDIA OpenGL 4.6 in 2017

  • 1. NVIDIA OpenGL in 2017 • Mark Kilgard, July 31, SIGGRAPH 2017, Los Angeles
  • 2. My Background: Mark Kilgard • Principal System Software Engineer, working on OpenGL driver and API evolution, the Cg (“C for graphics”) shading language, and GPU-accelerated path rendering & web browser rendering • OpenGL Utility Toolkit (GLUT) implementer • Specified and implemented much of OpenGL • Author of OpenGL for the X Window System • Co-author of The Cg Tutorial • Worked on OpenGL for over 25 years
  • 3. OpenGL 4.6 with SPIR-V support announced today
  • 4. NVIDIA’s OpenGL Leverage • Debugging with Nsight • Programmable Graphics • Android & SHIELD • Quadro • OptiX • GeForce • Adobe Creative Cloud • Embedded Projects
  • 5. OpenGL Codebase Leverage • The same driver code base supports multiple APIs • OpenGL for embedded, mobile, and web • Multi-vendor, explicit, low-level graphics from Khronos
  • 6. NVIDIA’s Shading Compiler Even More Leveraged • NVIDIA Shading Compiler code base • 3D APIs based on the NVIDIA OpenGL driver code base • Various Direct3D versions • Apple’s proprietary graphics API • Proprietary console API
  • 7. Still the One Truly Common & Open 3D API • OS X • Linux • FreeBSD • Solaris • Android • Windows • Embedded Designs
  • 8. NVIDIA OpenGL in 2017 Provides OpenGL’s Maximally Available Superset • OpenGL 4.6 • Pascal extensions • 2015 ARB extensions • OpenGL 4.5 Core • Maxwell extensions • Legacy EXT & other compatibility extensions • OpenGL complete compatibility • Path rendering • Multi-GPU, SLI • Approaching Zero Driver Overhead • NVIDIA multi-generation GPU initiatives • DirectX inter-op • Vulkan inter-op • ES enhancements • Full OpenGL ES 3.2 • Legend: Khronos standard, expected compatibility, NVIDIA initiatives, GPU generation features
  • 11. OpenGL’s Recent Advancements, 2014 to 2016 • New ARB extensions: 3 standard extensions beyond 4.5 • ARB_sparse_buffer • ARB_pipeline_statistics_query • ARB_transform_feedback_overflow_query • Maxwell extensions: novel graphics features; 14 new extensions; Global Illumination & Vector Graphics focus • New ARB 2015 Extension Pack • Shader functionality: ARB_ES3_2_compatibility (shading language support), ARB_parallel_shader_compile, ARB_gpu_shader_int64, ARB_shader_atomic_counter_ops, ARB_shader_clock, ARB_shader_ballot • Graphics pipeline operation: ARB_fragment_shader_interlock, ARB_sample_locations, ARB_post_depth_coverage, ARB_ES3_2_compatibility (tessellation bounding box + multisample line width query), ARB_shader_viewport_layer_array • Texture mapping functionality: ARB_texture_filter_minmax, ARB_sparse_texture2, ARB_sparse_texture_clamp • Pascal extensions: novel graphics features; 5 new extensions; Virtual Reality focus • OpenGL SPIR-V support: standard shader intermediate representation; ARB_gl_spirv (not core); Vulkan interoperability
  • 12. Available to Download Today • Beta driver with OpenGL 4.6 support, July 31, 2017
  • 13. For those tracking birthdays... • Then: celebrating OpenGL 4.3 • Now: celebrating OpenGL 4.6
  • 14. Need a Refresher on 2014, 2015, and 2016 OpenGL? • Honestly, NVIDIA exposed lots of functionality in the last 3 years • Available @ http://www.slideshare.net/Mark_Kilgard
  • 15. Introducing OpenGL 4.6 • Big feature: SPIR-V support is required • SPIR-V = standard intermediate language for parallel compute and graphics • The Vulkan 1.0 standard requires shaders to be expressed in SPIR-V • Allows content creators to simplify their shader authoring and management pipelines • Previously this was an optional ARB extension, not required for 4.5, so support was not built in • Includes the NEW ARB_spirv_extensions, which extends SPIR-V support to SPIR-V extensions and lets applications query which ones are supported • Genius of AND: OpenGL 4.6 accepts either GLSL or SPIR-V, your choice • Technically, NVIDIA’s Vulkan 1.0 also allows using GLSL directly via an extension • Additional new ARB extensions bundled in OpenGL 4.6 for • Improving performance • Improving rendering quality • Resolving outstanding Intellectual Property (IP) issues
  • 16. Khronos standard extension ARB_gl_spirv: OpenGL + SPIR-V • OpenGL extension exposing the Khronos intermediate language for parallel compute and graphics • ARB extension announced last year, July 22, 2016 • Allows compiled SPIR-V code to be passed directly to the OpenGL driver • Accepts SPIR-V output from the open source Glslang Khronos reference compiler: https://github.com/KhronosGroup/glslang • Other compilers can target SPIR-V too
  • 17. SPIR-V Ecosystem • SPIR-V • Khronos-defined and controlled cross-API intermediate language • Native support for graphics and parallel constructs • 32-bit word stream • Extensible and easily parsed • Retains data object and control flow information for effective code generation and translation • Inputs: GLSL, HLSL, OpenCL C, OpenCL C++, third-party kernel and shader languages; LLVM and other intermediate forms • Tools and translators: SPIR-V Validator, SPIR-V (Dis)Assembler, LLVM to SPIR-V bi-directional translator • Khronos has open sourced some of these tools and translators and plans to open source others soon: https://github.com/KhronosGroup/SPIR/tree/spirv-1.1 • Open source C++ front-end released • Consumers: IHV driver runtimes • OpenGL support NEW with ARB_gl_spirv, standard in OpenGL 4.6
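The “32-bit word stream, easily parsed” point can be illustrated with a minimal C sketch (not from the talk) that checks the fixed five-word SPIR-V module header before an application hands the binary to the driver, for example through glShaderBinary with GL_SHADER_BINARY_FORMAT_SPIR_V followed by glSpecializeShader in OpenGL 4.6. The struct and function names are hypothetical, the words are assumed to already be in host byte order, and only the header is checked, not the instructions that follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPIRV_MAGIC        0x07230203u /* first word of every SPIR-V module */
#define SPIRV_HEADER_WORDS 5u          /* magic, version, generator, bound, schema */

typedef struct {
    unsigned major, minor; /* from header word 1, laid out as 0x00MMmm00 */
    uint32_t generator;    /* word 2: tool that produced the module */
    uint32_t id_bound;     /* word 3: every result <id> is below this */
} SpirvHeader;

/* Hypothetical helper: returns 1 and fills *out when words[] starts with a
 * plausible SPIR-V header in host byte order, else 0. The instruction
 * stream is NOT validated; that is left to the driver or a validator. */
static int spirv_parse_header(const uint32_t *words, size_t word_count,
                              SpirvHeader *out)
{
    if (words == NULL || out == NULL || word_count < SPIRV_HEADER_WORDS)
        return 0;
    if (words[0] != SPIRV_MAGIC)  /* wrong file, or opposite endianness */
        return 0;
    if (words[4] != 0)            /* schema word is reserved and must be 0 */
        return 0;
    out->major     = (words[1] >> 16) & 0xFFu;
    out->minor     = (words[1] >> 8) & 0xFFu;
    out->generator = words[2];
    out->id_bound  = words[3];
    return 1;
}
```

A byte-swapped module shows up as a magic-number mismatch here; a more forgiving loader could detect 0x03022307 and swap the words instead of rejecting them.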
  • 18. NVIDIA’s SIGGRAPH Driver Update: OpenGL 4.6 + Multi-vendor Interop + Vulkan Updates & More • NVIDIA historically releases a “developer” driver at SIGGRAPH with support for all Khronos standards announced at SIGGRAPH • This year too • Monday (July 31, 2017) NVIDIA put out a new SIGGRAPH driver • OpenGL 4.6 (beta, expected to pass 4.6 conformance when available) • Multi-vendor (EXT) interoperability extensions • Finally portable interoperability between OpenGL, Vulkan, OpenCL, etc. • Generic: EXT_memory_object, EXT_semaphore • Windows: EXT_memory_object_win32, EXT_win32_keyed_mutex, EXT_semaphore_win32 • Unix: EXT_memory_object_fd, EXT_semaphore_fd • Other new extensions • NV_blend_minmax_factor, consistent with AMD_blend_minmax_factor • Filling in missing ES functionality gaps: EXT_clear_texture, EXT_conservative_depth, EXT_shader_group_vote, EXT_texture_compression_bptc, EXT_texture_sRGB_R8, EXT_draw_transform_feedback, OES_viewport_array • EXT_clip_cull_distance: ES support for clip planes & cull distances • EXT_protected_textures (Tegra & ES only) for protected content • For Windows and Linux operating systems • Also Vulkan improvements • https://developer.nvidia.com/opengl-driver
  • 20. What OpenGL 4.6 Packages Together • OpenGL evolves by bundling extensions into a core version update • OpenGL 4.6 = everything in 4.5 plus these extensions: • ARB_indirect_parameters • ARB_pipeline_statistics_query • ARB_polygon_offset_clamp • KHR_no_error • ARB_shader_atomic_counter_ops (just extends the OpenGL Shading Language) • ARB_shader_draw_parameters • ARB_shader_group_vote (just extends the OpenGL Shading Language) • ARB_gl_spirv • ARB_spirv_extensions (the one technically “brand new” extension; the other 4.6 functionality is already proven & public) • ARB_texture_filter_anisotropic • ARB_transform_feedback_overflow_query • Now you can code for this functionality without ARB or EXT suffixes!
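Coding “without ARB or EXT suffixes” presupposes knowing the context really is 4.6. One way to decide, sketched below as an assumption rather than NVIDIA guidance, is to parse the string returned by glGetString(GL_VERSION), whose desktop form begins with "major.minor"; glGetIntegerv with GL_MAJOR_VERSION and GL_MINOR_VERSION is an equivalent route on 3.0+ contexts. The helper name is hypothetical, and the sketch takes the string as input so it runs without a GL context.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if a desktop GL_VERSION string such as
 * "4.6.0 NVIDIA ..." reports at least want_major.want_minor, else 0.
 * Strings that do not start with "<int>.<int>" (for example the
 * "OpenGL ES ..." form) are treated as not satisfying the request. */
static int gl_version_at_least(const char *version, int want_major, int want_minor)
{
    int major = 0, minor = 0;
    if (version == NULL || sscanf(version, "%d.%d", &major, &minor) != 2)
        return 0;
    return major > want_major || (major == want_major && minor >= want_minor);
}
```

An application could then pick, say, glMultiDrawElementsIndirectCount on a 4.6 context and fall back to the ARB-suffixed entry point only when the extension string advertises it.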
  • 21. ARB_indirect_parameters: Intro & Review • Evolving capability in OpenGL 4.x • General idea: allow the GPU to generate its own rendering work • Part of the AZDO philosophy (AZDO = Approaching Zero Driver Overhead) • Big idea: if the GPU generates its own work, the driver overhead on the CPU diminishes • Example: a compute shader generates sets of meshes, then those meshes are rendered • But we don’t want the GPU to “wait” for the CPU to orchestrate this effort • Builds on OpenGL 4.0 and 4.3’s improvements • 4.0 added indirect draws: an instanced draw call’s parameters are sourced from a GPU buffer • 4.3 added multiple indirect draws: one GL command launches N indirect draws • OpenGL 4.6’s breakthrough: ARB_indirect_parameters • Now the count of multiple indirect draw batches can itself be sourced from the GPU
  • 22. Original Ways to Draw • Two primary ways to draw with vertex arrays • glDrawElements: accepts an array of vertex indices • glDrawArrays: accepts a sequential range of indices • OpenGL 3.1 added instanced versions: glDrawElementsInstanced, glDrawArraysInstanced • These include an “instance count” parameter • Each draw repeats “instance count” times, changing gl_InstanceID each iteration
  • 23. Vertex array buffer configuration: Vertex 0 (x0,y0), (r0,g0,b0) • Vertex 1 (x1,y1), (r0,g0,b0) • Vertex 2 (x2,y2), (r0,g0,b0) • Vertex 3 (x3,y3), (r1,g1,b1) • Vertex 4 (x4,y4), (r1,g1,b1) • Vertex 5 (x5,y5), (r1,g1,b1) • Index buffer (element array) configuration: 0 1 1 2 2 0 3 4 4 5 5 3 • glDrawArrays(GL_TRIANGLES, 0, 6); draws two triangles from vertices 0 to 5 • glDrawElements(GL_LINES, 12, GL_UNSIGNED_INT, 0); draws the six edges of those two triangles as indexed by the element array
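To make the difference between the two draw calls concrete, here is a hypothetical CPU model (plain C, no GL calls) of which vertices each draw fetches and how many independent primitives result, using the six-vertex, twelve-index configuration on the slide. The function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* glDrawArrays(mode, first, count) walks the consecutive range
 * first .. first + count - 1. */
static size_t arrays_fetch_order(int first, int count, int *out)
{
    for (int i = 0; i < count; ++i)
        out[i] = first + i;
    return (size_t)count;
}

/* glDrawElements(mode, count, GL_UNSIGNED_INT, 0) instead fetches the
 * vertices named by the first count entries of the element array. */
static size_t elements_fetch_order(const unsigned *indices, int count, int *out)
{
    for (int i = 0; i < count; ++i)
        out[i] = (int)indices[i];
    return (size_t)count;
}

/* Independent primitives consume a fixed number of vertices each:
 * 3 for GL_TRIANGLES, 2 for GL_LINES. */
static size_t primitive_count(size_t vertices_fetched, size_t vertices_per_prim)
{
    return vertices_fetched / vertices_per_prim;
}
```

With the slide's data, the array draw yields 6 / 3 = 2 triangles, while the element draw reuses each vertex twice to produce 12 / 2 = 6 line segments.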
  • 24. Multi Draw Arrays • glMultiDrawArrays & glMultiDrawElements • Same as before, but loop over glDrawArrays or glDrawElements • A draw count parameter says how many iterations • Each iteration sources its non-mode parameters from CPU arrays • Fundamentally no more powerful than writing the loop in your own CPU code • But establishes a useful pattern for the future...
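The claim that glMultiDrawArrays is “no more powerful than writing the loop yourself” can be seen in a small sketch. Here a hypothetical draw_arrays function records calls instead of rendering, and multi_draw_arrays is the loop: mode is shared, while first and count come from CPU arrays per iteration. Empty ranges draw nothing, so the sketch skips them.

```c
#include <assert.h>

/* Records one emulated draw so the loop's behavior can be inspected. */
typedef struct { int mode, first, count; } DrawCall;
typedef struct { DrawCall calls[16]; int n; } DrawLog;

/* Hypothetical stand-in for glDrawArrays: just logs its arguments. */
static void draw_arrays(DrawLog *log, int mode, int first, int count)
{
    assert(log->n < 16);
    log->calls[log->n++] = (DrawCall){ mode, first, count };
}

/* Models glMultiDrawArrays(mode, first[], count[], drawcount) as a CPU
 * loop; an application could write exactly this loop itself. */
static void multi_draw_arrays(DrawLog *log, int mode, const int *first,
                              const int *count, int drawcount)
{
    for (int i = 0; i < drawcount; ++i)
        if (count[i] > 0)          /* an empty range renders nothing */
            draw_arrays(log, mode, first[i], count[i]);
}
```

The later indirect variants keep this same loop shape but move the arrays, and eventually the loop count, into GPU buffers.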
  • 25. Instancing • The GPU draws the same primitive topology N times • Shader or vertex attribute usage can transform & shade each instance differently • Loops to output a single set of draw indices multiple times • Each iteration outputs a different instance • GLSL shaders can access gl_InstanceID to behave differently per instance • Alternative to using gl_InstanceID in your shader: glVertexAttribDivisor gives a vertex attribute array a divisor • When the divisor is non-zero, element floor(instance / divisor) of that array is used • Common usage: with a divisor of 1, the instance ID is the array index • Effectively enables per-instance vertex arrays
  • 26. Power of Instancing • Vertex arrays with a single object mesh can render N distinct instances from a single GL command • Example image shows hundreds of instances drawn from a single mesh, each instance with its own color & translation • Observations • The GPU reads instanced vertex attributes • But the CPU still launches the N instances • Source: In2GPU
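The divisor rule can be written as a one-line function. This sketch (hypothetical name) returns which element of a vertex attribute array feeds a given vertex of a given instance. It also folds in a base instance, an assumption beyond the slide: base instance comes from OpenGL 4.2 commands such as glDrawArraysInstancedBaseInstance and is 0 for plain glDrawArraysInstanced.

```c
#include <assert.h>

/* Element of an attribute array read for (vertex_index, instance).
 * divisor == 0: ordinary per-vertex attribute.
 * divisor  > 0: floor(instance / divisor) + base_instance. */
static unsigned attrib_element(unsigned vertex_index, unsigned instance,
                               unsigned divisor, unsigned base_instance)
{
    if (divisor == 0)
        return vertex_index;
    return instance / divisor + base_instance; /* unsigned '/' floors */
}
```

With a divisor of 1, every vertex of instance 7 reads element 7 (per-instance color or translation); with a divisor of 2, instances 6 and 7 share element 3.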
  • 27. Draw Indirect (OpenGL 4.0) • Conventional GL draw calls • Require directly passing parameters to each GL draw call to find the indices to source • Direct parameter passing means the CPU supplies all the draw parameters • Causes CPU overhead on each draw • Solution: Draw Indirect • Sources each batch of draw arrays or draw elements parameters from a GPU buffer • Parameters, except for mode, are read from the GL_DRAW_INDIRECT_BUFFER binding • Big advantage • The GPU can generate draw batches itself, say with compute shaders • Means the GPU can feed itself
  • 28. Draw Indirect Buffer Layout • glDrawArraysIndirect(GLenum mode, const void *indirect) • indirect is a byte offset into the buffer bound to GL_DRAW_INDIRECT_BUFFER, locating four 32-bit words • Mimics calling glDrawArraysInstanced(mode, cmd->first, cmd->count, cmd->primCount); • glDrawElementsIndirect(GLenum mode, GLenum type, const void *indirect) • indirect is a byte offset into the same buffer, locating five 32-bit words • Mimics calling glDrawElementsInstancedBaseVertex(mode, cmd->count, type, (const void *)(cmd->firstIndex * indexSize), cmd->primCount, cmd->baseVertex); where indexSize is 1, 2, or 4 bytes for GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT, or GL_UNSIGNED_INT • BUT the cmd pointer indirection happens on the GPU, sourced from a GL buffer object • struct DrawArraysIndirectCommand { GLuint count; GLuint primCount; GLuint first; GLuint reservedMustBeZero; }; • struct DrawElementsIndirectCommand { GLuint count; GLuint primCount; GLuint firstIndex; GLint baseVertex; GLuint reservedMustBeZero; }; • Important: these structures are read by the GPU from GPU buffers • Since OpenGL 4.2, the last member of each structure is baseInstance rather than reserved
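Because the GPU reads these structures as raw 32-bit words, their size and layout matter. The sketch below restates the slide's structs with fixed-width types (so it compiles without GL headers; GLuint and GLint are 32-bit), checks the sizes at compile time, and models the argument translation glDrawElementsIndirect performs. ElementsDraw and decode_elements_cmd are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t count;
    uint32_t primCount;          /* instance count */
    uint32_t first;
    uint32_t reservedMustBeZero; /* baseInstance since OpenGL 4.2 */
} DrawArraysIndirectCommand;

typedef struct {
    uint32_t count;
    uint32_t primCount;
    uint32_t firstIndex;         /* in indices, not bytes */
    int32_t  baseVertex;
    uint32_t reservedMustBeZero; /* baseInstance since OpenGL 4.2 */
} DrawElementsIndirectCommand;

static_assert(sizeof(DrawArraysIndirectCommand) == 4 * 4, "four 32-bit words");
static_assert(sizeof(DrawElementsIndirectCommand) == 5 * 4, "five 32-bit words");

/* Arguments glDrawElementsIndirect effectively forwards to
 * glDrawElementsInstancedBaseVertex. */
typedef struct {
    uint32_t count;
    size_t   index_byte_offset;  /* the 'indices' argument, as a byte offset */
    uint32_t instances;
    int32_t  base_vertex;
} ElementsDraw;

static ElementsDraw decode_elements_cmd(const DrawElementsIndirectCommand *cmd,
                                        size_t index_size /* 1, 2, or 4 */)
{
    ElementsDraw d = { cmd->count, cmd->firstIndex * index_size,
                       cmd->primCount, cmd->baseVertex };
    return d;
}
```

The key detail is that firstIndex is counted in indices, so it must be scaled by the index size to become the byte offset the non-indirect call expects.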
  • 29. Multi Draw Indirect (OpenGL 4.3) • Now a single GL command can launch multiple draw indirect operations • Takes a draw count (N) for the number of draw indirects • Performs N draw indirect operations • Each operation’s parameters are read from the draw indirect buffer binding • A stride parameter gives the byte distance between consecutive commands (0 means tightly packed) • glMultiDrawArraysIndirect & glMultiDrawElementsIndirect • A single CPU command launches N draw indirect operations • All the parameters for all the draw indirect operations are sourced by the GPU • Very high leverage: tiny CPU effort can launch an enormous amount of rendering
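The stride parameter controls where each command is found in the indirect buffer. This CPU model (hypothetical function name; member names follow the OpenGL 4.2+ spelling) reads N commands from a byte buffer the way glMultiDrawElementsIndirect walks it: command i starts at offset + i * stride, and a stride of 0 means tightly packed. Strides larger than the struct let an application interleave its own per-draw data between commands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t count, instanceCount, firstIndex;
    int32_t  baseVertex;
    uint32_t baseInstance;
} DrawElementsIndirectCommand;

/* Gathers the parameter sets that drawcount indirect draws would use. */
static void read_indirect_commands(const unsigned char *buffer, size_t offset,
                                   int drawcount, size_t stride,
                                   DrawElementsIndirectCommand *out)
{
    if (stride == 0)
        stride = sizeof(DrawElementsIndirectCommand); /* tightly packed */
    for (int i = 0; i < drawcount; ++i)
        /* memcpy avoids alignment assumptions about the byte buffer */
        memcpy(&out[i], buffer + offset + (size_t)i * stride, sizeof out[i]);
}
```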
  • 30. ARB_indirect_parameters • Yet another new buffer binding: glBindBuffer(GL_PARAMETER_BUFFER, buffer); • Buffer source for reading the indirect draw count • Two new commands: glMultiDrawArraysIndirectCount, glMultiDrawElementsIndirectCount • Like glMultiDraw{Arrays/Elements}Indirect except • The NEW drawcount parameter is a byte offset into the NEW current parameter buffer: parameter_buffer[drawcount offset] → actual drawcount • The count is clamped by the maxdrawcount parameter • What’s better about the OpenGL 4.6 version? • Free of ARB suffixes
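The new piece of state is small: one 32-bit count fetched from the parameter buffer at the given byte offset, then limited by maxdrawcount. The sketch below models just that step on the CPU. The function name is hypothetical, and treating the stored count as an unsigned 32-bit value is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Draw count a glMultiDraw*IndirectCount call would use: read a 32-bit
 * value at drawcount_offset bytes into the GL_PARAMETER_BUFFER contents,
 * then never exceed maxdrawcount. */
static uint32_t effective_draw_count(const unsigned char *parameter_buffer,
                                     size_t drawcount_offset,
                                     uint32_t maxdrawcount)
{
    uint32_t gpu_count;
    memcpy(&gpu_count, parameter_buffer + drawcount_offset, sizeof gpu_count);
    return gpu_count < maxdrawcount ? gpu_count : maxdrawcount;
}
```

maxdrawcount acts as a safety bound: the application sizes the indirect buffer for at most that many commands, whatever count the GPU writes.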
  • 31. ARB_indirect_parameters Usage Scenario • Correctly ordered, blended, dynamic particle system • Particles are semi-opaque 3D models, not just points • An OpenGL compute shader computes particle interactions & what to render • Incrementally updates particle positions & spin • Culls particles outside the current view • Sorts the remaining viewable semi-opaque 3D models back-to-front • Writes out the ordered, un-culled multi draw indirect commands to GL_DRAW_INDIRECT_BUFFER • Writes out the total count of un-culled draw indirect commands to GL_PARAMETER_BUFFER • A single glMultiDrawElementsIndirectCount command draws the particles
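The compute-shader steps above can be prototyped on the CPU before porting. This sketch assumes a hypothetical Particle record whose in_view flag was already set by a frustum test, compacts the visible particles, sorts them back-to-front with qsort, and emits one indirect command each; its return value is what would be written to GL_PARAMETER_BUFFER. Storing the particle id in baseInstance (OpenGL 4.2+ member name) is one possible way to let the shader find per-particle data, not the talk's method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t count, instanceCount, firstIndex;
    int32_t  baseVertex;
    uint32_t baseInstance;
} DrawElementsIndirectCommand;

/* Hypothetical particle record. */
typedef struct {
    float    distance;     /* view-space distance, larger = farther */
    int      in_view;      /* result of an earlier frustum test */
    uint32_t first_index;  /* mesh's range in the shared element buffer */
    uint32_t index_count;
    uint32_t id;           /* locates per-particle data */
} Particle;

/* qsort comparator: descending distance, i.e. back to front. */
static int farther_first(const void *a, const void *b)
{
    float da = ((const Particle *)a)->distance;
    float db = ((const Particle *)b)->distance;
    return (da < db) - (da > db);
}

/* Cull, sort back-to-front, and emit one command per visible particle.
 * Reorders 'particles' in place; returns the number of commands. */
static uint32_t build_particle_draws(Particle *particles, size_t n,
                                     DrawElementsIndirectCommand *cmds)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i)                /* cull by compaction */
        if (particles[i].in_view)
            particles[kept++] = particles[i];
    qsort(particles, kept, sizeof *particles, farther_first);
    for (size_t i = 0; i < kept; ++i)
        cmds[i] = (DrawElementsIndirectCommand){
            .count = particles[i].index_count, .instanceCount = 1,
            .firstIndex = particles[i].first_index, .baseVertex = 0,
            .baseInstance = particles[i].id };
    return (uint32_t)kept;
}
```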
  • 32. ARB_pipeline_statistics_query • New query types • Shares the same API initially used for occlusion queries: glBeginQuery, glEndQuery, glGetQueryiv, glGenQueries, glDeleteQueries • Original occlusion queries just returned samples passed • Prior extensions added queries for transform feedback and conservative rasterization • Now extended to return rendering statistics throughout the pipeline • Shader invocation counts • How many primitives pass through different points in the rendering pipeline • Useful for performance analysis • Without this functionality, it is very difficult to know accurately how much rendering work you are really creating • Particularly for modern OpenGL usage • Comparable to the statistics available in Direct3D 11 • Compare with D3D11_QUERY_DATA_PIPELINE_STATISTICS
  • 33. Available Statistics
      Query token: queried statistic
      GL_VERTICES_SUBMITTED: # of vertices issued to OpenGL
      GL_PRIMITIVES_SUBMITTED: # of primitives issued to OpenGL
      GL_VERTEX_SHADER_INVOCATIONS: # of times a vertex shader was invoked
      GL_TESS_CONTROL_SHADER_PATCHES: # of patches processed by the tessellation control shader
      GL_TESS_EVALUATION_SHADER_INVOCATIONS: # of times a tessellation evaluation shader was invoked
      GL_GEOMETRY_SHADER_INVOCATIONS: # of times a geometry shader was invoked
      GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED: # of primitives emitted by the geometry shader
      GL_FRAGMENT_SHADER_INVOCATIONS: # of times a fragment shader was invoked
      GL_COMPUTE_SHADER_INVOCATIONS: # of times a compute shader was invoked
      GL_CLIPPING_INPUT_PRIMITIVES: # of primitives that entered primitive clipping
      GL_CLIPPING_OUTPUT_PRIMITIVES: # of primitives output by primitive clipping
  • 34. Simple Example Usage • Creating a query object • GLuint query_object; • glGenQueries(1, &query_object); • Begin a query, do work, and end the query’s interval • glBeginQuery(GL_FRAGMENT_SHADER_INVOCATIONS, query_object); • renderLotsOfStuff(); • glEndQuery(GL_FRAGMENT_SHADER_INVOCATIONS); • Later read the query object’s result back to the CPU • Ideally not immediately after the rendering, so retrieving the query doesn’t stall the pipeline! • GLuint64 query_result; • glGetQueryObjectui64v(query_object, GL_QUERY_RESULT, &query_result); • When done with the query object • glDeleteQueries(1, &query_object); • Alternatively, write query results into a buffer...
  • 35. Example Writing Query Results to GPU Buffers • Step 1, create multiple query objects • const GLsizei num_results = 2; // could be larger! • GLuint query_object[2]; • glGenQueries(num_results, query_object); • Step 2, create a GPU buffer object for the GPU to write query results into • GLuint query_buffer_object; • glGenBuffers(1, &query_buffer_object); • glBindBuffer(GL_QUERY_BUFFER, query_buffer_object); • glBufferData(GL_QUERY_BUFFER, num_results*sizeof(GLuint64), NULL, GL_DYNAMIC_READ); • Step 3, begin the queries, draw, and end them • glBeginQuery(GL_FRAGMENT_SHADER_INVOCATIONS, query_object[0]); • glBeginQuery(GL_CLIPPING_OUTPUT_PRIMITIVES, query_object[1]); • renderLotsOfStuff(); • glEndQuery(GL_CLIPPING_OUTPUT_PRIMITIVES); • glEndQuery(GL_FRAGMENT_SHADER_INVOCATIONS); • Step 4, have the GPU write the query results into the GPU buffer; while a buffer is bound to GL_QUERY_BUFFER, the last argument is a byte offset into that buffer • glBindBuffer(GL_QUERY_BUFFER, query_buffer_object); • glGetQueryObjectui64v(query_object[0], GL_QUERY_RESULT, (GLuint64 *)(sizeof(GLuint64)*0)); • glGetQueryObjectui64v(query_object[1], GL_QUERY_RESULT, (GLuint64 *)(sizeof(GLuint64)*1)); • Step 5, later read the GPU buffer’s contents back to the CPU • Ideally not immediately after the rendering, so retrieving the query doesn’t stall the pipeline! • GLuint64 query_result[2]; • glGetBufferSubData(GL_QUERY_BUFFER, 0, sizeof(query_result), query_result); • Cleanup • glDeleteBuffers(1, &query_buffer_object); • glDeleteQueries(num_results, query_object);
  • 40. ARB_polygon_offset_clamp • Extends OpenGL’s polygon offset feature • Polygon offset was one of OpenGL’s first extensions • Standardized by OpenGL 1.1 • Biases rasterized depth (Z) by a constant bias plus a bias based on the primitive’s maximum depth slope • What’s NEW in OpenGL 4.6 • The effective depth bias is clamped to a specified maximum offset • Used to mitigate second-order light-leak artifacts of polygon offset • Long supported by PlayStation 3 and Direct3D • First exposed in OpenGL as the multi-vendor EXT extension EXT_polygon_offset_clamp in 2014 • Adding it to OpenGL 4.6 resolves IP issues • Source: Eric Lengyel, Terathon Software
  • 41. Motivation & Usage of Polygon Offset • Motivation of polygon offset • Depth buffers must quantize depth values, typically to 24-bit fixed-point • We want to rasterize depth-tested geometry • BUT need to disambiguate nearly identical depth values • Examples: rasterizing co-planar geometry, e.g. runway markings • Constructing shadow maps needs Z values to be “pushed back a little” to avoid Z fighting that causes self-shadowing artifacts • Shadow acne due to Z fighting during shadow map testing • Shadow acne avoided using polygon offset • Hidden line and silhouette rendering via polygon offset
  • 42. Polygon Offset Justified (1) • Rasterizing triangles generates discretized depth values • A triangle’s depth slope determines how Z values vary over the triangle in pixel space • Triangles are “snapped” to sub-pixel fractional positions • A practical requirement, necessary for watertight rasterization • Rasterization hardware operates with finite fixed-point precision • Dealing with Z fighting isn’t as simple as “nudging Z values” a little closer/further • Two triangles logically in the same plane are NOT co-planar after • floating-point transformation • sub-pixel snapping • discrete depth interpolation • geometric mesh uncertainty: those triangles may appear co-planar, but are they really??
  • 43. Polygon Offset Justified (2) • Conceptually, think of interpolated depth as having “error bars” • Depth rasterization error isn’t “experimental” error but rather “quantization” error • Important: the depth slope tells the maximum amount the depth of a primitive will shift when moving in pixel X & Y • So if there is uncertainty (read: quantization!) in X & Y, a primitive’s depth slope bounds the maximum depth error per pixel shift • Hence polygon offset’s bias should be scaled by the maximum of the X & Y depth slopes • This is what the original OpenGL 1.1 polygon offset functionality does • The constant bias is applied in units of the minimum resolvable Z buffer difference • Typically a units bias of 1 or 2 and a slope factor of 0.5 is enough to mitigate Z fighting • Accounts for half a pixel of Z error • Sounds fishy but (mostly) works! • Think of your rasterized fragments & pixels having error bars for X & Y... and Z!
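The slope-scaled bias can be computed directly from a triangle’s window-space vertices. This sketch (hypothetical names) derives dz/dx and dz/dy from the plane through the three vertices, takes m = max(|dz/dx|, |dz/dy|) as the slide describes, and evaluates the OpenGL 1.1 offset o = m × factor + r × units. The value of r, the minimum resolvable depth difference, is implementation-dependent; the usage below assumes roughly 1/2^24 for a 24-bit fixed-point depth buffer.

```c
#include <assert.h>

typedef struct { double x, y, z; } WinVert; /* x, y in pixels; z in [0,1] */

static double dabs(double v) { return v < 0.0 ? -v : v; }

/* Max depth slope m = max(|dz/dx|, |dz/dy|) of the triangle's plane.
 * Returns -1 for triangles with zero screen-space area. */
static double max_depth_slope(WinVert v0, WinVert v1, WinVert v2)
{
    double e1x = v1.x - v0.x, e1y = v1.y - v0.y, e1z = v1.z - v0.z;
    double e2x = v2.x - v0.x, e2y = v2.y - v0.y, e2z = v2.z - v0.z;
    /* plane normal n = e1 x e2, so dz/dx = -nx/nz and dz/dy = -ny/nz */
    double nx = e1y * e2z - e1z * e2y;
    double ny = e1z * e2x - e1x * e2z;
    double nz = e1x * e2y - e1y * e2x;
    if (nz == 0.0)
        return -1.0;
    double sx = dabs(nx / nz), sy = dabs(ny / nz);
    return sx > sy ? sx : sy;
}

/* OpenGL 1.1 polygon offset: o = m * factor + r * units. */
static double polygon_offset(double m, double r, double factor, double units)
{
    return m * factor + r * units;
}
```

For a triangle seen nearly edge-on, nz approaches 0 and m grows without bound, which is exactly the “too much of a good thing” failure the clamp addresses.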
  • 44. Polygon Offset Improved! • Wait... Sounds fishy but (mostly) works! • Mostly?? What can go wrong? • The depth slope can “get large” for geometry viewed edge-on • The gradient magnitude used for the slope is conservative and can get too large • So “fixing” shadow acne “exposes” light leaks • This is the “too much of a good thing” principle at work • Analogy: a band-aid on a band-aid • If the bias can sometimes get too large... then clamp the maximum depth bias to some largest “reasonable” offset
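  • 45. Using Polygon Offset Clamp • Easy API just adds a new maximum depth bias clamp value • GL 1.1: glPolygonOffset(factor, units); • GL 4.6: glPolygonOffsetClamp(factor, units, clamp); • Changes the OpenGL specification’s equation for depth bias, where m is the primitive’s maximum depth slope and r is the minimum resolvable depth difference • WAS: o = m × factor + r × units • NOW: o = min(m × factor + r × units, clamp) when clamp > 0; o = max(m × factor + r × units, clamp) when clamp < 0; o = m × factor + r × units (no clamping) when clamp is 0 or NaN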
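The clamped equation translates directly into code. This sketch (hypothetical function name) evaluates the depth bias the way glPolygonOffsetClamp defines it: a positive clamp caps the bias from above, a negative clamp bounds it from below, and 0 or NaN leaves it unclamped, matching plain glPolygonOffset.

```c
#include <assert.h>

static double polygon_offset_clamp(double m, double r, double factor,
                                   double units, double clamp)
{
    double o = m * factor + r * units;  /* the OpenGL 1.1 bias */
    if (clamp > 0.0)
        return o < clamp ? o : clamp;   /* min(o, clamp) */
    if (clamp < 0.0)
        return o > clamp ? o : clamp;   /* max(o, clamp) */
    return o;  /* clamp is 0 or NaN: both comparisons above are false */
}
```

The NaN case needs no special code because every ordered comparison with NaN is false, so control falls through to the unclamped result.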
  • 46. Examples of Light Leaks Mitigated by Polygon Offset Clamp (BEFORE / AFTER) • BEFORE: the solid girder’s shadow shows streaks and animates badly • AFTER: mitigated by clamping
  • 47. Examples of Light Leaks Mitigated by Polygon Offset Clamp (BEFORE / AFTER) • BEFORE: dots of light within the boot’s shadow, which animate badly • AFTER: mitigated by clamping
  • 48. KHR_no_error • The “no airbags” extension, now part of OpenGL 4.6 • Makes OpenGL operation in the presence of GL_INVALID_VALUE, GL_INVALID_OPERATION, etc. errors undefined • GL_OUT_OF_MEMORY may still be generated, but its occurrence might be delayed • Intended to make OpenGL more efficient by obviating error checking & handling • Hmm, error checking is not a large overhead in NVIDIA’s driver • Error checks are typically folded into parameter handling • Error checks are typically well-predicted “branches not taken”, so cheap on modern CPUs • You must “opt in” to the “no error” semantic at context creation • For EGL, works with eglCreateContext; see the EGL_KHR_create_context_no_error extension • WGL_ARB_create_context_no_error and GLX_ARB_create_context_no_error provide WGL and GLX mechanisms for requesting the “no error” semantic for a context • Query GL_CONTEXT_FLAGS for the GL_CONTEXT_FLAG_NO_ERROR_BIT to see if the “no error” semantic is enabled for a context
  • 49. NVIDIA’s KHR_no_error Advice • General advice: “Try it before you buy it” • Not generating errors has a severe side effect (main effect!) → you’re blind to errors! • First confirm there’s a sufficient performance benefit to offset the risk • If you really are worried about API error-detection overhead, consider Vulkan • And before you even try it: • Try disabling GL_DEBUG_OUTPUT_SYNCHRONOUS (part of OpenGL 4.3) first • This still detects GL errors but avoids reporting them synchronously • Asynchronous error and debug output helps NVIDIA’s dual-core driver avoid app-driver synchronization events for errors and debug output • Then OpenGL API overhead can be relegated to another CPU core, improving performance • Without losing well-defined error handling • NVIDIA’s current “no error” behavior is simply to hide posting the OpenGL error • So the current benefit of “no error” is very meager • Errors are still detected and erroneous commands are ignored • Considerations • Expecting your software to work for years? • Is predictable operation of your application important for your user base? • If yes, then blinding yourself to errors probably isn’t a good idea...
  • 50. 50 ARB_shader_atomic_counter_ops • Completes OpenGL Shading Language (GLSL) support for atomic counters • The prior ARB_shader_atomic_counters extension was limited to increment, decrement, & query operations • Operates on special atomic_uint variables • New built-in functions for atomic counters • Add & Subtract: atomicCounterAdd & atomicCounterSubtract • Minimum & Maximum: atomicCounterMin & atomicCounterMax • Bitwise operators (AND, OR, XOR): atomicCounterAnd, atomicCounterOr, atomicCounterXor • Exchange, Compare & Exchange: atomicCounterExchange, atomicCounterCompSwap • NOTE: Image loads & stores support similar atomics
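To pin down the semantics, a CPU-side C11 analog of several of these built-ins may help. This is not GLSL and not how the GPU implements them; it only models the convention that each function returns the counter's value before the operation. The function names are hypothetical, and because C11 has no fetch-min, the minimum uses a compare-exchange retry loop.

```c
#include <stdatomic.h>

/* CPU analogs of ARB_shader_atomic_counter_ops built-ins; each returns
 * the counter's value from before the operation, as the GLSL versions do. */

unsigned counter_add(atomic_uint *c, unsigned data)      /* ~ atomicCounterAdd */
{
    return atomic_fetch_add(c, data);
}

unsigned counter_subtract(atomic_uint *c, unsigned data) /* ~ atomicCounterSubtract */
{
    return atomic_fetch_sub(c, data);
}

unsigned counter_and(atomic_uint *c, unsigned data)      /* ~ atomicCounterAnd */
{
    return atomic_fetch_and(c, data);
}

unsigned counter_exchange(atomic_uint *c, unsigned data) /* ~ atomicCounterExchange */
{
    return atomic_exchange(c, data);
}

unsigned counter_min(atomic_uint *c, unsigned data)      /* ~ atomicCounterMin */
{
    unsigned old = atomic_load(c);
    while (data < old && !atomic_compare_exchange_weak(c, &old, data)) {
        /* on failure, old now holds the current value; re-check and retry */
    }
    return old;
}

/* ~ atomicCounterCompSwap: store data only if the counter equals compare. */
unsigned counter_comp_swap(atomic_uint *c, unsigned compare, unsigned data)
{
    unsigned expected = compare;
    atomic_compare_exchange_strong(c, &expected, data);
    return expected; /* original value, whether or not the swap happened */
}
```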
  • 51. 51 ARB_shader_draw_parameters • Adds two new GLSL built-in variables to get the base vertex and base instance • gl_BaseVertex • gl_BaseInstance • Useful for offsetting gl_VertexID or gl_InstanceID, respectively • Also, for glMultiDraw* commands, a new GLSL built-in variable • gl_DrawID • glMultiDrawArrays, glMultiDrawArraysIndirect, glMultiDrawArraysIndirectCount • glMultiDrawElements, glMultiDrawElementsIndirect, glMultiDrawElementsIndirectCount • Rationale: lets an app treat draw batches programmatically from within an über shader to better minimize state changes
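For the multi-draw case, here is a sketch of building the commands consumed by glMultiDrawElementsIndirect. The DrawElementsIndirectCommand layout follows the OpenGL specification; MeshRange and build_indirect_commands are hypothetical. Each command's baseInstance surfaces as gl_BaseInstance in the shader, and gl_DrawID gives the command's index, which an über shader can use to fetch per-draw data such as a material or transform.

```c
#include <stddef.h>
#include <stdint.h>

/* One command for glMultiDrawElementsIndirect (layout per the GL spec). */
typedef struct {
    uint32_t count;          /* number of indices */
    uint32_t instanceCount;  /* number of instances */
    uint32_t firstIndex;     /* offset into the index buffer, in indices */
    int32_t  baseVertex;     /* visible in GLSL as gl_BaseVertex */
    uint32_t baseInstance;   /* visible in GLSL as gl_BaseInstance */
} DrawElementsIndirectCommand;

/* Hypothetical record for a mesh packed into shared vertex/index buffers. */
typedef struct {
    uint32_t index_count;
    uint32_t first_index;
    int32_t  first_vertex;
    uint32_t instance_count;
} MeshRange;

/* Builds one command per mesh, numbering instances consecutively across the
 * batch so gl_BaseInstance + gl_InstanceID indexes a batch-wide instance array.
 * Returns the total instance count. */
uint32_t build_indirect_commands(const MeshRange *meshes, size_t mesh_count,
                                 DrawElementsIndirectCommand *out)
{
    uint32_t next_instance = 0;
    for (size_t i = 0; i < mesh_count; i++) {
        out[i].count         = meshes[i].index_count;
        out[i].instanceCount = meshes[i].instance_count;
        out[i].firstIndex    = meshes[i].first_index;
        out[i].baseVertex    = meshes[i].first_vertex;
        out[i].baseInstance  = next_instance;
        next_instance += meshes[i].instance_count;
    }
    return next_instance;
}

/* Upload `out` to the buffer bound to GL_DRAW_INDIRECT_BUFFER, then e.g.:
 *   glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, NULL,
 *                               (GLsizei)mesh_count, 0);
 */
```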
  • 52. 52 ARB_shader_group_vote • Provides new GLSL built-in functions to compute a composite of a set of boolean conditions across a group of shader invocations • Functions returning a boolean • bool anyInvocation(bool value): all threads return true if value is true for any thread, otherwise false • bool allInvocations(bool value): all threads return true if value is true for all threads, otherwise false • bool allInvocationsEqual(bool value): all threads return true if value is identical (equal) for all threads, otherwise false
  • 53. 53 ARB_shader_group_vote Rationale • Rationale • Implementation reality: GPUs run shader invocations using groups of threads • NVIDIA calls these groups “warps” • Threads run most efficiently when they share the same sequence of instructions • This is called “converged execution” (good), instead of diverged execution (bad) • Group votes can keep threads running converged • Consider this an advanced optimization to your shaders • SPMT (“single program, multiple thread”) execution means shaders run reasonably even when divergence is possible • Example use: it is common for all threads in a shader to need exactly four loop iterations • If all threads can agree they are in the “4 iterations” case, the shader could be written with an unrolled loop in expectation of this common case • Thereby avoiding the loop overhead of the general case
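A CPU model of the three vote functions over one group, plus the four-iteration example, may make the semantics concrete. It is illustrative only: the group is an array of booleans, the 64-invocation cap is an arbitrary assumption of the sketch, and the function names are hypothetical. In a real shader the example would read something like `if (allInvocations(iterations == 4)) { unrolled body } else { general loop }`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Every invocation in the group receives the same result from each vote. */
bool any_invocation(const bool *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (values[i]) return true;
    return false;
}

bool all_invocations(const bool *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!values[i]) return false;
    return true;
}

bool all_invocations_equal(const bool *values, size_t n)
{
    /* equal means all true or all false */
    return all_invocations(values, n) || !any_invocation(values, n);
}

/* The four-iteration example: the group takes the unrolled path only if every
 * invocation agrees, so the group never splits across the two paths. */
bool group_takes_unrolled_path(const int *iterations, size_t n)
{
    bool is_four[64];
    if (n > 64) return false; /* sketch limit on group size */
    for (size_t i = 0; i < n; i++)
        is_four[i] = (iterations[i] == 4);
    return all_invocations(is_four, n);
}
```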
  • 54. 54 ARB_gl_spirv • This extension announced at SIGGRAPH 2016 • But was optional • NVIDIA announced support last year • Much more useful to have core part of OpenGL 4.6 • And NOW it is!
  • 55. 55 GLSL Compilation prior to SPIR-V • Your OpenGL app passes GLSL source (shader.vert, shader.geom, shader.frag) to the OpenGL driver • Inside the driver, the GLSL compiler front-end feeds the GPU-specific compiler back-end, which generates code for the GPU
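  • 56. 56 ARB_gl_spirv Enabled Offline Compilation of GLSL to SPIR-V • GLSL sources (shader.vert, shader.geom, shader.frag) are compiled offline by glslangValidator or glslc into SPIR-V binaries (shader.vert.spv, shader.geom.spv, shader.frag.spv) • Your OpenGL app passes the SPIR-V binaries to the OpenGL driver • Inside the driver, a SPIR-V compiler front-end feeds the same GPU-specific compiler back-end used by the GLSL compiler front-end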
  • 57. 57 Tools to Manipulate SPIR-V • Open source SPIR-V tools available • glslang: glslangValidator • Provides a basic GLSL compiler that generates OpenGL-friendly SPIR-V • Use the -G option for ARB_gl_spirv SPIR-V • https://github.com/KhronosGroup/glslang • SPIRV-Tools: spirv-as, spirv-dis, spirv-stats, etc. • Utilities for assembling, disassembling, or otherwise manipulating SPIR-V binaries • https://github.com/KhronosGroup/SPIRV-Tools • glslc • Compiler front-end matching conventional gcc/clang command-line options • Use the --target-env=opengl_compat option • https://github.com/google/shaderc • Your choice: • Build from source • Get pre-compiled from LunarG Vulkan SDK
  • 58. 58 API Usage Differences: Compiling GLSL vs. SPIR-V • GLSL path: • glCreateProgram • While more shader domains: read GLSL text from file, glCreateShader, glShaderSource, glCompileShader, glAttachShader • glLinkProgram • While more uniforms to introspect: glGetUniformLocation • While more attributes to introspect: glGetAttribLocation • glUseProgram, glProgramUniform*
  • 59. 59 API Usage Differences: Compiling GLSL vs. SPIR-V • SPIR-V path: • glCreateProgram • While more shader domains: read SPIR-V binary blob from file, glCreateShader, glShaderBinary, glSpecializeShader, glAttachShader • glLinkProgram • While more uniforms to initialize: glProgramUniform* • glUseProgram • App assumes locations assigned within the shader, obviating dynamic introspection
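The "read SPIR-V binary blob from file" step might look like this sketch; load_spirv is a hypothetical helper. It checks the SPIR-V magic number 0x07230203 and assumes the module is stored in host byte order, which is typically the case for files glslangValidator or glslc write on a little-endian machine. The OpenGL calls that follow are shown as a comment so the fragment compiles without a GL loader.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SPIRV_MAGIC 0x07230203u

/* Hypothetical helper: reads a .spv file into a malloc'd word buffer.
 * Returns NULL on I/O failure, a truncated header (5 words), a size that is
 * not a whole number of words, or a missing magic number. */
uint32_t *load_spirv(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 20 || len % 4 != 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    uint32_t *words = malloc((size_t)len);
    if (words == NULL || fread(words, 1, (size_t)len, f) != (size_t)len
        || words[0] != SPIRV_MAGIC) {
        free(words);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *size_out = (size_t)len;
    return words;
}

/* With a current OpenGL 4.6 (or ARB_gl_spirv) context, roughly:
 *   GLuint sh = glCreateShader(GL_VERTEX_SHADER);
 *   glShaderBinary(1, &sh, GL_SHADER_BINARY_FORMAT_SPIR_V, words, (GLsizei)size);
 *   glSpecializeShader(sh, "main", 0, NULL, NULL);
 *   glGetShaderiv(sh, GL_COMPILE_STATUS, &ok);   (reports specialization result)
 *   glAttachShader(prog, sh); ... glLinkProgram(prog);
 */
```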
  • 60. 60 ARB_spirv_extensions • The original ARB_gl_spirv extension only added support for SPIR-V 1.0 concepts that were part of the OpenGL 4.5 Core Profile • Many OpenGL ARB and vendor extensions not in OpenGL 4.5 Core add shading language concepts • BUT being defined before OpenGL supported SPIR-V, they lack SPIR-V support for their additional features • Advertising an extension + its SPIR-V extension means the SPIR-V support for that extension is present • So ARB_spirv_extensions adds a mechanism to advertise a driver’s supported SPIR-V extensions: • GLint num_spirv_extensions; glGetIntegerv(GL_NUM_SPIR_V_EXTENSIONS, &num_spirv_extensions); • for (int ndx = 0; ndx < num_spirv_extensions; ndx++) { const GLubyte *spirv_extension_name = glGetStringi(GL_SPIR_V_EXTENSIONS, ndx); } • Also defines several SPIR-V extensions...
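A sketch of the check an app might perform before relying on an extension's SPIR-V form: both the OpenGL extension and its SPIR-V counterpart must be advertised. The helper names are hypothetical, and the name arrays stand in for strings gathered with glGetStringi(GL_EXTENSIONS, i) and glGetStringi(GL_SPIR_V_EXTENSIONS, i).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: linear search of a driver-reported name list. */
bool has_extension(const char *const *names, int count, const char *wanted)
{
    for (int i = 0; i < count; i++)
        if (names[i] != NULL && strcmp(names[i], wanted) == 0)
            return true;
    return false;
}

/* Hypothetical helper: a feature is usable from SPIR-V only when the GL
 * extension AND its SPIR-V extension are both advertised. */
bool spirv_feature_usable(const char *const *gl_exts, int gl_count, const char *gl_name,
                          const char *const *spv_exts, int spv_count, const char *spv_name)
{
    return has_extension(gl_exts, gl_count, gl_name)
        && has_extension(spv_exts, spv_count, spv_name);
}
```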
  • 61. 61 First Set of SPIR-V Extensions (SPIR-V extension name: corresponding OpenGL extension or functionality) • SPV_KHR_shader_ballot: ARB_shader_ballot • SPV_KHR_shader_draw_parameters: ARB_shader_draw_parameters • SPV_KHR_subgroup_vote: ARB_shader_group_vote • SPV_NV_stereo_view_rendering: NV_stereo_view_rendering • SPV_NV_viewport_array2: NV_viewport_array2 or ARB_shader_viewport_layer_array • SPV_NV_geometry_shader_passthrough: NV_geometry_shader_passthrough • SPV_NV_sample_mask_override_coverage: NV_sample_mask_override_coverage • SPV_AMD_shader_explicit_vertex_parameter: AMD_shader_explicit_vertex_parameter • SPV_AMD_gpu_shader_half_float: AMD_gpu_shader_half_float • SPV_KHR_shader_atomic_counter_ops: ARB_shader_atomic_counter_ops • SPV_KHR_post_depth_coverage: ARB_post_depth_coverage • SPV_KHR_storage_buffer_storage_class: Storage buffer support
  • 62. 62 ARB_texture_filter_anisotropic • Fully compatible with long-standing EXT_texture_filter_anisotropic • Simple to use: glTextureParameteri(texture_object, GL_TEXTURE_MAX_ANISOTROPY, 8);
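Since hardware limits differ, an app would typically clamp its requested anisotropy to the value of GL_MAX_TEXTURE_MAX_ANISOTROPY. A minimal sketch with a hypothetical helper name; values below 1.0 are invalid for GL_TEXTURE_MAX_ANISOTROPY, and 1.0 means no anisotropic filtering.

```c
/* Hypothetical helper: clamps a requested anisotropy to [1, device_max].
 * device_max would come from glGetFloatv(GL_MAX_TEXTURE_MAX_ANISOTROPY, &device_max). */
float clamp_anisotropy(float requested, float device_max)
{
    if (requested < 1.0f) return 1.0f;   /* 1.0 = isotropic filtering */
    if (requested > device_max) return device_max;
    return requested;
}

/* Then, with a current context:
 *   glTextureParameterf(texture_object, GL_TEXTURE_MAX_ANISOTROPY,
 *                       clamp_anisotropy(8.0f, device_max));
 */
```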
  • 64. 64 ARB_transform_feedback_overflow_query • Adds new query types that can be used to detect overflow of transform feedback buffers • GL_TRANSFORM_FEEDBACK_OVERFLOW if any stream overflows • GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW if a particular indexed vertex stream overflows • These two NEW query types are also allowed for glBeginConditionalRender for conditional rendering • Allows the graphics pipeline to condition rendering on whether prior vertex stream operations overflowed • Comparable to Direct3D 11’s D3D11_QUERY_SO_OVERFLOW_PREDICATE* stream-out functionality
  • 65. 65 Why OpenGL Core Updates are Important (1) • Not just opportunity for new functionality • A new specification is released that reconciles all the bundled extensions into a coherent single document • Also gives the OpenGL Working Group a chance to better structure OpenGL’s specification • Opportunity to fix typos, improve consistency of terminology, clarify ambiguities, document expected error behavior • Almost two dozen different minor tweaks in 4.6, largely consequential to developers • Future extensions can then be written against a cleanly resolved 4.6 specification • Otherwise, extensions can overlap how they amend the core specification and lead to confusion • Ensures new functionality is covered by the Khronos Intellectual Property (IP) Framework • This allows OpenGL implementers, developers, and end-users to confidently depend on the functionality described • Specifically for 4.6, resolves Intellectual Property concerns surrounding both anisotropic texture filtering and polygon offset clamping • Khronos maintains OpenGL, ES, and Vulkan in the same “IP zone,” so ratifying a Khronos standard resolves issues for related standards Coherent Specification Resolving IP Concerns
  • 66. 66 Why OpenGL Core Updates are Important (2) • Not just opportunity for new functionality • Opportunity for a new Conformance Test Suite to be released • New tests obviously cover NEW functionality • But also include contributed tests for existing functionality • Without a new core specification, it is hard to enforce stronger conformance testing • Vendors would simply continue certifying with an older, weaker conformance test version • A new core version is a new opportunity to raise the shared quality bar for OpenGL • Developers adopt OpenGL features at different levels of comfort • Many developers are happy to use the latest, greatest features as soon as extensions are shipped in drivers • Other developers, often those with long-term support horizons, look for core updates to signal mature standards now ready to be adopted • Example: A graphics researcher and a medical device maker can both use OpenGL, but embrace the features provided at varying rates and at different milestones Conformance Testing QualitySheriff Developer Comfort Levels
  • 67. 67 Why OpenGL Core Updates are Important (3) • Not just opportunity for new functionality • OpenGL Shading Language (GLSL) gets accompanying revision • So OpenGL 4.6 brings with it an updated GLSL • Like the core API specification, the GLSL specification needs reconciliation of new extensions, typos fixed, clarifications, etc. • As many Vulkan applications express shaders in GLSL and compile them with glslang to generate the SPIR-V that Vulkan expects, updating GLSL helps advance Vulkan • OpenGL core revisions are as much about consolidating OpenGL’s associated ecosystem support as simply adding NEW features to OpenGL Advancing the Ecosystem
  • 68. 68 OpenGL 4.6’s Resolving of IP Issues & New Open Sourcing of OpenGL Conformance Suite Benefits Open Source OpenGL Implementation • Khronos is using Vulkan’s conformance approach for OpenGL now • See https://github.com/KhronosGroup/VK-GL-CTS • Should help Mesa keep closer to latest official standard, better for OpenGL overall • "OpenGL 4.6 will be the first OpenGL release where conformant open source implementations based on the Mesa project will be deliverable in a reasonable timeframe after release. The open sourcing of the OpenGL conformance test suite and ongoing work between Khronos and X.org will also allow for non-vendor led open source implementations to achieve conformance in the near future." • David Airlie, senior principal engineer at Red Hat, and developer on Mesa / X.org projects • Source: Khronos OpenGL 4.6 press release
  • 69. 69 Credit for OpenGL 4.6 • Khronos relies on its member companies to complete new OpenGL core updates • Different companies drove different features, all free to comment and contribute • Representatives of these companies drove the constituent features of OpenGL 4.6 See Appendix J of OpenGL 4.6 for comprehensive list of contributor companies and individuals
  • 70. 70 GPU “Interop” Usage •Increasingly applications want to share GPU resources and mix APIs • Typically sophisticated applications •APIs involved might be • Graphics (OpenGL, Vulkan, Direct3D) • Compute (OpenCL, CUDA) • Video encode and decode (VDPAU, NVENC, NVDEC, Windows Media) •Multiple motivations for cross-process GPU resource sharing • Performance (don’t read back to CPU), latency control (VR compositing) • Robustness (isolation) • Security, including protecting digital media assets •Interop = jargon for two things • Sharing GPU resources among different APIs • Sharing GPU resources across process boundaries • For example, a display compositor
  • 71. 71 Past Interop Extensions for OpenGL •Past interoperability extensions would pair OpenGL concepts with those of one particular other GPU API • Often exposed as proprietary extensions • Typically tied to platform concepts (e.g. Win32 HANDLEs) • Simple when API concepts match (e.g. OpenGL textures to Direct3D surfaces) •Examples • NV_DX_interop mixed OpenGL and Direct3D 9 • NV_DX_interop2 mixes OpenGL and Direct3D 10 & 11 • NV_vdpau_interop mixes OpenGL and Linux VDPAU video input/output surfaces • Additionally, CUDA & OpenCL have interop to OpenGL •Worked well as designed BUT...
  • 72. 72 Responding to New Interop Requirements • Addressing criticism of prior interop extensions... • In many cases, single-vendor and proprietary extensions • Can we strive for something multi-vendor? • Overcoming NEW managed vs. explicit GPU API philosophy mismatches • Older GPU APIs (e.g. OpenGL, Direct3D 9, 10, 11) manage GPU resources and their underlying memory as one • Older APIs have textures, buffers, and synchronization objects • Newer GPU APIs (e.g. Vulkan, Direct3D 12) use lower-level mechanisms to manage resources • Newer explicit APIs have explicit memory allocations and semaphores • Noticeable lack of common interop infrastructure • Can there be some common framework for interop? • Isolate platform-specific methods to “import” objects into platform-specific extensions • Windows uses HANDLEs, etc. • POSIX operating systems use file descriptors
  • 73. 73 EXT_memory_object & EXT_semaphore • Vulkan introduces explicit memory and synchronization objects • EXT_memory_object imports Vulkan explicit memory objects into OpenGL • EXT_semaphore imports Vulkan semaphore objects into OpenGL • Because of this, extra interop mechanisms are needed to share GPU objects • Platform-specific extensions specify how to import memory objects & semaphores • For POSIX systems (e.g. Linux), use EXT_memory_object_fd & EXT_semaphore_fd • fd = POSIX file descriptor • For Windows, use EXT_memory_object_win32 & EXT_semaphore_win32 • Uses either Win32’s opaque HANDLE type or KMT share handle • KMT = Kernel-Mode Thunk interface for Windows Display Driver Model (WDDM) • Also for interoperability with Direct3D 11 & 12 • Also EXT_win32_keyed_mutex provides access to the keyed synchronization primitive of Direct3D image objects
  • 74. 74 EXT_semaphore • Introduces a new object matching Vulkan-style semaphores • Basic operations on semaphores • Object management • glGenSemaphoresEXT generates semaphore object names • glDeleteSemaphoresEXT deletes semaphore objects • Parameter setting & querying • glSemaphoreParameterui64vEXT & glGetSemaphoreParameterui64vEXT • Semaphore parameters are platform dependent (e.g. GL_D3D12_FENCE_VALUE_EXT) • Semaphore operations • glSignalSemaphoreEXT signals a semaphore • glWaitSemaphoreEXT waits until something signals the semaphore
  • 75. 75 EXT_memory_object • Introduces new memory object corresponding to Vulkan concept • Import memory objects with platform-specific API • Then “carve out” managed OpenGL textures and buffers from a memory object • Commands to make textures: glTexStorageMem1DEXT, glTexStorageMem2DEXT, glTexStorageMem3DEXT, glTexStorageMem2DMultisampleEXT, glTexStorageMem3DMultisampleEXT • Also Direct State Access (DSA) versions: glTextureStorageMem2DEXT, etc. • Commands to carve out a buffer: glBufferStorageMemEXT, glNamedBufferStorageMemEXT
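With Vulkan interop, both APIs must agree on where each resource lives inside the shared allocation. This sketch computes those offsets, assuming sizes and power-of-two alignments come from the Vulkan side (e.g. vkGetBufferMemoryRequirements or vkGetImageMemoryRequirements). SubRange and place_resources are hypothetical, and the OpenGL import and carve-out calls for the POSIX path (EXT_memory_object_fd) appear only as a comment.

```c
#include <stdint.h>

/* Rounds offset up to a power-of-two alignment. */
static uint64_t align_up(uint64_t offset, uint64_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical sub-allocation record; alignment must be a power of two. */
typedef struct {
    uint64_t size;
    uint64_t alignment;
    uint64_t offset;    /* filled in by place_resources */
} SubRange;

/* Places resources back to back inside one memory object.
 * Returns the total bytes used, or 0 if they do not fit in memory_size. */
uint64_t place_resources(SubRange *ranges, int count, uint64_t memory_size)
{
    uint64_t cursor = 0;
    for (int i = 0; i < count; i++) {
        cursor = align_up(cursor, ranges[i].alignment);
        ranges[i].offset = cursor;
        cursor += ranges[i].size;
    }
    return cursor <= memory_size ? cursor : 0;
}

/* On the OpenGL side (EXT_memory_object + EXT_memory_object_fd), roughly:
 *   GLuint mem;
 *   glCreateMemoryObjectsEXT(1, &mem);
 *   glImportMemoryFdEXT(mem, memory_size, GL_HANDLE_TYPE_OPAQUE_FD_EXT, fd);
 *   glBufferStorageMemEXT(GL_ARRAY_BUFFER, ranges[0].size, mem, ranges[0].offset);
 *   glTexStorageMem2DEXT(GL_TEXTURE_2D, 1, GL_RGBA8, w, h, mem, ranges[1].offset);
 * The Vulkan side must bind its buffer and image at the same offsets.
 */
```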
  • 76. 76 OpenGL ES Parity • Mobile developers often target OpenGL ES • Apple’s iOS and Google’s Android made ES the de facto standard graphics API for mobile • Moore’s Law has eliminated the need for ES on NVIDIA products • ES 2.0/3.x is supported along with full OpenGL 4.x feature set • Essentially an ES context “hides” the complete OpenGL 4.x feature set • Good for compatibility and portability to other vendors’ less functional GPUs • Unfortunately ES has been slow to adopt important GPU features • NVIDIA makes sure developers relying on ES contexts don’t have to forgo missing features • NVIDIA works to coordinate multi-vendor EXT extensions to ES • NVIDIA supports fully conformant ES contexts (+ extensions) even on Windows and Linux • NVIDIA’s OpenGL in 2017 adds many ES parity extensions...
  • 77. 77 Oh, 3D developer—you flatter me noticing my complete & mature feature set With ES parity, what does she have that I don’t? OpenGL 4.6 Context ES 3.2 Context
  • 78. 78 ES Parity Extensions for 2017 (extension name: functionality) • EXT_clear_texture: Clear texture images & sub-images • EXT_conservative_depth: Bound direction of fragment shader depth output • EXT_shader_group_vote: Collective decision making in shaders • EXT_texture_compression_bptc: Compressed texture formats corresponding to Direct3D’s BC7 (8-bit RGB & RGBA) and BC6H (for HDR) formats • EXT_texture_compression_rgtc: One- and two-component texture compression • EXT_texture_sRGB_R8: Single-component (red) sRGB color-space component encoding • EXT_draw_transform_feedback: Adds the transform feedback draw API missing from ES, intended for geometry shaders’ variable number of output vertices • EXT_clip_cull_distance: Clipping and culling planes • OES_viewport_array: Viewport index support for geometry shaders • KHR_parallel_shader_compile: Request multi-threaded GLSL shader compilation
  • 79. 79 Still ES Lacks Much, NVIDIA Provides What’s Missing •The 2017 multi-vendor parity extensions highlight what’s missing from standard ES 3.2 •Additional major items missing from standard ES 3.2 • Texture views with OES_texture_view missed ES 3.2 inclusion • GPU-accelerated path rendering with NV_path_rendering for ES •BUT NVIDIA’s OpenGL ES context provides these •If ES still isn’t enough, just use an OpenGL 4.6 context • For example, Direct State Access is not in ES contexts
  • 80. 80 NVIDIA’s ES Parity Philosophy • The idea of “ES Parity” is NOT to turn an ES context into an OpenGL 4.x context • The idea is to expose • Features NVIDIA’s ES developer base has requested • Features that we judge other ES vendors could reasonably support • When Khronos ES vendors broadly agree, we work towards an OES extension – Example: OES_viewport_array • When just a subset of Khronos ES vendors agree, we work for a multi-vendor EXT extension – Example: EXT_clip_cull_distance • As a last resort, when other ES vendors don’t share our interest, we go with NV • Need a feature missing from ES? Speak up • NVIDIA does not expose extensions broadly inconsistent with ES’s philosophy • For example, fixed-function, immediate-mode, and display lists aren’t candidates for ES parity • Developers desiring such functionality are better off with OpenGL 4.x contexts
  • 81. 81 NVIDIA ES Parity Enhancements Result of NVIDIA’s ES Parity Efforts Full OpenGL ES 3.2 ES 3.1 ES 3.0 ES 2.0 Industry’s most functional and full-featured ES driver OSes and Architectures Android, Linux, Windows, FreeBSD; x86, ARM, IBM PowerPC
  • 82. 82 Perspective of ES Parity from an OpenGL 4.6 Context NVIDIA ES Parity Enhancements Full OpenGL ES 3.2 ES 3.1 ES 3.0 ES 2.0 NVIDIA OpenGL 4.6 with maximally functional extensions Same driver provides ES and 4.6 contexts Only difference between ES and 4.6 context is ES context disables non-ES usage
  • 83. 83 Miscellaneous NEW Extensions for 2017 • NV_blend_minmax_factor, based on AMD_blend_minmax_factor • EXT_protected_textures (Tegra & ES only) • Used with EGL’s EGL_EXT_protected_content Miscellaneous 2017
  • 84. 84 NV_blend_minmax_factor: Modulated Min/Max Blending • Original GL_MIN and GL_MAX blend equations are limited • Both ignore the source and destination blend factors from glBlendFunc • Limitation of original SGI hardware • Conventional min/max blend equations • blendResult = min(sourceColor, destinationColor) • blendResult = max(sourceColor, destinationColor) • AMD_blend_minmax_factor extension generalizes with two new blend equations • GL_FACTOR_MIN_AMD: blendResult = min(sourceColor × sourceFactor, destinationColor × destinationFactor) • GL_FACTOR_MAX_AMD: blendResult = max(sourceColor × sourceFactor, destinationColor × destinationFactor) • NV_blend_minmax_factor provides the same capability • Just with a few restrictions, matching the restrictions on advanced blend equations • Not for use with dual-source blending • Not for mismatched multiple draw buffers • Single-precision floating-point blending done in half-precision • Otherwise compatible with AMD extension (uses same token values)
  • 85. 85 NV_blend_minmax_factor Example • Blend equation • blendResult = max(sourceColor, destinationColor × (1−sourceAlpha)) • Code to configure • glEnable(GL_BLEND); • glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA); • glBlendEquation(GL_FACTOR_MAX_AMD); • Extension supported on Maxwell and later GPU generations Unconventional blending Source: Visual Music Systems
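A per-channel CPU model of the factor min/max equations can help check expected results by hand. It is illustrative only: the Rgba type and function names are hypothetical, and the equation is a parameter because min and max differ only in the final comparison. With glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA), the source factor is 1 and the destination factor is 1 - sourceAlpha on every channel.

```c
typedef struct { float r, g, b, a; } Rgba;
typedef enum { FACTOR_MIN, FACTOR_MAX } FactorEquation;

/* min or max of the two weighted terms */
static float combine(FactorEquation eq, float s, float d)
{
    if (eq == FACTOR_MIN) return s < d ? s : d;
    return s > d ? s : d;
}

/* result = min/max(src * sf, dst * df), per channel */
Rgba blend_factor_minmax(FactorEquation eq, Rgba src, Rgba sf, Rgba dst, Rgba df)
{
    Rgba out = {
        combine(eq, src.r * sf.r, dst.r * df.r),
        combine(eq, src.g * sf.g, dst.g * df.g),
        combine(eq, src.b * sf.b, dst.b * df.b),
        combine(eq, src.a * sf.a, dst.a * df.a),
    };
    return out;
}

/* The example configuration: sf = 1, df = 1 - src.a on every channel. */
Rgba blend_slide_example(FactorEquation eq, Rgba src, Rgba dst)
{
    float k = 1.0f - src.a;
    Rgba one = {1.0f, 1.0f, 1.0f, 1.0f};
    Rgba inv = {k, k, k, k};
    return blend_factor_minmax(eq, src, one, dst, inv);
}
```

For example, a source of (0.25, 0.5, 0.0, 0.5) over a destination of (1.0, 0.5, 0.75, 1.0) with the max equation yields (0.5, 0.5, 0.375, 0.5).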
  • 86. 86 EGL_EXT_protected_content & EXT_protected_textures (1) •Together provide OpenGL protected access control to GPU images • Intended for managing trust in display compositors and apps • Designed for Android •GL_TEXTURE_PROTECTED_EXT texture parameter • Applies to OpenGL texture objects • And hence applies to framebuffer objects containing texture objects • Boolean, defaults to false (unprotected) unless explicitly specified true •EGL_PROTECTED_CONTENT_EXT attribute • Applies to EGL surfaces and EGLImages • Boolean, defaults to false (unprotected) unless explicitly specified true •Texture objects, EGL surfaces, and EGLImages all “resources” subject to protection
  • 87. 87 EGL_EXT_protected_content & EXT_protected_textures (2) • Pipeline stages of OpenGL contexts can also be designated protected and unprotected • Scenario: • display compositor uses a protected context • while apps would use unprotected contexts • Technically different GPU stages can be protected vs. non-protected • General access rules • Protected pipeline stages • Can read any EGL surfaces and images, protected or otherwise • BUT may NOT write non-protected EGL surfaces and images • Non-protected contexts/stages • Can read & write non-protected • BUT may NOT read or write protected content • Expectation: GPU & operating system together enforce resource protection via protected virtual memory mappings
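The general access rules above fit in a small truth table. This sketch is a model only, since real enforcement happens in the GPU and operating system; the function names are hypothetical, and it assumes protected stages may write protected resources, which the rules imply but do not state outright.

```c
#include <stdbool.h>

/* Protected stages may read anything; unprotected stages may not read
 * protected content. */
bool stage_may_read(bool stage_protected, bool resource_protected)
{
    return stage_protected || !resource_protected;
}

/* Protected stages may not write unprotected resources (no leaking out),
 * and unprotected stages may not write protected resources. */
bool stage_may_write(bool stage_protected, bool resource_protected)
{
    return stage_protected == resource_protected;
}
```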
  • 88. 88 EGL_EXT_protected_content Scenarios • Android 7.0’s secure texture video playback • Allows secure GPU post-processing of protected image content • Supports secure Digital Rights Management (DRM) Source: Google
  • 89. 89 Implemented NVIDIA OpenGL Extensions by Approximate Initial Proposal Year • [Chart: number of OpenGL extensions proposed per year, 1997 to 2016] • Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions, difficult to say exactly when an extension was proposed, product release lags extension proposal
  • 90. 90 Implemented NVIDIA OpenGL Extensions by Approximate Initial Proposal Year • [Chart: number of OpenGL extensions proposed per year, 1997 to 2016, annotated with OpenGL core version updates 1.1 through 4.6] • Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions, difficult to say exactly when an extension was proposed, product release lags extension proposal
  • 91. 91 Implemented NVIDIA OpenGL Extensions by Approximate Initial Proposal Year • [Chart: number of OpenGL extensions proposed per year, 1997 to 2016, annotated with eras: TNT + GeForce; run-up to DirectX 8, 9, 10, 11, and 12] • Despite caveats, shows how OpenGL functionality ties to rhythm of GPU architecture & API updates • Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions, difficult to say exactly when an extension was proposed, product release lags extension proposal
  • 92. 92 Implemented NVIDIA OpenGL Extensions by Approximate Initial Proposal Year • [Chart: number of OpenGL extensions proposed per year, 1997 to 2016, annotated with GPU development eras: GeForce 1,2; GeForce 3,4; GeForce 5,6,7; Tesla development (GeForce 8, 9, 100, 200, 300); Fermi development (~GeForce 400); Kepler development (~GeForce 600); Maxwell development (~GeForce 700-900); Pascal development (~GeForce 10)] • Despite caveats, shows how OpenGL functionality ties to rhythm of GPU architecture & API updates • Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions, difficult to say exactly when an extension was proposed, product release lags extension proposal
  • 93. 93 Cumulative Implemented NVIDIA OpenGL Extensions Over 20 Years • [Chart: cumulative implemented OpenGL extensions proposed, 1997 to 2016, y-axis 0 to 500] • Same data as prior graphs, just integrated over time
  • 94. 94 NVIDIA OpenGL Shader Caching • NVIDIA OpenGL driver saves GLSL shaders it compiles • Cached compiled shaders saved to your local disk • Next time you compile the same shader, driver loads the post-compiled cached copy • Saves compilation time! • Invalidated on new driver installation • Games can “warm” cache on installation or first play to speed game loading • Available both on Windows and Linux! • Windows location • %LOCALAPPDATA%\NVIDIA\GLCache • For older drivers, used %APPDATA%\NVIDIA\GLCache • Linux location • $HOME/.nv/GLCache • Or $XDG_CACHE_HOME/.nv/ if XDG_CACHE_HOME environment variable set • Following the convention set by the XDG Base Directory Specification • Locations subject to change with future drivers and new conventions
  • 95. 95 Linux Graphics Open Source Efforts from NVIDIA • NVIDIA works to improve graphics support for entire Linux ecosystem • Examples • GL Vendor-Neutral Dispatch (GLVND) • arbitrates vendor-neutral access to OpenGL and EGL/GLX APIs • Wayland support for EGL Streams • Video Decode and Presentation API for Unix (VDPAU) • complete solution for decoding, post-processing, compositing, and displaying compressed or uncompressed video streams • All open source projects
  • 96. 96 GLVND: GL Vendor-Neutral Dispatch library • libglvnd • Arbitrates OpenGL API calls between multiple vendors • Allows multiple drivers from different vendors to coexist on the same file system • Determines which vendor to dispatch each API call to at runtime • Both GLX and EGL are supported • Any combination with OpenGL and OpenGL ES (1.1, 2.0, 3.x) • NVIDIA open source contribution • https://github.com/NVIDIA/libglvnd
  • 97. 97 Before GLVND NVIDIA Proprietary Linux Driver Mesa + Nouveau I control OpenGL best on NVIDIA GPUs But I got here first! Drivers driving you crazy! I just want my Linux window system to start! pre-GLVND user
  • 99. 99 NVIDIA’s Support for Wayland • Wayland • Intended as a simpler replacement for the X Window System • A protocol for a compositor to talk to its clients • Plus the C library implementation of that protocol • Depends on a compositor (e.g. Weston) that is the display server • Supports varying window managers (e.g. Mutter for GNOME) • Wayland is supported on NVIDIA GPUs through EGL Streams • With the performance & quality of NVIDIA’s proprietary OpenGL driver • Both Weston and Mutter (used by gnome-shell) currently have EGL Stream support • Although not by default • See https://github.com/NVIDIA/egl-wayland • NVIDIA open source project
  • 100. 100 Latest VDPAU Support • Video Decode and Presentation API for Unix (VDPAU) • Latest NVIDIA GPUs (GeForce 1080, etc.) • Supports VDPAU Feature Set H • Hardware-accelerated decoding of 8192x8192 (8k) H.265/HEVC video streams • Full support of HEVC Main12 profile
  • 101. 101 NVIDIA Codec SDK 8.0 • Two hardware acceleration interfaces: • NVENCODE API for video encode acceleration • NVDECODE API for video decode acceleration • Integration already available in FFmpeg/libav • New in 8.0 • 10/12-bit decoding support with HEVC/VP9, enabling end-to-end HDR transcoding • Improved quality via weighted prediction • Support for OpenGL inputs (Linux only) • Download for registered developers: https://developer.nvidia.com/designworks/video_codec_sdk/downloads/v8.0 • Info: https://developer.nvidia.com/nvidia-video-codec-sdk
  • 102. 102 Supported Video Encoding Formats by GPU Generation • * Except GM108 • ** Except GP100 (limited to 4K resolution) • 8k encoding for latest GPUs!
  • 103. 103 GPU Encoding: Awesome Performance & Comparable Quality • Bigger is faster for NVENC • Comparable peak signal-to-noise ratio indicates comparable quality
  • 104. 104 Supported Video Decode Formats by GPU Generation • * Except GM108 • ** Max resolution support is limited to selected Pascal chips • *** VP8 decode support is limited to selected Pascal chips • **** VP9 10/12 bit decode support is limited to select Pascal chips • 8k decoding for latest GPUs!
  • 105. 105 NVDEC to OpenGL to NVENC • NVDEC decodes into an OpenGL texture object • OpenGL processes texture objects by rendering to framebuffer objects • NVENC encodes from an OpenGL texture object • Linux only for GL to NVENC • For Windows, use OpenGL interop into Direct3D surfaces to encode from Direct3D surfaces
  • 106. 106 Proven GPU Codec Technology •Same underlying technology powers these services • Play your PC games on your PC, encode to the cloud • Play your PC game on your PC, decode & play on your SHIELD TV
  • 107. 107 GLEW Support Available NOW • GLEW = The OpenGL Extension Wrangler Library • Open source library • Pre-built distribution: http://glew.sourceforge.net/ • Source code: https://github.com/nigels-com/glew • Your one-stop shop for API support for all OpenGL extension APIs • Newly released GLEW 2.1 (July 31, 2017) provides API support for • OpenGL 4.6 • Multi-vendor EXT interoperability extensions • All of NVIDIA’s Maxwell & Pascal extensions • All other NVIDIA multi-GPU generation initiatives • Examples: NV_path_rendering, NV_command_list, NV_gpu_multicast • Thanks to Nigel Stewart, GLEW maintainer, for this
  • 108. 108 NVIDIA OpenGL in 2017 Provides OpenGL’s Maximally Available Superset OpenGL 4.6 Pascal Extensions 2015 ARB extensions OpenGL 4.5 Core Maxwell Extensions Legacy EXT & Other Compatibility Extensions OpenGL Complete Compatibility Path Rendering Multi-GPU. SLI Approaching Zero Driver Overhead NVIDIA Multi-generation GPU Initiatives DirectX inter-op Vulkan inter-op ES Enhancements Full OpenGL ES 3.2 Khronos Standard Expected Compatibility NVIDIA Initiatives GPU Generation Features
  • 109. 109 Last Words •Khronos announces OpenGL 4.6 today! Best OpenGL yet •Highlights of NVIDIA’s OpenGL support in 2017 • NVIDIA has OpenGL 4.6 today, developer preview driver available NOW • SPIR-V support standard part of OpenGL now • Multi-vendor EXT interoperability extensions NEW this year • “ES Parity” effort for 2017 • Miscellaneous extensions: protected content, min/max factor blending • Open source graphics contributions from NVIDIA • GLVND, VDPAU for video processing, and Wayland EGL Streams support • GPU-accelerated Encode & Decode
  • 110. 110 SIGGRAPH Paper Using OpenGL to Check Out • How to make shaders modular without giving up performance • Open source on github • Accompanied by OpenGL and Vulkan demo • Wednesday, 2 August • Los Angeles Convention Center, Room 150/151 • 10:45 am - 12:35 pm