SlideShare a Scribd company logo
Advanced Scenegraph Rendering Pipeline
Markus Tavenrath – NVIDIA – matavenrath@nvidia.com
Christoph Kubisch – NVIDIA – ckubisch@nvidia.com
2
 Traditional approach is render while
traversing a SceneGraph
 Scene complexity increases
– Deep hierarchies, traversal expensive
– Large objects split up into a lot of
little pieces, increased draw call count
– Unsorted rendering, lot of state
changes
 CPU becomes bottleneck when
rendering those scenes
SceneGraph Rendering
models courtesy of PTC
Introduction SceneGraph SceneTree ShapeList Renderer
3
Overview
Introduction SceneGraph SceneTree ShapeList Renderer
G0
T0 T1
T2
S1 S2
G1
T3
S0
G0
T0 T1
T2
S1 S2
G1
T3
S0
T2‘
S1‘ S2‘
G1‘
T3‘
ShapeList
S1 S2S0 S1‘ S2‘
RendererCache
S1 S2S0 S1‘ S2‘
SceneGraph SceneTree ShapeList
Renderer
Gi Group Ti Transform Si Shape
4
SceneGraph
G0
T0 T1
T2
S1 S2
G1
T3
S0
 SceneGraph is DAG
 No unique path to a node
– Cannot efficiently cache path-dependent data per node
 Traversal runs over 14 nodes for rendering.
 Processed 6 Transform Nodes
– 6 matrix/matrix multiplications and inversions
 Nodes are usually ‚large‘ and not linear in memory
– Each node access generates at least one, most
likely cache misses
Gi Group Ti Transform Si Shape
Introduction SceneGraph SceneTree ShapeList Renderer
5
SceneTree construction
G0
T0 T1
T2
S1 S2
G1
T3
S0
Gi Group Ti Transform Si Shape
Introduction SceneGraph SceneTree ShapeList Renderer
G0
T0 T1
T2
S1 S2
G1
T3
S0
T2‘
S1‘ S2‘
G1‘
T3‘
G0 -> (G0)
T0 -> (T0)
T1 -> (T1)
S0 -> (S0)
G1 -> (G1)
T2 -> (T2)
S1 -> (S1)
T3 -> (T3)
S2 -> (S2)
G0 -> (G0)
T0 -> (T0)
T1 -> (T1)
S0 -> (S0)
G1 -> (G1,G1‘)
T2 -> (T2,T2‘)
S1 -> (S1,S1‘)
T3 -> (T3,T3‘)
S2 -> (S2,S2‘)
 Observer based
synchronization
6
 SceneTree has unique path to each
node
 Store accumulated attributes like
transforms or visibility in each Node
 Trade memory for performance
– 64-byte per node, 100k nodes ~6MB
– Transforms stored separate vector
 Traversal still processes 14 nodes.
Gi Group Ti Transform Si Shape
G0
T0 T1
T2
S1 S2
G1
T3
S0
T2‘
S1‘ S2‘
G1‘
T3‘
SceneTree
Introduction SceneGraph SceneTree ShapeList Renderer
7
SceneTree invalidate attributes cache
 Keep dirty flags per node
 Keep dirty vector per flag
 SceneGraph change notifications
invalidated nodes
– If not dirty, mark dirty and add to
dirty vector
– O(1) operation, no sorting required
upon changes
 Before rendering a frame process
dirty vectorDirty vector
T1T3 T3‘
G0
T0 T1
T2
S1 S2
G1
T3
S0
T2‘
S1‘ S2‘
G1‘
T3‘
Introduction SceneGraph SceneTree ShapeList Renderer
8
T3‘
T1
T2‘T3
SceneTree validate attribute cache
 Walk through dirty vector
— Node marked dirty -> search top dirty
— Validate subtree from top dirty
 Validation example
— T3 dirty, traverse up to root node
 T3 top dirty node, validate T3 subtree
— T3‘ dirty, traverse up to root node
 T1 top dirty node, validate T1 subtree
— T1 not dirty
 No work to do
Dirty vector
T1T3 T3‘
G0
T0 T1
T2
S1 S2
G1
T3
S0
T3‘
S1‘ S2‘
G1‘
T3 T1T3‘
Introduction SceneGraph SceneTree ShapeList Renderer
9
SceneTree to ShapeList
 Add Events for ShapeList generation
– addShape(Shape)
– removeShape(Shape)
ShapeList
S1 S2S0 S1‘ S2‘
Gi Group Ti Transform Si Shape
G0
T0 T1
T2
S1 S2
G1
T3
S0
T2‘
S1‘ S2‘
G1‘
T3‘
Introduction SceneGraph SceneTree ShapeList Renderer
10
 SceneGraph to SceneTree synchronization
– Store accumulated data per node instance
 SceneTree to ShapeList synchronization
– Avoid SceneTree traversal
 Next: Efficient data structure for renderer based on
ShapeList
Summary
Introduction SceneGraph SceneTree ShapeList Renderer
11
Renderer Data Structures
Program
Shader
Vertex
Shader
Fragment
ParameterDescription
Camera
ParameterDescription
Light
ParameterDescription
Matrices
ParameterDescription
Material
ambient
diffuse
specular
texture
Name Type Arraysize
vec3
vec3
vec3
Sampler
2
2
2
0
ParamaterDescriptionShape
Program
‚colored‘
Geometry
Shape1
ParameterData
Camera1
ParameterData
Lightset 1
ParameterData
Transform 1
ParameterData
red
Introduction SceneGraph SceneTree ShapeList Renderer
12
Example Parameter Grouping
Shader independent globals, i.e. camera
Object parameters, i.e. position/rotation/scaling
Material handles, i.e. textures and buffers
Material raw values, i.e. float, int and bool
Light, i.e. light sources and shadow maps
Shader dependent globals, i.e. environment map
always
frequent
constant
rare
Introduction SceneGraph SceneTree ShapeList Renderer
Parameters Frequency
13
Shapes
coloredGroup
by
Program
shapelist
Rendering structures
‚colored‘ ‚textured‘ ‚colored‘ ‚colored‘ ‚textured‘
‚colored‘ ‚colored‘‚colored‘
Shapes
textured
‚textured‘ ‚textured‘
Introduction SceneGraph SceneTree ShapeList Renderer
Sort by ParameterData
14
ParameterData Cache
 Cache is a big char[] with all ParameterData.
 ParameterData are sorted by first usage.
 Parameters are converted to Target-API datatype, i.e.
— Int8 to int32, TextureHandle to bindless texture...
 Updating parameters is only playback of data in memory,
no conditionals.
 Filter for used parameters to reduce cache size
Parameters
colored
red blue
Parameters
textured
wood marble
Introduction SceneGraph SceneTree ShapeList Renderer
15
Vertex Attribute Cache
 Big char[] with vertex attribute pointers
— Bindless pointers, VBOs or VAB streams
 Each set of attributes stored only once
 Ordered by first usage
 Attributes required by program are known
— Store only used attributes in Cache
— Useful for special passes like depth pass where only pos is
required
attributes
colored
pos
normal
pos
normal
pos
normal
Introduction SceneGraph SceneTree ShapeList Renderer
16
Renderer Cache complete
Parameters
colored
Phong red Phong blue
Shapes
colored
‚colored‘ ‚colored‘‚colored‘
Attributes
colored
pos
normal
pos
normal
pos
normal
Introduction SceneGraph SceneTree ShapeList Renderer
foreach(shape) {
if (visible(shape)) {
if (changed(parameters)) render(parameters);
if (changed(attributes)) render(attributes);
render(shape);
}
}
17
 CPU boundedness improved (application)
– Recomputation of attributes (transforms)
– Deep hierarchies: traversal expensive
– Unsorted rendering, lot of state changes
 CPU boundedness remaining (OpenGL usage)
– Large objects split up into a lot of little pieces,
increased draw call count
Achievements
ShapeList
Renderer
SceneTree
RendererCache
OpenGL
implementation
Renderer
18
 Avoid data redundancy
– Data stored once, referenced multiple times
– Update only once (less host to gpu transfers)
 Increase batching potential
– Further cuts api calls
– Less driver CPU work
 Minimize CPU/GPU interaction
– Allow GPU to update its own data
– Lower api usage when scene is changed little
– E.g. GPU-based culling
Enabling Hardware Scalability
19
 Avoids classic SceneGraph design
 Geometry
– Vertex/IndexBuffer
– BoundingBox
– Divided into parts (CAD features)
 Material
 Node Hierarchy
 Object
– Node and Geometry reference
– For each Geometry part
 Material reference
 Enabled state
model courtesy of PTC
OpenGL Research Framework
 99000 total parts, 3.8 Mtris, 5.1 Mverts
 700 geometries, 128 materials
 2000 objects
20
 Kepler Quadro K5000, i7
 vbo bind and drawcall per part, i.e. 99 000
drawcalls
scene draw time > 38 ms (CPU bound)
 vbo bind per geometry, drawcalls per part
scene draw time > 14 ms (CPU bound)
 All subsequent techniques raise perf significantly
scene draw time < 6 ms
1.8 ms with occlusion culling
Performance baseline
21
 MultiDraw (1.x)
– Render ranges from current VBO/IBO
– Single drawcall for many distinct objects
– Reduces overhead for low complexity objects
 ARB_draw_indirect (4.x)
 ARB_multi_draw_indirect
– Store drawcall information on GPU or HOST
– Let GPU create/modify GPU buffers
Drawcall Reduction
DrawElementsIndirect
{
GLuint count;
GLuint instanceCount;
GLuint firstIndex;
GLint baseVertex;
GLuint baseInstance;
}
22
– All use multidraw capabilites to
render across gaps
– BATCHED use CPU generated list of
combined parts with same state
 Object‘s part cache must be rebuilt
based on material/enabled state
– INDIVIDUAL stay on per-part level
 No caches, can update assignment or
cmd buffers directly
Drawing Techniques
a b c
a+b c
Parts with different materials in geometry
Grouped and „grown“ drawcalls
Single call, encode material/matrix
assignment via vertex attribute
23
 Group parameters by frequency of change
 Generating shader strings allows different storage
backend for „uniforms“
Parameters
Effect "Phong {
Group „material" (many) {
vec4 "ambient"
vec4 "diffuse"
vec4 "specular"
}
Group „view" (few) {
vec4 „viewProjTM„
}
Group „object" (many) {
mat4 „worldTM„
}
... Code ...
}
 OpenGL 2 uniforms
 OpenGL 3,4 buffers
 NVIDIA bindless technology...
24
 GL2 approach:
– Avoid many small
uniforms
– Arrays of uniforms,
grouped by frequency of
update, tightly-packed
Parameters
uniform mat4 worldMatrices[2];
uniform vec4 materialData[8];
#define matrix_world worldMatrices[0]
#define matrix_worldIT worldMatrices[1]
#define material_diffuse materialData[0]
#define material_emissive materialData[1]
#define material_gloss materialData[2].x
// GL3 can use floatBitsToInt and friends
// for free reinterpret casts within
// macros
...
wPos = matrix_world * oPos;
...
// in fragment shader
color = material_diffuse +
material_emissive;
...
25
 GL4 approach:
– TextureBufferObject
(TBO) for matrices
– UniformBufferObject
(UBO) with array data
to save costly binds
– Assignment indices
passed as vertex
attribute
Parameters in vec4 oPos;
uniform samplerBuffer matrixBuffer;
uniform materialBuffer {
Material materials[512];
};
in ivec2 vAssigns;
flat out ivec2 fAssigns;
// in vertex shader
fAssigns = vAssigns;
worldTM = getMatrix (matrixBuffer,
vAssigns.x);
wPos = worldTM * oPos;
...
// in fragment shader
color = materials[fAssigns.y].color;
...
26
setupSceneMatrixAndMaterialBuffer (scene);
foreach (obj in scene) {
if ( isVisible(obj) ) {
setupDrawGeometryVertexBuffer (obj);
// iterate over different materials used
foreach ( batch in obj.materialCaches) {
glVertexAttribI2i (indexAttr, batch.materialIndex, matrixIndex);
glMultiDrawElements (GL_TRIANGLES, batch.counts, GL_UNSIGNED_INT ,
batch.offsets,batched.numUsed);
}
}
}
OpenGL 4.x approach
27
glVertexAttribDivisor == 0 : VArray[ gl_VertexID + baseVertex ]
glVertexAttribDivisor != 0 : VArray[ gl_InstanceID / VDivisor + baseInstance ]
VArray[ 0 / 1 + baseInstance ]
Material & Matrix Index
VertexBuffer (divisor:1)
Position & Normal
VertexBuffer (divisor:0)
...
instanceCount = 1
baseInstance = 0
...
instanceCount = 1
baseInstance = 1
MultiDrawIndirect
Buffer
Per drawcall vertex attribute
vertex attributes
fetched for last
vertex in second
drawcall
baseinstance = 1
28
OpenGL 4.2+ indirect approach
...
foreach ( obj in scene.objects ) {
...
// instead of glVertexAttribI2i calls and a loop
// we use the baseInstance for the attribute
// bind special assignment buffer as vertex attribute
glBindBuffer ( GL_ARRAY_BUFFER, obj->assignBuffer);
glVertexAttribIPointer (indexAttr, 2, GL_INT, . . . );
// draw everything in one go
glMultiDrawElementsIndirect ( GL_TRIANGLES, GL_UNSIGNED_INT,
obj->indirectOffset, obj->numIndirects, 0 );
}
29
 ARB_vertex_attrib_binding
(VAB)
– Avoids many buffer changes
– Separates format from data
– Bind multiple vertex
attributes to one buffer
 NV_vertex_buffer_unified_
memory (VBUM)
– Allows very fast switching
through GPU pointers
Vertex Setup
/* setup once, similar to glVertexAttribPointer
but with relative offset last */
glVertexAttribFormat (ATTR_NORMAL, 3,
GL_FLOAT, GL_TRUE, offsetof(Vertex,normal));
glVertexAttribFormat (ATTR_POS, 3,
GL_FLOAT, GL_FALSE, offsetof(Vertex,pos));
// bind to stream
glVertexAttribBinding (ATTR_NORMAL, 0);
glVertexAttribBinding (ATTR_POS, 0);
// switch single stream buffer
glBindVertexBuffer (0, bufID, 0, sizeof(Vertex));
// NV_vertex_buffer_unified_memory
// enable once and set stride
glEnableClientState (GL_VERTEX...NV);...
glBindVertexBuffer (0, 0, 0, sizeof(Vertex));
// switch single buffer via pointer
glBufferAddressRangeNV (GL_VERTEX...,0,bufADDR,
bufSize);
30
0
200
400
600
800
1000
VBO VAB VAB+VBUM BINDLESS
INDIRECT HOST
VAB+VBUM
BINDLESS
INDIRECT GPU
VAB+VBUM
Timeinmicroseocnds[us]
– Vertex/Index setup inside MultiDrawIndirect command
NV_bindless_multidraw_indirect
one GL call to draw entire scene
GPU benefit depends on triangles
per drawcall (> ~ 500)
NV_bindless_multidraw_indirect
~ 2400 drawcalls, GL4 BATCHED style
Lower is
Better
Effect on CPU time
31
0
1000
2000
3000
4000
5000
6000
K KB K KB K KB K KB
GL4 INDIRECTHOST
INDIVIDUAL
GL4 INDIRECTGPU
INDIVIDUAL
GL4 BATCHED GL2 BATCHED
Timeinmicroseconds[us]
Bindless (green) always reduces CPU, and
may help framerate/GPU a bit
K = Kepler 5000, regular VBO
KB = Kepler 5000, VBUM + VAB
Lower is
Better
GPU
CPU
99.000 2.400 hw drawcalls
2.000 2.400 sw drawcalls
32
0
1000
2000
3000
4000
5000
6000
K KB K KB K KB K KB
GL4 INDIRECTHOST
INDIVIDUAL
GL4 INDIRECTGPU
INDIVIDUAL
GL4 BATCHED GL2 BATCHED
Timeinmicroseconds[us]
GPU
CPU
MultiDrawIndirect achieves almost 20 Mio drawcalls per
second (2000 VBO changes, „only“ 1/3 perf lost).
GPU-buffered commands save lots of CPU time
Lower is
Better
99.000 2.400 hw drawcalls
2.000 2.400 sw drawcalls
Scene-dependent!
INDIVIDUAL could be as fast
if enough work per drawcall
K = Kepler 5000, regular VBO
KB = Kepler 5000, VBUM + VAB
33
0
1000
2000
3000
4000
5000
6000
K KB K KB K KB K KB
GL4 INDIRECTHOST
INDIVIDUAL
GL4 INDIRECTGPU
INDIVIDUAL
GL4 BATCHED GL2 BATCHED
Timeinmicroseconds[us]
GL2 uniforms beat paletted UBO a bit in GPU, but are slower on
CPU side. (1 glUniform call with 8x vec4, vs indexed UBO)
Lower is
Better
GPU
CPU
Scene-dependent!
GL4 better when more
materials changed per object
K = Kepler 5000, regular VBO
KB = Kepler 5000, VBUM + VAB
99.000 2.400 hw drawcalls
2.000 2.400 sw drawcalls
34
 Share geometry buffers for batching
 Group parameters for fast updating
 MultiDraw/Indirect for keeping objects
independent or remove additional loops
– baseInstance to provide unique
index/assignments for drawcall
 Bindless to reduce validation
overhead/add flexibility
Recap
35
 GPU friendly processing
– Matrix and bbox buffer, object buffer
– XFB/Compute or „invisible“ rendering
– Vs. old techniques: Single GPU job for ALL objects!
 Results
– „Readback“ GPU to Host
 Can use GPU to pack into bit stream
– „Indirect“ GPU to GPU
 Set DrawIndirect‘s instanceCount to 0 or 1
GPU Culling Basics
0,1,0,1,1,1,0,0,0
buffer cmdBuffer{
Command cmds[];
};
...
cmds[obj].instanceCount = visible;
36
 OpenGL 4.2+
– Depth-Pass
– Raster „invisible“ bounding boxes
 Disable Color/Depth writes
 Geometry Shader to create the three
visible box sides
 Depth buffer discards occluded fragments
(earlyZ...)
 Fragment Shader writes output:
visible[objindex] = 1
Occlusion Culling
// GLSL fragment shader
// from ARB_shader_image_load_store
layout(early_fragment_tests) in;
buffer visibilityBuffer{
int visibility[];
};
flat in int objID;
void main(){
visibility[objID] = 1;
}
// buffer would have been cleared
// to 0 before
Passing bbox fragments
enable object
Algorithm by
Evgeny Makarov, NVIDIA
depth
buffer
37
 Exploit that majority of objects don‘t change
much relative to camera
 Draw each object only once (vertex/drawcall-
bound)
– Render last visible, fully shaded
(last)
– Test all against current depth:
(visible)
– Render newly added visible:
none, if no spatial changes made
(~last) & (visible)
– (last) = (visible)
Temporal Coherence
frame: f – 1
frame: f
last visible
bboxes occluded
bboxes pass depth
(visible)
new visible
invisible
visible
camera
camera
moved
38
Culling Readback vs Indirect
0
500
1000
1500
2000
2500
readback indirect NVindirect
Timeinmicroseconds[us] In the „draw new visible“ phase indirect cannot
benefit of „nothing to setup/draw“ in advance,
still processes „empty“ lists
For readback results, CPU has to
wait for GPU idle
37% faster with
culling
33% faster with
culling 37% faster with
culling
NV_bindless_
multidraw_indirect
saves CPU and bit of
GPU time
Scene-dependent,
i.e. triangles per
drawcall and # of
„invisible“Lower is
Better
GL4 BATCHED style
GPU
CPU
39
 Temporal culling very useful for object/vertex-boundedness
– Can also apply for Z-pass...
 Readback vs Indirect
– Readback variant „easier“ to be faster (no setups...), but syncs!
– NV_bindless_multidraw benefit depends on scene (VBO changes
and primitives per drawcall)
 Working towards GPU autonomous system
– (NV_bindless)/ARB_multidraw_indirect as mechanism for GPU
creating its own work, research and feature work in progresss
Culling Results
40
 Thank you!
– Contact
 ckubisch@nvidia.com
 matavenrath@nvidia.com
glFinish();
41
 Family of extensions to use
native handles/addresses
 NV_vertex_buffer_unified_memory
 NV_bindless_multidraw_indirect
 NV_shader_buffer_load/store
– Pointers in GLSL
 NV_bindless_texture
– No more unit restrictions
– References inside buffers
NVIDIA Bindless Technology
// GLSL with true pointers
uniform MyStruct* mystructs;
// API
glUniformui64NV (bufferLocation,
bufferADDR);
texHDL = glGetTextureHandleNV (tex);
// later instead of glBindTexture
glUniformHandleui64NV (texLocation,
texHDL)
// GLSL
// can also store textures in resources
uniform materialBuffer {
sampler2D manyTextures [LARGE];
}
42
0
500
1000
1500
2000
2500
Readback
GPU
Readback
CPU
Indirect
GPU
Indirect
CPU
NVIndirect
GPU
NVIndirect
CPU
Timeinmicroseconds
6. Draw New Visible
5. Update Internals
4. Occlusion Cull
3. Draw Last Visible
2. Update Internals
1. Frustum Cull
Nothing „new“ to draw, but CPU doesn‘t
know, still setting things up, GPU runs
thru „empty“ cmd buffer
For readback results, CPU has to
wait for GPU idle
Culling
Readback
vs
Indirect 432 fps with culling
315 without 387 fps with culling
289 without
429 fps with culling
313 without
Special bindless indirect
version can save lots of
CPU and a bit GPU costs
for drawing the scene
with a single big cmd
buffer

More Related Content

What's hot

A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3guest11b095
 
Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2
Philip Hammer
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
Tristan Lorach
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
Electronic Arts / DICE
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Johan Andersson
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
Tiago Sousa
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
Electronic Arts / DICE
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Electronic Arts / DICE
 
Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1
Ki Hyunwoo
 
Parallel Futures of a Game Engine
Parallel Futures of a Game EngineParallel Futures of a Game Engine
Parallel Futures of a Game Engine
Johan Andersson
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
Tiago Sousa
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
AMD Developer Central
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
Electronic Arts / DICE
 
NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017
Mark Kilgard
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
AMD Developer Central
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
Electronic Arts / DICE
 
Rendering Tech of Space Marine
Rendering Tech of Space MarineRendering Tech of Space Marine
Rendering Tech of Space Marine
Pope Kim
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The Surge
Michele Giacalone
 
Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based Rendering
Electronic Arts / DICE
 
Lighting of Killzone: Shadow Fall
Lighting of Killzone: Shadow FallLighting of Killzone: Shadow Fall
Lighting of Killzone: Shadow Fall
Guerrilla
 

What's hot (20)

A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3
 
Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1
 
Parallel Futures of a Game Engine
Parallel Futures of a Game EngineParallel Futures of a Game Engine
Parallel Futures of a Game Engine
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
 
Rendering Tech of Space Marine
Rendering Tech of Space MarineRendering Tech of Space Marine
Rendering Tech of Space Marine
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The Surge
 
Moving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based RenderingMoving Frostbite to Physically Based Rendering
Moving Frostbite to Physically Based Rendering
 
Lighting of Killzone: Shadow Fall
Lighting of Killzone: Shadow FallLighting of Killzone: Shadow Fall
Lighting of Killzone: Shadow Fall
 

Viewers also liked

Game engine introduction and approach
Game engine introduction and approachGame engine introduction and approach
Game engine introduction and approachDuy Tan Geek
 
Game Engine Overview
Game Engine OverviewGame Engine Overview
Game Engine OverviewSharad Mitra
 
Game Engine Architecture
Game Engine ArchitectureGame Engine Architecture
Game Engine Architecture
Attila Jenei
 
What Is A Game Engine
What Is A Game EngineWhat Is A Game Engine
What Is A Game Engine
Seth Sivak
 
Killzone Shadow Fall: Threading the Entity Update on PS4
Killzone Shadow Fall: Threading the Entity Update on PS4Killzone Shadow Fall: Threading the Entity Update on PS4
Killzone Shadow Fall: Threading the Entity Update on PS4
jrouwe
 
Design your 3d game engine
Design your 3d game engineDesign your 3d game engine
Design your 3d game engine
Daosheng Mu
 
Ray tracing
Ray tracingRay tracing
Ray tracing
Yogesh jatin Gupta
 
Ray tracing
Ray tracingRay tracing
Ray tracing
Muhammad Azam
 
GRPHICS08 - Raytracing and Radiosity
GRPHICS08 - Raytracing and RadiosityGRPHICS08 - Raytracing and Radiosity
GRPHICS08 - Raytracing and Radiosity
Michael Heron
 
Unity - Game Engine
Unity - Game EngineUnity - Game Engine
Unity - Game Engine
Geeks Anonymes
 
Game Architecture and Programming
Game Architecture and ProgrammingGame Architecture and Programming
Game Architecture and ProgrammingSumit Jain
 
Pitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONYPitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONY
Anaya Medias Swiss
 
Migrating from OpenGL to Vulkan
Migrating from OpenGL to VulkanMigrating from OpenGL to Vulkan
Migrating from OpenGL to Vulkan
Mark Kilgard
 

Viewers also liked (15)

Game engine introduction and approach
Game engine introduction and approachGame engine introduction and approach
Game engine introduction and approach
 
Game Engine Overview
Game Engine OverviewGame Engine Overview
Game Engine Overview
 
Game Engine Architecture
Game Engine ArchitectureGame Engine Architecture
Game Engine Architecture
 
What Is A Game Engine
What Is A Game EngineWhat Is A Game Engine
What Is A Game Engine
 
Killzone Shadow Fall: Threading the Entity Update on PS4
Killzone Shadow Fall: Threading the Entity Update on PS4Killzone Shadow Fall: Threading the Entity Update on PS4
Killzone Shadow Fall: Threading the Entity Update on PS4
 
Design your 3d game engine
Design your 3d game engineDesign your 3d game engine
Design your 3d game engine
 
Ray tracing
Ray tracingRay tracing
Ray tracing
 
Ray tracing
Ray tracingRay tracing
Ray tracing
 
GRPHICS08 - Raytracing and Radiosity
GRPHICS08 - Raytracing and RadiosityGRPHICS08 - Raytracing and Radiosity
GRPHICS08 - Raytracing and Radiosity
 
Unity - Game Engine
Unity - Game EngineUnity - Game Engine
Unity - Game Engine
 
Game Architecture and Programming
Game Architecture and ProgrammingGame Architecture and Programming
Game Architecture and Programming
 
Ray tracing
 Ray tracing Ray tracing
Ray tracing
 
Pitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONYPitfalls of Object Oriented Programming by SONY
Pitfalls of Object Oriented Programming by SONY
 
Migrating from OpenGL to Vulkan
Migrating from OpenGL to VulkanMigrating from OpenGL to Vulkan
Migrating from OpenGL to Vulkan
 
Ray tracing
Ray tracingRay tracing
Ray tracing
 

Similar to Advanced Scenegraph Rendering Pipeline

D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And EffectsThomas Goddard
 
Penn graphics
Penn graphicsPenn graphics
Penn graphics
floored
 
Hpg2011 papers kazakov
Hpg2011 papers kazakovHpg2011 papers kazakov
Hpg2011 papers kazakov
mistercteam
 
OpenGL 4 for 2010
OpenGL 4 for 2010OpenGL 4 for 2010
OpenGL 4 for 2010
Mark Kilgard
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
💻 Anton Gerdelan
 
Introduction To Geometry Shaders
Introduction To Geometry ShadersIntroduction To Geometry Shaders
Introduction To Geometry Shaderspjcozzi
 
NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL Functionality
Mark Kilgard
 
Hardware Shaders
Hardware ShadersHardware Shaders
Hardware Shadersgueste52f1b
 
Rendering of Complex 3D Treemaps (GRAPP 2013)
Rendering of Complex 3D Treemaps (GRAPP 2013)Rendering of Complex 3D Treemaps (GRAPP 2013)
Rendering of Complex 3D Treemaps (GRAPP 2013)Matthias Trapp
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
Owen Wu
 
3 boyd direct3_d12 (1)
3 boyd direct3_d12 (1)3 boyd direct3_d12 (1)
3 boyd direct3_d12 (1)
mistercteam
 
Chapter-3.pdf
Chapter-3.pdfChapter-3.pdf
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and Transparency
Mark Kilgard
 
Dx11 performancereloaded
Dx11 performancereloadedDx11 performancereloaded
Dx11 performancereloaded
mistercteam
 
Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011
Prabindh Sundareson
 
7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance 7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance
AMD
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
AMD Developer Central
 
GPU - how can we use it?
GPU - how can we use it?GPU - how can we use it?
GPU - how can we use it?
Bartlomiej Filipek
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...Johan Andersson
 

Similar to Advanced Scenegraph Rendering Pipeline (20)

D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And Effects
 
Penn graphics
Penn graphicsPenn graphics
Penn graphics
 
Hpg2011 papers kazakov
Hpg2011 papers kazakovHpg2011 papers kazakov
Hpg2011 papers kazakov
 
OpenGL 4 for 2010
OpenGL 4 for 2010OpenGL 4 for 2010
OpenGL 4 for 2010
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
 
Introduction To Geometry Shaders
Introduction To Geometry ShadersIntroduction To Geometry Shaders
Introduction To Geometry Shaders
 
NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL Functionality
 
Hardware Shaders
Hardware ShadersHardware Shaders
Hardware Shaders
 
Rendering of Complex 3D Treemaps (GRAPP 2013)
Rendering of Complex 3D Treemaps (GRAPP 2013)Rendering of Complex 3D Treemaps (GRAPP 2013)
Rendering of Complex 3D Treemaps (GRAPP 2013)
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
 
3 boyd direct3_d12 (1)
3 boyd direct3_d12 (1)3 boyd direct3_d12 (1)
3 boyd direct3_d12 (1)
 
Chapter-3.pdf
Chapter-3.pdfChapter-3.pdf
Chapter-3.pdf
 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and Transparency
 
2DCompsitionEngine
2DCompsitionEngine2DCompsitionEngine
2DCompsitionEngine
 
Dx11 performancereloaded
Dx11 performancereloadedDx11 performancereloaded
Dx11 performancereloaded
 
Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011
 
7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance 7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
GPU - how can we use it?
GPU - how can we use it?GPU - how can we use it?
GPU - how can we use it?
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 

Recently uploaded

Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 

Recently uploaded (20)

Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 

Advanced Scenegraph Rendering Pipeline

  • 1. Advanced Scenegraph Rendering Pipeline Markus Tavenrath – NVIDIA – matavenrath@nvidia.com Christoph Kubisch – NVIDIA – ckubisch@nvidia.com
  • 2. 2  Traditional approach is render while traversing a SceneGraph  Scene complexity increases – Deep hierarchies, traversal expensive – Large objects split up into a lot of little pieces, increased draw call count – Unsorted rendering, lot of state changes  CPU becomes bottleneck when rendering those scenes SceneGraph Rendering models courtesy of PTC Introduction SceneGraph SceneTree ShapeList Renderer
  • 3. 3 Overview Introduction SceneGraph SceneTree ShapeList Renderer G0 T0 T1 T2 S1 S2 G1 T3 S0 G0 T0 T1 T2 S1 S2 G1 T3 S0 T2‘ S1‘ S2‘ G1‘ T3‘ ShapeList S1 S2S0 S1‘ S2‘ RendererCache S1 S2S0 S1‘ S2‘ SceneGraph SceneTree ShapeList Renderer Gi Group Ti Transform Si Shape
  • 4. 4 SceneGraph G0 T0 T1 T2 S1 S2 G1 T3 S0  SceneGraph is DAG  No unique path to a node – Cannot efficiently cache path-dependent data per node  Traversal runs over 14 nodes for rendering.  Processed 6 Transform Nodes – 6 matrix/matrix multiplications and inversions  Nodes are usually ‚large‘ and not linear in memory – Each node access generates at least one, most likely cache misses Gi Group Ti Transform Si Shape Introduction SceneGraph SceneTree ShapeList Renderer
  • 5. 5 SceneTree construction G0 T0 T1 T2 S1 S2 G1 T3 S0 Gi Group Ti Transform Si Shape Introduction SceneGraph SceneTree ShapeList Renderer G0 T0 T1 T2 S1 S2 G1 T3 S0 T2‘ S1‘ S2‘ G1‘ T3‘ G0 -> (G0) T0 -> (T0) T1 -> (T1) S0 -> (S0) G1 -> (G1) T2 -> (T2) S1 -> (S1) T3 -> (T3) S2 -> (S2) G0 -> (G0) T0 -> (T0) T1 -> (T1) S0 -> (S0) G1 -> (G1,G1‘) T2 -> (T2,T2‘) S1 -> (S1,S1‘) T3 -> (T3,T3‘) S2 -> (S2,S2‘)  Observer based synchronization
  • 6. 6  SceneTree has unique path to each node  Store accumulated attributes like transforms or visibility in each Node  Trade memory for performance – 64-byte per node, 100k nodes ~6MB – Transforms stored separate vector  Traversal still processes 14 nodes. Gi Group Ti Transform Si Shape G0 T0 T1 T2 S1 S2 G1 T3 S0 T2‘ S1‘ S2‘ G1‘ T3‘ SceneTree Introduction SceneGraph SceneTree ShapeList Renderer
  • 7. 7 SceneTree invalidate attributes cache  Keep dirty flags per node  Keep dirty vector per flag  SceneGraph change notifications invalidated nodes – If not dirty, mark dirty and add to dirty vector – O(1) operation, no sorting required upon changes  Before rendering a frame process dirty vectorDirty vector T1T3 T3‘ G0 T0 T1 T2 S1 S2 G1 T3 S0 T2‘ S1‘ S2‘ G1‘ T3‘ Introduction SceneGraph SceneTree ShapeList Renderer
  • 8. 8 T3‘ T1 T2‘T3 SceneTree validate attribute cache  Walk through dirty vector — Node marked dirty -> search top dirty — Validate subtree from top dirty  Validation example — T3 dirty, traverse up to root node  T3 top dirty node, validate T3 subtree — T3‘ dirty, traverse up to root node  T1 top dirty node, validate T1 subtree — T1 not dirty  No work to do Dirty vector T1T3 T3‘ G0 T0 T1 T2 S1 S2 G1 T3 S0 T3‘ S1‘ S2‘ G1‘ T3 T1T3‘ Introduction SceneGraph SceneTree ShapeList Renderer
  • 9. 9 SceneTree to ShapeList  Add Events for ShapeList generation – addShape(Shape) – removeShape(Shape) ShapeList S1 S2S0 S1‘ S2‘ Gi Group Ti Transform Si Shape G0 T0 T1 T2 S1 S2 G1 T3 S0 T2‘ S1‘ S2‘ G1‘ T3‘ Introduction SceneGraph SceneTree ShapeList Renderer
  • 10. 10  SceneGraph to SceneTree synchronization – Store accumulated data per node instance  SceneTree to ShapeList synchronization – Avoid SceneTree traversal  Next: Efficient data structure for renderer based on ShapeList Summary Introduction SceneGraph SceneTree ShapeList Renderer
  • 11. 11 Renderer Data Structures Program Shader Vertex Shader Fragment ParameterDescription Camera ParameterDescription Light ParameterDescription Matrices ParameterDescription Material ambient diffuse specular texture Name Type Arraysize vec3 vec3 vec3 Sampler 2 2 2 0 ParamaterDescriptionShape Program ‚colored‘ Geometry Shape1 ParameterData Camera1 ParameterData Lightset 1 ParameterData Transform 1 ParameterData red Introduction SceneGraph SceneTree ShapeList Renderer
  • 12. 12 Example Parameter Grouping Shader independent globals, i.e. camera Object parameters, i.e. position/rotation/scaling Material handles, i.e. textures and buffers Material raw values, i.e. float, int and bool Light, i.e. light sources and shadow maps Shader dependent globals, i.e. environment map always frequent constant rare Introduction SceneGraph SceneTree ShapeList Renderer Parameters Frequency
  • 13. 13 Shapes coloredGroup by Program shapelist Rendering structures ‚colored‘ ‚textured‘ ‚colored‘ ‚colored‘ ‚textured‘ ‚colored‘ ‚colored‘‚colored‘ Shapes textured ‚textured‘ ‚textured‘ Introduction SceneGraph SceneTree ShapeList Renderer Sort by ParameterData
  • 14. 14 ParameterData Cache  Cache is a big char[] with all ParameterData.  ParameterData are sorted by first usage.  Parameters are converted to Target-API datatype, i.e. — Int8 to int32, TextureHandle to bindless texture...  Updating parameters is only playback of data in memory, no conditionals.  Filter for used parameters to reduce cache size Parameters colored red blue Parameters textured wood marble Introduction SceneGraph SceneTree ShapeList Renderer
  • 15. 15 Vertex Attribute Cache  Big char[] with vertex attribute pointers — Bindless pointers, VBOs or VAB streams  Each set of attributes stored only once  Ordered by first usage  Attributes required by program are known — Store only used attributes in Cache — Useful for special passes like depth pass where only pos is required attributes colored pos normal pos normal pos normal Introduction SceneGraph SceneTree ShapeList Renderer
  • 16. 16 Renderer Cache complete Parameters colored Phong red Phong blue Shapes colored ‚colored‘ ‚colored‘‚colored‘ Attributes colored pos normal pos normal pos normal Introduction SceneGraph SceneTree ShapeList Renderer foreach(shape) { if (visible(shape)) { if (changed(parameters)) render(parameters); if (changed(attributes)) render(attributes); render(shape); } }
  • 17. 17  CPU boundedness improved (application) – Recomputation of attributes (transforms) – Deep hierarchies: traversal expensive – Unsorted rendering, lot of state changes  CPU boundedness remaining (OpenGL usage) – Large objects split up into a lot of little pieces, increased draw call count Achievements ShapeList Renderer SceneTree RendererCache OpenGL implementation Renderer
  • 18. 18  Avoid data redundancy – Data stored once, referenced multiple times – Update only once (less host to gpu transfers)  Increase batching potential – Further cuts api calls – Less driver CPU work  Minimize CPU/GPU interaction – Allow GPU to update its own data – Lower api usage when scene is changed little – E.g. GPU-based culling Enabling Hardware Scalability
  • 19. 19  Avoids classic SceneGraph design  Geometry – Vertex/IndexBuffer – BoundingBox – Divided into parts (CAD features)  Material  Node Hierarchy  Object – Node and Geometry reference – For each Geometry part  Material reference  Enabled state model courtesy of PTC OpenGL Research Framework  99000 total parts, 3.8 Mtris, 5.1 Mverts  700 geometries, 128 materials  2000 objects
  • 20. 20  Kepler Quadro K5000, i7  vbo bind and drawcall per part, i.e. 99 000 drawcalls scene draw time > 38 ms (CPU bound)  vbo bind per geometry, drawcalls per part scene draw time > 14 ms (CPU bound)  All subsequent techniques raise perf significantly scene draw time < 6 ms 1.8 ms with occlusion culling Performance baseline
  • 21. 21  MultiDraw (1.x) – Render ranges from current VBO/IBO – Single drawcall for many distinct objects – Reduces overhead for low complexity objects  ARB_draw_indirect (4.x)  ARB_multi_draw_indirect – Store drawcall information on GPU or HOST – Let GPU create/modify GPU buffers Drawcall Reduction DrawElementsIndirect { GLuint count; GLuint instanceCount; GLuint firstIndex; GLint baseVertex; GLuint baseInstance; }
  • 22. 22 – All use multidraw capabilites to render across gaps – BATCHED use CPU generated list of combined parts with same state  Object‘s part cache must be rebuilt based on material/enabled state – INDIVIDUAL stay on per-part level  No caches, can update assignment or cmd buffers directly Drawing Techniques a b c a+b c Parts with different materials in geometry Grouped and „grown“ drawcalls Single call, encode material/matrix assignment via vertex attribute
  • 23. 23  Group parameters by frequency of change  Generating shader strings allows different storage backend for „uniforms“ Parameters Effect "Phong { Group „material" (many) { vec4 "ambient" vec4 "diffuse" vec4 "specular" } Group „view" (few) { vec4 „viewProjTM„ } Group „object" (many) { mat4 „worldTM„ } ... Code ... }  OpenGL 2 uniforms  OpenGL 3,4 buffers  NVIDIA bindless technology...
  • 24. 24  GL2 approach: – Avoid many small uniforms – Arrays of uniforms, grouped by frequency of update, tightly-packed Parameters uniform mat4 worldMatrices[2]; uniform vec4 materialData[8]; #define matrix_world worldMatrices[0] #define matrix_worldIT worldMatrices[1] #define material_diffuse materialData[0] #define material_emissive materialData[1] #define material_gloss materialData[2].x // GL3 can use floatBitsToInt and friends // for free reinterpret casts within // macros ... wPos = matrix_world * oPos; ... // in fragment shader color = material_diffuse + material_emissive; ...
  • 25. 25  GL4 approach: – TextureBufferObject (TBO) for matrices – UniformBufferObject (UBO) with array data to save costly binds – Assignment indices passed as vertex attribute Parameters in vec4 oPos; uniform samplerBuffer matrixBuffer; uniform materialBuffer { Material materials[512]; }; in ivec2 vAssigns; flat out ivec2 fAssigns; // in vertex shader fAssigns = vAssigns; worldTM = getMatrix (matrixBuffer, vAssigns.x); wPos = worldTM * oPos; ... // in fragment shader color = materials[fAssigns.y].color; ...
  • 26. 26 setupSceneMatrixAndMaterialBuffer (scene); foreach (obj in scene) { if ( isVisible(obj) ) { setupDrawGeometryVertexBuffer (obj); // iterate over different materials used foreach ( batch in obj.materialCaches) { glVertexAttribI2i (indexAttr, batch.materialIndex, matrixIndex); glMultiDrawElements (GL_TRIANGLES, batch.counts, GL_UNSIGNED_INT , batch.offsets,batched.numUsed); } } } OpenGL 4.x approach
  • 27. 27 glVertexAttribDivisor == 0 : VArray[ gl_VertexID + baseVertex ] glVertexAttribDivisor != 0 : VArray[ gl_InstanceID / VDivisor + baseInstance ] VArray[ 0 / 1 + baseInstance ] Material & Matrix Index VertexBuffer (divisor:1) Position & Normal VertexBuffer (divisor:0) ... instanceCount = 1 baseInstance = 0 ... instanceCount = 1 baseInstance = 1 MultiDrawIndirect Buffer Per drawcall vertex attribute vertex attributes fetched for last vertex in second drawcall baseinstance = 1
  • 28. 28 OpenGL 4.2+ indirect approach ... foreach ( obj in scene.objects ) { ... // instead of glVertexAttribI2i calls and a loop // we use the baseInstance for the attribute // bind special assignment buffer as vertex attribute glBindBuffer ( GL_ARRAY_BUFFER, obj->assignBuffer); glVertexAttribIPointer (indexAttr, 2, GL_INT, . . . ); // draw everything in one go glMultiDrawElementsIndirect ( GL_TRIANGLES, GL_UNSIGNED_INT, obj->indirectOffset, obj->numIndirects, 0 ); }
  • 29. 29  ARB_vertex_attrib_binding (VAB) – Avoids many buffer changes – Separates format from data – Bind multiple vertex attributes to one buffer  NV_vertex_buffer_unified_ memory (VBUM) – Allows very fast switching through GPU pointers Vertex Setup /* setup once, similar to glVertexAttribPointer but with relative offset last */ glVertexAttribFormat (ATTR_NORMAL, 3, GL_FLOAT, GL_TRUE, offsetof(Vertex,normal)); glVertexAttribFormat (ATTR_POS, 3, GL_FLOAT, GL_FALSE, offsetof(Vertex,pos)); // bind to stream glVertexAttribBinding (ATTR_NORMAL, 0); glVertexAttribBinding (ATTR_POS, 0); // switch single stream buffer glBindVertexBuffer (0, bufID, 0, sizeof(Vertex)); // NV_vertex_buffer_unified_memory // enable once and set stride glEnableClientState (GL_VERTEX...NV);... glBindVertexBuffer (0, 0, 0, sizeof(Vertex)); // switch single buffer via pointer glBufferAddressRangeNV (GL_VERTEX...,0,bufADDR, bufSize);
  • 30. 30 0 200 400 600 800 1000 VBO VAB VAB+VBUM BINDLESS INDIRECT HOST VAB+VBUM BINDLESS INDIRECT GPU VAB+VBUM Timeinmicroseocnds[us] – Vertex/Index setup inside MultiDrawIndirect command NV_bindless_multidraw_indirect one GL call to draw entire scene GPU benefit depends on triangles per drawcall (> ~ 500) NV_bindless_multidraw_indirect ~ 2400 drawcalls, GL4 BATCHED style Lower is Better Effect on CPU time
  • 31. 31 0 1000 2000 3000 4000 5000 6000 K KB K KB K KB K KB GL4 INDIRECTHOST INDIVIDUAL GL4 INDIRECTGPU INDIVIDUAL GL4 BATCHED GL2 BATCHED Timeinmicroseconds[us] Bindless (green) always reduces CPU, and may help framerate/GPU a bit K = Kepler 5000, regular VBO KB = Kepler 5000, VBUM + VAB Lower is Better GPU CPU 99.000 2.400 hw drawcalls 2.000 2.400 sw drawcalls
  • 32. 32 0 1000 2000 3000 4000 5000 6000 K KB K KB K KB K KB GL4 INDIRECTHOST INDIVIDUAL GL4 INDIRECTGPU INDIVIDUAL GL4 BATCHED GL2 BATCHED Timeinmicroseconds[us] GPU CPU MultiDrawIndirect achieves almost 20 Mio drawcalls per second (2000 VBO changes, „only“ 1/3 perf lost). GPU-buffered commands save lots of CPU time Lower is Better 99.000 2.400 hw drawcalls 2.000 2.400 sw drawcalls Scene-dependent! INDIVIDUAL could be as fast if enough work per drawcall K = Kepler 5000, regular VBO KB = Kepler 5000, VBUM + VAB
  • 33. 33 0 1000 2000 3000 4000 5000 6000 K KB K KB K KB K KB GL4 INDIRECTHOST INDIVIDUAL GL4 INDIRECTGPU INDIVIDUAL GL4 BATCHED GL2 BATCHED Timeinmicroseconds[us] GL2 uniforms beat paletted UBO a bit in GPU, but are slower on CPU side. (1 glUniform call with 8x vec4, vs indexed UBO) Lower is Better GPU CPU Scene-dependent! GL4 better when more materials changed per object K = Kepler 5000, regular VBO KB = Kepler 5000, VBUM + VAB 99.000 2.400 hw drawcalls 2.000 2.400 sw drawcalls
  • 34. 34  Share geometry buffers for batching  Group parameters for fast updating  MultiDraw/Indirect for keeping objects independent or remove additional loops – baseInstance to provide unique index/assignments for drawcall  Bindless to reduce validation overhead/add flexibility Recap
  • 35. 35  GPU friendly processing – Matrix and bbox buffer, object buffer – XFB/Compute or „invisible“ rendering – Vs. old techniques: Single GPU job for ALL objects!  Results – „Readback“ GPU to Host  Can use GPU to pack into bit stream – „Indirect“ GPU to GPU  Set DrawIndirect‘s instanceCount to 0 or 1 GPU Culling Basics 0,1,0,1,1,1,0,0,0 buffer cmdBuffer{ Command cmds[]; }; ... cmds[obj].instanceCount = visible;
  • 36. 36  OpenGL 4.2+ – Depth-Pass – Raster „invisible“ bounding boxes  Disable Color/Depth writes  Geometry Shader to create the three visible box sides  Depth buffer discards occluded fragments (earlyZ...)  Fragment Shader writes output: visible[objindex] = 1 Occlusion Culling // GLSL fragment shader // from ARB_shader_image_load_store layout(early_fragment_tests) in; buffer visibilityBuffer{ int visibility[]; }; flat in int objID; void main(){ visibility[objID] = 1; } // buffer would have been cleared // to 0 before Passing bbox fragments enable object Algorithm by Evgeny Makarov, NVIDIA depth buffer
  • 37. 37  Exploit that majority of objects don‘t change much relative to camera  Draw each object only once (vertex/drawcall- bound) – Render last visible, fully shaded (last) – Test all against current depth: (visible) – Render newly added visible: none, if no spatial changes made (~last) & (visible) – (last) = (visible) Temporal Coherence frame: f – 1 frame: f last visible bboxes occluded bboxes pass depth (visible) new visible invisible visible camera camera moved
  • 38. 38 Culling Readback vs Indirect 0 500 1000 1500 2000 2500 readback indirect NVindirect Timeinmicroseconds[us] In the „draw new visible“ phase indirect cannot benefit of „nothing to setup/draw“ in advance, still processes „empty“ lists For readback results, CPU has to wait for GPU idle 37% faster with culling 33% faster with culling 37% faster with culling NV_bindless_ multidraw_indirect saves CPU and bit of GPU time Scene-dependent, i.e. triangles per drawcall and # of „invisible“Lower is Better GL4 BATCHED style GPU CPU
  • 39. 39  Temporal culling very useful for object/vertex-boundedness – Can also apply for Z-pass...  Readback vs Indirect – Readback variant „easier“ to be faster (no setups...), but syncs! – NV_bindless_multidraw benefit depends on scene (VBO changes and primitives per drawcall)  Working towards GPU autonomous system – (NV_bindless)/ARB_multidraw_indirect as mechanism for GPU creating its own work, research and feature work in progresss Culling Results
  • 40. 40  Thank you! – Contact  ckubisch@nvidia.com  matavenrath@nvidia.com glFinish();
  • 41. 41  Family of extensions to use native handles/addresses  NV_vertex_buffer_unified_memory  NV_bindless_multidraw_indirect  NV_shader_buffer_load/store – Pointers in GLSL  NV_bindless_texture – No more unit restrictions – References inside buffers NVIDIA Bindless Technology // GLSL with true pointers uniform MyStruct* mystructs; // API glUniformui64NV (bufferLocation, bufferADDR); texHDL = glGetTextureHandleNV (tex); // later instead of glBindTexture glUniformHandleui64NV (texLocation, texHDL) // GLSL // can also store textures in resources uniform materialBuffer { sampler2D manyTextures [LARGE]; }
  • 42. 42 0 500 1000 1500 2000 2500 Readback GPU Readback CPU Indirect GPU Indirect CPU NVIndirect GPU NVIndirect CPU Timeinmicroseconds 6. Draw New Visible 5. Update Internals 4. Occlusion Cull 3. Draw Last Visible 2. Update Internals 1. Frustum Cull Nothing „new“ to draw, but CPU doesn‘t know, still setting things up, GPU runs thru „empty“ cmd buffer For readback results, CPU has to wait for GPU idle Culling Readback vs Indirect 432 fps with culling 315 without 387 fps with culling 289 without 429 fps with culling 313 without Special bindless indirect version can save lots of CPU and a bit GPU costs for drawing the scene with a single big cmd buffer