Vertex Shader Tricks
New Ways to Use the Vertex Shader to Improve
Performance
Bill Bilodeau
Developer Technology Engineer, AMD
Topics Covered
● Overview of the DX11 front-end pipeline
● Common bottlenecks
● Advanced Vertex Shader Features
● Vertex Shader Techniques
● Samples and Results
Graphics Hardware
DX11 Front-End Pipeline
● VS – vertex data
● HS – control points
● Tessellator
● DS – generated vertices
● GS – primitives
● Write to UAV at all stages
● Starting with DX11.1
[Diagram: graphics hardware compute units, each with vector GPRs (256 2048-bit registers), a vector ALU (one 64-way single-precision operation every 4 clocks), a scalar ALU (one operation every 4 clocks), scalar GPRs (256 64-bit registers), and a vector/scalar cross-communication bus]
[Diagram: front-end pipeline stages Input Assembler, Vertex Shader, Hull Shader, Tessellator, Domain Shader, Geometry Shader, Stream Out; shader stages access CB, SRV, or UAV resources]
Bottlenecks - VS
● VS Attributes
● Limit outputs to 4 attributes (AMD)
● This applies to all shader stages (except PS)
● VS Texture Fetches
● Too many texture fetches can add latency
● Especially dependent texture fetches
● Group fetches together for better performance
● Hide latency with ALU instructions
Bottlenecks - VS
● Use the caches wisely
● Avoid large vertex formats
that waste pre-VS cache
space
● DrawIndexed() allows for
reuse of processed vertices
saved in the post-VS cache
● Vertices with the same index only need to be processed once
[Diagram: Input Assembler, Pre-VS Cache (hides latency), Vertex Shader, Post-VS Cache (vertex reuse)]
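The vertex reuse described above can be illustrated with a small CPU-side model. This is only a sketch: it assumes a simple FIFO post-VS cache, and the function name `countVsInvocations` and the cache size are hypothetical. Real hardware caches behave differently and their sizes vary.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts how many times a vertex shader would run for an index stream,
// assuming a FIFO post-VS cache (a simplified, hypothetical model).
// A cacheSize of 0 models non-indexed drawing, where nothing is reused.
std::size_t countVsInvocations(const std::vector<std::uint32_t>& indices,
                               std::size_t cacheSize)
{
    std::deque<std::uint32_t> cache;
    std::size_t invocations = 0;
    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end())
            continue;              // cache hit: reuse the processed vertex
        ++invocations;             // cache miss: the vertex is shaded
        cache.push_back(index);
        if (cache.size() > cacheSize)
            cache.pop_front();     // evict the oldest entry
    }
    return invocations;
}
```

For the six indices {0, 1, 2, 2, 1, 3} of one quad, this model reports 4 invocations with a 16-entry cache and 6 with the cache disabled.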
Bottlenecks - GS
● GS
● Can add or remove primitives
● Adding new primitives requires storing new
vertices
● Going off chip to store data can be a bandwidth issue
● Using the GS means another shader stage
● This means more competition for shader resources
● Better if you can do everything in the VS
Advanced Vertex Shader Features
● SV_VertexID, SV_InstanceID
● UAV output (DX11.1)
● NULL vertex buffer
● VS can create its own vertex data
SV_VertexID
● Can use the vertex id to decide what
vertex data to fetch
● Fetch from SRV, or procedurally create a
vertex
VSOut VertexShader(uint id : SV_VertexID)
{
    // use the vertex ID to index a buffer bound as an SRV
    float3 vertex = g_VertexBuffer[id];
    …
}
UAV buffers
● Write to UAVs from a Vertex Shader
● New feature in DX11.1 (UAV at any stage)
● Can be used instead of stream-out for
writing vertex data
● Triangle output not limited to strips
● You can use whatever format you want
● Can output anything useful to a UAV
NULL Vertex Buffer
● DX11/DX10 allows this
● Just set the number of vertices in Draw()
● VS will execute without a vertex buffer bound
● Can be used for instancing
● Call Draw() with the total number of vertices
● Bind mesh and instance data as SRVs
Vertex Shader Techniques
● Full Screen Triangle
● Vertex Shader Instancing
● Merged Instancing
● Vertex Shader UAVs
Full Screen Triangle
● For post-processing effects
● Triangle has better performance
than quad
● Fast and easy with VS
generated coordinates
● No IB or VB is necessary
● Something you should be
using for full screen effects
[Diagram: triangle in clip space coordinates with vertices (-1, -1, 0), (-1, 3, 0), (3, -1, 0)]
Full Screen Triangle: C++ code
// Null VB, IB
pd3dImmediateContext->IASetVertexBuffers( 0, 0, NULL, NULL, NULL );
pd3dImmediateContext->IASetIndexBuffer( NULL, (DXGI_FORMAT)0, 0 );
pd3dImmediateContext->IASetInputLayout( NULL );
// Set Shaders
pd3dImmediateContext->VSSetShader( g_pFullScreenVS, NULL, 0 );
pd3dImmediateContext->PSSetShader( … );
pd3dImmediateContext->PSSetShaderResources( … );
pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
// Render 3 vertices for the triangle
pd3dImmediateContext->Draw(3, 0);
Full Screen Triangle: HLSL Code
VSOutput VSFullScreenTest(uint id : SV_VERTEXID)
{
    VSOutput output;
    // generate clip space position
    output.pos.x = (float)(id / 2) * 4.0 - 1.0;
    output.pos.y = (float)(id % 2) * 4.0 - 1.0;
    output.pos.z = 0.0;
    output.pos.w = 1.0;
    // texture coordinates
    output.tex.x = (float)(id / 2) * 2.0;
    output.tex.y = 1.0 - (float)(id % 2) * 2.0;
    // color
    output.color = float4(1, 1, 1, 1);
    return output;
}
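To check the vertex ID arithmetic by hand, the shader math can be mirrored on the CPU. The following is a minimal sketch, not part of the original sample: `ClipVertex`, `fullScreenTriangleVertex`, and `triangleCovers` are hypothetical names. It confirms that IDs 0, 1, and 2 produce one triangle that covers the whole [-1, 1] clip-space square.

```cpp
#include <cstdint>

struct ClipVertex {
    float x, y, z, w;  // clip-space position
    float u, v;        // texture coordinates
};

// CPU-side mirror of the vertex shader arithmetic.
ClipVertex fullScreenTriangleVertex(std::uint32_t id)
{
    ClipVertex out;
    out.x = static_cast<float>(id / 2) * 4.0f - 1.0f;
    out.y = static_cast<float>(id % 2) * 4.0f - 1.0f;
    out.z = 0.0f;
    out.w = 1.0f;
    out.u = static_cast<float>(id / 2) * 2.0f;
    out.v = 1.0f - static_cast<float>(id % 2) * 2.0f;
    return out;
}

// Edge function: negative or zero when p is to the right of (or on) edge a->b.
static float edge(const ClipVertex& a, const ClipVertex& b, float px, float py)
{
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

// True when the clip-space point lies inside or on the generated triangle,
// which is wound clockwise for IDs 0, 1, 2.
bool triangleCovers(float px, float py)
{
    const ClipVertex v0 = fullScreenTriangleVertex(0);
    const ClipVertex v1 = fullScreenTriangleVertex(1);
    const ClipVertex v2 = fullScreenTriangleVertex(2);
    return edge(v0, v1, px, py) <= 0.0f &&
           edge(v1, v2, px, py) <= 0.0f &&
           edge(v2, v0, px, py) <= 0.0f;
}
```

All four corners of the visible square fall inside the triangle, so the parts beyond the square are clipped and no diagonal seam runs across the screen as it would with a two-triangle quad.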
VS Instancing: Point Sprites
● Often done on GS, but can be faster on VS
● Create an SRV point buffer and bind to VS
● Call Draw or DrawIndexed to render the full
triangle list.
● Read the location from the point buffer and
expand to vertex location in quad
● Can be used for particles or Bokeh DOF sprites
● Don’t use DrawInstanced for a small mesh
Point Sprites: C++ Code
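// Bind the particle index buffer; no vertex buffer is needed
pd3dImmediateContext->IASetIndexBuffer( g_pParticleIndexBuffer, DXGI_FORMAT_R32_UINT, 0 );
pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
// Two triangles (6 indices) per sprite
pd3dImmediateContext->DrawIndexed( g_particleCount * 6, 0, 0 );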
Point Sprites: HLSL Code
VSInstancedParticleDrawOut VSIndexBuffer(uint id : SV_VERTEXID)
{
    VSInstancedParticleDrawOut output;
    uint particleIndex = id / 4;
    uint vertexInQuad = id % 4;
    // calculate the position of the vertex
    float3 position;
    position.x = (vertexInQuad % 2) ? 1.0 : -1.0;
    position.y = (vertexInQuad & 2) ? -1.0 : 1.0;
    position.z = 0.0;
    position.xy *= PARTICLE_RADIUS;
    position = mul( position, (float3x3)g_mInvView ) +
               g_bufPosColor[particleIndex].pos.xyz;
    output.pos = mul( float4(position, 1.0), g_mWorldViewProj );
    output.color = g_bufPosColor[particleIndex].color;
    // texture coordinate
    output.tex.x = (vertexInQuad % 2) ? 1.0 : 0.0;
    output.tex.y = (vertexInQuad & 2) ? 1.0 : 0.0;
    return output;
}
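The slides do not show the contents of the particle index buffer. The sketch below is one way it might be built, assuming 4 vertex IDs and 6 indices per particle. The corner order {0, 1, 2, 2, 1, 3} and the names `buildParticleIndices` and `quadCorner` are assumptions for illustration; the corner order was chosen so that both triangles wind clockwise with the corner layout in the shader.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a 32-bit index buffer for particleCount quads.
// Each particle owns vertex IDs 4*p .. 4*p+3 and contributes two triangles.
std::vector<std::uint32_t> buildParticleIndices(std::uint32_t particleCount)
{
    static const std::uint32_t corners[6] = {0, 1, 2, 2, 1, 3};  // assumed order
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(particleCount) * 6);
    for (std::uint32_t p = 0; p < particleCount; ++p)
        for (std::uint32_t c : corners)
            indices.push_back(p * 4 + c);
    return indices;
}

struct Corner { float x, y; };

// Mirrors the shader: the vertex ID selects one of the four quad corners.
Corner quadCorner(std::uint32_t id)
{
    const std::uint32_t vertexInQuad = id % 4;
    Corner c;
    c.x = (vertexInQuad % 2) ? 1.0f : -1.0f;
    c.y = (vertexInQuad & 2) ? -1.0f : 1.0f;
    return c;
}
```

Because only 4 distinct IDs appear per quad, the shared corners can be reused from the post-VS cache, which is consistent with the indexed path measuring faster than the non-indexed one.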
Point Sprite Performance
Render time (ms)
Method          Sprites   AMD Radeon R9 290X   Nvidia Titan
Indexed         500K      0.52                 0.52
Non-Indexed     500K      0.77                 0.87
GS              500K      1.38                 0.83
DrawInstanced   500K      1.77                 5.1
Indexed         1M        1.02                 1.5
Non-Indexed     1M        1.53                 1.92
GS              1M        2.7                  1.6
DrawInstanced   1M        3.54                 10.3
Point Sprite Performance
● DrawIndexed() is the fastest method
● Draw() is slower but doesn’t need an IB
● Don’t use DrawInstanced() for creating sprites on either AMD or Nvidia hardware
● DrawInstanced() is not recommended for meshes with a small number of vertices
Merge Instancing
● Combine multiple meshes that can be
instanced many times
● Better than normal instancing which renders
only one mesh
● Instance nearby meshes for smaller bounding box
● Each mesh is a page in the vertex data
● Fixed vertex count for each mesh
● Meshes smaller than page size use degenerate triangles
Merge Instancing
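[Diagram: Mesh Vertex Data holds fixed-length pages (Mesh Data 0, Mesh Data 1, Mesh Data 2, ...); each page lists its vertices (Vertex 0, Vertex 1, Vertex 2, Vertex 3, ...) and is padded with zeroed entries that form degenerate triangles. Mesh Instance Data maps each instance to a mesh index (Instance 0: Mesh Index 2, Instance 1: Mesh Index 0, ...)]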
Merged Instancing using VS
● Use the vertex ID to look up the mesh to
instance
● All meshes are the same size, so (id / SIZE)
can be used as an offset to the mesh
● Faster than using DrawInstanced()
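The page lookup can be sketched on the CPU as follows. This is an illustration under assumptions, not the implementation measured in the talk: `PAGE_SIZE`, `appendPage`, and `fetchVertex` are hypothetical names, the page size is an arbitrary small value, and the meshes are assumed to be non-indexed triangle lists.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

constexpr std::uint32_t PAGE_SIZE = 6;  // vertices per page (illustrative value)

// Appends one mesh as a fixed-length page. Zeroed padding vertices give
// triangles with three identical corners, which have zero area.
// Returns false if the mesh does not fit in a page.
bool appendPage(std::vector<Vertex>& meshData, const std::vector<Vertex>& mesh)
{
    if (mesh.size() > PAGE_SIZE)
        return false;
    meshData.insert(meshData.end(), mesh.begin(), mesh.end());
    meshData.resize(meshData.size() + (PAGE_SIZE - mesh.size()),
                    Vertex{0.0f, 0.0f, 0.0f});
    return true;
}

struct Fetch {
    std::uint32_t instance;  // which instance this vertex belongs to
    Vertex vertex;           // vertex read from that instance's mesh page
};

// Mirrors the vertex shader lookup: id / PAGE_SIZE picks the instance,
// id % PAGE_SIZE picks the vertex inside that instance's mesh page.
Fetch fetchVertex(std::uint32_t id,
                  const std::vector<std::uint32_t>& instanceMeshIndex,
                  const std::vector<Vertex>& meshData)
{
    const std::uint32_t instance = id / PAGE_SIZE;
    const std::uint32_t vertexInPage = id % PAGE_SIZE;
    const std::uint32_t meshIndex = instanceMeshIndex[instance];
    return Fetch{instance, meshData[meshIndex * PAGE_SIZE + vertexInPage]};
}
```

In this model a single Draw() call would cover `PAGE_SIZE * instanceCount` vertices, with both arrays bound as SRVs. The instance index from `fetchVertex` would also select the per-instance transform.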
Merge Instancing Performance
[Chart: render time (ms) for DrawInstanced vs. Soft Instancing on AMD Radeon R9 290X and Nvidia GTX 780; vertical axis from 0 to 30 ms]
● Instancing performance test by Cloud Imperium Games for Star Citizen
● Renders 13.5M triangles (~40M verts)
● DrawInstanced version calls DrawInstanced() and uses instance data in a vertex buffer
● Soft Instancing version uses vertex instancing with Draw() calls and fetches instance data from an SRV
Vertex Shader UAVs
● Random access Read/Write in a VS
● Can be used to store transformed vertex
data for use in multi-pass algorithms
● Can be used for passing constant
attributes between any shader stage (not
just from VS)
Skinning to UAV
● Skin vertex data then output to UAV
● Instance the skinned UAV data multiple times
● Can also be used for non-instanced data
● Multiple passes can reuse the transformed
vertex data – Shadow map rendering
● Performance is about the same as
stream-out, but you can do more …
Bounding Box to UAV
● Can calculate and store Bbox in the VS
● Use a UAV to store the min/max values (6)
● InterlockedMin/InterlockedMax determine min
and max of the bbox
● Need to use integer values with atomics
● Use the stored bbox in later passes
● GPU physics (collision)
● Tile based processing
Bounding Box: HLSL Code
void UAVBBoxSkinVS(VSSkinnedIn input, uint id : SV_VERTEXID)
{
    // skin the vertex
    . . .
    // output the max and min for the bounding box
    int x = (int) (vSkinned.Pos.x * FLOAT_SCALE); // convert to integer
    int y = (int) (vSkinned.Pos.y * FLOAT_SCALE);
    int z = (int) (vSkinned.Pos.z * FLOAT_SCALE);
    InterlockedMin(g_BBoxUAV[0], x);
    InterlockedMin(g_BBoxUAV[1], y);
    InterlockedMin(g_BBoxUAV[2], z);
    InterlockedMax(g_BBoxUAV[3], x);
    InterlockedMax(g_BBoxUAV[4], y);
    InterlockedMax(g_BBoxUAV[5], z);
    . . .
}
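The same fixed-point min/max reduction can be modeled on the CPU with `std::atomic`. This is a sketch under assumptions: the value of `FLOAT_SCALE` is illustrative, `BBox`, `atomicMin`, `atomicMax`, and `addPoint` are hypothetical names, and the box is assumed to start with its min slots at `INT_MAX` and its max slots at `INT_MIN`. The slides do not show how the UAV is cleared before the pass.

```cpp
#include <atomic>
#include <climits>

constexpr float FLOAT_SCALE = 1000.0f;  // fixed-point scale (illustrative value)

// Six integers: min x, y, z in slots 0..2 and max x, y, z in slots 3..5.
struct BBox {
    std::atomic<int> v[6];
    BBox()
    {
        for (int i = 0; i < 3; ++i) v[i].store(INT_MAX);
        for (int i = 3; i < 6; ++i) v[i].store(INT_MIN);
    }
};

// CPU stand-in for HLSL InterlockedMin, built on compare-and-swap.
void atomicMin(std::atomic<int>& target, int value)
{
    int current = target.load();
    while (value < current && !target.compare_exchange_weak(current, value)) {}
}

// CPU stand-in for HLSL InterlockedMax.
void atomicMax(std::atomic<int>& target, int value)
{
    int current = target.load();
    while (value > current && !target.compare_exchange_weak(current, value)) {}
}

// Folds one position into the box, as each vertex shader invocation would.
void addPoint(BBox& box, float x, float y, float z)
{
    const int p[3] = { static_cast<int>(x * FLOAT_SCALE),
                       static_cast<int>(y * FLOAT_SCALE),
                       static_cast<int>(z * FLOAT_SCALE) };
    for (int i = 0; i < 3; ++i) {
        atomicMin(box.v[i], p[i]);
        atomicMax(box.v[i + 3], p[i]);
    }
}
```

A later pass would divide the stored integers by the same scale to recover floating-point bounds. The cast truncates toward zero, so the recovered box can differ from the exact one by up to one fixed-point step.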
Particle System UAV
● Single pass GPU-only particle system
● In the VS:
● Generate sprites for rendering
● Do Euler integration and update the particle
system state to a UAV
Particle System: HLSL Code
uint particleIndex = id / 4;
uint vertexInQuad = id % 4;
// calculate the new position of the vertex
float3 oldPosition = g_bufPosColor[particleIndex].pos.xyz;
float3 oldVelocity = g_bufPosColor[particleIndex].velocity.xyz;
// Euler integration to find new position and velocity
float3 acceleration = normalize(oldVelocity) * ACCELLERATION;
float3 newVelocity = acceleration * g_deltaT + oldVelocity;
float3 newPosition = newVelocity * g_deltaT + oldPosition;
g_particleUAV[particleIndex].pos = float4(newPosition, 1.0);
g_particleUAV[particleIndex].velocity = float4(newVelocity, 0.0);
// Generate sprite vertices
. . .
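The integration step can be checked on the CPU with a small mirror of the shader arithmetic. This is a minimal sketch: `Float3`, `Particle`, and `integrate` are hypothetical names, and the guard against a zero-length velocity is an addition for the sketch that the shader fragment does not show.

```cpp
#include <cmath>

struct Float3 { float x, y, z; };

struct Particle {
    Float3 pos;
    Float3 velocity;
};

// One step, mirroring the shader: accelerate along the direction of travel,
// update the velocity first, then advance the position with the new velocity.
// The old state is only read and the new state is returned, matching the
// separate read buffer and write UAV in the shader.
Particle integrate(const Particle& old, float acceleration, float deltaT)
{
    const Float3& v = old.velocity;
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    // Guard added for the sketch: a zero velocity has no direction to normalize.
    const float s = (len > 0.0f) ? acceleration / len : 0.0f;

    Particle next;
    next.velocity.x = v.x * s * deltaT + v.x;
    next.velocity.y = v.y * s * deltaT + v.y;
    next.velocity.z = v.z * s * deltaT + v.z;
    next.pos.x = next.velocity.x * deltaT + old.pos.x;
    next.pos.y = next.velocity.y * deltaT + old.pos.y;
    next.pos.z = next.velocity.z * deltaT + old.pos.z;
    return next;
}
```

In the shader, all four vertices of a sprite compute the same new state for their particle, so each of their UAV writes stores identical values.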
Conclusion
● Vertex shader “tricks” can be more
efficient than more commonly used methods
● Use SV_VertexID for smarter instancing
● Sprites
● Merge Instancing
● UAVs add lots of freedom to vertex shaders
● Bounding box calculation
● Single pass VS particle system
Demos
● Particle System
● UAV Skinning
● Bbox
Acknowledgements
● Merge Instancing
● Emil Persson, “Graphics Gems for Games”
SIGGRAPH 2011
● Brendan Jackson, Cloud Imperium
● Thanks to
● Nick Thibieroz, AMD
● Raul Aguaviva (particle system UAV), AMD
● Alex Kharlamov, AMD
Questions
● bill.bilodeau@amd.com

More Related Content

What's hot

Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overheadCass Everitt
 
Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)Tiago Sousa
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 
Star Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingStar Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingumsl snfrzb
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility bufferWolfgang Engel
 
NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017Mark Kilgard
 
OpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesOpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesNarann29
 
Rendering Techniques in Rise of the Tomb Raider
Rendering Techniques in Rise of the Tomb RaiderRendering Techniques in Rise of the Tomb Raider
Rendering Techniques in Rise of the Tomb RaiderEidos-Montréal
 
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...JP Lee
 
Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Ki Hyunwoo
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect AndromedaElectronic Arts / DICE
 
Volumetric Lighting for Many Lights in Lords of the Fallen
Volumetric Lighting for Many Lights in Lords of the FallenVolumetric Lighting for Many Lights in Lords of the Fallen
Volumetric Lighting for Many Lights in Lords of the FallenBenjamin Glatzel
 
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in FrostbitePhysically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in FrostbiteElectronic Arts / DICE
 
OpenGL 3.2 and More
OpenGL 3.2 and MoreOpenGL 3.2 and More
OpenGL 3.2 and MoreMark Kilgard
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Johan Andersson
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteElectronic Arts / DICE
 

What's hot (20)

DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overhead
 
Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Star Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processingStar Ocean 4 - Flexible Shader Managment and Post-processing
Star Ocean 4 - Flexible Shader Managment and Post-processing
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility buffer
 
NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017
 
Beyond porting
Beyond portingBeyond porting
Beyond porting
 
OpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesOpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering Techniques
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
 
Rendering Techniques in Rise of the Tomb Raider
Rendering Techniques in Rise of the Tomb RaiderRendering Techniques in Rise of the Tomb Raider
Rendering Techniques in Rise of the Tomb Raider
 
Hair in Tomb Raider
Hair in Tomb RaiderHair in Tomb Raider
Hair in Tomb Raider
 
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...
 
Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
 
Volumetric Lighting for Many Lights in Lords of the Fallen
Volumetric Lighting for Many Lights in Lords of the FallenVolumetric Lighting for Many Lights in Lords of the Fallen
Volumetric Lighting for Many Lights in Lords of the Fallen
 
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in FrostbitePhysically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
 
OpenGL 3.2 and More
OpenGL 3.2 and MoreOpenGL 3.2 and More
OpenGL 3.2 and More
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 

Similar to Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14

Дмитрий Вовк: Векторизация кода под мобильные платформы
Дмитрий Вовк: Векторизация кода под мобильные платформыДмитрий Вовк: Векторизация кода под мобильные платформы
Дмитрий Вовк: Векторизация кода под мобильные платформыDevGAMM Conference
 
NIR on the Mesa i965 backend (FOSDEM 2016)
NIR on the Mesa i965 backend (FOSDEM 2016)NIR on the Mesa i965 backend (FOSDEM 2016)
NIR on the Mesa i965 backend (FOSDEM 2016)Igalia
 
Beginning direct3d gameprogramming09_shaderprogramming_20160505_jintaeks
Beginning direct3d gameprogramming09_shaderprogramming_20160505_jintaeksBeginning direct3d gameprogramming09_shaderprogramming_20160505_jintaeks
Beginning direct3d gameprogramming09_shaderprogramming_20160505_jintaeksJinTaek Seo
 
Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Johan Andersson
 
Shadow Volumes on Programmable Graphics Hardware
Shadow Volumes on Programmable Graphics HardwareShadow Volumes on Programmable Graphics Hardware
Shadow Volumes on Programmable Graphics Hardwarestefan_b
 
Vectorization on x86: all you need to know
Vectorization on x86: all you need to knowVectorization on x86: all you need to know
Vectorization on x86: all you need to knowRoberto Agostino Vitillo
 
An Introduction to CUDA-OpenCL - University.pptx
An Introduction to CUDA-OpenCL - University.pptxAn Introduction to CUDA-OpenCL - University.pptx
An Introduction to CUDA-OpenCL - University.pptxAnirudhGarg35
 
Shader model 5 0 and compute shader
Shader model 5 0 and compute shaderShader model 5 0 and compute shader
Shader model 5 0 and compute shaderzaywalker
 
OpenGL ES 3.0 2013
OpenGL ES 3.0 2013OpenGL ES 3.0 2013
OpenGL ES 3.0 2013Mindos Cheng
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineNarann29
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I💻 Anton Gerdelan
 
02 direct3 d_pipeline
02 direct3 d_pipeline02 direct3 d_pipeline
02 direct3 d_pipelineGirish Ghate
 
GBM in H2O with Cliff Click: H2O API
GBM in H2O with Cliff Click: H2O APIGBM in H2O with Cliff Click: H2O API
GBM in H2O with Cliff Click: H2O APISri Ambati
 
GL Shading Language Document by OpenGL.pdf
GL Shading Language Document by OpenGL.pdfGL Shading Language Document by OpenGL.pdf
GL Shading Language Document by OpenGL.pdfshaikhshehzad024
 
GDC 2012: Advanced Procedural Rendering in DX11
GDC 2012: Advanced Procedural Rendering in DX11GDC 2012: Advanced Procedural Rendering in DX11
GDC 2012: Advanced Procedural Rendering in DX11smashflt
 
OpenGLES Android Graphics
OpenGLES Android GraphicsOpenGLES Android Graphics
OpenGLES Android GraphicsArvind Devaraj
 
Instancing
InstancingInstancing
Instancingacbess
 
Hpg2011 papers kazakov
Hpg2011 papers kazakovHpg2011 papers kazakov
Hpg2011 papers kazakovmistercteam
 
Trident International Graphics Workshop 2014 1/5
Trident International Graphics Workshop 2014 1/5Trident International Graphics Workshop 2014 1/5
Trident International Graphics Workshop 2014 1/5Takao Wada
 

Similar to Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14 (20)

Дмитрий Вовк: Векторизация кода под мобильные платформы
Дмитрий Вовк: Векторизация кода под мобильные платформыДмитрий Вовк: Векторизация кода под мобильные платформы
Дмитрий Вовк: Векторизация кода под мобильные платформы
 
NIR on the Mesa i965 backend (FOSDEM 2016)
NIR on the Mesa i965 backend (FOSDEM 2016)NIR on the Mesa i965 backend (FOSDEM 2016)
NIR on the Mesa i965 backend (FOSDEM 2016)
 
Beginning direct3d gameprogramming09_shaderprogramming_20160505_jintaeks
Beginning direct3d gameprogramming09_shaderprogramming_20160505_jintaeksBeginning direct3d gameprogramming09_shaderprogramming_20160505_jintaeks
Beginning direct3d gameprogramming09_shaderprogramming_20160505_jintaeks
 
Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!
 
Shadow Volumes on Programmable Graphics Hardware
Shadow Volumes on Programmable Graphics HardwareShadow Volumes on Programmable Graphics Hardware
Shadow Volumes on Programmable Graphics Hardware
 
Vectorization on x86: all you need to know
Vectorization on x86: all you need to knowVectorization on x86: all you need to know
Vectorization on x86: all you need to know
 
An Introduction to CUDA-OpenCL - University.pptx
An Introduction to CUDA-OpenCL - University.pptxAn Introduction to CUDA-OpenCL - University.pptx
An Introduction to CUDA-OpenCL - University.pptx
 
Shader model 5 0 and compute shader
Shader model 5 0 and compute shaderShader model 5 0 and compute shader
Shader model 5 0 and compute shader
 
OpenGL ES 3.0 2013
OpenGL ES 3.0 2013OpenGL ES 3.0 2013
OpenGL ES 3.0 2013
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering Pipeline
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
 
02 direct3 d_pipeline
02 direct3 d_pipeline02 direct3 d_pipeline
02 direct3 d_pipeline
 
GBM in H2O with Cliff Click: H2O API
GBM in H2O with Cliff Click: H2O APIGBM in H2O with Cliff Click: H2O API
GBM in H2O with Cliff Click: H2O API
 
GL Shading Language Document by OpenGL.pdf
GL Shading Language Document by OpenGL.pdfGL Shading Language Document by OpenGL.pdf
GL Shading Language Document by OpenGL.pdf
 
Andes open cl for RISC-V
Andes open cl for RISC-VAndes open cl for RISC-V
Andes open cl for RISC-V
 
GDC 2012: Advanced Procedural Rendering in DX11
GDC 2012: Advanced Procedural Rendering in DX11GDC 2012: Advanced Procedural Rendering in DX11
GDC 2012: Advanced Procedural Rendering in DX11
 
OpenGLES Android Graphics
OpenGLES Android GraphicsOpenGLES Android Graphics
OpenGLES Android Graphics
 
Instancing
InstancingInstancing
Instancing
 
Hpg2011 papers kazakov
Hpg2011 papers kazakovHpg2011 papers kazakov
Hpg2011 papers kazakov
 
Trident International Graphics Workshop 2014 1/5
Trident International Graphics Workshop 2014 1/5Trident International Graphics Workshop 2014 1/5
Trident International Graphics Workshop 2014 1/5
 

More from AMD Developer Central

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesAMD Developer Central
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceAMD Developer Central
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...AMD Developer Central
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozAMD Developer Central
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellAMD Developer Central
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornAMD Developer Central
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevAMD Developer Central
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasAMD Developer Central
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14AMD Developer Central
 

More from AMD Developer Central (20)

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math Libraries
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
 
DirectGMA on AMD’S FirePro™ GPUS
DirectGMA on AMD’S  FirePro™ GPUSDirectGMA on AMD’S  FirePro™ GPUS
DirectGMA on AMD’S FirePro™ GPUS
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
 
TressFX The Fast and The Furry by Nicolas Thibieroz


Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14

  • 1. Vertex Shader Tricks New Ways to Use the Vertex Shader to Improve Performance Bill Bilodeau Developer Technology Engineer, AMD
  • 2. Topics Covered ● Overview of the DX11 front-end pipeline ● Common bottlenecks ● Advanced Vertex Shader Features ● Vertex Shader Techniques ● Samples and Results
  • 3. Graphics Hardware: DX11 Front-End Pipeline ● VS – vertex data ● HS – control points ● Tessellator ● DS – generated vertices ● GS – primitives ● Write to UAV at all stages ● Starting with DX11.1 [Diagram: Input Assembler, Vertex Shader, Hull Shader, Tessellator, Domain Shader, Geometry Shader, Stream Out, with access to CB, SRV, or UAV; repeated hardware blocks, each with Vector GPRs (256 2048-bit registers), a Vector ALU (1 64-way single precision operation every 4 clocks), a Scalar ALU (1 operation every 4 clocks), Scalar GPRs (256 64-bit registers), and a vector/scalar cross communication bus]
  • 4. Bottlenecks - VS ● VS Attributes ● Limit outputs to 4 attributes (AMD) ●This applies to all shader stages (except PS) ● VS Texture Fetches ● Too many texture fetches can add latency ●Especially dependent texture fetches ●Group fetches together for better performance ●Hide latency with ALU instructions
  • 5. Bottlenecks - VS ● Use the caches wisely ● Avoid large vertex formats that waste pre-VS cache space ● DrawIndexed() allows for reuse of processed vertices saved in the post-VS cache ●Vertices with the same index only need to get processed once Vertex Shader Pre-VS Cache (Hides Latency) Input Assembler Post-VS Cache (Vertex Reuse)
  • 6. Bottlenecks - GS ● GS ● Can add or remove primitives ● Adding new primitives requires storing new vertices ●Going off chip to store data can be a bandwidth issue ● Using the GS means another shader stage ●This means more competition for shader resources ●Better if you can do everything in the VS
  • 7. Advanced Vertex Shader Features ● SV_VertexID, SV_InstanceID ● UAV output (DX11.1) ● NULL vertex buffer ● VS can create its own vertex data
  • 8. SV_VertexID ● Can use the vertex id to decide what vertex data to fetch ● Fetch from SRV, or procedurally create a vertex VSOut VertexShader(uint id : SV_VertexID) { float3 vertex = g_VertexBuffer[id]; … }
  • 9. UAV buffers ● Write to UAVs from a Vertex Shader ● New feature in DX11.1 (UAV at any stage) ● Can be used instead of stream-out for writing vertex data ● Triangle output not limited to strips ●You can use whatever format you want ● Can output anything useful to a UAV
  • 10. NULL Vertex Buffer ● DX11/DX10 allows this ● Just set the number of vertices in Draw() ● VS will execute without a vertex buffer bound ● Can be used for instancing ● Call Draw() with the total number of vertices ● Bind mesh and instance data as SRVs
  • 11. Vertex Shader Techniques ● Full Screen Triangle ● Vertex Shader Instancing ● Merged Instancing ● Vertex Shader UAVs
  • 12. Full Screen Triangle ● For post-processing effects ● Triangle has better performance than quad ● Fast and easy with VS generated coordinates ● No IB or VB is necessary ● Something you should be using for full screen effects Clip Space Coordinates (-1, -1, 0) (-1, 3, 0) (3, -1, 0)
  • 13. Full Screen Triangle: C++ code // Null VB, IB pd3dImmediateContext->IASetVertexBuffers( 0, 0, NULL, NULL, NULL ); pd3dImmediateContext->IASetIndexBuffer( NULL, (DXGI_FORMAT)0, 0 ); pd3dImmediateContext->IASetInputLayout( NULL ); // Set Shaders pd3dImmediateContext->VSSetShader( g_pFullScreenVS, NULL, 0 ); pd3dImmediateContext->PSSetShader( … ); pd3dImmediateContext->PSSetShaderResources( … ); pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST ); // Render 3 vertices for the triangle pd3dImmediateContext->Draw(3, 0);
  • 14. Full Screen Triangle: HLSL Code VSOutput VSFullScreenTest(uint id:SV_VERTEXID) { VSOutput output; // generate clip space position output.pos.x = (float)(id / 2) * 4.0 - 1.0; output.pos.y = (float)(id % 2) * 4.0 - 1.0; output.pos.z = 0.0; output.pos.w = 1.0; // texture coordinates output.tex.x = (float)(id / 2) * 2.0; output.tex.y = 1.0 - (float)(id % 2) * 2.0; // color output.color = float4(1, 1, 1, 1); return output; } Clip Space Coordinates (-1, -1, 0) (-1, 3, 0) (3, -1, 0)
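The vertex ID arithmetic on slide 14 can be checked on the CPU. The following C++ sketch mirrors the HLSL expressions for position and texture coordinates; the struct and function names are hypothetical and not part of the original sample.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU-side mirror of the shader output (position and UV only).
struct FullScreenVertex {
    float x, y, z, w; // clip-space position
    float u, v;       // texture coordinates
};

// Same arithmetic as the HLSL on the slide, evaluated for one vertex ID (0, 1 or 2).
FullScreenVertex fullScreenVertex(std::uint32_t id) {
    FullScreenVertex out{};
    out.x = static_cast<float>(id / 2) * 4.0f - 1.0f;
    out.y = static_cast<float>(id % 2) * 4.0f - 1.0f;
    out.z = 0.0f;
    out.w = 1.0f;
    out.u = static_cast<float>(id / 2) * 2.0f;
    out.v = 1.0f - static_cast<float>(id % 2) * 2.0f;
    return out;
}
```

IDs 0, 1 and 2 produce the corners (-1, -1), (-1, 3) and (3, -1) shown on the slide. Across the triangle the texture coordinates follow u = (x + 1) / 2 and v = (1 - y) / 2, so the visible region from -1 to 1 in clip space maps to the 0 to 1 texture range.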
  • 15. VS Instancing: Point Sprites ● Often done on GS, but can be faster on VS ● Create an SRV point buffer and bind to VS ● Call Draw or DrawIndexed to render the full triangle list. ● Read the location from the point buffer and expand to vertex location in quad ● Can be used for particles or Bokeh DOF sprites ● Don’t use DrawInstanced for a small mesh
  • 16. Point Sprites: C++ Code pd3dImmediateContext->IASetIndexBuffer( g_pParticleIndexBuffer, DXGI_FORMAT_R32_UINT, 0 ); pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST ); pd3dImmediateContext->DrawIndexed( g_particleCount * 6, 0, 0);
  • 17. Point Sprites: HLSL Code VSInstancedParticleDrawOut VSIndexBuffer(uint id:SV_VERTEXID) { VSInstancedParticleDrawOut output; uint particleIndex = id / 4; uint vertexInQuad = id % 4; // calculate the position of the vertex float3 position; position.x = (vertexInQuad % 2) ? 1.0 : -1.0; position.y = (vertexInQuad & 2) ? -1.0 : 1.0; position.z = 0.0; position.xy *= PARTICLE_RADIUS; position = mul( position, (float3x3)g_mInvView ) + g_bufPosColor[particleIndex].pos.xyz; output.pos = mul( float4(position,1.0), g_mWorldViewProj ); output.color = g_bufPosColor[particleIndex].color; // texture coordinate output.tex.x = (vertexInQuad % 2) ? 1.0 : 0.0; output.tex.y = (vertexInQuad & 2) ? 1.0 : 0.0; return output; }
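The slides do not show the contents of the particle index buffer. Since the shader derives the particle from id / 4 and the corner from id % 4, and the draw call submits six indices per particle, one plausible layout is sketched below in C++. The function name and the triangle pattern are assumptions made for illustration; the corner order follows the vertexInQuad tests in the shader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a 32-bit index buffer for particleCount quads (6 indices, 4 unique vertices each).
// Assumed corner order: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
// The two-triangle pattern is illustrative; the original sample may order it differently.
std::vector<std::uint32_t> buildParticleIndices(std::uint32_t particleCount) {
    static const std::uint32_t kQuadPattern[6] = {0, 1, 2, 2, 1, 3};
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(particleCount) * 6);
    for (std::uint32_t p = 0; p < particleCount; ++p) {
        for (std::uint32_t k = 0; k < 6; ++k) {
            // Each index encodes the particle (id / 4) and the corner (id % 4).
            indices.push_back(p * 4 + kQuadPattern[k]);
        }
    }
    return indices;
}
```

With this layout each quad references only four distinct index values, so two of the six vertices per quad can be served from the post-VS cache rather than being shaded again.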
  • 18. Point Sprite Performance (time in ms; AMD Radeon R9 290X / Nvidia Titan) ● Indexed, 500K sprites: 0.52 / 0.52 ● Non-indexed, 500K sprites: 0.77 / 0.87 ● GS, 500K sprites: 1.38 / 0.83 ● DrawInstanced, 500K sprites: 1.77 / 5.1 ● Indexed, 1M sprites: 1.02 / 1.5 ● Non-indexed, 1M sprites: 1.53 / 1.92 ● GS, 1M sprites: 2.7 / 1.6 ● DrawInstanced, 1M sprites: 3.54 / 10.3
  • 19. Point Sprite Performance ● DrawIndexed() is the fastest method ● Draw() is slower but doesn’t need an IB ● Don’t use DrawInstanced() for creating sprites on either AMD or NVidia hardware ● Not recommended for a small number of vertices
  • 20. Merge Instancing ● Combine multiple meshes that can be instanced many times ● Better than normal instancing which renders only one mesh ● Instance nearby meshes for smaller bounding box ● Each mesh is a page in the vertex data ● Fixed vertex count for each mesh ●Meshes smaller than page size use degenerate triangles
  • 21. Merge Instancing [Diagram] ● Mesh Vertex Data: Mesh Data 0, Mesh Data 1, Mesh Data 2, ... ● Each mesh is a fixed-length page: Vertex 0, Vertex 1, Vertex 2, Vertex 3, ..., followed by degenerate triangles (0 0 0) ● Mesh Instance Data: Instance 0 uses Mesh Index 2, Instance 1 uses Mesh Index 0, ...
  • 22. Merged Instancing using VS ● Use the vertex ID to look up the mesh to instance ● All meshes are the same size, so (id / SIZE) can be used as an offset to the mesh ● Faster than using DrawInstanced()
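The lookup described on slides 21 and 22 can be sketched on the CPU as follows. This is a minimal illustration, assuming that the vertex ID divided by the page size selects the instance and that the instance record then supplies the mesh page; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-instance record; the slides only show a mesh index per instance.
struct InstanceData {
    std::uint32_t meshIndex; // which fixed-size page of the merged vertex data to draw
};

// Returns the location in the merged vertex data for a given vertex ID.
// pageSize is the fixed vertex count per mesh page (SIZE on the slide).
std::uint32_t mergedVertexLocation(std::uint32_t vertexId,
                                   std::uint32_t pageSize,
                                   const std::vector<InstanceData>& instances) {
    const std::uint32_t instance = vertexId / pageSize;     // which instance
    const std::uint32_t vertexInPage = vertexId % pageSize; // which vertex within the page
    return instances[instance].meshIndex * pageSize + vertexInPage;
}
```

Using the instance table from slide 21 (instance 0 draws mesh 2, instance 1 draws mesh 0) with a page size of 3, vertex IDs 0 to 2 read locations 6 to 8 and vertex IDs 3 to 5 read locations 0 to 2. In a shader the same instance index would also fetch the per-instance transform from an SRV.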
  • 23. Merge Instancing Performance ● Instancing performance test by Cloud Imperium Games for Star Citizen ● Renders 13.5M triangles (~40M verts) ● DrawInstanced version calls DrawInstanced() and uses instance data in a vertex buffer ● Soft Instancing version uses vertex instancing with Draw() calls and fetches instance data from SRV [Bar chart: time in ms, axis 0 to 30, DrawInstanced vs. Soft Instancing on AMD Radeon R9 290X and Nvidia GTX 780]
  • 24. Vertex Shader UAVs ● Random access Read/Write in a VS ● Can be used to store transformed vertex data for use in multi-pass algorithms ● Can be used for passing constant attributes between any shader stage (not just from VS)
  • 25. Skinning to UAV ● Skin vertex data then output to UAV ● Instance the skinned UAV data multiple times ● Can also be used for non-instanced data ● Multiple passes can reuse the transformed vertex data – Shadow map rendering ● Performance is about the same as stream-out, but you can do more …
  • 26. Bounding Box to UAV ● Can calculate and store Bbox in the VS ● Use a UAV to store the min/max values (6) ● InterlockedMin/InterlockedMax determine min and max of the bbox ●Need to use integer values with atomics ● Use the stored bbox in later passes ● GPU physics (collision) ● Tile based processing
  • 27. Bounding Box: HLSL Code void UAVBBoxSkinVS(VSSkinnedIn input, uint id:SV_VERTEXID ) { // skin the vertex . . . // output the max and min for the bounding box int x = (int) (vSkinned.Pos.x * FLOAT_SCALE); // convert to integer int y = (int) (vSkinned.Pos.y * FLOAT_SCALE); int z = (int) (vSkinned.Pos.z * FLOAT_SCALE); InterlockedMin(g_BBoxUAV[0], x); InterlockedMin(g_BBoxUAV[1], y); InterlockedMin(g_BBoxUAV[2], z); InterlockedMax(g_BBoxUAV[3], x); InterlockedMax(g_BBoxUAV[4], y); InterlockedMax(g_BBoxUAV[5], z); . . .
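The integer conversion and the min/max atomics from slides 26 and 27 can be mirrored on the CPU. The sketch below uses std::atomic with compare-and-swap loops as stand-ins for InterlockedMin and InterlockedMax. The scale value, the function names, and the reset step are assumptions: the slides do not give FLOAT_SCALE or show how the UAV is initialised, but the six values need a starting state before any vertex is accumulated.

```cpp
#include <atomic>
#include <cassert>
#include <limits>

// Assumed fixed-point scale; the value of FLOAT_SCALE is not given on the slide.
constexpr float kFloatScale = 1000.0f;

// CPU stand-in for HLSL InterlockedMin on a signed integer.
void atomicMin(std::atomic<int>& target, int value) {
    int current = target.load();
    // compare_exchange_weak refreshes 'current' on failure, so the loop retries
    // only while 'value' is still smaller than the stored value.
    while (value < current && !target.compare_exchange_weak(current, value)) {
    }
}

// CPU stand-in for HLSL InterlockedMax on a signed integer.
void atomicMax(std::atomic<int>& target, int value) {
    int current = target.load();
    while (value > current && !target.compare_exchange_weak(current, value)) {
    }
}

// bbox holds six values: min x, y, z followed by max x, y, z.
void resetBBox(std::atomic<int>* bbox) {
    for (int i = 0; i < 3; ++i) bbox[i].store(std::numeric_limits<int>::max());
    for (int i = 3; i < 6; ++i) bbox[i].store(std::numeric_limits<int>::min());
}

// Converts one skinned position to integers and folds it into the bounding box.
void addPoint(std::atomic<int>* bbox, float x, float y, float z) {
    const int p[3] = {static_cast<int>(x * kFloatScale),
                      static_cast<int>(y * kFloatScale),
                      static_cast<int>(z * kFloatScale)};
    for (int i = 0; i < 3; ++i) {
        atomicMin(bbox[i], p[i]);
        atomicMax(bbox[i + 3], p[i]);
    }
}
```

A later pass would divide the stored integers by the same scale to recover floating-point bounds. The scale trades range for precision, so it has to be chosen to suit the extent of the scene.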
  • 28. Particle System UAV ● Single pass GPU-only particle system ● In the VS: ● Generate sprites for rendering ● Do Euler integration and update the particle system state to a UAV
  • 29. Particle System: HLSL Code uint particleIndex = id / 4; uint vertexInQuad = id % 4; // calculate the new position of the vertex float3 oldPosition = g_bufPosColor[particleIndex].pos.xyz; float3 oldVelocity = g_bufPosColor[particleIndex].velocity.xyz; // Euler integration to find new position and velocity float3 acceleration = normalize(oldVelocity) * ACCELLERATION; float3 newVelocity = acceleration * g_deltaT + oldVelocity; float3 newPosition = newVelocity * g_deltaT + oldPosition; g_particleUAV[particleIndex].pos = float4(newPosition, 1.0); g_particleUAV[particleIndex].velocity = float4(newVelocity, 0.0); // Generate sprite vertices . . .
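The integration step on slide 29 updates the velocity first and then advances the position with the new velocity. A CPU-side C++ sketch of the same arithmetic is shown below. The type and function names are hypothetical, accelMagnitude stands in for the shader's ACCELLERATION constant, and the zero-velocity guard is an addition of this sketch, since normalizing a zero-length vector has no meaningful result.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

struct ParticleState {
    Vec3 pos;
    Vec3 vel;
};

// One integration step: accelerate along the current direction of travel,
// then move the particle using the updated velocity.
ParticleState integrate(const ParticleState& p, float accelMagnitude, float dt) {
    const float len = std::sqrt(p.vel.x * p.vel.x + p.vel.y * p.vel.y + p.vel.z * p.vel.z);
    // Guard added here: a stationary particle receives no acceleration.
    const float inv = (len > 0.0f) ? 1.0f / len : 0.0f;
    const Vec3 accel = {p.vel.x * inv * accelMagnitude,
                        p.vel.y * inv * accelMagnitude,
                        p.vel.z * inv * accelMagnitude};
    ParticleState out;
    out.vel = {p.vel.x + accel.x * dt, p.vel.y + accel.y * dt, p.vel.z + accel.z * dt};
    out.pos = {p.pos.x + out.vel.x * dt, p.pos.y + out.vel.y * dt, p.pos.z + out.vel.z * dt};
    return out;
}
```

In the shader, every vertex of a quad shares the same particleIndex and therefore computes and writes the same new state, so the repeated UAV writes for one particle carry identical values.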
  • 30. Conclusion ● Vertex shader “tricks” can be more efficient than more commonly used methods ● Use SV_VertexID for smarter instancing ● Sprites ● Merge Instancing ● UAVs add lots of freedom to vertex shaders ● Bounding box calculation ● Single pass VS particle system
  • 31. Demos ● Particle System ● UAV Skinning ● Bbox
  • 32. Acknowledgements ● Merge Instancing ● Emil Persson, “Graphics Gems for Games” SIGGRAPH 2011 ● Brendan Jackson, Cloud Imperium ● Thanks to ● Nick Thibieroz, AMD ● Raul Aguaviva (particle system UAV), AMD ● Alex Kharlamov, AMD

Editor's Notes

  1. The value of SV_VertexID depends on the draw call. For a non-indexed Draw(), the vertex ID starts at 0 and increments by 1 for every vertex processed by the shader. For DrawIndexed(), the vertex ID is the value read from the index buffer for that vertex.
  2. For indexed draw calls, create an index buffer in which each entry is (instance index * vertsPerMesh + index value). That way you can calculate (vertexID / vertsPerMesh) to get the instance index, and (vertexID % vertsPerMesh) to get the index value, which you can use to look up the vertex.
  3. If the mesh is being reused many times, then calculating the bounding box has little overhead. The bounding box can be used for collision detection.
  4. Could read and write from the UAV instead of binding an input SRV