A Scalable Real-Time Many-Shadowed-Light Rendering System
Bo Li
Warner Bros. Games Montréal
© 2019 WB Games Montreal Inc. 1
Motivations
There’s no un-shadowed light in the real world
(Unless you are a Quantum Physicist Player)
Our System
Multi-Resolution Shadow Map Pool
• Lots of Pre-Allocated Shadow Textures
• Each subsequent level has 1/4 the resolution (half the width and height) and 4x the number of textures
• Ideally constant #texels per-level
• Goal: Target constant Pixel/Texel Ratio (1:1)
• The smaller the screen-projected size, the more texture slots are available
• Each Light allocates its best resolution
Shadow Pool: In practice
• PlayStation 4 / Xbox One:
• Max 128 Textures Per-Level
• First Level: [2048x2048]x4
• Last Level: [32x32]x128
• 156 MB, 596 Textures
• Allocation:
• Search with Max-Desired Resolution
• If no free texture is found, search the next (lower-resolution) levels
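The level layout and the fallback search can be sketched on the CPU as follows. This is a minimal illustration, not the shipped allocator: `build_levels`, `ShadowPool`, `allocate` and `release` are hypothetical names, and the only inputs taken from the slide are the 2048x2048 x4 first level, the 32x32 last level and the cap of 128 textures per level.

```python
MAX_TEXTURES_PER_LEVEL = 128  # per-level cap quoted on the slide


def build_levels(first_res=2048, first_count=4, last_res=32):
    """Return [(resolution, texture_count)], highest resolution first."""
    levels = []
    res, count = first_res, first_count
    while res >= last_res:
        levels.append((res, min(count, MAX_TEXTURES_PER_LEVEL)))
        res //= 2    # half the width and height: 1/4 of the texels
        count *= 4   # 4x the textures, until the per-level cap applies
    return levels


class ShadowPool:
    """Hypothetical pool that only tracks free slot ids per level."""

    def __init__(self):
        self.levels = build_levels()
        self.free = [list(range(count)) for _, count in self.levels]

    def allocate(self, desired_res):
        """Start at the desired resolution, then fall back to lower levels."""
        for i, (res, _) in enumerate(self.levels):
            if res <= desired_res and self.free[i]:
                return res, self.free[i].pop(0)
        return None  # pool exhausted

    def release(self, res, slot):
        for i, (level_res, _) in enumerate(self.levels):
            if level_res == res:
                self.free[i].append(slot)


pool = ShadowPool()
total_textures = sum(count for _, count in pool.levels)
grants = [pool.allocate(2048) for _ in range(5)]
```

With these inputs the level counts add up to the 596 textures quoted on the slide, and the fifth 2048 request falls back to a 1024 slot because the first level only holds four textures.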
(Figure: pool allocation diagram; legend: Free, Occupied, Request)
Our System
GPU Shadow Map Compression
• Motivations
• Skip runtime static shadow rendering
• Parallelism: Overlap Compute/Graphics pipeline
• Minimize Size
• Challenges
• CS based compression/decompression: parallel irregular data structure
• Floating point precision
• Limited data format on GPU TGSM: 32bit data size
• Lossy but conservative errors (No hand-tweaking Depth/Slope-Bias)
GPU Shadow Map Compression: Data Flow
• Input: RGBA16F (XYZ Plane + Raw Depth, Inverted Near/Far)
• 32x32 Tiles: Encode each Quad, either Depth Plane or Packed float4 (256 Packed Quads, 32bits, shared FP16 exponent)
• Sorting Packed Quads (CodeBooks). Very important for depth-tested shadows: Can re-use depth plane
• CodeBook Compaction/Merge
• Generate Quad Indices to compacted CodeBook (1 Byte/Index), Conservative error adjustment
• Encode Sparse QuadTree and output
• Output: D16 Linear-Z
• Storage
GPU Shadow Map Compression: Result
• Designed to handle Alpha-Testing: Unique depth planes shared
• Compression Ratio:
• Typically anywhere between 7:1 and 100:1
• Worst case 1.45:1 (pure noise input), best case 512:1 (single depth plane/tile)
• Can expect 20:1 or better on average; larger textures compress better
• Decompression Speed:
• 0.048ms for 1024x1024 on PS4 Base
• Close to hardware pixel fill-rate
• Compression Speed:
• 0.36ms for 1024x1024 on PS4 Base (unoptimized)
• Todo: use LaneSwizzle instruction for sorting/scan
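The "close to hardware pixel fill-rate" remark can be checked with a little arithmetic. The timings below come from the slide; the fill-rate figure is an assumption that is not in the deck (32 ROPs at 800 MHz, one pixel per ROP per clock, as commonly quoted for the base PS4), so treat the final ratio as indicative only.

```python
TEXELS = 1024 * 1024  # one 1024x1024 shadow map


def gigatexels_per_second(texels, milliseconds):
    """Throughput in 1e9 texels per second for a pass of the given duration."""
    return texels / (milliseconds * 1e-3) / 1e9


decompress_rate = gigatexels_per_second(TEXELS, 0.048)  # timing from the slide
compress_rate = gigatexels_per_second(TEXELS, 0.36)     # timing from the slide

# Assumption, not from the slides: 32 ROPs x 0.8 GHz, one pixel per clock each.
ASSUMED_FILL_RATE = 32 * 0.8  # Gpixels/s
fraction_of_fill_rate = decompress_rate / ASSUMED_FILL_RATE
```

Under that assumption decompression runs at roughly 85% of peak fill-rate, while the unoptimized compression pass is about 7.5x slower than decompression.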
Shadow Map Compression: Quality
Original Shadow Map
Static Shadow Map Compression: Quality
Compressed Shadow Map(7.15 : 1)
Our System
Dynamic Shadow Pass
• Goal: Minimize overhead
• Full Depth Copy
• Full Depth Clear / Depth Decompression
• Traditional Options:
• Full Shadow Copy From Static to Dynamic: Slow, High Fixed Cost
• Re-Generate Static Shadow: Very high CPU Cost
• Sample Both Static and Dynamic Shadow Maps: High Filtering Cost
• But Dynamic-Only Shadow is often highly-sparse in texture space
• Full shadow map copy/merging is undesirable
• Double filtering cost is undesirable
Separated Dynamic Shadow: Example
• A typical shadow map layout with dynamic interactions:
+ x =
Static Dynamic-Mask Dynamic Equivalent
512x512 32x32 1024 x 1024
Conservative Dynamic-Mask
• Filtering: Check Dynamic-Mask Texture Once for the Entire Kernel
• Unbound (TextureIndex == -1): Static-Only
• Texel false: Static-Only
• Texel true: Dynamic and Static
• Dynamic-Mask must be conservative, covering the whole filter kernel
Conservative Dynamic-Mask
• Bound as UAV on Pixel Shader, no AtomicOp needed on Current-Gen Consoles
• R8_UNORM On PS4/Xbox, R32_UINT On PC (or check “UAV Typed Load” in DX11.3)
• Extrapolate the position from the center of a four-pixel quad by the shadow filtering kernel radius
//Even pixels move by -radius, odd pixels by +radius (integer cast added so that "& 1" is valid)
float2 ConservativeOffset = ((uint2(SvPosition.xy) & 1) - 0.5f) * Max_Filer_Kernel_Size * 2;
uint2 LowResCoord = (SvPosition.xy + ConservativeOffset) / 64.f;
if (ShadowLowResFlags[LowResCoord] == 0) //Avoid some write contentions on some HW
    ShadowLowResFlags[LowResCoord] = 1;
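A CPU-side sketch of the same idea may help to see why the mask stays conservative. It mirrors the shader snippet above under a few assumptions: `mask_coord`, `mark_dynamic` and `needs_dynamic_shadow` are hypothetical names, the clamp at the texture border is added for the sketch, and an unbound mask is modelled as `None`.

```python
MASK_CELL = 64  # shadow texels per mask texel, from the "/ 64.f" in the shader


def mask_coord(px, py, kernel_radius, mask_size):
    """Mask texel hit by shadow pixel (px, py), pushed outward by the kernel radius."""
    coords = []
    for p in (px, py):
        sv_position = p + 0.5                         # pixel centre, like SV_Position
        offset = ((p & 1) - 0.5) * kernel_radius * 2  # -radius on even, +radius on odd
        cell = int((sv_position + offset) // MASK_CELL)
        coords.append(min(max(cell, 0), mask_size - 1))  # border clamp (assumption)
    return tuple(coords)


def mark_dynamic(mask, px, py, kernel_radius):
    """Shadow-pass side: flag the mask texel, skipping redundant writes."""
    x, y = mask_coord(px, py, kernel_radius, len(mask))
    if mask[y][x] == 0:
        mask[y][x] = 1


def needs_dynamic_shadow(mask, px, py):
    """Filtering side: a single mask fetch decides for the whole kernel."""
    return mask is not None and mask[py // MASK_CELL][px // MASK_CELL] == 1


mask = [[0] * 32 for _ in range(32)]   # 32x32 mask for a 2048x2048 shadow map
mark_dynamic(mask, 62, 10, 3)          # even pixel: extends towards cell 0
mark_dynamic(mask, 63, 10, 3)          # odd pixel: extends into cell 1
```

The two pixels of one quad sit just left of a cell boundary, yet the odd pixel flags the neighbouring cell as well, so a kernel centred in cell 1 that reaches back into these texels still takes the dynamic path.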
Depth Partial Decompression(PS4/X1)
• Minimized Overhead
• Only ~0.03ms overhead for 2k x 2k shadow pass(PS4)
• No full depth copy
• No “slow” depth clear
• Partial depth decompression with dynamic mask
• Use Dynamic-Mask for partial decompression
• Generate Rect-List Based on Dynamic-Mask
2048x2048 Depth (Example) | Cost | Variance
Full-Decompress | 65.8us | Fixed
Partial-Decompress | 9.6us | Data-Dependent
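One plausible way to "generate a Rect-List based on the Dynamic-Mask" is to merge horizontal runs of set mask texels into rectangles. The sketch below assumes that approach and a 64-texel mask cell (taken from the mask shader snippet); `mask_to_rects` is a hypothetical helper, not the deck's implementation.

```python
def mask_to_rects(mask, cell=64):
    """Return (x, y, width, height) rects in shadow texels covering set mask cells."""
    rects = []
    for row, line in enumerate(mask):
        col = 0
        while col < len(line):
            if line[col]:
                start = col
                while col < len(line) and line[col]:
                    col += 1  # extend the run to the right
                rects.append((start * cell, row * cell, (col - start) * cell, cell))
            else:
                col += 1
    return rects


example_mask = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
rects = mask_to_rects(example_mask)
```

Only these rectangles would need decompression before dynamic casters are rendered, which is consistent with the cost being data-dependent.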
Robust Depth-Bias
• Uniform depth bias will always fail at unbounded depth slope
• Only used to correct rounding errors
• SlopeBias = Filter_Kernel_Radius
• Geometrically based: (Max Variance Per-Pixel) * Width
• No User Input Needed
• HW:
• RasterizerDesc.DepthBias = Epsilon; //(1 is a good epsilon choice)
• RasterizerDesc.SlopeScaledDepthBias = Max_Filer_Kernel_Size; //(ex 3.0f)
• Note: HW implements max(abs(ddx(z)), abs(ddy(z))), so you might want to use a larger value
• SW:
• ShadowDepth += Epsilon / 65535.f; //For R16_Depth
• ShadowDepth += (abs(ddx(ShadowDepth)) + abs(ddy(ShadowDepth))) * Max_Filer_Kernel_Size;
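The two variants can be compared numerically. The functions below are a simplified model written for this comparison: `software_bias` follows the SW bullets above, while `hardware_style_bias` only captures the max-of-slopes behaviour mentioned in the HW note and is not an exact model of any rasterizer.

```python
def software_bias(depth, ddx, ddy, kernel_size, epsilon=1.0):
    """SW path from the slide: constant epsilon plus summed absolute slopes."""
    depth += epsilon / 65535.0                     # one D16 step for rounding errors
    depth += (abs(ddx) + abs(ddy)) * kernel_size   # slope term scaled by the kernel size
    return depth


def hardware_style_bias(depth, ddx, ddy, kernel_size, epsilon=1.0):
    """Simplified HW model: the slope term uses the larger of the two slopes."""
    return depth + epsilon / 65535.0 + max(abs(ddx), abs(ddy)) * kernel_size


sw = software_bias(0.5, 0.001, -0.002, 3.0)
hw = hardware_style_bias(0.5, 0.001, -0.002, 3.0)
```

Because the sum of absolute slopes is never smaller than their maximum, the SW form is at least as conservative as the HW form for the same kernel size, which is why a larger slope value may be needed on the HW path.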
Robust Depth-Bias: D16
Depth Bias = 0.005
Slope Bias = 0.0
Depth Bias = 1.0 / 65536
Slope Bias = FILTER_KERNEL_RADIUS
Shadow Acne
Missing contact
Our System
Tiled-Deferred-Shadow
• Shader Occupancy
• Simpler/Small code: Less VGPRs
• Separate Deferred Spot / Point: Even less VGPRs, Less cache thrashing
• 70% occupancy on GCN
• Bindless Shadow Map Table: Single Pass Projection
• PC: Use DX12 Binding Spaces: Requires SM5.1, Supported on most DX11 GPUs
• Texture2D SpotShadows[] : register(t0, space2);
• TextureCube PointShadows[] : register(t0, space3);
• DispatchIndirect() after Light-List Generation
Tiled-Deferred Shadow: Selective Sample Test
• Deferred shadow more sensitive to False-Positives
• Eating up shadow output channels very quickly
• Fighting Depth-Complexity: How Conservative?
• Near/Far Bounding Boxes + Selective depth samples test
• Cull the light if no depth sample touches it
• Trade off between precision and speed easily
Light List Culling Performance (PS4 Base)
2AABB (Bounding Box) Only | 0.30ms
2AABB + 16 Depth Sample Test | 0.41ms
2AABB + 64 Depth Sample Test | 0.57ms
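The difference between the box test and the selective sample test can be illustrated in one dimension. This sketch reduces both the light and the tile to view-depth intervals, which is a deliberate simplification (the real test works on 3D bounds); all function names are hypothetical.

```python
def overlaps(a_min, a_max, b_min, b_max):
    """True when two closed depth intervals intersect."""
    return a_min <= b_max and b_min <= a_max


def cull_by_boxes(light_range, near_box, far_box):
    """2AABB test: cull only if the light misses both the near and the far box."""
    return not (overlaps(*light_range, *near_box) or overlaps(*light_range, *far_box))


def cull_by_samples(light_range, depth_samples):
    """Selective sample test: cull if no sampled depth lies inside the light."""
    lo, hi = light_range
    return not any(lo <= z <= hi for z in depth_samples)


def light_is_culled(light_range, near_box, far_box, depth_samples):
    return (cull_by_boxes(light_range, near_box, far_box)
            or cull_by_samples(light_range, depth_samples))


near_box, far_box = (1.0, 6.0), (40.0, 50.0)
samples = [1.0, 1.2, 6.0, 40.0, 45.0, 50.0]  # a few depths picked from the tile
light_in_gap = (3.0, 4.0)    # inside the near box, but between actual surfaces
light_on_wall = (44.0, 46.0)
```

Here `light_in_gap` survives the box test but is rejected by the sample test, the kind of false positive the slide is concerned with. More samples reduce the chance of wrongly culling a light that touches an unsampled pixel, at the cost shown in the table.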
Tiled-Deferred Shadow: Selective Sample Test
• Fighting Depth-Complexity: Comparison
(Figure panels: Deferred Lighting Output | Bounding Box Culling Only | Bounding Box Culling + Selective Sample Test)
Our System
Deferred-Shadow Mask Challenges
• Large Data between Deferred-Shadow -> Deferred-Shading
• Motivation: Targeting 4K+
• (4K) * (16 Shadow Per-Pixel) x (8 Bits) ~= 128 MB
• Better(but naïve) Solution
• Lower precision masks + Temporal AA
• As Low as 2 bit is acceptable
• 128MB -> 32MB
• We can do better than 1bit
• 128MB -> 12MB
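The sizes on this slide follow from a short calculation. The sketch assumes "4K" means 3840x2160 and reports binary megabytes (MiB); the 0.75 figure is the 12 bits per 4x4 block of the VQ scheme shown later in the deck.

```python
WIDTH, HEIGHT = 3840, 2160   # assumed meaning of "4K"
SHADOWS_PER_PIXEL = 16       # from the slide


def mask_size_mib(bits_per_mask):
    """Size of the full-screen shadow mask buffer in MiB."""
    bits = WIDTH * HEIGHT * SHADOWS_PER_PIXEL * bits_per_mask
    return bits / 8 / (1024 * 1024)


sizes = {bits: mask_size_mib(bits) for bits in (8, 2, 1, 0.75)}
vq_bits_per_pixel = 12 / 16  # 12-bit index per 4x4 block
```

This gives roughly 126.6, 31.6, 15.8 and 11.9 MiB, in line with the rounded 128 MB, 32 MB and 12 MB figures above.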
Deferred-Shadow Mask Compression
• Block compression: Vector-Quantization (VQ) instead of Pixel Quantization
• 4x4 Pixel Block, 4096 CodeBooks (Offline-Data-Trained Patterns)
• Output best matching Indices, 12bits/Block (0.75 bit per pixel)
Encoder: Input -> Best Match (4096 CodeBooks) -> 12bit Indices
Decoder: 12bit Indices -> Lookup (4096 CodeBooks) -> Output
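The encoder/decoder pair can be illustrated with a brute-force search over a toy codebook. The four patterns below are hand-made for the example (the deck uses 4096 patterns trained offline), and `sad`, `vq_encode` and `vq_decode` are hypothetical names.

```python
def sad(a, b):
    """Sum of absolute differences between two 16-value blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def vq_encode(block, codebook):
    """Index of the codebook pattern with the smallest matching error."""
    return min(range(len(codebook)), key=lambda i: sad(block, codebook[i]))


def vq_decode(index, codebook):
    """Decoding is a plain table lookup."""
    return codebook[index]


# Toy codebook: 4 hand-made 4x4 patterns, stored row by row (0 = shadowed, 255 = lit).
codebook = [
    [0] * 16,                # fully shadowed
    [255] * 16,              # fully lit
    [0, 0, 255, 255] * 4,    # vertical shadow edge
    [0] * 8 + [255] * 8,     # horizontal shadow edge
]

block = [10, 5, 250, 240] * 4   # noisy vertical edge
index = vq_encode(block, codebook)
restored = vq_decode(index, codebook)
```

The block is stored as a single index and reconstructed as the clean vertical-edge pattern; the remaining error is what TAA is expected to hide.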
Deferred-Shadow Mask: Optimization
• Skip fully black/white blocks (WaveBallot)
• Search Tree: TSVQ
• Tree-Structured-Vector-Quantization: O(log(n))
• Full, Balanced Quad-Tree, 6 levels (4 ^ 6 = 4096 CodeBooks)
• MSAD4 (AMD GCN: v_msad_u8)
• Multimedia instruction
• Accumulate 4 byte matching errors in one instruction
• LaneSwizzle (AMD GCN: ds_swizzle_b32)
• Fast exchanging data between threads
• No TGSM (Thread-Group-Shared-Memory)
• Cost: ~7% deferred shadow pass
VQ Compression Code (Per-Pixel Frequency)
uint CompressVQ(float Shadow, uint2 GTid : SV_GroupThreadID, uint GroupIndex : SV_GroupIndex)
{
    //Quantize to [1, 255] and shift into this pixel's byte; 0 is special number for msad
    uint SrcPixel = uint(Shadow * 254.99f + 1.f) << ((GTid.x % 4) * 8);
    SrcPixel |= LaneSwizzle(SrcPixel, 0x1F, 0, 0x1);
    SrcPixel |= LaneSwizzle(SrcPixel, 0x1F, 0, 0x2); //Collected 4 neighbor pixels
    uint CurrIndex = -1; //Wraps to 0 on the first "* 4 + 4" below
    [unroll]
    for (int i = 0; i < 6; i++) //CodeBook size 4096=4^6
    {
        CurrIndex = CurrIndex * 4 + 4; //QuadTree next level
        uint MatchErr = msad(SrcPixel, uint2(VQCodeBookBuffer[(CurrIndex + GTid.x % 4) * 4 + (GTid.y % 4)], 0), 0);
        MatchErr += LaneSwizzle(MatchErr, 0x1F, 0, THREADGROUP_SIZEX); //Accum next line
        MatchErr += LaneSwizzle(MatchErr, 0x1F, 0, THREADGROUP_SIZEX << 1); //Accum 2 lines away
        uint MatchErr_Index = (MatchErr << 8) | (GTid.x % 4); //Pack index for deterministic order
        MatchErr_Index = min(MatchErr_Index, LaneSwizzle(MatchErr_Index, 0x1F, 0, 0x1));
        MatchErr_Index = min(MatchErr_Index, LaneSwizzle(MatchErr_Index, 0x1F, 0, 0x2));
        CurrIndex += MatchErr_Index & 0xf; //Broadcasted best matching of the four children
    }
    return CurrIndex;
}
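For readers without access to GCN cross-lane instructions, the tree descent can be followed in a scalar CPU sketch. It keeps the index arithmetic of the shader (the children of node n sit at 4n+4 to 4n+7, starting from -1) and breaks ties towards the lowest child, like the packed `min` does. The node patterns, the two-level depth and the names `tsvq_search` and `first_leaf_index` are assumptions made for the example.

```python
def sad(a, b):
    """Sum of absolute differences between two 16-value blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def tsvq_search(block, nodes, depth):
    """Walk a full 4-ary tree, picking the best of four children per level."""
    index = -1
    for _ in range(depth):
        first_child = index * 4 + 4                  # same step as the shader loop
        errors = [(sad(block, nodes[first_child + c]), c) for c in range(4)]
        index = first_child + min(errors)[1]         # ties go to the lowest child
    return index


def first_leaf_index(depth):
    """Number of nodes above the last level: 4 + 16 + ... + 4^(depth-1)."""
    return (4 ** depth - 4) // 3


# Toy two-level tree with flat grey patterns (20 nodes, 16 leaves).
nodes = [None] * 20
for n in range(4):
    nodes[n] = [n * 64] * 16
    for c in range(4):
        nodes[4 * n + 4 + c] = [n * 64 + c * 16] * 16

found = tsvq_search([140] * 16, nodes, depth=2)
leaf_id = found - first_leaf_index(2)
```

With six levels the search evaluates 24 patterns instead of 4096. In this layout the returned node index is offset by `first_leaf_index(6)`, which is 1364, so it would need that subtraction to fit a 12-bit index; whether the original code does this elsewhere is not shown on the slide.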
Light Channel: 4 Bits (4 Bits / Mask + TAA)
Light Channel: 2 Bits (2 Bits / Mask + TAA)
Light Channel: 1 Bit (1 Bit / Mask + TAA)
Light Channel: VQ Compressed 0.75 Bit (0.75 Bit VQ / Mask + TAA)
Light Channel: 4 Bits (4 Bits / Mask + TAA)
Performance: Static Camera
• Unannounced Project, running on PS4 Base
• High-poly un-optimized meshes in BasePass
• 2507 shadowed lights in the scene
• 0.44ms Shadow Depth, 0.40ms Deferred Shadow
Performance: Moving Camera
Conclusions
• Benefits:
• Shippable on Current-Gen Consoles (PS4/Xbox One/DX12 API)
• Thousands of Shadowed Lights in Large Environments
• Significantly more Stable framerate
• Minimal Shadow-popping
• Minimized Run-time Memory Allocations
• Supports Shadowed Volumetric and Transparency Lighting for Local Lights
• Challenges:
• Vertex-Animated Static-Mesh (E.g. Trees): Static or Dynamic?
• Currently switched to dynamic with high resolution shadow, otherwise cached
• Bake stateless texture space animation?
• References
• https://developer.amd.com/wordpress/media/2012/10/AMD_Southern_Islands_Instruction_Set_Architecture.pdf
• https://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
• Thanks:
• Zaratsyan, Art
• Béliveau, Jimmy
• Lassonde, Gabriel
• Turcotte, Sebastien
• Fatnassi, Sammy
• Wu, Shan
We’re HIRING
Questions
https://youtu.be/lyYpFVB_-fI

Final project report on grocery store management system..pdf
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 

A Scalable Real-Time Many-Shadowed-Light Rendering System

  • 1. A Scalable Real-Time Many-Shadowed-Light Rendering System. Bo Li, Warner Bros. Games Montréal. © 2019 WB Games Montreal Inc.
  • 2. © 2019 WB Games Montreal Inc.
  • 3. Motivations: There’s no un-shadowed light in the real world (unless you are a Quantum Physicist Player)
  • 4. Our System
  • 5. Our System
  • 6. Multi-Resolution Shadow Map Pool
      • Lots of Pre-Allocated Shadow Textures
      • Each subsequent level: 1/4 the Resolution (in texels) and 4x the Number of Textures
      • Ideally constant #texels per-level
      • Goal: Target constant Pixel/Texel Ratio (1:1)
      • The smaller the screen-projected size, the more texture slots available
      • Each Light allocates its best resolution
  • 7. Shadow Pool: In practice
      • PlayStation 4 / Xbox One:
          • Max 128 Textures Per-Level
          • First Level: [2048x2048] x 4
          • Last Level: [32x32] x 128
          • 156 MB, 596 Textures
      • Allocation:
          • Search with Max-Desired Resolution
          • If no free texture found, search in next levels
      • Figure legend: Free, Occupied, Request
  • 8. Our System
  • 9. GPU Shadow Map Compression
      • Motivations
          • Skip runtime static shadow rendering
          • Parallelism: Overlap Compute/Graphics pipeline
          • Minimize Size
      • Challenges
          • CS based compression/decompression: parallel irregular data structure
          • Floating point precision
          • Limited data format on GPU TGSM: 32bit data size
          • Lossy but conservative errors (No hand-tweaking Depth/Slope-Bias)
  • 10. GPU Shadow Map Compression: Data Flow
      • Input: RGBA16F, XYZ Plane + Raw Depth, Inverted Near/Far
      • 32x32 Tiles: Encode each Quad, either Depth Plane or Packed float4 (256 Packed Quads, 32bits, shared FP16 exponent)
      • Sorting Packed Quads (CodeBooks). Very important for depth-tested shadows: Can re-use depth plane
      • CodeBook Compaction/Merge
      • Generate Quad Indices to compacted CodeBook (1 Byte/Index), Conservative error adjustment
      • Encode Sparse QuadTree and output
      • Output: D16 Linear-Z Storage
  • 11. GPU Shadow Map Compression: Result
      • Designed to handle Alpha-Testing: Unique depth planes shared
      • Compression Ratio:
          • Typically anywhere between 7:1 and 100:1
          • Worst case 1.45:1 (pure noise input), best case 512:1 (single depth plane/tile)
          • Can expect average 20:1 or better, prefer larger textures
      • Decompression Speed:
          • 0.048ms for 1024x1024 on PS4 Base
          • Close to hardware pixel fill-rate
      • Compression Speed:
          • 0.36ms for 1024x1024 on PS4 Base (unoptimized)
          • Todo: use LaneSwizzle instruction for sorting/scan
  • 12. Shadow Map Compression: Quality. Figure: Original Shadow Map
  • 13. Static Shadow Map Compression: Quality. Figure: Compressed Shadow Map (7.15 : 1)
  • 14. Our System
  • 15. Dynamic Shadow Pass
      • Goal: Minimize overhead
          • Full Depth Copy
          • Full Depth Clear / Depth Decompression
      • Traditional Options:
          • Full Shadow Copy From Static to Dynamic: Slow, High Fixed Cost
          • Re-Generate Static Shadow: Very high CPU Cost
          • Sample Both Static and Dynamic Shadow Maps: High Filtering Cost
      • But Dynamic-Only Shadow is often highly sparse in texture space
          • Full shadow map copy/merging is undesirable
          • Double filtering cost is undesirable
  • 16. Separated Dynamic Shadow: Example
      • A typical shadow map layout with dynamic interactions
      • Figure: Static + Dynamic-Mask x Dynamic = Equivalent (sizes labeled in the figure: 512x512, 32x32, 1024 x 1024)
  • 17. Conservative Dynamic-Mask
      • Filtering: Check Dynamic-Mask Texture Once for the Entire Kernel
          • Unbound (TextureIndex == -1): Static-Only
          • Texel false: Static-Only
          • Texel true: Dynamic and Static
      • Dynamic-Mask must be conservative, covering the whole filter kernel
  • 18. Conservative Dynamic-Mask
      • Bound as UAV on Pixel Shader, no AtomicOp needed on Current-Gen Consoles
      • R8_UNORM on PS4/Xbox, R32_UINT on PC (or check “UAV Typed Load” in DX11.3)
      • Extrapolate the position from the center of a four-pixel quad by the shadow filtering kernel radius
      • Code:
          float2 ConservativeOffset = ((uint2(SvPosition.xy) & 1) - 0.5f) * Max_Filer_Kernel_Size * 2;
          uint2 LowResCoord = (SvPosition.xy + ConservativeOffset) / 64.f;
          if (ShadowLowResFlags[LowResCoord] == 0) //Avoid some write contentions on some HW
              ShadowLowResFlags[LowResCoord] = 1;
  • 19. Depth Partial Decompression (PS4/X1)
      • Minimized Overhead
          • Only ~0.03ms overhead for 2k x 2k shadow pass (PS4)
          • No full depth copy
          • No “slow” depth clear
      • Partial depth decompression with dynamic mask
          • Use Dynamic-Mask for partial decompression
          • Generate Rect-List Based on Dynamic-Mask
      • Table:
          2048x2048 Depth (Example) | Cost   | Variance
          Full-Decompress           | 65.8us | Fixed
          Partial-Decompress        | 9.6us  | Data-Dependent
  • 20. Robust Depth-Bias
      • Uniform depth bias will always fail at unbounded depth slope
          • Only used to correct rounding errors
      • SlopeBias = Filter_Kernel_Radius
          • Geometrically based: (Max Variance Per-Pixel) * Width
          • No User Input Needed
      • HW:
          • RasterizerDesc.DepthBias = Epsilon; //(1 is a good epsilon choice)
          • RasterizerDesc.SlopeScaledDepthBias = Max_Filer_Kernel_Size; //(ex 3.0f)
          • Note: HW implements max(ddx(z), ddy(z)), so you might want to use a larger value
      • SW:
          • ShadowDepth += Epsilon / 65535.f; //For R16_Depth
          • ShadowDepth += (abs(ddx(ShadowDepth)) + abs(ddy(ShadowDepth))) * Max_Filer_Kernel_Size;
  • 21. Robust Depth-Bias: D16
      • Figure: Depth Bias = 0.005, Slope Bias = 0.0 compared with Depth Bias = 1.0 / 65536, Slope Bias = FILTER_KERNEL_RADIUS
      • Figure annotations: Shadow Acne, Missing contact
  • 22. Our System
  • 23. Tiled-Deferred-Shadow
      • Shader Occupancy
          • Simpler/Small code: Fewer VGPRs
          • Separate Deferred Spot / Point: Even fewer VGPRs, less cache thrashing
          • 70% occupancy on GCN
      • Bindless Shadow Map Table: Single Pass Projection
          • PC: Use DX12 Binding Spaces: Requires SM5.1, Supported on most DX11 GPUs
          • Texture2D SpotShadows[] : register(t0, space2);
          • TextureCube PointShadows[] : register(t0, space3);
      • DispatchIndirect() after Light-List Generation
  • 24. Tiled-Deferred Shadow: Selective Sample Test
      • Deferred shadow is more sensitive to False-Positives
          • Eating up shadow output channels very quickly
      • Fighting Depth-Complexity: How Conservative?
          • Near/Far Bounding Boxes + Selective depth sample test
          • Cull the light if no depth sample touches it
          • Easily trade off between precision and speed
      • Table (PS4 Base):
          Light List Culling            | Performance
          2AABB (Bounding Box) Only     | 0.30ms
          2AABB + 16 Depth Sample Test  | 0.41ms
          2AABB + 64 Depth Sample Test  | 0.57ms
  • 25. Tiled-Deferred Shadow: Selective Sample Test
      • Fighting Depth-Complexity: Comparison
      • Figure panels: Deferred Lighting Output; Bounding Box Culling Only; Bounding Box Culling + Selective Sample Test
  • 26. Our System
  • 27. Deferred-Shadow Mask Challenges
      • Large Data between Deferred-Shadow -> Deferred-Shading
          • Motivation: Targeting 4K+
          • (4K) x (16 Shadows Per-Pixel) x (8 Bits) ~= 128 MB
      • Better (but naïve) Solution
          • Lower precision masks + Temporal AA
          • As low as 2 bits is acceptable
          • 128MB -> 32MB
      • We can do better than 1 bit
          • 128MB -> 12MB
  • 28. Deferred-Shadow Mask Compression
      • Block compression: Vector-Quantization (VQ) instead of Pixel Quantization
      • 4x4 Pixel Block, 4096 CodeBooks (Offline-Data-Trained Patterns)
      • Output best matching Indices, 12bits/Block
      • Diagram: Encoder (Input, Best Match against 4096 CodeBooks, Output 12bit Indices); Decoder (12bit Indices, Lookup in 4096 CodeBooks)
  • 29. Deferred-Shadow Mask: Optimization
      • Skip fully black/white blocks (WaveBallot)
      • Search Tree: TSVQ
          • Tree-Structured-Vector-Quantization: O(log(n))
          • Full, Balanced Quad-Tree, 6 levels (4 ^ 6 = 4096 CodeBooks)
      • MSAD4 (AMD GCN: v_msad_u8)
          • Multimedia instruction
          • Accumulate 4 byte matching errors in one instruction
      • LaneSwizzle (AMD GCN: ds_swizzle_b32)
          • Fast data exchange between threads
          • No TGSM (Thread-Group-Shared-Memory)
      • Cost: ~7% of the deferred shadow pass
  • 30. VQ Compression Code (Per-Pixel Frequency)
      uint CompressVQ(float Shadow, uint2 GTid : SV_GroupThreadID, uint GroupIndex : SV_GroupIndex)
      {
          uint SrcPixel = uint(Shadow * 254.99f + 1.f) << ((GTid.x % 4) * 8); //0 is special number for msad
          SrcPixel |= LaneSwizzle(SrcPixel, 0x1F, 0, 0x1);
          SrcPixel |= LaneSwizzle(SrcPixel, 0x1F, 0, 0x2); //Collected 4 neighbor pixels
          uint CurrIndex = -1;
          [unroll]
          for (int i = 0; i < 6; i++) //CodeBook size 4096=4^6
          {
              CurrIndex = CurrIndex * 4 + 4; //QuadTree next level
              uint MatchErr = msad(SrcPixel, uint2(VQCodeBookBuffer[(CurrIndex + GTid.x % 4) * 4 + (GTid.y % 4)], 0), 0);
              MatchErr += LaneSwizzle(MatchErr, 0x1F, 0, THREADGROUP_SIZEX); //Accum next line
              MatchErr += LaneSwizzle(MatchErr, 0x1F, 0, THREADGROUP_SIZEX << 1); //Accum 2 lines away
              uint MatchErr_Index = (MatchErr << 8) | (GTid.x % 4); //Pack index for deterministic order
              MatchErr_Index = min(MatchErr_Index, LaneSwizzle(MatchErr_Index, 0x1F, 0, 0x1));
              MatchErr_Index = min(MatchErr_Index, LaneSwizzle(MatchErr_Index, 0x1F, 0, 0x2));
              CurrIndex += MatchErr_Index & 0xf; //Broadcasted best matching of the four children
          }
          return CurrIndex;
      }
  • 31. Light Channel: 4 Bits. Figure: 4 Bits / Mask + TAA
  • 32. Light Channel: 2 Bits. Figure: 2 Bits / Mask + TAA
  • 33. Light Channel: 1 Bit. Figure: 1 Bit / Mask + TAA
  • 34. Light Channel: VQ Compressed 0.75 Bit. Figure: 0.75 Bit VQ / Mask + TAA
  • 35. Light Channel: 4 Bits. Figure: 4 Bits / Mask + TAA
  • 36. Performance: Static Camera
      • Unannounced Project, running on PS4 Base
      • High-poly un-optimized meshes in BasePass
      • 2507 shadowed lights in the scene
      • 0.44ms Shadow Depth
      • 0.40ms Deferred Shadow
  • 37. Performance: Moving Camera
  • 38. Conclusions
      • Benefits:
          • Shippable on Current-Gen Consoles (PS4/Xbox One/DX12 API)
          • Thousands of Shadowed Lights in Large Environments
          • Significantly more Stable framerate
          • Minimal Shadow-popping
          • Minimized Run-time Memory Allocations
          • Supports Shadowed Volumetric and Transparency Lighting for Local Lights
      • Challenges:
          • Vertex-Animated Static-Mesh (E.g. Trees): Static or Dynamic?
          • Currently switched to dynamic with high resolution shadow, otherwise cached
          • Bake stateless texture space animation?
  • 39. References and Thanks
      • References:
          • https://developer.amd.com/wordpress/media/2012/10/AMD_Southern_Islands_Instruction_Set_Architecture.pdf
          • https://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
      • Thanks:
          • Zaratsyan, Art
          • Béliveau, Jimmy
          • Lassonde, Gabriel
          • Turcotte, Sebastien
          • Fatnassi, Sammy
          • Wu, Shan
      • We’re HIRING
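
The pool layout on slides 6 and 7 can be sketched on the CPU side as follows. This is a minimal illustration, not the presenter's implementation: the names `build_levels` and `ShadowPool` are hypothetical, and the per-level free counter stands in for whatever slot bookkeeping the real engine uses. Only the figures (2048x2048 x 4 at the top, 32x32 x 128 at the bottom, at most 128 textures per level, 596 textures total) and the "search the next levels if nothing is free" rule come from the slides.

```python
MAX_TEXTURES_PER_LEVEL = 128  # cap quoted on slide 7


def build_levels(top_res=2048, top_count=4, last_res=32):
    """Return [(resolution, texture_count)] from highest to lowest resolution."""
    levels = []
    res, count = top_res, top_count
    while res >= last_res:
        levels.append((res, min(count, MAX_TEXTURES_PER_LEVEL)))
        res //= 2   # half the side length, i.e. 1/4 of the texels
        count *= 4  # 4x the textures keeps texels per level constant until the cap
    return levels


class ShadowPool:
    """Hypothetical pool that only tracks how many textures are free per level."""

    def __init__(self, levels):
        self.levels = levels
        self.free = [count for _, count in levels]

    def allocate(self, desired_res):
        """Grant the desired resolution if free, else fall back to lower levels."""
        for i, (res, _) in enumerate(self.levels):
            if res > desired_res:
                continue  # never hand out more resolution than requested
            if self.free[i] > 0:
                self.free[i] -= 1
                return res
        return None  # pool exhausted

    def release(self, res):
        """Return one texture of the given resolution to the pool."""
        for i, (level_res, count) in enumerate(self.levels):
            if level_res == res and self.free[i] < count:
                self.free[i] += 1
                return
        raise ValueError("unknown resolution or nothing allocated at this level")


levels = build_levels()
pool = ShadowPool(levels)
granted = [pool.allocate(2048) for _ in range(5)]
```

With the default arguments the level counts sum to the 596 textures quoted on slide 7, and the fifth request for a 2048 texture falls back to 1024 because the top level only holds four.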

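The mask sizes on slide 27 follow from simple arithmetic, reproduced here as a small check. The 3840x2160 resolution is an assumption for "4K", and `mask_bytes` is a hypothetical helper; the 16 shadows per pixel, the bit depths, and the 12-bit index per 4x4 block (0.75 bit per pixel) come from slides 27, 28 and 34.

```python
WIDTH, HEIGHT = 3840, 2160   # assumption: "4K" means 3840x2160
SHADOWS_PER_PIXEL = 16       # from slide 27
MIB = 1024 * 1024


def mask_bytes(bits_per_mask):
    """Bytes needed to store every per-pixel shadow mask at the given bit depth."""
    return WIDTH * HEIGHT * SHADOWS_PER_PIXEL * bits_per_mask / 8


VQ_BITS_PER_PIXEL = 12 / 16  # one 12-bit codebook index per 4x4 block

sizes_mib = {
    "8 bit": mask_bytes(8) / MIB,
    "2 bit": mask_bytes(2) / MIB,
    "1 bit": mask_bytes(1) / MIB,
    "VQ 0.75 bit": mask_bytes(VQ_BITS_PER_PIXEL) / MIB,
}

for label, size in sizes_mib.items():
    print(f"{label}: {size:.1f} MiB")
```

Under these assumptions the results round to the slide's 128 MB, 32 MB and 12 MB figures, with the VQ path landing below the 1-bit mask.
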
Editor's Notes

  1. I’m going to talk about how we render many shadowed lights efficiently.
  2. Feel free to take photos.
  3. What are the motivations? There’s no un-shadowed light in the real world. How many times have lighting artists had to put unshadowed lights in a game for good performance? When PBR lighting in games looks so real but we still accept many unshadowed lights, something is obviously wrong. We want to break that barrier. This is not a tech demo or a pure research project; it has been used in production for more than a year.
  4. Our system diagram. I’ll go through the key building blocks one by one.
  5. First I’m going to talk about our shadow pool.
  6. The shadow pool is our run-time storage system. We use a pre-allocated shadow pool with at least hundreds of textures of different resolutions. Each lower level has 4 times the number of textures, each one quarter the size. Why? Because the camera has a fixed number of pixels and we target a 1:1 pixel-to-texel ratio. Assuming a uniform density of lights, they always fill the same whole screen no matter the distance, so we need roughly the same number of texels for differently sized lights.
  7. An example on current-gen consoles: the top level is 2k by 2k, the last level is 32x32, 156MB of memory in total. Allocation from this shadow map pool works by searching for the perfect match first. If we don’t find one, we search the lower-resolution levels until we find a free texture. It’s much better to have lower-quality shadows than shadow popping.
  8. Now I want to talk about how we render static shadow maps.
  9. We don’t have to. We developed a GPU-based shadow map compressor to minimize data streaming size. We use lossy compression because we want to make sure the size can always be reduced, even if the input is noise. An interesting challenge is that I want conservative error, which means the error always has the same sign, so that we don’t have to tweak ANY depth bias settings when switching to compressed shadow maps.
  10. Compression data flow. I can only go through the details roughly. The basic block for compression is 32x32 pixels, processed by one compute shader thread group. First, we compress the 2x2 quads into 32-bit entries, either as an xyz depth plane if they belong to the same triangle or as a packed float4; in total we have 256 of them. Then we sort and compact the packed entries to remove duplicates, and generate an index at each quad into the merged entries. So we can have many quads pointing to the same entry because they belong to the same triangle. Finally, we encode the sparse quadtree in shared memory and output it to a buffer.
  11. The key idea here is that we designed with alpha-tested shadow maps in mind, so even if there are holes in a triangle, we can still use indirection to share the same triangle data and reduce size. The results: average compression ratios are around 20:1 or better, and it costs 0.048ms on PS4 base to decompress 1 million texels. That’s actually pretty close to the hardware pixel fill-rate.
  12. Original shadow map
  13. Compressed. Note that this shadow map is very hard to compress because the light is right on top of the alpha-tested tree. Yet we still achieve a decent compression ratio.
  14. Next: Dynamic shadow pass
  15. Since we cache static shadows, we would normally have to copy the static shadow map into a new RT and render dynamic objects on top of it. This is not efficient. Imagine there’s a small object moving in front of the light: we have to copy the whole shadow map only to render a small object. The cost of the copy is so much higher than the rendering. The purpose is to minimize overhead.
  16. We use separate shadow maps for a single light. From left to right: a static shadow map, a dynamic mask and a dynamic shadow map. The rightmost is the combined shadow map of a traditional game engine. Each shadow map can have a different resolution; we may use a higher-resolution dynamic shadow map for better character quality.
  17. Shadow projection and filtering are done by checking the dynamic mask first. If the mask is true, we have to filter both the dynamic and the static shadow map. If false, we only filter the static one. That’s why the dynamic mask has to be conservatively rasterized: it has to cover the dynamic object plus a filter kernel size.
  18. So how do we generate the conservative dynamic mask? We bind a UAV in addition to the regular depth buffer, and output to the mask with a position expanded from the quad center by a kernel radius. On consoles we only need an 8-bit format for the UAV, which is the minimal size for UAV typed load.
  19. We use the dynamic mask for partial depth decompression on consoles. Depth decompression is normally a fixed GPU cost. To lower the overhead, we only decompress the htiles that are touched by the dynamic mask, and achieve a significant speedup. We can render a 2k by 2k shadow map in 0.05ms on PS4, including buffer clear, rendering and depth decompression.
  20. I want to talk about robust depth bias. Because we have so many possible shadow map resolutions, we need a very robust shadow bias that works in all cases. Our shadow bias is so robust that we removed the shadow bias setting from the editor.
  21. We only use an absolutely minimal uniform depth bias to correct numeric rounding errors. Most shadow acne comes from depth slope. Many game engines get this wrong. Both of the 2 big engines, I won’t say which, let artists input a plausible shadow bias for each light. There are 2 issues: 1: artists have more work; 2: you might never fix contact shadows and shadow acne at the same time. It’s clearly shown on the left: you either fail on one or the other, or both.
  22. Now it’s our deferred-shadow.
  23. Why do we choose deferred shadow? Occupancy. Not only do we use deferred shadow, we also have separate deferred spot light and deferred point light passes to further improve occupancy and cache performance. We use bindless texture spaces to bind all the shadow maps at once, and dispatch indirect on all platforms.
  24. The classic issue with tiled-deferred shading is depth complexity for light culling. Deferred shadow is much more sensitive to depth complexity, because false positives waste output channels. There are many talks about this topic, but we chose a simple solution. After the bounding box test, we use a special sampling pattern in a tile to test whether a light really touches any samples. The pattern is constructed such that each row and column has at least one sample, so we can capture very thin features in the scene. In theory we might miss some very small lights, but in practice we fade out very small lights, so we don’t really see any artifacts. The cost of the selective sample test is only 0.1ms to 0.3ms.
  25. Here’s the example. Middle is bounding box light culling, you can clearly see those false-positives. And on the right is our selective sample test results. It’s much cleaner.
  26. And finally, I want to talk about per-pixel shadow masks.
  27. One issue with deferred shadow is that a large amount of data has to be passed from deferred shadow to deferred shading. For 4K native resolution with 16 lights per pixel, we need 128MB for the light mask. We can reduce the number of bits per channel at the expense of quality. But we have a new solution that can reduce the size even more with good quality.
  28. The idea is vector quantization. We use a large codebook to find the best match for the input block. We only have to store one index for the whole block. Decoding is extremely fast: just a memory copy from the cache.
  29. The idea may sound crazy. How can we search thousands of blocks per pixel? Actually it’s doable. First, we use a full, balanced search tree for O(log(n)). And we have 2 useful shader instructions to help us: for block matching we have MSAD4, and LaneSwizzle allows us to quickly sync data between threads.
  30. Here’s example vector quantization compression code. I don’t have time to explain it, but it’s all written there. The function runs at per-pixel frequency, so it’s very easy to use, though not the most efficient.
  31. Quality comparison. Here’s 4 bits/light channel + TAA, as reference quality.
  32. 2 bits/light channel.
  33. 1 bit/light channel. Things get pretty noisy now.
  34. And here’s our 0.75-bit VQ compression.
  35. Back to 4 bits again, for comparison.
  36. How about performance? This is a GPU capture in an internal test level with tons of meshes and NPCs firing. We have 2500 shadowed lights in the scene. In this capture, shadows use less than 1ms for the whole frame. Of course, if other effects such as volumetric fog use shadow data, they are not included here, but the cost should be of the same order.
  37. Our system really performs with a moving camera. This is a frame time recording from one of our cut-scene tests with a constantly moving camera. Yellow is our modified UE4 engine, green is the stock UE4 engine. Compared to stock UE4, our framerates are significantly more stable. We achieved a performance gain of anywhere between 10% and 1500%.
  38. One of the challenges is how to handle dynamic trees in the world. They are static meshes with vertex animation. If you cache them they look static, and if you render them it’s a huge number of draw calls. We only render them as dynamic when they are close up, and that seems to be good enough. But texture-space stateless animation might be interesting in the future.
  39. Thanks to WB Games Montréal for making this possible, and we are hiring. Thanks to Seb and Gabriel for the volumetric fog ideas.
  40. And if you have any questions, I will be playing a video in the background at the same time. https://youtu.be/lyYpFVB_-fI
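
The tree-structured search described in notes 28 to 30 can be illustrated on the CPU with a small sketch. It is not the shader from slide 30: `sad`, `build_tree` and `tsvq_encode` are hypothetical names, the toy codebook of constant-valued blocks is invented for the demonstration, and parents are built as plain averages of their four children rather than trained offline. What it does share with the slides is the greedy descent through a full, balanced quad-tree, comparing 4 children per level by sum of absolute differences and breaking ties by index.

```python
def sad(a, b):
    """Sum of absolute differences between two 16-pixel blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_tree(leaves, depth):
    """Return levels[0..depth-1]; node n at one level owns children 4n..4n+3 below."""
    assert len(leaves) == 4 ** depth
    levels = [leaves]
    for _ in range(depth - 1):
        children = levels[0]
        parents = []
        for n in range(0, len(children), 4):
            group = children[n:n + 4]
            # Assumption: a parent is the per-pixel mean of its four children.
            parents.append([sum(px) / 4 for px in zip(*group)])
        levels.insert(0, parents)
    return levels


def tsvq_encode(block, levels):
    """Greedy descent: 4 comparisons per level instead of one per codebook entry."""
    node = 0
    comparisons = 0
    for level in levels:
        candidates = range(node * 4, node * 4 + 4)
        # Ties are broken by the lower index, for a deterministic result.
        node = min(candidates, key=lambda n: (sad(block, level[n]), n))
        comparisons += 4
    return node, comparisons


DEPTH = 3  # toy tree with 4^3 = 64 entries; the talk uses 6 levels (4096 entries)
codebook = [[k * 4] * 16 for k in range(4 ** DEPTH)]
tree = build_tree(codebook, DEPTH)
index, comparisons = tsvq_encode([4 * 37 + 1] * 16, tree)
```

In this toy setup a block one grey level away from entry 37 is matched to index 37 after 12 comparisons rather than 64; with the talk's 6 levels the same descent needs 24 comparisons for 4096 entries. Greedy descent is not guaranteed to find the globally best entry for an arbitrary codebook, which is one reason the tree is trained offline in the talk.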