Deferred Lighting and Post Processing on PLAYSTATION®3
Matt Swoboda
PhyreEngine™ Team
Sony Computer Entertainment Europe (SCEE) R&D
2
Where Are We Now?
• PS3 into its 3rd year
• Many developers on their 2nd generation engines
• Solved the basic problems
• SPUs STILL underused
– But it’s improving
3
But..
• GPU now the most common bottleneck
• Usually limited by fragment operations
• Many titles take > 1/3 of their time in post processing
• Most developers want to do even more fragment work
– More / heavier post processing effects
– Better lighting techniques / more lights / softer shadows
– Longer shaders
– Features ported from PC / other console hardware
4
“We fixed the vertex bottleneck..”
• Many possible solutions to improve geometry
performance beyond just “optimising the shader”
– LOD
– Occlusion culling & visibility culling
– Move large vertex operations to SPU, e.g. skinning
– SPU triangle culling
5
What About Pixels?
• Fragment operations / post processing rarely optimised
like geometry operations
– Throw whole operation at the GPU
– Same operation done for every pixel
– Spatial optimization / branching considered too slow
• SPU not considered: “too slow”, “uses too much
bandwidth”
6
SPU pixel processing
• Yes, the SPU is fast enough to process pixels
• Won’t beat the GPU in a brute force race
• GPU specialises in rasterising triangles and sampling
textures – has dedicated hardware
• SPU is a general purpose processor
– Use flexibility to your advantage
– Choose different code branches and fast paths
Post Processing Effects on SPU
A Whirlwind Tour
8
What to do on SPU
• Options:
• Offload whole processes from GPU to SPU
• Or use SPU and GPU together to do one process
9
Depth Of Field Pre-Process
• High quality depth of field requires a long fragment shader
– Read depth samples and colour samples in a kernel / disc
– Check depths against centre pixel depth
– Weight colours by depth check results
• Wasteful for “most” of the screen
– All depth checks pass (out of focus) or all fail (in focus)
– All fail == pass through original buffer
– All pass == use pre-blurred buffer – separable gaussian blur
• Categorise the screen for these cases on SPU
10
Depth Of Field Classification Results
• Post process depth buffer
• Classify by min/max depth
• Green: fully in focus
• Blue: fully out of focus
• Red: neither fully in nor fully out of focus
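The min/max classification above can be sketched as follows. This is a minimal illustration, not the PhyreEngine code: the focal range parameters (`focus_near`, `focus_far`), the tile size and the function names are all assumptions made for the example. A tile is treated as fully in focus when its whole depth range sits inside the focal range, fully out of focus when the range sits entirely on one side of it, and mixed otherwise.

```python
from enum import Enum


class TileClass(Enum):
    IN_FOCUS = "green"      # pass through the original buffer
    OUT_OF_FOCUS = "blue"   # use the pre-blurred buffer
    MIXED = "red"           # needs the full depth of field shader


def classify_tile(depths, focus_near, focus_far):
    """Classify one tile from the min and max of its depth samples."""
    lo, hi = min(depths), max(depths)
    if focus_near <= lo and hi <= focus_far:
        return TileClass.IN_FOCUS
    if hi < focus_near or lo > focus_far:
        return TileClass.OUT_OF_FOCUS
    return TileClass.MIXED


def classify_screen(depth, width, height, tile, focus_near, focus_far):
    """depth is a row-major list of width * height values.

    Returns one row of TileClass values per row of tiles.
    """
    rows = []
    for ty in range(0, height, tile):
        row = []
        for tx in range(0, width, tile):
            samples = [depth[y * width + x]
                       for y in range(ty, min(ty + tile, height))
                       for x in range(tx, min(tx + tile, width))]
            row.append(classify_tile(samples, focus_near, focus_far))
        rows.append(row)
    return rows
```

Only the tiles classified as mixed would then be sent through the long fragment shader; the other two classes take the cheap paths.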
11
Depth Of Field Pre-process results
• Pre-process only on SPU, blur operations on GPU
– Goal: minimise overall frame time and latency
• Large blur w.r.t. depth
• 15 ms+ on GPU alone
• 1.5-2 ms on SPU + 3 ms on GPU
12
Screen Tile Classification
• Categorise the screen using the range of depth values
within a tile
• Powerful technique with many applications
– Full screen effect optimization - DOF, SSAO..
– Soft particles
– Affecting lights
– Occluder information
13
Screen Space Ambient Occlusion (SSAO)
• Generate an ambient occlusion approximation using the
depth buffer alone
• Perform a large kernel-based series of depth
comparisons and sum the results
• Downsample output to ½ size for performance
– Output normals for bilateral upsampling
14
SPU Screen Space Ambient Occlusion Results
• GPU version: 10 ms+
• SPU version: 6 ms on 2 SPUs
• Used in “Donkey Trader” PhyreEngine game template
Deferred Rendering
16
Deferred Shading Overview
• Rasterise geometry information to multiple “GBuffers”
(geometry buffers)
• Apply lighting and shading in a post process
Demo
18
Deferred Lighting on SPU
• The SPU can handle the deferred lighting process
• The GPU renders the geometry to GBuffers
• SPU and GPU execute in parallel
– Total time : max( geometry, lighting )
19
Deferred Lighting on SPU: Implementation (1)
• Process each pixel once
• Work out which lights affect each pixel
• Apply the N affecting lights in a loop
• Process the screen in tiles
• Use classification techniques per tile to optimise
20
Deferred Lighting on SPU: Implementation (2)
• Calculate affecting lights per tile
– Build a frustum around the tile using the min and max depth
values in that tile
– Perform frustum check with each light’s bounding volume
– Compare light direction with tile average normal value
• Choose fast paths based on tile contents
– No lights affect the tile? Use fast path
– Check material values to see if any pixels are marked as lit
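The per-tile frustum check can be sketched roughly as below. The conventions are assumptions made for the example, not taken from the slides: the camera sits at the origin looking down +z, the tile is given as a rectangle in normalised device coordinates, `tan_x` and `tan_y` are the tangents of the half field of view, and each light is bounded by a sphere. The normal comparison step is not covered here.

```python
import math


def tile_frustum_planes(x0, x1, y0, y1, z_min, z_max, tan_x, tan_y):
    """Six inward-facing planes (a, b, c, d) bounding a tile in view space.

    x0..x1 and y0..y1 are the tile edges in [-1, 1]; z_min and z_max are
    the min and max view-space depths found in the tile.
    """
    def unit(a, b, c):
        length = math.sqrt(a * a + b * b + c * c)
        return (a / length, b / length, c / length, 0.0)

    return [
        unit(1.0, 0.0, -x0 * tan_x),   # left side, passes through the eye
        unit(-1.0, 0.0, x1 * tan_x),   # right side
        unit(0.0, 1.0, -y0 * tan_y),   # bottom side
        unit(0.0, -1.0, y1 * tan_y),   # top side
        (0.0, 0.0, 1.0, -z_min),       # near plane at the tile's min depth
        (0.0, 0.0, -1.0, z_max),       # far plane at the tile's max depth
    ]


def sphere_touches_frustum(planes, centre, radius):
    """Conservative test: reject only if fully behind one of the planes."""
    cx, cy, cz = centre
    return all(a * cx + b * cy + c * cz + d >= -radius
               for a, b, c, d in planes)


def lights_affecting_tile(planes, lights):
    """lights is a list of (centre, radius); returns the surviving indices."""
    return [i for i, (centre, radius) in enumerate(lights)
            if sphere_touches_frustum(planes, centre, radius)]
```

An empty result corresponds to the "no lights affect the tile" fast path. The plane test is conservative, so it may keep a light that only touches a frustum corner region, which costs time but not correctness.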
21
Deferred Lighting on SPU: Implementation (3)
• Choose whether to process MSAA per tile
– If no sample pair values differ, light only one sample from the pair, otherwise light both samples separately
– Typically only a few tiles need both MSAA samples lit
[Figure: tiles requiring MSAA]
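A minimal sketch of the per-tile MSAA decision, assuming 2x MSAA so that every pixel carries a pair of GBuffer samples. The names `tile_needs_msaa`, `light_tile` and the `shade` callback are hypothetical; `shade` stands in for whatever lighting function is applied to a sample.

```python
def tile_needs_msaa(sample_pairs):
    """True if any pixel in the tile has two differing samples."""
    return any(a != b for a, b in sample_pairs)


def light_tile(sample_pairs, shade):
    """Light a tile of (sample0, sample1) pairs.

    If no pair differs, shade one sample per pixel and copy the result
    to both outputs; otherwise shade both samples separately.
    """
    if not tile_needs_msaa(sample_pairs):
        lit = []
        for a, _ in sample_pairs:
            colour = shade(a)
            lit.append((colour, colour))
        return lit
    return [(shade(a), shade(b)) for a, b in sample_pairs]
```

The decision is made once per tile rather than per pixel, so the inner loops stay free of per-pixel branches; a tile with a single differing pair pays for lighting every sample in it.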
22
Deferred Lighting on SPU: Results
• 3 shadow casting lights, 100 point lights
• 2x MSAA, 720p
– Lighting performed per sample
• Apply tone mapping on SPU
– Virtually free
• Performance: > 60 fps, 3 SPUs for 11 ms each
– No MSAA: 2 SPUs for 11 ms
23
Deferred Lighting on SPU: Issues
• Potential latency
– Must keep GPU busy while SPU process is running
– Render something else or add a frame of latency
• Main memory requirements
• Shadows
– Requires “random” texture access – not ideal for SPU
– Can render shadows on GPU to a full screen buffer and use it
on SPU
24
Flavours of Deferred Lighting on SPU
• Full deferred render on SPU
– Input all GBuffers, output final composited result
• Light pre-pass render on SPU
– Input normal and depth only; calculate light result; sample in 2nd geometry pass
• Light tile classification data output?
– SPU outputs information per tile about affecting lights
– Do lighting calculations on GPU
Volumetric Lighting
26
Volumetric Lighting
• Also known as “god rays” or “light beams”
• Simulates the effect of light illuminating dust particles in
the air
• Numerous fakes exist
– Artist-placed geometry
– Artist-placed particles
• Better: generate using the shadow map
– Works in a “general case”
27
Volumetric Lighting
• Ray march through the shadow map
– Trace one ray per pixel in screen space
– Sample the depth buffer to determine the end of the ray
• Sample the shadow map at N points along the ray
– N ~= 50
– Attenuate and sum up the number of samples that passed
• Blur and add noise
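The ray march can be sketched as below, for a single pixel. Several things here are assumptions for illustration: `end` is the surface position reconstructed from the depth buffer, `is_lit` is a hypothetical callback that stands in for the shadow map depth comparison at a point, and the exponential falloff controlled by `density` is just one plausible choice of attenuation. The blur and noise steps are not shown.

```python
import math


def march_ray(origin, end, is_lit, n=50, density=0.05):
    """Estimate in-scattered light along the ray from origin to end.

    Takes n evenly spaced samples, keeps the ones that pass the shadow
    test, attenuates each by its distance from the origin, and returns
    the normalised sum in [0, 1].
    """
    length = math.dist(origin, end)
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n  # sample at the middle of each segment
        point = tuple(o + (e - o) * t for o, e in zip(origin, end))
        if is_lit(point):
            total += math.exp(-density * length * t)
    return total / n
```

With around 50 shadow lookups per pixel, the cost is dominated by `is_lit`, which is why the speed of shadow map sampling decides whether the effect is practical.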
28
Volumetric Lighting
• Effect is a bit too slow to be practical on GPU: ~5ms
• Do it on SPU instead
• Parallelises with GPU easily
– Result needed late in the render at compositing stage
– Only needs depth and shadow map inputs
• Problem: must randomly sample from the shadow map
29
Texture sampling on SPU
• “Random access” texture sampling is bad for SPU
• It’s bad for GPU, too, but sometimes you just have to do it
• GPU:
– Fast access from texture cache; cache miss is slow
– Dedicated hardware handles lookups, filtering and wrapping
• SPU:
– Fast access from “texture cache” (SPU local memory)
– Slow access on cache miss (DMA from main memory)
– Cache lookups slow (no dedicated hardware)
– Must manually handle filtering and wrapping (again, slow)
30
Texture sampling on SPU
• Either:
– Make the texture entirely fit in SPU local memory
– Problem solved!
– Still inefficient: random accesses reduce register parallelism
• Or
– Write a very good software cache
– Locate potential cache misses early - long before you need the values
– Avoid branches in sampling code
31
Volumetric Lighting on SPU
• Volumetric light result will be blurred
– Don’t need full shadow map accuracy
– No filtering on texture samples needed
• Downsample shadow map from 1024x1024, 32 bit to 256x256, 16 bit
– 128 KB – fits in SPU local memory
• Fast enough to sample on SPU
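The sizes quoted above follow from simple arithmetic, shown here as a small check. The helper names are made up for the example; the 256 KB figure is the size of an SPU's local store, which also has to hold code and working buffers, so the `reserved` parameter is a placeholder for that overhead.

```python
LOCAL_STORE_BYTES = 256 * 1024  # SPU local store size


def texture_bytes(width, height, bits_per_texel):
    """Size of an uncompressed texture in bytes."""
    return width * height * bits_per_texel // 8


def fits_in_local_store(size_bytes, reserved=0):
    """True if the data fits alongside `reserved` bytes of code and buffers."""
    return size_bytes + reserved <= LOCAL_STORE_BYTES


full_size = texture_bytes(1024, 1024, 32)   # 4 MB: far too large
downsampled = texture_bytes(256, 256, 16)   # 128 KB: half the local store
```

The downsampled map leaves the other half of the local store for code, depth tiles and output buffers.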
32
Volumetric Lighting on SPU: Results
• Takes ~11 ms on 1 SPU
33
Shadow Mapping on SPU (1)
• Needs the full-size shadow map
– 1024x1024x32 bit == 4 MB : won’t fit in SPU local memory
– We’ll have to write that “very good software cache”, then
• Pre-process the shadow map on SPU
– Calculate min and max depth for each tile
– Store in a low resolution depth hierarchy map
– Output high resolution shadow map as cache tiles
34
Shadow Mapping on SPU (2)
• Software cache with 32 entries
– Each entry is a shadow map tile
– Branchless determination of cache entry index for tile index
• Locate cache misses early
– While detiling depth data – work out required shadow tiles
– Pull in all cache-missed tiles
• Sample shadow map during lighting calculations
– All required shadow tiles are now definitely in cache – lookup is
branchless
• It’s quite slow
– Locate tile in cache per pixel
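One way to picture such a cache is the sketch below. The slides do not describe the replacement policy, so the direct-mapped layout here is an assumption, as are the class and method names; `fetch_tile` is a hypothetical stand-in for the DMA that pulls a shadow tile from main memory. The slot index comes from a bit mask, so it involves no branch.

```python
CACHE_ENTRIES = 32  # must be a power of two for the mask below


class ShadowTileCache:
    def __init__(self, fetch_tile):
        self.tags = [-1] * CACHE_ENTRIES    # tile index held by each slot
        self.data = [None] * CACHE_ENTRIES  # tile contents
        self.fetch_tile = fetch_tile
        self.misses = 0

    @staticmethod
    def slot(tile_index):
        """Map a tile index to a cache slot with a mask, not a branch."""
        return tile_index & (CACHE_ENTRIES - 1)

    def prefetch(self, tile_indices):
        """Resolve all misses up front, before any sampling starts."""
        for tile_index in tile_indices:
            s = self.slot(tile_index)
            if self.tags[s] != tile_index:
                self.tags[s] = tile_index
                self.data[s] = self.fetch_tile(tile_index)
                self.misses += 1

    def lookup(self, tile_index):
        """Sampling-time lookup; assumes prefetch already loaded the tile."""
        return self.data[self.slot(tile_index)]
```

In this direct-mapped sketch two required tiles that share a slot would evict each other, so a real implementation needs some way to handle that conflict; the slides do not say how it is done.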
35
Shadow Mapping on SPU (3)
• Optimise via special cases to win back performance
• Use the low resolution shadow tile map
– Always in SPU local memory
– If pixel shadow z > tile max Z : definitely in shadow
– If pixel shadow z < tile min Z : definitely not in shadow
• Check low resolution map before triggering cache fetches
• Classify whole screen tiles as in or out of shadow
– Don’t need to sample high resolution shadow map at all for those tiles
[Figure: tiles requiring high resolution shadow samples]
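The low resolution test can be sketched as follows. The function names, the row-major list layout and the assumption that the shadow map size is a multiple of the tile size are all choices made for this example rather than details from the slides.

```python
IN_SHADOW = "in shadow"
LIT = "lit"
NEEDS_FULL_RES = "needs full resolution sample"


def build_min_max_map(shadow_map, size, tile):
    """Reduce a size x size shadow map to one (min, max) pair per tile.

    shadow_map is a row-major list; size must be a multiple of tile.
    """
    tiles_per_row = size // tile
    out = []
    for ty in range(tiles_per_row):
        for tx in range(tiles_per_row):
            texels = [shadow_map[(ty * tile + y) * size + tx * tile + x]
                      for y in range(tile)
                      for x in range(tile)]
            out.append((min(texels), max(texels)))
    return out


def classify_shadow(pixel_z, tile_min_z, tile_max_z):
    """Early-out test against the low resolution map."""
    if pixel_z > tile_max_z:
        return IN_SHADOW       # behind every occluder in the tile
    if pixel_z < tile_min_z:
        return LIT             # in front of every occluder in the tile
    return NEEDS_FULL_RES      # only now is a cache fetch worth triggering
```

A screen tile whose pixels all return the same definite answer can skip the high resolution shadow map entirely.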
Conclusion
37
Conclusion
• New additions to your toolbox:
– Tile-based classification techniques on SPU
– Deferred lighting on SPU
– Texture sampling on SPU
• Rendering is no longer just a GPU problem
– Use general purpose nature of the SPU to your advantage
• Rethink fragment processing optimisation strategies
– Make the GPU work smarter, not harder
38
Conclusion
• Some titles are already using SPU post processing
– Killzone 2
• PhyreEngine™ is here to help
– (If you’re a registered PS3 developer) it’s on DevNet now
– Not just an engine: also a reference
– Comes with full source
– Download it, learn from it, steal bits of the code
– Check out the PhyreEngine™ SPU Post Processing Library

More Related Content

What's hot

A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3guest11b095
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Electronic Arts / DICE
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
Tiago Sousa
 
The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2
Guerrilla
 
Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2
Guerrilla
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The Surge
Philip Hammer
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
Tiago Sousa
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
Tiago Sousa
 
Lighting Shading by John Hable
Lighting Shading by John HableLighting Shading by John Hable
Lighting Shading by John HableNaughty Dog
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Tiago Sousa
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
Graham Wihlidal
 
Killzone Shadow Fall: Creating Art Tools For A New Generation Of Games
Killzone Shadow Fall: Creating Art Tools For A New Generation Of GamesKillzone Shadow Fall: Creating Art Tools For A New Generation Of Games
Killzone Shadow Fall: Creating Art Tools For A New Generation Of Games
Guerrilla
 
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...JP Lee
 
Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2
Philip Hammer
 
Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3
stevemcauley
 
Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3
Electronic Arts / DICE
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
Electronic Arts / DICE
 
Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1
Ki Hyunwoo
 
Lighting of Killzone: Shadow Fall
Lighting of Killzone: Shadow FallLighting of Killzone: Shadow Fall
Lighting of Killzone: Shadow Fall
Guerrilla
 

What's hot (20)

A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
 
The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2The Rendering Technology of Killzone 2
The Rendering Technology of Killzone 2
 
Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The Surge
 
Light prepass
Light prepassLight prepass
Light prepass
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
 
Lighting Shading by John Hable
Lighting Shading by John HableLighting Shading by John Hable
Lighting Shading by John Hable
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
Killzone Shadow Fall: Creating Art Tools For A New Generation Of Games
Killzone Shadow Fall: Creating Art Tools For A New Generation Of GamesKillzone Shadow Fall: Creating Art Tools For A New Generation Of Games
Killzone Shadow Fall: Creating Art Tools For A New Generation Of Games
 
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...Penner   pre-integrated skin rendering (siggraph 2011 advances in real-time r...
Penner pre-integrated skin rendering (siggraph 2011 advances in real-time r...
 
Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2
 
Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3
 
Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
 
Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1Rendering AAA-Quality Characters of Project A1
Rendering AAA-Quality Characters of Project A1
 
Lighting of Killzone: Shadow Fall
Lighting of Killzone: Shadow FallLighting of Killzone: Shadow Fall
Lighting of Killzone: Shadow Fall
 

Similar to Deferred Lighting and Post Processing on PLAYSTATION®3

PlayStation: Cutting Edge Techniques
PlayStation: Cutting Edge TechniquesPlayStation: Cutting Edge Techniques
PlayStation: Cutting Edge Techniques
Slide_N
 
Practical SPU Programming in God of War III
Practical SPU Programming in God of War IIIPractical SPU Programming in God of War III
Practical SPU Programming in God of War III
Slide_N
 
Unity optimization techniques applied in Catan Universe
Unity optimization techniques applied in Catan UniverseUnity optimization techniques applied in Catan Universe
Unity optimization techniques applied in Catan Universe
Exozet Berlin GmbH
 
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
Unity Technologies
 
Terrain Rendering using GPU-Based Geometry Clipmaps
Terrain Rendering usingGPU-Based Geometry ClipmapsTerrain Rendering usingGPU-Based Geometry Clipmaps
Terrain Rendering using GPU-Based Geometry Clipmaps
none299359
 
A modern Post-Processing Pipeline
A modern Post-Processing PipelineA modern Post-Processing Pipeline
A modern Post-Processing Pipeline
Wolfgang Engel
 
A new Post-Processing Pipeline
A new Post-Processing PipelineA new Post-Processing Pipeline
A new Post-Processing Pipeline
Wolfgang Engel
 
Felwyrld Tech
Felwyrld TechFelwyrld Tech
Felwyrld Tech
Alex Nankervis
 
Paris Master Class 2011 - 01 Deferred Lighting, MSAA
Paris Master Class 2011 - 01 Deferred Lighting, MSAAParis Master Class 2011 - 01 Deferred Lighting, MSAA
Paris Master Class 2011 - 01 Deferred Lighting, MSAA
Wolfgang Engel
 
Massive Point Light Soft Shadows
Massive Point Light Soft ShadowsMassive Point Light Soft Shadows
Massive Point Light Soft Shadows
Wolfgang Engel
 
Sony Computer Entertainment Europe Research & Development Division
Sony Computer Entertainment Europe Research & Development DivisionSony Computer Entertainment Europe Research & Development Division
Sony Computer Entertainment Europe Research & Development DivisionSlide_N
 
Optcarrot: A Pure-Ruby NES Emulator
Optcarrot: A Pure-Ruby NES EmulatorOptcarrot: A Pure-Ruby NES Emulator
Optcarrot: A Pure-Ruby NES Emulator
mametter
 
Develop2012 deferred sanchez_stachowiak
Develop2012 deferred sanchez_stachowiakDevelop2012 deferred sanchez_stachowiak
Develop2012 deferred sanchez_stachowiak
Matt Filer
 
Next generation mobile gp us and rendering techniques - niklas smedberg
Next generation mobile gp us and rendering techniques - niklas smedbergNext generation mobile gp us and rendering techniques - niklas smedberg
Next generation mobile gp us and rendering techniques - niklas smedberg
Mary Chan
 
Deferred shading
Deferred shadingDeferred shading
Deferred shading
ozlael ozlael
 
Progressive Lightmapper: An Introduction to Lightmapping in Unity
Progressive Lightmapper: An Introduction to Lightmapping in UnityProgressive Lightmapper: An Introduction to Lightmapping in Unity
Progressive Lightmapper: An Introduction to Lightmapping in Unity
Unity Technologies
 
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
Owen Wu
 
HiPEAC 2019 Workshop - Use Cases
HiPEAC 2019 Workshop - Use CasesHiPEAC 2019 Workshop - Use Cases
HiPEAC 2019 Workshop - Use Cases
Tulipp. Eu
 
「原神」におけるコンソールプラットフォーム開発
「原神」におけるコンソールプラットフォーム開発「原神」におけるコンソールプラットフォーム開発
「原神」におけるコンソールプラットフォーム開発
Unity Technologies Japan K.K.
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
inside-BigData.com
 

Similar to Deferred Lighting and Post Processing on PLAYSTATION®3 (20)

PlayStation: Cutting Edge Techniques
PlayStation: Cutting Edge TechniquesPlayStation: Cutting Edge Techniques
PlayStation: Cutting Edge Techniques
 
Practical SPU Programming in God of War III
Practical SPU Programming in God of War IIIPractical SPU Programming in God of War III
Practical SPU Programming in God of War III
 
Unity optimization techniques applied in Catan Universe
Unity optimization techniques applied in Catan UniverseUnity optimization techniques applied in Catan Universe
Unity optimization techniques applied in Catan Universe
 
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
Unite Berlin 2018 - Book of the Dead Optimizing Performance for High End Cons...
 
Terrain Rendering using GPU-Based Geometry Clipmaps
Terrain Rendering usingGPU-Based Geometry ClipmapsTerrain Rendering usingGPU-Based Geometry Clipmaps
Terrain Rendering using GPU-Based Geometry Clipmaps
 
A modern Post-Processing Pipeline
A modern Post-Processing PipelineA modern Post-Processing Pipeline
A modern Post-Processing Pipeline
 
A new Post-Processing Pipeline
A new Post-Processing PipelineA new Post-Processing Pipeline
A new Post-Processing Pipeline
 
Felwyrld Tech
Felwyrld TechFelwyrld Tech
Felwyrld Tech
 
Paris Master Class 2011 - 01 Deferred Lighting, MSAA
Paris Master Class 2011 - 01 Deferred Lighting, MSAAParis Master Class 2011 - 01 Deferred Lighting, MSAA
Paris Master Class 2011 - 01 Deferred Lighting, MSAA
 
Massive Point Light Soft Shadows
Massive Point Light Soft ShadowsMassive Point Light Soft Shadows
Massive Point Light Soft Shadows
 
Sony Computer Entertainment Europe Research & Development Division
Sony Computer Entertainment Europe Research & Development DivisionSony Computer Entertainment Europe Research & Development Division
Sony Computer Entertainment Europe Research & Development Division
 
Optcarrot: A Pure-Ruby NES Emulator
Optcarrot: A Pure-Ruby NES EmulatorOptcarrot: A Pure-Ruby NES Emulator
Optcarrot: A Pure-Ruby NES Emulator
 
Develop2012 deferred sanchez_stachowiak
Develop2012 deferred sanchez_stachowiakDevelop2012 deferred sanchez_stachowiak
Develop2012 deferred sanchez_stachowiak
 
Next generation mobile gp us and rendering techniques - niklas smedberg
Next generation mobile gp us and rendering techniques - niklas smedbergNext generation mobile gp us and rendering techniques - niklas smedberg
Next generation mobile gp us and rendering techniques - niklas smedberg
 
Deferred shading
Deferred shadingDeferred shading
Deferred shading
 
Progressive Lightmapper: An Introduction to Lightmapping in Unity
Progressive Lightmapper: An Introduction to Lightmapping in UnityProgressive Lightmapper: An Introduction to Lightmapping in Unity
Progressive Lightmapper: An Introduction to Lightmapping in Unity
 
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
 
HiPEAC 2019 Workshop - Use Cases
HiPEAC 2019 Workshop - Use CasesHiPEAC 2019 Workshop - Use Cases
HiPEAC 2019 Workshop - Use Cases
 
「原神」におけるコンソールプラットフォーム開発
「原神」におけるコンソールプラットフォーム開発「原神」におけるコンソールプラットフォーム開発
「原神」におけるコンソールプラットフォーム開発
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
 

More from Slide_N

Future Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPCFuture Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPC
Slide_N
 
Common Software Models and Platform for Cell and SpursEngine
Common Software Models and Platform for Cell and SpursEngineCommon Software Models and Platform for Cell and SpursEngine
Common Software Models and Platform for Cell and SpursEngine
Slide_N
 
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Slide_N
 
Towards Cell Broadband Engine - Together with Playstation
Towards Cell Broadband Engine  - Together with PlaystationTowards Cell Broadband Engine  - Together with Playstation
Towards Cell Broadband Engine - Together with Playstation
Slide_N
 
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
Slide_N
 
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Slide_N
 
Experiences with PlayStation VR - Sony Interactive Entertainment
Experiences with PlayStation VR  - Sony Interactive EntertainmentExperiences with PlayStation VR  - Sony Interactive Entertainment
Experiences with PlayStation VR - Sony Interactive Entertainment
Slide_N
 
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
Slide_N
 
Filtering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingFiltering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-Aliasing
Slide_N
 
Chip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdfChip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdf
Slide_N
 
Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
Slide_N
 
New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - Kutaragi
Slide_N
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - Kutaragi
Slide_N
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60
Slide_N
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living Room
Slide_N
 
The Technology behind PlayStation 2
The Technology behind PlayStation 2The Technology behind PlayStation 2
The Technology behind PlayStation 2
Slide_N
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and Visualization
Slide_N
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor Design
Slide_N
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Slide_N
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: Theory
Slide_N
 

More from Slide_N (20)

Future Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPCFuture Commodity Chip Called CELL for HPC
Future Commodity Chip Called CELL for HPC
 
Common Software Models and Platform for Cell and SpursEngine
Common Software Models and Platform for Cell and SpursEngineCommon Software Models and Platform for Cell and SpursEngine
Common Software Models and Platform for Cell and SpursEngine
 
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...Toshiba's Approach to Consumer Product Applications by Cell and Desire/Challe...
Deferred Lighting and Post Processing on PLAYSTATION®3

  • 1. Deferred Lighting and Post Processing on PLAYSTATION®3 Matt Swoboda PhyreEngine™ Team Sony Computer Entertainment Europe (SCEE) R&D
  • 2. Where Are We Now? • PS3 into its 3rd year • Many developers on their 2nd generation engines • Solved the basic problems • SPUs STILL underused – But it’s improving
  • 3. But.. • GPU now the most common bottleneck • Usually limited by fragment operations • Many titles take > 1/3 of their time in post processing • Most developers want to do even more fragment work – More / heavier post processing effects – Better lighting techniques / more lights / softer shadows – Longer shaders – Features ported from PC / other console hardware
  • 4. “We fixed the vertex bottleneck..” • Many possible solutions to improve geometry performance beyond just “optimising the shader” – LOD – Occlusion culling & visibility culling – Move large vertex operations to SPU, e.g. skinning – SPU triangle culling
  • 5. What About Pixels? • Fragment operations / post processing rarely optimised like geometry operations – Throw whole operation at the GPU – Same operation done for every pixel – Spatial optimization / branching considered too slow • SPU not considered: “too slow”, “uses too much bandwidth”
  • 6. SPU pixel processing • Yes, the SPU is fast enough to process pixels • Won’t beat the GPU in a brute force race • GPU specialises in rasterising triangles and sampling textures – has dedicated hardware • SPU is a general purpose processor – Use flexibility to your advantage – Choose different code branches and fast paths
  • 7. Post Processing Effects on SPU A Whirlwind Tour
  • 8. What to do on SPU • Options: • Offload whole processes from GPU to SPU • Or use SPU and GPU together to do one process
  • 9. Depth Of Field Pre-Process • High quality depth of field requires a long fragment shader – Read depth samples and colour samples in a kernel / disc – Check depths against centre pixel depth – Weight colours by depth check results • Wasteful for “most” of the screen – All depth checks pass (out of focus) or all fail (in focus) – All fail == pass through original buffer – All pass == use pre-blurred buffer – separable gaussian blur • Categorise the screen for these cases on SPU
  • 10. Depth Of Field Classification Results • Post process depth buffer • Classify by min/max depth • Green: fully in focus • Blue: fully out of focus • Red: neither fully in nor out
  • 11. Depth Of Field Pre-process results • Pre-process only on SPU, blur operations on GPU – Goal: minimise overall frame time and latency • Large blur w.r.t. depth • 15 ms+ on GPU alone • 1.5-2 ms on SPU + 3 ms on GPU
  • 12. Screen Tile Classification • Categorise the screen using the range of depth values within a tile • Powerful technique with many applications – Full screen effect optimization - DOF, SSAO.. – Soft particles – Affecting lights – Occluder information
  • 13. Screen Space Ambient Occlusion (SSAO) • Generate an ambient occlusion approximation using the depth buffer alone • Perform a large kernel-based series of depth comparisons and sum the results • Downsample output to ½ size for performance – Output normals for bilateral upsampling
  • 14. SPU Screen Space Ambient Occlusion Results • GPU version: 10 ms+ • SPU version: 6 ms on 2 SPUs • Used in “Donkey Trader” PhyreEngine game template
  • 16. Deferred Shading Overview • Rasterise geometry information to multiple “GBuffers” (geometry buffers) • Apply lighting and shading in a post process
  • 17. Demo
  • 18. Deferred Lighting on SPU • The SPU can handle the deferred lighting process • The GPU renders the geometry to GBuffers • SPU and GPU execute in parallel – Total time : max( geometry, lighting )
  • 19. Deferred Lighting on SPU: Implementation (1) • Process each pixel once • Work out which lights affect each pixel • Apply the N affecting lights in a loop • Process the screen in tiles • Use classification techniques per tile to optimise
  • 20. Deferred Lighting on SPU: Implementation (2) • Calculate affecting lights per tile – Build a frustum around the tile using the min and max depth values in that tile – Perform frustum check with each light’s bounding volume – Compare light direction with tile average normal value • Choose fast paths based on tile contents – No lights affect the tile? Use fast path – Check material values to see if any pixels are marked as lit
  • 21. Deferred Lighting on SPU: Implementation (3) • Choose whether to process MSAA per tile – If no sample pair values differ, light only one sample from the pair, otherwise light both samples separately – Typically only a few tiles need both MSAA samples lit • (Figure: tiles requiring MSAA)
  • 22. Deferred Lighting on SPU: Results • 3 shadow casting lights, 100 point lights • 2x MSAA, 720p – Lighting performed per sample • Apply tone mapping on SPU – Virtually free • Performance: > 60 fps, 3 SPUs for 11 ms each – No MSAA: 2 SPUs for 11 ms
  • 23. Deferred Lighting on SPU: Issues • Potential latency – Must keep GPU busy while SPU process is running – Render something else or add a frame of latency • Main memory requirements • Shadows – Requires “random” texture access – not ideal for SPU – Can render shadows on GPU to a full screen buffer and use it on SPU
  • 24. Flavours of Deferred Lighting on SPU • Full deferred render on SPU – Input all GBuffers, output final composited result • Light pre-pass render on SPU – Input normal and depth only; calculate light result; sample in 2nd geometry pass • Light tile classification data output? – SPU outputs information per tile about affecting lights – Do lighting calculations on GPU
  • 26. Volumetric Lighting • Also known as “god rays” or “light beams” • Simulates the effect of light illuminating dust particles in the air • Numerous fakes exist – Artist-placed geometry – Artist-placed particles • Better: generate using the shadow map – Works in a “general case”
  • 27. Volumetric Lighting • Ray march through the shadow map – Trace one ray per pixel in screen space – Sample the depth buffer to determine the end of the ray • Sample the shadow map at N points along the ray – N ~= 50 – Attenuate and sum up the number of samples that passed • Blur and add noise
  • 28. Volumetric Lighting • Effect is a bit too slow to be practical on GPU: ~5 ms • Do it on SPU instead • Parallelises with GPU easily – Result needed late in the render at compositing stage – Only needs depth and shadow map inputs • Problem: must randomly sample from the shadow map
  • 29. Texture sampling on SPU • “Random access” texture sampling is bad for SPU • It’s bad for GPU, too, but sometimes you just have to do it • GPU: – Fast access from texture cache; cache miss is slow – Dedicated hardware handles lookups, filtering and wrapping • SPU: – Fast access from “texture cache” (SPU local memory) – Slow access on cache miss (DMA from main memory) – Cache lookups slow (no dedicated hardware) – Must manually handle filtering and wrapping (again, slow)
  • 30. Texture sampling on SPU • Either: – Make the texture entirely fit in SPU local memory – Problem solved! – Still inefficient: random accesses reduce register parallelism • Or – Write a very good software cache – Locate potential cache misses early - long before you need the values – Avoid branches in sampling code
  • 31. Volumetric Lighting on SPU • Volumetric light result will be blurred – Don’t need full shadow map accuracy – No filtering on texture samples needed • Downsample shadow map from 1024x1024, 32 bit to 256x256, 16 bit – 128 KB – fits in SPU local memory • Fast enough to sample on SPU
  • 32. Volumetric Lighting on SPU: Results • Takes ~11 ms on 1 SPU
  • 33. Shadow Mapping on SPU (1) • Needs the full-size shadow map – 1024x1024x32 bit == 4 MB : won’t fit in SPU local memory – We’ll have to write that “very good software cache”, then • Pre-process the shadow map on SPU – Calculate min and max depth for each tile – Store in a low resolution depth hierarchy map – Output high resolution shadow map as cache tiles
  • 34. Shadow Mapping on SPU (2) • Software cache with 32 entries – Each entry is a shadow map tile – Branchless determination of cache entry index for tile index • Locate cache misses early – While detiling depth data – work out required shadow tiles – Pull in all cache-missed tiles • Sample shadow map during lighting calculations – All required shadow tiles are now definitely in cache – lookup is branchless • It’s quite slow – Locate tile in cache per pixel
  • 35. Shadow Mapping on SPU (3) • Optimise via special cases to win back performance • Use the low resolution shadow tile map – Always in SPU local memory – If pixel shadow z > tile max Z : definitely in shadow – If pixel shadow z < tile min Z : definitely not in shadow • Check low resolution map before triggering cache fetches • Classify whole screen tiles as in or out of shadow – Don’t need to sample high resolution shadow map at all for those tiles • (Figure: tiles requiring high resolution shadow samples)
  • 37. Conclusion • New additions to your toolbox: – Tile-based classification techniques on SPU – Deferred lighting on SPU – Texture sampling on SPU • Rendering is no longer just a GPU problem – Use general purpose nature of the SPU to your advantage • Rethink fragment processing optimisation strategies – Make the GPU work smarter, not harder
  • 38. Conclusion • Some titles are already using SPU post processing – Killzone 2 • PhyreEngine™ is here to help – (If you’re a registered PS3 developer) it’s on DevNet now – Not just an engine: also a reference – Comes with full source – Download it, learn from it, steal bits of the code – Check out the PhyreEngine™ SPU Post Processing Library
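
The depth of field tile classification from slides 9 to 12 can be illustrated with a small sketch. This is not the PhyreEngine code: it is plain Python rather than SPU code, and the names `classify_tile`, `classify_screen`, `focus_near` and `focus_far` are hypothetical. It assumes the in-focus region is a simple depth interval, and it returns three lists of tile coordinates, loosely mirroring the three point sprite lists mentioned in the notes.

```python
from enum import Enum


class TileClass(Enum):
    IN_FOCUS = "in_focus"          # pass the original buffer through
    OUT_OF_FOCUS = "out_of_focus"  # use the pre-blurred buffer
    MIXED = "mixed"                # needs the full depth-aware blur


def classify_tile(depths, focus_near, focus_far):
    """Classify one tile from the min and max of its depth values.

    Assumption: a pixel is in focus when focus_near <= depth <= focus_far.
    """
    lo, hi = min(depths), max(depths)
    if focus_near <= lo and hi <= focus_far:
        return TileClass.IN_FOCUS
    if hi < focus_near or lo > focus_far:
        return TileClass.OUT_OF_FOCUS
    # Conservative: the depth range overlaps the focus boundary somewhere.
    return TileClass.MIXED


def classify_screen(depth, width, height, tile, focus_near, focus_far):
    """Return one list of (tile_x, tile_y) coordinates per classification.

    `depth` is a flat, row-major list of width * height depth values.
    """
    lists = {cls: [] for cls in TileClass}
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            depths = [
                depth[y * width + x]
                for y in range(ty, min(ty + tile, height))
                for x in range(tx, min(tx + tile, width))
            ]
            cls = classify_tile(depths, focus_near, focus_far)
            lists[cls].append((tx // tile, ty // tile))
    return lists
```

Each list can then be drawn with its own shader, so only the `MIXED` tiles pay for the long depth-aware blur.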

Editor's Notes

  1. The typical performance limitation on PS3 titles has moved from a CPU bottleneck to a GPU bottleneck, specifically fragment operations.
  2. There’s a large range of techniques that can be applied to optimise vertex performance, often focusing on only drawing what you can see, and spending the most time drawing the most important things. In addition the SPU has successfully been used to perform vertex processing.
  3. Fragment operations are usually applied in a brute force manner. Techniques for spending the most time only where it makes the most difference – e.g. edge cases – are rarely used because of the need for branching and potentially complex pre-passes. The SPU is rarely used for pixel processing because it’s perceived as being too slow or needing too much bandwidth.
  4. The SPU is fast enough to perform some pixel processing tasks. Bandwidth is rarely an issue – every post process we’ve ever developed so far on SPU has been cycle limited, not bandwidth limited – the PS3 bus really is that fast. The GPU is very good at specific tasks, such as rasterisation and texture sampling, but it has strict limitations. The SPU is a general purpose processor – you can read and write data in any order you like, apply branches anywhere you like, and use fast paths to optimise the process.
  5. A brief run-through of some post processes we’ve implemented on SPU, to contrast the differing approaches used.
  6. Use SPU to optimise GPU operations: depth buffer tile classification; DXT compression of render targets on SPU. Use SPU to perform processes more suitable for the SPU architecture: summed area table generation (much easier on SPU than GPU); deferred lighting (SPU can pick fast paths per block of pixels). Use SPU to perform processes offloaded from the GPU: screen space ambient occlusion, volumetric lighting. Operations on SPU work in parallel with the GPU doing other work, which minimises time on the critical path.
  7. Depth of field is a very desirable but potentially slow post process. The process works by performing a blur where each weight in the kernel is scaled by a function of the difference between the kernel sample depth and pixel depth. As such the kernel can’t be separated, and the blur shader is therefore quite long and slow. This is in fact a waste of time for much of the screen. For most pixels, the depth differences are such that the weights are all 1 or 0. If we can detect areas like this we can run those areas through considerably shorter shaders and greatly reduce execution time.
  8. We can perform this classification process on SPU. The process reads the depth buffer, processes it in tiles, classifies the results and outputs three lists of point sprites – one for each classification type. The lists are then rendered on the GPU using different shaders to perform the effect. For more detail about this process please refer to my SCEE DevStation 2008 presentation on SPU post processing.
  9. The result is that using a version of the effect with classification techniques is massively faster than one without. The performance timings can be scaled down if the original shader used is simpler.
  10. Soft particles: determine which particles need to be handled as “soft” by checking depth min/max tiles. Only those intersecting need to be handled as “soft”. The rest can be handled as regular particles or even avoided completely if they are totally obscured by the depth.
  11. For SSAO we perform the entire process on the SPU. The effect is generated and output to a texture which is sampled during rendering on the GPU.
  12. The process could be kicked off after the depth pre-pass and run in parallel with the shadow rendering pass on the GPU. The results were then available to be read during the main render pass.
  13. Rasterise geometry information to multiple “GBuffers” (geometry buffers): colour, normal, depth, specular and material information. Apply lighting and shading as post processes. Multiple lights and spatial optimizations are easy, and fewer shader combinations are required. It has some negative points: GBuffers consume memory and bandwidth, MSAA is problematic, and BRDFs are fixed.
  14. Demo showing deferred lighting on the SPU.
  15. The SPU handles the deferred lighting process usually done on GPU. Parallelise the lighting process across multiple SPUs to improve performance. The GPU handles rasterisation processes - like rendering GBuffers, shadow maps, alpha passes, reflection geometry etc. SPU may be slower than GPU at light processing, but it’s faster than doing it all in serial on GPU. By moving such a large body of work off the GPU, we can greatly increase the overall frame rate if the GPU is the bottleneck.
  16. To handle multiple lighting models, calculate all the different models and select based on light type. Branch per tile to optimise the set of light types used.
  17. Comparing the light direction with tile average normal value can avoid lights behind walls and so on.
  18. MSAA - why does this work? If there are no triangle edges or intersections, both samples rendered from the colour output will contain the same value.
  19. The resulting performance is massively increased compared to a GPU-only equivalent. Tone mapping is virtually free on SPU – accumulation is just a sum of all pixel luminance, easily rolled into lighting calculations. The current frame’s tone mapping is applied using the eye adapted value from the previous frame. This maps much better to SPU than GPU. Why not roll in colour correction and other post processing operations – e.g. a depth of field pre-process - to the full deferred render solution?
  20. GBuffers must be in main memory to be read by SPU – potentially requiring a lot of main memory. Find something else for the GPU to do that doesn’t depend on SPU results - alpha geometry, reflections, shadow maps, effects? Otherwise, add a frame of latency.
  21. Different options exist depending on your limitations.
  22. Potentially very slow. 50 texture samples times 1280x720 pixels? Downsample to ¼ width and height – result is blurred anyway. There is a demo of this effect running on GPU in the NVIDIA SDK.
  23. Even after considerable down-sampling the effect still takes over 5ms on the GPU – too slow for our situation. So we decided to implement it on SPU instead. Unfortunately the effect requires random sampling of a shadow map texture – something which is difficult to map to the SPU.
  24. To work out the best way to do SPU texture sampling, consider the GPU. The GPU has a texture cache which stores a small portion of the texture in fast-to-access memory, and the rest in a slower, larger memory buffer elsewhere (main memory or VRAM). On the SPU, that texture cache is the SPU’s local store. Accesses to this are fast, but if the data is not in cache it must be pulled in from main memory by DMA – which is slower. Also there is no dedicated hardware to manage the texture lookup – everything must be emulated in software.
  25. If the texture can fit entirely in the SPU’s local store, we can avoid the whole texture cache issue. If not, we have to write a software cache that can handle it. This software cache must be branchless for lookups, otherwise performance of the calling code will be destroyed. This implies that cache misses must be caught and resolved early, so there are no DMAs in the main processing loop either.
  26. Fortunately for volumetric lighting, the whole shadow map can be made to fit in SPU local memory by downsampling it and reducing its bit depth (1024x1024 at 32 bit down to 256x256 at 16 bit).
  27. The effect is fast and parallel enough to run on an SPU in the background while other work is done on the GPU.
  28. Low resolution format: 64x64 for a 1k x 1k shadow map. Output the high resolution shadow map in a series of 16x16 tiles – they map to cache pages 1k in size.
  29. The low resolution shadow min/max depth map can be used to greatly optimise the process by skipping high resolution shadow reads where the whole shadow tile is all in or out of shadow, and skipping shadow lookups entirely where the whole screen tile is in or out of shadow. By doing all this we can achieve good performance – good enough to make it practical to sample shadow maps on the SPU. This screen tile information can be output in a pre-process step similar to the depth of field classification and used for deferred shadowing on the GPU – only sampling the shadow map on GPU for the edge cases where the tile falls on the border of in and out of shadow. The rest of the screen can run through a fast path. This can greatly optimise the performance of deferred shadowing.
  30. Key takeaways: Reconsider your approach to fragment processing operations. Use tile-based classification on the SPU to optimise heavy fragment processes. Move 2D fragment processing operations such as post process effects or deferred lighting to the SPU. Texture sampling on the SPU is possible too.
  31. Much of the work you need for SPU post processing has already been done for you – download PhyreEngine and you’ll find a complete engine with full source code which implements the effects in this presentation. It also provides the necessary GPU/SPU sync framework and many useful utilities to aid post processing – such as de-tiling of main memory render targets.
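
The low resolution min/max shadow test from the Shadow Mapping on SPU slides can be sketched as follows. This is a minimal illustration in plain Python, not the PhyreEngine implementation: the function names are hypothetical, the software cache and DMA are left out (the high resolution map is just a flat list), and it assumes that a larger shadow-space z means further from the light. The 16x16 tile size follows note 28.

```python
def build_min_max(shadow, size, tile=16):
    """Build a low resolution (min, max) depth map from a square shadow map.

    `shadow` is a flat, row-major list of size * size depth values.
    The result is indexed as result[tile_y][tile_x].
    """
    tiles = size // tile
    result = []
    for ty in range(tiles):
        row = []
        for tx in range(tiles):
            vals = [
                shadow[(ty * tile + y) * size + tx * tile + x]
                for y in range(tile)
                for x in range(tile)
            ]
            row.append((min(vals), max(vals)))
        result.append(row)
    return result


def shadow_test(shadow, size, min_max, u, v, z, tile=16):
    """Return (in_shadow, needed_high_res) for texel (u, v) at shadow-space z.

    The low resolution map is checked first; the high resolution map is
    only read when the result cannot be decided from the tile's range.
    """
    lo, hi = min_max[v // tile][u // tile]
    if z > hi:
        return True, False   # behind every occluder in the tile
    if z < lo:
        return False, False  # in front of every occluder in the tile
    return z > shadow[v * size + u], True
```

In a real SPU version the second return value would decide whether a high resolution shadow tile has to be fetched into the software cache at all.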