SPU-based Deferred Shading for Battlefield 3 on Playstation 3 Christina Coffin Platform Specialist, Senior Engineer DICE
Agenda Introduction SPU lighting overview SPU lighting practicalities & algorithms Code optimizations & development practices Best practices Conclusions Q&A
Introduction Maxing out mature consoles
Past: Frostbite 1 Forward Rendered + Destruction + Limited Lighting
Precomputed + Static = 
Now: Frostbite 2 + Battlefield 3 Indoor + Outdoor + Urban HDR lighting solution Complex lighting with Environment Destruction Deferred shaded Multiple Light types and materials Goal:  ”Use SPUs to distribute shading work and offload the GPU so it can do other work in parallel”
Why SPU-based Deferred Shading? Want more interesting visual lighting + FX Offload GPU work to the SPUs  Having SPU+GPU work together on visuals raises the bar Already developed a tile-based DX 11 compute shader Good reference point for doing deferred work on SPU Lots of SPU compute power to take advantage of Simple sync model to get RSX + SPUs cooperating together
SPU Shading Overview Physically based specular shading model Energy Conserving specular , specular power from 2 to 2048 Lighting performed in camera relative worldspace, float precision fp16 HDR Output Multiple Materials / Lighting models Runs on 5-6 SPUs
Multiple lighting models + materials Standard Metallic Skin Translucency
RSX Rendering Frame Timeline Note: Sizes not proportional to time taken! Geometry Pass Transfer  Gbuffer+Z to CELL memory SPU Job Kick Cascade Shadows + Particles + SSAO JTS Barrier SPU 0 SPU Shade Job SPU 1 SPU Shade Job SPU 2 SPU Shade Job SPU 3 SPU Shade Job SPU 4 SPU Shade Job
RSX Rendering Frame Timeline Note: Sizes not proportional to time taken! Geometry Pass Transfer  Gbuffer+Z to CELL memory SPU Job Kick Cascade Shadows + Particles + SSAO JTS Barrier PostFx + Finish SPU 0 SPU that finishes last tile clears JTS in cmdBuffer by DMA SPU Shade Job SPU 1 SPU Shade Job SPU 2 SPU Shade Job SPU 3 SPU Shade Job SPU 4 SPU Shade Job
GPU Renders GBuffer RSX render to local memory GBuffer Data 4x MRT    ARGB8888 + Z24S8 Tiled Memory
RSX Setup Rendering Data Flow Local Memory Cell Memory Main Geometry Pass (6-10ms) Z Buffer + 4 MRT Z Buffer + 3 MRT RSX Copy     Z+Gbuffers (1.3ms) Inline DWORD Transfer (Spu job kick)
Source data in CELL memory + Stencil SPU SPU SPU Packed array of all lights visible in camera (1000+) SPU SPU SPU
SPU Tile Based Shading work units
SPU Shading Flow Overview For each 64x64 pixel screen tile region: Reserve a tile Transfer & detile data Cull lights Unpack & Shade pixels Transfer shaded pixels to output framebuffer
SPU Tile Work Allocation SPUs determine their tile to process by atomically incrementing a shared tile index value Index value maps to fetch address of Z+Gbuffers per tile Simple sync model to keep SPUs working Auto Load balancing between SPUs Not all tiles take equal time = variable material+lighting complexity
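The work-allocation scheme above can be sketched in portable C11. This is a minimal sketch, not the Frostbite implementation: on the SPUs the shared counter would live in main memory and be updated atomically from each SPU, whereas here `atomic_fetch_add` from `<stdatomic.h>` stands in for that step. The names `TileQueue`, `tile_queue_init` and `tile_queue_reserve` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TILE_SIZE 64u  /* 64x64 pixel tiles, as in the slides */

/* Hypothetical shared state: one counter that every worker increments. */
typedef struct {
    atomic_uint next_tile;   /* index of the next unreserved tile */
    uint32_t    tiles_x;     /* tiles per row */
    uint32_t    tile_count;  /* total tiles on screen */
} TileQueue;

static void tile_queue_init(TileQueue *q, uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    q->tiles_x    = (width + TILE_SIZE - 1) / TILE_SIZE;
    q->tile_count = q->tiles_x * ((height + TILE_SIZE - 1) / TILE_SIZE);
    atomic_init(&q->next_tile, 0);
}

/* Reserve the next tile. Returns 1 and writes the tile's pixel origin,
   or returns 0 once every tile has been handed out. */
static int tile_queue_reserve(TileQueue *q, uint32_t *px, uint32_t *py)
{
    uint32_t index = atomic_fetch_add(&q->next_tile, 1u);
    if (index >= q->tile_count)
        return 0;
    *px = (index % q->tiles_x) * TILE_SIZE;
    *py = (index / q->tiles_x) * TILE_SIZE;
    return 1;
}
```

Each worker simply loops on `tile_queue_reserve` until it returns 0. Because a slow tile only delays the worker that owns it, the load balancing described on the slide falls out of the counter with no further scheduling logic.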
Data transfer + De-TILING Getting the Tile Data onto SPUs....
FB Tile Data Fetch Get MRT 2 DMA Traffic Get MRT 0 Get MRT 1 Get ZBuffer SPU Code Detile 0 Detile 1 Detile Z Detile 2 SPU LS 64x64 Pixels 32bpp (x4) Output Linear Formatted Ready for Shading Work
SPU LIGHT TILE CULLING
SPU Cull lights Determine visible lights that intersect the tile volume in worldspace Tile frusta generation from min/max z-depth extents Ignores ’sky’ depth ( Z >= 0xFFFFFF00) Each tile has a different near/far plane based on its pixels SPU code generates frusta bounding volume for culling Point-, Spot- & Line-light types supported specialized culling vs. tile volumes
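As a rough illustration of the min/max depth step, the sketch below scans the raw depth+stencil words of a tile and skips sky pixels using the threshold from the slide. It assumes the 24-bit depth sits in the upper bits of each Z24S8 word (which is what the `0xFFFFFF00` threshold suggests); the struct and function names are hypothetical, and the real SPU code would be vectorized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKY_DEPTH 0xFFFFFF00u  /* raw Z24S8 words at or above this are sky */

typedef struct {
    uint32_t min_z;        /* nearest non-sky depth (24-bit) */
    uint32_t max_z;        /* farthest non-sky depth (24-bit) */
    int      has_geometry; /* 0 when the tile contains only sky */
} TileDepthRange;

/* Scan raw depth+stencil words. Assumes depth occupies the upper 24 bits. */
static TileDepthRange tile_depth_range(const uint32_t *zs, size_t count)
{
    TileDepthRange r = { 0x00FFFFFFu, 0u, 0 };
    assert(zs != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (zs[i] >= SKY_DEPTH)
            continue;                /* sky pixel: ignored */
        uint32_t z = zs[i] >> 8;     /* drop the 8 stencil bits */
        if (z < r.min_z) r.min_z = z;
        if (z > r.max_z) r.max_z = z;
        r.has_geometry = 1;
    }
    return r;
}
```

The resulting range would then be converted to per-tile near and far planes. A tile with `has_geometry == 0` needs no light culling or shading at all.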
SPU SHADING LOOP
SPU Shade pixels SPU Tile Based Shading We do the same things the GPU does in shaders, but written for SPU ISA Vectorize GPU .hlsl / compute shader to get started Negligible differences in float rounding RSX vs SPU Core Steps: Unpack Gbuffer+Z Data -> Shade -> Pack to fp16 -> DMA out
Core shading components 3MRT + Z = 128 bits per pixel of source data Distance attenuation Diffuse  Lighting Mask on Stencil Data Texture Sampling (Limited) Specular Lighting Light Volume Clipping Light Shape Attenuation Wraparound Lighting Attenuation by Surface Normal Fresnel Material Type Shading Blend in Diffuse Albedo
SPU Shading - 4x4 Pixel Quads Core shading loop Operates on 16 pixels at a time Float32 precision Spatially Coherent Lit in worldspace Unpack source data to Structure of Arrays (SoA) format
Gbuffer data expansion to SoA for shading Depth + Stencil Spec Albedo Diffuse Albedo Normals Smoothness Material ID = Lots of shufb + csflt instructions for swizzle / mask / converting to float
SPU Light Tile Job Loop for (all pixels) Unpack 16 Pixels of Z+Gbuffer data Apply all PointLights Apply all SpotLights Apply all LineLights Convert lighting output       to fp16 and store to LS
DMA output finished pixels Finished 32x16 pixel tiles output to RSX memory by DMA list 1 List entry per 32x1 pixel scanline Required due to Linear buffer destination! Once all tiles are done & transferred: SPU finishing the last tile clears ’wait for SPUs’ JTS in cmdbuffer GPU is free to continue rendering for the frame Transparent objects Blend-In Particles Post-process Tonemapping
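The need for one list entry per scanline follows from the linear destination: consecutive rows of a 32x16 subtile are a full framebuffer pitch apart, so they cannot be written as one contiguous transfer. The sketch below only computes the per-scanline sizes and destination offsets. `ScanlineEntry` is a hypothetical stand-in for a real DMA list element, and the 8 bytes per pixel used in the usage note (four fp16 channels) is an assumption, not a figure from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define SUBTILE_W 32u
#define SUBTILE_H 16u

/* Hypothetical list element: byte count plus destination byte offset. */
typedef struct {
    uint32_t size;
    uint32_t dst_offset;
} ScanlineEntry;

/* Fill one entry per 32x1 scanline of the subtile at pixel (x0, y0).
   pitch_bytes is the byte stride of one framebuffer row. */
static void build_scanline_list(ScanlineEntry out[SUBTILE_H],
                                uint32_t x0, uint32_t y0,
                                uint32_t pitch_bytes,
                                uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    for (uint32_t row = 0; row < SUBTILE_H; ++row) {
        out[row].size       = SUBTILE_W * bytes_per_pixel;
        out[row].dst_offset = (y0 + row) * pitch_bytes + x0 * bytes_per_pixel;
    }
}
```

For a 1280-pixel-wide target at 8 bytes per pixel, each entry covers 256 bytes and successive entries are 10240 bytes apart.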
Meanwhile, back in RSX Land.... RSX is busy doing something (useful) while the SPUs compute the fp16 radiance for tiles. Planar Reflections Cascade and Spotlight Shadow Rendering GPU Lighting that mixes texture projection/sampling Offscreen buffer Particle Rendering Z downsamples for postFX + SSAO Virtual Texturing / Compositing Occlusion queries
Algorithmic optimizations
Tile Light Culling Tile Based Culling System Designed to handle extreme lighting loads Shading Budget 40ms (split across 5 SPUs) Culling system overhead 1-4ms (1000+ lights)
Tile Light Culling 2 Light Culling passes: FB Tile Cull, 64x64 pixel tiles SubTile Cull, 32x16 pixel tiles
DMA Traffic FB Tile Light Culling Frustum Light List (1000+ Lights) in CELL memory Fetch in 16 Light Batches DmaGet Batch-A DmaGet Batch-B DmaGet Batch-C SPU Code Cull Batch-A Cull Batch-B SPU LS FB Tile Light Visibility List 128 lights
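The batch pipeline in this diagram can be sketched as a double-buffered loop: while one batch of 16 lights is being culled, the next one is already being fetched. In the sketch below `memcpy` stands in for an asynchronous `DmaGet`, so nothing actually overlaps. The `Light` layout and the depth-slab test are hypothetical simplifications of the real tile-frustum test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BATCH_SIZE       16u   /* lights fetched per batch */
#define MAX_TILE_LIGHTS 128u   /* capacity of the tile visibility list */

typedef struct { float x, y, z, radius; } Light;  /* hypothetical layout */

/* Stand-in for an asynchronous DmaGet into local store. */
static void fetch_batch(Light *dst, const Light *src, size_t n)
{
    memcpy(dst, src, n * sizeof(Light));
}

/* Hypothetical test: does the light's sphere overlap the tile depth slab? */
static int light_touches_slab(const Light *l, float z_near, float z_far)
{
    return l->z + l->radius >= z_near && l->z - l->radius <= z_far;
}

/* Cull `count` lights in batches, ping-ponging between two buffers.
   Returns the number of light indices written to `visible`. */
static size_t cull_lights(const Light *lights, size_t count,
                          float z_near, float z_far,
                          uint16_t visible[MAX_TILE_LIGHTS])
{
    Light  buf[2][BATCH_SIZE];
    size_t n_visible = 0, cur = 0;

    assert(count <= 65535u);          /* indices must fit in 16 bits */
    if (count == 0)
        return 0;
    fetch_batch(buf[0], lights, count < BATCH_SIZE ? count : BATCH_SIZE);

    for (size_t base = 0; base < count; base += BATCH_SIZE, cur ^= 1u) {
        size_t n    = count - base < BATCH_SIZE ? count - base : BATCH_SIZE;
        size_t next = base + BATCH_SIZE;
        if (next < count) {           /* kick the fetch of the next batch */
            size_t m = count - next < BATCH_SIZE ? count - next : BATCH_SIZE;
            fetch_batch(buf[cur ^ 1u], lights + next, m);
        }
        for (size_t i = 0; i < n && n_visible < MAX_TILE_LIGHTS; ++i)
            if (light_touches_slab(&buf[cur][i], z_near, z_far))
                visible[n_visible++] = (uint16_t)(base + i);
    }
    return n_visible;
}
```

Only the two small batch buffers and the 128-entry index list live in local store, which keeps the footprint fixed no matter how many lights are in the frustum list.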
SubTile Light Culling SPU LS FB Tile Light Visibility List 128 lights SubTile 0 Cull for(8 SubTiles) SubTile 1 Cull SubTile 2 Cull Subtile 0 Light Index List Subtile 1 Light Index List Subtile 2 Light Index List
Lights per 64x64 pixel tile
Light Culling Optimizations - Hierarchy Camera Frustum Light Volume Coarse Z-Occlusion FB Tile SPU Z-Cull 64x64 Pixels SubTile SPU Z-Cull 32x16 Pixels Coarse to Fine Grained Culling
Culling Optimizations - Takeaway Complex scenes require aggressive culling Avoids bad performance edge cases Stabilizes performance cost Mix of brute force + simple hierarchy Good Debug Visualizations are key Help guide content optimization and validate culling
Algorithmic optimization #0 Material Classification Knowing which materials reside in a tile = choose optimal SPU code permutation that avoids unneeded work. E.g. No point in calculating skin shading if the material isn't present in a subtile. Use SPU shading code permutations! Similar to GPU optimization via shader permutations SPU Local Store considerations with this approach
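One way to picture material classification is a bitmask of the material IDs present in a subtile, which is then used to pick a shading permutation. The sketch below is illustrative only: the material ID values, the three permutations and the selection rules are assumptions, since the slides do not give the actual G-buffer encoding or the permutation set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical material IDs; the real G-buffer encoding is not given. */
enum { MAT_DEFAULT = 0, MAT_METALLIC = 1, MAT_SKIN = 2, MAT_TRANSLUCENT = 3 };

/* Build a bitmask with bit N set when material ID N occurs in the subtile. */
static uint32_t classify_materials(const uint8_t *material_ids, size_t count)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < count; ++i) {
        assert(material_ids[i] < 32);
        mask |= 1u << material_ids[i];
    }
    return mask;
}

typedef enum { SHADE_DEFAULT_ONLY, SHADE_NO_SKIN, SHADE_UBER } ShadePermutation;

/* Pick the cheapest permutation that still covers every material present. */
static ShadePermutation select_permutation(uint32_t mask)
{
    if ((mask & ~(1u << MAT_DEFAULT)) == 0)
        return SHADE_DEFAULT_ONLY;
    if ((mask & (1u << MAT_SKIN)) == 0)
        return SHADE_NO_SKIN;
    return SHADE_UBER;   /* catch-all that handles every material */
}
```

The fallback to a catch-all permutation matters: any combination without a specialized permutation still shades correctly, just more slowly.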
Algorithmic optimization #1 Normal Cone Culling Build a conservative bounding normal cone of all pixels in subtile,  Cull lights against it to remove light for entire tile No materials with a wraparound lighting model in the subtile are allowed. (Tile classification) Flat versus heavily normal mapped surfaces dictate win factor
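A minimal sketch of the normal cone test, under two assumptions that are not from the slides: the cone axis is taken as the sum of the subtile's unit normals, and the direction to the light is treated as a single unit vector for the whole subtile. A production version would have to widen the cone to account for the direction varying per pixel. The test is written without square roots by comparing squared quantities. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

typedef struct {
    Vec3  sum;       /* unnormalized sum of the normals (cone axis direction) */
    float sum_len2;  /* squared length of sum */
    float min_dot;   /* smallest dot(sum, normal) over all normals */
} NormalCone;

/* Bound a set of unit normals with a cone around their summed direction. */
static NormalCone build_normal_cone(const Vec3 *normals, size_t count)
{
    NormalCone c = { { 0.0f, 0.0f, 0.0f }, 0.0f, 0.0f };
    for (size_t i = 0; i < count; ++i) {
        c.sum.x += normals[i].x;
        c.sum.y += normals[i].y;
        c.sum.z += normals[i].z;
    }
    c.sum_len2 = dot3(c.sum, c.sum);
    for (size_t i = 0; i < count; ++i) {
        float d = dot3(c.sum, normals[i]);
        if (i == 0 || d < c.min_dot)
            c.min_dot = d;
    }
    return c;
}

/* Returns 1 when N.L <= 0 for every normal in the cone, so the light
   cannot contribute. to_light must be a unit vector. */
static int cone_culls_light(const NormalCone *c, Vec3 to_light)
{
    float d = dot3(c->sum, to_light);
    if (c->min_dot <= 0.0f || d > 0.0f)
        return 0;  /* cone spans a hemisphere or more, or light is in front */
    /* cos(half) = min_dot / |sum|. Cull when d / |sum| <= -sin(half),
       which squares to d*d >= |sum|^2 - min_dot^2. */
    return d * d >= c->sum_len2 - c->min_dot * c->min_dot;
}
```

The test relies on lighting being zero whenever N.L <= 0, which is why the slide excludes subtiles containing wraparound-lit materials. Flat surfaces give a narrow cone and cull many lights; heavy normal mapping widens the cone until nothing is rejected.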
Algorithmic optimization #2 Support diffuse only light sources Common practice in pure GPU rendered games Fill / Area lighting Only use specular contributing light sources where it counts. 2x+ faster Adds additional lighting loop + codesize considerations
Algorithmic optimization #3 Specular Albedo Present in a subtile? If all pixels in a subtile have specular albedo of zero: Execute diffuse only lighting fast path for this case. If your artists like to make everything shiny, you might not see much of a win here
Algorithmic optimization #4 Branch on 4x4 pixel tile intersection with light based on the calculated lighting attenuation term
float attenuation = 1 / (0.01f + sqrDist);
attenuation = max( 0, attenuation + lightThreshold );
if( all 16 pixels have an attenuation value of 0 or less ) (continue on to next light)
Branching if attenuation for all 16 pixels <= 0
// compare for greater than zero, can use this to saturate attenuation between 0-1
qword attenMask_0 = si_fcgt( attenuation_0, const_0 );
qword attenMask_1 = si_fcgt( attenuation_1, const_0 );
qword attenMask_2 = si_fcgt( attenuation_2, const_0 );
qword attenMask_3 = si_fcgt( attenuation_3, const_0 );
// 'or' merge masks from dwords in quadwords (odd pipe)
qword attenMerged_0 = si_orx( attenMask_0 );
qword attenMerged_1 = si_orx( attenMask_1 );
qword attenMerged_2 = si_orx( attenMask_2 );
qword attenMerged_3 = si_orx( attenMask_3 );
// final merge of 4 quadwords with horizontally merged masks
qword attenMerge_01 = si_or( attenMerged_0, attenMerged_1 );
qword attenMerge_23 = si_or( attenMerged_2, attenMerged_3 );
qword attenMerge_0123 = si_or( attenMerge_01, attenMerge_23 );
if( !si_to_uint(attenMerge_0123) )
    continue; // move to next light!
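For readers without an SPU toolchain, the same early-out can be written as a scalar reference in portable C. This is a sketch for checking the logic, not the shipped code. It assumes `lightThreshold` is zero or negative, so that the attenuation reaches zero at a finite distance; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define QUAD_PIXELS 16  /* one 4x4 pixel quad */

/* Attenuation as on the slide: inverse-square falloff plus a bias,
   clamped at zero. light_threshold is assumed to be <= 0. */
static float light_attenuation(float sqr_dist, float light_threshold)
{
    float a = 1.0f / (0.01f + sqr_dist) + light_threshold;
    return a > 0.0f ? a : 0.0f;
}

/* Returns 1 if at least one of the 16 pixels receives light,
   mirroring the OR-merge of the comparison masks. */
static int quad_touches_light(const float sqr_dist[QUAD_PIXELS],
                              float light_threshold)
{
    int any = 0;
    assert(sqr_dist != NULL);
    for (int i = 0; i < QUAD_PIXELS; ++i)
        any |= light_attenuation(sqr_dist[i], light_threshold) > 0.0f;
    return any;
}
```

With a threshold of -0.01 the attenuation hits zero at a squared distance of 99.99, so a quad whose pixels are all farther away than that skips the light entirely.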
Algorithmic optimization #5 Clipping + Optimizing lighting space Light Clipping Against House Wall No Light Clipping
Why so much culling? Why not adjust content to avoid bad cases? Highly destructible + dynamic environments Variable # of visible lights - depth ‘swiss cheese’ factor Solution must handle distant / scoped views
Algorithmic Optimization # 6 Spherical Gaussian Based Specular Model
Code optimizations
Code optimization #0 Unpack gbuffer data to Structure of Arrays (SoA) format Obvious must-have for those familiar with SPUs. SPUs are powerful, but crappy data breeds crappy code and wastes significant performance. shufb to get the data in the right format SoA gets us improved pipelining + more efficient computation 4 quadwords of data: X0 Y0 Z0 W0 | X1 Y1 Z1 W1 | X2 Y2 Z2 W2 | X3 Y3 Z3 W3 -> X0 X1 X2 X3 | Y0 Y1 Y2 Y3 | Z0 Z1 Z2 Z3 | W0 W1 W2 W3
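The AoS to SoA conversion on this slide is a 4x4 transpose. On the SPU it is done with `shufb`; the portable sketch below shows only the data movement, with a hypothetical `Quad` struct standing in for one 128-bit quadword.

```c
#include <assert.h>

typedef struct { float v[4]; } Quad;  /* stand-in for one 128-bit quadword */

/* Transpose four XYZW quadwords (one per pixel) into four quadwords
   holding X0..X3, Y0..Y3, Z0..Z3 and W0..W3. */
static void aos_to_soa(const Quad in[4], Quad out[4])
{
    assert(in != out);
    for (int c = 0; c < 4; ++c)        /* component: X, Y, Z, W */
        for (int p = 0; p < 4; ++p)    /* pixel 0..3 */
            out[c].v[p] = in[p].v[c];
}
```

After the transpose, one vector instruction on `out[0]` operates on the X component of four pixels at once, which is what lets the shading math run four-wide with no per-pixel shuffling.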
Code optimization #1:	Loop Unrolling First versions worked on different sized horizontal pixel spans 4x1 pixels = Minimum SoA implementation 8x1 pixels = 2x unrolled 16x1 pixels = 4x unrolled 4x4 pixels = 4x unrolled +Improved Spatial Coherency!
Code optimization #2 Branch on Sky pixels in 4x4 pixel processing loops Branches are expensive, but can be a performance win Fully unpacking and shading 16 pixels = a lot of work to branch around Also useful to branch on specific materials Depends on the cost of branching relative to just doing a compute + vectorized select
Code optimization #3 Instruction Pipe Balancing: SPU shading code very heavy on even instruction pipe Lots of fm, fma, fa, fsub, csflt .... Avoid shli, or+and (even pipe), use rotqbii + shufb (odd pipe) for shifting + masking - Vanilla C code with GCC doesn't reliably do this, which is why you should explicitly use si intrinsics.
Code Optimization #4 Lookup tables for unpacking data Can be done all in the odd instruction pipe Lighting code is naturally even pipe heavy; odd pipe is underutilized! Huge wins for complex functions Minimum work for a win is ~21 cycles for 4 pixels when migrating to LUT. Source gbuffer data channels are 8bit Converted to float and multiplied by constants or values w/ limited range 4k of LS can let us map 4 functions to convert 8bit -> float
Specular power LUT From a GPU shader version .hlsl source:
half smoothness = gbuffer1.a; // 8 bit source
// Specular power from 2-2048 with a perceptually linear distribution
float specularPower = pow(2, 1+smoothness*10);
// Sloan & Hoffman normalized specular highlight
float specularNormalizationScale = (specularPower+8)/8;
Remapping functions to Lookups 8bit gbuffer source value = 256 Quadword LUT (4k) Store 4 different function output floats per LUT entry LUT code can use odd instruction pipe exclusively parent shading code is even pipe heavy = WIN Total instructions to do 8 lookups for 4 different pixels: 8 shufb, 4 lqx, 4 rotqbii (all odd pipe) ~21cycles
Code Optimization # 5 SPU Shading Code Overlays Avoids Limitations of limited SPU LS Position Independent Code SPU fetches permutations on demand SPU Local Store CELL Memory Overlay Cache Area SPU Shading Function Permutations SPU DmaGet
Overlay Permutation Building spu-lv2-gcc (-fpic) +  Permutation #defines  .C Master Source Permutation.o Realtime editing + reloading Permutation.bin Embedded into .ELF Permutation.h
Best practices
Code + Development Environment ‘Must-haves’ for maintaining efficiency and *my* sanity: Support toggle between pure RSX implementation and SPU version validate visual parity between versions Runtime SPU job reloading build + reload = ~10 seconds Runtime option to switch running SPU code on 1-6 SPUs Maintain single non-overlay übershader version that compiles into Job Add/remove core features via #define Work out core dataflow and code structuring + debugging in 1 function.
Possible Code Permutations Materials: Skin Translucent Metal Specular Foliage Emissive ‘Default’ Transformations: Different  field of view projections Light Types: Point light Spotlight Line Light Ellipsoid Polygonal Area Lighting Styles: Diffuse only Specular + Diffuse Lighting Clip Planes Pixel Masking by Stencil
Code Permutations Best Practices Material + Light Tile permutations Still need a catch-all übershader To support worst case (all pixels have different materials + light styles) Fast dev sandboxing versus regenerating all permutations Determining permutations needed is driven by performance Content dependent and relative costs between permutations Managing codesize during dev #define NO_FUNC_PERMUTATIONS // use only ubershader Visualize permutation usage onscreen (color ID screen tiles)
’SPA’ (SPU ASSEMBLER) SPA is good for: Improving Performance* Measuring Cycle counts, dual issue Evaluating loop costs for many permutations Experimenting with variable amounts of loop unrolling Don’t jump too early into writing everything in SPA Smart data layout, C code w/unrolling, SI intrinsics, good culling are foundational elements that should come first.
Conclusions SPUs are more than capable of performing shading work traditionally done by RSX Think of SPUs as another GPU compute resource SPUs can do significantly better light culling than the RSX RSX+SPU combined shading creates a great opportunity to raise the bar on graphics!
Special Thanks Johan Andersson Frostbite Rendering Team Daniel Collin Andreas Fredriksson Colin Barre-Brisebois Steven Tovey + Matt Swoboda @ SCEE Everyone at DICE
Questions? Email: christina.coffin@dice.se Blog:     http://web.mac.com/christinacoffin/ Twitter: @christinacoffin Battlefield 3 & Frostbite 2 talks at GDC’11: For more DICE talks: http://publications.dice.se
References A Bizarre Way to do Real-Time Lighting http://www.spuify.co.uk/?p=323 Deferred Lighting and Post Processing on PLAYSTATION®3 http://www.technology.scee.net/files/presentations/gdc2009/DeferredLightingandPostProcessingonPS3.ppt SPU Shaders - Mike Acton, Insomniac Games www.insomniacgames.com/tech/articles/0108/files/spu_shaders.pdf Deferred Rendering in Killzone 2 http://www.guerrilla-games.com/publications/dr_kz2_rsx_dev07.pdf Bending the graphics pipeline http://www.slideshare.net/DICEStudio/bending-the-graphics-pipeline A Real-Time Radiosity Architecture http://www.slideshare.net/DICEStudio/siggraph10-arrrealtime-radiosityarchitecture

Rendering basics
 
Beyond porting
Beyond portingBeyond porting
Beyond porting
 
Computer Graphics Part1
Computer Graphics Part1Computer Graphics Part1
Computer Graphics Part1
 
A Bizarre Way to do Real-Time Lighting
A Bizarre Way to do Real-Time LightingA Bizarre Way to do Real-Time Lighting
A Bizarre Way to do Real-Time Lighting
 
Xbox
XboxXbox
Xbox
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And Effects
 
Next generation mobile gp us and rendering techniques - niklas smedberg
Next generation mobile gp us and rendering techniques - niklas smedbergNext generation mobile gp us and rendering techniques - niklas smedberg
Next generation mobile gp us and rendering techniques - niklas smedberg
 
Paris Master Class 2011 - 01 Deferred Lighting, MSAA
Paris Master Class 2011 - 01 Deferred Lighting, MSAAParis Master Class 2011 - 01 Deferred Lighting, MSAA
Paris Master Class 2011 - 01 Deferred Lighting, MSAA
 
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
A Practical and Robust Bump-mapping Technique for Today’s GPUs (slides)
 
Unity AMD FSR - SIGGRAPH 2021.pptx
Unity AMD FSR - SIGGRAPH 2021.pptxUnity AMD FSR - SIGGRAPH 2021.pptx
Unity AMD FSR - SIGGRAPH 2021.pptx
 

More from Electronic Arts / DICE

GDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game DevelopmentGDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game DevelopmentElectronic Arts / DICE
 
SIGGRAPH 2010 - Style and Gameplay in the Mirror's Edge
SIGGRAPH 2010 - Style and Gameplay in the Mirror's EdgeSIGGRAPH 2010 - Style and Gameplay in the Mirror's Edge
SIGGRAPH 2010 - Style and Gameplay in the Mirror's EdgeElectronic Arts / DICE
 
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray TracingSyysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray TracingElectronic Arts / DICE
 
Khronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanKhronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanElectronic Arts / DICE
 
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time Raytracing
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time RaytracingCEDEC 2018 - Towards Effortless Photorealism Through Real-Time Raytracing
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time RaytracingElectronic Arts / DICE
 
CEDEC 2018 - Functional Symbiosis of Art Direction and Proceduralism
CEDEC 2018 - Functional Symbiosis of Art Direction and ProceduralismCEDEC 2018 - Functional Symbiosis of Art Direction and Proceduralism
CEDEC 2018 - Functional Symbiosis of Art Direction and ProceduralismElectronic Arts / DICE
 
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA TuringSIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA TuringElectronic Arts / DICE
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingElectronic Arts / DICE
 
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...Electronic Arts / DICE
 
DD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time RenderingDD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time RenderingElectronic Arts / DICE
 
Creativity of Rules and Patterns: Designing Procedural Systems
Creativity of Rules and Patterns: Designing Procedural SystemsCreativity of Rules and Patterns: Designing Procedural Systems
Creativity of Rules and Patterns: Designing Procedural SystemsElectronic Arts / DICE
 
Shiny Pixels and Beyond: Real-Time Raytracing at SEED
Shiny Pixels and Beyond: Real-Time Raytracing at SEEDShiny Pixels and Beyond: Real-Time Raytracing at SEED
Shiny Pixels and Beyond: Real-Time Raytracing at SEEDElectronic Arts / DICE
 
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...Electronic Arts / DICE
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteElectronic Arts / DICE
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteElectronic Arts / DICE
 
Photogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontPhotogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontElectronic Arts / DICE
 

More from Electronic Arts / DICE (20)

GDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game DevelopmentGDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game Development
 
SIGGRAPH 2010 - Style and Gameplay in the Mirror's Edge
SIGGRAPH 2010 - Style and Gameplay in the Mirror's EdgeSIGGRAPH 2010 - Style and Gameplay in the Mirror's Edge
SIGGRAPH 2010 - Style and Gameplay in the Mirror's Edge
 
SEED - Halcyon Architecture
SEED - Halcyon ArchitectureSEED - Halcyon Architecture
SEED - Halcyon Architecture
 
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray TracingSyysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
 
Khronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanKhronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and Vulkan
 
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time Raytracing
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time RaytracingCEDEC 2018 - Towards Effortless Photorealism Through Real-Time Raytracing
CEDEC 2018 - Towards Effortless Photorealism Through Real-Time Raytracing
 
CEDEC 2018 - Functional Symbiosis of Art Direction and Proceduralism
CEDEC 2018 - Functional Symbiosis of Art Direction and ProceduralismCEDEC 2018 - Functional Symbiosis of Art Direction and Proceduralism
CEDEC 2018 - Functional Symbiosis of Art Direction and Proceduralism
 
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA TuringSIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
 
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
 
DD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time RenderingDD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time Rendering
 
Creativity of Rules and Patterns: Designing Procedural Systems
Creativity of Rules and Patterns: Designing Procedural SystemsCreativity of Rules and Patterns: Designing Procedural Systems
Creativity of Rules and Patterns: Designing Procedural Systems
 
Shiny Pixels and Beyond: Real-Time Raytracing at SEED
Shiny Pixels and Beyond: Real-Time Raytracing at SEEDShiny Pixels and Beyond: Real-Time Raytracing at SEED
Shiny Pixels and Beyond: Real-Time Raytracing at SEED
 
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in Frostbite
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
Photogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars BattlefrontPhotogrammetry and Star Wars Battlefront
Photogrammetry and Star Wars Battlefront
 
Rendering Battlefield 4 with Mantle
Rendering Battlefield 4 with MantleRendering Battlefield 4 with Mantle
Rendering Battlefield 4 with Mantle
 
Mantle for Developers
Mantle for DevelopersMantle for Developers
Mantle for Developers
 
Battlefield 4 + Frostbite + Mantle
Battlefield 4 + Frostbite + MantleBattlefield 4 + Frostbite + Mantle
Battlefield 4 + Frostbite + Mantle
 

Recently uploaded

the red riding girl story fkjgoifdjgijogifdoin
the red riding girl story fkjgoifdjgijogifdointhe red riding girl story fkjgoifdjgijogifdoin
the red riding girl story fkjgoifdjgijogifdoinkomalbattersea123
 
WHO KILLED ALASKA? #15: "5½ STORIES Part Two" Transcript .pdf
WHO KILLED ALASKA? #15: "5½ STORIES Part Two" Transcript .pdfWHO KILLED ALASKA? #15: "5½ STORIES Part Two" Transcript .pdf
WHO KILLED ALASKA? #15: "5½ STORIES Part Two" Transcript .pdfOptimistic18
 
Title Unlocking Imagination The Importance of Toca Boca for Kids.pdf
Title Unlocking Imagination The Importance of Toca Boca for Kids.pdfTitle Unlocking Imagination The Importance of Toca Boca for Kids.pdf
Title Unlocking Imagination The Importance of Toca Boca for Kids.pdfToca boca
 
Vibration Control.pptxjjjjjjjjjjjjjjjjjjjjj
Vibration Control.pptxjjjjjjjjjjjjjjjjjjjjjVibration Control.pptxjjjjjjjjjjjjjjjjjjjjj
Vibration Control.pptxjjjjjjjjjjjjjjjjjjjjjjoshuaclack73
 
在线办理(concordia学位证书)康考迪亚大学毕业证学历学位证书学费发票原版一模一样
在线办理(concordia学位证书)康考迪亚大学毕业证学历学位证书学费发票原版一模一样在线办理(concordia学位证书)康考迪亚大学毕业证学历学位证书学费发票原版一模一样
在线办理(concordia学位证书)康考迪亚大学毕业证学历学位证书学费发票原版一模一样ahafux
 
WHO KILLED ALASKA? #17: Mirror Memoria - "OFFICER" TRANSCRIPT.pdf
WHO KILLED ALASKA? #17: Mirror Memoria - "OFFICER" TRANSCRIPT.pdfWHO KILLED ALASKA? #17: Mirror Memoria - "OFFICER" TRANSCRIPT.pdf
WHO KILLED ALASKA? #17: Mirror Memoria - "OFFICER" TRANSCRIPT.pdfOptimistic18
 
Vector Methods.pptxjjjjjjjjjjjjjjjjjjjjjj
Vector Methods.pptxjjjjjjjjjjjjjjjjjjjjjjVector Methods.pptxjjjjjjjjjjjjjjjjjjjjjj
Vector Methods.pptxjjjjjjjjjjjjjjjjjjjjjjjoshuaclack73
 
Abortion Pills In Jeddah+966572737505 & Get cytotec Jeddah
Abortion Pills In Jeddah+966572737505 & Get cytotec JeddahAbortion Pills In Jeddah+966572737505 & Get cytotec Jeddah
Abortion Pills In Jeddah+966572737505 & Get cytotec Jeddahmarufhussain782445
 
8th Global Fashion and Design Week Noida 2024 Sets New Standards in Creative ...
8th Global Fashion and Design Week Noida 2024 Sets New Standards in Creative ...8th Global Fashion and Design Week Noida 2024 Sets New Standards in Creative ...
8th Global Fashion and Design Week Noida 2024 Sets New Standards in Creative ...Marwah Studios
 
Smart-Dustbin-Using-EEEEEEESP32 (1).pptx
Smart-Dustbin-Using-EEEEEEESP32 (1).pptxSmart-Dustbin-Using-EEEEEEESP32 (1).pptx
Smart-Dustbin-Using-EEEEEEESP32 (1).pptxsrajece
 
The Gaming Quiz - 17th April 2024, Quiz Club NITW
The Gaming Quiz - 17th April 2024,  Quiz Club NITWThe Gaming Quiz - 17th April 2024,  Quiz Club NITW
The Gaming Quiz - 17th April 2024, Quiz Club NITWQuiz Club NITW
 
WHO KILLED ALASKA? #18: Mirror Memoria - "TATTOO" TRANSCRIPT.pdf
WHO KILLED ALASKA? #18: Mirror Memoria - "TATTOO" TRANSCRIPT.pdfWHO KILLED ALASKA? #18: Mirror Memoria - "TATTOO" TRANSCRIPT.pdf
WHO KILLED ALASKA? #18: Mirror Memoria - "TATTOO" TRANSCRIPT.pdfOptimistic18
 
batwheels_01batwheels_01batwheels_01batwheels_01
batwheels_01batwheels_01batwheels_01batwheels_01batwheels_01batwheels_01batwheels_01batwheels_01
batwheels_01batwheels_01batwheels_01batwheels_01Patricia Pham
 
NO1 Pakistan Vashikaran Specialist in Uk Black Magic Specialist in Uk Black M...
NO1 Pakistan Vashikaran Specialist in Uk Black Magic Specialist in Uk Black M...NO1 Pakistan Vashikaran Specialist in Uk Black Magic Specialist in Uk Black M...
NO1 Pakistan Vashikaran Specialist in Uk Black Magic Specialist in Uk Black M...Amil Baba Dawood bangali
 
The Ultimate Guide to Choosing the Best HD IPTV Service in 2024.pdf
The Ultimate Guide to Choosing the Best HD IPTV Service in 2024.pdfThe Ultimate Guide to Choosing the Best HD IPTV Service in 2024.pdf
The Ultimate Guide to Choosing the Best HD IPTV Service in 2024.pdfXtreame HDTV
 
NO1 Pakistan kala jadu Specialist Expert in Quetta, Gujranwala, muzaffarabad,...
NO1 Pakistan kala jadu Specialist Expert in Quetta, Gujranwala, muzaffarabad,...NO1 Pakistan kala jadu Specialist Expert in Quetta, Gujranwala, muzaffarabad,...
NO1 Pakistan kala jadu Specialist Expert in Quetta, Gujranwala, muzaffarabad,...Amil Baba Dawood bangali
 
C&C Artists' Websites .
C&C Artists' Websites                       .C&C Artists' Websites                       .
C&C Artists' Websites .LukeNash7
 
week 3 questions and answers.phhhhhhhhhhptx
week 3 questions and answers.phhhhhhhhhhptxweek 3 questions and answers.phhhhhhhhhhptx
week 3 questions and answers.phhhhhhhhhhptxjoshuaclack73
 

Recently uploaded (19)

the red riding girl story fkjgoifdjgijogifdoin
the red riding girl story fkjgoifdjgijogifdointhe red riding girl story fkjgoifdjgijogifdoin
the red riding girl story fkjgoifdjgijogifdoin
 
WHO KILLED ALASKA? #15: "5½ STORIES Part Two" Transcript .pdf
WHO KILLED ALASKA? #15: "5½ STORIES Part Two" Transcript .pdfWHO KILLED ALASKA? #15: "5½ STORIES Part Two" Transcript .pdf
WHO KILLED ALASKA? #15: "5½ STORIES Part Two" Transcript .pdf
 
Title Unlocking Imagination The Importance of Toca Boca for Kids.pdf
Title Unlocking Imagination The Importance of Toca Boca for Kids.pdfTitle Unlocking Imagination The Importance of Toca Boca for Kids.pdf
Title Unlocking Imagination The Importance of Toca Boca for Kids.pdf
 
Vibration Control.pptxjjjjjjjjjjjjjjjjjjjjj
Vibration Control.pptxjjjjjjjjjjjjjjjjjjjjjVibration Control.pptxjjjjjjjjjjjjjjjjjjjjj
Vibration Control.pptxjjjjjjjjjjjjjjjjjjjjj
 
在线办理(concordia学位证书)康考迪亚大学毕业证学历学位证书学费发票原版一模一样
在线办理(concordia学位证书)康考迪亚大学毕业证学历学位证书学费发票原版一模一样在线办理(concordia学位证书)康考迪亚大学毕业证学历学位证书学费发票原版一模一样
在线办理(concordia学位证书)康考迪亚大学毕业证学历学位证书学费发票原版一模一样
 
WHO KILLED ALASKA? #17: Mirror Memoria - "OFFICER" TRANSCRIPT.pdf
WHO KILLED ALASKA? #17: Mirror Memoria - "OFFICER" TRANSCRIPT.pdfWHO KILLED ALASKA? #17: Mirror Memoria - "OFFICER" TRANSCRIPT.pdf
WHO KILLED ALASKA? #17: Mirror Memoria - "OFFICER" TRANSCRIPT.pdf
 
kiff2
kiff2kiff2
kiff2
 
Vector Methods.pptxjjjjjjjjjjjjjjjjjjjjjj
Vector Methods.pptxjjjjjjjjjjjjjjjjjjjjjjVector Methods.pptxjjjjjjjjjjjjjjjjjjjjjj
Vector Methods.pptxjjjjjjjjjjjjjjjjjjjjjj
 
Abortion Pills In Jeddah+966572737505 & Get cytotec Jeddah
Abortion Pills In Jeddah+966572737505 & Get cytotec JeddahAbortion Pills In Jeddah+966572737505 & Get cytotec Jeddah
Abortion Pills In Jeddah+966572737505 & Get cytotec Jeddah
 
8th Global Fashion and Design Week Noida 2024 Sets New Standards in Creative ...
8th Global Fashion and Design Week Noida 2024 Sets New Standards in Creative ...8th Global Fashion and Design Week Noida 2024 Sets New Standards in Creative ...
8th Global Fashion and Design Week Noida 2024 Sets New Standards in Creative ...
 
Smart-Dustbin-Using-EEEEEEESP32 (1).pptx
Smart-Dustbin-Using-EEEEEEESP32 (1).pptxSmart-Dustbin-Using-EEEEEEESP32 (1).pptx
Smart-Dustbin-Using-EEEEEEESP32 (1).pptx
 
The Gaming Quiz - 17th April 2024, Quiz Club NITW
The Gaming Quiz - 17th April 2024,  Quiz Club NITWThe Gaming Quiz - 17th April 2024,  Quiz Club NITW
The Gaming Quiz - 17th April 2024, Quiz Club NITW
 
WHO KILLED ALASKA? #18: Mirror Memoria - "TATTOO" TRANSCRIPT.pdf
WHO KILLED ALASKA? #18: Mirror Memoria - "TATTOO" TRANSCRIPT.pdfWHO KILLED ALASKA? #18: Mirror Memoria - "TATTOO" TRANSCRIPT.pdf
WHO KILLED ALASKA? #18: Mirror Memoria - "TATTOO" TRANSCRIPT.pdf
 
batwheels_01batwheels_01batwheels_01batwheels_01
batwheels_01batwheels_01batwheels_01batwheels_01batwheels_01batwheels_01batwheels_01batwheels_01
batwheels_01batwheels_01batwheels_01batwheels_01
 
NO1 Pakistan Vashikaran Specialist in Uk Black Magic Specialist in Uk Black M...
NO1 Pakistan Vashikaran Specialist in Uk Black Magic Specialist in Uk Black M...NO1 Pakistan Vashikaran Specialist in Uk Black Magic Specialist in Uk Black M...
NO1 Pakistan Vashikaran Specialist in Uk Black Magic Specialist in Uk Black M...
 
The Ultimate Guide to Choosing the Best HD IPTV Service in 2024.pdf
The Ultimate Guide to Choosing the Best HD IPTV Service in 2024.pdfThe Ultimate Guide to Choosing the Best HD IPTV Service in 2024.pdf
The Ultimate Guide to Choosing the Best HD IPTV Service in 2024.pdf
 
NO1 Pakistan kala jadu Specialist Expert in Quetta, Gujranwala, muzaffarabad,...
NO1 Pakistan kala jadu Specialist Expert in Quetta, Gujranwala, muzaffarabad,...NO1 Pakistan kala jadu Specialist Expert in Quetta, Gujranwala, muzaffarabad,...
NO1 Pakistan kala jadu Specialist Expert in Quetta, Gujranwala, muzaffarabad,...
 
C&C Artists' Websites .
C&C Artists' Websites                       .C&C Artists' Websites                       .
C&C Artists' Websites .
 
week 3 questions and answers.phhhhhhhhhhptx
week 3 questions and answers.phhhhhhhhhhptxweek 3 questions and answers.phhhhhhhhhhptx
week 3 questions and answers.phhhhhhhhhhptx
 

SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3

  • 1. SPU-based Deferred Shading for Battlefield 3 on Playstation 3 Christina Coffin Platform Specialist, Senior Engineer DICE
  • 2. Agenda Introduction SPU lighting overview SPU lighting practicalities & algorithms Code optimizations & development practices Best practices Conclusions Q&A
  • 3. Introduction Maxxing out mature consoles
  • 4. Past: Frostbite 1 Forward Rendered + Destruction + Limited Lighting
  • 6. Now: Frostbite 2 + Battlefield 3. Indoor + Outdoor + Urban HDR lighting solution; complex lighting with environment destruction; deferred shaded; multiple light types and materials. Goal: "Use SPUs to distribute shading work and offload the GPU so it can do other work in parallel"
  • 7. Why SPU-based Deferred Shading? Want more interesting visual lighting + FX; offload GPU work to the SPUs; having SPU+GPU work together on visuals raises the bar. Already developed a tile-based DX 11 compute shader: a good reference point for doing deferred work on SPU. Lots of SPU compute power to take advantage of; simple sync model to get RSX + SPUs cooperating.
  • 8. SPU Shading Overview: Physically based specular shading model; energy-conserving specular, specular power from 2 to 2048; lighting performed in camera-relative worldspace at float precision; fp16 HDR output; multiple materials / lighting models; runs on 5-6 SPUs
  • 9. Multiple lighting models + materials Standard Metallic Skin Translucency
  • 10. RSX Rendering Frame Timeline Note: Sizes not proportional to time taken! Geometry Pass Transfer Gbuffer+Z to CELL memory SPU Job Kick Cascade Shadows + Particles + SSAO JTS Barrier SPU 0 SPU Shade Job SPU 1 SPU Shade Job SPU 2 SPU Shade Job SPU 3 SPU Shade Job SPU 4 SPU Shade Job
  • 11. RSX Rendering Frame Timeline Note: Sizes not proportional to time taken! Geometry Pass Transfer Gbuffer+Z to CELL memory SPU Job Kick Cascade Shadows + Particles + SSAO JTS Barrier PostFx + Finish SPU 0 SPU that finishes last tile clears JTS in cmdBuffer by DMA SPU Shade Job SPU 1 SPU Shade Job SPU 2 SPU Shade Job SPU 3 SPU Shade Job SPU 4 SPU Shade Job
  • 12. GPU Renders GBuffer: RSX renders to local memory. GBuffer data: 4x MRT ARGB8888 + Z24S8, tiled memory
  • 13. RSX Setup Rendering Data Flow. Local memory: Main Geometry Pass (6-10ms) writes Z Buffer + 4 MRT. RSX Copy Z+Gbuffers (1.3ms) to Cell memory: Z Buffer + 3 MRT. Inline DWORD Transfer (SPU job kick)
  • 14. Source data in CELL memory + Stencil SPU SPU SPU Packed array of all lights visible in camera (1000+) SPU SPU SPU
  • 15. SPU Tile Based Shading work units
  • 16. SPU Shading Flow Overview. For each 64x64 pixel screen tile region: (1) reserve a tile; (2) transfer & detile data; (3) cull lights; (4) unpack & shade pixels; (5) transfer shaded pixels to output framebuffer
  • 17. SPU Tile Work Allocation: SPUs determine their tile to process by atomically incrementing a shared tile index value. The index value maps to the fetch address of Z+Gbuffers per tile. Simple sync model to keep SPUs working; automatic load balancing between SPUs. Not all tiles take equal time, because material + lighting complexity varies per tile
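The tile reservation described on slide 17 can be sketched in portable C. This is a minimal illustration, not the shipped job code: it assumes C11 `<stdatomic.h>` as a stand-in for the SPU atomic primitives, and `TileQueue`, `tile_queue_init` and `tile_queue_reserve` are hypothetical names. The screen resolution is not given in the slides and is left as a parameter.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TILE_SIZE 64u  /* 64x64 pixel tiles, from the slides */

/* Hypothetical shared state. On PS3 the counter would live in main memory
   and be updated through the SPU atomic unit, not C11 atomics. */
typedef struct {
    atomic_uint next_tile;  /* index of the next unclaimed tile */
    uint32_t    tiles_x;    /* tiles per row */
    uint32_t    tiles_y;    /* number of tile rows */
} TileQueue;

static void tile_queue_init(TileQueue *q, uint32_t width, uint32_t height) {
    atomic_init(&q->next_tile, 0u);
    q->tiles_x = (width  + TILE_SIZE - 1u) / TILE_SIZE;  /* round up */
    q->tiles_y = (height + TILE_SIZE - 1u) / TILE_SIZE;
}

/* Claims the next tile. Returns 1 and writes the tile's top-left pixel
   coordinate, or returns 0 once every tile has been handed out. */
static int tile_queue_reserve(TileQueue *q, uint32_t *px, uint32_t *py) {
    uint32_t index = atomic_fetch_add(&q->next_tile, 1u);
    if (index >= q->tiles_x * q->tiles_y) {
        return 0;
    }
    *px = (index % q->tiles_x) * TILE_SIZE;
    *py = (index / q->tiles_x) * TILE_SIZE;
    return 1;
}
```

Each worker simply loops on `tile_queue_reserve` until it returns 0. Because a worker only asks for a new tile when it has finished the previous one, expensive tiles do not stall the others, which is the load-balancing property the slide mentions.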
  • 18. Data transfer + De-TILING Getting the Tile Data onto SPUs....
  • 19. FB Tile Data Fetch. DMA traffic: Get MRT 0, Get MRT 1, Get ZBuffer, Get MRT 2. SPU code (overlapped with the DMA): Detile 0, Detile 1, Detile Z, Detile 2. SPU LS output: 64x64 pixels, 32bpp (x4), linear formatted, ready for shading work
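As a worked check of the "64x64 Pixels 32bpp (x4)" figure on slide 19, the arithmetic below computes the Local Store footprint of one tile's source data. The constant and function names are illustrative; the 256 KB Local Store size is a property of the Cell SPU rather than something stated in the slides.

```c
#include <stdint.h>

enum {
    TILE_DIM          = 64,         /* tile is 64x64 pixels */
    BYTES_PER_PIXEL   = 4,          /* each surface is 32bpp */
    SURFACES_PER_TILE = 4,          /* three MRTs plus the Z buffer */
    SPU_LOCAL_STORE   = 256 * 1024  /* Local Store size of one SPU, in bytes */
};

/* Bytes needed for one detiled 64x64 surface. */
static uint32_t tile_surface_bytes(void) {
    return (uint32_t)(TILE_DIM * TILE_DIM * BYTES_PER_PIXEL);
}

/* Bytes needed for all source surfaces of one tile. */
static uint32_t tile_source_bytes(void) {
    return tile_surface_bytes() * (uint32_t)SURFACES_PER_TILE;
}
```

One surface is 16 KB and the four together are 64 KB, a quarter of the Local Store, which has to be shared with code, light lists and output buffers.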
  • 20. SPU LIGHT TILE CULLING
  • 21. SPU Cull lights: Determine visible lights that intersect the tile volume in worldspace. Tile frusta are generated from min/max z-depth extents, ignoring 'sky' depth (Z >= 0xFFFFFF00), so each tile has a different near/far plane based on its pixels. SPU code generates the frusta bounding volume for culling. Point-, Spot- & Line-light types are supported, with specialized culling vs. tile volumes
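The min/max depth step from slide 21 might look like the sketch below. It assumes the depth values are the raw 32-bit Z24S8 words with depth in the upper 24 bits, so that the slide's `Z >= 0xFFFFFF00` test matches far-plane depth regardless of stencil; that packing, and the names `TileDepthBounds` and `tile_depth_bounds`, are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKY_DEPTH_THRESHOLD 0xFFFFFF00u  /* from the slide: Z >= this is sky */

typedef struct {
    uint32_t min_z;         /* nearest non-sky depth word in the tile */
    uint32_t max_z;         /* farthest non-sky depth word in the tile */
    int      has_geometry;  /* 0 when every pixel in the tile is sky */
} TileDepthBounds;

/* Scans the raw depth words of a tile and returns its depth extents,
   skipping sky pixels so they do not push the far plane out. */
static TileDepthBounds tile_depth_bounds(const uint32_t *z, size_t count) {
    TileDepthBounds b = { 0xFFFFFFFFu, 0u, 0 };
    for (size_t i = 0; i < count; ++i) {
        if (z[i] >= SKY_DEPTH_THRESHOLD) {
            continue;  /* sky: contributes nothing to the tile volume */
        }
        if (z[i] < b.min_z) b.min_z = z[i];
        if (z[i] > b.max_z) b.max_z = z[i];
        b.has_geometry = 1;
    }
    return b;
}
```

A tile with `has_geometry == 0` needs no light culling or shading at all, and a tile with a narrow depth range yields a thin frustum that rejects most lights.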
  • 23. SPU Shade pixels: SPU Tile Based Shading. We do the same things the GPU does in shaders, but written for the SPU ISA. Vectorize the GPU .hlsl / compute shader to get started. Negligible differences in float rounding, RSX vs SPU. Core steps: Unpack Gbuffer+Z Data -> Shade -> Pack to fp16 -> DMA out
  • 24. Core shading components (3MRT + Z = 128 bits per pixel of source data): Distance attenuation; Diffuse Lighting; Mask on Stencil Data; Texture Sampling (Limited); Specular Lighting; Light Volume Clipping; Light Shape Attenuation; Wraparound Lighting; Attenuation by Surface Normal; Fresnel; Material Type Shading; Blend in Diffuse Albedo
  • 25. SPU Shading - 4x4 Pixel Quads: Core shading loop operates on 16 pixels at a time; Float32 precision; spatially coherent; lit in worldspace; unpack source data to Structure of Arrays (SoA) format
  • 26. Gbuffer data expansion to SoA for shading: Depth + Stencil, Spec Albedo, Diffuse Albedo, Normals, Smoothness, Material ID = Lots of shufb + csflt instructions for swizzle / mask / converting to float
  • 27. SPU Light Tile Job Loop: for (all pixels) { Unpack 16 Pixels of Z+Gbuffer data; Apply all PointLights; Apply all SpotLights; Apply all LineLights; Convert lighting output to fp16 and store to LS }
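The last step of the loop above converts float32 lighting results to fp16. The slides do not show how this was done on SPU, so the following is only a scalar reference for the bit manipulation involved. It is deliberately simplified: the mantissa is truncated rather than rounded, values too small for a normal half are flushed to zero, and anything too large (including infinity and NaN) is clamped to the largest finite half. `float_to_half` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Converts a float32 to IEEE 754 binary16 bits.
   Simplifications (assumptions of this sketch): truncating rounding,
   denormals flushed to zero, overflow/inf/NaN clamped to max finite. */
static uint16_t float_to_half(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret without aliasing issues */

    uint32_t sign     = (bits >> 16) & 0x8000u;
    int32_t  exponent = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15;  /* rebias */
    uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent <= 0) {
        return (uint16_t)sign;              /* too small: signed zero */
    }
    if (exponent >= 31) {
        return (uint16_t)(sign | 0x7BFFu);  /* too large: 65504 */
    }
    /* Keep the top 10 mantissa bits. */
    return (uint16_t)(sign | ((uint32_t)exponent << 10) | (mantissa >> 13));
}
```

Clamping rather than producing infinity is a design choice for an HDR framebuffer: a single overflowing pixel would otherwise spread through later blur and bloom passes.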
  • 28. DMA output finished pixels: Finished 32x16 pixel tiles are output to RSX memory by DMA list, 1 list entry per 32x1 pixel scanline (required due to the linear buffer destination!). Once all tiles are done & transferred, the SPU finishing the last tile clears the 'wait for SPUs' JTS in the cmdbuffer, and the GPU is free to continue rendering for the frame: transparent objects, blend-in particles, post-process, tonemapping
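The reason for one list entry per scanline is that the rows of a 32x16 subtile are contiguous in Local Store but `pitch` bytes apart in a linear framebuffer. The sketch below builds such a list of destination offsets. `ScanlineTransfer` is a hypothetical stand-in and not the real Cell DMA list element layout, and 8 bytes per output pixel (four fp16 channels) is an assumption.

```c
#include <stdint.h>

enum {
    SUBTILE_W           = 32,
    SUBTILE_H           = 16,
    OUT_BYTES_PER_PIXEL = 8   /* assumed: 4 channels x fp16 */
};

/* Hypothetical list element: where one scanline lands and how big it is. */
typedef struct {
    uint32_t dst_offset;  /* byte offset into the linear framebuffer */
    uint32_t size;        /* bytes to transfer for this scanline */
} ScanlineTransfer;

/* Fills one entry per scanline for the subtile whose top-left pixel is
   (x, y) in a linear framebuffer with the given row pitch in bytes. */
static void build_subtile_dma_list(ScanlineTransfer list[SUBTILE_H],
                                   uint32_t x, uint32_t y,
                                   uint32_t pitch_bytes) {
    for (uint32_t row = 0; row < SUBTILE_H; ++row) {
        list[row].dst_offset = (y + row) * pitch_bytes + x * OUT_BYTES_PER_PIXEL;
        list[row].size       = SUBTILE_W * OUT_BYTES_PER_PIXEL;
    }
}
```

With these assumptions every entry moves 256 bytes, and consecutive entries differ by exactly one framebuffer pitch.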
  • 29. Meanwhile, back in RSX Land.... RSX is busy doing something (useful) while the SPUs compute the fp16 radiance for tiles: planar reflections; cascade and spotlight shadow rendering; GPU lighting that mixes texture projection/sampling; offscreen buffer particle rendering; Z downsamples for postFX + SSAO; virtual texturing / compositing; occlusion queries
  • 31. Tile Light Culling: Tile based culling system, designed to handle extreme lighting loads. Shading budget: 40ms (split across 5 SPUs). Culling system overhead: 1-4ms (1000+ lights)
  • 32. Tile Light Culling, 2 light culling passes: FB Tile Cull (64x64 pixel tiles), then SubTile Cull (32x16 pixel tiles)
  • 33. FB Tile Light Culling. Frustum light list (1000+ lights) in CELL memory, fetched in 16-light batches. DMA traffic: DmaGet Batch-A, DmaGet Batch-B, DmaGet Batch-C. SPU code: Cull Batch-A, Cull Batch-B. SPU LS: FB tile light visibility list, 128 lights
  • 34. SubTile Light Culling. SPU LS: FB tile light visibility list, 128 lights. for (8 SubTiles): SubTile 0 Cull, SubTile 1 Cull, SubTile 2 Cull, ... Output: Subtile 0 light index list, Subtile 1 light index list, Subtile 2 light index list, ...
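The second culling pass can be illustrated as follows: a 64x64 tile splits into eight 32x16 subtiles, and each subtile filters the tile's light visibility list into its own index list. This sketch makes two simplifying assumptions that are not from the slides: subtiles are numbered row-major as 2 columns by 4 rows, and each light is reduced to a screen-space bounding rectangle (the talk culls light volumes against per-subtile frusta with depth extents). All names are hypothetical.

```c
#include <stdint.h>

enum { SUB_W = 32, SUB_H = 16, SUBTILES_PER_TILE = 8, MAX_TILE_LIGHTS = 128 };

typedef struct { uint32_t x, y, w, h; } Rect;

/* Rectangle of subtile `sub` (0..7) inside the 64x64 tile at (tile_x, tile_y).
   Assumed layout: 2 columns x 4 rows, numbered row-major. */
static Rect subtile_rect(uint32_t tile_x, uint32_t tile_y, uint32_t sub) {
    Rect r;
    r.x = tile_x + (sub % 2u) * SUB_W;
    r.y = tile_y + (sub / 2u) * SUB_H;
    r.w = SUB_W;
    r.h = SUB_H;
    return r;
}

static int rects_overlap(Rect a, Rect b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

/* Copies into `out` the entries of the tile's light list whose bounds
   touch the subtile. Returns the number of lights kept. */
static uint32_t cull_subtile(Rect subtile, const Rect *light_bounds,
                             const uint8_t *tile_list, uint32_t tile_count,
                             uint8_t *out) {
    uint32_t kept = 0;
    for (uint32_t i = 0; i < tile_count; ++i) {
        uint8_t light = tile_list[i];
        if (rects_overlap(subtile, light_bounds[light])) {
            out[kept++] = light;
        }
    }
    return kept;
}
```

Because the tile list is capped at 128 entries, an 8-bit index per light is enough for the subtile lists, which keeps them small in Local Store.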
  • 35.
  • 36. Lights per 64x64 pixel tile
  • 37. Lights per 64x64 pixel tile
  • 38. Light Culling Optimizations - Hierarchy (coarse to fine grained culling): Camera Frustum; Light Volume Coarse Z-Occlusion; FB Tile SPU Z-Cull, 64x64 pixels; SubTile SPU Z-Cull, 32x16 pixels
  • 39. Culling Optimizations - Takeaway: Complex scenes require aggressive culling, which avoids bad performance edge cases and stabilizes performance cost. Mix of brute force + simple hierarchy. Good debug visualizations are key: they help guide content optimization and validate culling
  • 40. Algorithmic optimization #0: Material Classification. Knowing which materials reside in a tile = choose the optimal SPU code permutation that avoids unneeded work. E.g. no point in calculating skin shading if the material isn't present in a subtile. Use SPU shading code permutations! Similar to GPU optimization via shader permutations. SPU Local Store considerations with this approach
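Material classification amounts to building a bitmask of the material IDs present in a subtile and using it to pick a shading permutation. The sketch below assumes four material IDs matching the four models named earlier in the deck; the numeric values and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical material IDs for the four lighting models in the deck. */
enum { MAT_STANDARD = 0, MAT_METALLIC = 1, MAT_SKIN = 2, MAT_TRANSLUCENT = 3 };

/* Returns a 4-bit mask with bit m set when material m appears in the subtile. */
static uint32_t classify_subtile(const uint8_t *material_ids, size_t count) {
    uint32_t mask = 0u;
    for (size_t i = 0; i < count; ++i) {
        mask |= 1u << (material_ids[i] & 3u);
    }
    return mask;
}

/* True when the subtile contains at least one pixel of the given material. */
static int subtile_needs(uint32_t mask, uint32_t material) {
    return (mask >> material) & 1u;
}
```

The mask has 16 possible values, so it can index a table of shading-function permutations directly; how many of those permutations are worth compiling is a Local Store trade-off, as the slide notes.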
  • 41. Algorithmic optimization #1: Normal Cone Culling. Build a conservative bounding normal cone of all pixels in the subtile, then cull lights against it to remove a light for the entire subtile. No materials with a wraparound lighting model are allowed in the subtile (tile classification). Flat versus heavily normal mapped surfaces dictate the win factor
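The cone test itself is short once the cone is known. If every normal in the subtile lies within half-angle θ of a unit axis, then N·L is non-positive for all of them exactly when the angle between the axis and L is at least 90° + θ, that is, when dot(axis, L) <= -sin θ. The sketch below assumes the direction to the light is constant across the subtile, which is a simplification: for a nearby point light the cone would have to be widened by the spread of light directions. The names and the choice to store sin θ directly are illustrative.

```c
typedef struct { float x, y, z; } Vec3;

/* Bounding cone of all normals in a subtile. */
typedef struct {
    Vec3  axis;            /* unit-length cone axis */
    float sin_half_angle;  /* sine of the cone half-angle, in [0, 1] */
} NormalCone;

static float dot3(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Returns 1 when no normal inside the cone can face the light, so the
   light contributes nothing to the subtile and can be skipped.
   to_light: unit vector from the surface toward the light (assumed
   constant over the subtile in this sketch). */
static int normal_cone_culls_light(NormalCone cone, Vec3 to_light) {
    return dot3(cone.axis, to_light) <= -cone.sin_half_angle;
}
```

A flat wall produces a narrow cone and rejects most lights behind it, while a heavily normal-mapped surface produces a wide cone that rejects almost nothing, which is the "win factor" dependence the slide describes. Wraparound materials are excluded because they are lit even when N·L is negative.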
  • 42.
  • 43.
  • 44.
  • 45. Algorithmic optimization #2: Support diffuse-only light sources. Common practice in pure GPU rendered games (fill / area lighting). Only use specular contributing light sources where it counts. 2x+ faster. Adds an additional lighting loop + codesize considerations
  • 46. Algorithmic optimization #3: Specular Albedo present in a subtile? If all pixels in a subtile have a specular albedo of zero, execute the diffuse-only lighting fast path. If your artists like to make everything shiny, you might not see much of a win here
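Detecting the all-zero case needs no branches per pixel: OR every pixel's specular albedo together and test the result once. The sketch assumes the specular albedo RGB occupies the low 24 bits of a 32-bit G-buffer word, with an unrelated channel in the top 8 bits; the real channel layout is not given in the slides, and the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SPEC_ALBEDO_MASK 0x00FFFFFFu  /* assumed: RGB in the low 24 bits */

/* Returns 1 when every pixel in the subtile has zero specular albedo,
   meaning the diffuse-only lighting path is sufficient. */
static int subtile_is_diffuse_only(const uint32_t *spec_words, size_t count) {
    uint32_t any_spec = 0u;
    for (size_t i = 0; i < count; ++i) {
        any_spec |= spec_words[i] & SPEC_ALBEDO_MASK;
    }
    return any_spec == 0u;
}
```

On SPU the same reduction vectorizes naturally, four words per quadword OR, so the check costs little relative to the specular work it can skip.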
  • 47. Algorithmic optimization #4: Branch on 4x4 pixel tile intersection with light, based on the calculated lighting attenuation term: float attenuation = 1 / (0.01f + sqrDist); attenuation = max( 0, attenuation + lightThreshold ); if (all 16 pixels have an attenuation value of 0 or less) continue on to next light
  • 48. Branching if attenuation for all 16 pixels <= 0:
        // compare for greater than zero, can use this to saturate attenuation between 0-1
        qword attenMask_0 = si_fcgt( attenuation_0, const_0 );
        qword attenMask_1 = si_fcgt( attenuation_1, const_0 );
        qword attenMask_2 = si_fcgt( attenuation_2, const_0 );
        qword attenMask_3 = si_fcgt( attenuation_3, const_0 );
        // 'or' merge masks from dwords in quadwords (odd pipe)
        qword attenMerged_0 = si_orx( attenMask_0 );
        qword attenMerged_1 = si_orx( attenMask_1 );
        qword attenMerged_2 = si_orx( attenMask_2 );
        qword attenMerged_3 = si_orx( attenMask_3 );
        // final merge of 4 quadwords with horizontally merged masks
        qword attenMerge_01 = si_or( attenMerged_0, attenMerged_1 );
        qword attenMerge_23 = si_or( attenMerged_2, attenMerged_3 );
        qword attenMerge_0123 = si_or( attenMerge_01, attenMerge_23 );
        if( !si_to_uint(attenMerge_0123) ) continue; // move to next light!
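For readers without an SPU toolchain, the same early-out can be written as a scalar reference. It uses the attenuation formula from slide 47 and returns whether any of the 16 pixels is lit, which is what the `si_fcgt` / `si_orx` / `si_or` sequence computes. The function name is hypothetical, and `light_threshold` is assumed to be a small negative bias so that attenuation reaches zero at a finite distance.

```c
enum { QUAD_PIXELS = 16 };  /* one 4x4 pixel quad */

/* Computes the clamped attenuation for 16 pixels and reports whether the
   light reaches any of them. Returns 0 when the light can be skipped. */
static int light_touches_quad(const float sqr_dist[QUAD_PIXELS],
                              float light_threshold,
                              float attenuation_out[QUAD_PIXELS]) {
    int any_lit = 0;
    for (int i = 0; i < QUAD_PIXELS; ++i) {
        float a = 1.0f / (0.01f + sqr_dist[i]) + light_threshold;
        if (a < 0.0f) {
            a = 0.0f;            /* max(0, attenuation + lightThreshold) */
        }
        attenuation_out[i] = a;
        any_lit |= (a > 0.0f);   /* scalar counterpart of the mask merge */
    }
    return any_lit;
}
```

The caller skips the rest of the light's work when this returns 0, so the cost of one branch per light per quad is paid back whenever a light that survived subtile culling still misses a particular 4x4 block.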
  • 49. Algorithmic optimization #5: Clipping + Optimizing lighting space. Light Clipping Against House Wall vs. No Light Clipping
  • 50. Algorithmic optimization #5: Clipping + Optimizing lighting space. Light Clipping Against House Wall vs. No Light Clipping
  • 51. Why so much culling? Why not adjust content to avoid bad cases? Highly destructible + dynamic environments; variable # of visible lights (depth 'swiss cheese' factor); solution must handle distant / scoped views
  • 52. Algorithmic optimization #6: Spherical Gaussian Based Specular Model
  • 54. Code optimization #0: Unpack gbuffer data to Structure of Arrays (SoA) format. Obvious must-have for those familiar with SPUs. SPUs are powerful, but crappy data breeds crappy code and wastes significant performance. shufb to get the data in the right format. SoA gets us improved pipelining + more efficient computation. 4 quadwords of data in: X0 Y0 Z0 W0 | X1 Y1 Z1 W1 | X2 Y2 Z2 W2 | X3 Y3 Z3 W3; out: X0 X1 X2 X3 | Y0 Y1 Y2 Y3 | Z0 Z1 Z2 Z3 | W0 W1 W2 W3
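The layout change on slide 54 is a 4x4 transpose. A plain C reference, with hypothetical type names, makes the before and after explicit; on SPU the same result is produced with `shufb` shuffles rather than scalar copies.

```c
/* One input quadword: a single pixel's four components (AoS). */
typedef struct { float x, y, z, w; } Float4;

/* Four output quadwords: one component for four pixels each (SoA). */
typedef struct { float x[4], y[4], z[4], w[4]; } Float4SoA;

/* Transposes four XYZW pixels into X, Y, Z and W lanes. */
static Float4SoA aos_to_soa(const Float4 in[4]) {
    Float4SoA out;
    for (int i = 0; i < 4; ++i) {
        out.x[i] = in[i].x;
        out.y[i] = in[i].y;
        out.z[i] = in[i].z;
        out.w[i] = in[i].w;
    }
    return out;
}
```

After the transpose, an operation such as a dot product becomes three multiply-adds over whole quadwords with no cross-lane work, and all four lanes carry useful data.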
  • 55. Code optimization #1: Loop Unrolling First versions worked on different sized horizontal pixel spans 4x1 pixels = Minimum SoA implementation 8x1 pixels = 2x unrolled 16x1 pixels = 4x unrolled 4x4 pixels = 4x unrolled +Improved Spatial Coherency!
  • 56. Code optimization #2 Branch on Sky pixels in 4x4 pixel processing loops Branches are expensive, but can be a performance win Fully unpacking and shading 16 pixels = a lot of work to branch around Also useful to branch on specific materials Depends on the cost of branching relative to just doing a compute + vectorized select
  • 57. Code optimization #3 Instruction Pipe Balancing:SPU shading code very heavy on even instruction pipe Lots of fm,fma, fa, fsub, csflt .... Avoid shli , or+and (even pipe), use rotqbii + shufb (odd pipe) for shifting + masking - Vanilla C code with GCC doesnt reliably do this which is why you should use explicitly use si intrinsics.
  • 58. Code optimization #4: Lookup tables for unpacking data. Can be done entirely in the odd instruction pipe. Lighting code is naturally even-pipe heavy, so the odd pipe is underutilized! Huge wins for complex functions. Minimum work for a win is ~21 cycles for 4 pixels when migrating to LUT. Source gbuffer data channels are 8-bit, converted to float and multiplied by constants or values with limited range. 4 KB of LS lets us map 4 functions that convert 8-bit -> float.
  • 59. Specular power LUT. From a GPU shader version .hlsl source:
      half smoothness = gbuffer1.a; // 8 bit source
      // Specular power from 2-2048 with a perceptually linear distribution
      float specularPower = pow(2, 1+smoothness*10);
      // Sloan & Hoffman normalized specular highlight
      float specularNormalizationScale = (specularPower+8)/8;
  • 60. Remapping functions to lookups. 8-bit gbuffer source value = 256-quadword LUT (4 KB). Store 4 different function output floats per LUT entry. LUT code can use the odd instruction pipe exclusively; parent shading code is even-pipe heavy = WIN. Total instructions to do 8 lookups for 4 different pixels: 8 shufb, 4 lqx, 4 rotqbii (all odd pipe), ~21 cycles.
  • 61. Code optimization #5: SPU shading code overlays. Avoids the limitations of limited SPU LS. Position independent code. SPU fetches permutations on demand. Diagram labels: SPU Local Store; CELL Memory; Overlay Cache Area; SPU Shading Function Permutations; SPU DmaGet.
  • 62. Overlay permutation building. Diagram labels: spu-lv2-gcc (-fpic) + permutation #defines; .C master source; Permutation.o; realtime editing + reloading; Permutation.bin; embedded into .ELF; Permutation.h.
  • 64. Code + development environment. 'Must-haves' for maintaining efficiency and *my* sanity: Support toggling between the pure RSX implementation and the SPU version, to validate visual parity between versions. Runtime SPU job reloading: build + reload = ~10 seconds. Runtime option to switch running SPU code on 1-6 SPUs. Maintain a single non-overlay übershader version that compiles into a job. Add/remove core features via #define. Work out core dataflow and code structuring + debugging in 1 function.
  • 65. Possible code permutations. Materials: Skin, Translucent, Metal, Specular, Foliage, Emissive, 'Default'. Transformations: Different field of view projections. Light Types: Point light, Spotlight, Line Light, Ellipsoid, Polygonal Area. Lighting Styles: Diffuse only, Specular + Diffuse. Lighting Clip Planes. Pixel Masking by Stencil.
  • 66. Code permutations best practices. Material + light tile permutations. Still need a catch-all übershader to support the worst case (all pixels have different materials + light styles), and for fast dev sandboxing versus regenerating all permutations. Determining which permutations are needed is driven by performance: content dependent, and relative costs between permutations. Managing code size during dev: #define NO_FUNC_PERMUTATIONS // use only ubershader. Visualize permutation usage onscreen (color ID screen tiles).
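One way the per-tile permutation choice with an übershader fallback could look is sketched below. Only the `NO_FUNC_PERMUTATIONS` switch and the idea of color-coding tiles come from the slide. The material bits, the `ShaderId` enum, both function names and the debug colors are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-tile material bits (not from the slides). */
enum {
    MAT_DEFAULT = 1u << 0,
    MAT_SKIN    = 1u << 1,
    MAT_METAL   = 1u << 2
};

typedef enum {
    SHADER_UBER = 0,        /* catch-all: handles any mix of materials */
    SHADER_DEFAULT_ONLY,
    SHADER_SKIN_ONLY,
    SHADER_METAL_ONLY,
    SHADER_COUNT
} ShaderId;

/* #define NO_FUNC_PERMUTATIONS */  /* uncomment to use only the ubershader */

/* Pick a specialized permutation when the tile contains exactly one
   material; otherwise fall back to the ubershader. */
static ShaderId select_tile_shader(uint32_t material_mask)
{
#ifdef NO_FUNC_PERMUTATIONS
    (void)material_mask;
    return SHADER_UBER;
#else
    switch (material_mask) {
    case MAT_DEFAULT: return SHADER_DEFAULT_ONLY;
    case MAT_SKIN:    return SHADER_SKIN_ONLY;
    case MAT_METAL:   return SHADER_METAL_ONLY;
    default:          return SHADER_UBER;
    }
#endif
}

/* Arbitrary ARGB colors for tinting screen tiles by chosen permutation. */
static uint32_t tile_debug_color(ShaderId id)
{
    static const uint32_t colors[SHADER_COUNT] = {
        0xFFFF0000u,  /* ubershader: red   */
        0xFF808080u,  /* default:    grey  */
        0xFF00FF00u,  /* skin:       green */
        0xFF0000FFu   /* metal:      blue  */
    };
    return colors[id];
}
```

Tinting tiles this way makes it easy to see how often the expensive catch-all path is hit for a given scene, which is the kind of content-dependent data the slide says should drive which permutations get built.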
  • 67. 'SPA' (SPU Assembler). SPA is good for: improving performance*; measuring cycle counts and dual issue; evaluating loop costs for many permutations; experimenting with variable amounts of loop unrolling. Don't jump too early into writing everything in SPA. Smart data layout, C code with unrolling, SI intrinsics, and good culling are foundational elements that should come first.
  • 68. Conclusions. SPUs are more than capable of performing shading work traditionally done by RSX. Think of SPUs as another GPU compute resource. SPUs can do significantly better light culling than the RSX. RSX+SPU combined shading creates a great opportunity to raise the bar on graphics!
  • 69. Special thanks: Johan Andersson, Frostbite Rendering Team, Daniel Collin, Andreas Fredriksson, Colin Barre-Brisebois, Steven Tovey + Matt Swoboda @ SCEE, everyone at DICE.
  • 70. Questions? Email: christina.coffin@dice.se. Blog: http://web.mac.com/christinacoffin/. Twitter: @christinacoffin. Battlefield 3 & Frostbite 2 talks at GDC'11: For more DICE talks: http://publications.dice.se
  • 71. References. A Bizarre Way to do Real-Time Lighting: http://www.spuify.co.uk/?p=323; Deferred Lighting and Post Processing on PLAYSTATION®3: http://www.technology.scee.net/files/presentations/gdc2009/DeferredLightingandPostProcessingonPS3.ppt; SPU Shaders, Mike Acton, Insomniac Games: www.insomniacgames.com/tech/articles/0108/files/spu_shaders.pdf; Deferred Rendering in Killzone 2: http://www.guerrilla-games.com/publications/dr_kz2_rsx_dev07.pdf; Bending the graphics pipeline: http://www.slideshare.net/DICEStudio/bending-the-graphics-pipeline; A Real-Time Radiosity Architecture: http://www.slideshare.net/DICEStudio/siggraph10-arrrealtime-radiosityarchitecture