Parallel Graphics in Frostbite - Current & Future
Johan Andersson, DICE
Menu
- Game engine CPU & GPU parallelism
- Rendering techniques & systems, old & new
- Mixed in with some future predictions & wishes
Quick background
- Frostbite 1.x [1][2][3]
  - Xbox 360, PS3, DX10
  - Battlefield: Bad Company (shipped)
  - Battlefield 1943 (shipped)
  - Battlefield: Bad Company 2
- Frostbite 2 [4][5]
  - In development
  - Xbox 360, PS3
  - DX11 (10.0, 10.1, 11)
- Disclaimer: Unless specified, pictures are from engine tests, not actual games
Job-based parallelism
- Must utilize all cores in the engine
  - Xbox 360: 6 HW threads
  - PS3: 2 HW threads + 6 great SPUs
  - PC: 2-8 HW threads
    - And many more coming
- Divide up systems into jobs
  - Async function calls with explicit inputs & outputs
  - Typically fully independent stateless functions
    - Makes it easier on PS3 SPU & in general
  - Job dependencies create job graph
  - All cores consume jobs
- CELL processor: we like
Frostbite CPU job graph
- Build big job graphs
  - Batch, batch, batch
- Mix CPU- & SPU-jobs
- Future: Mix in low-latency GPU-jobs
- Job dependencies determine:
  - Execution order
  - Sync points
  - Load balancing
  - I.e. the effective parallelism
- Braided Parallelism* [6]
  - Intermixed task- & data-parallelism
- Figure: Frame job graph from Frostbite 1 (PS3)
* Still only 10 hits on Google (yet!), but I like Aaron's term
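A minimal C++ sketch of the job-graph idea above: jobs are plain functions, dependencies are explicit edges, and every worker thread consumes whatever job is ready. The names (`JobGraph`, `add`, `depend`, `execute`) are hypothetical and are not Frostbite's API; the sketch assumes the graph has no cycles.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical job graph; illustrative only, not the engine's implementation.
struct JobGraph {
    struct Job {
        std::function<void()> run;
        std::vector<std::size_t> dependents; // jobs unlocked when this one finishes
        int dependencyCount;                 // number of jobs this one waits for
    };
    std::vector<Job> jobs;

    std::size_t add(std::function<void()> fn) {
        Job job;
        job.run = std::move(fn);
        job.dependencyCount = 0;
        jobs.push_back(std::move(job));
        return jobs.size() - 1;
    }

    // 'after' may only start once 'before' has finished.
    void depend(std::size_t after, std::size_t before) {
        jobs[before].dependents.push_back(after);
        ++jobs[after].dependencyCount;
    }

    // Runs every job exactly once on 'workerCount' threads, honoring dependencies.
    void execute(unsigned workerCount) {
        std::mutex m;
        std::condition_variable cv;
        std::queue<std::size_t> ready;
        std::size_t remaining = jobs.size();
        std::vector<int> pending(jobs.size());
        for (std::size_t i = 0; i < jobs.size(); ++i) {
            pending[i] = jobs[i].dependencyCount;
            if (pending[i] == 0) ready.push(i);
        }
        auto worker = [&] {
            for (;;) {
                std::size_t id;
                {
                    std::unique_lock<std::mutex> lock(m);
                    cv.wait(lock, [&] { return !ready.empty() || remaining == 0; });
                    if (ready.empty()) return; // nothing left to run
                    id = ready.front();
                    ready.pop();
                }
                jobs[id].run(); // executed outside the lock
                {
                    std::lock_guard<std::mutex> lock(m);
                    --remaining;
                    for (std::size_t d : jobs[id].dependents)
                        if (--pending[d] == 0) ready.push(d);
                }
                cv.notify_all();
            }
        };
        std::vector<std::thread> threads;
        for (unsigned i = 0; i < workerCount; ++i) threads.emplace_back(worker);
        for (auto& t : threads) t.join();
    }
};
```

Adding two independent jobs plus a third that depends on both, then calling `execute(4)`, runs the first two in any order and the third only after both have finished; the dependency edges are the only sync points.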
Rendering jobs
- Rendering systems are heavily divided up into jobs
- Jobs:
  - Terrain geometry processing
  - Undergrowth generation [2]
  - Decal projection [3]
  - Particle simulation
  - Frustum culling
  - Occlusion culling
  - Occlusion rasterization
  - Command buffer generation
  - PS3: Triangle culling
- Most will move to GPU
  - Eventually..
  - A few have already!
  - Mostly one-way data flow
- I will talk about a couple of these..
Parallel command buffer recording
- Dispatch draw calls and state to multiple command buffers in parallel
  - Scales linearly with # cores
  - 1500-4000 draw calls per frame
- Reduces latency & improves performance
- Important for all platforms, used on:
  - Xbox 360
  - PS3 (SPU-based)
  - PC DX11
- Previously not possible on PC, but now in DX11...
DX11 parallel dispatch
- First class citizen in DX11
  - Killer feature for reducing CPU overhead & latency
  - ~90% of our rendering dispatch job time is in D3D/driver
- DX11 deferred device context per core
  - Together with dynamic resources (cbuffer/vbuffer) for usage on that deferred context
- Renderer has list of all draw calls we want to do for each rendering "layer" of the frame
- Split draw calls for each layer into chunks of ~256 and dispatch in parallel to the deferred contexts
  - Each chunk generates a command list
- Render to immediate context & execute command lists
- Profit!
  - Goal: close to linear scaling up to octa-core when we get full DX11 driver support (up to the IHVs now)
- Future note: This is "just" a stopgap measure until we evolve the GPU to be able to fully feed itself (hi LRB)
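The chunking scheme above can be sketched without any graphics API. The C++ below is a simulation under stated assumptions: `DrawCall`, `CommandList`, `recordChunk` and `dispatchLayer` are hypothetical stand-ins, and a "command list" is just a vector of draw-call ids. In real D3D11 code, recording would happen on a deferred context created with `ID3D11Device::CreateDeferredContext`, each chunk would be closed with `FinishCommandList`, and the immediate context would play the lists back with `ExecuteCommandList`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-ins for engine/D3D11 objects; all names here are hypothetical.
struct DrawCall { int id; };
using CommandList = std::vector<int>; // recorded draw-call ids, in order

// Records one chunk of draw calls into its own command list.
inline CommandList recordChunk(const DrawCall* first, std::size_t count) {
    CommandList list;
    list.reserve(count);
    for (std::size_t i = 0; i < count; ++i) list.push_back(first[i].id);
    return list;
}

// Splits a layer into chunks, records the chunks in parallel,
// then "executes" the resulting command lists in chunk order.
inline std::vector<int> dispatchLayer(const std::vector<DrawCall>& layer,
                                      std::size_t chunkSize = 256) {
    const std::size_t chunkCount = (layer.size() + chunkSize - 1) / chunkSize;
    std::vector<CommandList> lists(chunkCount);

    // One thread per chunk keeps the sketch short; an engine would instead
    // feed chunks to a fixed pool of workers, one deferred context per core.
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < chunkCount; ++c) {
        workers.emplace_back([&, c] {
            const std::size_t begin = c * chunkSize;
            const std::size_t count = std::min(chunkSize, layer.size() - begin);
            lists[c] = recordChunk(layer.data() + begin, count);
        });
    }
    for (auto& w : workers) w.join();

    // "Immediate context": play back the lists in submission order.
    std::vector<int> executed;
    for (const CommandList& list : lists)
        executed.insert(executed.end(), list.begin(), list.end());
    return executed;
}
```

Because each chunk writes only to its own list, recording needs no locking, and executing the lists in chunk order reproduces the original draw order.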
Occlusion culling
- Problem: Buildings & env occlude large amounts of objects
- Invisible objects still have to:
  - Update logic & animations
  - Generate command buffer
  - Processed on CPU & GPU
- Difficult to implement full culling
  - Destructible buildings
  - Dynamic occludees
  - Difficult to precompute
  - GPU occlusion queries can be heavy to render
- Figure: from Battlefield: Bad Company PS3
Our solution: Software occlusion rasterization
Software occlusion culling
- Rasterize coarse zbuffer on SPU/CPU
  - 256x114 float
    - Good fit in SPU LS, but could be 16-bit
  - Low-poly occluder meshes
    - Manually conservative
    - 100 m view distance
    - Max 10000 vertices/frame
  - Parallel SPU vertex & raster jobs
  - Cost: a few milliseconds
- Then cull all objects against zbuffer
  - Screen-space bounding-box test
- Pictures & numbers from Battlefield: Bad Company PS3
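A rough C++ sketch of the cull step: a 256x114 float z-buffer holds the nearest occluder depth per pixel, and an object is kept if any pixel of its screen-space bounding box is at least as near as the stored depth. Several things here are assumptions, not taken from the slides: the class and member names, the depth convention (larger = farther, 1.0 = far plane), and the use of screen-aligned rectangles as occluders instead of rasterized low-poly triangles.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int kZWidth = 256;  // resolution quoted on the slide
constexpr int kZHeight = 114;

// Inclusive pixel bounds of an object plus its nearest depth (hypothetical layout).
struct ScreenBox { int x0, y0, x1, y1; float minZ; };

class CoarseZBuffer {
public:
    CoarseZBuffer() : depth_(kZWidth * kZHeight, 1.0f) {} // cleared to the far plane

    // Simplified occluder: a screen-aligned rectangle at constant depth z.
    void rasterizeOccluderRect(int x0, int y0, int x1, int y1, float z) {
        clampToBuffer(x0, y0, x1, y1);
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) {
                float& d = depth_[y * kZWidth + x];
                d = std::min(d, z); // keep the nearest occluder
            }
    }

    // Screen-space bounding-box test: visible if the box is not behind the
    // stored occluder depth at some covered pixel. A box that lies entirely
    // outside the buffer covers no pixels and is reported as not visible.
    bool isVisible(ScreenBox b) const {
        clampToBuffer(b.x0, b.y0, b.x1, b.y1);
        for (int y = b.y0; y <= b.y1; ++y)
            for (int x = b.x0; x <= b.x1; ++x)
                if (b.minZ <= depth_[y * kZWidth + x]) return true;
        return false;
    }

private:
    static void clampToBuffer(int& x0, int& y0, int& x1, int& y1) {
        x0 = std::max(x0, 0);
        y0 = std::max(y0, 0);
        x1 = std::min(x1, kZWidth - 1);
        y1 = std::min(y1, kZHeight - 1);
    }
    std::vector<float> depth_;
};
```

Using the object's nearest depth for the whole box keeps the test conservative: an object can be reported visible when it is actually hidden, but not the other way around.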
GPU occlusion culling
- Ideally want GPU rasterization & testing, but:
  - Occlusion queries introduce overhead & latency
    - Can be manageable, but far from ideal
  - Conditional rendering only helps GPU
    - Not CPU, frame memory or draw calls
- Future 1: Low-latency extra GPU exec. context
  - Rasterization and testing done on GPU where it belongs
  - Lockstep with CPU, need to read back data within a few ms
  - Should be possible on LRB (latency?), want on all HW
- Future 2: Move entire cull & rendering to "GPU"
  - World rep., cull, systems, dispatch. End goal.
PS3 geometry processing
- Problem: Slow GPU triangle & vertex setup on PS3
  - Combined with unique situation with powerful & initially not fully utilized "free" SPUs!
- Solution: SPU triangle culling
  - Trade SPU time for GPU time
  - Cull all back faces, micro-triangles, out of frustum
  - Based on Sony's PS3 EDGE library [7]
  - Also see Jon Olick's talk from the course last year
- 5 SPU jobs process frame geometry in parallel
  - Output is new index buffer for each draw call
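The culling pass can be illustrated in C++ as a filter from one index buffer to a new one. This is a sketch under assumptions, not the EDGE implementation: vertices are taken to be already projected to pixel coordinates, front faces are assumed counter-clockwise, pixel centers are assumed to sit at half-integer positions, and `Vec2` and `cullTriangles` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; }; // screen-space position in pixels (assumed input)

// Returns a new index buffer holding only the triangles that survive
// back-face, off-screen and micro-triangle culling.
inline std::vector<std::uint32_t> cullTriangles(
    const std::vector<Vec2>& pos,
    const std::vector<std::uint32_t>& indices,
    float viewWidth, float viewHeight)
{
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        const Vec2 a = pos[indices[i]];
        const Vec2 b = pos[indices[i + 1]];
        const Vec2 c = pos[indices[i + 2]];

        // Twice the signed area; <= 0 means back-facing or degenerate.
        const float area2 = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
        if (area2 <= 0.0f) continue;

        const float minX = std::min({a.x, b.x, c.x});
        const float maxX = std::max({a.x, b.x, c.x});
        const float minY = std::min({a.y, b.y, c.y});
        const float maxY = std::max({a.y, b.y, c.y});

        // Bounding box entirely outside the viewport.
        if (maxX < 0.0f || minX > viewWidth || maxY < 0.0f || minY > viewHeight)
            continue;

        // Micro-triangle: the bounding box contains no pixel center
        // (centers at k + 0.5), so no pixel can be covered.
        if (std::ceil(minX - 0.5f) > std::floor(maxX - 0.5f) ||
            std::ceil(minY - 0.5f) > std::floor(maxY - 0.5f))
            continue;

        out.push_back(indices[i]);
        out.push_back(indices[i + 1]);
        out.push_back(indices[i + 2]);
    }
    return out;
}
```

The vertex buffer is left untouched; only the index buffer shrinks, which matches the slide's "output is new index buffer for each draw call".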
Custom geometry processing
- Software control opens up great flexibility and programmability!
- Simple custom culling/processing that we've added:
  - Partition bounding box culling
  - Mesh part culling
  - Clip plane triangle trivial accept & reject
  - Triangle cull volumes (inverse clip planes)
- Others are doing:
  - Full skinning, morph targets, CLOD, cloth
- Future wish: No forced/fixed vertex & geometry shaders
  - DIY compute shaders with fixed-func stages (tessellation and rasterization)
  - Software-controlled queuing of data between stages
    - To avoid always spilling out to memory
Decal projection
- Traditionally a CPU process
  - Relying on identical visual & physics representation
  - Or duplicated mesh data in CPU memory (on PC)
- UMA!
- Project in SPU-jobs
- Output VB/IB to GPU
- No CPU memory or upload
- Huge decals + huge meshes
Screen-space tile classification
- Divide screen up into tiles and determine how many & which light sources intersect each tile
- Only apply the visible light sources on pixels in each tile
  - Reduced BW & setup cost with multiple lights in single shader
- Used in Naughty Dog's Uncharted [9] and SCEE PhyreEngine [10]
- Hmm, isn't light classification per screen-space tile sort of similar to how a compute shader can work with 2D thread groups?
- Answer: YES, except a CS can do everything in a single pass!
- Figure: from "The Technology of Uncharted", GDC'08 [9]
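As a CPU-side illustration of the classification step, the C++ sketch below bins lights into screen tiles. It assumes each light has already been reduced to a screen-space circle, which the slides do not specify; `ScreenLight` and `classifyTiles` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct ScreenLight { float x, y, radius; }; // screen-space circle in pixels (assumed)

// Returns, for every tile in row-major order, the indices of the lights
// whose circle overlaps that tile.
inline std::vector<std::vector<int>> classifyTiles(
    const std::vector<ScreenLight>& lights, int width, int height, int tileSize = 16)
{
    const int tilesX = (width + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> tiles(static_cast<std::size_t>(tilesX) * tilesY);

    for (std::size_t i = 0; i < lights.size(); ++i) {
        const ScreenLight& l = lights[i];
        // Range of tiles touched by the light's bounding square.
        const int tx0 = std::max(0, static_cast<int>(std::floor((l.x - l.radius) / tileSize)));
        const int ty0 = std::max(0, static_cast<int>(std::floor((l.y - l.radius) / tileSize)));
        const int tx1 = std::min(tilesX - 1, static_cast<int>(std::floor((l.x + l.radius) / tileSize)));
        const int ty1 = std::min(tilesY - 1, static_cast<int>(std::floor((l.y + l.radius) / tileSize)));

        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx) {
                // Circle vs. rectangle: distance from the center to the
                // closest point of the tile.
                const float left = static_cast<float>(tx * tileSize);
                const float top = static_cast<float>(ty * tileSize);
                const float cx = std::max(left, std::min(l.x, left + tileSize));
                const float cy = std::max(top, std::min(l.y, top + tileSize));
                const float dx = l.x - cx;
                const float dy = l.y - cy;
                if (dx * dx + dy * dy <= l.radius * l.radius)
                    tiles[static_cast<std::size_t>(ty) * tilesX + tx].push_back(static_cast<int>(i));
            }
    }
    return tiles;
}
```

Each tile's list size answers "how many" and its contents answer "which", so a shader only needs to loop over the lights listed for the tile it is shading.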
CS-based deferred shading
- Deferred shading using DX11 CS
- Not production tested or optimized
- Compute Shader 5.0
- Assumption: No shadows (for now)
- New hybrid Graphics/Compute shading pipeline:
  - Graphics pipeline rasterizes gbuffers for opaque surfaces
  - Compute pipeline uses gbuffers, culls light sources, computes lighting & combines with shading
  - (multiple other variants also possible)
CS requirements & setup
- Input data is gbuffers, depth buffer & light constants
- Output is fully composited & lit HDR texture
- 1 thread per pixel, 16x16 thread groups (aka tile)
- Gbuffer images: Normal, Roughness, Diffuse Albedo, Specular Albedo

    Texture2D<float4> gbufferTexture1 : register(t0);
    Texture2D<float4> gbufferTexture2 : register(t1);
    Texture2D<float4> gbufferTexture3 : register(t2);
    Texture2D<float4> depthTexture : register(t3);

    RWTexture2D<float4> outputTexture : register(u0);

    #define BLOCK_SIZE 16

    [numthreads(BLOCK_SIZE,BLOCK_SIZE,1)]
    void csMain(
        uint3 groupId : SV_GroupID,
        uint3 groupThreadId : SV_GroupThreadID,
        uint groupIndex: SV_GroupIndex,
        uint3 dispatchThreadId : SV_DispatchThreadID)
    {
        ...
    }
CS steps 1-2
- Step 1: Load gbuffers & depth
- Step 2: Calculate min & max z in threadgroup / tile
  - Using InterlockedMin/Max on groupshared variable
  - Atomics only work on ints
  - But casting works (z is always +)
- Optimization note:
  - Separate pass using parallel reduction with Gather to a small texture could be faster
- Note to the future:
  - GPU already has similar values in HiZ/ZCull!
  - Can skip step 2 if we could resolve out min & max z to a texture directly
- Min z looks just like the occlusion software rendering output

    groupshared uint minDepthInt;
    groupshared uint maxDepthInt;

    // --- globals above, function below -------

    float depth = depthTexture.Load(uint3(texCoord, 0)).r;
    uint depthInt = asuint(depth);

    minDepthInt = 0xFFFFFFFF;
    maxDepthInt = 0;
    GroupMemoryBarrierWithGroupSync();

    InterlockedMin(minDepthInt, depthInt);
    InterlockedMax(maxDepthInt, depthInt);
    GroupMemoryBarrierWithGroupSync();

    float minGroupDepth = asfloat(minDepthInt);
    float maxGroupDepth = asfloat(maxDepthInt);
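Why the int cast works: for non-negative IEEE-754 floats, the raw bit patterns sort in the same order as the values, so an integer atomic min/max gives the float min/max. The C++ sketch below emulates this on the CPU. `asuint` and `asfloat` mirror the HLSL intrinsics, while `atomicMin`, `atomicMax` and `tileDepthBounds` are hypothetical helpers built on compare-exchange, since `std::atomic` has no built-in min/max before C++26. It assumes a non-empty tile with depths >= 0.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Bit-level reinterpretation, mirroring HLSL's asuint/asfloat.
inline std::uint32_t asuint(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
inline float asfloat(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Emulates InterlockedMin/InterlockedMax with a compare-exchange loop.
inline void atomicMin(std::atomic<std::uint32_t>& target, std::uint32_t v) {
    std::uint32_t cur = target.load();
    while (v < cur && !target.compare_exchange_weak(cur, v)) {}
}
inline void atomicMax(std::atomic<std::uint32_t>& target, std::uint32_t v) {
    std::uint32_t cur = target.load();
    while (v > cur && !target.compare_exchange_weak(cur, v)) {}
}

struct DepthBounds { float minZ, maxZ; };

// Min/max depth of one tile; every depth must be >= 0 and the tile non-empty.
inline DepthBounds tileDepthBounds(const std::vector<float>& tileDepths) {
    std::atomic<std::uint32_t> minBits{0xFFFFFFFFu};
    std::atomic<std::uint32_t> maxBits{0u};
    for (float d : tileDepths) { // on the GPU, one thread per pixel does this
        atomicMin(minBits, asuint(d));
        atomicMax(maxBits, asuint(d));
    }
    return DepthBounds{asfloat(minBits.load()), asfloat(maxBits.load())};
}
```

The ordering property breaks for negative values, whose sign bit makes them compare as large unsigned integers, which is why the slide notes that z is always positive.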
CS step 3 - Cull idea
- Determine visible light sources for each tile
  - Cull all light sources against tile "frustum"
  - Light sources can either naively be all light sources in the scene, or CPU frustum culled potentially visible light sources
- Output for each tile is:
  - # of visible light sources
  - Index list of visible light sources
- Example numbers from test scene
- This is the key part of the algorithm and compute shader, so must try to be rather clever with the implementation!
- Figure: Per-tile visible light count (black = 0 lights, white = 40)
CS step 3 - Cull implementation
- Each thread switches to process light sources instead of a pixel*
  - Wow, parallelism switcheroo!
  - 256 light sources in parallel per tile
  - Multiple iterations for >256 lights
- Intersect light source & tile
  - Many variants dep. on accuracy requirements & performance
  - Tile min & max z is used as a shader "depth bounds" test
- For visible lights, append light index to index list
  - Atomic add to threadgroup shared memory. "inlined stream compaction"
  - Prefix sum + stream compaction should be faster than atomics, but more limiting
- Synchronize group & switch back to processing pixels
  - We now know which light sources affect the tile
* Your grandfather's pixel shader can't do that!

    struct Light
    {
        float3 pos;
        float sqrRadius;
        float3 color;
        float invSqrRadius;
    };
    int lightCount;
    StructuredBuffer<Light> lights;

    // HLSL does not apply initializers to groupshared variables,
    // so one thread resets the counter inside the function instead
    groupshared uint visibleLightCount;
    groupshared uint visibleLightIndices[1024];

    // ----- globals above, cont. function below -----------

    if (groupIndex == 0)
        visibleLightCount = 0;
    GroupMemoryBarrierWithGroupSync();

    uint threadCount = BLOCK_SIZE*BLOCK_SIZE;
    uint passCount = (lightCount+threadCount-1) / threadCount;

    for (uint passIt = 0; passIt < passCount; ++passIt)
    {
        uint lightIndex = passIt*threadCount + groupIndex;

        // prevent overrun by clamping to a last "null" light
        lightIndex = min(lightIndex, lightCount);

        if (intersects(lights[lightIndex], tile))
        {
            uint offset;
            InterlockedAdd(visibleLightCount, 1, offset);
            visibleLightIndices[offset] = lightIndex;
        }
    }

    GroupMemoryBarrierWithGroupSync();
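The "inlined stream compaction" can be emulated on the CPU with an atomic counter: each worker tests a strided subset of the lights and reserves an output slot with `fetch_add`, the counterpart of `InterlockedAdd` with its original-value output. In this C++ sketch the intersection test is reduced to the depth-bounds part only. That reduction, the view-space `z` convention, and the names `TileBounds`, `intersectsDepthBounds` and `cullLights` are assumptions; the slides leave `intersects` unspecified.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

struct Light { float pos[3]; float sqrRadius; }; // subset of the slide's Light struct
struct TileBounds { float minZ, maxZ; };         // view-space depth range of one tile

// Depth-bounds part of the test only; a full test would also clip the
// light's sphere against the tile's side planes.
inline bool intersectsDepthBounds(const Light& l, const TileBounds& t) {
    const float r = std::sqrt(l.sqrRadius);
    return l.pos[2] + r >= t.minZ && l.pos[2] - r <= t.maxZ;
}

// Each worker plays the role of one thread in the group: it walks the light
// list with a stride of threadCount and appends the hits to a shared list.
inline std::vector<std::uint32_t> cullLights(const std::vector<Light>& lights,
                                             const TileBounds& tile,
                                             unsigned threadCount = 4) {
    std::vector<std::uint32_t> visible(lights.size());
    std::atomic<std::uint32_t> visibleCount{0};

    auto worker = [&](unsigned tid) {
        for (std::size_t i = tid; i < lights.size(); i += threadCount) {
            if (intersectsDepthBounds(lights[i], tile)) {
                // Reserve a unique slot, then write the light index into it.
                const std::uint32_t offset = visibleCount.fetch_add(1);
                visible[offset] = static_cast<std::uint32_t>(i);
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join(); // plays the role of the group sync

    visible.resize(visibleCount.load());
    return visible;
}
```

The order of the returned indices depends on thread timing, which is harmless here because lighting accumulation is a sum over the visible lights.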
CS deferred shading final steps
- For each pixel, accumulate lighting from visible lights
  - Read from tile visible light index list in threadgroup shared memory
- Combine lighting & shading albedos / parameters
  - Output is non-MSAA HDR texture
  - Render transparent surfaces on top
- Figures: Computed lighting; Combined final output (not the best example)

    float3 diffuseLight = 0;
    float3 specularLight = 0;

    for (uint lightIt = 0; lightIt < visibleLightCount; ++lightIt)
    {
        uint lightIndex = visibleLightIndices[lightIt];
        Light light = lights[lightIndex];

        evaluateAndAccumulateLight(
            light,
            gbufferParameters,
            diffuseLight,
            specularLight);
    }
Example results
Example: 25+ analytical specular highlights per pixel
Compute Shader-based  Deferred Shading demo
CS-based deferred shading
- The Good:
  - Constant & absolute minimal bandwidth
    - Read gbuffers & depth once!
  - Doesn't need intermediate light buffers
    - Can take a lot of memory with HDR, MSAA & color specular
  - Scales up to huge amount of big overlapping light sources!
    - Fine-grained culling (16x16)
    - Only ALU cost, good future scaling
    - Could be useful for accumulating VPLs
- The Bad:
  - Requires DX11 HW (duh)
    - CS 4.0/4.1 difficult due to atomics & scattered groupshared writes
  - Culling overhead for small light sources
    - Can accumulate them using standard light volume rendering
    - Or separate CS for tile-classific.
  - Potential performance issues
    - MSAA texture loads / UAV writing might be slower than standard PS
- The Ugly:
  - DX11 CS UAV limitation.
What else do we want to do?
- WARNING: Overly enthusiastic and non all-knowing game developer ranting
- Mixed resolution MSAA particle rendering
  - Depth test per sample, shade per quarter pixel, and depth-aware upsample directly in shader
- Demand-paged procedural texturing / compositing
  - Zero latency "texture shaders"
- Pre-tessellation coarse rasterization for z-culling of patches
  - Potential optimization in scenes of massive geometric overdraw
  - Can be coupled with recursive schemes
- Deferred shading w/ many & arbitrary BRDFs/materials
  - Queue up pixels of multiple materials for coherent processing in own shader
  - Instead of incoherent screen-space dynamic flow control
- Latency-free lens flares
  - Finally! No false/late occlusion
  - Occlusion query results written to CB and used in shader to cull & scale
- And much much more...

Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
graphics processing unit ppt
graphics processing unit pptgraphics processing unit ppt
graphics processing unit ppt
 
Programar para GPUs
Programar para GPUsProgramar para GPUs
Programar para GPUs
 
Xen in Linux (aka PVOPS update)
Xen in Linux (aka PVOPS update)Xen in Linux (aka PVOPS update)
Xen in Linux (aka PVOPS update)
 
Coding for multiple cores
Coding for multiple coresCoding for multiple cores
Coding for multiple cores
 
Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011
 
Unite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platformsUnite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platforms
 
Gpu
GpuGpu
Gpu
 
Threading Successes 06 Allegorithmic
Threading Successes 06   AllegorithmicThreading Successes 06   Allegorithmic
Threading Successes 06 Allegorithmic
 
Commandlistsiggraphasia2014 141204005310-conversion-gate02
Commandlistsiggraphasia2014 141204005310-conversion-gate02Commandlistsiggraphasia2014 141204005310-conversion-gate02
Commandlistsiggraphasia2014 141204005310-conversion-gate02
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Pcsx2 readme 0.9.6
Pcsx2 readme 0.9.6Pcsx2 readme 0.9.6
Pcsx2 readme 0.9.6
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
[Defcon] Hardware backdooring is practical
[Defcon] Hardware backdooring is practical[Defcon] Hardware backdooring is practical
[Defcon] Hardware backdooring is practical
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 

Recently uploaded

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 

Recently uploaded (20)

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 

Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)

  • 1. Parallel Graphics in Frostbite – Current & Future Johan Andersson DICE
  • 2. Menu Game engine CPU & GPU parallelism Rendering techniques & systems – old & new Mixed in with some future predictions & wishes
  • 3. Quick background Frostbite 1.x [1][2][3] Xbox 360, PS3, DX10 Battlefield: Bad Company (shipped) Battlefield 1943 (shipped) Battlefield: Bad Company 2 Frostbite 2 [4][5] In development Xbox 360, PS3 DX11 (10.0, 10.1, 11) Disclaimer: Unless specified, pictures are from engine tests, not actual games
  • 6. Job-based parallelism Must utilize all cores in the engine Xbox 360: 6 HW threads PS3: 2 HW threads + 6 great SPUs PC: 2-8 HW threads And many more coming Divide up systems into Jobs Async function calls with explicit inputs & outputs Typically fully independent stateless functions Makes it easier on PS3 SPU & in general Job dependencies create job graph All cores consume jobs CELL processor – We like
  • 7. Frostbite CPU job graph Frame job graph from Frostbite 1 (PS3) Build big job graphs Batch, batch, batch Mix CPU- & SPU-jobs Future: Mix in low-latency GPU-jobs Job dependencies determine: Execution order Sync points Load balancing I.e. the effective parallelism Braided Parallelism* [6] Intermixed task- & data-parallelism * Still only 10 hits on google (yet!), but I like Aaron’s term
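To illustrate how job dependencies can determine execution order, here is a minimal C++ sketch. `JobGraph` and its members are hypothetical names for illustration, not Frostbite's job system. Each job counts its unfinished dependencies and becomes runnable when that count reaches zero. The sketch runs serially for clarity; in an engine, workers on all cores would pull from the ready queue.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical job graph, for illustration only.
struct JobGraph {
    struct Job {
        std::function<void()> run;            // work with explicit inputs & outputs
        std::vector<std::size_t> dependents;  // jobs waiting on this one
        std::size_t pendingDeps;              // number of unfinished dependencies
    };
    std::vector<Job> jobs;

    std::size_t add(std::function<void()> fn) {
        Job job;
        job.run = std::move(fn);
        job.pendingDeps = 0;
        jobs.push_back(std::move(job));
        return jobs.size() - 1;
    }

    // Job `after` must not start until job `before` has finished.
    void depend(std::size_t after, std::size_t before) {
        assert(after < jobs.size() && before < jobs.size());
        jobs[before].dependents.push_back(after);
        ++jobs[after].pendingDeps;
    }

    // Runs every job in dependency order and returns the order used.
    std::vector<std::size_t> execute() {
        std::vector<std::size_t> pending;
        for (const Job& job : jobs) pending.push_back(job.pendingDeps);

        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < jobs.size(); ++i)
            if (pending[i] == 0) ready.push(i);

        std::vector<std::size_t> order;
        while (!ready.empty()) {
            const std::size_t id = ready.front();
            ready.pop();
            jobs[id].run();
            order.push_back(id);
            // Finishing a job may unblock the jobs that depend on it.
            for (std::size_t d : jobs[id].dependents)
                if (--pending[d] == 0) ready.push(d);
        }
        return order;
    }
};
```

With more than one job in the ready queue at a time, those jobs are independent and could run on different cores; that is where the effective parallelism comes from.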
  • 8. Rendering jobs Rendering systems are heavily divided up into jobs Jobs: Terrain geometry processing Undergrowth generation [2] Decal projection [3] Particle simulation Frustum culling Occlusion culling Occlusion rasterization Command buffer generation PS3: Triangle culling Most will move to GPU Eventually.. A few have already! Mostly one-way data flow I will talk about a couple of these..
  • 9. Parallel command buffer recording Dispatch draw calls and state to multiple command buffers in parallel Scales linearly with # cores 1500-4000 draw calls per frame Reduces latency & improves performance Important for all platforms, used on: Xbox 360 PS3 (SPU-based) PC DX11 Previously not possible on PC, but now in DX11...
  • 10. DX11 parallel dispatch First class citizen in DX11 Killer feature for reducing CPU overhead & latency ~90% of our rendering dispatch job time is in D3D/driver DX11 deferred device context per core Together with dynamic resources (cbuffer/vbuffer) for usage on that deferred context Renderer has list of all draw calls we want to do for each rendering “layer” of the frame Split draw calls for each layer into chunks of ~256 and dispatch in parallel to the deferred contexts Each chunk generates a command list Render to immediate context & execute command lists Profit! Goal: close to linear scaling up to octa-core when we get full DX11 driver support (up to the IHVs now) Future note: This is ”just” a stopgap measure until we evolve the GPU to be able to fully feed itself (hi LRB)
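The chunking scheme on this slide can be sketched in plain C++ without any D3D calls. `DrawCall`, `CommandList`, `recordChunk` and `recordLayer` are hypothetical stand-ins. In a real DX11 renderer each chunk would be recorded on a deferred context and turned into a command list with `FinishCommandList`, then replayed on the immediate context with `ExecuteCommandList`. Spawning one thread per chunk is a simplification; an engine would feed chunks to a fixed pool of workers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-ins for engine types; not the D3D11 API.
struct DrawCall { int id; };
using CommandList = std::vector<int>;  // recorded draw ids

// Records one chunk; in D3D11 this would run on a deferred context.
inline CommandList recordChunk(const std::vector<DrawCall>& draws,
                               std::size_t begin, std::size_t end) {
    CommandList list;
    for (std::size_t i = begin; i < end; ++i) list.push_back(draws[i].id);
    return list;
}

// Splits a layer's draw calls into chunks and records the chunks in parallel.
inline std::vector<CommandList> recordLayer(const std::vector<DrawCall>& draws,
                                            std::size_t chunkSize = 256) {
    assert(chunkSize > 0);
    const std::size_t chunkCount = (draws.size() + chunkSize - 1) / chunkSize;
    std::vector<CommandList> lists(chunkCount);
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < chunkCount; ++c) {
        workers.emplace_back([&draws, &lists, c, chunkSize] {
            const std::size_t begin = c * chunkSize;
            const std::size_t end = std::min(begin + chunkSize, draws.size());
            lists[c] = recordChunk(draws, begin, end);  // each thread owns one slot
        });
    }
    for (std::thread& w : workers) w.join();
    return lists;  // to be executed in index order, preserving draw order
}
```

Because every chunk writes only to its own slot, no locking is needed, and executing the lists in index order reproduces the original draw order.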
  • 11. Occlusion culling Problem: Buildings & env occlude large amounts of objects Invisible objects still have to: Update logic & animations Generate command buffer Processed on CPU & GPU Difficult to implement full culling Destructible buildings Dynamic occludees Difficult to precompute GPU occlusion queries can be heavy to render From Battlefield: Bad Company PS3
  • 12. Our solution: Software occlusion rasterization
  • 14. Screen-space bounding-box test. Pictures & numbers from Battlefield: Bad Company PS3
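The slides do not give the details of the test, so the following is only a sketch under stated assumptions: occluders have already been software-rasterized into a small depth buffer, larger depth means farther away, and an object is represented by its screen-space rectangle plus its nearest depth. `OcclusionBuffer`, `ScreenBounds` and `isPotentiallyVisible` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Low-resolution depth buffer holding software-rasterized occluders.
// Assumption: larger depth = farther away; 1.0 means "nothing rasterized".
struct OcclusionBuffer {
    int width;
    int height;
    std::vector<float> depth;

    OcclusionBuffer(int w, int h)
        : width(w), height(h), depth(static_cast<std::size_t>(w) * h, 1.0f) {
        assert(w > 0 && h > 0);
    }
    float& at(int x, int y) {
        return depth[static_cast<std::size_t>(y) * width + x];
    }
    float at(int x, int y) const {
        return depth[static_cast<std::size_t>(y) * width + x];
    }
};

// Screen-space bounding box of an object (inclusive pixel bounds)
// together with the depth of its nearest point.
struct ScreenBounds { int x0, y0, x1, y1; float nearestDepth; };

// Conservative test: the object counts as visible if, at any covered pixel,
// its nearest depth is not behind the occluder depth stored there.
inline bool isPotentiallyVisible(const OcclusionBuffer& buf, ScreenBounds b) {
    const int x0 = std::max(b.x0, 0);
    const int y0 = std::max(b.y0, 0);
    const int x1 = std::min(b.x1, buf.width - 1);
    const int y1 = std::min(b.y1, buf.height - 1);
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            if (b.nearestDepth <= buf.at(x, y)) return true;
    return false;  // off-screen, or behind occluders at every covered pixel
}
```

Using the nearest depth of the whole box errs on the side of drawing too much rather than too little, which is what a conservative culling test needs.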
  • 15. GPU occlusion culling Ideally want GPU rasterization & testing, but: Occlusion queries introduce overhead & latency Can be manageable, but far from ideal Conditional rendering only helps GPU Not CPU, frame memory or draw calls Future 1: Low-latency extra GPU exec. context Rasterization and testing done on GPU where it belongs Lockstep with CPU, need to read back data within a few ms Should be possible on LRB (latency?), want on all HW Future 2: Move entire cull & rendering to ”GPU” World rep., cull, systems, dispatch. End goal.
  • 16. PS3 geometry processing Problem: Slow GPU triangle & vertex setup on PS3 Combined with unique situation with powerful & initially not fully utilized ”free” SPUs! Solution: SPU triangle culling Trade SPU time for GPU time Cull all back faces, micro-triangles, out of frustum Based on Sony’s PS3 EDGE library [7] Also see Jon Olick’s talk from the course last year 5 SPU jobs processes frame geometry in parallel Output is new index buffer for each draw call
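As a rough CPU-side sketch of the idea (not the EDGE implementation), the function below takes screen-space vertex positions and an index buffer and emits a new index buffer containing only triangles that can contribute pixels. Assumptions: counter-clockwise triangles are front-facing in a y-up coordinate system, and micro-triangle culling is reduced to rejecting zero-area triangles. `Vec2` and `cullTriangles` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };

// Returns a new index buffer with back-facing, degenerate and fully
// off-screen triangles removed. Positions are in screen space.
inline std::vector<std::uint32_t> cullTriangles(
        const std::vector<Vec2>& screenPos,
        const std::vector<std::uint32_t>& indices,
        float viewWidth, float viewHeight) {
    assert(indices.size() % 3 == 0);
    std::vector<std::uint32_t> kept;
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        const Vec2 a = screenPos[indices[i]];
        const Vec2 b = screenPos[indices[i + 1]];
        const Vec2 c = screenPos[indices[i + 2]];

        // Twice the signed area; positive for counter-clockwise winding.
        const float area2 = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
        if (area2 <= 0.0f) continue;  // back-facing or zero-area

        // Trivial reject: all three vertices beyond the same viewport edge.
        const bool outside =
            (a.x < 0.0f && b.x < 0.0f && c.x < 0.0f) ||
            (a.y < 0.0f && b.y < 0.0f && c.y < 0.0f) ||
            (a.x > viewWidth && b.x > viewWidth && c.x > viewWidth) ||
            (a.y > viewHeight && b.y > viewHeight && c.y > viewHeight);
        if (outside) continue;

        kept.insert(kept.end(), {indices[i], indices[i + 1], indices[i + 2]});
    }
    return kept;
}
```

The vertex data is untouched; only the index buffer shrinks, which matches the slide's note that the output is a new index buffer per draw call.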
  • 17. Custom geometry processing Software control opens up great flexibility and programmability! Simple custom culling/processing that we’ve added: Partition bounding box culling Mesh part culling Clip plane triangle trivial accept & reject Triangle cull volumes (inverse clip planes) Others are doing: Full skinning, morph targets, CLOD, cloth Future wish: No forced/fixed vertex & geometry shaders DIY compute shaders with fixed-func stages (tessellation and rasterization) Software-controlled queuing of data between stages To avoid always spilling out to memory
  • 22. No CPU memory or upload
  • 24. Screen-space tile classification Divide screen up into tiles and determine how many & which light sources intersect each tile Only apply the visible light sources on pixels in each tile Reduced BW & setup cost with multiple lights in single shader Used in Naughty Dog’s Uncharted [9] and SCEE PhyreEngine [10] Hmm, isn’t light classification per screen-space tile sort of similar to how a compute shader can work with 2D thread groups? Answer: YES, except a CS can do everything in a single pass! From ”The Technology of Uncharted". GDC’08 [9]
  • 26. Not production tested or optimized
  • 28. Assumption: No shadows (for now). New hybrid Graphics/Compute shading pipeline: Graphics pipeline rasterizes gbuffers for opaque surfaces. Compute pipeline uses gbuffers, culls light sources, computes lighting & combines with shading (multiple other variants also possible).
  • 29. CS requirements & setup. Input data is gbuffers, depth buffer & light constants. Output is fully composited & lit HDR texture. 1 thread per pixel, 16x16 thread groups (aka tile). Gbuffer pictures shown: Normal, Roughness, Diffuse Albedo, Specular Albedo.
        Texture2D<float4> gbufferTexture1 : register(t0);
        Texture2D<float4> gbufferTexture2 : register(t1);
        Texture2D<float4> gbufferTexture3 : register(t2);
        Texture2D<float4> depthTexture : register(t3);
        RWTexture2D<float4> outputTexture : register(u0);
        #define BLOCK_SIZE 16
        [numthreads(BLOCK_SIZE,BLOCK_SIZE,1)]
        void csMain(
            uint3 groupId : SV_GroupID,
            uint3 groupThreadId : SV_GroupThreadID,
            uint groupIndex: SV_GroupIndex,
            uint3 dispatchThreadId : SV_DispatchThreadID)
        {
            ...
        }
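With one thread per pixel and 16x16 thread groups, the CPU side has to launch enough groups to cover the whole render target, rounding up when the resolution is not a multiple of the tile size. A small sketch of that arithmetic follows; `GroupCount` and `tileGroupCount` are hypothetical names, and the result would be what gets passed as the X and Y arguments of `ID3D11DeviceContext::Dispatch`.

```cpp
#include <cassert>

struct GroupCount { unsigned x, y; };

// Number of thread groups needed so that blockSize x blockSize tiles
// cover a width x height render target (rounds up at the edges).
inline GroupCount tileGroupCount(unsigned width, unsigned height,
                                 unsigned blockSize = 16) {
    assert(blockSize > 0);
    GroupCount count;
    count.x = (width + blockSize - 1) / blockSize;
    count.y = (height + blockSize - 1) / blockSize;
    return count;
}
```

For a 1280x720 target this gives 80x45 groups. At 1920x1080 the height is not a multiple of 16, so the last row of tiles hangs over the edge and the shader needs to guard against out-of-range pixels.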
  • 30. CS steps 1-2. Load gbuffers & depth. Calculate min & max z in threadgroup / tile, using InterlockedMin/Max on groupshared variable. Atomics only work on ints, but casting works (z is always +). Optimization note: Separate pass using parallel reduction with Gather to a small texture could be faster. Note to the future: GPU already has similar values in HiZ/ZCull! Can skip step 2 if we could resolve out min & max z to a texture directly. Min z looks just like the occlusion software rendering output.
        groupshared uint minDepthInt;
        groupshared uint maxDepthInt;
        // --- globals above, function below -------
        float depth = depthTexture.Load(uint3(texCoord, 0)).r;
        uint depthInt = asuint(depth);
        minDepthInt = 0xFFFFFFFF;
        maxDepthInt = 0;
        GroupMemoryBarrierWithGroupSync();
        InterlockedMin(minDepthInt, depthInt);
        InterlockedMax(maxDepthInt, depthInt);
        GroupMemoryBarrierWithGroupSync();
        float minGroupDepth = asfloat(minDepthInt);
        float maxGroupDepth = asfloat(maxDepthInt);
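The casting trick works because non-negative IEEE 754 floats keep their ordering when their bit patterns are compared as unsigned integers. The C++ sketch below shows this on the CPU, with `std::atomic` standing in for the groupshared variables and a compare-exchange loop standing in for `InterlockedMin`/`InterlockedMax`. All function names are hypothetical, and the loop over depths is serial here where the shader would have one thread per pixel.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Bit-cast helpers, the CPU counterparts of HLSL asuint/asfloat.
inline std::uint32_t floatBits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
inline float bitsToFloat(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Atomic min/max on unsigned ints via compare-exchange.
inline void atomicMin(std::atomic<std::uint32_t>& target, std::uint32_t value) {
    std::uint32_t current = target.load();
    while (value < current && !target.compare_exchange_weak(current, value)) {}
}
inline void atomicMax(std::atomic<std::uint32_t>& target, std::uint32_t value) {
    std::uint32_t current = target.load();
    while (value > current && !target.compare_exchange_weak(current, value)) {}
}

struct DepthRange { float minDepth, maxDepth; };

// Min & max depth of one tile. Depths must be non-negative, otherwise the
// unsigned comparison of the bit patterns no longer matches float ordering.
inline DepthRange tileDepthRange(const std::vector<float>& depths) {
    assert(!depths.empty());
    std::atomic<std::uint32_t> minBits(0xFFFFFFFFu);
    std::atomic<std::uint32_t> maxBits(0u);
    for (float d : depths) {
        assert(d >= 0.0f);
        atomicMin(minBits, floatBits(d));
        atomicMax(maxBits, floatBits(d));
    }
    DepthRange range;
    range.minDepth = bitsToFloat(minBits.load());
    range.maxDepth = bitsToFloat(maxBits.load());
    return range;
}
```

Since the values are only reinterpreted and never converted, the min and max that come back are bit-exact copies of two input depths.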
  • 31. CS step 3 – Cull idea Determine visible light sources for each tile Cull all light sources against tile ”frustum” Light sources can either naively be all light sources in the scene, or CPU frustum culled potentially visible light sources Output for each tile is: # of visible light sources Index list of visible light sources Example numbers from test scene This is the key part of the algorithm and compute shader, so must try to be rather clever with the implementation! Per-tile visible light count (black = 0 lights, white = 40)
  • 32. CS step 3 – Cull implementation. Each thread switches to process light sources instead of a pixel*. Wow, parallelism switcheroo! 256 light sources in parallel per tile; multiple iterations for >256 lights. Intersect light source & tile: many variants dep. on accuracy requirements & performance; tile min & max z is used as a shader ”depth bounds” test. For visible lights, append light index to index list: atomic add to threadgroup shared memory, ”inlined stream compaction”. Prefix sum + stream compaction should be faster than atomics, but more limiting. Synchronize group & switch back to processing pixels. We now know which light sources affect the tile. *Your grandfather’s pixel shader can’t do that!
        struct Light
        {
            float3 pos; float sqrRadius;
            float3 color; float invSqrRadius;
        };
        int lightCount;
        StructuredBuffer<Light> lights;
        groupshared uint visibleLightCount = 0;
        groupshared uint visibleLightIndices[1024];
        // ----- globals above, cont. function below -----------
        uint threadCount = BLOCK_SIZE*BLOCK_SIZE;
        uint passCount = (lightCount+threadCount-1) / threadCount;
        for (uint passIt = 0; passIt < passCount; ++passIt)
        {
            uint lightIndex = passIt*threadCount + groupIndex;
            // prevent overrun by clamping to a last "null" light
            lightIndex = min(lightIndex, lightCount);
            if (intersects(lights[lightIndex], tile))
            {
                uint offset;
                InterlockedAdd(visibleLightCount, 1, offset);
                visibleLightIndices[offset] = lightIndex;
            }
        }
        GroupMemoryBarrierWithGroupSync();
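The culling loop can be emulated on the CPU to see how the passes and the atomic append fit together. This is a sketch under assumptions, not the shader itself: the intersection test is reduced to the tile's depth bounds, out-of-range light indices are skipped instead of being clamped to a trailing "null" light, and the "threads" of a pass run one after another. `Light`, `TileDepthBounds`, `intersectsDepthBounds` and `cullLightsForTile` are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified light: view-space depth of the centre plus radius.
struct Light { float viewZ; float radius; };
struct TileDepthBounds { float minZ, maxZ; };

// Depth-bounds-only intersection test (one of many possible variants).
inline bool intersectsDepthBounds(const Light& l, const TileDepthBounds& t) {
    return l.viewZ + l.radius >= t.minZ && l.viewZ - l.radius <= t.maxZ;
}

// Emulates the per-tile cull: in each pass, "thread" groupIndex tests one
// light and appends its index through an atomic counter.
inline std::vector<std::uint32_t> cullLightsForTile(
        const std::vector<Light>& lights, const TileDepthBounds& tile,
        std::uint32_t threadCount = 256, std::uint32_t capacity = 1024) {
    assert(threadCount > 0);
    std::vector<std::uint32_t> visible(capacity);
    std::atomic<std::uint32_t> visibleCount(0u);

    const std::uint32_t lightCount = static_cast<std::uint32_t>(lights.size());
    const std::uint32_t passCount = (lightCount + threadCount - 1) / threadCount;
    for (std::uint32_t pass = 0; pass < passCount; ++pass) {
        for (std::uint32_t groupIndex = 0; groupIndex < threadCount; ++groupIndex) {
            const std::uint32_t lightIndex = pass * threadCount + groupIndex;
            if (lightIndex >= lightCount) continue;  // past the last light
            if (!intersectsDepthBounds(lights[lightIndex], tile)) continue;
            // Counterpart of InterlockedAdd: reserve a slot, then write into it.
            const std::uint32_t offset = visibleCount.fetch_add(1u);
            if (offset < capacity) visible[offset] = lightIndex;
        }
    }
    visible.resize(std::min(visibleCount.load(), capacity));
    return visible;
}
```

On a GPU the order of the appended indices depends on thread scheduling; the serial emulation happens to produce them in ascending order, which nothing downstream should rely on.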
  • 33. CS deferred shading final steps. Computed lighting. For each pixel, accumulate lighting from visible lights. Read from tile visible light index list in threadgroup shared memory. Combine lighting & shading albedos / parameters. Output is non-MSAA HDR texture. Render transparent surfaces on top. Combined final output (not the best example).
        float3 diffuseLight = 0;
        float3 specularLight = 0;
        for (uint lightIt = 0; lightIt < visibleLightCount; ++lightIt)
        {
            uint lightIndex = visibleLightIndices[lightIt];
            Light light = lights[lightIndex];
            evaluateAndAccumulateLight(
                light,
                gbufferParameters,
                diffuseLight,
                specularLight);
        }
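The slide does not show what `evaluateAndAccumulateLight` does, so the sketch below fills it in with an assumed model: diffuse only, Lambert N·L, and a falloff of `1 - d² * invSqrRadius` that reaches zero at the light's radius. The `Light` layout mirrors the struct from the cull slide; `Float3` and `accumulateDiffuse` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };
inline Float3 operator-(Float3 a, Float3 b) {
    Float3 r = {a.x - b.x, a.y - b.y, a.z - b.z};
    return r;
}
inline float dot(Float3 a, Float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Same fields as the Light struct on the cull slide.
struct Light { Float3 pos; float sqrRadius; Float3 color; float invSqrRadius; };

// Accumulates diffuse lighting for one pixel from the tile's visible lights.
// Assumed model: Lambert N.L with falloff 1 - d^2 / r^2 (not from the talk).
inline Float3 accumulateDiffuse(const std::vector<Light>& lights,
                                const std::vector<std::uint32_t>& visibleLightIndices,
                                Float3 surfacePos, Float3 normal) {
    Float3 diffuse = {0.0f, 0.0f, 0.0f};
    for (std::uint32_t lightIndex : visibleLightIndices) {
        assert(lightIndex < lights.size());
        const Light& light = lights[lightIndex];
        const Float3 toLight = light.pos - surfacePos;
        const float sqrDist = dot(toLight, toLight);
        // The tile test is coarse, so individual pixels can still be out of range.
        if (sqrDist <= 0.0f || sqrDist >= light.sqrRadius) continue;
        const float falloff = 1.0f - sqrDist * light.invSqrRadius;
        const float nDotL = std::max(0.0f, dot(normal, toLight) / std::sqrt(sqrDist));
        const float weight = falloff * nDotL;
        diffuse.x += light.color.x * weight;
        diffuse.y += light.color.y * weight;
        diffuse.z += light.color.z * weight;
    }
    return diffuse;
}
```

The per-pixel range check matters because the tile-level cull only guarantees that a light touches the tile somewhere, not that it reaches every pixel in it.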
  • 35. Example: 25+ analytical specular highlights per pixel
  • 37. Compute Shader-based Deferred Shading demo
  • 40. What else do we want to do? WARNING: Overly enthusiastic and non all-knowing game developer ranting Mixed resolution MSAA particle rendering Depth test per sample, shade per quarter pixel, and depth-aware upsample directly in shader Demand-paged procedural texturing / compositing Zero latency “texture shaders” Pre-tessellation coarse rasterization for z-culling of patches Potential optimization in scenes of massive geometric overdraw Can be coupled with recursive schemes Deferred shading w/ many & arbitrary BRDFs/materials Queue up pixels of multiple materials for coherent processing in own shader Instead of incoherent screen-space dynamic flow control Latency-free lens flares Finally! No false/late occlusion Occlusion query results written to CB and used in shader to cull & scale And much much more...
  • 41. Conclusions A good parallelization model is key for good game engine performance (duh) Job graphs of mixed task- & data-parallel CPU & SPU jobs work well for us SPU-jobs do the heavy lifting Hybrid compute/graphics pipelines look promising Efficient interoperability is super important (DX11 is great) Deferred lighting & shading in CS is just the start Want a user-defined streaming pipeline model Expressive & extensible hybrid pipelines with queues Focus on the data flow & patterns instead of doing sequential memory passes
  • 42. Acknowledgements DICE & Frostbite team Nicolas Thibieroz, Mark Leather Miguel Sainz, Yury Uralsky Kayvon Fatahalian Matt Swoboda, Pål-Kristian Engstad Timothy Farrar, Jake Cannell
  • 43. References
        [1] Johan Andersson. “Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing Techniques”. GDC 2007. http://repi.blogspot.com/2009/01/conference-slides.html
        [2] Natalya Tatarchuk & Johan Andersson. “Rendering Architecture and Real-time Procedural Shading & Texturing Techniques”. GDC 2007. http://developer.amd.com/Assets/Andersson-Tatarchuk-FrostbiteRenderingArchitecture(GDC07_AMD_Session).pdf
        [3] Johan Andersson. “Terrain Rendering in Frostbite using Procedural Shader Splatting”. Siggraph 2007. http://developer.amd.com/media/gpu_assets/Andersson-TerrainRendering(Siggraph07).pdf
        [4] Daniel Johansson & Johan Andersson. “Shadows & Decals – D3D10 techniques from Frostbite”. GDC 2009. http://repi.blogspot.com/2009/03/gdc09-shadows-decals-d3d10-techniques.html
        [5] Bill Bilodeau & Johan Andersson. “Your Game Needs Direct3D 11, So Get Started Now!”. GDC 2009. http://repi.blogspot.com/2009/04/gdc09-your-game-needs-direct3d-11-so.html
        [6] Aaron Lefohn. “Programming Larrabee: Beyond Data Parallelism” – “Beyond Programmable Shading” course. Siggraph 2008. http://s08.idav.ucdavis.edu/lefohn-programming-larrabee.pdf
        [7] Mark Cerny, Jon Olick, Vince Diesi. “PLAYSTATION Edge”. GDC 2007.
        [8] Wolfgang Engel. “Light Pre-Pass Renderer Mark III” – “Advances in Real-Time Rendering in 3D Graphics and Games” course notes. Siggraph 2009.
        [9] Pål-Kristian Engstad. “The Technology of Uncharted: Drake’s Fortune”. GDC 2008. http://www.naughtydog.com/corporate/press/GDC%202008/UnchartedTechGDC2008.pdf
        [10] Matt Swoboda. “Deferred Lighting and Post Processing on PLAYSTATION®3”. GDC 2009. http://www.technology.scee.net/files/presentations/gdc2009/DeferredLightingandPostProcessingonPS3.ppt
        [11] Kayvon Fatahalian et al. “GRAMPS: A Programming Model for Graphics Pipelines”. ACM Transactions on Graphics, January 2009. http://graphics.stanford.edu/papers/gramps-tog/
        [12] Jared Hoberock et al. “Stream Compaction for Deferred Shading”. http://graphics.cs.uiuc.edu/~jch/papers/shadersorting.pdf
  • 44. We are hiring senior developers
  • 46. You could win a Siggraph’09 mug (yey!)
  • 47. One winner per course, notified by email in the evening. Email: johan.andersson@dice.se Blog: http://repi.se Twitter: http://twitter.com/repi igetyourfail.com
  • 49. Timing view Real-time in-game overlay See CPU, SPU & GPU timing events & effective parallelism What we use to reduce sync-points & optimize load balancing between all processors GPU timing through event queries AFR-handling rather shaky, but works!* Example: PC, 4 CPU cores, 2 GPUs in AFR *At least on AMD 4870x2 after some alt-tab action

Editor's Notes

  1. Concrete effects on the graphics pipeline. Include a few wishes & predictions of how we would like the GPU programming model to evolve.
  2. FB1 started out 5 years ago, targeting the ”next-generation” consoles. While we developed the engine we also worked on BFBC1, which was the pilot project. After that shipped, we have been spending quite a bit of effort on a new version of the engine, of which one of the major new things is full PC & DX11 support. So I think we have some interesting experiences on both the consoles and modern PCs. And no, I won’t talk about BF3.
  3. These large-scale environments require heavy processing. Let’s go ahead and talk about jobs.
  4. Better code structure! Gustafson’s Law. Fixed 33 ms/f.
  5. 90% of this is a single job graph for the rendering of a frame. Rely on dynamic load-balancing. Task-parallel examples: terrain culling, particles, building entities. Data-parallel examples: particles. Have example of low-latency GPU job later on. Braided: Aaron introduced the term at this course last year.
  6. We have a lot of jobs across the engine, too many to go through, so I chose to focus more on some of the rendering. Our intention is to move all of these to the GPU.
  7. One of the first optimizations for multi-core that we did was to move all rendering dispatch to a separate thread. This includes all the draw calls and states that we set to D3D. This helps a lot, but it doesn’t scale that well as we only utilize a single extra core. Gather.
  8. Software rasterization of occluders.
  9. Conservative.
  10. Want to rasterize on GPU, not CPU. CPU ↔ GPU job dependencies.
  11. Only visible triangles. Jon Olick also talked about Edge at the course last year, so I won’t go into that much detail about this. This is a huge optimization both for us and many other developers.
  12. Initially very skeptical. Intrinsics problematic.
  13. 280 instruction GS. StreamOut buffer/query management is difficult & buggy. Want to use CS.
  14. Esp. heavy with MSAA. Parallel reduction probably faster than atomics.
  15. MSAA texture read cost, UAV cost. Good: depth bounds on all HW.
  16. Paper presented at Siggraph yesterday. Dynamic irregular workloads. Elegant model, holds a lot of promise.
  17. OpenCL. RapidMind, GRAMPS.