Bending the Graphics Pipeline

  1. Bending the Graphics Pipeline. Johan Andersson, DICE
  2. Overview <ul><li>Give a taste of a few rendering techniques we are using & experimenting with, and how they interact, or how we would like them to interact, with the graphics pipeline </li></ul><ul><li>Tile-based Deferred Shading </li></ul><ul><li>Morphological Antialiasing </li></ul><ul><li>Analytical Ambient Occlusion </li></ul>08/01/10 Beyond Programmable Shading, SIGGRAPH 2010
  3. TILE-BASED DEFERRED SHADING
  4. Tile-based deferred shading <ul><li>Tile-based culling & lighting </li></ul><ul><ul><li>Cull lights per screen-space tile </li></ul></ul><ul><ul><li>Lighting kernel runs per tile </li></ul></ul><ul><ul><li>Minimizes bandwidth/setup cost </li></ul></ul><ul><li>DX11: GPU compute shader </li></ul><ul><ul><li>Covered in the course last year [Andersson09] </li></ul></ul><ul><li>PS3: SPU jobs </li></ul><ul><ul><li>GPU renders gbuffer </li></ul></ul><ul><ul><li>SPU does light culling & full lighting evaluation for each pixel </li></ul></ul>
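The per-tile light culling step can be sketched on the CPU as follows. This is a minimal illustration, not the Frostbite compute shader or SPU job: it assumes lights have already been projected to screen-space circles `(cx, cy, radius)`, and the function name and 16x16 tile size are hypothetical. A production version would also cull against the tile's depth bounds.

```python
def cull_lights_per_tile(lights, width, height, tile_w=16, tile_h=16):
    """Return {(tx, ty): [light indices]} for lights overlapping each tile.

    lights: list of (cx, cy, radius) screen-space circles (an assumption
    made for this sketch; real implementations test light volumes against
    per-tile frusta).
    """
    tiles_x = (width + tile_w - 1) // tile_w
    tiles_y = (height + tile_h - 1) // tile_h
    result = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            x0, y0 = tx * tile_w, ty * tile_h
            x1, y1 = min(x0 + tile_w, width), min(y0 + tile_h, height)
            visible = []
            for i, (cx, cy, r) in enumerate(lights):
                # Closest point on the tile rectangle to the light center.
                px = min(max(cx, x0), x1)
                py = min(max(cy, y0), y1)
                if (px - cx) ** 2 + (py - cy) ** 2 <= r * r:
                    visible.append(i)
            result[(tx, ty)] = visible
    return result

tiles = cull_lights_per_tile([(8, 8, 4), (40, 8, 10)], width=64, height=16)
```

The lighting kernel would then loop only over `tiles[(tx, ty)]` for the pixels of that tile, which is where the bandwidth and setup savings come from.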
  5. Multiple deferred lighting models <ul><li>Standard Phong </li></ul><ul><li>Metallic </li></ul><ul><li>Skin </li></ul><ul><li>Translucent </li></ul>
  6. Working with tiles <ul><li>Tile culling optimizations </li></ul><ul><ul><li>Cull lights & shadows with tile normal cone </li></ul></ul><ul><ul><li>Detect tile specular=0 </li></ul></ul><ul><ul><li>Detect tile lighting model </li></ul></ul><ul><li>Tile lighting kernel permutations </li></ul><ul><ul><li>Specular on/off </li></ul></ul><ul><ul><li>Lighting models </li></ul></ul><ul><ul><li>More in the future </li></ul></ul>
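One way to read "cull lights with tile normal cone" is sketched below, assuming a directional light and unit-length gbuffer normals. The cone construction (average axis, widest deviation as half-angle) and all function names are assumptions for illustration; the slides do not describe the actual implementation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_normal_cone(normals):
    """Conservative cone around unit normals: (axis, half_angle in radians)."""
    summed = tuple(sum(n[i] for n in normals) for i in range(3))
    length = math.sqrt(dot(summed, summed))
    if length < 1e-6:
        # Normals cancel out: fall back to a cone covering every direction.
        return (0.0, 0.0, 1.0), math.pi
    axis = tuple(c / length for c in summed)
    cos_half = min(dot(axis, n) for n in normals)
    return axis, math.acos(max(-1.0, min(1.0, cos_half)))

def directional_light_culled(axis, half_angle, to_light):
    """True when N.L <= 0 for every normal N inside the cone."""
    if half_angle > math.pi / 2:
        return False  # cone too wide, some normal may face the light
    # All normals face away when angle(axis, L) >= 90 degrees + half_angle.
    return dot(axis, to_light) <= -math.sin(half_angle)

up_facing = [(0.0, 0.0, 1.0), (0.0, 0.6, 0.8)]
axis, half_angle = build_normal_cone(up_facing)
culled_from_below = directional_light_culled(axis, half_angle, (0.0, 0.0, -1.0))
culled_from_above = directional_light_culled(axis, half_angle, (0.0, 0.0, 1.0))
```

Because the cone is a superset of the tile's normals, the test can only keep a light unnecessarily; it never culls a light that would contribute.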
  7. SPU-based Deferred Shading <ul><li>Ported DX11 compute shader to SPU job </li></ul><ul><ul><li>Offloads PS3 GPU </li></ul></ul><ul><ul><li>SPU processing in parallel with GPU rendering </li></ul></ul><ul><ul><li>32x16 pixel tiles </li></ul></ul><ul><li>Explicit SoA vectorization instead of implicit </li></ul><ul><ul><li>C/C++ on SPU - HLSL on GPU </li></ul></ul><ul><ul><li>Not a problem for such a relatively small kernel </li></ul></ul><ul><ul><li>But not an ideal data-parallel programming model </li></ul></ul>
  8. SPU vs GPU architecture <ul><li>6 execution contexts vs 1+ million (each pixel) </li></ul><ul><li>Explicit SIMD vs implicit SIMD </li></ul><ul><li>C/C++ vs HLSL </li></ul><ul><li>Explicit async DMA vs implicit latency hiding </li></ul><ul><li>What can we learn? </li></ul>
  9. Issues & challenges going forward <ul><li>More lighting models </li></ul><ul><ul><li>SIMD & branching efficiency </li></ul></ul><ul><li>Transparent decal surfaces & volumes </li></ul><ul><ul><li>Fixed function blending doesn’t work well with deferred shading </li></ul></ul><ul><li>Higher-quality antialiasing </li></ul>
  10. Flexible lighting models <ul><li>Want both more & more flexible models: </li></ul><ul><ul><li>Custom gbuffer layout per material </li></ul></ul><ul><ul><li>Quality & performance tradeoffs </li></ul></ul><ul><li>Examples: </li></ul><ul><ul><li>Hair / anisotropic materials </li></ul></ul><ul><ul><ul><li>Requires more lighting model parameters in gbuffer </li></ul></ul></ul><ul><ul><li>Foliage </li></ul></ul><ul><ul><ul><li>Massive overdraw with alpha-tested simple shaders, few parameters </li></ul></ul></ul><ul><ul><ul><li>Write to as simple a gbuffer as possible to reduce ROP/bandwidth bottleneck </li></ul></ul></ul><ul><ul><li>Skin </li></ul></ul><ul><ul><ul><li>Sub-surface scattering approximation </li></ul></ul></ul>
  11. The SIMD efficiency problem <ul><li>Lighting models through dynamic branches </li></ul><ul><li>GPU shader model can be problematic: </li></ul><ul><ul><li>Increased register pressure = overall slower shader </li></ul></ul><ul><ul><li>Requires good screen-space SIMD coherency for performance win </li></ul></ul><ul><li>Potential solutions: </li></ul><ul><ul><li>Reshuffle pixels to improve coherency? </li></ul></ul><ul><ul><ul><li>Within each tile, sort pixels by model, compute lighting & then scatter back </li></ul></ul></ul><ul><ul><li>GRAMPS-style queueing? [Sugerman09] </li></ul></ul><ul><ul><ul><li>Attractive & powerful high-level programming model </li></ul></ul></ul>Alpha-tested foliage has far from ideal coherency
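The "sort pixels by model, compute lighting, scatter back" idea can be sketched as below. It is a scalar illustration only: the kernels are stand-in lambdas, the pixel payload is a single float, and the function name is hypothetical. On a GPU or SPU each batch would map to full SIMD lanes running one lighting model.

```python
def shade_tile_sorted(pixels, kernels):
    """pixels: list of (model_id, value); kernels: {model_id: function}.

    Groups pixel indices by lighting model so each kernel runs over a
    coherent batch, then writes results back to the original positions.
    """
    order = sorted(range(len(pixels)), key=lambda i: pixels[i][0])
    out = [None] * len(pixels)
    start = 0
    while start < len(order):
        model = pixels[order[start]][0]
        end = start
        while end < len(order) and pixels[order[end]][0] == model:
            end += 1
        kernel = kernels[model]
        for i in order[start:end]:         # one coherent batch per model
            out[i] = kernel(pixels[i][1])  # scatter back to pixel slot i
        start = end
    return out

# Hypothetical lighting models: 0 doubles the input, 1 adds a constant.
kernels = {0: lambda v: v * 2.0, 1: lambda v: v + 10.0}
tile = [(1, 1.0), (0, 2.0), (1, 3.0), (0, 4.0)]
shaded = shade_tile_sorted(tile, kernels)
```

The sort and scatter are extra work, so this only pays off when the tile mixes models enough for branch divergence to dominate.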
  12. Decals & deferred shading <ul><li>Decals blend selectively against gbuffer </li></ul><ul><ul><li>Include: </li></ul></ul><ul><ul><ul><li>Diffuse albedo (gbuffer1.rgb) </li></ul></ul></ul><ul><ul><ul><li>Normal (gbuffer0.rgb) </li></ul></ul></ul><ul><ul><li>Want to include (but can’t in single pass): </li></ul></ul><ul><ul><ul><li>Specular albedo (gbuffer1.a) </li></ul></ul></ul><ul><ul><ul><li>Specular smoothness (gbuffer0.a) </li></ul></ul></ul><ul><ul><li>Exclude: </li></ul></ul><ul><ul><ul><li>Material id (can’t blend) </li></ul></ul></ul><ul><ul><ul><li>Object lighting (inherit from below surface) </li></ul></ul></ul><ul><li>Fixed function blending doesn’t work well </li></ul><ul><ul><li>Pixel shader can’t write out both alpha & blend factor! </li></ul></ul><ul><ul><li>Consoles don’t have blend mode per MRT </li></ul></ul><ul><ul><li>Linear blend doesn’t work for all components </li></ul></ul>See Destruction Masking in Frostbite 2 using Volume Distance Fields [Kihl10] for more details about the decal use case
  13. Need programmable blending <ul><li>Benefits: </li></ul><ul><ul><li>Write out gbuffer alpha channels independently of blend factor </li></ul></ul><ul><ul><li>Treat channels & targets however you see fit </li></ul></ul><ul><ul><li>Non-linear blending & renormalizing blends </li></ul></ul><ul><ul><li>Can do overlapping dependent blending </li></ul></ul><ul><ul><ul><li>Read current normal, add bumps relative to it, write out </li></ul></ul></ul><ul><li>What approach? </li></ul><ul><ul><li>LRB-style pixel shader framebuffer read/modify/write [Lalonde09] </li></ul></ul><ul><ul><ul><li>Ideal general solution for developers </li></ul></ul></ul><ul><ul><ul><li>How to hide synchronization latency? Implicit / explicit? </li></ul></ul></ul><ul><ul><li>Blend shader </li></ul></ul><ul><ul><ul><li>Yet another stage in a fixed pipeline </li></ul></ul></ul><ul><ul><ul><li>No R/M/W, not ideal </li></ul></ul></ul><ul><ul><li>More? </li></ul></ul>
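What a programmable blend could compute for one gbuffer0 texel (normal in rgb, specular smoothness in alpha, per the decal slide) is sketched below. The function names and the choice of a lerp followed by renormalization are assumptions for illustration, not a description of any shipping hardware feature.

```python
import math

def blend_normal(dst_normal, decal_normal, weight):
    """Lerp two unit normals, then renormalize the result."""
    mixed = tuple(d * (1.0 - weight) + s * weight
                  for d, s in zip(dst_normal, decal_normal))
    length = math.sqrt(sum(c * c for c in mixed))
    if length == 0.0:
        return dst_normal  # opposite normals cancelled; keep the destination
    return tuple(c / length for c in mixed)

def blend_gbuffer0(dst, decal, normal_weight, smoothness_weight):
    """dst, decal: (nx, ny, nz, smoothness). Each part uses its own factor."""
    normal = blend_normal(dst[:3], decal[:3], normal_weight)
    smoothness = dst[3] * (1.0 - smoothness_weight) + decal[3] * smoothness_weight
    return normal + (smoothness,)

result = blend_gbuffer0((0.0, 0.0, 1.0, 0.2), (1.0, 0.0, 0.0, 0.8),
                        normal_weight=0.5, smoothness_weight=1.0)
```

Fixed-function blending can do neither step: it has one blend factor for the whole target and cannot renormalize, which is why the slide asks for read/modify/write access.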
  14. The deferred shading + MSAA problem <ul><li>Huge storage & bandwidth requirements with deferred </li></ul><ul><ul><li>1920 x 1080 x 5 x 4 x 4 = 165 MB </li></ul></ul><ul><ul><li>Doesn’t scale! Adding 1 bit of precision = 2x more memory </li></ul></ul><ul><li>4x MSAA is not enough </li></ul><ul><ul><li>Esp. for thin geometry in the distance </li></ul></ul><ul><li>Prohibitive performance and bandwidth in general with deferred shading </li></ul><ul><ul><li>But don’t miss Andrew Lauritzen’s talk later in the course: Deferred Rendering for Current and Future Rendering Pipelines </li></ul></ul><ul><li>There are alternatives to MSAA... </li></ul>
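The storage figure can be reproduced with a few lines. The slide does not label its factors, so reading them as 5 render targets, 4 bytes per sample and 4 MSAA samples is an assumption, as is the function name.

```python
def gbuffer_bytes(width, height, render_targets, bytes_per_sample, msaa_samples):
    """Total gbuffer storage in bytes for a multisampled deferred setup."""
    return width * height * render_targets * bytes_per_sample * msaa_samples

total = gbuffer_bytes(1920, 1080, 5, 4, 4)
total_mb = total // 10**6  # whole decimal megabytes, matching the slide's 165 MB
```

Under this reading every factor multiplies through, so doubling the sample count or the per-sample size doubles the whole buffer.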
  15. MLAA – Morphological Antialiasing <ul><li>Post-effect antialiasing </li></ul><ul><li>Introduced in [Reshetov09] </li></ul><ul><li>Implementations: </li></ul><ul><ul><li>Intel CPU reference implementation [Reshetov09] </li></ul></ul><ul><ul><li>Sony PS3 SPU implementation [Perthuis10] </li></ul></ul><ul><ul><li>GPU compute? [Biri10] </li></ul></ul>
  16. MLAA workings (figure from [Reshetov09])
  17. MLAA comparisons (PS3): No AA vs MLAA
  18. MLAA takeaways <ul><li>Awesome AA for still pictures </li></ul><ul><li>Moving pictures good, but: </li></ul><ul><ul><li>No sub-pixel information = edges snap to pixels </li></ul></ul><ul><ul><li>Doesn’t solve aliasing on fine detail geometry </li></ul></ul><ul><ul><li>Overall still a very good benefit! </li></ul></ul><ul><li>Focus/exclude effect based on framebuffer alpha & thresholds </li></ul><ul><ul><li>Unique requirements per game/app </li></ul></ul><ul><ul><li>Not good to use on some UI, mark in alpha (or apply before) </li></ul></ul><ul><li>Variable post-effect, trade perf vs quality! </li></ul>
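The first MLAA step (detecting separation lines between neighboring pixels) can be sketched as a thresholded difference, shown below on a luminance image. This covers only the edge detection; pattern classification and coverage blending from [Reshetov09] are omitted, and the function name and threshold value are assumptions.

```python
def find_separation_edges(luma, threshold=0.1):
    """luma: 2D list of floats. Returns (horizontal, vertical) edge masks.

    horizontal[y][x] is True when pixel (x, y) differs from (x, y + 1);
    vertical[y][x] is True when pixel (x, y) differs from (x + 1, y).
    """
    h, w = len(luma), len(luma[0])
    horizontal = [[y + 1 < h and abs(luma[y][x] - luma[y + 1][x]) > threshold
                   for x in range(w)] for y in range(h)]
    vertical = [[x + 1 < w and abs(luma[y][x] - luma[y][x + 1]) > threshold
                 for x in range(w)] for y in range(h)]
    return horizontal, vertical

image = [[0.0, 0.0, 1.0],
         [0.0, 1.0, 1.0]]
horizontal, vertical = find_separation_edges(image)
```

Raising `threshold` finds fewer edges and costs less later in the filter, which is one place the perf vs quality trade can be made.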
  19. MLAA future (PC) <ul><li>GPU compute shader implementation </li></ul><ul><li>Combine with MSAA & sub-pixel samples </li></ul><ul><ul><li>Simple MSAA box filter downsampling is a big waste </li></ul></ul><ul><ul><li>Sort of similar to A Directionally Adaptive Edge Anti-Aliasing Filter [Yang09] </li></ul></ul><ul><ul><li>A must to reduce the edge snapping of pure MLAA </li></ul></ul><ul><ul><li>Not fully clear how it should work (sample distribution) </li></ul></ul>
  20. AMBIENT OCCLUSION
  21. Current dynamic AO <ul><li>Horizon-based Ambient Occlusion </li></ul><ul><ul><li>See [Bavoil08] for complete details </li></ul></ul><ul><li>Based on screen-space depth-buffer (SSAO) </li></ul><ul><ul><li>Very high quality sampling </li></ul></ul><ul><ul><li>But only screen-space info is a big limitation </li></ul></ul><ul><ul><li>Creates false occlusion artifacts </li></ul></ul><ul><li>Render in half-res for improved performance </li></ul><ul><ul><li>Bilateral upsampling + Gaussian blur </li></ul></ul><ul><ul><li>Can also do dual-resolution to reduce artifacts </li></ul></ul>
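The bilateral upsampling of a half-res AO buffer can be sketched as below. It is a simplified version that weights a 2x2 neighborhood of low-res samples by depth similarity only; the spatial (bilinear) weights and the Gaussian blur are left out, and the function name and `epsilon` are assumptions.

```python
def bilateral_upsample(ao_half, depth_half, depth_full, epsilon=1e-4):
    """Upsample half-res AO to full res, favoring samples at similar depth."""
    height, width = len(depth_full), len(depth_full[0])
    half_h, half_w = len(ao_half), len(ao_half[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = weight_sum = 0.0
            for dy in (0, 1):
                for dx in (0, 1):
                    sy = min(y // 2 + dy, half_h - 1)
                    sx = min(x // 2 + dx, half_w - 1)
                    # Low-res samples at a similar depth get a large weight.
                    w = 1.0 / (epsilon + abs(depth_full[y][x] - depth_half[sy][sx]))
                    total += w * ao_half[sy][sx]
                    weight_sum += w
            out[y][x] = total / weight_sum
    return out

# A near surface (depth 1) on the left, a far surface (depth 10) on the right.
depth_full = [[1.0, 1.0, 10.0, 10.0],
              [1.0, 1.0, 10.0, 10.0]]
depth_half = [[1.0, 10.0]]
ao_half = [[0.2, 1.0]]
ao_full = bilateral_upsample(ao_half, depth_half, depth_full)
```

The depth weighting keeps the near surface's occlusion from bleeding across the silhouette onto the far surface, which plain bilinear upsampling would allow.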
  22. Horizon-based Ambient Occlusion: false occlusion halo from thin geometry
  23. HBAO limitations <ul><li>False halo occlusion artifacts around small geometry </li></ul><ul><ul><li>Such as: fences & poles </li></ul></ul><ul><ul><li>Extra visible when moving the camera </li></ul></ul><ul><li>Very noisy sampling for detailed zbuffers </li></ul><ul><ul><li>Common with alpha-tested foliage </li></ul></ul><ul><ul><li>Difficult sampling problem </li></ul></ul>
  24. Analytical Ambient Occlusion
  25. HBAO vs AAO
  26. Analytical Ambient Occlusion <ul><li>Using Ambient Occlusion Volumes </li></ul><ul><ul><li>[McGuire10] </li></ul></ul><ul><li>Experimental implementation in Frostbite 2 </li></ul><ul><ul><li>With some good help from Morgan McGuire and Louis Bavoil </li></ul></ul><ul><li>Geometry-based technique </li></ul><ul><ul><li>Not screen-space! </li></ul></ul><ul><ul><li>Say what? </li></ul></ul>
  27. AOV idea <ul><li>Extrude prism for each triangle (GS) </li></ul><ul><ul><li>Extrusion distance is where occlusion=0 </li></ul></ul><ul><li>Rasterize primitives in prism </li></ul><ul><ul><li>With depth-test enabled, near depth clip disabled </li></ul></ul><ul><ul><li>Finds visible points inside volume </li></ul></ul><ul><ul><li>Need to handle case with camera inside volume </li></ul></ul><ul><li>Accumulate analytical occlusion contribution for visible pixels (PS) </li></ul><ul><ul><li>Uses pixel normal & depth values from gbuffer </li></ul></ul><ul><ul><li>Subtractive blend </li></ul></ul>
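The prism extrusion step can be sketched on the CPU as below. This is a deliberate simplification: it only pushes a copy of the triangle along its face normal by the occlusion distance, whereas the volumes in [McGuire10] also extend outward past the triangle edges. The helper names are hypothetical, and in the talk this work happens in a geometry shader.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def extrude_prism(v0, v1, v2, distance):
    """Return the 6 prism vertices: the triangle, then its extruded copy."""
    n = cross(sub(v1, v0), sub(v2, v0))
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("degenerate triangle")
    # Offset along the unit face normal; occlusion is taken to reach 0 here.
    offset = tuple(c / length * distance for c in n)
    base = [v0, v1, v2]
    top = [tuple(p + o for p, o in zip(v, offset)) for v in base]
    return base + top

prism = extrude_prism((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.5)
```

The rasterized prism then bounds the pixels that this triangle can occlude, so a larger `distance` directly means more covered pixels and more overdraw.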
  28. HBAO
  29. HBAO vs AOV
  30. AOV in practice <ul><li>Render geometry again in separate AO pass </li></ul><ul><ul><li>Uses depth & normal buffer from deferred rendering </li></ul></ul><ul><ul><li>Half-res or lower with bilateral upsampling </li></ul></ul><ul><ul><li>Culling should consider extrusion distance </li></ul></ul><ul><li>Separate paths for dynamic & rigid objects </li></ul><ul><ul><li>Can pre-compute rigid extruded AOV & reduce overdraw </li></ul></ul><ul><li>Doesn’t work with alpha-tested surfaces </li></ul><ul><ul><li>Simulate with per-surface or per-triangle approx. coverage factor </li></ul></ul>
  31. Overdarkening (extra occlusion)
  32. Varying overdraw with AO distance: 0.1 m, 0.2 m, 0.5 m
  33. AOV pros & cons <ul><li>Pros: </li></ul><ul><ul><li>Very high quality - close to raytracing ground truth </li></ul></ul><ul><ul><li>Noise free (when full res) </li></ul></ul><ul><ul><li>Perfectly stable with view changes </li></ul></ul><ul><ul><li>Supports arbitrary dynamic polygon soups </li></ul></ul><ul><li>Cons: </li></ul><ul><ul><li>Requires massive fillrate </li></ul></ul><ul><ul><li>Geometry cost </li></ul></ul><ul><ul><li>Overdarkening, may require content tweaks </li></ul></ul>
  34. AOV future optimizations <ul><li>Reduce the massive overdraw </li></ul><ul><ul><li>Cull / restrict prisms that only extend out to empty air? </li></ul></ul><ul><ul><li>Clamp screen-space prism size </li></ul></ul><ul><ul><ul><li>Not correct, but practical tradeoff. HBAO does this </li></ul></ul></ul><ul><li>More optimal prism geometry </li></ul><ul><ul><li>GS is limited to triangle strip output </li></ul></ul><ul><ul><li>Precompute using quads for rigid objects </li></ul></ul><ul><li>Geometry LOD / mix with higher-order geometry representations </li></ul><ul><ul><li>Also see AO volume texture & analytical capsule techniques [Hill10] </li></ul></ul>
  35. AOV takeaways <ul><li>Major improvement in visual quality compared to SSAO </li></ul><ul><li>Interesting use of geometry & rasterization pipelines </li></ul><ul><ul><li>Builds on existing HW-, SW- & content pipelines </li></ul></ul><ul><ul><li>Quite simple brute force drop-in (but not as simple as SSAO) </li></ul></ul><ul><li>Siggraph interactive framerates™ today, but lots of potential: </li></ul><ul><ul><li>Performance highly dependent on occlusion distance </li></ul></ul><ul><ul><li>Optimizations / less brute force? </li></ul></ul><ul><ul><li>Use for high-end / reference / precompute / beauty shots initially </li></ul></ul>
  36. Conclusions <ul><li>New graphics pipeline usages are opened up with improved HW performance </li></ul><ul><ul><li>Often not efficient to do with pure compute </li></ul></ul><ul><ul><li>Continue to give us more performance & bandwidth! </li></ul></ul><ul><li>We need to continue to break down some fixed graphics pipeline barriers </li></ul>
  37. Acknowledgments <ul><li>Morgan McGuire </li></ul><ul><li>Louis Bavoil </li></ul><ul><li>David Luebke </li></ul><ul><li>Andrew Lauritzen </li></ul><ul><li>Robert Kihl </li></ul><ul><li>Christina Coffin </li></ul><ul><li>SCEE </li></ul>
  38. Questions? <ul><li>email: [email_address] </li></ul><ul><li>blog: http://repi.se </li></ul><ul><li>twitter: @repi </li></ul><ul><li>For more DICE talks: </li></ul><ul><li>http://publications.dice.se </li></ul>
  39. References <ul><li>[Andersson09] Johan Andersson, “Parallel Graphics in Frostbite - Current & Future”, Beyond Programmable Shading Course – Siggraph 2009 http://s09.idav.ucdavis.edu/ </li></ul><ul><li>[Lalonde09] Paul Lalonde, “Innovating in a Software Graphics Pipeline”, Beyond Programmable Shading Course – Siggraph 2009 http://s09.idav.ucdavis.edu/ </li></ul><ul><li>[Reshetov09] Alexander Reshetov, “Morphological Antialiasing” </li></ul><ul><li>[Yang09] Jason C. Yang et al, High Performance Graphics 2009, “A Directionally Adaptive Edge Anti-Aliasing Filter” </li></ul><ul><li>[McGuire10] Morgan McGuire, High Performance Graphics 2010, “Ambient Occlusion Volumes” http://graphics.cs.williams.edu/papers/AOVHPG10/ </li></ul><ul><li>[Biri10] Venceslas Biri et al, Siggraph 2010, “Practical morphological antialiasing on the GPU” </li></ul><ul><li>[Bavoil08] Louis Bavoil & Miguel Sainz, Siggraph 2008, “Image-Space Horizon-Based Ambient Occlusion” http://developer.nvidia.com/object/siggraph-2008-HBAO.html </li></ul><ul><li>[Hill10] Stephen Hill, Game Developers Conference 2010, “Rendering with Conviction” </li></ul><ul><li>[Kihl10] Robert Kihl, Advances in Real-Time Rendering in 3D Graphics and Games, Siggraph 2010, “Destruction Masking in Frostbite 2 using Volume Distance Fields” http://publications.dice.se </li></ul><ul><li>[Sugerman09] Jeremy Sugerman et al, ACM Transactions on Graphics, January 2009, “GRAMPS: A Programming Model for Graphics Pipelines” http://graphics.stanford.edu/papers/gramps-tog/ </li></ul><ul><li>[Perthuis10] Cedric Perthuis, “MLAA in God of War 3” (PS3 registered developers only) </li></ul>

Editor's Notes

  • Register pressure not an issue for SPU
  • Foliage: Can’t switch render targets for each draw call to fewer gbuffers, flushes GPU pipeline
  • Detect separation lines, detect feature patterns and based on those blend in the ideal coverage
  • Standard MSAA box filter resolve is naive
  • Artists who are aware of the algorithm and this problem should be able to work around it in the content in most cases. Could potentially also have them select or paint which triangles & vertices should contribute, and/or the actual extrusion length on them
  • Deferred shading hits a major memory / BW wall with MSAA and OIT