OIT and Indirect Illumination Using DX11 Linked Lists

  • A node is composed of Element and Link
  • Uses a Pixel Shader 5.0 to store fragments into the linked lists, because we need triangles rasterized as normal through the rendering pipeline. Briefly describe UAVs. Uses atomic operations: only one such operation per pixel is used (although it has feedback).
  • The “Fragment & Link” buffer actually only uses write access. FragmentData is typically the fragment color, but can also include transparency, depth, a blend mode id, etc. It must be large enough to store all fragments: the allocation must take into account the maximum number of elements to store. E.g. an “average overdraw per pixel” figure can be used for allocation: width * height * average overdraw * sizeof(Element & Link); at 1280x1024 with an average overdraw of 8 and a 12-byte element this comes to about 120 MB. The fragment data struct should be kept minimal to reduce storage requirements (and improve performance in the process by performing fewer memory operations).
  • The StartOffsetBuffer is a 1D buffer despite being screen-sized; this is because atomic operations are not supported on 2D UAV textures. It is initialized to a magic value: by default the linked lists are thus empty, since they point to -1 (end of list).
  • The “Viewport” is used to illustrate the incoming triangle (as no RT is bound). Here the fragment data is just the pixel color for simplification. The Start Offset Buffer is initialized to the “magic value” (-1).
  • We exchange the offset in the StartOffsetBuffer with the current value of the counter (before incrementation), because the current value of the counter gives us the offset (address) of the fragment we are storing.
  • Assuming a rasterization order of left to right for the green triangle
  • Assuming a rasterization order of left to right for the yellow triangle
  • The current pixel count is the TOTAL pixel count, not the pixel count per pixel location. Not using Append Buffers because they require the DXGI_FORMAT_R32_UNKNOWN format. Times 4 because uStartOffsetAddress is in bytes, not UINTs (1 UINT = 4 bytes).
  • No “×4” to calculate the offset address here, because StartOffsetBufferSRV is declared with the uint type (Buffer<uint>), not as a byte-address buffer.
  • No point storing transparent fragments that are hidden by opaque geometry, i.e. that fail the depth test.
  • May also get away with packing color and depth into the same uint: this would make the element size a nice 64 bits. Deferred rendering engines may wish to store the same properties in the F&L structure as for opaque pixels, but watch out for the memory and performance impact of a large structure! If using different blend modes per pixel then you would need to store the blend mode as an id in the fragment data. Depth is stored as a uint – we’ll explain why later on.
  • The default DX11 pixel pipeline executes the pixel shader before the depth test. [earlydepthstencil] allows the depth test to be performed *before* the pixel shader. Normally the HW will try to do this for you automatically, but a PS writing to UAVs cannot benefit from this treatment – a bit like alpha testing. Allows performance savings and rendering correctness: fewer fragments to store = good!
  • Sparse memory accesses = slow! This is especially true as the original rendering order can mean that fragments belonging to the same screen coordinates are (very) far apart from each other.
  • Under-blending can be used instead. No need for the background input then. The actual HW blending mode is enabled. The blending mode to use with under-blending is ONE, SRCALPHA.
  • SortedPixels[].x stores color, SortedPixels[].y stores depth. Use the (nNumPixels >= MAX_SORTED_PIXELS) test to prevent overflow. The actual sort algorithm used can be anything (e.g. insertion sort, etc.).
  • This happens after sorting: vCurrentColor.xyz = vFragColor.w * vFragColor.xyz + (1.0 - vFragColor.w) * vCurrentColor.xyz;
  • The problem with OIT via PPLL and MSAA is that fragment sorting and blending must happen on a per-sample basis, which means a lot of storage and processing. Storing individual samples into linked lists requires a huge amount of memory: as most pixels are not edge pixels, they would get all their samples covered. Unfortunately, schemes mixing edge and non-edge pixels proved difficult to implement and did not generate satisfactory results. SV_COVERAGE is DX11 only.
  • Depth is now packed into 24 bits: this is why depth is stored as uint
  • Blue triangle covers one depth sample
  • Assuming a rasterization order of left to right for the yellow triangle
  • Traversal causes a lot of random access: elements belonging to the same pixel location can be far apart in the linked list!
  • Not as fast: probably because multiple passes involved.
  • Only viable if access to samples is no longer required (e.g. for HDR-correct tone mapping). Sorting is unaffected because stored samples share the same depth. Samples with the same coverage are blended back-to-front: this means blending happens on a per-sample basis as it normally would.
  • Code illustration for 2 samples. SortedPixels[] contains the pixel color in .x and depth + coverage in .y.
OIT and Indirect Illumination Using DX11 Linked Lists

    2. OIT and Indirect Illumination using DX11 Linked Lists
       Holger Gruen, AMD ISV Relations
       Nicolas Thibieroz, AMD ISV Relations

    3. Agenda
       Introduction
       Linked List Rendering
       Order Independent Transparency
       Indirect Illumination
       Q&A

    4. Introduction
       Direct3D 11 HW opens the door to many new rendering algorithms
       In particular, per-pixel linked lists allow for a number of new techniques: OIT, Indirect Shadows, Ray Tracing of dynamic scenes, REYES surface dicing, custom AA, Irregular Z-buffering, custom blending, Advanced Depth of Field, etc.
       This talk will walk you through a DX11 implementation of per-pixel linked lists and two effects that utilize this technique:
       OIT
       Indirect Illumination

    5. Per-Pixel Linked Lists with Direct3D 11
       (diagram: a chain of Element/Link nodes)
       Nicolas Thibieroz, European ISV Relations, AMD

    6. Why Linked Lists?
       Data structure useful for programming
       Very hard to implement efficiently with previous real-time graphics APIs
       DX11 allows efficient creation and parsing of linked lists
       Per-pixel linked lists: a collection of linked lists enumerating all pixels belonging to the same screen position
       (diagram: a chain of Element/Link nodes)

    7. Two-step process
       1) Linked List Creation: store incoming fragments into linked lists
       2) Rendering from Linked List: linked list traversal and processing of stored fragments

    8. Creating Per-Pixel Linked Lists

    9. PS5.0 and UAVs
       Uses a Pixel Shader 5.0 to store fragments into linked lists
       Not a Compute Shader 5.0!
       Uses atomic operations
       Two UAV buffers required:
       - “Fragment & Link” buffer
       - “Start Offset” buffer
       UAV = Unordered Access View
    10. Fragment & Link Buffer
        The “Fragment & Link” buffer contains data and link for all fragments to store
        Must be large enough to store all fragments
        Created with Counter support
        D3D11_BUFFER_UAV_FLAG_COUNTER flag in UAV view
        Declaration:
        struct FragmentAndLinkBuffer_STRUCT
        {
            FragmentData_STRUCT FragmentData; // Fragment data
            uint                uNext;        // Link to next fragment
        };
        RWStructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBuffer;

    11. Start Offset Buffer
        The “Start Offset” buffer contains the offset of the last fragment written at every pixel location
        Screen-sized: (width * height * sizeof(UINT32))
        Initialized to magic value (e.g. -1)
        Magic value indicates no more fragments are stored (i.e. end of the list)
        Declaration:
        RWByteAddressBuffer StartOffsetBuffer;

    12. Linked List Creation (1)
        No color Render Target bound!
        No rendering yet, just storing in the linked lists
        Depth buffer bound if needed (OIT will need it in a few slides)
        UAVs bound as input/output:
        StartOffsetBuffer (R/W)
        FragmentAndLinkBuffer (W)
    13. Linked List Creation (2a)
        (diagram: Start Offset Buffer with every entry at -1, empty Fragment and Link Buffer, Counter = 0; the Viewport shows the incoming triangle)

    14. Linked List Creation (2b)
        (diagram: the first fragment is appended to the Fragment and Link Buffer with uNext = -1, its Start Offset Buffer entry is updated, Counter = 1)

    15. Linked List Creation (2c)
        (diagram: further fragments of the first triangle are appended, Counter = 3)

    16. Linked List Creation (2d)
        (diagram: a second, overlapping triangle adds more fragments; where it overlaps, the new fragment's uNext points at the previous fragment stored for that pixel, Counter = 5)
    17. Linked List Creation - Code
        float PS_StoreFragments(PS_INPUT input) : SV_Target
        {
            // Calculate fragment data (color, depth, etc.)
            FragmentData_STRUCT FragmentData = ComputeFragment();

            // Retrieve current pixel count and increase counter
            uint uPixelCount = FLBuffer.IncrementCounter();

            // Exchange offsets in StartOffsetBuffer
            uint2 vPos = uint2(input.vPos.xy);
            uint uStartOffsetAddress = 4 * ( (SCREEN_WIDTH*vPos.y) + vPos.x );
            uint uOldStartOffset;
            StartOffsetBuffer.InterlockedExchange(uStartOffsetAddress, uPixelCount, uOldStartOffset);

            // Add new fragment entry in Fragment & Link Buffer
            FragmentAndLinkBuffer_STRUCT Element;
            Element.FragmentData = FragmentData;
            Element.uNext = uOldStartOffset;
            FLBuffer[uPixelCount] = Element;

            return 0.0f; // no color RT bound - return value is ignored
        }
    18. Traversing Per-Pixel Linked Lists

    19. Rendering Pixels (1)
        “Start Offset” Buffer and “Fragment & Link” Buffer now bound as SRV
        Buffer<uint> StartOffsetBufferSRV;
        StructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBufferSRV;
        Render a fullscreen quad
        For each pixel, parse the linked list and retrieve fragments for this screen position
        Process list of fragments as required
        Depends on algorithm, e.g. sorting, finding maximum, etc.
        SRV = Shader Resource View

    20. Rendering from Linked List
        (diagram: the Start Offset Buffer and Fragment and Link Buffer are traversed per pixel and the result is written to the Render Target)
    21. Rendering Pixels (2)
        float4 PS_RenderFragments(PS_INPUT input) : SV_Target
        {
            // Calculate UINT-aligned start offset buffer address
            uint2 vPos = uint2(input.vPos.xy);
            uint uStartOffsetAddress = SCREEN_WIDTH*vPos.y + vPos.x;

            // Fetch offset of first fragment for current pixel
            uint uOffset = StartOffsetBufferSRV.Load(uStartOffsetAddress);

            // Parse linked list for all fragments at this position
            float4 FinalColor = float4(0,0,0,0);
            while (uOffset != 0xFFFFFFFF) // 0xFFFFFFFF is magic value
            {
                // Retrieve pixel at current offset
                FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

                // Process pixel as required
                ProcessPixel(Element, FinalColor);

                // Retrieve next offset
                uOffset = Element.uNext;
            }
            return (FinalColor);
        }
    22. Order-Independent Transparency via Per-Pixel Linked Lists
        Nicolas Thibieroz, European ISV Relations, AMD

    23. Description
        Straight application of the linked list algorithm
        Stores transparent fragments into PPLL
        Rendering phase sorts pixels in a back-to-front order and blends them manually in a pixel shader
        Blend mode can be unique per pixel!
        Special case for MSAA support
    24. Linked List Structure
        Optimize performance by reducing the amount of data to write to/read from the UAV
        E.g. uint instead of float4 for color
        Example data structure for OIT:
        struct FragmentAndLinkBuffer_STRUCT
        {
            uint uPixelColor; // Packed pixel color
            uint uDepth;      // Pixel depth
            uint uNext;       // Address of next link
        };
        May also get away with packing color and depth into the same uint! (if same alpha)
        16 bits color (565) + 16 bits depth
        Performance/memory/quality trade-off
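        As a rough illustration of the packed-color idea above, here is a minimal HLSL sketch of RGBA8 packing/unpacking plus the 16-bit color / 16-bit depth variant mentioned on the slide. The function names are illustrative rather than taken from the sample, although UnpackFromUint matches the helper name used in the blending code later in the deck:

        uint PackColorIntoUint(float4 vColor) // RGBA8 packed into a single uint
        {
            uint4 u = uint4(saturate(vColor) * 255.0f + 0.5f);
            return (u.a << 24) | (u.b << 16) | (u.g << 8) | u.r;
        }

        float4 UnpackFromUint(uint uColor) // inverse of the above
        {
            return float4(uColor & 0xFF, (uColor >> 8) & 0xFF,
                          (uColor >> 16) & 0xFF, (uColor >> 24) & 0xFF) / 255.0f;
        }

        uint PackColor565Depth16(float3 vColor, float fDepth) // 565 color + 16-bit depth
        {
            uint3 c = uint3(saturate(vColor) * float3(31.0f, 63.0f, 31.0f) + 0.5f);
            uint  d = uint(saturate(fDepth) * 65535.0f);
            return (d << 16) | (c.r << 11) | (c.g << 5) | c.b;
        }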
    25. Visible Fragments Only!
        Use [earlydepthstencil] in front of the Linked List creation pixel shader
        This ensures only transparent fragments that pass the depth test are stored, i.e. visible fragments!
        Allows performance savings and rendering correctness!
        [earlydepthstencil]
        float PS_StoreFragments(PS_INPUT input) : SV_Target
        {
            ...
        }

    26. Sorting Pixels
        Sorting in place requires R/W access to the Linked List
        Sparse memory accesses = slow!
        Better way is to copy all pixels into an array of temp registers
        Then do the sorting (see the SortPixelsInPlace sketch after the next slide)
        Temp array declaration means a hard limit on the number of pixels per screen coordinate
        Required trade-off for performance

    27. Sorting and Blending
        Blend fragments back to front in the PS
        Blending algorithm up to the app
        Example: SRCALPHA-INVSRCALPHA
        Or unique per pixel! (stored in fragment data)
        Background passed as input texture
        Actual HW blending mode disabled
        (diagram: sorted fragments in the temp array blended over the background color into the render target)
    28. Storing Pixels for Sorting
        (...)
        static uint2 SortedPixels[MAX_SORTED_PIXELS];

        // Parse linked list for all pixels at this position
        // and store them into temp array for later sorting
        int nNumPixels = 0;
        while (uOffset != 0xFFFFFFFF)
        {
            // Retrieve pixel at current offset
            FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

            // Copy pixel data into temp array
            SortedPixels[nNumPixels++] =
                uint2(Element.uPixelColor, Element.uDepth);

            // Retrieve next offset
            [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ?
                                0xFFFFFFFF : Element.uNext;
        }

        // Sort pixels in-place
        SortPixelsInPlace(SortedPixels, nNumPixels);
        (...)
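        The deck leaves SortPixelsInPlace undefined; a minimal sketch using insertion sort (one of the options mentioned in the speaker notes) could look like this, assuming depth is packed so that a larger .y value means a more distant fragment, giving the back-to-front order expected by the blending loop on the next slide:

        void SortPixelsInPlace(inout uint2 SortedPixels[MAX_SORTED_PIXELS], int nNumPixels)
        {
            // Insertion sort by decreasing packed depth (.y):
            // index 0 ends up holding the furthermost fragment.
            for (int i = 1; i < nNumPixels; i++)
            {
                uint2 Pixel = SortedPixels[i];
                int j = i - 1;
                while (j >= 0 && SortedPixels[j].y < Pixel.y)
                {
                    SortedPixels[j + 1] = SortedPixels[j];
                    j--;
                }
                SortedPixels[j + 1] = Pixel;
            }
        }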
    29. Pixel Blending in PS
        (...)
        // Retrieve current color from background texture
        float4 vCurrentColor = BackgroundTexture.Load(int3(vPos.xy, 0));

        // Rendering pixels using SRCALPHA-INVSRCALPHA blending
        for (int k=0; k<nNumPixels; k++)
        {
            // Retrieve next unblended furthermost pixel
            float4 vPixColor = UnpackFromUint(SortedPixels[k].x);

            // Manual blending between current fragment and previous one
            vCurrentColor.xyz = lerp(vCurrentColor.xyz, vPixColor.xyz, vPixColor.w);
        }

        // Return manually-blended color
        return vCurrentColor;
    }
    30. OIT via Per-Pixel Linked Lists with MSAA Support

    31. Sample Coverage
        Storing individual samples into Linked Lists requires a huge amount of memory
        ... and performance will suffer!
        Solution is to store transparent pixels into PPLL as before
        But including sample coverage too!
        Requires as many bits as the MSAA mode
        Declare SV_COVERAGE in the PS structure:
        struct PS_INPUT
        {
            float3 vNormal   : NORMAL;
            float2 vTex      : TEXCOORD;
            float4 vPos      : SV_POSITION;
            uint   uCoverage : SV_COVERAGE;
        };

    32. Linked List Structure
        Almost unchanged from previously
        Depth is now packed into 24 bits
        8 bits are used to store coverage
        struct FragmentAndLinkBuffer_STRUCT
        {
            uint uPixelColor;       // Packed pixel color
            uint uDepthAndCoverage; // Depth + coverage
            uint uNext;             // Address of next link
        };
    33. Sample Coverage Example
        (diagram: pixel center and its MSAA sample positions; the third sample is covered by the triangle)
        uCoverage = 0x04 (0100 in binary)
        Element.uDepthAndCoverage = ( In.vPos.z*(2^24-1) << 8 ) | In.uCoverage;
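        In actual HLSL the packing above needs an explicit float-to-uint conversion; a sketch of the packing and of the UnpackCoverage helper used two slides later (function names are assumptions, not the sample's):

        uint PackDepthAndCoverage(float fDepth, uint uCoverage)
        {
            // 24-bit normalized depth in the high bits, 8-bit coverage mask in the low bits
            uint uDepth = uint(saturate(fDepth) * 16777215.0f); // 2^24 - 1
            return (uDepth << 8) | (uCoverage & 0xFF);
        }

        uint UnpackCoverage(uint uDepthAndCoverage)
        {
            return uDepthAndCoverage & 0xFF;
        }

        Because depth sits in the high bits, comparing the packed values still orders fragments primarily by depth, so the sorting code is unchanged.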
    34. Rendering Samples (1)
        Rendering phase needs to be able to write individual samples
        Thus PS is run at sample frequency
        Can be done by declaring SV_SAMPLEINDEX in the input structure
        Parse linked list and store pixels into temp array for later sorting
        Similar to non-MSAA case
        Difference is to only store sample if coverage matches sample index being rasterized
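        For reference, a sketch of what the per-sample PS input could look like (the struct name and other members are assumptions; SV_SAMPLEINDEX is the system value named on the slide):

        struct PS_INPUT_SAMPLE
        {
            float4 vPos         : SV_POSITION;
            uint   uSampleIndex : SV_SAMPLEINDEX; // reading this forces the PS to run per sample
        };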
    35. Rendering Samples (2)
        static uint2 SortedPixels[MAX_SORTED_PIXELS];

        // Parse linked list for all pixels at this position
        // and store them into temp array for later sorting
        int nNumPixels = 0;
        while (uOffset != 0xFFFFFFFF)
        {
            // Retrieve pixel at current offset
            FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

            // Retrieve pixel coverage from linked list element
            uint uCoverage = UnpackCoverage(Element.uDepthAndCoverage);
            if ( uCoverage & (1 << In.uSampleIndex) )
            {
                // Coverage matches current sample so copy pixel
                SortedPixels[nNumPixels++] =
                    uint2(Element.uPixelColor, Element.uDepthAndCoverage);
            }

            // Retrieve next offset
            [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ?
                                0xFFFFFFFF : Element.uNext;
        }
    36. DEMO
        OIT Linked List Demo

    37. Direct3D 11 Indirect Illumination
        Holger Gruen, European ISV Relations, AMD

    38. Indirect Illumination Introduction 1
        Real-time indirect illumination is an active research topic
        Numerous approaches exist:
        Reflective Shadow Maps (RSM) [Dachsbacher/Stamminger 2005]
        Splatting Indirect Illumination [Dachsbacher/Stamminger 2006]
        Multi-Res Splatting of Illumination [Wyman 2009]
        Light Propagation Volumes [Kaplanyan 2009]
        Approximating Dynamic Global Illumination in Image Space [Ritschel 2009]
        Only a few support indirect shadows:
        Imperfect Shadow Maps [Ritschel/Grosch 2008]
        Micro-Rendering for Scalable, Parallel Final Gathering (SSDO) [Ritschel 2010]
        Cascaded Light Propagation Volumes for Real-Time Indirect Illumination [Kaplanyan/Dachsbacher 2010]
        Most approaches somehow extend to multi-bounce lighting

    39. Indirect Illumination Introduction 2
        This section will cover:
        An efficient and simple DX9-compliant RSM-based implementation for smooth one-bounce indirect illumination
        Indirect shadows are ignored here
        A Direct3D 11 technique that traces rays to compute indirect shadows
        Part of this technique could generally be used for ray-tracing dynamic scenes

    40. Indirect Illumination w/o Indirect Shadows
        Draw scene g-buffer
        Draw Reflective Shadow Map (RSM)
        The RSM shows the part of the scene that receives direct light from the light source
        Draw Indirect Light buffer at ½ res
        RSM texels are used as light sources on g-buffer pixels for indirect lighting
        Upsample Indirect Light (IL)
        Draw final image adding IL

    41. Step 1
        G-Buffer needs to allow reconstruction of:
        World/Camera space position
        World/Camera space normal
        Color/Albedo
        DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows
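        A sketch of a g-buffer layout satisfying these requirements; the resource names, input layout and packing are illustrative assumptions rather than the sample's:

        Texture2D    g_DiffuseTexture; // hypothetical scene material texture
        SamplerState g_Sampler;

        struct PS_SCENE_INPUT
        {
            float4 vPos      : SV_POSITION;
            float3 vWorldPos : TEXCOORD0;
            float3 vNormal   : NORMAL;
            float2 vTex      : TEXCOORD1;
        };

        struct PS_GBUFFER_OUTPUT
        {
            float4 vPositionWS : SV_Target0; // world space position (R32G32B32A32_FLOAT for precise ray queries)
            float4 vNormalWS   : SV_Target1; // world space normal
            float4 vAlbedo     : SV_Target2; // color / albedo
        };

        PS_GBUFFER_OUTPUT PS_FillGBuffer(PS_SCENE_INPUT input)
        {
            PS_GBUFFER_OUTPUT output;
            output.vPositionWS = float4(input.vWorldPos, 1.0f);
            output.vNormalWS   = float4(normalize(input.vNormal), 0.0f);
            output.vAlbedo     = g_DiffuseTexture.Sample(g_Sampler, input.vTex);
            return output;
        }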
    42. Step 2
        RSM needs to allow reconstruction of:
        World/Camera space position
        World/Camera space normal
        Color/Albedo
        Only draw emitters of indirect light
        DXGI_FORMAT_R32G32B32A32_FLOAT position may be required for precise ray queries for indirect shadows

    43. Step 3
        Render a ½ res IL as a deferred op
        Transform g-buffer pixel to RSM space: -> light space -> project to RSM texel space
        Use a kernel of RSM texels as light sources
        RSM texels are also called Virtual Point Lights (VPL)
        Kernel size depends on:
        Desired speed
        Desired look of the effect
        RSM resolution
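        A sketch of the g-buffer-to-RSM-texel-space transform described above, assuming a light view-projection matrix and an RSM of fRSMSize x fRSMSize texels (both parameters are assumptions, as is the row-vector mul convention):

        // Project a world space g-buffer position into RSM texel space
        float2 WorldToRSMTexel(float3 vWorldPos, float4x4 mLightViewProj, float fRSMSize)
        {
            float4 vLightPos = mul(float4(vWorldPos, 1.0f), mLightViewProj);
            vLightPos.xy /= vLightPos.w;                            // light space NDC
            float2 vUV = vLightPos.xy * float2(0.5f, -0.5f) + 0.5f; // [0,1] with flipped y
            return vUV * fRSMSize;                                  // RSM texel coordinates
        }

        The stx/sty sub-texel positions used on the following slides would then simply be the fractional part of this result.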
    44. Computing IL at a G-buf Pixel 1
        Sum up contribution of all VPLs in the kernel
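        A sketch of that summation as a pixel shader loop; the RSM resource names, the KERNEL_RADIUS constant and the helper itself are assumptions, and the per-VPL weight is the form-factor-like term discussed on the next slide:

        Texture2D g_RSMPosition; // hypothetical RSM render targets
        Texture2D g_RSMNormal;
        Texture2D g_RSMFlux;
        #define KERNEL_RADIUS 7  // 15x15 VPL kernel, as in the demo

        float3 SumVPLKernel(float3 vPosWS, float3 vNormalWS, int2 iKernelCenter)
        {
            float3 vIndirect = 0;
            for (int y = -KERNEL_RADIUS; y <= KERNEL_RADIUS; y++)
            {
                for (int x = -KERNEL_RADIUS; x <= KERNEL_RADIUS; x++)
                {
                    int3 iTexel = int3(iKernelCenter + int2(x, y), 0);
                    float3 vVPLPos    = g_RSMPosition.Load(iTexel).xyz;
                    float3 vVPLNormal = g_RSMNormal.Load(iTexel).xyz;
                    float3 vVPLFlux   = g_RSMFlux.Load(iTexel).xyz;

                    float3 vDir   = vVPLPos - vPosWS;
                    float  fDist2 = max(dot(vDir, vDir), 0.0001f);
                    vDir *= rsqrt(fDist2);

                    // Form-factor-like weight: both cosines clamped to zero
                    float fGeom = saturate(dot(vNormalWS, vDir)) *
                                  saturate(dot(vVPLNormal, -vDir)) / fDist2;
                    vIndirect += vVPLFlux * fGeom;
                }
            }
            return vIndirect;
        }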
    45. Computing IL at a G-buf Pixel 2
        (diagram: a VPL at an RSM texel illuminating a g-buffer pixel, with the contribution term shown as an equation)
        This term is very similar to terms used in radiosity form factor computations
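        The term itself is only shown as an image on the slide; a typical form of this one-bounce VPL contribution, in the notation of the RSM literature cited earlier (the exact expression used by the sample may differ), is:

        E_{VPL}(x, n) \approx \Phi_{VPL}\,
            \frac{\max(0,\; n \cdot \omega)\,\max(0,\; n_{VPL} \cdot (-\omega))}
                 {\lVert x_{VPL} - x \rVert^{2}},
        \qquad \omega = \frac{x_{VPL} - x}{\lVert x_{VPL} - x \rVert}

        where x, n are the g-buffer pixel's position and normal, x_VPL, n_VPL are the VPL's position and normal, and Phi_VPL is the VPL's flux (color) taken from the RSM.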
    46. Computing IL at a G-buf Pixel 3
        A naive solution for smooth IL needs to consider four VPL kernels with centers at t0, t1, t2 and t3.
        stx : sub RSM texel x position [0.0, 1.0[
        sty : sub RSM texel y position [0.0, 1.0[

    47. Computing IL at a g-buf pixel 4
        IndirectLight =
            (1.0f-sty) * ((1.0f-stx) * [kernel] + stx * [kernel]) +
            (0.0f+sty) * ((1.0f-stx) * [kernel] + stx * [kernel])
        (the four bracketed operands are the sums over the VPL kernels centered at t0, t1, t2 and t3, shown as images on the slide)
        Evaluation of 4 big VPL kernels is slow
        (diagram: the four overlapping VPL kernels at t0, t1, t2, t3)
        stx : sub texel x position [0.0, 1.0[
        sty : sub texel y position [0.0, 1.0[

    48. Computing IL at a g-buf pixel 5
        SmoothIndirectLight =
            (1.0f-sty)*(((1.0f-stx)*(B0+B3)+stx*(B2+B5))+B1)+
            (0.0f+sty)*(((1.0f-stx)*(B6+B3)+stx*(B8+B5))+B7)+B4
        stx : sub RSM texel x position of g-buf pix [0.0, 1.0[
        sty : sub RSM texel y position of g-buf pix [0.0, 1.0[
        This trick is probably known to some of you already. See backup for a detailed explanation!
    49. Indirect Light Buffer

    50. Step 4
        Indirect Light buffer is ½ res
        Perform a bilateral upsampling step
        See Peter-Pike Sloan, Naga K. Govindaraju, Derek Nowrouzezahrai, John Snyder. "Image-Based Proxy Accumulation for Real-Time Soft Global Illumination". Pacific Graphics 2007
        Result is a full resolution IL
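        A sketch of a joint bilateral upsampling weight of the general kind described in that reference, simplified to a depth-similarity weight over the four nearest low-res taps; resource names and the weight shape are assumptions, not the sample's code:

        Texture2D g_LowResIL;     // 1/2 resolution indirect light
        Texture2D g_LowResDepth;  // 1/2 resolution depth
        Texture2D g_FullResDepth; // full resolution depth

        float3 BilateralUpsampleIL(int2 iFullResPos)
        {
            int2  iLowResPos   = iFullResPos / 2;
            float fCenterDepth = g_FullResDepth.Load(int3(iFullResPos, 0)).r;

            float3 vSum       = 0;
            float  fWeightSum = 0;
            for (int y = 0; y <= 1; y++)
            {
                for (int x = 0; x <= 1; x++)
                {
                    int3  iTap      = int3(iLowResPos + int2(x, y), 0);
                    float fTapDepth = g_LowResDepth.Load(iTap).r;
                    // Taps from a different surface (large depth difference) get a tiny weight
                    float fWeight   = 1.0f / (abs(fCenterDepth - fTapDepth) + 1e-4f);
                    vSum       += g_LowResIL.Load(iTap).rgb * fWeight;
                    fWeightSum += fWeight;
                }
            }
            return vSum / fWeightSum;
        }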
    51. Step 5
        Combine:
        Direct Illumination
        Indirect Illumination
        Shadows (not mentioned)

    52. Scene without IL

    53. Combined Image
        DEMO
        ~280 FPS on a HD5970 @ 1280x1024 for a 15x15 VPL kernel
        Deferred IL pass + bilateral upsampling costs ~2.5 ms

    54. How to add Indirect Shadows
        Use a CS and the linked lists technique
        Insert blocker geometry of IL into a 3D grid of lists – let's use the triangles of the blocker for now (see backup for an alternative data structure)
        Look at a kernel of VPLs again
        Only accumulate light of VPLs that are occluded by blocker tris
        Trace rays through the 3D grid to detect occluded VPLs
        Render low res buffer only
        Subtract blocked indirect light from the IL buffer
        A blurred version of the low res blocked IL is used
        Blur is a combined bilateral blurring/upsampling

    55. Insert tris into 3D grid of triangle lists
        Rasterize dynamic blockers to the 3D grid using a CS and atomics
        (diagram: scene)

    56. Insert tris into 3D grid of triangle lists
        Rasterize dynamic blockers to the 3D grid using a CS and atomics
        (diagram: world space 3D grid of triangle lists around the IL blockers laid out in a UAV; eol = end of list (0xffffffff))
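        A compute shader sketch of inserting one blocker triangle into a cell of such a grid of lists, using the same head-pointer exchange idiom as the per-pixel linked lists; the buffer names and node layout are assumptions, and a full version would insert the triangle into every cell it overlaps:

        struct TriangleNode
        {
            uint uTriangleIndex; // index into the blocker triangle buffer
            uint uNext;          // next node in this cell's list, 0xFFFFFFFF = end of list
        };

        RWStructuredBuffer<TriangleNode> g_TriangleNodes;    // created with a hidden UAV counter
        RWByteAddressBuffer              g_GridHeadPointers; // one uint per grid cell, cleared to 0xFFFFFFFF

        // Called from a CS thread for each grid cell a blocker triangle overlaps
        void InsertTriangleIntoCell(uint uCellIndex, uint uTriangleIndex)
        {
            // Allocate a node and link it at the head of the cell's list
            uint uNodeIndex = g_TriangleNodes.IncrementCounter();
            uint uOldHead;
            g_GridHeadPointers.InterlockedExchange(4 * uCellIndex, uNodeIndex, uOldHead);

            TriangleNode Node;
            Node.uTriangleIndex = uTriangleIndex;
            Node.uNext          = uOldHead;
            g_TriangleNodes[uNodeIndex] = Node;
        }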
    57. 3D Grid Demo

    58. Indirect Light Buffer
        (screenshot with callouts: blocker of green light, emitter of green light, expected indirect shadow)

    59. Blocked Indirect Light

    60. Indirect Light Buffer

    61. Subtracting Blocked IL

    62. Final Image
        DEMO
        ~70 FPS on a HD5970 @ 1280x1024
        ~300 million rays per second for Indirect Shadows
        Ray casting costs ~9 ms

    63. Future directions
        Speed up IL rendering:
        Render IL at even lower res
        Look into multi-res RSMs
        Speed up ray-tracing:
        Per pixel array of lists for depth buckets (see backup)
        Other data structures
        Raytrace other primitive types: splats, fuzzy ellipsoids etc.
        Proxy geometry or bounding volumes of blockers
        Get rid of Interlocked*() ops:
        Just mark grid cells as occupied
        Lower quality but could work on earlier hardware

    64. Q&A
        Holger Gruen holger.gruen@AMD.com
        Nicolas Thibieroz nicolas.thibieroz@AMD.com
        Credits for the basic idea of how to implement PPLL under Direct3D 11 go to Jakub Klarowicz (Techland), Holger Gruen and Nicolas Thibieroz (AMD)
    65. Backup Slides IL

    66. Computing IL at a g-buf pixel 1
        Want to support low res RSMs
        Want to create smooth indirect light
        Goal is bi-linear filtering of four VPL kernels
        Otherwise results don't look smooth

    67. Computing IL at a g-buf pixel 2
        stx : sub texel x position [0.0, 1.0[
        sty : sub texel y position [0.0, 1.0[

    68. Computing IL at a g-buf pixel 3
        For smooth IL one needs to consider four VPL kernels with centers at t0, t1, t2 and t3.
        stx : sub texel x position [0.0, 1.0[
        sty : sub texel y position [0.0, 1.0[

    69. Computing IL at a g-buf pixel 4
        Center at t0
        VPL kernel at t0
        stx : sub texel x position [0.0, 1.0[
        sty : sub texel y position [0.0, 1.0[

    70. Computing IL at a g-buf pixel 4
        Center at t1
        VPL kernels at t0 and t1
        stx : sub texel x position [0.0, 1.0[
        sty : sub texel y position [0.0, 1.0[

    71. Computing IL at a g-buf pixel 5
        Center at t2
        VPL kernels at t0, t1 and t2
        stx : sub texel x position [0.0, 1.0[
        sty : sub texel y position [0.0, 1.0[

    72. Computing IL at a g-buf pixel 6
        Center at t3
        VPL kernels at t0, t1, t2 and t3
        stx : sub texel x position [0.0, 1.0[
        sty : sub texel y position [0.0, 1.0[
    73. Computing IL at a g-buf pixel 7
        IndirectLight =
            (1.0f-sty) * ((1.0f-stx) * [kernel] + stx * [kernel]) +
            (0.0f+sty) * ((1.0f-stx) * [kernel] + stx * [kernel])
        (the four bracketed operands are the sums over the VPL kernels centered at t0, t1, t2 and t3, shown as images on the slide)
        Evaluation of 4 big VPL kernels is slow
        stx : sub texel x position [0.0, 1.0[
        sty : sub texel y position [0.0, 1.0[

    74. Computing IL at a g-buf pixel 8
        (diagram: the four VPL kernels at t0, t1, t2 and t3)
        stx : sub texel x position [0.0, 1.0[
        sty : sub texel y position [0.0, 1.0[

    75. Computing IL at a g-buf pixel 9
        (diagram: the kernel area subdivided into regions B0 ... B8)
        stx : sub texel x position [0.0, 1.0[
        sty : sub texel y position [0.0, 1.0[

    76. Computing IL at a g-buf pixel 9
        IndirectLight =
            (1.0f-sty)*(((1.0f-stx)*(B0+B3)+stx*(B2+B5))+B1)+
            (0.0f+sty)*(((1.0f-stx)*(B6+B3)+stx*(B8+B5))+B7)+B4
        Evaluation of 7 small and 1 bigger VPL kernels is fast
        stx : sub texel x position [0.0, 1.0[
        sty : sub texel y position [0.0, 1.0[
    77. Insert Tris into 2D Map of Lists of Tris
        Rasterize blockers of IL from view of light
        (diagram: light, scene and a 2D buffer)

    78. Insert Tris into 2D Map of Lists of Tris
        Rasterize blockers of IL from view of light using a GS and conservative rasterization
        (diagram: 2D buffer of lists of triangles written to by a scattering PS; eol = end of list (0xffffffff))

    79. Backup Slides PPLL
    80. Linked List Creation (2)
        For every pixel:
        Calculate pixel data (color, depth etc.)
        Retrieve current pixel count from Fragment & Link UAV and increment counter
        uint uPixelCount = FragmentAndLinkBuffer.IncrementCounter();
        Swap offsets in Start Offset UAV
        uint uOldStartOffset;
        StartOffsetBuffer.InterlockedExchange(
            PixelScreenSpacePositionLinearAddress,
            uPixelCount, uOldStartOffset);
        Add new entry to Fragment & Link UAV
        FragmentAndLinkBuffer_STRUCT Element;
        Element.FragmentData = FragmentData;
        Element.uNext = uOldStartOffset;
        FragmentAndLinkBuffer[uPixelCount] = Element;
    81. Linked List Creation (3d)
        (diagram: Start Offset Buffer and Fragment and Link Buffer after several fragments have been stored, with each list head pointing at the most recent fragment for that pixel)

    82. PPLL Pros and Cons
        Pros:
        Allows storing of a variable number of elements per pixel
        Only one atomic operation per pixel
        Using built-in HW UAV counter
        Good performance
        Cons:
        Traversal causes a lot of random access
        Special case for storing MSAA samples

    83. Alternative Technique: Per Pixel Array
        There is a two-pass method that allows the construction of per-pixel arrays
        First described in 'Sample Based Visibility for Soft Shadows using Alias-free Shadow Maps' (E. Sintorn, E. Eisemann, U. Assarsson, EG 2008)
        Uses a prefix sum to allocate multiple entries into a buffer
        Elements belonging to the same location will therefore be contiguous in memory
        Faster traversal (less random access)
        Not as fast as PPLL for scenes with 'high' depth complexity in our testing

    84. Rendering Samples – Alternate Method
        Writes pixels straight into a non-MSAA RT
        Only viable if access to samples is no longer required
        Samples must be resolved into pixels before writing
        This affects the manual blending process:
        Samples with the same coverage are blended back-to-front
        ...then averaged (resolved) with other samples before being written out to the RT
        This method can prove slightly faster than per-sample rendering
    85. Rendering Fragments – Alternate Method (2x MSAA)
        // Retrieve current sample colors from background texture
        float4 vCurrentColor0 = BackgroundTexture.Load(int3(vPos.xy, 0));
        float4 vCurrentColor1 = BackgroundTexture.Load(int3(vPos.xy, 1));

        // Rendering pixels using SRCALPHA-INVSRCALPHA blending
        for (int k=0; k<nNumPixels; k++)
        {
            // Retrieve next unblended furthermost pixel
            float4 vPixColor = UnpackFromUint(SortedPixels[k].x);

            // Retrieve sample coverage
            uint uCoverage = UnpackCoverage(SortedPixels[k].y);

            // Manual blending between current samples and previous ones
            if (uCoverage & (1<<0)) vCurrentColor0.xyz = lerp(vCurrentColor0.xyz, vPixColor.xyz, vPixColor.w);
            if (uCoverage & (1<<1)) vCurrentColor1.xyz = lerp(vCurrentColor1.xyz, vPixColor.xyz, vPixColor.w);
        }

        // Return resolved (averaged) and manually-blended color
        return (vCurrentColor0 + vCurrentColor1) / 2.0;
    }
