The document discusses using DirectX 11 linked lists and per-pixel linked lists to implement order-independent transparency (OIT) and indirect illumination. It describes how linked lists can be used to store transparent fragments and then sort and blend them for OIT. It also explains how to trace rays through linked lists to compute indirect shadows for indirect illumination. The techniques open up new rendering algorithms for real-time graphics.
2. OIT and Indirect Illumination using DX11 Linked Lists. Holger Gruen, AMD ISV Relations; Nicolas Thibieroz, AMD ISV Relations
4. Introduction. Direct3D 11 HW opens the door to many new rendering algorithms. In particular, per-pixel linked lists allow for a number of new techniques: OIT, indirect shadows, ray tracing of dynamic scenes, REYES surface dicing, custom AA, irregular Z-buffering, custom blending, advanced depth of field, etc. This talk will walk you through a DX11 implementation of per-pixel linked lists and two effects that utilize this technique: OIT and indirect illumination.
5. Per-Pixel Linked Lists with Direct3D 11. Nicolas Thibieroz, European ISV Relations, AMD. (diagram: chain of Element/Link nodes)
6. Why Linked Lists? A data structure useful for programming, but very hard to implement efficiently with previous real-time graphics APIs. DX11 allows efficient creation and parsing of linked lists. Per-pixel linked lists: a collection of linked lists enumerating all fragments belonging to the same screen position. (diagram: chain of Element/Link nodes)
7. Two-step process: 1) Linked List Creation: store incoming fragments into linked lists. 2) Rendering from Linked List: linked list traversal and processing of stored fragments.
9. PS5.0 and UAVs. Uses a Pixel Shader 5.0 to store fragments into linked lists (not a Compute Shader 5.0!). Uses atomic operations. Two UAV buffers required: a “Fragment & Link” buffer and a “Start Offset” buffer. (UAV = Unordered Access View)
10. Fragment & Link Buffer. The “Fragment & Link” buffer contains data and link for all fragments to store. Must be large enough to store all fragments. Created with counter support (D3D11_BUFFER_UAV_FLAG_COUNTER flag in the UAV view). Declaration:

struct FragmentAndLinkBuffer_STRUCT
{
    FragmentData_STRUCT FragmentData; // Fragment data
    uint uNext;                       // Link to next fragment
};
RWStructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBuffer;
11. Start Offset Buffer. The “Start Offset” buffer contains the offset of the last fragment written at every pixel location. Screen-sized: width * height * sizeof(UINT32). Initialized to a magic value (e.g. -1); the magic value indicates no more fragments are stored (i.e. end of the list). Declaration:

RWByteAddressBuffer StartOffsetBuffer;
12. Linked List Creation (1). No color Render Target bound! No rendering yet, just storing in the linked lists. Depth buffer bound if needed (OIT will need it in a few slides). UAVs bound as input/output: StartOffsetBuffer (R/W), FragmentAndLinkBuffer (W).
13. Linked List Creation (2a). (diagram: Start Offset Buffer initialized to -1, empty Fragment and Link Buffer, Counter = 0)
14. Linked List Creation (2b). (diagram: first fragment stored; its Start Offset Buffer entry now points to offset 0, Counter = 1)
15. Linked List Creation (2c). (diagram: further fragments stored; the Counter advances as each new fragment's offset is exchanged into the Start Offset Buffer)
16. Linked List Creation (2d). (diagram: an overlapping fragment links to the previous fragment at the same pixel via its uNext field)
17. Linked List Creation - Code

float PS_StoreFragments(PS_INPUT input) : SV_Target
{
    // Calculate fragment data (color, depth, etc.)
    FragmentData_STRUCT FragmentData = ComputeFragment();

    // Retrieve current pixel count and increase counter
    uint uPixelCount = FLBuffer.IncrementCounter();

    // Exchange offsets in StartOffsetBuffer
    uint2 vPos = uint2(input.vPos.xy);
    uint uStartOffsetAddress = 4 * ((SCREEN_WIDTH * vPos.y) + vPos.x);
    uint uOldStartOffset;
    StartOffsetBuffer.InterlockedExchange(uStartOffsetAddress, uPixelCount, uOldStartOffset);

    // Add new fragment entry in Fragment & Link Buffer
    FragmentAndLinkBuffer_STRUCT Element;
    Element.FragmentData = FragmentData;
    Element.uNext = uOldStartOffset;
    FLBuffer[uPixelCount] = Element;
}
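The store-fragment logic above can be mirrored on the CPU to make the data flow visible. The following Python is an illustrative simulation only, not the shader: a list length stands in for IncrementCounter(), a plain assignment for InterlockedExchange(), and fragment data is a hypothetical string.

```python
# CPU-side sketch of per-pixel linked list creation (illustrative, not the HLSL).
# flbuffer holds (fragment_data, next_offset) nodes; start_offset maps each
# pixel to the most recently stored fragment, -1 meaning "empty list".
WIDTH, HEIGHT = 4, 4
flbuffer = []                            # "Fragment & Link" buffer
start_offset = [-1] * (WIDTH * HEIGHT)   # "Start Offset" buffer, magic value -1

def store_fragment(x, y, fragment_data):
    pixel_count = len(flbuffer)          # IncrementCounter(): old value = new slot
    addr = WIDTH * y + x
    old_start = start_offset[addr]       # InterlockedExchange of the list head
    start_offset[addr] = pixel_count
    flbuffer.append((fragment_data, old_start))  # uNext links to previous head

def fragments_at(x, y):
    # Traverse the list for one pixel, newest fragment first
    result, offset = [], start_offset[WIDTH * y + x]
    while offset != -1:
        data, offset = flbuffer[offset]
        result.append(data)
    return result

store_fragment(1, 1, "green")
store_fragment(1, 1, "yellow")
print(fragments_at(1, 1))   # → ['yellow', 'green']
```

Note how the head pointer always addresses the newest fragment, which is why traversal yields fragments in reverse insertion order.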
19. Rendering Pixels (1). “Start Offset” Buffer and “Fragment & Link” Buffer now bound as SRVs:

Buffer<uint> StartOffsetBufferSRV;
StructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBufferSRV;

Render a fullscreen quad. For each pixel, parse the linked list and retrieve the fragments for this screen position. Process the list of fragments as required; this depends on the algorithm (e.g. sorting, finding the maximum, etc.). (SRV = Shader Resource View)
20. Rendering from Linked List. (diagram: for each render-target pixel, the Start Offset Buffer entry gives the head of that pixel's fragment list in the Fragment and Link Buffer)
21. Rendering Pixels (2)

float4 PS_RenderFragments(PS_INPUT input) : SV_Target
{
    // Calculate UINT-aligned start offset buffer address
    uint2 vPos = uint2(input.vPos.xy);
    uint uStartOffsetAddress = SCREEN_WIDTH * vPos.y + vPos.x;

    // Fetch offset of first fragment for current pixel
    uint uOffset = StartOffsetBufferSRV.Load(uStartOffsetAddress);

    // Parse linked list for all fragments at this position
    float4 FinalColor = float4(0, 0, 0, 0);
    while (uOffset != 0xFFFFFFFF) // 0xFFFFFFFF is the magic value
    {
        // Retrieve fragment at current offset
        Element = FLBufferSRV[uOffset];

        // Process fragment as required
        ProcessPixel(Element, FinalColor);

        // Retrieve next offset
        uOffset = Element.uNext;
    }
    return FinalColor;
}
23. Description. Straight application of the linked list algorithm. Stores transparent fragments into the PPLL. The rendering phase sorts fragments in back-to-front order and blends them manually in a pixel shader. The blend mode can be unique per pixel! Special case for MSAA support.
24. Linked List Structure. Optimize performance by reducing the amount of data to write to/read from the UAV, e.g. uint instead of float4 for color. Example data structure for OIT:

struct FragmentAndLinkBuffer_STRUCT
{
    uint uPixelColor; // Packed pixel color
    uint uDepth;      // Pixel depth
    uint uNext;       // Address of next link
};

May also get away with packing color and depth into the same uint (if same alpha): 16 bits color (565) + 16 bits depth. Performance/memory/quality trade-off.
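The packed variant mentioned above (16-bit 565 color plus 16-bit depth in one uint) can be sketched like this; the exact bit layout (color in the high half, depth in the low half) is an assumption for illustration.

```python
# Sketch of packing a 5-6-5 color and a 16-bit normalized depth into one
# 32-bit uint, as suggested on the slide. Bit layout is an assumption.
def pack_color_depth(r, g, b, depth):
    """r, b in [0, 31], g in [0, 63], depth in [0.0, 1.0)."""
    color565 = (r << 11) | (g << 5) | b
    depth16 = int(depth * 65535.0) & 0xFFFF
    return (color565 << 16) | depth16

def unpack_color_depth(packed):
    color565 = packed >> 16
    r = (color565 >> 11) & 0x1F
    g = (color565 >> 5) & 0x3F
    b = color565 & 0x1F
    depth = (packed & 0xFFFF) / 65535.0
    return r, g, b, depth

packed = pack_color_depth(31, 0, 15, 0.5)
print(unpack_color_depth(packed))
```

This makes each list element a compact 64 bits (packed color+depth plus uNext), at the cost of color banding and reduced depth precision.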
25. Visible Fragments Only! Use [earlydepthstencil] in front of the linked list creation pixel shader. This ensures only transparent fragments that pass the depth test are stored, i.e. visible fragments! Allows performance savings and rendering correctness!

[earlydepthstencil]
float PS_StoreFragments(PS_INPUT input) : SV_Target
{
    ...
}
26. Sorting Pixels. Sorting in place requires R/W access to the linked list; sparse memory accesses = slow! A better way is to copy all pixels into an array of temp registers, then do the sorting. The temp array declaration means a hard limit on the number of pixels per screen coordinate: a required trade-off for performance.
27. Sorting and Blending. Blend fragments back to front in the PS. (diagram: sorted fragment depths in the temp array blended over the background color into the render target) The blending algorithm is up to the app, for example SRCALPHA-INVSRCALPHA, or unique per pixel (stored in fragment data)! The background is passed as an input texture; actual HW blending mode disabled.
28. Storing Pixels for Sorting

(...)
static uint2 SortedPixels[MAX_SORTED_PIXELS];

// Parse linked list for all pixels at this position
// and store them into temp array for later sorting
int nNumPixels = 0;
while (uOffset != 0xFFFFFFFF)
{
    // Retrieve pixel at current offset
    Element = FLBufferSRV[uOffset];

    // Copy pixel data into temp array
    SortedPixels[nNumPixels++] = uint2(Element.uPixelColor, Element.uDepth);

    // Retrieve next offset
    [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ? 0xFFFFFFFF : Element.uNext;
}

// Sort pixels in-place
SortPixelsInPlace(SortedPixels, nNumPixels);
(...)
29. Pixel Blending in PS

(...)
// Retrieve current color from background texture
float4 vCurrentColor = BackgroundTexture.Load(int3(vPos.xy, 0));

// Render pixels using SRCALPHA-INVSRCALPHA blending
for (int k = 0; k < nNumPixels; k++)
{
    // Retrieve next unblended furthermost pixel
    float4 vPixColor = UnpackFromUint(SortedPixels[k].x);

    // Manual blending between current fragment and previous one
    vCurrentColor.xyz = lerp(vCurrentColor.xyz, vPixColor.xyz, vPixColor.w);
}

// Return manually-blended color
return vCurrentColor;
}
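The sort-then-blend loop can be checked on the CPU. This Python sketch (illustrative, not the shader) sorts hypothetical (depth, color, alpha) fragments back-to-front and applies the same SRCALPHA-INVSRCALPHA lerp over the background:

```python
# CPU sketch of the manual blending above: sort fragments back-to-front
# (largest depth first) and lerp each over the running color, exactly as
# the pixel shader does with the temp array. Colors are (r, g, b) tuples.
def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def blend_fragments(background, fragments):
    """fragments: list of (depth, color, alpha), in arbitrary order."""
    color = background
    for depth, frag_color, alpha in sorted(fragments, reverse=True):
        color = lerp(color, frag_color, alpha)   # furthest fragment first
    return color

bg = (0.0, 0.0, 0.0)
frags = [(0.3, (1.0, 0.0, 0.0), 0.5),   # near, red, 50% alpha
         (0.9, (0.0, 1.0, 0.0), 0.5)]   # far, green, 50% alpha
print(blend_fragments(bg, frags))       # → (0.5, 0.25, 0.0)
```

Swapping the order of the two entries in `frags` gives the same result, which is the whole point: the sort makes the blend order-independent.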
31. Sample Coverage. Storing individual samples into linked lists requires a huge amount of memory... and performance will suffer! The solution is to store transparent pixels into the PPLL as before, but including sample coverage too (requires as many bits as the MSAA mode). Declare SV_COVERAGE in the PS input structure:

struct PS_INPUT
{
    float3 vNormal   : NORMAL;
    float2 vTex      : TEXCOORD;
    float4 vPos      : SV_POSITION;
    uint   uCoverage : SV_COVERAGE;
};
32. Linked List Structure. Almost unchanged from before: depth is now packed into 24 bits, and 8 bits are used to store coverage.

struct FragmentAndLinkBuffer_STRUCT
{
    uint uPixelColor;        // Packed pixel color
    uint uDepthAndCoverage;  // Depth + coverage
    uint uNext;              // Address of next link
};
34. Rendering Samples (1). The rendering phase needs to be able to write individual samples, thus the PS is run at sample frequency. This can be done by declaring SV_SAMPLEINDEX in the input structure. Parse the linked list and store pixels into a temp array for later sorting, similar to the non-MSAA case; the difference is to only store a sample if its coverage matches the sample index being rasterized.
35. Rendering Samples (2)

static uint2 SortedPixels[MAX_SORTED_PIXELS];

// Parse linked list for all pixels at this position
// and store them into temp array for later sorting
int nNumPixels = 0;
while (uOffset != 0xFFFFFFFF)
{
    // Retrieve pixel at current offset
    Element = FLBufferSRV[uOffset];

    // Retrieve pixel coverage from linked list element
    uint uCoverage = UnpackCoverage(Element.uDepthAndCoverage);
    if (uCoverage & (1 << In.uSampleIndex))
    {
        // Coverage matches current sample so copy pixel
        SortedPixels[nNumPixels++] = uint2(Element.uPixelColor, Element.uDepthAndCoverage);
    }

    // Retrieve next offset
    [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ? 0xFFFFFFFF : Element.uNext;
}
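The depth+coverage packing and the per-sample coverage test can be sketched in Python; the bit layout (24-bit depth in the high bits, 8-bit coverage mask in the low bits) follows the slide, everything else is illustrative.

```python
# Sketch of the 24-bit depth + 8-bit coverage packing used for the MSAA
# path, plus the per-sample coverage test from the slide above.
def pack_depth_coverage(depth, coverage):
    """depth in [0.0, 1.0), coverage = bitmask of covered samples (8 bits)."""
    d24 = int(depth * float(1 << 24)) & 0xFFFFFF
    return (d24 << 8) | (coverage & 0xFF)

def unpack_coverage(packed):
    return packed & 0xFF

def covers_sample(packed, sample_index):
    # Mirrors: if (uCoverage & (1 << In.uSampleIndex))
    return (unpack_coverage(packed) & (1 << sample_index)) != 0

packed = pack_depth_coverage(0.25, 0b0101)  # samples 0 and 2 covered
print(covers_sample(packed, 0), covers_sample(packed, 1), covers_sample(packed, 2))
# → True False True
```

At sample frequency, each shader invocation keeps only the fragments whose mask bit matches its own sample index, so each sample blends exactly the geometry that covered it.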
38. Indirect Illumination Introduction 1. Real-time indirect illumination is an active research topic and numerous approaches exist: Reflective Shadow Maps (RSM) [Dachsbacher/Stamminger 2005]; Splatting Indirect Illumination [Dachsbacher/Stamminger 2006]; Multi-Res Splatting of Illumination [Wyman 2009]; Light Propagation Volumes [Kaplanyan 2009]; Approximating Dynamic Global Illumination in Image Space [Ritschel 2009]. Only a few support indirect shadows: Imperfect Shadow Maps [Ritschel/Grosch 2008]; Micro-Rendering for Scalable, Parallel Final Gathering (SSDO) [Ritschel 2010]; Cascaded Light Propagation Volumes for Real-Time Indirect Illumination [Kaplanyan/Dachsbacher 2010]. Most approaches somehow extend to multi-bounce lighting.
39. Indirect Illumination Introduction 2. This section will cover: an efficient and simple DX9-compliant RSM-based implementation for smooth one-bounce indirect illumination (indirect shadows are ignored here), and a Direct3D 11 technique that traces rays to compute indirect shadows. Part of this technique could generally be used for ray-tracing dynamic scenes.
40. Indirect Illumination w/o Indirect Shadows. Draw the scene g-buffer. Draw the Reflective Shadow Map (RSM); the RSM shows the part of the scene that receives direct light from the light source. Draw the Indirect Light buffer at ½ res; RSM texels are used as light sources on g-buffer pixels for indirect lighting. Upsample the Indirect Light (IL). Draw the final image adding IL.
41. Step 1. The G-Buffer needs to allow reconstruction of: world/camera space position, world/camera space normal, color/albedo. DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows.
42. Step 2. The RSM needs to allow reconstruction of: world/camera space position, world/camera space normal, color/albedo. Only draw emitters of indirect light. DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows.
43. Step 3. Render a ½ res IL as a deferred op. Transform g-buffer pixels to RSM space -> light space -> project to RSM texel space. Use a kernel of RSM texels as light sources; RSM texels are also called Virtual Point Lights (VPLs). Kernel size depends on: desired speed, desired look of the effect, RSM resolution.
44. Computing IL at a G-buf Pixel 1. Sum up the contribution of all VPLs in the kernel.
45. Computing IL at a G-buf Pixel 2. (diagram: geometry between an RSM texel/VPL and a g-buffer pixel; the slide's equation was lost in extraction) This term is very similar to terms used in radiosity form factor computations.
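The slide's own formula did not survive extraction. As a hedged reconstruction, the radiosity-like one-bounce VPL term it refers to is commonly written (cf. the Reflective Shadow Maps formulation) as:

\[
I_p \;=\; \sum_{i} \Phi_i \,
\frac{\max(0,\cos\theta_i)\,\max(0,\cos\theta_p)}{\lVert x_p - x_i \rVert^{2}}
\]

where \(\Phi_i\) is the flux of VPL \(i\), \(x_i\) and \(x_p\) are the VPL and g-buffer pixel positions, and \(\theta_i\), \(\theta_p\) are the angles between their normals and the line joining them. This is an assumption matching the standard formulation, not the slide's exact equation.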
46. Computing IL at a G-buf Pixel 3. A naive solution for smooth IL needs to consider four VPL kernels with centers at t0, t1, t2 and t3. (stx: sub-RSM-texel x position in [0.0, 1.0[; sty: sub-RSM-texel y position in [0.0, 1.0[)
47. Computing IL at a g-buf pixel 4. Bilinear blend of the four kernels (writing Ki for the VPL kernel sum centered at ti):

IndirectLight = (1.0f - sty) * ((1.0f - stx) * K0 + stx * K1)
              +         sty  * ((1.0f - stx) * K2 + stx * K3)

Evaluation of 4 big VPL kernels is slow. (stx/sty: sub-texel x/y position in [0.0, 1.0[)
48. Computing IL at a g-buf pixel 5. Decomposing the four overlapping kernels into 3x3 blocks B0..B8:

SmoothIndirectLight = (1.0f - sty) * (((1.0f - stx) * (B0 + B3) + stx * (B2 + B5)) + B1)
                    +         sty  * (((1.0f - stx) * (B6 + B3) + stx * (B8 + B5)) + B7)
                    + B4

(stx/sty: sub-RSM-texel x/y position of the g-buf pixel in [0.0, 1.0[) This trick is probably known to some of you already. See backup for a detailed explanation!
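The equivalence of this optimized formula with the naive four-kernel bilinear blend can be verified numerically. In this Python check, each kernel Ki is assumed (as in the backup slides) to be the sum of its four of the nine blocks B0..B8:

```python
# Numeric check that the optimized formula on this slide equals the naive
# bilinear blend of the four overlapping VPL kernels. The kernel-to-block
# decomposition below is an assumption matching the backup slides.
import random

random.seed(1)
B = [random.random() for _ in range(9)]   # B0..B8 block sums
K0 = B[0] + B[1] + B[3] + B[4]            # kernel at t0 (top-left)
K1 = B[1] + B[2] + B[4] + B[5]            # kernel at t1 (top-right)
K2 = B[3] + B[4] + B[6] + B[7]            # kernel at t2 (bottom-left)
K3 = B[4] + B[5] + B[7] + B[8]            # kernel at t3 (bottom-right)

stx, sty = 0.3, 0.7                       # sub-texel position in [0, 1[

naive = (1 - sty) * ((1 - stx) * K0 + stx * K1) \
      +       sty * ((1 - stx) * K2 + stx * K3)
fast = ((1 - sty) * (((1 - stx) * (B[0] + B[3]) + stx * (B[2] + B[5])) + B[1])
        +     sty * (((1 - stx) * (B[6] + B[3]) + stx * (B[8] + B[5])) + B[7])
        + B[4])
print(abs(naive - fast) < 1e-9)   # the two formulas agree
```

The saving comes from the shared blocks: B1, B7 each appear in two kernels and B4 in all four, so their bilinear weights collapse to constants and only seven small sums plus one center block need evaluating.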
50. Step 4. The Indirect Light buffer is ½ res; perform a bilateral upsampling step. See Peter-Pike Sloan, Naga K. Govindaraju, Derek Nowrouzezahrai, John Snyder, "Image-Based Proxy Accumulation for Real-Time Soft Global Illumination", Pacific Graphics 2007. The result is a full-resolution IL.
53. Combined Image. DEMO: ~280 FPS on a HD5970 @ 1280x1024 for a 15x15 VPL kernel. The deferred IL pass + bilateral upsampling costs ~2.5 ms.
54. How to add Indirect Shadows. Use a CS and the linked lists technique. Insert blocker geometry of IL into a 3D grid of lists; let's use the triangles of the blockers for now (see backup for an alternative data structure). Look at a kernel of VPLs again, but only accumulate the light of VPLs that are occluded by blocker tris: trace rays through the 3D grid to detect occluded VPLs. Render a low-res buffer only and subtract the blocked indirect light from the IL buffer; a blurred version of the low-res blocked IL is used (the blur is a combined bilateral blurring/upsampling).
55. Insert tris into 3D grid of triangle lists. Rasterize dynamic blockers to the 3D grid using a CS and atomics. (diagram: scene)
56. Insert tris into 3D grid of triangle lists. Rasterize dynamic blockers to a world-space 3D grid of triangle lists around the IL blockers, laid out in a UAV. (diagram: scene; eol = end of list (0xffffffff))
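The grid-of-lists idea can be sketched on the CPU. The Python below is only an illustration of the concept: triangles are binned by bounding box into unit-sized cells, and a ray collects candidate triangles with a simple fixed-step march (a real implementation would use a 3D DDA and the GPU linked lists described earlier; cell size and all coordinates here are arbitrary assumptions).

```python
# Illustrative CPU sketch of the "3D grid of triangle lists" used for
# indirect-shadow rays. Cells are unit cubes; grid is a dict from cell
# coordinates to a list of triangle ids.
def bin_triangle(grid, tri_id, verts):
    # Conservative binning by the triangle's axis-aligned bounding box.
    xs, ys, zs = zip(*verts)
    lo = [int(min(c)) for c in (xs, ys, zs)]
    hi = [int(max(c)) for c in (xs, ys, zs)]
    for x in range(lo[0], hi[0] + 1):
        for y in range(lo[1], hi[1] + 1):
            for z in range(lo[2], hi[2] + 1):
                grid.setdefault((x, y, z), []).append(tri_id)

def candidates_along_ray(grid, origin, direction, t_max, step=0.25):
    # Fixed-step march through cells; a real version would use 3D DDA.
    seen, t = [], 0.0
    while t < t_max:
        cell = tuple(int(o + d * t) for o, d in zip(origin, direction))
        for tri in grid.get(cell, []):
            if tri not in seen:
                seen.append(tri)
        t += step
    return seen

grid = {}
bin_triangle(grid, 0, [(2.1, 2.1, 2.1), (2.9, 2.2, 2.1), (2.5, 2.8, 2.3)])
print(candidates_along_ray(grid, (0.5, 0.5, 0.5), (1.0, 1.0, 1.0), 6.0))  # → [0]
```

A ray from a g-buffer pixel toward a VPL only needs an exact triangle intersection test against the candidates gathered from the cells it passes through, which is what makes the grid worthwhile.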
62. Final Image. DEMO: ~70 FPS on a HD5970 @ 1280x1024, ~300 million rays per second for indirect shadows. Ray casting costs ~9 ms.
63. Future directions. Speed up IL rendering: render IL at even lower res; look into multi-res RSMs. Speed up ray-tracing: per-pixel array of lists for depth buckets (see backup); other data structures. Raytrace other primitive types: splats, fuzzy ellipsoids etc.; proxy geometry or bounding volumes of blockers. Get rid of Interlocked*() ops: just mark grid cells as occupied; lower quality but could work on earlier hardware.
64. Q&A. Holger Gruen holger.gruen@AMD.com, Nicolas Thibieroz nicolas.thibieroz@AMD.com. Credits for the basic idea of how to implement PPLLs under Direct3D 11 go to Jakub Klarowicz (Techland), Holger Gruen and Nicolas Thibieroz (AMD).
66. Computing IL at a g-buf pixel 1. Want to support low-res RSMs and want to create smooth indirect light. The goal is bilinear filtering of four VPL kernels; otherwise results don't look smooth.
67. Computing IL at a g-buf pixel 2. (stx: sub-texel x position in [0.0, 1.0[; sty: sub-texel y position in [0.0, 1.0[)
68. Computing IL at a g-buf pixel 3. For smooth IL one needs to consider four VPL kernels with centers at t0, t1, t2 and t3. (stx/sty: sub-texel x/y position in [0.0, 1.0[)
69. Computing IL at a g-buf pixel 4. Center at t0. (diagram: VPL kernel at t0)
70. Computing IL at a g-buf pixel 4. Center at t1. (diagram: VPL kernels at t0 and t1)
71. Computing IL at a g-buf pixel 5. Center at t2. (diagram: VPL kernels at t0, t1 and t2)
72. Computing IL at a g-buf pixel 6. Center at t3. (diagram: VPL kernels at t0, t1, t2 and t3)
73. Computing IL at a g-buf pixel 7. Naive bilinear blend (writing Ki for the VPL kernel sum centered at ti):

IndirectLight = (1.0f - sty) * ((1.0f - stx) * K0 + stx * K1)
              +         sty  * ((1.0f - stx) * K2 + stx * K3)

Evaluation of 4 big VPL kernels is slow. (stx/sty: sub-texel x/y position in [0.0, 1.0[)
74. Computing IL at a g-buf pixel 8. (diagram: the four overlapping VPL kernels)
75. Computing IL at a g-buf pixel 9. (diagram: the overlap region decomposed into 3x3 blocks B0..B8)
76. Computing IL at a g-buf pixel 9.

SmoothIndirectLight = (1.0f - sty) * (((1.0f - stx) * (B0 + B3) + stx * (B2 + B5)) + B1)
                    +         sty  * (((1.0f - stx) * (B6 + B3) + stx * (B8 + B5)) + B7)
                    + B4

Evaluation of 7 small and 1 bigger VPL kernels is fast. (stx/sty: sub-texel x/y position in [0.0, 1.0[)
77. Insert Tris into 2D Map of Lists of Tris. Rasterize blockers of IL from the view of the light. (diagram: light, scene, 2D buffer)
78. Insert Tris into 2D Map of Lists of Tris. Rasterize blockers of IL from the view of the light using a GS and conservative rasterization, into a 2D buffer of lists of triangles written to by a scattering PS. (eol = end of list (0xffffffff))
80. Linked List Creation (2). For every pixel: calculate pixel data (color, depth etc.). Retrieve the current pixel count from the Fragment & Link UAV and increment the counter:

uint uPixelCount = FragmentAndLinkBuffer.IncrementCounter();

Swap offsets in the Start Offset UAV:

uint uOldStartOffset;
StartOffsetBuffer.InterlockedExchange(PixelScreenSpacePositionLinearAddress, uPixelCount, uOldStartOffset);

Add a new entry to the Fragment & Link UAV:

FragmentAndLinkBuffer_STRUCT Element;
Element.FragmentData = FragmentData;
Element.uNext = uOldStartOffset;
FragmentAndLinkBuffer[uPixelCount] = Element;
81. Linked List Creation (3d). (diagram: buffer state after several overlapping fragments have been inserted; each Start Offset Buffer entry points to the newest fragment for its pixel)
82. PPLL Pros and Cons. Pros: allows storing a variable number of elements per pixel; only one atomic operation per pixel (using the built-in HW UAV counter); good performance. Cons: traversal causes a lot of random access; special case for storing MSAA samples.
83. Alternative Technique: Per-Pixel Array. There is a two-pass method that allows the construction of per-pixel arrays, first described in 'Sample Based Visibility for Soft Shadows using Alias-free Shadow Maps' (E. Sintorn, E. Eisemann, U. Assarsson, EG 2008). It uses a prefix sum to allocate multiple entries into a buffer, so elements belonging to the same location are contiguous in memory: faster traversal (less random access). Not as fast as PPLL for scenes with 'high' depth complexity in our testing.
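The two-pass prefix-sum construction can be sketched as follows. This Python mirrors the counting pass, the exclusive prefix sum, and the scatter pass on the CPU (illustrative only; the GPU version runs each pass over all fragments in parallel):

```python
# Sketch of the two-pass per-pixel array construction: pass 1 counts
# fragments per pixel, an exclusive prefix sum turns counts into start
# offsets, and pass 2 writes each fragment into its pixel's range.
from itertools import accumulate

def build_per_pixel_arrays(num_pixels, fragments):
    """fragments: list of (pixel_index, data) in arbitrary order."""
    counts = [0] * num_pixels
    for pixel, _ in fragments:                       # pass 1: count
        counts[pixel] += 1
    offsets = [0] + list(accumulate(counts))[:-1]    # exclusive prefix sum
    cursor = list(offsets)
    packed = [None] * len(fragments)
    for pixel, data in fragments:                    # pass 2: scatter
        packed[cursor[pixel]] = data
        cursor[pixel] += 1
    return packed, offsets, counts

packed, offsets, counts = build_per_pixel_arrays(
    4, [(2, "a"), (0, "b"), (2, "c"), (1, "d")])
print(packed)   # → ['b', 'd', 'a', 'c']  (pixel 0, then 1, then 2)
```

After the scatter, pixel `p`'s fragments occupy the contiguous range `packed[offsets[p] : offsets[p] + counts[p]]`, which is exactly the cache-friendliness advantage over the linked-list layout.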
84. Rendering Samples, Alternate Method. Writes pixels straight into a non-MSAA RT. Only viable if access to samples is no longer required. Samples must be resolved into pixels before writing; this affects the manual blending process: samples with the same coverage are blended back-to-front... then averaged (resolved) with the other samples before being written out to the RT. This method can prove slightly faster than per-sample rendering.
85. Rendering Fragments, Alternate Method (2x MSAA)

// Retrieve current sample colors from background texture
float4 vCurrentColor0 = BackgroundTexture.Load(int3(vPos.xy, 0));
float4 vCurrentColor1 = BackgroundTexture.Load(int3(vPos.xy, 1));

// Render pixels using SRCALPHA-INVSRCALPHA blending
for (int k = 0; k < nNumPixels; k++)
{
    // Retrieve next unblended furthermost pixel
    float4 vPixColor = UnpackFromUint(SortedPixels[k].x);

    // Retrieve sample coverage
    uint uCoverage = UnpackCoverage(SortedPixels[k].y);

    // Manual blending between current samples and previous ones
    if (uCoverage & (1 << 0))
        vCurrentColor0.xyz = lerp(vCurrentColor0.xyz, vPixColor.xyz, vPixColor.w);
    if (uCoverage & (1 << 1))
        vCurrentColor1.xyz = lerp(vCurrentColor1.xyz, vPixColor.xyz, vPixColor.w);
}

// Return resolved and manually-blended color
return (vCurrentColor0 + vCurrentColor1) * 0.5;
}
Editor's Notes
A node is composed of Element and Link
Uses a Pixel Shader 5.0 to store fragments into linked list: because we need triangles rasterized as normal through the rendering pipeline. Briefly describe UAVs. Uses atomic operations: only 1 such operation per pixel is used (although it has feedback).
“Fragment & Link” buffer only uses Write access actually. FragmentData: typically fragment color, but also transparency, depth, blend mode id, etc. Must be large enough to store all fragments: allocation must take into account the maximum number of elements to store. E.g. “average overdraw per pixel” can be used for allocation: width * height * average overdraw * sizeof(Element&Link). The fragment data struct should be kept minimal to reduce memory requirements (and improve performance in the process by performing fewer memory operations).
StartOffsetBuffer is 1D buffer despite being screen-sized; this is because atomic operations are not supported on 2D UAV textures. Initialized to magic value: thus by default the linked lists are empty since they point to -1 (end of list).
“Viewport” is used to illustrate incoming triangle (as no RT is bound)Here fragment data is just the pixel color for simplification.Start Offset Buffer is initialized to “magic value” (-1)
We’re going to exchange the offset in the StartOffsetBuffer with the current value of the counter (before incrementation).This is because the current value of the counter gives us the offset (address) of the fragment we’re storing.
Assuming a rasterization order of left to right for the green triangle
Assuming a rasterization order of left to right for the yellow triangle
Current pixel count is TOTAL pixel count, not pixel count per pixel location.Not using Append Buffers because they require the DXGI_FORMAT_R32_UNKNOWN format.Times 4 because startoffsetaddress is in bytes, not UINT (1 UINT = 4 bytes).
No “4x” to calculate the offset address here because StartOffsetBufferSRV is declared with the uint type.
No point storing transparent fragments that are hidden by opaque geometry, i.e. that fail the depth test.
May also get away with packed color and depth into the same uint!: this would make the element size a nice 64 bits.Deferred rendering engines may wish to store the same properties in F&L structure as opaque pixels, but watch out for memory and performance impact of large structure!If using different blend modes per pixel then you would need to store blend mode as id into fragment data.Depth is stored as uint – we’ll explain why later on.
Default DX11 pixel pipeline executes pixel shader before depth test[earlydepthstencil] allows the depth test to be performed *before* pixel shader.Normally the HW will try to do this for you automatically but a PS writing to UAVs cannot benefit from this treatment – a bit like alpha testing.Allows performance savings and rendering correctness!: less fragments to store = good!
Sparse memory accesses = slow!: especially true as original rendering order can mean that fragments belonging to the same screen coordinates can be (very) far apart from each other.
Under-blending can be used insteadNo need for background input thenActual HW blending mode enabledBlending mode to use with under-blending is ONE, SRCALPHA
SortedPixels[].x stores color, SortedPixels[].y stores depth. Use the (nNumPixels >= MAX_SORTED_PIXELS) test to prevent overflow. The actual sort algorithm used can be anything (e.g. insertion sort, etc.)
This happens after sorting.vCurrentColor.xyz= vFragColor.w * vFragColor.xyz + (1.0-vFragColor.w)* vCurrentColor.xyz;
The problem with OIT via PPLL and MSAA is that fragment sorting and blending must happen on a per-sample basis, which means a lot of storage and processing. Storing individual samples into linked lists requires a huge amount of memory: as most pixels are not edge pixels, they would get all their samples covered. Unfortunately, schemes mixing edge and non-edge pixels proved difficult to implement and did not generate satisfactory results. SV_COVERAGE is DX11 only.
Depth is now packed into 24 bits: this is why depth is stored as uint
Blue triangle covers one depth sample
Assuming a rasterization order of left to right for the yellow triangle
Traversal causes a lot of random access: Elements belonging to a same pixel location can be far apart in LL!
Not as fast: probably because multiple passes involved.
Only viable if access to samples is no longer required (e.g. for HDR-correct tone mapping). Sorting is unaffected because stored samples share the same depth. Samples with the same coverage are blended back-to-front: this means blending happens on a per-sample basis as it normally would.
Code illustration for 2 samplesSortedPixels[] contains pixel color in .x and depth+coverage in .y