Z Buffer Optimizations

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

0 comments

Post a comment

    Post a comment
    Embed Video
    Edit your comment Cancel

    Notes on slide 1

    Other Software techniques include Disable depth buffering when it is not needed, e.g. an alpha blended HUD If using multiple depth buffers, allocate the most render-intensive one first

    RADEON 9500/9700 can achieve up to 24:1 compression rate in extreme cases

    ATI calls Z-Cull “Hierarchical Z” and NVIDIA calls it “Light Memory Architecture.”

    Favorites, Groups & Events

    Z Buffer Optimizations - Presentation Transcript

    1. Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.
    2. Overview
      • Z-Buffer Review
      • Hardware: Early-Z
      • Software: Front-to-Back Sorting
      • Hardware: Double-Speed Z-Only
      • Software: Early-Z Pass
      • Software: Deferred Shading
      • Hardware: Buffer Compression
      • Hardware: Fast Clear
      • Hardware: Z-Cull
      • Future: Programmable Culling Unit
    3. Z-Buffer Review
      • Also called Depth Buffer
      • Fragment vs Pixel
      • Alternatives: Painter’s, Ray Casting, etc
    4. Z-Buffer History
      • “Brute-force approach”
      • “Ridiculously expensive”
      • Sutherland, Sproull, and, Schumacker, “A Characterization of Ten Hidden-Surface Algorithms”, 1974
    5. Z-Buffer Quiz
      • 10 triangles cover a pixel. Rendering these in random order with a Z-buffer, what is the average number of times the pixel’s z-value is written?
      See Subtle Tools Slides: erich.realtimerendering.com
    6. Z-Buffer Quiz
      • 1 st triangle writes depth
      • 2 nd triangle has 1/2 chance of writing depth
      • 3 rd triangle has 1/3 chance of writing depth
      • 1 + 1/2 + 1/3 + …+ 1/10 = 2.9289…
      See Subtle Tools Slides: erich.realtimerendering.com
    7. Z-Buffer Quiz
      • Harmonic Series
      See Subtle Tools Slides: erich.realtimerendering.com # Depth Writes # Triangles 10 12,367 5 83 4.03 31 3.02 11 2.08 4 1 1
    8. Z-Test in the Pipeline
      • When is the Z-Test?
      or Fragment Shader Fragment Shader Z-Test Z-Test
    9. Early-Z
      • Avoid expensive fragment shaders
      • Reduce bandwidth to frame buffer
        • Writes not reads
      Fragment Shader Z-Test
    10. Early-Z
      • A utomatically enabled on GeForce (8?) unless
        • Fragment shader discard s or write depth
        • Depth writes and alpha-test are enabled
      • Fine-grained as opposed to Z-Cull .
      • ATI: “Top of the Pipe Z Reject”
      Fragment Shader Z-Test See NVIDIA GPU Programming Guide for exact details
    11. Front-to-Back Sorting
      • Utilize Early-Z for opaque objects
      • Old hardware still has less z-buffer writes
      • CPU overhead. Need efficient sorting
        • Bucket Sort
        • Octtree
      • Conflicts with state sorting
      0 2 0 - 0.25 0.25 – 0.5 0.5 – 0.75 0.75 - 1 1 1
    12. Double Speed Z-Only
      • GeForce FX and later render at double speed when writing only depth or stencil
      • Enabled when
        • Color writes are disabled
        • Fragment shader discard s or write depth
        • Alpha-test is disabled
      See NVIDIA GPU Programming Guide for exact details
    13. Early-Z Pass
      • Software technique to utilize Early-Z and Double Speed Z-Only
      • Two passes
        • Render depth only. “Lay down depth” – Double Speed Z-Only
        • Render with full shaders – Early-Z (and Z-Cull )
    14. Deferred Shading
      • Similar to Early-Z Pass
        • 1 st Pass: Visibility tests
        • 2 nd Pass: Shading
      • Different than Early-Z Pass
        • Geometry is only transformed once
    15. Deferred Shading
      • 1 st Pass
        • Render geometry into G-Buffers:
      Images from Tabula Rasa. See Resources. Fragment Colors Normals Depth Edge Weight
    16. Deferred Shading
      • 2 nd Pass
        • Shading == post processing effects
        • Render full screen quads that read from G-Buffers
        • Objects are no longer needed
    17. Deferred Shading
      • Light Accumulation Result
      Image from Tabula Rasa. See Resources.
    18. Deferred Shading
      • Eliminates shading fragments that fail Z-Test
      • Increases video memory requirement
      • How does it affect bandwidth?
    19. Buffer Compression
      • Reduce depth buffer bandwidth
      • Generally does not reduce memory usage of actual depth buffer
      • Same architecture applies to other buffers, e.g. color and stencil
    20. Buffer Compression
      • Tile Table: Status for n x n tile of depths, e.g. n =8
        • [state, z min , z max ]
        • state is either compressed , uncompressed , or cleared
      [ uncompressed , 0.1, 0.8] 0.1 0.5 0.5 0.1 0.5 0.5 0.1 0.8 0.8 0.8 0.8 0.5 0.5 0.5 0.5 0.1
    21. Buffer Compression Tile Table Decompress Compress Compressed Z-Buffer Rasterizer updated z-values updated z-max n x n uncompressed z values [z min , z max ]
    22. Buffer Compression
      • Depth Buffer Write
        • Rasterizer modifies copy of uncompressed tile
        • Tile is lossless compressed (if possible) and sent to actual depth buffer
        • Update Tile Table
          • z min and z max
          • status: compressed or decompressed
    23. Buffer Compression
      • Depth Buffer Read
        • Tile Status
          • Uncompressed : Send tile
          • Compressed : Decompress and send tile
          • Cleared : See Fast Clear
    24. Fast Clear
      • Don’t touch depth buffer
      • glClear sets state of each tile to cleared
      • When the rasterizer reads a cleared buffer
        • A tile filled with GL_DEPTH_CLEAR_VALUE is sent
        • Depth buffer is not accessed
    25. Fast Clear
      • Use glClear
        • Not full screen quads
        • No "one frame positive, one frame negative“ trick
      • Clear stencil together with depth
    26. Z-Cull
      • Cull blocks of fragments before shading
      • Coarse-grained as opposed to Early-Z
      Fragment Shader Z-Cull Z triangle min > tile’s z max z triangle min
    27. Z-Cull
      • Z max -Culling
        • Rasterizer fetches z max for each tile it processes
        • Compute z triangle min for a triangle
        • Culled if z triangle min > z max
      Fragment Shader Z-Cull Z triangle min > tile’s z max z triangle min
    28. Z-Cull
      • Z min -Culling
        • Support different depth tests
        • Avoid depth buffer reads
        • If triangle is in front of tile, depth tests for each pixel is unnecessary
      Fragment Shader Z-Cull Z triangle max < tile’s z min z triangle max
    29. Z-Cull
      • A utomatically enabled on GeForce (6?) cards unless
        • glClear isn’t used
        • Fragment shader writes depth (or discard s?)
        • Direction of depth test is changed
      • ATI recommends avoiding = and != depth compares and stencil fail and stencil depth fail operations
      • Less efficient when depth varies a lot within a few pixels
      See NVIDIA GPU Programming Guide for exact details
    30. Programmable Culling Unit
      • Cull before fragment shader even if the shader writes depth or discard s
      • Run part of shader over an entire tile to determine lower bound z value
      • Hasselgren and Akenine-Möller, “PCU: The Programmable Culling Unit,” 2007
    31. Summary
      • What was once “ridiculously expensive” is now the primary visible surface algorithm for rasterization
    32. Resources
      • www.realtimerendering.com
      Sections 7.9.2 and 18.3
    33. Resources
      • developer.nvidia.com/object/gpu_programming_guide.html
      GeForce 8 Guide: sections 3.4.9, 3.6, and 4.8 GeForce 7 Guide: section 3.6
    34. Resources
      • http://www.graphicshardware.org/previous/www_2000/presentations/ATIHot3D.pdf
      ATI Radeon HyperZ Technology Steve Morein
    35. Resources
      • http://ati.amd.com/developer/dx9/ATI-DX9_Optimization.pdf
      Performance Optimization Techniques for ATI Graphics Hardware with DirectX® 9.0 Guennadi Riguer Sections 6.5 and 8
    36. Resources
      • developer.nvidia.com/object/gpu_gems_home.html
      Chapter 28: Graphics Pipeline Performance
    37. Resources
      • developer.nvidia.com/object/gpu-gems-3.html
      Chapter 19: Deferred Shading in Tabula Rasa

    + pjcozzipjcozzi, 3 months ago

    custom

    520 views, 0 favs, 3 embeds more stats

    More info about this document

    © All Rights Reserved

    Go to text version

    • Total Views 520
      • 434 on SlideShare
      • 86 from embeds
    • Comments 0
    • Favorites 0
    • Downloads 9
    Most viewed embeds
    • 80 views on http://cagetu.egloos.com
    • 5 views on http://www.hanrss.com
    • 1 views on http://3denginedesignforvirtualglobes.blogspot.com

    more

    All embeds
    • 80 views on http://cagetu.egloos.com
    • 5 views on http://www.hanrss.com
    • 1 views on http://3denginedesignforvirtualglobes.blogspot.com

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories