Visibility Driven Out-of-Core HLOD Rendering

Abstract: With advances in model acquisition and procedural modeling, geometric models can have billions of polygons and gigabytes of textures. Such model complexity continues to outpace the explosive growth of CPU and GPU processing power. Brute force rendering cannot achieve interactive frame rates. Even if these massive models could fit into video memory, current GPUs can only process 10-200 million triangles per second. Interactive massive model rendering requires techniques that are output-sensitive: performance is a function of the number of pixels rendered, not the size of the model. Such techniques are surveyed, including visibility culling, level of detail, and memory management. In addition, this work presents a new out-of-core rendering algorithm that is demonstrated with a variety of HLOD rendering algorithms.

Published in: Education, Technology
  • 30-60 GB on disk, or 12 CDs.
  • Need spatial coherence
  • Spatial data structures exploit spatial coherence.
    Visit nodes in front to back order. Useful for early-z and occlusion culling.
  • [Wonka00] presents per-region visibility with occluder fusion in urban environments. Assuming a 2.5D scene, all buildings must be perpendicular to the ground plane and connected to the ground.
    More recently, [Ohlarik08] presents efficient from-point occlusion culling for scenes with large spherical occluders (e.g. planets) and small occludees (e.g. satellites). Occlusion culling is reduced to horizon distance tests.
  • The bounding volume usually has far less geometry.
    Expensive shaders required to render the object are not generally required to render the bounding volume.
    When only depth testing is enabled, as is the case when rendering the bounding volume, today’s GPUs use a higher-performance rendering path.
  • Walk up tree to find first ancestor with geometry.
    Each node keeps a count of the number of children it has with geometry. Refinement stops at nodes with a count of zero, since refining further cannot improve image quality.
  • Visibility Driven Out-of-Core HLOD Rendering

    1. Visibility Driven Out-of-Core HLOD Rendering. Patrick Cozzi, The University of Pennsylvania. 00000000 of 01010110
    2. Project History: procedurally generated model of Pompeii, ~1.4 billion polygons. Image from [Mueller06].
    3. Project History: Boeing 777 model, ~350 million polygons. Image from http://graphics.cs.uni-sb.de/MassiveRT/boeing777.html
    4. Contents • Previous Work • View Frustum and Occlusion Culling • Hardware Occlusion Queries (HOQ) • Level of Detail (LOD) • Hierarchical Level of Detail (HLOD) • Out-of-Core Rendering (OOC)
    5. Contents Continued • Implementation Work • Vertex Clustering [Rossignac93] • HLOD Tree Creation • Primary Contribution: OOC Rendering • Results • Future Work • Demos throughout
    6. View Frustum Culling • Can be slower than brute force. When? [Figure: objects labeled "culled" or "rendered"]
    7-8. View Frustum Culling [Figure: nodes numbered 0 to 5]
    9. View Frustum Culling • Demo
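
A minimal C++ sketch of hierarchical view frustum culling, for illustration only. It assumes axis-aligned bounding boxes, frustum planes whose normals point into the frustum, and a hypothetical `Node` layout; none of these types come from the slides. A node that fails the test is skipped together with its whole subtree, which is where the savings over testing every object individually come from.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Plane nx*x + ny*y + nz*z + d = 0. Assumed convention: the normal points
// into the frustum, so points inside give a non-negative value.
struct Plane { float nx, ny, nz, d; };

// Axis-aligned bounding box.
struct Aabb { float min[3], max[3]; };

// Hypothetical tree node: a bounding volume plus indices of its children.
struct Node {
    Aabb bounds;
    std::vector<int> children;
};

// True when the box lies entirely on the negative side of the plane.
bool isOutside(const Aabb& b, const Plane& p) {
    // Test only the corner farthest along the plane normal.
    float x = p.nx >= 0.0f ? b.max[0] : b.min[0];
    float y = p.ny >= 0.0f ? b.max[1] : b.min[1];
    float z = p.nz >= 0.0f ? b.max[2] : b.min[2];
    return p.nx * x + p.ny * y + p.nz * z + p.d < 0.0f;
}

// Conservative: a box just outside a frustum corner may still be reported
// as overlapping, but a visible box is never reported as culled.
bool overlapsFrustum(const Aabb& b, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum)
        if (isOutside(b, p)) return false;
    return true;
}

// Appends the indices of nodes that survive culling. A culled node is not
// descended into, so its entire subtree is rejected with one test.
void cull(const std::vector<Node>& tree, int index,
          const std::array<Plane, 6>& frustum, std::vector<int>& visible) {
    assert(index >= 0 && index < static_cast<int>(tree.size()));
    const Node& node = tree[index];
    if (!overlapsFrustum(node.bounds, frustum)) return;
    visible.push_back(index);
    for (int child : node.children) cull(tree, child, frustum, visible);
}
```

When nearly everything is inside the frustum, the per-node tests are pure overhead, which is one way culling can end up slower than brute force.
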
    10. Occlusion Culling • Effective in scenes with high depth complexity [Figure: object labeled "culled"]
    11. Occlusion Culling • From-region or from-point • Most are conservative • Occluder Fusion • Difficult for general scenes with arbitrary occluders, so make simplifying assumptions: • [Wonka00]: urban environments • [Ohlarik08]: planets and satellites
    12. Hardware Occlusion Queries • From-point visibility that handles general scenes with arbitrary occluders and occluder fusion • How? • Use the GPU
    13. Hardware Occlusion Queries • Disable color and depth writes [Figure: color buffer and depth buffer]
    14. Hardware Occlusion Queries • Disable color and depth writes • Render BV using HOQ
    15. Hardware Occlusion Queries • Disable color and depth writes • Render BV using HOQ • Enable color and depth writes [Figure: color buffer and depth buffer]
    16. Hardware Occlusion Queries • Disable color and depth writes • Render BV using HOQ • Enable color and depth writes • Render object based on HOQ results
    17-19. Hardware Occlusion Queries
        class IQueryOcclusion
        {
        public:
            virtual void Begin() = 0;
            virtual void End() = 0;
            virtual bool IsResultAvailable() = 0;
            virtual unsigned int NumberOfSamplesPassed() = 0;
            virtual unsigned int NumberOfFragmentsPassed() = 0;
        };
    20. Hardware Occlusion Queries • CPU stalls and GPU starvation [Figure: CPU and GPU timelines. Without queries: Draw o1, Draw o2, Draw o3 on each. With a query: Query o1 then Draw o1 on each, with a CPU stall and a GPU starve in between]
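
One way to reduce the stall shown on the last slide is to avoid blocking on any single query: issue several, then poll `IsResultAvailable()` and move on to other pending queries when a result is not ready. The sketch below is an illustration under assumptions, not the presentation's implementation. `IQueryOcclusion` is copied from the slides (with a virtual destructor added); `FakeQuery`, `Pending`, and `collectVisible` are hypothetical names, and `FakeQuery` only simulates GPU latency so the logic can be exercised without a graphics context.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Interface as shown on the slides; the virtual destructor is an addition.
class IQueryOcclusion {
public:
    virtual ~IQueryOcclusion() = default;
    virtual void Begin() = 0;
    virtual void End() = 0;
    virtual bool IsResultAvailable() = 0;
    virtual unsigned int NumberOfSamplesPassed() = 0;
    virtual unsigned int NumberOfFragmentsPassed() = 0;
};

// Hypothetical stand-in for a GPU query: the result becomes available only
// after IsResultAvailable() has been polled `latency` times.
class FakeQuery : public IQueryOcclusion {
public:
    FakeQuery(unsigned int fragments, int latency)
        : fragments_(fragments), latency_(latency) {}
    void Begin() override {}
    void End() override {}
    bool IsResultAvailable() override { return latency_-- <= 0; }
    unsigned int NumberOfSamplesPassed() override { return fragments_; }
    unsigned int NumberOfFragmentsPassed() override { return fragments_; }
private:
    unsigned int fragments_;
    int latency_;
};

// An object whose bounding volume was already rendered inside Begin()/End().
struct Pending {
    int object;
    IQueryOcclusion* query;
};

// Returns the objects whose bounding volume produced at least one fragment.
// A query that is not ready is pushed to the back of the queue instead of
// being waited on; a real renderer would do useful work between polls.
std::vector<int> collectVisible(std::deque<Pending> pending) {
    std::vector<int> visible;
    while (!pending.empty()) {
        Pending p = pending.front();
        pending.pop_front();
        assert(p.query != nullptr);
        if (!p.query->IsResultAvailable()) {
            pending.push_back(p);
        } else if (p.query->NumberOfFragmentsPassed() > 0) {
            visible.push_back(p.object);
        }
    }
    return visible;
}
```

Objects come back in the order their results arrive, not the order the queries were issued, so a renderer relying on front-to-back order would need to account for that.
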
    21. Is Culling Enough?
    22. Is Culling Enough? Now what?
    23. Is Culling Enough? • Demo
    24. Level of Detail • Generation: fewer triangles, simpler shader • Selection: distance, pixel size • Switching: avoid popping • Discrete, Continuous, Hierarchical
    25. Discrete LOD: 3,086 triangles; 52,375 triangles; 69,541 triangles
    26. Discrete LOD • Demo
    27. Discrete LOD: not enough detail up close; too much detail in the distance
    28. Continuous LOD: edge collapse, vertex split. Image from [Luebke01].
    29-30. Hierarchical LOD: 1 node, 3,086 triangles; 4 nodes, 9,421 triangles; 16 nodes, 77,097 triangles
    31. Hierarchical LOD: Node Refinement
        visit(node) {
            if (computeSSE(node) < pixel tolerance) {
                render(node);
            } else {
                foreach (child in node.children)
                    visit(child);
            }
        }
    32. Hierarchical LOD
    33. Hierarchical LOD • New Problem: Cracks
    34. Hierarchical LOD • Demo
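
The node refinement pseudocode can be made concrete as in the C++ sketch below. The slides do not give a formula for `computeSSE`; this sketch assumes the commonly used perspective projection of an object-space geometric error to pixels. `Camera`, `HlodNode`, the precomputed `distance` field, and the render list are hypothetical. The sketch also renders a node that has no children, a case the slide pseudocode leaves implicit.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Camera {
    float viewportHeight;  // in pixels
    float fovY;            // vertical field of view, in radians
};

struct HlodNode {
    float geometricError;  // object-space error of this node's simplified mesh
    float distance;        // viewer-to-node distance (assumed precomputed)
    std::vector<HlodNode> children;
};

// Assumed formula: project the object-space error onto the screen, in pixels.
float computeSSE(const HlodNode& n, const Camera& c) {
    assert(n.distance > 0.0f);
    return (n.geometricError * c.viewportHeight) /
           (2.0f * n.distance * std::tan(c.fovY * 0.5f));
}

// Renders the node when its error is tolerable, or when there is nothing
// finer to descend to; otherwise refines into the children.
void visit(const HlodNode& n, const Camera& c, float pixelTolerance,
           std::vector<const HlodNode*>& renderList) {
    if (computeSSE(n, c) < pixelTolerance || n.children.empty()) {
        renderList.push_back(&n);  // stands in for render(node)
    } else {
        for (const HlodNode& child : n.children)
            visit(child, c, pixelTolerance, renderList);
    }
}
```

Because the error shrinks with distance, the same tree yields a coarse cut when zoomed out and a fine cut up close.
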
    35. HLOD + Culling
        visit(node) {
            if (node overlaps view frustum) {
                // ...
            }
        }
    36. HLOD + Culling (Render front to back!)
        visit(node) {
            if (node overlaps view frustum) {
                render node’s BV with HOQ
                if (query.NumberOfFragmentsPassed() > 0) {
                    // ...
                }
            }
        }
    37. HLOD + Culling + VMSSE
        visit(node) {
            if (node overlaps view frustum) {
                render node’s BV with HOQ
                if (query.NumberOfFragmentsPassed() > 0) {
                    if (computeVMSSE(node, query) < tolerance) {
                        render(node);
                    } else {
                        // ...
                    }
                }
            }
        }
    38. VMSSE • VMSSE: Virtual Multiresolution SSE • Relative Visibility = # pixels visible / # possible pixels visible • VMSSE = f(SSE, Relative Visibility)
    39. Optimized HLOD Refinement Driven by HOQs [Charalambos07] • Exploit spatial and temporal coherence for scheduling HOQs. • Predict refinement based on node’s relative visibility from previous frame • VMSSE_i^est = SSE_i * bias_(i-1)
    40. Optimized HLOD Refinement Driven by HOQs [Charalambos07] • Example prediction • Refinement stopped for this node in previous frame • VMSSE_i^est < threshold ? Stop : Refine • Stop: • Issue query • Render without checking query
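
The prediction step can be sketched in C++ as follows. Only the definition of relative visibility and the estimate VMSSE_i^est = SSE_i * bias_(i-1) come from the slides. The slides leave f unspecified, so `computeVMSSE` here is a placeholder (SSE scaled by relative visibility), and storing the bias as last frame's VMSSE divided by last frame's SSE is likewise an assumption. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Relative visibility as defined on the slide:
// pixels visible / possible pixels visible, clamped to 1.
float relativeVisibility(unsigned int visiblePixels, unsigned int possiblePixels) {
    assert(possiblePixels > 0);
    return std::min(1.0f, static_cast<float>(visiblePixels) /
                              static_cast<float>(possiblePixels));
}

// PLACEHOLDER for f(SSE, relative visibility). The real function is not
// given on the slides; scaling SSE by visibility is only an illustration.
float computeVMSSE(float sse, float visibility) {
    return sse * visibility;
}

// Per-node state carried from one frame to the next.
struct NodeHistory {
    float bias = 1.0f;  // assumed: VMSSE / SSE from the previous frame
};

enum class Decision { Stop, Refine };

// Estimates this frame's VMSSE from this frame's SSE and last frame's bias,
// so the decision can be made before this frame's query result is back.
Decision predict(float sse, const NodeHistory& history, float threshold) {
    float estimate = sse * history.bias;
    return estimate < threshold ? Decision::Stop : Decision::Refine;
}

// Called once the query result for the node arrives, ready for next frame.
void updateBias(NodeHistory& history, float sse, float vmsse) {
    if (sse > 0.0f) history.bias = vmsse / sse;
}
```

With this arrangement a node predicted to stop can be rendered immediately while its query is still in flight, and the result only feeds the next frame's bias.
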
    41. Implementation Work • 3 HLOD algorithms including [Charalambos07] • Vertex Clustering • HLOD Tree Creation • OOC Rendering • Load/Unload Rules • Rendering • Replacement Policy • Multithreading
    42. Vertex Clustering [Rossignac93] • Fast: expected O(n) • Robustness: arbitrary topology • Capable of drastic simplification • “Easy to code” • OOC extensions [Lindstrom00]
    43. Vertex Clustering [Rossignac93] • 1. Compute per-vertex weights • 2. Assign vertices to clusters • 3. Identify highest weighted vertex in each cluster [Figure: example vertex weights 1, 1, 0.8, 0.5, 0.5]
    44. Vertex Clustering [Rossignac93] • 1. Compute per-vertex weights • 2. Assign vertices to clusters • 3. Identify highest weighted vertex in each cluster • 4. Collapse and remove degenerate triangles [Figure: example vertex weights 1, 1, 0.8]
    45. Vertex Clustering [Rossignac93]: 3,086 triangles; 52,375 triangles; 69,541 triangles
    46. Vertex Clustering [Rossignac93] • Questionable Fidelity • Hard to control output • Conservative Error Metric
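
Steps 2 to 4 of the vertex clustering slides can be sketched in C++ as below. This is a simplified illustration, not the presentation's code: it assumes a uniform grid of cubic cells as the clusters and takes the per-vertex weights of step 1 as input rather than computing them. `Vertex`, `Mesh`, and `clusterVertices` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Vertex { float x, y, z, weight; };  // weight: result of step 1
using Triangle = std::array<std::uint32_t, 3>;

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<Triangle> triangles;
};

Mesh clusterVertices(const Mesh& in, float cellSize) {
    assert(cellSize > 0.0f);
    using Cell = std::tuple<long, long, long>;
    std::map<Cell, std::uint32_t> representative;  // cell -> heaviest input vertex
    std::vector<Cell> cellOf(in.vertices.size());

    // Steps 2 and 3: assign each vertex to a grid cell and remember the
    // highest-weighted vertex seen in that cell.
    for (std::uint32_t i = 0; i < in.vertices.size(); ++i) {
        const Vertex& v = in.vertices[i];
        Cell c{static_cast<long>(std::floor(v.x / cellSize)),
               static_cast<long>(std::floor(v.y / cellSize)),
               static_cast<long>(std::floor(v.z / cellSize))};
        cellOf[i] = c;
        auto it = representative.find(c);
        if (it == representative.end() ||
            in.vertices[it->second].weight < v.weight)
            representative[c] = i;
    }

    // Emit one output vertex per occupied cell.
    Mesh out;
    std::map<Cell, std::uint32_t> outIndex;
    for (const auto& entry : representative) {
        outIndex[entry.first] = static_cast<std::uint32_t>(out.vertices.size());
        out.vertices.push_back(in.vertices[entry.second]);
    }

    // Step 4: collapse every triangle corner onto its cell's representative
    // and drop triangles that became degenerate.
    for (const Triangle& t : in.triangles) {
        Triangle r{{outIndex[cellOf[t[0]]], outIndex[cellOf[t[1]]],
                    outIndex[cellOf[t[2]]]}};
        if (r[0] != r[1] && r[1] != r[2] && r[0] != r[2])
            out.triangles.push_back(r);
    }
    return out;
}
```

A larger `cellSize` merges more vertices per cell and simplifies more drastically. The only control is the grid resolution, which is consistent with the "hard to control output" drawback noted on the slides.
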
    47. HLOD Tree Creation • Input • Model (.ply, .obj) • Target triangles per leaf node • Maximum tree depth • Output • 1 file per node • Normals computed at runtime
    48. HLOD Tree Creation • Top-down • Root node: full AABB, lowest detail
    49. HLOD Tree Creation • Splitting Planes: 2 planes, 3 planes
    50. HLOD Tree Creation • Splitting Planes
    51. HLOD Tree Creation
    52. Previous Work: Out-of-Core. Based on [Ulrich02]. Callouts: Prefetch; Need all children; To render; To refine
        visit(node) {
            if ((computeSSE(node) < pixel tolerance) || (not all children resident)) {
                render(node);
                foreach (child in node.children)
                    requestResidency(child);
            } else {
                foreach (child in node.children)
                    visit(child);
            }
        }
    53. Previous Work: Out-of-Core • [Varadhan02] • Requires full skeleton in memory • No occlusion culling • No front-to-back sorting. Image from [Varadhan02].
    54. Previous Work: Out-of-Core • [Corrêa03] • PLP in separate thread • Requires full skeleton in memory • No LOD
    55. Out-of-Core • Replacement Policy? • LRU? • Can’t refine when one child is removed • Remove deepest child in parent’s tree?
    56. OOC Rendering • Benefits of our algorithm • No full HLOD skeleton • Works with HOQs • Refinement with a subset of children • Replacement policy maximizes detail near the viewer • Multithreaded
    57. OOC Rendering: Load/Unload • HLOD tree on disk
    58. OOC Rendering: Load/Unload • Subset of HLOD tree in memory
    59. OOC Rendering: Load/Unload • Load node -> load children skeletons
    60-61. OOC Rendering: Load/Unload • Only unload dynamic leaves
    62. OOC Rendering: Load/Unload • Nodes don’t need all their children in memory
    63. OOC Rendering: Load/Unload • Result: • If a node is not a skeleton, none of its ancestors are skeletons. In other words, if a node has geometry loaded, so do all of its ancestors.
    64. OOC Rendering: Load/Unload • Never Happens: [figure]
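
The load/unload rules can be sketched in C++ as below, as one possible reading of the slides rather than the presentation's implementation. It assumes that a "skeleton" is a node without geometry, that a "dynamic leaf" is a node with geometry none of whose children have geometry, and that the root stays resident. `OocNode`, `load`, `unload`, and the other names are hypothetical. The per-node child count and the walk to the first ancestor with geometry follow the speaker notes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct OocNode {
    OocNode* parent = nullptr;
    std::vector<std::unique_ptr<OocNode>> children;  // skeletons until loaded
    bool hasGeometry = false;
    int childrenWithGeometry = 0;  // count kept per node, as in the notes
};

// Assumed meaning of "dynamic leaf": geometry loaded, no child has geometry.
bool isDynamicLeaf(const OocNode& n) {
    return n.hasGeometry && n.childrenWithGeometry == 0;
}

// Loads geometry for a skeleton and creates skeletons for its children.
// Refuses when the parent has no geometry, so the invariant "a node with
// geometry has only ancestors with geometry" cannot be broken.
bool load(OocNode& n, int childCount) {
    if (n.hasGeometry) return false;
    if (n.parent != nullptr && !n.parent->hasGeometry) return false;
    n.hasGeometry = true;  // stands in for reading the node's file from disk
    if (n.parent != nullptr) ++n.parent->childrenWithGeometry;
    for (int i = 0; i < childCount; ++i) {
        std::unique_ptr<OocNode> child(new OocNode());
        child->parent = &n;
        n.children.push_back(std::move(child));
    }
    return true;
}

// Only dynamic leaves may be unloaded. The node becomes a skeleton again
// and its child skeletons are discarded. Assumption: the root is kept.
bool unload(OocNode& n) {
    if (!isDynamicLeaf(n) || n.parent == nullptr) return false;
    n.hasGeometry = false;
    n.children.clear();
    --n.parent->childrenWithGeometry;
    return true;
}

// Walks up the tree to the first ancestor with geometry (nullptr if none).
const OocNode* firstAncestorWithGeometry(const OocNode& n) {
    const OocNode* p = n.parent;
    while (p != nullptr && !p->hasGeometry) p = p->parent;
    return p;
}
```

Because only dynamic leaves are removed, a node can lose some children and keep others, which matches the slide stating that nodes do not need all their children in memory.
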
    65. OOC Rendering: Rendering • Modify in-core HLOD • Add request queue: • Stop refinement at skeleton node • Push node onto request queue • Ensure parent safety • Render subset of parent’s geometry
    66. OOC Rendering: Subset of Parent • Use OpenGL clipping planes
    67. OOC Rendering: Subset of Parent • Without clipping planes
    68. OOC Rendering: Subset of Parent • Demo
    69. OOC Rendering: Node Replacement • Replacement List (only dynamic leaves)
    70. OOC Rendering: Node Replacement • Replacement List Partitions
    71. OOC Rendering: Node Replacement • Start Frame
    72. OOC Rendering: Node Replacement • Add Node
    73. OOC Rendering: Node Replacement • Render Node
    74. OOC Rendering: Node Replacement • Move to safety
    75. OOC Rendering: Node Replacement • Suggest Removal Node
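
The node replacement slides name the operations (start frame, add node, render node, move to safety, suggest removal) but the diagrams are not preserved, so the C++ sketch below is only one plausible interpretation. It assumes two partitions: nodes added or rendered in the current frame are "safe", every node becomes "unsafe" again at the start of the next frame, and removal is suggested from the front of the unsafe partition. The class and all member names are hypothetical, and nodes are identified by plain integers.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <unordered_map>

class ReplacementList {
public:
    // Start Frame: every node becomes a removal candidate again.
    void startFrame() {
        for (int node : safe_) entries_[node].safe = false;
        unsafe_.splice(unsafe_.end(), safe_);
    }

    // Add Node: a newly loaded dynamic leaf starts out safe.
    void add(int node) {
        assert(entries_.count(node) == 0);
        safe_.push_back(node);
        entries_[node] = Entry{std::prev(safe_.end()), true};
    }

    // Render Node / Move to safety: a node used this frame must not be
    // removed this frame. splice keeps the stored iterator valid.
    void markRendered(int node) {
        Entry& e = entries_.at(node);
        if (e.safe) return;
        safe_.splice(safe_.end(), unsafe_, e.position);
        e.safe = true;
    }

    // Suggest Removal Node: the unsafe node that has waited longest.
    // Returns false when every node is safe.
    bool suggestRemoval(int& node) const {
        if (unsafe_.empty()) return false;
        node = unsafe_.front();
        return true;
    }

    // Called after the node's geometry has actually been unloaded.
    void remove(int node) {
        Entry e = entries_.at(node);
        (e.safe ? safe_ : unsafe_).erase(e.position);
        entries_.erase(node);
    }

private:
    struct Entry {
        std::list<int>::iterator position;
        bool safe;
    };
    std::list<int> unsafe_;
    std::list<int> safe_;
    std::unordered_map<int, Entry> entries_;
};
```

This ordering behaves like LRU restricted to dynamic leaves. Whether the presentation's policy orders candidates differently in order to maximize detail near the viewer cannot be determined from the transcript.
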
    76. OOC Rendering: Multithreading
    77. Low Memory • Demo
    78. Selected Results (lol) • Load Time • 10 Blocks in Pompeii • 5,646,041 triangles
                         | Time in seconds
        Full model       | 5.2
        Out-of-Core      | 0.05
    79. Selected Results • Zoomed out rendering [Figure: View 1, View 2]
    80. Selected Results • Zoomed out rendering
                                | View 1                        | View 2
        Brute Force             | 63 fps, 5,646,041 triangles   | 63 fps, 5,646,041 triangles
        HLOD - SSE              | 1,415 fps, 161,742 triangles  | 881 fps, 302,337 triangles
        HLOD - Naive VMSSE      | 1,060 fps, 140,458 triangles  | 300 fps, 260,007 triangles
        HLOD - Scheduled VMSSE  | 1,176 fps, 140,458 triangles  | 588 fps, 270,774 triangles
    81-82. Selected Results • Zoomed In Rendering [Figure: View 3, View 4]
    83. Selected Results • Zoomed In Rendering
                                | View 3                        | View 4
        Brute Force             | 62 fps, 5,646,041 triangles   | 62 fps, 5,646,041 triangles
        HLOD - SSE              | 128 fps, 2,541,434 triangles  | 98 fps, 3,222,701 triangles
        HLOD - Naive VMSSE      | 180 fps, 346,901 triangles    | 320 fps, 46,765 triangles
        HLOD - Scheduled VMSSE  | 210 fps, 601,730 triangles    | 232 fps, 103,844 triangles
    84. Statistics • Lines of Code • GUI: 420 • Unit Tests: 1,720 • HLOD Creation: 4,600 • Rendering: 4,500 • Time Spent • Coding: 8 weeks “fulltime.” 3 last spring, 5 this fall. • Plus reading, writing, slides, and logistics.
    85. Future Work • Improve tree creation • Polygonal simplification • Splitting planes • Fill cracks • Optimal disk layout • Better occlusion performance • Multiple volumes or occlusion-preserving low LOD • Optimize use of clipping planes
    86. Future Work • Don’t require ancestors to have geometry loaded. • Much better use of memory • More complicated rendering • More rendering artifacts
    87. Future Work • Cache Management • Aggressively remove nodes • Replacement Policy: Average detail instead of best up close
    88. Future Work • Multithreading • Multiple load threads • Fault tolerance, increase throughput • Compute thread(s) • Compute normals • Decompress (/ recompress) • Vertex cache optimize?
    89. Future Work • True Usefulness • Textures • Picking on individual objects • Test with truly massive models
    90. Future Work • Today • Mad Mex Happy Hour. Now – 6:30pm • Saturday, February 7th • Graduation Party. My House. 3pm.
