Introduction to Massive Model Visualization
Patrick Cozzi
Analytical Graphics, Inc.
Contents
  • Minimal computer graphics background
  • Culling
  • Level of Detail (LOD)
  • Memory Management
  • Videos throughout
Computer Graphics Background
  • Goal: convert a 3D model to pixels
  • 3D models are composed of triangles
Computer Graphics Background
  • 1 triangle = 3 vertices
  • Gross oversimplification: 3 floats per vertex
  • (x0, y0, z0), (x1, y1, z1), (x2, y2, z2)
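A back-of-the-envelope Python sketch of what that simplification costs in memory. It assumes 32-bit floats, positions only, and no vertex sharing between triangles (real formats add normals, texture coordinates, and index buffers), and it treats the ~350 million polygons of the Boeing 777 model as triangles. The helper names are illustrative, not from the slides.

```python
import struct

FLOATS_PER_VERTEX = 3      # x, y, z (the slide's simplification)
BYTES_PER_FLOAT = 4        # 32-bit IEEE 754 single precision
VERTICES_PER_TRIANGLE = 3


def triangle_bytes() -> int:
    """Bytes for one triangle that stores its own three positions (no sharing)."""
    return VERTICES_PER_TRIANGLE * FLOATS_PER_VERTEX * BYTES_PER_FLOAT


def pack_triangle(v0, v1, v2) -> bytes:
    """Pack three (x, y, z) positions as nine little-endian 32-bit floats."""
    return struct.pack("<9f", *v0, *v1, *v2)


def model_gigabytes(num_triangles: int) -> float:
    """Approximate size in GB (10**9 bytes) of a position-only, non-indexed mesh."""
    return num_triangles * triangle_bytes() / 1e9


print(triangle_bytes())                                      # 36
print(len(pack_triangle((0, 0, 0), (1, 0, 0), (0, 1, 0))))   # 36
print(round(model_gigabytes(350_000_000), 1))                # 12.6
```

Even under these generous assumptions the raw positions alone run to gigabytes, which is the pressure the later culling, LOD, and memory-management sections respond to.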
Computer Graphics Background
  • Triangles go through the graphics pipeline to become pixels
  • View parameters define the size and shape of the visible region of the world (the view frustum)
  [Figure: the viewer, the near plane, and the far plane.]
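The view frustum can be written as six planes. A minimal Python sketch, assuming an OpenGL-style eye space (camera at the origin looking down -z), a symmetric perspective projection, and planes stored as (a, b, c, d) with unit normals pointing into the frustum; `frustum_planes` and `inside` are hypothetical helper names.

```python
import math


def frustum_planes(fovy_deg, aspect, near, far):
    """Six inward-facing planes (a, b, c, d) of a symmetric eye-space frustum.
    A point p is on the inside of a plane when a*px + b*py + c*pz + d >= 0."""
    half_y = math.radians(fovy_deg) / 2
    half_x = math.atan(aspect * math.tan(half_y))   # horizontal half-angle
    cx, sx = math.cos(half_x), math.sin(half_x)
    cy, sy = math.cos(half_y), math.sin(half_y)
    return [
        (0.0, 0.0, -1.0, -near),   # near:   z <= -near
        (0.0, 0.0, 1.0, far),      # far:    z >= -far
        (cx, 0.0, -sx, 0.0),       # left:   x >= z * tan(half_x)
        (-cx, 0.0, -sx, 0.0),      # right:  x <= -z * tan(half_x)
        (0.0, cy, -sy, 0.0),       # bottom: y >= z * tan(half_y)
        (0.0, -cy, -sy, 0.0),      # top:    y <= -z * tan(half_y)
    ]


def inside(planes, p):
    """True if point p is on the inside of every plane."""
    return all(a * p[0] + b * p[1] + c * p[2] + d >= 0 for a, b, c, d in planes)


planes = frustum_planes(90, 1.0, 1.0, 100.0)
print(inside(planes, (0, 0, -5)))    # True
print(inside(planes, (0, 0, -0.5)))  # False: in front of the near plane
print(inside(planes, (10, 0, -5)))   # False: right of the frustum
print(inside(planes, (0, 0, -200)))  # False: beyond the far plane
```

Extracting the same planes directly from a combined projection-view matrix is a common alternative; this version spells out the geometry instead.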
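Computer Graphics Background
  [Figure: the graphics pipeline. The CPU sends data over the PCIe bus to the GPU, which performs vertex processing, geometry processing, and fragment processing, writing to the color, depth, and stencil buffers; the result is shown on the monitor.]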
Computer Graphics Background: Visible Surfaces
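Visible surfaces are usually resolved per pixel with the depth buffer from the pipeline. A minimal Python sketch, assuming rasterization has already produced fragments as (x, y, depth, color) tuples, with smaller depth meaning closer to the viewer (a "less" depth test); `resolve_visible_surfaces` is a hypothetical name.

```python
import math


def resolve_visible_surfaces(width, height, fragments, clear_color=None):
    """Keep, for each pixel, the color of the closest fragment seen so far."""
    depth = [[math.inf] * width for _ in range(height)]
    color = [[clear_color] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < depth[y][x]:          # closer than what is stored: it wins
            depth[y][x] = z
            color[y][x] = c
    return color, depth


# Two fragments land on pixel (0, 0); the red one is closer.
frags = [(0, 0, 0.8, "blue"), (0, 0, 0.3, "red"), (1, 0, 0.5, "green")]
color, depth = resolve_visible_surfaces(2, 1, frags)
print(color)  # [['red', 'green']]
```

Because the test keeps the minimum depth, the result does not depend on the order fragments arrive in. Submitting geometry roughly front to back still helps, since later, hidden fragments fail the test early.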
Example Massive Models
  • Procedurally generated model of Pompeii: ~1.4 billion polygons. Image from [Mueller06]
Example Massive Models
  • Boeing 777 model: ~350 million polygons. Image from http://graphics.cs.uni-sb.de/MassiveRT/boeing777.html
Example Massive Models
Trends
  • No upper bound on model complexity
  • Procedural generation
  • Laser scans and aerial imagery
  Image from [Lakhia06]
Trends
  • High GPU throughput: at least 10-200 million triangles per second
  • Widening gap between processor and memory performance
  • CPU-GPU bottleneck
Goal
  • Output sensitivity: performance as a function of the number of pixels rendered, not the size of the model
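View Frustum Culling
  • Use a bounding volume (BV): spheres, axis-aligned bounding boxes (AABBs), oriented bounding boxes (OOBBs), etc.
  [Figure: objects whose BVs intersect the view frustum are rendered; the rest are culled.]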
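For the bounding-sphere case, a minimal Python sketch of the test. It assumes the frustum is given as six planes (a, b, c, d) with unit normals pointing inward; the test box and the function name are hypothetical. The test is conservative: a sphere near a frustum corner may be kept even though it is outside, which costs extra rendering but never drops visible geometry.

```python
from typing import Sequence, Tuple

Plane = Tuple[float, float, float, float]   # (a, b, c, d): unit normal points inward


def sphere_in_frustum(center, radius, planes: Sequence[Plane]) -> bool:
    """Conservative test: False only when the sphere is entirely outside some plane."""
    cx, cy, cz = center
    for a, b, c, d in planes:
        signed_distance = a * cx + b * cy + c * cz + d
        if signed_distance < -radius:
            return False    # completely on the outside of this plane: cull
    return True             # inside or straddling every plane: render


# Hypothetical test volume: a box from -10 to 10 on each axis (an orthographic frustum).
box = [(1, 0, 0, 10), (-1, 0, 0, 10), (0, 1, 0, 10),
       (0, -1, 0, 10), (0, 0, 1, 10), (0, 0, -1, 10)]
print(sphere_in_frustum((0, 0, 0), 1, box))     # True: inside
print(sphere_in_frustum((10.5, 0, 0), 1, box))  # True: straddles x = 10
print(sphere_in_frustum((20, 0, 0), 1, box))    # False: outside
```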
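View Frustum Culling
  [Figure: objects 0-5 in the scene and a bounding volume hierarchy grouping them.]
View Frustum Culling
  [Figure: objects 0-5 in the scene and a bounding volume hierarchy grouping them.]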
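Hierarchical culling over a bounding volume hierarchy can be sketched in Python as a recursive traversal: one failed test rejects a whole subtree, and children are visited nearest first so visible objects come out roughly front to back. The bounding spheres, the test box, the scene, and the names `BVNode`, `outside`, and `cull` are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BVNode:
    center: tuple                   # bounding sphere center
    radius: float                   # bounding sphere radius
    obj_id: Optional[int] = None    # leaves carry the id of the object they bound
    children: List["BVNode"] = field(default_factory=list)


def outside(center, radius, planes):
    """True if the sphere is entirely outside one of the inward-facing planes."""
    return any(a * center[0] + b * center[1] + c * center[2] + d < -radius
               for a, b, c, d in planes)


def cull(node, planes, eye, visible) -> int:
    """Append visible object ids to `visible`; return how many BV tests were run."""
    tests = 1
    if outside(node.center, node.radius, planes):
        return tests                # one test rejects the whole subtree
    if node.obj_id is not None:
        visible.append(node.obj_id)
    # Visit nearer children first so objects come out roughly front to back.
    for child in sorted(node.children, key=lambda n: math.dist(n.center, eye)):
        tests += cull(child, planes, eye, visible)
    return tests


box = [(1, 0, 0, 10), (-1, 0, 0, 10), (0, 1, 0, 10),
       (0, -1, 0, 10), (0, 0, 1, 10), (0, 0, -1, 10)]   # hypothetical frustum
near_group = BVNode((0, 0, 0), 3, children=[BVNode((2, 0, 0), 1, 1), BVNode((-2, 0, 0), 1, 0)])
far_group = BVNode((50, 0, 0), 3, children=[BVNode((48, 0, 0), 1, 2), BVNode((52, 0, 0), 1, 3)])
root = BVNode((25, 0, 0), 30, children=[far_group, near_group])

visible = []
print(cull(root, box, eye=(-9, 0, 0), visible=visible), visible)  # 5 [0, 1]
```

In this tiny scene the hierarchy runs 5 tests where brute force over the four objects would run 4, one illustration of how hierarchical culling can lose to brute force on small scenes; it pays off when large subtrees are rejected with a single test.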
View Frustum Culling Demo
Occlusion Culling
  • Effective in scenes with high depth complexity
  [Figure: objects hidden behind other objects are culled.]
Occlusion Culling
  • From-region or from-point
  • Most are conservative
  • Occluder fusion
  • Difficult for general scenes with arbitrary occluders, so make simplifying assumptions:
    • [Wonka00]: urban environments
    • [Ohlarik08]: planets and satellites
Hardware Occlusion Queries
  • From-point visibility that handles general scenes with arbitrary occluders and occluder fusion
  • How? Use the GPU
Hardware Occlusion Queries
  • Render occluders
  • Render the object’s BV using an HOQ
  • Render the full object based on the result
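The three steps can be simulated on the CPU to show what the query measures. This Python sketch is an illustration only: occluders and bounding volumes are hypothetical screen-space rectangles at constant depth, and `occlusion_query` counts samples that pass the depth test, in the spirit of OpenGL's GL_SAMPLES_PASSED query, without touching any real graphics API. Using each BV's nearest depth keeps the result conservative.

```python
import math


def make_depth_buffer(width, height):
    return [[math.inf] * width for _ in range(height)]


def draw_rect(depth, x0, y0, x1, y1, z):
    """Step 1: rasterize an occluder (a screen-space rectangle at constant depth)."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            depth[y][x] = min(depth[y][x], z)


def occlusion_query(depth, x0, y0, x1, y1, z):
    """Step 2: count BV samples that pass the depth test. As is typical for a real
    query, the BV is not written to the depth buffer, so occluders are unchanged."""
    return sum(1 for y in range(y0, y1) for x in range(x0, x1) if z < depth[y][x])


depth = make_depth_buffer(8, 8)
draw_rect(depth, 0, 0, 6, 8, z=0.2)          # a wall covering columns 0-5

# Hypothetical objects: BV rectangle (x0, y0, x1, y1) and the BV's nearest depth.
objects = {"behind_wall": (1, 1, 5, 5, 0.7), "peeking_out": (4, 2, 8, 6, 0.7)}

# Step 3: render the full object only if its BV produced visible samples.
rendered = [name for name, bv in objects.items() if occlusion_query(depth, *bv) > 0]
print(rendered)  # ['peeking_out']
```

Drawing a second occluder over columns 6-7 would drive the remaining count to zero: the depth buffer fuses occluders without any extra work.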
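Hardware Occlusion Queries
  • CPU stalls and GPU starvation
  [Figure: two CPU/GPU timelines. Without queries, the CPU issues Draw o1, Draw o2, Draw o3 and the GPU executes them in a steady stream. With a query, the CPU issues Query o1 and stalls waiting for the result before issuing Draw o1, while the GPU starves for work.]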
Is Culling Enough?
Is Culling Enough?
Hardware Occlusion Queries Demo
Level of Detail
  • Generation: fewer triangles, simpler shader
  • Selection: distance, pixel size
  • Switching: avoid popping
  • Discrete, continuous, hierarchical
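Selection by pixel size needs an estimate of how large an object appears on screen. A minimal Python sketch, assuming a perspective camera, a bounding sphere near the view axis, and a distance larger than the radius; `projected_diameter_pixels` is a hypothetical name.

```python
import math


def projected_diameter_pixels(radius, distance, viewport_height, fovy_deg):
    """Approximate on-screen diameter, in pixels, of a bounding sphere seen by a
    perspective camera. Assumes distance > radius and a sphere near the view axis."""
    half_fov = math.radians(fovy_deg) / 2
    return (2 * radius) * viewport_height / (2 * distance * math.tan(half_fov))


size = projected_diameter_pixels(radius=1.0, distance=10.0, viewport_height=1000, fovy_deg=90)
print(round(size))  # 100
```

A selector could switch to a coarser level as this size shrinks; the thresholds are an application choice. Unlike plain distance, the pixel size also accounts for viewport resolution and field of view.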
Discrete LOD
  [Figure: three levels of detail with 3,086, 52,375, and 69,541 triangles.]
Discrete LOD
  • Not enough detail up close
  • Too much detail in the distance
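Discrete selection by distance amounts to a table lookup. In this Python sketch the triangle counts come from the Discrete LOD figure, while the switch distances and names are hypothetical values chosen for illustration.

```python
import bisect

# Triangle counts from the Discrete LOD figure, finest first.
LOD_TRIANGLES = [69_541, 52_375, 3_086]
# Hypothetical switch distances: level i is used for distances below SWITCH[i].
SWITCH = [50.0, 200.0]


def select_lod(distance: float) -> int:
    """Index of the level to draw: 0 (finest) up close, coarser farther away."""
    return bisect.bisect_right(SWITCH, distance)


for d in (10.0, 100.0, 1000.0):
    print(d, LOD_TRIANGLES[select_lod(d)])
# 10.0 69541
# 100.0 52375
# 1000.0 3086
```

Because a whole object gets a single level, a large model that spans from near the viewer into the distance is too coarse up close or too fine far away, whichever level is chosen. The jumps at the switch distances are also where popping shows up.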
Continuous LOD
  [Figure: an edge collapse and its inverse, a vertex split. Image from [Luebke01]]
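An edge collapse on an indexed triangle list can be sketched in a few lines of Python. This is a minimal illustration with a hypothetical `edge_collapse` helper: it merges vertex `remove` into vertex `keep` and drops triangles that become degenerate, but it does not choose edges by an error metric, check for fold-overs, or record the data a vertex split would need to undo the collapse.

```python
def edge_collapse(triangles, keep, remove):
    """Collapse the edge (keep, remove) of an indexed triangle list.

    Every reference to vertex `remove` is redirected to `keep`; triangles that
    end up with a repeated vertex are degenerate and are dropped."""
    result = []
    for tri in triangles:
        moved = tuple(keep if v == remove else v for v in tri)
        if len(set(moved)) == 3:
            result.append(moved)
    return result


# A six-triangle fan around center vertex 6 with rim vertices 0-5.
fan = [(6, 0, 1), (6, 1, 2), (6, 2, 3), (6, 3, 4), (6, 4, 5), (6, 5, 0)]
print(edge_collapse(fan, keep=0, remove=6))
# [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5)]
```

Collapsing one spoke of the fan removes one vertex and two triangles; applying collapses one at a time, and splits in reverse, is what lets continuous LOD change detail in small steps.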
Hierarchical LOD
  [Figure: 1 node, 3,086 triangles; 4 nodes, 9,421 triangles; 16 nodes, 77,097 triangles.]
Hierarchical LOD
  [Figure: 1 node, 3,086 triangles; 4 nodes, 9,421 triangles; 16 nodes, 77,097 triangles.]
Hierarchical LOD
  visit(node) {
    if ((computeSSE(node) < pixelTolerance) || (node has no children)) {
      render(node);
    } else {
      foreach (child in node.children)
        visit(child);
    }
  }
  [Figure: node refinement.]
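A runnable Python version of the traversal. The screen-space error formula, which scales a node's geometric error by the perspective projection, is an assumption (the slide does not define computeSSE), as are the precomputed per-node camera distances, the example tree, and the name `HlodNode`. The leaf check keeps a leaf whose error is still above tolerance from being skipped entirely.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class HlodNode:
    name: str
    geometric_error: float    # world-space error of this node's simplified geometry
    distance: float           # distance from the camera (precomputed for brevity)
    children: List["HlodNode"] = field(default_factory=list)


def compute_sse(node, viewport_height=1000, fovy_deg=90.0):
    """Screen-space error in pixels: geometric error projected by a perspective camera."""
    half_fov = math.radians(fovy_deg) / 2
    return node.geometric_error * viewport_height / (2 * node.distance * math.tan(half_fov))


def visit(node, pixel_tolerance, rendered):
    if compute_sse(node) < pixel_tolerance or not node.children:
        rendered.append(node.name)           # accurate enough, or nothing finer exists
    else:
        for child in node.children:          # too coarse: refine into children
            visit(child, pixel_tolerance, rendered)


root = HlodNode("root", 16.0, 100.0, [
    HlodNode("A", 8.0, 20.0, [HlodNode("A0", 0.0, 20.0), HlodNode("A1", 0.0, 25.0)]),
    HlodNode("B", 8.0, 400.0, [HlodNode("B0", 0.0, 390.0), HlodNode("B1", 0.0, 410.0)]),
])
rendered = []
visit(root, pixel_tolerance=16.0, rendered=rendered)
print(rendered)  # ['A0', 'A1', 'B']
```

The nearby subtree A refines down to its leaves while the distant subtree B is drawn from its coarser node, which is the mix of detail a single discrete LOD cannot provide.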
Hierarchical LOD
Hierarchical LOD
Hierarchical LOD Demo
Hierarchical LOD
  • Easy to:
    • Add view frustum culling
    • Add occlusion culling via HOQs
    • Render front to back
    • Use VMSSE to drive refinement
  • Requires preprocessing
  • Is culling + HLOD enough?
Memory Management
  • Out-of-core
  • Compression
  • Cache-coherent layouts
Out-of-Core HLOD
  visit(node) {
    if ((computeSSE(node) < pixelTolerance) ||
        (node has no children) ||
        (not all children resident)) {
      render(node);
      foreach (child in node.children)
        requestResidency(child);
    } else {
      foreach (child in node.children)
        visit(child);
    }
  }
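A Python sketch of the out-of-core traversal, with an assumed perspective screen-space error formula, precomputed camera distances, and a hypothetical example tree. `resident` and `requests` are hypothetical stand-ins for the geometry cache and the loader's request queue. As in the pseudocode, a node that is already accurate enough still requests its children, which acts as a one-level prefetch.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class HlodNode:
    name: str
    geometric_error: float
    distance: float                     # precomputed camera distance (illustrative)
    children: List["HlodNode"] = field(default_factory=list)


def compute_sse(node, viewport_height=1000, fovy_deg=90.0):
    half_fov = math.radians(fovy_deg) / 2
    return node.geometric_error * viewport_height / (2 * node.distance * math.tan(half_fov))


def visit(node, pixel_tolerance, resident, rendered, requests):
    """One frame of out-of-core HLOD: never wait on I/O; draw what is resident."""
    all_resident = all(c.name in resident for c in node.children)
    if compute_sse(node) < pixel_tolerance or not node.children or not all_resident:
        rendered.append(node.name)
        for child in node.children:
            if child.name not in resident and child.name not in requests:
                requests.append(child.name)      # hand off to the loader
    else:
        for child in node.children:
            visit(child, pixel_tolerance, resident, rendered, requests)


root = HlodNode("root", 16.0, 100.0, [
    HlodNode("A", 8.0, 20.0, [HlodNode("A0", 0.0, 20.0), HlodNode("A1", 0.0, 25.0)]),
    HlodNode("B", 8.0, 400.0, [HlodNode("B0", 0.0, 390.0), HlodNode("B1", 0.0, 410.0)]),
])

resident = {"root", "A", "B"}
rendered, requests = [], []
visit(root, 16.0, resident, rendered, requests)
print(rendered, requests)    # ['A', 'B'] ['A0', 'A1', 'B0', 'B1']

resident.update(requests)    # pretend the loader finished between frames
rendered2, requests2 = [], []
visit(root, 16.0, resident, rendered2, requests2)
print(rendered2, requests2)  # ['A0', 'A1', 'B'] []
```

The first frame draws the coarser A instead of stalling on A0 and A1; once they arrive, the next frame refines as the plain HLOD traversal would.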
Out-of-Core HLOD
  • Multithreaded
    • Disk reads
    • Decompression, normal generation, etc.
  • Cache management
    • Prioritize reads, e.g., by distance from viewer
    • Replacement policy
  • Skeleton in memory?
    • BV, error metric, parent-child relationships
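The loading side can be sketched with Python's standard `threading` and `queue` modules. Everything here is an assumption for illustration: a single worker thread, requests prioritized by distance from the viewer (nearest first), least-recently-used (LRU) replacement, and a hypothetical `load_from_disk` standing in for the disk read, decompression, and normal generation.

```python
import itertools
import queue
import threading
from collections import OrderedDict


class NodeCache:
    """Thread-safe geometry cache with least-recently-used (LRU) replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()
        self._lock = threading.Lock()

    def put(self, name, payload):
        with self._lock:
            self._data[name] = payload
            self._data.move_to_end(name)
            while len(self._data) > self.capacity:
                self._data.popitem(last=False)     # evict the least recently used node

    def get(self, name):
        with self._lock:
            if name not in self._data:
                return None
            self._data.move_to_end(name)           # mark as recently used
            return self._data[name]

    def names(self):
        with self._lock:
            return list(self._data)


def load_from_disk(name):
    """Hypothetical stand-in for the disk read, decompression, and normal generation."""
    return f"geometry:{name}"


def loader_thread(requests, cache, load_order):
    while True:
        _, _, name = requests.get()               # smallest distance comes out first
        if name is None:                          # sentinel: shut down
            requests.task_done()
            return
        cache.put(name, load_from_disk(name))
        load_order.append(name)
        requests.task_done()


requests = queue.PriorityQueue()
seq = itertools.count()                           # tie-breaker: never compare names
for name, distance in [("far", 300.0), ("near", 5.0), ("mid", 40.0)]:
    requests.put((distance, next(seq), name))

cache, load_order = NodeCache(capacity=2), []
worker = threading.Thread(target=loader_thread, args=(requests, cache, load_order), daemon=True)
worker.start()
requests.join()                                   # wait until every request is handled
requests.put((float("inf"), next(seq), None))     # then stop the worker
worker.join()
print(load_order)     # ['near', 'mid', 'far']
print(cache.names())  # ['mid', 'far']
```

The requests are queued before the worker starts only to make the order deterministic here; a real renderer would keep adding requests while the worker runs and refresh priorities as the camera moves.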
Out-of-Core Prefetching
  • Reduce geometry cache misses
  • Predict and load required nodes
Out-of-Core Prefetching
  • Predict the camera position [Correa03]
  [Figure: the current camera (v, f) and the predicted camera (v', f').]
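The slide does not give the predictor, so this Python sketch assumes the simplest one: constant-velocity extrapolation of the camera position v and, treating f as the view direction (also an assumption), of f over the last two frames. It is not necessarily the scheme used in [Correa03], and `predict_camera` is a hypothetical name.

```python
import math


def predict_camera(prev_pos, pos, prev_dir, cur_dir, frames_ahead=1):
    """Constant-velocity extrapolation of the camera position and view direction."""
    next_pos = tuple(p + frames_ahead * (p - q) for p, q in zip(pos, prev_pos))
    raw_dir = tuple(d + frames_ahead * (d - e) for d, e in zip(cur_dir, prev_dir))
    length = math.sqrt(sum(c * c for c in raw_dir))
    if length == 0.0:
        return next_pos, tuple(cur_dir)       # degenerate extrapolation: keep current
    return next_pos, tuple(c / length for c in raw_dir)


pos, direction = predict_camera((0, 0, 0), (1, 0, 0), (1, 0, 0), (1, 0, 0), frames_ahead=2)
print(pos, direction)  # (3, 0, 0) (1.0, 0.0, 0.0)
```

The predicted camera could then drive the same residency requests as the current one, so geometry starts loading before it is needed.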
Out-of-Core Prefetching [Varadhan02]
  • Coherence of the front
  • Prefetch ascendants/descendants
  • Prefetch with an enlarged view frustum
  • Prioritize
  [Figure: a hierarchy of nodes 0-7.]
Compression
  • “Size is speed”
  • Geometric: vertices, indices; improves I/O and rendering performance
  • Texture: performance or quality
  [Figure: data moving from disk to rendering, with de/re-compression in between.]
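Geometric compression can take many forms, and the slide does not name one. As one common example, this Python sketch quantizes vertex coordinates relative to a node's bounding box into 16-bit integers, halving their size compared with 32-bit floats at the cost of an error of at most half a quantization step. The bounding box, bit count, and helper names are assumptions.

```python
import struct


def quantize(values, lo, hi, bits=16):
    """Map floats in [lo, hi] (e.g., one axis of a node's bounding box) to integers."""
    scale = (2 ** bits - 1) / (hi - lo)
    return [round((v - lo) * scale) for v in values]


def dequantize(codes, lo, hi, bits=16):
    """Recover approximate floats; the error is at most half a quantization step."""
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + c * step for c in codes]


coords = [0.0, 1.25, 3.7, 10.0]                  # x coordinates inside [0, 10]
codes = quantize(coords, 0.0, 10.0)
packed = struct.pack(f"<{len(codes)}H", *codes)  # 2 bytes per value instead of 4
restored = dequantize(codes, 0.0, 10.0)
max_error = max(abs(a - b) for a, b in zip(coords, restored))
print(len(packed))  # 8
```

Smaller data helps twice: fewer bytes to read from disk, and fewer bytes to move across the bus and through the GPU, provided decompression is cheap.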
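Cache Coherent Layouts
  • Reorder vertices and indices to maximize GPU cache hits
  [Figure: GPU main memory feeds the pre-VS (pre-transform) cache, the vertex shader, the post-VS (post-transform) cache, and primitive assembly. Reordering vertices improves pre-VS cache hits; reordering triangles improves post-VS cache hits.]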
Cache Coherent Layouts
  • Minimize ACMR (average cache miss ratio)
  • Cache oblivious [Yoon05]
  • Linear time [Sander07]
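ACMR is the number of vertices the GPU must transform (post-transform cache misses) divided by the number of triangles: 3.0 means no reuse at all, and large regular meshes, which have roughly twice as many triangles as vertices, can approach about 0.5. A minimal Python sketch that measures it for an index buffer, assuming a FIFO cache of 16 entries; real cache sizes and replacement policies vary by GPU, and neither [Yoon05] nor [Sander07] is implemented here.

```python
from collections import deque


def acmr(indices, cache_size=16):
    """Average cache miss ratio of an indexed triangle list: vertices transformed
    (post-transform cache misses) per triangle, with a simulated FIFO cache."""
    cache = deque()
    misses = 0
    for v in indices:
        if v not in cache:
            misses += 1
            cache.append(v)
            if len(cache) > cache_size:
                cache.popleft()          # FIFO: evict the oldest entry
    return misses / (len(indices) // 3)


fan = [0, 1, 2, 0, 2, 3, 0, 3, 4, 0, 4, 5, 0, 5, 6, 0, 6, 7, 0, 7, 8, 0, 8, 9]
print(acmr(fan))                  # 1.25: each of the 10 vertices transformed once
print(acmr([0, 1, 2, 3, 4, 5]))   # 3.0: no reuse between the two triangles
```

A layout optimizer reorders the index buffer to lower this number; the simulator above is a cheap way to compare two orderings of the same triangles.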
Not Covered Today
  • Clustered backface culling
  • Image-based rendering (IBR)
  • Sorting
  • Batching
  • Ray tracing
Summary
  • Combine culling, LOD, and out-of-core techniques
  • Keep the CPU and GPU busy
  • Exploit coherence: spatial and temporal


Editor's Notes

  • #2 TODO: BV slide?
  • #3 Also good stuff for game engines
  • #10 30-60 GB on disk. 12 CDs
  • #13 CPU performance grew about 60% per year for two decades, while main memory and disk access times decreased by only 7-10% per year
  • #15 Can be slower than brute force. When?
  • #16 Spatial data structures exploit spatial coherence. Visit nodes in front to back order. Useful for early-z and occlusion culling.
  • #22 The bounding volume usually has far less geometry. Expensive shaders required to render the object are not generally required to render the bounding volume. When only depth testing is enabled, as is the case when rendering the bounding volume, today’s GPUs use a higher-performance rendering path.
  • #35 Solutions: restricted triangles, runtime stitching, skirts