Embarrassingly Parallel Computation for Occlusion Culling

One of the key challenges for the 3D rendering engines powering the next generation of console games is to minimize the resources spent on assets that do not actually contribute to the user experience. More specifically, determining which surfaces are hidden behind (occluded by) other surfaces is a very hard problem to solve in real time, but solving it typically yields significant performance gains.

Real-time occlusion culling typically requires either a vast amount of manual labor or a computationally intensive pre-processing step. In this talk, I will show how the occluder generation step can be treated as embarrassingly parallel and distributed across multiple nodes accordingly. I will also discuss how this model can be improved further.

Published in: Technology

Embarrassingly Parallel Computation for Occlusion Culling

  1. Embarrassingly Parallel Computation for Visibility
     Jasin Bushnaief, Umbra Software
  2. Who are we?
     • The only occlusion culling middleware company in the world
     • Founded in 2006
     • Based in Helsinki
     • 12 people
     • Customers: Bungie (Halo), Guerrilla (Killzone), Remedy (Alan Wake), Bioware (Mass Effect), CD Projekt (Witcher), ArenaNet (Guild Wars) and many more
  3. We’re going to talk about
     • The past
       – Brief introduction to occlusion culling
       – Traditional methods of visibility computation
     • The present
       – Umbra’s visibility computation algorithm
       – How it can be distributed
     • The future
       – Challenges of modern games and engines
  4. The Past: SO, WHAT’S OCCLUSION CULLING ANYWAY?
  5. Graphics in games
     • Game development process:
       – Artists create content
       – Engine runtime renders it
     • Rendering
       – Content consists of objects
       – Which consist of triangles
       – Which get rendered by the GPU
     • Our business: rendering optimization
  6. Occlusion culling explained
     • “Culling is the process of removing breeding animals from a group based on specific criteria.” (Wikipedia)
     • Hidden surface removal: “Which surfaces do not contribute to the final rendered image on the screen?”
     • Some popular HSR methods:
       – Frustum culling
       – Backface culling
       – Occlusion culling
  7. Occlusion culling explained
     • Occlusion culling: “Which surfaces are blocked (occluded) by other surfaces?”
     • Depth buffering is one way to do OC
       – Very accurate (i.e. pixel level)
       – Ubiquitous on hardware, easy problem to solve
       – Occurs very late in the pipeline
  8. Occlusion culling explained
     • Higher-level methods complement depth buffering nicely
     • These cull entire objects, groups of objects or entire sections of the scene
       – Not easy!
     • The earlier, the better
  9. Occlusion culling
     Only the objects visible to the camera are rendered
  10. “Traditional” way to do OC
      • Preprocess:
        – Divide scene into cells
        – Compute visibility between cells
          • Results in a visibility matrix (PVS)
      • Runtime:
        – Locate the camera
        – Do a lookup into the PVS matrix
  11. Simple example
  12. Split scene into cells
      Cell layout:
        A B
        C D
  13. Compute visibility (sampling)
      Cell layout:
        A B
        C D
      Visibility matrix (row = viewing cell, column = target cell):
          A B C D
        A 1 1 1 0
        B
        C
        D
  14. Compute visibility
      Visibility matrix:
          A B C D
        A 1 1 1 0
        B 1 1 0 1
        C
        D
  15. Compute visibility
      Visibility matrix:
          A B C D
        A 1 1 1 0
        B 1 1 0 1
        C 1 0 1 1
        D
  16. Compute visibility
      Visibility matrix:
          A B C D
        A 1 1 1 0
        B 1 1 0 1
        C 1 0 1 1
        D 0 1 1 1
  17. Runtime PVS culling
      Cell layout:
        A B
        C D
      Visibility matrix:
          A B C D
        A 1 1 1 0
        B 1 1 0 1
        C 1 0 1 1
        D 0 1 1 1
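The runtime half of the traditional approach (locate the camera, then look up its row in the PVS matrix) can be sketched in a few lines of Python. This is only an illustration of the lookup: the matrix values are the ones from the slides, while the unit-square grid layout and the function names `locate_cell` and `visible_cells` are assumptions made for the example.

```python
# Cells and visibility matrix as shown on the slides.
CELLS = ["A", "B", "C", "D"]
PVS = {
    "A": [1, 1, 1, 0],
    "B": [1, 1, 0, 1],
    "C": [1, 0, 1, 1],
    "D": [0, 1, 1, 1],
}


def locate_cell(x, y):
    """Map a camera position to a cell.

    Assumption for this sketch: the scene is a 2x2 grid of unit squares,
    with A and B in the top row (y < 1) and C and D in the bottom row.
    """
    col = 0 if x < 1.0 else 1
    row = 0 if y < 1.0 else 1
    return CELLS[row * 2 + col]


def visible_cells(camera_x, camera_y):
    """Return the cells potentially visible from the camera's cell."""
    cell = locate_cell(camera_x, camera_y)
    return [name for name, bit in zip(CELLS, PVS[cell]) if bit]


print(visible_cells(0.5, 0.5))  # camera in cell A
print(visible_cells(1.5, 1.5))  # camera in cell D
```

The lookup itself is trivial; the cost sits entirely in the preprocess that fills the matrix.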
  18. Problem?
      • Solving visibility between cells is very difficult
        – E.g. solving analytically is actually O(n^4)
      • Global operation by nature
      • Doesn’t play well with dynamic scenes
        – Worst case: a change in one cell requires recomputation of the entire matrix
  19. The Present: UMBRA DOES IT BETTER
  20. Welcome to the 2010s
      • Modern game worlds are huge
      • So it’d be cool if you didn’t need the entire scene in memory, ever
      • It’d be even cooler if the heavy lifting could be distributed. Or sent to the Cloud™
      • Buildings collapse. Things change.
  21. The Umbra approach
      • Don’t actually compute visibility for the entire scene
      • Instead, process geometry to create a data structure to solve visibility in the runtime
      • Portal culling in the runtime
  22. Data generation
      • Data = portal graph
      • Generate local graphs individually for reasonably sized geometry chunks (tiles), in parallel
      • Combine the results into a global portal graph that can be quickly traversed
      • Solve visibility quickly in the runtime using this graph
  23. Will this work?
      • Portal generation
        – Is very hard, but possible to do automatically
        – Only local geometry needed
        → Pretty much an embarrassingly parallel problem
      • Runtime
        – Not as simple as a PVS lookup, but still quite fast
  24. Simple example revisited
  25. Split geometry into tiles
  26. Dispatch tiles to worker nodes
      Tile 0, Tile 1, Tile 2, Tile 3
  27. Generate portals
      Tile 0, Tile 1, Tile 2, Tile 3
  28. Combine portal graph
  29. Runtime query: traverse portals
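A runtime portal query can be sketched as a graph traversal that starts in the camera's cell and only crosses portals that pass a visibility test. The sketch below is a simplified illustration, not Umbra's implementation: the graph layout, the object names and the `portal_visible` predicate are all hypothetical, and a real portal culler would typically narrow the view volume through each portal instead of using a fixed per-portal test.

```python
from collections import deque


def traverse_portals(graph, objects, start_cell, portal_visible):
    """Collect objects in every cell reachable through visible portals.

    graph: cell -> list of (portal_id, neighbor_cell)
    objects: cell -> list of object names in that cell
    portal_visible: hypothetical predicate, portal_id -> bool
    """
    visited = {start_cell}
    queue = deque([start_cell])
    visible = set(objects.get(start_cell, ()))
    while queue:
        cell = queue.popleft()
        for portal_id, neighbor in graph.get(cell, ()):
            if neighbor in visited or not portal_visible(portal_id):
                continue
            visited.add(neighbor)
            visible.update(objects.get(neighbor, ()))
            queue.append(neighbor)
    return visible


# Hypothetical four-cell scene connected by four portals.
graph = {
    "A": [("p_ab", "B"), ("p_ac", "C")],
    "B": [("p_ab", "A"), ("p_bd", "D")],
    "C": [("p_ac", "A"), ("p_cd", "D")],
    "D": [("p_bd", "B"), ("p_cd", "C")],
}
objects = {"A": ["crate"], "B": ["tree"], "C": ["wall"], "D": ["house"]}

# Pretend only the portal between A and B is in view.
result = traverse_portals(graph, objects, "A", lambda p: p == "p_ab")
print(sorted(result))
```

Unlike a PVS lookup, the cost of this query depends on how many portals are actually visible from the camera, which is why the slides describe it as not as simple, but still quite fast.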
  30. What did we do here?
      • Essentially a map-reduce
        – Split scene into distributable tiles
        – Generate local portal graph for each tile
        – Combine results, link global portal graph
      [Diagram: Scene → Map → Tile 0 ... Tile n → Portals 0 ... Portals n → Reduce → Global portal graph → Runtime query → Visible objects]
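The split, map and reduce steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `generate_local_portals` is a placeholder that emits one portal toward each neighboring tile (real portal generation analyzes the tile's geometry), the scene is a dictionary of 2D object positions, and a `ThreadPoolExecutor` stands in for the worker nodes. For CPU-bound work in Python one would more likely use processes or separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(scene, tile_size):
    """Group objects by the integer tile coordinate containing them."""
    tiles = {}
    for name, (x, y) in scene.items():
        key = (int(x // tile_size), int(y // tile_size))
        tiles.setdefault(key, []).append(name)
    return tiles


def generate_local_portals(item):
    """Map step: uses only the data of a single tile.

    Placeholder logic (an assumption for this sketch): one portal
    toward each of the four neighboring tile coordinates.
    """
    key, names = item
    tx, ty = key
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))
    portals = [(key, (tx + dx, ty + dy)) for dx, dy in offsets]
    return key, {"objects": sorted(names), "portals": portals}


def combine(local_results):
    """Reduce step: link local portals whose far side is an existing tile."""
    local = dict(local_results)
    graph = {key: [] for key in local}
    for data in local.values():
        for src, dst in data["portals"]:
            if dst in local:
                graph[src].append(dst)
    return graph


def build_portal_graph(scene, tile_size=10.0, workers=4):
    tiles = split_into_tiles(scene, tile_size)
    # Each tile is independent, so the map step needs no coordination.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_results = list(pool.map(generate_local_portals, tiles.items()))
    return combine(local_results)


scene = {"a": (1, 1), "b": (12, 1), "c": (1, 12)}
print(build_portal_graph(scene))
```

The property that makes the map step embarrassingly parallel is visible in the signature of `generate_local_portals`: it receives one tile and nothing else.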
  31. The Future: THE NEXT GENERATION
  32. Turns out...
      • Even the initial “map” is too much for large game worlds
      • A global graph of a vast world is too expensive in the runtime
      • You need to support multiple versions of some chunks for dynamic content
        – Quite a combinatorial problem
      → Next-gen games require an even better solution!
  33. So we did something like this
      [Diagram: Tile 0 ... Tile n → Portals 0 ... Portals n; in the runtime, groups of per-tile portal sets are combined into separate graphs (Graph A, Graph B), and each graph is queried for visible objects]
  34. Got rid of “map”
      [Same diagram as slide 33]
  35. Split up “reduce”, moved to runtime
      [Same diagram as slide 33]
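One way to read "split up reduce, moved to runtime" is that per-tile portal data is kept separate and only the tiles around the camera are combined into a small graph on demand. The sketch below illustrates that reading only; it is an assumption, not a description of Umbra's actual design. The class `RuntimeGraphs`, its methods and the square neighborhood rule are hypothetical names and choices made for the example.

```python
def tiles_near(center, radius):
    """Tile coordinates in a square neighborhood around a center tile."""
    cx, cy = center
    span = range(-radius, radius + 1)
    return [(cx + dx, cy + dy) for dx in span for dy in span]


class RuntimeGraphs:
    """Combines per-tile portal data into small graphs on demand."""

    def __init__(self, tile_portals):
        # tile coordinate -> list of neighboring tile coordinates
        self.tile_portals = dict(tile_portals)
        self.cache = {}

    def graph_for(self, camera_tile, radius=1):
        """Return (and cache) the graph covering tiles near the camera."""
        key = (camera_tile, radius)
        if key not in self.cache:
            wanted = {t for t in tiles_near(camera_tile, radius)
                      if t in self.tile_portals}
            self.cache[key] = {
                t: [n for n in self.tile_portals[t] if n in wanted]
                for t in wanted
            }
        return self.cache[key]

    def replace_tile(self, tile, neighbors):
        """Swap in a new version of one tile (e.g. a collapsed building).

        Only cached graphs whose neighborhood covers the tile are dropped.
        """
        self.tile_portals[tile] = neighbors
        self.cache = {
            (center, radius): graph
            for (center, radius), graph in self.cache.items()
            if max(abs(tile[0] - center[0]), abs(tile[1] - center[1])) > radius
        }


# Hypothetical strip of four tiles.
rt = RuntimeGraphs({
    (0, 0): [(1, 0)],
    (1, 0): [(0, 0), (2, 0)],
    (2, 0): [(1, 0), (3, 0)],
    (3, 0): [(2, 0)],
})
print(rt.graph_for((0, 0)))
print(rt.graph_for((3, 0)))
```

In this arrangement a change to one tile only invalidates the small graphs that include it, which addresses the dynamic-content problem raised on slide 32 without rebuilding a global graph.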
  36. Questions? jasin@umbrasoftware.com
