Visibility Optimization for Games

A presentation held by Umbra Software lead programmer Sampo Lappalainen at China Game Developer Conference 2011.

Published in: Technology, Design

Notes
  • We build a graphics engine that draws stuff on the screen -> easy. An artist makes models -> the models are handed to the graphics engine and drawn. Engineers optimize the graphics engine and artists optimize the graphics until performance is acceptable. Then we end up in this situation...
  • How did we get from the original situation to this? The tech limited what could be done so much that this was the best that could be achieved.
  • Game developers build tech so that games can be made to look good and artists can create fancier content. The problem is not drawing great graphics; the problem is drawing great graphics fast enough.
  • TODO picture TODO reference
  • TODO rename slide TODO pictures from Teppo’s presentation
  • TODO code?
  • There is still a lot left to cull inside the view frustum.
  • TODO sources
  • TODO sources
  • Object 3 has just become visible.
  • TODO note about SIMD? TODO MORE BEEF!
  • TODO rethink
  • TODO rethink
  • TODO describe how it works
  • An example follows.
  • TODO Video
  • TODO picture of how it actually works
  • TODO picture of portal vs PVS culling
  • TODO link to paper
  • Transcript

    • 1. Visibility Optimization for Games Sampo Lappalainen Lead Programmer Umbra Software Ltd.
    • 2. Introduction
      • Background in graphics programming
      • Hybrid Graphics, NVIDIA, Umbra Software
      • With Umbra since 2008
      • Graphics middleware for console and PC games
      • Emphasis on visibility
    • 3. Roadmap
      • Motivation
      • Theory
      • Practice
      • Other applications
      • Demo
    • 4. MOTIVATION
      • Why is visibility optimization important?
    • 5. Game World
    • 6. Our Villain
    • 7. Our Hero
    • 8. Screen Shot
    • 9. Game Worlds
      • Game developers want to make impressive game worlds
      • Hardware sets limits on what can and can’t be done.
      • Game developers need to push the hardware to its limits.
    • 10. Visibility Optimization
      • The most effective way to gain performance in games.
      • Two basic ways to do visibility optimization:
        • art and level design
        • technology
      • Games use a mix of both.
    • 11. Visibility Optimization by Level Design
      • Artists design game worlds so that performance is acceptable.
      • Can be done in numerous ways e.g.:
        • limiting view distance
        • limiting polygon or object count
        • modeling portals and cells
    • 12. Visibility Optimization by Level Design
    • 13. Visibility Optimization by Level Design
      • Time consuming and usually boring work.
      • Sets huge limits on what can and cannot be done.
      • May lead to monotonous level design.
      • Manual and non-recurring work.
    • 14. Visibility Optimization by Technology
    • 15. Visibility Optimization by Technology
    • 16. Visibility Optimization by Technology
      • Gains:
        • No time wasted on rendering objects that don’t contribute to the output image (no state changes, no draw calls etc).
        • AI, physics, game logic etc. can be done at lower accuracy (or skipped altogether) for hidden objects.
    • 17. THEORY
      • Walkthrough of the key concepts
    • 18. Terminology
      • Culling – removing hidden objects from rendering
      • Target – object that can be hidden by others
      • Occluder – an object that blocks visibility
      • Rendering artifact – an unintended glitch in the output image
    • 19. Metrics for comparison
      • GPU cost
      • CPU cost
      • Overall frame time
      • Memory usage
      • Precomputation time
      • Manual work
      • Culling power
    • 20. Backface culling
      • Taken care of by the HW
      • Culling entire triangles based on their winding
      • No need to render the insides of an object
    • 21. Depth buffering
      • Taken care of by the HW
      • A two-dimensional buffer for storing z-values for each screen pixel
      • Before processing shaders for a pixel to be rendered, test the z-value.
      • Allows drawing of unsorted geometry, however sorting still greatly improves performance
    • 22. Hierarchical depth buffering
      • Replace depth buffer with a depth pyramid
        • Bottom of the pyramid: full-resolution depth buffer
        • Higher levels: smaller resolution depth buffers where a single pixel represents the maximum z-value in a group of pixels in the level below
      • Hierarchically rasterize the polygon starting from the highest level
        • If polygon is further than the recorded pixel, early exit
        • If polygon is closer, hierarchically test the lower levels
        • If the bottom of the pyramid is reached and the polygon is still closer, propagate the value up the pyramid
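The pyramid construction and the early exit at the top level can be sketched as follows. This is only an illustration under stated assumptions: the buffer is square with a power-of-two size, larger z means farther away, and the names `DepthLevel`, `buildDepthPyramid` and `hiddenAtTop` are hypothetical, not taken from the talk or from any GPU API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One pyramid level: a square grid of z-values (larger z = farther away).
struct DepthLevel {
    std::size_t size;      // the grid is size x size
    std::vector<float> z;  // row-major z-values
    float at(std::size_t x, std::size_t y) const { return z[y * size + x]; }
};

// Builds all levels from the full-resolution buffer (levels[0]) up to a
// single texel. Each coarser texel stores the maximum (farthest) z of the
// 2x2 block in the level below.
std::vector<DepthLevel> buildDepthPyramid(DepthLevel full) {
    assert(full.size > 0 && (full.size & (full.size - 1)) == 0);  // power of two
    assert(full.z.size() == full.size * full.size);
    std::vector<DepthLevel> levels;
    levels.push_back(std::move(full));
    while (levels.back().size > 1) {
        DepthLevel upper;
        upper.size = levels.back().size / 2;
        upper.z.resize(upper.size * upper.size);
        const DepthLevel& lower = levels.back();
        for (std::size_t y = 0; y < upper.size; ++y)
            for (std::size_t x = 0; x < upper.size; ++x)
                upper.z[y * upper.size + x] = std::max(
                    std::max(lower.at(2 * x, 2 * y), lower.at(2 * x + 1, 2 * y)),
                    std::max(lower.at(2 * x, 2 * y + 1), lower.at(2 * x + 1, 2 * y + 1)));
        levels.push_back(std::move(upper));
    }
    return levels;
}

// Early exit at the top of the pyramid: a polygon whose nearest point is
// farther than the farthest z recorded anywhere on screen is hidden.
bool hiddenAtTop(const std::vector<DepthLevel>& levels, float nearestZ) {
    assert(!levels.empty() && levels.back().size == 1);
    return nearestZ > levels.back().z[0];
}
```

A full implementation would continue the test down the levels that the polygon overlaps when the top-level test is inconclusive; the sketch stops at the first step.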
    • 23. Spatial hierarchies
      • Enable culling large portions of the game world with a single quick test
      • Dynamic objects can be moved in the hierarchy at runtime
      • BSP-tree, kd-tree
    • 24. Spatial hierarchies
    • 25. View frustum culling
      • Culling objects that are outside the camera view cone
      • Test using object bounds
      • Tremendous speed-up using a hierarchy
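A bounds test against the frustum might look like the sketch below. It assumes axis-aligned bounding boxes and six planes whose normals point into the frustum; the types `Vec3`, `Plane`, `Aabb` and the function `isOutsideFrustum` are hypothetical names chosen for this example.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };
// Points inside the frustum satisfy dot(normal, p) + d >= 0 for every plane.
struct Plane { Vec3 normal; float d; };
struct Aabb { Vec3 min, max; };

// Returns true when the box lies completely behind at least one plane.
// The test is conservative: a box near a frustum corner may be reported
// as inside although it is actually outside.
bool isOutsideFrustum(const Aabb& box, const std::array<Plane, 6>& planes) {
    assert(box.min.x <= box.max.x && box.min.y <= box.max.y && box.min.z <= box.max.z);
    for (const Plane& p : planes) {
        // Corner of the box that lies farthest along the plane normal.
        const Vec3 corner{
            p.normal.x >= 0.0f ? box.max.x : box.min.x,
            p.normal.y >= 0.0f ? box.max.y : box.min.y,
            p.normal.z >= 0.0f ? box.max.z : box.min.z};
        const float distance = p.normal.x * corner.x + p.normal.y * corner.y +
                               p.normal.z * corner.z + p.d;
        if (distance < 0.0f) return true;  // even the best corner is behind
    }
    return false;
}
```

Applied to the nodes of a spatial hierarchy, one such test on a node's bounds can reject everything stored below that node.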
    • 26. View Frustum Culling
    • 27. View Frustum Culling
    • 28. Potentially Visible Set - PVS
      • A data structure that defines from-region-visibility for a scene
      • Computed in a pre-process step
      • Scene is divided into Cells
      • Compute a bit matrix that lists all the visible objects for each cell
      • At runtime, a simple matrix lookup
      • How to find a good sub-division for a scene?
      • Cannot handle dynamic occluders
      • Target volume: extension to handle dynamic targets
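The runtime side of a PVS could be as small as the sketch below. It assumes the pre-process has already decided which objects are visible from which cell; the class `PvsMatrix` and its methods are hypothetical names, and a production system would typically compress the matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bit matrix with one row per cell and one column per object.
class PvsMatrix {
public:
    PvsMatrix(std::size_t cellCount, std::size_t objectCount)
        : cells_(cellCount), objects_(objectCount),
          bits_(cellCount * objectCount, false) {}

    // Called by the (not shown) pre-process for every visible pair.
    void markVisible(std::size_t cell, std::size_t object) {
        bits_[index(cell, object)] = true;
    }

    // Runtime query: a single lookup.
    bool isVisible(std::size_t cell, std::size_t object) const {
        return bits_[index(cell, object)];
    }

    // All objects potentially visible from the given cell.
    std::vector<std::size_t> visibleObjects(std::size_t cell) const {
        std::vector<std::size_t> result;
        for (std::size_t object = 0; object < objects_; ++object)
            if (isVisible(cell, object)) result.push_back(object);
        return result;
    }

private:
    std::size_t index(std::size_t cell, std::size_t object) const {
        assert(cell < cells_ && object < objects_);
        return cell * objects_ + object;
    }

    std::size_t cells_;
    std::size_t objects_;
    std::vector<bool> bits_;  // bit-packed by the standard library
};
```

At runtime the engine would find the cell containing the camera and render only `visibleObjects(cell)`.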
    • 29. Portals
      • Place portals in the scene that connect the cells to form a portal graph
      • At runtime, find the portals of the current cell that are in the frustum
      • Traverse through all found portals to the adjacent cells and find all portals that are visible to the camera through the original portal
      • Same limitations with dynamic objects as with PVS systems
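The traversal described above can be illustrated with a deliberately simplified sketch. It assumes a one-dimensional screen, so that each portal projects to an interval and "visible through the original portal" becomes an interval intersection. All names (`Interval`, `Portal`, `Cell`, `visibleCells`) are hypothetical; a real implementation would clip the 3D view frustum against the portal polygons.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct Interval {
    float lo, hi;
    bool empty() const { return lo >= hi; }
};

struct Portal {
    std::size_t targetCell;  // cell on the other side of the portal
    Interval screenExtent;   // assumed: portal projection on a 1D screen
};

struct Cell { std::vector<Portal> portals; };

// Recursively visits every cell that can be seen through the current window.
void collectVisibleCells(const std::vector<Cell>& cells, std::size_t cell,
                         Interval window, std::vector<bool>& onPath,
                         std::set<std::size_t>& visible) {
    assert(cell < cells.size());
    visible.insert(cell);
    onPath[cell] = true;
    for (const Portal& p : cells[cell].portals) {
        assert(p.targetCell < cells.size());
        if (onPath[p.targetCell]) continue;  // do not walk back along the path
        // Narrow the window to the part that passes through this portal.
        const Interval clipped{std::max(window.lo, p.screenExtent.lo),
                               std::min(window.hi, p.screenExtent.hi)};
        if (!clipped.empty())
            collectVisibleCells(cells, p.targetCell, clipped, onPath, visible);
    }
    onPath[cell] = false;
}

// Starts from the camera cell with the whole screen [0, 1] as the window.
std::set<std::size_t> visibleCells(const std::vector<Cell>& cells,
                                   std::size_t cameraCell) {
    std::vector<bool> onPath(cells.size(), false);
    std::set<std::size_t> visible;
    collectVisibleCells(cells, cameraCell, Interval{0.0f, 1.0f}, onPath, visible);
    return visible;
}
```

Objects are then rendered only for the cells in the returned set.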
    • 30. Rasterization-based
      • Render occluder geometry into a software coverage buffer
      • Test visibility using test geometry
      • Use temporal coherence to determine the initial set to be rendered
      • Handles both dynamic targets and occluders as long as they have occluder geometry
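A minimal sketch of the coverage buffer idea follows. To stay short it assumes that occluders and test geometry are screen-space rectangles at a constant depth and that smaller z is nearer; a real software rasterizer would rasterize triangles. `Rect` and `CoverageBuffer` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { std::size_t x0, y0, x1, y1; };  // half-open pixel rectangle

class CoverageBuffer {
public:
    CoverageBuffer(std::size_t width, std::size_t height)
        : width_(width), height_(height), depth_(width * height, 1.0f) {}  // 1.0 = far

    // Writes an occluder: keeps the nearest z per pixel.
    void rasterizeOccluder(const Rect& r, float z) {
        check(r);
        for (std::size_t y = r.y0; y < r.y1; ++y)
            for (std::size_t x = r.x0; x < r.x1; ++x)
                depth_[y * width_ + x] = std::min(depth_[y * width_ + x], z);
    }

    // Tests a target's screen bounds: visible if any covered pixel is not
    // strictly in front of the target's nearest depth.
    bool isVisible(const Rect& r, float nearestZ) const {
        check(r);
        for (std::size_t y = r.y0; y < r.y1; ++y)
            for (std::size_t x = r.x0; x < r.x1; ++x)
                if (nearestZ <= depth_[y * width_ + x]) return true;
        return false;
    }

private:
    void check(const Rect& r) const {
        assert(r.x0 <= r.x1 && r.x1 <= width_ && r.y0 <= r.y1 && r.y1 <= height_);
    }

    std::size_t width_;
    std::size_t height_;
    std::vector<float> depth_;
};
```

Because the test uses the target's bounds and nearest depth, it can report a hidden object as visible but never the other way round.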
    • 31. Testing from coverage buffer
    • 32. Testing from coverage buffer
    • 33. Testing from coverage buffer
    • 34. Testing from coverage buffer
    • 35. Testing from coverage buffer
    • 36. Testing from coverage buffer
    • 37. Testing from coverage buffer
    • 38. Testing from coverage buffer
    • 39. Occlusion Queries
      • Supported by GPUs since 2001.
      • GPU answers the question: “How many pixels would have been visible if this object had been rendered?”
      • Instead of rasterizing your own depth buffer, use the GPU depth buffer
      • Normally the query is done using bounding volumes (effective but not necessary).
      • No need for artist-generated occluder geometry
      • GPU-CPU synchronization needed
    • 40. Occlusion Queries
      • Determine the set of visible objects against the actual rendered geometry:
        • all pixels can be used as occluding material!
    • 41. Using Occlusion Queries
      • Occlusion queries are a really powerful tool for visibility optimization.
      • Like all other features of the GPU, occlusion queries can be used ineffectively.
      • Special tricks are needed to get the most out of occlusion queries.
    • 42. Issuing Occlusion Queries
        disableColorWrite();
        disableDepthWrite();
        startQueryCounter();
        renderObjectBounds();          // draw only the bounding volume
        stopQueryCounter();
        enableColorWrite();
        enableDepthWrite();
        if (query->getResult() > 0)    // CPU waits here for the GPU result
            renderObject();
    • 43. CPU-GPU synchronization
      • With normal draw calls the CPU issues a command to the GPU and can continue processing as usual (Parallel processing).
      • With occlusion queries the CPU needs to get query results back to be able to know if the object was visible or not.
      • The CPU needs to wait for the query results to be available.
      • No parallel processing (which is really bad).
    • 44. Issuing Occlusion Queries
    • 45. Issuing Occlusion Queries
    • 46. Issuing Occlusion Queries
    • 47. Issuing Occlusion Queries
      • Fortunately GPU design has a solution for this problem.
      • GPUs can store multiple occlusion query results.
      • Occlusion queries can be batched.
      • Some GPUs have a limit on how many query results can be stored.
    • 48. Batching Occlusion Queries
        disableColorWrite();
        disableDepthWrite();
        for (each query) {             // issue all queries first
            startQueryCounter();
            renderObjectBounds();
            stopQueryCounter();
        }
        enableDepthWrite();
        enableColorWrite();
        for (each query) {             // then read the results back
            if (query->getResult() > 0)
                renderObject();
        }
    • 49. Batching Occlusion Queries
    • 50. Latent Occlusion Queries
      • Some stalls may be introduced between frames.
      • The last query result needs to be read back before continuing.
      • Avoid GPU stalls by using the query results from the previous frame.
      • Read back the query results at the beginning of each frame.
      • Sounds like a perfect solution?
    • 51. Latent Occlusion Queries
    • 52. Latent Occlusion Queries
      • There are downsides to this.
      • Visible popping artifacts when objects become visible.
      • If the camera is moving slowly and FPS is good, no problem.
      • When multiple objects become visible FPS typically drops (there’s a lot more to render)
      • For example when a door is opened.
    • 53. Latent Occlusion Queries
    • 54. Latent Occlusion Queries
    • 55. Latent Occlusion Queries
    • 56. Latent Occlusion Queries
      • Queries done to hierarchy nodes produce even larger artifacts
      • Growing bounds helps, but is difficult to get to work with hierarchical queries
      • The stall in using occlusion query results on the same frame may be as short as 0.1 ms (on Xbox 360)
      • Is this a price developers are ready to pay for artifact-free occlusion culling?
    • 57. Parallelism
      • Most gaming platforms today come with more than one CPU
      • Using the same algorithm for multiple cameras (splitscreen, AI bots, light sources)
      • Tile-based rasterization
      • Parallel data structure traversal
    • 58. PRACTICE
      • What kind of systems have really been used?
    • 59. Binary Space Partitioning
      • As made famous by Doom and the Quake series
      • A tree data structure for representing the scene
      • Gordon and Chen 1991 paper used in Doom ( http://www.rothschild.haifa.ac.il/~gordon/ftb-bsp.pdf )
      • Teller’s 1992 PhD thesis used in Quake ( http://people.csail.mit.edu/seth/pubs/pubs.html )
    • 60. Binary space partitioning
      • Before Doom, BSPs were used to do sorting for the painter’s algorithm (back-to-front)
      • Painter’s algorithm is too slow for large scenes
      • Solution: change the order to front-to-back and keep track of which pixels have been drawn
      • Quake introduced a pre-process step for computing a PVS based on the BSP model
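The front-to-back idea can be sketched on a toy scene. The example assumes that every splitting plane is of the form z = splitZ, that each node holds one wall covering a span of screen columns, and that the view direction and clipping are ignored. `BspNode`, `drawSpan` and `renderFrontToBack` are hypothetical names; this is not the code used in Doom or Quake.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct BspNode {
    float splitZ;                    // splitting plane z = splitZ
    int wallId;                      // wall lying on the plane
    std::size_t x0, x1;              // screen columns [x0, x1) the wall covers
    std::unique_ptr<BspNode> below;  // subtree with z < splitZ
    std::unique_ptr<BspNode> above;  // subtree with z > splitZ
};

// Writes the wall into every column of its span that is still empty (-1).
// Columns that were drawn earlier belong to nearer walls and are kept.
void drawSpan(const BspNode& node, std::vector<int>& columns) {
    assert(node.x0 <= node.x1 && node.x1 <= columns.size());
    for (std::size_t x = node.x0; x < node.x1; ++x)
        if (columns[x] < 0) columns[x] = node.wallId;
}

// Visits the side containing the camera first, then the node's own wall,
// then the far side, which yields a front-to-back order.
void renderFrontToBack(const BspNode* node, float cameraZ, std::vector<int>& columns) {
    if (!node) return;
    const bool cameraBelow = cameraZ < node->splitZ;
    const BspNode* nearSide = cameraBelow ? node->below.get() : node->above.get();
    const BspNode* farSide = cameraBelow ? node->above.get() : node->below.get();
    renderFrontToBack(nearSide, cameraZ, columns);
    drawSpan(*node, columns);
    renderFrontToBack(farSide, cameraZ, columns);
}
```

Swapping the two recursive calls gives the back-to-front order of the painter's algorithm, where every wall overwrites whatever was drawn before it.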
    • 61. Umbra 1
      • Used in Star Wars Galaxies, EverQuest 2, Age of Conan, Kingdom Heroes 2, Tian Xia 2
      • A data structure that supports dynamic and static visibility
      • Software rasterizer and occlusion queries supported
    • 62. Umbra 1
      • Database
        • Spatial bounding volume hierarchy
        • User updates
      • Visibility traverse
        • Input: camera parameters
        • Output: visible object set
        • Hierarchical visibility testing: a single query can hide large parts of the scene
    • 63. Hierarchical Culling
      • In typical game scenes most of the scene is hidden at any given point of view
      • Problem:
        • the size of the whole scene affects performance ( input sensitive system ).
      • Only the visible objects are supposed to affect performance ( output sensitive system ).
    • 64. Hierarchical Culling
    • 65. Hierarchical Culling
      • Solution:
        • build a spatial hierarchy for the objects in the scene
      • Culling hidden parts of the scene in constant time
      • Occlude groups of objects: if a hierarchy node is hidden all nodes below it are also hidden
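How a single test on a hierarchy node can hide a whole group of objects is sketched below. The example assumes one-dimensional bounds and takes the occlusion test as a callback, because the actual test (occlusion query, coverage buffer, and so on) is outside the scope of the sketch. `Range`, `Node`, `HiddenTest` and `collectVisible` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Range { float lo, hi; };  // 1D bounds, for brevity

struct Node {
    Range bounds;                // encloses all objects and children below
    std::vector<int> objects;    // object ids stored directly in this node
    std::vector<Node> children;
};

// Stand-in for the real occlusion test on a node's bounds.
using HiddenTest = std::function<bool(const Range&)>;

// Appends the ids of potentially visible objects and returns the number of
// visibility tests that were needed.
std::size_t collectVisible(const Node& node, const HiddenTest& isHidden,
                           std::vector<int>& visible) {
    std::size_t tests = 1;
    if (isHidden(node.bounds)) return tests;  // the whole subtree is culled
    visible.insert(visible.end(), node.objects.begin(), node.objects.end());
    for (const Node& child : node.children) {
        // Hierarchy invariant: child bounds lie inside the parent bounds.
        assert(node.bounds.lo <= child.bounds.lo && child.bounds.hi <= node.bounds.hi);
        tests += collectVisible(child, isHidden, visible);
    }
    return tests;
}
```

The returned test count shows the output-sensitive behaviour: nodes below a hidden node are never tested, no matter how many objects they contain.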
    • 66. Hierarchy Traversal
      • Traverse the hierarchy to determine visibility
      • Use temporal coherency
      • On first frame, start from the root
      • Store nodes where traversal ended and start traversing them on the next frame
      • Nodes form a visibility barrier
    • 67. Hierarchy Traversal
    • 68. Hierarchy Traversal
    • 69. Hierarchy Traversal
    • 70. Dynamic Objects
      • Object geometry may change (e.g. due to LODing).
      • Objects may move
      • If object geometry changes it may not fit into its old bounds
      • Move the object upwards in the hierarchy so that the bounds can fit inside a node
      • Push the object back down once there is idle time
    • 71. Dynamic Objects
      • If the object moves temporal bounding volumes can be used.
      • Use history info to predict the object movement.
      • The TBV doesn’t have to be updated every frame.
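One way to build a temporal bounding volume is sketched here, assuming the object moves linearly and that its velocity has been estimated from its recent history. `temporalBounds` and `contains` are hypothetical names and do not describe Umbra's implementation.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min, max; };

// Bounds that contain the object now and at every moment during the next
// `seconds`, provided it keeps moving with the given constant velocity.
Aabb temporalBounds(const Aabb& now, const Vec3& velocity, float seconds) {
    assert(seconds >= 0.0f);
    const Vec3 shift{velocity.x * seconds, velocity.y * seconds, velocity.z * seconds};
    return Aabb{
        {std::min(now.min.x, now.min.x + shift.x),
         std::min(now.min.y, now.min.y + shift.y),
         std::min(now.min.z, now.min.z + shift.z)},
        {std::max(now.max.x, now.max.x + shift.x),
         std::max(now.max.y, now.max.y + shift.y),
         std::max(now.max.z, now.max.z + shift.z)}};
}

// True when `inner` lies completely inside `outer`.
bool contains(const Aabb& outer, const Aabb& inner) {
    return outer.min.x <= inner.min.x && inner.max.x <= outer.max.x &&
           outer.min.y <= inner.min.y && inner.max.y <= outer.max.y &&
           outer.min.z <= inner.min.z && inner.max.z <= outer.max.z;
}
```

As long as `contains(tbv, currentBounds)` holds, the object can stay where it is in the hierarchy; the volume is rebuilt only when the object leaves it or the time horizon expires.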
    • 72. Dynamic Objects
    • 73. Dynamic Objects
    • 74. Umbra 2
      • Multi-core version of the previous tech
      • Used in e.g. Mass Effect 2, Dragon Age series, Alan Wake
    • 75. Multi-core culling
      • Two subtasks: rendering and visibility traversal
      • Rendering issues rendering calls and occlusion queries.
      • Visibility processing takes care of hierarchy processing and high level culling (e.g. view frustum culling).
    • 76. Multi-core culling
      • Game thread needs to do updates before our visibility thread can continue (camera and object updates)
      • Visibility thread updates the hierarchy
      • After update the hierarchy can be traversed
    • 77. Multi-core culling
    • 78. Multi-core culling
      • While the visibility thread is idle it can update the hierarchy:
        • lazy hierarchy building
        • collapsing nodes
        • visibility barrier updates
        • moving dynamic objects down etc.
    • 79. Umbra 3
      • Used by Unity 3D, Secret Studio
      • Collection of visibility algorithms
        • Umbra 1-2 feature sets
        • Automatic portal generation in pre-process
        • CPU rasterization and ray-tracing based portal culling algorithms
        • PVS culling for low end systems
    • 80. Umbra 3
      • Uses real geometry, no need for artists to create occluder geometry
      • Support for streaming, distance queries, intersection queries
    • 81. Automatic portal generation
      • Works with both outdoor and indoor scenes
      • Conservative occlusion
      • The output is a graph where the nodes are cells and the edges are the portals
      • Optionally a PVS can be computed
      • Incremental updates
    • 82. Umbra 3 recursive portal culling
      • Recursive traverse of the portal graph from the camera view point, ray tracing
      • Very accurate culling results
      • Too slow for whole scene culling, currently used for reference and for dynamic object culling
    • 83.  
    • 84. Umbra 3 optimized portal culling
      • Rasterize the portals into a coverage buffer
      • Fast enough for even outdoor scenes
      • In some cases over-estimates the visible set
    • 85.  
    • 86. Umbra 3 PVS culling
      • Extremely fast
        • Needed for low end systems such as smart phones
        • Can be used to determine visibility for e.g. hundreds of AI bots
      • The longer time spent computing, the more accurate the result
    • 87. Killzone 3
      • See “Practical occlusion culling for PS3”: http://gdcvault.com/play/1014356/Practical-Occlusion-Culling-on
      • Solution implemented specifically for PlayStation 3
      • Rasterizes a 720p tiled depth buffer on the SPUs
      • Performs occlusion tests to a downsampled depth buffer using object bounds
      • Occluder mesh selection done by artists
    • 88. Battlefield 3
      • See “Culling the Battlefield”: http://publications.dice.se/attachments/CullingTheBattlefield.pdf
      • A cross-platform (XBOX360, PS3, PC) solution
      • SIMD optimized frustum culling
      • Software rasterizer for occlusion culling done to a 256x116 depth buffer
      • Occluder geometry handmade by artists
    • 89. OTHER APPLICATIONS
      • What else can I use it for?
    • 90. Lighting & shadows
      • When applied from a light source’s point of view, a visibility algorithm can be used for finding shadow casters
      • “Shadow Caster Occlusion Culling for Efficient Shadow Mapping” ( http://www.cg.tuwien.ac.at/research/publications/2011/bittner-2011-scc/bittner-2011-scc-paper.pdf )
    • 91. Streaming
      • Large game worlds have so much content that it cannot fit in the memory of a gaming platform
      • Loading between zones takes away immersion
      • A from-region visibility algorithm can be used to do visibility-based streaming over the network or from storage media
    • 92. AI
      • A visibility algorithm can be used to drive AI logic
      • Data structures used in visibility determination can be modified to be used for distance or intersection testing
    • 93. Sound occlusion
      • Distance and intersection tests can be used to simulate the behaviour of sound
      • Precomputing visibility and audio have a lot of overlap and make for an interesting field of study
    • 94. FIN
      • Sampo Lappalainen
      • [email_address]
      • http://www.umbra3.com
