Visibility Optimization for Games

A presentation held by Umbra Software lead programmer Sampo Lappalainen at China Game Developer Conference 2011.

Usage Rights

© All Rights Reserved

  • Let's build a graphics engine that draws stuff on the screen -> easy. An artist makes models -> the models are handed to the graphics engine and drawn. Engineers optimize the graphics engine and artists optimize the graphics until performance is acceptable. Then we end up in this situation...
  • How did we get from the original situation to this? The tech limited what could be done so much that this was the best that could be achieved.
  • Game developers build tech so that games can be made to look good. Artists would be able to make fancier stuff. The problem is not drawing fancy graphics; the problem is drawing fancy graphics fast enough.
  • TODO picture TODO reference
  • TODO rename slide TODO pictures from Teppo’s presentation
  • TODO code?
  • There is still a lot left to cull inside the view frustum.
  • TODO sources
  • TODO sources
  • Object 3 has just become visible.
  • TODO note about SIMD? TODO MORE BEEF!
  • TODO rethink
  • TODO rethink
  • TODO describe how it works
  • An example follows.
  • TODO Video
  • TODO picture of how it really works
  • TODO picture: portal vs PVS culling
  • TODO link to paper

Visibility Optimization for Games: Presentation Transcript

  • Visibility Optimization for Games Sampo Lappalainen Lead Programmer Umbra Software Ltd.
  • Introduction
    • Background in graphics programming
    • Hybrid Graphics, NVIDIA, Umbra Software
    • With Umbra since 2008
    • Graphics middleware for console and PC games
    • Emphasis on visibility
  • Roadmap
    • Motivation
    • Theory
    • Practice
    • Other applications
    • Demo
    • Why is visibility optimization important?
  • Game World
  • Our Villain
  • Our Hero
  • Screen Shot
  • Game Worlds
    • Game developers want to make impressive game worlds
    • Hardware sets limits on what can and can’t be done.
    • Game developers need to push the hardware to its limits.
  • Visibility Optimization
    • The most effective way to gain performance in games.
    • Two basic ways to do visibility optimization:
      • art and level design
      • technology
    • Games use a mix of both.
  • Visibility Optimization by Level Design
    • Artists design game worlds so that performance is acceptable.
    • Can be done in numerous ways e.g.:
      • limiting view distance
      • limiting polygon or object count
      • modeling portals and cells
  • Visibility Optimization by Level Design
  • Visibility Optimization by Level Design
    • Time consuming and usually boring work.
    • Sets huge limits on what can and cannot be done.
    • May lead to monotonous level design.
    • Manual and non-recurring work.
  • Visibility Optimization by Technology
  • Visibility Optimization by Technology
  • Visibility Optimization by Technology
    • Gains:
      • No time wasted on rendering objects that don’t contribute to the output image (no state changes, no draw calls etc).
      • AI, physics, game logic etc. can be done at lower accuracy (or skipped altogether) for hidden objects.
    • Walkthrough of the key concepts
  • Terminology
    • Culling – removing hidden objects from rendering
    • Target – object that can be hidden by others
    • Occluder – an object that blocks visibility
    • Rendering artifact – an unintended glitch in the output image
  • Metrics for comparison
    • GPU cost
    • CPU cost
    • Overall frame time
    • Memory usage
    • Precomputation time
    • Manual work
    • Culling power
  • Backface culling
    • Taken care of by the HW
    • Culling entire triangles based on their winding
    • No need to render the insides of an object
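
As a rough illustration of the winding test behind backface culling, the C++ sketch below classifies a projected triangle by the sign of its screen-space area. The names `Vec2`, `signedArea2` and `isBackFacing` are hypothetical, and treating counter-clockwise triangles as front-facing is an assumption; on real hardware the convention is configurable.

```cpp
#include <cassert>

// Hypothetical 2D point in screen space (x to the right, y up).
struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c).
// Positive when the vertices run counter-clockwise, negative when clockwise.
inline float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Assumption: counter-clockwise triangles face the camera.
// Degenerate (zero-area) triangles are culled as well.
inline bool isBackFacing(Vec2 a, Vec2 b, Vec2 c) {
    return signedArea2(a, b, c) <= 0.0f;
}
```

Swapping any two vertices of a triangle flips the sign, which is why a closed mesh with consistent winding never needs its inside faces drawn.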
  • Depth buffering
    • Taken care of by the HW
    • A two-dimensional buffer for storing z-values for each screen pixel
    • Before processing shaders for a pixel to be rendered, test the z-value.
    • Allows drawing of unsorted geometry, although sorting still greatly improves performance
  • Hierarchical depth buffering
    • Replace depth buffer with a depth pyramid
      • Bottom of the pyramid: full-resolution depth buffer
      • Higher levels: lower-resolution depth buffers where a single pixel represents the maximum z-value in a group of pixels in the level below
    • Hierarchically rasterize the polygon starting from the highest level
      • If polygon is further than the recorded pixel, early exit
      • If polygon is closer, hierarchically test the lower levels
      • If the bottom of the pyramid is reached and the polygon is still closer, propagate the value up the pyramid
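
The pyramid test described above can be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions, not the presenter's implementation: the class `DepthPyramid` and the type `PixelRect` are hypothetical, smaller z is assumed to be closer, the buffer is assumed square with a power-of-two side, and a screen-space rectangle with a single nearest depth stands in for the polygon. Only the visibility test is shown; writing the polygon and propagating new values up the pyramid is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Inclusive pixel rectangle in full-resolution coordinates (hypothetical type).
struct PixelRect { int x0, y0, x1, y1; };

// Hypothetical depth pyramid. Assumed convention: smaller z is closer.
class DepthPyramid {
public:
    // 'depth' is a row-major size*size buffer; 'size' must be a power of two.
    DepthPyramid(std::vector<float> depth, int size) {
        assert(size > 0 && (size & (size - 1)) == 0);
        assert(depth.size() == static_cast<std::size_t>(size) * size);
        sizes_.push_back(size);
        levels_.push_back(std::move(depth));
        while (sizes_.back() > 1) {
            const int below = sizes_.back();
            const int s = below / 2;
            std::vector<float> up(static_cast<std::size_t>(s) * s);
            const std::vector<float>& src = levels_.back();
            for (int y = 0; y < s; ++y) {
                for (int x = 0; x < s; ++x) {
                    // Each coarse pixel stores the farthest z of its 2x2 block.
                    const float a = src[(2 * y) * below + 2 * x];
                    const float b = src[(2 * y) * below + 2 * x + 1];
                    const float c = src[(2 * y + 1) * below + 2 * x];
                    const float d = src[(2 * y + 1) * below + 2 * x + 1];
                    up[y * s + x] = std::max(std::max(a, b), std::max(c, d));
                }
            }
            sizes_.push_back(s);
            levels_.push_back(std::move(up));
        }
    }

    // True if a surface whose nearest depth is 'z' is hidden everywhere in 'r'.
    bool isOccluded(const PixelRect& r, float z) const {
        return hidden(static_cast<int>(levels_.size()) - 1, 0, 0, r, z);
    }

private:
    bool hidden(int level, int x, int y, const PixelRect& r, float z) const {
        const int scale = 1 << level;  // side of this pixel in full-res pixels
        const bool overlaps =
            x * scale <= r.x1 && (x + 1) * scale - 1 >= r.x0 &&
            y * scale <= r.y1 && (y + 1) * scale - 1 >= r.y0;
        if (!overlaps) return true;    // no part of 'r' lies in this region
        if (z >= levels_[level][y * sizes_[level] + x]) return true;  // early exit
        if (level == 0) return false;  // closer than a full-resolution pixel
        for (int dy = 0; dy < 2; ++dy)
            for (int dx = 0; dx < 2; ++dx)
                if (!hidden(level - 1, 2 * x + dx, 2 * y + dy, r, z)) return false;
        return true;
    }

    std::vector<std::vector<float>> levels_;  // levels_[0] is full resolution
    std::vector<int> sizes_;
};
```

Because the coarse levels store the farthest depth of each block, a single comparison near the top of the pyramid can reject a surface that covers many pixels.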
  • Spatial hierarchies
    • Enable culling large portions of the game world with a single quick test
    • Dynamic objects can be moved in the hierarchy at runtime
    • BSP-tree, kd-tree
  • Spatial hierarchies
  • View frustum culling
    • Culling objects that are outside the camera view cone
    • Test using object bounds
    • Tremendous speed-up using a hierarchy
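
A bounds-based frustum test can be sketched as follows, assuming bounding spheres and a frustum given as six planes with unit-length normals pointing inward. All names (`Plane`, `Sphere`, `intersectsFrustum`, `makeBoxFrustum`) are hypothetical; the box-shaped "frustum" helper exists only to make the sketch easy to try out.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Plane n.p + d = 0 with a unit-length normal pointing into the frustum.
struct Plane { Vec3 n; float d; };

struct Sphere { Vec3 center; float radius; };

inline float signedDistance(const Plane& p, const Vec3& v) {
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d;
}

// Conservative test: returns false only when the sphere lies entirely
// behind at least one plane, so visible objects are never culled.
inline bool intersectsFrustum(const std::array<Plane, 6>& frustum, const Sphere& s) {
    for (const Plane& p : frustum)
        if (signedDistance(p, s.center) < -s.radius) return false;
    return true;
}

// Hypothetical helper for experiments: a box-shaped "frustum"
// spanning [-h, h] on every axis.
inline std::array<Plane, 6> makeBoxFrustum(float h) {
    return std::array<Plane, 6>{{
        Plane{Vec3{ 1.0f,  0.0f,  0.0f}, h}, Plane{Vec3{-1.0f,  0.0f,  0.0f}, h},
        Plane{Vec3{ 0.0f,  1.0f,  0.0f}, h}, Plane{Vec3{ 0.0f, -1.0f,  0.0f}, h},
        Plane{Vec3{ 0.0f,  0.0f,  1.0f}, h}, Plane{Vec3{ 0.0f,  0.0f, -1.0f}, h}}};
}
```

With a spatial hierarchy, the same test applied to a node's bounds accepts or rejects everything under that node at once.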
  • View Frustum Culling
  • View Frustum Culling
  • Potentially Visible Set - PVS
    • A data structure that defines from-region visibility for a scene
    • Computed in a preprocessing step
    • Scene is divided into cells
    • Compute a bit matrix that lists all the visible objects for each cell
    • At runtime, visibility is a simple matrix lookup
    • How to find a good sub-division for a scene?
    • Cannot handle dynamic occluders
    • Target volume: extension to handle dynamic targets
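
The bit matrix and its runtime lookup might look like the sketch below. `PvsMatrix` and its methods are hypothetical names; how the bits are computed in the preprocess is outside the scope of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical PVS storage: one row of bits per cell, one bit per object.
class PvsMatrix {
public:
    PvsMatrix(std::size_t cellCount, std::size_t objectCount)
        : cellCount_(cellCount),
          objectCount_(objectCount),
          wordsPerCell_((objectCount + 63) / 64),
          bits_(cellCount * wordsPerCell_, 0) {}

    // Called by the preprocess for every (cell, object) pair found visible.
    void setVisible(std::size_t cell, std::size_t object) {
        assert(cell < cellCount_ && object < objectCount_);
        bits_[cell * wordsPerCell_ + object / 64] |= std::uint64_t(1) << (object % 64);
    }

    // Runtime query: a single word load and a bit test.
    bool isVisible(std::size_t cell, std::size_t object) const {
        assert(cell < cellCount_ && object < objectCount_);
        return (bits_[cell * wordsPerCell_ + object / 64] >> (object % 64)) & 1u;
    }

    // All objects potentially visible from anywhere inside 'cell'.
    std::vector<std::size_t> visibleObjects(std::size_t cell) const {
        std::vector<std::size_t> out;
        for (std::size_t o = 0; o < objectCount_; ++o)
            if (isVisible(cell, o)) out.push_back(o);
        return out;
    }

private:
    std::size_t cellCount_;
    std::size_t objectCount_;
    std::size_t wordsPerCell_;
    std::vector<std::uint64_t> bits_;
};
```

Memory grows with cells times objects, which is one reason the choice of subdivision matters.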
  • Portals
    • Place portals in the scene that connect the cells to form a portal graph
    • At runtime, find the portals of the current cell that are in the frustum
    • Traverse through all found portals to the adjacent cells and find all portals that are visible to the camera through the original portal
    • Same limitations with dynamic objects as with PVS systems
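
The traversal above can be sketched as a recursion that narrows the view window at every portal. This is a simplified sketch: portals are assumed to be already projected to screen-space rectangles, and the types `Rect`, `Portal`, `Cell` and the function `visibleCells` are hypothetical. The `onPath` guard that stops the recursion from walking back through a cell it came from is also an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

struct Rect {
    float x0, y0, x1, y1;
    bool empty() const { return x0 >= x1 || y0 >= y1; }
};

inline Rect intersect(const Rect& a, const Rect& b) {
    return Rect{std::max(a.x0, b.x0), std::max(a.y0, b.y0),
                std::min(a.x1, b.x1), std::min(a.y1, b.y1)};
}

// A portal leads to another cell; its bounds are given in screen space.
struct Portal { int targetCell; Rect screenBounds; };
struct Cell { std::vector<Portal> portals; };

inline void collect(const std::vector<Cell>& cells, int cell, const Rect& window,
                    std::vector<bool>& onPath, std::set<int>& visible) {
    visible.insert(cell);
    onPath[cell] = true;
    for (const Portal& p : cells[cell].portals) {
        if (onPath[p.targetCell]) continue;           // do not walk back
        const Rect clipped = intersect(window, p.screenBounds);
        if (!clipped.empty())                         // portal seen through window
            collect(cells, p.targetCell, clipped, onPath, visible);
    }
    onPath[cell] = false;
}

// Cells reachable from the camera's cell through chains of visible portals.
inline std::set<int> visibleCells(const std::vector<Cell>& cells, int cameraCell,
                                  const Rect& view) {
    assert(cameraCell >= 0 && cameraCell < static_cast<int>(cells.size()));
    std::vector<bool> onPath(cells.size(), false);
    std::set<int> visible;
    collect(cells, cameraCell, view, onPath, visible);
    return visible;
}
```

Each step can only shrink the window, so cells behind a portal that falls outside the accumulated window are never visited.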
  • Rasterization-based
    • Render occluder geometry into a software coverage buffer
    • Test visibility using test geometry
    • Use temporal coherence to determine the initial set to be rendered
    • Handles both dynamic targets and occluders as long as they have occluder geometry
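
A software coverage buffer can be sketched as a small CPU-side depth grid. To keep the example short, occluders and test geometry are assumed to be screen-space rectangles with a single depth each, whereas a real implementation would rasterize triangles. `CoverageBuffer` and `PixelRect` are hypothetical names, and smaller z is assumed to be closer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Inclusive pixel rectangle (hypothetical type).
struct PixelRect { int x0, y0, x1, y1; };

// Hypothetical CPU-side coverage buffer. Assumed convention: smaller z is closer.
class CoverageBuffer {
public:
    CoverageBuffer(int width, int height)
        : width_(width), height_(height),
          depth_(static_cast<std::size_t>(width) * height,
                 std::numeric_limits<float>::infinity()) {
        assert(width > 0 && height > 0);
    }

    // Writes an occluder conservatively by using its farthest depth.
    void rasterizeOccluder(const PixelRect& r, float farthestZ) {
        const PixelRect c = clip(r);
        for (int y = c.y0; y <= c.y1; ++y)
            for (int x = c.x0; x <= c.x1; ++x) {
                float& d = depth_[static_cast<std::size_t>(y) * width_ + x];
                d = std::min(d, farthestZ);
            }
    }

    // The test geometry is visible if any covered pixel is not yet
    // hidden by an occluder in front of its nearest depth.
    bool isVisible(const PixelRect& r, float nearestZ) const {
        const PixelRect c = clip(r);
        for (int y = c.y0; y <= c.y1; ++y)
            for (int x = c.x0; x <= c.x1; ++x)
                if (nearestZ < depth_[static_cast<std::size_t>(y) * width_ + x])
                    return true;
        return false;
    }

private:
    PixelRect clip(const PixelRect& r) const {
        return PixelRect{std::max(r.x0, 0), std::max(r.y0, 0),
                         std::min(r.x1, width_ - 1), std::min(r.y1, height_ - 1)};
    }

    int width_;
    int height_;
    std::vector<float> depth_;
};
```

Using the occluder's farthest depth and the target's nearest depth keeps the test conservative: it may report a hidden object as visible, but not the reverse.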
  • Testing from coverage buffer
  • Occlusion Queries
    • Supported by GPUs since 2001.
    • GPU answers the question: “How many pixels would have been visible if this object had been rendered?”
    • Instead of rasterizing your own depth buffer, use the GPU depth buffer
    • Normally the query is done using bounding volumes (effective but not necessary).
    • No need for artist-generated occluder geometry
    • GPU-CPU synchronization needed
  • Occlusion Queries
    • Determine the set of visible objects against the actual rendered geometry:
      • all pixels can be used as occluding material!
  • Using Occlusion Queries
    • Occlusion queries are a really powerful tool for visibility optimization.
    • Like all other features of the GPU, occlusion queries can be used ineffectively.
    • Special tricks are needed to get the most out of occlusion queries.
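  • Issuing Occlusion Queries
        // Draw only the bounds, without touching the color or depth buffers.
        disableColorWrite();
        disableDepthWrite();
        startQueryCounter();
        renderObjectBounds();
        stopQueryCounter();
        enableColorWrite();
        enableDepthWrite();
        // Reading the result here forces the CPU to wait for the GPU.
        if (query->getResult() > 0)
            renderObject();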
  • CPU-GPU synchronization
    • With normal draw calls the CPU issues a command to the GPU and can continue processing as usual (Parallel processing).
    • With occlusion queries the CPU needs to get query results back to be able to know if the object was visible or not.
    • The CPU needs to wait for the query results to be available.
    • No parallel processing (which is really bad).
  • Issuing Occlusion Queries
  • Issuing Occlusion Queries
  • Issuing Occlusion Queries
  • Issuing Occlusion Queries
    • Fortunately GPU design has a solution for this problem.
    • GPUs can store multiple occlusion query results.
    • Occlusion queries can be batched.
    • Some GPUs have a limit on how many query results can be stored.
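  • Batching Occlusion Queries
        disableColorWrite();
        disableDepthWrite();
        // Issue all queries first so the GPU can work through them as a batch.
        for (each query) {
            startQueryCounter();
            renderObjectBounds();
            stopQueryCounter();
        }
        enableDepthWrite();
        enableColorWrite();
        // Read the results back only after every query has been issued.
        for (each query) {
            if (query->getResult() > 0)
                renderObject();
        }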
  • Batching Occlusion Queries
  • Latent Occlusion Queries
    • Some stalls may be introduced between frames.
    • The last query result needs to be read back before continuing.
    • Avoid GPU stalls by using the query results from the previous frame.
    • Read back the query results at the beginning of each frame.
    • Sounds like a perfect solution?
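
The one-frame delay can be made concrete with a small CPU-only simulation. `LatentCuller` is a hypothetical class, and the vector passed to `frame` stands in for what the GPU queries issued this frame will eventually report; treating every object as visible on the first frame is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simulation of latent occlusion queries. No GPU is involved:
// 'pixelsVisibleNow[i]' stands in for the result of the query issued for
// object i during the current frame.
class LatentCuller {
public:
    // Assumption: with no history yet, every object is treated as visible.
    explicit LatentCuller(std::size_t objectCount) : previous_(objectCount, 1u) {}

    // Decides what to render from LAST frame's results, then stores this
    // frame's results, which only become usable on the next call.
    std::vector<bool> frame(const std::vector<unsigned>& pixelsVisibleNow) {
        assert(pixelsVisibleNow.size() == previous_.size());
        std::vector<bool> render(previous_.size());
        for (std::size_t i = 0; i < previous_.size(); ++i)
            render[i] = previous_[i] > 0;
        previous_ = pixelsVisibleNow;
        return render;
    }

private:
    std::vector<unsigned> previous_;
};
```

Feeding it a sequence in which an object goes from 0 to a non-zero pixel count shows that the object is skipped for exactly one frame after it becomes visible.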
  • Latent Occlusion Queries
  • Latent Occlusion Queries
    • There are downsides to this.
    • Visible popping artifacts when objects become visible.
    • If the camera is moving slowly and FPS is good, no problem.
    • When multiple objects become visible FPS typically drops (there’s a lot more to render)
    • For example when a door is opened.
  • Latent Occlusion Queries
  • Latent Occlusion Queries
  • Latent Occlusion Queries
  • Latent Occlusion Queries
    • Queries done to hierarchy nodes produce even larger artifacts
    • Growing bounds helps, but is difficult to get to work with hierarchical queries
    • The stall in using occlusion query results on the same frame may be as short as 0.1 ms (on Xbox 360)
    • Is this a price developers are ready to pay for artifact-free occlusion culling?
  • Parallelism
    • Most gaming platforms today come with more than one CPU
    • Using the same algorithm for multiple cameras (splitscreen, AI bots, light sources)
    • Tile-based rasterization
    • Parallel data structure traversal
    • What kind of systems have really been used?
  • Binary Space Partitioning
    • As made famous by Doom and the Quake series
    • A tree data structure for representing the scene
    • Gordon and Chen 1991 paper used in Doom
    • Teller’s 1992 PhD thesis used in Quake
  • Binary space partitioning
    • Before Doom, BSPs were used to do sorting for the painter’s algorithm (back-to-front)
    • Painter’s algorithm is too slow for large scenes
    • Solution: change the order to front-to-back and keep track of which pixels have been drawn
    • Quake introduced a pre-process step for computing a PVS based on the BSP model
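
Front-to-back ordering from a BSP tree can be sketched as below. The example uses a 2D tree with splitting lines for brevity; `BspNode`, `makeNode` and `frontToBack` are hypothetical names, and the sketch is not taken from Doom or Quake. Tracking which pixels have already been drawn is left out.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical 2D BSP node with splitting line a*x + b*y + c = 0.
struct BspNode {
    float a, b, c;
    int polygonId;                   // geometry lying on the splitter
    std::unique_ptr<BspNode> front;  // side where a*x + b*y + c > 0
    std::unique_ptr<BspNode> back;   // side where a*x + b*y + c < 0
};

inline std::unique_ptr<BspNode> makeNode(float a, float b, float c, int polygonId) {
    std::unique_ptr<BspNode> n(new BspNode());
    n->a = a;
    n->b = b;
    n->c = c;
    n->polygonId = polygonId;
    return n;
}

// Appends polygon ids in front-to-back order as seen from (camX, camY):
// first the subtree on the camera's side, then the splitter, then the rest.
inline void frontToBack(const BspNode* node, float camX, float camY,
                        std::vector<int>& order) {
    if (!node) return;
    const bool cameraInFront = node->a * camX + node->b * camY + node->c >= 0.0f;
    const BspNode* nearSide = cameraInFront ? node->front.get() : node->back.get();
    const BspNode* farSide = cameraInFront ? node->back.get() : node->front.get();
    frontToBack(nearSide, camX, camY, order);
    order.push_back(node->polygonId);
    frontToBack(farSide, camX, camY, order);
}
```

Swapping the near and far recursion yields the back-to-front order used by the painter's algorithm.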
  • Umbra 1
    • Used in Star Wars Galaxies, EverQuest 2, Age of Conan, Kingdom Heroes 2, Tian Xia 2
    • A data structure that supports dynamic and static visibility
    • Software rasterizer and occlusion queries supported
  • Umbra 1
    • Database
      • Spatial bounding volume hierarchy
      • User updates
    • Visibility traversal
      • Input: camera parameters
      • Output: visible object set
      • Hierarchical visibility testing: a single query can hide large parts of the scene
  • Hierarchical Culling
    • In typical game scenes most of the scene is hidden at any given point of view
    • Problem:
      • the size of the whole scene affects performance (input-sensitive system).
    • Only the visible objects should affect performance (output-sensitive system).
  • Hierarchical Culling
  • Hierarchical Culling
    • Solution:
      • build a spatial hierarchy for the objects in the scene
    • Culling hidden parts of the scene in constant time
    • Occlude groups of objects: if a hierarchy node is hidden all nodes below it are also hidden
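
A sketch of hierarchical culling under simple assumptions: the hierarchy is a flat array of nodes that refer to their children by index, and the visibility test itself is supplied by the caller as a predicate (it could be a frustum test, a coverage buffer test or an occlusion query). `HierarchyNode`, `CullResult` and `cullHierarchy` are hypothetical names. The sketch also counts how many tests were performed, which shows the cost following the visible part of the scene rather than its total size.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical hierarchy stored as a flat array; children are indices into it.
struct HierarchyNode {
    std::vector<int> children;  // empty for a leaf
    int objectId;               // meaningful for leaves only
};

struct CullResult {
    std::vector<int> visibleObjects;
    int testsPerformed;
};

// 'isNodeHidden' is supplied by the caller and stands in for whatever
// visibility test is used on a node's bounds.
inline CullResult cullHierarchy(const std::vector<HierarchyNode>& nodes, int root,
                                const std::function<bool(int)>& isNodeHidden) {
    assert(root >= 0 && root < static_cast<int>(nodes.size()));
    CullResult result;
    result.testsPerformed = 0;
    std::vector<int> stack(1, root);
    while (!stack.empty()) {
        const int n = stack.back();
        stack.pop_back();
        ++result.testsPerformed;
        if (isNodeHidden(n)) continue;  // one test rejects the whole subtree
        if (nodes[n].children.empty()) {
            result.visibleObjects.push_back(nodes[n].objectId);
        } else {
            for (int c : nodes[n].children) stack.push_back(c);
        }
    }
    return result;
}
```

When a node near the root is hidden, none of its descendants are ever tested.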
  • Hierarchy Traversal
    • Traverse the hierarchy to determine visibility
    • Use temporal coherence
    • On first frame, start from the root
    • Store nodes where traversal ended and start traversing them on the next frame
    • Nodes form a visibility barrier
  • Hierarchy Traversal
  • Hierarchy Traversal
  • Hierarchy Traversal
  • Dynamic Objects
    • Object geometry may change (e.g. due to LODing).
    • Objects may move
    • If object geometry changes it may not fit into its old bounds
    • Move the object upwards in the hierarchy so that the bounds can fit inside a node
    • Push the object back down once there is idle time
  • Dynamic Objects
    • If the object moves, temporal bounding volumes (TBVs) can be used.
    • Use history info to predict the object movement.
    • The TBV doesn’t have to be updated every frame.
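
One simple way to build a temporal bounding volume is sketched below. It is a simplification of what the slide describes: instead of predicting movement from history, it assumes a known upper bound on the object's speed and inflates a bounding sphere accordingly. The names `TemporalBoundingVolume`, `makeTbv` and `needsUpdate` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical TBV: a sphere that is valid until 'expiryFrame'.
struct TemporalBoundingVolume {
    Vec3 center;
    float radius;
    int expiryFrame;
};

// Assumption: the object never moves faster than 'maxSpeedPerFrame', so a
// sphere inflated by speed * lifetime contains it for 'lifetimeFrames' frames.
inline TemporalBoundingVolume makeTbv(const Vec3& position, float objectRadius,
                                      float maxSpeedPerFrame, int currentFrame,
                                      int lifetimeFrames) {
    assert(lifetimeFrames >= 0);
    const float growth = maxSpeedPerFrame * static_cast<float>(lifetimeFrames);
    return TemporalBoundingVolume{position, objectRadius + growth,
                                  currentFrame + lifetimeFrames};
}

// The hierarchy only has to be touched when this returns true.
inline bool needsUpdate(const TemporalBoundingVolume& tbv, const Vec3& position,
                        float objectRadius, int frame) {
    const float dx = position.x - tbv.center.x;
    const float dy = position.y - tbv.center.y;
    const float dz = position.z - tbv.center.z;
    const float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    return frame > tbv.expiryFrame || dist + objectRadius > tbv.radius;
}
```

A longer lifetime means fewer hierarchy updates but a larger, less precise volume, so the lifetime is a tuning parameter.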
  • Dynamic Objects
  • Dynamic Objects
  • Umbra 2
    • Multi-core version of the previous tech
    • Used in e.g. Mass Effect 2, Dragon Age series, Alan Wake
  • Multi-core culling
    • Two subtasks: rendering and visibility traversal
    • Rendering issues rendering calls and occlusion queries.
    • Visibility processing takes care of hierarchy processing and high-level culling (e.g. view frustum culling).
  • Multi-core culling
    • Game thread needs to do updates before our visibility thread can continue (camera and object updates)
    • Visibility thread updates the hierarchy
    • After update the hierarchy can be traversed
  • Multi-core culling
  • Multi-core culling
    • While the visibility thread is idle it can update the hierarchy:
      • lazy hierarchy building
      • collapsing nodes
      • visibility barrier updates
      • moving dynamic objects down etc.
  • Umbra 3
    • Used by Unity 3D, Secret Studio
    • Collection of visibility algorithms
      • Umbra 1-2 feature sets
      • Automatic portal generation in pre-process
      • CPU rasterization and ray-tracing based portal culling algorithms
      • PVS culling for low end systems
  • Umbra 3
    • Uses real geometry, no need for artists to create occluder geometry
    • Support for streaming, distance queries, intersection queries
  • Automatic portal generation
    • Works with both outdoor and indoor scenes
    • Conservative occlusion
    • The output is a graph where the nodes are cells and the edges are the portals
    • Optionally a PVS can be computed
    • Incremental updates
  • Umbra 3 recursive portal culling
    • Recursive traversal of the portal graph from the camera viewpoint, using ray tracing
    • Very accurate culling results
    • Too slow for whole scene culling, currently used for reference and for dynamic object culling
  • Umbra 3 optimized portal culling
    • Rasterize the portals into a coverage buffer
    • Fast enough for even outdoor scenes
    • In some cases over-estimates the visible set
  • Umbra 3 PVS culling
    • Extremely fast
      • Needed for low end systems such as smart phones
      • Can be used to determine visibility for e.g. hundreds of AI bots
    • The more time spent computing, the more accurate the result
  • Killzone 3
    • See “Practical occlusion culling for PS3”
    • Solution implemented specifically for PlayStation 3
    • Rasterizes a 720p tiled depth buffer on the SPUs
    • Performs occlusion tests to a downsampled depth buffer using object bounds
    • Occluder mesh selection done by artists
  • Battlefield 3
    • See “Culling the Battlefield” (CullingTheBattlefield.pdf)
    • A cross-platform (Xbox 360, PS3, PC) solution
    • SIMD-optimized frustum culling
    • Software rasterizer for occlusion culling done to a 256x116 depth buffer
    • Occluder geometry handmade by artists
    • What else can I use it for?
  • Lighting & shadows
    • When applied from a light source’s point of view, a visibility algorithm can be used for finding shadow casters
    • “Shadow Caster Occlusion Culling for Efficient Shadow Mapping”
  • Streaming
    • Large game worlds have so much content that it cannot fit in the memory of a gaming platform
    • Loading between zones takes away immersion
    • A from-region visibility algorithm can be used to do visibility-based streaming over the network or from storage media
  • AI
    • A visibility algorithm can be used to drive AI logic
    • Data structures used in visibility determination can be modified to be used for distance or intersection testing
  • Sound occlusion
    • Distance and intersection tests can be used to simulate the behaviour of sound
    • Precomputing visibility and audio have a lot of overlap and make for an interesting field of study
  • FIN
    • Sampo Lappalainen
    • [email_address]