A presentation I gave at KGC 2014 in November 2014. I introduce Umbra 3 and go through three customer use cases that show the sort of development we've done with some of our partners.
I cover Witcher 3 (CD Projekt RED), Quantum Break (Remedy Entertainment) and Destiny (Bungie).
31. Case study: Quantum Break
• Third-person action game for Xbox One, from the creators of Max Payne & Alan Wake
• Uses an in-house 3D engine developed by one of the most respected graphics teams in the industry
• Average object count per view is roughly 40,000 without occlusion culling
• Features large-scale destruction and semi-dynamic geometry
• Previously used GPU occlusion queries for visibility
32. Semi-dynamic scene changes
• Large-scale changes in mostly static geometry
– Destruction of props
– Separate versions of a scene shown at different points in time
• Solution
– Visibility data is built and stored per scene data chunk
– Multiple versions of a data chunk are stored, one per dynamic state
– The active visibility data chunks are linked together at runtime
33. Shadow caster culling
• Don't render shadow casters whose shadows are entirely occluded.
• Re-project the occlusion buffer into light space to build a receiver mask.
• Test shadow casters against the receiver mask.
35. Case Study: Destiny
• In-house, cross-platform engine built by Bungie
• Shipped in 2014 on current and previous generation consoles
• Collaboration with Umbra since 2009
• Previous workflow relied on hand-placed portals and BSP scenes
• Umbra visibility data is also used for…
– Gameplay cluster definition
– Spatial connectivity
– Audio occlusion
– Global illumination acceleration
36. Incremental content updates
• Requirements for preprocessing arbitrary polygon soup:
– 3 km x 3 km map
– Full rebuild: 5 minutes
– Smallest incremental update: 10 seconds
• Umbra's computation is organized as a farm of small tasks, expressed as a graph
• Each task's result is cached in shared storage
• Because occlusion data is local in nature, sections can be updated independently
37. Culling with predicted camera
• Visibility is processed in parallel with the camera update
→ the exact camera position is not yet known when the visibility query starts
• Umbra 3 provides a "camera prediction radius" that turns the query into a conservative "from-region" query
• All occluders are shrunk by that amount, so the result remains correct
38. Dynamic changes in visibility
• Closed doors, shutters and the like are excellent occluders, but only while they are closed.
• The visibility graph helps here by allowing links to be enabled and disabled at runtime.
• Umbra 3 supports generic "gate" objects that can be toggled on and off at runtime.
39. Thank you.
For more on Umbra 3, go to umbra3.com
sampo@umbrasoftware.com
Follow us on Twitter @umbrasoftware
Editor's Notes
Fast content creation and smooth frame-rates with Umbra 3
Introduction to Umbra and visibility
Case study: Witcher 3
Case study: Quantum Break
Case study: Destiny
Video Games Powered by Umbra 3
The Witcher 3, developed by CD Projekt RED in Poland and slated for release next year, uses Umbra for visibility on all platforms.
Stunningly beautiful Killzone: Shadow Fall, the PS4 launch title by Guerrilla
Destiny by Bungie, released in September this year, uses Umbra 3 on all platforms – previous and current gen.
Call of Duty: Ghosts by Infinity Ward – PS4, Xbox One and PC
PVS
GPU rendering
Simplified occluder rasterization
Portals and Cells
Umbra
Here you’re looking at top down view of an example scene.
The requirement was that there should be no manual markup or other requirements on the input geometry. So what we take as input is all the level geometry as it is.
So we really don’t have any other information besides the long list of polygons, which are possibly grouped into objects.
Doing geometric operations with polygons directly has all kinds of difficulties related to floating point accuracy. Also, almost all real life game levels contain some small modeling errors such as t-vertices and cracks between objects. We need to be able to work with this kind of input as well.
What we do next is to voxelize all the geometry. Great thing about voxelization is that it removes all the nasty problems with floating point accuracy and automatically removes common modeling errors such as cracks and t-vertices in the geometry.
Voxelization also discretizes the input, making the following processing independent of polygons count. In effect we can choose the resolution of the input data. This is important for the goal of creating a bounded size data structure.
The input could have billions of triangles but after this step we can throw all the original geometry away and work on the voxels instead.
The bad thing about voxelization is that it requires quite a lot of memory. In fact, since we need accurate visibility data, we have to make the voxels quite small, and the number of them might be measured in billions or even hundreds of billions for larger levels. Even compressed, this data can take gigabytes of memory. The memory requirements alone indicate we need to further refine this voxel representation into something else to make it usable in practice.
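As a toy sketch of that discretization step (illustrative only, not Umbra's voxelizer): conservatively voxelize each triangle's bounding box onto a grid. The size of the output depends only on the chosen voxel resolution, not on the polygon count, and duplicated geometry collapses into the same voxels.

```python
def voxelize_aabb(vmin, vmax, voxel_size):
    """Integer voxel coordinates overlapped by an axis-aligned box."""
    lo = tuple(int(c // voxel_size) for c in vmin)
    hi = tuple(int(c // voxel_size) for c in vmax)
    return {(x, y, z)
            for x in range(lo[0], hi[0] + 1)
            for y in range(lo[1], hi[1] + 1)
            for z in range(lo[2], hi[2] + 1)}

def voxelize_scene(triangles, voxel_size):
    """Conservative voxelization of a polygon soup via triangle AABBs.
    (A real voxelizer tests actual triangle/voxel overlap; the bounding
    box is used here only to keep the sketch short.)"""
    voxels = set()
    for tri in triangles:
        vmin = tuple(min(p[i] for p in tri) for i in range(3))
        vmax = tuple(max(p[i] for p in tri) for i in range(3))
        voxels |= voxelize_aabb(vmin, vmax, voxel_size)
    return voxels
```

Feeding the same triangle in twice yields exactly the same voxel set, which is why modeling errors such as duplicated faces simply disappear at this stage.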
The approach we chose is to create a cell-and-portal graph by grouping the voxels together based on proximity and connectivity. Cells are created from groups of voxels. Portals are then created on the boundaries of these groups.
We chose to create portals because in the past they have been proven to be an efficient way to represent visibility, and we solve the issues of manually placed portals by generating them automatically.
In contrast to manually placed portals, we might generate thousands of portals, which allows us to have accurate visibility in outdoor spaces as well.
By controlling the number of output cells and portals we can choose the output resolution of the visibility data so that it meets the memory and performance requirements.
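A 2-D toy version of this grouping step (illustrative only; Umbra's actual cell generation is far more involved): flood-fill empty voxels into cells, one group per fixed-size tile, then record a portal wherever two different cells have adjacent empty voxels.

```python
from collections import deque

def build_cells_and_portals(grid, tile):
    """grid: 2-D list, 0 = empty space, 1 = solid; tile: group size.
    Flood-fills empty voxels within each tile into cells, then records
    a portal wherever two cells share adjacent empty voxels."""
    h, w = len(grid), len(grid[0])
    cell_id = [[-1] * w for _ in range(h)]
    ncells = 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 0 and cell_id[y][x] == -1:
                ty, tx = y // tile, x // tile  # fill stays in this tile
                q = deque([(y, x)])
                cell_id[y][x] = ncells
                while q:
                    cy, cx = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and grid[ny][nx] == 0
                                and cell_id[ny][nx] == -1
                                and ny // tile == ty and nx // tile == tx):
                            cell_id[ny][nx] = ncells
                            q.append((ny, nx))
                ncells += 1
    portals = set()
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 0:
                for dy, dx in ((1, 0), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if (ny < h and nx < w and grid[ny][nx] == 0
                            and cell_id[ny][nx] != cell_id[y][x]):
                        portals.add((cell_id[y][x], cell_id[ny][nx]))
    return cell_id, ncells, portals
```

Raising `tile` yields fewer, larger cells and fewer portals, which is the output-resolution knob described above.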
There are several options on how we could use the portal data. We could do a traditional recursive portal traversal, clipping the view frustum as we go through each portal. Or we could do ray-tracing or cone-tracing in the graph.
The approach we chose is to rasterize the portals using a custom software rasterizer optimized for this purpose. With the rasterizer we need to touch each portal only once, as opposed to recursive portal traversal, which can suffer from exponential blowup if there are a lot of portal intersections in screen space.
(We could also traverse other kind of queries for connectivity.)
Also really useful property of the rasterizer is that it produces a depth buffer as output, which is almost optimal data structure for doing further visibility tests on the query output.
Also with rasterization we can choose the output resolution based on the platform and accuracy requirements.
Since we’re rasterizing portals instead of occluders, it’s trivial to implement conservative rasterization, which is a requirement for getting correct results in lower resolutions.
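A one-dimensional sketch of conservative coverage (illustrative only): a pixel counts as covered if the portal touches any part of it, so lowering the resolution can only grow the covered region, never lose visibility.

```python
import math

def conservative_pixels(x0, x1, res):
    """Conservatively rasterize the interval [x0, x1) of normalized
    screen space [0, 1): mark every pixel the interval touches."""
    first = max(math.floor(x0 * res), 0)
    last = min(math.ceil(x1 * res) - 1, res - 1)
    return set(range(first, last + 1))
```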
SAVE ENGINEERS' AND ARTISTS' TIME
IT’S EASY
PORTABLE
PROVEN
SUPPORT
ENGINEERS DON’T GET TO ROLL THEIR OWN
The game worlds are very large, and in many cases this means that if your occlusion culling system requires any kind of manual work, it is going to be a major burden for your artists.
The worlds are pretty open, so any kind of manual portal placements are pretty much out of the question.
So in this sense, Umbra’s tech suits this use case really really well.
Also a game like the Witcher 3 relies heavily on dynamic streaming of data and LOD’s, both of which were features that we didn’t really support at the time when we started discussing co-operation with the CD Projekt’s team.
So again, here's the process of how Umbra works: polygon soup in, occlusion data out. It works really well in many cases, but in situations where the game worlds are huge there are a couple of problems.
First, on the content authoring side, there might be situations where the artists cannot have the entire world – the source data – in memory at once. So you need to be able to process just a local section of the world individually.
And on the other hand, on the engine runtime the occlusion data for a world that is simply vast might be a bit too much to have in memory at all times.
So we needed to do something about that.
Now I'll tell you about the solution we built; it is pretty simple really. This is obviously a good thing when it comes to design.
So the user is just able to split the game world into these chunks, or tiles. Each of these tiles is just an individual polygon soup.
Then the user can produce individual data sets for each of these tiles as well. This process is up to the user to distribute, so it can run in multiple threads or multiple processes, or even on multiple computers altogether.
Processing...
The end result is that you have a set of streaming tiles and corresponding output occlusion data sets.
And then you can proceed to write these data sets on the disk or do what ever you want with them.
Now in the engine runtime then you have the camera location and typically some sort of a radius inside which the camera will be during the next few frames.
Based on that information you know which ones of your streaming tiles are going to be active, and you can just select the corresponding Tomes.
Once you have the Tomes streamed in, along with the other data you stream in when you select the active streaming tiles, you just combine the Tomes into a Tome Collection.
Then you can use that Tome Collection exactly as you would use an individual Tome. So you perform visibility queries and so forth.
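In pseudocode, the runtime side might look like this (the function names are made up for illustration and are not Umbra's actual API):

```python
def active_tiles(cam_pos, radius, tile_size):
    """Grid coordinates of the streaming tiles overlapped by the circle
    (cam_pos, radius) in which the camera will stay for a few frames."""
    cx, cy = cam_pos
    x0, x1 = int((cx - radius) // tile_size), int((cx + radius) // tile_size)
    y0, y1 = int((cy - radius) // tile_size), int((cy + radius) // tile_size)
    return {(tx, ty) for tx in range(x0, x1 + 1) for ty in range(y0, y1 + 1)}

def build_tome_collection(tomes, tiles):
    """Combine the per-tile occlusion data sets ("Tomes") of the active
    tiles into one collection used for visibility queries."""
    return [tomes[t] for t in sorted(tiles) if t in tomes]
```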
There were a couple of interesting engineering challenges when implementing a system like this, even though it sounds astoundingly simple. I'm not going into full detail on how exactly we did this, just to give you an idea that it's not all PowerPoint animations when you do a system like this.
First of all, the streaming tiles are completely independent from each other. And especially when you are computing the occlusion data for a streaming tile, it would be really nice to access some neighbouring geometry, especially when working near the borders of the tiles. So it would come in handy to know something about the geometry on the other side. Unfortunately this is not possible in a system like this, so we needed a way to circumvent that.
Also, the flipside of this very same issue is that we needed to be able to match those neighbouring tiles together on their borders. That can sometimes be very tricky when you don't know anything about the neighbouring tile. You could for instance have a completely different set of computation parameters used on the other side and still need to be able to combine the data sets.
And obviously, you need to be really, really quick when you do the operation so that it does not hurt the frame rate. In a typical scenario, you don't do this every frame and you get to spend some time over a few frames to do this. Overall, we only had a few milliseconds for the entire operation.
This was quite an interesting engineering operation to undertake.
Right, so LOD’s then. Previously Umbra had no notion about LOD’s. We had polygon soup, some triangles grouped into objects and then there were visibility queries.
Obviously, in any modern 3D engine you need to have support for LOD’s.
It’s not just multiple versions of the same mesh, but you need to support LOD hierarchies and then there is the problem of deciding how the different LOD’s actually contribute to the occlusion. So you don’t want to end up in a situation where different LOD’s of the same mesh occlude each other.
The solution to this one once again sounds pretty simple. So first of all, for occlusion you just use the LOD level that contains most detail.
At this point I should probably make a distinction between an occluder and an occludee. An occluder hides other objects; an occludee is returned as a result from the visibility query.
Then, each LOD level is an occludee in Umbra. And for each of these levels you can specify an active distance range.
So at runtime, when we do the visibility query based on the camera transformation, it's pretty easy to do distance culling based on each occludee's active distance range.
Now, a simple camera distance is not a good criterion in all cases for selecting the active LOD level. For instance, when you do things like zooming, or you look through the scope of a sniper rifle, the distance doesn't change but you still need to use a more detailed LOD. For this purpose we implemented the possibility to scale the LOD distance at runtime. So the user specifies a number between 0 and 1, and all the LOD distances are scaled with that number.
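Putting the distance ranges and the runtime scale factor together, the selection might be sketched like this (illustrative names, not Umbra's API):

```python
import math

def select_lod(cam_pos, obj_pos, lod_ranges, distance_scale=1.0):
    """Pick the LOD whose [near, far) active range contains the scaled
    camera distance. distance_scale < 1.0 (e.g. while zooming) pushes
    the selection toward more detailed LODs without moving the camera."""
    d = math.dist(cam_pos, obj_pos) * distance_scale
    for lod, (near, far) in enumerate(lod_ranges):
        if near <= d < far:
            return lod
    return None  # outside every range: the object is distance-culled
```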
Another similar feature we implemented was the possibility to override the distance reference point entirely.
We considered other criteria for selecting the LOD level as well, such as the proportional screen space area, but so far it seems that the distance to the camera, distance scaling and a modifiable distance reference point are sufficient for all the uses we have encountered. There are currently no plans to change that.
We also considered doing something smarter with the occluder data. Since the occluder generation in Umbra is based on voxelization, we considered doing something like taking an intersection between all the LOD levels and using that as the occluder mesh. But then again, the simpler approach where we use the most detailed mesh seems to be working sufficiently well, so there hasn't been any pressure to change that.
Third person action game for Xbox One from creators of Max Payne and Alan Wake
In-house next generation 3D engine, developed by one of the most respected graphics team in the industry
Average object count per view ~40k without occlusion
Features lots of large scale destruction, semi-dynamic changes of geometry
Previously used GPU occlusion queries for visibility
One aspect of how Remedy is using Umbra that I wanted to talk about is how to deal with dynamically changing geometry.
The first type of changes required in this title are basically transitions from one state to another – I suppose often accompanied by a big collision or explosion.
There’s also cases where a scene is visited at different points in time and where some but not all of the geometry changes from one version to the next.
Both of these are nicely handled by being able to stream the Umbra data chunks in and out. Multiple versions for a given chunk exist to represent different states it is in, and the engine loads the appropriate chunks for the current game state. Parts of the scene that don’t change only have a single version of the Tome data.
While changing the active set of visibility data chunks is not instantaneous, it is something that happens within a couple of frames and therefore poses no problem with these types of transitions.
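The chunk-versioning scheme can be sketched as follows (a conceptual stand-in, not the actual engine code):

```python
class VisibilityStreamer:
    """Visibility data is stored per chunk, with one version of the
    data for each dynamic state the chunk can be in."""
    def __init__(self, versions):
        # versions[chunk_id] = {state_name: visibility_data}
        self.versions = versions

    def active_set(self, chunk_states):
        """Select the data version matching each chunk's current state.
        Chunks that never change carry only a 'default' version."""
        return {chunk: states.get(chunk_states.get(chunk, "default"),
                                  states["default"])
                for chunk, states in self.versions.items()}
```

The engine then links the selected chunks together at runtime to form the visibility data for the current game state.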
One of the harder technical challenges involved in the runtime linking that we had to solve is that it guarantees that there’s never leaks through solid walls even when the wall extends across streaming units. But I’ll save you from the details of that exercise.
So these dynamic scene changes are very similar to what we ended up implementing for the Witcher 3 team for their streaming needs.
Another thing that Remedy does is to cull shadow casters in shadow map rendering with the Umbra Tome data.
You can obviously do occlusion culling in light space just as you would for a normal camera.
But what Remedy is doing is something that turns out to be even more powerful.
Having the Umbra generated occlusion buffer and the visible objects, you can reproject that to light space to create a mask representing the potential shadow receivers for the light source. Shadow casters are then tested against this mask to find if they cast a visible shadow.
In the illustrations here, you see that with occlusion culling only we are still spending a lot of time rendering shadows. This can be avoided by using the visibility data for caster culling as shown on the right.
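In one dimension, the receiver-mask idea can be sketched like this (illustrative only; the real mask is a 2-D light-space buffer built by reprojecting the occlusion buffer):

```python
import math

def receiver_mask(visible_receivers, to_light_x, res):
    """Mark the light-space cells covered by any visible receiver."""
    mask = [False] * res
    for recv in visible_receivers:
        x0, x1 = to_light_x(recv)
        for i in range(max(0, math.floor(x0 * res)),
                       min(res, math.ceil(x1 * res))):
            mask[i] = True
    return mask

def casts_visible_shadow(caster, to_light_x, mask):
    """Render a caster only if its light-space footprint overlaps a
    potential receiver; otherwise its shadow can never be seen."""
    x0, x1 = to_light_x(caster)
    res = len(mask)
    return any(mask[i] for i in range(max(0, math.floor(x0 * res)),
                                      min(res, math.ceil(x1 * res))))
```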
This is a title that needs no introduction.
In case you've been living under a rock, it's a huge production by the creators of Halo, and it launched in September this year for the previous and current gen consoles, using Umbra 3 across the board.
We have worked with Bungie since 2009 and indeed many aspects of Umbra 3 were designed according to Bungie’s requirements.
Bungie has changed the way they build content pretty dramatically from the previous title, they’ve gone from modeling BSPs to be able to arbitrarily splash geometry around.
Their old way of doing visibility by manually placing portals was not going to cut it, so they called us.
As a result of the collaboration the Umbra data is being used not only for visibility but for many other engine systems as well. I’m going to concentrate on visibility on the following slides.
Hard requirements from Bungie
3km x 3km map
Full rebuild: 5 minutes
Smallest incremental update: 10 seconds
In Bungie’s words: ”so much content, so little time”
The enabling feature for rapid content iteration is that the data is local in nature. Bungie distributes the computation task into their build farm, and all intermediate results are shared by everyone in the team.
This also makes sure that you can keep tweaking on content to the very last moment before launch, and even after.
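A minimal sketch of such a shared result cache (conceptual only): key each tile's computation by a hash of its inputs, so unchanged tiles are never recomputed, whether the cache lives on one machine or is shared by the whole build farm.

```python
import hashlib

class TaskCache:
    """Content-addressed cache of per-tile computation results."""
    def __init__(self):
        self.store = {}
        self.computations = 0  # counts actual (non-cached) work

    def compute_tile(self, tile_geometry):
        key = hashlib.sha1(repr(tile_geometry).encode()).hexdigest()
        if key not in self.store:
            self.computations += 1  # expensive preprocessing goes here
            self.store[key] = "occlusion-data-" + key[:8]
        return self.store[key]
```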
I wanted to bring up one cool way that Bungie is optimizing input latency.
The final camera update is sometimes done very late in the frame due to it being dependent on Havok finishing its calculation.
But there are known bounds for how much the camera can move, as we know the current velocity of the camera and its previous location.
Visibility query done with incomplete information.
We built the means of doing a from-region instead of from-point query.
Shrink occluders – grow anti-occluders: still correct.
We call this feature the predicted camera, and it’s pretty much a unique property of our algorithm.
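In one dimension, the conservative trick can be sketched like this (illustrative only): shrink each occluder by the prediction radius, so the query result stays valid from every camera position inside that radius.

```python
def shrink_occluder(interval, prediction_radius):
    """Shrink an occluder interval by the camera prediction radius.
    Occluders that become degenerate are dropped entirely, which is
    conservative: less occlusion can only make more objects visible."""
    x0, x1 = interval
    x0, x1 = x0 + prediction_radius, x1 - prediction_radius
    return (x0, x1) if x0 < x1 else None
```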
Finally, I wanted to highlight another type of support for dynamic changes in occlusion that Umbra 3 supports: the gate objects.
Often there is a need to change the occlusion dramatically, but very locally, for instance by opening a door or a window. As the name implies, gate objects are meant exactly for this purpose.
A gate object is a special input to Umbra that creates a set of portals that can be toggled on and off in runtime.
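Conceptually, a gate just owns a set of portals whose traversability follows the gate's state (names illustrative, not Umbra's API):

```python
class Gate:
    """A toggleable object owning a set of portal ids. While the gate
    is closed, its portals are impassable and the geometry occludes."""
    def __init__(self, portal_ids):
        self.portal_ids = set(portal_ids)
        self.open = True

def traversable_portals(all_portals, gates):
    """Portals the visibility query may walk through this frame."""
    blocked = set()
    for gate in gates:
        if not gate.open:
            blocked |= gate.portal_ids
    return set(all_portals) - blocked
```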
This is mostly used for doors and windows, and there’s limits to what it scales to.
Bungie uses the gates not just for doors, but also to partition space for the other more advanced use cases I mentioned earlier. These uses include audio occlusion, AI activation and other AI operations and broad-phase collision detection.
But I’ll leave these more advanced use cases to another time.