Rendering ‘Monster Hunter Online’
Liu Xiao
Graphics Programmer
Content
• Background
• Outline
• 24-hour TOD dynamic IBL
• Physically correct screen space reflection
• Lighting for transparent objects
• Pipeline summary
• Other small things
Background
 Why did we need to refactor the rendering pipeline of CryEngine?
• The version we are using (3.3.8) is out of date and the graphics quality was not really “next generation”.
• Its IBL GI system does not support dynamic time of day, while we have lots of maps with day / night cycles.
• No point lighting for transparent objects or any other forward shading objects (cloth, hair, etc).
• The real-time water reflection system has too many limits, while we have a lot of water surfaces in game, along with other
reflective surfaces such as ice, marble floors, etc.
 Why did we focus on the lighting pipeline?
• Most of the art assets were already finished, therefore we needed a quality-improving direction driven mostly by programming.
• Physically based shading is mainstream in this generation.
• Monster Hunter is a game that focuses a lot on object details, especially monster details and character details.
 Conclusion:
• Better lighting quality and better material details.
Outline
 The rendering pipeline refactoring started in February 2015 – 9 months before release
• Headache: 80% of the art assets were finished, yet the graphics quality was still at the X360 / PS3 level.
• Direction: a programming driven, artist friendly way to improve the quality.
• Conclusion: PBR material & pipeline + other features that enhance the lighting quality.
 Comparison between CryEngine versions
• The deferred lighting + forward shading pipeline in CryEngine 3.3.8 has cheap G buffer pass but cannot
support PBR for local point lights and GI. It also wastes draw calls.
• The deferred shading + (some object) forward shading pipeline in CryEngine 3.6.17 supports full PBR
pipeline, yet it still lacks some features. Besides, it’s DX11 only while we still need to support DX9.
Outline
 Step 1: Upgrade some parts of the engine
• Deferred Lighting → Deferred Shading
• Normalized Phong BRDF → GGX BRDF
Outline
 Step 2: Develop some features that CryEngine couldn’t offer
• Runtime GPU filtered dynamic IBL for 24-hour time of day.
• Physically correct screen space reflection.
• Forward plus pipeline for translucent objects.
• Compute shader & UAV support (won’t be discussed in this talk).
Video (Static IBL): a cubemap built at 12pm cannot fit the full day / night cycle
Video (Dynamic IBL): one cubemap fits the full day / night cycle
SSR on water surface
SSR on wet surface (rainy)
SSR on ice (general material)
Video: lighting translucent object
Hair with F+ point light lighting
Hair without point light lighting
Hair with F+ GI
Hair without F+ GI
F+ for AMD TressFX
24-hour Dynamic IBL
Image Based Lighting in CryEngine
CryEngine uses standard image based lighting
• Plant cubemap probes in the editor, then bake the cubemaps.
• The baked cubemap is then filtered through AMD CubeMapGen, generating one diffuse cubemap and one specular cubemap.
Environment Probe
Image Based Lighting in CryEngine
• The low frequency diffuse cubemap is used for environment diffuse.
Image Based Lighting in CryEngine
• The high frequency specular cubemap is used for environment specular.
• The cubemap is filtered at various levels based on different levels of glossiness, and the filtered results are stored in the cubemap mips.
• The lighting shader fetches the correct cubemap mip based on the pixel’s glossiness.
Image Based Lighting in CryEngine
 Pros:
• Easy to bake, easy to use.
 Cons:
• A pre-baked cubemap is not suitable for all TOD setups. It only suits the lighting environment at the time it was baked.
• Although we also have a lot of maps with fixed TOD, it still requires tons of work to rebuild all of them due to the number of combinations:
• 1 map = 8 zones
• 1 zone may have 3 - 5 fixed TOD setups
• 1 zone may have 1 – 12 cubemap probes
• Building 1 map = ??? of TOD changes & cubemap rebuilds
 Conclusion:
• Need to let one cubemap suit all TOD setups!
Static IBL (12pm) – cubemap built @ 12pm fits well
Static IBL (6pm) – cubemap built @ 12pm is too bright
Static IBL (5am) – cubemap built @ 12pm is too bright
24-hour Dynamic GI
 Goal:
• One cubemap suits all times of the day.
• Can be used for both 24-hour TOD map and fixed TOD map.
 Our solution in Monster Hunter Online:
• Instead of baking a cubemap with a static scene image, bake a scene diffuse cubemap and a scene normal cubemap. [ McAuley 2015, "Rendering The World Of Far Cry 4" ]
• Based on the game’s lighting environment and sky dome, re-light the cubemap (see the sketch below), and generate cubemap mips. The mip with 16 x 16 resolution can be used as the environment diffuse cubemap.
• Based on the BRDF we use, with filtered importance sampling, we filter the re-lit cubemap at different levels of glossiness. The filtered cubemap is used as the environment specular cubemap. [ Lagarde et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
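To make the relighting step concrete, here is a minimal NumPy sketch of re-lighting one baked texel from the scene diffuse and scene normal cubemaps. The function names and the simple sun + sky model are illustrative assumptions; the shipped pass runs on the GPU and follows the game’s actual TOD lighting model.

```python
import numpy as np

def relight_texel(albedo, normal, sun_dir, sun_color, sky_light):
    """Re-light one baked cubemap texel for the current time of day.

    albedo    : (3,) scene diffuse color baked into the diffuse cubemap
    normal    : (3,) unit world-space normal baked into the normal cubemap
    sun_dir   : (3,) unit vector toward the sun for the current TOD
    sun_color : (3,) sun radiance for the current TOD
    sky_light : callable(normal) -> (3,) sky dome lighting lookup
    """
    n_dot_l = max(float(np.dot(normal, sun_dir)), 0.0)
    direct = sun_color * n_dot_l       # sun diffuse term
    ambient = sky_light(normal)        # sky dome contribution
    # No depth is baked, so no shadowing here (see 'Future work' below).
    return albedo * (direct + ambient)
```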
Cubemap Relighting
Cubemap Filtering
 For each runtime re-lit cubemap:
1. Generate MIPs from raw cubemap first.
2. The low resolution MIP can be used for environment diffuse cubemap directly.
3. However, for environment specular, we need to filter the cubemap correctly based on different levels of glossiness, and store the filtered results in different cubemap MIPs.
 The problem: how to filter specular cubemap correctly on GPU at runtime?
 Solution: GPU filtered importance sampling
1. [ GPU Gems3, “GPU Based Importance Sampling” ]
2. [ Karis 2013, “Real Shading in Unreal Engine 4” ]
3. [ Lagarde et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
Environment Specular
Importance sampling in Monte-Carlo integration:
With specular BRDF:
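Written out in the standard form from the cited references (Karis 2013; Lagarde et al. 2014), with ⟨n·l⟩ the clamped cosine and p the sampling PDF, the estimator and the split of the specular integral into the LD and DFG terms discussed on the next slides are:

```latex
% Monte Carlo importance sampling estimator:
\int_{\Omega} g(l)\,\mathrm{d}l \;\approx\; \frac{1}{N}\sum_{k=1}^{N} \frac{g(l_k)}{p(l_k)}

% Applied to the specular BRDF f with incoming radiance L_i (Karis 2013):
\int_{\Omega} L_i(l)\, f(l,v)\,\langle n \cdot l\rangle\,\mathrm{d}l
\;\approx\;
\underbrace{\frac{\sum_{k} L_i(l_k)\,\langle n \cdot l_k\rangle}
                 {\sum_{k} \langle n \cdot l_k\rangle}}_{\text{LD (pre-filtered cubemap)}}
\;\cdot\;
\underbrace{\frac{1}{N}\sum_{k=1}^{N}
            \frac{f(l_k,v)\,\langle n \cdot l_k\rangle}{p(l_k,v)}}_{\text{DFG (environment BRDF)}}
```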
Environment Specular
• The DFG term can be pre-integrated offline and the results can be stored in a texture. [ Karis 2013,
“Real Shading in Unreal Engine 4” ]
• Or it can be approximated through one function. [ Lazarov 2013, "Getting More Physical in Call of Duty: Black Ops II" ]
• We use the pre-integrated DFG texture for our high spec configuration, while we use the Lazarov function for our low spec configuration in order to reduce texture sampling (a pre-integration sketch follows).
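A minimal NumPy sketch of pre-integrating one texel of the DFG LUT, following the scheme in Karis 2013; the GGX sampling, the Schlick-GGX visibility with k = α/2 for IBL, and the (scale, bias) split applied to F0 at runtime come from that reference, while the helper names are ours. It assumes 0 < n_dot_v ≤ 1.

```python
import numpy as np

def hammersley(i, n):
    # 2-D low-discrepancy sample: (i/n, 32-bit radical inverse of i)
    bits = int('{:032b}'.format(i)[::-1], 2)
    return i / n, bits / 2**32

def importance_sample_ggx(u1, u2, roughness):
    # GGX half-vector sampling in tangent space (Karis 2013), alpha = roughness^2
    a = roughness * roughness
    phi = 2.0 * np.pi * u1
    cos_t = np.sqrt((1.0 - u2) / (1.0 + (a * a - 1.0) * u2))
    sin_t = np.sqrt(max(1.0 - cos_t * cos_t, 0.0))
    return np.array([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])

def integrate_dfg(n_dot_v, roughness, num_samples=1024):
    # One texel of the DFG LUT: returns (scale, bias) applied to F0 at runtime.
    v = np.array([np.sqrt(1.0 - n_dot_v * n_dot_v), 0.0, n_dot_v])
    scale = bias = 0.0
    for i in range(num_samples):
        u1, u2 = hammersley(i, num_samples)
        h = importance_sample_ggx(u1, u2, roughness)
        l = 2.0 * np.dot(v, h) * h - v            # reflect v about h
        n_dot_l, n_dot_h, v_dot_h = l[2], h[2], float(np.dot(v, h))
        if n_dot_l > 0.0:
            k = roughness * roughness / 2.0       # Schlick-GGX k for IBL
            g = (n_dot_l / (n_dot_l * (1 - k) + k)) * \
                (n_dot_v / (n_dot_v * (1 - k) + k))
            g_vis = g * v_dot_h / (n_dot_h * n_dot_v)
            fc = (1.0 - v_dot_h) ** 5             # Schlick Fresnel factor
            scale += (1.0 - fc) * g_vis
            bias += fc * g_vis
    return scale / num_samples, bias / num_samples
```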
Cubemap Filtering
 The LD term needs to be computed for each pixel of the cubemap
1. For each pixel of each cubemap MIP, find N sampling directions through importance sampling. [ Karis 2013,
“Real Shading in Unreal Engine 4” ]
2. Calculate the PDF of that direction based on the glossiness value the filtering is based on.
3. Based on the PDF, calculate the raw cubemap MIP we need to sample, and sample it. [ GPU Gems3, “GPU
Based Importance Sampling” ](equation 11, 12 and 13)
4. Weight these samples by the cosine term (see the sketch below).
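A sketch of filtering one texel of one specular MIP, reusing hammersley and importance_sample_ggx from the DFG sketch above. The N = V = R assumption and the cosine weighting follow Karis 2013; the PDF-based source-MIP selection (GPU Gems 3, equations 11–13) is stubbed out to keep the sketch short.

```python
import numpy as np

def prefilter_texel(n, roughness, sample_env, num_samples=32):
    """Filter one output cubemap texel in direction n for one glossiness MIP.

    n          : (3,) unit direction of the output texel (assumes N = V = R)
    sample_env : callable(direction, mip) -> (3,) raw cubemap lookup
    """
    # Tangent frame to rotate tangent-space half-vectors around n.
    up = np.array([0., 0., 1.]) if abs(n[2]) < 0.999 else np.array([1., 0., 0.])
    tx = np.cross(up, n); tx /= np.linalg.norm(tx)
    ty = np.cross(n, tx)
    color, weight = np.zeros(3), 0.0
    for i in range(num_samples):
        u1, u2 = hammersley(i, num_samples)
        h_ts = importance_sample_ggx(u1, u2, roughness)
        h = h_ts[0] * tx + h_ts[1] * ty + h_ts[2] * n
        l = 2.0 * np.dot(n, h) * h - n        # reflect view (= n) about h
        n_dot_l = float(np.dot(n, l))
        if n_dot_l > 0.0:
            # A full implementation picks the source MIP from the sample PDF
            # (GPU Gems 3, eq. 11-13); mip 0 is used here for brevity.
            color += np.asarray(sample_env(l, 0)) * n_dot_l
            weight += n_dot_l
    return color / max(weight, 1e-6)
```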
Cubemap Filtering
 Optimization:
• There’s no need to filter cubemap MIP0 as it will be used for perfect mirror-smooth surface. We just copy
the MIP0 of the raw cubemap to the filtered cubemap.
• Limit the sample count of the remaining MIPs to 32.
• Limit the maximum size of the cubemap to be 128 x 128 x 6.
• Since the TOD of our game isn’t running very fast, there’s no need to filter all cubemaps every frame. We
distribute the filtering work through multiple frames.
 Results:
• Lighting + filtering one cubemap = 0.35 ms
• By distributing the filtering across multiple frames, the total GPU time of the cubemap filtering pass is about 1 ms.
• The GPU time was measured on GTX 660, with 12 cubemaps in the map.
Cubemap Filtering
• The final filtered results are almost identical to the filtered results from AMD CubeMapGen.
Static IBL (6pm) – cubemap built @ 12pm is too bright
Dynamic IBL (6pm) – ambient & env specular fit well
Static IBL (5am) – cubemap built @ 12pm is too bright
Dynamic IBL (5am) – ambient & env specular fit well
Future work:
• The current re-lit cubemap doesn’t contain depth info, thus the lighting pass can only apply sun light and sky
light to it.
• Need to bake depth into the cubemap so that the cubemap re-lighting pass can be more accurate.
• The filtering pass can be further optimized if we batch all cubemap faces of the same MIP together.
• An automatic cubemap probe planting & baking system would be nice.
Physically correct screen space reflection
Motivation:
• Monster Hunter Online has a lot of water surfaces and smooth surfaces across the game world, thus a correct reflection system is necessary.
• The original water reflection was achieved by rendering the scene again through a reflection camera.
• Lots of limitations:
1. More draw calls.
2. Even worse if there are multiple reflection surfaces in the viewport.
3. The reflection only has sun light and misses point light, shadows and GI.
4. Without any filtering, it only supports mirror reflection and cannot support glossy reflection.
5. It cannot be used for GPU dynamic rain effect as the reflection surfaces (puddles) are generated procedurally.
Multi-reflection case
Motivation:
• Screen Space Reflection is already widely used in games.
• Pros:
1. High efficiency & room for optimization. We ended up with 2 types of SSR: water-only reflection for low spec and all-surface reflection for high spec.
2. The reflection contains all the rendering information (lighting, shadows, GI, particles) if we use last frame’s scene
texture as input.
• Cons:
1. Cannot reflect anything that is outside the viewport.
2. Cannot reflect anything that is occluded on screen.
3. Needs the environment specular cubemap as a fallback solution.
• Conclusion:
• CryEngine’s IBL cubemaps can be a good fallback solution for SSR if the tracing fails.
Our SSR for water
Video: Our SSR for rain
Our Glossy reflection
Our SSR on mixed surface
MHOL water only reflection
CE 3.6.17 SSR on smooth surface
MHOL full-surface reflection
CE 3.6.17 SSR on rough surface
SSR pipeline in MHOL
• Water only reflection (for low spec)
1. Render water surface normal & depth onto half resolution targets
2. Ray-tracing pass
3. Filtering pass with temporal reprojection
4. Apply SSR during water surface shading
• All-surface reflection (for high spec)
1. Downsample Gbuffer normal & depth onto half resolution targets, then render water surface normal & depth onto it.
2. Ray-tracing pass with stochastic importance sampling
3. Scene texture blurring pass (scene texture was from last frame)
4. Filtering pass
5. Temporal reprojection pass
6. Apply SSR after deferred IBL pass
7. Apply water SSR during water surface shading
Raytrace Pass (water reflection)
• The algorithm is pretty simple: calculate the reflection vector from the view vector and surface normal, then ray march along the reflection vector until it hits the scene depth (see the sketch below).
[Diagram: the view vector is reflected about the surface normal at the reflection surface; the reflection ray marches against scene depth]
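A minimal sketch of the linear ray march, in NumPy. The projection callbacks, the thickness test, and the trace distance are illustrative assumptions; the real pass walks in screen space as described in the optimizations below.

```python
import numpy as np

MAX_TRACE_DISTANCE = 20.0   # world units; tuning assumption

def ssr_ray_march(origin, reflect_dir, scene_depth, project,
                  num_steps=64, thickness=0.05):
    """March a reflection ray against the depth buffer.

    origin      : (3,) world-space position of the reflecting pixel
    reflect_dir : (3,) unit reflection vector (view reflected about normal)
    scene_depth : callable(u, v) -> linear scene depth at that texel
    project     : callable(world_pos) -> (u, v, ray_linear_z)
    Returns the hit UV, or None (caller falls back to the IBL cubemap).
    """
    for i in range(1, num_steps + 1):
        p = origin + reflect_dir * (i / num_steps) * MAX_TRACE_DISTANCE
        u, v, ray_z = project(p)
        if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
            return None                        # ray left the viewport
        z = scene_depth(u, v)
        if ray_z > z and ray_z - z < thickness:
            return (u, v)                      # ray passed just behind a surface
    return None
```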
Raytrace Pass (water reflection)
• Optimizations:
1. For each ray march step: XY → one pixel of screen space, Z → the reciprocal of linear Z. [ McGuire 2014, “Efficient GPU Screen-Space Ray Tracing” ]
2. The ray march process can be optimized with dithered sampling. Assuming the ray-tracing results for the pixels of a 2x2 tile are similar, we can distribute the whole tracing distance evenly among these 4 pixels (with a dithered offset pattern). [ Valient 2014, “Reflections And Volumetrics Of Killzone Shadow Fall” ]
3. Then we just need to composite the tracing results of these 4 pixels in the filtering pass in order to get the complete
tracing result.
[Diagram: the 2x2 dither tile; pixels A, B, C, D each trace a different quarter of the distance]
Filtering Pass (water reflection)
 How to make sure the reflection has all the scene information?
1. Only store the reflection location (reflection’s UV coordinate in scene texture) during the ray-tracing pass.
2. Calculate the reflection location from last frame (reprojection) during the filtering pass.
3. Use the reprojected reflection location to sample the scene color from last frame’s scene texture.
 How to composite the tracing results from the 2x2 tile?
• Since we are only dealing with water reflections here, the surface information within a 2x2 tile is almost identical (same reflectance and glossiness, as it is all water), so we just average the results.
 Temporal reprojection for stability
• Due to several factors (half resolution rendering, discontinuous scene depth, etc.), the final reflection image flickers while the camera is moving, so we use temporal reprojection to stabilize it. This will be detailed later.
[Images: raytrace pass → filtering pass → temporal reprojection → final composite]
Performance
• Profiling environment: 800 x 450 half resolution SSR on GTX 660
• The whole SSR process takes 0.55 ms in the worst case (half of the screen is occupied by water).
All surface reflection
• The all-surface reflection needs to follow the microfacet BRDF we use.
• The reflection direction is no longer a ray but a cone.
1. The rougher the reflection surface is, the blurrier the reflection image becomes.
2. The further the reflection surface is, the blurrier the reflection image becomes.
3. The intensity of the reflection needs to follow the reflecting surface’s Fresnel reflectance.
Glossy reflection in real world
Glossy reflection in real world
Mixed reflection in MHOL
Raytrace Pass (all-surface)
• Since the reflection direction is no longer a ray but a cone, we need more than one reflection ray to fill the cone, which is quite impractical in a real-time game.
[Diagram: on a glossy surface the reflection becomes a cone around the reflected view vector, marched against scene depth]
Raytrace Pass (all-surface)
• Optimization 1:
1. We can extend the dithered sampling technique mentioned before. For the 4 pixels in the 2x2 tile, we shoot 4 rays with 4 different reflection angles.
2. If the reflection surface is smooth enough, these 4 rays may be enough to cover the reflection cone.
Raytrace Pass (all-surface)
• Optimization 2:
1. However, for rough reflection surfaces, how can we cover the cone with only 4 rays?
2. Solution: temporal supersampling.
• Temporal Supersampling Refresher:
1. For a static pixel, if nothing relevant to that pixel changes, its image should be the same across frames.
2. Based on this assumption, we can distribute the rendering work of this pixel across frames and use the accumulated result of the pixel as its final image.
Raytrace Pass (all-surface)
• Temporal Supersampling:
For the pixels of a 2x2 tile, we not only shoot 4 rays with 4 different reflection angles in one frame, we also shoot rays with different angles across frames. This is achieved by adding a frame variation to the importance sampling used for the reflection angle calculation (see the sketch below).
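A sketch of how a frame variation can be folded into the sample index, reusing hammersley from the cubemap filtering sketch. The 2x2 slot layout matches the dither pattern above; the period of 8 frames is an assumed tuning value.

```python
TEMPORAL_FRAMES = 8   # frames before the jitter sequence repeats (assumption)

def reflection_sample_2d(px, py, frame_index):
    # Slot 0..3 inside the 2x2 dither tile, advanced every frame.
    tile_slot = (px & 1) + 2 * (py & 1)
    sample_index = tile_slot + 4 * (frame_index % TEMPORAL_FRAMES)
    # Returns (u1, u2), fed into importance_sample_ggx for the ray angle.
    return hammersley(sample_index, 4 * TEMPORAL_FRAMES)
```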
[Diagram: pixels A–D of the 2x2 tile shoot different ray angles in Frame N and Frame N+1]
Video: tracing result with jittering
Filtering Pass (all-surface)
• Differences from the water reflection filtering:
1. Since the ray angles were calculated through importance sampling with a jittered pattern, we also need a jittered pattern to sample the ray-tracing results. [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ]
2. The source texture (last frame’s scene texture) needs to be pre-filtered so that reflections at every glossiness level can find a correctly blurred scene image.
3. The material information of the pixels within the 2x2 tile may be totally different, so we cannot simply average the ray-tracing results. The results have to be weighted by the material surface BRDF. Following [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ], the weight for each ray-tracing result is the current pixel’s BRDF divided by the sampling pixel’s PDF (see the sketch below).
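A sketch of that neighborhood resolve weighting from Stachowiak 2015, in NumPy; the callback signature is an assumption.

```python
import numpy as np

def resolve_reflection(center_brdf, tile_samples):
    """Composite the 2x2 tile's ray-tracing results for one pixel.

    center_brdf  : callable(ray_dir) -> scalar, the *current* pixel's BRDF
                   evaluated for a neighbor's ray direction
    tile_samples : list of (ray_dir, ray_pdf, reflection_color) tuples
    """
    total, weight_sum = np.zeros(3), 0.0
    for ray_dir, ray_pdf, color in tile_samples:
        w = center_brdf(ray_dir) / max(ray_pdf, 1e-6)   # BRDF / PDF weight
        total += np.asarray(color) * w
        weight_sum += w
    return total / max(weight_sum, 1e-6)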
Video: filtering result with jittering
Temporal Reprojection Pass
• Temporal Reprojection:
1. For each pixel from the filtering pass, we calculate its world space position, then reproject it into last frame’s screen UV
space, then find the resolved result from last frame’s SSR image.
2. Then we accumulate the last frame’s SSR data with current frame’s filtered data, and use that for this frame’s final SSR
image.
• The reprojection process for SSR is a little different:
1. Usually we reproject a pixel based on its world space position, computed from the pixel’s screen space location and scene depth. However, this is not the case for SSR reprojection.
2. For a reflection pixel, we cannot use the reflecting surface’s depth; we need to use the reflection’s depth to compute the world space position, then use that to reproject the pixel location (see the sketch below).
3. [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ]
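A sketch of reprojecting with the reflection’s depth rather than the reflecting surface’s depth; the matrix callbacks and the exponential blend factor are assumptions.

```python
def reproject_ssr(pixel_uv, hit_depth, unproject, prev_project):
    """Find last frame's UV for an SSR pixel (after Stachowiak 2015).

    hit_depth    : depth of the *reflected* geometry stored by the trace,
                   not the depth of the reflecting surface
    unproject    : callable(uv, depth) -> world position (current frame)
    prev_project : callable(world_pos) -> uv (last frame)
    """
    world_pos = unproject(pixel_uv, hit_depth)
    return prev_project(world_pos)   # sample last frame's resolved SSR here

def accumulate_ssr(current, history, blend=0.9):
    # Exponential history accumulation; 0.9 is a tuning assumption.
    return blend * history + (1.0 - blend) * current
```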
Video: final result with temporal reprojection
Temporal Reprojection Pass
• Removing Ghosting Artifacts:
1. Since we use last frame’s scene texture as the reflection source texture, we have a one-frame delay anyway. Thus we cannot remove ghosting entirely; we can only limit it to a tolerable level.
2. Using the reflection’s depth instead of the reflecting surface’s depth for reprojection (previously mentioned) helps a lot.
3. The color & luma bounding box clamping from UE4’s temporal AA also helps a lot (see the sketch below).
4. Skip the resolved sample if the reprojected location is too far away from the current pixel location.
5. Skip the resolved sample if the depth of the reprojected location is too far from the depth of the current pixel location.
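A sketch of the color bounding box clamp in item 3, as popularized by UE4’s temporal AA; the 3x3 neighborhood is the usual choice and an assumption here.

```python
import numpy as np

def clamp_history(history_color, neighborhood_colors):
    """Clamp the reprojected history sample to the current frame's local
    color bounding box; anything outside the box is treated as ghosting.

    history_color       : (3,) reprojected sample from last frame's SSR image
    neighborhood_colors : (N, 3) current-frame colors around the pixel (3x3)
    """
    lo = neighborhood_colors.min(axis=0)
    hi = neighborhood_colors.max(axis=0)
    return np.clip(history_color, lo, hi)
```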
Video: Reprojection based on reflection surface’s depth
Video: Reprojection based on reflection’s depth
Video: moving camera on smooth surface
Video: moving camera on mixed surface
Performance
• Profiling environment: 800 x 450 half resolution SSR on GTX 660
• Ray-tracing pass = 0.8 ms (worst case, when all pixels on screen pass the SSR glossiness threshold)
• Last frame scene texture pre-filtering = 0.08 ms
• Filtering pass = 0.27 ms
• Temporal reprojection pass = 0.2 ms
• Total = 1 ms (average) ~ 1.4 ms (worst)
Future work:
1. Optimize ray-tracing pass with hi-z sampling.
2. Further reduce ghosting by applying motion vectors during the temporal reprojection pass.
3. Render the filtering pass in full resolution for better quality (maybe for ultra spec configuration).
Lighting Translucent Objects
 Goal:
• Almost all of our maps have point lights, and in some of them point lights are the only type of light source.
• Monster Hunter Online has quite a few objects that can only be correctly shaded through forward shading:
a) Objects with anisotropic specular: hair (opaque part), silk, etc.
b) Objects with subsurface scattering for all lights: ice.
c) Translucent objects that do not go through Gbuffer pass: hair tips (translucent), TressFX fur, glass, etc.
• Need to make sure all of these forward shading objects are correctly lit by point lights.
 Solution:
• Forward Plus [ Harada 2012, “Forward+: Bringing Deferred Lighting To The Next Level” ]
Video: Hair is affected by point lights
Forward Plus
 Difference from tiled lighting:
1. Both techniques do tiled light culling on GPU.
2. Tiled lighting does per-pixel lighting directly after light culling.
3. Forward plus stores culled lights into a light indexed list which can be used for both deferred lighting and forward
lighting.
Forward Plus
 How tiles are divided:
1. The number of tiles matches the pixel count of one mip of the hi-z. Each tile represents one pixel of the hi-z. Our hi-z contains both the min z and max z of the scene.
2. At 4K resolution, the screen is divided into 256 x 128 tiles; at 1080p or above, into 128 x 64 tiles; at other resolutions, into 64 x 32 tiles.
3. Up to 16 point lights are supported per tile; up to 255 point lights are supported per viewport; the point light indices range from 1 to 255.
4. Each light is culled against the tile boundary and the min z and max z of the tile (the pixel of the hi-z), as sketched below. Hi-z is also used for GPU occlusion tests in the game.
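A CPU-side sketch of the per-tile culling the GPU pass performs; a sphere-vs-AABB test against a view-space tile box built from the hi-z min/max is an assumed simplification of the actual tile test.

```python
import numpy as np

MAX_LIGHTS_PER_TILE = 16   # from the layout described above

def cull_lights_for_tile(tile_min, tile_max, lights):
    """Collect the indices of point lights touching one tile.

    tile_min, tile_max : (3,) view-space AABB of the tile, built from the
                         tile's screen rect and the hi-z min z / max z
    lights             : iterable of (index, center, radius), index in 1..255
    """
    visible = []
    for index, center, radius in lights:
        closest = np.clip(center, tile_min, tile_max)  # nearest point on box
        d = center - closest
        if float(np.dot(d, d)) <= radius * radius:     # sphere overlaps box
            visible.append(index)
            if len(visible) == MAX_LIGHTS_PER_TILE:
                break
    return visible
```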
Forward Plus
 How light indices are stored (see the packing sketch below):
1. For DX9, we first need 4 R8G8B8A8 render targets to store the light indices. The first 4 lights of a tile are stored in render target A, the next 4 lights of the same tile in render target B, and so on.
2. Therefore the light index has to be in the range 1-255 (0 is reserved as no-light), and 1 pixel of 1 render target can store up to 4 lights, which means up to 16 lights are supported per tile.
3. Then we combine these 4 render targets into 1 light index atlas.
4. For DX11 it’s pretty simple: the light indices of each tile are stored in one element of a structured buffer.
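A runnable sketch of the DX9 packing: 16 one-byte light indices per tile spread across four RGBA8 texels, one per render target.

```python
def pack_tile_lights_dx9(light_indices):
    """Pack up to 16 light indices (1..255, 0 = no light) for one tile into
    four RGBA8 texels, one per render target, as described above."""
    assert len(light_indices) <= 16
    assert all(1 <= i <= 255 for i in light_indices)
    padded = list(light_indices) + [0] * (16 - len(light_indices))
    # Target A holds lights 0-3, target B holds 4-7, and so on.
    return [tuple(padded[4 * t: 4 * t + 4]) for t in range(4)]

# Example: a tile with 5 lights
# pack_tile_lights_dx9([7, 12, 3, 200, 45])
# -> [(7, 12, 3, 200), (45, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]
```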
Forward Plus
 Pipeline:
1. Cull the lights against the view frustum on the CPU, then write the light data into one global light buffer.
2. Write the depth of translucent objects into the scene Z target, and generate the hi-z based on that.
3. Do tiled light culling on the GPU, then store the culled light indices in the global light index list.
4. During forward shading, find the tile index from the shading pixel location, fetch the light indices from the global light index list, then light the pixel with the light data from the global light buffer.
 Pros:
Easy to implement. Meets our artists’ requirements pretty well.
 Cons:
1. We run into VGPR issues in the forward shading pass due to extensive forward lighting (and GI) calculation.
2. DX9 has vector register limits, which makes some of the pixel shaders fail to compile.
3. We have to simplify some lighting functions for the DX9 version.
Character Hair Rendering
• Most of a hair strand is opaque, while the strand tip is translucent.
• The opaque part goes through Gbuffer pass, deferred cubemap pass and deferred shadow pass
along with other general objects, while its lighting part is done through forward shading due to its
anisotropic specular lighting.
• The translucent part only goes through forward shading pass, where all of its lighting (both direct
and environment) and shadowing are done.
• Hair sorting:
1. Hair from different characters is sorted by the characters’ head positions.
2. Hair layers within one hair style are sorted by their sub-material names, such as “hair_0”, “hair_1” and so on. We require artists to assign different sub-materials to different hair layers if they want correct sorting.
3. However, for quite a few hair styles we don’t sort at all and the result is still fine.
• We also write hair depth into scene Z target so that the hair won’t be all blurry when depth of
field is turned on.
Opaque hair only vs opaque hair with translucent tips
GI for translucent objects
• Translucent objects still need GI lighting.
• Especially hair: its opaque part gets GI through the deferred cubemap pass, while its translucent part only goes through forward shading.
Strand tips with no GI Strand tips with GI
GI for translucent objects
 Goal:
• Our version of CryEngine uses cubemaps as its GI solution. Sometimes we have up to 12 cubemaps rendered on screen.
• For DX11 we can simply store all these cubemaps into one cubemap array and use it anywhere, while this is
totally unviable in DX9.
• For DX9 we cannot sample all those 12 cubemaps in one forward shading pixel shader either because of the
texture stage limits.
• Need to figure out a way to efficiently store all those cubemaps and use them in the forward shading pass.
 Solution:
• Project all those cubemaps into spherical harmonics coefficients.
• Therefore our GI for translucent objects is environment diffuse only. The environment specular for
translucent objects is faked by sampling one general specular cubemap (our translucent hair tips don’t have
any environment specular at all).
Spherical Harmonics Projection
 Pipeline:
1. Project each pixel of the diffuse cubemap into spherical harmonics, then accumulate them (see the sketch below).
2. For DX11 we use one set of SH9 coefficients, which can be stored in one element of a structured buffer.
3. For DX9 we use one set of SH4 coefficients, which can be stored in 4 pixels of a R32G32B32A32F render target.
4. To simplify the pipeline, we project all diffuse cubemaps into spherical harmonics coefficients and use them for both the deferred cubemap pass and the forward shading pass. Environment specular is still done by sampling specular cubemaps during the deferred cubemap pass.
5. As mentioned before, since we only project diffuse cubemaps into spherical harmonics coefficients, we can only do environment diffuse lighting for translucent objects. Environment specular lighting is faked for them.
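A NumPy sketch of the projection for one cubemap: each texel’s radiance is weighted by the real SH basis and the texel’s solid angle, then accumulated. The SH4 (DX9) variant just keeps the first 4 coefficients.

```python
import numpy as np

def sh9_basis(d):
    # Real spherical harmonics, bands 0-2, at unit direction d = (x, y, z).
    x, y, z = d
    return np.array([
        0.282095,                                   # Y00
        0.488603 * y, 0.488603 * z, 0.488603 * x,   # Y1-1, Y10, Y11
        1.092548 * x * y, 1.092548 * y * z,         # Y2-2, Y2-1
        0.315392 * (3.0 * z * z - 1.0),             # Y20
        1.092548 * x * z,                           # Y21
        0.546274 * (x * x - y * y),                 # Y22
    ])

def project_cubemap_sh9(texels):
    """Project a diffuse cubemap into SH9 coefficients.

    texels : iterable of (direction(3,), color(3,), solid_angle) per texel
    Returns a (9, 3) array: one RGB triple per basis function.
    """
    coeffs = np.zeros((9, 3))
    for direction, color, solid_angle in texels:
        coeffs += np.outer(sh9_basis(direction), color) * solid_angle
    return coeffs   # the SH4 (DX9) path keeps only coeffs[:4]
```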
Spherical Harmonics Projection
 DX11:
1. The projection is done in a compute shader with parallel reduction. [ Young 2010, “DirectCompute Optimizations and Best Practices” ]
2. We spawn 256 threads for one 16 x 16 diffuse cubemap. Each thread handles 6 pixels from 6 faces. The projected
coefficients are stored in the group shared memory, then they are accumulated through parallel reduction.
3. It only takes 0.16 ms to project 12 cubemaps with 16 x 16 resolution. We do this process every frame.
 DX9:
1. We couldn’t find a smart way to do this, so we do the projection through brute force: the projection of all pixels of one cubemap is done in one pixel shader fragment.
2. Due to the inefficiency, we have to lower the diffuse cubemap resolution for this process. We use 8 x 8 diffuse cubemap
for SH4 projection. All 8 x 8 x 6 pixels are projected and accumulated in one pixel shader fragment.
3. Basically we trade quality for GPU performance. The GPU time it takes to project 12 cubemaps at 8 x 8 resolution is 0.18 ms for DX9, which is on par with the DX11 version, and the quality loss is sort of acceptable.
[Images: diffuse cubemap vs its SH9 (DX11) and SH4 (DX9) reconstructions]
Light Indexed Deferred
• The CryEngine version we use still renders deferred lighting with light volumes. This is fine most of the time, until we run into maps with lots of point lights.
• Since we already have the global light indexed list, we can just use it as our deferred lighting
solution, which is called “light indexed deferred” in some papers.
• It’s not much different than tiled lighting, except we have culled the lights already and we just
need to grab the light indices for each tile and do the usual thing.
Light Indexed Deferred
• Performance:
1. DX11: when there aren’t many point lights in the viewport, the light indexed deferred method is on par with using light
volumes. However, when the amount of lights exceeds 50, the light indexed deferred method beats light volumes hands
down.
2. DX9: due to the inefficiency of branches and loops in SM 3.0, the light indexed deferred method is slower than light
volumes when there aren’t many point lights.
• Since we don’t have 50+ point lights most of time, we only turn on light indexed deferred for
DX11. For DX9 we still use the CryEngine light volume solution.
• Since we have a lot of outdoor maps in our game, the sky background can be seen most of the
time during the game. Thus a stencil pre-pass culling all the empty tiles is quite helpful.
• In Monster Hunter Online, the stencil pre-pass before the lighting pass saves about 0.1 – 0.2 ms in
DX11.
Future work:
1. Currently, shadow casting point lights and spot lights are still rendered using light volumes.
2. Need to find a solution so the tiled deferred lighting system supports shadow casting point lights.
3. Also need to find a solution to support spot lights and area lights (which we need to implement first).
Rendering Pipeline Overview
Rendering Pipeline in CryEngine 3.3.8
• CryEngine 3.3.8 uses deferred lighting + forward shading
1. Gbuffer Pass
2. Shadow Pass
3. AO Pass
4. Deferred Cubemap Pass (GI solution)
5. Deferred Lighting Pass (done through light volumes)
6. Deferred Shadowing
7. Forward Shading (for everything)
8. HDR Tonemapping + Post Effects + AA
• Gbuffer layout:
R8G8B8A8 Normal & Glossiness + R32F linear depth
• Pro: thin Gbuffer saves a lot of bandwidth
• Con: wastes draw calls, and the thin Gbuffer cannot support PBR for point lights and GI
Rendering Pipeline in CryEngine 3.6.17
• CryEngine 3.6.17 uses Deferred Shading + Forward Shading
1. Gbuffer Pass
2. Shadow Pass
3. AO Pass + SSR Pass
4. Deferred Cubemap Pass
5. Deferred Lighting Pass (done through tiled lighting)
6. Deferred Shadowing
7. Deferred Shading
8. Forward Shading (only for some objects)
9. HDR Tonemapping + Post Effects + AA
• Gbuffer layout:
R8G8B8A8 Normal & Transmittance + R8G8B8A8 Diffuse & SSS profile + R8G8B8A8 Glossiness & Specular
• Pro: supports full PBR for all types of lights
• Con: lacks features our artists desire & no longer supports DX9
Rendering Pipeline in Monster Hunter Online
• Based on 3.6.17, we added more features
1. Gbuffer Pass
2. Shadow Pass
3. Hi-Z Occlusion Test (DX11)
4. AO Pass + Stochastic SSR Pass
5. Cubemap Relighting & Filtering Pass
6. SH9 / SH4 projection Pass
7. Deferred Cubemap Pass
8. Deferred Lighting Pass (DX9: light volumes; DX11: light indexed deferred)
9. Deferred Shadowing
10. Deferred Shading
11. Forward Plus preparation (build hi-z with translucent object depth + build the light index list from the new hi-z)
12. Forward Shading (with Forward Plus lighting for translucent objects)
13. HDR Tonemapping + Post Effects + AA
• Gbuffer layout:
R8G8B8A8 Normal & Material ID + R8G8B8A8 Glossiness & Specular + R8G8B8A8 Diffuse & Transmittance + R32F linear depth (DX9 only)
Rendering Pipeline in Monster Hunter Online
• We also have a simplified deferred shading pipeline for middle spec configuration.
1. Gbuffer Pass
2. Shadow Pass
3. Hi-Z Occlusion Test (DX11)
4. Water only SSR
5. Deferred Lighting Pass (DX9: light volumes; DX11: light indexed deferred)
6. Deferred Shadowing
7. Deferred Shading
8. Forward Plus preparation (build hi-z with translucent object depth + build the light index list from the new hi-z)
9. Forward Shading (with Forward Plus lighting for translucent objects)
10. HDR Tonemapping + Post Effects + AA
• We replaced the cubemap GI solution with an artist-tuned static ambient value for environment diffuse and one global specular cubemap for environment specular.
Rendering Pipeline in Monster Hunter Online
• Meanwhile, we also have a forward-shading-only pipeline for really low spec machines.
1. Forward Shading done in R16G16B16A16F format render target
2. HDR Tonemapping and no other post effects
 This is pretty much how the original Monster Hunter was rendered on PSP, and it could be our rendering pipeline of choice if we port the game to mobile phones.
 Only sun light is handled. For a lot of environment objects normal mapping is skipped and only vertex normals are used.
 Environment diffuse is done through artist tuned ambient value, and environment specular is done through global
specular cubemap.
 Shadow map is not used. Instead, sphere shadow volume is used for character shadows.
 Still supports PBR.
 Future work:
1. Write scene depth into alpha channel of the scene render target so that we can still support screen space decals.
2. Since we don’t have a depth pre-pass, we’d better sort the scene objects beforehand to reduce overdraw via the early depth test.
Other small things
Miscellaneous rendering tricks, experiences, etc
Prevent color leaking in half-res DOF
• Depth of field pipeline:
1. Filter Pass: calculate the filtering kernel and blend alpha based on scene depth, then filter the scene.
2. Composite Pass: blend the filtered scene and the original scene based on blend alpha.
[Images: half-resolution scene → DOF filter → DOF-filtered scene blended with the unfiltered scene]
Prevent color leaking in half-res DOF
• How to prevent color leaking (see the sketch below):
1. Pre-multiplied Alpha Pass: before filtering the scene, calculate the blend alpha and multiply the scene by it.
2. Filter Pass: filter the pre-multiplied scene, then divide by the filtered blend alpha afterwards.
3. Composite Pass: same as usual.
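A NumPy sketch of the pre-multiplied alpha trick; blur stands in for the actual DOF filter kernel.

```python
import numpy as np

def dof_filter_premultiplied(scene_rgb, blend_alpha, blur):
    """Filter the scene without in-focus colors bleeding into blurred areas.

    scene_rgb   : (H, W, 3) half-resolution scene color
    blend_alpha : (H, W)    DOF blend factor computed from scene depth
    blur        : callable(image) -> image, the DOF filter kernel
    """
    premultiplied = scene_rgb * blend_alpha[..., None]   # step 1: pre-multiply
    filtered = blur(premultiplied)                       # step 2: filter
    filtered_alpha = blur(blend_alpha)[..., None]
    return filtered / np.maximum(filtered_alpha, 1e-6)   # divide alpha back out
```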
[Images: half-resolution scene pre-multiplied with alpha → DOF filter → DOF-filtered scene blended with the unfiltered scene]
[Images: decal with color leaking vs decal without color leaking]
Prevent color leaking in deferred decals
• The Problem:
• This is caused by depth discontinuities when there’s an object occluding the decal in screen space.
• For small decals, this can be solved by disabling the texture mipmap, e.g. forcing MIP0.
• However, in our game we have quite a lot of huge decals placed on terrain in order to improve
environment variation.
• The Solution:
• [ Persson 2010, http://humus.name/index.php?page=3D&ID=84 ]
• We sample the scene depth within the 2x2 area around the decal pixel, then manually find the depth continuities in the horizontal and vertical directions. Then we use tex2Dgrad to sample the decal texture (see the sketch below).
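A sketch of the manual gradient selection (after Persson 2010): for each axis, take the UV difference toward the neighbor whose depth is closer to the center, so the gradient never straddles a depth discontinuity. The callbacks are assumptions; on the GPU the result feeds tex2Dgrad.

```python
def decal_uv_gradients(x, y, depth, decal_uv):
    """Pick depth-continuous screen-space UV gradients for decal sampling.

    depth    : callable(x, y) -> linear scene depth at that pixel
    decal_uv : callable(x, y) -> decal UV (array-like) reconstructed from depth
    """
    d = depth(x, y)
    # Horizontal: step toward the neighbor on the same surface.
    if abs(depth(x + 1, y) - d) < abs(depth(x - 1, y) - d):
        ddx = decal_uv(x + 1, y) - decal_uv(x, y)
    else:
        ddx = decal_uv(x, y) - decal_uv(x - 1, y)
    # Vertical: same rule.
    if abs(depth(x, y + 1) - d) < abs(depth(x, y - 1) - d):
        ddy = decal_uv(x, y + 1) - decal_uv(x, y)
    else:
        ddy = decal_uv(x, y) - decal_uv(x, y - 1)
    return ddx, ddy   # feed these to tex2Dgrad instead of implicit gradients
```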
Geometry blending
• Our artists noticed this feature while playing Rise of the Tomb Raider and wanted something similar in Monster Hunter Online: reducing the visible disconnect between objects and terrain.
No Geometry Blend With Geometry Blend
Geometry blending
• Our pipeline:
1. Separate geometry blending objects from other Gbuffer objects and put them into a standalone render
list.
2. Render other Gbuffer objects first in the Gbuffer pass, then resolve the scene depth onto a target (half
resolution depth will also do).
3. Then render the geometry blending objects:
a) Sample the resolved depth in the shader and calculate the blending alpha from the difference between the resolved scene depth and the geometry fragment depth (see the sketch below).
b) Do Gbuffer alpha blending (like how the terrain layers are handled) based on this depth-bias alpha.
4. Since we use deferred shading and all PBR materials go through the same lighting function, Gbuffer-blended objects look like they are blended together.
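A one-line sketch of the depth-bias alpha in step 3a; blend_range is an artist-tuned distance and an assumption here.

```python
import numpy as np

def geometry_blend_alpha(scene_depth, fragment_depth, blend_range):
    """Gbuffer blend alpha from the depth difference (step 3a above).

    Fragments touching the terrain (zero depth difference) take the
    terrain's Gbuffer values; fragments blend_range or more in front of
    it keep their own Gbuffer values entirely.
    """
    return np.clip((scene_depth - fragment_depth) / blend_range, 0.0, 1.0)
```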
[Images: alpha-blended albedo diffuse, world-space normal, specular reflectance, and glossiness]
Geometry blending
• With this technique we can also do blending among non-terrain objects:
• However, we require our artists to only do this between non-metals. For non-metals the specular values are close (all around 50), while metals may have very different specular values. Besides, blending specular reflectance between metals, or between metals and non-metals, doesn’t make sense for PBR anyway.
DX11 in-place editing
• Sometimes we want to modify the data of render target A but the hardware alpha blending model just
cannot do it.
• For example: A’ = f (A)
• A is the original pixel of render target A
• A’ is the modified pixel of render target A
• f() is fairly complex and cannot be achieved through src color * src blend + dst color * dst blend
• Usually this is done through “ping pong”:
• Copy render target A to render target B
• Sample render target B during the shading pass, calculate A’ based on f(B)
• Write back A’ to render target A
• This needs one extra texture copy and one extra render target
[Diagram: ping-pong — pass 1 copies A to B; pass 2 writes f(B) to A. Since B = A, f(B) = f(A) = A′]
DX11 in-place editing
• DX11 supports in-place editing on a RWTexture if its format is 32-bit:
1. Create the texture with its corresponding typeless format. For example, for R8G8B8A8, its corresponding typeless format
is R8G8B8A8_TYPELESS.
2. Create the RTV and SRV of it with its original format, such as R8G8B8A8.
3. Create the UAV of it with R32UINT.
4. For the rendering pass, set the UAV as its shading output.
5. Inside the shader, load the data from the UAV first, then convert it from R32UINT to your desired format (such as R8G8B8A8), as sketched below.
6. After the data modification, re-convert your data back to R32UINT, then write it back to the output UAV.
7. Thus, we can modify the data of target A with only one rendering pass and no extra render target.
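A sketch of the bit conversions in steps 5 and 6; the channel order within the 32-bit word depends on the format layout, so the shifts below are an assumption.

```python
def r32uint_to_rgba8(word):
    # Step 5: reinterpret the R32UINT load as four 8-bit channels.
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]

def rgba8_to_r32uint(rgba):
    # Step 6: re-pack the modified channels for the UAV write.
    r, g, b, a = rgba
    return r | (g << 8) | (b << 16) | (a << 24)

# Round trip: rgba8_to_r32uint(r32uint_to_rgba8(0xDEADBEEF)) == 0xDEADBEEF
```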
[Diagram: read A directly, modify it, and write it back in one pass: A′ = f(A)]
DX11 in-place editing
• The Gbuffer rain pass and Gbuffer snow pass of CryEngine 3 can be optimized through this
method.
• With in-place editing, we saved 3 extra render targets from Gbuffer rain pass and Gbuffer snow
pass.
• Besides, we also saved 1 ms GPU time from Gbuffer rain pass and 0.2 ms GPU time from Gbuffer
snow pass.
• Warning: DX11 in-place editing doesn’t work for MSAA targets, and it breaks on old (pre-2015) Nvidia drivers.
Gbuffer Compression (not used)
• Gbuffer layout before rendering pipeline refactoring:
• R8G8B8A8 Normal & Glossiness + R32F linear depth
1. Good: thin Gbuffer, cheap on bandwidth
2. Bad: cannot support PBR for IBL cubemaps and point lights
• Gbuffer layout after rendering pipeline refactoring:
• R8G8B8A8 Normal & Material ID + R8G8B8A8 Glossiness & Specular + R8G8B8A8 Diffuse & Transmittance + R32F linear depth (DX9 only)
1. Good: supports PBR for all types of lighting, backward compatible with legacy materials
2. Bad: 3 targets for DX11 and 4 targets for DX9, bandwidth usage doubled compared to previous version
• Tested Gbuffer compression:
• A2R10G10B10 NormalXY & Glossiness & Material ID + R8G8B8A8 Specular YCr/YCb & Diffuse YCr/YCb + R32F linear depth (DX9 only)
1. Good: has all the information we need from the uncompressed Gbuffer layout
2. Bad: the ALU cost of decoding the interleaved YCrCb Gbuffer data outweighs the bandwidth gain
3. Conclusion: not viable, on PC at least
Road to PBR
 Programmer side:
1. HDR & linear space lighting pipeline (already exists in CryEngine).
2. New texture combination: Diffuse + Normal + Specular (Fresnel F0) + Glossiness.
3. New Gbuffer layout & rendering pipeline to suit the new texture combination.
4. Specular BRDF: Normalized Phong → Microfacet GGX.
5. Specular AA (already handled in the CryEngine texture tool): http://blog.selfshadow.com/2011/07/22/specular-showdown/
6. Pre-filtered / runtime-filtered environment specular cubemap.
 Artists side:
1. Need to make sure the artists understand the difference between albedo diffuse and specular reflectance.
2. Need to make sure the artists know where to put their texture detail (in normal & glossiness, not in diffuse or specular).
Road to PBR
 The Problem:
1. Our time budget for the PBR asset transition was only 2-3 months.
2. Need to figure out a method for quick & easy asset review.
3. We also refactored our environment cubemap pipeline, so every map’s lighting setup had to be redone.
4. Need to figure out a method for quick & easy lighting setup review.
Road to PBR
 The Solution:
1. Added lots of rendering debug view modes to the sandbox, so that our artists can review their work quickly and easily.
2. We also extended these debug view modes with lots of profiling visualization modes, such as texture usage.
A corner of the main city in Sandbox
Diffuse view mode: albedo diffuse should
not contain lighting information and
should be dark or black for metals.
Glossiness view mode: artists can put much of their detail into the material glossiness.
Specular view mode: metals have bright and sometimes colorful specular (over 100) while non-metals all have low and gray specular (around 50).
Normal view mode: we use this view
mode to check if assets have incorrectly
exported normal maps.
Environment diffuse view mode: this is used to check how the skylight parameter of the TOD is set up and how the GI system is working in the environment.
Environment specular view mode: this is used to check
how the pre-filtered / runtime-filtered specular
cubemap is working in the environment.
Texture MIP view mode: this is used for
testing the texture streaming system.
Texture usage view mode: this is used for
checking the amount of material textures
used for each surface material.
Texture size view mode: this is used for
checking the maximum texture size of each
surface material.
Gameplay related texture channel view
mode: we also store some gameplay
related information into material textures.
Reference
• [ McAuley 2015, "Rendering The World Of Far Cry 4" ]
• [ Lagarde et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
• [ GPU Gems3, "GPU Based Importance Sampling" ]
• [ Karis 2013, "Real Shading in Unreal Engine 4" ]
• [ Lazarov 2013, "Getting More Physical in Call of Duty: Black Ops II" ]
• [ McGuire 2014, "Efficient GPU Screen-Space Ray Tracing" ]
• [ Valient 2014, "Reflections And Volumetrics Of Killzone Shadow Fall" ]
• [ Stachowiak 2015, "Stochastic Screen-Space Reflections" ]
• [ Harada 2012, "Forward+: Bringing Deferred Lighting To The Next Level" ]
• [ Young 2010, "DirectCompute Optimizations and Best Practices" ]
• [ Persson 2010, http://humus.name/index.php?page=3D&ID=84 ]

Rendering 'monster hunter online'

  • 1.
    Rendering ‘Monster HunterOnline’ Liu Xiao Graphics Programmer
  • 2.
    Content • Background • Outline •24-hour TOD dynamic IBL • Physically correct screen space reflection • Lighting for transparent objects • Pipeline summery • Other small things
  • 3.
    Background  Why didwe need to refactor the rendering pipeline of CryEngine? • The version we are using (3.3.8) is out of date and the graphics quality was not really “next generation”. • Its IBL GI system does not support dynamic time of day, while we have lots of maps with day / night cycles. • No point lighting for transparent objects or any other forward shading objects (cloth, hair, etc). • The real-time water reflection system has too many limits, while we have a lot of water surfaces in game, along with other reflective surfaces such as ice, marble floors, etc.  Why did we focus on the lighting pipeline? • Most of the art assets are finished, therefore we need to find a quality improving direction mostly driven by programming. • Physically based shading is main stream in this generation. • Monster Hunter is a game that focuses a lot on object details, especially monster details and character details.  Conclusion: • better lighting quality and better material details.
  • 4.
    Outline  The renderingpipeline refactoring started at February 2015 – 9 months before release • Headache: 80% of the art assets were finished, yet the graphics quality was still at the X360 / PS3 level. • Direction: a programming driven, artist friendly way to improve the quality. • Conclusion: PBR material & pipeline + other features that enhance the lighting quality.  Comparison between CryEngine versions • The deferred lighting + forward shading pipeline in CryEngine 3.3.8 has cheap G buffer pass but cannot support PBR for local point lights and GI. It also wastes draw calls. • The deferred shading + (some object) forward shading pipeline in CryEngine 3.6.17 supports full PBR pipeline, yet it still lacks some features. Besides, it’s DX11 only while we still need to support DX9.
  • 5.
    Outline  Step 1:Upgrade some part of the engine • Deferred Lighting  Deferred Shading • Normalized Phong BRDF  GGX BRDF
  • 6.
    Outline  Step 2:Develop some features that CryEngine couldn’t offer • Runtime GPU filtered dynamic IBL for 24-hour time of day. • Physically correct screen space reflection. • Forward plus pipeline for translucent objects. • Compute shader & UAV support (won’t be discussed in this talk).
  • 7.
    Video(Static IBL): cubemapbuilt from 12pm cannot fit full day / night cycle
  • 8.
    Video(Dynamic IBL): onecubemap fits full day / night cycle
  • 9.
    SSR on watersurface
  • 10.
    SSR on wetsurface (rainy)
  • 11.
    SSR on ice(general material)
  • 12.
  • 13.
    Hair with F+point light lighting
  • 14.
    Hair with outpoint light lighting
  • 15.
  • 16.
  • 17.
    F+ for AMDTressFX
  • 18.
  • 19.
    Image Based Lightingin CryEngine CryEngine uses standard image based lighting • Plant cubemap probes in the editor, then bake the cubemaps. • Then the baked cubemap is filtered through AMD Cubemap Gen and generates one diffuse cubemap and one specular cubemap.
  • 20.
  • 21.
    Image Based Lightingin CryEngine • The low frequency diffuse cubemap is used for environment diffuse。
  • 22.
    Image Based Lightingin CryEngine • The high frequency specular cubemap is used for environment specular. • The cubemap is filtered on various levels based on different levels of glossness, and the filtered results are stored in the cubemap mips. • The lighting shader will fetch the correct cubemap mip based on the pixel’s glossenss.
  • 23.
    Image Based Lightingin CryEngine  Pros: • Easy to bake, easy to use.  Cons: • Pre-baked cubemap is not suitable for all TOD setups. It only suits the lighting environment when it’s baked. • Although we also have a lots of maps with fixed TOD, it still requires tons of work to rebuild all of them due to the amount of combination: • 1 map = 8 zones • 1 zone may have 3 - 5 fixed TOD setups • 1 zone may have 1 – 12 cubemap probes • Building 1 map = ??? Of TOD changes & cubemap rebuilds  Conclusion: • Need to let one cubemap suit all TOD setups!
  • 24.
    Static IBL (12pm)– cubemap built @ 12pm fits well
  • 25.
    Static IBL (6pm)– cubemap built @ 12pm is too bright
  • 26.
    Static IBL (5am)– cubemap built @ 12pm is too bright
  • 27.
    24-hour Dynamic GI Goal: • One cubemap suits all time of the day. • Can be used for both 24-hour TOD map and fixed TOD map.  Our solution in Monster Hunter Online: • Instead of baking a cubemap with static scene image, bake a scene diffuse cubemap and scene normal cubemap. [ McAuley 2015, "Rendering The World Of Far Cry 4" ] • Based on the game’s lighting environment and sky dome, re-light the cubemap, and generate cubemap mips. The mip with 16 x 16 resolution can be used for environment diffuse cubemap. • Based on the BRDF we use, with filtered importance sampling, we filter the re-lit cubemap based on different levels of glossness. The filtered cubemap will be used for environment specular cubemap. [ Lagrade et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
  • 28.
  • 30.
    Cubemap Filtering  Foreach runtime re-lit cubemap: 1. Generate MIPs from raw cubemap first. 2. The low resolution MIP can be used for environment diffuse cubemap directly. 3. However, for environment specular, we need to filter the cubemap correctly based on different levels of glossness, and store the filtered results in different cubemap MIPs.  The problem: how to filter specular cubemap correctly on GPU at runtime?  Solution: GPU filtered importance sampling 1. [ GPU Gems3, “GPU Based Importance Sampling” ] 2. [ Karis 2013, “Real Shading in Unreal Engine 4” ] 3. [ Lagrade et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
  • 31.
    Environment Specular Importance samplingin Monte-Carlo integration: With specular BRDF:
  • 32.
  • 33.
    环境Specular • The DFGterm can be pre-integrated offline and the results can be stored in a texture. [ Karis 2013, “Real Shading in Unreal Engine 4” ] • Or it can be approximated through one function. [ Lazarov 2013, "Getting More Physical in Call of Duty: Black Ops II" ]。 • We use the pre-integrated DFG texture for our high spec configuration while we use the Lazarov function for our low spec configuration in order to reduce texture sampling.
  • 34.
    Cubemap Filtering  TheLD term needs to be computed for each pixel of the cubemap 1. For each pixel of each cubemap MIP, find N sampling directions through importance sampling. [ Karis 2013, “Real Shading in Unreal Engine 4” ] 2. Calculate the PDF of that direction based the glossness value the filtering is based on. 3. Based on the PDF, calculate the raw cubemap MIP we need to sample, and sample it. [ GPU Gems3, “GPU Based Importance Sampling” ](equation 11, 12 and 13) 4. Weight these samples with cosine term.
  • 35.
    Cubemap Filtering  Optimization: •There’s no need to filter cubemap MIP0 as it will be used for perfect mirror-smooth surface. We just copy the MIP0 of the raw cubemap to the filtered cubemap. • Limit the sampling count of the rest MIPs to be 32. • Limit the maximum size of the cubemap to be 128 x 128 x 6. • Since the TOD of our game isn’t running very fast, there’s no need to filter all cubemaps every frame. We distribute the filtering work through multiple frames.  Results: • Lighting + filtering one cubemap = 0.35 ms • By distributing the filtering through multiple frames, the total GPU time of cubemap filtering pass is about 1 ms. • The GPU time was measured on GTX 660, with 12 cubemaps in the map.
  • 36.
    Cubemap Filtering • Thefinal filtered results is almost identical to the filtered results from ATI CubemapGen.
  • 37.
    Static IBL (6pm)– cubemap built @ 12pm is too bright
  • 38.
    Dynamic IBL (6pm)– ambient & env specular fit well
  • 39.
    Static IBL (5am)– cubemap built @ 12pm is too bright
  • 40.
    Dynamic IBL (5am)– ambient & env specular fit well
  • 41.
    Future work: • Thecurrent re-lit cubemap doesn’t contain depth info, thus the lighting pass can only apply sun light and sky light to it. • Need to bake depth into the cubemap so that the cubemap re-lighting pass can be more accurate. • The filtering pass can be further optimized if we batch all cubemap faces of the same MIP together. • An automatic cubemap probe planting & baking system would be nice.
  • 42.
    Physically correct screenspace reflection
  • 43.
    Motivation: • Monter HunterOnline has a lot of water surface and smooth surface across the game world, thus correct reflection system is necessary. • The original water reflection was achieved by rendering the scene again through a reflection camera. • Lots of limitations: 1. More draw calls. 2. Even worse if there are multiple reflection surfaces in the viewport. 3. The reflection only has sun light and misses point light, shadows and GI. 4. Without any filtering, it only support mirror reflection and cannot support glossy reflection. 5. It cannot be used for GPU dynamic rain effect as the reflection surfaces (puddles) are generated procedurally.
  • 44.
  • 45.
    Motivation: • Screen SpaceReflection is being wildly used in games already. • Pros: 1. High efficiency & rooms for optimization. We ended up with 2 types of SSR: water only reflection for low spec and all- surface reflection for high spec. 2. The reflection contains all the rendering information (lighting, shadows, GI, particles) if we use last frame’s scene texture as input. • Cons: 1. Cannot reflect anything that is outside the viewport. 2. Cannot reflect anything that is occluded in the screen. 3. Need environment specular cubemap for fallback solution. • Conclusion: • CryEngine’s IBL cubemaps can be a good fall back solution for SSR if the tracing fails.
  • 46.
  • 47.
  • 48.
  • 49.
    Our SSR onmixed surface
  • 50.
    MHOL water onlyreflection
  • 51.
    CE 3.6.17 SSRon smooth surface
  • 52.
  • 53.
    CE 3.6.17 SSRon rough surface
  • 54.
    SSR pipeline inMHOL • Water only reflection (for low spec) 1. Render water surface normal & depth onto half resolution targets 2. Ray-tracing pass 3. Filtering pass with temporal reprojection 4. Apply SSR during water surface shading • All-surface reflection (for high spec) 1. Downsample Gbuffer normal & depth onto half resolution targets, then render water surface normal & depth onto it. 2. Ray-tracing pass with stochastic importance sampling 3. Scene texture blurring pass (scene texture was from last frame) 4. Filtering pass 5. Temporal reprojection pass 6. Apply SSR after deferred IBL pass 7. Apply water SSR during water surface shading
  • 55.
    Raytrace Pass (waterreflection) • The algorithm is pretty simple: calculate the reflection vector based on the view vector and surface normal, then ray march along the reflection vector until it hits the scene depth. normal view reflection ray scene depth reflection surface
  • 57.
    Raytrace Pass (waterreflection) • Optimizations: 1. For each ray march step: XY  one pixel of the screen space, Z  the reciprocal of linear Z. [ McGuire 2014, “Efficient GPU Screen-Space Ray Tracing” ] 2. The ray march process can be optimized with dithered sampling. Assuming the ray-tracing result for each pixel of a 2x2 tile is similar, then we can distribute the whole tracing distance evenly among these 4 pixels (with a dithered offset pattern) [ Valient 2014, “Reflections And Volumetrics Of Killzone Shadow Fall” ]. 3. Then we just need to composite the tracing results of these 4 pixels in the filtering pass in order to get the complete tracing result. DC BA A B C D
  • 58.
    Filtering Pass (waterreflection)  How to make sure the reflection has all the scene information? 1. Only store the reflection location (reflection’s UV coordinate in scene texture) during the ray-tracing pass. 2. Calculate the reflection location from last frame (reprojection) during the filtering pass. 3. Use the reprojected reflection location to sample the scene color from last frame’s scene texture.  How to composite the tracing results from 2x2 tile? • Since we are only dealing with water reflections now, the surface information of 2x2 tile is almost identical (same reflectance and glossness, as they are all water), we just need to average them.  Temporal reprojection for stability • Due to some factors (half resolution rendering, discontinuous scene depth, etc), the final reflection image flickers while camera is moving. Thus we need to use temporal reprojection to stabilize the reflection image. This will be detailed later.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
    Performance • Profiling environment:800 x 450 half resolution SSR on GTX 660 • The whole SSR process takes 0.55ms in the worst case (half of the screen is occupied by water).
  • 64.
    All surface reflection •The all-surface reflection needs to follow the microfacet BRDF we use. • The reflection direction is no longer a ray but a cone. 1. The rougher the reflection surface is, the blurrier the reflection image becomes. 2. The further the reflection surface is, the blurrier the reflection image becomes. 3. The intensity of the reflection needs to follow the reflection surface Fresnel reflectance.
  • 65.
  • 66.
  • 67.
  • 68.
    Raytrace Pass (all-surface) •Since the reflection direction is no longer a ray but a cone, we need more than one reflection rays in order to fill the cone. • Which is quite impractical in realtime game. normal view scene depth reflection surface reflection cone
  • 69.
    Raytrace Pass (all-surface) •Optimization 1: 1. We can expend the dithered sampling technique motioned before. For 4 pixels in the 2x2 tile, we can shoot 4 rays with 4 different reflection angles. 2. If the reflection surface is smooth enough, these 4 rays may be enough to cover the reflection cone. DC BA A B C D
  • 70.
    Raytrace Pass (all-surface) •Optimization 2: 1. However, for rough reflection surface, how can we cover the cone with only 4 rays? 2. Solution: temporal supersampling. • Temproal Supersampling Refresher: 1. For a static pixel, if nothing is changed regarding to this pixel, then its image should be the same across frames. 2. Based on this assumption, we can distribute the rendering work of this pixel across frames and use the accumulated result of the pixel as its final image.
  • 71.
    Raytrace Pass (all-surface) •Temproal Supersampling: For the pixels of 2x2 tile, we not only let them shoot 4 rays with 4 different reflection angles at one frame, we also let them shoot rays with different angles across frames. This is achieved by adding a frame variation to the importance sampling used for reflection angle calculation. A B C D A B C D Frame N+1Frame N
  • 72.
    Video: tracing resultwith jittering
  • 73.
    Filtering Pass (all-surface) •Difference between this and how we handle the filtering of water reflection 1. Since the ray angles were calculated based on importance sampling with a jittered pattern, we also need a jittered pattern to sample the ray-tracing results. [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ] 2. The source texture (last frame’s scene texture) needs to be pre-filtered so that the reflections of all glossness can find the correct blurred scene image. 3. The material information of the pixels within the 2x2 tile may be totally different, thus we cannot simply average the ray- tracing results. The results have to be weighted by the material surface BRDF. Based on [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ], the weight for each ray-tracing result pixel is the current pixel’s BRDF divided by the sampling pixel’s PDF.
  • 74.
  • 75.
    Temporal Reprojection Pass •Temporal Reprojection: 1. For each pixel from the filtering pass, we calculate its world space position, then reproject it into last frame’s screen UV space, then find the resolved result from last frame’s SSR image. 2. Then we accumulate the last frame’s SSR data with current frame’s filtered data, and use that for this frame’s final SSR image. • The reprojection process for SSR is a little bit different 1. Usually we reproject the pixel based on its world space position. We calculate its world space position based on the pixel’s screen space location and scene depth. However, this is not the case for SSR reprojection. 2. For the pixel of the reflection, we cannot use the reflection surface’s depth, we need to use the reflection’s depth to calculate the world space position, then use that to reproject the pixel location. 3. [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ]
Video: final result with temporal reprojection
Temporal Reprojection Pass
• Removing Ghosting Artifacts:
1. Since we use last frame's scene texture as the reflection source, we have a one-frame delay anyway. We cannot remove ghosting entirely; we can only limit it to a tolerable level.
2. Using the reflection's depth instead of the reflecting surface's depth for reprojection (mentioned previously) helps a lot.
3. The color & luma bounding box clamping from UE4's temporal AA also helps a lot (see the sketch below).
4. Skip the resolved sample if the reprojected location is too far from the current pixel location.
5. Skip the resolved sample if the depth at the reprojected location is too far from the depth at the current pixel location.
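A minimal sketch of the neighborhood clamp in item 3, in the spirit of UE4's temporal AA; the 3x3 neighborhood and RGB-space box are assumptions (a luma- or YCoCg-space box works the same way).

```cpp
#include <algorithm>

struct Color { float r, g, b; };

// Clamp last frame's resolved sample to the color bounding box of the current
// frame's 3x3 neighborhood, rejecting history colors that no longer appear
// anywhere near the pixel (the main source of ghosting).
Color clampHistory(const Color neighborhood[9], Color history) {
    Color lo = neighborhood[0], hi = neighborhood[0];
    for (int i = 1; i < 9; ++i) {
        lo.r = std::min(lo.r, neighborhood[i].r); hi.r = std::max(hi.r, neighborhood[i].r);
        lo.g = std::min(lo.g, neighborhood[i].g); hi.g = std::max(hi.g, neighborhood[i].g);
        lo.b = std::min(lo.b, neighborhood[i].b); hi.b = std::max(hi.b, neighborhood[i].b);
    }
    history.r = std::clamp(history.r, lo.r, hi.r);
    history.g = std::clamp(history.g, lo.g, hi.g);
    history.b = std::clamp(history.b, lo.b, hi.b);
    return history;
}
```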
Video: reprojection based on the reflection surface's depth
Video: reprojection based on the reflection's depth
Video: moving camera on a smooth surface
Video: moving camera on a mixed surface
Performance
• Profiling environment: 800 x 450 half-resolution SSR on a GTX 660
• Ray-tracing pass = 0.8 ms (worst case, with all pixels on screen passing the SSR glossiness threshold)
• Last frame scene texture pre-filtering = 0.08 ms
• Filtering pass = 0.27 ms
• Temporal reprojection pass = 0.2 ms
• Total = 1 ms (average) ~ 1.4 ms (worst)
Future work:
1. Optimize the ray-tracing pass with hi-z sampling.
2. Further reduce ghosting by applying motion vectors during the temporal reprojection pass.
3. Render the filtering pass at full resolution for better quality (maybe for ultra-spec configurations).
    Lighting Translucent objects Goal: • Almost all of our maps have point lights and for some of them point light is the only type of light source. • Monster Hunter Online has quite a few objects that can only be correctly shaded through forward shading: a) Objects with anisotropic specular: hair (opaque part), silk, etc. b) Objects with subsurface scattering for all lights: ice. c) Translucent objects that do not go through Gbuffer pass: hair tips (translucent), TressFX fur, glass, etc. • Need to make sure all of these forward shading objects are correctly lit by point lights.  Solution: • Forward Plus [ Harada 2012, “Forward+: Bringing Deferred Lighting To The Next Level” ]
Video: hair is affected by point lights
Forward Plus
 Difference from tiled lighting:
1. Both techniques do tiled light culling on the GPU.
2. Tiled lighting does per-pixel lighting directly after light culling.
3. Forward plus stores the culled lights into a light indexed list which can be used for both deferred lighting and forward lighting.
Forward Plus
 How tiles are divided:
1. The number of tiles needs to match the pixel count of one mip of the hi-z. Each tile represents one pixel of the hi-z. Our hi-z contains both the min z and the max z of the scene.
2. At 4k resolution the screen is divided into 256 x 128 tiles; at 1080p or above, into 128 x 64 tiles; at other resolutions, into 64 x 32 tiles (see the sketch below).
3. Up to 16 point lights are supported per tile and up to 255 point lights per viewport; the point light indices range from 1 to 255.
4. Each light is culled against the tile boundary and the min z / max z of the tile (the corresponding hi-z pixel). The hi-z is also used for GPU occlusion testing in the game.
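A minimal CPU-side sketch of the tile layout and coarse depth cull described above; the exact resolution thresholds and helper names are assumptions drawn from the slide's numbers.

```cpp
#include <cstdint>

// A tile grid matching one hi-z mip: each tile is one hi-z texel.
struct TileGrid { uint32_t tilesX, tilesY; };

// Thresholds mirror the slide: 4k -> 256x128 tiles, 1080p and
// above -> 128x64, everything else -> 64x32.
TileGrid pickTileGrid(uint32_t height) {
    if (height >= 2160) return { 256, 128 };
    if (height >= 1080) return { 128, 64 };
    return { 64, 32 };
}

// Flat tile index for a shading pixel: a plain grid lookup.
uint32_t tileIndexForPixel(uint32_t px, uint32_t py,
                           uint32_t width, uint32_t height, TileGrid g) {
    const uint32_t tx = px * g.tilesX / width;
    const uint32_t ty = py * g.tilesY / height;
    return ty * g.tilesX + tx;
}

// Coarse per-tile depth cull: reject a light whose bounding sphere lies
// entirely in front of the tile's min z or behind its max z (view space).
bool lightIntersectsTileDepth(float lightViewZ, float lightRadius,
                              float tileMinZ, float tileMaxZ) {
    return (lightViewZ + lightRadius) >= tileMinZ &&
           (lightViewZ - lightRadius) <= tileMaxZ;
}
```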
Forward Plus
 How light indices are stored:
1. For DX9, we first need 4 R8G8B8A8 render targets to store the light indices. The first 4 lights of a tile are stored in render target A, the next 4 lights of the same tile in render target B, and so on.
2. Therefore the light index has to be in the range 1-255 (0 is reserved as "no light"), and 1 pixel of 1 render target can store up to 4 lights, which means up to 16 lights per tile.
3. We then combine these 4 render targets into 1 light index atlas (see the sketch below).
4. For DX11 it's simple: the light indices of each tile are stored in one element of a structured buffer.
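A minimal CPU-side sketch of that DX9 packing scheme; the buffer layout and names are assumptions.

```cpp
#include <cstdint>
#include <cstring>

// Up to 16 8-bit light indices per tile, 4 per RGBA8 texel, spread over 4
// render targets (later combined into one atlas). Index 0 means "no light".
void packTileLightIndices(const uint8_t* lightIndices, // values 1..255
                          uint32_t count,              // lights hitting this tile
                          uint32_t tileTexel,          // texel offset of the tile
                          uint32_t* rt[4]) {           // 4 RGBA8 targets as u32 texels
    if (count > 16) count = 16; // hard cap from the slide
    for (uint32_t rtIdx = 0; rtIdx < 4; ++rtIdx) {
        uint8_t rgba[4] = { 0, 0, 0, 0 }; // 0 = no light
        for (uint32_t c = 0; c < 4; ++c) {
            const uint32_t i = rtIdx * 4 + c;
            if (i < count) rgba[c] = lightIndices[i];
        }
        std::memcpy(&rt[rtIdx][tileTexel], rgba, 4);
    }
}
```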
Forward Plus
 Pipeline:
1. Cull the lights against the frustum on the CPU, then write the light data into one global light buffer.
2. Write the depth of translucent objects into the scene Z target, and generate the hi-z based on that.
3. Do tiled light culling on the GPU, then store the culled light indices into the global light indexed list.
4. During forward shading, find the tile index from the shading pixel location, fetch the light indices from the global light indexed list, then light the pixel with the light data from the global light buffer.
 Pros: Easy to implement. Meets our artists' requirements pretty well.
 Cons:
1. We run into VGPR pressure in the forward shading pass due to the extensive forward lighting (and GI) calculation.
2. DX9 has vector register limits, which makes some of the pixel shaders fail to compile.
3. We had to simplify some lighting functions for the DX9 version.
Character Hair Rendering
• Most of a hair strand is opaque, while the strand tip is translucent.
• The opaque part goes through the Gbuffer pass, deferred cubemap pass and deferred shadow pass along with other general objects, but its lighting is done through forward shading because of its anisotropic specular.
• The translucent part only goes through the forward shading pass, where all of its lighting (both direct and environment) and shadowing are done.
• Hair sorting:
1. Different characters' hair is sorted by head position.
2. Hair layers within one hair style are sorted by their sub-material names, such as "hair_0", "hair_1" and so on. We require artists to assign different sub-materials to different hair layers if they want correct sorting.
3. However, quite a few hair styles aren't sorted at all and the result is still fine.
• We also write hair depth into the scene Z target so that the hair won't be all blurry when depth of field is turned on.
Opaque hair only vs. opaque hair with translucent tips
GI for translucent objects
• Translucent objects still need GI lighting.
• Especially hair, whose opaque part goes through the deferred cubemap pass for GI while the translucent part only does forward shading.
(Comparison: strand tips with no GI vs. strand tips with GI)
GI for translucent objects
 Goal:
• Our version of CryEngine uses cubemaps as its GI solution. Sometimes we have up to 12 cubemaps rendered on screen.
• For DX11 we can simply store all of these cubemaps in one cubemap array and use it anywhere, but this is totally unviable in DX9.
• For DX9 we cannot sample all 12 cubemaps in one forward shading pixel shader either, because of the texture stage limits.
• We need a way to efficiently store all those cubemaps and use them in the forward shading pass.
 Solution:
• Project all of those cubemaps into spherical harmonics coefficients.
• As a result, our GI for translucent objects is environment diffuse only. The environment specular for translucent objects is faked by sampling one general specular cubemap (our translucent hair tips don't have any environment specular at all).
    Spherical Harmonics Projection Pipeline: 1. Project each pixel of the diffuse cubemap into spherical harmonics, then accumulate them. 2. For DX11 we use one set of SH9 coefficients, which can be stored in one element of a structure buffer. 3. For DX9 we use one set of SH4 coefficients, which can be stored in 4 pixels of a R32G32B32A32F render target. 4. To simply the pipeline, we project all diffuse cubemaps into spherical harmonics coefficients, and use that for both of our deferred cubemap pass and forward shading pass. The environment specular is still done through sampling specular cubemaps during the deferred cubemap pass. 5. As is mentioned before, since we can only project diffuse cubemaps into spherical harmonics coefficients, we can only do environment diffuse lighting for translucent objects. The environment specular lighting is faked for them.
    Spherical Harmonics Projection DX11: 1. The projection process is done through compute shader with parallel reduction[ Young 2010, “DirectCompute Optimizations and Best Practices” ]. 2. We spawn 256 threads for one 16 x 16 diffuse cubemap. Each thread handles 6 pixels from 6 faces. The projected coefficients are stored in the group shared memory, then they are accumulated through parallel reduction. 3. It only takes 0.16 ms to project 12 cubemaps with 16 x 16 resolution. We do this process every frame.  DX9: 1. We can’t find a smart way to do this, so we have to do the projection through brutal force: we have to do the projection of all pixels of one cubemap in one pixel shader fragment. 2. Due to the inefficiency, we have to lower the diffuse cubemap resolution for this process. We use 8 x 8 diffuse cubemap for SH4 projection. All 8 x 8 x 6 pixels are projected and accumulated in one pixel shader fragment. 3. Basically we trade GPU performance with quality. The GPU time it takes to project 12 cubemaps with 8 x 8 resolution is 0.18 ms for DX9, which is on par with the DX11 version, and the quality loss is sort of acceptable.
Light Indexed Deferred
• The CryEngine version we use still renders deferred lighting with light volumes. This is fine most of the time, until we run into maps with lots of point lights.
• Since we already have the global light indexed list, we can use it as our deferred lighting solution as well, which is called "light indexed deferred" in some papers.
• It's not much different from tiled lighting, except the lights are already culled; we just grab the light indices for each tile and do the usual lighting.
Light Indexed Deferred
• Performance:
1. DX11: when there aren't many point lights in the viewport, light indexed deferred is on par with light volumes. However, when the number of lights exceeds 50, light indexed deferred beats light volumes hands down.
2. DX9: due to the inefficiency of branches and loops in SM 3.0, light indexed deferred is slower than light volumes when there aren't many point lights.
• Since we don't have 50+ point lights most of the time, we only turn on light indexed deferred for DX11. For DX9 we keep CryEngine's light volume solution.
• Since we have a lot of outdoor maps, the sky background is visible most of the time during gameplay, so a stencil pre-pass that culls all the empty tiles is quite helpful.
• In Monster Hunter Online, the stencil pre-pass before the lighting pass saves about 0.1 – 0.2 ms in DX11.
Future work:
1. Currently shadow-casting point lights and spot lights are still rendered with light volumes.
2. We need to make the tiled deferred lighting system support shadow-casting point lights.
3. We also need to support spot lights and area lights (which need to be implemented first).
Rendering Pipeline in CryEngine 3.3.8
• CryEngine 3.3.8 uses deferred lighting + forward shading:
1. Gbuffer Pass
2. Shadow Pass
3. AO Pass
4. Deferred Cubemap Pass (GI solution)
5. Deferred Lighting Pass (done through light volumes)
6. Deferred Shadowing
7. Forward Shading (for everything)
8. HDR Tonemapping + Post Effects + AA
• Gbuffer layout: R8G8B8A8 normal & glossiness + R32F linear depth
• Pro: the thin Gbuffer saves a lot of bandwidth
• Con: wastes draw calls, and the thin Gbuffer cannot support PBR for point lights and GI
Rendering Pipeline in CryEngine 3.6.17
• CryEngine 3.6.17 uses deferred shading + forward shading:
1. Gbuffer Pass
2. Shadow Pass
3. AO Pass + SSR Pass
4. Deferred Cubemap Pass
5. Deferred Lighting Pass (done through tiled lighting)
6. Deferred Shadowing
7. Deferred Shading
8. Forward Shading (only for some objects)
9. HDR Tonemapping + Post Effects + AA
• Gbuffer layout: R8G8B8A8 normal & transmittance + R8G8B8A8 diffuse & SSS profile + R8G8B8A8 glossiness & specular
• Pro: supports the full PBR pipeline for all types of lights
• Con: lacks features our artists desire & no longer supports DX9
Rendering Pipeline in Monster Hunter Online
• Based on 3.6.17, we added more features:
1. Gbuffer Pass
2. Shadow Pass
3. Hi-Z Occlusion Test (DX11)
4. AO Pass + Stochastic SSR Pass
5. Cubemap Relighting & Filtering Pass
6. SH9 / SH4 Projection Pass
7. Deferred Cubemap Pass
8. Deferred Lighting Pass (DX9: light volumes; DX11: light indexed deferred)
9. Deferred Shadowing
10. Deferred Shading
11. Forward Plus Preparation (build hi-z with translucent objects' depth + build the light indexed list based on the new hi-z)
12. Forward Shading (with Forward Plus lighting for translucent objects)
13. HDR Tonemapping + Post Effects + AA
• Gbuffer layout: R8G8B8A8 normal & material ID + R8G8B8A8 glossiness & specular + R8G8B8A8 diffuse & transmittance + R32F linear depth (DX9 only)
Rendering Pipeline in Monster Hunter Online
• We also have a simplified deferred shading pipeline for middle-spec configurations:
1. Gbuffer Pass
2. Shadow Pass
3. Hi-Z Occlusion Test (DX11)
4. Water-only SSR
5. Deferred Lighting Pass (DX9: light volumes; DX11: light indexed deferred)
6. Deferred Shadowing
7. Deferred Shading
8. Forward Plus Preparation (build hi-z with translucent objects' depth + build the light indexed list based on the new hi-z)
9. Forward Shading (with Forward Plus lighting for translucent objects)
10. HDR Tonemapping + Post Effects + AA
• We replaced the cubemap GI solution with an artist-tuned static ambient value for environment diffuse and one global specular cubemap for environment specular.
Rendering Pipeline in Monster Hunter Online
• We also have a forward-shading-only pipeline for really low-spec machines:
1. Forward Shading done into an R16G16B16A16F render target
2. HDR Tonemapping and no other post effects
 This is pretty much how the original Monster Hunter is rendered on PSP, and could be our rendering pipeline of choice if we ever port the game to mobile phones.
 Only the sun light is handled. For a lot of environment objects normal mapping is skipped and only the vertex normal is used.
 Environment diffuse is done through an artist-tuned ambient value, and environment specular through the global specular cubemap.
 Shadow maps are not used. Instead, sphere shadow volumes are used for character shadows.
 It still supports PBR.
 Future work:
1. Write scene depth into the alpha channel of the scene render target so that we can still support screen space decals.
2. Since we don't have a depth pre-pass, we'd better sort the scene objects beforehand to reduce overdraw via early depth test.
Other small things
Miscellaneous rendering tricks, experiences, etc.
Prevent color leaking in half-res DOF
• Depth of field pipeline:
1. Filter Pass: calculate the filtering kernel and blend alpha based on scene depth, then filter the scene.
2. Composite Pass: blend the filtered scene and the original scene based on the blend alpha.
(Diagram: half-resolution scene -> DOF filter -> DOF-filtered scene blended with unfiltered scene)
Prevent color leaking in half-res DOF
• How to prevent color leaking (see the sketch below):
1. Pre-multiplied Alpha Pass: before filtering the scene, calculate the blend alpha and multiply the scene by it.
2. Filter Pass: filter the scene which has been multiplied by the blend alpha, then divide by the blend alpha after filtering.
3. Composite Pass: same as usual.
(Diagram: half-resolution scene pre-multiplied with alpha -> DOF filter -> DOF-filtered scene blended with unfiltered scene)
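A minimal 1-D sketch of the premultiply-filter-divide idea; the real pass runs a 2-D DOF kernel on the half-resolution scene, but the principle is identical.

```cpp
#include <algorithm>

struct Px { float r, g, b, a; }; // a = DOF blend alpha derived from scene depth

Px filterPremultiplied(const Px* src, const float* kernel, int taps) {
    Px acc = { 0, 0, 0, 0 };
    for (int i = 0; i < taps; ++i) {
        // Pre-multiply: in-focus pixels (alpha ~ 0) contribute no color,
        // so sharp foreground color cannot leak into blurred neighbors.
        acc.r += src[i].r * src[i].a * kernel[i];
        acc.g += src[i].g * src[i].a * kernel[i];
        acc.b += src[i].b * src[i].a * kernel[i];
        acc.a += src[i].a * kernel[i];
    }
    // Divide the filtered alpha back out to recover the color.
    const float inv = 1.0f / std::max(acc.a, 1e-5f);
    return { acc.r * inv, acc.g * inv, acc.b * inv, acc.a };
}
```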
Decal with color leaking vs. decal with no color leaking
Prevent color leaking in deferred decals
• The problem:
• It is caused by depth discontinuities where an object occludes the decal in screen space.
• For small decals this can be solved by disabling texture mipmapping, i.e. forcing MIP0.
• However, in our game we have quite a lot of huge decals placed on terrain to improve environment variation.
• The solution:
• [ Persson 2010, http://humus.name/index.php?page=3D&ID=84 ]
• We sample the scene depth within the 2x2 area around the decal pixel and manually find the depth continuities in the horizontal and vertical directions, then use tex2Dgrad to sample the decal texture (see the sketch below).
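A minimal sketch of that gradient selection, after [ Persson 2010 ]. The neighbor layout, depth threshold, and zero-gradient fallback are assumptions; in the shader the chosen gradients feed tex2Dgrad.

```cpp
#include <cmath>

struct UV { float u, v; };

static bool depthContinuous(float d0, float d1, float threshold = 0.01f) {
    return std::fabs(d0 - d1) < threshold; // linear-depth comparison
}

// uv/depth hold the 2x2 quad: [0] = this pixel, [1] = right, [2] = down.
void decalGradients(const UV uv[3], const float depth[3],
                    UV& gradX, UV& gradY) {
    // If the right neighbor crosses a depth edge, the raw ddx of the decal UV
    // would be huge, and mip selection would fetch a blurry mip that bleeds
    // the decal across the edge. Use the neighbor only if depth is continuous.
    if (depthContinuous(depth[0], depth[1]))
        gradX = { uv[1].u - uv[0].u, uv[1].v - uv[0].v };
    else
        gradX = { 0.0f, 0.0f }; // fall back to a safe (sharp) gradient
    if (depthContinuous(depth[0], depth[2]))
        gradY = { uv[2].u - uv[0].u, uv[2].v - uv[0].v };
    else
        gradY = { 0.0f, 0.0f };
}
```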
Geometry blending
• Our artists noticed this feature when playing Rise of the Tomb Raider.
• They wanted a similar feature in Monster Hunter Online: reducing the visible disconnect between objects and terrain.
(Comparison: no geometry blend vs. with geometry blend)
Geometry blending
• Our pipeline:
1. Separate geometry-blending objects from other Gbuffer objects and put them into a standalone render list.
2. Render the other Gbuffer objects first in the Gbuffer pass, then resolve the scene depth to a target (half-resolution depth will also do).
3. Then render the geometry-blending objects:
① Sample the resolved depth in the shader and calculate the blending alpha based on the difference between the resolved scene depth and the geometry fragment depth (see the sketch below).
② Do Gbuffer alpha blending (like how the terrain layers are handled) based on this depth-bias-based alpha.
4. Since we use deferred shading and all PBR materials go through the same lighting function, Gbuffer-blended objects look like they are blended together.
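A minimal sketch of the blend factor in step 3-①; the blend distance is an assumed artist-tuned parameter, and the resulting alpha then drives the Gbuffer alpha blending in step 3-②.

```cpp
#include <algorithm>

float geometryBlendAlpha(float resolvedSceneDepth, // depth of the terrain behind
                         float fragmentDepth,      // depth of the blending object
                         float blendDistance) {    // fade range
    // Fragments touching the terrain get alpha 0 (fully blended into it);
    // fragments well in front of it get alpha 1 (fully their own Gbuffer).
    const float t = (resolvedSceneDepth - fragmentDepth) / blendDistance;
    return std::clamp(t, 0.0f, 1.0f);
}
```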
(Comparison: alpha-blended albedo diffuse / world space normal / specular reflectance / glossiness)
Geometry blending
• With this technique we can also blend among non-terrain objects.
• However, we require our artists to only do this among non-metals. For non-metals the specular values are close (all around 50), while metals may have very different specular values. Besides, blending specular reflectance among metals, or between metals and non-metals, doesn't make sense for PBR anyway.
DX11 in-place editing
• Sometimes we want to modify the data of render target A but the hardware alpha blending model just cannot do it.
• For example: A' = f(A), where
• A is the original pixel of render target A,
• A' is the modified pixel of render target A,
• f() is fairly complex and cannot be expressed as src color * src blend + dst color * dst blend.
• Usually this is done through "ping-pong":
• Copy render target A to render target B.
• Sample render target B during the shading pass and calculate A' = f(B).
• Write A' back to render target A.
• This needs one extra texture copy and one extra render target.
(Diagram: pass 1 copies A to B; pass 2 writes f(B) to A; since B = A, f(B) = f(A) = A')
DX11 in-place editing
• DX11 supports in-place editing of an RWTexture if its format is 32-bit:
1. Create the texture with the corresponding typeless format. For example, for R8G8B8A8 the corresponding typeless format is R8G8B8A8_TYPELESS.
2. Create its RTV and SRV with the original format, such as R8G8B8A8.
3. Create its UAV with R32_UINT.
4. For the rendering pass, bind the UAV as the shading output.
5. Inside the shader, load the data from the UAV first, then convert it from R32_UINT to the desired format (such as R8G8B8A8).
6. After modifying the data, convert it back to R32_UINT and write it back to the output UAV (see the sketch below).
7. Thus we can modify the data of target A in a single rendering pass with no extra render target.
(Diagram: read A directly, do the modification, write A' = f(A) back directly)
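A minimal CPU-side sketch of the conversion in steps 5-6: the UAV is read as raw R32_UINT, unpacked to RGBA8, modified, and packed back. Names are assumptions.

```cpp
#include <cstdint>

struct RGBA8 { uint8_t r, g, b, a; };

// Unpack one raw 32-bit UAV value into an RGBA8 color.
RGBA8 unpackR32(uint32_t v) {
    return { uint8_t(v & 0xFF), uint8_t((v >> 8) & 0xFF),
             uint8_t((v >> 16) & 0xFF), uint8_t((v >> 24) & 0xFF) };
}

// Pack an RGBA8 color back into the raw 32-bit UAV value.
uint32_t packR32(RGBA8 c) {
    return uint32_t(c.r) | (uint32_t(c.g) << 8) |
           (uint32_t(c.b) << 16) | (uint32_t(c.a) << 24);
}

// In-place edit of one texel: load, modify (the f() from the slide), store.
void editInPlace(uint32_t* uav, uint32_t texel, RGBA8 (*f)(RGBA8)) {
    uav[texel] = packR32(f(unpackR32(uav[texel])));
}
```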
DX11 in-place editing
• The Gbuffer rain pass and Gbuffer snow pass of CryEngine 3 can be optimized with this method.
• With in-place editing, we saved 3 extra render targets across the Gbuffer rain and snow passes.
• We also saved 1 ms of GPU time in the Gbuffer rain pass and 0.2 ms in the Gbuffer snow pass.
• Warning: DX11 in-place editing doesn't work for MSAA targets, and it breaks on old (pre-2015) Nvidia drivers.
Gbuffer Compression (not used)
• Gbuffer layout before the rendering pipeline refactoring:
• R8G8B8A8 normal & glossiness + R32F linear depth
1. Good: thin Gbuffer, cheap on bandwidth
2. Bad: cannot support PBR for IBL cubemaps and point lights
• Gbuffer layout after the rendering pipeline refactoring:
• R8G8B8A8 normal & material ID + R8G8B8A8 glossiness & specular + R8G8B8A8 diffuse & transmittance + R32F linear depth (DX9 only)
1. Good: supports PBR for all types of lighting, backward compatible with legacy materials
2. Bad: 3 targets for DX11 and 4 targets for DX9; bandwidth usage doubled compared to the previous version
• Tested Gbuffer compression:
• A2R10G10B10 normal XY & glossiness & material ID + R8G8B8A8 specular YCr/YCb & diffuse YCr/YCb + R32F linear depth (DX9 only)
1. Good: carries all the information we need from the uncompressed Gbuffer layout
2. Bad: the ALU cost of decoding the interleaved YCrCb Gbuffer data overshadows the bandwidth gain (see the sketch below)
3. Conclusion: not viable, on PC at least
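A minimal sketch of the interleaved chroma idea behind the tested layout: every pixel stores luma plus one chroma component in a checkerboard, halving chroma storage. The BT.601 constants are an assumption about the exact transform; the neighbor fetch needed at decode time is the ALU cost mentioned above.

```cpp
struct YC { float y, c; }; // luma + one interleaved chroma channel

YC encodeInterleaved(float r, float g, float b, int px, int py) {
    const float y  = 0.299f * r + 0.587f * g + 0.114f * b;
    const float cb = 0.5f * (b - y) / (1.0f - 0.114f);
    const float cr = 0.5f * (r - y) / (1.0f - 0.299f);
    // Checkerboard: even pixels keep Cr, odd pixels keep Cb. Decoding must
    // reconstruct the missing component from a neighboring pixel, which is
    // where the extra decoding ALU comes from.
    const bool even = ((px + py) & 1) == 0;
    return { y, even ? cr : cb };
}
```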
    Road to PBR Programmer side: 1. HDR & linear space lighting pipeline (already exists in CryEngine). 2. New texture combination: Diffuse + Normal + Specular (Frensel 0) + Glossness. 3. New Gbuffer layout & rendering pipeline to suit the new texture combination. 4. Specular BRDF: Normalized Phong  Microfacet GGX. 5. Specular AA (already handled in CryEngine texture tool): http://blog.selfshadow.com/2011/07/22/specular- showdown/ 6. Pre-filtered / runtime-filtered environment specular cubemap.  Artists side: 1. Need to make sure the artists understand the difference between albedo diffuse and specular reflectance. 2. Need to make sure the artists know where to put their texture details at (normal & glossness, but not in diffuse or specular).
    Road to PBR The Problem: 1. Our time budget for PBR assets transition is only 2-3 months. 2. Need to figure out a method for quick & easy assets review. 3. We also refactored our environment cubemap pipeline, all maps lighting setup were refactored again. 4. Need to figure out a method for quick & easy lighting setup review.
    Road to PBR The Solution: 1. Added lots of rendering debug view mode in the sandbox, so that our artists can review their work quickly and easily. 2. We also expended this debug view mode with lots of profiling visualization modes, such as texture usage.
A corner of the main city in Sandbox
Diffuse view mode: albedo diffuse should not contain lighting information and should be dark or black for metals.
Glossiness view mode: artists can put much of the material detail into the glossiness.
Specular view mode: metals have bright and sometimes colorful specular (over 100) while non-metals all have low, gray specular (around 50).
Normal view mode: we use this view mode to check whether assets have incorrectly exported normal maps.
Environment diffuse view mode: used to check how the skylight parameter of the TOD is set up and how the GI system is working in the environment.
Environment specular view mode: used to check how the pre-filtered / runtime-filtered specular cubemap is working in the environment.
Texture MIP view mode: used for testing the texture streaming system.
Texture usage view mode: used for checking the number of material textures used by each surface material.
Texture size view mode: used for checking the maximum texture size of each surface material.
Gameplay-related texture channel view mode: we also store some gameplay-related information in material textures.
Reference
• [ McAuley 2015, "Rendering The World Of Far Cry 4" ]
• [ Lagarde et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
• [ GPU Gems 3, "GPU-Based Importance Sampling" ]
• [ Karis 2013, "Real Shading in Unreal Engine 4" ]
• [ Lazarov 2013, "Getting More Physical in Call of Duty: Black Ops II" ]
• [ McGuire 2014, "Efficient GPU Screen-Space Ray Tracing" ]
• [ Valient 2014, "Reflections And Volumetrics Of Killzone Shadow Fall" ]
• [ Stachowiak 2015, "Stochastic Screen-Space Reflections" ]
• [ Harada 2012, "Forward+: Bringing Deferred Lighting To The Next Level" ]
• [ Young 2010, "DirectCompute Optimizations and Best Practices" ]
• [ Persson 2010, http://humus.name/index.php?page=3D&ID=84 ]

Editor's Notes

  • #4 The CryEngine version we are using is quite out of date, and due to our extensive changes in the gameplay & online code base, upgrading the engine was no longer an option. The time window for updating visual quality was really short (1 year left before release) and 80% of the art assets were already finished. Therefore we needed to find a direction that both improved our graphics quality and affected as few art assets as possible.
  • #6 First we upgraded some parts of the rendering engine. The transition from the Normalized Phong material to a standard PBS material was not that hard. Our artists had spent some time on a later version of CryEngine creating the opening cinematic assets, so they had some training & practice for it. Most of the texture conversion work was automated by Python scripts. Although most of the converted textures were not calibrated, the result was acceptable considering the time budget we had.
  • #7 Since we have quite a few maps with 24-hour time of day, the pre-baked IBL provided by CryEngine is not enough; we need a solution for correct GI at any time of the day. The water reflection system (reflective camera) from CryEngine has a lot of limits: no shadows, no point lights, wasted draw calls, etc. The original screen space reflection from CryEngine is not good enough (detailed later). CryEngine 3.6.17's forward tiled lighting only supports hair and requires DX11, while we need a solution that covers a range of materials and works on both DX9 and DX11. Compute shaders & UAVs are essential for a functional DX11 pipeline, yet they are missing from both CryEngine 3.3.8 and CryEngine 3.6.17.
  • #8 Notice the ambient lighting during dawn and dusk: it's way too bright while the sun is at the horizon.
  • #9 The lighting at dawn and dusk is much more correct (the bright night is explicitly configured by the artists).
  • #10 CryEngine has its own screen space reflection but it cannot match ours either performance wise or quality wise.
  • #11 Notice the reflection is stretched along the reflection surface.
  • #12 This shows that our SSR works on all kinds of materials and it follows the roughness & normal details on materials.
  • #13 A transparent bottle affected by point lights.
  • #14 Character hair has its opaque part and translucent part. This picture shows the character hair is lit by a point light at the camera position.
  • #15 This shows the artifacts if we turn off the F+ pipeline for the translucent part of the hair.
  • #16 Hair (both opaque and translucent) also needs correct GI. This picture shows the hair is affected by GI in the shadow.
  • #17 This shows the artifacts when the translucent part of the hair is not affected by GI.
  • #18 All hair strands in AMD TressFX are translucent triangles, therefore all of them need F+ support for correct lighting.
  • #21 The left one is high frequency specular cubemap, the right one is low frequency diffuse cubemap.
  • #25 A static cubemap is baked at 12pm.
  • #26 If we use the same cubemap for dusk, then obviously it’s too bright for that time of day.
  • #27 And if we use it for dawn, then it’s also too bright.
  • #29 Normal cubemap + diffuse cubemap + sky cubemap (baked at runtime every frame) = re-lit cubemap
  • #30 These are re-lit cubemaps based on the lighting environment at 1am, 5am, 10am, 1pm, 5pm and 8pm.
  • #31 The original pre-baked cubemap from CryEngine is filtered through ATI CubemapGen offline. However, for runtime re-lit cubemap, we have to figure out a method to filter it online. Our solution is GPU filtered importance sampling.
  • #32 fr is the specular BRDF we use in the game, and pr is the PDF of that BRDF.
  • #33 Based on [ Lagarde et al. 2014 ], the integral can be decomposed into 2 parts: the DFG term and the LD term.
  • #36 We only allow up to 4 cubemaps to be filtered each frame, and we make sure one cubemap won’t be filtered again for at least 10 frames.
  • #38 The over-bright dusk we used to have with CryEngine’s static cubemap.
  • #39 The correct dusk we have with runtime re-lit & filtered cubemap.
  • #40 The over-bright dawn we have with CryEngine’s static cubemap.
  • #41 The correct dawn we have with runtime re-lit & filtered cubemap.
  • #43 CryEngine has its own screen space reflection but ours is better performance wise and quality wise.
  • #47 The easiest type of screen space reflection.
  • #48 It works very well with CryEngine’s Gbuffer rain effect.
  • #49 This demonstrates the glossy reflection we developed.
  • #50 Notice the reflections on the stainless steel ball on the right corner.
  • #51 The ray-trace distance of our method can cover the whole screen and the performance is still very good. The SSR pass of this scene takes 0.35ms.
  • #52 The SSR from CryEngine 3.6.17 has limited ray-trace distance and it looks horrible on smooth surface such as water or ice. In this scene it takes 0.45ms.
  • #53 With importance sampling during the ray-tracing pass, our all-surface reflection achieves physically correct results on rough surfaces: the reflection is sharp where it contacts the reflecting surface, and it stretches along the reflection direction.
  • #54 The glossy reflection of CryEngine’s SSR is not accurate and it suffers from heavy color leaking on object edges.
  • #55 Therefore the water meshes have to be rendered twice. The draw call for a water surface mesh is quite cheap as it's just a quad, but for the projected grid ocean we cannot afford that. Thus the ocean SSR is done on a flat plane, and we distort the reflection based on the ocean normal during the ocean shading pass.
  • #66 Picture is taken from China Art Museum. Notice the vertically stretched reflections of the LCD panel.
  • #67 The reflection close to the reflection surface is sharp. The reflection far from the reflection surface is blurry.
  • #68 This is the final result of a test scene in sandbox. I’ll demonstrate how the SSR works in this scene.
  • #69 For water reflection with one reflection ray, it takes 0.4 ms to complete the ray-tracing pass. Therefore we cannot afford more than that to shoot multiple reflection rays.
  • #73 Notice the jittering of the ray-tracing result. We will remove the jittering through temporal reprojection later.
  • #75 Now we get the feel of the reflection images, but there's way too much noise and flickering. We will use temporal reprojection to remove them.
  • #77 Now after temporal reprojection, the final SSR image is much more stable and can be used for game.
  • #79 This video demonstrates the ghosting artifact caused by using reflection surface’s depth for reprojection.
  • #80 This video demonstrates the result of using the reflection's depth for reprojection. The result is much more stable.
  • #91 This is a screenshot taken from the night scene of the main city, which has quite a lot of point lights.
  • #92 This is the light index atlas we use for DX9 forward plus pipeline. Top left: the first 4 light indices of each tile. Top right: the next 4 light indices of each tile. Bottom left: the next 4 light indices of each tile. Bottom right: the last 4 light indices of each tile.
  • #106 We have a light tile visualization system that prints out the count of lights for each tile on screen.
  • #111 We also have some gameplay related rendering passes among these rendering passes. They won’t be detailed in this talk.
  • #114 Introducing some rendering tricks; these are unrelated to lighting.
  • #128 Normal compression → Lambert azimuthal projection; RGB color compression → interleaved YCr / YCb.
  • #131 From left to right: M / B / G: these are game feature related info packed into material texture channels. Artists want to review their texture masking with this feature. N / N / D / D / T / S: size of normal map mip 0 / normal map mip / size of diffuse map mip 0 / diffuse map mip / count of textures used / shader complexity D / N / S / G / A: Gbuffer diffuse / Gbuffer normal / Gbuffer specular / Gbuffer glossness / ambient occlusion D / S: environment diffuse / environment specular
  • #133 These are our custom Gbuffer view modes in sandbox. They help our artists to review their work.
  • #135 These are our streaming related & gameplay related view modes. They help our engine programmers to review the streaming system we also refactored.