Rendering ‘Monster Hunter Online’
Liu Xiao
Graphics Programmer
Content
• Background
• Outline
• 24-hour TOD dynamic IBL
• Physically correct screen space reflection
• Lighting for transparent objects
• Pipeline summary
• Other small things
Background
 Why did we need to refactor the rendering pipeline of CryEngine?
• The version we are using (3.3.8) is out of date and the graphics quality was not really “next generation”.
• Its IBL GI system does not support dynamic time of day, while we have lots of maps with day / night cycles.
• No point lighting for transparent objects or any other forward shading objects (cloth, hair, etc).
• The real-time water reflection system has too many limits, while we have a lot of water surfaces in game, along with other
reflective surfaces such as ice, marble floors, etc.
 Why did we focus on the lighting pipeline?
• Most of the art assets were already finished, therefore we needed a quality-improving direction driven mostly by programming.
• Physically based shading is mainstream in this generation.
• Monster Hunter is a game that focuses a lot on object details, especially monster details and character details.
 Conclusion:
• Better lighting quality and better material details.
Outline
 The rendering pipeline refactoring started in February 2015 – 9 months before release
• Headache: 80% of the art assets were finished, yet the graphics quality was still at the X360 / PS3 level.
• Direction: a programming driven, artist friendly way to improve the quality.
• Conclusion: PBR material & pipeline + other features that enhance the lighting quality.
 Comparison between CryEngine versions
• The deferred lighting + forward shading pipeline in CryEngine 3.3.8 has cheap G buffer pass but cannot
support PBR for local point lights and GI. It also wastes draw calls.
• The deferred shading + (some object) forward shading pipeline in CryEngine 3.6.17 supports full PBR
pipeline, yet it still lacks some features. Besides, it’s DX11 only while we still need to support DX9.
Outline
 Step 1: Upgrade some parts of the engine
• Deferred Lighting → Deferred Shading
• Normalized Phong BRDF → GGX BRDF
Outline
 Step 2: Develop some features that CryEngine couldn’t offer
• Runtime GPU filtered dynamic IBL for 24-hour time of day.
• Physically correct screen space reflection.
• Forward plus pipeline for translucent objects.
• Compute shader & UAV support (won’t be discussed in this talk).
Video (Static IBL): a cubemap built at 12pm cannot fit the full day / night cycle
Video (Dynamic IBL): one cubemap fits the full day / night cycle
SSR on water surface
SSR on wet surface (rainy)
SSR on ice (general material)
Video: lighting translucent object
Hair with F+ point light lighting
Hair without point light lighting
Hair with F+ GI
Hair without F+ GI
F+ for AMD TressFX
24-hour Dynamic IBL
Image Based Lighting in CryEngine
CryEngine uses standard image based lighting
• Plant cubemap probes in the editor, then bake the cubemaps.
• The baked cubemap is then filtered through AMD CubeMapGen, generating one diffuse cubemap and one specular cubemap.
Environment Probe
Image Based Lighting in CryEngine
• The low frequency diffuse cubemap is used for environment diffuse.
Image Based Lighting in CryEngine
• The high frequency specular cubemap is used for environment specular.
• The cubemap is filtered at various levels based on different levels of glossiness, and the filtered results are stored in the cubemap mips.
• The lighting shader fetches the correct cubemap mip based on the pixel’s glossiness.
Image Based Lighting in CryEngine
 Pros:
• Easy to bake, easy to use.
 Cons:
• A pre-baked cubemap is not suitable for all TOD setups. It only suits the lighting environment at the time it was baked.
• Although we also have a lot of maps with fixed TOD, it still requires tons of work to rebuild all of them due to the number of combinations:
• 1 map = 8 zones
• 1 zone may have 3 - 5 fixed TOD setups
• 1 zone may have 1 – 12 cubemap probes
• Building 1 map = ??? of TOD changes & cubemap rebuilds
 Conclusion:
• Need to let one cubemap suit all TOD setups!
Static IBL (12pm) – cubemap built @ 12pm fits well
Static IBL (6pm) – cubemap built @ 12pm is too bright
Static IBL (5am) – cubemap built @ 12pm is too bright
24-hour Dynamic GI
 Goal:
• One cubemap suits all times of the day.
• Can be used for both 24-hour TOD map and fixed TOD map.
 Our solution in Monster Hunter Online:
• Instead of baking a cubemap with a static scene image, bake a scene diffuse cubemap and a scene normal cubemap. [ McAuley 2015, "Rendering The World Of Far Cry 4" ]
• Based on the game’s lighting environment and sky dome, re-light the cubemap (see the sketch below), and generate cubemap mips. The mip with 16 x 16 resolution can be used as the environment diffuse cubemap.
• Based on the BRDF we use, with filtered importance sampling, we filter the re-lit cubemap at different levels of glossiness. The filtered cubemap is used as the environment specular cubemap. [ Lagarde et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
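To make the relighting step concrete, here is a minimal NumPy sketch of re-lighting one baked texel from the scene diffuse and scene normal cubemaps. The function names and the simple sun + sky model are illustrative assumptions; the shipped pass runs on the GPU and follows the game’s actual TOD lighting model.

```python
import numpy as np

def relight_texel(albedo, normal, sun_dir, sun_color, sky_light):
    """Re-light one baked cubemap texel for the current time of day.

    albedo    : (3,) scene diffuse color baked into the diffuse cubemap
    normal    : (3,) unit world-space normal baked into the normal cubemap
    sun_dir   : (3,) unit vector toward the sun for the current TOD
    sun_color : (3,) sun radiance for the current TOD
    sky_light : callable(normal) -> (3,) sky dome lighting lookup
    """
    n_dot_l = max(float(np.dot(normal, sun_dir)), 0.0)
    direct = sun_color * n_dot_l       # sun diffuse term
    ambient = sky_light(normal)        # sky dome contribution
    # No depth is baked, so no shadowing here (see 'Future work' below).
    return albedo * (direct + ambient)
```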
Cubemap Relighting
Cubemap Filtering
 For each runtime re-lit cubemap:
1. Generate MIPs from raw cubemap first.
2. The low resolution MIP can be used for environment diffuse cubemap directly.
3. However, for environment specular, we need to filter the cubemap correctly based on different levels of glossiness, and store the filtered results in different cubemap MIPs.
 The problem: how to filter specular cubemap correctly on GPU at runtime?
 Solution: GPU filtered importance sampling
1. [ GPU Gems3, “GPU Based Importance Sampling” ]
2. [ Karis 2013, “Real Shading in Unreal Engine 4” ]
3. [ Lagarde et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
Environment Specular
Importance sampling in Monte-Carlo integration:
With specular BRDF:
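Written out in the standard form from the cited references (Karis 2013; Lagarde et al. 2014), with ⟨n·l⟩ the clamped cosine and p the sampling PDF, the estimator and the split of the specular integral into the LD and DFG terms discussed on the next slides are:

```latex
% Monte Carlo importance sampling estimator:
\int_{\Omega} g(l)\,\mathrm{d}l \;\approx\; \frac{1}{N}\sum_{k=1}^{N} \frac{g(l_k)}{p(l_k)}

% Applied to the specular BRDF f with incoming radiance L_i (Karis 2013):
\int_{\Omega} L_i(l)\, f(l,v)\,\langle n \cdot l\rangle\,\mathrm{d}l
\;\approx\;
\underbrace{\frac{\sum_{k} L_i(l_k)\,\langle n \cdot l_k\rangle}
                 {\sum_{k} \langle n \cdot l_k\rangle}}_{\text{LD (pre-filtered cubemap)}}
\;\cdot\;
\underbrace{\frac{1}{N}\sum_{k=1}^{N}
            \frac{f(l_k,v)\,\langle n \cdot l_k\rangle}{p(l_k,v)}}_{\text{DFG (environment BRDF)}}
```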
Environment Specular
• The DFG term can be pre-integrated offline and the results can be stored in a texture. [ Karis 2013,
“Real Shading in Unreal Engine 4” ]
• Or it can be approximated through one function. [ Lazarov 2013, "Getting More Physical in Call of Duty: Black Ops II" ]
• We use the pre-integrated DFG texture for our high spec configuration, while we use the Lazarov function for our low spec configuration in order to reduce texture sampling (a pre-integration sketch follows).
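A minimal NumPy sketch of pre-integrating one texel of the DFG LUT, following the scheme in Karis 2013; the GGX sampling, the Schlick-GGX visibility with k = α/2 for IBL, and the (scale, bias) split applied to F0 at runtime come from that reference, while the helper names are ours. It assumes 0 < n_dot_v ≤ 1.

```python
import numpy as np

def hammersley(i, n):
    # 2-D low-discrepancy sample: (i/n, 32-bit radical inverse of i)
    bits = int('{:032b}'.format(i)[::-1], 2)
    return i / n, bits / 2**32

def importance_sample_ggx(u1, u2, roughness):
    # GGX half-vector sampling in tangent space (Karis 2013), alpha = roughness^2
    a = roughness * roughness
    phi = 2.0 * np.pi * u1
    cos_t = np.sqrt((1.0 - u2) / (1.0 + (a * a - 1.0) * u2))
    sin_t = np.sqrt(max(1.0 - cos_t * cos_t, 0.0))
    return np.array([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])

def integrate_dfg(n_dot_v, roughness, num_samples=1024):
    # One texel of the DFG LUT: returns (scale, bias) applied to F0 at runtime.
    v = np.array([np.sqrt(1.0 - n_dot_v * n_dot_v), 0.0, n_dot_v])
    scale = bias = 0.0
    for i in range(num_samples):
        u1, u2 = hammersley(i, num_samples)
        h = importance_sample_ggx(u1, u2, roughness)
        l = 2.0 * np.dot(v, h) * h - v            # reflect v about h
        n_dot_l, n_dot_h, v_dot_h = l[2], h[2], float(np.dot(v, h))
        if n_dot_l > 0.0:
            k = roughness * roughness / 2.0       # Schlick-GGX k for IBL
            g = (n_dot_l / (n_dot_l * (1 - k) + k)) * \
                (n_dot_v / (n_dot_v * (1 - k) + k))
            g_vis = g * v_dot_h / (n_dot_h * n_dot_v)
            fc = (1.0 - v_dot_h) ** 5             # Schlick Fresnel factor
            scale += (1.0 - fc) * g_vis
            bias += fc * g_vis
    return scale / num_samples, bias / num_samples
```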
Cubemap Filtering
 The LD term needs to be computed for each pixel of the cubemap
1. For each pixel of each cubemap MIP, find N sampling directions through importance sampling. [ Karis 2013,
“Real Shading in Unreal Engine 4” ]
2. Calculate the PDF of that direction based on the glossiness value the filtering is based on.
3. Based on the PDF, calculate the raw cubemap MIP we need to sample, and sample it. [ GPU Gems3, “GPU
Based Importance Sampling” ](equation 11, 12 and 13)
4. Weight these samples by the cosine term (see the sketch below).
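A sketch of filtering one texel of one specular MIP, reusing hammersley and importance_sample_ggx from the DFG sketch above. The N = V = R assumption and the cosine weighting follow Karis 2013; the PDF-based source-MIP selection (GPU Gems 3, equations 11–13) is stubbed out to keep the sketch short.

```python
import numpy as np

def prefilter_texel(n, roughness, sample_env, num_samples=32):
    """Filter one output cubemap texel in direction n for one glossiness MIP.

    n          : (3,) unit direction of the output texel (assumes N = V = R)
    sample_env : callable(direction, mip) -> (3,) raw cubemap lookup
    """
    # Tangent frame to rotate tangent-space half-vectors around n.
    up = np.array([0., 0., 1.]) if abs(n[2]) < 0.999 else np.array([1., 0., 0.])
    tx = np.cross(up, n); tx /= np.linalg.norm(tx)
    ty = np.cross(n, tx)
    color, weight = np.zeros(3), 0.0
    for i in range(num_samples):
        u1, u2 = hammersley(i, num_samples)
        h_ts = importance_sample_ggx(u1, u2, roughness)
        h = h_ts[0] * tx + h_ts[1] * ty + h_ts[2] * n
        l = 2.0 * np.dot(n, h) * h - n        # reflect view (= n) about h
        n_dot_l = float(np.dot(n, l))
        if n_dot_l > 0.0:
            # A full implementation picks the source MIP from the sample PDF
            # (GPU Gems 3, eq. 11-13); mip 0 is used here for brevity.
            color += np.asarray(sample_env(l, 0)) * n_dot_l
            weight += n_dot_l
    return color / max(weight, 1e-6)
```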
Cubemap Filtering
 Optimization:
• There’s no need to filter cubemap MIP0 as it will be used for perfect mirror-smooth surface. We just copy
the MIP0 of the raw cubemap to the filtered cubemap.
• Limit the sample count of the remaining MIPs to 32.
• Limit the maximum size of the cubemap to be 128 x 128 x 6.
• Since the TOD of our game isn’t running very fast, there’s no need to filter all cubemaps every frame. We
distribute the filtering work through multiple frames.
 Results:
• Lighting + filtering one cubemap = 0.35 ms
• By distributing the filtering across multiple frames, the total GPU time of the cubemap filtering pass is about 1 ms.
• The GPU time was measured on GTX 660, with 12 cubemaps in the map.
Cubemap Filtering
• The final filtered results are almost identical to the filtered results from AMD CubeMapGen.
Static IBL (6pm) – cubemap built @ 12pm is too bright
Dynamic IBL (6pm) – ambient & env specular fit well
Static IBL (5am) – cubemap built @ 12pm is too bright
Dynamic IBL (5am) – ambient & env specular fit well
Future work:
• The current re-lit cubemap doesn’t contain depth info, thus the lighting pass can only apply sun light and sky
light to it.
• Need to bake depth into the cubemap so that the cubemap re-lighting pass can be more accurate.
• The filtering pass can be further optimized if we batch all cubemap faces of the same MIP together.
• An automatic cubemap probe planting & baking system would be nice.
Physically correct screen space reflection
Motivation:
• Monster Hunter Online has a lot of water surfaces and smooth surfaces across the game world, thus a correct reflection system is necessary.
• The original water reflection was achieved by rendering the scene again through a reflection camera.
• Lots of limitations:
1. More draw calls.
2. Even worse if there are multiple reflection surfaces in the viewport.
3. The reflection only has sun light and misses point light, shadows and GI.
4. Without any filtering, it only supports mirror reflection and cannot support glossy reflection.
5. It cannot be used for GPU dynamic rain effect as the reflection surfaces (puddles) are generated procedurally.
Multi-reflection case
Motivation:
• Screen Space Reflection is already widely used in games.
• Pros:
1. High efficiency & room for optimization. We ended up with 2 types of SSR: water-only reflection for low spec and all-surface reflection for high spec.
2. The reflection contains all the rendering information (lighting, shadows, GI, particles) if we use last frame’s scene
texture as input.
• Cons:
1. Cannot reflect anything that is outside the viewport.
2. Cannot reflect anything that is occluded on screen.
3. Needs the environment specular cubemap as a fallback solution.
• Conclusion:
• CryEngine’s IBL cubemaps can be a good fallback solution for SSR if the tracing fails.
Our SSR for water
Video: Our SSR for rain
Our Glossy reflection
Our SSR on mixed surface
MHOL water only reflection
CE 3.6.17 SSR on smooth surface
MHOL full-surface reflection
CE 3.6.17 SSR on rough surface
SSR pipeline in MHOL
• Water only reflection (for low spec)
1. Render water surface normal & depth onto half resolution targets
2. Ray-tracing pass
3. Filtering pass with temporal reprojection
4. Apply SSR during water surface shading
• All-surface reflection (for high spec)
1. Downsample Gbuffer normal & depth onto half resolution targets, then render water surface normal & depth onto it.
2. Ray-tracing pass with stochastic importance sampling
3. Scene texture blurring pass (scene texture was from last frame)
4. Filtering pass
5. Temporal reprojection pass
6. Apply SSR after deferred IBL pass
7. Apply water SSR during water surface shading
Raytrace Pass (water reflection)
• The algorithm is pretty simple: calculate the reflection vector from the view vector and surface normal, then ray march along the reflection vector until it hits the scene depth (see the sketch below).
[Diagram: the view vector is reflected about the surface normal at the reflection surface; the reflection ray marches against scene depth]
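A minimal sketch of the linear ray march, in NumPy. The projection callbacks, the thickness test, and the trace distance are illustrative assumptions; the real pass walks in screen space as described in the optimizations below.

```python
import numpy as np

MAX_TRACE_DISTANCE = 20.0   # world units; tuning assumption

def ssr_ray_march(origin, reflect_dir, scene_depth, project,
                  num_steps=64, thickness=0.05):
    """March a reflection ray against the depth buffer.

    origin      : (3,) world-space position of the reflecting pixel
    reflect_dir : (3,) unit reflection vector (view reflected about normal)
    scene_depth : callable(u, v) -> linear scene depth at that texel
    project     : callable(world_pos) -> (u, v, ray_linear_z)
    Returns the hit UV, or None (caller falls back to the IBL cubemap).
    """
    for i in range(1, num_steps + 1):
        p = origin + reflect_dir * (i / num_steps) * MAX_TRACE_DISTANCE
        u, v, ray_z = project(p)
        if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
            return None                        # ray left the viewport
        z = scene_depth(u, v)
        if ray_z > z and ray_z - z < thickness:
            return (u, v)                      # ray passed just behind a surface
    return None
```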
Raytrace Pass (water reflection)
• Optimizations:
1. For each ray march step: XY → one pixel of screen space, Z → the reciprocal of linear Z. [ McGuire 2014, “Efficient GPU Screen-Space Ray Tracing” ]
2. The ray march process can be optimized with dithered sampling. Assuming the ray-tracing results for the pixels of a 2x2 tile are similar, we can distribute the whole tracing distance evenly among these 4 pixels (with a dithered offset pattern). [ Valient 2014, “Reflections And Volumetrics Of Killzone Shadow Fall” ]
3. Then we just need to composite the tracing results of these 4 pixels in the filtering pass in order to get the complete
tracing result.
[Diagram: the 2x2 dither tile; pixels A, B, C, D each trace a different quarter of the distance]
Filtering Pass (water reflection)
 How to make sure the reflection has all the scene information?
1. Only store the reflection location (reflection’s UV coordinate in scene texture) during the ray-tracing pass.
2. Calculate the reflection location from last frame (reprojection) during the filtering pass.
3. Use the reprojected reflection location to sample the scene color from last frame’s scene texture.
 How to composite the tracing results from the 2x2 tile?
• Since we are only dealing with water reflections here, the surface information within a 2x2 tile is almost identical (same reflectance and glossiness, as it is all water), so we just average the results.
 Temporal reprojection for stability
• Due to several factors (half resolution rendering, discontinuous scene depth, etc.), the final reflection image flickers while the camera is moving, so we use temporal reprojection to stabilize it. This will be detailed later.
[Images: raytrace pass → filtering pass → temporal reprojection → final composite]
Performance
• Profiling environment: 800 x 450 half resolution SSR on GTX 660
• The whole SSR process takes 0.55 ms in the worst case (half of the screen is occupied by water).
All surface reflection
• The all-surface reflection needs to follow the microfacet BRDF we use.
• The reflection direction is no longer a ray but a cone.
1. The rougher the reflection surface is, the blurrier the reflection image becomes.
2. The further the reflection surface is, the blurrier the reflection image becomes.
3. The intensity of the reflection needs to follow the reflecting surface’s Fresnel reflectance.
Glossy reflection in real world
Glossy reflection in real world
Mixed reflection in MHOL
Raytrace Pass (all-surface)
• Since the reflection direction is no longer a ray but a cone, we need more than one reflection ray to fill the cone, which is quite impractical in a real-time game.
[Diagram: on a glossy surface the reflection becomes a cone around the reflected view vector, marched against scene depth]
Raytrace Pass (all-surface)
• Optimization 1:
1. We can extend the dithered sampling technique mentioned before. For the 4 pixels in the 2x2 tile, we shoot 4 rays with 4 different reflection angles.
2. If the reflection surface is smooth enough, these 4 rays may be enough to cover the reflection cone.
Raytrace Pass (all-surface)
• Optimization 2:
1. However, for rough reflection surfaces, how can we cover the cone with only 4 rays?
2. Solution: temporal supersampling.
• Temporal Supersampling Refresher:
1. For a static pixel, if nothing relevant to that pixel changes, its image should be the same across frames.
2. Based on this assumption, we can distribute the rendering work of this pixel across frames and use the accumulated result of the pixel as its final image.
Raytrace Pass (all-surface)
• Temporal Supersampling:
For the pixels of a 2x2 tile, we not only shoot 4 rays with 4 different reflection angles in one frame, we also shoot rays with different angles across frames. This is achieved by adding a frame variation to the importance sampling used for the reflection angle calculation (see the sketch below).
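A sketch of how a frame variation can be folded into the sample index, reusing hammersley from the cubemap filtering sketch. The 2x2 slot layout matches the dither pattern above; the period of 8 frames is an assumed tuning value.

```python
TEMPORAL_FRAMES = 8   # frames before the jitter sequence repeats (assumption)

def reflection_sample_2d(px, py, frame_index):
    # Slot 0..3 inside the 2x2 dither tile, advanced every frame.
    tile_slot = (px & 1) + 2 * (py & 1)
    sample_index = tile_slot + 4 * (frame_index % TEMPORAL_FRAMES)
    # Returns (u1, u2), fed into importance_sample_ggx for the ray angle.
    return hammersley(sample_index, 4 * TEMPORAL_FRAMES)
```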
[Diagram: pixels A–D of the 2x2 tile shoot different ray angles in Frame N and Frame N+1]
Video: tracing result with jittering
Filtering Pass (all-surface)
• Differences from the water reflection filtering:
1. Since the ray angles were calculated through importance sampling with a jittered pattern, we also need a jittered pattern to sample the ray-tracing results. [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ]
2. The source texture (last frame’s scene texture) needs to be pre-filtered so that reflections at every glossiness level can find a correctly blurred scene image.
3. The material information of the pixels within the 2x2 tile may be totally different, so we cannot simply average the ray-tracing results. The results have to be weighted by the material surface BRDF. Following [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ], the weight for each ray-tracing result is the current pixel’s BRDF divided by the sampling pixel’s PDF (see the sketch below).
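A sketch of that neighborhood resolve weighting from Stachowiak 2015, in NumPy; the callback signature is an assumption.

```python
import numpy as np

def resolve_reflection(center_brdf, tile_samples):
    """Composite the 2x2 tile's ray-tracing results for one pixel.

    center_brdf  : callable(ray_dir) -> scalar, the *current* pixel's BRDF
                   evaluated for a neighbor's ray direction
    tile_samples : list of (ray_dir, ray_pdf, reflection_color) tuples
    """
    total, weight_sum = np.zeros(3), 0.0
    for ray_dir, ray_pdf, color in tile_samples:
        w = center_brdf(ray_dir) / max(ray_pdf, 1e-6)   # BRDF / PDF weight
        total += np.asarray(color) * w
        weight_sum += w
    return total / max(weight_sum, 1e-6)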
Video: filtering result with jittering
Temporal Reprojection Pass
• Temporal Reprojection:
1. For each pixel from the filtering pass, we calculate its world space position, then reproject it into last frame’s screen UV
space, then find the resolved result from last frame’s SSR image.
2. Then we accumulate the last frame’s SSR data with current frame’s filtered data, and use that for this frame’s final SSR
image.
• The reprojection process for SSR is a little different:
1. Usually we reproject a pixel based on its world space position, computed from the pixel’s screen space location and scene depth. However, this is not the case for SSR reprojection.
2. For a reflection pixel, we cannot use the reflecting surface’s depth; we need to use the reflection’s depth to compute the world space position, then use that to reproject the pixel location (see the sketch below).
3. [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ]
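A sketch of reprojecting with the reflection’s depth rather than the reflecting surface’s depth; the matrix callbacks and the exponential blend factor are assumptions.

```python
def reproject_ssr(pixel_uv, hit_depth, unproject, prev_project):
    """Find last frame's UV for an SSR pixel (after Stachowiak 2015).

    hit_depth    : depth of the *reflected* geometry stored by the trace,
                   not the depth of the reflecting surface
    unproject    : callable(uv, depth) -> world position (current frame)
    prev_project : callable(world_pos) -> uv (last frame)
    """
    world_pos = unproject(pixel_uv, hit_depth)
    return prev_project(world_pos)   # sample last frame's resolved SSR here

def accumulate_ssr(current, history, blend=0.9):
    # Exponential history accumulation; 0.9 is a tuning assumption.
    return blend * history + (1.0 - blend) * current
```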
Video: final result with temporal reprojection
Temporal Reprojection Pass
• Removing Ghosting Artifacts:
1. Since we use last frame’s scene texture as the reflection source texture, we have a one-frame delay anyway. Thus we cannot remove ghosting entirely; we can only limit it to a tolerable level.
2. Using the reflection’s depth instead of the reflecting surface’s depth for reprojection (previously mentioned) helps a lot.
3. The color & luma bounding box clamping from UE4’s temporal AA also helps a lot (see the sketch below).
4. Skip the resolved sample if the reprojected location is too far away from the current pixel location.
5. Skip the resolved sample if the depth of the reprojected location is too far from the depth of the current pixel location.
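A sketch of the color bounding box clamp in item 3, as popularized by UE4’s temporal AA; the 3x3 neighborhood is the usual choice and an assumption here.

```python
import numpy as np

def clamp_history(history_color, neighborhood_colors):
    """Clamp the reprojected history sample to the current frame's local
    color bounding box; anything outside the box is treated as ghosting.

    history_color       : (3,) reprojected sample from last frame's SSR image
    neighborhood_colors : (N, 3) current-frame colors around the pixel (3x3)
    """
    lo = neighborhood_colors.min(axis=0)
    hi = neighborhood_colors.max(axis=0)
    return np.clip(history_color, lo, hi)
```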
Video: Reprojection based on reflection surface’s depth
Video: Reprojection based on reflection’s depth
Video: moving camera on smooth surface
Video: moving camera on mixed surface
Performance
• Profiling environment: 800 x 450 half resolution SSR on GTX 660
• Ray-tracing pass = 0.8 ms (worst case, when all pixels on screen pass the SSR glossiness threshold)
• Last frame scene texture pre-filtering = 0.08 ms
• Filtering pass = 0.27 ms
• Temporal reprojection pass = 0.2 ms
• Total = 1 ms (average) ~ 1.4 ms (worst)
Future work:
1. Optimize ray-tracing pass with hi-z sampling.
2. Further reduce ghosting by applying motion vectors during the temporal reprojection pass.
3. Render the filtering pass in full resolution for better quality (maybe for ultra spec configuration).
Lighting Translucent Objects
 Goal:
• Almost all of our maps have point lights, and in some of them point lights are the only type of light source.
• Monster Hunter Online has quite a few objects that can only be correctly shaded through forward shading:
a) Objects with anisotropic specular: hair (opaque part), silk, etc.
b) Objects with subsurface scattering for all lights: ice.
c) Translucent objects that do not go through Gbuffer pass: hair tips (translucent), TressFX fur, glass, etc.
• Need to make sure all of these forward shading objects are correctly lit by point lights.
 Solution:
• Forward Plus [ Harada 2012, “Forward+: Bringing Deferred Lighting To The Next Level” ]
Video: Hair is affected by point lights
Forward Plus
 Difference from tiled lighting:
1. Both techniques do tiled light culling on GPU.
2. Tiled lighting does per-pixel lighting directly after light culling.
3. Forward plus stores culled lights into a light indexed list which can be used for both deferred lighting and forward
lighting.
Forward Plus
 How tiles are divided:
1. The number of tiles matches the pixel count of one mip of the hi-z. Each tile represents one pixel of the hi-z. Our hi-z contains both the min z and max z of the scene.
2. At 4K resolution, the screen is divided into 256 x 128 tiles; at 1080p or above, into 128 x 64 tiles; at other resolutions, into 64 x 32 tiles.
3. Up to 16 point lights are supported per tile; up to 255 point lights are supported per viewport; the point light indices range from 1 to 255.
4. Each light is culled against the tile boundary and the min z and max z of the tile (the pixel of the hi-z), as sketched below. Hi-z is also used for GPU occlusion tests in the game.
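A CPU-side sketch of the per-tile culling the GPU pass performs; a sphere-vs-AABB test against a view-space tile box built from the hi-z min/max is an assumed simplification of the actual tile test.

```python
import numpy as np

MAX_LIGHTS_PER_TILE = 16   # from the layout described above

def cull_lights_for_tile(tile_min, tile_max, lights):
    """Collect the indices of point lights touching one tile.

    tile_min, tile_max : (3,) view-space AABB of the tile, built from the
                         tile's screen rect and the hi-z min z / max z
    lights             : iterable of (index, center, radius), index in 1..255
    """
    visible = []
    for index, center, radius in lights:
        closest = np.clip(center, tile_min, tile_max)  # nearest point on box
        d = center - closest
        if float(np.dot(d, d)) <= radius * radius:     # sphere overlaps box
            visible.append(index)
            if len(visible) == MAX_LIGHTS_PER_TILE:
                break
    return visible
```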
Forward Plus
 How light indices are stored (see the packing sketch below):
1. For DX9, we first need 4 R8G8B8A8 render targets to store the light indices. The first 4 lights of a tile are stored in render target A, the next 4 lights of the same tile in render target B, and so on.
2. Therefore the light index has to be in the range 1-255 (0 is reserved as no-light), and 1 pixel of 1 render target can store up to 4 lights, which means up to 16 lights are supported per tile.
3. Then we combine these 4 render targets into 1 light index atlas.
4. For DX11 it’s pretty simple: the light indices of each tile are stored in one element of a structured buffer.
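A runnable sketch of the DX9 packing: 16 one-byte light indices per tile spread across four RGBA8 texels, one per render target.

```python
def pack_tile_lights_dx9(light_indices):
    """Pack up to 16 light indices (1..255, 0 = no light) for one tile into
    four RGBA8 texels, one per render target, as described above."""
    assert len(light_indices) <= 16
    assert all(1 <= i <= 255 for i in light_indices)
    padded = list(light_indices) + [0] * (16 - len(light_indices))
    # Target A holds lights 0-3, target B holds 4-7, and so on.
    return [tuple(padded[4 * t: 4 * t + 4]) for t in range(4)]

# Example: a tile with 5 lights
# pack_tile_lights_dx9([7, 12, 3, 200, 45])
# -> [(7, 12, 3, 200), (45, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]
```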
Forward Plus
 Pipeline:
1. Cull the lights against the view frustum on the CPU, then write the light data into one global light buffer.
2. Write the depth of translucent objects into the scene Z target, and generate the hi-z based on that.
3. Do tiled light culling on the GPU, then store the culled light indices in the global light index list.
4. During forward shading, find the tile index from the shading pixel location, fetch the light indices from the global light index list, then light the pixel with the light data from the global light buffer.
 Pros:
Easy to implement. Meets our artists’ requirements pretty well.
 Cons:
1. We run into VGPR issues in the forward shading pass due to extensive forward lighting (and GI) calculation.
2. DX9 has vector register limits, which makes some of the pixel shaders fail to compile.
3. We have to simplify some lighting functions for the DX9 version.
Character Hair Rendering
• Most of a hair strand is opaque, while the strand tip is translucent.
• The opaque part goes through Gbuffer pass, deferred cubemap pass and deferred shadow pass
along with other general objects, while its lighting part is done through forward shading due to its
anisotropic specular lighting.
• The translucent part only goes through forward shading pass, where all of its lighting (both direct
and environment) and shadowing are done.
• Hair sorting:
1. Hair from different characters is sorted by the characters’ head positions.
2. Hair layers within one hair style are sorted by their sub-material names, such as “hair_0”, “hair_1” and so on. We require artists to assign different sub-materials to different hair layers if they want correct sorting.
3. However, for quite a few hair styles we don’t sort at all and the result is still fine.
• We also write hair depth into scene Z target so that the hair won’t be all blurry when depth of
field is turned on.
Opaque hair only vs opaque hair with translucent tips
GI for translucent objects
• Translucent objects still need GI lighting.
• Especially hair: its opaque part gets GI through the deferred cubemap pass, while its translucent part only goes through forward shading.
Strand tips with no GI Strand tips with GI
GI for translucent objects
 Goal:
• Our version of CryEngine uses cubemaps as its GI solution. Sometimes we have up to 12 cubemaps rendered on screen.
• For DX11 we can simply store all these cubemaps into one cubemap array and use it anywhere, while this is
totally unviable in DX9.
• For DX9 we cannot sample all those 12 cubemaps in one forward shading pixel shader either because of the
texture stage limits.
• Need to figure out a way to efficiently store all those cubemaps and use them in the forward shading pass.
 Solution:
• Project all those cubemaps into spherical harmonics coefficients.
• Therefore our GI for translucent objects is environment diffuse only. The environment specular for
translucent objects is faked by sampling one general specular cubemap (our translucent hair tips don’t have
any environment specular at all).
Spherical Harmonics Projection
 Pipeline:
1. Project each pixel of the diffuse cubemap into spherical harmonics, then accumulate them (see the sketch below).
2. For DX11 we use one set of SH9 coefficients, which can be stored in one element of a structured buffer.
3. For DX9 we use one set of SH4 coefficients, which can be stored in 4 pixels of a R32G32B32A32F render target.
4. To simplify the pipeline, we project all diffuse cubemaps into spherical harmonics coefficients and use them for both the deferred cubemap pass and the forward shading pass. Environment specular is still done by sampling specular cubemaps during the deferred cubemap pass.
5. As mentioned before, since we only project diffuse cubemaps into spherical harmonics coefficients, we can only do environment diffuse lighting for translucent objects. Environment specular lighting is faked for them.
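A NumPy sketch of the projection for one cubemap: each texel’s radiance is weighted by the real SH basis and the texel’s solid angle, then accumulated. The SH4 (DX9) variant just keeps the first 4 coefficients.

```python
import numpy as np

def sh9_basis(d):
    # Real spherical harmonics, bands 0-2, at unit direction d = (x, y, z).
    x, y, z = d
    return np.array([
        0.282095,                                   # Y00
        0.488603 * y, 0.488603 * z, 0.488603 * x,   # Y1-1, Y10, Y11
        1.092548 * x * y, 1.092548 * y * z,         # Y2-2, Y2-1
        0.315392 * (3.0 * z * z - 1.0),             # Y20
        1.092548 * x * z,                           # Y21
        0.546274 * (x * x - y * y),                 # Y22
    ])

def project_cubemap_sh9(texels):
    """Project a diffuse cubemap into SH9 coefficients.

    texels : iterable of (direction(3,), color(3,), solid_angle) per texel
    Returns a (9, 3) array: one RGB triple per basis function.
    """
    coeffs = np.zeros((9, 3))
    for direction, color, solid_angle in texels:
        coeffs += np.outer(sh9_basis(direction), color) * solid_angle
    return coeffs   # the SH4 (DX9) path keeps only coeffs[:4]
```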
Spherical Harmonics Projection
 DX11:
1. The projection is done in a compute shader with parallel reduction. [ Young 2010, “DirectCompute Optimizations and Best Practices” ]
2. We spawn 256 threads for one 16 x 16 diffuse cubemap. Each thread handles 6 pixels from 6 faces. The projected
coefficients are stored in the group shared memory, then they are accumulated through parallel reduction.
3. It only takes 0.16 ms to project 12 cubemaps with 16 x 16 resolution. We do this process every frame.
 DX9:
1. We couldn’t find a smart way to do this, so we do the projection through brute force: the projection of all pixels of one cubemap is done in one pixel shader fragment.
2. Due to the inefficiency, we have to lower the diffuse cubemap resolution for this process. We use 8 x 8 diffuse cubemap
for SH4 projection. All 8 x 8 x 6 pixels are projected and accumulated in one pixel shader fragment.
3. Basically we trade quality for GPU performance. The GPU time it takes to project 12 cubemaps at 8 x 8 resolution is 0.18 ms for DX9, which is on par with the DX11 version, and the quality loss is sort of acceptable.
[Images: diffuse cubemap vs its SH9 (DX11) and SH4 (DX9) reconstructions]
Light Indexed Deferred
• The CryEngine version we use still renders deferred lighting with light volumes. This is fine most of the time, until we run into maps with lots of point lights.
• Since we already have the global light indexed list, we can just use it as our deferred lighting
solution, which is called “light indexed deferred” in some papers.
• It’s not much different than tiled lighting, except we have culled the lights already and we just
need to grab the light indices for each tile and do the usual thing.
Light Indexed Deferred
• Performance:
1. DX11: when there aren’t many point lights in the viewport, the light indexed deferred method is on par with using light
volumes. However, when the amount of lights exceeds 50, the light indexed deferred method beats light volumes hands
down.
2. DX9: due to the inefficiency of branches and loops in SM 3.0, the light indexed deferred method is slower than light
volumes when there aren’t many point lights.
• Since we don’t have 50+ point lights most of time, we only turn on light indexed deferred for
DX11. For DX9 we still use the CryEngine light volume solution.
• Since we have a lot of outdoor maps in our game, the sky background can be seen most of the
time during the game. Thus a stencil pre-pass culling all the empty tiles is quite helpful.
• In Monster Hunter Online, the stencil pre-pass before the lighting pass saves about 0.1 – 0.2 ms in
DX11.
Future work:
1. Currently, shadow casting point lights and spot lights are still rendered using light volumes.
2. Need to find a solution so the tiled deferred lighting system supports shadow casting point lights.
3. Also need to find a solution to support spot lights and area lights (which we need to implement first).
Rendering Pipeline Overview
Rendering Pipeline in CryEngine 3.3.8
• CryEngine 3.3.8 uses deferred lighting + forward shading
1. Gbuffer Pass
2. Shadow Pass
3. AO Pass
4. Deferred Cubemap Pass (GI solution)
5. Deferred Lighting Pass (done through light volumes)
6. Deferred Shadowing
7. Forward Shading (for everything)
8. HDR Tonemapping + Post Effects + AA
• Gbuffer layout:
R8G8B8A8 Normal & Glossiness + R32F linear depth
• Pro: thin Gbuffer saves a lot of bandwidth
• Con: wastes draw calls, and the thin Gbuffer cannot support PBR for point lights and GI
Rendering Pipeline in CryEngine 3.6.17
• CryEngine 3.6.17 uses Deferred Shading + Forward Shading
1. Gbuffer Pass
2. Shadow Pass
3. AO Pass + SSR Pass
4. Deferred Cubemap Pass
5. Deferred Lighting Pass (done through tiled lighting)
6. Deferred Shadowing
7. Deferred Shading
8. Forward Shading (only for some objects)
9. HDR Tonemapping + Post Effects + AA
• Gbuffer layout:
R8G8B8A8 Normal & Transmittance + R8G8B8A8 Diffuse & SSS profile + R8G8B8A8 Glossiness & Specular
• Pro: supports full PBR for all types of lights
• Con: lacks features our artists desire & no longer supports DX9
Rendering Pipeline in Monster Hunter Online
• Based on 3.6.17, we added more features
1. Gbuffer Pass
2. Shadow Pass
3. Hi-Z Occlusion Test (DX11)
4. AO Pass + Stochastic SSR Pass
5. Cubemap Relighting & Filtering Pass
6. SH9 / SH4 projection Pass
7. Deferred Cubemap Pass
8. Deferred Lighting Pass (DX9: light volumes; DX11: light indexed deferred)
9. Deferred Shadowing
10. Deferred Shading
11. Forward Plus preparation (build hi-z with translucent object depth + build the light index list from the new hi-z)
12. Forward Shading (with Forward Plus lighting for translucent objects)
13. HDR Tonemapping + Post Effects + AA
• Gbuffer layout:
R8G8B8A8 Normal & Material ID + R8G8B8A8 Glossiness & Specular + R8G8B8A8 Diffuse & Transmittance + R32F linear depth (DX9 only)
Rendering Pipeline in Monster Hunter Online
• We also have a simplified deferred shading pipeline for middle spec configuration.
1. Gbuffer Pass
2. Shadow Pass
3. Hi-Z Occlusion Test (DX11)
4. Water only SSR
5. Deferred Lighting Pass (DX9: light volumes; DX11: light indexed deferred)
6. Deferred Shadowing
7. Deferred Shading
8. Forward Plus preparation (build hi-z with translucent object depth + build the light index list from the new hi-z)
9. Forward Shading (with Forward Plus lighting for translucent objects)
10. HDR Tonemapping + Post Effects + AA
• We replaced the cubemap GI solution with an artist-tuned static ambient value for environment diffuse and one global specular cubemap for environment specular.
Rendering Pipeline in Monster Hunter Online
• Meanwhile, we also have a forward-shading-only pipeline for really low spec machines.
1. Forward Shading done in R16G16B16A16F format render target
2. HDR Tonemapping and no other post effects
 This is pretty much how the original Monster Hunter was rendered on PSP, and it could be our rendering pipeline of choice if we port the game to mobile phones.
 Only sun light is handled. For a lot of environment objects normal mapping is skipped and only vertex normals are used.
 Environment diffuse is done through artist tuned ambient value, and environment specular is done through global
specular cubemap.
 Shadow map is not used. Instead, sphere shadow volume is used for character shadows.
 Still supports PBR.
 Future work:
1. Write scene depth into alpha channel of the scene render target so that we can still support screen space decals.
2. Since we don’t have a depth pre-pass, we’d better sort the scene objects beforehand to reduce overdraw via the early depth test.
Other small things
Miscellaneous rendering tricks, experiences, etc
Prevent color leaking in half-res DOF
• Depth of field pipeline:
1. Filter Pass: calculate the filtering kernel and blend alpha based on scene depth, then filter the scene.
2. Composite Pass: blend the filtered scene and the original scene based on blend alpha.
[Images: half-resolution scene → DOF filter → DOF-filtered scene blended with the unfiltered scene]
Prevent color leaking in half-res DOF
• How to prevent color leaking (see the sketch below):
1. Pre-multiplied Alpha Pass: before filtering the scene, calculate the blend alpha and multiply the scene by it.
2. Filter Pass: filter the pre-multiplied scene, then divide by the filtered blend alpha afterwards.
3. Composite Pass: same as usual.
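A NumPy sketch of the pre-multiplied alpha trick; blur stands in for the actual DOF filter kernel.

```python
import numpy as np

def dof_filter_premultiplied(scene_rgb, blend_alpha, blur):
    """Filter the scene without in-focus colors bleeding into blurred areas.

    scene_rgb   : (H, W, 3) half-resolution scene color
    blend_alpha : (H, W)    DOF blend factor computed from scene depth
    blur        : callable(image) -> image, the DOF filter kernel
    """
    premultiplied = scene_rgb * blend_alpha[..., None]   # step 1: pre-multiply
    filtered = blur(premultiplied)                       # step 2: filter
    filtered_alpha = blur(blend_alpha)[..., None]
    return filtered / np.maximum(filtered_alpha, 1e-6)   # divide alpha back out
```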
[Images: half-resolution scene pre-multiplied with alpha → DOF filter → DOF-filtered scene blended with the unfiltered scene]
[Images: decal with color leaking vs decal without color leaking]
Prevent color leaking in deferred decals
• The Problem:
• This is caused by depth discontinuities when there’s an object occluding the decal in screen space.
• For small decals, this can be solved by disabling the texture mipmap, e.g. forcing MIP0.
• However, in our game we have quite a lot of huge decals placed on terrain in order to improve
environment variation.
• The Solution:
• [ Persson 2010, http://humus.name/index.php?page=3D&ID=84 ]
• We sample the scene depth within the 2x2 area around the decal pixel, then manually find the depth continuities in the horizontal and vertical directions. Then we use tex2Dgrad to sample the decal texture (see the sketch below).
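A sketch of the manual gradient selection (after Persson 2010): for each axis, take the UV difference toward the neighbor whose depth is closer to the center, so the gradient never straddles a depth discontinuity. The callbacks are assumptions; on the GPU the result feeds tex2Dgrad.

```python
def decal_uv_gradients(x, y, depth, decal_uv):
    """Pick depth-continuous screen-space UV gradients for decal sampling.

    depth    : callable(x, y) -> linear scene depth at that pixel
    decal_uv : callable(x, y) -> decal UV (array-like) reconstructed from depth
    """
    d = depth(x, y)
    # Horizontal: step toward the neighbor on the same surface.
    if abs(depth(x + 1, y) - d) < abs(depth(x - 1, y) - d):
        ddx = decal_uv(x + 1, y) - decal_uv(x, y)
    else:
        ddx = decal_uv(x, y) - decal_uv(x - 1, y)
    # Vertical: same rule.
    if abs(depth(x, y + 1) - d) < abs(depth(x, y - 1) - d):
        ddy = decal_uv(x, y + 1) - decal_uv(x, y)
    else:
        ddy = decal_uv(x, y) - decal_uv(x, y - 1)
    return ddx, ddy   # feed these to tex2Dgrad instead of implicit gradients
```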
Geometry blending
• Our artists noticed this feature while playing Rise of the Tomb Raider and wanted something similar in Monster Hunter Online: reducing the visible disconnect between objects and terrain.
No Geometry Blend With Geometry Blend
Geometry blending
• Our pipeline:
1. Separate geometry blending objects from other Gbuffer objects and put them into a standalone render
list.
2. Render other Gbuffer objects first in the Gbuffer pass, then resolve the scene depth onto a target (half
resolution depth will also do).
3. Then render the geometry blending objects:
a) Sample the resolved depth in the shader and calculate the blending alpha from the difference between the resolved scene depth and the geometry fragment depth (see the sketch below).
b) Do Gbuffer alpha blending (like how the terrain layers are handled) based on this depth-bias alpha.
4. Since we use deferred shading and all PBR materials go through the same lighting function, Gbuffer-blended objects look like they are blended together.
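A one-line sketch of the depth-bias alpha in step 3a; blend_range is an artist-tuned distance and an assumption here.

```python
import numpy as np

def geometry_blend_alpha(scene_depth, fragment_depth, blend_range):
    """Gbuffer blend alpha from the depth difference (step 3a above).

    Fragments touching the terrain (zero depth difference) take the
    terrain's Gbuffer values; fragments blend_range or more in front of
    it keep their own Gbuffer values entirely.
    """
    return np.clip((scene_depth - fragment_depth) / blend_range, 0.0, 1.0)
```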
[Images: alpha-blended albedo diffuse, world-space normal, specular reflectance, and glossiness]
Geometry blending
• With this technique we can also do blending among non-terrain objects:
• However, we require our artists to only do this between non-metals. For non-metals the specular values are close (all around 50), while metals may have very different specular values. Besides, blending specular reflectance between metals, or between metals and non-metals, doesn’t make sense for PBR anyway.
DX11 in-place editing
• Sometimes we want to modify the data of render target A but the hardware alpha blending model just
cannot do it.
• For example: A’ = f (A)
• A is the original pixel of render target A
• A’ is the modified pixel of render target A
• f() is fairly complex and cannot be achieved through src color * src blend + dst color * dst blend
• Usually this is done through “ping pong”:
• Copy render target A to render target B
• Sample render target B during the shading pass, calculate A’ based on f(B)
• Write back A’ to render target A
• This needs one extra texture copy and one extra render target
[Diagram: ping-pong — pass 1 copies A to B; pass 2 writes f(B) to A. Since B = A, f(B) = f(A) = A′]
DX11 in-place editing
• DX11 supports in-place editing on a RWTexture if its format is 32-bit:
1. Create the texture with its corresponding typeless format. For example, for R8G8B8A8, its corresponding typeless format
is R8G8B8A8_TYPELESS.
2. Create the RTV and SRV of it with its original format, such as R8G8B8A8.
3. Create the UAV of it with R32UINT.
4. For the rendering pass, set the UAV as its shading output.
5. Inside the shader, load the data from the UAV first, then convert it from R32UINT to your desired format (such as R8G8B8A8), as sketched below.
6. After the data modification, re-convert your data back to R32UINT, then write it back to the output UAV.
7. Thus, we can modify the data of target A with only one rendering pass and no extra render target.
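A sketch of the bit conversions in steps 5 and 6; the channel order within the 32-bit word depends on the format layout, so the shifts below are an assumption.

```python
def r32uint_to_rgba8(word):
    # Step 5: reinterpret the R32UINT load as four 8-bit channels.
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]

def rgba8_to_r32uint(rgba):
    # Step 6: re-pack the modified channels for the UAV write.
    r, g, b, a = rgba
    return r | (g << 8) | (b << 16) | (a << 24)

# Round trip: rgba8_to_r32uint(r32uint_to_rgba8(0xDEADBEEF)) == 0xDEADBEEF
```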
[Diagram: read A directly, modify it, and write it back in one pass: A′ = f(A)]
DX11 in-place editing
• The Gbuffer rain pass and Gbuffer snow pass of CryEngine 3 can be optimized through this
method.
• With in-place editing, we saved 3 extra render targets from Gbuffer rain pass and Gbuffer snow
pass.
• Besides, we also saved 1 ms GPU time from Gbuffer rain pass and 0.2 ms GPU time from Gbuffer
snow pass.
• Warning: DX11 in-place editing doesn’t work for MSAA targets, and it breaks on old (pre-2015) Nvidia drivers.
Gbuffer Compression (not used)
• Gbuffer layout before rendering pipeline refactoring:
• R8G8B8A8 Normal & Glossiness + R32F linear depth
1. Good: thin Gbuffer, cheap on bandwidth
2. Bad: cannot support PBR for IBL cubemaps and point lights
• Gbuffer layout after rendering pipeline refactoring:
• R8G8B8A8 Normal & Material ID + R8G8B8A8 Glossiness & Specular + R8G8B8A8 Diffuse & Transmittance + R32F linear depth (DX9 only)
1. Good: supports PBR for all types of lighting, backward compatible with legacy materials
2. Bad: 3 targets for DX11 and 4 targets for DX9, bandwidth usage doubled compared to previous version
• Tested Gbuffer compression:
• A2R10G10B10 NormalXY & Glossiness & Material ID + R8G8B8A8 Specular YCr/YCb & Diffuse YCr/YCb + R32F linear depth (DX9 only)
1. Good: has all the information we need from the uncompressed Gbuffer layout
2. Bad: the ALU cost of decoding the interleaved YCrCb Gbuffer data outweighs the bandwidth gain
3. Conclusion: not viable, on PC at least
Road to PBR
 Programmer side:
1. HDR & linear space lighting pipeline (already exists in CryEngine).
2. New texture combination: Diffuse + Normal + Specular (Fresnel F0) + Glossiness.
3. New Gbuffer layout & rendering pipeline to suit the new texture combination.
4. Specular BRDF: Normalized Phong → Microfacet GGX.
5. Specular AA (already handled in the CryEngine texture tool): http://blog.selfshadow.com/2011/07/22/specular-showdown/
6. Pre-filtered / runtime-filtered environment specular cubemap.
 Artists side:
1. Need to make sure the artists understand the difference between albedo diffuse and specular reflectance.
2. Need to make sure the artists know where to put their texture detail (in normal & glossiness, not in diffuse or specular).
Road to PBR
 The Problem:
1. Our time budget for the PBR asset transition was only 2-3 months.
2. Need to figure out a method for quick & easy asset review.
3. We also refactored our environment cubemap pipeline, so every map’s lighting setup had to be redone.
4. Need to figure out a method for quick & easy lighting setup review.
Road to PBR
 The Solution:
1. Added lots of rendering debug view modes to the sandbox, so that our artists can review their work quickly and easily.
2. We also extended these debug view modes with lots of profiling visualization modes, such as texture usage.
A corner of the main city in Sandbox
Diffuse view mode: albedo diffuse should
not contain lighting information and
should be dark or black for metals.
Glossiness view mode: artists can put much of their detail into the material glossiness.
Specular view mode: metals have bright and sometimes colorful specular (over 100) while non-metals all have low and gray specular (around 50).
Normal view mode: we use this view
mode to check if assets have incorrectly
exported normal maps.
Environment diffuse view mode: this is used to check how the skylight parameter of the TOD is set up and how the GI system is working in the environment.
Environment specular view mode: this is used to check
how the pre-filtered / runtime-filtered specular
cubemap is working in the environment.
Texture MIP view mode: this is used for
testing the texture streaming system.
Texture usage view mode: this is used for
checking the amount of material textures
used for each surface material.
Texture size view mode: this is used for
checking the maximum texture size of each
surface material.
Gameplay related texture channel view
mode: we also store some gameplay
related information into material textures.
Reference
• [ McAuley 2015, "Rendering The World Of Far Cry 4" ]
• [ Lagarde et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
• [ GPU Gems3, "GPU Based Importance Sampling" ]
• [ Karis 2013, "Real Shading in Unreal Engine 4" ]
• [ Lazarov 2013, "Getting More Physical in Call of Duty: Black Ops II" ]
• [ McGuire 2014, "Efficient GPU Screen-Space Ray Tracing" ]
• [ Valient 2014, "Reflections And Volumetrics Of Killzone Shadow Fall" ]
• [ Stachowiak 2015, "Stochastic Screen-Space Reflections" ]
• [ Harada 2012, "Forward+: Bringing Deferred Lighting To The Next Level" ]
• [ Young 2010, "DirectCompute Optimizations and Best Practices" ]
• [ Persson 2010, http://humus.name/index.php?page=3D&ID=84 ]

Rendering 'monster hunter online'

  • 1.
    Rendering ‘Monster HunterOnline’ Liu Xiao Graphics Programmer
  • 2.
    Content • Background • Outline •24-hour TOD dynamic IBL • Physically correct screen space reflection • Lighting for transparent objects • Pipeline summery • Other small things
  • 3.
    Background  Why didwe need to refactor the rendering pipeline of CryEngine? • The version we are using (3.3.8) is out of date and the graphics quality was not really “next generation”. • Its IBL GI system does not support dynamic time of day, while we have lots of maps with day / night cycles. • No point lighting for transparent objects or any other forward shading objects (cloth, hair, etc). • The real-time water reflection system has too many limits, while we have a lot of water surfaces in game, along with other reflective surfaces such as ice, marble floors, etc.  Why did we focus on the lighting pipeline? • Most of the art assets are finished, therefore we need to find a quality improving direction mostly driven by programming. • Physically based shading is main stream in this generation. • Monster Hunter is a game that focuses a lot on object details, especially monster details and character details.  Conclusion: • better lighting quality and better material details.
  • 4.
    Outline  The renderingpipeline refactoring started at February 2015 – 9 months before release • Headache: 80% of the art assets were finished, yet the graphics quality was still at the X360 / PS3 level. • Direction: a programming driven, artist friendly way to improve the quality. • Conclusion: PBR material & pipeline + other features that enhance the lighting quality.  Comparison between CryEngine versions • The deferred lighting + forward shading pipeline in CryEngine 3.3.8 has cheap G buffer pass but cannot support PBR for local point lights and GI. It also wastes draw calls. • The deferred shading + (some object) forward shading pipeline in CryEngine 3.6.17 supports full PBR pipeline, yet it still lacks some features. Besides, it’s DX11 only while we still need to support DX9.
  • 5.
    Outline  Step 1:Upgrade some part of the engine • Deferred Lighting  Deferred Shading • Normalized Phong BRDF  GGX BRDF
  • 6.
    Outline  Step 2:Develop some features that CryEngine couldn’t offer • Runtime GPU filtered dynamic IBL for 24-hour time of day. • Physically correct screen space reflection. • Forward plus pipeline for translucent objects. • Compute shader & UAV support (won’t be discussed in this talk).
  • 7.
    Video(Static IBL): cubemapbuilt from 12pm cannot fit full day / night cycle
  • 8.
    Video(Dynamic IBL): onecubemap fits full day / night cycle
  • 9.
    SSR on watersurface
  • 10.
    SSR on wetsurface (rainy)
  • 11.
    SSR on ice(general material)
  • 12.
  • 13.
    Hair with F+point light lighting
  • 14.
    Hair with outpoint light lighting
  • 15.
  • 16.
  • 17.
    F+ for AMDTressFX
  • 18.
  • 19.
    Image Based Lightingin CryEngine CryEngine uses standard image based lighting • Plant cubemap probes in the editor, then bake the cubemaps. • Then the baked cubemap is filtered through AMD Cubemap Gen and generates one diffuse cubemap and one specular cubemap.
  • 20.
  • 21.
    Image Based Lightingin CryEngine • The low frequency diffuse cubemap is used for environment diffuse。
  • 22.
    Image Based Lightingin CryEngine • The high frequency specular cubemap is used for environment specular. • The cubemap is filtered on various levels based on different levels of glossness, and the filtered results are stored in the cubemap mips. • The lighting shader will fetch the correct cubemap mip based on the pixel’s glossenss.
  • 23.
    Image Based Lightingin CryEngine  Pros: • Easy to bake, easy to use.  Cons: • Pre-baked cubemap is not suitable for all TOD setups. It only suits the lighting environment when it’s baked. • Although we also have a lots of maps with fixed TOD, it still requires tons of work to rebuild all of them due to the amount of combination: • 1 map = 8 zones • 1 zone may have 3 - 5 fixed TOD setups • 1 zone may have 1 – 12 cubemap probes • Building 1 map = ??? Of TOD changes & cubemap rebuilds  Conclusion: • Need to let one cubemap suit all TOD setups!
  • 24.
    Static IBL (12pm)– cubemap built @ 12pm fits well
  • 25.
    Static IBL (6pm)– cubemap built @ 12pm is too bright
  • 26.
    Static IBL (5am)– cubemap built @ 12pm is too bright
  • 27.
    24-hour Dynamic GI Goal: • One cubemap suits all time of the day. • Can be used for both 24-hour TOD map and fixed TOD map.  Our solution in Monster Hunter Online: • Instead of baking a cubemap with static scene image, bake a scene diffuse cubemap and scene normal cubemap. [ McAuley 2015, "Rendering The World Of Far Cry 4" ] • Based on the game’s lighting environment and sky dome, re-light the cubemap, and generate cubemap mips. The mip with 16 x 16 resolution can be used for environment diffuse cubemap. • Based on the BRDF we use, with filtered importance sampling, we filter the re-lit cubemap based on different levels of glossness. The filtered cubemap will be used for environment specular cubemap. [ Lagrade et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
  • 28.
  • 30.
    Cubemap Filtering  Foreach runtime re-lit cubemap: 1. Generate MIPs from raw cubemap first. 2. The low resolution MIP can be used for environment diffuse cubemap directly. 3. However, for environment specular, we need to filter the cubemap correctly based on different levels of glossness, and store the filtered results in different cubemap MIPs.  The problem: how to filter specular cubemap correctly on GPU at runtime?  Solution: GPU filtered importance sampling 1. [ GPU Gems3, “GPU Based Importance Sampling” ] 2. [ Karis 2013, “Real Shading in Unreal Engine 4” ] 3. [ Lagrade et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
  • 31.
    Environment Specular Importance samplingin Monte-Carlo integration: With specular BRDF:
  • 32.
  • 33.
    环境Specular • The DFGterm can be pre-integrated offline and the results can be stored in a texture. [ Karis 2013, “Real Shading in Unreal Engine 4” ] • Or it can be approximated through one function. [ Lazarov 2013, "Getting More Physical in Call of Duty: Black Ops II" ]。 • We use the pre-integrated DFG texture for our high spec configuration while we use the Lazarov function for our low spec configuration in order to reduce texture sampling.
  • 34.
    Cubemap Filtering  TheLD term needs to be computed for each pixel of the cubemap 1. For each pixel of each cubemap MIP, find N sampling directions through importance sampling. [ Karis 2013, “Real Shading in Unreal Engine 4” ] 2. Calculate the PDF of that direction based the glossness value the filtering is based on. 3. Based on the PDF, calculate the raw cubemap MIP we need to sample, and sample it. [ GPU Gems3, “GPU Based Importance Sampling” ](equation 11, 12 and 13) 4. Weight these samples with cosine term.
  • 35.
    Cubemap Filtering  Optimization: •There’s no need to filter cubemap MIP0 as it will be used for perfect mirror-smooth surface. We just copy the MIP0 of the raw cubemap to the filtered cubemap. • Limit the sampling count of the rest MIPs to be 32. • Limit the maximum size of the cubemap to be 128 x 128 x 6. • Since the TOD of our game isn’t running very fast, there’s no need to filter all cubemaps every frame. We distribute the filtering work through multiple frames.  Results: • Lighting + filtering one cubemap = 0.35 ms • By distributing the filtering through multiple frames, the total GPU time of cubemap filtering pass is about 1 ms. • The GPU time was measured on GTX 660, with 12 cubemaps in the map.
  • 36.
    Cubemap Filtering • Thefinal filtered results is almost identical to the filtered results from ATI CubemapGen.
  • 37.
    Static IBL (6pm)– cubemap built @ 12pm is too bright
  • 38.
    Dynamic IBL (6pm)– ambient & env specular fit well
  • 39.
    Static IBL (5am)– cubemap built @ 12pm is too bright
  • 40.
    Dynamic IBL (5am)– ambient & env specular fit well
  • 41.
    Future work: • Thecurrent re-lit cubemap doesn’t contain depth info, thus the lighting pass can only apply sun light and sky light to it. • Need to bake depth into the cubemap so that the cubemap re-lighting pass can be more accurate. • The filtering pass can be further optimized if we batch all cubemap faces of the same MIP together. • An automatic cubemap probe planting & baking system would be nice.
  • 42.
    Physically correct screenspace reflection
  • 43.
    Motivation: • Monter HunterOnline has a lot of water surface and smooth surface across the game world, thus correct reflection system is necessary. • The original water reflection was achieved by rendering the scene again through a reflection camera. • Lots of limitations: 1. More draw calls. 2. Even worse if there are multiple reflection surfaces in the viewport. 3. The reflection only has sun light and misses point light, shadows and GI. 4. Without any filtering, it only support mirror reflection and cannot support glossy reflection. 5. It cannot be used for GPU dynamic rain effect as the reflection surfaces (puddles) are generated procedurally.
  • 44.
  • 45.
    Motivation: • Screen SpaceReflection is being wildly used in games already. • Pros: 1. High efficiency & rooms for optimization. We ended up with 2 types of SSR: water only reflection for low spec and all- surface reflection for high spec. 2. The reflection contains all the rendering information (lighting, shadows, GI, particles) if we use last frame’s scene texture as input. • Cons: 1. Cannot reflect anything that is outside the viewport. 2. Cannot reflect anything that is occluded in the screen. 3. Need environment specular cubemap for fallback solution. • Conclusion: • CryEngine’s IBL cubemaps can be a good fall back solution for SSR if the tracing fails.
  • 46.
  • 47.
  • 48.
  • 49.
    Our SSR onmixed surface
  • 50.
    MHOL water onlyreflection
  • 51.
    CE 3.6.17 SSRon smooth surface
  • 52.
  • 53.
    CE 3.6.17 SSRon rough surface
  • 54.
    SSR pipeline inMHOL • Water only reflection (for low spec) 1. Render water surface normal & depth onto half resolution targets 2. Ray-tracing pass 3. Filtering pass with temporal reprojection 4. Apply SSR during water surface shading • All-surface reflection (for high spec) 1. Downsample Gbuffer normal & depth onto half resolution targets, then render water surface normal & depth onto it. 2. Ray-tracing pass with stochastic importance sampling 3. Scene texture blurring pass (scene texture was from last frame) 4. Filtering pass 5. Temporal reprojection pass 6. Apply SSR after deferred IBL pass 7. Apply water SSR during water surface shading
  • 55.
    Raytrace Pass (waterreflection) • The algorithm is pretty simple: calculate the reflection vector based on the view vector and surface normal, then ray march along the reflection vector until it hits the scene depth. normal view reflection ray scene depth reflection surface
  • 57.
    Raytrace Pass (waterreflection) • Optimizations: 1. For each ray march step: XY  one pixel of the screen space, Z  the reciprocal of linear Z. [ McGuire 2014, “Efficient GPU Screen-Space Ray Tracing” ] 2. The ray march process can be optimized with dithered sampling. Assuming the ray-tracing result for each pixel of a 2x2 tile is similar, then we can distribute the whole tracing distance evenly among these 4 pixels (with a dithered offset pattern) [ Valient 2014, “Reflections And Volumetrics Of Killzone Shadow Fall” ]. 3. Then we just need to composite the tracing results of these 4 pixels in the filtering pass in order to get the complete tracing result. DC BA A B C D
  • 58.
    Filtering Pass (waterreflection)  How to make sure the reflection has all the scene information? 1. Only store the reflection location (reflection’s UV coordinate in scene texture) during the ray-tracing pass. 2. Calculate the reflection location from last frame (reprojection) during the filtering pass. 3. Use the reprojected reflection location to sample the scene color from last frame’s scene texture.  How to composite the tracing results from 2x2 tile? • Since we are only dealing with water reflections now, the surface information of 2x2 tile is almost identical (same reflectance and glossness, as they are all water), we just need to average them.  Temporal reprojection for stability • Due to some factors (half resolution rendering, discontinuous scene depth, etc), the final reflection image flickers while camera is moving. Thus we need to use temporal reprojection to stabilize the reflection image. This will be detailed later.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
    Performance • Profiling environment:800 x 450 half resolution SSR on GTX 660 • The whole SSR process takes 0.55ms in the worst case (half of the screen is occupied by water).
  • 64.
    All surface reflection •The all-surface reflection needs to follow the microfacet BRDF we use. • The reflection direction is no longer a ray but a cone. 1. The rougher the reflection surface is, the blurrier the reflection image becomes. 2. The further the reflection surface is, the blurrier the reflection image becomes. 3. The intensity of the reflection needs to follow the reflection surface Fresnel reflectance.
  • 65.
  • 66.
  • 67.
  • 68.
    Raytrace Pass (all-surface) •Since the reflection direction is no longer a ray but a cone, we need more than one reflection rays in order to fill the cone. • Which is quite impractical in realtime game. normal view scene depth reflection surface reflection cone
  • 69.
    Raytrace Pass (all-surface) •Optimization 1: 1. We can expend the dithered sampling technique motioned before. For 4 pixels in the 2x2 tile, we can shoot 4 rays with 4 different reflection angles. 2. If the reflection surface is smooth enough, these 4 rays may be enough to cover the reflection cone. DC BA A B C D
  • 70.
    Raytrace Pass (all-surface) •Optimization 2: 1. However, for rough reflection surface, how can we cover the cone with only 4 rays? 2. Solution: temporal supersampling. • Temproal Supersampling Refresher: 1. For a static pixel, if nothing is changed regarding to this pixel, then its image should be the same across frames. 2. Based on this assumption, we can distribute the rendering work of this pixel across frames and use the accumulated result of the pixel as its final image.
  • 71.
    Raytrace Pass (all-surface) •Temproal Supersampling: For the pixels of 2x2 tile, we not only let them shoot 4 rays with 4 different reflection angles at one frame, we also let them shoot rays with different angles across frames. This is achieved by adding a frame variation to the importance sampling used for reflection angle calculation. A B C D A B C D Frame N+1Frame N
  • 72.
    Video: tracing resultwith jittering
  • 73.
    Filtering Pass (all-surface) •Difference between this and how we handle the filtering of water reflection 1. Since the ray angles were calculated based on importance sampling with a jittered pattern, we also need a jittered pattern to sample the ray-tracing results. [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ] 2. The source texture (last frame’s scene texture) needs to be pre-filtered so that the reflections of all glossness can find the correct blurred scene image. 3. The material information of the pixels within the 2x2 tile may be totally different, thus we cannot simply average the ray- tracing results. The results have to be weighted by the material surface BRDF. Based on [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ], the weight for each ray-tracing result pixel is the current pixel’s BRDF divided by the sampling pixel’s PDF.
  • 74.
  • 75.
    Temporal Reprojection Pass •Temporal Reprojection: 1. For each pixel from the filtering pass, we calculate its world space position, then reproject it into last frame’s screen UV space, then find the resolved result from last frame’s SSR image. 2. Then we accumulate the last frame’s SSR data with current frame’s filtered data, and use that for this frame’s final SSR image. • The reprojection process for SSR is a little bit different 1. Usually we reproject the pixel based on its world space position. We calculate its world space position based on the pixel’s screen space location and scene depth. However, this is not the case for SSR reprojection. 2. For the pixel of the reflection, we cannot use the reflection surface’s depth, we need to use the reflection’s depth to calculate the world space position, then use that to reproject the pixel location. 3. [ Stachowiak 2015, “Stochastic Screen-Space Reflections” ]
Video: final result with temporal reprojection
Temporal Reprojection Pass
• Removing Ghosting Artifacts:
1. Since we use last frame's scene texture as the reflection source, we have a one-frame delay anyway. We cannot remove ghosting entirely; we can only limit it to a tolerable level.
2. Using the reflection's depth instead of the reflecting surface's depth for reprojection (mentioned previously) helps a lot.
3. The color & luma bounding box clamping from UE4's temporal AA also helps a lot (see the sketch below).
4. Skip the resolved sample if the reprojected location is too far from the current pixel location.
5. Skip the resolved sample if the depth at the reprojected location is too far from the depth at the current pixel location.
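A minimal sketch of the neighborhood clamp in item 3, in the spirit of UE4's temporal AA; the 3x3 neighborhood and RGB-space box are assumptions (a luma- or YCoCg-space box works the same way).

```cpp
#include <algorithm>

struct Color { float r, g, b; };

// Clamp last frame's resolved sample to the color bounding box of the current
// frame's 3x3 neighborhood, rejecting history colors that no longer appear
// anywhere near the pixel (the main source of ghosting).
Color clampHistory(const Color neighborhood[9], Color history) {
    Color lo = neighborhood[0], hi = neighborhood[0];
    for (int i = 1; i < 9; ++i) {
        lo.r = std::min(lo.r, neighborhood[i].r); hi.r = std::max(hi.r, neighborhood[i].r);
        lo.g = std::min(lo.g, neighborhood[i].g); hi.g = std::max(hi.g, neighborhood[i].g);
        lo.b = std::min(lo.b, neighborhood[i].b); hi.b = std::max(hi.b, neighborhood[i].b);
    }
    history.r = std::clamp(history.r, lo.r, hi.r);
    history.g = std::clamp(history.g, lo.g, hi.g);
    history.b = std::clamp(history.b, lo.b, hi.b);
    return history;
}
```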
Video: reprojection based on the reflection surface's depth
Video: reprojection based on the reflection's depth
Video: moving camera on a smooth surface
Video: moving camera on a mixed surface
Performance
• Profiling environment: 800 x 450 half-resolution SSR on a GTX 660
• Ray-tracing pass = 0.8 ms (worst case, with all pixels on screen passing the SSR glossiness threshold)
• Last frame scene texture pre-filtering = 0.08 ms
• Filtering pass = 0.27 ms
• Temporal reprojection pass = 0.2 ms
• Total = 1 ms (average) ~ 1.4 ms (worst)
Future work:
1. Optimize the ray-tracing pass with hi-z sampling.
2. Further reduce ghosting by applying motion vectors during the temporal reprojection pass.
3. Render the filtering pass at full resolution for better quality (maybe for ultra-spec configurations).
    Lighting Translucent objects Goal: • Almost all of our maps have point lights and for some of them point light is the only type of light source. • Monster Hunter Online has quite a few objects that can only be correctly shaded through forward shading: a) Objects with anisotropic specular: hair (opaque part), silk, etc. b) Objects with subsurface scattering for all lights: ice. c) Translucent objects that do not go through Gbuffer pass: hair tips (translucent), TressFX fur, glass, etc. • Need to make sure all of these forward shading objects are correctly lit by point lights.  Solution: • Forward Plus [ Harada 2012, “Forward+: Bringing Deferred Lighting To The Next Level” ]
Video: hair is affected by point lights
Forward Plus
 Difference from tiled lighting:
1. Both techniques do tiled light culling on the GPU.
2. Tiled lighting does per-pixel lighting directly after light culling.
3. Forward plus stores the culled lights into a light indexed list which can be used for both deferred lighting and forward lighting.
Forward Plus
 How tiles are divided:
1. The number of tiles needs to match the pixel count of one mip of the hi-z. Each tile represents one pixel of the hi-z. Our hi-z contains both the min z and the max z of the scene.
2. At 4k resolution the screen is divided into 256 x 128 tiles; at 1080p or above, into 128 x 64 tiles; at other resolutions, into 64 x 32 tiles (see the sketch below).
3. Up to 16 point lights are supported per tile and up to 255 point lights per viewport; the point light indices range from 1 to 255.
4. Each light is culled against the tile boundary and the min z / max z of the tile (the corresponding hi-z pixel). The hi-z is also used for GPU occlusion testing in the game.
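A minimal CPU-side sketch of the tile layout and coarse depth cull described above; the exact resolution thresholds and helper names are assumptions drawn from the slide's numbers.

```cpp
#include <cstdint>

// A tile grid matching one hi-z mip: each tile is one hi-z texel.
struct TileGrid { uint32_t tilesX, tilesY; };

// Thresholds mirror the slide: 4k -> 256x128 tiles, 1080p and
// above -> 128x64, everything else -> 64x32.
TileGrid pickTileGrid(uint32_t height) {
    if (height >= 2160) return { 256, 128 };
    if (height >= 1080) return { 128, 64 };
    return { 64, 32 };
}

// Flat tile index for a shading pixel: a plain grid lookup.
uint32_t tileIndexForPixel(uint32_t px, uint32_t py,
                           uint32_t width, uint32_t height, TileGrid g) {
    const uint32_t tx = px * g.tilesX / width;
    const uint32_t ty = py * g.tilesY / height;
    return ty * g.tilesX + tx;
}

// Coarse per-tile depth cull: reject a light whose bounding sphere lies
// entirely in front of the tile's min z or behind its max z (view space).
bool lightIntersectsTileDepth(float lightViewZ, float lightRadius,
                              float tileMinZ, float tileMaxZ) {
    return (lightViewZ + lightRadius) >= tileMinZ &&
           (lightViewZ - lightRadius) <= tileMaxZ;
}
```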
Forward Plus
 How light indices are stored:
1. For DX9, we first need 4 R8G8B8A8 render targets to store the light indices. The first 4 lights of a tile are stored in render target A, the next 4 lights of the same tile in render target B, and so on.
2. Therefore the light index has to be in the range 1-255 (0 is reserved as "no light"), and 1 pixel of 1 render target can store up to 4 lights, which means up to 16 lights per tile.
3. We then combine these 4 render targets into 1 light index atlas (see the sketch below).
4. For DX11 it's simple: the light indices of each tile are stored in one element of a structured buffer.
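A minimal CPU-side sketch of that DX9 packing scheme; the buffer layout and names are assumptions.

```cpp
#include <cstdint>
#include <cstring>

// Up to 16 8-bit light indices per tile, 4 per RGBA8 texel, spread over 4
// render targets (later combined into one atlas). Index 0 means "no light".
void packTileLightIndices(const uint8_t* lightIndices, // values 1..255
                          uint32_t count,              // lights hitting this tile
                          uint32_t tileTexel,          // texel offset of the tile
                          uint32_t* rt[4]) {           // 4 RGBA8 targets as u32 texels
    if (count > 16) count = 16; // hard cap from the slide
    for (uint32_t rtIdx = 0; rtIdx < 4; ++rtIdx) {
        uint8_t rgba[4] = { 0, 0, 0, 0 }; // 0 = no light
        for (uint32_t c = 0; c < 4; ++c) {
            const uint32_t i = rtIdx * 4 + c;
            if (i < count) rgba[c] = lightIndices[i];
        }
        std::memcpy(&rt[rtIdx][tileTexel], rgba, 4);
    }
}
```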
Forward Plus
 Pipeline:
1. Cull the lights against the frustum on the CPU, then write the light data into one global light buffer.
2. Write the depth of translucent objects into the scene Z target, and generate the hi-z based on that.
3. Do tiled light culling on the GPU, then store the culled light indices into the global light indexed list.
4. During forward shading, find the tile index from the shading pixel location, fetch the light indices from the global light indexed list, then light the pixel with the light data from the global light buffer.
 Pros: Easy to implement. Meets our artists' requirements pretty well.
 Cons:
1. We run into VGPR pressure in the forward shading pass due to the extensive forward lighting (and GI) calculation.
2. DX9 has vector register limits, which makes some of the pixel shaders fail to compile.
3. We had to simplify some lighting functions for the DX9 version.
Character Hair Rendering
• Most of a hair strand is opaque, while the strand tip is translucent.
• The opaque part goes through the Gbuffer pass, deferred cubemap pass and deferred shadow pass along with other general objects, but its lighting is done through forward shading because of its anisotropic specular.
• The translucent part only goes through the forward shading pass, where all of its lighting (both direct and environment) and shadowing are done.
• Hair sorting:
1. Different characters' hair is sorted by head position.
2. Hair layers within one hair style are sorted by their sub-material names, such as "hair_0", "hair_1" and so on. We require artists to assign different sub-materials to different hair layers if they want correct sorting.
3. However, quite a few hair styles aren't sorted at all and the result is still fine.
• We also write hair depth into the scene Z target so that the hair won't be all blurry when depth of field is turned on.
Opaque hair only vs. opaque hair with translucent tips
GI for translucent objects
• Translucent objects still need GI lighting.
• Especially hair, whose opaque part goes through the deferred cubemap pass for GI while the translucent part only does forward shading.
(Comparison: strand tips with no GI vs. strand tips with GI)
GI for translucent objects
 Goal:
• Our version of CryEngine uses cubemaps as its GI solution. Sometimes we have up to 12 cubemaps rendered on screen.
• For DX11 we can simply store all of these cubemaps in one cubemap array and use it anywhere, but this is totally unviable in DX9.
• For DX9 we cannot sample all 12 cubemaps in one forward shading pixel shader either, because of the texture stage limits.
• We need a way to efficiently store all those cubemaps and use them in the forward shading pass.
 Solution:
• Project all of those cubemaps into spherical harmonics coefficients.
• As a result, our GI for translucent objects is environment diffuse only. The environment specular for translucent objects is faked by sampling one general specular cubemap (our translucent hair tips don't have any environment specular at all).
    Spherical Harmonics Projection Pipeline: 1. Project each pixel of the diffuse cubemap into spherical harmonics, then accumulate them. 2. For DX11 we use one set of SH9 coefficients, which can be stored in one element of a structure buffer. 3. For DX9 we use one set of SH4 coefficients, which can be stored in 4 pixels of a R32G32B32A32F render target. 4. To simply the pipeline, we project all diffuse cubemaps into spherical harmonics coefficients, and use that for both of our deferred cubemap pass and forward shading pass. The environment specular is still done through sampling specular cubemaps during the deferred cubemap pass. 5. As is mentioned before, since we can only project diffuse cubemaps into spherical harmonics coefficients, we can only do environment diffuse lighting for translucent objects. The environment specular lighting is faked for them.
    Spherical Harmonics Projection DX11: 1. The projection process is done through compute shader with parallel reduction[ Young 2010, “DirectCompute Optimizations and Best Practices” ]. 2. We spawn 256 threads for one 16 x 16 diffuse cubemap. Each thread handles 6 pixels from 6 faces. The projected coefficients are stored in the group shared memory, then they are accumulated through parallel reduction. 3. It only takes 0.16 ms to project 12 cubemaps with 16 x 16 resolution. We do this process every frame.  DX9: 1. We can’t find a smart way to do this, so we have to do the projection through brutal force: we have to do the projection of all pixels of one cubemap in one pixel shader fragment. 2. Due to the inefficiency, we have to lower the diffuse cubemap resolution for this process. We use 8 x 8 diffuse cubemap for SH4 projection. All 8 x 8 x 6 pixels are projected and accumulated in one pixel shader fragment. 3. Basically we trade GPU performance with quality. The GPU time it takes to project 12 cubemaps with 8 x 8 resolution is 0.18 ms for DX9, which is on par with the DX11 version, and the quality loss is sort of acceptable.
Light Indexed Deferred
• The CryEngine version we use still renders deferred lighting with light volumes. This is fine most of the time, until we run into maps with lots of point lights.
• Since we already have the global light indexed list, we can use it as our deferred lighting solution as well, which is called "light indexed deferred" in some papers.
• It's not much different from tiled lighting, except the lights are already culled; we just grab the light indices for each tile and do the usual lighting.
Light Indexed Deferred
• Performance:
1. DX11: when there aren't many point lights in the viewport, light indexed deferred is on par with light volumes. However, when the number of lights exceeds 50, light indexed deferred beats light volumes hands down.
2. DX9: due to the inefficiency of branches and loops in SM 3.0, light indexed deferred is slower than light volumes when there aren't many point lights.
• Since we don't have 50+ point lights most of the time, we only turn on light indexed deferred for DX11. For DX9 we keep CryEngine's light volume solution.
• Since we have a lot of outdoor maps, the sky background is visible most of the time during gameplay, so a stencil pre-pass that culls all the empty tiles is quite helpful.
• In Monster Hunter Online, the stencil pre-pass before the lighting pass saves about 0.1 – 0.2 ms in DX11.
Future work:
1. Currently shadow-casting point lights and spot lights are still rendered with light volumes.
2. We need to make the tiled deferred lighting system support shadow-casting point lights.
3. We also need to support spot lights and area lights (which need to be implemented first).
Rendering Pipeline in CryEngine 3.3.8
• CryEngine 3.3.8 uses deferred lighting + forward shading:
1. Gbuffer Pass
2. Shadow Pass
3. AO Pass
4. Deferred Cubemap Pass (GI solution)
5. Deferred Lighting Pass (done through light volumes)
6. Deferred Shadowing
7. Forward Shading (for everything)
8. HDR Tonemapping + Post Effects + AA
• Gbuffer layout: R8G8B8A8 normal & glossiness + R32F linear depth
• Pro: the thin Gbuffer saves a lot of bandwidth
• Con: wastes draw calls, and the thin Gbuffer cannot support PBR for point lights and GI
Rendering Pipeline in CryEngine 3.6.17
• CryEngine 3.6.17 uses deferred shading + forward shading:
1. Gbuffer Pass
2. Shadow Pass
3. AO Pass + SSR Pass
4. Deferred Cubemap Pass
5. Deferred Lighting Pass (done through tiled lighting)
6. Deferred Shadowing
7. Deferred Shading
8. Forward Shading (only for some objects)
9. HDR Tonemapping + Post Effects + AA
• Gbuffer layout: R8G8B8A8 normal & transmittance + R8G8B8A8 diffuse & SSS profile + R8G8B8A8 glossiness & specular
• Pro: supports the full PBR pipeline for all types of lights
• Con: lacks features our artists desire & no longer supports DX9
Rendering Pipeline in Monster Hunter Online
• Based on 3.6.17, we added more features:
1. Gbuffer Pass
2. Shadow Pass
3. Hi-Z Occlusion Test (DX11)
4. AO Pass + Stochastic SSR Pass
5. Cubemap Relighting & Filtering Pass
6. SH9 / SH4 Projection Pass
7. Deferred Cubemap Pass
8. Deferred Lighting Pass (DX9: light volumes; DX11: light indexed deferred)
9. Deferred Shadowing
10. Deferred Shading
11. Forward Plus Preparation (build hi-z with translucent objects' depth + build the light indexed list based on the new hi-z)
12. Forward Shading (with Forward Plus lighting for translucent objects)
13. HDR Tonemapping + Post Effects + AA
• Gbuffer layout: R8G8B8A8 normal & material ID + R8G8B8A8 glossiness & specular + R8G8B8A8 diffuse & transmittance + R32F linear depth (DX9 only)
Rendering Pipeline in Monster Hunter Online
• We also have a simplified deferred shading pipeline for middle-spec configurations:
1. Gbuffer Pass
2. Shadow Pass
3. Hi-Z Occlusion Test (DX11)
4. Water-only SSR
5. Deferred Lighting Pass (DX9: light volumes; DX11: light indexed deferred)
6. Deferred Shadowing
7. Deferred Shading
8. Forward Plus Preparation (build hi-z with translucent objects' depth + build the light indexed list based on the new hi-z)
9. Forward Shading (with Forward Plus lighting for translucent objects)
10. HDR Tonemapping + Post Effects + AA
• We replaced the cubemap GI solution with an artist-tuned static ambient value for environment diffuse and one global specular cubemap for environment specular.
Rendering Pipeline in Monster Hunter Online
• We also have a forward-shading-only pipeline for really low-spec machines:
1. Forward Shading done into an R16G16B16A16F render target
2. HDR Tonemapping and no other post effects
 This is pretty much how the original Monster Hunter is rendered on PSP, and could be our rendering pipeline of choice if we ever port the game to mobile phones.
 Only the sun light is handled. For a lot of environment objects normal mapping is skipped and only the vertex normal is used.
 Environment diffuse is done through an artist-tuned ambient value, and environment specular through the global specular cubemap.
 Shadow maps are not used. Instead, sphere shadow volumes are used for character shadows.
 It still supports PBR.
 Future work:
1. Write scene depth into the alpha channel of the scene render target so that we can still support screen space decals.
2. Since we don't have a depth pre-pass, we'd better sort the scene objects beforehand to reduce overdraw via early depth test.
Other small things
Miscellaneous rendering tricks, experiences, etc.
Prevent color leaking in half-res DOF
• Depth of field pipeline:
1. Filter Pass: calculate the filtering kernel and blend alpha based on scene depth, then filter the scene.
2. Composite Pass: blend the filtered scene and the original scene based on the blend alpha.
(Diagram: half-resolution scene -> DOF filter -> DOF-filtered scene blended with unfiltered scene)
Prevent color leaking in half-res DOF
• How to prevent color leaking (see the sketch below):
1. Pre-multiplied Alpha Pass: before filtering the scene, calculate the blend alpha and multiply the scene by it.
2. Filter Pass: filter the scene which has been multiplied by the blend alpha, then divide by the blend alpha after filtering.
3. Composite Pass: same as usual.
(Diagram: half-resolution scene pre-multiplied with alpha -> DOF filter -> DOF-filtered scene blended with unfiltered scene)
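A minimal 1-D sketch of the premultiply-filter-divide idea; the real pass runs a 2-D DOF kernel on the half-resolution scene, but the principle is identical.

```cpp
#include <algorithm>

struct Px { float r, g, b, a; }; // a = DOF blend alpha derived from scene depth

Px filterPremultiplied(const Px* src, const float* kernel, int taps) {
    Px acc = { 0, 0, 0, 0 };
    for (int i = 0; i < taps; ++i) {
        // Pre-multiply: in-focus pixels (alpha ~ 0) contribute no color,
        // so sharp foreground color cannot leak into blurred neighbors.
        acc.r += src[i].r * src[i].a * kernel[i];
        acc.g += src[i].g * src[i].a * kernel[i];
        acc.b += src[i].b * src[i].a * kernel[i];
        acc.a += src[i].a * kernel[i];
    }
    // Divide the filtered alpha back out to recover the color.
    const float inv = 1.0f / std::max(acc.a, 1e-5f);
    return { acc.r * inv, acc.g * inv, acc.b * inv, acc.a };
}
```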
Decal with color leaking vs. decal with no color leaking
Prevent color leaking in deferred decals
• The problem:
• It is caused by depth discontinuities where an object occludes the decal in screen space.
• For small decals this can be solved by disabling texture mipmapping, i.e. forcing MIP0.
• However, in our game we have quite a lot of huge decals placed on terrain to improve environment variation.
• The solution:
• [ Persson 2010, http://humus.name/index.php?page=3D&ID=84 ]
• We sample the scene depth within the 2x2 area around the decal pixel and manually find the depth continuities in the horizontal and vertical directions, then use tex2Dgrad to sample the decal texture (see the sketch below).
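A minimal sketch of that gradient selection, after [ Persson 2010 ]. The neighbor layout, depth threshold, and zero-gradient fallback are assumptions; in the shader the chosen gradients feed tex2Dgrad.

```cpp
#include <cmath>

struct UV { float u, v; };

static bool depthContinuous(float d0, float d1, float threshold = 0.01f) {
    return std::fabs(d0 - d1) < threshold; // linear-depth comparison
}

// uv/depth hold the 2x2 quad: [0] = this pixel, [1] = right, [2] = down.
void decalGradients(const UV uv[3], const float depth[3],
                    UV& gradX, UV& gradY) {
    // If the right neighbor crosses a depth edge, the raw ddx of the decal UV
    // would be huge, and mip selection would fetch a blurry mip that bleeds
    // the decal across the edge. Use the neighbor only if depth is continuous.
    if (depthContinuous(depth[0], depth[1]))
        gradX = { uv[1].u - uv[0].u, uv[1].v - uv[0].v };
    else
        gradX = { 0.0f, 0.0f }; // fall back to a safe (sharp) gradient
    if (depthContinuous(depth[0], depth[2]))
        gradY = { uv[2].u - uv[0].u, uv[2].v - uv[0].v };
    else
        gradY = { 0.0f, 0.0f };
}
```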
Geometry blending
• Our artists noticed this feature when playing Rise of the Tomb Raider.
• They wanted a similar feature in Monster Hunter Online: reducing the visible disconnect between objects and terrain.
(Comparison: no geometry blend vs. with geometry blend)
Geometry blending
• Our pipeline:
1. Separate geometry-blending objects from other Gbuffer objects and put them into a standalone render list.
2. Render the other Gbuffer objects first in the Gbuffer pass, then resolve the scene depth to a target (half-resolution depth will also do).
3. Then render the geometry-blending objects:
① Sample the resolved depth in the shader and calculate the blending alpha based on the difference between the resolved scene depth and the geometry fragment depth (see the sketch below).
② Do Gbuffer alpha blending (like how the terrain layers are handled) based on this depth-bias-based alpha.
4. Since we use deferred shading and all PBR materials go through the same lighting function, Gbuffer-blended objects look like they are blended together.
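A minimal sketch of the blend factor in step 3-①; the blend distance is an assumed artist-tuned parameter, and the resulting alpha then drives the Gbuffer alpha blending in step 3-②.

```cpp
#include <algorithm>

float geometryBlendAlpha(float resolvedSceneDepth, // depth of the terrain behind
                         float fragmentDepth,      // depth of the blending object
                         float blendDistance) {    // fade range
    // Fragments touching the terrain get alpha 0 (fully blended into it);
    // fragments well in front of it get alpha 1 (fully their own Gbuffer).
    const float t = (resolvedSceneDepth - fragmentDepth) / blendDistance;
    return std::clamp(t, 0.0f, 1.0f);
}
```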
(Comparison: alpha-blended albedo diffuse / world space normal / specular reflectance / glossiness)
Geometry blending
• With this technique we can also blend among non-terrain objects.
• However, we require our artists to only do this among non-metals. For non-metals the specular values are close (all around 50), while metals may have very different specular values. Besides, blending specular reflectance among metals, or between metals and non-metals, doesn't make sense for PBR anyway.
DX11 in-place editing
• Sometimes we want to modify the data of render target A but the hardware alpha blending model just cannot do it.
• For example: A' = f(A), where
• A is the original pixel of render target A,
• A' is the modified pixel of render target A,
• f() is fairly complex and cannot be expressed as src color * src blend + dst color * dst blend.
• Usually this is done through "ping-pong":
• Copy render target A to render target B.
• Sample render target B during the shading pass and calculate A' = f(B).
• Write A' back to render target A.
• This needs one extra texture copy and one extra render target.
(Diagram: pass 1 copies A to B; pass 2 writes f(B) to A; since B = A, f(B) = f(A) = A')
DX11 in-place editing
• DX11 supports in-place editing of an RWTexture if its format is 32-bit:
1. Create the texture with the corresponding typeless format. For example, for R8G8B8A8 the corresponding typeless format is R8G8B8A8_TYPELESS.
2. Create its RTV and SRV with the original format, such as R8G8B8A8.
3. Create its UAV with R32_UINT.
4. For the rendering pass, bind the UAV as the shading output.
5. Inside the shader, load the data from the UAV first, then convert it from R32_UINT to the desired format (such as R8G8B8A8).
6. After modifying the data, convert it back to R32_UINT and write it back to the output UAV (see the sketch below).
7. Thus we can modify the data of target A in a single rendering pass with no extra render target.
(Diagram: read A directly, do the modification, write A' = f(A) back directly)
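A minimal CPU-side sketch of the conversion in steps 5-6: the UAV is read as raw R32_UINT, unpacked to RGBA8, modified, and packed back. Names are assumptions.

```cpp
#include <cstdint>

struct RGBA8 { uint8_t r, g, b, a; };

// Unpack one raw 32-bit UAV value into an RGBA8 color.
RGBA8 unpackR32(uint32_t v) {
    return { uint8_t(v & 0xFF), uint8_t((v >> 8) & 0xFF),
             uint8_t((v >> 16) & 0xFF), uint8_t((v >> 24) & 0xFF) };
}

// Pack an RGBA8 color back into the raw 32-bit UAV value.
uint32_t packR32(RGBA8 c) {
    return uint32_t(c.r) | (uint32_t(c.g) << 8) |
           (uint32_t(c.b) << 16) | (uint32_t(c.a) << 24);
}

// In-place edit of one texel: load, modify (the f() from the slide), store.
void editInPlace(uint32_t* uav, uint32_t texel, RGBA8 (*f)(RGBA8)) {
    uav[texel] = packR32(f(unpackR32(uav[texel])));
}
```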
DX11 in-place editing
• The Gbuffer rain pass and Gbuffer snow pass of CryEngine 3 can be optimized with this method.
• With in-place editing, we saved 3 extra render targets across the Gbuffer rain and snow passes.
• We also saved 1 ms of GPU time in the Gbuffer rain pass and 0.2 ms in the Gbuffer snow pass.
• Warning: DX11 in-place editing doesn't work for MSAA targets, and it breaks on old (pre-2015) Nvidia drivers.
Gbuffer Compression (not used)
• Gbuffer layout before the rendering pipeline refactoring:
• R8G8B8A8 normal & glossiness + R32F linear depth
1. Good: thin Gbuffer, cheap on bandwidth
2. Bad: cannot support PBR for IBL cubemaps and point lights
• Gbuffer layout after the rendering pipeline refactoring:
• R8G8B8A8 normal & material ID + R8G8B8A8 glossiness & specular + R8G8B8A8 diffuse & transmittance + R32F linear depth (DX9 only)
1. Good: supports PBR for all types of lighting, backward compatible with legacy materials
2. Bad: 3 targets for DX11 and 4 targets for DX9; bandwidth usage doubled compared to the previous version
• Tested Gbuffer compression:
• A2R10G10B10 normal XY & glossiness & material ID + R8G8B8A8 specular YCr/YCb & diffuse YCr/YCb + R32F linear depth (DX9 only)
1. Good: carries all the information we need from the uncompressed Gbuffer layout
2. Bad: the ALU cost of decoding the interleaved YCrCb Gbuffer data overshadows the bandwidth gain (see the sketch below)
3. Conclusion: not viable, on PC at least
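A minimal sketch of the interleaved chroma idea behind the tested layout: every pixel stores luma plus one chroma component in a checkerboard, halving chroma storage. The BT.601 constants are an assumption about the exact transform; the neighbor fetch needed at decode time is the ALU cost mentioned above.

```cpp
struct YC { float y, c; }; // luma + one interleaved chroma channel

YC encodeInterleaved(float r, float g, float b, int px, int py) {
    const float y  = 0.299f * r + 0.587f * g + 0.114f * b;
    const float cb = 0.5f * (b - y) / (1.0f - 0.114f);
    const float cr = 0.5f * (r - y) / (1.0f - 0.299f);
    // Checkerboard: even pixels keep Cr, odd pixels keep Cb. Decoding must
    // reconstruct the missing component from a neighboring pixel, which is
    // where the extra decoding ALU comes from.
    const bool even = ((px + py) & 1) == 0;
    return { y, even ? cr : cb };
}
```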
    Road to PBR Programmer side: 1. HDR & linear space lighting pipeline (already exists in CryEngine). 2. New texture combination: Diffuse + Normal + Specular (Frensel 0) + Glossness. 3. New Gbuffer layout & rendering pipeline to suit the new texture combination. 4. Specular BRDF: Normalized Phong  Microfacet GGX. 5. Specular AA (already handled in CryEngine texture tool): http://blog.selfshadow.com/2011/07/22/specular- showdown/ 6. Pre-filtered / runtime-filtered environment specular cubemap.  Artists side: 1. Need to make sure the artists understand the difference between albedo diffuse and specular reflectance. 2. Need to make sure the artists know where to put their texture details at (normal & glossness, but not in diffuse or specular).
    Road to PBR The Problem: 1. Our time budget for PBR assets transition is only 2-3 months. 2. Need to figure out a method for quick & easy assets review. 3. We also refactored our environment cubemap pipeline, all maps lighting setup were refactored again. 4. Need to figure out a method for quick & easy lighting setup review.
    Road to PBR The Solution: 1. Added lots of rendering debug view mode in the sandbox, so that our artists can review their work quickly and easily. 2. We also expended this debug view mode with lots of profiling visualization modes, such as texture usage.
A corner of the main city in Sandbox
Diffuse view mode: albedo diffuse should not contain lighting information and should be dark or black for metals.
Glossiness view mode: artists can put much of the material detail into the glossiness.
Specular view mode: metals have bright and sometimes colorful specular (over 100) while non-metals all have low, gray specular (around 50).
Normal view mode: we use this view mode to check whether assets have incorrectly exported normal maps.
Environment diffuse view mode: used to check how the skylight parameter of the TOD is set up and how the GI system is working in the environment.
Environment specular view mode: used to check how the pre-filtered / runtime-filtered specular cubemap is working in the environment.
Texture MIP view mode: used for testing the texture streaming system.
Texture usage view mode: used for checking the number of material textures used by each surface material.
Texture size view mode: used for checking the maximum texture size of each surface material.
Gameplay-related texture channel view mode: we also store some gameplay-related information in material textures.
Reference
• [ McAuley 2015, "Rendering The World Of Far Cry 4" ]
• [ Lagarde et al. 2014, "Moving Frostbite to Physically Based Rendering" ]
• [ GPU Gems 3, "GPU-Based Importance Sampling" ]
• [ Karis 2013, "Real Shading in Unreal Engine 4" ]
• [ Lazarov 2013, "Getting More Physical in Call of Duty: Black Ops II" ]
• [ McGuire 2014, "Efficient GPU Screen-Space Ray Tracing" ]
• [ Valient 2014, "Reflections And Volumetrics Of Killzone Shadow Fall" ]
• [ Stachowiak 2015, "Stochastic Screen-Space Reflections" ]
• [ Harada 2012, "Forward+: Bringing Deferred Lighting To The Next Level" ]
• [ Young 2010, "DirectCompute Optimizations and Best Practices" ]
• [ Persson 2010, http://humus.name/index.php?page=3D&ID=84 ]

Editor's Notes

  • #4 The CryEngine version we are using is quite out of date, and due to our extensive changes in the gameplay & online code base, upgrading the engine was no longer an option. The time window for updating visual quality was really short (1 year left before release) and 80% of the art assets were already finished. Therefore we needed to find a direction that both improved our graphics quality and affected as few art assets as possible.
  • #6 First we upgraded some parts of the rendering engine. The transition from the Normalized Phong material to a standard PBS material was not that hard. Our artists had spent some time on a later version of CryEngine creating the opening cinematic assets, so they had some training & practice for it. Most of the texture conversion work was automated by Python scripts. Although most of the converted textures were not calibrated, the result was acceptable considering the time budget we had.
  • #7 Since we have quite a few maps with 24-hour time of day, the pre-baked IBL provided by CryEngine is not enough; we need a solution for correct GI at any time of the day. The water reflection system (reflective camera) from CryEngine has a lot of limits: no shadows, no point lights, wasted draw calls, etc. The original screen space reflection from CryEngine is not good enough (detailed later). CryEngine 3.6.17's forward tiled lighting only supports hair and requires DX11, while we need a solution that covers a range of materials and works on both DX9 and DX11. Compute shaders & UAVs are essential for a functional DX11 pipeline, yet they are missing from both CryEngine 3.3.8 and CryEngine 3.6.17.
  • #8 Notice the ambient lighting during dawn and dusk: it's way too bright while the sun is at the horizon.
  • #9 The lighting at dawn and dusk is much more correct (the bright night is explicitly configured by the artists).
  • #10 CryEngine has its own screen space reflection but it cannot match ours either performance wise or quality wise.
  • #11 Notice the reflection is stretched along the reflection surface.
  • #12 This shows that our SSR works on all kinds of materials and it follows the roughness & normal details on materials.
  • #13 A transparent bottle affected by point lights.
  • #14 Character hair has its opaque part and translucent part. This picture shows the character hair is lit by a point light at the camera position.
  • #15 This shows the artifacts if we turn off the F+ pipeline for the translucent part of the hair.
  • #16 Hair (both opaque and translucent) also needs correct GI. This picture shows the hair is affected by GI in the shadow.
  • #17 This shows the artifacts when the translucent part of the hair is not affected by GI.
  • #18 All hair strands in AMD TressFX are translucent triangles, therefore all of them need F+ support for correct lighting.
  • #21 The left one is high frequency specular cubemap, the right one is low frequency diffuse cubemap.
  • #25 A static cubemap is baked at 12pm.
  • #26 If we use the same cubemap for dusk, then obviously it’s too bright for that time of day.
  • #27 And if we use it for dawn, then it’s also too bright.
  • #29 Normal cubemap + diffuse cubemap + sky cubemap (baked at runtime every frame) = re-lit cubemap
  • #30 These are re-lit cubemaps based on the lighting environment at 1am, 5am, 10am, 1pm, 5pm and 8pm.
  • #31 The original pre-baked cubemap from CryEngine is filtered through ATI CubemapGen offline. However, for runtime re-lit cubemap, we have to figure out a method to filter it online. Our solution is GPU filtered importance sampling.
  • #32 fr is the specular BRDF we use in the game, and pr is the PDF of that BRDF.
  • #33 Based on [ Lagarde et al. 2014 ], the integral can be decomposed into 2 parts: the DFG term and the LD term.
  • #36 We only allow up to 4 cubemaps to be filtered each frame, and we make sure one cubemap won’t be filtered again for at least 10 frames.
  • #38 The over-bright dusk we used to have with CryEngine’s static cubemap.
  • #39 The correct dusk we have with runtime re-lit & filtered cubemap.
  • #40 The over-bright dawn we have with CryEngine’s static cubemap.
  • #41 The correct dawn we have with runtime re-lit & filtered cubemap.
  • #43 CryEngine has its own screen space reflection but ours is better performance wise and quality wise.
  • #47 The easiest type of screen space reflection.
  • #48 It works very well with CryEngine’s Gbuffer rain effect.
  • #49 This demonstrates the glossy reflection we developed.
  • #50 Notice the reflections on the stainless steel ball on the right corner.
  • #51 The ray-trace distance of our method can cover the whole screen and the performance is still very good. The SSR pass of this scene takes 0.35ms.
  • #52 The SSR from CryEngine 3.6.17 has limited ray-trace distance and it looks horrible on smooth surface such as water or ice. In this scene it takes 0.45ms.
  • #53 With importance sampling during the ray-tracing pass, our all-surface reflection achieves physically correct results on rough surfaces: the reflection is sharp where it contacts the reflecting surface, and it stretches along the reflection direction.
  • #54 The glossy reflection of CryEngine’s SSR is not accurate and it suffers from heavy color leaking on object edges.
  • #55 Therefore the water meshes have to be rendered twice. The draw call for a water surface mesh is quite cheap as it's just a quad, but for the projected grid ocean we cannot afford that. Thus the ocean SSR is done on a flat plane, and we distort the reflection based on the ocean normal during the ocean shading pass.
  • #66 Picture is taken from China Art Museum. Notice the vertically stretched reflections of the LCD panel.
  • #67 The reflection close to the reflection surface is sharp. The reflection far from the reflection surface is blurry.
  • #68 This is the final result of a test scene in sandbox. I’ll demonstrate how the SSR works in this scene.
  • #69 For water reflection with one reflection ray, it takes 0.4 ms to complete the ray-tracing pass. Therefore we cannot afford more than that to shoot multiple reflection rays.
  • #73 Notice the jittering of the ray-tracing result. We will remove the jittering through temporal reprojection later.
  • #75 Now we get the feel of the reflection images, but there's way too much noise and flickering. We will use temporal reprojection to remove them.
  • #77 Now after temporal reprojection, the final SSR image is much more stable and can be used for game.
  • #79 This video demonstrates the ghosting artifact caused by using reflection surface’s depth for reprojection.
  • #80 This video demonstrates the result of using the reflection's depth for reprojection. The result is much more stable.
  • #91 This is a screenshot taken from the night scene of the main city, which has quite a lot of point lights.
  • #92 This is the light index atlas we use for DX9 forward plus pipeline. Top left: the first 4 light indices of each tile. Top right: the next 4 light indices of each tile. Bottom left: the next 4 light indices of each tile. Bottom right: the last 4 light indices of each tile.
  • #106 We have a light tile visualization system that prints out the count of lights for each tile on screen.
  • #111 We also have some gameplay related rendering passes among these rendering passes. They won’t be detailed in this talk.
  • #114 Introducing some rendering tricks; these are unrelated to lighting.
  • #128 Normal compression → Lambert azimuthal projection; RGB color compression → interleaved YCr / YCb.
  • #131 From left to right: M / B / G: these are game feature related info packed into material texture channels. Artists want to review their texture masking with this feature. N / N / D / D / T / S: size of normal map mip 0 / normal map mip / size of diffuse map mip 0 / diffuse map mip / count of textures used / shader complexity D / N / S / G / A: Gbuffer diffuse / Gbuffer normal / Gbuffer specular / Gbuffer glossness / ambient occlusion D / S: environment diffuse / environment specular
  • #133 These are our custom Gbuffer view modes in sandbox. They help our artists to review their work.
  • #135 These are our streaming related & gameplay related view modes. They help our engine programmers to review the streaming system we also refactored.