The document discusses techniques for improving VR rendering performance. It begins by summarizing multi-GPU rendering approaches, including assigning each GPU to render a single eye or half of an eye. It then discusses fixed foveated rendering using NVIDIA's multi-resolution shading to reduce pixel rendering at the periphery. The document also covers reprojection techniques for missed frames, including rotation-only and positional reprojection. It concludes by introducing adaptive quality techniques that dynamically adjust rendering settings like resolution and anti-aliasing to maintain framerates while maximizing GPU utilization.
Learn how to use the Lightweight Render Pipeline to optimize for maximum performance on mobile platforms. After watching this, you will be ready to build an experience for standalone mobile VR headsets, like Oculus Go, and to use 3-DOF controllers for interactions.
Speakers:
Dan Miller (Unity Technologies)
Talk by Graham Wihlidal (Frostbite Labs) at GDC 2017.
Checkerboard rendering is a relatively new technique, popularized recently by the introduction of the PlayStation 4 Pro. Many modern game engines are adding support for it right now, and in this talk, Graham will present an in-depth look at the new implementation in Frostbite, which is used in shipping titles like 'Battlefield 1' and 'Mass Effect Andromeda'. Despite being conceptually simple, checkerboard rendering requires a deep integration into the post-processing chain, in particular temporal anti-aliasing, dynamic resolution scaling, and poses various challenges to existing effects. This presentation will cover the basics of checkerboard rendering, explain the impact on a game engine that powers a wide range of titles, and provide a detailed look at how the current implementation in Frostbite works, including topics like object id, alpha unrolling, gradient adjust, and a highly efficient depth resolve.
WT-4072, Rendering Web Content at 60fps, by Vangelis Kokkevis, Antoine Labour... (AMD Developer Central)
Presentation WT-4072, Rendering Web Content at 60fps, by Vangelis Kokkevis, Antoine Labour and Brian Salomon at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
Siggraph 2016 - Vulkan and NVIDIA: The Essentials (Tristan Lorach)
This presentation introduces Vulkan components: what you must know to start using this new API, and what you must know when using it on NVIDIA hardware.
Porting the Source Engine to Linux: Valve's Lessons Learned (basisspace)
These slides discuss the techniques applied in porting a large, commercial AAA engine from Windows to Linux. They include the lessons learned along the way and the pitfalls we ran into, to serve as a warning to other developers.
Presented at SIGGRAPH 2014 in Vancouver during NVIDIA's "Best of GTC" sponsored sessions.
http://www.nvidia.com/object/siggraph2014-best-gtc.html
Watch the replay that includes a demo of GPU-accelerated Illustrator and several OpenGL 4 demos running on NVIDIA's Tegra Shield tablet.
http://www.ustream.tv/recorded/51255959
Find out more about the OpenGL examples for GameWorks:
https://developer.nvidia.com/gameworks-opengl-samples
This presentation demonstrates how to efficiently manage GPU buffers using today's APIs. It describes why buffer management is so important, and how inefficient buffer management can cut frame rates in half. Finally, it demonstrates two new techniques: discard-free circular buffers and transient buffers.
This talk, delivered at GDC 2014, describes a method to detect CPU-GPU sync points. CPU-GPU sync points rob applications of performance and often go undetected. As a single CPU-GPU sync point can halve an application's frame rate, it is important that they be understood and detected as quickly as possible.
Taking Killzone Shadow Fall Image Quality Into The Next Generation (Guerrilla)
This talk focuses on the technical side of Killzone Shadow Fall, the platform exclusive launch title for PlayStation 4.
We present the details of several new techniques that were developed in the quest for next-generation image quality, and the talk uses key locations from the game as examples. We discuss interesting aspects of the new content pipeline, the next-gen lighting engine, the usage of indirect lighting, and various shadow rendering optimizations. We also describe the details of volumetric lighting, the real-time reflections system, and the new anti-aliasing solution, and include some details about the image-quality-driven streaming system. A common and very important theme of the talk is temporal coherency and how it was utilized to reduce aliasing and improve rendering quality and image stability above the baseline 1080p resolution seen in other games.
Recorded video here:
http://on-demand.gputechconf.com/siggraph/2017/video/sig1757-tristan-lorach-vkFX-effective-approach-for-vulkan-api.html
Vulkan is a complex low-level API, full of structures and dedicated objects. Using it may be tedious and often leads to complicated source code. We propose here a way to define and use Vulkan components in a convenient and readable way. Then we will show how this infrastructure allows us to introduce and use higher-level concepts, such as Techniques and Passes, and even how to instantiate resources and render targets right from within the effect, making it self-sufficient and consistent as a general description. The overall purpose of this open-source project is to improve and enhance the use of the Vulkan API while keeping its strength and flexibility. This project can run in two different ways: either as a compiler generating C++ code for you, or at runtime, to load effects and use them right away.
vkFx comes from a former project called nvFx, presented a few years ago. While nvFx was intended to be generic (OpenGL & D3D compliant), vkFx is Vulkan-specific, so the project is thin and doesn't break the important paradigms that Vulkan requires to stay powerful.
Vulkan and DirectX12 share many common concepts, but differ vastly from the APIs most game developers are used to. As a result, developing for DX12 or Vulkan requires a new approach to graphics programming and in many cases a redesign of the Game Engine. This lecture will teach the basic concepts common to Vulkan and DX12 and help developers overcome the main problems that often appear when switching to one of the new APIs. It will explain how those new concepts will help games utilize the hardware more efficiently and discuss best practices for game engine development.
For more, visit http://developer.amd.com/
Unity mobile game performance profiling – using Arm Mobile Studio (Owen Wu)
This session uses a real Unity project to show how to profile performance effectively on mobile devices. Besides Unity's built-in profiling tools, it also covers how to use Arm Mobile Studio for more precise performance analysis.
Target audience and expected takeaways:
Unity game engineers will learn practical knowledge and techniques for mobile game performance profiling, enabling them to quickly and correctly identify performance bottlenecks in their games.
How the Universal Render Pipeline unlocks games for you - Unite Copenhagen 2019 (Unity Technologies)
Learn how the Boat Attack demo was created using the Universal Render Pipeline. These slides offer an in-depth look at the features used in the demo, including Shader Graph, Custom Render Passes, Camera Callback, and more.
Speaker:
Andre McGrail - Unity Technologies
Watch the session on YouTube: https://youtu.be/ZPQdm1T7aRs
XR graphics in Unity: delivering the best AR/VR experiences – Unite Copenhage... (Unity Technologies)
Virtual reality (VR) and augmented reality (AR) are powerful tools for storytelling, but poor execution can negatively impact consumer reactions and engagement. This session guides you through the latest Unity tech and best practices for creating stunning high-end VR and mobile AR visuals.
Speaker: Dan Miller – Unity
Watch the session on YouTube: https://youtu.be/dvOZ7IL2iOI
Performance Evaluation and Comparison of Service-based Image Processing based... (Matthias Trapp)
Presentation of the research paper "Performance Evaluation and Comparison of Service-based Image Processing based on Software Rendering" at the 27th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2019) in Plzen, Czech Republic.
Computer Graphics - Lecture 01 - 3D Programming I (Anton Gerdelan)
Slides from when I was teaching CS4052 Computer Graphics at Trinity College Dublin in Ireland.
These slides aren't used any more, so they may as well be available to the public!
There are some mistakes in the slides; I'll try to comment on them below.
This is the second lecture, and introduces programming with OpenGL 4 and shaders.
The presentation broadly covers various optimization tips and techniques for content development for virtual reality. It was presented by the Art Lead of Digital Agents Interactive at the National Game Developer Conference 2017.
- Vector- and Raster-based Graphics
-- Idea behind Vector- and Raster-based Graphics
-- Crispness
-- Overview of Raster-based Drawing APIs
- Platform independent Graphics and GUIs in the Web Browser
-- Bare HTML Pages
-- Plugins and Problems
-- From rich Content to HTML 5
- Drawing with HTML 5 Canvas
-- Continuous, Event driven and free Drawing
-- Basic Drawing "How does Drawing work with JavaScript?"
-- Interaction with Controls
VMworld 2013: A Technical Deep Dive on VMware Horizon View 5.2 Performance an... (VMworld)
VMworld 2013
Banit Agrawal, VMware
Warren Ponder, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/07/optimized-image-processing-for-automotive-image-sensors-with-novel-color-filter-arrays-a-presentation-from-nextchip/
Young-Jun Yoo, Vice President of the Automotive Business and Operations Unit at Nextchip, presents the “Optimized Image Processing for Automotive Image Sensors with Novel Color Filter Arrays” tutorial at the May 2023 Embedded Vision Summit.
Traditionally, image sensors have been optimized to produce images that look natural to humans. For images consumed by algorithms, what matters is capturing the most information. We can achieve this via higher resolution, but higher resolution means lower sensitivity. To increase resolution and maintain high sensitivity, color information can be sacrificed—but in automotive applications, color is critical. In response, suppliers offer image sensors that capture color information using novel color filter arrays (CFAs).
Instead of the traditional RGGB array, these sensors use patterns like red-clear-clear-green (RCCG). These approaches yield good results for perception algorithms, but what about cases where images are used by both algorithms and humans? Can we reconstruct a natural-looking image from an image sensor using a non-standard CFA? In this talk, Yoo explores novel CFAs and introduces Nextchip’s vision processor, which supports reconstruction of natural-looking images from image sensors with novel CFAs, including RGB-IR sensors.
2. My Presentation Last Year
● This is part 2 of my presentation from last year, “Advanced VR Rendering”, GDC 2015
● Video and slides from last year are free online: http://www.gdcvault.com/play/1021771/Advanced-VR
● This year’s talk focuses on performance, but increased visual quality is the goal of what I’m presenting today
3. Outline
● Multi-GPU for VR
● Fixed Foveated Rendering & Radial Density Masking
● Reprojection
● Adaptive Quality
7. Single GPU
● Single GPU does all the work
● Stereo rendering can be accomplished a number of ways (this example uses sequential rendering)
● Shadow buffer is shared by both eyes
(Diagram: GPU 0 frame timeline)
8. Multi-GPU Affinity APIs
● AMD and NVIDIA have multi-GPU affinity APIs
● Broadcast draw calls across GPUs using affinity mask
● Set different shader constant buffers per-GPU
● Transfer subrects of render targets across GPUs
● Use transfer fences to asynchronously transfer while the destination GPU is still rendering
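To make the bullets above concrete, here is a heavily hedged C++ sketch of how a 2-GPU frame could be structured. Every function name below (SetGpuAffinityMask, SetViewConstants, CopyEyeSubrectAsync, and so on) is a hypothetical stand-in; the real NVIDIA and AMD affinity extensions expose different, vendor-specific entry points.

```cpp
#include <cstdint>

// Hypothetical stand-ins for vendor multi-GPU affinity APIs. None of these
// names are real; they only mirror the capabilities listed on the slide.
void SetGpuAffinityMask(uint32_t mask);      // route subsequent commands
void SetViewConstants(bool leftEye);         // per-GPU shader constants
void RenderShadowBuffer();
void RenderEyeScene();
void CopyEyeSubrectAsync(uint32_t srcGpu, uint32_t dstGpu);  // fenced copy

constexpr uint32_t kGpu0 = 1u << 0;
constexpr uint32_t kGpu1 = 1u << 1;

void RenderStereoFrameTwoGpus()
{
    // Broadcast the shared work: both GPUs render the shadow buffer once.
    SetGpuAffinityMask(kGpu0 | kGpu1);
    RenderShadowBuffer();

    // Different constants per GPU: each GPU gets its own eye's matrices.
    SetGpuAffinityMask(kGpu0);
    SetViewConstants(/*leftEye=*/true);
    SetGpuAffinityMask(kGpu1);
    SetViewConstants(/*leftEye=*/false);

    // One broadcast draw stream now renders a different eye on each GPU.
    SetGpuAffinityMask(kGpu0 | kGpu1);
    RenderEyeScene();

    // Transfer GPU 1's eye image to GPU 0 asynchronously; a transfer fence
    // lets GPU 0 keep rendering until it actually needs the pixels.
    CopyEyeSubrectAsync(/*srcGpu=*/1, /*dstGpu=*/0);
}
```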
9. Multi-GPU – 2 GPUs
● Each GPU renders a single eye
● Both GPUs render shadow buffer
● “Submit Left” and “App Window” execute in the transfer bubble
● 30-35% performance increase
(Diagram: GPU 0 and GPU 1 frame timelines)
10. Multi-GPU – 4 GPUs
● Each GPU renders half of one eye
● All GPUs render shadow buffer
● Pixel shader (PS) cost scales, vertex shader (VS) cost does not
● Might have high CPU cost in the driver
(Diagram: GPU 0 through GPU 3 frame timelines)
14. Outline
● Multi-GPU for VR
● Fixed Foveated Rendering & Radial Density Masking
● Reprojection
● Adaptive Quality
15. Projection Matrix vs VR Optics
● Pixel density distribution from the projection matrix is the opposite of what we want
● Projection matrix: pixel density per degree increases at the periphery
● VR optics: pixel density increases at the center
● We end up over-rendering pixels at the periphery
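To quantify the mismatch, here is a small self-contained example (mine, not from the slides). For a planar projection, screen position grows with tan(θ), so pixels per degree grow as sec²(θ). With an assumed 110° horizontal FOV and the 1512-pixel 1.0x eye-target width from slide 47, the edge of the image receives roughly 3x the pixels per degree that the center does.

```cpp
#include <cmath>
#include <cstdio>

const double kPi = 3.14159265358979323846;

// Pixels per degree at view angle thetaDeg for a planar projection with a
// symmetric horizontal FOV of fovDeg mapped onto widthPx pixels. Screen
// position is proportional to tan(theta), so density grows as sec^2(theta).
double PixelsPerDegree(double thetaDeg, double fovDeg, double widthPx)
{
    const double deg2rad = kPi / 180.0;
    const double theta = thetaDeg * deg2rad;
    const double sec = 1.0 / std::cos(theta);
    return 0.5 * widthPx * sec * sec / std::tan(0.5 * fovDeg * deg2rad) * deg2rad;
}

int main()
{
    // 110 degrees is an assumed headset FOV, purely for illustration.
    std::printf("center: %.1f px/deg\n", PixelsPerDegree(0.0, 110.0, 1512.0));   // ~9.2
    std::printf("edge:   %.1f px/deg\n", PixelsPerDegree(55.0, 110.0, 1512.0));  // ~28.1
}
```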
18. Fixed Foveated Rendering
● Multi-GPU opportunities for 2 and 4 GPUs
● Using NVIDIA’s “Multi-Resolution Shading” we gain an additional ~5-10% GPU perf with less CPU overhead (see “GameWorks VR”, Nathan Reed, SIGGRAPH 2015)
21. I’m Bad at Words
Me: “I have this idea to reduce fill rate at the periphery”
Jeep Barnett: “That’s interesting”
<Next day I show a demo>
Jeep: “That’s not even close to what I thought you meant”
Me: “What did you think I meant?”
Jeep: “I thought you would skip rendering every other pixel on the outside”
Me: (Laughing) “That’s not how GPUs work”
<Later that night>
Me: “Wait a minute…that’s a great idea!”
22. Radial Density Masking
Skip rendering a checker pattern of 2x2 pixel quads to match current GPU architectures
23. Reconstruction Filter
(Diagram: skipped quads are reconstructed by combining an average of the 2 nearest neighbor quads with an average across the diagonals, using optimized bilinear samples)
● Average 2 neighbors – weights near to far: 0.375, 0.375, 0.125, 0.125
● Average across diagonals – weights near to far: 0.5, 0.28125, 0.09375, 0.09375, 0.03125
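As an aside on the "optimized bilinear samples" note above: weighted sums like these are typically folded into fewer hardware bilinear fetches. A minimal sketch of the standard trick, assuming two adjacent texels with weights wA and wB:

```cpp
// Standard "optimized bilinear" trick: one linear fetch between two adjacent
// texels replaces two point fetches. Sampling at offset wB/(wA+wB) from
// texel A and scaling by (wA+wB) yields exactly wA*A + wB*B.
struct Tap { float position; float weight; };

Tap CombineTaps(float posA, float posB, float wA, float wB)
{
    Tap t;
    t.weight = wA + wB;
    t.position = posA + (posB - posA) * (wB / (wA + wB));
    return t;
}
// Example: weights 0.375 and 0.125 collapse into one fetch with weight 0.5,
// taken 25% of the way from the nearer texel toward the farther one.
```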
24. Radial Density Masking
1. Clip() 2x2 pixel quads as you render, or fill stencil or depth with a 2x2 checker pattern and render
2. Reconstruction filter
● Saves 5-15% performance in Aperture Robot Repair. You can get higher gains with different content and different shaders. If the overhead of reconstruction and skipping pixels doesn’t beat the pixel shader savings of skipped pixel quads, then it’s a wash.
● Almost always a big savings on low-end GPUs
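A minimal CPU-side sketch of the masking decision (mine, not Valve's): a pixel's 2x2 quad is skipped when it lies outside an inner radius around the lens center and falls on the skipped half of a checkerboard of quads. In a real implementation this test runs in the pixel shader via clip(), or is pre-baked into stencil/depth as step 1 describes.

```cpp
// Returns true if the 2x2 quad containing pixel (px, py) should be skipped.
// centerX/centerY: lens center in pixels. innerRadius: radius (in pixels)
// inside which all quads render at full density.
bool QuadSkipped(int px, int py, float centerX, float centerY, float innerRadius)
{
    const int qx = px >> 1;                   // coordinates of the 2x2 quad
    const int qy = py >> 1;
    const float dx = (qx * 2 + 1) - centerX;  // quad center vs lens center
    const float dy = (qy * 2 + 1) - centerY;
    if (dx * dx + dy * dy < innerRadius * innerRadius)
        return false;                         // full density near the center
    // Checkerboard of quads: skip every other quad in the periphery.
    return ((qx + qy) & 1) != 0;
}
```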
25. Outline
● Multi-GPU for VR
● Fixed Foveated Rendering & Radial Density Masking
● Reprojection
● Adaptive Quality
26. Dealing With Missed Frames
● If your engine is not hitting framerate, the VR system can reuse last frame’s rendered images and reproject:
● Rotation-only reprojection
● Position & rotation reprojection
● Reprojection to fill in missed frames should be thought of as a last-resort safety net. Please DO NOT rely on reprojection to maintain framerate unless your customer is using a GPU below your application’s min spec.
27. Rotation-Only Reprojection
● Judder is caused by camera translation, animation, and objects moved by tracked controllers. Judder appears as two distinct images averaged together.
28. Rotation-Only Reprojection
● Rotation reprojection is eye-centered, not head-centered, so it reprojects from the wrong location
● ICD (inter-camera distance) artificially narrows during reprojection depending on the amount of rotation
● “A leisurely head turn is in the ballpark of 100 degrees/second” – Michael Abrash, Valve blog, 2013
29. Rotation-Only Reprojection
● The good:
● Well-understood algorithm for decades that might improve with modern research
● It works reasonably well for a single missed frame even with the known side effects
● So…there’s a non-trivial set of tradeoffs, but I think it’s “good enough” to use as a last-resort safety net for missed frames. It’s better than dropping frames.
30. Positional Reprojection
● Still an unsolved problem that we are very interested in
● You only get one depth in a traditional renderer, so representing translucency is a challenge (particle systems)
● Depth might be stored in an MSAA depth buffer with color already resolved. This can result in color bleeding.
● Hole-filling algorithms for pixels that aren’t represented can cause retinal rivalry. Even with many frames’ worth of valid stereo pairs, if the user moves vertically by crouching down or standing up, there are gaps that need to be filled.
31. Asynchronous Reprojection
● Ideal safety net
● Requires preemption granularity as good as or better than current generation GPUs
● Current GPUs can generally preempt at draw call boundaries, depending on the GPU
● Not yet a guarantee to always reproject in time for vsync
● Applications need to be aware of preemption granularity:
● “You can split up the screen into tiles and run the post processing on each tile in a separate draw call. That way, you provide the opportunity for async timewarp to come in and preempt in between those draws if it needs to.” – “VRDirect”, Nathan Reed, GDC 2015
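A minimal sketch of the tile-splitting idea from the quote above, using hypothetical helpers (SetScissor, DrawFullscreenPass) rather than any specific graphics API: issuing post-processing as one draw per tile gives the GPU draw-call boundaries where async timewarp can preempt.

```cpp
// Hypothetical helpers standing in for your renderer's API.
struct Rect { int x, y, w, h; };
void SetScissor(const Rect& r);   // assumed: restricts rasterization
void DrawFullscreenPass();        // assumed: the post-processing draw

// Split one fullscreen post-processing pass into tilesX * tilesY draws so a
// high-priority async timewarp can preempt between them.
void DrawPostProcessTiled(int width, int height, int tilesX, int tilesY)
{
    const int tw = (width + tilesX - 1) / tilesX;
    const int th = (height + tilesY - 1) / tilesY;
    for (int ty = 0; ty < tilesY; ++ty)
        for (int tx = 0; tx < tilesX; ++tx)
        {
            Rect r{ tx * tw, ty * th, tw, th };
            if (r.x + r.w > width)  r.w = width - r.x;   // clamp edge tiles
            if (r.y + r.h > height) r.h = height - r.y;
            SetScissor(r);
            DrawFullscreenPass();  // same shader, scissored to one tile
        }
}
```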
32. Interleaved Reprojection Hint
● Older GPUs can’t support asynchronous reprojection, so we need an alternative
● The OpenVR API has an interleaved reprojection hint where the app can request every-other-frame rotation-only reprojection if the underlying system doesn’t support always-on, asynchronous reprojection. The app gets ~18ms/frame to render.
● The underlying VR system can also use interleaved reprojection as an auto-enabled safety net when the application is below target framerate
● Every-other-frame reprojection is a good tradeoff:
● “In our experience, ATW should run at a fixed fraction of the game frame rate. For example, at 90Hz refresh rate, we should either hit 90Hz or fall down to the half-rate of 45Hz with ATW. This will result in image doubling, but the relative positions of the double images on the retina will be stable. Rendering at an intermediate rate, such as 65Hz, will result in a constantly changing number and position of the images on the retina, which is a worse artifact.” – “Asynchronous Timewarp Examined”, Michael Antonov, Oculus blog, March 2015
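For reference, in OpenVR this hint maps to a single compositor call. A minimal sketch, assuming the IVRCompositor::ForceInterleavedReprojectionOn entry point as it appears in the openvr.h versions I'm familiar with; verify against your SDK version:

```cpp
#include <openvr.h>

// Ask the compositor to run every-other-frame rotation-only reprojection
// while the app can't hold the target framerate, and clear the hint once
// rendering catches back up.
void SetInterleavedReprojectionHint(bool appBelowFramerate)
{
    if (vr::IVRCompositor* compositor = vr::VRCompositor())
        compositor->ForceInterleavedReprojectionOn(appBelowFramerate);
}
```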
33. Outline
● Multi-GPU for VR
● Fixed Foveated Rendering & Radial Density Masking
● Reprojection
● Adaptive Quality
34. Maintaining Framerate is Hard
● VR is more challenging than traditional games because:
● The user has fine control over the camera
● Many interaction models allow users to reconfigure the world
● I gave up on tuning rendering and content to hit 90 fps since users can reconfigure content so easily
● Last year at GDC, we got Robot Repair to hit framerate by tuning the worst 20% of the experience
35. Adaptive Quality
● Stated simply: “Adaptive Quality dynamically changes rendering settings to maintain framerate while maximizing GPU utilization”
● Goal #1: Reduce the chances of dropping frames and reprojecting
● Goal #2: Increase quality when there are idle GPU cycles
● Example is the Aperture Robot Repair VR demo running at target framerate on an NVIDIA 680 using two different methods
36. Adaptive Quality - Benefits
● Lower GPU min spec for your application
● Increased art asset limits – artists can now make the tradeoff between slightly lower-fidelity rendering for higher-poly assets or more complex materials
● Don’t need to rely on reprojection to maintain framerate
● Unexpected benefit: our apps look better on all hardware
37. What Settings Are Changed?
● What you can’t adjust:
● Can’t toggle visual features like specular
● Can’t toggle shadows
● What you can adjust:
● Rendering resolution/viewport (aka Dynamic Resolution)
● MSAA level or anti-aliasing algorithm
● Fixed Foveated Rendering
● Radial Density Masking
● etc.
40. Measuring GPU Workload
● Your GPU workload isn’t always solid; it might have bubbles
● The VR system’s GPU workload is variable: lens distortion, chromatic aberration, chaperone bounds, overlays, etc.
● Get timings from the VR system, not your application. OpenVR, for example, provides a total GPU timer that accounts for all GPU work
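A minimal sketch of reading that total GPU timer through OpenVR's frame-timing query. Method and field names are per the openvr.h versions I'm aware of (IVRCompositor::GetFrameTiming, Compositor_FrameTiming::m_flTotalRenderGpuMs); verify against your SDK.

```cpp
#include <openvr.h>

// Read the compositor's total GPU time for the most recent frame. This
// includes the VR system's own work (distortion, overlays, ...), not just
// the application's rendering.
float TotalGpuMsLastFrame()
{
    vr::Compositor_FrameTiming timing = {};
    timing.m_nSize = sizeof(timing);          // required by the API
    if (vr::VRCompositor() &&
        vr::VRCompositor()->GetFrameTiming(&timing, /*unFramesAgo=*/0))
    {
        return timing.m_flTotalRenderGpuMs;   // all GPU work, app + system
    }
    return 0.0f;
}
```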
41. GPU Timers - Latency
● Your GPU queries are 1 frame old
● You also have 1 or 2 frames in the queue that can’t be modified
42. Implementation Details – 3 Rules
● Goal: Maintain 70%-90% GPU utilization
● High = 90% of frame (10.0ms)
● Decrease aggressively: if the last frame finished rendering after the 90% threshold of the GPU frame, drop 2 levels, wait 2 frames
● Low = 70% of frame (7.8ms)
● Increase conservatively: if the last 3 frames finished below the 70% threshold of the GPU frame, increase 1 level, wait 2 frames
● Prediction = 85% of frame (9.4ms)
● Use linear extrapolation from the last two frames to predict rapid increases
● If the last frame is above the 85% threshold and the linearly extrapolated next frame is above the high threshold (90%), drop 2 levels, wait 2 frames
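Here is a compact restatement of the three rules as code (my sketch, not Valve's shipping implementation). The GPU time would come from the VR system's total GPU timer described on slide 40, and the returned level delta would step through a quality ladder like the one on slide 44.

```cpp
// Minimal sketch of the three adaptive-quality rules above.
struct AdaptiveQuality
{
    static constexpr float kFrameMs   = 11.11f;            // 90 Hz
    static constexpr float kHighMs    = 0.90f * kFrameMs;  // ~10.0 ms
    static constexpr float kLowMs     = 0.70f * kFrameMs;  // ~7.8 ms
    static constexpr float kPredictMs = 0.85f * kFrameMs;  // ~9.4 ms

    float prevGpuMs = 0.0f;
    int   framesBelowLow = 0;
    int   cooldown = 0;   // "wait 2 frames" after any level change

    // Returns the quality-level delta to apply this frame: -2, 0, or +1.
    int Update(float gpuMs)
    {
        int delta = 0;
        if (cooldown > 0)
        {
            --cooldown;
        }
        else
        {
            // Linear extrapolation from the last two frames.
            const float predicted = gpuMs + (gpuMs - prevGpuMs);
            if (gpuMs > kHighMs ||
                (gpuMs > kPredictMs && predicted > kHighMs))
            {
                delta = -2;                 // decrease aggressively
                cooldown = 2;
                framesBelowLow = 0;
            }
            else if (gpuMs < kLowMs)
            {
                if (++framesBelowLow >= 3)  // increase conservatively
                {
                    delta = +1;
                    cooldown = 2;
                    framesBelowLow = 0;
                }
            }
            else
            {
                framesBelowLow = 0;         // streak broken
            }
        }
        prevGpuMs = gpuMs;
        return delta;
    }
};
```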
43. 10% Idle Rule
● The high threshold of 90% leaves 10% of the GPU idle for other processes almost every frame. This is a good thing.
● You need to share the GPU with other processes; even the Windows desktop needs a slice of the GPU every few VR frames.
● My mental model of the GPU budget changed from last year’s 11.11ms to now 10.0ms per frame, so you almost never starve other processes of GPU cycles.
44. Adaptive Quality in Aperture Robot Repair
Option A
+6: 8xMSAA, 1.4x res (NVIDIA 980Ti renders here for 95% of the experience)
+5: 8xMSAA, 1.3x res
+4: 8xMSAA, 1.2x res
+3: 8xMSAA, 1.1x res
+2: 8xMSAA, 1.0x res
+1: 4xMSAA, 1.1x res
0: 4xMSAA, 1.0x resolution (Default) (NVIDIA 970 stays at or above this level)
-1: 4xMSAA, 0.9x res
-2: 4xMSAA, 0.81x res
-3: 4xMSAA, 0.73x res
-4: 4xMSAA, 0.65x res, Radial Density Masking (NVIDIA 680 stays at or above this level)
45. What About Text?
● You don’t actually want to go down to that low resolution scalar of 0.65, because in-game text will be very difficult to read
● Instead:
● Raise the low end up to about a 0.8 resolution scalar
● If the GPU can’t maintain framerate at the lowest resolution, enable the Interleaved Reprojection Hint (in case asynchronous reprojection isn’t supported)
● For Adaptive Quality, we enable the interleaved reprojection hint when we want to drop below the lowest quality level, as the last-resort safety net with rotation-only reprojection
46. Adaptive Quality in Aperture Robot Repair
Option A
+6: 8xMSAA, 1.4x res
+5: 8xMSAA, 1.3x res
+4: 8xMSAA, 1.2x res
+3: 8xMSAA, 1.1x res
+2: 8xMSAA, 1.0x res
+1: 4xMSAA, 1.1x res
0: 4xMSAA, 1.0x resolution (Default)
-1: 4xMSAA, 0.9x res
-2: 4xMSAA, 0.81x res
-3: 4xMSAA, 0.73x res
-4: 4xMSAA, 0.65x res, Radial Density Masking
Option B – Text-friendly
+6: 8xMSAA, 1.4x res
+5: 8xMSAA, 1.3x res
+4: 8xMSAA, 1.2x res
+3: 8xMSAA, 1.1x res
+2: 8xMSAA, 1.0x res
+1: 4xMSAA, 1.1x res
0: 4xMSAA, 1.0x resolution (Default)
-1: 4xMSAA, 0.9x res
-2: 4xMSAA, 0.81x res
-3: 4xMSAA, 0.81x res, Interleaved Reprojection Hint
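One simple way to drive the controller sketched under slide 42 is a plain table of levels. Here is the text-friendly Option B ladder encoded as data (my own encoding, not code from the talk):

```cpp
// Quality ladder (Option B, text-friendly) as data. The adaptive controller
// indexes into this array; index 3 corresponds to level 0, the default.
struct QualityLevel
{
    int   msaaSamples;
    float resolutionScalar;
    bool  interleavedReprojectionHint;
};

const QualityLevel kOptionB[] = {
    { 4, 0.81f, true  },   // -3: last resort, rotation-only reprojection
    { 4, 0.81f, false },   // -2
    { 4, 0.90f, false },   // -1
    { 4, 1.00f, false },   //  0: default
    { 4, 1.10f, false },   // +1
    { 8, 1.00f, false },   // +2
    { 8, 1.10f, false },   // +3
    { 8, 1.20f, false },   // +4
    { 8, 1.30f, false },   // +5
    { 8, 1.40f, false },   // +6
};
```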
47. Why Max Out at 1.4x Resolution?
Aperture allocates both a 1.4x 8xMSAA and a 1.1x 4xMSAA render target per eye, for a total of 342 MB + 117 MB = 459 MB per eye (918 MB for 2 eyes)! So we use sequential rendering to share the render target and limit resolution to 1.4x for 4 GB GPUs.
Scalar | MSAA | Resolution | GPU Memory, 1 Eye (Color + Depth + Resolve) | GPU Memory, 2 Eyes
2.0    | 8x   | 3024x3360  | 698 MB  | 1,396 MB
2.0    | 4x   | 3024x3360  | 388 MB  | 776 MB
1.4    | 8x   | 2116x2352  | 342 MB  | 684 MB  (Aperture)
1.2    | 8x   | 1814x2016  | 251 MB  | 502 MB
1.0    | 8x   | 1512x1680  | 174 MB  | 348 MB
1.1    | 4x   | 1663x1848  | 117 MB  | 234 MB  (Aperture)
1.0    | 4x   | 1512x1680  | 97 MB   | 194 MB
0.81   | 4x   | 1224x1360  | 64 MB   | 128 MB
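As a rough sanity check on the table (my arithmetic, assuming 4 bytes per color sample and 4 bytes per depth sample): per-eye memory is approximately pixels x 4 bytes x (2 x MSAA samples + 1 resolve surface). The estimates land slightly under the table's numbers; the difference is presumably alignment, padding, or additional surfaces.

```cpp
#include <cstdio>

// Rough per-eye render target size: multisampled color and depth plus one
// single-sample resolve surface, at an assumed 4 bytes per sample each.
double EyeTargetMB(int w, int h, int msaa)
{
    const double bytes = double(w) * h * 4.0 * (2.0 * msaa + 1.0);
    return bytes / (1024.0 * 1024.0);
}

int main()
{
    std::printf("1.4x 8xMSAA: ~%.0f MB (table: 342 MB)\n", EyeTargetMB(2116, 2352, 8));
    std::printf("1.1x 4xMSAA: ~%.0f MB (table: 117 MB)\n", EyeTargetMB(1663, 1848, 4));
}
```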
49. Valve’s Unity Rendering Plugin
● Valve is using a custom rendering plugin in Unity for The Lab
● The Valve VR Rendering Plugin will be free on the Unity Asset Store soon with full source code
● The plugin is a single-pass forward renderer (because we want 4xMSAA and 8xMSAA) supporting up to 18 dynamic shadowing lights and Adaptive Quality
● Special thanks to Unity devs Peter Kuhn, Scott Flynn, Joachim Ante, and Rich Geldreich for adding Adaptive Quality hooks to Unity 5.4.0b9, which shipped one week ago!
50. Decoupling CPU and GPU Performance
● Make your render thread autonomous
● If the CPU isn’t ready with a new frame, don’t reproject! Instead, the render thread resubmits last frame’s GPU workload with updated HMD poses and minimal Adaptive Quality support of dynamic resolution
● To solve animation judder, feed your render thread two animation frames you can interpolate between to keep animation updating
● But non-trivial animation prediction is a hard problem
● Then you can plan to run your CPU at 1/2 or 1/3 of the GPU framerate to do more complex simulation or run on lower-end CPUs
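A minimal sketch of the two-snapshot interpolation idea (my assumed data layout, not code from the talk): the game thread publishes timestamped animation snapshots, and the autonomous render thread blends between the last two for whatever time it is rendering. A real engine would also need thread-safe publication (e.g. double buffering); the blending itself is just a clamped lerp.

```cpp
// Simplified animation snapshot; a real engine would store full pose arrays.
struct AnimSnapshot { double time; float rootY; };

struct AnimationFeed
{
    AnimSnapshot a{0.0, 0.0f};   // older snapshot
    AnimSnapshot b{0.0, 0.0f};   // newer snapshot

    // Game thread: publish a new snapshot, keeping the previous one around.
    // (Synchronization with the render thread omitted for brevity.)
    void Publish(const AnimSnapshot& s) { a = b; b = s; }

    // Render thread: sample the animation at the time being rendered.
    float SampleRootY(double renderTime) const
    {
        const double span = b.time - a.time;
        if (span <= 0.0) return b.rootY;
        double t = (renderTime - a.time) / span;
        if (t < 0.0) t = 0.0;
        if (t > 1.0) t = 1.0;    // clamp: interpolate, don't extrapolate
        return a.rootY + float(t) * (b.rootY - a.rootY);
    }
};
```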
51. Summary
● Multi-GPU support should be in all VR engines (at least 2 GPUs)
● Fixed Foveated Rendering and Radial Density Masking are solutions that help counteract the optics vs projection matrix battle
● Adaptive Quality scales fidelity up and down while leaving 10% of the GPU available for other processes. Do not rely on reprojection to hit framerate on your min spec!
● Valve VR Rendering Plugin for Unity will ship free soon
● Think about how your engine can decouple CPU and GPU performance with resubmission on your render thread