Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Khronos Munich 2018 - Halcyon and Vulkan

6,303 views

Published on

Graham Wihlidal from SEED attended the Munich Khronos Meetup and presented some aspects of Halcyon's rendering architecture, as well as details of the Vulkan implementation. Graham presented components like high-level render command translation, render graph, and shader compilation.

Published in: Technology
  • Be the first to comment

Khronos Munich 2018 - Halcyon and Vulkan

  1. 1. Halcyon + Vulkan Munich Khronos Meetup Graham Wihlidal SEED – Electronic Arts
  2. 2. “PICA PICA”  Exploratory mini-game & world  Goals  Hybrid rendering with DXR [Andersson 2018]  Clean and consistent visuals  Self-learning AI agents [Harmer 2018]  Procedural worlds [Opara 2018]  No precomputation  Uses SEED’s Halcyon R&D framework S E E D // Halcyon + Vulkan
  3. 3. HALCYON
  4. 4. Halcyon Goals  Rapid prototyping framework  Different purpose than Frostbite  Fast experimentation vs. AAA games  Windows, Linux, macOS S E E D // Halcyon + Vulkan
  5. 5. Halcyon Goals  Minimize or eliminate busy-work  Artist “meta-data” meshes  Occlusion  GI / Lighting  Collision  Level-of-detail  Live reloading of all assets  Insanely fast iteration times S E E D // Halcyon + Vulkan
  6. 6. Halcyon Goals  Only target modern APIs  Direct3D 12  Vulkan 1.1  Metal 2  Multi-GPU  Explicit heterogeneous mGPU  No AFR nonsense  No linked adapters S E E D // Halcyon + Vulkan
  7. 7. Halcyon Goals  Local or remote streaming  Minimal boilerplate code  Variety of rendering techniques and approaches  Rasterization  Path and ray tracing  Hybrid S E E D // Halcyon + Vulkan
  8. 8. Hybrid Rendering S E E D // Halcyon + Vulkan Direct Shadows (ray trace or raster) Direct Lighting (compute) Reflections (ray trace or compute) Global Illumination (ray trace) Post Processing (compute) Transparency & Translucency (ray trace) Ambient Occlusion (ray trace or compute) Deferred Shading (raster)
  9. 9. Hybrid Rendering
  10. 10. Rasterization Only
  11. 11. Halcyon Goals  “PICA PICA” and Halcyon built from scratch  Implemented lots of bespoke technology  Minimal effort to add a new API or platform  Efficient and flexible rendering was a major focus S E E D // Halcyon + Vulkan
  12. 12. Rendering Components  Render Backend  Render Device  Render Handles  Render Commands  Render Graph  Render Proxy S E E D // Halcyon + Vulkan
  13. 13. Rendering Components S E E D // Halcyon + Vulkan Render Handles Render Commands Render Backend Render Device Render Backend Render DeviceRender Device Render Backend Render Proxy Render Graph Render Graph Application
  14. 14. Render Backend
  15. 15. Render Backend  Live-reloadable DLLs  Enumerates adapters and capabilities  Swap chain support  Extensions (i.e. ray tracing, sub groups, …)  Determine adapter(s) to use S E E D // Halcyon + Vulkan
  16. 16. Render Backend  Provides debugging and profiling  RenderDoc integration, validation layers, …  Create and destroy render devices S E E D // Halcyon + Vulkan
  17. 17. Render Backend S E E D // Halcyon + Vulkan  Direct3D 12  Vulkan 1.1  Metal 2  Proxy  Mock
  18. 18. Render Backend S E E D // Halcyon + Vulkan  Direct3D 12  Shader Model 6.X  DirectX Ray Tracing  Bindless Resources  Explicit Multi-GPU  DirectML (soon..)  …
  19. 19. Render Backend S E E D // Halcyon + Vulkan  Vulkan 1.1  Sub-groups  Descriptor indexing  External memory  Multi-draw indirect  Ray tracing (soon..)  …
  20. 20. Render Backend S E E D // Halcyon + Vulkan  Metal 2  Early development  Primarily desktop  Argument buffers  Machine learning  …
  21. 21. Render Backend S E E D // Halcyon + Vulkan  Proxy  Not discussed in this presentation
  22. 22. Render Backend S E E D // Halcyon + Vulkan  Mock  Performs resource tracking and validation  Command stream is parsed and evaluated  No submission to an API  Useful for unit tests and debugging
  23. 23. Render Device
  24. 24. Render Device S E E D // Halcyon + Vulkan  Abstraction of a logical GPU adapter  e.g. VkDevice, ID3D12Device, …  Provides interface to GPU queues  Command list submission
  25. 25. Render Device S E E D // Halcyon + Vulkan  Ownership of GPU resources  Create & Destroy  Lifetime tracking of resources  Mapping render handles  device resources
  26. 26. Render Handles
  27. 27. S E E D // Halcyon + Vulkan Render Handles  Resources associated by handle  Lightweight (64 bits)  Constant-time lookup  Type safety (i.e. buffer vs texture)  Can be serialized or transmitted  Generational for safety  e.g. double-delete, usage after delete
  28. 28. S E E D // Halcyon + Vulkan Render Handles ID3D12Resource ID3D12Resource DX12: Adapter 2 ID3D12Resource DX12: Adapter 3 ID3D12Resource DX12: Adapter 1DX12: Adapter 0 Render Handle  Handles allow one-to-many cardinality [handle->devices]  Each device can have a unique representation of the handle
  29. 29. S E E D // Halcyon + Vulkan Render Handles  Can query if a device has a handle loaded  Safely add and remove devices  Handle owned by application, representation can reload on device ID3D12Resource ID3D12Resource DX12: Adapter 2 ID3D12Resource DX12: Adapter 3 ID3D12Resource DX12: Adapter 1DX12: Adapter 0 Render Handle
  30. 30. S E E D // Halcyon + Vulkan Render Handles  Shared resources are supported  Primary device owner, secondaries alias primary ID3D12Resource ID3D12Resource DX12: Adapter 2 ID3D12Resource DX12: Adapter 3 ID3D12Resource DX12: Adapter 1DX12: Adapter 0 Render Handle
  31. 31. S E E D // Halcyon + Vulkan Render Handles  Can also mix and match backends in the same process!  Made debugging VK implementation much easier  DX12 on left half of screen, VK on right half of screen ID3D12Resource ID3D12Resource VK: Adapter 0 VkImage Proxy: Adapter 0 Render Handle DX12: Adapter 1DX12: Adapter 0 Render Handle
  32. 32. Render Commands
  33. 33. Render Commands  Draw  DrawIndirect  Dispatch  DispatchIndirect  UpdateBuffer  UpdateTexture  CopyBuffer  CopyTexture  Barriers  Transitions  BeginTiming  EndTiming  ResolveTimings  BeginEvent  EndEvent  BeginRenderPass  EndRenderPass  RayTrace  UpdateTopLevel  UpdateBottomLevel  UpdateShaderTable S E E D // Halcyon + Vulkan
  34. 34. Render Commands  Queue type specified  Spec validation  Allowed to run?  e.g. draws on compute  Automatic scheduling  Where can it run?  Async compute S E E D // Halcyon + Vulkan
  35. 35. Render Commands S E E D // Halcyon + Vulkan
  36. 36. Render Command List  Encodes high level commands  Tracks queue types encountered  Queue mask indicating scheduling rules  Commands are stateless - parallel recording S E E D // Halcyon + Vulkan
  37. 37. Render Compilation  Render command lists are “compiled”  Translation to low level API  Can compile once, submit multiple times  Serial operation (memcpy speed)  Perfect redundant state filtering S E E D // Halcyon + Vulkan
  38. 38. Render Graph
  39. 39. Render Graph  Inspired by FrameGraph [O’Donnell 2017]  Automatically handle transient resources  Import explicitly managed resources  Automatic resource transitions  Render target batching  DiscardResource  Memory aliasing barriers  … S E E D // Halcyon + Vulkan
  40. 40. Render Graph  Basic memory management  Not targeting current consoles  Fine grained memory reuse sub-optimal with current PC drivers  Lose ~5% on aliasing barriers and discards  Automatic queue scheduling  Ongoing research  Need heuristics on task duration and bottlenecks  e.g. Memory vs ALU  Not enough to specify dependencies S E E D // Halcyon + Vulkan
  41. 41. Render Graph  Frame Graph  Render Graph: No concept of a “frame”  Fully automatic transitions and split barriers  Single implementation, regardless of backend  Translation from high level render command stream  API differences hidden from render graph  Support for mGPU  Mostly implicit and automatic  Can specify a scheduling policy  Not discussed in this presentation S E E D // Halcyon + Vulkan
  42. 42. Render Graph  Composition of multiple graphs at varying frequencies  Same GPU: async compute  mGPU: graphs per GPU  Out-of-core: server cluster, remote streaming S E E D // Halcyon + Vulkan
  43. 43. Render Graph  Composition of multiple graphs at varying frequencies  e.g. translucency, refraction, global illumination S E E D // Halcyon + Vulkan
  44. 44. Render Graph  Two phases  Graph construction  Specify inputs and outputs  Serial operation (by design)  Graph evaluation  Highly parallelized  Record high level render commands  Automatic barriers and transitions S E E D // Halcyon + Vulkan
  45. 45.  Construction phase  Evaluation phase
  46. 46. Render Graph  Automatic profiling data  GPU and CPU counters per-pass  Works with mGPU  Each GPU is profiled S E E D // Halcyon + Vulkan
  47. 47. Render Graph  Live debugging overlay  Evaluated passes in-order of execution  Input and output dependencies  Resource version information S E E D // Halcyon + Vulkan
  48. 48. Render Graph Some of our render graph passes: S E E D // Halcyon + Vulkan  Bloom  BottomLevelUpdate  BrdfLut  CocDerive  DepthPyramid  DiffuseSh  Dof  Final  GBuffer  Gtao  IblReflection  ImGui  InstanceTransforms  Lighting  MotionBlur  Present  RayTracing  RayTracingAccum  ReflectionFilter  ReflectionSample  ReflectionTrace  Rtao  Screenshot  Segmentation  ShaderTableUpdate  ShadowFilter  ShadowMask  ShadowCascades  ShadowTrace  Skinning  Ssr  SurfelGapFill  SurfelLighting  SurfelPositions  SurfelSpawn  Svgf  TemporalAa  TemporalReproject  TopLevelUpdate  TranslucencyTrace  Velocity  Visibility
  49. 49. Shaders
  50. 50. Shaders  Complex materials  Multiple microfacet layers  [Stachowiak 2018]  Energy conserving  Automatic Fresnel between layers  All lighting & rendering modes  Raster, path-traced reference, hybrid  Iterate with different looks  Bake down permutations for production S E E D // Halcyon + Vulkan Objects with Multi-Layered Materials
  51. 51. Shaders  Exclusively HLSL  Shader Model 6.X  Majority are compute shaders  Performance is critical  Group shared memory  Wave-ops / Sub-groups S E E D // Halcyon + Vulkan
  52. 52. Shaders  No reflection  Avoid costly lookups  Only explicit bindings  … except for validation  Extensive use of HLSL spaces  Updates at varying frequency  Bindless S E E D // Halcyon + Vulkan
  53. 53. Shaders S E E D // Halcyon + Vulkan SPIR-VDXIL Vulkan 1.1Direct3D 12 ISPCMSL SPIRV-CROSS AVX2, …Metal 2 DXC HLSL
  54. 54. Shader Arguments  Commands refer to resources with “Shader Arguments”  Each argument represents an HLSL space  MaxShaderParameters  4 [Configurable]  # of spaces, not # of resources S E E D // Halcyon + Vulkan
  55. 55. Shader Arguments  Each argument contains:  “ShaderViews” handle  Constant buffer handle and offset  “ShaderViews”  Collection of SRV and UAV handles S E E D // Halcyon + Vulkan
  56. 56. Shader Arguments  Constant buffers are all dynamic  Avoid temporary descriptors  Just a few large buffers, offsets change frequently  VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC  DX12 Root Descriptors (pass in GPU VA)  All descriptor sets are only written once  Persisted / cached S E E D // Halcyon + Vulkan
  57. 57. Vulkan Implementation  Architecture simplified development effort  Vulkan specific:  Backend and device implementation  Memory allocators (e.g. AMD VMA)  Barrier and transition logic  Resource binding model S E E D // Halcyon + Vulkan
  58. 58. Vulkan Implementation S E E D // Halcyon + Vulkan
  59. 59. Vulkan Implementation S E E D // Halcyon + Vulkan
  60. 60. Vulkan Implementation S E E D // Halcyon + Vulkan
  61. 61. Vulkan Implementation S E E D // Halcyon + Vulkan
  62. 62. Vulkan Implementation S E E D // Halcyon + Vulkan
  63. 63. Vulkan Implementation S E E D // Halcyon + Vulkan
  64. 64. Vulkan Implementation S E E D // Halcyon + Vulkan
  65. 65. Vulkan Implementation S E E D // Halcyon + Vulkan
  66. 66. Vulkan! 
  67. 67. Vulkan! 
  68. 68. Vulkan! 
  69. 69. Vulkan Implementation  Shader compilation (HLSL  SPIR-V)  Patch SPIR-V to match DX12  Using spirv-reflect from Hai and Cort  spvReflectCreateShaderModule  spvReflectEnumerateDescriptorSets  spvReflectChangeDescriptorBindingNumbers  spvReflectGetCodeSize / spvReflectGetCode  spvReflectDestroyShaderModule S E E D // Halcyon + Vulkan
  70. 70. SPIR-V Patching  SPV_REFLECT_RESOURCE_FLAG_SRV  Offset += 1000  SPV_REFLECT_RESOURCE_FLAG_SAMPLER  Offset += 2000  SPV_REFLECT_RESOURCE_FLAG_UAV  Offset += 3000 S E E D // Halcyon + Vulkan
  71. 71. SPIR-V Patching  SPV_REFLECT_RESOURCE_FLAG_CBV  Offset Unchanged: 0  Descriptor Set += MAX_SHADER_ARGUMENTS  CBVs move to their own descriptor sets  ShaderViews become persistent and immutable S E E D // Halcyon + Vulkan
  72. 72. SPIR-V Patching S E E D // Halcyon + Vulkan  If 2 of 4 HLSL spaces in use: Unbound SRVs (>=1000) UAVs (>=3000)Samplers (>=2000) Dynamic Constant Buffer (Offset: 0) SRVs (>=1000) UAVs (>=3000)Samplers (>=2000) Unbound Dynamic Constant Buffer (Offset: 0) Set 0 Set 1 Set 2 Set 3 Set 4 Set 5
  73. 73. Vulkan Implementation  Translate commands  Read command list  Write Vulkan API S E E D // Halcyon + Vulkan
  74. 74. S E E D // Halcyon + Vulkan
  75. 75. S E E D // Halcyon + Vulkan
  76. 76. Ongoing Work! VK nearing DX12
  77. 77. Tools
  78. 78. Tools  RenderDoc  NV Nsight  AMD RGP S E E D // Halcyon + Vulkan
  79. 79. S E E D // Halcyon + Vulkan
  80. 80. Tools S E E D // Halcyon + Vulkan
  81. 81. Tools S E E D // Halcyon + Vulkan  C++ Export!  Standalone
  82. 82. Dear ImGui + ImGuizmo  Live tweaking  Very useful! S E E D // Halcyon + Vulkan
  83. 83. References S E E D // Halcyon + Vulkan  [Stachowiak 2018] Tomasz Stachowiak. “Towards Effortless Photorealism Through Real-Time Raytracing”. available online  [Andersson 2018] Johan Andersson, Colin Barré-Brisebois.“DirectX: Evolving Microsoft's Graphics Platform”. available online  [Harmer 2018] Jack Harmer, Linus Gisslén, Henrik Holst, Joakim Bergdahl, Tom Olsson, Kristoffer Sjöö and Magnus Nordin. “Imitation Learning with Concurrent Actions in 3D Games”. available online  [Opara 2018] Anastasia Opara. “Creativity of Rules and Patterns”. available online  [O’Donnell 2017] Yuriy O’Donnell. “Frame Graph: Extensible Rendering Architecture in Frostbite”. available online
  84. 84. Thanks  Matthäus Chajdas  Rys Sommefeldt  Timothy Lottes  Tobias Hector  Neil Henning  John Kessenich  Hai Nguyen  Nuno Subtil  Adam Sawicki  Alon Or-bach  Baldur Karlsson  Cort Stratton  Mathias Schott  Rolando Caloca  Sebastian Aaltonen  Hans-Kristian Arntzen  Yuriy O’Donnell  Arseny Kapoulkine  Tex Riddell  Marcelo Lopez Ruiz  Lei Zhang  Greg Roth  Noah Fredriks  Qun Lin  Ehsan Nasiri,  Steven Perron  Alan Baker  Diego Novillo
  85. 85. Thanks  SEED  Johan Andersson  Colin Barré-Brisebois  Jasper Bekkers  Joakim Bergdahl  Ken Brown  Dean Calver  Dirk de la Hunt  Jenna Frisk  Paul Greveson  Henrik Halen  Effeli Holst  Andrew Lauritzen  Magnus Nordin  Niklas Nummelin  Anastasia Opara  Kristoffer Sjöö  Ida Winterhaven  Tomasz Stachowiak  Microsoft  Chas Boyd  Ivan Nevraev  Amar Patel  Matt Sandy  NVIDIA  Tomas Akenine-Möller  Nir Benty  Jiho Choi  Peter Harrison  Alex Hyder  Jon Jansen  Aaron Lefohn  Ignacio Llamas  Henry Moreton  Martin Stich
  86. 86. S E E D / / S E A R C H F O R E X T R A O R D I N A R Y E X P E R I E N C E S D I V I S I O N S T O C K H O L M – L O S A N G E L E S – M O N T R É A L – R E M O T E S E E D . E A . C O M W E ‘ R E H I R I N G !
  87. 87. Questions?

×