Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

Keynote, Mantle for Developers, by Johan Andersson, Technical Director, DICE/Electronic Arts, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.


  1. MANTLE FOR DEVELOPERS
     - JOHAN ANDERSSON, TECHNICAL DIRECTOR
     - FROSTBITE
     - ELECTRONIC ARTS
  2. Mantle?
     - Simplify advanced development
     - Improve performance
     - Enable developers to innovate
     - Challenge the status quo
  3. Developer impact areas
     - Control
     - CPU performance
     - Programmability
     - GPU performance
     - Platforms
  4. Control: New model
     - Traditional Model: Black Box
       - Middle-ground abstraction, a compromise between performance & “usability”
       - Hidden resource memory & state
       - Resource CPU access tied to device context
       - Driver analyzes & synchronizes implicitly
     - Explicit Model: Mantle
       - Thin low-level abstraction to expose how hardware works
       - App explicit memory management
       - Resources are globally accessible
       - App explicit resource state transitions
  5. Control: App responsibility
     - Tell when render target will be used as a texture
       - And many more resource state transitions
     - Don’t destroy resources that GPU is using
       - Keep track with fences or frames
     - Manual dynamic resource renaming
       - No DISCARD for driver resource renaming
     - Resource memory tiling
     - Powerful validation layer will help!
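The "keep track with fences or frames" point on slide 5 can be illustrated with a deferred-destruction queue. This is a minimal Python sketch, not Mantle code: `DeferredDestroyQueue`, `destroy_later`, and `collect` are hypothetical names, and the GPU fence is modeled as a plain "last completed frame" number that the caller supplies.

```python
from collections import deque


class DeferredDestroyQueue:
    """Holds resources until the GPU has finished the frame that last used them.

    Hypothetical illustration: a real renderer would query a fence object
    instead of receiving the completed frame index as an argument.
    Assumes destroy_later() is called with non-decreasing frame indices.
    """

    def __init__(self) -> None:
        self._pending = deque()  # (frame_index, resource), oldest first

    def destroy_later(self, resource: str, frame_index: int) -> None:
        # Record the frame in which the resource was last referenced.
        self._pending.append((frame_index, resource))

    def collect(self, last_completed_frame: int) -> list:
        # Release everything whose last-use frame the GPU has completed.
        released = []
        while self._pending and self._pending[0][0] <= last_completed_frame:
            released.append(self._pending.popleft()[1])
        return released
```

Calling `collect` once per frame with the newest completed frame index keeps destruction off the critical path while ensuring nothing the GPU may still read is freed.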
  6. Control: Explicit control enables
     - App high-level decisions & optimizations
       - Has full scene information
       - Easier to optimize performance & memory
     - Flexible & efficient memory management
       - Linear frame allocators
       - Memory pools
       - Pinned memory
     - Reduced development time
       - For advanced game engines & apps
       - Easier to get to target performance & robustness
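A linear frame allocator, one of the memory strategies named on slide 6, can be sketched in a few lines. This is an illustrative Python model that hands out offsets into a fixed-size region. The class name, the default 256-byte alignment, and the use of `MemoryError` are assumptions, not anything specified in the talk.

```python
class LinearFrameAllocator:
    """Bump allocator over a fixed-size region that is reset once per frame."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.offset = 0  # next free byte

    def allocate(self, size: int, alignment: int = 256) -> int:
        # alignment must be a power of two (assumed default: 256 bytes).
        aligned = (self.offset + alignment - 1) & ~(alignment - 1)
        if aligned + size > self.capacity:
            raise MemoryError("frame allocator exhausted")
        self.offset = aligned + size
        return aligned  # offset of the allocation inside the region

    def reset(self) -> None:
        # Everything allocated during the frame is released at once.
        self.offset = 0
```

Allocation is a single add and compare, and there is no per-allocation free, which is why this pattern suits data that lives for exactly one frame.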
  7. Control: Explicit control enables
     - Transient resources
       - Alias render targets within frame
       - Major memory savings
       - No need to pre-allocate everything
     - Light-weight driver
       - Easier to develop & maintain
       - Reduced CPU draw call overhead
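Aliasing render targets within a frame (slide 7) amounts to letting targets with non-overlapping lifetimes share memory. The following Python sketch uses a simple greedy placement. The `TransientTarget` type, the pass indices, and the sizes in the usage note are made up for illustration, and a production allocator would likely use a more careful packing strategy.

```python
from dataclasses import dataclass


@dataclass
class TransientTarget:
    name: str
    size: int        # bytes (or any consistent unit)
    first_pass: int  # index of the first pass that touches the target
    last_pass: int   # index of the last pass that touches the target


def alias_transients(targets):
    """Greedily place targets into shared memory blocks.

    Returns (placement, total_size), where placement maps each target name
    to a block index and total_size is the sum of all block sizes.
    """
    blocks = []     # each entry: [block_size, last_pass_using_it]
    placement = {}
    for t in sorted(targets, key=lambda t: t.first_pass):
        for index, block in enumerate(blocks):
            if block[1] < t.first_pass:  # previous occupant is finished
                block[0] = max(block[0], t.size)
                block[1] = t.last_pass
                placement[t.name] = index
                break
        else:
            # No idle block: this target needs memory of its own.
            blocks.append([t.size, t.last_pass])
            placement[t.name] = len(blocks) - 1
    return placement, sum(size for size, _ in blocks)
```

With four hypothetical targets of sizes 64, 32, 16 and 64, where the last two are only used after the first two are finished, the sketch needs two blocks totalling 128 units instead of 176.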
  8. Control / CPU performance
  9. CPU perf: Core concepts
     - Descriptor sets
     - Monolithic pipelines
     - Command buffers
  10. CPU perf: Descriptor sets
      - Table with resource references to bind to graphics or compute pipeline
      - Replaces traditional resource stage binding
        - Major performance & flexibility advantage
        - Closer to how the hardware works
      - App managed, lots of strategies possible!
        - Tiny vs huge sets
        - Single vs multiple
        - Static vs semi-static vs dynamic
      - Example 1: Single simple dynamic descriptor set
        - Bind everything you need for a single draw call
        - Close to DX/GL model but share between stages
      - [Diagram legend: Image, Memory, Sampler, Link]
      - [Diagram: Dynamic descriptor set containing VertexBuffer (VS), Texture0 (VS+PS), Constants (VS), Texture1 (PS), Texture2 (PS), Sampler0 (VS+PS)]
  11. CPU perf: Descriptor sets
      - Table with resource references to bind to graphics or compute pipeline
      - Replaces traditional resource stage binding
        - Major performance & flexibility advantage
        - Closer to how the hardware works
      - App managed, lots of strategies possible!
        - Tiny vs huge sets
        - Single vs multiple
        - Static vs semi-static vs dynamic
      - Example 2: Reuse static set with nesting
        - Reduce update time & memory usage
      - [Diagram legend: Image, Memory, Sampler, Link]
      - [Diagram: Static descriptor set and Dynamic descriptor set, with entries Constants (VS), Link, VertexBuffer (VS), Texture0 (VS+PS), Texture1 (PS), Texture2 (PS), Texture3 (PS), Texture4 (PS), Sampler0 (VS+PS), Sampler1 (PS)]
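The nesting idea in Example 2 on slide 11 can be modeled as a table whose slots hold either a resource or a link to another table. The Python sketch below is a conceptual model only: `Resource`, `DescriptorSet`, `resolve`, and `make_dynamic_set` are hypothetical names, and which entries sit in the static set versus the dynamic set is an assumption, since the transcript does not preserve the diagram layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    name: str
    stages: str  # e.g. "VS", "PS", "VS+PS"


@dataclass
class DescriptorSet:
    # Each slot is a Resource or a nested DescriptorSet (a "link").
    slots: list = field(default_factory=list)


def resolve(descriptor_set):
    """Flatten a set, following links, into the resource names a draw would see."""
    names = []
    for slot in descriptor_set.slots:
        if isinstance(slot, DescriptorSet):
            names.extend(resolve(slot))
        else:
            names.append(slot.name)
    return names


# Built once and shared by many draws (assumed split, see the note above).
static_set = DescriptorSet([
    Resource("Texture0", "VS+PS"),
    Resource("Texture1", "PS"),
    Resource("Sampler0", "VS+PS"),
])


def make_dynamic_set(vertex_buffer, constants):
    # Only two resource slots change per draw; the third slot links to static_set.
    return DescriptorSet([
        Resource(vertex_buffer, "VS"),
        Resource(constants, "VS"),
        static_set,
    ])
```

Because every dynamic set links to the same `static_set` object, a per-draw update writes three slots rather than rewriting every texture and sampler, which is the "reduce update time & memory usage" benefit the slide names.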
  12. CPU perf: Monolithic pipelines
      - Shader stages & select graphics state combined into single object
        - No runtime compilation or patching needed!
        - Significantly less runtime overhead to use
      - Supports parallel building & caching
        - Fast loading times
      - Usage & management up to the app
        - Static vs dynamic creation
        - Number of pipelines
        - State usage
      - [Diagram: Pipeline state with stages IA, VS, HS, Tessellator, DS, GS, RS, PS, DB, CB]
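Since pipeline usage and management are left to the app (slide 12), one common approach is a cache keyed by the shader and state combination. The Python sketch below is a generic memoization pattern under that assumption. `PipelineCache` and the `build_pipeline` callback are hypothetical and stand in for whatever expensive creation call an engine would make.

```python
class PipelineCache:
    """Builds each (shaders, state) combination once and reuses the result."""

    def __init__(self, build_pipeline):
        self._build_pipeline = build_pipeline  # expensive creation callback
        self._pipelines = {}
        self.build_count = 0

    def get(self, shaders: tuple, state: tuple):
        key = (shaders, state)
        if key not in self._pipelines:
            # First use of this combination: pay the build cost once.
            self._pipelines[key] = self._build_pipeline(shaders, state)
            self.build_count += 1
        return self._pipelines[key]
```

Filling such a cache at load time corresponds to the "static creation" option on the slide, while filling it lazily on first use corresponds to "dynamic creation".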
  13. CPU perf: Command buffers
      - Issue pipelined graphics & compute commands into a command buffer
        - Bind graphics state, descriptor sets, pipeline
        - Draw calls
        - Render targets
        - Clears
        - Memory transfers
        - NOT: resource mapping
      - Fully independent objects
        - Create multiple every frame
        - Or pre-build up front and reuse
  14. CPU perf: DX/GL parallelism
      - Automatically extracts parallelism out of most apps
      - Doesn’t scale beyond 2-3 cores
      - Additional latency
      - Driver thread often bottleneck; can collide with app threads
      - [Diagram: timeline across CPU 0, CPU 1 and CPU 2 showing Game, Render and Driver work]
  15. CPU perf: Parallel dispatch with Mantle
      - App can go fully wide with its rendering, minimal latency
      - Close to linear scaling with CPU cores
      - No driver threads, no overhead, no contention
      - Frostbite’s approach on all consoles, and on PC with Mantle!
      - [Diagram: CPU 0 runs Game; CPU 1, CPU 2, CPU 3 and CPU 4 each run Render]
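The structure of parallel dispatch on slide 15 (several threads each recording an independent command buffer, then submitting the buffers in order) can be sketched as follows. This is Python for illustration only: `record_command_buffer` and `record_frame` are hypothetical names, a command buffer is modeled as a plain list, and CPython threads will not show the near-linear scaling the slide describes, so the sketch conveys the shape of the approach rather than its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_buffer(draws):
    # Stand-in for recording: one ("draw", id) command per draw call.
    return [("draw", d) for d in draws]


def record_frame(draw_calls, workers=4):
    """Split the frame's draw calls into chunks and record each on its own worker."""
    chunk = max(1, -(-len(draw_calls) // workers))  # ceiling division
    chunks = [draw_calls[i:i + chunk] for i in range(0, len(draw_calls), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so submission order is stable.
        return list(pool.map(record_command_buffer, chunks))
```

Each worker touches only its own buffer, so no locking is needed during recording; ordering is decided once, when the finished buffers are handed over for submission.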
  16. CPU performance / GPU performance
  17. GPU perf: GPU optimizations
      - Thanks to improved CPU performance, CPU will rarely be a bottleneck for the GPU
        - CPU could help GPU more:
        - Less brute force rendering
        - Improve culling
      - Resource states
        - Gives driver a lot more knowledge & flexibility
        - Apps can avoid expensive/redundant transitions, such as surface decompression
      - Shader pipeline object: driver optimizations
        - Can optimize with pipeline state knowledge
        - Can optimize across all shader stages
      - Expose existing GPU functionality
        - Quad & Rect-lists
        - HW-specific MSAA & depth data access
        - Programmable sample patterns
        - And more...
  18. GPU perf: Queues
      - Modern GPUs are heterogeneous machines with multiple engines
        - Graphics pipeline
        - Compute pipeline(s)
        - DMA transfer
        - Video encode/decode
        - More…
      - Mantle exposes queues for the engines + synchronization primitives
      - [Diagram: Graphics, Compute, DMA, ... queues feeding the GPU]
  19. GPU perf: Queues
      - [Diagram: Graphics, Compute, DMA, ... queues feeding the GPU]
  20. GPU perf: Queue use cases
      - Async DMA transfers
        - Copy resources in parallel with graphics or compute
      - [Diagram: DMA and Graphics queues, with labels Copy, Render, Other render, Use copy]
  21. GPU perf: Queue use cases
      - Async DMA transfers
        - Copy resources in parallel with graphics or compute
      - Async compute together with graphics
        - ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units
      - [Diagram: Compute and Graphics queues, with labels GBuffer, Non-shadowed lighting, Shadowmap 0, Shadowmap 1, Final lighting]
  22. GPU perf: Queue use cases
      - Async DMA transfers
        - Copy resources in parallel with graphics or compute
      - Async compute together with graphics
        - ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units
      - Multiple compute kernels collaborating
        - Can be faster than über-kernel
        - Example: Compute geometry backend & compute rasterizer
      - [Diagram: Compute 0, Compute 1 and Graphics queues, with labels Compute Geometry, Compute Rasterizer, Ordinary Rendering]
  23. GPU perf: Queue use cases
      - Async DMA transfers
        - Copy resources in parallel with graphics or compute
      - Async compute together with graphics
        - ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units
      - Multiple compute kernels collaborating
        - Can be faster than über-kernel
        - Example: Compute geometry backend & compute rasterizer
      - Compute as frontend for graphics pipeline
        - Compute runs asynchronously ahead and prepares & optimizes geometry for graphics pipeline
      - Game engines will build large GPU job graphs
        - Move away from single sequential submission
        - Just as we already have done on CPU
      - [Diagram: Compute and Graphics queues, with Process jobs on compute and Draw0, Draw1, Draw2 on graphics]
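The move from single sequential submission to GPU job graphs (slide 23) can be illustrated by grouping jobs into "waves" whose members have no unmet dependencies and could therefore be issued to different queues concurrently. The Python sketch below uses the standard-library `graphlib` module (Python 3.9 or later). The job names are borrowed from the slide 21 diagram, but the dependencies between them are assumptions made for the example, and `schedule_waves` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def schedule_waves(jobs):
    """Group jobs into waves; every job in a wave has all its dependencies done.

    `jobs` maps a job name to the list of jobs it depends on.
    """
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Assumed dependencies, for illustration only.
frame_jobs = {
    "gbuffer": [],
    "shadowmap0": [],
    "shadowmap1": [],
    "non_shadowed_lighting": ["gbuffer"],
    "final_lighting": ["non_shadowed_lighting", "shadowmap0", "shadowmap1"],
}
```

A real scheduler would not wait for a whole wave to finish before starting the next job. It would signal per-job completion with the synchronization primitives the queues expose, but the wave view is enough to show where overlap is possible.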
  24. GPU performance / Programmability
  25. Programmability: Explicit Multi-GPU
      - Explicit control of GPU queues and synchronization, finally!
        - Implement your own Alternate-Frame-Rendering
        - Or something more exotic...
      - Use case: Workstation rendering with 4-8 GPUs
        - Super high-quality rendering & simulation
        - Load balance graphics & compute job graphs across GPUs
        - 20-40 TFlops in a single machine!
      - Use case: Low-latency rendering
        - Important for VR and competitive games
        - Latency optimized GPU job graph scheduling
        - VR: Simultaneously drive 2 GPUs (1 per eye)
  26. Programmability: New mechanisms
      - Command buffer predication & flow control
        - GPU affecting/skipping submitted commands
        - Go beyond DrawIndirect / DispatchIndirect
        - Advanced variable workloads
        - Advanced culling optimizations
      - Write occlusion query results into GPU buffer
        - No CPU roundtrip needed
        - Can drive predicated rendering
        - Or use results directly in shaders (lens flares)
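The combination described on slide 26, occlusion query results written to a GPU buffer and then used to predicate later commands, can be emulated on the CPU to show the control flow. In this Python sketch the "GPU buffer" is a plain list and `execute` is a hypothetical stand-in for the GPU's command processor. None of it reflects actual Mantle calls.

```python
def execute(commands, gpu_buffer):
    """Run commands in order, skipping those whose predicate slot holds zero.

    Each command is (name, predicate_slot); predicate_slot is None for
    unconditional commands, otherwise an index into gpu_buffer.
    """
    executed = []
    for name, predicate_slot in commands:
        if predicate_slot is not None and gpu_buffer[predicate_slot] == 0:
            continue  # skipped without the CPU ever reading the query result
        executed.append(name)
    return executed
```

For example, with query results `[0, 37]` (visible sample counts) and commands `draw_terrain` (unconditional), `draw_statue` (predicated on slot 0) and `draw_lens_flare` (predicated on slot 1), the statue is skipped and the other two run.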
  27. Programmability: Bindless resources
      - Mantle supports bindless resources
        - Shaders can select resources to use instead of static binding from CPU
        - Extension of the descriptor set support
      - Examples
        - Performance optimizations: less data to update
        - Logic & data structures that live fully on the GPU
        - Scene culling & rendering
        - Material representations
      - Key component that will open up a lot of opportunities!
        - Deferred shading
        - Raytracing
  28. Programmability / Platforms
  29. Platforms: Today
      - Mantle gives us strong benefits on Windows today
        - Console-like performance & programmability on both Windows 7 and Windows 8
        - For us, well worth the dev time!
      - DX & GL are the industry standards
        - Needed for platforms that do not support Mantle
        - Needed by devs who do not want/need more control
        - Have to have fallback paths for GL/DX, but should not limit oneself to them
      - Mantle and PlayStation 4 will drive our future Frostbite designs & optimizations
        - PS4 graphics API has great programmability & performance as well
        - Share concepts, methods & optimization strategies
  30. Platforms: Linux & Mac
      - Want to see Mantle on Linux and Mac!
        - Would enable support for our full engine & rendering
        - Significantly easier to do efficient renderer with Mantle than with OpenGL
      - Use cases:
        - Workstations
        - R&D
        - Not limited by WDDM
        - Games
        - Mantle + SteamOS = powerful combination!
  31. Platforms: Mobile
      - Mobile architectures are getting closer in capabilities to desktop GPUs
      - Want graphics API that allows apps to fully utilize the hardware
        - Power efficient
        - High performance
        - Programmable
      - Major opportunity with Mantle: leapfrog GL4, DX11
        - For mobile SoC vendors
        - For Google and Apple
  32. Platforms: Multi-vendor?
      - Mantle is designed to be a thin hardware abstraction
        - Not tied to AMD’s GCN architecture
        - Forward compatible
        - Extensions for architecture- and platform-specific functionality
      - Mantle would be a much more efficient graphics API for other vendors as well
        - Most Mantle functionality can be supported on today’s modern GPUs
      - Want to see future version of Mantle supported on all platforms and on all modern GPUs!
        - Become an active industry standard with IHVs and ISVs collaborating
        - Enable us developers to innovate with great performance & programmability everywhere
  33. Platforms
  34. Frostbite: Battlefield 4
      - Mantle support is in development
        - Core renderer (closer to PS4 than DX11)
        - Implement all rendering techniques used in BF4 (many!)
        - CPU optimizations (parallel dispatch, descriptor sets)
        - GPU optimizations (minimize transitions, MSAA)
        - R&D for advanced GPU optimizations
        - Memory management
        - Multi-GPU support
        - ~2 months of work
      - Update targeting late December
  35. Frostbite: Plants vs Zombies: Garden Warfare
      - Very different rendering compared to BF4
      - Frostbite Mantle renderer will work out of the box
      - Focus on APU performance
  36. Frostbite: Future
      - All Frostbite games designed with Mantle
        - 15 games in development across all of EA
      - Advanced Mantle rendering & use cases
        - Lots of exciting R&D opportunities!
      - Want multi-vendor & multi-platform support!
  37. THE END
      - Email: repi@dice.se
      - Web: http://frostbite.com
      - Twitter: @repi
