Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Technical Director, DICE/Electronic Arts

MANTLE FOR DEVELOPERS
JOHAN ANDERSSON – TECHNICAL DIRECTOR
FROSTBITE
ELECTRONIC ARTS

Mantle?
• Simplify advanced development
• Improve performance
• Enable developers to innovate
• Challenge the status quo

Developer impact areas
• Control
• CPU performance
• GPU performance
• Programmability
• Platforms

Control

New model

Traditional Model: Black Box
• Middle-ground abstraction – compromise between performance & “usability”
• Hidden resource memory & state
• Resource CPU access tied to device context
• Driver analyzes & synchronizes implicitly

Explicit Model: Mantle
• Thin low-level abstraction to expose how the hardware works
• App explicit memory management
• Resources are globally accessible
• App explicit resource state transitions

Control

App responsibility

• Tell the driver when a render target will be used as a texture
‒ And many more resource state transitions

• Don’t destroy resources that the GPU is still using
‒ Keep track with fences or frames

• Manual dynamic resource renaming
‒ No DISCARD for driver-side resource renaming

• Resource memory tiling
• Powerful validation layer will help!
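As a minimal sketch of "keep track with fences", assuming a hypothetical fence value that increases with every submission and that the app reads back once per frame, deferred destruction could look like this. DeferredDeleter and its methods are illustrative names, not Mantle API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical app-side deferred destruction queue. The app, not the driver,
// decides when a resource is safe to destroy: each entry records the fence
// value of the last submission that may still reference the resource.
class DeferredDeleter {
public:
    // Assumes fence values increase monotonically and that entries are queued
    // in non-decreasing fence order (i.e. in submission order).
    void destroyAfter(uint64_t fenceValue, std::function<void()> destroy) {
        pending_.emplace_back(fenceValue, std::move(destroy));
    }

    // Call once per frame with the latest fence value the GPU has completed.
    // Entries are in submission order, so stop at the first one still in flight.
    void collect(uint64_t completedFenceValue) {
        while (!pending_.empty() && pending_.front().first <= completedFenceValue) {
            pending_.front().second();  // actually destroy the resource
            pending_.pop_front();
        }
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::deque<std::pair<uint64_t, std::function<void()>>> pending_;
};
```

Tracking "frames" instead of fences is the same idea with the frame index as the fence value.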

Control

Explicit control enables

• App makes high-level decisions & optimizations
‒ Has full scene information
‒ Easier to optimize performance & memory

• Flexible & efficient memory management
‒ Linear frame allocators
‒ Memory pools
‒ Pinned memory

• Reduced development time
‒ For advanced game engines & apps
‒ Easier to reach target performance & robustness
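A linear frame allocator is the simplest of the memory strategies listed above. The sketch assumes the app already owns one block of GPU-visible memory per frame in flight and only hands out byte offsets into it; LinearFrameAllocator is a hypothetical name, and real code would also map the memory and wait on that frame's fence before calling reset().

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical linear (bump) allocator over one block of memory owned by the
// app for a single frame. Allocating is a pointer bump; "freeing" everything
// is resetting the offset once the GPU has finished with the frame.
class LinearFrameAllocator {
public:
    explicit LinearFrameAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Returns the byte offset of the new allocation, or nullopt when the
    // frame's block is exhausted. `alignment` must be a power of two.
    std::optional<std::size_t> allocate(std::size_t size, std::size_t alignment) {
        std::size_t aligned = (offset_ + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) return std::nullopt;
        offset_ = aligned + size;
        return aligned;
    }

    // Only call after a fence confirms the GPU no longer reads this frame's data.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::size_t capacity_;
    std::size_t offset_ = 0;
};
```

Per-draw constants and other short-lived data fit this pattern well, since nothing is freed individually.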

Control

Explicit control enables

• Transient resources
‒ Alias render targets within a frame
‒ Major memory savings
‒ No need to pre-allocate everything

• Lightweight driver
‒ Easier to develop & maintain
‒ Reduced CPU draw call overhead
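To illustrate render target aliasing, here is a greedy sketch that assigns heap offsets so that transient targets with non-overlapping pass lifetimes share memory. The pass indices, sizes and the aliasTransients() helper are assumptions for illustration; a production allocator would also respect alignment and memory-type requirements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical transient render target: the render passes (inclusive) during
// which it is live, and its size in bytes.
struct TransientTarget {
    int firstPass;
    int lastPass;
    std::size_t size;
    std::size_t offset = 0;  // assigned by aliasTransients()
};

// Greedy aliasing sketch: targets whose lifetimes do not overlap may share
// the same region of one heap. Returns the total heap size required.
std::size_t aliasTransients(std::vector<TransientTarget>& targets) {
    struct Slot { std::size_t offset, size; int busyUntil; };
    std::vector<Slot> slots;
    std::size_t heapSize = 0;

    std::vector<TransientTarget*> order;
    for (auto& t : targets) order.push_back(&t);
    std::sort(order.begin(), order.end(),
              [](const TransientTarget* a, const TransientTarget* b) {
                  return a->firstPass < b->firstPass;
              });

    for (TransientTarget* t : order) {
        Slot* reuse = nullptr;
        for (auto& s : slots) {
            // A slot is free once its previous occupant's last pass is done.
            if (s.busyUntil < t->firstPass && s.size >= t->size) { reuse = &s; break; }
        }
        if (!reuse) {  // no free slot is big enough: grow the heap
            slots.push_back({heapSize, t->size, -1});
            heapSize += t->size;
            reuse = &slots.back();
        }
        t->offset = reuse->offset;
        reuse->busyUntil = t->lastPass;
    }
    return heapSize;
}
```

With targets of 100, 80 and 50 bytes where only the first two have disjoint lifetimes, this needs a 150-byte heap instead of 230 bytes.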

CPU perf

Core concepts
• Descriptor sets
• Monolithic pipelines
• Command buffers

CPU perf

Descriptor sets

• Table with resource references to bind to a graphics or compute pipeline
‒ Entry types: Image, Memory, Sampler, Link

• Replaces traditional resource stage binding
‒ Major performance & flexibility advantage
‒ Closer to how the hardware works

• App managed – lots of strategies possible!
‒ Tiny vs huge sets
‒ Single vs multiple
‒ Static vs semi-static vs dynamic

• Example 1: Single simple dynamic descriptor set
‒ Bind everything you need for a single draw call
‒ Close to the DX/GL model, but entries are shared between stages

Dynamic descriptor set:
  VertexBuffer (VS)
  Texture0 (VS+PS)
  Constants (VS)
  Texture1 (PS)
  Texture2 (PS)
  Sampler0 (VS+PS)
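One way to picture Example 1 is as a plain table in which each entry carries a shader-stage mask. This is a hypothetical CPU-side model (the type and enum names are not Mantle's), meant only to show how one set can feed both VS and PS.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a descriptor set as a plain table. Each slot holds a
// reference to one resource plus the shader stages that read it, so one
// table serves both VS and PS instead of separate per-stage binding calls.
enum class DescriptorType { Image, Memory, Sampler, Link };
enum StageBits : uint32_t { VS = 1u << 0, PS = 1u << 1 };

struct Descriptor {
    DescriptorType type;
    std::string resource;  // stand-in for a real resource handle
    uint32_t stages;       // StageBits mask
};

struct DescriptorSet {
    std::vector<Descriptor> slots;

    // Resources visible to a given stage, in table order.
    std::vector<std::string> visibleTo(StageBits stage) const {
        std::vector<std::string> out;
        for (const auto& d : slots)
            if (d.stages & stage) out.push_back(d.resource);
        return out;
    }
};

// Example 1 from the slide: one dynamic set with everything a draw needs.
DescriptorSet makeExample1() {
    return DescriptorSet{{
        {DescriptorType::Memory,  "VertexBuffer", VS},
        {DescriptorType::Image,   "Texture0",     VS | PS},
        {DescriptorType::Memory,  "Constants",    VS},
        {DescriptorType::Image,   "Texture1",     PS},
        {DescriptorType::Image,   "Texture2",     PS},
        {DescriptorType::Sampler, "Sampler0",     VS | PS},
    }};
}
```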

CPU perf

Descriptor sets

• Example 2: Reuse static set with nesting
‒ Reduce update time & memory usage

Dynamic descriptor set:
  Constants (VS)
  Link (to static descriptor set)

Static descriptor set:
  VertexBuffer (VS)
  Texture0 (VS+PS)
  Texture1 (PS)
  Texture2 (PS)
  Texture3 (PS)
  Texture4 (PS)
  Sampler0 (VS+PS)
  Sampler1 (PS)
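The nesting in Example 2 can be modeled as a slot that points at another set. In this hypothetical sketch only the small dynamic set is rewritten per draw, while the large static set is built once and reached through the Link slot.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical nested descriptor sets: a slot either names a resource or
// links to another set.
struct DescriptorSet;

struct Slot {
    std::string resource;                 // empty when this slot is a link
    const DescriptorSet* link = nullptr;  // non-null for a Link slot
};

struct DescriptorSet {
    std::vector<Slot> slots;
};

// Flatten the resources reachable from `set`, following links depth-first.
// Assumes links form a tree (no cycles).
void collect(const DescriptorSet& set, std::vector<std::string>& out) {
    for (const Slot& s : set.slots) {
        if (s.link) collect(*s.link, out);
        else out.push_back(s.resource);
    }
}
```

Per draw, only the Constants slot of the two-entry dynamic set changes; the eight static entries are never touched again.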

CPU perf

Monolithic pipelines

• Shader stages & select graphics state combined into a single object
‒ No runtime compilation or patching needed!
‒ Significantly less runtime overhead to use

• Supports parallel building & caching
‒ Fast loading times

• Usage & management up to the app
‒ Static vs dynamic creation
‒ Number of pipelines
‒ State usage

Pipeline state: IA, VS, HS, Tessellator, DS, GS, RS, PS, DB, CB
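Because a pipeline bundles all shader stages and the selected state, the app can key a cache on the full description and build each unique pipeline once, at load time. This is a hypothetical cache: a concatenated string stands in for a real hash, and Pipeline stands in for a compiled GPU object. The mutex lets several loader threads call getOrBuild() concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical description of everything baked into one monolithic pipeline.
struct PipelineDesc {
    std::string vs, hs, ds, gs, ps;  // shader identifiers (empty = stage unused)
    std::string rasterState, depthState, blendState;

    std::string key() const {
        return vs + '|' + hs + '|' + ds + '|' + gs + '|' + ps + '|' +
               rasterState + '|' + depthState + '|' + blendState;
    }
};

struct Pipeline { std::string key; };  // stand-in for a compiled pipeline

class PipelineCache {
public:
    std::shared_ptr<const Pipeline> getOrBuild(const PipelineDesc& desc) {
        const std::string k = desc.key();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = cache_.find(k);
            if (it != cache_.end()) return it->second;
        }
        // The expensive build runs outside the lock so other threads can
        // build different pipelines in parallel.
        std::shared_ptr<const Pipeline> built = std::make_shared<Pipeline>(Pipeline{k});
        std::lock_guard<std::mutex> lock(mutex_);
        // If another thread built the same pipeline meanwhile, keep the first one.
        return cache_.emplace(k, built).first->second;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return cache_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, std::shared_ptr<const Pipeline>> cache_;
};
```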

CPU perf

Command buffers

• Issue pipelined graphics & compute commands into a command buffer
‒ Bind graphics state, descriptor sets, pipeline
‒ Draw calls
‒ Render targets
‒ Clears
‒ Memory transfers
‒ NOT: resource mapping

• Fully independent objects
‒ Create multiple every frame
‒ Or pre-build up front and reuse
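A command buffer as an independent, reusable object might be modeled like this. The command types, CommandBuffer and Queue are hypothetical stand-ins, with a vertex counter in place of real GPU execution; the point is that a pre-built buffer can be submitted every frame without re-recording.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical command encoding. Each command buffer owns its recorded
// commands and has no tie to a device context.
struct BindPipeline      { std::string name; };
struct BindDescriptorSet { std::string name; };
struct Draw              { uint32_t vertexCount; };
struct Clear             { float r, g, b, a; };
using Command = std::variant<BindPipeline, BindDescriptorSet, Draw, Clear>;

class CommandBuffer {
public:
    CommandBuffer& bindPipeline(std::string n) {
        cmds_.push_back(BindPipeline{std::move(n)});
        return *this;
    }
    CommandBuffer& bindDescriptorSet(std::string n) {
        cmds_.push_back(BindDescriptorSet{std::move(n)});
        return *this;
    }
    CommandBuffer& draw(uint32_t count) {
        cmds_.push_back(Draw{count});
        return *this;
    }
    CommandBuffer& clear(float r, float g, float b, float a) {
        cmds_.push_back(Clear{r, g, b, a});
        return *this;
    }
    const std::vector<Command>& commands() const { return cmds_; }

private:
    std::vector<Command> cmds_;
};

// Stand-in for a GPU queue: "executes" a buffer by tallying vertices drawn.
struct Queue {
    uint64_t verticesDrawn = 0;
    void submit(const CommandBuffer& cb) {
        for (const Command& c : cb.commands())
            if (const Draw* d = std::get_if<Draw>(&c)) verticesDrawn += d->vertexCount;
    }
};
```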

CPU perf

DX/GL parallelism

Diagram: Game, Render and Driver work spread over CPU 0–2

• Automatically extracts parallelism out of most apps
• Doesn’t scale beyond 2–3 cores
• Additional latency
• Driver thread is often the bottleneck – can contend with app threads

CPU perf

Parallel dispatch with Mantle

Diagram: CPU 0 runs Game; CPUs 1–4 each run Render

• App can go fully wide with its rendering – minimal latency
• Close to linear scaling with CPU cores
• No driver threads – no overhead – no contention
• Frostbite’s approach on all consoles – and on PC with Mantle!
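The parallel dispatch model boils down to this: each worker thread records its own command buffer, with no shared driver state to lock, and one thread submits the buffers in a fixed order. A minimal sketch under those assumptions, with a hypothetical CommandBuffer that only records draw IDs:

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-thread command buffer: just the draw IDs recorded.
struct CommandBuffer { std::vector<int> draws; };

// Split `drawCount` draws across `threadCount` (> 0) workers. Each worker
// records into its own command buffer, so no locks are needed while
// recording. Submission order stays deterministic because buffers are
// gathered by index, not by completion time.
std::vector<int> recordInParallel(int drawCount, unsigned threadCount) {
    std::vector<CommandBuffer> buffers(threadCount);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&buffers, t, drawCount, threadCount] {
            // Contiguous slice [begin, end) of the draws for this worker.
            int begin = static_cast<int>(static_cast<long long>(drawCount) * t / threadCount);
            int end   = static_cast<int>(static_cast<long long>(drawCount) * (t + 1) / threadCount);
            for (int i = begin; i < end; ++i) buffers[t].draws.push_back(i);
        });
    }
    for (auto& w : workers) w.join();

    // "Submit": concatenating in index order mimics one ordered queue submission.
    std::vector<int> submitted;
    for (const auto& cb : buffers)
        submitted.insert(submitted.end(), cb.draws.begin(), cb.draws.end());
    return submitted;
}
```

Each worker writes only to its own buffer, so the final join is the only synchronization point.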

GPU performance

GPU perf

GPU optimizations

• Thanks to improved CPU performance, the CPU will rarely be a bottleneck for the GPU
‒ The CPU could help the GPU more:
‒ Less brute-force rendering
‒ Improved culling

• Resource states
‒ Give the driver a lot more knowledge & flexibility
‒ Apps can avoid expensive/redundant transitions, such as surface decompression

• Shader pipeline object – driver optimizations
‒ Can optimize with pipeline state knowledge
‒ Can optimize across all shader stages

• Expose existing GPU functionality
‒ Quad & rect lists
‒ HW-specific MSAA & depth data access
‒ Programmable sample patterns
‒ And more...
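App-tracked resource states could be sketched as a tracker that emits a transition only when a resource's next use actually needs a different state. The State values and the StateTracker class are hypothetical; real transitions would be recorded into a command buffer.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical resource states.
enum class State { RenderTarget, ShaderRead, DepthWrite, DepthRead, CopySource };

struct Transition { std::string resource; State from, to; };

// App-side tracker: since the app knows how each resource is used next, it
// emits a transition only when the state really changes, skipping redundant
// (and possibly expensive, e.g. decompressing) transitions.
class StateTracker {
public:
    void require(const std::string& resource, State needed) {
        auto it = current_.find(resource);
        if (it == current_.end()) { current_[resource] = needed; return; }  // initial state
        if (it->second == needed) return;  // already in that state: nothing to do
        pending_.push_back({resource, it->second, needed});
        it->second = needed;
    }

    // Transitions to record into the command buffer before the next work.
    std::vector<Transition> flush() {
        auto out = std::move(pending_);
        pending_.clear();
        return out;
    }

private:
    std::unordered_map<std::string, State> current_;
    std::vector<Transition> pending_;
};
```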

GPU perf

Queues

• Modern GPUs are heterogeneous machines with multiple engines
‒ Graphics pipeline
‒ Compute pipeline(s)
‒ DMA transfer
‒ Video encode/decode
‒ More...

• Mantle exposes queues for the engines + synchronization primitives

Diagram: Graphics, Compute, DMA, ... queues feeding the GPU

GPU perf

Queue use cases

• Async DMA transfers
‒ Copy resources in parallel with graphics or compute

Diagram: the DMA queue runs Copy while the Graphics queue runs Render and Other render, then Use copy
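The async DMA case depends on cross-queue synchronization. In this sketch, threads stand in for the DMA and graphics engines, and a hypothetical QueueSemaphore (built on std::condition_variable, not a Mantle object) stands in for the queue synchronization primitive. The graphics side waits only right before it uses the copied data.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical cross-queue semaphore: one queue signals, another waits.
class QueueSemaphore {
public:
    void signal() {
        { std::lock_guard<std::mutex> l(m_); ++count_; }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return count_ > 0; });
        --count_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Graphics does independent work first and only waits where it consumes the copy.
std::vector<std::string> runFrame() {
    std::vector<std::string> log;
    std::mutex logMutex;
    auto record = [&](const std::string& s) {
        std::lock_guard<std::mutex> l(logMutex);
        log.push_back(s);
    };
    QueueSemaphore copyDone;

    std::thread dma([&] { record("Copy"); copyDone.signal(); });
    std::thread graphics([&] {
        record("Render");
        record("Other render");
        copyDone.wait();  // the only synchronization point
        record("Use copy");
    });
    dma.join();
    graphics.join();
    return log;
}
```

The relative order of Copy, Render and Other render varies from run to run; only Use copy is guaranteed to come after Copy.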

GPU perf

Queue use cases

• Async compute together with graphics
‒ ALU-heavy compute work at the same time as memory/ROP-bound work to utilize idle units

Diagram: the Compute queue runs Non-shadowed lighting while the Graphics queue runs GBuffer, Shadowmap 0 and Shadowmap 1, then Final lighting

GPU perf

Queue use cases

• Multiple compute kernels collaborating
‒ Can be faster than an über-kernel
‒ Example: Compute geometry backend & compute rasterizer

Diagram: the Compute 0 queue runs Compute Geometry, Compute 1 runs Compute Rasterizer, Graphics runs Ordinary Rendering

GPU perf

Queue use cases

• Compute as frontend for graphics pipeline
‒ Compute runs asynchronously ahead and prepares & optimizes geometry for the graphics pipeline

• Game engines will build large GPU job graphs
‒ Move away from single sequential submission
‒ Just as we already have done on CPU

Diagram: GPU job graph with Process0, Process1, Draw0, Draw1 and Draw2
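A GPU job graph can be scheduled with an ordinary topological sort (Kahn's algorithm here). The dependencies between Process0/Process1 and Draw0 to Draw2 used in the example are assumed for illustration, since the diagram's edges do not survive in this text. In a real engine, edges that cross queues would become semaphore waits.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical GPU job graph node: which queue runs it and what it waits on.
struct Job {
    std::string name;
    std::string queue;              // e.g. "compute" or "graphics"
    std::vector<std::size_t> deps;  // indices of jobs that must finish first
};

// Kahn's algorithm: produce a submission order that respects every edge.
// Jobs on different queues with no path between them may overlap on the GPU.
std::vector<std::size_t> scheduleJobs(const std::vector<Job>& jobs) {
    std::vector<std::size_t> indegree(jobs.size(), 0);
    std::vector<std::vector<std::size_t>> users(jobs.size());
    for (std::size_t i = 0; i < jobs.size(); ++i)
        for (std::size_t d : jobs[i].deps) { ++indegree[i]; users[d].push_back(i); }

    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < jobs.size(); ++i)
        if (indegree[i] == 0) ready.push(i);

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        std::size_t j = ready.front();
        ready.pop();
        order.push_back(j);
        for (std::size_t u : users[j])
            if (--indegree[u] == 0) ready.push(u);
    }
    // order.size() < jobs.size() means the graph contained a cycle.
    return order;
}
```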

Programmability

Programmability

Explicit Multi-GPU

• Explicit control of GPU queues and synchronization, finally!
‒ Implement your own Alternate-Frame-Rendering
‒ Or something more exotic...

• Use case: Workstation rendering with 4–8 GPUs
‒ Super high-quality rendering & simulation
‒ Load balance graphics & compute job graphs across GPUs
‒ 20–40 TFlops in a single machine!

• Use case: Low-latency rendering
‒ Important for VR and competitive games
‒ Latency-optimized GPU job graph scheduling
‒ VR: Simultaneously drive 2 GPUs (1 per eye)
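With explicit multi-GPU control, the frame-to-GPU assignment becomes a few lines of app code rather than a driver decision. Both policies below are hypothetical sketches: classic round-robin Alternate-Frame-Rendering, and the VR case from the slide where each eye gets its own GPU so both halves of the same frame render at once.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical AFR policy: frame N goes to GPU N mod gpuCount (gpuCount > 0).
uint32_t afrGpuForFrame(uint64_t frameIndex, uint32_t gpuCount) {
    return static_cast<uint32_t>(frameIndex % gpuCount);
}

// Hypothetical VR policy: with at least two GPUs, left eye on GPU 0 and
// right eye on GPU 1 for every frame; fall back to GPU 0 otherwise.
enum class Eye { Left = 0, Right = 1 };

uint32_t vrGpuForEye(Eye eye, uint32_t gpuCount) {
    return gpuCount >= 2 ? static_cast<uint32_t>(eye) : 0u;
}
```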

Programmability

New mechanisms

• Command buffer predication & flow control
‒ GPU affecting/skipping submitted commands
‒ Go beyond DrawIndirect / DispatchIndirect
‒ Advanced variable workloads
‒ Advanced culling optimizations

• Write occlusion query results into a GPU buffer
‒ No CPU roundtrip needed
‒ Can drive predicated rendering
‒ Or use results directly in shaders (lens flares)
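Predication driven by GPU-written occlusion results can be simulated on the CPU to show the data flow. PredicatedDraw and execute() are hypothetical; on real hardware the skip decision is made by the GPU while it processes an already submitted command buffer, so no CPU roundtrip is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical command with an optional predicate: the index of a slot in a
// GPU-side buffer that an earlier occlusion query wrote its result into.
struct PredicatedDraw {
    std::string name;
    std::optional<std::size_t> predicateSlot;  // nullopt = always execute
};

// Stand-in for the GPU front end walking a submitted command buffer. The
// skip decision reads memory the GPU itself wrote (samples passed per query).
std::vector<std::string> execute(const std::vector<PredicatedDraw>& cmds,
                                 const std::vector<uint64_t>& queryResults) {
    std::vector<std::string> executed;
    for (const auto& c : cmds) {
        if (c.predicateSlot && queryResults.at(*c.predicateSlot) == 0) continue;  // occluded
        executed.push_back(c.name);
    }
    return executed;
}
```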

Programmability

Bindless resources

• Mantle supports bindless resources
‒ Shaders can select which resources to use instead of static binding from the CPU
‒ Extension of the descriptor set support

• Examples
‒ Performance optimizations – less data to update
‒ Logic & data structures that live fully on the GPU
‒ Scene culling & rendering
‒ Material representations
‒ Deferred shading
‒ Raytracing

• Key component that will open up a lot of opportunities!
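Bindless selection can be sketched as shader code reading an index from material data and looking the resource up in one large table that is bound once. Here a plain function stands in for the shader and a color value stands in for a texture; all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bindless model: per-object data carries only indices into one
// large resource table, so switching materials means writing an integer
// instead of rebinding descriptors from the CPU.
struct Texture { uint32_t averageColor; };  // stand-in for texel data

struct MaterialGpuData { uint32_t albedoIndex; uint32_t normalIndex; };

// Stand-in for shader code: the resource is chosen by index at run time.
uint32_t shadeAlbedo(const std::vector<Texture>& textureTable,
                     const std::vector<MaterialGpuData>& materials,
                     uint32_t materialId) {
    const MaterialGpuData& m = materials.at(materialId);
    return textureTable.at(m.albedoIndex).averageColor;
}
```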

Platforms

Today

• Mantle gives us strong benefits on Windows today
‒ Console-like performance & programmability on both Windows 7 and Windows 8
‒ For us, well worth the dev time!

• DX & GL are the industry standards
‒ Needed for platforms that do not support Mantle
‒ Needed by devs who do not want/need more control
‒ Need fallback paths for GL/DX, but shouldn’t limit ourselves to them

• Mantle and PlayStation 4 will drive our future Frostbite designs & optimizations
‒ PS4 graphics API has great programmability & performance as well
‒ Share concepts, methods & optimization strategies

Platforms

Linux & Mac

• Want to see Mantle on Linux and Mac!
‒ Would enable support for our full engine & rendering
‒ Significantly easier to build an efficient renderer with Mantle than with OpenGL

• Use cases:
‒ Workstations
‒ R&D
‒ Not limited by WDDM
‒ Games
‒ Mantle + SteamOS = powerful combination!

Platforms

Mobile

• Mobile architectures are getting closer in capabilities to desktop GPUs
• Want a graphics API that allows apps to fully utilize the hardware
‒ Power efficient
‒ High performance
‒ Programmable

• Major opportunity with Mantle – leapfrog GL4, DX11
‒ For mobile SoC vendors
‒ For Google and Apple

Platforms

Multi-vendor?

• Mantle is designed to be a thin hardware abstraction
‒ Not tied to AMD’s GCN architecture
‒ Forward compatible
‒ Extensions for architecture- and platform-specific functionality

• Mantle would be a much more efficient graphics API for other vendors as well
‒ Most Mantle functionality can be supported on today’s modern GPUs

• Want to see a future version of Mantle supported on all platforms and on all modern GPUs!
‒ Become an active industry standard with IHVs and ISVs collaborating
‒ Enable us developers to innovate with great performance & programmability everywhere

Frostbite

Battlefield 4

• Mantle support is in development
‒ Core renderer (closer to PS4 than DX11)
‒ Implement all rendering techniques used in BF4 (many!)
‒ CPU optimizations (parallel dispatch, descriptor sets)
‒ GPU optimizations (minimize transitions, MSAA)
‒ R&D for advanced GPU optimizations
‒ Memory management
‒ Multi-GPU support
‒ ~2 months of work

• Update targeting late December

Frostbite

Plants vs Zombies: Garden Warfare
• Very different rendering compared to BF4
• Frostbite Mantle renderer will work out of the box
• Focus on APU performance

Frostbite

Future
• All Frostbite games designed with Mantle
‒ 15 games in development across all of EA

• Advanced Mantle rendering & use cases
‒ Lots of exciting R&D opportunities!

• Want multi-vendor & multi-platform support!

Email: repi@dice.se
Web: http://frostbite.com
Twitter: @repi

THE END
