DX12 & Vulkan
Dawn of a New Generation of
Graphics APIs
Stephan Hodes
Developer Technology Engineer, AMD
Agenda
Why do we need new APIs?
What's new?
• Parallelism
• Explicit Memory Management
• PSOs and Descriptor Sets
Best Practices
Evolution of 3D Graphics APIs
• Software rasterization
• 1996 Glide: Bilinear filtering + Transparency
• 1998 DirectX 6: IHV independent + Multitexturing
• 1999 DirectX 7: Hardware Transform & Lighting + Cube Maps
• 2000 DirectX 8: Programmable shaders + Tessellation
• 2002 DirectX 9: More complex shaders
• 2006 DirectX 10: Unified shader Model
• 2009 DirectX 11: Compute
[Chart: GPU performance (GFLOPS) vs. single-core CPU performance (10 MIPS), 2006 to 2015; y-axis from 0 to 10000]
GPU performance increasing faster than single core CPU performance
• Allow multi-threaded command buffer recording
• Reduce workload of runtime/driver
• Reduce runtime validation
• Move work to init/load time (e.g. Pipeline setup)
• More explicit control over the hardware
• Reduce convenience functions
How to get GPU bound
• Optimize API usage (e.g. state caching & sorting)
• Batch, Batch, Batch!!!
• ~10k draw calls max
New APIs
Convenience function: Resource renaming
Examples:
• Backbuffer
• Dynamic vertex buffer
• Dynamic constant buffer
• Descriptor sets
• Command buffer
[Diagram: CPU, GPU and resource lifetime timelines across Frame N and Frame N+1]
Track usage by End-Of-Frame-Fence
• Fences are expensive
• Use fewer than 10 fences per frame
Best practice for constant buffers:
• Use system memory (DX12: UPLOAD)
• Keep it mapped
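The renaming that older drivers did behind the scenes has to be done by hand in the new APIs. The following is a minimal CPU-side sketch of one way to do it, assuming a persistently mapped buffer split into one slice per frame in flight, with a single end-of-frame fence value guarding each slice. `FrameRing` and its methods are hypothetical names, and the plain integers stand in for real fence objects; nothing here is a DX12 or Vulkan call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of manual resource renaming for dynamic constant data.
// A slice is reused only after the GPU has passed the fence that ended the
// frame which last wrote it.
class FrameRing {
public:
    FrameRing(std::size_t framesInFlight, std::size_t bytesPerFrame)
        : storage_(framesInFlight * bytesPerFrame),
          fenceValues_(framesInFlight, 0),
          bytesPerFrame_(bytesPerFrame) {}

    // True if the slice for frameIndex is free, i.e. the GPU has completed
    // the fence value recorded when that slice was last used.
    bool canBegin(std::uint64_t frameIndex, std::uint64_t completedFence) const {
        return fenceValues_[frameIndex % fenceValues_.size()] <= completedFence;
    }

    // Starts writing into the slice of frameIndex. The caller is expected to
    // have waited until canBegin() returned true.
    void beginFrame(std::uint64_t frameIndex) {
        current_ = frameIndex % fenceValues_.size();
        offset_ = 0;
    }

    // Sub-allocates from the current slice, rounded up to 256 bytes (the
    // DX12 constant buffer placement alignment). Returns nullptr when full.
    std::uint8_t* allocate(std::size_t bytes) {
        const std::size_t aligned = (bytes + 255) & ~std::size_t(255);
        if (offset_ + aligned > bytesPerFrame_) return nullptr;
        std::uint8_t* p = storage_.data() + current_ * bytesPerFrame_ + offset_;
        offset_ += aligned;
        return p;
    }

    // Records the one end-of-frame fence value that protects the slice.
    void endFrame(std::uint64_t signaledFence) { fenceValues_[current_] = signaledFence; }

private:
    std::vector<std::uint8_t> storage_;       // stands in for mapped UPLOAD memory
    std::vector<std::uint64_t> fenceValues_;  // last fence value per slice
    std::size_t bytesPerFrame_;
    std::size_t current_ = 0;
    std::size_t offset_ = 0;
};
```

One fence per frame covers every allocation made in that frame, which keeps the fence count far below the per-frame limit suggested above.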
Adopted by several developers & titles
• Developers are willing to do the additional work
• Significant performance improvements in games
• Good ISVs don't need runtime validation
Only available on AMD GCN hardware
Needed standardization
2013: AMD Mantle
"Mantle is not for everyone"
• It's a "just the right level API"
• Support different HW configurations
• Discrete GPU vs. Integrated
• Shaders & command buffer are HW specific
• Support different HW generations
• Think about future hardware
• On PC, your title is never alone
2013: AMD Mantle
"Mantle is not a low level API"
Next Generation API features
DirectX12 & Vulkan share the Mantle philosophy:
• Minimize overhead
• Minimize runtime validation
• Allow multithreaded command buffer recording
• Provide low level memory management
• Support multiple asynchronous queues
• Provide explicit access to multiple devices
The big question: "How much performance will I gain from porting my code to new APIs?"
No magic involved!
Depends on the current bottlenecks
Depends a lot on engine design
• Need to utilize new possibilities
• It might "just work" (esp. if heavily CPU limited)
• Might need redesign of engine (and asset pipeline)
[Diagram 1: Game Engine 1.0 (designed around API commands) with its render passes (Shadowmaps, Gbuffer, AOGI, Lighting and Shading, Transparent Object Rendering, Particle Update, Post Processing) on top of the DirectX 11 Runtime (a lot of validation), the DirectX 11 Driver (a lot of optimization) and the Hardware]
[Diagram 2: the same Game Engine 1.0 with an API Abstraction Layer that targets the DirectX 11 Runtime and Driver as well as, with direct access, the DX12 Runtime and DX12 Driver]
[Diagram 3: Game Engine 2.0 with direct access to the DX12 Runtime and DX12 Driver, shown next to Game Engine 1.0 on the DirectX 11 Runtime (a lot of validation) and DirectX 11 Driver (a lot of convenience)]
Think Parallel!
Keep the GPU busy
CPU side multithreading
• Multi-threaded command buffer building
• Submission to queue is not thread safe
• Split frame into macro render jobs
• Offload shader compilation from main thread
• Batch command buffer submission
• Don't stall during submit/present
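The CPU-side pattern above can be sketched without any graphics API: each macro render job records into its own command buffer on its own thread, and a single thread then submits everything as one batch. This is an illustration only. `CommandBuffer`, `recordJob` and `recordAndSubmit` are hypothetical names, and the vector of strings stands in for a real GPU queue.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in: a "command buffer" is just a list of command names.
using CommandBuffer = std::vector<std::string>;

// Records one macro render job (e.g. shadow maps, G-buffer) into its own buffer.
inline CommandBuffer recordJob(const std::string& jobName, int drawCount) {
    CommandBuffer cb;
    for (int i = 0; i < drawCount; ++i)
        cb.push_back(jobName + ":draw" + std::to_string(i));
    return cb;
}

// Records all jobs in parallel, one thread per job, each into a private
// buffer. Submission happens afterwards on the calling thread only, in job
// order and as one batch, because queue submission is not thread safe.
inline std::vector<std::string> recordAndSubmit(const std::vector<std::string>& jobs,
                                                int drawsPerJob) {
    std::vector<CommandBuffer> buffers(jobs.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < jobs.size(); ++i)
        workers.emplace_back([&buffers, &jobs, i, drawsPerJob] {
            buffers[i] = recordJob(jobs[i], drawsPerJob);  // each thread writes only its own slot
        });
    for (auto& w : workers) w.join();

    std::vector<std::string> queue;  // stands in for the GPU queue
    for (const auto& cb : buffers)
        queue.insert(queue.end(), cb.begin(), cb.end());
    return queue;
}
```

A real engine would more likely feed these jobs to a thread pool than spawn a thread per job; the point of the sketch is only the split between parallel recording and serialized, batched submission.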
                 Radeon Fury X    Radeon Fury
Compute Units    64               56
Core             1050 MHz         1000 MHz
Memory size      4 GB             4 GB
Memory BW        512 GB/s         512 GB/s
ALU              8.6 TFlops       7.17 TFlops
64 CU
x 4 SIMD per CU
x 10 Wavefronts per SIMD
x 64 Threads per Wavefront
______________________
Up to 163840 threads
Batch, Batch, Batch!!!
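The thread count above, and the ALU figures in the table, follow from simple products. The sketch below reproduces them. The struct and function names are hypothetical, and the peak-rate formula assumes 16 lanes per SIMD and one fused multiply-add (2 FLOPs) per lane per clock, which is not stated on the slide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a GCN-style GPU configuration.
struct GcnConfig {
    std::uint32_t computeUnits;
    std::uint32_t simdsPerCu;
    std::uint32_t wavefrontsPerSimd;
    std::uint32_t threadsPerWavefront;
    std::uint32_t coreClockMHz;
};

// Maximum number of threads that can be resident on the GPU at once.
constexpr std::uint64_t maxThreadsInFlight(const GcnConfig& c) {
    return std::uint64_t(c.computeUnits) * c.simdsPerCu *
           c.wavefrontsPerSimd * c.threadsPerWavefront;
}

// Peak rate in MFLOPS. Assumption: 16 lanes per SIMD, 2 FLOPs per lane per clock.
constexpr std::uint64_t peakMflops(const GcnConfig& c) {
    return std::uint64_t(c.computeUnits) * c.simdsPerCu * 16 * 2 * c.coreClockMHz;
}

constexpr GcnConfig kFuryX{64, 4, 10, 64, 1050};
constexpr GcnConfig kFury{56, 4, 10, 64, 1000};
```

With these inputs, `maxThreadsInFlight(kFuryX)` gives 163840, and `peakMflops` gives 8601600 and 7168000, matching the 8.6 and 7.17 TFlops in the table. Filling that many thread slots takes large batches of work, hence the advice to batch.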
GPU side multithreading
GPU: Single graphics queue
Multiple commands can execute in parallel
• Pipeline (usually) must maintain pixel order
• Load balancing is the main problem
[Diagram: timeline over the pipeline stages IA, VS, Rasterizer, PS and Output, showing Culling followed by Draw N, Draw N+1 and Draw N+2 overlapping across the stages, then a Time Query]
• Indicate RaW/WaW Hazards
• Switch resource state between RO/RW/WO
• Decompress DepthStencil/RTs
• May cause a stall or cache flush
• Batch them!
• Split Barriers may help in the future
• Always execute them on the last queue that
wrote the resource
Most common cause for bugs!
Explicit Barriers & Transitions
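The advice to batch barriers can be sketched as a small helper that collects transitions and issues them together. This is an illustration under stated assumptions: `BarrierBatcher`, `Transition` and the reduced `ResourceState` enum are hypothetical, and the counter stands in for the single API call (for example `ResourceBarrier` in DX12 or `vkCmdPipelineBarrier` in Vulkan, both of which accept several barriers at once) that real code would make in `flush`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reduced, hypothetical set of resource states for illustration.
enum class ResourceState { RenderTarget, ShaderRead, UnorderedAccess, DepthWrite, CopyDest };

struct Transition {
    int resourceId;
    ResourceState before;
    ResourceState after;
};

class BarrierBatcher {
public:
    // Queues a transition. Redundant ones (before == after) are dropped.
    void transition(int resourceId, ResourceState before, ResourceState after) {
        if (before != after) pending_.push_back(Transition{resourceId, before, after});
    }

    // Issues all pending transitions as one barrier call and returns how many
    // transitions were in the batch. Real code would pass the whole array to
    // the API here instead of bumping a counter.
    std::size_t flush() {
        const std::size_t count = pending_.size();
        if (count > 0) ++barrierCalls_;
        pending_.clear();
        return count;
    }

    std::size_t barrierCalls() const { return barrierCalls_; }

private:
    std::vector<Transition> pending_;
    std::size_t barrierCalls_ = 0;
};
```

Calling `flush` once before the pass that consumes the resources turns several potential stalls or cache flushes into one.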
GPU: Barriers
• Hard to detect Barriers in DX11
• Explicit in DX12
[Diagram: timeline over the pipeline stages IA, VS, Rasterizer, PS and Output, showing Culling, then Draw N, Draw N+1 and Draw N+2 separated by Barriers, and a Time Query]
GPU: Barriers
• Batch them!
• [DX12] In the future split barriers may help
[Diagram: timeline over the pipeline stages IA, VS, Rasterizer, PS and Output, showing Culling, Draw N, Draw N+1 and Draw N+2 with one combined "Double Barrier", and a Time Query]
GPU underutilization
Culling can cause bubbles of inactivity.
Fetch latency is a common cause for underutilization
[Diagram: timeline over the pipeline stages IA, VS, Rasterizer, PS and Output, showing Culling, Draw N and a Time Query]
Multiple Queues
Queue types: Graphics, Compute, Copy
• Let the driver know about independent workloads
• Each queue type is a superset of the next: Graphics can also run Compute and Copy work, Compute can also run Copy work
• Multiple queues per type
• Specify type at record time
• Parallel execution
• Sync using fences
• Shared GPU resources
Multiple queues let you specify tasks that can execute in parallel
Schedule different bottlenecks together to improve efficiency
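The superset relation between queue types can be written down in a few lines. This sketch follows the DX12 model, where a graphics queue accepts compute and copy work and a compute queue accepts copy work. The enums and function names are hypothetical and not part of either API.

```cpp
#include <cassert>

// Ordered so that a higher value means a more capable queue.
enum class QueueType { Copy = 0, Compute = 1, Graphics = 2 };
enum class WorkType { Copy = 0, Compute = 1, Graphics = 2 };

// A queue can execute work of its own kind and of every "smaller" kind.
constexpr bool canExecute(QueueType queue, WorkType work) {
    return static_cast<int>(queue) >= static_cast<int>(work);
}

// Prefers the most specialized queue, leaving the graphics queue free for
// work that only it can run.
constexpr QueueType preferredQueue(WorkType work) {
    return static_cast<QueueType>(static_cast<int>(work));
}
```

Routing uploads to a copy queue and independent simulation work to a compute queue is what tells the driver these workloads may overlap with rendering; fences are still needed wherever one queue consumes another queue's output.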
Asynchronous Compute
Bus dominated
• Shadow mapping
• ROP heavy workloads
• G buffer operations
• DMA operations
  - Texture upload
  - Heap defrag
Shader throughput
• Deferred lighting
• Postprocessing effects
• Most compute tasks
  - Texture compression
  - Physics
  - Simulations
Geometry dominated
• Rendering highly detailed models
DirectX 11 only supports one device
• CF/SLI support is essentially a driver hack
• Increases latency
Explicit MGPU allows
• Split frame rendering
• Master/Slave configurations
• 3D/VR rendering using 2 dGPUs
Explicit MGPU
Take Control!
Explicit Memory Management
Explicit Memory Management
• Control over heaps and residency
• Abstraction for different architectures
• VMM still exists
• Use Evict/MakeResident to page out
unused resources
• Avoid oversubscribing resident memory!
DEFAULT
• Memory pool: Local (dGPU), System (iGPU)
• CPU properties: No CPU access
• Usage: Frequent GPU Read/Write, Max GPU Bandwidth
UPLOAD
• Memory pool: System
• CPU properties: Write Combine
• Usage: CPU Write Once, GPU Read Once, Max CPU Write Bandwidth
READBACK
• Memory pool: System
• CPU properties: Write Back
• Usage: GPU Write Once, CPU Read, Max CPU Read Bandwidth
Explicit Memory Management
Rendertargets & UAVs
• Create in DEFAULT
Textures
• Write to UPLOAD
• Use copy queue to copy to DEFAULT
• Copy swizzles: required on iGPU!
Buffers (CB/VB/IB)
• Placement dependent on usage:
• Write once/Read once => UPLOAD
• Write once/Read many => Copy to DEFAULT
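The placement rules above can be condensed into a small decision function. This is a sketch of the slide's guidance, not an API: `Usage`, `chooseHeap` and `needsStagingCopy` are hypothetical names, and only the three heap types named in the deck are modeled.

```cpp
#include <cassert>

enum class HeapType { Default, Upload, Readback };
enum class Writer { Cpu, Gpu };

// Hypothetical description of how a resource is accessed.
struct Usage {
    Writer writer;           // who produces the data
    bool cpuReads;           // does the CPU read the result back?
    bool gpuReadsManyTimes;  // is the data read by the GPU more than once?
};

// Picks the heap in which the resource should finally live.
constexpr HeapType chooseHeap(const Usage& u) {
    return u.writer == Writer::Gpu
               ? (u.cpuReads ? HeapType::Readback : HeapType::Default)
               : (u.gpuReadsManyTimes ? HeapType::Default : HeapType::Upload);
}

// CPU-written data that ends up in DEFAULT is first written to UPLOAD and
// then copied over, ideally on a copy queue.
constexpr bool needsStagingCopy(const Usage& u) {
    return u.writer == Writer::Cpu && u.gpuReadsManyTimes;
}
```

For example, a per-draw constant buffer (CPU writes once, GPU reads once) stays in UPLOAD, while a static vertex buffer or a texture is staged through UPLOAD and copied to DEFAULT.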
Direct3D 12 Resource Creation APIs
[Diagram: the three resource creation modes, Committed, Placed and Reserved, showing how resources (Texture2D, Texture3D, Buffer), resource heaps, GPU VA ranges and physical pages relate in each mode]
Don't over-allocate committed memory
• Share L1 (local video memory) with Windows and other processes
• Don't allocate more than 80%
• Reduce memory footprint
• Use placed resources to reduce overhead
• Use reserved resources as PRT (partially resident textures)
Allocate most important resources first
Group resources used together in same heap
• Use MakeResident/Evict
Explicit Memory Management
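The 80% guideline can be enforced with a small bookkeeping class on the application side. The sketch below is one possible reading of the advice, assuming the 80% refers to the local memory size reported for the adapter. `ResidencyBudget` and its methods are hypothetical; a real engine would pair the bookkeeping with the actual MakeResident/Evict calls and, on Windows, could take the budget from the OS rather than a fixed size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tracker that keeps resident memory below 80% of local memory.
class ResidencyBudget {
public:
    explicit ResidencyBudget(std::uint64_t localMemoryBytes)
        : limit_(localMemoryBytes / 5 * 4) {}  // 80% of the reported size

    // Accounts for a heap becoming resident. Returns false if that would
    // oversubscribe the budget, in which case the caller should evict first.
    bool tryMakeResident(std::uint64_t bytes) {
        if (bytes > limit_ - used_) return false;  // used_ never exceeds limit_
        used_ += bytes;
        return true;
    }

    // Accounts for a heap being evicted.
    void evict(std::uint64_t bytes) { used_ -= (bytes < used_ ? bytes : used_); }

    std::uint64_t residentBytes() const { return used_; }
    std::uint64_t limit() const { return limit_; }

private:
    std::uint64_t limit_;
    std::uint64_t used_ = 0;
};
```

Allocating the most important resources first means they are the ones already counted as resident when the limit is reached, and grouping resources that are used together into one heap keeps the number of residency changes low.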
Avoid Redundancy!
Organize your pipelines
Full pipeline optimization
● Simplifies optimization
Additional information at startup
● Shaders
● Raster states
● Static constants
Build a pipeline cache
● No pre-warming
Most engines not designed for
monolithic pipelines
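A pipeline cache can be as simple as a hash map keyed by everything that goes into a pipeline. The sketch below assumes a pipeline is identified by two shader IDs and a raster state ID. `PipelineDesc`, `Pipeline` and `PipelineCache` are hypothetical, and the counter stands in for the expensive driver-side pipeline creation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical pipeline key: everything that defines the monolithic state.
struct PipelineDesc {
    std::uint32_t vertexShaderId;
    std::uint32_t pixelShaderId;
    std::uint32_t rasterStateId;
    bool operator==(const PipelineDesc& o) const {
        return vertexShaderId == o.vertexShaderId &&
               pixelShaderId == o.pixelShaderId &&
               rasterStateId == o.rasterStateId;
    }
};

struct PipelineDescHash {
    std::size_t operator()(const PipelineDesc& d) const noexcept {
        std::size_t h = std::hash<std::uint32_t>{}(d.vertexShaderId);
        h = h * 31 + std::hash<std::uint32_t>{}(d.pixelShaderId);
        h = h * 31 + std::hash<std::uint32_t>{}(d.rasterStateId);
        return h;
    }
};

struct Pipeline { int handle; };  // stands in for a compiled pipeline object

class PipelineCache {
public:
    // Returns the cached pipeline, creating it on first use only.
    const Pipeline& getOrCreate(const PipelineDesc& desc) {
        auto it = cache_.find(desc);
        if (it != cache_.end()) return it->second;
        ++compiles_;  // real code would create the pipeline object here
        return cache_.emplace(desc, Pipeline{compiles_}).first->second;
    }
    int compileCount() const { return compiles_; }

private:
    std::unordered_map<PipelineDesc, Pipeline, PipelineDescHash> cache_;
    int compiles_ = 0;
};
```

Filling such a cache at load time, where the full set of descriptions is known, moves the creation cost out of the frame.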
[Diagram: a pipeline covering the IA, VS, HS, DS, GS, RS, PS, DB and CB stages and their state, with the index buffer (IB), render targets (RT), depth-stencil (DS) and resources (Res) bound through descriptor sets and root state]
Pipeline State Objects
Old APIs:
• Single resource binding
• A lot of work for the driver to track, validate and
manage resource bindings
• Data management scripting language style
New APIs:
• Group resources in descriptor sets
• Pipelines contain "pointers"
• Data management C/C++ style
Descriptor Sets
Table driven
Shared across all shader stages
Two-level table
– Root Signature describes a top-level layout
• Pointers to descriptor tables
• Direct pointers to constant buffers
• Inline constants
Changing which table is pointed to is cheap
– It’s just writing a pointer
– no synchronisation cost
Changing contents of table is harder
– Can’t change table in flight on the
hardware
– No automatic renaming
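Because a table that the hardware may still be reading cannot be edited, one way to "change" it is to write a new table into unused descriptor slots and repoint the root entry. The sketch below models this on the CPU. `DescriptorHeap`, `Descriptor` and `RootArguments` are hypothetical stand-ins, and the heap is a simple bump allocator that never recycles slots.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Descriptor { int resourceId; };  // stands in for a CB/SR/UA view

// Hypothetical descriptor heap: a flat array with a bump allocator.
class DescriptorHeap {
public:
    explicit DescriptorHeap(std::size_t capacity) : slots_(capacity) {}

    // Writes a new table into unused slots and returns its start offset.
    // Earlier tables are never overwritten, since they may be in flight.
    std::size_t writeTable(const std::vector<Descriptor>& table) {
        assert(next_ + table.size() <= slots_.size() && "descriptor heap exhausted");
        const std::size_t start = next_;
        for (const Descriptor& d : table) slots_[next_++] = d;
        return start;
    }

    const Descriptor& at(std::size_t index) const { return slots_.at(index); }

private:
    std::vector<Descriptor> slots_;
    std::size_t next_ = 0;
};

// The root-level "pointer" to a table: changing it is a single write.
struct RootArguments { std::size_t tableStart = 0; };
```

A real implementation would recycle old table ranges once a fence shows the GPU is done with them, which is the manual counterpart of the renaming that older drivers did automatically.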
[Diagram: a root signature whose entries (table pointers, a root constant buffer view, a 32-bit constant) point to descriptor tables containing CB, SR and UA views]
Resource Binding
[Diagram: a PIPELINE_STATE_DESC with the shader stages VS, HS, DS and PS, whose slots (CB0, CB1, SR0, SR1, UA0) are mapped through the root signature (table pointers, a root constant buffer view, a 32-bit constant) to descriptor tables of CB, SR and UA views]
Use Shader and Pipeline cache
• Avoid duplication
Sort draw calls by PSO used
• Sort by Tessellation/GS enabled/disabled
Keep Root Descriptor small
• Group DescriptorSets by update pattern
Sort Root entries by frequency of update
• Most frequently changed entries first
PSO: Best Practices
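Sorting draw calls by pipeline can be sketched as a stable sort on a two-part key. The example assumes each draw carries a PSO ID and a flag saying whether its pipeline uses tessellation or a geometry shader. `DrawCall` and both functions are hypothetical and only illustrate the ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical draw record.
struct DrawCall {
    int psoId;
    bool tessellationOrGs;  // pipeline uses tessellation or a geometry shader
    int meshId;
};

// Groups draws first by tessellation/GS usage, then by PSO. The sort is
// stable, so draws that share a PSO keep their original relative order.
inline void sortByPipeline(std::vector<DrawCall>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawCall& a, const DrawCall& b) {
                         return std::tie(a.tessellationOrGs, a.psoId) <
                                std::tie(b.tessellationOrGs, b.psoId);
                     });
}

// Counts how often the bound PSO changes while walking the list,
// counting the initial bind as one switch.
inline std::size_t countPipelineSwitches(const std::vector<DrawCall>& draws) {
    std::size_t switches = 0;
    for (std::size_t i = 0; i < draws.size(); ++i)
        if (i == 0 || draws[i].psoId != draws[i - 1].psoId) ++switches;
    return switches;
}
```

Sorting is only valid among draws whose order does not affect the result, such as opaque geometry within one pass.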
Top 5 Performance Advice
#5. Avoid allocation/release at runtime
#4. Don't oversubscribe!
Manage your Memory efficiently
#3. Batch, Batch, Batch!
Group Barriers, group command buffer submissions
#2. Think Parallel!
On CPU as well as GPU
#1. Old optimization recommendations still apply
Thank you!
Contact: Stephan.Hodes@amd.com
@Highflz

More Related Content

What's hot

NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017Mark Kilgard
 
Crysis 2-key-rendering-features
Crysis 2-key-rendering-featuresCrysis 2-key-rendering-features
Crysis 2-key-rendering-featuresRaimundo Renato
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect AndromedaElectronic Arts / DICE
 
Khronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanKhronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanElectronic Arts / DICE
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with GpuRohit Khatana
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility bufferWolfgang Engel
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The SurgeMichele Giacalone
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Brendan Gregg
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Johan Andersson
 
1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門NVIDIA Japan
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineNarann29
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-renderingmistercteam
 
Past, Present and Future Challenges of Global Illumination in Games
Past, Present and Future Challenges of Global Illumination in GamesPast, Present and Future Challenges of Global Illumination in Games
Past, Present and Future Challenges of Global Illumination in GamesColin Barré-Brisebois
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14AMD Developer Central
 

What's hot (20)

NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017
 
Crysis 2-key-rendering-features
Crysis 2-key-rendering-featuresCrysis 2-key-rendering-features
Crysis 2-key-rendering-features
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
 
Khronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and VulkanKhronos Munich 2018 - Halcyon and Vulkan
Khronos Munich 2018 - Halcyon and Vulkan
 
OpenCL Heterogeneous Parallel Computing
OpenCL Heterogeneous Parallel ComputingOpenCL Heterogeneous Parallel Computing
OpenCL Heterogeneous Parallel Computing
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with Gpu
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
It's Time to ROCm!
It's Time to ROCm!It's Time to ROCm!
It's Time to ROCm!
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility buffer
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The Surge
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
 
1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering Pipeline
 
Introduction to OpenCL
Introduction to OpenCLIntroduction to OpenCL
Introduction to OpenCL
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-rendering
 
Past, Present and Future Challenges of Global Illumination in Games
Past, Present and Future Challenges of Global Illumination in GamesPast, Present and Future Challenges of Global Illumination in Games
Past, Present and Future Challenges of Global Illumination in Games
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 

Viewers also liked

Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesAMD Developer Central
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasAMD Developer Central
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
 
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorGS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorAMD Developer Central
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...AMD Developer Central
 
Dx11 performancereloaded
Dx11 performancereloadedDx11 performancereloaded
Dx11 performancereloadedmistercteam
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozAMD Developer Central
 
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD
 
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...AMD Developer Central
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...AMD Developer Central
 
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...AMD Developer Central
 
Getting the-best-out-of-d3 d12
Getting the-best-out-of-d3 d12Getting the-best-out-of-d3 d12
Getting the-best-out-of-d3 d12mistercteam
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...AMD Developer Central
 
GS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-BilodeauGS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-BilodeauAMD Developer Central
 
[1023 박민수] 깊이_버퍼_그림자_1
[1023 박민수] 깊이_버퍼_그림자_1[1023 박민수] 깊이_버퍼_그림자_1
[1023 박민수] 깊이_버퍼_그림자_1MoonLightMS
 
Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3
Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3
Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3Umbra
 

Viewers also liked (20)

Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math Libraries
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
 
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorGS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
 
Dx11 performancereloaded
Dx11 performancereloadedDx11 performancereloaded
Dx11 performancereloaded
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
 
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
 
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...
 
Getting the-best-out-of-d3 d12
Getting the-best-out-of-d3 d12Getting the-best-out-of-d3 d12
Getting the-best-out-of-d3 d12
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
 
GS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-BilodeauGS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-Bilodeau
 
[1023 박민수] 깊이_버퍼_그림자_1
[1023 박민수] 깊이_버퍼_그림자_1[1023 박민수] 깊이_버퍼_그림자_1
[1023 박민수] 깊이_버퍼_그림자_1
 
Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3
Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3
Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3
 
A Step Towards Data Orientation
A Step Towards Data OrientationA Step Towards Data Orientation
A Step Towards Data Orientation
 

Similar to DX12 & Vulkan: Dawn of a New Generation of Graphics APIs

3 boyd direct3_d12 (1)
3 boyd direct3_d12 (1)3 boyd direct3_d12 (1)
3 boyd direct3_d12 (1)mistercteam
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Storage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl
Storage Spaces Direct - the new Microsoft SDS star - Carsten RachfahlStorage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl
Storage Spaces Direct - the new Microsoft SDS star - Carsten RachfahlITCamp
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationBigstep
 
Presentation for IGDCloud meetup: The clouds arena AWS ver. others
Presentation for IGDCloud meetup: The clouds arena AWS ver. othersPresentation for IGDCloud meetup: The clouds arena AWS ver. others
Presentation for IGDCloud meetup: The clouds arena AWS ver. othersForthscale
 
Current and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on LinuxCurrent and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on Linuxmountpoint.io
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...Edge AI and Vision Alliance
 
Your 1st Ceph cluster
Your 1st Ceph clusterYour 1st Ceph cluster
Your 1st Ceph clusterMirantis
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheDavid Grier
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 
Performance Evaluation and Comparison of Service-based Image Processing based...
Performance Evaluation and Comparison of Service-based Image Processing based...Performance Evaluation and Comparison of Service-based Image Processing based...
Performance Evaluation and Comparison of Service-based Image Processing based...Matthias Trapp
 
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)Ontico
 
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyCassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyDataStax Academy
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
Right-Sizing your SQL Server Virtual Machine
Right-Sizing your SQL Server Virtual MachineRight-Sizing your SQL Server Virtual Machine
Right-Sizing your SQL Server Virtual Machineheraflux
 

Similar to DX12 & Vulkan: Dawn of a New Generation of Graphics APIs (20)

3 boyd direct3_d12 (1)
3 boyd direct3_d12 (1)3 boyd direct3_d12 (1)
3 boyd direct3_d12 (1)
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Storage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl
Storage Spaces Direct - the new Microsoft SDS star - Carsten RachfahlStorage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl
Storage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
Memory, Big Data, NoSQL and Virtualization
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs

  • 1. DX12 & Vulkan Dawn of a New Generation of Graphics APIs Stephan Hodes Developer Technology Engineer, AMD
  • 2. Agenda Why do we need new APIs? What’s new? • Parallelism • Explicit Memory Management • PSOs and Descriptor Sets Best Practices
  • 3. Evolution of 3D Graphics APIs • Software rasterization • 1996 Glide: Bilinear filtering + Transparency • 1998 DirectX 6: IHV independent + Multitexturing • 1999 DirectX 7: Hardware Transform & Lighting + Cube Maps • 2000 DirectX 8: Programmable shaders + Tessellation • 2002 DirectX 9: More complex shaders • 2006 DirectX 10: Unified shader Model • 2009 DirectX 11: Compute
  • 4. [Chart: GPU (GFLOPS) vs. CPU (10 MIPS, single core), 2006 to 2015, y-axis 0 to 10000] GPU performance increasing faster than single core CPU performance
  • 5. How to get GPU bound • Optimize API usage (e.g. state caching & sorting) • Batch, Batch, Batch!!! • ~10k draw calls max New APIs • Allow multi-threaded command buffer recording • Reduce workload of runtime/driver • Reduce runtime validation • Move work to init/load time (e.g. Pipeline setup) • More explicit control over the hardware • Reduce convenience functions
  • 6. Convenience function: Resource renaming Examples: • Backbuffer • Dynamic vertex buffer • Dynamic constant buffer • Descriptor sets • Command buffer [Diagram: CPU, GPU, and resource lifetime timelines across Frame N and Frame N+1] Track usage by End-Of-Frame-Fence • Fences are expensive • Use fewer than 10 fences per frame Best practice for constant buffers: • Use system memory (DX12: UPLOAD) • Keep it mapped
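The end-of-frame fence tracking on slide 6 can be sketched without any graphics API. The block below is a minimal simulation, not the presenter's code: `FakeFence` and `FrameRing` are hypothetical names, and the fence is just a counter that a test advances by hand. It shows how one fence value per frame is enough to decide when a per-frame slot (for example a region of a mapped UPLOAD buffer) may be overwritten.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a GPU fence: the GPU "signals" a monotonically
// increasing value each time it finishes a frame.
struct FakeFence {
    std::uint64_t completed = 0;
};

// One slot per frame in flight. Each slot remembers the fence value that
// must be reached before the CPU may overwrite that slot again.
template <std::size_t FramesInFlight>
class FrameRing {
public:
    // Returns the slot index for the next frame, or -1 if the GPU has not
    // finished with that slot yet (a real engine would wait on the fence).
    int beginFrame(const FakeFence& fence) const {
        const std::size_t slot =
            static_cast<std::size_t>(nextFrame_ % FramesInFlight);
        if (pendingFence_[slot] > fence.completed) return -1;
        return static_cast<int>(slot);
    }

    // Tags the slot with the single end-of-frame fence value and returns
    // the value the GPU is expected to signal when the frame completes.
    std::uint64_t endFrame(int slot) {
        ++nextFrame_;
        pendingFence_[static_cast<std::size_t>(slot)] = nextFrame_;
        return nextFrame_;
    }

private:
    std::uint64_t nextFrame_ = 0;
    std::array<std::uint64_t, FramesInFlight> pendingFence_{};
};
```

With two frames in flight, the third `beginFrame` call is refused until the fence reports that the first frame has completed, which is the manual replacement for the driver-side renaming that older APIs performed implicitly.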
  • 7. 2013: AMD Mantle „Mantle is not for everyone“ Adopted by several developers & titles • Developers are willing to do the additional work • Significant performance improvements in games • Good ISVs don’t need runtime validation Only available on AMD GCN hardware Needed standardization
  • 8. 2013: AMD Mantle „Mantle is not a low level API“ • It’s a „just the right level API“ • Support different HW configurations • Discrete GPU vs. Integrated • Shaders & command buffer are HW specific • Support different HW generations • Think about future hardware • On PC, your title is never alone
  • 9. Next Generation API features DirectX12 & Vulkan share the Mantle philosophy: • Minimize overhead • Minimize runtime validation • Allow multithreaded command buffer recording • Provide low level memory management • Support multiple asynchronous queues • Provide explicit access to multiple devices
  • 10. The big question: „How much performance will I gain from porting my code to new APIs?“ No magic involved! Depends on the current bottlenecks Depends a lot on engine design • Need to utilize new possibilities • It might „just work“ (esp. if heavily CPU limited) • Might need redesign of engine (and asset pipeline)
  • 11. [Diagram: software stack] Game Engine 1.0 (designed around API commands): Gbuffer, Shadowmaps, Lighting And Shading, AOGI, Transparent Object Rendering, Particle Update, Post Processing; DirectX 11 Runtime (a lot of validation); DirectX 11 Driver (a lot of optimization); Hardware
  • 12. [Diagram: software stack] Game Engine 1.0 (designed around API commands): Gbuffer, Shadowmaps, Lighting And Shading, AOGI, Transparent Object Rendering, Particle Update, Post Processing; API Abstraction Layer; DirectX 11 Runtime (a lot of validation) and DirectX 11 Driver (a lot of optimization); DX12 Runtime and DX12 Driver (direct access); Hardware
  • 13. [Diagram: software stack] Game Engine 1.0 and Game Engine 2.0: Gbuffer, Shadowmaps, Lighting And Shading, AOGI, Transparent Object Rendering, Particle Update, Post Processing; DirectX 11 Runtime (a lot of validation) and DirectX 11 Driver (a lot of convenience); DX12 Runtime and DX12 Driver (direct access); Hardware
  • 15. CPU side multithreading • Multi-threaded command buffer building • Submission to queue is not thread safe • Split frame into macro render jobs • Offload shader compilation from main thread • Batch command buffer submission • Don’t stall during submit/present
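The pattern on slide 15 (record in parallel, submit from one thread in a batch) can be illustrated with plain standard-library threads. This is a sketch under stated assumptions: `CommandBuffer`, `recordJob`, and `buildFrame` are hypothetical names, and strings stand in for recorded GPU commands. No real DX12 or Vulkan call is used.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an API command buffer / command list.
using CommandBuffer = std::vector<std::string>;

// Records one macro render job (e.g. "Shadowmaps") into its own buffer.
// Each thread writes only to its own buffer, so recording needs no locks.
void recordJob(const std::string& jobName, int drawCount, CommandBuffer& out) {
    for (int i = 0; i < drawCount; ++i)
        out.push_back(jobName + ":draw" + std::to_string(i));
}

// Records all jobs on worker threads, then "submits" them from the calling
// thread in one batch, because queue submission is not thread safe.
std::vector<std::string> buildFrame(const std::vector<std::string>& jobs,
                                    int drawsPerJob) {
    std::vector<CommandBuffer> buffers(jobs.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < jobs.size(); ++i)
        workers.emplace_back(recordJob, std::cref(jobs[i]), drawsPerJob,
                             std::ref(buffers[i]));
    for (auto& worker : workers) worker.join();

    // Single-threaded, batched submission that preserves job order.
    std::vector<std::string> queue;
    for (const auto& buffer : buffers)
        queue.insert(queue.end(), buffer.begin(), buffer.end());
    return queue;
}
```

The job order in the submitted queue depends only on the order of `jobs`, not on which worker finishes first, so the frame stays deterministic even though recording is parallel.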
  • 16. GPU side multithreading Radeon Fury X / Radeon Fury: Compute Units 64 / 56; Core 1050 MHz / 1000 MHz; Memory size 4 GB / 4 GB; Memory BW 512 GB/s / 512 GB/s; ALU 8.6 TFlops / 7.17 TFlops 64 CU x 4 SIMD per CU x 10 Wavefronts per SIMD x 64 Threads per Wavefront = Up to 163840 threads Batch, Batch, Batch!!!
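The thread count and ALU figures on slide 16 follow from simple multiplication. The sketch below reproduces them at compile time; the function names are hypothetical, and the peak-rate formula assumes 64 ALU lanes per compute unit and 2 FLOPs per lane per clock (one fused multiply-add), which is not stated on the slide.

```cpp
#include <cassert>
#include <cstdint>

// Maximum number of threads resident on the GPU at once.
constexpr std::uint64_t maxThreadsInFlight(std::uint64_t computeUnits,
                                           std::uint64_t simdsPerCu,
                                           std::uint64_t wavesPerSimd,
                                           std::uint64_t threadsPerWave) {
    return computeUnits * simdsPerCu * wavesPerSimd * threadsPerWave;
}

// Peak single-precision rate in MFLOPS.
// Assumption: 64 ALU lanes per CU, 2 FLOPs per lane per clock (FMA).
constexpr std::uint64_t peakMflops(std::uint64_t computeUnits,
                                   std::uint64_t coreMhz) {
    return computeUnits * 64 * 2 * coreMhz;
}

// Radeon Fury X: 64 CU x 4 SIMD x 10 wavefronts x 64 threads.
static_assert(maxThreadsInFlight(64, 4, 10, 64) == 163840, "Fury X threads");
// 8,601,600 MFLOPS is roughly the 8.6 TFlops quoted for Fury X.
static_assert(peakMflops(64, 1050) == 8601600, "Fury X ALU rate");
// 7,168,000 MFLOPS is roughly the 7.17 TFlops quoted for Fury.
static_assert(peakMflops(56, 1000) == 7168000, "Fury ALU rate");
```

Keeping that many threads busy is the reason for the "Batch, Batch, Batch" advice: small draws or dispatches cannot fill 163840 thread slots.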
  • 17. GPU: Single graphics queue Multiple commands can execute in parallel • Pipeline (usually) must maintain pixel order • Load balancing is the main problem [Diagram: timeline of Draw N, Draw N+1, and Draw N+2 overlapping across the pipeline stages IA, VS, Rasterizer, PS, and Output; further labels: Culling, Query]
  • 18. Explicit Barriers & Transitions • Indicate RaW/WaW Hazards • Switch resource state between RO/RW/WO • Decompress DepthStencil/RTs • May cause a stall or cache flush • Batch them! • Split Barriers may help in the future • Always execute them on the last queue that wrote the resource Most common cause for bugs!
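The "Batch them!" advice on slide 18 can be sketched as a small helper that collects state transitions and hands them over in one call. This is an illustrative simulation only: `BarrierBatcher`, `ResourceState`, and `Transition` are hypothetical names, and `flush` merely counts what a real engine would pass to a single barrier call of its API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified resource states for illustration (not a real API enum).
enum class ResourceState { RenderTarget, ShaderRead, CopyDest };

struct Transition {
    std::size_t resource;
    ResourceState before;
    ResourceState after;
};

// Collects transitions and issues them together instead of one at a time.
class BarrierBatcher {
public:
    explicit BarrierBatcher(std::vector<ResourceState> initial)
        : current_(std::move(initial)) {}

    // Queues a transition; requests that match the tracked state are dropped.
    void request(std::size_t resource, ResourceState wanted) {
        if (current_.at(resource) == wanted) return;
        pending_.push_back({resource, current_[resource], wanted});
        current_[resource] = wanted;
    }

    // Issues all pending transitions as one batch and returns how many
    // transitions that batch contained.
    std::size_t flush() {
        if (pending_.empty()) return 0;
        ++flushCalls_;
        const std::size_t count = pending_.size();
        pending_.clear();
        return count;
    }

    int flushCalls() const { return flushCalls_; }

private:
    std::vector<ResourceState> current_;
    std::vector<Transition> pending_;
    int flushCalls_ = 0;
};
```

Tracking the current state in one place also addresses the slide's warning that barriers are the most common cause of bugs, since redundant or missing transitions become visible in a single class.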
  • 19. GPU: Barriers • Hard to detect Barriers in DX11 • Explicit in DX12 [Diagram: timeline of Draw N, Draw N+1, and Draw N+2 across the pipeline stages IA, VS, Rasterizer, PS, and Output, with barriers between the draws; further labels: Culling, Query]
  • 20. GPU: Barriers • Batch them! • [DX12] In the future split barriers may help [Diagram: timeline of Draw N, Draw N+1, and Draw N+2 across the pipeline stages IA, VS, Rasterizer, PS, and Output, with a single double barrier; further labels: Culling, Query]
  • 21. GPU underutilization Culling can cause bubbles of inactivity. Fetch latency is a common cause of underutilization [Diagram: timeline of Draw N across the pipeline stages IA, VS, Rasterizer, PS, and Output; further labels: Culling, Query]
  • 22. Multiple Queues: Graphics, Compute, Copy • Let driver know about independent workloads • Each queue type is a superset of the next (Graphics, then Compute, then Copy) • Multiple queues per type • Specify type at record time • Parallel execution • Sync using fences • Shared GPU resources
  • 23. Asynchronous Compute Multiple queues let you specify tasks to execute in parallel Schedule different bottlenecks together to improve efficiency Bus dominated: Shadow mapping; ROP heavy workloads; G buffer operations; DMA operations (Texture upload, Heap defrag) Shader throughput: Deferred lighting; Postprocessing effects; Most compute tasks (Texture compression, Physics, Simulations) Geometry dominated: Rendering highly detailed models
  • 24. Explicit MGPU DirectX 11 only supports one device • CF/SLI support is essentially a driver hack • Increases latency Explicit MGPU allows • Split Frame Rendering • Master/Slave configurations • 3D/VR rendering using 2 dGPUs
  • 26. Explicit Memory Management • Control over heaps and residency • Abstraction for different architectures • VMM still exists • Use Evict/MakeResident to page out unused resources • Avoid oversubscribing resident memory!
  • 27. Heap types DEFAULT: Memory Pool Local (dGPU) or System (iGPU); CPU Properties No CPU access; Usage Frequent GPU Read/Write, Max GPU Bandwidth UPLOAD: Memory Pool System; CPU Properties Write Combine; Usage CPU Write Once, GPU Read Once, Max CPU Write Bandwidth READBACK: Memory Pool System; CPU Properties Write Back; Usage GPU Write Once, CPU Read, Max CPU Read Bandwidth
  • 28. Explicit Memory Management Rendertargets & UAVs: • Create in DEFAULT Textures: • Write to UPLOAD • Use copy queue to copy to DEFAULT • Copy swizzles: required on iGPU! Buffers (CB/VB/IB): • Placement dependent on usage: • Write once/Read once => UPLOAD • Write once/Read many => Copy to DEFAULT
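The placement rules on slide 28 amount to a small decision function. The sketch below encodes them as written on the slide; the enum and function names are hypothetical and only mirror the DX12 heap type names, without calling any real API.

```cpp
#include <cassert>

// Mirrors the three heap types named on the slides (illustrative only).
enum class HeapType { Default, Upload, Readback };

enum class ResourceKind { RenderTargetOrUav, Texture, Buffer };

// How often the GPU reads data after the CPU has written it once.
enum class GpuReads { Once, Many };

struct Placement {
    HeapType finalHeap;       // heap the GPU reads the resource from
    bool stageThroughUpload;  // true if data is written to UPLOAD first,
                              // then copied (e.g. on the copy queue)
};

// Encodes the slide's rules. READBACK is listed for completeness only;
// none of these rules select it.
inline Placement choosePlacement(ResourceKind kind, GpuReads reads) {
    switch (kind) {
        case ResourceKind::RenderTargetOrUav:
            return {HeapType::Default, false};
        case ResourceKind::Texture:
            return {HeapType::Default, true};
        case ResourceKind::Buffer:
            if (reads == GpuReads::Once) return {HeapType::Upload, false};
            return {HeapType::Default, true};
    }
    return {HeapType::Default, false};  // unreachable for valid enum values
}
```

For example, a per-frame constant buffer (written once, read once) stays in UPLOAD, while a static vertex buffer (written once, read many times) is staged through UPLOAD and copied to DEFAULT.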
  • 29. Direct3D 12 Resource Creation APIs [Diagram: Committed, Placed, and Reserved resources, showing how Buffer, Texture2D, and Texture3D resources map through Resource Heaps and GPU VA to Physical Pages]
  • 30. Explicit Memory Management Don’t over-allocate committed memory • Share L1 with Windows and other processes • Don’t allocate more than 80% • Reduce memory footprint • Use placed resources to reduce overhead • Use reserved resources as PRT Allocate most important resources first Group resources used together in same heap • Use MakeResident/Evict
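Two of the rules on slide 30 (stay under 80% of local memory, allocate the most important resources first) can be combined into a simple greedy selection. This is a sketch under assumptions: `Resource`, its `priority` field, and `chooseResident` are hypothetical, and a real engine would query the actual budget from the OS rather than take it as a parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a resource competing for local GPU memory.
struct Resource {
    std::string name;
    std::uint64_t bytes;
    int priority;  // higher value = more important
};

// Picks resources to keep resident in local memory without exceeding 80%
// of it. Resources are considered in descending priority; one that does
// not fit is skipped (it would live in system memory or be evicted).
std::vector<std::string> chooseResident(std::vector<Resource> resources,
                                        std::uint64_t localMemoryBytes) {
    const std::uint64_t budget = localMemoryBytes / 5 * 4;  // 80% rule
    std::stable_sort(resources.begin(), resources.end(),
                     [](const Resource& a, const Resource& b) {
                         return a.priority > b.priority;
                     });
    std::vector<std::string> resident;
    std::uint64_t used = 0;
    for (const Resource& r : resources) {
        if (r.bytes <= budget - used) {
            used += r.bytes;
            resident.push_back(r.name);
        }
    }
    return resident;
}
```

The remaining 20% is headroom for Windows and other processes that share the same memory pool, as the slide notes.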
  • 32. Pipeline State Objects Full pipeline optimization • Simplifies optimization Additional information at startup • Shaders • Raster states • Static constants Build a pipeline cache • No pre-warming Most engines not designed for monolithic pipelines [Diagram: pipeline with stages IA, VS, HS, DS, GS, RS, PS, DB, CB and their state, plus IB, RT, and DS targets and resources grouped into Descriptor Sets and Root State]
  • 33. Descriptor Sets Old APIs: • Single resource binding • A lot of work for the driver to track, validate and manage resource bindings • Data management scripting language style New APIs: • Group resources in descriptor sets • Pipelines contain „pointers“ • Data management C/C++ style
  • 34. Resource Binding Table driven Shared across all shader stages Two-level table – Root Signature describes a top-level layout • Pointers to descriptor tables • Direct pointers to constant buffers • Inline constants Changing which table is pointed to is cheap – It’s just writing a pointer – no synchronisation cost Changing contents of table is harder – Can’t change table in flight on the hardware – No automatic renaming [Diagram: Root Signature with table pointers, a root constant buffer view, and a 32-bit constant; the table pointers reference Descriptor Tables holding CB views, SR views, and UA views]
  • 36. PSO: Best Practices Use Shader and Pipeline cache • Avoid duplication Sort draw calls by PSO used • Sort by Tessellation/GS enabled/disabled Keep Root Descriptor small • Group DescriptorSets by update pattern Sort Root entries by frequency of update • Most frequently changed entries first
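The "Sort draw calls by PSO used" advice on slide 36 can be checked with a small experiment that counts pipeline switches before and after sorting. The names `DrawCall`, `countPsoSwitches`, and `sortByPso` are hypothetical, and integer ids stand in for real pipeline state objects.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical draw record; psoId is assumed to be non-negative.
struct DrawCall {
    int psoId;
    int meshId;
};

// Counts how often the bound pipeline changes while replaying the draws.
int countPsoSwitches(const std::vector<DrawCall>& draws) {
    int switches = 0;
    int bound = -1;  // -1 means no pipeline is bound yet
    for (const DrawCall& d : draws) {
        if (d.psoId != bound) {
            ++switches;
            bound = d.psoId;
        }
    }
    return switches;
}

// Groups draws that share a pipeline. stable_sort keeps the original
// relative order of draws within each pipeline group.
void sortByPso(std::vector<DrawCall>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawCall& a, const DrawCall& b) {
                         return a.psoId < b.psoId;
                     });
}
```

This only applies where draw order is free to change; passes that depend on submission order, such as transparent object rendering, would need a different sort key.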
  • 37. Top 5 Performance Advice #5. Avoid allocation/release at runtime #4. Don’t oversubscribe! Manage your Memory efficiently #3. Batch, Batch, Batch! Group Barriers, group command buffer submissions #2. Think Parallel! On CPU as well as GPU #1. Old optimization recommendations still apply