SlideShare a Scribd company logo
1 of 15
Johan Andersson
Technical Director
Frostbite
LOW-LEVEL GRAPHICS
INTEL VCARB 2014
Email: johan@frostbite.com
Web: http://frostbite.com
Twitter: @repi
 Frostbite has 2 very different rendering use cases:
1. Rendering the world with huge amounts of objects
 Tons of draw calls and with lots of different states & pipelines
 Heavily CPU limited with high-level APIs
 Read-only view of the world and resources (except for render targets)
2. Setting up rendering and doing lighting, post-fx, virtual texturing, compute, etc
 Tons of different types of complex operations, not a lot of draw calls
 ~50 different rendering passes in Frostbite
 Managing resource state, memory and running on different queues (graphics, compute, DMA)
 Both are very important and low-level discussion & design target are for both!
RENDERING USE CASES
 I consider low-level APIs having 3 important design targets:
1. Solving the many draw calls CPU performance problem
 CPU overhead
 Parallel command buffer building
 Binding model & pipeline objects
 Explicit resource state handling
2. Enabling new GPU programmability
 Bindless
 CPU/GPU collaboration
 GPU work creation
3. Improving GPU performance & memory usage
 Utilize multiple hardware engines in parallel
 Explicit memory management / virtual memory
 Explicit resource state handling
 Believe one needs a clean slate API design to target all of these – new paradigm model
 Hence Mantle & DX12 - too much legacy in the way in DX11, GL4 and GLES
LOW-LEVEL API TARGETS
 1-2 orders of a magnitude less CPU overhead for lots of draw calls
 Thanks to explicit resource barriers, explicit memory management & few bind points
 Problem for us is not ”how to do 1 million similar draw calls with same state”
 Problem is ”how to do 10-100k draw calls with different state”
 Should in parallel also try and reduce amount of state (such as bindless), but wary of GPU overhead
 Stable & consistent performance
 Major benefit for users, will only be more important going forward
 Explicit submission to GPU, not at random commands
 No runtime late binding or compilation of shaders
 Can be a challenge for engines to know all pipeline state up front, but worth designing for!
 Improved GPU performance
 Engine has more high-level knowledge of how resources are used
 Seen examples of adv GPU driver opts that are easier to implement thanks to low-level model
PERFORMANCE
DX11
MANTLE
 Key design to significantly lower driver overhead and complexity
 Explicit hazard tracking
 Hides architecture-specific caches
 Can be challenge in the apps/engines – but worth it
 Esp. with multiple queues & out-of-order command buffers
 Requires very clear specifications and great validation layer!
 In Frostbite we mostly track this per resource instead for simplicity & performance
 Instead of per subresource
RESOURCE TRANSITIONS
 Example complex cases:
 Read-only depth testing together with stencil writing (different state for depth & stencil)
 Mipmap generation (sub-resource specific states)
 Async compute offloading part of graphics pipeline
 Critical to make sure future low-level APIs have the right states/usages exposed
 Devil is in the details
 Does the sw/hw require the transition to happen on the same queue that just used a resource?
 Look at your concrete use cases early!
 Would help if future hardware doesn’t need as many different barriers
 But at what cost?
RESOURCE TRANSITIONS
 Aka ”descriptor sets” in Mantle
 This new model has been working very well for us – even with very basic handling
 Great to have as separate objects not connected to device context
 Treated all resource references as dynamic and built every frame
 ~15k resource entries in single large table per frame in BF4 (heavy instancing)
 Lots of opportunity going forward
 Split out static resources to own persistent tables
 Split out shared common resources to own table
 Connect together with nested resource tables
 Bindless for more complex cases
RESOURCE TABLES
 Seen really good wins with both async DMAs and async compute
 And it is an important target for us going forward
 Additional opportunities
 DMA in and out of embedded memory
 Buffer/Image/Video compression/decompression
 More?
 What engines / queues does Intel have & be able to expose?
MULTIPLE GPU QUEUES
 Kick CPU job from GPU
 Possible now with explicit fences & events that CPU can poll/wait on
 Enables filling in resources just in time
 Want async command buffers
 Kick CPU job from GPU and CPU builds & unlocks an already queued command buffer
 We’ve been doing this on consoles - ”just” a software limitation
 Example use case: Sample Distributed Shadowmaps without stalling GPU pipeline
 Major opportunity going forward
 Needs support in OS:es & driver models
 Drive rendering pipeline based on data from the current GPU frame (such as the zbuffer)
 Decide where to run code based on power efficiency
 Important both for discrete & integrated GPUs
GPU/CPU COLLABORATION
 For us, have been both easier to work with & get good performance
 What we are used to from working on consoles and have architecture for
 Update buffers from any thread, not locked to a device context
 Persistently map or pin buffer & image objects for easy reading & writing
 Pool memory to reduce overhead
 Alias objects to the same memory for significant reduction in memory
 Esp. for render targets
 Built-in virtual memory mapping
 Easier & more flexible way to manage large amount of memory
EXPLICIT MEMORY MANAGEMENT
 Major issue for us during BF4, avoiding VidMM stalls
 VidMM is a black box  Difficult to know what is going on & why
 Explicitly track memory references for each command buffer
 Tweak memory pools & chunk sizes
 Force memory to different heaps
 Setting memory priorities
 Going forward will redesign to strongly avoid overcommiting
 Automatically balance streaming pool settings and cap graphics settings
 Are there any other ways the app, OS and GPUs can handle this?
 Page faulting GPUs?
OVERCOMMITING VIDEO MEMORY
 Extensions are a great fit for low-level APIs
 Low-level extensions that exposes new hardware functionality
 Examples: PixelSync and EXT_shader_pixel_local_storage
 No need for huge amount of extensions like OGL which is mostly a high-level API
 Mantle has 5 extensions for the AMD-specific hardware functionality & Windows-specific integration
 Potential challenge for DX that has (officially) not had extensions before
 Would like to see DX official extensions, including shader code extensions!
 GL & Mantle has a strong advantage here
 Other alternative would be rapid iterations on the DX API (small updates quarterly?)
EXTENSIONS
 Discuss! 
THANKS

More Related Content

What's hot

FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteElectronic Arts / DICE
 
Future Directions for Compute-for-Graphics
Future Directions for Compute-for-GraphicsFuture Directions for Compute-for-Graphics
Future Directions for Compute-for-GraphicsElectronic Arts / DICE
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3Electronic Arts / DICE
 
BitSquid Tech: Benefits of a data-driven renderer
BitSquid Tech: Benefits of a data-driven rendererBitSquid Tech: Benefits of a data-driven renderer
BitSquid Tech: Benefits of a data-driven renderertobias_persson
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsJohan Andersson
 
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barré-Brisebois
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect AndromedaElectronic Arts / DICE
 
5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive Rendering5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive RenderingElectronic Arts / DICE
 
How data rules the world: Telemetry in Battlefield Heroes
How data rules the world: Telemetry in Battlefield HeroesHow data rules the world: Telemetry in Battlefield Heroes
How data rules the world: Telemetry in Battlefield HeroesElectronic Arts / DICE
 
Gamebryo LightSpeed(English)
Gamebryo LightSpeed(English)Gamebryo LightSpeed(English)
Gamebryo LightSpeed(English)Gamebryo
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunElectronic Arts / DICE
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasAMD Developer Central
 
Optimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeOptimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeWuBinbo
 
Unite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platformsUnite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platformsナム-Nam Nguyễn
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteElectronic Arts / DICE
 
Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsHolger Gruen
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevAMD Developer Central
 

What's hot (20)

Mantle for Developers
Mantle for DevelopersMantle for Developers
Mantle for Developers
 
Battlefield 4 + Frostbite + Mantle
Battlefield 4 + Frostbite + MantleBattlefield 4 + Frostbite + Mantle
Battlefield 4 + Frostbite + Mantle
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
Frostbite on Mobile
Frostbite on MobileFrostbite on Mobile
Frostbite on Mobile
 
Future Directions for Compute-for-Graphics
Future Directions for Compute-for-GraphicsFuture Directions for Compute-for-Graphics
Future Directions for Compute-for-Graphics
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
 
BitSquid Tech: Benefits of a data-driven renderer
BitSquid Tech: Benefits of a data-driven rendererBitSquid Tech: Benefits of a data-driven renderer
BitSquid Tech: Benefits of a data-driven renderer
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next Steps
 
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
 
5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive Rendering5 Major Challenges in Interactive Rendering
5 Major Challenges in Interactive Rendering
 
How data rules the world: Telemetry in Battlefield Heroes
How data rules the world: Telemetry in Battlefield HeroesHow data rules the world: Telemetry in Battlefield Heroes
How data rules the world: Telemetry in Battlefield Heroes
 
Gamebryo LightSpeed(English)
Gamebryo LightSpeed(English)Gamebryo LightSpeed(English)
Gamebryo LightSpeed(English)
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Optimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeOptimizing the graphics pipeline with compute
Optimizing the graphics pipeline with compute
 
Unite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platformsUnite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platforms
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in Frostbite
 
Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked Lists
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 

Similar to Low-level Graphics APIs

Storage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailStorage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailInternet World
 
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...AMD Developer Central
 
SQLintersection keynote a tale of two teams
SQLintersection keynote a tale of two teamsSQLintersection keynote a tale of two teams
SQLintersection keynote a tale of two teamsSumeet Bansal
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
VMworld 2013: Enterprise Architecture Design for VMware Horizon View 5.2
VMworld 2013: Enterprise Architecture Design for VMware Horizon View 5.2 VMworld 2013: Enterprise Architecture Design for VMware Horizon View 5.2
VMworld 2013: Enterprise Architecture Design for VMware Horizon View 5.2 VMworld
 
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...In-Memory Computing Summit
 
Clustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityClustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityConSanFrancisco123
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellAMD Developer Central
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceAmazon Web Services
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8MongoDB
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Amazon Web Services
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLNordic APIs
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 editionBob Ward
 
SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3UniFabric
 
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems SpecialistOWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems SpecialistParis Open Source Summit
 
Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...
Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...
Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...Ceph Community
 
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory EasyIMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory EasyIn-Memory Computing Summit
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...In-Memory Computing Summit
 
Well Behaved Mobile Apps on AIR - Performance Related
Well Behaved Mobile Apps on AIR - Performance RelatedWell Behaved Mobile Apps on AIR - Performance Related
Well Behaved Mobile Apps on AIR - Performance RelatedRenaun Erickson
 

Similar to Low-level Graphics APIs (20)

Storage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailStorage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, Whiptail
 
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
 
SQLintersection keynote a tale of two teams
SQLintersection keynote a tale of two teamsSQLintersection keynote a tale of two teams
SQLintersection keynote a tale of two teams
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
VMworld 2013: Enterprise Architecture Design for VMware Horizon View 5.2
VMworld 2013: Enterprise Architecture Design for VMware Horizon View 5.2 VMworld 2013: Enterprise Architecture Design for VMware Horizon View 5.2
VMworld 2013: Enterprise Architecture Design for VMware Horizon View 5.2
 
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...
 
Clustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityClustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And Availability
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of ML
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
 
SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3SOUG_SDM_OracleDB_V3
SOUG_SDM_OracleDB_V3
 
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems SpecialistOWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
 
Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...
Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...
Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...
 
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory EasyIMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
IMCSummit 2016 Keynote - Benzi Galili - More Memory for In-Memory Easy
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
 
Well Behaved Mobile Apps on AIR - Performance Related
Well Behaved Mobile Apps on AIR - Performance RelatedWell Behaved Mobile Apps on AIR - Performance Related
Well Behaved Mobile Apps on AIR - Performance Related
 

Recently uploaded

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 

Recently uploaded (20)

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

Low-level Graphics APIs

  • 1. Johan Andersson Technical Director Frostbite LOW-LEVEL GRAPHICS INTEL VCARB 2014 Email: johan@frostbite.com Web: http://frostbite.com Twitter: @repi
  • 2.  Frostbite has 2 very different rendering use cases: 1. Rendering the world with huge amounts of objects  Tons of draw calls and with lots of different states & pipelines  Heavily CPU limited with high-level APIs  Read-only view of the world and resources (except for render targets) 2. Setting up rendering and doing lighting, post-fx, virtual texturing, compute, etc  Tons of different types of complex operations, not a lot of draw calls  ~50 different rendering passes in Frostbite  Managing resource state, memory and running on different queues (graphics, compute, DMA)  Both are very important and low-level discussion & design target are for both! RENDERING USE CASES
  • 3.  I consider low-level APIs having 3 important design targets: 1. Solving the many draw calls CPU performance problem  CPU overhead  Parallel command buffer building  Binding model & pipeline objects  Explicit resource state handling 2. Enabling new GPU programmability  Bindless  CPU/GPU collaboration  GPU work creation 3. Improving GPU performance & memory usage  Utilize multiple hardware engines in parallel  Explicit memory management / virtual memory  Explicit resource state handling  Believe one needs a clean slate API design to target all of these – new paradigm model  Hence Mantle & DX12 - too much legacy in the way in DX11, GL4 and GLES LOW-LEVEL API TARGETS
  • 4.  1-2 orders of a magnitude less CPU overhead for lots of draw calls  Thanks to explicit resource barriers, explicit memory management & few bind points  Problem for us is not ”how to do 1 million similar draw calls with same state”  Problem is ”how to do 10-100k draw calls with different state”  Should in parallel also try and reduce amount of state (such as bindless), but wary of GPU overhead  Stable & consistent performance  Major benefit for users, will only be more important going forward  Explicit submission to GPU, not at random commands  No runtime late binding or compilation of shaders  Can be a challenge for engines to know all pipeline state up front, but worth designing for!  Improved GPU performance  Engine has more high-level knowledge of how resources are used  Seen examples of adv GPU driver opts that are easier to implement thanks to low-level model PERFORMANCE
  • 7.  Key design to significantly lower driver overhead and complexity  Explicit hazard tracking  Hides architecture-specific caches  Can be challenge in the apps/engines – but worth it  Esp. with multiple queues & out-of-order command buffers  Requires very clear specifications and great validation layer!  In Frostbite we mostly track this per resource instead for simplicity & performance  Instead of per subresource RESOURCE TRANSITIONS
  • 8.  Example complex cases:  Read-only depth testing together with stencil writing (different state for depth & stencil)  Mipmap generation (sub-resource specific states)  Async compute offloading part of graphics pipeline  Critical to make sure future low-level APIs have the right states/usages exposed  Devil is in the details  Does the sw/hw require the transition to happen on the same queue that just used a resource?  Look at your concrete use cases early!  Would help if future hardware doesn’t need as many different barriers  But at what cost? RESOURCE TRANSITIONS
  • 9.  Aka ”descriptor sets” in Mantle  This new model has been working very well for us – even with very basic handling  Great to have as separate objects not connected to device context  Treated all resource references as dynamic and built every frame  ~15k resource entries in single large table per frame in BF4 (heavy instancing)  Lots of opportunity going forward  Split out static resources to own persistent tables  Split out shared common resources to own table  Connect together with nested resource tables  Bindless for more complex cases RESOURCE TABLES
  • 10.  Seen really good wins with both async DMAs and async compute  And it is an important target for us going forward  Additional opportunities  DMA in and out of embedded memory  Buffer/Image/Video compression/decompression  More?  What engines / queues does Intel have & be able to expose? MULTIPLE GPU QUEUES
  • 11.  Kick CPU job from GPU  Possible now with explicit fences & events that CPU can poll/wait on  Enables filling in resources just in time  Want async command buffers  Kick CPU job from GPU and CPU builds & unlocks an already queued command buffer  We’ve been doing this on consoles - ”just” a software limitation  Example use case: Sample Distributed Shadowmaps without stalling GPU pipeline  Major opportunity going forward  Needs support in OS:es & driver models  Drive rendering pipeline based on data from the current GPU frame (such as the zbuffer)  Decide where to run code based on power efficiency  Important both for discrete & integrated GPUs GPU/CPU COLLABORATION
  • 12.  For us, have been both easier to work with & get good performance  What we are used to from working on consoles and have architecture for  Update buffers from any thread, not locked to a device context  Persistently map or pin buffer & image objects for easy reading & writing  Pool memory to reduce overhead  Alias objects to the same memory for significant reduction in memory  Esp. for render targets  Built-in virtual memory mapping  Easier & more flexible way to manage large amount of memory EXPLICIT MEMORY MANAGEMENT
  • 13.  Major issue for us during BF4, avoiding VidMM stalls  VidMM is a black box  Difficult to know what is going on & why  Explicitly track memory references for each command buffer  Tweak memory pools & chunk sizes  Force memory to different heaps  Setting memory priorities  Going forward will redesign to strongly avoid overcommiting  Automatically balance streaming pool settings and cap graphics settings  Are there any other ways the app, OS and GPUs can handle this?  Page faulting GPUs? OVERCOMMITING VIDEO MEMORY
  • 14.  Extensions are a great fit for low-level APIs  Low-level extensions that exposes new hardware functionality  Examples: PixelSync and EXT_shader_pixel_local_storage  No need for huge amount of extensions like OGL which is mostly a high-level API  Mantle has 5 extensions for the AMD-specific hardware functionality & Windows-specific integration  Potential challenge for DX that has (officially) not had extensions before  Would like to see DX official extensions, including shader code extensions!  GL & Mantle has a strong advantage here  Other alternative would be rapid iterations on the DX API (small updates quarterly?) EXTENSIONS

Editor's Notes

  1. I’ve been at DICE & EA for 13 years