Direct3D and the Future of Graphics APIs - AMD at GDC14

A look at how new Direct3D advancements enhance efficiency and enable fully threaded building of command buffers, in this presentation from the 2014 Game Developers Conference in San Francisco, March 17-21. Also view this and other presentations on our developer website at http://developer.amd.com/resources/documentation-articles/conference-presentations/


  1. DIRECT3D AND THE FUTURE OF GRAPHICS APIS
     Dave Oldcorn, AMD; Dan Baker, Oxide Games; Johan Andersson, EA / DICE
  2. AMD Direct3D Futures | March 20th, 2014
     NITROUS AND DX12
     Dan Baker, Partner, Oxide Games
  3. HAVEN’T WE BEEN HERE BEFORE?
     • Goal of DX9
       - Remember state blocks?
     • Goal of DX10
       - Large state groups
     • Goal of DX11
       - Deferred contexts
     • Are we actually getting faster, or are CPUs just faster?
       - Quite possibly no performance improvements due to API features in 10 years
     • Maybe adding features isn’t the answer…
  4. DEEPLY ROOTED PROBLEM
     • Coding design philosophies clash with the real world
     • OOP, data hiding and polymorphic design clash with task-driven, data-parallel design
     • Evident in language trends: a striking disconnect between what is considered good code and what is fast
     • The gap has always been there, but has grown in recent years
       - 15 years ago, processors were often bound by computation
       - Now, they are usually bound by cache misses, serialization, pipeline stalls, etc.
       - Multi-core CPUs are ineffectively utilized
     • 'Heavy iron' (e.g. big objects, opaque memory) is a dead end for performance
     • The revolt is beginning in high-performance graphics APIs, but will spread
  5. BUT… HOW MUCH FASTER?
     • Biggest problem with the industry today: acceptance
     • Only one secret in API design: that it can be done
       - And it isn't that hard
       - And our code isn't that ugly
     • Star Swarm is already demonstrating what is possible on a PC
  6. D3D12 FEATURES THAT NITROUS USES
     • True decoupled multi-core rendering
       - Expecting near-linear thread scheduling
     • Manual hazard tracking
       - Hazards have been resolved already
     • Memory heaps
       - Bigger chunks of memory pool grouping make management simpler
     • Descriptor tables
       - Table exposure allows a cheaper way of binding textures
       - Allows texture bindings to be shared between non-adjacent batches
  7. WHAT’S DIFFERENT NOW?
     Then: Spec written → Spec reviewed → API implemented → Released to public → First engine use → Analysis done
  8. WHAT’S DIFFERENT NOW?
     Now (a repeating cycle): Start here → Create spec → Implement spec → Prototype on actual engines → Analyze → Discuss with IHVs, ISVs. If ready, exit here to prep for release.
  9. IN THE SPIRIT OF CONTRIBUTING
     • Oxide is proud to announce that we have a prototype of Nitrous running on D3D12
     • *PR DISCLAIMER* This is not an official announcement regarding D3D12 support
     • Porting from other modern APIs is much simpler than porting from D3D11 to D3D12
  10. EXPECTED RESULTS
      • CPU driver overhead largely put to rest
      • Huge increases in driver reliability
      • Huge decreases in frame latency; expecting median frame latency to be 1.5 frames
        - Increased perceptual responsiveness
      • Never a dropped frame or stall due to driver API issues
        - *Other OS events could cause stalls
      • Driver should be far smaller and simpler to implement; IHVs can spend more time on optimizations
  11. DIRECT3D12 AND THE FUTURE OF GRAPHICS APIS
      Dave Oldcorn, Direct3D12 Driver Architect, AMD
  12. THE PROBLEM
  13. THE PROBLEM
      • Mismatch between existing Direct3D and hardware capabilities
        - Lots of CPU cores, but only one stream of data
        - State communication in small chunks
        - “Hidden” work
          · Hard to predict from any one given call what the overhead might be
          · Implicit memory management
        - Hardware evolving away from classical register programming
  14. API LANDSCAPE
      • Gap between PC 'raw' 3D APIs and the hardware has opened up
      • Very high-level APIs are now ubiquitous; easy to access even for casual developers, with plenty of choice
      • The PC APIs sit in a middle ground
      [Diagram: layers arranged along an axis of capability, ease of use and distance from the 3D engine. Labels: Application; Game engines (Frostbite, Unity, Unreal, CryEngine, BlitzTech); Flash / Silverlight; D3D7/8, D3D9, D3D11, OpenGL; Opportunity; Console APIs; Metal (register-level access)]
  15. WHAT ARE THE CONSEQUENCES? WHAT ARE THE SOLUTIONS?
  16. SEQUENTIAL API
      • Sequential API: state for a given draw comes from an arbitrary previous time
      • Some states must be reconciled on the CPU (“delayed validation”)
        - All contributing state needs to be visible
      • The GPU isn't like this; it uses command buffers
        - Must save and restore state at start and end
      [Diagram: an API input stream of state-setting and draw calls (Set PS, Set Blend, Set RT state, Set VS CB, Set PS CB, Set VS VB, Draw, ...) shown beside the state contributing to one draw: PS, PS CB, VS CB, Blend state, RT state]
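The "delayed validation" point on the slide above can be illustrated with a toy model. This is a minimal Python sketch, not Direct3D code: the class `SequentialContext`, its methods and the slot names are all hypothetical, chosen only to show that a draw's full state is known only when the draw is issued.

```python
class SequentialContext:
    """Toy model of a sequential API: a draw inherits whatever state was set earlier."""

    def __init__(self):
        self.state = {}      # current value of every state slot
        self.dirty = set()   # slots changed since the last draw
        self.draws = []      # resolved state snapshot, one per draw

    def set_state(self, slot, value):
        # Setting state does no validation; it only records the change.
        self.state[slot] = value
        self.dirty.add(slot)

    def draw(self):
        # "Delayed validation": the full state is only known here, so the
        # CPU reconciles everything that changed since the previous draw.
        validated = sorted(self.dirty)
        self.dirty.clear()
        self.draws.append(dict(self.state))
        return validated


ctx = SequentialContext()
ctx.set_state("PS", "lit")
ctx.set_state("Blend", "opaque")
first = ctx.draw()            # validates both slots set so far
ctx.set_state("PS_CB", "material_7")
second = ctx.draw()           # validates one slot, but still depends on PS and Blend
```

The second draw touches only one slot, yet its snapshot still contains state set long before it, which is why all contributing state has to stay visible to whoever validates the draw.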
  17. THREADING A SEQUENTIAL API
      • Sequential API threading
        - Simple producer / consumer model
          · Extra latency
          · Buffering has a cost
      • More threading would mean dividing tasks on a finer grain
        - Bottlenecked on application or driver thread
          · Difficult to extract parallelism (Amdahl's Law)
      [Diagram: application simulation and prebuild threads 0 and 1 feed one application render thread, which feeds one driver thread in the runtime / driver; queued buffers 0, 1, 2, ... wait in the GPU execution queue]
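The Amdahl's Law remark on the slide above can be made concrete with a short calculation. This is a minimal Python sketch; the function name and the 50% parallel fraction are illustrative assumptions, not figures from the presentation.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Assumed example: half of the frame's CPU work is stuck on one render/driver thread.
speedup_4 = amdahl_speedup(0.5, 4)
speedup_many = amdahl_speedup(0.5, 1000)
```

Under that assumption, four threads give at most a 1.6x speedup, and no thread count reaches 2x, which is the bottleneck the slide attributes to the single application or driver thread.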
  18. COMMAND BUFFER API
      • GPUs only listen to command buffers
      • Let the app build them
        - Command lists, at the API level
      • Solves sequential API CPU issues
      [Diagram: after application simulation, threads 0 and 1 each build a command buffer inside the application; queued buffers 0, 1, ... pass through the runtime / driver to the GPU execution queue]
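The structure described on the slide above, where application threads build command buffers independently and hand them over for ordered execution, might look like the following. This is a minimal Python sketch with hypothetical names (`split`, `build_command_list`, `record_frame`, `submit`); it models only the shape of the work, since Python threads do not give real CPU parallelism here and none of this is the Direct3D API.

```python
from concurrent.futures import ThreadPoolExecutor


def split(objects, parts):
    """Cut the object list into contiguous chunks, at most `parts` of them."""
    size = max(1, -(-len(objects) // parts))  # ceiling division
    return [objects[i:i + size] for i in range(0, len(objects), size)]


def build_command_list(chunk):
    # Each worker records into its own list; no state is shared between workers.
    return [("draw", obj) for obj in chunk]


def record_frame(objects, workers=4):
    chunks = split(objects, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in chunk order, whatever order workers finish in.
        return list(pool.map(build_command_list, chunks))


def submit(command_lists):
    """Model the execution queue: lists run back to back in submission order."""
    queue = []
    for command_list in command_lists:
        queue.extend(command_list)
    return queue


lists = record_frame(list(range(10)), workers=4)
executed = submit(lists)
```

Because each list is self-contained, the only ordering decision left is the submission order, which the application controls.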
  19. BETTER SCHEDULING
      • App has much more control over scheduling work
        - Both CPU side and GPU
      • Threads don't really share much resource
      • Many more options for streaming assets
      [Diagram comparing thread timelines. Labels: "D3D11: CB building threads tend to interfere"; "D3D12: CB building threads more independent"; "GPU load still added but only after queuing"; driver thread, create thread, build threads, render work, create work, GPU executes]
  20. PIPELINE OBJECTS
      • Pipeline objects get rid of JIT and enable LTCG for GPUs
      • Decouple interface and implementation
      • We're aware that this is a hairpin bend for many graphics engines to negotiate
        - Many engines don't think in terms of predicting state up front
        - The benefits are worth it
      [Diagram: simplified dataflow through the pipeline. Stages shown: index process, VS, primitive generation, rasteriser, PS, rendertarget output]
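The idea of predicting state up front, so that the expensive link step happens at create time and never at draw time, can be sketched as a cache keyed on the full pipeline description. This is a minimal Python illustration under assumed names (`PipelineCache`, `create`, and the description fields are hypothetical), not the Direct3D 12 interface.

```python
class PipelineCache:
    """Toy model: one immutable pipeline object per complete state description."""

    def __init__(self):
        self._pipelines = {}
        self.compiles = 0  # counts the expensive create-time link/compile steps

    def create(self, vs, ps, blend, rt_format):
        key = (vs, ps, blend, rt_format)
        if key not in self._pipelines:
            # The whole description is known here, so shaders and fixed-function
            # state can be linked together once instead of patched at draw time.
            self.compiles += 1
            self._pipelines[key] = {"id": len(self._pipelines), "desc": key}
        return self._pipelines[key]


cache = PipelineCache()
opaque = cache.create("vs_mesh", "ps_lit", "opaque", "rgba8")
again = cache.create("vs_mesh", "ps_lit", "opaque", "rgba8")   # reused, no new compile
additive = cache.create("vs_mesh", "ps_lit", "additive", "rgba8")
```

An engine built this way would create its pipelines during loading and only select among existing objects while rendering.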
  21. RENDER OBJECT BINDING MISMATCH
      • Hardware uses tables in video memory
      • BUT it is still programmed like a register solution
        - So one bind becomes:
          · Allocate a new chunk of video memory
          · Create a new copy of the entire table
          · Update the one entry
          · Write the register with the new table base address
      [Diagram: an on-chip root table (one per stage) with SR and CB entries; each entry is a pointer to a table in GPU memory (here, textures; constant buffers); each SRD table entry is a pointer to (plus parameters of) a resource in GPU memory]
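The four steps listed on the slide above can be modelled to show why a single register-style bind is costly on table-based hardware. This is a minimal Python sketch; `RegisterStyleBinder` and its counters are hypothetical and stand in for driver-internal work that the presentation describes only in outline.

```python
class RegisterStyleBinder:
    """Toy model of a register-style bind implemented on top of a memory table."""

    def __init__(self, table_size):
        self.table = [None] * table_size
        self.allocations = 0      # new chunks of (simulated) video memory
        self.entries_copied = 0   # descriptors copied to build new tables

    def bind(self, slot, resource):
        new_table = list(self.table)            # allocate a chunk and copy the whole table
        self.allocations += 1
        self.entries_copied += len(self.table)
        new_table[slot] = resource              # update the one entry
        self.table = new_table                  # "write the register" with the new base


binder = RegisterStyleBinder(table_size=16)
binder.bind(3, "albedo")
binder.bind(4, "normal")
```

Two single-slot binds in this model copy 32 descriptors, so the cost scales with table size rather than with the amount of state that actually changed.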
  22. DESCRIPTOR TABLES
      • Several tables of each type of resource
        - Easy to divide up by frequency
      • Tables can be of arbitrary size; dynamically indexed to provide bindless textures
      • Changing a table pointer is cheap
      • Updating a descriptor in a table is not
      [Diagram: an on-chip table with entries SR.T[0] to SR.T[3], UAV, CB.T[0], CB.T[1] and Samp; SR.T[0] points to textures table 0 (SR.T[0][0] to SR.T[0][2]) and CB.T[1] points to constbuf table 1 (CB.T[1][0], CB.T[1][1]), both SRD tables in GPU memory]
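The contrast between changing a table pointer and updating a descriptor can be sketched as follows. This is a minimal Python model with hypothetical names (`DescriptorTableBinder`, `create_table`, `set_table`, `fetch`); it assumes tables are filled once and then only switched between, which is the usage pattern the slide points toward.

```python
class DescriptorTableBinder:
    """Toy model: descriptors live in prebuilt tables; binding swaps one pointer."""

    def __init__(self):
        self.tables = {}
        self.bound = None
        self.descriptor_writes = 0   # the expensive operation
        self.pointer_changes = 0     # the cheap operation

    def create_table(self, name, resources):
        self.tables[name] = list(resources)
        self.descriptor_writes += len(resources)

    def set_table(self, name):
        # No copy and no allocation: only the table base pointer changes.
        self.bound = self.tables[name]
        self.pointer_changes += 1

    def fetch(self, index):
        # Dynamic indexing into the bound table, as a shader would do.
        return self.bound[index]


binder = DescriptorTableBinder()
binder.create_table("terrain", ["grass", "rock", "snow"])
binder.create_table("units", ["hull", "decal", "glow"])
for name in ["terrain", "units", "terrain", "units"]:
    binder.set_table(name)
```

Grouping tables by update frequency keeps the descriptor-write counter flat during rendering while the pointer can change as often as batches require.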
  23. KEY INNOVATIONS (table columns: Innovation, CPU-side win, GPU-side win)
      • Command buffers: Build on many threads; Control of scheduling; Lower latency; Simplified state tracking
      • Pipeline state objects: Link at create time; No JIT shader compiles; Efficient batched updates; Cheaper state updates; Enables LTCG
      • Bind objects in groups: Cheap to change group; Cheap to change group; Fits hardware paradigm
      • Move work to Create: Predictability; Enables optimisations
  24. KEY INNOVATIONS (table columns: Innovation, CPU-side win, GPU-side win)
      • Explicit synchronisation: Efficiency; Required for bindless textures; Less overhead
      • Explicit memory management: Efficiency; Predictability; Application flexibility; Zero copy; Control over placement
      • Do less: Predictability, Efficiency; Enables aggressive schedule
      • FEWER BUGS
  25. NEW PROBLEMS (AND TIPS TO SOLVE THEM)
  26. NEW VISIBLE LIMITS
      • More draws in does not automatically mean more triangles out
        - You will not see full rendering rates with triangles averaging 1 pixel each
        - Wireframe mode should look different to filled rendering
  27. NEW VISIBLE LIMITS
      • Feeding the GPU much more efficiently means exploring interesting new limits that weren't visible before
      • 10k/frame of anything is ~1µs per thing
      • GPU pipeline depth is likely to be 1-10µs (1k-10k cycles)
      • Specific limit: context registers
        - Shader tables are NOT in the context
        - Compute doesn't bottleneck on context
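The two rules of thumb on the slide above follow from simple arithmetic. This is a minimal Python sketch; the 10 ms frame time and the 1 GHz GPU clock are assumptions chosen because they reproduce the slide's round numbers, and the function names are hypothetical.

```python
def time_budget_us(frame_ms, items_per_frame):
    """Average time available per item, in microseconds."""
    return frame_ms * 1000.0 / items_per_frame


def duration_in_cycles(duration_us, clock_ghz):
    """Number of clock cycles that elapse in the given duration."""
    return duration_us * clock_ghz * 1000.0


per_item_at_10ms = time_budget_us(10.0, 10_000)            # assumed 10 ms frame
per_item_at_60hz = time_budget_us(1000.0 / 60.0, 10_000)   # 60 Hz frame, about 16.7 ms
shallow_pipeline = duration_in_cycles(1.0, 1.0)            # assumed 1 GHz clock
deep_pipeline = duration_in_cycles(10.0, 1.0)
```

At 10,000 items per frame the per-item budget (about 1 to 1.7 µs) is the same order as the pipeline depth, so any per-item operation that drains the pipeline becomes visible as a limit.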
  28. APPLICATION IN CHARGE
      • Application is arbiter of correct rendering
        - This is a serious responsibility
        - The benefits of D3D12 aren't readily available without this condition
      • Applications must be warning-free on the debug layer
      • Different opportunities for driver intervention
  29. APPLICATION IN CHARGE
      • No driver thread in play
        - App can target much lower latency
        - BUT implies app has to be ready with new GPU work
      [Diagram: timelines of app render, driver and GPU across frames 1 to 3. Labels: "D3D11: No dead GPU time after 1st frame (but extra latency)"; "Dead time"; "First work sent to driver"; "Driver buffers Present; no future dead time"; "No buffered present reveals dead time on GPU"]
  30. USE COMMAND BUFFERS SPARINGLY
      • Each API command list maps to a single hardware command buffer
      • Starting / ending a command list has an overhead
        - Writes full 3D state, may flush caches or idle GPU
      • We think a good rule of thumb will be to target around 100 command buffers/frame
        - Use the multiple submission API where possible
      [Diagram: multiple applications running on the system; application 0 and application 1 each queue command buffers (CB0, CB1, CB2), which the GPU executes]
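One way to apply the rule of thumb above is to size command lists from the frame's draw count instead of opening a list per object or per material. This is a minimal Python sketch; `plan_command_lists` is a hypothetical helper, and the default of 100 is the slide's suggested target, not a hard limit.

```python
def plan_command_lists(total_draws, target_lists=100):
    """Return the number of draws to record into each command list."""
    if total_draws <= 0:
        return []
    per_list = max(1, -(-total_draws // target_lists))  # ceiling division
    return [min(per_list, total_draws - start)
            for start in range(0, total_draws, per_list)]


big_frame = plan_command_lists(10_000)   # 100 lists of 100 draws each
small_frame = plan_command_lists(250)    # fewer lists, none above the target count
```

Keeping the list count bounded spreads the fixed start/end cost of each list (full state write, possible cache flush) over many draws.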
  31. ROUND-UP
  32. ALL-NEW
      • There's a learning curve here for all of us
      • In the main it's a shallow one
        - Compared at least to the general problem of multithreaded rendering
      • Multithreading is always hard
        - Simpler design means fewer bugs and more predictable performance
  33. WHAT AMD PLAN TO DELIVER
      • An early preview driver “soon”
      • Release driver for Direct3D12 launch
      • Continuous engagement
        - With Microsoft
        - With ISVs
      • Bring your opinions to us and to Microsoft
  34. DX12 AND FROSTBITE
      Johan Andersson, Technical Director
  35. DX12 AND FROSTBITE
      • PC is very important for EA and we've been pushing hard to improve graphics capabilities on Windows
      • Excited to be working with Microsoft and the IHVs on Direct3D again!
      • Good & very healthy collaboration between Microsoft, the IHVs and us game/engine developers
      • DX12 is a really big step forward from DX11 or GL4
  36. DX12 FEATURES AND FROSTBITE
      • Key DX12 features that are a great fit for Frostbite:
        - Efficient parallel command buffers
        - Descriptor tables
        - Pipeline objects
        - Explicit resource synchronization
        - Explicit memory management
      • DX12 is still in development, so we are actively working with Microsoft & the IHVs to help make sure all of it fits together and is efficient
  37. DX12 PLATFORMS
      • DX12 support on Windows 7 & most existing PC hardware is critical for us
        - Huge user base still on Windows 7
        - Gamers would see major benefits without upgrading
      • DX12 support on Xbox One is critical for us
        - Will lead to improved performance & quality for future Xbox One titles
        - Almost all of our games are cross-platform Gen4/PC
        - Easier development: renderer is shared between Windows & Xbox One
      • Looking forward to DX12 on mobile/tablets
        - Power efficiency & low overhead are really key
        - Need larger user base to target on Windows for mobile
  38. DX12 AND FROSTBITE
      • We are building a DX12 renderer for Frostbite!
        - Will work on GPUs from all vendors: benefits a wide set of gamers
      • Expected benefits over DX11:
        - More stable and consistent performance
        - Higher overall performance
        - Move our design target: richer & more detailed game worlds
        - Thinner drivers: easier to work with / less of a black box
        - More control for us developers: new techniques & optimizations
      • Really happy that the full Windows & Xbox ecosystems are moving to a low-level graphics API!
  39. QUESTIONS
