
Rendering Battlefield 4 with Mantle


In this technical presentation Johan Andersson shows how the Frostbite 3 game engine is using the low-level graphics API Mantle to deliver significantly improved performance in Battlefield 4 on PC and future games from Electronic Arts. He will go through the work of bringing over an advanced existing engine to an entirely new graphics API, the benefits and concrete details of doing low-level rendering on PC and how it fits into the architecture and rendering systems of Frostbite. Advanced optimization techniques and topics such as parallel dispatch, GPU memory management, multi-GPU rendering, async compute & async DMA will be covered as well as sharing experiences of working with Mantle in general.



  1. 1. RENDERING BATTLEFIELD 4 WITH MANTLE Johan Andersson – Electronic Arts
  2. 2. 2
  3. 3. 3 DX11: Avg 78 fps, Min 42 fps. Mantle: Avg 120 fps, Min 94 fps. +58%! Test system: Core i7-3970x, AMD Radeon R9 290x, 1080p ULTRA.
  4. 4. 4 BF4 MANTLE GOALS Goals: – Significantly improve CPU performance – More consistent & stable performance – Improve GPU performance where possible – Add support for a new Mantle rendering backend in a live game • Minimize changes to engine interfaces • Compatible with built PC content – Work on wide set of hardware • APU to quad-GPU • But x64 only (32-bit Windows needs to die) Non-goals: – Design new renderer from scratch for Mantle – Take advantage of asymmetric MGPU (APU+discrete) – Optimize video memory consumption
  5. 5. 5 BF4 MANTLE STRATEGIC GOALS • Prove that low-level graphics APIs work outside of consoles • Push the industry towards low-level graphics APIs everywhere • Build a foundation for the future that we can build great games on
  6. 6. 6 SHADERS
  7. 7. 7 SHADERS • Shader resource bind points replaced with a resource table object: the descriptor set – This is how the hardware accesses the shader resources – Flat list of images, buffers and samplers used by any of the shader stages – Vertex shader streams converted to vertex shader buffer loads • Engine assigns each shader resource to a specific slot in the descriptor set(s) – Can share slots between shader stages = smaller descriptor sets – The mapping takes a while to wrap one’s head around
  8. 8. 8 SHADER CONVERSION • DX11 bytecode shaders get converted to AMDIL & mapping applied using ILC tool – Done at load time – Don’t have to change our shaders! • Have full source & control over the process • Could write AMDIL directly or use other frontends if wanted
  9. 9. 9 DESCRIPTOR SETS • Very simple usage in BF4: for each draw call write flat list of resources – Essentially direct replacement of SetTexture/SetConstantBuffer/SetInputStream • Single dynamic descriptor set object per frame • Sub-allocate for each draw call and write list of resources • ~15000 resource slots written per frame in BF4, still very fast
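The per-frame descriptor set usage on this slide can be pictured roughly as follows. This is a minimal Python sketch, not Mantle or Frostbite code: `FrameDescriptorSet`, `write_draw_resources` and the resource names are hypothetical, and a plain list stands in for the GPU-visible slot array.

```python
class FrameDescriptorSet:
    """One large slot array per frame; each draw call claims a contiguous range."""

    def __init__(self, capacity):
        self.slots = [None] * capacity   # flat list of resource handles
        self.used = 0                    # index of the next free slot

    def reset(self):
        # Called once at the start of a frame: every range is released at once.
        self.used = 0

    def write_draw_resources(self, resources):
        """Sub-allocate a range, write the draw call's resources, return its offset."""
        count = len(resources)
        if self.used + count > len(self.slots):
            raise MemoryError("per-frame descriptor set is full")
        offset = self.used
        self.slots[offset:offset + count] = resources
        self.used += count
        return offset


frame_set = FrameDescriptorSet(capacity=16384)
first = frame_set.write_draw_resources(["albedo_tex", "normal_tex", "linear_sampler"])
second = frame_set.write_draw_resources(["shadow_map", "constants"])
```

Each draw call would then bind its range by offset, which is why the slide can describe this as a direct replacement for per-slot set calls.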
  10. 10. 10 DESCRIPTOR SETS
  11. 11. 11 DESCRIPTOR SETS – FUTURE OPTIMIZATIONS • Use static descriptor sets when possible • Reduce resource duplication by reusing & sharing more across shader stages • Nested descriptor sets
  12. 12. 12 COMPUTE PIPELINES • 1:1 mapping between pipeline & shader • No state built into pipeline • Can execute in parallel with rendering • ~100 compute pipelines in BF4
  13. 13. 13 GRAPHICS PIPELINES • All graphics shader stages combined to a single pipeline object together with important graphics state • ~10000 graphics pipelines in BF4 on a single level, ~25 MB of video memory • Could use smaller working pool of active state objects to keep reasonable amount in memory – Has not been required for us
  14. 14. 14 PRE-BUILDING PIPELINES • Graphics pipeline creation is an expensive operation, do it at load time instead of runtime! – Creating one of our graphics pipelines takes ~10-60 ms – Pre-build using N parallel low-priority jobs – Avoid 99.9% of runtime stalls caused by pipeline creation! • Requires knowing the graphics pipeline state that will be used with the shaders – Primitive type – Render target formats – Render target write masks – Blend modes • Not fully trivial to know all state, may require engine changes / pre-defining use cases – Important to design for!
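A rough illustration of pre-building at load time, written as a Python sketch under stated assumptions: `PipelineState`, `build_pipeline` and `prebuild_pipelines` are hypothetical names, `build_pipeline` is a placeholder for the expensive driver call, and a standard thread pool stands in for the engine's low-priority job system (Python threads have no priority setting).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineState:
    """The state the slide says must be known up front, plus the shader."""
    shader: str
    primitive_type: str
    render_target_formats: tuple
    write_masks: tuple
    blend_mode: str


def build_pipeline(state):
    # Placeholder for the expensive (~10-60 ms) pipeline creation call.
    return f"pipeline<{state.shader}|{state.primitive_type}|{state.blend_mode}>"


def prebuild_pipelines(states, worker_count=4):
    """Build every unique state once, in parallel, and return a lookup table."""
    unique_states = list(dict.fromkeys(states))  # drop duplicates, keep order
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        built = list(pool.map(build_pipeline, unique_states))
    return dict(zip(unique_states, built))
```

At draw time the renderer would look up the pipeline by its full state key; a miss is exactly the runtime stall the slide is trying to avoid.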
  15. 15. 15 PIPELINE CACHE • Cache built pipelines both in memory cache and disk cache – Improved loading times – Max 300 MB – Simple LRU policy – LZ4 compressed (free) • Database signature: – Driver version – Vendor ID – Device ID
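The cache policy above (size budget, simple LRU, signature check) could look something like this in-memory Python sketch. `PipelineCache` and its methods are hypothetical; compression and the on-disk format are left out, and the byte budget is a constructor argument rather than the 300 MB from the slide.

```python
from collections import OrderedDict


class PipelineCache:
    def __init__(self, max_bytes, driver_version, vendor_id, device_id):
        self.max_bytes = max_bytes
        # A cache built for another driver or GPU must not be reused.
        self.signature = (driver_version, vendor_id, device_id)
        self.entries = OrderedDict()   # key -> pipeline blob, least recent first
        self.total_bytes = 0

    def matches(self, driver_version, vendor_id, device_id):
        return self.signature == (driver_version, vendor_id, device_id)

    def get(self, key):
        blob = self.entries.get(key)
        if blob is not None:
            self.entries.move_to_end(key)   # mark as most recently used
        return blob

    def put(self, key, blob):
        if key in self.entries:
            self.total_bytes -= len(self.entries.pop(key))
        self.entries[key] = blob
        self.total_bytes += len(blob)
        # Evict least recently used entries until we are back under budget.
        while self.total_bytes > self.max_bytes and self.entries:
            _, evicted = self.entries.popitem(last=False)
            self.total_bytes -= len(evicted)
```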
  16. 16. 16 MEMORY
  17. 17. 17 MEMORY MANAGEMENT • Mantle devices expose multiple memory heaps with different characteristics – Can differ between devices, drivers and OSes • User explicitly places resources in wanted heaps – Driver suggests preferred heaps when creating objects, not a requirement
      Type    Size      Page   GPU Read  GPU Write  CPU Read  CPU Write  CPU access
      Local   256 MB    65535  130       170        0.0058    2.8        CpuVisible|CpuGpuCoherent|CpuUncached|CpuWriteCombined
      Local   4096 MB   65535  130       180        0         0          (none)
      Remote  16106 MB  65535  2.6       2.6        0.1       3.3        CpuVisible|CpuGpuCoherent|CpuUncached|CpuWriteCombined
      Remote  16106 MB  65535  2.6       2.6        3.2       2.9        CpuVisible|CpuGpuCoherent
  18. 18. 18 FROSTBITE MEMORY HEAPS • System Shared Mapped – CPU memory that is GPU visible. – Write combined & persistently mapped = easy & fast to write to in parallel at any time • System Shared Pinned – CPU cached for readback. – Not used much • Video Shared – GPU memory accessible by CPU. Used for descriptor sets and dynamic buffers – Max 256 MB (legacy constraint) – Avoid keeping persistently mapped as WDDM doesn’t like this and can decide to move it back to CPU memory • Video Private – GPU private memory. – Used for render targets, textures and other resources CPU does not need to access
  19. 19. 19 MEMORY REFERENCES • WDDM needs to know which memory allocations are referenced for each command buffer – In order to make sure they are resident and not paged out – Max ~1700 memory references are supported – Overhead with having lots of references • Engine needs to keep track of what memory is referenced while building the command buffers – Easy & fast to do – Each reference is either read-only or read/write – We use a simple global list of references shared for all command buffers.
  20. 20. 20 MEMORY POOLING • Pooling memory allocations was required for us – Sub-allocate within larger 1 – 32 MB chunks – All resources store memory handle + offset – Not as elegant as just void* on consoles – Fragmentation can be a concern, but has not been much of an issue for us in practice • GPU virtual memory mapping is fully supported, can simplify & optimize management
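To make the "memory handle + offset" idea concrete, here is a small Python sketch of sub-allocating from fixed-size chunks with a first-fit free list. `Chunk` and `MemoryPool` are hypothetical names, chunk handles are plain integers standing in for GPU memory objects, and freeing and coalescing are omitted to keep it short.

```python
class Chunk:
    def __init__(self, handle, size):
        self.handle = handle
        self.free_ranges = [(0, size)]   # (offset, size) pairs, sorted by offset

    def allocate(self, size, alignment):
        for index, (start, length) in enumerate(self.free_ranges):
            aligned = (start + alignment - 1) // alignment * alignment
            padding = aligned - start
            if padding + size <= length:
                # Replace the free range with whatever is left before and after.
                remainder = []
                if padding:
                    remainder.append((start, padding))
                tail = length - padding - size
                if tail:
                    remainder.append((aligned + size, tail))
                self.free_ranges[index:index + 1] = remainder
                return aligned
        return None


class MemoryPool:
    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = []

    def allocate(self, size, alignment=256):
        """Return (chunk handle, offset) for a sub-allocation."""
        if size > self.chunk_size:
            raise ValueError("allocation larger than chunk size")
        for chunk in self.chunks:
            offset = chunk.allocate(size, alignment)
            if offset is not None:
                return chunk.handle, offset
        # No existing chunk has room: grow the pool by one chunk.
        chunk = Chunk(handle=len(self.chunks), size=self.chunk_size)
        self.chunks.append(chunk)
        return chunk.handle, chunk.allocate(size, alignment)
```

Alignment padding is returned to the free list here, which is one simple way to limit the fragmentation the slide mentions.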
  21. 21. 21 OVERCOMMITTING VIDEO MEMORY • Avoid overcommitting video memory! – Will lead to severe stalls as VidMM moves memory blocks back and forth – VidMM is a black box – One of the biggest issues we ran into during development • Recommendations – Balance memory pools – Make sure to use read-only memory references – Use memory priorities
  22. 22. 22 MEMORY PRIORITIES • Setting priorities on the memory allocations helps VidMM choose what to page out when it has to • 5 priority levels – Very high = Render targets with MSAA – High = Render targets and UAVs – Normal = Textures – Low = Shader & constant buffers – Very low = vertex & index buffers
  23. 23. 23 MEMORY RESIDENCY FUTURE • For best results manage which resources are in video memory yourself & keep only ~80% used – Avoid all stalls – Can async DMA in and out • We are thinking of redesigning to fully avoid possibility of overcommitting • Hoping WDDM’s memory residency management can be simplified & improved in the future
  24. 24. 24 RESOURCE MANAGEMENT
  25. 25. 25 RESOURCE LIFETIMES • App manages lifetime of all resources – Have to make sure GPU is not using an object or memory while we are freeing it on the CPU – How we’ve always worked with GPUs on the consoles – Multi-GPU adds some additional complexity that consoles do not have • We keep track of lifetimes on a per frame granularity – Queues for object destruction & free memory operations – Add to queue at any time on the CPU – Process queues when GPU command buffers for the frame are done executing – Tracked with command buffer fences
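The per-frame destruction queues described above might be sketched like this in Python. `DeferredDeleter` is a hypothetical name, destruction is modelled as a callback, and the fence is reduced to a "last completed frame" number that the caller supplies; locking for multi-threaded use is omitted.

```python
from collections import deque


class DeferredDeleter:
    def __init__(self):
        self.pending = deque()   # (frame index, callbacks) for submitted frames
        self.current = []        # callbacks queued during the frame being built
        self.frame_index = 0

    def destroy_later(self, callback):
        # Safe to call at any time: nothing is freed until the GPU is done.
        self.current.append(callback)

    def end_frame(self):
        self.pending.append((self.frame_index, self.current))
        self.current = []
        self.frame_index += 1

    def collect(self, last_completed_frame):
        """Run destruction for every frame whose command buffers have finished."""
        while self.pending and self.pending[0][0] <= last_completed_frame:
            _, callbacks = self.pending.popleft()
            for callback in callbacks:
                callback()
```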
  26. 26. 26 LINEAR FRAME ALLOCATOR • We use multiple linear allocators with Mantle for both transient buffers & images – Used for huge amount of small constant data and other GPU frame data that CPU writes – Easy to use and very low overhead – Don’t have to care about lifetimes or state • Fixed memory buffers for each frame – Super cheap sub-allocation from any thread – If full, use heap allocation (also fast due to pooling) • Alternative: ring buffers – Requires being able to stall & drain pipeline at any allocation if full, additional complexity for us
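A linear (bump) allocator with a heap fallback can be sketched in a few lines of Python. `LinearFrameAllocator` is a hypothetical name; the lock stands in for the atomic add a native implementation would more likely use, and the heap fallback is only recorded, not actually backed by a pool.

```python
import threading


class LinearFrameAllocator:
    def __init__(self, capacity):
        self.capacity = capacity
        self.offset = 0
        self.lock = threading.Lock()   # stand-in for an atomic add
        self.overflow = []             # sizes that had to go to the heap

    def allocate(self, size, alignment=16):
        """Return ("frame", offset) from the fixed buffer, or ("heap", index) if full."""
        with self.lock:
            start = (self.offset + alignment - 1) // alignment * alignment
            if start + size <= self.capacity:
                self.offset = start + size
                return ("frame", start)
            self.overflow.append(size)
            return ("heap", len(self.overflow) - 1)

    def reset(self):
        # Called when the frame's buffer can be reused; nothing is freed individually.
        with self.lock:
            self.offset = 0
            self.overflow.clear()
```

Because the whole buffer is released with one `reset`, callers never track lifetimes for individual allocations, which matches the slide's point about not caring about lifetimes or state.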
  27. 27. 27 TILING • Textures should be tiled for performance – Explicitly handled in Mantle, user selects linear or tiled – Some formats (BC) can’t be accessed as linear by the GPU • On consoles we handle tiling offline as part of our data processing pipeline – We know the exact tiling formats and have separate resources per platform • For Mantle – Tiling formats are opaque, can be different between GPU architectures and image types – Tile textures with DMA image upload from SystemShared to VideoPrivate • Linear source, tiled destination • Free
  28. 28. 28 COMMAND BUFFERS
  29. 29. 29 COMMAND BUFFERS • Command buffers are the atomic unit of work dispatched to the GPU – Separate creation from execution – No “immediate context” a la DX11 that can execute work at any call – Makes resource synchronization and setup significantly easier & faster • Typical BF4 scenes have around ~50 command buffers per frame – Reasonable tradeoff for us with submission overhead vs CPU load-balancing
  30. 30. 30 COMMAND BUFFER SOURCES • Frostbite has 2 separate sources of command buffers – World rendering • Rendering the world with tons of objects, lots of draw calls. Have all frame data up front • All resources except for render targets are read-only • Generated in parallel up front each frame – Immediate rendering (“the rest”) • Setting up rendering and doing lighting, post-fx, virtual texturing, compute, etc • Managing resource state, memory and running on different queues (graphics, compute, DMA) • Sequentially generated in a single job, simulate an immediate context by splitting the command buffer • Both are very important and have different requirements
  31. 31. 31 RESOURCE TRANSITIONS • Key design in Mantle to significantly lower driver overhead & complexity – Explicit hazard tracking by the app/engine – Drives architecture-specific caches & compression – AMD: FMASK, CMASK, HTILE – Enables explicit memory management • Examples: – Optimal render target writes → Graphics shader read-only – Compute shader write-only → DrawIndirect arguments • Mantle has a strong validation layer that tracks transitions which is a major help
  32. 32. 32 MANAGING RESOURCE TRANSITIONS • Engines need a clear design on how to handle state transitions • Multiple approaches possible: – Sequential in-order command buffers • Generate one command buffer at a time in order • Transition resources on-demand when doing operation on them, very simple • Recommendation: start with this – Out-of-order multiple command buffers • Track state per command buffer, fix up transitions when order of command buffers is known – Hybrid approaches & more
  33. 33. 33 MANAGING RESOURCE TRANSITIONS IN FROSTBITE • Current approach in Frostbite is quite basic: – We keep track of a single state for each resource (not subresource) – The “immediate rendering” transitions resources as needed depending on operation – The out of order “world rendering” command buffers don’t need to transition states • Already have write access to MRTs and read-access to all resources set up outside them • Avoids the problem of them not knowing the state during generation • Works now but as we do more general parallel rendering it will have to change – Track resource state for each command buffer & fix up between command buffers
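The simple in-order approach, one tracked state per resource with transitions issued on demand, can be sketched as below. This is an illustrative Python model only: `TransitionTracker`, the state names and the recorded tuples are hypothetical and do not correspond to Mantle API calls.

```python
class TransitionTracker:
    def __init__(self):
        self.states = {}     # resource name -> current state (whole resource)
        self.commands = []   # recorded transitions, in submission order

    def require(self, resource, needed_state):
        """Emit a transition only if the resource is not already in the needed state."""
        current = self.states.get(resource, "undefined")
        if current != needed_state:
            self.commands.append((resource, current, needed_state))
            self.states[resource] = needed_state


tracker = TransitionTracker()
tracker.require("gbuffer", "render_target_write")   # first use: transition
tracker.require("gbuffer", "render_target_write")   # already there: no-op
tracker.require("gbuffer", "shader_read")           # lighting pass reads it
```

This only works when command buffers are generated in submission order, which is why the slides note that out-of-order generation needs per-command-buffer tracking and a fix-up step.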
  34. 34. 34 DYNAMIC STATE OBJECTS • Graphics state is only set with the pipeline object and 5 dynamic state objects – State objects: color blend, raster, viewport, depth-stencil, MSAA – No other parameters such as in DX11 with stencil ref or SetViewport functions • Frostbite use case: – Pre-create when possible – Otherwise on-demand creation (hash map) – Only ~100 state objects! • Still possible to end up with lots of state objects – Esp. with state object float & integer values (depth bounds, depth bias, viewport) – But no need to store all permutations in memory, objects are fast to create & app manages lifetimes
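On-demand creation through a hash map amounts to keying state objects by their parameters. A minimal Python sketch, assuming a hypothetical `StateObjectCache` and a caller-supplied creation function in place of the real API call:

```python
class StateObjectCache:
    def __init__(self, create_fn):
        self.create_fn = create_fn   # stand-in for the real state object creation
        self.objects = {}            # (kind, sorted params) -> state object
        self.created = 0

    def get(self, kind, **params):
        # Sort the parameters so the key does not depend on argument order.
        key = (kind, tuple(sorted(params.items())))
        obj = self.objects.get(key)
        if obj is None:
            obj = self.create_fn(kind, params)
            self.objects[key] = obj
            self.created += 1
        return obj
```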
  35. 35. 35 QUEUES
  36. 36. 36 QUEUES • Universal queue can do graphics, compute and presents • We also use additional queues to parallelize GPU operations: – DMA queue: improve perf with faster transfers & avoid idling graphics while transferring – Compute queue: improve perf by utilizing idle ALU and update resources simultaneously with gfx • More GPUs = more queues!
  37. 37. 37 QUEUES SYNCHRONIZATION • Order of execution within a queue is sequential • Synchronize multiple queues with GPU semaphores (signal & wait) • Also works across multiple GPUs [Diagram: compute and graphics queue timelines synchronized with semaphore signal (S) and wait (W)]
  38. 38. 38 QUEUES SYNCHRONIZATION CONT • Started out with explicit semaphores – Error prone to handle when having lots of different semaphores & queues – Difficult to visualize & debug • Switched to a representation more similar to a job graph • Just a model on top of the semaphores
  39. 39. 39 GPU JOB GRAPH • Each GPU job has a list of dependencies (other command buffers) • Dependencies have to finish before the job can run on its queue • The dependencies can be from any queue • Was easier to work with, debug and visualize • Really extendable going forward [Diagram: job graph with nodes Graphics 1, Graphics 2, DMA, Compute, Graphics 2]
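One way to read "a model on top of the semaphores": order the jobs so dependencies come first, and emit a signal/wait pair only where a dependency crosses queues, since execution within a queue is already sequential. The Python sketch below assumes a hypothetical job table and `plan_submission` helper; the job names are made up for illustration.

```python
from collections import deque

jobs = {
    "upload_textures": {"queue": "dma", "deps": []},
    "gbuffer": {"queue": "graphics", "deps": ["upload_textures"]},
    "shadowmaps": {"queue": "graphics", "deps": ["gbuffer"]},
    "tile_lighting": {"queue": "compute", "deps": ["gbuffer"]},
    "post_fx": {"queue": "graphics", "deps": ["shadowmaps", "tile_lighting"]},
}


def plan_submission(jobs):
    """Return (submission order, list of (signal job, wait job) semaphore pairs)."""
    remaining = {name: len(job["deps"]) for name, job in jobs.items()}
    dependents = {name: [] for name in jobs}
    for name, job in jobs.items():
        for dep in job["deps"]:
            dependents[dep].append(name)

    ready = deque(name for name, count in remaining.items() if count == 0)
    order, semaphores = [], []
    while ready:
        name = ready.popleft()
        order.append(name)
        for dep in jobs[name]["deps"]:
            # Same-queue dependencies are covered by in-order queue execution.
            if jobs[dep]["queue"] != jobs[name]["queue"]:
                semaphores.append((dep, name))
        for child in dependents[name]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if len(order) != len(jobs):
        raise ValueError("dependency cycle in GPU job graph")
    return order, semaphores


order, semaphores = plan_submission(jobs)
```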
  40. 40. 40 ASYNC DMA • AMD GPUs have dedicated hardware DMA engines, let’s use them! – Uploading through DMA is faster than on universal queue, even if blocking – DMA has alignment restrictions, have to support falling back to copies on universal queue • Use case: Frame buffer & texture uploads – Used by resource initial data uploads and our UpdateSubresource – Guaranteed to be finished before the GPU universal queue starts rendering the frame • Use case: Multi-GPU frame buffer copy – Peer-to-peer copy of the frame buffer to the GPU that will present it
  41. 41. 41 ASYNC COMPUTE • Frostbite has lots of compute shader passes that could run in parallel with graphics work – HBAO, blurring, classification, tile-based lighting, etc • Running as async compute can improve GPU performance by utilizing “free” ALU – For example while doing shadowmap rendering (ROP bound)
  42. 42. 42 ASYNC COMPUTE – TILE-BASED LIGHTING • 3 sequential compute shaders – Input: zbuffer & gbuffer – Output: HDR texture/UAV • Runs in parallel with graphics pipeline that renders to other targets [Diagram: compute and graphics queue timelines; labels: TileZ, Gbuffer, Shadowmaps, Reflection, Distort, Transp, Cull lights, Lighting; semaphore signal (S) and wait (W)]
  43. 43. 43 ASYNC COMPUTE – TILE-BASED LIGHTING • We manually prepare the resources for the async compute – Important to not access the resources on other queues at the same time (unless read-only state) – Have to transition resources on the queue that last used it • Up to 80% faster in our initial tests, but not fully reliable – But is a pretty small part of the frame time – Not in BF4 yet [Diagram: compute and graphics queue timelines; labels: TileZ, Gbuffer, Shadowmaps, Reflection, Distort, Transp, Cull lights, Lighting; semaphore signal (S) and wait (W)]
  44. 44. 44 MULTI-GPU
  45. 45. 45 MULTI-GPU • Multi-GPU alternatives: – AFR – Alternate Frame Rendering (1-4 GPUs of the same power) – Heterogeneous AFR – 1 small + 1 big GPU (APU + Discrete) – SFR – Split Frame Rendering – Multi-GPU Job Graph – Primary strong GPU + slave GPUs helping • Frostbite supports AFR natively – No synchronization points within the frame – For resources that are not rendered every frame: re-render resources for each GPU • Example: sky envmap update on weather change • With Mantle multi-GPU is explicit and we have to build support for it ourselves
  46. 46. 46 MULTI-GPU AFR WITH MANTLE • All resources explicitly duplicated on each GPU with async DMA – Hidden internally in our rendering abstraction • Every frame, alternate which GPU we build command buffers for and use resources from • Our UpdateSubresource has to make sure it updates resources on all GPUs • Presenting the screen has to, in some modes, copy the frame buffer to the GPU that owns the display • Bonus: – Can simulate multi-GPU mode even with single GPU! – Multi-GPU works in windowed mode!
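The two AFR rules on this slide, alternating the GPU per frame and updating every GPU's copy, reduce to very little logic. A toy Python sketch with hypothetical names (`AfrResource`, `gpu_for_frame`), where plain values stand in for per-GPU resource copies:

```python
class AfrResource:
    """A resource that is explicitly duplicated on every GPU."""

    def __init__(self, gpu_count, initial):
        self.copies = [initial] * gpu_count   # one copy per GPU

    def update(self, data):
        # Stand-in for UpdateSubresource: every GPU's copy has to be refreshed,
        # otherwise the next frame on another GPU would read stale data.
        self.copies = [data] * len(self.copies)

    def for_gpu(self, gpu_index):
        return self.copies[gpu_index]


def gpu_for_frame(frame_index, gpu_count):
    # Alternate Frame Rendering: frames are assigned to GPUs round-robin.
    return frame_index % gpu_count
```

Since nothing here depends on the copies living on different physical devices, the same logic can run with a single GPU, which fits the slide's note about simulating multi-GPU mode.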
  47. 47. 47 MULTI-GPU ISSUES • GPUs are independently rendering & presenting to the screen – can cause micro-stuttering – Frames are not presented at regular intervals – Frame rate can be high but presentation & gameplay is not smooth – FCAT is a good tool to analyse this [Diagram: GPU0 and GPU1 timelines for Frames 0-3, each ending in a present (P), with an irregular presentation interval]
  48. 48. 48 MULTI-GPU ISSUES • GPUs are independently rendering & presenting to the screen – can cause micro-stuttering – Frames are not presented at regular intervals – Frame rate can be high but presentation & gameplay is not smooth – FCAT is a good tool to analyse this • We need to introduce dependency & dampening between the GPUs to alleviate this – frame pacing [Diagram: GPU0 and GPU1 timelines for Frames 0-3, each ending in a present (P), with the ideal presentation interval]
  49. 49. 49 FRAME PACING • Measure average frame rate on each GPU – Short history (10-30 frames) – Filter out spikes • Insert delay on the GPU before each present – Force the frame times to become more regular and GPUs to align – Delay value is based on the calculated avg frame rate [Diagram: GPU0 and GPU1 timelines for Frames 0-3 with a delay (D) inserted before each present (P)]
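The pacing idea can be sketched as: keep a short history of frame times, drop spikes, and delay a present that would arrive too soon after the previous one. The Python below is one possible reading, not Frostbite's implementation; `FramePacer`, the median-based spike filter and the target of "average frame time divided by GPU count" are all assumptions made for illustration.

```python
from collections import deque
from statistics import median


class FramePacer:
    def __init__(self, gpu_count, history=20, spike_factor=2.0):
        self.gpu_count = gpu_count
        self.spike_factor = spike_factor
        self.frame_times = deque(maxlen=history)   # short rolling history, in ms

    def record(self, frame_time_ms):
        self.frame_times.append(frame_time_ms)

    def average_frame_time(self):
        if not self.frame_times:
            return 0.0
        # Ignore frames far above the typical time so one hitch does not skew pacing.
        typical = median(self.frame_times)
        kept = [t for t in self.frame_times if t <= typical * self.spike_factor]
        return sum(kept) / len(kept)

    def delay_before_present(self, ms_since_last_present):
        """Delay (ms) so presents land roughly one target interval apart."""
        target_interval = self.average_frame_time() / self.gpu_count
        return max(0.0, target_interval - ms_since_last_present)
```

With two GPUs each taking about 20 ms per frame, the target interval is 10 ms, so a frame finishing 4 ms after the previous present would be held back by roughly 6 ms.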
  50. 50. 50 CONCLUSION
  51. 51. 51 MANTLE DEV RECOMMENDATIONS • The validation layer is a critical friend! • You’ll end up with a lot of object & memory management code, try to share with console code • Make sure you have control over memory usage and can avoid overcommitting video memory • Build a robust solution for resource state management early • Figure out how to pre-create your graphics pipelines, can require engine design changes • Build for multi-GPU support from the start, easier than to retrofit
  52. 52. 52 FUTURE • Second wave of Frostbite Mantle titles • Adapt Frostbite core rendering layer based on learnings from Mantle – Refine binding & buffer updates to further reduce overhead – Virtual memory management – More async compute & async DMAs – Multi-GPU job graph R&D • Linux – Would like to see how our Mantle renderer behaves with different memory management & driver model
  53. 53. 53 QUESTIONS? Email: johan@frostbite.com Web: http://frostbite.com Twitter: @repi
