NITROUS AND MANTLE:
Combining efficient engine design with a modern API
Dan Baker, Partner, Oxide Games
2 | Nitrous and Mantle | 19 March 2014
PRE-REQUISITE MOTIVATIONAL SLIDE
MODERN APIS ARE STARTING TO FEEL RATHER DATED
BUT HOW MUCH BETTER CAN WE BE?
PRE-REQUISITE MOTIVATIONAL SLIDE
TURNS OUT… A WHOLE LOT FASTER
HONEY, DOES THIS DRESS MAKE ME LOOK FAT?
…
STATE OF THE ART TODAY: WHAT’S GOING ON?
Lots of little things add up
2 major problems require rearchitecture
–Functional threading model throws a wrench into task-based systems
–Implicit hazard tracking and synchronization
API tries to hide the async nature of the GPU
Lots of little things: memory model, binding model, etc.
Analysis of features like instancing indicates that it is unreliable and tends to speed up only the fastest frames; correlation between batches and driver perf is casual
Can’t retrofit old APIs
DIVING INTO NITROUS
Nitrous = Oxide’s custom engine
Specifically designed for high throughput
Core neutral. Main thread acts only as a lightweight sequencer
All work is divided up into small jobs, which are in the microsecond range
Can produce lots of jobs, in the 10,000+ range per frame
STAR SWARM
• Nitrous Engine demo
• Free to download, experiment
• Proof of concept for modern API design
• Represents 2 AI opponents, thus application CPU load is realistic
• 10,000 units possible
• 100,000+ batches possible
SECRETS BEHIND STAR SWARM
Much of what is required for high performance isn’t specific to Mantle
Star Swarm was originally not based on Mantle
If the engine is structured in certain ways, Mantle support is straightforward and intuitive. Maybe even fun.
Work done to restructure the engine will have benefits outside of Mantle support
ADDING NITROUS TO THE ENGINE
• Rendering broken into jobs which generate autonomous command buffers
• CPU to GPU data streamlined – constants, texture updates go into GPU frame memory
• Shader bindings standardized
• Shaders, state, bundled into blocks
• Resources grouped into sets
• Graphics commands streamlined, restricted bind points
• Stateless command format
• Expensive state transitioned rarely
• Much attention paid to cache usage, lockless data structures
• All hazards detangled, all buffers considered non-persistent
MULTI-CORE CPU BASICS
Be Wary, There Is A Lot Of Very Bad Advice In The Wild
Spawning threads to handle tasks
Relying on the OS preemptive scheduler, heavyweight OS synchronization primitives
Functional threading in general
Your Survival Guide
OK: Multi-thread read of same location
OK: Multi-thread write to different locations
OK: Multi-thread write to same location in ‘stamp’ mode
CAUTION: Atomic instructions
STOP: Multi-thread read/write to same location
STOP: Multi-thread write to same CACHE line
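The last rule is easy to break by accident, because separate variables can still share a cache line. A minimal sketch of the difference, assuming 64-byte cache lines; `PackedCounters`, `PaddedCounter`, and `SameCacheLine` are illustrative names and not part of Nitrous:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (typical for x86; check your target CPU).
constexpr std::size_t kCacheLineSize = 64;

// STOP case: adjacent counters packed into one cache line. Threads that each
// write "their own" counter still contend for the same line (false sharing).
struct PackedCounters {
    std::uint64_t perThread[4];
};

// OK case: each counter is aligned and padded to a full cache line, so writes
// from different threads land in different lines.
struct alignas(kCacheLineSize) PaddedCounter {
    std::uint64_t value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "each counter should occupy exactly one cache line");

// Returns true when both addresses fall inside the same cache line.
inline bool SameCacheLine(const void* a, const void* b) {
    const auto lineA = reinterpret_cast<std::uintptr_t>(a) / kCacheLineSize;
    const auto lineB = reinterpret_cast<std::uintptr_t>(b) / kCacheLineSize;
    return lineA == lineB;
}
```

Giving each worker its own `PaddedCounter` trades a little memory for writes that never serialize on a shared line.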
NITROUS AND MANTLE
Nitrous is NOT built around Mantle
The reverse is closer to the truth: Mantle adapts well to Nitrous’ internal concepts
The concepts are what make the engine fast
Results are astounding, driver time reduced up to 50x
Mantle is the harbinger of future API design, not just in graphics
TASK BASED SYSTEM
• Idea is that the workload is a constructed graph of much smaller nuggets
• Many advantages
– Scales well, 32+ cores
– Easy to balance workload
– More power efficient: more, slower cores are just as good
  ▪ Already seeing CPUs dynamically slowing clock speed
– If enough similar work items are queued, can execute the same code on cores
  ▪ Cache hit rate much higher
– End up generating a larger number of command buffers to prevent thread serialization
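A workload expressed as a graph of small jobs can be sketched as follows. This is an assumption-laden illustration, not the Nitrous scheduler: `Job`, `JobGraph`, `Add`, `DependsOn`, and `Run` are hypothetical names, and the executor is single-threaded so that only the dependency handling is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical job node: a small unit of work plus the jobs waiting on it.
struct Job {
    std::function<void()> work;
    std::vector<std::size_t> dependents; // jobs unblocked when this one finishes
    std::size_t prerequisites;           // number of jobs this one waits for
};

class JobGraph {
public:
    // Adds a job and returns its id.
    std::size_t Add(std::function<void()> work) {
        jobs_.push_back(Job{std::move(work), {}, 0});
        return jobs_.size() - 1;
    }

    // Declares that `job` may only run after `prerequisite` has finished.
    void DependsOn(std::size_t job, std::size_t prerequisite) {
        assert(job < jobs_.size() && prerequisite < jobs_.size());
        jobs_[prerequisite].dependents.push_back(job);
        ++jobs_[job].prerequisites;
    }

    // Runs every job whose prerequisites are met and returns how many ran.
    // Single-threaded stand-in: a real scheduler would let one worker per
    // core pull from the ready queue.
    std::size_t Run() {
        std::vector<std::size_t> pending;
        std::deque<std::size_t> ready;
        for (std::size_t id = 0; id < jobs_.size(); ++id) {
            pending.push_back(jobs_[id].prerequisites);
            if (pending[id] == 0) ready.push_back(id);
        }
        std::size_t executed = 0;
        while (!ready.empty()) {
            const std::size_t id = ready.front();
            ready.pop_front();
            jobs_[id].work();
            ++executed;
            for (std::size_t next : jobs_[id].dependents)
                if (--pending[next] == 0) ready.push_back(next);
        }
        return executed;
    }

private:
    std::vector<Job> jobs_;
};
```

Because readiness is driven only by dependency counts, the order in which jobs are added does not matter, which is what lets many cores pick up whatever is ready.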
HOW NITROUS GENERATES COMMANDS
[Diagram: Core 0, Core 1, and Core 2; Cmd Buffers 0–3 shown in two orderings (0, 1, 2, 3 and 2, 0, 1, 3); Frame Data]
NITROUS COMMAND FORMATS
• In reality, the diagram is oversimplified
• Nitrous has its own internal command format
– Small, efficient commands
– Stateless, each command contains references to all needed state
– Inheritance unneeded
– Separates internal graphics system from any particular API
• Being stateless, commands can be generated completely out of order
• Entire frame is queued up in internal command format
• Frame is translated to GPU commands via Mantle
– Nitrous command buffers are translated into Mantle command buffers in one section
• Gets more optimal use out of instruction cache and data cache
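A stateless command format can be sketched as below. The field names, `DrawCommand`, `TranslatedStats`, and `Translate` are all assumptions made for illustration; the real Nitrous format and its Mantle translation are not shown in the slides. The point is that every command names all the state it needs, so buffers can be filled in any order and each one is translated without inheriting anything from the previous buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stateless command: it references every piece of state it
// needs, so it never depends on commands recorded before it.
struct DrawCommand {
    std::uint32_t shaderBlock;
    std::uint32_t primitive;
    std::uint32_t resourceSet;
    std::uint32_t constantsOffset; // byte offset into frame memory
};

using CommandBuffer = std::vector<DrawCommand>;

// Stand-in for the API-specific output of the translation pass.
struct TranslatedStats {
    std::uint32_t shaderBinds;
    std::uint32_t draws;
};

// Translates the buffers in submission order. State tracking restarts at each
// buffer (no inheritance), and a bind is emitted only when the shader changes.
inline TranslatedStats Translate(const std::vector<CommandBuffer>& buffers) {
    TranslatedStats stats{0, 0};
    for (const CommandBuffer& buffer : buffers) {
        bool haveShader = false;
        std::uint32_t boundShader = 0;
        for (const DrawCommand& cmd : buffer) {
            if (!haveShader || cmd.shaderBlock != boundShader) {
                boundShader = cmd.shaderBlock;
                haveShader = true;
                ++stats.shaderBinds;
            }
            ++stats.draws;
        }
    }
    return stats;
}
```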
BUILDING AROUND ASYNCHRONICITY: HOW NITROUS THINKS OF A FRAME
• Entire app should be exposed to the concept of asynchronicity
• The concept of a frame:
– A set of commands which will be executed on the GPU
– A set of data which will be read by the GPU
– This concept is fundamental in Nitrous, regardless of API
[Diagram: a Frame made of CMD blocks plus Frame Data, alongside persistent Textures, Big Transfer Buffers, and Resource Sets]
CREATING A FRAME, USING FRAME DATA
• Create 2 copies of our frame data
• One will be read by the GPU while the other is being written to by the CPU
• Must use a fence to make sure the CPU doesn’t get ahead
• More complex situations could be explored
• Frame data includes
– Constant data
– Small texture updates
[Diagram: Even Frame and Odd Frame buffers, one in use by the GPU while the other is in use by the CPU]
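The even/odd scheme can be sketched without any graphics API. In this illustration a boolean stands in for the GPU fence, and `FrameData`, `FrameRing`, `Begin`, `Submit`, and `GpuFinished` are hypothetical names chosen for the example:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kFrameBuffers = 2; // even frame / odd frame

struct FrameData {
    std::vector<std::uint8_t> bytes; // constants, small texture updates
    bool gpuBusy = false;            // stand-in for a GPU fence
};

class FrameRing {
public:
    explicit FrameRing(std::size_t capacity) {
        for (FrameData& frame : frames_) frame.bytes.resize(capacity);
    }

    // Returns the buffer the CPU may write for this frame, or nullptr when
    // the GPU has not finished reading it (the CPU must not get ahead).
    FrameData* Begin(std::uint64_t frameNumber) {
        FrameData& frame = Slot(frameNumber);
        return frame.gpuBusy ? nullptr : &frame;
    }

    // The CPU is done writing; the GPU now owns the buffer.
    void Submit(std::uint64_t frameNumber) { Slot(frameNumber).gpuBusy = true; }

    // Called when the fence for that frame signals.
    void GpuFinished(std::uint64_t frameNumber) { Slot(frameNumber).gpuBusy = false; }

private:
    FrameData& Slot(std::uint64_t frameNumber) {
        return frames_[frameNumber % kFrameBuffers];
    }

    std::array<FrameData, kFrameBuffers> frames_;
};
```

With two buffers the CPU can run at most one frame ahead; a third call to `Begin` fails until the oldest frame's fence has signaled.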
STARTING OUR FRAME
// Elapsed time since the previous frame started
g_fTotalFrameTime = System::Time::TimeAsSeconds(System::Time::PeekTime()) - g_StartTotalFrameTime;
g_uFrameBuffer = (g_uFrame % TransferMemory::BUFFERS);
// Debug aid: make this frame's CPU-side transfer memory writable again
if (g_bProtectMemory)
    ProtectMemory(g_CmdTransferMemory.PerFrameMemory[g_uFrameBuffer].pData, g_CmdTransferMemory.Capacity, PO_READWRITE);
// Wait on the fence guarding this buffer so the CPU does not get ahead of the GPU
gr = grWaitForFences(g_Device, 1, &g_FrameFences[g_uFrameBuffer], true, MAX_FRAME_WAIT_TIME_SECONDS);
if (gr == GR_TIMEOUT) // Throw the frame out
    return false;
g_StartTotalFrameTime = System::Time::TimeAsSeconds(System::Time::PeekTime());
// Reset our allocation and map the current GPU memory
HeapAndChunk HeapChunk = g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].Memory;
MarkMemoryChunk(HeapChunk);
GR_GPU_MEMORY DynamicMemory = g_MemoryHeap.MemoryChunk[HeapChunk.Heap][HeapChunk.SubChunk].Memory;
gr = grMapMemory(DynamicMemory, 0, (GR_VOID**) &g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].pData);
// Both per-frame allocators start again from offset 0
g_CmdTransferMemory.PerFrameMemory[g_uFrameBuffer].Offset = 0;
g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].Offset = 0;
g_GPUTransferMemory.pCurrentMantleMemoryBase = g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].pData;
g_GPUTransferMemory.pCurrentMantleMemoryEnd = g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].pData +
                                              g_GPUTransferMemory.Capacity;
g_GPUTransferMemory.CurrentMantleMemory = DynamicMemory;
g_GPUTransferMemory.TempHeapMemory = 0;
return true;
HINT
Use the memory heap that has the highest cpuWritePerfRating
In Debug, rather than copying directly to GPU memory, allocate CPU memory
–Or use pinned Mantle memory
Then use the OS call VirtualProtect with PAGE_NOACCESS for any data that affects the frame while the frame is being accessed by the GPU, or could be being translated by the CPU
If any part of the system inadvertently writes to the memory, it will throw an exception
SOME EXTRA STUFF WE WILL NEED
Because we track hazards, we will want a few more buffers
A delete queue – objects are not deleted, but placed in the delete queue
–One queue per frame, once that frame is complete, items will be deleted
A state transition queue
–Used only when a resource is created, to transition it to the desired initial state
An Unordered Command Queue
–Gets flushed before the main frame’s command queue
–Useful for preparing resources for first time use (e.g. initialization)
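The per-frame delete queue can be sketched like this. It is a simplified illustration under stated assumptions: two frames in flight, destruction expressed as a callback, and hypothetical names (`DeletionQueues`, `Defer`, `Flush`) that do not come from Nitrous.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Assumption: two frames in flight, matching the double-buffered frame data.
constexpr std::size_t kFramesInFlight = 2;

class DeletionQueues {
public:
    // Objects are not destroyed immediately; the destroy callback is parked
    // in the queue of the frame that may still reference the object.
    void Defer(std::uint64_t frameNumber, std::function<void()> destroy) {
        queues_[frameNumber % kFramesInFlight].push_back(std::move(destroy));
    }

    // Call once the GPU has completed `frameNumber` (its fence has signaled).
    // Runs the parked callbacks and returns how many objects were destroyed.
    std::size_t Flush(std::uint64_t frameNumber) {
        std::vector<std::function<void()>>& queue =
            queues_[frameNumber % kFramesInFlight];
        const std::size_t count = queue.size();
        for (std::function<void()>& destroy : queue) destroy();
        queue.clear();
        return count;
    }

private:
    std::array<std::vector<std::function<void()>>, kFramesInFlight> queues_;
};
```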
INTERNAL COMMAND FORMAT
• Nitrous has its own internal command format
• Persistent state:
– Resource Sets
– Shader Blocks
– Various pipeline state
• Frame state; primary construct is a batch set
– Contains primitives, batches and shader sets
– Batches which reference
  ▪ Primitives
  ▪ Shader Sets
– Constant references are made into our frame memory
• Each one of these has a different, natural change frequency
NITROUS MEMORY POOLS
Resources used together, created together
Multiple resource sets are often pooled
Simplifies memory management, fewer than 1000 total allocations
[Diagram: Orange Team Unit’s Memory, one pool holding four resource sets (FIGHTER 1, CAR. REAR, CAR. FOR, CARRIER MAIN), each containing (0) Albedo, (1) Material Mask, (2) Ambient Occlusion, (3) Normal Map, (4) Weathering Map]
NITROUS MEMORY POOLS
• GPU resource allocation is a little tricky: we don’t know ahead of time how big something might be
• 2-step process: first calculate the size of the resource, then allocate a pool based on that size
• Does not map 1:1 to Mantle memory allocations
• Instead, the pool is created with a default page size
• When a new resource is added, either it is placed inside the current allocation or, if the resource is bigger than the page size, a new allocation is created that fits the resource
• A memory pool in Nitrous = a list of allocations in Mantle
• If able to size ahead of time, only 1 allocation
[Diagram: Unit Textures pool (Diffuse, Specular, Mask, AO, Normal) backed by three Mantle allocations]
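The page-based pool can be sketched as a list of allocations with a bump offset in the newest one. Everything here is an illustrative assumption: the names `PoolAllocation`, `Placement`, and `MemoryPool`, the power-of-two alignment parameter, and the choice to open a fresh page when a resource smaller than a page no longer fits in the current one (the slide only describes the oversized case).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One backing allocation (a stand-in for one GPU memory allocation).
struct PoolAllocation {
    std::uint64_t capacity;
    std::uint64_t used;
};

// Where a resource ended up: which allocation, and at what byte offset.
struct Placement {
    std::size_t allocation;
    std::uint64_t offset;
};

class MemoryPool {
public:
    explicit MemoryPool(std::uint64_t pageSize) : pageSize_(pageSize) {
        assert(pageSize > 0);
    }

    // Places a resource in the current allocation when it fits; otherwise
    // opens a new allocation of max(page size, resource size).
    Placement Add(std::uint64_t size, std::uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        if (!allocations_.empty()) {
            PoolAllocation& current = allocations_.back();
            const std::uint64_t offset =
                (current.used + alignment - 1) & ~(alignment - 1);
            if (offset + size <= current.capacity) {
                current.used = offset + size;
                return Placement{allocations_.size() - 1, offset};
            }
        }
        allocations_.push_back(PoolAllocation{std::max(pageSize_, size), size});
        return Placement{allocations_.size() - 1, 0};
    }

    std::size_t AllocationCount() const { return allocations_.size(); }

private:
    std::uint64_t pageSize_;
    std::vector<PoolAllocation> allocations_;
};
```

If the total size is computed up front and used as the page size, every resource lands in a single allocation.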
CREATING A RESOURCE
//Set up our resource descriptions
ResourceDesc RedLUTDesc, BlueLUTDesc, GreenLUTDesc;
RedLUTDesc.Init(OX_FORMAT_R32_FLOAT, 1, 1, RT_TEXTURE_1D, v3ui(1024,1,1), pfRedData, 0);
BlueLUTDesc.Init(OX_FORMAT_R32_FLOAT, 1, 1, RT_TEXTURE_1D, v3ui(1024,1,1), pfBlueData, 0);
GreenLUTDesc.Init(OX_FORMAT_R32_FLOAT, 1, 1, RT_TEXTURE_1D, v3ui(1024,1,1), pfGreenData, 0);
//Calculate Size
uint64 uSize = 0;
uSize += GetResourceMemorySize(RedLUTDesc, Graphics::RF_SHADERRESOURCE, Graphics::HT_GPU);
uSize += GetResourceMemorySize(BlueLUTDesc, Graphics::RF_SHADERRESOURCE, Graphics::HT_GPU);
uSize += GetResourceMemorySize(GreenLUTDesc, Graphics::RF_SHADERRESOURCE, Graphics::HT_GPU);
//Create a GPU heap, sized to what we want
g_GPUMemory = AllocateHeap(uSize, Graphics::HT_GPU);
//Create the Resources
ColorLuts.RedLUT.Resource = CreateResource(RedLUTDesc, RF_SHADERRESOURCE, RSTATE_DEFAULT, g_GPUMemory);
ColorLuts.BlueLUT.Resource = CreateResource(BlueLUTDesc, RF_SHADERRESOURCE, RSTATE_DEFAULT, g_GPUMemory);
ColorLuts.GreenLUT.Resource = CreateResource(GreenLUTDesc, RF_SHADERRESOURCE, RSTATE_DEFAULT, g_GPUMemory);
SOME EXTRA MANAGEMENT REQUIRED
• Creating a resource is slightly more involved
• When a resource creation call occurs, check whether the target is a GPU heap
• If so, there is no way to directly map memory and upload the resource, so
– 1) Allocate or recycle a CPU-visible heap object
– 2) Create the resource and map it into this heap
– 3) Create the resource on the GPU in the specified heap (it will be uninitialized)
– 4) Issue a copy command in our Unordered Command Queue
– 5) Place the temp resource in a deletion queue
• For any resource, we allow a default state to be specified
– At beginning of frame, before we execute main commands, issue any queued state transitions to move resources from default state into desired state
RESOURCE SETS
• In the real world, textures are grouped
• Nitrous has 5 bind points
– 2 for batch
– 2 for shader
– 1 for primitive
• VB is just a resource set
• Nitrous does not allow binding of individual textures
• Clearly, maps 1:1 to a descriptor
Example resource set, Space Fighter 1:
(0) Albedo
(1) Material Mask
(2) Ambient Occlusion
(3) Normal Map
(4) Weathering Map
VERTEX BUFFERS
• Nitrous does not use Vertex Buffers
• Instead, a Resource Set acts as the VB, but with more programmatic control
• Vastly simplifies engine-side management
– VBs can be saved as DDS files
– Do not require a huge amount of loading code for slightly different vertex formats
– Can fold displacement maps and other geometry modifiers into the Primitive Resource Set
• Have not seen strong evidence on any hardware that this causes a performance issue
CONSTANT BUFFERS
• Nitrous does not have a concept of constant buffers
• Instead, all constant data is thrown out every frame
– When we render an object, the CPU will generate the constants needed for that frame
– Grab a piece of the Frame Memory and write to it
• Constant bindings are just references into our frame memory
• But… be careful! The CPU is writing straight to GPU memory. Do NOT read it back!
• Evidence suggests no performance advantage to persisting constants across frames; regenerating every frame is amply fast. 100k+ batches not a problem
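Grabbing a piece of frame memory can be as cheap as bumping an offset. A minimal sketch, assuming a 16-byte alignment for constants and using a plain byte vector as a stand-in for mapped GPU memory; `FrameMemory`, `BatchConstants`, and `kOutOfFrameMemory` are hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

constexpr std::size_t kConstantAlignment = 16; // assumed alignment
constexpr std::size_t kOutOfFrameMemory = static_cast<std::size_t>(-1);

// Example payload; real constants would be whatever the shader expects.
struct BatchConstants {
    float tint[4];
};

class FrameMemory {
public:
    explicit FrameMemory(std::size_t capacity) : storage_(capacity), next_(0) {}

    // Reserves `size` bytes and returns their offset. The atomic add lets
    // many jobs grab pieces concurrently without a lock.
    std::size_t Allocate(std::size_t size) {
        const std::size_t padded =
            (size + kConstantAlignment - 1) & ~(kConstantAlignment - 1);
        const std::size_t start =
            next_.fetch_add(padded, std::memory_order_relaxed);
        if (start + padded > storage_.size()) return kOutOfFrameMemory;
        return start;
    }

    // Write-only access: constants are copied in and never read back.
    template <typename T>
    std::size_t Write(const T& constants) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "constants must be plain data");
        const std::size_t offset = Allocate(sizeof(T));
        if (offset != kOutOfFrameMemory)
            std::memcpy(storage_.data() + offset, &constants, sizeof(T));
        return offset;
    }

    // Everything is thrown out at the start of the next use of this buffer.
    void Reset() { next_.store(0, std::memory_order_relaxed); }

private:
    std::vector<std::uint8_t> storage_; // stand-in for mapped GPU memory
    std::atomic<std::size_t> next_;
};
```

The returned offset is what a constant binding would reference; the class deliberately exposes no read path.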
A BATCH IN NITROUS CONSISTS OF 4 PARTS
[Diagram: a Batch Set containing Prim 0–2, Shader 0–1, and Batch 0–4]
Primitive: IB, Resources, Tri info
Shader: Resources (2), Constants (2), Shader Block
Batch: Primitive, Shader, Resources (2), Constants (2)
Batch Set: Batches, Primitives, Shaders, RTs, Blend State
DESCRIPTOR TABLE LAYOUT FOR NITROUS
Descriptor 0
*Batch Resource Set 0
*Batch Resource Set 1
Batch Constants 1
Batch Constants 2
*Shader Resource Set 0
*Shader Resource Set 1
Shader Constants 0
Shader Constants 1
*UAV
*Samplers (only 1 global bank)
Descriptor 1
*Primitive VB
Dynamic Const
Batch Constants 0
DESCRIPTOR BINDING STRATEGY
• Remember: descriptors are just structures in GPU memory, so they need to be double buffered as well
• Create 1 giant descriptor table, start updating at beginning of frame
• Recognize that we have a resource bind vector of only 9 items
• Each bind vector can be built into a descriptor table, but we don’t need a unique one each time
• Check whether this bind vector has been built before during this frame (e.g. resident in a small cache); if so, just reference it
• If not, build a new descriptor table and place it in the cache
• Dynamic constants (batch constant 0) use grCmdBindDynamicMemoryView
– Usually, this will change every call (e.g. some part of the batch is changing or else it’s the same batch)
• Using grCmdBindDynamicMemoryView, for 100k batches, only about 5-10k descriptors actually need to get built per frame
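The per-frame lookup can be sketched as a cache keyed by the 9-item bind vector. This is an illustration only: `BindVector`, `DescriptorCache`, and its methods are hypothetical, `std::map` stands in for whatever small cache the engine uses, and no descriptors are actually written to GPU memory.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>

// The 9 resource bindings of a batch, identified here by plain ids.
using BindVector = std::array<std::uint32_t, 9>;

class DescriptorCache {
public:
    // Returns the slot of the descriptor table for this bind vector. A table
    // is built only the first time the vector is seen during the frame.
    std::uint32_t Resolve(const BindVector& binds) {
        const auto found = slots_.find(binds);
        if (found != slots_.end()) return found->second;
        const std::uint32_t slot = built_++;
        // A real implementation would write the descriptors for `binds`
        // into the big per-frame descriptor table at `slot` here.
        slots_.emplace(binds, slot);
        return slot;
    }

    std::uint32_t BuiltThisFrame() const { return built_; }

    // Descriptor memory is double buffered, so the cache restarts each frame.
    void BeginFrame() {
        slots_.clear();
        built_ = 0;
    }

private:
    std::map<BindVector, std::uint32_t> slots_;
    std::uint32_t built_ = 0;
};
```

Batches that differ only in their dynamic constants share a bind vector, which is why far fewer tables than batches get built.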
TRACKING RESOURCE USAGE
• App’s responsibility to track what resources get used
• Simple strategy: stamp a frame number on each memory pool any time it is bound
• Traverse the complete resource list; anything which matches the current frame must be resident
• Quick as long as we keep the number of heaps reasonable
• Important: frame number should be padded out to a cache line to avoid serialization

Heap description        Last Frame Used
UI Textures intro       2401
UI Textures in Game     17204
Orange Faction Units    17204
Purple Faction Units    17204
Weapon effects          16392
Post Process RTs        17204
Terrain Heightmap       17204
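The stamping strategy can be sketched as follows, assuming 64-byte cache lines; `HeapUsage` and `ResidencyTracker` are illustrative names. Several threads may stamp the same heap in the same frame, which is the benign "stamp mode" write, since they all store the same value.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One stamp per heap, padded to its own cache line (assumed 64 bytes) so
// that threads stamping different heaps never write to the same line.
struct alignas(64) HeapUsage {
    std::atomic<std::uint32_t> lastFrameUsed{0};
};

class ResidencyTracker {
public:
    explicit ResidencyTracker(std::size_t heapCount) : heaps_(heapCount) {}

    // Called whenever something from the heap is bound.
    void MarkUsed(std::size_t heap, std::uint32_t frame) {
        heaps_[heap].lastFrameUsed.store(frame, std::memory_order_relaxed);
    }

    // Walks the complete list; every heap stamped with this frame number
    // must be resident when the frame is submitted.
    std::vector<std::size_t> Resident(std::uint32_t frame) const {
        std::vector<std::size_t> resident;
        for (std::size_t heap = 0; heap < heaps_.size(); ++heap)
            if (heaps_[heap].lastFrameUsed.load(std::memory_order_relaxed) == frame)
                resident.push_back(heap);
        return resident;
    }

private:
    std::vector<HeapUsage> heaps_;
};
```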
DEALING WITH STATE TRANSITIONS
• Most important, and most difficult, part of Mantle
• Must understand any time a resource is used in a different way:
– Read After Write
– Write After Write
SHADER BLOCKS
• Shader Blocks
– Group of shaders with identical resources
– Key point: all shader stages grouped together
– All resources are bound to all stages
– For Mantle, need to add some extra data
  ▪ Can we blend?
  ▪ What back buffer formats might be used?
  ▪ What z buffer formats might be used?
– Create a matrix of pipeline objects based on specified modes
  ▪ The right pipeline object is selected based on current RT state
  ▪ RTs and blend state already chunked, no extra state changes introduced
ShaderGroup SimpleShader
{
    ResourceSetPrimitive = VertexData;
    ConstantSetDynamic[0] = DynamicData;
    ResourceSetBatch[1] = UserTS;
    ConstantSetShader[0] = Globals;
    RenderTargetFormats = R16G16B16A16_FLOAT,
                          R11G11B10_FLOAT;
    BlendStates = BlendOff;
    DepthTargetFormats = D32_FLOAT;
    Methods
    {
        main:
            CodeBlocks = SimpleShaders;
            VertexShader = SimpleVSShader;
            PixelShader = SimplePSShader;
        zprime:
            CodeBlocks = SimpleShaders;
            VertexShader = SimpleVSShader;
            PixelShader = BlankSimplePSShader;
    }
}
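The matrix of pipeline objects can be sketched as a lookup keyed by render target format, blend state, and depth format. The enums, `PipelineKey`, and `ShaderBlock` below are hypothetical, `BlendAlpha` is an invented second blend state, pipelines are represented by plain ids, and the per-method dimension (main, zprime) is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

enum class Format : std::uint8_t { R16G16B16A16_FLOAT, R11G11B10_FLOAT, D32_FLOAT };
enum class Blend : std::uint8_t { BlendOff, BlendAlpha };

// (render target format, blend state, depth target format)
using PipelineKey = std::tuple<Format, Blend, Format>;

class ShaderBlock {
public:
    // Builds one pipeline id per combination of the declared modes.
    ShaderBlock(const std::vector<Format>& renderTargets,
                const std::vector<Blend>& blends,
                const std::vector<Format>& depthTargets) {
        std::uint32_t next = 0;
        for (Format rt : renderTargets)
            for (Blend blend : blends)
                for (Format depth : depthTargets)
                    pipelines_.emplace(PipelineKey(rt, blend, depth), next++);
    }

    // Picks the pipeline matching the current render target state, or
    // returns -1 when the block was not declared for that combination.
    int Select(Format rt, Blend blend, Format depth) const {
        const auto found = pipelines_.find(PipelineKey(rt, blend, depth));
        return found == pipelines_.end() ? -1 : static_cast<int>(found->second);
    }

    std::size_t PipelineCount() const { return pipelines_.size(); }

private:
    std::map<PipelineKey, std::uint32_t> pipelines_;
};
```

For the SimpleShader declaration (two render target formats, one blend state, one depth format) this yields two pipelines per method.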
CREATING SHADER BLOCKS IN MANTLE
• Translate HLSL byte code to Mantle IC
– All done at compile time; have a Mantle-specific executable
• Creating a mapping table
– Batch has 5 bind points
– Shader has 4 bind points
– Batch Set has 1 bind point
– Primitive has 1 bind point
– Global Samplers have 1 bind point
• Set up our IC so all pipeline objects use exactly the same top-level descriptor
WHAT ABOUT THAT PRESENT?
• Unlike with other APIs, we do not need to, and should not, block on the present on the main thread
• Instead we spawn a job, which we block against on the next present
void PresentJob()
{
    …
    result = grQueueSubmit(g_UniversalQueue, g_cCommandBuffers, g_CommandBuffers,
                           cMemRefs, MemRef, g_FrameFences[g_uSubmittingFrameBuffer]);
    uint32 PresentFlags = 0;
    if (g_bVSync)
        PresentFlags = GR_WSI_WIN_PRESENT_FLIP_DONOTWAIT;
    // Instruct the GPU to present the back buffer in the application's window
    GR_WSI_WIN_PRESENT_INFO presentInfo =
    {
        g_hWnd, g_MasterResourceList.Images[DR_BACKBUFFER],
        GR_WSI_WIN_PRESENT_MODE_BLT, 0, PresentFlags
    };
    result = grWsiWinQueuePresent(g_UniversalQueue, &presentInfo);
    SignalProcessAndPresentDone(pInfo);
}
}
WHAT OUR FRAME SUBMISSION LOOKS LIKE
1) Block on the last frame’s present job (i.e. NOT the fence, the actual job we spawned)
2) Process any pending resource transitions from newly created resources
3) Generate all pending unordered commands into 1 or more cmd buffers
4) Send signals to the issuers of unordered commands, to notify them the commands are submitted
5) Begin translation of Nitrous cmds into Mantle cmds – usually 100-500 jobs across all cores
6) Flush the deletion queues for this frame (likely a few frames old at this point)
7) Any item in our master deletion queue, add to the now empty deletion queue for this frame
8) Handle memory readbacks
9) Spawn Present job
FUTURE WORK
• Now have explicit control over multi-GPU
• Can write better MGPU solutions, like split screen, which will not increase latency
– We just got rid of a bunch of latency, don’t want to add it back!
• Asymmetric GPU use situations are doable, e.g. using integrated graphics in tandem with a discrete GPU
RESULTS
• Star Swarm surprised both Oxide and AMD
– We were not expecting to see cases where the application was 300-400% faster; still room for optimizations
– Right now, we are clearly GPU bound; will release an update soon that increases CPU utilization a little bit to optimize GPU, expecting 10-20% more performance out of Mantle on high-end GPUs
• Driver overhead very consistent, well correlated to number of calls made
• About 2 man-months of work
– For an alpha API; likely 1 month with a final version
• Especially telling on slower CPUs; surprising number of cases with high-end GPUs paired with old CPUs
• Try for yourself: Star Swarm is free to download on Steam!

More Related Content

What's hot

SiteGround Tech TeamBuilding
SiteGround Tech TeamBuildingSiteGround Tech TeamBuilding
SiteGround Tech TeamBuildingMarian Marinov
 
Trivadis TechEvent 2016 cgroups im Einsatz von Florian Feicht
Trivadis TechEvent 2016 cgroups im Einsatz von Florian FeichtTrivadis TechEvent 2016 cgroups im Einsatz von Florian Feicht
Trivadis TechEvent 2016 cgroups im Einsatz von Florian FeichtTrivadis
 
Gpu application in cuda memory
Gpu application in cuda memoryGpu application in cuda memory
Gpu application in cuda memoryjournalacij
 
LSA2 - 02 Control Groups
LSA2 - 02   Control GroupsLSA2 - 02   Control Groups
LSA2 - 02 Control GroupsMarian Marinov
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architectureDhaval Kaneria
 

What's hot (10)

SiteGround Tech TeamBuilding
SiteGround Tech TeamBuildingSiteGround Tech TeamBuilding
SiteGround Tech TeamBuilding
 
Trivadis TechEvent 2016 cgroups im Einsatz von Florian Feicht
Trivadis TechEvent 2016 cgroups im Einsatz von Florian FeichtTrivadis TechEvent 2016 cgroups im Einsatz von Florian Feicht
Trivadis TechEvent 2016 cgroups im Einsatz von Florian Feicht
 
Gpu perf-presentation
Gpu perf-presentationGpu perf-presentation
Gpu perf-presentation
 
Gpu application in cuda memory
Gpu application in cuda memoryGpu application in cuda memory
Gpu application in cuda memory
 
Technical Skills.pdf
Technical Skills.pdfTechnical Skills.pdf
Technical Skills.pdf
 
Memperf
MemperfMemperf
Memperf
 
Modern processors
Modern processorsModern processors
Modern processors
 
LSA2 - 02 Control Groups
LSA2 - 02   Control GroupsLSA2 - 02   Control Groups
LSA2 - 02 Control Groups
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architecture
 
Nvprof um 1
Nvprof um 1Nvprof um 1
Nvprof um 1
 

Similar to Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AMD at GDC14

Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLinaro
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellAMD Developer Central
 
The ideal and reality of NVDIMM RAS
The ideal and reality of NVDIMM RASThe ideal and reality of NVDIMM RAS
The ideal and reality of NVDIMM RASYasunori Goto
 
DB2 for z/OS - Starter's guide to memory monitoring and control
DB2 for z/OS - Starter's guide to memory monitoring and controlDB2 for z/OS - Starter's guide to memory monitoring and control
DB2 for z/OS - Starter's guide to memory monitoring and controlFlorence Dubois
 
2014 valat-phd-defense-slides
2014 valat-phd-defense-slides2014 valat-phd-defense-slides
2014 valat-phd-defense-slidesSébastien Valat
 
How to Measure RTOS Performance
How to Measure RTOS Performance How to Measure RTOS Performance
How to Measure RTOS Performance mentoresd
 
SanDisk: Persistent Memory and Cassandra
SanDisk: Persistent Memory and CassandraSanDisk: Persistent Memory and Cassandra
SanDisk: Persistent Memory and CassandraDataStax Academy
 
Nt1310 Unit 3 Computer Components
Nt1310 Unit 3 Computer ComponentsNt1310 Unit 3 Computer Components
Nt1310 Unit 3 Computer ComponentsKristi Anderson
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornAMD Developer Central
 
IP Address Lookup By Using GPU
IP Address Lookup By Using GPUIP Address Lookup By Using GPU
IP Address Lookup By Using GPUJino Antony
 
eBPF in the view of a storage developer
eBPF in the view of a storage developereBPF in the view of a storage developer
eBPF in the view of a storage developerRichárd Kovács
 
Map SMAC Algorithm onto GPU
Map SMAC Algorithm onto GPUMap SMAC Algorithm onto GPU
Map SMAC Algorithm onto GPUZhengjie Lu
 

Similar to Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AMD at GDC14 (20)

Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
Rendering Battlefield 4 with Mantle
Rendering Battlefield 4 with MantleRendering Battlefield 4 with Mantle
Rendering Battlefield 4 with Mantle
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics Upstreaming
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
 
The ideal and reality of NVDIMM RAS
The ideal and reality of NVDIMM RASThe ideal and reality of NVDIMM RAS
The ideal and reality of NVDIMM RAS
 
DB2 for z/OS - Starter's guide to memory monitoring and control
DB2 for z/OS - Starter's guide to memory monitoring and controlDB2 for z/OS - Starter's guide to memory monitoring and control
DB2 for z/OS - Starter's guide to memory monitoring and control
 
2014 valat-phd-defense-slides
2014 valat-phd-defense-slides2014 valat-phd-defense-slides
2014 valat-phd-defense-slides
 
How to Measure RTOS Performance
How to Measure RTOS Performance How to Measure RTOS Performance
How to Measure RTOS Performance
 
Linux Huge Pages
Linux Huge PagesLinux Huge Pages
Linux Huge Pages
 
SanDisk: Persistent Memory and Cassandra
SanDisk: Persistent Memory and CassandraSanDisk: Persistent Memory and Cassandra
SanDisk: Persistent Memory and Cassandra
 
Notes on NUMA architecture
Notes on NUMA architectureNotes on NUMA architecture
Notes on NUMA architecture
 
Nt1310 Unit 3 Computer Components
Nt1310 Unit 3 Computer ComponentsNt1310 Unit 3 Computer Components
Nt1310 Unit 3 Computer Components
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
20120140505010
2012014050501020120140505010
20120140505010
 
IP Address Lookup By Using GPU
IP Address Lookup By Using GPUIP Address Lookup By Using GPU
IP Address Lookup By Using GPU
 
eBPF in the view of a storage developer
eBPF in the view of a storage developereBPF in the view of a storage developer
eBPF in the view of a storage developer
 
Map SMAC Algorithm onto GPU
Map SMAC Algorithm onto GPUMap SMAC Algorithm onto GPU
Map SMAC Algorithm onto GPU
 
Co question 2006
Co question 2006Co question 2006
Co question 2006
 
Graphics processing unit
Graphics processing unitGraphics processing unit
Graphics processing unit
 
AIX Performance Tuning Session at STU2017
AIX Performance Tuning Session at STU2017AIX Performance Tuning Session at STU2017
AIX Performance Tuning Session at STU2017
 

More from AMD Developer Central

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesAMD Developer Central
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceAMD Developer Central
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...AMD Developer Central
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozAMD Developer Central
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonAMD Developer Central
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevAMD Developer Central
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasAMD Developer Central
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14AMD Developer Central
 
Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14AMD Developer Central
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14AMD Developer Central
 

Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AMD at GDC14

  • 1. NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games
  • 2. 2 | Nitrous and Mantle | 19 March 2014 PRE-REQUISITE MOTIVATIONAL SLIDE MODERN APIS ARE STARTING TO FEEL RATHER DATED BUT HOW MUCH BETTER CAN WE BE?
  • 3. PRE-REQUISITE MOTIVATIONAL SLIDE: TURNS OUT… A WHOLE LOT FASTER
  • 4. HONEY, DOES THIS DRESS MAKE ME LOOK FAT? …
  • 5. STATE OF THE ART TODAY: WHAT’S GOING ON?
      - Lots of little things add up
      - 2 major problems require rearchitecture
          - Functional threading model throws a wrench into task-based systems
          - Implicit hazard tracking and synchronization
      - API tries to hide the async nature of the GPU
      - Lots of little things: memory model, binding model, etc.
      - Analysis of features like instancing indicates that they are unreliable and tend to speed up only the fastest frames; correlation between batches and driver perf is casual
      - Can’t retrofit old APIs
  • 6. DIVING INTO NITROUS
      - Nitrous = Oxide’s custom engine
      - Specifically designed for high throughput
      - Core neutral. Main thread acts only as a lightweight sequencer
      - All work divided up into small jobs, which are in the microsecond range
      - Can produce lots of jobs, in the 10,000+ range per frame
  • 7. STAR SWARM
      - Nitrous Engine demo
      - Free to download, experiment
      - Proof of concept for modern API design
      - Represents 2 AI opponents, thus application CPU load is realistic
      - 10,000 units possible
      - 100,000+ batches possible
  • 8. SECRETS BEHIND STAR SWARM
      - Much of what is required for high performance isn’t specific to Mantle
      - Star Swarm was originally not based on Mantle
      - If the engine is structured in certain ways, Mantle support is straightforward and intuitive. Maybe even fun.
      - Work done to restructure the engine will have benefits outside of Mantle support
  • 9. ADDING NITROUS TO THE ENGINE
      - Rendering broken into jobs which generate autonomous command buffers
      - CPU to GPU data streamlined: constants, texture updates go into GPU frame memory
      - Shader bindings standardized
      - Shaders, state, bundled into blocks
      - Resources grouped into sets
      - Graphics commands streamlined, restricted bind points
      - Stateless command format
      - Expensive state transitioned rarely
      - Much attention paid to cache usage, lockless data structures
      - All hazards detangled, all buffers considered non-persistent
  • 10. MULTI-CORE CPU BASICS
      Be wary, there is a lot of very bad advice in the wild:
      - Spawning threads to handle tasks
      - Relying on the OS preemptive scheduler, heavyweight OS synchronization primitives
      - Functional threading in general
      Your survival guide:
      - OK: Multi-thread read of same location
      - OK: Multi-thread write to different locations
      - OK: Multi-thread write to same location in ‘stamp’ mode
      - CAUTION: Atomic instructions
      - STOP: Multi-thread read/write to same location
      - STOP: Multi-thread write to same CACHE line
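The last rule on that slide (never let several threads write to the same cache line) can be illustrated with a small layout sketch. This is not Nitrous code: `kCacheLine`, `PaddedSlot`, `SharesCacheLine`, and `Accumulate` are hypothetical names, and the 64-byte line size is an assumption that holds on most x86 parts but should be checked per target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; 64 bytes is typical on x86 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// One slot per worker thread, aligned and padded so each slot owns a whole line.
struct alignas(kCacheLine) PaddedSlot {
    std::uint64_t value = 0;
};
static_assert(sizeof(PaddedSlot) == kCacheLine, "slot must fill exactly one line");

// Returns true when two objects start on the same cache line.
template <typename T>
bool SharesCacheLine(const T& a, const T& b) {
    const auto lineA = reinterpret_cast<std::uintptr_t>(&a) / kCacheLine;
    const auto lineB = reinterpret_cast<std::uintptr_t>(&b) / kCacheLine;
    return lineA == lineB;
}

// Hypothetical per-worker accumulation: worker i only ever writes slots[i],
// which is the "multi-thread write to different locations" case.
inline void Accumulate(PaddedSlot* slots, std::size_t worker, std::uint64_t amount) {
    assert(slots != nullptr);
    slots[worker].value += amount;
}
```

With this layout, each worker can update its own slot without invalidating a neighbour's line; a single thread sums the slots after the jobs complete.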
  • 11. NITROUS AND MANTLE
      - Nitrous is NOT built around Mantle
      - The reverse is more true: Mantle adapts well to Nitrous internal concepts
      - The concepts are what make the engine fast
      - Results are astounding, driver time reduced up to 50x
      - Mantle is the harbinger of future API design, not just in graphics
  • 12. TASK BASED SYSTEM
      - Idea is that the workload is a constructed graph of much smaller nuggets
      - Many advantages
          - Scales well, 32+ cores
          - Easy to balance workload
          - More power efficient: more, slower cores are just as good
      - Already seeing CPUs dynamically slowing clock speed
          - If enough similar work items are queued, can execute the same code on cores
      - Cache hit rate much higher
          - End up generating a larger number of command buffers to prevent thread serialization
  • 13. HOW NITROUS GENERATES COMMANDS
      [Diagram: Cores 0-2 build Cmd Buffers 0-3 from Frame Data; the buffers appear once in order (0, 1, 2, 3) and once out of order (2, 0, 1, 3).]
  • 14. NITROUS COMMAND FORMATS
      - In reality, the diagram is oversimplified
      - Nitrous has its own internal command format
          - Small, efficient commands
          - Stateless, each command contains references to all needed state
          - Inheritance unneeded
          - Separates internal graphics system from any particular API
      - Being stateless, commands can be generated completely out of order
      - Entire frame is queued up in internal command format
      - Frame is translated to GPU commands via Mantle
          - Nitrous command buffers are translated into Mantle command buffers at one section
      - Gets more optimal use out of instruction cache and data cache
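A rough sketch of why a stateless format allows out-of-order generation: if every command carries all of its own state references plus an ordering key, jobs can append commands in any order and a later translation pass can sort them and work out which binds are actually needed. The struct layout, the `sortKey` field, and `TranslateFrame` are assumptions made for illustration; the slides do not show the real Nitrous command layout or any Mantle calls for this step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stateless command: nothing is inherited from a previous command.
struct DrawCommand {
    std::uint32_t sortKey;          // intended submission order
    std::uint32_t shaderBlock;      // reference to a shader block / pipeline
    std::uint32_t primitive;        // reference to a primitive resource set
    std::uint32_t batchResources;   // reference to a batch resource set
    std::uint32_t constantsOffset;  // offset into this frame's memory
};

// Translation pass: order the commands, then record a pipeline bind only when
// the shader block differs from the previous command. A real backend would
// emit API commands here; this sketch just returns the bind sequence.
std::vector<std::uint32_t> TranslateFrame(std::vector<DrawCommand> cmds) {
    std::stable_sort(cmds.begin(), cmds.end(),
                     [](const DrawCommand& a, const DrawCommand& b) {
                         return a.sortKey < b.sortKey;
                     });
    std::vector<std::uint32_t> pipelineBinds;
    for (const DrawCommand& c : cmds) {
        if (pipelineBinds.empty() || pipelineBinds.back() != c.shaderBlock)
            pipelineBinds.push_back(c.shaderBlock);
    }
    return pipelineBinds;
}
```

Because redundant-state filtering happens in the translation pass, the jobs that generate commands never need to know what was bound before them.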
  • 15. BUILDING AROUND ASYNCHRONICITY: HOW NITROUS THINKS OF A FRAME
      - Entire app should be exposed to the concept of asynchronicity
      - The concept of a frame:
          - A set of commands which will be executed on the GPU
          - A set of data which will be read by the GPU
          - This concept is fundamental in Nitrous, regardless of API
      [Diagram labels: Frame (CMD ×4, Frame Data); Persistent; Textures; Big Transfer Buffers; Resource Sets]
  • 16. CREATING A FRAME, USING FRAME DATA
      - Create 2 copies of our frame data
      - One will be read by the GPU, while the other is being written to by the CPU
      - Must use a fence to make sure the CPU doesn’t get ahead
      - More complex situations could be explored
      - Frame data includes
          - Constant data
          - Small texture updates
      [Diagram labels: Even Frame, Odd Frame, GPU, CPU]
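The even/odd scheme can be sketched without any graphics API by standing in a simple counter for the GPU fence. Everything here (`FakeFence`, `FrameRing`, `GpuFinished`) is a hypothetical stand-in; in the real engine the wait is a Mantle fence wait, as slide 17 shows.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kBuffers = 2;  // "2 copies of our frame data"

// Stand-in for a GPU fence: idle when the last submitted frame has finished.
struct FakeFence {
    std::int64_t submittedFrame = -1;  // -1: nothing submitted yet
    std::int64_t signaledFrame = -1;
    bool IsIdle() const { return signaledFrame == submittedFrame; }
};

struct FrameRing {
    std::array<std::vector<std::uint8_t>, kBuffers> data;
    std::array<FakeFence, kBuffers> fences;
    std::uint64_t frame = 0;

    // Returns the buffer the CPU may fill for the current frame, or nullptr
    // if the GPU is still reading that copy (the CPU must not get ahead).
    std::vector<std::uint8_t>* Begin() {
        const std::uint32_t slot = static_cast<std::uint32_t>(frame % kBuffers);
        if (!fences[slot].IsIdle()) return nullptr;
        data[slot].clear();
        return &data[slot];
    }

    // Hands the current frame to the (simulated) GPU and advances.
    void Submit() {
        fences[frame % kBuffers].submittedFrame = static_cast<std::int64_t>(frame);
        ++frame;
    }

    // Simulates the GPU signalling completion of frame f.
    void GpuFinished(std::uint64_t f) {
        FakeFence& fence = fences[f % kBuffers];
        assert(fence.submittedFrame == static_cast<std::int64_t>(f));
        fence.signaledFrame = static_cast<std::int64_t>(f);
    }
};
```

In this sketch the CPU can run at most one frame ahead: frame 2 cannot begin until the GPU has finished frame 0, because both use the even buffer.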
  • 17. STARTING OUR FRAME
      g_fTotalFrameTime = System::Time::TimeAsSeconds(System::Time::PeekTime()) - g_StartTotalFrameTime;
      g_uFrameBuffer = (g_uFrame % TransferMemory::BUFFERS);
      if(g_bProtectMemory)
          ProtectMemory(g_CmdTransferMemory.PerFrameMemory[g_uFrameBuffer].pData,
                        g_CmdTransferMemory.Capacity, PO_READWRITE);
      gr = grWaitForFences(g_Device, 1, &g_FrameFences[g_uFrameBuffer], true, MAX_FRAME_WAIT_TIME_SECONDS);
      if(gr == GR_TIMEOUT) // Throw the frame out
          return false;
      g_StartTotalFrameTime = System::Time::TimeAsSeconds(System::Time::PeekTime());
      //Reset our allocation and map the current GPU memory
      HeapAndChunk HeapChunk = g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].Memory;
      MarkMemoryChunk(HeapChunk);
      GR_GPU_MEMORY DynamicMemory = g_MemoryHeap.MemoryChunk[HeapChunk.Heap][HeapChunk.SubChunk].Memory;
      gr = grMapMemory(DynamicMemory, 0, (GR_VOID**) &g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].pData);
      g_CmdTransferMemory.PerFrameMemory[g_uFrameBuffer].Offset = 0;
      g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].Offset = 0;
      g_GPUTransferMemory.pCurrentMantleMemoryBase = g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].pData;
      g_GPUTransferMemory.pCurrentMantleMemoryEnd = g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].pData + g_GPUTransferMemory.Capacity;
      g_GPUTransferMemory.CurrentMantleMemory = DynamicMemory;
      g_GPUTransferMemory.TempHeapMemory = 0;
      return true;
  • 18. HINT
      - Use the memory heap that has the highest cpuWritePerfRating
      - In Debug, rather than copying directly to GPU memory, allocate CPU memory
          - Or use pinned Mantle memory
      - Then, use the OS call VirtualProtect with PAGE_NOACCESS for any data that affects the frame, while the frame is being accessed by the GPU or could be being translated by the CPU
      - If any part of the system inadvertently writes to the memory, it will throw an exception
  • 19. SOME EXTRA STUFF WE WILL NEED
      - Because we track hazards, we will want a few more buffers
      - A delete queue: objects are not deleted, but placed in the delete queue
          - One queue per frame; once that frame is complete, items will be deleted
      - A state transition queue
          - Used only when a resource is created, to transition it to the desired initial state
      - An Unordered Command Queue
          - Gets flushed before the main frame’s command queue
          - Useful for preparing resources for first-time use (e.g. initialization)
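The per-frame delete queue described above might look roughly like the following. `DeferredDeleter`, `Retire`, and `OnFrameComplete` are illustrative names only, and the destroy callbacks stand in for whatever object destruction the engine actually performs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// One queue per in-flight frame. Objects are not destroyed when released;
// they are parked until the frame that may still reference them completes.
class DeferredDeleter {
public:
    explicit DeferredDeleter(std::size_t framesInFlight) : queues_(framesInFlight) {
        assert(framesInFlight > 0);
    }

    // Park a destroy callback against the frame currently being built.
    void Retire(std::size_t frame, std::function<void()> destroy) {
        queues_[frame % queues_.size()].push_back(std::move(destroy));
    }

    // Call once the GPU has finished `frame`. Returns how many objects were destroyed.
    std::size_t OnFrameComplete(std::size_t frame) {
        std::vector<std::function<void()>>& queue = queues_[frame % queues_.size()];
        for (std::function<void()>& destroy : queue) destroy();
        const std::size_t count = queue.size();
        queue.clear();
        return count;
    }

private:
    std::vector<std::vector<std::function<void()>>> queues_;
};
```

The caller is expected to invoke `OnFrameComplete` only after the fence for that frame has signalled; the sketch does not check this itself.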
  • 20. INTERNAL COMMAND FORMAT
      - Nitrous has its own internal command format
      - Persistent state:
          - Resource Sets
          - Shader Blocks
          - Various pipeline state
      - Frame state, primary construct is a batch set
          - Contains primitives, batches and shader sets
          - Batches which reference
              - Primitives
              - Shader Sets
          - Constant references are made into our frame memory
      - Each one of these has a different, natural change frequency
  • 21. NITROUS MEMORY POOLS
      - Resources used together are created together
      - Multiple resource sets are often pooled
      - Simplifies memory management, fewer than 1000 total allocations
      [Diagram: Orange Team Unit’s Memory pool holding four resource sets (FIGHTER 1, CAR. REAR, CAR. FOR, CARRIER MAIN), each with (0) Albedo, (1) Material Mask, (2) Ambient Occlusion, (3) Normal Map, (4) Weathering Map]
  • 22. NITROUS MEMORY POOLS
      - GPU resource allocation is a little tricky: we don’t know ahead of time how big something might be
      - 2-step process: first calculate the size of the resource, then allocate the pool based on that size
      - Does not map 1:1 to Mantle memory allocations
      - Instead, the pool is created with a default page size
      - When a new resource is added, either it is placed inside the current allocation or, if the resource is bigger than the page size, a new allocation that fits the resource is created
      - A memory pool in Nitrous = a list of allocations in Mantle
      - If able to size ahead of time, only 1 allocation
      [Diagram: a Unit Textures pool (Diffuse, Specular, Mask, AO, Normal) backed by three Mantle allocations]
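A minimal sketch of the page-based pool described above, under stated assumptions: sizes are in bytes, alignment requirements are ignored, and `MemoryPool`, `Allocation`, and `Placement` are hypothetical names. Each `Allocation` stands in for one Mantle memory allocation; no Mantle calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Allocation {
    std::uint64_t capacity;  // size of the backing allocation
    std::uint64_t used;      // bytes handed out so far
};

struct Placement {
    std::size_t allocation;  // index into the pool's allocation list
    std::uint64_t offset;    // byte offset inside that allocation
};

class MemoryPool {
public:
    explicit MemoryPool(std::uint64_t pageSize) : pageSize_(pageSize) {
        assert(pageSize > 0);
    }

    // Place a resource in the current allocation if it fits; otherwise open a
    // new allocation of the page size, or larger when the resource needs it.
    Placement Add(std::uint64_t size) {
        if (!allocations_.empty()) {
            Allocation& current = allocations_.back();
            if (current.capacity - current.used >= size) {
                const Placement placed{allocations_.size() - 1, current.used};
                current.used += size;
                return placed;
            }
        }
        const std::uint64_t capacity = size > pageSize_ ? size : pageSize_;
        allocations_.push_back(Allocation{capacity, size});
        return Placement{allocations_.size() - 1, 0};
    }

    std::size_t AllocationCount() const { return allocations_.size(); }

private:
    std::uint64_t pageSize_;
    std::vector<Allocation> allocations_;
};
```

If the total size is computed up front and passed as the page size, every resource lands in a single allocation, which is the "only 1 allocation" case from the slide.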
  • 23. CREATING A RESOURCE
      //Set up our resource descriptions
      ResourceDesc RedLUTDesc, BlueLUTDesc, GreenLUTDesc;
      RedLUTDesc.Init(OX_FORMAT_R32_FLOAT, 1, 1, RT_TEXTURE_1D, v3ui(1024,1,1), pfRedData, 0);
      BlueLUTDesc.Init(OX_FORMAT_R32_FLOAT, 1, 1, RT_TEXTURE_1D, v3ui(1024,1,1), pfBlueData, 0);
      GreenLUTDesc.Init(OX_FORMAT_R32_FLOAT, 1, 1, RT_TEXTURE_1D, v3ui(1024,1,1), pfGreenData, 0);
      //Calculate size
      uint64 uSize = 0;
      uSize += GetResourceMemorySize(RedLUTDesc, Graphics::RF_SHADERRESOURCE, Graphics::HT_GPU);
      uSize += GetResourceMemorySize(BlueLUTDesc, Graphics::RF_SHADERRESOURCE, Graphics::HT_GPU);
      uSize += GetResourceMemorySize(GreenLUTDesc, Graphics::RF_SHADERRESOURCE, Graphics::HT_GPU);
      //Create a GPU heap, sized to what we want
      g_GPUMemory = AllocateHeap(uSize, Graphics::HT_GPU);
      //Create the resources
      ColorLuts.RedLUT.Resource = CreateResource(RedLUTDesc, RF_SHADERRESOURCE, RSTATE_DEFAULT, g_GPUMemory);
      ColorLuts.BlueLUT.Resource = CreateResource(BlueLUTDesc, RF_SHADERRESOURCE, RSTATE_DEFAULT, g_GPUMemory);
      ColorLuts.GreenLUT.Resource = CreateResource(GreenLUTDesc, RF_SHADERRESOURCE, RSTATE_DEFAULT, g_GPUMemory);
  • 24. SOME EXTRA MANAGEMENT REQUIRED
      - Creating a resource is slightly more involved
      - When a resource creation call occurs, check to see if we are a GPU heap
      - If so, there is no way to directly map memory and upload the resource, so
          1) Allocate, or recycle, a CPU-visible heap object
          2) Create the resource and map it into this heap
          3) Create the resource on the GPU in the specified heap (it will be uninitialized)
          4) Issue a copy command in our Unordered Command Queue
          5) Place the temp resource in a deletion queue
      - For any resource, we allow a default state to be specified
          - At the beginning of the frame, before we execute main commands, issue any state transition queues to place resources from the default state into the desired state
  • 25. RESOURCE SETS
      - In the real world, textures are grouped
      - Nitrous has 5 bind points
          - 2 for batch
          - 2 for shader
          - 1 for primitive
      - VB is just a resource set
      - Nitrous does not allow binding of individual textures
      - Clearly, maps 1:1 to a descriptor
      [Diagram: resource set "Space Fighter 1" with (0) Albedo, (1) Material Mask, (2) Ambient Occlusion, (3) Normal Map, (4) Weathering Map]
  • 26. VERTEX BUFFERS
      - Nitrous does not use Vertex Buffers
      - Instead, a Resource Set acts as the VB, but with more programmatic control
      - Vastly simplifies engine-side management
          - VBs can be saved as DDS files
          - Do not require a huge amount of loading code for slightly different vertex formats
          - Can fold displacement maps and other geometry modifiers into the Primitive Resource Set
      - Have not seen strong evidence on any hardware that this causes a performance issue
  • 27. CONSTANT BUFFERS
      - Nitrous does not have a concept of constant buffers
      - Instead, all constant data is thrown out every frame
          - When we render an object, the CPU will generate the constants needed for that frame
          - Grab a piece of the Frame Memory and write to it
      - Constant bindings are just references into our frame memory
      - But… be careful! The CPU is writing straight to GPU memory. Do NOT read it back!
      - Evidence suggests no performance advantage to persisting constants across frames; regenerating every frame is amply fast. 100k+ batches are not a problem
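"Grab a piece of the Frame Memory and write to it" can be sketched as a bump allocator over the mapped per-frame buffer. This is an illustration, not the Nitrous allocator: `FrameMemory`, `kAlign`, and `kInvalid` are hypothetical, the 16-byte alignment is an assumption, and the shared atomic offset is only one option (slide 10 lists atomics under CAUTION; handing each job a private sub-range would avoid them).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

class FrameMemory {
public:
    static constexpr std::size_t kAlign = 16;                 // assumed alignment
    static constexpr std::size_t kInvalid = ~std::size_t(0);  // out-of-space marker

    FrameMemory(std::uint8_t* base, std::size_t capacity)
        : base_(base), capacity_(capacity), offset_(0) {
        assert(base != nullptr);
    }

    // Called at the start of a frame, after the fence for this buffer has passed.
    void Reset() { offset_.store(0, std::memory_order_relaxed); }

    // Copies constants into frame memory and returns their byte offset, which
    // the command then stores as its constant reference. Write-only: the
    // destination may be GPU memory and is never read back.
    std::size_t Write(const void* src, std::size_t size) {
        const std::size_t padded = (size + kAlign - 1) & ~(kAlign - 1);
        const std::size_t start = offset_.fetch_add(padded, std::memory_order_relaxed);
        if (start + padded > capacity_) return kInvalid;
        std::memcpy(base_ + start, src, size);
        return start;
    }

private:
    std::uint8_t* base_;
    std::size_t capacity_;
    std::atomic<std::size_t> offset_;
};
```

Nothing is ever freed individually; the whole buffer is reclaimed by `Reset` once the GPU is done with that frame.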
  • 28. A BATCH IN NITROUS CONSISTS OF 4 PARTS
      [Diagram: a Batch Set (with RTs and Blend State) contains Primitives (Prim 0-2), Shaders (Shader 0-1), and Batches (Batch 0-4). Primitive: IB, Resources, Tri info. Shader: Resources (2), Constants (2), Shader Block. Batch: Primitive, Shader, Resources (2), Constants (2).]
  • 29. DESCRIPTOR TABLE LAYOUT FOR NITROUS
      Descriptor 0
      - *Batch Resource Set 0
      - *Batch Resource Set 1
      - Batch Constants 1
      - Batch Constants 2
      - *Shader Resource Set 0
      - *Shader Resource Set 1
      - Shader Constants 0
      - Shader Constants 1
      - *UAV
      - *Samplers (only 1 global bank)
      Descriptor 1
      - *Primitive VB
      - Dynamic Const Batch Constants 0
  • 30. DESCRIPTOR BINDING STRATEGY
      - Remember: descriptors are just structures in GPU memory, so they need to be double buffered as well
      - Create 1 giant descriptor table, start updating at the beginning of the frame
      - Recognize that we have a resource bind vector of only 9 items
      - Each bind vector can be built into a descriptor table, but we don’t need a unique one
      - Check to see if this bind vector has been built before (during this frame), e.g. is resident in a small cache; if so, just reference it
      - If not, build a new descriptor table, and place it in the cache
      - Dynamic constants, batch constant 0, use grCmdBindDynamicMemoryView
          - Usually, this will change every call (e.g. some part of the batch is changing or else it’s the same batch)
      - Using grCmdBindDynamicMemoryView, for 100k batches, about 5-10k descriptors actually need to get built per frame
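The "build once per frame, then reference" idea can be sketched as a cache keyed by the 9-item bind vector. The resource ids, `DescriptorCache`, and the use of `std::map` are assumptions for illustration; the slides only say the cache is small, and a real implementation would write descriptors into the per-frame table rather than hand out an index.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>

// Nine resource bind slots, as on the slide; the ids themselves are hypothetical.
using BindVector = std::array<std::uint32_t, 9>;

class DescriptorCache {
public:
    // Descriptors are rebuilt every frame, so the cache is cleared at frame start.
    void BeginFrame() {
        cache_.clear();
        built_ = 0;
    }

    // Returns the index of the descriptor table for this bind vector,
    // building a new one only if it has not been seen during this frame.
    std::uint32_t Lookup(const BindVector& binds) {
        const auto found = cache_.find(binds);
        if (found != cache_.end()) return found->second;
        const std::uint32_t index = built_++;
        cache_.emplace(binds, index);
        return index;
    }

    std::uint32_t BuiltThisFrame() const { return built_; }

private:
    std::map<BindVector, std::uint32_t> cache_;
    std::uint32_t built_ = 0;
};
```

Per-batch dynamic constants stay out of the key, which is why far fewer tables than batches need to be built.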
  • 31. TRACKING RESOURCE USAGE
      - It is the app’s responsibility to track which resources get used
      - Simple strategy: stamp a frame number on each memory pool any time it is bound
      - Traverse the complete resource list; anything which matches the current frame must be resident
      - Quick as long as we keep the number of heaps reasonable
      - Important: the frame number should be padded into a cache line to avoid serialization

      Heap description        Last Frame Used
      UI Textures intro       2401
      UI Textures in Game     17204
      Orange Faction Units    17204
      Purple Faction Units    17204
      Weapon effects          16392
      Post Process RTs        17204
      Terrain Heightmap       17204
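The stamping strategy above can be sketched as follows. `HeapUsage`, `Stamp`, and `ResidentHeaps` are hypothetical names, the 64-byte line size is an assumption, and the relaxed atomic store is this sketch's way of expressing "stamp mode" (every writer stores the same frame number) without a data race in standard C++; the real engine may simply use a plain store.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One record per heap, padded to its own cache line so jobs stamping
// different heaps do not serialize on a shared line.
struct alignas(64) HeapUsage {
    std::atomic<std::uint64_t> lastFrameUsed{0};
    std::uint32_t id = 0;
};

// Called whenever a pool is bound. All threads write the same value in a
// given frame, so the order of the writes does not matter.
inline void Stamp(HeapUsage& heap, std::uint64_t frame) {
    heap.lastFrameUsed.store(frame, std::memory_order_relaxed);
}

// At submission time: every heap stamped with the current frame must be resident.
inline std::vector<std::uint32_t> ResidentHeaps(const HeapUsage* heaps,
                                                std::size_t count,
                                                std::uint64_t frame) {
    assert(heaps != nullptr || count == 0);
    std::vector<std::uint32_t> resident;
    for (std::size_t i = 0; i < count; ++i) {
        if (heaps[i].lastFrameUsed.load(std::memory_order_relaxed) == frame)
            resident.push_back(heaps[i].id);
    }
    return resident;
}
```

Using the table above, a traversal for frame 17204 would skip "UI Textures intro" and "Weapon effects" and report the other five heaps.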
  • 32. DEALING WITH STATE TRANSITIONS
      - Most important, most difficult part of Mantle
      - Must understand any time a resource is getting used in a different way:
          - Read After Write
          - Write After Write
  • 33. SHADER BLOCKS
      - Shader Blocks
          - Group of shaders with identical resources
          - Key point: all shader stages grouped together
          - All resources are bound to all stages
          - For Mantle, need to add some extra data
              - Can we blend?
              - What back buffer formats might be used?
              - What z buffer formats might be used?
          - Create a matrix of pipeline objects based on the specified modes
      - The right pipeline object is selected based on current RT state
      - RTs and blend state already chunked, no extra state changes introduced

      ShaderGroup SimpleShader
      {
          ResourceSetPrimitive = VertexData;
          ConstantSetDynamic[0] = DynamicData;
          ResourceSetBatch[1] = UserTS;
          ConstantSetShader[0] = Globals;
          RenderTargetFormats = R16G16B16A16_FLOAT, R11G11B10_FLOAT;
          BlendStates = BlendOff;
          DepthTargetFormats = D32_FLOAT;
          Methods
          {
              main:
                  CodeBlocks = SimpleShaders;
                  VertexShader = SimpleVSShader;
                  PixelShader = SimplePSShader;
              zprime:
                  CodeBlocks = SimpleShaders;
                  VertexShader = SimpleVSShader;
                  PixelShader = BlankSimplePSShader;
          }
      }
  • 34. CREATING SHADER BLOCKS IN MANTLE
      - Translate HLSL byte code to Mantle IC
          - All done at compile time, have a Mantle-specific executable
      - Creating a mapping table
          - Batch has 5 bind points
          - Shader has 4 bind points
          - Batch Set has 1 bind point
          - Primitive has 1 bind point
          - Global Samplers have 1 bind point
      - Set up our IC so all pipeline objects use exactly the same top-level descriptor
  • 35. WHAT ABOUT THAT PRESENT?
      - Unlike with other APIs, we neither need to nor should block on the present on the main thread
      - Instead we spawn a job, which we block against on the next present

      void PresentJob()
      {
          …
          result = grQueueSubmit(g_UniversalQueue, g_cCommandBuffers, g_CommandBuffers,
                                 cMemRefs, MemRef, g_FrameFences[g_uSubmittingFrameBuffer]);
          uint32 PresentFlags = 0;
          if(g_bVSync)
              PresentFlags = GR_WSI_WIN_PRESENT_FLIP_DONOTWAIT;
          // instruct the GPU to present the backbuffer in the application's window
          GR_WSI_WIN_PRESENT_INFO presentInfo =
          {
              g_hWnd,
              g_MasterResourceList.Images[DR_BACKBUFFER],
              GR_WSI_WIN_PRESENT_MODE_BLT,
              0,
              PresentFlags
          };
          result = grWsiWinQueuePresent(g_UniversalQueue, &presentInfo);
          SignalProcessAndPresentDone(pInfo);
          }
      }
  • 36. WHAT OUR FRAME SUBMISSION LOOKS LIKE
      1) Block on the last frame’s present job (i.e. NOT the fence, the actual job we spawned)
      2) Process any pending resource transitions from newly created resources
      3) Generate all pending unordered commands, by generating into 1 or more cmd buffers
      4) Send signals to the issuers of unordered commands, to notify them the commands are submitted
      5) Begin translation of Nitrous cmds into Mantle cmds, usually 100-500 jobs across all cores
      6) Flush the deletion queues for this frame (likely a few frames old at this point)
      7) Add any item in our master deletion queue to the now empty deletion queue for this frame
      8) Handle memory readbacks
      9) Spawn the Present job
  • 37. FUTURE WORK
      - Now have explicit control over multi-GPU
      - Can write better MGPU solutions, like split screen, which will not increase latency
          - We just got rid of a bunch of latency, don’t want to add it back!
      - Asymmetric GPU use situations are doable
          - e.g. using integrated graphics in tandem with a discrete GPU
  • 38. RESULTS
      - Star Swarm surprised both Oxide and AMD
          - We were not expecting to see cases where the application was 300-400% faster, and there is still room for optimizations
          - Right now, we are clearly GPU bound; will release an update soon that increases CPU utilization a little bit to optimize the GPU, expecting 10-20% more performance out of Mantle on high-end GPUs
      - Driver overhead very consistent, well correlated to the number of calls made
      - About 2 man-months of work
          - For an Alpha API; likely 1 month if final version
      - Especially telling on slower CPUs; surprising number of cases with high-end GPUs paired with old CPUs
      - Try for yourself: Star Swarm is free to download on Steam!