Windows to Reality: Getting the Most out of Direct3D 10 Graphics in Your Games
Transcript of "Windows to Reality: Getting the Most out of Direct3D 10 Graphics in Your Games"

  1. Windows to Reality: Getting the Most out of Direct3D 10 Graphics in Your Games. Shanon Drone, Software Development Engineer, XNA Developer Connection, Microsoft.
  2. Key areas: Debug Layer, Draw Calls, Constant Updates, State Management, Shader Linkage, Resource Updates, Dynamic Geometry, Porting Tips.
  3. Debug Layer: Use it! The D3D10 debug layer can help find performance issues. The application enables it by passing D3D10_CREATE_DEVICE_DEBUG into D3D10CreateDevice. Use the D3DX10 debug runtime by linking against D3DX10d.lib. Only do this for debug builds! Look for performance warnings in the debug output.
  4. Draw Calls: Draw calls are still “not free”. Draw overhead is reduced in D3D10, but not enough that you can be lazy. Efficiency in the number of draw calls will still give a performance win.
  5. Draw Calls: Excess baggage. An increase in the number of draw calls generally increases the number of API calls associated with those draws: ConstantBuffer updates, resource changes (VBs, IBs, textures), and InputLayout changes. These all have effects on performance that vary with draw call count.
  6. Constant Updates: Updating shader constants was often a bottleneck in D3D9. It can still be a bottleneck in D3D10. The main difference between the two is the new Constant Buffer object in D3D10. This is the largest section of this talk.
  7. Constant Updates: Constant Buffer Recap. Constant Buffers are buffer objects that hold shader constant data. They are updated by calling Map with D3D10_MAP_WRITE_DISCARD or by calling UpdateSubresource. There are 16 Constant Buffer slots available to each shader in the pipeline. Try not to use all 16, to leave some headroom.
  8. Constant Updates: Porting Issues. D3D9 constants were updated individually by calling SetXXXXXShaderConstantX. In D3D10, you have to update the entire constant buffer all at once. A naïve port from D3D9 to D3D10 can have crippling performance implications if Constant Buffers are not handled correctly! Rule of thumb: do not update more data than you need to.
  9. Constant Updates: Naïve Port, AKA how to cripple perf. Each shader uses one big constant buffer. Submitting one value submits them all! If you have one 4096-byte Constant Buffer and you only need to update your World matrix, you will still have to update 4096 bytes of data and send it across the bus. Don’t do this!
  10. Constant Updates: Naïve Port, AKA how to cripple perf. Scene: 100 skinned meshes (100 materials), 900 static meshes (400 materials), 1 shadow pass + 1 lighting pass.

        cbuffer VSGlobalsCB    // 6560 bytes
        {
            matrix ViewProj;
            matrix Bones[100];
            matrix World;
            float SpecPower;
            float4 BDRFCoefficients;
            float AppTime;
            uint2 RenderTargetSize;
        };

      Shadow pass:
        Update VSGlobalsCB (skinned meshes): 6560 bytes x 100 = 656,000 bytes
        Update VSGlobalsCB (static meshes): 6560 bytes x 900 = 5,904,000 bytes
      Light pass:
        Update VSGlobalsCB (skinned meshes): 6560 bytes x 100 = 656,000 bytes
        Update VSGlobalsCB (static meshes): 6560 bytes x 900 = 5,904,000 bytes
      Total = 13,120,000 bytes
  11. Constant Updates: Organize Constants. The first step is to organize constants by frequency of update. One shader will generally be used to draw several objects, and some data in this shader doesn’t need to be set for every draw (for example, Time and ViewProj matrices). Split these out into their own buffers.
  12. Constant Updates: the same scene with constants split by update frequency.

        cbuffer VSGlobalPerFrameCB    // 4 bytes
        {
            float AppTime;
        };
        cbuffer VSPerSkinnedCB        // 6400 bytes
        {
            matrix Bones[100];
        };
        cbuffer VSPerStaticCB         // 64 bytes
        {
            matrix World;
        };
        cbuffer VSPerPassCB           // 72 bytes
        {
            matrix ViewProj;
            uint2 RenderTargetSize;
        };
        cbuffer VSPerMaterialCB       // 20 bytes
        {
            float SpecPower;
            float4 BDRFCoefficients;
        };

      Begin frame:
        Update VSGlobalPerFrameCB: 4 bytes x 1 = 4 bytes
        Update VSPerSkinnedCBs: 6400 bytes x 100 = 640,000 bytes
        Update VSPerStaticCBs: 64 bytes x 900 = 57,600 bytes
      Shadow pass:
        Update VSPerPassCB: 72 bytes x 1 = 72 bytes
      Light pass:
        Update VSPerPassCB: 72 bytes x 1 = 72 bytes
        Update VSPerMaterialCBs: 20 bytes x 500 = 10,000 bytes
      Total = 707,748 bytes
  13. Constant Updates: 13,120,000 bytes / 707,748 bytes ≈ 18x less constant data per frame.
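The arithmetic behind the two totals can be checked with a short C++ sketch. The byte sizes and object counts are the ones quoted on the slides; the function and constant names are made up for illustration, and the sizes are raw sums that ignore any padding a real constant buffer might need.

```cpp
#include <cstdint>

// Object counts taken from the slide's example scene.
constexpr std::uint64_t kSkinnedMeshes = 100;
constexpr std::uint64_t kStaticMeshes  = 900;
constexpr std::uint64_t kMaterials     = 500;  // 100 skinned + 400 static
constexpr std::uint64_t kPasses        = 2;    // shadow + light

// Naive port: the single 6560-byte buffer is re-sent for every mesh, every pass.
constexpr std::uint64_t naiveBytesPerFrame() {
    return 6560 * (kSkinnedMeshes + kStaticMeshes) * kPasses;
}

// Constants split by update frequency (hypothetical helper, slide figures).
constexpr std::uint64_t organizedBytesPerFrame() {
    return 4    * 1               // per frame: AppTime
         + 6400 * kSkinnedMeshes  // per skinned mesh: Bones[100]
         + 64   * kStaticMeshes   // per static mesh: World
         + 72   * kPasses         // per pass: ViewProj + RenderTargetSize
         + 20   * kMaterials;     // per material: SpecPower + BDRFCoefficients
}
```

The integer ratio of the two results is 18, which matches the slide's headline figure.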
  14. Constant Updates: Managing Buffers. Constant buffers need to be managed in the application. Creating a few buffers that are used for all shader constants just won’t work, because large buffers mean we update more data than necessary.
  15. Constant Updates: Managing Buffers, Solution 1 (Fastest). Create Constant Buffers that line up exactly with the number of elements of each frequency group: global CBs, CBs per mesh, CBs per material, CBs per pass. This ensures that EVERY constant buffer is no larger than it absolutely needs to be. It also ensures the most efficient update of CBs based upon frequency.
  16. Constant Updates: Managing Buffers, Solution 2 (Second Best). If you cannot create CBs that line up exactly with elements, you can create a tiered constant buffer system. Create arrays of 32-byte, 64-byte, 128-byte, 256-byte, etc. constant buffers. Keep a shadow copy of the constant data in system memory. When it comes time to render, select the smallest CB from the array that will hold the necessary constant data. You may have to resubmit redundant data for separate passes. Hybrid approach?
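The "select the smallest CB that will hold the data" step of the tiered scheme could look roughly like the C++ sketch below. The tier list and the `selectTier` name are assumptions for illustration; a real engine would hold an array of actual constant buffers per tier rather than just sizes.

```cpp
#include <cstddef>

// Hypothetical tier sizes in bytes, following the slide's 32/64/128/256... progression.
constexpr std::size_t kTierSizes[] = {32, 64, 128, 256, 512, 1024, 2048, 4096};
constexpr std::size_t kTierCount = sizeof(kTierSizes) / sizeof(kTierSizes[0]);

// Returns the index of the smallest tier that can hold `bytes`,
// or kTierCount when the data does not fit in any tier.
std::size_t selectTier(std::size_t bytes) {
    for (std::size_t i = 0; i < kTierCount; ++i) {
        if (kTierSizes[i] >= bytes) {
            return i;
        }
    }
    return kTierCount;
}
```

With these tiers, a 72-byte per-pass block would land in the 128-byte tier, so some slack is the price of not matching buffers exactly to their contents.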
  17. Constant Updates: Case Study, Skinning using Solution 1. Skinning in D3D9 (or a bad D3D10 port): multiple passes cause redundant bone data uploads to the GPU. Skinning in D3D10: using Constant Buffers, we only need to upload the bone data once.
  18. Constant Updates: D3D9 Version, or Naïve D3D10 Version (diagram). A single block of constant data (Bone0 through BoneN) is overwritten with each mesh's bones before every draw. Pass1: Set Mesh1 Bones, Draw Mesh1, Set Mesh2 Bones, Draw Mesh2. Pass2: Set Mesh1 Bones, Draw Mesh1, Set Mesh2 Bones, Draw Mesh2.
  19. Constant Updates: Preferred D3D10 Version (diagram). Mesh1 CB holds Mesh1 Bone0 through BoneN and Mesh2 CB holds Mesh2 Bone0 through BoneN; CB Slot 0 points at whichever of the two is bound. Frame start: Update Mesh1 CB, Update Mesh2 CB. Pass1: Bind Mesh1 CB, Draw Mesh1, Bind Mesh2 CB, Draw Mesh2. Pass2: Bind Mesh1 CB, Draw Mesh1, Bind Mesh2 CB, Draw Mesh2.
  20. Constant Updates: Advanced D3D10 Version. Why not store all of our characters’ bones in a 128-bit FP texture? We can upload bones for all visible characters at the start of a frame. We can draw similar characters using instancing instead of individual draws. Use SV_InstanceID to select the start of the character’s bone data in the texture. Stream the skinned meshes to memory using Stream Output and render all subsequent passes from the post-skinned buffer.
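One way to picture the instance-to-texture addressing is the C++ sketch below, which mirrors the index math a vertex shader might do. The layout is an assumption, not something the slides specify: each bone is a 4x4 matrix stored as four consecutive 128-bit texels (one float4 row each), every character owns a contiguous run of bones, and texels are laid out row by row across the texture width. `boneRowTexel` and `Texel` are hypothetical names.

```cpp
#include <cstdint>

// Integer texel coordinate inside the (assumed) bone texture.
struct Texel {
    std::uint32_t x;
    std::uint32_t y;
};

// Assumption: one 4x4 bone matrix occupies four float4 texels.
constexpr std::uint32_t kTexelsPerBone = 4;

// Maps (instance, bone, matrix row) to a texel, given how many bones each
// character reserves and how wide the texture is.
Texel boneRowTexel(std::uint32_t instanceId, std::uint32_t boneIndex,
                   std::uint32_t row, std::uint32_t bonesPerCharacter,
                   std::uint32_t textureWidth) {
    const std::uint32_t linear =
        (instanceId * bonesPerCharacter + boneIndex) * kTexelsPerBone + row;
    return Texel{linear % textureWidth, linear / textureWidth};
}
```

In a shader, `instanceId` would come from SV_InstanceID and the result would feed a texture load; packing three rows per bone instead of four is a common variant that only changes `kTexelsPerBone`.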
  21. State Management: Individual state setting is no longer possible in D3D10. State in D3D10 is stored in state objects. These state objects are immutable. Changing even one aspect of a state object requires that you create an entirely new state object with that one change.
  22. State Management: Managing State Objects, Solution 1 (Fastest). If you have a known set of materials and required states, you can create all state objects at load time. State objects are small and there is a finite set of permutations. With all state objects already created at load time, all that needs to be done during rendering is to bind the object.
  23. State Management: Managing State Objects, Solution 2 (Second Best). If your content is not finalized, or if you CANNOT get your engine to lump state together, create a state object hash table. Hash off of the setting that has the most unique states. Grab pre-created states from the hash table. Why not give your tools pipeline the ability to do this for a level and save out the results?
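A state object hash table along these lines might be sketched in C++ as follows. Everything here is a stand-in: `BlendDesc` is a simplified, hypothetical description (not the real D3D10_BLEND_DESC), `StateObject` represents an immutable state object, and the creation step is where an engine would call the actual device.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>

// Hypothetical, simplified state description used as the hash key.
struct BlendDesc {
    bool blendEnable;
    std::uint8_t srcBlend;
    std::uint8_t destBlend;
    std::uint8_t writeMask;

    bool operator==(const BlendDesc& o) const {
        return blendEnable == o.blendEnable && srcBlend == o.srcBlend &&
               destBlend == o.destBlend && writeMask == o.writeMask;
    }
};

// Packs the four fields into 32 bits and hashes that value.
struct BlendDescHash {
    std::size_t operator()(const BlendDesc& d) const {
        const std::uint32_t packed = (std::uint32_t(d.blendEnable) << 24) |
                                     (std::uint32_t(d.srcBlend) << 16) |
                                     (std::uint32_t(d.destBlend) << 8) |
                                     std::uint32_t(d.writeMask);
        return std::hash<std::uint32_t>{}(packed);
    }
};

// Stand-in for an immutable state object created by the device.
struct StateObject {
    BlendDesc desc;
};

class StateCache {
public:
    // Returns the cached object for `desc`, creating it only on first request.
    const StateObject* get(const BlendDesc& desc) {
        auto it = cache_.find(desc);
        if (it == cache_.end()) {
            ++created_;
            it = cache_.emplace(desc,
                                std::make_unique<StateObject>(StateObject{desc}))
                     .first;
        }
        return it->second.get();
    }

    std::size_t createdCount() const { return created_; }

private:
    std::unordered_map<BlendDesc, std::unique_ptr<StateObject>, BlendDescHash> cache_;
    std::size_t created_ = 0;
};
```

Saving the set of keys seen during a play-through would give the tools pipeline what it needs to pre-create the same objects at load time.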
  24. Shader Linkage: D3D9 shader linkage was based on semantics (POSITION, NORMAL, TEXCOORDN). D3D10 linkage is based on offsets and sizes. This means stricter linkage rules. It also means that the driver doesn’t have to link shaders together at every draw call!
  25. Shader Linkage: No Holes Allowed! Elements must be read in the order they are output from the previous stage. You cannot have “holes” between linkages. Holes at the end are OK.

        struct VS_OUTPUT
        {
            float3 Norm : NORMAL;
            float2 Tex  : TEXCOORD0;
            float2 Tex2 : TEXCOORD1;
            float4 Pos  : SV_POSITION;
        };

        struct PS_INPUT
        {
            float3 Norm : NORMAL;
            float2 Tex  : TEXCOORD0;
            float2 Tex2 : TEXCOORD1;
        };
  26. Shader Linkage: Input Assembler to Vertex Shader. Input Layouts define the signature of the vertex stream data. Input Layouts are similar to Vertex Declarations in D3D9, but the strict linkage rules are a big difference. Creating Input Layouts on the fly is not recommended. CreateInputLayout requires a shader signature to validate against.
  27. Shader Linkage: Input Assembler to Vertex Shader, Solution 1 (Fastest). Create an Input Layout for each unique Vertex Stream / Vertex Shader combination up front. Input Layouts are small. This assumes that the shader input signature is available when you call CreateInputLayout. Try to normalize Input Layouts across a level, or have them art directed.
  28. Shader Linkage: Input Assembler to Vertex Shader, Solution 2 (Second Best). If you load meshes and create input layouts before loading shaders, you might have a problem. You can use a hashing scheme similar to the one used for State Objects. When the Input Layout is needed, search the hash for an Input Layout that matches the Vertex Stream and Vertex Shader signature. Why not store this data to a file and pre-populate the Input Layouts after your content is tuned?
  29. Shader Linkage: Aside, Instancing. Instancing is a first-class citizen on D3D10! Stream source frequency is now part of the Input Layout. Multiple frequencies will mean multiple Input Layouts.
  30. Resource Updates: Updating resources is different in D3D10. The Create / Lock / Fill / Unlock paradigm is no longer necessary (although you can still do it). Texture data can be passed into the texture at create time.
  31. Resource Updates: Resource Usage Types. D3D10_USAGE_DEFAULT, D3D10_USAGE_IMMUTABLE, D3D10_USAGE_DYNAMIC, D3D10_USAGE_STAGING.
  32. Resource Updates: D3D10_USAGE_DEFAULT. Use for resources that need fast GPU read and write access. Can only be updated using UpdateSubresource. Render targets are good candidates. Textures that are updated infrequently (less than once per frame) are good candidates.
  33. Resource Updates: D3D10_USAGE_IMMUTABLE. Use for resources that need fast GPU read access only. Once they are created, they cannot be updated... ever. Initial data must be passed in during the creation call. Resources that will never change (static textures, VBs / IBs) are good candidates. Don’t bend over backwards trying to make everything D3D10_USAGE_IMMUTABLE.
  34. Resource Updates: D3D10_USAGE_DYNAMIC. Use for resources that need fast CPU write access (at the expense of slower GPU read access). No CPU read access. Can only be updated using Map with D3D10_MAP_WRITE_DISCARD or D3D10_MAP_WRITE_NO_OVERWRITE. Dynamic Vertex Buffers are good candidates. Dynamic textures (updated more than once per frame) are good candidates.
  35. Resource Updates: D3D10_USAGE_STAGING. This is the only way to read data back from the GPU. Can only be updated using Map. Cannot map with D3D10_MAP_WRITE_DISCARD or D3D10_MAP_WRITE_NO_OVERWRITE. You might want to double buffer to keep from stalling the GPU. The GPU cannot directly use these.
  36. Resource Updates: Summary. CPU updates the resource frequently (more than once per frame): use D3D10_USAGE_DYNAMIC. CPU updates the resource infrequently (once per frame or less): use D3D10_USAGE_DEFAULT. CPU doesn’t update the resource: use D3D10_USAGE_IMMUTABLE. CPU needs to read the resource: use D3D10_USAGE_STAGING.
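The summary rules can be written down as a small decision function. This C++ sketch uses a local enum that only mirrors the names of the D3D10_USAGE values (it does not include the Direct3D headers), and `CpuAccess` and `chooseUsage` are hypothetical names. Constant buffers are an exception to these rules in the talk and are not modeled here.

```cpp
// Local enum mirroring the D3D10_USAGE names; not the real Direct3D type.
enum class Usage { Default, Immutable, Dynamic, Staging };

// Hypothetical description of how the CPU touches a resource.
struct CpuAccess {
    bool cpuReads;           // CPU needs to read the data back
    double updatesPerFrame;  // average CPU writes per frame (0 means never)
};

// Applies the slide's rules in order: readback first, then write frequency.
Usage chooseUsage(const CpuAccess& access) {
    if (access.cpuReads) {
        return Usage::Staging;    // only staging resources can be read back
    }
    if (access.updatesPerFrame <= 0.0) {
        return Usage::Immutable;  // never updated after creation
    }
    if (access.updatesPerFrame > 1.0) {
        return Usage::Dynamic;    // more than once per frame
    }
    return Usage::Default;        // once per frame or less
}
```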
  37. Resource Updates: Example, Vertex Buffer. If the vertex buffer is touched by the CPU less than once per frame, create it with D3D10_USAGE_DEFAULT and update it with UpdateSubresource. If the vertex buffer is used for dynamic geometry and the CPU needs to update it multiple times per frame, create it with D3D10_USAGE_DYNAMIC and update it with Map.
  38. Resource Updates: The Exception, Constant Buffers. CBs are always expected to be updated frequently. Select CB usage based upon which one causes the least amount of system memory to be transferred: not just to the GPU, but system-to-system memory copies as well.
  39. Resource Updates: UpdateSubresource. UpdateSubresource requires a system memory buffer and incurs an extra copy. Use it if you have system copies of your constant data already in one place.
  40. Resource Updates: Map. Map requires no extra system memory but may hit driver renaming limits if abused. Use it if compositing values on the fly or collecting values from other places.
  41. Resource Updates: A note on overusing discard. Use D3D10_MAP_WRITE_DISCARD carefully with buffers! D3D10_MAP_WRITE_DISCARD tells the driver to give us a new memory buffer if the current one is busy. There is a LIMITED set of temporary buffers. If these run out, your app will stall until another buffer can be freed. This can happen if you do dynamic geometry using one VB and D3D10_MAP_WRITE_DISCARD.
  42. Dynamic Geometry: DrawIndexedPrimitiveUP is gone! DrawPrimitiveUP is gone! Your well-behaved D3D9 app isn’t using these anyway, right?
  43. Dynamic Geometry: Solution, same as in D3D9. Use one large buffer, and map it with D3D10_MAP_WRITE_NO_OVERWRITE. Advance the write position with every draw, and wrap to the beginning when you reach the end. Make sure your buffer is large enough that you’re not overwriting data that the GPU is reading. This is what happens under the covers for D3D9 when using DIPUP or DUP in Windows Vista.
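The write-position bookkeeping for such a buffer might look like the C++ sketch below. It only tracks offsets; the actual Map call with D3D10_MAP_WRITE_NO_OVERWRITE and the copy into the mapped pointer are left to the caller. `RingWriter` and `Allocation` are hypothetical names, and the sketch does not check whether the GPU has finished with a region, which is why the slide insists on a generously sized buffer.

```cpp
#include <cassert>
#include <cstddef>

// Where a draw's vertex data should be written, and whether the writer
// wrapped back to the start of the buffer to find room.
struct Allocation {
    std::size_t offset;
    bool wrapped;
};

class RingWriter {
public:
    explicit RingWriter(std::size_t capacityBytes) : capacity_(capacityBytes) {}

    // Reserves `bytes` at the current write position, wrapping to offset 0
    // when the request would run past the end of the buffer.
    Allocation allocate(std::size_t bytes) {
        assert(bytes <= capacity_);
        bool wrapped = false;
        if (writePos_ + bytes > capacity_) {
            writePos_ = 0;
            wrapped = true;
        }
        const Allocation result{writePos_, wrapped};
        writePos_ += bytes;
        return result;
    }

private:
    std::size_t capacity_;
    std::size_t writePos_ = 0;
};
```

Since Map takes no offset in D3D10, the returned `offset` would be added to the mapped pointer before writing, and also used as the start location for the draw.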
  44. Porting Tips: StretchRect is gone; work around it using render-to-texture. A8R8G8B8 formats have been replaced with R8G8B8A8 formats; swizzle on texture load or swizzle in the shader. Fixed-function alpha test is gone; add logic to the shader and call discard. Fixed-function fog is gone; add it to the shader.
  45. Porting Tips (continued): User clip plane usage has changed. They’ve moved to the shader; experiment with the SV_ClipDistance semantic vs. discard in the PS to determine which is faster for your shader. Query data sizes might have changed: occlusion queries are UINT64 vs. DWORD. There is no triangle fan support; work around this in the content pipeline or on load. SetCursorProperties and ShowCursor are gone; use Win32 APIs to handle cursors now.
  46. Porting Tips (continued): No offsets on Map calls. This was basically API clutter in D3D9; calculate the offset from the returned pointer. Clears are no longer bound to pipeline state. If you want a clear call to respect scissor, stencil, or other state, draw a full-screen quad. This is closer to the HW; the driver/HW has been doing this for you for years. OMSetBlendState: never set the SampleMask to 0 in OMSetBlendState.
  47. Porting Tips (continued): Input Layout conversions have been tightened up. In D3D9, D3DDECLTYPE_UBYTE4 in the vertex stream could be converted to a float4 in the VS, i.e. 255u in the stream would show up as 255.0 in the VS. In D3D10 you either get a normalized [0..1] value or a 255 (u)int. Register keyword: it doesn’t mean the same thing in D3D10. Use register to determine which CB slot a CB binds to. Use packoffset to place a variable inside a CB.
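The difference between the three interpretations of a byte in the vertex stream can be illustrated with plain C++. These helpers are illustrative only and model the value the shader would see, not any Direct3D call; the function names are made up.

```cpp
#include <cstdint>

// D3D9-style behavior described on the slide: the byte becomes a float
// with the same magnitude (255 -> 255.0).
float asD3D9Float(std::uint8_t v) {
    return static_cast<float>(v);
}

// Normalized interpretation: 0..255 maps linearly onto 0.0..1.0.
float asNormalized(std::uint8_t v) {
    return static_cast<float>(v) / 255.0f;
}

// Integer interpretation: the shader sees the raw value as an unsigned int.
std::uint32_t asUint(std::uint8_t v) {
    return v;
}
```

A shader ported from D3D9 that expects 255.0 would therefore need either to scale the normalized value by 255 or to read the integer form and convert it itself.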
  48. Porting Tips (continued): Sampler and texture bindings. Samplers can be bound independently of textures. This is very flexible! Sampler and texture slots are not always the same. Register packing: in D3D9 all variables took up at least one float4 register (even if you only used a single float!). In D3D10 variables are packed together, which saves a lot of space. Make sure your engine doesn’t do everything based upon register offsets, or your variables might alias.
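To see why hard-coded register offsets can alias, the C++ sketch below computes byte offsets for a list of variables under two simplified models: one 16-byte register per variable (the D3D9 behavior the slide describes) and packed placement where a variable may share a register but may not straddle a 16-byte boundary. It only handles scalars and vectors of at most 16 bytes; arrays, matrices, and structs follow additional rules that are not modeled, and the function names are made up.

```cpp
#include <cstddef>
#include <vector>

// One-register-per-variable model: each variable starts a new 16-byte register.
std::vector<std::size_t> offsetsOnePerRegister(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets;
    std::size_t offset = 0;
    for (std::size_t size : sizes) {
        offsets.push_back(offset);
        offset += ((size + 15) / 16) * 16;  // round up to whole registers
    }
    return offsets;
}

// Packed model: variables share a register when they fit, and skip to the
// next register when they would cross a 16-byte boundary.
std::vector<std::size_t> offsetsPacked(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets;
    std::size_t offset = 0;
    for (std::size_t size : sizes) {
        const std::size_t room = 16 - (offset % 16);
        if (size > room) {
            offset += room;  // move to the start of the next register
        }
        offsets.push_back(offset);
        offset += size;
    }
    return offsets;
}
```

For a float, a float2, and a float (4, 8, and 4 bytes), the first model yields offsets 0, 16, 32, while the packed model yields 0, 4, 12, so code that assumes each variable owns a register would write to the wrong place.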
  49. Porting Tips (continued): D3DSAMP_SRGBTEXTURE. This sampler state setting does not exist on D3D10. Instead it’s included in the texture format. This is more like the Xbox 360. Consider re-optimizing resource usage and upload for better D3D10 performance, but use D3D10_USAGE_DEFAULT resources and UpdateSubresource as a baseline.
  50. Summary: Use the debug runtime! More draw calls usually means more constant updating and state changing calls. Be frugal with constant updates. Avoid resubmitting redundant data! Create as much state and input layout information up front as possible. Select D3D10_USAGE for resources based upon the CPU access patterns needed. Use D3D10_MAP_WRITE_NO_OVERWRITE and a big buffer as a replacement for DIPUP and DUP.
  51. Call to Action: Actually exploit D3D10! This talk tells you how to get performance gains from a straight port. You can get a whole lot more by using D3D10’s advanced features: StreamOut to minimize skinning costs; first-class instancing support; store some vertex data in textures; move some systems to the GPU (particles?); aggressive use of Constant Buffers.
  52. http://www.xna.com © 2007 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.