D3D10 Unleashed: New Features and Effects
Published in: Technology, Art & Photos

0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,849
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
0
Comments
0
Likes
3
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • 06/07/09 01:02
  • Transcript

    • 1. D3D10 Unleashed: New Features and Effects David Tuft Program Manager Direct 3D
    • 2. Outline
      • Direct3D 10
        • Design Imperatives
        • Features and Capabilities
        • Applications
    • 3. The Situation Today
      • No single graphics hardware target
      • CPU-bound games and applications
        • Bandwidth and CPU cycles are the bottleneck in multiple areas (physics, AI)
        • Large amount of CPU resources spent directing the GPU
      • [Diagram: GPU, CPU]
    • 4. The Situation Today
      • No single graphics hardware target
      • CPU-bound games and applications
        • Bandwidth and CPU cycles are the bottleneck in multiple areas (physics, AI)
        • Large amount of CPU resources spent directing the GPU
      • GPU overly-specialized
    • 5. Direct3D 10 Unleashing the power of the GPU
      • Consistency – guarantee a common feature-set with strict requirements
      • Performance –
        • Render MORE
        • objects, materials, clutter, vegetation, shadows
        • with LESS
        • CPU cycles, stalls, and bandwidth cost
      • Visual Effects – unprecedented graphics
      • Capability – empower the GPU to handle a new series of applications
    • 6. Consistency
      • Completely Re-architected
      • NO CAPS!
      • No more fixed function
      • Strict rasterization and floating point rules
      • Logical, straightforward, but still powerful
    • 7. Consistency: Layered Design
      • Core layer
        • Validation moved from set and draw time to create time
      • Debug layer
        • No behavior changes between layers
        • No perf hit when disabled
      • Switch-to-reference layer
      • Thread-safe layer
    • 8. Consistency
      • No half texel offset!
      • Texture coordinates match the pixel positions
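A minimal CPU-side sketch of why the offset disappears. It assumes the usual conventions: Direct3D 9 places pixel centers at integer coordinates, while Direct3D 10 places them at half-integer coordinates, which is also where texel centers sit. The function names are hypothetical and belong to no API.

```cpp
#include <cassert>

// Normalized coordinate of the center of texel i in a texture `size` texels wide.
double texelCenter(int i, int size) {
    assert(size > 0);
    return (i + 0.5) / size;
}

// Direct3D 9 convention (assumed): pixel centers sit at integer coordinates,
// so a 1:1 pixel-to-texel mapping lands on a texel edge unless half a texel
// is added by hand.
double uvForPixelD3D9(int x, int size, bool applyHalfTexelFix) {
    assert(size > 0);
    double u = static_cast<double>(x) / size;
    return applyHalfTexelFix ? u + 0.5 / size : u;
}

// Direct3D 10 convention (assumed): pixel centers sit at x + 0.5,
// which coincides with the texel centers.
double uvForPixelD3D10(int x, int size) {
    assert(size > 0);
    return (x + 0.5) / size;
}
```

For pixel 3 of an 8-texel-wide target, the Direct3D 10 mapping gives 0.4375, the center of texel 3, with no manual correction.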
    • 9. Performance: Small Batch
      • State grouping
      • Validation
        • Minimal at draw time
        • Necessary at state creation time
        • Lots in debug layer
      • Lower API overhead reduces the per-draw cost
    • 10. Performance: State Grouping
      • The cores are yours
        • Create all needed depth stencil objects at init time
        • Set when needed
      • Direct3D 9: one call per state
        • SetRenderState( D3DRS_STENCILENABLE )
        • SetRenderState( D3DRS_STENCILMASK )
      • Direct3D 10: one state object
        • ID3D10Device::CreateDepthStencilState()
        • OMSetDepthStencilState()
      • State grouped to match hardware
      • Depth Stencil state object members: DepthEnable, DepthFunc, DepthWriteMask, StencilEnable, StencilReadMask, StencilWriteMask, FrontFace, BackFace
    • 11. Performance: State Grouping
      • Reduce state-change overhead by grouping state into immutable objects
      • Input layout
        • Format, Offset, InstanceDataStepRate, …
      • Rasterizer
        • Cull Mode, Multisample Enable, Fill Mode, …
      • DepthStencil
        • Depth Enable, Depth Func, Stencil Masks, …
      • Blend
        • SrcBlend, DestBlend, BlendOp, …
      • Sampler (no longer bound to a specific texture)
        • Filter Mode, MinLOD, MaxLOD, …
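The idea of immutable, pre-validated state objects can be modelled on the CPU in a few lines. This is a sketch only: `DepthStencilDesc`, `StateCache`, and `Device` are hypothetical stand-ins, not the Direct3D 10 types, and the single validation rule is invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <tuple>

// Hypothetical description struct; not the real D3D10 descriptor.
struct DepthStencilDesc {
    bool depthEnable;
    bool stencilEnable;
    unsigned stencilReadMask;   // illustrative rule: must fit in 8 bits
    unsigned stencilWriteMask;  // illustrative rule: must fit in 8 bits
};

// Handed out only through pointers-to-const, so it cannot change after creation.
struct DepthStencilState {
    DepthStencilDesc desc;
};

class StateCache {
public:
    // All validation happens here, once, at creation time.
    std::shared_ptr<const DepthStencilState> create(const DepthStencilDesc& d) {
        if (d.stencilReadMask > 0xFF || d.stencilWriteMask > 0xFF)
            throw std::invalid_argument("stencil masks are 8-bit");
        auto key = std::make_tuple(d.depthEnable, d.stencilEnable,
                                   d.stencilReadMask, d.stencilWriteMask);
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // identical state: reuse it
        std::shared_ptr<const DepthStencilState> state =
            std::make_shared<DepthStencilState>(DepthStencilState{d});
        cache_.emplace(key, state);
        return state;
    }

private:
    std::map<std::tuple<bool, bool, unsigned, unsigned>,
             std::shared_ptr<const DepthStencilState>> cache_;
};

// Binding is a pointer assignment; nothing is re-validated at set or draw time.
struct Device {
    const DepthStencilState* bound = nullptr;
    void setDepthStencilState(const DepthStencilState* s) {
        assert(s != nullptr);
        bound = s;
    }
};
```

Creating every state object at init time moves the expensive path out of the frame loop; per-frame work reduces to swapping pointers.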
    • 12. Performance: Resource Usage
      • D3D10_USAGE_
        • IMMUTABLE = never updated
          • Contents supplied at create time
          • Bound as input
        • DEFAULT = updated less than once per frame
          • Update via UpdateSubresource
        • DYNAMIC = updated once per frame or more
          • Update with Map / Unmap
        • STAGING = fast read-back path
          • Can’t be bound to or written by the pipeline
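The usage rules above amount to a small decision procedure. The sketch below encodes them with a hypothetical `Usage` enum and `chooseUsage` helper; neither is part of Direct3D 10, and real code would also weigh bind flags and resource size.

```cpp
#include <cassert>

// Hypothetical enum mirroring the four usage classes listed on the slide.
enum class Usage { Immutable, Default, Dynamic, Staging };

// updatesPerFrame: average number of CPU updates per frame
//                  (0 means never updated after creation).
// cpuReadback:     true if the CPU needs to read the contents back.
Usage chooseUsage(double updatesPerFrame, bool cpuReadback) {
    assert(updatesPerFrame >= 0.0);
    if (cpuReadback) return Usage::Staging;               // read-back path, not bindable
    if (updatesPerFrame == 0.0) return Usage::Immutable;  // filled at create time
    if (updatesPerFrame < 1.0) return Usage::Default;     // UpdateSubresource
    return Usage::Dynamic;                                // Map / Unmap
}
```

For instance, static level geometry would map to `Immutable`, while a per-frame particle buffer written by the CPU would map to `Dynamic`.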
    • 13. Visual Effects: GPU Features!
      • Texture Arrays
      • Format Reinterpretation
      • Stream Output
      • Resource Views
      • Input Assembler
      • Immediate offset on Memory Access
        • Integer/Bitwise Instructions
        • Comparison Filtering
      • Constant Buffers
      • State Objects
      • Shared-Exponent HDR Compression (RGBE)
      • Block-Compressed Formats for bump/normal maps
      • 128 texture slots
      • 8 Render targets
      • More interstage communication
      • Instance, Vertex, Primitive identifiers
      • Per-primitive Clip distance
      • Predicated Rendering
      • Alpha-to-Coverage
      • Multisample Readback
      • Better cubemap filtering
      • [Pipeline diagram: Input Assembler (Vertex Buffer, Index Buffer) → Vertex Shader → Geometry Shader (with Stream Output) → Rasterizer/Interpolator → Pixel Shader → Output Merger (Depth/Stencil, Render Target); Textures feed the shader stages]
    • 14. Visual Effects: The Shader Core
      • New Unified Shader Core
        • All shader stages have the same functionality
        • On some cards, all shader stages use the same cores
      • Comparison-Sample instruction
        • Percentage-Closer shadow Filtering
      • Immediate offset (up to +/-8) on Texture/Buffer load
        • Custom filter kernels
      • Resource info
        • Returns height, width, # of miplevels, arraysize for the resource view
      • More of everything
        • Inter-stage registers, samplers, textures
        • Unlimited instruction count
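A CPU reference for what a comparison-sample with bilinear filtering computes may help make percentage-closer filtering concrete. This is a sketch under stated assumptions: a less-or-equal comparison, clamp addressing, and coordinates given in texel units. `DepthMap` and `sampleCompare` are hypothetical names, not hardware or API behavior.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A tiny single-channel depth map stored row-major.
struct DepthMap {
    int width;
    int height;
    std::vector<float> texels;

    // Clamp-to-edge addressing (an assumption of this sketch).
    float at(int x, int y) const {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// u, v are in texel units, with texel centers at i + 0.5.
// Each of the four neighbouring texels is compared against refDepth FIRST,
// and the 0/1 results are then bilinearly blended. Filtering the depths
// before comparing would give a different (wrong) answer.
float sampleCompare(const DepthMap& m, float u, float v, float refDepth) {
    assert(m.width > 0 && m.height > 0);
    float fx = u - 0.5f;
    float fy = v - 0.5f;
    int x0 = static_cast<int>(std::floor(fx));
    int y0 = static_cast<int>(std::floor(fy));
    float wx = fx - x0;
    float wy = fy - y0;
    auto lit = [&](int x, int y) { return refDepth <= m.at(x, y) ? 1.0f : 0.0f; };
    float top = lit(x0, y0) * (1.0f - wx) + lit(x0 + 1, y0) * wx;
    float bottom = lit(x0, y0 + 1) * (1.0f - wx) + lit(x0 + 1, y0 + 1) * wx;
    return top * (1.0f - wy) + bottom * wy;
}
```

Sampling midway between four texels where one of them fails the comparison yields 0.75, a soft shadow edge rather than a hard 0 or 1.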
    • 15. Shader Model 4.0 A new level of programmability
      • Full integer/bitwise instruction set
        • Massively parallel image and data processing
        • Custom decompression schemes
      • Buffer load – CPU-like unfiltered memory access
      • Switch statements
    • 16. New Resource Types: Texture Arrays
      • Dynamically indexable in the shader
      • Whole array can be set as a render target or as a texture input
    • 17. Resource Views
      • Views enable interpretation of resources at different bind locations
      • [Figure: resource views example, cubemap]
    • 18. Resource Views
      • Resources in D3D10 are generally typeless
      • A resource must be interpreted as a specific type by obtaining a view of the resource
      • Allows you to reinterpret data in a different format
      • Forces type validation earlier in setup
        • Don’t have to re-validate on every draw
    • 19. Geometry Shader Amplification and De-Amplification
      • Emits primitives of a specified output type (point, linestrip, trianglestrip)
        • Limited geometry amplification/de-amplification: Output 0-1024 values per invocation
      • No more 1-in / 1-out limit!
        • Shadow Volumes
        • Fur/Fins
        • Procedural Geometry/Detailing
        • All-GPU Particle Systems
        • Point Sprites
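The amplification idea and the output limit can be sketched on the CPU. The code below expands one point into a four-vertex strip, as a point-sprite geometry shader might, and computes how many vertices fit in the 1024-value budget, assuming the limit counts 32-bit scalar values. All names are hypothetical.

```cpp
#include <array>
#include <cassert>

struct Vec2 {
    float x;
    float y;
};

// One input point in, one four-vertex strip out: a CPU model of the
// point-sprite expansion a geometry shader could perform.
std::array<Vec2, 4> expandPointToQuad(Vec2 center, float halfSize) {
    assert(halfSize >= 0.0f);
    return {{ {center.x - halfSize, center.y - halfSize},
              {center.x - halfSize, center.y + halfSize},
              {center.x + halfSize, center.y - halfSize},
              {center.x + halfSize, center.y + halfSize} }};
}

// Output budget per invocation, taken from the slide and assumed to count
// 32-bit scalar values.
constexpr int kMaxScalarsPerInvocation = 1024;

// How many vertices one invocation may emit for a given vertex size in scalars.
constexpr int maxVerticesPerInvocation(int scalarsPerVertex) {
    return kMaxScalarsPerInvocation / scalarsPerVertex;
}
```

A vertex carrying a float4 position and a float2 texture coordinate uses 6 scalars, so a single invocation could emit at most 170 such vertices.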
    • 20. The New Pipeline Direct3D10 – Geometry Shader
      • Access to the whole primitive
        • Triangle
        • Line
        • Point
      • With adjacency
    • 21. Geometry Shader Example Shadow volume generation
    • 22. Geometry Shader Example Generalized displacement maps
      • Normal mapping (Direct3D 9)
    • 23. Geometry Shader Example Generalized displacement maps
      • Displacement Mapping (Direct3D 10)
    • 24. Render-To-Volume Geometry Shader
    • 25. Stream Out
      • Amplification from GS/VS can be directed into a buffer
      • Generated geometry easily redrawn using DrawAuto() command with no CPU intervention
    • 26. FX10
      • D3D 10 runtime is optimized, and it is significantly faster and leaner
      • No hidden performance cliffs; Any slow paths will be reported in debug
      • Better reflection
      • You can retrieve almost anything from an Effect
      • Reflection metadata can be discarded, so it carries no performance or memory cost at run time
    • 27. FX10 Pipeline Requirements
        • All State Commands:
        • IASetVertexBuffers/SetIndexBuffer
        • IASetPrimitiveTopology
        • {VS|GS|PS}SetShader
        • {VS|GS|PS}SetShaderResources
        • {VS|GS|PS}SetConstantBuffers
        • {VS|GS|PS}SetSamplers
        • SOSetTargets
        • RSSetState
        • RSSetViewports/ScissorRects
        • OMSetRenderTargets
        • OMSetBlendState
        • OMSetDepthStencilState
    • 29. Constant Buffers
      • Constants now managed like vertex/texture data
        • Updated efficiently via Map with discard or UpdateSubresource
        • Set like any other resource
      • Up to 4096 4-channel × 32-bit elements per CB
      • Create as many CBs as you want; 16 can be bound to a shader at once
      • [Diagram: constant buffers A, B, C, and D bound in different combinations to Shader A and Shader B]
    • 30. Constant Buffers
      • Example HLSL Syntax
          cbuffer myObject
          {
              float4x4 matWorld;
              float3 vObjectPosition;
              int arrayIndex;
          }
          cbuffer myScene
          {
              float3 vSunPosition;
              float4x4 matView;
          }
      • Variables still exist in the global namespace
        • arrayIndex = 4; (valid)
        • myObject.arrayIndex = 4; (not valid: the cbuffer name is not a namespace)
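When these buffers are filled from the CPU, the C++ structs must match the HLSL packing rules: values are packed into 16-byte registers, a vector may not straddle a register boundary, and a matrix starts on a new register. The sketch below mirrors the two cbuffers above under those rules. The struct names and the `pad0` member are hypothetical, and 4-byte `float` and `int` are assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU-side mirror of cbuffer myObject.
struct MyObjectCB {
    float matWorld[16];        // float4x4: four registers, offset 0
    float vObjectPosition[3];  // float3: offset 64
    int   arrayIndex;          // shares the float3's register: offset 76
};

// Hypothetical CPU-side mirror of cbuffer myScene.
struct MySceneCB {
    float vSunPosition[3];     // float3: offset 0
    float pad0;                // explicit padding, because the matrix starts a new register
    float matView[16];         // float4x4: offset 16
};

static_assert(offsetof(MyObjectCB, vObjectPosition) == 64, "float3 follows the matrix");
static_assert(offsetof(MyObjectCB, arrayIndex) == 76, "int packs into the float3's register");
static_assert(sizeof(MyObjectCB) == 80, "five 16-byte registers");
static_assert(offsetof(MySceneCB, matView) == 16, "matrix begins on a register boundary");
static_assert(sizeof(MySceneCB) == 80, "five 16-byte registers");

// Capacity from the slide: 4096 elements x 4 channels x 4 bytes.
constexpr std::size_t kMaxConstantBufferBytes = 4096u * 4u * 4u;

// Number of 16-byte registers a buffer of the given size occupies.
std::size_t registersUsed(std::size_t bytes) {
    assert(bytes % 16 == 0);
    return bytes / 16;
}
```

Matrix element order (row-major versus column-major) is a separate concern that this layout check does not cover.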
    • 31. Additional Features
    • 32. Queries & Predicates
      • Many events and stats gathered by runtime
        • Command completion
        • Object Occlusion (in samples rendered)
        • Pipeline Stats
      • Commands can be queued depending on the result of the query
        • Called a Predicate
    • 33. Example: Predicated Rendering
      • Depending on occlusion query of a bounding geometry( OCCLUSIONPREDICATE ), queue the rendering of a more complex object
        • No CPU involvement required
      • Use PREDICATEHINT to avoid accidental pipeline stall for query result
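The difference between a strict predicate and a hinted one can be modelled with a toy function. This is a conceptual sketch only, assuming the hint means "if the result is not yet known, draw anyway rather than wait". `OcclusionPredicate`, `Outcome`, and `executePredicatedDraw` are hypothetical and do not correspond to Direct3D 10 types.

```cpp
#include <cassert>

// Hypothetical model of a GPU-side occlusion predicate.
struct OcclusionPredicate {
    bool resultReady;      // has the occlusion query finished?
    bool anySamplePassed;  // meaningful only when resultReady is true
    bool hint;             // models the PREDICATEHINT behaviour
};

struct Outcome {
    bool drawn;    // was the complex object rendered?
    bool stalled;  // did the pipeline have to wait for the query?
};

// eventualResult: what the query will report once it completes
// (used only when the pipeline has to wait for it).
Outcome executePredicatedDraw(const OcclusionPredicate& p, bool eventualResult) {
    if (p.resultReady) return {p.anySamplePassed, false};
    if (p.hint) return {true, false};  // unknown: draw conservatively, no stall
    return {eventualResult, true};     // strict: wait for the result, then decide
}
```

The hinted path trades some possibly wasted rendering for a pipeline that never waits; in neither path does the CPU read the query result.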
    • 34.  
    • 35.  
    • 36. Direct3D 10 GPU material management
      • Render a multitude of unique materials without taxing the CPU
        • Unlimited instruction length
        • Switch statements
        • Texture arrays
        • Geometry shader
        • Constant buffers
        • Access to material descriptions
    • 37. Learn About DX10 Material Systems
    • 38. The Great Divide The PCI “Express”
      • Using the GPU when
        • You can’t pay for bandwidth
        • The cores are busy
      • Particle system animation
      • Collision Detection
      • DSP effects
        • Convolution
        • Bloom
      • Advanced Rendering
      • [Diagram: CPU, GPU]
    • 39. Capability: Particle System
      • Multiple ways to implement
        • Geometry shader Stream Out
          • Amplification and processing done in the GS or VS
        • Render to Texture, with vertex shader position lookup
          • Processing done in the PS
    • 40. Back to our Goals
    • 41. Small Batch Performance
      • Fewer calls needed
        • Geometry Shaders / Constant Buffers / Texture Arrays…
      • Remaining calls are fast
        • Massive reduction in state and validation overhead:
          • Validation on CREATION, not on binding
          • Views, State Objects
      • Avoid CPU intervention
        • Predicated Draw()
        • DrawAuto()
      • Lean’n’mean runtime, refactored for performance
    • 42. Strict Specification
      • Strictly-defined, consistent behavior throughout the pipeline
        • IEEE floating-point compliance
          • Includes IEEE754R NaN-quashing Min/Max instructions
          • Precise FP32 sampling/blending/math/conversion rules. Ex:
            • FP32 shader ops – precise to 1.0 ULP
            • FP32 to Integer – precise to 0.6 ULP per op
            • FP16 blending - precise to 0.6 ULP per op
        • FP32 blending required
        • Exact line/triangle/AA rasterization rules
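To check results against ULP tolerances like those above, a common trick is to compare the integer bit patterns of two floats. The sketch below does that for finite values of the same sign. It measures the distance between two representable floats, which only approximates the error against an infinitely precise result that the specification refers to. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Distance in units in the last place between two finite floats of the
// same sign, computed from their IEEE 754 bit patterns.
std::int64_t ulpDistance(float a, float b) {
    assert(std::isfinite(a) && std::isfinite(b));
    assert(std::signbit(a) == std::signbit(b));
    std::uint32_t ia = 0;
    std::uint32_t ib = 0;
    std::memcpy(&ia, &a, sizeof ia);  // well-defined way to read the bits
    std::memcpy(&ib, &b, sizeof ib);
    std::int64_t d = static_cast<std::int64_t>(ia) - static_cast<std::int64_t>(ib);
    return d < 0 ? -d : d;
}

// True if `computed` is within maxUlps representable steps of `reference`.
bool withinUlps(float computed, float reference, std::int64_t maxUlps) {
    return ulpDistance(computed, reference) <= maxUlps;
}
```

A test harness could compare a GPU result read back from a staging resource with a CPU reference value and accept it when `withinUlps(gpu, cpu, 1)` holds.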
    • 43. GPU Exploitation
      • API specifically designed to enable pushing any computations onto GPU
        • Of course, it’s up to you!
      • Extra pipeline stage: Geometry Shader
      • Minimized CPU interaction.
        • Get the pipeline flowing and leave it alone
    • 44. Call to Action
      • Try out the new API
      • Get the SDK!
      • http://msdn2.microsoft.com/en-us/xna/aa937788.aspx
    • 45. © 2007 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary. www.xna.com