Modern Graphics Pipeline Overview

An overview of a typical graphics pipeline for current GPU hardware

Published in: Technology, Business

1. A Brief Overview of the Graphics Pipeline
   Cedric Lee
2. What is a graphics pipeline?
   3D Scene → Stage → Stage → Stage → Raster Image
   ● Hardware, real-time / interactive rendering
   ● Popular APIs: OpenGL and DirectX
3. Overview
   ● Basic Graphics Pipeline
   ● Modern Graphics Pipeline
   ● Beyond Pipelining
   ● The New Wave
4. Basic Graphics Pipeline
   ● Use case:
      ● Render a textured mesh with per-pixel lighting
      ● Ambient light, one directional light, one point light, no shadows
      ● Assume a z-buffer-based architecture
5. 3D Scene
   ● Surface
      ● Triangle mesh
         – Vertices and indices
         – Per-vertex position, normal
      ● Position + orientation (world matrix)
   ● Material
      ● Per-vertex UV, tangent, binormal
      ● Diffuse + normal maps
   ● Diffuse lighting (direction, colour)
   ● Camera (view + projection matrices)
6. Vertex Fetching
   Vertex Stream + Index Stream → Input Assembler → per-vertex data:
   ● Position-OS, Normal-OS, Tangent-OS, Binormal-OS (OS = object space)
   ● Texture UV
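The input assembler step above can be sketched as an indexed triangle-list fetch. This is a minimal Python illustration, not any API's implementation: it assumes a triangle-list topology, and the dict-per-vertex layout and the name assemble_triangles are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical vertex layout for this sketch: attribute name -> component tuple.
Vertex = Dict[str, Tuple[float, ...]]
Triangle = Tuple[Vertex, Vertex, Vertex]


def assemble_triangles(vertex_stream: Sequence[Vertex],
                       index_stream: Sequence[int]) -> List[Triangle]:
    """Input assembler for an indexed triangle list: every 3 indices form one triangle."""
    if len(index_stream) % 3 != 0:
        raise ValueError("an indexed triangle list needs a multiple of 3 indices")
    triangles: List[Triangle] = []
    for i in range(0, len(index_stream), 3):
        a, b, c = index_stream[i:i + 3]
        # Vertices shared between triangles are stored once and fetched by index.
        triangles.append((vertex_stream[a], vertex_stream[b], vertex_stream[c]))
    return triangles
```

With indices [0, 1, 2, 0, 2, 3], the two triangles of a quad share vertices 0 and 2 instead of storing them twice.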
7. Vertex Processing
   Inputs to the Vertex Shader:
   ● Per-vertex: Position-OS, Normal-OS, Tangent-OS, Binormal-OS, Texture UV
   ● Uniform constants: World Matrix, View Matrix, Projection Matrix
   Per-vertex outputs: Position-WS, Position-SS, Normal-WS, Tangent-WS, Binormal-WS, Texture UV
   (WS = world space, SS = screen space)
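The vertex shader's work on this slide is mostly matrix transforms. A minimal pure-Python sketch, assuming row-major 4×4 matrices applied to column vectors and a world matrix without non-uniform scale; vertex_shader, mat_vec, mat_mul and translation are hypothetical helper names, not shader-language built-ins.

```python
from typing import Dict, List, Tuple

Mat4 = List[List[float]]          # row-major, applied as M @ v (column vectors)
Vec3 = Tuple[float, float, float]


def mat_vec(m: Mat4, v: Tuple[float, float, float, float]) -> Tuple[float, ...]:
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def mat_mul(a: Mat4, b: Mat4) -> Mat4:
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def translation(tx: float, ty: float, tz: float) -> Mat4:
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


IDENTITY = translation(0.0, 0.0, 0.0)


def vertex_shader(position_os: Vec3, normal_os: Vec3, tangent_os: Vec3, binormal_os: Vec3,
                  uv: Tuple[float, float], world: Mat4, view: Mat4,
                  projection: Mat4) -> Dict[str, tuple]:
    def direction_to_ws(d: Vec3) -> Tuple[float, ...]:
        # w = 0, so the world translation does not affect directions. Assumes no
        # non-uniform scale; otherwise normals need the world matrix's inverse-transpose.
        return mat_vec(world, (*d, 0.0))[:3]

    position_ws = mat_vec(world, (*position_os, 1.0))              # w = 1: points are translated
    position_cs = mat_vec(mat_mul(projection, view), position_ws)  # homogeneous clip space
    return {
        "position_ws": position_ws[:3],
        # The slide's Position-SS: the perspective divide and viewport mapping to
        # screen pixels happen in fixed-function hardware after the vertex shader.
        "position_ss": position_cs,
        "normal_ws": direction_to_ws(normal_os),
        "tangent_ws": direction_to_ws(tangent_os),
        "binormal_ws": direction_to_ws(binormal_os),
        "uv": uv,  # passed through unchanged
    }
```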
8. Scan Conversion
   Per-vertex inputs: Position-WS, Position-SS, Normal-WS, Tangent-WS, Binormal-WS, Texture UV
   Trivial Reject → Rasterizer → per-pixel outputs: the same attributes, interpolated
   ● Viewport clipping
   ● Early Z rejection
   ● Back-face culling
   ● Interpolate
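Scan conversion can be illustrated with an edge-function (half-space) rasterizer, one common technique; the slide does not say which method the hardware uses. The Python sketch below shows back-face culling, a bounding box clamped to the viewport (a stand-in for viewport clipping), coverage tests at pixel centres and barycentric interpolation. It assumes front faces have positive signed area (counter-clockwise in a y-up frame), interpolates linearly in screen space rather than perspective-correctly, and omits the top-left fill rule and early Z; the function names are hypothetical.

```python
from typing import Iterator, Sequence, Tuple

Point2 = Tuple[float, float]


def edge(a: Point2, b: Point2, p: Point2) -> float:
    """Twice the signed area of (a, b, p); positive when p is to the left of a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0: Point2, v1: Point2, v2: Point2,
              attrs: Sequence[Tuple[float, ...]], width: int, height: int
              ) -> Iterator[Tuple[int, int, Tuple[float, ...]]]:
    """Yield (x, y, interpolated attributes) for each covered pixel centre."""
    area = edge(v0, v1, v2)
    if area <= 0:
        # Back-face culling (front faces assumed to have positive area) or a degenerate triangle.
        return
    # Bounding box of the triangle, clamped to the viewport.
    xmin = max(int(min(v0[0], v1[0], v2[0])), 0)
    xmax = min(int(max(v0[0], v1[0], v2[0])) + 1, width)
    ymin = max(int(min(v0[1], v1[1], v2[1])), 0)
    ymax = min(int(max(v0[1], v1[1], v2[1])) + 1, height)
    for y in range(ymin, ymax):
        for x in range(xmin, xmax):
            p = (x + 0.5, y + 0.5)  # sample at the pixel centre
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                b0, b1, b2 = w0 / area, w1 / area, w2 / area  # barycentric weights
                yield x, y, tuple(b0 * a0 + b1 * a1 + b2 * a2 for a0, a1, a2 in zip(*attrs))
```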
9. Pixel Processing
   Inputs to the Pixel Shader:
   ● Per-pixel: Position-WS, Position-SS, Normal-WS, Tangent-WS, Binormal-WS, Texture UV
   ● Textures: Diffuse, Normal
   ● Uniform constants: ambient light colour, directional light colour and direction, point light colour and position
   Pixel Shader work: texturing, lighting
   Per-pixel outputs: Depth, Colour, Alpha
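A sketch of the per-pixel work on this slide: normal mapping through the tangent/binormal/normal basis, then ambient + directional + point diffuse lighting. Assumptions not taken from the slide: texture samples are passed in already fetched (albedo, normal_ts), the normal-map value is already decoded to [-1, 1], the directional light's vector points from the light into the scene, and the point light has no distance attenuation. It returns only the lit colour; depth and alpha would pass through. All names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def pixel_shader(position_ws: Vec3, normal_ws: Vec3, tangent_ws: Vec3, binormal_ws: Vec3,
                 albedo: Vec3, normal_ts: Vec3,
                 ambient: Vec3, dir_colour: Vec3, dir_dir: Vec3,
                 point_colour: Vec3, point_pos: Vec3) -> Vec3:
    # Normal mapping: rotate the tangent-space normal into world space with the TBN basis.
    n = normalize(tuple(normal_ts[0] * t + normal_ts[1] * b + normal_ts[2] * nz
                        for t, b, nz in zip(tangent_ws, binormal_ws, normal_ws)))
    # Directional light: flip dir_dir so it points from the surface toward the light.
    n_dot_l_dir = max(dot(n, normalize(tuple(-c for c in dir_dir))), 0.0)
    # Point light: direction from the surface to the light, no attenuation.
    to_light = normalize(tuple(p - q for p, q in zip(point_pos, position_ws)))
    n_dot_l_point = max(dot(n, to_light), 0.0)
    lighting = tuple(a + dc * n_dot_l_dir + pc * n_dot_l_point
                     for a, dc, pc in zip(ambient, dir_colour, point_colour))
    # Texturing: modulate the diffuse-map colour by the accumulated lighting.
    return tuple(alb * l for alb, l in zip(albedo, lighting))
```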
10. Raster Operators (ROPs)
   Per-pixel Depth, Colour, Alpha → Depth Test, Alpha Test, Alpha Blend
   Writes to: Depth Buffer, Colour Buffer (frame buffer / render targets)
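The fixed-function ROP stage can be modelled for a single fragment as below. This sketch assumes one particular configuration, not the only one: the alpha test rejects alpha below alpha_ref, the depth test is LESS with depth writes enabled, and blending is src × alpha + dst × (1 − alpha). RenderTarget and rop are hypothetical names.

```python
from typing import List, Tuple

Colour = Tuple[float, float, float]


class RenderTarget:
    """A colour buffer plus a depth buffer, cleared to black and to the far plane (1.0)."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.depth: List[float] = [1.0] * (width * height)
        self.colour: List[Colour] = [(0.0, 0.0, 0.0)] * (width * height)


def rop(rt: RenderTarget, x: int, y: int, depth: float, colour: Colour, alpha: float,
        alpha_ref: float = 0.1) -> bool:
    """Apply alpha test, depth test and alpha blend to one fragment; True if it was written."""
    i = y * rt.width + x
    if alpha < alpha_ref:       # alpha test: discard nearly transparent fragments
        return False
    if depth >= rt.depth[i]:    # depth test (LESS): keep only nearer fragments
        return False
    rt.depth[i] = depth         # depth write
    dst = rt.colour[i]
    # Alpha blend: src * alpha + dst * (1 - alpha)
    rt.colour[i] = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(colour, dst))
    return True
```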
11. Modern GPU Pipeline
   ● Programmable units
   ● Vertex shaders, pixel shaders
   ● DX10: Geometry Shader
      ● Kill/emit vertices, primitives
      ● E.g. displacement mapping, fur, single-pass render to cube map
12. Modern GPU Pipeline
   ● Unified shader architecture
      ● Common shading cores shared between vertex, geometry and pixel shading units
      ● Scheduler distributes work
      ● Load balancing
13. Figure (source: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf)
14. Figure (source: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf)
15. Modern GPU Pipeline
   ● Bandwidth:
      ● Hierarchical Z
      ● PS3: compressed Z and colour to reduce bandwidth for MSAA reads
      ● X360: EDRAM on the GPU package gives very high frame buffer bandwidth
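Hierarchical Z saves bandwidth by testing whole tiles against a coarse depth summary before touching per-pixel depth. A minimal sketch of one common formulation, assuming a LESS depth test with 1.0 as the far plane and a per-tile maximum depth; real hardware formats and tile sizes are vendor-specific and not described on the slide, and the helper names are hypothetical.

```python
from typing import List


def build_hiz(depth: List[List[float]], tile: int) -> List[List[float]]:
    """Coarse level: the farthest (maximum) depth already stored in each tile x tile block."""
    h, w = len(depth), len(depth[0])
    return [[max(depth[y][x]
                 for y in range(ty, min(ty + tile, h))
                 for x in range(tx, min(tx + tile, w)))
             for tx in range(0, w, tile)]
            for ty in range(0, h, tile)]


def tile_occluded(hiz: List[List[float]], tx: int, ty: int, prim_min_depth: float) -> bool:
    """With a LESS test, a primitive whose nearest depth in the tile is not closer than the
    tile's farthest stored depth cannot pass anywhere in the tile, so the tile is skipped."""
    return prim_min_depth >= hiz[ty][tx]
```

One coarse comparison replaces a tile's worth of per-pixel depth reads, which is where the bandwidth saving comes from.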
16. Modern GPU Pipeline
   ● CUDA / DX11 Compute Shader
      ● Stream processing (GPGPU)
      ● Exposes the shader cores to general-purpose programs
      ● Arbitrary memory reads (gather) and writes (scatter)
17. Modern GPU
   ● More memory and processing units
   ● More floating-point formats, fewer usage restrictions
   ● More simultaneous render targets (8)
   ● Longer shaders
   ● New data structures (e.g. texture arrays)
   ● Better MSAA and anisotropic filtering support
18. Beyond Pipelining
   ● Multi-processor
      ● Solution to the “memory” and “power” walls
      ● Pipelining: multiple stages happening at once
      ● Parallelism: many things happening in the same stage
   ● Limits of pipelining
      ● Only a small number of pipeline steps
      ● Some steps are much more compute-intensive than others, and the slowest step limits throughput
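The limits of pipelining can be made concrete with a throughput calculation: once the pipeline is full, it completes one item per slowest-stage time, and running parallel copies of that stage is what lifts the bound. A small sketch; the stage costs below are hypothetical, not figures from the slides.

```python
from typing import Optional, Sequence


def pipeline_throughput(stage_cycles: Sequence[float],
                        replicas: Optional[Sequence[int]] = None) -> float:
    """Steady-state items per cycle of a full pipeline.

    Stage i needs stage_cycles[i] cycles per item; replicas[i] parallel copies of that
    stage divide its effective time. The slowest effective stage sets the throughput.
    """
    if replicas is None:
        replicas = [1] * len(stage_cycles)
    effective = [c / r for c, r in zip(stage_cycles, replicas)]
    return 1.0 / max(effective)
```

With stage costs of 1, 1, 8 and 1 cycles, pipelining alone yields 0.125 items per cycle; eight parallel copies of the expensive stage raise it to 1.0.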
19. Parallelism
   ● Parallelism examples:
      ● All components of a float4 at the same time
      ● Multiple vertices at the same time
      ● Multiple triangles at the same time
20. SIMD
   ● E.g. GPU ALUs
   ● Shared instruction store and control
   ● Compact and less expensive
   ● Efficient when there are no loops or branches
   ● Problem: unused processing cycles
      ● Partially covered 2×2 pixel quads waste lanes
      ● Solution: avoid small or skinny triangles (PS3)
   ● Not good for more complicated data structures or algorithms
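The unfilled-quad problem follows from shading pixels in 2×2 quads: every quad a triangle touches occupies four lanes even if only one of its pixels is covered. This brute-force Python sketch (hypothetical helper names, triangles with positive signed area assumed) measures the fraction of lanes doing useful work.

```python
from typing import Set, Tuple

Point2 = Tuple[float, float]


def edge(a: Point2, b: Point2, p: Point2) -> float:
    """Twice the signed area of (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covered_pixels(v0: Point2, v1: Point2, v2: Point2,
                   width: int, height: int) -> Set[Tuple[int, int]]:
    """Pixels whose centres lie inside the triangle (brute force over the viewport)."""
    pixels = set()
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            if edge(v1, v2, p) >= 0 and edge(v2, v0, p) >= 0 and edge(v0, v1, p) >= 0:
                pixels.add((x, y))
    return pixels


def quad_efficiency(pixels: Set[Tuple[int, int]]) -> float:
    """Fraction of shader lanes doing useful work when pixels are shaded in 2x2 quads."""
    quads = {(x // 2, y // 2) for x, y in pixels}
    return len(pixels) / (4 * len(quads)) if quads else 0.0
```

A large right triangle with vertices (0, 0), (16, 0), (0, 16) covers 136 pixels in 36 quads, so about 94% of lanes are busy; a one-pixel-high sliver with vertices (0, 0), (16, 1), (0, 1) covers 8 pixels in 4 quads and uses only half of them.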
21. SIMT
   ● Still SIMD: code is shared between threads
   ● Each thread processes a group of primitives (e.g. 48 quads)
   ● Latency hiding:
      ● One thread stalls on a texture fetch
      ● Other threads continue execution
      ● Especially important due to the “memory wall”
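Latency hiding can be illustrated with a toy cycle-count model. This is not how any particular GPU schedules work; it assumes one instruction issued per cycle across all threads, round-robin selection among threads that are not stalled, a fixed fetch latency, and a (hypothetical) texture fetch every fetch_every instructions.

```python
from typing import Tuple


def simulate(num_threads: int, ops_per_thread: int, fetch_every: int,
             latency: int) -> Tuple[int, float]:
    """Return (total cycles, ALU utilisation) for the toy scheduler described above."""
    remaining = [ops_per_thread] * num_threads
    issued = [0] * num_threads
    ready_at = [0] * num_threads  # first cycle at which each thread may issue again
    cycle = busy = next_thread = 0
    while any(r > 0 for r in remaining):
        for k in range(num_threads):  # round-robin search for a thread that is not stalled
            t = (next_thread + k) % num_threads
            if remaining[t] > 0 and ready_at[t] <= cycle:
                remaining[t] -= 1
                issued[t] += 1
                busy += 1
                # A texture fetch stalls this thread for `latency` extra cycles.
                stall = latency if issued[t] % fetch_every == 0 else 0
                ready_at[t] = cycle + 1 + stall
                next_thread = (t + 1) % num_threads
                break  # at most one instruction issued per cycle
        cycle += 1  # the ALU idles this cycle if no thread was ready
    return cycle, busy / cycle
```

With a fetch every 4 instructions and a 10-cycle latency, one thread keeps the ALU busy for only 8 of 18 cycles, while four interleaved threads finish four times the work in 39 cycles.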
22. SIMT
   ● When branching:
      ● If all primitives take the same branch, only that branch is evaluated
      ● Otherwise, both branches are evaluated and the results are masked per primitive
   ● Reduces unused processor cycles
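The branching rule above can be sketched by modelling the lanes of one SIMD group as list positions. Real hardware uses an execution mask rather than Python lists, and simt_if_else is a hypothetical name; the sketch only shows which branch bodies run and which lanes commit results.

```python
from typing import Callable, List, Sequence, Tuple


def simt_if_else(cond: Sequence[bool], values: Sequence[float],
                 then_fn: Callable[[float], float],
                 else_fn: Callable[[float], float]) -> Tuple[List[float], int]:
    """Run an if/else across the lanes of one SIMD group.

    Returns the per-lane results and how many branch bodies were executed.
    """
    results = list(values)
    executed = 0
    if any(cond):  # skip the 'then' body entirely if no lane takes it
        executed += 1
        then_vals = [then_fn(v) for v in values]  # every lane executes the body...
        results = [t if c else r for t, c, r in zip(then_vals, cond, results)]  # ...only active lanes commit
    if not all(cond):  # skip the 'else' body entirely if every lane took 'then'
        executed += 1
        else_vals = [else_fn(v) for v in values]
        results = [r if c else e for e, c, r in zip(else_vals, cond, results)]
    return results, executed
```

A uniform condition costs one branch body; a divergent one costs both.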
23. MIMD
   ● E.g. multi-core CPUs, Cell SPEs, Larrabee
   ● Separate instruction stores and control for different processors
   ● More complex hardware
   ● More expensive
   ● Synchronization issues
   ● Can handle more complex data structures and algorithms
24. The New Wave
   ● MIMD
      ● Cell SPEs
      ● Larrabee
25. Cell SPEs
   ● SPEs
      ● Local memory store
      ● Shared memory accessed via DMA
      ● Ring bus
26. PS3
   ● RSX
      ● Traditional GPU (z-buffer, ROP)
      ● SIMD data structures and processing (arrays)
   ● Offload GPU work to SPUs
      ● Micro triangle removal
      ● Skinning
      ● Post-FX
      ● Lighting
      ● Mostly rely on SIMD-friendly data structures
27. Larrabee
   ● Many general-purpose CPU cores
   ● Coherent memory access from cores
   ● Very few fixed-function units (e.g. texture)
   ● Most graphics pipeline components are programmable
      ● Depth buffer
      ● Blending
   ● Invites more complex data structures and algorithms
28. What does this mean?
29. Programming
   ● GPU programming may become more like SPU programming
      ● More MIMD
      ● More synchronization and data buffering issues
      ● More attention to latency hiding
30. Surfaces and Volumes
   ● Curved surfaces
   ● Displacement mapping
   ● Multi-resolution meshes
   ● Volumes
31. Lighting
   ● Non-uniform representations
      ● Irregular Shadow Mapping
      ● Deep Shadow Maps
32. Rasterization
   ● Object-parallel rasterization
      ● Ray-casting
         – Implicit surfaces (e.g. metaballs, level sets, CSG)
         – Direct volume rendering
      ● Order-independent transparency
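One way to make transparency order-independent, in the spirit of an A-buffer or per-pixel fragment list, is to keep every transparent fragment for a pixel and sort them when the pixel is resolved. This sketch assumes smaller depth means nearer and uses standard "over" blending; resolve_pixel is a hypothetical name, and the slide does not specify a technique.

```python
from typing import Sequence, Tuple

Colour = Tuple[float, float, float]
Fragment = Tuple[float, Colour, float]  # (depth, colour, alpha)


def resolve_pixel(fragments: Sequence[Fragment], background: Colour) -> Colour:
    """Composite a pixel's transparent fragments, given in any submission order."""
    colour = background
    # Sort back to front (largest depth first), then blend each fragment "over" the result.
    for _depth, src, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        colour = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, colour))
    return colour
```

Because the sort happens at resolve time, the result no longer depends on the order in which triangles were drawn.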
33. Questions?
