Modern Graphics Pipeline Overview

An overview of a typical graphics pipeline for current GPU hardware

Published in: Technology, Business

  1. A Brief Overview of the Graphics Pipeline
     Cedric Lee
  2. What is a graphics pipeline?
     3D Scene → Stage → Stage → Stage → Raster Image
     ● Hardware, real-time / interactive rendering
     ● Popular APIs: OpenGL and DirectX
  3. Overview
     ● Basic Graphics Pipeline
     ● Modern Graphics Pipeline
     ● Beyond Pipelining
     ● The New Wave
  4. Basic Graphics Pipeline
     ● Use case:
       ● Render a textured mesh with per-pixel lighting
       ● Ambient light, 1 directional light, 1 point light, no shadows
       ● Assume a z-buffer based architecture
  5. 3D Scene
     ● Surface
       ● Triangle mesh
         – Vertices and indices
         – Per-vertex position, normal
       ● Position + orientation (world matrix)
     ● Material
       ● Per-vertex uv, tangent, binormal
       ● Diffuse + normal maps
     ● Diffuse lighting (direction, colour)
     ● Camera (view + projection matrices)
  6. Vertex Fetching
     Vertex Stream + Index Stream → Input Assembler → Per-Vertex data
     ● Per-Vertex: Position-OS, Normal-OS, Tangent-OS, Binormal-OS, Texture UV
     ● (OS = object space, WS = world space, SS = screen space)
  7. Vertex Processing
     Per-Vertex inputs + Uniform Constants → Vertex Shader → Per-Vertex outputs
     ● Per-Vertex inputs: Position-OS, Normal-OS, Tangent-OS, Binormal-OS, Texture UV
     ● Uniform Constants: World Matrix, View Matrix, Projection Matrix
     ● Per-Vertex outputs: Position-WS, Position-SS, Normal-WS, Tangent-WS, Binormal-WS, Texture UV
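
The vertex-processing slide can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own code: the helper names (`mat_vec`, `translation`, `vertex_shader`) are hypothetical, matrices are assumed to be row-major 4x4 lists applied to column vectors, and the world matrix is assumed to contain no non-uniform scale (otherwise normals would need the inverse transpose). The sketch stops at clip space; the slide's Position-SS would follow after the perspective divide and viewport mapping.

```python
# Sketch of the vertex-shader stage: object space -> world space -> clip space.
# All names are illustrative assumptions, not taken from the slides.

def mat_vec(m, v):
    """Multiply a row-major 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    """Build a 4x4 translation matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

IDENTITY = translation(0.0, 0.0, 0.0)

def vertex_shader(position_os, normal_os, uv, world, view, projection):
    """Transform one vertex using the uniform world/view/projection matrices."""
    # Positions use w = 1 so that translation applies.
    position_ws = mat_vec(world, list(position_os) + [1.0])
    position_cs = mat_vec(projection, mat_vec(view, position_ws))
    # Directions use w = 0 so that translation is ignored.
    normal_ws = mat_vec(world, list(normal_os) + [0.0])[:3]
    return {
        "position_ws": position_ws[:3],
        "position_cs": position_cs,   # clip space, before the perspective divide
        "normal_ws": normal_ws,
        "uv": uv,                     # passed through unchanged
    }

out = vertex_shader([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], (0.5, 0.5),
                    translation(1.0, 2.0, 3.0), IDENTITY, IDENTITY)
```

Tangent and binormal vectors would be transformed the same way as the normal.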
  8. Scan Conversion
     Per-Vertex inputs → Trivial Reject → Rasterizer → Per-Pixel outputs
     ● Per-Vertex inputs: Position-WS, Position-SS, Normal-WS, Tangent-WS, Binormal-WS, Texture UV
     ● Operations: viewport clipping, back-face culling, early Z rejection, interpolation
     ● Per-Pixel outputs: Position-WS, Position-SS, Normal-WS, Tangent-WS, Binormal-WS, Texture UV
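
As a rough illustration of scan conversion, the sketch below culls back-facing triangles and interpolates one per-vertex attribute across the covered pixels using edge functions. Several things here are assumptions rather than anything stated in the slides: counter-clockwise winding counts as front-facing, pixels are sampled at their centres, every pixel in the grid is tested (real hardware is far smarter), and the interpolation is plain barycentric without perspective correction.

```python
# Sketch of back-face culling plus attribute interpolation with edge functions.
# Function names and conventions are illustrative assumptions.

def edge(a, b, p):
    """Twice the signed area of the 2D triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def is_back_facing(v0, v1, v2):
    """Assume counter-clockwise winding is front-facing."""
    return edge(v0, v1, v2) <= 0

def rasterize(v0, v1, v2, attrs, width, height):
    """Return (x, y, value) for each pixel centre covered by the triangle."""
    area = edge(v0, v1, v2)
    if area <= 0:              # back-facing or degenerate: trivially rejected
        return []
    pixels = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            w0 = edge(v1, v2, p)   # weight of v0
            w1 = edge(v2, v0, p)   # weight of v1
            w2 = edge(v0, v1, p)   # weight of v2
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                value = (w0 * attrs[0] + w1 * attrs[1] + w2 * attrs[2]) / area
                pixels.append((x, y, value))
    return pixels

front = rasterize((0, 0), (4, 0), (0, 4), [0.0, 1.0, 0.0], 4, 4)
back = rasterize((0, 0), (0, 4), (4, 0), [0.0, 1.0, 0.0], 4, 4)
```

The same three weights can interpolate every per-vertex output (normal, tangent, UV) at once, which is why the slide lists identical attribute sets on both sides of the rasterizer.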
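  9. Pixel Processing
     Per-Pixel inputs + Textures + Uniform Constants → Pixel Shader (texturing, lighting) → Per-Pixel outputs
     ● Per-Pixel inputs: Position-WS, Position-SS, Normal-WS, Tangent-WS, Binormal-WS, Texture UV
     ● Textures: Diffuse, Normal
     ● Uniform Constants: Ambient light colour, Directional light colour, Directional light direction, Point light colour, Point light position
     ● Per-Pixel outputs: Depth, Colour, Alpha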
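
The lighting part of the pixel shader for this use case (ambient, one directional light, one point light, diffuse only) might look roughly like the sketch below. It is an assumption-laden illustration: the function names are hypothetical, the lighting model is plain Lambert, the point light has no distance attenuation, `dir_to_light` is taken to point from the surface toward the light, and the texture samples (`albedo`, `normal_sample`) are passed in as already-fetched values.

```python
import math

# Sketch of per-pixel diffuse lighting with a tangent-space normal map.
# Names and conventions are illustrative assumptions.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def perturb_normal(tangent_ws, binormal_ws, normal_ws, normal_sample):
    """Rotate a tangent-space normal-map sample into world space."""
    combined = [tangent_ws[i] * normal_sample[0]
                + binormal_ws[i] * normal_sample[1]
                + normal_ws[i] * normal_sample[2] for i in range(3)]
    return normalize(combined)

def shade_pixel(position_ws, normal_ws, albedo, ambient,
                dir_colour, dir_to_light, point_colour, point_pos):
    """Return the lit RGB colour for one pixel, clamped to 1.0."""
    n = normalize(normal_ws)
    n_dot_dir = max(0.0, dot(n, normalize(dir_to_light)))
    to_point = [point_pos[i] - position_ws[i] for i in range(3)]
    n_dot_point = max(0.0, dot(n, normalize(to_point)))
    light = [ambient[i] + dir_colour[i] * n_dot_dir
             + point_colour[i] * n_dot_point for i in range(3)]
    return [min(1.0, albedo[i] * light[i]) for i in range(3)]

normal = perturb_normal([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0])
colour = shade_pixel([0.0, 0.0, 0.0], normal, [1.0, 1.0, 1.0],
                     [0.1, 0.1, 0.1], [0.5, 0.5, 0.5], [0.0, 0.0, 1.0],
                     [0.0, 0.0, 0.0], [0.0, 0.0, 5.0])
```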
  10. Raster Operators (ROPs)
     Per-Pixel inputs → Depth Test, Alpha Test, Alpha Blend → Frame buffer / render targets
     ● Per-Pixel inputs: Depth, Alpha, Colour
     ● Outputs: Depth Buffer, Colour Buffer
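
A minimal sketch of what the ROP stage does for a single pixel follows. The specifics are assumptions, since the slide only names the three operations: the depth test uses a "less than" comparison with smaller values nearer, the alpha test discards pixels at or below a reference value, and blending is the common source-over formula.

```python
# Sketch of the raster-operator stage for a single pixel.
# Compare functions and blend mode are assumptions, not from the slides.

def rop_write(depth_buffer, colour_buffer, x, y, depth, colour, alpha,
              alpha_ref=0.0):
    """Apply depth test, alpha test and alpha blend; return True if written."""
    # Depth test: keep the pixel only if it is nearer than the stored depth.
    if depth >= depth_buffer[y][x]:
        return False
    # Alpha test: discard pixels that are transparent enough.
    if alpha <= alpha_ref:
        return False
    # Alpha blend: source-over blend with the existing colour.
    dst = colour_buffer[y][x]
    colour_buffer[y][x] = tuple(alpha * s + (1.0 - alpha) * d
                                for s, d in zip(colour, dst))
    depth_buffer[y][x] = depth
    return True

depth_buffer = [[1.0]]                    # cleared to the far plane
colour_buffer = [[(0.0, 0.0, 0.0)]]       # cleared to black
first = rop_write(depth_buffer, colour_buffer, 0, 0, 0.5, (1.0, 0.0, 0.0), 0.5)
second = rop_write(depth_buffer, colour_buffer, 0, 0, 0.7, (0.0, 1.0, 0.0), 1.0)
```

The second write is rejected because it lies behind the first, so both buffers keep the values from the first write.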
  11. Modern GPU Pipeline
     ● Programmable units
     ● Vertex Shaders, Pixel Shaders
     ● DX10: Geometry Shader
       ● Kill/emit vertices, primitives
       ● e.g. displacement mapping, fur, 1-pass render to cube map
  12. Modern GPU Pipeline
     ● Unified shader architecture
       ● Common shading cores shared between Vertex, Geometry and Pixel shading units
       ● Scheduler distributes work
       ● Load balancing
  13. http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
  14. http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
  15. Modern GPU Pipeline
     ● Bandwidth:
       ● Hierarchical Z
       ● PS3: Compressed Z and colour to reduce bandwidth for MSAA reads
       ● X360: in-GPU EDRAM (lots of bandwidth)
  16. Modern GPU Pipeline
     ● CUDA / DX11 Compute Shader
       ● Stream processing (GPGPU)
       ● Exposes shading functionality
       ● Arbitrary memory reads
  17. Modern GPU
     ● More memory, processing units
     ● More floating point formats, fewer usage restrictions
     ● More render targets (8)
     ● Longer shaders
     ● New data structures (e.g. texture arrays)
     ● Better MSAA and anisotropic filtering support
  18. Beyond Pipelining
     ● Multi-processor
       ● Solution to “memory” and “power” walls
       ● Pipelining: multiple stages happening at once
       ● Parallelism: many things happening in the same stage
     ● Limit of pipelining
       ● Small number of pipeline steps
       ● Some steps are much more compute intensive
  19. Parallelism
     ● Parallelism examples:
       ● All components of float4 at the same time
       ● Multiple vertices at the same time
       ● Multiple triangles at the same time
  20. SIMD
     ● e.g. GPU ALU
     ● Shared instruction store and control
     ● Compact and less expensive
     ● Efficient with no loops or branches
     ● Problem with unused processing cycles
       ● Unfilled quads are inefficient
       ● Solution: avoid small or skinny triangles (PS3)
     ● Not good for more complicated data structures or algorithms
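
To see why unfilled quads waste cycles, the sketch below estimates how well a triangle fills the 2x2 pixel quads it touches. It assumes, as a simplification not spelled out in the slides, that the hardware shades all four pixels of every touched quad; the function name `quad_utilisation` and the pixel-centre sampling rule are hypothetical choices for illustration.

```python
# Sketch: fraction of shaded pixels that are actually covered, assuming
# shading happens in whole 2x2 quads. Names are illustrative assumptions.

def edge(a, b, p):
    """Twice the signed area of the 2D triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def quad_utilisation(v0, v1, v2, width, height):
    """Covered pixels divided by pixels in all touched 2x2 quads."""
    covered = set()
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)   # sample at the pixel centre
            if (edge(v1, v2, p) >= 0 and edge(v2, v0, p) >= 0
                    and edge(v0, v1, p) >= 0):
                covered.add((x, y))
    quads = {(x // 2, y // 2) for x, y in covered}
    if not quads:
        return 0.0
    return len(covered) / (4 * len(quads))

large = quad_utilisation((0, 0), (8, 0), (0, 8), 8, 8)    # large triangle
skinny = quad_utilisation((0, 0), (8, 0), (8, 1), 8, 8)   # thin sliver
```

In this toy setup the large triangle keeps most lanes busy while the sliver leaves half of them idle, which matches the slide's advice to avoid small or skinny triangles.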
  21. SIMT
     ● Still SIMD. Shared code between threads.
     ● Process groups of primitives (e.g. 48 quads) in each thread
     ● Latency hiding:
       ● One thread stalls on a texture fetch
       ● Other threads continue execution
       ● Especially important due to the “memory wall”
  22. SIMT
     ● When branching:
       ● Only evaluate one branch if all primitives take that branch
       ● Must evaluate both branches and mask the results if not all primitives take the same branch
     ● Reduces unused processor cycles
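
The branching rule on the SIMT slide can be modelled in a few lines. This is a toy model under stated assumptions: a Python list stands in for one group of lanes, `simt_branch` is a hypothetical name, and in the divergent case both functions are simply run on every lane before masking (real hardware disables the inactive lanes instead).

```python
# Toy model of SIMT branch handling for one group of lanes.
# The function name and the pass counter are illustrative assumptions.

def simt_branch(values, condition, then_fn, else_fn):
    """Return (results, passes), where passes counts the branches evaluated."""
    mask = [condition(v) for v in values]
    if all(mask):                      # uniform: every lane takes "then"
        return [then_fn(v) for v in values], 1
    if not any(mask):                  # uniform: every lane takes "else"
        return [else_fn(v) for v in values], 1
    # Divergent: evaluate both sides, then select per lane with the mask.
    then_results = [then_fn(v) for v in values]
    else_results = [else_fn(v) for v in values]
    results = [t if m else e
               for m, t, e in zip(mask, then_results, else_results)]
    return results, 2

def is_positive(v):
    return v > 0

def double(v):
    return v * 2

def negate(v):
    return -v

uniform = simt_branch([1, 2, 3, 4], is_positive, double, negate)
divergent = simt_branch([-1, 2, -3, 4], is_positive, double, negate)
```

The uniform group costs one pass and the divergent group costs two, which is the saving the slide attributes to SIMT compared with always evaluating both branches.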
  23. MIMD
     ● e.g. Multi-core CPUs, Cell SPEs, Larrabee
     ● Different code stores and controls for different processors
     ● More complex hardware
     ● More expensive
     ● Synchronization issues
     ● Can handle more complex data structures and algorithms
  24. The New Wave
     ● MIMD
       ● Cell SPEs
       ● Larrabee
  25. Cell SPEs
     ● SPEs
       ● Local memory store
       ● Shared memory accessed via DMA
       ● Ring bus
  26. PS3
     ● RSX
       ● Traditional GPU (z-buffer, ROP)
       ● SIMD data structures and processing (arrays)
     ● Offload GPU work to SPUs
       ● Micro triangle removal
       ● Skinning
       ● Post-FX
       ● Lighting
       ● Mostly rely on SIMD-friendly data structures
  27. Larrabee
     ● Many general purpose CPU cores
     ● Coherent memory access from cores
     ● Very few fixed-function units (e.g. texture)
     ● Most graphics pipeline components are programmable
       ● Depth buffer
       ● Blending
     ● Invites more complex data structures and algorithms
  28. What does this mean?
  29. Programming
     ● GPU programming may become more like SPU programming
       ● More MIMD
       ● More synchronization and data buffering issues
       ● More attention to latency hiding
  30. Surfaces and Volumes
     ● Curved surfaces
     ● Displacement mapping
     ● Multi-resolution meshes
     ● Volumes
  31. Lighting
     ● Non-uniform representations
       ● Irregular Shadow Mapping
       ● Deep Shadow Maps
  32. Rasterization
     ● Object-parallel rasterization
       ● Ray-casting
         – Implicit surfaces (e.g. metaballs, level sets, CSG)
         – Direct volume rendering
       ● Order independent transparency
  33. Questions?
