Graphics processing unit


  1. GRAPHICS PROCESSING UNIT. Shashwat Shriparv, dwivedishashwat@gmail.com, InfinitySoft. 1/29/2015
  2. Presentation Overview
     • Definition
     • Comparison with CPU
     • Architecture
     • GPU-CPU Interaction
     • GPU Memory
  3. Why GPU?
     • To provide separate, dedicated graphics resources, including a graphics processor and memory.
     • To relieve some of the burden on the main system resources, namely the Central Processing Unit, main memory, and the system bus, which would otherwise become saturated with graphics operations and I/O requests.
  4. Here comes the GPU
  5. What is a GPU?
     • A Graphics Processing Unit or GPU (also occasionally called a Visual Processing Unit or VPU) is a dedicated processor that is efficient at manipulating and displaying computer graphics.
     • Like the CPU (Central Processing Unit), it is a single-chip processor.
  6. However, the abstract goal of a GPU is to enable a representation of a 3D world that is as realistic as possible. GPUs are therefore designed to provide additional computational power customized specifically for these 3D tasks.
  7. GPU vs CPU
     • A GPU is tailored for highly parallel operation, while a CPU is designed primarily to execute programs serially.
     • For this reason, GPUs have many parallel execution units, while CPUs have few execution units.
     • GPUs have significantly faster and more advanced memory interfaces, as they need to move far more data than CPUs.
     • GPUs have much deeper pipelines (reportedly several thousand stages vs. 10-20 for CPUs).
  8. BRIEF HISTORY
     • First-generation GPUs: up to 1998; Nvidia’s TNT2, ATi’s Rage, and 3dfx’s Voodoo3; DX6 feature set.
     • Second-generation GPUs: 1999-2000; Nvidia’s GeForce256 and GeForce2, ATi’s Radeon7500, and S3’s Savage3D; T&L; OpenGL and DX7; configurable.
     • Third-generation GPUs: 2001; GeForce3/4Ti, Radeon8500, MS’s Xbox; OpenGL ARB, DX7/8; vertex programmability + ASM.
     • Fourth-generation GPUs: 2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions, DX9; vertex/pixel programmability + HLSL; 0.13μ process, 125M T/C, 200M T/S.
     • Fifth-generation GPUs: GeForce 8X; DirectX 10.
  9. GPU Architecture. How many processing units? How many ALUs? Do you need a cache? What kind of memory?
  10. GPU Architecture. How many processing units? Lots. How many ALUs? Do you need a cache? What kind of memory?
  11. GPU Architecture. How many processing units? Lots. How many ALUs? Hundreds. Do you need a cache? What kind of memory?
  12. GPU Architecture. How many processing units? Lots. How many ALUs? Hundreds. Do you need a cache? Sort of. What kind of memory?
  13. GPU Architecture. How many processing units? Lots. How many ALUs? Hundreds. Do you need a cache? Sort of. What kind of memory? Very fast.
  14. The difference: Without GPU / With GPU
  15. The GPU pipeline
     • The GPU receives geometry information from the CPU as input and provides a picture as output.
     • Let’s see how that happens.
     • Pipeline stages: host interface -> vertex processing -> triangle setup -> pixel processing -> memory interface
  16. Details
  17. Host Interface
     • The host interface is the communication bridge between the CPU and the GPU.
     • It receives commands from the CPU and also pulls geometry information from system memory.
     • It outputs a stream of vertices in object space with all their associated information (texture coordinates, per-vertex color, etc.).
  18. Vertex Processing
     • The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space.
     • This may be a simple linear transformation, or a complex operation involving morphing effects.
     • No new vertices are created in this stage, and no vertices are discarded (input and output have a 1:1 mapping).
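The 1:1 mapping described for vertex processing can be illustrated with a minimal Python sketch. It is not how any particular GPU or API implements the stage: the names `transform_vertex` and `process_vertices`, and the `scale` and `offset` parameters, are hypothetical, and a uniform scale plus translation stands in for a full model-view-projection transform.

```python
def transform_vertex(v, scale, offset, width, height):
    """Map one object-space vertex (x, y, z) to screen space.

    Assumption: a uniform scale plus translation stands in for the real
    transform, and its result lands in normalized device coordinates
    in the range [-1, 1].
    """
    x, y, z = v
    # Simple linear (affine) transform into normalized device coordinates.
    ndc_x = x * scale + offset[0]
    ndc_y = y * scale + offset[1]
    ndc_z = z * scale + offset[2]
    # Viewport mapping: [-1, 1] -> pixel coordinates, with y pointing down.
    sx = (ndc_x + 1.0) * 0.5 * width
    sy = (1.0 - ndc_y) * 0.5 * height
    return (sx, sy, ndc_z)


def process_vertices(vertices, scale, offset, width, height):
    # One output vertex per input vertex: none are created or discarded.
    return [transform_vertex(v, scale, offset, width, height) for v in vertices]
```

For example, `process_vertices([(0, 0, 0)], 1.0, (0, 0, 0), 640, 480)` places the object-space origin at the center of a 640x480 screen.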
  19. Triangle Setup
     • In this stage geometry information becomes raster information (screen-space geometry is the input, pixels are the output).
     • Prior to rasterization, triangles that are backfacing or located outside the viewing frustum are rejected.
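A rough sketch of the rejection step, in Python. It assumes a convention the slides do not state: front-facing triangles wind counter-clockwise in a y-up coordinate system. It also simplifies frustum rejection to a 2D bounding-box test against the viewport, whereas real hardware tests against the full 3D frustum. All function names are hypothetical.

```python
def signed_area(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise, y up)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_backfacing(a, b, c):
    # Assumed convention: front faces wind counter-clockwise.
    # Zero-area (degenerate) triangles are rejected as well.
    return signed_area(a, b, c) <= 0


def is_outside_viewport(tri, width, height):
    # Simplified stand-in for frustum rejection: the triangle's bounding
    # box does not overlap the viewport at all.
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return max(xs) < 0 or min(xs) >= width or max(ys) < 0 or min(ys) >= height


def cull(triangles, width, height):
    """Keep only triangles that are front-facing and at least partly on screen."""
    return [
        t for t in triangles
        if not is_backfacing(*t) and not is_outside_viewport(t, width, height)
    ]
```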
  20. Triangle Setup (cont.)
     • A pixel is generated if and only if its center is inside the triangle.
     • Every pixel generated has its attributes computed as the perspective-correct interpolation of the three vertices that make up the triangle.
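The two rules above (pixel-center coverage and perspective-correct interpolation) can be sketched in Python as follows. This is an illustrative brute-force loop, not a hardware algorithm. The vertex layout `(sx, sy, w, attr)` and the names `edge` and `rasterize` are assumptions made for the example, and pixels whose centers lie exactly on an edge are simply included, whereas real rasterizers apply a tie-breaking fill rule.

```python
def edge(a, b, p):
    """Edge function: cross product of (b - a) and (p - a) in 2D."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return (x, y, attr) for every pixel whose center is inside the triangle.

    Each vertex is (sx, sy, w, attr): screen position, clip-space w, and one
    scalar attribute (for example a texture coordinate).
    """
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle covers no pixels
    pixels = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # pixel center
            # Barycentric weights; dividing by area handles either winding.
            b0 = edge(v1, v2, p) / area
            b1 = edge(v2, v0, p) / area
            b2 = edge(v0, v1, p) / area
            if b0 < 0 or b1 < 0 or b2 < 0:
                continue  # center is outside the triangle
            # Perspective-correct: interpolate attr/w and 1/w linearly in
            # screen space, then divide.
            inv_w = b0 / v0[2] + b1 / v1[2] + b2 / v2[2]
            attr = (b0 * v0[3] / v0[2] + b1 * v1[3] / v1[2] + b2 * v2[3] / v2[2]) / inv_w
            pixels.append((x, y, attr))
    return pixels
```

When all three `w` values are equal, the result reduces to plain linear interpolation in screen space.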
  21. Pixel Processing
     • Each pixel provided by triangle setup is fed into pixel processing as a set of attributes, which are used to compute the final color for this pixel.
     • The computations taking place here include texture mapping and math operations.
  22. Memory Interface
     • Pixel colors provided by the previous stage are written to the framebuffer.
     • The memory interface used to be the biggest bottleneck before pixel processing took over.
     • Before the final write occurs, some pixels are rejected by the z-buffer. On modern GPUs, z is compressed to reduce framebuffer bandwidth (but not size).
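The z-buffer rejection mentioned above can be sketched in a few lines of Python. The sketch assumes that a smaller z means nearer to the viewer, which the slides do not specify, and it ignores z compression entirely. The names `make_buffers` and `write_pixel` are hypothetical.

```python
def make_buffers(width, height, clear_color=(0, 0, 0)):
    """Create a color buffer and a depth buffer cleared to 'infinitely far'."""
    color = [[clear_color] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    return color, depth


def write_pixel(color, depth, x, y, z, rgb):
    """Write rgb at (x, y) only if z is nearer than the stored depth.

    Assumption: smaller z is nearer. Returns True if the pixel was written,
    False if it was rejected by the depth test.
    """
    if z >= depth[y][x]:
        return False  # hidden behind something already drawn
    depth[y][x] = z
    color[y][x] = rgb
    return True
```

Drawing a far pixel and then a nearer one at the same location overwrites the color; drawing a farther one afterwards leaves the framebuffer unchanged.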
  23. Programmability in the GPU pipeline
     • In current state-of-the-art GPUs, vertex and pixel processing are programmable.
     • The programmer can write programs that are executed for every vertex as well as for every pixel.
     • This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications.
  24. GPU Pipelined Architecture (simplified view). Diagram blocks: CPU, Vertex Setup, Vertex Shader, Rasterizer, Pixel Shader, Texture Storage + Filtering, Frame buffer. Vertices enter the GPU; pixels come out.
  25. GPU Pipelined Architecture (simplified view). One unit can limit the speed of the whole pipeline.
  26. CPU/GPU interaction
     • The CPU and GPU inside the PC work in parallel with each other.
     • There are two “threads” going on, one for the CPU and one for the GPU, which communicate through a command buffer: the CPU writes commands at one end, the GPU reads commands from the other end, and everything in between is pending GPU commands.
  27. CPU/GPU interaction (cont.)
     • If the command buffer is drained empty, we are CPU limited and the GPU will spin around waiting for new input. All the GPU power in the universe isn’t going to make your application faster!
     • If the command buffer fills up, the CPU will spin around waiting for the GPU to consume it, and we are effectively GPU limited.
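The command buffer behaves like a bounded producer/consumer queue, which can be modeled in Python with the standard `queue` and `threading` modules. This is only an analogy under stated assumptions: the "GPU" here is an ordinary thread, the function name `run_frame` and the `None` end-of-stream marker are made up for the example, and the threads block rather than spin while waiting.

```python
import queue
import threading


def run_frame(commands, buffer_size=4):
    """Push commands through a bounded command buffer to a 'GPU' thread.

    Returns the commands in the order the consumer executed them.
    """
    command_buffer = queue.Queue(maxsize=buffer_size)
    executed = []

    def gpu():
        while True:
            # Blocks while the buffer is empty: the CPU-limited case.
            cmd = command_buffer.get()
            if cmd is None:  # hypothetical end-of-stream marker
                break
            executed.append(cmd)

    worker = threading.Thread(target=gpu)
    worker.start()
    for cmd in commands:
        # Blocks while the buffer is full: the GPU-limited case.
        command_buffer.put(cmd)
    command_buffer.put(None)
    worker.join()
    return executed
```

A small `buffer_size` makes the producer stall sooner when the consumer is slow, which mirrors the GPU-limited case described above.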
  28. Synchronization issues
     • When a command in the buffer references a separate block of data (in the slide’s figure, the “black” command references the “yellow” block), the CPU must not overwrite that data until the GPU is done with the command.
  29. Inlining data
     • One way to avoid these problems is to inline all data into the command buffer and avoid references to separate data.
     • However, this is also bad for performance, since we may need to copy several megabytes instead of passing around a pointer.
  30. GPU readbacks
     • The output of a GPU is a rendered image on the screen. What happens if the CPU tries to read it?
     • The GPU must be synchronized with the CPU, i.e. it must drain its entire command buffer, and the CPU must wait while this happens.
  31. GPU readbacks (cont.)
     • We lose all parallelism, since first the CPU waits for the GPU, then the GPU waits for the CPU (because the command buffer has been drained).
     • Both CPU and GPU performance take a nosedive.
     • Bottom line: the image the GPU produces is for your eyes, not for the CPU (treat the CPU -> GPU highway as a one-way street).
  32. About GPU memory
  33. Memory Hierarchy (CPU and GPU)
     • CPU side: Disk, CPU Main Memory, CPU Caches, CPU Registers
     • GPU side: GPU Video Memory, GPU Caches, GPU Constant Registers, GPU Temporary Registers
  34. Where is GPU Data Stored?
     • Vertex buffer
     • Frame buffer
     • Texture
     • Diagram blocks: Vertex Buffer, Vertex Processor, Rasterizer, Fragment Processor, Frame Buffer(s), Texture
  35. CPU memory vs GPU memory
                   CPU                GPU
      Registers    Read/write         Read/write
      Local Mem    Read/write stack   None
      Global Mem   Read/write heap    Read-only during computation; write-only at end (to pre-computed address)
      Disk         Read/write disk    None
  36. It looks like…
  37. Some applications
     • Computer-generated holography using a graphics processing unit
     • Improving the performance of CAD tools
     • Computer graphics in games
  38. New: NVIDIA's new graphics processing unit, the GeForce 8X ULTRA, is said to represent the very latest in visual effects technology.
  39. THANK YOU. Shashwat Shriparv, dwivedishashwat@gmail.com, InfinitySoft
