1/29/2015 1
GRAPHICS
PROCESSING UNIT
Shashwat Shriparv
dwivedishashwat@gmail.com
InfinitySoft
Presentation Overview
Definition
Comparison with CPU
Architecture
GPU-CPU Interaction
GPU Memory
Why GPU?
• To provide separate, dedicated graphics resources, including a
graphics processor and memory.
• To relieve some of the burden on the main system resources, namely
the Central Processing Unit, main memory, and the system bus, which
would otherwise get saturated with graphical operations and I/O
requests.
Enter the GPU
What is a GPU?
• A Graphics Processing Unit, or GPU (occasionally also called a
Visual Processing Unit, or VPU), is a dedicated processor that is
efficient at manipulating and displaying computer graphics.
• Like the CPU (Central Processing Unit), it is a single-chip
processor.
However, the abstract goal of a GPU is to enable a representation of
a 3D world that is as realistic as possible. GPUs are therefore
designed to provide additional computational power customized
specifically for these 3D tasks.
GPU vs CPU
• A GPU is tailored for highly parallel operation, while a CPU is
optimized for executing a few instruction streams serially.
• For this reason, GPUs have many parallel execution units, while
CPUs have few.
• GPUs have significantly faster and more advanced memory interfaces,
as they need to move far more data than CPUs.
• GPUs have much deeper pipelines (several thousand stages vs 10-20
for CPUs).
BRIEF HISTORY
• First-Generation GPUs
– Up to 1998; Nvidia’s TNT2, ATi’s Rage, and 3dfx’s Voodoo3; DX6
feature set.
• Second-Generation GPUs
– 1999-2000; Nvidia’s GeForce256 and GeForce2, ATi’s Radeon 7500, and
S3’s Savage3D; T&L; OpenGL and DX7; configurable.
• Third-Generation GPUs
– 2001; GeForce3/4Ti, Radeon 8500, MS’s Xbox; OpenGL ARB, DX7/8;
vertex programmability + ASM.
• Fourth-Generation GPUs
– 2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions,
DX9; vertex/pixel programmability + HLSL; 0.13μ process, 125M T/C,
200M T/S.
• Fifth-Generation GPUs
– GeForce 8X; DirectX 10.
GPU Architecture
How many processing units?
– Lots.
How many ALUs?
– Hundreds.
Do you need a cache?
– Sort of.
What kind of memory?
– Very fast.
The difference
[Figure: comparison images, without GPU vs. with GPU]
The GPU pipeline
• The GPU receives geometry information from the CPU as input and
produces a picture as output.
• Let’s see how that happens…
[Pipeline diagram: host interface → vertex processing → triangle
setup → pixel processing → memory interface]
Details…
Host Interface
• The host interface is the communication bridge between the CPU and
the GPU.
• It receives commands from the CPU and also pulls geometry
information from system memory.
• It outputs a stream of vertices in object space with all their
associated information (texture coordinates, per-vertex color, etc.).
Vertex Processing
• The vertex processing stage receives vertices from the host
interface in object space and outputs them in screen space.
• This may be a simple linear transformation or a complex operation
involving morphing effects.
• No new vertices are created in this stage, and no vertices are
discarded (input and output have a 1:1 mapping).
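The object-space to screen-space mapping described for vertex processing can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the 4x4 row-major matrix `mvp`, the helper names `transform_vertex`, `to_screen`, and `process_vertices`, and the 640x480 viewport are hypothetical and do not come from the slides.

```python
def transform_vertex(mvp, v):
    """Multiply a 4x4 row-major matrix by the homogeneous point (x, y, z, 1)."""
    x, y, z = v
    p = (x, y, z, 1.0)
    return tuple(sum(mvp[r][c] * p[c] for c in range(4)) for r in range(4))


def to_screen(clip, width, height):
    """Perspective divide, then map normalized [-1, 1] coordinates to pixels."""
    cx, cy, cz, cw = clip
    ndc_x, ndc_y = cx / cw, cy / cw
    sx = (ndc_x + 1.0) * 0.5 * width
    sy = (1.0 - ndc_y) * 0.5 * height  # flip y because screen rows grow downward
    return (sx, sy, cz / cw)


def process_vertices(mvp, vertices, width, height):
    # One output vertex per input vertex: nothing is created or discarded.
    return [to_screen(transform_vertex(mvp, v), width, height) for v in vertices]


# Assumed inputs: an identity transform and one triangle in object space.
IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
triangle = [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0)]
screen = process_vertices(IDENTITY, triangle, 640, 480)
```

With the identity matrix the triangle's corners land on the bottom-left, bottom-right, and top-center of the assumed viewport, and the output list has exactly as many entries as the input list.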
Triangle Setup
• In this stage geometry information becomes raster information
(screen-space geometry is the input, pixels are the output).
• Prior to rasterization, triangles that are back-facing or located
outside the viewing frustum are rejected.
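Back-face rejection is commonly done with a signed-area test on the projected triangle. The sketch below shows that one test only (frustum rejection is not covered). It assumes 2D coordinates with y pointing up and counter-clockwise winding for front faces; both conventions, and the function names, are assumptions for illustration.

```python
def signed_area(a, b, c):
    """Twice the signed area of triangle abc; positive when a, b, c run
    counter-clockwise in a y-up coordinate system."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_back_facing(a, b, c):
    # Assumption: counter-clockwise winding means front-facing.
    # Zero-area (degenerate) triangles are rejected as well.
    return signed_area(a, b, c) <= 0


def cull(triangles):
    """Keep only the triangles that face the viewer."""
    return [t for t in triangles if not is_back_facing(*t)]


front = ((0, 0), (4, 0), (0, 4))  # counter-clockwise
back = ((0, 0), (0, 4), (4, 0))   # same triangle, clockwise
kept = cull([front, back])
```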
Triangle Setup (cont.)
• A pixel is generated if and only if its center is inside the
triangle.
• Every generated pixel has its attributes computed as the
perspective-correct interpolation of the attributes of the three
vertices that make up the triangle.
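Both rules can be illustrated with a small edge-function rasterizer. This is a hedged sketch, not how any particular GPU implements it: the vertex layout `(x, y, w, attr)`, the function names, and the choice to include pixels whose center lies exactly on an edge are assumptions (real hardware uses a tie-breaking rule for shared edges).

```python
def edge(a, b, p):
    """Edge function: positive when p is to the left of the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Yield (x, y, attr) for pixels whose centers lie inside the triangle.

    Each vertex is (x, y, w, attr): screen position, clip-space w, and one
    scalar attribute. Vertices must be ordered so edge(v0, v1, v2) > 0.
    """
    area = edge(v0, v1, v2)
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # center is outside the triangle
            b0, b1, b2 = w0 / area, w1 / area, w2 / area
            # Perspective-correct: interpolate attr/w and 1/w linearly, then divide.
            inv_w = b0 / v0[2] + b1 / v1[2] + b2 / v2[2]
            num = b0 * v0[3] / v0[2] + b1 * v1[3] / v1[2] + b2 * v2[3] / v2[2]
            yield x, y, num / inv_w


# Equal w at all vertices: the result matches plain linear interpolation.
flat = list(rasterize((0, 0, 1.0, 0.0), (4, 0, 1.0, 1.0), (0, 4, 1.0, 0.0), 4, 4))
# A larger w at the second vertex pulls the interpolated value away from it.
persp = list(rasterize((0, 0, 1.0, 0.0), (4, 0, 2.0, 1.0), (0, 4, 1.0, 0.0), 4, 4))
```

Comparing `flat` and `persp` at the same pixel shows why the division by the interpolated 1/w matters: the covered pixels are identical, but the attribute values differ.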
Pixel Processing
• Each pixel provided by triangle setup is fed into pixel processing
as a set of attributes, which are used to compute the final color of
that pixel.
• The computations taking place here include texture mapping and math
operations.
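As a rough illustration of "texture mapping and math operations", the sketch below looks up a texel with nearest-neighbor sampling and multiplies it by an interpolated vertex color. The texture layout (rows of RGB tuples), the `(u, v)` range of [0, 1], and the function names are assumptions; real GPUs also offer filtered lookups.

```python
def sample_nearest(texture, u, v):
    """Nearest-neighbor lookup; texture is a list of rows of (r, g, b) tuples."""
    height, width = len(texture), len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade_pixel(texture, u, v, vertex_color):
    """Final color = texel modulated (multiplied) by the interpolated color."""
    texel = sample_nearest(texture, u, v)
    return tuple(t * c for t, c in zip(texel, vertex_color))


# Assumed 2x2 checkerboard texture.
checker = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
           [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
color = shade_pixel(checker, 0.25, 0.25, (1.0, 0.5, 0.25))
```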
Memory Interface
• Pixel colors provided by the previous stage are written to the
framebuffer.
• This stage used to be the biggest bottleneck, before pixel
processing took over.
• Before the final write occurs, some pixels are rejected by the
z-buffer. On modern GPUs, z is compressed to reduce framebuffer
bandwidth (but not size).
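The z-buffer rejection step can be sketched as a depth-tested write. The sketch assumes a "less than" depth test in which smaller z means nearer to the viewer; that convention, the buffer layout, and the function name are assumptions, and z compression is not modeled.

```python
def write_pixel(framebuffer, zbuffer, x, y, z, color):
    """Depth-tested write: keep the pixel only if it is nearer than what is stored."""
    if z >= zbuffer[y][x]:
        return False  # rejected by the z-buffer; framebuffer untouched
    zbuffer[y][x] = z
    framebuffer[y][x] = color
    return True


W, H = 2, 2
framebuffer = [[(0, 0, 0)] * W for _ in range(H)]
zbuffer = [[float("inf")] * W for _ in range(H)]  # "infinitely far" everywhere

first = write_pixel(framebuffer, zbuffer, 0, 0, 0.5, (255, 0, 0))   # accepted
second = write_pixel(framebuffer, zbuffer, 0, 0, 0.9, (0, 255, 0))  # farther: rejected
```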
Programmability in the GPU pipeline
• In current state-of-the-art GPUs, vertex and pixel processing are
programmable.
• The programmer can write programs that are executed for every
vertex as well as for every pixel.
• This allows fully customizable geometry and shading effects that go
well beyond the generic look and feel of older 3D applications.
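The idea of user-supplied per-vertex and per-pixel programs can be shown with a deliberately tiny stand-in pipeline. Everything here is hypothetical: `draw_points` draws each vertex as a single pixel instead of rasterizing triangles, and the two "shader" functions are plain Python callables rather than real shader code.

```python
def draw_points(vertices, vertex_program, pixel_program, width, height):
    """Tiny stand-in pipeline: each vertex becomes at most one pixel.

    vertex_program(vertex) -> (x, y, varyings) runs once per vertex;
    pixel_program(varyings) -> color runs once per generated pixel.
    """
    image = {}
    for v in vertices:
        x, y, varyings = vertex_program(v)
        px, py = int(x), int(y)
        if 0 <= px < width and 0 <= py < height:
            image[(px, py)] = pixel_program(varyings)
    return image


# User-written programs: swap these to change the result without touching
# the fixed parts of the pipeline.
def scale_by_two(v):
    return (v["pos"][0] * 2, v["pos"][1] * 2, v["brightness"])


def grayscale(brightness):
    return (brightness, brightness, brightness)


image = draw_points([{"pos": (1, 1), "brightness": 0.5}],
                    scale_by_two, grayscale, 4, 4)
```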
GPU Pipelined Architecture (simplified view)
[Diagram: CPU → Vertex Setup → Vertex Shader → Rasterizer → Pixel
Shader → Frame buffer, with Texture Storage + Filtering alongside;
vertices go into the GPU and pixels come out]
GPU Pipelined Architecture (cont.)
One unit can limit the speed of the pipeline: overall throughput is
bounded by the slowest stage.
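A minimal way to reason about this is to take the minimum over per-stage rates. The rates below are made-up numbers for illustration, and treating all stages as processing the same kind of item is a simplification (vertices and pixels are different units in practice).

```python
def pipeline_throughput(stage_rates):
    """Return (slowest stage, its rate): the cap on steady-state throughput."""
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


# Hypothetical per-stage rates in millions of items per second.
rates = {
    "vertex setup": 400,
    "vertex shader": 300,
    "rasterizer": 500,
    "pixel shader": 120,
}
stage, rate = pipeline_throughput(rates)
```

In this made-up configuration, speeding up any stage other than the slowest one would not change the result.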
CPU/GPU interaction
• The CPU and GPU inside the PC work in parallel with each other.
• There are two “threads” going on, one for the CPU and one for the
GPU, which communicate through a command buffer.
[Diagram: command buffer; the CPU writes commands at one end, the GPU
reads commands from the other, and pending GPU commands sit in between]
CPU/GPU interaction (cont.)
• If the command buffer is drained empty, we are CPU limited and the
GPU will spin around waiting for new input. All the GPU power in the
universe isn’t going to make your application faster!
• If the command buffer fills up, the CPU will spin around waiting
for the GPU to consume it, and we are effectively GPU limited.
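The two regimes can be reproduced with a toy producer/consumer simulation of a bounded command buffer. The tick-based model, the rates, and the function name `simulate` are assumptions made for illustration; a real driver and GPU are far more involved.

```python
from collections import deque


def simulate(cpu_rate, gpu_rate, capacity, ticks):
    """Count stalls when the CPU enqueues and the GPU dequeues commands each tick."""
    buffer = deque()
    cpu_stalls = gpu_stalls = 0
    for _ in range(ticks):
        for _ in range(cpu_rate):      # CPU tries to write commands
            if len(buffer) < capacity:
                buffer.append("cmd")
            else:
                cpu_stalls += 1        # buffer full: CPU waits (GPU limited)
        for _ in range(gpu_rate):      # GPU tries to read commands
            if buffer:
                buffer.popleft()
            else:
                gpu_stalls += 1        # buffer empty: GPU waits (CPU limited)
    return cpu_stalls, gpu_stalls


cpu_limited = simulate(cpu_rate=1, gpu_rate=3, capacity=8, ticks=100)
gpu_limited = simulate(cpu_rate=3, gpu_rate=1, capacity=8, ticks=100)
```

In the first run only the GPU stalls, and in the second only the CPU does, which mirrors the CPU-limited and GPU-limited cases.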
Synchronization issues
• The CPU must not overwrite a data block that is referenced by a
pending command (the “yellow” block in the original figure) until the
GPU is done with the command that references it (the “black”
command).
[Diagram: command buffer in which one pending command references a
separate data block]
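One common way to enforce this rule is a fence: a counter the GPU advances as commands retire, which the CPU checks before touching shared data. The sketch below is a single-threaded model under that assumption; the `Fence` and `SharedBuffer` classes and all function names are hypothetical and do not correspond to a specific graphics API.

```python
class Fence:
    """Counter the (simulated) GPU advances after finishing each command."""

    def __init__(self):
        self.completed = 0


class SharedBuffer:
    def __init__(self, data):
        self.data = data
        self.last_use = 0  # id of the last submitted command that reads this buffer


def submit_draw(queue, buffer, command_id):
    """CPU side: enqueue a command that references the buffer by pointer."""
    buffer.last_use = command_id
    queue.append((command_id, buffer))


def gpu_execute_one(queue, fence, log):
    """GPU side: run the oldest command, then signal the fence."""
    command_id, buffer = queue.pop(0)
    log.append(list(buffer.data))  # the GPU reads the referenced data now
    fence.completed = command_id


def try_overwrite(buffer, fence, new_data):
    """CPU-side write that refuses to touch data the GPU may still read."""
    if fence.completed < buffer.last_use:
        return False  # the referencing command has not retired yet
    buffer.data = new_data
    return True


queue, fence, log = [], Fence(), []
vertices = SharedBuffer([1, 2, 3])
submit_draw(queue, vertices, command_id=1)
blocked = try_overwrite(vertices, fence, [9, 9, 9])  # command 1 still pending
gpu_execute_one(queue, fence, log)
allowed = try_overwrite(vertices, fence, [9, 9, 9])  # command 1 has retired
```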
Inlining data
• One way to avoid these problems is to inline all data into the
command buffer and avoid references to separate data.
• However, this is also bad for performance, since we may need to
copy several megabytes of data instead of merely passing around a
pointer.
GPU readbacks
• The output of a GPU is a rendered image on the screen. What happens
if the CPU tries to read it?
• The GPU must be synchronized with the CPU, i.e. it must drain its
entire command buffer, and the CPU must wait while this happens.
GPU readbacks (cont.)
• We lose all parallelism, since first the CPU waits for the GPU,
then the GPU waits for the CPU (because the command buffer has been
drained).
• Both CPU and GPU performance take a nosedive.
• Bottom line: the image the GPU produces is for your eyes, not for
the CPU (treat the CPU -> GPU highway as a one-way street).
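The cost of losing parallelism can be approximated with a back-of-the-envelope model. It assumes the CPU needs `cpu_ms` to build a frame's commands and the GPU needs `gpu_ms` to execute them, and that a readback every frame fully serializes the two; these assumptions and the example timings are illustrative only.

```python
def frame_time_pipelined(cpu_ms, gpu_ms):
    """CPU and GPU overlap, so the slower of the two sets the frame time."""
    return max(cpu_ms, gpu_ms)


def frame_time_with_readback(cpu_ms, gpu_ms):
    """A per-frame readback serializes the two: the CPU waits for the GPU,
    then the GPU idles while the CPU refills the command buffer."""
    return cpu_ms + gpu_ms


# Hypothetical timings in milliseconds.
overlapped = frame_time_pipelined(8.0, 10.0)
serialized = frame_time_with_readback(8.0, 10.0)
```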
About GPU memory…
Memory Hierarchy
CPU and GPU memory hierarchy:
– CPU side: CPU registers, CPU caches, CPU main memory, disk
– GPU side: GPU temporary registers, GPU constant registers, GPU
caches, GPU video memory
Where is GPU Data Stored?
– Vertex buffer
– Frame buffer
– Texture
[Diagram: Vertex Buffer → Vertex Processor → Rasterizer → Fragment
Processor → Frame Buffer(s), with Texture as a further data store]
CPU memory vs GPU memory
             CPU                GPU
Registers    Read/write         Read/write
Local Mem    Read/write stack   None
Global Mem   Read/write heap    Read-only during computation;
                                write-only at end (to pre-computed
                                address)
Disk         Read/write disk    None
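The GPU column of the table describes a gather-style model: a program may read any input location, but its single result is written to an address chosen before it runs. A minimal sketch of that pattern follows; the names `gather_kernel` and `run` and the three-tap average are assumptions chosen for illustration.

```python
def gather_kernel(inputs, out_index):
    """Compute one output element; may read any input, returns one value."""
    left = inputs[max(out_index - 1, 0)]
    right = inputs[min(out_index + 1, len(inputs) - 1)]
    return (left + inputs[out_index] + right) / 3.0


def run(kernel, inputs):
    # The write address (out_index) is fixed before the kernel runs, and the
    # inputs stay read-only for the whole pass.
    frozen = tuple(inputs)
    return [kernel(frozen, i) for i in range(len(frozen))]


blurred = run(gather_kernel, [0.0, 3.0, 6.0, 9.0])
```

Because each invocation only reads shared data and writes its own slot, the invocations do not depend on each other and could run in any order or in parallel.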
Some applications
• Computer-generated holography using a graphics processing unit
• Improving the performance of CAD tools
• Computer graphics in games
New…
NVIDIA's new graphics processing unit, the GeForce 8X ULTRA, is said
to represent the very latest in visual effects technologies.
THANK
YOU
