The document discusses the history and evolution of 3D graphics technologies including OpenGL and DirectX, provides an overview of GPU programming models and architectures, and explores how GPUs are increasingly being used for general purpose computing beyond just graphics through technologies like CUDA and OpenCL. It also highlights how GPUs can provide significant performance gains for parallel applications compared to CPUs.
1. An Introduction to GPU
3D Games to HPC
Krishnaraj Rao
Presented at Bangalore DV Club, 03/12/2010
2. Agenda
3D Graphics
The Big Picture
Quick Overview
Programming Model
Importance of 3D
High Performance Parallel Computing
Why GPUs for HPPC?
Available APIs
GPU Computing architecture
Q&A
3. The Big Picture - Movies
Pipeline (diagram): Capture, Models Creation, Scene Creation, Rendering API, Post Processing
4. The Big Picture - Games
Pipeline (diagram): Capture, Models Creation, Scene Creation, Rendering API, Post Processing
Shown alongside the Rendering API stage: Drivers, HLSL, Cg
5. Models end up in World Space
World space includes everything!
Position and orientation for all
items is needed to accurately calculate
transformations into screen space.
Diagram: World Coordinate Space with X, Y, and Z axes, showing a Light Source, a View Point (or Camera), and the Screen
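The transformation from world space into screen space can be sketched as below. This is a minimal illustration, not any particular API: the function name `world_to_screen`, the unrotated camera looking down the -Z axis, the focal length, and the 640x480 screen are all assumptions made for the example.

```python
def world_to_screen(point, camera, focal_length=1.0, width=640, height=480):
    """Project a world-space point onto the screen.

    Assumption: the camera sits at `camera` and looks straight down the
    -Z axis with no rotation, which keeps the sketch short.
    """
    # Step 1: world space -> camera space by moving the origin to the camera.
    x = point[0] - camera[0]
    y = point[1] - camera[1]
    z = point[2] - camera[2]
    if z >= 0:
        return None  # behind (or level with) the camera, so not visible

    # Step 2: perspective divide; farther points land closer to the centre.
    ndc_x = focal_length * x / -z
    ndc_y = focal_length * y / -z

    # Step 3: map the [-1, 1] range to pixels (screen Y grows downward).
    px = (ndc_x + 1.0) * 0.5 * width
    py = (1.0 - ndc_y) * 0.5 * height
    return (px, py)


if __name__ == "__main__":
    camera = (0.0, 0.0, 5.0)
    print(world_to_screen((0.0, 0.0, 0.0), camera))
    print(world_to_screen((1.0, 1.0, 0.0), camera))
```

A point straight ahead of the camera lands in the middle of the screen; a real pipeline would also apply the camera orientation and lighting, which is why position and orientation are needed for every item.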
7. Simple Interactive 3D Graphics App
A simple example
Static scene geometry, moving viewer
Repeat this loop:
CPU takes user input from joystick or mouse
CPU re-calculates viewer position, view direction, and light positions in 3-D world space
GPU clears memory and draws the complete scene geometry with the new viewer and light positions
Repeat forever
Loop (diagram): Read Joystick Position, Update Viewer Position and Light Direction, Draw all Scene Objects
GPU blocks (diagram): Vertex Engine, Setup Raster, Fragment Engine, Raster Ops, Z Cull, Texture
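The loop on this slide can be sketched as below. It is a simulation only: `read_input`, `update_viewer`, `draw_scene`, and `run` are hypothetical names, the input is a fixed turn rate rather than a real joystick, and the "GPU" step just builds a list instead of drawing.

```python
import math


def read_input(frame):
    """Hypothetical stand-in for joystick or mouse input: a constant turn."""
    return {"turn": 0.1, "forward": 0.5}


def update_viewer(viewer, controls):
    """CPU step: recompute viewer position and view direction in world space."""
    heading = viewer["heading"] + controls["turn"]
    return {
        "heading": heading,
        "x": viewer["x"] + controls["forward"] * math.cos(heading),
        "z": viewer["z"] + controls["forward"] * math.sin(heading),
    }


def draw_scene(scene, viewer, light):
    """GPU step (simulated): clear the frame, then draw every object again."""
    frame = []  # "clearing memory" is modelled as starting from an empty frame
    for obj in scene:
        frame.append((obj, round(viewer["x"], 3), round(viewer["z"], 3), light))
    return frame


def run(scene, frames=3):
    viewer = {"x": 0.0, "z": 0.0, "heading": 0.0}
    light = (0.0, 10.0, 0.0)  # light position, kept fixed here for brevity
    rendered = []
    for frame in range(frames):  # a real application would repeat forever
        controls = read_input(frame)
        viewer = update_viewer(viewer, controls)
        rendered.append(draw_scene(scene, viewer, light))
    return rendered


if __name__ == "__main__":
    for frame in run(["floor", "teapot"]):
        print(frame)
```

Note that the whole scene is redrawn every frame even though only the viewer moved, which is exactly the per-frame workload the GPU takes off the CPU.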
8. Adding Programmability to the Graphics Pipeline
Pipeline (diagram), in order:
3D Application or Game
3D API Commands
3D API: OpenGL or Direct3D
CPU - GPU Boundary
GPU Command & Data Stream
GPU Front End
Vertex Index Stream
Primitive Assembly
Assembled Polygons, Lines, and Points
Rasterization & Interpolation
Pixel Location Stream
Raster Operations
Pixel Updates
Framebuffer
Programmable Vertex Processor: Pre-transformed Vertices in, Transformed Vertices out
Programmable Fragment Processor: Rasterized Pre-transformed Fragments in, Transformed Fragments out
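The split between fixed stages and the two programmable processors can be sketched as below. This is a toy software model, not OpenGL or Direct3D: `vertex_program`, `rasterize`, `fragment_program`, and `render` are hypothetical names, vertices are 2D, and the rasterizer uses a simple edge-function test at pixel centres.

```python
def vertex_program(vertex, offset):
    """Programmable vertex processor: pre-transformed -> transformed vertex."""
    x, y = vertex
    return (x + offset[0], y + offset[1])


def rasterize(triangle, width, height):
    """Fixed-function rasterizer: yield pixel locations inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = triangle

    def edge(ax, ay, bx, by, px, py):
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    for py in range(height):
        for px in range(width):
            cx, cy = px + 0.5, py + 0.5  # sample at the pixel centre
            w0 = edge(x1, y1, x2, y2, cx, cy)
            w1 = edge(x2, y2, x0, y0, cx, cy)
            w2 = edge(x0, y0, x1, y1, cx, cy)
            # Inside if all edge functions share a sign (either winding).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                yield (px, py)


def fragment_program(px, py):
    """Programmable fragment processor: fragment -> colour (a gradient)."""
    return (px * 32 % 256, py * 32 % 256, 128)


def render(vertices, offset=(1, 1), width=8, height=8):
    framebuffer = [[(0, 0, 0)] * width for _ in range(height)]
    transformed = [vertex_program(v, offset) for v in vertices]  # vertex stage
    for px, py in rasterize(transformed, width, height):         # rasterization
        framebuffer[py][px] = fragment_program(px, py)           # fragment stage
    return framebuffer


if __name__ == "__main__":
    for row in render([(0, 0), (4, 0), (0, 4)]):
        print("".join("#" if pixel != (0, 0, 0) else "." for pixel in row))
```

Swapping in a different `vertex_program` or `fragment_program` changes the result without touching the rasterizer, which is the point of making those two stages programmable.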
9. A History of Innovation
1995: NV1, 1 Million Transistors
1999: GeForce 256, 22 Million Transistors
2002: GeForce4, 63 Million Transistors
2003: GeForce FX, 130 Million Transistors
2004: GeForce 6, 222 Million Transistors
2005: GeForce 7, 302 Million Transistors
2006-2007: GeForce 8, 754 Million Transistors
2008: GeForce GTX 200, 1.4 Billion Transistors
...but what do all these extra transistors do?
10. GPU continues to offload CPU work
Diagram: the pipeline stages Geom Gather, Geom Proc, Triangle Proc, Pixel Proc, and Z / Blend (plus Scene Mgmt and Physics and AI from 2004) are divided between CPU and GPU in 1996, 2000, 2004, and 2008, with the GPU taking over more stages over time.
11. Programming Model
API: Set of functions, procedures or classes
that an OS, library or service provides to
support requests made by computer
programs
DirectX: Collection of APIs for handling
multimedia tasks, especially game programming
and video, on Microsoft platforms.
OpenGL (Open Graphics Library) is a
standard specification defining a cross-
language, cross-platform API for writing
applications that produce 2D and 3D
computer graphics.
12. Why is 3D Graphics important?
More than just Fun and Games....
Images: Tokyo, Japan; California Coastline
16. GPU Processing Power
CPU, meet your new partner!
CPU: Intel Core i7 965, 4 cores, 102 GFLOPS
GPU: NVIDIA GTX 285, 240 cores, 1.04 TFLOPS
17. Beyond Graphics
With floating-point math and textures, graphics
processors can be used for more than just graphics
GPGPU - "General Purpose Computing on GPUs"
Lots of ongoing research mapping algorithms and
problems onto programmable GPUs
Solving Linear Equations
Black-Scholes Options Pricing
Rigid- and Soft-Body Dynamics
Middleware layers being developed to accelerate
"eye candy" game physics on GPUs (HavokFX)
18. What is GPGPU?
General Purpose computation using GPU
in applications other than 3D graphics
GPU accelerates critical path of application
Data parallel algorithms leverage GPU attributes
Large data arrays, streaming throughput
Fine-grain SIMD parallelism
Floating point (FP) computation
Great for "embarrassingly parallel" algorithms
Applications - see //GPGPU.org
Game effects (FX) physics, image processing
Physical modeling, computational engineering, matrix
algebra, convolution, correlation, sorting
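The data-parallel style described on this slide can be sketched as below. It is a sequential simulation: `saxpy_kernel`, `launch`, and `saxpy` are hypothetical names, and the loop in `launch` stands in for what a GPU would run as many parallel threads.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work done by one 'thread': it reads and writes only element i."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Hypothetical launcher: runs the kernel once per index.

    A GPU would execute these invocations in parallel; this loop is
    sequential and only shows that no invocation depends on another.
    """
    for i in range(n):
        kernel(i, *args)


def saxpy(a, x, y):
    """Compute a * x + y element by element over two equal-length arrays."""
    out = [0.0] * len(x)
    launch(saxpy_kernel, len(x), a, x, y, out)
    return out


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Because every index is independent, the work scales across however many processors are available, which is what makes large arrays and streaming throughput a good fit.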
19. Why Computation on the GPU?
A quiet buildup of potential
Calculation Throughput and Memory Bandwidth: 10X
Equivalent performance at fraction of power & cost
GPU in every PC - pervasive presence and massive impact
GPUs have always been parallel "multi-core"
Natively designed to handle massive threading
Every pixel is a thread
Increased precision (fp32), programmability, flexibility
GPUs are a mass-market parallel processor
Economies of scale
Peak floating point performance is much higher than comparable
CPUs
                        ATI x1900XT           Intel Core 2 Duo E6600
Price                   ~$400 (video card)    ~$400 (processor only)
Peak SP float           ~250 GFLOPS           ~40 GFLOPS
Main memory bandwidth   ~46 GB/s              ~8.5 GB/s
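The ratios implied by the comparison above can be worked out directly. The numbers are the ones quoted on the slide; reading the memory figures as GB per second is an assumption, and `advantage` is a hypothetical helper name.

```python
# Figures quoted on the slide (approximate, single-precision float).
gpu = {"gflops": 250.0, "bandwidth_gb_s": 46.0}  # ATI x1900XT
cpu = {"gflops": 40.0, "bandwidth_gb_s": 8.5}    # Intel Core 2 Duo E6600


def advantage(gpu, cpu):
    """Return the GPU-to-CPU ratio for each quoted metric."""
    return {key: gpu[key] / cpu[key] for key in gpu}


if __name__ == "__main__":
    for metric, ratio in advantage(gpu, cpu).items():
        print(f"{metric}: {ratio:.2f}x")
```

At the same quoted price, these particular parts differ by roughly 6x in peak arithmetic and roughly 5x in memory bandwidth.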
20. Why Computation on the GPU?
Supercomputing Performance
Inherently Parallel Architecture
1000+ cores, massively parallel processing
250x the compute performance of a PC
Personal
"One Researcher, One Supercomputer"
Supercomputer in a desktop system
Plugs into standard power strip
Accessible
Program in C, C++, Fortran for Windows or Linux
Available from OEMs and resellers worldwide and priced
like a workstation
21. Compute Applications
Computational Fluid Dynamics Data Mining, Analytics &
Computer Aided Engineering Databases
Digital Content Creation MATLAB Acceleration
Electronic Design Automation Molecular Dynamics
Finance Weather, Atmospheric, Ocean
Game Physics Modeling, and Space Sciences
Graphics Libraries
Imaging and Computer Vision Oil & Gas
Medical Imaging Programming Tools
Numerics Ray Tracing
Bio-Informatics and Life Signal Processing
Sciences Video & Audio
Computational Chemistry
Computational
Electromagnetics &
Electrodynamics
24. APIs for Heterogeneous Computing
CUDA (Compute Unified Device Architecture) is a
parallel computing architecture developed by NVIDIA.
Programmers use 'C for CUDA' (C with NVIDIA
extensions), compiled through a PathScale Open64 C
compiler, to code algorithms for execution on the
GPU. Both low-level and high-level APIs are provided.
OpenCL (Open Computing Language) is a framework
for writing programs that execute across
heterogeneous platforms consisting of CPUs, GPUs,
and other processors.
Microsoft DirectCompute is an API that supports
general-purpose computing on GPUs on Microsoft
Windows Vista or Windows 7. DirectCompute is part of the
Microsoft DirectX collection of APIs.
26. OpenCL: Platform Model & Program Structure
One Host + one or more Compute Devices
Each Compute Device is composed of one or more
Compute Units
Each Compute Unit is further divided into one or more
Processing Elements
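The hierarchy above can be sketched as plain data structures. This is only a model of the terminology, not the OpenCL API: the class names mirror the slide, while `example_platform`, the device names, and the unit counts are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeUnit:
    processing_elements: int  # number of Processing Elements in this unit


@dataclass
class ComputeDevice:
    name: str
    compute_units: List[ComputeUnit] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        """Sum the Processing Elements over all Compute Units."""
        return sum(cu.processing_elements for cu in self.compute_units)


@dataclass
class Host:
    devices: List[ComputeDevice] = field(default_factory=list)


def example_platform() -> Host:
    """Build a made-up platform: one GPU-like and one CPU-like device."""
    gpu = ComputeDevice("example-gpu", [ComputeUnit(8) for _ in range(30)])
    cpu = ComputeDevice("example-cpu", [ComputeUnit(1) for _ in range(4)])
    return Host([gpu, cpu])


if __name__ == "__main__":
    for device in example_platform().devices:
        print(device.name, device.total_processing_elements())
```

The same three-level structure describes both devices, which is how one programming model can cover CPUs, GPUs, and other processors.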
27. CUDA Parallel Computing Architecture
ISA and hardware
compute engine
Includes a C-compiler
plus support for
OpenCL and
DX11 Compute
Architected to natively
support all
computational
interfaces
(standard languages
and APIs)
28. Option 1
OpenCL and C for CUDA
C for CUDA: Entry point for developers who prefer high-level C
OpenCL: Entry point for developers who want low-level API
Shared back-end compiler and optimization technology
PTX
GPU
29. CUDA Success - Science & Computation
Speedups are not 2x or 3x, but 20x to 150x
146X: Medical Imaging, U of Utah
36X: Molecular Dynamics, U of Illinois, Urbana
18X: Video Transcoding, Elemental Tech
50X: Matlab Computing, AccelerEyes
100X: Astrophysics, RIKEN
149X: Financial simulation, Oxford
47X: Linear Algebra, Universidad Jaime
20X: 3D Ultrasound, Techniscan
130X: Quantum Chemistry, U of Illinois, Urbana
30X: Gene Sequencing, U of Maryland
30. 100x more affordable
20x less power consumption
Chart: Performance versus Accessibility
Supercomputing Cluster: 250x, $100K - $1M
Tesla Personal Supercomputer: 250x Faster, < $10K
Today's Workstations: 1x, < $10K
32. Grand Computing Challenges
Personalized Medicine
Mathematics for Scientific Discovery
Information Data Mining
Renewable Energy
Machines That Think
Natural Human Machine Interaction
Predict Environmental Changes
Economic Analysis
33. Final Thoughts
GPU and heterogeneous parallel
architecture will revolutionize computing
Parallel computing needed to solve some of
the most interesting and important human
challenges ahead
Learning parallel programming is imperative
for students in computing and sciences
34. From Virtua Fighter to Tsubame
1995 - NV1: 0.8M transistors, 50MHz, 1M Bytes, 0 GFLOPS
2008 - GT200: 1,200M transistors, 1.3GHz, 4G Bytes, 1 TFLOPS
Another 1000x in 15 years?
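The question on this slide can be put in numbers. The sketch below uses only the transistor counts and years quoted above; `annual_growth` is a hypothetical helper, and treating growth as a steady compound rate is a simplifying assumption.

```python
def annual_growth(start, end, years):
    """Compound annual growth factor that turns `start` into `end`."""
    return (end / start) ** (1.0 / years)


# Transistor counts in millions, from the NV1 and GT200 figures.
transistor_factor = 1200 / 0.8
years = 2008 - 1995
transistor_rate = annual_growth(0.8, 1200, years)

# Growth per year needed for "another 1000x in 15 years".
rate_for_1000x_in_15 = annual_growth(1, 1000, 15)

if __name__ == "__main__":
    print(f"1995 to 2008: {transistor_factor:.0f}x over {years} years")
    print(f"historical growth per year: {transistor_rate:.2f}x")
    print(f"needed for 1000x in 15 years: {rate_for_1000x_in_15:.2f}x")
```

On these figures the transistor count grew about 1500x in 13 years, so another 1000x in 15 years would need a somewhat lower yearly rate than the one already seen.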
38. OpenGL ES
Designed for hand-held and embedded devices
Goal is smaller footprint to support OpenGL
PlayStation 3 and cell phone industry adopting ES
OpenGL ES 1.1
Strips out anything deemed extra in OpenGL
Keeps conventional fixed-function vertex and fragment
processing
OpenGL ES 2.0
Adds programmable vertex and fragment shaders
Shaders specified in binary format
Drops support for fixed-function vertex and fragment
processing
39. OpenGL ES - Cont.
OpenGL ES 1.0 : Symbian OS, Android Platform
OpenGL ES 1.0+ : PlayStation 3
OpenGL ES 1.1 : iPhone SDK, BlackBerry (some models)
OpenGL ES 2.0 : iPhone 3GS, iPod touch