The document traces the history and evolution of 3D graphics technologies, including OpenGL and DirectX, surveys GPU programming models and architectures, and explains how graphics processing has expanded from rendering 3D scenes to general purpose computing through technologies like CUDA, OpenCL, and DirectCompute. It also outlines why GPUs are now used for high performance computing: a highly parallel architecture and massive floating point throughput. The talk concludes by discussing some key applications of GPU computing beyond just graphics.
This document provides an introduction and overview of GPUs for both 3D graphics and high performance parallel computing. It discusses:
1) How GPUs accelerated the 3D graphics pipeline and enabled real-time rendering of 3D scenes and games.
2) How GPUs are now being used for general purpose computing (GPGPU) due to their highly parallel architecture and ability to handle massive threading. This allows GPUs to accelerate computationally intensive applications beyond just graphics.
3) The advantages of using GPUs for high performance parallel computing applications, including their high floating point performance, inherent parallelism, and ability to provide supercomputing power at a fraction of the cost of traditional CPU-based supercomputers.
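The parallel-advantage argument above rests on loops whose iterations are fully independent, so thousands of GPU threads can each take one element; SAXPY is the textbook case. A minimal CPU-side sketch of that pattern (the CUDA indexing in the comment is illustrative, not code from the document):

```c
#include <stddef.h>

/* SAXPY (y = a*x + y): the canonical data-parallel kernel.
 * Every iteration is independent, so on a GPU each element can be
 * handled by its own thread (e.g. i = blockIdx.x*blockDim.x + threadIdx.x
 * in CUDA); on a CPU the same work runs as a sequential loop. */
void saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```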
PEER 1 Offers NVIDIA GPU to Accelerate High Performance Applications
PEER 1 has teamed up with NVIDIA, the creator of the GPU and a world leader in visual computing, to provide high performance GPU cloud applications. NVIDIA's GPUs are well known for making customer software run faster, and PEER 1 offers a number of services that run on NVIDIA's GPUs. PEER 1's cloud service is built on NVIDIA Tesla GPUs, delivering supercomputing performance in the cloud to solve much tougher problems.
Cameron Swen is the Divisional Marketing Manager for AMD’s Embedded Solutions Division. He is responsible for outbound marketing and works with AMDs customers to develop and market board and system level solutions to serve the COTS market.
An FPGA-based Scalable Simulation Accelerator for Tile Architectures @HEART2011 - Shinya Takamaeda-Y
1. An FPGA-based Scalable Simulation Accelerator called ScalableCore is presented for simulating Tile architectures like the M-Core manycore processor.
2. ScalableCore partitions the target processor across multiple FPGAs, with each FPGA representing a "ScalableCore Unit" containing part of the processor. Units are connected via a "ScalableCore Board" to simulate the entire processor faster.
3. An initial ScalableCore system was implemented to simulate the M-Core manycore processor with up to 64 cores distributed across 64 ScalableCore Units/FPGAs. This allows simulation speed to scale with the number of FPGAs used.
Creating next-gen VR and MR experiences using Varjo VR-1 and XR-1 - Unite Cop... - Unity Technologies
The developers of Varjo VR-1 learned a lot about human eye resolution and the demands it puts on virtual reality (VR) content. In these slides, you'll explore what next-generation VR can mean for your VR experiences. Learn about what matters the most when it comes to visual quality, the possible caveats, and the role performance requirements play in this equation.
Speaker:
Mikko Strandborg - Varjo
This document discusses implementing depth of field (DOF) effects on CPUs. It begins with an introduction to DOF and techniques for generating the effect, including traditional methods like Poisson disk and Gaussian blur as well as more advanced summed area table techniques. It then demonstrates a DOF explorer application that allows comparing different DOF techniques on GPUs and with CPU offloading. Performance results are shown for various DOF techniques on Sandy Bridge processors, finding speedups from CPU offloading for advanced techniques. The document aims to showcase techniques for implementing DOF on CPUs and compare their performance to GPU implementations.
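The summed-area table (SAT) technique mentioned above is what makes variable-radius blur affordable: once the table is built, a box filter of any radius costs four lookups, which suits depth of field, where the blur radius varies per pixel. A minimal sketch (function names are illustrative, not the document's code):

```c
#include <stddef.h>

/* Build a summed-area table: sat[y][x] holds the sum of all pixels in
 * the rectangle from (0,0) to (x,y) inclusive. */
void build_sat(const float *img, float *sat, size_t w, size_t h) {
    for (size_t y = 0; y < h; ++y)
        for (size_t x = 0; x < w; ++x) {
            float s = img[y * w + x];
            if (x > 0)          s += sat[y * w + x - 1];
            if (y > 0)          s += sat[(y - 1) * w + x];
            if (x > 0 && y > 0) s -= sat[(y - 1) * w + x - 1];
            sat[y * w + x] = s;
        }
}

/* Sum over the inclusive rectangle (x0,y0)-(x1,y1) in four lookups,
 * independent of the rectangle's size. */
float sat_box_sum(const float *sat, size_t w,
                  size_t x0, size_t y0, size_t x1, size_t y1) {
    float s = sat[y1 * w + x1];
    if (x0 > 0)           s -= sat[y1 * w + x0 - 1];
    if (y0 > 0)           s -= sat[(y0 - 1) * w + x1];
    if (x0 > 0 && y0 > 0) s += sat[(y0 - 1) * w + x0 - 1];
    return s;
}
```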
OpenGL - point & line design
Introduces the construction of display devices (CRT, flat-panel LCD, PDP, projector, ...) and shows how rendering on them is built from basic graphics primitives (points and lines).
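The point-and-line idea can be made concrete with Bresenham's classic line rasterizer: every display surveyed above ultimately lights a grid of addressable points, and a line reduces to deciding which points to light. A sketch against a tiny assumed framebuffer (`fb` and `set_pixel` are illustrative names):

```c
#include <stdlib.h>

#define W 16
#define H 16

/* A tiny framebuffer standing in for the display's grid of points. */
static unsigned char fb[H][W];

static void set_pixel(int x, int y) {
    if (x >= 0 && x < W && y >= 0 && y < H) fb[y][x] = 1;
}

/* Bresenham's line algorithm: rasterizes a line using only integer
 * adds and compares -- the classic "graphics from points" technique. */
void draw_line(int x0, int y0, int x1, int y1) {
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        set_pixel(x0, y0);
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}
```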
The document discusses Java™ Platform, Micro Edition Part 8 – Mobile 3D Graphics, which defines the Mobile 3D Graphics API (JSR 184) for creating 3D graphics on Java ME-powered mobile devices. JSR 184 allows developers to load 3D content from files into scene graphs and render them using classes like Graphics3D and World. The API provides both immediate and retained rendering modes as well as tools for creating, loading, and modifying 3D scenes programmatically or from model files.
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUs - Fisnik Kraja
This document summarizes the performance evaluation of synthetic aperture radar (SAR) image reconstruction on CPUs and GPUs. It describes porting the SAR application to NVIDIA CUDA GPUs and compares the performance results on CPUs with 8 and 16 threads and single and dual GPU configurations. Using GPUs provided better performance than CPUs, especially for large-scale images. A heterogeneous CPU+GPU approach improved performance over the CPU-only version by reducing data transfers between the processors. The best results were achieved with a pipelined dual-GPU implementation that reconstructed separate images in parallel to minimize data movement.
This document provides instructions for installing Poser Pro software. It begins with an overview of key new features in Poser Pro such as network rendering capabilities and 64-bit rendering. It then covers installing Poser Pro, including requirements for Windows and Mac systems. Instructions are also provided for installing the optional Queue Manager for managing network rendering jobs. The document concludes with brief descriptions of technical features and capabilities within Poser Pro such as gamma correction, normal mapping, COLLADA import/export, and hosting plug-ins for other 3D software.
The document discusses the evolution of compute APIs from early vendor-specific interfaces like CUDA and CTM to current standards like OpenCL and DirectCompute. It summarizes the key aspects of the first-generation APIs, including their execution model inherited from graphics processing and the caveats developers identified. The document proposes that a second generation of APIs, better suited to hardware designed for compute, will adopt a task-based execution model that maps more directly onto multi-threaded CPU and GPU architectures.
This document provides an overview of multimedia application development capabilities on the Android platform. It discusses the major classes for playing, recording, and manipulating audio and video like MediaPlayer, MediaRecorder, SoundPool, and AudioTrack. It also covers graphics APIs like OpenGL-ES for processing images and textures. The document aims to explain what multimedia features are available in Android and how they can be used to build media consumption and production applications.
The document discusses the development of the video game Ghajini: The Game based on the Bollywood film. It describes the producer's role, sales of 25-30k copies, and collaboration with Intel. It outlines the production cycle and discusses technical decisions like using Photoshop to burn lighting into textures. Key features developed include the 3D menu, comic book cutscenes, AI pathfinding, and integrating profiling tools. Some planned features like ranged weapons were omitted to stay true to the source material. Game balance was achieved by adjusting parameters like enemy health and dodge rates on different difficulty levels.
This document discusses using GPUs for image processing instead of CPUs. It notes that GPUs have much higher peak performance than CPUs, growing from 5,000 triangles/second in 1995 to 350 million triangles/second in 2010. However, GPU programming is more complex than CPUs due to the different architecture and programming model. This can make it harder to implement algorithms on GPUs and to optimize for high efficiency. The document proposes a methodology for GPU acceleration including characterizing algorithms, estimating performance, using models like Roofline to analyze bottlenecks, and benchmarking. It also describes establishing a competence center to help others overcome the challenges of GPU programming.
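The Roofline model mentioned above reduces to a one-line bound: attainable throughput is the lesser of peak compute and memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). A hedged sketch; the numbers in the test are placeholders, not measurements from the document:

```c
/* Roofline model: a kernel is either compute-bound (capped by peak
 * GFLOP/s) or memory-bound (capped by bandwidth * arithmetic
 * intensity), whichever limit is lower. */
double roofline_gflops(double peak_gflops,
                       double bandwidth_gbs,
                       double flops_per_byte) {
    double memory_bound = bandwidth_gbs * flops_per_byte;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}
```

Plotting this bound against arithmetic intensity is what yields the characteristic "roofline" shape used to locate bottlenecks.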
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis... - Owen Wu
This document discusses code submissions to Unreal Engine 3 to enhance graphics capabilities. It covers the addition of Phong tessellation and optimizations for tessellation. It also discusses support for multi-monitor configurations through Eyefinity and improvements to bokeh depth of field and post-process anti-aliasing techniques. The presentation provides information on implementation details and performance comparisons for these techniques.
This document provides an overview and summary of features for a 2D/3D CAD solution. It is a fast, powerful, and compatible solution that is widely applied in mechanical, architectural, electrical, and other fields. It offers high quality, stability, new functions, plug-ins, and APIs. It ensures high productivity, speed, and low cost. The product has a history of compatibility with AutoCAD formats and functions. New exciting features highlighted include dynamic block editing, 3D lofting, DWF underlays, hide/isolate objects, and oriented plug-ins for tasks like polyline booleans, CAD sheet to Excel export, PDF to DXF conversion, and super hatching.
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio - Owen Wu
The document discusses Mali GPU architecture and Arm Mobile Studio. It provides details on Mali GPU components like Bifrost shader cores and tile-based rendering. It also describes features such as index-driven vertex shading, forward pixel kill, and efficient render passes. The document concludes with an overview of the Arm Mobile Studio tools for profiling GPU and CPU performance on mobile devices.
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola... - Fisnik Kraja
This document summarizes research on parallelization techniques for a 2D Fourier matched filtering and interpolation (2DFMFI) synthetic aperture radar (SAR) algorithm. It describes testing the algorithm on shared-memory and distributed-memory architectures. For shared memory, the algorithm was efficiently parallelized but limited by hardware resources. For distributed memory, communication overhead increased with resources from other nodes. Hybrid MPI+OpenMP implementations improved scalability by reducing communication and memory usage. Pipelining processing steps also improved performance by reducing idle time between images. In conclusion, the goal is finding the right balance between performance, power, size, and heat for different architectures.
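The hybrid MPI+OpenMP idea can be sketched in miniature: each MPI rank owns a contiguous block of rows, and OpenMP threads split that block, so inter-node communication shrinks while on-node cores stay busy. MPI calls are omitted below; the partitioning arithmetic and function name are illustrative assumptions, not the 2DFMFI code:

```c
/* Hybrid-style decomposition in miniature: the rank picks its block of
 * rows, and the OpenMP pragma splits that block across threads.  The
 * pragma is a no-op when compiled without OpenMP, so the function is
 * correct either way. */
void scale_rows(float *data, int rows, int cols,
                int rank, int nranks, float factor) {
    int rows_per_rank = (rows + nranks - 1) / nranks;   /* ceiling div */
    int r0 = rank * rows_per_rank;
    int r1 = r0 + rows_per_rank;
    if (r1 > rows) r1 = rows;                           /* last rank may be short */
    #pragma omp parallel for
    for (int r = r0; r < r1; ++r)
        for (int c = 0; c < cols; ++c)
            data[r * cols + c] *= factor;
}
```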
Google I/O 2013 - Android Graphics Performance - DouO
Engineers from the Android UI Graphics team will show some tips, tricks, tools, and techniques for getting the best performance and smoothest UI for your Android applications.
This document summarizes the specifications of the SNC-DH160 Network HD Mini Dome Camera. The camera has a 1.3 megapixel CMOS sensor, supports 720p HD video recording at 30 fps using H.264 compression, and has built-in infrared illuminators for low-light recording up to 49 feet away. It is IP66 and IK10 rated for outdoor use and has features like motion detection, dual streaming, and Power over Ethernet connectivity.
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists - Owen Wu
This document discusses best practices for mobile graphics optimization in Unity for artists. It covers topics like texturing, geometry, shaders, and frame rendering. For texturing, it recommends techniques like mipmapping, bilinear filtering, texture compression, and channel packing. For geometry, it suggests avoiding small/thin triangles and duplicating vertices while using instancing. For shaders, it discusses precision, early Z-testing, overdraw reduction, and dynamic branching. For frame rendering, it recommends reducing state switches and framebuffer writes/clears.
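The channel-packing recommendation can be illustrated with a small sketch: three single-channel maps share one texture so a single sampler fetch replaces three. The AO/roughness/metallic layout here is an assumed example, not one prescribed by the slides:

```c
#include <stdint.h>

/* Channel packing: store three grayscale maps (here ambient occlusion,
 * roughness, metallic -- an assumed layout) in the R, G and B bytes of
 * one 32-bit texel. */
uint32_t pack_arm(uint8_t ao, uint8_t roughness, uint8_t metallic) {
    return (uint32_t)ao
         | ((uint32_t)roughness << 8)
         | ((uint32_t)metallic  << 16);
}

/* Recover one channel: 0 = R, 1 = G, 2 = B. */
uint8_t unpack_channel(uint32_t texel, int channel) {
    return (uint8_t)(texel >> (8 * channel));
}
```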
This document summarizes the verification methodology landscape. It discusses languages, methodologies, tools and standards used for hardware verification including OVM, VMM, and eRM. It also covers topics like interoperability between methodologies and convergence of approaches.
This document describes a case study of a staged migration from e to SystemVerilog at a company designing SERDES chips. It discusses advantages like reduced risk and training staff in small groups. Technical challenges addressed include coordinating simulation timelines and communicating between testbench parts in different languages. Solutions involved making SVTB the master, writing testcases as if fully converted, and using Verilog to pass info between languages. A proof of concept showed the converted approach. Supporting multiple simulators involved using a tool that connects a Pioneer-based SVTB to DUTs in other simulators to avoid lowest common denominator issues.
The document discusses various metrics used to measure CPU verification progress across architectural verification, microarchitecture verification, formal verification, and system-level verification. It outlines metrics such as functional coverage conditions, bug rates, RTL lines of change, and a health-of-the-model score. Secondary metrics include cycles run, licenses used, and bugs caught at different levels.
The document discusses three major problems in verification: specifying properties to check, specifying the environment, and computational complexity. It then presents several approaches to addressing these problems, including using coverage metrics tailored to detection ability, sequential equivalence checking to avoid testbenches, and "perspective-based verification" using minimal abstract models focused on specific property classes. This allows verification earlier in design when changes are more tractable and catches bugs before implementation.
The document discusses STMicroelectronics' deployment of functional qualification methodologies using Certitude mutation analysis. It outlines ST's initial engagement with Certess in 2004 and how they have expanded usage of the technology to now cover 80% of ST's IPs. The document also provides details on ST's functional qualification methodology, sharing of best practices, detection strategies used, and two case studies on measuring quality of third-party IPs and detecting issues in a video codec design.
1) The document discusses the importance of attitude in validation work, noting that attitude is more important than tools or techniques.
2) It emphasizes that nothing is perfect and all designs have bugs or shortcomings due to compromises, schedules, and unknowns. Accidents are inevitable in engineering work which pushes designs to their limits.
3) The document provides several examples of past engineering failures to illustrate issues like normalization of deviance, unexpected interactions in complex systems, and overreliance on untested assumptions. It stresses the importance of questioning everything, fighting urges to relax requirements, and trusting nothing without proper testing.
This document provides an overview of IBM's mainline functional verification of its POWER7 processor core. It first gives background on the history and roadmap of POWER processors. It then outlines the verification methodology, execution, advances, and concludes with a summary. The POWER7 is IBM's next generation processor that features a multi-core design, on-chip eDRAM, power optimization, and memory subsystem improvements. It follows over 20 years of POWER processors and continues IBM's leadership in this area.
This document discusses the challenges of pre-silicon validation for Intel Xeon processors. It notes that Xeon validation teams have relatively small sizes compared to the scope of validation required. Key challenges include reusing design components from previous projects, managing cross-site teams, and dealing with ever-growing design complexity that strains simulation and formal verification methods. Specific issues involve integrating disparate design tools and environments, understanding the original intent when reusing unfinished code, minimizing duplicated stimulus code, managing the overhead of coverage instrumentation, and ensuring tests are portable between pre-silicon and post-silicon validation.
The document describes Cisco's Base Environment methodology for digital verification. It aims to standardize the verification process, promote reuse, and improve predictability. The methodology defines a common testbench topology and infrastructure that is vertically scalable from unit to system level and horizontally scalable across projects. It provides templates, scripts, verification IP and documentation to help teams set up verification environments quickly and leverage existing best practices. The standardized approach facilitates extensive code and test reuse and delivers benefits such as faster ramp-up times, improved planning, and higher return on verification IP development.
The document discusses various graphics optimization techniques for game development using Unity, including reducing draw calls through batching, profiling to identify bottlenecks, using level of detail for 3D models, image-based lighting, and physically based rendering. It provides examples of optimizing a game by changing from deferred to forward rendering, using GPU skinning, reducing polygon counts, and comparing different lighting approaches. The goal of optimization is to remove bottlenecks by properly analyzing performance using tools like the Unity profiler across different target devices and settings.
Gentek is a middleware and solution for MMOG development that aims to help teams quickly build production lines and products. It provides a mature and stable foundation that reduces technical risks and costs. Key features include graphics, networking, server architecture, tools, gameplay modules, and technical support. Gentek can help reduce schedules by 3-4 times and costs by 2-3 times compared to building games from scratch. It has been used successfully in several published MMOG titles in China.
Video replay: http://nvidia.fullviewmedia.com/siggraph2012/ondemand/SS104.html
Date: Wednesday, August 8, 2012
Time: 11:50 AM - 12:50 PM
Location: SIGGRAPH 2012, Los Angeles
Attend this session to get the most out of OpenGL on NVIDIA Quadro and GeForce GPUs. Learn about the new features in OpenGL 4.3, particularly Compute Shaders. Other topics include bindless graphics; Linux improvements; and how to best use the modern OpenGL graphics pipeline. Learn how your application can benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard.
Get OpenGL 4.3 beta drivers for NVIDIA GPUs from http://www.nvidia.com/content/devzone/opengl-driver-4.3.html
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond, by Mark Kilgard
Location: Conference Hall K, Singapore EXPO
Date: Thursday, November 29, 2012
Time: 11:00 AM - 11:50 AM
Presenter: Mark Kilgard (Principal Software Engineer, NVIDIA, Austin, Texas)
Abstract: Attend this session to get the most out of OpenGL on NVIDIA Quadro and GeForce GPUs. Learn about the new features in OpenGL 4.3, particularly Compute Shaders. Other topics include bindless graphics; Linux improvements; and how to best use the modern OpenGL graphics pipeline. Learn how your application can benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard.
Topic Areas: Computer Graphics; Development Tools & Libraries; Visualization; Image and Video Processing
Level: Intermediate
Unity Graphics Optimization: How Far Have You Pushed It? (Optimizing Unity Graphics), Unite Seoul ver., by ozlael
This document provides guidance on optimizing graphics in Unity. It discusses common bottlenecks like draw calls and provides techniques for reducing them, such as batching and profiling. Specific optimization techniques covered include image-based lighting, shadow mapping, physically based rendering, and level of detail systems. The document emphasizes identifying and addressing bottlenecks through profiling on target devices.
This document discusses GPU accelerated computing and programming with GPUs. It provides characteristics of GPUs from Nvidia, AMD, and Intel including number of cores, memory size and bandwidth, and power consumption. It also outlines the 7 steps for programming with GPUs which include building and loading a GPU kernel, allocating device memory, transferring data between host and device memory, setting kernel arguments, enqueueing kernel execution, transferring results back, and synchronizing the command queue. The goal is to achieve super parallel execution with GPUs.
This document discusses the implementation of a finite impulse response (FIR) filter on a graphics processing unit (GPU). It outlines how FIR filters can be represented using textures on the GPU and implemented using fragment programs. The performance of FIR filters and related transformations implemented on the GPU is evaluated. Texture upload and download between GPU and main memory accounts for up to 60% of the total processing time. While GPU computation is faster than CPU for these algorithms, optimization techniques from CPU programming do not always apply to the GPU.
The document discusses graphics processing units (GPUs). It begins with an introduction and definition of GPUs as processors designed specifically for processing 3D graphics. It then covers the components of a GPU and compares GPU and CPU architectures. Specifically, it notes that GPUs have many parallel execution units while CPUs have few, and that GPUs have significantly faster memory interfaces than CPUs. The document concludes by noting that GPU development is ongoing and faster GPUs can be expected in the future.
This document discusses a lecture on GPU architecture given by Mark Kilgard at the University of Texas on March 6, 2012. The lecture covers the architecture of graphics processing units and how they have evolved over the past six years. It also includes an in-class quiz, information about homework and projects, and the professor's office hours.
Presented as a pre-conference tutorial at the GPU Technology Conference in San Jose on September 20, 2010.
Learn about NVIDIA's OpenGL 4.1 functionality available now on Fermi-based GPUs.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/tools-for-creating-next-gen-computer-vision-apps-on-snapdragon-a-presentation-from-qualcomm/
Judd Heape, Vice President of Product Management for Camera, Computer Vision and Video Technology at Qualcomm, presents the “Tools for Creating Next-Gen Computer Vision Apps on Snapdragon” tutorial at the May 2022 Embedded Vision Summit.
The Snapdragon Mobile Platform powers the world’s best smartphones, XR headsets, PCs, wearables, cars and IoT products. Thanks to Snapdragon, these products feature powerful computer vision technologies that you can tap into to build next-gen apps. Inside Snapdragon is a hardware engine dedicated to computer vision–the Engine for Visual Analytics (EVA). EVA hardware acceleration gives developers access to high-performance, low-power computer vision functions to enhance apps that rely on advanced camera or video processing.
The EVA includes a motion processing unit, a feature descriptor unit, a depth estimation unit, a geometric correction unit and an object detection unit. These blocks power high-level functions such as electronic image stabilization, multi-frame HDR, face detection and real-time bokeh. In this presentation, Heape does a deep-dive into EVA’s Software Developer Kit (SDK) and available APIs, such as Optical Flow and Depth from Stereo, and explores how these features can be integrated into your apps.
GPUs are dedicated parallel processors that are optimized for accelerating graphical computations. They have many execution units and faster memory interfaces than CPUs in order to process large amounts of graphical data efficiently. The GPU pipeline receives geometry from the CPU as input and provides pictures as output, going through stages like vertex processing, triangle setup, pixel processing, and output merging. GPUs are highly programmable and widely used for applications like gaming, shading, and global illumination. Future advancements may include more processing cores, tighter integration with CPUs, and fully programmable hardware.
PG-Strom - an FDW module utilizing GPU devices, by Kohei KaiGai
PG-Strom is a module that utilizes GPUs to accelerate query processing in PostgreSQL. It uses a foreign data wrapper to push query execution to the GPU. Benchmark results show a query running 10 times faster on a table using the PG-Strom FDW compared to a regular PostgreSQL table. Future plans include supporting writable foreign tables, accelerating sort and aggregate operations using the GPU, and inheritance between regular and foreign tables. Help from the community is needed to review code, provide large real-world datasets, and understand common analytic queries.
Adobe AIR - Mobile Performance – Tips & Tricks, by Mihai Corlan
This document provides an overview and tips for optimizing mobile performance in Adobe AIR applications. It discusses understanding the mobile landscape, choosing between CPU and GPU rendering modes, caching display objects, and general optimization tricks like avoiding memory leaks and heavy code execution. The document also covers Flex considerations and potential bottlenecks to focus on for optimization.
This document discusses Core Image, Apple's framework for processing images and video on iOS and OS X. It provides over 90 filters that can be combined in chains to apply effects like sepia tone, blur, distortion, and more. The framework renders filters efficiently on the GPU. The document demonstrates how to use Core Image filters to build an app called Hipstaroid that applies photo effects to live camera images.
The document provides an overview of graphics processing units (GPUs). It defines a GPU as a processor optimized for graphics, video, and visual computing. GPUs have a highly parallel architecture with thousands of smaller cores designed to handle multiple tasks simultaneously, unlike CPUs which have fewer serial cores. The document compares CPU and GPU architectures, describes the physical components of a GPU including the motherboard, graphics processor, memory, and display connector. It provides details on GPU memory, pipelines, and manufacturers like NVIDIA, AMD, and Intel. The document concludes with information on latest GPU technologies such as CUDA, PhysX, 3D Vision, and examples of high-end consumer GPUs.
The document discusses NVIDIA graphics hardware over seven years, the Cg programming language, and transparency techniques. It describes the evolution of NVIDIA GPUs and features like GeForce cards, increased processing power, and support for DirectX. It promotes Cg as a cross-platform language for GPU programming. It also explains the depth peeling algorithm for rendering transparency in real-time using multiple rendering passes.
The document discusses how shaders are created and validated for graphics processing units (GPUs). Shaders are created by applications and sent to the GPU through graphics APIs and drivers. They are then executed by the GPU's shader processors. The validation process uses layered testbenches at the sub-block, block, and system levels for maximum controllability and observability. It also employs a reference model methodology using C++ models and hardware emulation to debug designs faster than simulation alone. This methodology helps improve the schedule and find bugs earlier in the development cycle.
The document is a presentation on verification of graphics ASICs given by Shaw Yang and Gary Greenstein of AMD. The presentation covers an overview of AMD, GPU systems, 3D graphics basics including vertices, polygons, pixels and textures, verification challenges related to size and complexity, and approaches used including layered code and testbenches, hardware emulation, and functional coverage.
The document discusses the importance of using verification metrics to predict the functional closure of a CPU design project and discusses challenges in relying solely on metrics. It outlines two key types of metrics - verification test plan based metrics that track testing progress and health of the design metrics that assess bug rates and stability. Examples are provided on using bug rate data and breaking bugs down by design unit to help evaluate the progress and health of a verification effort.
The document discusses efficient verification methodology. It recommends defining a conceptual framework or methodology to standardize some aspects while allowing diversity. The methodology should define interfaces and transactions upfront using an interface definition language to generate verification components and reusable assertions. It also recommends modeling systems at the transaction level using executable specifications to frontload the verification schedule.
The document discusses the challenges of validating next generation CPUs. It notes that validation is increasingly critical for product success but requires constant innovation. Design complexity is growing exponentially, requiring up to 70% of resources for functional validation. The number of pre-silicon logic bugs found per generation has also increased significantly. Shorter timelines and cross-site development further complicate the validation process.
The document discusses validation and design in small teams with limited resources. It proposes constraining designs to a single clock rate, using FIFO interfaces between blocks, and separating algorithm from IO verification to simplify validation. This approach allows designs to be completed more quickly with fewer verification engineers through standardized, repeatable validation methods at the cost of optimal performance.
Verification challenges have increased with the globalization of chip design. Time zone differences and documentation issues can reduce efficiency, but greater collaboration across sites can also lead to new ideas. AMD addresses these challenges through a Verification Center of Expertise (COE) that coordinates methodologies across multiple sites. The COE develops tools and techniques while partnering with project teams to jointly improve processes over time through continuous review and rotation of engineers between the COE and projects.
Greg Tierney of Avid presented on their experiences using SystemC for design verification. SystemC provides hardware constructs and simulation capabilities in C++. Avid chose SystemC to enhance their existing C++ verification code and take advantage of its industry acceptance and built-in verification features. SystemC helped Avid solve issues like crossing language boundaries between HDL modules and testbenches, connecting ports and channels, implementing randomization, using multi-threaded processes, and defining module hierarchies. However, Avid also encountered issues with SystemC like slow compile/link times and limitations in its foreign language interface.
Bob Colwell documented notes from a meeting discussing the need for better software visualization tools to help localize bugs, diagnose problems, and monitor software behavior. The notes also reflect on important words in science according to Isaac Newton and reference a book about creative analogies. Finally, they caution against agreeing to sign a document just because a product is shipping.
The document outlines the verification strategy for a PCI-Express presenter device. It discusses the PCI-Express protocol overview including terminology, hierarchy and functions at various layers. It emphasizes the importance of design-for-verification using techniques like modular architectures, standardized interfaces and reference models to aid in functional verification closure and compliance testing. Performance verification is also highlighted as critical given the real-time requirements of the standard.
The document discusses verification strategies for PCI-Express. It outlines the PCI-Express protocol and highlights challenges in verifying chips that implement open standards. The verification paradigm focuses on functionality, performance, interoperability, reusability, scalability, and comprehensiveness using techniques like constrained-random testing, assertions, reference models, emulation, and compliance checkers. The goal is to deliver compliant and high-performing chips with zero bugs through an effective verification methodology.
The document discusses methodologies for improving verification efficiency at Cisco. It advocates separating testbench creation into three stages: component design, testbench integration, and testcase creation. It also recommends using standardized methodologies like testflow to synchronize component behavior, reusing unit-level component models and checkers, linking transactions between checkers, and generating common testbench infrastructure from templates to reduce duplication of effort. The key is pushing reusable behavior into components and standardizing common elements to maximize efficiency.
This document discusses the importance of pre-silicon verification for post-silicon validation. It notes that post-silicon validation schedules are growing due to increasing design complexity, while pre-silicon verification investment and methodologies have not kept pace. The document highlights mixed-signal verification, power-on/reset verification, and design-for-testability verification as key focus areas needed to improve pre-silicon verification and enable faster post-silicon validation. It provides examples of mixed-signal and power-on bugs that were found post-silicon due to insufficient pre-silicon verification of these areas. The document argues that pre-silicon verification must move beyond just functional verification and own mixed-signal effects
This document discusses challenges in low-power design and verification. It addresses why low-power is now a priority given trends in mobile applications. Key challenges include increased leakage due to process scaling, accounting for active leakage, and handling process variations. The document also discusses low-power design methodologies, including multiple power domains, voltage scaling, and clock gating. Verification challenges are presented, such as needing good test patterns and coordination across design domains. Overall power analysis is more complex than timing analysis due to its pattern dependence and need to optimize for performance per watt.
Verilog-AMS allows for mixed-signal modeling and simulation in a single language. It provides benefits like simplified mixed-signal modeling, decreased simulation time, and improved mixed-signal verification. Previous solutions involved using two simulators or approximating analog circuits, which caused issues like slow simulation and lack of analog results. Verilog-AMS uses constructs from Verilog and Verilog-A to model both analog and digital content together. This avoids issues with interface elements between domains.
This document discusses the verification of Intel's Atom processor. It describes the key verification challenges, methodology used, and results. The main challenges were verifying a new microarchitecture with aggressive schedules and limited resources. The methodology involved cluster-level validation, functional coverage, architectural validation, and formal verification. Metrics like coverage, bug rates, and a "health of model" indicator were used. The results showed a successful pre-silicon verification with few escapes and debug/survivability features working as intended. Key learnings included the importance of keeping the full-chip design healthy early and putting equal focus on testability features.
The document discusses verification strategies based on Sun Tzu's classic book "The Art of War". Some key points:
1. Sun Tzu emphasized understanding the objective conditions and subjective opinions of competitors to determine strategic positioning. This relates to verification where it is important to understand the design and "Murphy the Designer".
2. Sun Tzu's 13 chapters provide guidance on tactics like laying plans, attacking weaknesses, maneuvering, and using intelligence sources. These lessons can help verification engineers successfully navigate different stages of a competitive campaign against bugs and errors.
3. Effective verification requires knowing the design, understanding one's own verification process, preparing appropriate tools, and using feedback to improve. Coverage metrics alone do
Here are the key challenges faced in low power design without a common power format:
1. Domain definitions, level shifters, isolation cells, and other low power techniques are specified differently in each tool using tool-specific commands files and languages. This makes cross-tool consistency and validation difficult.
2. Power functionality cannot be easily verified at the RTL level without changing the RTL code, since power domains and low power techniques are not represented. This limits verification coverage.
3. Iteration between design creation and verification is difficult, since changes to the low power implementation require updates to multiple tool-specific specification files rather than a single cross-tool definition. This impacts design schedule and risks inconsistencies.
4.
This document discusses various metrics used to measure the progress and health of CPU verification. It describes architectural verification to ensure implementation meets specifications, as well as unit architecture and system level verification. Key metrics include pass rates for legacy tests, functional coverage, bug rates, lines of code changes, and a health of the model score to measure convergence. Secondary metrics like cycles run, bugs found at different levels, and test bench quality are also outlined.
This document discusses Freescale's verification of the QorIQ communication platform containing the CoreNet fabric using SystemVerilog. It describes the verification challenges, methodology used, and verification IP developed. Key aspects included developing a SystemVerilog testbench, CoreNet VIP, and hierarchical verification. This approach successfully verified the CoreNet platform and resulted in first silicon sampling to customers within 3 weeks with no major functional bugs found.
1. An Introduction to GPU
3D Games to HPC
Krishnaraj Rao
Presented at Bangalore DV Club, 03/12/2010
2. Agenda
3D Graphics
The Big Picture
Quick Overview
Programming Model
Importance of 3D
High Performance Parallel Computing
Why GPUs for HPPC?
Available APIs
GPU Computing architecture
Q&A
3. The Big Picture → Movies
Capture → Model Creation → Scene Creation (API) → Rendering → Post Processing
4. The Big Picture - Games
Capture → Model Creation → Scene Creation (API: HLSL, Cg; Drivers) → Rendering → Post Processing
5. Models end up in World Space
World space includes everything! Position and orientation for all items are needed to accurately calculate transformations into screen space.
[Figure: world coordinate space with X, Y, and Z axes, a light source, a view point (camera), and the screen.]
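The world-to-screen transformation mentioned above can be sketched in a few lines of C. This is a hedged toy example, not code from the talk: it assumes a camera at the origin looking down the -Z axis and a single focal-length parameter, where a real pipeline would use 4x4 matrices and homogeneous coordinates; the names (Vec3, world_to_screen) are invented for illustration.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

/* Toy perspective projection: divide by distance from the camera.
   'focal' plays the role of the projection matrix's focal length. */
static Vec3 world_to_screen(Vec3 p, float focal) {
    Vec3 s;
    s.x = focal * p.x / -p.z;   /* camera at origin looks down -Z */
    s.y = focal * p.y / -p.z;
    s.z = -p.z;                 /* keep depth for Z-cull / Z-buffer */
    return s;
}
```

A point twice as far from the camera projects to half the screen offset, which is the foreshortening effect the slide's diagram illustrates.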
7. Simple Interactive 3D Graphics App
A simple example: static scene geometry, moving viewer.
Repeat this loop:
- CPU takes user input from joystick or mouse
- CPU re-calculates viewer position, view direction, and light positions in 3-D world space
- GPU clears memory and draws the complete scene geometry with the new viewer and light positions
- Repeat forever
[Diagram: GPU pipeline (Vertex → Setup Engine → Raster Engine with Z-Cull → Texture / Fragment Engine → Raster Ops) alongside the frame loop: Read Joystick → Update Viewer Position and Light Direction → Draw all Scene Objects.]
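The frame loop above can be outlined in C. A minimal sketch with invented names (Viewer, update_viewer, run_frames) and the GPU draw calls elided, since the point is the CPU-side update-then-draw cycle:

```c
#include <assert.h>

/* Illustrative sketch of the interactive loop; not code from the talk. */
typedef struct { float x, z, heading; } Viewer;

/* Stand-in for reading the joystick: dx is lateral input, dz forward.
   The CPU re-calculates viewer position and view direction each frame. */
static void update_viewer(Viewer *v, float dx, float dz) {
    v->x += dx;
    v->z += dz;
    v->heading += dx;
}

/* One frame per iteration; drawing is elided (this is where the GPU
   would clear memory and redraw all scene geometry). */
static Viewer run_frames(Viewer v, int frames, float dx, float dz) {
    for (int i = 0; i < frames; i++) {
        update_viewer(&v, dx, dz);
        /* gpu_clear(); gpu_draw_scene(&v);  -- GPU side, elided */
    }
    return v;
}
```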
8. Adding Programmability to the Graphics Pipeline
[Diagram] A 3D application or game issues 3D API commands through the 3D API (OpenGL or Direct3D). At the CPU-GPU boundary, the GPU command and data stream carries a vertex index stream into the GPU front end. Primitive assembly produces assembled polygons, lines, and points; rasterization and interpolation produce a pixel location stream; raster operations write pixel updates to the framebuffer. Pre-transformed vertices enter a programmable vertex processor, which emits transformed vertices; rasterized pre-transformed fragments enter a programmable fragment processor, which emits transformed fragments.
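To make the idea of a programmable fragment processor concrete, here is a hedged C stand-in: a small function applied independently to every rasterized fragment, the way a fragment program would be. The names and the lighting model (a single per-fragment intensity multiplier) are invented for this sketch.

```c
#include <assert.h>

typedef struct { float r, g, b; } Color;

/* A trivial "fragment shader": modulate a base color by a light
   intensity computed per fragment. On the GPU this function would run
   in parallel across all fragments of a primitive. */
static Color shade_fragment(Color base, float intensity) {
    Color out = { base.r * intensity, base.g * intensity, base.b * intensity };
    return out;
}
```

The vertex processor is programmed the same way, except its inputs and outputs are per-vertex attributes (position, normal, texture coordinates) rather than colors.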
9. A History of Innovation
1995: NV1, 1 million transistors
1999: GeForce 256, 22 million transistors
2002: GeForce4, 63 million transistors
2003: GeForce FX, 130 million transistors
2004: GeForce 6, 222 million transistors
2005: GeForce 7, 302 million transistors
2006-2007: GeForce 8, 754 million transistors
2008: GeForce GTX 200, 1.4 billion transistors
...but what do all these extra transistors do?
10. GPU continues to offload CPU work
[Diagram: the rendering pipeline (Scene Mgmt → Physics and AI → Geom Gather → Geom Proc → Triangle Proc → Pixel Proc → Z/Blend) is shown for 1996, 2000, 2004, and 2008; the CPU-GPU boundary shifts left each generation as the GPU takes over successive stages from the CPU.]
11. Programming Model
- API: a set of functions, procedures, or classes that an OS, library, or service provides to support requests made by computer programs.
- DirectX: a collection of APIs for handling multimedia, especially game programming and video tasks, on Microsoft platforms.
- OpenGL (Open Graphics Library): a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics.
12. Why is 3D Graphics important?
More than just Fun and Games...
[Images: Tokyo, Japan; California coastline]
16. GPU Processing Power
CPU, meet your new partner!
[Chart comparing GPU and CPU processing power; the specific figures did not survive text extraction.]
17. Beyond Graphics
With floating-point math and textures, graphics processors can be used for more than just graphics.
GPGPU = "General Purpose Computing on GPUs"
Lots of ongoing research mapping algorithms and problems onto programmable GPUs:
- Solving Linear Equations
- Black-Scholes Options Pricing
- Rigid- and Soft-Body Dynamics
Middleware layers being developed to accelerate "eye candy" game physics on GPUs (HavokFX)
18. What is GPGPU?
General Purpose computation using GPUs in applications other than 3D graphics.
GPU accelerates the critical path of the application.
Data parallel algorithms leverage GPU attributes:
- Large data arrays, streaming throughput
- Fine-grain SIMD parallelism
- Floating point (FP) computation
Great for "embarrassingly parallel" algorithms.
Applications → see GPGPU.org:
- Game effects (FX) physics, image processing
- Physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting
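The "fine-grain SIMD parallelism" point can be illustrated with SAXPY, a standard data-parallel kernel (not an example from the talk). On a GPU each loop iteration would be its own thread; the serial C loop below is a stand-in whose body is exactly what one such thread would execute:

```c
#include <assert.h>

/* SAXPY: y = a*x + y over large arrays. On a GPU, each element is
   handled by its own thread; here the loop index i plays the role of
   the thread index. */
static void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++) {   /* on a GPU: i = thread index */
        y[i] = a * x[i] + y[i];
    }
}
```

Because no iteration depends on any other, the loop parallelizes trivially, which is what makes such kernels a natural fit for the GPU's streaming throughput.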
19. Why Computation on the GPU?
A quiet buildup of potential:
- Calculation throughput and memory bandwidth: 10X
- Equivalent performance at a fraction of power & cost
- GPU in every PC → pervasive presence and massive impact
- GPUs have always been parallel "multi-core": natively designed to handle massive threading (every pixel is a thread)
- Increased precision (fp32), programmability, flexibility
- GPUs are a mass-market parallel processor: economies of scale
- Peak floating point performance is much higher than comparable CPUs:

                    ATI X1900XT           Intel Core 2 Duo E6600
  Price             ~$400 (video card)    ~$400 (processor only)
  SP float          ~250 GFLOPS           ~40 GFLOPS
  Main memory BW    ~46 GB/s              ~8.5 GB/s
20. Why Computation on the GPU?
Supercomputing performance:
- Inherently parallel architecture: 1000+ cores, massively parallel processing
- 250x the compute performance of a PC
- "One Researcher, One Supercomputer": a Personal Supercomputer in a desktop system that plugs into a standard power strip
Accessible:
- Program in C, C++, or Fortran for Windows or Linux
- Available from OEMs and resellers worldwide, and priced like a workstation
21. Compute Applications
Computational Fluid Dynamics; Computer Aided Engineering; Digital Content Creation; Electronic Design Automation; Finance; Game Physics; Graphics Libraries; Imaging and Computer Vision; Medical Imaging; Numerics; Bio-Informatics and Life Sciences; Computational Chemistry; Computational Electromagnetics & Electrodynamics; Data Mining, Analytics & Databases; MATLAB Acceleration; Molecular Dynamics; Weather, Atmospheric, Ocean Modeling, and Space Sciences; Oil & Gas; Programming Tools; Ray Tracing; Signal Processing; Video & Audio
24. APIs for Heterogeneous Computing
- CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. Programmers use 'C for CUDA' (C with NVIDIA extensions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU. Both low- and high-level APIs are provided.
- OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors.
- Microsoft DirectCompute is an API that supports general-purpose computing on GPUs on Microsoft Windows Vista or Windows 7. DirectCompute is part of the Microsoft DirectX collection of APIs.
26. OpenCL: Platform Model & Program Structure
- One Host plus one or more Compute Devices
- Each Compute Device is composed of one or more Compute Units
- Each Compute Unit is further divided into one or more Processing Elements
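The decomposition above can be mirrored in C: OpenCL splits a global range of work-items into work-groups (scheduled onto compute units), and each work-item within a group maps to a processing element. The helper functions below are invented for illustration; they compute what OpenCL's built-in get_group_id and get_local_id would return for a one-dimensional range.

```c
#include <assert.h>

/* Work-group id: which compute unit's chunk this work-item belongs to. */
static int group_id(int global_id, int local_size) {
    return global_id / local_size;
}

/* Local id: which processing element within the work-group. */
static int local_id(int global_id, int local_size) {
    return global_id % local_size;
}
```

For example, with work-groups of 64 items, global work-item 70 is local item 6 of group 1.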
27. CUDA Parallel Computing Architecture
- ISA and hardware compute engine
- Includes a C compiler plus support for OpenCL and DX11 Compute
- Architected to natively support all computational interfaces (standard languages and APIs)
28. Option 1: OpenCL and C for CUDA
- C for CUDA: entry point for developers who prefer high-level C
- OpenCL: entry point for developers who want a low-level API
- Both share a back-end compiler and PTX optimization technology targeting the GPU
29. CUDA Success: Science & Computation
Not 2x or 3x, but speedups of 20x to 150x:
- 146X Medical Imaging (U of Utah)
- 36X Molecular Dynamics (U of Illinois, Urbana)
- 18X Video Transcoding (Elemental Tech)
- 50X Matlab Computing (AccelerEyes)
- 100X Astrophysics (RIKEN)
- 149X Financial simulation (Oxford)
- 47X Linear Algebra (Universidad Jaime)
- 20X 3D Ultrasound (Techniscan)
- 130X Quantum Chemistry (U of Illinois, Urbana)
- 30X Gene Sequencing (U of Maryland)
30. [Chart: performance vs. accessibility]
- Today's workstations: 1x performance
- Supercomputing cluster: 250x faster, $100K - $1M
- Tesla Personal Supercomputer: the same 250x performance at < $10K (100x more affordable), with 20x less power consumption
32. Grand Computing Challenges
- Personalized Medicine
- Mathematics for Scientific Discovery
- Information Data Mining
- Renewable Energy
- Machines That Think
- Natural Human Machine Interaction
- Predict Environmental Changes
- Economic Analysis
33. Final Thoughts
- GPU and heterogeneous parallel architectures will revolutionize computing
- Parallel computing is needed to solve some of the most interesting and important human challenges ahead
- Learning parallel programming is imperative for students in computing and the sciences
34. From Virtua Fighter to Tsubame

                1995 → NV1    2008 → GT200
  Transistors   0.8M          1,200M
  Clock         50 MHz        1.3 GHz
  Memory        1 MByte       4 GBytes
  Performance   0 GFLOPS      1 TFLOPS

Another 1000x in 15 years?
38. OpenGL ES
- Designed for hand-held and embedded devices; the goal is a smaller-footprint version of OpenGL
- PlayStation 3 and the cell phone industry are adopting ES
- OpenGL ES 1.1: strips out anything deemed extra in OpenGL; keeps conventional fixed-function vertex and fragment processing
- OpenGL ES 2.0: adds programmable vertex and fragment shaders (shaders can be specified in binary format); drops support for fixed-function vertex and fragment processing
39. OpenGL ES (cont.)
- OpenGL ES 1.0: Symbian OS, Android platform
- OpenGL ES 1.0+: PlayStation 3
- OpenGL ES 1.1: iPhone SDK, BlackBerry (some models)
- OpenGL ES 2.0: iPhone 3GS, iPod touch