Transcript

  • 1. An Introduction to GPU: 3D Games to HPC. Krishnaraj Rao. Presented at Bangalore DV Club, 03/12/2010
  • 2. Agenda. 3D Graphics: The Big Picture; Quick Overview; Programming Model; Importance of 3D. High Performance Parallel Computing: Why GPUs for HPPC?; Available APIs; GPU Computing architecture. Q&A
  • 3. The Big Picture – Movies. [Figure: pipeline stages Capture, Models Creation, Scene Creation, Rendering API, Post Processing]
  • 4. The Big Picture - Games. [Figure: pipeline stages Capture, Models Creation, Scene Creation, Rendering API, Post Processing; additional labels: GPUs, Drivers, HLSL, Cg]
  • 5. Models end up in World Space. World space includes everything! The position and orientation of every item are needed to accurately calculate transformations into screen space. [Figure: World Coordinate Space with X, Y and Z axes, a Light Source, the View Point or Camera, and the Screen]
  • 6. View Transformation: the world ends up on screen. [Figure: Screen Coordinate Space]
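The step from world space to screen space on slides 5 and 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline from the talk: the function name `world_to_screen`, the camera looking straight down the negative Z axis, and the pinhole projection are all choices made for this example.

```python
import math

def world_to_screen(point, camera_pos, fov_y_deg, width, height):
    """Project a world-space point to pixel coordinates.

    Assumed for illustration: the camera sits at camera_pos, looks down the
    negative Z axis with Y up, and uses a simple pinhole projection.
    """
    # View transformation: express the point relative to the camera.
    x = point[0] - camera_pos[0]
    y = point[1] - camera_pos[1]
    z = point[2] - camera_pos[2]
    if z >= 0:
        return None  # at or behind the camera plane, so not visible

    # Perspective projection: divide by depth, scaled by the field of view.
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    aspect = width / height
    ndc_x = (f / aspect) * x / -z
    ndc_y = f * y / -z

    # Viewport transformation: map [-1, 1] to pixels, with Y growing downward.
    sx = (ndc_x + 1.0) * 0.5 * width
    sy = (1.0 - ndc_y) * 0.5 * height
    return sx, sy

# A point straight ahead of the camera lands in the middle of the screen.
center = world_to_screen((0.0, 0.0, -5.0), (0.0, 0.0, 0.0), 90.0, 640, 480)
```

A real renderer would also rotate the world by the camera orientation and use 4x4 matrices; both are left out here to keep the three stages visible.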
  • 7. Simple Interactive 3D Graphics App. A simple example: static scene geometry, moving viewer. Repeat this loop: the CPU takes user input from joystick or mouse; the CPU re-calculates viewer position, view direction, and light positions in 3-D world space; the GPU clears memory and draws the complete scene geometry with the new viewer and light positions. Repeat forever. [Figure: GPU pipeline blocks (vertex, setup, raster, Z cull, texture, fragment, raster ops) and the loop Read Joystick Position, Update Viewer and Light Position/Direction, Draw all Scene Objects]
  • 8. Adding Programmability to the Graphics Pipeline. [Figure: a 3D Application or Game issues 3D API Commands through the 3D API (OpenGL or Direct3D); across the CPU – GPU Boundary, the GPU Command & Data Stream reaches the GPU Front End; the Vertex Index Stream feeds Primitive Assembly; Assembled Polygons, Lines, and Points go to Rasterization & Interpolation; the Pixel Location Stream feeds Raster Operations; Pixel Updates go to the Framebuffer. The Programmable Vertex Processor turns Pre-transformed Vertices into Transformed Vertices; the Programmable Fragment Processor turns Rasterized Pre-transformed Fragments into Transformed Fragments]
  • 9. A History of Innovation
      • 1995: NV1, 1 million transistors
      • 1999: GeForce 256, 22 million transistors
      • 2002: GeForce4, 63 million transistors
      • 2003: GeForce FX, 130 million transistors
      • 2004: GeForce 6, 222 million transistors
      • 2005: GeForce 7, 302 million transistors
      • 2006-2007: GeForce 8, 754 million transistors
      • 2008: GeForce GTX 200, 1.4 billion transistors
    ... but what do all these extra transistors do?
  • 10. GPU continues to offload CPU work. [Figure: the CPU/GPU split across the pipeline in four snapshots. 1996 and 2000: Geom Gather, Geom Proc, Triangle Proc, Pixel Proc, Z / Blend. 2004 and 2008: Scene Mgmt, Physics and AI, Geom Gather, Geom Proc, Triangle Proc, Pixel Proc, Z / Blend]
  • 11. Programming Model. API: a set of functions, procedures or classes that an OS, library or service provides to support requests made by computer programs. DirectX: a collection of APIs for handling multimedia, especially game programming and video tasks, on Microsoft platforms. OpenGL (Open Graphics Library): a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics.
  • 12. Why is 3D Graphics important? More than just fun and games... [Images: Tokyo, Japan; California Coastline]
  • 13. 3D Consumer Applications: Vista, Office, PDFs, Music, Photos, Maps
  • 14. GPUS IN HPC
  • 15. Evolution of Processors. [Figure labels: Massive Data Parallelism; Instruction Level Parallelism; Data Fits in Cache; Huge Data Sets]
  • 16. GPU Processing Power. CPU, meet your new partner! CPU: Intel Core i7 965, 4 cores, 102 GFLOPS. GPU: NVIDIA GTX 285, 240 cores, 1.04 TFLOPS.
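The comparison on slide 16 can be worked through numerically. The sketch below only restates the peak figures quoted on the slide (nothing is measured), and the helper names `peak_ratio` and `gflops_per_core` are made up for this example.

```python
# Peak figures as quoted on the slide; these are not measurements.
CPU = {"name": "Intel Core i7 965", "cores": 4, "gflops": 102.0}
GPU = {"name": "NVIDIA GTX 285", "cores": 240, "gflops": 1040.0}  # 1.04 TFLOPS

def peak_ratio(a, b):
    """How many times higher a's quoted peak throughput is than b's."""
    return a["gflops"] / b["gflops"]

def gflops_per_core(chip):
    """Quoted peak throughput divided evenly across the quoted core count."""
    return chip["gflops"] / chip["cores"]

print(f"GPU / CPU peak: {peak_ratio(GPU, CPU):.1f}x")
print(f"per CPU core:   {gflops_per_core(CPU):.1f} GFLOPS")
print(f"per GPU core:   {gflops_per_core(GPU):.1f} GFLOPS")
```

On these numbers each GPU core is individually much slower than a CPU core; the roughly tenfold peak advantage comes from having sixty times as many of them, which is why it only pays off for workloads with enough parallel work to keep them busy.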
  • 17. Beyond Graphics. With floating-point math and textures, graphics processors can be used for more than just graphics. GPGPU = “General Purpose Computing on GPUs”. Lots of ongoing research is mapping algorithms and problems onto programmable GPUs: solving linear equations, Black-Scholes options pricing, rigid- and soft-body dynamics. Middleware layers are being developed to accelerate “eye candy” game physics on GPUs (HavokFX).
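Black-Scholes options pricing, one of the workloads named on slide 17, shows why such problems map well onto a GPU: every option is priced from its own inputs, with no dependence on any other option. The sketch below is plain CPU Python using the standard closed-form formula for a European call; the function names and the sample inputs are hypothetical and not taken from the talk.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(spot, strike, rate, vol, years):
    """Price of a European call option under the Black-Scholes model."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)

def price_batch(options):
    """Price each option independently; on a GPU each element could be one thread."""
    return [black_scholes_call(*opt) for opt in options]

# Hypothetical inputs: (spot, strike, rate, volatility, years to expiry).
batch = [(100.0, 100.0, 0.05, 0.2, 1.0), (100.0, 110.0, 0.05, 0.2, 0.5)]
prices = price_batch(batch)
```

Because `price_batch` is a pure element-wise map, the list comprehension could in principle be replaced by one GPU thread per option without changing the result.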
  • 18. What is GPGPU? General-purpose computation using the GPU in applications other than 3D graphics; the GPU accelerates the critical path of the application. Data-parallel algorithms leverage GPU attributes: large data arrays and streaming throughput; fine-grain SIMD parallelism; floating point (FP) computation. Great for “embarrassingly parallel” algorithms. Applications (see GPGPU.org): game effects (FX) physics, image processing; physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting.
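The fine-grain data parallelism described on slide 18 is usually written as a small kernel that runs once per array element. The following is a minimal CPU-only sketch of that shape; `saxpy_kernel` and `launch` are illustrative names, and `launch` is a sequential stand-in rather than any real GPU API.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work done by one logical thread: it reads and writes only element i."""
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    """Stand-in for a GPU launch: run the kernel once per index.

    The iterations are independent, so a GPU could run them in any order or
    all at once; this loop simply runs them one after another.
    """
    for i in range(n):
        kernel(i, *args)

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
```

The property that matters is that no iteration reads a value written by another one, which is what makes the algorithm “embarrassingly parallel”.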
  • 19. Why Computation on the GPU? A quiet buildup of potential. Calculation throughput and memory bandwidth: 10X. Equivalent performance at a fraction of the power and cost. GPU in every PC: pervasive presence and massive impact. GPUs have always been parallel “multi-core”: natively designed to handle massive threading; every pixel is a thread. Increased precision (fp32), programmability, flexibility. GPUs are a mass-market parallel processor: economies of scale. Peak floating point performance is much higher than comparable CPUs. ATI X1900XT ($400, video card): 250 GFLOPS (SP float), 46 GB/s main memory bandwidth. Intel Core 2 Duo E6600 ($400, processor only): 40 GFLOPS (SP float), 8.5 GB/s main memory bandwidth.
  • 20. Why Computation on the GPU? Supercomputing Performance: inherently parallel architecture; 1000+ cores, massively parallel processing; 250x the compute performance of a PC. Personal: “One Researcher, One Supercomputer”; supercomputer in a desktop system; plugs into a standard power strip. Accessible: program in C, C++, Fortran for Windows or Linux; available from OEMs and resellers worldwide and priced like a workstation.
  • 21. Compute Applications: Computational Fluid Dynamics; Computer Aided Engineering; Digital Content Creation; Electronic Design Automation; Finance; Game Physics; Graphics; Imaging and Computer Vision; Medical Imaging; Numerics; Bio-Informatics and Life Sciences; Computational Chemistry; Computational Electromagnetics & Electrodynamics; Data Mining, Analytics & Databases; MATLAB Acceleration; Molecular Dynamics; Weather, Atmospheric, Ocean Modeling, and Space Sciences; Libraries; Oil & Gas; Programming Tools; Ray Tracing; Signal Processing; Video & Audio
  • 22. Heterogeneous Computing: Multi-Core CPU, Parallel-Core GPU
  • 23. APIS FOR HETEROGENEOUS COMPUTING
  • 24. APIs for Heterogeneous Computing. CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. Programmers use 'C for CUDA' (C with NVIDIA extensions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU. Both low-level and high-level APIs are provided. OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. Microsoft DirectCompute is an API that supports general-purpose computing on GPUs on Microsoft Windows Vista or Windows 7. DirectCompute is part of the Microsoft DirectX collection of APIs.
  • 25. OpenCL
  • 26. OpenCL: Platform Model & Program Structure. One Host plus one or more Compute Devices. Each Compute Device is composed of one or more Compute Units. Each Compute Unit is further divided into one or more Processing Elements.
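The hierarchy on slide 26 (host, compute devices, compute units, processing elements) can be modelled as nested data. The classes below are a hypothetical illustration of that structure only; they are not the OpenCL API, and the device names and counts are made-up example values.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComputeUnit:
    processing_elements: int  # number of processing elements in this unit

@dataclass
class ComputeDevice:
    name: str
    compute_units: List[ComputeUnit] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        return sum(cu.processing_elements for cu in self.compute_units)

@dataclass
class Host:
    devices: List[ComputeDevice] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        return sum(d.total_processing_elements() for d in self.devices)

# Illustrative numbers only: 30 units of 8 elements gives 240 in total.
gpu = ComputeDevice("example-gpu", [ComputeUnit(8) for _ in range(30)])
cpu = ComputeDevice("example-cpu", [ComputeUnit(1) for _ in range(4)])
host = Host([gpu, cpu])
```

The point of the model is that a CPU and a GPU are described with the same three levels, which is what lets one program address both kinds of device.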
  • 27. CUDA Parallel Computing Architecture. ISA and hardware compute engine. Includes a C compiler plus support for OpenCL and DX11 Compute. Architected to natively support all computational interfaces (standard languages and APIs).
  • 28. Option 1: OpenCL and C for CUDA. C for CUDA: entry point for developers who prefer high-level C. OpenCL: entry point for developers who want a low-level API. Shared back-end compiler and optimization technology (PTX), targeting the GPU.
  • 29. CUDA Success: Science & Computation. Not 2x or 3x: speedups are 20x to 150x.
      • 146X: Medical Imaging (U of Utah)
      • 36X: Molecular Dynamics (U of Illinois, Urbana)
      • 18X: Video Transcoding (Elemental Tech)
      • 50X: Matlab Computing (AccelerEyes)
      • 100X: Astrophysics (RIKEN)
      • 149X: Financial simulation (Oxford)
      • 47X: Linear Algebra (Universidad Jaime)
      • 20X: 3D Ultrasound (Techniscan)
      • 130X: Quantum Chemistry (U of Illinois, Urbana)
      • 30X: Gene Sequencing (U of Maryland)
  • 30. Tesla Personal Supercomputer. [Figure: Performance versus Accessibility. Today's Workstations: 1x. Supercomputing Cluster: 250x, $100K - $1M. Tesla Personal Supercomputer: 250x faster, < $10K. Callouts: 100x more affordable, 20x less power consumption]
  • 31. Solving the World’s Most Complex Challenges: Film, Science, Auto Design, Oil & Gas, Medicine, Broadcast, Space Exploration
  • 32. Grand Computing Challenges: Personalized Medicine; Mathematics for Scientific Discovery; Information Data Mining; Renewable Energy; Machines That Think; Natural Human Machine Interaction; Predict Environmental Changes; Economic Analysis
  • 33. Final Thoughts. GPU and heterogeneous parallel architecture will revolutionize computing. Parallel computing is needed to solve some of the most interesting and important human challenges ahead. Learning parallel programming is imperative for students in computing and sciences.
  • 34. From Virtua Fighter to Tsubame. 1995, NV1: 0.8M transistors, 50 MHz, 1 MB, 0 GFLOPS. 2008, GT200: 1,200M transistors, 1.3 GHz, 4 GB, 1 TFLOPS. Another 1000x in 15 years?
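The closing question on slide 34 can be put in numbers. The sketch below uses only the figures quoted on the slide; the helper names are made up for this example, and reading "1M Bytes" and "4G Bytes" as binary megabytes and gigabytes is an assumption.

```python
def growth_factor(start, end):
    """Ratio between the later and the earlier value."""
    return end / start

def annual_rate(factor, years):
    """Constant yearly multiplier that compounds to `factor` over `years`."""
    return factor ** (1.0 / years)

transistors = growth_factor(0.8e6, 1200e6)      # NV1 -> GT200
clock = growth_factor(50e6, 1.3e9)              # 50 MHz -> 1.3 GHz
memory = growth_factor(1 * 2**20, 4 * 2**30)    # 1 MB -> 4 GB, binary units assumed

# Yearly growth implied by 1995 -> 2008, and the rate needed for 1000x in 15 years.
per_year_past = annual_rate(transistors, 2008 - 1995)
per_year_needed = annual_rate(1000.0, 15)

print(f"transistors: {transistors:.0f}x, clock: {clock:.0f}x, memory: {memory:.0f}x")
print(f"past yearly growth in transistors: {per_year_past:.2f}x")
print(f"yearly growth needed for 1000x in 15 years: {per_year_needed:.2f}x")
```

On these figures the transistor count grew faster per year between 1995 and 2008 than a further 1000x over 15 years would require, while the clock rate grew far less than either; the slide's 1000x question is about performance, which these ratios do not measure directly.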
  • 35. BACKUP
  • 36. Graphics API History
  • 37. OpenGL
      • 1992: OpenGL 1.0
      • 1996: OpenGL 1.1 (Vertex Arrays, Improved Texturing)
      • 1998: OpenGL 1.2 (3D Textures, BGRA pixel format)
      • 1998: OpenGL 1.2.1 (Multi-Texture)
      • 2001: OpenGL 1.3 (Multi-sample AA, Cube/Compressed Textures)
      • 2002: OpenGL 1.4 (Depth/Shadow mapping, Auto mipmap generation)
      • 2003: OpenGL 1.5 (Vertex Attr from Vid Mem)
      • 2005: OpenGL 2.0 (GLSL, Vertex/Pixel Shaders, MRT, Non-Power-of-2 Textures)
      • 2006: OpenGL 2.1 (GLSL 1.2, sRGB Textures)
      • 2008: OpenGL 3.0 (GLSL 1.3, 32-bit FP Textures)
      • 2009: OpenGL 3.1 (March 2009, GLSL 1.4, Perf, CopyBuffer API)
      • 2009: OpenGL 3.2 (Aug 2009, GLSL 1.5, Geom Shaders)
  • 38. OpenGL ES. Designed for hand-held and embedded devices. The goal is a smaller footprint to support OpenGL. PlayStation 3 and the cell phone industry are adopting ES. OpenGL ES 1.1: strips out anything deemed extra in OpenGL; keeps conventional fixed-function vertex and fragment processing. OpenGL ES 2.0: adds programmable vertex and fragment shaders; shaders specified in binary format; drops support for fixed-function vertex and fragment processing.
  • 39. OpenGL ES – Cont. OpenGL ES 1.0: Symbian OS, Android Platform. OpenGL ES 1.0+: PlayStation 3. OpenGL ES 1.1: iPhone SDK, BlackBerry (some models). OpenGL ES 2.0: iPhone 3GS, iPod touch.
  • 40. DirectX
      • GDI: legacy Windows graphics API, ~1985
      • DirectX 1.0 – 1995/6 (No 3D support, DirectDraw, DirectSound, DirectInput)
      • DirectX 3.0 – 1996 (Rasterization-only 3D support, awkward programming model, not successful)
      • DirectX 5.0 – 1997 (Draw Primitives, DirectX vs OpenGL war)
      • DirectX 6.0 – 1998 (Multitexture, OGL/Glide features, Texture Compression)
      • DirectX 7.0 – 1999 (Geometry HW acceleration and Blending, Cube mapping)
      • DirectX 8.0 – 2000/1 (Programmable VS/PS Shaders, Xbox)
      • DirectX 9.0 – 2002-2003 (More programmability, Branching, FP pixel prog.)
      • DirectX 9.0c – 2004 (Shader Model 3.0)
      • DirectX 10.0 – 2006 (SM4.0, WinVista, Geometry Shaders, Streaming Output)
      • DirectX 10.1 – 2008 (SM4.1, Better Image Quality)
      • DirectX 11.0 – 2009 (SM5.0, DirectCompute, Tessellation, WinVista SP2, Win7)