An Introduction to GPU
          3D Games to HPC
           Krishnaraj Rao
Presented at Bangalore DV Club, 03/12/2010
Agenda

 3D Graphics
    The Big Picture
    Quick Overview
    Programming Model
    Importance of 3D


 High Performance Parallel Computing
    Why GPUs for HPPC?
    Available APIs
    GPU Computing architecture


 Q&A
The Big Picture - Movies

[Diagram: Capture -> Models -> Scene -> Rendering -> Post Processing; labels beneath the stages: Creation, Creation, API]
The Big Picture - Games

[Diagram: Capture -> Models -> Scene -> Rendering -> Post Processing; labels beneath the stages: Creation, Creation, API, HLSL, Cg, Drivers]
Models end up in World Space
     World space includes everything!
     The position and orientation of every
     item are needed to accurately calculate
     transformations into screen space.

[Figure: World Coordinate Space with X, Y and Z axes, showing a Light Source, the View Point or Camera, and the Screen]
View Transformation: world ends up on Screen

[Figure: the scene in Screen Coordinate Space]
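
The step from world space to screen space can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions that are not in the slides: the camera sits at `eye`, looks down the negative Z axis with +Y up and no rotation, and the name `world_to_screen`, the `focal` parameter and the 640x480 screen size are all illustrative.

```python
def world_to_screen(point, eye, focal=1.0, width=640, height=480):
    """Project a world-space point to pixel coordinates.

    Assumes (for illustration only) a camera at `eye` looking down the
    negative Z axis with +Y up, and no camera rotation.
    """
    # View transform: express the point relative to the camera.
    x = point[0] - eye[0]
    y = point[1] - eye[1]
    z = point[2] - eye[2]
    if z >= 0:
        return None  # point is behind (or exactly at) the camera plane
    # Perspective divide: farther points move toward the screen center.
    x_ndc = focal * x / -z
    y_ndc = focal * y / -z
    # Viewport transform: map [-1, 1] to pixels, with y growing downward.
    sx = (x_ndc + 1.0) * 0.5 * width
    sy = (1.0 - y_ndc) * 0.5 * height
    return sx, sy
```

A point straight ahead of the camera lands in the middle of the screen, and a point behind the camera is rejected. A real renderer would also rotate the world into the camera's frame and clip against the view volume.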
Simple Interactive 3D Graphics App

 A simple example
     Static scene geometry, moving viewer
 Repeat this loop:
     CPU takes user input from joystick or mouse
     CPU re-calculates viewer position, view direction,
     and light positions in 3-D world space
     GPU clears memory and draws the complete scene
     geometry with the new viewer and light positions
     Repeat forever

[Diagram, GPU pipeline: Vertex Engine -> Setup -> Raster -> Fragment Engine -> Raster Ops, with Z Cull and Texture units]
[Diagram, loop: Read Joystick Position -> Update Viewer Position and Light Direction -> Draw all Scene Objects]
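
The loop on this slide can be sketched as follows. This is only an illustration: `read_joystick`, `update_viewer`, `draw_all` and `run` are hypothetical names, the "GPU" step is a plain Python function, the light is kept fixed for brevity, and the loop runs for a fixed number of frames instead of forever.

```python
def read_joystick(frame_index):
    """Hypothetical input source: a constant push along +X."""
    return (1.0, 0.0, 0.0)


def update_viewer(position, move, speed=0.5):
    """CPU step: recompute the viewer position in world space."""
    return tuple(p + speed * m for p, m in zip(position, move))


def draw_all(scene_objects, viewer, light):
    """Stand-in for the GPU step: clear the frame, then draw every object."""
    frame = []  # "clearing memory" is modelled as starting from an empty frame
    for obj in scene_objects:
        frame.append((obj, viewer, light))
    return frame


def run(scene_objects, frames=3):
    viewer = (0.0, 0.0, 0.0)
    light = (0.0, 10.0, 0.0)  # kept fixed here; the slide also updates lights
    last_frame = None
    for i in range(frames):  # the real loop repeats forever
        move = read_joystick(i)
        viewer = update_viewer(viewer, move)
        last_frame = draw_all(scene_objects, viewer, light)
    return viewer, last_frame
```

The scene list never changes between frames; only the viewer state does. That is the "static scene geometry, moving viewer" case the slide describes.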
Adding Programmability to the
    Graphics Pipeline
[Diagram: programmable graphics pipeline]

    3D Application or Game
        3D API Commands
    3D API: OpenGL or Direct3D
        ---- CPU - GPU Boundary ----
        GPU Command & Data Stream
    GPU Front End
        Vertex Index Stream
    Primitive Assembly
        Assembled Polygons, Lines, and Points
    Rasterization & Interpolation
        Pixel Location Stream
    Raster Operations
        Pixel Updates
    Framebuffer

    Programmable Vertex Processor: Pre-transformed Vertices in, Transformed Vertices out
    Programmable Fragment Processor: Rasterized Pre-transformed Fragments in, Transformed Fragments out
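
To make the stage names concrete, here is a toy 2D software version of the pipeline. It is a sketch under stated assumptions, not how any GPU or driver is implemented: all function names are hypothetical, the "vertex program" is a plain translation, the "fragment program" picks a character instead of a color, and the raster operation simply overwrites the framebuffer.

```python
def vertex_program(v, offset):
    """Programmable vertex stage: here, a simple 2D translation."""
    return (v[0] + offset[0], v[1] + offset[1])


def assemble_triangles(vertices, indices):
    """Primitive assembly: group an index stream into triangles."""
    return [tuple(vertices[i] for i in indices[k:k + 3])
            for k in range(0, len(indices) - 2, 3)]


def edge(a, b, p):
    """Signed area test: which side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(tri, width, height):
    """Rasterization: yield pixels whose centers are inside or on the edge of tri."""
    a, b, c = tri
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
            inside = ((w0 >= 0 and w1 >= 0 and w2 >= 0) or
                      (w0 <= 0 and w1 <= 0 and w2 <= 0))
            if inside:
                yield x, y


def fragment_program(x, y):
    """Programmable fragment stage: here, a checkerboard of '#' and '+'."""
    return "#" if (x + y) % 2 == 0 else "+"


def render(vertices, indices, width=8, height=4, offset=(0, 0)):
    framebuffer = [["." for _ in range(width)] for _ in range(height)]
    transformed = [vertex_program(v, offset) for v in vertices]
    for tri in assemble_triangles(transformed, indices):
        for x, y in rasterize(tri, width, height):
            framebuffer[y][x] = fragment_program(x, y)  # raster op: overwrite
    return ["".join(row) for row in framebuffer]
```

Swapping `vertex_program` or `fragment_program` changes the picture without touching assembly or rasterization. That separation is what adding programmability to otherwise fixed stages means.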
A History of Innovation




    Year         GPU                Transistors
    1995         NV1                1 Million
    1999         GeForce 256        22 Million
    2002         GeForce4           63 Million
    2003         GeForce FX         130 Million
    2004         GeForce 6          222 Million
    2005         GeForce 7          302 Million
    2006-2007    GeForce 8          754 Million
    2008         GeForce GTX 200    1.4 Billion

    ...but what do all these extra transistors do?

GPU continues to offload CPU work
[Diagram: four snapshots of the graphics pipeline (1996, 2000, 2004, 2008), each marking which stages run on the CPU and which on the GPU]
    1996, 2000: Geom Gather, Geom Proc, Triangle Proc, Pixel Proc, Z / Blend
    2004, 2008: Scene Mgmt, Physics and AI, Geom Gather, Geom Proc, Triangle Proc, Pixel Proc, Z / Blend
Programming Model
 API: Set of functions, procedures or classes
 that an OS, library or service provides to
 support requests made by computer
 programs
 DirectX: Collection of APIs for handling
 multimedia tasks, especially game programming
 and video, on Microsoft platforms.
 OpenGL (Open Graphics Library) is a
 standard specification defining a cross-
 language, cross-platform API for writing
 applications that produce 2D and 3D
 computer graphics.
Why is 3D Graphics important?
More than just Fun and Games....




Tokyo, Japan                       California Coastline
3D Consumer Applications
  Vista      Office        PDFs




  Music       Photos       Maps
GPUS IN HPC

[Chart: vertical axis from Instruction Level Parallelism to Massive Data Parallelism; horizontal axis from Data Fits in Cache to Huge Data Sets]
GPU Processing Power
CPU, meet your new partner!

[Figure: CPU and GPU shown side by side]
Beyond Graphics

 With floating-point math and textures, graphics
 processors can be used for more than just graphics
    GPGPU - "General Purpose Computing on GPUs"


 Lots of ongoing research mapping algorithms and
 problems onto programmable GPUs
    Solving Linear Equations
    Black-Scholes Options Pricing
    Rigid- and Soft-Body Dynamics


 Middleware layers being developed to accelerate
 "eye candy" game physics on GPUs (HavokFX)
What is GPGPU ?
  General Purpose computation using GPU
  in applications other than 3D graphics
     GPU accelerates critical path of application
  Data parallel algorithms leverage GPU attributes
     Large data arrays, streaming throughput
     Fine-grain SIMD parallelism
     Floating point (FP) computation
  Great for "embarrassingly parallel" algorithms

  Applications - see //GPGPU.org
     Game effects (FX) physics, image processing
     Physical modeling, computational engineering, matrix
     algebra, convolution, correlation, sorting
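
The data-parallel pattern described above can be illustrated with a per-element "kernel". This is a sketch in plain Python, not GPU code: `launch` is a hypothetical sequential stand-in for starting one thread per array element, and SAXPY (`a * x + y`) is chosen only as a familiar example.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work done by one 'thread': a single element, independent of all others."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Sequential stand-in for a GPU launch of n independent threads."""
    for i in range(n):
        kernel(i, *args)


def saxpy(a, x, y):
    out = [0.0] * len(x)
    launch(saxpy_kernel, len(x), a, x, y, out)
    return out
```

No invocation of `saxpy_kernel` reads another invocation's result, so the iterations could run in any order or all at once. That independence is what lets large arrays map onto fine-grain parallel hardware.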
Why Computation on the GPU?
  A quiet buildup of potential
       Calculation Throughput and Memory Bandwidth: 10X
       Equivalent performance at fraction of power & cost
       GPU in every PC - pervasive presence and massive impact
  GPUs have always been parallel "multi-core"
  Natively designed to handle massive threading
  Every pixel is a thread
  Increased precision (fp32), programmability, flexibility
  GPUs are a mass-market parallel processor
       Economies of scale
  Peak floating point performance is much higher than comparable
  CPUs

    ATI x1900XT                     Intel Core 2 Duo E6600
    ~$400 (video card)              ~$400 (processor only)
    ~250 GFLOPS (SP Float)          ~40 GFLOPS (SP Float)
    ~46 GB/s main memory BW         ~8.5 GB/s main memory BW
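
The comparison above can be reduced to a few ratios. The figures are taken from the slide as quoted and treated as approximate; the dictionary layout and the helper names `ratio` and `gflops_per_dollar` are illustrative assumptions.

```python
# Approximate figures as quoted on the slide.
gpu = {"name": "ATI x1900XT", "price_usd": 400, "gflops": 250, "bandwidth": 46}
cpu = {"name": "Intel Core 2 Duo E6600", "price_usd": 400, "gflops": 40, "bandwidth": 8.5}


def ratio(key):
    """How many times larger the GPU figure is than the CPU figure."""
    return gpu[key] / cpu[key]


def gflops_per_dollar(part):
    """Peak single-precision GFLOPS bought per dollar."""
    return part["gflops"] / part["price_usd"]
```

With these numbers the GPU offers about 6x the peak single-precision throughput and about 5x the memory bandwidth at the same price. These are peak figures, not measured application speedups.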
Why Computation on the GPU?
  Supercomputing Performance
     Inherently Parallel Architecture
     1000+ cores, massively parallel processing
     250x the compute performance of a PC
  Personal
     "One Researcher, One Supercomputer"
     Supercomputer in a desktop system
     Plugs into standard power strip
  Accessible
     Program in C, C++, Fortran for Windows or Linux
     Available from OEMs and resellers worldwide and priced
     like a workstation
Compute Applications
  Computational Fluid Dynamics
  Computer Aided Engineering
  Digital Content Creation
  Electronic Design Automation
  Finance
  Game Physics
  Graphics
  Imaging and Computer Vision
  Medical Imaging
  Numerics
  Bio-Informatics and Life Sciences
  Computational Chemistry
  Computational Electromagnetics & Electrodynamics
  Data Mining, Analytics & Databases
  MATLAB Acceleration
  Molecular Dynamics
  Weather, Atmospheric, Ocean Modeling, and Space Sciences
  Libraries
  Oil & Gas
  Programming Tools
  Ray Tracing
  Signal Processing
  Video & Audio
Heterogeneous Computing




   Multi-Core CPU      Parallel-Core GPU
APIS FOR HETEROGENEOUS COMPUTING
APIs for Heterogeneous Computing
 CUDA (Compute Unified Device Architecture) is a
 parallel computing architecture developed by NVIDIA.
 Programmers use 'C for CUDA' (C with NVIDIA
 extensions), compiled through a PathScale Open64 C
 compiler, to code algorithms for execution on the
 GPU. Both low-level and high-level APIs are provided.
 OpenCL (Open Computing Language) is a framework
 for writing programs that execute across
 heterogeneous platforms consisting of CPUs, GPUs,
 and other processors.
 Microsoft DirectCompute is an API that supports
 general-purpose computing on GPUs on Microsoft
 Windows Vista or Windows 7. DirectCompute is part of the
 Microsoft DirectX collection of APIs.
OpenCL
OpenCL: Platform Model & Program Structure

   One Host + one or more Compute Devices
      Each Compute Device is composed of one or more
      Compute Units
      Each Compute Unit is further divided into one or more
      Processing Elements
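
The hierarchy above can be written down as a small data structure. This is a sketch of the model only, not the OpenCL API: the class names mirror the terms on the slide, while `example_platform`, the device names and all sizes are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeUnit:
    processing_elements: int  # number of Processing Elements in this unit


@dataclass
class ComputeDevice:
    name: str
    compute_units: List[ComputeUnit] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        return sum(cu.processing_elements for cu in self.compute_units)


@dataclass
class Host:
    devices: List[ComputeDevice] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        return sum(d.total_processing_elements() for d in self.devices)


def example_platform() -> Host:
    # Made-up sizes, chosen only to make the hierarchy concrete.
    gpu = ComputeDevice("example-gpu", [ComputeUnit(8) for _ in range(4)])
    cpu = ComputeDevice("example-cpu", [ComputeUnit(1) for _ in range(2)])
    return Host([gpu, cpu])
```

The same three-level shape describes both a GPU with many wide compute units and a CPU with a few narrow ones, so one host program can target either kind of device.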
CUDA Parallel Computing Architecture


ISA and hardware
compute engine

Includes a C-compiler
plus support for
OpenCL and
DX11 Compute

Architected to natively
support all
computational
interfaces
(standard languages
and APIs)
Option 1
OpenCL and C for CUDA


    C for CUDA: entry point for developers who prefer high-level C
    OpenCL: entry point for developers who want low-level API
    PTX: shared back-end compiler and optimization technology
    GPU
CUDA Success - Science & Computation
Not 2x or 3x, but speedups are 20x to 150x

    Speedup   Application             Organization
    146X      Medical Imaging         U of Utah
    36X       Molecular Dynamics      U of Illinois, Urbana
    18X       Video Transcoding       Elemental Tech
    50X       Matlab Computing        AccelerEyes
    100X      Astrophysics            RIKEN
    149X      Financial simulation    Oxford
    47X       Linear Algebra          Universidad Jaime
    20X       3D Ultrasound           Techniscan
    130X      Quantum Chemistry       U of Illinois, Urbana
    30X       Gene Sequencing         U of Maryland
[Chart: Performance (vertical axis, 1x to 250x) against Accessibility (horizontal axis, $100K - $1M to < $10 K)]
    Plotted: Supercomputing Cluster, Today's Workstations, Tesla Personal Supercomputer
    Callouts: 250x faster, 100x more affordable, 20x less power consumption
Solving the World's Most Complex Challenges

    Film, Science, Auto Design, Oil & Gas, Medicine, Broadcast, Space Exploration
Grand Computing Challenges




    Renewable Energy
    Personalized Medicine
    Mathematics for Scientific Discovery
    Information Data Mining
    Machines That Think
    Natural Human Machine Interaction
    Predict Environmental Changes
    Economic Analysis
Final Thoughts

 GPU and heterogeneous parallel
 architecture will revolutionize computing

 Parallel computing is needed to solve some of
 the most interesting and important human
 challenges ahead

 Learning parallel programming is imperative
 for students in computing and sciences
From Virtua Fighter to Tsubame


                      1995 - NV1    2008 - GT200
       Transistors    0.8M          1,200M
       Clock          50MHz         1.3GHz
       Memory         1M Bytes      4G Bytes
       Throughput     0 GFLOPS      1 TFLOPS



   Another 1000x in 15 years?
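
As a quick worked calculation, the following sketch shows what a 1000x gain over 15 years implies per year, and the growth factors implied by the figures on this slide. The helper names are illustrative, and the calculation assumes smooth compound growth, which real product generations only approximate.

```python
import math


def annual_growth(factor, years):
    """Constant yearly multiplier that compounds to `factor` over `years`."""
    return factor ** (1.0 / years)


def doubling_time(factor, years):
    """Years needed to double, given `factor` total growth over `years`."""
    return years * math.log(2) / math.log(factor)


# Ratios between the two chips on the slide.
transistor_factor = 1_200_000_000 / 800_000  # 1,200M vs 0.8M transistors
clock_factor = 1300 / 50                     # 1.3 GHz vs 50 MHz, in MHz
```

A 1000x gain in 15 years corresponds to growth of roughly 58% per year, or a doubling about every 18 months. On the slide's figures, transistor count grew 1500x while clock rate grew only 26x, so most of the gain came from doing more work in parallel.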
BACKUP
Graphics API History
OpenGL
1992: OpenGL 1.0
1996: OpenGL 1.1 (Vertex Arrays, Improved Texturing)
1998: OpenGL 1.2 (3D Textures, BGRA pixel format)
1998: OpenGL 1.2.1 (Multi-Texture)
2001: OpenGL 1.3 (Multi-sample AA, Cube/Compressed Textures)
2002: OpenGL 1.4 (Depth/Shadow mapping, Auto mipmap generation)
2003: OpenGL 1.5 (Vertex Attr from Vid Mem)
2004: OpenGL 2.0 (GLSL, Vertex/Pixel Shaders, MRT, Non P-of-2 Tex)
2006: OpenGL 2.1 (GLSL1.2, sRGB Textures)
2008: OpenGL 3.0 (GLSL1.3, 32b FP Textures)
2009: OpenGL 3.1 (March 2009, GLSL1.4, Perf, CopyBufferAPI)
2009: OpenGL 3.2 (Aug 2009, GLSL1.5, Geom Shaders)
OpenGL ES

 Designed for hand-held and embedded devices
    Goal is smaller footprint to support OpenGL
    PlayStation 3 and cell phone industry adopting ES
 OpenGL ES 1.1
    Strips out anything deemed extra in OpenGL
    Keeps conventional fixed-function vertex and fragment
    processing
 OpenGL ES 2.0
    Adds programmable vertex and fragment shaders
    Shaders specified in binary format
    Drops support for fixed-function vertex and fragment
    processing
OpenGL ES - Cont.


 OpenGL ES 1.0 : Symbian OS, Android Platform
 OpenGL ES 1.0+ : PlayStation 3
 OpenGL ES 1.1 : iPhone SDK, BlackBerry (some models)
 OpenGL ES 2.0 : iPhone 3GS, iPod touch
DirectX

GDI: legacy Windows graphics API ~1985
DirectX 1.0 - 1995/6 (No 3D support, DirectDraw, DirectSound, DirectInput)
DirectX 3.0 - 1996 (Rasterization-only 3D support, awkward prog. model, not successful)
DirectX 5.0 - 1997 (Draw Primitives, DirectX vs OpenGL War)
DirectX 6.0 - 1998 (Multitexture, OGL/Glide features, Texture Compression)
DirectX 7.0 - 1999 (Geometry HW acceleration and Blending, Cube mapping)
DirectX 8.0 - 2000/1 (Programmable VS/PS Shaders, XBOX)
DirectX 9.0 - 2002-2003 (More programmability, Branching, FP pixel prog.)
DirectX 9.0c - 2004 (ShaderModel 3.0)
DirectX 10.0 - 2006 (SM4.0, WinVista, Geometry Shaders, Stream Output)
DirectX 10.1 - 2008 (SM4.1, Better Image Quality)
DirectX 11.0 - 2009 (SM5.0, DirectCompute, Tessellation, WinVista SP2, Win7)

More Related Content

What's hot

Java me 08-mobile3d
Java me 08-mobile3dJava me 08-mobile3d
Java me 08-mobile3dHemanth Raju
 
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUs
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUsPerformance Evaluation of SAR Image Reconstruction on CPUs and GPUs
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUsFisnik Kraja
 
Poser pro reference manual
Poser pro reference manualPoser pro reference manual
Poser pro reference manualSykrayo
 
Compute API –Past & Future
Compute API –Past & FutureCompute API –Past & Future
Compute API –Past & FutureOfer Rosenberg
 
Ghajini - The Game Development
Ghajini - The Game DevelopmentGhajini - The Game Development
Ghajini - The Game DevelopmentImran K
 
iMinds The Conference: Jan Lemeire
iMinds The Conference: Jan LemeireiMinds The Conference: Jan Lemeire
iMinds The Conference: Jan Lemeireimec
 
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...Owen Wu
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio Owen Wu
 
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Fisnik Kraja
 
Google I/O 2013 - Android Graphics Performance
Google I/O 2013 - Android Graphics PerformanceGoogle I/O 2013 - Android Graphics Performance
Google I/O 2013 - Android Graphics PerformanceDouO
 
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for ArtistsOwen Wu
 

What's hot (15)

Java me 08-mobile3d
Java me 08-mobile3dJava me 08-mobile3d
Java me 08-mobile3d
 
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUs
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUsPerformance Evaluation of SAR Image Reconstruction on CPUs and GPUs
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUs
 
Poser pro reference manual
Poser pro reference manualPoser pro reference manual
Poser pro reference manual
 
Hardware Accelerated 2D Rendering for Android
Hardware Accelerated 2D Rendering for AndroidHardware Accelerated 2D Rendering for Android
Hardware Accelerated 2D Rendering for Android
 
Compute API –Past & Future
Compute API –Past & FutureCompute API –Past & Future
Compute API –Past & Future
 
Gtug
GtugGtug
Gtug
 
Ghajini - The Game Development
Ghajini - The Game DevelopmentGhajini - The Game Development
Ghajini - The Game Development
 
iMinds The Conference: Jan Lemeire
iMinds The Conference: Jan LemeireiMinds The Conference: Jan Lemeire
iMinds The Conference: Jan Lemeire
 
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis...
 
Gstarcad 2012 ppt
Gstarcad 2012 ppt Gstarcad 2012 ppt
Gstarcad 2012 ppt
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
 
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
 
Google I/O 2013 - Android Graphics Performance
Google I/O 2013 - Android Graphics PerformanceGoogle I/O 2013 - Android Graphics Performance
Google I/O 2013 - Android Graphics Performance
 
Specsheet sncdh160
Specsheet sncdh160Specsheet sncdh160
Specsheet sncdh160
 
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
 

Viewers also liked (9)

Jonathan bromley doulos
Jonathan bromley doulosJonathan bromley doulos
Jonathan bromley doulos
 
Chris brown ti
Chris brown tiChris brown ti
Chris brown ti
 
Dv club foils_intel_austin
Dv club foils_intel_austinDv club foils_intel_austin
Dv club foils_intel_austin
 
Dill may-2008
Dill may-2008Dill may-2008
Dill may-2008
 
Benjamin q4 2008_bristol
Benjamin q4 2008_bristolBenjamin q4 2008_bristol
Benjamin q4 2008_bristol
 
Colwell validation attitude
Colwell validation attitudeColwell validation attitude
Colwell validation attitude
 
Ludden power7 verification
Ludden power7 verificationLudden power7 verification
Ludden power7 verification
 
Zehr dv club_12052006
Zehr dv club_12052006Zehr dv club_12052006
Zehr dv club_12052006
 
Zhang rtp q307
Zhang rtp q307Zhang rtp q307
Zhang rtp q307
 

Similar to 3 d to _hpc

GPU Virtualization on VMware's Hosted I/O Architecture
GPU Virtualization on VMware's Hosted I/O ArchitectureGPU Virtualization on VMware's Hosted I/O Architecture
GPU Virtualization on VMware's Hosted I/O Architectureguestb3fc97
 
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.ozlael ozlael
 
Gentek Introduce(en)
Gentek Introduce(en)Gentek Introduce(en)
Gentek Introduce(en)cloudmmog
 
SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012Mark Kilgard
 
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and BeyondSIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and BeyondMark Kilgard
 
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.ozlael ozlael
 
PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrKohei KaiGai
 
graphics processing unit ppt
graphics processing unit pptgraphics processing unit ppt
graphics processing unit pptNitesh Dubey
 
CS 354 GPU Architecture
CS 354 GPU ArchitectureCS 354 GPU Architecture
CS 354 GPU ArchitectureMark Kilgard
 
NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityMark Kilgard
 
“Tools for Creating Next-Gen Computer Vision Apps on Snapdragon,” a Presentat...
“Tools for Creating Next-Gen Computer Vision Apps on Snapdragon,” a Presentat...“Tools for Creating Next-Gen Computer Vision Apps on Snapdragon,” a Presentat...
“Tools for Creating Next-Gen Computer Vision Apps on Snapdragon,” a Presentat...Edge AI and Vision Alliance
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU deviceKohei KaiGai
 
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...Persistent Systems Ltd.
 
Adobe AIR - Mobile Performance – Tips & Tricks
Adobe AIR - Mobile Performance – Tips & TricksAdobe AIR - Mobile Performance – Tips & Tricks
Adobe AIR - Mobile Performance – Tips & TricksMihai Corlan
 
Core image presentation
Core image presentationCore image presentation
Core image presentationKyle Stewart
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)Amal R
 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyMark Kilgard
 

Similar to 3 d to _hpc (20)

GPU Virtualization on VMware's Hosted I/O Architecture
GPU Virtualization on VMware's Hosted I/O ArchitectureGPU Virtualization on VMware's Hosted I/O Architecture
GPU Virtualization on VMware's Hosted I/O Architecture
 
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) NDC15 Ver.
 
Gentek Introduce(en)
Gentek Introduce(en)Gentek Introduce(en)
Gentek Introduce(en)
 
SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012
 
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and BeyondSIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond
 
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.
유니티 그래픽 최적화, 어디까지 해봤니 (Optimizing Unity Graphics) Unite Seoul Ver.
 
PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated Asyncr
 
FIR filter on GPU
FIR filter on GPUFIR filter on GPU
FIR filter on GPU
 
graphics processing unit ppt
graphics processing unit pptgraphics processing unit ppt
graphics processing unit ppt
 
CS 354 GPU Architecture
CS 354 GPU ArchitectureCS 354 GPU Architecture
CS 354 GPU Architecture
 
NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL Functionality
 
“Tools for Creating Next-Gen Computer Vision Apps on Snapdragon,” a Presentat...
“Tools for Creating Next-Gen Computer Vision Apps on Snapdragon,” a Presentat...“Tools for Creating Next-Gen Computer Vision Apps on Snapdragon,” a Presentat...
“Tools for Creating Next-Gen Computer Vision Apps on Snapdragon,” a Presentat...
 
GPU - Basic Working
GPU - Basic WorkingGPU - Basic Working
GPU - Basic Working
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU device
 
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
Evolution of the modern graphics architectures with a focus on GPUs | Turing1...
 
Adobe AIR - Mobile Performance – Tips & Tricks
Adobe AIR - Mobile Performance – Tips & TricksAdobe AIR - Mobile Performance – Tips & Tricks
Adobe AIR - Mobile Performance – Tips & Tricks
 
Core image presentation
Core image presentationCore image presentation
Core image presentation
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)
 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and Transparency
 
Alexey Savchenko, Unreal Engine
Alexey Savchenko, Unreal EngineAlexey Savchenko, Unreal Engine
Alexey Savchenko, Unreal Engine
 

More from Obsidian Software (20)

Yang greenstein part_2
Yang greenstein part_2Yang greenstein part_2
Yang greenstein part_2
 
Yang greenstein part_1
Yang greenstein part_1Yang greenstein part_1
Yang greenstein part_1
 
Williamson arm validation metrics
Williamson arm validation metricsWilliamson arm validation metrics
Williamson arm validation metrics
 
Whipp q3 2008_sv
Whipp q3 2008_svWhipp q3 2008_sv
Whipp q3 2008_sv
 
Vishakantaiah validating
Vishakantaiah validatingVishakantaiah validating
Vishakantaiah validating
 
Validation and-design-in-a-small-team-environment
Validation and-design-in-a-small-team-environmentValidation and-design-in-a-small-team-environment
Validation and-design-in-a-small-team-environment
 
Tobin verification isglobal
Tobin verification isglobalTobin verification isglobal
Tobin verification isglobal
 
Tierney bq207
Tierney bq207Tierney bq207
Tierney bq207
 
The validation attitude
The validation attitudeThe validation attitude
The validation attitude
 
Thaker q3 2008
Thaker q3 2008Thaker q3 2008
Thaker q3 2008
 
Thaker q3 2008
Thaker q3 2008Thaker q3 2008
Thaker q3 2008
 
Strickland dvclub
Strickland dvclubStrickland dvclub
Strickland dvclub
 
Stinson post si and verification
Stinson post si and verificationStinson post si and verification
Stinson post si and verification
 
Shultz dallas q108
Shultz dallas q108Shultz dallas q108
Shultz dallas q108
 
Shreeve dv club_ams
Shreeve dv club_amsShreeve dv club_ams
Shreeve dv club_ams
 
Sharam salamian
Sharam salamianSharam salamian
Sharam salamian
 
Schulz sv q2_2009
Schulz sv q2_2009Schulz sv q2_2009
Schulz sv q2_2009
 
Schulz dallas q1_2008
Schulz dallas q1_2008Schulz dallas q1_2008
Schulz dallas q1_2008
 
Salamian dv club_foils_intel_austin
Salamian dv club_foils_intel_austinSalamian dv club_foils_intel_austin
Salamian dv club_foils_intel_austin
 
Sakar jain
Sakar jainSakar jain
Sakar jain
 

3 d to _hpc

  • 1. An Introduction to GPU 3D Games to HPC Krishnaraj Rao Presented at Bangalore DV Club, 03/12/2010
  • 2. Agenda 3D Graphics The Big Picture Quick Overview Programming Model Importance of 3D High Performance Parallel Computing Why GPUs for HPPC? Available APIs GPU Computing architecture Q&A
  • 3. The Big Picture ! Movies Capture Models Scene Rendering Post API Processing Creation Creation
  • 4. The Big Picture - Games Capture Models Scene Rendering Post API Processing !"#$% Drivers Creation HLSL, Creation Cg
  • 5. Models end up in World Space. World space includes everything! The position and orientation of every item are needed to accurately calculate transformations into screen space. Figure labels: Light Source; X, Y, Z axes; View Point or Camera; Screen; World Coordinate Space
  • 6. View Transformation: the world ends up on screen. Figure label: Screen Coordinate Space
  • 7. Simple Interactive 3D Graphics App. A simple example: static scene geometry, moving viewer. Repeat this loop: the CPU takes user input from joystick or mouse; the CPU re-calculates viewer position, view direction, and light positions in 3-D world space; the GPU clears memory and draws the complete scene geometry with the new viewer and light positions. Repeat forever. Loop diagram: Read Joystick Position; Update Viewer and Light Position/Direction; Draw all Scene Objects. Pipeline diagram: Vertex Engine, Setup, Raster, Z Cull, Texture, Fragment Engine, Raster Ops
  • 8. Adding Programmability to the Graphics Pipeline. 3D Application or Game → 3D API Commands → 3D API: OpenGL or Direct3D → CPU - GPU Boundary → GPU Command & Data Stream → GPU Front End → Vertex Index Stream → Primitive Assembly → Assembled Polygons, Lines, and Points → Rasterization & Interpolation → Pixel Location Stream → Raster Operations → Pixel Updates → Framebuffer. Programmable Vertex Processor: pre-transformed vertices in, transformed vertices out. Programmable Fragment Processor: rasterized pre-transformed fragments in, transformed fragments out
  • 9. A History of Innovation. 1995: NV1, 1 million transistors; 1999: GeForce 256, 22 million transistors; 2002: GeForce4, 63 million transistors; 2003: GeForce FX, 130 million transistors; 2004: GeForce 6, 222 million transistors; 2005: GeForce 7, 302 million transistors; 2006-2007: GeForce 8, 754 million transistors; 2008: GeForce GTX 200, 1.4 billion transistors. ...but what do all these extra transistors do? NVIDIA Confidential
  • 10. GPU continues to offload CPU work. 1996 and 2000: Geom Gather, Geom Proc, Triangle Proc, Pixel Proc, Z / Blend, divided between CPU and GPU. 2004 and 2008: Scene Mgmt, Physics and AI, Geom Gather, Geom Proc, Triangle Proc, Pixel Proc, Z / Blend, divided between CPU and GPU
  • 11. Programming Model. API: a set of functions, procedures, or classes that an OS, library, or service provides to support requests made by computer programs. DirectX: a collection of APIs for handling multimedia tasks, especially game programming and video, on Microsoft platforms. OpenGL (Open Graphics Library): a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics.
  • 12. Why is 3D Graphics important? More than just Fun and Games... Image captions: Tokyo, Japan; California Coastline
  • 13. 3D Consumer Applications: Vista, Office, PDFs, Music, Photos, Maps
  • 15. Massive Data Parallelism; Instruction Level Parallelism; Data Fits in Cache; Huge Data Sets
  • 16. GPU Processing Power. CPU, meet your new partner!
  • 17. Beyond Graphics. With floating-point math and textures, graphics processors can be used for more than just graphics. GPGPU = "General Purpose Computing on GPUs". Lots of ongoing research is mapping algorithms and problems onto programmable GPUs: Solving Linear Equations; Black-Scholes Options Pricing; Rigid- and Soft-Body Dynamics. Middleware layers are being developed to accelerate "eye candy" game physics on GPUs (HavokFX)
  • 18. What is GPGPU? General-purpose computation using a GPU in applications other than 3D graphics; the GPU accelerates the critical path of the application. Data-parallel algorithms leverage GPU attributes: large data arrays and streaming throughput; fine-grain SIMD parallelism; floating point (FP) computation. Great for "embarrassingly parallel" algorithms. Applications (see //GPGPU.org): game effects (FX) physics, image processing; physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting
  • 19. Why Computation on the GPU? A quiet buildup of potential. Calculation throughput and memory bandwidth: 10X; equivalent performance at a fraction of the power and cost. GPU in every PC - pervasive presence and massive impact. GPUs have always been parallel "multi-core": natively designed to handle massive threading; every pixel is a thread. Increased precision (fp32), programmability, flexibility. GPUs are a mass-market parallel processor: economies of scale. Peak floating-point performance is much higher than comparable CPUs. ATI x1900XT: $400 (video card), 250 GFLOPS (SP float), 46 GB/s main memory bandwidth. Intel Core 2 Duo E6600: $400 (processor only), 40 GFLOPS (SP float), 8.5 GB/s main memory bandwidth
  • 20. Why Computation on the GPU? Supercomputing Performance: inherently parallel architecture; 1000+ cores, massively parallel processing; 250x the compute performance of a PC. Personal: "One Researcher, One Supercomputer"; supercomputer in a desktop system; plugs into a standard power strip. Accessible: program in C, C++, Fortran for Windows or Linux; available from OEMs and resellers worldwide and priced like a workstation
  • 21. Compute Applications: Computational Fluid Dynamics; Computer Aided Engineering; Digital Content Creation; Electronic Design Automation; Finance; Game Physics; Graphics Libraries; Imaging and Computer Vision; Medical Imaging; Numerics; Bio-Informatics and Life Sciences; Computational Chemistry; Computational Electromagnetics & Electrodynamics; Data Mining, Analytics & Databases; MATLAB Acceleration; Molecular Dynamics; Weather, Atmospheric, Ocean Modeling, and Space Sciences; Oil & Gas; Programming Tools; Ray Tracing; Signal Processing; Video & Audio
  • 22. Heterogeneous Computing: Multi-Core CPU; Parallel-Core GPU
  • 24. APIs for Heterogeneous Computing. CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. Programmers use 'C for CUDA' (C with NVIDIA extensions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU. Both low-level and high-level APIs are provided. OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. Microsoft DirectCompute is an API that supports general-purpose computing on GPUs on Microsoft Windows Vista or Windows 7. DirectCompute is part of the Microsoft DirectX collection of APIs.
  • 26. OpenCL: Platform Model & Program Structure. One host plus one or more compute devices. Each compute device is composed of one or more compute units. Each compute unit is further divided into one or more processing elements.
  • 27. CUDA Parallel Computing Architecture. ISA and hardware compute engine. Includes a C compiler plus support for OpenCL and DX11 Compute. Architected to natively support all computational interfaces (standard languages and APIs)
  • 28. Option 1: OpenCL and C for CUDA. C for CUDA: entry point for developers who prefer high-level C. OpenCL: entry point for developers who want a low-level API. Shared back-end compiler and PTX optimization technology. GPU
  • 29. CUDA Success - Science & Computation. Not 2x or 3x, but speedups are 20x to 150x. 146X: Medical Imaging (U of Utah); 36X: Molecular Dynamics (U of Illinois, Urbana); 18X: Video Transcoding (Elemental Tech); 50X: Matlab Computing (AccelerEyes); 100X: Astrophysics (RIKEN); 149X: Financial simulation (Oxford); 47X: Linear Algebra (Universidad Jaime); 20X: 3D Ultrasound (Techniscan); 130X: Quantum Chemistry (U of Illinois, Urbana); 30X: Gene Sequencing (U of Maryland)
  • 30. Performance versus accessibility chart. Supercomputing Cluster: $100K - $1M. Tesla Personal Supercomputer: < $10K; 250x faster than today's workstations (1x); 100x more affordable; 20x less power consumption
  • 31. Solving the World's Most Complex Challenges: Film; Science; Auto Design; Oil & Gas; Medicine; Broadcast; Space Exploration
  • 32. Grand Computing Challenges: Personalized Medicine; Mathematics for Scientific Discovery; Information Data Mining; Renewable Energy; Machines That Think; Natural Human Machine Interaction; Predict Environmental Changes; Economic Analysis
  • 33. Final Thoughts. GPU and heterogeneous parallel architecture will revolutionize computing. Parallel computing is needed to solve some of the most interesting and important human challenges ahead. Learning parallel programming is imperative for students in computing and the sciences.
  • 34. From Virtua Fighter to Tsubame. 1995 - NV1: 0.8M transistors, 50 MHz, 1M bytes, 0 GFLOPS. 2008 - GT200: 1,200M transistors, 1.3 GHz, 4G bytes, 1 TFLOPS. Another 1000x in 15 years?
  • 37. OpenGL. 1992: OpenGL 1.0; 1996: OpenGL 1.1 (Vertex Arrays, Improved Texturing); 1998: OpenGL 1.2 (3D Textures, BGRA pixel format); 1998: OpenGL 1.2.1 (Multi-Texture); 2001: OpenGL 1.3 (Multi-sample AA, Cube/Compressed Textures); 2002: OpenGL 1.4 (Depth/Shadow mapping, Auto mipmap generation); 2003: OpenGL 1.5 (Vertex Attr from Vid Mem); 2005: OpenGL 2.0 (GLSL, Vertex/Pixel Shaders, MRT, Non P-of-2 Tex); 2006: OpenGL 2.1 (GLSL 1.2, sRGB Textures); 2008: OpenGL 3.0 (GLSL 1.3, 32b FP Textures); 2009: OpenGL 3.1 (March 2009, GLSL 1.4, Perf, CopyBufferAPI); 2009: OpenGL 3.2 (Aug 2009, GLSL 1.5, Geom Shaders)
  • 38. OpenGL ES. Designed for hand-held and embedded devices; goal is a smaller footprint to support OpenGL; PlayStation 3 and cell phone industry adopting ES. OpenGL ES 1.1: strips out anything deemed extra in OpenGL; keeps conventional fixed-function vertex and fragment processing. OpenGL ES 2.0: adds programmable vertex and fragment shaders; shaders specified in binary format; drops support for fixed-function vertex and fragment processing
  • 39. OpenGL ES - Cont. OpenGL ES 1.0: Symbian OS, Android platform; OpenGL ES 1.0+: PlayStation 3; OpenGL ES 1.1: iPhone SDK, BlackBerry (some models); OpenGL ES 2.0: iPhone 3GS, iPod touch
  • 40. DirectX. GDI: legacy Windows graphics API, ~1985; DirectX 1.0 - 1995/6 (no 3D support; DirectDraw, DirectSound, DirectInput); DirectX 3.0 - 1996 (rasterization-only 3D support, awkward programming model, not successful); DirectX 5.0 - 1997 (Draw Primitives, DirectX vs OpenGL war); DirectX 6.0 - 1998 (Multitexture, OGL/Glide features, Texture Compression); DirectX 7.0 - 1999 (Geometry HW acceleration and Blending, Cube mapping); DirectX 8.0 - 2000/1 (Programmable VS/PS Shaders, XBOX); DirectX 9.0 - 2002-2003 (More programmability, Branching, FP pixel prog.); DirectX 9.0c - 2004 (ShaderModel 3.0); DirectX 10.0 - 2006 (SM4.0, WinVista, Geometry Shaders, Streaming Output); DirectX 10.1 - 2008 (SM4.1, Better Image Quality); DirectX 11.0 - 2009 (SM5.0, DirectCompute, Tessellation, WinVista SP2, Win7)
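
Slides 18 and 19 describe GPGPU as fine-grain data parallelism in which every pixel (or data element) is a thread. The sketch below emulates that idea in plain Python, assuming a simple a*x + y computation as the per-element work. The names `saxpy_kernel` and `launch` are hypothetical and chosen for illustration; they are not part of CUDA, OpenCL, or DirectCompute, and on a real GPU the per-thread invocations would run concurrently rather than in a loop.

```python
# Illustrative only: emulates the "one thread per data element" model in plain Python.
# saxpy_kernel and launch are hypothetical names, not part of any GPU API.

def saxpy_kernel(thread_id, a, x, y, out):
    """Work done by a single thread: one element of out = a * x + y."""
    out[thread_id] = a * x[thread_id] + y[thread_id]


def launch(kernel, n_threads, *args):
    """Run the kernel once per thread index.

    A GPU would execute these invocations in parallel; this sketch
    simply runs them one after another on the CPU.
    """
    for thread_id in range(n_threads):
        kernel(thread_id, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

# One "thread" per array element, as in the every-pixel-is-a-thread model.
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because each invocation touches only its own index, no thread depends on another's result, which is what makes this kind of workload "embarrassingly parallel".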