GPGPU ALGORITHMS IN GAMES
How Heterogeneous Systems Architecture can be
leveraged to optimize algorithms in video games
Matthijs De Smedt
Nixxes Software B.V.
Lead Graphics Programmer
CONTENTS


 A short introduction
 Current usage of GPGPU in games
 Heterogeneous Systems Architecture
 Examples made possible by HSA




 | HSA Algorithms in Games | June 13th, 2012
INTRODUCTION
VIDEOGAMES


 Games are near real-time simulations
 Response time is key
 Most systems run in sync with the output frequency
   – Rendering 60 frames per second
   – Allows for about 16.7 ms of processing time per frame
 Framerate is limited either by:
   – GPU
   – CPU
   – Display (VSync)

[Diagram: per-frame pipeline. CPU: Input, Simulate, Render. GPU: Render.]
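
The frame budget arithmetic on this slide can be sketched in a few lines of Python. This is an illustrative model and not part of the talk: the function names, the assumption that CPU and GPU work overlap fully, and the rule that VSync rounds the frame time up to a whole number of refresh intervals are all simplifying assumptions.

```python
import math


def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / target_fps


def frame_time_ms(cpu_ms: float, gpu_ms: float,
                  refresh_hz: float = 60.0, vsync: bool = True) -> float:
    """Estimated time between displayed frames.

    Assumption: CPU and GPU work on different frames in parallel, so the
    slower of the two sets the pace. With VSync on, a frame is only shown
    on a refresh boundary, so the time is rounded up to whole intervals.
    """
    slowest = max(cpu_ms, gpu_ms)
    if not vsync:
        return slowest
    interval = 1000.0 / refresh_hz
    # Small epsilon so a frame that takes exactly one interval is not
    # rounded up to two because of floating-point noise.
    intervals = max(1, math.ceil(slowest / interval - 1e-9))
    return intervals * interval


def limiting_factor(cpu_ms: float, gpu_ms: float,
                    refresh_hz: float = 60.0, vsync: bool = True) -> str:
    """Name which of the three limits from the slide applies."""
    interval = 1000.0 / refresh_hz
    if vsync and max(cpu_ms, gpu_ms) <= interval:
        return "Display (VSync)"
    return "GPU" if gpu_ms >= cpu_ms else "CPU"
```

Under these assumptions, a game that needs 10 ms of CPU time and 20 ms of GPU time per frame is GPU-bound and, with VSync at 60 Hz, presents a frame every second refresh.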



HARDWARE


 Typical hardware target for PC games:
  – One multicore CPU
  – One GPU
 Multiple GPUs: CrossFire
  – Transparent to the application
  – Driver alternates frames between GPUs
 GPUs are becoming more general purpose:
  – General Purpose GPU algorithms (GPGPU)




[Image: CrossFire]




GPGPU IN GAMES
INTRODUCTION TO GPGPU


 Rendering is a sequence of parallel algorithms
 GPUs are great at parallel computation
 Evolution of hardware and software to general purpose
 First GPGPU was accomplished with programmable rendering
  – DirectX
  – OpenGL
 Second generation using dedicated GPGPU APIs:
  – CUDA
  – OpenCL
  – DirectCompute
 Third generation of GPGPU on the way:
  – Heterogeneous Systems Architecture



GPGPU IN GAMES


 Some GPGPU algorithms are being used in
  games right now. For example:
   – Physics
        Particles
        Fluid simulation
        Destruction
   – Specialized graphics algorithms
        Post-processing
 All these algorithms drive visual effects


[Image: GPU particle system by Fairlight]




CURRENT PHYSICS EXAMPLE


 GPGPU particle simulation using DirectCompute
 Great for simulating thousands of visible particles
 Results of simulation are never copied back to CPU
   – Cannot interfere with gameplay
   – Not synced in networked games
 Example of what this rules out: smoke particles that affect game AI

[Diagram: CPU: Call GPU. GPU: Simulate particles, Render particles.]
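
A minimal sketch of the kind of per-particle update such a simulation runs each frame. It is written in plain Python for readability rather than as a DirectCompute shader, and the `Particle` fields, the gravity constant and the integration scheme are illustrative assumptions, not details from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:
    x: float
    y: float
    vx: float
    vy: float
    life: float  # seconds remaining before the particle is removed


def step_particle(p: Particle, dt: float, gravity: float = -9.81) -> Particle:
    """Advance one particle by dt seconds (semi-implicit Euler)."""
    vy = p.vy + gravity * dt
    return Particle(p.x + p.vx * dt, p.y + vy * dt, p.vx, vy, p.life - dt)


def simulate(particles: List[Particle], dt: float) -> List[Particle]:
    """Step every particle and drop the ones whose lifetime has expired.

    Each particle depends only on its own state, which is why this loop
    maps naturally onto one GPU thread per particle.
    """
    stepped = (step_particle(p, dt) for p in particles)
    return [p for p in stepped if p.life > 0.0]
```

Because the output list feeds only the renderer, nothing here has to travel back to the CPU, which matches the limitation described on the slide.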




GPGPU LIMITATIONS


 Why isn’t GPGPU used more for non-graphics?
 Latency
  – DirectX has many layers and buffers
  – DirectX commands are buffered up to multiple frames
  – Actual execution on the GPU is delayed
 Copy overhead
  – GPU cannot directly access application memory
  – Must copy all data from and to the application
 Functionality
  – Constrained programming models




HETEROGENEOUS SYSTEMS ARCHITECTURE
HETEROGENEOUS SYSTEMS ARCHITECTURE

 New hardware and software


Hardware
 New features on discrete GPUs
 Accelerated Processing Unit
  – Next generation processor
  – Multiple CPU and GPU cores on the same die
  – Shared memory access
  – Soon to be as widespread as multicore CPUs

Software
 "Drivers"
  – HSA provides a new, thin Compute API
  – Very low latency
  – Unified Address Space
  – Exposes more hardware capabilities
 HSA Intermediate Language
  – Virtual ISA
  – Introduces CPU programming features to the GPU




USING THE APU


 Distinction between two hardware configurations
 APU without discrete GPU
  – Found in many laptops, soon in many desktops
  – Use the on-die GPU for rendering
 APU with discrete GPU:
  – Hard-core gamers will still use discrete GPUs
  – Asymmetrical CrossFire
  – Or: Dedicate the on-die GPU to Compute algorithms
        Could result in massive speedup of algorithms
        Using SIMD co-processors to offload the CPU is familiar to PS3 developers




COPY OVERHEAD


 Current Compute APIs require the application to explicitly copy all input and output memory
  – Copying can easily take longer than processing on the CPU!
  – Only small datasets or very expensive computations benefit from GPGPU
 HSA introduces a Unified Address Space for CPU and GPU memory
  – CPU pointers on the GPU
  – Virtual memory on the GPU
        Paging over PCI-Express (discrete) or shared memory controller (APU)
  – Fully coherent
  – Will make GPGPU an option for many more algorithms
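
The trade-off on this slide can be put into a small cost model. Everything below is a hypothetical sketch: the function names, the flat bandwidth figure and the idealisation that a Unified Address Space removes the copy cost entirely are assumptions made for illustration, not measurements or HSA API calls.

```python
def copy_time_ms(num_bytes: int, bus_gb_per_s: float) -> float:
    """Time to move num_bytes over a bus with the given bandwidth (GB/s)."""
    return num_bytes / (bus_gb_per_s * 1e9) * 1000.0


def gpu_total_ms(bytes_in: int, bytes_out: int, gpu_compute_ms: float,
                 bus_gb_per_s: float, unified_memory: bool = False) -> float:
    """Total cost of running a job on the GPU, including explicit copies.

    Assumption: with unified memory the GPU reads and writes application
    memory directly, so the explicit copy term is dropped. Real hardware
    still pays for memory traffic; this only models the extra copies.
    """
    if unified_memory:
        return gpu_compute_ms
    return copy_time_ms(bytes_in + bytes_out, bus_gb_per_s) + gpu_compute_ms


def worth_offloading(cpu_ms: float, bytes_in: int, bytes_out: int,
                     gpu_compute_ms: float, bus_gb_per_s: float,
                     unified_memory: bool = False) -> bool:
    """True when the GPU path, copies included, beats the CPU."""
    total = gpu_total_ms(bytes_in, bytes_out, gpu_compute_ms,
                         bus_gb_per_s, unified_memory)
    return total < cpu_ms
```

With made-up numbers (64 MB in, 64 MB out, an 8 GB/s bus, 1 ms of GPU compute against 10 ms on the CPU), the copies alone cost about 16 ms, so offloading only pays off once they are no longer needed.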




LATENCY


 DirectX commands are buffered
 When the GPU is fully loaded this buffer is saturated
 The delay between scheduling and executing a GPGPU program on a busy GPU can be multiple frames
   – Results will be several frames behind
   – Game simulation needs all objects to be in sync
 GPGPU is currently impractical to use for anything but visual effects




LATENCY


 HSA’s new Compute API will reduce latency
 How to deal with a saturated GPU?
 A second GPU
  – Dedicate the APU to Compute
  – Virtually no latency
 HSA feature: Graphics pre-emption
  – Context switching on the GPU
        Interrupt a graphics task (typically a large command list)
        Execute Compute algorithm
        Switch back to graphics
  – Can be used both on discrete GPUs and on the APU
 Choose the solution best suited to your needs
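
The effect of graphics pre-emption on latency can be illustrated with a toy timeline for a single GPU. This is a sketch under stated assumptions: one graphics task starting at time 0, one Compute job, and a fixed context-switch cost. `run_timeline` and its parameters are hypothetical names, not part of any HSA interface.

```python
from typing import Tuple


def run_timeline(graphics_ms: float, compute_ms: float, submit_ms: float,
                 preempt: bool, switch_ms: float = 0.1) -> Tuple[float, float]:
    """Return (compute_done_ms, graphics_done_ms) for one GPU.

    The graphics task starts at time 0 and needs graphics_ms of GPU time.
    The Compute job is submitted at submit_ms and needs compute_ms.
    """
    if submit_ms >= graphics_ms:
        # The GPU is already idle, so the job runs immediately.
        return submit_ms + compute_ms, graphics_ms
    if not preempt:
        # Without pre-emption the job waits behind the graphics task.
        return graphics_ms + compute_ms, graphics_ms
    # With pre-emption: switch out of graphics, run Compute, switch back.
    compute_done = submit_ms + switch_ms + compute_ms
    graphics_done = graphics_ms + compute_ms + 2 * switch_ms
    return compute_done, graphics_done
```

In this model pre-emption trades a slightly later graphics finish for a much earlier Compute result, which is the trade a gameplay-critical job wants.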




APU USAGE EXAMPLE


[Diagram: CPU and GPU timelines across a frame, marking when a Compute job is scheduled and when it executes, for DirectCompute and for HSA]

PROGRAMMING MODEL


 HSA Intermediate Language: HSAIL
 Designed for parallel algorithms
 JIT compiles your algorithm to CPU or GPU hardware
  – Also makes multi-core SIMD programming easy!
 High level language features
  – Object-oriented programming
  – Virtual functions
  – Exceptions
 Debugging
 SysCall support
  – I/O




EXAMPLE ALGORITHMS
PHYSICS


 Current GPGPU physics solutions only output to
  the renderer
 With HSA you can simulate physics on the GPU
  and get the results back in the same frame
 Use hardware acceleration to compute physics for
  gameplay objects
 Reduced CPU load
 More objects, higher fidelity




FRUSTUM CULLING


 Videogames tend to be GPU-bound
 Avoid rendering what cannot be seen
 Cull objects outside the camera viewport
  – Test the bounding box of every object against
    the camera frustum
  – Currently done on the CPU
  – Lots of vector math
  – Can be computed completely in parallel!
 CPU needs the results immediately
  – HSA will allow low-latency execution
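
A minimal sketch of the bounding-box test described above, in plain Python. The plane representation (a normal plus offset, with the inside on the non-negative side) and the function names are assumptions chosen for illustration. The test is conservative: a box near a frustum corner may be kept even though it is not visible.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# (nx, ny, nz, d): a point p is inside the plane when n . p + d >= 0.
Plane = Tuple[float, float, float, float]


def box_outside_plane(box_min: Vec3, box_max: Vec3, plane: Plane) -> bool:
    """True if the axis-aligned box lies entirely on the outside of the plane."""
    nx, ny, nz, d = plane
    # Pick the box corner furthest along the plane normal. If even that
    # corner is outside, the whole box is outside.
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    return nx * px + ny * py + nz * pz + d < 0


def visible_flags(boxes: Sequence[Tuple[Vec3, Vec3]],
                  planes: Sequence[Plane]) -> List[bool]:
    """One visibility flag per box. Every box is tested independently,
    so each iteration could run as its own GPU thread."""
    return [not any(box_outside_plane(lo, hi, pl) for pl in planes)
            for lo, hi in boxes]
```

A real frustum has six planes derived from the camera's view-projection matrix; any list of inward-facing planes works with this sketch.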




OCCLUSION CULLING


 Objects may be hidden behind others: Occlusion
 Final per-pixel occlusion is only known after
  rendering the scene
 Approximate occlusion by rendering low-detail
  geometry
   – This kind of occlusion culling is currently being
     done on CPU or on SPUs
   – Rendering is better suited to GPUs
 HSA solution:
   – Software rasterization in Compute on the GPU
   – HSA does not yet expose graphics pipeline
   – Still much faster than a multicore CPU

[Image: Software occlusion culling in Battlefield 3]
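
The idea of approximating occlusion with low-detail geometry can be sketched as a tiny software depth buffer. To keep the example short, occluders and objects are reduced to screen-space rectangles at a single depth, which is a simplification of real triangle rasterization; all names are hypothetical, and callers are assumed to pass rectangles that lie inside the buffer.

```python
from typing import List

DepthBuffer = List[List[float]]


def make_depth_buffer(width: int, height: int, far: float = 1.0) -> DepthBuffer:
    """Low-resolution depth buffer, cleared to the far plane."""
    return [[far] * width for _ in range(height)]


def draw_occluder(depth: DepthBuffer, x0: int, y0: int,
                  x1: int, y1: int, z: float) -> None:
    """Rasterize an occluder covering pixels [x0, x1) x [y0, y1) at depth z."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            depth[y][x] = min(depth[y][x], z)


def is_occluded(depth: DepthBuffer, x0: int, y0: int,
                x1: int, y1: int, nearest_z: float) -> bool:
    """True if every pixel of the object's rectangle is covered by
    something nearer than the object's nearest depth."""
    if x0 >= x1 or y0 >= y1:
        return False  # empty rectangle: be conservative and keep the object
    return all(depth[y][x] < nearest_z
               for y in range(y0, y1)
               for x in range(x0, x1))
```

An object is only culled when it is hidden at every pixel it could touch, so the test never removes something that is actually visible at this resolution.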




SORTING


 Typically several long lists per frame need sorting
 Sorting on the GPU using a parallel sort algorithm
   – Ken Batcher: Bitonic or Odd-even mergesort
 Copy overhead currently negates the performance
  advantage of using a GPU sorting algorithm
 HSA solution:
   – Unified Address Space
   – GPU can sort in-place in system memory
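
For reference, a bitonic sort can be written as a fixed network of compare-and-swap passes. The sketch below runs sequentially in Python and sorts a copy rather than sorting in place; it assumes the input length is a power of two. Within each inner pass every index is independent, which is the property a GPU implementation exploits.

```python
from typing import List, Sequence


def bitonic_sort(values: Sequence[float]) -> List[float]:
    """Return the values sorted ascending using a bitonic sorting network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:         # distance between compared elements
            # Every i in this loop is independent: one GPU thread per i.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if ascending and a[i] > a[partner]:
                        a[i], a[partner] = a[partner], a[i]
                    elif not ascending and a[i] < a[partner]:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Lists whose length is not a power of two are commonly padded with a sentinel value before sorting; that step is left out here.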




ASSET DECOMPRESSION


 Game assets are stored compressed on disk
 Decompression is expensive
 The usage of some compression algorithms is
  prevented by CPU speed
 Games are moving away from loading screens
 An APU with Unified Address Space
  – Can be used to decompress new assets
    without taxing the CPU or discrete GPU
  – Perhaps even use HSAIL I/O to read from disk
  – A better streaming experience for gamers




PATHFINDING


 Some strategy games simulate thousands of units
 Pathfinding over complex terrain with thousands of
  moving units is very expensive
 Clever approximate solutions are often used
  – Supreme Commander 2 “Flow field”
 GPGPU pathfinding with HSA
  – Use one GPU thread per unit to do a deep
    search for an optimal path
  – With HSA such an algorithm can page all
    requisite data from system memory and write
    back found paths
  – APU could be fully saturated with pathfinding
    without impacting framerate
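
A sketch of the "one search per unit" idea, using a breadth-first search on a small grid as a stand-in for the deeper search the slide has in mind. The grid encoding (0 is walkable, 1 is blocked), the 4-connected movement and the function names are assumptions made for this example. Each unit's search reads the shared map and writes only its own path, so the outer loop is the part that would become one GPU thread per unit.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple

Cell = Tuple[int, int]  # (x, y)


def find_path(grid: Sequence[Sequence[int]],
              start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path from start to goal, or None if unreachable."""
    height, width = len(grid), len(grid[0])
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            # Walk the parent links back to the start.
            path: List[Cell] = []
            node: Optional[Cell] = cur
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        x, y = cur
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and grid[ny][nx] == 0 and (nx, ny) not in came_from):
                came_from[(nx, ny)] = cur
                queue.append((nx, ny))
    return None


def find_paths(grid: Sequence[Sequence[int]],
               units: Sequence[Tuple[Cell, Cell]]) -> List[Optional[List[Cell]]]:
    """Run one independent search per (start, goal) pair."""
    return [find_path(grid, start, goal) for start, goal in units]
```

The map is read-only during the searches, which fits the slide's point that the data can be paged from system memory and only the found paths written back.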




CONCLUSION


 Many algorithms in games are suitable for offloading to the GPU
 Heterogeneous Systems Architecture solves two major obstacles
  – Latency
  – Memory access
 HSAIL allows for entirely new kinds of GPGPU programs
 APUs can be used to offload the CPU
 HSA will finally make GPUs available to developers as full-featured co-processors




THANK YOU


 Any questions?




Disclaimer & Attribution
        The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions
        and typographical errors.

        The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited
        to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product
        differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no
        obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to
        make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

        NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO
        RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS
        INFORMATION.

        ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY
        DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL
        OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF
        EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

        AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in
        this presentation are for informational purposes only and may be trademarks of their respective owners.

        The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and
        opinions presented in this presentation may not represent AMD’s positions, strategies or opinions. Unless explicitly stated, AMD is
        not responsible for the content herein and no endorsements are implied.




| HSA Algorithms in Games | June 13th, 2012

More Related Content

What's hot

Graphic Processing Unit
Graphic Processing UnitGraphic Processing Unit
Graphic Processing Unit
Kamran Ashraf
 
Gpu
GpuGpu
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)
Amal R
 
Example Application of GPU
Example Application of GPUExample Application of GPU
Example Application of GPU
Chakkrit (Kla) Tantithamthavorn
 
Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)
Saksham Tanwar
 
19564926 graphics-processing-unit
19564926 graphics-processing-unit19564926 graphics-processing-unit
19564926 graphics-processing-unit
Dayakar Siddula
 
Graphics processing unit (gpu)
Graphics processing unit (gpu)Graphics processing unit (gpu)
Graphics processing unit (gpu)
junliwanag
 
Graphics processing unit ppt
Graphics processing unit pptGraphics processing unit ppt
Graphics processing unit ppt
Sandeep Singh
 
Graphics Processing Unit by Saurabh
Graphics Processing Unit by SaurabhGraphics Processing Unit by Saurabh
Graphics Processing Unit by Saurabh
Saurabh Kumar
 
Graphics Processing Unit - GPU
Graphics Processing Unit - GPUGraphics Processing Unit - GPU
Graphics Processing Unit - GPU
Chetan Gole
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architecture
Dhaval Kaneria
 
GRAPHICS PROCESSING UNIT (GPU)
GRAPHICS PROCESSING UNIT (GPU)GRAPHICS PROCESSING UNIT (GPU)
GRAPHICS PROCESSING UNIT (GPU)
self employed
 
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
AMD Developer Central
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
AMD Developer Central
 
Gpu presentation
Gpu presentationGpu presentation
Gpu presentation
Josiah Lund
 
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian BallantyneWT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
AMD Developer Central
 
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
AMD Developer Central
 
GS-4151, Developing Thief with new AMD technology, by Jurjen Katsman
GS-4151, Developing Thief with new AMD technology, by Jurjen KatsmanGS-4151, Developing Thief with new AMD technology, by Jurjen Katsman
GS-4151, Developing Thief with new AMD technology, by Jurjen Katsman
AMD Developer Central
 
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
AMD Developer Central
 

What's hot (19)

Graphic Processing Unit
Graphic Processing UnitGraphic Processing Unit
Graphic Processing Unit
 
Gpu
GpuGpu
Gpu
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)
 
Example Application of GPU
Example Application of GPUExample Application of GPU
Example Application of GPU
 
Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)
 
19564926 graphics-processing-unit
19564926 graphics-processing-unit19564926 graphics-processing-unit
19564926 graphics-processing-unit
 
Graphics processing unit (gpu)
Graphics processing unit (gpu)Graphics processing unit (gpu)
Graphics processing unit (gpu)
 
Graphics processing unit ppt
Graphics processing unit pptGraphics processing unit ppt
Graphics processing unit ppt
 
Graphics Processing Unit by Saurabh
Graphics Processing Unit by SaurabhGraphics Processing Unit by Saurabh
Graphics Processing Unit by Saurabh
 
Graphics Processing Unit - GPU
Graphics Processing Unit - GPUGraphics Processing Unit - GPU
Graphics Processing Unit - GPU
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architecture
 
GRAPHICS PROCESSING UNIT (GPU)
GRAPHICS PROCESSING UNIT (GPU)GRAPHICS PROCESSING UNIT (GPU)
GRAPHICS PROCESSING UNIT (GPU)
 
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
PL-4044, OpenACC on AMD APUs and GPUs with the PGI Accelerator Compilers, by ...
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
 
Gpu presentation
Gpu presentationGpu presentation
Gpu presentation
 
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian BallantyneWT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
 
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
 
GS-4151, Developing Thief with new AMD technology, by Jurjen Katsman
GS-4151, Developing Thief with new AMD technology, by Jurjen KatsmanGS-4151, Developing Thief with new AMD technology, by Jurjen Katsman
GS-4151, Developing Thief with new AMD technology, by Jurjen Katsman
 
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
 

Viewers also liked

G325 overview Oakmead 2014
G325 overview  Oakmead 2014 G325 overview  Oakmead 2014
G325 overview Oakmead 2014
Julian McDougall
 
Social.media 201 - alumni senate 2010
Social.media 201  - alumni senate 2010Social.media 201  - alumni senate 2010
Social.media 201 - alumni senate 2010
Josh Stowe
 
我的班級
我的班級我的班級
我的班級k87414
 
Dennis "Whitey" Lueck's Vegetable Garden
Dennis "Whitey" Lueck's Vegetable GardenDennis "Whitey" Lueck's Vegetable Garden
Dennis "Whitey" Lueck's Vegetable Garden
Jgeglia
 
Social media 101 - alumni senate 2010
Social media 101  - alumni senate 2010Social media 101  - alumni senate 2010
Social media 101 - alumni senate 2010
Josh Stowe
 
D-Grid Infrastructure
D-Grid InfrastructureD-Grid Infrastructure
D-Grid Infrastructure
Stefan Freitag
 
Who we are
Who we areWho we are
Who we are
Temple FWB
 
Inhauteriak power point
Inhauteriak power pointInhauteriak power point
Inhauteriak power pointaltzaeskola
 
Computer People Summary
Computer People SummaryComputer People Summary
Computer People Summary
Martinjones123
 
Ccna1v3 Mod02
Ccna1v3 Mod02Ccna1v3 Mod02
Ccna1v3 Mod02
Arun Ajitha
 
Ibm web sphere datapower b2b appliance xb60 revealed
Ibm web sphere datapower b2b appliance xb60 revealedIbm web sphere datapower b2b appliance xb60 revealed
Ibm web sphere datapower b2b appliance xb60 revealed
netmotshop
 
Активный отдых с RussiaDiscovery
Активный отдых с RussiaDiscoveryАктивный отдых с RussiaDiscovery
Активный отдых с RussiaDiscovery
RussiaDiscovery
 
100624 tube 7 golven van opinie in beeld (aart paardekooper)
100624 tube 7   golven van opinie in beeld (aart paardekooper)100624 tube 7   golven van opinie in beeld (aart paardekooper)
100624 tube 7 golven van opinie in beeld (aart paardekooper)KennisLAB
 
Surf's Up! KennisLAB publicatie
Surf's Up! KennisLAB publicatieSurf's Up! KennisLAB publicatie
Surf's Up! KennisLAB publicatieKennisLAB
 
Capturing and Sharing Your Story
Capturing and Sharing Your StoryCapturing and Sharing Your Story
Capturing and Sharing Your Story
Healthy City
 
Career development meetingslideshare
Career development meetingslideshareCareer development meetingslideshare
Career development meetingslideshare
V
 
How to Use HealthyCity.org for Uploading Your Own Data
How to Use HealthyCity.org for Uploading Your Own Data How to Use HealthyCity.org for Uploading Your Own Data
How to Use HealthyCity.org for Uploading Your Own Data
Healthy City
 

Viewers also liked (20)

G325 overview Oakmead 2014
G325 overview  Oakmead 2014 G325 overview  Oakmead 2014
G325 overview Oakmead 2014
 
Social.media 201 - alumni senate 2010
Social.media 201  - alumni senate 2010Social.media 201  - alumni senate 2010
Social.media 201 - alumni senate 2010
 
我的班級
我的班級我的班級
我的班級
 
Dennis "Whitey" Lueck's Vegetable Garden
Dennis "Whitey" Lueck's Vegetable GardenDennis "Whitey" Lueck's Vegetable Garden
Dennis "Whitey" Lueck's Vegetable Garden
 
Social media 101 - alumni senate 2010
Social media 101  - alumni senate 2010Social media 101  - alumni senate 2010
Social media 101 - alumni senate 2010
 
D-Grid Infrastructure
D-Grid InfrastructureD-Grid Infrastructure
D-Grid Infrastructure
 
Who we are
Who we areWho we are
Who we are
 
Struktur kurikulum pbsi 2012
Struktur kurikulum pbsi 2012Struktur kurikulum pbsi 2012
Struktur kurikulum pbsi 2012
 
Inhauteriak power point
Inhauteriak power pointInhauteriak power point
Inhauteriak power point
 
Computer People Summary
Computer People SummaryComputer People Summary
Computer People Summary
 
Ccna1v3 Mod02
Ccna1v3 Mod02Ccna1v3 Mod02
Ccna1v3 Mod02
 
Ibm web sphere datapower b2b appliance xb60 revealed
Ibm web sphere datapower b2b appliance xb60 revealedIbm web sphere datapower b2b appliance xb60 revealed
Ibm web sphere datapower b2b appliance xb60 revealed
 
Активный отдых с RussiaDiscovery
Активный отдых с RussiaDiscoveryАктивный отдых с RussiaDiscovery
Активный отдых с RussiaDiscovery
 
Historiaurrea
HistoriaurreaHistoriaurrea
Historiaurrea
 
Historiaurrea
HistoriaurreaHistoriaurrea
Historiaurrea
 
100624 tube 7 golven van opinie in beeld (aart paardekooper)
100624 tube 7   golven van opinie in beeld (aart paardekooper)100624 tube 7   golven van opinie in beeld (aart paardekooper)
100624 tube 7 golven van opinie in beeld (aart paardekooper)
 
Surf's Up! KennisLAB publicatie
Surf's Up! KennisLAB publicatieSurf's Up! KennisLAB publicatie
Surf's Up! KennisLAB publicatie
 
Capturing and Sharing Your Story
Capturing and Sharing Your StoryCapturing and Sharing Your Story
Capturing and Sharing Your Story
 
Career development meetingslideshare
Career development meetingslideshareCareer development meetingslideshare
Career development meetingslideshare
 
How to Use HealthyCity.org for Uploading Your Own Data
How to Use HealthyCity.org for Uploading Your Own Data How to Use HealthyCity.org for Uploading Your Own Data
How to Use HealthyCity.org for Uploading Your Own Data
 

Similar to GPGPU algorithms in games

AMD 2012: HSA in Gaming
AMD 2012: HSA in GamingAMD 2012: HSA in Gaming
AMD 2012: HSA in Gaming
naroon2
 
HSA Overview
HSA Overview HSA Overview
HSA Overview
HSA Foundation
 
GPU Computing
GPU ComputingGPU Computing
GPU Computing
Khan Mostafa
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
Kelum Senanayake
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
Savith Satheesh
 
Apu fc & s project
Apu fc & s projectApu fc & s project
Apu fc & s project
Neelesh Vaish
 
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONSA SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
cseij
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
mohamedragabslideshare
 
Graphics Processing Unit: An Introduction
Graphics Processing Unit: An IntroductionGraphics Processing Unit: An Introduction
Graphics Processing Unit: An Introduction
ijtsrd
 
Gpu
GpuGpu
Parallel Futures of a Game Engine (v2.0)
Parallel Futures of a Game Engine (v2.0)Parallel Futures of a Game Engine (v2.0)
Parallel Futures of a Game Engine (v2.0)
Johan Andersson
 
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
AMD Developer Central
 
Final lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbFinal lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tb
r Skip
 
Making GPU resets less painful on Linux
Making GPU resets less painful on LinuxMaking GPU resets less painful on Linux
Making GPU resets less painful on Linux
Igalia
 
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
AMD Developer Central
 
Image Processing Application on Graphics processors
Image Processing Application on Graphics processorsImage Processing Application on Graphics processors
Image Processing Application on Graphics processors
CSCJournals
 
GPU - An Introduction
GPU - An IntroductionGPU - An Introduction
GPU - An Introduction
Dhan V Sagar
 
Volume 2-issue-6-2040-2045
Volume 2-issue-6-2040-2045Volume 2-issue-6-2040-2045
Volume 2-issue-6-2040-2045
Editor IJARCET
 
Volume 2-issue-6-2040-2045
Volume 2-issue-6-2040-2045Volume 2-issue-6-2040-2045
Volume 2-issue-6-2040-2045
Editor IJARCET
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
Kohei KaiGai
 

Similar to GPGPU algorithms in games (20)

AMD 2012: HSA in Gaming
AMD 2012: HSA in GamingAMD 2012: HSA in Gaming
AMD 2012: HSA in Gaming
 
HSA Overview
HSA Overview HSA Overview
HSA Overview
 
GPU Computing
GPU ComputingGPU Computing
GPU Computing
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 
Apu fc & s project
Apu fc & s projectApu fc & s project
Apu fc & s project
 
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONSA SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
 
Graphics Processing Unit: An Introduction
Graphics Processing Unit: An IntroductionGraphics Processing Unit: An Introduction
Graphics Processing Unit: An Introduction
 
Gpu
GpuGpu
Gpu
 
Parallel Futures of a Game Engine (v2.0)
Parallel Futures of a Game Engine (v2.0)Parallel Futures of a Game Engine (v2.0)
Parallel Futures of a Game Engine (v2.0)
 
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa...
 

GPGPU algorithms in games

  • 1. GPGPU ALGORITHMS IN GAMES
      How Heterogeneous Systems Architecture can be leveraged to optimize algorithms in video games
      Matthijs De Smedt, Lead Graphics Programmer, Nixxes Software B.V.
  • 2. CONTENTS
      • A short introduction
      • Current usage of GPGPU in games
      • Heterogeneous Systems Architecture
      • Examples made possible by HSA
      | HSA Algorithms in Games | June 13th, 2012
  • 4. VIDEOGAMES
      • Games are near real-time simulations
      • Response time is key
      • Most systems run in sync with the output frequency
          – Rendering 60 frames per second
          – Allows for about 16.7 ms of processing time per frame
      • Framerate is limited by one of:
          – GPU
          – CPU
          – Display (VSync)
      [Diagram: CPU: Input, Simulate, Render; GPU: Render]
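The frame-budget arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the model assumes CPU and GPU work overlap fully so that the slowest of CPU, GPU and display sets the pace. Real VSync snaps to whole refresh intervals, which this sketch ignores.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Milliseconds available per frame at a given output frequency."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def achieved_fps(cpu_ms: float, gpu_ms: float, display_hz: float) -> float:
    """Framerate when the slowest stage (CPU, GPU or display) sets the pace.

    Assumption: CPU and GPU run in parallel, and VSync acts as a simple cap.
    """
    frame_ms = max(cpu_ms, gpu_ms, frame_budget_ms(display_hz))
    return 1000.0 / frame_ms


if __name__ == "__main__":
    print(round(frame_budget_ms(60), 2))   # budget at 60 frames per second
    print(achieved_fps(10.0, 25.0, 60.0))  # GPU-bound case
```

In the GPU-bound case the CPU sits idle for part of every frame, which is the situation the later culling slides try to improve.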
  • 5. HARDWARE
      • Typical hardware target for PC games:
          – One multicore CPU
          – One GPU
      • Multiple GPUs: CrossFire
          – Transparent to the application
          – Driver alternates frames between GPUs
      • GPUs are becoming more general purpose:
          – General Purpose GPU algorithms (GPGPU)
      [Image: CrossFire]
  • 7. INTRODUCTION TO GPGPU
      • Rendering is a sequence of parallel algorithms
      • GPUs are great at parallel computation
      • Evolution of hardware and software to general purpose
      • First GPGPU was accomplished with programmable rendering
          – DirectX
          – OpenGL
      • Second generation using dedicated GPGPU APIs:
          – CUDA
          – OpenCL
          – DirectCompute
      • Third generation of GPGPU on the way:
          – Heterogeneous Systems Architecture
  • 8. GPGPU IN GAMES
      • Some GPGPU algorithms are being used in games right now. For example:
          – Physics
              • Particles
              • Fluid simulation
              • Destruction
          – Specialized graphics algorithms
              • Post-processing
      • All these algorithms drive visual effects
      [Image: GPU particle system by Fairlight]
  • 9. CURRENT PHYSICS EXAMPLE
      • GPGPU particle simulation using DirectCompute
      • Great for simulating thousands of visible particles
      • Results of simulation are never copied back to CPU
          – Cannot interfere with gameplay
          – Not synced in networked games
      • Example of what this rules out: smoke particles that affect game AI
      [Diagram: CPU: Call GPU; GPU: Simulate particles, Render particles]
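A minimal sketch of why particle simulation maps so well to the GPU, written in plain Python rather than DirectCompute. The `Particle` type, the constant gravity and the semi-implicit Euler step are assumptions made for illustration and are not taken from the talk.

```python
from dataclasses import dataclass

GRAVITY = -9.81  # assumed constant acceleration on the y axis, in m/s^2


@dataclass
class Particle:
    x: float
    y: float
    vx: float
    vy: float


def step(p: Particle, dt: float) -> Particle:
    """Advance one particle by dt seconds (semi-implicit Euler)."""
    vy = p.vy + GRAVITY * dt
    return Particle(p.x + p.vx * dt, p.y + vy * dt, p.vx, vy)


def simulate(particles, dt):
    # Each step() call reads and writes only its own particle, so on a GPU
    # this loop would become one thread per particle.
    return [step(p, dt) for p in particles]
```

Because no particle depends on another, the results can stay in GPU memory and feed the renderer directly, which matches the slide's point that nothing is copied back to the CPU.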
  • 10. GPGPU LIMITATIONS
      • Why isn’t GPGPU used more for non-graphics?
      • Latency
          – DirectX has many layers and buffers
          – DirectX commands are buffered up to multiple frames
          – Actual execution on the GPU is delayed
      • Copy overhead
          – GPU cannot directly access application memory
          – Must copy all data to and from the application
      • Functionality
          – Constrained programming models
  • 11. HETEROGENEOUS SYSTEMS ARCHITECTURE
  • 12. HETEROGENEOUS SYSTEMS ARCHITECTURE
      • New hardware and software
      Hardware
      • New features on discrete GPUs
      • Accelerated Processing Unit
          – Next generation processor
          – Multiple CPU and GPU cores on the same die
          – Shared memory access
          – Soon to be as widespread as multicore CPUs
      Software
      • "Drivers"
          – HSA provides a new, thin Compute API
          – Very low latency
          – Unified Address Space
          – Exposes more hardware capabilities
      • HSA Intermediate Language
          – Virtual ISA
          – Introduces CPU programming features to the GPU
  • 13. USING THE APU
      • Distinction between two hardware configurations
      • APU without discrete GPU:
          – Found in many laptops, soon in many desktops
          – Use the on-die GPU for rendering
      • APU with discrete GPU:
          – Hard-core gamers will still use discrete GPUs
          – Asymmetrical CrossFire
          – Or: Dedicate the on-die GPU to Compute algorithms
              • Could result in massive speedup of algorithms
              • Using SIMD co-processors to offload the CPU is familiar to PS3 developers
  • 14. COPY OVERHEAD
      • Current Compute APIs require the application to explicitly copy all input and output memory
          – Copying can easily take longer than processing on the CPU!
          – Only small datasets or very expensive computations benefit from GPGPU
      • HSA introduces a Unified Address Space for CPU and GPU memory
          – CPU pointers on the GPU
          – Virtual memory on the GPU
              • Paging over PCI-Express (discrete) or shared memory controller (APU)
          – Fully coherent
          – Will make GPGPU an option for many more algorithms
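The break-even reasoning on this slide can be written as a small cost model. This is a sketch under stated assumptions: the function name and every number in the usage example are hypothetical, bus bandwidth is treated as constant, and fixed per-transfer overheads are ignored.

```python
def gpu_offload_pays_off(bytes_in, bytes_out, cpu_ms, gpu_ms, bus_gb_per_s):
    """Return True when copy-in + GPU compute + copy-out beats the CPU time.

    bus_gb_per_s is an assumed sustained transfer rate in GB/s (10^9 bytes).
    """
    bytes_per_ms = bus_gb_per_s * 1e9 / 1000.0
    copy_ms = (bytes_in + bytes_out) / bytes_per_ms
    return copy_ms + gpu_ms < cpu_ms


if __name__ == "__main__":
    # Hypothetical workload: 64 MB in, 64 MB out, 10 ms on CPU, 1 ms on GPU.
    print(gpu_offload_pays_off(64_000_000, 64_000_000, 10.0, 1.0, 8.0))
    # Same workload with no explicit copies, as a stand-in for shared memory.
    print(gpu_offload_pays_off(0, 0, 10.0, 1.0, 8.0))
```

Setting both byte counts to zero is only a rough stand-in for a unified address space; paging still has a cost that this model does not capture.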
  • 15. LATENCY
      • DirectX commands are buffered
      • When the GPU is fully loaded this buffer is saturated
      • Delay between scheduling and executing a GPGPU program on a busy GPU can take multiple frames
          – Results will be several frames behind
          – Game simulation needs all objects to be in sync
      • GPGPU is currently impractical to use for anything but visual effects
  • 20. LATENCY
      • HSA’s new Compute API will reduce latency
      • How to deal with a saturated GPU?
      • A second GPU
          – Dedicate the APU to Compute
          – Virtually no latency
      • HSA feature: Graphics pre-emption
          – Context switching on the GPU
              • Interrupt a graphics task (typically a large command list)
              • Execute Compute algorithm
              • Switch back to graphics
          – Can be used either on discrete GPUs or on the APU
      • Choose the solution best suited to your needs
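The effect of buffering versus pre-emption on result latency can be illustrated with a toy model. Everything here is an assumption for illustration: the function names, the `switch_ms` context-switch cost and the timings are hypothetical and do not describe any real driver or HSA runtime.

```python
def compute_latency_ms(queued_graphics_ms, compute_ms, preempt=False, switch_ms=0.0):
    """Time from scheduling a compute job until its result is ready.

    queued_graphics_ms: graphics work already buffered ahead of the job.
    preempt: if True, the graphics work is interrupted instead of waited on.
    switch_ms: assumed cost of switching the GPU over to the compute job.
    """
    if preempt:
        return switch_ms + compute_ms
    return queued_graphics_ms + compute_ms


def frames_behind(latency_ms, fps):
    """Whole frames that pass before the result arrives (0 means same frame)."""
    return int(latency_ms // (1000.0 / fps))
```

With two frames of graphics work queued, the job without pre-emption finishes two frames late at 60 frames per second, while the pre-empting variant finishes within the frame in which it was scheduled.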
  • 21. APU USAGE EXAMPLE
      [Timeline diagram; labels: Schedule, Execute, DirectCompute, GPU, CPU, HSA, Frame, Execute]
  • 22. PROGRAMMING MODEL
      • HSA Intermediate Language: HSAIL
      • Designed for parallel algorithms
      • JIT compiles your algorithm to CPU or GPU hardware
          – Also makes multi-core SIMD programming easy!
      • High level language features
          – Object-oriented programming
          – Virtual functions
          – Exceptions
      • Debugging
      • SysCall support
          – I/O
  • 24. PHYSICS
      • Current GPGPU physics solutions only output to the renderer
      • With HSA you can simulate physics on the GPU and get the results back in the same frame
      • Use hardware acceleration to compute physics for gameplay objects
      • Reduced CPU load
      • More objects, higher fidelity
  • 25. FRUSTUM CULLING
      • Videogames tend to be GPU-bound
      • Avoid rendering what cannot be seen
      • Cull objects outside the camera viewport
          – Test the bounding box of every object against the camera frustum
          – Currently done on the CPU
          – Lots of vector math
          – Can be computed completely in parallel!
      • CPU needs the results immediately
          – HSA will allow low-latency execution
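The bounding-box test described on this slide can be sketched as follows. The plane representation `(nx, ny, nz, d)` with inward-facing normals and the function names are assumptions for illustration. The test is conservative: a box close to a frustum corner may be reported visible although it lies outside.

```python
def aabb_outside_plane(box_min, box_max, plane):
    """True if the axis-aligned box lies entirely behind the plane.

    plane is (nx, ny, nz, d); a point p is inside when n . p + d >= 0.
    """
    nx, ny, nz, d = plane
    # Pick the box corner that lies farthest along the plane normal.
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    return nx * px + ny * py + nz * pz + d < 0


def frustum_cull(boxes, planes):
    """Return one visibility flag per (box_min, box_max) pair."""
    # Every box is tested independently, so each could be one GPU thread.
    return [
        not any(aabb_outside_plane(lo, hi, p) for p in planes)
        for lo, hi in boxes
    ]
```

The output is a flat list of flags that the CPU needs before it can submit draw calls, which is why the slide stresses low-latency execution.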
  • 26. OCCLUSION CULLING
      • Objects may be hidden behind others: Occlusion
      • Final per-pixel occlusion is only known after rendering the scene
      • Approximate occlusion by rendering low-detail geometry
          – This kind of occlusion culling is currently being done on CPU or on SPUs
          – Rendering is better suited to GPUs
      • HSA solution:
          – Software rasterization in Compute on the GPU
          – HSA does not yet expose graphics pipeline
      • Software occlusion culling in Battlefield 3
          – Still much faster than a multicore CPU
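A heavily simplified sketch of software occlusion culling with a low-resolution depth buffer. It is not the Battlefield 3 implementation: occluders are assumed to be screen-space rectangles at a single depth instead of rasterized triangles, and all names are hypothetical. To stay conservative, an occluder should be drawn at its farthest depth and an object tested at its nearest depth.

```python
def make_depth_buffer(width, height, far=float("inf")):
    """Depth buffer initialised to 'nothing drawn yet'."""
    return [[far] * width for _ in range(height)]


def draw_occluder(buf, x0, y0, x1, y1, depth):
    """Write a rectangle [x0, x1) x [y0, y1) at the given depth, keeping the nearest."""
    for y in range(max(y0, 0), min(y1, len(buf))):
        row = buf[y]
        for x in range(max(x0, 0), min(x1, len(row))):
            if depth < row[x]:
                row[x] = depth


def is_visible(buf, x0, y0, x1, y1, nearest_depth):
    """True if any covered pixel is farther away than the object's nearest depth."""
    for y in range(max(y0, 0), min(y1, len(buf))):
        row = buf[y]
        for x in range(max(x0, 0), min(x1, len(row))):
            if nearest_depth < row[x]:
                return True
    return False
```

Both loops touch each pixel independently, which is the property that makes a Compute-based rasterizer attractive on a GPU.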
  • 27. SORTING
      • Typically several long lists per frame need sorting
      • Sorting on the GPU using a parallel sort algorithm
          – Ken Batcher: Bitonic or Odd-even mergesort
      • Copy overhead currently negates the performance advantage of using a GPU sorting algorithm
      • HSA solution:
          – Unified Address Space
          – GPU can sort in-place in system memory
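Batcher's bitonic sort, mentioned on this slide, can be sketched sequentially in Python. The function name is hypothetical and the sketch assumes the input length is a power of two. On a GPU, each iteration of the innermost loop would be a separate thread, with a barrier between passes.

```python
def bitonic_sort(values):
    """Return a sorted copy of values using a bitonic sorting network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The compare-exchange pattern depends only on indices, never on the data, so every pass has the same cost and no branch divergence beyond the swap itself.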
  • 28. ASSET DECOMPRESSION
      • Game assets are stored compressed on disk
      • Decompression is expensive
      • The usage of some compression algorithms is prevented by CPU speed
      • Games are moving away from loading screens
      • An APU with Unified Address Space
          – Can be used to decompress new assets without taxing the CPU or discrete GPU
          – Perhaps even use HSAIL I/O to read from disk
          – A better streaming experience for gamers
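The streaming pattern behind this slide, decompressing an asset chunk by chunk as it arrives from disk, can be sketched on the CPU with Python's standard `zlib` module. zlib is only a stand-in here; the talk does not name a compression format, and the function name is hypothetical.

```python
import zlib


def decompress_stream(chunks):
    """Yield decompressed bytes for an iterable of compressed chunks."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    # Emit anything still held in the decompressor's internal buffer.
    tail = decompressor.flush()
    if tail:
        yield tail
```

Processing the data in chunks keeps memory use bounded and lets decompression overlap with disk reads, which is the property a loading-screen-free game relies on.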
  • 29. PATHFINDING
      • Some strategy games simulate thousands of units
      • Pathfinding over complex terrain with thousands of moving units is very expensive
      • Clever approximate solutions are often used
          – Supreme Commander 2 “Flow field”
      • GPGPU pathfinding with HSA
          – Use one GPU thread per unit to do a deep search for an optimal path
          – With HSA such an algorithm can page all requisite data from system memory and write back found paths
          – APU could be fully saturated with pathfinding without impacting framerate
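A minimal sketch of the flow-field idea that the slide cites as an approximate solution. It is a simplified illustration on a 4-connected grid with uniform step cost, not Supreme Commander 2's actual implementation, and the names are hypothetical. One breadth-first search from the goal produces a field that every unit can follow independently.

```python
from collections import deque


def flow_field(grid, goal):
    """Map each reachable cell to the neighbouring cell one step closer to goal.

    grid is a list of equal-length strings where '#' marks a blocked cell;
    goal is a (row, col) tuple.
    """
    rows, cols = len(grid), len(grid[0])
    visited = {goal}
    next_cell = {}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in visited):
                visited.add((nr, nc))
                next_cell[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return next_cell


def follow(field, start, goal):
    """Walk the field from start to goal; raises KeyError if start is unreachable."""
    path = [start]
    while path[-1] != goal:
        path.append(field[path[-1]])
    return path
```

Following the field is a per-unit lookup with no shared state, so thousands of units can be advanced in parallel. The per-unit deep search proposed on the slide would replace the shared field with an individual search in each thread.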
  • 30. CONCLUSION
      • Many algorithms in games are suitable for offloading to the GPU
      • Heterogeneous Systems Architecture solves two major obstacles
          – Latency
          – Memory access
      • HSAIL allows for entirely new kinds of GPGPU programs
      • APUs can be used to offload the CPU
      • HSA will finally make GPUs available to developers as full-featured co-processors
  • 31. THANK YOU
      • Any questions?
  • 33. Disclaimer & Attribution
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.
      NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.
      The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD’s positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied.