In this 2012 presentation, AMD outlines how game developers could use the HSA capabilities of selected silicon to gain performance, efficiency, and parallelism.
AMD 2012: HSA in Gaming
1. GPGPU ALGORITHMS IN GAMES
How Heterogeneous Systems Architecture can be leveraged to optimize algorithms in video games
Matthijs De Smedt
Nixxes Software B.V.
Lead Graphics Programmer
2. | HSA Algorithms in Games | June 13th, 2012
CONTENTS
A short introduction
Current usage of GPGPU in games
Heterogeneous Systems Architecture
Examples made possible by HSA
4.
VIDEOGAMES
Games are near real-time simulations
Response time is key
Most systems run in sync with the output frequency
– Rendering 60 frames per second
– Allows for 16ms of processing time
Framerate is limited either by:
– GPU
– CPU
– Display (VSync)
[Diagram: frame timeline with CPU and GPU rows; stages labeled Input, Simulate, Render, Render]
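The 16 ms figure on this slide is the rounded per-frame budget at 60 frames per second. A minimal sketch of that arithmetic follows; the function name is illustrative and not from the talk.

```python
def frame_budget_ms(frames_per_second: float) -> float:
    """Return the time available per frame, in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return 1000.0 / frames_per_second


if __name__ == "__main__":
    # Input, simulation and rendering all have to fit in this budget.
    for fps in (30, 60, 120):
        print(f"{fps} fps -> {frame_budget_ms(fps):.2f} ms per frame")
```

Any work that misses this window, on either the CPU or the GPU, lowers the framerate.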
5.
HARDWARE
Typical hardware target for PC games:
– One multicore CPU
– One GPU
Multiple GPUs: CrossFire
– Transparent to the application
– Driver alternates frames between GPUs
GPUs are becoming more general purpose:
– General Purpose GPU algorithms (GPGPU)
[Image: CrossFire]
7.
INTRODUCTION TO GPGPU
Rendering is a sequence of parallel algorithms
GPUs are great at parallel computation
Evolution of hardware and software to general purpose
First GPGPU was accomplished with programmable rendering
– DirectX
– OpenGL
Second generation using dedicated GPGPU APIs:
– CUDA
– OpenCL
– DirectCompute
Third generation of GPGPU on the way:
– Heterogeneous Systems Architecture
8.
GPGPU IN GAMES
Some GPGPU algorithms are being used in games right now. For example:
– Physics
Particles
Fluid simulation
Destruction
– Specialized graphics algorithms
Post-processing
All these algorithms drive visual effects
GPU particle system by Fairlight
9.
CURRENT PHYSICS EXAMPLE
GPGPU particle simulation using DirectCompute
Great for simulating thousands of visible particles
Results of simulation are never copied back to CPU
– Cannot interfere with gameplay
– Not synced in networked games
Example of what this rules out: smoke particles that affect game AI
[Diagram: CPU and GPU timeline with steps labeled Call GPU, Simulate particles, Render particles]
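To make the per-particle independence concrete, here is a minimal CPU-side sketch of one simulation step. It is written in Python rather than DirectCompute, and the function name, the semi-implicit Euler integrator and the gravity default are all assumptions chosen for illustration, not details from the talk.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def step_particles(
    positions: List[Vec3],
    velocities: List[Vec3],
    dt: float,
    gravity: Vec3 = (0.0, -9.81, 0.0),
) -> Tuple[List[Vec3], List[Vec3]]:
    """Advance every particle by one time step (semi-implicit Euler)."""
    new_positions: List[Vec3] = []
    new_velocities: List[Vec3] = []
    # Each loop iteration touches only its own particle, so on a GPU
    # every iteration could run as a separate thread.
    for p, v in zip(positions, velocities):
        v2 = tuple(vi + gi * dt for vi, gi in zip(v, gravity))
        p2 = tuple(pi + vi * dt for pi, vi in zip(p, v2))
        new_velocities.append(v2)
        new_positions.append(p2)
    return new_positions, new_velocities
```

In the setup the slide describes, the updated positions stay in GPU memory and feed the renderer directly, which is why gameplay code never sees them.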
10.
GPGPU LIMITATIONS
Why isn’t GPGPU used more for non-graphics?
Latency
– DirectX has many layers and buffers
– DirectX commands are buffered up to multiple frames
– Actual execution on the GPU is delayed
Copy overhead
– GPU cannot directly access application memory
– Must copy all data from and to the application
Functionality
– Constrained programming models
12.
HETEROGENEOUS SYSTEMS ARCHITECTURE
New hardware and software
Software
"Drivers"
– HSA provides a new, thin Compute API
– Very low latency
– Unified Address Space
– Exposes more hardware capabilities
HSA Intermediate Language
– Virtual ISA
– Introduces CPU programming features to the GPU
Hardware
New features on discrete GPUs
Accelerated Processing Unit
– Next generation processor
– Multiple CPU and GPU cores on the same die
– Shared memory access
– Soon to be as widespread as multicore CPUs
13.
USING THE APU
Distinction between two hardware configurations
APU without discrete GPU:
– Found in many laptops, soon in many desktops
– Use the on-die GPU for rendering
APU with discrete GPU:
– Hard-core gamers will still use discrete GPUs
– Asymmetrical CrossFire
– Or: Dedicate the on-die GPU to Compute algorithms
Could result in massive speedup of algorithms
Using SIMD co-processors to offload the CPU is familiar to PS3 developers
14.
COPY OVERHEAD
Current Compute APIs require the application to explicitly copy all input and output memory
– Copying can easily take longer than processing on the CPU!
– Only small datasets or very expensive computations benefit from GPGPU
HSA introduces a Unified Address Space for CPU and GPU memory
– CPU pointers on the GPU
– Virtual memory on the GPU
Paging over PCI-Express (discrete) or shared memory controller (APU)
– Fully coherent
– Will make GPGPU an option for many more algorithms
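The trade-off on this slide can be sketched as a simple cost model: offloading only pays off when GPU time plus copy time beats CPU time. The function names and all numbers below are hypothetical, and treating a unified address space as zero copy cost is a simplifying assumption (paging still has a cost in practice).

```python
def copy_time_ms(bytes_moved: int, bus_gb_per_s: float) -> float:
    """Time to move the data across the bus, in milliseconds."""
    return bytes_moved / (bus_gb_per_s * 1e9) * 1e3


def offload_pays_off(
    cpu_ms: float,
    gpu_ms: float,
    bytes_moved: int,
    bus_gb_per_s: float,
    unified_memory: bool = False,
) -> bool:
    """Return True when running on the GPU is faster end to end."""
    # Assumption: with a unified address space no explicit copy is needed.
    copy_ms = 0.0 if unified_memory else copy_time_ms(bytes_moved, bus_gb_per_s)
    return gpu_ms + copy_ms < cpu_ms


if __name__ == "__main__":
    size = 64 * 1024 * 1024  # hypothetical 64 MiB of input plus output
    print(offload_pays_off(6.0, 1.0, size, 8.0))
    print(offload_pays_off(6.0, 1.0, size, 8.0, unified_memory=True))
```

With these made-up figures the explicit copy alone takes longer than the whole CPU computation, which is the situation the slide warns about.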
15.
LATENCY
DirectX commands are buffered
When the GPU is fully loaded this buffer is saturated
On a busy GPU, the delay between scheduling and executing a GPGPU program can span multiple frames
– Results will be several frames behind
– Game simulation needs all objects to be in sync
GPGPU is currently impractical to use for anything but visual effects
20.
LATENCY
HSA’s new Compute API will reduce latency
How to deal with a saturated GPU?
A second GPU
– Dedicate the APU to Compute
– Virtually no latency
HSA feature: Graphics pre-emption
– Context switching on the GPU
Interrupt a graphics task (typically a large command list)
Execute Compute algorithm
Switch back to graphics
– Can be used both on discrete GPUs and on the APU
Choose the solution best suited to your needs
21.
APU USAGE EXAMPLE
[Diagram: frame timeline with CPU and GPU rows, showing Schedule and Execute points for DirectCompute and HSA]
22.
PROGRAMMING MODEL
HSA Intermediate Language: HSAIL
Designed for parallel algorithms
JIT compiles your algorithm to CPU or GPU hardware
– Also makes multi-core SIMD programming easy!
High level language features
– Object-oriented programming
– Virtual functions
– Exceptions
Debugging
SysCall support
– I/O
24.
PHYSICS
Current GPGPU physics solutions only output to the renderer
With HSA you can simulate physics on the GPU and get the results back in the same frame
Use hardware acceleration to compute physics for gameplay objects
Reduced CPU load
More objects, higher fidelity
25.
FRUSTUM CULLING
Videogames tend to be GPU-bound
Avoid rendering what cannot be seen
Cull objects outside the camera viewport
– Test the bounding box of every object against the camera frustum
– Currently done on the CPU
– Lots of vector math
– Can be computed completely in parallel!
CPU needs the results immediately
– HSA will allow low-latency execution
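The bounding-box test described on this slide can be sketched as follows. This is a minimal CPU version under stated assumptions: the frustum is given as planes `(nx, ny, nz, d)` with inward-facing normals, boxes are axis-aligned `(min, max)` pairs, and all names are illustrative rather than taken from the talk.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[float, float, float, float]  # inside when n.p + d >= 0
AABB = Tuple[Vec3, Vec3]                   # (min corner, max corner)


def aabb_outside_plane(box: AABB, plane: Plane) -> bool:
    """True when the whole box lies on the outer side of the plane."""
    (minx, miny, minz), (maxx, maxy, maxz) = box
    nx, ny, nz, d = plane
    # Pick the box corner that lies furthest along the plane normal.
    px = maxx if nx >= 0 else minx
    py = maxy if ny >= 0 else miny
    pz = maxz if nz >= 0 else minz
    return nx * px + ny * py + nz * pz + d < 0


def cull(boxes: Sequence[AABB], planes: Sequence[Plane]) -> List[bool]:
    """Return one visibility flag per box; each box is tested independently."""
    return [not any(aabb_outside_plane(b, p) for p in planes) for b in boxes]
```

The test is conservative: a box near a frustum corner may be kept even though it is not visible, but a visible box is never dropped. Because every box is tested on its own, the outer loop maps naturally onto one GPU thread per object.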
26.
OCCLUSION CULLING
Objects may be hidden behind others: Occlusion
Final per-pixel occlusion is only known after rendering the scene
Approximate occlusion by rendering low-detail geometry
– This kind of occlusion culling is currently being done on the CPU or on SPUs
– Rendering is better suited to GPUs
HSA solution:
– Software rasterization in Compute on the GPU
– HSA does not yet expose the graphics pipeline
– Still much faster than a multicore CPU
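A heavily simplified sketch of the idea: rasterize low-detail occluders into a coarse depth buffer, then test each object's screen-space bounds against it. Real implementations rasterize triangles; using screen-space rectangles with a single depth, and all names below, are assumptions made to keep the example short.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive pixels


def make_depth_buffer(width: int, height: int) -> List[List[float]]:
    """Create a buffer cleared to 'infinitely far away'."""
    return [[float("inf")] * width for _ in range(height)]


def rasterize_occluder(depth: List[List[float]], rect: Rect, z: float) -> None:
    """Write the occluder's depth wherever it is nearer than the stored value."""
    x0, y0, x1, y1 = rect
    for y in range(max(y0, 0), min(y1, len(depth))):
        row = depth[y]
        for x in range(max(x0, 0), min(x1, len(row))):
            if z < row[x]:
                row[x] = z


def is_occluded(depth: List[List[float]], rect: Rect, nearest_z: float) -> bool:
    """True when every covered pixel holds an occluder nearer than the object.

    A rect that covers no pixels is reported as occluded; off-screen
    objects are assumed to be handled by frustum culling first.
    """
    x0, y0, x1, y1 = rect
    for y in range(max(y0, 0), min(y1, len(depth))):
        for x in range(max(x0, 0), min(x1, len(depth[y]))):
            if nearest_z <= depth[y][x]:
                return False
    return True
```

Both loops are per-pixel and independent, which is what makes a Compute implementation on the GPU attractive.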
Software occlusion culling in Battlefield 3
27.
SORTING
Typically several long lists per frame need sorting
Sorting on the GPU using a parallel sort algorithm
– Ken Batcher: Bitonic or Odd-even mergesort
Copy overhead currently negates the performance advantage of using a GPU sorting algorithm
HSA solution:
– Unified Address Space
– GPU can sort in-place in system memory
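For reference, a minimal sketch of Batcher's bitonic sort, written sequentially in Python. It assumes the input length is a power of two; the function name is illustrative.

```python
from typing import List, Sequence


def bitonic_sort(values: Sequence[float]) -> List[float]:
    """Return the values in ascending order using a bitonic sorting network."""
    data = list(values)
    n = len(data)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # distance between compared elements
            # Every (i, partner) pair in this pass is independent, so a
            # GPU can run the whole pass as one parallel dispatch.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data
```

The compare-and-swap pattern depends only on the indices, never on the data, which is why the algorithm suits wide parallel hardware. With a unified address space the `data` array could live in system memory and be sorted in place.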
28.
ASSET DECOMPRESSION
Game assets are stored compressed on disk
Decompression is expensive
Some compression algorithms cannot be used because decompressing them on the CPU is too slow
Games are moving away from loading screens
An APU with Unified Address Space
– Can be used to decompress new assets without taxing the CPU or discrete GPU
– Perhaps even use HSAIL I/O to read from disk
– A better streaming experience for gamers
29.
PATHFINDING
Some strategy games simulate thousands of units
Pathfinding over complex terrain with thousands of moving units is very expensive
Clever approximate solutions are often used
– Supreme Commander 2 “Flow field”
GPGPU pathfinding with HSA
– Use one GPU thread per unit to do a deep search for an optimal path
– With HSA such an algorithm can page all requisite data from system memory and write back found paths
– APU could be fully saturated with pathfinding without impacting framerate
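The "one search per unit" idea can be sketched on the CPU as below. Breadth-first search on a 4-connected grid stands in for the unspecified "deep search"; the grid encoding (`#` for blocked cells) and all names are assumptions for illustration only.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

Cell = Tuple[int, int]  # (row, column)


def shortest_path(grid: Sequence[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Return a shortest path from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:   # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and nxt not in came_from):
                came_from[nxt] = cell
                queue.append(nxt)
    return None


def plan_all(grid: Sequence[str], units: Sequence[Cell], goal: Cell) -> List[Optional[List[Cell]]]:
    """Plan every unit independently; each search could be one GPU thread."""
    return [shortest_path(grid, unit, goal) for unit in units]
```

Each search only reads the shared grid and writes its own path, so the searches need no synchronization with one another.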
30.
CONCLUSION
Many algorithms in games are suitable for offloading to the GPU
Heterogeneous Systems Architecture solves two major obstacles
– Latency
– Memory access
HSAIL allows for entirely new kinds of GPGPU programs
APUs can be used to offload the CPU
HSA will finally make GPUs available to developers as full-featured co-processors
31.
THANK YOU
Any questions?
33.
Disclaimer & Attribution
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions
and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited
to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product
differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no
obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to
make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.
NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO
RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS
INFORMATION.
ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY
DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL
OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF
EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in
this presentation are for informational purposes only and may be trademarks of their respective owners.
The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and
opinions presented in this presentation may not represent AMD’s positions, strategies or opinions. Unless explicitly stated, AMD is
not responsible for the content herein and no endorsements are implied.