Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
USING GPUS FOR
COLLISION DETECTION
AMD
Takahiro Harada
2 | Eurographics 2012 | Takahiro Harada
GPUS
3 | Eurographics 2012 | Takahiro Harada
GPU RAW PERFORMANCE
§ High memory bandwidth
§ High TFLOPS
§ Radeon HD 7970
–  3...
4 | Eurographics 2012 | Takahiro Harada
RAW PERFORMANCE IS HIGH, BUT
§ GPU performs good only if used correctly
–  Diverg...
5 | Eurographics 2012 | Takahiro Harada
EX) HASH GRID (LINKED LIST)
§ Low ALU op / Mem op ratio
§ Spatial query using ha...
6 | Eurographics 2012 | Takahiro Harada
EX) HASH GRID (LINKED LIST)
§ Low ALU op / Mem op ratio
§ Spatial query using ha...
7 | Eurographics 2012 | Takahiro Harada
REASON OF PERFORMANCE JUMP
§ Key architectural changes
–  No VLIW
–  Improved mem...
8 | Eurographics 2012 | Takahiro Harada
GPU COLLISION DETECTION
9 | Eurographics 2012 | Takahiro Harada
COLLISION DETECTION WAS HARD FOR GPUS
§ Old GPUs were not as flexible as today’s ...
10 | Eurographics 2012 | Takahiro Harada
UNIFORM GRID
§ Simplest but useful data structure
§ Pre Compute Language
–  Ver...
11 | Eurographics 2012 | Takahiro Harada
UNIFORM GRID
§ Simplest but useful data structure
§ Pre Compute Language
–  Ver...
12 | Eurographics 2012 | Takahiro Harada
WITH UNIFORM GRID
§ Often used for particle-based simulation
§ Other steps are ...
13 | Eurographics 2012 | Takahiro Harada
SMOOTHED PARTICLE HYDRODYNAMICS
§ Fluid simulation
§ Solving the Navier-Stokes ...
14 | Eurographics 2012 | Takahiro Harada
RIGID BODY SIMULATION
§ Represent a rigid body with a set of particles
§ Rigid ...
15 | Eurographics 2012 | Takahiro Harada
COUPLING RIGID BODIES + FLUID
16 | Eurographics 2012 | Takahiro Harada
CLOTH COLLISION DETECTION
§ Particle v.s. triangle mesh collision
§ Dual unifor...
17 | Eurographics 2012 | Takahiro Harada
SEVERAL PHYSICS PROBLEMS WERE SOLVED
Particle Simulations
Granular Materials Flui...
18 | Eurographics 2012 | Takahiro Harada
PROBLEM SOLVED??
§ Not yet
§ Uniform grid is not a perfect solution
§ More com...
19 | Eurographics 2012 | Takahiro Harada
BROAD-PHASE COLLISION DETECTION
§ Sweep & prune
–  Sort end points
–  Sweep sort...
20 | Eurographics 2012 | Takahiro Harada
NARROW-PHASE COLLISION DETECTION
§ Variety of work
–  So many shape combinations...
21 | Eurographics 2012 | Takahiro Harada
MULTIPLE GPU
COLLISION DETECTION
22 | Eurographics 2012 | Takahiro Harada
MULTIPLE GPU PROGRAMMING
23 | Eurographics 2012 | Takahiro Harada
Parallelization using Multiple GPUs
• Two levels of parallelization
• 1GPU
• Mult...
24 | Eurographics 2012 | Takahiro Harada
25 | Eurographics 2012 | Takahiro Harada
PARTICLE SIMULATION
ON
MULTIPLE GPUS
26 | Eurographics 2012 | Takahiro Harada
27 | Eurographics 2012 | Takahiro Harada
28 | Eurographics 2012 | Takahiro Harada
Data Management
• Not all the data have to be sent
• Data required for the comput...
29 | Eurographics 2012 | Takahiro Harada
30 | Eurographics 2012 | Takahiro Harada
31 | Eurographics 2012 | Takahiro Harada
Data Transfer between GPUs
• No direct data transfer is provided on current hardw...
32 | Eurographics 2012 | Takahiro Harada
Computation of 1 Time Step
GPU0 GPU1 GPU2 GPU3
Main
Memory
Compt
Force
GPU0
Compt...
33 | Eurographics 2012 | Takahiro Harada
Environment
• 4GPU server (Simulation) + 1GPU (Rendering)
• 1M particles
• 6GPU s...
34 | Eurographics 2012 | Takahiro Harada
MOVIE
Harada et al., Massive Particles: Particle-based Simulations on Multiple GP...
35 | Eurographics 2012 | Takahiro Harada
Results
Number of Particles (103
particles)
0
10
20
30
40
50
60
70
80
90
100
0 20...
36 | Eurographics 2012 | Takahiro Harada
PROBLEM SOLVED??
§ Not yet
§ Most of the problems had identical object size
–  ...
37 | Eurographics 2012 | Takahiro Harada
HETEROGENEOUS COLLISION
DETECTION
38 | Eurographics 2012 | Takahiro Harada
§ Large number of particles
§ Particles with identical size
–  Work granularity...
39 | Eurographics 2012 | Takahiro Harada
MIXED PARTICLE SIMULATION
§ Not only small particles
§ Difficulty for GPUs
–  L...
40 | Eurographics 2012 | Takahiro Harada
CHALLENGE
§ Non uniform work granularity
–  Small-small(SS) collision
§ Uniform...
41 | Eurographics 2012 | Takahiro Harada
FUSION ARCHITECTURE
§ CPU and GPU are:
–  On the same die
–  Much closer
–  Effi...
42 | Eurographics 2012 | Takahiro Harada
METHOD
43 | Eurographics 2012 | Takahiro Harada
TWO SIMULATIONS
§ Small particles
§ Large particles
Build
Acc. Structure
SS
Col...
44 | Eurographics 2012 | Takahiro Harada
§ Small particles
§ Large particles
Uniform Work
Non Uniform Work
CLASSIFY BY W...
45 | Eurographics 2012 | Takahiro Harada
§ Small particles
§ Large particles
GPU
CPU
CLASSIFY BY WORK GRANULARITY, ASSIG...
46 | Eurographics 2012 | Takahiro Harada
§ Small particles
§ Large particles
§ Grid, small particle data has to be shar...
47 | Eurographics 2012 | Takahiro Harada
§ Small particles
§ Large particles
§ Grid, small particle data has to be shar...
48 | Eurographics 2012 | Takahiro Harada
GPU
CPU
VISUALIZING WORKLOADS
Build
Acc. Structure
SS
Collision
S
Integration
Pos...
49 | Eurographics 2012 | Takahiro Harada
§ Small particles
§ Large particles
§ To get better load balancing
–  The sync...
50 | Eurographics 2012 | Takahiro Harada
MULTI THREADING
(4 THREADS)
51 | Eurographics 2012 | Takahiro Harada
OPTIMIZATION2: IMPROVING SMALL-SMALL COLLISION
GPU
Build
Acc. Structure
SS
Collis...
52 | Eurographics 2012 | Takahiro Harada
DEMO
GPUWork
CPUWork
53 | Eurographics 2012 | Takahiro Harada
QUESTIONS?
Upcoming SlideShare
Loading in …5
×

Using GPUs for Collision detection, Recent Advances in Real-Time Collision and Proximity Computations for Games and Simulations (EUROGRAPHICS 2012)

2,720 views

Published on

Published in: Technology, Art & Photos
  • Be the first to comment

Using GPUs for Collision detection, Recent Advances in Real-Time Collision and Proximity Computations for Games and Simulations (EUROGRAPHICS 2012)

  1. 1. USING GPUS FOR COLLISION DETECTION AMD Takahiro Harada
  2. 2. 2 | Eurographics 2012 | Takahiro Harada GPUS
  3. 3. 3 | Eurographics 2012 | Takahiro Harada GPU RAW PERFORMANCE § High memory bandwidth § High TFLOPS § Radeon HD 7970 –  32x4 SIMD engines –  64 wide SIMD –  3.79 TFLOPS –  264 GB/s 0 50 100 150 200 250 300 0 0.5 1 1.5 2 2.5 3 3.5 4 4890 5870 6970 7970 TFLOPS GB/s
  4. 4. 4 | Eurographics 2012 | Takahiro Harada RAW PERFORMANCE IS HIGH, BUT § GPU performs good only if used correctly –  Divergence –  ALU op / Memory op ratio –  etc § Some algorithm requires operation GPUs are not good at
  5. 5. 5 | Eurographics 2012 | Takahiro Harada EX) HASH GRID (LINKED LIST) § Low ALU op / Mem op ratio § Spatial query using hash grid do { write( node ); node = node->next; }while( node ) § CPUs is better for this 0 10 20 30 40 50 60 Query HD7970 HD6970 HD5870 HD6870 PhenomXII6(1) PhenomXII6(1) PhenomXII6(3) PhenomXII6(4) PhenomXII6(5) PhenomXII6(6) CPU
  6. 6. 6 | Eurographics 2012 | Takahiro Harada EX) HASH GRID (LINKED LIST) § Low ALU op / Mem op ratio § Spatial query using hash grid –  while() –  { § Fetch element § Copy –  } § CPUs is better for this 0 10 20 30 40 50 60 Query HD7970 HD6970 HD5870 HD6870 PhenomXII6(1) PhenomXII6(1) PhenomXII6(3) PhenomXII6(4) PhenomXII6(5) PhenomXII6(6) GPU
  7. 7. 7 | Eurographics 2012 | Takahiro Harada REASON OF PERFORMANCE JUMP § Key architectural changes –  No VLIW –  Improved memory system
  8. 8. 8 | Eurographics 2012 | Takahiro Harada GPU COLLISION DETECTION
  9. 9. 9 | Eurographics 2012 | Takahiro Harada COLLISION DETECTION WAS HARD FOR GPUS § Old GPUs were not as flexible as today’s GPUs –  Fixed function –  Programmable shader (Vertex shader, pixel shader) § Grid based fluid simulation § Cloth simulation Crane et al., Real-Time Simulation and Rendering of 3D Fluids, GPU Gems3 (2007)
  10. 10. 10 | Eurographics 2012 | Takahiro Harada UNIFORM GRID § Simplest but useful data structure § Pre Compute Language –  Vertex shader, pixel shader –  Vertex texture fetch -> Random write –  Depth test, blending etc -> Atomics § Now it is very easy –  Random write and atomics are supported struct Cell { u32 m_counter; u32 m_data[N ]; }
  11. 11. 11 | Eurographics 2012 | Takahiro Harada UNIFORM GRID § Simplest but useful data structure § Pre Compute Language –  Vertex shader, pixel shader –  Vertex texture fetch -> Random write –  Depth test, blending etc -> Atomics § Now it is very easy –  Random write and atomics are supported 2 5 8
  12. 12. 12 | Eurographics 2012 | Takahiro Harada WITH UNIFORM GRID § Often used for particle-based simulation § Other steps are embarrassingly parallel § Distinct Element Method (DEM) § Particles can be extended to…
  13. 13. 13 | Eurographics 2012 | Takahiro Harada SMOOTHED PARTICLE HYDRODYNAMICS § Fluid simulation § Solving the Navier-Stokes equation on particles Harada et al., Smoothed Particle Hydrodynamics on GPUs, CGI (2007)
  14. 14. 14 | Eurographics 2012 | Takahiro Harada RIGID BODY SIMULATION § Represent a rigid body with a set of particles § Rigid body collision = Particle collision Harada et al., Real-time Rigid Body Simulation on GPUs, GPU Gems3 (2007)
  15. 15. 15 | Eurographics 2012 | Takahiro Harada COUPLING RIGID BODIES + FLUID
  16. 16. 16 | Eurographics 2012 | Takahiro Harada CLOTH COLLISION DETECTION § Particle v.s. triangle mesh collision § Dual uniform grid –  1st grid: particles –  2nd grid: triangles Harada et al., Real-time Fluid Simulation Coupled with Cloth, TPCG (2007)
  17. 17. 17 | Eurographics 2012 | Takahiro Harada SEVERAL PHYSICS PROBLEMS WERE SOLVED Particle Simulations Granular Materials Fluids Rigid Bodies Cloth Coupling
  18. 18. 18 | Eurographics 2012 | Takahiro Harada PROBLEM SOLVED?? § Not yet § Uniform grid is not a perfect solution § More complicated collision detection is necessary –  E.g., rigid body simulation –  Broad-phase collision detection –  Narrow-phase collision detection
  19. 19. 19 | Eurographics 2012 | Takahiro Harada BROAD-PHASE COLLISION DETECTION § Sweep & prune –  Sort end points –  Sweep sorted list § Optimizations –  Split sweep of a large object –  Workspace subdivision –  Cell subdivision Liu et al., Real-time Collision Culling of a Million Bodies on Graphics Processing Units, TOG(2010)
  20. 20. 20 | Eurographics 2012 | Takahiro Harada NARROW-PHASE COLLISION DETECTION § Variety of work –  So many shape combinations -> divergence § Unified shape representation § Each collision computation is parallelizable
  21. 21. 21 | Eurographics 2012 | Takahiro Harada MULTIPLE GPU COLLISION DETECTION
  22. 22. 22 | Eurographics 2012 | Takahiro Harada MULTIPLE GPU PROGRAMMING
  23. 23. 23 | Eurographics 2012 | Takahiro Harada Parallelization using Multiple GPUs • Two levels of parallelization • 1GPU • Multiple GPUs Memory MemoryMemoryMemory
  24. 24. 24 | Eurographics 2012 | Takahiro Harada
  25. 25. 25 | Eurographics 2012 | Takahiro Harada PARTICLE SIMULATION ON MULTIPLE GPUS
  26. 26. 26 | Eurographics 2012 | Takahiro Harada
  27. 27. 27 | Eurographics 2012 | Takahiro Harada
  28. 28. 28 | Eurographics 2012 | Takahiro Harada Data Management • Not all the data have to be sent • Data required for the computation has to be sent • Two kinds of particles have to be sent to adjacent GPU 1. Escaped particles : Particles get out from adjacent subdomain (adjacent GPU will be responsible for these particles in the next time step) 2. Ghost particles in the ghost region
  29. 29. 29 | Eurographics 2012 | Takahiro Harada
  30. 30. 30 | Eurographics 2012 | Takahiro Harada
  31. 31. 31 | Eurographics 2012 | Takahiro Harada Data Transfer between GPUs • No direct data transfer is provided on current hardwares • Transfer via main memory • The buffers equal to the number of connectors are allocated • Each GPU writes data to the defined location at the same time • Each GPU read data of the defined location at the same time GPU0 GPU1 GPU2 GPU3 Main Memory
  32. 32. 32 | Eurographics 2012 | Takahiro Harada Computation of 1 Time Step GPU0 GPU1 GPU2 GPU3 Main Memory Compt Force GPU0 Compt Force GPU1 Compt Force GPU2 Compt Force GPU3 Update Update Update Update Prepare Send Prepare Send Prepare Send Prepare Send Send Send Send Send Synchronization Receive Receive Receive Receive Post Receive Post Receive Post Receive Post Receive
  33. 33. 33 | Eurographics 2012 | Takahiro Harada Environment • 4GPU server (Simulation) + 1GPU (Rendering) • 1M particles • 6GPU server boxes (Simulation) + 1GPU (Rendering) @ GDC2008 • 3M particles GPU GPU GPU GPU GPU GPU GPU Simulation GPUs GPU GPU GPU GPU GPU Simulation GPUs
  34. 34. 34 | Eurographics 2012 | Takahiro Harada MOVIE Harada et al., Massive Particles: Particle-based Simulations on Multiple GPUs, SIG Talk(2008)
  35. 35. 35 | Eurographics 2012 | Takahiro Harada Results Number of Particles (103 particles) 0 10 20 30 40 50 60 70 80 90 100 0 200 400 600 800 1000 1200 Total (1GPU) Sim (2GPU) Total (2GPU) Sim (4GPU) Total (4GPU) ComputationTime(ms)
  36. 36. 36 | Eurographics 2012 | Takahiro Harada PROBLEM SOLVED?? § Not yet § Most of the problems had identical object size –  e.g., Particles § The reason is because of GPU architecture –  Not designed to solve non-uniform problem
  37. 37. 37 | Eurographics 2012 | Takahiro Harada HETEROGENEOUS COLLISION DETECTION
  38. 38. 38 | Eurographics 2012 | Takahiro Harada § Large number of particles § Particles with identical size –  Work granularity is almost the same –  Good for the wide SIMD architecture PARTICLE BASED SIMULATION ON THE GPU Harada et al. 2007
  39. 39. 39 | Eurographics 2012 | Takahiro Harada MIXED PARTICLE SIMULATION § Not only small particles § Difficulty for GPUs –  Large particles interact with small particles –  Large-large collision
  40. 40. 40 | Eurographics 2012 | Takahiro Harada CHALLENGE § Non uniform work granularity –  Small-small(SS) collision § Uniform, GPU –  Large-large(LL) collision § Non Uniform, CPU –  Large-small(LS) collision § Non Uniform, CPU
  41. 41. 41 | Eurographics 2012 | Takahiro Harada FUSION ARCHITECTURE § CPU and GPU are: –  On the same die –  Much closer –  Efficient data sharing § CPU and GPU are good at different works –  CPU: serial computation, conditional branch –  GPU: parallel computation § Able to dispatch works to: –  Serial work with varying granularity → CPU –  Parallel work with the uniform granularity → GPU
  42. 42. 42 | Eurographics 2012 | Takahiro Harada METHOD
  43. 43. 43 | Eurographics 2012 | Takahiro Harada TWO SIMULATIONS § Small particles § Large particles Build Acc. Structure SS Collision S Integration Build Acc. Structure LL Collision L Integration LS Collision Position Velocity Force Grid Position Velocity Force
  44. 44. 44 | Eurographics 2012 | Takahiro Harada § Small particles § Large particles Uniform Work Non Uniform Work CLASSIFY BY WORK GRANULARITY Build Acc. Structure SS Collision S Integration L Integration Position Velocity Force Grid Position Velocity Force LL Collision LS Collision Build Acc. Structure
  45. 45. 45 | Eurographics 2012 | Takahiro Harada § Small particles § Large particles GPU CPU CLASSIFY BY WORK GRANULARITY, ASSIGN PROCESSOR Build Acc. Structure SS Collision S Integration L Integration Position Velocity Force Grid Position Velocity Force LL Collision LS Collision Build Acc. Structure
  46. 46. 46 | Eurographics 2012 | Takahiro Harada § Small particles § Large particles § Grid, small particle data has to be shared with the CPU for LS collision –  Allocated as zero copy buffer GPU CPU DATA SHARING Build Acc. Structure SS Collision S Integration L Integration Position Velocity Force Grid Position Velocity Force LL Collision Build Acc. Structure Position Velocity Grid Force LS Collision
  47. 47. 47 | Eurographics 2012 | Takahiro Harada § Small particles § Large particles § Grid, small particle data has to be shared with the CPU for LS collision –  Allocated as zero copy buffer GPU CPU SYNCHRONIZATION Position Velocity Force Grid Position Velocity Force SS Collision S Integration L Integration LL Collision Position Velocity Grid Force Synchronization LS Collision Build Acc. Structure Build Acc. Structure Synchronization
  48. 48. 48 | Eurographics 2012 | Takahiro Harada GPU CPU VISUALIZING WORKLOADS Build Acc. Structure SS Collision S Integration Position Velocity Force Grid Position Velocity Force LL Collision LS Collision Synchronization L Integration § Small particles § Large particles § Grid construction can be moved at the end of the pipeline –  Unbalanced workload
  49. 49. 49 | Eurographics 2012 | Takahiro Harada § Small particles § Large particles § To get better load balancing –  The sync is for passing the force buffer filled by the CPU to the GPU –  Move the LL collision after the sync GPU CPU LOAD BALANCING Build Acc. Structure SS Collision S Integration Position Velocity Force Grid Position Velocity Force LL Collision Synchronization L Integration LS Collision
  50. 50. 50 | Eurographics 2012 | Takahiro Harada MULTI THREADING (4 THREADS)
  51. 51. 51 | Eurographics 2012 | Takahiro Harada OPTIMIZATION2: IMPROVING SMALL-SMALL COLLISION GPU Build Acc. Structure SS Collision S Integ. CPU0 CPU1 CPU2 LS Collision LS Collision LS Collision Synchronization MergeMergeMerge LL Coll. L Integ. Synchronization S Sorting S Sorting S Sorting Synchronization Harada, Heterogeneous Particle-Based Simulation, SIG ASIA Talk(2011)
  52. 52. 52 | Eurographics 2012 | Takahiro Harada DEMO GPUWork CPUWork
  53. 53. 53 | Eurographics 2012 | Takahiro Harada QUESTIONS?

×