A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)

  1. A PARALLEL CONSTRAINT SOLVER FOR A RIGID BODY SIMULATION
     Takahiro Harada, AMD
  2. INTRODUCTION
  3. RIGID BODY SIMULATION PIPELINE
     • Broad phase collision detection
       – Quick check using bounding volumes
       – (A,C)(A,B)(B,C)(B,D)(E,F)
     • Narrow phase collision detection
       – Detailed check using geometry
       – (A,B)(B,D)(E,F)
     • Constraint solve
     [Figure: bodies A, B, C, D, E, F]
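To make the broad/narrow split concrete, here is a minimal Python sketch of a broad phase that tests axis-aligned bounding boxes (AABBs) with a brute-force O(n²) loop. The box layout, the body names and the function names are assumptions for illustration; the GPU broad phases cited on the next slide use spatial grids or sorting rather than testing every pair.

```python
from itertools import combinations

# Hypothetical 2D axis-aligned bounding box: (min_x, min_y, max_x, max_y)
Box = tuple[float, float, float, float]


def aabb_overlap(a: Box, b: Box) -> bool:
    """Two AABBs overlap iff their intervals overlap on every axis."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def broad_phase(boxes: dict[str, Box]) -> list[tuple[str, str]]:
    """Brute-force broad phase: return candidate pairs whose boxes overlap.
    A narrow phase would then test the actual geometry of each candidate."""
    return [(p, q) for p, q in combinations(sorted(boxes), 2)
            if aabb_overlap(boxes[p], boxes[q])]


if __name__ == "__main__":
    boxes = {
        "A": (0.0, 0.0, 2.0, 2.0),
        "B": (1.5, 0.0, 3.5, 2.0),
        "C": (3.0, 0.0, 5.0, 2.0),
    }
    print(broad_phase(boxes))  # [('A', 'B'), ('B', 'C')]
```

A and C are close but their boxes do not touch, so only the pairs (A,B) and (B,C) are passed on to the narrow phase.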
  4. RIGID BODY SIMULATION ON THE GPU
     • Broad phase collision detection
       – Harada et al., Smoothed Particle Hydrodynamics on GPUs (2007)
       – Le Grand, Broad-phase Collision Detection with CUDA (2007)
       – Liu et al., Real-time Collision Culling of a Million Bodies on Graphics Processing Units (2010)
     • Narrow phase collision detection
       – Sathe, Rigid Body Collision Detection on the GPU (2006)
       – Harada et al., Real-time Rigid Body Simulation on GPUs (2007)
       – Kipfer, LCP Algorithms for Collision Detection using CUDA (2007)
     • Constraint solve
       – Harada, Real-time Rigid Body Simulation on GPUs (2007)
       – Harada, Parallelizing the Physics Pipeline (2009)
       – Tonge et al., PhysX GPU Rigid Bodies in Batman (2010)
  5. WHY ISN'T THE SOLVER STRAIGHTFORWARD?
     • LCP
       – Projected Gauss-Seidel
     • Dependency between constraints
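The dependency can be seen in a minimal projected Gauss-Seidel (PGS) sketch for a small dense LCP. This is an illustration under assumptions, not the solver from the talk: real contact solvers work with per-constraint Jacobians instead of an assembled matrix `A`, and the function name and inputs here are hypothetical.

```python
def projected_gauss_seidel(A, b, iterations=50):
    """Solve the LCP  w = A @ lam + b,  lam >= 0,  w >= 0,  lam . w = 0
    with projected Gauss-Seidel. Row i reads lam[j] values that were already
    updated earlier in the same sweep; this serial dependency is what makes
    a naive parallel solve of all rows at once incorrect."""
    n = len(b)
    lam = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Residual of row i using the *latest* values of lam
            r = b[i] + sum(A[i][j] * lam[j] for j in range(n))
            # Gauss-Seidel step, then projection onto lam_i >= 0
            lam[i] = max(0.0, lam[i] - r / A[i][i])
    return lam


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 2.0]]
    print(projected_gauss_seidel(A, [-3.0, -3.0]))  # approximately [1.0, 1.0]
    print(projected_gauss_seidel(A, [-3.0, 5.0]))   # [1.5, 0.0]: second row is clamped
```

In the second call the unconstrained solution would need a negative impulse on row 1, so the projection clamps it to zero, which is the complementarity condition doing its job.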
  6. PARALLEL SOLVE
     • Split constraints into batches
     • Objects are dynamic
       – Contact pairs change every step, so batches cannot be precomputed
  7. BY INTRODUCING BATCHES
     • Constraints can now be solved in parallel
     • Batch creation is a serial process
       – The GPU needs parallelism
     • So batches have to be created in parallel
       – How?
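A serial way to build batches is a greedy coloring: each constraint goes into the first batch with which it shares no body. The sketch below (hypothetical function name; constraints given as `(body_a, body_b)` index pairs) shows why this is inherently serial: every assignment depends on the assignments made before it.

```python
def create_batches_serial(pairs):
    """Greedy serial batching: each constraint (body_a, body_b) goes into the
    first batch in which neither body has been used yet, so constraints in
    the same batch share no body and can be solved in parallel."""
    batch_of = []   # batch index assigned to each constraint
    used = []       # used[k] = set of bodies already referenced by batch k
    for a, b in pairs:
        k = 0
        # Walk the batches in order until one has both bodies free
        while k < len(used) and (a in used[k] or b in used[k]):
            k += 1
        if k == len(used):
            used.append(set())  # open a new batch
        used[k].update((a, b))
        batch_of.append(k)
    return batch_of


if __name__ == "__main__":
    print(create_batches_serial([(0, 1), (1, 2), (2, 3), (3, 0), (4, 5)]))
    # [0, 1, 0, 1, 0]
```

The loop over `pairs` cannot be split across threads without changing the result, because the batch chosen for a constraint depends on the `used` sets filled by all earlier constraints.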
  8. RIGID BODY SIMULATION ON THE GPU
     • Broad phase collision detection
       – Harada et al., Smoothed Particle Hydrodynamics on GPUs (2007)
       – Le Grand, Broad-phase Collision Detection with CUDA (2007)
       – Liu et al., Real-time Collision Culling of a Million Bodies on Graphics Processing Units (2010)
     • Narrow phase collision detection
       – Sathe, Rigid Body Collision Detection on the GPU (2006)
       – Harada et al., Real-time Rigid Body Simulation on GPUs (2007)
       – Kipfer, LCP Algorithms for Collision Detection using CUDA (2007)
     • Constraint solve
       – Harada, Real-time Rigid Body Simulation on GPUs (2007): penalty method
       – Harada, Parallelizing the Physics Pipeline (2009): batch creation is partially serialized
       – Tonge et al., PhysX GPU Rigid Bodies in Batman (2010): global atomics, many corner cases
  9. METHOD
  10. GOAL
     • Good performance == fit to the architecture
  11. DESIGNING FOR GPUS
     • 2 levels of parallelization
       – SIMD level
       – SIMD lane level
     • Sync
     • Share data
     • Less communication is better
       – Inter-SIMD
       – Inter-SIMD-lane
     • The best algorithm for CPUs is not always the best for GPUs
       – In-order vs. out-of-order execution
     [Figure: SIMDs connected to global memory]
  12. STRATEGY
     • 2-step batch creation
       – 1st step: global split
         Localize the problem by splitting pairs into disjoint sets
       – 2nd step: local batch creation
         Efficient local operations on data streamed from global memory
  13. GLOBAL SPLIT
     • Split the pairs by space
     • Procedures
       – Calculate the cell index for each pair
       – Reorder pairs by cell index (GPU radix sort)
     • Each group (cell) is independent
       – except for adjacent cells
  14. GLOBAL SPLIT
     • Split the pairs by space
     • Procedures
       – Calculate the cell index for each pair
       – Reorder pairs by cell index (GPU radix sort; ref: Introduction to GPU Radix Sort)
     • 4 independent sets of groups
     • 4 kernel dispatches
       – Green
       – Orange
       – Red
       – Blue
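The global split can be sketched as follows, under several assumptions that the slides do not state: a 2D uniform grid `grid_w` cells wide, non-negative coordinates, a pair's cell chosen from the midpoint of its two bodies, and a Python sort standing in for the GPU radix sort. Cells are grouped into 4 colors with a 2×2 checkerboard, so two cells of the same color are never adjacent, even diagonally; each color would correspond to one kernel dispatch. This relies on cells being large compared with the bodies. The function name is hypothetical.

```python
from collections import defaultdict


def global_split(pairs, positions, cell_size, grid_w):
    """Bin each (body_a, body_b) pair into a 2D grid cell, reorder the pairs
    by cell index, and group the cells into 4 colors (one dispatch each)."""
    def cell_of(pair):
        a, b = pair
        # Assumption: the pair's midpoint decides its cell
        mx = (positions[a][0] + positions[b][0]) / 2
        my = (positions[a][1] + positions[b][1]) / 2
        return int(my // cell_size) * grid_w + int(mx // cell_size)

    ordered = sorted(pairs, key=cell_of)  # stand-in for the GPU radix sort
    dispatches = defaultdict(lambda: defaultdict(list))
    for p in ordered:
        c = cell_of(p)
        cx, cy = c % grid_w, c // grid_w
        color = (cx & 1) | ((cy & 1) << 1)  # 0..3; same color => not adjacent
        dispatches[color][c].append(p)
    # color -> {cell index -> pairs in that cell}
    return {color: dict(cells) for color, cells in sorted(dispatches.items())}


if __name__ == "__main__":
    pos = {0: (0.5, 0.5), 1: (1.5, 0.5), 2: (2.5, 0.5),
           3: (3.5, 0.5), 4: (0.5, 2.5), 5: (1.0, 2.5)}
    print(global_split([(0, 1), (2, 3), (4, 5), (1, 2)], pos, 2.0, 2))
    # {0: {0: [(0, 1)]}, 1: {1: [(2, 3), (1, 2)]}, 2: {2: [(4, 5)]}}
```

Pairs inside one cell, such as (2,3) and (1,2) here, may still share a body; resolving that is left to the local batch creation step.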
  15. 1ST DISPATCH (GREEN)
  16. 2ND DISPATCH (ORANGE)
  17. 3RD DISPATCH (RED)
  18. 4TH DISPATCH (BLUE)
     [Figure on each of these slides: 4 SIMDs, each with 16 ALUs and an LDS (local data share)]
  19. PROBLEM SOLVED?
     • Not yet
     • Need a strategy for each SIMD (64 wide)
     • Solution
       – 2nd level: local batch creation
  20. LOCAL BATCH CREATION
     • Constraints are assigned to a SIMD
     • Q: How do we extract independent batches to utilize the SIMD?
     • Parallel batch creation doesn't work
  21–25. PARALLEL BATCH CREATION FAILURE CASE
     [Figure sequence over five slides: items labeled A–J and 0–8, progressively annotated with V, O and X marks]
  26. LOCAL ATOMICS BATCH CREATION
     • The parallel approach doesn't work
     • The serial approach is inefficient
     • Iterative parallel batch creation
       – Even if one shot doesn't work, the result improves after a few iterations
       – Needs frequent access to the pairs
         Utilize the fast LDS
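The iterative idea can be simulated on the CPU. In this sketch every lane owns one constraint, and the `owner` dictionary stands in for per-body slots in LDS. Which claim wins a contested body is an assumption here (the smallest lane index, as a local atomic min would give); the slides only say that local atomics are used. The function name and the `rounds` parameter are hypothetical. With one round the sketch reproduces the failure case, a lane blocked by another lane that itself lost; extra rounds let blocked lanes join the same batch.

```python
def batch_iterative_parallel(pairs, rounds=3):
    """Simulated SIMD batch creation: each lane tries to claim both bodies of
    its constraint; a lane joins the current batch only if it won both."""
    batch_of = [-1] * len(pairs)
    batch = 0
    while -1 in batch_of:
        locked = set()  # bodies already owned by the current batch
        for _ in range(rounds):
            owner = {}  # body -> winning lane this round (LDS stand-in)
            # Phase 1: all unassigned lanes with free bodies claim them "at once"
            for lane, (a, b) in enumerate(pairs):
                if batch_of[lane] == -1 and a not in locked and b not in locked:
                    for body in (a, b):
                        owner[body] = min(owner.get(body, lane), lane)
            # Phase 2: lanes that won both bodies join the current batch
            for lane, (a, b) in enumerate(pairs):
                if batch_of[lane] == -1 and owner.get(a) == lane and owner.get(b) == lane:
                    batch_of[lane] = batch
                    locked.update((a, b))
        batch += 1
    return batch_of


if __name__ == "__main__":
    chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(batch_iterative_parallel(chain, rounds=1))  # [0, 1, 2, 3]
    print(batch_iterative_parallel(chain, rounds=3))  # [0, 1, 0, 1]
```

With a single round, lane 2 loses body 2 to lane 1, which in turn lost body 1 to lane 0, so the first batch holds only one constraint. Extra rounds let lane 2 retry once body 2 is free, and the result matches the two batches that the serial greedy method finds on this chain.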
  27. WHAT IF A CELL HAS 1,000,000 PAIRS?
     • Obviously, it does not fit in LDS
     • Such cells are likely to occur
     • Stream the pairs to LDS
       – Like moving a window over a buffer
     • Procedures
       – Fill the local buffer with pairs
       – Iterative batch creation
       – Flush to global memory
     • This step only reorders constraints
       – No additional data output
     [Figure: over three iterations, pairs are filled ((1), (3), (5)) from the global constraint buffer into the local constraint buffer and batched ((2), (4), (6)) into the local batched buffer, which is then flushed ((7)) back to the global constraint buffer]
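The streaming procedure can be sketched with a fixed-capacity local list standing in for the LDS window. For brevity, each pass over the window extracts one batch of at most `simd_width` independent pairs with a greedy scan instead of the iterative parallel method; the function and parameter names are hypothetical. As on the slide, the output is only a reordering of the input pairs; here each pair is assumed to carry its batch index in its own record.

```python
def stream_batches(pairs, capacity, simd_width):
    """Reorder (body_a, body_b) pairs batch by batch through a local buffer
    of at most `capacity` entries. Returns the same pairs, reordered, each
    tagged with its batch index as (body_a, body_b, batch)."""
    local, out = [], []
    next_in, batch = 0, 0
    while next_in < len(pairs) or local:
        # Fill: top up the local buffer from the global pair list
        while len(local) < capacity and next_in < len(pairs):
            local.append(pairs[next_in])
            next_in += 1
        # Batch: greedily pick up to simd_width pairs that share no body
        used, rest, taken = set(), [], 0
        for a, b in local:
            if taken < simd_width and a not in used and b not in used:
                used.update((a, b))
                out.append((a, b, batch))  # written out in batch order
                taken += 1
            else:
                rest.append((a, b))  # stays in the window for a later batch
        local = rest
        batch += 1
    return out


if __name__ == "__main__":
    print(stream_batches([(0, 1), (1, 2), (2, 3), (3, 4), (5, 6)], 3, 2))
    # [(0, 1, 0), (2, 3, 0), (1, 2, 1), (3, 4, 1), (5, 6, 2)]
```

The first pair in the window is always accepted, so every pass makes progress and the window never needs to hold more than `capacity` pairs, however large the cell is.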
  28. SOLVE A GROUP
     • The pairs are already sorted by batch
     • But we need to know where the batch boundaries are
     • Procedures
       – Read pairs into a local dispatch buffer
       – Check boundaries
       – Parallel solve
       – Repeat until done
     • Previously, batch dispatch was managed entirely by the CPU
     • This moves the batch dispatch work to the GPU
     [Figure: constraint buffer divided into Batch0–Batch4, read in chunks of SIMD width]
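Because the constraints arrive sorted by batch, a batch boundary is simply a position where the batch index changes, and each element can test that against its left neighbor independently, which suits one lane per element. The sketch below assumes each constraint carries its batch index (a hypothetical `batch_ids` list) and splits every batch into chunks of at most `simd_width` constraints; chunks within one batch are independent, while batches must run one after another. The function names are hypothetical.

```python
def batch_boundaries(batch_ids):
    """Start offset of every batch in a batch-sorted constraint array.
    Each flag depends only on the element and its left neighbour, so every
    lane could compute its own flag in parallel."""
    flags = [i == 0 or batch_ids[i] != batch_ids[i - 1]
             for i in range(len(batch_ids))]
    return [i for i, f in enumerate(flags) if f]


def dispatch_schedule(batch_ids, simd_width):
    """For each batch, list (begin, end) chunks of at most simd_width
    constraints. A sync is needed between batches, not between chunks."""
    starts = batch_boundaries(batch_ids) + [len(batch_ids)]
    return [[(i, min(i + simd_width, e)) for i in range(s, e, simd_width)]
            for s, e in zip(starts, starts[1:])]


if __name__ == "__main__":
    ids = [0, 0, 0, 1, 1, 2]
    print(batch_boundaries(ids))      # [0, 3, 5]
    print(dispatch_schedule(ids, 2))  # [[(0, 2), (2, 3)], [(3, 5)], [(5, 6)]]
```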
  29. RESULTS
  30. PIPELINE
     • Copy body and pair buffers
     • The GPU allocates big buffers
       – Contact
       – Constraints
     • Narrow phase and solve are done on the GPU
     • No need to read back big buffers
     [Figure: CPU (broad phase collision, dispatch logic) and GPU (narrow phase collision, solve); buffers: body, pair, contact, constraint; pair merge; body and pair copies between CPU and GPU]
  31. MOVIE
  32. CONCLUSIONS
     • Presented a parallel constraint solver
       – 2-stage batch creation
       – Reduced number of dispatches from the CPU
       – The GPU does the dispatch by itself
     • Parallel iterative batch creation improved batch quality considerably
       – It surpassed the quality of single-threaded batch creation after a few iterations
     • Still room for improvement in SIMD utilization
     • Integrate GPU broad phase collision detection to complete the GPU pipeline
