Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
CUDA-based Linear Solvers for Stable Fluids
G. Amador and A. Gomes
Departamento de Informática
Universidade da Beira Interior
Covilhã, Portugal
m1420@ubi.pt, agomes@di.ubi.pt
April, 2010
Overview
The study of fluid simulation (e.g., water) is important for two industries: one that needs real-time rates (≥ 30 fps) and one that can work off-line (≤ 30 fps).
Problems:
How can CUDA-based versions of the Jacobi, Gauss-Seidel, and conjugate gradient iterative solvers be implemented, specifically for 3D stable fluids?
What are the real-time performance limitations of these solver implementations?
The Eulerian approach
Space partitioning: the domain is divided into a grid of cells.
Variations of velocity and density are observed at the center of each cell.
Velocities and densities are updated through an implicit method (Stam's stable fluids, 1999), which is unconditionally stable for any time step.
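As a minimal CPU-side sketch of the cell-centered grid described above, the following Python stores a 3D field in one flat list. The extra boundary layer, the unit domain length, and the names `make_grid`, `idx`, and `cell_center` are assumptions made for illustration; they are not taken from the slides.

```python
def make_grid(n, fill=0.0):
    """Flat list for an (n+2)^3 grid: n interior cells per axis plus one boundary layer."""
    return [fill] * ((n + 2) ** 3)


def idx(i, j, k, n):
    """Flat index of cell (i, j, k); each index runs from 0 to n+1."""
    s = n + 2
    return i + s * (j + s * k)


def cell_center(i, j, k, n, length=1.0):
    """Position of the center of interior cell (i, j, k), with 1 <= i, j, k <= n."""
    h = length / n  # cell width
    return ((i - 0.5) * h, (j - 0.5) * h, (k - 0.5) * h)
```

A flat layout like this maps directly onto a single linear buffer, which is also how a field would typically be handed to a GPU kernel.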
Physics Model
Navier-Stokes equations for incompressible fluids
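Mass conservation: $\nabla \cdot \vec{u} = 0$
Velocity evolution: $\frac{\partial \vec{u}}{\partial t} = -(\vec{u} \cdot \nabla)\vec{u} + v \nabla^2 \vec{u} + \vec{f}$
Density evolution: $\frac{\partial \rho}{\partial t} = -(\vec{u} \cdot \nabla)\rho + k \nabla^2 \rho + S$
$\vec{u}$: velocity field.
$v$: fluid's viscosity.
$\rho$: density of the field.
$k$: density diffusion rate.
$\vec{f}$: external forces added to the velocity field.
$S$: external sources added to the density field.
$\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right)$: gradient.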
Physics Model
Navier-Stokes equations implementation
Update velocity:
Add external forces ($\vec{f}$).
Velocity diffusion ($v \nabla^2 \vec{u}$).
Move ($-(\vec{u} \cdot \nabla)\vec{u}$ and $\nabla \cdot \vec{u} = 0$).
Update density:
Add external sources ($S$).
Density advection ($-(\vec{u} \cdot \nabla)\rho$).
Density diffusion ($k \nabla^2 \rho$).
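The density update listed above can be sketched on the CPU as follows. This is a 1D periodic toy in Python, not the CUDA implementation. Advection uses a semi-Lagrangian backtrace, the scheme of Stam's 1999 method; it is assumed here that the slides follow it. The names `add_source`, `advect`, and `density_step` are hypothetical, and the diffusion solver is passed in as a callable so that any iterative solver can be plugged in.

```python
import math


def add_source(rho, source, dt):
    """Add external sources: rho <- rho + dt * S."""
    return [r + dt * s for r, s in zip(rho, source)]


def advect(rho, u, dt, h=1.0):
    """Semi-Lagrangian advection on a periodic 1D grid.

    Each cell center is traced backwards along the velocity, and the old
    density is linearly interpolated at the departure point.
    """
    n = len(rho)
    out = [0.0] * n
    for i in range(n):
        x = i - dt * u[i] / h      # departure point, in cell units
        i0 = math.floor(x)
        t = x - i0                 # interpolation weight in [0, 1)
        out[i] = (1.0 - t) * rho[i0 % n] + t * rho[(i0 + 1) % n]
    return out


def density_step(rho, u, source, dt, diffuse=lambda r, dt: r):
    """One density update: sources, then advection, then diffusion."""
    rho = add_source(rho, source, dt)
    rho = advect(rho, u, dt)
    return diffuse(rho, dt)
```

Because the backtrace only interpolates between existing values, the advected field stays bounded for any time step, which is what makes this step unconditionally stable.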
Physics Model
Diffusion
Exchanges of density or velocity between neighbouring cells (illustrated in 2D).
Solve a sparse linear system ($Ax = b$) using an iterative method (e.g., Jacobi, Gauss-Seidel, or conjugate gradient).
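To make the three solvers concrete, here is a minimal CPU-side Python sketch on a 1D grid. It assumes the implicit diffusion system has the form $(1 + 2a)x_i - a(x_{i-1} + x_{i+1}) = b_i$, with $a$ standing for the scaled diffusion rate and zero values outside the grid. The 1D setting, the boundary treatment, and all function names are illustrative assumptions, not the CUDA kernels of the talk.

```python
def neighbours(x, i):
    """Sum of the left and right neighbours of cell i (zero outside the grid)."""
    left = x[i - 1] if i > 0 else 0.0
    right = x[i + 1] if i < len(x) - 1 else 0.0
    return left + right


def apply_A(x, a):
    """Matrix-free product A x for the stencil (1 + 2a) x_i - a (x_{i-1} + x_{i+1})."""
    return [(1 + 2 * a) * x[i] - a * neighbours(x, i) for i in range(len(x))]


def jacobi(b, a, iters):
    """Jacobi: every cell is updated from the previous iterate only."""
    x = [0.0] * len(b)
    for _ in range(iters):
        x = [(b[i] + a * neighbours(x, i)) / (1 + 2 * a) for i in range(len(b))]
    return x


def gauss_seidel(b, a, iters):
    """Gauss-Seidel: updates are made in place, so new values are reused at once."""
    x = [0.0] * len(b)
    for _ in range(iters):
        for i in range(len(b)):
            x[i] = (b[i] + a * neighbours(x, i)) / (1 + 2 * a)
    return x


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def conjugate_gradient(b, a, iters, tol=1e-24):
    """Conjugate gradient for the symmetric positive definite system A x = b."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:       # squared residual norm is small enough
            break
        Ap = apply_A(p, a)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def residual(x, b, a):
    """Largest absolute entry of b - A x."""
    return max(abs(bi - yi) for bi, yi in zip(b, apply_A(x, a)))
```

Jacobi reads only the previous iterate, so every cell can be updated independently. Gauss-Seidel reuses freshly updated values within a sweep, and conjugate gradient needs two dot products per iteration, each a reduction over the whole grid.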
Physics Model
Move
Ensure mass conservation and the fluid's incompressibility.
Hodge decomposition:
Mass-conserving (divergence-free) field = our field - gradient field
Determine the gradient field using the same iterative method as diffusion (e.g., Jacobi, Gauss-Seidel, or conjugate gradient).
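The decomposition above can be sketched as a CPU-side Python projection on a 2D periodic grid: compute the divergence of the field, solve $\nabla^2 p = \nabla \cdot \vec{u}$ with Gauss-Seidel sweeps, and subtract $\nabla p$. The periodic boundaries, the central differences, and the names `divergence` and `project` are assumptions for illustration only. With cell-centered values and central differences the result is only approximately divergence-free: the divergence is strongly reduced, not zeroed exactly.

```python
import math


def divergence(u, v, h=1.0):
    """Central-difference divergence of a cell-centered field on a periodic n x n grid."""
    n = len(u)
    return [[(u[(i + 1) % n][j] - u[i - 1][j]
              + v[i][(j + 1) % n] - v[i][j - 1]) / (2 * h)
             for j in range(n)] for i in range(n)]


def project(u, v, iters=200, h=1.0):
    """Subtract the gradient part of (u, v): solve lap(p) = div, then u <- u - grad(p)."""
    n = len(u)
    div = divergence(u, v, h)
    p = [[0.0] * n for _ in range(n)]
    for _ in range(iters):                       # Gauss-Seidel sweeps, in place
        for i in range(n):
            for j in range(n):
                nb = (p[(i + 1) % n][j] + p[i - 1][j]
                      + p[i][(j + 1) % n] + p[i][j - 1])
                p[i][j] = (nb - h * h * div[i][j]) / 4.0
    u_new = [[u[i][j] - (p[(i + 1) % n][j] - p[i - 1][j]) / (2 * h)
              for j in range(n)] for i in range(n)]
    v_new = [[v[i][j] - (p[i][(j + 1) % n] - p[i][j - 1]) / (2 * h)
              for j in range(n)] for i in range(n)]
    return u_new, v_new
```

The inner loop has the same neighbour-averaging shape as the diffusion solve, which is why one solver implementation can serve both steps.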
Conclusions
The CUDA-based implementation of the Gauss-Seidel solver allows more iterations than the CPU-based implementation, but it converges twice as slowly.
The CUDA-based implementations of the Jacobi and Gauss-Seidel iterative solvers achieved better performance (i.e., shorter processing times) than the CPU-based implementations.
The CUDA-based implementation of the conjugate gradient performs worse than the CPU-based version for grid sizes larger than $64^3$, due to global memory latency.
Future Work
Search for ways, implementable in CUDA, to reduce global memory accesses (e.g., data structures, dynamic memory).
Implement multi-core CPU-based versions of the solvers and compare their performance with the CUDA-based versions.
Search for new solvers, implementable in CUDA, that converge faster than relaxation techniques (Jacobi and Gauss-Seidel) without the significant extra computational effort of methods such as the conjugate gradient.