Project Dissertation

Real-Time Physically-Based
Fluid Simulation on the GPU
Valentin Hinov

Declaration of Originality and Permission to Copy
Author: Valentin Hinov
Title: Real-Time Physically-Based Fluid Simulation on the GPU
Degree: BSc (Hons) Computer Games Technology
Year: 2014
(i) I certify that the above mentioned project is my original work.
(ii) I agree that this dissertation may be reproduced, stored or transmitted, in any
form and by any means without the written consent of the undersigned.
Signature: .................................................................
Date: .................................................................

Abstract
Physically-based fluid simulation has long been reserved for the realm of offline
rendering. Continuing improvements in the parallel computational power of graphics
cards are bringing the opportunity to simulate these phenomena in real time. This
project aims to prove that, with certain simplifications and optimisations, fluid
simulation can be used in demanding applications such as games.
A framework is created for this project to present methods for calculating and
rendering fire and smoke using the parallel processing power of the graphics card
through the DirectX 11 Compute Shader APIs. The suggested approach takes into
consideration the importance of maintaining performance in a real-time application.
Various LOD (level of detail) and performance optimisation methods used in games
are adopted and modified for this purpose.
The most important variable for smooth gameplay is the frames-per-second (FPS)
that an application maintains. By keeping a constant measure of it, the framework
provides a means to monitor the stability and effectiveness of the implementation.
The results of this project show that proper adoption of LOD techniques, such
as frame skipping, can greatly reduce processing overhead. In addition, the
use of instancing techniques allows multiple fluids to be rendered at the cost
of simulating just one. This, together with careful texture management,
helps keep the memory and processing footprint low. Combined, these
provide an optimised solution for using physically-based fire and smoke in a real-time
setting, which maintains both accuracy and visual quality. Measurements show
that simulating three differently sized fluid domains (64x128x64, 40x80x40 and 30x60x30)
maintains an average frame rate of over 800 FPS on a high-tier graphics card, while still
managing a comfortable 50 FPS on a low-tier one.
Keywords: fluid simulation, performance, DirectX 11, Compute Shader

Preface
I would like to take this opportunity to express my gratitude for the support and help
I have received from my supervisor, Dr David MacTaggart, and my module tutor,
Dr Henry Fortuna. I would also like to thank Alex Dunn, who provided me with
valuable advice and constructive criticism.
I am also immensely grateful for the help and patience provided by Tsvetelina
Dacheva and the support of my parents during the long production hours on this
project.
- Valentin Hinov

Contents
Abstract
Preface
List of Figures
List of Tables
1 Introduction
1.1 Project Aim
1.2 Dissertation Structure
2 Background
2.1 Mathematics of Fluid Flow
2.1.1 Modelling the Simulation Space
2.2 State of Fluids in Games
2.3 LOD and Performance Overview
3 Literature Review
3.1 Early work on real-time solvers
3.2 The GPU advantage
3.3 3D Fluid Rendering
3.3.1 Particle Systems
3.3.2 Volume Ray Casting
4 Methodology
4.1 Introduction and Goals
4.1.1 Framework Architecture
4.1.2 Methodology Structure
4.2 Fluid Domain Representation
4.3 Setting up a Simulation
4.3.1 Choosing a Grid Size
4.3.2 Setup Optimisations
4.4 Running a Simulation
4.4.1 Simulation Steps
4.4.2 Runtime Modifications
4.4.3 Choosing an Update Rate
4.4.4 Frame Skipping
4.5 Rendering Fluids
4.5.1 Render Parameters
4.5.2 Fluid Instancing
5 Results and Discussion
5.1 Testing Setup
5.1.1 Hardware
5.1.2 Scene
5.2 Visual Results
5.2.1 Modifying Parameters
5.3 Memory & Performance Results
5.3.1 Memory Use
5.3.2 Performance
6 Conclusion and Future Work
6.1 Future Work
Appendix A Test GPUs Specifications
Appendix B CD Contents
References
Bibliography

List of Figures
1.1 Computational performance of Navier-Stokes equations on new NVidia GPUs
3.1 Advection step moving smoke density along a velocity field. As shown in Stam’s solver from 2003 (Stam, 2003)
3.2 Smoke being pushed and moving around by a gargoyle in "Hellgate: London"
4.1 Different grids, same render scale. Domain size from left to right: 16x32x16, 32x64x32, 64x128x64
4.2 Left: Using MacCormack for density and reaction only; Right: Using MacCormack for all fields
4.3 Different vorticity confinement strengths. Strength factors from left to right: 0, 0.5, 1.0
4.4 Users have the freedom to edit fluid control settings at runtime to observe their effects. Reaction values are not used for smoke simulation
4.5 Render settings modify the look of a fluid without changing its physical properties
4.6 Different sample rates of a 64x128x64 fluid from afar. Left: 32 samples; Right: 128 samples
4.7 Different sample rates of a 64x128x64 fluid from up close. Left to right: 32, 64, 128 samples
5.1 Looking at the entire final scene from a distance with all fluids in view
5.2 Smoke and fire simulation in the application
5.3 Left: Fast decaying fire, producing a lot of smoke; Mid: Strong fire, burning with nearly no smoke; Right: Average strength fire, producing blue smoke
5.4 Benchmark results on notebook computer using a NVidia GT 640M LE GPU
5.5 Benchmark results on gaming PC using an AMD Radeon R9 290X GPU
A.1 Technical Specifications of both graphics cards used for testing. The bandwidth and clock speeds are the key factors for performance

List of Tables
4.1 Texture Formats and their uses
5.1 Hardware used for testing
5.2 Video memory used for simulations of different resolution

Chapter 1
Introduction
Fluid simulation has been a hot topic in computer graphics, especially over the last
decade, in which the dramatic increase in computational power has affected not only the
CPU (Central Processing Unit) but also the much more parallel-focused GPU (Graphics
Processing Unit) (Gupta, 2011).
For virtual environments in games, a correct portrayal of natural phenomena, such
as smoke and fire, aids greatly in immersing the player in the world and making
it appear believable. However, realistic rendering and simulation of these fluids
requires a considerable amount of resources - both from a processing and from a
memory standpoint. In fact, when an extreme degree of accuracy is needed - for
example, to predict how a ship's design will handle at sea - high-performance computing (HPC)
centres are used and the calculations often take several months to complete.
Computer games, by their very nature, are all about interacting in real time
with a virtual world. As CPUs and graphics cards have become more powerful, the
expectations of how fast real-time is and how good the worlds look have increased.
Depending on the game, the accepted frame rate varies between 30 and 60 frames per
second (FPS). Drops below 30 FPS become instantly obvious, as the world seems to
experience "hiccups" and the action slows down. In fast-paced, twitch-based experiences,
such as first-person shooters or real-time strategy games, maintaining a frame rate of
60 FPS is often a requirement for smooth gameplay.
The challenge of integrating realistic fluid simulation in a virtual world, while adhering
to these requirements, is the main motivation behind this project. Let's start
by defining what a fluid is. A fluid is any substance that flows - meaning it can
take the shape of its container. This includes liquids, such as water, and gases, such
as air. Smoke can also be described as a fluid, although it is more accurate to say
that it is composed of tiny particulates suspended in a gas (Gourlay, 2012). Fire is
the chemical process of combustion, leading to the release of heat and light. As it
decays, smoke forms as a by-product.
In graphics, a fluid can be modelled as a grid of cells, each containing the
properties of the fluid at that location. The most important of these are velocity -
the speed and direction of the flow at a cell location - and density - the amount of
material that position contains. Every update step, the equation of motion is applied
to each cell and the values of the properties it contains change. It follows
that, depending on the grid size and quality of the simulation, this traversal and
update can be quite an expensive operation.
This is where the GPU advantage comes in - splitting up a task into a lot of smaller
parallel-running jobs is exactly what the hardware excels at. In fact, early 2D GPU
fluid dynamics experiments saw a performance increase of up to six times compared
to a CPU implementation (Harris, 2004). As GPGPU (General-purpose computing
on graphics processing units) has advanced with technologies such as CUDA and
DirectCompute (MSDN, 2010), speeds of up to ten times faster are becoming a
reality (NVidia, 2013).
Figure 1.1: Computational performance of Navier-Stokes equations on new NVidia
GPUs

Simply moving all calculations to the graphics card does not solve the problem fully.
What needs to be considered is that in a proper real-time interactive application,
the GPU will be engaged with many other activities - such as rendering polygons,
doing lighting calculations and others - meaning that fluid simulation and rendering
cannot be the only task occupying resources.
1.1 Project Aim
The objective of this project is to investigate the simulation of physically-based
fluids by taking advantage of modern GPU hardware with the aim of answering the
following question:
How can the parallel processing advantage of modern graphics cards be
used for simulating physically-based fluids, and how can this approach
be adapted for real-time use?
The main consideration during this investigation will be what simplifications can
be made when simulating - both when setting up and during runtime - in order to
obtain a result which is both graphically impressive and computationally efficient.
The project hopes to achieve the following:
• Derive an effective way of utilising the GPU for solving the equations of fluid
motion in 3D.
• Discover what level of detail methods and performance optimisations can be
applied in order to use fewer system resources.
• Draw conclusions and recommendations for further research into this area.
Over the course of this undertaking, an experimental framework will be developed
to showcase the research discoveries. It will also be used to gather quantitative data
in order to make the appropriate conclusions as to the effectiveness of the provided
solutions.

1.2 Dissertation Structure
Chapter 2 gives additional background on the main topics of discussion: the mathematical
formulas describing fluid motion and the current state of fluid representation in
games. It ends with a review of how level of detail is used in games to increase
performance and how existing techniques can be adapted to fluid simulation.
Chapter 3 presents past research in the area of fluid simulation - starting from
work on early real-time solvers and moving on to research into using the GPU. It
also discusses past work on ways of rendering fluids as well as research into integrating
level of detail in fluid simulation.
Chapter 4 describes this project's implementation of a physically-based incompressible
Navier-Stokes solver. It also discusses the optimisations used for setting up and
running a simulation. It ends with how fluid rendering is handled.
Chapter 5 analyses the results and data collected from the experimental framework
and explores their implications.
Chapter 6 concludes this dissertation and makes recommendations for future work
that could be undertaken.

Chapter 2
Background
2.1 Mathematics of Fluid Flow
For understanding of the mechanics behind fluid dynamics, knowledge of differential
vector operations is expected. Gradient $\nabla f$, divergence $\nabla \cdot \vec{v}$, curl $\nabla \times \vec{v}$, directional
derivative $\vec{v} \cdot \nabla f$ and the Laplacian $\nabla^2$ are all used in the Navier-Stokes Equations
which are described below.
$$\frac{\partial \vec{u}}{\partial t} = -(\vec{u} \cdot \nabla)\vec{u} - \frac{1}{\rho}\nabla p + \nu \nabla^2 \vec{u} + \vec{F} \qquad (2.1)$$
$$\nabla \cdot \vec{u} = 0 \qquad (2.2)$$
Where $\vec{u}$ is the velocity of the fluid; $p$ is the pressure; $\rho$ is the density; $\nu$ controls
the viscosity of the fluid and $\vec{F}$ encapsulates all external forces acting on it.
Equation (2.1) is known as the momentum equation of fluid flow. It is derived from
Newton’s 2nd law of motion, which means it describes the acceleration of the fluid
due to the forces acting on it. From left to right, the terms on the right-hand side
represent advection, pressure, diffusion and external forces. When dealing with complex
media it is common to make simplifying assumptions so as to more easily model the
problem. Thus, the fluid is assumed to be incompressible and homogeneous.
Equation (2.2), the continuity equation, enforces the incompressibility assumption by
ensuring that the fluid always has zero divergence, meaning that the volume of the
fluid remains constant over time.
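To make the zero-divergence condition concrete, the following is a minimal C++ sketch of how divergence might be estimated on a discrete 2D grid using central differences. The `Field2D` type and the `divergence` function are illustrative names only and are not part of the project framework, which performs its calculations in GPU shaders.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2D velocity field on an nx-by-ny grid with cell spacing h.
struct Field2D {
    std::size_t nx, ny;
    float h;
    std::vector<float> u, v; // x and y velocity components, one value per cell

    Field2D(std::size_t x, std::size_t y, float spacing)
        : nx(x), ny(y), h(spacing), u(x * y, 0.0f), v(x * y, 0.0f) {}

    // Flatten (i, j) into a linear index, x varying fastest.
    std::size_t at(std::size_t i, std::size_t j) const { return i + nx * j; }
};

// Central-difference estimate of the divergence at an interior cell
// (requires 1 <= i < nx - 1 and 1 <= j < ny - 1).
float divergence(const Field2D& f, std::size_t i, std::size_t j) {
    const float dudx = (f.u[f.at(i + 1, j)] - f.u[f.at(i - 1, j)]) / (2.0f * f.h);
    const float dvdy = (f.v[f.at(i, j + 1)] - f.v[f.at(i, j - 1)]) / (2.0f * f.h);
    return dudx + dvdy;
}
```

A uniform velocity field gives a divergence of zero everywhere, whereas a field that speeds up along its own direction gives a positive value, which an incompressible solver has to remove.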

The Navier-Stokes equations are commonly used because they precisely describe the
evolution of a velocity field over time given its current state and other forces (Stam,
2003). The key task of a fluid solver is to compute a numerical approximation of $\vec{u}$.
This velocity field later controls the visual phenomena of the fluid - smoke density
or fire reaction values for example.
2.1.1 Modelling the Simulation Space
Fluids are typically modelled in one of two ways - as a field or as a particle system.
These are referred to as the Eulerian and Lagrangian viewpoints, respectively. The
first considers the fluid as a region of points - each containing properties like velocity
and density. These values change with time, but the points containing them stay
fixed in space. The Lagrangian viewpoint takes the more conventional approach of
modelling the continuum as a set of particles. Each particle, in addition to carrying
with it the properties of the fluid, has a position component. The easiest way to
visualise this is to think of the particles as molecules of fluid that move in time. The
most common way of representing Eulerian fluids is as an arrangement of voxels, while
Lagrangian ones are usually represented as classic particle systems. A more in-depth
description of these viewpoints can be found in (Bridson, 2008).
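The difference between the two viewpoints can be sketched as data structures. The C++ below is only an illustration under assumed names (`EulerianGrid`, `Particle`, `advance`); it is not the representation used by the framework, which stores its grid in GPU textures.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x = 0.0f, y = 0.0f, z = 0.0f; };

// Eulerian view: samples sit at fixed grid points; only the stored values change.
struct EulerianGrid {
    std::size_t nx, ny, nz;
    std::vector<Vec3>  velocity; // one entry per voxel
    std::vector<float> density;  // one entry per voxel

    EulerianGrid(std::size_t x, std::size_t y, std::size_t z)
        : nx(x), ny(y), nz(z), velocity(x * y * z), density(x * y * z, 0.0f) {}

    // Voxel (i, j, k) always maps to the same slot, i.e. the same place in space.
    std::size_t index(std::size_t i, std::size_t j, std::size_t k) const {
        return i + nx * (j + ny * k);
    }
};

// Lagrangian view: each particle carries its properties and its own position.
struct Particle {
    Vec3  position;
    Vec3  velocity;
    float density = 0.0f;
};

// Move a particle with its own velocity over a time step dt (explicit Euler).
void advance(Particle& p, float dt) {
    p.position.x += p.velocity.x * dt;
    p.position.y += p.velocity.y * dt;
    p.position.z += p.velocity.z * dt;
}
```

In the grid case the cost scales with the number of voxels (a 64x128x64 domain has 524,288 of them), whereas in the particle case it scales with the number of particles.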
2.2 State of Fluids in Games
Fluid simulation in games covers a wide range of phenomena - the most important
of which are water, smoke and fire. As this project mainly deals with the latter two,
they will be the focus of discussion. For a more detailed look at the state of water
in games, please refer to (Barrett, 2012).
3D games have to do an impressive amount of work to provide an immersive experience.
During each update call, physics, pathfinding, lighting, rendering and other
calculations have to be computed. It is no surprise that developers look to simplify
effects whenever they can. Particle systems have long been used to
model smoke. The particles are just 2D textured sprites which always face the virtual
camera. Fire is often rendered the same way with the addition of static animations.
With improvements in lighting and particle control, the look of these effects does
visibly improve, but at its core the simulation is not based on physical properties
and is instead determined by design tools.

An additional drawback of this representation is that it makes proper interaction
with the fluids very difficult to achieve. In contrast, as is discussed later, in a suitably
defined Navier-Stokes solver boundary conditions are part of the simulation process
and can be used as a means of accurate interaction with the system.
2.3 LOD and Performance Overview
The need to render many different graphical effects on a system with limited resources
is a widely explored challenge. Various techniques are often employed
to save processing time and system memory while still maintaining good graphical
quality. It is worthwhile to study some of these methods with the view of how they
might be adopted for fluid simulation.
A common performance boost when rendering 3D polygon mesh objects involves
reducing the number of polygons they are made of (Valve, 2012). There are
a variety of ways to accomplish this - either with pre-made low-poly versions
of the mesh or via a procedural method at runtime, often using GPU shaders.
A system is then set up to intelligently swap or blend between different versions
based on various parameters, such as the distance of the object from the camera. The
end result is less graphics bandwidth used and fewer computations made. If done
properly, the player never notices it.
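As a rough sketch of the swapping logic described above, a level of detail can be picked by comparing the camera distance against a list of switch distances. The function name `selectLod` and the threshold values used with it are assumptions made for illustration; real engines typically add blending and hysteresis on top of this.

```cpp
#include <cstddef>
#include <vector>

// Pick a level of detail from the camera distance.
// `thresholds` holds ascending switch distances chosen by the caller;
// level 0 is the most detailed version, higher levels are progressively simpler.
std::size_t selectLod(float distance, const std::vector<float>& thresholds) {
    std::size_t level = 0;
    while (level < thresholds.size() && distance >= thresholds[level]) {
        ++level;
    }
    return level;
}
```

With thresholds of 10 and 30 units, for example, an object 5 units away uses level 0, one 10 units away uses level 1 and one 50 units away uses level 2.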
Level of detail has uses other than object rendering. It also has a place in
complex calculations such as rigid body dynamics. When simulating collisions between
bodies, for example, a simplified, less realistic calculation can take place if the
player is not looking at the objects in question. Whatever the player is looking at
needs to behave consistently, but objects outside this direct area are less important
and approximations can be used.
A common occurrence in games is the need to render the same object
many times. The cost of creating copies of the required resources, like vertex buffers and
textures, can quickly add up. Instead, the same graphical data is used to render
multiple copies of the object where required (Carucci, 2005). Since transferring triangle
data from the CPU to the GPU and submitting state changes is a relatively
slow operation, instancing is a method that frees up valuable CPU processing time.
Batching as many draw calls together as possible is an often advised method of
optimising game renderers.

Chapter 3
Literature Review
Investigations into the physics behind fluid simulation date back to the 18th and
19th centuries, when the mathematicians Euler, followed by Navier and Stokes, developed
the basics of analytical solutions to fluid flows. With the start of the computer
processing era came the possibility to calculate solutions to these equations numerically.
Far from the idea of real-time applications, however, early research into the
topic focused on engineering applications, striving for accuracy and not factoring in
time taken (Hess and Smith, 1967).
3.1 Early work on real-time solvers
In the late 90s and early 2000s the prospect of real-time simulations started to be
actively discussed in research fields. Up to this point the majority of the investigation
had been into offline graphical solvers. Also, the majority of numerical
solvers by that point used explicit techniques, which suffer from instability unless a
small simulation time step is used. It was Jos Stam who, at the 1999 SIGGRAPH
conference, proposed an implicit Navier-Stokes solver that was stable under larger
time steps and was fast enough for results to be viewed instantly (Stam, 1999).
The importance of this paper stems from the fact that the method it put forward
was designed to be used in real-time. Not only that, but this approach allows for
boundary conditions to be dynamic and, as such, opens the door to interactivity
with the fluid. For game applications this is key. The resultant technique is very
successful in simulating gaseous-type fluids and will influence this study.

Figure 3.1: Advection step moving smoke density along a velocity field. As shown
in Stam’s solver from 2003 (Stam, 2003)
Stam’s initial proposal has downsides, though. Namely, it suffers from numerical
dissipation (also known as numerical diffusion or smoothing). This, as described by
(Bai and Turk, 2005), is due to the averaging operations performed when interpolating
values in the numerical solvers for the differential equations. Stam’s method
experiences this because of the low-order accuracy of its advection routine. The
dissipation not only tends to smooth out interesting features, like vortices in the
fluid, but also makes the fluid appear too viscous.
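The smoothing effect can be reproduced with a very small example. The following C++ sketch performs one semi-Lagrangian advection step in 1D with linear interpolation and a constant velocity; it is a simplified stand-in for the 2D and 3D solvers discussed here, and the function names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Linearly interpolate q at fractional index x, clamping to the grid (q must be non-empty).
float sampleLinear(const std::vector<float>& q, float x) {
    const float maxX = static_cast<float>(q.size() - 1);
    x = std::min(std::max(x, 0.0f), maxX);
    const std::size_t i0 = static_cast<std::size_t>(std::floor(x));
    const std::size_t i1 = std::min(i0 + 1, q.size() - 1);
    const float t = x - static_cast<float>(i0);
    return (1.0f - t) * q[i0] + t * q[i1];
}

// One semi-Lagrangian step with constant velocity `vel` (cells per unit time):
// trace each cell centre backwards in time and sample the old field there.
std::vector<float> advectSemiLagrangian(const std::vector<float>& q, float vel, float dt) {
    std::vector<float> out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        out[i] = sampleLinear(q, static_cast<float>(i) - vel * dt);
    }
    return out;
}
```

Advecting a single spike of density 1 by half a cell leaves two neighbouring cells holding 0.5 each: the total is preserved but the peak is halved by the interpolation, which is the averaging effect described above.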
With this problem in mind, Fedkiw et al. (Fedkiw et al., 2001) presented a seminal
paper in the 2001 SIGGRAPH proceedings. In Visual Simulation of Smoke
the incompressible Euler equations are used as the fluid solver on a staggered grid
arrangement. They are combined with a new method called vorticity confinement
(Steinhoff and Underhill, 1994), which injects back the energy lost due to numerical
dissipation, effectively balancing out the simulation. The result is that, even on a fairly
coarse grid, the aforementioned interesting features, such as swirling vortices in the
smoke field, are preserved and the overall lifespan of the smoke is improved. Like
Stam’s proposed method, this one is stable for large time steps and allows for dynamic
boundaries. The downside of this procedure is that it introduces an extra
computational step in the algorithm. The step itself is not overly expensive and
greatly enhances the simulation’s look, so it will be considered for this research.
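For reference, the vorticity confinement force as it is usually stated following (Fedkiw et al., 2001) can be written as below. The symbols $\epsilon$ (a user-chosen strength factor) and $h$ (the grid cell size) follow that paper's common presentation rather than anything defined earlier in this dissertation.

```latex
\begin{aligned}
\vec{\omega} &= \nabla \times \vec{u}
  && \text{(vorticity of the velocity field)} \\
\vec{N} &= \frac{\nabla |\vec{\omega}|}{\left| \nabla |\vec{\omega}| \right|}
  && \text{(unit vector pointing towards higher vorticity)} \\
\vec{f}_{\mathrm{conf}} &= \epsilon \, h \, (\vec{N} \times \vec{\omega})
  && \text{(confinement force added to } \vec{F} \text{)}
\end{aligned}
```

Setting $\epsilon = 0$ disables the effect, while larger values reinforce existing vortices more strongly.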

3.2 The GPU advantage
At the SIGGRAPH 2003 conference the power of the GPU was the topic of discussion.
Krüger and Westermann demonstrated that the parallelism of graphics
processors can be used to implement matrix solvers and to handle finite difference equations
for PDE approximations (Krüger and Westermann, 2003). On an ATI 9800 card an
interactive, visualised 2D Navier-Stokes solution ran at 9 FPS (frames per second)
on a 1024x1024 grid. In contrast, the CPU solvers provided by (Fedkiw et al., 2001)
need more than a second per frame on a similarly sized domain. The advantage of
the GPU became obvious and fluid dynamics research reflected that.
In 2004, as part of the GPU Gems book (Fernando et al., 2004), Harris wrote a
chapter entitled Fast Fluid Dynamics on the GPU (Harris, 2004). He described
a method, based on Stam’s Stable Fluids technique, that offloaded all equation
of motion calculations to the graphics card and produced a very fast interactive
2D Navier-Stokes solver. He successfully demonstrated how the grid data can be
translated into textures and how pixel shaders, which run simultaneously on each
pixel every render call, can be used to calculate the simulation. To solve the Poisson-pressure
equation he used a Jacobi iteration scheme which, compared to the Krüger
and Westermann conjugate gradient and multigrid solvers, converges more slowly but is
simple to implement and makes easy use of parallel calculations. Harris details how
his approach can be easily extended to allow for arbitrary boundaries (indeed, it is
just an addition of a texture that contains them to each shader). This GPU Gems
chapter provides a straightforward introduction into using shaders as a fluid solver
and will influence the early investigation of this study. Harris also describes a means
to extend the domain into 3D by layering 2D textures, but as current technology
allows for easy use of 3D Textures, it will not be considered.
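The Jacobi scheme mentioned above can be illustrated with a short CPU-side sketch. The C++ below solves the 2D Poisson equation for pressure on a small grid with unit cell spacing and zero pressure held on the boundary; these are simplifying assumptions for the example, and the function name is hypothetical rather than taken from Harris' chapter or from this framework.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Jacobi iterations for  laplacian(p) = div  on an n-by-n grid (unit spacing).
// `div` must hold n * n values. Boundary cells are kept at zero pressure.
std::vector<float> solvePressureJacobi(const std::vector<float>& div,
                                       std::size_t n, int iterations) {
    std::vector<float> p(n * n, 0.0f), next(n * n, 0.0f);
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t j = 1; j + 1 < n; ++j) {
            for (std::size_t i = 1; i + 1 < n; ++i) {
                const std::size_t c = i + n * j;
                // Each cell reads only the previous iterate, so every cell
                // could be updated in parallel (one shader invocation each).
                next[c] = (p[c - 1] + p[c + 1] + p[c - n] + p[c + n] - div[c]) * 0.25f;
            }
        }
        std::swap(p, next); // ping-pong between the two buffers
    }
    return p;
}
```

Because each update depends only on values from the previous iteration, the inner loops map directly onto a per-pixel or per-voxel shader, which is the property that makes the method attractive on the GPU despite its slow convergence.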
As graphics hardware saw a staggering growth, in 2007 Harris’ work was built upon
in GPU Gems 3 (Nguyen et al., 2007). In the chapter Real-Time Simulation and
Rendering of 3D Fluids (Crane et al., 2007) the authors extend GPU simulation of
real-time fluid dynamics into the 3D domain. Their example program successfully
simulated either fire, smoke or water in a 70x70x100 grid. Additionally, using the
powerful Direct3D 10 support for 3D textures and the brand new geometry shader
functionality, their method allowed any 3D object to be voxelised and used as a
dynamic boundary for the simulation. Results of this can be seen in the game
Hellgate: London (Studios, 2007), which utilises this procedure.

Figure 3.2: Smoke being pushed and moving around by a gargoyle in Hellgate:
London
In the 2004 GPU Gems chapter, Harris uses a semi-Lagrangian backward advection
step that is based upon the one used by Stam (Stam, 1999) and as such suffers from
numerical smoothing. Crane et al. address this issue by utilising a MacCormack
scheme, which is a higher-order accuracy advection solver, in addition to vorticity
confinement. While this introduces two intermediate semi-Lagrangian steps in the
advection process and is not unconditionally stable, it allows for better visual
fidelity of the final result without increasing the grid resolution. This saves memory
bandwidth at the expense of more computation, but in the chapter’s words "math is
cheap compared to bandwidth". As this project will be looking into 3D smoke and
fire simulation by utilising graphics hardware, this work will be used for reference.
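A 1D sketch of the MacCormack idea is given below in C++: a forward semi-Lagrangian step, a backward step to estimate the error, a correction by half of that error, and a clamp to the values that were interpolated so that no new extrema appear. The constant velocity, the helper names and the 1D setting are simplifications assumed for this example and do not reproduce the shader code of (Crane et al., 2007).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Field = std::vector<float>;

// Clamp a fractional sample position to the valid index range (q must be non-empty).
float clampToGrid(const Field& q, float x) {
    return std::min(std::max(x, 0.0f), static_cast<float>(q.size() - 1));
}

// Linear interpolation of q at fractional index x.
float sampleLinear(const Field& q, float x) {
    x = clampToGrid(q, x);
    const std::size_t i0 = static_cast<std::size_t>(std::floor(x));
    const std::size_t i1 = std::min(i0 + 1, q.size() - 1);
    const float t = x - static_cast<float>(i0);
    return (1.0f - t) * q[i0] + t * q[i1];
}

// Plain semi-Lagrangian step with constant velocity (cells per unit time).
Field advect(const Field& q, float vel, float dt) {
    Field out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i)
        out[i] = sampleLinear(q, static_cast<float>(i) - vel * dt);
    return out;
}

// MacCormack step built from two semi-Lagrangian passes plus a limiter.
Field advectMacCormack(const Field& q, float vel, float dt) {
    const Field forward  = advect(q, vel, dt);        // predicted new field
    const Field backward = advect(forward, -vel, dt); // predicted field advected back
    Field out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        // Add back half of the round-trip error.
        const float corrected = forward[i] + 0.5f * (q[i] - backward[i]);
        // Limiter: clamp to the two samples used by the forward interpolation.
        const float x = clampToGrid(q, static_cast<float>(i) - vel * dt);
        const std::size_t i0 = static_cast<std::size_t>(std::floor(x));
        const std::size_t i1 = std::min(i0 + 1, q.size() - 1);
        const float lower = std::min(q[i0], q[i1]);
        const float upper = std::max(q[i0], q[i1]);
        out[i] = std::min(std::max(corrected, lower), upper);
    }
    return out;
}
```

For a single spike of density 1 advected by half a cell, the plain step leaves a peak of 0.5 while this sketch retains 0.75, at the cost of the two extra advection passes mentioned above.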
3.3 3D Fluid Rendering
Graphics cards are optimised for rendering polygons and especially triangles. This
must be taken into account when it comes to displaying volumetric data, such as
smoke, as there is no native way of rendering volume.
3.3.1 Particle Systems
In almost all early real-time fluid solvers (Stam, 2003) and many modern ones
(Gourlay, 2012) and (McGuire, 2006), the approach is to use particles. This has
the initial advantage of using an already established system that is common in
games and other graphical applications. Lagrangian or semi-Lagrangian schemes
are used to represent the domain. In the example from Gourlay, there are two types
of particles. The first are called vortex particles (or vortons). They are used to
represent the flow field and are free to move anywhere. The second type are just
regular particles, used for visualisation of the effect. They change their colour and
opacity state, depending on the vortons around them.

Particle systems are a natural fit for a Lagrangian scheme and come
with the advantage that the simulation space can be global, rather than constrained to a
fixed grid size. The disadvantage when using particles to visualise the fluid is the CPU
and GPU memory and processing overhead of storing and updating all of them. The
finer the detail level required, the more particles need to be used. Another downside
is that an unconstrained, dynamic simulation space is difficult to implement when
using the GPU.
3.3.2 Volume Ray Casting
The other main rendering technique which has gained more traction in GPU fluid
solvers is called ray-marching (or volume ray casting). This is the approach used by
(Crane et al., 2007) and (Zhou et al., 2007). This method works by considering the
fluid as a box, made up of many voxels, which contain the fluid properties. When
rendering, rays are traced from the point of view to the volume. The rays are then
marched through the domain with a predefined sample rate, accumulating colour
based on what the volume contains - smoke density, for example. This ends either
when enough density to get a fully saturated colour has been collected, or the ray
exits the volume. Usually, to get a decent visual result, a step size equal to half a
voxel is used when marching. Results of using volume ray casting can be seen in
Figure 3.2.
There are certain problems present in ray-marching. As discussed by (Crane et al.,
2007), banding is a visible artifact that appears if the sample step is too big or grid
resolution is too small. It is mostly prevalent when looking at the fluid up close.
There are certain ways to compensate for this - by using a smaller sample step, or
taking an extra sample at each step. These both come at an additional computational
cost, though.
Ray-marching fits very well with the Eulerian and semi-Lagrangian schemes as it
considers the simulation domain as a fixed grid with changing properties. It is
also a technique that is inherently parallel and can naturally be implemented using
GPU pixel shaders. As this work will focus entirely on leveraging the graphics card,
volume ray casting will be the rendering method considered.
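The accumulation performed along a single ray can be sketched as follows. This C++ example assumes the density samples along the ray have already been fetched and are non-negative, and it uses a hypothetical `absorption` factor and a 0.99 opacity cut-off chosen for illustration; in practice this loop would run in a pixel shader that samples the 3D texture at each step.

```cpp
#include <cstddef>
#include <vector>

struct RayResult {
    float alpha;       // accumulated opacity in [0, 1]
    std::size_t steps; // number of samples actually taken
};

// March front to back through density samples taken along one ray.
// `absorption` scales a density sample into a per-sample opacity.
RayResult marchRay(const std::vector<float>& densities, float absorption) {
    RayResult r{0.0f, 0};
    for (float d : densities) {
        float a = d * absorption;
        if (a > 1.0f) a = 1.0f;
        r.alpha += (1.0f - r.alpha) * a; // front-to-back blending
        ++r.steps;
        if (r.alpha >= 0.99f) break;     // stop early once nearly opaque
    }
    return r;
}
```

The early exit corresponds to the case where enough density has been collected to saturate the colour; otherwise the loop ends when the samples run out, that is, when the ray leaves the volume.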

Chapter 4
Methodology
4.1 Introduction and Goals
Upon completing the research into this topic and revisiting the major project aims,
implementation goals are set. To recap - the main objective of this project is to
provide an effective way of performing fluid simulation on the GPU and adapt it to
be used at runtime. With this in mind, the framework created is tasked with fulfilling
the following:
• Support simulation of both fire and smoke.
• Compute and render at least 2 different fluids at the same time.
• Maintain a frame rate of at least 30 FPS on low-to-mid tier graphics cards
and 60 FPS on high-end ones.
• Showcase improved application performance by using one or more LOD techniques.
4.1.1 Framework Architecture
The framework developed for this project targets the Windows 7+ operating systems.
It uses Direct3D for rendering and is implemented in C++. For graphics
card operations it makes use of the powerful DirectCompute API along with HLSL
(High-Level Shading Language) for writing compute and pixel shaders.

4.1.2 Methodology Structure
The methodology starts by describing how the fluid domain is represented. Then,
it covers the setup of a simulation and the optimisations that it makes use of. What
follows is a description of the process of running a fluid simulation and the
techniques that are used to control it at runtime. The methodology concludes with
how fluids are rendered. Please note that the terms fluid simulation and fluid object
will be used interchangeably.
4.2 Fluid Domain Representation
In order to solve the fluid equations of motion numerically, the domain must be
discretized into computational points that the solver works with. As discussed in
the background chapter, an Eulerian representation models a domain as a grid of
points that contain the properties of the fluid. This is the approach this framework
utilises, mainly because 3D Eulerian grids map logically
to voxels in 3D textures, which contain the data the GPU requires. The following is
part of a function which creates a 3D texture that can hold four 16-bit floating-point
numbers per voxel:
D3D11_TEXTURE3D_DESC textureDesc;
textureDesc.Width = SizeX;
textureDesc.Height = SizeY;
textureDesc.Depth = SizeZ;
textureDesc.Format = DXGI_FORMAT_R16G16B16A16_FLOAT;
Here SizeX, SizeY and SizeZ vary depending on the size of the required domain
bounds. The format can also be changed. Each fluid uses a number of equally sized
textures to represent its state and properties. These are stored on and used by
the graphics card. The equations of motion are calculated by running computational
kernels (implemented as shader programs) over the textures.
4.3 Setting up a Simulation
The first necessary requirement when creating a new fluid object is to determine
its grid dimensions. Textures of this size are then created for the various fluid
properties. Each texture is then used to create a ShaderParams structure, outlined
below:
struct ShaderParams {
    CComPtr<ID3D11ShaderResourceView> mSRV;   // read-only view, bound as a shader input
    CComPtr<ID3D11UnorderedAccessView> mUAV;  // writable view, bound as a shader output
};
Generally, a ShaderResourceView (SRV) is used as an input to a shader program, as
it can only be read from, and an UnorderedAccessView (UAV) is used as an output,
as it can be written to. A more detailed description of Direct3D resource interfaces
can be found in (MSDN, 2014). For all fluid properties except divergence, vorticity
and obstacles, 2 textures and 2 ShaderParams structures are created. This is due to
the need to keep track of the fluid state at the previous time step in order to evaluate
the new one.
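The double-buffering idea can be sketched on the CPU as follows. This is only an illustration under stated assumptions: PingPong, read, write, swap and addConstant are hypothetical names that do not come from the framework, and plain vectors stand in for the pair of 3D textures and their SRV/UAV views.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a double-buffered fluid property.
// In the framework each side would be a 3D texture with an SRV and a UAV.
struct PingPong {
    std::vector<float> buffers[2];
    int readIndex = 0;  // buffer holding the state of the previous time step

    explicit PingPong(std::size_t cellCount) {
        assert(cellCount > 0);
        buffers[0].assign(cellCount, 0.0f);
        buffers[1].assign(cellCount, 0.0f);
    }

    const std::vector<float>& read() const { return buffers[readIndex]; }
    std::vector<float>& write() { return buffers[1 - readIndex]; }

    // After a step, the freshly written buffer becomes the next input.
    void swap() { readIndex = 1 - readIndex; }
};

// Illustrative "kernel": reads the old state, writes the new one, then swaps.
inline void addConstant(PingPong& property, float amount) {
    const std::vector<float>& in = property.read();
    std::vector<float>& out = property.write();
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] + amount;
    }
    property.swap();
}
```

Because a step never reads from the buffer it writes to, every cell sees a consistent previous state, which is what a parallel kernel needs.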
The type of the fluid is also chosen during this step. The framework
allows for two kinds: fire and smoke. While simulating both is nearly identical, fire
simulation requires 2 extra textures to keep track of the fire reaction values (these
determine the intensity of the fire at each cell and are used when rendering it).
At the end of the setup, static boundary conditions are initialised. As fluids are
modelled as being in a box domain, boundary conditions are modelled as a single-cell
wide obstacle along each wall of the box and are stored in an 8-bit texture.
4.3.1 Choosing a Grid Size
Grid size has the biggest effect on how fast a simulation step is processed. A
32x64x32 domain, for example, will be evaluated more than 3 times faster than
a 64x128x64 one, which has 8 times as many cells. This is both because there are
fewer cells to process and because smaller textures need less memory bandwidth.
The render size of the fluid is independent of its simulation domain. This means
that both a high and a low-resolution grid can be rendered at the same size. Upscaling
the render size of a coarse grid can lead to visible artifacts, as there are
fewer cells to sample for a good appearance. Similarly, rendering a fine grid at a smaller
scale is potentially wasteful, as the increased detail is harder to spot.
Figure 4.1: Different grids, same render scale. Domain size from left to right:
16x32x16, 32x64x32, 64x128x64
When setting up a simulation, it is best to first determine the render scale required
and use that to choose the appropriate grid resolution in order to achieve the detail
needed. It is worth noting that dynamically resizing textures at runtime is not
feasible, so grid sizes stay fixed throughout execution.
4.3.2 Setup Optimisations
Memory is a key factor during setup since, depending on the fluid, a simulation can
use up to 15 textures (due to double buffering) to keep all the required data for its
state. These all have to be stored in memory and bound to and unbound from the graphics
pipeline every frame. Using many fluid objects risks bottlenecking the
GPU and starving it of memory.
Texture Formats
Direct3D offers an expansive range of texture formats that can be used
on the GPU. Choosing the correct format, based on the number of components needed
and their size, helps reduce the video memory used and keeps texture bandwidth
low, increasing performance. This is the first place for potential optimisations.
When constructing fluid object textures, the format chosen is the smallest one that
can contain the data. The DXGI_FORMAT_R16G16B16A16_FLOAT format is
used for textures that hold fluid velocity and vorticity, since it is the smallest
16-bit floating-point format that provides at least 3 components per cell. Density and
pressure, on the other hand, only need 1 component per grid cell, which means they can
be created with the DXGI_FORMAT_R16_FLOAT format. This uses a quarter of the memory.
It is worth mentioning that using 16 bits per float is in itself an optimisation over
using 32-bit floats but, as shown in other research (Crane et al., 2007), the visual
degradation due to reduced precision is hardly discernible. Below is a table of all formats
used and the properties they are used for.
Table 4.1: Texture formats and their uses
Texture Format                    Fluid Property
DXGI_FORMAT_R16G16B16A16_FLOAT    Velocity, Vorticity
DXGI_FORMAT_R16_FLOAT             Density, Temperature, Reaction, Divergence, Pressure
DXGI_FORMAT_R8_SINT               Obstacles
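The saving from choosing a smaller format can be checked with a little arithmetic. The sketch below is not framework code: textureBytes and the three constants are hypothetical helpers, and the calculation assumes a single mip level with no driver-side padding.

```cpp
#include <cassert>
#include <cstddef>

// Bytes per voxel for the three formats listed in Table 4.1.
const std::size_t kBytesRGBA16F = 8;  // DXGI_FORMAT_R16G16B16A16_FLOAT: 4 x 16 bits
const std::size_t kBytesR16F = 2;     // DXGI_FORMAT_R16_FLOAT: 1 x 16 bits
const std::size_t kBytesR8 = 1;       // DXGI_FORMAT_R8_SINT: 1 x 8 bits

// Raw size of one 3D texture, ignoring padding and alignment
// (an assumption of this sketch).
inline std::size_t textureBytes(std::size_t sizeX, std::size_t sizeY,
                                std::size_t sizeZ, std::size_t bytesPerVoxel) {
    assert(bytesPerVoxel > 0);
    return sizeX * sizeY * sizeZ * bytesPerVoxel;
}
```

For a 64x128x64 grid this gives 4 MiB for a four-component texture and 1 MiB for a single-component one, before any double buffering is taken into account.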
Texture Sharing
Several resources must be created for each fluid object and are
unique to it. These are the velocity, density, temperature, vorticity, obstacle and
reaction (for fire simulation) textures. They are unique because their contents must
persist throughout program execution. The textures for velocity divergence
and fluid pressure, on the other hand, are only used temporarily by each solver.
To take advantage of this, when a fluid simulation is constructed it first checks
whether an instance of these common resources has already been created for its grid
size. If so, it uses them; if not, it constructs them and makes them available for
further sharing. When using many fluid objects, it is therefore advantageous to build
them with the same grid size.
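One possible way to organise such a lookup is a cache keyed by grid size. The sketch below is an assumption about structure, not the framework's implementation: SharedResources and SharedResourceCache are hypothetical names, and the struct only records dimensions where the real object would own the divergence and pressure textures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical stand-in for the temporary divergence and pressure textures.
struct SharedResources {
    int sizeX, sizeY, sizeZ;
};

// Hands out one SharedResources instance per distinct grid size.
class SharedResourceCache {
public:
    std::shared_ptr<SharedResources> acquire(int sizeX, int sizeY, int sizeZ) {
        assert(sizeX > 0 && sizeY > 0 && sizeZ > 0);
        const auto key = std::make_tuple(sizeX, sizeY, sizeZ);
        const auto found = mCache.find(key);
        if (found != mCache.end()) {
            return found->second;  // another fluid of this size already exists
        }
        auto created = std::make_shared<SharedResources>(
            SharedResources{sizeX, sizeY, sizeZ});
        mCache.emplace(key, created);
        return created;
    }

    std::size_t uniqueSizes() const { return mCache.size(); }

private:
    std::map<std::tuple<int, int, int>, std::shared_ptr<SharedResources>> mCache;
};
```

Sharing in this way is only safe if the fluids are processed one after another, so that no two solvers need the temporary textures at the same moment; this sketch assumes that is the case.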
4.4 Running a Simulation
Once the 3D scene has been initialised, the main application loop begins. In it, each
fluid simulation is updated based on the numerical equation of motion solver. For
this implementation, fluids are modelled as both incompressible and inviscid. Thus,
the equations of motion become:
$\frac{\partial \vec{u}}{\partial t} = -(\vec{u} \cdot \nabla)\vec{u} - \frac{1}{\rho}\nabla p + \vec{F}$ (4.1)
$\nabla \cdot \vec{u} = 0$ (4.2)
The calculation process involves solving each part of these equations in order, using
the result of each step as the input to the next. The function that does this every
update is outlined below:
void Fluid3DSolver::Process(ID3D11DeviceContext *context) {
    // Set the obstacle texture - it is constant throughout the execution step
    context->CSSetShaderResources(4, 1, &(mFluidResources.obstacleSP.mSRV.p));
    // Set all the constant buffers to the context
    SetShaderBuffers(context);
    // Advect temperature, density and reaction against velocity
    AdvectProperties();
    // Advect velocity against itself
    AdvectVelocity();
    // Determine how the temperature of the fluid changes the velocity
    ComputeBuoyancy();
    // Add a constant amount of density and temperature back into the system
    RefreshConstantImpulse();
    // If there are any extra forces - add them here
    ApplyExtraForces();
    // Inject vorticity back into the system
    ComputeVorticityConfinement();
    // Subtract the pressure gradient from the velocity field. This
    // computes divergence free velocity.
    ComputeProjection();
}
Firstly, the function binds the obstacle texture to the graphics pipeline. This is
because nearly all compute shader programs query this texture and there is no
need to constantly rebind it. Afterwards, the SetShaderBuffers function copies the
required fluid computational parameters - which control aspects of calculation - from
standard C++ structs into GPU constant buffers.
4.4.1 Simulation Steps
The rest of the functions solve the equations of motion. They all have underlying
similarities: binding required SRVs as inputs and UAVs as outputs to their respective
shader programs. Each is calculated using a numerical method that estimates its
value. Below are all the steps outlined in order of execution.
Advection
Advection is what happens when the velocity field of the fluid transports other quantities,
including itself, along the flow. This is described by the term $(\vec{u} \cdot \nabla)\vec{u}$. There
are two methods that the framework uses to calculate advection.
The first, simpler one, is the implicit trace-back routine (Stam, 1999). It uses a
semi-Lagrangian scheme to calculate the new quantity of a fluid property at a position
by tracing back the trajectory to its former cell and copying the quantity. The
advantage of this advection technique is that it is unconditionally stable for any
time step and velocity.
$p(\vec{x}, t + \Delta t) = \alpha \times p(\vec{x} - \vec{u}(\vec{x}, t)\Delta t, t) - \mu$ (4.3)
Here $p(\vec{x}, t + \Delta t)$ is the quantity at the new time step. $\alpha$ is a user-defined dissipation
term. It is in the range $\alpha \in [0, 1]$ and it artificially controls how fast the quantity
being advected dissipates: 1 means no dissipation and lower values lead to the quantity
disappearing faster. $\mu$ is the decay constant. It is used only for fire simulation and
controls how fast the fire reaction dies out. When it is used, the end result is clamped
so that it does not go below 0.
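The trace-back step of equation 4.3 can be illustrated with a CPU sketch on a 1D grid. Several things here are assumptions made for brevity rather than details of the framework: the grid is one-dimensional with unit cell spacing, samples are linearly interpolated and clamped to the grid ends, the result is always clamped at 0, and the names sampleLinear and advect are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linearly interpolated sample of a 1D field, clamped to the grid ends.
inline float sampleLinear(const std::vector<float>& field, float x) {
    assert(!field.empty());
    const float maxX = static_cast<float>(field.size() - 1);
    x = std::min(std::max(x, 0.0f), maxX);
    const std::size_t i0 = static_cast<std::size_t>(std::floor(x));
    const std::size_t i1 = std::min(i0 + 1, field.size() - 1);
    const float t = x - static_cast<float>(i0);
    return field[i0] * (1.0f - t) + field[i1] * t;
}

// One trace-back advection step (equation 4.3) on a 1D grid with unit spacing.
// dissipation is in [0, 1]; decay is subtracted and the result clamped at 0.
inline std::vector<float> advect(const std::vector<float>& quantity,
                                 const std::vector<float>& velocity,
                                 float dt, float dissipation, float decay) {
    assert(quantity.size() == velocity.size());
    std::vector<float> result(quantity.size());
    for (std::size_t i = 0; i < quantity.size(); ++i) {
        // Follow the velocity backwards to find where the quantity came from.
        const float source = static_cast<float>(i) - velocity[i] * dt;
        const float value = dissipation * sampleLinear(quantity, source) - decay;
        result[i] = std::max(value, 0.0f);
    }
    return result;
}
```

With a velocity of one cell per time step, a spike of density moves exactly one cell along the grid; fractional velocities blend neighbouring cells, which is the source of the numerical diffusion that the higher-order scheme tries to reduce.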
The second advection routine used is the MacCormack scheme proposed in (Crane et al., 2007).
It works by first performing two semi-Lagrangian steps,
one tracing forward and one tracing back. Using those values, it performs a
higher-order accuracy calculation, which leads to less numerical diffusion than the
previous routine.
$\hat{\phi}^{n+1} = A(\phi^{n})$
$\hat{\phi}^{n} = A^{R}(\hat{\phi}^{n+1})$
$\phi^{n+1} = \alpha \times (\hat{\phi}^{n+1} + \frac{1}{2}(\phi^{n} - \hat{\phi}^{n})) - \mu$ (4.4)
Here, $\phi^{n}$ indicates the advected property, and $\hat{\phi}^{n+1}$ and $\hat{\phi}^{n}$ are the two intermediate
properties. $\phi^{n+1}$ gives the final property at the new time step. $A$ performs the
advection routine 4.3 on the passed quantity and $A^{R}$ indicates that it is performed
in reverse (meaning with a negative time step value). Again, $\alpha$ is the dissipation factor
and $\mu$ is the decay constant. When doing MacCormack advection, no artificial
dissipation or decay is performed on the first two steps. Since this advection routine
is not unconditionally stable, the final result is clamped within the minimum and
maximum values of the surrounding grid cells.
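A 1D CPU sketch of equation 4.4 is given below. It rests on the same simplifying assumptions as before (unit spacing, linear interpolation, clamped sampling), all function names are hypothetical, and two further choices are assumptions of this sketch: the clamp uses the two cells that bracket the trace-back position, and it is applied before dissipation and decay.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Field = std::vector<float>;

// Clamps x to the grid and returns the index of the cell at or just below it.
inline std::size_t lowerCell(const Field& f, float& x) {
    x = std::min(std::max(x, 0.0f), static_cast<float>(f.size() - 1));
    return static_cast<std::size_t>(std::floor(x));
}

inline float sampleLinear(const Field& f, float x) {
    const std::size_t i0 = lowerCell(f, x);
    const std::size_t i1 = std::min(i0 + 1, f.size() - 1);
    const float t = x - static_cast<float>(i0);
    return f[i0] * (1.0f - t) + f[i1] * t;
}

// Plain semi-Lagrangian step (equation 4.3) without dissipation or decay.
inline Field semiLagrangian(const Field& phi, const Field& u, float dt) {
    Field out(phi.size());
    for (std::size_t i = 0; i < phi.size(); ++i) {
        out[i] = sampleLinear(phi, static_cast<float>(i) - u[i] * dt);
    }
    return out;
}

// MacCormack step (equation 4.4) with clamping against the source cells.
inline Field macCormack(const Field& phi, const Field& u, float dt,
                        float dissipation, float decay) {
    assert(!phi.empty() && phi.size() == u.size());
    const Field forward = semiLagrangian(phi, u, dt);        // phi_hat^{n+1}
    const Field backward = semiLagrangian(forward, u, -dt);  // phi_hat^{n}
    Field out(phi.size());
    for (std::size_t i = 0; i < phi.size(); ++i) {
        float value = forward[i] + 0.5f * (phi[i] - backward[i]);
        // Limit the corrected value to the range of the cells it was traced from.
        float source = static_cast<float>(i) - u[i] * dt;
        const std::size_t i0 = lowerCell(phi, source);
        const std::size_t i1 = std::min(i0 + 1, phi.size() - 1);
        const float lo = std::min(phi[i0], phi[i1]);
        const float hi = std::max(phi[i0], phi[i1]);
        value = std::min(std::max(value, lo), hi);
        out[i] = std::max(dissipation * value - decay, 0.0f);
    }
    return out;
}
```

The clamp is what keeps the correction term from producing values outside the range of the data it was computed from, at the price of falling back towards first-order behaviour near sharp edges.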
While the MacCormack scheme gives improved detail, it requires two
additional textures to hold the intermediate results and incurs the computational cost
of calculating them. The memory cost can be offset by using the texture sharing
mentioned previously between simulations for these temporary values. The computational
cost is dealt with by using MacCormack advection only for the density and
reaction properties and the standard routine for the temperature and velocity fields.
Figure 4.2: Left: Using MacCormack for density and reaction only; Right: Using
MacCormack for all fields
As can be seen, due to the chaotic nature of both fire and smoke, the extra
detail gained by using the more expensive advection routine on all fields is hardly
discernible. In fact, the only difference can be seen at the beginning of a simulation,
as the MacCormack one advects slightly faster.
Buoyancy
Buoyancy is what causes hot air to rise and cool air to fall. In the simulation
it is used to modify the velocity field at each grid cell based on the temperature
and density values at that cell, the buoyancy factor and weight of the density field,
and the ambient temperature of the environment. It is one of the external forces
$\vec{F}$ in equation 4.1.
$\vec{f}_{buoyancy} = ((T - T_{amb})\varphi - (\rho \times \kappa))\vec{v}_{up}\Delta t$ (4.5)
Here $T$ and $\rho$ represent the temperature and density at the current grid location
respectively. $T_{amb}$ is the ambient temperature of the fluid; if not used, it can be left
at 0. $\varphi$ is the buoyancy factor of the density field: it controls how buoyant the
smoke is, meaning how quickly it rises with the hot air. $\kappa$ is the smoke weight: a
higher value will exert a stronger force on the velocity field and will make it die out
faster. The result is multiplied by the global up vector $\vec{v}_{up}$ and the
integration time step $\Delta t$. The resultant force is then applied to the velocity
value at that cell location.
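For a single cell, equation 4.5 reduces to a few multiplications. The following sketch only illustrates the arithmetic; Vec3 and buoyancyForce are hypothetical names, and in the framework this work would be done per voxel in a compute shader.

```cpp
#include <cassert>

struct Vec3 {
    float x, y, z;
};

// Buoyancy force for one cell, following equation 4.5.
// buoyancy scales the lift from temperature; smokeWeight scales the pull from density.
inline Vec3 buoyancyForce(float temperature, float ambientTemperature,
                          float density, float buoyancy, float smokeWeight,
                          Vec3 up, float dt) {
    assert(dt > 0.0f);
    const float magnitude =
        ((temperature - ambientTemperature) * buoyancy - density * smokeWeight) * dt;
    return Vec3{up.x * magnitude, up.y * magnitude, up.z * magnitude};
}
```

A hot, thin cell therefore receives an upward push, while a cell at ambient temperature that still holds density is pulled against the up vector.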
Constant Impulse and External Forces
All fluid objects have been designed to have new quantities added every simulation
step. For smoke, a constant amount of temperature and density is injected
from the bottom of the domain. The former helps the system maintain
velocity and the latter keeps up a steady stream of smoke, which is the final visible result.
With fire simulation the addition of temperature remains the same, but
extra reaction is also injected into the system along with the temperature. This is analogous
to adding fuel to a fire. Afterwards, an extinguishment test is performed on the grid.
This samples reaction values and determines if they are below an extinguishment
threshold; if so, smoke is formed based on a reaction constant.
The ApplyExtraForces method is mainly reserved for future use. This is where
forces such as wind can be added to introduce more chaotic behaviour into the system.
User interaction with fluids can also be accomplished using this function. Any
quantity used by the simulation can be added in this step.
Vorticity Confinement
Even when using a higher-order advection routine, the solver still suffers from numerical
dissipation. Vorticity confinement (Fedkiw et al., 2001) tries to offset this
by calculating the local vorticity
$\vec{\omega} = \nabla \times \vec{u}$ (4.6)
and injecting it back into the velocity field. Calculating this is the first step of the
process. Afterwards a normalized vorticity location vector is retrieved using:
$\vec{\eta} = \nabla|\vec{\omega}|, \quad \vec{N} = \frac{\vec{\eta}}{|\vec{\eta}|}$ (4.7)
In both equations, vector operations are estimated using finite difference methods.
The final confinement force is then calculated by:
$\vec{f}_{conf} = \epsilon(\vec{N} \times \vec{\omega})\Delta t$ (4.8)
In this equation $\epsilon > 0$ is called the strength factor and controls the amount of small
scale detail that is introduced back into the velocity field. In this project's implementation
it is clamped to the range $\epsilon \in [0, 1]$. This force is then added to the existing
flow.
Vorticity confinement requires the addition of 1 extra texture per fluid and 2 relatively
cheap shader program operations per update step. The technique proves vital
to the proper appearance of both smoke and fire and more than makes up for its
cost in visual quality.
Figure 4.3: Different vorticity confinement strengths. Strength factors from left to
right: 0, 0.5, 1.0
As can be seen, a suitable strength factor lies between 0.5 and 1.0. The simulations
in the project application use values in that range, with fire simulations tending
towards the higher end.
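The two passes of vorticity confinement can be illustrated in 2D, where the vorticity has only a z component. This is a simplified sketch under stated assumptions, not the framework's 3D shader code: Grid2D, computeVorticity and applyConfinement are hypothetical names, the grid has unit spacing, only interior cells are processed, and central differences are used throughout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Grid2D {
    std::size_t width, height;
    std::vector<float> data;
    Grid2D(std::size_t w, std::size_t h) : width(w), height(h), data(w * h, 0.0f) {}
    float& at(std::size_t x, std::size_t y) { return data[y * width + x]; }
    float at(std::size_t x, std::size_t y) const { return data[y * width + x]; }
};

// First pass: z component of the curl (equation 4.6) on interior cells.
inline Grid2D computeVorticity(const Grid2D& ux, const Grid2D& uy) {
    assert(ux.width == uy.width && ux.height == uy.height);
    Grid2D w(ux.width, ux.height);
    for (std::size_t y = 1; y + 1 < ux.height; ++y) {
        for (std::size_t x = 1; x + 1 < ux.width; ++x) {
            w.at(x, y) = 0.5f * (uy.at(x + 1, y) - uy.at(x - 1, y)) -
                         0.5f * (ux.at(x, y + 1) - ux.at(x, y - 1));
        }
    }
    return w;
}

// Second pass: add epsilon * (N x omega) * dt to the velocity (equations 4.7, 4.8).
inline void applyConfinement(Grid2D& ux, Grid2D& uy, const Grid2D& w,
                             float epsilon, float dt) {
    for (std::size_t y = 1; y + 1 < w.height; ++y) {
        for (std::size_t x = 1; x + 1 < w.width; ++x) {
            // eta = gradient of |omega|
            const float etaX = 0.5f * (std::fabs(w.at(x + 1, y)) - std::fabs(w.at(x - 1, y)));
            const float etaY = 0.5f * (std::fabs(w.at(x, y + 1)) - std::fabs(w.at(x, y - 1)));
            const float length = std::sqrt(etaX * etaX + etaY * etaY);
            if (length < 1e-6f) continue;  // no vorticity gradient, so no force
            const float nx = etaX / length;
            const float ny = etaY / length;
            // (nx, ny, 0) x (0, 0, omega) = (ny * omega, -nx * omega, 0)
            ux.at(x, y) += epsilon * (ny * w.at(x, y)) * dt;
            uy.at(x, y) += epsilon * (-nx * w.at(x, y)) * dt;
        }
    }
}
```

The two functions correspond to the two shader operations mentioned above, and the vorticity grid plays the role of the one extra texture.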
Projection
Up to this point a velocity field $\vec{w}$ has been calculated, but it does not adhere to the
continuity equation 2.2 as it is divergent. Therefore, the final step in each simulation
update is to calculate a divergence-free flow field. Harris (2004) explains that the
Helmholtz-Hodge Decomposition Theorem can be used to correct the velocity by
subtracting the gradient of the pressure field:
$\vec{u} = \vec{w} - \nabla p$ (4.9)
To compute the pressure field the following Poisson-pressure equation can be used:
$\nabla^{2} p = \nabla \cdot \vec{w}$ (4.10)
These two equations are logically broken down into 3 operations. The first calculates
the divergence of the velocity field $\nabla \cdot \vec{w}$ and stores it in a texture. Again, vector
operations are estimated using finite differences.
The second step solves the Poisson-pressure equation using a common method called
Jacobi iteration. It is a technique that converges relatively slowly to a solution
but has the advantage of being cheap to run using GPU kernels (Harris, 2004).
This project typically uses 10 to 15 Jacobi iterations for both fire and smoke.
A higher number will provide better looking, more accurate results, but the computational
cost rises quite steeply. As shown by Crane et al. (2007), higher iteration
counts do not lead to noticeably better render results.
The final step is a straightforward subtraction of the resultant pressure gradient
from the divergent flow field. The result is stored in ~u which becomes the new
velocity field.
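The three operations can be sketched on the CPU for a small 2D grid. This is an illustration under several assumptions rather than the framework's code: Grid2D and project are hypothetical names, the grid has unit spacing, pressure is held at 0 on the border cells, and obstacles are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Grid2D {
    std::size_t width, height;
    std::vector<float> data;
    Grid2D(std::size_t w, std::size_t h) : width(w), height(h), data(w * h, 0.0f) {}
    float& at(std::size_t x, std::size_t y) { return data[y * width + x]; }
    float at(std::size_t x, std::size_t y) const { return data[y * width + x]; }
};

// Projects the velocity field (wx, wy) in place using a fixed number of
// Jacobi iterations for the pressure solve.
inline void project(Grid2D& wx, Grid2D& wy, int iterations) {
    assert(wx.width == wy.width && wx.height == wy.height && iterations >= 0);
    const std::size_t w = wx.width, h = wx.height;
    Grid2D divergence(w, h), pressure(w, h), scratch(w, h);

    // Step 1: divergence of the velocity field by central differences.
    for (std::size_t y = 1; y + 1 < h; ++y)
        for (std::size_t x = 1; x + 1 < w; ++x)
            divergence.at(x, y) = 0.5f * ((wx.at(x + 1, y) - wx.at(x - 1, y)) +
                                          (wy.at(x, y + 1) - wy.at(x, y - 1)));

    // Step 2: Jacobi iterations for laplacian(p) = divergence (equation 4.10).
    for (int k = 0; k < iterations; ++k) {
        for (std::size_t y = 1; y + 1 < h; ++y)
            for (std::size_t x = 1; x + 1 < w; ++x)
                scratch.at(x, y) = 0.25f * (pressure.at(x - 1, y) + pressure.at(x + 1, y) +
                                            pressure.at(x, y - 1) + pressure.at(x, y + 1) -
                                            divergence.at(x, y));
        std::swap(pressure.data, scratch.data);  // new estimate becomes the input
    }

    // Step 3: subtract the pressure gradient (equation 4.9).
    for (std::size_t y = 1; y + 1 < h; ++y)
        for (std::size_t x = 1; x + 1 < w; ++x) {
            wx.at(x, y) -= 0.5f * (pressure.at(x + 1, y) - pressure.at(x - 1, y));
            wy.at(x, y) -= 0.5f * (pressure.at(x, y + 1) - pressure.at(x, y - 1));
        }
}
```

Each Jacobi iteration reads only the previous pressure estimate, which is why the method maps well onto a GPU kernel with two pressure buffers, even though many iterations are needed for an accurate result.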
Boundary Interaction
As mentioned in section 4.3, all fluids have a single voxel wide obstacle texture on
the box edges that acts as the boundary for the system. Cells in this texture either
have the value of 1 if there is an obstacle at the location, or 0 if there is none. All
computational steps have access to this texture and use it differently.
Its most important function is to enforce the free-slip boundary condition, which
states that a fluid cannot flow into or out of a solid, but can freely flow along its
surface. This is mainly done in the projection step, where if an obstacle is detected,
the velocity component of that cell is taken as 0. When performing Jacobi iterations
and sampling adjacent cells, if an obstacle is present, the pressure component of that
cell is not used - this is the approach utilised by (Crane et al., 2007).
Obstacles are similarly used in the computation of vorticity confinement and advection,
forcing the velocity vector to be 0 if inside a boundary.
4.4.2 Runtime Modifications
Since so many variables control the appearance and structure of a fluid object, it is
desirable to have as many of them as possible available for editing at runtime.
These are all kept in a C++ struct called FluidSettings and, along with the
domain size, are used when constructing a fluid. These parameters are then transferred
to the GPU in various buffers by the SetShaderBuffers function from section 4.4.
At runtime, nearly all of the control parameters can be edited from a user interface
window. This window appears when the user clicks on a fluid object with the mouse.
Figure 4.4: Users have the freedom to edit fluid control settings at runtime to observe
their effects. Reaction values are not used for smoke simulation
When a parameter is edited, a method is called on the respective FluidCalculator
for that object.
void Fluid3DCalculator::SetFluidSettings(const FluidSettings &fluidSettings) {
    // Work out which constant buffers the changed settings affect
    int dirtyFlags = GetUpdateDirtyFlags(fluidSettings);
    this->fluidSettings = fluidSettings;
    // Update buffers if needed
    if (dirtyFlags & BufferDirtyFlags::General) {
        UpdateGeneralBuffer();
    } else if ...
}
It first checks which settings have been changed and sets the necessary
dirty flags. Using the dirty flag pattern allows only the constant buffers that
have changed to be updated, instead of all of them. Updating a buffer involves
copying its contents from GPU to system memory, changing them and then copying
them back to the GPU, so it should not be overused, as advised by (McDonald,
2012). Dirty flags assist with this.
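The comparison that produces the dirty flags might look like the sketch below. Everything apart from the FluidSettings and BufferDirtyFlags::General names is an assumption made for illustration: the three settings fields, their default values, the Vorticity and Pressure flags and the free function getUpdateDirtyFlags are hypothetical.

```cpp
#include <cassert>

// Hypothetical subset of the settings; the real struct holds more fields.
struct FluidSettings {
    float densityDissipation = 0.99f;
    float vorticityStrength = 0.8f;
    int jacobiIterations = 10;
};

// One bit per constant buffer, so several can be marked dirty at once.
enum BufferDirtyFlags : int {
    None = 0,
    General = 1 << 0,
    Vorticity = 1 << 1,
    Pressure = 1 << 2
};

// Compares the current settings with the requested ones and reports
// which constant buffers would need to be re-uploaded.
inline int getUpdateDirtyFlags(const FluidSettings& current,
                               const FluidSettings& updated) {
    assert(updated.jacobiIterations >= 0);
    int flags = BufferDirtyFlags::None;
    if (current.densityDissipation != updated.densityDissipation) {
        flags |= BufferDirtyFlags::General;
    }
    if (current.vorticityStrength != updated.vorticityStrength) {
        flags |= BufferDirtyFlags::Vorticity;
    }
    if (current.jacobiIterations != updated.jacobiIterations) {
        flags |= BufferDirtyFlags::Pressure;
    }
    return flags;
}
```

Using one bit per buffer lets a single integer describe any combination of changes, and an unchanged setting costs only a comparison on the CPU instead of a buffer upload.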
In a real game environment a player would not have such access to fluid settings,
but this is immensely useful as a level of detail or game design tool as it allows for
fine-tuning of just how the simulation plays out.
4.4.3 Choosing an Update Rate
To update objects each frame, games tend to use the
time elapsed between the previous frame and the current one. This is referred
to as the delta time. Since this can vary with frame rate, sensitive calculations such
as game physics tend to use a fixed integration time step value that is independent
of delta time.
This approach is used here: the value is, by default, 1/30 of a second, meaning 30 fluid
updates per second. Note that this value controls how often the Process method
of a fluid is called, not the $\Delta t$ value used in the calculation formulas, which is defined
separately for each fluid. The advantage of calling Process at a fixed rate is that it
keeps fluid movement consistent. If it were updated at a variable rate, each fluid
would slow down or speed up, leading to a distorted look.
30 updates a second was chosen since it is fast enough for each fluid to develop
with reasonable speed, while still keeping up decent performance. It can be changed
at runtime, although if the update rate increases above the processing capability of
the hardware, the application slows down as it cannot keep up with the required
number of updates. Rates of around 30 to 50 a second are common choices, although
higher ones are certainly achievable on better hardware.
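One common way to decouple a fixed update rate from a variable frame rate is to accumulate delta time and run an update each time a full interval has elapsed. The sketch below shows that general pattern; FixedRateTimer is a hypothetical helper and this is not necessarily how the framework schedules its updates.

```cpp
#include <cassert>

// Accumulates variable frame times and reports how many fixed-rate
// updates are due. Hypothetical helper, not part of the framework.
class FixedRateTimer {
public:
    explicit FixedRateTimer(double updatesPerSecond)
        : mInterval(1.0 / updatesPerSecond) {
        assert(updatesPerSecond > 0.0);
    }

    // Call once per frame with the frame's delta time in seconds.
    // Returns the number of simulation updates to run this frame.
    int advance(double deltaSeconds) {
        mAccumulated += deltaSeconds;
        int due = 0;
        while (mAccumulated >= mInterval) {
            mAccumulated -= mInterval;
            ++due;
        }
        return due;
    }

private:
    double mInterval;
    double mAccumulated = 0.0;
};
```

If the hardware cannot finish the due updates within a frame, the next delta time grows and even more updates become due, which matches the slowdown described above when the update rate exceeds what the hardware can process.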
4.4.4 Frame Skipping
Even with the many simplifications, memory cutbacks and processing optimisations
used, updating a reasonably-sized fluid object every frame is a demanding operation.
This is where an LOD technique called frame skipping comes into use. Its premise
is quite simple: instead of updating a fluid simulation every frame, do it every
few frames. It is inspired by the approximation techniques used by game physics
simulations and has previously been adopted for fluids (Tangvald, 2007). Below is
the implementation as used in the project.
void Update() {
    // Process only once enough frames have been skipped
    bool canUpdate = framesSinceLastProcess >= framesToSkip;
    if (canUpdate) {
        fluidCalculator->Process();
        framesSinceLastProcess = 0;
    }
    else {
        ++framesSinceLastProcess;
    }
}
Although a very simple LOD method, frame skipping frees up substantial computing
power, especially when using many fluid objects. Its downside is that its effects
are quickly spotted. Even skipping one frame per process step means that the simulation
will update half as often. Therefore, this technique is only used on fluids
which are not in the current view frustum. Even then, it is only enabled after
the simulation has had a few seconds to develop. Afterwards, no difference in
behaviour can be noticed when looking away and then back at a fluid object, since
the behaviour is inherently chaotic.
The number of frames to skip can be changed at runtime. It should be
noted that the performance gained by skipping additional frames is not linear: it
peaks at 5 and most is gained around skipping 2 or 3.
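The effect of the skip count on the number of updates can be checked with a small stand-alone counter. This sketch assumes that a fluid is processed once its counter has reached framesToSkip; FrameSkipper and countUpdates are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical wrapper around a frame skipping counter.
struct FrameSkipper {
    int framesToSkip = 0;
    int framesSinceLastProcess = 0;

    // Returns true when the simulation should be processed this frame.
    bool shouldProcess() {
        assert(framesToSkip >= 0);
        if (framesSinceLastProcess >= framesToSkip) {
            framesSinceLastProcess = 0;
            return true;
        }
        ++framesSinceLastProcess;
        return false;
    }
};

// Counts how many updates happen over a number of rendered frames.
inline int countUpdates(int frames, int framesToSkip) {
    FrameSkipper skipper;
    skipper.framesToSkip = framesToSkip;
    int updates = 0;
    for (int i = 0; i < frames; ++i) {
        if (skipper.shouldProcess()) {
            ++updates;
        }
    }
    return updates;
}
```

Over 60 frames, skipping 1 frame halves the number of updates and skipping 2 cuts it to a third, so each additional skipped frame removes a smaller share of the work than the one before.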
4.5 Rendering Fluids
Rendering the final result is done via the ray-marching technique previously discussed
(Zhou et al., 2007). It was chosen because it is straightforward to implement
in a standard pixel shader program and gives good visual
results.
A fluid in 3D space is represented by an object called a VolumeRenderer which
at its core is a simple cube with position, rotation and scale components: all
the properties required for rendering in 3D space. When an instance of a volume
renderer is constructed it needs to know what type of fluid it will render: smoke or
fire. If the type is smoke, it can be given a reference to a 3D texture (in the form
of an SRV) of smoke density values that it then uses for drawing. If rendering fire,
it can also be passed a reference to a 3D texture of fire reaction values in addition
to density. This creation data is important, as different pixel shaders are used
when rendering each type.
4.5.1 Render Parameters
There are certain parameters that affect the render result of a fluid simulation which
can be modified at runtime.
Figure 4.5: Render settings modify the look of a fluid without changing its physical
properties
Number of Samples
The number of samples is the sample rate described in section 3.3.2. It has a direct
effect on the quality of the produced result. A higher rate will sample more density
and reaction values, thus producing a more accurate average colour. It also means
that more time will be spent in the pixel shader, which directly affects performance.
In practice, sample rate matters only when the fluid takes up a significant amount
of screen space. This is because a pixel shader is only run on the visible
pixels that an object occupies on the screen.
Figure 4.6: Different sample rates of a 64x128x64 fluid from afar. Left: 32 samples;
Right: 128 samples
As can be seen, from afar the difference in quality is hardly discernible, although
the step size difference is substantial. The performance of both is nearly identical:
since the fluid occupies fewer pixels of screen space, the extra time spent in the
shader program is insignificant.
Figure 4.7: Different sample rates of a 64x128x64 fluid from up close. Left to right:
32, 64, 128 samples
This is the same flame as the one in the previous figure. When viewing from a
closer distance, the benefit of using a higher number of samples can be seen more
clearly (it is more pronounced when seeing the fluid moving). This is due to the lower
step value leading to a smaller range of colours used to represent the fluid. A visible
improvement is seen when increasing the sample rate from 32 to 64, but only a
very slight one when going from 64 to 128. This is because more samples cannot
make up for the grid size of a fluid. Even for a relatively big domain, like the one in
the figures, there will be little visual gain when using more than 100 samples per ray.
There are significant performance implications when using a higher sample rate
with the fluid in full view, since the fluid takes up a large part of the screen. Depending
on the view distance, rendering with 32 samples could be nearly twice as
fast as rendering with 128. It is therefore best to decide upon a sample rate that
gives a good visual result, yet still computes fast.
Colour and Absorption
Changing the Smoke Color property alters the colour appearance of smoke for both
types of simulations.
Smoke and fire Absorption control how much to saturate the resultant colour when
sampling density and reaction values respectively. A higher value will mimic thick
smoke or flames, while a lower one will produce a weaker looking flame or lighter
smoke.
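How the number of samples and the absorption value interact along a single ray can be sketched on the CPU. The accumulation below uses exponential absorption, which is one common formulation and is an assumption here, not necessarily what the framework's pixel shaders do; marchOpacity and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Marches a ray of the given length through a density field and returns the
// accumulated opacity in [0, 1]. densityAt maps a distance along the ray to a
// density value; absorption scales how quickly the ray saturates.
inline float marchOpacity(const std::function<float(float)>& densityAt,
                          float rayLength, int numSamples, float absorption) {
    assert(numSamples > 0 && rayLength > 0.0f);
    const float stepSize = rayLength / static_cast<float>(numSamples);
    float transmittance = 1.0f;  // fraction of light still passing through
    for (int i = 0; i < numSamples; ++i) {
        // Sample at the middle of each step along the ray.
        const float distance = (static_cast<float>(i) + 0.5f) * stepSize;
        transmittance *= std::exp(-absorption * densityAt(distance) * stepSize);
        if (transmittance < 0.01f) {
            break;  // the ray is effectively opaque, further samples add nothing
        }
    }
    return 1.0f - transmittance;
}
```

For a uniform density the result barely changes between 32 and 128 samples, which is in line with the observation that extra samples only pay off when the sampled field contains detail finer than the step size.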
4.5.2 Fluid Instancing
A key goal throughout the development of this project is to separate the concept of
fluid motion calculation from fluid rendering. A Fluid3DCalculator object does not
know about a VolumeRenderer and vice versa. The former is responsible for setting
up and running the equations of motion on a set of 3D grids while the latter will
render suitable 3D textures passed to it.
Given this separation, it is straightforward to implement a form of instancing for
fluids. This means that one fluid instance can be drawn multiple times by different
volume renderers. Since the cost of rendering is trivial compared to the cost of
simulating, this allows a scene to seemingly contain many fire and smoke effects
while only computing a small number of them.
Volume renderers using data from the same fluid instance will display identical
results. To make them visibly dissimilar, each can be given different render parameters.
A combination of colour and absorption can be used to achieve non-identical
looking fluids. The final scene, shown in Figure 5.1 on page 33, is made out of 2 unique
smoke simulations, one of which has 3 instances, and 1 unique fire simulation that has 2
instances.
Instancing and Frame Skipping
Frame skipping is used when a fluid simulation is not in view. Instancing means
that the same fluid simulation can be in more than one place. To deal with this,
before activating frame skipping, all volume renderers that use a particular fluid
object are tested for visibility. If even one is in view, frame skipping will not occur.
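The visibility rule amounts to a single check over the renderers attached to a fluid. In this sketch VolumeRendererInfo and canSkipFrames are hypothetical names, and the frustum test itself is assumed to have been done elsewhere.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal renderer record: only the visibility result matters here.
struct VolumeRendererInfo {
    bool inViewFrustum;
};

// Frame skipping may only be enabled when none of the renderers
// that draw a given fluid is currently visible.
inline bool canSkipFrames(const std::vector<VolumeRendererInfo>& renderers) {
    return std::none_of(renderers.begin(), renderers.end(),
                        [](const VolumeRendererInfo& r) { return r.inViewFrustum; });
}
```

A fluid with several instances is therefore less likely to benefit from frame skipping, since every one of its renderers has to be out of view at the same time.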
Chapter 5
Results and Discussion
The previous chapter covered the implementation details of calculating the fluid
equations of motion and rendering the result. It also discussed various optimisation
methods used to make the process as performant as possible. This chapter will
examine the results of the implementation to determine its effectiveness. This will
involve scrutinizing both the visual results of the simulation and its performance.
5.1 Testing Setup
5.1.1 Hardware
The application was tested and benchmarked on two different systems. The first is
a mid-tier laptop and the second is a high-end gaming PC.
Table 5.1: Hardware used for testing
       Laptop                                   PC
CPU    Intel Core i7-3632QM @ 2.20GHz           i5 3570K @ 4.5GHz
RAM    8 GB, DDR3                               12 GB, DDR3
GPU    NVIDIA GeForce GT 640M LE, 2 GB DDR3     ATI R9 280X, 3GB GDDR5
OS     Microsoft Windows 7 64-bit               Microsoft Windows 8.1 64-bit
The important difference between the two setups is the graphics card. The
NVIDIA card, being from a mobile low-power series, has around 2.5 times fewer clock cycles
and 11 times less memory bandwidth than the ATI one. Detailed specifications
of both GPUs can be found in appendix A.
Quantitative results are obtained during application runtime. There is an in-game
frame counter to report on FPS. It displays the current, minimum, maximum and average
frames per second achieved and is used as a benchmark for performance.
5.1.2 Scene
The test scene has been set up to fulfil the application requirements. There are 3
different fluids computed at the same time: 2 smoke effects and 1 fire effect. There are 6
volume renderers visualising the results of those simulations.
Figure 5.1: Looking at the entire final scene from a distance with all fluids in view
The user is free to control the camera, click on fluids and change or observe their
parameters. There is also a scene fly-through mode, which performs a looping
predefined movement through the scene. This mode features both up-close and
distance views of the various fluids in the scene.
33

5.2 Visual Results
Real-world phenomena, such as smoke and fire, come with an inherent randomness
and subtle features that computer graphics do not have the power to precisely
mimic. With certain simplifications and smart uses of technology, though, the results
obtained in this project successfully attempt to bridge that gap.
Figure 5.2: Smoke and fire simulation in the application
5.2.1 Modifying Parameters
Since the application allows the freedom to modify both fluid and render settings,
it is very easy to produce different looking simulations.
Figure 5.3: Left: Fast decaying fire, producing a lot of smoke; Mid: Strong fire,
burning with nearly no smoke; Right: Average strength fire, producing blue smoke
5.3 Memory Performance Results
The main goal of this project is to prove that the parallel power of graphics cards has
reached a threshold that would allow for real-time physically-based fluid simulation.
For this reason memory and frame times are both a topic of common discussion
throughout this project.
5.3.1 Memory Use
In section 4.3.2 the various optimisations that are performed during fluid object
setup were discussed. By querying the GPU, it can be seen how much video memory
fluids of different types and domain sizes use. Below is a table with several of
these results with increasing grid resolution. These do not include video memory
for rendering.
Table 5.2: Video memory used for simulations of different resolution
Grid Size      Smoke Memory    Fire Memory    Shared Memory
16x16x16       0.1 MB          0.11 MB        0.06 MB
32x32x32       1.8 MB          1.9 MB         0.8 MB
64x64x64       13.8 MB         14.8 MB        5.5 MB
128x128x128    110 MB          118 MB         44 MB
Smoke Memory is the video memory required per unique smoke effect and Fire
Memory is the memory required per unique fire effect. Shared Memory is how much
of that total can be shared with other simulations.
As it can be seen, the memory required to store all of the textures that contain
the fluid properties rises exponentially with grid resolution. By utilising texture
sharing, some of this memory cost is offset when using more than one fluid of the
same size. Even so, using sizes bigger than 1283 is infeasible both due to the memory
cost required but also because the processing time quickly rises. A good option is
to only use a higher resolution in one or two dimensions, while using a smaller on
in another.
Alternatively, grids in the range of 30³ to 50³ are ideal for modelling average sized
uniform domains. Their memory cost comes to around 1 to 1.5 times that of the high
quality PNG images which are often used as textures in games. Both test GPUs
have in excess of 2 GB of memory to spare, so this is a small cost to pay.
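The growth and sharing figures in Table 5.2 can be checked with a few lines of arithmetic. The sketch below is written in Python purely for illustration (the project itself runs on DirectX 11 compute shaders). The numbers are copied from the table; the function names are hypothetical, and the model in which the shared portion is paid once while each extra fluid adds only its non-shared remainder is an assumption about how texture sharing is accounted for.

```python
# Values copied from Table 5.2 (MB): grid edge -> (smoke, fire, shared).
TABLE_5_2 = {
    16: (0.1, 0.11, 0.06),
    32: (1.8, 1.9, 0.8),
    64: (13.8, 14.8, 5.5),
    128: (110.0, 118.0, 44.0),
}


def growth_factor(small_edge, large_edge, column=0):
    """Ratio of memory use between two grid sizes for one table column."""
    return TABLE_5_2[large_edge][column] / TABLE_5_2[small_edge][column]


def total_memory_mb(edge, count, column=0):
    """Estimated memory for `count` same-sized fluids that share textures.

    Assumption: the shared portion is paid once and every fluid adds only
    its non-shared remainder.
    """
    if count == 0:
        return 0.0
    per_fluid = TABLE_5_2[edge][column]
    shared = TABLE_5_2[edge][2]
    return shared + count * (per_fluid - shared)


if __name__ == "__main__":
    # Doubling 64 -> 128 should land close to 2**3 = 8.
    print(round(growth_factor(64, 128), 2))
    # Three 64x64x64 smoke fluids with sharing, versus 3 * 13.8 MB without.
    print(round(total_memory_mb(64, 3), 1))
```

Under these assumptions, three 64x64x64 smoke fluids would need noticeably less than three times the single-fluid figure, which is the saving the texture sharing is meant to provide.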
Finally, instancing allows for having many fire and smoke effects without paying
the memory cost of creating each one. Its benefit grows with the number
of instances that use a single fluid object. Since the cost of a volume
renderer is also insignificant compared to that of a simulation, instancing should,
where appropriate, be preferred to creating a new fluid effect.
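The relationship between fluid objects and the renderers that display them can be sketched as follows. This is an illustrative Python model, not the project's DirectX code: the class names, the `memory_mb` field and the helper function are all hypothetical, and the sketch only shows that several renderers holding a reference to the same fluid pay its simulation memory once.

```python
class FluidSimulation:
    """Stand-in for one simulated fluid; memory_mb is an illustrative figure."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.memory_mb = memory_mb


class VolumeRenderer:
    """Draws a fluid at some position; holds a reference, not a copy."""

    def __init__(self, fluid, position):
        self.fluid = fluid
        self.position = position


def simulation_memory_mb(renderers):
    """Sum memory over unique fluids only, however many renderers use them."""
    unique = {id(r.fluid): r.fluid for r in renderers}
    return sum(f.memory_mb for f in unique.values())


if __name__ == "__main__":
    smoke = FluidSimulation("smoke", 13.8)
    instances = [VolumeRenderer(smoke, (x * 5.0, 0.0, 0.0)) for x in range(4)]
    # Four visible smoke effects, one simulation's worth of memory.
    print(simulation_memory_mb(instances))
```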
5.3.2 Performance
To recap, the final scene features one smoke simulation of grid size 64x128x64, another
smoke simulation of grid size 30x60x30 and one fire of size 40x80x40. There are a
total of 6 volume renderers displaying the results of these simulations. Each simulation
does 10 Jacobi solver iterations and uses a sample rate of 64 when ray-marching.
This scene was benchmarked on both test machines several times with increasing
simulation update rates. Benchmarking involves running the scene in fly-through
mode for a period of 5 minutes and noting down the minimum, maximum and
average frame rates achieved.
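The three figures noted for each run can be derived from recorded per-frame durations. The Python sketch below is one possible way to do this and is not taken from the project's framework: the function name is hypothetical, and computing the average as total frames over total elapsed time (rather than as the mean of per-frame FPS values) is an assumption.

```python
def fps_summary(frame_times_s):
    """Return (min_fps, max_fps, avg_fps) from per-frame durations in seconds."""
    if not frame_times_s:
        raise ValueError("no frames recorded")
    if any(t <= 0 for t in frame_times_s):
        raise ValueError("frame times must be positive")
    min_fps = 1.0 / max(frame_times_s)  # the slowest frame gives the minimum
    max_fps = 1.0 / min(frame_times_s)  # the fastest frame gives the maximum
    avg_fps = len(frame_times_s) / sum(frame_times_s)  # frames over elapsed time
    return min_fps, max_fps, avg_fps
```

Averaging frames over elapsed time weights slow frames by how long they actually lasted, so a few very fast frames do not inflate the average.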
Figure 5.4: Benchmark results on the notebook computer using an NVidia GT 640M LE
GPU
The substantial gap between the maximum FPS and both the minimum and average
FPS is immediately noticeable. This is due to the use of frame skipping when some
or all simulations are not in view, which frees up GPU resources. The minimum frame
rate occurs when all fluid objects are in view and one or more are viewed up close,
which increases render time. The majority of time in the fly-through mode is spent
with all or 2 out of 3 simulations in view from a distance. This is what the average
FPS captures.
The benchmark results show that going above 30 updates/sec is not feasible on
this setup, since the frame rate quickly starts dropping. As mentioned previously,
if the update rate demands more clock cycles and texture bandwidth than are
available, the program slows down.
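The interplay between a fixed simulation update rate, the rendered frame rate and frame skipping can be sketched with a small scheduler. This Python version is an illustrative assumption rather than the project's actual scheduling code: the class, its parameters and the policy of capping the number of steps per frame are all hypothetical.

```python
class SimulationScheduler:
    """Decides how many simulation steps to run on each rendered frame."""

    def __init__(self, updates_per_second, max_steps_per_frame=1):
        self.step_interval = 1.0 / updates_per_second
        self.max_steps_per_frame = max_steps_per_frame
        self.accumulated = 0.0

    def steps_for_frame(self, frame_dt, in_view=True):
        if not in_view:
            # Frame skipping: do no simulation work and drop the backlog.
            self.accumulated = 0.0
            return 0
        self.accumulated += frame_dt
        steps = int(self.accumulated / self.step_interval)
        steps = min(steps, self.max_steps_per_frame)
        if steps:
            self.accumulated -= steps * self.step_interval
            # Discard excess backlog so one slow frame cannot snowball.
            self.accumulated = min(self.accumulated, self.step_interval)
        return steps
```

With a cap of one step per frame, a request for more updates per second than the GPU can render frames simply saturates instead of queueing ever more work, which is one way to avoid the slowdown described above.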
Figure 5.5: Benchmark results on gaming PC using an AMD Radeon R9 290X GPU
This graph shows how significant an effect increased memory bandwidth and clock
speed have on performance. The AMD R9 290X only begins to show a decreased frame
rate when doing over 150 updates/sec. Up until then, it consistently keeps an average
of above 800 FPS. Only around the 200 updates/sec mark do the simulations
start reaching the system limits.
In reality, though, there is no reason to use an update rate of more than 30-40
updates/sec when that power can be spent on computing and rendering more fluid
objects instead. These results show the potential that the new generation of GPUs
has for handling such computationally intensive tasks.

Chapter 6
Conclusion and Future Work
This project had the goal of investigating fluid simulation with the aim of answering
the following question:
How can the parallel processing advantage of modern graphics cards be
used for simulating physically-based fluids, and how can this approach
be adapted for real-time use?
The particular goals were:
• Derive an effective way of utilising the GPU for solving the equations of fluid
motion in 3D.
• Discover what level of detail methods and performance optimisations can be
applied in order to use fewer system resources.
This research has demonstrated that the equations of fluid motion can be calculated
in real-time with reasonable frame rates on the GPU. The project implementation
provided offers an optimised and memory efficient solution for numerically solving
and rendering fire and smoke with satisfactory results.
The performance tests in Chapter 5 clearly show that the newest generation of
graphics cards is more than capable of updating and rendering many simulations
at once. The tests also showed that low-to-mid tier cards can hold their own when
dealing with a few reasonably sized fluid domains at an average update rate.

6.1 Future Work
This project covers how to efficiently implement a fluid solver and render the results.
For a topic as broad as fluid simulation there is certainly more research that could
be done.
One area that can certainly be further investigated is implementing interactions
with a fluid. The external forces part of the motion equations can be used to provide
a form of user control of the system. Crane et al. (2007) implement a form of
object voxelisation using a geometry shader to allow arbitrary 3D models to be used
as obstacles in the simulation. This technique could be extended and improved to
take into account different objects going into and out of a fluid domain, disturbing
it based on their velocity and shape.
When there are only a few sources of constant input into a fluid domain, large
parts of the 3D grid are left empty but still take up computational time. A better
way to handle updating a fluid would be to split up each grid into chunks and determine
whether each chunk contains fluid properties. Then, only the ones that do would be
updated. This technique has the potential to allow for much faster processing of
bigger fluid domains.
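The chunk test described above can be sketched on the CPU as follows. This Python version is only an illustration of the idea and is not part of the project: the function name, the chunk size, the threshold and the use of a single density field to decide whether a chunk is occupied are all assumptions, and a GPU implementation would look quite different.

```python
def active_chunks(density, chunk=4, threshold=1e-4):
    """Return the set of chunk coordinates that contain any fluid.

    `density` is a 3D grid stored as nested lists indexed [z][y][x].
    A chunk counts as active if any cell in it exceeds `threshold`.
    """
    active = set()
    for z, plane in enumerate(density):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if abs(value) > threshold:
                    active.add((x // chunk, y // chunk, z // chunk))
    return active


if __name__ == "__main__":
    size = 8
    grid = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    grid[1][2][6] = 1.0  # a single occupied cell at x=6, y=2, z=1
    print(active_chunks(grid))
```

In a real solver the neighbours of active chunks would probably need to be updated as well, so that fluid can flow into chunks that are currently empty.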
To further increase visual quality, smoke rendering could take light sources into
account, and each fluid should be able to cast dynamic shadows. Additionally, a fire
itself could be made a light source. This would be achieved by first creating a number
of lights per fire simulation and then advecting their positions via the velocity
field and controlling their brightness via the reaction field.
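A minimal sketch of this idea is given below in Python. It is not an implementation from the project: the function names, the forward-Euler step, the `velocity_at` callback and the assumption that reaction values lie in the range 0 to 1 are all illustrative choices.

```python
def advect_light(position, velocity_at, dt):
    """Move a light one forward-Euler step through the velocity field.

    `velocity_at` is a hypothetical callback that samples the fluid's
    velocity field at a position and returns a (vx, vy, vz) tuple.
    """
    vx, vy, vz = velocity_at(position)
    x, y, z = position
    return (x + vx * dt, y + vy * dt, z + vz * dt)


def light_brightness(reaction, max_brightness=1.0):
    """Map a reaction value, assumed to lie in [0, 1], to a brightness."""
    return max_brightness * min(max(reaction, 0.0), 1.0)


if __name__ == "__main__":
    def rising(_position):
        return (0.0, 1.0, 0.0)  # constant upward flow, for illustration only

    light = (0.0, 0.0, 0.0)
    light = advect_light(light, rising, dt=0.5)
    print(light, light_brightness(0.25, max_brightness=2.0))
```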

Appendix A
Test GPUs Specifications
Figure A.1: Technical Specifications of both graphics cards used for testing. The
bandwidth and clock speeds are the key factors for performance

Appendix B
CD Contents
The attached CD contains the following directory structure:
Application    Contains the final application executable.
Dissertation   Contains an electronic copy of this dissertation document.
Instructions   Contains instructions for the operation of the application.
Media          Contains images and video of the final application.
Project        Contains the full source code and assets for the application.
Proposal       Contains an electronic copy of the original project proposal.

References
Bai, Y. and Turk, G. 2005. Reducing numerical dissipation in fluid simulation.
Georgia Institute of Technology. Available from: http://tinyurl.com/pcy4exs. 3.1
Barrett, J. 2012. Real-time animation and rendering of ocean waves. [Online]. 2.2
Bridson, R. 2008. Fluid Simulation for Computer Graphics. CRC Press. 2.1.1
Carucci, F. 2005. Inside Geometry Instancing. Addison-Wesley Professional.
Available from:
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter03.html. 2.3
Crane, K., Llamas, I., and Tariq, S. 2007. Real-Time Simulation and Rendering
of 3D Fluids. Addison-Wesley Professional. Available from:
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch30.html.
3.2, 3.3.2, 4.3.2, 4.4.1, 6.1
Fedkiw, R., Stam, J., and Jensen, H. W. 2001. Visual simulation of smoke. In:
SIGGRAPH 2001 Conference. 3.1, 3.2, 4.4.1
Fernando, R. et al. 2004. GPU Gems: Programming Techniques, Tips and Tricks for
Real-Time Graphics. Addison Wesley. Available from:
http://http.developer.nvidia.com/GPUGems. 3.2
Gourlay, M. 2012. Fluid simulation for video games. Intel Developer Zone.
Available from:
http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-3.
1, 3.3.1
Gupta, S. 2011. GPU supercomputers show exponential growth in top 500
list. [Online]. Available from:
http://blogs.nvidia.com/blog/2011/11/14/gpu-supercomputers-show-exponential-growth-in-top500-list/.
1

Harris, M. 2004. Fast Fluid Dynamics Simulation on the GPU. Addison Wesley.
chap. 38. Available from:
http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html. 1, 3.2, 4.4.1
Hess, J. and Smith, A. 1967. Calculation of potential flow around arbitrary bodies.
In: Progress in Aerospace Sciences. 3
Krüger, J. and Westermann, R. 2003. Linear algebra operators for GPU implementation
of numerical algorithms. In: SIGGRAPH 2003 Conference. Available from:
http://tinyurl.com/ozb5xpy. 3.2
McDonald, J. 2012. Don’t throw it all away: Efficient buffer management.
In: Game Developer Conference. Available from:
https://developer.nvidia.com/sites/default/files/akamai/gamedev/files/gdc12/Efficient_Buffer_Management_McDonald.pdf.
4.4.2
McGuire, M. 2006. A real-time, controllable simulator for plausible smoke.
Brown University. Available from:
http://graphics.cs.williams.edu/papers/SmokeSimBrown06/smoke-simulation-brown06.pdf.
3.3.1
MSDN. 2010. Compute shader overview. [Online]. Available from:
http://tinyurl.com/plpw97t. 1
MSDN. 2014. Resource interfaces. [Online]. Available from:
http://tinyurl.com/mwledo4. 4.3
Nguyen, H. et al. 2007. GPU Gems 3. Addison-Wesley Professional. Available from:
https://developer.nvidia.com/content/gpu-gems-3. 3.2
NVidia. 2013. NVidia Computational Fluid Dynamics Page. [Online]. Available from:
http://www.nvidia.com/object/computational_fluid_dynamics.html. 1
Stam, J. 1999. Stable fluids. In: SIGGRAPH 1999 Conference. Available from:
http://www.dgp.toronto.edu/people/stam/reality/Research/pdf/ns.pdf.
3.1, 3.2, 4.4.1
Stam, J. 2003. Real-time fluid dynamics for games. In: Game Developer Conference.
Available from:
http://www.dgp.toronto.edu/people/stam/reality/Research/pdf/GDC03.pdf.
(document), 2.1, 3.1, 3.3.1
Steinhoff, J. and Underhill, D. 1994. Modification of the Euler equations for “vorticity
confinement”: Application to the computation of interacting vortex rings. Physics
of Fluids. 3.1

Hellgate: London. 2007. DVD-ROM. 3.2
Tangvald, L. 2007. Implementing LOD for physically-based real-time fire rendering.
[Online]. 4.4.4
Valve, S. 2012. Level of detail. Valve Developer Portal. Available from:
https://developer.valvesoftware.com/wiki/Level_of_detail. 2.3
Zhou, K. et al. 2007. Real-time smoke rendering using compensated ray marching.
Microsoft Research. Available from:
http://research.microsoft.com/pubs/70503/tr-2007-142.pdf. 3.3.2, 4.5

Bibliography
Acheson, D. 1990. Elementary Fluid Dynamics. Clarendon Press.
Rideout, P. 2011. 3D Eulerian grid. Available from:
http://prideout.net/blog/?p=66.
Selle, A. et al. 2007. An unconditionally stable MacCormack method. Available from:
http://tinyurl.com/nm4novl.