Real-Time Physically-Based 
Fluid Simulation on the GPU 
Valentin Hinov
Declaration of Originality and Permission to Copy 
Author: Valentin Hinov 
Title: Real-Time Physically-Based Fluid Simulation on the GPU 
Degree: BSc (Hons) Computer Games Technology 
Year: 2014 
(i) I certify that the above mentioned project is my original work. 
(ii) I agree that this dissertation may be reproduced, stored or transmitted, in any 
form and by any means without the written consent of the undersigned. 
Signature: ................................................................. 
Date: ................................................................. 
Abstract 
Physically-based fluid simulation has long been reserved for the realm of offline rendering. 
Continuing improvements in the parallel computational power of graphics 
cards are bringing the opportunity to simulate these phenomena in real-time. This 
project aims to prove that, with certain simplifications and optimisations, fluid 
simulation can be used in demanding applications such as games. 
A framework is created for this project to present methods for calculating and 
rendering fire and smoke using the parallel processing power of the graphics card 
through the DirectX 11 Compute Shader APIs. The suggested approach takes into 
consideration the importance of maintaining performance in a real-time application. 
Various LOD (Level of Detail) and performance optimisation methods used in games 
are adopted and modified for this purpose. 
The most important variable for smooth gameplay is the frames-per-second (FPS) 
that an application maintains. By keeping a constant measure of it, the framework 
provides a means to monitor the stability and effectiveness of the implementation. 
The results of this project show that proper adoption of LOD techniques, such 
as frame skipping, can greatly reduce processing overhead. In addition, the 
use of instancing techniques allows multiple fluids to be rendered at the cost 
of simulating just one. This, together with careful texture management, 
helps keep the memory and processing footprint low. Combined, these techniques 
provide an optimised solution for using physically-based fire and smoke in a real-time 
setting, which maintains both accuracy and visual quality. Measurements show 
that simulating three differently sized fluid domains (64x128x64, 40x80x40 and 30x60x30) 
maintains an average frame rate of over 800 FPS on a high tier graphics card, while still 
managing a comfortable 50 FPS on a low tier one. 
Keywords: fluid simulation, performance, DirectX 11, Compute Shader 
Preface 
I would like to take this opportunity to express my gratitude for the support and help 
I have received from my supervisor, Dr David MacTaggart, and my module tutor, 
Dr Henry Fortuna. I would also like to thank Alex Dunn, who provided me with 
valuable advice and constructive criticism. 
I am also immensely grateful for the help and patience provided by Tsvetelina 
Dacheva and the support of my parents during the long production hours on this 
project. 
- Valentin Hinov 
Contents 
Abstract 
Preface 
List of Figures 
List of Tables 
1 Introduction 
1.1 Project Aim 
1.2 Dissertation Structure 
2 Background 
2.1 Mathematics of Fluid Flow 
2.1.1 Modelling the Simulation Space 
2.2 State of Fluids in Games 
2.3 LOD and Performance Overview 
3 Literature Review 
3.1 Early work on real-time solvers 
3.2 The GPU advantage 
3.3 3D Fluid Rendering 
3.3.1 Particle Systems 
3.3.2 Volume Ray Casting 
4 Methodology 
4.1 Introduction and Goals 
4.1.1 Framework Architecture 
4.1.2 Methodology Structure 
4.2 Fluid Domain Representation 
4.3 Setting up a Simulation 
4.3.1 Choosing a Grid Size 
4.3.2 Setup Optimisations 
4.4 Running a Simulation 
4.4.1 Simulation Steps 
4.4.2 Runtime Modifications 
4.4.3 Choosing an Update Rate 
4.4.4 Frame Skipping 
4.5 Rendering Fluids 
4.5.1 Render Parameters 
4.5.2 Fluid Instancing 
5 Results and Discussion 
5.1 Testing Setup 
5.1.1 Hardware 
5.1.2 Scene 
5.2 Visual Results 
5.2.1 Modifying Parameters 
5.3 Memory & Performance Results 
5.3.1 Memory Use 
5.3.2 Performance 
6 Conclusion and Future Work 
6.1 Future Work 
Appendix A Test GPUs Specifications 
Appendix B CD Contents 
References 
Bibliography 
List of Figures 
1.1 Computational performance of Navier-Stokes equations on new NVidia GPUs 
3.1 Advection step moving smoke density along a velocity field. As shown in Stam’s solver from 2003 (Stam, 2003) 
3.2 Smoke being pushed and moving around by a gargoyle in "Hellgate: London" 
4.1 Different grids, same render scale. Domain size from left to right: 16x32x16, 32x64x32, 64x128x64 
4.2 Left: Using MacCormack for density and reaction only; Right: Using MacCormack for all fields 
4.3 Different vorticity confinement strengths. Strength factors from left to right: 0, 0.5, 1.0 
4.4 Users have the freedom to edit fluid control settings at runtime to observe their effects. Reaction values are not used for smoke simulation 
4.5 Render settings modify the look of a fluid without changing its physical properties 
4.6 Different sample rates of a 64x128x64 fluid from afar. Left: 32 samples; Right: 128 samples 
4.7 Different sample rates of a 64x128x64 fluid from up close. Left to right: 32, 64, 128 samples 
5.1 Looking at the entire final scene from a distance with all fluids in view 
5.2 Smoke and fire simulation in the application 
5.3 Left: Fast decaying fire, producing a lot of smoke; Mid: Strong fire, burning with nearly no smoke; Right: Average strength fire, producing blue smoke 
5.4 Benchmark results on notebook computer using a NVidia GT 640M LE GPU 
5.5 Benchmark results on gaming PC using an AMD Radeon R9 290X GPU 
A.1 Technical Specifications of both graphics cards used for testing. The bandwidth and clock speeds are the key factors for performance 
List of Tables 
4.1 Texture Formats and their uses 
5.1 Hardware used for testing 
5.2 Video memory used for simulations of different resolution 
Chapter 1 
Introduction 
Fluid simulation has been a hot topic in computer graphics, especially in the last 
decade, in which the dramatic increase in computational power has affected not only the 
CPU (Central Processing Unit) but also the much more parallel-focused GPU (Graphics 
Processing Unit) (Gupta, 2011). 
For virtual environments in games, a correct portrayal of natural phenomena, such 
as smoke and fire, aids greatly in immersing the player in the world and making 
it appear believable. However, realistic rendering and simulation of these fluids 
requires a considerable amount of resources - both from a processing and from a 
memory standpoint. In fact, when an extreme degree of accuracy is needed - for 
example, how a ship’s design will handle at sea - high-performance computing (HPC) 
centres are used and the calculations often take several months to complete. 
Computer games, by their very nature, are all about interacting in real-time 
with a virtual world. As CPUs and graphics cards have become more powerful, the 
expectations of how fast real-time is and how good the worlds look have increased. 
Depending on the game, the accepted frame rate varies between 30 and 60 FPS. Drops 
below 30 become instantly obvious, as the world seems to experience "hiccups" and 
the action slows down. In fast-paced, twitch-based experiences, such as first-person 
shooters or real-time strategies, maintaining a frame rate of 60 is often a requirement 
for smooth gameplay. 
The challenge of integrating realistic fluid simulation in a virtual world, while adhering 
to these requirements, is the main motivation behind this project. Let’s start 
by defining what a fluid is. A fluid is any substance that flows - meaning it can 
take the shape of its container. This includes liquids, such as water, and gases, such 
as air. Smoke can also be described as a fluid, although it is more accurate to say 
that it is composed of tiny particulates suspended in a gas (Gourlay, 2012). Fire is 
the chemical process of combustion, leading to the release of heat and light. As it 
decays, smoke forms as a by-product. 
In graphics a fluid can be modelled as a grid system of cells, each containing the 
properties of the fluid at that location. The most important of these are velocity - 
the speed and direction of the flow at a cell location, and density - the amount of 
material that position contains. Every update step, the equation of motion is applied 
to each cell and the quantities of the properties it contains change. It follows 
that, depending on the grid size and quality of the simulation, this traversal and 
update can be quite an expensive operation. 
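The grid model described above can be sketched on the CPU as follows. This is only an illustration and not the layout used by the framework: the names `Cell`, `FluidGrid` and `decayDensity` are hypothetical, and the update is a placeholder that simply scales density, standing in for the real equation of motion.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cell layout holding the two properties named in the text.
struct Cell {
    float velocity[3] = {0.0f, 0.0f, 0.0f}; // speed and direction of the flow
    float density = 0.0f;                   // amount of material in the cell
};

// Hypothetical dense 3D grid stored as one flat array.
struct FluidGrid {
    std::size_t sizeX, sizeY, sizeZ;
    std::vector<Cell> cells;

    FluidGrid(std::size_t x, std::size_t y, std::size_t z)
        : sizeX(x), sizeY(y), sizeZ(z), cells(x * y * z) {}

    // Map a 3D cell coordinate to its index in the flat array.
    std::size_t index(std::size_t x, std::size_t y, std::size_t z) const {
        return x + sizeX * (y + sizeY * z);
    }

    Cell& at(std::size_t x, std::size_t y, std::size_t z) {
        return cells[index(x, y, z)];
    }
};

// Placeholder update: every cell is visited once per step, so the cost grows
// with the number of cells. Here the density is simply scaled by `factor`.
inline void decayDensity(FluidGrid& grid, float factor) {
    for (Cell& c : grid.cells) c.density *= factor;
}
```

A 64x128x64 grid already contains 524,288 cells, each of which has to be visited on every update step.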
This is where the GPU advantage comes in - splitting up a task into a lot of smaller 
parallel-running jobs is exactly what the hardware excels at. In fact, early 2D GPU 
fluid dynamics experiments saw a performance increase of up to six times compared 
to a CPU implementation (Harris, 2004). As GPGPU (General-purpose computing 
on graphics processing units) has advanced with technologies such as CUDA and 
DirectCompute (MSDN, 2010), speeds of up to ten times faster are becoming a 
reality (NVidia, 2013). 
Figure 1.1: Computational performance of Navier-Stokes equations on new NVidia 
GPUs 
Simply moving all calculations to the graphics card does not solve the problem fully. 
What needs to be considered is that in a proper real-time interactive application, 
the GPU will be engaged with many other activities - such as rendering polygons, 
doing lighting calculations and others - meaning that fluid simulation and rendering 
cannot be the only task occupying resources. 
1.1 Project Aim 
The objective of this project is to investigate the simulation of physically-based 
fluids by taking advantage of modern GPU hardware with the aim of answering the 
following question: 
How can the parallel processing advantage of modern graphics cards be 
used for simulating physically-based fluids, and how can this approach 
be adapted for real-time use? 
The main consideration during this investigation will be what simplifications can 
be made when simulating - both when setting up and during runtime - in order to 
obtain a result which is both graphically impressive and computationally efficient. 
The project hopes to achieve the following: 
• Derive an effective way of utilising the GPU for solving the equations of fluid 
motion in 3D. 
• Discover what level of detail methods and performance optimisations can be 
applied in order to use fewer system resources. 
• Draw conclusions and recommendations for further research into this area. 
Over the course of this undertaking, an experimental framework will be developed 
to showcase the research discoveries. It will also be used to gather quantitative data 
in order to make the appropriate conclusions as to the effectiveness of the provided 
solutions. 
1.2 Dissertation Structure 
Chapter 2 gives additional background on the main topics of discussion: the mathematical 
formulas describing fluid motion and the current state of fluid representation in 
games. It ends with a review of how level of detail is used in games to increase 
performance and how existing techniques can be adapted to fluid simulation. 
Chapter 3 presents past research in the area of fluid simulation - starting from 
work on early real-time solvers and moving on to research into using the GPU. It 
also discusses past work on ways of rendering fluids as well as research into integrating 
level of detail in fluid simulation. 
Chapter 4 describes this project’s implementation of a physically-based incompressible 
Navier-Stokes solver. It also discusses what optimisations are used for setting up and 
running a simulation. It ends with how fluid rendering is handled. 
Chapter 5 analyses the results and data collected from the experimental framework 
and explores their implications. 
Chapter 6 concludes this dissertation and makes recommendations for future work 
that could be undertaken. 
Chapter 2 
Background 
2.1 Mathematics of Fluid Flow 
For understanding of the mechanics behind fluid dynamics, knowledge of differential 
vector operations is expected. Gradient $\nabla f$, divergence $\nabla \cdot \vec{v}$, curl $\nabla \times \vec{v}$, directional 
derivative $\vec{v} \cdot \nabla f$ and the Laplacian $\nabla^2$ are all used in the Navier-Stokes Equations 
which are described below. 
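$$\frac{\partial \vec{u}}{\partial t} = -(\vec{u} \cdot \nabla)\vec{u} - \frac{1}{\rho}\nabla p + \nu \nabla^2 \vec{u} + \vec{F} \qquad (2.1)$$
$$\nabla \cdot \vec{u} = 0 \qquad (2.2)$$
Where $\vec{u}$ is the velocity of the fluid; $p$ is the pressure; $\rho$ is the density; $\nu$ controls 
the viscosity of the fluid and $\vec{F}$ encapsulates all external forces acting on it. 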
Equation (2.1) is known as the momentum equation of fluid flow. It is derived from 
Newton’s 2nd law of motion, which means it describes the acceleration of the fluid 
due to the forces acting on it. From left to right, the terms on the right-hand side 
represent advection, pressure, diffusion and external forces. When dealing with complex 
media it is common to make simplifying assumptions so as to more easily model the 
problem. Thus, the fluid is assumed to be incompressible and homogeneous. 
Equation (2.2), the continuity equation, enforces the incompressibility assumption by 
ensuring that the fluid always has zero divergence, meaning that the volume of the 
fluid will remain constant in time. 
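To make the zero-divergence condition concrete, the sketch below evaluates a central-difference approximation of the divergence on a small 2D grid. It is an illustration under stated assumptions rather than part of the framework: the grid is collocated and two-dimensional, and `Vec2`, `divergence` and `linearField` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Central-difference divergence at interior cell (i, j) of a grid that is
// `width` cells wide, with cell spacing h.
inline float divergence(const std::vector<Vec2>& u, std::size_t width,
                        std::size_t i, std::size_t j, float h) {
    const float dudx = (u[(i + 1) + width * j].x - u[(i - 1) + width * j].x) / (2.0f * h);
    const float dvdy = (u[i + width * (j + 1)].y - u[i + width * (j - 1)].y) / (2.0f * h);
    return dudx + dvdy;
}

// Fill a grid with the linear velocity field u = (a * x, b * y).
inline std::vector<Vec2> linearField(std::size_t width, std::size_t height,
                                     float a, float b) {
    std::vector<Vec2> u(width * height);
    for (std::size_t j = 0; j < height; ++j)
        for (std::size_t i = 0; i < width; ++i)
            u[i + width * j] = {a * float(i), b * float(j)};
    return u;
}
```

The field (x, -y) squeezes the fluid along one axis exactly as much as it stretches it along the other, so its divergence is zero and it satisfies the continuity equation. The field (x, y) expands everywhere and has a divergence of 2, so it would not be a valid incompressible velocity field.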
The Navier-Stokes equations are commonly used because they precisely describe the 
evolution of a velocity field over time given its current state and other forces 
(Stam, 2003). The key task of a fluid solver is to compute a numerical approximation of $\vec{u}$. 
This velocity field later controls the visual phenomena of the fluid - smoke density 
or fire reaction values for example. 
2.1.1 Modelling the Simulation Space 
Fluids are typically modelled in one of two ways - as a field or as a particle system. 
These are referred to as the Eulerian and Lagrangian viewpoints, respectively. The 
first considers the fluid as a region of points - each containing properties like velocity 
and density. These values change with time, but the points containing them stay 
fixed in space. The Lagrangian viewpoint takes the more conventional approach of 
modelling the continuum as a set of particles. Each particle, in addition to carrying 
with it the properties of the fluid, has a position component. The easiest way to 
visualise this is to think of the particles as molecules of fluid that move in time. The 
most common way of representing Eulerian fluids is an arrangement of voxels and 
Lagrangian ones as classic particle systems. A more in-depth description of these 
viewpoints can be found in (Bridson, 2008). 
2.2 State of Fluids in Games 
Fluid simulation in games covers a wide range of phenomena - the most important 
of which are water, smoke and fire. As this project mainly deals with the latter two, 
they will be the focus of discussion. For a more detailed look at the state of water 
in games, please refer to (Barrett, 2012). 
3D games have to do an impressive amount of work to provide an immersive experience. 
During each update call, physics, pathfinding, lighting, rendering and other 
calculations have to be computed. It is no surprise that developers look to simplify 
effects whenever they can. Particle systems have for the longest time been used to 
model smoke. Particles are just 2D textured sprites, which always face the virtual 
camera. Fire is often rendered the same way with the addition of static animations. 
With improvements in lighting and particle control, the look of the effects does 
visibly improve, but at its core the simulation is not based on physical properties 
but is determined by design tools. 
An additional drawback of this representation is that it makes proper interaction 
with the fluids very difficult to achieve. In contrast, as is discussed later, in a suitably 
defined Navier-Stokes solver boundary conditions are part of the simulation process 
and can be used as a means of accurate interaction with the system. 
2.3 LOD and Performance Overview 
The need to render many different graphical effects on a system with limited resources 
has been a widely explored challenge. Various techniques are often employed 
to save processing time and system memory while still maintaining good graphical 
quality. It is worthwhile to study some of these methods with the view of how they 
might be adopted for fluid simulation. 
A common performance boost when rendering 3D polygon mesh objects involves 
reducing the number of polygons they are made out of (Valve, 2012). There are 
a variety of different ways to accomplish this - either with pre-made low-poly versions 
of the mesh or via a procedural method at runtime, often using GPU shaders. 
A system is then set up to intelligently swap or blend between different versions 
based on various parameters, such as the distance of the object from the camera. The end result is 
less graphics bandwidth used and fewer computations made. If done properly, the 
player never notices it. 
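As a rough illustration of swapping between versions, the sketch below picks a level of detail from a single parameter. It assumes that distance from the camera is the only input and that the switch distances are chosen by the caller; `selectLod` and its thresholds are hypothetical and do not come from any particular engine.

```cpp
#include <cstddef>
#include <vector>

// Return the index of the LOD level to use for an object at `distance`.
// `thresholds` holds ascending switch distances (hypothetical values chosen
// by the caller). Level 0 is the most detailed version, and every threshold
// that has been reached moves the object one level down.
inline std::size_t selectLod(float distance, const std::vector<float>& thresholds) {
    std::size_t level = 0;
    while (level < thresholds.size() && distance >= thresholds[level]) ++level;
    return level;
}
```

With thresholds of 10 and 50 units, an object 5 units away uses level 0, one 10 units away uses level 1 and one 100 units away uses level 2. A production system would typically add hysteresis or blending so that the swap is not noticeable.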
Level of detail has other uses than just object rendering. It also has a place in 
complex calculations such as rigid body dynamics. For simulating collisions between 
bodies, for example, if the player is not looking at the objects in question, a 
simplified, less realistic calculation can take place. Whatever the player is looking at 
needs to behave consistently, but objects outside this direct area are less important 
and approximations can be used. 
A common occurrence in games is the need to render the same object 
many times. Creating copies of the resources required, like vertex buffers and 
textures, can quickly add up. Instead the same graphical data is used to render 
multiple copies of the object where required (Carucci, 2005). Since transferring triangle 
data from the CPU to the GPU and submitting state changes is a relatively 
slow operation, instancing is a method that frees up valuable CPU processing time. 
Batching as many draw calls together as possible is an often advised method of 
optimising game renderers. 
Chapter 3 
Literature Review 
Investigations into the physics behind fluid simulation date back to the 18th and 
19th centuries, when the mathematicians Euler, followed by Navier and Stokes, developed 
the basics of analytical solutions to fluid flows. With the start of the computer 
processing era came the possibility to calculate solutions to these equations numerically. 
Far from the idea of real-time applications, however, early research into the 
topic focused on engineering applications, striving for accuracy and not factoring in 
time taken (Hess and Smith, 1967). 
3.1 Early work on real-time solvers 
In the late 90s and early 2000s the prospect of real-time simulations started to be 
actively discussed in research fields. Up to this point the majority of the investigation 
had been into offline graphical solvers. Also, the majority of numerical 
solvers by that point used explicit techniques, which suffer from instability unless a 
small simulation time step is provided. It was Jos Stam who, at the 1999 SIGGRAPH 
conference, proposed an implicit Navier-Stokes solver that was stable under larger 
time steps and was fast enough so results could be viewed instantly (Stam, 1999). 
The importance of this paper stems from the fact that the method it put forward 
was designed to be used in real-time. Not only that, but this approach allows for 
boundary conditions to be dynamic and, as such, opens the door to interactivity 
with the fluid. For game applications this is key. The resultant technique is very 
successful in simulating gaseous-type fluids and will influence this study. 
Figure 3.1: Advection step moving smoke density along a velocity field. As shown 
in Stam’s solver from 2003 (Stam, 2003) 
Stam’s initial proposal has downsides, though. Namely, it suffers from numerical 
dissipation (also known as numerical diffusion or smoothing). As described by 
Bai and Turk (2005), this is due to the averaging operations performed when interpolating 
values in the numerical solvers for the differential equations. Stam’s method is 
affected because of the low-order accuracy of its advection routine. This not only 
tends to smooth out interesting features, like vortices in the fluid, but also makes 
the fluid appear too viscous. 
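The smoothing effect can be reproduced with a few lines of code. The sketch below is a one-dimensional, constant-velocity simplification of a semi-Lagrangian advection step and is not taken from any of the cited solvers; `sampleLinear` and `advect` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Linearly interpolate `q` at fractional position x, clamped to the grid.
inline float sampleLinear(const std::vector<float>& q, float x) {
    x = std::max(0.0f, std::min(x, float(q.size() - 1)));
    const std::size_t i0 = static_cast<std::size_t>(std::floor(x));
    const std::size_t i1 = std::min(i0 + 1, q.size() - 1);
    const float t = x - float(i0);
    return (1.0f - t) * q[i0] + t * q[i1];
}

// One semi-Lagrangian step with a constant velocity given in cells per unit
// time: each cell traces backwards along the velocity and samples the old
// field at the position it lands on.
inline std::vector<float> advect(const std::vector<float>& q, float velocity, float dt) {
    std::vector<float> out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i)
        out[i] = sampleLinear(q, float(i) - velocity * dt);
    return out;
}
```

Advecting the field {0, 0, 1, 0, 0, 0} by half a cell yields {0, 0, 0.5, 0.5, 0, 0}. The total amount of density is unchanged, but the peak has halved because every sample is an average of two neighbours. Repeated over many steps, this averaging is what blurs out small-scale detail.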
With this problem in mind, Fedkiw et al. (2001) presented a seminal 
paper in the 2001 SIGGRAPH proceedings. In Visual Simulation of Smoke 
the incompressible Euler equations are used as the fluid solver on a staggered grid 
arrangement. They are combined with a method called vorticity confinement 
(Steinhoff and Underhill, 1994), which injects back the energy lost due to numerical dissipation, 
effectively balancing out the simulation. The result is that, even on a fairly 
coarse grid, the aforementioned interesting features, such as swirling vortices in the 
smoke field, are preserved and the overall lifespan of the smoke is improved. Like 
Stam’s proposed method, this one is stable for large timesteps and allows for dynamic 
boundaries. The downside of this procedure is that it introduces an extra 
computational step in the algorithm. The step itself is not overly expensive and 
greatly enhances the simulation’s look, so it will be considered for this research. 
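A minimal sketch of the extra step is given below, assuming a 2D grid on which the vorticity (the z component of the curl) has already been computed. The force follows the usual form f = epsilon * h * (N x w), where N is the normalised gradient of the vorticity magnitude. The names `Force2` and `confinementForce` are hypothetical, and this is a simplification rather than the formulation of any specific paper.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Force2 { float x, y; };

// Vorticity confinement force at interior cell (i, j), given a precomputed
// scalar vorticity field `w` on a grid that is `width` cells wide.
// epsilon scales the strength of the effect and h is the cell spacing.
inline Force2 confinementForce(const std::vector<float>& w, std::size_t width,
                               std::size_t i, std::size_t j,
                               float epsilon, float h) {
    auto mag = [&](std::size_t x, std::size_t y) { return std::fabs(w[x + width * y]); };
    // Gradient of the vorticity magnitude, using central differences.
    const float gx = (mag(i + 1, j) - mag(i - 1, j)) / (2.0f * h);
    const float gy = (mag(i, j + 1) - mag(i, j - 1)) / (2.0f * h);
    const float len = std::sqrt(gx * gx + gy * gy);
    if (len < 1e-6f) return {0.0f, 0.0f}; // no gradient, so no direction to push in
    const float nx = gx / len, ny = gy / len;
    const float wc = w[i + width * j];
    // (nx, ny, 0) x (0, 0, wc) = (ny * wc, -nx * wc, 0)
    return {epsilon * h * ny * wc, -epsilon * h * nx * wc};
}
```

The resulting force is added to the external forces of the solver. Setting epsilon to zero disables the effect, while larger values produce more pronounced swirling.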
3.2 The GPU advantage 
At the SIGGRAPH 2003 conference the power of the GPU was the topic of discussion. 
Krüger and Westermann demonstrated that the parallelism of graphics 
processors can be used as a matrix solver and to handle finite difference equations 
for PDE approximations (Krüger and Westermann, 2003). On an ATI 9800 card an 
interactively visualised 2D Navier-Stokes solution ran at 9 FPS (frames per second) 
on a 1024x1024 grid. In contrast, the CPU solvers provided by (Fedkiw et al., 2001) 
need more than a second per frame on a similarly sized domain. The advantage of 
the GPU became obvious and fluid dynamics research reflected that. 
In 2004, as part of the GPU Gems book (Fernando et al., 2004), Harris wrote a 
chapter entitled Fast Fluid Dynamics on the GPU (Harris, 2004). He described 
a method, based on Stam’s Stable Fluids technique that offloaded all equation 
of motion calculations to the graphics card and produced a very fast interactive 
2D Navier-Stokes solver. He successfully demonstrated how the grid data can be 
translated into textures and how pixel shaders, which run simultaneously on each 
pixel every render call, can be used to calculate the simulation. To solve the pressure Poisson 
equation he used a Jacobi iteration scheme, which, compared to the Krüger 
and Westermann conjugate gradient and multigrid solvers, converges more slowly, but is 
simple to implement and makes easy use of parallel calculations. Harris details how 
his approach can be easily extended to allow for arbitrary boundaries (indeed, it is 
just an addition of a texture that contains them to each shader). This GPU Gems 
chapter provides a straightforward introduction into using shaders as a fluid solver 
and will influence the early investigation of this study. Harris also describes a means 
to extend the domain into 3D by layering 2D textures, but as current technology 
allows for easy use of 3D Textures, it will not be considered. 
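The appeal of the Jacobi scheme for parallel hardware can be seen in a small CPU sketch. It assumes a 2D grid whose boundary cells are held at zero pressure, which is a simplification made only for this example, and `jacobiPressure` is a hypothetical name. It approximately solves laplacian(p) = div.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Approximately solve laplacian(p) = div on a width-by-height grid with cell
// size h. Boundary cells stay at zero pressure (a simplifying assumption).
inline std::vector<float> jacobiPressure(const std::vector<float>& div,
                                         std::size_t width, std::size_t height,
                                         float h, int iterations) {
    std::vector<float> p(width * height, 0.0f), next(width * height, 0.0f);
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t j = 1; j + 1 < height; ++j) {
            for (std::size_t i = 1; i + 1 < width; ++i) {
                const std::size_t c = i + width * j;
                // Each cell reads only the previous iterate, so all cells
                // could be updated at the same time.
                next[c] = (p[c - 1] + p[c + 1] + p[c - width] + p[c + width]
                           - h * h * div[c]) * 0.25f;
            }
        }
        std::swap(p, next);
    }
    return p;
}
```

Because every cell depends only on values from the previous iteration, the two inner loops map directly onto one shader invocation per cell. The price is that many iterations are needed before the result is close to the true solution.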
As graphics hardware saw a staggering growth, in 2007 Harris’ work was built upon 
in GPU Gems 3 (Nguyen et al., 2007). In the chapter Real-Time Simulation and 
Rendering of 3D Fluids (Crane et al., 2007) the authors extend GPU simulation of 
real-time fluid dynamics into the 3D domain. Their example program successfully 
simulated either fire, smoke or water in a 70x70x100 grid. Additionally, using the 
powerful Direct3D 10 support for 3D textures and the brand new geometry shader 
functionality, their method allowed for any 3D object to be voxelised and used as a 
dynamic boundary for the simulation. Results of this can be seen in the game 
Hellgate: London (Studios, 2007), which utilises this procedure. 
Figure 3.2: Smoke being pushed and moving around by a gargoyle in Hellgate: 
London 
In the 2004 GPU Gems chapter, Harris uses a semi-Lagrangian backward advection 
step that is based upon the one used by Stam (Stam, 1999) and as such suffers from 
numerical smoothing. Crane et al. address this issue by utilising a MacCormack 
scheme, which is a higher-order accuracy advection solver, in addition to vorticity 
confinement. While this introduces two intermediate semi-Lagrangian steps in the 
advection process and is not unconditionally stable, it allows for better visual 
fidelity of the final result without increasing the grid resolution. This saves memory 
bandwidth at the expense of more computation, but in the chapter’s words, math is 
cheap compared to bandwidth. As this project will be looking into 3D smoke and 
fire simulation by utilising graphics hardware, this work will be used for reference. 
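The structure of such a scheme can be sketched in one dimension with a constant velocity. The code below is a simplified illustration, not the shader from the chapter: it performs a forward semi-Lagrangian pass, a backward pass, adds half of the round-trip error and then clamps the result to the values used by the forward sample so that no new extrema appear. The names `lerpSample`, `semiLagrangian` and `macCormack` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Linear interpolation of `q` at fractional index x (clamped to the grid).
inline float lerpSample(const std::vector<float>& q, float x) {
    x = std::max(0.0f, std::min(x, float(q.size() - 1)));
    const std::size_t i0 = static_cast<std::size_t>(x);
    const std::size_t i1 = std::min(i0 + 1, q.size() - 1);
    return q[i0] + (x - float(i0)) * (q[i1] - q[i0]);
}

// Semi-Lagrangian step for a constant shift, measured in cells.
inline std::vector<float> semiLagrangian(const std::vector<float>& q, float shift) {
    std::vector<float> out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) out[i] = lerpSample(q, float(i) - shift);
    return out;
}

// MacCormack step: forward pass, backward pass, then error correction.
inline std::vector<float> macCormack(const std::vector<float>& q, float shift) {
    const std::vector<float> fwd = semiLagrangian(q, shift);     // first intermediate step
    const std::vector<float> back = semiLagrangian(fwd, -shift); // second intermediate step
    std::vector<float> out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        const float corrected = fwd[i] + 0.5f * (q[i] - back[i]);
        // Clamp to the two values the forward sample interpolated between.
        const float x = std::max(0.0f, std::min(float(i) - shift, float(q.size() - 1)));
        const std::size_t i0 = static_cast<std::size_t>(x);
        const std::size_t i1 = std::min(i0 + 1, q.size() - 1);
        const float lo = std::min(q[i0], q[i1]), hi = std::max(q[i0], q[i1]);
        out[i] = std::max(lo, std::min(corrected, hi));
    }
    return out;
}
```

For a single spike of height 1 shifted by half a cell, the plain semi-Lagrangian step leaves a peak of 0.5, whereas the corrected step keeps it at 0.75. The extra two passes are the additional computation that is traded for the sharper result.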
3.3 3D Fluid Rendering 
Graphics cards are optimised for rendering polygons and especially triangles. This 
must be taken into account when it comes to displaying volumetric data, such as 
smoke, as there is no native way of rendering volume. 
3.3.1 Particle Systems 
In almost all early real-time fluid solvers (Stam, 2003) and many modern ones 
(Gourlay, 2012) and (McGuire, 2006), the approach is to use particles. This has 
the initial advantage of using an already established system that is common in 
games and other graphical applications. Lagrangian or semi-Lagrangian schemes 
are used to represent the domain. In the example from Gourlay, there are two types 
of particles. The first are called vortex particles (or vortons). They are used to 
represent the flow field and are free to move anywhere. The second type are just 
regular particles, used for visualisation of the effect. They change their colour and 
opacity state, depending on the vortons around them. 
Particle systems are a natural fit for a Lagrangian scheme and have 
the advantage that the simulation space can be global, instead of being constrained to a fixed 
grid size. The disadvantage when using particles to visualise the fluid is the CPU 
and GPU memory and processing overhead of storing and updating all of them. The 
finer the detail level required, the more particles need to be used. Another downside 
is that an unconstrained, dynamic simulation space is difficult to implement when 
using the GPU. 
3.3.2 Volume Ray Casting 
The other main rendering technique which has gained more traction in GPU fluid 
solvers is called ray-marching (or volume ray casting). This is the approach used by 
(Crane et al., 2007) and (Zhou et al., 2007). This method works by considering the 
fluid as a box, made up of many voxels, which contain the fluid properties. When 
rendering, rays are traced from the point of view to the volume. The rays are then 
marched through the domain with a predefined sample rate, accumulating colour 
based on what the volume contains - smoke density, for example. This ends either 
when enough density to get a fully saturated colour has been collected, or the ray 
exits the volume. Usually, to get a decent visual result, a step size equal to half a 
voxel is used when marching. Results of using volume ray casting can be seen in 
Figure 3.2. 
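The marching loop itself can be sketched as follows. This is a CPU illustration under stated assumptions, not the pixel shader used by the cited works: the ray has already been intersected with the volume, `density` is a hypothetical callback that returns the fluid density at a distance t along the ray, and a sample's opacity is taken to be density multiplied by the step size.

```cpp
#include <algorithm>
#include <functional>

struct MarchResult {
    float opacity; // accumulated opacity in [0, 1]
    int samples;   // number of samples actually taken
};

// March from tEnter to tExit in fixed steps, compositing front to back.
inline MarchResult marchRay(const std::function<float(float)>& density,
                            float tEnter, float tExit, float step) {
    MarchResult r{0.0f, 0};
    for (float t = tEnter; t < tExit; t += step) {
        const float a = std::min(1.0f, density(t) * step); // opacity of this sample
        r.opacity += (1.0f - r.opacity) * a;
        ++r.samples;
        if (r.opacity >= 0.99f) break; // saturated: later samples are hidden
    }
    return r;
}
```

A ray through an empty volume visits every sample and returns zero opacity, while a ray that enters dense smoke stops after a handful of samples. The early exit is what keeps thick fluids cheap to render even at high sample rates.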
There are certain problems present in ray-marching. As discussed by (Crane et al., 
2007), banding is a visible artifact that appears if the sample step is too big or grid 
resolution is too small. It is mostly prevalent when looking at the fluid up close. 
There are certain ways to compensate for this - by using a smaller sample step, or 
taking an extra sample at each step. These both come at an additional computational 
cost, though. 
Ray-marching fits very well with the Eulerian and semi-Lagrangian schemes as it 
considers the simulation domain as a fixed grid with changing properties. It is 
also a technique that is inherently parallel and can naturally be implemented using 
GPU pixel shaders. As this work will focus entirely on leveraging the graphics card, 
volume ray casting will be the rendering method considered. 
Chapter 4 
Methodology 
4.1 Introduction and Goals 
Upon completing the research into this topic and revising the major project aims, 
implementation goals are set. To recap - the main objective of this project is to 
provide an effective way of performing fluid simulation on the GPU and adapt it to 
be used at runtime. With this in mind, the framework created is tasked to fulfil the 
following: 
• Support simulation of both fire and smoke. 
• Compute and render at least 2 different fluids at the same time. 
• Maintain a frame rate of at least 30 FPS on low-to-mid tier graphics cards 
and 60 FPS on high-end ones. 
• Showcase improved application performance by using one or more LOD techniques. 
4.1.1 Framework Architecture 
The framework developed for this project targets the Windows 7+ operating systems. 
It uses Direct3D for rendering and is implemented in C++. For graphics 
card operations it makes use of the powerful DirectCompute API along with HLSL 
(High-Level Shading Language) for writing compute and pixel shaders. 
4.1.2 Methodology Structure 
The methodology starts by describing how the fluid domain is represented. Then, 
it covers the setup of a simulation and the optimisations that it makes use of. What 
follows is a description of the process of running a fluid simulation and the 
techniques that are used to control it at runtime. The methodology concludes with 
how fluids are rendered. Please note that the terms "fluid simulation" and "fluid 
object" are used interchangeably. 
4.2 Fluid Domain Representation 
In order to solve the fluid equations of motion numerically, the domain must be 
discretized into computational points that the solver works with. As discussed in 
the background chapter, an Eulerian representation models a domain as a grid of 
points that contain the properties of the fluid. This is the approach this framework 
utilises. The main reason is that 3D Eulerian grids can be logically mapped 
to voxels in 3D textures, which contain the data the GPU requires. The following is 
part of a function which creates a 3D texture that can hold four 16-bit floating point 
numbers per voxel: 
D3D11_TEXTURE3D_DESC textureDesc; 
textureDesc.Width = SizeX; 
textureDesc.Height = SizeY; 
textureDesc.Depth = SizeZ; 
textureDesc.Format = DXGI_FORMAT_R16G16B16A16_FLOAT; 
Where SizeX, SizeY and SizeZ vary depending on the size of the required domain 
bounds. The format can also be changed. Each fluid uses a number of equally sized 
textures to represent its state and properties. These are then stored on and used by 
the graphics card. The equations of motion are calculated by running computational 
kernels (implemented by shader programs) over the textures. 
4.3 Setting up a Simulation 
The first necessary requirement when creating a new fluid object is to determine 
its grid dimensions. Textures of this size are then created for the various fluid 
properties. Each texture is then used to create a ShaderParams structure, outlined 
below: 
struct ShaderParams { 
    CComPtr<ID3D11ShaderResourceView> mSRV; 
    CComPtr<ID3D11UnorderedAccessView> mUAV; 
}; 
Generally, a ShaderResourceView (SRV) is used as an input to a shader program, as 
it can only be read from, and an UnorderedAccessView (UAV) is used as an output, 
as it can be written to. A more detailed description of Direct3D resource interfaces 
can be found in (MSDN, 2014). For all fluid properties, except divergence, vorticity 
and obstacles, 2 textures and ShaderParams structures are created. This is due to 
the need to keep track of the fluid state at the previous time step in order to evaluate 
the new one. 
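The double buffering described above can be sketched on the CPU as a "ping-pong" pair. This is only an illustration under stated assumptions: the PingPong type and its members are hypothetical, and plain float vectors stand in for the 3D textures and their SRV/UAV views.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pair of ShaderParams structures: "read" plays
// the role of the SRV (state at the previous time step) and "write" the role
// of the UAV (state being computed for the new time step).
struct PingPong {
    std::vector<float> read;
    std::vector<float> write;

    explicit PingPong(std::size_t cellCount)
        : read(cellCount, 0.0f), write(cellCount, 0.0f) {
        assert(cellCount > 0);
    }

    // Once a kernel has filled "write", the new state becomes the input of
    // the next kernel. Swapping exchanges the buffers without copying cells.
    void Swap() { std::swap(read, write); }
};
```

After each solver step the pair would be swapped, so the next step reads what the previous one wrote. Properties that are only ever written and then consumed within one update (such as divergence) do not need the second buffer.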
Choosing what type the fluid will be is also decided during this step. The framework 
allows for two kinds - fire and smoke. While simulating both is nearly identical, fire 
simulation requires 2 extra textures to keep track of the fire reaction values (this 
determines the intensity of the fire at each cell and is used when rendering it). 
At the end of the setup static boundary conditions are initialised. As fluids are 
modelled as being in a box domain, boundary conditions are modelled as a single-cell 
wide obstacle along each wall of the box and are stored in an 8-bit texture. 
4.3.1 Choosing a Grid Size 
Grid size has the biggest effect on how fast a simulation step is processed. A 
32x64x32 domain, for example, will be evaluated more than 3 times faster than 
a 64x128x64 one. This is both because there are fewer cells to process (an eighth 
as many in this example) and because smaller textures need less memory bandwidth. 
The render size of the fluid is independent of its simulation domain. This means 
that both a high and a low-resolution grid can be rendered at the same size. 
Upscaling the render size of a coarse grid can lead to visible artifacts, as there will be 
fewer cells to sample for a good appearance. Similarly, rendering a fine grid at a smaller 
scale is potentially wasteful, as the increased detail is harder to spot. 
Figure 4.1: Different grids, same render scale. Domain size from left to right: 
16x32x16, 32x64x32, 64x128x64 
When setting up a simulation, it is best to first determine the render scale required 
and use that to choose the appropriate grid resolution in order to achieve the detail 
needed. It is worth noting that dynamically resizing textures at runtime is not 
feasible, so grid sizes stay fixed throughout execution. 
4.3.2 Setup Optimisations 
Memory is a key factor during setup since, depending on the fluid, a simulation can 
use up to 15 textures (due to double buffering) to keep all the required data for its 
state. These all have to be stored in video memory and bound to and unbound from 
the graphics pipeline every frame. Using many fluid objects risks bottlenecking the 
GPU and starving it of memory. 
Texture Formats 
Direct3D offers an expansive range of different texture formats that can be used 
on the GPU. Based on the needed number of components and their size, choosing 
the correct format helps reduce video memory used and keeps texture bandwidth 
low, increasing performance. This is the first place for potential optimisations. 
When constructing fluid object textures, the format chosen is the smallest one that 
can contain the data. The DXGI_FORMAT_R16G16B16A16_FLOAT format is 
used for textures that hold fluid velocity and vorticity, since it is the smallest 
16-bit-per-component floating-point format that provides at least 3 components per 
cell. Density and pressure, on the other hand, only need 1 component per grid cell, 
which means they can be created with the DXGI_FORMAT_R16_FLOAT format. 
This uses a quarter of the memory. 
It is worth mentioning that using 16 bits per float is in itself an optimisation over 
using 32-bit floats, but as shown in other research (Crane et al., 2007), the visual 
degradation due to precision is hardly discernible. Below is a table of all formats 
used and the properties they're used for. 
Table 4.1: Texture Formats and their uses 
Texture Format                    Fluid Property 
DXGI_FORMAT_R16G16B16A16_FLOAT    Velocity, Vorticity 
DXGI_FORMAT_R16_FLOAT             Density, Temperature, Reaction, Divergence, Pressure 
DXGI_FORMAT_R8_SINT               Obstacles 
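To make the saving concrete, the raw size of a single texture in each of the three formats can be estimated as follows. This is a rough sketch: the constant and function names are illustrative, and the figures ignore any padding or alignment the driver may add.

```cpp
#include <cassert>
#include <cstddef>

// Bytes per texel for the three formats listed in Table 4.1.
constexpr std::size_t kBytesRGBA16F = 8; // DXGI_FORMAT_R16G16B16A16_FLOAT: 4 x 16 bits
constexpr std::size_t kBytesR16F    = 2; // DXGI_FORMAT_R16_FLOAT: 1 x 16 bits
constexpr std::size_t kBytesR8      = 1; // DXGI_FORMAT_R8_SINT: 1 x 8 bits

// Raw size in bytes of one 3D texture (no driver padding accounted for).
inline std::size_t TextureBytes(std::size_t sizeX, std::size_t sizeY,
                                std::size_t sizeZ, std::size_t bytesPerTexel) {
    assert(bytesPerTexel > 0);
    return sizeX * sizeY * sizeZ * bytesPerTexel;
}
```

For a 64x128x64 grid this gives 4 MiB for a velocity texture, 1 MiB for a density texture and 0.5 MiB for the obstacle texture.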
Texture Sharing 
There are several resources that need to be created for each fluid object that are 
unique to it. These are the velocity, density, temperature, vorticity, obstacle and 
reaction (for fire simulation) textures. They are unique, as they must be maintained 
throughout program execution. On the other hand, the textures for velocity divergence 
and fluid pressure are only used temporarily by each solver. 
To take advantage of this, when a fluid simulation is constructed it first checks 
if an instance of these common resources has been created for its grid size. If so, it 
uses them; if not, it constructs them and makes them available for further sharing. 
When using many fluid objects, it is therefore advantageous to build them with the 
same grid size. 
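The lookup described above could be sketched as a small cache keyed by grid dimensions. The names SharedResources and AcquireShared are hypothetical, the float vectors stand in for the divergence and pressure textures, and the weak-pointer cache is one possible design rather than the framework's actual one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the temporary textures shared between solvers.
struct SharedResources {
    std::vector<float> divergence;
    std::vector<float> pressure;
};

// Returns the shared resources for a grid size, creating them on first use.
// Not thread-safe: it assumes fluids are constructed from a single thread.
inline std::shared_ptr<SharedResources> AcquireShared(int x, int y, int z) {
    assert(x > 0 && y > 0 && z > 0);
    static std::map<std::tuple<int, int, int>, std::weak_ptr<SharedResources>> cache;
    std::weak_ptr<SharedResources>& slot = cache[std::make_tuple(x, y, z)];
    if (std::shared_ptr<SharedResources> existing = slot.lock()) {
        return existing; // a fluid of the same size already created them
    }
    const std::size_t cells = static_cast<std::size_t>(x) *
                              static_cast<std::size_t>(y) *
                              static_cast<std::size_t>(z);
    auto created = std::make_shared<SharedResources>();
    created->divergence.assign(cells, 0.0f);
    created->pressure.assign(cells, 0.0f);
    slot = created; // make them available for further sharing
    return created;
}
```

Holding only weak references in the cache means the shared textures are released once the last fluid of that size is destroyed.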
4.4 Running a Simulation 
Once the 3D scene has been initialised, the main application loop begins. In it, each 
fluid simulation is updated based on the numerical equation of motion solver. For 
this implementation, fluids are modelled as both incompressible and inviscid. Thus, 
the equations of motion become: 
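$$ \frac{\partial \vec{u}}{\partial t} = -(\vec{u} \cdot \nabla)\vec{u} - \frac{1}{\rho}\nabla p + \vec{F} \tag{4.1} $$
$$ \nabla \cdot \vec{u} = 0 \tag{4.2} $$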
The calculation process involves solving each part of these equations in order, using 
the result from each as the input to the next. The function that does this every 
update is outlined below: 
void Fluid3DSolver::Process(ID3D11DeviceContext *context) { 
    // Set the obstacle texture - it is constant throughout the execution step 
    context->CSSetShaderResources(4, 1, &(mFluidResources.obstacleSP.mSRV.p)); 
    // Set all the constant buffers to the context 
    SetShaderBuffers(context); 
    // Advect temperature, density and reaction against velocity 
    AdvectProperties(); 
    // Advect velocity against itself 
    AdvectVelocity(); 
    // Determine how the temperature of the fluid changes the velocity 
    ComputeBuoyancy(); 
    // Add a constant amount of density and temperature back into the system 
    RefreshConstantImpulse(); 
    // If there are any extra forces - add them here 
    ApplyExtraForces(); 
    // Inject vorticity back into the system 
    ComputeVorticityConfinement(); 
    // Subtract the pressure gradient from the velocity field. This 
    // computes divergence free velocity. 
    ComputeProjection(); 
} 
Firstly, the function binds the obstacle texture to the graphics pipeline. This is 
because nearly all compute shader programs query this texture and there is no 
need to constantly rebind it. Afterwards, the SetShaderBuffers function copies the 
required fluid computational parameters - which control aspects of calculation - from 
standard C++ structs into GPU constant buffers. 
4.4.1 Simulation Steps 
The rest of the functions solve the equations of motion. They all have underlying 
similarities: binding required SRVs as inputs and UAVs as outputs to their respective 
shader programs. Each is calculated using a numerical method that estimates its 
value. Below are all the steps outlined in order of execution. 
Advection 
Advection is what happens when the velocity field of the fluid transports other quantities, 
including itself, along the flow. This is described by the term $(\vec{u} \cdot \nabla)\vec{u}$. There 
are two methods that the framework uses to calculate advection. 
The first, simpler one, is the trace-back implicit routine (Stam, 1999). It uses a 
semi-Lagrangian scheme to calculate the new quantity of a fluid property at a position 
by tracing back the trajectory to its former cell and copying the quantity. The 
advantage of this advection technique is that it is unconditionally stable for any 
time steps and velocities. 
$$ p(\vec{x}, t + \Delta t) = \lambda \times p(\vec{x} - \vec{u}(\vec{x}, t)\,\Delta t,\; t) - \mu \tag{4.3} $$
Here $p(\vec{x}, t + \Delta t)$ is the quantity at the new time step. $\lambda$ is a user-defined dissipation 
term. It is in the range $\lambda \in [0, 1]$ and it artificially controls how fast the quantity 
being advected dissipates: 1 is no dissipation and lower values lead to the quantity 
disappearing faster. $\mu$ is the decay constant. It is used only for fire simulation and 
controls how fast the fire reaction dies out. When it is used the end result is clamped 
to not go below 0. 
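A minimal CPU sketch of equation 4.3 on a 1D grid is given below. It assumes unit cell size, linear interpolation and positions clamped to the grid; the function names are illustrative and the framework itself performs this step in a compute shader on 3D textures.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linearly interpolated sample of a 1D field, with the position clamped to the grid.
inline float SampleLinear(const std::vector<float>& f, float x) {
    assert(!f.empty());
    x = std::min(std::max(x, 0.0f), static_cast<float>(f.size() - 1));
    const std::size_t i0 = static_cast<std::size_t>(std::floor(x));
    const std::size_t i1 = std::min(i0 + 1, f.size() - 1);
    const float t = x - static_cast<float>(i0);
    return (1.0f - t) * f[i0] + t * f[i1];
}

// One semi-Lagrangian step (equation 4.3) on a 1D grid with unit cell size.
// This sketch clamps the result at 0 unconditionally, which suits non-negative
// quantities such as density or reaction but not a velocity field.
inline std::vector<float> AdvectSemiLagrangian(const std::vector<float>& q,
                                               const std::vector<float>& u,
                                               float dt, float dissipation,
                                               float decay) {
    assert(q.size() == u.size());
    std::vector<float> out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        const float source = static_cast<float>(i) - u[i] * dt; // trace back
        const float value = dissipation * SampleLinear(q, source) - decay;
        out[i] = std::max(value, 0.0f);
    }
    return out;
}
```

Because every new value is an interpolation of existing ones, the result can never exceed the range of the input, which is where the unconditional stability comes from.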
The second advection routine used is the one proposed in (Crane et al., 2007) - 
the MacCormack scheme. It works by first performing two semi-Lagrangian steps, 
one by tracing forward and one by tracing back. Using those values, it performs a 
higher-order accuracy calculation, which leads to less numerical diffusion than the 
previous routine. 
$$ \hat{\phi}^{n+1} = A(\phi^{n}) $$
$$ \hat{\phi}^{n} = A^{R}(\hat{\phi}^{n+1}) $$
$$ \phi^{n+1} = \lambda \times \left( \hat{\phi}^{n+1} + \tfrac{1}{2}(\phi^{n} - \hat{\phi}^{n}) \right) - \mu \tag{4.4} $$
Here, $\phi^{n}$ indicates the advected property, and $\hat{\phi}^{n+1}$ and $\hat{\phi}^{n}$ are the two intermediate 
properties. $\phi^{n+1}$ gives the final property at the new time step. $A$ performs the 
advection routine 4.3 on the passed quantity and $A^{R}$ indicates that it is performed 
in reverse (meaning with a negative time step value). Again $\lambda$ is the dissipation factor 
and $\mu$ is the decay constant. When doing MacCormack advection, no artificial 
dissipation or decay is performed on the first two steps. Since this advection routine 
is not unconditionally stable, the final result is clamped within the minimum and 
maximum values of the surrounding grid cells. 
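The three steps of equation 4.4 can be illustrated on a 1D grid as follows. This is a sketch under assumptions, not the framework's shader: unit cell size, linear interpolation, and the clamp applied to the corrected value before dissipation and decay, using the two cells that bracket the traced-back position as the "surrounding" cells.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Field = std::vector<float>;

// Clamps a position to the valid index range of the field.
inline float ClampToGrid(const Field& f, float x) {
    return std::min(std::max(x, 0.0f), static_cast<float>(f.size() - 1));
}

// Linearly interpolated sample at a clamped position.
inline float Sample(const Field& f, float x) {
    x = ClampToGrid(f, x);
    const std::size_t i0 = static_cast<std::size_t>(std::floor(x));
    const std::size_t i1 = std::min(i0 + 1, f.size() - 1);
    const float t = x - static_cast<float>(i0);
    return (1.0f - t) * f[i0] + t * f[i1];
}

// Semi-Lagrangian step A without dissipation or decay.
// A negative dt gives the reversed step A^R.
inline Field Advect(const Field& q, const Field& u, float dt) {
    Field out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        out[i] = Sample(q, static_cast<float>(i) - u[i] * dt);
    }
    return out;
}

inline Field AdvectMacCormack(const Field& q, const Field& u, float dt,
                              float dissipation, float decay) {
    assert(!q.empty() && q.size() == u.size());
    const Field predicted = Advect(q, u, dt);          // phi-hat at n+1
    const Field reversed  = Advect(predicted, u, -dt); // phi-hat at n
    Field out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        float value = predicted[i] + 0.5f * (q[i] - reversed[i]);
        // Limit the corrected value to the cells bracketing the source position.
        const float src = ClampToGrid(q, static_cast<float>(i) - u[i] * dt);
        const std::size_t i0 = static_cast<std::size_t>(std::floor(src));
        const std::size_t i1 = std::min(i0 + 1, q.size() - 1);
        const float lo = std::min(q[i0], q[i1]);
        const float hi = std::max(q[i0], q[i1]);
        value = std::min(std::max(value, lo), hi);
        out[i] = dissipation * value - decay;
    }
    return out;
}
```

For a single spike of value 1 moved by half a cell, plain semi-Lagrangian advection leaves a peak of 0.5, while this routine keeps a peak of 0.75, which illustrates the reduced numerical diffusion.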
While the MacCormack scheme gives improved detail, it forces the creation of two 
additional textures to hold the intermediate results and adds the computational cost 
of calculating them. The cost of the first can be offset by using texture sharing, 
mentioned previously, between simulations for these temporary values. The computational 
cost is dealt with by using MacCormack advection only for the density and 
reaction properties and the standard one for the temperature and velocity fields. 
Figure 4.2: Left: Using MacCormack for density and reaction only; Right: Using 
MacCormack for all fields 
As can be seen, due to the chaotic nature of both fire and smoke, the extra 
detail gained by using the more expensive advection routine on all fields is hardly 
discernible. In fact, the only difference can be seen at the beginning of a simulation, 
as the MacCormack one advects slightly faster. 
Buoyancy 
Buoyancy is what causes hot air to rise and cool air to fall. In the simulation 
it is used to modify the velocity field at each grid cell based on the temperature 
and density values at that cell, the buoyancy factor and smoke weight parameters, and 
the ambient temperature of the environment. It is one of the external forces $\vec{F}$ in equation 4.1. 
$$ \vec{f}_{buoyancy} = \left( (T - T_{amb})\,\varphi - (\rho \times \kappa) \right) \vec{v}_{up}\,\Delta t \tag{4.5} $$
Where $T$ and $\rho$ represent temperature and density at the current grid location 
respectively. $T_{amb}$ is the ambient temperature of the fluid - if not used, it can be left 
at 0. The buoyancy factor of the density field is $\varphi$ - it controls how buoyant the 
smoke is, meaning how quickly it rises with the hot air. $\kappa$ is the smoke weight - a 
higher value will exert a stronger force on the velocity field and will make it die out 
faster. The result is multiplied by the global normal up vector $\vec{v}_{up}$ and the current time 
step numerical integration value $\Delta t$. The resultant force is then applied to the velocity 
value at that cell location. 
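Per cell, equation 4.5 amounts to a few arithmetic operations. The sketch below shows them on the CPU; the BuoyancyParams structure and its field names are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;

// Hypothetical parameter block; the names are illustrative only.
struct BuoyancyParams {
    float ambientTemperature; // T_amb in equation 4.5
    float buoyancy;           // buoyancy factor of the density field
    float smokeWeight;        // smoke weight
    Vec3  up;                 // global up vector
};

// Returns the cell velocity after the buoyancy force of equation 4.5 is applied.
inline Vec3 ApplyBuoyancy(const Vec3& velocity, float temperature, float density,
                          const BuoyancyParams& p, float dt) {
    assert(dt > 0.0f);
    const float magnitude =
        ((temperature - p.ambientTemperature) * p.buoyancy -
         density * p.smokeWeight) * dt;
    return {velocity[0] + magnitude * p.up[0],
            velocity[1] + magnitude * p.up[1],
            velocity[2] + magnitude * p.up[2]};
}
```

Hot cells with little smoke are pushed along the up vector, while dense cool cells are pushed against it.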
Constant Impulse and External Forces 
All fluid objects have been designed to have new quantities added every simulation 
step. For smoke, a constant amount of temperature and density is injected 
from the bottom of the domain. The addition of the first helps the system maintain 
velocity and the second keeps a steady stream of smoke that is the final visible result. 
With fire simulation the addition of temperature remains the same. It also injects 
extra reaction into the system along with the temperature. This is analogous 
to adding fuel to a fire. Afterwards, an extinguishment test is performed on the grid. 
This samples reaction values and determines if they are below an extinguishment 
threshold - if so, smoke is formed based on a reaction constant. 
The ApplyExtraForces method is mainly reserved for future use. This is where 
forces such as wind can be added to introduce more chaotic behaviour into the system. 
User interaction with fluids can also be accomplished using this function. Any 
quantity used by the simulation can be added in this step. 
Vorticity Confinement 
Even when using a higher-order advection routine, the solver still suffers from numerical 
dissipation. Vorticity confinement (Fedkiw et al., 2001) tries to offset this 
by calculating the local vorticity 
$$ \vec{\omega} = \nabla \times \vec{u} \tag{4.6} $$
and injecting it back into the velocity field. Calculating this is the first step of the 
process. Afterwards a normalized vorticity location vector is retrieved using: 
$$ \vec{\eta} = \nabla |\vec{\omega}|, \qquad \vec{N} = \frac{\vec{\eta}}{|\vec{\eta}|} \tag{4.7} $$
In both equations, vector operations are estimated using finite difference methods. 
The final confinement force is then calculated by: 
$$ \vec{f}_{conf} = \epsilon \, (\vec{N} \times \vec{\omega}) \, \Delta t \tag{4.8} $$
In this equation $\epsilon > 0$ is called the strength factor and controls the amount of small 
scale detail that is introduced back into the velocity field. In this project's implementation 
it is clamped to the range $\epsilon \in [0, 1]$. This force is then added to the existing 
flow. 
Vorticity confinement requires the addition of 1 extra texture per fluid and 2 relatively 
cheap shader program operations per update step. The technique proves vital 
to the proper appearance of both smoke and fire and more than makes up for its 
cost in visual quality. 
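The first of the two shader operations, the finite-difference estimate of the vorticity in equation 4.6, can be sketched on the CPU as below. The VelocityGrid type is a hypothetical stand-in for the velocity texture, and unit cell size with central differences is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;

// Hypothetical CPU stand-in for the velocity texture; x varies fastest.
struct VelocityGrid {
    int nx, ny, nz;
    std::vector<Vec3> cells;

    const Vec3& At(int x, int y, int z) const {
        return cells[(static_cast<std::size_t>(z) * ny + y) * nx + x];
    }
};

// Curl of the velocity at an interior cell, using central differences.
inline Vec3 Curl(const VelocityGrid& g, int x, int y, int z) {
    assert(x > 0 && y > 0 && z > 0);
    assert(x < g.nx - 1 && y < g.ny - 1 && z < g.nz - 1);
    const Vec3& left  = g.At(x - 1, y, z);
    const Vec3& right = g.At(x + 1, y, z);
    const Vec3& down  = g.At(x, y - 1, z);
    const Vec3& up    = g.At(x, y + 1, z);
    const Vec3& back  = g.At(x, y, z - 1);
    const Vec3& front = g.At(x, y, z + 1);
    // (d(uz)/dy - d(uy)/dz, d(ux)/dz - d(uz)/dx, d(uy)/dx - d(ux)/dy)
    return {0.5f * ((up[2] - down[2]) - (front[1] - back[1])),
            0.5f * ((front[0] - back[0]) - (right[2] - left[2])),
            0.5f * ((right[1] - left[1]) - (up[0] - down[0]))};
}
```

For a rigid rotation about the z axis, u = (-y, x, 0), this returns (0, 0, 2), matching the analytic curl.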
Figure 4.3: Different vorticity confinement strengths. Strength factors from left to 
right: 0, 0.5, 1.0 
As can be seen - a suitable strength factor lies between 0.5 and 1.0. The simulations 
in the project application use values in that range with fire ones tending to be higher. 
Projection 
Up to this point a velocity field $\vec{w}$ has been calculated but it does not adhere to the 
continuity equation 2.2 as it is divergent. Therefore, the final step in each simulation 
update is to calculate a divergence-free flow field. (Harris, 2004) explains that the 
Helmholtz-Hodge Decomposition Theorem can be used to correct the velocity by 
subtracting the gradient of the pressure field: 
$$ \vec{u} = \vec{w} - \nabla p \tag{4.9} $$
To compute the pressure field the following Poisson-pressure equation can be used: 
$$ \nabla^{2} p = \nabla \cdot \vec{w} \tag{4.10} $$
These two equations are logically broken down into 3 operations. The first calculates 
the divergence of the velocity field $\nabla \cdot \vec{w}$ and stores it in a texture. Again, vector 
operations are estimated using finite differences. 
The second step solves the Poisson-pressure equation using a common method called 
a Jacobi iteration solver. It is a technique that converges relatively slowly to a solution 
but has the advantage of being cheap to run using GPU kernels (Harris, 2004). 
This project uses an average of 10 to 15 Jacobi iterations for both fire and smoke. 
A higher number will provide better looking, more accurate results but the computational 
cost rises quite steeply. As shown by (Crane et al., 2007), higher iteration 
counts do not lead to noticeably better quality render results. 
The final step is a straightforward subtraction of the resultant pressure gradient 
from the divergent flow field. The result is stored in $\vec{u}$, which becomes the new 
velocity field. 
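The structure of a Jacobi solver can be shown on a 1D version of equation 4.10. This sketch assumes unit cell size and zero pressure at both ends of the domain; in 3D the update averages six neighbours instead of two, and the function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Jacobi iterations for the 1D Poisson equation p'' = divergence, with unit
// cell size and the pressure held at 0 in the first and last cell.
inline std::vector<float> JacobiPressure1D(const std::vector<float>& divergence,
                                           int iterations) {
    assert(divergence.size() >= 3 && iterations >= 0);
    std::vector<float> p(divergence.size(), 0.0f);
    std::vector<float> next(p);
    for (int k = 0; k < iterations; ++k) {
        // Every cell is computed only from the previous iterate, so all cells
        // can be updated independently of each other.
        for (std::size_t i = 1; i + 1 < p.size(); ++i) {
            next[i] = 0.5f * (p[i - 1] + p[i + 1] - divergence[i]);
        }
        std::swap(p, next); // the new iterate becomes the input of the next pass
    }
    return p;
}
```

Since each cell depends only on the previous iterate, one iteration maps onto one kernel dispatch over the pressure texture, which is why the method suits the GPU despite its slow convergence.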
Boundary Interaction 
As mentioned in section 4.3, all fluids have a single voxel wide obstacle texture on 
the box edges that acts as the boundary for the system. Cells in this texture either 
have the value of 1 if there is an obstacle at the location, or 0 if there is none. All 
computational steps have access to this texture and use it differently. 
Its most important function is to enforce the free-slip boundary condition, which 
states that a fluid cannot flow into or out of a solid, but can freely flow along its 
surface. This is mainly done in the projection step, where if an obstacle is detected, 
the velocity component of that cell is taken as 0. When performing Jacobi iterations 
and sampling adjacent cells, if an obstacle is present, the pressure component of that 
cell is not used - this is the approach utilised by (Crane et al., 2007). 
Obstacles are similarly used in the computation of vorticity confinement and advection 
- forcing the velocity vector to be 0 if inside a boundary. 
4.4.2 Runtime Modifications 
Since so many variables control the appearance and structure of a fluid object, it is 
deemed worthwhile to have as many of them as possible available to be edited at runtime. 
These are all kept in a C++ struct called FluidSettings and, along with the 
domain size, are used when constructing a fluid. These parameters are then transferred 
to the GPU in various buffers during the SetShaderBuffers function from section 4.4. 
At runtime, nearly all of the control parameters can be edited from a user interface 
window. This window appears when the user clicks on a fluid object with the mouse. 
Figure 4.4: Users have the freedom to edit fluid control settings at runtime to observe 
their effects. Reaction values are not used for smoke simulation 
When a parameter is edited, a method is called on the respective FluidCalculator 
for that object. 
void Fluid3DCalculator::SetFluidSettings(const FluidSettings &fluidSettings) { 
    // Update buffers if needed 
    int dirtyFlags = GetUpdateDirtyFlags(fluidSettings); 
    this->fluidSettings = fluidSettings; 
    if (dirtyFlags & BufferDirtyFlags::General) { 
        UpdateGeneralBuffer(); 
    } else if ... 
} 
It first checks which settings have been changed and sets the corresponding 
dirty flags. Using the dirty flag pattern allows only the constant buffers that 
have changed to be updated, instead of all of them. Updating a buffer involves 
copying its contents from GPU to system memory, changing them and then copying 
them back to the GPU, so it should not be overused, as advised by (McDonald, 
2012). Dirty flags assist with this. 
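A minimal sketch of how such flags could be computed is shown below. The Settings structure, the flag names and the grouping of settings into buffers are all assumptions made for illustration; the real FluidSettings struct and BufferDirtyFlags values are not listed in this section.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of fluid settings, one field per constant buffer.
struct Settings {
    float dissipation;
    float vorticityStrength;
    int   jacobiIterations;
};

// One bit per constant buffer that may need re-uploading.
enum DirtyFlags : std::uint32_t {
    kNone      = 0,
    kGeneral   = 1u << 0,
    kVorticity = 1u << 1,
    kPressure  = 1u << 2
};

// Compares the current and the updated settings and returns a bit mask of
// the buffers whose contents have changed.
inline std::uint32_t GetDirtyFlags(const Settings& current, const Settings& updated) {
    assert(updated.jacobiIterations >= 0);
    std::uint32_t flags = kNone;
    if (current.dissipation != updated.dissipation) flags |= kGeneral;
    if (current.vorticityStrength != updated.vorticityStrength) flags |= kVorticity;
    if (current.jacobiIterations != updated.jacobiIterations) flags |= kPressure;
    return flags;
}
```

Because several bits can be set at once, a caller of this sketch would test each flag with its own bitwise AND and update only the matching buffers.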
In a real game environment a player would not have such access to fluid settings, 
but this is immensely useful as a level of detail or game design tool as it allows for 
fine-tuning of just how the simulation plays out. 
4.4.3 Choosing an Update Rate 
For doing updates on objects each frame, games tend to use the difference between 
the time at the new frame and the time at the previous frame. This is referred 
to as the delta time. Since this can vary with frame rate, sensitive calculations such 
as game physics tend to use a fixed integration time step value that is independent 
of delta time. 
This approach is used here - the value is, by default, 1/30, meaning 30 fluid updates 
per second. Note that this value controls how often the Process method 
of a fluid is called, not the $\Delta t$ value for the calculation formulas - that is defined 
separately for each fluid. The advantage of calling Process at a fixed rate is that it 
keeps fluid movement consistent. If it were updated at a variable rate, each fluid 
would slow down or speed up, leading to a distorted look. 
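One common way to call an update at a fixed rate from a variable-rate frame loop is to accumulate delta time. The sketch below shows that pattern; it is an assumption that the framework works this way, and the FixedRateTimer class is hypothetical.

```cpp
#include <cassert>

// Accumulates variable frame times and reports how many fixed-rate
// simulation updates are due in the current frame.
class FixedRateTimer {
public:
    explicit FixedRateTimer(double updatesPerSecond)
        : mInterval(1.0 / updatesPerSecond) {
        assert(updatesPerSecond > 0.0);
    }

    // deltaTime is the time in seconds since the previous frame.
    int Advance(double deltaTime) {
        mAccumulated += deltaTime;
        int due = 0;
        while (mAccumulated >= mInterval) {
            mAccumulated -= mInterval;
            ++due;
        }
        return due;
    }

private:
    double mInterval;          // seconds between two updates
    double mAccumulated = 0.0; // time not yet consumed by an update
};
```

Each frame, the application would call the fluid's Process method as many times as Advance returns. A long frame then yields several catch-up updates, which fits the slowdown described when the requested rate exceeds what the hardware can process.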
30 updates a second was chosen since it is fast enough for each fluid to develop 
with reasonable speed, while still keeping up decent performance. It can be changed 
at runtime, although if the update rate increases above the processing capability of 
the hardware, the application slows down as it cannot keep up with the required 
number of updates. Rates of around 30 to 50 a second are common choices, although 
higher ones are certainly achievable on better hardware. 
4.4.4 Frame Skipping 
Even with the many simplifications, memory cutbacks and processing optimisations 
used, updating a reasonably-sized fluid object every frame is a demanding operation. 
This is where an LOD technique called frame skipping comes into use. Its premise 
is quite simple - instead of updating a fluid simulation every frame, do it every 
few frames. It is inspired by the approximation techniques used by game physics 
simulations and has previously been adopted for fluids (Tangvald, 2007). Below is 
the implementation as used in the project. 
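void Update() { 
    // Process only once framesToSkip frames have passed since the last update, 
    // so that framesToSkip = 1 updates the fluid every second frame 
    bool canUpdate = framesSinceLastProcess >= framesToSkip; 
    if (canUpdate) { 
        fluidCalculator->Process(); 
        framesSinceLastProcess = 0; 
    } 
    else { 
        ++framesSinceLastProcess; 
    } 
} 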
Although a very simple LOD method, frame skipping frees up substantial computing 
power, especially when using many fluid objects. Its downside is that its effects 
are quickly spotted. Even skipping one frame per process step means that the simulation 
will update half as often. Therefore, this technique is only used on fluids 
which are not in the current view frustum. Even then, it starts being used only after 
the simulation has had a few seconds to develop first. Afterwards, no difference in 
behaviour can be noticed when looking away and then back at a fluid object, since 
the behaviour is inherently chaotic. 
The number of frames to skip can be changed at runtime. It has to be 
noted that the performance gained by skipping additional frames is not linear - most 
of it is gained when skipping 2 or 3 frames and it peaks at 5. 
4.5 Rendering Fluids 
Rendering the final result is done via the ray-marching technique previously discussed 
(Zhou et al., 2007). It was chosen due to it being straightforward to implement 
in a standard pixel shader program and for its ability to give good visual 
results. 
A fluid in 3D space is represented by an object called a VolumeRenderer which 
at its core is a simple cube - it has position, rotation and scale components, all 
the required properties for rendering in 3D space. When an instance of a volume 
renderer is constructed it needs to know what type of fluid it will render - smoke or 
fire. If the type is smoke, it can be given a reference to a 3D texture (in the form 
of an SRV) of smoke density values that it then uses for drawing. If rendering fire, 
it can also be passed a reference to a 3D texture of fire reaction values in addition 
to density. This creation data is important, as there are different pixel shaders used 
when rendering each type. 
4.5.1 Render Parameters 
There are certain parameters that affect the render result of a fluid simulation which 
can be modified at runtime. 
Figure 4.5: Render settings modify the look of a fluid without changing its physical 
properties 
Number of Samples 
The number of samples is the sample rate described in section 3.3.2. It has a direct 
effect on the quality of the produced result. A higher rate will sample more density 
and reaction values, thus producing a more accurate average colour. It also means 
that more time will be spent in the pixel shader, which directly affects performance. 
In practice, the sample rate matters only when the fluid takes up a significant amount 
of screen space. This is because a pixel shader is only run on the visible 
pixels on the screen that an object occupies. 
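The influence of the sample count can be illustrated with a generic ray-marching loop. The sketch below accumulates opacity with exponential absorption per step; this compositing rule, the early exit threshold and all names are assumptions for illustration, since the project's pixel shaders are not listed here.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Marches along a ray of the given length with a fixed number of samples and
// returns the accumulated opacity in [0, 1]. densityAt maps the distance
// travelled along the ray to the sampled density.
inline float MarchOpacity(const std::function<float(float)>& densityAt,
                          float rayLength, int sampleCount, float absorption) {
    assert(sampleCount > 0 && rayLength > 0.0f);
    const float step = rayLength / static_cast<float>(sampleCount);
    float transmittance = 1.0f;
    for (int i = 0; i < sampleCount; ++i) {
        const float t = (static_cast<float>(i) + 0.5f) * step; // middle of the step
        transmittance *= std::exp(-absorption * densityAt(t) * step);
        if (transmittance < 0.01f) {
            break; // nearly opaque: further samples would not be visible
        }
    }
    return 1.0f - transmittance;
}
```

The loop body runs once per sample for every covered pixel, so the cost grows with both the sample count and the screen area of the fluid, while a higher absorption value saturates the result sooner.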
Figure 4.6: Different sample rates of a 64x128x64 fluid from afar. Left: 32 samples; 
Right: 128 samples 
As can be seen, from afar the difference in quality is hardly discernible, although 
the step size difference is substantial. The performance of both is nearly identical 
- since the fluid occupies fewer pixels on the screen, the extra time spent in the 
shader program is insignificant. 
Figure 4.7: Different sample rates of a 64x128x64 fluid from up close. Left to right: 
32, 64, 128 samples 
This is the same flame as the one in the previous figure. When viewing from a 
closer distance, the quality of using a higher number of samples can be seen more 
clearly (this is more defined when seeing the fluid moving). This is due to the lower 
step value leading to a smaller range of colours used to represent the fluid. A visible 
improvement is seen when increasing the sample rate from 32 to 64, but only a 
very slight one when going from 64 to 128. This is because more samples cannot 
make up for the grid size of a fluid. Even for a relatively big domain, like the one in 
the figures, there will be little visual gain when using more than 100 samples per ray. 
There are significant performance implications when using a higher sample rate 
with the fluid in full view, since the fluid takes up a large part of the screen. Depending 
on the view distance, rendering with 32 samples could be nearly twice as 
fast as rendering with 128. It is therefore best to decide upon a sample rate that 
would give a good visual result, yet still compute fast. 
Colour and Absorption 
Changing the Smoke Color property alters the colour appearance of smoke for both 
types of simulations. 
Smoke and fire Absorption control how much to saturate the resultant colour when 
sampling density and reaction values respectively. A higher value will mimic thick 
smoke or flames, while a lower one will produce a weaker looking flame or lighter 
smoke. 
4.5.2 Fluid Instancing 
A key goal throughout the development of this project is to separate the concept of 
fluid motion calculation from fluid rendering. A Fluid3DCalculator object does not 
know about a VolumeRenderer and vice versa. The former is responsible for setting 
up and running the equations of motion on a set of 3D grids while the latter will 
render suitable 3D textures passed to it. 
Given this separation, it is straightforward to implement a form of instancing for 
fluids. This means that one fluid instance can be drawn multiple times by different 
volume renderers. Since the cost of rendering is trivial compared to the cost of 
simulating, this allows for a scene to seemingly contain many fire and smoke effects, 
while only computing a small amount. 
Volume renderers using data from the same fluid instance will display identical 
results. To make them visibly dissimilar, each can be given different render parameters. 
A combination of colour and absorption can be used to achieve non-identical 
looking fluids. The final scene as seen on page 33 is made out of 2 unique smoke 
simulations - one of which has 3 instances, and 1 unique fire simulation that has 2 
instances. 
Instancing and Frame Skipping 
Frame skipping is used when a fluid simulation is not in view. Instancing means 
that the same fluid simulation can be in more than one place. To deal with this, 
before activating frame skipping, all volume renderers that use a particular fluid 
object are tested for visibility. If even one is in view - frame skipping will not occur. 
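The visibility test can be sketched as a check over all renderers that reference a fluid. The RendererState structure and the use of an integer fluid identifier are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-renderer state: which fluid it draws and whether it is
// currently inside the view frustum.
struct RendererState {
    int  fluidId;
    bool inView;
};

// Frame skipping is allowed only if no renderer of this fluid is in view.
inline bool CanSkipFrames(int fluidId, const std::vector<RendererState>& renderers) {
    assert(fluidId >= 0);
    return std::none_of(renderers.begin(), renderers.end(),
                        [fluidId](const RendererState& r) {
                            return r.fluidId == fluidId && r.inView;
                        });
}
```

A fluid with no renderers at all is also reported as skippable by this sketch, since nothing displays its result.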
Chapter 5 
Results and Discussion 
The previous chapter covered the implementation details of calculating the fluid 
equations of motion and rendering the result. It also discussed various optimisation 
methods used to make the process as performant as possible. This chapter will 
examine the results of the implementation to determine its effectiveness. This will 
involve scrutinizing both the visual results of the simulation and its performance. 
5.1 Testing Setup 
5.1.1 Hardware 
The application was tested and benchmarked on two different systems. The first is 
a mid-tier laptop and the second is a high-end gaming PC. 
Table 5.1: Hardware used for testing 
Component    Laptop                                   PC 
CPU          Intel Core i7-3632QM @ 2.20GHz           i5 3570K @ 4.5GHz 
RAM          8 GB, DDR3                               12 GB, DDR3 
GPU          NVIDIA GeForce GT 640M LE, 2 GB DDR3     ATI R9 280X, 3GB GDDR5 
OS           Microsoft Windows 7 64-bit               Microsoft Windows 8.1 64-bit 
The important difference between the two setups is the graphics card. The 
NVIDIA card, being from a mobile low-power series, has around 2.5 times fewer clock 
cycles and 11 times less memory bandwidth compared to the ATI one. Detailed specifications 
of both GPUs can be found in appendix A. 
Quantitative results are obtained during application runtime. There is an in-game 
frame counter to report on FPS. It displays current, minimum, maximum and average 
frames per second achieved and is used as a benchmark for performance. 
5.1.2 Scene 
The test scene has been set up to fulfil the application requirements. There are 3 
different fluids computed at the same time - 2 smoke and 1 fire effects. There are 6 
volume renderers visualising the results of those simulations. 
Figure 5.1: Looking at the entire final scene from a distance with all fluids in view 
The user is free to control the camera, click on fluids and change or observe their 
parameters. There is also a scene fly-through mode, which performs a looping 
predefined movement through the scene. This mode features both up-close and 
distance views of the various fluids in the scene. 
5.2 Visual Results 
Real-world phenomena, such as smoke and fire, come with an inherent randomness 
and subtle features that computer graphics do not have the power to precisely 
mimic. With certain simplifications and smart uses of technology, though, the results 
obtained in this project go some way towards bridging that gap. 
Figure 5.2: Smoke and fire simulation in the application 
5.2.1 Modifying Parameters 
Since the application allows the freedom to modify both fluid and render settings - 
it is very easy to produce different looking simulations. 
Figure 5.3: Left: Fast decaying fire, producing a lot of smoke; Mid: Strong fire, 
burning with nearly no smoke; Right: Average strength fire, producing blue smoke 
5.3 Memory & Performance Results 
The main goal of this project is to prove that the parallel power of graphics cards has 
reached a threshold that would allow for real-time physically-based fluid simulation. 
For this reason memory and frame times are both a topic of common discussion 
throughout this project. 
5.3.1 Memory Use 
In section 4.3.2 the various optimisations that are performed during fluid object 
setup were discussed. By querying the GPU, it can be seen how much video memory 
fluids of different types and domain sizes use. Below is a table with several of 
these results with increasing grid resolution. These do not include video memory 
for rendering. 
Table 5.2: Video memory used for simulations of different resolution
Grid Size      Smoke Memory   Fire Memory   Shared Memory
16x16x16       0.1 MB         0.11 MB       0.06 MB
32x32x32       1.8 MB         1.9 MB        0.8 MB
64x64x64       13.8 MB        14.8 MB       5.5 MB
128x128x128    110 MB         118 MB        44 MB
Smoke Memory is the video memory required per unique smoke effect and Fire 
Memory is the memory required per unique fire effect. Shared Memory is how much 
of that total can be shared with other simulations. 
As can be seen, the memory required to store all of the textures that contain
the fluid properties rises with the cube of the grid resolution: each doubling of
the side length multiplies the number of cells by eight. By utilising texture
sharing, some of this memory cost is offset when using more than one fluid of the
same size. Even so, using sizes bigger than 128^3 is infeasible, both because of the
memory cost and because the processing time rises quickly. A good option is
to only use a higher resolution in one or two dimensions, while using a smaller one
in another.
Alternatively, grids in the range of 30^3 to 50^3 are ideal for modelling average sized
uniform domains. Their memory cost comes to around 1 to 1.5 times that of high
quality PNG images, which are often used as textures in games. Both test GPUs
have in excess of 2 GB of memory to spare, so this is a small cost to pay.
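The scaling described above can be sketched in a few lines. The snippet below is an illustrative estimate, not part of the project code: it assumes that every fluid texture stores a fixed number of bytes per cell, so that memory is proportional to the number of cells, and it scales the measured 64x64x64 smoke figure from Table 5.2. The function names are hypothetical. The estimate agrees with the 128x128x128 row of the table, but the smallest rows do not follow the same ratio as closely, so it should be treated as a rough guide only.

```python
def voxel_count(width, height, depth):
    """Number of cells in a width x height x depth simulation grid."""
    return width * height * depth


def estimate_memory_mb(known_mb, known_dims, target_dims):
    """Scale a measured memory figure by the ratio of voxel counts.

    Assumption: each fluid texture stores a fixed number of bytes per
    voxel, so memory is proportional to the number of voxels.
    """
    return known_mb * voxel_count(*target_dims) / voxel_count(*known_dims)


# Measured smoke figure for a 64x64x64 grid, taken from Table 5.2.
SMOKE_64_MB = 13.8

cube_128 = estimate_memory_mb(SMOKE_64_MB, (64, 64, 64), (128, 128, 128))
tall_grid = estimate_memory_mb(SMOKE_64_MB, (64, 64, 64), (64, 128, 64))

print(f"128x128x128 estimate: {cube_128:.1f} MB")  # Table 5.2 reports 110 MB
print(f"64x128x64 estimate:   {tall_grid:.1f} MB")
```

Raising the resolution in a single dimension, as in the 64x128x64 case, only doubles the estimate rather than multiplying it by eight, which is why the text recommends it over a uniformly larger grid.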
Finally, instancing allows for having many fire and smoke effects without paying
the memory cost for creating each one. Its benefit grows with the number
of instances that use a single fluid object. Since the cost of a volume
renderer is also insignificant compared to that of a simulation, instancing should,
where appropriate, be preferred to creating a new fluid effect.
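As a rough illustration of these trade-offs, the sketch below compares three ways of showing several same-sized smoke effects, using the 64x64x64 figures from Table 5.2. It rests on two assumptions that are not confirmed by the measurements: that the shareable portion of memory is allocated once while each unique fluid adds only its remaining part, and that the memory used by a volume renderer is negligible. The function names are hypothetical.

```python
def memory_with_sharing(per_fluid_mb, shared_mb, fluid_count):
    """Total memory for several same-sized fluids that share textures.

    Assumption: the shareable portion is allocated once, and each fluid
    adds only its non-shareable remainder.
    """
    if fluid_count == 0:
        return 0.0
    unique_mb = per_fluid_mb - shared_mb
    return shared_mb + fluid_count * unique_mb


def memory_with_instancing(per_fluid_mb):
    """Memory when any number of volume renderers display one fluid object.

    Assumption: the renderer cost is treated as negligible.
    """
    return per_fluid_mb


# 64x64x64 smoke figures from Table 5.2.
PER_FLUID_MB, SHARED_MB = 13.8, 5.5

for count in (1, 2, 4):
    separate = count * PER_FLUID_MB
    shared = memory_with_sharing(PER_FLUID_MB, SHARED_MB, count)
    instanced = memory_with_instancing(PER_FLUID_MB)
    print(f"{count} effects: separate {separate:.1f} MB, "
          f"shared {shared:.1f} MB, instanced {instanced:.1f} MB")
```

Under these assumptions instancing keeps memory flat regardless of the number of effects, at the cost of every instance showing the same simulation.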
5.3.2 Performance 
To recap, the final scene features 1 smoke simulation of grid size 64x128x64, another
smoke simulation of grid size 30x60x30 and 1 fire of size 40x80x40. There are a
total of 6 volume renderers displaying the results of these simulations. Each simulation
does 10 Jacobi solver iterations and uses a sample rate of 64 when ray-marching.
This scene was benchmarked on both test machines several times with increasing 
simulation update rates. Benchmarking involves running the scene in fly-through 
mode for a period of 5 minutes and noting down the minimum, maximum and 
average frame rates achieved. 
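The bookkeeping behind such a benchmark can be sketched as follows. This is a minimal example and not the framework's own measurement code: it assumes that the duration of every frame is recorded in seconds, that the minimum and maximum are taken from per-frame rates, and that the average is the total number of frames divided by the total time. The function name and the sample data are hypothetical.

```python
def summarise_frame_rates(frame_times):
    """Return (minimum, maximum, average) FPS for a list of frame times.

    frame_times holds the duration of each rendered frame in seconds.
    The average is total frames over total time, so long frames weigh
    more than they would in a plain mean of per-frame FPS values.
    """
    if not frame_times:
        raise ValueError("at least one frame time is required")
    instantaneous = [1.0 / t for t in frame_times]
    average = len(frame_times) / sum(frame_times)
    return min(instantaneous), max(instantaneous), average


# Hypothetical capture: three fast frames and one slow one.
sample = [0.01, 0.01, 0.01, 0.05]
lowest, highest, mean_fps = summarise_frame_rates(sample)
print(f"min {lowest:.0f} FPS, max {highest:.0f} FPS, avg {mean_fps:.0f} FPS")
```

Dividing frames by total time means a short burst of very fast frames cannot dominate the average, which matters when frame skipping produces large swings between minimum and maximum.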
Figure 5.4: Benchmark results on notebook computer using an NVidia GT 640M LE
GPU
The substantial difference between the maximum FPS and the minimum and average
FPS is immediately noticeable. This is due to the use of frame skipping when some
or all simulations are not in view, freeing up GPU resources. The minimum frame
rate occurs when all fluid objects are in view and one or more are viewed up close,
which increases render time. The majority of time in the fly-through mode is spent
with all or 2 out of 3 simulations in view from a distance. This is what the average
FPS captures.
The benchmark results show that going above 30 updates/sec is not feasible on
this setup since frames quickly start dropping. As mentioned previously, if the update
rate forces the use of more clock cycles and texture bandwidth than available,
the program slows down.
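The interaction between update rate and frame skipping discussed above can be sketched with a small scheduling helper. This is an assumption-laden illustration rather than the framework's actual logic: the class name is hypothetical, visibility is passed in as a plain flag, and a hidden fluid is assumed to discard any accumulated time instead of catching up later.

```python
class SimulationClock:
    """Decides when a fluid simulation should advance by one step."""

    def __init__(self, updates_per_second):
        self.step_interval = 1.0 / updates_per_second
        self.accumulated = 0.0

    def should_step(self, frame_time, in_view):
        """Return True if the simulation should be updated this frame.

        frame_time is the duration of the last frame in seconds.
        Fluids that are out of view skip their update entirely.
        """
        if not in_view:
            self.accumulated = 0.0  # do not build up a backlog while hidden
            return False
        self.accumulated += frame_time
        if self.accumulated >= self.step_interval:
            self.accumulated -= self.step_interval
            return True
        return False


# A renderer running at 120 FPS with a 30 updates/sec simulation.
clock = SimulationClock(updates_per_second=30)
steps = sum(clock.should_step(1.0 / 120.0, in_view=True) for _ in range(120))
print(f"simulation steps in one second of frames: {steps}")
```

With this arrangement the simulation cost per second is bounded by the chosen update rate while the fluid is visible and drops to zero while it is not, which is consistent with the gap between the maximum and average frame rates in the benchmarks.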
Figure 5.5: Benchmark results on gaming PC using an AMD Radeon R9 290X GPU 
This graph shows how strongly increased memory bandwidth and clock
speed affect performance. The AMD R9 290X only begins to show a decreased frame
rate when doing over 150 updates/sec. Up until then, it consistently keeps an average
of above 800 FPS. Only around the 200 updates/sec mark do the simulations
start reaching the system limits.
In reality, though, there is no reason to use an update rate of more than 30-40
updates/sec when that power can instead be spent on computing and rendering more
fluid objects. These results show the potential that the new generation of GPUs has
for handling such computationally intensive tasks.
Chapter 6 
Conclusion and Future Work 
This project had the goal of investigating fluid simulation with the aim of answering 
the following question: 
How can the parallel processing advantage of modern graphics cards be 
used for simulating physically-based fluids, and how can this approach 
be adapted for real-time use? 
With particular goals being: 
• Derive an effective way of utilising the GPU for solving the equations of fluid 
motion in 3D. 
• Discover what level of detail methods and performance optimisations can be 
applied in order to use fewer system resources. 
This research has demonstrated that the equations of fluid motion can be calculated 
in real-time with reasonable frame rates on the GPU. The project implementation 
provided offers an optimised and memory efficient solution for numerically solving 
and rendering fire and smoke with satisfactory results. 
The performance tests in Chapter 5 clearly show that the newest generation of
graphics cards is more than capable of updating and rendering many simulations
at once. The tests also showed that low-to-mid tier cards can hold their own when
dealing with a few reasonably sized fluid domains at an average update rate.
6.1 Future Work 
This project covers how to efficiently implement a fluid solver and render the results. 
For a topic as broad as fluid simulation there is certainly more research that could 
be done. 
One area that can certainly be further investigated is implementing interactions
with a fluid. The external forces part of the motion equations can be used to provide
a form of user control of the system. Crane et al. (2007) implement a form of
object voxelisation using a geometry shader to allow arbitrary 3D models to be used
as obstacles in the simulation. This technique could be extended and improved to
take into account different objects going into and out of a fluid domain, disturbing
it based on their velocity and shape.
When there are only a few sources of constant input into a fluid domain, large
parts of the 3D grid are left empty but still take up computational time. A better
way to handle updating a fluid would be to split up each grid into chunks and determine
whether each chunk contains any fluid. Then, only the chunks that do would be
updated. This technique has the potential to allow for much faster processing of
bigger fluid domains.
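The chunking idea can be illustrated with a small CPU-side sketch. It is only a sketch of the proposal, not an implemented feature: the flat list layout, the density threshold and the function name are all assumptions, and a GPU version would perform the same test per chunk in a compute shader.

```python
def active_chunks(density, dims, chunk_size, threshold=0.0):
    """Return the set of chunk coordinates that contain any fluid.

    density is a flat list indexed as x + width * (y + height * z).
    A chunk counts as active when any of its cells exceeds threshold.
    """
    width, height, depth = dims
    active = set()
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if density[x + width * (y + height * z)] > threshold:
                    active.add((x // chunk_size,
                                y // chunk_size,
                                z // chunk_size))
    return active


# Hypothetical 8x8x8 grid with a single source near one corner.
dims = (8, 8, 8)
grid = [0.0] * (8 * 8 * 8)
grid[1 + 8 * (2 + 8 * 0)] = 1.0  # cell (1, 2, 0)

chunks = active_chunks(grid, dims, chunk_size=4)
total_chunks = (8 // 4) ** 3
print(f"{len(chunks)} of {total_chunks} chunks need updating: {sorted(chunks)}")
```

A practical version would probably also mark the neighbours of every active chunk, since advection can carry fluid across a chunk boundary within a single step.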
To further increase visual quality, smoke rendering could take into account light
sources, and each fluid could cast dynamic shadows. Additionally, a fire
itself could be made a light source. This could be achieved by first creating a number
of lights per fire simulation and then advecting their positions via the velocity
field and controlling their brightness via the reaction field.
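One way the light proposal might look is sketched below. It is a hypothetical illustration only: the velocity and reaction fields are represented by plain Python callables standing in for texture lookups, the position update uses a simple forward Euler step, and the brightness mapping is an assumed clamp of the sampled reaction value.

```python
def advect_light(position, velocity_at, reaction_at, dt, max_brightness=1.0):
    """Move one light along the velocity field and set its brightness.

    velocity_at and reaction_at are callables that sample the fields at a
    position; both are stand-ins for texture lookups on the GPU.
    Returns the new position and a brightness clamped to [0, max_brightness].
    """
    vx, vy, vz = velocity_at(position)
    x, y, z = position
    new_position = (x + vx * dt, y + vy * dt, z + vz * dt)  # forward Euler
    reaction = reaction_at(new_position)
    brightness = max(0.0, min(max_brightness, reaction))
    return new_position, brightness


# Hypothetical fields: a constant updraft and a reaction value that
# fades linearly with height.
def updraft(position):
    return (0.0, 2.0, 0.0)


def fading_reaction(position):
    return 1.0 - 0.5 * position[1]


light = (0.0, 0.0, 0.0)
for _ in range(4):
    light, brightness = advect_light(light, updraft, fading_reaction, dt=0.25)
print(light, brightness)
```

In this toy setup the light rises with the flow and dims as the reaction value decays, which is the behaviour the proposal aims for; a real implementation would also need to respawn lights once their brightness reaches zero.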
Appendix A 
Test GPUs Specifications 
Figure A.1: Technical Specifications of both graphics cards used for testing. The 
bandwidth and clock speeds are the key factors for performance 
Appendix B 
CD Contents 
The attached CD contains the following directory structure: 
Application: Contains the final application executable.
Dissertation: Contains an electronic copy of this dissertation document.
Instructions: Contains instructions for the operation of the application.
Media: Contains images and video of the final application.
Project: Contains the full source code and assets for the application.
Proposal: Contains an electronic copy of the original project proposal.
References 
Bai, Y. and Turk, G. 2005. Reducing numerical dissipation in fluid simulation.
Georgia Institute of Technology. Available from: http://tinyurl.com/pcy4exs. 3.1
Barrett, J. 2012. Real-time animation and rendering of ocean waves. [Online]. 2.2
Bridson, R. 2008. Fluid Simulation for Computer Graphics. CRC Press. 2.1.1
Carucci, F. 2005. Inside Geometry Instancing. Addison-Wesley Professional.
Available from:
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter03.html. 2.3
Crane, K., Llamas, I., and Tariq, S. 2007. Real-Time Simulation and Rendering
of 3D Fluids. Addison-Wesley Professional. Available from:
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch30.html. 3.2, 3.3.2, 4.3.2,
4.4.1, 6.1
Fedkiw, R., Stam, J., and Jensen, H. W. 2001. Visual simulation of smoke. In:
SIGGRAPH 2001 Conference. 3.1, 3.2, 4.4.1
Fernando, R. et al. 2004. GPU Gems: Programming Techniques, Tips and Tricks for
Real-Time Graphics. Addison Wesley. Available from:
http://http.developer.nvidia.com/GPUGems. 3.2
Gourlay, M. 2012. Fluid simulation for video games. Intel Developer Zone.
Available from:
http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-3.
1, 3.3.1
Gupta, S. 2011. GPU supercomputers show exponential growth in top 500
list. [Online]. Available from:
http://blogs.nvidia.com/blog/2011/11/14/gpu-supercomputers-show-exponential-growth-in-top500-list/.
1
Harris, M. 2004. Fast Fluid Dynamics Simulation on the GPU. Addison Wesley.
chap. 38. Available from:
http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html. 1, 3.2, 4.4.1
Hess, J. and Smith, A. 1967. Calculation of potential flow around arbitrary bodies.
In: Progress in Aerospace Sciences. 3
Krüger, J. and Westermann, R. 2003. Linear algebra operators for GPU implementation
of numerical algorithms. In: SIGGRAPH 2003 Conference. Available from:
http://tinyurl.com/ozb5xpy. 3.2
McDonald, J. 2012. Don’t throw it all away: Efficient buffer management.
In: Game Developer Conference. Available from:
https://developer.nvidia.com/sites/default/files/akamai/gamedev/files/gdc12/Efficient_Buffer_Management_McDonald.pdf.
4.4.2
McGuire, M. 2006. A real-time, controllable simulator for plausible smoke.
Brown University. Available from:
http://graphics.cs.williams.edu/papers/SmokeSimBrown06/smoke-simulation-brown06.pdf.
3.3.1
MSDN. 2010. Compute shader overview. [Online]. Available from:
http://tinyurl.com/plpw97t. 1
MSDN. 2014. Resource interfaces. [Online]. Available from:
http://tinyurl.com/mwledo4. 4.3
Nguyen, H. et al. 2007. GPU Gems 3. Addison-Wesley Professional. Available from:
https://developer.nvidia.com/content/gpu-gems-3. 3.2
NVidia. 2013. NVidia Computational Fluid Dynamics Page. [Online]. Available from:
http://www.nvidia.com/object/computational_fluid_dynamics.html. 1
Stam, J. 1999. Stable fluids. In: SIGGRAPH 1999 Conference. Available from:
http://www.dgp.toronto.edu/people/stam/reality/Research/pdf/ns.pdf. 3.1, 3.2, 4.4.1
Stam, J. 2003. Real-time fluid dynamics for games. In: Game Developer Conference.
Available from:
http://www.dgp.toronto.edu/people/stam/reality/Research/pdf/GDC03.pdf.
(document), 2.1, 3.1, 3.3.1
Steinhoff, J. and Underhill, D. 1994. Modification of the Euler equations for “vorticity
confinement”: Application to the computation of interacting vortex rings. Physics
of Fluids. 3.1
Hellgate: London. 2007. DVD-ROM. 3.2
Tangvald, L. 2007. Implementing LOD for physically-based real-time fire rendering.
[Online]. 4.4.4
Valve, S. 2012. Level of detail. Valve Developer Portal. Available from:
https://developer.valvesoftware.com/wiki/Level_of_detail. 2.3
Zhou, K. et al. 2007. Real-time smoke rendering using compensated ray marching.
Microsoft Research. Available from:
http://research.microsoft.com/pubs/70503/tr-2007-142.pdf. 3.3.2, 4.5
Bibliography 
Acheson, D. 1990. Elementary Fluid Dynamics. Clarendon Press.
Rideout, P. 2011. 3D Eulerian grid. Available from: http://prideout.net/blog/?p=66.
Selle, A. et al. 2007. An unconditionally stable MacCormack method. Available from:
http://tinyurl.com/nm4novl.

  • 14. Chapter 2
Background
2.1 Mathematics of Fluid Flow
To understand the mechanics behind fluid dynamics, knowledge of differential vector operations is expected. The gradient $\nabla f$, divergence $\nabla\cdot\vec{v}$, curl $\nabla\times\vec{v}$, directional derivative $\vec{v}\cdot\nabla f$ and the Laplacian $\nabla^2$ are all used in the Navier-Stokes equations, which are described below.
$$\frac{\partial \vec{u}}{\partial t} = -(\vec{u}\cdot\nabla)\vec{u} - \frac{1}{\rho}\nabla p + \nu\nabla^2\vec{u} + \vec{F} \tag{2.1}$$
$$\nabla\cdot\vec{u} = 0 \tag{2.2}$$
Here $\vec{u}$ is the velocity of the fluid, $p$ is the pressure, $\rho$ is the density, $\nu$ controls the viscosity of the fluid and $\vec{F}$ encapsulates all external forces acting on it.
Equation (2.1) is known as the momentum equation of fluid flow. It is derived from Newton's 2nd law of motion, which means it describes the acceleration of the fluid due to the forces acting on it. From left to right, the terms on the right-hand side are advection, pressure, diffusion and external forces. When dealing with complex media it is common to make simplifying assumptions so as to more easily model the problem. Thus, the fluid is assumed to be incompressible and homogeneous. Equation (2.2), the continuity equation, enforces the incompressibility assumption by ensuring that the fluid always has zero divergence, meaning that the volume of the fluid remains constant in time.
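For reference, the vector operators named above can be written out in Cartesian coordinates. The following is a standard restatement, assuming a scalar field $f$ and a vector field $\vec{v} = (v_x, v_y, v_z)$; the component names are introduced here for illustration only.

```latex
\begin{align*}
\nabla f &= \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z} \right) \\
\nabla\cdot\vec{v} &= \frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y} + \frac{\partial v_z}{\partial z} \\
\nabla\times\vec{v} &= \left( \frac{\partial v_z}{\partial y} - \frac{\partial v_y}{\partial z},\
                              \frac{\partial v_x}{\partial z} - \frac{\partial v_z}{\partial x},\
                              \frac{\partial v_y}{\partial x} - \frac{\partial v_x}{\partial y} \right) \\
\vec{v}\cdot\nabla f &= v_x \frac{\partial f}{\partial x} + v_y \frac{\partial f}{\partial y} + v_z \frac{\partial f}{\partial z} \\
\nabla^2 f &= \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}
\end{align*}
```

Applied to a vector field, the directional derivative and the Laplacian act on each component separately, which is how the advection and diffusion terms of the momentum equation are read.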
  • 15. The Navier-Stokes equations are commonly used because they precisely describe the evolution of a velocity field over time given its current state and other forces (Stam, 2003). The key task of a fluid solver is to compute a numerical approximation of $\vec{u}$. This velocity field later controls the visual phenomena of the fluid, for example smoke density or fire reaction values.
2.1.1 Modelling the Simulation Space
Fluids are typically modelled in one of two ways: as a field or as a particle system. These are referred to as the Eulerian and Lagrangian viewpoints, respectively. The first considers the fluid as a region of points, each containing properties like velocity and density. These values change with time, but the points containing them stay fixed in space.
The Lagrangian viewpoint takes the more conventional approach of modelling the continuum as a set of particles. Each particle, in addition to carrying with it the properties of the fluid, has a position component. The easiest way to visualise this is to think of the particles as molecules of fluid that move in time.
The most common way of representing Eulerian fluids is as an arrangement of voxels, and Lagrangian ones as classic particle systems. A more in-depth description of these viewpoints can be found in (Bridson, 2008).
2.2 State of Fluids in Games
Fluid simulation in games covers a wide range of phenomena, the most important of which are water, smoke and fire. As this project mainly deals with the latter two, they will be the focus of discussion. For a more detailed look at the state of water in games, please refer to (Barrett, 2012).
3D games have to do an impressive amount of work to provide an immersive experience. During each update call physics, pathfinding, lighting, rendering and other calculations have to be performed. It is no surprise that developers look to simplify effects whenever they can. Particle systems have for the longest time been used to model smoke.
Particles are just 2D textured sprites which always face the virtual camera. Fire is often rendered the same way with the addition of static animations. With improvements in lighting and particle control the look of these effects does visibly improve, but at its core the simulation is not based on physical properties and is instead determined by design tools.
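The Eulerian and Lagrangian viewpoints described in Section 2.1.1 can be contrasted as data structures. The sketch below is illustrative only: `EulerianGrid`, `Particle` and `advance` are hypothetical names and not part of the project framework.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Eulerian view: a fixed grid of cells, each storing fluid properties.
// Cell positions are implicit in the index and never move.
struct EulerianGrid {
    std::size_t nx, ny, nz;
    std::vector<Vec3>  velocity;  // one entry per cell
    std::vector<float> density;   // one entry per cell

    EulerianGrid(std::size_t x, std::size_t y, std::size_t z)
        : nx(x), ny(y), nz(z),
          velocity(x * y * z, Vec3{0.0f, 0.0f, 0.0f}),
          density(x * y * z, 0.0f) {}

    // Map a 3D cell coordinate to its position in the flat arrays.
    std::size_t index(std::size_t i, std::size_t j, std::size_t k) const {
        assert(i < nx && j < ny && k < nz);
        return i + nx * (j + ny * k);
    }
};

// Lagrangian view: each particle carries its properties and its own position.
struct Particle {
    Vec3  position;
    Vec3  velocity;
    float density;
};

// Particles move with the flow (simple explicit Euler position update).
inline void advance(std::vector<Particle>& particles, float dt) {
    for (Particle& p : particles) {
        p.position.x += p.velocity.x * dt;
        p.position.y += p.velocity.y * dt;
        p.position.z += p.velocity.z * dt;
    }
}
```

In the grid the values change while the storage locations stay put; in the particle set the storage itself carries a position that changes every step.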
  • 16. An additional negative side to this representation is that it makes proper interaction with the fluids very difficult to achieve. By contrast, as is discussed later, in a suitably defined Navier-Stokes solver boundary conditions are part of the simulation process and can be used as a means of accurate interaction with the system.
2.3 LOD and Performance Overview
The need to render many different graphical effects on a system with limited resources is a widely explored challenge. Various techniques are employed to save processing time and system memory while still maintaining good graphical quality. It is worthwhile to study some of these methods with a view to how they might be adopted for fluid simulation.
A common performance boost when rendering 3D polygon mesh objects involves reducing the number of polygons they are made of (Valve, 2012). There are a variety of ways to accomplish this, either with pre-made low-poly versions of the mesh or via a procedural method at runtime, often using GPU shaders. A system is then set up to intelligently swap or blend between different versions based on various parameters, such as the ones mentioned above. The end result is that less graphics bandwidth is used and fewer computations are made. If done properly, the player never notices it.
Level of detail has uses beyond object rendering. It also has a place in complex calculations such as rigid body dynamics. When simulating collisions between bodies, for example, a simplified, less realistic calculation can take place if the player is not looking at the objects in question. Whatever the player is looking at needs to behave consistently, but objects outside this direct area are less important and approximations can be used.
A common occurrence in games is the need to render the same object many times. Creating copies of the resources required, like vertex buffers and textures, can quickly add up.
Instead, the same graphical data is used to render multiple copies of the object where required (Carucci, 2005). Since transferring triangle data from the CPU to the GPU and submitting state changes are relatively slow operations, instancing frees up valuable CPU processing time. Batching as many draw calls together as possible is an often advised method of optimising game renderers.
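As a rough illustration of how a level of detail scheme might be carried over to a fluid object, the sketch below picks a detail level from the camera distance and uses it to skip simulation updates. The thresholds, the three-level split and all names are assumptions made for this example; it is not the framework's implementation.

```cpp
#include <cassert>

// Hypothetical distance thresholds (world units) at which detail is reduced.
struct LodSettings {
    float nearDistance = 10.0f;
    float farDistance  = 40.0f;
};

// Returns 0 (full detail), 1 (reduced) or 2 (lowest) for a camera distance.
inline int selectLodLevel(float distanceToCamera, const LodSettings& s) {
    if (distanceToCamera < s.nearDistance) return 0;
    if (distanceToCamera < s.farDistance)  return 1;
    return 2;
}

// Frame skipping: update every frame at level 0, every 2nd frame at level 1
// and every 4th frame at level 2.
inline bool shouldUpdateThisFrame(unsigned frameIndex, int lodLevel) {
    assert(lodLevel >= 0 && lodLevel <= 2);
    const unsigned interval = 1u << lodLevel;  // 1, 2 or 4
    return frameIndex % interval == 0;
}
```

A caller would evaluate `selectLodLevel` once per fluid object per frame and only run the solver when `shouldUpdateThisFrame` returns true, while still rendering the last computed state.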
  • 17. Chapter 3
Literature Review
Investigations into the physics behind fluid simulation date back to the 18th and 19th centuries, when the mathematicians Euler, followed by Navier and Stokes, developed the basics of analytical solutions to fluid flows. With the start of the computer processing era came the possibility of calculating solutions to these equations numerically. Far from the idea of real-time applications, however, early research into the topic focused on engineering applications, striving for accuracy and not factoring in the time taken (Hess and Smith, 1967).
3.1 Early work on real-time solvers
In the late 90s and early 2000s the prospect of real-time simulations started to be actively discussed in research fields. Up to this point the majority of the investigation had been into offline graphical solvers. Also, the majority of numerical solvers by that point used explicit techniques, which suffer from instability unless a small simulation time step is provided. It was Jos Stam who, at the 1999 SIGGRAPH conference, proposed an implicit Navier-Stokes solver that was stable under larger time steps and was fast enough for results to be viewed instantly (Stam, 1999).
The importance of this paper stems from the fact that the method it put forward was designed to be used in real-time. Not only that, but this approach allows boundary conditions to be dynamic and, as such, opens the door to interactivity with the fluid. For game applications this is key. The resultant technique is very successful in simulating gaseous-type fluids and will influence this study.
  • 18. Figure 3.1: Advection step moving smoke density along a velocity field, as shown in Stam's solver from 2003 (Stam, 2003)
Stam's initial proposal has downsides, though. Namely, it suffers from numerical dissipation (also known as numerical diffusion or smoothing). As described by (Bai and Turk, 2005), this is due to the averaging operations performed when interpolating values in the numerical solvers of the differential equations, and Stam's method experiences it because of the lower-order accuracy of its advection routine. This not only tends to smooth out interesting features, like vortices in the fluid, but also makes the fluid appear too viscous.
With this problem in mind, Fedkiw et al. (Fedkiw et al., 2001) presented a seminal paper in the 2001 SIGGRAPH proceedings. In Visual Simulation of Smoke the incompressible Euler equations are used as the fluid solver on a staggered grid arrangement. They are combined with a new method called vorticity confinement (Steinhoff and Underhill, 1994), which injects back the energy lost due to numerical dissipation, effectively balancing out the simulation. The result is that, even on a fairly coarse grid, the aforementioned interesting features, such as swirling vortices in the smoke field, are preserved and the overall lifespan of the smoke is improved. Like Stam's proposed method, this one is stable for large time steps and allows for dynamic boundaries. The downside of this procedure is that it introduces an extra computational step in the algorithm. The step itself is not overly expensive and greatly enhances the simulation's look, so it will be considered for this research.
  • 19. 3.2 The GPU advantage
At the SIGGRAPH 2003 conference the power of the GPU was the topic of discussion. Krüger and Westermann demonstrated that the parallelism of graphics processors can be used as a matrix solver and to handle finite difference equations for PDE approximations (Krüger and Westermann, 2003). On an ATI 9800 card an interactively visualised 2D Navier-Stokes solution ran at 9 FPS (frames per second) on a 1024x1024 grid. In contrast, the CPU solvers provided by (Fedkiw et al., 2001) need more than a second per frame on a similarly sized domain. The advantage of the GPU became obvious and fluid dynamics research reflected that.
In 2004, as part of the GPU Gems book (Fernando et al., 2004), Harris wrote a chapter entitled Fast Fluid Dynamics on the GPU (Harris, 2004). He described a method, based on Stam's Stable Fluids technique, that offloaded all equation of motion calculations to the graphics card and produced a very fast interactive 2D Navier-Stokes solver. He successfully demonstrated how the grid data can be translated into textures and how pixel shaders, which run simultaneously on each pixel every render call, can be used to calculate the simulation. To solve the Poisson-pressure equation he used a Jacobi iteration scheme which, compared to the conjugate gradient and multigrid solvers of Krüger and Westermann, converges more slowly but is simple to implement and makes easy use of parallel calculations. Harris details how his approach can be easily extended to allow for arbitrary boundaries (indeed, it only requires adding a texture that contains them to each shader).
This GPU Gems chapter provides a straightforward introduction to using shaders as a fluid solver and will influence the early investigation of this study. Harris also describes a means of extending the domain into 3D by layering 2D textures, but as current technology allows for easy use of 3D textures, it will not be considered.
As graphics hardware saw staggering growth, Harris' work was built upon in 2007 in GPU Gems 3 (Nguyen et al., 2007). In the chapter Real-Time Simulation and Rendering of 3D Fluids (Crane et al., 2007) the authors extend GPU simulation of real-time fluid dynamics into the 3D domain. Their example program successfully simulated fire, smoke or water in a 70x70x100 grid. Additionally, using the powerful Direct3D 10 support for 3D textures and the then brand new geometry shader functionality, their method allowed any 3D object to be voxelised and used as a dynamic boundary for the simulation. Results of this can be seen in the game Hellgate: London (Studios, 2007), which utilises this procedure.
  • 20. Figure 3.2: Smoke being pushed by, and moving around, a gargoyle in Hellgate: London
In the 2004 GPU Gems chapter, Harris uses a semi-Lagrangian backward advection step that is based upon the one used by Stam (Stam, 1999) and as such suffers from numerical smoothing. Crane et al. address this issue by utilising a MacCormack scheme, which is a higher-order accuracy advection solver, in addition to vorticity confinement. While this introduces two intermediate semi-Lagrangian steps in the advection process and is not unconditionally stable, it allows for better visual fidelity of the final result without increasing the grid resolution. This saves memory bandwidth at the expense of more computation but, in the chapter's words, "math is cheap compared to bandwidth". As this project will be looking into 3D smoke and fire simulation by utilising graphics hardware, this work will be used for reference.
3.3 3D Fluid Rendering
Graphics cards are optimised for rendering polygons, and especially triangles. This must be taken into account when it comes to displaying volumetric data, such as smoke, as there is no native way of rendering a volume.
3.3.1 Particle Systems
In almost all early real-time fluid solvers (Stam, 2003) and many modern ones (Gourlay, 2012; McGuire, 2006), the approach is to use particles. This has the initial advantage of using an already established system that is common in games and other graphical applications. Lagrangian or semi-Lagrangian schemes are used to represent the domain. In the example from Gourlay there are two types of particles. The first are called vortex particles (or vortons). They are used to represent the flow field and are free to move anywhere. The second type are regular particles, used for visualisation of the effect. They change their colour and opacity depending on the vortons around them.
  • 21. Using particle systems is a natural fit for a Lagrangian scheme and has the advantage that the simulation space can be global instead of being constrained to a grid size. The disadvantage of using particles to visualise the fluid is the CPU and GPU memory and processing overhead of storing and updating all of them. The finer the level of detail required, the more particles need to be used. Another downside is that an unconstrained, dynamic simulation space is difficult to implement when using the GPU.
3.3.2 Volume Ray Casting
The other main rendering technique, which has gained more traction in GPU fluid solvers, is called ray-marching (or volume ray casting). This is the approach used by (Crane et al., 2007) and (Zhou et al., 2007). This method works by considering the fluid as a box made up of many voxels which contain the fluid properties. When rendering, rays are traced from the point of view to the volume. The rays are then marched through the domain with a predefined sample rate, accumulating colour based on what the volume contains (smoke density, for example). This ends either when enough density has been collected to produce a fully saturated colour, or when the ray exits the volume. Usually, to get a decent visual result, a step size equal to half a voxel is used when marching. Results of using volume ray casting can be seen in Figure 3.2.
There are certain problems present in ray-marching. As discussed by (Crane et al., 2007), banding is a visible artifact that appears if the sample step is too big or the grid resolution is too small. It is most prevalent when looking at the fluid up close. There are ways to compensate for this, either by using a smaller sample step or by taking an extra sample at each step. Both come at an additional computational cost, though.
Ray-marching fits very well with the Eulerian and semi-Lagrangian schemes, as it considers the simulation domain as a fixed grid with changing properties.
It is also a technique that is inherently parallel and can naturally be implemented using GPU pixel shaders. As this work will focus entirely on leveraging the graphics card, volume ray casting will be the rendering method considered.
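The marching loop described in Section 3.3.2 can be sketched on the CPU as follows. This is a minimal illustration, assuming nearest-voxel sampling, a ray expressed in voxel units that starts at its entry point into the volume, and a simple front-to-back opacity accumulation; `DensityVolume`, `marchRay` and the `absorption` parameter are hypothetical and do not come from the cited implementations.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct DensityVolume {
    int nx, ny, nz;
    std::vector<float> density;  // nx * ny * nz values, one per voxel

    // Nearest-voxel lookup; positions outside the box contribute nothing.
    float sample(float x, float y, float z) const {
        const int i = static_cast<int>(std::floor(x));
        const int j = static_cast<int>(std::floor(y));
        const int k = static_cast<int>(std::floor(z));
        if (i < 0 || j < 0 || k < 0 || i >= nx || j >= ny || k >= nz) return 0.0f;
        return density[static_cast<std::size_t>(i + nx * (j + ny * k))];
    }
};

// March from the entry point (ox, oy, oz) along the normalised direction
// (dx, dy, dz). exitDistance is the length of the ray segment inside the box.
// Returns the accumulated opacity in [0, 1].
inline float marchRay(const DensityVolume& vol,
                      float ox, float oy, float oz,
                      float dx, float dy, float dz,
                      float exitDistance, float absorption) {
    const float stepSize = 0.5f;  // half a voxel per step
    float opacity = 0.0f;
    // Stop when the ray leaves the volume or the result is saturated.
    for (float t = 0.0f; t < exitDistance && opacity < 0.99f; t += stepSize) {
        const float d = vol.sample(ox + dx * t, oy + dy * t, oz + dz * t);
        const float a = std::min(1.0f, d * absorption * stepSize);
        opacity += (1.0f - opacity) * a;  // front-to-back compositing
    }
    return opacity;
}
```

On the GPU the same loop would run once per pixel in a pixel shader, with the volume sampled from a 3D texture using hardware filtering instead of the nearest-voxel lookup used here.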
  • 22. Chapter 4
Methodology
4.1 Introduction and Goals
Upon completing the research into this topic and revising the major project aims, implementation goals are set. To recap, the main objective of this project is to provide an effective way of performing fluid simulation on the GPU and to adapt it for use at runtime. With this in mind, the framework created is tasked with fulfilling the following:
    • Support simulation of both fire and smoke.
    • Compute and render at least 2 different fluids at the same time.
    • Maintain a frame rate of at least 30 FPS on low-to-mid tier graphics cards and 60 FPS on high-end ones.
    • Showcase improved application performance by using one or more LOD techniques.
4.1.1 Framework Architecture
The framework developed for this project targets the Windows 7+ operating systems. It uses Direct3D for rendering and is implemented in C++. For graphics card operations it makes use of the powerful DirectCompute API along with HLSL (High-Level Shading Language) for writing compute and pixel shaders.
  • 23. 4.1.2 Methodology Structure
The methodology starts by describing how the fluid domain is represented. It then covers the setup of a simulation and the optimisations that it makes use of. What follows is a description of the process of running a fluid simulation and the techniques that are used to control it at runtime. The methodology concludes with how fluids are rendered.
Please note that the terms fluid simulation and fluid object are used interchangeably.
4.2 Fluid Domain Representation
In order to solve the fluid equations of motion numerically, the domain must be discretised into computational points that the solver works with. As discussed in the background chapter, an Eulerian representation models a domain as a grid of points that contain the properties of the fluid. This is the approach this framework utilises, the main reason being that 3D Eulerian grids can be logically mapped to voxels in 3D textures, which contain the data the GPU requires. The following is part of a function which creates a 3D texture that can hold four 16-bit floating point numbers per voxel:
    D3D11_TEXTURE3D_DESC textureDesc;
    textureDesc.Width  = SizeX;                           // grid dimensions in voxels
    textureDesc.Height = SizeY;
    textureDesc.Depth  = SizeZ;
    textureDesc.Format = DXGI_FORMAT_R16G16B16A16_FLOAT;  // four 16-bit floats per voxel
Here SizeX, SizeY and SizeZ vary depending on the size of the required domain bounds. The format can also be changed.
Each fluid uses a number of equally sized textures to represent its state and properties. These are then stored on and used by the graphics card. The equations of motion are calculated by running computational kernels (implemented as shader programs) over the textures.
  • 24. 4.3 Setting up a Simulation
The first requirement when creating a new fluid object is to determine its grid dimensions. Textures of this size are then created for the various fluid properties. Each texture is then used to create a ShaderParams structure, outlined below:
    #include <atlbase.h>  // CComPtr
    #include <d3d11.h>    // Direct3D 11 view interfaces

    struct ShaderParams {
        CComPtr<ID3D11ShaderResourceView>  mSRV;  // read-only view, used as shader input
        CComPtr<ID3D11UnorderedAccessView> mUAV;  // writable view, used as shader output
    };
Generally, a ShaderResourceView (SRV) is used as an input to a shader program, as it can only be read from, and an UnorderedAccessView (UAV) is used as an output, as it can be written to. A more detailed description of Direct3D resource interfaces can be found in (MSDN, 2014).
For all fluid properties except divergence, vorticity and obstacles, 2 textures and ShaderParams structures are created. This is due to the need to keep track of the fluid state at the previous time step in order to evaluate the new one.
The type of the fluid is also decided during this step. The framework allows for two kinds: fire and smoke. While simulating both is nearly identical, fire simulation requires 2 extra textures to keep track of the fire reaction values (these determine the intensity of the fire at each cell and are used when rendering it).
At the end of the setup, static boundary conditions are initialised. As fluids are modelled as being in a box domain, boundary conditions are modelled as a single-cell wide obstacle along each wall of the box and are stored in an 8-bit texture.
4.3.1 Choosing a Grid Size
Grid size has the biggest effect on how fast a simulation step is processed. A 32x64x32 domain, for example, will be evaluated more than 3 times faster than a 64x128x64 one. This is both because there are fewer cells to process and because smaller textures need less memory bandwidth.
The render size of the fluid is independent of its simulation domain. This means that both a high-resolution and a low-resolution grid can be rendered at the same size.
  • 25. Upscaling the render size of a coarse grid can lead to visible artifacts, as there will be fewer cells to sample for a good appearance. Similarly, rendering a fine grid at a smaller scale is potentially wasteful, as the increased detail is harder to spot.
Figure 4.1: Different grids, same render scale. Domain size from left to right: 16x32x16, 32x64x32, 64x128x64
When setting up a simulation, it is best to first determine the render scale required and use that to choose the appropriate grid resolution in order to achieve the detail needed. It is worth noting that dynamically resizing textures at runtime is not feasible, so grid sizes stay fixed throughout execution.
4.3.2 Setup Optimisations
Memory is a key factor during setup since, depending on the fluid, a simulation can use up to 15 textures (due to double buffering) to keep all the required data for its state. These all have to be stored in memory and bound to and unbound from the graphics pipeline every frame. Using many fluid objects carries the risk of bottlenecking the GPU and starving it of memory.
Texture Formats
Direct3D offers an expansive range of texture formats that can be used on the GPU. Choosing the correct format, based on the number of components needed and their size, helps reduce the video memory used and keeps texture bandwidth low, increasing performance. This is the first place for potential optimisations.
When constructing fluid object textures, the format chosen is the smallest one that can contain the data. The DXGI_FORMAT_R16G16B16A16_FLOAT format is used for textures that hold fluid velocity and vorticity, since it is the smallest one
  • 26. that can hold the 3 components required per cell. Density and pressure, on the other hand, only need 1 component per grid cell, which means they can be created with the DXGI_FORMAT_R16_FLOAT format. This leads to using 4 times less memory. It is worth mentioning that using 16 bits per float is in itself an optimisation over using 32-bit floats and, as shown in other research (Crane et al., 2007), the visual degradation due to the reduced precision is hardly discernible.
Below is a table of all formats used and the properties they are used for.
Table 4.1: Texture formats and their uses
    Direct3D Texture Format          | Fluid Property
    DXGI_FORMAT_R16G16B16A16_FLOAT   | Velocity, Vorticity
    DXGI_FORMAT_R16_FLOAT            | Density, Temperature, Reaction, Divergence, Pressure
    DXGI_FORMAT_R8_SINT              | Obstacles
Texture Sharing
Several resources need to be created for each fluid object and are unique to it. These are the velocity, density, temperature, vorticity, obstacle and reaction (for fire simulation) textures. They are unique because they must be maintained throughout program execution.
On the other hand, the textures for velocity divergence and fluid pressure are only used temporarily by each solver. To take advantage of this, when a fluid simulation is constructed it first checks whether an instance of these common resources has already been created for its grid size. If so, it uses them; if not, it constructs them and makes them available for further sharing. With this in mind, when using many fluid objects it is advantageous to build them with the same grid size.
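The memory saving from format choice and the sharing of temporary resources can be illustrated with a small sketch. The per-voxel sizes follow directly from the format names in Table 4.1; the `CommonResources` stand-in and the `CommonResourceCache` class are hypothetical and only mimic the lookup described above, without creating any Direct3D objects.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <tuple>

// Bytes per voxel for the formats listed in Table 4.1.
constexpr std::size_t kBytesRGBA16F = 8;  // DXGI_FORMAT_R16G16B16A16_FLOAT: 4 x 16 bits
constexpr std::size_t kBytesR16F    = 2;  // DXGI_FORMAT_R16_FLOAT: 1 x 16 bits
constexpr std::size_t kBytesR8      = 1;  // DXGI_FORMAT_R8_SINT: 1 x 8 bits

// Raw size of one 3D texture, ignoring any driver padding or alignment.
constexpr std::size_t textureBytes(std::size_t x, std::size_t y, std::size_t z,
                                   std::size_t bytesPerVoxel) {
    return x * y * z * bytesPerVoxel;
}

// Stand-in for the temporary divergence and pressure textures (hypothetical).
struct CommonResources {
    std::size_t x, y, z;
};

// Hands out one shared set of temporary resources per grid size.
class CommonResourceCache {
public:
    std::shared_ptr<CommonResources> acquire(std::size_t x, std::size_t y, std::size_t z) {
        const auto key = std::make_tuple(x, y, z);
        const auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // reuse existing resources
        auto created = std::make_shared<CommonResources>(CommonResources{x, y, z});
        cache_[key] = created;                      // make available for sharing
        return created;
    }

private:
    std::map<std::tuple<std::size_t, std::size_t, std::size_t>,
             std::shared_ptr<CommonResources>> cache_;
};
```

For a 64x128x64 grid this gives 4 MiB for a four-component texture against 1 MiB for a single-component one, which is the factor of 4 mentioned in the text. Two fluids of equal size receive the same shared object from the cache, while a differently sized fluid triggers a new allocation.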
  • 27. 4.4 Running a Simulation
Once the 3D scene has been initialised, the main application loop begins. In it, each fluid simulation is updated by the numerical equation of motion solver. For this implementation, fluids are modelled as both incompressible and inviscid. Thus, the equations of motion become:
$$\frac{\partial \vec{u}}{\partial t} = -(\vec{u}\cdot\nabla)\vec{u} - \frac{1}{\rho}\nabla p + \vec{F} \tag{4.1}$$
$$\nabla\cdot\vec{u} = 0 \tag{4.2}$$
The calculation process involves solving each part of these equations in order, using the result from each as the input to the next. The function that does this every update is outlined below:
    void Fluid3DSolver::Process(ID3D11DeviceContext *context) {
        // Set the obstacle texture - it is constant throughout the execution step
        context->CSSetShaderResources(4, 1, &(mFluidResources.obstacleSP.mSRV.p));
        // Set all the constant buffers to the context
        SetShaderBuffers(context);
        // Advect temperature, density and reaction against velocity
        AdvectProperties();
        // Advect velocity against itself
        AdvectVelocity();
        // Determine how the temperature of the fluid changes the velocity
        ComputeBuoyancy();
        // Add a constant amount of density and temperature back into the system
        RefreshConstantImpulse();
        // If there are any extra forces - add them here
        ApplyExtraForces();
        // Inject vorticity back into the system
        ComputeVorticityConfinement();
        // Subtract the pressure gradient from the velocity field.
        // This computes divergence free velocity.
        ComputeProjection();
    }
  • 28. Firstly, the function binds the obstacle texture to the graphics pipeline. This is because nearly all compute shader programs query this texture, so there is no need to constantly rebind it. Afterwards, the SetShaderBuffers function copies the required fluid computational parameters, which control aspects of the calculation, from standard C++ structs into GPU constant buffers.
4.4.1 Simulation Steps
The rest of the functions solve the equations of motion. They all have underlying similarities: each binds the required SRVs as inputs and UAVs as outputs to its respective shader program, and each is calculated using a numerical method that estimates its value. The steps are outlined below in order of execution.
Advection
Advection is what happens when the velocity field of the fluid transports other quantities, including itself, along the flow. This is described by the term $(\vec{u}\cdot\nabla)\vec{u}$. There are two methods that the framework uses to calculate advection.
The first, simpler one is the trace-back implicit routine (Stam, 1999). It uses a semi-Lagrangian scheme to calculate the new quantity of a fluid property at a position by tracing back the trajectory to its former cell and copying the quantity. The advantage of this advection technique is that it is unconditionally stable for any time steps and velocities.
$$p(\vec{x}, t+\Delta t) = \alpha \times p(\vec{x} - \vec{u}(\vec{x}, t)\,\Delta t,\ t) - \mu \tag{4.3}$$
Here $p(\vec{x}, t+\Delta t)$ is the advected quantity (not the pressure) at the new time step. $\alpha$ is a user-defined dissipation term. It is in the range $\alpha \in [0, 1]$ and it artificially controls how fast the quantity being advected dissipates: 1 means no dissipation and lower values lead to the quantity disappearing faster. $\mu$ is the decay constant. It is used only for fire simulation and controls how fast the fire reaction dies out. When it is used, the end result is clamped so that it does not go below 0.
The second advection routine used is the one proposed in (Crane et al., 2007): the MacCormack scheme.
It works by first performing two semi-Lagrangian steps, one by tracing forward and one by tracing back. Using those values, it performs a higher-order accuracy calculation, which leads to less numerical diffusion than the
  • 29. previous routine.
$$\hat{\phi}^{n+1} = A(\phi^{n}), \qquad \hat{\phi}^{n} = A^{R}(\hat{\phi}^{n+1}), \qquad \phi^{n+1} = \alpha \times \left( \hat{\phi}^{n+1} + \tfrac{1}{2}\left(\phi^{n} - \hat{\phi}^{n}\right) \right) - \mu \tag{4.4}$$
Here $\phi^{n}$ is the advected property, $\hat{\phi}^{n+1}$ and $\hat{\phi}^{n}$ are the two intermediate properties and $\phi^{n+1}$ is the final property at the new time step. $A$ performs the advection routine 4.3 on the passed quantity and $A^{R}$ indicates that it is performed in reverse (meaning with a negative time step value). Again $\alpha$ is the dissipation factor and $\mu$ is the decay constant. When doing MacCormack advection, no artificial dissipation or decay is applied in the first two steps. Since this advection routine is not unconditionally stable, the final result is clamped within the minimum and maximum values of the surrounding grid cells.
While the MacCormack scheme gives improved detail, it requires two additional textures to hold the intermediate results and incurs the computational cost of calculating them. The memory cost can be offset by using the texture sharing mentioned previously between simulations for these temporary values. The computational cost is dealt with by using MacCormack advection only for the density and reaction properties and the standard routine for the temperature and velocity fields.
Figure 4.2: Left: Using MacCormack for density and reaction only; Right: Using MacCormack for all fields
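Both advection routines can be sketched on a one-dimensional grid with unit cell spacing. This is a simplified CPU illustration of equations 4.3 and 4.4 under stated assumptions: linear interpolation, positions clamped to the grid, and the MacCormack result limited to the two cells surrounding the traced-back position. The function names are hypothetical, and the framework itself runs the equivalent logic in compute shaders on 3D textures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Field = std::vector<float>;

// Clamp a continuous position to the valid range [0, n-1] of the grid.
inline float clampToGrid(const Field& q, float x) {
    return std::max(0.0f, std::min(x, static_cast<float>(q.size() - 1)));
}

// Linearly interpolate q at the continuous position x.
inline float sampleLinear(const Field& q, float x) {
    x = clampToGrid(q, x);
    const std::size_t i0 = static_cast<std::size_t>(x);
    const std::size_t i1 = std::min(i0 + 1, q.size() - 1);
    const float f = x - static_cast<float>(i0);
    return q[i0] * (1.0f - f) + q[i1] * f;
}

// Equation 4.3 without dissipation or decay: trace back along the velocity
// and copy the value found there.
inline Field advectSemiLagrangian(const Field& q, const Field& u, float dt) {
    assert(q.size() == u.size() && !q.empty());
    Field out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i)
        out[i] = sampleLinear(q, static_cast<float>(i) - u[i] * dt);
    return out;
}

// Equation 4.4: forward step, reverse step, then the corrected and clamped value.
inline Field advectMacCormack(const Field& q, const Field& u, float dt,
                              float dissipation, float decay) {
    const Field forward  = advectSemiLagrangian(q, u, dt);         // phi_hat^{n+1}
    const Field backward = advectSemiLagrangian(forward, u, -dt);  // phi_hat^{n}
    Field out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        float v = forward[i] + 0.5f * (q[i] - backward[i]);
        // Limit the result to the cells surrounding the traced-back position.
        const float x = clampToGrid(q, static_cast<float>(i) - u[i] * dt);
        const std::size_t i0 = static_cast<std::size_t>(x);
        const std::size_t i1 = std::min(i0 + 1, q.size() - 1);
        const float lo = std::min(q[i0], q[i1]);
        const float hi = std::max(q[i0], q[i1]);
        v = std::max(lo, std::min(v, hi));
        // Dissipation and decay are applied only in this final step.
        v = dissipation * v - decay;
        out[i] = (decay > 0.0f) ? std::max(v, 0.0f) : v;
    }
    return out;
}
```

With a uniform velocity of one cell per time step, both routines shift a single spike by exactly one cell, so the difference between them only shows up for fractional displacements, where linear interpolation starts to smear the semi-Lagrangian result.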
  • 30. As can be seen, due to the chaotic nature of both fire and smoke, the extra detail gained by using the more expensive advection routine on all fields is hardly discernible. In fact, the only difference can be seen at the beginning of a simulation, as the MacCormack one advects slightly faster.
Buoyancy
Buoyancy is what causes hot air to rise and cool air to fall. In the simulation it is used to modify the velocity field at each grid cell based on the temperature and density values at that cell, the density weight and buoyancy factors, and the ambient temperature of the environment. It is one of the external forces $\vec{F}$ in equation 4.1.
$$\vec{f}_{buoyancy} = \left( (T - T_{amb})\,\varphi - (\rho \times \kappa) \right) \vec{v}_{up}\,\Delta t \tag{4.5}$$
Here $T$ and $\rho$ represent the temperature and density at the current grid location respectively. $T_{amb}$ is the ambient temperature of the fluid; if not used, it can be left at 0. The buoyancy factor of the density field is $\varphi$. It controls how buoyant the smoke is, meaning how quickly it rises with the hot air. $\kappa$ is the smoke weight: a higher value will exert a stronger force on the velocity field and will make it die out faster. The result is multiplied by the global up vector $\vec{v}_{up}$ and the current time step $\Delta t$. The resultant force is then applied to the velocity value at that cell location.
Constant Impulse and External Forces
All fluid objects have been designed to have new quantities added every simulation step. For smoke, a constant amount of temperature and density is injected from the bottom of the domain. The addition of temperature helps the system maintain velocity, and the addition of density keeps a steady stream of smoke, which is the final visible result.
With fire simulation the addition of temperature remains the same. Extra reaction is also injected into the system along with the temperature. This is analogous to adding fuel to a fire. Afterwards, an extinguishment test is performed on the grid.
This samples reaction values and determines if they are below an extinguishment threshold. If so, smoke is formed based on a reaction constant.
The ApplyExtraForces method is mainly reserved for future use. This is where forces such as wind can be added to introduce more chaotic behaviour into the system. User interaction with fluids can also be accomplished using this function. Any
  • 31. quantity used by the simulation can be added in this step.
Vorticity Confinement
Even when using a higher-order advection routine, the solver still suffers from numerical dissipation. Vorticity confinement (Fedkiw et al., 2001) tries to offset this by calculating the local vorticity
$$\vec{\omega} = \nabla\times\vec{u} \tag{4.6}$$
and injecting it back into the velocity field. Calculating the vorticity is the first step of the process. Afterwards a normalised vorticity location vector is retrieved using:
$$\vec{\eta} = \nabla|\vec{\omega}|, \qquad \vec{N} = \frac{\vec{\eta}}{|\vec{\eta}|} \tag{4.7}$$
In both equations, vector operations are estimated using finite difference methods. The final confinement force is then calculated by:
$$\vec{f}_{conf} = \epsilon\,(\vec{N}\times\vec{\omega})\,\Delta t \tag{4.8}$$
In this equation $\epsilon > 0$ is called the strength factor and controls the amount of small scale detail that is introduced back into the velocity field. In this project's implementation it is clamped to the range $\epsilon \in [0, 1]$. This force is then added to the existing flow.
Vorticity confinement requires the addition of 1 extra texture per fluid and 2 relatively cheap shader program operations per update step. The technique proves vital to the proper appearance of both smoke and fire, and the gain in visual quality more than makes up for its cost.
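The two passes of vorticity confinement can be sketched on the CPU as follows. This is a minimal illustration of equations 4.6 to 4.8, assuming a cubic grid with unit spacing, central differences on interior cells only, and boundary cells whose vorticity is left at zero; `VectorGrid` and both function names are hypothetical and do not reflect the framework's shader code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

struct VectorGrid {
    int n;                    // cubic grid of n * n * n cells, unit spacing
    std::vector<Vec3> cells;
    explicit VectorGrid(int size)
        : n(size),
          cells(static_cast<std::size_t>(size) * size * size, Vec3{0.0f, 0.0f, 0.0f}) {}
    Vec3& at(int i, int j, int k) { return cells[i + n * (j + n * k)]; }
    const Vec3& at(int i, int j, int k) const { return cells[i + n * (j + n * k)]; }
};

inline float length(const Vec3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// First pass (equation 4.6): omega = curl(u) using central differences.
inline VectorGrid computeVorticity(const VectorGrid& u) {
    VectorGrid w(u.n);
    for (int k = 1; k < u.n - 1; ++k)
        for (int j = 1; j < u.n - 1; ++j)
            for (int i = 1; i < u.n - 1; ++i) {
                const Vec3& xp = u.at(i + 1, j, k); const Vec3& xm = u.at(i - 1, j, k);
                const Vec3& yp = u.at(i, j + 1, k); const Vec3& ym = u.at(i, j - 1, k);
                const Vec3& zp = u.at(i, j, k + 1); const Vec3& zm = u.at(i, j, k - 1);
                w.at(i, j, k) = Vec3{
                    0.5f * ((yp.z - ym.z) - (zp.y - zm.y)),
                    0.5f * ((zp.x - zm.x) - (xp.z - xm.z)),
                    0.5f * ((xp.y - xm.y) - (yp.x - ym.x))};
            }
    return w;
}

// Second pass (equations 4.7 and 4.8): add epsilon * (N x omega) * dt to u.
inline void applyVorticityConfinement(VectorGrid& u, const VectorGrid& w,
                                      float epsilon, float dt) {
    for (int k = 1; k < u.n - 1; ++k)
        for (int j = 1; j < u.n - 1; ++j)
            for (int i = 1; i < u.n - 1; ++i) {
                // eta = gradient of the vorticity magnitude
                const Vec3 eta{
                    0.5f * (length(w.at(i + 1, j, k)) - length(w.at(i - 1, j, k))),
                    0.5f * (length(w.at(i, j + 1, k)) - length(w.at(i, j - 1, k))),
                    0.5f * (length(w.at(i, j, k + 1)) - length(w.at(i, j, k - 1)))};
                const float mag = length(eta);
                if (mag < 1e-6f) continue;  // no well-defined direction to normalise
                const Vec3 N{eta.x / mag, eta.y / mag, eta.z / mag};
                const Vec3& o = w.at(i, j, k);
                Vec3& v = u.at(i, j, k);
                v.x += epsilon * (N.y * o.z - N.z * o.y) * dt;
                v.y += epsilon * (N.z * o.x - N.x * o.z) * dt;
                v.z += epsilon * (N.x * o.y - N.y * o.x) * dt;
            }
}
```

The two functions correspond to the two shader operations mentioned above, and the intermediate `VectorGrid` returned by `computeVorticity` plays the role of the extra vorticity texture.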
  • 32. Figure 4.3: Different vorticity confinement strengths. Strength factors from left to right: 0, 0.5, 1.0

As can be seen, a suitable strength factor lies between 0.5 and 1.0. The simulations in the project application use values in that range, with fire simulations tending towards the higher end.

Projection

Up to this point a velocity field $\vec{w}$ has been calculated, but it does not adhere to the continuity equation 2.2 as it is divergent. Therefore, the final step in each simulation update is to calculate a divergence-free flow field. Harris (2004) explains that the Helmholtz-Hodge Decomposition Theorem can be used to correct the velocity by subtracting the gradient of the pressure field:

$\vec{u} = \vec{w} - \nabla p$ (4.9)

To compute the pressure field the following Poisson-pressure equation can be used:

$\nabla^2 p = \nabla \cdot \vec{w}$ (4.10)

These two equations are logically broken down into 3 operations. The first calculates the divergence of the velocity field $\nabla \cdot \vec{w}$ and stores it in a texture. Again, vector operations are estimated using finite differences.
  • 33. The second step solves the Poisson-pressure equation using a common method called a Jacobi iteration solver. It is a technique that converges relatively slowly to a solution but has the advantage of being cheap to run using GPU kernels (Harris, 2004). This project uses between 10 and 15 Jacobi iterations for both fire and smoke. A higher number will provide better looking, more accurate results but the computational cost rises quite steeply. As Crane et al. (2007) observe, higher iteration counts do not lead to noticeably better rendered results.

The final step is a straightforward subtraction of the resultant pressure gradient from the divergent flow field. The result is stored in $\vec{u}$, which becomes the new velocity field.

Boundary Interaction

As mentioned in section 4.3, all fluids have a single-voxel-wide obstacle texture on the box edges that acts as the boundary for the system. Cells in this texture have the value 1 if there is an obstacle at the location, or 0 if there is none. All computational steps have access to this texture and use it differently.

Its most important function is to enforce the free-slip boundary condition, which states that a fluid cannot flow into or out of a solid, but can freely flow along its surface. This is mainly done in the projection step, where the velocity component of a cell is taken as 0 if an obstacle is detected. When performing Jacobi iterations and sampling adjacent cells, the pressure component of an adjacent cell is not used if an obstacle is present there. This is the approach utilised by Crane et al. (2007).

Obstacles are similarly used in the computation of vorticity confinement and advection, forcing the velocity vector to be 0 if inside a boundary.

4.4.2 Runtime Modifications

Since so many variables control the appearance and structure of a fluid object, it is worthwhile to make as many of them as possible editable at runtime. These are all kept in a C++ struct called FluidSettings and, along with the domain size, are used when constructing a fluid. These parameters are then transferred to the GPU in various buffers during the SetShaderBuffers function from section 4.4.
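As an illustration of the Jacobi pressure solve used in the projection step of section 4.4.1, the sketch below runs a fixed number of iterations on a small CPU grid. It is a minimal sketch under stated assumptions, not the project's shader: `ScalarGrid`, `NeighbourPressure` and `SolvePressure` are hypothetical names, the grid spacing `h` defaults to 1, and a neighbour outside the domain is replaced by the centre pressure, which is one way to "not use" the pressure of a boundary cell.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical dense 3D scalar grid standing in for a 3D texture.
struct ScalarGrid {
    int nx, ny, nz;
    std::vector<float> cells;

    ScalarGrid(int nx_, int ny_, int nz_)
        : nx(nx_), ny(ny_), nz(nz_),
          cells(static_cast<std::size_t>(nx_) * ny_ * nz_, 0.0f) {
        assert(nx_ > 0 && ny_ > 0 && nz_ > 0);
    }
    std::size_t Index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * ny + y) * nx + x;
    }
    float& At(int x, int y, int z) { return cells[Index(x, y, z)]; }
    float At(int x, int y, int z) const { return cells[Index(x, y, z)]; }
    bool Contains(int x, int y, int z) const {
        return x >= 0 && x < nx && y >= 0 && y < ny && z >= 0 && z < nz;
    }
};

// Pressure of a neighbour, or the centre value when the neighbour lies outside
// the domain, so that the boundary contributes no pressure gradient.
float NeighbourPressure(const ScalarGrid& p, int x, int y, int z, float centre) {
    return p.Contains(x, y, z) ? p.At(x, y, z) : centre;
}

// Runs a fixed number of Jacobi iterations on laplacian(p) = divergence.
ScalarGrid SolvePressure(const ScalarGrid& divergence, int iterations, float h = 1.0f) {
    ScalarGrid p(divergence.nx, divergence.ny, divergence.nz);
    ScalarGrid next(divergence.nx, divergence.ny, divergence.nz);
    for (int i = 0; i < iterations; ++i) {
        for (int z = 0; z < p.nz; ++z) {
            for (int y = 0; y < p.ny; ++y) {
                for (int x = 0; x < p.nx; ++x) {
                    const float c = p.At(x, y, z);
                    const float sum =
                        NeighbourPressure(p, x - 1, y, z, c) + NeighbourPressure(p, x + 1, y, z, c) +
                        NeighbourPressure(p, x, y - 1, z, c) + NeighbourPressure(p, x, y + 1, z, c) +
                        NeighbourPressure(p, x, y, z - 1, c) + NeighbourPressure(p, x, y, z + 1, c);
                    // Discrete Poisson equation rearranged for the centre cell.
                    next.At(x, y, z) = (sum - h * h * divergence.At(x, y, z)) / 6.0f;
                }
            }
        }
        std::swap(p.cells, next.cells);  // ping-pong between the two buffers
    }
    return p;
}
```

Each iteration reads only the previous pressure values and writes to a second buffer, which is what makes the method map well to one GPU thread per cell.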
  • 34. At runtime, nearly all of the control parameters can be edited from a user interface window. This window appears when the user clicks on a fluid object with the mouse.

Figure 4.4: Users have the freedom to edit fluid control settings at runtime to observe their effects. Reaction values are not used for smoke simulation

When a parameter is edited, a method is called on the respective FluidCalculator for that object.

    void Fluid3DCalculator::SetFluidSettings(const FluidSettings &fluidSettings) {
        // Update buffers if needed
        int dirtyFlags = GetUpdateDirtyFlags(fluidSettings);
        this->fluidSettings = fluidSettings;
        if (dirtyFlags & BufferDirtyFlags::General) {
            UpdateGeneralBuffer();
        } else if ...
    }

It first checks which settings have changed and sets the necessary update dirty flags. Using the dirty flag pattern allows for only the constant buffers that have changed to be updated, instead of all of them.

  • 35. Updating a buffer involves copying its contents from GPU to system memory, changing them and then copying them back to the GPU, so it should not be overused, as advised by McDonald (2012). Dirty flags assist with this.

In a real game environment a player would not have such access to fluid settings, but this is immensely useful as a level of detail or game design tool, as it allows for fine-tuning of just how the simulation plays out.

4.4.3 Choosing an Update Rate

For updating objects each frame, games tend to use the difference between the time of the current frame and the time of the previous frame. This is referred to as the delta time. Since this can vary with frame rate, sensitive calculations such as game physics tend to use a fixed integration time step value that is independent of delta time.

This approach is used here. The value is, by default, 1/30, meaning 30 fluid updates per second. Note that this value controls how often the process method of a fluid is called, not the $\Delta t$ value for the calculation formulas, which is defined separately for each fluid.

The advantage of calling process at a fixed rate is that it keeps fluid movement consistent. If it were updated at a variable rate, each fluid would slow down or speed up, leading to a distorted look. 30 updates a second was chosen since it is fast enough for each fluid to develop with reasonable speed, while still keeping up decent performance. It can be changed at runtime, although if the update rate increases above the processing capability of the hardware, the application slows down as it cannot keep up with the required number of updates. Rates of around 30 to 50 a second are common choices, although higher ones are certainly achievable on better hardware.
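One common way to decouple a fixed update rate from the frame delta time is a time accumulator, sketched below. This is an assumption about how such a scheme could look rather than the project's code: `FixedRateUpdater` and `Advance` are hypothetical names, and the caller is expected to invoke the fluid's process method once for every update that `Advance` reports as due.

```cpp
#include <cassert>

// Hypothetical helper that converts variable frame delta times into a
// whole number of fixed-rate simulation updates.
class FixedRateUpdater {
public:
    explicit FixedRateUpdater(double updatesPerSecond)
        : interval_(1.0 / updatesPerSecond), accumulator_(0.0) {
        assert(updatesPerSecond > 0.0);
    }

    // Adds the frame's delta time (in seconds) and returns how many
    // fixed-rate updates are due this frame.
    int Advance(double deltaTime) {
        accumulator_ += deltaTime;
        int updatesDue = 0;
        while (accumulator_ >= interval_) {
            accumulator_ -= interval_;  // keep the remainder for the next frame
            ++updatesDue;
        }
        return updatesDue;
    }

private:
    double interval_;     // seconds between updates, e.g. 1/30
    double accumulator_;  // unspent time carried over between frames
};
```

With a rate of 30 updates per second, a 20 ms frame yields no update, a second 20 ms frame yields one, and a long 100 ms frame yields several. The last case illustrates why an update rate beyond the hardware's capability makes the application fall behind: the number of due updates per frame keeps growing.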
  • 36. 4.4.4 Frame Skipping

Even with the many simplifications, memory cutbacks and processing optimisations used, updating a reasonably-sized fluid object every frame is a demanding operation. This is where an LOD technique called frame skipping comes into use. Its premise is quite simple: instead of updating a fluid simulation every frame, do it every few frames. It is inspired by the approximation techniques used by game physics simulations and has previously been adopted for fluids (Tangvald, 2007). Below is the implementation as used in the project.

    void Update() {
        bool canUpdate = framesSinceLastProcess >= framesToSkip;
        if (canUpdate) {
            fluidCalculator->Process();
            framesSinceLastProcess = 0;
        } else {
            ++framesSinceLastProcess;
        }
    }

Although a very simple LOD method, frame skipping frees up substantial computing power, especially when using many fluid objects. Its downside is that its effects are quickly spotted. Even skipping one frame per process step means that the simulation updates half as often. Therefore, this technique is only used on fluids which are not in the current view frustum. Even then, it is only applied after the simulation has had a few seconds to develop. Afterwards, no difference in behaviour can be noticed when looking away and then back at a fluid object, since the behaviour is inherently chaotic.

The number of frames to skip can be changed at runtime. Note that the performance gained by skipping additional frames is not linear: most of the gain comes from skipping 2 or 3 frames, and it peaks at 5.
  • 37. 4.5 Rendering Fluids

Rendering the final result is done via the ray-marching technique previously discussed (Zhou et al., 2007). It was chosen because it is straightforward to implement in a standard pixel shader program and gives good visual results.

A fluid in 3D space is represented by an object called a VolumeRenderer, which at its core is a simple cube. It has position, rotation and scale components, which are all the properties required for rendering in 3D space. When an instance of a volume renderer is constructed it needs to know what type of fluid it will render: smoke or fire. If the type is smoke, it can be given a reference to a 3D texture (in the form of an SRV) of smoke density values that it then uses for drawing. If rendering fire, it can also be passed a reference to a 3D texture of fire reaction values in addition to density. This creation data is important, as different pixel shaders are used when rendering each type.

4.5.1 Render Parameters

There are certain parameters that affect the render result of a fluid simulation which can be modified at runtime.

Figure 4.5: Render settings modify the look of a fluid without changing its physical properties

Number of Samples

The number of samples is the sample rate described in section 3.3.2. It has a direct effect on the quality of the produced result. A higher rate will sample more density and reaction values, thus producing a more accurate average colour. It also means that more time will be spent in the pixel shader, which directly affects performance.

In practice, sample rate matters only when the fluid takes up a significant amount of screen space. This is due to the fact that a pixel shader is only run on the visible pixels on the screen that an object occupies.

  • 38. Figure 4.6: Different sample rates of a 64x128x64 fluid from afar. Left: 32 samples; Right: 128 samples

As can be seen, from afar the difference in quality is hardly discernible, although the step size difference is substantial. The performance of both is nearly identical: since the fluid occupies fewer pixels on screen, the extra time spent in the shader program is insignificant.

Figure 4.7: Different sample rates of a 64x128x64 fluid from up close. Left to right: 32, 64, 128 samples

This is the same flame as the one in the previous figure.

  • 39. When viewing from a closer distance, the quality of using a higher number of samples can be seen more clearly (this is more pronounced when seeing the fluid moving). This is due to the lower step value leading to a smaller range of colours used to represent the fluid. A visible improvement is seen when increasing the sample rate from 32 to 64, but only a very slight one when going from 64 to 128. This is because more samples cannot make up for the grid size of a fluid. Even for a relatively big domain, like the one in the figures, there will be little visual gain when using more than 100 samples per ray.

There are significant performance implications when using a higher sample rate with the fluid in full view, since the fluid takes up a large part of the screen. Depending on the view distance, rendering with 32 samples could be nearly twice as fast as rendering with 128. It is therefore best to decide upon a sample rate that gives a good visual result, yet still computes fast.

Colour and Absorption

Changing the Smoke Color property alters the colour of smoke for both types of simulation. The smoke and fire Absorption properties control how much to saturate the resultant colour when sampling density and reaction values respectively. A higher value will mimic thick smoke or flames, while a lower one will produce a weaker looking flame or lighter smoke.

4.5.2 Fluid Instancing

A key goal throughout the development of this project is to separate the concept of fluid motion calculation from fluid rendering. A Fluid3DCalculator object does not know about a VolumeRenderer and vice versa. The former is responsible for setting up and running the equations of motion on a set of 3D grids, while the latter will render suitable 3D textures passed to it.

Given this separation, it is straightforward to implement a form of instancing for fluids. This means that one fluid instance can be drawn multiple times by different volume renderers.
Since the cost of rendering is trivial compared to the cost of simulating, this allows a scene to seemingly contain many fire and smoke effects, while only computing a small number of simulations.
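The separation between simulation data and rendering can be sketched in a few lines. The types below are hypothetical stand-ins and not the project's `Fluid3DCalculator` or `VolumeRenderer` classes: `FluidData` plays the role of the 3D texture produced by one simulation, and each `VolumeRendererSketch` holds a shared, read-only reference to it together with its own render settings.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the 3D density texture produced by one simulation.
struct FluidData {
    std::vector<float> density;
};

// Hypothetical per-renderer settings; they change the look, not the physics.
struct RenderSettings {
    float absorption;
    int numSamples;
};

// Several renderers can share one FluidData, so the simulation runs only once.
class VolumeRendererSketch {
public:
    VolumeRendererSketch(std::shared_ptr<const FluidData> source, RenderSettings settings)
        : source_(std::move(source)), settings_(settings) {
        assert(source_ != nullptr);
    }
    const FluidData& Source() const { return *source_; }
    const RenderSettings& Settings() const { return settings_; }

private:
    std::shared_ptr<const FluidData> source_;  // read-only view of the simulation output
    RenderSettings settings_;
};
```

Because the renderers only read the shared data, an update written by the simulation becomes visible to every instance at once, while each instance keeps its own absorption and sample count.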
  • 40. Volume renderers using data from the same fluid instance will display identical results. To make them visibly dissimilar, each can be given different render parameters. A combination of colour and absorption can be used to achieve non-identical looking fluids. The final scene, as seen in Figure 5.1, is made out of 2 unique smoke simulations, one of which has 3 instances, and 1 unique fire simulation that has 2 instances.

Instancing and Frame Skipping

Frame skipping is used when a fluid simulation is not in view. Instancing means that the same fluid simulation can be in more than one place. To deal with this, before activating frame skipping, all volume renderers that use a particular fluid object are tested for visibility. If even one is in view, frame skipping will not occur.
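The visibility rule above amounts to a simple "none in view" test over the renderers that share a simulation. The following is a minimal sketch with hypothetical names (`RendererState`, `CanSkipFrames`); it assumes that the frustum test has already been performed elsewhere and stored as a flag, and it leaves out the additional condition that a simulation must first have had a few seconds to develop.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-renderer record: which simulation it draws and whether
// its bounding volume passed the view frustum test this frame.
struct RendererState {
    int fluidId;
    bool inViewFrustum;
};

// Frame skipping is allowed only if no renderer of this fluid is in view.
bool CanSkipFrames(int fluidId, const std::vector<RendererState>& renderers) {
    assert(fluidId >= 0);
    return std::none_of(renderers.begin(), renderers.end(),
                        [fluidId](const RendererState& r) {
                            return r.fluidId == fluidId && r.inViewFrustum;
                        });
}
```

For example, a fluid with three instances keeps updating at the full rate as long as any one of the three is visible.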
  • 41. Chapter 5 Results and Discussion

The previous chapter covered the implementation details of calculating the fluid equations of motion and rendering the result. It also discussed various optimisation methods used to make the process as performant as possible. This chapter will examine the results of the implementation to determine its effectiveness. This will involve scrutinizing both the visual results of the simulation and its performance.

5.1 Testing Setup

5.1.1 Hardware

The application was tested and benchmarked on two different systems. The first is a mid-tier laptop and the second is a high-end gaming PC.

Table 5.1: Hardware used for testing

       Laptop                                   PC
CPU    Intel Core i7-3632QM @ 2.20GHz           i5 3570K @ 4.5GHz
RAM    8 GB, DDR3                               12 GB, DDR3
GPU    NVIDIA GeForce GT 640M LE, 2 GB DDR3     ATI R9 280X, 3GB GDDR5
OS     Microsoft Windows 7 64-bit               Microsoft Windows 8.1 64-bit

The important difference between the two setups is the graphics card. The NVIDIA card, being from a mobile low-power series, has a clock speed around 2.5 times lower and a memory bandwidth around 11 times lower than the ATI one. Detailed specifications of both GPUs can be found in appendix A.
  • 42. Quantitative results are obtained during application runtime. There is an in-game frame counter to report on FPS. It displays the current, minimum, maximum and average frames per second achieved and is used as a benchmark for performance.

5.1.2 Scene

The test scene has been set up to fulfil the application requirements. There are 3 different fluids computed at the same time: 2 smoke effects and 1 fire effect. There are 6 volume renderers visualising the results of those simulations.

Figure 5.1: Looking at the entire final scene from a distance with all fluids in view

The user is free to control the camera, click on fluids and change or observe their parameters. There is also a scene fly-through mode, which performs a looping predefined movement through the scene. This mode features both up-close and distance views of the various fluids in the scene.
  • 43. 5.2 Visual Results

Real-world phenomena, such as smoke and fire, come with an inherent randomness and subtle features that computer graphics do not have the power to precisely mimic. With certain simplifications and smart uses of technology, though, the results obtained in this project go some way towards bridging that gap.

Figure 5.2: Smoke and fire simulation in the application

5.2.1 Modifying Parameters

Since the application allows the freedom to modify both fluid and render settings, it is very easy to produce different looking simulations.

Figure 5.3: Left: Fast decaying fire, producing a lot of smoke; Mid: Strong fire, burning with nearly no smoke; Right: Average strength fire, producing blue smoke
  • 44. 5.3 Memory and Performance Results

The main goal of this project is to prove that the parallel power of graphics cards has reached a threshold that would allow for real-time physically-based fluid simulation. For this reason memory use and frame times are both recurring topics throughout this project.

5.3.1 Memory Use

Section 4.3.2 discussed the various optimisations that are performed during the setup of a fluid object. By querying the GPU, it can be seen how much video memory fluids of different types and domain sizes use. Below is a table with several of these results for increasing grid resolution. These do not include video memory for rendering.

Table 5.2: Video memory used for simulations of different resolution

Grid Size      Smoke Memory   Fire Memory   Shared Memory
16x16x16       0.1 MB         0.11 MB       0.06 MB
32x32x32       1.8 MB         1.9 MB        0.8 MB
64x64x64       13.8 MB        14.8 MB       5.5 MB
128x128x128    110 MB         118 MB        44 MB

Smoke Memory is the video memory required per unique smoke effect and Fire Memory is the memory required per unique fire effect. Shared Memory is how much of that total can be shared with other simulations.

As can be seen, the memory required to store all of the textures that contain the fluid properties rises cubically with grid resolution, since doubling the resolution in every dimension multiplies the number of cells by 8. By utilising texture sharing, some of this memory cost is offset when using more than one fluid of the same size. Even so, using sizes bigger than $128^3$ is infeasible, both because of the memory cost and because the processing time rises quickly. A good option is to use a higher resolution only in one or two dimensions, while using a smaller one in the other.

Alternatively, grids in the range of $30^3$ to $50^3$ are ideal for modelling average sized uniform domains. Their memory cost comes to around 1 to 1.5 times that of high quality PNG images, which are often used as textures in games. Both test GPUs have at least 2 GB of video memory, so this is a small cost to pay.

  • 45. Finally, instancing allows for having many fire and smoke effects without paying the memory cost for creating each one. Its benefits are measured in the number of instances that use a single fluid object. Since the cost of a volume renderer is also insignificant compared to that of a simulation, instancing should be preferred, where appropriate, to creating a new fluid effect.

5.3.2 Performance

To recap, the final scene features 1 smoke simulation of grid size 64x128x64, another smoke simulation of grid size 30x60x30 and 1 fire simulation of size 40x80x40. There are a total of 6 volume renderers displaying the results of these simulations. Each simulation does 10 Jacobi solver iterations and uses a sample rate of 64 when ray-marching.

This scene was benchmarked on both test machines several times with increasing simulation update rates. Benchmarking involves running the scene in fly-through mode for a period of 5 minutes and noting down the minimum, maximum and average frame rates achieved.

Figure 5.4: Benchmark results on notebook computer using a NVidia GT 640M LE GPU

The substantial difference between the maximum FPS and the minimum and average FPS is noticed immediately. This is due to the use of frame skipping when some or all simulations are not in view, which frees up GPU resources. The minimum frame rate occurs when all fluid objects are in view and one or more are viewed up close, which increases render time. The majority of time in the fly-through mode is spent with all or 2 out of 3 simulations in view from a distance. This is what the average FPS captures.

  • 46. The benchmark results show that going above 30 updates/sec is not feasible on this setup since frames quickly start dropping. As mentioned previously, if the update rate forces the use of more clock cycles and texture bandwidth than available, the program slows down.

Figure 5.5: Benchmark results on gaming PC using an AMD Radeon R9 290X GPU

This graph displays the significance that increased memory bandwidth and clock speed have on performance. The AMD card only begins to show a decreased frame rate when doing over 150 updates/sec. Up until then, it consistently keeps an average of above 800 FPS. Only around the 200 updates/sec mark do the simulations start reaching the system limits.

In reality, though, there is no reason to use an update rate of more than 30-40 when that power can be spent on computing and rendering more fluid objects instead. These results show the potential that the new generation of GPUs has for handling such computationally intensive tasks.
  • 47. Chapter 6 Conclusion and Future Work

This project had the goal of investigating fluid simulation with the aim of answering the following question:

How can the parallel processing advantage of modern graphics cards be used for simulating physically-based fluids, and how can this approach be adapted for real-time use?

The particular goals were:

• Derive an effective way of utilising the GPU for solving the equations of fluid motion in 3D.
• Discover what level of detail methods and performance optimisations can be applied in order to use fewer system resources.

This research has demonstrated that the equations of fluid motion can be calculated in real-time with reasonable frame rates on the GPU. The project implementation offers an optimised and memory efficient solution for numerically solving and rendering fire and smoke with satisfactory results.

The performance tests in Chapter 5 clearly show that the newest generation of graphics cards is more than capable of updating and rendering many simulations at once. The tests also showed that low-to-mid tier cards can hold their own when dealing with a few reasonably sized fluid domains at an average update rate.
  • 48. 6.1 Future Work

This project covers how to efficiently implement a fluid solver and render the results. For a topic as broad as fluid simulation there is certainly more research that could be done.

One area that can be further investigated is implementing interactions with a fluid. The external forces part of the motion equations can be used to provide a form of user control of the system. Crane et al. (2007) implement a form of object voxelisation using a geometry shader to allow arbitrary 3D models to be used as obstacles in the simulation. This technique could be extended and improved to take into account different objects going into and out of a fluid domain, disturbing it based on their velocity and shape.

When there are only a few sources of constant input into a fluid domain, large parts of the 3D grid are left empty but still take up computational time. A better way to handle updating a fluid would be to split up each grid into chunks and determine whether a chunk contains fluid properties. Then, only the chunks that do would be updated. This technique has the potential to allow for much faster processing of bigger fluid domains.

To further increase visual quality, smoke rendering could take light sources into account and each fluid should be able to cast dynamic shadows. Additionally, a fire itself could be made a light source. This would be achieved by first creating a number of lights per fire simulation and then advecting their positions via the velocity field and controlling their brightness via the reaction field.
  • 49. Appendix A Test GPU Specifications

Figure A.1: Technical specifications of both graphics cards used for testing. The bandwidth and clock speeds are the key factors for performance
  • 50. Appendix B CD Contents

The attached CD contains the following directory structure:

Application     Contains the final application executable.
Dissertation    Contains an electronic copy of this dissertation document.
Instructions    Contains instructions for the operation of the application.
Media           Contains images and video of the final application.
Project         Contains the full source code and assets for the application.
Proposal        Contains an electronic copy of the original project proposal.
  • 51. References

Bai, Y. and Turk, G. 2005. Reducing numerical dissipation in fluid simulation. Georgia Institute of Technology. Available from: http://tinyurl.com/pcy4exs.
Barrett, J. 2012. Real-time animation and rendering of ocean waves. [Online].
Bridson, R. 2008. Fluid Simulation for Computer Graphics. CRC Press.
Carucci, F. 2005. Inside Geometry Instancing. Addison-Wesley Professional. Available from: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter03.html.
Crane, K., Llamas, I., and Tariq, S. 2007. Real-Time Simulation and Rendering of 3D Fluids. Addison-Wesley Professional. Available from: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch30.html.
Fedkiw, R., Stam, J., and Jensen, H. W. 2001. Visual simulation of smoke. In: SIGGRAPH 2001 Conference.
Fernando, R. et al. 2004. GPU Gems: Programming Techniques, Tips and Tricks for Real-Time Graphics. Addison Wesley. Available from: http://http.developer.nvidia.com/GPUGems.
Gourlay, M. 2012. Fluid simulation for video games. Intel Developer Zone. Available from: http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-3.
Gupta, S. 2011. GPU supercomputers show exponential growth in top 500 list. [Online]. Available from: http://blogs.nvidia.com/blog/2011/11/14/gpu-supercomputers-show-exponential-growth-in-top500-list/.
  • 52. Harris, M. 2004. Fast Fluid Dynamics Simulation on the GPU. Addison Wesley. chap. 38. Available from: http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html.
Hess, J. and Smith, A. 1967. Calculation of potential flow around arbitrary bodies. In: Progress in Aerospace Sciences.
Krüger, J. and Westermann, R. 2003. Linear algebra operators for GPU implementation of numerical algorithms. In: SIGGRAPH 2003 Conference. Available from: http://tinyurl.com/ozb5xpy.
McDonald, J. 2012. Don't throw it all away: Efficient buffer management. In: Game Developer Conference. Available from: https://developer.nvidia.com/sites/default/files/akamai/gamedev/files/gdc12/Efficient_Buffer_Management_McDonald.pdf.
McGuire, M. 2006. A real-time, controllable simulator for plausible smoke. Brown University. Available from: http://graphics.cs.williams.edu/papers/SmokeSimBrown06/smoke-simulation-brown06.pdf.
MSDN. 2010. Compute shader overview. [Online]. Available from: http://tinyurl.com/plpw97t.
MSDN. 2014. Resource interfaces. [Online]. Available from: http://tinyurl.com/mwledo4.
Nguyen, H. et al. 2007. GPU Gems 3. Addison-Wesley Professional. Available from: https://developer.nvidia.com/content/gpu-gems-3.
NVidia. 2013. NVidia Computational Fluid Dynamics Page. [Online]. Available from: http://www.nvidia.com/object/computational_fluid_dynamics.html.
Stam, J. 1999. Stable fluids. In: SIGGRAPH 1999 Conference. Available from: http://www.dgp.toronto.edu/people/stam/reality/Research/pdf/ns.pdf.
Stam, J. 2003. Real-time fluid dynamics for games. In: Game Developer Conference. Available from: http://www.dgp.toronto.edu/people/stam/reality/Research/pdf/GDC03.pdf.
Steinhoff, J. and Underhill, D. 1994. Modification of the Euler equations for "vorticity confinement": Application to the computation of interacting vortex rings. Physics of Fluids.
  • 53. Hellgate: London. 2007. DVD-ROM.
Tangvald, L. 2007. Implementing LOD for physically-based real-time fire rendering. [Online].
Valve, S. 2012. Level of detail. Valve Developer Portal. Available from: https://developer.valvesoftware.com/wiki/Level_of_detail.
Zhou, K. et al. 2007. Real-time smoke rendering using compensated ray marching. Microsoft Research. Available from: http://research.microsoft.com/pubs/70503/tr-2007-142.pdf.
  • 54. Bibliography

Acheson, D. 1990. Elementary Fluid Dynamics. Clarendon Press.
Rideout, P. 2011. 3D Eulerian grid. Available from: http://prideout.net/blog/?p=66.
Selle, A. et al. 2007. An unconditionally stable MacCormack method. Available from: http://tinyurl.com/nm4novl.