
A parallel computational framework for the simulation of variably saturated flow based on the Cellular Automata concept using CUDA architecture

A simple and efficient computational framework is presented for the simulation of variably saturated flow in porous media. The modeling approach is based on the Cellular Automata (CA) concept: the computational domain is discretized with a regular grid and simple local rules govern the evolution of the physical phenomena. The inherent simplicity of the CA concept and its natural parallelism make the parallel implementation of the algorithms very efficient, especially for the simulation of large scale phenomena. This is an important feature, because it allows the CA computational framework to be incorporated into a more general catchment scale distributed hydrological model for the detailed simulation of the soil water balance, or into other models that simulate the dynamics of water pressure heads and soil saturation, such as models of rainfall triggered landslides or of solute and contaminant transport in agricultural soils. The CUDA architecture is used to take advantage of the computational capabilities of modern GPUs. Particular attention was given to the use of the different available memory types: constant and texture memory are used extensively to accelerate memory accesses, while shared memory is used to exploit the locality of thread computations and to optimize the memory accesses of each block. The presented model was applied to various test cases and showed good agreement with published results, as well as scalability with increasing thread and block size.
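As an illustration of the memory strategy mentioned above (simulation constants in constant memory, per-soil-class properties read through the texture path), the following sketch shows roughly what such a setup can look like in CUDA C. It is not the authors' code: the structure and names (SimConstants, ksatTex, soilClass) are assumptions made for this example, and it uses the texture-object API of current CUDA toolkits rather than the older texture references.

// Illustrative sketch, not the authors' code: scalar simulation constants are kept
// in constant memory, and a hypothetical per-soil-class parameter table (saturated
// conductivity) is read through a texture object to benefit from the texture cache.

#include <cuda_runtime.h>

struct SimConstants {              // hypothetical collection of simulation constants
    float dt;                      // time step
    float dx, dy, dz;              // cell sizes of the regular grid
};

__constant__ SimConstants c_sim;   // cached and broadcast to all threads

__global__ void exampleKernel(cudaTextureObject_t ksatTex,
                              const int *soilClass, float *out, int nCells)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nCells) return;
    // fetch the saturated conductivity of this cell's soil class via the texture cache
    float ksat = tex1Dfetch<float>(ksatTex, soilClass[i]);
    out[i] = ksat * c_sim.dt;      // placeholder computation using a constant
}

// Host-side setup (sketch): copy the constants and wrap a linear device array of
// per-class parameters in a texture object.
void setup(const SimConstants &h_sim, float *d_ksat, int nClasses,
           cudaTextureObject_t *ksatTex)
{
    cudaMemcpyToSymbol(c_sim, &h_sim, sizeof(SimConstants));

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_ksat;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = nClasses * sizeof(float);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;

    cudaCreateTextureObject(ksatTex, &res, &td, nullptr);
}

The use of shared memory, the third memory type mentioned, is illustrated further below together with the convergence check of the parallelization strategy.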


A parallel computational framework for the simulation of variably saturated flow based on the Cellular Automata concept using CUDA architecture
Paolo Burlando (1), Grigorios G. Anagnostopoulos (1) and Adamos Kyriakou (2)
(1) Institute of Environmental Engineering, (2) Computer Vision Lab, ETH Zurich, Switzerland
Abstract No: IN13A-1317. Correspondence: anagnostopoulos@ifu.baug.ethz.ch

1. Introduction

A simple and efficient computational framework is presented for the simulation of variably saturated flow in porous media. In this modeling approach the Cellular Automata (CA) concept is implemented.
• The inherent simplicity of the CA concept and its natural parallelism make its implementation within the CUDA framework easy.
• It is efficient for the simulation of large scale phenomena.

2. Computational algorithm

According to the macroscopic CA notion, the computational domain consists of a two- or three-dimensional lattice, composed of rectangular or prismatic cells respectively. Every cell of the lattice communicates with its neighbors only through its faces.

[Figure: the neighborhood of a cell at (0,0,0) consists of its six face-adjacent cells, at offsets (1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1) and (0,0,-1).]

Coupling the discrete formulation of the mass balance of an arbitrary cell c with the Darcy-Buckingham law, the hydraulic head at time t + \Delta t is computed as

h_c^{t+\Delta t} = \frac{\dfrac{V_c\, C(\psi_c)}{\Delta t}\, h_c^{t} + \displaystyle\sum_{\alpha \in I} \frac{K_{\alpha c} A_{\alpha c}}{l_{\alpha c}}\, h_{\alpha}^{t} + \displaystyle\sum_{\alpha \in \mathrm{bound}} Q_{\alpha} + S_c}{\dfrac{V_c\, C(\psi_c)}{\Delta t} + \displaystyle\sum_{\alpha \in I} \frac{K_{\alpha c} A_{\alpha c}}{l_{\alpha c}}}

where I is the set of neighbors of cell c, K_{\alpha c} and A_{\alpha c} are the hydraulic conductivity and the area of the interface between cell c and neighbor \alpha, l_{\alpha c} is the distance between the two cell centers, V_c is the cell volume, C(\psi_c) is the specific moisture capacity, Q_{\alpha} are the fluxes across boundary faces and S_c is the source/sink term.

The above equation is applied in all the cells of the lattice except those with a Dirichlet boundary condition, whose hydraulic head is fixed throughout the simulation.
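Each cell is updated independently from the values at the previous level, so the rule above maps naturally onto one CUDA thread per cell. The following kernel is a minimal illustrative sketch of this mapping, written for this summary and not taken from the authors' code; the array names, the flattened neighbor-index layout and the conventions for boundary and Dirichlet cells are assumptions.

// Minimal sketch, not the authors' code: one thread updates one cell using the
// explicit head-update rule of Section 2. Assumed layout: cell values in 1D arrays;
// for each cell the indices of its (up to) six face neighbors are stored
// contiguously, with a negative index marking a missing neighbor at the boundary.

#define MAX_NEIGHBORS 6

__global__ void updateHeads(const float *h_old, float *h_new,
                            const int   *nbrIdx,     // nCells * MAX_NEIGHBORS neighbor indices
                            const float *K,          // interface conductivity K_ac (per cell face)
                            const float *A,          // interface area A_ac (per cell face)
                            const float *l,          // center-to-center distance l_ac (per cell face)
                            const float *V,          // cell volume V_c
                            const float *C,          // specific moisture capacity C(psi_c)
                            const float *Qbound,     // boundary flux into the cell (0 for interior cells)
                            const float *S,          // source/sink term S_c
                            const int   *dirichlet,  // 1 if the head of the cell is fixed
                            float dt, int nCells)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= nCells) return;

    if (dirichlet[c]) {                  // Dirichlet cells keep their prescribed head
        h_new[c] = h_old[c];
        return;
    }

    float storage = V[c] * C[c] / dt;    // V_c C(psi_c) / dt
    float num = storage * h_old[c] + Qbound[c] + S[c];
    float den = storage;

    for (int k = 0; k < MAX_NEIGHBORS; ++k) {
        int a = nbrIdx[c * MAX_NEIGHBORS + k];
        if (a < 0) continue;             // no neighbor across this face
        int f = c * MAX_NEIGHBORS + k;   // index of this cell face
        float w = K[f] * A[f] / l[f];    // K_ac A_ac / l_ac
        num += w * h_old[a];
        den += w;
    }
    h_new[c] = num / den;
}

Because K and C(psi) depend on the pressure head, a kernel like this would typically be launched repeatedly until the heads stop changing; the convergence test used for that purpose is outlined in Section 5.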
3. Verification of the algorithm

The presented algorithm was tested against known benchmark cases from the literature in order to evaluate its performance. These include experimental data, analytical solutions and numerical experiments (Anagnostopoulos and Burlando, 2011).

An example is the infiltration experiment of Vauclin et al. (1979), which is used to evaluate the ability of the model to simulate the transient position of the water table in a laboratory scale soil box.

[Figure: simulated water table position against the experimental data of Vauclin et al. (1979) at t = 2, 3, 4 and 8 hrs; water depth (m) versus distance (m).]

4. CUDA Architecture

CUDA is a general purpose parallel computing architecture that leverages the parallel compute engine in NVIDIA GPUs to solve many complex computational problems more efficiently than on a CPU. CUDA comes with a software environment that allows developers to use C as a high-level programming language.

The device is organized as a set of SIMT multiprocessors with on-chip shared memory.

[Figure: CUDA hardware model, after Figure 4-2 of NVIDIA (2010): a device with N multiprocessors, each with its own shared memory, registers, instruction unit, constant cache and texture cache, all accessing the device memory.]

5. Implementation and performance

The most challenging issue is that the domain can have an irregular geometry, which makes it harder to exploit locality in the thread computations and to use the shared memory.

Parallelization strategy:
• The cell values are stored in a 1D array and, for each cell, the indices of its neighboring cells are also stored. Both of these arrays reside in the global memory.
• Simulation constants are stored in the constant memory.
• Soil properties for each soil class are stored in the texture memory.
• Atomic operations are used in order to check for convergence at every iteration.
• The shared memory is used to accelerate the atomic operations and the memory accesses of each block (a minimal sketch of this convergence check is given after the References).

For the runs we used an NVIDIA Quadro 2000 graphics card with 192 CUDA cores, installed in a PC with an Intel Xeon processor at 2.93 GHz. The benchmark case of Vauclin et al. (1979) was used for assessing the performance of the code for grid dimensions of increasing size (scale effect).

[Figure: computational speed (cells/sec) of the CPU and GPU versions and the GPU speed-up factor, plotted against the number of cells (10^3 to 10^7).]

Results and conclusions:
• The speed-up factor increases with the grid dimension: as the domain size increases, more computational resources of the GPU are exploited.
• Our framework is very attractive for basin scale simulations (e.g. in natural hazards assessment), where the grid sizes can become excessively large.

References

[1] G.G. Anagnostopoulos, P. Burlando (2011). Object-oriented computational framework for the simulation of variably saturated flow, using a reduced complexity model. Submitted to Environmental Modelling & Software.
[2] M. Vauclin, D. Khanji, G. Vachaud (1979). Experimental and numerical study of a transient, two-dimensional unsaturated-saturated water table recharge problem. Water Resources Research, Vol. 15.
[3] NVIDIA (2010). CUDA Programming Guide, Version 3.0. Available: http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_ProgrammingGuide.pdf
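To make the last two items of the parallelization strategy in Section 5 concrete (atomic operations for the convergence check, with shared memory used to reduce the number of atomics), here is a hypothetical sketch rather than the authors' implementation; the kernel and variable names, the tolerance test and the host loop (which reuses the updateHeads kernel sketched after Section 2) are all assumptions.

// Hypothetical sketch of the convergence check from Section 5 (not the authors'
// implementation): each thread flags whether its cell changed by more than a
// tolerance, the flags are combined per block in shared memory, and a single
// atomic operation per block updates a global "not converged" flag.

__global__ void checkConvergence(const float *h_old, const float *h_new,
                                 int nCells, float tol, int *d_notConverged)
{
    extern __shared__ int s_flag[];      // one flag per thread of the block

    int i   = blockIdx.x * blockDim.x + threadIdx.x;
    int tid = threadIdx.x;

    s_flag[tid] = (i < nCells && fabsf(h_new[i] - h_old[i]) > tol) ? 1 : 0;
    __syncthreads();

    // Tree reduction in shared memory (assumes blockDim.x is a power of two):
    // did any cell handled by this block change significantly?
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s_flag[tid] |= s_flag[tid + stride];
        __syncthreads();
    }

    // Only one global atomic per block, instead of one per thread.
    if (tid == 0 && s_flag[0]) atomicOr(d_notConverged, 1);
}

// Host-side loop (sketch): reset the flag, run one update and one check per
// iteration, and stop when no block reports a change.
//
//   int changed = 1;
//   while (changed) {
//       cudaMemset(d_notConverged, 0, sizeof(int));
//       updateHeads<<<blocks, threadsPerBlock>>>(/* ... */);
//       checkConvergence<<<blocks, threadsPerBlock, threadsPerBlock * sizeof(int)>>>(
//           d_h_old, d_h_new, nCells, tol, d_notConverged);
//       cudaMemcpy(&changed, d_notConverged, sizeof(int), cudaMemcpyDeviceToHost);
//       // swap d_h_old and d_h_new before the next iteration
//   }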
