Parallel Simulation of Urban Dynamics on the GPU. Ivan Blečić, Arnaldo Cecchini, Giuseppe A. Trunfio. Department of Architecture, Planning and Design, University of Sassari, Alghero.
1. Seventh International Conference on
Geographical Analysis, Urban Modeling, Spatial Statistics, GEOG-AN-MOD 2012
Parallel Simulation of
Urban Dynamics on the GPU
Ivan Blečić, Arnaldo Cecchini and Giuseppe A. Trunfio
Department of Architecture, Planning and Design
University of Sassari
2. Introduction
• A number of geosimulation models have been developed to better
understand and predict urban growth, land-use and landscape
changes.
• Some trends can be recognized from the literature:
– increasing size of the areas under study, which often goes beyond the traditional scale of a city, covering wider regional and national territory or even an entire continent;
– such models tend to be more and more sophisticated, also because they can take advantage of the increased availability of high-resolution remote sensing data;
– automatic and computationally expensive calibration processes are often required, involving large search spaces and many parameters.
• As a result, real world applications of such models often require
long computing times.
3. Introduction
• Geosimulation models are often computationally
intensive;
• Despite this, few studies in the literature apply parallel computing to geosimulation models
– (e.g. the recent work by Guan and Clarke, where a general-purpose parallel library was developed and applied to speed up the well-known CA model SLEUTH);
• We apply GPGPU (General-Purpose computing on Graphics Processing Units) to a widely used CA approach for land-use simulation based on the concept of transition potentials.
4. GPGPU
• GPGPU (General-Purpose computing on Graphics Processing Units): using Graphics Processing Units for general-purpose computation
• Why compute on Graphics Processing Units?
• the computational power of devices enabling GPGPU has exceeded that of standard CPUs by more than one order of magnitude;
• the price of a typical high-end GPU is comparable to that of a standard CPU;
(Chart: peak computational power of CPUs vs. GPUs over time)
5. GPGPU
• Why compute on Graphics Processing Units?
• There has been a rapid increase in the programmability of GPU devices, which has facilitated the porting of many scientific applications, leading to significant parallel speedups
• Main alternatives from the programming point of view:
• nVidia CUDA: C-language Compute Unified
Device Architecture is a popular programming
model introduced in 2006 by nVidia Corporation
for their GPUs
• OpenCL: an open standard maintained by the Khronos Group with the backing of major graphics hardware vendors as well as large computer industry vendors
6. GPUs
• Modern GPUs are multiprocessors with a highly
efficient hardware-coded multi-threading support.
• The key capability of a GPU unit is thus to execute
thousands of threads running the same function
concurrently on different data.
• Hence, the computational power provided by such an architecture can be fully exploited through a fine-grained data-parallel approach when the same computation can be independently carried out on different elements of a dataset.
7. GPUs
• We use the GPGPU platform provided by nVidia
– it consists of a group of Streaming Multiprocessors (SMs);
– each SM can support a number of co-resident concurrent threads;
– each SM consists of multiple Scalar Processor (SP) cores.
(Diagram: a Streaming Multiprocessor with its Scalar Processor cores)
8. CUDA
C-language Compute Unified Device Architecture
• In a typical CUDA program, sequential host instructions are
combined with parallel GPU code.
• In CUDA, the GPU activation is obtained by writing device functions
in C language, which are called kernels:
– when a kernel is invoked by the CPU, a number of threads (typically several thousand) execute the kernel code in parallel on different data;
– the threads executing a kernel are organized in blocks, which are in turn arranged in a grid
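The thread/block indexing scheme above can be illustrated with a small sequential sketch (not the authors' code; the function names are hypothetical). In real CUDA the kernel body runs once per thread, in parallel on the GPU; here two loops emulate the grid of blocks and the threads per block:

```python
# Illustrative sketch: how a CUDA kernel launch maps thread indices
# to data elements. The two loops in `launch` stand in for the
# parallel execution of one thread per (block, thread) pair.

def scale_kernel(block_idx, thread_idx, block_dim, data, factor):
    """Body executed by one thread: compute a global index,
    then update one element of the dataset."""
    i = block_idx * block_dim + thread_idx   # CUDA: blockIdx.x * blockDim.x + threadIdx.x
    if i < len(data):                        # guard against out-of-range threads
        data[i] *= factor

def launch(grid_dim, block_dim, data, factor):
    """Sequential emulation of a <<<grid_dim, block_dim>>> launch."""
    for b in range(grid_dim):
        for t in range(block_dim):
            scale_kernel(b, t, block_dim, data, factor)

cells = [1.0, 2.0, 3.0, 4.0, 5.0]
launch(grid_dim=2, block_dim=3, data=cells, factor=10.0)  # 6 threads, 5 elements
print(cells)  # [10.0, 20.0, 30.0, 40.0, 50.0]
```

The out-of-range guard mirrors the usual CUDA idiom, since the launch configuration may create more threads than data elements.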
9. CUDA
• The GPU can access
different types of memory.
• The device global memory
can deliver significantly
(e.g. one order of
magnitude) higher memory
bandwidth than the main
computer memory;
• Unfortunately, the GPU card, and thus its global memory, is linked to the host through a relatively slow bus (e.g. PCI Express), which makes CPU-GPU transfers expensive
10. Two GPGPU accelerated models for Simulating
Land-Use Dynamics
• Two versions of a typical Cellular Automata (CA) model for
land use dynamics have been parallelized for the GPU:
– a constrained cellular automata model (CCA);
– and the corresponding unconstrained version (UCA).
• Both models are based on the well known concept of
transition potential:
– in the CCA the aggregate level of demand for every land use is
fixed by an exogenous constraint at each time step;
– in the UCA the number of cells that are in a certain state at each time step depends only on the internal model parameters and model structure;
11. CCA and UCA simulation of land use change
(Diagram: the land use at time t, combined with planning regulation, accessibility, suitability, etc. and the neighbourhood effect (interactions between urban functions), determines the transition potentials; together with the land-use requests in the area, these yield the land use at time t+1)
12. CCA and UCA simulation of land use change
• Step 1 for both UCA and CCA:
– transition potential computation (on a local basis);
• Step 2 for UCA (on a local basis):
– of all the possible land uses, a cell is transformed into
the one having the highest transition potential;
• Step 2 for CCA (on a non-local basis):
– transforming each cell into the state with the highest
potential, given the constraint of the overall number
of cells in each state imposed by the exogenous trend
for that step;
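Step 2 of the UCA can be sketched in a few lines (the land-use labels and potential values below are hypothetical, not from the paper): each cell independently takes the land use for which it has the highest transition potential, which is exactly the per-cell independence that makes the rule GPU-friendly.

```python
# Illustrative sketch of the local UCA rule: every cell is transformed
# into the land use with the highest transition potential. The rule is
# purely local, so each cell could be handled by one GPU thread.

def uca_step(potentials):
    """potentials[c] maps land-use label -> transition potential of cell c.
    Returns the new land use of every cell."""
    return [max(p, key=p.get) for p in potentials]

potentials = [
    {"residential": 0.8, "industrial": 0.3, "green": 0.1},
    {"residential": 0.2, "industrial": 0.7, "green": 0.4},
]
print(uca_step(potentials))  # ['residential', 'industrial']
```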
13. GPGPU Parallelization with CUDA:
design choices
• One or more CUDA threads, executing the computational kernels, are assigned to each cell of the automaton;
– to define the kernels a key step consists of identifying all the sets of
instructions that can be executed independently of each other on
the different cells of the automaton;
• Most of the automaton data is stored in the GPU
global memory. This involves:
– CPU-GPU memory copy operation before the beginning of the
simulation and GPU-CPU memory copy at the end of the simulation;
– at the end of each CA step a device-to-device memory copy operation
is used to re-initialise the current values of the CA state with the next
values.
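The memory-management scheme above can be illustrated with a sequential sketch of the double-buffered main loop (a toy update rule, not the authors' model): `current` plays the role of the CA state in GPU global memory, `nxt` holds the next-step values, and copying `nxt` back into `current` corresponds to the device-to-device copy at the end of each step.

```python
# Illustrative sketch of the double-buffered simulation loop. Host <->
# device transfers would happen only before and after the whole loop;
# within the loop, only a device-to-device copy is needed per step.

def run_simulation(initial_state, step_fn, n_steps):
    current = list(initial_state)      # "CPU -> GPU copy" before the run
    nxt = [None] * len(current)
    for _ in range(n_steps):
        for i in range(len(current)):  # conceptually one thread per cell
            nxt[i] = step_fn(current, i)
        current[:] = nxt               # "device-to-device" copy: next -> current
    return current                     # "GPU -> CPU copy" after the run

# Toy rule: each cell becomes the sum of itself and its right neighbour.
rule = lambda state, i: state[i] + state[(i + 1) % len(state)]
print(run_simulation([1, 0, 0, 0], rule, n_steps=2))  # [1, 0, 1, 2]
```

Reading from `current` while writing to `nxt` is what keeps the per-cell updates independent within a step.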
14. GPGPU Parallelization of UCA
• In the UCA model, the computation performed at
each step by each cell consists of two phases:
1. the computation of the transition potentials and
2. the assignment of a new land use;
• Since both can be carried out independently
for each cell, they were included in a single
kernel, thus avoiding the overhead related to
invocation of an additional kernel.
15. GPGPU Parallelization of CCA
• Also in the CCA, each cell computes its transition potential independently;
• However, the downward scanning of the list of cells, ranked by decreasing potential (lines 4-5), must be carried out in list order, one cell at a time (it is inherently sequential);
• As a land-use demand is satisfied, a new ranking of cells must be performed before any further cell transition.
• The constraints on the total number of cells represent a strong condition of dependency between the cells.
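The sequential constrained allocation can be sketched as follows (a simplified version: cells are ranked once by their best potential, without the re-ranking step, and total demand is assumed to cover all cells; the land-use labels are hypothetical):

```python
# Illustrative sketch of the inherently sequential CCA allocation:
# cells are scanned in order of decreasing best potential; each cell
# takes its best land use among those whose exogenous demand is not
# yet exhausted, decrementing that demand.

def constrained_allocation(potentials, demand):
    """potentials[c]: land-use label -> potential of cell c;
    demand: land-use label -> number of cells allowed in that state.
    Assumes the total demand is at least the number of cells."""
    demand = dict(demand)              # do not mutate the caller's dict
    result = [None] * len(potentials)
    order = sorted(range(len(potentials)),
                   key=lambda c: max(potentials[c].values()), reverse=True)
    for c in order:                    # strictly one cell at a time
        available = {u: p for u, p in potentials[c].items() if demand.get(u, 0) > 0}
        use = max(available, key=available.get)
        result[c] = use
        demand[use] -= 1
    return result

potentials = [
    {"residential": 0.9, "industrial": 0.5},
    {"residential": 0.8, "industrial": 0.2},
    {"residential": 0.7, "industrial": 0.6},
]
# Only one cell may become residential at this step:
print(constrained_allocation(potentials, {"residential": 1, "industrial": 2}))
# ['residential', 'industrial', 'industrial']
```

Note how the assignment of the first cell changes what the remaining cells may become; this is the cell-to-cell dependency that blocks a naive parallelization.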
16. GPGPU Parallelization of CCA
• A different constrained allocation procedure has been devised,
which is able to better exploit the GPU while maintaining the
essential characteristics of the original constrained approach.
• The proposed parallel
constrained allocation
tries to process in
parallel blocks of cells
that have their highest
potential for the same
land use;
• More details of the
algorithms in the paper
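The idea behind the parallel variant can be roughly sketched as follows (a crude one-round illustration under assumed data, not the authors' algorithm, whose details are in the paper): cells are first grouped by the land use for which they have their highest potential, and within each group the best-ranked cells up to the remaining demand can be transformed independently, i.e. in parallel on the GPU.

```python
# Illustrative sketch: process in parallel blocks of cells that have
# their highest potential for the same land use. One round is shown;
# leftover cells would be reconsidered on the remaining land uses.

def parallel_constrained_allocation(potentials, demand):
    demand = dict(demand)
    groups = {}
    for c, p in enumerate(potentials):       # per-cell argmax: independent, GPU-friendly
        groups.setdefault(max(p, key=p.get), []).append(c)
    result = [None] * len(potentials)
    leftover = []
    for use, cells in groups.items():
        cells.sort(key=lambda c: potentials[c][use], reverse=True)
        quota = demand.get(use, 0)
        for c in cells[:quota]:              # this whole block can be written in parallel
            result[c] = use
        demand[use] = max(0, quota - len(cells))
        leftover += cells[quota:]            # unsatisfied cells need another round
    return result, leftover

potentials = [
    {"residential": 0.9, "industrial": 0.5},
    {"residential": 0.8, "industrial": 0.2},
    {"residential": 0.7, "industrial": 0.6},
]
assigned, leftover = parallel_constrained_allocation(potentials, {"residential": 1, "industrial": 2})
print(assigned, leftover)  # ['residential', None, None] [1, 2]
```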
17. Computational results: hardware
• The sequential UCA and CCA reference versions were run on a desktop computer equipped with a 2.66 GHz Intel Core 2 Quad CPU;
• The parallel versions were run on the following GPUs:
18. Computational results: test cases
• Two different datasets:
– the first concerns the area of the city of Florence and is composed of 242 × 151 cells, each 100 m in size;
– the second represents the urban area of Athens and is composed of 321 × 391 cells, each 100 m in size;
– 30 simulation steps (i.e. 30 years of future land-use projection);
– for the CCA, a constant 3% increment, relative to the initial number of cells, was adopted as the constraint for each active land use.
• In both the CCA and UCA, the effort involved in the
computation of transition potentials is almost proportional to
the number of neighbouring cells.
– for this reason, three different neighbourhood radii were considered, namely r = 10 cells, r = 15 cells and r = 20 cells.
21. Computational results:
conclusions
• The gain in terms of computing time is substantial.
• As expected, the speedup of the UCA model was always superior to that achieved with the CCA model.
• Improvements are still possible, since not all typical GPGPU optimization strategies have been implemented and more powerful GPUs are available;
• The main advantage lies in enabling an accurate calibration, which may otherwise not be feasible for models operating at regional or continental scale.