Parallelization techniques and hardware for 2D modelling
Mark Britton, DHI
FMA Conference
Santa Clara, CA, 09/03/14
Acknowledgements 
• DHI Denmark (Johan Hartnack & Ole Sorensen)
• DHI New Zealand (Colin Roberts & Greg Whyte)
• Various HPC providers that have allowed DHI to freely install and test software on their facilities
Objectives 
• Simplify the language of hardware and of programming for specific hardware
• Share where we (DHI) are at, and where we are going
• Demonstrate what is possible in 2D modelling
[Slide graphic: keywords Cluster, CUDA, Shared Memory]
MIKE 21 – Different numerical solutions
• Single Grid (and nested)
• Curvilinear (river morphology)
• Flexible Mesh (triangles & quads)
Model set-up used for benchmarking (Mediterranean Sea)
• Flexible Mesh (Finite Volume) explicit code optimized for parallelization and distributed simulation
Parallelization – Shared memory approach
• The calculations are carried out on multiple processors on the same PC, all accessing the same memory (Open Multi-Processing, or OpenMP). A minimal sketch of this pattern follows below.
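To make the idea concrete, here is a minimal C sketch of the shared-memory approach: one explicit update step over a 2D array, with OpenMP splitting the loop across threads that all read and write the same memory. The grid size, variable names and the simple averaging stencil are illustrative assumptions, not DHI's actual scheme.

```c
/* Shared-memory (OpenMP) sketch: one explicit update over a 2D grid.
 * Compile with: cc -fopenmp openmp_sketch.c -o openmp_sketch */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define NX 1024
#define NY 1024

int main(void)
{
    double *h     = malloc(NX * NY * sizeof *h);      /* water depth        */
    double *h_new = malloc(NX * NY * sizeof *h_new);  /* next time step     */
    for (int i = 0; i < NX * NY; i++) h[i] = 1.0;     /* flat initial state */

    /* All threads share the same arrays; OpenMP splits the row loop. */
    #pragma omp parallel for
    for (int j = 1; j < NY - 1; j++) {
        for (int i = 1; i < NX - 1; i++) {
            /* Placeholder stencil: average of the four neighbours. */
            h_new[j * NX + i] = 0.25 * (h[j * NX + i - 1] + h[j * NX + i + 1] +
                                        h[(j - 1) * NX + i] + h[(j + 1) * NX + i]);
        }
    }

    printf("updated one step using up to %d threads\n", omp_get_max_threads());
    free(h);
    free(h_new);
    return 0;
}
```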
Parallelization – Shared memory approach
[Chart: speed-up factor vs. number of processors, including and excluding side-feeding, for three meshes of 80,968, 323,029 and 1,292,116 elements]
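For reference, the speed-up factor plotted here and in the later charts is the usual ratio of serial to parallel run time; this is the standard definition, which the slides assume rather than state:

```latex
% Speed-up on p processors, and the corresponding parallel efficiency
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
```

Here \(T_1\) is the run time on one processor and \(T_p\) the run time on \(p\) processors; perfect scaling corresponds to \(S(p) = p\).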
Parallelization – Distributed memory approach
• The calculations are carried out on multiple processors, each with its own memory space, and the required information is passed between the processors at regular intervals (Message Passing Interface, or MPI).
• Basic concept (a minimal sketch follows this list):
  - Domain decomposition into physical sub-domains
  - Each processor integrates the equations in its assigned sub-domain
  - Data exchange between sub-domains is based on the halo layer/elements concept
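Here is a minimal C sketch of the distributed-memory pattern: a simple 1D decomposition of a row-ordered grid, with a one-row halo on each side exchanged between neighbouring ranks. The decomposition, sizes and names are illustrative assumptions; DHI's actual decomposition works on unstructured flexible-mesh sub-domains.

```c
/* Distributed-memory (MPI) sketch: halo-row exchange between sub-domains.
 * Compile with: mpicc mpi_sketch.c -o mpi_sketch && mpirun -np 4 ./mpi_sketch */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NX 512  /* cells per row */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = 256 / size;  /* rows owned by this rank (assumes even split) */
    /* Local block plus one halo row above and one below. */
    double *h = calloc((size_t)(rows + 2) * NX, sizeof *h);

    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Exchange halo rows with the neighbouring sub-domains each step. */
    MPI_Sendrecv(&h[1 * NX],          NX, MPI_DOUBLE, up,   0,
                 &h[(rows + 1) * NX], NX, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&h[rows * NX],       NX, MPI_DOUBLE, down, 1,
                 &h[0],               NX, MPI_DOUBLE, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* ...integrate the equations on rows 1..rows using the halo data... */

    if (rank == 0) printf("halo exchange done on %d ranks\n", size);
    free(h);
    MPI_Finalize();
    return 0;
}
```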
Parallelization – Distributed memory approach
• High performance computing (HPC) has been one of the fastest-growing IT markets within the last five years
• Operating systems on HPC systems (June 2013): Linux 95.2%, Unix 3.2%, Mixed 0.8%, MS Windows 0.6%, BSD based 0.2%
[Pie chart: HPC operating-system share (Linux, Unix, Mixed, Windows, BSD, Mac)]
Parallelization – Distributed memory approach
[Chart: speed-up factor vs. number of processors on a High Performance Computing cluster, for three meshes of 80,968, 323,029 and 1,292,116 elements]
Parallelization – Utilizing GPU technology
• GeForce GTX TITAN GPU card: a middle-of-the-range gaming card that retails for approximately USD $1,000
Parallelization – Utilizing GPU technology
• The key calculations (2D) are carried out on the graphics processor.
• MIKE 21 FM and MIKE FLOOD FM are both GPU-enabled (same code); more products to come.
• Uniquely, for a coupled simulation (1D/2D) in MIKE FLOOD, the 1D calculations (structures/channels) are undertaken on the CPU.
• It is not possible to scale the degree of parallelization on a GPU (all cores are active all the time); scale using the resolution of the mesh instead.
• DHI software is optimized for CUDA technology, used in many GPU cards from the NVIDIA range; a minimal kernel sketch follows this list.
• DHI software can be run in both single and double precision.
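The sketch below shows the GPU pattern in CUDA C: the explicit update runs as a kernel with one thread per mesh element, which is why parallelism scales with mesh resolution rather than with a tunable core count. The flat element array and the trivial update body are illustrative assumptions only, not DHI's kernel.

```c
/* GPU (CUDA C) sketch: one update step, one thread per mesh element.
 * Compile with: nvcc cuda_sketch.cu -o cuda_sketch */
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void update(double *h_new, const double *h, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        h_new[i] = h[i];  /* placeholder for the real flux/update step */
}

int main(void)
{
    const int n = 1 << 20;  /* ~1 million elements */
    double *h, *h_new;
    cudaMalloc(&h,     n * sizeof(double));
    cudaMalloc(&h_new, n * sizeof(double));
    cudaMemset(h, 0, n * sizeof(double));

    /* The launch grid covers every element at once: all cores stay busy,
     * and the only way to "scale" is a finer (larger) mesh. */
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    update<<<blocks, threads>>>(h_new, h, n);
    cudaDeviceSynchronize();

    printf("one GPU update step over %d elements\n", n);
    cudaFree(h);
    cudaFree(h_new);
    return 0;
}
```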
Parallelization – Utilizing GPU technology
[Chart: GPU speed-up in double precision, comparing 1st-order and 2nd-order schemes]
Hybrid Parallelization – A new frontier
• Combines GPU technology with MPI technology (a cluster of GPUs); a minimal hybrid sketch follows the cluster description below
IT4Innovations' Anselm Cluster at Ostrava University (Czech Republic):
• 3344 compute cores
• each node has 2 x Intel E5-2665 2.4 GHz (16 cores per node)
• 23 GPU-accelerated nodes
• 15 TB RAM
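To make the hybrid idea concrete, here is a minimal sketch of the pattern, combining the two earlier examples: each MPI rank binds to one GPU, runs the update kernel on its own sub-domain, and exchanges halo rows through MPI between steps. This is an illustrative sketch under the same assumed names and sizes as before, not DHI's implementation.

```c
/* Hybrid MPI + GPU sketch: one MPI rank per GPU, halo exchange via MPI.
 * Compile with: nvcc hybrid_sketch.cu -lmpi -o hybrid_sketch (paths vary) */
#include <mpi.h>
#include <stdio.h>
#include <cuda_runtime.h>

#define NX 512   /* cells per row */
#define ROWS 128 /* rows per rank */

__global__ void step(double *h, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) h[i] += 0.0;  /* placeholder update */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);
    if (ngpus > 0) cudaSetDevice(rank % ngpus);  /* bind this rank to one GPU */

    int n = (ROWS + 2) * NX;  /* local block plus two halo rows */
    double *d_h;
    cudaMalloc(&d_h, n * sizeof(double));
    cudaMemset(d_h, 0, n * sizeof(double));

    double halo_out[NX], halo_in[NX];
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* One time step: compute on the GPU, then exchange one halo row. */
    step<<<(n + 255) / 256, 256>>>(d_h, n);
    cudaMemcpy(halo_out, d_h + 1 * NX, NX * sizeof(double),
               cudaMemcpyDeviceToHost);  /* first interior row    */
    MPI_Sendrecv(halo_out, NX, MPI_DOUBLE, up,   0,
                 halo_in,  NX, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    cudaMemcpy(d_h + (ROWS + 1) * NX, halo_in, NX * sizeof(double),
               cudaMemcpyHostToDevice);  /* fill the lower halo row */

    if (rank == 0) printf("hybrid step done on %d rank(s)\n", size);
    cudaFree(d_h);
    MPI_Finalize();
    return 0;
}
```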
Hybrid Parallelization – A new frontier
[Chart: speed-up factor vs. number of GPUs, Mediterranean Sea model in double precision, for three meshes of 323,029, 1,292,116 and 5,156,238 elements]
Hybrid Parallelization – A new frontier
• A sample flood model used for benchmarking (not all elements are wet, so parallelization is less efficient)
• Mesh: 995,019 elements
Hybrid Parallelization – A new frontier
• Benchmarking using a flood model (not all elements are wet)
[Chart: MPI speed-up vs. number of nodes (16 cores per node), for meshes of 80k (all wet), 0.3 million (all wet), 1.3 million (all wet) and 1 million (not all wet) elements]
Hybrid Parallelization – A new frontier
• Benchmarking using a flood model (not all elements are wet)
[Chart: GPU speed-up vs. number of GPU nodes (1, 2, 4, 8, 16), for meshes of 0.3 million (all wet), 1.3 million (all wet), 5.2 million (all wet) and 1 million (not all wet) elements]
Hybrid Parallelization – A new frontier
• Benchmarking using a flood model (not all elements are wet)
GPU vs MPI:
• 1 GPU is about 5x faster than 16 cores
• 4 GPUs are about 4x faster than 64 cores
• 16 GPUs are nearly 3x faster than 256 cores
• 4 GPUs are faster than 256 cores
[Chart: run-time comparison, MPI (16, 32, 64, 128, 256 cores) vs. GPU (1, 2, 4, 8, 16 GPUs)]
Hybrid Parallelization – A case study
• Christchurch, New Zealand
Catchment area approx. 420 km², including three river systems in the model domain:
  - Avon River
  - Styx River
  - Heathcote River
2D model domain:
  - 4.2 million elements
  - 10 m x 10 m resolution flexible mesh (rectangular elements)
  - Distributed rainfall-runoff with no losses (rain-on-grid); extreme rainfall event, 21-hour storm
Hybrid Parallelization – A case study
• Christchurch, New Zealand
Run time on a desktop PC (MPI) is 8.9 hours:
  - 16-core Dell workstation
  - 2 x Intel® Xeon® CPU E5-2687W v2 (8 core, 3.40 GHz)
  - 32 GB of RAM
  - Windows 7 operating system
Run time with 1 x GeForce GTX TITAN GPU card is 3.1 hours (roughly a 2.9x speed-up)
Run time with 2 x GeForce GTX TITAN GPU cards is 1.7 hours (roughly a 5.2x speed-up)
GPU Perspectives
• The mathematical formulation in the GPU version is identical to the CPU version
• Coupled models (1D/2D) are enabled in the GPU version, allowing structures to be modelled, not just 2D flow
• GPU performance is excellent but highly dependent on the card
• Optimal performance is achieved for models with more than 400,000 elements
• GPU cards are much cheaper than the equivalent CPU hardware in terms of performance (up to 50x cheaper)
Conclusions
• The use of advanced parallelization techniques is key to delivering timely, detailed, accurate and consistent hydrodynamic modelling results.
• Large, detailed 1D/2D hydrodynamic models can be used in real-time and near-real-time applications like flood forecasting and disaster risk management.
• DHI software is ready to take full advantage of the next wave of hardware solutions with the hybrid MPI/GPU approach.
The Modelling Conundrum…..
I am a numerical modeller, and my models take a very long time to run.
My company/department has just invested $$$ in getting me some really fast new computer hardware so I can be more efficient, more productive and/or more profitable.
Tomorrow I will be super excited because:
(a) all my current models run so much faster than today
or
(b) I can start building even bigger models with even finer resolution.
?/10 numerical modellers choose (b)
Thank you for your attention
Mark Britton
Global Corporate Relationship Manager
mfb@dhigroup.com
ISO Certified for Software Development & Support
