
Parallelization techniques and hardware for 2D modelling - Mark Britton (DHI)


Delivered at:

FMA Conference
3 September 2014 - Santa Clara CA



  1. Mark Britton, DHI. FMA Conference, Santa Clara CA, 09/03/14. © DHI #1
     Parallelization techniques and hardware for 2D modelling
  2. Acknowledgements
     • DHI Denmark (Johan Hartnack & Ole Sorensen)
     • DHI New Zealand (Colin Roberts & Greg Whyte)
     • Various HPC providers that have allowed DHI to freely install and test software on their facilities
  3. Objectives
     • Simplify the language of hardware, and of programming for specific hardware
     • Share where we (DHI) are at, and where we are going
     • Demonstrate what is possible in 2D modelling
     [Diagram: Cluster / CUDA / Shared Memory]
  4. MIKE 21 – Different numerical solutions
     • Single Grid (and nested)
     • Curvilinear (river morphology)
     • Flexible Mesh (triangles & quads)
  5. Model set-up used for benchmarking (Mediterranean Sea)
     • Flexible Mesh (Finite Volume) explicit code, optimized for parallelization and distributed simulation
  6. Parallelization – Shared memory approach
     • The calculations are carried out on multiple processors on the same PC, all accessing the same memory (Open Multi-Processing, or OpenMP).
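The shared-memory idea can be sketched in a few lines: one array of mesh elements in one memory space, with the element loop split across workers. This is a minimal Python analogue, not MIKE code (which uses OpenMP directives in compiled code); the update formula and all names here are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def update_chunk(depths, start, end, dt=0.1):
    """Toy explicit update applied to one slice of the shared element array."""
    return [d + dt * math.sin(d) for d in depths[start:end]]

def parallel_update(depths, n_workers=4):
    """Split the element loop across workers that all see the same memory."""
    n = len(depths)
    bounds = [(i * n // n_workers, (i + 1) * n // n_workers)
              for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda b: update_chunk(depths, b[0], b[1]), bounds)
    # Stitch the chunks back together in order
    return [v for part in parts for v in part]
```

This mirrors what an OpenMP `parallel for` does: the iteration range is divided among threads, and no data has to be copied or sent because every thread addresses the same arrays.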
  7. Parallelization – Shared memory approach
     [Chart: speed-up factor vs number of processors, incl. and excl. side-feeding, for meshes of 80,968, 323,029 and 1,292,116 elements]
  8. Parallelization – Distributed memory approach
     • The calculations are carried out on multiple processors, each with its own memory space, and the required information is passed between the processors at regular intervals (Message Passing Interface, or MPI).
  9. Parallelization – Distributed memory approach
     • Basic concept:
       - Domain decomposition into physical sub-domains
       - Each processor integrates the equations in its assigned sub-domain
       - Data exchange between sub-domains is based on the halo layer/elements concept
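The halo-layer concept can be sketched with a 1-D toy problem: a simple diffusion stencil stands in for the shallow-water update, and two NumPy sub-arrays stand in for two MPI ranks. All names and the stencil itself are illustrative, not DHI's formulation.

```python
import numpy as np

HALO = 1  # one ghost cell per internal boundary suffices for a 3-point stencil

def step(field):
    """Serial explicit stencil update (boundary values held fixed)."""
    new = field.copy()
    new[1:-1] = field[1:-1] + 0.25 * (field[:-2] - 2.0 * field[1:-1] + field[2:])
    return new

def split_with_halo(field):
    """Decompose into two sub-domains, each padded with a halo cell."""
    mid = field.size // 2
    left = field[:mid + HALO].copy()    # owns [0, mid), halo at the end
    right = field[mid - HALO:].copy()   # owns [mid, n), halo at the front
    return left, right

def exchange_halos(left, right):
    """What an MPI send/recv pair would do between neighbouring ranks."""
    left[-1] = right[HALO]        # right rank's first owned value
    right[0] = left[-HALO - 1]    # left rank's last owned value

# Two sub-domain steps with a halo exchange reproduce two serial steps exactly.
field = np.linspace(0.0, 1.0, 12)
serial = step(step(field))
left, right = split_with_halo(field)
left, right = step(left), step(right)
exchange_halos(left, right)           # refresh the ghost cells between steps
left, right = step(left), step(right)
parallel = np.concatenate([left[:-HALO], right[HALO:]])
print(np.allclose(parallel, serial))  # → True
```

The point the slide makes follows directly: each rank only ever touches its own sub-domain plus a thin halo, so only the halo values cross the network at each exchange.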
  10. Parallelization – Distributed memory approach
     • High performance computing (HPC) has been one of the fastest growing IT markets within the last five years
     • Operating system share of HPC systems:
       Date        Linux   Unix   Mixed   MS Windows   BSD based
       June 2013   95.2%   3.2%   0.8%    0.6%         0.2%
  11. Parallelization – Distributed memory approach (High Performance Computing)
     [Chart: speed-up factor vs number of processors for meshes of 80,968, 323,029 and 1,292,116 elements]
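The flattening of those speed-up curves for the smaller meshes is the classic Amdahl's-law pattern: any fixed non-parallel share (communication, serial code sections) caps the achievable speed-up, and the cap bites sooner when there is less work per processor. A quick sketch — the serial fractions below are invented for illustration, not measured from these benchmarks:

```python
def amdahl(p, serial_fraction):
    """Ideal speed-up on p processors when a fixed fraction of work stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# Smaller meshes behave like a larger serial/communication fraction
for label, f in [("small mesh (f=0.05) ", 0.05), ("large mesh (f=0.005)", 0.005)]:
    curve = [round(amdahl(p, f), 1) for p in (1, 2, 4, 8, 16, 32)]
    print(label, curve)
```

With f = 0.05 the speed-up can never exceed 20 no matter how many processors are added, which is why throwing more cores at a small model eventually buys nothing.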
  12. Parallelization – Utilizing GPU technology
     • GeForce GTX TITAN GPU card: a middle-of-the-range gaming card, retailing for approximately USD 1,000
  13. Parallelization – Utilizing GPU technology
     [Image: GPU card]
  14. Parallelization – Utilizing GPU technology
     • The key calculations (2D) are carried out on the graphics processors.
     • MIKE 21 FM and MIKE FLOOD FM are both GPU enabled (same code); more products to come.
     • Uniquely, for a coupled simulation (1D/2D) in MIKE FLOOD, the 1D calculations (structures/channels) are undertaken on the CPU.
     • It is not possible to scale the degree of parallelization on a GPU (all cores are active all the time); instead, scale using the resolution of the mesh.
     • DHI software is optimized for CUDA technology, used in many GPU cards from the NVIDIA range.
     • DHI software can be run in both single and double precision.
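The single- vs double-precision choice matters because an explicit code accumulates round-off over many time steps, and gaming cards are far faster in single precision, so the trade-off is real. A minimal illustration of the accumulation effect (a generic numerical example, not MIKE code):

```python
import numpy as np

# Repeatedly accumulate 0.1, which is not exactly representable in binary.
n = 100_000
acc32 = np.float32(0.0)
acc64 = np.float64(0.0)
for _ in range(n):
    acc32 += np.float32(0.1)
    acc64 += np.float64(0.1)

# The exact answer is 10000; single precision drifts visibly further from it.
print(abs(float(acc32) - 10000.0), abs(float(acc64) - 10000.0))
```

The drift per operation is tiny, but a flood simulation performs vastly more operations than this loop, which is why slide 15 reports double-precision results separately.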
  15. Parallelization – Utilizing GPU technology
     [Chart: double-precision results, 1st order vs 2nd order schemes]
  16. Hybrid Parallelization – A new frontier
     [Diagram: multiple GPUs combined in a cluster]
  17. Hybrid Parallelization – A new frontier
     • Combines GPU technology with the MPI technology (a cluster of GPUs)
     • IT4Innovations' Anselm Cluster at Ostrava University (Czech Republic):
       - 3344 compute nodes
       - each node has 2 x Intel E5-2665 2.4 GHz (16 cores)
       - 23 GPU accelerated nodes
       - 15 TB RAM
  18. Hybrid Parallelization – A new frontier
     [Chart: speed-up vs number of GPUs, Mediterranean Sea, double precision, for meshes of 323,029, 1,292,116 and 5,156,238 elements]
  19. Hybrid Parallelization – A new frontier
     • A sample flood model used for benchmarking: 995,019 elements (not all elements are wet, so parallelization is less efficient)
  20. Hybrid Parallelization – A new frontier
     • Benchmarking using a flood model (not all elements are wet)
     [Chart: MPI speed-up by number of nodes (16 cores per node) for meshes of 80k, 0.3 million and 1.3 million elements (all wet) and 1 million elements (not all wet)]
  21. Hybrid Parallelization – A new frontier
     • Benchmarking using a flood model (not all elements are wet)
     [Chart: GPU speed-up vs number of GPU nodes (1, 2, 4, 8, 16) for meshes of 0.3 million, 1.3 million and 5.2 million elements (all wet) and 1 million elements (not all wet)]
  22. Hybrid Parallelization – A new frontier
     • Benchmarking using a flood model (not all elements are wet), GPU vs MPI:
       - 1 GPU is about 5x faster than 16 cores
       - 4 GPUs are about 4x faster than 64 cores
       - 16 GPUs are nearly 3x faster than 256 cores
       - 4 GPUs are faster than 256 cores
     [Chart: run times for MPI (16 to 256 cores) and GPU (1 to 16 GPUs)]
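Read as CPU-core equivalents, the benchmark figures above imply a diminishing return per GPU as the cluster grows, which is quick to tabulate (simple arithmetic on the stated ratios, nothing more):

```python
# (GPUs, CPU cores compared against, GPU speed-up over those cores),
# taken directly from the benchmark figures on this slide
observations = [(1, 16, 5.0), (4, 64, 4.0), (16, 256, 3.0)]

for gpus, cores, factor in observations:
    equivalent = cores * factor           # CPU cores matched by these GPUs
    print(f"{gpus:2d} GPU(s) ~ {equivalent:4.0f} cores "
          f"({equivalent / gpus:.0f} per GPU)")
```

One GPU stands in for roughly 80 cores, but at sixteen GPUs the per-GPU equivalence drops to around 48 cores, consistent with the flood model's partly-dry mesh parallelizing less efficiently at scale.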
  23. Hybrid Parallelization – A case study
     • Christchurch, New Zealand
       - Catchment area approx. 420 km², including three river systems in the model domain: Avon River, Styx River, Heathcote River
       - 2D model domain: 4.2 million elements, 10 m x 10 m resolution flexible mesh (rectangular elements)
       - Distributed rainfall-runoff with no losses (rain-on-grid): extreme rainfall event, 21-hour storm
  24. Hybrid Parallelization – A case study
     • Christchurch, New Zealand
       - Run time on a desktop PC (MPI) is 8.9 hours: 16-core Dell workstation, 2 x Intel Xeon CPU E5-2687W v2 (8 core, 3.40 GHz), 32 GB of RAM, Windows 7 operating system
       - Run time with 1 x GeForce GTX TITAN GPU card is 3.1 hours
       - Run time with 2 x GeForce GTX TITAN GPU cards is 1.7 hours
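Those run times translate into the following speed-ups over the 16-core workstation (simple arithmetic on the figures above):

```python
cpu_hours = 8.9  # 16-core MPI run on the Dell workstation
gpu_runs = {"1 x GTX TITAN": 3.1, "2 x GTX TITAN": 1.7}

for setup, hours in gpu_runs.items():
    print(f"{setup}: {cpu_hours / hours:.1f}x faster than 16 CPU cores")
```

That is roughly 2.9x for one card and 5.2x for two, i.e. the second TITAN nearly doubles throughput on this 4.2-million-element model.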
  25. GPU Perspectives
     • The mathematical formulation in the GPU version is identical to the CPU version
     • Coupled models (1D/2D) are enabled in the GPU version, allowing structures to be modelled, not just 2D flow
     • GPU performance is excellent but highly dependent on the card
     • Optimal performance is achieved for models with more than 400,000 elements
     • GPU cards are much cheaper than the equivalent CPU hardware in terms of performance (up to 50x cheaper)
  26. Conclusions
     • The use of advanced parallelization techniques is key to delivering timely, detailed, accurate and consistent hydrodynamic modelling results.
     • Large, detailed 1D/2D hydrodynamic models can be used in real-time and near real-time applications like Flood Forecasting and Disaster Risk Management.
     • DHI software is ready to take full advantage of the next wave of hardware solutions with the hybrid MPI/GPU approach.
  27. The Modelling Conundrum.....
     I am a numerical modeller, and my models take a very long time to run. My company/department has just invested $$$ in getting me some really fast new computer hardware so I can be more efficient, more productive and/or more profitable. Tomorrow I will be super excited because:
     (a) all my current models run so much faster than today, or
     (b) I can start building even bigger models with even finer resolution.
     ?/10 numerical modellers choose (b)
  28. Thank you for your attention
     Mark Britton, Global Corporate Relationship Manager
     ISO Certified for Software Development & Support