The new Parallel Krylov Solver package
Jarno Verkaik (Deltares)
Joseph Hughes (USGS)
Edwin Sutanudjaja (UU)
Oliver Schmitz (UU)
Paul van Walsum (Alterra WUR)
Raju Ram (TUD)
1 November, 2016
Contents
• Problem statement and solution
• Short overview of (related) developments
• Concept of domain decomposition
• (Preliminary) results
• Practical usage with iMOD
[Photo: Cartesius, the Dutch supercomputer]
Problem statement and solution
Problem statement:
• To support decision makers in solving hydrological problems, detailed, high-resolution models are often needed.
• These models typically consist of a large number of computational cells
and have large memory requirements and long run times.
Solution:
• An efficient technique for obtaining realistic run times and memory
requirements is parallel computing, where the problem is divided over
multiple processor cores.
1 November, 2016
Short overview of developments
2010 ----- Deltares develops parallel MT3DMS using Message Passing Interface.
2012 ----- USGS develops parallel U(nstructured)PCG-solver using OpenMP.
2013 ----- Deltares & USGS start working on new Parallel Krylov Solver package
for MODFLOW-2005 based on UPCG (hybrid, combined MPI/OpenMP).
2013 ----- USGS releases MODFLOW-USG (UnStructured Grid), which includes the
PCGU solver, a derivative of UPCG.
2015 ----- Deltares incorporates PKS in MODFLOW-USG. Cases: Indonesia and global.
2016 ----- Deltares incorporates PKS in iMODFLOW, together with Alterra for
MetaSWAP. Main case: Netherlands Hydrological Model.
2017 ----- Deltares releases iMODFLOW with PKS.
2017 ----- Deltares incorporates PKS in iMOD-SEAWAT. Cases: fresh-salt groundwater in global deltas.
(2017+---- Deltares & USGS incorporate PKS in new MODFLOW-6.)
1 November, 2016
Concept of domain decomposition (1/3)
1 November, 2016
• Distribute the memory over multiple (connected) processor cores.
• For this, partition the MODFLOW grid:
• iMODFLOW: uniform blocks, or Recursive Coordinate Bisection (RCB; a minimal sketch follows the examples below)
• MODFLOW-USG: METIS library
[Figure: example partitionings: MODFLOW-USG with METIS; iMODFLOW with "uniform" blocks; iMODFLOW with RCB]
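To make the partitioning step concrete, here is a minimal sketch of Recursive Coordinate Bisection over cell coordinates, in plain Python. The function and its median split are illustrative only; a production partitioner such as the one in PKS also weights cells by workload and keeps partitions contiguous.

# Minimal Recursive Coordinate Bisection (RCB) sketch: split a set of
# cell coordinates into 2^depth parts by repeatedly halving along the
# longest coordinate axis.
def rcb(cells, depth):
    """cells: list of (x, y) tuples; returns a list of cell lists."""
    if depth == 0 or len(cells) <= 1:
        return [cells]
    xs = [c[0] for c in cells]
    ys = [c[1] for c in cells]
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    cells = sorted(cells, key=lambda c: c[axis])
    mid = len(cells) // 2  # median split -> equal cell counts per side
    return rcb(cells[:mid], depth - 1) + rcb(cells[mid:], depth - 1)

# Example: partition a 4x4 grid into 4 blocks.
grid = [(i, j) for i in range(4) for j in range(4)]
parts = rcb(grid, depth=2)
print([len(p) for p in parts])  # -> [4, 4, 4, 4]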
Concept of domain decomposition (2/3)
1 November, 2016
• Distribute the linear system Ah = b over the partitions, where h is the groundwater head to be solved.
• Connect the partitions tightly through MPI, using an overlap for exchanging data.
• Solve this system in parallel with exactly the same accuracy as in the serial case.
• Krylov-Schwarz domain decomposition (see www.ddm.org):
• Restricted Additive Schwarz (RAS) parallel preconditioner, applied to the CG/GCR Krylov methods (a serial sketch follows below)
• Inexact subdomain solve: ILU only
• Dirichlet transmission condition
[Figure: example structure of the additive Schwarz preconditioner M]
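Below is a minimal serial sketch of one application of the RAS preconditioner, assuming a 1D test matrix, four subdomains, and SciPy's spilu for the inexact ILU subdomain solve. All of these choices are illustrative, not the PKS implementation: in PKS each subdomain lives on its own MPI process and the overlap values are exchanged over MPI.

# Restricted Additive Schwarz (RAS) sketch: z = M^-1 r is assembled from
# inexact (ILU) solves on overlapping subdomains, but each subdomain
# writes back only the cells it owns (the "restricted" part).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 100
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
r = np.ones(n)

def ras_apply(A, r, nparts=4, overlap=2):
    n = A.shape[0]
    z = np.zeros(n)
    size = n // nparts
    for p in range(nparts):
        lo = p * size                                  # first owned cell
        hi = n if p == nparts - 1 else lo + size
        olo = max(lo - overlap, 0)                     # extend with overlap
        ohi = min(hi + overlap, n)
        ilu = spla.spilu(A[olo:ohi, olo:ohi].tocsc())  # inexact local solve
        z_loc = ilu.solve(r[olo:ohi])
        z[lo:hi] = z_loc[lo - olo : hi - olo]          # keep owned cells only
    return z

z = ras_apply(A, r)  # one preconditioner application inside CG/GCR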
Concept of domain decomposition (3/3)
1 November, 2016
Results iMODFLOW: NHM (1)
1 November, 2016
• Netherlands Hydrological Model for drought simulation
• iMODFLOW-MetaSWAP-TRANSOL-MOZART-DM
• Simulation period: 2006, daily time step
• MODFLOW: confined, 7 layers, 7x1300x1200 (~6.5M cells)
• MetaSWAP: ~0.5M cells
[Figure: model coupling scheme, with parallel (PKS) and serial components]
Results iMODFLOW: NHM (2)
1 November, 2016
• Maximum measured speedup: ~5.
• Maximum theoretical speedup is limited by the serial surface-water part (~6% of run time): by Amdahl's law, at most 1/0.06 ≈ 16.7.
• Exactly the same heads are computed with PKS as in the serial case.
[Figure: measured speedup vs. Amdahl's law]
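For reference, Amdahl's law gives the speedup S(n) = 1 / ((1 - p) + p/n) for parallel fraction p on n cores; a minimal check of the ~16.7 limit quoted above (the ~6% serial fraction is taken from this slide, the core counts are illustrative):

# Amdahl's law: speedup on n cores when a fraction p of the work is parallel.
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

p = 0.94  # ~6% of the NHM run (surface water) stays serial
for n in (2, 4, 8, 16):
    print(n, round(amdahl_speedup(p, n), 1))   # 1.9, 3.4, 5.6, 8.4
print("limit:", round(1.0 / (1.0 - p), 1))     # -> 16.7 as n grows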
Results MODFLOW-USG: global model
1 November, 2016
• PCR-GLOBWB
• Period: January 1996, transient with daily time step; confined, 2 layers, ~4.5M cells, 5 arc-min.
• Run 1: watershed-based input/output (SIO, 53 watersheds)
• Run 2: input/output clipped to the METIS partitions (NO_SIO)
Results MODFLOW-USG: Indonesia & synthetic
1 November, 2016
• Indonesia model: steady-state, confined, 1 layer, ~4M cells, 30 arc-sec.
• Synthetic model: steady-state, confined, heterogeneous conductivity, 2 layers, 10 km x 10 km, ~112M cells (2x7500x7500).
[Figure: speedup results, Synthetic and Indonesia panels]
Practical usage with iMOD (Windows)
Easy-to-use in three steps:
1. Install Microsoft MPI:
https://msdn.microsoft.com/en-us/library/bb524831(v=vs.85).aspx
2. Modify your run-file, Dataset 5 (Solver Configuration)
3. Start your parallel job, e.g. from the Windows command prompt using 4 cores:
mpiexec -n 4 iMODFLOW.exe imodflow.run
1 November, 2016
[Screenshot of the run-file: enable PKS; same options as PCG; partition method and a flag for merging IDF output]
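For repeated runs, the mpiexec call from step 3 can be wrapped in a small launcher; a minimal sketch in Python (the wrapper itself, the default paths, and the core count are illustrative and not part of iMOD):

# Minimal launcher sketch for a parallel iMODFLOW run under Microsoft MPI.
import subprocess
import sys

def run_parallel(runfile: str, cores: int = 4) -> int:
    """Launch iMODFLOW via mpiexec and return its exit code."""
    cmd = ["mpiexec", "-n", str(cores), "iMODFLOW.exe", runfile]
    print("launching:", " ".join(cmd))
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    sys.exit(run_parallel("imodflow.run", cores=4))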
!!! THANK YOU FOR YOUR ATTENTION !!!
1 November, 2016