Adams_SIAMCSE15

RX-Solvers Presentation at SIAM CSE15

Published in: Science

  1. Mark F. Adams, Jed Brown, Matt Knepley. Segmental Refinement: A Multigrid Technique for Data Locality
  2. Rethinking multigrid solvers for new architectures
     • Communication costs at all levels of the memory hierarchy have been increasing relative to processor speeds for 30+ years.
     • Energy constraints exacerbate the cost of memory movement.
       - Tipping point where radical ideas should be investigated.
     • Brandt (1977) proposed Segmental Refinement (SR), a serial, ultra-low-memory multigrid method.
     • Brandt & Diskin (1994) applied the idea to parallel processing.
     • We continue this work.
       - First published multilevel numerical results.
     • SR uses buffering and does not “communicate” on the finest levels.
       - It does not replicate multigrid semantics exactly.
     • Not brute-force “communication avoiding”.
       - Game: keep textbook MG efficiency and remain asymptotically exact.
  3. Multigrid Communication Patterns: vertical (tree) & horizontal. FMG starts with an accurate solve on the coarsest grid.
  4. Refine the grid and split processes, building the tree. [Figure labels: nearest-neighbor intra-grid communication; inter-grid tree++ communication]
  5. FMG goes back to the coarse grid after each new level.
  6. Back down: two-level FMG. [Figure labels: nearest-neighbor intra-grid communication; inter-grid tree++ communication]
  7. Refine & populate processes. [Figure labels: nearest-neighbor intra-grid communication; inter-grid tree++ communication]
  8. Populate & refine. Fully populated: continue refinement. [Figure labels: nearest-neighbor intra-grid communication; inter-grid tree++ communication]
  9. Populate & refine. [Figure labels: nearest-neighbor intra-grid communication; inter-grid tree+ communication]
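The FMG traversal described on the preceding slides (an accurate solve on the coarsest grid, then a return to the coarse grid after each new level) can be illustrated by listing the order in which levels are visited. This is only a sketch: the function names are hypothetical, level 0 is assumed to be the coarsest grid, and one V-cycle per new level is an assumption made for illustration, not a statement about the authors' solver.

```python
def v_cycle_levels(top):
    """Levels visited by one V-cycle that starts and ends on level `top`.

    Level 0 is the coarsest grid (an assumption of this sketch).
    """
    down = list(range(top, -1, -1))  # top, top-1, ..., 0
    up = list(range(1, top + 1))     # 1, ..., top
    return down + up


def fmg_levels(finest):
    """Order of level visits for full multigrid with one V-cycle per new level."""
    visited = [0]  # accurate solve on the coarsest grid
    for level in range(1, finest + 1):
        # Prolong to the new level, then run a V-cycle that returns to level 0.
        visited += v_cycle_levels(level)
    return visited


order = fmg_levels(2)
print(order)
```

Each new level adds one more trip to the coarsest grid, which is the vertical (tree) communication pattern the figures depict.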
  10. Segmental refinement removes communication on the finest levels. SR removes horizontal communication at some scale. Populate & refine. [Figure labels: nearest-neighbor intra-grid communication; inter-grid tree+ communication]
  11. Segmental refinement technique (Brandt & Diskin)
      • SR uses a conventional FMG solver as the “coarse” grid solver.
      • Buffer cells are added to the finer grids and are not updated.
      • Error decays.
      • Experimentally find an adequate buffer schedule for an acceptable level of accuracy.
      [Figure labels: O(1), O(log N)]
  12. Accuracy w.r.t. segmental refinement parameters
      • Use a linear buffer schedule: the number of buffer cells J on level i is J = A + B(K - i), i > 0.
        - K SR grids; i = 0 is the transition level.
        - A and B are independent integer parameters ≥ 0.
        - Only a few (A) buffer cells on the fine grid, more on coarse grids.
      • Model problem:
        - Cell-centered finite difference, 27-point stencil Laplacian
        - Cartesian isotropic grids
        - Piecewise constant restriction, linear prolongation
        - 2nd-order Chebyshev smoother in V(2,2) cycles
        - Full multigrid with linear FMG prolongation
        - u = (x^4 - L_x x^2)(y^4 - L_y y^2)(z^4 - L_z z^2); L = (2, 1, 1)
      • Segmental refinement parameters of interest:
        - A: number of buffer cells in the finest grid
        - B: increase in buffer cells per level
        - N0: size of the transition-level subdomain
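The linear buffer schedule on this slide, A + B(K - i) for SR levels i = 1..K, is simple enough to tabulate directly. The following is a minimal sketch; the function name and the example values A = 4, B = 1, K = 6 are chosen for illustration only.

```python
def buffer_schedule(A, B, K):
    """Number of buffer cells on each SR level i = 1..K, using A + B*(K - i).

    A: buffer cells on the finest grid (level K)
    B: extra buffer cells per level going coarser
    K: number of SR grids (level 0 is the transition level and is not listed)
    """
    if A < 0 or B < 0 or K < 1:
        raise ValueError("expected A >= 0, B >= 0, K >= 1")
    return [A + B * (K - i) for i in range(1, K + 1)]


# Illustrative values only: the finest level (i = K) gets A cells,
# and each coarser SR level gets B more.
print(buffer_schedule(A=4, B=1, K=6))
```

With B = 0 every SR level carries the same A buffer cells; B > 0 puts more cells on the coarser SR levels, as the slide notes.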
  13. Probe the 5D parameter space (A, B, N0, K, esr)
      Ratio esr/econv for fine grid buffer size A = 4:
      Increment B | N0 = 16 (K = 6) | N0 = 8 (K = 5) | N0 = 4 (K = 4)
      0           | 5.7             | 2.6            | 1.2
      1           | 2.0             | 1.4            | 1.0
      2           | 1.3             | 1.1            | NA
      3           | 1.1             | 1.0            | NA
      • Ratio (esr/econv) of SR error to conventional MG solver error.
      • Define acceptable error as ~10% (esr/econv ~1.1).
      • Observe isosurfaces …
      • Implies N0 & B increase with K.
      • Dependence on N0 not recognized in the Brandt & Diskin analysis.
        - This needs corroboration and possibly amelioration.
      • Implies a new data model for asymptotic analysis …
      More data in the paper.
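To make the "acceptable error" criterion concrete, the sketch below encodes the error ratios reported on this slide (A = 4) and finds, for each (N0, K) column, the smallest increment B whose ratio esr/econv is at most 1.1. The data layout and function name are assumptions of this sketch; the numbers are the ones on the slide, with NA stored as `None`.

```python
ACCEPTABLE_RATIO = 1.1  # ~10% above the conventional MG solver error

# Keys are (N0, K); list index is the increment B = 0, 1, 2, 3.
error_ratio = {
    (16, 6): [5.7, 2.0, 1.3, 1.1],
    (8, 5): [2.6, 1.4, 1.1, 1.0],
    (4, 4): [1.2, 1.0, None, None],  # None marks the NA entries
}


def smallest_acceptable_b(ratios, threshold=ACCEPTABLE_RATIO):
    """Return the smallest B whose ratio is at or below the threshold, else None."""
    for b, ratio in enumerate(ratios):
        if ratio is not None and ratio <= threshold:
            return b
    return None


smallest_b = {key: smallest_acceptable_b(r) for key, r in error_ratio.items()}
for (n0, k), b in smallest_b.items():
    print(f"N0={n0}, K={k}: smallest acceptable B = {b}")
```

Read this way, the required B grows along with N0 and K in the reported data, which matches the slide's observation.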
  14. Data movement complexity analysis: new data model
      • Machine model (the proper “basis” for the new DM):
        - Q words (small patches) of memory and processes on the fine grid
        - √Q memory partitions
        - √Q words per memory partition
      • New DM: the transition level fits on one memory partition.
      [Figure labels: Log(NK)/2 + 1, Log(NK)/2]
  15. Data movement complexity analysis
      • Complexity model (again the proper “basis” for the new DM):
        - Near (intra-partition) communication
        - Far (inter-partition) communication
        - Horizontal communication (residual, etc.): cH
        - Vertical communication (restrict & prolong): cV
      Comm. type       | Near (L2/8)  | Far (L2/8)
      Coarse grids     | 3(6cH + 2cV) | 0
      Conv. fine grids | 6cH          | 6cH + 2cV
      SR fine grids    | 6cH          | 0 + 2cV
      • SR removes fine grid communication.
      • Unlike conventional communication avoiding, SR reduces bisection bandwidth. In 3D: N^2 -> N log2(N)
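The cost table on this slide can be evaluated for given unit costs to see where SR saves communication. This is a sketch under stated assumptions: the class and function names are hypothetical, the unit costs cH = cV = 1 are placeholders, and the common factor in the column headers is left out, so the results are in the table's own units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommCost:
    near: float  # intra-partition communication
    far: float   # inter-partition communication


def comm_costs(c_h, c_v):
    """Evaluate the slide's cost table for horizontal cost c_h and vertical cost c_v."""
    return {
        "coarse grids": CommCost(near=3 * (6 * c_h + 2 * c_v), far=0.0),
        "conventional fine grids": CommCost(near=6 * c_h, far=6 * c_h + 2 * c_v),
        "SR fine grids": CommCost(near=6 * c_h, far=2 * c_v),
    }


costs = comm_costs(c_h=1.0, c_v=1.0)  # placeholder unit costs
far_saved = costs["conventional fine grids"].far - costs["SR fine grids"].far
print(costs)
print("far communication saved on fine grids:", far_saved)
```

The difference between the conventional and SR rows is exactly the far horizontal term 6cH; the vertical term 2cV remains in both.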
  16. Conclusion: SR severs horizontal communication at some scale
      • Designed two SR data models.
        - Each removes horizontal communication at a level in the memory hierarchy.
      • Multiple SR models combined address all levels of interest …
      • Future work:
        - Corroborate: try higher-order (H.O.) prolongation; vertex-centered FE
        - Extend to more apps
        - New SR data models
        - ...
      [Figure labels: log NK1-K2 K2= O(1)]
  17. Weak scaling: 2 - 8K sockets, Edison (Cray XC30). Some indication of SR catching up (4 SR levels). O(1) SR levels, not log(N).
  18. Thank you. (SISC) paper, code, data, parse & run scripts: https://bitbucket.org/madams/srgmg
