



    DUNE on current and next generation HPC Platforms

                                             Dr. Markus Blatt
                                      HPC-Simulation-Software & Services


                              Forschungszentrum Jülich, Germany
                                         March 8, 2012




M. Blatt (HPC-Sim-Soft, Heidelberg)             DUNE Blue Gene             Jülich 2012
Outline


1    DUNE

2    Parallelization
       Parallel Grid Interface
       Parallel Iterative Solvers
       Parallel Algebraic Multigrid
       Scalability
       A Glimpse at other DUNE projects

3    Trends and Outlook for HPC and DUNE

4    HPC-Simulation-Software & Services



DUNE


Why we created DUNE!

Problems with most PDE software
    • Mostly support one special set of features:
       • IPARS: block structured, parallel, multiphysics.
       • Alberta: simplicial, unstructured, bisection refinement.
       • UG: unstructured, multi-element, red-green refinement, parallel.
       • QuocMesh: Fast, on-the-fly structured grids.
    • Other features either not or inefficiently supported.

The idea of DUNE
    • Separation of data structures and algorithms.
    • Easy exchange of interface implementations.
    • Reuse of legacy software.
    • Fine grained interfaces.
    • C++ with templates.
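To make "separation of data structures and algorithms" with C++ templates concrete, here is a minimal sketch. It is not the DUNE grid interface: the classes `EquidistantGrid1D` and `VertexListGrid1D`, their members `size()`, `left()`, `right()` and the function `integrateMidpoint` are hypothetical. The algorithm is written once against a small implicit interface, and any grid type that provides it can be plugged in; the choice is resolved at compile time, so there is no virtual-call overhead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical grid 1: n equidistant cells on [a, b].
class EquidistantGrid1D {
public:
  EquidistantGrid1D(double a, double b, std::size_t n) : a_(a), h_((b - a) / n), n_(n) {
    assert(n > 0 && b > a);
  }
  std::size_t size() const { return n_; }
  double left(std::size_t i) const { return a_ + i * h_; }
  double right(std::size_t i) const { return a_ + (i + 1) * h_; }

private:
  double a_, h_;
  std::size_t n_;
};

// Hypothetical grid 2: arbitrary (e.g. graded) cells from an explicit vertex list.
class VertexListGrid1D {
public:
  explicit VertexListGrid1D(std::vector<double> x) : x_(std::move(x)) { assert(x_.size() >= 2); }
  std::size_t size() const { return x_.size() - 1; }
  double left(std::size_t i) const { return x_[i]; }
  double right(std::size_t i) const { return x_[i + 1]; }

private:
  std::vector<double> x_;
};

// Generic algorithm: midpoint-rule integration, written once against the
// implicit interface size()/left()/right(). Any grid type offering these
// members can be used without changing this code.
template <class Grid, class Function>
double integrateMidpoint(const Grid& grid, Function f) {
  double sum = 0.0;
  for (std::size_t i = 0; i < grid.size(); ++i) {
    const double a = grid.left(i), b = grid.right(i);
    sum += (b - a) * f(0.5 * (a + b));
  }
  return sum;
}
```

Exchanging the grid means changing one type in the calling code and recompiling; the integration routine stays untouched.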



Modularity

                         Figure: DUNE module structure with dune-common, dune-grid, dune-istl,
                         dune-localfunctions, dune-pdelab, dune-fem, dune-grid-howto and
                         dune-pdelab-howto; external packages UG, Alberta, ALU, NeuronGrid,
                         Metis, SuperLU, VTK and Gmsh.

    • Grid interface: (non-)conforming hierarchical grid interface.
    • Iterative Solver Template Library: Dense and sparse linear algebra.
    • PDELab: Discretization module based on residual formulation.




PDELab: Plug, Code and Play Simulation Software



                                               • Choose:
                                                   • Grid
                                                   • Finite Element
                                                   • Maybe implement a local
                                                     operator.
                                                   • Time-stepping scheme.
                                                   • (Non-)linear solvers.
                                               • Recompile the application.
                                               • Run efficiently.






Sample Simulations




           Figures: transport in porous media; density-driven flow; flow around root networks; neuron network simulations

    •   Electromagnetics
    •   Computational neuroscience: biophysically realistic networks of neurons
    •   Geostatistical inversion: coping with uncertain parameters
    •   Linear Acoustics
    •   Multiphase flow and transport in porous media
    •   Density-driven flow
Parallelization


Parallel Grids
    • Domain decomposition (overlapping or non-overlapping).
    • Load balancing.
    • Message passing based on MPI is handled by the grid manager.

Parallel linear algebra
    • Message passing is decoupled from the grid.
    • Abstraction: Parallel index sets to identify data globally.
    • Reuse of efficient sequential linear algebra.
    • Minimize communication.



Parallelization   Parallel Grid Interface


An Example of a Parallel Grid


        Figure: the same distributed grid shown for c = 0, 1, 2 (entity codimension, columns).
        First row: with overlap and ghosts. Second row: with overlap only. Third row: with ghosts only.
        Legend: interior, overlap, ghost, border, front, not stored.



Parallel Grids in Dune
    • YaspGrid
        • structured
        • 2D/3D
        • arbitrary overlap
    • UGGrid
        • unstructured
        • 2D/3D
        • multi-element
        • one layer of ghost cells
        • (conforming) red-green refinement
        • (non-free!)
    • ALUGrid
        • unstructured
        • 3D
        • tetrahedral or hexahedral elements
        • ghost cells
        • (non-conforming) bisection refinement
Parallelization   Parallel Iterative Solvers


Index Sets




Index Set
    • Distributed overlapping index set $I = \bigcup_{p=0}^{P-1} I_p$
    • Process $p$ manages the mapping $I_p \to [0, n_p)$.
    • It might only store information about the mapping for shared indices.




Global Index
    • Identifies a position (index) globally.
    • Arbitrary and not consecutive (to support adaptivity).
    • Persistent.
    • On JUGENE this is not an int, to get rid of the 32-bit limit!
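Why 32 bits are not enough: the full-JUGENE runs reported later in this talk reach about 1.25E11 unknowns, while a signed 32-bit int can number only $2^{31} \approx 2.1 \cdot 10^9$ of them. A minimal sketch, assuming a hypothetical alias `GlobalIndex` for a 64-bit type and a hypothetical helper `canNumber`, that checks the range at compile time:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical global index type: 64-bit signed integer.
using GlobalIndex = std::int64_t;

// True if 'unknowns' entries can be numbered 0 .. unknowns-1 with type Index.
template <class Index>
constexpr bool canNumber(std::uint64_t unknowns) {
  return unknowns == 0 ||
         unknowns - 1 <= static_cast<std::uint64_t>(std::numeric_limits<Index>::max());
}

static_assert(!canNumber<std::int32_t>(125000000000ULL), "32 bit cannot number 1.25E11 unknowns");
static_assert(canNumber<GlobalIndex>(125000000000ULL), "64 bit suffices");
```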



Local Index
    • Addresses a position in the local container.
    • Convertible to an integral type.
    • Consecutive index starting from 0.
    • Non-persistent.
    • Provides an attribute to identify ghost region.
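The two kinds of indices can be pictured as a per-process table mapping persistent, possibly non-consecutive global indices to consecutive local indices, each tagged with an attribute. A minimal sketch of that idea, not DUNE's `ParallelIndexSet` API: the class `ParallelIndexSetSketch` and the `Attribute` values `owner`, `overlap`, `ghost` are hypothetical, and unlike the slide's remark this sketch stores every index, not only shared ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

enum class Attribute { owner, overlap, ghost };  // hypothetical attribute set

struct LocalIndex {
  std::size_t local;    // position in the local container, 0 .. n_p-1
  Attribute attribute;  // e.g. identifies the ghost region
};

class ParallelIndexSetSketch {
public:
  using GlobalIndex = std::int64_t;  // persistent, arbitrary, 64 bit

  // Adds a global index; local indices are handed out consecutively from 0.
  void add(GlobalIndex global, Attribute attr) {
    assert(map_.count(global) == 0);
    map_.emplace(global, LocalIndex{map_.size(), attr});
  }

  std::size_t size() const { return map_.size(); }

  // Looks up the local index and attribute of a global index, if known here.
  std::optional<LocalIndex> find(GlobalIndex global) const {
    auto it = map_.find(global);
    if (it == map_.end()) return std::nullopt;
    return it->second;
  }

  // Local positions of all entries carrying the given attribute.
  std::vector<std::size_t> localIndicesWith(Attribute attr) const {
    std::vector<std::size_t> result;
    for (const auto& entry : map_)
      if (entry.second.attribute == attr) result.push_back(entry.second.local);
    return result;
  }

private:
  std::map<GlobalIndex, LocalIndex> map_;
};
```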




Remote Information and Communication


Remote Index Information
    • For each process q, process p knows all common global indices
       together with their attribute on q.

Communication
    • Target and source partition of the index set are chosen using attribute
       flags, e.g. from ghost to owner and ghost.
    • If remote index information for q is available on p, then p sends
       all the data for q in one message.
    • All communication takes place asynchronously at the same time.
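The attribute-based selection can be sketched without MPI: process p walks the remote index information it holds for q and collects, in a single buffer, the values whose attribute on p is in the source set and whose attribute on q is in the target set. All names (`RemoteIndex`, `packMessage`) are hypothetical, and the example uses owner as source and overlap/ghost as target; real code would hand the buffer to a non-blocking MPI send.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

enum class Attribute { owner, overlap, ghost };

// What process p knows about one index it shares with process q.
struct RemoteIndex {
  std::int64_t global;        // persistent global index
  std::size_t local;          // local index on p
  Attribute localAttribute;   // attribute on p
  Attribute remoteAttribute;  // attribute of the same index on q
};

// Packs all values p has to send to q into one message buffer.
// Assumes 'sharedWithQ' is sorted by global index on both processes,
// so q can unpack the buffer in the same order.
std::vector<double> packMessage(const std::vector<RemoteIndex>& sharedWithQ,
                                const std::vector<double>& localData,
                                const std::set<Attribute>& source,
                                const std::set<Attribute>& target) {
  std::vector<double> buffer;
  for (const RemoteIndex& r : sharedWithQ)
    if (source.count(r.localAttribute) && target.count(r.remoteAttribute))
      buffer.push_back(localData.at(r.local));
  return buffer;
}
```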






Parallel Matrix Representation

    • Let $I_i$ be a nonoverlapping decomposition of our index set $I$.
    • $\tilde I_i$ is the augmented index set such that for all $k \in I_i$ and all $j$ with
       $|a_{kj}| + |a_{jk}| \neq 0$ also $j \in \tilde I_i$ holds.
    • Then the locally stored matrix looks like

       $$\begin{array}{c|cc} & I_i & \tilde I_i \setminus I_i \\ \hline I_i & A_{ii} & * \\ \tilde I_i \setminus I_i & 0 & I \end{array}$$

    • Therefore $Av$ can be computed locally for the entries associated with
       $I_i$ if $v$ is known for $\tilde I_i$.
    • A communication step ensures consistent ghost values.
    • The matrix can be used for hybrid preconditioners.
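This block structure is all a parallel matrix-vector product needs. A minimal sketch under simplifying assumptions: two simulated processes (no MPI), dense local matrices, and the global 1D Laplacian tridiag(−1, 2, −1) on six unknowns; process 0 owns global indices 0 to 2 with ghost 3, process 1 owns 3 to 5 with ghost 2. `Process`, `localMatrix`, `multiply` and `updateGhosts` are hypothetical names; `updateGhosts` is a plain copy standing in for the communication step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Local part of the global matrix tridiag(-1, 2, -1). 'globals' holds the global
// index of every local index; the first 'nOwned' are owned, the rest are ghosts.
// Owned rows hold A_ii and the couplings '*' to ghosts; ghost rows are (0 I).
Matrix localMatrix(const std::vector<int>& globals, std::size_t nOwned) {
  const std::size_t n = globals.size();
  Matrix a(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    if (i >= nOwned) { a[i][i] = 1.0; continue; }  // ghost row: identity
    for (std::size_t j = 0; j < n; ++j) {
      const int d = std::abs(globals[i] - globals[j]);
      if (d == 0) a[i][j] = 2.0;
      else if (d == 1) a[i][j] = -1.0;
    }
  }
  return a;
}

// Purely local product: owned rows are correct if v is known on the augmented set.
std::vector<double> multiply(const Matrix& a, const std::vector<double>& v) {
  std::vector<double> y(v.size(), 0.0);
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < v.size(); ++j) y[i] += a[i][j] * v[j];
  return y;
}

struct Process {
  std::vector<int> globals;  // owned indices first, then ghosts
  std::size_t nOwned;
  std::vector<double> y;     // local result vector
};

// Stand-in for the communication step: copy owner values into the ghosts of p.
void updateGhosts(Process& p, const Process& q) {
  for (std::size_t i = p.nOwned; i < p.globals.size(); ++i)
    for (std::size_t j = 0; j < q.nOwned; ++j)
      if (q.globals[j] == p.globals[i]) p.y[i] = q.y[j];
}
```

With $v_i = i + 1$ (global numbering) the owned entries reproduce the global product, which is zero except for 7 in the last row, and after the ghost update the ghost entries agree with their owners.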

Parallelization   Parallel Algebraic Multigrid


Algebraic Multigrid (AMG)


Stationary Iterative Methods
    • Error reduction stagnates as the number of iterations and unknowns
       increases.
    • Only high-frequency errors are reduced efficiently.

Algebraic Multigrid
    • Approximate the smooth residual on a coarser grid.
    • Calculate a correction there.
    • Interpolate the correction to the fine grid and add it to the current guess.
    • Use the algebraic nature of the problem to define the coarse levels.
    • Coarsening adapts to problem and grid automatically.
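The smoothing behaviour behind this can be checked numerically. A minimal sketch, assuming damped Jacobi with the common textbook damping ω = 2/3 on the 1D Laplacian tridiag(−1, 2, −1) with zero right-hand side, so the iterate is the error itself; the problem size, mode numbers and function names are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One damped Jacobi sweep for A e = 0 with A = tridiag(-1, 2, -1) (Dirichlet):
// e <- e - omega * D^{-1} A e, with D = 2 I.
std::vector<double> jacobiSweep(const std::vector<double>& e, double omega) {
  const std::size_t n = e.size();
  std::vector<double> next(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double left = i > 0 ? e[i - 1] : 0.0;
    const double right = i + 1 < n ? e[i + 1] : 0.0;
    const double ae = 2.0 * e[i] - left - right;  // (A e)_i
    next[i] = e[i] - omega * ae / 2.0;
  }
  return next;
}

double norm2(const std::vector<double>& v) {
  double s = 0.0;
  for (double x : v) s += x * x;
  return std::sqrt(s);
}

// ||e_after|| / ||e_before|| after 'sweeps' sweeps, starting from the k-th
// eigenvector sin(k pi j / (n+1)) of A (small k: smooth, large k: oscillatory).
double reductionFactor(std::size_t n, std::size_t k, int sweeps, double omega = 2.0 / 3.0) {
  const double pi = std::acos(-1.0);
  std::vector<double> e(n);
  for (std::size_t i = 0; i < n; ++i) e[i] = std::sin(k * pi * (i + 1) / (n + 1));
  const double before = norm2(e);
  for (int s = 0; s < sweeps; ++s) e = jacobiSweep(e, omega);
  return norm2(e) / before;
}
```

For n = 63, five sweeps leave the smoothest mode (k = 1) almost unchanged, while the oscillatory mode k = 48 shrinks by several orders of magnitude; the coarse-level correction is what removes the smooth remainder.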





Aggregation AMG

Simple, non-smoothed version
   • Piecewise constant prolongators $P_l$.
   • Heuristic and greedy aggregation algorithm.
   • $A_{l-1} = P_l^T A_l P_l$
   • Proposed by Raw, Vanek et al., Braess
   • Preconditioner for Krylov methods.

Observations
   • Reasonable coarse grid operator for systems.
   • Preserves FV discretization.
   • Very memory efficient.
   • Fast and scalable V-cycle.
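One coarsening step of the non-smoothed scheme can be sketched as follows. The greedy pass below (each unaggregated vertex starts an aggregate and absorbs its unaggregated neighbours with nonzero coupling) is a deliberately simplified stand-in for the heuristic used in practice; dense storage and the names `aggregate` and `galerkin` are assumptions for brevity. With the piecewise-constant $P$, $(P^T A P)_{IJ}$ is simply the sum of all $a_{ij}$ with $i \in I$, $j \in J$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Greedy aggregation: every still-unaggregated vertex starts a new aggregate
// and absorbs its unaggregated neighbours (here: any nonzero coupling).
// Returns the aggregate number of each vertex.
std::vector<std::size_t> aggregate(const Matrix& a) {
  const std::size_t n = a.size();
  const std::size_t none = n;  // marker for "not yet aggregated"
  std::vector<std::size_t> agg(n, none);
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (agg[i] != none) continue;
    agg[i] = count;
    for (std::size_t j = 0; j < n; ++j)
      if (j != i && a[i][j] != 0.0 && agg[j] == none) agg[j] = count;
    ++count;
  }
  return agg;
}

// Galerkin product A_c = P^T A P for the piecewise-constant prolongator
// P_{iI} = 1 if vertex i lies in aggregate I: entries are summed per aggregate pair.
Matrix galerkin(const Matrix& a, const std::vector<std::size_t>& agg, std::size_t nCoarse) {
  Matrix c(nCoarse, std::vector<double>(nCoarse, 0.0));
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < a.size(); ++j) c[agg[i]][agg[j]] += a[i][j];
  return c;
}
```

For the 1D Laplacian on six unknowns this pairs neighbouring vertices, and the coarse operator is again tridiag(−1, 2, −1), which illustrates why the scheme preserves a finite-volume-like stencil.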

Illustration: Parallel Setup
                                      Figure panels: Decoupled Aggregation; Communicate Ghost Aggregates
Parallel Setup Phase




    • Every process builds aggregates in its owner region.
    • One communication with the nearest neighbors updates the aggregate
       information in the ghost region.
    • Coarse level index sets are a subset of the fine level.
    • Remote index information can be deduced locally.
    • Galerkin product can be calculated locally.






Data Agglomeration on Coarse Levels

     Figure: number of vertices per processor versus number of nonidle processors (1 to n)
     for levels L to L−7 and the agglomerated levels (L−2)', L−4', L−6', with a "coarsen target" line.

    • Repartition the data onto fewer processes.
    • Use METIS on the graph of the communication pattern
       (ParMETIS cannot handle the full machine!).




Parallelization   Scalability


Weak Scalability Results (Poisson)



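     procs    1/h   lev.       TB       TS    It      TIt       TT
         1     80      5    19.86    31.91     8    3.989    51.77
         8    160      6     27.7     46.4    10     4.64     74.2
        64    320      7     74.1     49.3    10     4.93      123
       512    640      8    76.91     60.2    12    5.017    137.1
      4096   1280     10    81.31    64.45    13    4.958    145.8
     32768   2560     11    92.75    65.55    13    5.042    158.3
    262144   5120     12    188.5    67.66    13    5.205    256.2

    1/h: inverse mesh width, lev.: number of AMG levels, TB: setup (build) time, TS: solve time,
    It: iterations, TIt = TS/It: time per iteration, TT = TB + TS: total time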
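In a weak-scaling study the work per process is fixed, so ideal behaviour means constant times. A small sketch computing the per-iteration weak-scaling efficiency TIt(1)/TIt(P) from the TIt column of the Poisson table; this is a derived quantity, not one reported on the slide, and `WeakScalingRow` and `iterationEfficiency` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct WeakScalingRow {
  long procs;
  double timePerIteration;  // TIt column
};

// Weak-scaling efficiency of one iteration relative to the first row.
std::vector<double> iterationEfficiency(const std::vector<WeakScalingRow>& rows) {
  std::vector<double> eff;
  for (const WeakScalingRow& r : rows)
    eff.push_back(rows.front().timePerIteration / r.timePerIteration);
  return eff;
}

// TIt values copied from the Poisson weak-scaling table.
const std::vector<WeakScalingRow> poisson = {
    {1, 3.989}, {8, 4.64}, {64, 4.93}, {512, 5.017},
    {4096, 4.958}, {32768, 5.042}, {262144, 5.205}};
```

For 262144 processes this gives about 0.77 per iteration; since the iteration count also grows from 8 to 13, the total solve time grows more than the per-iteration time.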






Clipped Log-Random Problem




    • $-\nabla \cdot (k(x) \nabla u) = f$ in $\Omega$
    • $\kappa(x)$: realization of a log-random field with variance $\sigma^2$, mean 0, and
       correlation length $\lambda$.
    • $k(x)$: binary medium constructed from $\kappa(x)$.
    • Weak scaling: $\lambda$ scales with the mesh width $h$.



Weak Scalability Results (Poisson vs. Clipped)






Weak Scalability Results (Clipped Log-Random Problem)



    • $\sigma^2 = 8$, $\lambda = 4h$
     procs    1/h   lev.       TB       TS    It      TIt       TT
         1     80      5    19.93    49.39    12    4.116    69.32
         8    160      6     28.1     73.7    15     4.91      102
        64    320      7     75.1      105    20     5.26      180
       512    640      8    80.11      134    25    5.362    214.1
      4096   1280     10    84.71    171.7    33    5.203    256.4
     32768   2560     11    93.24    189.5    36    5.264    282.7
    262144   5120     12    195.9    386.5    72    5.368    582.5






Parallel Groundwater Simulation




                        Figure: Cut through the ground beneath an acre

    • Highly discontinuous permeability of the ground.
    • 3D simulations with high resolution.
    • Efficient and robust parallel iterative solvers.

                                      $$-\nabla \cdot (K(x) \nabla u) = f \quad \text{in } \Omega \qquad (1)$$
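For cell-centred finite volumes on such strongly discontinuous permeabilities, a common choice is the harmonic mean of $K$ at cell faces, which treats the two half-cells as resistances in series. This is an assumption for illustration; the talk does not state which discretization is used. A 1D sketch for a uniform grid of width $h$ with homogeneous Dirichlet values at both ends; `transmissibility` and `assembleFV` are hypothetical names, and each matrix row is a cell flux balance (right-hand side $f h$).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Face transmissibility between two cells of width h with permeabilities k1, k2:
// half-cells in series give T = 2 k1 k2 / ((k1 + k2) h) (harmonic mean over h).
double transmissibility(double k1, double k2, double h) {
  return 2.0 * k1 * k2 / ((k1 + k2) * h);
}

// FV matrix for -(K u')' = f on k.size() cells; Dirichlet u = 0 at both ends,
// where the boundary face is h/2 away from the cell centre (T = 2 K / h).
std::vector<std::vector<double>> assembleFV(const std::vector<double>& k, double h) {
  assert(!k.empty() && h > 0.0);
  const std::size_t n = k.size();
  std::vector<std::vector<double>> a(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i + 1 < n; ++i) {
    const double t = transmissibility(k[i], k[i + 1], h);
    a[i][i] += t;
    a[i + 1][i + 1] += t;
    a[i][i + 1] -= t;
    a[i + 1][i] -= t;
  }
  a[0][0] += 2.0 * k[0] / h;
  a[n - 1][n - 1] += 2.0 * k[n - 1] / h;
  return a;
}
```

Across a jump from $K = 1$ to $K = 10^{-6}$ the harmonic mean gives a transmissibility of about $2 \cdot 10^{-6}$, whereas an arithmetic mean would overstate the flux by roughly five orders of magnitude.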


Weak Scalability Results II
    •   Richards equation.
    •   64 × 64 × 128 unknowns per process.
    •   1.25E11 unknowns on the full JUGENE.
    •   One time step in simulation.






Efficiency Solver Components






Efficiency IO




    • Highly tuned by Olaf Ippisch
    • SIONlib from Jülich rocks!
    • Still: I/O is not very scalable!

Parallelization      A Glimpse at other DUNE projects




                  Projects using DUNE-FEM on JUGENE

                     DFG SPP MetStröm: Adaptive Numerics for Multiscale Phenomena
                       · D. Kröner, S. Brdar (Freiburg)
                       · M. Baldauf, D. Schuster (DWD)
                       · R. Klöfkorn (Stuttgart)
                       · A. Dedner (Warwick)

                                                    Figure: Mountain wave test case, work by S. Brdar

                     BW Stiftung HPC-11: Simulation of 2-stroke engines with detailed combustion
                       · D. Kröner, D. Lebiedz, M. Nolte, M. Fein (Freiburg)
                       · R. Klöfkorn (Stuttgart)
                       · A. Dedner (Warwick)

                                                             Figure: work by D. Trescher
                                                                                                           Robert Klöfkorn




                  Strong scaling on Blue Gene/P

                     Table: Strong scaling and efficiency on the supercomputer JUGENE (Jülich, Germany)

                  #cores   #cells/core¹   #DOFs/core   time (ms)²   speed-up   efficiency
                     512            474       296250        46216          -            -
                    4096             59        36875         6294       7.34         0.91
                   32768              7         4375          949      48.71         0.76
                   65536              3         1875          504      91.70         0.72

                       Navier-Stokes equations solved with CDG2 (k = 4, 3D)
                       overall number of cells 243 000 (#DOFs ≈ 1.52 · 10^8) on a Cartesian grid
                       ≈ 6.1 GB memory consumption on a desktop machine
                       explicit Runge-Kutta method of order 3
                       programming techniques for performance
                          ·   template metaprogramming
                          ·   automated code generation of DG kernels
                          ·   hybrid parallelization (MPI/pthreads, work in progress)
                          ·   overlap of computation and communication (DCMF_INTERRUPT=1)
                  ¹ average #cells/core
                  ² average run time per time step
                                                                                                           Robert Klöfkorn
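The speed-up and efficiency columns follow from the run times relative to the 512-core baseline: $S_P = t_{512} / t_P$ and $E_P = S_P / (P / 512)$. A small sketch reproducing them; `StrongScaling` and `strongScaling` are hypothetical names, and since the slide's values are rounded, the last digit may differ.

```cpp
#include <cassert>
#include <cmath>

struct StrongScaling {
  double speedup;
  double efficiency;
};

// Strong-scaling metrics relative to a baseline run on 'baseCores' cores.
StrongScaling strongScaling(int baseCores, double baseTime, int cores, double time) {
  const double s = baseTime / time;
  return {s, s / (static_cast<double>(cores) / baseCores)};
}
```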
Trends and Outlook for HPC and DUNE


(My) Parallel Machines




                    Helics I (2003)            Helics II (2007)                        Blue Gene/P (2009)
    Nodes           256                        156+4                                   73728
    Processor       Dual AMD Athlon 1.4 GHz    2x Dual Core AMD Opteron 2220 2.8 GHz   PowerPC 450 Quad-core 850 MHz
    Peak/node       5.9 GFLOPS                 18.8 GFLOPS                             13.6 GFLOPS
    Memory/node     1 GB                       8 GB (21.4 GB/s)                        2 GB (13.6 GB/s)
    Interconnect    Myrinet 2 Gbit             Myricom 10 Gbit



Observations in Parallel Computing

Software
    • Solution of time dependent (nonlinear) equations with implicit time
       stepping schemes.
    • Most time consuming: Solution of linear system.
    • Peak GFLOPS out of reach!
    • Methods are limited by memory bandwidth.

Hardware
    • Costs for compute power drop fast (2002: 12 USD/MFLOP, 2011:
       0.01 USD/MFLOP).
    • Costs for main memory drop only slightly.
    • Main memory is not power efficient.
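A back-of-the-envelope estimate shows why peak GFLOPS is out of reach for the sparse matrix-vector products that dominate iterative solvers. Assuming CSR storage with 8-byte values and 4-byte column indices, each nonzero costs 2 flops and at least 12 bytes of memory traffic (vector and row-pointer traffic ignored, so the bound is optimistic). With the Blue Gene/P node figures from the machine comparison (13.6 GFLOPS peak, 13.6 GB/s), bandwidth caps the rate at one sixth of peak. `attainableGflops` is a hypothetical roofline-style helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Roofline-style bound: attainable GFLOPS = min(peak, bandwidth * intensity).
double attainableGflops(double peakGflops, double bandwidthGBs, double flopsPerByte) {
  return std::min(peakGflops, bandwidthGBs * flopsPerByte);
}

// Assumed CSR SpMV cost per nonzero: 1 multiply + 1 add; 8-byte value + 4-byte index.
constexpr double kFlopsPerNonzero = 2.0;
constexpr double kBytesPerNonzero = 8.0 + 4.0;
constexpr double kSpmvIntensity = kFlopsPerNonzero / kBytesPerNonzero;  // about 0.17 flop/byte
```

The same estimate with the Helics II figures (18.8 GFLOPS, 21.4 GB/s) gives about 3.6 GFLOPS, below 20 % of peak.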




Current Hardware/Software Trends
The hardware manufacturers' solution (Green Computing)
    • More cores per node
    • Less main memory per core.
    • SIMD (Blue Gene/Q, GPGPU)
    • Increase GFLOPS per GB/s main memory speed
    • Faster network interconnects.

How does DUNE cope?
    • Memory efficiency!
    • Minimize communication!
    • Favor less convergent iterative methods!
    • Time to solution / scalability matters most!
    • Ability to compute bigger problems faster.

Software is always two steps behind.


Current Work and Future Plans


Software Point of View
    • Hybrid parallelization in ALUGrid (Klöfkorn, University of Stuttgart)
    • Borrow ideas from GPGPU computing (coalesced memory access)
    • Check out cache-oblivious algorithms
    • Check out parallel-in-time algorithms.

Application Point of View
    • Inverse Modeling
         • Geostatistical inversion: Universities of Heidelberg (Ippisch, Ngo) and
           Tübingen (Cirpka, Schwede)
         • Run several parallel forward simulations in parallel.






DUNE on Blue Gene/Q?




Advantages of a central installation:
    • Saves scientists a lot of time.
    • Would help optimize DUNE for the platform.
    • Possibility of professional installation and user support.
    • Closer cooperation of DUNE and IBM brings benefits to all.




HPC-Simulation-Software & Services


What can we do for you?


                      Efficient simulation software made to measure.




                   Hans-Bunte-Str. 8-10, 69123 Heidelberg, Germany
                             http://www.dr-blatt.de



(Ebook photo) a-z of digital photography
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Distributed clustering from data streams
Distributed clustering from data streamsDistributed clustering from data streams
Distributed clustering from data streams
 
JFEF encoding
JFEF encodingJFEF encoding
JFEF encoding
 
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
 
Deep Neural Networks for Automatic detection of Sreams and Shouted Speech in ...
Deep Neural Networks for Automatic detection of Sreams and Shouted Speech in ...Deep Neural Networks for Automatic detection of Sreams and Shouted Speech in ...
Deep Neural Networks for Automatic detection of Sreams and Shouted Speech in ...
 
The performance of turbo codes for wireless communication systems
The performance of turbo codes for wireless communication systemsThe performance of turbo codes for wireless communication systems
The performance of turbo codes for wireless communication systems
 
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDADynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
 
Rate Distortion Performance for Joint Source Channel Coding of JPEG image Ove...
Rate Distortion Performance for Joint Source Channel Coding of JPEG image Ove...Rate Distortion Performance for Joint Source Channel Coding of JPEG image Ove...
Rate Distortion Performance for Joint Source Channel Coding of JPEG image Ove...
 
270 273
270 273270 273
270 273
 
International Journal for Research in Applied Science & Engineering
International Journal for Research in Applied Science & EngineeringInternational Journal for Research in Applied Science & Engineering
International Journal for Research in Applied Science & Engineering
 
Compression
CompressionCompression
Compression
 
A low complex modified golden
A low complex modified goldenA low complex modified golden
A low complex modified golden
 
ICRA Nathan Piasco
ICRA Nathan PiascoICRA Nathan Piasco
ICRA Nathan Piasco
 

Viewers also liked

Citta metropolitana adotta_indeciso
Citta metropolitana adotta_indecisoCitta metropolitana adotta_indeciso
Citta metropolitana adotta_indecisopdnavigli
 
Africa quiz review
Africa quiz reviewAfrica quiz review
Africa quiz reviewkmdonahue2
 
effective-time-mangement 212
effective-time-mangement 212effective-time-mangement 212
effective-time-mangement 212Geetha Mohan
 
Facebook page ebook2011
Facebook page ebook2011Facebook page ebook2011
Facebook page ebook2011cdstovall
 
Czech republic re
Czech republic re Czech republic re
Czech republic re lilitome
 
Africa quiz review
Africa quiz reviewAfrica quiz review
Africa quiz reviewkmdonahue2
 
Scaling algebraic multigrid to over 287K processors
Scaling algebraic multigrid to over 287K processorsScaling algebraic multigrid to over 287K processors
Scaling algebraic multigrid to over 287K processorsMarkus Blatt
 
Narrative for Social Games
Narrative for Social GamesNarrative for Social Games
Narrative for Social GamesJonathon Myers
 
Africa quiz review
Africa quiz reviewAfrica quiz review
Africa quiz reviewkmdonahue2
 
Encontro presencial final tecnologias na educacao
Encontro presencial final tecnologias na educacaoEncontro presencial final tecnologias na educacao
Encontro presencial final tecnologias na educacaoNte Sjp
 
История поискового отряда
История поискового отрядаИстория поискового отряда
История поискового отрядаzeklip
 
Food: Business, Movement, or Both? WB Rosenzweig
Food: Business, Movement, or Both? WB RosenzweigFood: Business, Movement, or Both? WB Rosenzweig
Food: Business, Movement, or Both? WB RosenzweigWilliam Rosenzweig
 
Czech republic real estate investment & economic outlook q4 2011
Czech republic real estate investment & economic outlook q4 2011 Czech republic real estate investment & economic outlook q4 2011
Czech republic real estate investment & economic outlook q4 2011 lilitome
 
finel report on india map project
finel report on india map projectfinel report on india map project
finel report on india map projectAshish Sharma
 

Viewers also liked (19)

About me D-G
About me D-GAbout me D-G
About me D-G
 
Citta metropolitana adotta_indeciso
Citta metropolitana adotta_indecisoCitta metropolitana adotta_indeciso
Citta metropolitana adotta_indeciso
 
Africa quiz review
Africa quiz reviewAfrica quiz review
Africa quiz review
 
effective-time-mangement 212
effective-time-mangement 212effective-time-mangement 212
effective-time-mangement 212
 
Facebook page ebook2011
Facebook page ebook2011Facebook page ebook2011
Facebook page ebook2011
 
Czech republic re
Czech republic re Czech republic re
Czech republic re
 
Africa quiz review
Africa quiz reviewAfrica quiz review
Africa quiz review
 
Scaling algebraic multigrid to over 287K processors
Scaling algebraic multigrid to over 287K processorsScaling algebraic multigrid to over 287K processors
Scaling algebraic multigrid to over 287K processors
 
Beautiful
BeautifulBeautiful
Beautiful
 
Narrative for Social Games
Narrative for Social GamesNarrative for Social Games
Narrative for Social Games
 
Africa quiz review
Africa quiz reviewAfrica quiz review
Africa quiz review
 
Encontro presencial final tecnologias na educacao
Encontro presencial final tecnologias na educacaoEncontro presencial final tecnologias na educacao
Encontro presencial final tecnologias na educacao
 
История поискового отряда
История поискового отрядаИстория поискового отряда
История поискового отряда
 
Computer maitenance
Computer maitenanceComputer maitenance
Computer maitenance
 
Food: Business, Movement, or Both? WB Rosenzweig
Food: Business, Movement, or Both? WB RosenzweigFood: Business, Movement, or Both? WB Rosenzweig
Food: Business, Movement, or Both? WB Rosenzweig
 
Czech republic real estate investment & economic outlook q4 2011
Czech republic real estate investment & economic outlook q4 2011 Czech republic real estate investment & economic outlook q4 2011
Czech republic real estate investment & economic outlook q4 2011
 
finel report on india map project
finel report on india map projectfinel report on india map project
finel report on india map project
 
Core Values
Core ValuesCore Values
Core Values
 
Examples of Company Core Values
Examples of Company Core ValuesExamples of Company Core Values
Examples of Company Core Values
 

Similar to DUNE on current and next generation HPC Platforms

PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...Jinwon Lee
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019Codemotion
 
2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizona2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizonac.titus.brown
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
Massive MapReduce Matrix Computations & Multicore Graph Algorithms
Massive MapReduce Matrix Computations & Multicore Graph AlgorithmsMassive MapReduce Matrix Computations & Multicore Graph Algorithms
Massive MapReduce Matrix Computations & Multicore Graph AlgorithmsDavid Gleich
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
Jeff Johnson, Research Engineer, Facebook at MLconf NYC
Jeff Johnson, Research Engineer, Facebook at MLconf NYCJeff Johnson, Research Engineer, Facebook at MLconf NYC
Jeff Johnson, Research Engineer, Facebook at MLconf NYCMLconf
 
Adams_SIAMCSE15
Adams_SIAMCSE15Adams_SIAMCSE15
Adams_SIAMCSE15Karen Pao
 
MapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisMapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisDavid Gleich
 
Exact Inference in Bayesian Networks using MapReduce (Hadoop Summit 2010)
Exact Inference in Bayesian Networks using MapReduce (Hadoop Summit 2010)Exact Inference in Bayesian Networks using MapReduce (Hadoop Summit 2010)
Exact Inference in Bayesian Networks using MapReduce (Hadoop Summit 2010)Alex Kozlov
 
Dna computing
Dna computingDna computing
Dna computingcsr522
 
Modeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John GloverModeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John GloverSebastian Ruder
 

Similar to DUNE on current and next generation HPC Platforms (20)

PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
 
2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizona2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizona
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
Shuronr
ShuronrShuronr
Shuronr
 
Massive MapReduce Matrix Computations & Multicore Graph Algorithms
Massive MapReduce Matrix Computations & Multicore Graph AlgorithmsMassive MapReduce Matrix Computations & Multicore Graph Algorithms
Massive MapReduce Matrix Computations & Multicore Graph Algorithms
 
Human parsing
Human parsingHuman parsing
Human parsing
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
 
Sang_Graphormer.pdf
Sang_Graphormer.pdfSang_Graphormer.pdf
Sang_Graphormer.pdf
 
NS-CUK Seminar: S.T.Nguyen, Review on "Do Transformers Really Perform Bad for...
NS-CUK Seminar: S.T.Nguyen, Review on "Do Transformers Really Perform Bad for...NS-CUK Seminar: S.T.Nguyen, Review on "Do Transformers Really Perform Bad for...
NS-CUK Seminar: S.T.Nguyen, Review on "Do Transformers Really Perform Bad for...
 
2014 nci-edrn
2014 nci-edrn2014 nci-edrn
2014 nci-edrn
 
deep CNN vs conventional ML
deep CNN vs conventional MLdeep CNN vs conventional ML
deep CNN vs conventional ML
 
Jeff Johnson, Research Engineer, Facebook at MLconf NYC
Jeff Johnson, Research Engineer, Facebook at MLconf NYCJeff Johnson, Research Engineer, Facebook at MLconf NYC
Jeff Johnson, Research Engineer, Facebook at MLconf NYC
 
Adams_SIAMCSE15
Adams_SIAMCSE15Adams_SIAMCSE15
Adams_SIAMCSE15
 
Chapter 4 better.pptx
Chapter 4 better.pptxChapter 4 better.pptx
Chapter 4 better.pptx
 
MapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisMapReduce for scientific simulation analysis
MapReduce for scientific simulation analysis
 
Exact Inference in Bayesian Networks using MapReduce (Hadoop Summit 2010)
Exact Inference in Bayesian Networks using MapReduce (Hadoop Summit 2010)Exact Inference in Bayesian Networks using MapReduce (Hadoop Summit 2010)
Exact Inference in Bayesian Networks using MapReduce (Hadoop Summit 2010)
 
Dna computing
Dna computingDna computing
Dna computing
 
Modeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John GloverModeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John Glover
 

DUNE on current and next generation HPC Platforms

  • 1. Title: DUNE on current and next generation HPC Platforms
    Dr. Markus Blatt, HPC-Simulation-Software & Services
    Forschungszentrum Jülich, Germany, March 8, 2012
  • 2. Outline
    1 DUNE
    2 Parallelization: Parallel Grid Interface; Parallel Iterative Solvers; Parallel Algebraic Multigrid; Scalability; A Glimpse at other DUNE projects
    3 Trends and Outlook for HPC and DUNE
    4 HPC-Simulation-Software & Services
  • 3. DUNE: Why we created DUNE!
    Problems with most PDE software:
      • Mostly support one special set of features:
          • IPARS: block structured, parallel, multiphysics.
          • Alberta: simplicial, unstructured, bisection refinement.
          • UG: unstructured, multi-element, red-green refinement, parallel.
          • QuocMesh: fast, on-the-fly structured grids.
      • Other features are either not supported or supported only inefficiently.
    The idea of DUNE:
      • Separation of data structures and algorithms.
      • Easy exchange of interface implementations.
      • Reuse of legacy software.
      • Fine-grained interfaces.
      • C++ with templates.
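The toy example below illustrates the design goals on slide 3: data structures separated from algorithms, exchangeable implementations, and C++ templates. It is a sketch only and is not DUNE's grid interface. `UniformGrid1D`, `GradedGrid1D` and `integrateConstant` are hypothetical names. A single generic algorithm is written against a tiny compile-time interface (`size()`, `cellVolume(i)`) and is instantiated for two interchangeable implementations, with no virtual calls.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical implementation 1: n cells of equal width h.
struct UniformGrid1D {
  std::size_t n;
  double h;
  std::size_t size() const { return n; }
  double cellVolume(std::size_t) const { return h; }
};

// Hypothetical implementation 2: arbitrary sorted node coordinates.
struct GradedGrid1D {
  std::vector<double> nodes;
  std::size_t size() const { return nodes.size() - 1; }
  double cellVolume(std::size_t i) const { return nodes[i + 1] - nodes[i]; }
};

// Generic algorithm written once against the shared interface. The grid type
// is a template parameter, so the compiler instantiates (and can inline) a
// separate version per implementation: static polymorphism.
template <class Grid>
double integrateConstant(const Grid& grid, double value) {
  double sum = 0.0;
  for (std::size_t i = 0; i < grid.size(); ++i) sum += value * grid.cellVolume(i);
  return sum;
}
```

For example, `integrateConstant(UniformGrid1D{4, 0.25}, 2.0)` and `integrateConstant(GradedGrid1D{{0.0, 0.1, 0.5, 1.0}}, 2.0)` both return 2 up to rounding. Swapping the grid changes only the type, not the algorithm.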
  • 4. DUNE: Modularity
    [Figure: module diagram with dune-common, dune-grid, dune-istl, dune-localfunctions, dune-grid-howto, dune-pdelab, dune-fem, dune-pdelab-howto and the external packages UG, Alberta, ALU, Metis, SuperLU, NeuronGrid, VTK, Gmsh]
      • Grid interface: (non-)conforming hierarchical grid interface.
      • Iterative Solver Template Library: dense and sparse linear algebra.
      • PDELab: discretization module based on a residual formulation.
  • 5. DUNE: PDELab: Plug, Code and Play Simulation Software
      • Choose:
          • Grid.
          • Finite element.
          • Maybe implement a local operator.
          • Time-stepping scheme.
          • (Non-)linear solvers.
      • Recompile the application.
      • Run efficiently.
  • 6. DUNE: Sample Simulations
    [Figures: transport in porous media, density-driven flow, flow around root networks, neuron network simulations]
      • Electromagnetics
      • Computational neuroscience: biophysically realistic networks of neurons
      • Geostatistical inversion: coping with uncertain parameters
      • Linear acoustics
      • Multiphase flow and transport in porous media
      • Density-driven flow
  • 7. Parallelization
    Parallel grids:
      • Domain decomposition (overlapping or non-overlapping).
      • Load balancing.
      • Message passing based on MPI is handled by the grid manager.
    Parallel linear algebra:
      • Message passing decoupled from the grid.
      • Abstraction: parallel index sets to identify data globally.
      • Reuse of efficient sequential linear algebra.
      • Minimize communication.
  • 8. Parallelization / Parallel Grid Interface: An Example of a Parallel Grid
    [Figure: a grid distributed over processes c = 0, 1, 2. First row: with overlap and ghosts; second row: with overlap only; third row: with ghosts only. Legend: interior, overlap, ghost, border, front, not stored.]
  • 9. Parallelization / Parallel Grid Interface: Parallel Grids in Dune
      • YaspGrid: structured; 2D/3D; arbitrary overlap.
      • UGGrid: unstructured; 2D/3D; multi-element; one layer of ghost cells; (conforming) red-green refinement; (non-free!).
      • ALUGrid: unstructured; 3D; tetrahedral or hexahedral elements; ghost cells; (non-conforming) bisection refinement.
  • 12-14. Parallelization / Parallel Iterative Solvers: Index Sets
    Index set:
      • Distributed overlapping index set $I = \bigcup_{p=0}^{P-1} I_p$.
      • Process $p$ manages the mapping $I_p \to [0, n_p)$.
      • It might only store information about the mapping for shared indices.
    Global index:
      • Identifies a position (index) globally.
      • Arbitrary and not consecutive (to support adaptivity).
      • Persistent.
      • On JUGENE this is not an int, to get rid of the 32-bit limit!
    Local index:
      • Addresses a position in the local container.
      • Convertible to an integral type.
      • Consecutive, starting from 0.
      • Non-persistent.
      • Provides an attribute to identify the ghost region.
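The sketch below models one process's index set as described on slides 12-14. It maps arbitrary, non-consecutive 64-bit global indices to consecutive local indices, each tagged with an attribute. It is not the DUNE `ParallelIndexSet` API. `IndexSetSketch` and the attribute names are hypothetical and only loosely follow the owner/overlap/ghost roles on the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

// Hypothetical attribute flags (owner / overlap / ghost roles from the slides).
enum class Attribute { owner, overlap, ghost };

// 64-bit global indices avoid the 32-bit limit mentioned for JUGENE.
using GlobalIndex = std::int64_t;

struct LocalIndex {
  std::size_t local;    // consecutive position in the local container, from 0
  Attribute attribute;  // e.g. identifies the ghost region
};

// Minimal sketch of one process's index set: global -> (local, attribute).
class IndexSetSketch {
 public:
  // Registers a global index; it receives the next consecutive local index.
  std::size_t add(GlobalIndex g, Attribute a) {
    auto [it, inserted] = map_.emplace(g, LocalIndex{map_.size(), a});
    if (!inserted) throw std::invalid_argument("duplicate global index");
    return it->second.local;
  }
  // Looks up a global index; throws std::out_of_range if it is not stored here.
  const LocalIndex& operator[](GlobalIndex g) const { return map_.at(g); }
  std::size_t size() const { return map_.size(); }

 private:
  std::unordered_map<GlobalIndex, LocalIndex> map_;
};
```

Global indices such as `1000000000000` and `7` can be added in any order. Their local indices are still `0` and `1`, which is the separation between persistent global numbering and compact, non-persistent local storage that the slides describe.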
  • 15. Parallelization / Parallel Iterative Solvers: Remote Information and Communication
    Remote index information:
      • For each process q, process p knows all common global indices together with their attribute on q.
    Communication:
      • Target and source partitions of the index set are chosen using attribute flags, e.g. from ghost to owner and ghost.
      • If remote index information of q is available on p, then p sends all the data in one message.
      • All communication takes place asynchronously at the same time.
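The following sketch shows how attribute flags could select what to send. For every neighbor q, it keeps the shared indices whose attribute on p lies in the source set and whose attribute on q lies in the target set, which yields one message per neighbor. `buildSendLists` and `RemoteIndexEntry` are hypothetical names. No MPI calls are made; the function only builds the lists that would be packed into those messages.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

enum class Attribute { owner, overlap, ghost };

// One shared index as seen by process p: its global index, its attribute on p,
// and its attribute on the remote process q.
struct RemoteIndexEntry {
  std::int64_t global;
  Attribute localAttribute;
  Attribute remoteAttribute;
};

// For each neighbor q, collect the global indices p must send: local attribute
// in `source` and remote attribute in `target`. Each list is one message.
std::map<int, std::vector<std::int64_t>> buildSendLists(
    const std::map<int, std::vector<RemoteIndexEntry>>& remoteIndices,
    const std::set<Attribute>& source, const std::set<Attribute>& target) {
  std::map<int, std::vector<std::int64_t>> sendLists;
  for (const auto& [q, entries] : remoteIndices) {
    for (const RemoteIndexEntry& e : entries) {
      if (source.count(e.localAttribute) && target.count(e.remoteAttribute)) {
        sendLists[q].push_back(e.global);
      }
    }
  }
  return sendLists;
}
```

Take the slide's example "from ghost to owner and ghost", i.e. `source = {ghost}` and `target = {owner, ghost}`. Only indices that are ghosts on p and owner or ghost on q end up in the message to q. Neighbors with nothing to receive get no message at all.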
  • 16. Parallelization / Parallel Iterative Solvers: Parallel Matrix Representation
      • Let $I_i$ be a nonoverlapping decomposition of our index set $I$.
      • $\tilde{I}_i$ is the augmented index set such that for all $j \in I_i$ and all $k \in I$ with $|a_{kj}| + |a_{jk}| \neq 0$, also $k \in \tilde{I}_i$ holds.
      • Then the locally stored matrix, with block rows and columns associated with $I_i$ and $\tilde{I}_i \setminus I_i$, looks like
        $$\begin{pmatrix} A_{ii} & * \\ 0 & I \end{pmatrix}$$
      • Therefore $Av$ can be computed locally for the entries associated with $I_i$ if $v$ is known on $\tilde{I}_i$.
      • A communication step ensures consistent ghost values.
      • The matrix can be used for hybrid preconditioners.
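To see why this block structure is enough, the toy sketch below simulates one process of a 1D Poisson matrix split over two processes. Storage is dense, and `localMatrix` and `multiply` are hypothetical helpers, not the DUNE-ISTL data structures. Owned rows are copied from the global matrix, ghost rows become identity rows, and a purely local product then reproduces the global result on the owned entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // dense storage, illustration only

// Builds the matrix stored on one (simulated) process. `augmented` lists the
// global indices of the augmented set; its first `numOwned` entries form I_i.
// Owned rows are copied from A (restricted to the augmented columns); the
// remaining ghost rows become identity rows (the "0 I" block row).
Matrix localMatrix(const Matrix& A, const std::vector<std::size_t>& augmented,
                   std::size_t numOwned) {
  const std::size_t m = augmented.size();
  Matrix L(m, std::vector<double>(m, 0.0));
  for (std::size_t r = 0; r < m; ++r) {
    if (r >= numOwned) {
      L[r][r] = 1.0;
      continue;
    }
    const std::size_t globalRow = augmented[r];
    for (std::size_t globalCol = 0; globalCol < A.size(); ++globalCol) {
      if (A[globalRow][globalCol] == 0.0) continue;
      auto pos = std::find(augmented.begin(), augmented.end(), globalCol);
      // Augmentation condition: every coupling of an owned row must be present.
      if (pos == augmented.end()) throw std::invalid_argument("augmented set too small");
      L[r][static_cast<std::size_t>(pos - augmented.begin())] = A[globalRow][globalCol];
    }
  }
  return L;
}

// Plain local matrix-vector product; no communication involved.
std::vector<double> multiply(const Matrix& M, const std::vector<double>& v) {
  std::vector<double> y(M.size(), 0.0);
  for (std::size_t r = 0; r < M.size(); ++r)
    for (std::size_t c = 0; c < v.size(); ++c) y[r] += M[r][c] * v[c];
  return y;
}
```

Consider the 6 × 6 matrix tridiag(-1, 2, -1) with owned indices {0, 1, 2} and augmented set {0, 1, 2, 3}. The local product with `{1, 0, 2, 0}` gives 2, -3, 4 on the owned rows, the same as the first three entries of the global product with `{1, 0, 2, 0, 3, 0}`. The ghost row only returns its input, so a communication step has to make it consistent afterwards.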
  • 17. Parallelization / Parallel Algebraic Multigrid: Algebraic Multigrid (AMG)
    Stationary iterative methods:
      • Error reduction stagnates with increasing number of iterations and unknowns.
      • They efficiently reduce only high-frequency errors.
    Algebraic multigrid:
      • Approximate the smooth residual on a coarser grid and solve there.
      • Calculate a correction there.
      • Interpolate the correction to the fine grid and add it to the current guess.
      • Use the algebraic nature of the problem to define the coarse level.
      • Coarsening adapts to problem and grid automatically.
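The claim that stationary methods only damp high-frequency errors can be checked on the 1D Poisson matrix. The sketch below uses damped Jacobi with $\omega = 2/3$, chosen only for illustration since the slide names no particular smoother. `jacobiErrorReduction` is a hypothetical helper: it starts from a single sine error mode and reports how much of its norm remains after a number of sweeps.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Applies `sweeps` damped Jacobi iterations (omega = 2/3) to A e = 0, where
// A = tridiag(-1, 2, -1) has n interior unknowns and zero Dirichlet values.
// The initial error is the Fourier mode sin(k*pi*j/(n+1)), j = 1..n.
// Returns ||e_final|| / ||e_initial||.
double jacobiErrorReduction(std::size_t n, std::size_t k, int sweeps) {
  assert(n > 0 && k >= 1 && k <= n);
  const double pi = std::acos(-1.0);
  const double omega = 2.0 / 3.0;
  std::vector<double> e(n), next(n);
  for (std::size_t j = 0; j < n; ++j)
    e[j] = std::sin(static_cast<double>(k) * pi * static_cast<double>(j + 1) / (n + 1.0));
  auto norm = [](const std::vector<double>& x) {
    double sum = 0.0;
    for (double xi : x) sum += xi * xi;
    return std::sqrt(sum);
  };
  const double initial = norm(e);
  for (int s = 0; s < sweeps; ++s) {
    for (std::size_t j = 0; j < n; ++j) {
      const double left = j > 0 ? e[j - 1] : 0.0;
      const double right = j + 1 < n ? e[j + 1] : 0.0;
      const double residual = -(2.0 * e[j] - left - right);  // r = 0 - A e
      next[j] = e[j] + omega * residual / 2.0;               // diagonal D = 2 I
    }
    e.swap(next);
  }
  return norm(e) / initial;
}
```

Each sine mode is an eigenvector of the iteration, so the ratio equals $|1 - \omega(1 - \cos(k\pi/(n+1)))|^{\text{sweeps}}$. With n = 31 and 10 sweeps, the smoothest mode (k = 1) keeps about 97% of its norm, while the modes k = 16 and k = 31 shrink by roughly five orders of magnitude. The surviving smooth error is what the coarse-level correction has to remove.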
  • 18. Parallelization / Parallel Algebraic Multigrid: Aggregation AMG
    Simple, non-smoothed version:
      • Piecewise constant prolongators $P_l$.
      • Heuristic and greedy aggregation algorithm.
      • $A_{l-1} = P_l^T A_l P_l$.
      • Proposed by Raw, Vanek et al., Braess.
      • Preconditioner for Krylov methods.
    Observations:
      • Reasonable coarse grid operator for systems.
      • Preserves FV discretization.
      • Very memory efficient.
      • Fast and scalable V-cycle.
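The sketch below shows the two ingredients named on slide 18: greedy aggregation and the Galerkin product with a piecewise constant prolongator. It is a simplified stand-in, not DUNE's AMG. It uses dense matrices, has no strength-of-connection criterion and no size limits. `greedyAggregate`, `galerkinProduct` and `Aggregation` are hypothetical names. Because $P_l$ is piecewise constant, each coarse entry is simply the sum of the fine entries coupling two aggregates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // dense storage, illustration only

struct Aggregation {
  std::vector<std::size_t> aggregateOf;  // aggregate number of each fine vertex
  std::size_t count = 0;                 // number of aggregates = coarse unknowns
};

// Simplified greedy aggregation on the matrix graph: each still unaggregated
// vertex seeds a new aggregate together with all unaggregated neighbors.
Aggregation greedyAggregate(const Matrix& A) {
  const std::size_t n = A.size();
  const std::size_t unassigned = n;  // no valid aggregate number equals n
  Aggregation agg{std::vector<std::size_t>(n, unassigned), 0};
  for (std::size_t i = 0; i < n; ++i) {
    if (agg.aggregateOf[i] != unassigned) continue;
    agg.aggregateOf[i] = agg.count;
    for (std::size_t j = 0; j < n; ++j)
      if (j != i && A[i][j] != 0.0 && agg.aggregateOf[j] == unassigned)
        agg.aggregateOf[j] = agg.count;
    ++agg.count;
  }
  return agg;
}

// Galerkin product A_{l-1} = P^T A_l P with P_{iI} = 1 if fine vertex i lies
// in aggregate I and 0 otherwise: coarse(I, J) = sum of a_ij, i in I, j in J.
Matrix galerkinProduct(const Matrix& A, const Aggregation& agg) {
  Matrix coarse(agg.count, std::vector<double>(agg.count, 0.0));
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < A.size(); ++j)
      coarse[agg.aggregateOf[i]][agg.aggregateOf[j]] += A[i][j];
  return coarse;
}
```

For the 4 × 4 matrix tridiag(-1, 2, -1), the greedy pass forms the pairs {0, 1} and {2, 3}. The coarse matrix is [[2, -1], [-1, 2]], the same three-point stencil again, which is one way to see the "preserves FV discretization" observation.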
  • 20. Parallelization / Parallel Algebraic Multigrid: Illustration of the Parallel Setup
    [Figure: decoupled aggregation]
  • 21. Parallelization / Parallel Algebraic Multigrid: Illustration of the Parallel Setup
    [Figure: communicate ghost aggregates]
  • 22. Parallelization / Parallel Algebraic Multigrid: Parallel Setup Phase
      • Every process builds aggregates in its owner region.
      • One communication with the nearest neighbors updates the aggregate information in the ghost region.
      • Coarse level index sets are a subset of the fine level index sets.
      • Remote index information can be deduced locally.
      • The Galerkin product can be calculated locally.
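Here is a toy version of decoupled aggregation. The vertices of a 1D chain are distributed over simulated processes, each process aggregates only the vertices it owns, and each aggregate is named by the global index of its seed vertex. `decoupledAggregation` and its pairwise rule are hypothetical simplifications. The sketch only shows two properties: aggregates never cross process boundaries, and the coarse global indices form a subset of the fine ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy decoupled aggregation on a 1D chain of global vertices 0..n-1, where
// owner[g] is the (simulated) process owning vertex g. A vertex only joins an
// aggregate seeded on the same process, so no aggregate crosses a process
// boundary. Each aggregate is named by the global index of its seed vertex,
// so the coarse global indices are a subset of the fine global indices.
std::vector<std::int64_t> decoupledAggregation(const std::vector<int>& owner) {
  const std::size_t n = owner.size();
  std::vector<std::int64_t> coarseIndex(n, -1);  // -1: not yet aggregated
  for (std::size_t g = 0; g < n; ++g) {
    if (coarseIndex[g] != -1) continue;
    const auto seed = static_cast<std::int64_t>(g);
    coarseIndex[g] = seed;
    // The left neighbor was visited earlier, so only the right chain neighbor
    // can still be free; it joins only if the same process owns it.
    if (g + 1 < n && owner[g + 1] == owner[g] && coarseIndex[g + 1] == -1)
      coarseIndex[g + 1] = seed;
  }
  return coarseIndex;
}
```

For owners `{0, 0, 0, 1, 1, 1}` the result is `{0, 0, 2, 3, 3, 5}`. Process 0 forms the aggregates {0, 1} and {2}, process 1 forms {3, 4} and {5}, and the coarse index set {0, 2, 3, 5} is contained in the fine one.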
  • 23. Parallelization / Parallel Algebraic Multigrid: Data Agglomeration on Coarse Levels
      • Repartition the data onto fewer processes.
      • Use METIS on the graph of the communication pattern (ParMETIS cannot handle the full machine!).
    [Figure: number of vertices per processor vs. number of non-idle processors (1 to n) across levels L, L−1, …, L−7, with coarsening steps, a target size, and repartitioned levels (L−2)', L−4', L−6']
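The slide gives no formula for when to agglomerate. The sketch below is one hypothetical policy. All processes stay active while each still holds enough coarse vertices. Otherwise the number of active processes shrinks so that each keeps at least `minPerProcess` vertices. The parameters `coarseningFactor` and `minPerProcess` and the function `planAgglomeration` are assumptions, not DUNE settings. Which processes receive the data (chosen with METIS on the communication graph, per the slide) is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct LevelInfo {
  std::int64_t vertices;         // global number of vertices on this level
  std::int64_t activeProcesses;  // processes still holding data on this level
};

// Hypothetical agglomeration policy: each coarsening step divides the vertex
// count by `coarseningFactor`; whenever the average number of vertices per
// active process drops below `minPerProcess`, the data are repartitioned onto
// as many processes as still get at least `minPerProcess` vertices (min. 1).
std::vector<LevelInfo> planAgglomeration(std::int64_t fineVertices,
                                         std::int64_t processes,
                                         std::int64_t coarseningFactor,
                                         std::int64_t minPerProcess,
                                         int levels) {
  assert(processes > 0 && coarseningFactor > 1 && minPerProcess > 0);
  std::vector<LevelInfo> plan;
  std::int64_t vertices = fineVertices;
  std::int64_t active = processes;
  for (int level = 0; level < levels && vertices > 0; ++level) {
    if (vertices / active < minPerProcess)
      active = std::max<std::int64_t>(1, vertices / minPerProcess);
    plan.push_back({vertices, active});
    vertices /= coarseningFactor;
  }
  return plan;
}
```

For example, take 10^6 vertices on 1000 processes, a coarsening factor of 8 and at least 100 vertices per process. All 1000 processes stay active on the first two levels, the third level (15625 vertices) moves to 156 processes, and the sixth level ends on a single process. The number of active processes never increases.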
  • 24. Parallelization / Scalability: Weak Scalability Results (Poisson)

         procs   1/H  lev.      TB      TS   It     TIt      TT
             1    80     5   19.86   31.91    8   3.989   51.77
             8   160     6    27.7    46.4   10    4.64    74.2
            64   320     7    74.1    49.3   10    4.93     123
           512   640     8   76.91    60.2   12   5.017   137.1
          4096  1280    10   81.31   64.45   13   4.958   145.8
         32768  2560    11   92.75   65.55   13   5.042   158.3
        262144  5120    12   188.5   67.66   13   5.205   256.2
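The table's columns satisfy $T_T = T_B + T_S$ and $T_{It} = T_S / It$. Reading $T_B$ as the AMG setup time and $T_S$ as the solve time is an interpretation; the slide does not define the columns. On that basis, a weak-scaling efficiency relative to one process can be worked out from the first and last rows:

$$
\begin{aligned}
T_T(1) &= 19.86 + 31.91 = 51.77, \qquad T_{It}(1) = \frac{31.91}{8} \approx 3.989,\\
E_T(262144) &= \frac{T_T(1)}{T_T(262144)} = \frac{51.77}{256.2} \approx 0.20,\\
E_{It}(262144) &= \frac{T_{It}(1)}{T_{It}(262144)} = \frac{3.989}{5.205} \approx 0.77.
\end{aligned}
$$

The time per iteration grows only moderately. Most of the growth in total time comes from $T_B$ (19.86 to 188.5) and from the increase from 8 to 13 iterations.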
  • 25. Parallelization / Scalability: Clipped Log-Random Problem
      • $-\nabla \cdot (k(x) \nabla u) = f$ in $\Omega$.
      • $\kappa(x)$: realization of a log-random field with variance $\sigma^2$, mean 0, and correlation length $\lambda$.
      • $k(x)$: binary medium constructed from $\kappa(x)$.
      • Weak scaling: $\lambda$ scales with the mesh width $h$.
  • 26. Parallelization / Scalability: Weak Scalability Results (Poisson vs. Clipped)
    [Figure: weak scalability of the Poisson problem compared with the clipped log-random problem]
  • 27. Parallelization / Scalability: Weak Scalability Results (Clipped Log-Random Problem), $\sigma^2 = 8$, $\lambda = 4h$

         procs   1/h  lev.      TB      TS   It     TIt      TT
             1    80     5   19.93   49.39   12   4.116   69.32
             8   160     6    28.1    73.7   15    4.91     102
            64   320     7    75.1     105   20    5.26     180
           512   640     8   80.11     134   25   5.362   214.1
          4096  1280    10   84.71   171.7   33   5.203   256.4
         32768  2560    11   93.24   189.5   36   5.264   282.7
        262144  5120    12   195.9   386.5   72   5.368   582.5
  • 28. Parallelization / Scalability: Parallel Groundwater Simulation
    Figure: Cut through the ground beneath an acre
      • Highly discontinuous permeability of the ground.
      • 3D simulations with high resolution.
      • Efficient and robust parallel iterative solvers.
    $$-\nabla \cdot (K(x) \nabla u) = f \quad \text{in } \Omega \qquad (1)$$
  • 29. Parallelization / Scalability: Weak Scalability Results II
      • Richards equation.
      • 64 × 64 × 128 unknowns per process.
      • $1.25 \cdot 10^{11}$ unknowns on the full JUGENE.
      • One time step in the simulation.
  • 30. Parallelization / Scalability: Efficiency: Solver Components
    [Figure: efficiency of the individual solver components]
  • 31. Parallelization / Scalability: Efficiency: IO
      • Highly tuned by Olaf Ippisch.
      • SIONlib from Jülich rocks!
      • Still: IO is not very scalable!
  • 32. Parallelization / A Glimpse at other DUNE projects: Projects using DUNE-FEM on JUGENE
      • DFG SPP MetStröm: Adaptive Numerics for Multiscale Phenomena
          D. Kröner, S. Brdar (Freiburg); M. Baldauf, D. Schuster (DWD); R. Klöfkorn (Stuttgart); A. Dedner (Warwick)
          [Figure: mountain wave test case, work by S. Brdar]
      • BW Stiftung HPC-11: Simulation of 2-stroke engines with detailed combustion
          D. Kröner, D. Lebiedz, M. Nolte, M. Fein (Freiburg); R. Klöfkorn (Stuttgart); A. Dedner (Warwick)
          [Figure: work by D. Trescher]
    (Robert Klöfkorn)
  • 33. Parallelization / A Glimpse at other DUNE projects: Strong Scaling on Blue Gene/P
    Table: Strong scaling and efficiency on the supercomputer JUGENE (Jülich, Germany)

      #cores  #cells/core [1]  #DOFs/core  time (ms) [2]  speed-up  efficiency
         512              474      296250          46216         -           -
        4096               59       36875           6294      7.34        0.91
       32768                7        4375            949     48.71        0.76
       65536                3        1875            504     91.70        0.72

    [1] average #cells/core; [2] average run time per time step
      • Navier-Stokes equations solved with CDG2 (k = 4, 3D).
      • Overall number of cells: 243 000 (#DOFs ≈ $1.52 \cdot 10^{8}$) on a Cartesian grid.
      • ≈ 6.1 GB memory consumption on a desktop machine.
      • Explicit Runge-Kutta method of order 3.
      • Programming techniques for performance:
          • template metaprogramming
          • automated code generation of DG kernels
          • hybrid parallelization (MPI/pthreads, work in progress)
          • overlap of computation and communication (DCMF_INTERRUPT=1)
    (Robert Klöfkorn)
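The speed-up and efficiency columns follow from the timings via $S(p) = T(512)/T(p)$ and $E(p) = S(p)/(p/512)$. The small helper below recomputes them; `strongScaling` and its structs are hypothetical names. Fed with the table's times, it reproduces the reported values to within about 0.01.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct StrongScalingRun {
  double cores;   // number of cores used
  double timeMs;  // average run time per time step in ms
};

struct ScalingMetrics {
  double speedUp;     // S(p) = T(p0) / T(p)
  double efficiency;  // E(p) = S(p) / (p / p0)
};

// Speed-up and parallel efficiency of each run relative to the first run,
// which is taken as the baseline with p0 cores.
std::vector<ScalingMetrics> strongScaling(const std::vector<StrongScalingRun>& runs) {
  assert(!runs.empty());
  const StrongScalingRun& base = runs.front();
  std::vector<ScalingMetrics> metrics;
  for (const StrongScalingRun& run : runs) {
    const double speedUp = base.timeMs / run.timeMs;
    metrics.push_back({speedUp, speedUp / (run.cores / base.cores)});
  }
  return metrics;
}
```

With `{{512, 46216}, {4096, 6294}, {32768, 949}, {65536, 504}}` the helper yields speed-ups of about 7.34, 48.70 and 91.70 and efficiencies of about 0.92, 0.76 and 0.72. The small differences from the table come from rounding.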
  • 34. Trends and Outlook for HPC and DUNE: (My) Parallel Machines
      • Helics I (2003): 256 nodes; dual AMD Athlon 1.4 GHz; 5.9 GFLOPS peak/node; 1 GB main memory/node; Myrinet 2 Gbit.
      • Helics II (2007): 156+4 nodes; 2x dual-core AMD Opteron 2220, 2.8 GHz; 18.8 GFLOPS peak/node; 8 GB RAM/node (21.4 GB/s); Myricom 10 Gbit.
      • Blue Gene/P (2009): 73728 nodes; quad-core PowerPC 450, 850 MHz; 13.6 GFLOPS peak/node; 2 GB RAM/node (13.6 GB/s).
  • 35. Trends and Outlook for HPC and DUNE: Observations in Parallel Computing
    Software:
      • Solution of time-dependent (nonlinear) equations with implicit time-stepping schemes.
      • Most time-consuming part: solution of the linear systems.
      • Peak GFLOPS out of reach!
      • Methods are limited by memory bandwidth.
    Hardware:
      • Costs for compute power drop fast (2002: 12 USD/MFLOP, 2011: 0.01 USD/MFLOP).
      • Costs for main memory drop only slightly.
      • Main memory is not power efficient.
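A rough bound shows why memory bandwidth, not peak GFLOPS, limits sparse solvers. For illustration only, assume a sparse matrix-vector product in a CSR-like format with 8-byte values and 4-byte column indices. Count only the matrix traffic: 2 flops and 12 bytes per nonzero. With the Blue Gene/P node figures from slide 34 (13.6 GB/s, 13.6 GFLOPS peak):

$$
\begin{aligned}
\text{arithmetic intensity} &\le \frac{2\ \text{flop}}{12\ \text{byte}} = \frac{1}{6}\ \frac{\text{flop}}{\text{byte}},\\
\text{attainable rate} &\le 13.6\ \frac{\text{GB}}{\text{s}} \cdot \frac{1}{6}\ \frac{\text{flop}}{\text{byte}} \approx 2.27\ \text{GFLOPS} \approx 17\%\ \text{of the } 13.6\ \text{GFLOPS peak}.
\end{aligned}
$$

Traffic for the vectors and row pointers would lower this bound further.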
  • 36. Trends and Outlook for HPC and DUNE: Current Hardware/Software Trends
    The hardware manufacturers' solution (Green Computing):
      • More cores per node.
      • Less main memory per core.
      • SIMD (Blue Gene/Q, GPGPU).
      • Increasing GFLOPS per GB/s of main memory bandwidth.
      • Faster network interconnects.
    How does DUNE cope?
      • Memory efficiency!
      • Minimize communication!
      • Favor less convergent iterative methods!
      • Time to solution / scalability matters most!
      • Ability to compute bigger problems faster.
    Software is always two steps behind.
  • 37. Trends and Outlook for HPC and DUNE: Current Work and Future Plans
    Software point of view:
      • Hybrid parallelization in ALUGrid (Klöfkorn, University of Stuttgart).
      • Borrow ideas from GPGPU computing (coalesced memory access).
      • Check out cache-oblivious algorithms.
      • Check out parallel-in-time algorithms.
    Application point of view:
      • Inverse modeling.
      • Geostatistical inversion: University of Heidelberg (Ippisch, Ngo) and Tübingen (Cirpka, Schwede).
      • Run several parallel forward simulations in parallel.
  • 38. Trends and Outlook for HPC and DUNE: DUNE on Blue Gene/Q?
    Advantages of a central installation:
      • Saves scientists a lot of time.
      • Would help optimize DUNE for the platform.
      • Possibility of professional installation and user support.
      • Closer cooperation of DUNE and IBM brings benefits to all.
  • 39. HPC-Simulation-Software & Services: What can we do for you?
    Efficient simulation software made to measure.
    Hans-Bunte-Str. 8-10, 69123 Heidelberg, Germany
    http://www.dr-blatt.de