PERFORMANCE-STUDIES OF A MOLECULAR DYNAMICS CODE
Evaluating Serial, Thread and Process-Parallel Performance

A molecular dynamics code simulating the diffusion in dense nuclear matter in white dwarf stars is analyzed in this collaboration between PTI (Indiana University) and ZIH (Technische Universität Dresden). The code is highly configurable, allowing MPI, OpenMP, or hybrid runs and additional fine-tuning with a range of parameters. The first step in the code analysis is to identify the best performing parameter set of the serial version. This configuration represents the most promising candidate for further studies. The aim of the parallel analysis is then to measure the scalability limits of the different parallel code implementations and to detect bottlenecks possibly preventing higher parallel efficiency. This work has been done with the parallel analysis framework Vampir.

  1. PERFORMANCE-STUDIES OF A MOLECULAR DYNAMICS CODE
     Evaluating Serial, Thread and Process-Parallel Performance

     Molecular Dynamics: A molecular dynamics code simulating the diffusion in dense nuclear matter in white dwarf stars is analyzed in this collaboration. The code is highly configurable, allowing MPI, OpenMP, or hybrid runs and additional fine-tuning with a range of parameters.

     Serial Analysis: The first step in the code analysis is to identify the best performing parameter set. This configuration represents the most promising candidate for further parallel analysis.

     Parallel Analysis: The aim of the parallel analysis is to measure the scalability limits of the different parallel code implementations and to detect bottlenecks possibly preventing further parallel efficiency. This work has been done with the parallel analysis framework Vampir.

     Collaboration between PTI and ZIH: In 2008 a collaboration between the Technische Universität Dresden, Center for Information Services and High Performance Computing (ZIH), in Germany, and the Indiana University Pervasive Technology Institute (PTI) was founded to mutually benefit from common research areas. These research topics include:
     ‣ Data-centric computing
     ‣ Computing for biological and life sciences
     ‣ Wide area distributed file systems
     ‣ Parallel computing performance
     This collaboration involves ZIH maintaining a stack of performance evaluation tools inside the NSF FutureGrid project. For the analysis an HPC system called Xray, a Cray XT5m which is part of the FutureGrid hardware, was used. Xray consists of quad-core 64-bit AMD Opteron series 2000 processors. PGI compilers version 9.0.4 and the optimized xtpe-barcelona module provided by Cray have been used.

     [Diagram: development cycle Planning → Implementation with VampirTrace → Testing → Evaluation with Vampir]

     Authors: T. William, M. Weber, D. Röhrig (ZIH, TU Dresden); D. K. Berry, R. Henschel (UITS, IU Bloomington); J. Hughto, A. S. Schneider, C. J. Horowitz (Physics, IU Bloomington)

     This document was developed with support from the National Science Foundation (NSF) under Grant No. 0910812 to Indiana University for "FutureGrid: An Experimental, High-Performance Grid Test-bed." Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

     Contact: Technische Universität Dresden, Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), 01062 Dresden, Germany, contact@vampir.eu, http://www.vampir.eu/ | IU Bloomington, Pervasive Technology Institute, 2719 East 10th Street, Bloomington, Indiana 47408, pti@indiana.edu, http://pti.iu.edu/
  2. MOLECULAR DYNAMICS CODE: Overview

     The analyzed molecular dynamics code (MD) has been developed at Indiana University for simulating dense nuclear matter in white dwarf and neutron stars. Matter in these compact stellar objects consists of either completely ionized atoms, or else free neutrons and protons. The MD code models these systems as classical particles interacting via a screened Coulomb interaction. The electrons are not modeled explicitly, but are treated as a uniform background charge that serves to screen the Coulomb interaction.

     The movie shows a 5125 (5k) particle ion-mix simulation in which the ions form microcrystals:
     ‣ 1280 Carbon ions
     ‣ 3740 Oxygen ions
     ‣ 100 Neon ions

     By studying the motion of ions over a long simulation time, one can calculate the self-diffusion coefficient of ions, an important quantity for understanding the behavior of white dwarfs.

     [Movie: MD-Code Ion Mix Movements]
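     For reference, a screened Coulomb interaction of this kind is conventionally written as a Yukawa-type pair potential. The form below is the generic textbook expression, not lifted from the code; the charges Z_i, Z_j and the screening length λ are placeholder symbols:

         V_ij(r) = (Z_i * Z_j * e^2 / r) * exp(-r / λ)

     The exponential factor models the screening by the uniform electron background: beyond a few screening lengths the interaction is negligible, which is what makes the large cut-off sphere described on the next slide viable.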
  3. MOLECULAR DYNAMICS CODE: Implementation Details

     MD is a hybrid MPI/OpenMP parallel code written in Fortran. Its main features are:
     ‣ Simple particle-particle algorithm
     ‣ Long range interaction with large cut-off sphere
     ‣ Cut-off radius is half the box edge length
     ‣ Each ion interacts with about half the others
     ‣ Interactions are calculated in a pair of nested do loops
     ‣ Outer loop goes over "target" particles
     ‣ Inner loop goes over "source" particles
     ‣ Targets are assigned in a round-robin fashion to MPI processes
     ‣ Within each MPI process, the work is shared among OpenMP threads
     A hedged sketch of this loop structure follows this slide.

     The MD code is highly modular, consisting of several different implementations of the same simulation logic. The particle-particle interaction semantics have been implemented multiple times in different ways. Each implementation is stored in an individual source file, called PP01 to PP03. During compilation the input parameter PP0x decides which file is used for the simulation. Inside the PP0x files the parameters Ax and Bx denote different code blocks that implement equal semantics in different ways. Additionally, the file PP03 provides the define NBS to influence the vector access length. These parameters have been used throughout the years to re-implement code utilizing newer constructs of Fortran 95 that had not been available in Fortran 77.

     PP01:
     ‣ Original implementation
     ‣ No division into Ax, Bx, or NBS
     ‣ Baseline for other measurements

     PP02:
     ‣ A0, A1, A2
     ‣ B0, B1, B2
     ‣ No NBS

     PP03:
     ‣ A0, A1, A2
     ‣ B0, B1, B2, B3, B4, B5, B6, B7
     ‣ NBS

     [Figure: code block in the PP03 file showing different implementations of the same loop, selectable by Ax preprocessor macros]
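     The following is a minimal, hedged sketch of the loop structure just described: round-robin assignment of target particles to MPI ranks, OpenMP sharing of the per-rank work, and a cut-off test in the inner loop. The subroutine name, argument names, and the force kernel are placeholders, not the actual PP0x source.

         ! Hedged sketch of the pair-interaction structure; placeholder
         ! names and kernel, not the actual PP0x source.
         subroutine compute_forces(n, rank, nprocs, x, y, z, fx, fy, fz, rcut2)
           implicit none
           integer, intent(in)  :: n, rank, nprocs
           real(8), intent(in)  :: x(n), y(n), z(n), rcut2
           real(8), intent(out) :: fx(n), fy(n), fz(n)
           integer :: i, j
           real(8) :: dx, dy, dz, r2, s

           fx = 0d0; fy = 0d0; fz = 0d0
           ! Outer loop over "target" particles, assigned round-robin to
           ! MPI ranks; iterations are shared among OpenMP threads.
           !$omp parallel do private(j, dx, dy, dz, r2, s)
           do i = rank + 1, n, nprocs
              do j = 1, n                       ! inner loop over "source" particles
                 if (j == i) cycle
                 dx = x(i) - x(j); dy = y(i) - y(j); dz = z(i) - z(j)
                 r2 = dx*dx + dy*dy + dz*dz
                 if (r2 < rcut2) then           ! cut-off sphere, half the box edge
                    s = 1d0 / (r2 * sqrt(r2))   ! placeholder kernel, not the screened force
                    fx(i) = fx(i) + s*dx
                    fy(i) = fy(i) + s*dy
                    fz(i) = fz(i) + s*dz
                 end if
              end do
           end do
           !$omp end parallel do
         end subroutine compute_forces

     Because each rank owns a disjoint set of targets and each thread owns a disjoint set of i values, the force accumulation needs no synchronization; in a complete implementation, the ranks would then exchange updated particle positions (e.g. with an allgather).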
  4. SCOPE OF THE CODE ANALYSIS: Hardware Details & Parameter Sweep

     All measurements were done on the FutureGrid machine Xray:
     ‣ Cray XT5m
     ‣ 672 cores
     ‣ 84 nodes
     ‣ 2 CPUs per node
     ‣ Quad-Core AMD Opteron(tm) Processor 23 (C2)
     ‣ 2.4 GHz

     Time estimate per measurement by particle count:
     ‣ 5k: ~300 seconds (~5 minutes)
     ‣ 27k: ~4000 seconds (~1 h)
     ‣ 55k: ~35000 seconds (~10 h)

     Serial analysis:
     ‣ 55k particles
     ‣ Comparing optimization flags: O2, O3, fastsse
     ‣ fastsse == -fast
     ‣ fast: common optimizations;
       -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline -Mvect=sse -Mscalarsse -Mcache_align -Mflushz

     Sweep dimensions:
     ‣ Particle type: nucleon-nucleon, ion pure, ion mix
     ‣ Loop combination: Block A (A0-A2), Block B (B1-B7), code-blocking (NBS)
     ‣ Parallelism: serial (md), OpenMP (md_omp), MPI (md_mpi), hybrid (md_mpi_omp)
     ‣ Input parameters: simulation type, number of particles, number of time steps, ...
     ‣ Compiler flags: O2, O3, SSE
     ‣ Code version: original (PP01), production (PP02), research (PP03)

     >8100 possible combinations (one hedged tally of how this figure can arise follows this slide)
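     As a rough plausibility check of the >8100 figure, multiplying the sweep dimensions listed above already reaches that order of magnitude. The number of NBS values swept is an assumption here; the poster only states the total:

         3 (particle types) x 4 (parallel modes) x 3 (flag sets)
           x 3 (A blocks) x 8 (B blocks, PP03) x ~10 (NBS values)
           = 8640 > 8100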
  5. SERIAL ANALYSIS: Identifying The Best Configuration For Parallel Runs
     [Chart: runtime for all code combinations]
  6. SERIAL ANALYSIS: Identifying The Best Configuration For Parallel Runs
     [Chart: runtime in seconds]
  7. SERIAL ANALYSIS: PAPI Aided Analysis Of PP02 Code Versions
     [Chart: PAPI_FP_INS (floating-point instructions)]
  8. SERIAL ANALYSIS: PAPI Aided Analysis Of PP02 Code Versions
     [Chart: FPU idle]
  9. SERIAL ANALYSIS: PAPI Aided Analysis Of PP02 Code Versions
     [Chart: branches mispredicted]
  10. SERIAL ANALYSIS: PAPI Aided Analysis Of PP02 Code Versions
      [Chart: PAPI_VEC_INS (vector instructions)]
  11. SERIAL ANALYSIS: PAPI Aided Analysis Of PP02 Code Versions
      [Chart: L1 hit ratio]
  12. SERIAL ANALYSIS: PAPI Aided Analysis Of PP02 Code Versions
      [Chart: PAPI_TOT_CYC (total cycles)]
  13. SOURCE CODE ANALYSIS: Codeblocks Ax & Bx For Ion-Mix (PP02)
      Block A variants:
      ‣ A0: branching
      ‣ A1: arithmetic
      ‣ A2: array syntax
      [Figure: Block A source listings]
  14. SOURCE CODE ANALYSIS: Codeblocks Ax & Bx For Ion-Mix (PP02)
      Block B variants:
      ‣ B0: loop
      ‣ B1: no cut-off sphere
      ‣ B2: cut-off sphere
      [Figure: Block B source listings]
  15. SOURCE CODE ANALYSIS: Codeblocks Ax & Bx For Ion-Mix (PP02)
      Block B variants (as on slide 14), with the finding:
      Result: "Calculation does not beat branching" (see the sketch after this slide)
      [Figure: Block B source listings]
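     The quoted result refers to the trade-off between the variant families: the arithmetic and array-syntax variants remove the branch from the inner loop by always computing the interaction and masking it, which on this hardware does not pay off against the plain IF. Below is a minimal, hedged illustration of the three Block A styles; parameter values and names are placeholders, not the PP02 source.

         ! Hedged illustration of the three Block A styles for the
         ! cut-off test; placeholder values, not the PP02 source.
         program block_a_styles
           implicit none
           real(8), parameter :: rcut2 = 1d0, lambda = 0.5d0, q2 = 1d0
           real(8) :: r2, r, e, mask
           real(8) :: r2v(4), rv(4), ev(4)

           r2 = 0.49d0; r = sqrt(r2); e = 0d0

           ! A0: branching - an IF skips pairs outside the cut-off sphere
           if (r2 < rcut2) e = e + q2 * exp(-r/lambda) / r

           ! A1: arithmetic - always compute, scale by a 0/1 mask instead of branching
           mask = 0.5d0 * (1d0 + sign(1d0, rcut2 - r2))   ! 1 inside the sphere, 0 outside
           e = e + mask * q2 * exp(-r/lambda) / r

           ! A2: array syntax - a Fortran 90/95 WHERE construct applies the test to arrays
           r2v = (/ 0.25d0, 0.81d0, 1.44d0, 4.00d0 /)
           rv = sqrt(r2v); ev = 0d0
           where (r2v < rcut2) ev = q2 * exp(-rv/lambda) / rv

           print *, 'e =', e
           print *, 'ev =', ev
         end program block_a_styles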
  16. PARALLEL ANALYSIS: First Scalability Evaluation

      The MD code is used in production with particle counts between 27k and 55k. Production runs currently use the PP02 implementation of particle-particle interaction semantics. The code blocks selected are A0 and B2, and the vector length (NBS) is set to 32. Measurements show that this configuration is very efficient and yields good performance.

      So far, runtimes of larger runs have only been analyzed using the hybrid version. The chart shows speedups on a large Cray XT5 (Kraken) and the resulting efficiency for 27k and 55k particle runs. Up to 1152 cores, the efficiency is higher than 80%. At 4608 cores the efficiency drops below 50% for 27k particles. Using more ions shows the effect of weak scaling, which can be used to optimize the efficiency so that even on 4608 cores an efficiency of 58% can be achieved.

      Although these results are very promising, Kraken has 112,896 compute cores that could be used for computation. The mid-term goal is therefore to optimize the MD code to achieve 50% parallel efficiency on 16000 cores. To achieve this we will take a more detailed look at the four different versions of the code (serial, OpenMP, MPI and hybrid).

      The code is very efficient for multi-core processors such as the 6-core processors on Kraken, the Cray XT5 at the National Institute for Computational Sciences. Each processor is assigned one MPI process. The work of each process is then shared among 6 OpenMP threads.

      [Chart: speedup and parallel efficiency on Kraken for 27k and 55k particle runs]
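      For reference, the speedup and efficiency figures quoted above follow the usual strong-scaling definitions, with T(p) the runtime on p cores:

          S(p) = T(1) / T(p)        (speedup)
          E(p) = S(p) / p           (parallel efficiency)

      Under these definitions, the reported 58% efficiency at 4608 cores corresponds to a speedup of about 0.58 x 4608 ≈ 2670 over the serial run.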
  17. PARALLEL ANALYSIS WITH VAMPIR

      ZIH has more than a decade of expertise in analyzing sequential and parallel codes. As part of the collaboration with IU, ZIH applied its analysis software Vampir to the different versions of MD. Vampir consists of a tracing framework called VampirTrace that is able to monitor all levels of parallelism of the MD code. Additionally, PAPI counters are recorded in order to analyze the cache behavior of the main calculation loop.

      In a first step, traces of the four versions were generated using only 5k particles to get a broad overview. The next step will be to refine the tracing mechanism using manually instrumented code to avoid the rather large overhead introduced by OpenMP tracing. This will enable monitoring of at least 4096 cores to understand why the efficiency drops below 50% at this core count.
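      A minimal sketch of what such manual instrumentation can look like, assuming VampirTrace's standard user API (vt_user.inc with VT_USER_START/VT_USER_END, activated by the VTRACE define); the region name and loop body are placeholders:

          ! Hedged sketch of manual VampirTrace instrumentation;
          ! placeholder region name and workload.
          #include "vt_user.inc"
          program manual_trace
            implicit none
            integer :: i
            real(8) :: s
            s = 0d0
            VT_USER_START('main_calc_loop')   ! record events only for this region
            do i = 1, 1000000
               s = s + 1d0 / real(i, 8)
            end do
            VT_USER_END('main_calc_loop')
            print *, 's =', s
          end program manual_trace

      Built through the VampirTrace compiler wrapper with automatic compiler instrumentation switched off (e.g. vtf90 -vt:inst manual -DVTRACE manual_trace.F90), only the marked region generates trace records, which is what keeps the overhead low.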
  18. OPENMP CODE OPTIMIZATION: Comparison of OpenMP Versions using 1 Node and 1-8 Cores
      [Chart]
  19. OPENMP CODE OPTIMIZATION: Comparison of OpenMP Versions using 1 Node and 1-8 Cores
      [Chart]
  20. PARALLEL ANALYSIS WITH VAMPIR: 55k particles, MPI only, 8 nodes, 1 process per node
      [Vampir timeline]
  21. PARALLEL ANALYSIS WITH VAMPIR: 20+ loops of the A0_B2 block
      [Vampir timeline]
  22. PARALLEL ANALYSIS WITH VAMPIR: 55k particles, MPI only, 1 node, 8 processes per node
      [Vampir timeline]
  23. PARALLEL ANALYSIS WITH VAMPIR: 20+ loops of the A0_B2 block
      [Vampir timeline]
  24. PARALLEL ANALYSIS WITH VAMPIR: 55k particles, MPI only, 84 nodes, 8 processes per node (672 cores == full machine run)
      [Vampir timeline]
  25. PARALLEL ANALYSIS WITH VAMPIR: switching of groups and flushing of VT buffers
      [Vampir timeline]
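      A side note on the buffer flushes visible in this timeline: flushing perturbs the measured application, and (assuming default VampirTrace settings) it can usually be pushed out of the measured interval by enlarging the in-memory trace buffer via the VT_BUFFER_SIZE environment variable, or by instrumenting fewer regions so the buffer fills more slowly.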
  26. PARALLEL ANALYSIS WITH VAMPIR: 55k particles, OpenMP only, 1 node, 8 threads
      [Vampir timeline]
  27. PARALLEL ANALYSIS WITH VAMPIR: 55k particles, hybrid, 84 MPI processes (1 per node), 8 threads per process (672 cores == full machine run)
      [Vampir timeline]
  28. PARALLEL ANALYSIS WITH VAMPIR: second group, 100 runs
      [Vampir timeline]
  29. SCALABILITY STUDIES ON XRAY: Parallel Efficiency (MPI-only runs)
      [Charts: 1 node and 8 nodes]
  30. SCALABILITY STUDIES ON XRAY: Parallel Efficiency (Hybrid runs)
      [Chart]
  31. CONCLUSION AND FUTURE WORK

      To find the optimal serial version of MD, different combinations of the code blocks and compiler flags were tested. The results from the serial analysis formed the starting point for the parallel analysis. Vampir was then successfully applied to the code, making it possible to visualize MPI and OpenMP parallelism.

      An important step for future parallel analysis is to embed PAPI counters directly in the MD code to avoid the overhead caused by VampirTrace. This enables comparison of the actually achieved performance with the theoretical peak performance at larger core counts.

      Additional tests with larger particle counts are planned to exploit the weak scaling capabilities of the code. The hybrid implementation is to be run in different process/thread configurations to research whether fitting the MPI/OpenMP usage to the hardware characteristics can further increase performance.

      Analysis tests will be re-run on Kraken with up to 16386 cores. The goal is to detect scalability issues and to significantly improve parallel efficiency.

      The MD code is currently being ported to GPGPUs. Vampir already supports the analysis of CUDA parallelized code. This support will enable efficient analysis of the ported code, allowing the maximum possible potential provided by GPGPUs to be utilized.

      [Diagram: development cycle Planning → Implementation with VampirTrace → Testing → Evaluation with Vampir]
