2. Outline
● Introduction to the behavior of the Feel++ library;
– Problem:
● How to solve the bottleneck?
● How to optimize the speedup in terms of execution time?
● Scalability
● Laplacian execution parameters
3. Introduction to Feel++ library
● We focused on the scalability of parallel applications, measuring the following metrics:
– Performance (Execution Time);
● Boost class cpu_timer, which measures wall-clock time, user CPU time, and system CPU time (see the sketch below).
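A minimal sketch of timing a region with Boost's cpu_timer; the work() function is only a placeholder for the measured region (e.g. a Feel++ solve step):

#include <boost/timer/timer.hpp>
#include <iostream>

void work()                                    // placeholder for the region of interest
{
    volatile double s = 0;
    for ( long i = 1; i <= 50000000; ++i ) s += 1.0 / i;
}

int main()
{
    boost::timer::cpu_timer timer;             // starts timing immediately
    work();
    timer.stop();
    boost::timer::cpu_times const t = timer.elapsed();
    std::cout << "wall   = " << t.wall   << " ns\n"    // wall-clock time
              << "user   = " << t.user   << " ns\n"    // user CPU time
              << "system = " << t.system << " ns\n";   // system CPU time
}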
– Throughput (Strong and Weak Scalability);
● t1 / ( N * tN ) * 100% (strong scaling efficiency, see the sketch below);
● ( t1 / tN ) * 100% (weak scaling efficiency);
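A small sketch of computing these efficiencies from measured run times; t1 and tN are the times on 1 and N processes, and the numbers below are illustrative placeholders, not measurements:

#include <iostream>

// Strong scaling: fixed total problem size, spread over N processes.
double strongEfficiency( double t1, double tN, int N ) { return t1 / ( N * tN ) * 100.0; }

// Weak scaling: problem size per process kept fixed as N grows.
double weakEfficiency( double t1, double tN ) { return t1 / tN * 100.0; }

int main()
{
    // Illustrative timings in seconds (placeholders, not measurements).
    std::cout << "strong efficiency (N=16): " << strongEfficiency( 120.0, 9.0, 16 ) << " %\n";
    std::cout << "weak efficiency   (N=16): " << weakEfficiency( 30.0, 33.0 )       << " %\n";
}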
4. Performance metrics
● Sequential Factors
● Computation
– Execution time
● Wall-clock time
● CPU time
● Number of function calls
● FLOPS (Measurement of Feel++ benchmarks; see the sketch below)
– Aorta, Aneurism, Pelvis
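A trivial sketch of how a FLOPS figure is derived once an operation count and an execution time are known; both numbers below are made-up placeholders, not Feel++ measurements:

#include <cstdint>
#include <iostream>

int main()
{
    std::uint64_t flop = 4200000000ULL;   // placeholder operation count
    double seconds     = 3.5;             // placeholder wall-clock time
    std::cout << "rate = " << flop / seconds / 1e9 << " GFLOP/s\n";
}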
● Parallel Factors (message passing programming model)
– Count: the number of MPI point-to-point messages sent (see the sketch after this list);
– Duration: the time spent in these send calls;
– Size: the number of bytes transmitted by these calls;
– Derived metrics:
● Throughput: a normalized measure used to compare the scalability and parallel efficiency of numerical apps
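A minimal MPI sketch of accumulating the count, duration, and size of point-to-point sends with a hand-written wrapper; profiling tools such as Scalasca normally gather the same data by intercepting MPI_Send through the PMPI interface:

#include <mpi.h>
#include <cstdio>
#include <vector>

static long   g_count = 0;   // number of MPI_Send calls
static long   g_bytes = 0;   // bytes transmitted by these calls
static double g_time  = 0;   // seconds spent inside these calls

// Wrapper recording count, size and duration of each point-to-point send.
static int timedSend( const void* buf, int n, MPI_Datatype dt, int dest, int tag, MPI_Comm comm )
{
    int sz; MPI_Type_size( dt, &sz );
    double t0 = MPI_Wtime();
    int rc = MPI_Send( buf, n, dt, dest, tag, comm );
    g_time += MPI_Wtime() - t0; g_count += 1; g_bytes += static_cast<long>( n ) * sz;
    return rc;
}

int main( int argc, char** argv )   // run with at least two processes, e.g. mpirun -np 2
{
    MPI_Init( &argc, &argv );
    int rank; MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    std::vector<double> data( 1000, 1.0 );
    if ( rank == 0 )
        timedSend( data.data(), (int)data.size(), MPI_DOUBLE, 1, 0, MPI_COMM_WORLD );
    else if ( rank == 1 )
        MPI_Recv( data.data(), (int)data.size(), MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );
    if ( rank == 0 )
        std::printf( "sends=%ld  bytes=%ld  time=%.6f s\n", g_count, g_bytes, g_time );
    MPI_Finalize();
}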
5. Scalability
● Factors determining the efficiency of a parallel algorithm
– Load balance: distribution of work among processors (see the sketch after this list);
– Concurrency: processors working simultaneously;
– Overhead: additional work not present in the corresponding serial computation;
● Parallel scalability: the relative effectiveness with which a parallel algorithm can utilize additional processors
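A small sketch quantifying load balance as max/average per-process time and parallel efficiency as t1 / ( N * tN ); the timings are illustrative placeholders, not measurements:

#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    std::vector<double> t = { 10.2, 9.8, 10.1, 14.5 };   // placeholder per-process times (s)
    double tmax = *std::max_element( t.begin(), t.end() );
    double tavg = std::accumulate( t.begin(), t.end(), 0.0 ) / t.size();

    // Load imbalance: 1.0 is perfectly balanced; larger means processes sit idle
    // while waiting for the slowest one.
    std::cout << "load imbalance (max/avg): " << tmax / tavg << "\n";

    // Parallel efficiency: serial time over N times the parallel (slowest) time.
    double t1 = 38.0;                                     // placeholder serial time (s)
    std::cout << "parallel efficiency: " << t1 / ( t.size() * tmax ) * 100 << " %\n";
}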
11. Mesh Granularity
● Performance analysis of mesh applications
– Type of granularity (fine & coarse granularity)
● Changing the hsize of mesh apps (see the sketch at the end of this list)
– Case study: stokes & aorta
– Mesh generation (using gmsh): understanding the pre/post-processing (including high-order geometries)
– Mesh benchmarks on Feel++
● aorta
● pelvis
● aneurism
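A back-of-the-envelope sketch of how hsize drives granularity: in 3-D the element count grows roughly like (L/h)^3, so halving hsize multiplies the work by about eight. The domain extent and hsize values are illustrative, not taken from the aorta/pelvis/aneurism benchmarks:

#include <cmath>
#include <iostream>

int main()
{
    double L = 0.1;                              // illustrative domain extent
    for ( double h : { 0.01, 0.005, 0.0025 } )   // coarse -> fine hsize
        std::cout << "hsize = " << h << "  ->  ~"
                  << std::pow( L / h, 3 ) << " elements (rough estimate)\n";
}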
12. Scalasca 1.4
Scalable Automatic Performance Analysis
● Performance analysis tool for the measurement and analysis of MPI, OpenMP, and hybrid programming constructs in highly scalable Feel++ HPC applications;
● Supports C, C++, Fortran;
● Runs on many HPC platforms, e.g. OpenMP/MPI tests on the CURIE system using the NAS Parallel Benchmarks;
● Main goal @ Feel++: laplacian, stokes, navier-stokes
● Workflow: instrumentation, measurement, analysis,
presentation, evaluation, optimization;
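A hedged sketch of the instrumentation step, assuming Scalasca 1.x's EPIK user-instrumentation macros (epik_user.h, EPIK_USER_REG/START/END); the header and macro names are an assumption here and should be checked against the installed version. The scalasca commands in the comments illustrate the measurement, analysis, and presentation steps of the workflow:

// Instrument the build, e.g.:  scalasca -instrument mpicxx app.cpp
// Run under measurement:       scalasca -analyze mpirun -np 4 ./a.out
// Inspect the experiment:      scalasca -examine <experiment directory>
#include <mpi.h>
#include "epik_user.h"             // assumed Scalasca 1.x user-instrumentation header

int main( int argc, char** argv )
{
    MPI_Init( &argc, &argv );
    EPIK_USER_REG( solve, "solve" );   // register a named user region (assumed macro)
    EPIK_USER_START( solve );          // enter the region
    /* ... assembly/solve phase attributed to "solve" in the analysis report ... */
    EPIK_USER_END( solve );            // leave the region
    MPI_Finalize();
}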
13. Communication Bottlenecks
● Message passing model relies on inter-process messages;
● Communication bottlenecks are a major issue on massively parallel supercomputers;
● Amdahl's law
– P is the portion of the program that can be executed in parallel;
– (1 − P) is the portion that cannot be parallelized, so the maximum speedup on N processors is:
● Speedup ≤ 1 / ( (1 − P) + P/N ) (a worked example follows)
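A worked instance of this bound with an illustrative parallel fraction: with P = 0.95, the speedup saturates around 20 no matter how many processors are added:

#include <iostream>

// Amdahl's law: maximum speedup on N processors when a fraction P of the
// program can be parallelized.
double amdahlSpeedup( double P, double N ) { return 1.0 / ( ( 1.0 - P ) + P / N ); }

int main()
{
    double P = 0.95;                               // illustrative parallel fraction
    for ( double N : { 10.0, 100.0, 1000.0 } )
        std::cout << "N = " << N << "  max speedup = " << amdahlSpeedup( P, N ) << "\n";
    // e.g. N = 1000: 1 / ( 0.05 + 0.95/1000 ) ~= 19.6, far below 1000.
}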
● If parallel apps cannot scale to thousands of processors (weak scalability)
– it means that execution time no longer decreases and cache coherency between nodes worsens;