Performance Analysis of Feel++ Library Using Scalasca

Performance Analysis
Introduction to Performance Analysis with Feel++
Jussara Marandola
Marandola.fj@gmail.com

Outline
● Introduction to the behavior of the Feel++ library
– Problem:
● How to resolve the bottleneck?
● How to improve the speedup in terms of execution time?
● Scalability
● Laplacian execution parameters

Introduction to Feel++ library
● We focus on the scalability of parallel
applications, measuring the following metrics:
– Performance (execution time);
● The Boost class cpu_timer measures wall-clock
time, user CPU time, and system CPU
time.
– Throughput (strong and weak scalability), where t1 is the run time on
one process and tN is the run time on N processes:
● t1 / ( N * tN ) * 100% (strong scalability, fixed total problem size);
● ( t1 / tN ) * 100% (weak scalability, fixed problem size per process);
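
As a worked illustration of the two efficiency metrics, the sketch below computes strong and weak scaling efficiency from run times. It assumes the conventional definitions, t1 / (N * tN) for strong scaling and t1 / tN for weak scaling. The function names and the timings are hypothetical and are not Feel++ measurements.

```python
def strong_scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Percent efficiency when the total problem size is fixed."""
    return t1 / (n * tn) * 100.0


def weak_scaling_efficiency(t1: float, tn: float) -> float:
    """Percent efficiency when the problem size per process is fixed."""
    return t1 / tn * 100.0


# Hypothetical timings in seconds (illustrative only, not measured).
t1 = 100.0         # one process
t8_strong = 16.0   # eight processes, same total problem size
t8_weak = 125.0    # eight processes, eight times the total problem size

print(f"strong scaling efficiency: {strong_scaling_efficiency(t1, t8_strong, 8):.1f}%")
print(f"weak scaling efficiency:   {weak_scaling_efficiency(t1, t8_weak):.1f}%")
```

An efficiency of 100% corresponds to ideal scaling in both cases; lower values indicate time lost to overhead, communication, or load imbalance.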

Performance metrics
● Sequential factors
● Computation
– Execution time
● Wall-clock time
● CPU time
● Number of function calls
● FLOPS (measured on the Feel++ benchmarks)
– Aorta, Aneurism, Pelvis
● Parallel factors (message-passing programming model)
– Count: the number of MPI point-to-point messages sent
– Duration: the time spent in these send calls
– Size: the number of bytes transmitted by these calls
– Derived metrics:
● Throughput: scalability normalized as the parallel efficiency of numerical applications
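
To make the difference between wall-clock time and CPU time concrete, here is a minimal Python sketch that reports the same three quantities as a wall/user/system timer. It is only an analogue written with the Python standard library, not the Boost timer itself, and `workload` is a hypothetical stand-in for a solver kernel.

```python
import os
import time


def timed(func, *args):
    """Run func(*args); return (result, wall, user, system) in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    result = func(*args)
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return (
        result,
        wall_end - wall_start,
        cpu_end.user - cpu_start.user,
        cpu_end.system - cpu_start.system,
    )


def workload(n):
    # Hypothetical CPU-bound kernel used only for illustration.
    return sum(i * i for i in range(n))


result, wall, user, system = timed(workload, 200_000)
print(f"wall={wall:.4f}s user={user:.4f}s system={system:.4f}s")
```

For a CPU-bound kernel the wall-clock and user times are close; a large gap usually points to waiting (I/O, communication, or contention) rather than computation.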

Scalability
● Factors determining the efficiency of a parallel algorithm
– Load balance: distribution of work among processors
– Concurrency: processors working simultaneously
– Overhead: additional work not present in the corresponding
serial computation
● Parallel scalability: the relative effectiveness with which a
parallel algorithm can use additional processors
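
Load balance can be quantified in several ways. The sketch below uses one common choice, the ratio of the slowest process time to the mean process time, which is an assumption here and not a metric named in the slides. The per-process timings are hypothetical.

```python
def load_imbalance(times):
    """Slowest process time divided by the mean; 1.0 means perfect balance."""
    mean = sum(times) / len(times)
    return max(times) / mean


# Hypothetical compute times in seconds for four processes.
per_process = [10.0, 10.0, 10.0, 14.0]
print(f"imbalance ratio: {load_imbalance(per_process):.3f}")
```

Because all processes wait for the slowest one at each synchronization point, a ratio well above 1.0 translates directly into idle time on the other processes.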

Laplacian execution parameters
● To compile the laplacian sample:
– Build directory: /home/projet/jussara/feel.opt/
– make feel_doc_laplacian -j5
● Execution parameters (combinatorial approach)
Np  Shape      hsize  dim  Shape shown in ParaView
1   hypercube  0.1    2    rectangle
1   simplex    0.1    2    triangle
1   hypercube  0.1    3    cube
1   simplex    0.1    3    tetrahedron

Laplacian execution parameters
● To execute the laplacian sample:
– The laplacian executable is located in
/home/projet/jussara/feel.opt/doc/manual/tutorial
– First approach: set Np (number of processors) = 1:
– mpirun -np 1 ./feel_doc_laplacian --shape=simplex --hsize=0.1 --dim=2
– mpirun -np 1 ./feel_doc_laplacian --shape=hypercube --hsize=0.1 --dim=3
– mpirun -np 1 ./feel_doc_laplacian --shape=hypercube --hsize=0.1 --dim=2
– mpirun -np 1 ./feel_doc_laplacian --shape=simplex --hsize=0.1 --dim=3
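
The combinatorial approach to the shape and dimension parameters can be sketched as a small generator of command lines. The executable name and the --shape, --hsize, and --dim options are taken from the commands above; the helper name `laplacian_commands` is hypothetical, and the script only prints the commands without running them.

```python
from itertools import product


def laplacian_commands(np_procs, shapes, dims, hsize):
    """Build one mpirun command line per (shape, dim) combination."""
    return [
        f"mpirun -np {np_procs} ./feel_doc_laplacian "
        f"--shape={shape} --hsize={hsize} --dim={dim}"
        for shape, dim in product(shapes, dims)
    ]


commands = laplacian_commands(
    np_procs=1,
    shapes=["hypercube", "simplex"],
    dims=[2, 3],
    hsize=0.1,
)
for command in commands:
    print(command)
```

Adding further values to `shapes`, `dims`, or a list of process counts extends the sweep without writing each command by hand.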

Generating output files
● Parallel laplacian execution
– mpirun -np 1 ./feel_doc_laplacian --hsize=0.1 --shape=hypercube --dim=2
– Result files are found in
/home/project/feel/doc/tutorial
– Files:
● hypercube-2.geo
● hypercube-2.msh
● laplacian-hypercube-2-1.sos

Display the data with ParaView
● paraview laplacian-hypercube-2-1.sos
● Color legend
– g *
– u
– pid
● Style representation
– Points
– Wireframe
– Surface *
– Surface with edges
– Volume

Laplacian shapes visualization

Mesh Granularity
● Performance analysis of mesh applications
– Type of granularity (fine and coarse)
● Changing the hsize of mesh applications
– Case study: stokes and aorta
– Mesh generation (using gmsh): understanding the pre- and
post-processing (including high-order geometries)
– Mesh benchmarks on Feel++
● aorta
● pelvis
● aneurism
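
To give a feel for how hsize controls granularity, the sketch below estimates how the element count grows when the mesh size is reduced. It assumes a quasi-uniform mesh, for which the number of elements scales roughly like (1/hsize)^dim. This is a rule of thumb, not a Feel++ or gmsh result, and the function name is hypothetical.

```python
def element_growth_factor(h_coarse: float, h_fine: float, dim: int) -> float:
    """Approximate growth in element count when hsize shrinks from h_coarse to h_fine."""
    return (h_coarse / h_fine) ** dim


# Halving hsize from 0.1 to 0.05 in 2D and 3D (estimates only).
for dim in (2, 3):
    factor = element_growth_factor(0.1, 0.05, dim)
    print(f"dim={dim}: about {factor:.0f}x more elements")
```

The steep growth in 3D is why a small change in hsize can move a run from coarse-grained to fine-grained behavior, with a corresponding change in the ratio of computation to communication.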

Scalasca 1.4
Scalable Automatic Performance Analysis
● Performance analysis tool for measuring and analyzing MPI, OpenMP,
and hybrid programming constructs in highly scalable Feel++ HPC
applications;
● Supports C, C++, and Fortran;
● Runs on many HPC platforms, e.g. a test of OpenMP/MPI on the CURIE
version using the NAS Parallel Benchmarks;
● Main targets in Feel++: laplacian, stokes, navier-stokes;
● Workflow: instrumentation, measurement, analysis,
presentation, evaluation, optimization;

Communication Bottlenecks
● The message-passing model relies on inter-process messages
● Communication bottlenecks are a major issue in parallel computing on massively
parallel supercomputers;
● Amdahl's law
– P is the portion of the program that can be parallelized;
– (1 - P) is the portion that cannot be parallelized, so the maximum speedup on N processors is:
S(N) = 1 / ( (1 - P) + P / N )
● If parallel applications cannot scale to thousands of processors (weak scalability)
– It means that execution-time performance degrades and cache
coherency between nodes gets worse;
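
As a worked example of Amdahl's law, the sketch below evaluates the speedup 1 / ((1 - P) + P / N) for increasing processor counts. The parallel fraction P = 0.95 is a hypothetical value chosen for illustration, not a measurement of any Feel++ application.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Maximum speedup on n processors when a fraction p of the program is parallel."""
    return 1.0 / ((1.0 - p) + p / n)


parallel_fraction = 0.95  # hypothetical value
for n in (1, 10, 100, 1000):
    print(f"N={n:5d}  speedup={amdahl_speedup(parallel_fraction, n):.2f}")

# The speedup can never exceed 1 / (1 - P), however many processors are used.
print(f"upper bound: {1.0 / (1.0 - parallel_fraction):.2f}")
```

Even with 95% of the program parallelized, the serial 5% caps the speedup at 20, which is why reducing serial and communication-bound sections matters more than adding processors at large scale.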
