Performance Analysis
Introduction to Performance Analysis with Feel++
Jussara Marandola
Marandola.fj@gmail.com
Outline
● Introduction to the behavior of the Feel++ library
– Problems:
● How to resolve the bottleneck?
● How to optimize the speedup in terms of execution time?
● Scalability
● Laplacian execution parameters
Introduction to Feel++ library
● We focused on the scalability of parallel applications, measuring the following metrics:
– Performance (execution time)
● The Boost.Timer class cpu_timer measures wall-clock time, user CPU time, and system CPU time.
– Throughput (strong and weak scalability), where t1 is the execution time on 1 processor and tN the execution time on N processors:
● t1 / ( N * tN ) * 100% (strong scalability)
● ( t1 / tN ) * 100% (weak scalability)
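As a minimal sketch, the two efficiencies can be computed from measured run times using the usual definitions: strong-scaling efficiency is t1 / (N * tN) for a fixed total problem size, and weak-scaling efficiency is t1 / tN for a fixed problem size per processor. The function names and the timings below are illustrative assumptions, not Feel++ measurements.

```python
def strong_scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Percent efficiency when the total problem size is fixed."""
    return t1 / (n * tn) * 100.0


def weak_scaling_efficiency(t1: float, tn: float) -> float:
    """Percent efficiency when the problem size per processor is fixed."""
    return t1 / tn * 100.0


# Hypothetical wall-clock times in seconds (not measured on Feel++).
strong = strong_scaling_efficiency(t1=100.0, tn=25.0, n=4)
weak = weak_scaling_efficiency(t1=100.0, tn=125.0)
print(f"strong: {strong:.1f}%  weak: {weak:.1f}%")
```

An efficiency of 100% means ideal scaling; values below that quantify how much of the added processing power is lost.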
Performance metrics
● Sequential factors
– Computation
– Execution time
● Wall-clock time
● CPU time
● Number of function calls
● FLOPS (measured on the Feel++ benchmarks)
– Aorta, Aneurism, Pelvis
● Parallel factors (message-passing programming model)
– Count: the number of MPI point-to-point messages sent
– Duration: the time spent in these send calls
– Size: the number of bytes transmitted by these calls
– Derived metrics:
● Throughput: scalability normalized as the parallel efficiency of numerical applications
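The three message-passing metrics above can be aggregated from per-call records, as in the following sketch. The record layout, the function name, and the extra bytes-per-second figure are assumptions made for illustration; the slides do not specify how the values are collected, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class SendRecord:
    duration_s: float  # time spent inside one send call
    size_bytes: int    # bytes transmitted by that call


def summarize(records):
    """Aggregate count, duration and size over a list of send records."""
    count = len(records)
    duration = sum(r.duration_s for r in records)
    size = sum(r.size_bytes for r in records)
    # Assumed derived metric: average bytes moved per second of send time.
    bytes_per_s = size / duration if duration > 0 else 0.0
    return {
        "count": count,
        "duration_s": duration,
        "size_bytes": size,
        "bytes_per_s": bytes_per_s,
    }


# Hypothetical records, not taken from a real trace.
sends = [SendRecord(0.5, 1024), SendRecord(0.25, 2048), SendRecord(0.25, 1024)]
summary = summarize(sends)
print(summary)
```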
Scalability
● Factors determining the efficiency of a parallel algorithm
– Load balance: distribution of work among processors
– Concurrency: processors working simultaneously
– Overhead: additional work not present in the corresponding serial computation
● Parallel scalability: relative effectiveness with which a parallel algorithm can utilize additional processors
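Load balance can be quantified in several ways. One common choice, assumed here and not taken from the slides, is the ratio of the mean per-process work time to the maximum per-process work time: 1.0 means perfectly balanced, and lower values mean some processes wait for the slowest one. The timings are hypothetical.

```python
def load_balance(times):
    """Return mean/max of per-process times (1.0 = perfectly balanced)."""
    if not times:
        raise ValueError("need at least one per-process time")
    return (sum(times) / len(times)) / max(times)


# Hypothetical per-rank compute times in seconds.
rank_times = [8.0, 10.0, 6.0, 8.0]
print(f"load balance: {load_balance(rank_times):.2f}")
```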
Laplacian execution parameters
● To compile the laplacian sample:
– In /home/projet/jussara/feel.opt/:
– make feel_doc_laplacian -j5
● Execution parameters (combinatorial approach)
Np   Shape       hsize   dim   Shape shown in ParaView
1    hypercube   0.1     2     rectangle
1    simplex     0.1     2     triangle
1    hypercube   0.1     3     cube
1    simplex     0.1     3     tetrahedron
Laplacian execution parameters
● To execute the laplacian sample:
– The laplacian executable is located in
/home/projet/jussara/feel.opt/doc/manual/tutorial
– First approach: set Np (number of processors) = 1
– mpirun -np 1 ./feel_doc_laplacian --shape=simplex --hsize=0.1 --dim=2
– mpirun -np 1 ./feel_doc_laplacian --shape=hypercube --hsize=0.1 --dim=3
– mpirun -np 1 ./feel_doc_laplacian --shape=hypercube --hsize=0.1 --dim=2
– mpirun -np 1 ./feel_doc_laplacian --shape=simplex --hsize=0.1 --dim=3
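The combinatorial approach can be sketched as a small script that enumerates every parameter combination and builds the matching command line. The option names come from the commands above; the helper name and the idea of generating the commands from a script are assumptions. The script only prints the commands and does not run them.

```python
from itertools import product

# Parameter values taken from the table of execution parameters.
np_values = [1]
hsizes = [0.1]
dims = [2, 3]
shapes = ["hypercube", "simplex"]


def build_commands():
    """Return one mpirun command line per parameter combination."""
    return [
        f"mpirun -np {n} ./feel_doc_laplacian "
        f"--shape={shape} --hsize={h} --dim={dim}"
        for n, h, dim, shape in product(np_values, hsizes, dims, shapes)
    ]


commands = build_commands()
for cmd in commands:
    print(cmd)
```

Adding more values to `np_values` or `hsizes` extends the sweep without changing the rest of the script.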
Generating output files
● Parallel laplacian execution
– mpirun -np 1 ./feel_doc_laplacian --hsize=0.1 --shape=hypercube --dim=2
– Result files are found in
/home/project/feel/doc/tutorial
– Files:
● hypercube-2.geo
● hypercube-2.msh
● laplacian-hypercube-2-1.sos
Display the data with ParaView
● paraview laplacian-hypercube-2-1.sos
● Color legend
– g *
– u
– pid
● Style representation
– Points
– Wireframe
– Surface *
– Surface with edges
– Volume
Laplacian shapes visualization
Mesh Granularity
● Performance analysis of mesh applications
– Type of granularity (fine and coarse)
● Changing the hsize of mesh applications
– Case study: stokes and aorta
– Mesh generation (using Gmsh): understanding the pre/post-processing (including high-order geometries)
– Mesh benchmarks on Feel++
● aorta
● pelvis
● aneurism
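To give a feel for how hsize controls granularity, the sketch below estimates the number of cells in a uniform grid on a unit hypercube as (1 / hsize)^dim. This is a simplifying assumption for illustration only: the actual element counts produced by Gmsh depend on the geometry and element type, and the function name is hypothetical.

```python
def estimated_cells(hsize: float, dim: int) -> int:
    """Rough cell count for a uniform grid on a unit hypercube."""
    if hsize <= 0:
        raise ValueError("hsize must be positive")
    return round((1.0 / hsize) ** dim)


for dim in (2, 3):
    for hsize in (0.1, 0.05):
        print(f"dim={dim} hsize={hsize}: ~{estimated_cells(hsize, dim)} cells")
```

Under this assumption, halving hsize multiplies the work by 4 in 2D and by 8 in 3D, which is why fine granularity quickly dominates execution time.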
Scalasca 1.4
Scalable Automatic Performance Analysis
● Performance analysis tool for measuring and analyzing MPI, OpenMP, and hybrid programming constructs in highly scalable Feel++ HPC applications
● Supports C, C++, and Fortran
● Runs on many HPC platforms, e.g. test of OpenMP/MPI on the CURIE version using the NAS Parallel Benchmarks
● Main targets in Feel++: laplacian, stokes, navier-stokes
● Workflow: instrumentation, measurement, analysis, presentation, evaluation, optimization
Communication Bottlenecks
● The message-passing model relies on inter-process messages
● Communication bottlenecks are a major issue on massively parallel supercomputers
● Amdahl's law
– P is the portion of the program that can be parallelized;
– (1-P) is the portion that cannot be parallelized; the maximum speedup on N processors is then:
S(N) = 1 / ( (1 - P) + P / N )
● If parallel apps cannot scale to thousands of processors (weak scalability)
– This means that execution time degrades and cache coherency between nodes worsens
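A short sketch of Amdahl's law, with speedup 1 / ((1 - P) + P / N), shows how the serial portion caps the achievable speedup at 1 / (1 - P) no matter how many processors are added. The function name and the value P = 0.95 are illustrative assumptions, not figures measured on Feel++.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Maximum speedup on n processors when a fraction p is parallelizable."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


# Hypothetical program with 95% parallelizable work.
p = 0.95
for n in (1, 10, 100, 1000):
    print(f"N={n:5d}  speedup={amdahl_speedup(p, n):6.2f}")
print(f"upper bound as N grows: {1.0 / (1.0 - p):.2f}")
```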

Performance Analysis of Feel++ Library Using Scalasca
