SlideShare a Scribd company logo
1 of 86
Download to read offline
ORNL is managed by UT-Battelle, LLC for the US Department of Energy
Performance Analysis with Scalasca
George S. Markomanolis
7 August 2019
22 Open slide master to edit
Outline
• Introduction to Scalasca
• How to compile (using Score-P)
• Explaining functionalities of Scalasca/CUBE4 on two applications
• Testing a case with a large trace
33 Open slide master to edit
Scalasca
• Scalasca is a software tool that supports the performance analysis of
parallel applications
• The analysis identifies potential performance bottlenecks – in particular
those concerning communication and synchronization – and offers
guidance in exploring their causes.
• Installed version on Summit: v2.25
• Module: scalasca
• For instrumentation is used Score-P
• Web site: https://www.scalasca.org
• Email: scalasca@fz-juelich.de
44 Open slide master to edit
Capability Matrix - Scalasca
Capability Profiling Tracing Notes/Limitations
MPI, MPI-IO Yes Yes
OpenMP CPU Yes Yes
OpenMP GPU No No
OpenACC No No Score-P does instrument but CUBE does
not provide information
CUDA No No Score-P does instrument but CUBE does
not provide information
POSIX I/O Yes Yes
POSIX threads Yes Yes
Memory – app-level Yes Yes
Memory – func-level Yes Yes
Hotspot Detection Yes Yes
Variance Detection Yes Yes
Hardware Counters Yes Yes
55 Open slide master to edit
Techniques
• Profile analysis:
– Summary of aggregated metrics
• Per function/call-path and/or per process/thread
– mpiP, TAU, PerfSuite, Vampir
• Time-line analysis
• Pattern analysis
66 Open slide master to edit
Automatic trace analysis
• Apply tracing
• Automatic search for patterns on inefficient behavior
• Classificaiton of behavior
• Much faster than manual trace analysis
• Scalability
77 Open slide master to edit
Workflow
88 Open slide master to edit
CUBE4
• Parallel program analysis report exploration tools
• Libraries for XML report
• Algebra utilities for report processing
• GUI for interactive analysis exploration
• Three coupled tree browsers
• Performance property
• Call-tree path
• System location
• CUBE4 displays severities
• Value for precise comparison
• Colour for easy identification of hostpots
• Inclusive valye when closed and exclusive when expanded
99 Open slide master to edit
Scalasca on Summit
module load scalasca
scalasca
Scalasca 2.5
Toolset for scalable performance analysis of large-scale parallel applications
usage: scalasca [OPTION]... ACTION <argument>...
1. prepare application objects and executable for measurement:
scalasca -instrument <compile-or-link-command> # skin (using scorep)
2. run application under control of measurement system:
scalasca -analyze <application-launch-command> # scan
3. interactively explore measurement analysis report:
scalasca -examine <experiment-archive|report> # square
Options:
-c, --show-config show configuration summary and exit
-h, --help show this help and exit
-n, --dry-run show actions without taking them
--quickref show quick reference guide and exit
--remap-specfile show path to remapper specification file and exit
-v, --verbose enable verbose commentary
-V, --version show version information and exit
1010 Open slide master to edit
Scalasca Workflow
• Compilation: use Score-P
• Execution of the binary for profiling (it will create an output folder):
scalasca -analyze jsrun …
• Examine of the data (GUI is loaded)
scalasca -examine output_folder
1111 Open slide master to edit
MiniWeather – MPI – Tools parameters
• Parameters for Scalasca/Score-P
module load scorep/6.0_r14595
module load scalasca/2.5
export SCOREP_METRIC_PAPI=PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_FP_OPS
export SCOREP_MPI_ENABLE_GROUPS=ALL
export SCAN_MPI_LAUNCHER=jsrun
1212 Open slide master to edit
Instrumentation
1313 Open slide master to edit
MiniWeather - MPI
• Edit the Makefile and add the $PREP before the compiler name
• Compile:
MPI: make PREP="scorep --mpp=mpi --pdt" mpi
• Execution (submission script):
scalasca -analyze jsrun -n 64 -r 8 ./miniWeather_mpi
• Visualize:
scalasca -examine /gpfs/…/scorep_miniWeather_mpi_8p64_sum
1414 Open slide master to edit
CUBE4 – Central View
3 Windows:
Metric tree
Call tree
System tree
1515 Open slide master to edit
Exploring the menus
1616 Open slide master to edit
How to expand the trees
1717 Open slide master to edit
Computation – System tre
1818 Open slide master to edit
Computation – Blox plot
1919 Open slide master to edit
Computation Sunburst
2020 Open slide master to edit
Computation – Process x Thread
2121 Open slide master to edit
Score-P configuration parameters
2222 Open slide master to edit
CUBE – Flat view
2323 Open slide master to edit
Initialization variation
2424 Open slide master to edit
MPI_Comm_dup variation
2525 Open slide master to edit
MPI_Comm_dup variation II
2626 Open slide master to edit
Getting information about metrics
2727 Open slide master to edit
MPI_Allreduce variation
2828 Open slide master to edit
Collective I/O
2929 Open slide master to edit
MPI_Waitall variation
3030 Open slide master to edit
Information about transferred data
3131 Open slide master to edit
Read-Individual operations
3232 Open slide master to edit
Write-Collective operations
3333 Open slide master to edit
Computational imbalance - Overload
3434 Open slide master to edit
Computational imbalance - Underload
3535 Open slide master to edit
Computational imbalace
3636 Open slide master to edit
Bytes read
3737 Open slide master to edit
Instructions
3838 Open slide master to edit
CUBE4 – Derived metrics – Instructions per Cycle
• Right click on any metric of the metric tree, and
select “Edit metric” -> Create derived metric ->
“as a root”
3939 Open slide master to edit
Instructions per cycle (IPC) – Useful computational
workload
There is no specific rule but codes with IPC less than 1.5, can be improved
4040 Open slide master to edit
Floating operations NOT per second
4141 Open slide master to edit
Derived metric of Floating operations to create the
metric Flops
4242 Open slide master to edit
Difference between two executions
cube_diff scorep_miniWeather_mpi_4p64_sum1/profile.cubex
scorep_miniWeather_mpi_8p64_sum2/profile.cubex -c -o result.cubex
cube result.cubex
Tracing with Scalasca
4444 Open slide master to edit
Tracing with Scalasca
• Tracing can/will cause bigger overhead during the execution
of the application
• More information are recorded including timeline
• Scalasca will analyze the trace according to various patterns
and it will identify the bottlenecks
4545 Open slide master to edit
How much memory buffer to use for tracing?
• Examine the profiling data
scalasca -examine -s /gpfs/alpine/.../scorep_miniWeather_mpi_8p64_sum
INFO: Score report written to /gpfs/alpine/…/scorep_miniWeather_mpi_8p64_sum/scorep.score
• head /gpfs/alpine/…/scorep_miniWeather_mpi_8p64_sum/scorep.score
Estimated aggregate size of event trace: 978MB
Estimated requirements for largest trace buffer (max_buf): 16MB
Estimated memory requirements (SCOREP_TOTAL_MEMORY): 18MB
• Add in your submission script (include ~10% extra):
export SCOREP_TOTAL_MEMORY=20MB
4646 Open slide master to edit
Overhead
• You need to declare enough size of SCOREP_TOTAL_MEMORY to avoid flushing of the
performance files.
• For our application, non instrumented execution on 1 node takes ~30 seconds, while for profiling
and tracing is 45 and 53 seconds respectively, so 50% and 76% overhead.
• The overhead always depends on the application and what you instrument, OpenMP etc.
• We have choice of selective instrumentation or manually profiling filter
4747 Open slide master to edit
How to use Scalasca/Score-P with tracing?
• In your submission script, replace:
scalasca -analyze jsrun …
with
scalasca -analyze -q -t jsrun
• The –q disables the profiling.
4848 Open slide master to edit
Initial view with tracing
4949 Open slide master to edit
Computation with tracing and expand trees
5050 Open slide master to edit
Late Sender for Point-to-Point communication
5151 Open slide master to edit
Late Sender - Time
Documentation:
https://apps.fz-juelich.de/scalasca/releases/scalasca/2.5/help/scalasca_patterns.html
5252 Open slide master to edit
Late Sender – Wrong Order Time/Different Sources
5353 Open slide master to edit
Late Sender – Wrong Order Time/Same source
5454 Open slide master to edit
MPI Wait at N x N Time
5555 Open slide master to edit
MPI Wait at N x N Time - Explanation
5656 Open slide master to edit
Number of MPI communications
5757 Open slide master to edit
Short and Long-term delay
5858 Open slide master to edit
Long-term delay sender
5959 Open slide master to edit
Long-term delay sender
6060 Open slide master to edit
Propagating wait states
6161 Open slide master to edit
Tracing start
6262 Open slide master to edit
Terminal wait states
6363 Open slide master to edit
Direct wait stats
6464 Open slide master to edit
Wait States
6565 Open slide master to edit
Imbalance in the critical path
6666 Open slide master to edit
Critical Path Profile
6767 Open slide master to edit
Critical Path Imbalance
6868 Open slide master to edit
Intra-partition Imbalance
6969 Open slide master to edit
Non Critical Path Activities
Scalasca with MPI+OpenMP
7171 Open slide master to edit
MiniWeather – MPI+OMP
• Compile:
MPI: make PREP="scorep --mpp=mpi –openmp --pdt" openmp
• Execution (submission script):
scalasca -analyze -q -t jsrun -n 16 -r 2 -a 1 -c 8 "-b packed:8"
./miniWeather_mpi_openmp
• Visualize:
scalasca -examine /gpfs/…/
scorep_miniWeather_mpi_openmp_2p16x8_trace
7272 Open slide master to edit
OpenMP Views
7373 Open slide master to edit
OpenMP Thread Management Time
7474 Open slide master to edit
Duration of Fork
7575 Open slide master to edit
Source code of corresponding OpenMP call
7676 Open slide master to edit
OpenMP Thread Team Fork
7777 Open slide master to edit
Implicit Synchronization
7878 Open slide master to edit
Implicit Synchronization - Explanation
7979 Open slide master to edit
Idle threads
8080 Open slide master to edit
Idle threads - Explanation
8181 Open slide master to edit
Limited Parallelism – Process x Thread
8282 Open slide master to edit
Limited Parallelism - Explanation
8383 Open slide master to edit
Long-term delay costs
8484 Open slide master to edit
Long/Short-term OpenMP Thread Idleness delay costs
8585 Open slide master to edit
Computational Imbalance overload
8686 Open slide master to edit
Computational Imbalance overload – Process x Thread

More Related Content

What's hot

Spark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc BourlierSpark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc BourlierSpark Summit
 
Restoring US First Website
Restoring US First WebsiteRestoring US First Website
Restoring US First WebsiteAhmed AlSum
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeDatabricks
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueDatabricks
 
opendayight loadBalancer
opendayight loadBalancer opendayight loadBalancer
opendayight loadBalancer Khubaib Mahar
 
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...Spark Summit
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit
 
OSMC 2010 | Monitoring mit Icinga by Icinga Team
OSMC 2010 | Monitoring mit Icinga by Icinga TeamOSMC 2010 | Monitoring mit Icinga by Icinga Team
OSMC 2010 | Monitoring mit Icinga by Icinga TeamNETWAYS
 

What's hot (10)

Spark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc BourlierSpark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc Bourlier
 
Fig 9-02
Fig 9-02Fig 9-02
Fig 9-02
 
Restoring US First Website
Restoring US First WebsiteRestoring US First Website
Restoring US First Website
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta Lake
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert Xue
 
opendayight loadBalancer
opendayight loadBalancer opendayight loadBalancer
opendayight loadBalancer
 
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
Apache Spark-Bench: Simulate, Test, Compare, Exercise, and Yes, Benchmark wit...
 
Fig 9-03
Fig 9-03Fig 9-03
Fig 9-03
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van Hovell
 
OSMC 2010 | Monitoring mit Icinga by Icinga Team
OSMC 2010 | Monitoring mit Icinga by Icinga TeamOSMC 2010 | Monitoring mit Icinga by Icinga Team
OSMC 2010 | Monitoring mit Icinga by Icinga Team
 

Similar to Performance Analysis with Scalasca on Summit Supercomputer part I

Introduction to Extrae/Paraver, part I
Introduction to Extrae/Paraver, part IIntroduction to Extrae/Paraver, part I
Introduction to Extrae/Paraver, part IGeorge Markomanolis
 
How to use TAU for Performance Analysis on Summit Supercomputer
How to use TAU for Performance Analysis on Summit SupercomputerHow to use TAU for Performance Analysis on Summit Supercomputer
How to use TAU for Performance Analysis on Summit SupercomputerGeorge Markomanolis
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Databricks
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsDatabricks
 
Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Etu Solution
 
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Databricks
 
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases SharingDeep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases SharingDatabricks
 
Boost your productivity with Scala tooling!
Boost your productivity  with Scala tooling!Boost your productivity  with Scala tooling!
Boost your productivity with Scala tooling!MeriamLachkar1
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsDatabricks
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Landon Robinson
 
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...Lucidworks
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen IIPorting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen IIGeorge Markomanolis
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...Databricks
 
Mitaka_IntroStarlink.pdf
Mitaka_IntroStarlink.pdfMitaka_IntroStarlink.pdf
Mitaka_IntroStarlink.pdfrobinsroy28
 
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Lalit Panwar
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDatabricks
 
Energy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopEnergy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopQuantUniversity
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Databricks
 
Native support of Prometheus monitoring in Apache Spark 3
Native support of Prometheus monitoring in Apache Spark 3Native support of Prometheus monitoring in Apache Spark 3
Native support of Prometheus monitoring in Apache Spark 3Dongjoon Hyun
 

Similar to Performance Analysis with Scalasca on Summit Supercomputer part I (20)

Introduction to Extrae/Paraver, part I
Introduction to Extrae/Paraver, part IIntroduction to Extrae/Paraver, part I
Introduction to Extrae/Paraver, part I
 
How to use TAU for Performance Analysis on Summit Supercomputer
How to use TAU for Performance Analysis on Summit SupercomputerHow to use TAU for Performance Analysis on Summit Supercomputer
How to use TAU for Performance Analysis on Summit Supercomputer
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析
 
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
 
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases SharingDeep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
 
Boost your productivity with Scala tooling!
Boost your productivity  with Scala tooling!Boost your productivity  with Scala tooling!
Boost your productivity with Scala tooling!
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
 
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen IIPorting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 
Mitaka_IntroStarlink.pdf
Mitaka_IntroStarlink.pdfMitaka_IntroStarlink.pdf
Mitaka_IntroStarlink.pdf
 
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
 
Energy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopEnergy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshop
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
 
Native support of Prometheus monitoring in Apache Spark 3
Native support of Prometheus monitoring in Apache Spark 3Native support of Prometheus monitoring in Apache Spark 3
Native support of Prometheus monitoring in Apache Spark 3
 

More from George Markomanolis

Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerGeorge Markomanolis
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapGeorge Markomanolis
 
Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer George Markomanolis
 
Analyzing ECP Proxy Apps with the Profiling Tool Score-P
Analyzing ECP Proxy Apps with the Profiling Tool Score-PAnalyzing ECP Proxy Apps with the Profiling Tool Score-P
Analyzing ECP Proxy Apps with the Profiling Tool Score-PGeorge Markomanolis
 
Performance Analysis with Scalasca, part II
Performance Analysis with Scalasca, part IIPerformance Analysis with Scalasca, part II
Performance Analysis with Scalasca, part IIGeorge Markomanolis
 
Performance Analysis with TAU on Summit Supercomputer, part II
Performance Analysis with TAU on Summit Supercomputer, part IIPerformance Analysis with TAU on Summit Supercomputer, part II
Performance Analysis with TAU on Summit Supercomputer, part IIGeorge Markomanolis
 
Burst Buffer: From Alpha to Omega
Burst Buffer: From Alpha to OmegaBurst Buffer: From Alpha to Omega
Burst Buffer: From Alpha to OmegaGeorge Markomanolis
 
Optimizing an Earth Science Atmospheric Application with the OmpSs Programmin...
Optimizing an Earth Science Atmospheric Application with the OmpSs Programmin...Optimizing an Earth Science Atmospheric Application with the OmpSs Programmin...
Optimizing an Earth Science Atmospheric Application with the OmpSs Programmin...George Markomanolis
 
Introduction to Performance Analysis tools on Shaheen II
Introduction to Performance Analysis tools on Shaheen IIIntroduction to Performance Analysis tools on Shaheen II
Introduction to Performance Analysis tools on Shaheen IIGeorge Markomanolis
 

More from George Markomanolis (15)

Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI Supercomputer
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
 
Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer
 
Getting started with AMD GPUs
Getting started with AMD GPUsGetting started with AMD GPUs
Getting started with AMD GPUs
 
Analyzing ECP Proxy Apps with the Profiling Tool Score-P
Analyzing ECP Proxy Apps with the Profiling Tool Score-PAnalyzing ECP Proxy Apps with the Profiling Tool Score-P
Analyzing ECP Proxy Apps with the Profiling Tool Score-P
 
Performance Analysis with Scalasca, part II
Performance Analysis with Scalasca, part IIPerformance Analysis with Scalasca, part II
Performance Analysis with Scalasca, part II
 
Performance Analysis with TAU on Summit Supercomputer, part II
Performance Analysis with TAU on Summit Supercomputer, part IIPerformance Analysis with TAU on Summit Supercomputer, part II
Performance Analysis with TAU on Summit Supercomputer, part II
 
Introducing IO-500 benchmark
Introducing IO-500 benchmarkIntroducing IO-500 benchmark
Introducing IO-500 benchmark
 
Experience using the IO-500
Experience using the IO-500Experience using the IO-500
Experience using the IO-500
 
Harshad - Handle Darshan Data
Harshad - Handle Darshan DataHarshad - Handle Darshan Data
Harshad - Handle Darshan Data
 
Lustre Best Practices
Lustre Best Practices Lustre Best Practices
Lustre Best Practices
 
Burst Buffer: From Alpha to Omega
Burst Buffer: From Alpha to OmegaBurst Buffer: From Alpha to Omega
Burst Buffer: From Alpha to Omega
 
Optimizing an Earth Science Atmospheric Application with the OmpSs Programmin...
Optimizing an Earth Science Atmospheric Application with the OmpSs Programmin...Optimizing an Earth Science Atmospheric Application with the OmpSs Programmin...
Optimizing an Earth Science Atmospheric Application with the OmpSs Programmin...
 
markomanolis_phd_defense
markomanolis_phd_defensemarkomanolis_phd_defense
markomanolis_phd_defense
 
Introduction to Performance Analysis tools on Shaheen II
Introduction to Performance Analysis tools on Shaheen IIIntroduction to Performance Analysis tools on Shaheen II
Introduction to Performance Analysis tools on Shaheen II
 

Recently uploaded

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Performance Analysis with Scalasca on Summit Supercomputer part I

  • 1. ORNL is managed by UT-Battelle, LLC for the US Department of Energy Performance Analysis with Scalasca George S. Markomanolis 7 August 2019
  • 2. 22 Open slide master to edit Outline • Introduction to Scalasca • How to compile (using Score-P) • Explaining functionalities of Scalasca/CUBE4 on two applications • Testing a case with a large trace
  • 3. 33 Open slide master to edit Scalasca • Scalasca is a software tool that supports the performance analysis of parallel applications • The analysis identifies potential performance bottlenecks – in particular those concerning communication and synchronization – and offers guidance in exploring their causes. • Installed version on Summit: v2.25 • Module: scalasca • For instrumentation is used Score-P • Web site: https://www.scalasca.org • Email: scalasca@fz-juelich.de
  • 4. 44 Open slide master to edit Capability Matrix - Scalasca Capability Profiling Tracing Notes/Limitations MPI, MPI-IO Yes Yes OpenMP CPU Yes Yes OpenMP GPU No No OpenACC No No Score-P does instrument but CUBE does not provide information CUDA No No Score-P does instrument but CUBE does not provide information POSIX I/O Yes Yes POSIX threads Yes Yes Memory – app-level Yes Yes Memory – func-level Yes Yes Hotspot Detection Yes Yes Variance Detection Yes Yes Hardware Counters Yes Yes
  • 5. 55 Open slide master to edit Techniques • Profile analysis: – Summary of aggregated metrics • Per function/call-path and/or per process/thread – mpiP, TAU, PerfSuite, Vampir • Time-line analysis • Pattern analysis
  • 6. 66 Open slide master to edit Automatic trace analysis • Apply tracing • Automatic search for patterns on inefficient behavior • Classificaiton of behavior • Much faster than manual trace analysis • Scalability
  • 7. 77 Open slide master to edit Workflow
  • 8. 88 Open slide master to edit CUBE4 • Parallel program analysis report exploration tools • Libraries for XML report • Algebra utilities for report processing • GUI for interactive analysis exploration • Three coupled tree browsers • Performance property • Call-tree path • System location • CUBE4 displays severities • Value for precise comparison • Colour for easy identification of hostpots • Inclusive valye when closed and exclusive when expanded
  • 9. 99 Open slide master to edit Scalasca on Summit module load scalasca scalasca Scalasca 2.5 Toolset for scalable performance analysis of large-scale parallel applications usage: scalasca [OPTION]... ACTION <argument>... 1. prepare application objects and executable for measurement: scalasca -instrument <compile-or-link-command> # skin (using scorep) 2. run application under control of measurement system: scalasca -analyze <application-launch-command> # scan 3. interactively explore measurement analysis report: scalasca -examine <experiment-archive|report> # square Options: -c, --show-config show configuration summary and exit -h, --help show this help and exit -n, --dry-run show actions without taking them --quickref show quick reference guide and exit --remap-specfile show path to remapper specification file and exit -v, --verbose enable verbose commentary -V, --version show version information and exit
  • 10. 1010 Open slide master to edit Scalasca Workflow • Compilation: use Score-P • Execution of the binary for profiling (it will create an output folder): scalasca -analyze jsrun … • Examine of the data (GUI is loaded) scalasca -examine output_folder
  • 11. 1111 Open slide master to edit MiniWeather – MPI – Tools parameters • Parameters for Scalasca/Score-P module load scorep/6.0_r14595 module load scalasca/2.5 export SCOREP_METRIC_PAPI=PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_FP_OPS export SCOREP_MPI_ENABLE_GROUPS=ALL export SCAN_MPI_LAUNCHER=jsrun
  • 12. 1212 Open slide master to edit Instrumentation
  • 13. 1313 Open slide master to edit MiniWeather - MPI • Edit the Makefile and add the $PREP before the compiler name • Compile: MPI: make PREP="scorep --mpp=mpi --pdt" mpi • Execution (submission script): scalasca -analyze jsrun -n 64 -r 8 ./miniWeather_mpi • Visualize: scalasca -examine /gpfs/…/scorep_miniWeather_mpi_8p64_sum
  • 14. 1414 Open slide master to edit CUBE4 – Central View 3 Windows: Metric tree Call tree System tree
  • 15. 1515 Open slide master to edit Exploring the menus
  • 16. 1616 Open slide master to edit How to expand the trees
  • 17. 1717 Open slide master to edit Computation – System tre
  • 18. 1818 Open slide master to edit Computation – Blox plot
  • 19. 1919 Open slide master to edit Computation Sunburst
  • 20. 2020 Open slide master to edit Computation – Process x Thread
  • 21. 2121 Open slide master to edit Score-P configuration parameters
  • 22. 2222 Open slide master to edit CUBE – Flat view
  • 23. 2323 Open slide master to edit Initialization variation
  • 24. 2424 Open slide master to edit MPI_Comm_dup variation
  • 25. 2525 Open slide master to edit MPI_Comm_dup variation II
  • 26. 2626 Open slide master to edit Getting information about metrics
  • 27. 2727 Open slide master to edit MPI_Allreduce variation
  • 28. 2828 Open slide master to edit Collective I/O
  • 29. 2929 Open slide master to edit MPI_Waitall variation
  • 30. 3030 Open slide master to edit Information about transferred data
  • 31. 3131 Open slide master to edit Read-Individual operations
  • 32. 3232 Open slide master to edit Write-Collective operations
  • 33. 3333 Open slide master to edit Computational imbalance - Overload
  • 34. 3434 Open slide master to edit Computational imbalance - Underload
  • 35. 3535 Open slide master to edit Computational imbalace
  • 36. 3636 Open slide master to edit Bytes read
  • 37. 3737 Open slide master to edit Instructions
  • 38. 3838 Open slide master to edit CUBE4 – Derived metrics – Instructions per Cycle • Right click on any metric of the metric tree, and select “Edit metric” -> Create derived metric -> “as a root”
  • 39. 3939 Open slide master to edit Instructions per cycle (IPC) – Useful computational workload There is no specific rule but codes with IPC less than 1.5, can be improved
  • 40. 4040 Open slide master to edit Floating operations NOT per second
  • 41. 4141 Open slide master to edit Derived metric of Floating operations to create the metric Flops
  • 42. 4242 Open slide master to edit Difference between two executions cube_diff scorep_miniWeather_mpi_4p64_sum1/profile.cubex scorep_miniWeather_mpi_8p64_sum2/profile.cubex -c -o result.cubex cube result.cubex
  • 44. 4444 Open slide master to edit Tracing with Scalasca • Tracing can/will cause bigger overhead during the execution of the application • More information are recorded including timeline • Scalasca will analyze the trace according to various patterns and it will identify the bottlenecks
  • 45. 4545 Open slide master to edit How much memory buffer to use for tracing? • Examine the profiling data scalasca -examine -s /gpfs/alpine/.../scorep_miniWeather_mpi_8p64_sum INFO: Score report written to /gpfs/alpine/…/scorep_miniWeather_mpi_8p64_sum/scorep.score • head /gpfs/alpine/…/scorep_miniWeather_mpi_8p64_sum/scorep.score Estimated aggregate size of event trace: 978MB Estimated requirements for largest trace buffer (max_buf): 16MB Estimated memory requirements (SCOREP_TOTAL_MEMORY): 18MB • Add in your submission script (include ~10% extra): export SCOREP_TOTAL_MEMORY=20MB
  • 46. 4646 Open slide master to edit Overhead • You need to declare enough size of SCOREP_TOTAL_MEMORY to avoid flushing of the performance files. • For our application, non instrumented execution on 1 node takes ~30 seconds, while for profiling and tracing is 45 and 53 seconds respectively, so 50% and 76% overhead. • The overhead always depends on the application and what you instrument, OpenMP etc. • We have choice of selective instrumentation or manually profiling filter
  • 47. 4747 Open slide master to edit How to use Scalasca/Score-P with tracing? • In your submission script, replace: scalasca -analyze jsrun … with scalasca -analyze -q -t jsrun • The –q disables the profiling.
  • 48. 4848 Open slide master to edit Initial view with tracing
  • 49. 4949 Open slide master to edit Computation with tracing and expand trees
  • 50. 5050 Open slide master to edit Late Sender for Point-to-Point communication
  • 51. 5151 Open slide master to edit Late Sender - Time Documentation: https://apps.fz-juelich.de/scalasca/releases/scalasca/2.5/help/scalasca_patterns.html
  • 52. 5252 Open slide master to edit Late Sender – Wrong Order Time/Different Sources
  • 53. 5353 Open slide master to edit Late Sender – Wrong Order Time/Same source
  • 54. 5454 Open slide master to edit MPI Wait at N x N Time
  • 55. 5555 Open slide master to edit MPI Wait at N x N Time - Explanation
  • 56. 5656 Open slide master to edit Number of MPI communications
  • 57. 5757 Open slide master to edit Short and Long-term delay
  • 58. 5858 Open slide master to edit Long-term delay sender
  • 59. 5959 Open slide master to edit Long-term delay sender
  • 60. 6060 Open slide master to edit Propagating wait states
  • 61. 6161 Open slide master to edit Tracing start
  • 62. 6262 Open slide master to edit Terminal wait states
  • 63. 6363 Open slide master to edit Direct wait stats
  • 64. 6464 Open slide master to edit Wait States
  • 65. 6565 Open slide master to edit Imbalance in the critical path
  • 66. 6666 Open slide master to edit Critical Path Profile
  • 67. 6767 Open slide master to edit Critical Path Imbalance
  • 68. 6868 Open slide master to edit Intra-partition Imbalance
  • 69. 6969 Open slide master to edit Non Critical Path Activities
  • 71. 7171 Open slide master to edit MiniWeather – MPI+OMP • Compile: MPI: make PREP="scorep --mpp=mpi –openmp --pdt" openmp • Execution (submission script): scalasca -analyze -q -t jsrun -n 16 -r 2 -a 1 -c 8 "-b packed:8" ./miniWeather_mpi_openmp • Visualize: scalasca -examine /gpfs/…/ scorep_miniWeather_mpi_openmp_2p16x8_trace
  • 72. 7272 Open slide master to edit OpenMP Views
  • 73. 7373 Open slide master to edit OpenMP Thread Management Time
  • 74. 7474 Open slide master to edit Duration of Fork
  • 75. 7575 Open slide master to edit Source code of corresponding OpenMP call
  • 76. 7676 Open slide master to edit OpenMP Thread Team Fork
  • 77. 7777 Open slide master to edit Implicit Synchronization
  • 78. 7878 Open slide master to edit Implicit Synchronization - Explanation
  • 79. 7979 Open slide master to edit Idle threads
  • 80. 8080 Open slide master to edit Idle threads - Explanation
  • 81. 8181 Open slide master to edit Limited Parallelism – Process x Thread
  • 82. 8282 Open slide master to edit Limited Parallelism - Explanation
  • 83. 8383 Open slide master to edit Long-term delay costs
  • 84. 8484 Open slide master to edit Long/Short-term OpenMP Thread Idleness delay costs
  • 85. 8585 Open slide master to edit Computational Imbalance overload
  • 86. 8686 Open slide master to edit Computational Imbalance overload – Process x Thread