Event Tracing with VampirTrace and Vampir


Zellescher Weg 12
Willers-Bau A114
Tel. +49 351 - 463 - 38323




Andreas Knüpfer (andreas.knuepfer@tu-dresden.de)
Overview

 ■ Introduction
 ■ Event Tracing Overview
 ■ Instrumentation
 ■ Run-Time Measurement
 ■ Conclusions

Introduction



Why bother with performance analysis?

 ■ Moore's Law is still in charge, so why bother?
 ■ it is increasingly difficult to get close to peak performance
   – for sequential computation
       • memory wall
       • optimum pipelining, ...
   – for parallel interaction
       • Amdahl's law
       • synchronization with a single late-comer, ...

 ■ efficiency is important because resources are limited
 ■ scalability is important to cope with the next bigger simulation

Profiling and Tracing

Profile Recording
 ■ of aggregated information (time, counts, …)
 ■ about program and system entities
   – functions, loops, basic blocks
   – application, processes, threads, …

Methods of Profile Creation
 ■ sampling (statistical approach)
 ■ direct measurement (deterministic approach)

Profiling and Tracing

Trace Recording
 ■ run-time events (points of interest)
 ■ during program execution
 ■ saved as event records
   – time stamp, process, thread, event type
   – event-specific information
 ■ via instrumentation & trace library

Event Trace
 ■ collection of all events of a process / program
 ■ sorted by time stamp

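The event record and the time-sorted event trace described above could look roughly like this in C. This is a minimal sketch, not the record layout of any real trace format: the type `event_t`, its fields, and the function `sort_trace` are hypothetical names chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event record: the common fields named on the slide
   plus one slot for event-specific information. */
typedef enum { EV_ENTER, EV_LEAVE, EV_SEND, EV_RECV } event_type_t;

typedef struct {
    uint64_t     timestamp;  /* timer ticks */
    uint32_t     process;
    uint32_t     thread;
    event_type_t type;
    uint32_t     data;       /* e.g. function ID for ENTER/LEAVE */
} event_t;

/* qsort comparator: order events by ascending time stamp. */
static int cmp_timestamp(const void *a, const void *b)
{
    const event_t *x = a;
    const event_t *y = b;
    return (x->timestamp > y->timestamp) - (x->timestamp < y->timestamp);
}

/* Turn an unordered collection of events into an event trace. */
void sort_trace(event_t *events, size_t n)
{
    qsort(events, n, sizeof *events, cmp_timestamp);
}
```

Note that `qsort` is not guaranteed to be stable, so events with identical time stamps may change their relative order.
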
Profiling and Tracing

Tracing Advantages
 ■ preserve temporal and spatial relationships (context)
 ■ allow reconstruction of dynamic behavior
 ■ profiles can be calculated from traces

Tracing Disadvantages
 ■ traces can become very large
 ■ may cause perturbation
 ■ instrumentation and tracing are complicated
   – event buffering, clock synchronization, …

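To illustrate the point that profiles can be calculated from traces, the sketch below derives inclusive time and call counts per function from the ENTER/LEAVE events of one process. It is an illustration under stated assumptions, not how Vampir computes its statistics: `ev_t`, `profile_from_trace`, and the limits `MAX_FUNCS` and `MAX_DEPTH` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FUNCS 64   /* assumed upper bound on function IDs */
#define MAX_DEPTH 64   /* assumed upper bound on call depth */

typedef enum { EV_ENTER, EV_LEAVE } ev_type_t;

typedef struct {
    uint64_t  timestamp;
    ev_type_t type;
    uint32_t  func;      /* function ID */
} ev_t;

/* Accumulate inclusive time and call counts per function from the
   time-ordered ENTER/LEAVE events of a single process.
   Returns 0 on success, -1 if the trace is malformed or unbalanced. */
int profile_from_trace(const ev_t *trace, size_t n,
                       uint64_t incl_time[MAX_FUNCS],
                       uint64_t calls[MAX_FUNCS])
{
    struct { uint32_t func; uint64_t entered; } stack[MAX_DEPTH];
    size_t depth = 0;

    for (size_t f = 0; f < MAX_FUNCS; f++) {
        incl_time[f] = 0;
        calls[f] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        uint32_t func = trace[i].func;
        if (func >= MAX_FUNCS)
            return -1;
        if (trace[i].type == EV_ENTER) {
            if (depth == MAX_DEPTH)
                return -1;
            stack[depth].func = func;
            stack[depth].entered = trace[i].timestamp;
            depth++;
            calls[func]++;
        } else {
            /* A LEAVE must match the innermost open ENTER. */
            if (depth == 0 || stack[depth - 1].func != func)
                return -1;
            depth--;
            incl_time[func] += trace[i].timestamp - stack[depth].entered;
        }
    }
    return depth == 0 ? 0 : -1;
}
```

The reverse direction does not work: once events are aggregated into totals, their order and context cannot be recovered.
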
Event Tracing Overview



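Event Tracing from A to Z

[Diagram: three stages from left to right.
 Instrumentation: the source (src) or the executable (exec.) is instrumented; see more below.
 Run-Time Measurement: the instrumented executable runs and writes trace file(s).
 Visualization / Analysis: see following presentation.]
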
Most common event types

Which events to monitor?
 ■ enter/leave of function/routine/region
   – time stamp, process/thread, function ID
 ■ send/receive of P2P message (MPI)
   – time stamp, sender, receiver, length, tag, communicator
 ■ collective communication (MPI)
   – time stamp, process, root, communicator, # bytes
 ■ hardware performance counter value
   – time stamp, process, counter ID, value

 ■ corresponding “record types” in trace file format

Parallel Trace Files

Definitions:
  DEF TIMERRES 1000000000
  DEF PROCESS 1 `Master`
  DEF PROCESS 2 `Slave`
  DEF FUNCTION 5 `main`
  DEF FUNCTION 6 `foo`
  DEF FUNCTION 9 `bar`
  DEF FUNCTION 12 `MPI_Send`
  DEF FUNCTION 13 `MPI_Recv`

Events of process 1:
  10010 P 1 ENTER 5
  10090 P 1 ENTER 6
  10110 P 1 ENTER 12
  10110 P 1 SEND TO 3 LEN 1024 ...
  10330 P 1 LEAVE 12
  10400 P 1 LEAVE 6
  10520 P 1 ENTER 9
  10550 P 1 LEAVE 9
  ...

Events of process 2:
  10020 P 2 ENTER 5
  10095 P 2 ENTER 6
  10120 P 2 ENTER 13
  10300 P 2 RECV FROM 3 LEN 1024 ...
  10350 P 2 LEAVE 13
  10450 P 2 LEAVE 6
  10620 P 2 ENTER 9
  10650 P 2 LEAVE 9
  ...

                     Trace Format Schematics

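Trace Visualization: Timeline Display
[Figure: screenshot of the timeline display]

Trace Visualization: Process Timeline Display
[Figure: screenshot of the process timeline display]

Trace Visualization: Statistic Summary Display
[Figure: screenshot of the statistic summary display]

Trace Visualization: Message Statistics Display
[Figure: screenshot of the message statistics display]
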
The Vampir Tool Family

VampirTrace
 ■ convenient instrumentation and measurement
 ■ hides away complicated details
 ■ provides many options and switches for experts
 ■ VampirTrace is part of Open MPI 1.3

Vampir/VampirServer
 ■ interactive trace visualization and analysis
 ■ intuitive browsing and zooming
 ■ scalable to large trace data sizes (100 GB)
 ■ scalable to high parallelism (2000 processes)

Vampir for Windows is in progress; a beta version is available

Trace File Formats

Open Trace Format (OTF)
 ■ Open source trace file format
 ■ Includes powerful libotf for use in custom applications
 ■ High-level interface for tools + low-level interface for trace libraries

Other Formats
 ■ TAU Trace Format (Univ. of Oregon)
 ■ Epilog (ZAM, FZ Jülich)
 ■ STF (Pallas, now Intel)

Other Tools

Other Event Tracing Tools
 ■ TAU profiling (University of Oregon, USA)
   – profiling and tracing for parallel applications
   – http://www.cs.uoregon.edu/research/tau/
 ■ Paraver (CEPBA, Barcelona, Spain)
   – trace-based parallel performance analysis and visualization
   – http://www.cepba.upc.edu/paraver/
 ■ Scalasca (FZ Jülich)
   – tracing and automatic detection of performance problems
   – http://www.scalasca.org/
 ■ Intel Trace Collector & Analyzer
   – very similar to Vampir

Instrumentation



Instrumentation

Instrumentation: the process of modifying programs to detect and report
   events by calling instrumentation functions.

 ■ instrumentation functions are provided by the trace library
 ■ each call notifies the trace library of a run-time event

 ■ there are various ways of instrumentation

Instrumentation

Edit – Compile – Run Cycle
  Source Code -> [Compiler] -> Binary -> [Run] -> Results

Edit – Compile – Run Cycle with VampirTrace
  Source Code -> [Compiler + VT Wrapper] -> Binary -> [Run] -> Results + Traces

Instrumentation Types

 ■ Source code instrumentation
   – manually
   – automatically
 ■ Instrumentation with wrapper functions
 ■ Library pre-load instrumentation
 ■ Compiler instrumentation
 ■ Binary instrumentation

 ■ VampirTrace supports different methods of instrumentation
 ■ Hidden in compiler wrappers

Source Code Instrumentation

Original:

    int foo(void* arg) {
        if (cond) {
            return 1;
        }
        return 0;
    }

Instrumented:

    int foo(void* arg) {
        enter(7);
        if (cond) {
            leave(7);
            return 1;
        }
        leave(7);
        return 0;
    }

                       manually or automatically

Source Code Instrumentation

manually
 ■ large effort
 ■ error prone
 ■ difficult to manage

automatically
 ■ via source-to-source translation
 ■ Program Database Toolkit (PDT)
   http://www.cs.uoregon.edu/research/pdt/
 ■ OpenMP Pragma And Region Instrumentor (Opari)
   http://www.fz-juelich.de/zam/kojak/opari/

Instrumentation with Wrapper Functions

 ■ provide wrapper functions
   – call instrumentation function for notification
   – call original target for actual functionality

 ■ implement via library pre-load
 ■ or via preprocessor directives

  #define fread WRAPPER_glibc_fread
  #define fwrite WRAPPER_glibc_fwrite

 ■ suitable for standard libraries (e.g. MPI, glibc)
 ■ can evaluate function call semantics (function signature, arguments)

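The preprocessor variant of a wrapper function might be sketched as follows. Only the macro line and the name `WRAPPER_glibc_fread` come from the slide; the counters `fread_calls` and `fread_bytes` and the function `read_header` are hypothetical stand-ins for a trace library and for application code. Redefining a standard library name as a macro is formally outside what the C standard guarantees, although it works with common toolchains.

```c
#include <stdio.h>

/* Hypothetical counters standing in for calls into a trace library. */
static unsigned long fread_calls = 0;
static unsigned long fread_bytes = 0;

/* Wrapper: call the original target, then report what happened.
   It is defined before the #define below, so fread here is the real one. */
size_t WRAPPER_glibc_fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    size_t items = fread(ptr, size, nmemb, stream);
    fread_calls++;
    fread_bytes += items * size;   /* argument values are visible to the wrapper */
    return items;
}

/* From here on, every fread in the application is redirected. */
#define fread WRAPPER_glibc_fread

/* Application code, unchanged apart from the redirection above. */
size_t read_header(FILE *f, char *buf, size_t len)
{
    return fread(buf, 1, len, f);
}
```

Because the wrapper sees the real arguments and return value, it can record quantities such as the number of bytes read, which plain enter/leave instrumentation cannot.
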
The MPI Profiling Interface

 ■ Instrumentation via library pre-load, e.g. for MPI
 ■ Each MPI function has two names:
   – MPI_xxx and PMPI_xxx
 ■ Selective replacement of MPI routines at link time

[Diagram: the user program calls MPI_Send. If the wrapper library provides MPI_Send,
 the call goes to the wrapper, which calls PMPI_Send in the MPI library. Otherwise
 the call goes directly to MPI_Send in the MPI library.]

Compiler Instrumentation

gcc -finstrument-functions -c foo.c

  void __cyg_profile_func_enter( <args> );
  void __cyg_profile_func_exit( <args> );

 ■ many compilers support instrumentation:
   (GCC, Intel, IBM, PGI, NEC, Hitachi, Sun Fortran, …)
 ■ no common API, different command line switches, different behavior
 ■ no source modification necessary
 ■ managed by VampirTrace

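For GCC, the two hooks named above take the address of the instrumented function and the address of its call site. The sketch below shows one way a trace library could implement them; the counters are hypothetical placeholders for real event recording, and this is not VampirTrace's implementation.

```c
/* Hypothetical state standing in for a real trace buffer. */
static unsigned long enter_events = 0;
static unsigned long exit_events = 0;
static int call_depth = 0;

/* GCC inserts calls to these hooks at the entry and exit of every
   function compiled with -finstrument-functions. The attribute keeps
   the hooks themselves uninstrumented; otherwise they would recurse. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;     /* address of the function being entered */
    (void)call_site;   /* address it was called from */
    enter_events++;
    call_depth++;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    exit_events++;
    call_depth--;
}
```

The hooks receive addresses rather than names, so a real tool additionally has to map addresses to function names, for example through the symbol table of the executable.
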
Dynamic Instrumentation

 ■ modify binary executable in main memory (or in a file)
 ■ insert instrumentation calls
 ■ very platform/machine dependent
 ■ expensive

Using the DynInst project
 ■ provides common interface to binary instrumentation
 ■ available for Alpha/Tru64, MIPS/IRIX, PowerPC/AIX,
   Sparc/Solaris, x86/Linux+Windows, ia64/Linux
 ■ see http://www.dyninst.org

Practical Instrumentation

 ■ Use VampirTrace compiler wrappers
 ■ Internals and platform specifics are hidden
 ■ Select appropriate way(s) of instrumentation
 ■ Substitute calls to the regular compiler with calls to compiler wrappers
    before:  CC=mpicc
    after:   CC=vtcc

Run Time Measurement



Trace Library

What does the trace library do?
 ■ provide instrumentation functions
 ■ receive events of various types
 ■ collect event properties
   – time stamp
   – location (thread, process, cluster node, MPI rank)
   – event-specific properties
   – perhaps hardware performance counter values
 ■ record to memory buffer, flush eventually
 ■ try to be fast, minimize overhead

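The "record to memory buffer, flush eventually" step can be sketched as below. This is an illustration only: `rec_t`, `trace_buf_t`, `trace_record`, and `trace_flush` are hypothetical names, the buffer is deliberately tiny, and a real trace library would also handle compression and per-thread buffers.

```c
#include <stdint.h>
#include <stdio.h>

#define BUF_EVENTS 4   /* tiny on purpose; real buffers are far larger */

typedef struct {
    uint64_t timestamp;
    uint32_t type;
    uint32_t data;
} rec_t;

typedef struct {
    rec_t  buf[BUF_EVENTS];
    size_t used;      /* records currently held in memory */
    size_t flushes;   /* how often the buffer was written out */
    FILE  *out;       /* trace file of this process */
} trace_buf_t;

/* Write all buffered records to the file. Returns 0 on success. */
int trace_flush(trace_buf_t *tb)
{
    if (tb->used == 0)
        return 0;
    if (fwrite(tb->buf, sizeof tb->buf[0], tb->used, tb->out) != tb->used)
        return -1;
    tb->used = 0;
    tb->flushes++;
    return 0;
}

/* Append one record; the slow file I/O happens only when the buffer is full. */
int trace_record(trace_buf_t *tb, rec_t r)
{
    if (tb->used == BUF_EVENTS && trace_flush(tb) != 0)
        return -1;
    tb->buf[tb->used++] = r;
    return 0;
}
```

Recording into memory keeps the common path cheap, but each flush is an expensive operation that itself perturbs the measured program.
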
Run-Time Options

 ■ There are a number of run-time options
 ■ Controlled by environment variables

 ■ PAPI hardware performance counters
 ■ Memory allocation counters
 ■ Application I/O calls
 ■ Filtering
 ■ Grouping
 ■ more ...

 ■ see more in the following presentations and hands-on parts

Performance Counters

 ■ Include hardware performance counters in traces
   – via PAPI library
   – or Sun Solaris CPC counters
   – or NEC SX counters
 ■ set the VT_METRICS environment variable to a colon-separated list of counters
 ■ see the papi_avail and papi_command_line tools etc.
 ■ see the VampirTrace documentation for CPC and NEC counters

  export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM

Memory Allocation Tracing

 ■ monitor memory allocation behavior
 ■ record memory volume as counter
 ■ record glibc calls like “malloc” and “free” as function calls
 ■ via environment variable VT_MEMTRACE

  export VT_MEMTRACE=yes

I/O Tracing

 ■ monitor POSIX I/O behavior
 ■ record read/write rates as counters
 ■ record standard I/O calls like “open” and “read”
 ■ via environment variable VT_IOTRACE

  export VT_IOTRACE=yes

 ■ mmap I/O not supported

Function Filtering

 ■ selective tracing of certain functions/subroutines
 ■ one way to reduce trace file size!
 ■ via environment variable VT_FILTER_SPEC
  export VT_FILTER_SPEC=/home/user/filter.spec
 ■ run-time filtering, no re-compilation or re-linking
 ■ example filter.spec:
  my*;test -- 1000
  calculate -- -1
  * -- 1000000

 ■ see also the vtfilter tool
   – can create a filter file with rough target size estimate
   – can apply a filter to an existing trace file as post-processing

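The effect of such call-limit rules can be sketched with POSIX `fnmatch` for the wildcard matching. The sketch rests on assumptions that the slide does not state: the first matching rule decides, a limit of -1 means unlimited, and calls are counted per rule rather than per function to keep the code short. `filter_rule_t` and `filter_accept` are hypothetical names, not part of VampirTrace.

```c
#include <fnmatch.h>
#include <stddef.h>

typedef struct {
    const char *pattern;  /* shell-style wildcard pattern, e.g. "my*" */
    long        limit;    /* maximum recorded calls; -1 = unlimited (assumed) */
    long        seen;     /* calls recorded so far under this rule */
} filter_rule_t;

/* Return 1 if this call of func should be recorded, 0 if it is filtered.
   Assumption: the first matching rule decides; a function that matches
   no rule is recorded. */
int filter_accept(filter_rule_t *rules, size_t n, const char *func)
{
    for (size_t i = 0; i < n; i++) {
        if (fnmatch(rules[i].pattern, func, 0) != 0)
            continue;
        if (rules[i].limit < 0)
            return 1;
        if (rules[i].seen >= rules[i].limit)
            return 0;
        rules[i].seen++;
        return 1;
    }
    return 1;
}
```

Since the decision is taken at run time for every call, changing the filter file changes what is recorded without rebuilding the program.
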
Function Grouping

 ■ define user-specified groups
 ■ highlight application behavior, different activities, program phases
   – communication, computation, initialization, different libraries, ...
 ■ groups are mapped to colors in Vampir displays
 ■ run-time grouping, no re-compilation or re-linking

 ■ via environment variable VT_GROUPS_SPEC
  export VT_GROUPS_SPEC=/home/<user>/groups.spec
 ■ the file contains a list of groups of associated functions, wildcards allowed
  CALC=calculate
  MISC=my*;test
  UNKNOWN=*

Behind the Scenes

Further activities of the trace library:
 ■ Data management
   – Trace data is written to a buffer in memory first
   – When this buffer is full, data is flushed to files
   – Data compression, etc.
 ■ Timer selection and time synchronization between local clocks
   – use highly accurate clocks
 ■ Unification of local process/thread traces (post-processing)
   – trace processes/threads separately
   – collect all traces of all parallel processes/threads at the end
   – add global information about all participants

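One plausible part of the unification step is to merge the function definitions that each process recorded locally into a single global table and to remember how local IDs map to global ones. The sketch below illustrates that idea only; it is an assumption about what "global information" involves rather than something taken from the slides, and `global_defs_t`, `unify_name`, and `unify_process` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_GLOBAL 64   /* assumed upper bound on distinct function names */

typedef struct {
    const char *names[MAX_GLOBAL];
    size_t      count;
} global_defs_t;

/* Return the global ID for name, adding it on first sight.
   Returns -1 if the global table is full. */
int unify_name(global_defs_t *g, const char *name)
{
    for (size_t i = 0; i < g->count; i++)
        if (strcmp(g->names[i], name) == 0)
            return (int)i;
    if (g->count == MAX_GLOBAL)
        return -1;
    g->names[g->count] = name;
    return (int)g->count++;
}

/* Build the local-to-global ID map for one process: map[i] becomes the
   global ID of the function that this process numbered i locally. */
int unify_process(global_defs_t *g, const char *const *local_names,
                  size_t n, int *map)
{
    for (size_t i = 0; i < n; i++) {
        map[i] = unify_name(g, local_names[i]);
        if (map[i] < 0)
            return -1;
    }
    return 0;
}
```

With such a map, the events of every process can be rewritten to use the same function IDs, so that one set of definitions describes all participants.
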
Conclusions



Conclusion

 ■ performance analysis is very important in HPC

 ■ use performance analysis tools for profiling and tracing
 ■ do not spend effort on DIY solutions such as printf debugging

 ■ use tracing tools with some precautions
   – overhead
   – data volume

 ■ let us know about problems and feature wishes via
   vampirsupport@zih.tu-dresden.de

Vampir and VampirTrace are available via http://www.vampir.eu/ and
http://www.tu-dresden.de/zih/vampirtrace/

           Thank you!

Big Iron and Parallel Processing, USArray Data Processing Workshop
 

Recently uploaded

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 

Recently uploaded (20)

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 

1 Vampir Overview

  • 1. Event Tracing with VampirTrace and Vampir Zellescher Weg 12 Willers-Bau A114 Tel. +49 351 - 463 - 38323 Andreas Knüpfer (andreas.knuepfer@tu-dresden.de)
  • 2. Overview
       Introduction
       Event Tracing Overview
       Instrumentation
       Run-Time Measurement
       Conclusions
  • 3. Introduction
  • 4. Why bother with performance analysis?
       Moore's Law still in charge, so what?
       increasingly difficult to get close to peak performance
        – for sequential computation
            • memory wall
            • optimum pipelining, ...
        – for parallel interaction
            • Amdahl's law
            • synchronization with single late-comer, ...
       efficiency is important because of limited resources
       scalability is important to cope with next bigger simulation
  • 5. Profiling and Tracing
       Profile Recording
         of aggregated information (Time, Counts, …)
         about program and system entities
          – functions, loops, basic blocks
          – application, processes, threads, …
       Methods of Profile Creation
         sampling (statistical approach)
         direct measurement (deterministic approach)
  • 6. Profiling and Tracing
       Trace Recording
         run-time events (points of interest)
         during program execution
         saved as event record
          – timestamp, process, thread, event type
          – event specific information
         via instrumentation & trace library
       Event Trace
         collection of all events of a process / program
         sorted by time stamp
  • 7. Profiling and Tracing
       Tracing Advantages
         preserve temporal and spatial relationships (context)
         allow reconstruction of dynamic behavior
         profiles can be calculated from traces
       Tracing Disadvantages
         traces can become very large
         may cause perturbation
         instrumentation and tracing is complicated
          – event buffering, clock synchronization, …
  • 8. Event Tracing Overview
  • 9. Event Tracing from A to Z
       [Diagram: three stages. Instrumentation (src → instrumented exec.), Run-Time Measurement (instrumented exec. → trace file(s)), Visualization / Analysis. For the first two stages see more below; for the third see the following presentation.]
  • 10. Most common event types
       Which events to monitor?
       enter/leave of function/routine/region
        – time stamp, process/thread, function ID
       send/receive of P2P message (MPI)
        – time stamp, sender, receiver, length, tag, communicator
       collective communication (MPI)
        – time stamp, process, root, communicator, # bytes
       hardware performance counter value
        – time stamp, process, counter ID, value
       corresponding “record types” in trace file format
  • 11. Parallel Trace Files (Trace Format Schematics)
       Trace of process 1
         10010 P 1 ENTER 5
         10090 P 1 ENTER 6
         10110 P 1 ENTER 12
         10110 P 1 SEND TO 3 LEN 1024 ...
         10330 P 1 LEAVE 12
         10400 P 1 LEAVE 6
         10520 P 1 ENTER 9
         10550 P 1 LEAVE 9
         ...
       Trace of process 2
         10020 P 2 ENTER 5
         10095 P 2 ENTER 6
         10120 P 2 ENTER 13
         10300 P 2 RECV FROM 3 LEN 1024 ...
         10350 P 2 LEAVE 13
         10450 P 2 LEAVE 6
         10620 P 2 ENTER 9
         10650 P 2 LEAVE 9
         ...
       Definitions
         DEF TIMERRES 1000000000
         DEF PROCESS 1 `Master`
         DEF PROCESS 1 `Slave`
         DEF FUNCTION 5 `main`
         DEF FUNCTION 6 `foo`
         DEF FUNCTION 9 `bar`
         DEF FUNCTION 12 `MPI_Send`
         DEF FUNCTION 13 `MPI_Recv`
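Slide 7 notes that profiles can be calculated from traces. As a minimal sketch of that idea, the C fragment below replays ENTER/LEAVE records of one process, like those in the schematic, and accumulates the inclusive time per function ID. The `event` struct, `inclusive_times`, and the fixed limits are hypothetical names chosen for illustration; they are not the OTF record layout or a VampirTrace API.

```c
#include <stddef.h>

#define MAX_DEPTH 64   /* deepest call nesting this sketch accepts */
#define MAX_FUNCS 32   /* function IDs must lie in [0, MAX_FUNCS) */

typedef enum { EV_ENTER, EV_LEAVE } event_type;

/* Hypothetical in-memory form of one "time P n ENTER/LEAVE id" record. */
typedef struct {
    unsigned long long time;  /* time stamp in timer ticks */
    int process;              /* process number */
    event_type type;          /* ENTER or LEAVE */
    int function;             /* function ID from a DEF FUNCTION record */
} event;

/* Replay the time-sorted events of ONE process and add up the inclusive
 * time (LEAVE time minus matching ENTER time) per function ID.
 * Functions still open at the end of the excerpt contribute nothing.
 * Returns 0 on success, -1 on malformed input. */
int inclusive_times(const event *ev, size_t n,
                    unsigned long long incl[MAX_FUNCS])
{
    unsigned long long enter_time[MAX_DEPTH];
    int enter_func[MAX_DEPTH];
    int depth = 0;
    int f;
    size_t i;

    for (f = 0; f < MAX_FUNCS; f++)
        incl[f] = 0;

    for (i = 0; i < n; i++) {
        if (ev[i].function < 0 || ev[i].function >= MAX_FUNCS)
            return -1;
        if (ev[i].type == EV_ENTER) {
            if (depth == MAX_DEPTH)
                return -1;
            /* push the call onto the stack */
            enter_time[depth] = ev[i].time;
            enter_func[depth] = ev[i].function;
            depth++;
        } else {
            /* a LEAVE must match the innermost open ENTER */
            if (depth == 0 || enter_func[depth - 1] != ev[i].function)
                return -1;
            depth--;
            incl[ev[i].function] += ev[i].time - enter_time[depth];
        }
    }
    return 0;
}
```

Fed with the ENTER/LEAVE records of process 1 from the schematic, this yields 220 ticks for function 12 (`MPI_Send`), 310 for function 6 (`foo`) and 30 for function 9 (`bar`); `main` never leaves within the excerpt and stays at 0. Recursive functions would be counted once per nesting level, which a real profiler would have to handle.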
  • 13. Trace Visualization: Process Timeline Display
  • 14. Trace Visualization: Statistic Summary Display
  • 15. Trace Visualization: Message Statistics Display
  • 16. The Vampir Tool Family
       VampirTrace
         convenient instrumentation and measurement
         hides away complicated details
         provides many options and switches for experts
         VampirTrace is part of Open MPI 1.3
       Vampir/VampirServer
         interactive trace visualization and analysis
         intuitive browsing and zooming
         scalable to large trace data sizes (100GB)
         scalable to high parallelism (2000 processes)
         Vampir for Windows in progress, beta version available
  • 17. Trace File Formats
       Open Trace Format (OTF)
         Open source trace file format
         Includes powerful libotf for use in custom applications
         High level interface for tools + low level interface for trace libraries
       Other Formats
         TAU Trace Format (Univ. of Oregon)
         Epilog (ZAM, FZ Jülich)
         STF (Pallas, now Intel)
  • 18. Other Tools
       Other Event Tracing Tools
       TAU profiling (University of Oregon, USA)
        – profiling and tracing for parallel applications
        – http://www.cs.uoregon.edu/research/tau/
       Paraver (CEPBA, Barcelona, Spain)
        – trace based parallel performance analysis and visualization
        – http://www.cepba.upc.edu/paraver/
       Scalasca (FZ Jülich)
        – tracing and automatic detection of performance problems
        – http://www.scalasca.org/
       Intel Trace Collector & Analyzer
        – Very similar to Vampir
  • 19. Instrumentation
  • 20. Instrumentation
       Instrumentation: Process of modifying programs to detect and report events by calling instrumentation functions.
       instrumentation functions provided by trace library
       notification about run-time event
       there are various ways of instrumentation
  • 21. Instrumentation
       Edit – Compile – Run Cycle
         [Diagram: Source Code → Compiler → Binary → Run → Results]
       Edit – Compile – Run Cycle with VampirTrace
         [Diagram: Source Code → Compiler with VT Wrapper → Binary → Run → Results and Traces]
  • 22. Instrumentation Types
       Source code instrumentation
        – manually
        – automatically
       Instrumentation with wrapper functions
       Library pre-load instrumentation
       Compiler Instrumentation
       Binary instrumentation
       VampirTrace supports different methods of instrumentation
       Hidden in compiler wrappers
  • 23. Source Code Instrumentation
       Original:
         int foo(void* arg) {
           if (cond) {
             return 1;
           }
           return 0;
         }
       Instrumented:
         int foo(void* arg) {
           enter(7);
           if (cond) {
             leave(7);
             return 1;
           }
           leave(7);
           return 0;
         }
       manually or automatically
  • 24. Source Code Instrumentation
       manually
         large effort
         error prone
         difficult to manage
       automatically
         via source to source translation
         Program Database Toolkit (PDT)
           http://www.cs.uoregon.edu/research/pdt/
         OpenMP Pragma And Region Instrumentor (Opari)
           http://www.fz-juelich.de/zam/kojak/opari/
  • 25. Instrumentation with Wrapper Functions
       provide wrapper functions
        – call instrumentation function for notification
        – call original target for actual functionality
       implement via library pre-load or via preprocessor directives
         #define fread WRAPPER_glibc_fread
         #define fwrite WRAPPER_glibc_fwrite
       suitable for standard libraries (e.g. MPI, glibc)
       can evaluate function call semantics (function signature, arguments)
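The preprocessor variant of a wrapper function could look roughly like the sketch below. Only the macro name `WRAPPER_glibc_fread` comes from the slide; the counters `fread_calls` and `fread_bytes` and the helper `read_block` are hypothetical stand-ins for a real notification to the trace library. The wrapper is defined before the `#define`, so the `fread` it calls is still the original.

```c
#include <stdio.h>

/* Hypothetical "instrumentation": plain counters instead of trace records. */
static unsigned long fread_calls = 0;
static unsigned long fread_bytes = 0;

/* Wrapper: notify first-hand what happened, then forward to the original.
 * Defined BEFORE the macro below, so `fread` here is the real function. */
static size_t WRAPPER_glibc_fread(void *ptr, size_t size, size_t nmemb,
                                  FILE *stream)
{
    size_t got = fread(ptr, size, nmemb, stream);  /* actual functionality */
    fread_calls++;                                 /* notification */
    fread_bytes += got * size;  /* the wrapper can evaluate the arguments */
    return got;
}

/* From here on, every textual call to fread goes through the wrapper. */
#define fread WRAPPER_glibc_fread

/* Application code stays unchanged; it just gets recompiled. */
size_t read_block(FILE *f, char *buf, size_t len)
{
    return fread(buf, 1, len, f);  /* expands to WRAPPER_glibc_fread(...) */
}
```

Because the redirection happens at compile time, it only affects code that is recompiled with the macro in effect; calls made from inside already-built libraries are not seen, which is where the library pre-load variant comes in.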
  • 26. The MPI Profiling Interface
       Instrumentation via library pre-load, e.g. for MPI
       Each MPI function has two names:
        – MPI_xxx and PMPI_xxx
       Selective replacement of MPI routines at link time
       [Diagram: the user program calls MPI_Send; the wrapper library provides MPI_Send and calls PMPI_Send; the MPI library provides MPI_Send and PMPI_Send]
  • 27. Compiler Instrumentation
       gcc -finstrument-functions -c foo.c
         void __cyg_profile_func_enter( <args> );
         void __cyg_profile_func_exit( <args> );
       many compilers support instrumentation (GCC, Intel, IBM, PGI, NEC, Hitachi, Sun Fortran, …)
       no common API, different command line switches, different behavior
       no source modification necessary
       managed by VampirTrace
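For GCC, the two hooks named on the slide take the address of the instrumented function and the address of its call site. The sketch below defines them with simple counters; `enter_count` and `exit_count` are hypothetical and stand in for whatever a trace library would record. When a program is built with `-finstrument-functions`, GCC inserts calls to these hooks at every function entry and exit.

```c
/* Hypothetical event counters; a trace library would write records instead. */
static unsigned long enter_count = 0;
static unsigned long exit_count = 0;

/* The hooks themselves must not be instrumented, otherwise each hook call
 * would trigger another hook call and recurse without end. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

/* Called on entry to every instrumented function.
 * this_fn: address of the function, call_site: address it was called from. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    enter_count++;
}

/* Called on exit from every instrumented function. */
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    exit_count++;
}
```

The hooks only receive addresses, so mapping them back to function names needs symbol information from the binary. This is one of the details that differs between compilers and that the VampirTrace wrappers hide.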
  • 28. Dynamic Instrumentation
       modify binary executable in main memory (or in a file)
       insert instrumentation calls
       very platform/machine dependent
       expensive
       Using the DynInst project
         provides common interface to binary instrumentation
         available for Alpha/Tru64, MIPS/IRIX, PowerPC/AIX, Sparc/Solaris, x86/Linux+Windows, ia64/Linux
         see http://www.dyninst.org
  • 29. Practical Instrumentation
       Use VampirTrace compiler wrappers
         Internals and platform specifics hidden
         Select appropriate way(s) of instrumentation
       Substitute calls to the regular compiler with calls to compiler wrappers
         replace CC=mpicc with CC=vtcc
  • 30. Run Time Measurement
  • 31. Trace Library
       What does the trace library do?
       provide instrumentation functions
       receive events of various types
       collect event properties
        – time stamp
        – location (thread, process, cluster node, MPI rank)
        – event specific properties
        – perhaps hardware performance counter values
       record to memory buffer, flush eventually
       try to be fast, minimize overhead
  • 32. Run-Time Options
       There are a number of run-time options
       Controlled by environment variables
         PAPI hardware performance counters
         Memory allocation counters
         Application I/O calls
         Filtering
         Grouping
         more ...
       see more in the following presentations and hands-on parts
  • 33. Performance Counters
       Include hardware performance counters in traces
        – via PAPI library
        – or Sun Solaris CPC counters
        – or NEC SX counters
       VT_METRICS can be used to specify a colon-separated list of counters
         see papi_avail and papi_command_line tools etc.
         see VampirTrace Documentation for CPC and NEC counters
       set VT_METRICS environment variable
         export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM
  • 34. Memory Allocation Tracing
       monitor memory allocation behavior
       record memory volume as counter
       record glibc calls like “malloc” and “free” as function calls
       via environment variable VT_MEMTRACE
         export VT_MEMTRACE=yes
  • 35. I/O Tracing
       monitor POSIX I/O behavior
       record read/write rates as counters
       record standard I/O calls like “open” and “read”
       via environment variable VT_IOTRACE
         export VT_IOTRACE=yes
       mmap I/O not supported
  • 36. Function Filtering
       selective tracing of certain functions/subroutines
       one way to reduce trace file size!
       via environment variable VT_FILTER_SPEC
         export VT_FILTER_SPEC=/home/user/filter.spec
       run-time filtering, no re-compilation or re-linking
       example filter file:
         my*;test -- 1000
         calculate -- -1
         * -- 1000000
       see also the vtfilter tool
        – can create a filter file with rough target size estimate
        – can apply a filter to an existing trace file as post processing
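To make the filter file more concrete, the sketch below looks up the number attached to a function name. It rests on assumptions that the slide does not state: that rules are tried from top to bottom, that the first matching rule wins, that the number is a per-function call limit with -1 meaning "no limit", and that patterns behave like shell wildcards as implemented by POSIX `fnmatch`. The names `filter_rule`, `matches_any` and `call_limit` are hypothetical; this is not VampirTrace's parser.

```c
#include <fnmatch.h>
#include <string.h>

/* One line of the filter file: "pattern;pattern;... -- limit". */
typedef struct {
    const char *patterns;  /* semicolon-separated, e.g. "my*;test" */
    long limit;            /* assumed meaning: call limit, -1 = unlimited */
} filter_rule;

/* Return 1 if name matches any of the semicolon-separated patterns. */
static int matches_any(const char *patterns, const char *name)
{
    char buf[256];
    const char *p = patterns;

    while (*p != '\0') {
        size_t len = strcspn(p, ";");   /* length of the next pattern */
        if (len < sizeof buf) {         /* over-long patterns are skipped */
            memcpy(buf, p, len);
            buf[len] = '\0';
            if (fnmatch(buf, name, 0) == 0)
                return 1;
        }
        p += len;
        if (*p == ';')
            p++;                        /* step over the separator */
    }
    return 0;
}

/* Return the limit of the first rule that matches name.
 * Assumption: a function matched by no rule is not limited (-1). */
long call_limit(const filter_rule *rules, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (matches_any(rules[i].patterns, name))
            return rules[i].limit;
    return -1;
}
```

With the three rules from the slide, a function called `my_init` or `test` would get the limit 1000, `calculate` would get -1, and everything else would fall through to the catch-all `*` rule with 1000000.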
  • 37. Function Grouping
       define user-specified groups
       highlight application behavior, different activities, program phases
        – communication, computation, initialization, different libraries, ...
       groups are assigned to colors in Vampir displays
       run-time grouping, no re-compilation or re-linking
       via environment variable VT_GROUPS_SPEC
         export VT_GROUPS_SPEC=/home/<user>/groups.spec
       contains a list of groups of associated functions, wildcards allowed
         CALC=calculate
         MISC=my*;test
         UNKNOWN=*
  • 38. Behind the Scenes
       Further activities of the trace library:
       Data management
        – Trace data is written to a buffer in memory first
        – When this buffer is full, data is flushed to files
        – Data compression, etc.
       Timer selection and time synchronization between local clocks
        – use highly accurate clocks
       Unification of local process/thread traces (post processing)
        – trace processes/threads separately
        – collect all traces of all parallel processes/threads at the end
        – add global information about all participants
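The buffer-then-flush scheme under "Data management" can be sketched in a few lines. Everything here is hypothetical and deliberately simplified: the `record` layout, the `trace_buffer` type, the `tb_*` functions, and a capacity of only four records so that a flush is easy to trigger. Compression and per-thread buffers are left out.

```c
#include <stdio.h>

#define BUF_CAPACITY 4  /* tiny on purpose; a real buffer holds far more */

/* Hypothetical fixed-size event record. */
typedef struct {
    unsigned long long time;  /* time stamp */
    int type;                 /* event type, e.g. enter or leave */
    int id;                   /* function or counter ID */
} record;

typedef struct {
    record slots[BUF_CAPACITY];  /* in-memory buffer */
    size_t used;                 /* number of occupied slots */
    size_t flushes;              /* how often the buffer hit the file */
    FILE *out;                   /* per-process trace file */
} trace_buffer;

void tb_init(trace_buffer *tb, FILE *out)
{
    tb->used = 0;
    tb->flushes = 0;
    tb->out = out;
}

/* Write all buffered records to the file and empty the buffer.
 * Returns 0 on success, -1 if the write failed. */
int tb_flush(trace_buffer *tb)
{
    if (tb->used > 0) {
        if (fwrite(tb->slots, sizeof(record), tb->used, tb->out) != tb->used)
            return -1;
        tb->used = 0;
        tb->flushes++;
    }
    return 0;
}

/* Append one event; only when the buffer is full does this touch the file. */
int tb_record(trace_buffer *tb, unsigned long long time, int type, int id)
{
    if (tb->used == BUF_CAPACITY && tb_flush(tb) != 0)
        return -1;
    tb->slots[tb->used].time = time;
    tb->slots[tb->used].type = type;
    tb->slots[tb->used].id = id;
    tb->used++;
    return 0;
}
```

Recording stays cheap in the common case because it is only a few stores into memory; the expensive file write happens once per full buffer. A flush in the middle of a run still perturbs the timing of the application, which is one reason to keep the buffer large and the event volume low.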
  • 39. Conclusions
  • 40. Conclusion
       performance analysis is very important in HPC
       use performance analysis tools for profiling and tracing
       do not spend effort on DIY solutions such as printf-debugging
       use tracing tools with some precautions
        – overhead
        – data volume
       let us know about problems and feature wishes via vampirsupport@zih.tu-dresden.de
  • 41. Available via http://www.vampir.eu/ and http://www.tu-dresden.de/zih/vampirtrace/
       Thank you!