This document introduces event tracing with VampirTrace and Vampir, covering instrumentation, run-time measurement, and visualization: code is instrumented to record events, the instrumented program is run to generate trace files, and tools such as Vampir are used to analyze and visualize the results.
4. Why bother with performance analysis?
Moore's Law still in charge, so what?
increasingly difficult to get close to peak performance
– for sequential computation
• memory wall
• optimum pipelining, ...
– for parallel interaction
• Amdahl's law
• synchronization with single late-comer, ...
efficiency is important because of limited resources
scalability is important to cope with next bigger simulation
5. Profiling and Tracing
Profile Recording
of aggregated information (Time, Counts, …)
about program and system entities
– functions, loops, basic blocks
– application, processes, threads, …
Methods of Profile Creation
sampling (statistical approach)
direct measurement (deterministic approach)
6. Profiling and Tracing
Trace Recording
run-time events (points of interest)
during program execution
saved as event record
– timestamp, process, thread, event type
– event specific information
via instrumentation & trace library
Event Trace
collection of all events of a process / program
sorted by time stamp
7. Profiling and Tracing
Tracing Advantages
preserve temporal and spatial relationships (context)
allow reconstruction of dynamic behavior
profiles can be calculated from traces
Tracing Disadvantages
traces can become very large
may cause perturbation
instrumentation and tracing are complicated
– event buffering, clock synchronization, …
8. Event Tracing Overview
Zellescher Weg 12
Willers-Bau A114
Tel. +49 351 - 463 - 38323
Andreas Knüpfer (andreas.knuepfer@tu-dresden.de)
9. Event Tracing from A to Z
[Workflow diagram: the source code is instrumented; the instrumented executable runs under run-time measurement, producing trace file(s); these are then loaded for visualization / analysis. Instrumentation is covered in the following slides, run-time measurement and analysis further below and in the following presentation.]
10. Most common event types
Which events to monitor?
enter/leave of function/routine/region
– time stamp, process/thread, function ID
send/receive of P2P message (MPI)
– time stamp, sender, receiver, length, tag, communicator
collective communication (MPI)
– time stamp, process, root, communicator, # bytes
hardware performance counter value
– time stamp, process, counter ID, value
corresponding “record types” in trace file format
11. Parallel Trace Files
Process 1 event stream:
10010 P 1 ENTER 5
10090 P 1 ENTER 6
10110 P 1 ENTER 12
10110 P 1 SEND TO 3 LEN 1024 ...
10330 P 1 LEAVE 12
10400 P 1 LEAVE 6
10520 P 1 ENTER 9
10550 P 1 LEAVE 9
...
Process 2 event stream:
10020 P 2 ENTER 5
10095 P 2 ENTER 6
10120 P 2 ENTER 13
10300 P 2 RECV FROM 3 LEN 1024 ...
10350 P 2 LEAVE 13
10450 P 2 LEAVE 6
10620 P 2 ENTER 9
10650 P 2 LEAVE 9
...
Definition records:
DEF TIMERRES 1000000000
DEF PROCESS 1 `Master`
DEF PROCESS 2 `Slave`
DEF FUNCTION 5 `main`
DEF FUNCTION 6 `foo`
DEF FUNCTION 9 `bar`
DEF FUNCTION 12 `MPI_Send`
DEF FUNCTION 13 `MPI_Recv`
Trace Format Schematics
16. The Vampir Tool Family
VampirTrace
convenient instrumentation and measurement
hides away complicated details
provides many options and switches for experts
VampirTrace is part of Open MPI 1.3
Vampir/VampirServer
interactive trace visualization and analysis
intuitive browsing and zooming
scalable to large trace data sizes (100GB)
scalable to high parallelism (2000 processes)
Vampir for Windows in progress, beta version available
17. Trace File Formats
Open Trace Format (OTF)
Open source trace file format
Includes powerful libotf for use in custom applications
High level interface for tools + low level interface for trace libraries
Other Formats
TAU Trace Format (Univ. of Oregon)
Epilog (ZAM, FZ Jülich)
STF (Pallas, now Intel)
18. Other Tools
Other Event Tracing Tools
TAU profiling (University of Oregon, USA)
– profiling and tracing for parallel applications
– http://www.cs.uoregon.edu/research/tau/
Paraver (CEPBA, Barcelona, Spain)
– trace based parallel performance analysis and visualization
– http://www.cepba.upc.edu/paraver/
Scalasca (FZ Jülich)
– tracing and automatic detection of performance problems
– http://www.scalasca.org/
Intel Trace Collector & Analyzer
– Very similar to Vampir
20. Instrumentation
Instrumentation: process of modifying programs to detect and report events by calling instrumentation functions.
instrumentation functions provided by trace library
notification about run-time event
there are various ways of instrumentation
21. Instrumentation
Edit – Compile – Run Cycle:
Source Code → Compiler → Binary → Run → Results
Edit – Compile – Run Cycle with VampirTrace:
Source Code → VT Compiler Wrapper → Binary → Run → Results + Traces
22. Instrumentation Types
Source code instrumentation
– manually
– automatically
Instrumentation with wrapper functions
Library pre-load instrumentation
Compiler Instrumentation
Binary instrumentation
VampirTrace supports these different methods of instrumentation, hidden in its compiler wrappers
23. Source Code Instrumentation
uninstrumented:

int foo(void* arg) {
  if (cond) {
    return 1;
  }
  return 0;
}

instrumented:

int foo(void* arg) {
  enter(7);
  if (cond) {
    leave(7);
    return 1;
  }
  leave(7);
  return 0;
}

manually or automatically
24. Source Code Instrumentation
manually
large effort
error prone
difficult to manage
automatically
via source to source translation
Program Database Toolkit (PDT)
http://www.cs.uoregon.edu/research/pdt/
OpenMP Pragma And Region Instrumentor (Opari)
http://www.fz-juelich.de/zam/kojak/opari/
25. Instrumentation with Wrapper Functions
provide wrapper functions
– call instrumentation function for notification
– call original target for actual functionality
implement via library pre-load
or via preprocessor directives
#define fread WRAPPER_glibc_fread
#define fwrite WRAPPER_glibc_fwrite
suitable for standard libraries (e.g. MPI, glibc)
can evaluate function call semantics (function signature, arguments)
26. The MPI Profiling Interface
Instrumentation via library pre-load, e.g. for MPI
Each MPI function has two names:
– MPI_xxx and PMPI_xxx
Selective replacement of MPI routines at link time
[Diagram: the user program calls MPI_Send; the wrapper library intercepts it under the name MPI_Send, records the event, and calls PMPI_Send; the MPI library provides the actual implementation under both names MPI_Send and PMPI_Send.]
27. Compiler Instrumentation
gcc -finstrument-functions -c foo.c
void __cyg_profile_func_enter( <args> );
void __cyg_profile_func_exit( <args> );
many compilers support instrumentation:
(GCC, Intel, IBM, PGI, NEC, Hitachi, Sun Fortran, …)
no common API, different command line switches, different behavior
no source modification necessary
managed by VampirTrace
28. Dynamic Instrumentation
modify binary executable in main memory (or in a file)
insert instrumentation calls
very platform/machine dependent
expensive
Using the DynInst project
provides common interface to binary instrumentation
available for Alpha/Tru64, MIPS/IRIX, PowerPC/AIX, Sparc/Solaris, x86/Linux+Windows, ia64/Linux
see http://www.dyninst.org
29. Practical Instrumentation
Use VampirTrace compiler wrappers
Internals and platform specifics hidden
Select appropriate way(s) of instrumentation
Substitute calls to the regular compiler with calls to the compiler wrappers:
CC=mpicc → CC=vtcc
30. Run Time Measurement
31. Trace Library
What does the trace library do?
provide instrumentation functions
receive events of various types
collect event properties
– time stamp
– location (thread, process, cluster node, MPI rank)
– event specific properties
– perhaps hardware performance counter values
record to memory buffer, flush eventually
try to be fast, minimize overhead
32. Run-Time Options
There are a number of run-time options
Controlled by environment variables
PAPI hardware performance counters
Memory allocation counters
Application I/O calls
Filtering
Grouping
more ...
see more in the following presentations and hands-on parts
33. Performance Counters
Include hardware performance counters in traces
– via PAPI library
– or Sun Solaris CPC counters
– or NEC SX counters
VT_METRICS can be used to specify a colon-separated list of counters
see papi_avail and papi_command_line tools etc.
see VampirTrace Documentation for CPC and NEC counters
set VT_METRICS environment variable
export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM
34. Memory Allocation Tracing
monitor memory allocation behavior
record memory volume as counter
record glibc calls like “malloc” and “free” as function calls
via environment variable VT_MEMTRACE
export VT_MEMTRACE=yes
35. I/O Tracing
monitor POSIX I/O behavior
record read/write rates as counters
record standard I/O calls like “open” and “read”
via environment variable VT_IOTRACE
export VT_IOTRACE=yes
mmap I/O not supported
36. Function Filtering
selective tracing of certain functions/subroutines
one way to reduce trace file size!
via environment variable VT_FILTER_SPEC
export VT_FILTER_SPEC=/home/user/filter.spec
run-time filtering, no re-compilation or re-linking
my*;test -- 1000
calculate -- -1
* -- 1000000
see also the vtfilter tool
– can create a filter file with rough target size estimate
– can apply a filter to an existing trace file as post processing
37. Function Grouping
define user-specified groups
highlighting application behavior, different activities, program phases
– communication, computation, initialization, different libraries, ...
groups are assigned to colors in Vampir displays
run-time grouping, no re-compilation or re-linking
via environment variable VT_GROUPS_SPEC
export VT_GROUPS_SPEC=/home/<user>/groups.spec
contains a list of groups of associated functions, wildcards allowed
CALC=calculate
MISC=my*;test
UNKNOWN=*
38. Behind the Scenes
Further activities of the trace library:
Data management
– Trace data is written to a buffer in memory first
– When this buffer is full, data is flushed to files
– Data compression, etc
Timer selection and time synchronization between local clocks
– use highly accurate clocks
Unification of local process/thread traces (post processing)
– trace processes/threads separately
– collect all traces of all parallel processes/threads at the end
– add global information about all participants
40. Conclusion
performance analysis is very important in HPC
use performance analysis tools for profiling and tracing
do not spend effort on DIY solutions like printf-debugging
use tracing tools with some precautions
– overhead
– data volume
let us know about problems and feature wishes via vampirsupport@zih.tu-dresden.de