2010 02 instrumentation_and_runtime_measurement



  1. 1. Instrumentation and Run-Time Measurement VampirTrace
  2. 2. Overview • Instrumentation – Automatic, manual and binary instrumentation • Run-time measurement – Behind the scenes, post-processing – Trace file format, overhead • Options, settings, parameters – Environment Variables – PAPI hardware performance counters – Memory allocation counters, application I/O calls – Filtering, grouping • FAQ and Issues
  4. 4. Instrumentation in General • Edit – Compile – Run cycle: Source Code → Compiler → Binary → Run → Results • Edit – Compile – Run cycle with VampirTrace: Source Code → VT Wrapper + Compiler → Binary → Run → Results + Traces
  5. 5. Compiler Wrappers • Easiest way of using VampirTrace • No source code modifications • In the build system of your application, substitute calls to the regular compiler with calls to the VampirTrace compiler wrappers – For compiling and linking – e.g. in the makefile change icc to vtcc • Rebuild the application • Run the application to produce trace data
  6. 6. Instrumentation & Measurement
     • What do you need to do for it?
       – VampirTrace and a supported compiler
     • Instrumentation (automatic with compiler wrappers)
         CC    = icc      →   CC    = vtcc
         CXX   = icpc     →   CXX   = vtcxx
         F90   = ifc      →   F90   = vtf90
         MPICC = mpicc    →   MPICC = vtcc -vt:cc mpicc
     • Re-compile & re-link
     • Trace Run (run with appropriate test data set)
     • More details later
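The substitutions above do not have to be edited into the makefile, because variables given on the make command line override assignments inside it. The following is a minimal sketch under stated assumptions: `hello.c` and its `Makefile` are hypothetical files created only for illustration, and the wrapper build runs only if `vtcc` is found on the `PATH`.

```shell
# Hypothetical single-file project, created here only for illustration.
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { puts("hello"); return 0; }
EOF

# The makefile names the regular compiler; nothing VampirTrace-specific in it.
printf 'CC = cc\n\nhello: hello.c\n\t$(CC) -o $@ hello.c\n' > Makefile

# Command-line variables override makefile assignments, so the wrapper
# can be swapped in for a single build.
if command -v vtcc >/dev/null 2>&1; then
    make CC=vtcc hello
else
    echo "vtcc not found; on a system with VampirTrace run: make CC=vtcc hello"
fi
```

For MPI codes the same idea would apply with `MPICC="vtcc -vt:cc mpicc"`, as listed on the slide.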
  7. 7. Compiler Wrappers Captured events: • All user function entries and exits – If supported by the compiler (Intel, GNU, PGI, NEC, IBM) • MPI calls and messages – If the application is MPI parallel • OpenMP regions – If the application is OpenMP parallel
  8. 8. Manual Instrumentation • Allows for detailed source code instrumentation – e.g. regions of functions such as loops • Can be combined with automatic instrumentation • Be sure to instrument all function exits! – Otherwise post-mortem analysis will fail • I personally consider this advanced usage of VampirTrace!
  9. 9. Manual Instrumentation
     • Add the following to your source code to instrument a region, e.g. in C (also available for C++ and Fortran):
         #include "vt_user.h"
         ...
         VT_USER_START("Region_1");
         ...
         VT_USER_END("Region_1");
         ...
     • Compile with "-DVTRACE"
       – Otherwise, the VampirTrace macros expand to empty blocks, producing zero overhead
         vtcc -vt:inst manual prog.c -DVTRACE -o prog
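Because every started region has to be ended on every path out of it, a function with an early return needs one `VT_USER_END` per exit. The sketch below writes a hypothetical `region.c` that shows this pattern. The function `solve`, the region name, and the `#ifdef VTRACE` fallback macros are assumptions added here so the file also builds on a machine without VampirTrace; they are not taken from the slides.

```shell
# Hypothetical example file; solve() and its region name are made up.
cat > region.c <<'EOF'
#ifdef VTRACE
#include "vt_user.h"
#else
/* Fallback so the file builds without VampirTrace installed (assumption). */
#define VT_USER_START(name)
#define VT_USER_END(name)
#endif

static int solve(int n)
{
    VT_USER_START("solve");
    if (n <= 0) {
        VT_USER_END("solve");   /* close the region on the early exit */
        return -1;
    }
    n = n * 2;
    VT_USER_END("solve");       /* close the region on the normal exit */
    return n;
}

int main(void) { return solve(21) == 42 ? 0 : 1; }
EOF

# Build with the wrapper when available, otherwise with the plain compiler.
if command -v vtcc >/dev/null 2>&1; then
    vtcc -vt:inst manual region.c -DVTRACE -o region
elif command -v cc >/dev/null 2>&1; then
    cc region.c -o region
fi
```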
  10. 10. Binary Instrumentation • Using DYNINST – http://www.dyninst.org • Source should be compiled with the "-g" switch • "vtunify" has to be run manually afterwards • Example: vtcc -vt:inst dyninst prog.c -o prog
  11. 11. RUN-TIME MEASUREMENT • Behind the Scenes • Unifying - Post-Processing • OTF – Open Trace Format • Tracing Overhead
  12. 12. Workflow
     1) Instrumentation
       – Hide instrumentation in compiler wrappers
       – Use underlying compiler and add appropriate options
         CC = mpicc   →   CC = vtcc -vt:cc mpicc
     2) Test Run
       – Use representative test input
       – Set parameters, environment variables, etc.
       – Selective tracing
     3) Get Trace
  13. 13. Automatic Function Tracing • Uses compiler support to add tracing calls at every function entry and exit • Compilers supported: – GNU, Intel, PGI, PathScale, IBM, Sun Fortran, NEC • Binary instrumentation via Dyninst
  14. 14. MPI and OpenMP Tracing • Tracing of MPI-1 and MPI-IO events via PMPI interface • Tracing of OpenMP directives via OPARI source-to-source instrumentation
  15. 15. Hardware Performance Counter • Recording PAPI counter(s) at every function entry / exit • PAPI allows access to hardware (mostly CPU) counters, e.g. floating point operations, cache misses, exceptions • Can derive rates, e.g. GFlop/s of each function
  16. 16. Memory and I/O Tracing • Tracing of memory allocation calls via libc built-in hooks • malloc, realloc, free, … • Tracing of I/O calls, accessed files, transferred data volume via wrappers for I/O calls • open, read, write, …
  17. 17. Instrumentation & Measurement What does VampirTrace do in the background? • Trace Run: – Event data collection – Precise time measurement – Parallel timer synchronization – Collecting parallel process/thread traces – Collecting performance counters • from PAPI, • memory usage, • POSIX I/O calls and • fork/system/exec calls, and more … – Filtering and grouping of function calls
  18. 18. Behind the Scenes • Trace data is written to a buffer in memory first • When this buffer is full, data is flushed to storage • After the application has run to completion, these trace files are unified to produce the final OTF trace • Most aspects of this behavior can be customized with environment variables
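The buffer-then-flush behavior described above is governed by two variables that appear in this deck's environment variable list, `VT_BUFFER_SIZE` and `VT_MAX_FLUSHES`. The sketch below only shows where they would be set. The values are assumptions chosen for illustration, so the accepted formats and defaults should be checked in the VampirTrace manual.

```shell
# Illustrative values only (assumptions), set before the traced run starts.
export VT_BUFFER_SIZE=64M   # larger in-memory buffer, fewer flushes to storage
export VT_MAX_FLUSHES=1     # maximum number of buffer flushes during the run
echo "buffer=$VT_BUFFER_SIZE max_flushes=$VT_MAX_FLUSHES"
```

A larger buffer trades memory for fewer interruptions of the measured run, which is usually the first setting to adjust when flushes distort the timings.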
  19. 19. File-based Workflow
  20. 20. Unifying - Post-Processing • Normally, trace data is unified automatically after the application has run to completion • This takes time, depending on the amount of trace data • Can be switched off by an environment variable • Manual unification: vtunify <number-of-trace-files> <trace-file-prefix>, e.g. vtunify 16 my_trace
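A deferred unification could look roughly like the sketch below. The slide does not name the variable that switches automatic unification off; `VT_UNIFY=no` is an assumption recalled from the VampirTrace manual and should be verified against the installed version. The `vtunify` call reuses the slide's own example and runs only if the tool is on the `PATH`.

```shell
# Assumption: VT_UNIFY=no disables automatic unification (name not given on the slide).
export VT_UNIFY=no

# ... run the instrumented application here ...

# Unify later by hand, using the slide's example: 16 trace files, prefix "my_trace".
if command -v vtunify >/dev/null 2>&1; then
    vtunify 16 my_trace
else
    echo "vtunify not found; skipping manual unification"
fi
```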
  21. 21. How to Store Trace Data - Trace File Various trace file formats (for HPC): – VTF3 (TU Dresden) – Tau Trace Format (Univ. of Oregon, LANL and JSC/Jülich) – EPILOG (JSC/Jülich/Germany) – STF (Pallas GmbH, now Intel) – OTF (TU Dresden) • ASCII or binary file formats • single/multiple file(s) per trace • merge process traces to single file • multiple streams for parallel/selective I/O
  22. 22. OTF – Open Trace Format • Open source trace file format – Available from the homepage of TU Dresden, ZIH http://www.tu-dresden.de/zih/otf/ • Includes powerful libotf for use in custom applications • API / Interfaces – High level interface for analysis tools – Low level interface for trace libraries • Actively developed – In cooperation with the University of Oregon and Lawrence Livermore National Laboratory
  23. 23. Tracing Overhead
     • Measured on SGI Altix 4700, Itanium 2 1.6 GHz
     • Tracing overhead per function call (from test program with one million function calls, multiple repetitions)
     • Suppressed inlining: icc -O2 -ip-no-inlining
                            VampirTrace   Intel Trace Collector
         3 PAPI counters    4.61 µs       9.64 µs
         1 PAPI counter     4.47 µs       9.25 µs
         Without PAPI       0.92 µs       1.10 µs
         Filtered function  0.82 µs       1.04 µs
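Per-call figures like these become meaningful once multiplied by the number of recorded calls. The helper below is a small sketch for that arithmetic; the function name `overhead_seconds` and the workload of 100 million calls are hypothetical, and the per-call costs are the VampirTrace values measured above.

```shell
# overhead_seconds CALLS MICROSECONDS_PER_CALL
# Prints the estimated added run time in seconds.
overhead_seconds() {
    awk -v calls="$1" -v us="$2" 'BEGIN { printf "%.1f\n", calls * us / 1000000 }'
}

# Hypothetical workload: 100 million recorded function calls.
overhead_seconds 100000000 0.92   # VampirTrace without PAPI
overhead_seconds 100000000 4.61   # VampirTrace with 3 PAPI counters
```

For this workload the estimate is about 92 seconds without PAPI and about 461 seconds with three PAPI counters, which shows why filtering frequently called functions matters.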
  24. 24. OPTIONS, SETTINGS, PARAMETERS • Environment Variables • PAPI hardware performance counters • Memory allocation counters • Application I/O calls • Filtering • Grouping
  25. 25. Environment Variables • By default, trace data is written to the current working directory • Almost all of this behavior can be customized with environment variables • Environment variables must be set prior to running the application, not prior to building the application
  26. 26. Environment Variables
         VT_PFORM_GDIR    Directory where final trace file is stored
         VT_PFORM_LDIR    Directory for intermediate trace files
         VT_FILE_PREFIX   Trace file name
         VT_BUFFER_SIZE   Internal trace buffer size
         VT_MAX_FLUSHES   Max number of buffer flushes
         VT_MEMTRACE      Enable memory allocation tracing
         VT_IOTRACE       Enable I/O tracing
         VT_MPITRACE      Enable MPI tracing
         VT_FILTER_SPEC   Name of filter file
         VT_GROUPS_SPEC   Name of function groups file
         VT_COMPRESSION   Compress trace files
         VT_METRICS       List of PAPI counters
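As a sketch of how a few of these variables might be combined in a job script, the block below directs the final trace to a dedicated directory and the intermediate files to local scratch space. The directory locations and the file prefix are hypothetical, chosen only for illustration.

```shell
# Hypothetical directories and file name, chosen only for illustration.
export VT_PFORM_GDIR="$PWD/traces"        # final trace file
export VT_PFORM_LDIR="${TMPDIR:-/tmp}"    # intermediate trace files
export VT_FILE_PREFIX=my_trace            # trace file name
mkdir -p "$VT_PFORM_GDIR"

# Print the VampirTrace settings that the run will inherit.
env | grep '^VT_' | sort
```

Listing the `VT_` variables right before the run is a cheap way to record which settings produced a given trace.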
  27. 27. PAPI Counter Environment Variables • PAPI counters can be included in traces – If PAPI is available on the platform – If VampirTrace was built with PAPI support • VT_METRICS can be used to specify a colon-separated list of PAPI counters export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM • VampirTrace versions after 5.8.1 will have a customizable separator, because Component-PAPI counters use colons in their counter names
  28. 28. Memory Counter • Memory allocation counters can be included in traces – If VampirTrace was built with memory allocation support – If GNU glibc is used on the platform • Memory functions in glibc like "malloc" and "free" are traced • Environment variable VT_MEMTRACE export VT_MEMTRACE=yes
  29. 29. I/O Counter • I/O counters can be included in traces – If VampirTrace was built with I/O tracing support • Standard I/O calls like "open" and "read" are recorded • Environment variable VT_IOTRACE export VT_IOTRACE=yes
  30. 30. User defined Counter
     • Records program variables or any other numerical quantity
         #include "vt_user.h"

         int main() {
           unsigned int i, cid, cgid;
           cgid = VT_COUNT_GROUP_DEF("loopindex");
           cid = VT_COUNT_DEF("i", "#", VT_COUNT_TYPE_UNSIGNED, cgid);
           for( i = 1; i <= 100; i++ ) {
             VT_COUNT_UNSIGNED_VAL(cid, i);
           }
           return 0;
         }
     • Helps find "that one loop iteration" which causes trouble
  31. 31. User defined Counter
  32. 32. Function Filtering
     • Filtering is one of the ways to reduce trace file size
     • Environment variable VT_FILTER_SPEC
         %> export VT_FILTER_SPEC=filter.spec
     • Filter definition file contains a list of filters
         my_*;test_* -- 1000
         debug_* -- 0
         calculate -- -1
         * -- 1000000
     • Filter rules can be global to all processes or only be assigned to specific ranks (see the manual for more details of rank specific filtering)
     • See also the vtfilter tool
       – Can generate a customized filter file
       – Can reduce the size of existing trace files
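The filter rules above can be put in place with a few lines of shell. Each rule pairs one or more function name patterns with a call limit; as far as the VampirTrace manual is recalled here, the limit is the maximum number of recorded calls, with 0 meaning the function is never recorded and -1 meaning no limit. Treat that reading as an assumption and confirm it in the manual. The file name `filter.spec` is the one used on the slide.

```shell
# Rule format: <pattern>[;<pattern>...] -- <call limit>
# (These shell comments are not part of the filter file.)
cat > filter.spec <<'EOF'
my_*;test_* -- 1000
debug_* -- 0
calculate -- -1
* -- 1000000
EOF

# Must be set before the traced run, not before the build.
export VT_FILTER_SPEC="$PWD/filter.spec"
```

Using an absolute path avoids surprises when the job changes its working directory before the application starts.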
  33. 33. Switch Tracing On/Off
     • Starting and stopping of tracing should be performed with care
     • Tracing has to be reactivated at the same call level at which it was switched off, to ensure the consistency of the trace file
     • Useful if your program behaves in an iterative manner or if you are only interested in some parts of your application
         #include "vt_user.h"
         …
         VT_OFF();
         for( i=1; i < 100; i++ ) { /* do something */ }
         VT_ON();
         …
     • Recompile your source code with the compiler flag "-DVTRACE"
         %> vtcc … -DVTRACE source_code.c …
  34. 34. Selective Instrumentation • Selective instrumentation can help you reduce the size of your trace file so that only the parts of interest are recorded • One option is to use manual instrumentation instead of automatic instrumentation %> vtcc -vt:inst manual … source_code.c • Another option is to modify your Makefile so that automatic instrumentation (the default) is applied only to the source files (functions) of interest
  35. 35. Function Grouping
     • Groups can be defined by the user to group related functions
       – Groups can be assigned different colors in Vampir, highlighting application behavior
     • Environment variable VT_GROUPS_SPEC
         export VT_GROUPS_SPEC=/path/to/groups.spec
     • Group file contains a list of groups with associated functions
         CALC=calculate
         MISC=my*;test
         UNKNOWN=*
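A groups file like the one above can be written and checked before the run. In the sketch below the file name `groups.spec` matches the slide, while the sanity check and its notion of a valid group name (letters, digits, underscore) are assumptions added for illustration and are not a rule stated by VampirTrace.

```shell
cat > groups.spec <<'EOF'
CALC=calculate
MISC=my*;test
UNKNOWN=*
EOF
export VT_GROUPS_SPEC="$PWD/groups.spec"

# Sanity check (assumed shape): every line should look like GROUP=pattern[;pattern...]
if grep -qv '^[A-Za-z_][A-Za-z0-9_]*=.' groups.spec; then
    echo "malformed line in groups.spec" >&2
fi
```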
  36. 36. Advanced Performance Monitoring • CUDA wrapper library – Based on LD_PRELOAD – Usable with dynamically linked libraries – Little overhead (indirection) – No re-compilation (neither application nor library) • Diagram: Application → Preload-Library Wrapper-Function (function enter/leave) → CUDA Function (enter/leave)
  37. 37. Advanced Performance Monitoring
     • vtlibwrapgen
       – Abstraction layer for process monitoring
       – Dynamic and static libraries
       – Requires library's header file only
       – Portable
     • Diagram labels: foo.h, monitor-gen, callback.inc.*, vt_user.h, make, libmonitor/src, libmonitor.so
         vtlibwrapgen -g SDL -o SDLwrap.c /usr/include/SDL/*.h
         vtlibwrapgen --build --shared -o libSDLwrap SDLwrap.c
         export LD_PRELOAD=$PWD/libSDLwrap.so
         <executable>
  38. 38. QUESTIONS?