2010 05 hands_on


  1. Hands-On
  2. Using PuTTY for port forwarding: QUARRY / VAMPIRSERVER
  3. Start VM • If not done already, start the virtual machine: 1. Start → All Programs → MS Virtual PC 2. Next → Add an existing VM → Next 3. Browse → Select Windows XP VM 4. Next → Finish 5. Start Windows XP VM • Full-screen mode is found under the Action menu • The VM should adjust the resolution automatically • The entire hands-on is done inside the VM
  4. Start PuTTY: select Quarry, click Load, and then Open
  5. Log in to a compute node • There is a script in your home folder that connects you to the correct node: % ./logon_to_compute_node
  6. Start VampirServer • Once connected to the compute node, type "vampirserver": % vampirserver • Note the host and port it reports; they are used as the tunnel destination below
  7. Open a second PuTTY: select Quarry and click Load, but do NOT click Open yet
  8. Port forwarding: on the left, select SSH and then Tunnels
  9. Port forwarding: Source port: 30000 Destination: the host:port shown in the vampirserver terminal Click Add
  10. Port forwarding: click Open and then: % ./logon_to_compute_node This terminal can be used normally for compile and run commands
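
     For reference, the same tunnel can also be opened from any command-line SSH client. This is only a sketch: the login host quarry.example.edu and the destination c123:30001 are placeholders for your actual login host and the host:port reported by vampirserver.

     # Hypothetical command-line equivalent of the PuTTY tunnel above;
     # replace the host names and ports with your own values.
     % ssh -L 30000:c123:30001 username@quarry.example.edu
     # Vampir on the local machine then connects to 127.0.0.1:30000.
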
  11. Vampir Remote Open: open the GUI, click File, then Remote Open
  12. Vampir Remote Open: Server: 127.0.0.1 Port: 30000 Click Connect
  13. Vampir Remote Open: to avoid waiting for all user home folders to load, add the path manually. Path: /N/u/hpstrn## ##: 01-15 (see the vampirserver terminal for your specific username)
  14. Vampir Remote Open: in your home directory under Quarry/traces/p_8, click Semtex_original_8cpu and then Open
  15. Vampir 7 GUI: take time to get acquainted with the different displays and the options each display offers
  16. Use of VampirTrace • Instrument your application with VampirTrace – Edit your Makefile and replace the underlying compiler: CC = icc → CC = vtcc; CXX = icpc → CXX = vtcxx; F77 = ifort → F77 = vtf77; F90 = ifort → F90 = vtf90; MPICC = icc → MPICC = vtcc; MPIF90 = ifort → MPIF90 = vtf90 – Tell VampirTrace the parallelization type of your application: -vt:<seq|mpi|mt|hyb> # seq = sequential # mpi = parallel (uses MPI) # mt = parallel (uses OpenMP/POSIX threads) # hyb = hybrid parallel (MPI+threads) – Optional: choose the instrumentation type for your application: -vt:inst <gnu|pgi|sun|xl|ftrace|openuh|manual|dyninst> # DEFAULT: automatic instrumentation by the compiler # manual: instrumentation via VT's API (see manual) # dyninst: binary instrumentation using Dyninst (see the wrapper-invocation sketch below)
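
     The wrappers can also be called directly, without editing a Makefile. A minimal sketch with placeholder source file names; the -vt options are the ones listed on the slide above.

     # Hedged sketch of direct wrapper invocations (hello_mpi.c and serial.c
     # are placeholder file names, not part of the exercise).
     % vtcc -vt:cc mpicc -vt:mpi -g -O2 hello_mpi.c -o hello_mpi   # MPI application
     % vtcc -vt:seq -g -O2 serial.c -o serial                      # sequential code
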
  17. Getting to know the GUI: HANDS-ON EXERCISE
  18. Hands-on: The Ping-Pong Example • The Ping-Pong example with VampirTrace and Vampir – Go to the ping_pong.c example program %> cd ./examples/ping_pong – Compile and run the pristine version • Always check that the target application compiles and runs without errors %> mpicc -g -O3 ping_pong.c -o ping_pong %> mpirun -np 2 ./ping_pong – Compile with the VampirTrace compiler wrapper, then run normally %> vtcc -vt:cc mpicc -g -O3 ping_pong.c -o ping_pong %> mpirun -np 2 ./ping_pong
  19. Hands-on: The Ping-Pong Example – After the trace run, there are additional output files in the working directory: 2.2K ping_pong.0.def.z 29 ping_pong.0.marker.z 954 ping_pong.1.events.z 935 ping_pong.2.events.z 12 ping_pong.otf – The event trace is in the Open Trace Format (OTF) • Anchor file *.otf • Definitions in *.def.z • Events in *.events.z, one per process/rank/thread by default • Markers in *.marker.z for advanced usage – Open the *.otf file with Vampir – Command-line tools are available to access or modify OTF traces
  20. Hands-on: The Ping-Pong Example Time interval indicator: the entire run is shown Timeline and Profile: time is mostly spent in VT initialization and MPI finalization
  21. Hands-on: The Ping-Pong Example Zoomed in to the actual activity
  22. Hands-on: The Ping-Pong Example MPI time is still dominating! Zoomed in further, the ping-pong messages become visible; the average message bandwidth is shown
  23. Hands-on: The Ping-Pong Example Zoomed to a single message pair; different behavior on the two ranks; details for the selected second message
  24. Center for Information Services and High Performance Computing (ZIH) Guided Exercise with NPB 3.3 BT-MPI VAMPIR / VAMPIRTRACE HANDS-ON EXERCISE
  25. Hands-on: NPB 3.3 BT-MPI – Move into the tutorial directory in your home directory % cd NPB3.3-MPI – Select the VampirTrace compiler wrappers % vim config/make.def -> comment out line 32, resulting in: ... 32: #MPIF77 = mpif77 ... -> remove the comment from line 38, resulting in: ... 38: MPIF77 = vtf77 -vt:f77 mpif77 ... -> comment out line 88, resulting in: ... 88: #MPICC = mpicc ... -> remove the comment from line 94, resulting in: ... 94: MPICC = vtcc -vt:cc mpicc ...
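
     The same four edits can be applied non-interactively; this is only a sketch and assumes the comment markers and line numbers in config/make.def match the listing above exactly, so check your copy first.

     # Hedged sketch: toggle the four make.def lines with sed instead of vim.
     % sed -i -e '32s/^/#/' -e '38s/^#//' -e '88s/^/#/' -e '94s/^#//' config/make.def
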
  26. Hands-on: NPB 3.3 BT-MPI • Build benchmark % make clean; make suite • Launch as MPI application % cd bin.vampir; export VT_FILE_PREFIX=bt_1_initial % mpiexec -np 16 bt_W.16 NAS Parallel Benchmarks 3.3 -- BT Benchmark Size: 24x 24x 24 Iterations: 200 dt: 0.0008000 Number of active processes: 16 Time step 1 ... Time step 180 [0]VampirTrace: Maximum number of buffer flushes reached (VT_MAX_FLUSHES=1) [0]VampirTrace: Tracing switched off permanently Time step 200 ...
  27. Hands-on: NPB 3.3 BT-MPI • Resulting trace files % ls -alh 4,1M bt_1_initial.16 4,9K bt_1_initial.16.0.def.z 29 bt_1_initial.16.0.marker.z 12M bt_1_initial.16.10.events.z 12M bt_1_initial.16.1.events.z 11M bt_1_initial.16.2.events.z 12M bt_1_initial.16.3.events.z ... 11M bt_1_initial.16.c.events.z 12M bt_1_initial.16.d.events.z 12M bt_1_initial.16.e.events.z 12M bt_1_initial.16.f.events.z 66 bt_1_initial.16.otf • Visualization with Vampir 7
  28. Hands-on: NPB 3.3 BT-MPI (screenshot)
  29. Hands-on: NPB 3.3 BT-MPI (screenshot)
  30. Hands-on: NPB 3.3 BT-MPI • Decrease the number of buffer flushes by increasing the buffer size % export VT_MAX_FLUSHES=1 VT_BUFFER_SIZE=120M • Set a new file prefix % export VT_FILE_PREFIX=bt_2_buffer_120M • Launch as MPI application % mpiexec -np 16 bt_W.16
  31. Hands-on: NPB 3.3 BT-MPI (screenshot, on an SGI Altix 4700)
  32. Hands-on: NPB 3.3 BT-MPI (screenshot, on an SGI Altix 4700)
  33. Hands-on: NPB 3.3 BT-MPI • Generate a filter specification file % vtfilter -gen -fo filter.txt -r 10 -stats -p bt_2_buffer_120M.otf % export VT_FILTER_SPEC=/path/to/filter.txt • Set a new file prefix % export VT_FILE_PREFIX=bt_3_filter • Launch as MPI application % mpiexec -np 16 bt_W.16 • For reference, a manually written filter file (a usage sketch follows below): matmul_sub*;matvec_sub*;binvcrhs* -- 0
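
     A sketch of how such a hand-written filter file could be created and activated; the file path and the bt_3_filter_manual prefix are placeholders, and the exact filter syntax should be checked against the VampirTrace manual.

     # Hedged sketch: write the filter rule shown above to a file and use it.
     % echo 'matmul_sub*;matvec_sub*;binvcrhs* -- 0' > $HOME/filter.spec
     % export VT_FILTER_SPEC=$HOME/filter.spec
     % export VT_FILE_PREFIX=bt_3_filter_manual   # placeholder prefix
     % mpiexec -np 16 bt_W.16
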
  34. Hands-on: NPB 3.3 BT-MPI (screenshot, on an SGI Altix 4700)
  35. Hands-on: NPB 3.3 BT-MPI (screenshot, on an SGI Altix 4700)
  36. PAPI • PAPI counters can be included in traces – If VampirTrace was built with PAPI support – If PAPI is available on the platform • VT_METRICS specifies a list of PAPI counters % export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM • See also the PAPI commands papi_avail and papi_command_line • PAPI is not available on Quarry – View the Large/Small traces on your Windows machine
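
     On a system where PAPI is available, the counter selection is typically checked and set like this; a sketch only, since the available presets differ per machine.

     # Hedged sketch: list available PAPI preset counters, then pick two of them.
     % papi_avail -a                               # presets available on this CPU
     % export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM   # floating-point ops and L2 cache misses
     % mpiexec -np 16 bt_W.16
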
  37. Hands-on: NPB 3.3 BT-MPI • Record I/O and memory counters % export VT_MEMTRACE=yes % export VT_IOTRACE=yes • Set a new file prefix % export VT_FILE_PREFIX=bt_4_papi • Launch as MPI application % mpiexec -np 16 bt_W.16
  38. Hands-on: NPB 3.3 BT-MPI (screenshot, on an SGI Altix 4700)
  39. FREE TRAINING Examples: Filtering: filter_mpi_omp; Instrumenting: instrument_ring; Profiling: profile_heat; Mixed: Cannon
  40. examples/filter_mpi_omp • Look into the source code – An artificial example made of three parts • Matrix multiply, MPI-parallelized • Matrix multiply, OpenMP-parallelized • Dummy functions • Use automatic instrumentation and visualize • Filter out the dummy functions, run & visualize • Create a group filter for the dummy functions and the matrix multiply functions (a sketch of one possible group file follows below) – Do not forget to switch off the function filter
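
     One way to define such function groups is VampirTrace's group file mechanism; this is only a sketch with guessed name patterns, and the exact syntax and variable name should be verified against the manual for your VampirTrace version.

     # Hedged sketch of a function-group specification (patterns are guesses).
     % printf 'DUMMY=dummy*\nMATMUL=matmul*\n' > $HOME/groups.spec
     % export VT_GROUPS_SPEC=$HOME/groups.spec
     % unset VT_FILTER_SPEC      # switch off the function filter, as noted above
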
  41. examples/instrument_ring • Look at the source code and the Makefiles • Run and visualize both versions • Add additional instrumentation for the while loop (see the build sketch below) • Run and visualize again
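
     One option for the extra instrumentation is VampirTrace's manual API; the sketch below assumes the loop has been wrapped with the VT_USER_START/VT_USER_END macros from vt_user.h (see the manual) and uses ring.c as a placeholder file name.

     # Hedged sketch: build with manual instrumentation enabled.
     # -DVTRACE activates the VT_USER_* macros; ring.c is a placeholder name.
     % vtcc -vt:cc mpicc -vt:inst manual -DVTRACE -g -O2 ring.c -o ring
     % mpirun -np 4 ./ring
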
  42. examples/profile_heat • Compile via "make all" • export GMON_OUT_PREFIX=name • Run the binaries (change the prefix in between) • Use gprof to combine the profiles: gprof -s • Watch the output: gprof [-b] sum.txt | less (a full command sequence is sketched below)
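
     A possible end-to-end sequence for this exercise; the binary name heat is a placeholder, and gprof -s merges the per-run profiles into gmon.sum.

     # Hedged sketch of the gprof workflow ('heat' is a placeholder binary name).
     % make all
     % export GMON_OUT_PREFIX=heat_run           # each run writes heat_run.<pid>
     % ./heat                                    # first run
     % export GMON_OUT_PREFIX=heat_run2          # change the prefix in between
     % ./heat                                    # second run
     % gprof -s ./heat heat_run.* heat_run2.*    # combine the profiles into gmon.sum
     % gprof -b ./heat gmon.sum > sum.txt        # brief, combined report
     % less sum.txt
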
  43. WRAP-UP: How to solve issues when using VampirTrace
  44. HOW TO SOLVE ISSUES WHEN USING VAMPIRTRACE For more details on VampirTrace and its features, see also the manual.
  45. Incomplete Traces • Issue: tracing was switched off because the internal trace buffer was too small [0]VampirTrace: Maximum number of buffer flushes reached (VT_MAX_FLUSHES=1) [0]VampirTrace: Tracing switched off permanently • Results: – Asynchronous behavior of the application due to the buffer flushes of the measurement system – No tracing information available after the flush operation – Large overhead due to the flush operation
  46. Incomplete Traces - Solutions • Increase the trace buffer size %> export VT_BUFFER_SIZE=150M • Increase the number of allowed buffer flushes (not recommended) %> export VT_MAX_FLUSHES=2 • Use filter mechanisms to reduce the number of recorded events %> export VT_FILTER_SPEC=$HOME/filter.spec • Switch tracing on/off if your application works in an iterative manner, to reduce the number of recorded events
  47. Overly Large Traces • Issue: – Every function entry/exit and MPI event is recorded • Result: – Trace files become large even for short application runs • Solutions: – Use filter mechanisms to reduce the number of recorded events – Use selective instrumentation of your application – Switch tracing on/off if your application works in an iterative manner, to reduce the number of recorded events
  48. Overhead • Issue: – Runtime filtering is invoked for each event • Result: – Runtime filtering increases the runtime overhead • Solutions: – Use selective instrumentation of your application – Use manual source instrumentation (high effort, error-prone) – Only instrument the interesting source files with VampirTrace (see the sketch below) – Switch tracing on/off if your application works in an iterative manner, to reduce the number of recorded events
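
     Instrumenting only the files of interest can be done by compiling just those files with the VampirTrace wrapper and the rest with the plain compiler; a sketch with placeholder file names.

     # Hedged sketch: selective instrumentation by mixing wrapped and plain compiles.
     % vtcc -vt:cc mpicc -c solver.c                  # instrumented (placeholder name)
     % mpicc -c helpers.c                             # left uninstrumented (placeholder name)
     % vtcc -vt:cc mpicc solver.o helpers.o -o app    # link with the wrapper
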
  49. Additional Information Needed • Issue: – I'm interested in more events and hardware counters. What do I have to do? • Solutions (a combined example follows below): – Use the environment variable VT_METRICS to enable recording of additional hardware counters such as PAPI, CPC, or NEC, if available. – Use the environment variable VT_RUSAGE to record the Unix resource usage counters. – Use the environment variable VT_MEMTRACE, if available on your system, to intercept the libc allocation functions and record memory allocation information. – For further events and hardware recording options, see chapter 4 of the VampirTrace manual.
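
     A combined sketch of these options for a single run; the counter list and the value given for VT_RUSAGE are illustrative, so check the manual for the values supported by your installation.

     # Hedged sketch: enable extra counters for one trace run.
     % export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM   # hardware counters (requires PAPI)
     % export VT_RUSAGE=all                        # resource usage counters (illustrative value)
     % export VT_MEMTRACE=yes                      # libc memory allocation tracing
     % export VT_IOTRACE=yes                       # I/O tracing (see slide 37)
     % mpiexec -np 16 bt_W.16
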
  50. Thanks for your attention.
