Trace Visualization


  1. Trace Visualization: Visualization and Analysis of MPI Resources
  2. Motivation & Mission
     • Motivation
       – Parallel programming is about performance!
       – Scaling to thousands of cores is required
       – You need a decent MPI implementation, e.g. Open MPI
       – You also need a ready-to-use performance monitoring and analysis tool
     • Mission
       – Visualization of dynamics of complex parallel processes
       – Requires two components:
         • Monitor/Collector (VampirTrace)
         • Charts/Browser (Vampir)
       – Available for major platforms
       – Open source (partially)
  3. Event Trace Visualization
     • Trace visualization
       – Alternative and supplement to automatic analysis
       – Show dynamic run-time behavior graphically
       – Provide statistics and performance metrics
         • Global timeline for parallel processes/threads
         • Process timeline plus performance counters
         • Statistics summary display
         • Message statistics
         • More
       – Interactive browsing, zooming, selecting
         • Adapt statistics to zoom level (time interval)
         • Also for very large and highly parallel traces
  4. Vampir History
     • PARvis at Research Center Jülich
     • 1995: Vampir at Research Center Jülich
       http://www.top500.org/reports/1995/vampir/vampir.html
     • 1997: Vampir at TU Dresden
     • 2006: new version VampirServer (also known as Vampir NG)
       – Distributed storage, enhanced scalability
       – Client/server architecture
     • 2009: Vampir 7, redesign of the GUI using Qt
  5. Vampir Toolset Architecture
     [Architecture diagram: a multi-core or many-core program is monitored by VampirTrace, which writes a trace file or trace bundle in OTF format; the trace is then opened with Vampir 7 or, for highly parallel runs, with the parallel VampirServer.]
  6. Vampir for Windows
     • Vampir for UNIX
       – Vampir Classic: all in one, single threaded
       – VampirServer: parallelized service engine (MPI parallel), Motif browser connected via sockets
     • Vampir for Windows
       – Based on a parallel service (threaded service DLL with API)
       – All-new browser GUI
     • A beta of the new Vampir 7 for Windows is available at www.vampir.eu
  7. Usage order of the Vampir Performance Analysis Toolset
     1. Instrument your application with VampirTrace
     2. Run your application with an appropriate test set
     3. Analyze your trace file with Vampir
        – Small trace files with a low number of processes can be analyzed on your local workstation:
          1. Start your local Vampir
          2. Load the trace file from your local disk
        – Large trace files should be stored on the cluster file system:
          1. Start VampirServer on your analysis cluster
          2. Start your local Vampir
          3. Connect the local Vampir to the VampirServer on the analysis cluster
          4. Load the trace file from the cluster file system
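To make steps 1 and 2 concrete, here is a minimal sketch of an instrumented MPI program. The compile and run commands in the leading comment are assumptions about a typical installation (the VampirTrace compiler wrapper is often called vtcc, or mpicc-vt in Open MPI builds); adjust the names to your site.

    /* trace_demo.c - tiny MPI program used to produce a first trace.
     *
     * Assumed build and run commands (wrapper names depend on the installation):
     *   vtcc -o trace_demo trace_demo.c      (VampirTrace compiler wrapper)
     *   mpirun -np 4 ./trace_demo            (writes an OTF trace next to the binary)
     * Afterwards open the resulting .otf file with Vampir, or with
     * VampirServer for large runs.
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, local, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        local = rank + 1;                    /* some per-process "work" */
        MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks: %d\n", size, sum);

        MPI_Finalize();
        return 0;
    }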
  8. Vampir Displays
     The main displays of Vampir:
     • Master Timeline (Global Timeline)
     • Process and Counter Timeline
     • Function Summary
     • Message Summary
     • Process Summary
     • Communication Matrix
     • Call Tree
  9. Vampir 7: Displays for a WRF trace
  10. Master Timeline (Global Timeline)
  11. Process and Counter Timeline
  12. Function Summary
  13. Message Summary
  14. Process Summary
  15. Communication Matrix
  16. Call Tree
  17. Customizable Chart Layout
     • No cluttering
     • Time-based alignment
     • View impact at a glance
     • Simple controls (hidden)
     • User defined:
       – Combination
       – Rows and columns
       – Arrangement
       – Size
     [Screenshot: example layout with toolbars, Master Timeline, Function Summary, a secondary timeline, Call Tree, Process Timeline, Function Group Summary, Context View, and Legend.]
  18. Sessions
     • What is a session?
       – Trace file
       – Chart selection
       – Layout
       – Preferences (i.e. colors)
       – Chart options
     • Scope of session properties
       – Identical for all traces
       – Trace specific
       – Matter of taste
       – Therefore: the scope is customizable
     • Can be attached to trace data
     [Diagram: a trace file (OTF) plus a config file determine the toolbars, the selected charts (Master Timeline, Function Summary, secondary timelines, Call Tree, Process Timeline, Function Group Summary, Context View, Legend), and their layout.]
  19. Typical Performance Problems
  20. Finding Bottlenecks
     • Trace visualization
       – Vampir provides a number of display types
       – Each allows many different options
     • Advice
       – Identify essential parts of an application (initialization, main iteration, I/O, finalization)
       – Identify important components of the code (serial computation, MPI P2P, collective MPI, OpenMP)
       – Make a hypothesis about performance problems
       – Consider the application's internal workings if known
       – Select the appropriate displays
       – Use statistics displays in conjunction with timelines
  21. Finding Bottlenecks
     • Communication
     • Computation
     • Memory, I/O, etc.
     • Tracing itself
  22. Bottlenecks in Communication
     • Communication as such (dominating over computation)
     • Late sender, late receiver
     • Point-to-point messages instead of collective communication (see the sketch after this list)
     • Unmatched messages
     • Overcharge of MPI's buffers
     • Bursts of large messages (bandwidth)
     • Frequent short messages (latency)
     • Unnecessary synchronization (barrier)
     ⇒ All of the above usually result in a high MPI time share
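As an illustration of the point-to-point versus collective item, the following sketch contrasts a hand-rolled gather loop with the equivalent collective call. It is a generic example, not code from the slides; the function names are invented.

    #include <mpi.h>
    #include <stdlib.h>

    /* Hand-rolled gather: rank 0 receives one value from every other rank
     * in turn. In the Master Timeline this shows up as many small,
     * serialized point-to-point messages converging on rank 0. */
    static void gather_by_hand(int value, int *all, int rank, int size)
    {
        if (rank == 0) {
            all[0] = value;
            for (int src = 1; src < size; src++)
                MPI_Recv(&all[src], 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        } else {
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }

    /* Collective equivalent: one MPI_Gather call lets the library choose an
     * efficient algorithm (e.g. a tree), which usually appears much shorter
     * in the timeline. */
    static void gather_collective(int value, int *all)
    {
        MPI_Gather(&value, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int *all = malloc((size_t)size * sizeof *all);
        gather_by_hand(rank, all, rank, size);
        gather_collective(rank, all);

        free(all);
        MPI_Finalize();
        return 0;
    }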
  23. Bottlenecks in Communication: unnecessary MPI_Barriers
  24. Bottlenecks in Communication: patterns of successive MPI_Allreduce calls
  25. Bottlenecks in Communication: inefficient implementation of MPI_Allgatherv
  26. Further Bottlenecks
     • Unbalanced computation (a scheduling sketch follows below)
       – Single late comer
     • Strictly serial parts of program
       – Idle processes/threads
     • Very frequent tiny function calls
     • Sparse loops
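One common source of the "single late comer" pattern is static distribution of loop iterations whose cost varies. The sketch below is a generic illustration, not taken from the slides; the cost function work() and the iteration count are invented. Switching from static to dynamic scheduling usually evens out the per-thread load.

    #include <omp.h>
    #include <math.h>
    #include <stdio.h>

    /* Dummy kernel whose cost grows with i, so a static block distribution
     * gives the last thread far more work than the first one. */
    static double work(int i)
    {
        double s = 0.0;
        for (int k = 0; k < i; k++)
            s += sin((double)k);
        return s;
    }

    int main(void)
    {
        const int n = 20000;
        double total = 0.0;

        /* schedule(static) would hand each thread one contiguous block of
         * iterations; with increasing per-iteration cost the later threads
         * become "late comers" while the earlier ones sit idle at the
         * implicit barrier. schedule(dynamic) hands out small chunks on
         * demand and usually balances the load. */
        #pragma omp parallel for schedule(dynamic, 64) reduction(+:total)
        for (int i = 0; i < n; i++)
            total += work(i);

        printf("total = %f\n", total);
        return 0;
    }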
  27. Further Bottlenecks: example of idle OpenMP threads
  28. Bottlenecks in Computation
     • Memory bound computation
       – Inefficient L1/L2/L3 cache usage
       – TLB misses
       – Detectable via HW performance counters (see the cache sketch below)
     • I/O bound computation
       – Slow input/output
       – Sequential I/O on single process
       – I/O load imbalance
     • Exception handling
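To illustrate the cache-usage item, the sketch below compares a column-major traversal of a row-major C array with its cache-friendly counterpart; the matrix size and function names are invented for the example. In VampirTrace such effects can be made visible by recording hardware counters, e.g. via the VT_METRICS environment variable, assuming PAPI support is compiled in.

    #include <stdio.h>
    #include <stdlib.h>

    #define N 2048

    /* Column-major traversal of a row-major array: each access jumps N
     * doubles ahead, so nearly every access misses in the cache. */
    static double sum_column_major(const double *a)
    {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i * N + j];
        return s;
    }

    /* Row-major traversal: consecutive accesses stay within a cache line,
     * so the loop is limited by arithmetic rather than memory latency. */
    static double sum_row_major(const double *a)
    {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i * N + j];
        return s;
    }

    int main(void)
    {
        double *a = malloc((size_t)N * N * sizeof *a);
        for (int i = 0; i < N * N; i++)
            a[i] = 1.0;

        /* With counters enabled (e.g. VT_METRICS=PAPI_L2_TCM:PAPI_FP_OPS,
         * assuming PAPI is available), the first call shows a much higher
         * cache-miss rate than the second in the Counter Timeline. */
        printf("%f %f\n", sum_column_major(a), sum_row_major(a));

        free(a);
        return 0;
    }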
  29. Bottlenecks in Computation: low FP rate due to heavy cache misses
  30. Bottlenecks in Computation: low FP rate due to heavy FP exceptions
  31. Bottlenecks in Computation: irregular slow I/O operations
  32. Effects due to Tracing
     • Measurement overhead
       – Especially grave for tiny function calls
       – Solve with selective instrumentation (see the sketch below)
     • Long/frequent/asynchronous trace buffer flushes
     • Too many concurrent counters
     • Heisenbugs
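As a rough sketch of selective instrumentation: VampirTrace's manual instrumentation macros (VT_USER_START/VT_USER_END from vt_user.h, active when VTRACE is defined) can be combined with a build that does not auto-instrument tiny helpers, so only the regions you mark are recorded. The compile command in the comment is an assumption about a typical installation; check the VampirTrace documentation at your site.

    /* Assumed build (manual instrumentation only, wrapper name may differ):
     *   vtcc -vt:inst manual -DVTRACE -o demo demo.c
     */
    #include <stdio.h>
    #include "vt_user.h"   /* provides VT_USER_START / VT_USER_END when VTRACE is defined */

    /* A tiny helper called millions of times: automatic instrumentation of
     * this function would dominate the trace, so it stays uninstrumented. */
    static double tiny(double x) { return x * x; }

    int main(void)
    {
        double s = 0.0;

        /* Only this coarse region is recorded as an event pair. */
        VT_USER_START("main_loop");
        for (long i = 0; i < 10 * 1000 * 1000L; i++)
            s += tiny((double)i);
        VT_USER_END("main_loop");

        printf("%f\n", s);
        return 0;
    }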
  33. Effects due to Tracing: trace buffer flushes are explicitly marked in the trace. A flush is rather harmless at the end of a trace, as shown here.
  34. Conclusion
     • Performance analysis is very important in HPC
     • Use performance analysis tools for profiling and tracing
     • Do not spend effort on DIY solutions such as printf debugging
     • Use tracing tools with some precautions:
       – Overhead
       – Data volume
     • Let us know about problems and feature wishes: vampirsupport@zih.tu-dresden.de
  35. Summary
     • Vampir & VampirServer
       – Interactive trace visualization and analysis
       – Intuitive browsing and zooming
       – Scalable to large trace data sizes (100 GByte)
       – Scalable to high parallelism (20,000 processes)
     • Vampir for Linux in progress, beta available
     • VampirTrace
       – Convenient instrumentation and measurement
       – Hides away complicated details
       – Provides many options and switches for experts
     • VampirTrace is part of Open MPI 1.3 and later
