This document introduces event tracing with VampirTrace and Vampir, covering instrumentation, run-time measurement, and visualization: code is instrumented to record events, the instrumented program is run to generate trace files, and tools such as Vampir are used to analyze and visualize the results.
4. Why bother with performance analysis?
Moore's Law still in charge, so what?
increasingly difficult to get close to peak performance
– for sequential computation
• memory wall
• optimum pipelining, ...
– for parallel interaction
• Amdahl's law
• synchronization with single late-comer, ...
efficiency is important because of limited resources
scalability is important to cope with next bigger simulation
5. Profiling and Tracing
Profile Recording
of aggregated information (Time, Counts, …)
about program and system entities
– functions, loops, basic blocks
– application, processes, threads, …
Methods of Profile Creation
sampling (statistical approach)
direct measurement (deterministic approach)
6. Profiling and Tracing
Trace Recording
run-time events (points of interest)
during program execution
saved as event record
– timestamp, process, thread, event type
– event specific information
via instrumentation & trace library
Event Trace
collection of all events of a process / program
sorted by time stamp
7. Profiling and Tracing
Tracing Advantages
preserve temporal and spatial relationships (context)
allow reconstruction of dynamic behavior
profiles can be calculated from traces
Tracing Disadvantages
traces can become very large
may cause perturbation
instrumentation and tracing are complicated
– event buffering, clock synchronization, …
8. Event Tracing Overview
Zellescher Weg 12
Willers-Bau A114
Tel. +49 351 - 463 - 38323
Andreas Knüpfer (andreas.knuepfer@tu-dresden.de)
9. Event Tracing from A to Z
[Workflow diagram: the source code is instrumented; the instrumented executable runs under run-time measurement, producing trace file(s); these are then loaded for visualization / analysis. Instrumentation is covered in the following slides, run-time measurement and analysis further below and in the following presentation.]
10. Most common event types
Which events to monitor?
enter/leave of function/routine/region
– time stamp, process/thread, function ID
send/receive of P2P message (MPI)
– time stamp, sender, receiver, length, tag, communicator
collective communication (MPI)
– time stamp, process, root, communicator, # bytes
hardware performance counter value
– time stamp, process, counter ID, value
corresponding “record types” in trace file format
11. Parallel Trace Files
Process 1 event stream:
10010 P 1 ENTER 5
10090 P 1 ENTER 6
10110 P 1 ENTER 12
10110 P 1 SEND TO 3 LEN 1024 ...
10330 P 1 LEAVE 12
10400 P 1 LEAVE 6
10520 P 1 ENTER 9
10550 P 1 LEAVE 9
...
Process 2 event stream:
10020 P 2 ENTER 5
10095 P 2 ENTER 6
10120 P 2 ENTER 13
10300 P 2 RECV FROM 3 LEN 1024 ...
10350 P 2 LEAVE 13
10450 P 2 LEAVE 6
10620 P 2 ENTER 9
10650 P 2 LEAVE 9
...
Definition records:
DEF TIMERRES 1000000000
DEF PROCESS 1 `Master`
DEF PROCESS 2 `Slave`
DEF FUNCTION 5 `main`
DEF FUNCTION 6 `foo`
DEF FUNCTION 9 `bar`
DEF FUNCTION 12 `MPI_Send`
DEF FUNCTION 13 `MPI_Recv`
Trace Format Schematics
16. The Vampir Tool Family
VampirTrace
convenient instrumentation and measurement
hides away complicated details
provides many options and switches for experts
VampirTrace is part of Open MPI 1.3
Vampir/VampirServer
interactive trace visualization and analysis
intuitive browsing and zooming
scalable to large trace data sizes (100GB)
scalable to high parallelism (2000 processes)
Vampir for Windows in progress, beta version available
17. Trace File Formats
Open Trace Format (OTF)
Open source trace file format
Includes powerful libotf for use in custom applications
High level interface for tools + low level interface for trace libraries
Other Formats
TAU Trace Format (Univ. of Oregon)
Epilog (ZAM, FZ Jülich)
STF (Pallas, now Intel)
18. Other Tools
Other Event Tracing Tools
TAU profiling (University of Oregon, USA)
– profiling and tracing for parallel applications
– http://www.cs.uoregon.edu/research/tau/
Paraver (CEPBA, Barcelona, Spain)
– trace based parallel performance analysis and visualization
– http://www.cepba.upc.edu/paraver/
Scalasca (FZ Jülich)
– tracing and automatic detection of performance problems
– http://www.scalasca.org/
Intel Trace Collector & Analyzer
– Very similar to Vampir
20. Instrumentation
Instrumentation: process of modifying programs to detect and report events by calling instrumentation functions.
instrumentation functions provided by trace library
notification about run-time event
there are various ways of instrumentation
21. Instrumentation
Edit – Compile – Run Cycle:
Source Code → Compiler → Binary → Run → Results
Edit – Compile – Run Cycle with VampirTrace:
Source Code → VT Compiler Wrapper → Binary → Run → Results + Traces
22. Instrumentation Types
Source code instrumentation
– manually
– automatically
Instrumentation with wrapper functions
Library pre-load instrumentation
Compiler Instrumentation
Binary instrumentation
VampirTrace supports these different methods of instrumentation, hidden in its compiler wrappers
23. Source Code Instrumentation
uninstrumented:

int foo(void* arg) {
  if (cond) {
    return 1;
  }
  return 0;
}

instrumented:

int foo(void* arg) {
  enter(7);
  if (cond) {
    leave(7);
    return 1;
  }
  leave(7);
  return 0;
}

manually or automatically
24. Source Code Instrumentation
manually
large effort
error prone
difficult to manage
automatically
via source to source translation
Program Database Toolkit (PDT)
http://www.cs.uoregon.edu/research/pdt/
OpenMP Pragma And Region Instrumentor (Opari)
http://www.fz-juelich.de/zam/kojak/opari/
25. Instrumentation with Wrapper Functions
provide wrapper functions
– call instrumentation function for notification
– call original target for actual functionality
implement via library pre-load
or via preprocessor directives
#define fread WRAPPER_glibc_fread
#define fwrite WRAPPER_glibc_fwrite
suitable for standard libraries (e.g. MPI, glibc)
can evaluate function call semantics (function signature, arguments)
26. The MPI Profiling Interface
Instrumentation via library pre-load, e.g. for MPI
Each MPI function has two names:
– MPI_xxx and PMPI_xxx
Selective replacement of MPI routines at link time
[Diagram: the user program calls MPI_Send; the wrapper library intercepts it under the name MPI_Send, records the event, and calls PMPI_Send; the MPI library provides the actual implementation under both names MPI_Send and PMPI_Send.]
27. Compiler Instrumentation
gcc -finstrument-functions -c foo.c
void __cyg_profile_func_enter( <args> );
void __cyg_profile_func_exit( <args> );
many compilers support instrumentation:
(GCC, Intel, IBM, PGI, NEC, Hitachi, Sun Fortran, …)
no common API, different command line switches, different behavior
no source modification necessary
managed by VampirTrace
28. Dynamic Instrumentation
modify binary executable in main memory (or in a file)
insert instrumentation calls
very platform/machine dependent
expensive
Using the DynInst project
provides common interface to binary instrumentation
available for Alpha/Tru64, MIPS/IRIX, PowerPC/AIX, Sparc/Solaris, x86/Linux+Windows, ia64/Linux
see http://www.dyninst.org
29. Practical Instrumentation
Use VampirTrace compiler wrappers
Internals and platform specifics hidden
Select appropriate way(s) of instrumentation
Substitute calls to the regular compiler with calls to the compiler wrappers:
CC=mpicc → CC=vtcc
30. Run Time Measurement
31. Trace Library
What does the trace library do?
provide instrumentation functions
receive events of various types
collect event properties
– time stamp
– location (thread, process, cluster node, MPI rank)
– event specific properties
– perhaps hardware performance counter values
record to memory buffer, flush eventually
try to be fast, minimize overhead
32. Run-Time Options
There are a number of run-time options
Controlled by environment variables
PAPI hardware performance counters
Memory allocation counters
Application I/O calls
Filtering
Grouping
more ...
see more in the following presentations and hands-on parts
33. Performance Counters
Include hardware performance counters in traces
– via PAPI library
– or Sun Solaris CPC counters
– or NEC SX counters
VT_METRICS can be used to specify a colon-separated list of counters
see papi_avail and papi_command_line tools etc.
see VampirTrace Documentation for CPC and NEC counters
set VT_METRICS environment variable
export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM
34. Memory Allocation Tracing
monitor memory allocation behavior
record memory volume as counter
record glibc calls like “malloc” and “free” as function calls
via environment variable VT_MEMTRACE
export VT_MEMTRACE=yes
35. I/O Tracing
monitor POSIX I/O behavior
record read/write rates as counters
record standard I/O calls like “open” and “read”
via environment variable VT_IOTRACE
export VT_IOTRACE=yes
mmap I/O not supported
36. Function Filtering
selective tracing of certain functions/subroutines
one way to reduce trace file size!
via environment variable VT_FILTER_SPEC
export VT_FILTER_SPEC=/home/user/filter.spec
run-time filtering, no re-compilation or re-linking
my*;test -- 1000
calculate -- -1
* -- 1000000
see also the vtfilter tool
– can create a filter file with rough target size estimate
– can apply a filter to an existing trace file as post processing
37. Function Grouping
define user-specified groups
highlighting application behavior, different activities, program phases
– communication, computation, initialization, different libraries, ...
groups are assigned to colors in Vampir displays
run-time grouping, no re-compilation or re-linking
via environment variable VT_GROUPS_SPEC
export VT_GROUPS_SPEC=/home/<user>/groups.spec
contains a list of groups of associated functions, wildcards allowed
CALC=calculate
MISC=my*;test
UNKNOWN=*
38. Behind the Scenes
Further activities of the trace library:
Data management
– Trace data is written to a buffer in memory first
– When this buffer is full, data is flushed to files
– Data compression, etc
Timer selection and time synchronization between local clocks
– use highly accurate clocks
Unification of local process/thread traces (post processing)
– trace processes/threads separately
– collect all traces of all parallel processes/threads at the end
– add global information about all participants
40. Conclusion
performance analysis is very important in HPC
use performance analysis tools for profiling and tracing
do not spend effort on DIY solutions like printf-debugging
use tracing tools with some precautions
– overhead
– data volume
let us know about problems and feature wishes via vampirsupport@zih.tu-dresden.de