Sameer Shende, director, Performance Research Laboratory, University of Oregon, presents TAU for Accelerating AI Applications at OpenPOWER Summit Europe 2018.
Self Regulating Streaming - Data Platforms Conference 2018Streamlio
Streamlio's Karthik Ramasamy takes a look how the Apache Heron streaming platform uses built-in intelligence to automatically regulate data flow and ensure resiliency.
Containerizing HPC and AI applications using E4S and Performance Monitor toolGanesan Narayanasamy
The DOE Exascale Computing Project (ECP) Software Technology focus area is developing an HPC software ecosystem that will enable the efficient and performant execution of exascale applications. Through the Extreme-scale Scientific Software Stack (E4S) [https://e4s.io], it is developing a comprehensive and coherent software stack that will enable application developers to productively write highly parallel applications that can portably target diverse exascale architectures. E4S provides both source builds through the Spack platform and a set of containers that feature a broad collection of HPC software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC users. It provides container images, build manifests, and turn-key, from-source builds of popular HPC software packages developed as Software Development Kits (SDKs). This effort includes a broad range of areas including programming models and runtimes (MPICH, Kokkos, RAJA, OpenMPI), development tools (TAU, HPCToolkit, PAPI), math libraries (PETSc, Trilinos), data and visualization tools (Adios, HDF5, Paraview), and compilers (LLVM), all available through the Spack package manager. It will describe the community engagements and interactions that led to the many artifacts produced by E4S. It will introduce the E4S containers are being deployed at the HPC systems at DOE national laboratories using Singularity, Shifter, and Charliecloud container runtimes.
This talk will describe how E4S can support the OpenPOWER platform with NVIDIA GPUs.
Self Regulating Streaming - Data Platforms Conference 2018Streamlio
Streamlio's Karthik Ramasamy takes a look how the Apache Heron streaming platform uses built-in intelligence to automatically regulate data flow and ensure resiliency.
Containerizing HPC and AI applications using E4S and Performance Monitor toolGanesan Narayanasamy
The DOE Exascale Computing Project (ECP) Software Technology focus area is developing an HPC software ecosystem that will enable the efficient and performant execution of exascale applications. Through the Extreme-scale Scientific Software Stack (E4S) [https://e4s.io], it is developing a comprehensive and coherent software stack that will enable application developers to productively write highly parallel applications that can portably target diverse exascale architectures. E4S provides both source builds through the Spack platform and a set of containers that feature a broad collection of HPC software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC users. It provides container images, build manifests, and turn-key, from-source builds of popular HPC software packages developed as Software Development Kits (SDKs). This effort includes a broad range of areas including programming models and runtimes (MPICH, Kokkos, RAJA, OpenMPI), development tools (TAU, HPCToolkit, PAPI), math libraries (PETSc, Trilinos), data and visualization tools (Adios, HDF5, Paraview), and compilers (LLVM), all available through the Spack package manager. It will describe the community engagements and interactions that led to the many artifacts produced by E4S. It will introduce the E4S containers are being deployed at the HPC systems at DOE national laboratories using Singularity, Shifter, and Charliecloud container runtimes.
This talk will describe how E4S can support the OpenPOWER platform with NVIDIA GPUs.
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16AppDynamics
Monitoring is complicated, and in most organizations consists of far too many tools owned by too many teams. Fixing monitoring issues requires people, process, and technology. Hear common issues seen in the real world including what should be monitored or collected from a technology and a business perspective.
Investigate what instrumentation is most scalable and effective across languages, commonly used APIs, and possibilities for capturing data from common languages like Java, .NET, and PHP. Cover browser and mobile instrumentation techniques. Get tips on which APIs to use, what open source tools and frameworks can be leveraged, and how to coordinate and communicate requirements across your organization.
Key takeaways:
o What is instrumentation, and what to instrument, collect, and store
o How this can be accomplished on common software stacks
o How to work with application owners to collect business data
o How correlation works in custom open source or packaged monitoring tools
For more information, go to: www.appdynamics.com
DOE Exascale Computing Project (EC) Software Technology focus area
is developing an HPC software ecosystem that will enable the efficient
and performant execution of exascale applications. Through the
Extreme-scale Scientific Software Stack (E4S), it is developing a
comprehensive and coherent software stack that will enable application
developers to productively write highly parallel applications that can
portably target diverse exascale architectures - including the IBM
OpenPOWER with NVIDIA GPU systems. E4S features a broad collection of
HPC software packages including the TAU Performance System(R) for
performance evaluation of HPC and AI/ML codes. TAU is a versatile
profiling and tracing toolkit that supports performance engineering of
codes written for CPU and GPUs and has support for most IBM platforms.
This talk will give an overview of TAU and E4S and how developers can
use these tools to analyze the performance of their codes. TAU supports
transparent instrumentation of codes without modifying the application
binary. The talk will describe TAU's support for CUDA, OpenACC, pthread,
OpenMP, Kokkos, and MPI applications. It will describe TAU's use for
Python based frameworks such as Tensorflow and PyTorch. It will cover
the use of TAU in E4S containers using Docker and Singularity runtimes
under ppc64le. E4S provides both source builds through the Spack
platform and a set of containers that feature a broad collection of HPC
software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC users.
QuickStart your Sumo Logic service with this exclusive webinar. At these monthly live events you will learn how to capitalize on critical capabilities that can amplify your log analytics and monitoring experience while providing you with meaningful business and IT insights.
QuickStart your Sumo Logic service with this exclusive webinar. At these monthly live events you will learn how to capitalize on critical capabilities that can amplify your log analytics and monitoring experience while providing you with meaningful business and IT insights.
QuickStart your Sumo Logic service with this exclusive webinar. At these monthly live events you will learn how to capitalize on critical capabilities that can amplify your log analytics and monitoring experience while providing you with meaningful business and IT insights.
More info: sumologic.com/training
Video: https://www.youtube.com/watch?v=FJW8nGV4jxY and https://www.youtube.com/watch?v=zrr2nUln9Kk . Tutorial slides for O'Reilly Velocity SC 2015, by Brendan Gregg.
There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This tutorial explains methodologies for using these tools, and provides a tour of four tool types: observability, benchmarking, tuning, and static tuning. Many tools will be discussed, including top, iostat, tcpdump, sar, perf_events, ftrace, SystemTap, sysdig, and others, as well observability frameworks in the Linux kernel: PMCs, tracepoints, kprobes, and uprobes.
This tutorial is updated and extended on an earlier talk that summarizes the Linux performance tool landscape. The value of this tutorial is not just learning that these tools exist and what they do, but hearing when and how they are used by a performance engineer to solve real world problems — important context that is typically not included in the standard documentation.
Simplifying debugging for multi-core Linux devices and low-power Linux clusters Rogue Wave Software
Debugging scalable hybrid and accelerated applications on the Cray XC30, CS300 with our multi-threaded debugger TotalView. Faster fault isolation, improved memory optimization, and dynamic visualization for your high performance computing apps, a solution provided by Rogue Wave Software
“Performance testing is the process by which software is tested to determine the current system performance. This process aims to gather information about current performance, but places no value judgments on the findings".
QuickStart your Sumo Logic service with this exclusive webinar. At these monthly live events you will learn how to capitalize on critical capabilities that can amplify your log analytics and monitoring experience while providing you with meaningful business and IT insights.
Video: https://www.sumologic.com/online-training/#QuickStart
Network visibility and control using industry standard sFlow telemetrypphaal
• Find out about the sFlow instrumentation built into commodity data center network and server infrastructure.
• Understand how sFlow fits into the broader ecosystem of NetFlow, IPFIX, SNMP and DevOps monitoring technologies.
• Case studies demonstrate how sFlow telemetry combined with automation can lower costs, increase performance, and improve security of cloud infrastructure and applications.
Webinar here: https://youtu.be/MEmFFwNmLxg
Sumo Logic "How To" Webinar - Monitoring you Data: Alerting on Outliers
Dashboards are fantastic, but how do I get notified of critical events? This webinar will cover how to create alerts that will allow your team to effectively monitor business-critical events. Alert channels include email or webhooks into Slack, PagerDuty, DataDog, ServiceNow, or any other webhook you want to develop. What about running custom scripts triggered from alerts? Let's do it.
Sumo Logic QuickStart Webinar - Dec 2016Sumo Logic
QuickStart your Sumo Logic service with this exclusive webinar. At these monthly live events you will learn how to capitalize on critical capabilities that can amplify your log analytics and monitoring experience while providing you with meaningful business and IT insights.
Video: https://www.sumologic.com/online-training/#start
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16AppDynamics
Monitoring is complicated, and in most organizations consists of far too many tools owned by too many teams. Fixing monitoring issues requires people, process, and technology. Hear common issues seen in the real world including what should be monitored or collected from a technology and a business perspective.
Investigate what instrumentation is most scalable and effective across languages, commonly used APIs, and possibilities for capturing data from common languages like Java, .NET, and PHP. Cover browser and mobile instrumentation techniques. Get tips on which APIs to use, what open source tools and frameworks can be leveraged, and how to coordinate and communicate requirements across your organization.
Key takeaways:
o What is instrumentation, and what to instrument, collect, and store
o How this can be accomplished on common software stacks
o How to work with application owners to collect business data
o How correlation works in custom open source or packaged monitoring tools
For more information, go to: www.appdynamics.com
DOE Exascale Computing Project (EC) Software Technology focus area
is developing an HPC software ecosystem that will enable the efficient
and performant execution of exascale applications. Through the
Extreme-scale Scientific Software Stack (E4S), it is developing a
comprehensive and coherent software stack that will enable application
developers to productively write highly parallel applications that can
portably target diverse exascale architectures - including the IBM
OpenPOWER with NVIDIA GPU systems. E4S features a broad collection of
HPC software packages including the TAU Performance System(R) for
performance evaluation of HPC and AI/ML codes. TAU is a versatile
profiling and tracing toolkit that supports performance engineering of
codes written for CPU and GPUs and has support for most IBM platforms.
This talk will give an overview of TAU and E4S and how developers can
use these tools to analyze the performance of their codes. TAU supports
transparent instrumentation of codes without modifying the application
binary. The talk will describe TAU's support for CUDA, OpenACC, pthread,
OpenMP, Kokkos, and MPI applications. It will describe TAU's use for
Python based frameworks such as Tensorflow and PyTorch. It will cover
the use of TAU in E4S containers using Docker and Singularity runtimes
under ppc64le. E4S provides both source builds through the Spack
platform and a set of containers that feature a broad collection of HPC
software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC users.
QuickStart your Sumo Logic service with this exclusive webinar. At these monthly live events you will learn how to capitalize on critical capabilities that can amplify your log analytics and monitoring experience while providing you with meaningful business and IT insights.
QuickStart your Sumo Logic service with this exclusive webinar. At these monthly live events you will learn how to capitalize on critical capabilities that can amplify your log analytics and monitoring experience while providing you with meaningful business and IT insights.
QuickStart your Sumo Logic service with this exclusive webinar. At these monthly live events you will learn how to capitalize on critical capabilities that can amplify your log analytics and monitoring experience while providing you with meaningful business and IT insights.
More info: sumologic.com/training
Video: https://www.youtube.com/watch?v=FJW8nGV4jxY and https://www.youtube.com/watch?v=zrr2nUln9Kk . Tutorial slides for O'Reilly Velocity SC 2015, by Brendan Gregg.
There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This tutorial explains methodologies for using these tools, and provides a tour of four tool types: observability, benchmarking, tuning, and static tuning. Many tools will be discussed, including top, iostat, tcpdump, sar, perf_events, ftrace, SystemTap, sysdig, and others, as well observability frameworks in the Linux kernel: PMCs, tracepoints, kprobes, and uprobes.
This tutorial is updated and extended on an earlier talk that summarizes the Linux performance tool landscape. The value of this tutorial is not just learning that these tools exist and what they do, but hearing when and how they are used by a performance engineer to solve real world problems — important context that is typically not included in the standard documentation.
Simplifying debugging for multi-core Linux devices and low-power Linux clusters Rogue Wave Software
Debugging scalable hybrid and accelerated applications on the Cray XC30, CS300 with our multi-threaded debugger TotalView. Faster fault isolation, improved memory optimization, and dynamic visualization for your high performance computing apps, a solution provided by Rogue Wave Software
“Performance testing is the process by which software is tested to determine the current system performance. This process aims to gather information about current performance, but places no value judgments on the findings".
QuickStart your Sumo Logic service with this exclusive webinar. At these monthly live events you will learn how to capitalize on critical capabilities that can amplify your log analytics and monitoring experience while providing you with meaningful business and IT insights.
Video: https://www.sumologic.com/online-training/#QuickStart
Network visibility and control using industry standard sFlow telemetrypphaal
• Find out about the sFlow instrumentation built into commodity data center network and server infrastructure.
• Understand how sFlow fits into the broader ecosystem of NetFlow, IPFIX, SNMP and DevOps monitoring technologies.
• Case studies demonstrate how sFlow telemetry combined with automation can lower costs, increase performance, and improve security of cloud infrastructure and applications.
Webinar here: https://youtu.be/MEmFFwNmLxg
Sumo Logic "How To" Webinar - Monitoring you Data: Alerting on Outliers
Dashboards are fantastic, but how do I get notified of critical events? This webinar will cover how to create alerts that will allow your team to effectively monitor business-critical events. Alert channels include email or webhooks into Slack, PagerDuty, DataDog, ServiceNow, or any other webhook you want to develop. What about running custom scripts triggered from alerts? Let's do it.
Sumo Logic QuickStart Webinar - Dec 2016Sumo Logic
QuickStart your Sumo Logic service with this exclusive webinar. At these monthly live events you will learn how to capitalize on critical capabilities that can amplify your log analytics and monitoring experience while providing you with meaningful business and IT insights.
Video: https://www.sumologic.com/online-training/#start
Check out our coverage of the final day of OpenPOWER Summit Europe, where our members from around the world discussed how participating in an open ecosystem has driven their success.
See the latest in acceleration, deep and machine learning, and more by clicking thru our curated experience of International Supercomputing 2016. Through an OpenPOWER lens, we show you the best news and conversations that took place at ISC June 20-23, 2016 in Frankfurt, Germany.
In our recap of the final day of International Supercomputing 2016, we explore how OpenPOWER members are working in the HPC industry and continue the conversation around cognitive computing, deep learning, and machine learning.
Our OpenPOWER recap of Day 2 featured challenges within the HPC industry, and how the OpenPOWER Foundation's ecosystem of innovators are rising to solve them.
At Day 1 of OpenPOWER Summit at NVIDIA GTC, we were blown away by the buzz around GPUs. Take a look at our recap from the latest from GTC and OpenPOWER members.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
4. Performance Engineering using TAU
• How much time is spent in each application routine and outer loops? Within loops, what
is the contribution of each statement? What is the time spent in OpenMP loops?
Kernels on the GPU?
• How many instructions are executed in these code regions?
Floating point, Level 1 and 2 data cache misses, hits, branches taken?
• What is the memory usage of the code? When and where is memory allocated/de-
allocated? Are there any memory leaks? What is the memory footprint of the
application? What is the memory high water mark?
• How much energy does the application use in Joules? What is the peak power usage?
• What are the I/O characteristics of the code? What is the peak read and write
bandwidth of individual calls, total volume?
• How does the application scale? What is the efficiency, runtime breakdown of
performance across different core counts?
5. Instrumentation
• Source instrumentation using a preprocessor
• Add timer start/stop calls in a copy of the source code.
• Use Program Database Toolkit (PDT) for parsing source code.
• Requires recompiling the code using TAU shell scripts (tau_cc.sh, tau_f90.sh)
• Selective instrumentation (filter file) can reduce runtime overhead and narrow
instrumentation focus.
• Compiler-based instrumentation
• Use system compiler to add a special flag to insert hooks at routine entry/exit.
• Requires recompiling using TAU compiler scripts (tau_cc.sh, tau_f90.sh…)
• Runtime preloading of TAU’s Dynamic Shared Object (DSO)
• No need to recompile code! Use tau_exec ./app with options.
6. Profiling and Tracing
• Tracing shows you when the events
take place on a timeline
Profiling Tracing
• Profiling shows you how much
(total) time was spent in each routine
• Profiling and tracing
Profiling shows you how much (total) time was spent in each routine
Tracing shows you when the events take place on a timeline
7. Inclusive vs. Exclusive values
■ Inclusive
■ Information of all sub-elements aggregated into single value
■ Exclusive
■ Information cannot be subdivided further
Inclusive Exclusive
int foo()
{
int a;
a = 1 + 1;
bar();
a = a + 1;
return a;
}
8. Performance Data Measurement
Direct via Probes Indirect via Sampling
• Exact measurement
• Fine-grain control
• Calls inserted into
code
• No code modification
• Minimal effort
• Relies on debug
symbols (-g)
Call START(‘potential’)
// code
Call STOP(‘potential’)
9. Sampling
• Running program is periodically interrupted to take
measurement
• Timer interrupt, OS signal, or HWC overflow
• Service routine examines return-address stack
• Addresses are mapped to routines using symbol table
information
• Statistical inference of program behavior
• Not very detailed information on highly volatile metrics
• Requires long-running applications
• Works with unmodified executables
Time
main foo(0) foo(1) foo(2) int main()
{
int i;
for (i=0; i < 3; i++)
foo(i);
return 0;
}
void foo(int i)
{
if (i > 0)
foo(i – 1);
}
Measurement
t9t7t6t5t4
t1 t2 t3 t8
10. Instrumentation
• Measurement code is inserted such that every
event of interest is captured directly
• Can be done in various ways
• Advantage:
• Much more detailed information
• Disadvantage:
• Processing of source-code / executable
necessary
• Large relative overheads for small functions
Time
Measurement int main()
{
int i;
for (i=0; i < 3; i++)
foo(i);
return 0;
}
void foo(int i)
{
if (i > 0)
foo(i – 1);
}
Time
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12t13 t14
main foo(0) foo(1) foo(2)
Start(“main”);
Stop (“main”);
Start(“foo”);
Stop (“foo”);
11. Using TAU’s Runtime Preloading
Tool: tau_exec
• Preload a wrapper that intercepts the runtime system call and
substitutes with another
• MPI, CUDA, OpenACC, pthread, OpenCL
• OpenMP
• POSIX I/O
• Memory allocation/deallocation routines
• Wrapper library for an external package
• No modification to the binary executable!
• Enable other TAU options (communication matrix, OTF2, event-
based sampling)
• For Python: tau_python replaces python. Similar to tau_exec.
13. Configuration tags for tau_exec
% ./configure –pdt=<dir> -mpi –papi=<dir>; make install
Creates in $TAU:
Makefile.tau-papi-mpi-pdt(Configuration parameters in stub makefile)
shared-papi-mpi-pdt/libTAU.so
% ./configure –pdt=<dir> -mpi; make install creates
Makefile.tau-mpi-pdt
shared-mpi-pdt/libTAU.so
To explicitly choose preloading of shared-<options>/libTAU.so change:
% mpirun -np 256 ./a.out to
% mpirun -np 256 tau_exec –T <comma_separated_options> ./a.out
% mpirun -np 256 tau_exec –T papi,mpi,pdt ./a.out
Preloads $TAU/shared-papi-mpi-pdt/libTAU.so
% mpirun -np 256 tau_exec –T papi ./a.out
Preloads $TAU/shared-papi-mpi-pdt/libTAU.so by matching.
% aprun –n 256 tau_exec –T papi,mpi,pdt –s ./a.out
Does not execute the program. Just displays the library that it will preload if executed without the –s option.
NOTE: -mpi configuration is selected by default. Use –T serial for
Sequential programs.
27. TAU’s Static Analysis System:
Program Database Toolkit (PDT)
Application
/ Library
C / C++
parser
Fortran parser
F77/90/95
C / C++
IL analyzer
Fortran
IL analyzer
Program
Database
Files
IL IL
DUCTAPE
TAU
instrumentor
Automatic source
instrumentation
.
.
.
29. Environment Variable Default Description
TAU_TRACE 0 Setting to 1 turns on tracing
TAU_CALLPATH 0 Setting to 1 turns on callpath profiling
TAU_TRACK_MEMORY_FOOTPRINT 0 Setting to 1 turns on tracking memory usage by sampling periodically the resident set size and high water mark of memory
usage
TAU_TRACK_POWER 0 Tracks power usage by sampling periodically.
TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no callpath or routine information, setting to 1 generates flat profile and
context events have just parent information (e.g., Heap Entry: foo)
TAU_SAMPLING 1 Setting to 1 enables event-based sampling.
TAU_TRACK_SIGNALS 0 Setting to 1 generate debugging callstack info when a program crashes
TAU_COMM_MATRIX 0 Setting to 1 generates communication matrix display using context events
TAU_THROTTLE 1 Setting to 0 turns off throttling. Throttles instrumentation in lightweight routines that are called frequently
TAU_THROTTLE_NUMCALLS 100000 Specifies the number of calls before testing for throttling
TAU_THROTTLE_PERCALL 10 Specifies value in microseconds. Throttle a routine if it is called over 100000 times and takes less than 10 usec of inclusive
time per call
TAU_CALLSITE 0 Setting to 1 enables callsite profiling that shows where an instrumented function was called. Also compatible with tracing.
TAU_PROFILE_FORMAT Profile Setting to “merged” generates a single file. “snapshot” generates xml format
TAU_METRICS TIME Setting to a comma separated list generates other metrics. (e.g.,
ENERGY,TIME,P_VIRTUAL_TIME,PAPI_FP_INS,PAPI_NATIVE_<event>:<subevent>)
Runtime Environment Variables
30. Environment Variable Default Description
TAU_TRACE 0 Setting to 1 turns on tracing
TAU_TRACE_FORMAT Default Setting to “otf2” turns on TAU’s native OTF2 trace generation (configure with –otf=download)
TAU_EBS_UNWIND 0 Setting to 1 turns on unwinding the callstack during sampling (use with tau_exec –ebs or TAU_SAMPLING=1)
TAU_EBS_RESOLUTION line Setting to “function” or “file” changes the sampling resolution to function or file level respectively.
TAU_TRACK_LOAD 0 Setting to 1 tracks system load on the node
TAU_SELECT_FILE Default Setting to a file name, enables selective instrumentation based on exclude/include lists specified in the file.
TAU_OMPT_SUPPORT_LEVEL basic Setting to “full” improves resolution of OMPT TR6 regions on threads 1.. N-1. Also, “lowoverhead” option is available.
TAU_OMPT_RESOLVE_ADDRESS_EAGERLY 0 Setting to 1 is necessary for event based sampling to resolve addresses with OMPT TR6 (-ompt=download-tr6)
Runtime Environment Variables
31. Environment Variable Default Description
TAU_TRACK_MEMORY_LEAKS 0 Tracks allocates that were not de-allocated (needs –optMemDbg or tau_exec –memory)
TAU_EBS_SOURCE TIME Allows using PAPI hardware counters for periodic interrupts for EBS (e.g.,
TAU_EBS_SOURCE=PAPI_TOT_INS when TAU_SAMPLING=1)
TAU_EBS_PERIOD 100000 Specifies the overflow count for interrupts
TAU_MEMDBG_ALLOC_MIN/MAX 0 Byte size minimum and maximum subject to bounds checking (used with TAU_MEMDBG_PROTECT_*)
TAU_MEMDBG_OVERHEAD 0 Specifies the number of bytes for TAU’s memory overhead for memory debugging.
TAU_MEMDBG_PROTECT_BELOW/ABOVE 0 Setting to 1 enables tracking runtime bounds checking below or above the array bounds (requires –
optMemDbg while building or tau_exec –memory)
TAU_MEMDBG_ZERO_MALLOC 0 Setting to 1 enables tracking zero byte allocations as invalid memory allocations.
TAU_MEMDBG_PROTECT_FREE 0 Setting to 1 detects invalid accesses to deallocated memory that should not be referenced until it is
reallocated (requires –optMemDbg or tau_exec –memory)
TAU_MEMDBG_ATTEMPT_CONTINUE 0 Setting to 1 allows TAU to record and continue execution when a memory error occurs at runtime.
TAU_MEMDBG_FILL_GAP Undefined Initial value for gap bytes
TAU_MEMDBG_ALINGMENT Sizeof(int) Byte alignment for memory allocations
TAU_EVENT_THRESHOLD 0.5 Define a threshold value (e.g., .25 is 25%) to trigger marker events for min/max
Runtime Environment Variables (contd.)
34. Support Acknowledgments
• US Department of Energy (DOE)
• ANL
• Office of Science contracts, ECP
• SciDAC, LBL contracts
• LLNL-LANL-SNL ASC/NNSA contract
• Battelle, PNNL and ORNL contract
• Department of Defense (DoD)
• PETTT, HPCMP
• National Science Foundation (NSF)
• SI2-SSI, Glassbox
• NASA
• CEA
• IBM
• Partners:
• The Ohio State University
• ParaTools, Inc.
• University of Tennessee, Knoxville
• T.U. Dresden, GWT
• Jülich Supercomputing Center