SlideShare a Scribd company logo
1 of 39
Download to read offline
Sameer	Shende	
University	of	Oregon	
Academic	Discussion	Group	Workshop	2018	
Saturday,	November	10,	2018,	11:30am	
Nimbix	Headquarters	
800	E	Campbell	Road,	Suite	241	
Richardson,	TX	
https://indico-jsc.fz-juelich.de/event/76/	
	
Performance Evaluation using the TAU
Performance System
Slides	available	at:	http://tau.uoregon.edu/tau_adg18.pdf
TAU Performance System
®
http://tau.uoregon.edu
•  Tuning and Analysis Utilities (20+ year project)
•  Comprehensive performance profiling and tracing
•  Integrated, scalable, flexible, portable
•  Targets all parallel programming/execution paradigms
•  Integrated performance toolkit
•  Instrumentation, measurement, analysis, visualization
•  Widely-ported performance profiling / tracing system
•  Performance data management and data mining
•  Open source (BSD-style license)
•  Integrates with application frameworks
2
TAU Performance System®
Performance Engineering using TAU
•  How much time is spent in each application routine and outer loops? Within loops, what
is the contribution of each statement? What is the time spent in OpenMP loops?
Kernels on the GPU?
•  How many instructions are executed in these code regions?
Floating point, Level 1 and 2 data cache misses, hits, branches taken?
•  What is the memory usage of the code? When and where is memory allocated/de-
allocated? Are there any memory leaks? What is the memory footprint of the
application? What is the memory high water mark?
•  How much energy does the application use in Joules? What is the peak power usage?
•  What are the I/O characteristics of the code? What is the peak read and write
bandwidth of individual calls, total volume?
•  How does the application scale? What is the efficiency, runtime breakdown of
performance across different core counts?
Instrumentation
•  Source	instrumentation	using	a	preprocessor	
•  Add	timer	start/stop	calls	in	a	copy	of	the	source	code.	
•  Use	Program	Database	Toolkit	(PDT)	for	parsing	source	code.	
•  Requires	recompiling	the	code	using	TAU	shell	scripts	(tau_cc.sh,	tau_f90.sh)	
•  Selective	instrumentation	(filter	file)	can	reduce	runtime	overhead	and		narrow	
instrumentation	focus.		
•  Compiler-based	instrumentation	
•  Use	system	compiler	to	add	a	special	flag	to	insert	hooks	at	routine	entry/exit.	
•  Requires	recompiling	using	TAU	compiler	scripts	(tau_cc.sh,	tau_f90.sh…)	
•  Runtime	preloading	of	TAU’s	Dynamic	Shared	Object	(DSO)		
•  No	need	to	recompile	code!	Use	tau_exec	./app		with	options.
Profiling and Tracing
•  Tracing shows you when the events
take place on a timeline
Profiling	 Tracing	
•  Profiling shows you how much
(total) time was spent in each routine
•  Profiling	and	tracing	
Profiling	shows	you	how	much	(total)	time	was	spent	in	each	routine	
Tracing	shows	you	when	the	events	take	place	on	a	timeline
Inclusive vs. Exclusive values
■  Inclusive	
■  Information	of	all	sub-elements	aggregated	into	single	value	
■  Exclusive	
■  Information	cannot	be	subdivided	further	
Inclusive Exclusive
int foo()
{
int a;
a = 1 + 1;
bar();
a = a + 1;
return a;
}
Performance Data Measurement
Direct	via	Probes	 Indirect	via	Sampling	
•  Exact measurement
•  Fine-grain control
•  Calls inserted into
code
•  No code modification
•  Minimal effort
•  Relies on debug
symbols (-g)
Call START(‘potential’)
// code
Call STOP(‘potential’)
Sampling
•  Running	program	is	periodically	interrupted	to	take	
measurement	
•  Timer	interrupt,	OS	signal,	or	HWC	overflow	
•  Service	routine	examines	return-address	stack	
•  Addresses	are	mapped	to	routines	using	symbol	table	
information	
•  Statistical	inference	of	program	behavior	
•  Not	very	detailed	information	on	highly	volatile	metrics	
•  Requires	long-running	applications	
•  Works	with	unmodified	executables	
Time
main foo(0) foo(1) foo(2) int main()
{
int i;
for (i=0; i < 3; i++)
foo(i);
return 0;
}
void foo(int i)
{
if (i > 0)
foo(i – 1);
}
Measurement
t9t7t6t5t4
t1 t2 t3 t8
Instrumentation
•  Measurement	code	is	inserted	such	that	every	
event	of	interest	is	captured	directly	
•  Can	be	done	in	various	ways	
•  Advantage:	
•  Much	more	detailed	information	
•  Disadvantage:	
•  Processing	of	source-code	/	executable	
necessary	
•  Large	relative	overheads	for	small	functions	
Time
Measurement int main()
{
int i;
for (i=0; i < 3; i++)
foo(i);
return 0;
}
void foo(int i)
{
if (i > 0)
foo(i – 1);
}
Time
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12t13 t14
main foo(0) foo(1) foo(2)
Start(“main”);
Stop (“main”);
Start(“foo”);
Stop (“foo”);
Using TAU’s Runtime Preloading
Tool: tau_exec
•  Preload a wrapper that intercepts the runtime system call and
substitutes with another
• MPI, CUDA, OpenACC, pthread, OpenCL
• OpenMP
• POSIX I/O
• Memory allocation/deallocation routines
• Wrapper library for an external package
•  No modification to the binary executable!
•  Enable other TAU options (communication matrix, OTF2, event-
based sampling)
•  For Python: tau_python replaces python. Similar to tau_exec.
TAU Execution Command (tau_exec)
• Uninstrumented execution
•  % ./a.out
• Track GPU operations
•  % tau_exec –cupti ./a.out
•  % tau_exec –cupti -um ./a.out (for Unified Memory)
•  % tau_exec –opencl ./a.out
•  % tau_exec –openacc ./a.out
•  % tau_exec –rocm ./a.out
• Track MPI performance
•  % tau_exec ./a.out
• Track OpenMP, and MPI performance (MPI enabled by default)
•  % tau_exec –T ompt,tr6,mpi –ompt ./a.out
• Track I/O operations
•  % tau_exec –io –T pthread ./a.out
• Use event based sampling (compile with –g)
•  % tau_exec –ebs ./a.out
•  Also –ebs_source=<PAPI_COUNTER> -ebs_period=<overflow_count>
•  Also –ebs_resolution=[file | function | line ] 12
Configuration tags for tau_exec
% ./configure –pdt=<dir> -mpi –papi=<dir>; make install
Creates in $TAU:
Makefile.tau-papi-mpi-pdt(Configuration parameters in stub makefile)
shared-papi-mpi-pdt/libTAU.so
% ./configure –pdt=<dir> -mpi; make install creates
Makefile.tau-mpi-pdt
shared-mpi-pdt/libTAU.so
To explicitly choose preloading of shared-<options>/libTAU.so change:
% mpirun -np 256 ./a.out to
% mpirun -np 256 tau_exec –T <comma_separated_options> ./a.out
% mpirun -np 256 tau_exec –T papi,mpi,pdt ./a.out
Preloads $TAU/shared-papi-mpi-pdt/libTAU.so
% mpirun -np 256 tau_exec –T papi ./a.out
Preloads $TAU/shared-papi-mpi-pdt/libTAU.so by matching.
% aprun –n 256 tau_exec –T papi,mpi,pdt –s ./a.out
Does not execute the program. Just displays the library that it will preload if executed without the –s option.
NOTE: -mpi configuration is selected by default. Use –T serial for
Sequential programs.
ParaProf Profile Browser
% paraprof
ParaProf Profile Browser
ParaProf 3D Window
Tensorflow mnist example with tau_python
ParaProf: Thread Statistics Window
TAU: Tracking Data Transfers
Thread Profile on Thread 341
ParaProf: Thread Statistics Window Thread 0
ParaProf 3D Window for Caffe: Googlenet
ParaProf Manager Window
ParaProf Node Window
Thread Statistics Window: Caffe Googlenet
MPI_T: TAU and MVAPICH2 (OSU) integration
TAU: Autotuning MVAPICH2 MPI_T CVARs
Vampir [TU Dresden, Germany] showing
improvement: Before and After
TAU’s Static Analysis System:
Program Database Toolkit (PDT)
Application
/ Library
C / C++
parser
Fortran parser
F77/90/95
C / C++
IL analyzer
Fortran
IL analyzer
Program
Database
Files
IL IL
DUCTAPE
TAU
instrumentor
Automatic source
instrumentation
.
.
.
tau_instrumentor	
Parsed	
program	
Instrumentation	
specification	file	
Instrumented		
copy	of	source	
TAU source
analyzer
Application		
source	
PDT: automatic source instrumentation
FUN3D: CFD
TAU
Environment	Variable	 Default	 Description	
TAU_TRACE	 0	 Setting	to	1	turns	on	tracing	
TAU_CALLPATH	 0	 Setting	to	1	turns	on	callpath	profiling	
TAU_TRACK_MEMORY_FOOTPRINT	 0	 Setting	to	1	turns	on	tracking	memory	usage	by	sampling	periodically	the	resident	set	size	and	high	water	mark	of	memory	
usage	
TAU_TRACK_POWER	 0	 Tracks	power	usage	by	sampling	periodically.		
TAU_CALLPATH_DEPTH	 2	 Specifies	depth	of	callpath.	Setting	to	0	generates	no	callpath	or	routine	information,	setting	to	1	generates	flat	profile	and	
context	events	have	just	parent	information	(e.g.,	Heap	Entry:	foo)	
TAU_SAMPLING	 1	 Setting	to	1	enables	event-based	sampling.	
TAU_TRACK_SIGNALS	 0	 Setting	to	1	generate	debugging	callstack	info	when	a	program	crashes	
TAU_COMM_MATRIX	 0	 Setting	to	1	generates	communication	matrix	display	using	context	events	
TAU_THROTTLE	 1	 Setting	to	0	turns	off	throttling.	Throttles	instrumentation	in	lightweight	routines	that	are	called	frequently	
TAU_THROTTLE_NUMCALLS	 100000	 Specifies	the	number	of	calls	before	testing	for	throttling	
TAU_THROTTLE_PERCALL	 10	 Specifies	value	in	microseconds.	Throttle	a	routine	if	it	is	called	over	100000	times	and	takes	less	than	10	usec	of	inclusive	
time	per	call	
TAU_CALLSITE	 0	 Setting	to	1	enables	callsite	profiling	that	shows	where	an	instrumented	function	was	called.	Also	compatible	with	tracing.	
TAU_PROFILE_FORMAT	 Profile	 Setting	to	“merged”	generates	a	single	file.	“snapshot”	generates	xml	format	
TAU_METRICS	 TIME	 Setting	to	a	comma	separated	list	generates	other	metrics.	(e.g.,	
ENERGY,TIME,P_VIRTUAL_TIME,PAPI_FP_INS,PAPI_NATIVE_<event>:<subevent>)	
Runtime Environment Variables
Environment	Variable	 Default	 Description	
TAU_TRACE	 0	 Setting	to	1	turns	on	tracing	
TAU_TRACE_FORMAT	 Default	 Setting	to	“otf2”	turns	on	TAU’s	native	OTF2	trace	generation	(configure	with	–otf=download)	
TAU_EBS_UNWIND	 0	 Setting	to	1	turns	on	unwinding	the	callstack	during	sampling	(use	with	tau_exec	–ebs	or	TAU_SAMPLING=1)	
TAU_EBS_RESOLUTION	 line	 Setting	to	“function”	or	“file”	changes	the	sampling	resolution	to	function	or	file	level	respectively.		
TAU_TRACK_LOAD	 0	 Setting	to	1	tracks	system	load	on	the	node	
TAU_SELECT_FILE	 Default	 Setting	to	a	file	name,	enables	selective	instrumentation	based	on	exclude/include	lists	specified	in	the	file.	
TAU_OMPT_SUPPORT_LEVEL	 basic	 Setting	to	“full”	improves	resolution	of	OMPT	TR6	regions	on	threads	1..	N-1.	Also,	“lowoverhead”	option	is	available.		
TAU_OMPT_RESOLVE_ADDRESS_EAGERLY	 0	 Setting	to	1	is	necessary	for	event	based	sampling	to	resolve	addresses	with	OMPT	TR6	(-ompt=download-tr6)	
Runtime Environment Variables
Environment	Variable	 Default	 Description	
TAU_TRACK_MEMORY_LEAKS	 0	 Tracks	allocates	that	were	not	de-allocated	(needs	–optMemDbg	or	tau_exec	–memory)	
TAU_EBS_SOURCE	 TIME	 Allows	using	PAPI	hardware	counters	for	periodic	interrupts	for	EBS	(e.g.,	
TAU_EBS_SOURCE=PAPI_TOT_INS	when	TAU_SAMPLING=1)	
TAU_EBS_PERIOD	 100000	 Specifies	the	overflow	count	for	interrupts	
TAU_MEMDBG_ALLOC_MIN/MAX	 0	 Byte	size	minimum	and	maximum	subject	to	bounds	checking	(used	with	TAU_MEMDBG_PROTECT_*)	
TAU_MEMDBG_OVERHEAD	 0	 Specifies	the	number	of	bytes	for	TAU’s	memory	overhead	for	memory	debugging.		
TAU_MEMDBG_PROTECT_BELOW/ABOVE	 0	 Setting	to	1	enables	tracking	runtime	bounds	checking	below	or	above	the	array	bounds	(requires	–
optMemDbg	while	building	or	tau_exec	–memory)	
TAU_MEMDBG_ZERO_MALLOC	 0	 Setting	to	1	enables	tracking	zero	byte	allocations	as	invalid	memory	allocations.		
TAU_MEMDBG_PROTECT_FREE	 0	 Setting	to	1	detects	invalid	accesses	to	deallocated	memory	that	should	not	be	referenced	until	it	is	
reallocated	(requires	–optMemDbg	or	tau_exec	–memory)	
TAU_MEMDBG_ATTEMPT_CONTINUE	 0	 Setting	to	1	allows	TAU	to	record	and	continue	execution	when	a	memory	error	occurs	at	runtime.		
TAU_MEMDBG_FILL_GAP	 Undefined	 Initial	value	for	gap	bytes	
TAU_MEMDBG_ALINGMENT	 Sizeof(int)	 Byte	alignment	for	memory	allocations	
TAU_EVENT_THRESHOLD	 0.5	 Define	a	threshold	value	(e.g.,	.25	is	25%)	to	trigger	marker	events	for	min/max	
Runtime Environment Variables (contd.)
Download	TAU	from	U.	Oregon	
	
	
http://www.hpclinux.com	[OVA	file]	
http://tau.uoregon.edu		
for	more	information	
Free	download,	open	source,	BSD	license
Performance Research Lab, UO, Eugene
www.uoregon.edu
Support Acknowledgments
•  US Department of Energy (DOE)
•  ANL
•  Office of Science contracts, ECP
•  SciDAC, LBL contracts
•  LLNL-LANL-SNL ASC/NNSA contract
•  Battelle, PNNL and ORNL contract
•  Department of Defense (DoD)
•  PETTT, HPCMP
•  National Science Foundation (NSF)
•  SI2-SSI, Glassbox
•  NASA
•  CEA, France
•  IBM
•  Partners:
• The Ohio State University
• ParaTools, Inc.
• University of Tennessee, Knoxville
• T.U. Dresden, GWT
• Jülich Supercomputing Center

More Related Content

Similar to TAU Performance tool using OpenPOWER

Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 EstimationLawrence Bernstein
 
TAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformTAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformGanesan Narayanasamy
 
Sql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices ISql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices ICarlos Oliveira
 
Containerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor toolContainerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor toolGanesan Narayanasamy
 
Maximizing SAP ABAP Performance
Maximizing SAP ABAP PerformanceMaximizing SAP ABAP Performance
Maximizing SAP ABAP PerformancePeterHBrown
 
Modernization Lessons Learned -Part 2
Modernization Lessons Learned -Part 2Modernization Lessons Learned -Part 2
Modernization Lessons Learned -Part 2Emerson Exchange
 
Sumo Logic quickStart Webinar June 2016
Sumo Logic quickStart Webinar June 2016Sumo Logic quickStart Webinar June 2016
Sumo Logic quickStart Webinar June 2016Sumo Logic
 
Problem-solving and design 1.pptx
Problem-solving and design 1.pptxProblem-solving and design 1.pptx
Problem-solving and design 1.pptxTadiwaMawere
 
Performance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4SPerformance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4SGanesan Narayanasamy
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost ModelOlav Sandstå
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on HadoopDataWorks Summit
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopTony Ng
 
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16AppDynamics
 
Regain Control Thanks To Prometheus
Regain Control Thanks To PrometheusRegain Control Thanks To Prometheus
Regain Control Thanks To PrometheusEtienne Coutaud
 
Sumo Logic Quick Start - Feb 2016
Sumo Logic Quick Start - Feb 2016Sumo Logic Quick Start - Feb 2016
Sumo Logic Quick Start - Feb 2016Sumo Logic
 
Hovitaga OpenSQL Editor - Overview
Hovitaga OpenSQL Editor - OverviewHovitaga OpenSQL Editor - Overview
Hovitaga OpenSQL Editor - OverviewHovitaga Kft.
 
SplunkLive! Advanced Session
SplunkLive! Advanced SessionSplunkLive! Advanced Session
SplunkLive! Advanced SessionSplunk
 
Sumo Logic Quickstart - Jan 2017
Sumo Logic Quickstart - Jan 2017Sumo Logic Quickstart - Jan 2017
Sumo Logic Quickstart - Jan 2017Sumo Logic
 

Similar to TAU Performance tool using OpenPOWER (20)

Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 Estimation
 
TAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformTAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platform
 
Sql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices ISql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices I
 
Containerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor toolContainerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor tool
 
Indexes overview
Indexes overviewIndexes overview
Indexes overview
 
Maximizing SAP ABAP Performance
Maximizing SAP ABAP PerformanceMaximizing SAP ABAP Performance
Maximizing SAP ABAP Performance
 
Modernization Lessons Learned -Part 2
Modernization Lessons Learned -Part 2Modernization Lessons Learned -Part 2
Modernization Lessons Learned -Part 2
 
DB
DBDB
DB
 
Sumo Logic quickStart Webinar June 2016
Sumo Logic quickStart Webinar June 2016Sumo Logic quickStart Webinar June 2016
Sumo Logic quickStart Webinar June 2016
 
Problem-solving and design 1.pptx
Problem-solving and design 1.pptxProblem-solving and design 1.pptx
Problem-solving and design 1.pptx
 
Performance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4SPerformance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4S
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost Model
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on Hadoop
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
 
Regain Control Thanks To Prometheus
Regain Control Thanks To PrometheusRegain Control Thanks To Prometheus
Regain Control Thanks To Prometheus
 
Sumo Logic Quick Start - Feb 2016
Sumo Logic Quick Start - Feb 2016Sumo Logic Quick Start - Feb 2016
Sumo Logic Quick Start - Feb 2016
 
Hovitaga OpenSQL Editor - Overview
Hovitaga OpenSQL Editor - OverviewHovitaga OpenSQL Editor - Overview
Hovitaga OpenSQL Editor - Overview
 
SplunkLive! Advanced Session
SplunkLive! Advanced SessionSplunkLive! Advanced Session
SplunkLive! Advanced Session
 
Sumo Logic Quickstart - Jan 2017
Sumo Logic Quickstart - Jan 2017Sumo Logic Quickstart - Jan 2017
Sumo Logic Quickstart - Jan 2017
 

More from Ganesan Narayanasamy

Chip Design Curriculum development Residency program
Chip Design Curriculum development Residency programChip Design Curriculum development Residency program
Chip Design Curriculum development Residency programGanesan Narayanasamy
 
Basics of Digital Design and Verilog
Basics of Digital Design and VerilogBasics of Digital Design and Verilog
Basics of Digital Design and VerilogGanesan Narayanasamy
 
180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISAGanesan Narayanasamy
 
Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture Ganesan Narayanasamy
 
Deep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systemsDeep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systemsGanesan Narayanasamy
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...Ganesan Narayanasamy
 
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsGanesan Narayanasamy
 
AI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsAI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsGanesan Narayanasamy
 
AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems Ganesan Narayanasamy
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Ganesan Narayanasamy
 

More from Ganesan Narayanasamy (20)

Chip Design Curriculum development Residency program
Chip Design Curriculum development Residency programChip Design Curriculum development Residency program
Chip Design Curriculum development Residency program
 
Basics of Digital Design and Verilog
Basics of Digital Design and VerilogBasics of Digital Design and Verilog
Basics of Digital Design and Verilog
 
180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA
 
Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture
 
OpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT RoorkeeOpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT Roorkee
 
Deep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systemsDeep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systems
 
IBM BOA for POWER
IBM BOA for POWER IBM BOA for POWER
IBM BOA for POWER
 
OpenPOWER System Marconi100
OpenPOWER System Marconi100OpenPOWER System Marconi100
OpenPOWER System Marconi100
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
 
Deeplearningusingcloudpakfordata
DeeplearningusingcloudpakfordataDeeplearningusingcloudpakfordata
Deeplearningusingcloudpakfordata
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
 
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
 
AI in healthcare - Use Cases
AI in healthcare - Use Cases AI in healthcare - Use Cases
AI in healthcare - Use Cases
 
AI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsAI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systems
 
AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems
 
Poster from NUS
Poster from NUSPoster from NUS
Poster from NUS
 
SAP HANA on POWER9 systems
SAP HANA on POWER9 systemsSAP HANA on POWER9 systems
SAP HANA on POWER9 systems
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
 
AI in the enterprise
AI in the enterprise AI in the enterprise
AI in the enterprise
 

Recently uploaded

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 

Recently uploaded (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 

TAU Performance tool using OpenPOWER

  • 2. TAU Performance System ® http://tau.uoregon.edu •  Tuning and Analysis Utilities (20+ year project) •  Comprehensive performance profiling and tracing •  Integrated, scalable, flexible, portable •  Targets all parallel programming/execution paradigms •  Integrated performance toolkit •  Instrumentation, measurement, analysis, visualization •  Widely-ported performance profiling / tracing system •  Performance data management and data mining •  Open source (BSD-style license) •  Integrates with application frameworks 2
  • 4. Performance Engineering using TAU •  How much time is spent in each application routine and outer loops? Within loops, what is the contribution of each statement? What is the time spent in OpenMP loops? Kernels on the GPU? •  How many instructions are executed in these code regions? Floating point, Level 1 and 2 data cache misses, hits, branches taken? •  What is the memory usage of the code? When and where is memory allocated/de- allocated? Are there any memory leaks? What is the memory footprint of the application? What is the memory high water mark? •  How much energy does the application use in Joules? What is the peak power usage? •  What are the I/O characteristics of the code? What is the peak read and write bandwidth of individual calls, total volume? •  How does the application scale? What is the efficiency, runtime breakdown of performance across different core counts?
  • 5. Instrumentation •  Source instrumentation using a preprocessor •  Add timer start/stop calls in a copy of the source code. •  Use Program Database Toolkit (PDT) for parsing source code. •  Requires recompiling the code using TAU shell scripts (tau_cc.sh, tau_f90.sh) •  Selective instrumentation (filter file) can reduce runtime overhead and narrow instrumentation focus. •  Compiler-based instrumentation •  Use system compiler to add a special flag to insert hooks at routine entry/exit. •  Requires recompiling using TAU compiler scripts (tau_cc.sh, tau_f90.sh…) •  Runtime preloading of TAU’s Dynamic Shared Object (DSO) •  No need to recompile code! Use tau_exec ./app with options.
  • 6. Profiling and Tracing •  Tracing shows you when the events take place on a timeline Profiling Tracing •  Profiling shows you how much (total) time was spent in each routine •  Profiling and tracing Profiling shows you how much (total) time was spent in each routine Tracing shows you when the events take place on a timeline
  • 7. Inclusive vs. Exclusive values ■  Inclusive ■  Information of all sub-elements aggregated into single value ■  Exclusive ■  Information cannot be subdivided further Inclusive Exclusive int foo() { int a; a = 1 + 1; bar(); a = a + 1; return a; }
  • 8. Performance Data Measurement Direct via Probes Indirect via Sampling •  Exact measurement •  Fine-grain control •  Calls inserted into code •  No code modification •  Minimal effort •  Relies on debug symbols (-g) Call START(‘potential’) // code Call STOP(‘potential’)
  • 9. Sampling •  Running program is periodically interrupted to take measurement •  Timer interrupt, OS signal, or HWC overflow •  Service routine examines return-address stack •  Addresses are mapped to routines using symbol table information •  Statistical inference of program behavior •  Not very detailed information on highly volatile metrics •  Requires long-running applications •  Works with unmodified executables Time main foo(0) foo(1) foo(2) int main() { int i; for (i=0; i < 3; i++) foo(i); return 0; } void foo(int i) { if (i > 0) foo(i – 1); } Measurement t9t7t6t5t4 t1 t2 t3 t8
  • 10. Instrumentation •  Measurement code is inserted such that every event of interest is captured directly •  Can be done in various ways •  Advantage: •  Much more detailed information •  Disadvantage: •  Processing of source-code / executable necessary •  Large relative overheads for small functions Time Measurement int main() { int i; for (i=0; i < 3; i++) foo(i); return 0; } void foo(int i) { if (i > 0) foo(i – 1); } Time t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12t13 t14 main foo(0) foo(1) foo(2) Start(“main”); Stop (“main”); Start(“foo”); Stop (“foo”);
  • 11. Using TAU’s Runtime Preloading Tool: tau_exec •  Preload a wrapper that intercepts the runtime system call and substitutes with another • MPI, CUDA, OpenACC, pthread, OpenCL • OpenMP • POSIX I/O • Memory allocation/deallocation routines • Wrapper library for an external package •  No modification to the binary executable! •  Enable other TAU options (communication matrix, OTF2, event- based sampling) •  For Python: tau_python replaces python. Similar to tau_exec.
  • 12. TAU Execution Command (tau_exec) • Uninstrumented execution •  % ./a.out • Track GPU operations •  % tau_exec –cupti ./a.out •  % tau_exec –cupti -um ./a.out (for Unified Memory) •  % tau_exec –opencl ./a.out •  % tau_exec –openacc ./a.out •  % tau_exec –rocm ./a.out • Track MPI performance •  % tau_exec ./a.out • Track OpenMP, and MPI performance (MPI enabled by default) •  % tau_exec –T ompt,tr6,mpi –ompt ./a.out • Track I/O operations •  % tau_exec –io –T pthread ./a.out • Use event based sampling (compile with –g) •  % tau_exec –ebs ./a.out •  Also –ebs_source=<PAPI_COUNTER> -ebs_period=<overflow_count> •  Also –ebs_resolution=[file | function | line ] 12
  • 13. Configuration tags for tau_exec % ./configure –pdt=<dir> -mpi –papi=<dir>; make install Creates in $TAU: Makefile.tau-papi-mpi-pdt(Configuration parameters in stub makefile) shared-papi-mpi-pdt/libTAU.so % ./configure –pdt=<dir> -mpi; make install creates Makefile.tau-mpi-pdt shared-mpi-pdt/libTAU.so To explicitly choose preloading of shared-<options>/libTAU.so change: % mpirun -np 256 ./a.out to % mpirun -np 256 tau_exec –T <comma_separated_options> ./a.out % mpirun -np 256 tau_exec –T papi,mpi,pdt ./a.out Preloads $TAU/shared-papi-mpi-pdt/libTAU.so % mpirun -np 256 tau_exec –T papi ./a.out Preloads $TAU/shared-papi-mpi-pdt/libTAU.so by matching. % aprun –n 256 tau_exec –T papi,mpi,pdt –s ./a.out Does not execute the program. Just displays the library that it will preload if executed without the –s option. NOTE: -mpi configuration is selected by default. Use –T serial for Sequential programs.
  • 17. Tensorflow mnist example with tau_python
  • 18.
  • 20. TAU: Tracking Data Transfers
  • 21. Thread Profile on Thread 341
  • 22. ParaProf: Thread Statistics Window Thread 0
  • 23. ParaProf 3D Window for Caffe: Googlenet
  • 26. Thread Statistics Window: Caffe Googlenet
  • 27. MPI_T: TAU and MVAPICH2 (OSU) integration
  • 29. Vampir [TU Dresden, Germany] showing improvement: Before and After
  • 30. TAU’s Static Analysis System: Program Database Toolkit (PDT) Application / Library C / C++ parser Fortran parser F77/90/95 C / C++ IL analyzer Fortran IL analyzer Program Database Files IL IL DUCTAPE TAU instrumentor Automatic source instrumentation . . .
  • 33. TAU
  • 34. Environment Variable Default Description TAU_TRACE 0 Setting to 1 turns on tracing TAU_CALLPATH 0 Setting to 1 turns on callpath profiling TAU_TRACK_MEMORY_FOOTPRINT 0 Setting to 1 turns on tracking memory usage by sampling periodically the resident set size and high water mark of memory usage TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no callpath or routine information, setting to 1 generates flat profile and context events have just parent information (e.g., Heap Entry: foo) TAU_SAMPLING 1 Setting to 1 enables event-based sampling. TAU_TRACK_SIGNALS 0 Setting to 1 generate debugging callstack info when a program crashes TAU_COMM_MATRIX 0 Setting to 1 generates communication matrix display using context events TAU_THROTTLE 1 Setting to 0 turns off throttling. Throttles instrumentation in lightweight routines that are called frequently TAU_THROTTLE_NUMCALLS 100000 Specifies the number of calls before testing for throttling TAU_THROTTLE_PERCALL 10 Specifies value in microseconds. Throttle a routine if it is called over 100000 times and takes less than 10 usec of inclusive time per call TAU_CALLSITE 0 Setting to 1 enables callsite profiling that shows where an instrumented function was called. Also compatible with tracing. TAU_PROFILE_FORMAT Profile Setting to “merged” generates a single file. “snapshot” generates xml format TAU_METRICS TIME Setting to a comma separated list generates other metrics. (e.g., ENERGY,TIME,P_VIRTUAL_TIME,PAPI_FP_INS,PAPI_NATIVE_<event>:<subevent>) Runtime Environment Variables
  • 35. Environment Variable Default Description TAU_TRACE 0 Setting to 1 turns on tracing TAU_TRACE_FORMAT Default Setting to “otf2” turns on TAU’s native OTF2 trace generation (configure with –otf=download) TAU_EBS_UNWIND 0 Setting to 1 turns on unwinding the callstack during sampling (use with tau_exec –ebs or TAU_SAMPLING=1) TAU_EBS_RESOLUTION line Setting to “function” or “file” changes the sampling resolution to function or file level respectively. TAU_TRACK_LOAD 0 Setting to 1 tracks system load on the node TAU_SELECT_FILE Default Setting to a file name, enables selective instrumentation based on exclude/include lists specified in the file. TAU_OMPT_SUPPORT_LEVEL basic Setting to “full” improves resolution of OMPT TR6 regions on threads 1.. N-1. Also, “lowoverhead” option is available. TAU_OMPT_RESOLVE_ADDRESS_EAGERLY 0 Setting to 1 is necessary for event based sampling to resolve addresses with OMPT TR6 (-ompt=download-tr6) Runtime Environment Variables
  • 36. Environment Variable Default Description TAU_TRACK_MEMORY_LEAKS 0 Tracks allocates that were not de-allocated (needs –optMemDbg or tau_exec –memory) TAU_EBS_SOURCE TIME Allows using PAPI hardware counters for periodic interrupts for EBS (e.g., TAU_EBS_SOURCE=PAPI_TOT_INS when TAU_SAMPLING=1) TAU_EBS_PERIOD 100000 Specifies the overflow count for interrupts TAU_MEMDBG_ALLOC_MIN/MAX 0 Byte size minimum and maximum subject to bounds checking (used with TAU_MEMDBG_PROTECT_*) TAU_MEMDBG_OVERHEAD 0 Specifies the number of bytes for TAU’s memory overhead for memory debugging. TAU_MEMDBG_PROTECT_BELOW/ABOVE 0 Setting to 1 enables tracking runtime bounds checking below or above the array bounds (requires – optMemDbg while building or tau_exec –memory) TAU_MEMDBG_ZERO_MALLOC 0 Setting to 1 enables tracking zero byte allocations as invalid memory allocations. TAU_MEMDBG_PROTECT_FREE 0 Setting to 1 detects invalid accesses to deallocated memory that should not be referenced until it is reallocated (requires –optMemDbg or tau_exec –memory) TAU_MEMDBG_ATTEMPT_CONTINUE 0 Setting to 1 allows TAU to record and continue execution when a memory error occurs at runtime. TAU_MEMDBG_FILL_GAP Undefined Initial value for gap bytes TAU_MEMDBG_ALINGMENT Sizeof(int) Byte alignment for memory allocations TAU_EVENT_THRESHOLD 0.5 Define a threshold value (e.g., .25 is 25%) to trigger marker events for min/max Runtime Environment Variables (contd.)
  • 38. Performance Research Lab, UO, Eugene www.uoregon.edu
  • 39. Support Acknowledgments •  US Department of Energy (DOE) •  ANL •  Office of Science contracts, ECP •  SciDAC, LBL contracts •  LLNL-LANL-SNL ASC/NNSA contract •  Battelle, PNNL and ORNL contract •  Department of Defense (DoD) •  PETTT, HPCMP •  National Science Foundation (NSF) •  SI2-SSI, Glassbox •  NASA •  CEA, France •  IBM •  Partners: • The Ohio State University • ParaTools, Inc. • University of Tennessee, Knoxville • T.U. Dresden, GWT • Jülich Supercomputing Center