TotalView - Memory debugging in parallel and distributed applications
Memory Debugging in ParallelHighlights: and Distributed Applications Challenges with memory debugging Author: Chris Gottbrath, Product Manager, TotalView Identifying and resolving Technologies memory bugs in parallel and distributed applications Abstract: Memory bugs, essentially a mistake in the Using the Heap Interposition management of heap memory, can occur in any Agent (HIA) to analyze memory program that is being written, enhanced, or problems maintained. A memory bug can be caused by a number of factors, including failure to check for error conditions; relying on nonstandard behavior; memory leaks including failure to free memory; dangling references such as failure to clear pointers; array bounds violations; and memory corruption such as writing to memory not owned/over running array bounds. These can sometimes cause programs to crash or generate incorrect “random” results, or more frustratingly, they may lurk in the code base for long periods of time — only to manifest themselves at the worst possible time. Memory problems are difficult to track down with conventional tools on even a simple desktop architecture, and are much more vexing then encountered on a distributed parallel architecture. This paper will review the challenges of memory debugging, with special attention paid to the challenges of parallel development and debugging, introduce a tool that helps developers identify and resolve memory bugs in parallel and distributed applications, highlight its major features and provide usage tips. Published: June 2008
The Challenges of Memory Debugging in Parallel important to note that analogous errors can also be madeDevelopment with memory that is allocated using the C++ newThe fact that memory bugs can be introduced at any time statement and the FORTRAN 90 allocate statement.makes memory debugging a challenging task especially incodes that are written collaboratively or that are being Malloc Errorsmaintained over a long period of time, where assumptions Malloc errors occur when a program passes an invalidabout memory management can either change or not be value to one of the operations in the C Heap Manager API.communicated clearly. They can also lurk in a code base This could potentially happen if the value of a pointer (thefor long periods of time since they are often not address of a block) is copied into another pointer, and thenimmediately fatal and can suddenly become an issue at a later time, both pointers are passed to free(). In thiswhen a program is ported to a new architecture or scaled case, the second free() is incorrect because the specifiedup to a larger problem size, or when code is adapted and pointer does not correspond to an allocated block. Thereused from one program to another. behavior of the program after such an operation is undefined.Memory bugs often manifest themselves in several ways:either as a crash that always happens, a crash that Dangling Pointerssometimes happens (instability), or just as incorrect A pointer can be said to be dangling when it referencesresults. Furthermore, they are difficult to track down with memory that has already been deallocated. Any memorycommonly used development tools and techniques (such access (either a read or a write) through a dangling pointeras printf and traditional source code debuggers), which are can lead to undefined behavior. Programs with danglingnot specifically designed to solve memory problems. pointer bugs may sometimes appear to function without any obvious errors — even for significant amounts of timeAdding parallelism to the mix makes things even harder — if the memory that the dangling pointer points tobecause parallel programs are often squeezed between happens not to be recycled into a new allocation during thetwo effects, meaning that these programs have to be very time that it is accessed.careful with memory. Parallel programs are also written insituations where the problem set is “large,” so the program Memory Bounds Violationsnaturally ends up loading a very significant amount of data Individual memory allocations that are returned by malloc()and using a lot of memory. However, special purpose high- represent discrete blocks of memory with defined sizes.performance computing (HPC) systems often have less Any access to memory immediately before the lowestmemory per node than one might ideally desire, as address in the block or immediately after the highestmemory is expensive. address in the block results in undefined behavior. Read-Before-Write ErrorsClassifying Memory Errors Reading memory before it has been initialized is aPrograms typically make use of several different common error. Some languages assign default values tocategories of memory that are managed in different ways. uninitialized global memory, and many compilers canThese include stack memory, heap memory, shared identify when local variables are read before beingmemory, thread private memory and static or global initialized. What is more difficult and generally can only bememory. However, programmers are required to pay done at random is detecting when memory accessedspecial attention to memory that is allocated out of the through a pointer is read before being initialized. Dynamicheap memory. This is because the management of heap memory is particularly affected, since this is alwaysmemory is done explicitly in the program rather than accessed through a pointer, and in most cases, theimplicitly at compile or run time. content of memory obtained from the memory manager is undefined.There are a number of ways that a program can fail tomake proper use of dynamically allocated heap memory. Itis useful to develop a simple categorization of these Detecting Memory Leaksmistakes for discussion; in this paper, they will be Leaks occur when a program finishes using a block ofdescribed in terms of the C malloc() API. However, it is memory and discards all references to the block, but fails
to call free() to release it back to the heap manager for within one GUI, and the ability to do script-basedreuse. The result is that the program is neither able to debugging to use batch queue environments, make it well-make use of the memory nor reallocate it for a new suited for debugging these parallel and distributedpurpose. applications.The impact of leaks depends on the nature of the MemoryScape Architectureapplication. In some cases the effects are very minor; in MemoryScape accomplishes memory debugging onothers, where the rate of leakage is high enough or the parallel and distributed applications through the modifiedruntime of the program is long enough, leaks can use of a technique called interposition. MemoryScapesignificantly change the memory behavior and the provides a library, called the Heap Interposition Agentperformance characteristics of the program. For long- (HIA), that is inserted between the user’s application coderunning applications or those where memory is limited, and the malloc() subsystem. This library defines functionseven a small leakage rate can have a very serious for each of the memory allocation API functions. It is thesecumulative and adverse effect. This somewhat functions that are initially called by the program wheneverparadoxically takes leaks all that much more annoying, it allocates, reallocates, or frees a block of memory.since they often linger in otherwise well-understood codes.It can be quite challenging to manage dynamic memory incomplex applications to ensure that allocations arereleased exactly once so that malloc and leak errors donot occur.Leak detection can be done at any point in programexecution. As discussed, leaks occur when the programceases using a block of memory without calling free. It ishard to define “ceasing to use” but an advanced memorydebugger is able to execute leak detection by looking tosee if the program retains a reference to specific memorylocations.The MemoryScape Debugger Figure.1 MemoryScape Heap Interposition Agent (HIA)The MemoryScape memory debugger is an easy-to-use Architecture. The HIA sits between the application and thetool for developers. It has a lightweight architecture that memory allocation layer in glibc.requires no recompilation and has modest impact on theruntime performance of the program. The interface is The interposition technique used by MemoryScape wasdesigned around the concept of an inductive user chosen in part because it provides for lightweight memoryinterface, which guides the user through the task of debugging. Low overheads are an important factor if thememory debugging and provides easy-to-understand performance of a program is not to suffer because of thegraphical displays, powerful analysis tools, and features to presence of the HIA. In most cases, the runtime per-support collaboration (making it easy to report a lurking formance of a program being debugged with the HIAmemory bug to the library vendor, scientific collaborator, or engaged will be similar to that where the HIA is absent.colleague who wrote the code in question). This is absolutely critical for high-performance computing applications, where a heavyweight approach that signi-MemoryScape is designed to be used with parallel and ficantly slowed the target program might very well makemulti-process target applications, providing both detailed the runtime of programs exceed the patience of develop-information about individual processes, as well as high ers, administrators and job schedulers.level memory usage statistics across all of the processesthat make up a large parallel application. MemoryScape’s Interposition differs from simply replacing the malloc()specialized features, including support for launching and library with a debug malloc in that the interposition libraryautomatically attaching to all of the processes of a parallel does not actually fulfill any of the operations itself — itjob, the ability to memory debug many processes from arranges for the program’s malloc() API function calls to be
forwarded to the underlying heap manager that would memory statistics window that provides overall memoryhave been called in the absence of the HIA. The effect of usage statistics in a number of graphical forms (line, barinterposing with the HIA is that the program behaves in the and pie charts) for one, all, or an arbitrary subset of thesame way that it would without the HIA — except that the processes that make up the debugging session. The userHIA is able to intercept all of the memory calls and perform may drive the program to a specific breakpoint or barrier,bookkeeping and “sanity checks” before and after the or simply halt all the processes at an arbitrary point inunderlying function is called. execution.The HIA library’s bookkeeping builds up and maintains a The set of processes that the user wishes to see statisticalrecord of all of the active allocations on the heap as the information about may be selected, with the type of viewprogram runs. For each allocation in the heap, it records that the user wants, by clicking “generate view.” Thenot just the position and size of the block, but also a full generated view represents the state of the program at thatfunction call stack representing what the program was point in time. The user may use the debugger processdoing when the block was allocated. The “sanity checks” controls to drive the program to a new point in executionthat the HIA performs are the kinds of things that allow the and then update the view to look for changes. If anyHIA to detect malloc() errors such as freeing the same processes look out of line, the user will likely want to lookblock of memory twice or trying to reallocate a pointer that more closely at the detailed status of the heap memory.points to a stack address. Using MemoryScape to Look at Heap StatusDepending on how it has been configured, the HIA can MemoryScape provides a wide range of heap statusalso detect whether some bounds errors have occurred. reports, the most popular of which is the heap graphicalThe information that the HIA collects is used by the display. At any point where a process has been stopped, aMemoryScape memory debugger to provide the user with user can obtain a graphical view of the heap. This isan accurate picture of the state of the heap. obtained by selecting the heap status tab, selecting one or more processes, choosing the graphical view and clickingMemoryScape Parallel Architecture “generate view.”MemoryScape uses a behind-the-scenes, distributedparallel architecture to manage runtime interaction with theuser’s parallel program. MemoryScape starts lightweightdebugging agent processes (called tvdsvr processes),which run on the nodes of the cluster where the user’scode is executing. These tvdsvr processes are eachresponsible for the lowlevel interactions with the individuallocal processes and the HIA module that is loaded into theprocess that is being debugged. The tvdsvr processescommunicate directly with the MemoryScape front-endprocess, using their own optimized protocol, which in mostcluster configurations is layered on top of TCP/IP.MemoryScape FeaturesUsing MemoryScape to Compare Memory StatisticsMany parallel and distributed applications have known orexpected behaviors in terms of memory usage. They may Figure 2. MemoryScape Graphical Interface provides anbe structured so that all of the nodes should allocate the interactive view of the heap. Colors indicate the status ofsame amount of memory, or they may be structured so memory allocations.that memory usage should depend in some way on theMPI COMMWORLD rank of the process. If such a pattern The resulting display paints a picture of the heap memoryis expected or if the user wishes to simply examine the set in the selected process. Each current heap memoryof processes to look for patterns, MemoryScape features a allocation is represented by a green line extending across
bounds of heap arrays. Developers and scientists can command line version that is specifically designed to becompliment MemoryScape’s heap bounds checking with incorporated into an automatic testing framework.compiler-generated bounds checking code for arrays that Development teams that use MemoryScape in this wayare allocated automatically on the heap or in global can detect, analyze and remove new memory errors asprogram memory. A number of compilers, including the soon as they are introduced in development — before theyIntel™ Fortran Compiler and the open source gfortran have any impact in production.compiler, can generate bounds checking code auto-matically with a compile line option (for example, -check- Future MemoryScape Product Plansbounds for ifort and -fbounds-check for gfortran). Areas for future development of MemoryScape include improved integration of the product into the TotalViewFor developers with more advanced memory debugging Workbench Manager application. The Workbench providesneeds, the TotalView Debugger provides a way of a site for sharing configuration and session data betweendebugging memory problems that allows the user to the different development tools that a developer may need.examine memory information within the source code The Workbench provides an extensible base from whichcontext of the program. When doing memory debugging users can access recent debugging, memory debuggingusing the TotalView source code debugger, the user can and performance analysis sessions. Because integratedexamine data structures that might contain pointers into development environments (IDEs) are popular but notthe heap memory within the program. When memory universally embraced, the Workbench can be used fromdebugging is enabled, these pointers are annotated with within an IDE but does not require it. Future versions of theinformation about the status of the memory block being MemoryScape product will more than likely share statepointed to. and information about programs, input values and parameters, memory data files, etc. with the Workbench.One advanced technique available to users who are usingthe TotalView source code debugger in conjunction with Other areas of future MemoryScape development includethe memory debugger involves the use of watchpoints. improved support for analyzing the historical usage of memory within the application so that developers canWatchpoints are a debugger feature that allows the generate an accurate understanding of the memory usageprocess to be stopped at the moment a given block of behavior of the program. Perhaps the most frequentlymemory is written to. When a pointer is writing “wildly” requested area of enhancement is in regards to providingacross memory (into space that is completely unrelated to developers with additional options for detecting andwhere the pointer is supposed to be writing), it can be very reporting memory reads and writes beyond the extent ofhard to pin down. heap allocations. TotalView Technologies is actively investigating possible enhancements that might allowGuard blocks can be used together with watchpoints to developers to trade off runtime performance for moretrack down this exceptionally subtle type of error. The detailed information in this area.troubleshooting takes two passes. On the first passthrough the program, guard blocks are used to identify a Conclusionspecific block of memory that is erroneously written to. Memory bugs can occur in any program that is beingThen, on the second pass through the program, a watch- written, enhanced or maintained. These types of bugs arepoint is set at that precise address. The watchpoint should often a source of great frustration for developers becausetrigger twice: once when the block of memory is painted by they can be introduced at any time and are caused by athe memory debugger and the second time when the block number of different factors. They can also lurk in a codeof memory is overwritten by the wild pointer. base for long periods of time and tend to manifest in several ways.While most users will want to use MemoryScape inter-actively as outlined above, ongoing development This makes memory debugging a challenging task,introduces the possibility that new memory bugs may be especially in parallel and distributed programs that includeintroduced at any point. Development teams are a significant amount of data and use a lot of memory.encouraged to add heap memory tests to their ongoing Commonly used development tools and techniques are nottesting strategy. MemoryScape includes a non-interactive specifically designed to solve memory problems and can
make the process of finding and fixing memory bugs an MemoryScape’s specialized features, including the abilityeven more complex process. to compare memory statistics, look at heap status and detect memory leaks, make it uniquely well-suited forMemoryScape is an easy-to-use memory debugging tool debugging these parallel and distributed applications.that helps developers identify and resolve memory bugs.To learn more, contact us at firstname.lastname@example.org , or visit us at www.totalviewtech.com.This is a preliminary document and may be changed prior to final commercial release of the technology described herein. This White Paper is forinformational purposes only. TotalView Technologies shall not be liable for technical or editorial errors or omissions contained herein.TotalView Technologies makes no warranties, express, implied or statutory, as to the information contained herein. No part of this document may bereproduced, stored or transmitted in any form or by any means without the express written permission of TotalView Technologies, LLC.TotalView Technologies may have trademarks, patents, copyrights, or other intellectual property rights covering subject matter in this document. The deliveryof this document does not give the recipient any license to these trademarks, patents, copyrights, or other intellectual property.Copyright 2008 TotalView Technologies, LLC. All rights reserved.All other trademarks are property of their respective owners.