.NET Memory management is an impressively complex process, and most of the time it works pretty
well. However, it’s not flawless, and neither are we developers, so memory management problems are
still something that a skilled developer should be prepared for. And while it’s possible to have useful
information about .NET memory management and write better code without fully understanding the
black box inside the framework, there are a few common misconceptions which need to be dispelled
before you can really get started:
1) A garbage collector collects garbage
2) Doing lots of gen0 collections is bad
3) Performance counters are great for understanding what is happening
4) .NET doesn’t leak memory
5) All objects are treated the same

Published in: Technology
THE TOP 5 .NET MEMORY MANAGEMENT MISCONCEPTIONS

Misconception #1: A Garbage Collector collects garbage

The run-time system has a notion of objects which it thinks it’s going to touch during the rest of its execution, and these are called the “live”, or reachable, objects. Conversely, any object which isn’t live can be regarded as “dead”, and obviously we’d like to be able to reuse the memory resources that these dead objects are holding in order to make our program run more efficiently. So it is perhaps unintuitive that the focus of the .NET Garbage Collector (the GC, for short) is actually on the non-garbage: those so-called live objects.

One of the essential ideas behind the GC strategies that most people implement is that most objects die young. If you analyze a lot of programs, you find that, typically, a lot of them generate temporary objects while they’re doing some calculation, and then produce some other object to represent the results of that calculation (in some fashion). A lot of these young objects are therefore temporary, and are going to die quite quickly, so you want to design your GC to collect the dead items without having to process them all individually.
Ideally, you’d like to only walk across the live objects, do something with those to keep them safe, and then get rid of all the objects which you now know are dead without going through them all one by one.

And this is exactly what the .NET GC algorithm does. It is designed to collect dead items without processing them individually, and to do so with minimal disruption to your system as a whole. This latter consideration is what gave rise to the generational model employed by the .NET GC, which I’ll mention again shortly. For now I’ll just say that there are three generations, labeled Gen0, Gen1 and Gen2, and that new objects are allocated to Gen0, which we’re going to focus on as we take a look at a simple example of how the GC works:

A Simple Mutator:

We’re going to try and illustrate what I’ve just explained using a simple C# program; a Mutator, as it’s called. This program makes an instance of a collection, which it then assigns into a local variable, and because this collection is assigned to the local variable, it’ll be live. And because this local variable is used throughout the execution of the while loop you can see below, it’ll be live for the rest of the program:
    var collect = new List<A>();
    while (true)
    {
        collect.Add(new A());   // live: referenced by the collection
        new A();                // dead: nothing references it
        new A();                // dead: nothing references it
    }

Listing 1 – A simple mutator

What we’re doing is allocating three instances of a small class we’ve called A. The first instance we’ll put into the collection; it will remain live because it’s referenced by the collection, which is in turn referenced by the local variable. Then the two other instances will be allocated, but those won’t be referenced by anything, so they’ll be dead.

If we consider how this allocation pattern looks when we study Gen0, we’ll see it looks something like this:

Allocation in Generation 0

Figure 1 – A hypothetically empty Generation 0

Here we’re assuming that Gen0 is empty at the point when our function starts running. Now of course that isn’t ever true; when you start the CLR up, the base class libraries are loaded right away, and they allocate a lot of objects on the heap before the point at which your program runs. We’ll ignore those for the moment (for the sake of clarity), and we’ll also ignore the collection object that we assigned to that local variable in our first step.

So let’s look at the while loop, which we know is allocating 3 instances every time it iterates. We’ve coloured the instance that is referenced by the collection black to mark it as a live object, and the two other instances, which are the dead objects, are coloured red.

Figure 2 – The initial allocations by the simple mutator (not to scale)

We’ll obviously have the same pattern in the second iteration: allocate one live object, followed by two dead objects. The third iteration will be the same again. Looking at Figure 3, you can clearly see that, as we go into the fourth iteration, we’ll find we have no more space left in Gen0, so we’ll need to perform a collection.

Figure 3 – Generation 0, filled to capacity
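The loop in Listing 1 never terminates, which makes it awkward to experiment with. The following is a minimal sketch, not part of the original listing, that runs the same allocation pattern a fixed number of times and asks the runtime how many collections touched Gen0 in the meantime. The class names `A` and `BoundedMutator` and the iteration count are assumptions made for illustration; `GC.CollectionCount` is the standard framework call. The number of collections you see depends on the runtime and on how large it has decided to make Gen0, so it may well be zero for short runs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the small class "A" used in Listing 1.
class A { }

static class BoundedMutator
{
    // Runs the Listing 1 allocation pattern a fixed number of times and
    // returns how many collections that included Gen0 happened meanwhile.
    public static int Run(int iterations, List<A> collect)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < iterations; i++)
        {
            collect.Add(new A());   // stays live via the collection
            new A();                // dead as soon as it is allocated
            new A();                // dead as soon as it is allocated
        }
        return GC.CollectionCount(0) - before;
    }

    static void Main()
    {
        var collect = new List<A>();
        int collections = Run(1000000, collect);
        Console.WriteLine("Live objects kept: " + collect.Count);
        Console.WriteLine("Gen0 collections observed: " + collections);
    }
}
```

Only one object in three survives each iteration here, which is the "most objects die young" pattern the GC is designed around.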
No Space? Copy

When dealing with Generations 0, 1 and 2 (i.e. the Small Object Heap) the .NET CLR uses a copying strategy; in this instance, it tries to promote the live objects out of Gen0 and into Gen1. The idea is to find all of the live objects in Gen0 and copy them into some free space within Gen1 (which we’re also assuming is empty, for clarity). Bear in mind that this is just the first step in the collection process, which applies in the same way when the GC is working from Gen1 to Gen2, and is illustrated using the arrows below:

Figure 4 – Copying live objects from Gen0 to Gen1 (or Gen n to Gen n+1)

At this point, the GC needs to go through other objects on the heap which reference our instances (such as our collection object), and fix up the pointers from those objects so that they point to the new place where the referenced objects have been copied to.

This is a slightly tricky step, because at the point where the GC is copying the objects, it needs to also make sure that no threads are actually manipulating those contents. If they were, then it might miss updates that were made to the “old” versions of the objects after it had copied them, or it might actually modify pointers between objects in such a way that it forgets to copy some object forward.

In order to manage this, the .NET runtime brings the threads to what’s known as “safe points”. Essentially, it stops the threads and gives itself an opportunity to redirect these pointers from old objects in Gen0 to the newly created copies in Gen1.

Figure 5 – Updating pointers to reference the newly copied object, one Generation up.

Of course, the cool thing is that once the GC has done that, it can now recycle the whole of Gen0, and can do so without individually scanning the objects that it used to hold. After all, it knows that the live objects have been safely promoted and are correctly referenced, and everything else is dead, and thus irrelevant. So, assuming most objects die young, we’ve only had to process a very small number of objects in order to recycle the whole of Gen0.

Figure 6 – Generations 0 and 1, post collection.

Observations

  • The basic trick behind the .NET Generational GC is that objects are allowed to move (or rather, are copied). This is a great way to get them out of the way so that we can reuse their memory without having to process every object individually. It also means that the amount of time needed to perform a collection is proportional to the number of live objects which the GC has to move, rather than the number of dead objects in memory, which it’s going to ignore anyway.

  • However, as a result of this system, we do have the overhead of needing to get all threads to a safe point, where we can fix up the pointers to reference the location where each object is copied to. This obviously has repercussions on the design of the run-time. For example, you need to have access to data that you get from the JIT and from the program itself, telling you the offsets within the various objects at which you might find pointers, so that you can a) scan them to find the live objects, and b) fix those up at some later time.

  • A term you occasionally hear associated with this promotion policy and its effects is “bump allocation”, which just means that we have the handy ability to allocate things very quickly. If Gen0 starts out completely blank then, when we want to allocate our first object, all we have to do is increment the pointer which is initially pointing to the beginning of the generation by a number of bytes corresponding to the size of that first object. That way, we know that we can immediately place the next object at that newly offset location, and move the pointer along by the size of that object, and so on.
    This gives us that clean, stacked layout which we saw in the earlier figures, where the objects all occur one after the other.

Now it turns out that it’s not quite as easy to do all that as you might think, because there can be multiple threads, and if you want to avoid locking during the allocation of the object and the incrementing of the pointer, you need to do some trickery to ensure that you don’t need to do any thread locking on the fast path (the path you normally take). This is mentioned in more detail in the downloadable webinar that accompanies this discussion, but I won’t go into it here.

One Other Trick

There is another trick that the .NET runtime uses for small objects (i.e. < 85k [1]), and that’s the generational structure which we’ve already encountered. It divides the objects into Gen0 (where it puts the new objects), Gen1 and Gen2 (the latter being where the very old objects go, moving up the generations with each garbage collection they survive). In other words, it assumes that the longer objects survive garbage collections, the longer they are likely to keep surviving. Bearing in mind what I said earlier about how most objects die young, this structure means that the GC can focus its attention on doing GCs of just Gen0 (i.e. just a sub-set of the available memory), which is where we expect to get the greatest return in terms of recycling dead memory.

[1] Objects which are > 85k in size are another matter altogether, and we won’t worry about them for now.

Misconception #2: Doing Lots of Gen0 Collections is Bad

This myth is almost an extension of the material that we’ve already looked at. We’ve seen that the time to do a Gen0 collection is proportional to the amount of live data in that Generation, although there are some fixed overheads for bringing the threads to a safe point. Moving lots of objects around is an expensive thing, but just doing a Gen0 collection is not necessarily an inherently bad thing. Imagine a hypothetical situation whereby Gen0 became full, but all the objects taking up the space were dead. In that situation, no live objects would be moved, and so the actual cost of that collection would be minimal.

The basic answer is that doing lots of Gen0 collections is very probably not a bad thing to do, unless you’re in the situation where all the objects in Gen0 are live, in which case you end up allocating them first to Gen0, and then immediately copying them up to Gen1 (known as double allocation).

Misconception #3: Performance Counters are Accurate

Windows comes with a notion of a Performance Counter; this is just some statistic that gets periodically updated by the system, and you can use tools to look at these values to try and deduce what’s happening inside the system. The .NET framework offers a number of these, which you can use various tools to take a look at in pseudo-realtime.

From the point of view of memory management, using these performance counters can give you a feel for whether your application is behaving like a “typical” application, in which objects are dying young. If you’re curious about what “typical” means, the various bits of Microsoft documentation available online collectively have a reasonably good measure.
For example, it is commonly stated that you should probably expect a ratio of Gen1-to-Gen2 collections of about 10:1.

There are very useful performance counters, which we’ll touch upon later, which are to do with allocation rates. Examples of these include the values of “Bytes in all heaps” (which tells us the total size of all allocated objects), “time spent in GC”, and “allocated bytes per sec”, all of which we can graph through the freely available Perfmon tool.

Essentially, there’s a lot of data you can get to, which is all being maintained by the system to give you, as I mentioned, a sort of pseudo-realtime feel for how your application is behaving. On the face of it, that’s fine, but we’ve listed this as a misconception because there are a couple of problems with the way your data is collected and displayed.

Periodic Measurements

First of all, it’s important to remember that these counters are updated periodically, and in particular the .NET memory ones are only updated when a collection happens. That means that if no collection is happening, then the counter is stuck at its current reading. As a result, things like the average values you see in Perfmon are not really telling you exactly what’s happening inside your application, although they’re admittedly better than nothing.

To demonstrate some of this, I’ve written a simple C# program that has the same basic structure that we saw before: we make a collection object, assign it to a local variable, and we allocate instances of a small class. Each instance of this Program class will take about 12 bytes on x86. However, we constrain the allocation rate, and only allocate one of these objects every millisecond. Naturally, with this rate of accumulation, and given that the capacity of Gen0 is 1 or 2 MB, it’s going to take quite a few seconds before we fill Gen0 up and provoke a collection:

    using System;
    using System.Collections.Generic;
    using System.Threading;

    class Program
    {
        static void Main(string[] args)
        {
            var accumulator = new List<Program>();
            while (true)
            {
                DateTime start = DateTime.Now;
                // Allocate roughly one small object per millisecond for 15 seconds.
                while ((DateTime.Now - start).TotalSeconds < 15)
                {
                    accumulator.Add(new Program());
                    Thread.Sleep(1);
                }
                Console.WriteLine(accumulator.Count);
            }
        }
    }

Listing 2 – A simple C# program allocating objects at a constrained rate

Figure 7 – Apparently spiking allocations.

If you look at such a program running under Perfmon, instead of seeing a constant “allocated bytes per sec” counter (which is what we know our program is actually doing), due to the periodic nature of the measurement driving the counter, it looks as if the allocation rate is just spiking whenever collections happen.
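One way to convince yourself that the allocation rate really is steady is to ask the runtime directly from inside the process, rather than going through the counters. The sketch below assumes a runtime that provides `GC.GetAllocatedBytesForCurrentThread` (it exists on newer .NET versions, not on the CLR 2 era framework discussed here); the class and method names are hypothetical. It allocates the same number of objects in each sample and prints the bytes allocated alongside the number of Gen0 collections seen during that sample.

```csharp
using System;
using System.Collections.Generic;

static class AllocationSampler
{
    // Allocates `count` small objects and returns the bytes the current
    // thread allocated while doing so, as reported by the runtime itself.
    public static long MeasureAllocatedBytes(int count, List<object> keep)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < count; i++)
        {
            keep.Add(new object());
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    static void Main()
    {
        var keep = new List<object>();
        for (int sample = 0; sample < 5; sample++)
        {
            int gen0Before = GC.CollectionCount(0);
            long bytes = MeasureAllocatedBytes(1000, keep);
            int gen0During = GC.CollectionCount(0) - gen0Before;
            Console.WriteLine("sample " + sample + ": " + bytes +
                " bytes allocated, " + gen0During + " Gen0 collections");
        }
    }
}
```

On a typical run, every sample reports allocation even though few or no collections occur, which is exactly the activity a counter updated only at collection time cannot show between collections. The byte figures vary a little between samples because the list’s backing array grows from time to time.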
Figure 8 – Visualizing the varying generation sizes.

It’s also important to remember that the runtime itself is measuring what’s happening. Every time a collection happens, it works out what percentage of the objects survived in order to adapt, choosing optimal sizes for the various Generations, and trying to maximize throughput.

So, if you graph some things, like the various heap sizes, you’ll find you get misleading figures. For example, you can see in Figure 8 that the system decided to enlarge Gen2 by a massive amount, and then chose later to shrink it down again. In short, even though Perfmon is giving us this pseudo-realtime feel, what it shows us is not necessarily exactly how the application itself is behaving.

Getting Lower

In order to really see how your application is behaving, you need to dive into it and look at things at the object or type level, and there are several ways to do this.
Figure 9 – Investigating memory at the object level with WinDbg.

Figure 10 – Investigating memory at the object level with WinDbg.
The first one, which is illustrated in Figure 9 and Figure 10, is to use WinDbg, which is part of the Debugging Tools for Windows, and which you can attach to a running executable. The .NET framework itself comes with a debugger extension, called SOS, which you can load into WinDbg and which then allows you to scan the heaps and find details about the objects they contain. Essentially, loading that DLL makes a whole set of extra commands (which know about .NET memory layout) available to the debugger. In Figure 10, for example, we’re dumping all objects of the type Program, and it tells us that there were 3953 instances of that type on the heap at the point when I took this snapshot. It also shows us that each instance is taking up 12 bytes of memory.

Now, if we consider a particular instance, we can use commands like GCRoot to try and relate that object back to the root that’s actually keeping it in memory, and that path will show us how it’s being kept alive, which can be pretty useful information.

Figure 11 – ANTS Memory Profiler displaying performance counters.

I just want to quickly mention in passing that there are other tools that allow you to do this; I’ll inevitably nod to our own ANTS Memory Profiler as an example. All of these tools try to make it easier to deal with the vast amount of information available in memory debugging. In the case of ANTS, it tries to first show us the information from the various performance counters at the top of the screen (see Figure 11 above) in order to guide us to a point in time at which we might want to take a snapshot (which is a dump of all the objects in the heap). The profiler then has tools to allow you to compare your snapshots to try and work out which objects have survived unexpectedly, and which objects have been allocated in vast numbers when you don’t expect it. It also allows us to do the root-finding trick we saw a moment ago, but in a much more graphical way (see Figure 12).
Figure 12 – Using ANTS Memory Profiler to find an object’s roots

So, to wrap up the discussion of performance counters, you can use WinDbg, which gives you lots of information but is hard to navigate, or you can try and use more graphical tools, which offer you filtering and a means to graphically explore the contents of the heap at the point when you took the snapshot. Either way, always bear in mind that the performance counters that these are based on are not necessarily representative of what’s going on within your application in real time.

Misconception #4: .NET Doesn’t Leak Memory

In one sense, this statement is literally true, but there are problems in .NET which have the same symptoms. It’s ultimately a question of definition:

Old Definition

When you used malloc and free to manage memory yourself, a leak was what happened any time that you forgot to do the free part. Basically, you’d allocate some memory, you’d do some work with it, and then you’d forget to release it back to the runtime system.

Or maybe you were dealing with a very large data structure, and you couldn’t work out what the actual root node into that structure was, which made it very hard for you to start freeing things.

Or maybe you called into a library routine which gave you some objects back, and it wasn’t quite clear if it was you or the library that would later free those objects.
In addition, prematurely releasing objects was often fatal. Say you allocated an object, freed it, and then continued to try and use it. If the memory space you were trying to access had since been allocated to a different object, you’d find yourself with two objects of different types competing over the same memory, and that would often cause things to go catastrophically wrong.

New Definition

The good news is that those days are gone [2], and that the .NET runtime, which takes care of freeing objects for you, is also ultra-cautious. It works out whether it thinks a particular object is going to be needed while your program runs, and it will only release that object if it can completely guarantee that it is not going to be needed again.

[2] Except when you aggressively dispose.

The difficulty with this, of course, is that it’s hard to have an effective cost model in your head, describing when objects that you allocate are actually going to be freed again, so understanding your own code can pose its own challenges! Moreover, while this managed memory is a boon, it also has opportunities and loopholes which allow objects to live longer than they should.

What Makes Things Live Longer?

There are a few things that might cause your objects to live longer than necessary, causing your application to take up much more memory than you think it should:

The Runtime Itself

The type of build you use can have an effect on object persistence. For example, if you choose to do a debug build instead of a release build, you can find that your objects are kept much longer than they really need to be, because the compiler hasn’t been as aggressive in its optimization.

Having a debugger attached can also make things live longer, as your application will be JITed in such a way that the local variables live to the end of the function, making it easier to debug things.

We’ve already seen that the .NET runtime has a number of heuristics for deciding when to collect higher generations, and it turns out that Gen2 objects will not be collected very often. So, should you have an object that accidentally gets promoted into a higher Generation, it can be a long time before that object gets freed, and its memory gets recycled.

Finalizers are another culprit; in order to run one, the system waits until an object with a finalizer would have been collected, and then saves that object by promoting it. This means that this object will survive until the collection of the next generation, and perhaps longer if the finalizer thread doesn’t get around to processing it in time.

User Interfaces, and How They Behave

Event handlers are a typical example of this. If you subscribe a delegate to an event on a long-lived item, say a top-level Windows form, you’re basically adding a reference to both an object and a method on that object (which is essentially what a delegate is). In short, you’re making your object live as long as the top-level form. If you later forget to unsubscribe the object, you’ll find that you’ve introduced a memory leak (in the form of an unnecessarily long-lived object).

There are other examples of this kind of mishap, but they’re not terribly common, so I won’t dwell on them here.
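The event handler case can be made concrete without relying on GC timing at all, simply by inspecting what the event holds on to. The following is a minimal sketch under stated assumptions: `TopLevelForm` and `ChildView` are hypothetical classes standing in for a long-lived form and a short-lived subscriber, and no real Windows Forms types are involved. It shows that the event’s delegate list references the subscriber until it is explicitly unsubscribed.

```csharp
using System;

// Hypothetical long-lived object, standing in for a top-level form.
class TopLevelForm
{
    public event EventHandler Closing;

    // Number of delegates currently subscribed to the event.
    public int SubscriberCount
    {
        get { return Closing == null ? 0 : Closing.GetInvocationList().Length; }
    }

    // True if the event's delegate list holds a reference to `candidate`.
    public bool Holds(object candidate)
    {
        if (Closing == null) return false;
        foreach (Delegate d in Closing.GetInvocationList())
        {
            if (ReferenceEquals(d.Target, candidate)) return true;
        }
        return false;
    }
}

// Hypothetical short-lived object that listens to the form.
class ChildView
{
    public void OnFormClosing(object sender, EventArgs e) { }
}

static class EventLeakDemo
{
    static void Main()
    {
        var form = new TopLevelForm();
        var view = new ChildView();

        form.Closing += view.OnFormClosing;
        Console.WriteLine(form.Holds(view));   // True: the form keeps the view reachable

        form.Closing -= view.OnFormClosing;
        Console.WriteLine(form.Holds(view));   // False: the reference is gone
    }
}
```

As long as `Holds` returns true, the view is reachable from the form and cannot be collected before the form is, regardless of whether anything else still uses it.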
Libraries

You’ll find some libraries will have caches within themselves to improve their performance, but they may not have a very good lifetime policy on those caches. So you might find that you’re unintentionally keeping the last 50 results, or something like that. Even if you don’t call into that library for a long time, the cache is still going to stay live, and all of those objects are still going to be around.

The Compiler

My favorite of all of these problems has to do with the way the compiler translates more modern constructs in C# to run on the CLR 2 infrastructure that the .NET framework provides. Closures and lambda expressions are a very good example.

Lambda expressions are not represented as objects in themselves at the level of IL in the CLR, but are represented as compiler-generated classes which are used to maintain references to what were the local variables.

    using System;

    class Program
    {
        private static Func<int> s_LongLived;

        static void Main(string[] args)
        {
            var x = 20;
            var y = new int[20200];
            Func<int> getSum = () => x + y.Length;   // captures x and y
            Func<int> getFirst = () => x;            // captures only x
            s_LongLived = getFirst;                  // outlives Main
        }
    }

Listing 3 – Illustrating compiler translation with a lambda expression.

In this simple example, we have two local variables which are referenced by lambda expressions, one of which lives for a very long time by being put in a static field. Now, in order to make the lifetime of these local variables match the lifetime of the lambda function, the C# compiler wraps the local variables into a class, of which it makes an instance, and then represents the lambda functions as delegates on that class.

So in the case of this example, even though the local variable y doesn’t need to live for a long time (because the long-lived lambda expression only refers to the variable x), we’ll find that, due to the way the C# compiler behaves, this large array will live for a very long time.
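To make that translation tangible, here is a hand-written approximation of what the compiler produces for Listing 3. It is a sketch, not the compiler’s actual output: the real generated class has a compiler-chosen name that cannot be written in C#, and `DisplayClass`, `ClosureDemo` and `Run` are names invented for illustration. The point it shows is that both lambdas share one instance, so a delegate for the lambda that only uses `x` still keeps the array `y` reachable.

```csharp
using System;

// Hand-written stand-in for the compiler-generated closure class.
// The real class has a compiler-chosen name; this one is hypothetical.
class DisplayClass
{
    public int x;
    public int[] y;

    public int GetSum()   { return x + y.Length; }   // was: () => x + y.Length
    public int GetFirst() { return x; }              // was: () => x
}

static class ClosureDemo
{
    public static Func<int> s_LongLived;

    public static void Run()
    {
        var locals = new DisplayClass();   // one instance shared by both lambdas
        locals.x = 20;
        locals.y = new int[20200];

        Func<int> getSum = locals.GetSum;
        Func<int> getFirst = locals.GetFirst;
        s_LongLived = getFirst;            // keeps `locals` alive, and with it `y`
    }

    static void Main()
    {
        Run();
        var kept = (DisplayClass)s_LongLived.Target;
        Console.WriteLine(s_LongLived());    // 20
        Console.WriteLine(kept.y.Length);    // 20200: the array is still reachable
    }
}
```

Because the delegate’s `Target` is the shared instance, the array stays alive for as long as the static field does, even though `GetFirst` never touches it.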
Figure 13 – Using .NET Reflector to see how the C# compiler unintentionally keeps objects alive unnecessarily.

If we look in .NET Reflector to see how that code is generated, we see the compiler has generated this extra display class (see Figure 13, above), and the local variables are actually represented as fields within that display class. However, there is no effort to clear out those fields, even when the system knows that their values can’t be accessed in the future.

Misconception #5: All Objects are Created Equal

Finally, as we saw hints of in Misconception #2, it’s a bad idea to use the copy-and-promote strategy we saw earlier for very large objects, because it takes a very long time (and is very expensive) to copy those objects around. So the designers of the .NET memory management system needed a better way to handle these large objects, and resorted to a standard technique called “mark-and-sweep”. Rather than promoting live objects to another generation, the GC leaves them in place, and keeps a record of the free areas around them, and then uses that record at a later stage to allocate objects. Crucially, there is no compaction at the moment.

No Copying During Collection

Let’s take that simple mutator I wrote earlier, and instead of allocating instances of small objects, we allocate instances of large ones (bear in mind that anything larger than 85k bytes will typically be allocated in the Large Object Heap). Going through the same scenario as we saw before, we have a series of instances, live and dead, being allocated in cycles:

Figure 14 – Allocating larger objects to the heap
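You can observe the different treatment of large objects from inside a program. The sketch below, with hypothetical class and method names, allocates one small and one large byte array and asks `GC.GetGeneration` where each one is. On the Microsoft CLR a freshly allocated small array is normally reported as generation 0, while an array well over the 85k threshold is reported as generation 2 straight away, because the Large Object Heap is treated as part of the oldest generation. The array sizes are arbitrary choices for illustration.

```csharp
using System;

static class LargeObjectDemo
{
    // Generation the runtime reports for a freshly allocated byte array.
    public static int GenerationOfNewArray(int lengthInBytes)
    {
        var buffer = new byte[lengthInBytes];
        return GC.GetGeneration(buffer);
    }

    static void Main()
    {
        Console.WriteLine("1,000 byte array:   generation " + GenerationOfNewArray(1000));
        Console.WriteLine("100,000 byte array: generation " + GenerationOfNewArray(100000));
        Console.WriteLine("Oldest generation:  " + GC.MaxGeneration);
    }
}
```

The large array never passes through Gen0 or Gen1 at all, which is why it is only reclaimed when the oldest generation is collected.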
At the point that garbage collection happens, rather than moving these live objects into a new generation, the GC just makes a note of where they are, and then scans through the dead objects, noting their address ranges as free blocks, which it’ll try to use later for allocation requests.

Figure 15 – The post-collection heap, with the free blocks noted by the GC.

Once again, crucially, there is no copying; everything is left in place.

Some Observations

That has some advantages. For starters, there is no movement, so we don’t have to do any fix-ups, and we don’t need to bring other threads to a safe point in order to adjust their pointers. That also gives us a potential advantage for parallelism.

The trouble with this model is that it introduces potential fragmentation, and we’ll see in just a moment what fragmentation really means. The GC also now has to make a decision regarding at what point it actually does collections in this Large Object Heap, and it was decided to make this area synonymous with Gen2, at least for GC purposes. This means that whenever a collection of Gen2 occurs, so too does a collection of this Large Object Heap.

The repercussion of that decision is that temporary large objects don’t really fit into this model. For small objects, it was fine to generate things temporarily, as they’d be very quickly and cheaply recycled, but for large objects that’s clearly not the case. So if we generate temporary large objects, it can be a very long time before a Gen2 collection is carried out, and those temporary objects will be holding memory throughout that period.

We also saw that, in order to find the free blocks of memory, the GC has to walk all of the objects on this heap. This behavior is much more expensive from a paging perspective, as we’re actually touching a lot more of the live memory.

The Problem of Fragmentation

This can be illustrated if you have a program that’s allocating objects in a live-dead-live-dead pattern, as seen below.

Figure 16 – An allocation pattern likely to generate a certain degree of fragmentation.

After collection, those dead objects are all marked as free space, ready to be recycled:

Figure 17 – Post-collection memory, now fragmented.
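The consequence of leaving holes between live objects can be shown with a toy model. This is a deliberately simplified sketch, not the CLR’s allocator: it tracks only the sizes of the free blocks, uses a first-fit search (an assumption made for illustration), and the block sizes and request size are invented. It shows that the total amount of free space can comfortably exceed a request that nonetheless fits in none of the individual blocks.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a non-compacting heap: only free block sizes are tracked.
static class FreeListModel
{
    // Total free space across all blocks.
    public static int TotalFree(List<int> freeBlocks)
    {
        int total = 0;
        foreach (int size in freeBlocks) total += size;
        return total;
    }

    // First-fit: index of the first free block able to hold `size`, or -1.
    public static int FindBlock(List<int> freeBlocks, int size)
    {
        for (int i = 0; i < freeBlocks.Count; i++)
        {
            if (freeBlocks[i] >= size) return i;
        }
        return -1;
    }

    static void Main()
    {
        // live-dead-live-dead: each dead object leaves a 100-unit hole
        // separated from the next hole by a live object.
        var freeBlocks = new List<int> { 100, 100, 100 };
        int request = 120;

        Console.WriteLine("Total free: " + TotalFree(freeBlocks));             // Total free: 300
        Console.WriteLine("Block found: " + FindBlock(freeBlocks, request));   // Block found: -1
    }
}
```

Since nothing is moved, the three holes can never be merged into one, so the only way to satisfy the request in this model is to grow the heap.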
The problem occurs when you try to allocate an object that’s actually slightly bigger than any of these free blocks...

Figure 18 – A hypothetical object, larger than any of the available free slots.

Obviously, the GC finds that this new object won’t fit into any of these free areas. So, even though there is enough free memory available in total to satisfy the request for an object of that size, the GC will find it doesn’t actually have anywhere to put the new object, and will be forced to resize the LOH to make allocation possible:

Figure 19 – The effect of memory fragmentation.

Conclusion

We’ve looked at 5 different issues you might have with your .NET memory management, and tried to tell you a little bit of the story and history behind each of them. I think the conclusion is really that there’s a lot going on inside the heap of your process, and ideally you need to be able to visualize what’s going on to be able to understand why things are being kept alive longer than you think.