.NET UY Meetup 7 - CLR Memory by Fabian Alves


Video: http://youtu.be/kXzSqTqV0D8
Code: http://goo.gl/q4zDKm
Fabian Alves: http://goo.gl/s1kura



  1. 1. CLR Memory Fabian Alves Consultant, Dev Tools, Microsoft
  2. 2. Heap & Stack • The Stack is more or less responsible for keeping track of what's executing in our code (or what's been "called"). • The Heap is more or less responsible for keeping track of our objects.
  3. 3. Value & Reference Types • Value Types: • bool, byte, char, decimal, double, enum, float, struct, etc. • System.ValueType • Reference Types: • class, interface, delegate, object, string • System.Object
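The split above can be checked from code with reflection; a minimal sketch (the specific types inspected here are illustrative choices, not from the slides):

```csharp
using System;

class Program
{
    static void Main()
    {
        // Value types derive from System.ValueType...
        Console.WriteLine(typeof(int).IsValueType);     // True
        Console.WriteLine(typeof(decimal).IsValueType); // True
        Console.WriteLine(typeof(DateTime).BaseType);   // System.ValueType

        // ...while reference types do not. Note that string is a reference
        // type even though it often behaves like a value.
        Console.WriteLine(typeof(string).IsValueType);  // False
    }
}
```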
  4. 4. Pointers • A pointer is a reference to a type. • Not used explicitly; pointers are managed by the CLR. • A pointer is a chunk of space in memory that points to another space in memory. A pointer takes up space just like anything else we put on the Stack and Heap, and its value is either a memory address or null.
  5. 5. What goes Where • A Reference Type always goes on the Heap. • Value Types and Pointers always go where they were declared. • The Stack is responsible for keeping track of where each thread is during the execution of our code (or what's been called). • Each thread has its own stack.
  6. 6. Stack • Once we start executing the method, the method's parameters are placed on the stack (we'll talk more about passing parameters later).
  7. 7. Stack • Next, control (the thread executing the method) is passed to the instructions of the AddFive() method, which lives in our type's method table; a JIT compilation is performed if this is the first time we are hitting the method.
  8. 8. Stack • As the method executes, we need some memory for the "result" variable and it is allocated on the stack.
  10. 10. Stack • The method finishes execution and our result is returned.
  11. 11. Stack • And all memory allocated on the stack is cleaned up by moving a pointer to the available memory address where AddFive() started and we go down to the previous method on the stack (not seen here).
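The stack walkthrough above can be sketched in code. AddFive() is named in the slides; the caller shape and the value 3 are assumptions for illustration:

```csharp
using System;

class Program
{
    // pValue is copied onto AddFive's stack frame, "result" lives there too,
    // and both are popped when the method returns.
    static int AddFive(int pValue)
    {
        int result = pValue + 5; // "result" is allocated on the stack
        return result;           // the value is copied back to the caller
    }

    static void Main()
    {
        int x = 3;            // x lives on Main's stack frame
        int y = AddFive(x);   // x's value is copied into pValue
        Console.WriteLine(y); // prints 8; x is still 3
    }
}
```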
  12. 12. Heap • Value Types are also sometimes placed on the Heap. • Value Types always go where they were declared • If a Value Type is declared outside of a method, but inside a Reference Type it will be placed within the Reference Type on the Heap.
  13. 13. Heap • Because MyInt is a Reference Type, it is placed on the Heap and referenced by a Pointer on the Stack.
  14. 14. Heap • After AddFive() is finished executing (like in the first example), and we are cleaning up...
  15. 15. Heap • We're left with an orphaned MyInt in the heap (there is no longer anyone in the Stack standing around pointing to MyInt)! • Here is where the GC comes into play. Once our program reaches a certain memory threshold and we need more Heap space, our GC will kick off.
  16. 16. Value type vs Ref Type sample • By executing this method we'll get the value 3.
  17. 17. Value type vs Ref Type sample • By executing this method with the class we'll get the value 4.
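The sample code behind these two slides is not visible in the transcript. Below is a reconstruction following the usual shape of this example; the MyIntStruct/MyIntClass names and the MyValue field are assumptions:

```csharp
using System;

struct MyIntStruct { public int MyValue; }
class MyIntClass { public int MyValue; }

class Program
{
    static void Main()
    {
        // Struct (value type): y is an independent copy, so x keeps 3.
        MyIntStruct x = new MyIntStruct { MyValue = 3 };
        MyIntStruct y = x;
        y.MyValue = 4;
        Console.WriteLine(x.MyValue); // 3

        // Class (reference type): b points at the same heap object, so a sees 4.
        MyIntClass a = new MyIntClass { MyValue = 3 };
        MyIntClass b = a;
        b.MyValue = 4;
        Console.WriteLine(a.MyValue); // 4
    }
}
```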
  18. 18. Passing value types • When we pass a value type, new space is allocated and the value in our type is copied to that new space on the stack.
  19. 19. Passing value types • Next, AddFive() is placed on the stack with space for its parameters, and the value is copied, bit by bit, from x.
  20. 20. Passing value types • When AddFive() has finished executing, the thread is passed back to Go() and, because AddFive() has completed, pValue is essentially "removed". • Any value type parameters passed into a method are carbon copies, and we can count on the original variable's value being preserved.
  21. 21. Big value types • One thing to keep in mind is that if we have a very large value type (such as a big struct) and pass it on the stack, copying it over each time can get very expensive in terms of space and processor cycles. The stack does not have infinite space; just like filling a glass of water from the tap, it can overflow.
  22. 22. Big value types as Ref • Copying big value types can be really inefficient. Imagine if we passed MyStruct a couple thousand times; you can understand how it could really bog things down. • So how do we get around this problem? By passing a reference to the original value type, as follows:
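A sketch of the idea, assuming a hypothetical 64-byte MyStruct (the fields of the slides' actual struct are not shown in the transcript):

```csharp
using System;

// Hypothetical large struct; the layout is an assumption for illustration.
struct MyStruct
{
    public long A, B, C, D, E, F, G, H; // 64 bytes copied on every by-value call
}

class Program
{
    // By value: all 64 bytes are copied onto the callee's stack frame per call.
    static long ByValue(MyStruct s)
    {
        return s.A;
    }

    // By reference: only a pointer-sized reference is copied.
    static long ByRef(ref MyStruct s)
    {
        return s.A;
    }

    static void Main()
    {
        MyStruct big = new MyStruct { A = 42 };
        Console.WriteLine(ByValue(big));   // 42
        Console.WriteLine(ByRef(ref big)); // 42 — same result, far less copying
    }
}
```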
  23. 23. Passing reference types • Passing parameters that are reference types is similar to passing value types by reference as in the previous example.
  24. 24. Passing reference types • Starting with the call to Go(), the variable x goes on the stack. • Starting with the call to DoSomething(), the parameter pValue goes on the stack. • The value of x (the address of MyInt on the heap) is copied to pValue. • The result is 12345.
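A reconstruction of the sequence above; MyInt, DoSomething(), Go() and the value 12345 appear in the slides, while the exact member bodies are assumptions:

```csharp
using System;

class MyInt { public int MyValue; }

class Program
{
    static void DoSomething(MyInt pValue)
    {
        // pValue holds a copy of the reference, so it points
        // at the same heap object as the caller's x.
        pValue.MyValue = 12345;
    }

    static void Go()
    {
        MyInt x = new MyInt();
        DoSomething(x);
        Console.WriteLine(x.MyValue); // 12345 — the change is visible through x
    }

    static void Main()
    {
        Go();
    }
}
```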
  25. 25. Passing reference types as Ref • Our variable x is turned into a Vegetable. • x is Animal : False x is Vegetable : True
  26. 26. Passing reference types as Ref • Starting with the Go() method call, the x pointer goes on the stack • The Animal goes on the heap • Starting with the call to Switcharoo() method, the pValue goes on the stack and points to x
  27. 27. Passing reference types as Ref • The Vegetable goes on the heap. • The value of x is changed through pValue to the address of the Vegetable. • If we don't pass the Thing by ref, we'll keep the Animal and get the opposite results from our code.
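A reconstruction of the Switcharoo() example; Thing, Animal, Vegetable and the printed results come from the slides, and the empty class bodies are assumptions:

```csharp
using System;

class Thing { }
class Animal : Thing { }
class Vegetable : Thing { }

class Program
{
    // Because pValue is passed by ref, assigning to it changes
    // the caller's x, not just a local copy of the reference.
    static void Switcharoo(ref Thing pValue)
    {
        pValue = new Vegetable();
    }

    static void Main()
    {
        Thing x = new Animal();
        Switcharoo(ref x);
        Console.WriteLine("x is Animal    : " + (x is Animal));    // False
        Console.WriteLine("x is Vegetable : " + (x is Vegetable)); // True
    }
}
```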
  28. 28. Garbage Collector • Garbage collection is a high-level abstraction that absolves developers of the need to care about managing memory deallocation. • A garbage collector also provides a finalization interface for unmanaged resources that do not reside on the managed heap, so that custom cleanup code can be executed when these resources are no longer needed. The two primary design goals of the .NET garbage collector are: • Remove the burden of memory management bugs and pitfalls • Provide memory management performance that matches or exceeds the performance of manual native allocators
  29. 29. GC Methods • Free list (C++) • Reference counting • Tracing (.NET CLR / Java)
  30. 30. GC Phases: Mark • The GC traverses the graph of all objects currently referenced by the application. • Local Roots • Static Roots • GC Handles • The mark phase of the garbage collection cycle is an "almost read-only" phase, at which no objects are shifted in memory or deallocated from it.
  31. 31. Mark Phase Performance • During a full mark, the garbage collector must touch every single referenced object. This results in page faults if the memory is no longer in the working set, and results in cache misses and cache thrashing as objects are traversed. • On a multi-processor system, since the collector marks objects by setting a bit in their header, this causes cache invalidation for other processors that have the object in their cache. • Unreferenced objects are less costly in this phase, and therefore the performance of the mark phase is linear in the collection efficiency factor: the ratio between referenced and unreferenced objects in the collection space. • The performance of the mark phase additionally depends on the number of objects in the graph, and not the memory consumed by these objects. Large objects that do not contain many references are easier to traverse and incur less overhead. This means that the performance of the mark phase is linear in the number of live objects in the graph.
  32. 32. GC Phases: Sweep & Compact • Sweep: • GC reclaims memory of unused objects detected in the mark phase • Compact: • During the compact phase, the garbage collector moves live objects in memory so that they occupy a consecutive area in space
  33. 33. Sweep Phase Performance • The general performance of the sweep phase is linear in the number of objects in the graph, and is especially sensitive to the collection efficiency factor. • If most objects are discovered to be unreferenced, then the GC has to move only a few objects in memory. • The same applies to the scenario where most objects are still referenced, as there are relatively few holes to fill. • If every other object in the heap is unreferenced, the GC may have to move almost every live object to fill the holes.
  34. 34. Compact Phase Performance • Moving objects around means copying memory, which is an expensive operation for large objects. Even if the copy is optimized, copying several megabytes of memory in each garbage collection cycle results in unreasonable overhead. (This is why large objects are treated differently, as we shall see later.) • When objects are moved, references to them must be updated to reflect their new location. For objects that are frequently referenced, this scattered memory access (when references are being updated) can be costly.
  35. 35. Pinning • Occurs when passing managed objects for consumption by unmanaged code. • Pinning an object prevents the garbage collector from moving it around during the sweep phase until it is unpinned. • When the garbage collector encounters a pinned object during the compact phase, it must work around that object to ensure that it is not moved in memory. • See the "# of Pinned Objects" performance counter (in the .NET CLR Memory performance counter category).
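In code, explicit pinning is typically done with GCHandle (or C#'s fixed statement); a minimal sketch:

```csharp
using System;
using System.Runtime.InteropServices;

class Program
{
    static void Main()
    {
        byte[] buffer = new byte[256];

        // Pin the array so the GC cannot move it while unmanaged code holds its address.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            Console.WriteLine(address != IntPtr.Zero); // True — a stable address
            // ...pass 'address' to unmanaged code here...
        }
        finally
        {
            handle.Free(); // unpin as soon as possible to limit fragmentation
        }
    }
}
```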
  36. 36. GC Collection & Threads • When a garbage collection occurs, application threads are normally executing. (the garbage collection request is typically a result of a new allocation being made in the application's code) • The work performed by the GC affects the memory locations of objects and the references to these objects. Moving objects in memory and changing their references while application code is using them is prone to be problematic. • Sweep phase does not support application threads executing concurrently with the garbage collector.
  37. 37. GC Flavors • Workstation: • Single thread performs GC – app threads are suspended • Concurrent workstation (default flavor): • There is a separate, dedicated GC thread marked with THREAD_PRIORITY_HIGHEST that executes the garbage collection from start to finish. • CLR can decide that it wants some phases of the garbage collection process to run concurrently with application threads • Non Concurrent: • The non-concurrent workstation GC flavor, as its name implies, suspends the application threads during both the mark and sweep phases. • The primary usage scenario for non-concurrent workstation GC is the case mentioned in the previous section, when the UI thread tends to trigger garbage collection
  38. 38. GC Flavors 2 • Server: • There is a separate managed heap for each processor in the affinity mask of the .NET process. Allocation requests by a thread on a specific processor are satisfied from the managed heap that belongs to that specific processor. • The garbage collection occurs on a set of dedicated GC threads that are created during application startup and are marked THREAD_PRIORITY_HIGHEST. There is a GC thread for each processor that is in the affinity mask of the .NET process. This allows each thread to perform garbage collection in parallel on the managed heap assigned to its processor. • During both phases of garbage collection, all application threads are suspended. This allows GC to complete in a timely fashion and allows application threads to continue processing requests as soon as possible. It maximizes throughput at the expense of latency.
  39. 39. Changing Flavors • Can be changed in configuration or in code:
      <?xml version="1.0" encoding="utf-8" ?>
      <configuration>
        <runtime>
          <gcServer enabled="true" />
          <gcConcurrent enabled="false" />
        </runtime>
      </configuration>
  40. 40. Generations • The generational model of the .NET garbage collector optimizes collection performance by performing partial garbage collections. • Partial garbage collections have a higher collection efficiency factor, and the objects traversed by the collector are those with optimal collection likelihood. • The primary decision factor for partitioning objects by collection likelihood is their age—the model assumes that there is an inherent correlation between the object's age and its life expectancy.
  41. 41. Generations • In the generational model, the garbage collected heap is partitioned into three regions: generation 0, generation 1, and generation 2. • These regions reflect on the projected life expectancy of the objects they contain: generation 0 contains the youngest objects, and generation 2 contains old objects that have survived for a while • When an object survives a GC it is moved to the next generation
  42. 42. Generations
  43. 43. Gen 0 • All new objects go to Gen 0. • It is very small, and cannot accommodate the memory usage of even the smallest application. • Generation 0 usually starts with a budget between 256 KB-4 MB and might grow slightly if the need arises. • When a new allocation request cannot be satisfied from generation 0 because it is full, a garbage collection is initiated within generation 0. • A garbage collection within generation 0 is a very cheap and efficient process.
  44. 44. Gen 0 Survivors • Almost all objects are expected to disappear from generation 0 when the collection completes. However, some objects might survive for a variety of reasons: • The application might be poorly behaved and perform allocations of temporary objects that survive more than a single garbage collection. • The application is at the initialization stage, when long-lived objects are being allocated. • The application has created some temporary short-lived objects which happened to be in use when the garbage collection was triggered. • Survivors are promoted to generation 1, to reflect the fact that their life expectancy is now longer.
  45. 45. Gen 0 to Gen 1
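Promotion can be observed with GC.GetGeneration(); a small sketch (the generations printed after each collection can vary with GC timing, so they are hedged in the comments):

```csharp
using System;

class Program
{
    static void Main()
    {
        object obj = new object();
        Console.WriteLine(GC.GetGeneration(obj)); // 0 — new objects start in gen 0

        GC.Collect(); // force a collection; obj survives because it is still referenced
        Console.WriteLine(GC.GetGeneration(obj)); // typically 1 — promoted after surviving

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(obj)); // typically 2 — promoted again

        GC.KeepAlive(obj); // keep obj rooted through the calls above
    }
}
```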
  46. 46. Pinned objects in Gen 0 • Pinning an object prevents it from being moved by the garbage collector. • In the generational model, it also prevents promotion of pinned objects between generations. • Pinned objects that cause fragmentation within generation 0 have the potential of causing more harm than it might appear from our examination of pinning before we introduced generations into the picture. • The CLR has the ability to promote pinned objects using the following trick: if generation 0 becomes severely fragmented with pinned objects, the CLR can declare the entire space of generation 0 to be considered a higher generation, and allocate new objects from a new region of memory that will become generation 0.
  47. 47. Gen 1 • Generation 1 is the buffer between generation 0 and generation 2. • It contains objects that have survived one garbage collection. • A typical starting budget for generation 1 ranges from 512 KB-4 MB. • When generation 1 becomes full, a garbage collection is triggered in generation 1. • A garbage collection in generation 1 is still a relatively cheap process. • Surviving objects from generation 1 are promoted to generation 2. This promotion reflects the fact that they are now considered old objects. One of the primary risks in the generational model is that temporary objects creep into generation 2 and die shortly afterwards; this is the mid-life crisis. It is extremely important to ensure that temporary objects do not reach generation 2.
  48. 48. Gen 2 • Generation 2 is the ultimate region of memory for objects that have survived at least two garbage collections. In the generational model, these objects are considered old and, based on our assumptions, should not become eligible for garbage collection in the near future. • Generation 2 is not artificially limited in size. It can extend to the entire memory space dedicated to the OS process, i.e., up to 2 GB of memory on a 32-bit system, or up to 8 TB of memory on a 64-bit system. • When a garbage collection occurs within generation 2, it is a full garbage collection. This is the most expensive kind of garbage collection, and it can take the longest to complete.
  49. 49. Large Object Heap (LOH) • The large object heap (LOH) is a special area reserved for large objects. • Large objects are objects that occupy more than 85 KB (85,000 bytes) of memory. • Large objects are allocated from the LOH directly, and do not pass through generation 0, generation 1, or generation 2. • Instead of sweeping large objects and copying them around, the garbage collector employs a different strategy when collecting the LOH: a linked list of all unused memory blocks is maintained, and allocation requests can be satisfied from this list. • The LOH is collected when the threshold for a collection in generation 2 is reached. • One effective strategy is pooling large objects and reusing them instead of releasing them to the GC.
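The LOH behavior is easy to observe, since the CLR reports large objects as belonging to generation 2 from the moment they are allocated:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Arrays over the ~85,000-byte threshold go straight to the LOH,
        // which GC.GetGeneration reports as generation 2.
        byte[] small = new byte[1000];
        byte[] large = new byte[100000];

        Console.WriteLine(GC.GetGeneration(small)); // 0
        Console.WriteLine(GC.GetGeneration(large)); // 2 — allocated on the LOH
    }
}
```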
  50. 50. Finalization • Unmanaged resources are those not managed by the CLR or by the garbage collector (such as kernel object handles, database connections, unmanaged memory etc.). Their allocation and deallocation are not governed by GC rules, and the standard memory reclamation techniques outlined above do not suffice when they are concerned. • Freeing unmanaged resources requires an additional feature called finalization, which associates an object (representing an unmanaged resource) with code that must be executed when the object is no longer needed. • Oftentimes, this code should be executed in a deterministic fashion when the resource becomes eligible for deallocation; at other times, it can be delayed for a later non-deterministic point in time.
  51. 51. Finalization types • Manual deterministic finalization: • It is the client's responsibility to perform finalization. • Automatic non-deterministic finalization: • Must rely on the garbage collector to discover whether an object is referenced. The GC's non-deterministic nature, in turn, implies that finalization will be non-deterministic. At times, this non-deterministic behavior is a show-stopper, because temporary "resource leaks" or holding a shared resource locked for just slightly longer than necessary might be unacceptable behaviors. • Automatic deterministic finalization: • The Dispose pattern.
  52. 52. Automatic non-deterministic finalization • Any type can override the protected Finalize method defined by System.Object to indicate that it requires automatic finalization. • The C# syntax for requesting automatic finalization on the File class is the ~File() method. This method is called a finalizer, and it is invoked when the object is destroyed. • When an object with a finalizer is created, a reference to it is added to a special runtime-managed queue called the finalization queue. This queue is considered a root by the garbage collector, meaning that even if the application has no outstanding reference to the object, it is still kept alive by the finalization queue.
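A sketch of the finalizer syntax; File is the slides' example name, while the handle field and the static flag are illustrative assumptions:

```csharp
using System;

class File
{
    public static bool FinalizerRan; // illustrative flag to observe finalization

    private IntPtr handle = new IntPtr(42); // imagine an unmanaged OS handle

    // C# finalizer syntax: compiles to a protected override of Object.Finalize.
    ~File()
    {
        // Runs on the finalizer thread at a non-deterministic point
        // after the object becomes unreachable.
        handle = IntPtr.Zero; // e.g. close the unmanaged handle here
        FinalizerRan = true;
    }
}

class Program
{
    static void Main()
    {
        new File();                    // immediately unreachable
        GC.Collect();                  // candidate moves to the f-reachable queue
        GC.WaitForPendingFinalizers(); // block until the finalizer thread has run
        Console.WriteLine(File.FinalizerRan); // typically True by this point
    }
}
```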
  53. 53. Automatic non-deterministic finalization • When the object becomes unreferenced by the application and a garbage collection occurs, the GC detects that the only reference to the object is the reference from the finalization queue. • The GC consequently moves the object reference to another runtime-managed queue called the f-reachable queue. This queue is also considered a root, so at this point the object is still referenced and considered alive. • The object's finalizer is not run during garbage collection. Instead, a special thread called the finalizer thread is created during CLR initialization • This thread repeatedly waits for the finalization event to become signaled. The GC signals this event after a garbage collection completes, if objects were moved to the f-reachable queue, and as a result the finalizer thread wakes up
  54. 54. Automatic non-deterministic finalization Pitfalls • Objects with finalizers are guaranteed to reach at least generation 1, which makes them more susceptible to the mid-life crisis phenomenon. This increases the chances of performing many full collections. • Objects with finalizers are slightly more expensive to allocate because they are added to the finalization queue. This introduces contention in multi-processor scenarios. Generally speaking, this cost is negligible compared to the other issues. • Pressure on the finalizer thread (many objects requiring finalization) might cause memory leaks. If the application threads are allocating objects at a higher rate than the finalizer thread is able to finalize them, then the application will steadily leak memory from objects waiting for finalization.
  55. 55. Automatic deterministic finalization: Dispose Pattern • The conventional contract established by the .NET Framework dictates that an object which requires deterministic finalization must implement the IDisposable interface, with a single Dispose method. This method should perform deterministic finalization to release unmanaged resources. • Clients of an object implementing the IDisposable interface are responsible for calling Dispose when they have finished using it. In C#, this can be accomplished with a using block, which wraps object usage in a try…finally block and calls Dispose within the finally block. • Automatic finalization is used as a backup finalization strategy if a client does not call Dispose!
  56. 56. GC.SuppressFinalize • It is a mechanism for instructing the garbage collector that the unmanaged resources have already been released and that automatic finalization is no longer required for a particular object. • Disables finalization by setting a bit in the object's header word • The object still remains in the finalization queue, but most of the finalization cost is not incurred because the object's memory is reclaimed immediately after the first collection, and it is never seen by the finalizer thread.
  57. 57. IDisposable implementation
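The implementation shown on this slide is not visible in the transcript. A minimal sketch of the Dispose pattern as described above; the File name and handle field are illustrative assumptions:

```csharp
using System;

class File : IDisposable
{
    private IntPtr handle = new IntPtr(42); // imagine an unmanaged handle
    private bool disposed;

    public void Dispose()
    {
        Dispose(true);
        // The resource is already released; skip the finalization cost.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;
        if (disposing)
        {
            // release managed resources here
        }
        handle = IntPtr.Zero; // release the unmanaged resource
        disposed = true;
    }

    // Backup finalization if the client forgets to call Dispose.
    ~File()
    {
        Dispose(false);
    }
}

class Program
{
    static void Main()
    {
        // 'using' wraps the object in try...finally and calls Dispose for us.
        using (File f = new File())
        {
            // ...use f...
        }
    }
}
```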
  58. 58. Tools for diagnostics • .NET CLR Memory performance counters (http://msdn.microsoft.com/en-us/library/x2tyfybc(v=vs.110).aspx) • % Time in GC • PerfView • CLR Profiler • ANTS Memory Profiler
  59. 59. Time in GC • To determine whether concurrent GC can provide any benefit for your application, you must first determine how much time it normally spends performing garbage collection. If your application spends 50% of its time reclaiming memory, there remains plenty of room for optimization. On the other hand, if you only perform a collection once in a few minutes, you probably should stick to whatever works for you and pursue significant optimizations elsewhere. You can find out how much time you're spending performing garbage collection through the % Time in GC performance counter in the .NET CLR Memory performance category
  60. 60. Real world stress test results
  61. 61. Event Handler Leak: heap analysis demo
      class Button
      {
          public void OnClick(object sender, EventArgs e)
          {
              // Implementation omitted
          }
      }
      class Program
      {
          static event EventHandler ButtonClick;
          static void Main(string[] args)
          {
              while (true)
              {
                  Button button = new Button();
                  ButtonClick += button.OnClick; // the static event roots every Button
              }
          }
      }
  62. 62. References • Pro .NET Performance by Sasha Goldshtein (Apress, 2012) • C# Heaping vs Stacking: http://www.c-sharpcorner.com/UploadFile/rmcochran/csharp_memory01122006130034PM/csharp_memory.aspx