.NET UY Meetup 7 - CLR Memory by Fabian Alves


Video: http://youtu.be/kXzSqTqV0D8
Code: http://goo.gl/q4zDKm
Fabian Alves: http://goo.gl/s1kura



  1. 1. CLR Memory Fabian Alves Consultant, Dev Tools, Microsoft
  2. 2. Heap & Stack • The Stack is more or less responsible for keeping track of what's executing in our code (or what's been "called"). • The Heap is more or less responsible for keeping track of our objects.
  3. 3. Value & Reference Types • Value Types: • bool, byte, char, decimal, double, enum, float, struct, etc. • System.ValueType • Reference Types: • class, interface, delegate, object, string • System.Object
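The split above can be checked from code with reflection; a minimal sketch (the specific types inspected here are illustrative choices, not from the slides):

```csharp
using System;

class Program
{
    static void Main()
    {
        // Value types derive from System.ValueType...
        Console.WriteLine(typeof(int).IsValueType);     // True
        Console.WriteLine(typeof(decimal).IsValueType); // True
        Console.WriteLine(typeof(DateTime).BaseType);   // System.ValueType

        // ...while reference types do not. Note that string is a reference
        // type even though it often behaves like a value.
        Console.WriteLine(typeof(string).IsValueType);  // False
    }
}
```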
  4. 4. Pointers • A pointer is a reference to a type. • Not used explicitly; pointers are managed by the CLR. • A pointer is a chunk of space in memory that points to another space in memory. A pointer takes up space just like anything else we put on the Stack and Heap, and its value is either a memory address or null.
  5. 5. What goes Where • A Reference Type always goes on the Heap. • Value Types and Pointers always go where they were declared. • The Stack is responsible for keeping track of where each thread is during the execution of our code (or what's been called). • Each thread has its own stack.
  6. 6. Stack • Once we start executing the method, the method's parameters are placed on the stack (we'll talk more about passing parameters later).
  7. 7. Stack • Next, control (the thread executing the method) is passed to the instructions of the AddFive() method, which lives in our type's method table; a JIT compilation is performed if this is the first time we are hitting the method.
  8. 8. Stack • As the method executes, we need some memory for the "result" variable and it is allocated on the stack.
  10. 10. Stack • The method finishes execution and our result is returned.
  11. 11. Stack • And all memory allocated on the stack is cleaned up by moving a pointer to the available memory address where AddFive() started and we go down to the previous method on the stack (not seen here).
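The stack walkthrough above can be sketched in code. AddFive() is named in the slides; the caller shape and the value 3 are assumptions for illustration:

```csharp
using System;

class Program
{
    // pValue is copied onto AddFive's stack frame, "result" lives there too,
    // and both are popped when the method returns.
    static int AddFive(int pValue)
    {
        int result = pValue + 5; // "result" is allocated on the stack
        return result;           // the value is copied back to the caller
    }

    static void Main()
    {
        int x = 3;            // x lives on Main's stack frame
        int y = AddFive(x);   // x's value is copied into pValue
        Console.WriteLine(y); // prints 8; x is still 3
    }
}
```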
  12. 12. Heap • Value Types are also sometimes placed on the Heap. • Value Types always go where they were declared • If a Value Type is declared outside of a method, but inside a Reference Type it will be placed within the Reference Type on the Heap.
  13. 13. Heap • Because MyInt is a Reference Type, it is placed on the Heap and referenced by a Pointer on the Stack.
  14. 14. Heap • After AddFive() is finished executing (like in the first example), and we are cleaning up...
  15. 15. Heap • We're left with an orphaned MyInt in the heap (there is no longer anyone in the Stack standing around pointing to MyInt)! • Here is where the GC comes into play. Once our program reaches a certain memory threshold and we need more Heap space, our GC will kick off.
  16. 16. Value type vs Ref Type sample • By executing this method we'll get the value 3.
  17. 17. Value type vs Ref Type sample • By executing this method with the class we'll get the value 4.
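The sample code behind these two slides is not visible in the transcript. Below is a reconstruction following the usual shape of this example; the MyIntStruct/MyIntClass names and the MyValue field are assumptions:

```csharp
using System;

struct MyIntStruct { public int MyValue; }
class MyIntClass { public int MyValue; }

class Program
{
    static void Main()
    {
        // Struct (value type): y is an independent copy, so x keeps 3.
        MyIntStruct x = new MyIntStruct { MyValue = 3 };
        MyIntStruct y = x;
        y.MyValue = 4;
        Console.WriteLine(x.MyValue); // 3

        // Class (reference type): b points at the same heap object, so a sees 4.
        MyIntClass a = new MyIntClass { MyValue = 3 };
        MyIntClass b = a;
        b.MyValue = 4;
        Console.WriteLine(a.MyValue); // 4
    }
}
```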
  18. 18. Passing value types • When we pass a value type, new space is allocated and the value in our type is copied to that new space on the stack.
  19. 19. Passing value types • Next, AddFive() is placed on the stack with space for its parameters, and the value is copied, bit by bit, from x.
  20. 20. Passing value types • When AddFive() has finished executing, the thread is passed back to Go() and, because AddFive() has completed, pValue is essentially "removed". • Any value type parameters passed into a method are carbon copies, and we can count on the original variable's value being preserved.
  21. 21. Big value types • One thing to keep in mind is that if we have a very large value type (such as a big struct) and pass it on the stack, copying it over each time can get very expensive in terms of space and processor cycles. The stack does not have infinite space; just like filling a glass of water from the tap, it can overflow.
  22. 22. Big value types as Ref • Copying big value types can be really inefficient. Imagine if we passed MyStruct a couple thousand times; you can understand how it could really bog things down. • So how do we get around this problem? By passing a reference to the original value type, as follows:
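A sketch of the idea, assuming a hypothetical 64-byte MyStruct (the fields of the slides' actual struct are not shown in the transcript):

```csharp
using System;

// Hypothetical large struct; the layout is an assumption for illustration.
struct MyStruct
{
    public long A, B, C, D, E, F, G, H; // 64 bytes copied on every by-value call
}

class Program
{
    // By value: all 64 bytes are copied onto the callee's stack frame per call.
    static long ByValue(MyStruct s)
    {
        return s.A;
    }

    // By reference: only a pointer-sized reference is copied.
    static long ByRef(ref MyStruct s)
    {
        return s.A;
    }

    static void Main()
    {
        MyStruct big = new MyStruct { A = 42 };
        Console.WriteLine(ByValue(big));   // 42
        Console.WriteLine(ByRef(ref big)); // 42 — same result, far less copying
    }
}
```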
  23. 23. Passing reference types • Passing parameters that are reference types is similar to passing value types by reference as in the previous example.
  24. 24. Passing reference types • Starting with the call to Go(), the variable x goes on the stack. • Starting with the call to DoSomething(), the parameter pValue goes on the stack. • The value of x (the address of MyInt on the heap) is copied to pValue. • The result is 12345.
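A reconstruction of the sequence above; MyInt, DoSomething(), Go() and the value 12345 appear in the slides, while the exact member bodies are assumptions:

```csharp
using System;

class MyInt { public int MyValue; }

class Program
{
    static void DoSomething(MyInt pValue)
    {
        // pValue holds a copy of the reference, so it points
        // at the same heap object as the caller's x.
        pValue.MyValue = 12345;
    }

    static void Go()
    {
        MyInt x = new MyInt();
        DoSomething(x);
        Console.WriteLine(x.MyValue); // 12345 — the change is visible through x
    }

    static void Main()
    {
        Go();
    }
}
```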
  25. 25. Passing reference types as Ref • Our variable x is turned into a Vegetable. • x is Animal : False x is Vegetable : True
  26. 26. Passing reference types as Ref • Starting with the Go() method call, the x pointer goes on the stack • The Animal goes on the heap • Starting with the call to Switcharoo() method, the pValue goes on the stack and points to x
  27. 27. Passing reference types as Ref • The Vegetable goes on the heap. • The value of x is changed through pValue to the address of the Vegetable. • If we don't pass the Thing by ref, we'll keep the Animal and get the opposite results from our code.
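A reconstruction of the Switcharoo() example; Thing, Animal, Vegetable and the printed results come from the slides, and the empty class bodies are assumptions:

```csharp
using System;

class Thing { }
class Animal : Thing { }
class Vegetable : Thing { }

class Program
{
    // Because pValue is passed by ref, assigning to it changes
    // the caller's x, not just a local copy of the reference.
    static void Switcharoo(ref Thing pValue)
    {
        pValue = new Vegetable();
    }

    static void Main()
    {
        Thing x = new Animal();
        Switcharoo(ref x);
        Console.WriteLine("x is Animal    : " + (x is Animal));    // False
        Console.WriteLine("x is Vegetable : " + (x is Vegetable)); // True
    }
}
```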
  28. 28. Garbage Collector • Garbage collection is a high-level abstraction that absolves developers of the need to care about managing memory deallocation. • A garbage collector also provides a finalization interface for unmanaged resources that do not reside on the managed heap, so that custom cleanup code can be executed when these resources are no longer needed. The two primary design goals of the .NET garbage collector are: • Remove the burden of memory management bugs and pitfalls • Provide memory management performance that matches or exceeds the performance of manual native allocators
  29. 29. GC Methods • Free list (C++) • Reference counting • Tracing (.NET CLR / Java)
  30. 30. GC Phases: Mark • The GC traverses the graph of all objects currently referenced by the application. • Local Roots • Static Roots • GC Handles • The mark phase of the garbage collection cycle is an "almost read-only" phase, at which no objects are shifted in memory or deallocated from it.
  31. 31. Mark Phase Performance • During a full mark, the garbage collector must touch every single referenced object. This results in page faults if the memory is no longer in the working set, and results in cache misses and cache thrashing as objects are traversed. • On a multi-processor system, since the collector marks objects by setting a bit in their header, this causes cache invalidation for other processors that have the object in their cache. • Unreferenced objects are less costly in this phase, and therefore the performance of the mark phase is linear in the collection efficiency factor: the ratio between referenced and unreferenced objects in the collection space. • The performance of the mark phase additionally depends on the number of objects in the graph, and not the memory consumed by these objects. Large objects that do not contain many references are easier to traverse and incur less overhead. This means that the performance of the mark phase is linear in the number of live objects in the graph.
  32. 32. GC Phases: Sweep & Compact • Sweep: • GC reclaims memory of unused objects detected in the mark phase • Compact: • During the compact phase, the garbage collector moves live objects in memory so that they occupy a consecutive area in space
  33. 33. Sweep Phase Performance • The general performance of the sweep phase is linear in the number of objects in the graph, and is especially sensitive to the collection efficiency factor. • If most objects are discovered to be unreferenced, then the GC has to move only a few objects in memory. • The same applies to the scenario where most objects are still referenced, as there are relatively few holes to fill. • If every other object in the heap is unreferenced, the GC may have to move almost every live object to fill the holes.
  34. 34. Compact Phase Performance • Moving objects around means copying memory, which is an expensive operation for large objects. Even if the copy is optimized, copying several megabytes of memory in each garbage collection cycle results in unreasonable overhead. (This is why large objects are treated differently, as we shall see later.) • When objects are moved, references to them must be updated to reflect their new location. For objects that are frequently referenced, this scattered memory access (when references are being updated) can be costly.
  35. 35. Pinning • Occurs when passing managed objects for consumption by unmanaged code. • Pinning an object prevents the garbage collector from moving it around during the sweep phase until it is unpinned. • When the garbage collector encounters a pinned object during the compact phase, it must work around that object to ensure that it is not moved in memory. • See the "# of Pinned Objects" performance counter (in the .NET CLR Memory performance counter category).
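In code, explicit pinning is typically done with GCHandle (or C#'s fixed statement); a minimal sketch:

```csharp
using System;
using System.Runtime.InteropServices;

class Program
{
    static void Main()
    {
        byte[] buffer = new byte[256];

        // Pin the array so the GC cannot move it while unmanaged code holds its address.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            Console.WriteLine(address != IntPtr.Zero); // True — a stable address
            // ...pass 'address' to unmanaged code here...
        }
        finally
        {
            handle.Free(); // unpin as soon as possible to limit fragmentation
        }
    }
}
```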
  36. 36. GC Collection & Threads • When a garbage collection occurs, application threads are normally executing. (the garbage collection request is typically a result of a new allocation being made in the application's code) • The work performed by the GC affects the memory locations of objects and the references to these objects. Moving objects in memory and changing their references while application code is using them is prone to be problematic. • Sweep phase does not support application threads executing concurrently with the garbage collector.
  37. 37. GC Flavors • Workstation: • Single thread performs GC – app threads are suspended • Concurrent workstation (default flavor): • There is a separate, dedicated GC thread marked with THREAD_PRIORITY_HIGHEST that executes the garbage collection from start to finish. • CLR can decide that it wants some phases of the garbage collection process to run concurrently with application threads • Non Concurrent: • The non-concurrent workstation GC flavor, as its name implies, suspends the application threads during both the mark and sweep phases. • The primary usage scenario for non-concurrent workstation GC is the case mentioned in the previous section, when the UI thread tends to trigger garbage collection
  38. 38. GC Flavors 2 • Server: • There is a separate managed heap for each processor in the affinity mask of the .NET process. Allocation requests by a thread on a specific processor are satisfied from the managed heap that belongs to that specific processor. • The garbage collection occurs on a set of dedicated GC threads that are created during application startup and are marked THREAD_PRIORITY_HIGHEST. There is a GC thread for each processor that is in the affinity mask of the .NET process. This allows each thread to perform garbage collection in parallel on the managed heap assigned to its processor. • During both phases of garbage collection, all application threads are suspended. This allows GC to complete in a timely fashion and allows application threads to continue processing requests as soon as possible. It maximizes throughput at the expense of latency.
  39. 39. Changing Flavors • Can be changed in configuration or in code:
      <?xml version="1.0" encoding="utf-8" ?>
      <configuration>
        <runtime>
          <gcServer enabled="true" />
          <gcConcurrent enabled="false" />
        </runtime>
      </configuration>
  40. 40. Generations • The generational model of the .NET garbage collector optimizes collection performance by performing partial garbage collections. • Partial garbage collections have a higher collection efficiency factor, and the objects traversed by the collector are those with optimal collection likelihood. • The primary decision factor for partitioning objects by collection likelihood is their age—the model assumes that there is an inherent correlation between the object's age and its life expectancy.
  41. 41. Generations • In the generational model, the garbage collected heap is partitioned into three regions: generation 0, generation 1, and generation 2. • These regions reflect on the projected life expectancy of the objects they contain: generation 0 contains the youngest objects, and generation 2 contains old objects that have survived for a while • When an object survives a GC it is moved to the next generation
  42. 42. Generations
  43. 43. Gen 0 • All new objects go to Gen 0. • It is very small, and cannot accommodate the memory usage of even the smallest application. • Generation 0 usually starts with a budget between 256 KB-4 MB and might grow slightly if the need arises. • When a new allocation request cannot be satisfied from generation 0 because it is full, a garbage collection is initiated within generation 0. • A garbage collection within generation 0 is a very cheap and efficient process.
  44. 44. Gen 0 Survivors • Almost all objects are expected to disappear from generation 0 when the collection completes. However, some objects might survive for a variety of reasons: • The application might be poorly behaved and perform allocations of temporary objects that survive more than a single garbage collection. • The application is at the initialization stage, when long-lived objects are being allocated. • The application has created some temporary short-lived objects which happened to be in use when the garbage collection was triggered. • Survivors are promoted to generation 1, to reflect the fact that their life expectancy is now longer.
  45. 45. Gen 0 to Gen 1
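Promotion can be observed with GC.GetGeneration(); a small sketch (the generations printed after each collection can vary with GC timing, so they are hedged in the comments):

```csharp
using System;

class Program
{
    static void Main()
    {
        object obj = new object();
        Console.WriteLine(GC.GetGeneration(obj)); // 0 — new objects start in gen 0

        GC.Collect(); // force a collection; obj survives because it is still referenced
        Console.WriteLine(GC.GetGeneration(obj)); // typically 1 — promoted after surviving

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(obj)); // typically 2 — promoted again

        GC.KeepAlive(obj); // keep obj rooted through the calls above
    }
}
```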
  46. 46. Pinned objects in Gen 0 • Pinning an object prevents it from being moved by the garbage collector. • In the generational model, it also prevents promotion of pinned objects between generations. • Pinned objects that cause fragmentation within generation 0 have the potential of causing more harm than it might appear from our examination of pinning before we introduced generations into the picture. • The CLR has the ability to promote pinned objects using the following trick: if generation 0 becomes severely fragmented with pinned objects, the CLR can declare the entire space of generation 0 to be considered a higher generation, and allocate new objects from a new region of memory that will become generation 0.
  47. 47. Gen 1 • Generation 1 is the buffer between generation 0 and generation 2. • It contains objects that have survived one garbage collection. • A typical starting budget for generation 1 ranges from 512 KB-4 MB. • When generation 1 becomes full, a garbage collection is triggered in generation 1. • A garbage collection in generation 1 is still a relatively cheap process. • Surviving objects from generation 1 are promoted to generation 2. This promotion reflects the fact that they are now considered old objects. One of the primary risks in the generational model is that temporary objects creep into generation 2 and die shortly afterwards; this is the mid-life crisis. It is extremely important to ensure that temporary objects do not reach generation 2.
  48. 48. Gen 2 • Generation 2 is the ultimate region of memory for objects that have survived at least two garbage collections. In the generational model, these objects are considered old and, based on our assumptions, should not become eligible for garbage collection in the near future. • Generation 2 is not artificially limited in size. It can extend to the entire memory space dedicated to the OS process, i.e., up to 2 GB of memory on a 32-bit system, or up to 8 TB of memory on a 64-bit system. • When a garbage collection occurs within generation 2, it is a full garbage collection. This is the most expensive kind of garbage collection, and it can take the longest to complete.
  49. 49. Large Object Heap (LOH) • The large object heap (LOH) is a special area reserved for large objects. • Large objects are objects that occupy more than 85 KB (85,000 bytes) of memory. • Large objects are allocated from the LOH directly, and do not pass through generation 0, generation 1, or generation 2. • Instead of sweeping large objects and copying them around, the garbage collector employs a different strategy when collecting the LOH: a linked list of all unused memory blocks is maintained, and allocation requests can be satisfied from this list. • The LOH is collected when the threshold for a collection in generation 2 is reached. • One effective strategy is pooling large objects and reusing them instead of releasing them to the GC.
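The LOH behavior is easy to observe, since the CLR reports large objects as belonging to generation 2 from the moment they are allocated:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Arrays over the ~85,000-byte threshold go straight to the LOH,
        // which GC.GetGeneration reports as generation 2.
        byte[] small = new byte[1000];
        byte[] large = new byte[100000];

        Console.WriteLine(GC.GetGeneration(small)); // 0
        Console.WriteLine(GC.GetGeneration(large)); // 2 — allocated on the LOH
    }
}
```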
  50. 50. Finalization • Unmanaged resources are those not managed by the CLR or by the garbage collector (such as kernel object handles, database connections, unmanaged memory etc.). Their allocation and deallocation are not governed by GC rules, and the standard memory reclamation techniques outlined above do not suffice when they are concerned. • Freeing unmanaged resources requires an additional feature called finalization, which associates an object (representing an unmanaged resource) with code that must be executed when the object is no longer needed. • Oftentimes, this code should be executed in a deterministic fashion when the resource becomes eligible for deallocation; at other times, it can be delayed for a later non-deterministic point in time.
  51. 51. Finalization types • Manual deterministic finalization: • It is the client's responsibility to perform finalization. • Automatic non-deterministic finalization: • Must rely on the garbage collector to discover whether an object is referenced. The GC's non-deterministic nature, in turn, implies that finalization will be non-deterministic. At times, this non-deterministic behavior is a show-stopper, because temporary "resource leaks" or holding a shared resource locked for just slightly longer than necessary might be unacceptable behaviors. • Automatic deterministic finalization: • The Dispose pattern.
  52. 52. Automatic non-deterministic finalization • Any type can override the protected Finalize method defined by System.Object to indicate that it requires automatic finalization. • The C# syntax for requesting automatic finalization on the File class is the ~File() method. This method is called a finalizer, and it is invoked when the object is destroyed. • When an object with a finalizer is created, a reference to it is added to a special runtime-managed queue called the finalization queue. This queue is considered a root by the garbage collector, meaning that even if the application has no outstanding reference to the object, it is still kept alive by the finalization queue.
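A sketch of the finalizer syntax; File is the slides' example name, while the handle field and the static flag are illustrative assumptions:

```csharp
using System;

class File
{
    public static bool FinalizerRan; // illustrative flag to observe finalization

    private IntPtr handle = new IntPtr(42); // imagine an unmanaged OS handle

    // C# finalizer syntax: compiles to a protected override of Object.Finalize.
    ~File()
    {
        // Runs on the finalizer thread at a non-deterministic point
        // after the object becomes unreachable.
        handle = IntPtr.Zero; // e.g. close the unmanaged handle here
        FinalizerRan = true;
    }
}

class Program
{
    static void Main()
    {
        new File();                    // immediately unreachable
        GC.Collect();                  // candidate moves to the f-reachable queue
        GC.WaitForPendingFinalizers(); // block until the finalizer thread has run
        Console.WriteLine(File.FinalizerRan); // typically True by this point
    }
}
```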
  53. 53. Automatic non-deterministic finalization • When the object becomes unreferenced by the application and a garbage collection occurs, the GC detects that the only reference to the object is the reference from the finalization queue. • The GC consequently moves the object reference to another runtime-managed queue called the f-reachable queue. This queue is also considered a root, so at this point the object is still referenced and considered alive. • The object's finalizer is not run during garbage collection. Instead, a special thread called the finalizer thread is created during CLR initialization • This thread repeatedly waits for the finalization event to become signaled. The GC signals this event after a garbage collection completes, if objects were moved to the f-reachable queue, and as a result the finalizer thread wakes up
  54. 54. Automatic non-deterministic finalization Pitfalls • Objects with finalizers are guaranteed to reach at least generation 1, which makes them more susceptible to the mid-life crisis phenomenon. This increases the chances of performing many full collections. • Objects with finalizers are slightly more expensive to allocate because they are added to the finalization queue. This introduces contention in multi-processor scenarios. Generally speaking, this cost is negligible compared to the other issues. • Pressure on the finalizer thread (many objects requiring finalization) might cause memory leaks. If the application threads are allocating objects at a higher rate than the finalizer thread is able to finalize them, then the application will steadily leak memory from objects waiting for finalization.
  55. 55. Automatic deterministic finalization: Dispose Pattern • The conventional contract established by the .NET Framework dictates that an object which requires deterministic finalization must implement the IDisposable interface, with a single Dispose method. This method should perform deterministic finalization to release unmanaged resources. • Clients of an object implementing the IDisposable interface are responsible for calling Dispose when they have finished using it. In C#, this can be accomplished with a using block, which wraps object usage in a try…finally block and calls Dispose within the finally block. • Automatic finalization is used as a backup finalization strategy if a client does not call Dispose!
  56. 56. GC.SuppressFinalize • It is a mechanism for instructing the garbage collector that the unmanaged resources have already been released and that automatic finalization is no longer required for a particular object. • Disables finalization by setting a bit in the object's header word • The object still remains in the finalization queue, but most of the finalization cost is not incurred because the object's memory is reclaimed immediately after the first collection, and it is never seen by the finalizer thread.
  57. 57. IDisposable implementation
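The implementation shown on this slide is not visible in the transcript. A minimal sketch of the Dispose pattern as described above; the File name and handle field are illustrative assumptions:

```csharp
using System;

class File : IDisposable
{
    private IntPtr handle = new IntPtr(42); // imagine an unmanaged handle
    private bool disposed;

    public void Dispose()
    {
        Dispose(true);
        // The resource is already released; skip the finalization cost.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;
        if (disposing)
        {
            // release managed resources here
        }
        handle = IntPtr.Zero; // release the unmanaged resource
        disposed = true;
    }

    // Backup finalization if the client forgets to call Dispose.
    ~File()
    {
        Dispose(false);
    }
}

class Program
{
    static void Main()
    {
        // 'using' wraps the object in try...finally and calls Dispose for us.
        using (File f = new File())
        {
            // ...use f...
        }
    }
}
```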
  58. 58. Tools for diagnostics • .NET CLR Memory performance counters (http://msdn.microsoft.com/en-us/library/x2tyfybc(v=vs.110).aspx) • % Time in GC • PerfView • CLR Profiler • ANTS Memory Profiler
  59. 59. Time in GC • To determine whether concurrent GC can provide any benefit for your application, you must first determine how much time it normally spends performing garbage collection. If your application spends 50% of its time reclaiming memory, there remains plenty of room for optimization. On the other hand, if you only perform a collection once in a few minutes, you probably should stick to whatever works for you and pursue significant optimizations elsewhere. You can find out how much time you're spending performing garbage collection through the % Time in GC performance counter in the .NET CLR Memory performance category
  60. 60. Real world stress test results
  61. 61. Event Handler Leak: heap analysis demo
      class Button
      {
          public void OnClick(object sender, EventArgs e)
          {
              // Implementation omitted
          }
      }
      class Program
      {
          static event EventHandler ButtonClick;
          static void Main(string[] args)
          {
              while (true)
              {
                  Button button = new Button();
                  ButtonClick += button.OnClick; // the static event roots every Button
              }
          }
      }
  62. 62. References • Pro .NET Performance by Sasha Goldshtein (Apress, 2012) • C# Heaping vs Stacking: http://www.c-sharpcorner.com/UploadFile/rmcochran/csharp_memory01122006130034PM/csharp_memory.aspx