2. Memory Management
3 March 2016. By: Pooja Mehta
The process of binding values to memory
locations
Whether done automatically (as in
Python), or partially by the programmer
(as in C/C++), dynamic memory
management is an important part of
programming language design.
3. Three Categories of Memory (for Data Storage)
Static: storage requirements are known
prior to run time; lifetime is the entire
program execution
Run-time stack: memory associated with
active functions
Structured as stack frames
Heap: dynamically allocated storage; the
least organized and most dynamic storage
area
6. Memory Leaks and Garbage Collection
Three types of storage
Static
Stack
Heap
Problems with heap storage:
Memory leaks (garbage): failure to free storage when
pointers (references) are reassigned
Dangling pointers: when storage is freed, but
references to the storage still exist.
7. Garbage
Any block of heap memory that cannot be
accessed by the program, i.e., no pointer to
the block is reachable from the program's
variables, but which the runtime system still
treats as in use.
Garbage is created in several ways:
A function ends without returning the space
allocated to a local array or other dynamic
variable.
A node is deleted from a linked data structure, but
isn’t freed
8. Garbage Collection
All inaccessible blocks of storage are identified
and returned to the free list.
The heap may also be compacted at this time:
allocated space is compressed into one end of
the heap, leaving all free space in a large
block at the other end.
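The compaction step can be sketched on a toy heap. This is only an illustration under stated assumptions: the heap is modelled as a Python list in which `None` marks a free slot, and the `compact` function and its forwarding table are hypothetical names, not part of any real collector.

```python
def compact(heap):
    """Slide live blocks to the low end of a toy heap.

    `heap` is a list of slots: None marks a free slot, anything else is a
    live block. Returns the compacted heap and a forwarding table that maps
    each live block's old index to its new index.
    """
    compacted = []
    forwarding = {}
    for old_index, block in enumerate(heap):
        if block is not None:
            forwarding[old_index] = len(compacted)  # where the block moves to
            compacted.append(block)
    free_slots = len(heap) - len(compacted)
    compacted.extend([None] * free_slots)  # one contiguous free region
    return compacted, forwarding


heap = ["A", None, "B", None, None, "C"]
new_heap, forwarding = compact(heap)
print(new_heap)
print(forwarding)
```

A real collector would use the forwarding table to update every pointer that referred to a moved block.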
9. Implementing Automated Garbage Collection
If programmers were perfect, garbage
collection wouldn’t be needed.
There are three major approaches to
automating the process:
Reference counting
Mark-sweep
Copy collection
10. Reference Counting
Prior to Python version 2.0, the Python
interpreter only used reference counting for
memory management.
Reference counting works by counting the
number of times an object is referenced by
other objects in the system.
When references to an object are removed,
the reference count for an object is
decremented.
When the reference count becomes zero, the
object is deallocated.
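The counting behaviour described above can be observed in CPython with `sys.getrefcount`. The sketch below is CPython-specific, and the absolute numbers vary between versions (the call itself may temporarily add a reference), so only the differences between readings are meaningful. `Payload` and `refcount_demo` are hypothetical names used for illustration.

```python
import sys


class Payload:
    """A hypothetical object whose reference count we observe."""


def refcount_demo():
    obj = Payload()
    base = sys.getrefcount(obj)            # baseline reading

    alias = obj                            # a second name: count goes up by one
    with_alias = sys.getrefcount(obj)

    container = [obj]                      # a list slot also holds a reference
    with_container = sys.getrefcount(obj)

    del alias                              # removing references decrements
    container.clear()
    after = sys.getrefcount(obj)
    return base, with_alias, with_container, after


base, with_alias, with_container, after = refcount_demo()
print(with_alias - base, with_container - base, after - base)
```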
11. Mark and Sweep
Reference counting tries to find unreachable
objects by finding objects with no incoming
references.
This is imprecise because a count does not record
where the references come from, so unreachable
objects that still reference each other are missed.
12. Mark and Sweep (cont.)
The root set is the set of memory locations in
the program that are known to be reachable.
Any objects reachable from the root set are
reachable.
Any objects not reachable from the root set are
not reachable.
13. Mark and Sweep (cont.)
Marking phase: Find reachable objects.
Add the root set to a worklist.
While the worklist isn't empty:
Remove an object from the worklist.
If it is not marked, mark it and add to the worklist all
objects reachable from that object.
Sweeping phase: Reclaim free memory.
For each allocated object:
If that object isn't marked, reclaim its memory.
If the object is marked, unmark it (so on the next
mark-and-sweep iteration we have to mark it again).
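The two phases above can be sketched on a toy object graph. This is a minimal illustration, not a real collector: the `Obj` class, the explicit `heap` list, and the `roots` list are assumptions made for the example.

```python
class Obj:
    """A toy heap object: a name, outgoing references, and a mark bit."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Marking phase: mark everything reachable from the root set."""
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if not obj.marked:
            obj.marked = True
            worklist.extend(obj.refs)


def sweep(heap):
    """Sweeping phase: drop unmarked objects, unmark the survivors."""
    survivors = []
    for obj in heap:
        if obj.marked:
            obj.marked = False      # reset for the next collection
            survivors.append(obj)
    return survivors


a, b, c, d = (Obj(n) for n in "abcd")
a.refs.append(b)
c.refs.append(d)
d.refs.append(c)                    # c and d form a cycle no root can reach

heap = [a, b, c, d]
roots = [a]
mark(roots)
heap = sweep(heap)
live_names = [obj.name for obj in heap]
print(live_names)
```

Unlike reference counting, the unreachable cycle between `c` and `d` is reclaimed, because reachability is computed from the roots rather than from incoming reference counts.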
14. Copy Collection
Divide heap into two “semispaces”.
Allocate from one space (fromspace) till full.
Copy live data into other space (tospace).
Switch roles of the spaces.
Requires fixing pointers to moved data
(forwarding).
Eliminates fragmentation.
DFS improves locality, while BFS does not
require any extra storage.
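A breadth-first copying collector can be sketched as follows. This is a simplified model under stated assumptions: `Cell`, `copy_collect`, and the `forward` field are hypothetical names, and tospace is a Python list rather than a raw memory region. The tospace list doubles as the BFS queue, which is why no extra storage is needed.

```python
class Cell:
    """A toy heap cell with a value, outgoing references, and a forwarding pointer."""

    def __init__(self, value, refs=None):
        self.value = value
        self.refs = refs if refs is not None else []
        self.forward = None         # set once the cell has been copied


def copy_collect(roots):
    """Copy everything reachable from `roots` into a fresh tospace."""
    tospace = []

    def forward(cell):
        # Copy the cell on first visit; afterwards reuse the forwarding pointer.
        if cell.forward is None:
            clone = Cell(cell.value, list(cell.refs))
            cell.forward = clone
            tospace.append(clone)
        return cell.forward

    new_roots = [forward(root) for root in roots]
    scan = 0
    while scan < len(tospace):      # tospace itself serves as the BFS queue
        cell = tospace[scan]
        cell.refs = [forward(child) for child in cell.refs]  # fix pointers
        scan += 1
    return new_roots, tospace


a = Cell("a")
b = Cell("b")
garbage = Cell("g")
a.refs = [b]
b.refs = [a]                        # a cycle among live data
garbage.refs = [a]                  # garbage may point at live data

new_roots, tospace = copy_collect([a])
print([cell.value for cell in tospace])
```

Only live cells are visited, so the unreachable `garbage` cell is never copied, and the survivors end up contiguous in tospace.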
The user does not have to preallocate or deallocate memory by hand as one has to when using dynamic memory allocation in languages such as C or C++.
The easiest way to create a reference cycle is to create an object which refers to itself as in the example below:
def make_cycle():
    l = []
    l.append(l)  # the list now holds a reference to itself

make_cycle()
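Reference counting is simple and frees most objects as soon as their last reference goes away, but it does have some caveats.
One such caveat is that it cannot handle reference cycles.
A reference cycle occurs when objects refer to each other in a loop (or an object refers to itself). Once such a cycle can no longer be reached from the program, every reference count in it is still greater than zero, so counting alone never frees it.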
Problems
Memory Fragmentation
Work proportional to size of heap
Potential lack of locality of reference for newly allocated objects (fragmentation)
Solution
Compaction
Contiguous live objects, contiguous free space
Automatic Garbage Collection of Cycles
Because reference cycles take computational work to discover, garbage collection must be a scheduled activity. Python schedules garbage collection based upon a threshold of object allocations and object deallocations. When the number of allocations minus the number of deallocations is greater than the threshold number, the garbage collector is run. One can inspect the threshold for new objects (known in Python as generation 0 objects) by loading the gc module and asking for the garbage collection thresholds.
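Inspecting the thresholds might look like the sketch below. It uses only `gc.get_threshold()` and `gc.get_count()` from the standard library; the actual values depend on the Python version and on any earlier `gc.set_threshold()` call, so none are shown here.

```python
import gc

# A tuple (threshold0, threshold1, threshold2); the first entry is the
# generation 0 threshold discussed above.
threshold = gc.get_threshold()
generation0_threshold = threshold[0]

# Current collection counts as a tuple (count0, count1, count2).
counts = gc.get_count()

print("thresholds:", threshold)
print("generation 0 threshold:", generation0_threshold)
print("current counts:", counts)
```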
Because make_cycle() creates an object l which refers to itself, reference counting alone will not free l when the function returns.
The memory that l is using is therefore held until the Python garbage collector is invoked.
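The effect can be observed in CPython with a small experiment. Plain lists cannot be weakly referenced, so this sketch swaps in a hypothetical `Node` class that refers to itself, and uses a weak reference as a probe to check whether the object still exists. Automatic collection is disabled during the demo so that only the explicit `gc.collect()` call reclaims the cycle.

```python
import gc
import weakref


class Node:
    """A hypothetical object that refers to itself, forming a cycle."""

    def __init__(self):
        self.me = self


def make_cycle():
    node = Node()
    return weakref.ref(node)    # a weak reference does not keep node alive


gc.disable()                    # keep the automatic collector out of the demo
try:
    probe = make_cycle()
    alive_before = probe() is not None   # the cycle keeps the object alive
    gc.collect()                         # explicitly invoke the cycle collector
    alive_after = probe() is not None    # the cycle has now been reclaimed
finally:
    gc.enable()

print(alive_before, alive_after)
```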