This document provides an overview of garbage collection. It defines garbage collection as the automatic reclamation of dynamically allocated memory after it is no longer used by a program. It describes different types of memory allocation and discusses the challenges of manual memory management. The document then introduces the key concepts of garbage collection, including reachable nodes, mark-sweep collection, and assumptions of collectors. It provides examples of mark-sweep collection and discusses properties, variations, implementation, efficiency analysis, and applications of garbage collection. Finally, it outlines benefits of understanding garbage collection.
1. Garbage Collection
Introduction and Overview
Christian Schulte
Programming Systems Lab
Universität des Saarlandes, Germany
schulte@ps.uni-sb.de
3. Purpose of Talk
Explaining basic
concepts
terminology
(never to be explained again)
Garbage collection…
…is simple
…can be explained at a high-level
Organization
4. Overview
What is garbage collection
objects of interest
principal notions
classic examples with assumptions and properties
Discussion
software engineering issues
typical cost
areas of usage
why knowledge is profitable
Organizational
Material
Requirements
9. Garbage Collection…
dynamically allocated memory
last use by a program
automatic reclamation
…is concerned with the automatic reclamation of dynamically allocated memory after its last use by a program
11. Kinds of Memory Allocation
static int i;
void foo(void) {
int j;
int* p = (int*) malloc(…);
}
12. Static Allocation
By compiler (in static data area)
Available through entire runtime
Fixed size
13. Automatic Allocation
Upon procedure call (on stack)
Available during execution of call
Fixed size
14. Dynamic Allocation
Dynamically allocated at runtime (on heap)
Available until explicitly deallocated
Dynamically varying size
15. Dynamically Allocated Memory
Also: heap-allocated memory
Allocation: malloc, new, …
before first usage
Deallocation: free, delete, dispose, …
after last usage
Needed for
C++, Java: objects
SML: datatypes, procedures
anything that outlives procedure call
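A minimal sketch of the last point, assuming a hypothetical helper `make_counts` (not from the talk): the array is allocated inside the call, stays valid after the call returns, and must be freed by the caller after its last use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns an array of n ints that outlives the call.
   The caller owns the memory and must free() it after its last use. */
int* make_counts(size_t n) {
    assert(n > 0);                    /* this sketch does not handle empty arrays */
    int* p = malloc(n * sizeof *p);   /* dynamic allocation (on heap) */
    if (p == NULL)
        return NULL;                  /* allocation can fail */
    for (size_t i = 0; i < n; i++)
        p[i] = (int)i;                /* first usage happens after allocation */
    return p;                         /* still valid after the call returns */
}
```

A caller would typically write `int* a = make_counts(4);`, use `a[0]` to `a[3]`, and then call `free(a);` exactly once.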
16. Getting it Wrong
Forget to free (memory leak)
program eventually runs out of memory
long-running programs: OSs, servers, …
Free too early (dangling pointer)
lucky: illegal access detected by OS
horror: memory reused, in simultaneous use
programs can behave arbitrarily
crashes might happen much later
Estimates of effort
Up to 40%! [Rovner, 1985]
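The "forget to free" case can be made visible with a small counting wrapper. This is only a sketch: `tracked_malloc`, `tracked_free`, `leaky`, `tidy` and `outstanding_blocks` are hypothetical names introduced here, not part of any library. The dangling-pointer case is deliberately not demonstrated, because accessing freed memory is undefined behavior in C.

```c
#include <assert.h>
#include <stdlib.h>

static long live_blocks = 0;   /* blocks allocated but not yet freed */

/* Hypothetical wrapper around malloc that counts successful allocations. */
void* tracked_malloc(size_t size) {
    void* p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* Hypothetical wrapper around free that counts deallocations. */
void tracked_free(void* p) {
    if (p != NULL) {
        assert(live_blocks > 0);   /* more frees than allocations is a bug */
        live_blocks--;
        free(p);
    }
}

/* Forgets to free: every call leaks one block. */
void leaky(void) {
    int* p = tracked_malloc(sizeof *p);
    if (p != NULL)
        *p = 42;
    /* missing tracked_free(p): the only pointer to the block is lost here */
}

/* Frees after last use: no block stays behind. */
void tidy(void) {
    int* p = tracked_malloc(sizeof *p);
    if (p != NULL)
        *p = 42;
    tracked_free(p);
}

long outstanding_blocks(void) {
    return live_blocks;
}
```

After one call to `leaky()` the counter stays at one for the rest of the run, while calls to `tidy()` leave it unchanged. In a long-running program such leftovers accumulate.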
17. Nodes and Pointers
Node n
Memory block, cell
Pointer p
Link to node
Node access: *p
Children children(n)
set of pointers to nodes referred to by n
18. Mutator
Abstraction of program
introduces new nodes with pointers
redirects pointers, creating garbage
19. Shared Nodes
Nodes referred to by several pointers
Makes manual deallocation hard
local decision impossible
respect other pointers to node
Cycles are an instance of sharing
21. Last Use by a Program
Question: When is node M no longer used by the program?
Let P be any program not using M
New program sketch:
Execute P; Use M;
Hence:
M used ⇔ P terminates
We are doomed: halting problem!
So “last use” is undecidable!
22. Safe Approximation
Decidable and also simple
What does safe mean?
only unused nodes are freed
What does approximation mean?
some unused nodes might not be freed
Idea
nodes that can be accessed by mutator
23. Reachable Nodes
Reachable from root set
processor registers
static variables
automatic variables (stack)
Reachable from reachable nodes
24. Summary: Reachable Nodes
A node n is reachable iff
n is an element of the root set, or
n is an element of children(m) and m is reachable
Reachable node also called “live”
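The definition above can be read as a fixpoint computation: start with the root set and keep adding children of reachable nodes until nothing changes. The sketch below assumes a tiny graph of `N` nodes given as an adjacency matrix; `N`, `edge`, `is_root` and `reachable_set` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N 5   /* number of nodes in the example graph (assumption) */

/* edge[m][n] is true iff n is an element of children(m).
   On return, reach[n] is true iff node n is reachable from the root set.
   Returns the number of reachable nodes. */
size_t reachable_set(bool edge[N][N], const bool is_root[N], bool reach[N]) {
    assert(edge != NULL && is_root != NULL && reach != NULL);

    /* First clause: every element of the root set is reachable. */
    for (size_t n = 0; n < N; n++)
        reach[n] = is_root[n];

    /* Second clause: children of reachable nodes are reachable.
       Repeat until a full pass adds no new node. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t m = 0; m < N; m++) {
            if (!reach[m])
                continue;
            for (size_t n = 0; n < N; n++) {
                if (edge[m][n] && !reach[n]) {
                    reach[n] = true;
                    changed = true;
                }
            }
        }
    }

    size_t count = 0;
    for (size_t n = 0; n < N; n++)
        if (reach[n])
            count++;
    return count;
}
```

This form is chosen only because it mirrors the two clauses of the definition directly. It does not depend on the order in which nodes are visited, and it terminates on cyclic graphs because each pass either adds a node or stops.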
26. Reachability: Safe Approximation
Safe
access to an unreachable node is impossible
depends on language semantics
but C/C++? later…
Approximation
reachable node might never be accessed
programmer must know about this!
have you been aware of this?
28. Example Garbage Collectors
Mark-Sweep
Others
Mark-Compact
Reference Counting
Copying
skipped here
read Chapters 1 & 2 of [Jones & Lins, 96]
29. The Mark-Sweep Collector
Compute reachable nodes: Mark
tracing garbage collector
Free unreachable nodes: Sweep
Run when out of memory: Allocation
First used with LISP [McCarthy, 1960]
36. Recursive Marking
void mark(node* n) {
if (!is_marked(n)) {
set_mark(n);
for (m in children(n))
mark(m);
}
}
i-th recursion: nodes on a path of length i are marked
41. Eager Sweep
void sweep() {
node* n = heap_bottom;
while (n < heap_top) {
if (is_marked(n)) clear_mark(n);
else free(n);
n++;  /* next node; pointer arithmetic already advances by sizeof(*n) bytes */
}
}
42. The Garbage Collector
void mark_sweep() {
for (r in roots)
mark(r);
sweep();
if (free_pool is empty)
abort("Memory exhausted");
}
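Putting the three pieces together, a toy collector might look as follows. This is a sketch under strong assumptions, not the talk's implementation: fixed-size nodes with two child fields, a small contiguous array as heap, an explicit `roots` array standing in for registers, static variables and the stack, and a `new_node` that returns `NULL` instead of aborting when memory is exhausted. All names except `mark`, `sweep`, `mark_sweep` and `free_pool` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE    8   /* number of nodes in the toy heap (assumption) */
#define NUM_CHILDREN 2   /* every node has two child fields (assumption) */
#define MAX_ROOTS    4   /* size of the stand-in root set (assumption) */

typedef struct node {
    struct node* child[NUM_CHILDREN];   /* child fields are known */
    bool marked;                        /* nodes can be marked */
} node;

node  heap[HEAP_SIZE];    /* contiguous heap of fixed-size nodes */
node* roots[MAX_ROOTS];   /* explicit root set */
static node* free_pool;   /* free nodes, linked through child[0] */

/* Put every node on the free pool and clear the root set. */
void heap_init(void) {
    free_pool = NULL;
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        heap[i].marked = false;
        for (size_t j = 0; j < NUM_CHILDREN; j++)
            heap[i].child[j] = NULL;
        heap[i].child[0] = free_pool;
        free_pool = &heap[i];
    }
    for (size_t i = 0; i < MAX_ROOTS; i++)
        roots[i] = NULL;
}

/* Mark: recursively mark everything reachable from n. */
static void mark(node* n) {
    if (n == NULL || n->marked)
        return;
    n->marked = true;
    for (size_t i = 0; i < NUM_CHILDREN; i++)
        mark(n->child[i]);
}

/* Sweep: walk the whole heap, keep marked nodes (clearing their mark)
   and rebuild the free pool from all unmarked ones.
   Returns the number of nodes on the free pool afterwards. */
static size_t sweep(void) {
    size_t free_nodes = 0;
    free_pool = NULL;
    for (node* n = heap; n < heap + HEAP_SIZE; n++) {
        if (n->marked) {
            n->marked = false;
        } else {
            n->child[0] = free_pool;
            free_pool = n;
            free_nodes++;
        }
    }
    return free_nodes;
}

/* Collect: mark from all roots, then sweep. */
size_t mark_sweep(void) {
    for (size_t i = 0; i < MAX_ROOTS; i++)
        mark(roots[i]);
    return sweep();
}

/* Allocate: run the collector only when the free pool is empty. */
node* new_node(void) {
    if (free_pool == NULL)
        mark_sweep();
    if (free_pool == NULL)
        return NULL;               /* memory exhausted */
    node* n = free_pool;
    free_pool = n->child[0];
    assert(!n->marked);            /* marks are clear outside a collection */
    for (size_t i = 0; i < NUM_CHILDREN; i++)
        n->child[i] = NULL;
    return n;
}
```

Note that C local variables are not roots in this sketch: a node returned by `new_node` survives the next collection only if it is stored in `roots` or in a child field of a reachable node.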
43. Assumptions
Nodes can be marked
Size of nodes known
Heap contiguous
Memory for recursion available
Child fields known!
44. Assumptions: Realistic
Nodes can be marked
Size of nodes known
Heap contiguous
Memory for recursion available
Child fields known
45. Assumptions: Conservative
Nodes can be marked
Size of nodes known
Heap contiguous
Memory for recursion available
Child fields known
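The "memory for recursion available" assumption can be relaxed by replacing the recursion with an explicit stack. The following is one possible variation, sketched under the assumption that the graph has at most `NUM_NODES` nodes with two child fields each; `mark_iterative` and the constants are hypothetical names, and this is not the marking procedure shown in the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_NODES    6   /* upper bound on the number of nodes (assumption) */
#define NUM_CHILDREN 2   /* every node has two child fields (assumption) */

typedef struct node {
    struct node* child[NUM_CHILDREN];
    bool marked;
} node;

/* Marks everything reachable from root without recursion.
   A node is marked when it is pushed, so each node is pushed at most
   once and a stack of NUM_NODES entries is enough.
   Returns the number of nodes newly marked by this call. */
size_t mark_iterative(node* root) {
    node*  stack[NUM_NODES];
    size_t top = 0;
    size_t count = 0;

    if (root == NULL || root->marked)
        return 0;
    root->marked = true;
    stack[top++] = root;

    while (top > 0) {
        node* n = stack[--top];
        count++;
        for (size_t i = 0; i < NUM_CHILDREN; i++) {
            node* m = n->child[i];
            if (m != NULL && !m->marked) {
                m->marked = true;
                assert(top < NUM_NODES);   /* holds if the node bound holds */
                stack[top++] = m;
            }
        }
    }
    return count;
}
```

The memory needed for marking is now bounded by the number of nodes and allocated in one place, instead of depending on the depth of the call stack.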
46. Mark-Sweep Properties
Covers cycles and sharing
Time depends on
live nodes (mark)
live and garbage nodes (sweep)
Computation must be stopped
non-interruptible stop/start collector
long pause
Nodes remain unchanged (as not moved)
Heap remains fragmented
53. Software Engineering Issues
Design goal in SE:
decompose systems into orthogonal components
Clashes with letting each component do its own memory management
liveness is global property
leads to “local leaks”
lacking power of modern gc methods
54. Typical Cost
Early systems (LISP)
up to 40% [Steele,75] [Gabriel,85]
“garbage collection is expensive” myth
Well-engineered systems of today
10% of entire runtime [Wilson, 94]
55. Areas of Usage
Programming languages and systems
Java, C#, Smalltalk, …
SML, Lisp, Scheme, Prolog, …
Modula 3, Microsoft .NET
Extensions
C, C++ (Conservative)
Other systems
Adobe Photoshop
Unix filesystem
Many others in [Wilson, 1996]
56. Understanding Garbage Collection: Benefits
Programming garbage collection
programming systems
operating systems
Understand systems with garbage collection (e.g. Java)
memory requirements of programs
performance aspects of programs
interfacing with garbage collection (finalization)
58. Material
Garbage Collection. Richard Jones and Rafael Lins, John Wiley & Sons, 1996.
Uniprocessor garbage collection techniques. Paul R. Wilson, ACM Computing Surveys. To appear. Extended version of IWMM 92, St. Malo.
59. Organization
Requirements
Talk
duration 45 min (excluding discussion)
Attendance
including discussion
Written summary
10 pages
to be submitted in PDF by Mar 31st, 2002
Schedule
weekly
starting Nov 14th, 2001
next on Dec 5th, 2001