Basic Garbage Collection Techniques

Published in: Technology, Education

  1. 1. [email_address] Basic Garbage Collection Techniques
  2. 2. Contents <ul><li>What is Garbage Collection </li></ul><ul><li>Reference Counting </li></ul><ul><li>Mark-Sweep Collection </li></ul><ul><li>Mark-Compact Collection </li></ul><ul><li>Copying Garbage Collection </li></ul><ul><li>Non-Copying Implicit Collection </li></ul><ul><li>Choosing Among Basic Tracing Techniques </li></ul><ul><li>Problems with Simple Tracing Collectors </li></ul><ul><li>Conservatism in GC </li></ul>
  3. 3. 1. Garbage Collection <ul><li>Garbage Collection (GC) </li></ul><ul><li>The automatic reclamation of computer storage </li></ul><ul><li>The function of the GC is to find data objects that are no longer in use and make their space available for reuse by the running program </li></ul><ul><li>So Why Garbage Collection? </li></ul><ul><li>A software routine operating on a data structure should not have to depend on what other routines may be operating on the same structure </li></ul><ul><li>If a process does not free memory it no longer uses, the unused space accumulates until the process terminates or swap space is exhausted </li></ul>
  4. 4. Explicit Storage Management Hazards <ul><li>Programming errors may lead to errors in the storage management: </li></ul><ul><ul><li>May reclaim space too early, while it is still in use </li></ul></ul><ul><ul><li>May not reclaim space at all, causing memory leaks </li></ul></ul><ul><li>These errors are particularly dangerous since they tend to show up after delivery </li></ul><ul><li>Many programmers allocate several objects statically, to avoid allocating them on the heap and reclaiming them at a later stage </li></ul>
  5. 5. Explicit Storage Management Hazards <ul><li>In many large systems, garbage collection is partially implemented in the system’s objects </li></ul><ul><li>This happens when garbage collection is not supported by the programming language </li></ul><ul><li>It leads to buggy, partial garbage collectors which are not usable by other applications </li></ul><ul><li>The purpose of GC is to address these issues </li></ul>
  6. 6. GC Complexity <ul><li>Garbage Collection is sometimes considered cheaper than explicit deallocation </li></ul><ul><li>A good Garbage Collector slows a program down by about 10 percent </li></ul><ul><li>Although this seems like a lot, it is only a small price to pay for: </li></ul><ul><ul><li>Convenience </li></ul></ul><ul><ul><li>Development time </li></ul></ul><ul><ul><li>Reliability </li></ul></ul>
  7. 7. Garbage Collection – The Two-Phase Abstraction <ul><li>The basic functioning of a garbage collector consists, abstractly speaking, of two parts: </li></ul><ul><ul><li>Distinguishing the live objects from the garbage in some way (garbage detection) </li></ul></ul><ul><ul><li>Reclaiming the garbage objects’ storage, so that the running program can use it (garbage reclamation) </li></ul></ul><ul><li>In practice these two phases may be interleaved </li></ul>
  8. 8. 2. Reference Counting <ul><li>Each object has an associated count of the references (pointers) to it </li></ul><ul><li>Each time a reference to the object is created, its reference count is incremented by one; each time a reference is destroyed, the count is decremented by one </li></ul><ul><li>When the reference count reaches 0, the object’s space may be reclaimed </li></ul>
  9. 9. Reference Counting <ul><li>When an object is reclaimed, its pointer fields are examined and every object it points to has its reference count decremented </li></ul><ul><li>Reclaiming one object may therefore lead to a series of object reclamations </li></ul><ul><li>There are two major problems with reference counting </li></ul>
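The counting rules and the cascading reclamation described above can be sketched in Python. This is a minimal illustration, not a real allocator: the `Obj` class, the helper names `inc_ref`, `dec_ref`, and `set_field`, and the `reclaimed` log are all assumptions made for the example.

```python
class Obj:
    """A heap object with a reference count and pointer fields (hypothetical model)."""
    def __init__(self, name):
        self.name = name
        self.rc = 0        # number of references to this object
        self.fields = {}   # pointer fields: field name -> Obj

reclaimed = []  # names of reclaimed objects, in order (for illustration only)

def inc_ref(obj):
    if obj is not None:
        obj.rc += 1

def dec_ref(obj):
    if obj is None:
        return
    obj.rc -= 1
    if obj.rc == 0:
        # Reclaim: examine the pointer fields and decrement every target,
        # which may trigger further reclamations.
        reclaimed.append(obj.name)
        for child in obj.fields.values():
            dec_ref(child)
        obj.fields.clear()

def set_field(obj, field, target):
    """Store a pointer: count the new reference before dropping the old one."""
    inc_ref(target)
    old = obj.fields.get(field)
    obj.fields[field] = target
    dec_ref(old)

a, b, c = Obj("a"), Obj("b"), Obj("c")
inc_ref(a)                 # a root variable points to a
set_field(a, "next", b)    # a -> b
set_field(b, "next", c)    # b -> c
dec_ref(a)                 # the root reference goes away
print(reclaimed)
```

Dropping the single root reference reclaims the whole chain, one object after another.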
  10. 10. The Cycles Problem <ul><li>Reference Counting fails to reclaim circular structures </li></ul><ul><li>The problem originates from the definition of garbage: objects in an unreachable cycle still have nonzero reference counts </li></ul><ul><li>Circular structures are not rare in modern programs: </li></ul><ul><ul><li>Trees with parent pointers </li></ul></ul><ul><ul><li>Cyclic data structures </li></ul></ul><ul><li>The solution is up to the programmer </li></ul>
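A tiny sketch of the cycle problem, assuming a hypothetical `Node` class with a manually maintained count `rc` and a helper `point`:

```python
class Node:
    """Hypothetical object with a manually maintained reference count."""
    def __init__(self):
        self.rc = 0
        self.next = None

def point(src, dst):
    """Make src.next refer to dst and count the new reference."""
    src.next = dst
    dst.rc += 1

x, y = Node(), Node()
x.rc += 1      # a root variable refers to x
point(x, y)    # x -> y
point(y, x)    # y -> x closes the cycle
x.rc -= 1      # the root reference is dropped

# Neither object is reachable from a root any more,
# yet both counts are still nonzero, so neither is reclaimed.
print(x.rc, y.rc)
```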
  11. 11. The Efficiency Problem <ul><li>When a pointer is created or destroyed, the reference count of the object it points to must be adjusted </li></ul><ul><li>Short-lived stack variables can incur a great deal of overhead in a simple reference counting scheme </li></ul><ul><li>In these cases, reference counts are incremented and then decremented back very soon </li></ul>
  12. 12. Deferred Reference Counting <ul><li>Much of this cost can be optimized away by special treatment of local variables </li></ul><ul><li>References from local variables are not included in this bookkeeping </li></ul><ul><li>However, we cannot ignore pointers from the stack completely </li></ul><ul><li>Therefore the stack is scanned before objects are reclaimed, and an object is reclaimed only if its reference count is still 0 once the stack references are taken into account </li></ul>
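One way to picture deferred reference counting is with a table of objects whose heap count is zero, checked against the stack before anything is freed. The sketch below assumes that design; the names `zct`, `stack`, `new`, `add_heap_ref`, `remove_heap_ref`, and `reclaim` are hypothetical and not taken from the slides.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.heap_rc = 0   # counts references from heap objects only
        self.fields = []

zct = set()    # zero-count table: objects whose heap_rc is currently 0
stack = []     # local variables; references held here are NOT counted

def new(name):
    obj = Obj(name)
    zct.add(obj)           # a fresh object has no heap references yet
    return obj

def add_heap_ref(src, dst):
    src.fields.append(dst)
    dst.heap_rc += 1
    zct.discard(dst)

def remove_heap_ref(src, dst):
    src.fields.remove(dst)
    dst.heap_rc -= 1
    if dst.heap_rc == 0:
        zct.add(dst)

def reclaim():
    """Scan the stack, then free zero-count objects the stack does not reference."""
    on_stack = {id(o) for o in stack}
    freed = []
    work = [o for o in zct if id(o) not in on_stack]
    while work:
        obj = work.pop()
        zct.discard(obj)
        freed.append(obj.name)
        for child in obj.fields:
            child.heap_rc -= 1
            if child.heap_rc == 0:
                if id(child) in on_stack:
                    zct.add(child)      # still protected by a local variable
                else:
                    work.append(child)
        obj.fields = []
    return sorted(freed)

a, b, tmp = new("a"), new("b"), new("tmp")
stack.append(a)        # locals hold a and tmp: no count updates at all
stack.append(tmp)
add_heap_ref(a, b)     # a heap pointer: counted
stack.remove(tmp)      # the local goes out of scope: still no count update
freed = reclaim()
print(freed)
```

Object `a` has a heap count of 0 but survives because the stack scan finds it; `tmp` has a count of 0 and no stack reference, so it is the only object freed.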
  13. 13. Reference Counting - Recap <ul><li>Reference counting is out of vogue for high-performance applications </li></ul><ul><li>It is still quite common in applications where acyclic data structures are used </li></ul><ul><li>Most file systems use reference counting to manage files and/or disk blocks </li></ul><ul><li>Very simple scheme </li></ul>
  14. 14. 3. Mark-Sweep Collection - [McCarthy 1960] <ul><li>Distinguishing live objects from garbage (Mark phase) </li></ul><ul><ul><li>Done by tracing: starting at the root set and traversing the graph of pointer relationships </li></ul></ul><ul><ul><li>The reached objects are marked </li></ul></ul><ul><li>Reclaiming the garbage (Sweep phase) </li></ul><ul><ul><li>Once all live objects are marked, memory is exhaustively examined to find all of the unmarked (garbage) objects and reclaim their space </li></ul></ul>
  15. 15. Example [Figure: the heap before collection, after the mark phase, and after the sweep phase, showing the roots and the free list]
  16. 16. Example [Figure: pointer p and root r1 over a heap of live objects, marked objects, and garbage, with the free list]
  17. 17. Basic Algorithm
      New(A) =
          if free_list is empty
              mark_sweep()
              if free_list is empty
                  return ("out-of-memory")
          pointer = allocate(A)
          return (pointer)
      mark_sweep() =
          for Ptr in Roots
              mark(Ptr)
          sweep()
      mark(Obj) =
          if mark_bit(Obj) == unmarked
              mark_bit(Obj) = marked
              for C in Children(Obj)
                  mark(C)
      sweep() =
          p = Heap_bottom
          while (p < Heap_top)
              if (mark_bit(p) == unmarked) then free(p)
              else mark_bit(p) = unmarked
              p = p + size(p)
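The pseudocode above can be mirrored in a small runnable Python sketch. It models the heap as a plain list of `Cell` objects rather than raw memory, so the `Cell` class and the list-based `heap` and `free_list` are assumptions made for illustration.

```python
class Cell:
    """Hypothetical heap cell with a mark bit and child pointers."""
    def __init__(self, name):
        self.name = name
        self.marked = False
        self.children = []

def mark(obj):
    # Recursive marking, as in the slide's pseudocode.
    if not obj.marked:
        obj.marked = True
        for child in obj.children:
            mark(child)

def sweep(heap, free_list):
    # Examine every cell: unmarked cells are garbage,
    # marked cells have their bit cleared for the next cycle.
    survivors = []
    for obj in heap:
        if obj.marked:
            obj.marked = False
            survivors.append(obj)
        else:
            free_list.append(obj)
    heap[:] = survivors

def mark_sweep(roots, heap, free_list):
    for root in roots:
        mark(root)
    sweep(heap, free_list)

a, b, c, d = (Cell(n) for n in "abcd")
a.children.append(b)      # a -> b, reachable from the root
c.children.append(d)      # c <-> d is an unreachable cycle
d.children.append(c)
heap, free_list = [a, b, c, d], []
mark_sweep([a], heap, free_list)
print([o.name for o in heap], [o.name for o in free_list])
```

Unlike reference counting, the tracing collector reclaims the unreachable cycle `c <-> d`.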
  18. 18. Properties of Mark & Sweep <ul><li>Fragmentation </li></ul><ul><ul><li>Difficult to allocate large objects </li></ul></ul><ul><ul><li>The space of several small freed objects may add up to a large amount, but it is not contiguous </li></ul></ul><ul><li>Cost </li></ul><ul><ul><li>Proportional to the heap size, including both live and garbage objects </li></ul></ul><ul><li>Locality of reference: </li></ul><ul><ul><li>Does not move objects </li></ul></ul><ul><ul><li>Interleaving objects of different ages causes many page swaps (because objects are often created in clusters that are active at the same time) </li></ul></ul>
  19. 19. Standard Engineering Techniques <ul><li>Mark: </li></ul><ul><ul><li>Explicit mark-stack (avoid recursion) </li></ul></ul><ul><ul><li>Pointer reversal </li></ul></ul><ul><ul><li>Using bitmaps </li></ul></ul><ul><li>Sweep: </li></ul><ul><ul><li>Using bitmaps </li></ul></ul><ul><ul><li>Lazy sweep </li></ul></ul>
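As a sketch of the first marking technique listed above, the recursion can be replaced by an explicit mark stack. The graph representation (a dict from node to list of children) and the function name `mark_iterative` are assumptions for the example.

```python
def mark_iterative(roots, children):
    """Mark everything reachable from roots using an explicit stack.

    children maps a node to the list of nodes it points to
    (a hypothetical stand-in for an object's pointer fields).
    """
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        # Push only unmarked children to keep the stack small.
        stack.extend(c for c in children.get(obj, ()) if c not in marked)
    return marked

# A chain of 100,001 nodes: deep enough that naive recursive marking
# would exceed Python's default recursion limit.
chain = {i: [i + 1] for i in range(100_000)}
marked = mark_iterative([0], chain)
print(len(marked))
```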
  20. 20. 4. Mark-Compact Collection Compact after mark <ul><li>Mark-Compact collectors remedy the fragmentation and allocation problems of mark-sweep collectors </li></ul><ul><li>The collector traverses the pointer graph and copies every live object after the previous one </li></ul><ul><li>This results in one big contiguous space which contains live objects and another which is considered free space </li></ul>
  21. 21. Mark-Compact Collection <ul><li>Garbage objects are “squeezed” to the end of the memory </li></ul><ul><li>The process requires several passes over the memory: </li></ul><ul><ul><li>One to compute the new locations for objects </li></ul></ul><ul><ul><li>Subsequent passes update pointers and actually move the objects </li></ul></ul><ul><li>The algorithm can be significantly slower than Mark-Sweep Collection </li></ul><ul><li>Two Algorithms: </li></ul><ul><ul><li>Two-finger algorithm, for objects of equal size </li></ul></ul><ul><ul><li>Lisp 2 algorithm </li></ul></ul>
  22. 22. Object Ordering <ul><li>Arbitrary: no guaranteed order. </li></ul><ul><li>Linearizing: objects pointing to one another are moved into adjacent positions. </li></ul><ul><li>Sliding: objects are slid to one end of the heap, maintaining the original order of allocation. </li></ul>[Figure: example orderings of objects 1 to 4 under each policy, rated Arbitrary: worst, Linearizing: good, Sliding: best]
  23. 23. The Two Finger Algorithm [Edwards 1974] <ul><li>Simplest algorithm: </li></ul><ul><ul><li>Designed for objects of equal size </li></ul></ul><ul><ul><li>Order of objects in output is arbitrary. </li></ul></ul><ul><ul><li>Two passes: compact & update pointers </li></ul></ul>
  24. 24. Pass I: Compact <ul><li>Use two pointers: </li></ul><ul><ul><li>free: goes over the heap from the bottom searching for free/empty objects. </li></ul></ul><ul><ul><li>live: goes over the heap from the top downwards searching for live objects. </li></ul></ul><ul><ul><li>When free finds a free spot and live finds an object, the object is moved to the free spot. </li></ul></ul><ul><li>When an object is moved, a pointer to its new location is left at its old location </li></ul>
  25. 25. Two finger, pass I - Example
  26. 26. Pass II: Fix Pointers <ul><li>Go over live objects in the heap </li></ul><ul><ul><li>Go over their pointers </li></ul></ul><ul><ul><li>If a pointer points into the free area, fix it according to the forwarding pointer left at the target object's old location. </li></ul></ul>[Figure: the heap divided into the compacted area and the free area]
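Both passes of the two-finger algorithm can be sketched over a list of equal-size cells. The representation is an assumption made for illustration: each live cell is a dict with a `ptrs` list of heap indices, `marked` holds the mark bits from a previous mark phase, and a moved object leaves the tuple `("forward", new_index)` at its old slot.

```python
def two_finger_compact(heap, marked, roots):
    """Compact equal-size cells in place; returns (new_roots, n_live)."""
    n_live = sum(marked)
    free, live = 0, len(heap) - 1

    # Pass I: move live objects from the top into free slots at the bottom.
    while True:
        while free < live and marked[free]:
            free += 1                     # skip occupied slots
        while free < live and not marked[live]:
            live -= 1                     # skip garbage at the top
        if free >= live:
            break
        heap[free] = heap[live]
        heap[live] = ("forward", free)    # forwarding pointer at old location
        marked[free], marked[live] = True, False

    # Pass II: pointers into the free area are redirected via forwarding pointers.
    def fix(p):
        return heap[p][1] if p >= n_live else p

    for i in range(n_live):
        heap[i]["ptrs"] = [fix(p) for p in heap[i]["ptrs"]]
    return [fix(r) for r in roots], n_live

heap = [
    {"name": "a", "ptrs": [5]},
    {"name": "x", "ptrs": []},   # garbage
    {"name": "b", "ptrs": [4]},
    {"name": "y", "ptrs": []},   # garbage
    {"name": "c", "ptrs": [0]},
    {"name": "d", "ptrs": [2]},
]
marked = [True, False, True, False, True, True]
new_roots, n_live = two_finger_compact(heap, marked, roots=[0])
print([cell["name"] for cell in heap[:n_live]])
```

The survivors end up in the order a, d, b, c, which illustrates that the two-finger algorithm does not preserve allocation order.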
  27. 27. Two finger – Properties <ul><li>Simple! </li></ul><ul><li>Relatively fast: one pass + a pass over live objects (and their previous locations). </li></ul><ul><li>No extra space required. </li></ul><ul><li>Objects restricted to equal size. </li></ul><ul><li>Order of objects in output is arbitrary. </li></ul><ul><li>The arbitrary order significantly degrades program efficiency, so the algorithm is not used today. </li></ul>
  28. 28. The Lisp2 Algorithm <ul><li>Improvements: handles variable-sized objects, keeps the order of objects. </li></ul><ul><li>Requires one additional pointer field for each object. </li></ul><ul><li>Note: the forwarding pointer cannot simply be kept at the object's original address, because it may be overwritten by a moved object </li></ul>[Figure: heap layout with the free area]
  29. 29. The Lisp2 Algorithm <ul><li>Pass 1: Address computation. Keep the new address in an additional object field. </li></ul><ul><li>Pass 2: Pointer modification. </li></ul><ul><li>Pass 3: Two pointers (free & live) run from the bottom. Live objects are moved to free space keeping their original order. </li></ul>
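The three passes can be sketched over a simulated heap. The model is an assumption made for illustration: objects are dicts sorted by `addr`, pointers are addresses, and the `by_addr` lookup table stands in for dereferencing a pointer in real memory.

```python
def lisp2_compact(objects, roots):
    """Slide marked objects to the bottom of the heap, preserving their order.

    Returns (survivors, new_roots, free), where free is the first unused address.
    """
    by_addr = {o["addr"]: o for o in objects}   # simulates pointer dereference

    # Pass 1: compute each live object's new address in an extra field.
    free = 0
    for o in objects:
        if o["marked"]:
            o["forward"] = free
            free += o["size"]

    # Pass 2: rewrite every pointer to the target's new address.
    for o in objects:
        if o["marked"]:
            o["ptrs"] = [by_addr[p]["forward"] for p in o["ptrs"]]
    new_roots = [by_addr[r]["forward"] for r in roots]

    # Pass 3: move the objects, keeping their original order.
    survivors = []
    for o in objects:
        if o["marked"]:
            o["addr"] = o["forward"]
            o["marked"] = False
            survivors.append(o)
    return survivors, new_roots, free

objects = [
    {"name": "a",  "addr": 0, "size": 2, "marked": True,  "ptrs": [8]},
    {"name": "g1", "addr": 2, "size": 3, "marked": False, "ptrs": []},
    {"name": "b",  "addr": 5, "size": 3, "marked": True,  "ptrs": [0]},
    {"name": "c",  "addr": 8, "size": 1, "marked": True,  "ptrs": [5]},
    {"name": "g2", "addr": 9, "size": 4, "marked": False, "ptrs": []},
]
survivors, new_roots, free = lisp2_compact(objects, roots=[8])
print([(o["name"], o["addr"], o["ptrs"]) for o in survivors], new_roots, free)
```

Objects of different sizes keep their relative order (a, b, c), at the cost of the extra `forward` field and three passes.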
  30. 30. Lisp 2 – Properties <ul><li>Simple enough. </li></ul><ul><li>No constraints on object sizes. </li></ul><ul><li>Order of objects preserved. </li></ul><ul><li>Slower: 3 passes. </li></ul><ul><li>Extra space required: a pointer per object. </li></ul>
  31. 31. 5. Copying Garbage Collection <ul><li>Like Mark-Compact, the algorithm moves all of the live objects into one area, and the rest of the heap becomes available </li></ul><ul><li>There are several schemes of copying garbage collection, one of which is the “Stop-and-Copy” garbage collection </li></ul><ul><li>In this scheme the heap is divided into two contiguous semispaces. During normal program execution, only one of them is in use </li></ul>
  32. 32. Stop-and-Copy Collector <ul><li>Memory is allocated linearly upward through the “current” semispace </li></ul><ul><li>When the running program demands an allocation that will not fit in the unused area, the program is stopped and the copying garbage collector is called to reclaim space </li></ul>
  33. 33. Cheney breadth-first copying
  34. 34. The algorithm
      Init() =
          Tospace = Heap_bottom
          space_size = Heap_size / 2
          top_of_space = Tospace + space_size
          Fromspace = top_of_space + 1
          free = Tospace
      New(n) =
          if free + n > top_of_space
              collect()
          if free + n > top_of_space
              abort "Memory exhausted"
          new-object = free
          free = free + n
          return (new-object)
      collect() =
          from-space, to-space = to-space, from-space  // swap
          scan = free = Tospace
          top_of_space = Tospace + space_size
          for R in Roots
              R = copy(R)
          while scan < free
              for P in children(scan)
                  *P = copy(P)
              scan = scan + size(scan)
      copy(P) =
          if forwarded(P)
              return forwarding_address(P)
          else
              addr = free
              mem-copy(P, free)
              free = free + size(P)
              forwarding_address(P) = addr
              return (addr)
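The collect and copy routines above can be sketched in Python with objects instead of raw addresses. The model is an assumption made for illustration: `tospace` is a list whose length plays the role of `free`, the index `scan` plays the role of the scan pointer, and each from-space object carries a `forward` field for its forwarding address.

```python
class Obj:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children if children is not None else []
        self.forward = None   # forwarding address, set once the object is copied

def cheney_collect(roots):
    """Copy everything reachable from roots into a fresh to-space, breadth-first."""
    tospace = []

    def copy(obj):
        if obj.forward is None:
            # The copy's children still point into from-space;
            # they are fixed when the scan pointer reaches the copy.
            clone = Obj(obj.name, list(obj.children))
            tospace.append(clone)
            obj.forward = clone
        return obj.forward

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(tospace):
        obj = tospace[scan]
        obj.children = [copy(c) for c in obj.children]
        scan += 1
    return new_roots, tospace

a, b, c, d, e = (Obj(n) for n in "abcde")
a.children = [b, c]
b.children = [d]
c.children = [d]      # d is shared by b and c
d.children = [a]      # and points back to a, forming a cycle
# e is unreachable garbage and is simply never copied.
new_roots, tospace = cheney_collect([a])
print([o.name for o in tospace])
```

No mark stack or recursion is needed: the region of to-space between `scan` and the end of the list acts as the queue of the breadth-first traversal.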
  35. 35. Efficiency of Copying Collection <ul><li>Can be made arbitrarily efficient if sufficient memory is available </li></ul><ul><li>The work done in each collection is proportional to the amount of live data </li></ul><ul><li>To decrease the frequency of garbage collection, simply allocate larger semispaces </li></ul><ul><li>Impractical if there is not enough RAM and paging occurs </li></ul>
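A rough worked example of why larger semispaces help, under simplifying assumptions that are not stated on the slide: the amount of live data is the same at every collection, each collection copies exactly that much, and the program can allocate `semispace - live` bytes between collections. The function name is hypothetical.

```python
def copy_cost_per_allocated_byte(live, semispace):
    """Bytes copied by the collector per byte allocated by the program.

    Assumes a steady state: `live` bytes survive every collection, and the
    remaining `semispace - live` bytes are allocated before the next one.
    """
    if semispace <= live:
        raise ValueError("semispace must be larger than the live data")
    return live / (semispace - live)

MB = 1 << 20
for size in (20, 40, 110):
    ratio = copy_cost_per_allocated_byte(10 * MB, size * MB)
    print(f"semispace {size} MB: {ratio:.3f} bytes copied per byte allocated")
```

With 10 MB of live data, the ratio falls from 1.0 at a 20 MB semispace to 0.1 at 110 MB, as long as the larger heap still fits in RAM.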
  36. 36. 6. Non-Copying Implicit Collection <ul><li>Needs two more pointer fields and a color field in each object </li></ul><ul><li>These fields serve to link each hunk of storage into a doubly-linked list </li></ul><ul><li>The color field indicates which set an object belongs to </li></ul><ul><li>Live objects being traced do not require space for both fromspace and tospace versions </li></ul><ul><li>In most cases, this appears to make the space cost smaller than that of a copying collector, but in some cases fragmentation costs (due to the inability to compact data) may outweigh those savings </li></ul>
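A loose sketch of the idea, assuming Python containers in place of the doubly-linked lists: "moving" a live object only unlinks it from one set, links it into the other, and changes its color, while the object itself stays where it is. The class, the color names, and the function `collect` are hypothetical.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.color = "from"   # which set the object currently belongs to

def collect(roots, from_set):
    """Trace from roots; returns (to_list, free_list). Objects never move."""
    to_list = []   # stands in for the doubly-linked to-set

    def reach(obj):
        # Instead of copying, unlink from the from-set and relink into the to-set.
        if obj.color == "from":
            from_set.remove(obj)
            obj.color = "to"
            to_list.append(obj)

    for root in roots:
        reach(root)
    scan = 0
    while scan < len(to_list):          # same breadth-first scan as a copying collector
        for child in to_list[scan].children:
            reach(child)
        scan += 1

    # Whatever was never reached is garbage and becomes the free list.
    free_list = list(from_set)
    from_set.clear()
    for obj in free_list:
        obj.color = "free"
    return to_list, free_list

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.children.append(b)
b.children.append(a)
from_set = {a, b, c}
to_list, free_list = collect([a], from_set)
print([o.name for o in to_list], [o.name for o in free_list])
```

Because `to_list[0]` is the very same object as `a`, no pointers need updating after the collection, but the heap is also never compacted.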
  37. 37. 7. Choosing Among Basic Tracing Techniques <ul><li>A common criterion for high-performance garbage collection is that the cost of collecting objects be comparable, on average, to the cost of allocating objects </li></ul><ul><li>While current copying collectors appear to be more efficient than mark-sweep collectors, the difference is not large for state-of-the-art implementations </li></ul><ul><li>When the overall memory capacity is small, reference counting collectors are more attractive </li></ul><ul><li>Simple Garbage Collection Techniques: </li></ul><ul><ul><li>Too much space, too much time </li></ul></ul>
  38. 38. 8. Problems with Simple Tracing Collectors
  39. 39. 9. Conservatism in GC <ul><li>The first conservative assumption most collectors make is that any variable in the stack, globals or registers is live. There may be interactions between the compiler's optimizations and the garbage collector's view of the reachability graph. </li></ul><ul><li>Tracing collectors introduce a major temporal form of conservatism, simply by allowing garbage to go uncollected between collection cycles </li></ul><ul><li>Reference counting collectors are topologically conservative, failing to distinguish between different paths that share an edge in the graph of pointer relationships </li></ul>
