Basic  Garbage  Collection  Techniques
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
  • nice ppt
    Are you sure you want to
    Your message goes here
No Downloads

Views

Total Views
9,283
On Slideshare
8,992
From Embeds
291
Number of Embeds
6

Actions

Shares
Downloads
302
Comments
1
Likes
4

Embeds 291

http://techinterviewprep.wordpress.com 150
http://cheatsheetbuddy.com 62
http://gabhi.com 59
http://www.slideshare.net 12
http://localhost 6
http://fin-se.info 2

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • Ưu điểm: Ko cần cấp thêm vùng nhớ để lưu trữ trường count field Các object ko cần di chuyển trong quá trình GC Loại bỏ được rác trong trường hợp có tham chiếu vòng (cyclic structures) Nhược điểm: Giới hạn giá trị lớn nhất có thể của ô nhớ (integer) Tăng khả năng làm phân mảnh vùng nhớ Chương trình phải tạm dừng khi GC thực thi Giai đoạn sweep tốn chi phí cao(thời gian)
  • Ưu điểm: Giảm khả năng phân mảnh vùng nhớ heap. Thời gian thực hiện công việc thì tỉ lệ với các live object Có thể xử lý được rác chứa các tham chiếu vòng Không tốn bộ nhớ để lưu trường count field Nhược điểm: Vùng nhớ dùng cho chương trình bị chia đôi(fromspace – tospace) Các object phải di chuyển trong vùng nhớ trong suốt quá trình thu gom rác, do vậy các tham chiếu đến các đối tượng cũng phải cập nhật. Chương trình phải tạm dừng khi bộ thu gom rác thực thi.

Transcript

  • 1. [email_address] Basic Garbage Collection Techniques
  • 2. Contents
    • What is Garbage Collection
    • Reference Counting
    • Mark-Sweep Collection
    • Mark-Compact Collection
    • Copying Garbage Collection
    • Non-Copy Implicit Collection
    • Choosing Among Basic Tracing Techniques
    • Problems with Simple Tracing Collectors
    • Conservatism in GC
    Tracing
  • 3. 1. Garbage Collection
    • Garbage Collection (GC)
    • Is the automatic storage reclamation of computer storage
    • The GC function is to find data objects that are no longer in use and make their space available by the running program
    • So Why Garbage Collection?
    • A software routine operating on a data structure should not have to depend what other routines may be operating on the same structure
    • If the process does not free used memory, the unused space is accumulated until the process terminates or swap space is exhausted
  • 4. Explicit Storage Management Hazards
    • Programming errors may lead to errors in the storage management:
      • May reclaim space earlier than necessary
      • May not reclaim space at all, causing memory leaks
    • These errors are particularly dangerous since they tend to show up after delivery
    • Many programmers allocate several objects statically, to avoid allocation on the heap and reclaiming them at a later stage
  • 5. Explicit Storage Management Hazards
    • In many large systems, garbage collection is partially implemented in the system’s objects
    • Garbage Collection is not supported by the programming language
    • Leads to buggy, partial Garbage Collectors which are not useable by other applications
    • The purpose of GC is to address these issues
  • 6. GC Complexity
    • Garbage Collection is sometimes considered cheaper than explicit deallocation
    • A good Garbage Collector slows a program down by a factor of 10 percent
    • Although it seems a lot, it is only a small price to pay for:
      • Convenience
      • Development time
      • Reliability
  • 7. Garbage Collection – The Two-Phase Abstraction
    • The basic functioning of a garbage collector consists, abstractly speaking, of two parts:
      • Distinguishing the live objects from the garbage in some way ( garbage detection)
      • Reclaiming the garbage objects’ storage, so that the running program can use it ( garbage reclamation)
    • In practice these two phases may be interleaved
  • 8. 2. Reference Counting
    • Each object has an associated count of the references (pointers) to it
    • Each time a reference to the object is created, its reference count is increased by one and vice-versa
    • When the reference count reaches 0, the object’s space may be reclaimed
  • 9. Reference Counting
    • When an object is reclaimed, its pointer fields are examined and every object it points to has its reference count decremented
    • Reclaiming one object may therefore lead to a series of object reclamations
    • There are two major problems with reference counting
  • 10. The Cycles Problem
    • Reference Counting fails to reclaim circular structures
    • Originates from the definition of garbage
    • Circular structures are not rare in modern programs:
      • Trees
      • Cyclic data structures
    • The solution is up to the programmer
  • 11. The Efficiency Problem
    • When a pointer is created or destroyed, its reference count must be adjusted
    • Short-lived stack variables can incur a great deal of overhead in a simple reference counting scheme
    • In these cases, reference counts are incremented and then decremented back very soon
  • 12. Deferred Reference Counting
    • Much of this cost can be optimized away by special treatment of local variables
    • Reference from local variables are not included in this bookkeeping
    • However, we cannot ignore pointers from the stack completely
    • Therefore the stack is scanned before object reclamation and only if a pointer’s reference count is still 0, it is reclaimed
  • 13. Reference Counting - Recap
    • While reference counting is out of vogue for high-performance applications
    • It is quite common in applications where acyclic data structures are used
    • Most file systems use reference counting to manage files and/or disk blocks
    • Very simple scheme
  • 14. 3. Mark-Sweep Collection - [McCarthy 1960]
    • Distinguishing live object from garbage( Mark phase )
      • Done by tracing – starting at the root set and usually traversing the graph of pointers relationships
      • The reached objects are marked
    • Reclaiming the garbage( Sweep phase)
      • Once all live objects are marked, memory is exhaustively examined to find all of the unmarked (garbage) objects and reclaim their space
  • 15. Example Mark phase Sweep phase Before Heap root root root Free_list
  • 16. Example p p Live object Marked object Garbage Free list r1 Free list r1 Free list r1
  • 17. Basic Algorithm New (A)= if free_list is empty mark_sweep() if free_list is empty return (“out-of-memory”) pointer = allocate(A) return (pointer) mark_sweep ()= for Ptr in Roots mark(Ptr) sweep() mark (Obj)= if mark_bit(Obj) == unmarked mark_bit(Obj)=marked for C in Children(Obj) mark(C) Sweep ()= p = Heap_bottom while (p < Heap_top) if (mark_bit(p) == unmarked) then free(p) else mark_bit(p) = unmarked; p=p+size(p)
  • 18. Properties of Mark & Sweep
    • Fragmentation
      • Difficult to allocate large objects
      • Several small objects maybe add up to large contiguous space
    • Cost
      • Proportional to the heap size, including both live and garbage objects
    • Locality of reference:
      • Does not move objects
      • Interleave of different ages causes many page swaps. (Because objects often created in clusters that are active at the same time)
  • 19. Standard Engineering Techniques
    • Mark:
      • Explicit mark-stack (avoid recursion)
      • Pointer reversal
      • Using bitmaps
    • Sweep:
      • Using bitmaps
      • Lazy sweep
  • 20. 4. Mark-Compact Collection Compact after mark
    • Mark-Compact collectors remedy the fragmentation and allocation problems of mark-sweep collectors
    • The collector traverses the pointers graph and copy every live object after the previous one
    • This results in one big contiguous space which contains live objects and another which is considered free space
  • 21. Mark-Compact Collection
    • Garbage objects are “squeezed” to the end of the memory
    • The process requires several passes over the memory:
      • One to computes the new location for objects
      • Subsequent passes update pointers and actually move the objects
    • The algorithm can be significantly slower than Mark-Sweep Collection
    • Two Algorithms:
      • Two-finger Alg–for objects of equal size.
      • Lisp 2 Alg
  • 22. Object Ordering
    • Arbitrary –no guaranteed order.
    • Linearizing–objects pointing to one another are moved into adjacent positions.
    • Sliding–objects are slid to one end of the heap, maintaining the original order of allocation.
    1 2 3 4 Linearizing 4 1 2 3 Arbitrary Worst 1 3 4 2 Good 1 2 3 4 Sliding Best
  • 23. The Two Finger Algorithm [Edwards 1974]
    • Simplest algorithm:
      • Designed for objects of equal size
      • Order of objects in output is arbitrary.
      • Two passes: compact & update pointers
  • 24. Pass I: Compact
    • Use two pointers:
      • free: goes over heap from bottom searching for free/empty objects.
      • Live:goes over heap from top downwards searching for live objects.
      • When freefinds a free spot and livefinds an object, the object is moved to the free spot.
    • When an object is moved, a pointer to its new location is left at its old location
  • 25. Two finger, pass I - Example
  • 26. Pass II: Fix Pointers
    • Go over live objects in the heap
      • Go over their pointers
      • If pointer points to the free area: fix it according to the forwarding pointers in target objects.
    Compacted area free area
  • 27.
    • Simple!
    • Relatively fast: One pass + pass on live objects (and their previous location).
    • No extra space required.
    • Objects restricted to equal size.
    • Order of objects in output is arbitrary.
    • This significantly deteriorates program efficiency! Thus –not used today.
    Two finger –Properties
  • 28. The Lisp2 Algorithm
    • Improvements: handle variable sized objects, keep order of objects.
    • Requires one additional pointer field for each object.
    • The picture:
    • Note: cannot simply keep forwarding pointer in original address. It may be overwritten by a moved object
    Free
  • 29. The Lisp2 Algorithm
    • Pass 1: Address computation. Keep new address in an additional object field.
    • Pass 2: pointer modification.
    • Pass 3: two pointers (free & live) run from the bottom. Live objects are moved to free space keeping their original order.
    Free
  • 30. Lisp 2 –Properties
    • Simple enough.
    • No constraints on object sizes.
    • Order of objects preserved.
    Slower: 3 passes. Extra space required –a pointer per object.
  • 31. 5. Copying Garbage Collection
    • Like Mark-Compact, the algorithm moves all of the live objects into one area, and the rest of the heap becomes available
    • There are several schemes of copying garbage collection, one of which is the “Stop and-Copy” garbage collection
    • In this scheme the heap is divided into two contiguous semispaces. During normal program execution, only one of them is in use
  • 32. Stop-and-Copy Collector
    • Memory is allocated linearly upward through the “current” semispace
    • When the running program demands an allocation that will not fit in the unused area
    • The program is stopped and the copying garbage collector is called to reclaim space
  • 33. Cheney breath-first copying
  • 34. The algorithm Init()= Tospace=Heap_bottom space_size=Heap_size/2 top_of_space=Tospace+space_size Fromspace=top_of_space+1 free=Tospace New(n)= If free+n>top_of_space Collect() if free+n>top_of_space abort“Memoryexhausted” new-object=free free=free+n return(new-object) collect()= from-space,to-space= to-space,from-space//swap scan=free=Tospace top_of_space=Tospace+space_size forRinRoots R=copy(R) whilescan<free forPinchildren(scan) *P=copy(P) scan=scan+size(scan) copy(P)= ifforwarded(P) returnforwarding_address(P) else addr=free mem-copy(P,free) free=free+size(P) forwarding_address(P)=addr return(addr) 1 2 4 3
  • 35. Efficiency of Copying Collection
    • Can be made arbitrarily efficient if sufficient memory is available
    • The work done in each collection is proportional to the amount of live data
    • To decrease the frequency of garbage collection, imply allocate larger semispaces
    • Impractical if there is not enough RAM and paging occurs
  • 36. 6. Non-Copying Implicit Collection
    • Need more two pointer fields and a color field to each object
    • These fields serve to link each hunk of storage into a doubly-linked list
    • The color field indicates which set an object belongs to
    • Live objects being traced do not require space for both fromspace and tospace versions
    • In most cases, this appears to make the space cost smaller than that of a copying collector but in some cases fragmentation costs(due to the inability to compact data) may outweigh those savings
  • 37. 7. Choosing Among Basic Tracing Techniques
    • A common criterion for high-performance garbage collection is that the cost of collecting objects be comparable, on average, to the cost of allocating objects
    • While current copying collectors appear to be more efficient than mark-sweep collectors, the difference is not high for state-of-the art implementations
    • When the overall memory capacity is small, reference counting collectors are more attractive
    • Simple Garbage Collection Techniques:
      • Too much space, too much time
  • 38. 8. Problems with Simple Tracing Collectors
  • 39. 9. Conservatism in GC
    • The first conservative assumption most collectors make is that any variable in the stack, globals or registers is live. There may be interactions between the compiler's optimizations and the garbage collector's view of the reachability graph.
    • Tracing collectors introduce a major temporal form of conservatism, simply by allowing garbage to go un collected between collection cycles
    • Reference counting collectors are conservative topologically failing to distinguish between different paths that share an edge in the graph of pointer relationships