Introduction to Garbage Collection
高國棟
Speaking experience
● 2013/04: Talk at taipei.py on the implementation of pdb. Slides:
http://www.slideshare.net/ya790026/recoverpdb
● 2013/05: Talk at pyconf.tw on CPyt...
Garbage Collection
● memory leak, dangling pointer

● Reference count
● Mark and sweep
[Diagram: memory leak; dangling pointer. Block labels: memory, memory, memory, memory, free]
Reference Counting
● Reference count is maintained for each
object on the heap.
● When an object is first created and a
re...
Reference Counting
● When any other variable is assigned a
reference to that object, the object's count is
incremented.
● ...
a = 5000
[Diagram: a]

a = 5000
b = a
[Diagram: a, b]

a = 5000
b = a
a = 3000
[Diagram: b, a]

ob_ival: 5000, ob_refcnt: 1
ob_ival: 5000, ob_refcnt: 1
ob_iva...
a = 5000
b = a
a = 3000
b = 4000

ob_ival: 5000, ob_refcnt: 0
a: ob_ival: 3000, ob_refcnt: 1
b: ob_ival: 4000, ob_refcnt: 1
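The counts in these diagrams can be observed in CPython with sys.getrefcount. The following is a minimal sketch rather than part of the original slides: Box and refcount_deltas are hypothetical names, and a plain object stands in for the integer 5000 because CPython may hold extra internal references to integer constants. Only the change in the count is reported, since the absolute value includes temporary references.

```python
import sys


class Box:
    """Plain heap object; hypothetical stand-in for the slide's integer."""

    def __init__(self, value):
        self.value = value


def refcount_deltas():
    a = Box(5000)
    base = sys.getrefcount(a)                  # baseline, includes temporaries
    b = a                                      # a second name for the same object
    after_alias = sys.getrefcount(a) - base
    b = Box(4000)                              # rebinding b drops its reference
    after_rebind = sys.getrefcount(a) - base
    return after_alias, after_rebind


print(refcount_deltas())
```

The first value is the increment caused by b = a, the second shows the count returning to its baseline once b is rebound. The behaviour is specific to CPython's reference counting.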
Reference Counting
Advantage:
suitable for real-time environments where
the program can't be interrupted for very long.
Di...
a = []
b = []
a.append(b)
b.append(a)
[Diagram: a, b]

a = []
b = []
a.append(b)
b.append(a)
a = None
b = None
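A reference cycle like the one above can be shown to survive reference counting and to be reclaimed only by the cycle collector. This sketch assumes CPython; Node and cycle_demo are hypothetical names, and a small class replaces the bare lists because list objects cannot be weakly referenced.

```python
import gc
import weakref


class Node:
    """Container that can be weakly referenced (plain lists cannot)."""

    def __init__(self):
        self.items = []


def cycle_demo():
    gc.collect()                      # start from a clean state
    was_enabled = gc.isenabled()
    gc.disable()                      # keep the cycle collector out of the way
    try:
        a, b = Node(), Node()
        a.items.append(b)
        b.items.append(a)             # a and b now reference each other
        probe = weakref.ref(a)        # observes a without keeping it alive
        a = None
        b = None
        alive_before = probe() is not None   # refcounts never reached zero
        gc.collect()                          # cycle detection frees both nodes
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after


print(cycle_demo())
```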
mark and sweep
1. Find the root objects of the system. These
are things like the global environment (like
the __main__ mod...
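The mark and sweep idea (start from the roots, mark everything reachable, free the rest) can be sketched on a toy object graph. The graph representation, the function names mark and sweep, and the object names are all assumptions made for illustration; this is not how any particular runtime stores its heap.

```python
def mark(roots, edges):
    """Return the set of objects reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(edges.get(obj, ()))
    return marked


def sweep(heap, marked):
    """Return the objects that were not marked, i.e. the garbage."""
    return {obj for obj in heap if obj not in marked}


# Toy heap: "main" is the only root; c and d form an unreachable cycle.
edges = {"main": ["a"], "a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
heap = set(edges)
garbage = sweep(heap, mark(["main"], edges))
print(sorted(garbage))
```

Unlike reference counting, the unreachable cycle between c and d is found, because reachability is computed from the roots.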
Two-Color Mark & Sweep
[State diagram. Labels: Free, White, Black, New, Mark, Sweep]
Two-Color Mark & Sweep
● the algorithm is non-incremental (atomic
collection)
Tri-Color Incremental Mark & Sweep
● Initially, the grey set is all the objects that are reachable from
root references but the...
[State diagram. Labels: Free, White, Gray, Black, New, Mark, Sweep, Sweep After Check, Barrier backward, Barrier Forward]
Tri-Color Incremental Mark & Sweep
● When there are no more objects in the grey
set, then all the objects remaining in the...
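A toy version of tri-color marking shows why the work can be split into small steps. This is a sketch under stated assumptions: TriColorHeap, step and write are hypothetical names, the roots are used as the initial grey set, and the write method applies a forward barrier (a white object stored into a black one is turned grey). Real collectors differ in many details.

```python
class TriColorHeap:
    def __init__(self, edges, roots):
        # edges maps every object name to the names it references.
        self.edges = {obj: list(refs) for obj, refs in edges.items()}
        self.grey = set(roots)
        self.white = set(self.edges) - self.grey
        self.black = set()

    def step(self):
        """Scan one grey object; return False once marking is finished."""
        if not self.grey:
            return False
        obj = self.grey.pop()
        for ref in self.edges[obj]:
            if ref in self.white:
                self.white.remove(ref)
                self.grey.add(ref)
        self.black.add(obj)
        return True

    def write(self, src, dst):
        """Store a reference src -> dst between marking steps."""
        self.edges[src].append(dst)
        if src in self.black and dst in self.white:
            # Forward barrier: never let a black object point at a white one.
            self.white.remove(dst)
            self.grey.add(dst)


edges = {"root": ["a"], "a": [], "b": ["c"], "c": ["b"], "late": []}
heap = TriColorHeap(edges, ["root"])
heap.step()                    # root becomes black, a becomes grey
heap.write("root", "late")     # mutation in the middle of the collection
while heap.step():
    pass
print(sorted(heap.white))
```

When the grey set is empty, whatever remains white is unreachable. Without the barrier, "late" would have stayed white and been freed while still referenced.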
Generational Collectors
1. Most objects created by most programs
have very short lives.
2. Most programs create some objec...
External memory fragmentation
● Free memory is separated into small blocks
and is interspersed by allocated memory.
● Although ...
External memory fragmentation
[Diagram: three heap states. Initially: a, b, c, d. After del b: a, c, d. After del d: a, c.]
We can’t create a variable with four blocks.
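The situation in the diagram can be reproduced with a toy first-fit allocator. The slot-list heap model and the names allocate and free are assumptions for illustration only; each variable is assumed to occupy two slots.

```python
def allocate(heap, name, size):
    """First fit: claim the first run of `size` contiguous free slots."""
    run = 0
    for i, slot in enumerate(heap):
        run = run + 1 if slot is None else 0
        if run == size:
            start = i - size + 1
            heap[start:i + 1] = [name] * size
            return start
    return None                       # no contiguous run is large enough


def free(heap, name):
    """Release every slot owned by `name`."""
    for i, slot in enumerate(heap):
        if slot == name:
            heap[i] = None


heap = [None] * 8
for name in "abcd":
    allocate(heap, name, 2)           # a, b, c, d fill the heap
free(heap, "b")
free(heap, "d")
free_slots = heap.count(None)         # four slots are free in total...
placed = allocate(heap, "e", 4)       # ...but not four in a row
print(free_slots, placed)
```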
Compacting and copying
● Move objects on the fly to reduce heap
fragmentation
[Diagram. Labels: a, b, table of object handles, Object, Object]
stop and copy
● The heap is divided into two regions.
● Only one of the two regions is used at any time.
● Objects are all...
[Diagram: two heap regions. Labels: free, allocated, unused, Copy live objects]
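A stop and copy pass can be sketched as a breadth-first copy of the live objects into a fresh region. The dictionary heap model and the name stop_and_copy are assumptions; a real collector copies raw memory and rewrites pointers, which is only hinted at here by the forwarding table.

```python
def stop_and_copy(from_space, roots):
    """Copy objects reachable from the roots into a new, compact to-space."""
    to_space = []        # the other region, filled contiguously
    forwarding = {}      # old object name -> index in to_space

    def copy(name):
        if name not in forwarding:
            forwarding[name] = len(to_space)
            to_space.append(name)

    for root in roots:
        copy(root)
    scan = 0
    while scan < len(to_space):               # scan pointer chases the copies
        for ref in from_space[to_space[scan]]:
            copy(ref)
        scan += 1
    return to_space


# b and d are unreachable, so they are simply never copied.
from_space = {"a": ["c"], "b": [], "c": [], "d": ["b"]}
print(stop_and_copy(from_space, ["a"]))
```

Because survivors are packed next to each other, the new region has no gaps, which is how copying collectors avoid external fragmentation.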
Python garbage collection
● Python uses both reference counting and
“mark and sweep”.
● “mark and sweep” only works for cont...
Python mark and sweep
1. For each container object, set gc_refs equal
to the object's reference count.
2. For each contain...
Python mark and sweep
3. All container objects that now have a gc_refs field
greater than zero are referenced from outside ...
Python mark and sweep
5. Objects left in our original set are referenced
only by objects within that set (i.e. they are
ina...
[Diagrams: gc_refs of the two objects at each step]
Example 1:
1: gc_refs: 1, gc_refs: 1
2: gc_refs: 1, gc_refs: 0
3: gc_refs: 1, gc_refs: 0 (GC_TENTATIVELY_UNREACHABLE)
4: gc_refs: 1, gc_refs: 1
Example 2:
1: gc_refs: 1, gc_refs: 1
2: gc_refs: 0, gc_refs: 0
3: gc_refs: 0, gc_refs: 0 (GC_TENTATIVELY_UNREACHABLE)
4: gc_refs: 0, gc_refs: 0
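The five steps of Python's cycle detection can be simulated on a toy graph. This is a sketch of the idea only, not the code in Modules/gcmodule.c: find_unreachable, the dictionaries and the object names are hypothetical, and reference counts are supplied by hand.

```python
def find_unreachable(refcounts, edges):
    """Simulate the gc_refs algorithm on a set of container objects."""
    gc_refs = dict(refcounts)                      # step 1: copy the refcounts
    for obj, refs in edges.items():                # step 2: subtract internal refs
        for ref in refs:
            gc_refs[ref] -= 1
    # Step 3: a count above zero means a reference from outside the set.
    reachable = {obj for obj, count in gc_refs.items() if count > 0}
    stack = list(reachable)
    while stack:                                   # step 4: rescue what they reach
        obj = stack.pop()
        for ref in edges[obj]:
            if ref not in reachable:
                reachable.add(ref)
                stack.append(ref)
    return set(gc_refs) - reachable                # step 5: the rest is garbage


# x and y form a cycle that a variable still points to (x has refcount 2).
# p and q form a cycle that nothing else references.
refcounts = {"x": 2, "y": 1, "p": 1, "q": 1}
edges = {"x": ["y"], "y": ["x"], "p": ["q"], "q": ["p"]}
print(sorted(find_unreachable(refcounts, edges)))
```

Here y first drops to gc_refs 0 (tentatively unreachable) and is then rescued because x reaches it, while p and q stay at 0 and are reported as garbage. No root set is needed, only the reference counts.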
Java Reference
Strong reference
SoftReference
WeakReference
PhantomReference
Soft Reference
● The garbage collector may reclaim the
memory occupied by a softly reachable
object.
● It’s useful for cac...
Weak Reference
● The garbage collector must reclaim the
memory occupied by a weakly reachable
object.
● Canonicalizing map...
Phantom Reference
● Similar to a weak reference
● Whereas the garbage collector enqueues
soft and weak reference objects w...
Python Reference
Strong reference
Weak reference

weakref.ref(object[, callback])
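A short sketch of weakref.ref with a callback: the weak reference does not keep the object alive, and the callback runs when the referent is finalized. Resource and weakref_demo are hypothetical names; the gc.collect() call is only there for interpreters that do not free objects immediately.

```python
import gc
import weakref


class Resource:
    """Hypothetical object whose lifetime is being observed."""


def weakref_demo():
    events = []

    def on_finalize(reference):
        # Called with the (now dead) weak reference object.
        events.append("finalized")

    obj = Resource()
    ref = weakref.ref(obj, on_finalize)
    alive = ref() is obj          # calling the weakref returns the referent
    del obj                       # drop the only strong reference
    gc.collect()
    dead = ref() is None          # a dead weakref returns None
    return alive, dead, events


print(weakref_demo())
```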
Python gc interface
gc.enable()
gc.disable()
gc.isenabled()
gc.collect([generation])
gc.set_threshold(threshold0[, threshold1[, th...
Python gc interface
gc.set_debug(flags)
gc.get_referrers(*objs)
gc.get_referents(*objs)
gc.garbage
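The calls listed on these two slides can be exercised together. This is a minimal tour using only functions from the standard gc module; gc_interface_tour, target and holder are hypothetical names.

```python
import gc


def gc_interface_tour():
    was_enabled = gc.isenabled()
    gc.disable()
    disabled = not gc.isenabled()
    if was_enabled:
        gc.enable()                       # restore the previous state

    thresholds = gc.get_threshold()       # (threshold0, threshold1, threshold2)
    counts = gc.get_count()               # current collection counts

    target = []
    holder = {"key": target}
    # Who points at target? The holder dict should be among the answers.
    held_by_holder = any(r is holder for r in gc.get_referrers(target))
    # What does holder point at? Its keys and values, including target.
    holder_sees_target = any(r is target for r in gc.get_referents(holder))

    unreachable = gc.collect()            # full collection, returns a count
    return (disabled, thresholds, counts,
            held_by_holder, holder_sees_target, unreachable)


print(gc_interface_tour())
```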
In [1]: import gc
In [2]: gc.set_debug(gc.DEBUG_STATS)
In [3]: gc.collect()
gc: collecting generation 2...
gc: objects in ...
>>> class Finalizable:
...     def __del__(self): pass
...
>>> a = Finalizable()
>>> b = Finalizable(...
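How a cycle of objects with __del__ methods is handled depends on the Python version. Before Python 3.4 such objects were treated as uncollectable and placed in gc.garbage; since Python 3.4 (PEP 442) the collector finalizes and frees them. The sketch below assumes Python 3.4 or later; the finalized counter and finalizer_cycle_demo are hypothetical additions used to observe the result.

```python
import gc


class Finalizable:
    finalized = 0                     # counts how many __del__ calls happened

    def __del__(self):
        Finalizable.finalized += 1


def finalizer_cycle_demo():
    gc.collect()                      # clear unrelated garbage first
    Finalizable.finalized = 0
    a = Finalizable()
    b = Finalizable()
    a.x = b
    b.x = a                           # reference cycle between a and b
    del a
    del b
    unreachable = gc.collect()        # number of unreachable objects found
    return Finalizable.finalized, unreachable


print(finalizer_cycle_demo())
```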
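● memory-bound
○ Consider lowering the threshold, trading time for space

● cpu-bound
○ Consider raising the threshold, trading space for time
○ But do not raise it too far, or each gc pass will take too long
○ In parts of the code that require low latency ...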
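Pausing the collector around a latency-sensitive section can be wrapped in a context manager so that the previous state is always restored. gc_paused is a hypothetical helper, not part of the gc module; it only uses gc.isenabled, gc.disable and gc.enable. Reference counting keeps working while the cycle collector is paused.

```python
import contextlib
import gc


@contextlib.contextmanager
def gc_paused():
    """Temporarily disable the cycle collector, then restore its state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


with gc_paused():
    # Latency-sensitive work goes here; no cycle collection will interrupt it.
    squares = [n * n for n in range(1000)]
```

Thresholds can be tuned in the same spirit with gc.set_threshold, using gc.get_threshold to read the current values first.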
Conclusion
● Python's gc algorithm is interesting
● Python's memory management mechanism reduces memory
fragmentation, but gc cannot solve the
external memory fragmentation problem
● Python's gc is atomic
References
● New-Garbage-Collector for lua
● Garbage Collection
● gc module docs
● Details on Garbage Collection for Python
● p...
PyConf is recruiting venue staff
Thank you
Introduction to Garbage Collection

  1. 1. Introduction to Garbage Collection 高國棟
  2. 2. Speaking experience ● 2013/04: Talk at taipei.py on the implementation of pdb. Slides: http://www.slideshare.net/ya790026/recoverpdb ● 2013/05: Talk at pyconf.tw analyzing the CPython source code. Slides: http://www.slideshare.net/ya790026/c-python23247730 ● 2013/08: Talk at taipei.py on how Python executes code. Slides: http://www.slideshare.net/ya790026/python27854881
  3. 3. Garbage Collection ● memory leak, dangling pointer ● Reference count ● Mark and sweep
  4. 4. [Diagram: memory leak; dangling pointer. Block labels: memory, memory, memory, memory, free]
  5. 5. Reference Counting ● Reference count is maintained for each object on the heap. ● When an object is first created and a reference to it is assigned to a variable, the object's reference count is set to one.
  6. 6. Reference Counting ● When any other variable is assigned a reference to that object, the object's count is incremented. ● When a reference to an object goes out of scope or is assigned a new value, the object's count is decremented.
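  7. 7. [Diagram: three snapshots] After a = 5000: a refers to (ob_ival: 5000, ob_refcnt: 1). After b = a: a and b refer to (ob_ival: 5000, ob_refcnt: 2). After a = 3000: b refers to (ob_ival: 5000, ob_refcnt: 1) and a refers to (ob_ival: 3000, ob_refcnt: 1).
  8. 8. [Diagram] After a = 5000, b = a, a = 3000, b = 4000: (ob_ival: 5000, ob_refcnt: 0) is no longer referenced; a refers to (ob_ival: 3000, ob_refcnt: 1); b refers to (ob_ival: 4000, ob_refcnt: 1).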
  9. 9. Reference Counting Advantage: suitable for real-time environments where the program can't be interrupted for very long. Disadvantage: reference counting does not detect cycles.
  10. 10. a = [] b = [] a.append(b) b.append(a) a b a = [] b = [] a.append(b) b.append(a) a = None b = None
  11. 11. mark and sweep 1. Find the root objects of the system. These are things like the global environment (like the __main__ module in Python) and objects on the stack. 2. Search from these objects and find all objects reachable from them. These objects are all "alive". 3. Free all other objects.
  12. 12. Two-Color Mark & Sweep [State diagram. Labels: Free, White, Black, New, Mark, Sweep]
  13. 13. Two-Color Mark & Sweep ● the algorithm is non-incremental (atomic collection)
  14. 14. Tri-Color Incremental Mark & Sweep ● Initially, the grey set is all the objects that are reachable from root references but whose own references haven't been scanned yet. ● The white set is the set of objects that are candidates for having their memory recycled. ● The black set is the set of objects that can cheaply be proven to have no references to objects in the white set.
  15. 15. [State diagram. Labels: Free, White, Gray, Black, New, Mark, Sweep, Sweep After Check, Barrier backward, Barrier Forward]
  16. 16. Tri-Color Incremental Mark & Sweep ● When there are no more objects in the grey set, then all the objects remaining in the white set have been demonstrated not to be reachable, and the storage occupied by them can be reclaimed.
  17. 17. Generational Collectors 1. Most objects created by most programs have very short lives. 2. Most programs create some objects that have very long lifetimes. A major source of inefficiency in simple copying collectors is that they spend much of their time copying the same long-lived objects again and again.
  18. 18. External memory fragmentation ● Free memory is separated into small blocks and is interspersed by allocated memory. ● Although free storage is available, it is unusable because it is divided into pieces that are too small individually to satisfy the demands of the application.
  19. 19. External memory fragmentation [Diagram: three heap states. Initially: a, b, c, d. After del b: a, c, d. After del d: a, c.] We can’t create a variable with four blocks.
  20. 20. Compacting and copying ● Move objects on the fly to reduce heap fragmentation
  21. 21. [Diagram. Labels: a, b, table of object handles, Object, Object]
  22. 22. stop and copy ● The heap is divided into two regions. ● Only one of the two regions is used at any time. ● Objects are allocated from one of the regions until all the space in that region has been exhausted. ● Find out live objects and copy them to the other region. ● Memory will be allocated from the new heap region until it too runs out of space
  23. 23. [Diagram: two heap regions. Labels: free, allocated, unused, Copy live objects]
  24. 24. Python garbage collection ● Python uses both reference counting and “mark and sweep”. ● “mark and sweep” only works for containers, to solve reference cycles. ● Containers mean list, dict, instance, etc. ● Python's mark and sweep differs from the traditional approach: because C extensions exist, it is hard to have a common set of root objects.
  25. 25. Python mark and sweep 1. For each container object, set gc_refs equal to the object's reference count. 2. For each container object, find which container objects it references and decrement the referenced container's gc_refs field.
  26. 26. Python mark and sweep 3. All container objects that now have a gc_refs field greater than zero are referenced from outside the set of container objects. We cannot free these objects so we move them to a different set. 4. Any objects referenced from the objects moved also cannot be freed. We move them and all the objects reachable from them too.
  27. 27. Python mark and sweep 5. Objects left in our original set are referenced only by objects within that set (i.e. they are inaccessible from Python and are garbage). We can now go about freeing these objects.
  28. 28. [Diagram, example 1] Step 1: gc_refs: 1, gc_refs: 1. Step 2: gc_refs: 1, gc_refs: 0. Step 3: gc_refs: 1, gc_refs: 0 (GC_TENTATIVELY_UNREACHABLE).
  29. 29. [Diagram, example 1] Step 4: gc_refs: 1, gc_refs: 1.
  30. 30. [Diagram, example 2] Step 1: gc_refs: 1, gc_refs: 1. Step 2: gc_refs: 0, gc_refs: 0. Step 3: gc_refs: 0, gc_refs: 0 (GC_TENTATIVELY_UNREACHABLE).
  31. 31. [Diagram, example 2] Step 4: gc_refs: 0, gc_refs: 0.
  32. 32. Java Reference Strong reference SoftReference WeakReference PhantomReference
  33. 33. Soft Reference ● The garbage collector may reclaim the memory occupied by a softly reachable object. ● It’s useful for cache.
  34. 34. Weak Reference ● The garbage collector must reclaim the memory occupied by a weakly reachable object. ● Canonicalizing mappings
  35. 35. Phantom Reference ● Similar to a weak reference ● Whereas the garbage collector enqueues soft and weak reference objects when their referents are leaving the relevant reachability state, it enqueues phantom references when the referents are entering the relevant state. ● Establish more flexible pre-mortem cleanup
  36. 36. Python Reference Strong reference Weak reference weakref.ref(object[, callback])
  37. 37. Python gc interface gc.enable() gc.disable() gc.isenabled() gc.collect([generation]) gc.set_threshold(threshold0[, threshold1[, threshold2]]) gc.get_count() gc.get_threshold()
  38. 38. Python gc interface gc.set_debug(flags) gc.get_referrers(*objs) gc.get_referents(*objs) gc.garbage
  39. 39. In [1]: import gc In [2]: gc.set_debug(gc.DEBUG_STATS) In [3]: gc.collect() gc: collecting generation 2... gc: objects in each generation: 159 2655 7538 gc: done, 10 unreachable, 0 uncollectable, 0.0123s elapsed.
  40. 40.
      >>> class Finalizable:
      ...     def __del__(self): pass
      ...
      >>> a = Finalizable()
      >>> b = Finalizable()
      >>> a.x = b
      >>> b.x = a
      >>> del a
      >>> del b
      >>> import gc
      >>> gc.collect()
  41. 41. ● memory-bound ○ Consider lowering the threshold, trading time for space ● cpu-bound ○ Consider raising the threshold, trading space for time ○ But do not raise it too far, or each gc pass will take too long ○ In parts of the code that require low latency, gc can be temporarily disabled
  42. 42. Conclusion ● Python's gc algorithm is interesting ● Python's memory management mechanism reduces memory fragmentation, but gc cannot solve the external memory fragmentation problem ● Python's gc is atomic
  43. 43. References ● New-Garbage-Collector for lua ● Garbage Collection ● gc module docs ● Details on Garbage Collection for Python ● python source code (Modules/gcmodule.c)
  44. 44. PyConf is recruiting venue staff
  45. 45. Thank you
