GC in Ruby. RubyC, Kiev, 2014.

Garbage Collection in Ruby MRI from 1.8 to 2.2

Published in: Software

  1. 1. GARBAGE COLLECTION in Ruby Timothy N.Tsvetkov RubyC, Kiev, 2014
  2. 2. Memory Management image source: http://www-03.ibm.com/ibm/history/ibm100/images/icp/J879398O31089G86/us__en_us__ibm100__risc_architecture__rtpc__620x350.jpg
  3. 3.
          .model tiny
          .code
          .startup
              mov dx, offset Hello
              mov ah, 9
              int 21h
          .exit
          Hello db 'Hello world!$'
          end
  4. 4.
          include io.asm
          data segment
              x db 'Hello world!', '$'
          data ends
          stack segment stack
              db 128 dup (?)
          stack ends
          code segment
              assume cs: code, ds: data, ss: stack
          start:
              mov ax, data
              mov ds, ax
              lea dx, x
              outstr
              finish
          code ends
          end start
  5. 5. $ grep -r "free" * | wc -l
  6. 6. Does Ruby suxx?
  7. 7. Twitter on Scala
      “Because Ruby’s garbage collector is not quite as good as Java’s, each process uses up a lot of memory. We can’t really run very many Ruby daemon processes on a single machine without consuming large amounts of memory. Whereas with running things on the JVM we can run many threads in the same heap, and let that one process take all the machine’s memory for its playground.”
      - Robey Pointer
  8. 8. IRON.IO
      “After we rolled out our Go version, we reduced our server count [from 30] to two and we really only had two for redundancy. They were barely utilized, it was almost as if nothing was running on them. Our CPU utilization was less than 5% and the entire process started up with only a few hundred KB's of memory (on startup) vs our Rails apps which were ~50MB (on startup).”
      http://blog.iron.io/2013/03/how-we-went-from-30-servers-to-2-go.html
  9. 9. MRI Memory Allocation
  10. 10. Heap
  11. 11. RVALUE
          #define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1)) /* 23 bytes on 64-bit */
          struct RString {
              struct RBasic basic;
              union {
                  struct {
                      long len;
                      char *ptr;
                      union {
                          long capa;
                          VALUE shared;
                      } aux;
                  } heap;
                  char ary[RSTRING_EMBED_LEN_MAX + 1];
              } as;
          };
      An RVALUE slot holds one of: RBasic, RObject, RClass, RFloat, RArray, RRegexp, RHash, RData, RTypedData, RStruct, RBignum, RFile, RNode, RMatch, RRational, RComplex.
      Since 1.9, small strings are embedded into RString and not allocated externally.
      http://rxr.whitequark.org/mri/source/include/ruby/ruby.h?v=2.0.0-p353#841
  12. 12. Mark & Sweep
      Mark & Sweep, the first garbage collection algorithm, was developed for the original version of Lisp in 1960.
      John McCarthy (September 4, 1927 - October 24, 2011)
  13. 13. Pro et contra
      (+) Is able to reclaim cyclic data structures: Mark & Sweep traces out the set of objects accessible from the roots, so an unreachable cycle is never marked.
      (-) Stop the world: while Mark & Sweep runs, execution of the program is temporarily suspended.
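To make the "reclaims cycles" point concrete, here is a minimal toy sketch of Mark & Sweep. The `Obj` struct and the `mark` and `sweep` helpers are illustrative names, not MRI internals, and the "heap" is just a Ruby array.

```ruby
# Toy heap: each object holds a mark bit and references to other objects.
Obj = Struct.new(:name, :refs, :marked)

# Mark phase: flag everything reachable from the roots.
def mark(roots)
  stack = roots.dup                  # explicit stack instead of recursion
  until stack.empty?
    obj = stack.pop
    next if obj.marked
    obj.marked = true
    stack.concat(obj.refs)
  end
end

# Sweep phase: unmarked objects are garbage; marks are reset for the next cycle.
def sweep(heap)
  live, dead = heap.partition(&:marked)
  live.each { |o| o.marked = false }
  [live, dead]
end

a = Obj.new("a", [], false)
b = Obj.new("b", [], false)
c = Obj.new("c", [], false)
d = Obj.new("d", [], false)
a.refs << b          # root -> a -> b
c.refs << d          # c and d reference each other,
d.refs << c          # but nothing reachable points at them

mark([a])
live, dead = sweep([a, b, c, d])
puts "live:  #{live.map(&:name).join(', ')}"
puts "swept: #{dead.map(&:name).join(', ')}"
```

The cycle `c <-> d` is swept because tracing starts from the roots only; a pure reference counter would keep both objects alive forever.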
  14. 14. Ruby 1.8 Mark & Sweep GC
  15. 15.
          void *
          ruby_xmalloc(size)
              long size;
          {
              void *mem;

              if (size < 0) {
                  rb_raise(rb_eNoMemError, "negative allocation size (or too big)");
              }
              if (size == 0) size = 1;

              if (ruby_gc_stress || (malloc_increase+size) > malloc_limit) {
                  garbage_collect();
              }
              RUBY_CRITICAL(mem = malloc(size));
              if (!mem) {
                  garbage_collect();
                  RUBY_CRITICAL(mem = malloc(size));
                  if (!mem) {
                      rb_memerror();
                  }
              }
              malloc_increase += size;

              return mem;
          }
      Runs garbage collection whenever the memory malloc'ed since the last GC exceeds the malloc limit (RUBY_GC_MALLOC_LIMIT).
      http://rxr.whitequark.org/mri/source/gc.c?v=1.8.7-p374#136
  16. 16.
          VALUE
          rb_newobj()
          {
              VALUE obj;

              if (during_gc)
                  rb_bug("object allocation during garbage collection phase");

              if (ruby_gc_stress || !freelist) garbage_collect();

              obj = (VALUE)freelist;
              freelist = freelist->as.free.next;
              MEMZERO((void*)obj, RVALUE, 1);
          #ifdef GC_DEBUG
              RANY(obj)->file = ruby_sourcefile;
              RANY(obj)->line = ruby_sourceline;
          #endif
              return obj;
          }
      Or runs garbage collection if there is no free slot for an object.
      http://rxr.whitequark.org/mri/source/gc.c?v=1.8.7-p374#429
  17. 17. At the mark phase Ruby marks each live object with a bit flag FL_MARK inside the object structure. Guess the problem…
  18. 18. Ruby 1.9.3 Lazy sweeping
  19. 19.
          VALUE
          rb_newobj(void)
          {
              rb_objspace_t *objspace = &rb_objspace;
              VALUE obj;

              ...

              if (UNLIKELY(!freelist)) {
                  if (!gc_lazy_sweep(objspace)) {
                      during_gc = 0;
                      rb_memerror();
                  }
              }

              ...

              return obj;
          }
      gc_lazy_sweep() was introduced.
      http://rxr.whitequark.org/mri/source/gc.c?v=1.9.3-p484#1188
  20. 20.
          static int
          lazy_sweep(rb_objspace_t *objspace)
          {
              struct heaps_slot *next;

              heaps_increment(objspace);
              while (objspace->heap.sweep_slots) {
                  next = objspace->heap.sweep_slots->next;
                  slot_sweep(objspace, objspace->heap.sweep_slots);
                  objspace->heap.sweep_slots = next;
                  if (freelist) {
                      during_gc = 0;
                      return TRUE;
                  }
              }
              return FALSE;
          }
      In “lazy sweeping”, each new object allocation sweeps only until it finds a free slot.
      http://rxr.whitequark.org/mri/source/gc.c?v=1.9.3-p484#2240
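The same idea can be modelled in a few lines of Ruby. This is a simplified sketch under stated assumptions: `LazyHeap` is a hypothetical class, pages are plain arrays, and a slot is `:live` or `:dead` as left behind by a finished mark phase. It is not how MRI stores its heap.

```ruby
# Hypothetical model of lazy sweeping: sweep one page at a time,
# and stop as soon as at least one free slot is available.
class LazyHeap
  attr_reader :swept_pages

  def initialize(pages)
    @unswept = pages.map(&:dup)   # pages still waiting to be swept
    @freelist = []                # [page number, slot index] pairs
    @swept_pages = 0
  end

  def lazy_sweep
    until @unswept.empty?
      page = @unswept.shift
      @swept_pages += 1
      page.each_with_index do |slot, i|
        @freelist << [@swept_pages, i] if slot == :dead
      end
      return true unless @freelist.empty?   # found a free slot: stop early
    end
    false
  end

  # Returns a free slot, or nil when the whole heap is swept and full.
  def allocate
    return nil if @freelist.empty? && !lazy_sweep
    @freelist.shift
  end
end

heap = LazyHeap.new([
  %i[live live],
  %i[dead live],
  %i[dead dead],
])
first = heap.allocate
puts "first slot: page #{first[0]}, index #{first[1]} (pages swept so far: #{heap.swept_pages})"
```

The first allocation sweeps only two of the three pages; the third page is swept later, when the freelist runs dry again. The pause is spread over many allocations instead of being paid all at once.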
  21. 21. Ruby 2.0 Bitmap marking GC, mark phase is rewritten to be non-recursive
  22. 22. Why do we need bitmap marking?
      • fork() uses the Copy-On-Write (COW) optimization.
      • Marking all objects (including the AST nodes of your program) writes a flag into every object, which dirties their memory pages and breaks COW.
      • The pre-forking model is widely used in RoR: Passenger, Unicorn, resque.
  23. 23. Bitmap requirements
      We need to locate the flag in the bitmap for an object on the heap (and vice versa) in constant time.
      This can be accomplished by converting one address to the other with just bit operations, provided we can allocate aligned memory.
      Ruby doesn’t have its own memory allocator and relies on the OS malloc, and it runs on many different platforms, so aligned allocation has to be built on top of plain malloc.
  24. 24.
          char* aligned;
          /* over-allocate: room for alignment slack plus one pointer-sized header */
          res = malloc(alignment + size + sizeof(void*));
          aligned = (char*)res + alignment + sizeof(void*);
          /* round down to the alignment boundary (alignment is a power of two) */
          aligned -= ((VALUE)aligned & (alignment - 1));
          /* stash the original malloc pointer just before the aligned block */
          ((void**)aligned)[-1] = res;

          /* when freeing, recover the original pointer from that header */
          free(((void**)ptr)[-1]);
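Why alignment gives constant-time bitmap lookup can be shown on plain integers standing in for pointers. In this sketch the 16KB alignment follows the page size quoted later in the talk; `SLOT_SIZE = 40` is an assumed `sizeof(RVALUE)` for a 64-bit build, and `align_up`, `page_base`, and `bitmap_index` are hypothetical helper names.

```ruby
HEAP_ALIGN_LOG = 14                   # 16KB-aligned pages
HEAP_ALIGN     = 1 << HEAP_ALIGN_LOG
SLOT_SIZE      = 40                   # assumption: sizeof(RVALUE) on 64-bit

# Same arithmetic as the C snippet, applied to an integer "address".
def align_up(raw, alignment, header = 8)
  aligned = raw + alignment + header
  aligned - (aligned & (alignment - 1))
end

# With aligned pages, the page start is found by masking the low bits.
def page_base(addr)
  addr & ~(HEAP_ALIGN - 1)
end

# The slot's position in the page doubles as its bit index in the bitmap.
def bitmap_index(addr)
  (addr - page_base(addr)) / SLOT_SIZE
end

raw  = 0x1000_0123                    # arbitrary unaligned malloc result
base = align_up(raw, HEAP_ALIGN)
addr = base + 5 * SLOT_SIZE           # address of the sixth slot in the page
puts format("page base: %#x, bitmap index: %d", page_base(addr), bitmap_index(addr))
```

No search or lookup table is involved: one mask and one division turn any object address into its page and its mark bit.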
  25. 25. Ruby 2.1 New tuning variables and malloc limits, RGenGC, GC events
  26. 26. Tuning variables
      • RUBY_GC_HEAP_INIT_SLOTS (10000): initial number of slots on the heap. If your app boots with 500k long-lived objects, then increase it to 500k; there is no reason to run GC at boot.
      • RUBY_GC_HEAP_FREE_SLOTS (4096): minimum number of free slots reserved for sweep re-use. Assuming that every request allocates N objects, setting it to N*8 gives you ~8 requests between mark phases.
      • RUBY_GC_HEAP_GROWTH_FACTOR (1.8): factor by which the heap grows. If you increased RUBY_GC_HEAP_INIT_SLOTS and RUBY_GC_HEAP_FREE_SLOTS, then your heap is already big, so you may decrease this one.
      • RUBY_GC_HEAP_GROWTH_MAX_SLOTS (no limit): maximum number of new slots to add.
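To pick N for RUBY_GC_HEAP_FREE_SLOTS, one rough approach is to count allocations around a representative request. The sketch below assumes the `:total_allocated_objects` key of `GC.stat` (Ruby 2.2 and later; 2.1 spelled it `:total_allocated_object`), and `fake_request` is a hypothetical stand-in for real request handling.

```ruby
# Stand-in for handling one request; replace with a real request in your app.
def fake_request
  Array.new(5_000) { |i| "row #{i}" }
end

# Number of objects allocated while the block runs.
def objects_allocated
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

n = objects_allocated { fake_request }
puts "objects per request: #{n}"
puts "suggested RUBY_GC_HEAP_FREE_SLOTS=#{n * 8}"
```

The printed value can then be exported in the environment before the app boots; treat it as a starting point to measure against, not as a recommendation.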
  27. 27. New heap layout
      • Heap consists of pages (instead of slots), and a page consists of slots (RVALUEs).
      • Page size is 16KB (HEAP_SIZE), so one page holds about 408 object slots (HEAP_OBJ_LIMIT).
      • Heap pages are divided into two sub-heaps (lists of pages): eden_heap and tomb_heap.
  28. 28. Eden & Tomb
      eden_heap contains pages with one or more live objects.
      tomb_heap contains pages with no objects, or with only zombies or ghosts (zombies are deferred T_DATA, T_FILE, objects with finalizers, …).
      When a new object is created (newobj_of) and there are no free pages in eden_heap, a page is resurrected from tomb_heap.
      Filling all slots in eden pages first reduces memory fragmentation.
  29. 29. Lazy memory management
      • When eden needs more pages, they are resurrected or allocated one at a time.
      • Lazy sweeping: sweep pages until the first freed page.
  30. 30. Malloc limits
      • GC runs when memory allocation exceeds the malloc limit threshold.
      • @ko1 claims that RUBY_GC_MALLOC_LIMIT was 8MB because “Matz used 10MB machine at 20 years old”.
      • The default malloc limit is now 16MB.
      • Adaptive malloc limits with RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR (1.4) and RUBY_GC_MALLOC_LIMIT_MAX (32MB).
      • Similarly, the memory growth associated with oldgen is tracked separately.
  31. 31. Adaptive malloc limits
      If malloc increase exceeds malloc_limit, then increase malloc_limit by a growth factor:
          if (inc > malloc_limit) {
              malloc_limit = (size_t)(inc * gc_params.malloc_limit_growth_factor);
              if (gc_params.malloc_limit_max > 0 && /* ignore max-check if 0 */
                  malloc_limit > gc_params.malloc_limit_max) {
                  malloc_limit = inc;
              }
          }
      http://rxr.whitequark.org/mri/source/gc.c?v=2.1.0-p0#2880
      If malloc increase doesn’t exceed malloc_limit, then decrease malloc_limit:
          else {
              malloc_limit = (size_t)(malloc_limit * 0.98); /* magic number */
              if (malloc_limit < gc_params.malloc_limit_min) {
                  malloc_limit = gc_params.malloc_limit_min;
              }
          }
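To see how the limit moves over time, the two branches above can be transcribed into Ruby and fed a few sample values. The defaults (16MB min, 32MB max, growth factor 1.4) are the ones quoted on the previous slide; `Params` and `next_malloc_limit` are illustrative names, not part of any Ruby API.

```ruby
MB = 1024 * 1024

Params = Struct.new(:min, :max, :growth_factor)
DEFAULTS = Params.new(16 * MB, 32 * MB, 1.4)

# Returns the malloc_limit to use after a GC, given the current limit and
# the number of bytes malloc'ed since the previous GC (inc).
def next_malloc_limit(limit, inc, params = DEFAULTS)
  if inc > limit
    limit = (inc * params.growth_factor).to_i
    # Over the cap: fall back to inc itself (max-check is skipped when max is 0).
    limit = inc if params.max > 0 && limit > params.max
  else
    limit = (limit * 0.98).to_i            # slow decay while allocation is calm
    limit = params.min if limit < params.min
  end
  limit
end

limit = 16 * MB
[20 * MB, 30 * MB, 1 * MB, 1 * MB].each do |inc|
  limit = next_malloc_limit(limit, inc)
  puts format("inc=%2dMB -> malloc_limit=%.2fMB", inc / MB, limit.to_f / MB)
end
```

A burst of allocation raises the limit quickly, while quiet periods lower it by only 2% per GC, so the limit tracks the application's peak behaviour rather than its average.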
  32. 32. Python GC
      • Reference-counting GC.
      • Reference counting alone can’t reclaim cyclic data structures.
      • A generational cycle collector reclaims cyclic data structures.
      • Python implements three generations: Generation-Zero, Generation-One, Generation-Two.
  33. 33. Weak generational hypothesis
      1. Most young objects die young.
      2. Older objects are likely to remain alive (active) for a long time (e.g. in Ruby, T_CLASS and T_MODULE objects).
  34. 34. RGenGC
      • Two generations: young and old objects.
      • Two GCs:
        • Minor: GC on young space; Mark & Sweep
        • Major: GC on all (both young and old) spaces; Mark & Sweep
  35. 35. Minor GC
      Mark phase:
      • Mark and promote to old generation.
      • Stop traversing after an old object.
      Sweep phase:
      • Sweep objects that are neither marked nor old.
      • Some unreachable objects will not be collected.
  36. 36. Major GC
      Mark phase:
      • Mark reachable objects from roots.
      • Promote newly marked objects to old-gen.
      Sweep phase:
      • Sweep all unmarked objects.
  37. 37. Marking leak
      A new object attached to an old object is unreachable for Minor GC, because traversal stops at old objects; thus it can’t be marked and promoted.
      Objects that are neither marked nor old are swept, so a new object attached to an old object will be swept by Minor GC.
  38. 38. Marking leak fix
      1. Add a Write Barrier (WB) to detect creation of an “old object” to “new object” reference.
      2. Add old objects with references to new objects to the Remember set (Rset).
      3. At mark phase, treat objects from the Remember set as root objects.
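The leak and its fix can be reproduced with a toy two-generation heap. Everything here (`Node`, `ToyGenHeap`, `write_ref`, `run`) is a hypothetical model written for this illustration; MRI's real write barrier is implemented with C macros, not Ruby methods.

```ruby
class Node
  attr_reader :name, :refs
  attr_accessor :old, :marked

  def initialize(name, old: false)
    @name, @refs, @old, @marked = name, [], old, false
  end
end

class ToyGenHeap
  def initialize(use_write_barrier:)
    @objects = []
    @remember_set = []
    @use_write_barrier = use_write_barrier
  end

  def add(node)
    @objects << node
    node
  end

  # All reference stores go through here; the barrier records old -> new edges.
  def write_ref(from, to)
    from.refs << to
    return unless @use_write_barrier && from.old && !to.old
    @remember_set << from unless @remember_set.include?(from)
  end

  # Minor GC: never traverse into old objects, except remembered ones.
  # Returns the names of the swept objects.
  def minor_gc(roots)
    @objects.each { |o| o.marked = false }
    stack = roots + @remember_set.flat_map(&:refs)
    until stack.empty?
      obj = stack.pop
      next if obj.old || obj.marked
      obj.marked = true
      stack.concat(obj.refs)
    end
    swept = @objects.reject { |o| o.old || o.marked }
    @objects -= swept
    @objects.each { |o| o.old = true if o.marked }   # promote survivors
    @remember_set.clear
    swept.map(&:name)
  end
end

def run(use_write_barrier)
  heap  = ToyGenHeap.new(use_write_barrier: use_write_barrier)
  cache = heap.add(Node.new("cache", old: true))   # long-lived, already old
  entry = heap.add(Node.new("entry"))              # freshly allocated
  heap.write_ref(cache, entry)                     # old -> new reference
  heap.minor_gc([cache])
end

puts "without barrier, swept: #{run(false).inspect}"
puts "with barrier, swept:    #{run(true).inspect}"
```

Without the barrier the still-referenced `entry` is swept, which is exactly the marking leak; with the barrier, `cache` lands in the remember set and `entry` survives and is promoted.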
  39. 39. Shady objects
      • Write barriers in Ruby are complicated.
      • Write barriers must work with 3rd-party C extensions.
      • Create objects of two types: Normal objects and Shady objects.
      • Never promote shady objects to old-gen, and mark shady objects on every Minor GC.
  40. 40. Minor GC triggers…
      • Malloc limit exceeded.
      • A new object is being allocated, lazy sweep has completed, there is no free page in eden, and the limit for attaching/allocating new pages has been reached.
  41. 41. Major GC triggers…
      • Malloc limit exceeded.
      • [old] + [remembered] > [all objects count] / 2
      • [old] > [old count at last major gc] * 2
      • [shady] > [shady count at last major gc] * 2
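The Major GC conditions read naturally as a predicate. The sketch below uses a hypothetical `Counts` struct for the counters named on the slide; MRI keeps its equivalents inside its internal object space structure, and the reason symbols are made up for this example.

```ruby
# Hypothetical snapshot of the counters named on the slide.
Counts = Struct.new(:total, :old, :remembered, :shady,
                    :old_at_last_major, :shady_at_last_major)

# Returns the list of reasons a major GC would be triggered (empty if none).
def major_gc_reasons(c, malloc_limit_exceeded: false)
  reasons = []
  reasons << :malloc_limit        if malloc_limit_exceeded
  reasons << :old_plus_remembered if c.old + c.remembered > c.total / 2
  reasons << :old_doubled         if c.old > c.old_at_last_major * 2
  reasons << :shady_doubled       if c.shady > c.shady_at_last_major * 2
  reasons
end

calm = Counts.new(100_000, 30_000, 5_000, 1_000, 20_000, 900)
busy = Counts.new(100_000, 45_000, 10_000, 2_000, 20_000, 900)
puts "calm heap: #{major_gc_reasons(calm).inspect}"
puts "busy heap: #{major_gc_reasons(busy).inspect}"
```

All three object-count conditions compare against growth since the last major GC, so a heap that is large but stable does not keep paying for full collections.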
  42. 42. Ruby 2.2 Symbol GC, 2-age promotion strategy for RGenGC, Incremental GC
  43. 43. Symbol GC
      Collects dynamic symbols, such as those created by #to_sym, #intern, …
          obj = Object.new
          100_000.times do |i|
            obj.respond_to?("sym#{i}".to_sym)
          end
          GC.start
          puts "symbol : #{Symbol.all_symbols.size}"

          % time ruby -v symbol_gc.rb
          ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
          symbol : 102399
          ruby -v symbol_gc.rb 0,29s user 0,05s system 89% cpu 0,386 total

          % time ruby -v symbol_gc.rb
          ruby 2.2.0dev (2014-05-15 trunk 45954) [x86_64-darwin13]
          symbol : 2406
          ruby -v symbol_gc.rb 0,34s user 0,05s system 90% cpu 0,426 total
  44. 44. 3-gen RGenGC
      Promote an infant object to old after it survives two GC cycles: infant → young → old
      3-gen RGenGC might be more suitable for web apps, because if a 2-gen GC runs during a request then most of the objects created in the request will be marked as old, but they will become unreachable immediately after the request finishes.
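The effect of the extra age step can be shown with a deliberately simple model: one request allocates 1,000 objects, a single minor GC fires mid-request, and then the request ends. The `simulate` helper and its hash-based "objects" are assumptions made for this sketch.

```ruby
# promote_after: how many GCs an object must survive before it becomes old.
def simulate(promote_after:)
  request_objects = Array.new(1_000) { { age: 0 } }

  # One minor GC fires while the request is still running: everything survives.
  request_objects.each { |o| o[:age] += 1 }

  # The request ends, so all of its objects are now unreachable.
  # A minor GC can only reclaim objects that were not promoted to old.
  old, not_old = request_objects.partition { |o| o[:age] >= promote_after }
  { reclaimed_by_minor_gc: not_old.size, stuck_until_major_gc: old.size }
end

two_gen   = simulate(promote_after: 1)   # old after surviving one GC
three_gen = simulate(promote_after: 2)   # infant -> young -> old
puts "2-gen: #{two_gen}"
puts "3-gen: #{three_gen}"
```

With promotion after one survival, every request object is stuck until the next major GC; with the infant → young → old scheme, the next minor GC can still reclaim all of them.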
  45. 45. GCs comparison
  46. 46. Rails app emulation
          @retained = []
          @rand = Random.new(999)

          MAX_STRING_SIZE = 200

          def stress(allocate_count, retain_count, chunk_size)
            chunk = []
            while retain_count > 0 || allocate_count > 0
              if retain_count == 0 || (@rand.rand < 0.5 && allocate_count > 0)
                chunk << " " * (@rand.rand * MAX_STRING_SIZE).to_i
                allocate_count -= 1
                if chunk.length > chunk_size
                  chunk = []
                end
              else
                @retained << " " * (@rand.rand * MAX_STRING_SIZE).to_i
                retain_count -= 1
              end
            end
          end

          start = Time.now

          # simulate rails boot, 2M objects allocated 600K retained in memory
          stress(2_000_000, 600_000, 200_000)

          # simulate 100 requests that allocate 200K objects each
          stress(20_000_000, 0, 200_000)

          puts "Duration: #{(Time.now - start).to_f}"

          puts "RSS: #{`ps -eo rss,pid | grep #{Process.pid} | grep -v grep | awk '{ print $1; }'`}"
      https://github.com/tukan/gc_keynote_code/blob/master/rails_emulation_test.rb
  47. 47. Questions?
