#rubykaigi 03
nari/@nari3/authorNari
Network Applied Communication Laboratory
2013/05/31
Ruby's GC 2.0
Self-introduction
➔nari, @nari3, authorNari
➔A CRuby committer
➔GC entertainer
➔“Nakamura”
– is the most powerful clan in Ruby World
I went to Cebu in the Philippines
➔I studied English for a month.
➔But I can't speak English....
Because I was always alone
Because I'm shy
Today's Agenda
➔Non-recursive Marking
➔Bitmap Marking
– My work in Ruby 2.0.0
What is GC ?
GC collects all
dead objects
What is a dead
object?
What is a dead object?
➔A dead object is an object that is
no longer referenced by the
program
➔In GC terms, we say that a dead
object is unreachable from Roots
What is Roots?
➔Roots is a set of pointers that
directly reference objects in the
program.
– e.g. Ruby's local variables, etc.
What is GC ?
GC collects objects
that are unreachable
from Roots.
CRuby's GC
Summary
CRuby's GC
➔Mark&Sweep
➔Mark phase
– Mark all live (reachable) objects
➔Sweep phase
– Free all dead (unreachable) objects
– Unmark all marked objects
[Figure: Root references @mami, an array holding 'Leg' and a nested array; the nested array holds 'Body' and another array holding 'Head']
@mami = nil
@mami = ['Leg']
@mami[1] = ['Body']
@mami[1][1] = ['Head']
@mami[1].pop #=> ['Head']
GC.start
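Mark phase
[Figure: starting from Root, a mark is set on each reachable object: the outer array, 'Leg', the inner array, and 'Body'. The popped 'Head' part of the graph is not reached.]
Mark all live (reachable) objects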
Sweep phase
[Figure: the unreachable 'Head' part of the graph is freed, and the marks on the outer array, 'Leg', the inner array, and 'Body' are cleared.]
Free all dead (unreachable) objects
Unmark all marked objects
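The two phases can be sketched with a toy heap. This is only an illustration of the idea, not CRuby's implementation; ToyObject and ToyHeap are hypothetical names, and the object graph mirrors the @mami example.

```ruby
# Toy model of Mark & Sweep; not CRuby's implementation.
# ToyObject and ToyHeap are hypothetical names used only for illustration.
ToyObject = Struct.new(:name, :children, :marked)

class ToyHeap
  attr_reader :objects, :roots

  def initialize
    @objects = []  # every allocated object, live or dead
    @roots = []    # objects directly referenced by the program
  end

  def allocate(name)
    obj = ToyObject.new(name, [], false)
    @objects << obj
    obj
  end

  # Mark phase: mark every object reachable from the given object.
  def mark(obj)
    return if obj.marked
    obj.marked = true
    obj.children.each { |child| mark(child) }
  end

  # Sweep phase: free the unmarked objects, then unmark the survivors.
  def sweep
    dead, live = @objects.partition { |obj| !obj.marked }
    live.each { |obj| obj.marked = false }
    @objects = live
    dead
  end

  def collect
    @roots.each { |root| mark(root) }
    sweep
  end
end

heap = ToyHeap.new
outer      = heap.allocate("outer array")
leg        = heap.allocate("Leg")
inner      = heap.allocate("inner array")
body       = heap.allocate("Body")
head_array = heap.allocate("array holding Head")
head       = heap.allocate("Head")

heap.roots << outer
outer.children.push(leg, inner)
inner.children.push(body, head_array)
head_array.children << head

inner.children.pop  # like @mami[1].pop: the Head array becomes unreachable
freed_names = heap.collect.map(&:name)
```

Here the mark step is recursive, which is the simplest form to read.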
You can buy a GC book at
RubyKaigi!!
with autograph :)
http://d.hatena.ne.jp/mnishikawa/20100508/1273411900
Please don't throw this away
Non-recursive
Marking
Introduction of
Recursive Marking
(a traditional way in CRuby)
Recursive Marking
[Figure: an object graph next to the machine stack; each object visited adds one gc_mark() frame through a recursive call]
A bad case of a
simple recursive call
Recursive Marking
[Figure: a deep object graph; gc_mark() frames pile up on the machine stack until it reaches Max]
Overflow!!
Suddenly SEGV
In order to avoid a
stack overflow,
CRuby adopted ...
Knuth's
Algorithm
Photo: http://www.cs.cuw.edu/museum/History.html
What's Knuth's Algorithm?
➔To avoid a stack overflow
➔There is a fail-safe system which
consists of two stages.
– Using a marking buffer
– Rescanning all objects
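The two stages can be simulated with a toy marker. This is a rough sketch under stated assumptions, not the code CRuby used: max_depth stands in for the machine stack limit, and FailSafeMarker, KNode, and the chosen limits are all hypothetical.

```ruby
# Toy simulation of the two-stage fail-safe; names and limits are hypothetical.
KNode = Struct.new(:name, :children, :marked)

class FailSafeMarker
  attr_reader :rescans

  def initialize(all_objects, max_depth:, buffer_size:)
    @all_objects = all_objects
    @max_depth = max_depth      # stands in for the machine stack limit
    @buffer_size = buffer_size  # capacity of the marking buffer
    @buffer = []
    @overflowed = false
    @rescans = 0
  end

  def mark_from(root)
    mark(root, 0)
    drain_buffer
    # Stage 2: the buffer overflowed, so rescan all objects and
    # continue marking from every object that is already marked.
    while @overflowed
      @overflowed = false
      @rescans += 1
      @all_objects.each { |obj| mark_children(obj, 0) if obj.marked }
      drain_buffer
    end
  end

  private

  def mark(obj, depth)
    return if obj.marked
    obj.marked = true
    mark_children(obj, depth)
  end

  def mark_children(obj, depth)
    if depth >= @max_depth
      # Stage 1: too deep, so defer this object to the marking buffer.
      if @buffer.size < @buffer_size
        @buffer << obj
      else
        @overflowed = true  # no room left; the object is ignored for now
      end
      return
    end
    obj.children.each { |child| mark(child, depth + 1) }
  end

  def drain_buffer
    mark_children(@buffer.pop, 0) until @buffer.empty?
  end
end

# A complete binary tree with 255 nodes, wide enough to overflow the buffer.
nodes = Array.new(255) { |i| KNode.new("n#{i}", [], false) }
nodes.each_with_index do |node, i|
  [2 * i + 1, 2 * i + 2].each { |c| node.children << nodes[c] if c < nodes.size }
end

marker = FailSafeMarker.new(nodes, max_depth: 3, buffer_size: 2)
marker.mark_from(nodes.first)
```

Every object still ends up marked, but each overflow costs a pass over all objects.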
Using a marking buffer
[Figure: object graph A to G; when the gc_mark() frames reach Max on the machine stack, B and C are pushed to the marking buffer instead of being marked recursively]
Avoiding overflow!!
Marking all objects in the
marking buffer at the end
of the mark phase
[Figure: B and C are taken from the marking buffer and rescanned with gc_mark(), which reaches D, E, F, and G]
How do you deal with
an overflow of the
marking buffer?
Rescanning all objects
[Figure: the marking buffer is already full (overflow!!), so the objects that don't fit are ignored; afterwards the objects are rescanned to reach D, E, F, and G]
It's very slow!!
There are two
problems
1. The fail-safe system is slow
➔Rescanning is very slow.
– If you have some deep object graphs,
GC may always be slow because of
rescanning.
2. We can't precisely check
stack overflow
➔There is a trade-off between speed
and precision.
– Marking will be slow if we check stack
overflow in each gc_mark().
– So we checked it at the appropriate
time.
– But, it's not precise.
2. We can't precisely check
stack overflow
➔This causes a SEGV in the worst-case
scenario
– For instance, Fiber sometimes fails
unexpectedly.
– Fiber uses a small machine stack (128 KB)
– At times, checking for stack overflows
doesn't work well with Fiber.
So I decided to say
good bye to Knuth
Non-recursive
Marking
Non-recursive Marking
➔Marking w/o the machine stack
– w/ its own Array-based stack
➔Recursive => Iterative
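The change from recursive to iterative can be shown in plain Ruby. This is a minimal sketch, not CRuby's C code; Cell, build_chain, and the depth are hypothetical. Ruby reports the overflow as a SystemStackError, whereas the C marker risks a SEGV.

```ruby
# Hypothetical toy objects; CRuby's real marker is written in C.
Cell = Struct.new(:child, :marked)

# Build a chain of `depth` cells, each referencing the next one.
def build_chain(depth)
  head = nil
  depth.times { head = Cell.new(head, false) }
  head
end

# Recursive marking: one call frame per level of the object graph.
def mark_recursive(cell)
  return if cell.nil? || cell.marked
  cell.marked = true
  mark_recursive(cell.child)
end

# Non-recursive marking: pending objects live in an Array-based stack,
# so the depth of the graph no longer affects the machine stack.
def mark_iterative(root)
  stack = [root]
  marked_count = 0
  until stack.empty?
    cell = stack.pop
    next if cell.nil? || cell.marked
    cell.marked = true
    marked_count += 1
    stack.push(cell.child)
  end
  marked_count
end

DEPTH = 300_000

recursive_failed =
  begin
    mark_recursive(build_chain(DEPTH))
    false
  rescue SystemStackError
    true  # the deep graph exhausted the stack
  end

iterative_count = mark_iterative(build_chain(DEPTH))
```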
Marking with the marking stack
[Figure: objects from the graph A to G are pushed onto a stack chunk of the marking stack, then popped and marked one by one]
Allocating a new
stack chunk
[Figure: the current stack chunk of the marking stack is full (X X X X), so a new stack chunk is allocated to hold D and E]
Pros and Cons
➔Pros
– Good-bye complex fail-safe systems
– Good-bye SEGV!
➔Cons
– Fast enough?
– There is a risk of allocating a stack
chunk during GC
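A stack built from fixed-size chunks might look like the sketch below. It is an assumption-laden illustration: ChunkedStack, Chunk, and CHUNK_SIZE are hypothetical, and the sketch does not reuse emptied chunks. The allocation inside push is the risk listed under Cons.

```ruby
# Hypothetical sketch of a chunked marking stack; CHUNK_SIZE is arbitrary.
class ChunkedStack
  CHUNK_SIZE = 4
  Chunk = Struct.new(:slots, :next_chunk)

  attr_reader :allocations

  def initialize
    @allocations = 0
    @top = new_chunk(nil)
    @index = 0  # number of used slots in the top chunk
  end

  def push(obj)
    if @index == CHUNK_SIZE
      @top = new_chunk(@top)  # the top chunk is full: allocate during marking
      @index = 0
    end
    @top.slots[@index] = obj
    @index += 1
  end

  def pop
    if @index.zero?
      return nil if @top.next_chunk.nil?  # the whole stack is empty
      @top = @top.next_chunk              # drop the emptied chunk
      @index = CHUNK_SIZE
    end
    @index -= 1
    @top.slots[@index]
  end

  def empty?
    @index.zero? && @top.next_chunk.nil?
  end

  private

  def new_chunk(next_chunk)
    @allocations += 1
    Chunk.new(Array.new(CHUNK_SIZE), next_chunk)
  end
end

stack = ChunkedStack.new
%w[A B C F G].each { |name| stack.push(name) }  # the fifth push needs a new chunk
popped = []
popped << stack.pop until stack.empty?
```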
make benchmark OPTS="-r 5"
https://gist.github.com/authorNari/3806667
[Chart: vm3_gc, origin vs. non-recursive; axis from 0 to 0.9]
bm_gc_deep.rb
https://gist.github.com/authorNari/3812118
[Chart: depth=240 and depth=500, origin vs. non-recursive; time in sec, axis from 0 to 16]
In fact, this patch has been
backported to Ruby 1.9.3
:-)
Bitmap
Marking
Bitmap Marking in CRuby
➔Mark bits are kept separate from
object headers
– to be CoW friendly
➔REE has adopted this approach
– Since 2008
– But we can't import this patch
What's CopyOnWrite?
(Unix)
[Figure: Process 1 (P1) calls fork() to create Process 2 (P2); the page tables of both processes point to the same memory space]
At first, P1 and P2 use
the same memory space.
What's CopyOnWrite?
(Unix)
[Figure: after fork(), a write causes the written memory to be copied, so P1 and P2 each get a region for private use]
If we have many forked processes
[Figure: Process 1, Process 2, Process 3, Process 4, ... start from shared memory; each write makes a private copy (P1, P2, P3, P4, ...)]
Memory usage increases in all forked processes
Marking in the old way
[Figure: the Ruby Heap consists of 16KB heap blocks (HB), HeapBlock 1, HeapBlock 2, ...; each Object in a block carries its own mark bit (mb)]
GC.start
[Figure: marking writes to the mark bits (mb) inside the heap blocks of the Ruby Heap]
This marking is
not CoW friendly!!
[Figure: Process 1 and Process 2 share HB1, HB2, and HB3; GC.start writes to all three heap blocks, so HB1, HB2, and HB3 are all copied]
Bitmap Marking
Mark-bits are separated from the heap
[Figure: each 16KB heap block (HeapBlock 1, HeapBlock 2, ...) in the Ruby Heap has a header, and the mark bits (mb) are stored in a Bitmap outside the blocks]
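The difference in what gets written can be sketched with a toy model. This only counts written regions under simplified assumptions: OBJECTS_PER_BLOCK, the region names, and the single Integer used as the bitmap are all hypothetical.

```ruby
# Toy model that records which memory regions marking writes to.
# The constant and the region names are hypothetical.
OBJECTS_PER_BLOCK = 4

# Old way: each mark bit sits in the object itself, inside its heap block,
# so marking an object writes to that heap block.
def regions_written_with_header_marks(live_indexes)
  live_indexes.map { |i| "HB#{i / OBJECTS_PER_BLOCK + 1}" }.uniq
end

# Bitmap marking: mark bits sit in a separate bitmap (an Integer here),
# so marking writes to the bitmap only and the heap blocks stay untouched.
def mark_with_bitmap(live_indexes)
  bitmap = live_indexes.reduce(0) { |bits, i| bits | (1 << i) }
  { bitmap: bitmap, regions_written: live_indexes.empty? ? [] : ["Bitmap"] }
end

def marked?(bitmap, index)
  bitmap[index] == 1
end

live = [0, 5, 10]  # one live object in each of three heap blocks
old_way = regions_written_with_header_marks(live)
new_way = mark_with_bitmap(live)
```

With one live object per block, the old way writes to every heap block, while the bitmap version writes to the bitmap alone.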
This marking is
CoW friendly!!
[Figure: Process 1 and Process 2 share HB1, HB2, and HB3; GC.start writes only to the bitmaps (BM), so only the bitmaps are copied and the copied memory decreases]
Bitmap Marking
makes prefork servers
happy!
e.g. Unicorn
w/ marking in the old way
[Figure: UP1 (parent) and UP2 (child) share heap blocks HB1 and HB2; the Rails part is read only and the app part is read/write, but GC.start writes to both, so both are copied]
e.g. Unicorn
w/ Bitmap Marking
[Figure: UP1 (parent) and UP2 (child) share heap blocks HB1 and HB2; GC.start writes only to the bitmaps, so the read-only Rails part stays shared and only the read/write app part is copied]
How do you find an
appropriate bit in
Bitmap?
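Finding an appropriate
bit for an object
Allocate a heap block using memory alignment
– 16KB aligned (low 14 bits must be 0)
[Figure: masking an object's address with & ~0x3fff gives the start of its heap block (HB1), where the Header leads to the Bitmap holding the object's mark bit]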
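The address arithmetic can be tried with plain Integers. This is a sketch, not CRuby's C macros: SLOT_SIZE is an assumed object size, and heap_block_base and bit_index are hypothetical helper names.

```ruby
HEAP_ALIGN = 0x4000               # 16KB
HEAP_ALIGN_MASK = HEAP_ALIGN - 1  # 0x3fff, the low 14 bits
SLOT_SIZE = 40                    # assumed size of one object in bytes (hypothetical)

# Clearing the low 14 bits gives the start of the 16KB-aligned heap block,
# which is where the header is found.
def heap_block_base(address)
  address & ~HEAP_ALIGN_MASK
end

# The offset inside the block, divided by the object size, gives the
# position of the object's mark bit in the block's bitmap.
def bit_index(address)
  (address & HEAP_ALIGN_MASK) / SLOT_SIZE
end

object_address = 0x7f0000004000 + 3 * SLOT_SIZE  # a made-up address
base  = heap_block_base(object_address)
index = bit_index(object_address)
```

No lookup table is needed: the alignment alone is enough to get from an object to its block.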
How do you allocate
aligned memory?
Allocating aligned memory
➔Using posix_memalign()
– For Unix-like OS
➔Using _aligned_malloc()
– For Windows OS
– mingw: __mingw_aligned_malloc()
Allocating aligned memory
➔Using malloc()
● Thanks to yugui-san's help!
– For other environments
– For instance, Mac OS X Lion and so on
– It allocates 32KB and returns an address
which is a multiple of 16KB
● 16KB memory space is wasted
● We should use mmap(), but ....
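The malloc() fallback comes down to rounding an address up to the next multiple of 16KB. The sketch below shows only that arithmetic on made-up addresses; align_up and the constants are hypothetical names, and no real memory is allocated.

```ruby
ALIGNMENT = 16 * 1024    # 16KB
ALLOC_SIZE = 32 * 1024   # over-allocate 32KB, as described above

# Round a raw address up to the next multiple of 16KB.
def align_up(raw_address)
  (raw_address + ALIGNMENT - 1) & ~(ALIGNMENT - 1)
end

raw = 0x10001            # a made-up address returned by malloc()
aligned = align_up(raw)

# The aligned 16KB block always fits inside the 32KB allocation.
fits = aligned + ALIGNMENT <= raw + ALLOC_SIZE
wasted = ALLOC_SIZE - ALIGNMENT  # bytes of the allocation left unused
```

Whatever address malloc() returns, half of the 32KB goes unused, which is the waste noted above.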
The structure of
Ruby Heap was
changed.
The structure of Heap in Ruby 1.8
[Figure: heaps, heap blocks with headers, slot, and the Objects they hold]
The structure of Heap in Ruby 2.0
[Figure: heaps with slots, each slot with its own freelist]
Each slot has a freelist
Benchmark
skkzipcode
https://github.com/authorNari/skkzipcode
[Chart: shared memory and private memory, origin vs. bmap; in MB, axis from 0 to 250]
Future
Other plans
➔Introduce new obj_(alloc/free)
events to TracePoint.
➔mmap()/munmap()
rgengc
ko1-san deserves praise!
http://www.flickr.com/photos/recompile_net/4612052730
Conclusion
Conclusion
➔I implemented Non-recursive
Marking and Bitmap Marking.
➔Rgengc is so cooooool!
Thank you!
