Parallel worlds of CRuby's GC
I gave this presentation at RubyConf 2011. Yay!

    Presentation Transcript

    • Parallel worlds of CRuby's GC. nari / Narihiro Nakamura / @nari_en, Network Applied Communication Laboratory Ltd. (Powered by Rabbit 0.9.3)
    • I'm very happy now.
    • Today is my first presentation in English.
    • My English is not good.
    • But I'll do my best. Please bear with me :)
    • Self introduction
    • Ice-cream factory ✓ I worked on an assembly line. ✓ For example, I made many cardboard boxes. ✓ I was a professional cardboard box maker :)
    • Ice-cream factory ✓ I made 150 boxes per hour (ZOMG)
    • I was like a machine!! http://www.flickr.com/photos/kevincollins123/5887984753/
    • Working with Java ✓ I worked at a big company. ✓ This work was similar to assembly line work. ✓ I made one part of a product. I didn't understand the whole product.
    • I was still like a machine!! http://www.flickr.com/photos/kevincollins123/5887984753/
    • My current work ✓ Currently, I work at NaCl. ✓ matz, shyouhei, and takaokouji are my co-workers. ✓ shugo is my boss. ✓ They are CRuby committers.
    • When I started Ruby programming ✓ I felt free. ✓ This work wasn't similar to assembly line work. ✓ I could make the whole product.
    • I was no longer a machine!! http://www.flickr.com/photos/danzden/121379782/
    • Garbage Collection for me ✓ GC technology is very interesting to me. ✓ GC is a garbage collecting machine. ✓ I've been creating it since then. It's very fun!!
    • I'm making a machine!!
    • My relationship to GC
    • I'm a CRuby committer ✓ I work on GC.
    • And I wrote a book about GC.
    • But it's only in Japanese :(
    • And I've been creating GC with RDD.
    • What is RDD?
    • RDD = RubyKaigi Driven Development
    • My RDD history ✓ LazySweepGC - RubyKaigi2008 ✓ LonglifeGC - 2009 ✓ LazySweepGC - 2010 ✓ ParallelMarkingGC - 2011
    • LonglifeGC ✓ It treats long-lived objects as a special case. ✓ It is similar to generational GC. ✓ LonglifeGC was rejected for CRuby 1.9.2 for some reason. ✓ :(
    • But LonglifeGC has been used in Kiji :-) http://www.flickr.com/photos/conifer/2389654222/
    • Kiji ✓ Kiji is an optimized version of REE by Twitter developers. ✓ The Twitter team substantially extended LonglifeGC. ✓ It's cool!!
    • But Kiji will be rejected also... :(
    • My RDD history ✓ LazySweepGC - RubyKaigi2008 ✓ LonglifeGC - 2009 ✓ LazySweepGC - 2010 ✓ ParallelMarkingGC - 2011
    • LazySweepGC ✓ Traditional M&S GC executes mark and sweep atomically. ✓ The Ruby application stops during GC (stop-the-world). ✓ In lazy sweeping, the sweep is not done all at once.
    • LazySweepGC ✓ Each object allocation sweeps Ruby's heap ✓ until it finds an appropriate free object.
    • Improvements ✓ This improves the response time of GC, ✓ i.e. the worst-case pause time of GC decreases.
    • LazySweepGC ✓ You can use LazySweepGC since Ruby 1.9.3.
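
The lazy sweeping idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Slot`, `ToyHeap`, and the fixed-size slot list are hypothetical names and are not CRuby's actual data structures. The point it shows is that `allocate` advances a sweep cursor only until it finds one reusable slot, instead of sweeping the whole heap in one pause.

```python
class Slot:
    """One heap cell (hypothetical layout, not CRuby's RVALUE)."""
    def __init__(self):
        self.marked = False
        self.value = None


class ToyHeap:
    def __init__(self, size):
        self.slots = [Slot() for _ in range(size)]
        self.sweep_pos = 0  # everything before this index is already swept

    def mark(self, live_slots):
        """Stand-in for the mark phase: flag live slots, restart the sweep."""
        for slot in live_slots:
            slot.marked = True
        self.sweep_pos = 0

    def allocate(self, value):
        """Sweep only as far as needed to find one reusable slot."""
        while self.sweep_pos < len(self.slots):
            slot = self.slots[self.sweep_pos]
            self.sweep_pos += 1
            if slot.marked:
                slot.marked = False  # survivor: clear the mark for the next cycle
                continue
            slot.value = value       # unmarked slot is free or dead: reuse it
            return slot
        return None  # nothing left to sweep; a real GC would mark again or grow the heap
```

With a four-slot heap that is full, marking two slots as live and then allocating reuses the first dead slot and stops there; the rest of the heap is swept by later allocations.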
    • My RDD history ✓ LazySweepGC - RubyKaigi2008 ✓ LonglifeGC - 2009 ✓ LazySweepGC - 2010 ✓ ParallelMarkingGC - 2011
    • Today's topics ✓ Why do we need Parallel Marking? ✓ What to consider? ✓ How to implement? ✓ How much did performance improve?
    • Why do we need Parallel Marking?
    • This is CRuby's current GC.
    • CRuby's current GC ✓ GC operates on only 1 core. ✓ In a multi-core environment, the other cores don't help GC.
    • GC: "I'm alone, it's so hard." http://www.flickr.com/photos/hortont/2698261070/
    • We should run GC in parallel!! http://www.flickr.com/photos/knallaerbse/2863161933/
    • First, let me explain a few GC-related concepts.
    • What is GC? ✓ GC collects all dead objects.
    • What is a dead object? ✓ A dead object is an object that will never be referenced by the program again. ✓ In GC terms, we say that a dead object is unreachable from Roots.
    • What is Roots? ✓ Roots is a set of pointers that directly reference objects in the program. ✓ e.g. Ruby's local variables, etc.
    • For example
    • Please remember that ✓ GC collects objects that are unreachable from Roots.
    • Next, let me explain the current CRuby GC algorithm.
    • CRuby's GC algorithm summary ✓ CRuby adopts the Mark & Sweep algorithm. ✓ The collector works in separate Mark and Sweep phases.
    • In the Mark phase ✓ the collector marks live objects that are reachable from Roots.
    • For example
    • Mark phase with GC.start
    • Ruby heap after marking
    • In the Sweep phase ✓ the collector sweeps "dead" objects ✓ "dead" means unmarked ✓ "dead" means unreachable from Roots
    • Sweep phase
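
The diagrams for these slides are not in the transcript, so here is a minimal Python sketch of the two phases. It is a toy model, not CRuby's implementation: `Obj`, `mark`, `sweep`, and `collect` are names made up for the illustration, and the "heap" is just a list.

```python
class Obj:
    """A toy heap object with outgoing references and a mark bit."""
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Mark phase: flag every object reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue          # already visited (also handles cycles)
        obj.marked = True
        stack.extend(obj.refs)


def sweep(heap):
    """Sweep phase: keep marked objects, drop unmarked ones, reset marks."""
    live = [obj for obj in heap if obj.marked]
    for obj in live:
        obj.marked = False
    return live


def collect(heap, roots):
    """One stop-the-world cycle: mark, then sweep."""
    mark(roots)
    return sweep(heap)
```

Objects that only reference each other in a cycle, with no path from the roots, stay unmarked and are swept.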
    • Characteristics of CRuby's GC
    • Characteristics ✓ Stop-the-world algorithm ✓ Single-threaded execution
    • Recently, PCs have multi-core processors. But ✓ GC executes on a single thread. ✓ The other cores don't work during GC. ✓ What a waste!!
    • How can we fix this?
    • Use Parallel Marking, Luke
    • What is Parallel Marking? ✓ The collector runs several marking processes in parallel ✓ by using native threads. ✓ We will be happy on multi-core machines.
    • Flow diagram for Parallel Marking
    • BTW: Why not perform sweeping in parallel?
    • Why not perform sweeping in parallel ✓ Sweeping is much faster than marking. ✓ You can see ko1's research ✓ <URL:http://www.atdot.net/~ko1/diary/201011.html#d4>
    • Why not perform sweeping in parallel ✓ So, Mark phase improvement = GC improvement ✓ And we already have lazy sweeping.
    • Today's topics ✓ Why do we need Parallel Marking? ✓ What to consider? ✓ How to implement? ✓ How much did performance improve?
    • What to consider when implementing Parallel Marking?
    • We should consider two problems ✓ Workload balancing ✓ Wait-free algorithm
    • Workload balancing
    • How can we divide the marking task into sub-tasks?
    • I tried to think of a simple approach.
    • 1 branch of Roots is marked by 1 thread.
    • This means.. ✓ Tasks are distributed to multiple threads. ✓ The task of marking the entire heap is divided into several tasks, each marking a single branch.
    • This seems to be no problem.
    • But actually, this solution suffers from the workload problem.
    • Each thread doesn't know what the other threads are doing.
    • For instance, if A and B finish work early,
    • then they will stop doing anything :(
    • I think "machines should work forever" :D
    • So, I think A and B should ...
    • http://www.flickr.com/photos/ryanr/157458385/
    • Parallel Marking with Task Stealing.
    • If A and B finish work early,
    • This is called "Task Stealing"
    • We should consider two problems ✓ Workload balancing ✓ Wait-free algorithm
    • Wait-free algorithm
    • What does "wait-free" mean? ✓ A wait-free program does non-blocking execution. ✓ It guarantees per-thread progress.
    • Why is wait-free important?
    • Amdahl's law
    • Amdahl's law is used to find the maximum expected improvement to an overall system when only part of the system is improved. [cited from "Amdahl's law" - Wikipedia]
    • Amdahl's law is used in parallel computing ✓ If the parallel portion of the system is X% ✓ and the number of processors is Y, ✓ how much speedup can we expect?
    • It's worse than expected, right?
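
The slides that showed the numbers are not part of this transcript. As a hedged illustration of the question, the usual statement of Amdahl's law is speedup = 1 / ((1 - P) + P / N), where P is the parallel fraction and N is the number of processors. The function name below is mine, not from the talk.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Even a small serial part caps the gain well below the processor count.
    for fraction in (0.5, 0.8, 0.95):
        for cores in (2, 4, 16):
            print(f"P={fraction:.2f} N={cores:2d} -> {amdahl_speedup(fraction, cores):.2f}x")
```

For example, with 80% of the work parallel and 4 processors the bound is 1 / (0.2 + 0.2) = 2.5x, not 4x, which is why shrinking the non-parallel part matters so much.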
    • The conclusion so far ✓ We should consider how we can efficiently balance workloads. ✓ So, we use Task Stealing. ✓ We should eliminate non-parallel parts ✓ by using a wait-free algorithm.
    • Today's topics ✓ Why do we need Parallel Marking? ✓ What to consider? ✓ How to implement? ✓ How much did performance improve?
    • How to implement Parallel Marking?
    • Task Stealing ✓ In Task Stealing, threads steal tasks from each other. ✓ Task Stealing is achieved with Arora's Deque.
    • Arora's Deque ✓ Deque stands for Double-Ended Queue. ✓ In Arora's Deque, the deque contains tasks as elements. ✓ It's a wait-free data structure.
    • Arora's Deque has only three operations.
    • Each mark worker has a single deque.
    • Only the owner can call pop() and push().
    • A worker can call shift() to steal from another worker's deque.
    • "Hey wait a minute, doesn't shift() have contention problems?"
    • In what ways could shift() cause contention problems? e.g. ✓ Multiple threads (workers) may call shift() on the same deque at the same time.
    • In what ways could shift() cause contention problems? e.g. ✓ shift() and pop() could be called at the same time ✓ when the deque has only one element.
    • But Arora's Deque avoids these contention problems.
    • Serialization ✓ shift() is serialized by using CAS. ✓ CAS = Compare And Swap ✓ And this serialization doesn't use a lock. ✓ It's wait-free!!
    • I omit the details of the implementation of the serialization.
    • For the sake of this presentation, let's assume that Arora's Deque avoids contention problems.
    • Summary for Arora's Deque ✓ A simple data structure for Task Stealing. ✓ Each worker has a single deque. ✓ Stealing (the shift operation) is wait-free!
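
The talk skips the serialization details, so the following is only a rough sketch of how the three operations could fit together, loosely following the published Arora, Blumofe, and Plaxton work-stealing deque as I understand it. Two assumptions to keep in mind: Python has no hardware compare-and-swap, so `AtomicCell` emulates CAS with a lock (this version is therefore not itself lock-free), and the class names are hypothetical. Only the operation names push(), pop(), and shift() come from the slides.

```python
import threading


class AtomicCell:
    """Emulated atomic word. Assumption: a lock stands in for a hardware CAS."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def store(self, value):
        with self._lock:
            self._value = value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


class StealDeque:
    def __init__(self, capacity=1024):
        self._items = [None] * capacity
        self._bottom = 0                    # written only by the owner
        self._age = AtomicCell((0, 0))      # (tag, top): the word thieves CAS on

    def push(self, task):                   # owner only
        if self._bottom >= len(self._items):
            raise OverflowError("deque is full")
        self._items[self._bottom] = task
        self._bottom += 1

    def pop(self):                          # owner only
        if self._bottom == 0:
            return None
        self._bottom -= 1
        task = self._items[self._bottom]
        old_age = self._age.load()
        tag, top = old_age
        if self._bottom > top:
            return task                     # other tasks sit above top: no thief wants this one
        # This is the last task: reset the deque and race any thief for it.
        is_last = self._bottom == top
        new_age = (tag + 1, 0)              # the tag bump makes stale thief CASes fail
        self._bottom = 0
        if is_last and self._age.compare_and_swap(old_age, new_age):
            return task
        self._age.store(new_age)
        return None                         # a thief got there first

    def shift(self):                        # any worker may call this to steal
        old_age = self._age.load()
        tag, top = old_age
        if self._bottom <= top:
            return None                     # nothing to steal
        task = self._items[top]
        if self._age.compare_and_swap(old_age, (tag, top + 1)):
            return task
        return None                         # lost the race to another thief or the owner
```

Both contention cases from the slides end in a single CAS on the `(tag, top)` word: two thieves calling shift() race on it, and so do shift() and pop() when only one element is left, so exactly one caller gets the task.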
    • How to use Arora's Deque in Parallel Marking?
    • First try: A task is an object.
    • Let's say that worker A has a branch that is composed of 4 objects.
    • We start by marking A and pushing it to the deque.
    • pop A, mark B and C, push B and C.
    • pop C, mark D, push D
    • pop D, pop B
    • This is a branch marking.
    • How do you steal?
    • Suppose that Worker1 has tasks B and C. Worker2 has no task.
    • Worker2 steals task B from Worker1 by using shift().
    • Summary ✓ The marker uses Arora's Deque as a marking stack. ✓ A "task" means an object. ✓ The granularity of the task is very fine. ✓ This is a naive implementation.
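
The four-object walk-through above can be replayed with a short Python sketch. It is single-threaded and uses a plain list in place of the owner's end of the deque. The object graph (A references B and C, C references D) is my reading of the steps on the slides, and `mark_branch` is a hypothetical name.

```python
def mark_branch(root, children):
    """Mark one branch, treating every object as its own task.

    Returns the set of marked objects and the order in which tasks were popped.
    """
    marked = {root}
    tasks = [root]            # owner's end of the deque: push = append, pop = pop
    pop_order = []
    while tasks:
        obj = tasks.pop()
        pop_order.append(obj)
        for child in children.get(obj, []):
            if child not in marked:
                marked.add(child)     # mark first, then push as a new task
                tasks.append(child)
    return marked, pop_order


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "C": ["D"]}
    marked, pop_order = mark_branch("A", graph)
    print(sorted(marked), pop_order)
```

Every object costs one push and one pop, which is the fine granularity the summary slide points out.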
    • I implemented this approach.
    • But..
    • It's slower than the original GC.
    • OMG... http://www.flickr.com/photos/emariephotos/4958245676/
    • I fell into the Pitfalls of Parallel Processing (PPP!!!)
    • Why slow? ✓ pop(), push(), and shift() are called frequently ✓ because the deque has fine-grained tasks. ✓ Their overhead is too big.
    • How to fix this?
    • We can make the tasks less fine-grained.
    • A task is a branch
    • All branches in Roots are divided roughly among the deques.
    • Each worker marks a branch in its deque.
    • When the deque is empty, the worker steals a branch from another worker.
    • Like this!!
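
Since the diagram is missing from the transcript, here is a hedged Python sketch of the branch-as-task scheme. It is not the author's C implementation: it uses `collections.deque` (whose `append`, `pop`, and `popleft` are thread-safe in CPython) in place of Arora's Deque, guards the mark set with a lock for simplicity, and all function names are invented for the example.

```python
import threading
from collections import deque


def parallel_mark(root_branches, children, num_workers=2):
    """Mark everything reachable from the root branches using work stealing."""
    # Deal the root branches out round-robin, one deque per worker.
    deques = [deque() for _ in range(num_workers)]
    for i, branch in enumerate(root_branches):
        deques[i % num_workers].append(branch)

    marked = set()
    marked_lock = threading.Lock()  # simplification; not part of the talk's design

    def mark_branch(root):
        # A whole branch is one task, traversed with a worker-local stack.
        stack = [root]
        while stack:
            obj = stack.pop()
            with marked_lock:
                if obj in marked:
                    continue
                marked.add(obj)
            stack.extend(children.get(obj, ()))

    def next_branch(me):
        try:
            return deques[me].pop()              # own deque first
        except IndexError:
            pass
        for victim in range(num_workers):        # own deque is empty: try to steal
            if victim == me:
                continue
            try:
                return deques[victim].popleft()  # take from the opposite end
            except IndexError:
                continue
        return None

    def worker(me):
        while True:
            branch = next_branch(me)
            if branch is None:
                return                           # nothing left anywhere
            mark_branch(branch)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return marked
```

Because branches are only handed out up front and never pushed later, a worker can safely stop once every deque is empty. The cost is the one the next slides describe: a worker stuck on one huge branch cannot share it.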
    • Good point & Bad point ✓ The number of calls to the Deque's operations was reduced. ✓ The marking speed of the worker is improved. ✓ However, coarse-grained tasks decrease parallelism.
    • Why do coarse-grained tasks decrease parallelism?
    • Tasks may involve a large branch.
    • If an object in B's branch has many child objects..
    • .. then A can't steal it while B is marking the large branch.
    • So, the worker needs to treat large branches as special cases.
    • Almost all large branches hold large Array objects and/or large Hash objects.
    • Treatment for large Array objects and Hash objects ✓ Each marker has a special deque to manage them. ✓ A marker divides them into fixed-size tasks. ✓ e.g. elements 0-9 of an Array, elements 10-19 of an Array...
    • Treatment for large Array and Hash ✓ By doing this, other workers can steal the divided tasks. ✓ This improves parallelism.
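
Dividing a large container into fixed-size tasks could look like the sketch below. The chunk size of 10 mirrors the "0-9, 10-19" example on the slide. The function name and the half-open `(start, end)` range representation are assumptions made for the illustration.

```python
def split_into_tasks(length, chunk_size=10):
    """Split indices 0..length-1 into half-open (start, end) ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, length))
            for start in range(0, length, chunk_size)]


if __name__ == "__main__":
    big_array = list(range(25))
    for start, end in split_into_tasks(len(big_array)):
        # Each range is an independent task that another worker could steal.
        print(f"mark elements {start}-{end - 1}")
```

Each range can be pushed to the special deque as its own task, so an idle worker can steal part of a large Array instead of waiting for its owner to finish the whole thing.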
    • Summary ✓ The naive implementation was slow. ✓ The grain of the task was too fine. ✓ Now a "task" means a branch in Roots. ✓ The grain of the task is coarse. ✓ It's faster!!
    • Today's topics ✓ Why do we need Parallel Marking? ✓ What to consider? ✓ How to implement? ✓ How much did performance improve?
    • How much did performance improve?
    • These are my machine specs ✓ My machine has only 2 cores ✓ Memory: 8GB ✓ OS: Linux
    • Parallel marking uses 4 marking threads.
    • First benchmark program is ✓ make benchmark ✓ This is the benchmark used in CRuby development.
    • Why does this seem so slow? ✓ I think it's affected by Parallel Marking's preparation, ✓ e.g. creating marking threads, allocating deques.
    • Why does this seem so slow? ✓ In most of the benchmarks, there are few objects to mark. ✓ In this case, the cost of Parallel Marking is expensive.
    • Next benchmark program is ✓ make rdoc ✓ make rdoc generates the Ruby documentation. ✓ This benchmark measures the total execution time and the GC execution time of make rdoc.
    • make rdoc ✓ It takes about 80 seconds on my machine. ✓ In fact, 30% of that time is spent on GC!! ✓ How much did performance improve?
    • Total GC time is improved by 40%!
    • So fast!!
    • In a many-core environment ✓ I expect we get a large improvement. ✓ e.g. 8 cores, 16 cores... ✓ But my machine has just 2 cores. ✓ I can't see it :(
    • Best case for Parallel GC ✓ If there are many objects. ✓ In this case, there are also many mark targets. ✓ If the objects are long-lived. ✓ Server-side applications?
    • Demo
    • Demonstration ✓ I want to show the performance improvement with Parallel GC. ✓ This demonstration is in the style of a video game.
    • Let me explain this game.
    • And the character has HP.
    • When GC runs,
    • the character loses HP while waiting for the GC to finish.
    • We must reach the goal before HP runs out.
    • Other characteristics of SUPER NARIO GC ✓ GC runs at fixed intervals. ✓ A lot of objects are generated to increase GC's burden. ✓ Burden = Game Level
    • Let's compare Original GC and Parallel GC ✓ Original GC pause time is long. ✓ The game will be difficult. ✓ Parallel GC pause time is short. ✓ The game will be easy.
    • OK, let's try!
    • DEMO: Original GC version
    • Oops.. so difficult!!!
    • DEMO: Parallel GC version
    • Wow!! Easy!!!!
    • Let's compare average GC times
    • Fast!!
    • Remaining Problems
    • Windows is not supported ✓ The mark worker uses pthread for native threads ✓ and uses some gcc built-in functions. ✓ But I'll support Windows eventually.
    • Increased memory usage. ✓ The size of 1 Deque is roughly 32KB. ✓ But generally, multi-core machines have plenty of memory. ✓ So, I think it's OK :P
    • Conclusion ✓ I implemented Parallel Marking GC. ✓ GC was improved! ✓ I'll report to ruby-core soon.
    • Conclusion ✓ But Parallel Marking has some problems. ✓ I'll fix these.
    • Source code ✓ Parallel Marking GC ✓ <URL:https://github.com/authorNari/ruby/tree/pmark_div_root2> ✓ SUPER NARIO GC ✓ <URL:https://github.com/authorNari/nario/>
    • Acknowledgments ✓ The following people helped me make this presentation!! ✓ Tor-san!! ✓ matz, shugo, yhara, sada, takaokouji, other co-workers!!
    • Thank you!!!
    • Do you have any questions? Please ask short and simple questions :)
    • Sorry ✓ It's too difficult for me to understand/answer the question. ✓ Could you send the question on Twitter (@nari_en)?