Python and Ruby VMs


Moscow Big Systems/Big Data, April 2013 meetup presentation slides

Published in: Technology

  1. Python and Ruby VMs: CPython and Matz's Ruby Implementation details
  2. Why should you care about Ruby
     ● Opscode Chef
     ● Puppet
     ● VMware Cloud Foundry
     ● Red Hat OpenShift
     ● Redmine
  3. Why should you care about Python
     ● OpenStack
     ● Mercurial
     ● Bazaar
  4. Matz's Ruby Implementation (MRI) / Yet another Ruby VM (YARV)
  5. Matz's Ruby Implementation (MRI) / Yet another Ruby VM (YARV) outline
     ● Memory management
       ○ Automatic, full heap mark-sweep GC
     ● Execution model
       ○ Bytecode interpretation (stack machine) from 1.9 (YARV)
       ○ Direct AST interpretation before 1.9 (MRI)
     ● Concurrency
       ○ Multi-threaded, one active interpreter thread at a time
       ○ Green threads before 1.9 (MRI), OS-level threads in 1.9 (YARV)
     ● Method calls
       ○ Late binding: the method is looked up by name in the class dict
  6. Typical interpreter execution model
     [Diagram. Labels: Script; Parsing; AST; Bytecode generation; Instruction a, Instruction b, Instruction c, with the currently executed instruction marked; a=1, a=2; Heap; Interpreter thread stacks]
  7. GIL ownership diagram
     [Timeline diagram. Thread 1 states: IO, Interpreting, IO, Waiting, Interpreting. GIL states: Free, Owned by Thread 1, Owned by Thread 2. Thread 2 states: IO, Interpreting, IO, Waiting]
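The ownership pattern in the diagram can be imitated with an ordinary lock. The sketch below is a toy model, not the interpreter's actual GIL code: `gil`, `worker`, and `run_demo` are hypothetical names, and a short `time.sleep` stands in for IO. A thread holds the lock only while "interpreting", so at most one thread is ever in that state.

```python
import threading
import time

gil = threading.Lock()   # stand-in for the interpreter lock (toy model)
active = 0               # number of threads currently "interpreting"
max_active = 0           # highest value of `active` ever observed
events = []              # log of (thread name, state), appended only under the lock


def worker(name, rounds):
    global active, max_active
    for _ in range(rounds):
        time.sleep(0.001)            # "IO": the lock is not held here
        with gil:                    # "Waiting" until acquired, then "Interpreting"
            active += 1
            max_active = max(max_active, active)
            events.append((name, "interpreting"))
            active -= 1


def run_demo(rounds=5):
    """Run two workers and return (max simultaneous interpreters, total turns)."""
    global active, max_active
    active, max_active = 0, 0
    events.clear()
    threads = [
        threading.Thread(target=worker, args=(f"Thread {i}", rounds))
        for i in (1, 2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active, len(events)
```

Calling `run_demo()` reports how many threads were ever interpreting at once and how many interpreting turns were logged in total.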
  8. MRI memory allocation diagram
     [Diagram. Labels: Free list 1, Free list 2, Object pool 1, Object pool 2, RString, RArray, data, data, Heap]
  9. MRI memory allocation
     ● Every Ruby object is allocated on the heap (even local variables)
     ● Slab-like allocation for Ruby objects
       ○ A C union is used, hence all objects are of the same size (40 bytes)
       ○ Unlike a typical slab allocator, there is only one size of object to store
     ● RString, RArray, RHash, etc. hold a pointer to an external memory block containing the actual contents
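The fixed-size slot scheme described on this slide can be modelled in a few lines. This is a minimal sketch in Python rather than MRI's C code: `SlotHeap` and `SLOTS_PER_POOL` are hypothetical, and only the 40-byte slot size comes from the slide. Every object takes one slot, freed slots go back on a free list, and a new pool is added when the free list is empty.

```python
SLOT_SIZE = 40       # bytes per slot, as stated on the slide
SLOTS_PER_POOL = 4   # hypothetical and tiny, for illustration only


class SlotHeap:
    """Toy slab-like allocator with a single slot size."""

    def __init__(self):
        self.pools = []       # each pool is a list of slots
        self.free_list = []   # (pool index, slot index) pairs
        self.add_pool()

    def add_pool(self):
        self.pools.append([None] * SLOTS_PER_POOL)
        pool_index = len(self.pools) - 1
        self.free_list.extend((pool_index, s) for s in range(SLOTS_PER_POOL))

    def allocate(self, value):
        if not self.free_list:        # no free slot left: grow by one pool
            self.add_pool()
        pool_index, slot_index = self.free_list.pop()
        self.pools[pool_index][slot_index] = value
        return pool_index, slot_index

    def free(self, ref):
        pool_index, slot_index = ref
        self.pools[pool_index][slot_index] = None
        self.free_list.append(ref)    # the slot becomes reusable

    def bytes_reserved(self):
        return len(self.pools) * SLOTS_PER_POOL * SLOT_SIZE
```

Because all slots have the same size, a freed slot can hold any later object, which is why a plain free list is enough.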
  10. MRI memory allocation (continued)
      ● The external memory block for a string or array is allocated using plain malloc
      ● String content can be shared between several objects (copy on write)
      ● 1.9 changes: small strings (23 bytes or less) are embedded into the RString structure rather than allocated externally
  11. MRI GC
      ● If there is no free slot for an object, GC is run
        ○ If there is still no free slot, a new slab (pool) is allocated
          ■ Unlike in Java, GC is not triggered only when the whole heap is utilized
      ● Stop-the-world mark-sweep GC
        ○ Unlike Java or .NET, there are no generations
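The allocation path on this slide (run GC when no slot is free, add a pool only if GC freed nothing) can be sketched as follows. This is a toy Python model under simplifying assumptions, not MRI's implementation: `Obj`, `ToyHeap`, `roots`, and `slots_per_pool` are hypothetical names, and the heap is a plain list.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing references to other Obj instances
        self.marked = False


class ToyHeap:
    """Toy stop-the-world mark-sweep heap that grows only when GC frees nothing."""

    def __init__(self, slots_per_pool=4):
        self.slots_per_pool = slots_per_pool
        self.capacity = slots_per_pool   # total slots across all pools
        self.objects = []                # occupied slots
        self.roots = []                  # stand-in for globals and stack variables
        self.gc_runs = 0

    def collect(self):
        self.gc_runs += 1
        # Mark: everything reachable from the roots is live.
        stack = list(self.roots)
        while stack:
            obj = stack.pop()
            if not obj.marked:
                obj.marked = True
                stack.extend(obj.refs)
        # Sweep: drop unmarked objects, then clear marks for the next run.
        self.objects = [o for o in self.objects if o.marked]
        for obj in self.objects:
            obj.marked = False

    def allocate(self, name):
        if len(self.objects) == self.capacity:       # no free slot
            self.collect()
            if len(self.objects) == self.capacity:   # GC freed nothing
                self.capacity += self.slots_per_pool # allocate a new pool
        obj = Obj(name)
        self.objects.append(obj)
        return obj
```

In this model the whole heap is traversed on every collection, which corresponds to the slide's point that there are no generations.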
  12. MRI GC (continued)
      ● 1.9.3 changes: lazy sweep GC
        ○ "In Lazy sweeping, each invocation of the object allocation sweeps the heap until it finds an appropriate free object"
          ■ i.e. the allocator just searches for an object marked as dead instead of building free lists
      ● 2.0 changes
        ○ Instead of marking live objects with the FL_MARK flag, an external bitmap is created
          ■ This avoids excessive copying of memory regions in forked processes
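Lazy sweeping as quoted above can be illustrated with a small model: the mark phase still runs in one go, but the sweep advances only as far as each allocation needs. This is a hedged sketch, not Ruby's code; `Slot`, `LazySweepHeap`, and `sweep_pos` are hypothetical names, and the heap has a fixed size for brevity.

```python
class Slot:
    def __init__(self):
        self.value = None
        self.refs = []        # slots referenced by the object stored here
        self.marked = False


class LazySweepHeap:
    """Toy fixed-size heap where each allocation sweeps until it finds a free slot."""

    def __init__(self, size):
        self.slots = [Slot() for _ in range(size)]
        self.roots = []
        self.sweep_pos = 0    # where the next allocation resumes sweeping
        self.mark_runs = 0

    def _mark_phase(self):
        self.mark_runs += 1
        stack = list(self.roots)
        while stack:
            slot = stack.pop()
            if not slot.marked:
                slot.marked = True
                stack.extend(slot.refs)
        self.sweep_pos = 0    # a fresh sweep starts from the beginning

    def _sweep_until_free(self):
        while self.sweep_pos < len(self.slots):
            slot = self.slots[self.sweep_pos]
            self.sweep_pos += 1
            if slot.marked:
                slot.marked = False   # survivor: clear the mark for the next cycle
            else:
                return slot           # dead or never used: reuse it right away
        return None                   # the sweep reached the end of the heap

    def allocate(self, value):
        slot = self._sweep_until_free()
        if slot is None:
            self._mark_phase()
            slot = self._sweep_until_free()
            if slot is None:
                raise MemoryError("toy heap exhausted")
        slot.value, slot.refs = value, []
        return slot
```

No free list is built here: the sweep position itself records how much of the heap has already been reclaimed.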
  13. Real world Ruby usage stories
      ● Twitter switch from Ruby to Scala: http://www.artima.com/scalazine/articles/twitter_on_scala.html
      ● switch from Ruby to Go: http://blog. servers-to-2-go.html
  14. MRI Links
      ● Threads in Ruby discussion: http: // ruby-have-real-multithreading
      ● MRI GC slides: http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/
  15.
  16. CPython VM outline
      ● Memory management
        ○ Automatic, reference counting
      ● Execution model
        ○ Bytecode interpretation (stack machine)
        ○ Maps, lists, tuples are created and managed by bytecode instructions
      ● Concurrency
        ○ Multi-threaded, one active interpreter thread at a time
      ● Method calls
        ○ Late binding: the method is looked up by name in the class dict
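Two points of this outline can be checked from Python itself. The sketch below uses the standard `dis` module to list the bytecode of a small function, and replaces a method in a class at run time to show lookup by name. The helper names (`build_containers`, `opnames`, `late_binding_demo`) are made up for the example, and exact instruction listings differ between CPython versions.

```python
import dis


def build_containers(a, b):
    # A list, a tuple and a dict, each built by a dedicated bytecode instruction.
    return [a, b], (a, b), {a: b}


def opnames(func):
    """Return the names of the bytecode instructions of `func`."""
    return [ins.opname for ins in dis.get_instructions(func)]


def late_binding_demo():
    class Greeter:
        def greet(self):
            return "hello"

    g = Greeter()
    before = g.greet()
    # Replace the entry in the class dict; the existing instance picks it up,
    # because the method is looked up by name at call time.
    Greeter.greet = lambda self: "hi"
    after = g.greet()
    return before, after
```

Running `dis.dis(build_containers)` prints the full listing, where operands are pushed on the evaluation stack before each `BUILD_*` instruction consumes them.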
  17. Python GC
      ● CPython uses reference counting to track object visibility
        ○ CPython uses a global interpreter lock to avoid synchronizing on every reference count operation
      ● Cyclic references
        ○ Example: l = []; l.append(l); del l
        ○ Cyclic references are only possible for "container" objects
      ● The GC for cyclic references has been included since version 2.0 and is enabled by default
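The cycle example from this slide can be run with the standard `gc` module. The sketch below disables automatic collection for the duration of the experiment so that only the explicit `gc.collect()` call can free the cycle. `Node` and the function names are hypothetical, and a small class is used next to the list because plain lists cannot be weakly referenced.

```python
import gc
import weakref


class Node:
    pass


def collect_self_referencing_list():
    """Build the slide's cycle and return how many unreachable objects gc found."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()             # start from a clean state
        l = []
        l.append(l)              # the list now references itself
        del l                    # the name is gone, but the self-reference remains
        return gc.collect()      # the cycle detector finds the list
    finally:
        if was_enabled:
            gc.enable()


def cycle_survives_refcounting():
    """Return (alive after del, alive after gc.collect()) for a self-referencing object."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        node = Node()
        node.self_ref = node
        probe = weakref.ref(node)    # observes the object without keeping it alive
        del node
        alive_after_del = probe() is not None
        gc.collect()
        alive_after_gc = probe() is not None
        return alive_after_del, alive_after_gc
    finally:
        if was_enabled:
            gc.enable()
```

The second function shows the two stages separately: reference counting alone leaves the object alive, and the cycle collector then frees it.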
  18. Search for cyclic references in CPython (generations)
      ● The GC classifies objects into three generations depending on how many collection sweeps they have survived
        ○ New objects are placed in the youngest generation (generation 0)
        ○ If an object survives a collection it is moved into the next older generation
        ○ Since generation 2 is the oldest generation, objects in that generation remain there after a collection
  19. Search for cyclic references in CPython (activation)
      ● When the number of allocations minus the number of deallocations exceeds the first threshold (see gc.get_threshold()), collection starts
        ○ Initially only generation 0 is examined
        ○ If generation 0 has been examined more than the second threshold times since generation 1 was last examined, then generation 1 is examined as well
        ○ The third threshold controls the number of collections of generation 1 before generation 2 is collected
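The activation rules above can be followed step by step with a small simulation. This is a simplified model written for illustration, not CPython's collector: it assumes no deallocations and ignores any extra heuristics a particular CPython version applies. `simulate_collections` is a hypothetical helper; only `gc.get_threshold()` is a real API call.

```python
import gc


def current_thresholds():
    """Return the interpreter's three thresholds as a tuple."""
    return gc.get_threshold()


def simulate_collections(allocations, thresholds):
    """Count, per generation, how often it was the oldest generation examined.

    Assumes every allocation survives and nothing is deallocated.
    """
    counts = [0, 0, 0]
    collections = [0, 0, 0]
    for _ in range(allocations):
        counts[0] += 1
        if counts[0] <= thresholds[0]:
            continue
        # Find the oldest generation whose counter exceeds its threshold.
        oldest = 0
        for gen in (2, 1):
            if counts[gen] > thresholds[gen]:
                oldest = gen
                break
        collections[oldest] += 1
        if oldest < 2:
            counts[oldest + 1] += 1      # one more collection seen by the next generation
        for gen in range(oldest + 1):
            counts[gen] = 0              # examined generations start counting again
    return collections
```

With small thresholds such as `(2, 1, 1)` the pattern is easy to trace by hand: most collections examine only generation 0, and older generations are examined progressively less often.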
  20. Objects with __del__ method in a reference cycle
      ● Which __del__ method should be called first for two objects in a cycle?
        ○ After the first finalizer has run, its object cannot be freed, because the second finalizer may still access it
      ● Cycles that are referenced from objects with finalizers are added to a global list of uncollectable garbage (gc.garbage)
        ○ The program can access the global list and free cycles in a way that makes sense for the application
        ○ This describes CPython before 3.4; since 3.4 (PEP 442) such cycles are collected and their finalizers are called
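The situation on this slide can be reproduced with two objects that reference each other and both define `__del__`. The sketch below assumes CPython 3.4 or later, where PEP 442 lets the collector run both finalizers and free the cycle; on the older interpreters the slide describes, the two objects would instead end up in `gc.garbage`. `Resource` and `collect_cycle_with_finalizers` are hypothetical names.

```python
import gc

finalized = []   # names of objects whose __del__ has run


class Resource:
    def __init__(self, name):
        self.name = name
        self.peer = None

    def __del__(self):
        finalized.append(self.name)


def collect_cycle_with_finalizers():
    """Return (sorted names of finalized objects, Resource objects left in gc.garbage)."""
    finalized.clear()
    a, b = Resource("a"), Resource("b")
    a.peer, b.peer = b, a        # two-object reference cycle
    del a, b
    gc.collect()
    leftovers = [o for o in gc.garbage if isinstance(o, Resource)]
    return sorted(finalized), leftovers
```

The result is sorted because the order in which the two finalizers run is not something the example relies on.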
  21. CPython links
      ● Python GC description: http://arctrix.com/nas/python/gc/
      ● GC module documentation: http://docs.
      ● Python method call description: http://css. callables-0