Python and Ruby VMs


Moscow Big Systems/Big Data, April 2013 meetup presentation slides

Published in: Technology

  1. Python and Ruby VMs
     CPython and Matz's Ruby Implementation details
  2. Why should you care about Ruby
     ● Opscode Chef
     ● Puppet
     ● VMware Cloud Foundry
     ● Red Hat OpenShift
     ● Redmine
  3. Why should you care about Python
     ● OpenStack
     ● Mercurial
     ● Bazaar
  4. Matz's Ruby Implementation (MRI) / Yet Another Ruby VM (YARV)
  5. Matz's Ruby Implementation (MRI) / Yet Another Ruby VM (YARV) outline
     ● Memory management
       ○ Automatic, full-heap mark-sweep GC
     ● Execution model
       ○ Bytecode interpretation (stack machine) from 1.9 (YARV)
       ○ Direct AST interpretation before 1.9 (MRI)
     ● Concurrency
       ○ Multi-threaded, with one active interpreter thread at a time
       ○ Green threads before 1.9 (MRI), OS-level threads in 1.9 (YARV)
     ● Method calls
       ○ Late binding: the method is looked up by name in the class dictionary
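The late-binding bullet can be illustrated with a minimal sketch. It is written in Python, which the CPython outline later in the deck describes as binding methods the same way. The `lookup` helper and the `Greeter` class are hypothetical names for this example, and the helper deliberately ignores instance dictionaries and descriptors, so it is a simplification rather than the interpreter's actual algorithm.

```python
class Greeter:
    def greet(self):
        return "hello"


def lookup(obj, name):
    """Find `name` by walking the class hierarchy at call time (simplified)."""
    for cls in type(obj).__mro__:
        if name in cls.__dict__:       # search the class dict by name
            return cls.__dict__[name]
    raise AttributeError(name)


g = Greeter()
before = lookup(g, "greet")(g)

# Rebinding the name in the class dict changes what existing objects call,
# because the lookup happens on every call rather than at compile time.
Greeter.greet = lambda self: "hi"
after = lookup(g, "greet")(g)

print(before, after)
```

Because nothing is resolved ahead of time, every call pays for a dictionary search, which is the cost the slide is pointing at.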
  6. Typical interpreter execution model
     [Diagram. Labels: Script, Parsing, AST, Bytecode generation; a bytecode stream (Instruction a, Instruction b, Instruction c) with the currently executed instruction marked; Heap; Interpreter thread stacks]
  7. GIL ownership diagram
     [Diagram. Timelines for Thread 1 and Thread 2, each moving between IO, Waiting, and Interpreting; a third timeline shows the GIL state as Free, Owned by Thread 1, or Owned by Thread 2]
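The IO segments of the diagram are the periods when a thread does not hold the lock. A rough way to observe this is sketched below in Python, whose interpreter lock is also released around blocking calls. `fake_io`, `SLEEP_SECONDS`, and `THREAD_COUNT` are names invented for this example, and `time.sleep` stands in for a real blocking read or write.

```python
import threading
import time

SLEEP_SECONDS = 0.2   # stand-in for one blocking IO call
THREAD_COUNT = 4


def fake_io():
    # time.sleep releases the interpreter lock while it blocks,
    # so other threads are free to run in the meantime.
    time.sleep(SLEEP_SECONDS)


start = time.perf_counter()
threads = [threading.Thread(target=fake_io) for _ in range(THREAD_COUNT)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# If the lock were held during the blocking call, the waits would add up.
print(f"elapsed {elapsed:.2f}s, serial would be {SLEEP_SECONDS * THREAD_COUNT:.2f}s")
```

CPU-bound work inside the threads would not overlap this way, since only one thread can be in the Interpreting state at a time.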
  8. MRI memory allocation diagram
     [Diagram. Labels: Free list 1 and Free list 2; Object pool 1 and Object pool 2; RString and RArray slots, each pointing to a data block on the heap]
  9. MRI memory allocation
     ● Every Ruby object is allocated on the heap, even one referenced only by a local variable
     ● SLAB-like allocation for Ruby objects
       ○ A C union is used, hence all object slots are the same size (40 bytes)
       ○ Unlike a typical SLAB allocator, there is only one object size to store
     ● RString, RArray, RHash, etc. hold a pointer to an external memory block containing the actual contents
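The slot-and-free-list scheme can be modelled in a few lines. This is a toy sketch in Python, not MRI's C code: `SlotPool` and its methods are hypothetical names, and Python lists stand in for raw memory.

```python
class SlotPool:
    """A pool of equally sized slots tracked by a free list."""

    def __init__(self, slot_count):
        self.slots = [None] * slot_count
        self.free_list = list(range(slot_count))   # indices of free slots

    def allocate(self, header, external_data=None):
        if not self.free_list:
            return None   # caller must run the GC or add another pool
        index = self.free_list.pop()
        # The slot holds only a fixed-size header; variable-size contents
        # (string bytes, array elements) live in a separate block.
        self.slots[index] = (header, external_data)
        return index

    def release(self, index):
        self.slots[index] = None
        self.free_list.append(index)


pool = SlotPool(slot_count=2)
s = pool.allocate("RString", bytearray(b"contents stored outside the slot"))
a = pool.allocate("RArray", [1, 2, 3])
full = pool.allocate("RHash", {})     # pool exhausted
pool.release(s)
reused = pool.allocate("RHash", {})   # the freed slot is handed out again
print(full, reused == s)
```

With a single slot size, allocation and release are constant-time list operations and there is no fragmentation inside a pool.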
  10. MRI memory allocation (continued)
     ● The external memory block for a string or array is allocated with plain malloc
     ● String content can be shared between several objects (copy on write)
     ● 1.9 changes: small strings (23 bytes or less) are embedded in the RString structure rather than allocated externally
  11. MRI GC
     ● If there is no free slot for an object, the GC is run
       ○ If there is still no free slot, a new slab (pool) is allocated
         ■ Unlike in Java, the GC does not wait until the whole heap is used up before it runs
     ● Stop-the-world mark-sweep GC
       ○ Unlike in Java or .NET, there are no generations
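The allocation policy above (collect first, grow only if collection did not help) combined with a stop-the-world mark-sweep pass can be sketched as follows. This is an illustrative Python model; `Heap`, `Obj`, `roots`, and `pool_size` are hypothetical names, and real MRI works on C slots rather than Python objects.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing references to other objects
        self.marked = False


class Heap:
    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.capacity = pool_size   # total slots across all pools
        self.objects = []           # occupied slots
        self.roots = []
        self.gc_runs = 0

    def allocate(self, name):
        if len(self.objects) >= self.capacity:
            self.collect()                          # no free slot: run the GC
            if len(self.objects) >= self.capacity:
                self.capacity += self.pool_size     # still full: add a pool
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def collect(self):
        self.gc_runs += 1
        stack = list(self.roots)                    # mark phase
        while stack:
            obj = stack.pop()
            if not obj.marked:
                obj.marked = True
                stack.extend(obj.refs)
        self.objects = [o for o in self.objects if o.marked]   # sweep phase
        for o in self.objects:
            o.marked = False                        # reset for the next cycle


heap = Heap(pool_size=2)
root = heap.allocate("root")
heap.roots.append(root)
heap.allocate("garbage")            # never referenced from a root
child = heap.allocate("child")      # heap full: the GC frees "garbage"
root.refs.append(child)
heap.allocate("extra")              # heap full, GC frees nothing: pool added
print(heap.gc_runs, heap.capacity, [o.name for o in heap.objects])
```

The whole live graph is traversed on every collection, which is the cost that generational collectors avoid.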
  12. MRI GC (continued)
     ● 1.9.3 changes: lazy sweep GC
       ○ "In Lazy sweeping, each invocation of the object allocation sweeps the heap until it finds an appropriate free object"
         ■ i.e. it just searches for an object marked as dead instead of building free lists
     ● 2.0 changes
       ○ Instead of marking live objects with the FL_MARK flag, an external bitmap is created
         ■ This avoids excessive copying of memory regions in forked processes
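Lazy sweeping can be pictured as a cursor that advances only as far as each allocation needs. The sketch below is a simplified Python model under that reading of the quoted description; `LazyHeap`, `cursor`, and `slots_examined` are names made up for the example, and the mark phase is assumed to have already run.

```python
class LazyHeap:
    def __init__(self, marks):
        # marks[i] is True when slot i survived the last mark phase.
        self.marks = list(marks)
        self.cursor = 0             # how far the sweep has progressed
        self.slots_examined = 0

    def allocate(self):
        """Sweep forward until a dead slot is found; return its index or None."""
        while self.cursor < len(self.marks):
            index = self.cursor
            self.cursor += 1
            self.slots_examined += 1
            if self.marks[index]:
                self.marks[index] = False   # live: clear the mark and move on
            else:
                return index                # dead: reuse this slot
        return None   # sweep finished; a real collector would mark again


heap = LazyHeap(marks=[True, False, True, True, False])
first = heap.allocate()
examined_after_first = heap.slots_examined
second = heap.allocate()
third = heap.allocate()
print(first, examined_after_first, second, third)
```

The sweep work is spread across allocations, so the pause after marking is shorter than with a full sweep that rebuilds the free lists in one go.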
  13. Real-world Ruby usage stories
     ● Twitter switch from Ruby to Scala: http://www.artima.com/scalazine/articles/twitter_on_scala.html
     ● switch from Ruby to Go: http://blog. servers-to-2-go.html
  14. MRI links
     ● Threads in Ruby discussion: http: // ruby-have-real-multithreading
     ● MRI GC slides: http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/
  16. CPython VM outline
     ● Memory management
       ○ Automatic, reference counting
     ● Execution model
       ○ Bytecode interpretation (stack machine)
       ○ Maps, lists, and tuples are created and managed by bytecode instructions
     ● Concurrency
       ○ Multi-threaded, with one active interpreter thread at a time
     ● Method calls
       ○ Late binding: the method is looked up by name in the class dictionary
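The execution-model bullets can be checked with the standard `dis` module, which disassembles a function into the stack-machine instructions the interpreter runs. In this small sketch `build_containers` is a throwaway function written for the example; the exact instruction stream differs between CPython versions, so only the container-building opcodes are picked out.

```python
import dis


def build_containers(a, b):
    as_list = [a, b]
    as_tuple = (a, b)
    as_map = {a: b}
    return as_list, as_tuple, as_map


# Each instruction pops its operands from, and pushes its result onto,
# the evaluation stack of the current frame.
opnames = [ins.opname for ins in dis.get_instructions(build_containers)]
container_ops = [name for name in opnames if name.startswith("BUILD_")]
print(container_ops)
```

Calling `dis.dis(build_containers)` prints the full listing, including the loads that push `a` and `b` before each `BUILD_*` instruction consumes them.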
  17. Python GC
     ● CPython uses reference counting to track whether an object is still in use
       ○ Python uses a global interpreter lock to avoid synchronization on each reference count operation
     ● Cyclic references
       ○ Example: l = []; l.append(l); del l
       ○ Cyclic references are only possible for "container" objects
     ● The GC for cyclic references has been included since version 2.0 and is enabled by default
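Both mechanisms can be watched from a script. The following sketch uses a small `Node` class instead of the list from the slide, only because lists cannot be targets of weak references; `Node`, `probe`, and the other variable names are choices made for this example. Automatic collection is switched off for the duration so that the cycle is reclaimed only by the explicit `gc.collect()` call.

```python
import gc
import sys
import weakref


class Node:
    """A container object: it can hold a reference to itself."""


gc.disable()   # keep the cycle collector from running on its own

node = Node()
count_before = sys.getrefcount(node)
alias = node                              # one more reference
count_after = sys.getrefcount(node)
del alias

node.me = node                            # reference cycle
probe = weakref.ref(node)                 # observes the object without owning it
del node                                  # the count never reaches zero
alive_after_del = probe() is not None

gc.collect()                              # the cycle collector reclaims it
alive_after_collect = probe() is not None

gc.enable()
print(count_after - count_before, alive_after_del, alive_after_collect)
```

Objects that are not part of a cycle never need the collector: they are freed as soon as their count drops to zero.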
  18. Search for cyclic references in CPython (generations)
     ● The GC classifies objects into three generations depending on how many collection sweeps they have survived
       ○ New objects are placed in the youngest generation (generation 0)
       ○ If an object survives a collection, it is moved into the next older generation
       ○ Since generation 2 is the oldest generation, objects in that generation remain there after a collection
  19. Search for cyclic references in CPython (activation)
     ● When the number of allocations minus the number of deallocations exceeds the first threshold (see gc.get_threshold), a collection starts
       ○ Initially only generation 0 is examined
       ○ If generation 0 has been examined more than the second threshold number of times since generation 1 was last examined, generation 1 is examined as well
       ○ The third threshold controls the number of collections of generation 1 before generation 2 is collected
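The three thresholds and the per-generation counters are exposed by the standard `gc` module. This short sketch only reads them and forces a collection of the youngest generation; the actual numbers depend on the CPython version and on what the process has allocated so far, so none are assumed here.

```python
import gc

# The three values described above: allocation surplus for generation 0,
# then collection counts for promoting to generations 1 and 2.
threshold0, threshold1, threshold2 = gc.get_threshold()

# Current counters that are compared against those thresholds.
counts = gc.get_count()

# Examine only the youngest generation; returns the number of
# unreachable objects that were found.
unreachable = gc.collect(0)

print("thresholds:", (threshold0, threshold1, threshold2))
print("counts before collect(0):", counts)
print("unreachable found:", unreachable)
```

`gc.set_threshold` accepts the same three values, which is the usual way to make collections rarer in allocation-heavy programs.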
  20. Objects with a __del__ method in a reference cycle
     ● Which __del__ method should be called first for two objects in a cycle?
       ○ After the first finalizer is called, the object cannot be freed because the second finalizer may still access it
     ● Cycles that are referenced from objects with finalizers are added to a global list of uncollectable garbage (gc.garbage)
       ○ The program can access the global list and free the cycles in a way that makes sense for the application
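The manual cleanup described in the last bullet might look like the sketch below. `Resource`, `partner`, and `break_uncollectable_cycles` are hypothetical names. Note one assumption about versions: the behaviour on the slide matches CPython releases from the time of the talk, while since Python 3.4 (PEP 442) such cycles are normally finalized and freed by the collector itself, so on a current interpreter `gc.garbage` is expected to stay empty and the cleanup loop does nothing.

```python
import gc

finalized = []


class Resource:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        finalized.append(self.name)


def make_cycle():
    a, b = Resource("a"), Resource("b")
    a.partner, b.partner = b, a      # a <-> b, both have finalizers


def break_uncollectable_cycles():
    """Application-specific cleanup of whatever the GC left in gc.garbage."""
    for obj in list(gc.garbage):
        if isinstance(obj, Resource):
            obj.partner = None       # cut the cycle so refcounting can free it
    del gc.garbage[:]


make_cycle()
gc.collect()
break_uncollectable_cycles()
print(sorted(finalized), len(gc.garbage))
```

Cutting the `partner` link is the application's decision about ordering: once the cycle is broken, plain reference counting runs each finalizer as the objects are released.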
  21. CPython links
     ● Python GC description: http://arctrix.com/nas/python/gc/
     ● GC module documentation: http://docs.
     ● Python method call description: http://css. callables-0