What lies beneath the beautiful code?



Talk @ RubyConfIndia 2012. Ruby is a pure object-oriented language and a beautiful one to learn and practice.
But most of us do not bother to find out what happens behind the scenes when we write Ruby code, say when creating a simple Array, Hash, class, module or any other object. How does this map internally to C code?
The Ruby interpreter is implemented in C, and I will talk about the interpreter API that we as Ruby developers
should be aware of. The main purpose of the presentation is to understand the effort and complexity behind
the simplicity on offer. I would also like to touch upon the differences in the implementation of some core data structures
across Ruby versions. Having seen part of the C implementation behind Ruby, I would also like to shed some light on when and why we would need to write Ruby extensions in C.

  • Around 300 C files and around 100 .h header files.
  • Some objects are fully specified by a VALUE, eliminating the need to create an actual object in object space. This saves a lot of processing cycles and does not functionally compromise the object model. These are the immediate objects. Immediate values are not pointers: Fixnum, Symbol, true, false, and nil are stored directly in VALUE. Fixnum values are stored as 31-bit numbers (or 63-bit on wider CPU architectures) that are formed by shifting the original number left 1 bit and then setting the least significant bit (bit 0) to 1. When VALUE is used as a pointer to a specific Ruby structure, it is guaranteed always to have an LSB of zero; the other immediate values also have LSBs of zero. Thus, a simple bit test can tell you whether or not you have a Fixnum. There are several useful conversion macros for numbers as well as other standard datatypes shown in Table 17.1 on page 174. The other immediate values (true, false, and nil) are represented in C as the constants Qtrue, Qfalse, and Qnil, respectively. You can test VALUE variables against these constants directly, or use the conversion macros (which perform the proper casting).
  • You save memory since there’s only one copy of the string data, not two, and you save execution time since there’s no need to call malloc a second time to allocate more memory from the heap.
  • When sharing the same string data, memory is saved since there’s only one copy of the string data, and execution time is saved since there’s no need to call malloc a second time to allocate more memory from the heap.

    1. What lies beneath the beautiful code? Ruby Conf India 2012
    3. self.inspect
        {:name => "Niranjan Sarade",
         :role => "ruby developer @ TCS",
         :blog => "http://niranjansarade.blogspot.com",
         :tweet => "twitter.com/nirusuma",
         :github => "github.com/NiranjanSarade"}
    5. Ruby: Beautiful. Pure object oriented. Interpreted.
    6. Matz’s Ruby Interpreter (MRI), Koichi’s Ruby Interpreter (KRI)
    7. Why should we know?
    8. Let’s dive in!
    9. Why C?
    10. TPI (Ruby 1.8): Ruby source -> Tokenize code (yylex) -> Parse series of tokens (yacc) -> Interpret AST
    11. TPCI (Ruby 1.9): Ruby source -> Tokenize code (yylex) -> Parse series of tokens (Bison) -> Compile AST (compile.c) -> Interpret bytecode (YARV)
    12. Tokenized -> Parsed (AST) -> bytecode
    13. Tokenized -> Parsed (AST) -> bytecode
    14. Ruby Source Overview (# README.EXT)
        ruby language core
          class.c : classes and modules
          error.c : exception classes and exception mechanism
          gc.c : memory management
          load.c : library loading
          object.c : objects
          variable.c : variables and constants
        ruby syntax parser
          parse.y
            -> parse.c : automatically generated
          keywords : reserved keywords
            -> lex.c : automatically generated
    15. ruby evaluator (a.k.a. YARV)
          compile.c
          eval.c
          eval_error.c
          eval_jump.c
          eval_safe.c
          insns.def : definition of VM instructions
          iseq.c : implementation of VM::ISeq
          thread.c : thread management and context switching
          thread_win32.c : thread implementation
          thread_pthread.c : ditto
          vm.c
          vm_dump.c
          vm_eval.c
          vm_exec.c
          vm_insnhelper.c
          vm_method.c
    16. regular expression engine (oniguruma)
          regex.c
          regcomp.c
          regenc.c
          regerror.c
          regexec.c
          regparse.c
          regsyntax.c
        utility functions
          debug.c : debug symbols for C debugger
          dln.c : dynamic loading
          st.c : general purpose hash table
          strftime.c : formatting times
          util.c : misc utilities
    17. ruby interpreter implementation
          dmyext.c
          dmydln.c
          dmyencoding.c
          id.c
          inits.c
          main.c
          ruby.c
          version.c
        multilingualization
          encoding.c : Encoding
          transcode.c : Encoding::Converter
          enc/*.c : encoding classes
          enc/trans/* : codepoint mapping tables
    18. class library
          array.c : Array
          bignum.c : Bignum
          compar.c : Comparable
          complex.c : Complex
          cont.c : Fiber, Continuation
          dir.c : Dir
          enum.c : Enumerable
          enumerator.c : Enumerator
          file.c : File
          hash.c : Hash
          io.c : IO
          marshal.c : Marshal
          math.c : Math
          numeric.c : Numeric, Integer, Fixnum, Float
          pack.c : Array#pack, String#unpack
          proc.c : Binding, Proc
          process.c : Process
          random.c : random number
          range.c : Range
          rational.c : Rational
          re.c : Regexp, MatchData
          signal.c : Signal
          sprintf.c :
          string.c : String
          struct.c : Struct
          time.c : Time
    19. ruby.h: struct RBasic, struct RObject, struct RClass, struct RFloat, struct RString, struct RRegexp, struct RHash, struct RFile, struct RBignum, struct RArray
    20. RObject, RBasic and RClass
        struct RObject {
            struct RBasic basic;
            union {
                struct {
                    long numiv;
                    VALUE *ivptr;
                    struct st_table *iv_index_tbl;
                } heap;
            } as;
        };
        struct RClass {
            struct RBasic basic;
            rb_classext_t *ptr;
            struct st_table *m_tbl;
            struct st_table *iv_index_tbl;
        };
        struct RBasic {
            VALUE flags;
            VALUE klass;
        };
    21. Instance specific behavior
        my_obj = Object.new
        def my_obj.hello
          p "hello"
        end
        my_obj.hello
        #=> "hello"
        Object.new.hello
        # NoMethodError:
        # undefined method `hello' for #<Object:0x5418467>
    22. Conceptual sketch [diagram: before, my_obj's klass points to Object and its *m_tbl; after, my_obj's klass points to the singleton class 'my_obj, whose *m_tbl holds hello and whose *super points to Object]
    23. # class.c
        VALUE make_singleton_class(VALUE obj) {
            VALUE orig_class = RBASIC(obj)->klass;
            VALUE klass = rb_class_boot(orig_class);
            FL_SET(klass, FL_SINGLETON);
            RBASIC(obj)->klass = klass;
            return klass;
        }
    24. Am I Immediate Object or Pointer? VALUE
    25. typedef unsigned long VALUE
        C type for referring to arbitrary ruby objects
        Stores immediate values of: Fixnum, Symbols, True, False, Nil, Undef
        Bit test:
          If the LSB = 1, it is a Fixnum.
          If the VALUE is equal to 0, 2, 4, or 6 it is a special constant: false, true, nil, or undef.
          If the lower 8 bits are equal to 0xe, it is a Symbol.
          Otherwise, it is an Object Reference.
    26. RString
        # 1.8.7
        struct RString {
            struct RBasic basic;
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        };
        # 1.9.3
        #define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))
        struct RString {
            struct RBasic basic;
            union {
                struct {
                    long len;
                    char *ptr;
                    union {
                        long capa;
                        VALUE shared;
                    } aux;
                } heap;
                char ary[RSTRING_EMBED_LEN_MAX + 1];
            } as;
        };
    27. Images created using wordle.net
    28. Heap Strings [diagram: str is an RString with char *ptr and long len = 46; ptr points to the heap buffer "This is a very very very very very long string"; str2 is also labeled]
    31. Shared Strings
        str = "This is a very very very very very long string"
        str2 = String.new(str)
        # str2 = str.dup
        [diagram: str is an RString with char *ptr and long len = 46; str2 is a second RString with char *ptr, long len = 46 and VALUE shared; both ptr fields point to the same heap buffer "This is a very very very very very long string"]
    33. Copy on Write
        str = "This is a very very very very very long string"
        str2 = str.dup
        str2.upcase!
        [diagram: str is an RString with long len = 46 whose char *ptr points to "This is a very very very very very long string"; str2 is an RString with long len = 46 whose char *ptr points to its own heap buffer "THIS IS A VERY VERY VERY VERY VERY LONG STRING"]
    35. Embedded Strings
        str = "This is a very very very very very long string"
        str2 = str[0..3]
        # str2 = "This"
        [diagram: str is an RString with long len = 46 whose char *ptr points to the heap buffer "This is a very very very very very long string"; str2 is an RString with long len = 4 and char ary[] = "This", with no heap buffer]
    37. Shared Strings with slice
        str = "This is a very very very very very long string"
        str2 = str[1..-1]
        # str2 = str[22..-1]
        # 0 <= start_offset < 46-23
        [diagram: str is an RString with char *ptr, long len = 46 and VALUE shared; str2 is an RString with char *ptr and long len = 45; both point into the same heap buffer "T h i . . i n g"]
    39. String.new("learning")
        Creating a string of 23 characters or less is fastest.
        Creating a substring running to the end of the target string is also fast.
        When sharing the same string data, memory and execution time are saved.
        Creating any other long substring or string, 24 or more bytes, is slower.
    40. RHash
        1.8.7 :002 > {1 => "a", "f" => "b", 2 => "c"}
         => {1=>"a", 2=>"c", "f"=>"b"}
        1.9.3p0 :001 > {1 => "a", "f" => "b", 2 => "c"}
         => {1=>"a", "f"=>"b", 2=>"c"}
        # 1.8.7
        struct RHash {
            struct RBasic basic;
            struct st_table *tbl;
            int iter_lev;
            VALUE ifnone;
        };
        struct st_table {
            struct st_hash_type *type;
            int num_bins;
            int num_entries;
            struct st_table_entry **bins;
        };
        struct st_table_entry {
            st_data_t key;
            st_data_t record;
            st_table_entry *next;
        };
        # 1.9.3
        struct RHash {
            struct RBasic basic;
            struct st_table *ntbl;
            int iter_lev;
            VALUE ifnone;
        };
        struct st_table {
            const struct st_hash_type *type;
            st_index_t num_bins;
            ...
            struct st_table_entry **bins;
            struct st_table_entry *head, *tail;
        };
        struct st_table_entry {
            st_data_t key;
            st_data_t record;
            st_table_entry *next;
            st_table_entry *fore, *back;
        };
    41. RHash 1.8.7 [diagram: st_table with num_entries = 4, num_bins = 5 and **bins pointing to the hash buckets (slots); the st_table_entries key1..key4, each a key/value pair, hang off the buckets in chains terminated by x]
    42. RHash 1.9.3 [diagram: st_table with num_entries = 4, num_bins = 5, **bins, *head and *tail; the st_table_entries key1..key4 hang off the hash buckets (slots) as before and are additionally linked to one another in numbered insertion order]
    44. C Extensions – why and when?
        Performance
        Using C libraries from ruby applications
        Using ruby gems with native C extensions, e.g. mysql, nokogiri, eventmachine, RedCloth, Rmagick, libxml-ruby, etc.
        Since the ruby interpreter is implemented in C, its API can be used
    45. My fellow ist Patrick Shaughnessy
    46. Image Credits
        http://pguims-random-science.blogspot.in/2011/08/ten-benefits-of-scuba-diving.html
        http://www.istockphoto.com/stock-illustration-7620122-tree-roots.php
        http://horror.about.com/od/horrortoppicklists/tp/familyfriendlyhorror.01.htm
        http://www.creditwritedowns.com/2011/07/european-monetary-union-titanic.html
        http://irvine-orthodontist.com/wordpress/for-new-patients/faqs
    47. Thank you all for being patient and hearing me out! Hope this helps you!