HIGH PERFORMANCE RUBY
Moral
• Benchmarks are synthetic
• Every system is different
• Do your own testing
bench_red_black
• Pure-Ruby red/black tree impl
• Build a 100k tree of rand(999_999)
• Delete all nodes
• Build it again
• Sear...
[Chart: bench_red_black, run time in seconds (axis 0s to 5s) for Ruby 1.9.3, JRuby - indy, JRuby + indy]
bench_fractal, bench_flipflop_fractal
• Mandelbrot generator
  • Integer loops
  • Floating-point math
• Julia generator using flip-fl...
def fractal_flipflop
  w, h = 44, 54
  c = 7 + 42 * w
  a = [0] * w * h
  g = d = 0
  f = proc do |n|
    a[c] += 1
    o = a.map...
[Chart: bench_fractal, run time in seconds (axis 0s to 1.5s) for Ruby 1.9.3, JRuby - indy, JRuby + indy]
[Chart: bench_flipflop_fractal, run time in seconds (axis 0s to 1.5s) for Ruby 1.9.3, JRuby - indy, JRuby + indy]
Rails?
Rails Perf
• Mixed bag right now...some fast some slow
• JVM JIT limits need to be bumped up
  • Significant gains for some fol...
What Next?
Expand Opto
• Mixed-arity (ADD SLIDES ABOUT WHAT WE OPTIMIZE TODAY)
• Super calls
• Much, much lighter-weight closures
• Then...
Wacky Stuff
• define_method methods?
• method_missing call-throughs?
• respond_to???
• proc tables?
• All possible...but worth it?
The Future
• JRuby will continue to get faster
  • Indy improvements at VM-level
  • Compiler improvements at Ruby level
• If yo...
Q/A
High Performance Ruby - Golden Gate RubyConf 2012
Published in: Business, Technology
2 Comments
  • @DanielLucraft I have played with a few ways to optimize those cases. The simplest is to have those hashes be sized as small as possible...a single bucket, for example. If they're modified they'll end up rehashing, and if they're not the linear search is as fast as hashing. Optimizing all the way to the assignment is tricky, but possible, if we can defer creating the hash until the target method body. At that point we can decide if the keyword args need to go into a hash or if they can just be used directly. This is also something we will *need* to do to optimize Ruby 2.0's support for keyword args.
  • Good talk. Some good satisfying graphs in there, I was so pleased :)

    I was wondering the other day, whether you could optimize away hash option args, like optimize this:

    def initialize(options)
    @name = options[:name]
    @height = options[:height]
    end

    to run internally as this:

    def initialize(name, height)
    @name, @height = name, height
    end
Slide notes
  • Ruby is already a “high performance” language when it comes to writing code
  • Many better reasons... differently expressive languages, differently fun, designed for the problem at hand...
  • _why’s potion, MA Cournoyer’s tinyrb, the thousand other Ruby impls. Rubinius? 5 years with two fulltime people, hundreds of contributors. 1.5 years since last release.
  • Two hard things in CS: cache invalidation and naming things (and off by one errors)
  • Also loading constants, which are read-only; not as interesting
  • Comparisons as ratios...sometimes. Often a stark difference sells the point better.
  • Rails applications are incredibly big systems compared to benchmarks
  • Transcript of "High Performance Ruby - Golden Gate RubyConf 2012"

    1. 1. HIGH PERFORMANCE RUBY
    2. 2. Hiya• Charles Oliver Nutter• headius@headius.com• @headius• JVM language guy at Red Hat (JBoss)
    3. 3. Performance?• Writing code • Man hours more expensive than CPU hours • Developer contentedness• Running code • Straight line
    4. 4. High Performance?• Faster than... • ...other Ruby impls? • ...other language runtimes? • ...unmanaged languages, like C? • ...you need it to be?
    5. 5. “Fast Enough”• 1.8.7 was fast enough• 1.9.3 is fast enough• Unless it’s not fast enough • Does it matter?
    6. 6. Performance Wall• Move to a different runtime• Move to a different language • ...in whole or part
    7. 7. If you’re not writing perf-sensitive code in Ruby, you’re giving up too easily.
    8. 8. Native Extensions• Not universally bad• Just bad in MRI • Invasive • Pointers • Few guarantees
    9. 9. What We Want• Faster execution• Better GC• Parallel execution• Big data
    10. 10. What We Can’t Have• Faster execution• Better GC• Parallel execution• Big data
    11. 11. Different Approach• Build our own runtime? • YARV, Rubinius, MacRuby• Use an existing runtime? • JRuby, MagLev, MacRuby, IronRuby
    12. 12. Build or Buy• Making a new VM is “easy”• Making it competitive is really hard• I mean really, really, really hard
    13. 13. JVM• 15+ years of engineering by whole teams• FOSS• Fastest VM available• Best GCs available• Full parallel threading with guarantees• Broad platform support
    14. 14. But Java is Slow!• Java is very, very fast • Literally, C fast in many cases• Java applications can be slow • Oh hey, just like Ruby?• The way you write code is more important than the language you use.
    15. 15. JRuby• Java (and Ruby) impl of Ruby on JVM• Same memory, threading model• JRuby JITs to JVM bytecode• End of story, right?
    16. 16. Long, Hard Road• Interpreter optimization• JVM bytecode compiler• Optimizing core class methods• Lather, rinse, and repeat
    17. 17. Align with JVM• Individual arguments on call stack• JVM local variables• Avoid artificial framing• Avoid inter-call goo• Eliminate unnecessary work
    18. 18. Unnecessary Work• Modules are maps • Name to method • Name to constant • Name to class var• Instance variables as maps• Wasted cycles without caching
    19. 19. Method Lookup• Inside a class/module • Current class’s methods (a map) • Methods retrieved from class + ancestors • Serial or switch indicates staleness • Weak list of child classes• Class mutation cascades down hierarchy
    20. 20. [Diagram] Classes: Thing, Person, Place, Rubyist, Other. Call: obj.to_s
    21. 21. [Diagram] Method lookups go up-hierarchy. Classes: Thing, Person, Place, Rubyist, Other. Call: obj.to_s
    22. 22. [Diagram] Method lookups go up-hierarchy. Classes: Thing, Person, Place, Rubyist, Other. Call: obj.to_s. Method shown: to_s
    23. 23. [Diagram] Method lookups go up-hierarchy. Lookup target caches result. Classes: Thing, Person, Place, Rubyist, Other. Call: obj.to_s. Method shown: to_s
    24. 24. [Diagram] Method lookups go up-hierarchy. Lookup target caches result. Classes: Thing, Person, Place, Rubyist, Other. Call: obj.to_s. Method shown: to_s
    25. 25. [Diagram] Method lookups go up-hierarchy. Lookup target caches result. Modification cascades down. Classes: Thing, Person, Place, Rubyist, Other. Call: obj.to_s. Method shown: to_s
    26. 26. [Diagram] Method lookups go up-hierarchy. Lookup target caches result. Modification cascades down. Classes: Thing, Person, Place, Rubyist, Other. Call: obj.to_s. Methods shown: to_s (twice)
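The caching scheme on slides 19 to 26 can be sketched in plain Ruby. This is a toy model, not JRuby's implementation: `ToyClass`, `define`, `find_method`, and `invalidate` are hypothetical names, the Rubyist < Person < Thing ordering is an assumption (the flattened diagram does not show the edges), and the sketch clears caches eagerly where the slides describe a serial or switch that marks them stale.

```ruby
# Toy model of hierarchy-aware method caching (all names are hypothetical).
class ToyClass
  attr_reader :name, :parent, :lookups

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @methods = {}    # name => body, defined on this class only
    @cache = {}      # name => body, resolved through the ancestors
    @children = []   # a real runtime would hold these weakly
    @lookups = 0     # counts slow-path lookups, for demonstration
    parent.add_child(self) if parent
  end

  def add_child(child)
    @children << child
  end

  def own_method(name)
    @methods[name]
  end

  # Defining a method invalidates the cache here and in every descendant.
  def define(name, &body)
    @methods[name] = body
    invalidate
  end

  def invalidate
    @cache.clear
    @children.each(&:invalidate)
  end

  # Fast path: cached result. Slow path: walk up the hierarchy, then cache.
  def find_method(name)
    @cache.fetch(name) do
      @lookups += 1
      klass = self
      klass = klass.parent while klass && !klass.own_method(name)
      @cache[name] = klass && klass.own_method(name)
    end
  end
end

thing   = ToyClass.new("Thing")
person  = ToyClass.new("Person", thing)
rubyist = ToyClass.new("Rubyist", person)

thing.define(:to_s) { "a thing" }
rubyist.find_method(:to_s).call      # slow path, walks up to Thing
rubyist.find_method(:to_s).call      # fast path, served from Rubyist's cache
person.define(:to_s) { "a person" }  # cascades down, clearing Rubyist's cache
rubyist.find_method(:to_s).call      # slow path again, now finds Person's to_s
```

The trade-off the slides point at is visible here: lookups become cheap, and the cost moves to class mutation, which has to reach every descendant.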
    27. 27. Constant Lookup• Cache at lookup site• Global serial/switch indicates staleness • Complexities of lookup, etc • Joy of Ruby interfering with Joy of Opto• Modifying constants triggers invalidation
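A minimal sketch of the site-level constant cache described on this slide, assuming a single global serial number that is bumped on every constant write. `ConstantTable` and `ConstantSite` are hypothetical names for illustration, and the sketch ignores the lexical and inheritance rules that make real constant lookup complicated.

```ruby
# Toy constant cache: one global serial, one cache per lookup site.
module ConstantTable
  @values = {}
  @serial = 0

  class << self
    attr_reader :serial

    def set(name, value)
      @values[name] = value
      @serial += 1   # any constant write makes every site stale
    end

    def slow_lookup(name)
      @values.fetch(name)
    end
  end
end

class ConstantSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_serial = nil
    @cached_value = nil
    @misses = 0
  end

  # Compare serials first; only a mismatch pays for the full lookup.
  def value
    if @cached_serial != ConstantTable.serial
      @misses += 1
      @cached_value = ConstantTable.slow_lookup(@name)
      @cached_serial = ConstantTable.serial
    end
    @cached_value
  end
end

ConstantTable.set(:MY_CONST, 42)
site = ConstantSite.new(:MY_CONST)
site.value                     # miss: looks the value up and caches it
site.value                     # hit: one integer comparison
ConstantTable.set(:OTHER, 1)   # unrelated write still invalidates the site
site.value                     # miss again, same value
```

A global serial is coarse: writing any constant invalidates every site. That is acceptable only because constants are rarely modified after startup.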
    28. 28. Instance Vars• Class holds a table of offsets• Object holds array of values• Call site caches offset plus class ID• Same class, no lookup cost • Can be polymorphically chained
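The layout on this slide can be modeled roughly as follows. `Shape`, `ToyObject`, and `IvarSite` are hypothetical names; the sketch caches a single class ID per site and does not model the polymorphic chaining the slide mentions.

```ruby
# Toy model of instance-variable storage (all names are hypothetical).
class Shape   # stands in for the per-class table of offsets
  attr_reader :id

  def initialize(id)
    @id = id
    @offsets = {}   # "@foo" => 0, "@bar" => 1, ...
  end

  def offset_for(name)
    @offsets[name] ||= @offsets.size
  end
end

class ToyObject
  attr_reader :shape, :slots

  def initialize(shape)
    @shape = shape
    @slots = []     # values only; the names live in the shape
  end
end

class IvarSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_id = nil
    @cached_offset = nil
    @misses = 0
  end

  def get(obj)
    obj.slots[offset(obj)]
  end

  def set(obj, value)
    obj.slots[offset(obj)] = value
  end

  private

  # Same class as last time: reuse the cached offset, no table lookup.
  def offset(obj)
    unless @cached_id == obj.shape.id
      @misses += 1
      @cached_offset = obj.shape.offset_for(@name)
      @cached_id = obj.shape.id
    end
    @cached_offset
  end
end

shape = Shape.new(1)
foo_site = IvarSite.new("@foo")
bar_site = IvarSite.new("@bar")
a = ToyObject.new(shape)
b = ToyObject.new(shape)

foo_site.set(a, :x)
bar_site.set(a, :y)
bar_site.set(b, :z)   # same shape, so the cached offset is reused
bar_site.get(a)       # => :y
```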
    29. 29. Optimizing Ruby• Make calls fast• Make constants free• Make instance variables cheap• Make closures lightweight • TODO
    30. 30. What is invokedynamic?
    31. 31. Invoke?
    32. 32. Invoke? That’s one use, but there are many others
    33. 33. Dynamic?
    34. 34. Dynamic? Dynamic typing is a common reason, but there are many others
    35. 35. JVM 101
    36. 36. JVM 101: 200 opcodes
    37. 37. JVM 101: 200 opcodes. Ten (or 16) “data endpoints”
    38. 38. JVM 101: 200 opcodes. Ten (or 16) “data endpoints”. Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial
    39. 39. JVM 101: 200 opcodes. Ten (or 16) “data endpoints”. Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial. Field Access: getfield, putfield, getstatic, putstatic
    40. 40. JVM 101: 200 opcodes. Ten (or 16) “data endpoints”. Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial. Field Access: getfield, putfield, getstatic, putstatic. Array Access: *aload, *astore (b, s, c, i, l, d, f, a)
    41. 41. JVM 101: 200 opcodes. Ten (or 16) “data endpoints”. Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial. Field Access: getfield, putfield, getstatic, putstatic. Array Access: *aload, *astore (b, s, c, i, l, d, f, a). All Java code revolves around these endpoints. Remaining ops are stack, local vars, flow control, allocation, and math/boolean/bit operations
    42. 42. JVM Opcodes
    43. 43. JVM Opcodes. Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial. Field Access: getfield, putfield, getstatic, putstatic. Array Access: *aload, *astore (b, s, c, i, l, d, f, a)
    44. 44. JVM Opcodes. Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial. Field Access: getfield, putfield, getstatic, putstatic. Array Access: *aload, *astore (b, s, c, i, l, d, f, a). Other groups: Stack, Local Vars, Flow Control, Allocation, Boolean and Numeric
    47. 47. In Detail• JRuby generates code with indy calls• JVM at first call asks JRuby what to do• JRuby provides function pointers to code• Pointers include guards, invalidation logic• JRuby and JVM cooperate on optimizing
    48. 48. invokedynamic bytecode
    49. 49. [Diagram] invokedynamic bytecode, bootstrap method
    50. 50. [Diagram] invokedynamic bytecode, bootstrap method, method handles
    51. 51. [Diagram] invokedynamic bytecode, bootstrap method, method handles, target method
    54. 54. [Diagram] Dynamic Invocation: obj.foo(), JVM, Target Object, associated with Method Table (def foo ..., def bar ...)
    55. 55. [Diagram] Dynamic Invocation: obj.foo(), JVM, Call Site, Target Object, associated with Method Table (def foo ..., def bar ...). VM Operations: (none yet)
    57. 57. [Diagram] Dynamic Invocation: obj.foo(), JVM, Call Site, Target Object, associated with Method Table (def foo ..., def bar ...). VM Operations: Method Lookup (def foo ...)
    58. 58. [Diagram] Dynamic Invocation: obj.foo(), JVM, Call Site, Target Object, associated with Method Table (def foo ..., def bar ...). VM Operations: Method Lookup (def foo ...), Branch
    59. 59. [Diagram] Dynamic Invocation: obj.foo(), JVM, Call Site, Target Object, associated with Method Table (def foo ..., def bar ...). VM Operations: Method Lookup (def foo ...), Branch, Method Cache
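The three VM operations in this diagram (method lookup, branch, method cache) correspond to what is usually called an inline cache. Below is a rough pure-Ruby imitation, assuming a hypothetical `CallSite` class; it is not JRuby's or the JVM's API. It guards on the receiver's class only, so it ignores singleton methods and does not notice a method being redefined after it was cached.

```ruby
# Toy inline cache for a single call site (hypothetical class).
class CallSite
  attr_reader :slow_lookups

  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil
    @cached_method = nil
    @slow_lookups = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)                      # branch (the guard)
      @slow_lookups += 1
      @cached_method = klass.instance_method(@method_name)  # method lookup
      @cached_class = klass                                 # method cache
    end
    @cached_method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:upcase)
site.call("a")    # slow path: looks up String#upcase and caches it
site.call("b")    # fast path: guard passes, cached method is reused
site.call(:sym)   # guard fails for Symbol, so the site relinks
```

With invokedynamic the guard and the cached target are method handles the JVM can see through and inline; in this imitation they are ordinary Ruby calls, so only the control flow carries over.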
    60. 60. [Diagram] Constants: MY_CONST, JVM, Constant Lookup, Call Site
    61. 61. [Diagram] Constants: MY_CONST, JVM, Constant Lookup, Call Site. VM Operations: (none yet)
    63. 63. [Diagram] Constants: MY_CONST, JVM, Constant Lookup, Call Site, value. VM Operations: Lookup Value
    64. 64. [Diagram] Constants: MY_CONST, JVM, Constant Lookup, Call Site, value. VM Operations: Lookup Value, Bind Permanently
    65. 65. [Diagram] Instance Variables: @bar, JVM, Target Object, associated with Offset Table (“@foo” => 0, “@bar” => 1)
    66. 66. [Diagram] Instance Variables: @bar, JVM, Access Site, Target Object, associated with Offset Table (“@foo” => 0, “@bar” => 1). VM Operations: (none yet)
    67. 67. [Diagram] Instance Variables: @bar, JVM, Access Site, Target Object, associated with Offset Table (“@foo” => 0, “@bar” => 1). VM Operations: Instance Var Lookup
    68. 68. [Diagram] Instance Variables: @bar, JVM, Access Site (offset 1), Target Object, associated with Offset Table (“@foo” => 0, “@bar” => 1). VM Operations: Instance Var Lookup, Offset Cache
    69. 69. [Diagram] Instance Variables: @bar, JVM, Access Site (offset 1), Target Object, associated with Offset Table (“@foo” => 0, “@bar” => 1). VM Operations: Instance Var Lookup, Offset Cache, Access Object
    71. 71. InvokeDynamic lets JRuby teach the JVM how Ruby works
    72. 72. How Do We Know We’ve Succeeded?• Benchmarking• Monitoring• User reports
    73. 73. Benchmarking is Hard• Runtimes may improve over time• Optimizer may eliminate useless code• Small systems are completely different• Know how your runtime optimizes!
    74. 74. bench_empty_method
        def foo; self; end
        i = 0
        while i < 10_000_000
          foo; foo; foo; foo; foo
          i += 1
        end
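The slide shows only the loop body. One way to time it is sketched below using Ruby's standard `Benchmark` module. The helper names `empty_method_loop` and `timed_runs`, the smaller iteration count, and the choice of five runs are assumptions made for illustration, not the harness used in the talk.

```ruby
require "benchmark"

def foo; self; end

# One run of the slide's loop. The default count is smaller than the
# slide's 10_000_000 (an assumption) so the sketch finishes quickly.
def empty_method_loop(iterations = 1_000_000)
  i = 0
  while i < iterations
    foo; foo; foo; foo; foo
    i += 1
  end
  i
end

# Time several runs in one process, so later runs can show the effect
# of JIT warm-up on runtimes that have a JIT.
def timed_runs(runs = 5)
  Array.new(runs) { Benchmark.realtime { empty_method_loop } }
end

timed_runs.each_with_index do |seconds, run|
  puts format("run %d: %.3fs", run + 1, seconds)
end
```

Reporting every run rather than a single number makes it easier to tell warm-up from steady state.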
    75. 75. [Chart: run time in seconds (axis 0s to 4s) for Ruby 1.9.3, JRuby, JRuby + indy. Annotation: “ZOMG 40X FASTER!”]
    76. 76. Observations
    77. 77. One slow runtime screws up the table
    78. 78. ...do comparisons as ratios against a norm
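Normalizing against a baseline takes only a few lines. The timings below are made-up placeholders, not the numbers from the talk, and `ratios_against` is a hypothetical helper name.

```ruby
# Hypothetical timings in seconds; NOT measurements from the talk.
timings = { "runtime A" => 4.0, "runtime B" => 2.0, "runtime C" => 1.0 }

# Express every timing as a multiple of the chosen baseline's timing.
def ratios_against(timings, baseline)
  norm = timings.fetch(baseline)
  timings.transform_values { |seconds| (seconds / norm).round(2) }
end

ratios_against(timings, "runtime A").each do |name, ratio|
  puts "#{name}: #{ratio}x"
end
```

With the baseline pinned at 1.0, the other runtimes stay readable on the same scale even when one of them is much slower.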
    79. 79. JRuby calls empty methods really fast!!!
    80. 80. InvokeDynamic does not do much for us?
    81. 81. [Chart: run time in seconds (axis 0s to 4s) for Ruby 1.9.3, JRuby, JRuby + indy]
    82. 82. JVM Opto 101• JITs code bodies after 10k calls • No 10k calls, no JIT (generally)• Inlines up to two targets• Optimistic • Early decisions may be wrong • Small code looks drastically different
    83. 83. SMALL CODE IS DIFFERENT THAN LARGE CODE
    84. 84. Inlining• Call site in method A and method B match• JVM treats them as though B lived in A • No call overhead • Variables visible across call boundary • More complete view for optimization
    85. 85. Optimistic• Say we have a system...• The only method dynamically called is “foo”• All logic for dyncall revolves around “foo”• Hotspot thinks all dyncalls will be “foo”
    86. 86. bench_empty_method2
        def foo; self; end
        def bar1; self; end
        def bar2; self; end
        i = 0
        while i < 10_000_000
          bar1; bar1; bar1; bar1; bar1
          bar2; bar2; bar2; bar2; bar2
          i += 1
        end
        ...
    87. 87. [Chart: run time in seconds (axis 0s to 0.7s) for bench1, bench2, bench1 + indy, bench2 + indy]
    88. 88. [Chart: run time in seconds (axis 0s to 0.4s) for bench1 + rbx, bench2 + rbx, bench1 + indy, bench2 + indy]
    89. 89. What Happened?• An unrelated change slowed our bench?• Not really unrelated • Hotspot optimizes early loop first • Later loop is different...calls “foo” • Assumptions change, perf looks different
    90. 90. Benchmarking is Not Enough• Need to monitor runtime optimization • JIT compilation • Inlining • Eventual native code (x86 ASM)• Fun?
    91. 91. 1711 4 % bench_empty_method::block_0$RUBY$__file__ @ 56 (171 bytes)
        @ 59 java.lang.invoke.MethodHandle::invokeExact (33 bytes) inline (hot)
        @ 5 java.lang.invoke.MethodHandle::invokeExact (20 bytes) inline (hot)
        @ 2 java.lang.invoke.MethodHandle::invokeExact (9 bytes) inline (hot)
        @ 2 java.lang.invoke.MutableCallSite::getTarget (5 bytes) inline (hot)
        @ 16 java.lang.invoke.MethodHandle::invokeExact (5 bytes) inline (hot)
        @ 1 sun.invoke.util.ValueConversions::identity (2 bytes) inline (hot)
        @ 12 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot)
        @ 29 java.lang.invoke.MethodHandle::invokeExact (35 bytes) inline (hot)
        @ 5 java.lang.invoke.MethodHandle::invokeExact (7 bytes) inline (hot)
        @ 3 org.jruby.runtime.invokedynamic.InvocationLinker::testMetaclass (17 bytes) inline (hot)
        @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes) inline (hot)
        @ 14 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot)
        @ 31 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot)
        @ 6 bench_empty_method::method__0$RUBY$foo (2 bytes) inline (hot)
        @ 68 java.lang.invoke.MethodHandle::invokeExact (33 bytes) inline (hot)
        @ 5 java.lang.invoke.MethodHandle::invokeExact (20 bytes) inline (hot)
        @ 2 java.lang.invoke.MethodHandle::invokeExact (9 bytes) inline (hot)
        @ 2 java.lang.invoke.MutableCallSite::getTarget (5 bytes) inline (hot)
    94. 94. Decoding compiled method 0x000000010549d7d0:Code:[Entry Point][Verified Entry Point][Constants] # {method} method__0$RUBY$foo (Lbench_empty_method;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject; inbench_empty_method # parm0: rsi:rsi = bench_empty_method # parm1: rdx:rdx = org/jruby/runtime/ThreadContext # parm2: rcx:rcx = org/jruby/runtime/builtin/IRubyObject # parm3: r8:r8 = org/jruby/runtime/Block # [sp+0x20] (sp of caller) 0x000000010549d900: sub $0x18,%rsp 0x000000010549d907: mov %rbp,0x10(%rsp) ;*synchronization entry ; - bench_empty_method::method__0$RUBY$foo@-1 (line 3) 0x000000010549d90c: mov %rcx,%rax 0x000000010549d90f: add $0x10,%rsp 0x000000010549d913: pop %rbp 0x000000010549d914: test %eax,-0xe9f91a(%rip) # 0x00000001045fe000 ; {poll_return} 0x000000010549d91a: retq
    97. 97. bench_empty_method3
        def invoker1
          i = 0
          while i < 1000
            foo; foo; foo; foo; foo
            i+=1
          end
        end
        ...
        i = 0
        while i < 10000
          invoker1
          i+=1
        end
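A minimal, self-contained sketch of how a benchmark shaped like the slide above might be run and timed. The empty `foo` body, the `run_bench` driver, and its `runs` and `outer` parameters are assumptions added for illustration; the slide elides the surrounding file with `...`, so this is not the original benchmark source.

```ruby
require 'benchmark'

# Assumed empty method: the call itself is the only work being measured.
def foo; end

# Five calls per iteration, 1000 iterations, as on the slide.
def invoker1
  i = 0
  while i < 1000
    foo; foo; foo; foo; foo
    i += 1
  end
  i
end

# Hypothetical driver (not from the slides): times the outer loop several
# times so that later runs reflect code the JIT has already compiled.
def run_bench(runs = 5, outer = 10_000)
  runs.times.map do
    Benchmark.realtime do
      i = 0
      while i < outer
        invoker1
        i += 1
      end
    end
  end
end
```

Calling `run_bench.each { |t| puts format("%.3fs", t) }` prints one wall-clock time per run. On a JIT-compiled runtime the first run usually includes warmup, so comparing only the later runs gives a fairer picture.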
    98. 98. [Bar chart: run times for bench1 + indy, bench2 + indy, bench3 + indy; time axis from 0s to 0.15s]
    99. 99. Moral• Benchmarks are synthetic• Every system is different• Do your own testing
    100. 100. bench_red_black• Pure-Ruby red/black tree impl• Build a 100k tree of rand(999_999)• Delete all nodes• Build it again• Search for elements• In-order walks, min, max
    101. 101. [Bar chart: bench_red_black run times for Ruby 1.9.3, JRuby - indy, JRuby + indy; time axis from 0s to 5s]
    102. 102. bench_fractal, bench_flipflop_fractal• bench_fractal: Mandelbrot generator • Integer loops • Floating-point math• bench_flipflop_fractal: Julia generator using flip-flops • I don’t really understand it.
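For a sense of what a Mandelbrot generator built from integer loops and floating-point math looks like, here is a small sketch. It is not the bench_fractal source; the method names, the viewport bounds, and the default sizes are assumptions chosen for illustration.

```ruby
# Escape-time iteration count for the point c = cr + ci*i.
# Returns max_iter when the point never escapes within the limit.
def mandelbrot_iterations(cr, ci, max_iter = 50)
  zr = 0.0
  zi = 0.0
  i = 0
  while i < max_iter && zr * zr + zi * zi <= 4.0
    # z = z*z + c, computed on the real and imaginary parts separately.
    zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
    i += 1
  end
  i
end

# Renders the region [-2.0, 1.0] x [-1.2, 1.2] as rows of text,
# marking points that stay bounded with "#".
def mandelbrot_rows(width = 60, height = 24, max_iter = 50)
  rows = []
  y = 0
  while y < height
    ci = -1.2 + 2.4 * y / (height - 1)
    row = String.new
    x = 0
    while x < width
      cr = -2.0 + 3.0 * x / (width - 1)
      row << (mandelbrot_iterations(cr, ci, max_iter) == max_iter ? "#" : " ")
      x += 1
    end
    rows << row
    y += 1
  end
  rows
end
```

`puts mandelbrot_rows` draws the set in a terminal. The hot path is nothing but `while` loops over integers and Float arithmetic, which is the kind of code the slide describes.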
    103. 103. def fractal_flipflop
          w, h = 44, 54
          c = 7 + 42 * w
          a = [0] * w * h
          g = d = 0
          f = proc do |n|
            a[c] += 1
            o = a.map {|z| " :#"[z, 1] * 2 }.join.scan(/.{#{w * 2}}/)
            puts "\f" + o.map {|l| l.rstrip }.join("\n")
            d += 1 - 2 * ((g ^= 1 << n) >> n)
            c += [1, w, -1, -w][d %= 4]
          end
          1024.times do
            !!(!!(!!(!!(!!(!!(!!(!!(!!(true...
              f[0])...f[1])...f[2])...
              f[3])...f[4])...f[5])...
              f[6])...f[7])...f[8])
          end
        end
    105. 105. [Bar chart: bench_fractal run times for Ruby 1.9.3, JRuby - indy, JRuby + indy; time axis from 0s to 1.5s]
    106. 106. [Bar chart: bench_flipflop_fractal run times for Ruby 1.9.3, JRuby - indy, JRuby + indy; time axis from 0s to 1.5s]
    107. 107. Rails?
    108. 108. Rails Perf• Mixed bag right now...some fast some slow• JVM JIT limits need to be bumped up • Significant gains for some folks• Long warmup times for so much code• Work continues!
    109. 109. What Next?
    110. 110. Expand Opto• Mixed-arity (ADD SLIDES ABOUT WHAT WE OPTIMIZE TODAY)• Super calls• Much, much lighter-weight closures• Then what?
    111. 111. Wacky Stuff• define_method methods?• method_missing call-throughs?• respond_to???• proc tables?• All possible...but worth it?
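To make the first three items on the slide above concrete, here is a sketch of the Ruby features in question: methods created with `define_method`, a `method_missing` call-through, and `respond_to?` checks. The `Widget` and `Proxy` classes are hypothetical examples, not JRuby internals, and the sketch says nothing about how JRuby would optimize them.

```ruby
# Hypothetical class whose readers are built with define_method.
# Each body is a block (a closure over attr), not a plain def.
class Widget
  [:width, :height].each do |attr|
    define_method(attr) { @attrs.fetch(attr) }
  end

  def initialize(attrs)
    @attrs = attrs
  end
end

# Hypothetical proxy that forwards unknown calls to a target object.
class Proxy
  def initialize(target)
    @target = target
  end

  # method_missing call-through: every forwarded call takes this extra hop.
  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @target.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keeps respond_to? consistent with what method_missing will accept.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end
```

With `proxy = Proxy.new(Widget.new(width: 3, height: 4))`, a call such as `proxy.width` goes through `method_missing`, then `respond_to?`, then the `define_method` block. None of those steps is a plain method call, which is why each is listed as a separate optimization question.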
    112. 112. The Future• JRuby will continue to get faster • Indy improvements at VM-level • Compiler improvements at Ruby level• If you can’t compete with JVM...• Still FOSS from top to bottom • Don’t be afraid!
    113. 113. Q/A