High Performance Ruby - Golden Gate RubyConf 2012
Usage Rights

© All Rights Reserved

  • @DanielLucraft I have played with a few ways to optimize those cases. The simplest is to have those hashes be sized as small as possible...a single bucket, for example. If they're modified they'll end up rehashing, and if they're not the linear search is as fast as hashing. Optimizing all the way to the assignment is tricky, but possible, if we can defer creating the hash until the target method body. At that point we can decide if the keyword args need to go into a hash or if they can just be used directly. This is also something we will *need* to do to optimize Ruby 2.0's support for keyword args.
  • Good talk. Some good satisfying graphs in there, I was so pleased :)

    I was wondering the other day, whether you could optimize away hash option args, like optimize this:

    def initialize(options)
      @name = options[:name]
      @height = options[:height]
    end

    to run internally as this:

    def initialize(name, height)
      @name, @height = name, height
    end
  • Ruby is already a “high performance” language when it comes to writing code
  • Many better reasons... differently expressive languages, differently fun, designed for the problem at hand...
  • _why’s potion, MA Cournoyer’s tinyrb, the thousand other Ruby impls
    Rubinius? 5 years with two fulltime people, hundreds of contributors. 1.5 years since last release.
  • Two hard things in CS: cache invalidation and naming things (and off by one errors)
  • Also loading constants, which are read-only; not as interesting
  • Comparisons as ratios...sometimes. Often a stark difference sells the point better.
  • Rails applications are incredibly big systems compared to benchmarks

High Performance Ruby - Golden Gate RubyConf 2012 Presentation Transcript

  • 1. HIGH PERFORMANCE RUBY
  • 2. Hiya
      • Charles Oliver Nutter
      • headius@headius.com
      • @headius
      • JVM language guy at Red Hat (JBoss)
  • 3. Performance?
      • Writing code
          • Man hours more expensive than CPU hours
          • Developer contentedness
      • Running code
          • Straight line
  • 4. High Performance?
      • Faster than...
          • ...other Ruby impls?
          • ...other language runtimes?
          • ...unmanaged languages, like C?
          • ...you need it to be?
  • 5. “Fast Enough”
      • 1.8.7 was fast enough
      • 1.9.3 is fast enough
      • Unless it’s not fast enough
          • Does it matter?
  • 6. Performance Wall
      • Move to a different runtime
      • Move to a different language
          • ...in whole or part
  • 7. If you’re not writing perf-sensitive code in Ruby, you’re giving up too easily.
  • 8. Native Extensions
      • Not universally bad
      • Just bad in MRI
          • Invasive
          • Pointers
          • Few guarantees
  • 9. What We Want
      • Faster execution
      • Better GC
      • Parallel execution
      • Big data
  • 10. What We Can’t Have
      • Faster execution
      • Better GC
      • Parallel execution
      • Big data
  • 11. Different Approach
      • Build our own runtime?
          • YARV, Rubinius, MacRuby
      • Use an existing runtime?
          • JRuby, MagLev, MacRuby, IronRuby
  • 12. Build or Buy
      • Making a new VM is “easy”
      • Making it competitive is really hard
      • I mean really, really, really hard
  • 13. JVM
      • 15+ years of engineering by whole teams
      • FOSS
      • Fastest VM available
      • Best GCs available
      • Full parallel threading with guarantees
      • Broad platform support
  • 14. But Java is Slow!
      • Java is very, very fast
          • Literally, C fast in many cases
      • Java applications can be slow
          • Oh hey, just like Ruby?
      • The way you write code is more important than the language you use.
  • 15. JRuby
      • Java (and Ruby) impl of Ruby on JVM
      • Same memory, threading model
      • JRuby JITs to JVM bytecode
      • End of story, right?
  • 16. Long, Hard Road
      • Interpreter optimization
      • JVM bytecode compiler
      • Optimizing core class methods
      • Lather, rinse, and repeat
  • 17. Align with JVM
      • Individual arguments on call stack
      • JVM local variables
      • Avoid artificial framing
      • Avoid inter-call goo
      • Eliminate unnecessary work
  • 18. Unnecessary Work
      • Modules are maps
          • Name to method
          • Name to constant
          • Name to class var
      • Instance variables as maps
      • Wasted cycles without caching
  • 19. Method Lookup
      • Inside a class/module
          • Current class’s methods (a map)
          • Methods retrieved from class + ancestors
          • Serial or switch indicates staleness
          • Weak list of child classes
      • Class mutation cascades down hierarchy
  • 20–26. [Diagram, built up over seven slides: a class hierarchy of Thing, Person, Place, Rubyist, and Other, with a call site obj.to_s and to_s entries appearing on classes as the build progresses]
      • Method lookups go up-hierarchy
      • Lookup target caches result
      • Modification cascades down
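The lookup-and-invalidate behavior in slides 19 through 26 can be modeled in a few lines of plain Ruby. This is only a rough sketch, not JRuby's implementation: ToyClass, find_method, and define are hypothetical names, the Thing > Person > Rubyist parent chain is an assumption about the diagram, and a plain array stands in for the weak list of child classes.

```ruby
# Hypothetical model of per-class method caching (illustrative names only).
class ToyClass
  attr_reader :name, :parent, :children, :own_methods, :cache

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @children = []     # a real runtime would hold these weakly
    @own_methods = {}  # this class's own methods (a map)
    @cache = {}        # methods retrieved from class + ancestors
    parent.children << self if parent
  end

  # Lookups go up the hierarchy; the class where the lookup started caches the result.
  def find_method(method_name)
    @cache[method_name] ||= search_hierarchy(method_name)
  end

  # Defining a method flushes the caches of this class and every descendant.
  def define(method_name, &body)
    @own_methods[method_name] = body
    invalidate
  end

  def search_hierarchy(method_name)
    klass = self
    while klass
      found = klass.own_methods[method_name]
      return found if found
      klass = klass.parent
    end
    nil
  end

  def invalidate
    @cache.clear
    @children.each { |child| child.invalidate }
  end
end

# Assumed hierarchy for illustration: Thing > Person > Rubyist.
thing   = ToyClass.new("Thing")
person  = ToyClass.new("Person", thing)
rubyist = ToyClass.new("Rubyist", person)

thing.define(:to_s) { "a thing" }
first = rubyist.find_method(:to_s).call    # found on Thing, cached on Rubyist

person.define(:to_s) { "a person" }        # cascades down, clearing Rubyist's cache
second = rubyist.find_method(:to_s).call   # now found on Person
```

Clearing whole caches is the simplest possible invalidation; the serial or switch mentioned on slide 19 would let a runtime detect staleness without walking every cache eagerly.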
  • 27. Constant Lookup
      • Cache at lookup site
      • Global serial/switch indicates staleness
          • Complexities of lookup, etc
          • Joy of Ruby interfering with Joy of Opto
      • Modifying constants triggers invalidation
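A minimal sketch of the site-level constant cache described on slide 27, assuming a single global serial number that is bumped on every constant write. ConstantTable and ConstantSite are hypothetical names invented for this example, and the real lookup rules (lexical scope, ancestors) are deliberately left out.

```ruby
# Hypothetical sketch: one global serial guards every cached constant.
class ConstantTable
  attr_reader :serial

  def initialize
    @values = {}
    @serial = 0
  end

  # Any constant modification bumps the serial, invalidating all sites at once.
  def set(name, value)
    @values[name] = value
    @serial += 1
  end

  def slow_lookup(name)
    @values.fetch(name)
  end
end

class ConstantSite
  attr_reader :misses

  def initialize(table, name)
    @table = table
    @name = name
    @cached_serial = nil
    @cached_value = nil
    @misses = 0
  end

  # Fast path: a single serial comparison. Slow path: full lookup, then re-cache.
  def get
    if @cached_serial != @table.serial
      @cached_value = @table.slow_lookup(@name)
      @cached_serial = @table.serial
      @misses += 1
    end
    @cached_value
  end
end

table = ConstantTable.new
table.set(:MY_CONST, 42)
site = ConstantSite.new(table, :MY_CONST)

site.get                 # miss: performs the lookup and caches it
site.get                 # hit: serial unchanged
table.set(:OTHER, 1)     # unrelated write still invalidates this site
site.get                 # miss again
```

The last three lines show the trade-off of a global serial: it is cheap to check, but any constant write anywhere forces every site to look up again.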
  • 28. Instance Vars
      • Class holds a table of offsets
      • Object holds array of values
      • Call site caches offset plus class ID
      • Same class, no lookup cost
          • Can be polymorphically chained
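The layout on slide 28 can be approximated as follows. This is an illustrative model under assumed names (ToyLayout, ToyObject, IvarSite are all hypothetical), and it caches only one class ID per site rather than a polymorphic chain.

```ruby
# Hypothetical per-class table mapping variable names to array offsets.
class ToyLayout
  attr_reader :id

  def initialize(id)
    @id = id
    @offsets = {}
  end

  # Assigns the next free offset the first time a name is seen.
  def offset_for(name)
    @offsets[name] ||= @offsets.size
  end
end

# Each object holds only an array of values plus a reference to its layout.
class ToyObject
  attr_reader :layout, :values

  def initialize(layout)
    @layout = layout
    @values = []
  end
end

# An access site caches the offset together with the class ID it was found for.
class IvarSite
  attr_reader :lookups

  def initialize(name)
    @name = name
    @cached_id = nil
    @cached_offset = nil
    @lookups = 0
  end

  def read(obj)
    obj.values[offset_in(obj)]
  end

  def write(obj, value)
    obj.values[offset_in(obj)] = value
  end

  private

  # Same class ID: reuse the cached offset with no table lookup.
  def offset_in(obj)
    unless @cached_id == obj.layout.id
      @cached_offset = obj.layout.offset_for(@name)
      @cached_id = obj.layout.id
      @lookups += 1
    end
    @cached_offset
  end
end

layout   = ToyLayout.new(1)
foo_site = IvarSite.new(:@foo)
bar_site = IvarSite.new(:@bar)
a = ToyObject.new(layout)
b = ToyObject.new(layout)

foo_site.write(a, "foo of a")
bar_site.write(a, "bar of a")
bar_site.write(b, "bar of b")   # same class ID, so no second table lookup
```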
  • 29. Optimizing Ruby
      • Make calls fast
      • Make constants free
      • Make instance variables cheap
      • Make closures lightweight
          • TODO
  • 30. What is invokedynamic?
  • 31. Invoke?
  • 32. Invoke? That’s one use, but there are many others
  • 33. Dynamic?
  • 34. Dynamic? Dynamic typing is a common reason, but there are many others
  • 35–41. JVM 101
      • 200 opcodes
      • Ten (or 16) “data endpoints”
          • Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial
          • Field Access: getfield, putfield, getstatic, putstatic
          • Array Access: *aload, *astore (b, s, c, i, l, d, f, a)
      • All Java code revolves around these endpoints
      • Remaining ops are stack, local vars, flow control, allocation, and math/boolean/bit operations
  • 42–46. JVM Opcodes
      • Invocation: invokevirtual, invokeinterface, invokestatic, invokespecial
      • Field Access: getfield, putfield, getstatic, putstatic
      • Array Access: *aload, *astore (b, s, c, i, l, d, f, a)
      • Stack
      • Local Vars
      • Flow Control
      • Allocation
      • Boolean and Numeric
  • 47. In Detail
      • JRuby generates code with indy calls
      • JVM at first call asks JRuby what to do
      • JRuby provides function pointers to code
      • Pointers include guards, invalidation logic
      • JRuby and JVM cooperate on optimizing
  • 48–53. [Diagram, built up over six slides, with the labels: invokedynamic bytecode, bootstrap method, method handles, target method]
  • 54–59. Dynamic Invocation
      • [Diagram, built up over six slides: a call site obj.foo() inside the JVM, and a target object associated with a method table containing def foo ... and def bar ...]
      • VM Operations
          • Method Lookup
          • Branch
          • Method Cache
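The three VM operations above (method lookup, branch, method cache) can be imitated in plain Ruby with a guarded call site. This is a conceptual sketch only: it does not use the JVM's java.lang.invoke API, GuardedCallSite is a hypothetical name, and it omits the invalidation logic that slide 47 says real call sites carry, so it would not notice a method being redefined.

```ruby
# Hypothetical guarded call site: caches one target per site, keyed by class.
class GuardedCallSite
  attr_reader :relinks

  def initialize(method_name)
    @method_name = method_name
    @guard_class = nil
    @target = nil
    @relinks = 0
  end

  def call(receiver, *args)
    # Branch: the guard checks whether the cached target still applies.
    unless receiver.class.equal?(@guard_class)
      # Method lookup (slow path), then method cache.
      @target = receiver.class.instance_method(@method_name)
      @guard_class = receiver.class
      @relinks += 1
    end
    @target.bind(receiver).call(*args)
  end
end

site = GuardedCallSite.new(:upcase)
r1 = site.call("foo")    # first call: lookup and link
r2 = site.call("bar")    # same class: guard passes, cached target reused
r3 = site.call(:baz)     # different class: guard fails, site relinks
```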
  • 60–64. Constants
      • [Diagram, built up over five slides: a call site MY_CONST inside the JVM, connected to a constant lookup that yields a value]
      • VM Operations
          • Lookup Value
          • Bind Permanently
  • 65–70. Instance Variables
      • [Diagram, built up over six slides: an access site @bar inside the JVM holding a cached offset of 1, and a target object associated with an offset table “@foo” => 0, “@bar” => 1]
      • VM Operations
          • Instance Var Lookup
          • Offset Cache
          • Access Object
  • 71. InvokeDynamic lets JRuby teach the JVM how Ruby works
  • 72. How Do We Know We’ve Succeeded?
      • Benchmarking
      • Monitoring
      • User reports
  • 73. Benchmarking is Hard
      • Runtimes may improve over time
      • Optimizer may eliminate useless code
      • Small systems are completely different
      • Know how your runtime optimizes!
  • 74. bench_empty_method
        def foo; self; end
        i = 0
        while i < 10_000_000
          foo; foo; foo; foo; foo
          i += 1
        end
  • 75. [Bar chart comparing Ruby 1.9.3, JRuby, and JRuby + indy; time axis from 0s to 4s] ZOMG 40X FASTER!
  • 76. Observations
  • 77. One slow runtime screws up the table
  • 78. ...do comparisons as ratios against a norm
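One way to follow the advice on slide 78 is to divide every timing by the timing of a chosen baseline. The sketch below uses made-up numbers purely for illustration; they are not measurements from the talk, and ratios_against is a hypothetical helper.

```ruby
# Hypothetical timings in seconds, for illustration only (not data from the talk).
timings = { "runtime A" => 4.0, "runtime B" => 0.2, "runtime C" => 0.1 }

# Expresses every timing as a multiple of the chosen norm's timing.
def ratios_against(timings, norm)
  base = timings.fetch(norm)
  timings.transform_values { |seconds| (seconds / base).round(2) }
end

ratios = ratios_against(timings, "runtime B")
ratios.each { |name, ratio| puts format("%-10s %6.2fx", name, ratio) }
```

With a norm of 1.0, the two fast runtimes stay distinguishable even when one slow runtime would otherwise flatten them on an absolute time axis.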
  • 79. JRuby calls empty methods really fast!!!
  • 80. InvokeDynamic does not do much for us?
  • 81. [Bar chart comparing Ruby 1.9.3, JRuby, and JRuby + indy; time axis from 0s to 4s]
  • 82. JVM Opto 101
      • JITs code bodies after 10k calls
          • No 10k calls, no JIT (generally)
      • Inlines up to two targets
      • Optimistic
          • Early decisions may be wrong
          • Small code looks drastically different
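Because the JIT only kicks in after code has run many times, a single timed pass can mislead. A minimal harness, assuming Ruby's standard Benchmark module, repeats the same loop several times in one process so that warm-up shows up as a difference between early and late runs. The names run_once and timed_runs are hypothetical, and the iteration count is lower than the 10_000_000 used on slide 74 to keep the example quick.

```ruby
require "benchmark"

def foo; self; end

# One pass of the empty-method loop from slide 74, with a configurable count.
def run_once(iterations)
  i = 0
  while i < iterations
    foo; foo; foo; foo; foo
    i += 1
  end
end

# Times several passes in the same process; on a JIT-compiling runtime,
# later passes may be faster than the first once the loop has been compiled.
def timed_runs(runs, iterations)
  Array.new(runs) { Benchmark.realtime { run_once(iterations) } }
end

times = timed_runs(5, 1_000_000)
times.each_with_index { |t, n| puts format("run %d: %.3fs", n + 1, t) }
```

Whether the later runs actually speed up depends on the runtime and its settings, so the output is best read as a trend rather than a single number.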
  • 83. SMALL CODE IS DIFFERENT THAN LARGE CODE
  • 84. Inlining
      • Call site in method A and method B match
      • JVM treats them as though B lived in A
          • No call overhead
          • Variables visible across call boundary
          • More complete view for optimization
  • 85. Optimistic
      • Say we have a system...
      • The only method dynamically called is “foo”
      • All logic for dyncall revolves around “foo”
      • Hotspot thinks all dyncalls will be “foo”
  • 86. bench_empty_method2
        def foo; self; end
        def bar1; self; end
        def bar2; self; end
        i = 0
        while i < 10_000_000
          bar1; bar1; bar1; bar1; bar1
          bar2; bar2; bar2; bar2; bar2
          i += 1
        end
        ...
  • 87. [Bar chart comparing bench1, bench2, bench1 + indy, and bench2 + indy; time axis from 0s to 0.7s]
  • 88. [Bar chart comparing bench1 + rbx, bench2 + rbx, bench1 + indy, and bench2 + indy; time axis from 0s to 0.4s]
  • 89. What Happened?
      • An unrelated change slowed our bench?
      • Not really unrelated
          • Hotspot optimizes early loop first
          • Later loop is different...calls “foo”
          • Assumptions change, perf looks different
  • 90. Benchmarking is Not Enough
      • Need to monitor runtime optimization
          • JIT compilation
          • Inlining
          • Eventual native code (x86 ASM)
      • Fun?
  • 91–93. [JIT inlining log, shown on three consecutive slides]
        1711 4 % bench_empty_method::block_0$RUBY$__file__ @ 56 (171 bytes)
        @ 59 java.lang.invoke.MethodHandle::invokeExact (33 bytes) inline (hot)
        @ 5 java.lang.invoke.MethodHandle::invokeExact (20 bytes) inline (hot)
        @ 2 java.lang.invoke.MethodHandle::invokeExact (9 bytes) inline (hot)
        @ 2 java.lang.invoke.MutableCallSite::getTarget (5 bytes) inline (hot)
        @ 16 java.lang.invoke.MethodHandle::invokeExact (5 bytes) inline (hot)
        @ 1 sun.invoke.util.ValueConversions::identity (2 bytes) inline (hot)
        @ 12 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot)
        @ 29 java.lang.invoke.MethodHandle::invokeExact (35 bytes) inline (hot)
        @ 5 java.lang.invoke.MethodHandle::invokeExact (7 bytes) inline (hot)
        @ 3 org.jruby.runtime.invokedynamic.InvocationLinker::testMetaclass (17 bytes) inline (hot)
        @ 5 org.jruby.RubyBasicObject::getMetaClass (5 bytes) inline (hot)
        @ 14 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot)
        @ 31 java.lang.invoke.MethodHandle::invokeExact (10 bytes) inline (hot)
        @ 6 bench_empty_method::method__0$RUBY$foo (2 bytes) inline (hot)
        @ 68 java.lang.invoke.MethodHandle::invokeExact (33 bytes) inline (hot)
        @ 5 java.lang.invoke.MethodHandle::invokeExact (20 bytes) inline (hot)
        @ 2 java.lang.invoke.MethodHandle::invokeExact (9 bytes) inline (hot)
        @ 2 java.lang.invoke.MutableCallSite::getTarget (5 bytes) inline (hot)
  • 94–96. [Native code for the compiled method, shown on three consecutive slides]
        Decoding compiled method 0x000000010549d7d0:
        Code:
        [Entry Point]
        [Verified Entry Point]
        [Constants]
          # {method} method__0$RUBY$foo (Lbench_empty_method;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject; in bench_empty_method
          # parm0: rsi:rsi = bench_empty_method
          # parm1: rdx:rdx = org/jruby/runtime/ThreadContext
          # parm2: rcx:rcx = org/jruby/runtime/builtin/IRubyObject
          # parm3: r8:r8 = org/jruby/runtime/Block
          # [sp+0x20] (sp of caller)
          0x000000010549d900: sub $0x18,%rsp
          0x000000010549d907: mov %rbp,0x10(%rsp) ;*synchronization entry
                                                  ; - bench_empty_method::method__0$RUBY$foo@-1 (line 3)
          0x000000010549d90c: mov %rcx,%rax
          0x000000010549d90f: add $0x10,%rsp
          0x000000010549d913: pop %rbp
          0x000000010549d914: test %eax,-0xe9f91a(%rip) # 0x00000001045fe000
                                                        ; {poll_return}
          0x000000010549d91a: retq
  • 97. bench_empty_method3
        def invoker1
          i = 0
          while i < 1000
            foo; foo; foo; foo; foo
            i+=1
          end
        end
        ...
        i = 0
        while i < 10000
          invoker1
          i+=1
        end
  • 98. [Bar chart comparing bench1 + indy, bench2 + indy, and bench3 + indy; time axis from 0s to 0.15s]
  • 99. Moral
      • Benchmarks are synthetic
      • Every system is different
      • Do your own testing
  • 100. bench_red_black
      • Pure-Ruby red/black tree impl
      • Build a 100k tree of rand(999_999)
      • Delete all nodes
      • Build it again
      • Search for elements
      • In-order walks, min, max
  • 101. [Bar chart for bench_red_black comparing Ruby 1.9.3, JRuby - indy, and JRuby + indy; time axis from 0s to 5s]
  • 102. bench_fractal, bench_flipflop_fractal
      • Mandelbrot generator
          • Integer loops
          • Floating-point math
      • Julia generator using flip-flops
          • I don’t really understand it.
  • 103–104. [Flip-flop fractal source, shown on two consecutive slides]
        def fractal_flipflop
          w, h = 44, 54
          c = 7 + 42 * w
          a = [0] * w * h
          g = d = 0
          f = proc do |n|
            a[c] += 1
            o = a.map {|z| " :#"[z, 1] * 2 }.join.scan(/.{#{w * 2}}/)
            puts "\f" + o.map {|l| l.rstrip }.join("\n")
            d += 1 - 2 * ((g ^= 1 << n) >> n)
            c += [1, w, -1, -w][d %= 4]
          end
          1024.times do
            !!(!!(!!(!!(!!(!!(!!(!!(!!(true...
              f[0])...f[1])...f[2])...
              f[3])...f[4])...f[5])...
              f[6])...f[7])...f[8])
          end
        end
  • 105. [Bar chart for bench_fractal comparing Ruby 1.9.3, JRuby - indy, and JRuby + indy; time axis from 0s to 1.5s]
  • 106. [Bar chart for bench_flipflop_fractal comparing Ruby 1.9.3, JRuby - indy, and JRuby + indy; time axis from 0s to 1.5s]
  • 107. Rails?
  • 108. Rails Perf
      • Mixed bag right now...some fast some slow
      • JVM JIT limits need to be bumped up
          • Significant gains for some folks
      • Long warmup times for so much code
      • Work continues!
  • 109. What Next?
  • 110. Expand Opto
      • Mixed-arity (ADD SLIDES ABOUT WHAT WE OPTIMIZE TODAY)
      • Super calls
      • Much, much lighter-weight closures
      • Then what?
  • 111. Wacky Stuff
      • define_method methods?
      • method_missing call-throughs?
      • respond_to???
      • proc tables?
      • All possible...but worth it?
  • 112. The Future
      • JRuby will continue to get faster
          • Indy improvements at VM-level
          • Compiler improvements at Ruby level
      • If you can’t compete with JVM...
      • Still FOSS from top to bottom
          • Don’t be afraid!
  • 113. Q/A