High Performance Ruby
Tips, Techniques, and Futures
Monday, July 1, 13
Me
• Charles Oliver Nutter
• @headius
• Java developer since 1996
• JRuby developer since 2006
• Red Hat / JBoss polyglot group
Is Ruby fast?
Is Ruby fast enough?
How fast do you need Ruby to be?
What Should We Optimize?
• Overall execution time?
• Memory use?
• Developer time?
• Developer happiness? :-)
Ruby can be fast...
if you know how.
Strategies
• Use a better runtime
• Use more cores
• Write better code
Use a Better VM
Many Options
• Ruby 2.0
• Significant execution improvements
• JRuby
• Leveraging JVM more and more
• Rubinius
• Optimizing VM built for Ruby
Go Java Go!
[Chart: JRuby 1.0.3 running bm_red_black_tree.rb on Java 1.4, Java 5, Java 6, and Java 7; value axis 0 to 30]
300% for free
Go JRuby Go!
[Chart: JRuby 1.0.3, 1.1.6, 1.4.0, 1.5.6, 1.6.8, and 1.7.0 running bm_red_black_tree.rb on OpenJDK 8; value axis 0 to 8]
8.2x Improvement
rbtree Extension
• Pure Ruby version works everywhere
• C or Java extension FOR SPEED
• Oh really? ;-)
red/black tree, pure Ruby versus native
[Chart, built up one bar per slide: runtime per iteration, axis 0 to 4. Values are paired with labels in the order both appear on the slides.]
ruby-1.9.3 + Ruby: 3.96
ruby-2.0.0 + Ruby: 2.48
maglev + Ruby: 1.39
macruby-0.12 + Ruby: 1.19
rbx-2.0.0rc1 + Ruby: 0.51
ruby-1.9.3 + C ext: 0.51
ruby-2.0.0 + C ext: 0.51
jruby + Ruby: 0.29
jruby + Java ext: 0.1
But How?
Dynamic Optimization
• Target method/value discovered at runtime
• Lookup is expensive
• We can cache it
• Cache has to be validated
• Indirection hurts pipeline
• Inline methods/values at access point
Method Caching
[Diagram, built up over several slides: the call site obj.foo() goes to the VM; the target object is associated with FooClass, whose method table holds def foo ... and def bar ...; in the later builds def foo ... also appears at the call site]
VM Operations
• Method Lookup
• Branch
• Method Cache
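The caching steps in the diagram can be approximated in plain Ruby. This is only a toy model under stated assumptions, not how any real VM implements it: `CallSite`, its `call` method, and the `lookups` counter are hypothetical names invented for the illustration, and the sketch validates the cache by receiver class alone (a real VM would also have to notice changes to the method table).

```ruby
# Toy model of a single-entry method cache at one call site.
# CallSite and its fields are hypothetical; real VMs do this in native code.
class CallSite
  attr_reader :lookups

  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil    # type seen on the previous call
    @cached_method = nil   # method found for that type
    @lookups = 0           # number of slow-path lookups performed
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)   # guard: is the cached entry still valid?
      @cached_method = klass.instance_method(@method_name)  # slow lookup
      @cached_class = klass
      @lookups += 1
    end
    @cached_method.bind(receiver).call(*args)  # branch to the cached target
  end
end

class FooClass
  def foo
    :foo_result
  end
end

SITE = CallSite.new(:foo)
3.times { SITE.call(FooClass.new) }
puts SITE.lookups
```

Three calls with receivers of the same class need only one lookup; a receiver of a different class would fail the guard and trigger another.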
Constant Lookup
[Diagram, built up over several slides: the access site MY_CONST goes to the VM, which locates the value in the constant table; in the later builds the value appears at the access site]
VM Operations
• Locate Value
• Bind Permanently
Inlining
def foo; 1; end
def invoker; foo; end
i = 0
while i < 10000
  invoker
  i += 1
end

Inline foo into invoker
def invoker; 1; end
i = 0
while i < 10000
  invoker
  i += 1
end

Inline invoker into loop
i = 0
while i < 10000
  1
  i += 1
end

Value is transient
i = 0
while i < 10000
  i += 1
end

Loop does nothing
i = 10000

Variable i is never read
Use More Cores
It's a multi-core world
• Scaling today is horizontal, not vertical
• N processes does not cut it
• N users * X MB process = $$$
• CoW is only a partial band-aid
• Non-parallel impls are falling behind
• JRuby, Rubinius your only real options
True Parallelism
[Diagram, built up over several slides: Ruby threads, native threads, and CPU cores in use for each implementation]
• Ruby 1.8.7: green threading, single thread
• Ruby 2.0.0: global lock
• JRuby: real threading
Multicore in MRI
[Diagram: a row of separate boxes, each labeled "200MB MRI Instance"]
Ten instances * 200MB = 2GB
Multicore in JRuby
[Diagram: a single box labeled "300MB JRuby Instance"]
One instance across 10 threads = 300MB
require 'benchmark'
ary = (1..1000000).to_a
loop {
  puts Benchmark.measure {
    10.times {
      ary.each {|i|}
    }
  }
}
require 'benchmark'
ary = (1..1000000).to_a
loop {
  puts Benchmark.measure {
    (1..10).map {
      Thread.new {
        ary.each {|i|}
      }
    }.map(&:join)
  }
}
[Figure, built up over two slides: panels labeled Ruby 1.9 single thread, Ruby 1.9 multiple threads, JRuby single thread, and JRuby multiple threads]
[Chart: threaded_reverse, per-iteration time versus thread count; x-axis one thread to four threads, y-axis 0.2s to 0.8s]
Doing It Right
• Lock-free persistent data structures
• hamster et al
• Thread-safety utilities
• Mutex, Queue, thread_safe + atomic gems
• Threaded servers
• puma, trinidad, torquebox, JVM servers
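As a rough illustration of the thread-safety utilities listed above, the sketch below feeds work to a few threads through a `Queue` and guards the shared result array with a `Mutex`. The helper name `parallel_squares`, the `:done` stop marker, and the pool size are assumptions made for the example, not anything taken from the talk.

```ruby
require 'thread'  # harmless on current Rubies; older ones need it for Queue

# Hypothetical helper: squares each input on a small pool of worker threads.
def parallel_squares(inputs, worker_count = 4)
  jobs = Queue.new
  inputs.each { |n| jobs << n }
  worker_count.times { jobs << :done }   # one stop marker per worker

  results = []
  lock = Mutex.new

  workers = worker_count.times.map do
    Thread.new do
      while (n = jobs.pop) != :done
        square = n * n                          # the work itself needs no lock
        lock.synchronize { results << square }  # shared array mutated under lock
      end
    end
  end
  workers.each(&:join)
  results.sort
end

p parallel_squares([1, 2, 3, 4, 5])
```

On an implementation with a global lock the workers take turns; on one with real threading they can run on separate cores, and the code stays the same.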
Finding Problems
• JRuby
• VM flags (heap/thread dumps, debug)
• Some of the best tools in the world
• Rubinius
• gdb, OS-level tools
• #rubinius
Write Better Code
Usual Suspects
• eval
• Exceptions as flow control
• Excessive allocation
• Defeating optimizations
• IO, DB, bad libraries
• VM flaw*
*I usually assume it's JRuby's fault until proven otherwise
eval
• Code never stays the same
• VM can't cache, can't see patterns
• No optimization is possible*
*Specific cases can sometimes be cached and optimized
Fixing eval
• Evaluate code into a method and leave it
• Methods are stable, optimizable
• Pass dynamic state, rather than interpolate
• Branches are cheaper than new code
• Do all evaluation up front
• ...not during your app's hot path
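A minimal sketch of the first two fixes, assuming a made-up scaling task: the slow version builds and evaluates a fresh code string per call, while the other evaluates once at load time into an ordinary method and passes the dynamic state as arguments. `scale_with_eval`, `Scaler`, and `SCALER` are hypothetical names.

```ruby
# Slow pattern: a new code string is built and evaluated on every call.
def scale_with_eval(value, factor)
  eval("#{value} * #{factor}")
end

# Better: evaluate once, up front, into a real method; pass state as arguments.
class Scaler
  class_eval <<-RUBY, __FILE__, __LINE__ + 1
    def scale(value, factor)
      value * factor
    end
  RUBY
end

SCALER = Scaler.new
puts scale_with_eval(6, 7)
puts SCALER.scale(6, 7)
```

Both calls compute the same product; only the second leaves the VM a stable method body it can cache and optimize.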
Exceptions
• Act like a special return value
• Construct object with information
• Capture call stack at raise point
• Unroll call stack until rescued
• Overhead ranges from big to huge
• Especially costly on optimizing VMs
def foo(a); raise; rescue; return a + 1; end
Shallow stack, 100k calls:
JRuby w/ exception: 7.7s
JRuby w/o exception: 0.004s
Ruby 2 w/ exception: 0.25s
Ruby 2 w/o exception: 0.009s
Rubinius w/ exception: 0.1s
Rubinius w/o exception: 0.002s
Deep stack, 100k calls:
JRuby w/ exception: 200s
Ruby 2 w/ exception: 1.25s
Rubinius w/ exception: 7.7s
Exception Alternatives
• Pre-allocated exception object
• Empty backtrace passed to raise()
• Special return value
• Check at each caller
• catch/throw
• Avoids most overhead
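Three of these alternatives might look as follows for a simple search. This is a sketch only: `NotFound`, `MISSING`, and the `find_with_*` helpers are invented for the example, and how much stack-capture work an empty backtrace actually saves depends on the implementation.

```ruby
class NotFound < StandardError; end

# Empty backtrace passed to raise: the third argument becomes the backtrace.
def find_with_cheap_raise(items, wanted)
  items.each { |item| return item if item == wanted }
  raise NotFound, "missing #{wanted}", []
end

# Special return value, checked by each caller.
MISSING = Object.new.freeze

def find_with_sentinel(items, wanted)
  items.each { |item| return item if item == wanted }
  MISSING
end

# catch/throw: a non-local exit that builds no exception object.
def find_with_throw(items, wanted)
  catch(:found) do
    items.each { |item| throw :found, item if item == wanted }
    nil
  end
end

begin
  find_with_cheap_raise([1, 2], 3)
rescue NotFound => e
  p e.backtrace   # => []
end
p find_with_sentinel([1, 2], 3).equal?(MISSING)
p find_with_throw([1, 2, 3], 2)
```

The sentinel version pushes a check onto every caller, which is the trade-off the slide notes; catch/throw keeps the non-local exit without that burden.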
Allocation
• Literals
• "foo" creates object every time
• String + String, Array + Array
• Creates intermediate objects
• += is especially wasteful
• Slicing and enumerating
• ary.map{}.select{}.inject{}.find = 3 arrays
Fixing Literals
• Constants are your friends
• Optimizes well on most impls
• Avoids literal churn
• Cache common interpolated values
• Study memory profiles
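A small sketch of the constant and caching suggestions. The names `GREETING`, `KEY_CACHE`, and the two greeting methods are hypothetical, and whether the plain literal really allocates on each call depends on the implementation and on settings such as frozen string literals.

```ruby
# Literal inside the method: typically a new String on every call.
def greeting_literal
  "hello"
end

# Constant: one frozen String, assigned at load time and shared by every call.
GREETING = "hello".freeze

def greeting_constant
  GREETING
end

# Cache a commonly interpolated value instead of rebuilding it each time.
# Not synchronized; guard it with a lock if threads share it.
KEY_CACHE = Hash.new { |cache, id| cache[id] = "user:#{id}".freeze }

puts greeting_constant.equal?(greeting_constant)
puts KEY_CACHE[42].equal?(KEY_CACHE[42])
```

Both lines print true because the same object comes back each time, which is the literal churn the constant avoids.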
Fixing Concat/Copy
• Modify in place
• Thread-safety trade-offs...
• Use persistent structures
• "hamster" gem
• Google "immutable ruby"
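For the modify-in-place option, a minimal comparison might look like this; `join_with_plus` and `join_in_place` are illustrative names only. The in-place version is safe here because the buffer is local to the method, which sidesteps the thread-safety trade-off.

```ruby
# += builds a brand new String on every iteration.
def join_with_plus(words)
  out = ""
  words.each { |w| out += w }
  out
end

# << appends to the same String in place.
def join_in_place(words)
  out = String.new          # fresh, unfrozen buffer owned by this method
  words.each { |w| out << w }
  out
end

puts join_with_plus(%w[a b c])
puts join_in_place(%w[a b c])
```

The results are identical; the difference is the number of intermediate strings created along the way.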
Fixing Enum Chaining
• Condense into fewer steps
• Lazy Enumerator in 2.0
• Just use a loop :-)
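The three options could be sketched like this for one made-up task: double the numbers, keep the multiples of three, and take the first two. The method names and `NUMBERS` are assumptions for the example.

```ruby
NUMBERS = (1..20).to_a

# Chained: map and select each build a full intermediate Array.
def chained(numbers)
  numbers.map { |n| n * 2 }.select { |n| n % 3 == 0 }.first(2)
end

# Lazy (Ruby 2.0+): elements flow through one at a time and stop after two hits.
def lazy_chain(numbers)
  numbers.lazy.map { |n| n * 2 }.select { |n| n % 3 == 0 }.first(2)
end

# Plain loop: one pass, one result Array.
def single_loop(numbers)
  result = []
  numbers.each do |n|
    doubled = n * 2
    result << doubled if doubled % 3 == 0
    break if result.size == 2
  end
  result
end

p chained(NUMBERS)
p lazy_chain(NUMBERS)
p single_loop(NUMBERS)
```

All three print [6, 12]. Lazy enumerators avoid the intermediate arrays but add per-element overhead of their own, so measuring is the only way to know which wins for a given workload.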
Defeating Optimization
• Caching and inlining are key to perf
• If we can't cache...
• Methods won't inline, won't optimize
• Constants must be looked up every time
• We have less time for real work
Method Cache Busting
• VM must ensure cache is correct
• Check type
• Ensure method table is the same
• New type every time? No caching.
• Modify method table? No caching.
Dynamic Invocation
[Diagram, animated over ten slides: the same call site obj.foo(), VM, target object, FooClass, and method table as in Method Caching; def foo ... appears at the call site on every second slide and is absent on the slides in between]
VM Operations
• Method Lookup
• Branch
• Method Cache
Singletons
• Creates new types at runtime
• Impossible to cache based on type
• Usually defines new methods
• Method table is always different
class << foo ...
def foo.bar ...
Object#extend
• Includes module into single object
• New one-off type every time
• Class hierarchy keeps changing
foo.extend Enumerable
static VALUE
io_getpartial(int argc, VALUE *argv, VALUE io, int nonblock)
{
    ...
    n = rb_read_internal(fptr->fd, RSTRING_PTR(str), len);
    rb_str_unlocktmp(str);
    if (n < 0) {
        if (!nonblock && rb_io_wait_readable(fptr->fd))
            goto again;
        if (nonblock && (errno == EWOULDBLOCK || errno == EAGAIN))
            rb_mod_sys_fail(rb_mWaitReadable, "read would block");
        rb_sys_fail_path(fptr->pathv);
    }
    ...
}

void
rb_mod_sys_fail(VALUE mod, const char *mesg)
{
    VALUE exc = make_errno_exc(mesg);
    rb_extend_object(exc, mod);   /* same effect as exc.extend(mod) */
    rb_exc_raise(exc);
}
Fixing Singletons/#extend
• Functional patterns
• FooLibrary.process(obj) rather than obj.extend FooLibrary; obj.process
• Create types up front (programmatically?)
• 1000 predefined types beats infinite types
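A minimal sketch of the functional pattern next to the `extend` version. `FooLibrary` and `process` come from the slide, but their bodies here (upcasing `to_s`) and the two wrapper methods are invented purely to have something runnable.

```ruby
module FooLibrary
  # Functional style: the object is an argument; its type is never altered.
  def self.process(obj)
    obj.to_s.upcase
  end

  # Mixin style: only reachable after obj.extend(FooLibrary).
  def process
    to_s.upcase
  end
end

def process_with_extend(obj)
  obj.extend(FooLibrary)   # gives this one object its own one-off type
  obj.process
end

def process_functionally(obj)
  FooLibrary.process(obj)
end

a = "abc".dup
b = "abc".dup
puts process_with_extend(a)
puts process_functionally(b)
puts a.singleton_class.include?(FooLibrary)
puts b.singleton_class.include?(FooLibrary)
```

Both calls return "ABC", but only the extended object ends up with FooLibrary mixed into its own singleton class; the functional call leaves the receiver an ordinary String.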
Constant Lookup
• Constants in tables on classes/modules
• Usually assigned only once, at load time
• Lookup is expensive, like methods
• Values can be cached
Constant Cache
• Constant search proceeds two ways
• First, lexical scoping
• Second, class hierarchy
• Invalidation happens globally
Constant Cache Busting
• Redefining constants
• Introducing new lexical scopes
• Classes created at runtime
• Evaluated code
• Altering class hierarchies
• Lookup results may change...no caching
Fixing Constants
• Don't modify them
• i.e. CONSTANT
• Avoid runtime class hierarchy changes
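One way to follow this advice, sketched under assumptions: assign the constant once at load time and keep anything that changes at runtime in ordinary state. The `Settings` module and its `retries` accessors are hypothetical.

```ruby
# Cache-busting pattern (shown as comments only): reassigning a constant at runtime.
#   MAX_RETRIES = 3
#   MAX_RETRIES = 5   # also triggers an "already initialized constant" warning

# Stable alternative: the constant is assigned once; changing state lives elsewhere.
module Settings
  DEFAULT_RETRIES = 3   # assigned at load time, never redefined

  def self.retries
    @retries || DEFAULT_RETRIES
  end

  def self.retries=(count)
    @retries = count    # a runtime change touches no constant
  end
end

puts Settings.retries
Settings.retries = 5
puts Settings.retries
puts Settings::DEFAULT_RETRIES
```

The setting changes from 3 to 5 while the constant stays at 3, so cached constant lookups remain valid.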
How to Get Help
Performance Issues
• Assume nothing...most can be fixed
• Isolate bad code, as small a case as possible
• Use VM tools to monitor caches
• Fix if it's your bug, PR if it's a library
• Come to us for help or if it's a VM bug
• Repeat...
Concurrency Issues
• Avoid mutable state
• Synchronize mutations
• Start coarse-grained, get finer over time
• VM tooling to monitor locks, contention
• Contact VM authors for help
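A coarse-grained starting point might look like the sketch below, where a single lock guards every read and write of the shared hash. `SafeCounter` and `COUNTER` are hypothetical names; a finer-grained design would only be worth it once tooling shows contention on this lock.

```ruby
# Hypothetical shared counter: one coarse lock guards all access to the hash.
class SafeCounter
  def initialize
    @lock = Mutex.new
    @counts = Hash.new(0)
  end

  def increment(key)
    @lock.synchronize { @counts[key] += 1 }
  end

  def [](key)
    @lock.synchronize { @counts[key] }
  end
end

COUNTER = SafeCounter.new
threads = 8.times.map do
  Thread.new { 1000.times { COUNTER.increment(:hits) } }
end
threads.each(&:join)
puts COUNTER[:hits]   # 8000
```

Without the lock, the read-modify-write in `+=` could lose updates on an implementation with real threading.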
Ruby can be fast...and
we want to help you.
Thank You!
• Charles Oliver Nutter
• @headius
• http://jruby.org
• http://blog.headius.com
• Book: "Using JRuby"
• Book: "Deploying JRuby"

High Performance Ruby - E4E Conference 2013
