2. Hiya
• Charles Oliver Nutter
• headius@headius.com
• @headius
• JVM language guy at Red Hat (JBoss)
3. Performance?
• Writing code
• Man hours more expensive than CPU hours
• Developer contentedness
• Running code
• Straight line
4. High Performance?
• Faster than...
• ...other Ruby impls?
• ...other language runtimes?
• ...unmanaged languages, like C?
• ...you need it to be?
5. “Fast Enough”
• 1.8.7 was fast enough
• 1.9.3 is fast enough
• Unless it’s not fast enough
• Does it matter?
6. Performance Wall
• Move to a different runtime
• Move to a different language
• ...in whole or part
7. If you’re not writing perf-sensitive code in Ruby, you’re giving up too easily.
8. Native Extensions
• Not universally bad
• Just bad in MRI
• Invasive
• Pointers
• Few guarantees
9. What We Want
• Faster execution
• Better GC
• Parallel execution
• Big data
10. What We Can’t Have
• Faster execution
• Better GC
• Parallel execution
• Big data
11. Different Approach
• Build our own runtime?
• YARV, Rubinius, MacRuby
• Use an existing runtime?
• JRuby, MagLev, MacRuby, IronRuby
12. Build or Buy
• Making a new VM is “easy”
• Making it competitive is really hard
• I mean really, really, really hard
13. JVM
• 15+ years of engineering by whole teams
• FOSS
• Fastest VM available
• Best GCs available
• Full parallel threading with guarantees
• Broad platform support
14. But Java is Slow!
• Java is very, very fast
• Literally, C fast in many cases
• Java applications can be slow
• Oh hey, just like Ruby?
• The way you write code is more important than the language you use.
15. JRuby
• Java (and Ruby) impl of Ruby on JVM
• Same memory, threading model
• JRuby JITs to JVM bytecode
• End of story, right?
16. Long, Hard Road
• Interpreter optimization
• JVM bytecode compiler
• Optimizing core class methods
• Lather, rinse, and repeat
18. Align with JVM
• Individual arguments on call stack
• JVM local variables
• Avoid artificial framing
• Avoid inter-call goo
• Eliminate unnecessary work
19. Unnecessary Work
• Modules are maps
• Name to method
• Name to constant
• Name to class var
• Instance variables as maps
• Wasted cycles without caching
20. Method Lookup
• Inside a class/module
• Current class’s methods (a map)
• Methods retrieved from class + ancestors
• Serial or switch indicates staleness
• Weak list of child classes
• Class mutation cascades down hierarchy
24-27. [Diagram, built up over four slides]
• Class hierarchy: Thing at the top; Person and Place below it; Rubyist and Other on the bottom row
• Call site: obj.to_s
• A to_s label appears at different levels of the hierarchy across the steps
• Method lookups go up-hierarchy
• Lookup target caches result
• Modification cascades down
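The lookup-and-cache behaviour on the method lookup slides can be sketched in plain Ruby. This is a toy model only: `ToyClass`, `define`, `lookup`, and `invalidate` are hypothetical names made up for illustration, not JRuby internals, and a real VM would hold the child list weakly.

```ruby
# Toy model of the slides' method cache. ToyClass and its API are
# hypothetical and exist only for illustration.
class ToyClass
  attr_reader :name, :serial

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @methods = {}    # this class's own methods (a map)
    @cache = {}      # methods retrieved from this class + ancestors
    @children = []   # a real VM would keep these as weak references
    @serial = 0      # bumped whenever cached lookups may be stale
    parent.add_child(self) if parent
  end

  def add_child(child)
    @children << child
  end

  def define(method_name, &body)
    @methods[method_name] = body
    invalidate
  end

  # Walk up the hierarchy once, then remember the answer at the lookup target.
  def lookup(method_name)
    @cache[method_name] ||=
      @methods[method_name] || (@parent && @parent.lookup(method_name))
  end

  # Mutation cascades down: this class and every descendant drop their caches.
  def invalidate
    @serial += 1
    @cache.clear
    @children.each { |child| child.invalidate }
  end
end

thing   = ToyClass.new("Thing")
person  = ToyClass.new("Person", thing)
rubyist = ToyClass.new("Rubyist", person)

thing.define(:to_s) { "a thing" }
puts rubyist.lookup(:to_s).call       # found on Thing, cached on Rubyist
person.define(:to_s) { "a person" }   # clears Person's and Rubyist's caches
puts rubyist.lookup(:to_s).call       # now resolves to Person's to_s
```

Note that defining `to_s` on Person leaves Thing's cache alone, since nothing above Person can see the change.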
28. Constant Lookup
• Cache at lookup site
• Global serial/switch indicates staleness
• Complexities of lookup, etc
• Joy of Ruby interfering with Joy of Opto
• Modifying constants triggers invalidation
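A minimal sketch of a lookup-site constant cache guarded by one global serial number, assuming a toy constant table. `ToyConstants` and `ConstantSite` are hypothetical names, and the real lookup rules (lexical scope, ancestors, autoload) are deliberately left out.

```ruby
# Hypothetical toy: a single global serial guards every constant cache.
module ToyConstants
  @table = {}
  @serial = 0

  class << self
    attr_reader :serial

    def set(name, value)
      @table[name] = value
      @serial += 1          # modifying any constant invalidates all sites
    end

    def slow_lookup(name)
      @table.fetch(name)
    end
  end
end

# One instance per place in the code that reads a constant.
class ConstantSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_serial = nil
    @value = nil
    @misses = 0
  end

  def value
    unless @cached_serial == ToyConstants.serial
      # Stale or never filled: do the full lookup and remember the serial.
      @value = ToyConstants.slow_lookup(@name)
      @cached_serial = ToyConstants.serial
      @misses += 1
    end
    @value
  end
end

ToyConstants.set(:LIMIT, 10)
site = ConstantSite.new(:LIMIT)
3.times { site.value }            # one slow lookup, then cache hits
ToyConstants.set(:LIMIT, 20)      # bumps the global serial
puts site.value                   # refetched because the serial changed
```

The trade-off of a single global serial is that redefining any constant forces every site to look up again, which is cheap to check but coarse.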
29. Instance Vars
• Class holds a table of offsets
• Object holds array of values
• Call site caches offset plus class ID
• Same class, no lookup cost
• Can be polymorphically chained
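The offset-table layout can be illustrated with a small Ruby model. All names here (`ToyShape`, `ToyObject`, `IvarReadSite`) are assumptions for the sketch, and polymorphic chaining is not modelled: the site remembers only the last class it saw.

```ruby
# Hypothetical toy: the "class" owns name -> offset, each object owns a flat array.
class ToyShape
  attr_reader :id

  def initialize(id)
    @id = id
    @offsets = {}
  end

  # Returns the slot for an ivar name, assigning the next free slot on first use.
  def offset_for(name)
    @offsets[name] ||= @offsets.size
  end
end

class ToyObject
  attr_reader :shape, :values

  def initialize(shape)
    @shape = shape
    @values = []
  end

  def set(name, value)
    @values[@shape.offset_for(name)] = value
  end
end

# One instance per place in the code that reads a given instance variable.
class IvarReadSite
  attr_reader :slow_lookups

  def initialize(name)
    @name = name
    @cached_class_id = nil
    @cached_offset = nil
    @slow_lookups = 0
  end

  def read(obj)
    unless obj.shape.id == @cached_class_id   # guard: same class as last time?
      @cached_offset = obj.shape.offset_for(@name)
      @cached_class_id = obj.shape.id
      @slow_lookups += 1
    end
    obj.values[@cached_offset]                # plain array index on the fast path
  end
end

person = ToyShape.new(1)
ann = ToyObject.new(person)
ann.set(:@name, "Ann")
ann.set(:@age, 30)
bob = ToyObject.new(person)
bob.set(:@name, "Bob")
bob.set(:@age, 41)

age_site = IvarReadSite.new(:@age)
puts age_site.read(ann)
puts age_site.read(bob)   # same class ID, so no second table lookup
```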
30. Optimizing Ruby
• Make calls fast
• Make constants free
• Make instance variables cheap
• Make closures lightweight
• TODO
42. JVM 101
200 opcodes
Ten (or 16) “data endpoints”
Invocation       Field Access   Array Access
invokevirtual    getfield       *aload
invokeinterface  putfield       *astore
invokestatic     getstatic      (* is one of b,s,c,i,l,d,f,a)
invokespecial    putstatic
All Java code revolves around these endpoints
Remaining ops are stack, local vars, flow control, allocation, and math/boolean/bit operations
45. JVM Opcodes
Invocation       Field Access   Array Access
invokevirtual    getfield       *aload
invokeinterface  putfield       *astore
invokestatic     getstatic      (* is one of b,s,c,i,l,d,f,a)
invokespecial    putstatic
Other groups: Stack, Local Vars, Flow Control, Allocation, Boolean and Numeric
49. In Detail
• JRuby generates code with indy calls
• JVM at first call asks JRuby what to do
• JRuby provides function pointers to code
• Pointers include guards, invalidation logic
• JRuby and JVM cooperate on optimizing
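The handshake described above can be mimicked in Ruby to make the moving parts concrete. This is only an analogy: real invokedynamic works with `java.lang.invoke` method handles inside the JVM, and `ToyCallSite` and `bootstrap` are hypothetical names. Invalidation on method redefinition is not modelled, only the class guard.

```ruby
# Hypothetical sketch of a self-patching call site with a guard.
class ToyCallSite
  attr_reader :bootstraps

  def initialize(method_name)
    @method_name = method_name
    @bootstraps = 0
    @target = method(:bootstrap)   # the first call falls through to the runtime
  end

  def call(receiver, *args)
    @target.call(receiver, *args)
  end

  private

  # "JVM asks JRuby what to do": find the method, install a guarded target.
  def bootstrap(receiver, *args)
    @bootstraps += 1
    klass = receiver.class
    body = klass.instance_method(@method_name)
    fast = ->(recv, *a) { body.bind(recv).call(*a) }
    fallback = method(:bootstrap)
    # Guard: stay on the fast path only while the receiver's class matches.
    @target = lambda do |recv, *a|
      recv.class.equal?(klass) ? fast.call(recv, *a) : fallback.call(recv, *a)
    end
    fast.call(receiver, *args)
  end
end

site = ToyCallSite.new(:to_s)
puts site.call(1)      # bootstraps, then binds the site to Integer#to_s
puts site.call(2)      # guard passes, no second bootstrap
puts site.call(:sym)   # guard fails, site is rebound to Symbol#to_s
```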
75. How Do We Know We’ve Succeeded?
• Benchmarking
• Monitoring
• User reports
76. Benchmarking is Hard
• Runtimes may improve over time
• Optimizer may eliminate useless code
• Small systems are completely different
• Know how your runtime optimizes!
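One way to respect these points is to time several passes separately and to keep the benchmarked result observable. The helper below is a minimal sketch using the standard `benchmark` library; `timed_passes` and its keyword arguments are hypothetical names, and the iteration counts are arbitrary.

```ruby
require "benchmark"

# Hypothetical helper: run the same block for several passes and report each
# pass separately, so warm-up shows up as a trend instead of being averaged away.
def timed_passes(passes: 5, iterations: 100_000)
  last = nil
  times = Array.new(passes) do
    Benchmark.realtime do
      iterations.times { |i| last = yield(i) }
    end
  end
  # Returning the last result keeps the work observable, so an optimizer
  # has less room to treat the loop body as useless code.
  [times, last]
end

times, last = timed_passes(passes: 3, iterations: 10_000) { |i| i * 2 }
times.each_with_index { |t, n| puts format("pass %d: %.6fs", n + 1, t) }
puts "last result: #{last}"
```

Comparing the first pass with the later ones gives a rough picture of how much the runtime improved while the benchmark was running.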
87. JVM Opto 101
• JITs code bodies after 10k calls
• No 10k calls, no JIT (generally)
• Inlines up to two targets
• Optimistic
• Early decisions may be wrong
• Small code looks drastically different
89. Inlining
• Call site in method A and method B match
• JVM treats them as though B lived in A
• No call overhead
• Variables visible across call boundary
• More complete view for optimization
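A hand-written before-and-after pair shows what inlining amounts to. The JVM does this on compiled code rather than on Ruby source, so the second method is only an illustration of the effect; the method names are made up for the example.

```ruby
# Before: the loop in sum_of_squares reaches square through a call site.
def square(x)
  x * x
end

def sum_of_squares(n)
  total = 0
  i = 0
  while i < n
    total += square(i)
    i += 1
  end
  total
end

# After: roughly what the optimizer works with once square is inlined.
# There is no call overhead, and i and total sit in one body it can
# analyse as a whole.
def sum_of_squares_inlined(n)
  total = 0
  i = 0
  while i < n
    total += i * i
    i += 1
  end
  total
end

puts sum_of_squares(10) == sum_of_squares_inlined(10)
```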
90. Optimistic
• Say we have a system...
• The only method dynamically called is “foo”
• All logic for dyncall revolves around “foo”
• Hotspot thinks all dyncalls will be “foo”
91. bench_empty_method2
def foo; self; end
def bar1; self; end
def bar2; self; end
i = 0
while i < 10_000_000
  bar1; bar1; bar1; bar1; bar1
  bar2; bar2; bar2; bar2; bar2
  i += 1
end
...
94. What Happened?
• An unrelated change slowed our bench?
• Not really unrelated
• Hotspot optimizes early loop first
• Later loop is different...calls “foo”
• Assumptions change, perf looks different
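The effect can be pictured with a toy profile of the targets seen by one shared piece of dispatch logic. This is an illustration of the idea, not HotSpot's actual algorithm: `ToyProfile` is a hypothetical class, and the limit of two comes from the "inlines up to two targets" bullet earlier in the deck.

```ruby
# Hypothetical toy: track which targets a shared dispatch path has seen.
class ToyProfile
  INLINE_LIMIT = 2   # from the slides: "Inlines up to two targets"

  def initialize
    @targets = []
  end

  def record(target)
    @targets << target unless @targets.include?(target)
    self
  end

  def state
    case @targets.size
    when 0 then :unseen
    when 1 then :monomorphic
    when 2 then :bimorphic
    else :megamorphic
    end
  end

  # Optimistic inlining only pays off while few targets have been seen.
  def inlinable?
    @targets.size.between?(1, INLINE_LIMIT)
  end
end

profile = ToyProfile.new
profile.record(:bar1).record(:bar2)   # the early loop only ever calls these
puts profile.state
profile.record(:foo)                  # the later loop adds a third target
puts profile.state
```

Once the third target shows up, decisions made while only two were known no longer hold, which is why a seemingly unrelated loop can change the numbers.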
95. Benchmarking is Not Enough
• Need to monitor runtime optimization
• JIT compilation
• Inlining
• Eventual native code (x86 ASM)
• Fun?
105. bench_red_black
• Pure-Ruby red/black tree impl
• Build a 100k tree of rand(999_999)
• Delete all nodes
• Build it again
• Search for elements
• In-order walks, min, max
109.
def fractal_flipflop
  w, h = 44, 54
  c = 7 + 42 * w
  a = [0] * w * h
  g = d = 0
  f = proc do |n|
    a[c] += 1
    o = a.map {|z| " :#"[z, 1] * 2 }.join.scan(/.{#{w * 2}}/)
    puts "\f" + o.map {|l| l.rstrip }.join("\n")
    d += 1 - 2 * ((g ^= 1 << n) >> n)
    c += [1, w, -1, -w][d %= 4]
  end
  1024.times do
    !!(!!(!!(!!(!!(!!(!!(!!(!!(true...
      f[0])...f[1])...f[2])...
      f[3])...f[4])...f[5])...
      f[6])...f[7])...f[8])
  end
end
115. Rails Perf
• Mixed bag right now...some fast some slow
• JVM JIT limits need to be bumped up
• Significant gains for some folks
• Long warmup times for so much code
• Work continues!
119. The Future
• JRuby will continue to get faster
• Indy improvements at VM-level
• Compiler improvements at Ruby level
• If you can’t compete with JVM...
• Still FOSS from top to bottom
• Don’t be afraid!
Ruby is already a “high performance” language when it comes to writing code
Many better reasons... differently expressive languages, differently fun, designed for the problem at hand...
_why’s potion, MA Cournoyer’s tinyrb, the thousand other Ruby impls
Rubinius? 5 years with two full-time people, hundreds of contributors. 1.5 years since last release.
Two hard things in CS: cache invalidation and naming things (and off-by-one errors)
Also loading constants, which are read-only; not as interesting
Comparisons as ratios...sometimes. Often a stark difference sells the point better.
Rails applications are incredibly big systems compared to benchmarks