2. Intro
• Charles Oliver Nutter
• “JRuby Guy”
• Sun Microsystems 2006-2009
• Engine Yard 2009-2012
• Red Hat 2012-
• Primarily responsible for compiler, perf
• Looking inside the JVM
3. What We Will Learn
• How the JVM’s JIT works
• Monitoring the JIT
• Finding problems
• Dumping assembly (don’t be scared!)
4. What We Won’t
• GC tuning
• GC monitoring with VisualVM
• Google ‘visualgc’, it’s awesome
• OpenJDK internals
• JNI
5. Caveat
• Focusing on OpenJDK (Hotspot)
• Other JVMs will do things differently
• But the basic principles usually apply
• Flags are specific to Hotspot
• Internal, subject to change, etc.
6. JIT
• Just-In-Time compilation
• Compiled when needed
• Maybe immediately before execution
• ...or when we decide it’s important
• ...or never?
8. Profiling
• Gather data about code while interpreting
• Invariants (types, constants, nulls)
• Statistics (branches, calls)
• Use that information to optimize
• Educated guess
• Guess can be wrong...
11. Inlining?
• Combine caller and callee into one unit
• e.g. based on profile
• Perhaps with a guard/test
• Optimize as a whole
• More code means better visibility
12. Inlining
int addAll(int max) {
    int accum = 0;
    for (int i = 0; i < max; i++) {
        accum = add(accum, i);
    }
    return accum;
}
int add(int a, int b) {
    return a + b;
}
13. Inlining
int addAll(int max) {
    int accum = 0;
    for (int i = 0; i < max; i++) {
        accum = add(accum, i); // only one target is ever seen
    }
    return accum;
}
int add(int a, int b) {
    return a + b;
}
14. Inlining
int addAll(int max) {
    int accum = 0;
    for (int i = 0; i < max; i++) {
        accum = accum + i; // add() inlined: don't bother making the call
    }
    return accum;
}
15. Loop Unrolling
• Works for small, constant loops
• Avoids tests, branching
• Lets a single call be inlined once per unrolled copy
16. Loop Unrolling
private static final String[] options =
    { "yes", "no", "maybe" };
public void looper() {
    for (String option : options) { // small loop, constant stride, constant size
        process(option);
    }
}
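Written out by hand, the unrolled form of looper() looks roughly like this. It is only a source-level illustration of the transformation, assuming a hypothetical process() that just records its argument; the JIT unrolls compiled code, not source, and its own heuristics decide whether it happens at all.

```java
import java.util.ArrayList;
import java.util.List;

public class UnrollSketch {
    private static final String[] options = { "yes", "no", "maybe" };

    // Hypothetical process(): records what it was given so results can be compared.
    static final List<String> seen = new ArrayList<>();

    static void process(String option) {
        seen.add(option);
    }

    // Original shape: a bounds test and a back branch on every iteration.
    public static void looper() {
        for (String option : options) {
            process(option);
        }
    }

    // Hand-unrolled equivalent: no loop test, no back branch, and three
    // separate call sites, each of which could be inlined on its own.
    public static void looperUnrolled() {
        process(options[0]);
        process(options[1]);
        process(options[2]);
    }

    public static void main(String[] args) {
        looper();
        looperUnrolled();
        System.out.println(seen); // [yes, no, maybe, yes, no, maybe]
    }
}
```

Hardcoding three calls is only valid because the length of the static final array can never change, which is exactly the “constant size” property the JIT relies on.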
18. Lock Coarsening
public void needsLocks() {
    for (String option : options) {
        process(option); // repeatedly locking
    }
}
private synchronized String process(String option) {
    // some wacky thread-unsafe code
    return option;
}
19. Lock Coarsening
public void needsLocks() {
    synchronized (this) { // lock once
        for (String option : options) {
            // some wacky thread-unsafe code
        }
    }
}
20. Lock Eliding
public void overCautious() {
    List<String> l = new ArrayList<>();
    synchronized (l) { // synchronize on a new object...
        for (String option : options) {
            l.add(process(option));
        }
    }
    // ...but we know it never escapes this thread
}
21. Lock Eliding
public void overCautious() {
    List<String> l = new ArrayList<>();
    for (String option : options) { // no need to lock
        l.add(/* process()'s code */);
    }
}
22. Escape Analysis
private static class Foo {
    public final String a;
    public final String b;
    Foo(String a, String b) {
        this.a = a;
        this.b = b;
    }
}
23. Escape Analysis
public void bar() {
    Foo f = new Foo("Hello", "JVM");
    baz(f);
}
// Same object all the way through
public void baz(Foo f) {
    System.out.print(f.a);
    System.out.print(", ");
    quux(f);
}
// Never “escapes” these methods
public void quux(Foo f) {
    System.out.print(f.b);
    System.out.println('!');
}
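Once bar(), baz() and quux() are all inlined, escape analysis can drop the Foo allocation and keep its fields in locals (scalar replacement). A hand-written sketch of the effective result; it returns the text instead of printing it so it is easy to check, and barScalarReplaced() is an illustrative name, not anything Hotspot actually generates.

```java
public class ScalarReplacementSketch {
    private static class Foo {
        public final String a;
        public final String b;
        Foo(String a, String b) { this.a = a; this.b = b; }
    }

    // Original shape: one Foo allocated and passed down the call chain.
    public static String bar() {
        Foo f = new Foo("Hello", "JVM");
        return baz(f);
    }

    public static String baz(Foo f) {
        return f.a + ", " + quux(f);
    }

    public static String quux(Foo f) {
        return f.b + "!";
    }

    // Roughly what remains after inlining baz() and quux() and
    // scalar-replacing f: the fields live in locals, no Foo on the heap.
    public static String barScalarReplaced() {
        String a = "Hello"; // was f.a
        String b = "JVM";   // was f.b
        return a + ", " + b + "!";
    }

    public static void main(String[] args) {
        System.out.println(bar());               // Hello, JVM!
        System.out.println(barScalarReplaced()); // Hello, JVM!
    }
}
```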
25. Escape Analysis
• A bit finicky on Hotspot
• All paths must inline
• No external view of object
• JRockit was better here?
• Now that Oracle owns both, they can fix Hotspot!
26. Perf Sinks
• Memory accesses
• By far the biggest expense
• Calls
• Memory ref + branch kills pipeline
• Call stack, register juggling costs
• Locks
27. Volatile?
• Each CPU core maintains its own memory cache
• Caches may be out of sync
• If it doesn’t matter, no problem
• If it does matter, threads disagree!
• Volatile forces reads and writes to be visible to all threads
• Memory barriers across cores; no caching the value in a register
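The classic case where it does matter is a stop flag shared between threads. A minimal sketch with illustrative names: without volatile, the JIT may hoist the read of running out of the loop and the worker can spin forever; with volatile, the write in stop() is guaranteed to become visible to the worker.

```java
public class StopFlag {
    // volatile: each read sees the latest write from any thread, so the
    // JIT may not keep this field in a register across loop iterations.
    private volatile boolean running = true;
    private long iterations;

    public void runLoop() {
        while (running) {
            iterations++;
        }
    }

    public void stop() {
        running = false; // volatile write
    }

    public long iterations() {
        return iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(flag::runLoop);
        worker.start();
        Thread.sleep(100);
        flag.stop();
        worker.join(5000); // should return promptly: the worker sees the write
        System.out.println("worker stopped: " + !worker.isAlive()
                + " after " + flag.iterations() + " iterations");
    }
}
```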
28. Call Site
• The place where you make a call
• Monomorphic (“one shape”)
• Single target class
• Bimorphic (“two shapes”)
• Polymorphic (“many shapes”)
• Megamorphic (“you’re screwed”)
29. Blah.java
System.currentTimeMillis();                   // static, monomorphic
List list1 = new ArrayList();                 // constructor, monomorphic
List list2 = new LinkedList();
for (List list : new List[]{ list1, list2 }) {
    list.add("hello");                        // bimorphic
}
for (Object obj : new Object[]{ "foo", list1, new Object() }) {
    obj.toString();                           // polymorphic
}
30. Hotspot
• -client mode (C1) inlines, less aggressive
• Fewer opportunities to optimize
• -server mode (C2) inlines aggressively
• Based on richer runtime profiling
32. system ~/projects/javaone2012-jit $ (pickjdk 4 ; time jruby -e 1)
New JDK: jdk1.7.0_07.jdk
real 0m1.251s
user 0m2.128s
sys 0m0.093s
0
system ~/projects/javaone2012-jit $ (pickjdk 5 ; time jruby -e 1)
New JDK: jdk1.8.0.jdk
real 0m1.167s
user 0m2.767s
sys 0m0.143s
0
system ~/projects/javaone2012-jit $ (pickjdk 5 ;
time jruby -J-XX:TieredStopAtLevel=1 -e 1)
New JDK: jdk1.8.0.jdk
real 0m0.850s
user 0m1.344s
sys m0.114s
0
33. C2 Compiler
• Profile to find “hot spots”
• Call sites
• Branch statistics
• Profile until 10k calls
• Inline mono/bimorphic calls
• Other mechanisms for polymorphic calls
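Inlining a bimorphic call site amounts to a type guard around each profiled target. The sketch below spells that out in plain Java for the list.add("hello") site from Blah.java; guardedAdd() is a hypothetical name, and in real compiled code the fallback branch is usually an uncommon trap that deoptimizes rather than an ordinary virtual call.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class BimorphicGuardSketch {
    // Hand-written analogue of the two type checks a JIT emits at a
    // bimorphic call site once the profile has seen exactly two classes.
    static boolean guardedAdd(List<String> list, String s) {
        Class<?> k = list.getClass();
        if (k == ArrayList.class) {
            // First profiled type: ArrayList.add's body could be inlined here.
            return ((ArrayList<String>) list).add(s);
        } else if (k == LinkedList.class) {
            // Second profiled type: LinkedList.add's body could be inlined here.
            return ((LinkedList<String>) list).add(s);
        } else {
            // Unprofiled type: Hotspot would typically trap and deoptimize;
            // this sketch just falls back to the virtual call.
            return list.add(s);
        }
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>();
        List<String> b = new LinkedList<>();
        guardedAdd(a, "hello");
        guardedAdd(b, "hello");
        System.out.println(a + " " + b); // [hello] [hello]
    }
}
```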
35. Monitoring the JIT
• Dozens of flags
• Reams of output
• Always evolving
• How can you understand it?
36. public class Accumulator {
    public static void main(String[] args) {
        int max = Integer.parseInt(args[0]);
        System.out.println(addAll(max));
    }
    static int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = add(accum, i);
        }
        return accum;
    }
    static int add(int a, int b) {
        return a + b;
    }
}
37. $ java -version
openjdk version "1.7.0-b147"
OpenJDK Runtime Environment (build 1.7.0-b147-20110927)
OpenJDK 64-Bit Server VM (build 21.0-b17, mixed mode)
$ javac Accumulator.java
$ java Accumulator 1000
499500
43. But what’s this?
$ java -XX:+PrintCompilation Accumulator 10000
53 1 java.lang.String::hashCode (67 bytes)
64 2 Accumulator::add (4 bytes)
49995000
Class loading, security logic, other stuff...
44. Hotspot is making zombies?
1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
1412 71 java.lang.String::indexOf (7 bytes)
1420 72 ! java.io.BufferedReader::readLine (304 bytes)
1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes)
1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant
1435 74 n java.lang.Object::hashCode (0 bytes)
1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie
1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie
1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant
1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant
1449 75 java.lang.String::endsWith (15 bytes)
1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes)
1665 76 java.lang.ClassLoader::checkName (43 bytes)
45. Hotspot is making zombies?
1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant
1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie
1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie
1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant
1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant
Not entrant? What the heck?
46. Optimistic Compilers
• Assume profile is accurate
• Aggressively optimize based on profile
• Bail out if we’re wrong
• ...and hope that we’re usually right
47. Deoptimization
• Bail out of running code
• Monitoring output describes the process
• “uncommon trap” - something’s changed
• “not entrant” - don’t let new calls enter
• “zombie” - on its way to deadness
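One way to provoke this is to train a call site on a single receiver type and then hand it a second one. This is an experiment to try rather than a guaranteed result: run it with -XX:+PrintCompilation and look for “made not entrant” after the mixed array shows up. Whether, and exactly when, that happens depends on the JVM version and its thresholds; the class names are illustrative.

```java
public class DeoptDemo {
    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    // Call site under test: at first the profile only ever sees Square.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] squares = new Shape[1000];
        for (int i = 0; i < squares.length; i++) {
            squares[i] = new Square(1);
        }

        double acc = 0;
        // Warm-up: monomorphic, so the JIT may inline Square.area() behind a type guard.
        for (int i = 0; i < 20_000; i++) {
            acc += total(squares);
        }

        // A new receiver type fails that guard; the compiled total() may be
        // made not entrant and later recompiled with the wider profile.
        Shape[] mixed = { new Square(1), new Circle(1) };
        for (int i = 0; i < 20_000; i++) {
            acc += total(mixed);
        }

        System.out.println(acc);
    }
}
```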
52. No JIT At All?
• Code is too big
• Code isn’t called enough
53. That looks exciting!
54. Exception handling in here (boring!)
1420 72 ! java.io.BufferedReader::readLine (304 bytes)
1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie
55. Exception Handling
• Unwind the stack until a handler catches it
• Handler gets registered in JVM
• Different treatment by JIT
• Inlined throw + catch = jump
• If no stack trace, essentially free
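Most of an exception’s cost is usually capturing the stack trace. A minimal sketch of a stackless control-flow exception, using the protected Throwable constructor (Java 7+) that takes writableStackTrace = false; Found and indexOf() are hypothetical names, and whether the throw really becomes a jump is up to the JIT.

```java
public class StacklessDemo {
    // Control-flow exception: no stack trace captured, no suppression list.
    static final class Found extends RuntimeException {
        final int index;

        Found(int index) {
            super(null, null, false, false); // message, cause, enableSuppression, writableStackTrace
            this.index = index;
        }
    }

    static int indexOf(int[] values, int target) {
        try {
            for (int i = 0; i < values.length; i++) {
                if (values[i] == target) {
                    // Throw and catch live in the same compiled unit,
                    // so the JIT may turn this into a plain jump.
                    throw new Found(i);
                }
            }
            return -1;
        } catch (Found f) {
            return f.index;
        }
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5};
        System.out.println(indexOf(data, 4)); // 2
        System.out.println(indexOf(data, 9)); // -1
        System.out.println(new Found(0).getStackTrace().length); // 0
    }
}
```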
56. What’s this “n” all about?
57. This method is native
1435 74 n java.lang.Object::hashCode (0 bytes)
58. And this one?
1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes)
Method has been replaced while running (OSR)
59. On-Stack Replacement
• Running method never exits?
• But it’s getting really hot?
• Generally means loops, back-branching
• Compile and replace while running
• Not typically useful in large systems
• Looks great on benchmarks!
60. public class Accumulator {
    public static void main(String[] args) {
        int max = Integer.parseInt(args[0]);
        System.out.println(addAll(max));
    }
    // addAll never exits...
    static int addAll(int max) {
        int accum = 0;
        // ...it loops until the end
        for (int i = 0; i < max; i++) {
            accum = add(accum, i);
        }
        return accum;
    }
    static int add(int a, int b) {
        return a + b;
    }
}
73. public class Accumulator {
    public static void main(String[] args) {
        int max = Integer.parseInt(args[0]);
        System.out.println(addAll(max));
    }
    static int addAll(int max) {    // called only once
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = add(accum, i);  // called 10k times
        }
        return accum;
    }
    static int add(int a, int b) {  // JITs as expected...
        return a + b;               // ...but makes no calls!
    }
}
74. static double addAllSqrts(int max) {
    double accum = 0;
    for (int i = 0; i < max; i++) {
        accum = addSqrt(accum, i);
    }
    return accum;
}
static double addSqrt(double a, int b) {
    return a + sqrt(b);
}
static double sqrt(int a) {
    return Math.sqrt(a); // Math.sqrt is typically a JIT intrinsic
}
78. Intrinsic?
• Known to the JIT
• Don’t inline bytecode
• Do insert “best” native code
• e.g. kernel-level memory operation
• e.g. optimized sqrt in machine code
79. Common Intrinsics
• String#equals
• Most (all?) Math methods
• System.arraycopy
• Object#hashCode
• Object#getClass
• sun.misc.Unsafe methods
83. Worst XML Evar
• Relational structure in hierarchical form
• Hotspot guys can read it...I cannot
• <JDK>/hotspot/src/share/tools/LogCompilation
• or http://github.com/headius/logc
88. Hotspot sees it’s 100% String
10 java.util.LinkedList::indexOf (73 bytes)
@ 52 java.lang.Object::equals (11 bytes)
type profile java/lang/Object -> java/lang/String (100%)
@ 52 java.lang.String::equals (88 bytes)
11 java.lang.String::indexOf (87 bytes)
@ 83 java.lang.String::indexOfSupplementary too big
Too big to inline! Could be bad?
89. Tuning Inlining
• -XX:MaxInlineSize=35
• Largest inlinable method (bytecode size)
• -XX:InlineSmallCode=#
• Largest already-compiled method to inline (native code size)
• -XX:FreqInlineSize=#
• Largest frequently-called (hot) method to inline (bytecode size)
97. Wednesday, July 27, 2011
~/oscon ! java -XX:+UnlockDiagnosticVMOptions
> -XX:+PrintAssembly
> Accumulator 10000
OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled;
turning on DebugNonSafepoints to gain additional output
Loaded disassembler from hsdis-amd64.dylib
...
98. Decoding compiled method 11343cbd0:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} 'add' '(II)I' in 'Accumulator'
# parm0: rsi = int
# parm1: rdx = int
# [sp+0x20] (sp of caller)
11343cd00: push %rbp
11343cd01: sub $0x10,%rsp
11343cd05: nop ;*synchronization entry
; - Accumulator::add@-1 (line 16)
11343cd06: mov %esi,%eax
11343cd08: add %edx,%eax ;*iadd
; - Accumulator::add@2 (line 16)
11343cd0a: add $0x10,%rsp
11343cd0e: pop %rbp
11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000
; {poll_return}
11343cd15: retq
100. x86_64 Assembly 101
add                     Two’s complement add
sub                     ...subtract
mov*                    Move data from a to b
jmp                     goto
je, jne, jl, jge, ...   Jump if ==, !=, <, >=, ...
push, pop               Call stack operations
call*, ret*             Call, return from subroutine
eax, ebx, esi, ...      32-bit registers
rax, rbx, rsi, ...      64-bit registers
101. Register Machine
• Instead of pushing and popping a stack, we have “slots” (registers)
• Move data into slots
• Trigger operations that manipulate data
• Get new data out of slots
• JVM stack, locals end up as register ops
102. Native Stack?
• Native code has a stack too
• Preserves registers from call to call
• Various calling conventions
• Caller preserves registers?
• Callee preserves registers?
103. Decoding compiled method 11343cbd0: <= address of new compiled code
Code:
[Disassembling for mach='i386:x86-64'] <= architecture
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} 'add' '(II)I' in 'Accumulator' <= method, signature, class
# parm0: rsi = int <= first parm to method goes in rsi
# parm1: rdx = int <= second parm goes in rdx
# [sp+0x20] (sp of caller) <= caller’s pointer into native stack
104. 11343cd00: push %rbp
rbp points at current stack frame, so we save it off.
105. 11343cd01: sub $0x10,%rsp
Reserve 0x10 bytes of stack frame (the stack grows downward); the two args themselves arrive in registers (rsi, rdx).
106. 11343cd05: nop ;*synchronization entry
Do nothing, e.g. to memory-align code.
107. 11343cd05: nop ;*synchronization entry
; - Accumulator::add@-1 (line 16)
At the “-1” instruction of our add() method...
i.e. here we go!
115. Things to Watch For
• CALL operations
• Indicates something failed to inline
• LOCK operations
• Cache-busting, e.g. volatility
116. CALL
1134858f5: xchg %ax,%ax
1134858f7: callq 113414aa0 ; OopMap{off=316}
;*invokespecial addAsBignum
; - org.jruby.RubyFixnum::addFixnum@29 (line 348)
; {optimized virtual_call}
1134858fc: jmpq 11348586d
Ruby integer adds might overflow into Bignum, leading to an
addAsBignum call. Here it has never been called, so Hotspot doesn’t
inline it and just emits a callq, expecting we won’t hit it.
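In Java terms that code is a fast path with a rarely taken overflow branch. A hedged sketch: addFixnum() and addAsBignum() below are simplified stand-ins named after the methods in the listing, not JRuby’s actual implementation. The sign test is the same overflow check that Math.addExact uses.

```java
import java.math.BigInteger;

public class OverflowSketch {
    // Fast path: a plain long add. Slow path only when the add overflows.
    static Object addFixnum(long a, long b) {
        long result = a + b;
        // Overflow iff both operands share a sign and the result's sign differs.
        if (((a ^ result) & (b ^ result)) < 0) {
            return addAsBignum(a, b); // rarely (or never) taken: a candidate for an out-of-line call
        }
        return result; // autoboxed to Long
    }

    static BigInteger addAsBignum(long a, long b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(addFixnum(2, 3));              // 5
        System.out.println(addFixnum(Long.MAX_VALUE, 1)); // 9223372036854775808
    }
}
```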
118. LOCK
public class RubyBasicObject ... {
private static final boolean DEBUG = false;
private static final Object[] NULL_OBJECT_ARRAY = new Object[0];
// The class of this object
protected transient RubyClass metaClass;
// zeroed by jvm
protected int flags;
// variable table, lazily allocated as needed (if needed)
private volatile Object[] varTable = NULL_OBJECT_ARRAY;
Maybe it’s not such a good idea to pre-init a volatile?
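The fix in the commit that follows leaves the volatile field at its default (null) and allocates on first use, so creating an object no longer performs a volatile store (on x86 that typically means a lock-prefixed instruction). A minimal sketch of that pattern with hypothetical class names; it is not JRuby’s code.

```java
public class LazyVolatileSketch {
    private static final Object[] NULL_OBJECT_ARRAY = new Object[0];

    // Eager: this field initializer is a volatile store in every constructor call.
    static class Eager {
        private volatile Object[] varTable = NULL_OBJECT_ARRAY;

        Object[] table() {
            return varTable;
        }
    }

    // Lazy: the field stays at its default (null), so constructing
    // an instance performs no volatile store at all.
    static class Lazy {
        private volatile Object[] varTable;

        Object[] table() {
            Object[] t = varTable; // one volatile read
            return t == null ? NULL_OBJECT_ARRAY : t;
        }

        // Allocate (or grow) the table on first write. synchronized keeps
        // two racing writers from each installing a different array.
        synchronized void set(int index, Object value) {
            Object[] t = varTable;
            if (t == null || t.length <= index) {
                Object[] grown = new Object[Math.max(index + 1, 4)];
                if (t != null) {
                    System.arraycopy(t, 0, grown, 0, t.length);
                }
                t = grown;
            }
            t[index] = value;
            varTable = t; // publish with a volatile store
        }
    }

    public static void main(String[] args) {
        Lazy obj = new Lazy();
        System.out.println(obj.table().length); // 0
        obj.set(0, "ivar");
        System.out.println(obj.table()[0]);     // ivar
    }
}
```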
119. LOCK
~/projects/jruby ! git log 2f935de1e40bfd8b29b3a74eaed699e519571046 -1 | cat
commit 2f935de1e40bfd8b29b3a74eaed699e519571046
Author: Charles Oliver Nutter <headius@headius.com>
Date: Tue Jun 14 02:59:41 2011 -0500
Do not eagerly initialize volatile varTable field in RubyBasicObject;
speeds object creation significantly.
LEVEL UP!
120. What Have We Learned?
• How Hotspot’s JIT works
• How to monitor the JIT
• How to find problems
• How to fix problems we find
121. What We Missed
• Tuning GC settings in JVM
• Monitoring GC with VisualVM
• Google ‘visualgc’...it’s awesome