8. Profiling
• Gather data about code while interpreting
• Invariants (types, constants, nulls)
• Statistics (branches, calls)
• Use that information to optimize
• Educated guess?
11. Inlining?
• Combine caller and callee into one unit
• e.g. based on profile
• Perhaps with a guard/test
• Optimize as a whole
• More code means better visibility
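To make "perhaps with a guard/test" concrete, here is a hand-written Java analogy of a guarded inline for a virtual call. It is a sketch, not HotSpot output: the `Adder` interface and `PlainAdder` class are hypothetical, and the real JIT performs this on compiled code (its guard compares the receiver's class pointer) rather than on your source.

```java
/** Hand-written analogy of a JIT's guarded inlining; not HotSpot output. */
public class GuardedInline {
    interface Adder { int add(int a, int b); }

    // Hypothetical implementation: the only receiver type the profile has seen.
    static final class PlainAdder implements Adder {
        public int add(int a, int b) { return a + b; }
    }

    // As written: a virtual call on every iteration.
    static int addAll(Adder adder, int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = adder.add(accum, i);
        }
        return accum;
    }

    // Roughly what a JIT may produce after seeing only PlainAdder:
    // a cheap class check (the guard), the callee's body inlined on the
    // fast path, and the original virtual call kept as a fallback.
    static int addAllGuarded(Adder adder, int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            if (adder.getClass() == PlainAdder.class) {
                accum = accum + i;            // inlined body of PlainAdder.add
            } else {
                accum = adder.add(accum, i);  // guard failed: make the real call
            }
        }
        return accum;
    }

    public static void main(String[] args) {
        Adder adder = new PlainAdder();
        System.out.println(addAll(adder, 1000));        // 499500
        System.out.println(addAllGuarded(adder, 1000)); // 499500
    }
}
```

Both methods compute the same result; the guarded one only avoids the call when the guess about the receiver type holds.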
12. Inlining
int addAll(int max) {
int accum = 0;
for (int i = 0; i < max; i++) {
accum = add(accum, i);
}
return accum;
}
int add(int a, int b) {
return a + b;
}
13. Inlining
int addAll(int max) {
int accum = 0;
for (int i = 0; i < max; i++) {
accum = add(accum, i); // Only one target is ever seen
}
return accum;
}
int add(int a, int b) {
return a + b;
}
14. Inlining
int addAll(int max) {
int accum = 0;
for (int i = 0; i < max; i++) {
accum = accum + i; // Don’t bother making the call
}
return accum;
}
15. Loop Unrolling
• Works for small, constant loops
• Avoids loop tests and branching
• Allows a single call in the body to be inlined as many copies
16. Loop Unrolling
private static final String[] options =
{ "yes", "no", "maybe"};
public void looper() {
for (String option : options) {
process(option);
}
}
Small loop, constant stride,
constant size
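Because `options` always has exactly three elements, fully unrolling `looper()` leaves three calls in a row with no loop test or branch. The hand-written sketch below shows that shape under stated assumptions: the class name `Unrolled`, the `seen` list, and the `process()` body are hypothetical, and the JIT does this on compiled code (after which each of the three call sites can be inlined on its own) rather than by rewriting the source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written illustration of full loop unrolling; not JIT output. */
public class Unrolled {
    private static final String[] options = { "yes", "no", "maybe" };

    // Hypothetical process(): just records what it was given.
    private final List<String> seen = new ArrayList<>();

    private void process(String option) {
        seen.add(option);
    }

    // Original form: loop test, index increment, and branch on each iteration.
    public List<String> looper() {
        seen.clear();
        for (String option : options) {
            process(option);
        }
        return new ArrayList<>(seen);
    }

    // Fully unrolled form: straight-line code, three separately inlinable calls.
    public List<String> looperUnrolled() {
        seen.clear();
        process(options[0]);
        process(options[1]);
        process(options[2]);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        Unrolled u = new Unrolled();
        System.out.println(u.looper());         // [yes, no, maybe]
        System.out.println(u.looperUnrolled()); // [yes, no, maybe]
    }
}
```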
18. Lock Coarsening
public void needsLocks() {
for (String option : options) {
process(option); // Repeatedly locking
}
}
private synchronized String process(String option) {
// some wacky thread-unsafe code
return option; // placeholder result
}
19. Lock Coarsening
public void needsLocks() {
synchronized (this) { // Lock once
for (String option : options) {
// some wacky thread-unsafe code
}
}
}
20. Lock Eliding
public void overCautious() {
List<String> l = new ArrayList<>();
synchronized (l) { // Synchronize on new Object
for (String option : options) {
l.add(process(option));
}
}
}
But we know it never escapes this thread...
21. Lock Eliding
public void overCautious() {
List<String> l = new ArrayList<>();
for (String option : options) {
l.add(/* process()’s code */);
}
}
No need to lock
22. Escape Analysis
private static class Foo {
public String a;
public String b;
Foo(String a, String b) {
this.a = a;
this.b = b;
}
}
23. Escape Analysis
public void bar() {
Foo f = new Foo("Hello", "Øredev");
baz(f);
}
public void baz(Foo f) { // Same object all the way through
System.out.print(f.a);
System.out.print(", ");
quux(f);
}
public void quux(Foo f) { // Never “escapes” these methods
System.out.print(f.b);
System.out.println('!');
}
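Once `baz()` and `quux()` are inlined into `bar()`, the `Foo` never leaves `bar()`, so escape analysis can let the JIT skip the allocation and keep the two fields in locals or registers (scalar replacement). The hand-written sketch below shows the effect only; `EscapeDemo` and `barScalarReplaced()` are hypothetical names, and it appends to a `StringBuilder` instead of printing so the two versions can be compared.

```java
/** Hand-written sketch of what scalar replacement achieves; not JIT output. */
public class EscapeDemo {
    private static class Foo {
        public String a;
        public String b;
        Foo(String a, String b) {
            this.a = a;
            this.b = b;
        }
    }

    // As written: allocates a Foo and passes it down two calls.
    static String bar() {
        Foo f = new Foo("Hello", "\u00D8redev");
        StringBuilder out = new StringBuilder();
        baz(f, out);
        return out.toString();
    }

    static void baz(Foo f, StringBuilder out) {
        out.append(f.a).append(", ");
        quux(f, out);
    }

    static void quux(Foo f, StringBuilder out) {
        out.append(f.b).append('!');
    }

    // After inlining, f never escapes, so its fields become plain locals
    // and no Foo is allocated at all.
    static String barScalarReplaced() {
        String a = "Hello";        // was f.a
        String b = "\u00D8redev";  // was f.b
        StringBuilder out = new StringBuilder();
        out.append(a).append(", ");
        out.append(b).append('!');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(bar().equals(barScalarReplaced())); // true
    }
}
```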
25. Perf Sinks
• Memory accesses
• By far the biggest expense
• Calls
• Memory ref + branch kills pipeline
• Call stack, register juggling costs
• Locks and volatile writes
26. Volatile?
• Each CPU maintains a memory cache
• Caches may be out of sync
• If it doesn’t matter, no problem
• If it does matter, threads disagree!
• Volatile forces a memory barrier on writes (a LOCK’d instruction on x86)
• So writes become visible across cores, in order
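A typical case where it does matter: one thread spins on a flag that another thread clears. In this sketch (the `StopFlag` class and its methods are hypothetical), `running` is `volatile`, so the worker is guaranteed to observe the main thread's write. Without `volatile`, the JIT would be allowed to hoist the read out of the loop, and the worker could spin forever.

```java
/** Illustrative stop flag: volatile makes the main thread's write visible to the worker. */
public class StopFlag {
    // volatile: a write by one thread is visible to subsequent reads in other threads.
    private volatile boolean running = true;

    /** Starts a spinning worker, clears the flag, and reports whether the worker exited. */
    public static boolean runOnce() {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(() -> {
            long spins = 0;
            while (flag.running) {   // re-reads the volatile field on every iteration
                spins++;
            }
        });
        worker.setDaemon(true);      // never keep the JVM alive if something goes wrong
        worker.start();

        sleepQuietly(50);            // let the worker start spinning
        flag.running = false;        // volatile write: the worker must observe it
        try {
            worker.join(5_000);      // wait at most 5 seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```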
27. Call Site
• The place where you make a call
• Monomorphic (“one shape”)
• Single target class
• Bimorphic (“two shapes”)
• Polymorphic (“many shapes”)
• Megamorphic (“you’re screwed”)
28. Blah.java
System.currentTimeMillis(); // static, monomorphic
List list1 = new ArrayList(); // constructor, monomorphic
List list2 = new LinkedList();
for (List list : new List[]{ list1, list2 }) {
list.add("hello"); // bimorphic
}
for (Object obj : new Object[]{ "foo", list1, new Object() }) {
obj.toString(); // polymorphic
}
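To tie the call-site vocabulary to something countable, here is a toy receiver-type profile in the spirit of what the interpreter records per call site. It is a simplified model, not HotSpot's data structure: `CallSiteProfile` is hypothetical, and treating three or four receiver types as "polymorphic" and more as "megamorphic" is an arbitrary cutoff chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.Map;

/** Toy model of a per-call-site receiver type profile; not HotSpot's implementation. */
public class CallSiteProfile {
    // Receiver class -> number of times it was seen at this call site.
    private final Map<Class<?>, Integer> counts = new LinkedHashMap<>();

    /** Called each time the call site executes with the given receiver. */
    public void record(Object receiver) {
        counts.merge(receiver.getClass(), 1, Integer::sum);
    }

    public int count(Class<?> type) {
        return counts.getOrDefault(type, 0);
    }

    /** Classifies the site by how many distinct receiver classes were seen. */
    public String shape() {
        int n = counts.size();
        if (n == 0) return "unreached";
        if (n == 1) return "monomorphic";
        if (n == 2) return "bimorphic";
        if (n <= 4) return "polymorphic";  // arbitrary cutoff for this toy
        return "megamorphic";
    }

    public static void main(String[] args) {
        // Like list.add("hello") in Blah.java: two receiver classes.
        CallSiteProfile addSite = new CallSiteProfile();
        for (Object list : new Object[] { new ArrayList<String>(), new LinkedList<String>() }) {
            addSite.record(list);
        }
        System.out.println(addSite.shape());      // bimorphic

        // Like obj.toString() in Blah.java: three receiver classes.
        CallSiteProfile toStringSite = new CallSiteProfile();
        for (Object obj : new Object[] { "foo", new ArrayList<String>(), new Object() }) {
            toStringSite.record(obj);
        }
        System.out.println(toStringSite.shape()); // polymorphic
    }
}
```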
29. Hotspot
• -client mode (C1) inlines less aggressively
• Fewer opportunities to optimize
• -server mode (C2) inlines aggressively
• Based on richer runtime profiling
• We’ll focus on this
• Tiered mode combines them
• -XX:+TieredCompilation
30. C2 “server” Inlining
• Profile to find “hot spots”
• Call sites
• Branch statistics
• Profile until 10k calls
• Inline mono/bimorphic calls
• Other mechanisms for polymorphic calls
31. Tuning Inlining
• -XX:MaxInlineSize=35
• Largest inlinable method (bytecode size)
• -XX:InlineSmallCode=#
• Largest already-compiled method to inline (native code size)
• -XX:FreqInlineSize=#
• Largest frequently-called method...
34. Monitoring the JIT
• Dozens of flags
• Reams of output
• Always evolving
• How can you understand it?
35. public class Accumulator {
public static void main(String[] args) {
int max = Integer.parseInt(args[0]);
System.out.println(addAll(max));
}
static int addAll(int max) {
int accum = 0;
for (int i = 0; i < max; i++) {
accum = add(accum, i);
}
return accum;
}
static int add(int a, int b) {
return a + b;
}
}
36. $ java -version
openjdk version "1.7.0-b147"
OpenJDK Runtime Environment (build 1.7.0-b147-20110927)
OpenJDK 64-Bit Server VM (build 21.0-b17, mixed mode)
$ javac Accumulator.java
$ java Accumulator 1000
499500
42. But what’s this?
$ java -XX:+PrintCompilation Accumulator 10000
53 1 java.lang.String::hashCode (67 bytes)
64 2 Accumulator::add (4 bytes)
49995000
Class loading, security logic, other stuff...
43. Dear god...there’s zombies in my code?!?
1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
1412 71 java.lang.String::indexOf (7 bytes)
1420 72 ! java.io.BufferedReader::readLine (304 bytes)
1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes)
1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant
1435 74 n java.lang.Object::hashCode (0 bytes)
1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie
1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie
1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant
1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant
1449 75 java.lang.String::endsWith (15 bytes)
1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes)
1665 76 java.lang.ClassLoader::checkName (43 bytes)
44. Dear god...there’s zombies in my code?!?
Not entrant? What the heck?
45. Optimistic Compilers
• Assume profile is accurate
• Aggressively optimize based on profile
• Bail out if we’re wrong
• ...and hope that we’re usually right
46. Deoptimization
• Bail out of running code
• Monitoring flags describe process
• “uncommon trap” - we were wrong
• “not entrant” - don’t let new calls enter
• “zombie” - on its way to deadness
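The optimistic cycle of guess, guard, and bail out can be modeled in a few lines. In this toy (all names are hypothetical), a call site specializes for the single class its profile saw; the first time a different class shows up it takes an "uncommon trap", marks the specialized path not entrant, and falls back to generic code from then on. Real HotSpot deoptimizes the running frame back into the interpreter and may recompile later, which this sketch does not attempt.

```java
import java.util.function.ToIntFunction;

/** Toy model of optimistic specialization and deoptimization; not how HotSpot is built. */
public class OptimisticSite {
    private final Class<?> assumedClass;       // the only receiver class the "profile" saw
    private final ToIntFunction<Object> fast;  // code specialized for assumedClass
    private final ToIntFunction<Object> slow;  // generic code that handles anything
    private boolean notEntrant = false;        // true once the assumption has failed
    private int uncommonTraps = 0;

    public OptimisticSite(Class<?> assumedClass,
                          ToIntFunction<Object> fast,
                          ToIntFunction<Object> slow) {
        this.assumedClass = assumedClass;
        this.fast = fast;
        this.slow = slow;
    }

    public int call(Object receiver) {
        if (!notEntrant) {
            if (receiver.getClass() == assumedClass) {
                return fast.applyAsInt(receiver);  // guard passed: optimized path
            }
            uncommonTraps++;    // "uncommon trap": the profile was wrong
            notEntrant = true;  // "not entrant": later calls skip the fast path
        }
        return slow.applyAsInt(receiver);          // generic path
    }

    public boolean isNotEntrant() { return notEntrant; }
    public int uncommonTraps() { return uncommonTraps; }

    public static void main(String[] args) {
        OptimisticSite length = new OptimisticSite(
                String.class,
                o -> ((String) o).length(),   // "inlined" String.length()
                o -> o.toString().length());  // generic: works for any object via toString()
        System.out.println(length.call("hello"));                  // 5
        System.out.println(length.isNotEntrant());                 // false
        System.out.println(length.call(new StringBuilder("abc"))); // 3
        System.out.println(length.isNotEntrant());                 // true
    }
}
```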
47. No JIT At All?
• Code is too big
• Code isn’t called enough
48. That looks exciting! (the “!” flag)
49. Exception handling in here (boring!)
50. What’s this “n” all about?
51. This method is native...maybe “intrinsic”
We’ll come back to that...
52. And this one? (the “%” flag)
On-stack replacement (OSR): the method was compiled while running and swapped in mid-loop, here entering at bytecode index 39
53. First column: millis from JVM start
Second column: sequence number of compilation
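Putting the columns together, a line of this JDK 7 output reads: millis since JVM start, compile sequence number, optional flags (`%` for an OSR compile, `!` for a method with exception handlers, `n` for native, `s` for synchronized, `b` for a blocking compile), the method, an `@ bci` for OSR compiles, the bytecode size, and an optional event such as "made not entrant". The hypothetical parser below sketches that layout. The format differs between JVM versions (later JDKs add a compilation-tier column, for example), so treat it as a reading aid rather than a robust tool.

```java
/** Hypothetical reading aid for JDK 7-style -XX:+PrintCompilation lines. */
public class CompilationLine {
    final long millis;       // first column: millis since JVM start
    final int compileId;     // second column: sequence number of compilation
    final String flags;      // e.g. "%" OSR, "!" exception handlers, "n" native
    final String method;     // e.g. "java.lang.String::hashCode"
    final int osrBci;        // bytecode index of the OSR entry ("@ 39"), else -1
    final int bytecodeSize;  // from "(N bytes)"
    final String event;      // e.g. "made not entrant", "made zombie", or ""

    private CompilationLine(long millis, int compileId, String flags, String method,
                            int osrBci, int bytecodeSize, String event) {
        this.millis = millis;
        this.compileId = compileId;
        this.flags = flags;
        this.method = method;
        this.osrBci = osrBci;
        this.bytecodeSize = bytecodeSize;
        this.event = event;
    }

    static CompilationLine parse(String line) {
        String[] tok = line.trim().split("\\s+");
        int i = 0;
        long millis = Long.parseLong(tok[i++]);
        int compileId = Integer.parseInt(tok[i++]);

        // Zero or more tokens made only of flag characters.
        StringBuilder flags = new StringBuilder();
        while (tok[i].matches("[%sbn!]+")) {
            flags.append(tok[i++]);
        }

        String method = tok[i++];

        int osrBci = -1;
        if (tok[i].equals("@")) {
            osrBci = Integer.parseInt(tok[i + 1]);
            i += 2;
        }

        // Size appears as two tokens: "(74" and "bytes)".
        int bytecodeSize = Integer.parseInt(tok[i].substring(1));
        i += 2;

        // Whatever remains is an event such as "made not entrant".
        StringBuilder event = new StringBuilder();
        for (; i < tok.length; i++) {
            if (event.length() > 0) event.append(' ');
            event.append(tok[i]);
        }
        return new CompilationLine(millis, compileId, flags.toString(), method,
                                   osrBci, bytecodeSize, event.toString());
    }

    public static void main(String[] args) {
        CompilationLine osr = parse("1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes)");
        System.out.println(osr.flags + " " + osr.osrBci); // % 39
        CompilationLine z = parse("1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie");
        System.out.println(z.event);                       // made zombie
    }
}
```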
60. public class Accumulator {
public static void main(String[] args) {
int max = Integer.parseInt(args[0]);
System.out.println(addAll(max));
}
static int addAll(int max) { // Called only once
int accum = 0;
for (int i = 0; i < max; i++) {
accum = add(accum, i); // Called 10k times
}
return accum;
}
static int add(int a, int b) { // JITs as expected...
return a + b; // ...but makes no calls!
}
}
61. static double addAllSqrts(int max) {
double accum = 0;
for (int i = 0; i < max; i++) {
accum = addSqrt(accum, i);
}
return accum;
}
static double addSqrt(double a, int b) {
return a + sqrt(b);
}
static double sqrt(int a) {
return Math.sqrt(a);
}
69. Hotspot sees it’s 100% String
10 java.util.LinkedList::indexOf (73 bytes)
@ 52 java.lang.Object::equals (11 bytes)
type profile java/lang/Object -> java/lang/String (100%)
@ 52 java.lang.String::equals (88 bytes)
11 java.lang.String::indexOf (87 bytes)
@ 83 java.lang.String::indexOfSupplementary too big
Too big to inline! Could be bad?
70. Intrinsic?
• Known to the JIT
• Don’t inline bytecode
• Do insert “best” native code
• e.g. hand-tuned memory operations (System.arraycopy)
• e.g. optimized sqrt in machine code
80. Wednesday, July 27, 2011
~/oscon ! java -XX:+UnlockDiagnosticVMOptions
> -XX:+PrintAssembly
> Accumulator 10000
OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled;
turning on DebugNonSafepoints to gain additional output
Loaded disassembler from hsdis-amd64.dylib
...
81. Decoding compiled method 11343cbd0:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} 'add' '(II)I' in 'Accumulator'
# parm0: rsi = int
# parm1: rdx = int
# [sp+0x20] (sp of caller)
11343cd00: push %rbp
11343cd01: sub $0x10,%rsp
11343cd05: nop ;*synchronization entry
; - Accumulator::add@-1 (line 16)
11343cd06: mov %esi,%eax
11343cd08: add %edx,%eax ;*iadd
; - Accumulator::add@2 (line 16)
11343cd0a: add $0x10,%rsp
11343cd0e: pop %rbp
11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000
; {poll_return}
11343cd15: retq
83. x86_64 Assembly 101
add Two’s complement add
sub ...subtract
mov* Move data from a to b
jmp goto
je, jne, jl, jge, ... Jump if ==, !=, <, >=, ...
push, pop Call stack operations
call*, ret* Call, return from subroutine
eax, ebx, esi, ... 32-bit registers
rax, rbx, rsi, ... 64-bit registers
84. Register Machine
• Instead of stack moves, we have “slots”
• Move data into slots
• Trigger operations that manipulate data
• Get new data out of slots
• JVM stack, locals end up as register ops
85. Stack?
• Native code has a stack too
• Maintains registers from call to call
• Various calling conventions
• Caller saves registers?
• Callee saves registers?
86. Decoding compiled method 11343cbd0: <= address of new compiled code
Code:
[Disassembling for mach='i386:x86-64'] <= architecture
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} 'add' '(II)I' in 'Accumulator' <= method, signature, class
# parm0: rsi = int <= first parm to method goes in rsi
# parm1: rdx = int <= second parm goes in rdx
# [sp+0x20] (sp of caller) <= caller’s pointer into native stack
87. 11343cd00: push %rbp
11343cd01: sub $0x10,%rsp
11343cd05: nop ;*synchronization entry
; - Accumulator::add@-1 (line 16)
11343cd06: mov %esi,%eax
11343cd08: add %edx,%eax ;*iadd
; - Accumulator::add@2 (line 16)
11343cd0a: add $0x10,%rsp
11343cd0e: pop %rbp
11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000
; {poll_return}
11343cd15: retq
push %rbp: rbp points at the caller’s stack frame, so we save it off.
88. sub $0x10,%rsp: reserve 0x10 more bytes of frame. The two args arrive in registers (rsi, rdx), not on the stack; with the return address and saved rbp, the frame totals 0x20 bytes, matching “[sp+0x20] (sp of caller)”.
89. nop: do nothing, e.g. to memory-align code.
90. Accumulator::add@-1: at the “-1” instruction of our add() method...
i.e. here we go!
98. Things to Watch For
• CALL operations
• Indicates something failed to inline
• LOCK operations
• Cache-busting, e.g. volatility
99. CALL
1134858f5: xchg %ax,%ax
1134858f7: callq 113414aa0 ; OopMap{off=316}
;*invokespecial addAsBignum
; - org.jruby.RubyFixnum::addFixnum@29 (line 348)
; {optimized virtual_call}
1134858fc: jmpq 11348586d
Ruby integer adds might overflow into Bignum, leading to an addAsBignum call. Here that path has never been taken, so Hotspot doesn’t inline it and just emits a callq for the rare case.
101. LOCK
public class RubyBasicObject ... {
private static final boolean DEBUG = false;
private static final Object[] NULL_OBJECT_ARRAY = new Object[0];
// The class of this object
protected transient RubyClass metaClass;
// zeroed by jvm
protected int flags;
// variable table, lazily allocated as needed (if needed)
private volatile Object[] varTable = NULL_OBJECT_ARRAY;
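Maybe it’s not such a good idea to pre-init a volatile? Every new object pays for a volatile write (a LOCK operation on x86) just to store the shared empty array.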
102. LOCK
~/projects/jruby ! git log 2f935de1e40bfd8b29b3a74eaed699e519571046 -1 | cat
commit 2f935de1e40bfd8b29b3a74eaed699e519571046
Author: Charles Oliver Nutter <headius@headius.com>
Date: Tue Jun 14 02:59:41 2011 -0500
Do not eagerly initialize volatile varTable field in RubyBasicObject;
speeds object creation significantly.
LEVEL UP!
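A minimal sketch of that kind of fix, assuming the table only needs to exist once something is stored: leave the volatile field at its JVM-zeroed default (`null`) so construction performs no volatile write, and allocate on first use. `LazyTable`, `get`, `put`, and `isAllocated` are hypothetical names, not JRuby's actual code, and the synchronized copy-on-write `put` is just one simple way to make the lazy allocation safe.

```java
import java.util.Arrays;

/** Hypothetical sketch: lazily allocating a volatile array instead of pre-initializing it. */
public class LazyTable {
    private static final Object[] EMPTY = new Object[0];

    // Left at its default (null): constructing a LazyTable performs no volatile write.
    private volatile Object[] varTable;

    /** Reads a slot; a missing table simply means "nothing stored yet". */
    public Object get(int index) {
        Object[] table = varTable;  // a single volatile read
        return (table != null && index < table.length) ? table[index] : null;
    }

    /** Stores a value, allocating or growing the table on demand. */
    public synchronized void put(int index, Object value) {
        Object[] old = varTable;
        int len = (old == null) ? 0 : old.length;
        // Copy-on-write: readers never see a half-updated array.
        Object[] table = Arrays.copyOf(old == null ? EMPTY : old, Math.max(len, index + 1));
        table[index] = value;
        varTable = table;  // publish with one volatile write, only when actually needed
    }

    public boolean isAllocated() {
        return varTable != null;
    }
}
```

Objects that never store anything never pay for the volatile write at all, which is the point of the commit above.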
103. invokedynamic?
• Largely, it works the same
• MethodHandles optimize to x86 asm
• Inlining as normal
• Performance nearly the same as static!
• And that’s exactly the point!
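For a feel of the API underneath, here is a minimal `java.lang.invoke` example that looks up a static method as a `MethodHandle` and calls it in the same loop as Accumulator. It shows only the Java-level API (the `HandleDemo` class is illustrative); `invokedynamic` call sites bind handles like this one, and when the target is constant, such as a `static final` handle, Hotspot can inline through it much like a direct call.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class HandleDemo {
    static int add(int a, int b) {
        return a + b;
    }

    // A constant (static final) handle to add(int, int).
    static final MethodHandle ADD;
    static {
        try {
            ADD = MethodHandles.lookup().findStatic(
                    HandleDemo.class, "add",
                    MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int addAll(int max) throws Throwable {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = (int) ADD.invokeExact(accum, i);  // exact signature (II)I
        }
        return accum;
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(addAll(1000)); // 499500
    }
}
```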
104. What Have We Learned?
• How Hotspot’s JIT works
• How to monitor the JIT
• How to find problems
• How to fix problems we find
105. What We Missed
• Tuning GC settings in JVM
• Monitoring GC with VisualVM
• Google ‘visualgc’...it’s awesome