JAVA JIT
Compilation and optimization
AGENDA
o What is JIT
o Types - Client, Server, Tiered
o Main optimization approaches
o JIT tuning
o Conclusions
WHAT IS JIT
o Just-in-time compiler
o Compilation done during the execution of a program (at run time) rather than before execution
o First presented in 1960, in LISP
o Java, .NET, JS…
o Oracle HotSpot, IBM J9, Azul…
WHAT IS JIT
o JIT separates optimization from software development (SD): updating the JVM can speed up your code without changing it, and the JIT tunes the code for your platform
o JIT'ing requires profiling
• Because you don't want to JIT everything
o Profiling allows better code generation
• Inline what’s hot
• Loop unrolling, range-check elimination, etc.
• Branch prediction, spill-code generation, scheduling
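A toy model of why profiling comes first: the VM counts invocations per method and hands a method to the JIT only once its counter crosses a threshold, so cold code is never compiled. This is an illustrative sketch, not HotSpot's code; the class `HotnessCounter`, its methods, and the threshold passed in are hypothetical (the slides below quote about 1.5K invocations for C1 and 10K for C2).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of counter-based hotness detection.
// HotSpot also counts loop back-edges, which this sketch ignores.
class HotnessCounter {
    private final int threshold;                          // invocations before compiling
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> compiled = new HashSet<>();

    HotnessCounter(int threshold) { this.threshold = threshold; }

    /** Records one call; returns true exactly once, when the method becomes hot. */
    boolean recordCall(String method) {
        if (compiled.contains(method)) return false;      // already JIT'ed
        int n = counts.merge(method, 1, Integer::sum);
        if (n >= threshold) {
            compiled.add(method);                         // hand off to the JIT
            return true;
        }
        return false;
    }

    boolean isCompiled(String method) { return compiled.contains(method); }

    public static void main(String[] args) {
        HotnessCounter profiler = new HotnessCounter(1_500);
        for (int i = 0; i < 10_000; i++) profiler.recordCall("hotLoopBody");
        profiler.recordCall("rarelyUsed");
        System.out.println(profiler.isCompiled("hotLoopBody")); // true
        System.out.println(profiler.isCompiled("rarelyUsed"));  // false
    }
}
```

Only methods that cross the threshold pay the compilation cost; everything else stays in the interpreter.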
HOTSPOT JIT CLIENT (C1) WORKFLOW
[Diagram: Java source → bytecode compiler → bytecode; at run time, after ~1.5K invocations, the JIT compiler turns the bytecode into optimized code]
JIT CLIENT (C1)
o Produces compilations quickly
o Generated code runs relatively slowly
HOTSPOT JIT SERVER (C2) WORKFLOW
[Diagram: Java source → bytecode compiler → bytecode; at run time, a profiler gathers HotSpot info, and after ~10K invocations the JIT compiler (optimization) produces optimized native code, which the JIT compiler (deoptimization) can revert]
HOTSPOT JIT SERVER (C2)
o Produces compilations slowly (long warm-up)
o Generated code runs fast
o Profile-guided
o Speculative
HOTSPOT JIT TIERED (C1 + C2)
o Available from Java 7
o Default in Java 8
o Best of the C1 and C2 approaches
o Level 0 = interpreter
o Levels 1-3 = C1
• Level 1: C1 without profiling
• Level 2: C1 with basic profiling (invocation and back-edge counters)
• Level 3: C1 with full profiling (~35% overhead)
o Level 4 = C2
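The level list above can be modeled as a small enum. This is an illustrative sketch: the `CompLevel` constants and the `commonPath()` helper are hypothetical names, not HotSpot's. The 0 → 3 → 4 path it returns is the usual route for a hot method under tiered compilation; levels 1 and 2 are used in special cases, such as trivial methods or a busy C2 queue.

```java
import java.util.List;

// Hypothetical model of the tiered compilation levels.
enum CompLevel {
    INTERPRETER(0, "interpreter", "no compilation"),
    C1_NO_PROFILE(1, "C1", "no profiling"),
    C1_BASIC_PROFILE(2, "C1", "basic profiling (invocation and back-edge counters)"),
    C1_FULL_PROFILE(3, "C1", "full profiling (~35% overhead)"),
    C2(4, "C2", "profile-guided, speculative optimization");

    final int level;
    final String compiler;
    final String description;

    CompLevel(int level, String compiler, String description) {
        this.level = level;
        this.compiler = compiler;
        this.description = description;
    }

    /** Common path for a hot method: interpret, profile with C1, optimize with C2. */
    static List<CompLevel> commonPath() {
        return List.of(INTERPRETER, C1_FULL_PROFILE, C2);
    }

    public static void main(String[] args) {
        for (CompLevel l : commonPath()) {
            System.out.println("Level " + l.level + " (" + l.compiler + "): " + l.description);
        }
    }
}
```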
JVM FLAGS FOR CHOOSING THE JIT
o -client: client compiler (C1)
o -server (-d64): server compiler (C2)
o -server (-d64) -XX:+TieredCompilation: tiered compilation (C1 + C2)
DEFAULT JIT VERSION
Installed JVM  | -client                | -server                | -d64
Linux 32-bit   | 32-bit client compiler | 32-bit server compiler | Error
Linux 64-bit   | 64-bit server compiler | 64-bit server compiler | 64-bit server compiler
Mac OS X       | 64-bit server compiler | 64-bit server compiler | 64-bit server compiler
Windows 32-bit | 32-bit client compiler | 32-bit server compiler | Error
Windows 64-bit | 64-bit server compiler | 64-bit server compiler | 64-bit server compiler

OS                                    | Default compiler
Windows, 32-bit, any number of CPUs   | -client
Windows, 64-bit, any number of CPUs   | -server
MacOS, any number of CPUs             | -server
Linux/Solaris, 32-bit, 1 CPU          | -client
Linux/Solaris, 32-bit, 2 or more CPUs | -server
Linux, 64-bit, any number of CPUs     | -server
*In Java 8, the server compiler is the default in all of these cases

To check the default compiler:
% java -version
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) Server VM (build 21.0-b17, mixed mode)
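Besides `java -version`, a running program can report which JIT it got through the standard `java.lang.management` API. A minimal sketch, with a hypothetical class name `WhichJit`: `ManagementFactory.getCompilationMXBean()` returns null when the JVM has no compilation system, and the exact strings printed vary by vendor and version.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Reports the VM name and the JIT compiler name of the running JVM.
class WhichJit {
    static String describe() {
        String vmName = System.getProperty("java.vm.name");
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        String jitName = (jit == null) ? "no JIT compiler" : jit.getName();
        return vmName + " / " + jitName;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```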
OPTIMIZATIONS IN HOTSPOT JVM
• compiler tactics
  – delayed compilation
  – tiered compilation
  – on-stack replacement
  – delayed reoptimization
  – program dependence graph representation
  – static single assignment representation
• proof-based techniques
  – exact type inference
  – memory value inference
  – memory value tracking
  – constant folding
  – reassociation
  – operator strength reduction
  – null check elimination
  – type test strength reduction
  – type test elimination
  – algebraic simplification
  – common subexpression elimination
  – integer range typing
• flow-sensitive rewrites
  – conditional constant propagation
  – dominating test detection
  – flow-carried type narrowing
  – dead code elimination
• language-specific techniques
  – class hierarchy analysis
  – devirtualization
  – symbolic constant propagation
  – autobox elimination
  – escape analysis
  – lock elision
  – lock fusion
  – de-reflection
• speculative (profile-based) techniques
  – optimistic nullness assertions
  – optimistic type assertions
  – optimistic type strengthening
  – optimistic array length strengthening
  – untaken branch pruning
  – optimistic N-morphic inlining
  – branch frequency prediction
  – call frequency prediction
• memory and placement transformation
  – expression hoisting
  – expression sinking
  – redundant store elimination
  – adjacent store fusion
  – card-mark elimination
  – merge-point splitting
• loop transformations
  – loop unrolling
  – loop peeling
  – safepoint elimination
  – iteration range splitting
  – range check elimination
  – loop vectorization
• global code shaping
  – inlining (graph integration)
  – global code motion
  – heat-based code layout
  – switch balancing
  – throw inlining
• control flow graph transformation
  – local code scheduling
  – local code bundling
  – delay slot filling
  – graph-coloring register allocation
  – linear scan register allocation
  – live range splitting
  – copy coalescing
  – constant splitting
  – copy removal
  – address mode matching
  – instruction peepholing
  – DFA-based code generator
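As one example from the list, loop unrolling can be written out by hand. This sketch shows the shape of the transformation, not actual JIT output: `sumSimple` is the loop as written, and `sumUnrolled` processes four elements per iteration plus a remainder loop, which cuts loop-control overhead. Because the index in the unrolled loop provably stays inside the array, the JIT can also apply range check elimination there. The class and method names are hypothetical.

```java
// Hand-written illustration of loop unrolling by a factor of 4.
class Unrolling {
    static long sumSimple(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    static long sumUnrolled(int[] a) {
        long sum = 0;
        int i = 0;
        int limit = a.length - 3;        // main loop runs while 4 elements remain
        for (; i < limit; i += 4) {      // one loop test per 4 elements
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        for (; i < a.length; i++) {      // remainder loop for the last 0-3 elements
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1_003];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sumSimple(data) + " " + sumUnrolled(data));
    }
}
```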
INLINING – MOTHER OF OPTIMIZATION
*Uses devirtualization first, if the call is virtual
Call frequency and method size matter
Before:
int addAll(int max) {
    int accum = 0;
    for (int i = 0; i < max; i++) {
        accum = add(accum, i);   // hot call site, tiny callee
    }
    return accum;
}
int add(int a, int b) { return a + b; }
After (what the JIT effectively compiles):
int addAll(int max) {
    int accum = 0;
    for (int i = 0; i < max; i++) {
        accum = accum + i;       // body of add() inlined
    }
    return accum;
}
int add(int a, int b) { return a + b; }   // still kept for other callers
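Devirtualization is what lets the JIT inline a virtual call in the first place. A sketch with hypothetical `Shape`/`Square` types: while `Square` is the only loaded implementation, HotSpot can treat `s.area()` as a direct call (through class hierarchy analysis or the call site's type profile) and inline it. If another implementation shows up later, the JIT has to deoptimize or fall back to a guarded call.

```java
// Hypothetical example of a call site the JIT can devirtualize.
class DevirtDemo {
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area(); // interface call; with a single implementor the JIT
                               // can turn it into a direct call and inline area()
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1_000];
        for (int i = 0; i < shapes.length; i++) shapes[i] = new Square(2);
        System.out.println(totalArea(shapes)); // 4000.0
    }
}

interface Shape { double area(); }

final class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}
```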
OSR – ON-STACK REPLACEMENT
o Running method never exits?
o But it’s getting really hot?
o Generally means long-running loops (back-branches)
o Compile the method and replace it while it is running
o Not typically useful in large systems
o Looks great on benchmarks!
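A minimal sketch of the situation OSR handles, using a hypothetical class `OsrDemo`: `main` calls `sumTo` once, so its invocation counter stays at 1 while the loop's back-edge counter keeps growing. Running it with `-XX:+PrintCompilation` should show OSR compilations marked with `%`; the exact output depends on the JVM version.

```java
// One long loop in a method that is called only once: a typical OSR candidate.
class OsrDemo {
    static long sumTo(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;              // each iteration takes a back-branch
        }
        return sum;
    }

    public static void main(String[] args) {
        // Run with: java -XX:+PrintCompilation OsrDemo
        // OSR compilations are flagged with '%' in that output.
        System.out.println(sumTo(500_000_000L));
    }
}
```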
ESCAPE ANALYSIS
o An object is referenced only inside some loop, and no other code can ever access it?
o Then the JIT needn’t acquire a lock when calling synchronized methods on that object
o It needn’t store the object’s fields in memory; it can keep their values in registers
o Similarly, it can keep the object’s reference fields in registers
ESCAPE ANALYSIS
public class Factorial {
    private BigInteger factorial;
    private int n;

    public Factorial(int n) {
        this.n = n;
    }

    public synchronized BigInteger getFactorial() {
        if (factorial == null) factorial = ...;
        return factorial;
    }
}

ArrayList<BigInteger> list = new ArrayList<BigInteger>();
for (int i = 0; i < 100; i++) {
    Factorial factorial = new Factorial(i);
    list.add(factorial.getFactorial());
}
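A runnable variant of the example above, for experimenting. The body of `getFactorial()` is an assumption (a plain iterative n!), since the slide elides it, and `FactorialDemo.factorials` is a hypothetical helper. Each `Factorial` object is created and used within one loop iteration and never stored, so once `getFactorial()` is inlined, escape analysis (HotSpot's `-XX:+DoEscapeAnalysis`, on by default in modern JDKs) can prove it does not escape.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class FactorialDemo {
    static List<BigInteger> factorials(int count) {
        List<BigInteger> list = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Factorial factorial = new Factorial(i); // never escapes this iteration
            list.add(factorial.getFactorial());     // only the BigInteger result escapes
        }
        return list;
    }

    public static void main(String[] args) {
        List<BigInteger> list = factorials(100);
        System.out.println(list.get(10)); // 3628800
    }
}

class Factorial {
    private BigInteger factorial;
    private final int n;

    Factorial(int n) { this.n = n; }

    // The lock is on an object no other thread can see, so it can be elided.
    synchronized BigInteger getFactorial() {
        if (factorial == null) {
            BigInteger result = BigInteger.ONE;      // assumed body: iterative n!
            for (int k = 2; k <= n; k++) {
                result = result.multiply(BigInteger.valueOf(k));
            }
            factorial = result;
        }
        return factorial;
    }
}
```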
ESCAPE ANALYSIS (SIMPLE CASE)
o It needn’t acquire a lock when calling the getFactorial() method
o It needn’t store the field n in memory; it can keep that value in a register
o It can track the object’s fields individually instead of allocating the object (scalar replacement)
o Sometimes it needn’t execute that code at all
JIT TUNING
(THESE MIGHT SAVE YOU )
o -client , -server or -XX:+TieredCompilation
o -XX:ReservedCodeCacheSize=, -XX:InitialCodeCacheSize=
19
JIT TUNING
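o -XX:CompileThreshold=: number of invocations before a method is compiled (ignored when tiered compilation is on)
o -XX:CICompilerCount=: number of compiler threads
o -XX:MaxFreqInlineSize=: maximum bytecode size of a hot (frequently called) method that can be inlined (default 325 bytes)
o -XX:MaxInlineSize=: methods at or below this bytecode size are inlined regardless of call frequency (default 35 bytes)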
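The two inlining limits interact roughly as follows. This is a simplified sketch, not HotSpot's actual heuristic, which also weighs inlining depth, call-site frequency, and the size of already-compiled code. The class `InlinePolicy` and its constants are hypothetical and hard-code the defaults quoted above; sizes are in bytes of bytecode.

```java
// Simplified model of the MaxInlineSize / MaxFreqInlineSize decision.
class InlinePolicy {
    static final int MAX_INLINE_SIZE = 35;       // -XX:MaxInlineSize default
    static final int MAX_FREQ_INLINE_SIZE = 325; // -XX:MaxFreqInlineSize default

    // Small methods are always inlined; larger ones only when the call site is hot.
    static boolean shouldInline(int bytecodeSize, boolean hot) {
        if (bytecodeSize <= MAX_INLINE_SIZE) {
            return true;
        }
        return hot && bytecodeSize <= MAX_FREQ_INLINE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(shouldInline(20, false));  // true: tiny method
        System.out.println(shouldInline(200, true));  // true: hot and within 325 bytes
        System.out.println(shouldInline(200, false)); // false: too big for a cold call site
        System.out.println(shouldInline(400, true));  // false: too big even when hot
    }
}
```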
WANT TO GET MORE DETAILS?
(BE CAREFUL USING THEM IN PRODUCTION)
o -XX:+UnlockDiagnosticVMOptions: required by diagnostic flags such as -XX:+LogCompilation, -XX:+PrintAssembly and -XX:+PrintInlining
o -XX:+TraceClassLoading
o -XX:+LogCompilation
o -XX:+PrintAssembly (needs the hsdis disassembler plugin)
o -XX:+PrintCompilation: info about compiled methods
o -XX:+PrintInlining: info about inlining decisions
o -XX:CompileCommand=…: to control compilation policy
WANT TO GET MORE DETAILS? – LOGS
WANT TO GET MORE DETAILS? – JITWATCH, JSTAT
CONCLUSIONS
o KISS, SOLID, DRY, YAGNI: code that follows these well-known principles makes it easier for the JIT to do its job
o Your code will be optimized and compiled, and possibly deoptimized
o The JVM uses many different algorithms to do this
o You need to reserve memory for compiled code (the CodeCache, a native memory area separate from Metaspace/PermGen)
o To reach full performance, the JVM needs to warm up
o Microbenchmarks lie to you. All the time.
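The warm-up point can be made concrete with a deliberately naive timing loop. This sketch (hypothetical class `WarmupDemo`, iteration counts chosen arbitrarily) runs the code long enough for the JIT to compile it before measuring, and keeps a result "sink" so the work is not eliminated as dead code. It is exactly the kind of microbenchmark the last point warns about, so the numbers it prints are only indicative; a harness such as JMH handles these pitfalls properly.

```java
// Naive warm-up-then-measure harness, for illustration only.
class WarmupDemo {
    // Work to measure: a simple polynomial hash over 0..n-1.
    static long work(int n) {
        long h = 0;
        for (int i = 0; i < n; i++) {
            h = 31 * h + i;
        }
        return h;
    }

    public static void main(String[] args) {
        long sink = 0;                        // results are consumed so the work is not dead code
        for (int i = 0; i < 20_000; i++) {    // warm-up: let the JIT profile and compile work()
            sink += work(1_000);
        }
        long start = System.nanoTime();
        for (int i = 0; i < 1_000; i++) {     // measured phase
            sink += work(1_000);
        }
        long elapsed = System.nanoTime() - start;
        // The JIT may still optimize across iterations, which is why results can mislead.
        System.out.println("avg ns per call: " + elapsed / 1_000 + " (sink=" + sink + ")");
    }
}
```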
WHAT WE DIDN’T COVER
o Deoptimization
o Compiler-specific benchmarks
o Examples of specific compiled code
o …
Q&A

Java JIT. Compilation and optimization, by Andrey Kovalenko
