Keeping Your Java Hot by Solving the JVM Warmup Problem
Presented by Simon Ritter, Deputy CTO | Azul
JVM Startup
• JVM Startup: fast (time to first operation)
o JVM: load and initialise
o JVM: generate bytecode templates
• Application Startup: quick
o Load application classes
o Initialise application classes
o Application-specific initialisation
• Application Warmup: can take a long time (time to n operations)
o JVM: compile/deoptimise/recompile
o Application: process specific workloads
JVM Performance Warmup
• Classfiles contain bytecodes (instructions for a Virtual Machine)
• Initially, these are interpreted
o Each bytecode is converted to equivalent native instructions
• Frequently used methods (hotspots) are identified and compiled
o Just-in-Time (JIT) compilation
o Uses two compilers, C1 (fast compile, low optimisation) and C2 (slow compile, high optimisation)
• Each time the application is started, the same process begins again
o Profile code looking for hot methods
o Compile using C1 JIT, recompile using C2 JIT when hot enough
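This tiered flow can be watched in practice. A minimal sketch (the class and method names here are ours, not from the talk): the loop calls inc() often enough to cross the tiered-compilation invocation thresholds, and running the class with the standard HotSpot flag -XX:+PrintCompilation shows the method being compiled by C1 and later C2.

```java
// Minimal warmup sketch: run with
//   java -XX:+PrintCompilation WarmupDemo
// to watch inc() move from the interpreter to C1 and then C2 compiled code.
public class WarmupDemo {
    // Trivial method the JVM will identify as a hotspot.
    static long inc(long x) {
        return x + 1;
    }

    // Calls inc() n times; enough iterations trip the tiered-compilation
    // thresholds and trigger JIT compilation of inc() (and of run() itself).
    static long run(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc = inc(acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println("result = " + run(1_000_000));
    }
}
```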
JVM Performance Graph

[Graph: performance rising over time during application warmup]

JVM Performance Graph

[Graph: performance over first, second and third runs; the warmup curve repeats on every run]
Solution 1: Ahead of Time (AOT) Compilation
Compile Java Source Direct to Native Code
• Traditional approach: Ahead of time, static compilation
• No interpreting bytecodes
• No analysis of hotspots
• No runtime compilation of code placing heavy load on CPUs
• Start at full speed, straight away
• This is the Graal native image approach
o A similar approach is being explored by OpenJDK (Draft JEP 8313278)
• Problem solved, right?
Not So Fast
• AOT is, by definition, static
• And code is compiled before it is run
• The compiler has no knowledge of how the code will actually run
o Profile-guided optimisation has been around for a long time but only helps partially
Speculative Optimisation Example 1: Inlining Monomorphic Sites

public class Animal {
    private int color;
    public int getColor() { return color; }
}

myColor = animal.getColor();

can be transformed to:

myColor = animal.color;

as long as only one implementation of getColor() exists
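A sketch of why this optimisation is speculative (the class names below are ours): while Animal is the only implementation, the call site inside colorOf() is monomorphic, so the JIT can replace the virtual call with a direct field load. The moment a second implementation is loaded and reaches that call site, the assumption fails and the compiled code must be deoptimised.

```java
// While only Animal exists, a.getColor() is monomorphic and can be inlined
// to a plain field load. Loading PaintedAnimal breaks that assumption.
// (The slide's field is private; protected is used here so the override
// can read it without extra ceremony.)
class Animal {
    protected int color = 1;
    public int getColor() { return color; }      // candidate for inlining
}

class PaintedAnimal extends Animal {             // a second implementer
    @Override
    public int getColor() { return color + 41; }
}

public class InlineDemo {
    // This call site stays monomorphic only until a PaintedAnimal arrives.
    static int colorOf(Animal a) {
        return a.getColor();
    }

    public static void main(String[] args) {
        System.out.println(colorOf(new Animal()));        // monomorphic so far
        System.out.println(colorOf(new PaintedAnimal())); // site is now bimorphic
    }
}
```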
Speculative Optimisation Example 2: Branch Analysis

int computeMagnitude(int value) {
    int bias;
    if (value > 9)
        bias = computeBias(value);
    else
        bias = 1;
    return (int) Math.log10(bias + 99);
}
Profiling data shows that value (so far) has never been greater than 9
Speculative Optimisation Example 2: Branch Analysis

Assume that, based on profiling, value will continue to be less than 10:

int computeMagnitude(int value) {
    if (value > 9)
        uncommonTrap(); // Deoptimise
    return 2;           // Math.log10(100)
}
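For concreteness, here is a runnable version of the profiled method (a sketch: computeBias() is a hypothetical stub, since the real one is not shown in the talk). When value stays below 10, bias is always 1, so the whole expression constant-folds to log10(100) = 2, which is exactly what the speculative version returns.

```java
public class MagnitudeDemo {
    // Hypothetical stub standing in for the real computeBias().
    static int computeBias(int value) {
        return value - 9;
    }

    static int computeMagnitude(int value) {
        int bias;
        if (value > 9)
            bias = computeBias(value);   // branch that profiling says is never taken
        else
            bias = 1;
        return (int) Math.log10(bias + 99);
    }

    public static void main(String[] args) {
        // For any value < 10: bias = 1, so log10(1 + 99) = 2.
        System.out.println(computeMagnitude(5));
    }
}
```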
Speculative Optimisations
• Our measurements show over 50% of performance gains in JIT are due to speculative optimisations
• But they only work if you can deoptimise when assumptions prove false
• Deoptimisations are very bad for performance
o Already completed compilation is thrown away (wasted CPU cycles)
o Code must then be recompiled (during which execution must fall back to the interpreter or C1 code)
AOT vs JIT Compilation
• AOT
o Class loading prevents method inlining
o No runtime bytecode generation
o Reflection is complicated
o Unable to use speculative optimisations
o Overall performance will typically be lower
o Full speed from the start
o No CPU overhead to compile code at runtime
• JIT
o Can use aggressive method inlining
o Can use runtime bytecode generation
o Reflection is (relatively) simple
o Can use speculative optimisations
o Overall performance will typically be higher
o Requires warmup time
o CPU overhead to compile code at runtime
JVM Performance

[Graph comparing performance over time: JIT compiled code, AOT compiled code, AOT compiled code with PGO]
When To Use AOT
• Ephemeral microservices
o Startup and warmup time is more important than overall speed
o Garbage collection is usually a non-issue
• Resource constrained services
o E.g. 2 vcore container
o JIT compilation will significantly reduce throughput during warmup
Solution 2: Store JIT Compilation Data
Azul Prime ReadyNow
• Run the application until it's warmed up (training run)
• Take a profile
o All currently loaded classes
o All currently initialised classes
o JIT profiling data
o Deoptimisations that occurred
o A copy of all compiled code
• Restart application
o Load and initialise all required classes
o Load code or compile methods
o All before main()
ReadyNow Startup Time

[Graphs: performance over time without ReadyNow! and with ReadyNow!, annotated with class loading, initialising and compile time]
Project Leyden (OpenJDK)
• The goal is to improve startup and warmup time
• Three specific parts:
o AOT Class Loading and Linking (JEP 483)
- Callsite linkage, constant resolution, constant pool, symbolic references
- Based on AppCDS
o AOT Method Profiling (Draft JEP 8325147)
- Cache profiling data from a training run
- Reuse at application restart
o AOT Code Compilation (Draft JEP 8335368)
- Cache compiled code from a training run
- Reuse at application restart
• Process is currently: training run -> Assembly phase -> Deployment run
• Does any of this sound familiar?
Solution 3: Decouple The JIT Compiler
JVM Architecture
Speculatively Optimised Code

[Diagram: optimised code bundled with the assumptions that must hold true for the code to work, e.g. only one getColor() method exists; value is less than 10]
JIT Compilation Has Cost
• JIT is CPU intensive
• Better optimisations deliver better performance (throughput)
o But require more time, compute power and memory
• This is fine if we have a powerful machine
o E.g. 64 vcores and 64GB RAM
• Less powerful environments can be problematic
o E.g. 2 vcore container with 2GB RAM
o Heavily optimised JIT can become prohibitive by degrading throughput
o It can even result in OOM errors
• Often we end up with a compromise
Speed and CPU Usage Over Time

[Graphs: speed and CPU usage over time in a 2 vcore container]
And Made It A Cloud-Based Resource
Cloud Native Compiler
Speed and CPU Usage Over Time

[Graphs: speed and CPU usage over time in a 2 vcore container, with JIT compilation offloaded to the Cloud Native Compiler]
Isn't This Just Shifting The Cost?
• Well, Yes…
• But we are shifting it to a much more efficient place
• Optimisation resources are used for a tiny fraction of wall-clock time
• When a JVM optimizes locally, it must carry dedicated resources to do so
• When outsourced to a Cloud Native Compiler
o The resources are shared and reused
o The resources can be elastic
• Compiled code can be cached
o The JIT now effectively has a memory across runs
[Diagram: optimised code, and the assumptions that must hold true for the code to work, produced by the Cloud Native Compiler]
Solution 4: Combine Solutions 2 and 3
ReadyNow Orchestrator

[Diagram: a Prime JVM (Falcon JIT Compiler, ReadyNow) connected to Optimizer Hub (Cloud Native Compiler, ReadyNow Orchestrator)]
Solution 5: Save The Whole Application State
Checkpoint/Restore In Userspace (CRIU)
• Linux project
• Basic idea
o Freeze a running application
- Pause the program counter
o Create a snapshot of the application's state (as a set of files)
o At some later point, use those files to restart the application from the same point
- Potentially, on a different physical machine

[Diagram: process state, i.e. CPU, registers, memory, input and output]
Co-ordinated Restore at Checkpoint (CRaC)
• Let's make the application aware it is being checkpointed and restored
• CRaC also enforces more restrictions on a checkpointed application
o No open files or sockets
o Checkpoint will be aborted if any are found
[Timeline: application running, made aware of the checkpoint being created; later, made aware of the restore happening, then running again]
Using CRaC API
• CRaC uses Resources that can be notified about a Checkpoint and Restore
• Classes in application code implement the Resource interface
o Receive callbacks during checkpoint and restore
o Able to perform necessary actions (close files, etc.)
[UML: interface Resource, with beforeCheckpoint() and afterRestore()]
Using CRaC API
• Resource objects need to be registered with a Context so that they can receive notifications
• There is a global Context accessible via the static getGlobalContext() method of the Core class
[UML: interface Resource (beforeCheckpoint(), afterRestore()); class Core (static getGlobalContext()); abstract class Context (register(Resource))]
Using CRaC API
• The global Context maintains a list of Resource objects
• The beforeCheckpoint() methods are called in the order the Resource objects were added
• The afterRestore() methods are called in the reverse order the Resource objects were added
o Predictable restoration of important state
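The registration and callback flow can be sketched as follows. This is a self-contained illustration: the Resource, Context and Core types below are simplified stand-ins that mirror the shape of the org.crac API (the real callbacks also receive a Context argument), and the socket-holding class is our own example; a real application would import org.crac instead.

```java
// Simplified stand-ins for the org.crac types, so this sketch compiles
// without the real dependency. In real code: import org.crac.*;
interface Resource {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

abstract class Context {
    abstract void register(Resource r);
}

final class Core {
    private static final java.util.List<Resource> RESOURCES = new java.util.ArrayList<>();
    private static final Context GLOBAL = new Context() {
        @Override void register(Resource r) { RESOURCES.add(r); }
    };
    static Context getGlobalContext() { return GLOBAL; }
}

// A socket holder that closes its socket before a checkpoint (CRaC aborts
// the checkpoint if open sockets remain) and reopens it after restore.
public class CheckpointAwareServer implements Resource {
    private java.net.ServerSocket socket;
    private final int port;

    public CheckpointAwareServer(int port) throws java.io.IOException {
        this.port = port;
        this.socket = new java.net.ServerSocket(port);
        // Register with the global Context so we receive the callbacks.
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint() throws Exception {
        socket.close();                            // no open sockets at checkpoint
    }

    @Override
    public void afterRestore() throws Exception {
        socket = new java.net.ServerSocket(port);  // re-establish state on restore
    }

    public boolean isOpen() { return !socket.isClosed(); }
}
```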
Does It Work? POC Results
Summary
Solving The JVM Warmup Problem
• No one solution will fit all situations
• AOT is good for fast startup/small footprint in ephemeral services
• ReadyNow provides memory of JIT across runs
• Cloud Native Compiler offloads JIT workload
• ReadyNow Orchestrator combines basic ReadyNow and CNC
• CRaC restarts an application from a known point
• Project Leyden is looking at approaches that include those above as well as other ideas
Thank You.
Simon Ritter, Deputy CTO
@speakjava
Azul Java
• Azul is all about Java runtimes
• Zulu (Azul Platform Core)
o Distribution of OpenJDK
o Drop-in replacement for other OpenJDK distros (TCK tested)
o Free and commercial support for JDK 8, 11, 17 and 21
o Free builds of 22 (Early Access)
o Commercial support for JDK 6 and 7
• Azul Platform Prime
o OpenJDK-based higher performance JVM
o GC pauses eliminated = low latency
o C2 JIT compiler replaced = higher throughput
o ReadyNow! warmup elimination technology
o Cloud Native Compiler - JIT-as-a-Service
