Java on CRaC
Presented by Simon Ritter, Deputy CTO | Azul
2
Why Is Java So Popular?
• It's a great language
o Easy to write, easy to read
• It has great libraries
o Pretty much anything you can think off, almost always a FOSS version as well as commercial
• It has a great community
o JUGs, Champions, conferences, etc, etc
o Easy to find Java programmers
• But really, it's the Java Virtual Machine
o Write once, run anywhere
o Excellent backwards compatibility
o A managed runtime environment
3
JVM Performance Warmup
• Classfiles contain bytecodes (instructions for a Virtual Machine)
• Initially, these are interpreted
o Each bytecode is converted to equivalent native instructions
• Frequently used methods (hotspots) are identified and compiled
o Just-in-Time (JIT) compilation
o Uses two compilers, C1 (fast compile, low optimisation) and C2 (slow compile, high optimisation)
• Each time the application is started, the same process begins again
o Profile code looking for hot methods
o Compile using C1 JIT, recompile using C2 JIT when hot enough
4
JVM Performance Graph
Application
Warmup
5
JVM Performance Graph
First run Second run Third run
6
JVM Performance Graph
First run Second run Third run
7
Why Not Use AOT?
• Traditional approach: Ahead of time, static compilation
• No interpreting bytecodes
• No analysis of hotspots
• No runtime compilation of code placing heavy load on CPUs
• Start at full speed, straight away
• This is the Graal native image approach
• Problem solved, right?
8
What About AppCDS?
9
Not So Fast
• AOT is, by definition, static
• And code is compiled before it is run
• The compiler has no knowledge of how the code will actually run
o Profile guided optimisation has been around for a long time and only helps partially
10
Speculative Optimisation Example 1:
Inlining Monomorphic Sites
public class Animal {
private int color;
public int getColor() { return color };
}
myColor = animal.getColor();
can be transformed to:
myColor = animal.color;
As long as only one implementer of getColor() existslic
class Animal {
11
Speculative Optimisation Example 2:
Branch Analysis
int computeMagnitude(int value) {
if (value > 9)
bias = computeBias(value);
else
bias = 1;
return Math.log10(bias + 99);
}
Profiling data shows that value (so far) has never been greater than 9
12
Speculative Optimisation Example 2:
Branch Analysis
Assume that, based on profiling, value will continue to be less than 10
int computeMagnitude(int value) {
if (value > 9)
uncommonTrap(); // Deoptimise
return 2; // Math.log10(100)
}
13
Speculative Optimisations
• Our measurements show over 50% of performance gains in JIT are due to speculative optimisations
• But they only work if you can deoptimise when assumptions prove false
• Deoptimisations are very bad for performance
o Already completed compilation is thrown away (wasted CPU cycles)
o Code must then be recompiled (during which must fallback to interpreter or C1 code)
14
AOT vs JIT Compilation
• AOT
• Class loading prevents method inlining
• No runtime bytecode generation
• Reflection is complicated
• Unable to use speculative optimisations
• Overall performance will typically be lower
• Full speed from the start
• No CPU overhead to compile code at runtime
• JIT
• Can use aggressive method inlining
• Can use runtime byteocde generation
• Reflection is (relatively) simple
• Can use speculative optimisations
• Overall performance will typically be higher
• Requires warmup time
• CPU overhead to compile code at runtime
15
JVM Performance
JIT Compiled Code
AOT Compiled Code
AOT Compiled Code with PGO
A Different Approach
17
Co-ordinated Resume In Userspace
• Linux project
• Basic idea
o Freeze a running application
- Pause program counter
o Create a snapshot of the applications state (as a set of files)
o At some later point, use those files to restart the application from the same point
- Potentially, on a different physical machine
Input Output
CPU
Memory
Registers
18
Co-ordinated Resume In Userspace
• Mainly implemented in userspace rather than the kernel
o Some kernel support required but integrated to mainline since 3.11
• Main goal is the migration of containers for microservices
• Supports extensive features
o Processes and threads
o Application memory, memory mapped files and shared memory
o Open files, pipes and FIFOs
o Sockets
o Interprocess communication channels
o Timers and signals
• Includes ability to rebuild a TCP connection from one side without the need to exchange packets
19
CRIU Challenges
• What if we move the saved state to a different machine?
o What about open files, shared memory, etc?
• What if we use the same state to restart mutilple instances
o They would work as if each was the original instance
o Could lead to difficult clashes of use of state
• A JVM (and therefore Java code) would assume it was continuing its tasks
o Very difficult to use effectively
20
Co-ordinated Restore at Checkpoint (CRaC)
• Let's make the application aware it is being checkpointed and restored
• CRaC also enforces more restrictions on a checkpointed application
o No open files or sockets
o Checkpoint will be aborted if any are found
Application running Application running
Aware of checkpoint
being created
Aware of restore
happening
21
Using CRaC API
• CRaC uses Resources that can be notified about a Checkpoint and Restore
• Classes in application code implement the Resource interface
o Receive callbacks during checkpoint and restore
o Able to perform necessary actions (close files, etc.)
<<Interface>>
Resource
beforeCheckpoint()
afterRestore()
22
Using CRaC API
• Resource objects need to be registered with a Context so that they can receive notifications
• There is a global Context accessible via the static getGlobalContext() method of the Core class
<<Interface>>
Resource
beforeCheckpoint()
afterRestore()
Core
getGlobalContext()
<<Abstract>>
Context
register(Resource)
23
Using CRaC API
• The global Context maintains a list of Resource objects
• The beforeCheckpoint() methods are called in the order the Resource objects were added
• The afterRestore() methods are called in the reverse order the Resource objects were added
o Predictable restoration of important state
24
Example: Jetty
class ServerManager {
Server server;
public ServerManager(int port, Handler handler) throws Exception {
server = new Server(8080);
server.setHandler(handler);
server.start();
}
}
25
Using CRaC (External Initiation)
$ jcmd target/example-jetty-1.0-SNAPSHOT.jar JDK.checkpoint
80694: Command executed successfully
jdk.crac.impl.CheckpointOpenSocketException:tcp6 localAddr::localPort 8080 remoteAddr
::remotePort 0
at java.base/jdk.crac.Core.translateJVMExceptions(Core.java:80)
at java.base/jdk.crac.Core.checkpointRestore1(Core.java:137) at
java.base/jdk.crac.Core.checkpointRestore(Core.java:177) at
java.base/jdk.crac.Core.lambda$checkpointRestoreInternal$0(Core.java:19 at
java.base/java.lang.Thread.run(Thread.java:832)
26
Example: Jetty
class ServerManager implements Resource {
public ServerManager(int port, Handler handler) throws Exception {
...
Core.getGlobalContext().register(this);
}
@Override
public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
server.stop();
}
@Override
public void afterRestore(Context<? extends Resource> context) throws Exception {
server.start();
}
}
27
Using CRaC (Programmatic Initiation)
• In the application code, make an explicit call
Core.checkpointRestore();
• This will initiate the checkpoint and return when a restore has been completed
o Can throw either a CheckpointException or RestoreException
How Good Is It?
29
Proof Of Concept Results
30
Proof Of Concept Results
Time to complete n operations: Spring Boot
31
Proof Of Concept Results
Time to complete n operations: Quarkus
Summary
33
Summary
• CRaC is a way to pause a JVM-based application
• And restore it at some point in the future
o Possibly multiple times
• Benefit is potentially etremely fast time to full performance level
• Eleiminates the need for hotspot identification, method compiles, recompiles and deoptimisations
• Improved throughput from start
o By eliminating JIT compilation overhead
• OpenJDK project
o Get involved!
github.com/CRaC
wiki.openjdk.java.net/display/crac
34
Demo
Thank You.
Simon Ritter, Deputy CTO
@speakjava

Java On CRaC

  • 1.
    Java on CRaC Presentedby Simon Ritter, Deputy CTO | Azul
  • 2.
    2 Why Is JavaSo Popular? • It's a great language o Easy to write, easy to read • It has great libraries o Pretty much anything you can think off, almost always a FOSS version as well as commercial • It has a great community o JUGs, Champions, conferences, etc, etc o Easy to find Java programmers • But really, it's the Java Virtual Machine o Write once, run anywhere o Excellent backwards compatibility o A managed runtime environment
  • 3.
    3 JVM Performance Warmup •Classfiles contain bytecodes (instructions for a Virtual Machine) • Initially, these are interpreted o Each bytecode is converted to equivalent native instructions • Frequently used methods (hotspots) are identified and compiled o Just-in-Time (JIT) compilation o Uses two compilers, C1 (fast compile, low optimisation) and C2 (slow compile, high optimisation) • Each time the application is started, the same process begins again o Profile code looking for hot methods o Compile using C1 JIT, recompile using C2 JIT when hot enough
  • 4.
  • 5.
    5 JVM Performance Graph Firstrun Second run Third run
  • 6.
    6 JVM Performance Graph Firstrun Second run Third run
  • 7.
    7 Why Not UseAOT? • Traditional approach: Ahead of time, static compilation • No interpreting bytecodes • No analysis of hotspots • No runtime compilation of code placing heavy load on CPUs • Start at full speed, straight away • This is the Graal native image approach • Problem solved, right?
  • 8.
  • 9.
    9 Not So Fast •AOT is, by definition, static • And code is compiled before it is run • The compiler has no knowledge of how the code will actually run o Profile guided optimisation has been around for a long time and only helps partially
  • 10.
    10 Speculative Optimisation Example1: Inlining Monomorphic Sites public class Animal { private int color; public int getColor() { return color }; } myColor = animal.getColor(); can be transformed to: myColor = animal.color; As long as only one implementer of getColor() existslic class Animal {
  • 11.
    11 Speculative Optimisation Example2: Branch Analysis int computeMagnitude(int value) { if (value > 9) bias = computeBias(value); else bias = 1; return Math.log10(bias + 99); } Profiling data shows that value (so far) has never been greater than 9
  • 12.
    12 Speculative Optimisation Example2: Branch Analysis Assume that, based on profiling, value will continue to be less than 10 int computeMagnitude(int value) { if (value > 9) uncommonTrap(); // Deoptimise return 2; // Math.log10(100) }
  • 13.
    13 Speculative Optimisations • Ourmeasurements show over 50% of performance gains in JIT are due to speculative optimisations • But they only work if you can deoptimise when assumptions prove false • Deoptimisations are very bad for performance o Already completed compilation is thrown away (wasted CPU cycles) o Code must then be recompiled (during which must fallback to interpreter or C1 code)
  • 14.
    14 AOT vs JITCompilation • AOT • Class loading prevents method inlining • No runtime bytecode generation • Reflection is complicated • Unable to use speculative optimisations • Overall performance will typically be lower • Full speed from the start • No CPU overhead to compile code at runtime • JIT • Can use aggressive method inlining • Can use runtime byteocde generation • Reflection is (relatively) simple • Can use speculative optimisations • Overall performance will typically be higher • Requires warmup time • CPU overhead to compile code at runtime
  • 15.
    15 JVM Performance JIT CompiledCode AOT Compiled Code AOT Compiled Code with PGO
  • 16.
  • 17.
    17 Co-ordinated Resume InUserspace • Linux project • Basic idea o Freeze a running application - Pause program counter o Create a snapshot of the applications state (as a set of files) o At some later point, use those files to restart the application from the same point - Potentially, on a different physical machine Input Output CPU Memory Registers
  • 18.
    18 Co-ordinated Resume InUserspace • Mainly implemented in userspace rather than the kernel o Some kernel support required but integrated to mainline since 3.11 • Main goal is the migration of containers for microservices • Supports extensive features o Processes and threads o Application memory, memory mapped files and shared memory o Open files, pipes and FIFOs o Sockets o Interprocess communication channels o Timers and signals • Includes ability to rebuild a TCP connection from one side without the need to exchange packets
  • 19.
    19 CRIU Challenges • Whatif we move the saved state to a different machine? o What about open files, shared memory, etc? • What if we use the same state to restart mutilple instances o They would work as if each was the original instance o Could lead to difficult clashes of use of state • A JVM (and therefore Java code) would assume it was continuing its tasks o Very difficult to use effectively
  • 20.
    20 Co-ordinated Restore atCheckpoint (CRaC) • Let's make the application aware it is being checkpointed and restored • CRaC also enforces more restrictions on a checkpointed application o No open files or sockets o Checkpoint will be aborted if any are found Application running Application running Aware of checkpoint being created Aware of restore happening
  • 21.
    21 Using CRaC API •CRaC uses Resources that can be notified about a Checkpoint and Restore • Classes in application code implement the Resource interface o Receive callbacks during checkpoint and restore o Able to perform necessary actions (close files, etc.) <<Interface>> Resource beforeCheckpoint() afterRestore()
  • 22.
    22 Using CRaC API •Resource objects need to be registered with a Context so that they can receive notifications • There is a global Context accessible via the static getGlobalContext() method of the Core class <<Interface>> Resource beforeCheckpoint() afterRestore() Core getGlobalContext() <<Abstract>> Context register(Resource)
  • 23.
    23 Using CRaC API •The global Context maintains a list of Resource objects • The beforeCheckpoint() methods are called in the order the Resource objects were added • The afterRestore() methods are called in the reverse order the Resource objects were added o Predictable restoration of important state
  • 24.
    24 Example: Jetty class ServerManager{ Server server; public ServerManager(int port, Handler handler) throws Exception { server = new Server(8080); server.setHandler(handler); server.start(); } }
  • 25.
    25 Using CRaC (ExternalInitiation) $ jcmd target/example-jetty-1.0-SNAPSHOT.jar JDK.checkpoint 80694: Command executed successfully jdk.crac.impl.CheckpointOpenSocketException:tcp6 localAddr::localPort 8080 remoteAddr ::remotePort 0 at java.base/jdk.crac.Core.translateJVMExceptions(Core.java:80) at java.base/jdk.crac.Core.checkpointRestore1(Core.java:137) at java.base/jdk.crac.Core.checkpointRestore(Core.java:177) at java.base/jdk.crac.Core.lambda$checkpointRestoreInternal$0(Core.java:19 at java.base/java.lang.Thread.run(Thread.java:832)
  • 26.
    26 Example: Jetty class ServerManagerimplements Resource { public ServerManager(int port, Handler handler) throws Exception { ... Core.getGlobalContext().register(this); } @Override public void beforeCheckpoint(Context<? extends Resource> context) throws Exception { server.stop(); } @Override public void afterRestore(Context<? extends Resource> context) throws Exception { server.start(); } }
  • 27.
    27 Using CRaC (ProgrammaticInitiation) • In the application code, make an explicit call Core.checkpointRestore(); • This will initiate the checkpoint and return when a restore has been completed o Can throw either a CheckpointException or RestoreException
  • 28.
  • 29.
  • 30.
    30 Proof Of ConceptResults Time to complete n operations: Spring Boot
  • 31.
    31 Proof Of ConceptResults Time to complete n operations: Quarkus
  • 32.
  • 33.
    33 Summary • CRaC isa way to pause a JVM-based application • And restore it at some point in the future o Possibly multiple times • Benefit is potentially etremely fast time to full performance level • Eleiminates the need for hotspot identification, method compiles, recompiles and deoptimisations • Improved throughput from start o By eliminating JIT compilation overhead • OpenJDK project o Get involved! github.com/CRaC wiki.openjdk.java.net/display/crac
  • 34.
  • 35.
    Thank You. Simon Ritter,Deputy CTO @speakjava