So you've been deploying Java in the cloud and are wondering how to handle the new world of containers, microservices, and memory constraints. Cold starts got you down? Come to this session to learn how OpenJ9, and the JVM in general, can help you on your Cloud Native journey.
5. Java in the Cloud requirements
§ Fast startup
–Faster scaling for increased demand
§ Small footprint
–Improves density for providers
–Improves cost for applications
§ Quick / immediate rampup
–GB/hr is key: if you run for less time, you pay less money
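The GB/hr point can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a provider that bills memory multiplied by run time; the class name, the method, and the price are hypothetical and do not come from the talk.

```java
// Hypothetical billing model: cost = memory (GB) x run time (hours) x price per GB-hour.
final class GbHourCost {
    static double cost(double memoryGb, double hours, double pricePerGbHour) {
        return memoryGb * hours * pricePerGbHour;
    }

    public static void main(String[] args) {
        double price = 0.10; // assumed price per GB-hour, for illustration only
        double before = cost(1.0, 10.0, price); // 1 GB for 10 hours
        double after = cost(0.5, 8.0, price);   // half the footprint, shorter run
        System.out.println("before: " + before);
        System.out.println("after:  " + after);
    }
}
```

Under this assumed model, a smaller footprint and a shorter run both reduce the bill, and the two savings multiply.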
9. Don’t use -Xmx for containers to avoid rebuilds
§ -XX:InitialRAMPercentage=N
– Set initial heap size as a percentage of total memory
§ -XX:MaxRAMPercentage=N
– Set maximum heap size as a percentage of total memory
§ Running in containers with set memory limits?
– OpenJ9 will base the default heap size on that limit:
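One way to see which heap limit a JVM actually picked is to ask it at run time. The sketch below is an illustration only: the class name HeapReport and the 1 GiB limit are assumptions, and heapBytes is plain percentage arithmetic, not the JVM's exact internal rounding.

```java
// Reports the maximum heap the running JVM selected and shows the
// arithmetic behind a percentage-based limit.
final class HeapReport {
    // Heap size in bytes for a given memory limit and percentage.
    static long heapBytes(long memoryLimitBytes, double percent) {
        return (long) (memoryLimitBytes * percent / 100.0);
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap chosen by this JVM: " + (maxHeap >> 20) + " MiB");

        long limit = 1L << 30; // assumed 1 GiB container memory limit
        System.out.println("75% of 1 GiB: " + (heapBytes(limit, 75.0) >> 20) + " MiB");
    }
}
```

Running it inside a container with and without -XX:MaxRAMPercentage=N should show the first number change without rebuilding the image.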
10. Out of the box idle tuning
§ When’s the best time to schedule a GC? When your app’s idle!
§ OpenJ9 enables -XX:+IdleTuningGcOnIdle when running in a container
– When the VM detects your app is idle for a configurable length of time, it can GC
– Both scavenge and global collections can occur based on heuristics
12. -Xverify:none never again!
§ Use -XX:[+|-]ClassRelationshipVerifier instead for fast and safe startup
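public class A {
    public static void main(String[] args) {
        B b = new B();
        // The verifier must establish that B is assignable to C for this call.
        acceptC(b);
    }

    // static so that main can call it without an instance of A
    public static void acceptC(C c) {}
}

// B and C are not shown on the slide; these definitions are assumed for illustration.
class C {}
class B extends C {}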
19. ShareClasses and AOT
§ Distinction between ‘cold’ and ‘warm’ runs
§ Dynamic AOT compilation
–Relocatable format
–AOT loads are ~100 times faster than JIT compilations
–More generic code → slightly less optimized
§ Generate AOT code only during start-up
§ Recompilation helps bridge the gap
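25. Problem 1: Modifying pre-existing files creates copies.
# builds Image_1
FROM scratch
RUN echo "Class A" >myscc
# builds Image_2
FROM Image_1
RUN echo "Class B" >>myscc
# built on Image_2
FROM Image_2
RUN echo "Class C" >>myscc
Copy of myscc stored in each layer:
First layer: Class A
Second layer: Class A, Class B
Third layer: Class A, Class B, Class C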
26. Problem 2: We need to leave space in the cache for other images.
FROM scratch
RUN echo "Class A" >myscc
-Xscmx? How big?
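29. Solution: Create new cache layers in new files.
# builds Image_1
FROM scratch
RUN echo "Class A" >myscc.L0
# builds Image_2
FROM Image_1
RUN echo "Class B" >myscc.L1
# built on Image_2
FROM Image_2
RUN echo "Class C" >myscc.L2
Data stored in each layer:
myscc.L0: Class A
myscc.L1: Class B
myscc.L2: Class C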
31. Bonus: We can specify -Xscmx per layer. We don’t need to leave any extra space.
# builds Image_1; cache sized with -XscmxA
FROM scratch
RUN echo "Class A" >myscc.L0
# builds Image_2; cache sized with -XscmxB
FROM Image_1
RUN echo "Class B" >myscc.L1
# built on Image_2; cache sized with -XscmxC
FROM Image_2
RUN echo "Class C" >myscc.L2
How big? Doesn’t matter.
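The storage difference between the two approaches is simple arithmetic. The sketch below is a simplified model, not Docker's or OpenJ9's actual accounting: it assumes each layer that modifies a file stores a full copy of it, and the class and method names are hypothetical.

```java
import java.util.List;

final class SccLayerStorage {
    // One cache file appended to in every image: copy-on-write stores the
    // whole file again in each layer that modifies it.
    static long singleFileTotal(List<Long> addedPerLayer) {
        long total = 0;
        long fileSize = 0;
        for (long added : addedPerLayer) {
            fileSize += added; // the file grows by this layer's data
            total += fileSize; // the layer stores a full copy of the file
        }
        return total;
    }

    // One new cache file per layer: each layer stores only its own data.
    static long layeredTotal(List<Long> addedPerLayer) {
        long total = 0;
        for (long added : addedPerLayer) {
            total += added;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(1L, 1L, 1L); // Class A, B, C: one unit each
        System.out.println("single file: " + singleFileTotal(sizes)); // 6
        System.out.println("layered:     " + layeredTotal(sizes));    // 3
    }
}
```

With three layers of one unit each, the single-file approach stores 1 + 2 + 3 = 6 units, matching the copies shown in Problem 1, while the layered approach stores 3.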
33. Example of Docker layers
Container Layer (Read-Write): DayTrader
Image Layer (Read-only): Open Liberty
Image Layer (Read-only): OpenJ9
Image Layer (Read-only): Ubuntu 16.04
34. Example of Docker layers
Read-write OpenJ9
Read-only Ubuntu 16.04
SCC: OpenJ9 layer data
35. Example of Docker layers
Read-write Open Liberty
Read-only OpenJ9
Read-only Ubuntu 16.04
SCC in OpenJ9 layer: OpenJ9 layer data
SCC in Open Liberty layer: OpenJ9 layer data + Liberty layer data
SCC turned on: load OpenJ9 layer data, add Open Liberty layer data (Docker CoW)
36. Example of Docker layers
Read-write DayTrader
Read-only Open Liberty
Read-only OpenJ9
Read-only Ubuntu 16.04
SCC in OpenJ9 layer: OpenJ9 layer data
SCC in Open Liberty layer: OpenJ9 layer + Liberty layer data
SCC in DayTrader layer: OpenJ9 layer + Liberty layer + DayTrader layer data
37. Multi-layer SCC
Read-write Open Liberty
Read-only OpenJ9
Read-only Ubuntu 16.04
SCC in OpenJ9 layer: OpenJ9 layer data (cannot write to a lower layer)
SCC in Open Liberty layer: Liberty layer data (write SCC in top layer)
38. Multi-layer SCC
Read-write Open Liberty
Read-only OpenJ9
Read-only Ubuntu 16.04
SCC_L0: OpenJ9 layer data
SCC_L1: Liberty layer data
Cannot write to a lower layer → write SCC in top layer → SCC becomes layered
40. Example
• Create a shared cache named "demo"
• Use -Xshareclasses:listAllCaches to find the shared cache. By default, the cache is layer 0
41. Example
• Traditionally, if you are running on a single-layer cache:
Simply use the same -Xshareclasses option to start up the JVM again.
A traditional single cache is equivalent to the multi-layer SCC case with layer number 0.
42. Example
• If you want to create a new layer for cache named "demo":
43. Example
• Find the new layer cache. The cache named "demo" now has layer 0 and layer 1 (two files).
• Future runs with -Xshareclasses:cacheDir=/tmp/,name=demo start up all existing layers.
44. Command line options on Multi-layer SCC
§ -Xshareclasses:createLayer
– Create a new shared cache layer
§ -Xshareclasses:layer=<number>
– Specify the top layer number. Create a new shared cache layer if the specified layer does not exist.
§ Some applications launch multiple JVMs simultaneously
– If each runs with -Xshareclasses:createLayer, N new layers are created
– Use layer=<number> to ensure only one layer is created.
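The options above are combined as comma-separated suboptions. As a rough illustration, the hypothetical helper below assembles the option strings for the "demo" cache used in the earlier example; the class and method names are made up for this sketch, and only the option text itself comes from the slides.

```java
// Hypothetical helper that assembles -Xshareclasses option strings for layered caches.
final class ShareClassesOptions {
    // Base option: cache name and cache directory.
    static String base(String name, String cacheDir) {
        return "-Xshareclasses:name=" + name + ",cacheDir=" + cacheDir;
    }

    // Ask the JVM to create a new layer on top of the existing ones.
    static String createLayer(String name, String cacheDir) {
        return base(name, cacheDir) + ",createLayer";
    }

    // Pin the top layer number so that JVMs started together share one new layer.
    static String layer(String name, String cacheDir, int number) {
        return base(name, cacheDir) + ",layer=" + number;
    }

    public static void main(String[] args) {
        System.out.println(base("demo", "/tmp"));
        System.out.println(createLayer("demo", "/tmp"));
        System.out.println(layer("demo", "/tmp", 1));
    }
}
```

An application that launches several JVMs at once would pass the layer(...) form to each of them, so that all of them populate layer 1 instead of each creating its own.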
46. Out of the Box: OpenJ9 uses roughly half the memory
Footprint is 60% smaller with OpenJ9
[Charts comparing Hotspot, OpenJ9, and OpenJ9 -Xshareclasses -Xquickstart]
49. Sidebar: Life of a running Java application
[Timeline; x-axis: Time; y-axis: Size and Complexity of Class Hierarchy]
“Big bang” (java process created)
Ready to run main: ~ 750 classes, ~ 3 class loaders
Application ready to do work: can be 1000s of classes, 100s of class loaders
Code paths & profile stabilize
Phases: Startup, Rampup, Steady state (AOT and JIT are marked on the timeline)
50. Challenges with AOT compilation
• Native code is not platform neutral
• Different AOT code needed for each deployment platform (Linux, Mac, Windows)
• Other usability issues
• Some deployment options decided at build time, e.g. GC policy, ability to re-JIT, etc.
• Different platforms: different classes loaded and methods compiled?
• Curate lists of classes/modules, methods to compile as your application and its dependencies evolve
• What about classes that aren’t available until the run starts?
• What methods to compile?
• Performance: AOT compilers can only reason about what happens at runtime
• Unlike JIT compiler which sees it happening
51. Profile Directed Feedback (PDF) may help?
• BUT: AOT code must run all possible user executions
• No longer compiling for “this” user on “this” run
• Really important to use representative input data when collecting profile for AOT
• Risk: can be misleading to use only a few input data sets
• AOT compiler can specialize to one data set and then run well on it
• But PDF can lead compiler astray if data isn’t properly representative
• Monomorphic in one runtime instance ≠ Monomorphic across all runtime instances
• Benchmarks may not stress AOT compilers properly (not many input sets)
• Cross training critically important
• Input data sets need to be curated and maintained as application and users evolve
• Profile data collection and curation responsibility is on the application provider
• Observation: PDF has not really been a huge success for static languages
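The point that a call site can be monomorphic in one run yet polymorphic across runs can be shown with a toy profile. This is a minimal sketch with hypothetical types (Shape, Circle, Square); it only records which receiver classes reach one call site and does not reflect how any real compiler collects profiles.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CallSiteProfile {
    interface Shape { double area(); }
    static final class Circle implements Shape { public double area() { return Math.PI; } }
    static final class Square implements Shape { public double area() { return 1.0; } }

    // Receiver types observed at the single call site s.area() during one run.
    static Set<String> profile(List<? extends Shape> input) {
        Set<String> seen = new HashSet<>();
        for (Shape s : input) {
            s.area(); // the profiled call site
            seen.add(s.getClass().getSimpleName());
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<String> runA = profile(List.of(new Circle(), new Circle())); // one data set
        Set<String> runB = profile(List.of(new Square()));               // another data set
        Set<String> merged = new HashSet<>(runA);
        merged.addAll(runB);
        System.out.println("run A receiver types: " + runA.size());
        System.out.println("run B receiver types: " + runB.size());
        System.out.println("all runs combined:    " + merged.size());
    }
}
```

A compiler trained only on run A's data would see a single receiver type and could specialize for Circle, which is exactly the assumption that run B breaks.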
52. Strengths and Weaknesses
Columns: JIT, AOT
Code Performance (steady state)
Runtime: adapt to changes
Ease of use
Platform neutral deployment
Start up (ready to handle load)
Ramp up (until steady state)
Runtime: CPU & Memory
53. Strengths and Weaknesses
Columns: JIT, AOT, AOT + JIT
Code Performance (steady state)
Runtime: adapt to changes
Ease of use
Platform neutral deployment
Start up (ready to handle load)
Ramp up (until steady state)
Runtime: CPU & Memory
55. Caching JIT Compiles
• Basic idea:
• Store JIT compiled code (JIT) in a cache for loading by other JVMs (“AOT”)
• Goal: JIT compiled code performance levels earlier
• Also: reduce JIT compiler’s transient CPU and memory overheads
• Really different from AOT? No and yes
• From perspective of second+ JVM: code loads as if it was AOT compiled
• First JVM: JIT compiles while app runs but generates code that can be cached
• Need meta data to validate later runs match first (i.e. same classes loaded same way)
• If invalid, don’t use cached code: instead do JIT or even more AOT recompilations
• Return to platform neutrality!
• Different users still get compiled code tailored for their environment
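The validation step described above (use cached code only if later runs match the first) can be sketched as a lookup that checks recorded metadata before returning code. This is an illustrative model only: the class, the fingerprint string, and the method key are assumptions, and it is not how OpenJ9 implements its shared class cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class CompiledCodeCache {
    // Cached entry: compiled code plus the metadata recorded by the first JVM.
    private static final class Entry {
        final String classFingerprint;
        final byte[] code;

        Entry(String classFingerprint, byte[] code) {
            this.classFingerprint = classFingerprint;
            this.code = code;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // First JVM: store the compiled code together with its validation metadata.
    void store(String method, String classFingerprint, byte[] code) {
        entries.put(method, new Entry(classFingerprint, code));
    }

    // Later JVMs: return cached code only if the classes match what the first run saw.
    // An empty result means the caller should fall back to compiling again.
    Optional<byte[]> load(String method, String currentFingerprint) {
        Entry e = entries.get(method);
        if (e == null || !e.classFingerprint.equals(currentFingerprint)) {
            return Optional.empty();
        }
        return Optional.of(e.code);
    }

    public static void main(String[] args) {
        CompiledCodeCache cache = new CompiledCodeCache();
        cache.store("Foo.bar()V", "fingerprint-1", new byte[] {1, 2, 3});
        System.out.println("same classes:    " + cache.load("Foo.bar()V", "fingerprint-1").isPresent());
        System.out.println("changed classes: " + cache.load("Foo.bar()V", "fingerprint-2").isPresent());
    }
}
```

The design choice to return an empty result on a mismatch, instead of failing, mirrors the slide: invalid cached code is simply ignored and the method is compiled again.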
56. OpenJ9: Caching JIT code accelerates start-up
• OpenJ9 Shared Class Cache (SCC)
• Memory mapped file for caching:
• Class files*
• AOT compiled code
• Profile data, hints
• Population of the cache happens naturally and transparently at runtime
• Also -Xtune:virtualized
• Caches JIT code even more aggressively to accelerate ramp-up (under load)
• Maybe slight (5-10%) performance drop
[Chart: Apache Tomcat 8 Start-up Time; y-axis: normalized start-up time; bars: OpenJ9 no SCC, OpenJ9 default SCC, OpenJ9 full SCC, HotSpot; labels: 28%, 43%, 19%]
• SCC for JCL bootstrap classes enabled by default
• Use -Xshareclasses option for full sharing
* Technically an internal format that can load faster than a .class file
57. Strengths and Weaknesses
Columns: JIT, AOT, AOT + JIT, Cache JIT
* After first run
Code Performance (steady state)
Runtime: adapt to changes
Ease of use
Platform neutral deployment
Start up (ready to handle load) *
Ramp up (until steady state) *
Runtime: CPU & Memory *
58. Still some “not green” boxes there
…even for caching JITs…
☹
59. What if the JIT became a JIT Server
[Diagram: three JVMs (each labeled JIT), two JIT Servers, and an Orchestrator (load balancing, affinity, scaling, reliability)]
• JVM client identifies methods to compile, but asks server to do the actual compilation
• JIT server asks questions to the client JVM (about classes, environment, etc.)
• Sends generated code & meta data back to be installed in client’s code cache
60. Benefits of an independent JIT server
• Move much of JIT induced CPU and memory spikes away from client
• Client CPU and memory consumption dictated by application
• JIT server connected to client JVM at runtime, so:
• Theoretically no loss in performance using same profile and class hierarchy info
• Still adaptable to changing conditions
• JVM client still platform neutral
62. Could that work?
AcmeAir rampup with JIT Server using -Xshareclasses
All JVMs run in containers, client and server on different machines with direct cable connection
Note: Hotspot takes twice as long as OpenJ9 to ramp up to about the same performance level
[Chart: AcmeAir with -Xshareclasses (Cold Run); container limits: 1P, 150M; x-axis: Time (sec), 0 to 600; y-axis: Throughput (pages/sec), 0 to 6000; series: JITServer-cold, OpenJ9-cold]
[Chart: AcmeAir with -Xshareclasses (Warm Run); container limits: 1P, 150M; x-axis: Time (sec), 0 to 600; y-axis: Throughput (pages/sec), 0 to 6000; series: JITServer-warm, OpenJ9-warm]
66. Strengths and Weaknesses
Columns: JIT, AOT, AOT + JIT, Cache JIT, JIT Server
* After first run ** After first run across cluster
Code Performance (steady state)
Runtime: adapt to changes
Ease of use
Platform Neutral deployment
Start up (ready to handle load) * **
Ramp up (until steady state) * **
Runtime: CPU & Memory *
67. JIT Server Current Status
• Code is fully open source at Eclipse OpenJ9
• Has now been merged into our master branch
• Now available in AdoptOpenJDK January 2020 update releases for JDK8 and 11 on Linux x86-64 platform
• Simple options lend themselves well to all kinds of Java workload deployments
• Server: jitserver -XX:JITServerPort=<port> -XX:JITServerAddress=<host>
• Client: java -XX:+UseJITServer -XX:JITServerPort=<port> -XX:JITServerAddress=<host> YourJavaApp
• We are seeking feedback on how well it works in real user environments!
• Try it now:
• E.g. https://adoptopenjdk.net/releases.html?variant=openjdk8&jvmVariant=openj9
• E.g. docker pull adoptopenjdk:8-jdk-openj9 (on Linux x86-64 platform)
68. We are really just at the beginning…
• Primary focus has been on mechanics to move JIT compilation to a server
• Once compilation work is redirected to server:
• Do that work more efficiently across a cluster of JVMs (think microservices)
• Classify and categorize JVM clients using machine learning
• Optimize groups of microservices together
• …