Are your application's tail latencies keeping it from meeting its near-real-time SLOs? Do your in-memory processing platform's long pauses only get worse as heap sizes grow? Are latency spikes causing variability in the end-to-end latency of your multi-tiered distributed systems?
If any of the above keep you up at night, have no fear: the Z Garbage Collector (GC) is here and is production ready in JDK 15.
In this talk, Monica Beckwith will cover the basics of Z GC and contrast it with G1 GC (the current default collector for OpenJDK JDK 11 LTS and tip).
2. Agenda
The Basic Principles of an Adaptive, Predictable Garbage Collector
Designing a GC
Introduction to Z Garbage Collector
Design Considerations
Phases
Production Readiness
Performance Impact and Considerations
Comparison with G1 GC
4. OpenJDK HotSpot collectors are designed with different optimization
goals that lead to different algorithmic considerations.
• E.g., a generational collector helps with scaling and maximizing
throughput
• Having multiple GC threads working in parallel helps speed up the time
it takes to complete the GC work
Designing a Garbage Collector (GC)
5. Let’s Design a GC
[Diagram: GC design space. Generational: Young and Old, each with Maintenance Barriers. Parallel Work: Stop-the-World Threads and Concurrent Threads.]
6. The goal is to avoid fragmentation and to not take resources
away from the application:
• The work of marking and compaction can be done in a single
stop-the-world (STW) pause/collection known as the Full GC.
• Avoid concurrent work as that may take resources away from
the application threads.
Designing a Throughput Maximizing GC
7. Let’s Design a Throughput Maximizing GC
[Diagram: same GC design space as slide 5 (Generational: Young and Old, each with Maintenance Barriers; Parallel Work: Stop-the-World Threads and Concurrent Threads), labeled Throughput Maximizer.]
8. A full compacting GC:
• Can be unpredictable and cause stalls that make you miss your
system level objectives!
• May not scale well when your application has high promotion rates with lots of
transients
In order to scale well and add some predictability, we need to
add:
• Regionalized heap &
• Partial compaction with concurrent marking
But I Can’t Deal With Those Long Pauses
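A minimal sketch of why a regionalized heap enables partial compaction: with per-region liveness from concurrent marking, the collector can evacuate only the cheapest regions instead of compacting the whole heap. The names `Region`, `selectCollectionSet` and `maxLiveBytesToCopy` are hypothetical and this is not HotSpot code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PartialCompactionSketch {
    /** A heap region; liveBytes is assumed to come from a prior marking phase. */
    static final class Region {
        final int id;
        final long liveBytes;

        Region(int id, long liveBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
        }
    }

    /**
     * Picks the regions that are cheapest to evacuate (least live data first)
     * until the budget of bytes to copy is used up.
     */
    static List<Region> selectCollectionSet(List<Region> regions, long maxLiveBytesToCopy) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingLong((Region r) -> r.liveBytes));
        List<Region> collectionSet = new ArrayList<>();
        long toCopy = 0;
        for (Region r : sorted) {
            if (toCopy + r.liveBytes > maxLiveBytesToCopy) {
                break; // sorted ascending, so every later region is too expensive as well
            }
            toCopy += r.liveBytes;
            collectionSet.add(r);
        }
        return collectionSet;
    }

    public static void main(String[] args) {
        List<Region> heap = List.of(
                new Region(0, 900), new Region(1, 100),
                new Region(2, 400), new Region(3, 50));
        for (Region r : selectCollectionSet(heap, 600)) {
            System.out.println("evacuate region " + r.id + " (" + r.liveBytes + " live bytes)");
        }
    }
}
```

The mostly-live region 0 is left in place, which is the point of partial compaction: the copy cost per cycle is bounded by the budget rather than by the heap size.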
9. Let’s Design a Latency Sensitive GC
[Diagram: a Latency Sensitive GC built on a Regionalized Heap, with Partial Compaction and Concurrent Marking; Generational (Young and Old, each with Maintenance Barriers); Parallel Work.]
10. So, you need a predictable, scalable, low-latency GC?
• Keep the partial compaction, but make it concurrent
• Set time budgets on the STW phases
• Fall back to concurrent work once time budget is exceeded
• Repeat incremental concurrent work and time budgeted STW phases until work is
done.
• Enlist application threads to help with concurrent compaction, aka relocation work – this
is known as ‘self-healing’
But I Can’t Deal With Those Tail Latencies
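The budgeted, incremental approach above can be sketched as a loop that does a bounded amount of work per pause and repeats until the work is done. This is a toy model: costs are abstract units standing in for elapsed time, and `runInIncrements` and `budgetPerPause` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class BudgetedPhasesSketch {
    /**
     * Drains the work queue in budgeted increments and returns how many
     * increments (pauses, in this toy model) were needed.
     */
    static int runInIncrements(Deque<Integer> workCosts, int budgetPerPause) {
        int pauses = 0;
        while (!workCosts.isEmpty()) {
            pauses++;
            int spent = 0;
            // Always process at least one item so the loop makes progress,
            // then continue only while the next item still fits the budget.
            do {
                spent += workCosts.removeFirst();
            } while (!workCosts.isEmpty() && spent + workCosts.peekFirst() <= budgetPerPause);
            // A real collector would resume the application threads here and
            // carry on with concurrent work before the next pause.
        }
        return pauses;
    }

    public static void main(String[] args) {
        Deque<Integer> work = new ArrayDeque<>(List.of(3, 4, 2, 5, 1));
        System.out.println("pauses needed: " + runInIncrements(work, 6));
    }
}
```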
11. Let’s Design a Low-Latency Sensitive GC
[Diagram: a Low-Latency Sensitive GC built on a Regionalized Heap, with Partial/Incremental Compaction, Concurrent Marking and Self Healing; Generational (Young and Old, each with Maintenance Barriers); Parallel Work.]
[Diagram: Z GC. The Generational part (Young and Old, with Maintenance Barriers) is marked "Not There Yet".]
13. ZGC is an adaptive, near-real-time, scalable, predictable low-latency
collector
• It can guarantee sub-millisecond pause times
• The GC pause doesn’t increase with the application heap, live dataset or
the root set sizes
• It can span heap sizes from 8 MB up to 16 TB!
• It works concurrently with your application and strives to keep the
application throughput reduction to no more than 15%!
Z GC Design Goals
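To check which collector a JVM is actually running and how much collection work it has reported, the standard `java.lang.management` API can be queried. A minimal sketch: the class name `ActiveCollectors` is made up here, and the bean names printed depend on the collector and the JDK version. Z GC is selected with `-XX:+UseZGC` (before JDK 15 it also needs `-XX:+UnlockExperimentalVMOptions`).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {
    /** Names of the collector beans registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // getCollectionCount/getCollectionTime are cumulative since JVM start;
        // either may be -1 if the collector does not report the value.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```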
14. Z GC Core Concept – Colored Pointers
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
[Diagram: layout of a 64-bit colored pointer]
• Bits 0–43: Object Address
• Bits 44–47: metadata bits Marked0, Marked1, Remapped, Finalizable
• Bits 48–63: Unused
• Marked0/Marked1: Object is known to be marked?
• Remapped: Object is known to not be pointing into the relocation set?
• Finalizable: Object is reachable only through a Finalizer?
Metadata stored in the unused bits of the 64-bit pointers
Virtual address mapping/tagging
Multi-mapping on x86-64, aarch64
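The pointer layout can be modelled with plain bit masks on a `long`. This is a toy model, not HotSpot code: the class and method names are hypothetical, and the order of the four metadata bits (Marked0 lowest) is an assumption taken from the order of the labels on the slide.

```java
final class ColoredPointerSketch {
    static final int ADDRESS_BITS = 44;                        // bits 0..43, 2^44 bytes = 16 TB
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    // Metadata ("color") bits 44..47; the exact order is assumed.
    static final long MARKED0     = 1L << ADDRESS_BITS;
    static final long MARKED1     = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED    = 1L << (ADDRESS_BITS + 2);
    static final long FINALIZABLE = 1L << (ADDRESS_BITS + 3);

    /** Combines an object address with one or more color bits. */
    static long color(long address, long colorBits) {
        return (address & ADDRESS_MASK) | colorBits;
    }

    /** Strips the metadata bits and returns the plain object address. */
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** True if the pointer carries the given color bit. */
    static boolean has(long pointer, long colorBit) {
        return (pointer & colorBit) != 0;
    }

    public static void main(String[] args) {
        long p = color(0x1234L, REMAPPED);
        System.out.printf("pointer=0x%x address=0x%x remapped=%b marked0=%b%n",
                p, address(p), has(p, REMAPPED), has(p, MARKED0));
    }
}
```

Because the color lives in the pointer itself, a single mask test on a loaded reference is enough to tell whether the collector still has work to do for it.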
18. Barriers – Loaded Reference Barrier
• Updates a “bad” reference to a “good” reference
• Can be a self-healing/repairing barrier when it also updates the source memory
location
• Imposes a set of invariants:
• “All visible loaded reference values will be safely ‘marked through’ by the
collector, if they haven’t been already.
• All visible loaded reference values point to the current location of the
safely accessible contents of the target objects they refer to.”
Tene, G.; Iyengar, B. & Wolf, M. (2011), C4: The Continuously Concurrent Compacting Collector, in 'Proceedings
of the international symposium on Memory management', ACM, New York, NY, USA, pp. 79-88.
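A loaded reference barrier with self-healing can be sketched as follows. This is a simplified model, not the Z GC or C4 implementation: a heap slot is an `AtomicLong` holding a colored pointer, `goodColor` is whatever bit the current phase treats as good, and `remap` is a hypothetical hook that returns an object's current address.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

final class LoadBarrierSketch {
    static final long ADDRESS_MASK = (1L << 44) - 1;

    private final long goodColor;          // color bit the current GC phase treats as "good"
    private final LongUnaryOperator remap; // maps an address to the object's current address
    int slowPathHits = 0;                  // demo counter only, not thread-safe

    LoadBarrierSketch(long goodColor, LongUnaryOperator remap) {
        this.goodColor = goodColor;
        this.remap = remap;
    }

    /** Loads a reference from a slot and never hands a "bad" reference to the caller. */
    long load(AtomicLong slot) {
        long ref = slot.get();
        if (ref == 0 || (ref & goodColor) != 0) {
            return ref; // fast path: null or already good
        }
        slowPathHits++;
        long good = remap.applyAsLong(ref & ADDRESS_MASK) | goodColor;
        // Self-healing: write the good reference back so later loads of this
        // slot take the fast path. If another thread changed the slot in the
        // meantime, the CAS fails and that thread's value is kept.
        slot.compareAndSet(ref, good);
        return good;
    }

    public static void main(String[] args) {
        long remapped = 1L << 46;
        LoadBarrierSketch barrier = new LoadBarrierSketch(remapped, addr -> addr + 0x1000);
        AtomicLong slot = new AtomicLong(0x2000L | (1L << 44)); // stale color
        System.out.printf("first load:  0x%x%n", barrier.load(slot));
        System.out.printf("second load: 0x%x (slow path hits: %d)%n",
                barrier.load(slot), barrier.slowPathHits);
    }
}
```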
19. Concurrent Compaction
Load barrier to detect object pointers into the collection set
Can be self-healing
Off-heap forwarding tables enable immediate release and reuse of
virtual and physical memory
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
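A minimal sketch of the forwarding-table idea: the old-to-new address mapping lives outside the page being evacuated, so the page can be released as soon as its live objects have moved. A `ConcurrentHashMap` stands in for the real table here, and `relocate` with its bump-pointer allocation is hypothetical. Whichever thread reaches an object first (GC or application) performs the move, and every later caller sees the same new address.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class ForwardingTableSketch {
    // Old address -> new address, kept outside the page being evacuated.
    private final Map<Long, Long> forwarding = new ConcurrentHashMap<>();
    private final AtomicLong nextFree; // bump pointer in the target page

    ForwardingTableSketch(long firstFreeAddress) {
        this.nextFree = new AtomicLong(firstFreeAddress);
    }

    /**
     * Returns the object's new address. The first caller claims space in the
     * target page (where a real collector would copy the object); concurrent
     * and later callers get the already recorded address.
     */
    long relocate(long oldAddress, long sizeInBytes) {
        return forwarding.computeIfAbsent(oldAddress, a -> nextFree.getAndAdd(sizeInBytes));
    }

    /** True once a forwarding entry exists for the old address. */
    boolean isForwarded(long oldAddress) {
        return forwarding.containsKey(oldAddress);
    }

    public static void main(String[] args) {
        ForwardingTableSketch table = new ForwardingTableSketch(0x10000L);
        System.out.printf("0x100 -> 0x%x%n", table.relocate(0x100L, 32));
        System.out.printf("0x100 -> 0x%x (second lookup)%n", table.relocate(0x100L, 32));
        System.out.printf("0x200 -> 0x%x%n", table.relocate(0x200L, 16));
    }
}
```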
21. In order to provide more near-real-time control over the pauses, Z GC has brought many
improvements to OpenJDK HotSpot. Here’s a list of a few of the optimizations:
• Thread local handshakes – JDK 10
• Load barriers and colored pointers – JDK 11
• Concurrent reference processing – JDK 11
• Concurrent class unloading – JDK 12
• Uncommit unused memory – JDK 13
• Windows and macOS support; Parallel PreTouch – JDK 14
• Compressed class pointers – JDK 15
• Concurrent Thread Stack Scanning – JDK 16
• Extended AArch64 support – JDK 16 (Windows), JDK 17 (macOS)
Z GC Is Production Ready in JDK 15
22. Thread Local Handshakes vs Global STW
[Diagram: two timelines, one per approach, each running from Safepoint Requested to GC Completed. Labels: Application Threads, Time To Safepoint (TTSP), Handshakes, GC Threads.]
31. Under Pressure : High Allocation Rate, Short-Lived Objects
[Chart series: Live At Mark Start, Live At Relocation End]
32. Reasons For Triggering a GC Cycle
Cause Name: Description
• Timer: When the timer is up and no other GC has been performed yet.
• Warmup: Based on heap occupancy, if no other GC has been performed yet.
• Allocation Rate: Based on high allocation rates and the possibility of running out of heap space.
• Allocation Stall: Mutator blocked due to lack of heap space.
• Proactive: To maintain lower heap sizes, if occupancy has increased by 10% since the last GC or 5 minutes have passed since.
• High Utilization: Avoids a GC due to the ‘Allocation Rate’ trigger by preventively triggering a GC if the heap is 95% occupied and the application has a low allocation rate.
[Chart: Count of GC cycles per cause: Timer, Warmup, Allocation Rate, Allocation Stall, Proactive, High Utilization]
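Two of the rules in the table can be written down as a small decision function. This is only a reading of the slide, not Z GC's actual heuristics: "10%" is interpreted as 10% of the maximum heap capacity, the order in which the rules are checked is an assumption, and all names are hypothetical.

```java
import java.time.Duration;

final class GcTriggerSketch {
    enum Cause { NONE, PROACTIVE, HIGH_UTILIZATION }

    /** Evaluates the "High Utilization" and "Proactive" rules using integer arithmetic. */
    static Cause evaluate(long usedNow, long usedAfterLastGc, long maxCapacity,
                          Duration sinceLastGc, boolean lowAllocationRate) {
        // High Utilization: heap is at least 95% occupied and allocation rate is low.
        if (usedNow * 100 >= maxCapacity * 95 && lowAllocationRate) {
            return Cause.HIGH_UTILIZATION;
        }
        // Proactive: occupancy grew by 10% of capacity since the last GC,
        // or 5 minutes have passed since then.
        boolean grewTenPercent = (usedNow - usedAfterLastGc) * 10 >= maxCapacity;
        boolean fiveMinutesPassed = sinceLastGc.compareTo(Duration.ofMinutes(5)) >= 0;
        if (grewTenPercent || fiveMinutesPassed) {
            return Cause.PROACTIVE;
        }
        return Cause.NONE;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(960, 900, 1000, Duration.ofMinutes(1), true));
        System.out.println(evaluate(500, 390, 1000, Duration.ofMinutes(1), false));
        System.out.println(evaluate(420, 400, 1000, Duration.ofMinutes(1), false));
    }
}
```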
39. Under Pressure : High Allocation Rate, Medium-Lived Objects
[Chart series: Live At Mark Start, Live At Relocation End]
40. Reasons For Triggering a GC Cycle
[Chart: Count of GC cycles per cause: Timer, Warmup, Allocation Rate, Allocation Stall, Proactive, High Utilization]