ZGC-SnowOne.pdf

Z Garbage Collector –
Look Ma ~No Pause!
- Monica Beckwith
- @mon_beck

Agenda
• The Basic Principles of an Adaptive, Predictable Garbage Collector
• Designing a GC
• Introduction to Z Garbage Collector
• Design Considerations
• Phases
• Production Readiness
• Performance Impact and Considerations
• Comparison with G1 GC

The Basic Principles of an Adaptive,
Predictable Garbage Collector

Designing a Garbage Collector (GC)
OpenJDK HotSpot collectors are designed with different optimization
goals, which lead to different algorithmic choices.
• E.g., a generational collector helps with scaling and maximizing
throughput
• Having multiple GC threads work in parallel shortens the time it
takes to complete the GC work

Let’s Design a GC
[Diagram: GC design choices. Generational: Young and Old generations, each with Maintenance Barriers. Parallel Work: Stop-the-World Threads and Concurrent Threads.]

Designing a Throughput Maximizing GC
The goal is to avoid fragmentation without taking resources away
from the application:
• The work of marking and compaction can be done in a single
stop-the-world (STW) pause/collection known as the Full GC.
• Avoid concurrent work, as it may take resources away from
the application threads.

Let’s Design a Throughput Maximizing GC
[Diagram: GC as Throughput Maximizer. Options shown: Young and Old generations; Stop-the-World Threads and Concurrent Threads.]

But I Can’t Deal With Those Long Pauses
A full compacting GC:
• Can become unpredictable and cause stalls that make you miss your
system-level objectives!
• May not scale well when your application has higher promotion rates with lots of
transients
In order to scale well and add some predictability, we need to
add:
• Regionalized heap &
• Partial compaction with concurrent marking

Let’s Design a Latency Sensitive GC
[Diagram: Latency Sensitive GC. Building blocks shown: Regionalized Heap, Partial Compaction, Concurrent Marking, Young and Old generations.]

But I Can’t Deal With Those Tail Latencies
So, you need a predictable, scalable, low-latency GC?
• Keep the partial compaction, but make it concurrent
• Set time budgets on the STW phases
• Fall back to concurrent work once the time budget is exceeded
• Repeat incremental concurrent work and time-budgeted STW phases until the work is
done.
• Enlist application threads to help with concurrent compaction, aka relocation work; this
is known as ‘self-healing’

Let’s Design a Low-Latency Sensitive GC
[Diagram: Low-Latency Sensitive GC. Building blocks shown: Regionalized Heap, Partial/Incremental Compaction, Concurrent Marking, Self Healing, Maintenance Barriers. Generational (Young and Old): Z GC is not there yet.]

Introduction to Z Garbage Collector
Design Considerations

Z GC Design Goals
ZGC is an adaptive, near-real-time, scalable, predictable low-latency
collector
• It can guarantee sub-millisecond pause times
• The GC pause doesn’t increase with the application heap, live data set or
the root set sizes
• It can span heap sizes from 8 MB up to 16 TB!
• It works concurrently with your application and strives to keep the
reduction in application throughput to no more than 15%!

Z GC Core Concept – Colored Pointers
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
[Diagram: layout of a 64-bit colored pointer]
• Bits 0–43: Object Address
• Bits 44–47: metadata bits Marked0, Marked1, Remapped, Finalizable
• Bits 48–63: Unused
• Marked0/Marked1: Object is known to be marked?
• Remapped: Object is known to not be pointing into the relocation set?
• Finalizable: Object is reachable only through a Finalizer?
Metadata is stored in the unused bits of the 64-bit pointers
Virtual address mapping/tagging
Multi-mapping on x86-64, aarch64
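The pointer layout above can be sketched with plain bit masks. This is a minimal illustration, not HotSpot code: the class and method names are hypothetical, and placing Marked0, Marked1, Remapped and Finalizable at bits 44, 45, 46 and 47 is an assumption that follows the order in which the diagram lists them.

```java
// Illustrative only: a colored pointer modeled as a Java long.
final class ColoredPointerSketch {
    static final int ADDRESS_BITS = 44;                        // bits 0-43 hold the object address
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    // Assumed positions of the four metadata bits (bits 44-47).
    static final long MARKED0     = 1L << 44;
    static final long MARKED1     = 1L << 45;
    static final long REMAPPED    = 1L << 46;
    static final long FINALIZABLE = 1L << 47;
    static final long METADATA_MASK = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    // Strip the metadata bits and keep only the address.
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    // Extract only the metadata bits (the "color").
    static long color(long pointer) {
        return pointer & METADATA_MASK;
    }

    // Re-color a pointer without changing the address it refers to.
    static long withColor(long pointer, long color) {
        return address(pointer) | color;
    }

    // A pointer is "good" when its color matches the color the GC currently expects.
    static boolean isGood(long pointer, long goodColor) {
        return color(pointer) == goodColor;
    }

    public static void main(String[] args) {
        long p = withColor(0x1234_5678L, MARKED0);
        System.out.println(Long.toHexString(address(p))); // 12345678
        System.out.println(isGood(p, MARKED0));           // true
        System.out.println(isGood(p, REMAPPED));          // false
    }
}
```

Because the color lives in the pointer itself, checking whether a reference needs GC attention is a mask-and-compare on a value that is already in a register.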

Phases

Z GC Phases
[Diagram: timeline of Java Application Threads and GC Background Threads across one GC cycle]
• Pause Mark Start
• Concurrent Mark/Remap
• Pause Mark End
• Concurrent Prepare for Relocation
• Pause Relocate Start
• Concurrent Relocate

Concurrent Marking
[Diagram: the Heap laid out as repeating blocks 0, 1, …, n; blocks with the same number form Stripe 0, Stripe 1, …, Stripe n, worked on by GC Thread 0, GC Thread 1, …, GC Thread n.]
• Heap divided into logical stripes
• GC threads work on their own stripe
• Minimizes shared state
• Load barrier to detect loads of non-marked object pointers
• Concurrent reference processing
• Thread local handshakes
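The striping idea can be sketched as a mapping from a heap address to a stripe, with one work queue per stripe. This is a simplified illustration under stated assumptions: the class name, the 2 MB block size and the round-robin mapping are hypothetical choices made for the example, not values taken from the slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative only: routes addresses to per-stripe marking queues.
final class MarkStripesSketch {
    static final long BLOCK_BYTES = 2L * 1024 * 1024; // assumed block size

    private final int stripeCount;
    private final List<Deque<Long>> stripeQueues = new ArrayList<>();

    MarkStripesSketch(int stripeCount) {
        this.stripeCount = stripeCount;
        for (int i = 0; i < stripeCount; i++) {
            stripeQueues.add(new ArrayDeque<>());
        }
    }

    // Consecutive blocks are assigned to stripes 0, 1, ..., n, then wrap around.
    int stripeOf(long address) {
        return (int) ((address / BLOCK_BYTES) % stripeCount);
    }

    // Queue an object address on the stripe that owns it.
    void push(long address) {
        stripeQueues.get(stripeOf(address)).addLast(address);
    }

    // In a real collector, GC thread i would drain only queue i.
    int pendingOn(int stripe) {
        return stripeQueues.get(stripe).size();
    }

    public static void main(String[] args) {
        MarkStripesSketch stripes = new MarkStripesSketch(4);
        stripes.push(0L);               // block 0 -> stripe 0
        stripes.push(BLOCK_BYTES);      // block 1 -> stripe 1
        stripes.push(5 * BLOCK_BYTES);  // block 5 -> stripe 1
        System.out.println(stripes.pendingOn(0)); // 1
        System.out.println(stripes.pendingOn(1)); // 2
    }
}
```

Since each address belongs to exactly one stripe, a thread that stays on its own stripe rarely touches another thread's queue, which is the "minimizes shared state" point above.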

Barriers – Loaded Reference Barrier
• Update a “bad” reference to a “good” reference
• Can be a self-healing/repairing barrier when updating the source memory
location
• Imposes a set of invariants:
• “All visible loaded reference values will be safely ‘marked through’ by the
collector, if they haven’t been already.
• All visible loaded reference values point to the current location of the
safely accessible contents of the target objects they refer to.”
Tene, G.; Iyengar, B. & Wolf, M. (2011), C4: The Continuously Concurrent Compacting Collector, in 'Proceedings
of the International Symposium on Memory Management', ACM, New York, NY, USA, pp. 79-88.
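A self-healing loaded reference barrier can be sketched as a fast-path color check followed by a compare-and-set that repairs the source slot. This is a minimal sketch, not the HotSpot barrier: the class, the `goodColor` field and the bit positions are assumptions for illustration, and the real slow path (marking or relocating the object) is reduced to re-coloring the pointer.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: a field slot is modeled as an AtomicLong holding a colored pointer.
final class LoadBarrierSketch {
    static final long ADDRESS_MASK = (1L << 44) - 1; // assumed: bits 0-43 are the address
    static final long MARKED0  = 1L << 44;           // assumed metadata bit positions
    static final long REMAPPED = 1L << 46;

    // The color the collector currently treats as "good" (hypothetical global state).
    static volatile long goodColor = MARKED0;

    // Every reference load goes through this method.
    static long loadReference(AtomicLong slot) {
        long ref = slot.get();
        if (ref == 0 || (ref & ~ADDRESS_MASK) == goodColor) {
            return ref; // fast path: null or already a good reference
        }
        // Slow path: a real collector would mark or relocate the object here.
        long healed = (ref & ADDRESS_MASK) | goodColor;
        // Self-heal: write the good reference back so later loads hit the fast path.
        // If another thread healed the slot first, the failed CAS is harmless.
        slot.compareAndSet(ref, healed);
        return healed;
    }

    public static void main(String[] args) {
        AtomicLong field = new AtomicLong(0x1000L | REMAPPED); // a "bad" reference
        long loaded = loadReference(field);
        System.out.println(loaded == (0x1000L | MARKED0)); // true
        System.out.println(field.get() == loaded);          // true: the slot was healed
    }
}
```

The write-back is what makes the barrier self-healing: the cost of the slow path is paid once per stale slot rather than on every load.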

Concurrent Compaction
• Load barrier to detect object pointers into the collection set
• Can be self-healing
• Off-heap forwarding tables make it possible to immediately release and reuse
virtual and physical memory
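The role of the forwarding table can be sketched as a map from old to new addresses that lives outside the pages being evacuated. This is a simplified model under stated assumptions: the class name, the bump-pointer allocation and the use of `ConcurrentHashMap` are illustrative choices, and the actual object copy is omitted.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: old-address -> new-address mapping kept outside the evacuated page.
final class ForwardingTableSketch {
    private final ConcurrentHashMap<Long, Long> forwarding = new ConcurrentHashMap<>();
    private final AtomicLong nextFree; // bump pointer into the target page

    ForwardingTableSketch(long targetPageStart) {
        this.nextFree = new AtomicLong(targetPageStart);
    }

    // Called by a GC thread or by an application thread that hit the load barrier.
    // Whoever arrives first picks the new location; everyone else reuses it.
    long relocateOrForward(long oldAddress, long sizeBytes) {
        return forwarding.computeIfAbsent(oldAddress, a -> nextFree.getAndAdd(sizeBytes));
    }

    public static void main(String[] args) {
        ForwardingTableSketch table = new ForwardingTableSketch(1000L);
        System.out.println(table.relocateOrForward(10L, 16L)); // 1000
        System.out.println(table.relocateOrForward(10L, 16L)); // 1000 (already forwarded)
        System.out.println(table.relocateOrForward(20L, 8L));  // 1016
    }
}
```

Because the mapping does not live in the old page, that page can be handed back as soon as its live objects have moved, even while stale pointers to it still exist elsewhere in the heap.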

Production Readiness

Z GC Is Production Ready in JDK 15
In order to provide more near-real-time control over the pauses, Z GC has brought many
improvements to OpenJDK HotSpot. Here’s a list of a few of the optimizations:
• Thread local handshakes – JDK 10
• Load barriers and colored pointers – JDK 11
• Concurrent reference processing – JDK 11
• Concurrent class unloading – JDK 12
• Uncommit unused memory – JDK 13
• Windows and macOS support; Parallel PreTouch – JDK 14
• Compressed class pointers – JDK 15
• Concurrent Thread Stack Scanning – JDK 16
• Extended AArch64 support – JDK 16 (Windows), JDK 17 (macOS)
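To confirm which collector a JVM is actually running, the standard `java.lang.management` API can list the active collector beans. The sketch below uses a hypothetical class name; ZGC is selected with `-XX:+UseZGC`, and releases before JDK 15 additionally need `-XX:+UnlockExperimentalVMOptions`. The bean names printed depend on the collector and JDK version, so no expected output is shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: reports the garbage collectors registered in this JVM.
final class ActiveCollectors {
    // Collect the names of all registered collector beans.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "undefined".
            System.out.println(bean.getName()
                    + ": collections=" + bean.getCollectionCount()
                    + ", timeMs=" + bean.getCollectionTime());
        }
    }
}
```

Running it once with the default collector and once with `-XX:+UseZGC` is a quick way to verify that the flag took effect before starting a benchmark.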

Thread Local Handshakes vs Global STW
[Diagram: two timelines comparing a global STW safepoint with thread-local handshakes. Labels: Application Threads, Safepoint Requested, Time To Safepoint (TTSP), GC Threads, Handshakes, GC Completed.]

Performance Impact and Considerations
Comparison with G1 GC

Effects of Severe Fragmentation
Due to a large live data set (LDS), varying object sizes and 1 GB/s allocation rates

Test Configuration
System: Arm64 running Linux
JDK version: JDK 16 EA (build 16+35-2229)
Benchmark: HeapFragger
Heap Size: 200 GB (initial and max)
LDS: 100 GB
GCs Tested: ZGC and G1GC
Additional JVM Arguments: -XX:+UseLargePages -XX:+AlwaysPreTouch
Benchmark Allocation Rate: 1 GB/s
Object Sizes: min: 128 B; max: 32.00032 MB

A Quick Look: Allocation Rate and Promotions
[Charts: Allocation Rate; Promotions]

Graceful Degradation: ZGC with Severe Fragmentation
[Chart: Stalls Observed By The Application]
Worst Case GC Pause Time = ~0.55 ms

Application Latency: G1 GC vs Z GC @ 1 GB/s Allocation Rate
[Chart: series G1 GC and Z GC; chart label: 16GB/s]

Effects of Large Object Sizes and Higher
Allocation Rates (with 50% LDS Pressure)

Test Configuration
JDK version: JDK 17 GA (build 17+35)
Benchmark: HyperAlloc
LDS: 40 GB
Benchmark Threads: 80
Benchmark Allocation Rates: 4 GB/s, 8 GB/s, 16 GB/s
Object Sizes: min: 128 B; max: 32.00032 MB (humongous objects)

Under Pressure: High Allocation Rate, Short-Lived Objects
[Charts: Live At Mark Start; Live At Relocation End]

Reasons For Triggering a GC Cycle
Cause Name: Description
• Timer: When the timer is up and no other GC has been performed yet.
• Warmup: Based on heap occupancy, if no other GC has been performed yet.
• Allocation Rate: Based on high allocation rates and the possibility of running out of heap space.
• Allocation Stall: Mutator blocked due to lack of heap space.
• Proactive: To maintain lower heap sizes, if occupancy has increased by 10% since the last GC or 5 minutes have passed since then.
• High Utilization: Avoids a GC due to the ‘Allocation Rate’ trigger by preventively triggering a GC if the heap is 95% occupied and the application has a low allocation rate.
[Chart: Count per cause. Series: Timer, Warmup, Allocation Rate, Allocation Stall, Proactive, High Utilization]
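Two of the trigger rules above are simple enough to write down as predicates. This is a loose sketch of the descriptions in the table, not the HotSpot heuristics: the class and method names are hypothetical, occupancy is expressed in whole percent of the heap, and reading "increases by 10%" as ten percentage points is an assumption.

```java
import java.time.Duration;

// Illustrative only: the "Proactive" and "High Utilization" rules as plain predicates.
final class GcTriggerSketch {
    static final int PROACTIVE_GROWTH_PERCENT = 10;
    static final Duration PROACTIVE_INTERVAL = Duration.ofMinutes(5);
    static final int HIGH_UTILIZATION_PERCENT = 95;

    // Proactive: occupancy grew by 10 points since the last GC, or 5 minutes have passed.
    static boolean proactive(int occupancyPercentNow,
                             int occupancyPercentAtLastGc,
                             Duration sinceLastGc) {
        boolean grew = occupancyPercentNow - occupancyPercentAtLastGc >= PROACTIVE_GROWTH_PERCENT;
        boolean timeUp = sinceLastGc.compareTo(PROACTIVE_INTERVAL) >= 0;
        return grew || timeUp;
    }

    // High Utilization: the heap is nearly full while the application allocates slowly.
    static boolean highUtilization(int occupancyPercentNow, boolean lowAllocationRate) {
        return occupancyPercentNow >= HIGH_UTILIZATION_PERCENT && lowAllocationRate;
    }

    public static void main(String[] args) {
        System.out.println(proactive(60, 50, Duration.ofMinutes(1))); // true: grew by 10 points
        System.out.println(proactive(55, 50, Duration.ofMinutes(1))); // false
        System.out.println(proactive(55, 50, Duration.ofMinutes(5))); // true: 5 minutes passed
        System.out.println(highUtilization(95, true));                // true
        System.out.println(highUtilization(95, false));               // false
    }
}
```

Integer percentages are used on purpose: with floating-point fractions, a comparison such as 0.60 - 0.50 >= 0.10 can fail because of rounding.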

Very High Allocation Rates and Effects on ZGC
Worst Case Pause Time

Very High Allocation Rates and Effects on Application

A Quick Comparison With G1
[Charts: GC Pause Times; Latency Observed by Application]

Application Latencies Comparison Between G1GC and ZGC

Effects of Large Object Sizes and Higher
Allocation Rates (with 50% LDS & 25% MDS
Pressure)

Test Configuration
JDK version: JDK 17 GA (build 17+35)
Benchmark: HyperAlloc
LDS: 40 GB
MDS: 20 GB
Benchmark Threads: 80
Benchmark Allocation Rates: 4 GB/s, 8 GB/s, 16 GB/s
Object Sizes: min: 128 B; max: 32.00032 MB (humongous objects)

Under Pressure: High Allocation Rate, Medium-Lived Objects
[Charts: Live At Mark Start; Live At Relocation End]

Very High Allocation Rate and Effects on ZGC
Worst Case Pause Time

Allocation Stalls and Effects on Application

A Quick Comparison With G1
[Charts: Latency Observed by Application; GC Pause Times]

© Copyright Microsoft Corporation. All rights reserved.
https://wiki.openjdk.java.net/display/zgc/Main
https://malloc.se/blog/zgc-jdk15
http://cr.openjdk.java.net/~pliden/slides/ZGC-OracleDevLive-2020.pdf
Tools:
Censum and some parsing scripts and JFreeChart + GCHisto
jHiccup: https://support.azul.com/hc/en-us/articles/217877803-How-to-Analyze-and-Visualize-jHiccup-Logs
HistogramLogAnalyzer: https://github.com/HdrHistogram/HistogramLogAnalyzer
Thank You!
