1. Java Team
OpenJDK: In the New Age of Concurrent Garbage Collectors
HotSpot’s Regionalized GCs
Monica Beckwith
JVM Performance
java-performance@Microsoft
@mon_beck
2. Agenda
Part 1 – Groundwork & Commonalities
Laying the Groundwork
Stop-the-world (STW) vs concurrent collection
Heap layout – regions and generations
Basic Commonalities
Copying collector – from and to spaces
Regions – occupied and free
Collection set and priority
July 10th, 2020
3. Agenda
Part 2 – Introduction & Differences
Introduction to G1, Shenandoah and Z GCs
Algorithm
Basic Differences
GC phases
Marking
Barriers
Compaction
10. Copying aka Evacuating Collector
[Figure: heap split into From Space and To Space, with objects in From Space; GC roots: thread 1 to thread N stacks, static variables, any JNI references]
11. Copying aka Evacuating Collector
[Figure: From Space and To Space; a subset of the objects shown alongside the GC roots (thread 1 to thread N stacks, static variables, any JNI references)]
12. Copying aka Evacuating Collector
[Figure: heap contents after copying]
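The copying step shown on these slides can be sketched in Java. This is a toy model under stated assumptions, not HotSpot code: the `Obj` class, the `evacuate` method and the identity-map forwarding table are hypothetical names introduced for illustration.

```java
import java.util.*;

// Toy model of a copying (evacuating) collector; all names are hypothetical.
class CopyingSketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Copies everything reachable from the roots into a fresh "to" space and
    // returns it; the roots list is updated to point at the copies.
    static List<Obj> evacuate(List<Obj> roots) {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>(); // from-object -> to-copy
        List<Obj> toSpace = new ArrayList<>();
        Deque<Obj> scan = new ArrayDeque<>(); // copies whose fields still point into from-space

        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), forwarding, toSpace, scan));
        }
        while (!scan.isEmpty()) {
            Obj copied = scan.poll();
            for (int i = 0; i < copied.refs.size(); i++) {
                copied.refs.set(i, copy(copied.refs.get(i), forwarding, toSpace, scan));
            }
        }
        return toSpace;
    }

    private static Obj copy(Obj from, Map<Obj, Obj> forwarding,
                            List<Obj> toSpace, Deque<Obj> scan) {
        Obj existing = forwarding.get(from);
        if (existing != null) return existing; // already evacuated
        Obj to = new Obj(from.name);
        to.refs.addAll(from.refs);             // fields are fixed up when 'to' is scanned
        forwarding.put(from, to);
        toSpace.add(to);
        scan.add(to);
        return to;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), garbage = new Obj("garbage");
        a.refs.add(b);
        b.refs.add(a);       // cycle between live objects
        garbage.refs.add(a); // not reachable from the roots, so never copied
        List<Obj> roots = new ArrayList<>(List.of(a));
        System.out.println(evacuate(roots).size());
    }
}
```

Only reachable objects are touched, so the cost follows the amount of live data rather than the size of From Space.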
14. Occupied and Free Regions
[Figure: heap divided into regions, some occupied and some free]
• List of free regions
• In case of a generational heap (like G1), the occupied regions could be young, old or humongous
16. Collection Priority and Collection Set
[Figure: heap regions, occupied and free; occupied regions hold differing amounts of garbage]
• Priority is to reclaim regions with most garbage
• The candidate regions for collection/reclamation/relocation are said to be in a collection set
• There are thresholds based on how expensive a region can get and on the maximum number of regions to collect
• Incremental collection aka incremental compaction or partial compaction
• Usually needs a threshold that triggers the compaction
• Stops after the desired reclamation threshold or free-ness threshold is reached
• Doesn’t need to be stop-the-world
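The selection rules above can be sketched as a small Java routine. It is only an illustration: the `Region` class, the `choose` method and its three threshold parameters are hypothetical, and real collectors use more elaborate cost predictions.

```java
import java.util.*;

// Hypothetical sketch of collection-set selection: most garbage first, bounded by thresholds.
class CollectionSetSketch {
    static final class Region {
        final int id;
        final int liveBytes;    // what would have to be copied (the "expense")
        final int garbageBytes; // what would be reclaimed
        Region(int id, int liveBytes, int garbageBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
            this.garbageBytes = garbageBytes;
        }
    }

    static List<Region> choose(List<Region> occupied, int maxRegions,
                               int maxLiveBytes, long reclaimTarget) {
        List<Region> candidates = new ArrayList<>(occupied);
        // Priority: regions with the most garbage come first.
        candidates.sort(Comparator.comparingInt((Region r) -> r.garbageBytes).reversed());
        List<Region> collectionSet = new ArrayList<>();
        long reclaimed = 0;
        for (Region r : candidates) {
            // Stop once the region limit or the reclamation target is reached.
            if (collectionSet.size() >= maxRegions || reclaimed >= reclaimTarget) break;
            if (r.liveBytes > maxLiveBytes) continue; // too expensive to evacuate
            collectionSet.add(r);
            reclaimed += r.garbageBytes;
        }
        return collectionSet;
    }

    public static void main(String[] args) {
        List<Region> occupied = List.of(
            new Region(0, 10, 90), new Region(1, 80, 20), new Region(2, 30, 70),
            new Region(3, 95, 5), new Region(4, 60, 40));
        for (Region r : choose(occupied, 3, 70, 150)) {
            System.out.println("region " + r.id);
        }
    }
}
```

With these sample numbers the routine picks regions 0 and 2 and then stops, because the reclamation target of 150 is already met.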
20. GC Phases of Marking and Compaction
G1 GC Gist
Initial Mark: Mark objects directly reachable by the roots
Concurrent Root Region Scanning: Since initial mark is piggy-backed on a young collection, the survivor regions need to be scanned
Concurrent Marking: Snapshot-at-the-beginning (SATB) algorithm
Final Marking: Drain SATB buffers; traverse unvisited live objects
Cleanup: Identify and free completely free regions; sort regions based on liveness and expense
STW Compaction: Move objects in collection set to “to” regions; free regions in collection set
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
21. Concurrent Marking
Logical snapshot of the heap
SATB marking guarantees that all garbage objects that are present at the start of the
concurrent marking phase will be identified by the snapshot
But the application mutates its object graph
Any new objects are considered live
For any reference update, the mutator needs to log the previous value in a log queue
This is enabled by a pre-write barrier
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
•https://www.jfokus.se/jfokus17/preso/Write-Barriers-in-Garbage-First-Garbage-Collector.pdf
Snapshot-at-the-beginning (SATB) Algorithm
22. Barriers
SATB Pre-Write Barrier
The pseudo-code of the pre-write barrier for an assignment of the form x.f := y is:
if (marking_is_active) {
    pre_val := x.f;
    if (pre_val != NULL) {
        satb_enqueue(pre_val);
    }
}
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
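The pseudo-code above can be mimicked in plain Java to see what ends up in the SATB queue. This is a sketch under assumptions: `Node`, `writeField` and the static `markingIsActive` flag are hypothetical stand-ins, and the real barrier is emitted by the JIT around reference stores rather than written by hand.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simulation of the SATB pre-write barrier for x.f := y.
class SatbSketch {
    static final class Node {
        final String name;
        Node f; // the reference field being updated
        Node(String name) { this.name = name; }
    }

    static boolean markingIsActive = false;                  // simplification: one global flag
    static final Deque<Node> satbQueue = new ArrayDeque<>(); // log of overwritten values

    static void writeField(Node x, Node y) {
        if (markingIsActive) {
            Node preVal = x.f;         // value about to be overwritten
            if (preVal != null) {
                satbQueue.add(preVal); // keep it visible to the marker
            }
        }
        x.f = y;                       // the actual store
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        writeField(a, b);       // marking inactive: nothing is logged
        markingIsActive = true;
        writeField(a, c);       // b is overwritten during marking and gets logged
        System.out.println(satbQueue.peek().name);
    }
}
```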
23. Barriers
Post Write Barrier
Consider the following assignment:
object.field = some_other_object
G1 GC will issue a write barrier after the reference is updated, hence the name.
G1 GC filters the need for a barrier by way of a simple check as explained below:
(&object.field XOR &some_other_object) >> log2(RegionSize)
If the check evaluates to zero, both addresses lie in the same region and a barrier is not needed.
If the check is non-zero, G1 GC enqueues the card in the update log buffer
https://www.jfokus.se/jfokus17/preso/Write-Barriers-in-Garbage-First-Garbage-Collector.pdf
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
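The cross-region filter can be tried out with plain long values standing in for addresses. This is a sketch, assuming the shift amount is the base-2 logarithm of the region size; the 1 MiB region size and the name `needsBarrier` are illustrative choices only.

```java
// Hypothetical sketch of the cross-region check used to filter the post-write barrier.
class CrossRegionFilter {
    static final int LOG_REGION_SIZE = 20; // assumption: 1 MiB regions

    // True when the field and the object it now refers to live in different regions.
    static boolean needsBarrier(long fieldAddress, long newValueAddress) {
        return ((fieldAddress ^ newValueAddress) >>> LOG_REGION_SIZE) != 0;
    }

    public static void main(String[] args) {
        System.out.println(needsBarrier(0x100000L, 0x1FFFF8L)); // same region
        System.out.println(needsBarrier(0x0FFFF8L, 0x100000L)); // adjacent regions
    }
}
```

The XOR clears every address bit the two values share, so anything left above the region-offset bits means the two addresses fall in different regions.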
24. STW Compaction
Forwarding Pointer in Header
[Figure: a Java object consists of a header and a body; the header holds the mark word, whose two low bits are shown as "b b", and a pointer to an InstanceKlass]
GC workers compete to install the forwarding pointer
From source:
• An InstanceKlass is the VM level representation of a Java class. It contains all information needed for a class at execution runtime.
• When marked the bits will be 11
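The race to install the forwarding pointer can be sketched with a compare-and-set on a simulated mark word. The names are hypothetical, the initial bit pattern is an assumption, and addresses are taken to be 8-byte aligned so that the two low bits are free for the tag.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: GC workers compete to install a forwarding pointer in the mark word.
class ForwardingSketch {
    static final long MARKED = 0b11L; // low two bits 11 = marked (forwarded)

    static final class Header {
        // Assumption: 01 in the low bits stands for an ordinary, unforwarded object.
        final AtomicLong markWord = new AtomicLong(0b01L);
    }

    // Tries to install myCopyAddress; returns the address of the copy that won.
    static long forward(Header h, long myCopyAddress) {
        long old = h.markWord.get();
        if ((old & MARKED) == MARKED) {
            return old & ~MARKED;                  // someone else already forwarded it
        }
        if (h.markWord.compareAndSet(old, myCopyAddress | MARKED)) {
            return myCopyAddress;                  // this worker won the race
        }
        return h.markWord.get() & ~MARKED;         // lost the race: use the winner's copy
    }

    public static void main(String[] args) {
        Header h = new Header();
        System.out.println(Long.toHexString(forward(h, 0x1000L)));
        System.out.println(Long.toHexString(forward(h, 0x2000L)));
    }
}
```

The second caller gets the first caller's address back, which is how a losing worker learns that its own speculative copy should be discarded.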
26. GC Phases of Marking and Compaction
Z GC Gist
Initial Mark: Mark objects directly reachable by the roots
Concurrent Marking: Striping - GC threads walk the object graph and mark
Final Marking: Traverse unvisited live objects; weak root cleaning
Concurrent Prepare for Compaction: Identify collection set; reference processing
Start Compaction: Handles roots into the collection set
Concurrent Compaction: Move objects in collection set to “to” regions
Concurrent Remap (done with Concurrent Marking of next cycle since walks the object graph): Fixup of all the pointers to now-moved objects
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
27. Striping
Heap divided into logical stripes
GC threads work on their own stripe
Minimizes shared state
Load barrier to detect loads of non-marked object pointers
Concurrent reference processing
Thread local handshakes
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
[Figure: Concurrent Marking; the heap is divided into stripes 0 to n, and GC threads 0 to n each work on their own stripe]
28. Barriers
Read Barrier – For References
Update a “bad” reference to a “good” reference
Can be a self-healing/repairing barrier when it updates the source memory location
Imposes a set of invariants –
“All visible loaded reference values will be safely “marked through” by the
collector, if they haven’t been already.
All visible loaded reference values point to the current location of the safely
accessible contents of the target objects they refer to.”
Tene, G.; Iyengar, B. & Wolf, M. (2011), C4: The Continuously Concurrent Compacting Collector, in 'Proceedings of the International Symposium on Memory Management', ACM, New York, NY, USA, pp. 79-88.
Loaded Reference Barrier
29. Example
Object o = obj.fieldA;  // Loading an object reference from heap
// Load barrier inserted right after the load:
load_barrier(register_for(o), address_of(obj.fieldA));
// The barrier's fast-path check:
if (o & bad_bit_mask) {
    slow_path(register_for(o), address_of(obj.fieldA));
}
30. Example
mov 0x20(%rax), %rbx     // Object o = obj.fieldA;
test %rbx, 0x16(%r15)    // Bad color?
jnz slow_path            // Yes -> Enter slow path and mark/relocate/remap,
                         // adjust 0x20(%rax) and %rbx
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
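The fast path and the self-healing slow path of such a load barrier can be simulated in Java with a long standing in for the reference. This is a sketch only: `BAD_COLOR`, `GOOD_COLOR` and the trivial `slowPath` are hypothetical, and the real slow path marks, relocates or remaps the object rather than just recoloring the pointer.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simulation of a self-healing load barrier.
class LoadBarrierSketch {
    static final long BAD_COLOR  = 1L << 42; // assumption: a stale color bit
    static final long GOOD_COLOR = 1L << 44; // assumption: the currently good color bit
    static long badMask = BAD_COLOR;         // flipped by the GC at phase changes
    static int slowPathCalls = 0;

    // Loads a reference from a field, applying the barrier.
    static long loadReference(AtomicLong field) {
        long ref = field.get();
        if ((ref & badMask) != 0) {          // fast-path check: bad color?
            long good = slowPath(ref);
            field.compareAndSet(ref, good);  // self-heal the source memory location
            ref = good;
        }
        return ref;
    }

    // Stand-in for mark/relocate/remap: here it only recolors the pointer.
    static long slowPath(long ref) {
        slowPathCalls++;
        return (ref & ~badMask) | GOOD_COLOR;
    }

    public static void main(String[] args) {
        AtomicLong field = new AtomicLong(0x1000L | BAD_COLOR);
        loadReference(field); // takes the slow path and heals the field
        loadReference(field); // fast path only
        System.out.println(slowPathCalls);
    }
}
```

Because the field itself is repaired, later loads of the same field stay on the fast path.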
31. Core Concept
Colored Pointers
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
[Figure: 64-bit colored pointer layout]
Bits 0-41: Object Address
Metadata bits: Marked0, Marked1, Remapped, Finalizable
Bits 46-63: Unused
Marked0/Marked1: Object is known to be marked?
Remapped: Object is known to not be pointing into the relocation set?
Finalizable: Object is reachable only through a Finalizer?
Metadata stored in the unused bits of the 64-bit pointers
Virtual address mapping/tagging
Multi-mapping on x86-64
Hardware support on SPARC, aarch64
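The pointer layout can be expressed as bit masks in Java. This is a sketch: the address width of 42 bits comes from the slide, while the exact positions assigned to the four metadata bits (42 to 45, in the order Marked0, Marked1, Remapped, Finalizable) and all method names are assumptions for illustration.

```java
// Hypothetical encoding of colored pointers as 64-bit values.
class ColoredPointerSketch {
    static final int  ADDRESS_BITS = 42;                 // bits 0..41 hold the object address
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long MARKED0      = 1L << 42;           // assumed bit positions 42..45
    static final long MARKED1      = 1L << 43;
    static final long REMAPPED     = 1L << 44;
    static final long FINALIZABLE  = 1L << 45;

    // Returns the pointer with its metadata replaced by the given color bit.
    static long color(long pointer, long colorBit) {
        return (pointer & ADDRESS_MASK) | colorBit;
    }

    // Strips the metadata, leaving only the object address.
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    static boolean has(long pointer, long colorBit) {
        return (pointer & colorBit) != 0;
    }

    public static void main(String[] args) {
        long p = color(0x1234L, REMAPPED);
        System.out.println(Long.toHexString(address(p)));
        System.out.println(has(p, REMAPPED));
        System.out.println(has(color(p, MARKED1), REMAPPED));
    }
}
```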
32. Concurrent Compaction
Load barrier to detect object pointers into the collection set
Can be self-healing
Off-heap forwarding tables make it possible to immediately release and reuse virtual and physical memory
http://cr.openjdk.java.net/~pliden/slides/ZGC-Jfokus-2018.pdf
Off-Heap Forwarding Tables
34. GC Phases of Marking and Compaction
https://wiki.openjdk.java.net/display/shenandoah/Main
Shenandoah GC Gist
Initial Mark: Mark objects directly reachable by the roots
Concurrent Marking: Snapshot-at-the-beginning (SATB) algorithm
Final Marking: Drain SATB buffers; traverse unvisited live objects; identify collection set
Concurrent Cleanup: Free completely free regions
Concurrent Compaction: Move objects in collection set to “to” regions
Initial Update Reference: Initialize the update reference phase
Concurrent Update Reference: Scans the heap linearly; update any references to objects that have moved
Final Update Reference: Update roots to point to to-region copies
Concurrent Cleanup: Free regions in collection set
35. Concurrent Marking
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
•https://www.jfokus.se/jfokus17/preso/Write-Barriers-in-Garbage-First-Garbage-Collector.pdf
Snapshot-at-the-beginning (SATB) Algorithm
36. Barriers
SATB Pre-Write Barrier - Recap
•C. Hunt, M. Beckwith, P. Parhar, B. Rutisson. Java Performance Companion.
Needed for all updates
Check if “marking-is-active”
SATB_enqueue the pre_val
37. Barriers
Read Barrier – For Concurrent Compaction
Here’s an assembly code snippet for reading a field:
mov 0x10(%rsi),%rsi ; *getfield value
Here’s what the snippet looks like with Shenandoah:
mov -0x8(%rsi),%rsi ; read of forwarding pointer at address object - 0x8
mov 0x10(%rsi),%rsi ; *getfield value
*Flood, Christine & Kennke, Roman & Dinn, Andrew & Haley, Andrew & Westrelin, Roland. (2016).
Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK. 1-9.
10.1145/2972206.2972210.
38. Barriers
Copying Write Barrier – For Concurrent Compaction
Needed for all updates to ensure to-space invariant
Check if “evacuation_in_progress”
Check if “in_collection_set” and “not_yet_copied”
CAS (fwd-ptr(obj), obj, copy)
*Flood, Christine & Kennke, Roman & Dinn, Andrew & Haley, Andrew & Westrelin, Roland. (2016).
Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK. 1-9.
10.1145/2972206.2972210.
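The read barrier and the copying write barrier can be modeled together in Java, with an `AtomicReference` playing the role of the forwarding pointer. This is a toy sketch under assumptions: the `Obj` class, its `inCollectionSet` flag and the helper names are hypothetical, and the copy is made with `new` instead of an allocation in a to-space region.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of Shenandoah-style read and copying write barriers.
class BrooksSketch {
    static final class Obj {
        // Forwarding pointer: refers to the object itself until a copy is installed.
        final AtomicReference<Obj> fwd = new AtomicReference<>();
        int value;
        boolean inCollectionSet;
        Obj(int value) { this.value = value; fwd.set(this); }
    }

    static boolean evacuationInProgress = false;

    // Read barrier: one extra dereference through the forwarding pointer.
    static Obj readBarrier(Obj obj) {
        return obj.fwd.get();
    }

    // Copying write barrier: makes sure the write lands in the to-space copy.
    static Obj writeBarrier(Obj obj) {
        Obj target = obj.fwd.get();
        if (evacuationInProgress && obj.inCollectionSet && target == obj) { // not yet copied
            Obj copy = new Obj(obj.value); // speculative copy
            // CAS(fwd-ptr(obj), obj, copy): only one thread's copy wins.
            target = obj.fwd.compareAndSet(obj, copy) ? copy : obj.fwd.get();
        }
        return target;
    }

    static void writeValue(Obj obj, int v) { writeBarrier(obj).value = v; }
    static int readValue(Obj obj) { return readBarrier(obj).value; }

    public static void main(String[] args) {
        Obj o = new Obj(1);
        o.inCollectionSet = true;
        evacuationInProgress = true;
        writeValue(o, 2);                 // triggers the copy, then writes to it
        System.out.println(readValue(o)); // read goes through the forwarding pointer
    }
}
```

After the write, the from-space object still holds its old value, and every access routed through the barriers sees the to-space copy.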
39. Barriers
Read Barrier – For Concurrent Compaction
*Flood, Christine & Kennke, Roman & Dinn, Andrew & Haley, Andrew & Westrelin, Roland. (2016).
Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK. 1-9.
10.1145/2972206.2972210.
40. Barriers
Copying Write Barrier – For Concurrent Compaction
*Flood, Christine & Kennke, Roman & Dinn, Andrew & Haley, Andrew & Westrelin, Roland. (2016).
Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK. 1-9.
10.1145/2972206.2972210.
41. Barriers
Loaded Reference Barrier - Recap
Tene, G.; Iyengar, B. & Wolf, M. (2011), C4: The Continuously Concurrent Compacting Collector, in 'Proceedings of the International Symposium on Memory Management', ACM, New York, NY, USA, pp. 79-88.
https://developers.redhat.com/blog/2019/06/27/shenandoah-gc-in-jdk-13-part-1-load-reference-barriers/
Ensure strong ‘to-space invariant’
Utilize barriers at reference load
Check if fast-path-possible; else do-slow-path
42. Concurrent Compaction
Brooks Style Indirection Pointer
[Figure: a Java object (header and body) preceded by an indirection pointer]
Forwarding pointer is placed before the object
Additional work of dereferencing per object
44. Concurrent Compaction
Forwarding Pointer in Header
[Figure: From Space Java object with a forwarding pointer in its header, pointing to the To Space copy (header and body)]
https://developers.redhat.com/blog/2019/06/28/shenandoah-gc-in-jdk-13-part-2-eliminating-the-forward-pointer-word/
46. Variability: OpenJDK 8 LTS OpenJDK 11 LTS
JDK 11 LTS shows significantly less variability than JDK 8 LTS for responsiveness
[Charts: SPECjbb2015 Full System Capacity and Responsiveness across Run 1 to Run 7, one chart for JDK 8 LTS and one for JDK 11 LTS; % STD Dev of Full System Capacity and Responsiveness, JDK 8 LTS vs JDK 11 LTS]
With G1 GC
47. Out-of-box* GC Performance
OpenJDK 8 LTS -> OpenJDK 11 LTS
"-Xmx150g -Xms150g -Xmn130g"
G1 GC became the default GC
Higher is Better
[Chart: Full System Capacity and Responsiveness for JDK 8 LTS, JDK 11 LTS, JDK 12 and JDK 13]
48. Out-of-box* OpenJDK GC Performance
Innovation happens at tip
*With Xmx=Xms
[Chart: Full System Capacity and Responsiveness; PGC JDK tip vs JDK 11, G1GC JDK tip vs JDK 11, ZGC JDK tip vs JDK 11]
Higher is Better
Root set includes: thread local variables, references embedded in generated code, interned Strings, references from classloaders (e.g. static final references), JNI references, JVMTI references. Having a larger root set generally means longer pauses with Shenandoah; see below for diagnostic techniques.
Compacting garbage collection algorithms have been shown to have smaller memory footprints and better cache locality than in place algorithms like Concurrent Mark and Sweep (CMS)
Objects allocated during the concurrent marking phase will be considered live but they are not traced, thus reducing the marking overhead. The technique guarantees that all live objects that were alive at the start of the marking phase are marked and traced and any new allocations made by the concurrent mutator threads during the marking cycle are marked as live and consequently not collected.
The marking_is_active condition is a simple check of a thread-local flag that is set to true at the start of marking, during the initial mark pause. Guarding the rest of the pre-barrier code with this check reduces the overhead of executing the remainder of the barrier code when marking is not active. Since the flag is thread-local and its value may be loaded multiple times, it is likely that any individual check will hit in cache, further reducing the overhead of the barrier.
Remembered Sets
Klass pointer
Live objects that need to be evacuated are copied to thread-local GC allocation buffers (GCLABs) allocated in target regions. Worker threads compete to install a forwarding pointer to the newly allocated copy of the old object image. With the help of work stealing [2], a single “winner” thread helps with copying and scanning the object. Work stealing also provides load balancing between the worker threads.
Each section here marks a heap word. That would be 64 bits on 64-bit architectures and 32 bits on 32-bit architectures.
The first word is the so-called mark word, or header of the object. It is used for a variety of purposes. For example, it can keep the hash-code of an object; it has 3 bits that are used for various locking states; some GCs use it to track object age and marking status; and it can be “overlaid” with a pointer to the “displaced” mark, to an “inflated” lock, or, during GC, the forwarding pointer.
The second word is reserved for the klass pointer. This is simply a pointer to the Hotspot-internal data structure that represents the class of the object.
Arrays would have an additional word next to store the array length. What follows is the actual payload of the object, that is, fields and array elements.
Weak root cleaning (string table)
Finalizable mark: Final reachable: object about to be finalized
writes and reads always happen into/from the to-space copy = strong to-space invariant.
Self-healing is where Java threads will help out
SGC: no need to update refs, fwding pointer. When we start, register %rsi contains the address of the object, and the field is at offset 0x10.
To-space invariant – all writes are made into the object in to-space
Even primitives and locking of objects. Exotic barriers: acmp (pointer comparison), CAS, clone