SlideShare a Scribd company logo
1 of 84
Download to read offline
Jean-Philippe BEMPEL
WebScale
@jpbempel
Understanding
Low Latency
JVM GCs
2 •
• GC basics
• Shenandoah
• Azul’s C4
• ZGC
• How to choose a GC algorithm?
Understanding Low Latency JVM GCs
GC Basics
4 •
Generations
5 •
• Used in CMS & G1 algorithms already and by all low latency GCs
• Try to mark the whole object graph concurrently with the application running
• Based on Tri-color abstraction
Concurrent Marking
6 •
Concurrent Marking: Tri-Color Abstraction
7 •
Concurrent Marking: Tri-Color Abstraction
8 •
Concurrent Marking: Tri-Color Abstraction
9 •
Concurrent Marking: Tri-Color Abstraction
10 •
Concurrent Marking: Tri-Color Abstraction
11 •
Concurrent Marking: Tri-Color Abstraction
12 •
Concurrent Marking: Tri-Color Abstraction
13 •
Concurrent Marking: Issues
• New allocations during marking phase can be handled by:
• Marking automatically object at allocation
• Not considering new allocations for the current cycle
• Tri-Color abstraction provides 2 properties of missed object:
1. The mutator stores a reference to a white object into a black object.
2. All paths from any gray objects to that white object are destroyed.
http://www.memorymanagement.org/glossary/s.html#term-snapshot-at-the-beginning
14 •
Concurrent Marking: Issues
A
B
C
A.field1 = C;
B.field2 = null;
15 •
Concurrent Marking: Issues
A
B
C
A.field1 = C;
B.field2 = null;
16 •
Concurrent Marking: Issues
A
B
C
A.field1 = C;
B.field2 = null;
17 •
Concurrent Marking: Issues
A
B
C
A.field1 = C;
B.field2 = null;
OOPS!
18 •
• 2 ways to ensure not missing any marking
• Snapshot-At-The-Beginning
• Incremental Update
• For SATB, Pre-Write Barriers, recording object for marking
• Before a reference assignation (X.f = Y)
• SATB barrier is only active when Marking is on (global state)
Concurrent Marking: Resolving misses
if (SATB_WriteBarrier) {
if (X.f != null)
SATB_enqueue(X.f);
}
cmp BYTE PTR [r15+0x30],0x0
jne 0x000002965edc62e5
[...]
mov r11d,DWORD PTR [rbp+0x74]
test r11d,r11d
je 0x000002965edc6253
mov r10,QWORD PTR [r15+0x38]
mov rcx,r11
shl rcx,0x3
test r10,r10
je 0x000002965edc6318
mov r11,QWORD PTR [r15+0x48]
mov QWORD PTR [r11+r10*1-0x8],rcx
add r10,0xfffffffffffffff8
mov QWORD PTR [r15+0x38],r10
jmp 0x000002965edc6253
mov rdx,r15
movabs r10,0x7ffac2febc50
call r10
jmp 0x000002965edc6253
Shenandoah
20 •
• Non-generational (still option for partial collection)
• Region based
• Use Read Barrier: Brooks pointer
• Self-Healing
• Cooperation between mutator threads & GC threads
• Only for concurrent compaction
• Mostly based on G1 but with concurrent compaction
Shenandoah GC
21 •
• Initial Marking (STW)
• Concurrent Marking
• Final Remark (STW)
• Concurrent Cleanup
• Concurrent Evacuation
• Init Update References (STW)
• Concurrent Update References
• Final Update References (STW)
• Concurrent Cleanup
Shenandoah Phases
22 •
• SATB-style (like G1)
• 2 STW pauses for Initial Mark & Final Remark
• Conditional Write Barrier
• To deal with concurrent modification of object graph
Concurrent Marking
23 •
• Same principle than G1:
• Build CollectionSet with Garbage First!
• Evacuate to new regions to release the region for reuse
• Concurrent Evacuation done with the help of:
• 1 Read Barrier : Brooks pointer
• 4 Write Barriers
• Barriers help to keep the to-space invariant:
• All Writes are made into an object in to-space
Concurrent Evacuation
24 •
• All objects have an additional forwarding pointer
• Placed before the regular object
• Dereference the forwarding pointer for each access
• Memory footprint overhead
• Throughput overhead
Brooks pointers
Header
Brooks pointer
mov r13,QWORD PTR [r12+r14*8-0x8]
25 •
Concurrent Copy: GC thread
Header
Brooks pointer
From-Space To-Space
26 •
Concurrent Copy: GC thread
Header
Brooks pointer
From-Space To-Space
GC thread
27 •
Concurrent Copy: GC thread
Header
Brooks pointer
Header
Brooks pointer
From-Space To-Space
GC thread
28 •
Concurrent Copy: GC thread
Header
Brooks pointer
Header
Brooks pointer
From-Space To-Space
GC thread
29 •
Concurrent Copy: Reader threads
Header
Brooks pointer
From-Space To-Space
Reader
thread
Reader
thread
30 •
Concurrent Copy: Writer threads
Header
Brooks pointer
From-Space To-Space
31 •
Concurrent Copy: Writer threads
Header
Brooks pointer
From-Space To-Space
Writer
thread
Writer
thread
32 •
Concurrent Copy: Writer threads
Header
Brooks pointer
Header
Brooks pointer
From-Space To-Space
Writer
thread
Writer
thread
33 •
Concurrent Copy: Writer threads
Header
Brooks pointer
Header
Brooks pointer
From-Space To-Space
Writer
thread
Writer
thread
Header
Brooks pointer
34 •
Concurrent Copy: Writer threads
Header
Brooks pointer
Header
Brooks pointer
From-Space To-Space
Writer
thread
Writer
thread
Header
Brooks pointer
35 •
Concurrent Copy: Writer threads
Header
Brooks pointer
Header
Brooks pointer
From-Space To-Space
Writer
thread
Writer
thread
Header
Brooks pointer
36 •
Concurrent Copy: Writer threads
Header
Brooks pointer
Header
Brooks pointer
From-Space To-Space
Writer
thread
Writer
thread
37 •
• Any writes (even primitives) to from-space object needs to be protected
• Exotic barriers:
• acmp (pointer comparison)
• CAS
• clone
Write Barriers
if (evacInProgress
&& inCollectionSet(obj)
&& notCopyYet(obj)) {
evacuateObject(obj)
}
test BYTE PTR [r15+0x3c0],0x2
jne 0x000000000281bcbc
[...]
mov r10d,DWORD PTR [r13+0xc]
test r10d,r10d
je 0x000000000281bc2b
mov r11,QWORD PTR [r15+0x360]
mov rcx,r10
shl rcx,0x3
test r11,r11
je 0x000000000281bd0d
[...]
mov rdx,r15
movabs r10,0x62d1f660
call r10
jmp 0x000000000281bc2b
38 •
• Late memory release
• Only happens when all refs updated (Concurrent Cleanup phase)
• Allocations can overrun the GC
• Failure modes:
• Pacing
• Degenerated GC
• FullGC
Extreme cases
Azul’s C4
40 •
• Generational (young & old)
• Region based (pages)
• Use Read Barrier: Loaded Value Barrier
• Self-Healing
• Cooperation between mutator threads & GC threads
• Pauseless algorithm but implementation requires safepoints
• Pauses are most of the time < 1ms
Continuously Concurrent Compacting Collector
41 •
• Baker-style Barrier
• move objects through forwarding addresses stored aside
• Applied at load time, not when dereferencing
• Ensure C4 invariants:
• Marked Through the current cycle
• Not relocated
• If not => Self-healing process to correct it
• Mark object
• Relocate & correct reference
• Checked for each reference loads
• Benefits from JIT optimization for caching loaded value (registers)
LVB
42 •
• States of objects stored inside reference address => Colored pointers
• NMT bit
• Generation
• Checked against a global expected value during the GC cycle
• Thread local, almost always L1 cache hits
• Register
• Relocated: x86 Implementation use trap from VM memory translation Guest/Host
• Intel EPT
• AMD NPT
LVB
test r9, rax
jne 0x3001443b
mov r10d, dword ptr [rax + 8]
43 •
Virtual Memory vs Physical Memory
Virtual Memory
Physical Memory
0 2^64
0 2^37
44 •
• All phases are fully parallel & concurrent
• No "rush" to finish phases
• No constraint about STW pause to be short
• Physical memory released quickly in relocation phase
• Can be reused for new allocations
• Plenty of virtual space vs physical memory
C4 Phases
45 •
• Mark
• Marking all objects in graph
• Relocation
• Moving objects to release pages
• Remap
• Fixup references in object graph
• Folded with next mark cycle
C4 Phases
46 •
• Incremental Update Marking (vs SATB)
• Single pass
• No final mark/remark
• Self-Healing: Mark object that are not marked for the current cycle
Mark Phase
47 •
Mark Phase: Concurrent Modification
A
B
C
A.field1 = C;
B.field2 = null;
48 •
Mark Phase: Concurrent Modification
A
B
C
A.field1 = C;
B.field2 = null;
49 •
Mark Phase: Concurrent Modification
A
B
C
A.field1 = C;
B.field2 = null;
LVB
50 •
Mark Phase: Concurrent Modification
A
B
C
A.field1 = C;
B.field2 = null;
LVB
51 •
• Scanning roots (Static var, Thread stacks, register, JNI handles)
• GC threads scans stalled threads
• Running threads scans their own stack stopping individually at Safepoint
• Scanning object graph like a parallel collector
• Newly allocated objects into new pages, not considered for reclaim (relocation)
• For each page, summing live data bytes, used to select page to reclaim
Mark Phase
52 •
• Select pages with the greatest number of dead objects (garbage first!)
• Protect page selected from being accessed by mutators thread
• Move objects to new allocated pages
• Build side arrays (off heap tables) for forwarding information
• Self-Healing: As protected, LVB will trigger a trap to:
• Copy object to the new location if not done
• Use forward pointer to fix the reference
Relocation Phase
53 •
Virtual
Physical
Relocation Phase
54 •
Virtual
Physical
Relocation Phase
55 •
Virtual
Physical
Relocation Phase
56 •
Virtual
Physical
Relocation Phase
57 •
Virtual
Physical
Relocation Phase
58 •
Virtual
Physical
Relocation Phase
59 •
Virtual
Physical
Relocation Phase
Forwarding table
60 •
Virtual
Physical
Relocation Phase
Forwarding table
61 •
Virtual
Physical
Relocation Phase
Forwarding table
62 •
Virtual
Physical
Relocation Phase
Forwarding table
63 •
• Few chances mutators stall on accessing a ref as processing mostly dead pages
• Once object copy done, physical memory is released (Quick Release)
• Can be immediately reused (remapped) to satisfy new allocations
• Pages evacuated are still mapped & protected to help remap phase
• Cannot be released until all objects are remapped
• Not a problem as we have a huge virtual address space
Relocation Phase
64 •
• Traverse Object Graph and fixup references
• Execute LVB barrier for each object
• Self-Healing: fixup references using forward information
• As we traverse again, mark for the next phase
• Mark & Remap phases are folded!
Remap Phase
65 •
• Algorithm requires a sustainable rate of remapping operations
• Linux limitations:
• TLB invalidation
• Only 4KB pages can be remapped
• Single threaded remapping (write lock)
• Kernel module implements API for the Zing JVM to increase significantly the remapping rate
• Implements also virtual address aliasing for addressing objects with metadata
Remap – Kernel module
66 •
• Young & Old collections done by same algorithm and can be concurrent
• Size of the generation are dynamically adjusted
• Card Marking with write barrier (Stored Value Barrier)
• Old collection is based on young-to-old roots generated by previous young cycle
• Young collection will perform card scanning per page
• hold an eventual concurrent Old collection per page scanned
Generational
67 •
• Used by Hadoop Name Node
• 580GB Heap
• Very hard to tune with G1
• No issue so far regarding GC since production roll out (Oct 2017)
C4 @ Criteo
Z GC
69 •
• Non generational
• Region based (zPages, dynamically sized)
• Concurrent Marking, Compaction, Ref processing
• Use Colored Pointers & Read/Load Barrier
• Self-Healing
• Cooperation between mutator threads & GC threads
• Experimental in JDK 11 (-XX:+UnlockExperimentalVMOptions –XX:+UseZGC)
Z GC
mov r10,QWORD PTR [r11+0xb0]
test QWORD PTR [r15+0x20],r10
jne 0x00007f9594cc54b5
70 •
Z GC
71 •
• Initial Mark (STW)
• Concurrent Mark/Remap
• Final Mark (STW)
• Concurrent Prepare for Relocation
• Start Relocate (STW)
• Concurrent Relocate
Z GC phases:
72 •
• Store metadata in unused bits of reference address
• 42 bits for addressing (4TB)
• 4 bits for metadata
• Marked0
• Marked1
• Remapped
• Finalizable
Colored Pointers
73 •
• Colored pointers needs to be unmasked for dereferencing
• Some HW support masking (SPARC, Aarch64))
• On linux/windows, overhead if done with classical instructions
• Only one view is active at any point
• Plenty of Virtual Space
Multi-Mapping
74 •
Multi-Mapping
Virtual Memory
Physical Memory
0 2^64
0 2^37
(marked0)
001<address>
(marked1)
010<address>
(remapped)
100<address>
75 •
• Pages are multiple of 2MB
• 3 different groups
• Small: 2MB pages with object size <= 256KB
• Medium: 32MB pages with object size <= 4MB
• Large: 2MB pages, objects span over multiple of them
• Objects in Large group are meant to not to be relocated (too expensive)
Page Allocations
76 •
• Handling remapping
• C4: Memory protection + trap
• Z: mask in colored pointer
• Unmasking ref addresses
• C4: Kernel module aliasing
• Z: Multi-mapping or HW support
• Pages & Relocation
• C4:
• Page are fixed to match OS size (mem protection)
• relocation for large objects by remapping
• Z:
• zPages are dynamic, a zPage can be 100MB large
• No relocation for large objects
Difference between C4 & Z GC
How to choose a GC algorithm
78 •
• You have to run on Windows
• Shenandoah
• Battlefield tested GC (maturity)
• C4
• Shenandoah
• Minimizing any kind of JVM pauses
• C4
• Z
• You don’t want pay for it:
• Shenandoah
• Z
Low latency GCs
References
80 •
• Java Garbage Collection distilled by Martin Thompson
• The Java GC mini book
• Oracle’s white paper on JVM memory management & GC
• What differences JVM makes by Nitsan Wakart
• Memory Management Reference
• IBM Pause-Less GC
References GC Basics
81 •
• Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK
• Shenandoah: The Garbage Collector That Could by Aleksey Shipilev
• Shenandoah GC Wiki
References Shenandoah
82 •
• The Pauseless GC algorithm (2005)
• C4: Continuously Concurrent Compacting Collector (2011)
• Azul GC in Detail by Charles Humble
• 2010 version source code
References C4
83 •
• ZGC - Low Latency GC for OpenJDK by Per Liden
• Java's new Z Garbage Collector (ZGC) is very exciting by Richard Warburton
• A first look into ZGC by Dominik Inführ
• Architectural Comparison with C4/Pauseless
References ZGC
Thank You!
@jpbempel

More Related Content

What's hot

Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overheadCass Everitt
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...DataStax
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3guest11b095
 
Efficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® GraphicsEfficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® GraphicsGael Hofemeier
 
Android Multimedia Framework
Android Multimedia FrameworkAndroid Multimedia Framework
Android Multimedia FrameworkPicker Weng
 
Lightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraLightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraScyllaDB
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detailMIJIN AN
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringElectronic Arts / DICE
 
Performance Update: When Apache ORC Met Apache Spark
Performance Update: When Apache ORC Met Apache SparkPerformance Update: When Apache ORC Met Apache Spark
Performance Update: When Apache ORC Met Apache SparkDataWorks Summit
 
A whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizerA whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizerNikita Popov
 
Optimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceOptimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceMariaDB plc
 
Building an Activity Feed with Cassandra
Building an Activity Feed with CassandraBuilding an Activity Feed with Cassandra
Building an Activity Feed with CassandraMark Dunphy
 
PostgreSQL at 20TB and Beyond
PostgreSQL at 20TB and BeyondPostgreSQL at 20TB and Beyond
PostgreSQL at 20TB and BeyondChris Travers
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep DiveRed_Hat_Storage
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInDataWorks Summit
 
Anti-Aliasing Methods in CryENGINE 3
Anti-Aliasing Methods in CryENGINE 3Anti-Aliasing Methods in CryENGINE 3
Anti-Aliasing Methods in CryENGINE 3Tiago Sousa
 

What's hot (20)

Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overhead
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3
 
Efficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® GraphicsEfficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® Graphics
 
Embedded Virtualization applied in Mobile Devices
Embedded Virtualization applied in Mobile DevicesEmbedded Virtualization applied in Mobile Devices
Embedded Virtualization applied in Mobile Devices
 
Android Multimedia Framework
Android Multimedia FrameworkAndroid Multimedia Framework
Android Multimedia Framework
 
Lightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraLightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache Cassandra
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
 
Linux DMA Engine
Linux DMA EngineLinux DMA Engine
Linux DMA Engine
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
 
Performance Update: When Apache ORC Met Apache Spark
Performance Update: When Apache ORC Met Apache SparkPerformance Update: When Apache ORC Met Apache Spark
Performance Update: When Apache ORC Met Apache Spark
 
A whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizerA whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizer
 
Optimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceOptimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performance
 
Building an Activity Feed with Cassandra
Building an Activity Feed with CassandraBuilding an Activity Feed with Cassandra
Building an Activity Feed with Cassandra
 
PostgreSQL at 20TB and Beyond
PostgreSQL at 20TB and BeyondPostgreSQL at 20TB and Beyond
PostgreSQL at 20TB and Beyond
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
 
Anti-Aliasing Methods in CryENGINE 3
Anti-Aliasing Methods in CryENGINE 3Anti-Aliasing Methods in CryENGINE 3
Anti-Aliasing Methods in CryENGINE 3
 

Similar to Understanding low latency jvm gcs

Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2Jean-Philippe BEMPEL
 
Cache aware hybrid sorter
Cache aware hybrid sorterCache aware hybrid sorter
Cache aware hybrid sorterManchor Ko
 
Triton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaTriton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaWei-Bo Chen
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterAttila Szegedi
 
Triton and symbolic execution on gdb
Triton and symbolic execution on gdbTriton and symbolic execution on gdb
Triton and symbolic execution on gdbWei-Bo Chen
 
Stream processing from single node to a cluster
Stream processing from single node to a clusterStream processing from single node to a cluster
Stream processing from single node to a clusterGal Marder
 
Decima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnDecima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnGuerrilla
 
.NET UY Meetup 7 - CLR Memory by Fabian Alves
.NET UY Meetup 7 - CLR Memory by Fabian Alves.NET UY Meetup 7 - CLR Memory by Fabian Alves
.NET UY Meetup 7 - CLR Memory by Fabian Alves.NET UY Meetup
 
Dynamic Binary Analysis and Obfuscated Codes
Dynamic Binary Analysis and Obfuscated Codes Dynamic Binary Analysis and Obfuscated Codes
Dynamic Binary Analysis and Obfuscated Codes Jonathan Salwan
 
An Introduction to JVM Internals and Garbage Collection in Java
An Introduction to JVM Internals and Garbage Collection in JavaAn Introduction to JVM Internals and Garbage Collection in Java
An Introduction to JVM Internals and Garbage Collection in JavaAbhishek Asthana
 
Low latency Java apps
Low latency Java appsLow latency Java apps
Low latency Java appsSimon Ritter
 
Performance van Java 8 en verder - Jeroen Borgers
Performance van Java 8 en verder - Jeroen BorgersPerformance van Java 8 en verder - Jeroen Borgers
Performance van Java 8 en verder - Jeroen BorgersNLJUG
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterLucidworks
 
Implementing a JavaScript Engine
Implementing a JavaScript EngineImplementing a JavaScript Engine
Implementing a JavaScript EngineKris Mok
 
Jvm lecture
Jvm lectureJvm lecture
Jvm lecturesdslnmd
 
Abstract Interpretation meets model checking near the 1000000 LOC mark: Findi...
Abstract Interpretation meets model checking near the 1000000 LOC mark: Findi...Abstract Interpretation meets model checking near the 1000000 LOC mark: Findi...
Abstract Interpretation meets model checking near the 1000000 LOC mark: Findi...Peter Breuer
 
Game of Performance: A Song of JIT and GC
Game of Performance: A Song of JIT and GCGame of Performance: A Song of JIT and GC
Game of Performance: A Song of JIT and GCMonica Beckwith
 
Fast, deterministic, and verifiable computations with WebAssembly
Fast, deterministic, and verifiable computations with WebAssemblyFast, deterministic, and verifiable computations with WebAssembly
Fast, deterministic, and verifiable computations with WebAssemblyFluence Labs
 

Similar to Understanding low latency jvm gcs (20)

Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2
 
Cache aware hybrid sorter
Cache aware hybrid sorterCache aware hybrid sorter
Cache aware hybrid sorter
 
Triton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaTriton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON China
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
 
Triton and symbolic execution on gdb
Triton and symbolic execution on gdbTriton and symbolic execution on gdb
Triton and symbolic execution on gdb
 
Stream processing from single node to a cluster
Stream processing from single node to a clusterStream processing from single node to a cluster
Stream processing from single node to a cluster
 
Decima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnDecima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero Dawn
 
.NET UY Meetup 7 - CLR Memory by Fabian Alves
.NET UY Meetup 7 - CLR Memory by Fabian Alves.NET UY Meetup 7 - CLR Memory by Fabian Alves
.NET UY Meetup 7 - CLR Memory by Fabian Alves
 
Dynamic Binary Analysis and Obfuscated Codes
Dynamic Binary Analysis and Obfuscated Codes Dynamic Binary Analysis and Obfuscated Codes
Dynamic Binary Analysis and Obfuscated Codes
 
An Introduction to JVM Internals and Garbage Collection in Java
An Introduction to JVM Internals and Garbage Collection in JavaAn Introduction to JVM Internals and Garbage Collection in Java
An Introduction to JVM Internals and Garbage Collection in Java
 
Understanding JVM GC: advanced!
Understanding JVM GC: advanced!Understanding JVM GC: advanced!
Understanding JVM GC: advanced!
 
Low latency Java apps
Low latency Java appsLow latency Java apps
Low latency Java apps
 
Optimizing Java Notes
Optimizing Java NotesOptimizing Java Notes
Optimizing Java Notes
 
Performance van Java 8 en verder - Jeroen Borgers
Performance van Java 8 en verder - Jeroen BorgersPerformance van Java 8 en verder - Jeroen Borgers
Performance van Java 8 en verder - Jeroen Borgers
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
Implementing a JavaScript Engine
Implementing a JavaScript EngineImplementing a JavaScript Engine
Implementing a JavaScript Engine
 
Jvm lecture
Jvm lectureJvm lecture
Jvm lecture
 
Abstract Interpretation meets model checking near the 1000000 LOC mark: Findi...
Abstract Interpretation meets model checking near the 1000000 LOC mark: Findi...Abstract Interpretation meets model checking near the 1000000 LOC mark: Findi...
Abstract Interpretation meets model checking near the 1000000 LOC mark: Findi...
 
Game of Performance: A Song of JIT and GC
Game of Performance: A Song of JIT and GCGame of Performance: A Song of JIT and GC
Game of Performance: A Song of JIT and GC
 
Fast, deterministic, and verifiable computations with WebAssembly
Fast, deterministic, and verifiable computations with WebAssemblyFast, deterministic, and verifiable computations with WebAssembly
Fast, deterministic, and verifiable computations with WebAssembly
 

More from Jean-Philippe BEMPEL

Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...Jean-Philippe BEMPEL
 
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...Jean-Philippe BEMPEL
 
Tools in action jdk mission control and flight recorder
Tools in action  jdk mission control and flight recorderTools in action  jdk mission control and flight recorder
Tools in action jdk mission control and flight recorderJean-Philippe BEMPEL
 
Clr jvm implementation differences
Clr jvm implementation differencesClr jvm implementation differences
Clr jvm implementation differencesJean-Philippe BEMPEL
 
Out ofmemoryerror what is the cost of java objects
Out ofmemoryerror  what is the cost of java objectsOut ofmemoryerror  what is the cost of java objects
Out ofmemoryerror what is the cost of java objectsJean-Philippe BEMPEL
 
OutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en javaOutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en javaJean-Philippe BEMPEL
 
Low latency & mechanical sympathy issues and solutions
Low latency & mechanical sympathy  issues and solutionsLow latency & mechanical sympathy  issues and solutions
Low latency & mechanical sympathy issues and solutionsJean-Philippe BEMPEL
 
Lock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx ukLock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx ukJean-Philippe BEMPEL
 
Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)Jean-Philippe BEMPEL
 
Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)Jean-Philippe BEMPEL
 
Measuring directly from cpu hardware performance counters
Measuring directly from cpu  hardware performance countersMeasuring directly from cpu  hardware performance counters
Measuring directly from cpu hardware performance countersJean-Philippe BEMPEL
 
Devoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perfDevoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perfJean-Philippe BEMPEL
 

More from Jean-Philippe BEMPEL (14)

Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
 
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
 
Tools in action jdk mission control and flight recorder
Tools in action  jdk mission control and flight recorderTools in action  jdk mission control and flight recorder
Tools in action jdk mission control and flight recorder
 
Clr jvm implementation differences
Clr jvm implementation differencesClr jvm implementation differences
Clr jvm implementation differences
 
Le guide de dépannage de la jvm
Le guide de dépannage de la jvmLe guide de dépannage de la jvm
Le guide de dépannage de la jvm
 
Out ofmemoryerror what is the cost of java objects
Out ofmemoryerror  what is the cost of java objectsOut ofmemoryerror  what is the cost of java objects
Out ofmemoryerror what is the cost of java objects
 
OutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en javaOutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en java
 
Low latency & mechanical sympathy issues and solutions
Low latency & mechanical sympathy  issues and solutionsLow latency & mechanical sympathy  issues and solutions
Low latency & mechanical sympathy issues and solutions
 
Lock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx ukLock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx uk
 
Lock free programming- pro tips
Lock free programming- pro tipsLock free programming- pro tips
Lock free programming- pro tips
 
Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)
 
Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)
 
Measuring directly from cpu hardware performance counters
Measuring directly from cpu  hardware performance countersMeasuring directly from cpu  hardware performance counters
Measuring directly from cpu hardware performance counters
 
Devoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perfDevoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perf
 

Recently uploaded

PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Internet of Things Presentation (IoT).pptx
Internet of Things Presentation (IoT).pptxInternet of Things Presentation (IoT).pptx
Internet of Things Presentation (IoT).pptxErYashwantJagtap
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxmibuzondetrabajo
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书rnrncn29
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
NSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationNSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationMarko4394
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxeditsforyah
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 

Recently uploaded (17)

PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Internet of Things Presentation (IoT).pptx
Internet of Things Presentation (IoT).pptxInternet of Things Presentation (IoT).pptx
Internet of Things Presentation (IoT).pptx
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptx
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
NSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationNSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentation
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptx
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 

Understanding low latency jvm gcs

  • 2. 2 • • GC basics • Shenandoah • Azul’s C4 • ZGC • How to choose a GC algorithm? Understanding Low Latency JVM GCs
  • 5. 5 • • Used in CMS & G1 algorithms already and by all low latency GCs • Try to mark the whole object graph concurrently with the application running • Based on Tri-color abstraction Concurrent Marking
  • 6. 6 • Concurrent Marking: Tri-Color Abstraction
  • 7. 7 • Concurrent Marking: Tri-Color Abstraction
  • 8. 8 • Concurrent Marking: Tri-Color Abstraction
  • 9. 9 • Concurrent Marking: Tri-Color Abstraction
  • 10. 10 • Concurrent Marking: Tri-Color Abstraction
  • 11. 11 • Concurrent Marking: Tri-Color Abstraction
  • 12. 12 • Concurrent Marking: Tri-Color Abstraction
  • 13. 13 • Concurrent Marking: Issues • New allocations during marking phase can be handled by: • Marking automatically object at allocation • Not considering new allocations for the current cycle • Tri-Color abstraction provides 2 properties of missed object: 1. The mutator stores a reference to a white object into a black object. 2. All paths from any gray objects to that white object are destroyed. http://www.memorymanagement.org/glossary/s.html#term-snapshot-at-the-beginning
  • 14. 14 • Concurrent Marking: Issues A B C A.field1 = C; B.field2 = null;
  • 15. 15 • Concurrent Marking: Issues A B C A.field1 = C; B.field2 = null;
  • 16. 16 • Concurrent Marking: Issues A B C A.field1 = C; B.field2 = null;
  • 17. 17 • Concurrent Marking: Issues A B C A.field1 = C; B.field2 = null; OOPS!
  • 18. 18 • • 2 ways to ensure not missing any marking • Snapshot-At-The-Beginning • Incremental Update • For SATB, Pre-Write Barriers, recording object for marking • Before a reference assignation (X.f = Y) • SATB barrier is only active when Marking is on (global state) Concurrent Marking: Resolving misses if (SATB_WriteBarrier) { if (X.f != null) SATB_enqueue(X.f); } cmp BYTE PTR [r15+0x30],0x0 jne 0x000002965edc62e5 [...] mov r11d,DWORD PTR [rbp+0x74] test r11d,r11d je 0x000002965edc6253 mov r10,QWORD PTR [r15+0x38] mov rcx,r11 shl rcx,0x3 test r10,r10 je 0x000002965edc6318 mov r11,QWORD PTR [r15+0x48] mov QWORD PTR [r11+r10*1-0x8],rcx add r10,0xfffffffffffffff8 mov QWORD PTR [r15+0x38],r10 jmp 0x000002965edc6253 mov rdx,r15 movabs r10,0x7ffac2febc50 call r10 jmp 0x000002965edc6253
  • 20. 20 • • Non-generational (still option for partial collection) • Region based • Use Read Barrier: Brooks pointer • Self-Healing • Cooperation between mutator threads & GC threads • Only for concurrent compaction • Mostly based on G1 but with concurrent compaction Shenandoah GC
  • 21. 21 • • Initial Marking (STW) • Concurrent Marking • Final Remark (STW) • Concurrent Cleanup • Concurrent Evacuation • Init Update References (STW) • Concurrent Update References • Final Update References (STW) • Concurrent Cleanup Shenandoah Phases
  • 22. 22 • • SATB-style (like G1) • 2 STW pauses for Initial Mark & Final Remark • Conditional Write Barrier • To deal with concurrent modification of object graph Concurrent Marking
  • 23. 23 • • Same principle than G1: • Build CollectionSet with Garbage First! • Evacuate to new regions to release the region for reuse • Concurrent Evacuation done with the help of: • 1 Read Barrier : Brooks pointer • 4 Write Barriers • Barriers help to keep the to-space invariant: • All Writes are made into an object in to-space Concurrent Evacuation
  • 24. 24 • • All objects have an additional forwarding pointer • Placed before the regular object • Dereference the forwarding pointer for each access • Memory footprint overhead • Throughput overhead Brooks pointers Header Brooks pointer mov r13,QWORD PTR [r12+r14*8-0x8]
  • 25. 25 • Concurrent Copy: GC thread Header Brooks pointer From-Space To-Space
  • 26. 26 • Concurrent Copy: GC thread Header Brooks pointer From-Space To-Space GC thread
  • 27. 27 • Concurrent Copy: GC thread Header Brooks pointer Header Brooks pointer From-Space To-Space GC thread
  • 28. 28 • Concurrent Copy: GC thread Header Brooks pointer Header Brooks pointer From-Space To-Space GC thread
  • 29. 29 • Concurrent Copy: Reader threads Header Brooks pointer From-Space To-Space Reader thread Reader thread
  • 30. 30 • Concurrent Copy: Writer threads Header Brooks pointer From-Space To-Space
  • 31. 31 • Concurrent Copy: Writer threads Header Brooks pointer From-Space To-Space Writer thread Writer thread
  • 32. 32 • Concurrent Copy: Writer threads Header Brooks pointer Header Brooks pointer From-Space To-Space Writer thread Writer thread
  • 33. 33 • Concurrent Copy: Writer threads Header Brooks pointer Header Brooks pointer From-Space To-Space Writer thread Writer thread Header Brooks pointer
  • 34. 34 • Concurrent Copy: Writer threads Header Brooks pointer Header Brooks pointer From-Space To-Space Writer thread Writer thread Header Brooks pointer
  • 35. 35 • Concurrent Copy: Writer threads Header Brooks pointer Header Brooks pointer From-Space To-Space Writer thread Writer thread Header Brooks pointer
  • 36. 36 • Concurrent Copy: Writer threads Header Brooks pointer Header Brooks pointer From-Space To-Space Writer thread Writer thread
  • 37. 37 • • Any writes (even primitives) to from-space object needs to be protected • Exotic barriers: • acmp (pointer comparison) • CAS • clone Write Barriers if (evacInProgress && inCollectionSet(obj) && notCopyYet(obj)) { evacuateObject(obj) } test BYTE PTR [r15+0x3c0],0x2 jne 0x000000000281bcbc [...] mov r10d,DWORD PTR [r13+0xc] test r10d,r10d je 0x000000000281bc2b mov r11,QWORD PTR [r15+0x360] mov rcx,r10 shl rcx,0x3 test r11,r11 je 0x000000000281bd0d [...] mov rdx,r15 movabs r10,0x62d1f660 call r10 jmp 0x000000000281bc2b
  • 38. 38 • • Late memory release • Only happens when all refs updated (Concurrent Cleanup phase) • Allocations can overrun the GC • Failure modes: • Pacing • Degenerated GC • FullGC Extreme cases
  • 40. 40 • • Generational (young & old) • Region based (pages) • Use Read Barrier: Loaded Value Barrier • Self-Healing • Cooperation between mutator threads & GC threads • Pauseless algorithm but implementation requires safepoints • Pauses are most of the time < 1ms Continuously Concurrent Compacting Collector
  • 41. 41 • • Baker-style Barrier • move objects through forwarding addresses stored aside • Applied at load time, not when dereferencing • Ensure C4 invariants: • Marked Through the current cycle • Not relocated • If not => Self-healing process to correct it • Mark object • Relocate & correct reference • Checked for each reference loads • Benefits from JIT optimization for caching loaded value (registers) LVB
  • 42. 42 • • States of objects stored inside reference address => Colored pointers • NMT bit • Generation • Checked against a global expected value during the GC cycle • Thread local, almost always L1 cache hits • Register • Relocated: x86 Implementation use trap from VM memory translation Guest/Host • Intel EPT • AMD NPT LVB test r9, rax jne 0x3001443b mov r10d, dword ptr [rax + 8]
  • 43. 43 • Virtual Memory vs Physical Memory Virtual Memory Physical Memory 0 2^64 0 2^37
  • 44. 44 • • All phases are fully parallel & concurrent • No "rush" to finish phases • No constraint about STW pause to be short • Physical memory released quickly in relocation phase • Can be reused for new allocations • Plenty of virtual space vs physical memory C4 Phases
  • 45. 45 • • Mark • Marking all objects in graph • Relocation • Moving objects to release pages • Remap • Fixup references in object graph • Folded with next mark cycle C4 Phases
  • 46. 46 • • Incremental Update Marking (vs SATB) • Single pass • No final mark/remark • Self-Healing: Mark object that are not marked for the current cycle Mark Phase
  • 47. 47 • Mark Phase: Concurrent Modification A B C A.field1 = C; B.field2 = null;
  • 48. 48 • Mark Phase: Concurrent Modification A B C A.field1 = C; B.field2 = null;
  • 49. 49 • Mark Phase: Concurrent Modification A B C A.field1 = C; B.field2 = null; LVB
  • 50. 50 • Mark Phase: Concurrent Modification A B C A.field1 = C; B.field2 = null; LVB
  • 51. 51 • • Scanning roots (Static var, Thread stacks, register, JNI handles) • GC threads scans stalled threads • Running threads scans their own stack stopping individually at Safepoint • Scanning object graph like a parallel collector • Newly allocated objects into new pages, not considered for reclaim (relocation) • For each page, summing live data bytes, used to select page to reclaim Mark Phase
  • 52. 52 • • Select pages with the greatest number of dead objects (garbage first!) • Protect page selected from being accessed by mutators thread • Move objects to new allocated pages • Build side arrays (off heap tables) for forwarding information • Self-Healing: As protected, LVB will trigger a trap to: • Copy object to the new location if not done • Use forward pointer to fix the reference Relocation Phase
  • 63. 63 • • Few chances mutators stall on accessing a ref as processing mostly dead pages • Once object copy done, physical memory is released (Quick Release) • Can be immediately reused (remapped) to satisfy new allocations • Pages evacuated are still mapped & protected to help remap phase • Cannot be released until all objects are remapped • Not a problem as we have a huge virtual address space Relocation Phase
  • 64. 64 • • Traverse Object Graph and fixup references • Execute LVB barrier for each object • Self-Healing: fixup references using forward information • As we traverse again, mark for the next phase • Mark & Remap phases are folded! Remap Phase
  • 65. 65 • • Algorithm requires a sustainable rate of remapping operations • Linux limitations: • TLB invalidation • Only 4KB pages can be remapped • Single threaded remapping (write lock) • Kernel module implements API for the Zing JVM to increase significantly the remapping rate • Implements also virtual address aliasing for addressing objects with metadata Remap – Kernel module
  • 66. 66 • • Young & Old collections done by same algorithm and can be concurrent • Size of the generation are dynamically adjusted • Card Marking with write barrier (Stored Value Barrier) • Old collection is based on young-to-old roots generated by previous young cycle • Young collection will perform card scanning per page • hold an eventual concurrent Old collection per page scanned Generational
  • 67. 67 • • Used by Hadoop Name Node • 580GB Heap • Very hard to tune with G1 • No issue so far regarding GC since production roll out (Oct 2017) C4 @ Criteo
  • 68. Z GC
  • 69. 69 • • Non generational • Region based (zPages, dynamically sized) • Concurrent Marking, Compaction, Ref processing • Use Colored Pointers & Read/Load Barrier • Self-Healing • Cooperation between mutator threads & GC threads • Experimental in JDK 11 (-XX:+UnlockExperimentalVMOptions –XX:+UseZGC) Z GC mov r10,QWORD PTR [r11+0xb0] test QWORD PTR [r15+0x20],r10 jne 0x00007f9594cc54b5
  • 71. 71 • • Initial Mark (STW) • Concurrent Mark/Remap • Final Mark (STW) • Concurrent Prepare for Relocation • Start Relocate (STW) • Concurrent Relocate Z GC phases:
  • 72. 72 • • Store metadata in unused bits of reference address • 42 bits for addressing (4TB) • 4 bits for metadata • Marked0 • Marked1 • Remapped • Finalizable Colored Pointers
  • 73. 73 • • Colored pointers needs to be unmasked for dereferencing • Some HW support masking (SPARC, Aarch64)) • On linux/windows, overhead if done with classical instructions • Only one view is active at any point • Plenty of Virtual Space Multi-Mapping
  • 74. 74 • Multi-Mapping Virtual Memory Physical Memory 0 2^64 0 2^37 (marked0) 001<address> (marked1) 010<address> (remapped) 100<address>
  • 75. 75 • • Pages are multiple of 2MB • 3 different groups • Small: 2MB pages with object size <= 256KB • Medium: 32MB pages with object size <= 4MB • Large: 2MB pages, objects span over multiple of them • Objects in Large group are meant to not to be relocated (too expensive) Page Allocations
  • 76. 76 • • Handling remapping • C4: Memory protection + trap • Z: mask in colored pointer • Unmasking ref addresses • C4: Kernel module aliasing • Z: Multi-mapping or HW support • Pages & Relocation • C4: • Page are fixed to match OS size (mem protection) • relocation for large objects by remapping • Z: • zPages are dynamic, a zPage can be 100MB large • No relocation for large objects Difference between C4 & Z GC
  • 77. How to choose a GC algorithm
  • 78. 78 • • You have to run on Windows • Shenandoah • Battlefield tested GC (maturity) • C4 • Shenandoah • Minimizing any kind of JVM pauses • C4 • Z • You don’t want pay for it: • Shenandoah • Z Low latency GCs
  • 80. 80 • • Java Garbage Collection distilled by Martin Thompson • The Java GC mini book • Oracle’s white paper on JVM memory management & GC • What differences JVM makes by Nitsan Wakart • Memory Management Reference • IBM Pause-Less GC References GC Basics
  • 81. 81 • • Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK • Shenandoah: The Garbage Collector That Could by Aleksey Shipilev • Shenandoah GC Wiki References Shenandoah
  • 82. 82 • • The Pauseless GC algorithm (2005) • C4: Continuously Concurrent Compacting Collector (2011) • Azul GC in Detail by Charles Humble • 2010 version source code References C4
  • 83. 83 • • ZGC - Low Latency GC for OpenJDK by Per Liden • Java's new Z Garbage Collector (ZGC) is very exciting by Richard Warburton • A first look into ZGC by Dominik Inführ • Architectural Comparison with C4/Pauseless References ZGC