2. Important Disclaimers
• THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
• WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION
CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED.
• ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED
ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR
INFRASTRUCTURE DIFFERENCES.
• ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
• IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT
PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
• IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT
OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
• NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS
OR THEIR SUPPLIERS AND/OR LICENSORS
7. Garbage Collection
“Garbage Collection (GC) is a form of automatic memory
management. The garbage collector attempts to reclaim
memory occupied by objects that are no longer in use by the
application.”
8. Garbage Collection
Positives
• Automatic memory management
• Helps reduce certain categories of bugs

Negatives
• Requires additional resources
• Causes unpredictable pauses
• May introduce runtime costs
• The application has little control over when memory is reclaimed
10. Garbage Collection Policies
Selected with the -Xgcpolicy: command-line option:
• optthruput – stop-the-world parallel collector
• optavgpause – concurrent collector
• gencon – generational copying collector
• balanced – region-based collector
• metronome – incremental soft-realtime collector
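The policy is chosen once, at JVM launch. A minimal sketch (the jar name is a placeholder):

```shell
# Select the generational copying collector on an IBM J9 / OpenJ9 JVM.
# "myapp.jar" is a placeholder application.
java -Xgcpolicy:gencon -jar myapp.jar
```

Only one policy can be active per JVM; omitting the flag uses the default, which is gencon on recent J9/OpenJ9 releases.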
11. -Xgcpolicy:optthruput
Parallel global collector
• All GC work is completed in stop-the-world (STW) pauses
• Mark, sweep and optional compaction collector
• GC operations are performed in parallel by multiple helper threads
• GC native memory overhead for the mark map and work packets
13. -Xgcpolicy:optthruput heap
Flat heap – one contiguous block of memory
[Heap diagram: SOA | LOA]
The heap is divided into 2 logical memory areas for allocation:
• SOA – small object area
• LOA – large object area
15. -Xgcpolicy:optthruput GC
Each GC is divided into 3 phases:
• Marking – finds all of the live objects
• Sweeping – reclaims the memory occupied by dead objects
• Compaction (optional) – defragments the heap
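The mark and sweep phases can be sketched over a toy object graph. This is illustrative only; the class and method names are invented for the example and are unrelated to J9's real implementation.

```java
import java.util.*;

// Toy mark-sweep over a tiny object graph (invented names, not J9 code).
class MarkSweepDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: flag every object reachable from the roots.
    static void mark(List<Obj> roots) {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;
            o.marked = true;
            stack.addAll(o.refs);
        }
    }

    // Sweep phase: drop unmarked (dead) objects, clear marks for the next cycle.
    static void sweep(List<Obj> heap) {
        heap.removeIf(o -> !o.marked);
        heap.forEach(o -> o.marked = false);
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b);                                  // a -> b; c is unreachable
        List<Obj> heap = new ArrayList<>(List.of(a, b, c));
        mark(List.of(a));                               // a is the only root
        sweep(heap);
        heap.forEach(o -> System.out.println(o.name));  // survivors: a, b
    }
}
```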
17. -Xgcpolicy:optavgpause
Concurrent global collector
• Improves STW pause times and application responsiveness
• Introduces the requirement for an object write barrier
• GC native memory overhead for the mark map, work packets and card table
18. -Xgcpolicy:optavgpause heap
Heap layout is the same as optthruput:
[Heap diagram: SOA | LOA]
• Allocation is the same as with optthruput
• A concurrent GC cycle starts before heap memory is exhausted
• The application runs during the GC, so allocations continue
25. -Xgcpolicy:optavgpause GC
Write Barrier
How is the write barrier implemented?

private void setField(Object A, Object C) {
    A.field1 = C;
    if (concurrentGCActive) {
        cardTable->dirtyCard(A); // record that A's card was modified
    }
}
26. -Xgcpolicy:gencon
Generational copying collector
• Provides a significant reduction in GC STW pause times
• Introduces another write barrier for the remembered set
• Reuses the concurrent global collector from optavgpause
28. -Xgcpolicy:gencon heap
Heap is divided into Nursery and Tenure spaces
[Heap diagram: Allocate | Survivor | Tenure]
The Nursery is divided into 2 logical spaces: Allocate and Survivor
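The copying idea behind a nursery (scavenge) collection can be sketched roughly: live objects are evacuated from the Allocate space into the Survivor space, then the two spaces swap roles. All names and structure below are invented for illustration and simplify the real algorithm.

```java
import java.util.*;

// Toy semi-space copy (scavenge): invented names, not J9's implementation.
class ScavengeDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj copy;                                   // forwarding pointer once evacuated
        Obj(String name) { this.name = name; }
    }

    // Evacuate everything reachable from the roots into a fresh "survivor" space.
    static List<Obj> scavenge(List<Obj> roots) {
        List<Obj> evacuated = new ArrayList<>();
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (o.copy != null) continue;           // already evacuated
            o.copy = new Obj(o.name);
            evacuated.add(o);
            work.addAll(o.refs);
        }
        // Fix up references so the copies point at other copies.
        List<Obj> survivor = new ArrayList<>();
        for (Obj o : evacuated) {
            for (Obj r : o.refs) o.copy.refs.add(r.copy);
            survivor.add(o.copy);
        }
        return survivor;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b");
        a.refs.add(b);
        new Obj("c");                               // unreachable: never evacuated
        System.out.println(scavenge(List.of(a)).size()); // prints 2
    }
}
```

Dead objects are simply never copied, which is why nursery collections are cheap when most young objects die quickly.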
31. -Xgcpolicy:gencon GC
Write Barrier
Why do we need a write barrier?
The GC needs to be able to find objects in the nursery that are only
referenced from tenure space.
32. -Xgcpolicy:gencon GC
Write Barrier
How is the write barrier implemented? Start from a plain field store:

private void setField(Object A, Object C) {
    A.field1 = C;
}
33. -Xgcpolicy:gencon GC
Write Barrier
How is the write barrier implemented?

private void setField(Object A, Object C) {
    A.field1 = C;
    if (A is tenured) {
        if (C is NOT tenured) {
            remember(A);
        }
    }
}
34. -Xgcpolicy:gencon GC
Write Barrier
Combining the generational and concurrent barriers:

private void setField(Object A, Object C) {
    A.field1 = C;
    if (A is tenured) {
        if (C is NOT tenured) {
            remember(A);             // remembered set for the next nursery collection
        }
        if (concurrentGCActive) {
            cardTable->dirtyCard(A); // card table for the concurrent global mark
        }
    }
}
35. -Xgcpolicy:metronome
Incremental soft-realtime collector
• Provides a significant reduction in max GC STW pause times
• Uses a Snapshot At The Beginning (SATB) write barrier
• Produces a high percentage of floating garbage
37. -Xgcpolicy:metronome heap
Cell-based allocation
[Heap diagram: regions of fixed-size cells]
• Regions are assigned a cell size on first use
• Cell sizes range from 16 bytes to 2048 bytes
• Only objects of that cell size can be allocated in a region of a particular cell size
38. -Xgcpolicy:metronome heap
Objects larger than 2048 bytes consume contiguous regions
• If no contiguous regions are available, a full STW GC is completed before OOM
Large arrays are allocated as arraylets
• Arraylet leaves are 2048 bytes
• Each arraylet leaf region contains 32 arraylet leaves
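To make the leaf layout concrete, here is a sketch of how an element index could map onto 2048-byte leaves, assuming 4-byte int elements and ignoring object headers and alignment; the real J9 layout differs in detail.

```java
// Illustrative arraylet index math: 2048-byte leaves, 4-byte int elements.
// Headers and alignment are ignored; this is not J9's actual layout.
class ArrayletMath {
    static final int LEAF_BYTES = 2048;
    static final int INTS_PER_LEAF = LEAF_BYTES / Integer.BYTES; // 512

    // Which leaf holds element i of an int array?
    static int leafIndex(int i) { return i / INTS_PER_LEAF; }

    // Offset of element i within its leaf.
    static int leafOffset(int i) { return i % INTS_PER_LEAF; }

    public static void main(String[] args) {
        // Element 1000 lives in leaf 1, at offset 488.
        System.out.println(leafIndex(1000) + " " + leafOffset(1000));
    }
}
```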
40. -Xgcpolicy:metronome
Write Barrier
Snapshot At The Beginning (SATB)
• The barrier has to be performed before the store
• Causes a lot of floating garbage
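Following the style of the earlier barrier slides, a sketch of what an SATB pre-write barrier could look like. All names here are invented; the point is only that the old value is recorded before it is overwritten, so everything live in the initial snapshot stays visible to the marker.

```java
import java.util.*;

// Toy SATB pre-write barrier (invented names, not J9's implementation).
class SatbDemo {
    static boolean concurrentGCActive = false;
    static final List<Object> satbBuffer = new ArrayList<>(); // snapshot log

    static class Holder { Object field1; }

    static void setField(Holder a, Object c) {
        if (concurrentGCActive && a.field1 != null) {
            satbBuffer.add(a.field1); // record the OLD value before the store
        }
        a.field1 = c;                 // the store happens after the barrier
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        h.field1 = "old";
        concurrentGCActive = true;
        setField(h, "new");
        System.out.println(satbBuffer); // prints [old]
    }
}
```

Because overwritten values are kept alive until marking finishes even if nothing references them anymore, SATB naturally produces the floating garbage mentioned above.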
46. -Xgcpolicy:balanced
Region-based generational collector
• Provides a significant reduction in max GC STW pause times
• Introduces a write barrier to track inter-region references
• A full parallel global collection is the fallback if the PGCs (partial garbage collections) cannot keep up
47. -Xgcpolicy:balanced heap
Heap is divided into a fixed number of regions
• Region size is always a power of 2
• The GC attempts to have between 1000 and 2000 regions
• Bigger heap == bigger region size
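As a rough illustration of that relationship, one could pick the smallest power-of-two region size that keeps the region count near the target. The minimum size and the exact target are assumptions for this sketch; the actual OpenJ9 sizing heuristics are more involved.

```java
// Illustrative only: smallest power-of-two region size giving <= 2000 regions.
// The 512 KiB minimum and the 2000 cap are assumptions for this sketch.
class RegionSizeDemo {
    static long regionSize(long heapBytes) {
        long size = 512 * 1024;              // assumed minimum region size
        while (heapBytes / size > 2000) {
            size *= 2;                       // bigger heap -> bigger regions
        }
        return size;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30, fourGiB = 1L << 32;
        System.out.println(regionSize(oneGiB));  // 1 MiB -> 1024 regions
        System.out.println(regionSize(fourGiB)); // 4 MiB -> 1024 regions
    }
}
```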
49. -Xgcpolicy:balanced heap
No non-array object can be larger than a region
• if (object_size > region_size) throw OutOfMemoryError
Large arrays are allocated as arraylets
• Arrays smaller than the region size are allocated as normal arrays
51. -Xgcpolicy:balanced Global Mark Phase (GMP)
• Does not reclaim any memory – performs a marking phase only
• Scheduled to run in between PGCs
• Builds an accurate mark map of the whole heap
• The mark map is used to predict each region's ROI (return on investment) for PGCs
• Scrubs the RSCL (Remembered Set Card List)
52. -Xgcpolicy:balanced
Write Barrier
Why do we need a write barrier?
Balanced PGCs can select any region to be included in the collection set.
Similar to the generational barrier, the GC needs to know which regions
reference a given region.
53. -Xgcpolicy:balanced
Write Barrier
How is the write barrier implemented? Again, start from a plain field store:

private void setField(Object A, Object C) {
    A.field1 = C;
}
54. -Xgcpolicy:balanced
Write Barrier

private void setField(Object A, Object C) {
    A.field1 = C;
    dirtyCard(A);
}

private void checkCards() { // at the beginning of a PGC
    for (each dirty card) {
        if (findRegion(A) != findRegion(C)) {
            addRSCLEntryFor(C, A);
        }
    }
}
55. GC Policy Quick Guide
GC Policy    | Generational      | Concurrent           | Defragmentation
gencon       | tenure / nursery  | concurrent mark      | tenure / nursery
balanced     | regions have ages | Global Mark Phase    | region based
metronome    | N/A               | incremental realtime | region based
optthruput   | N/A               | N/A (STW)            | SOA / LOA
optavgpause  | N/A               | concurrent mark      | SOA / LOA
56. GC Policy Quick Guide
GC Policy   | Domain                                                  | Memory in addition to heap (% of Xmx) | Throughput
gencon      | Web servers, desktop applications; runs most apps well  | 9    | Highest
balanced    | Very large heaps, large NUMA systems                    | 13.5 | High
metronome   | Soft-realtime systems, trading applications, etc.       | 4.5  | Low-Medium
optthruput  | Small heaps with little GC                              | 4.5  | Medium
optavgpause | Why not gencon?                                         | 6    | Low-Medium
58. Arraylets
Large arrays that cannot fit into a single region:
• The array is created from a construct comprising an arraylet spine and 1 or more arraylet leaves
• The arraylet spine is allocated like a normal object
• Each leaf consumes an entire region**
60. Arraylets
Arraylets were introduced so that arrays are stored more cleverly
in the heap for the balanced and metronome GC policies.
Some APIs require a contiguous view of an array.
61. Arraylets
Some APIs require a contiguous view of an array.
The case of the Java Native Interface (JNI):
JNI critical sections are used when the programmer wants direct
addressability of the object.
66. Arraylets
Double Mapping
Make large arrays (discontiguous arraylets) look contiguous.
Virtual memory address space is large on 64-bit systems (2^64 addresses,
compared to 2^32 on 32-bit systems).
Physical memory is limited.
67. Arraylets
Double Mapping
Map 2 virtual memory addresses to the same physical memory address.
Any modification through the newly mapped address is reflected in
the original array data, and vice-versa.
73. Appendix
Case study: gencon GC (XML validation service)
Heap size: 1280 MB
• Nursery: 320 MB
• Tenure: 960 MB
More than 1000 nursery collections (scavenge time) and 10 global collections
~ 20% of runtime spent in GC
74. Appendix
Case study: gencon GC (XML validation service)
Heap size: 1280 MB
• Nursery: 640 MB
• Tenure: 640 MB
Fewer than 500 nursery collections (scavenge time) and no global collections
~ 8% of runtime spent in GC
Result: smoother pause behavior.
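The nursery resize above maps onto standard J9/OpenJ9 sizing flags. A sketch, with a placeholder jar name for the service:

```shell
# Fixed 1280 MB heap with a 640 MB nursery, as in the second run.
# "validator.jar" stands in for the XML validation service.
java -Xgcpolicy:gencon -Xms1280m -Xmx1280m -Xmn640m -jar validator.jar
```

-Xmn fixes both the initial and maximum nursery size; the remaining 640 MB is left for tenure space.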