G1 GC Presentation @ JavaOne 2013
Sneak a peek under the hood of the latest and coolest garbage collector, Garbage-First!
Dive deep into G1's adaptability and ergonomics
Discuss the future of G1's adaptability
Monica Beckwith, Java Champion. Java/JVM Performance - JavaOne Rock Star at Microsoft
2. What's the talk about?
Sneak a peek under the hood of the latest and coolest garbage collector, Garbage-First!
Dive deep into G1's adaptability and ergonomics
Discuss the future of G1's adaptability
3. About the Presenters
Charlie Hunt: Architect, Performance Engineering, Salesforce
Monica Beckwith: Performance Architect, Servergy
John Cuthbertson: GC Guru, Azul Systems
4. Agenda
Key Concepts
Definition
Ergonomics
Adaptability (if any)
Future Adaptability Needs of G1
Need More Information?
Questions? - Let's Discuss!
5. Key Concepts
Remembered Sets (RSets)
Marking Threshold and Concurrent Cycle
Collection Set (CSet) and Collections
Mixed Collection
Young Collection
Old Regions Collection
Humongous Allocations
Evacuation Failures
Reference Processing
7. Remembered Sets
Per-region entries
Each RSet keeps track of outside references into its “owning” region
RSets help regions to be independently GC'd
No need to scan the entire heap!
G1 maintains RSets for –
old-to-young and
old-to-old references
8. Remembered Sets (RSets)
[Figure: Regions 1, 2, and 3, each with its own RSet (RSet for Region 1, RSet for Region 2, RSet for Region 3)]
* Figure referenced from InfoQ article: http://www.infoq.com/articles/tuning-tips-G1-GC
9. RSets – Ergonomics and Adaptability
Concurrent Refinement threads –
Help maintain RSets
Concurrent updating
Post-write Barriers –
After a write
Help track cross region updates
Coarsening –
RSets transitioning through different levels of
granularity
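The post-write barrier idea can be illustrated with a toy model. This is only a sketch under stated assumptions: addresses are plain numbers, regions are 1 MB, cards are 512 bytes, and the barrier updates the RSet directly, whereas in G1 the barrier hands work to update buffers that the concurrent refinement threads process. The class and method names are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: real G1 barriers are generated code working on a card table.
final class RSetBarrierSketch {
    static final int REGION_SIZE = 1 << 20; // assumed 1 MB regions
    static final int CARD_SIZE = 512;       // assumed 512-byte cards

    // Owning region index -> cards outside that region holding references into it.
    private final Map<Integer, Set<Long>> rsets = new HashMap<>();

    static int regionOf(long address) { return (int) (address / REGION_SIZE); }
    static long cardOf(long address) { return address / CARD_SIZE; }

    // Runs after a reference to targetAddress is stored in the field at fieldAddress.
    void postWriteBarrier(long fieldAddress, long targetAddress) {
        int from = regionOf(fieldAddress);
        int to = regionOf(targetAddress);
        if (from == to) {
            return; // only cross-region updates need tracking
        }
        rsets.computeIfAbsent(to, r -> new HashSet<>()).add(cardOf(fieldAddress));
    }

    Set<Long> rsetFor(int region) {
        return rsets.getOrDefault(region, Collections.<Long>emptySet());
    }

    public static void main(String[] args) {
        RSetBarrierSketch heap = new RSetBarrierSketch();
        heap.postWriteBarrier(2L * REGION_SIZE + 1024, 5L); // region 2 -> region 0
        System.out.println("RSet for region 0: " + heap.rsetFor(0));
    }
}
```

When region 0 is collected, only the cards listed in its RSet need scanning, not the whole heap.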
10. RSet - Concurrent Refinement
Concurrent processing of the filled update
buffers
Tiered deployment
Max number can be set by
-XX:G1ConcRefinementThreads
The processing of update buffers can eventually
fall to the mutator (application) threads
Need to avoid such a scenario
11. RSet - Coarsenings
Three levels of granularity –
Sparse
Hash table of card indices; Fastest to scan
Fine
Card indices held in a bitmap; Scan not as fast as Sparse table
Coarse
One bit for each region; Slowest to scan
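The three granularity levels can be sketched as a small state machine. This is an illustrative model, not G1's data structures: it tracks the entries for a single source region, and the limits `SPARSE_LIMIT` and `FINE_LIMIT` are hypothetical values chosen to keep the example short.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Toy model of RSet coarsening for one (owning region, source region) pair.
final class CoarseningSketch {
    enum Level { SPARSE, FINE, COARSE }

    static final int SPARSE_LIMIT = 4;        // hypothetical: cards kept in the hash table
    static final int FINE_LIMIT = 16;         // hypothetical: cards kept in the bitmap
    static final int CARDS_PER_REGION = 2048; // assumed 1 MB region with 512-byte cards

    private Level level = Level.SPARSE;
    private final Set<Integer> sparse = new HashSet<>();
    private final BitSet fine = new BitSet(CARDS_PER_REGION);

    void addCard(int cardIndex) {
        switch (level) {
            case SPARSE:
                sparse.add(cardIndex);
                if (sparse.size() > SPARSE_LIMIT) {
                    // Too many entries for the hash table: move them into a bitmap.
                    for (int card : sparse) {
                        fine.set(card);
                    }
                    sparse.clear();
                    level = Level.FINE;
                }
                break;
            case FINE:
                fine.set(cardIndex);
                if (fine.cardinality() > FINE_LIMIT) {
                    // Give up per-card precision: remember only "this region refers to us".
                    fine.clear();
                    level = Level.COARSE;
                }
                break;
            default:
                break; // COARSE: nothing more to record
        }
    }

    Level level() { return level; }

    // Cards a collector would have to scan for this source region.
    int cardsToScan() {
        switch (level) {
            case SPARSE: return sparse.size();
            case FINE: return fine.cardinality();
            default: return CARDS_PER_REGION; // the whole source region
        }
    }
}
```

The trade-off shows up in `cardsToScan()`: once an entry is coarse, the entire source region has to be scanned, which is why coarse entries are the slowest.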
12. Key Concept – Marking Threshold and Concurrent Cycle
13. Marking Threshold and Concurrent Cycle
Threshold default: 45% of your Java heap
-XX:InitiatingHeapOccupancyPercent=<value>
When the threshold is crossed, G1 starts a concurrent cycle
Some phases are concurrent and some are stop-the-world
Multi-phased concurrent marking cycle finds the
“best” regions to be collected
Live-ness accounting
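The threshold check can be sketched in a few lines. This is a minimal model, assuming the threshold is simply the heap capacity scaled by InitiatingHeapOccupancyPercent and that a pending allocation counts toward occupancy; the 3 GB heap and the method names are illustrative assumptions.

```java
// Sketch only: models the marking-threshold check, not HotSpot's actual code.
final class IhopSketch {
    // Threshold in bytes for a given heap capacity and -XX:InitiatingHeapOccupancyPercent.
    static long markingThresholdBytes(long heapCapacityBytes, int ihopPercent) {
        return (heapCapacityBytes / 100) * ihopPercent;
    }

    // True when current occupancy plus a pending allocation crosses the threshold.
    static boolean shouldStartConcurrentCycle(long occupiedBytes, long allocationBytes,
                                              long heapCapacityBytes, int ihopPercent) {
        return occupiedBytes + allocationBytes
                > markingThresholdBytes(heapCapacityBytes, ihopPercent);
    }

    public static void main(String[] args) {
        long heap = 3L * 1024 * 1024 * 1024; // assumed 3 GB heap
        System.out.println("threshold bytes: " + markingThresholdBytes(heap, 45));
        System.out.println("start cycle: "
                + shouldStartConcurrentCycle(1_400_000_000L, 4_194_320L, heap, 45));
    }
}
```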
14. Marking Threshold and Concurrent Cycle
After the marking phase is complete, G1 has
information on which old regions to collect
Regions are ordered based on “collection
efficiency”
Expensive regions would be regions with lots of live
data and large RSets
Completely free regions are collected during
cleanup phase
Examples are shown later in this presentation …
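One way to picture "collection efficiency" is reclaimable space divided by predicted collection cost. The sketch below assumes a made-up linear cost model (a fixed cost plus a cost per live KB and per RSet card); G1 bases its predictions on measurements from earlier pauses, so the constants and class names here are purely hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: order old regions so the cheapest, most rewarding ones come first.
final class EfficiencySketch {
    // Hypothetical cost constants, for illustration only.
    static final double FIXED_MS = 0.1;
    static final double MS_PER_LIVE_KB = 0.001;
    static final double MS_PER_RSET_CARD = 0.0005;

    static final class OldRegion {
        final String name;
        final long regionKb;
        final long liveKb;
        final long rsetCards;

        OldRegion(String name, long regionKb, long liveKb, long rsetCards) {
            this.name = name;
            this.regionKb = regionKb;
            this.liveKb = liveKb;
            this.rsetCards = rsetCards;
        }

        // More live data to copy and a larger RSet to scan both raise the cost.
        double predictedMs() {
            return FIXED_MS + liveKb * MS_PER_LIVE_KB + rsetCards * MS_PER_RSET_CARD;
        }

        // Reclaimable KB per millisecond of predicted work.
        double efficiency() {
            return (regionKb - liveKb) / predictedMs();
        }
    }

    static List<OldRegion> orderByEfficiency(List<OldRegion> candidates) {
        List<OldRegion> sorted = new ArrayList<>(candidates);
        Comparator<OldRegion> byEfficiency = Comparator.comparingDouble(OldRegion::efficiency);
        sorted.sort(byEfficiency.reversed());
        return sorted;
    }
}
```

A region with little live data and a small RSet sorts ahead of a mostly live region with a large RSet, which matches the notion of "expensive" regions above.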
15. Marking Threshold – Example 1
[Figure: two heap-occupancy graphs, "Default IHOP" and "IHOP increased to 75%"; each shows the Java heap size, the max heap occupancy, and the marking threshold]
16. Marking Threshold – Example 2
[Figure: share of young vs. mixed GCs with "Default IHOP" and with "IHOP increased to 70%"; chart labels, in extraction order: 25% Young GCs, 60% Mixed GCs, 30% Mixed GCs, 64% Young GCs]
Better to do more young GCs than mixed GCs
18. Collection Set
A set of regions to be collected during a GC
evacuation pause
Young Collection will have only and all young
regions in the CSet
Mixed Collection will have both young and old
regions in the CSet
Live data in the CSet is evacuated/copied
during a GC cycle
19. Collection Set – Young Collection
Look for the following in PrintAdaptiveSizePolicy enabled log –
6676.431: [GC pause (G1 Evacuation Pause) (young) 6676.431: [G1Ergonomics (CSet
Construction) start choosing CSet, _pending_cards: 147002, predicted base time: 42.39 ms,
remaining time: 157.61 ms, target pause time: 200.00 ms]
6676.431: [G1Ergonomics (CSet Construction) add young
regions to CSet, eden: 950 regions, survivors: 74 regions,
predicted young region time: 160.72 ms]
6676.431: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 950 regions,
survivors: 74 regions, old: 0 regions, predicted pause time: 203.11 ms, target pause time:
200.00 ms]
20. Collection Set – Mixed Collection
5884.952: [GC pause (G1 Evacuation Pause) (mixed) 5884.952: [G1Ergonomics (CSet
Construction) start choosing CSet, _pending_cards: 167183, predicted base time: 48.30 ms,
remaining time: 151.70 ms, target pause time: 200.00 ms]
5884.952: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 952 regions,
survivors: 72 regions, predicted young region time: 225.39 ms]
5884.954: [G1Ergonomics (CSet Construction) finish
adding old regions to CSet, reason: reclaimable percentage
not over threshold, old: 134 regions, max: 308 regions,
reclaimable: 1285706456 bytes (9.98 %), threshold: 10.00 %]
5884.954: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 952 regions,
survivors: 72 regions, old: 134 regions, predicted pause time: 354.55 ms, target pause time:
200.00 ms]
21. Collection Set – Mixed Collection
[Figure: grid of G1 regions with a legend for Young Regions, Old Regions, and Humongous Regions (H); some regions carry a ✓ mark]
* Figure shows both young and old collection sets.
22. Mixed Collection - Copying
[Figure: the same region grid, with ✓ marks and H (humongous) regions]
Live objects are copied into the “to-space” regions:
- Survivor Regions
- Old Regions
23. Mixed Collection – Reclamation
[Figure: the same region grid, with ✓ marks and H (humongous) regions]
G1 achieves compaction via copying
24. CSet – Ergonomics and Adaptability
CSet for young collection depends entirely on –
The pause time target
-XX:MaxGCPauseMillis=<value>
All young regions are included in the CSet
CSet for mixed collection –
Could have a minimum number of old regions based on
-XX:G1MixedGCCountTarget
But will never exceed the max number of old regions set by
-XX:G1OldCSetRegionThresholdPercent
25. CSet – Ergonomics and Adaptability
-XX:G1MixedGCCountTarget (defaults to 8) – sets a target number of
mixed collections by setting a minimum on the old regions
to collect per mixed collection; the goal is to not exceed the
count target during the mixed collection cycle
-XX:G1OldCSetRegionThresholdPercent (defaults to 10) – sets
an upper limit on the number of old regions that can be
collected during a mixed collection (it's expressed as a
percentage of the total heap)
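The two limits can be written as simple arithmetic. This sketch assumes the minimum is the candidate old regions divided by G1MixedGCCountTarget and the maximum is G1OldCSetRegionThresholdPercent of the heap's regions, both rounded up; the method names and the 3072-region heap (12 GB with 4 MB regions) are illustrative assumptions.

```java
// Sketch of the min/max bounds on old regions per mixed collection.
final class OldCSetLimitsSketch {
    // Minimum old regions per mixed GC so the candidates are drained
    // within roughly mixedGcCountTarget collections (rounded up).
    static int minOldRegions(int candidateOldRegions, int mixedGcCountTarget) {
        return (candidateOldRegions + mixedGcCountTarget - 1) / mixedGcCountTarget;
    }

    // Upper bound on old regions per mixed GC, as a percentage of all heap regions (rounded up).
    static int maxOldRegions(int totalHeapRegions, int oldCSetRegionThresholdPercent) {
        return (totalHeapRegions * oldCSetRegionThresholdPercent + 99) / 100;
    }

    public static void main(String[] args) {
        int heapRegions = 3072; // assumed: 12 GB heap with 4 MB regions
        System.out.println("max old regions: " + maxOldRegions(heapRegions, 10));
        System.out.println("min old regions: " + minOldRegions(1136, 8));
    }
}
```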
26. CSet – Ergonomics and Adaptability
Snippets from a PrintAdaptiveSizePolicy enabled
log –
<finish adding old regions to CSet, reason: predicted time is too high,
predicted time: 2.27 ms, remaining time: 0.00 ms, old: 12 regions,
min: 12 regions>
<finish adding old regions to CSet, reason: old CSet region num
reached min, old: 142 regions, min: 142 regions>
<finish adding old regions to CSet, reason: reclaimable percentage
not over threshold, old: 134 regions, max: 308 regions>
27. Mixed Collection – Ergonomics and Adaptability
Young regions – all young regions are
included in a collection
Old regions are selected based on –
-XX:G1MixedGCLiveThresholdPercent (defaults to 65) – sets a limit
on the live objects' occupancy such that any old region above that
threshold will not be included in any mixed collection
Remember GC efficiency?
-XX:G1HeapWastePercent (defaults to 10) – sets the waste target i.e.
the percentage of heap space you are willing to never collect in an
effort to avoid expensive GCs
Remember the log outputs that mentioned “reclaimable percentage not
over threshold”?
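The two filters can be sketched as percentage checks. This is a simplified model with made-up method names: a region is a candidate when its live occupancy does not exceed G1MixedGCLiveThresholdPercent, and mixed collections stay worthwhile only while the reclaimable space exceeds G1HeapWastePercent of the heap. The 12 GB heap used in `main` is an assumption.

```java
// Sketch of old-region candidate filtering and the heap-waste stop condition.
final class MixedGcSelectionSketch {
    // Regions whose live occupancy is above the threshold are never candidates.
    static boolean isCandidate(long liveBytes, long regionBytes, int liveThresholdPercent) {
        return liveBytes * 100.0 / regionBytes <= liveThresholdPercent;
    }

    // Mixed GCs are worth continuing only while reclaimable space exceeds the waste target.
    static boolean worthContinuing(long reclaimableBytes, long heapBytes, int heapWastePercent) {
        return reclaimableBytes * 100.0 / heapBytes > heapWastePercent;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long heap = 12L * 1024 * mb; // assumed 12 GB heap
        System.out.println(isCandidate(3 * mb, 4 * mb, 65));          // 75% live
        System.out.println(worthContinuing(1_285_706_456L, heap, 10)); // just under 10%
    }
}
```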
28. Young Collection – Ergonomics and Adaptability
Young generation size is based on your pause time
target and internally set min and max bounds
-XX:MaxGCPauseMillis=200 (default)
Default min nursery size = 5% of your Java heap
Default max nursery size = 60% of your Java heap
Prediction logic
Determines how much time it will take to collect 1 region
(Re-)Sizes the young generation accordingly after each
collection
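The sizing logic can be approximated as "fit as many young regions as the pause budget allows, then clamp to the bounds". The sketch below assumes a linear cost per region and a fixed base cost per pause, which is a deliberate simplification of G1's prediction logic; the method name and parameters are hypothetical.

```java
// Sketch: young generation size in regions from a pause-time budget.
final class YoungSizingSketch {
    static final int MIN_YOUNG_PERCENT = 5;  // default min nursery size
    static final int MAX_YOUNG_PERCENT = 60; // default max nursery size

    static int youngRegions(int heapRegions, double pauseTargetMs,
                            double predictedBaseMs, double predictedMsPerRegion) {
        int min = (int) Math.ceil(heapRegions * MIN_YOUNG_PERCENT / 100.0);
        int max = (int) Math.floor(heapRegions * MAX_YOUNG_PERCENT / 100.0);
        // How many regions fit in the time left after the fixed part of the pause.
        int fit = (int) Math.floor((pauseTargetMs - predictedBaseMs) / predictedMsPerRegion);
        return Math.max(min, Math.min(max, fit));
    }

    public static void main(String[] args) {
        // 2048 regions, 200 ms target, 40 ms base cost, 0.25 ms per region (all assumed).
        System.out.println(youngRegions(2048, 200.0, 40.0, 0.25));
    }
}
```

If collecting a region turns out to be more expensive than predicted, the next resize shrinks the young generation, down to the minimum bound.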
29. Old Regions Collection - Ergonomics
During Mixed GCs
Based on the criteria mentioned earlier
809.925: [G1Ergonomics (Mixed GCs) start
mixed GCs, reason: candidate old regions
available, candidate old regions: 4273 regions,
reclaimable: 111102028112 bytes (53.89 %),
threshold: 10.00 %]
30. Old Regions Collection - Ergonomics
During concurrent cycle
Entirely free (i.e. full of garbage) regions are collected
6530.615: [GC cleanup 13G->12G(18G), 0.0388540 secs]
During Full GCs
Collects and compacts all regions
154.725: [Full GC 1018M->369M(1024M), 1.6437640 secs]
[Eden: 0.0B(51.0M)->0.0B(51.0M) Survivors: 0.0B->0.0B Heap:
1018.3M(1024.0M)->369.4M(1024.0M)]
32. Humongous Objects – What Are They?
Objects that span >= 50% of G1's region size
Ideally –
Not that many in number
Are long lived
Allocated directly into the old generation into
Humongous Regions
Avoids unnecessary copying back and forth and
expensive promotions
Larger objects will need contiguous regions
33. Humongous Objects – WWG1D?
[Figure: a young generation G1 region and an old generation G1 region, with four objects to place: Object 1 < 50% of G1 region, Object 2 == 50% of G1 region, Object 3 > 50% of G1 region, Object 4 > G1 region. Where does each one go?]
34. Humongous Objects – WWG1D?
[Figure: the same young generation and old generation G1 regions with the four objects placed: Object 1 < 50% of G1 region, Object 2 == 50% of G1 region, Object 3 > 50% of G1 region, Object 4 > G1 region]
* Object 4 will need contiguous regions
35. Humongous Objects
Object 1 – NOT Humongous
Object 2 – Humongous
Object 3 – Humongous
Object 4 – Humongous
* Object 4 needs contiguous regions
[Figure: the four objects laid out in G1 regions; one area is labeled "Wasted Space"]
36. Humongous Objects – What’s The Catch?
As of 7u40, initial heap size (-Xms) determines
the region size
G1 strives for 2048 regions
Region size is a power of 2 ranging from 1M to 32M
If there is a vast difference between the initial
and the max heap, normal objects may seem
humongous to G1!
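The region-size ergonomics can be sketched as follows, assuming the size is the initial heap divided by the 2048-region target, rounded down to a power of 2 and clamped to the 1M to 32M range. The rounding direction and the method name are assumptions made for this sketch.

```java
// Sketch: derive a G1 region size from the initial heap size (-Xms).
final class RegionSizeSketch {
    static final long MB = 1024L * 1024;
    static final long MIN_REGION_BYTES = 1 * MB;
    static final long MAX_REGION_BYTES = 32 * MB;
    static final long TARGET_REGION_COUNT = 2048;

    static long regionSizeBytes(long initialHeapBytes) {
        long ideal = Math.max(initialHeapBytes / TARGET_REGION_COUNT, 1);
        long powerOfTwo = Long.highestOneBit(ideal); // round down to a power of 2
        return Math.max(MIN_REGION_BYTES, Math.min(MAX_REGION_BYTES, powerOfTwo));
    }

    public static void main(String[] args) {
        // A small -Xms yields small regions even if -Xmx is much larger.
        System.out.println(regionSizeBytes(1024 * MB) / MB + "M regions for a 1 GB initial heap");
        System.out.println(regionSizeBytes(8 * 1024 * MB) / MB + "M regions for an 8 GB initial heap");
    }
}
```

With small regions, the 50% humongous threshold is also small, so objects of a few hundred kilobytes can end up treated as humongous.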
37. Humongous Objects – What’s The Catch?
Let's look at this PrintAdaptiveSizePolicy enabled
log snippet (Note: G1 region size was 4MB):
1361.680: [G1Ergonomics (Concurrent Cycles)
request concurrent cycle initiation, reason:
occupancy higher than threshold, occupancy:
1459617792 bytes, allocation request: 4194320
bytes, threshold: 1449551430 bytes (45.00 %),
source: concurrent humongous allocation]
38. Humongous Objects – What’s The Catch?
Notes from earlier snippet –
Concurrent cycle requested
Reason: Occupancy was higher than threshold
Allocation size was 4194320 bytes
Greater than 4MB, hence humongous allocation
Notes from the log (not shown here) –
Too many humongous allocations
Concurrent cycles can't keep up with the allocations
Resulting in to-space exhausted messages and eventually Full GCs
39. Humongous Objects – How to “Fix” it?
Let's find a region size that can accommodate
the humongous objects as regular allocations
So the next region size up would be 8MB
But, 4.000015MB > 50% of 8MB
So, go to the next size up. i.e. 16MB
Solution – Set your region size to 16MB
-XX:G1HeapRegionSize=16M
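The search for a suitable region size can be written as a short loop. This sketch uses the deck's definition of humongous (an object spanning at least 50% of a region) and tries each power-of-2 region size from 1M to 32M; the class and method names are illustrative.

```java
// Sketch: find the smallest region size at which an allocation is no longer humongous.
final class HumongousSketch {
    static final long MB = 1024L * 1024;

    // Humongous per the definition above: the object spans >= 50% of a region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    // Returns the smallest valid region size (1M..32M, power of 2) that makes the
    // allocation a regular one, or -1 if it is humongous even with 32M regions.
    static long smallestRegularRegionSize(long objectBytes) {
        for (long region = 1 * MB; region <= 32 * MB; region *= 2) {
            if (!isHumongous(objectBytes, region)) {
                return region;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long allocation = 4_194_320L; // allocation request size from the log snippet
        System.out.println(isHumongous(allocation, 8 * MB));          // still humongous at 8M
        System.out.println(smallestRegularRegionSize(allocation) / MB + "M");
    }
}
```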
40. Humongous Allocation – Ergonomics
Humongous Regions are not included in a
mixed collection
Dead Humongous objects are collected during
cleanup and during Full GC
6569.877: [GC cleanup 6708M->6384M(12G),
0.0181200 secs]
Live Humongous objects are compacted during Full GC
42. Evacuation Failures
Evacuation failures indicate that G1 ran out of
heap regions either –
while copying to survivor regions or
while promoting or copying live objects into the
old generation
Prior to Java 7u40, evacuation failures were shown as
a “to-space overflow” in the GC logs
Java 7u40 onwards shows “to-space exhausted”
in the GC logs
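A quick way to check a GC log for evacuation failures is to count the lines carrying either marker. This is a minimal sketch that does plain substring matching on lines already read into memory; the class and method names are made up, and it does not parse the rest of the log format.

```java
import java.util.List;

// Sketch: count GC log lines that report an evacuation failure.
final class EvacuationFailureScan {
    static final String POST_7U40_MARKER = "to-space exhausted";
    static final String PRE_7U40_MARKER = "to-space overflow";

    static long countEvacuationFailures(List<String> gcLogLines) {
        long count = 0;
        for (String line : gcLogLines) {
            // Either wording indicates that G1 ran out of regions while evacuating.
            if (line.contains(POST_7U40_MARKER) || line.contains(PRE_7U40_MARKER)) {
                count++;
            }
        }
        return count;
    }
}
```

The lines could come from `java.nio.file.Files.readAllLines` on the GC log file.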
43. Evacuation Failures – How to Avoid Them?
Get a baseline with bare minimum options:
-Xmx, -Xms and -XX:MaxGCPauseMillis=<value>
Over-tuning is NOT for G1
Look at the output of PrintAdaptiveSizePolicy
Too many humongous allocations?
Increase G1HeapRegionSize
44. Evacuation Failures
Plot the heap utilization stats from the log
Marking threshold too high?
Can't keep up with promotions
Marking threshold too low?
Not reclaiming much space from marking cycle
Concurrent cycles taking a long time to
complete?
Increase the thread count: ConcGCThreads
45. Evacuation Failures
Sometimes survivor space gets exhausted
Increase the G1ReservePercent
It's a false ceiling
Defaults to 10
G1 will cap it off at 50%
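The "false ceiling" can be pictured as a number of regions held back from normal allocation. The sketch below assumes the reserve is simply G1ReservePercent of the heap's regions, rounded up, with the percentage capped at 50; the method name and rounding are assumptions.

```java
// Sketch: regions held back as a to-space reserve by -XX:G1ReservePercent.
final class ReserveSketch {
    static final int MAX_RESERVE_PERCENT = 50; // G1 caps the reserve at 50%

    static int reserveRegions(int heapRegions, int reservePercent) {
        int capped = Math.min(reservePercent, MAX_RESERVE_PERCENT);
        return (int) Math.ceil(heapRegions * capped / 100.0);
    }

    public static void main(String[] args) {
        System.out.println(reserveRegions(2048, 10)); // default setting
        System.out.println(reserveRegions(2048, 80)); // capped at 50%
    }
}
```

Raising the reserve leaves more free regions for evacuation, at the cost of less space for regular allocation.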
52. Future Adaptability
Adaptive marking threshold?
A static value doesn't cut it
CMS has adaptive marking threshold with static
override and static max value
Wouldn't it be nice if G1 had an adaptive threshold
and adaptive max value?
Adaptive region size?
Will help alleviate the issues that applications with
numerous short-lived “humongous” objects encounter
54. Future Adaptability
Improving RSets
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7187490
Improving concurrent refinement thresholds to
reduce the possibility of long RSet updating times.
Smarter/adaptable criteria for selecting CSet
for old regions
Smarter/adaptable Mixed Collections
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7173711
58. More Questions?
HOL5429: Tuning Low-Pause Garbage Collection – Tue 10 am
CON3754: G1 GC: Migration to, Expectations, and Advanced Tuning –
Wed 10 am
CON7624: Understanding Java Garbage Collection and What You Can
Do About It. – Wed 11:30 am
GC Tuning BOF4020 - Tonight @7:30 pm
JVM Performance BOF4471 – Tonight @8:30 pm
Email: hotspot-gc-use@openjdk.java.net & hotspot-gc-dev@openjdk.java.net