© 2018 DataStax, All Rights Reserved. DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries. Apache
Cassandra, Apache, Spark, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.
Confidential
1
JVMs & GCs
GC tuning for low latencies with DSE
Quentin Ambard
@qambard
(NOT a Cassandra benchmark!)
Disclaimer
2
• Add an implicit "for this use case" after each sentence
• Not a definitive JVM guide. Big heaps only. Don't copy/paste the settings!
• Baseline for a standard use case. Might change for:
• Search workloads, especially with a big fq cache
• Analytics workloads (lots of concurrent pagination)
• Abnormal memory usage (prepared statement cache, etc.)
• Off-heap usage
• …
It's all about the JVM
3
Agenda
• Hotspot GCs
• G1: Heap size & target pause time
• G1: 31GB or 32GB?
• G1: Advanced settings
• Low latencies JVMs
• Smaller heaps
• Wrap up
4
• Parallel collector
• CMS
• G1
Hotspot GCs
Hotspot GCs
6
GCs are generational, using a different strategy / algorithm for each generation
[Diagram: object lifetime from allocation across generations: Young → Survivor → Old]
Parallel collector
7
[Diagram: heap split into Eden (size fixed by configuration) and two survivor spaces (S0/S1); collections are STW]
Parallel collector
8
Heap is full
Triggers a Stop The World (STW) GC to mark & compact
Parallel gc profile
9
Young GCs, then when old is filled, a full GC
Bad latencies, good throughput (ex: batches)
Concurrent Mark Sweep (CMS) collector
10
Young collection uses ParNewGC. Behaves like Parallel GC
Difference: needs to communicate with CMS for the old generation
Concurrent Mark Sweep (CMS) collector
11
The old region is getting too big
Limit defined by -XX:CMSInitiatingOccupancyFraction
Starts concurrent marking & cleanup
Only small STW phases
Sweep only, no compaction, leading to fragmentation
Triggers a serial full STW GC if contiguous memory can't be allocated
Memory is requested by "block": -XX:OldPLABSize=16
Each thread requests a block to copy young to old
CMS profile
12
Hard to tune. Need to choose the young size; won't adapt to a new workload
Obscure options. Easier when the heap remains around 8GB
Fragmentation will eventually trigger a very long full GC
Garbage first (G1) collector
13
Heap (31GB)
Empty Regions (-XX:G1HeapRegionSize)
Garbage first (G1) collector
14
Young region
Garbage first (G1) collector
15
Young space is full
Dynamically sized to reach the pause objective -XX:MaxGCPauseMillis
Triggers a STW "parallel gc" on young:
Scans objects in each region
Copies survivors to survivor space, in another young region
Frees regions
Garbage first (G1) collector
16
Young space is full
Dynamically sized to reach the pause objective -XX:MaxGCPauseMillis
Triggers a STW "parallel gc" on young:
Scans objects in each region
Copies survivors to survivor space, in another young region (survivor size dynamically adjusted)
Copies old-enough survivors to an old region
Frees regions
Garbage first (G1) collector
17
Old space is getting too big
Limit defined by -XX:InitiatingHeapOccupancyPercent (default 45)
Starts a concurrent region scan
Not blocking (2 short STW pauses with SATB: start + end)
100% empty regions are reclaimed
Fast, "free" operation
Triggers STW mixed GCs:
Dramatically reduces the young size
Includes a few old regions in the next young GC
Repeats mixed GCs until -XX:G1HeapWastePercent is reached
Because the young size is reduced to respect the target pause time, G1 triggers several consecutive GCs
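A conceptual pseudo-Java sketch of this mixed GC policy (hypothetical names, not the real G1 code): after marking, G1 repeats mixed collections until the reclaimable space falls under the waste threshold.

    interface Heap {
        double reclaimablePercent();            // % of heap the marking found reclaimable
        void collectYoungPlusSomeOldRegions();  // one STW mixed GC within MaxGCPauseMillis
    }

    class MixedGcPolicy {
        static final double HEAP_WASTE_PERCENT = 5.0;  // -XX:G1HeapWastePercent default

        static void afterConcurrentMarking(Heap heap) {
            // Fully empty regions were already reclaimed for free during marking.
            while (heap.reclaimablePercent() > HEAP_WASTE_PERCENT) {
                // Young size is dramatically reduced so a few old regions
                // fit into the pause budget of each mixed collection.
                heap.collectYoungPlusSomeOldRegions();
            }
        }
    }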
G1 profile
18
"Mixed" GC visible on the profile. Easy to tune, adaptable, predictable young GC pauses
• Heap size -Xmx
• Target pause time
-XX:MaxGCPauseMillis
G1: Heap size &
target pause time
Test protocol
20
128GB RAM, 2x8 cores (32 HT cores)
2x CPU E5-2620 v4 @ 2.10GHz. SSD RAID0 disks. JDK 1.8.161
DSE 5.1.7, memtable size fixed to 4GB. Data saved in ramdisk (40GB)
Gatling queries on 3 tables
JVM: Zing. Default C* schema configuration. Durable write=false (avoids commit log activity to reduce disk load)
Row sizes: 100 bytes, 1.5k, 3k
50% read, 50% write. Throughput 40-120kq/sec
33% of the reads return a range of 10 values. Main tests also executed with small rows only.
DataStax recommended OS settings applied
GC log analysis
21
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
-Xloggc:/opt/dse/dse-5.1.8/log/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
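These flags produce the log shown on the next slide. A minimal helper (my sketch, not part of the deck) that sums the safepoint time reported by -XX:+PrintGCApplicationStoppedTime lines, assuming the JDK 8 log format below:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class StwTotal {
        // Matches: "Total time for which application threads were stopped: 0.0002855 seconds"
        private static final Pattern STW = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds");

        public static void main(String[] args) throws IOException {
            double totalSeconds = 0;
            for (String line : Files.readAllLines(Paths.get(args[0]))) {
                Matcher m = STW.matcher(line);
                if (m.find()) {
                    totalSeconds += Double.parseDouble(m.group(1));
                }
            }
            System.out.printf("Total STW time: %.3f sec%n", totalSeconds);
        }
    }

Usage: java StwTotal gc.log, once per rotated file; or feed the logs to gceasy.io as done below.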
GC log analysis
22
2018-07-27T12:22:59.997+0000: 1.835: Total time for which application threads were stopped: 0.0002855 seconds, Stopping threads took: 0.0000522 seconds
{Heap before GC invocations=2 (full 0):
garbage-first heap total 1048576K, used 98953K [0x00000000c0000000, 0x00000000c0102000, 0x0000000100000000)
region size 1024K, 97 young (99328K), 8 survivors (8192K)
Metaspace used 20965K, capacity 21190K, committed 21296K, reserved 1067008K
class space used 3319K, capacity 3434K, committed 3456K, reserved 1048576K
2018-07-27T12:23:00.276+0000: 2.114: [GC pause (Metadata GC Threshold) (young) (initial-mark)
Desired survivor size 7340032 bytes, new threshold 15 (max 15)
- age 1: 1864496 bytes, 1864496 total
- age 2: 2329992 bytes, 4194488 total
, 0.0175524 secs]
[Parallel Time: 15.3 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 2114.1, Avg: 2117.0, Max: 2120.0, Diff: 5.9]
[Ext Root Scanning (ms): Min: 0.1, Avg: 4.6, Max: 9.1, Diff: 9.0, Sum: 9.2]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Processed Buffers: Min: 0, Avg: 0.5, Max: 1, Diff: 1, Sum: 1]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 1.2, Diff: 1.2, Sum: 1.2]
[Object Copy (ms): Min: 6.1, Avg: 7.0, Max: 7.9, Diff: 1.8, Sum: 14.0]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 9.4, Avg: 12.3, Max: 15.2, Diff: 5.9, Sum: 24.6]
[GC Worker End (ms): Min: 2129.3, Avg: 2129.3, Max: 2129.3, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 2.1 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 1.6 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 89.0M(97.0M)->0.0B(602.0M) Survivors: 8192.0K->12.0M Heap: 97.1M(1024.0M)->12.1M(1024.0M)]
GC log analysis
23
Generated by gceasy.io
Which Heap size
24
Why could a bigger heap be better?
• Heap gets filled more slowly (fewer GCs)
• Increases the ratio of dead objects during collections
Moving (compacting) the remaining objects is the heaviest operation
• Increases chances to reclaim an entirely empty region
Which Heap size
25
Why could a bigger heap be worse?
• A full GC has a bigger impact
(Now parallel with Java 10)
• Increases chances of triggering longer pauses
• Less memory remaining for the disk cache
Heap size
26
Small heaps (<16GB) have bad latencies
After a given size, no obvious difference
[Chart: "Client latencies by Heap Size - target pause = 300ms, 60kq/sec" — latency (ms, 0-600) by percentile (0pt-99.92pt); series: 8GB, 12GB, 20GB, 28GB, 36GB, 44GB, 52GB, 60GB; annotations: "95th percentile = 160ms", "Perfect latencies"]
Heap size & GC Pause time
27
[Chart: "GC STW Pause by Heap size and heap allocation - target pause 300ms, jvm allocation rate 800mb/s" — total pause time (sec, 0-120) by heap size (GB, 8-58); series: Total pause time (800mb/s)]
Heap size & GC Pause time
28
[Chart: "GC STW Pause by Heap size and heap allocation - target pause 300ms, jvm allocation rate 800mb/s" — total pause time (sec) and max pause time (ms) by heap size (GB, 8-58); series: Total pause time (800mb/s), GC Max pause time (800mb/s)]
Heap size & GC Pause time
29
[Chart: "GC STW Pause by Heap size and heap allocation - target pause 300ms, jvm allocation rate 1250mb/s" — total pause time (sec) and max pause time (ms) by heap size (GB, 8-58); series: Total pause time (1250mb/s), GC Max pause time (1250mb/s)]
Heap size & GC Pause time
30
[Chart: "GC STW Pause by Heap size and heap allocation - target pause 300ms" — total pause time (sec) and max pause time (ms) by heap size (GB, 8-58); series: Total pause time and GC Max pause time, at 800mb/s and 1250mb/s]
Target pause time -XX:MaxGCPauseMillis
31
[Chart: "Total STW GC pause by target pause - heap 28GB" — total pause time (sec, 0-70) by G1 pause time target (ms, 50-600); series: Total STW GC duration]
Target pause time -XX:MaxGCPauseMillis
32
[Chart: "Total STW GC pause by target pause - heap 28GB" — total pause time (sec) and max pause time (ms) by G1 pause time target (ms, 50-600); series: Total STW GC duration (900mb/sec), Max pause time (900mb/sec)]
Target pause time -XX:MaxGCPauseMillis
33
[Chart: "Client pause time by GC target Pause" — client pause time (ms, 0-500) by percentile (90pt-99.8pt); series: 36GB heap with 50ms, 100ms, 200ms, 300ms, 400ms, 500ms and 600ms targets]
Heap size Conclusion
35
G1 struggles with a too-small heap, which also increases full GC risk
GC pause time doesn't grow proportionally when the heap size increases
The sweet spot seems to be around 30x the allocation rate, with at least 12GB
Keep -XX:MaxGCPauseMillis >= 200ms (for a 30GB heap)
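Worked example of that rule of thumb, using the allocation rates from the tests above: at 800mb/s, 30 x 0.8GB ≈ 24GB of heap; at 1250mb/s, ≈ 37GB, which would then be capped by the 31GB compressed-oops limit discussed next.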
• Oops compression
• Region size
• Zero based oops compression
31GB or 32GB
31GB or 32GB?
37
Up to 31GB: oops compressed on 32 bits with 8-byte alignment (3-bit shift); see the decode sketch after this slide

native address    binary         compressed (>>3)
8                 0000 1000      0000 0001
32                0010 0000      0000 0100
40                0010 1000      0000 0101

2^32 => 4GB. The 3-bit shift trick gives 2^3 = 8 times more addresses: 2^32 * 2^3 = 32GB
At 32GB: oops on 64 bits
Heaps from 32GB to 64GB can use compressed oops with 16-byte alignment
G1 targets 2048 regions and changes the default region size at 32GB
31GB => -XX:G1HeapRegionSize=8m = 3875 regions
32GB => -XX:G1HeapRegionSize=16m = 2000 regions
The region count can have an impact on Remembered Set update/scan:
[Update RS (ms): Min: 1232.7, Avg: 1236.0, Max: 1237.7, Diff: 5.0, Sum: 12576.2]
[Scan RS (ms): Min: 1328.5, Avg: 1329.5, Max: 1332.7, Diff: 4.2, Sum: 21271.7]
nodetool sjk mx -mc -b "com.sun.management:type=HotSpotDiagnostic" -op getVMOption -a UseCompressedOops
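A toy Java sketch of the shift arithmetic above (illustrative only; the real encoding lives inside the VM). Assuming a zero base and 8-byte alignment:

    public class CompressedOops {
        static final int SHIFT = 3;  // log2(8-byte object alignment)

        // Encode a native heap offset into a 32-bit compressed oop.
        static int compress(long nativeOffset) {
            return (int) (nativeOffset >>> SHIFT);
        }

        // Decode: with a zero base, native oop = compressed oop << 3.
        static long decompress(int compressedOop) {
            return (compressedOop & 0xFFFFFFFFL) << SHIFT;
        }

        public static void main(String[] args) {
            for (long offset : new long[] {8, 32, 40}) {
                int c = compress(offset);
                System.out.printf("native=%d compressed=%d decoded=%d%n", offset, c, decompress(c));
            }
            // Max addressable with 32-bit oops and a 3-bit shift: 2^32 * 8 bytes = 32GB.
        }
    }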
31GB or 32GB?
38
No major difference overall
The concurrent marking cycle is slower with smaller regions (8MB => +20%)
No major difference in CPU usage
31GB + RegionSize=16MB:
Total GC pause time -10%
Mean latency -8%
All other GC metrics are very similar
Not sure? Stick with 31GB + RegionSize=16MB
[Chart: "32 or 31GB / Compressed oops & region size" — client latency (ms, 0-400) by percentile (92.5pt-99.882pt); series: 32GB RS=16mb bytealign=16, 32GB RS=8mb bytealign=16, 32GB RS=16mb, 31GB RS=8mb, 31GB RS=16mb, 32GB RS=8mb]
Zero based compressed oops?
39
Zero-based compressed oops
Virtual memory starts at zero:
    native oop = (compressed oop << 3)
Not zero-based:
    if (compressed oop is null)
        native oop = null
    else
        native oop = base + (compressed oop << 3)
The switch happens around 26-30GB
Can be checked with -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode
No noticeable difference for this workload
• G1NewSizePercent
• ParallelGCThreads
• MaxTenuringThreshold
• InitiatingHeapOccupancyPercent
• G1RSetUpdatingPauseTimePercent
• …
More advanced
settings for G1
-XX:ParallelGCThreads
41
Defines how many threads participate in GC.
2x8 physical cores, 32 with hyperthreading

threads    total STW time
8          90 sec
16         41 sec
32         32 sec
40         35 sec
[Chart: "ParallelGCThreads - 31GB / 300ms" — client latency (ms, 0-400) by percentile; series: 8, 16, 32, 40 threads]
Minimum young size
42
During mixed GCs, the young size is drastically reduced (1.5GB with a 31GB heap)
Young gets filled very quickly, which can lead to multiple consecutive GCs
We can force a minimum young size:
-XX:G1NewSizePercent=8 seems to be a better default (default is 5)
The interval between GCs increased by 3x during mixed GCs (min 3 sec)
No noticeable changes in throughput and latencies
(Increases mixed GC time, reduces young GC time)
[Screenshots: GC pause time at 31GB, with and without -XX:NewSize=4GB; without it, a GC every second or several per second]
Survivor threshold
43
Defines how many times data is copied within young before promotion to old
Dynamically resized by G1:
Desired survivor size 27262976 bytes, new threshold 3 (max 15)
- age 1: 21624512 bytes, 21624512 total
- age 2: 4510912 bytes, 26135424 total
- age 3: 5541504 bytes, 31676928 total
Default 15, but tends to remain <= 4 under heavy load
Objects quickly fill the survivor space defined by -XX:SurvivorRatio
Most objects should be either long-lived or instantaneous. Is it worth disabling survivors?
-XX:MaxTenuringThreshold=0 (default 15)
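A conceptual sketch of the tenuring decision during a young evacuation (hypothetical names, not the real G1 code): each survival increments the object's age, and -XX:MaxTenuringThreshold=0 promotes every live young object immediately.

    class TenuringPolicy {
        static final int MAX_TENURING_THRESHOLD = 0;  // -XX:MaxTenuringThreshold=0

        static class YoungObject { int age; }

        interface Heap {
            void promoteToOld(YoungObject o);
            void copyToSurvivor(YoungObject o);
        }

        // Called for each live object found during a young collection.
        static void evacuate(YoungObject obj, Heap heap) {
            if (obj.age >= MAX_TENURING_THRESHOLD) {
                heap.promoteToOld(obj);   // threshold 0: skip survivor copies entirely
            } else {
                obj.age++;                // one more round trip through survivor space
                heap.copyToSurvivor(obj);
            }
        }
    }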
Survivor threshold
44
[Chart: "Client Latencies by Max survivor age - 31GB, 300ms" — latency (ms, 0-600) by percentile (0pt-99.999pt); series: Max age 0, 1, 2, 3, 4]
Removing survivors greatly reduces GC (count -40%, time -50%)
In this case it doesn't increase the old GC count:
most survivor objects seem to be promoted anyway
! "Premature promotion" could potentially fill the old generation quickly !
Survivor threshold - JVM pauses
45
Max tenuring = 15 (default)
GC Avg: 235 ms
GC Max: 381 ms
STW GC Total: 53 sec
Max tenuring = 0
GC Avg: 157 ms
GC Max: 290 ms
STW GC Total: 28 sec
Generated by gceasy.io
Survivor threshold
46
Generated by gceasy.io
Check your GC logs first!
In this case (almost no activity), the survivor sizes don't shrink much after age 4
Try -XX:MaxTenuringThreshold=4
Delaying the marking cycle
47
By default, G1 starts a marking cycle when the heap is 45% used
-XX:InitiatingHeapOccupancyPercent (IHOP)
By delaying the marking cycle:
• Fewer old GCs
• Better chance to reclaim empty regions?
• Higher risk of triggering a full GC
Java 9 now dynamically resizes IHOP!
JDK-8136677. Disable the adaptive behavior with -XX:-G1UseAdaptiveIHOP
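The fixed (pre-Java 9) trigger is a simple occupancy check; a conceptual sketch with hypothetical names:

    class IhopTrigger {
        static final int IHOP_PERCENT = 45;  // -XX:InitiatingHeapOccupancyPercent

        interface Marker { void startConcurrentMarkingCycle(); }

        // Checked as allocations fill the heap.
        static void maybeStartMarking(long usedBytes, long capacityBytes, Marker marker) {
            if (usedBytes * 100 >= capacityBytes * (long) IHOP_PERCENT) {
                marker.startConcurrentMarkingCycle();
            }
        }
    }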
Delaying the marking cycle
48
[Chart: "Client latencies - IHOP - 31GB 300ms" — client latency (ms, 0-350) by percentile (0pt-99.989pt); series: IHOP 45, IHOP 60, IHOP 70, IHOP 80]
49
Generated by gceasy.io
IHOP=80, 31GB heap
Max heap after GC = 25GB
4 "old" compactions
+10% "young" GCs
GC total: 24 sec
IHOP=60, 31GB heap
Max heap after GC = 20GB
6 "old" compactions
GC total: 21 sec
Delaying the marking cycle
50
Conclusion:
• Fewer old GCs but more young GCs
• No major improvement
• Increases the risk of full GC
• Avoid increasing over 60% (or rely on Java 9 dynamic sizing - not tested)
Remembered Set updating time
51
G1 keeps cross-region references in a structure called the Remembered Set.
It is updated in batches to improve performance and avoid concurrency issues.
-XX:G1RSetUpdatingPauseTimePercent controls how much time may be spent updating it
during the evacuation phase,
as a percentage of the max pause time. Defaults to 10.
Refinement thread zones (G1ConcRefinementGreen/Yellow/RedZone) are adjusted after each GC
GC logs: [Update RS (ms): Min: 1255.1, Avg: 1256.0, Max: 1257.8, Diff: 2.8, Sum: 28889.1]
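A conceptual sketch of what the Remembered Set holds (hypothetical names, not HotSpot code): per region, the set of card addresses elsewhere that reference into it; refinement threads drain buffered updates in batches instead of updating synchronously on every write.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class RememberedSets {
        // For each target region: cards in other regions holding references into it.
        private final Map<Integer, Set<Long>> rsetPerRegion = new HashMap<>();

        static class DirtyCard {
            final int targetRegion;
            final long cardAddress;
            DirtyCard(int targetRegion, long cardAddress) {
                this.targetRegion = targetRegion;
                this.cardAddress = cardAddress;
            }
        }

        // Refinement threads drain buffered dirty cards in batches.
        void refineBatch(Iterable<DirtyCard> batch) {
            for (DirtyCard card : batch) {
                rsetPerRegion.computeIfAbsent(card.targetRegion, r -> new HashSet<>())
                             .add(card.cardAddress);
            }
        }

        // "Update RS" / "Scan RS" in the pause walk these entries; the flag above
        // bounds how much of the pause budget that step may consume.
        Set<Long> cardsInto(int region) {
            return rsetPerRegion.getOrDefault(region, Collections.emptySet());
        }
    }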
Remembered Set updating time
52
[Chart: "Remembered Set updating time percent - 31GB, RegionSize=16mb, 300ms" — client latencies (ms, 0-450) by percentile; series: RSetUpdatingPauseTimePercent = 0, 5, 10, 15]
Other settings
53
-XX:+ParallelRefProcEnabled
No noticeable difference for this workload
-XX:+UseStringDeduplication
No noticeable difference for this workload
-XX:G1MixedGCLiveThresholdPercent=45/55/65
No noticeable difference for this workload
Final settings for G1
54
-Xmx31G -Xms31G -XX:MaxGCPauseMillis=300 -XX:G1HeapRegionSize=16m
-XX:MaxTenuringThreshold=0 -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=6
-XX:G1MaxNewSizePercent=20 -XX:ParallelGCThreads=32 -XX:InitiatingHeapOccupancyPercent=55
-XX:+ParallelRefProcEnabled -XX:G1RSetUpdatingPauseTimePercent=5
Non-default VM flags: -XX:+AlwaysPreTouch -XX:CICompilerCount=15 -XX:CompileCommandFile=null -XX:ConcGCThreads=8
-XX:G1HeapRegionSize=16777216 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:GCLogFileSize=10485760
-XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=33285996544 -XX:InitialTenuringThreshold=0
-XX:+ManagementServer -XX:MarkStackSize=4194304 -XX:MaxGCPauseMillis=300 -XX:MaxHeapSize=33285996544
-XX:MaxNewSize=19964887040 -XX:MaxTenuringThreshold=0 -XX:MinHeapDeltaBytes=16777216 -XX:NewSize=2621440000
-XX:NumberOfGCLogFiles=10 -XX:OnOutOfMemoryError=null -XX:ParallelGCThreads=32 -XX:+ParallelRefProcEnabled
-XX:+PerfDisableSharedMem -XX:PrintFLSStatistics=1 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintPromotionFailure
-XX:+PrintTenuringDistribution -XX:+ResizeTLAB -XX:StringTableSize=1000003 -XX:ThreadPriorityPolicy=42
-XX:ThreadStackSize=256 -XX:-UseBiasedLocking -XX:+UseCompressedClassPointers -XX:+UseCompressedOops
-XX:+UseFastUnorderedTimeStamps -XX:+UseG1GC -XX:+UseGCLogFileRotation -XX:+UseNUMA -XX:+UseNUMAInterleaving
-XX:+UseTLAB -XX:+UseThreadPriorities
• Zing (Azul)
https://www.azul.com/files/wp_pgc_zing_v52.pdf
• Shenandoah (Red Hat)
https://www.youtube.com/watch?v=qBQtbkmURiQ
https://www.youtube.com/watch?v=VCeHkcwfF9Q
• ZGC (Oracle)
Experimental. https://www.youtube.com/watch?v=tShc0dyFtgw
Low latencies
JVMs
Write barrier
56
[Diagram: object graph A → B → C → D; the reference from C to D is about to be deleted]
Object not marked
HotSpot uses a write barrier to capture "pointer deletion"
Prevents D from being collected during the next GC
What about memory compaction?
HotSpot stops all application threads, solving the concurrency issues

C.myFieldD = null
-----------
markObjectAsPotentialAlive(D)
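A conceptual Java sketch of an SATB-style write barrier (hypothetical names, not HotSpot code): before a reference field is overwritten, the old value is recorded so the concurrent marker still sees it.

    import java.util.ArrayDeque;
    import java.util.Queue;

    class SatbWriteBarrier {
        // Stand-in for HotSpot's per-thread SATB buffers.
        static final Queue<Object> satbQueue = new ArrayDeque<>();
        static boolean markingActive = true;

        // Conceptually emitted by the VM before every reference store: obj.field = newValue
        static void preWrite(Object oldValue) {
            if (markingActive && oldValue != null) {
                satbQueue.add(oldValue);  // D stays reachable for this marking cycle
            }
        }
    }

So C.myFieldD = null conceptually becomes: preWrite(C.myFieldD); C.myFieldD = null;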
Read barrier
57
[Diagram: object graph A → B → C → D]

Read D
-----------
if (D hasn't been marked in this cycle) {
    markObject(D)
}
return D

What about memory compaction?

Read D
-----------
if (D is in a memory page being compacted) {
    followNewAddressOrMoveObjectToNewAddressNow(D)
    updateReferenceToNewAddress(D)
}
return D
Read barrier
58
[Diagram: object graph A → B → C → D]
Predictable, constant pause times
No matter how big the heap is
Comes with a performance cost:
Higher CPU usage
Ex: a Gatling report takes 20% more time to complete using a
low-latency JVM vs Hotspot+G1
(computation on 1 CPU running for 10 minutes)
Low latencies JVM
59
Zing rampup
G1 rampup
JVM test, not a Cassandra benchmark!
Low latencies JVM
60
[Chart: client latency (ms, 0-600) by percentile (0pt-99.98pt); series: 31GB G1, 31GB Zing]
Low latencies JVM (shenandoah)
61
Shenandoah rampup
G1 rampup
JVM test, not a Cassandra benchmark!
Low latencies conclusion
62
Capable of dealing with big heaps
Can handle a bigger throughput than G1 before getting the first error
G1 pauses create bursts with potential timeouts
Zing offers good performance, including with smaller heaps.
Shenandoah was stable in our tests. A very good alternative to G1 to avoid pause time
Try carefully, still young (?)
Small heap
Small heap
64
A typical webserver or Kafka broker uses a few GB max.
The same guidelines apply
Below 4GB, pointers fit in 32 bits without even shifting
(32 bits => 2^32 addresses)
Same logic to set the max pause time (latency vs throughput)
With 3GB, you can aim for around a 30-50ms pause time
Usually fewer GC issues (won't randomly freeze for 3 sec)
Not thoroughly tested!!
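A hedged starting point following those guidelines (an assumption, not tested settings):

-Xmx3G -Xms3G -XX:+UseG1GC -XX:MaxGCPauseMillis=50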
Conclusion
Conclusion
66
(Super) easy to get things wrong. Change 1 parameter at a time & test for > 1 day
Measuring the wrong thing, or testing too short, introduces external (non-JVM) noise…
Lots of work to save a few percentiles:
these tests ran for more than 300 hours
Don't over-tune. Keep things simple to avoid side effects
Keep the target pause > 200ms
Keep the heap size between 16GB and 31GB
Start with heap size = 30 x (allocation rate)
G1 doesn't play well with very sudden allocation spikes
You can try reducing -XX:G1MaxNewSizePercent (default 60) to mitigate this issue
Conclusion
67
GC pauses still an issue? Upgrade to DSE 6 or add extra servers
Chasing the 99.9x percentiles? Go with a low-latency JVM
Don't trust what you've read! Check your GC logs!
© 2018 DataStax, All Rights Reserved. DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries. Apache
Cassandra, Apache, Spark, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.68
Thank you
& Thanks to
Lucas Bruand & Laurent Dubois (DeepPay)
Pierre Laporte (DataStax)
More Related Content

What's hot

Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
DataStax
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
DataStax
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
Cloudera, Inc.
 
A G1GC Saga-KCJUG.pptx
A G1GC Saga-KCJUG.pptxA G1GC Saga-KCJUG.pptx
A G1GC Saga-KCJUG.pptx
Monica Beckwith
 
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...
Monica Beckwith
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
The Hive
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
MIJIN AN
 
G1 Garbage Collector: Details and Tuning
G1 Garbage Collector: Details and TuningG1 Garbage Collector: Details and Tuning
G1 Garbage Collector: Details and Tuning
Simone Bordet
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
DataStax
 
Mastering GC.pdf
Mastering GC.pdfMastering GC.pdf
Mastering GC.pdf
Jean-Philippe BEMPEL
 
re:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflixre:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflix
Brendan Gregg
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
Brendan Gregg
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
Attila Szegedi
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisCapacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
HostedbyConfluent
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
Brendan Gregg
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
How Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdfHow Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdf
ScyllaDB
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
Brendan Gregg
 
Disaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFDisaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoF
ShapeBlue
 

What's hot (20)

Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
A G1GC Saga-KCJUG.pptx
A G1GC Saga-KCJUG.pptxA G1GC Saga-KCJUG.pptx
A G1GC Saga-KCJUG.pptx
 
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
G1 Garbage Collector: Details and Tuning
G1 Garbage Collector: Details and TuningG1 Garbage Collector: Details and Tuning
G1 Garbage Collector: Details and Tuning
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
 
Mastering GC.pdf
Mastering GC.pdfMastering GC.pdf
Mastering GC.pdf
 
re:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflixre:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflix
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisCapacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
How Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdfHow Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdf
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
 
Disaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFDisaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoF
 

Similar to Jvm & Garbage collection tuning for low latencies application

Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
David Grier
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
Nicolas Poggi
 
Performance tuning jvm
Performance tuning jvmPerformance tuning jvm
Performance tuning jvm
Prem Kuppumani
 
CASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successCASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for success
Erick Ramirez
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.Jack Levin
 
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
In-Memory Computing Summit
 
G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?
C2B2 Consulting
 
Cassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large NodesCassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large Nodes
aaronmorton
 
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
HostedbyConfluent
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
DataWorks Summit
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql Loader
GregSmith458515
 
Basics of JVM Tuning
Basics of JVM TuningBasics of JVM Tuning
Basics of JVM Tuning
Vladislav Gangan
 
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»Вячеслав Блинов «Java Garbage Collection: A Performance Impact»
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»
Anna Shymchenko
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
jbellis
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Databricks
 
Tacc Infinite Memory Engine
Tacc Infinite Memory EngineTacc Infinite Memory Engine
Tacc Infinite Memory Engine
inside-BigData.com
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
Subhas Kumar Ghosh
 
JVM memory management & Diagnostics
JVM memory management & DiagnosticsJVM memory management & Diagnostics
JVM memory management & Diagnostics
Dhaval Shah
 
HBase: Extreme makeover
HBase: Extreme makeoverHBase: Extreme makeover
HBase: Extreme makeover
bigbase
 

Similar to Jvm & Garbage collection tuning for low latencies application (20)

Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
Performance tuning jvm
Performance tuning jvmPerformance tuning jvm
Performance tuning jvm
 
CASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successCASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for success
 
Hotspot gc
Hotspot gcHotspot gc
Hotspot gc
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.
 
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
 
G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?G1 Garbage Collector - Big Heaps and Low Pauses?
G1 Garbage Collector - Big Heaps and Low Pauses?
 
Cassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large NodesCassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large Nodes
 
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql Loader
 
Basics of JVM Tuning
Basics of JVM TuningBasics of JVM Tuning
Basics of JVM Tuning
 
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»Вячеслав Блинов «Java Garbage Collection: A Performance Impact»
Вячеслав Блинов «Java Garbage Collection: A Performance Impact»
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
Tacc Infinite Memory Engine
Tacc Infinite Memory EngineTacc Infinite Memory Engine
Tacc Infinite Memory Engine
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
 
JVM memory management & Diagnostics
JVM memory management & DiagnosticsJVM memory management & Diagnostics
JVM memory management & Diagnostics
 
HBase: Extreme makeover
HBase: Extreme makeoverHBase: Extreme makeover
HBase: Extreme makeover
 

Recently uploaded

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 

Recently uploaded (20)

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 

Jvm & Garbage collection tuning for low latencies application

  • 1. © 2018 DataStax, All Rights Reserved. DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries. Apache Cassandra, Apache, Spark, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Confidential 1 JVMs & GCs GC tuning for low latencies with DSE Quentin Ambard @qambard (NOT a Cassandra benchmark!)
  • 2. Disclaimer 2 • Add implicit “for this use case” after each sentence • Not a definitive jvm guide. Big heap only. Don’t copy/paste the settings! • Base for standard use case. Might change for • Search workload, especially with big fq cache • Analytic workload (lot of concurrent pagination) • Abnormal memory usage (prepared statement cache etc.) • Off heap usage • …
  • 4. Agenda • Hotspot GCs • G1: Heap size & target pause time • G1: 31GB or 32GB? • G1: Advanced settings • Low latencies JVMs • Smaller heaps • Wrap up 4
  • 5. • Parallel collector • CMS • G1 Hotspot GCs © 2018 DataStax, All Rights Reserved. DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries. Apache Cassandra, Apache, Spark, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.5
  • 6. Hotspot GCs 6 GCs are generational, using different strategy / algorithm for each generation lifetime object allocated Young Surv. Old
  • 7. Parallel collector 7 Eden Size fixed by conf Survivor (S0/S1) STW
  • 8. Parallel collector 8 Heap is full Triggers a Stop The World (STW) GC to mark & compact
  • 9. Parallel gc profile 9 Young GC Old is filled, full GC Bad latencies, good throughput (ex: batches)
  • 10. Concurrent Mark Sweep (CMS) collector 10 Young collection uses ParNewGC. Behaves like Parallel GC Difference: needs to communicate with CMS for the old generation
  • 11. Concurrent Mark Sweep (CMS) collector 11 Old region is getting too big Limit defined by -XX:CMSInitiatingHeapOccupancyPercent Start concurrent marking & cleanup Only smalls STW phase Delete only. No compaction leading to fragmentation Triggers a serial full STW GC if continuous memory can’t be allocated Memory is requested by “block” –XX:OldPLABSize=16 each thread requests a block to copy young to old
  • 12. CMS profile 12 Hard to tune. Need to choose the young size, won’t adapt to new workload inexplicit options. Easier when heap remains around 8GB Fragmentation will trigger super long full GC
  • 13. Garbage first (G1) collector 13 Heap (31GB) Empty Regions (-XX:G1HeapRegionSize)
  • 14. Garbage first (G1) collector 14 Young region
  • 15. Garbage first (G1) collector 15 Young space is full Dynamically sized to reach pause objective -XX:MaxGCPauseMillis Trigger STW “parallel gc” on young Scan object in each region Copy survivor to survivor In another young region Free regions
  • 16. Garbage first (G1) collector 16 Young space is full Dynamically sized to reach pause objective -XX:MaxGCPauseMillis Trigger STW “parallel gc” on young Scan object in each region Copy survivor to survivor In another young region. Survivor size dynamically adjusted Copy old survivor to old region In another young region Free regions
  • 17. Garbage first (G1) collector 17 Old space is getting too big Limit defined by -XX:InitiatingHeapOccupancyPercent=40 Start region concurrent scan Not blocking (2 short STW with SATB: start + end) 100% empty regions reclaimed Fast, “free” operation Trigger STW mixed gc: Dramatically reduce young size Includes a few old region in the next young gc Repeat mixed gc until -XX:G1HeapWastePercent 10% 100% 100% 80%78% 30% Young size being reduced to respect target pause time, G1 triggers several concurrent gc
  • 18. G1 profile 18 “mixed” gcEasy to tune, adaptable, predictable young GC pause
  • 19. • Heap size -Xmx • Target pause time -XX:MaxGCPauseMillis G1: Heap size & target pause time © 2018 DataStax, All Rights Reserved. DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries. Apache Cassandra, Apache, Spark, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.19
  • 20. Test protocol 20 128GB RAM, 2x8 cores (32 ht cores) 2 CPU E5-2620 v4 @ 2.10GHz. Disk SSD RAID0. JDK 1.8.161 DSE 5.1.7, memtable size fixed to 4G. Data saved in ramdisk (40GB) Gatling query on 3 tables jvm: zing. Default C* schema configuration. Durable write=false (avoid commit log to reduce disk activity) Rows size: 100byte, 1.5k, 3k 50% Read, 50% write. Throughput 40-120kq/sec 33% of the read return a range of 10 values. Main tests also executed with small rows only. Datastax recommended OS settings applied
  • 22. GC log analysis 22 2018-07-27T12:22:59.997+0000: 1.835: Total time for which application threads were stopped: 0.0002855 seconds, Stopping threads took: 0.0000522 seconds {Heap before GC invocations=2 (full 0): garbage-first heap total 1048576K, used 98953K [0x00000000c0000000, 0x00000000c0102000, 0x0000000100000000) region size 1024K, 97 young (99328K), 8 survivors (8192K) Metaspace used 20965K, capacity 21190K, committed 21296K, reserved 1067008K class space used 3319K, capacity 3434K, committed 3456K, reserved 1048576K 2018-07-27T12:23:00.276+0000: 2.114: [GC pause (Metadata GC Threshold) (young) (initial-mark) Desired survivor size 7340032 bytes, new threshold 15 (max 15) - age 1: 1864496 bytes, 1864496 total - age 2: 2329992 bytes, 4194488 total , 0.0175524 secs] [Parallel Time: 15.3 ms, GC Workers: 2] [GC Worker Start (ms): Min: 2114.1, Avg: 2117.0, Max: 2120.0, Diff: 5.9] [Ext Root Scanning (ms): Min: 0.1, Avg: 4.6, Max: 9.1, Diff: 9.0, Sum: 9.2] [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1] [Processed Buffers: Min: 0, Avg: 0.5, Max: 1, Diff: 1, Sum: 1] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1] [Code Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 1.2, Diff: 1.2, Sum: 1.2] [Object Copy (ms): Min: 6.1, Avg: 7.0, Max: 7.9, Diff: 1.8, Sum: 14.0] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [GC Worker Total (ms): Min: 9.4, Avg: 12.3, Max: 15.2, Diff: 5.9, Sum: 24.6] [GC Worker End (ms): Min: 2129.3, Avg: 2129.3, Max: 2129.3, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.1 ms] [Other: 2.1 ms] [Choose CSet: 0.0 ms] [Ref Proc: 1.6 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.1 ms] [Eden: 89.0M(97.0M)->0.0B(602.0M) Survivors: 8192.0K->12.0M Heap: 97.1M(1024.0M)->12.1M(1024.0M)]
  • 24. Which Heap size 24 Why could a bigger heap be better? • Heap gets filled more slowly (fewer gcs) • Increases the ratio of dead objects during collections: moving (compacting) the surviving objects is the heaviest operation • Increases the chances to flush an entirely empty region
  • 25. Which Heap size 25 Why could a bigger heap be worse? • Full GC has a bigger impact (now parallel with Java 10) • Increases the chances of triggering longer pauses • Less memory remaining for the disk cache
  • 26. Heap size 26 Small heaps (<16GB) have bad latencies. After a given size, no obvious difference. [Chart: client latencies by heap size (8GB-60GB), target pause = 300ms, 60kq/sec; 95th percentile = 160ms; "perfect latencies" reference line]
  • 27. Heap size & GC Pause time 27 [Chart: total GC STW pause time (sec) by heap size (8-58GB), target pause 300ms, jvm allocation rate 800mb/s]
  • 28. Heap size & GC Pause time 28 [Chart: total pause time (sec) and max pause time (ms) by heap size, target pause 300ms, allocation rate 800mb/s]
  • 29. Heap size & GC Pause time 29 [Chart: total pause time (sec) and max pause time (ms) by heap size, target pause 300ms, allocation rate 1250mb/s]
  • 30. Heap size & GC Pause time 30 [Chart: total and max pause times by heap size at both allocation rates (800mb/s and 1250mb/s), target pause 300ms]
  • 31. Target pause time -XX:MaxGCPauseMillis 31 [Chart: total STW GC pause by target pause time (50-600ms), heap 28GB]
  • 32. Target pause time -XX:MaxGCPauseMillis 32 [Chart: total STW GC duration and max pause time by target pause (50-600ms), heap 28GB, 900mb/sec]
  • 33. Target pause time -XX:MaxGCPauseMillis 33 [Chart: client pause time percentiles by GC target pause (50ms to 600ms), 36GB heap]
  • 34. Heap size Conclusion 35 G1 struggles with too small a heap, which also increases the full GC risk. GC pause time doesn't decrease proportionally when the heap size increases. The sweet spot seems to be around 30x the allocation rate, with at least 12GB (see the worked example below). Keep -XX:MaxGCPauseMillis >= 200ms (for a 30GB heap).
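As a back-of-the-envelope illustration of that 30x rule (an observation from these tests, not a hard formula), using the two allocation rates measured earlier:

800 MB/s  : 30 x 0.8 GB  = 24 GB heap
1250 MB/s : 30 x 1.25 GB = 37.5 GB, capped at 31 GB to keep compressed oops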
  • 35. 31GB or 32GB 36 • Oops compression • Region size • Zero-based oops compression
  • 36. 31GB or 32GB? 37
Up to 31GB: oops are compressed on 32 bits with 8-byte alignment (3-bit shift), as in the sketch below:

address   binary       compressed (>> 3)
8         0000 1000    0000 0001
32        0010 0000    0000 0100
40        0010 1000    0000 0101

2^32 => 4G. The 3-bit shift trick gives 2^3 = 8 times more addresses: 2^32 * 2^3 = 32G.
At 32GB: oops are stored on 64 bits. Heaps from 32GB to 64GB can still use compressed oops with 16-byte object alignment.
G1 targets 2048 regions and changes the default region size at 32GB:
31GB => -XX:G1HeapRegionSize=8m, ~3968 regions
32GB => -XX:G1HeapRegionSize=16m, 2048 regions
The region count can have an impact on Remembered Set update/scan:
[Update RS (ms): Min: 1232.7, Avg: 1236.0, Max: 1237.7, Diff: 5.0, Sum: 12576.2]
[Scan RS (ms): Min: 1328.5, Avg: 1329.5, Max: 1332.7, Diff: 4.2, Sum: 21271.7]
nodetool sjk mx -mc -b "com.sun.management:type=HotSpotDiagnostic" -op getVMOption -a UseCompressedOops
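A hypothetical Java sketch of the decode arithmetic. This is purely illustrative: the real decoding happens inside the JVM, not in application code, and the method names here are invented:

class CompressedOopDemo {
    // Decode a 32-bit compressed oop with 8-byte alignment (3-bit shift).
    // With zero-based compressed oops the base is 0 and the add disappears.
    static long decode(int compressedOop, long heapBase) {
        return heapBase + ((compressedOop & 0xFFFFFFFFL) << 3);
    }

    public static void main(String[] args) {
        // 0b101 = 5 decodes back to raw address 40, matching the table above
        System.out.println(decode(0b101, 0L)); // prints 40
    }
}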
  • 37. 31GB or 32GB? 38
No major difference overall. The concurrent marking cycle is slower with smaller regions (8MB => +20%). No major difference in cpu usage.
31GB + RegionSize=16MB: total GC pause time -10%, mean latency -8%. All other GC metrics are very similar.
Not sure? Stick with 31GB + RegionSize=16MB.
[Chart: client latency percentiles, 32GB vs 31GB, region sizes 8/16mb, with and without 16-byte alignment]
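In flag form, the "not sure?" recommendation above looks like this (values straight from the slide; adapt to your own tests):

-Xms31G
-Xmx31G
-XX:G1HeapRegionSize=16m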
  • 38. Zero based compressed oops? 39
Zero-based compressed oops: the virtual memory starts at zero, so native oop = (compressed oop << 3).
Not zero-based:
if (compressed oop is null) native oop = null
else native oop = base + (compressed oop << 3)
The switch happens around 26-30GB. Can be checked with -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode.
No noticeable difference for this workload.
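To see which oops mode a given heap size gets, something like the command below works (the diagnostic flags are the ones from the slide; the exact zero-based boundary depends on the platform):

java -Xmx31G -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version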
  • 39. More advanced settings for G1 40 • G1NewSizePercent • ParallelGCThreads • MaxTenuringThreshold • InitiatingHeapOccupancyPercent • G1RSetUpdatingPauseTimePercent • …
  • 40. -XX:ParallelGCThreads 41
Defines how many threads participate in GC. Test machine: 2x8 physical cores, 32 threads with hyperthreading. See the flag sketch below.

ParallelGCThreads   STW total time
8                   90 sec
16                  41 sec
32                  32 sec
40                  35 sec

[Chart: client latencies by ParallelGCThreads (8/16/32/40), 31GB / 300ms]
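The winning setting here, in flag form. Match it to your own core count rather than copying this value (32 happens to equal the hyperthreaded core count of the test box):

# 32 hyperthreaded cores on the test machine
-XX:ParallelGCThreads=32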
  • 41. Minimum young size 42
During mixed gc, the young size is drastically reduced (1.5GB with a 31GB heap). Young gets filled very quickly, which can lead to multiple consecutive GCs (one or more per second).
We can force a minimum size: -XX:G1NewSizePercent=8 seems to be a better default than 5 (see the sketch below).
Interval between GCs increased by x3 during mixed GC (min 3 sec). No noticeable changes in throughput and latencies (increases mixed gc time, reduces young gc time).
[Charts: GC pause time, 31GB vs 31GB with -XX:NewSize=4GB]
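A minimal flag sketch for that floor. G1NewSizePercent is experimental on JDK 8, hence the unlock flag; 8 is the value suggested above, not a universal default:

-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=8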
  • 42. Survivor threshold 43
Defines how many times data will be copied within young before promotion to old. Dynamically resized by G1:

Desired survivor size 27262976 bytes, new threshold 3 (max 15)
- age 1: 21624512 bytes, 21624512 total
- age 2: 4510912 bytes, 26135424 total
- age 3: 5541504 bytes, 31676928 total

Default is 15, but it tends to remain <= 4 under heavy load: objects quickly fill the survivor space defined by -XX:SurvivorRatio.
Most objects should be either long-living or instantaneous. Is it worth disabling survivors? -XX:MaxTenuringThreshold=0 (default 15)
  • 43. Survivor threshold 44
[Chart: client latencies by max survivor age (0 to 4), 31GB, 300ms]
Removing survivors greatly reduces GC (count -40%, time -50%). In this case it doesn't increase the old gc count: most survivor objects seem to be promoted anyway!
! "Premature promotion" could potentially fill the old generation quickly !
  • 44. Survivor threshold - JVM pauses 45
Max tenuring = 15 (default): GC avg 235 ms, GC max 381 ms, STW GC total 53 sec
Max tenuring = 0: GC avg 157 ms, GC max 290 ms, STW GC total 28 sec
Generated by gceasy.io
  • 45. Survivor threshold 46 Generated by gceasy.io. Check your GC log first! In this case (almost no activity), the survivor size doesn't shrink much after age 4. Try -XX:MaxTenuringThreshold=4 (see the sketch below).
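A hedged adjustment for that situation, to be validated against the tenuring distribution in your own logs (both flags are standard HotSpot options):

# keep the age distribution visible in the GC logs
-XX:+PrintTenuringDistribution
# cap survivor copies at 4 ages, per the observation above
-XX:MaxTenuringThreshold=4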
  • 46. Delaying the marking cycle 47
By default, G1 starts a marking cycle when the heap is 45% used: -XX:InitiatingHeapOccupancyPercent (IHOP).
Delaying the marking cycle (see the sketch after this list):
• Reduces the count of old gcs
• Increases the chance to reclaim empty regions (?)
• Increases the risk of triggering a full GC
Java 9 now dynamically resizes IHOP (JDK-8136677). Disable the adaptive behavior with -XX:-G1UseAdaptiveIHOP.
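A sketch in flag form, using the 60% upper bound recommended in the conclusion a few slides down:

-XX:InitiatingHeapOccupancyPercent=60
# Java 9+: disable adaptive sizing if you want this fixed value honored
# -XX:-G1UseAdaptiveIHOP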
  • 47. Delaying the marking cycle 48 [Chart: client latencies by IHOP (45/60/70/80), 31GB, 300ms]
  • 48. 49 Generated by gceasy.io
IHOP=80, 31GB heap: max heap after GC = 25GB, 4 "old" compactions, +10% "young" GCs, GC total: 24 sec
IHOP=60, 31GB heap: max heap after GC = 20GB, 6 "old" compactions, GC total: 21 sec
  • 49. Delaying the marking cycle 50 Conclusion: • Reduces the number of old gcs but increases young gcs • No major improvement • Increases the risk of full GC • Avoid increasing over 60% (or rely on Java 9 dynamic sizing, not tested)
  • 50. Remembered set updating time 51
G1 keeps cross-region references in a structure called the Remembered Set (RSet), updated in batches to improve performance and avoid concurrency issues.
-XX:G1RSetUpdatingPauseTimePercent controls how much time may be spent updating RSets during the evacuation phase, as a percentage of MaxGCPauseMillis. Defaults to 10 (see the sketch below).
G1 adjusts the refinement thread zones (G1ConcRefinementGreen/Yellow/RedZone) after each gc.
GC logs: [Update RS (ms): Min: 1255,1, Avg: 1256,0, Max: 1257,8, Diff: 2,8, Sum: 28889,1]
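In flag form, the value that performed best in the chart on the next slide. Lowering it pushes more RSet work onto the concurrent refinement threads instead of the pause:

-XX:G1RSetUpdatingPauseTimePercent=5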
  • 51. Remembered set updating time 52 [Chart: client latencies by RSetUpdatingPauseTimePercent (0/5/10/15), 31GB, RegionSize=16mb, 300ms]
  • 52. Other settings 53
-XX:+ParallelRefProcEnabled: no noticeable difference for this workload
-XX:+UseStringDeduplication: no noticeable difference for this workload
-XX:G1MixedGCLiveThresholdPercent=45/55/65: no noticeable difference for this workload
  • 53. Final settings for G1 54
-Xmx31G -Xms31G -XX:MaxGCPauseMillis=300 -XX:G1HeapRegionSize=16m -XX:MaxTenuringThreshold=0
-XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=6 -XX:G1MaxNewSizePercent=20
-XX:ParallelGCThreads=32 -XX:InitiatingHeapOccupancyPercent=55
-XX:+ParallelRefProcEnabled -XX:G1RSetUpdatingPauseTimePercent=5

Non-default VM flags:
-XX:+AlwaysPreTouch -XX:CICompilerCount=15 -XX:CompileCommandFile=null -XX:ConcGCThreads=8
-XX:G1HeapRegionSize=16777216 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:GCLogFileSize=10485760
-XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=33285996544 -XX:InitialTenuringThreshold=0
-XX:+ManagementServer -XX:MarkStackSize=4194304 -XX:MaxGCPauseMillis=300 -XX:MaxHeapSize=33285996544
-XX:MaxNewSize=19964887040 -XX:MaxTenuringThreshold=0 -XX:MinHeapDeltaBytes=16777216
-XX:NewSize=2621440000 -XX:NumberOfGCLogFiles=10 -XX:OnOutOfMemoryError=null -XX:ParallelGCThreads=32
-XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem -XX:PrintFLSStatistics=1 -XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+ResizeTLAB
-XX:StringTableSize=1000003 -XX:ThreadPriorityPolicy=42 -XX:ThreadStackSize=256 -XX:-UseBiasedLocking
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseG1GC
-XX:+UseGCLogFileRotation -XX:+UseNUMA -XX:+UseNUMAInterleaving -XX:+UseTLAB -XX:+UseThreadPriorities
  • 54. Low latencies JVMs 55
• Zing (Azul) https://www.azul.com/files/wp_pgc_zing_v52.pdf
• Shenandoah (Red Hat) https://www.youtube.com/watch?v=qBQtbkmURiQ https://www.youtube.com/watch?v=VCeHkcwfF9Q
• ZGC (Oracle), experimental. https://www.youtube.com/watch?v=tShc0dyFtgw
  • 55. Write barrier 56
Hotspot uses a write barrier to capture "pointer deletion" (the object is not yet marked):

C.myFieldD = null
-----------
markObjectAsPotentiallyAlive(D)

This prevents D from being cleaned during the next GC.
What about memory compaction? Hotspot stops all application threads, solving concurrency issues.
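A runnable toy sketch of that idea in Java. Purely illustrative: real barriers are emitted by the JIT around every reference store, and the queue and flag names here are invented stand-ins for JVM internals:

class SatbWriteBarrierSketch {
    static class Node { Node child; }

    // invented stand-ins for JVM internals (illustration only)
    static final java.util.ArrayDeque<Node> satbQueue = new java.util.ArrayDeque<>();
    static volatile boolean markingInProgress = true;

    // Before overwriting a reference, log the old target so the concurrent
    // marker still treats it as live this cycle (snapshot-at-the-beginning).
    static void writeChild(Node holder, Node newValue) {
        Node old = holder.child;
        if (markingInProgress && old != null) {
            satbQueue.add(old);   // the captured "pointer deletion"
        }
        holder.child = newValue;  // the actual field store
    }
}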
  • 56. Read barrier 57
Read D
-----------
if (D hasn't been marked in this cycle) {
    markObject(D)
}
return D

What about memory compaction?

Read D
-----------
if (D is in a memory page being compacted) {
    followNewAddressOrMoveObjectToNewAddressNow(D)
    updateReferenceToNewAddress(D)
}
return D
  • 57. Read barrier 58 Predictable, constant pause time, no matter how big the heap is. Comes with a performance cost: higher cpu usage. Ex: a Gatling report takes 20% more time to complete on a low-latency JVM vs hotspot+G1 (computation on 1 cpu running for 10 minutes).
  • 58. Low latencies JVM 59 [Chart: Zing ramp-up vs G1 ramp-up. JVM test, not a Cassandra benchmark!]
  • 60. Low latencies JVM (Shenandoah) 61 [Chart: Shenandoah ramp-up vs G1 ramp-up. JVM test, not a Cassandra benchmark!]
  • 61. Low latencies conclusion 62 Capable of dealing with big heaps. Can handle a higher throughput than G1 before the first error appears (G1 pauses create bursts with potential timeouts). Zing offers good performance, including with smaller heaps. Shenandoah was stable in our tests and offers a very good alternative to G1 to avoid pause time. Try carefully, still young (?)
  • 62. Small heap 63
  • 63. Small heap 64 A typical webserver or Kafka broker uses a few GB max. The same guidelines apply. Below 4GB (32 bits, 2^32) pointers need no compression trick at all. Same logic to set the max pause time (latency vs throughput): with 3GB, you can aim for around 30-50ms pause time. Usually fewer GC issues (it won't randomly freeze for 3 secs). Not thoroughly tested!! A starting point is sketched below.
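A hedged starting point for such a small heap, combining the slide's numbers (3GB heap, ~30-50ms target). Validate against your own workload before adopting it:

-Xms3G
-Xmx3G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=50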
  • 64. Conclusion 65
  • 65. Conclusion 66
(Super) easy to get things wrong: measuring the wrong thing, testing too briefly, introducing external (non-JVM) noise… Change 1 parameter at a time & test for > 1 day.
A lot of work to save a few percentiles: these tests ran for more than 300 hours.
Don't over-tune. Keep things simple to avoid side effects:
• Keep target pause > 200ms
• Keep heap size between 16GB and 31GB
• Start with heap size = 30 x (allocation rate)
G1 doesn't play well with very sudden allocation spikes. Reducing -XX:G1MaxNewSizePercent (default 60) can mitigate this issue.
  • 66. Conclusion 67 GC pauses still an issue? Upgrade to DSE 6 or add extra servers. Chasing the 99.9x percentiles? Go with a low-latency JVM. Don't trust what you've read! Check your GC logs!
  • 67. Thank you 68 Thanks to Lucas Bruand & Laurent Dubois (DeepPay), Pierre Laporte (DataStax)
