GC Tuning in the HotSpot Java VM - a FISL 10 Presentation

  1. Garbage Collection Tuning in the Java HotSpot™ Virtual Machine
     Tony Printezis, Charlie Hunt, Ludovic Poitou
     Sun Microsystems, Inc.
  2. Who We Are
     • Tony Printezis
       > GC Group / HotSpot JVM development team
       > Has been working on the HotSpot JVM since 2006
       > 10+ years of GC experience
     • Charlie Hunt
       > Java Platform Performance Engineering Group
       > Works with many Sun product teams and customers
       > 10+ years of Java technology performance work
     • Ludovic Poitou (just the narrator)
       > Directory Services Engineering, OpenDS community guy
       > 10+ years of scaling LDAP directories, now with Java
     Copyright Sun Microsystems, Inc.
  3. If you remember only one thing... GC Tuning is an Art!
  4. GC Tuning is an Art
     • Unfortunately, we can't give you a flawless recipe or a flowchart that will apply to all your GC tuning scenarios
     • GC tuning involves a lot of common pattern recognition
     • This pattern recognition requires experience
       > We have a lot of it. :-)
  5. Agenda
     • Introductions
     • Brief GC Overview
     • GC Tuning
       > Tuning the young generation
       > Tuning Parallel GC
       > Tuning CMS
     • Monitoring the GC
     • Conclusions
  6. GCs in the HotSpot JVM
     • Three available GCs:
       > Serial GC
       > Parallel GC / Parallel Old GC
       > Concurrent Mark-Sweep GC (CMS)
  7. Heap Layout (same for all GCs)
     [Figure: the heap is divided into the Young Generation, the Old Generation, and the Permanent Generation]
  8. Young Generation
     [Figure: allocation (new Object()) happens in Eden; the young generation also holds the Survivor Spaces]
  9. Old Generation
     [Figure: promotion of survivors from the Young Generation]
  10. Permanent Generation
     [Figure: allocation only directly from the JVM]
  11. Agenda (recap, next up: GC Tuning)
  12. Your Dream GC
     • You would really like a GC that has
       > Low GC overhead,
       > Low GC pause times, and
       > Good space efficiency
     • Unfortunately, you'll have to pick two (any two!)
  13. Heap Sizing Tuning Advice: Supersize it!
  14. Heap Sizing Trade-Offs
     • Generally, the larger the heap space, the better
       > For both the young and the old generation
       > Larger space: less frequent GCs, lower GC overhead, objects more likely to become garbage
       > Smaller space: faster GCs (not always! see later)
     • Sometimes the max heap size is dictated by available memory and/or the max space the JVM can address
       > You have to find a good balance between young and old generation size
  15. Generation Size Roles
     • Young Generation Size
       > Dictates the frequency of minor GCs
       > Dictates how many objects will be reclaimed in the young generation
         – Along with tenuring threshold + survivor space size tuning
     • Old Generation
       > Should comfortably hold the application's steady-state live size
       > Decrease the major GC frequency as much as possible
  16. Two Very Important Points
     • You should try to maximize the number of objects reclaimed in the young generation
       > This is probably the most important piece of advice when sizing a heap and/or tuning the young generation
     • Your application's memory footprint should not exceed the available physical memory
       > This is probably the second most important piece of advice when sizing a heap
     • The above apply to all our GCs
  17. Sizing Heap Spaces
     • -Xmx<size> : max heap size
       > young generation + old generation
     • -Xms<size> : initial heap size
       > young generation + old generation
     • -Xmn<size> : young generation size
     • Applications with emphasis on performance tend to set -Xms and -Xmx to the same value
     • When -Xms != -Xmx, heap growth or shrinking requires a Full GC
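To check which heap limits a running JVM actually picked up from -Xms and -Xmx, a small program can ask java.lang.Runtime. The sketch below is a hypothetical helper (the class name HeapSizeCheck is ours, not from the talk): maxMemory() reflects the maximum heap and may come out slightly below -Xmx, while totalMemory() is the currently committed heap, which starts around -Xms.

```java
// Hypothetical helper: reports the heap limits the running JVM actually uses.
// Try, e.g.: java -Xms512m -Xmx1g HeapSizeCheck
final class HeapSizeCheck {
    private static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM may grow to; tracks -Xmx (can be slightly smaller). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently committed; starts near -Xms and grows toward -Xmx if needed. */
    static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):       " + maxHeapMb());
        System.out.println("committed heap (MB): " + committedHeapMb());
    }
}
```

With -Xms equal to -Xmx, the two numbers should stay close for the life of the process, since the heap never has to grow.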
  18. Should -Xms == -Xmx?
     • Set -Xms to what you think would be your desired heap size
       > It's expensive to grow the heap
     • If memory allows, set -Xmx to something larger than -Xms “just in case”
       > Maybe the application is hit with more load
       > Maybe the DB gets larger over time
     • In most cases, it's better to do a Full GC and grow the heap than to get an OutOfMemoryError (OOM) and crash
  19. Sizing Heap Spaces (ii)
     • -XX:PermSize=<size> : permanent generation initial size
     • -XX:MaxPermSize=<size> : permanent generation max size
     • Applications with emphasis on performance almost always set -XX:PermSize and -XX:MaxPermSize to the same value
       > Growing or shrinking the permanent generation requires a Full GC too
     • Unfortunately, the permanent generation occupancy is hard to predict
  20. Stop-The-World Parallel GC Threads
     • The number of parallel GC threads is controlled by -XX:ParallelGCThreads=<num>
     • The default value assumes only one JVM per system
     • Set the parallel GC thread number according to:
       > Number of JVMs deployed on the system / processor set / zone
       > CPU chip architecture
         – Multiple hardware threads per chip core, e.g., UltraSPARC T1 / T2
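Because the default assumes one JVM per system, several co-located JVMs need a smaller -XX:ParallelGCThreads each. The sketch below uses a hypothetical rule of thumb (split the logical processors evenly across the JVMs); it is not HotSpot's own default formula, and the class and method names are ours.

```java
// Hypothetical rule of thumb: share the logical processors among co-located JVMs
// and print the flag to pass to each one. Not HotSpot's own default formula.
final class GcThreadBudget {

    /** Parallel GC threads per JVM when `jvms` JVMs share `hwThreads` logical processors. */
    static int threadsPerJvm(int hwThreads, int jvms) {
        if (hwThreads < 1 || jvms < 1) {
            throw new IllegalArgumentException("hwThreads and jvms must be positive");
        }
        return Math.max(1, hwThreads / jvms); // never go below one GC thread
    }

    static String flagFor(int hwThreads, int jvms) {
        return "-XX:ParallelGCThreads=" + threadsPerJvm(hwThreads, jvms);
    }

    public static void main(String[] args) {
        int jvms = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        // availableProcessors() counts logical processors (hardware threads), not cores.
        int hw = Runtime.getRuntime().availableProcessors();
        System.out.println(flagFor(hw, jvms));
    }
}
```

For example, four JVMs on a machine with 64 hardware threads would get -XX:ParallelGCThreads=16 each; on chips with several hardware threads per core you may want to go lower still, as the slide suggests.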
  21. Agenda (recap, next up: Tuning the young generation)
  22. Young Generation Sizing
     • Eden size determines
       > The frequency of minor GCs
       > Which objects will be reclaimed at age 0
         – Newly-allocated objects in Eden start from age 0
         – Their age is incremented at every minor GC
     • Increasing the size of Eden will not always affect minor GC times
       > Remember: minor GC times are proportional to the number of objects they copy (i.e., the live objects), not to the young generation size
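The aging rules above can be illustrated with a toy model (not HotSpot code; all names are hypothetical). Each object is given an assumed lifetime in minor GCs; a minor GC drops dead objects for free, copies live ones to a survivor space while their age is below the tenuring threshold, and promotes them otherwise. The copy count shows why minor GC cost tracks live objects rather than Eden size.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not HotSpot code) of object aging in the young generation.
final class AgingModel {

    static final class Obj {
        final String name;
        final int lifetime; // hypothetical: how many minor GCs the object stays reachable
        int age = 0;        // newly allocated objects in Eden start from age 0

        Obj(String name, int lifetime) {
            this.name = name;
            this.lifetime = lifetime;
        }
    }

    final List<Obj> young = new ArrayList<>(); // Eden + survivor spaces, lumped together
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;
    int copiedLastGc = 0; // work done by the last minor GC

    AgingModel(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    void allocate(String name, int lifetime) {
        young.add(new Obj(name, lifetime));
    }

    /** One minor GC: its cost is the number of live objects copied, not the young gen size. */
    void minorGc() {
        List<Obj> survivors = new ArrayList<>();
        copiedLastGc = 0;
        for (Obj o : young) {
            if (o.age >= o.lifetime) {
                continue;                 // garbage: reclaimed without being touched
            }
            copiedLastGc++;               // every live object is copied somewhere
            if (o.age < tenuringThreshold) {
                o.age++;                  // copied to a survivor space, one GC older
                survivors.add(o);
            } else {
                old.add(o);               // promoted into the old generation
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    public static void main(String[] args) {
        AgingModel m = new AgingModel(1);
        m.allocate("temp", 0);      // dies before the first minor GC
        m.allocate("request", 1);   // survives exactly one minor GC
        m.allocate("cache", 100);   // long-lived
        m.minorGc();
        m.minorGc();
        System.out.println("young=" + m.young.size() + " old=" + m.old.size());
    }
}
```

After two minor GCs only "cache" is left, and it has been promoted; "temp" never cost a copy.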
  23. Young Object Survivor Ratio
     [Figure: survivor ratio plotted against object age, from the youngest (newly allocated, age 0) to the oldest objects]
  24. Young Object Survivor Ratio (ii)
     [Figure: same axes as the previous slide]
  25. Young Object Survivor Ratio (iii)
     [Figure: same axes as the previous slide]
  26. Sizing Heap Spaces (iii)
     • -XX:NewSize=<size> : initial young generation size
     • -XX:MaxNewSize=<size> : max young generation size
     • -XX:NewRatio=<ratio> : old generation to young generation size ratio
     • Applications with emphasis on performance tend to use -Xmn to size the young generation, since it combines the use of -XX:NewSize and -XX:MaxNewSize
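The arithmetic behind -XX:NewRatio and -Xmn is simple enough to sketch. This ignores the alignment, rounding and minimum-size rules HotSpot applies, so treat the numbers as approximations; the class name is hypothetical.

```java
// Back-of-the-envelope young/old split (ignores HotSpot's alignment and size limits).
final class YoungGenMath {

    /** -XX:NewRatio=N means old:young = N:1, so the young gen gets 1/(N+1) of the heap. */
    static long youngFromNewRatioMb(long heapMb, int newRatio) {
        if (newRatio < 1) throw new IllegalArgumentException("NewRatio must be >= 1");
        return heapMb / (newRatio + 1);
    }

    /** With -Xmn the young size is given directly; the rest of -Xmx is the old generation. */
    static long oldFromXmnMb(long heapMb, long xmnMb) {
        if (xmnMb <= 0 || xmnMb >= heapMb) {
            throw new IllegalArgumentException("-Xmn must be positive and smaller than -Xmx");
        }
        return heapMb - xmnMb;
    }

    public static void main(String[] args) {
        // -Xmx1g -XX:NewRatio=2: young ~341 MB, old ~683 MB
        System.out.println(youngFromNewRatioMb(1024, 2));
        // -Xmx1g -Xmn256m: old = 768 MB
        System.out.println(oldFromXmnMb(1024, 256));
    }
}
```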
  27. Tenuring
     • -XX:TargetSurvivorRatio=<percent>, e.g., 50
       > How much of the survivor space should be filled
         – Typically leave extra space to deal with “spikes”
     • -XX:InitialTenuringThreshold=<threshold>
     • -XX:MaxTenuringThreshold=<threshold>
     • -XX:+AlwaysTenure
       > Never keep any objects in the survivor spaces
     • -XX:SurvivorRatio=<integer>, e.g., 6
       > Eden to survivor space size ratio
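-XX:SurvivorRatio and -XX:TargetSurvivorRatio combine as follows: the young generation is Eden plus two equally sized survivor spaces, Eden is SurvivorRatio times one survivor space, and the tenuring logic aims to fill only TargetSurvivorRatio percent of a survivor space. A minimal sketch of that arithmetic (hypothetical class name, alignment ignored):

```java
// Survivor-space arithmetic for a given young generation size (alignment ignored).
final class SurvivorMath {

    /** young = eden + 2 * survivor and eden = SurvivorRatio * survivor. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    static long edenBytes(long youngBytes, int survivorRatio) {
        return survivorBytes(youngBytes, survivorRatio) * survivorRatio;
    }

    /** Roughly the "Desired survivor size" printed by -XX:+PrintTenuringDistribution. */
    static long desiredSurvivorBytes(long survivorBytes, int targetSurvivorRatio) {
        return survivorBytes * targetSurvivorRatio / 100;
    }

    public static void main(String[] args) {
        long young = 64L * 1024 * 1024;                  // e.g., -Xmn64m
        long survivor = survivorBytes(young, 6);         // -XX:SurvivorRatio=6
        System.out.println("eden     = " + edenBytes(young, 6));
        System.out.println("survivor = " + survivor);
        System.out.println("desired  = " + desiredSurvivorBytes(survivor, 50)); // -XX:TargetSurvivorRatio=50
    }
}
```

With -Xmn64m and SurvivorRatio=6, each survivor space gets 8 MB, Eden 48 MB, and the target fill at 50% is 4 MB.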
  28. Tenuring Threshold Trade-Offs
     • Try to retain as many objects as possible in the survivor spaces so that they can be reclaimed in the young generation
       > Less promotion into the old generation
       > Less frequent old GCs
     • But also, try not to unnecessarily copy very long-lived objects between the survivors
       > Unnecessary overhead on minor GCs
     • Not always easy to find the perfect balance
       > Generally: better to copy more than to promote more
  29. Tenuring Distribution
     • Monitor the tenuring distribution with -XX:+PrintTenuringDistribution
         Desired survivor size 6684672 bytes, new threshold 8 (max 8)
         - age 1: 2315488 bytes, 2315488 total
         - age 2: 19528 bytes, 2335016 total
         - age 3: 96 bytes, 2335112 total
         - age 4: 32 bytes, 2335144 total
     • The young generation seems well tuned here
       > We can even decrease the survivor space size
  30. Tenuring Distribution (ii)
         Desired survivor size 3342336 bytes, new threshold 1 (max 6)
         - age 1: 3956928 bytes, 3956928 total
     • Survivor space too small!
       > Increase the survivor space and/or Eden size
  31. Tenuring Distribution (iii)
         Desired survivor size 3342336 bytes, new threshold 6 (max 6)
         - age 1: 2483440 bytes, 2483440 total
         - age 2: 501240 bytes, 2984680 total
         - age 3: 50016 bytes, 3034696 total
         - age 4: 49088 bytes, 3083784 total
         - age 5: 48616 bytes, 3132400 total
         - age 6: 50128 bytes, 3182528 total
     • Might be able to do better
       > Either increase the max tenuring threshold
       > Or even set the max tenuring threshold to 2
         – If ages > 6 still have around 50K of surviving bytes (those objects are long-lived, so copying them between the survivors is wasted work)
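The "new threshold" values in the three distributions above can be reproduced with a simple rule: walk the ages from youngest to oldest, accumulating surviving bytes, and stop at the first age where the running total exceeds the desired survivor size; if it never does, keep the maximum. The sketch below is a simplified model of that rule (modeled on how HotSpot's age table behaves, not copied from it), checked against the numbers on these slides; the class name is hypothetical.

```java
// Simplified model of how the tenuring threshold follows from the age distribution.
final class TenuringThreshold {

    /**
     * @param desiredSurvivorBytes the "Desired survivor size" from the log
     * @param bytesByAge           surviving bytes per age; index 0 holds age 1
     * @param maxThreshold         the "(max N)" value, i.e., -XX:MaxTenuringThreshold
     */
    static int compute(long desiredSurvivorBytes, long[] bytesByAge, int maxThreshold) {
        long total = 0;
        for (int age = 1; age <= bytesByAge.length; age++) {
            total += bytesByAge[age - 1];
            if (total > desiredSurvivorBytes) {
                return Math.min(age, maxThreshold); // survivors overflow the target here
            }
        }
        return maxThreshold; // the target was never exceeded: keep the maximum
    }

    public static void main(String[] args) {
        System.out.println(compute(6684672L, new long[] {2315488, 19528, 96, 32}, 8));
        System.out.println(compute(3342336L, new long[] {3956928}, 6));
        System.out.println(compute(3342336L,
                new long[] {2483440, 501240, 50016, 49088, 48616, 50128}, 6));
    }
}
```

The three calls yield 8, 1 and 6, matching the logs: only in the second case does age 1 alone overflow the desired survivor size.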
  32. Agenda (recap, next up: Tuning Parallel GC)
  33. Parallel GC Ergonomics
     • The Parallel GC has ergonomics
       > i.e., auto-tuning
     • Ergonomics help improve out-of-the-box GC performance
     • To get maximum performance, most customers we know do manual tuning
  34. Parallel GC Tuning Advice
     • Tune the young generation as described so far
     • Try to avoid / decrease the frequency of major GCs
     • We know of customers who use the Parallel GC in low-pause environments
       > Avoid Full GCs by avoiding / minimizing promotion
       > Maximize heap size
  35. NUMA
     • Non-Uniform Memory Access
       > Applicable to most SPARC, Opteron, and, more recently, Intel platforms
     • -XX:+UseNUMA
     • Splits the young generation into partitions
       > Each partition “belongs” to a CPU
     • Allocates new objects into the partition that belongs to the allocating CPU
     • Big win for some applications
  36. Agenda (recap, next up: Tuning CMS)
  37. CMS Tuning Advice
     • Tune the young generation as described so far
     • Need to be even more careful about avoiding premature promotion
       > Originally we were using an +AlwaysTenure policy
       > We have since changed our mind :-)
     • Promotion in CMS is expensive (the CMS old generation allocates from free lists)
     • The more often promotion / reclamation happens, the more likely fragmentation will settle in the heap
  38. CMS Tuning Advice (ii)
     • We know customers who tune their applications to do mostly minor GCs, even with CMS
       > CMS is used as a “safety net”, when the application's load exceeds what they have provisioned for
       > Schedule Full GCs at non-critical times (say, late at night) to “tidy up” the heap and minimize fragmentation
  39. Fragmentation
     • Two types
       > External fragmentation
         – No free chunk is large enough to satisfy an allocation
       > Internal fragmentation
         – Allocator rounds up allocation requests
         – Free space wasted due to this rounding up
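Both kinds of fragmentation are easy to show with a few numbers. In the sketch below, the 8-byte allocation granularity and the free-chunk sizes are hypothetical example values, not CMS internals: internal fragmentation is the padding added by rounding a request up, and external fragmentation is having enough free space in total while no single chunk fits the request.

```java
import java.util.Arrays;
import java.util.List;

// Illustration of the two kinds of fragmentation on a free-list allocator.
final class Fragmentation {
    static final int GRANULE = 8; // assumed allocation granularity in bytes

    /** Internal fragmentation: bytes wasted when a request is rounded up. */
    static int internalWaste(int requestBytes) {
        int rounded = (requestBytes + GRANULE - 1) / GRANULE * GRANULE;
        return rounded - requestBytes;
    }

    /** External fragmentation: enough free space in total, but no single chunk fits. */
    static boolean externallyFragmentedFor(int requestBytes, List<Integer> freeChunks) {
        long totalFree = 0;
        int largest = 0;
        for (int c : freeChunks) {
            totalFree += c;
            largest = Math.max(largest, c);
        }
        return totalFree >= requestBytes && largest < requestBytes;
    }

    public static void main(String[] args) {
        System.out.println(internalWaste(20)); // 20 rounds up to 24: 4 bytes wasted
        // 1200 bytes free in total, but the largest chunk is only 500
        System.out.println(externallyFragmentedFor(1000, Arrays.asList(400, 300, 500)));
    }
}
```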
  40. Fragmentation (ii)
     • The bad news: you can never eliminate it!
       > It has been proven
     • The good news: you can decrease its likelihood
       > Decrease promotion into the CMS old generation
       > Be careful when coding
         – Large objects of various sizes are the main cause
  41. Concurrent CMS GC Threads
     • The number of parallel CMS threads is controlled by -XX:ParallelCMSThreads=<num>
       > Available in post-6 JVMs
     • Trade-off
       > CMS cycle duration vs.
       > Concurrent overhead during a CMS cycle
  42. Permanent Generation and CMS
     • To date, classes will not be unloaded by default from the permanent generation when using CMS
       > Both -XX:+CMSClassUnloadingEnabled and -XX:+PermGenSweepingEnabled need to be set to enable class unloading in CMS
       > The 2nd switch is not needed in post-6u4 JVMs
  43. Setting CMS Initiating Threshold
     • Again, a tricky trade-off!
     • Starting a CMS cycle too early
       > Frequent CMS cycles
       > High concurrent overhead
     • Starting a CMS cycle too late
       > Chance of an evacuation failure / Full GC
     • Initiating heap occupancy should be (much) higher than the application's steady-state live size
     • Otherwise, CMS will constantly do CMS cycles
  44. Common CMS Scenarios
     • Applications that promote non-trivial amounts of objects to the old generation
       > Old generation grows at a non-trivial rate
       > Very frequent CMS cycles
       > CMS cycles need to start relatively early
     • Applications that promote very few or even no objects to the old generation
       > Old generation grows very slowly, if at all
       > Very infrequent CMS cycles
       > CMS cycles can start quite late
  45. Initiating CMS Cycles
     • CMS will try to automatically find the best initiating occupancy
       > It first does a CMS cycle early to collect stats
       > Then, it tries to start cycles as late as possible, but early enough not to run out of heap before the cycle completes
       > It keeps collecting stats and adjusting when to start cycles
       > Sometimes, the second cycle starts too late
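To make the "as late as possible, but early enough" idea concrete, here is a deliberately simplified, hypothetical version of the decision: estimate how long until the old generation fills up at the current promotion rate, and start a cycle once that is no longer comfortably longer than a CMS cycle takes. The real CMS heuristic is more involved; the method name, parameters and safety factor are assumptions.

```java
// Hypothetical, simplified start-time decision; not the actual CMS implementation.
final class CmsStartEstimate {

    /**
     * @param oldFreeMb         free space left in the old generation
     * @param promotionMbPerSec recently observed promotion rate into the old generation
     * @param cycleSecs         recently observed duration of a whole CMS cycle
     * @param safetyFactor      headroom multiplier (> 1.0) for rate spikes
     */
    static boolean shouldStartCycle(double oldFreeMb, double promotionMbPerSec,
                                    double cycleSecs, double safetyFactor) {
        if (promotionMbPerSec <= 0) {
            return false; // nothing is being promoted, so the old generation is not filling up
        }
        double secsUntilFull = oldFreeMb / promotionMbPerSec;
        // Start as late as possible, but while the cycle can still finish before the heap is full.
        return secsUntilFull <= cycleSecs * safetyFactor;
    }

    public static void main(String[] args) {
        // 15 MB/s promotion, ~8 s cycles, 50% headroom: start once free space drops to ~180 MB
        System.out.println(shouldStartCycle(200, 15, 8, 1.5)); // ~13.3 s left: not yet
        System.out.println(shouldStartCycle(150, 15, 8, 1.5)); // 10 s left: start now
    }
}
```

A promotion-rate spike right after the stats are taken is exactly what makes such an estimate start too late, which is one way to read the slide's last bullet.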
  46. Initiating CMS Cycles (ii)
     • -XX:CMSInitiatingOccupancyFraction=<percent>
       > Occupancy percentage of the CMS old generation that triggers a CMS cycle
     • -XX:+UseCMSInitiatingOccupancyOnly
       > Don't use the ergonomic initiating occupancy
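With -XX:CMSInitiatingOccupancyFraction plus -XX:+UseCMSInitiatingOccupancyOnly, the trigger becomes a fixed occupancy check. The sketch below models that check roughly and adds a sanity test for a chosen fraction, following the earlier advice that the initiating occupancy should sit well above the steady-state live size. The old generation and live sizes and the 15-point headroom are hypothetical example values.

```java
// Fixed-threshold policy implied by -XX:CMSInitiatingOccupancyFraction together with
// -XX:+UseCMSInitiatingOccupancyOnly (a sketch; sizes are hypothetical).
final class CmsInitiatingFraction {

    static double occupancyPercent(long usedMb, long capacityMb) {
        return 100.0 * usedMb / capacityMb;
    }

    /** A cycle is triggered once old generation occupancy exceeds the fraction. */
    static boolean triggers(long usedMb, long capacityMb, int fractionPercent) {
        return occupancyPercent(usedMb, capacityMb) > fractionPercent;
    }

    /** The fraction should sit well above the live size, or CMS cycles run back to back. */
    static boolean leavesHeadroom(long liveMb, long capacityMb, int fractionPercent,
                                  double minHeadroomPercent) {
        return fractionPercent - occupancyPercent(liveMb, capacityMb) >= minHeadroomPercent;
    }

    public static void main(String[] args) {
        long oldGenMb = 640, liveMb = 300;  // steady-state live size ~47% of the old gen
        System.out.println(leavesHeadroom(liveMb, oldGenMb, 70, 15)); // ~23 points: fine
        System.out.println(leavesHeadroom(liveMb, oldGenMb, 50, 15)); // ~3 points: too close
        System.out.println(triggers(500, oldGenMb, 70));              // ~78% > 70%: start a cycle
    }
}
```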
  47. Initiating CMS Cycles (iii)
     • -XX:CMSInitiatingPermOccupancyFraction=<percent>
       > Occupancy percentage of the permanent generation that triggers a CMS cycle
       > Class unloading must be enabled
  48. CMS Cycle Initiation Example
     • This is good:
         [ParNew 640710K->546360K(773376K), 0.1839508 secs]
         [CMS-initial-mark 548460K(773376K), 0.0883685 secs]
         [ParNew 651320K->556690K(773376K), 0.2052309 secs]
         [CMS-concurrent-mark: 0.832/1.038 secs]
         [CMS-concurrent-preclean: 0.146/0.151 secs]
         [CMS-concurrent-abortable-preclean: 0.181/0.181 secs]
         [CMS-remark 623877K(773376K), 0.0328863 secs]
         [ParNew 655656K->561336K(773376K), 0.2088224 secs]
         [ParNew 648882K->554390K(773376K), 0.2053158 secs]
         ...
         [ParNew 489586K->395012K(773376K), 0.2050494 secs]
         [ParNew 463096K->368901K(773376K), 0.2137257 secs]
         [CMS-concurrent-sweep: 4.873/6.745 secs]
         [CMS-concurrent-reset: 0.010/0.010 secs]
         [ParNew 445124K->350518K(773376K), 0.1800791 secs]
         [ParNew 455478K->361141K(773376K), 0.1849950 secs]
  49. CMS Cycle Initiation Example (ii)
     • Cycle started too early:
         [ParNew 390868K->296358K(773376K), 0.1882258 secs]
         [CMS-initial-mark 298458K(773376K), 0.0847541 secs]
         [ParNew 401318K->306863K(773376K), 0.1933159 secs]
         [CMS-concurrent-mark: 0.787/0.981 secs]
         [CMS-concurrent-preclean: 0.149/0.152 secs]
         [CMS-concurrent-abortable-preclean: 0.105/0.183 secs]
         [CMS-remark 374049K(773376K), 0.0353394 secs]
         [ParNew 407285K->312829K(773376K), 0.1969370 secs]
         [ParNew 405554K->311100K(773376K), 0.1922082 secs]
         [ParNew 404913K->310361K(773376K), 0.1909849 secs]
         [ParNew 406005K->311878K(773376K), 0.2012884 secs]
         [CMS-concurrent-sweep: 2.179/2.963 secs]
         [CMS-concurrent-reset: 0.010/0.010 secs]
         [ParNew 387767K->292925K(773376K), 0.1843175 secs]
         [CMS-initial-mark 295026K(773376K), 0.0865858 secs]
         [ParNew 397885K->303822K(773376K), 0.1995878 secs]
  50. CMS Cycle Initiation Example (iii)
     • Cycle started too late:
         [ParNew 742993K->648506K(773376K), 0.1688876 secs]
         [ParNew 753466K->659042K(773376K), 0.1695921 secs]
         [CMS-initial-mark 661142K(773376K), 0.0861029 secs]
         [Full GC 645986K->234335K(655360K), 8.9112629 secs]
         [ParNew 339295K->247490K(773376K), 0.0230993 secs]
         [ParNew 352450K->259959K(773376K), 0.1933945 secs]
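A small scanner makes the difference between these three logs measurable: the heap occupancy at each CMS-initial-mark shows when the cycle started, and a Full GC entry flags a cycle that started too late. The sketch below handles only the abbreviated line formats on these slides (real -XX:+PrintGCDetails output carries more fields, so the patterns would need extending); the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal scanner for the abbreviated CMS log lines shown on the slides.
final class CmsLogScan {

    private static final Pattern COLLECTION = Pattern.compile(
            "\\[(ParNew|Full GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");
    private static final Pattern INITIAL_MARK = Pattern.compile(
            "\\[CMS-initial-mark (\\d+)K\\((\\d+)K\\)");

    /** Heap occupancy (%) when a CMS cycle started, or -1 if the line is not an initial mark. */
    static double initialMarkOccupancy(String line) {
        Matcher m = INITIAL_MARK.matcher(line);
        if (!m.find()) return -1;
        return 100.0 * Long.parseLong(m.group(1)) / Long.parseLong(m.group(2));
    }

    /** "ParNew" or "Full GC" for a collection entry, otherwise null. */
    static String kind(String line) {
        Matcher m = COLLECTION.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Pause time in seconds of a ParNew or Full GC entry, or -1 for other lines. */
    static double pauseSecs(String line) {
        Matcher m = COLLECTION.matcher(line);
        return m.find() ? Double.parseDouble(m.group(5)) : -1;
    }

    public static void main(String[] args) {
        String[] starts = {
            "[CMS-initial-mark 548460K(773376K), 0.0883685 secs]",  // good
            "[CMS-initial-mark 298458K(773376K), 0.0847541 secs]",  // started too early
            "[CMS-initial-mark 661142K(773376K), 0.0861029 secs]",  // started too late
        };
        for (String s : starts) {
            System.out.printf("cycle started at %.1f%% heap occupancy%n", initialMarkOccupancy(s));
        }
        String full = "[Full GC 645986K->234335K(655360K), 8.9112629 secs]";
        System.out.println(kind(full) + " pause: " + pauseSecs(full) + " s");
    }
}
```

On these slides the cycles start at about 70.9%, 38.6% and 85.5% of the heap. The "too early" cycle starts barely above the roughly 293 MB (about 38%) that remains after its own sweep, i.e., close to the live size, which is what the earlier "much higher than the steady-state live size" advice warns against.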
  51. Start CMS Cycles Explicitly
     • If you rely on explicit GCs (System.gc()) and want them to be concurrent, use:
       > -XX:+ExplicitGCInvokesConcurrent
         – Requires a post-6 JVM
       > -XX:+ExplicitGCInvokesConcurrentAndUnloadClasses
         – Requires a post-6u4 JVM
     • Useful when you want references / finalizers to be processed
  52. Agenda (recap, next up: Monitoring the GC)
  53. Monitoring the GC
     • Online
       > VisualVM: http://visualvm.dev.java.net/
       > VisualGC:
         – http://java.sun.com/performance/jvmstat/
         – VisualGC is also available as a VisualVM plug-in
         – Can monitor multiple JVMs within the same tool
     • Offline
       > GC Logging
       > PrintGCStats
       > GCHisto
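Besides the tools listed above, a JVM can report its own GC activity through the standard java.lang.management API. The sketch below (class name hypothetical) lists each collector with its collection count and accumulated collection time; the collector names depend on which GC is in use (with CMS, HotSpot typically reports "ParNew" and "ConcurrentMarkSweep").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// In-process GC counters via the standard management API.
final class GcCounters {

    /** Sum of collection counts over all collectors (-1 means "undefined" and is skipped). */
    static long totalCollections() {
        long n = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) n += c;
        }
        return n;
    }

    /** Sum of accumulated collection times in milliseconds (-1 values are skipped). */
    static long totalCollectionTimeMs() {
        long ms = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) ms += t;
        }
        return ms;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-25s collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

These counters are cumulative, so sampling them periodically and diffing gives GC frequency and overhead over time, a lightweight complement to the GC logs.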
  54. GC Logging in Production
     • Don't be afraid to enable GC logging in production
       > Very helpful when diagnosing production issues
     • Extremely low / non-existent overhead
       > Maybe some large files in your file system. :-)
       > We are surprised that customers are still afraid to enable it
     • Real customer quote:
       > “If someone doesn't enable GC logging in production, I shoot them!”
  55. Important GC Logging Parameters
     • You need at least:
       > -XX:+PrintGCTimeStamps
         – Add -XX:+PrintGCDateStamps if you must
       > -XX:+PrintGCDetails
         – Preferred over -verbosegc as it's more detailed
     • Also useful:
       > -Xloggc:<file>
         – Separates GC logging output from application output
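Since the slide names a minimum set of logging flags, a service could check at startup that it was launched with them. This hypothetical sketch reads the JVM's input arguments via RuntimeMXBean.getInputArguments(). The flag names are the ones this talk covers; JDK 9 and later replaced them with the unified -Xlog:gc option, so the check only makes sense on the older JVMs.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical startup check for the GC logging flags recommended above.
final class GcLoggingCheck {

    static boolean hasRecommendedGcLogging(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+PrintGCDetails")
                && jvmArgs.contains("-XX:+PrintGCTimeStamps");
    }

    /** The -Xloggc target file, or null if -Xloggc was not given (GC output then goes to stdout). */
    static String gcLogFile(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xloggc:")) {
                return arg.substring("-Xloggc:".length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (!hasRecommendedGcLogging(jvmArgs)) {
            System.err.println("warning: start with -XX:+PrintGCDetails -XX:+PrintGCTimeStamps");
        }
        System.out.println("GC log file: " + gcLogFile(jvmArgs));
    }
}
```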
  56. PrintGCStats
     • Summarizes GC logs
     • Downloadable script from
       > http://java.sun.com/developer/technicalArticles/Programming/turbo/PrintGCStats.zip
     • Usage
       > PrintGCStats -v cpus=<num> <gc log file>
         – Where <num> is the number of CPUs on the machine where the GC log was obtained
     • It might not work with some of the printing flags
  57. PrintGCStats (Parallel GC)
         what         count       total       mean      max   stddev
         gen0t(s)       193      11.470    0.05943    0.687   0.0633
         gen1t(s)         1       7.350    7.34973    7.350   0.0000
         GC(s)          194      18.819    0.09701    7.350   0.5272
         alloc(MB)      193   11244.609   58.26222  100.875  18.8519
         promo(MB)      193     807.236    4.18257   96.426   9.9291
         used0(MB)      193   16018.930   82.99964  114.375  17.4899
         used1(MB)        1     635.896  635.89648  635.896   0.0000
         used(MB)       194   91802.213  473.20728  736.490  87.8376
         commit0(MB)    193   17854.188   92.50874  114.500   9.8209
         commit1(MB)    193  123520.000  640.00000  640.000   0.0000
         commit(MB)     193  141374.188  732.50874  754.500   9.8209

         alloc/elapsed_time = 11244.609 MB / 77.237 s = 145.586 MB/s
         alloc/tot_cpu_time = 11244.609 MB / 1235.792 s = 9.099 MB/s
         alloc/mut_cpu_time = 11244.609 MB / 934.682 s = 12.030 MB/s
         promo/elapsed_time = 807.236 MB / 77.237 s = 10.451 MB/s
         promo/gc0_time = 807.236 MB / 11.470 s = 70.380 MB/s
         gc_seq_load = 301.110 s / 1235.792 s = 24.366%
         gc_conc_load = 0.000 s / 1235.792 s = 0.000%
         gc_tot_load = 301.110 s / 1235.792 s = 24.366%
  58. PrintGCStats (CMS)
         what         count       total       mean      max   stddev
         gen0(s)        110      24.381    0.22164    1.751   0.2038
         gen0t(s)       110      24.397    0.22179    1.751   0.2038
         cmsIM(s)         3       0.285    0.09494    0.108   0.0112
         cmsRM(s)         3       0.092    0.03074    0.032   0.0015
         GC(s)          113      24.774    0.21924    1.751   0.2013
         cmsCM(s)         3       2.459    0.81967    0.835   0.0146
         cmsCP(s)         6       0.971    0.16183    0.191   0.0272
         cmsCS(s)         3      14.620    4.87333    4.916   0.0638
         cmsCR(s)         3       0.036    0.01200    0.016   0.0035
         alloc(MB)      110   11275.000  102.50000  102.500   0.0000
         promo(MB)      110    1322.718   12.02471  104.608  11.8770
         used0(MB)      110   12664.750  115.13409  115.250   1.2157
         used(MB)       110   56546.542  514.05947  640.625  91.5858
         commit0(MB)    110   12677.500  115.25000  115.250   0.0000
         commit1(MB)    110   70400.000  640.00000  640.000   0.0000
         commit(MB)     110   83077.500  755.25000  755.250   0.0000

         alloc/elapsed_time = 11275.000 MB / 83.621 s = 134.835 MB/s
         alloc/tot_cpu_time = 11275.000 MB / 1337.936 s = 8.427 MB/s
         alloc/mut_cpu_time = 11275.000 MB / 923.472 s = 12.209 MB/s
         promo/elapsed_time = 1322.718 MB / 83.621 s = 15.818 MB/s
         promo/gc0_time = 1322.718 MB / 24.397 s = 54.217 MB/s
         gc_seq_load = 396.378 s / 1337.936 s = 29.626%
         gc_conc_load = 18.086 s / 1337.936 s = 1.352%
         gc_tot_load = 414.464 s / 1337.936 s = 30.978%
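The derived lines under the two PrintGCStats tables can be reproduced, to within rounding, from a few totals. The formulas in this sketch are our inference from the printed numbers, not taken from the PrintGCStats script: tot_cpu_time is elapsed time times the CPU count (1235.792 s / 77.237 s = 16, so the logs appear to come from a 16-CPU machine), stop-the-world pauses are charged to every CPU (18.819 s × 16 ≈ 301.110 s), and the concurrent CMS phases are charged once (cmsCM + cmsCP + cmsCS + cmsCR = 18.086 s). The class name is hypothetical.

```java
// Reproduces the derived lines of the two PrintGCStats summaries from their table totals.
// The formulas are inferred from the printed numbers, not taken from the script itself.
final class GcLoad {
    final double totCpuSecs;   // tot_cpu_time: elapsed time on all CPUs
    final double gcSeqSecs;    // stop-the-world pauses, charged to every CPU
    final double gcConcSecs;   // concurrent CMS phases, charged once
    final double mutCpuSecs;   // mut_cpu_time: what is left for the application

    GcLoad(double elapsedSecs, int cpus, double pauseTotalSecs, double concurrentTotalSecs) {
        totCpuSecs = elapsedSecs * cpus;
        gcSeqSecs = pauseTotalSecs * cpus;
        gcConcSecs = concurrentTotalSecs;
        mutCpuSecs = totCpuSecs - gcSeqSecs - gcConcSecs;
    }

    double seqLoadPercent()  { return 100.0 * gcSeqSecs / totCpuSecs; }
    double concLoadPercent() { return 100.0 * gcConcSecs / totCpuSecs; }
    double totLoadPercent()  { return seqLoadPercent() + concLoadPercent(); }

    static double rateMbPerSec(double mb, double secs) { return mb / secs; }

    public static void main(String[] args) {
        // Parallel GC: elapsed 77.237 s, 16 CPUs (inferred), GC(s) total 18.819 s
        GcLoad par = new GcLoad(77.237, 16, 18.819, 0.0);
        // CMS: elapsed 83.621 s, GC(s) total 24.774 s, concurrent phases 18.086 s
        GcLoad cms = new GcLoad(83.621, 16, 24.774, 18.086);
        System.out.printf("parallel: gc_seq_load=%.3f%% mut_cpu=%.3f s%n",
                par.seqLoadPercent(), par.mutCpuSecs);
        System.out.printf("cms:      gc_tot_load=%.3f%% gc_conc_load=%.3f%%%n",
                cms.totLoadPercent(), cms.concLoadPercent());
        System.out.printf("parallel alloc rate: %.3f MB/s%n", rateMbPerSec(11244.609, 77.237));
    }
}
```

Read this way, the tables say that about a quarter of the Parallel GC run's CPU time went to GC pauses, versus roughly 31% for the CMS run, of which only about 1.4 points were concurrent work.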
  59. GCHisto
     • Graphical GC log visualizer
     • Under development
       > Currently, can only show pause times
     • Open source at
       > http://gchisto.dev.java.net/
     • It might not work with some of the printing flags
  60. GCHisto (ii)
  61. GCHisto (iii)
  62. Agenda (recap, next up: Conclusions)
  63. Conclusions
     • Remember: GC tuning is an Art
     • The talk contained
       > Basic GC tuning concepts
       > How to monitor GCs
       > What to look out for
       > Examples of good tuning practices
     • ...and practice makes perfect!
  64. Garbage Collection Tuning in the Java HotSpot™ Virtual Machine
     Tony Printezis, Charlie Hunt
     Antonios.Printezis@sun.com
     Charlie.Hunt@sun.com
