Hotspot Garbage Collection - Tuning Guide


Part 2/2 of the Hotspot Garbage Collection series. This is the Tuning Guide portion!

Published in: Technology

Hotspot Garbage Collection - Tuning Guide

  1. Hotspot Garbage Collection Tuning Guide
     http://www.jclarity.com
     Thursday, 2 May 13
  2. Who are we?
     • Martijn Verburg (@karianna)
       – CTO of jClarity
       – aka "The Diabolical Developer"
       – co-leader of the LJC
     • Dr. John Oliver (@johno_oliver)
       – Research Mentat at jClarity
         • Strange title? Yes, we're a start-up
       – Can read raw GC log files
     • "Empirical Science Wins"
  3. What we're going to cover
     • Part I - Shining a light into the Darkness
       – Retrospective from Talk I
       – Collector Flags Ahoy
       – Tooling and Basic Data
     • Part II - Setting the stage
       – When to tune GC
       – Pause times vs Throughput vs Heap Size
       – Application Lifecycle
     • Part III - Real World Scenarios
       – Possible Memory Leak(s), Long Pause Times
       – Premature Promotion, System GCs, Low Throughput
       – Healthy Application, Maxed Allocation Rate
  4. What we're not covering
     • G1 Collector
       – It's supported in production now
       – But we doubt any of you are using it yet
     • Non-Hotspot JVMs
       – Again, most of you are using OpenJDK/Oracle
       – Azul's Zing VM is a specialist VM you can look at
  5. Part I - Shining a light into the dark
     • Retrospective
     • Collector Flags ahoy
     • Reading CMS Log records
     • Tooling and basic data
  6. Java Heap Layout (Copyright - Oracle Corporation)
  7. Weak Generational Hypothesis (Copyright - Oracle Corporation)
  8. Copy Collectors
     • aka "stop-and-copy"
       – Some literature will discuss "Cheney's algorithm"
     • Used in many managed runtimes
       – Including Hotspot
     • GC thread(s) trace from root(s) to find live objects
     • Typically involves copying live objects
       – From one space to another space in memory
       – The result typically looks like a move as opposed to a copy
  9. Mark and Sweep Collectors
     • Used by many modern collectors
       – Including Hotspot, usually for old generation collection
     • Typically two mandatory steps and one optional step
       1. Find live objects (mark)
       2. Delete dead objects (sweep)
       3. Tidy up - optional (compact)
  10. More Flags than your Deity (Copyright Frank Pavageau)
  11. Mandatory Flags
      • -Xloggc:<pathtofile>
        – Path to the log output, make sure you've got disk space!
      • -XX:+PrintGCDetails
        – Minimum information for tools to help
        – Replace -verbose:gc with this
      • -XX:+PrintTenuringDistribution
        – Premature promotion information
  12. Basic Heap Sizing Flags
      • -Xms<size>
        – Set the minimum size reserved for the heap
      • -Xmx<size>
        – Set the maximum size reserved for the heap
      • -XX:MaxPermSize=<size>
        – Set the maximum size of your perm gen
        – Good for Spring apps and App servers
      • We'll cover other flags in a tuning context
  13. Beware of Magic Happening
      • When you touch GC Flags a Puppy dies
      • Your Tenuring Threshold jumps to 15
      • -XX:MaxTenuringThreshold=n
        – Use this to reset it to what you really want
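The flags on the three slides above can be assembled into a launch command. The following is a minimal sketch, not the presenters' own tooling: the helper names, `app.jar`, the `1g` heap sizes and the threshold of 4 are all placeholder assumptions, and only the flags themselves come from the slides.

```python
def gc_logging_flags(log_path, max_tenuring_threshold=None):
    """Return the logging flags the slides call mandatory.

    max_tenuring_threshold is optional; pass it to pin the threshold
    explicitly instead of relying on whatever the JVM picks.
    """
    flags = [
        "-Xloggc:" + log_path,
        "-XX:+PrintGCDetails",
        "-XX:+PrintTenuringDistribution",
    ]
    if max_tenuring_threshold is not None:
        flags.append("-XX:MaxTenuringThreshold=%d" % max_tenuring_threshold)
    return flags


def java_command(jar_path, heap_min, heap_max, gc_flags):
    """Build an argument list suitable for subprocess.run (hypothetical app)."""
    return ["java", "-Xms" + heap_min, "-Xmx" + heap_max] + gc_flags + ["-jar", jar_path]


if __name__ == "__main__":
    # app.jar, gc.log and the sizes are illustrative values only.
    cmd = java_command("app.jar", "1g", "1g", gc_logging_flags("gc.log", 4))
    print(" ".join(cmd))
```

Keeping the flags in one place makes it easier to change a single flag per tuning iteration and to record exactly what each GC log was produced with.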
  14. Tooling
      • HPJMeter (Google it)
        – Solid, but no longer supported / enhanced
      • GCViewer
        – Has rudimentary G1 support
      • GarbageCat
        – Best name
      • IBM GCMV
        – J9 support
      • jClarity Censum
        – The prettiest and most useful, but we're biased!
  15. Don't listen to the vendors ;-)
      • Single log with consistent format?
        – You can probably grep for stuff
        – This doesn't scale
      • Existing free tools are adequate*
        – *For older JVMs especially
        – Most are no longer actively maintained
      • Latest tooling does more for you
        – Supports latest JVMs & Collectors
        – Has more meaningful visualisations
        – Starts to do some of the human analysis for you
        – Correlates and performs historical analysis
        – Parses certain data out that the others don't
  16. Summary Data
  17. Heap Usage After GC
  18. Recovered Heap
  19. Allocation Rates
  20. Pause Times
  21. Perm Space
  22. Tenuring Threshold
  23. Part II - Setting the stage
      • When to Tune
      • Latency / Throughput / Footprint
        – aka Performance goals
      • Application Lifecycle
      • Know your Hardware
  24. When to tune GC
      • As part of a performance diagnostic process
        – After looking at machine metrics
        – Before the execution profiler
      • It's cheap to switch on GC flags
        – It's cheap to eliminate GC as the cause, or to pin the issue on it
        – It's not cheap to set up execution profilers
      • Result is either "GC is OK" or "GC is not OK"
        – Tune the GC and/or
        – Bring out the memory profiler
  25. Latency vs Throughput vs Footprint
      • aka performance goals:
        – e.g. "Max Pause Times / 95th percentile Pause Times" vs
        – "Object Allocation Rate" vs
        – "Heap Size"
        – Throughput ~= % of time doing application work
      • Tuning tradeoff
        – Latency x Throughput x Footprint = Z
        – You can typically tune for 2 of these 3
        – To increase Z you need to
          • increase allocated hardware OR
          • rewrite your app
      • Decide what characteristics you want!
        – Before tuning
  26. Latency vs Throughput vs Footprint
      • Better Throughput
        – Usually means worse Latency and Footprint
      • Better Latency
        – Usually means worse Throughput
      • Better Footprint
        – Usually means worse Throughput
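A latency goal stated as "max pause" versus "95th percentile pause" can give very different numbers for the same log. The sketch below uses the nearest-rank percentile definition, which is an assumption on our part (tools may interpolate differently), and the pause values are invented for illustration rather than taken from any log in the talk.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the samples at or below it. pct is a number in (0, 100]."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing so whole-number cases stay exact.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Illustrative pauses in ms: many short young collections, one long Full GC.
    pauses_ms = [125] * 19 + [500]
    print("95th percentile:", percentile(pauses_ms, 95))
    print("max:", percentile(pauses_ms, 100))
```

Deciding up front which of the two figures is the goal avoids tuning for the wrong one.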
  27. Application Lifecycle
      • Very little point in tuning based off limited information
        – Have you gathered enough data?
        – Has your application gone through its typical lifecycle?
        – This is why we don't run Live Demos
      • Very little point in tuning off incorrect information
        – Application start-up, shutdown and batch jobs are all outliers
      • You can infer amazing things from GC logs
        – When Richard went to lunch
        – When John stopped playing Minecraft
        – When Ben kicked off the weekly customer report
        – .....
  28. Know your Hardware
      • Number of CPU cores matters
        – Allocate X threads to do GC work with a concurrent collector
        – How many is safe?
        – How does that affect throughput?
      • Memory Bandwidth matters
        – How quickly can your hardware allocate?
        – See your manufacturer
        – Object Allocation Rates != Memory Bandwidth != Real Metric
      • Use Hawkshaw to explore your hardware
        – Produces GC behaviour according to statistical models
  29. Part III - Tuning Scenarios
      • Tuning can make it worse!
      • Grain of Salt
      • Scenarios
        – Possible Memory Leak(s)
        – Long Pause Times
        – Premature Promotion
        – System GCs
        – Low Throughput
        – Healthy Application
        – Maxed Allocation Rate
  30. Tuning can make it worse*
      • Performance Tuning is an iterative process
        – Sometimes solving one problem uncovers a second, worse problem
        – e.g. Fix the app, then the database gets hammered
          • Overall performance goes down
      • Only fix one aspect of GC at a time
        – Measure the next cycle with fresh eyes
        – Have you met your goals or made them worse?
      • GC tuning still needs human interaction
        – Azul's Zing can/will claim otherwise.
  31. Grain of Salt
      "Nothing that we say should be held as performance tuning tips for *your* application"
      "There is *always* more than one way to tune in order to meet your goal"
      "Don't just use our numbers!"
  32. A Likely Memory Leak
      • Memory leaks can't truly be ascertained by a GC log
        – It could just be an undersized heap!
        – Needs human domain knowledge of the app (periodicity)
      • First rule of thumb is to increase your heap
        – Rule out having an undersized heap
      • Second rule of thumb is to fire up the Memory profiler
        – Visual VM will do in most cases
  33. A Likely Memory Leak
      • The log covers only 1000 seconds; the number of Full GCs is highly indicative. Note the trend along the bottom.
  34. A Possible Memory Leak - I
      • Note: trend along the bottom, slow leak possible. Look for cycles in the log, e.g. a full day in an application's life.
  35. A Possible Memory Leak - II
      • Note: trend along the bottom, slow leak possible. Again, look for cycles in the log.
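The "trend along the bottom" in the leak slides is the line through heap occupancy measured right after each collection. One way to put a number on it, sketched here as an assumption rather than as what any of the listed tools actually does, is an ordinary least-squares slope over (time, heap-after-GC) samples. The sample values are invented for illustration.

```python
def trend_slope(samples):
    """Least-squares slope of heap-after-GC over time.

    samples: list of (time_seconds, heap_used_after_gc_mb) pairs.
    Returns MB per second; close to zero suggests a flat trough line,
    a persistently positive value is a reason to go and profile.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / float(n)
    mean_h = sum(h for _, h in samples) / float(n)
    num = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den


if __name__ == "__main__":
    flat = [(0, 100), (10, 100), (20, 100)]
    rising = [(0, 100), (10, 110), (20, 120)]
    print(trend_slope(flat), trend_slope(rising))
```

A positive slope over a short window proves nothing on its own: as the slides say, the window has to cover at least one full application cycle, and an undersized heap has to be ruled out first.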
  36. Using a Memory Profiler
      • Visual VM
        – Memory profiler - invasive and slow on large apps
        – Look at object ages (aka Generations)
      • Look for a high number of generations
        – They're a candidate
        – Make sure you switch on "record allocation stack traces"
      • Use the allocation stack trace to find the root cause
        – Track back from core JRE classes to your code
        – Yes, it's always your code that's the problem!
      • Can also try jmap -histo
  37. Visual VM - Memory Profiler
      • Note: objects in many generations! This indicates they're leaking
  38. Visual VM - Stack Trace
      • NThreadedManagedCache$ root cause
  39. Long Pause Times
      • The #1 complaint relating to GC
        – Lots of ways to mitigate
        – From small tuning tweaks to off-heap solutions
      • User reports paused/locked application!
        – e.g. Web pages taking ages to load
        – e.g. Progress bars stalling
      • Tech Support want to uninstall Java!
  40. Long Pause Time Example
      • User has set heap to: -Xms5G -Xmx5G
      • NOTE: Resident Set Size ~1GB
  41. Long Pause Time Example
      • ~125ms young gen pauses & ~500ms Full GC pauses
        – OK for a web app, but this is a new prototype low-latency trading app, media streaming app or advertising service. Oh dear!
  42. Long Pause Time partial fix
      • Reduce heap size to -Xmx1500M: more frequent, shorter pauses
  43. Long Pause Time partial fix
      • ~20ms young gen pauses & ~250ms Full GC pauses, Better!
  44. Long Pause Time fixed
      • Move to a CMS collector, hopefully shorter pauses
      • No Full GCs! Therefore minimal Tenured pauses
  45. Long Pause Time fixed!
      • ~10ms young gen pauses, ~2ms tenured pauses, Better!
      • BUT: Throughput decreased from 69% down to 49% :-(
  46. Other Long Pause Time Solutions
      • Increase number of threads performing GC
        – -XX:ParallelGCThreads=N
        – Rule of thumb is to use 3/4 of the available physical cores
        – Can reduce application throughput - can be bad
        – Can increase context switching - bad
      • Try an alternative collector
        – ParNew/CMS vs PSScavenge/ParOld vs iCMS vs G1 etc
        – Match the collector to your application and hardware
      • Special note on G1
        – You can set pause time goals
        – BUT: We haven't reliably succeeded with <100ms pause times
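The 3/4 rule of thumb for GC threads is simple arithmetic on the physical core count. A minimal sketch follows; the function names are hypothetical, rounding down is our choice rather than something the slide specifies, and the caller is assumed to know the physical (not hyper-threaded) core count.

```python
def suggested_gc_threads(physical_cores):
    """Apply the 3/4-of-physical-cores rule of thumb, never going below 1."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return max(1, (physical_cores * 3) // 4)


def parallel_gc_threads_flag(physical_cores):
    """Render the result as the -XX:ParallelGCThreads flag from the slide."""
    return "-XX:ParallelGCThreads=%d" % suggested_gc_threads(physical_cores)


if __name__ == "__main__":
    for cores in (1, 4, 8, 16):
        print(cores, "cores ->", parallel_gc_threads_flag(cores))
```

Treat the result as a starting point to measure against, since more GC threads can cost application throughput and increase context switching.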
  47. Extreme Long Pause Time Solutions
      • Azul's Zing JVM
        – This has proven low pause time goal settings
        – JCK/TCK compliant
        – Typically needs a very large heap (15GB+)
      • Take memory off heap
        – Good for caches in particular
      • GC in offline mode
        – Cluster the app and take nodes offline in order to run GC on them
  48. Premature Promotion
      • User reports more pauses and/or longer pauses
      • Tech support reports there are more full GCs
      • Objects are promoted to Tenured too early
        – Recall the Weak Generational Hypothesis!
        – This causes more Old Gen collections
          • Which can lead to more Full GCs
  49. Premature Promotion Example
      Customer had set:
        -XX:+UseConcMarkSweepGC
        -XX:+UseParNewGC
        -XX:+PrintGCDetails
        -Xloggc:gc.log
        -Xmx1024m
        -XX:+PrintTenuringDistribution
        -XX:NewRatio=2
        -XX:MaxTenuringThreshold=4
      NewRatio=2 means young gen gets ~1/3 of the total heap
  50. Premature Promotion Example
      • Note: ~26% of objects promoted at age 1
  51. Premature Promotion Fixed
      • We dropped NewRatio to 1, Premature Promotion ~4%
        – Weak Generational Hypothesis is a better fit
        – This gives the Young Gen ~1/2 the heap
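The "~1/3" and "~1/2" figures follow from NewRatio being the ratio of old generation to young generation, so the young generation gets 1/(NewRatio + 1) of the heap. A short sketch of that arithmetic, with hypothetical function names and ignoring perm gen and any rounding the JVM applies:

```python
def young_gen_fraction(new_ratio):
    """Fraction of the heap given to the young generation for -XX:NewRatio=n."""
    if new_ratio < 1:
        raise ValueError("new_ratio must be at least 1")
    return 1.0 / (new_ratio + 1)


def young_gen_mb(heap_mb, new_ratio):
    """Approximate young generation size in MB for a given heap size."""
    return heap_mb * young_gen_fraction(new_ratio)


if __name__ == "__main__":
    # -Xmx1024m as in the customer's settings.
    print("NewRatio=2:", round(young_gen_mb(1024, 2)), "MB")
    print("NewRatio=1:", round(young_gen_mb(1024, 1)), "MB")
```

With the customer's 1024 MB heap, moving from NewRatio=2 to NewRatio=1 grows the young generation from roughly 341 MB to 512 MB, which gives short-lived objects more room to die before they are promoted.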
  52. System GCs
      • User reports frequent pauses
        – System GCs are Full GCs!
      • Tech support reports there are more full GCs
        – With this funny System wording in the log
      • System GCs often interfere with the GC subsystem
        – JVM no longer resizes heap based on runtime info
      • Caused by System.gc() in code or an RMI call
        – Very occasionally used to solve a problem
        – System.gc() is almost always honoured
        – You can disable it with -XX:+DisableExplicitGC
  53. System GC example
      • NOTE: 34,000 system GCs, every 1/2 second
        – Throughput 51% - Unhappy Minecraft players!
  54. System GC calls Fixed
      • -XX:+DisableExplicitGC
      • Throughput went to 99.8% - Happier Minecraft players
  55. Low Throughput
      • User reports slow application
        – e.g. Batch job fails to complete on time
      • Tech support reports there are lots of GCs
      • Lots of small GCs can also be bad!
        – Your application threads aren't able to allocate objects
        – i.e. Low Throughput
      • Throughput increases when system is quiet
        – Be careful to analyse the right period of activity
  56. Low Throughput example 1/4
      • 61 seconds in total pause time, log is only 170 seconds long
      • Throughput is 64%; rule of thumb is that it should be 95%+
  57. Low Throughput example 2/4
      • Lots of small pauses from various collectors, which ones?
  58. Low Throughput example 3/4
      • ~25% time spent in young GC & ~5-10% in Full GCs (CMFs)
  59. Low Throughput example 4/4
      • Object allocation hitting max heap size
        – Able to recover memory, so no leak, needs a bigger heap!
  60. Low Throughput Fixed 1/4
      • Increased footprint to -Xmx1024M
  61. Low Throughput Fixed 2/4
      • Far fewer pauses from Full GCs (CMFs) - just looks nicer!
        – Still lots of young gen pauses
  62. Low Throughput Fixed 3/4
      • ~15% time spent in young GC & ~0% in Full GCs
  63. Low Throughput Fixed 4/4
      • Note: 33 seconds out of 170, ~81% Throughput, Better!
  64. Low Throughput Really Fixed 1/2
      • Switched to PSYoungGen collector (from ParNew)
        – Worth trying as young gen collections are dominant
  65. Low Throughput Really Fixed 2/2
      • Note: 9 seconds out of 170, ~95% Throughput, Best!
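The throughput figures quoted across the Low Throughput slides can be reproduced from total pause time and log duration. This sketch assumes throughput is simply the share of wall-clock time not spent in GC pauses, so it ignores CPU consumed by concurrent GC threads; the function name is ours.

```python
def throughput_percent(total_pause_seconds, elapsed_seconds):
    """Percentage of elapsed time the application was not paused for GC."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if not 0 <= total_pause_seconds <= elapsed_seconds:
        raise ValueError("pause time must be between 0 and elapsed time")
    return 100.0 * (1.0 - float(total_pause_seconds) / elapsed_seconds)


if __name__ == "__main__":
    # Pause totals and the 170 second log length are the ones on the slides.
    for label, pause in (("original", 61), ("bigger heap", 33), ("PSYoungGen", 9)):
        print("%-12s %.1f%%" % (label, throughput_percent(pause, 170)))
```

The three cases work out to about 64%, 81% and 95%, matching the slides.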
  66. Healthy Application
      • What is healthy? It depends!
      • Throughput
        – Typically a 95%+ throughput is good
      • Pause times
        – < 1sec is good for generic web apps
      • Footprint
        – Smaller == Fewer live objects to track == Better?
  67. Healthy Application
      • Saw tooth pattern
      • Trend line along the bottom of the troughs is flat
  68. Healthy Minecraft Client!
      • Note: JVM resizing itself, you let IT do the work!
  69. Maxed Allocation Rate
      • User reports slow application behaviour
      • Tech support has no idea why!
        – Normally you'd do a full performance diagnostic
        – But we can look at GC cheaply
      • GC logs can help with non-GC problems!
        – Memory Bandwidth limits are being hit
        – Not a GC problem!
      • More common in virtualised environments
        – What else on the hardware is using bandwidth?
  70. Not Maxed Allocation Rate Example
  71. Max Allocation Rate Example
      • 8GB/sec - could be getting close to real memory bandwidth
  72. Max Allocation Rate Example
      • Hard limit at ~8GB (8e+06 on graph)
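An allocation rate can be estimated from two consecutive GC records: whatever the heap grew by between the end of one collection and the start of the next was allocated in that interval. The sketch below is an approximation under that assumption, not a description of how any particular tool computes its graph, and the input numbers are made up to land on a rate like the one on the slide.

```python
def allocation_rate_mb_per_sec(used_after_prev_gc_mb, used_before_gc_mb, seconds_between):
    """Estimate allocation rate between two consecutive collections.

    used_after_prev_gc_mb: heap occupancy when the previous GC finished.
    used_before_gc_mb:     heap occupancy when the next GC started.
    seconds_between:       time between those two points.
    """
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    allocated = used_before_gc_mb - used_after_prev_gc_mb
    if allocated < 0:
        raise ValueError("occupancy cannot shrink between collections")
    return allocated / float(seconds_between)


if __name__ == "__main__":
    # Invented sample: 4000 MB allocated in half a second.
    print(allocation_rate_mb_per_sec(0, 4000, 0.5), "MB/s")
```

If the rate plateaus at the same ceiling however hard the application is driven, that points at the hardware limit rather than at the collector.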
  73. Max Allocation Rate Fixes
      • Lots you can do!
      • Stop allocating so much!
        – Get out your Memory profiler
        – Alter the application's object allocation behaviour
      • Get better hardware!
        – CPU
        – Faster Bus
        – Faster RAM
      • Don't virtualise/share
        – Have your application be the only thing on that hardware
  74. Summary
      • You need to understand some basic GC theory
        – Work with the Weak Generational Hypothesis
        – See for blog posts
      • Turn on GC logging!
        – It has low overhead*
        – Reading raw log files is hard
        – Use tooling!
      • Tradeoff: Pause Times vs Throughput vs Heap Size
        – Use tools to help you tweak
        – "Empirical Science Wins!"
  75. Join our performance community
      http://www.jclarity.com
      Martijn Verburg (@karianna)
      Dr. John Oliver (@johno_oliver)