Java GC - Pause tuning

English version of the presentation we gave at Devoxx FR 2012.
An in-depth analysis of how the Java garbage collector works and how to minimise pauses in your application.

Transcript

  • 1. Death by pauses: Everything you ever wanted to know about GC pauses* (*but were afraid to ask). Tuesday, July 10, 12
  • 2. Agenda: 1. Introduction 2. Crime Scene Investigation 3. JVM memory management systems and tools 4. Putting it together
  • 3. The Crime Scene. PG 13* (*Parents strongly cautioned: typed language, dead objects and verbose logs may not be suitable for scripting language fans)
  • 4-6. B2C e-commerce platform (Apache, Tomcat, Oracle) • 12+ servers • 10 different webapps • 50+ JVMs (Oracle JDK6) • > 30000 sessions • 250-400 req/s • Variance is high
  • 7-8. ... an unusual victim... Product catalog modeled as a graph • 100% custom implementation • 100% on-heap (no SQL except for initial load) • In-place update by AtomicReference.set() • Caching aggressively is not possible • Large number of request-scoped objects • Many WS calls into back-office systems = latency
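The in-place update by AtomicReference.set() could look roughly like the sketch below. It is only an illustration: `ProductData`, `CatalogNode`, and their fields are hypothetical names, not taken from the platform described in the talk. The point is that readers never lock, and that every update turns the previous snapshot into garbage.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable snapshot of one catalog node (names are illustrative).
final class ProductData {
    final String name;
    final long priceCents;

    ProductData(String name, long priceCents) {
        this.name = name;
        this.priceCents = priceCents;
    }
}

// A graph node whose payload is swapped in place; readers never take a lock.
final class CatalogNode {
    private final AtomicReference<ProductData> data;

    CatalogNode(ProductData initial) {
        this.data = new AtomicReference<ProductData>(initial);
    }

    ProductData read() {
        return data.get();
    }

    // Publishes a new snapshot. The previous snapshot becomes garbage once
    // no in-flight request still holds a reference to it.
    void update(ProductData next) {
        data.set(next);
    }
}

class CatalogDemo {
    public static void main(String[] args) {
        CatalogNode node = new CatalogNode(new ProductData("kettle", 2999));
        node.update(new ProductData("kettle", 2499));
        System.out.println(node.read().priceCents);
    }
}
```

If the catalog is long-lived, replaced snapshots have probably been tenured already, so this style of update tends to produce garbage in the Old generation rather than in Young.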
  • 9. Throughput vs. Latency
  • 11. Interactive e-commerce app: Low latency is the top priority!
  • 12-15. The Crime Scene [charts over time: JDBC connections, requests/s, active threads, HTTP executor queue size]
  • 16-17. The evidence [chart: heap size in MB over 1 hour] Can’t see anything: let’s zoom out!
  • 18-20. The evidence [chart: heap size in MB over 24 hours]
  • 21. The evidence [charts over 1 hour: heap size in MB; time spent in GC (%)]
  • 22-28. The usual suspects... • OutOfMemory Heap • OutOfMemory PermGen • Long GC pauses ➡ under high load = immediate death
  • 29. The usual suspects... [stamped across the slide: Death by pauses]
  • 30. Why do we need this GC thing again? “Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free.” Cliff Click
  • 31-41. Fine, we just need to tune the JVM, right?... POP QUIZ! Number of command-line flags*? • less than 100 flags • 100 <= X < 200 • 200 <= X < 300 • 300 <= X < 400 • 400 <= X < 500 • 500 <= X < 600 • 600 <= X < 700 • Answer: 664 flags! (* Oracle JVM 1.6.0_31 x86_64 server)
  • 44. Memory in the JVM
  • 45-46. JVM memory: Permanent (PermGen): class metadata, interned Strings, etc. Heap: application objects
  • 47. Permanent (PermGen); Old / Tenured; Young / New
  • 48. Permanent (PermGen); Old / Tenured; Young split into Eden, S0, S1
  • 49. The Garbage Collector is generational [diagram: Eden, Survivor 0, Survivor 1, Old]
  • 50-65. [animation over Eden, Survivor 0, Survivor 1, Old] Allocation; 100% = GC!; unreferenced vs. live objects; copy; reset...; allocation; 100% = GC!; copy; copy; reset... (Generation 1, Generation 2); allocation; 100% = GC!; copy; promotion
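The allocation, copy, reset, and promotion cycle in these slides can be mimicked with a toy model. This is a simplified sketch and not how HotSpot is implemented: liveness is fixed at allocation time, space limits are ignored, and the tenuring threshold of 2 is an assumption chosen to match the "Generation 1, Generation 2, promotion" sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Toy object: liveness is decided at allocation time to keep the model small.
class ToyObject {
    final boolean live; // still referenced when a collection runs?
    int age;            // number of young collections survived so far

    ToyObject(boolean live) {
        this.live = live;
    }
}

class ToyYoungGen {
    static final int TENURING_THRESHOLD = 2; // assumption for illustration

    final List<ToyObject> eden = new ArrayList<ToyObject>();
    List<ToyObject> from = new ArrayList<ToyObject>(); // occupied survivor space
    List<ToyObject> to = new ArrayList<ToyObject>();   // empty survivor space
    final List<ToyObject> old = new ArrayList<ToyObject>();

    void allocate(boolean live) {
        eden.add(new ToyObject(live));
    }

    // Young collection: copy live objects out of Eden and the occupied
    // survivor space, reset both, then swap the survivor roles.
    void minorGc() {
        copyLive(eden);
        copyLive(from);
        eden.clear();
        from.clear();
        List<ToyObject> tmp = from;
        from = to;
        to = tmp;
    }

    private void copyLive(List<ToyObject> space) {
        for (ToyObject o : space) {
            if (!o.live) {
                continue; // unreferenced objects are simply left behind
            }
            o.age++;
            if (o.age > TENURING_THRESHOLD) {
                old.add(o); // promotion
            } else {
                to.add(o);  // survives one more round in Young
            }
        }
    }
}
```

In this model an object that stays live is copied between survivor spaces twice and promoted on its third collection, while short-lived objects never cost anything beyond their allocation.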
  • 66. Old
  • 67. Old “almost full” = GC!
  • 70. Old: compaction (optional)
  • 72. Garbage Collectors • Generational • Stop the world! • Throughput or Concurrent
  • 73-74. GC characteristics [empty grid: Young (Serial, Parallel) vs. Old (Serial, Parallel, Concurrent); the Serial Old row is marked “Default”, the Parallel Old row “N/A”]
  • 75-81. GC characteristics (collector name for each Young / Old combination)
    Old \ Young | Serial     | Parallel
    Serial      | Serial     | Parallel
    Parallel    | -          | ParallelOld
    Concurrent  | CMS Serial | CMS
    The parallel implementation actually differs for each variant
  • 84. CMS is the right choice. Average test duration (s): Serial 917; Parallel 852; ParallelOld 846; CMS 871; CMS Serial 937
  • 85. Tools: CLI: jps, jhat, jmap, jstack, jstat
    $ jstat -gcutil PID
      S0     S1     E      O      P     YGC    YGCT    FGC   FGCT     GCT
     0.00  40.88  58.41  18.34  66.65   2729  316.538   46   6.820  323.358
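The cumulative counters in that jstat sample can be turned into average times per collection. The sketch below parses the header and value lines from the slide; the class and method names are made up for the example, and other JVM versions may print different columns, so the parser pairs names with values instead of relying on positions.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class JstatGcUtil {
    // Pairs each column name with its value; both lines are whitespace-separated.
    static Map<String, Double> parse(String header, String values) {
        String[] names = header.trim().split("\\s+");
        String[] cells = values.trim().split("\\s+");
        if (names.length != cells.length) {
            throw new IllegalArgumentException("column count mismatch");
        }
        Map<String, Double> row = new LinkedHashMap<String, Double>();
        for (int i = 0; i < names.length; i++) {
            row.put(names[i], Double.parseDouble(cells[i]));
        }
        return row;
    }

    // Average seconds per collection event; 0 when none has happened yet.
    static double average(double totalSeconds, double count) {
        return count == 0 ? 0.0 : totalSeconds / count;
    }

    public static void main(String[] args) {
        Map<String, Double> row = parse(
            "S0 S1 E O P YGC YGCT FGC FGCT GCT",
            "0.00 40.88 58.41 18.34 66.65 2729 316.538 46 6.820 323.358");
        System.out.println(String.format(Locale.ROOT, "young avg: %.3f s",
            average(row.get("YGCT"), row.get("YGC"))));
        System.out.println(String.format(Locale.ROOT, "full avg: %.3f s",
            average(row.get("FGCT"), row.get("FGC"))));
    }
}
```

For the sample on the slide this works out to roughly 0.116 s per young collection and 0.148 s per full-collection event as counted by jstat. Averages hide the outliers that hurt latency, so they are a starting point, not a diagnosis.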
  • 86. Tools: GUIs
  • 87-88. Tools: GUIs (2) • Any profiler • During development • For autopsies! HeapDumpOnOutOfMemoryError, HeapDumpPath
  • 89-97. verbose:gc [annotated GC log, built up over several slides] Stop the world!
  • 98. MBeans
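The same counters are available in-process through the standard java.lang.management API. A minimal sketch, where `GcMBeanReader` is a made-up name and the collector names printed depend on which GC the JVM is running:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcMBeanReader {
    // One line per collector: cumulative collection count and time since JVM start.
    static String describe(GarbageCollectorMXBean gc) {
        return gc.getName()
            + " count=" + gc.getCollectionCount()
            + " timeMs=" + gc.getCollectionTime();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(describe(gc));
        }
    }
}
```

Both values are cumulative, so a monitoring job has to sample them periodically and work with the differences between samples.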
  • 99-100. OK, so we can measure... temperature!
  • 101. But... a single temperature measure is not enough to diagnose anything! We must archive all measurements to know the baseline! Credit: http://www.lhup.edu/mkhalequ/fieldtrip/geos253.htm
  • 102. Therefore we must persist all measurements! • JMX + jmxtrans • RRD • Graphite • etc.
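As an illustration of what persisting a measurement can mean in practice, the sketch below formats one GC counter as a line of Graphite's plaintext protocol (metric path, value, timestamp in epoch seconds). The metric prefix, the sanitizing rule, and the sample values are assumptions for the example; a tool such as jmxtrans would normally do this for you.

```java
class GraphiteLine {
    // Collector names can contain spaces ("PS MarkSweep"); keep the path safe.
    static String sanitize(String name) {
        return name.replaceAll("[^A-Za-z0-9_-]", "_");
    }

    // Builds "<prefix>.<collector>.<metric> <value> <epochSeconds>".
    static String format(String prefix, String collector, String metric,
                         long value, long epochSeconds) {
        return prefix + "." + sanitize(collector) + "." + metric
            + " " + value + " " + epochSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical prefix and values, for illustration only.
        long now = System.currentTimeMillis() / 1000L;
        System.out.println(format("shop.jvm1.gc", "ParNew", "count", 2729L, now));
        System.out.println(format("shop.jvm1.gc", "ConcurrentMarkSweep", "timeMs", 6820L, now));
    }
}
```

Once every sample is stored with its timestamp, the baseline and any later deviation from it become visible on a graph.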
  • 103. Operating the (many) switches only makes sense... Credit: http://www.our-energy.com
  • 104. ...if we can measure/compare the effects! [chart: CPU time, before vs. after]
  • 105. Putting it together
  • 106. We want to minimize the GC pauses: Young (ParNew); Old (CMS-initial-mark + CMS-remark)
  • 107-108. JVM and Tomcat vs. Application (code)
  • 109. 1. Code • Tuning the JVM cannot compensate for bad code • Rules of thumb: • Immutability = object reuse = fewer allocations* • Move code invariants out of tight loops • Know the characteristics of your data structures & frameworks (java.util, Guava, Hibernate, etc.) • Mind the gap: data structure overhead can kill you! (* But... pooling can be counter-productive!)
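One way to read "move code invariants out of tight loops" is shown below. The example is hypothetical (the SKU format and the class name are invented): `String.matches` compiles its regular expression on every call, while a precompiled `Pattern` is built once and reused.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class InvariantHoisting {
    // Compiled once: the pattern does not depend on the loop variable.
    private static final Pattern SKU = Pattern.compile("[A-Z]{3}-\\d{4}");

    // Compiles a new Pattern on every iteration, which allocates on each pass.
    static int countValidSlow(List<String> skus) {
        int n = 0;
        for (String s : skus) {
            if (s.matches("[A-Z]{3}-\\d{4}")) {
                n++;
            }
        }
        return n;
    }

    // Reuses the shared Pattern; only a Matcher is created per iteration.
    static int countValidFast(List<String> skus) {
        int n = 0;
        for (String s : skus) {
            if (SKU.matcher(s).matches()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> skus = Arrays.asList("ABC-1234", "bad", "XYZ-0001");
        System.out.println(countValidSlow(skus) + " " + countValidFast(skus));
    }
}
```

Both methods return the same count; the difference is the amount of short-lived garbage produced per request, which is what drives young collection frequency.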
  • 110-112. Example: HashMap. HashMap: 48 bytes; Entry[16] table: 80 bytes; Entry (key, value): 32 bytes. Overhead = 160 bytes! Alternatives: • SingletonMap (40 bytes) • initialCapacity + loadFactor
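The two alternatives named on the slide map onto standard java.util calls. In the sketch below the byte sizes are the ones quoted on the slide, not measured, and the class and helper names are made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SmallMaps {
    // Sizes as quoted on the slide; they are not measured by this code.
    static final int HASHMAP_BYTES = 48;
    static final int TABLE_16_BYTES = 80;
    static final int ENTRY_BYTES = 32;

    // Fixed cost of a default HashMap holding a single entry.
    static int oneEntryOverheadBytes() {
        return HASHMAP_BYTES + TABLE_16_BYTES + ENTRY_BYTES;
    }

    // Immutable one-entry map: no table and no spare buckets.
    static Map<String, String> single(String key, String value) {
        return Collections.singletonMap(key, value);
    }

    // Sized so that 'expected' entries fit without a resize at this load factor.
    static <K, V> Map<K, V> presized(int expected) {
        float loadFactor = 0.75f;
        int capacity = (int) Math.ceil(expected / loadFactor);
        return new HashMap<K, V>(capacity, loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(oneEntryOverheadBytes());
        System.out.println(single("sku", "ABC-1234"));
        Map<String, Integer> stock = presized(100);
        stock.put("ABC-1234", 3);
        System.out.println(stock);
    }
}
```

The singleton map only fits read-only data with exactly one entry; presizing helps when the final size is known up front, since the table is never copied during growth.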
  • 113. Fewer allocations... [chart: young GCs per second]
  • 114. ... saves CPU [chart: CPU load]
  • 115-116. 2. Tomcat • Pooling (warning: pooling may lead to Old fragmentation!) • JSP tags: enablePooling in web/webdefault.xml • -Dorg.apache.jasper.runtime.JspFactoryImpl.USE_POOL=false • Careful with buffers and their reuse • -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true • JSP usage is a factor in PermGen requirements • Test & Measure, always!
  • 117-119. 3. The JVM [two charts of heap size over time: one with pauses > 1s!, one with frequent GC]
  • 120-121. The heap • -Xms: start size • -Xmx: max size
  • 122-123. Young vs Old: -XX:NewSize, -XX:MaxNewSize, -XX:NewRatio
  • 124. Young vs Old. Old: • “Working set” • Caches, object pools • HttpSession, average-lifespan objects. Young: objects < RequestScope
  • 125-126. First mistake: setting the Young too small • Young fills up quickly = many Young GCs • Objects promoted to Tenured too fast = many Old GCs
  • 127-128. Second mistake: setting Young too large • Young GC pauses increase
  • 129. Tuning Young. Default NewRatio=8 with -server on Intel = Young too small for a webapp with non-trivial load!
  • 130. Tuning Young. Increase Young slowly and measure the effects!
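To see why NewRatio=8 leaves little room for Young: NewRatio is the ratio of Old to Young, so Young gets 1/(NewRatio + 1) of the heap. A small arithmetic sketch, where the 2048 MB heap is an assumed figure and not one from the talk:

```java
class YoungSizing {
    // NewRatio = Old / Young, so Young is 1 / (NewRatio + 1) of the heap.
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    public static void main(String[] args) {
        long heapMb = 2048; // assumed heap size, for illustration only
        System.out.println("NewRatio=8 -> young ~" + youngMb(heapMb, 8) + " MB");
        System.out.println("NewRatio=2 -> young ~" + youngMb(heapMb, 2) + " MB");
    }
}
```

With these assumed numbers Young grows from about 227 MB to about 682 MB. Whether that helps or hurts pause times has to be confirmed by measurement, as the slide insists.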
  • 131-132. Old: Mind the Gaps (fragmentation)! JDK6 < u22: ParNew (promotion failed); (promotion failure size = 15595)
  • 133. Old generation: ideal shape
  • 134. Old generation: real life
  • 135-136. Old generation: ideal vs. real. Rate increases
  • 137-138. Things to watch for • Traffic/load variance • Traffic increases => memory pressure increases • CMS requires some headroom to operate properly • Several phases are concurrent, i.e. they run at the same time as new objects are allocated • Log excerpt: (concurrent mode failure): 2165740K->1284261K(2228224K), 8.9411250 secs
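The concurrent mode failure excerpt can be picked apart mechanically. The sketch below handles only the fragment shown on the slide; real GC log lines carry more fields and vary between JVM versions, so the regular expression and the class names are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Values extracted from one "before->after(capacity), secs" fragment.
final class CmfEvent {
    final long beforeK;
    final long afterK;
    final long capacityK;
    final double pauseSeconds;

    CmfEvent(long beforeK, long afterK, long capacityK, double pauseSeconds) {
        this.beforeK = beforeK;
        this.afterK = afterK;
        this.capacityK = capacityK;
        this.pauseSeconds = pauseSeconds;
    }

    // Fraction of the reported capacity in use when the failure happened.
    double occupancyBefore() {
        return (double) beforeK / capacityK;
    }
}

class CmfParser {
    private static final Pattern SIZES =
        Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs");

    // Returns null when the line does not contain the expected fragment.
    static CmfEvent parse(String line) {
        Matcher m = SIZES.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new CmfEvent(
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3)),
            Double.parseDouble(m.group(4)));
    }

    public static void main(String[] args) {
        CmfEvent e = parse(
            "(concurrent mode failure): 2165740K->1284261K(2228224K), 8.9411250 secs");
        System.out.println(e.occupancyBefore() + " " + e.pauseSeconds);
    }
}
```

For the line on the slide, occupancy was about 97% of the reported capacity when CMS lost the race, and the fallback collection stopped the application for almost 9 seconds.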
  • 139-140. Giving CMS some room to operate. CMSInitiatingOccupancyFraction = 92%. This is the default...
  • 141. Giving CMS some room to operate. We really need 75-80%. UseCMSInitiatingOccupancyOnly forces the JVM to consider only this criterion
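The effect of lowering the initiating occupancy is easiest to see as headroom: the space still free in Old when a CMS cycle starts, which is what concurrent allocation and promotion can consume before the cycle finishes. A small arithmetic sketch, with an assumed Old generation of 1,572,864 KB (1.5 GB) that does not come from the talk:

```java
class CmsHeadroom {
    // Free space left in Old at the moment a CMS cycle is triggered.
    static long headroomKb(long oldCapacityKb, int initiatingOccupancyPercent) {
        return oldCapacityKb * (100 - initiatingOccupancyPercent) / 100;
    }

    public static void main(String[] args) {
        long oldKb = 1572864L; // assumed Old generation size, for illustration only
        System.out.println("92% -> " + headroomKb(oldKb, 92) + " KB free at cycle start");
        System.out.println("75% -> " + headroomKb(oldKb, 75) + " KB free at cycle start");
    }
}
```

Going from 92% to 75% roughly triples the headroom, at the price of more frequent CMS cycles.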
  • 142. CMS initial-mark
  • 143-145. CMS initial-mark (cumulative). Median: -83%; Top 99%: -79%
  • 146. CMS remark
  • 147-148. CMS remark (cumulative). Top 90%: -56%
  • 149. But... I still see pauses! • RMI triggers explicit GC regularly • Invokes System.gc() • Explicit GC = Full GC (Serial) = 4-8s stop-the-world pause! • DisableExplicitGC + CMSClassUnloadingEnabled • ExplicitGCInvokesConcurrentAndUnloadsClasses
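A quick way to check how a given JVM configuration reacts to explicit GC requests is to compare collection counts around a System.gc() call. This is only a probe under assumptions: `ExplicitGcProbe` is a made-up name, System.gc() is a request rather than a guarantee, and unrelated collections can run at the same moment.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class ExplicitGcProbe {
    // Sum of collection counts over all collectors; -1 means "not available".
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // the call RMI issues periodically, per the slide
        long after = totalCollections();
        System.out.println("collections caused around System.gc(): " + (after - before));
    }
}
```

Running the probe once with the default settings and once with the flags from the slide shows whether the explicit request is ignored, handled concurrently, or still ends in a full stop-the-world collection.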
  • 150. Complete GC comparison
  • 153. What’s next? • Survivors tuning (S0 & S1): size, ratio vs. Eden, max generation • G1: principles and operations are radically different! • Other JVMs: JRockit, Azul, IBM • Check tuning validity after every code change! • Measure, measure, measure!
  • 154. Questions?