Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Game of Performance: A Song of JIT and GC

1,107 views

Published on

This is a presentation on Hotspot VM's runtime, JIT and GC. The presentation talks about various optimizations "close to the metal"

Published in: Technology
  • Be the first to comment

Game of Performance: A Song of JIT and GC

  1. 1. A Song of JIT and GC 1 QCon London 2016 Monica Beckwith monica@codekaram.com; @mon_beck https://www.linkedin.com/in/monicabeckwith www.codekaram.com Game of Performance
  2. 2. ©2016 CodeKaram About Me • Java/JVM/GC Performance Engineer/Consultant • Worked at AMD, Sun, Oracle… • Worked with HotSpot JVM for more than a decade • JVM heuristics, JIT compiler, GCs: Parallel(Old) GC, G1 GC, CMS GC 2
  3. 3. ©2016 CodeKaram Many Thanks • Vladimir Kozlov (Hotspot JIT Team) - for clearing my understanding of tiered compilation, escape analysis and nuances of dynamic deoptimizations. • Jon Masamitsu (Hotspot GC Team) - for keeping me on track with JDK8 changes that related to GC. 3
  4. 4. ©2016 CodeKaram Agenda • The Helpers within the JVM • JIT & Runtime • Adaptive Optimization • CompileThreshold • Inlining • Dynamic Deoptimization 4
  5. 5. ©2016 CodeKaram Agenda • The Helpers within the JVM • JIT & Runtime • Tiered Compilation • Intrinsics • Escape Analysis • Compressed Oops • Compressed Class Pointers 5
  6. 6. ©2016 CodeKaram Agenda • The Helpers within the JVM • Garbage Collection • Allocation • TLAB • NUMA Aware Allocator 6
  7. 7. ©2016 CodeKaram Agenda • The Helpers within the JVM • Garbage Collection • Reclamation • Mark-Sweep-Compact (Serial + Parallel) • Mark-Sweep • Scavenging 7
  8. 8. ©2016 CodeKaram Agenda • The Helpers within the JVM • Garbage Collection • String Interning and Deduplication 8
  9. 9. ©2016 CodeKaram Interfacing The Metal 9 Java Application Java APIs Runtime JIT Compiler GC Java VM Class loader OS + Hardware
  10. 10. ©2016 CodeKaram Interfacing The Metal 10 Java Application Java APIs Java VM OS + HardwareJRE Runtime JIT Compiler GC Class loader
  11. 11. ©2016 CodeKaram The Helpers 11 Java VM Runtime JIT Compiler GC
  12. 12. 12 The Helpers - JIT And Runtime.
  13. 13. 13 Advanced JIT & Runtime Optimizations - Adaptive Optimization
  14. 14. ©2016 CodeKaram • Startup - Interpreter • Adaptive optimization - Performance critical methods • Compilation: • CompileThreshold • Identify root of compilation • Method Compilation or On-stack replacement (Loop)? 14 JIT & Runtime
  15. 15. ©2016 CodeKaram PrintCompilation 15 timestamp compilation-id flags tiered- compilation-level Method <@ osr_bci> code- size <deoptimization>
  16. 16. ©2016 CodeKaram PrintCompilation 16 Flags: %: is_osr_method s: is_synchronized !: has_exception_handler b: is_blocking n: is_native
  17. 17. ©2016 CodeKaram PrintCompilation 17 567 693 % ! 3 org.h2.command.dml.Insert::insertRows @ 76 (513 bytes) 656 797 n 0 java.lang.Object::clone (native) 779 835 s 4 java.lang.StringBuffer::append (13 bytes)
  18. 18. ©2016 CodeKaram • Adaptive optimization - Performance critical methods • Inlining: • MinInliningThreshold, MaxFreqInlineSize, InlineSmallCode, MaxInlineSize, MaxInlineLevel, DesiredMethodLimit … 18 JIT & Runtime
  19. 19. ©2016 CodeKaram PrintInling* 19 @ 76 java.util.zip.Inflater::setInput (74 bytes) too big @ 80 java.io.BufferedInputStream::getBufIfOpen (21 bytes) inline (hot) @ 91 java.lang.System::arraycopy (0 bytes) (intrinsic) @ 2 java.lang.ClassLoader::checkName (43 bytes) callee is too large * needs -XX:+UnlockDiagnosticVMOptions
  20. 20. ©2016 CodeKaram • Adaptive optimization - Performance critical methods • Dynamic de-optimization • dependencies invalidation • classes unloading and redefinition • uncommon path in compiled code 20 JIT & Runtime
  21. 21. ©2016 CodeKaram PrintCompilation 21 573 704 2 org.h2.table.Table::fireAfterRow (17 bytes) 7963 2223 4 org.h2.table.Table::fireAfterRow (17 bytes) 7964 704 2 org.h2.table.Table::fireAfterRow (17 bytes) made not entrant 33547 704 2 org.h2.table.Table::fireAfterRow (17 bytes) made zombie
  22. 22. 22 Advanced JIT & Runtime Optimizations - Tiered Compilation
  23. 23. ©2016 CodeKaram • Start in interpreter • Tiered optimization with client compiler • Code profiled information • Enable server compiler 23 Tiered Compilation
  24. 24. ©2016 CodeKaram • C1 compilation threshold for tiered is about a 100 invocations • Tiered Compilation has a lot more profiled information for C1 compiled methods • CodeCache needs to be 5x larger than non-tiered • Default on JDK8 when tiered is enabled (48MB vs 240MB) • Need more? Use -XX:ReservedCodeCacheSize 24 Tiered Compilation - Effect on Code Cache
  25. 25. ©2016 CodeKaram • Fast dynamic type tests for type safety • Range check elimination • Loop unrolling • Profile data guided optimizations • Escape Analysis • Intrinsics • Vectorization 25 Other JIT Compiler Optimizations
  26. 26. 26 Advanced JIT & Runtime Optimizations - Intrinsics
  27. 27. ©2016 CodeKaram27 Without Intrinsics Java Method JIT Compilation Execute Generated Code
  28. 28. ©2016 CodeKaram28 Intrinsics Java Method JIT Compilation Execute Optimized Code Call Hand- Optimized Assembly Code
  29. 29. ©2016 CodeKaram PrintInling* 29 @ 76 java.util.zip.Inflater::setInput (74 bytes) too big @ 80 java.io.BufferedInputStream::getBufIfOpen (21 bytes) inline (hot) @ 91 java.lang.System::arraycopy (0 bytes) (intrinsic) @ 2 java.lang.ClassLoader::checkName (43 bytes) callee is too large * needs -XX:+UnlockDiagnosticVMOptions
  30. 30. 30 Advanced JIT & Runtime Optimizations - Escape Analysis
  31. 31. ©2016 CodeKaram • Entire IR graph • escaping allocations? • not stored to a static field or non-static field of an external object, • not returned from method, • not passed as parameter to another method where it escapes. 31 Escape Analysis
  32. 32. ©2016 CodeKaram32 Escape Analysis allocated object doesn’t escape the compiled method allocated object not passed as a parameter + remove allocation and keep field values in registers =
  33. 33. ©2016 CodeKaram33 Escape Analysis allocated object is passed as a parameter + remove locks associated with object and use optimized compare instructions = allocated object doesn’t escape the compiled method
  34. 34. 34 Advanced JIT & Runtime Optimizations - Compressed Oops
  35. 35. ©2016 CodeKaram A Java Object 35 Header Body Klass Mark Word Array Length
  36. 36. ©2016 CodeKaram Objects, Fields & Alignment • Objects are 8 byte aligned (default). • Fields: • are aligned by their type. • can fill a gap that maybe required for alignment. • are accessed using offset from the start of the object 36
  37. 37. ©2016 CodeKaram37 Mark Word - 32 bit vs 64 bit Klass - 32 bit vs 64 bit Array Length - 32 bit on both boolean, byte, char, float, int, short - 32 bit on both double, long - 64 bit on both ILP32 vs. LP64 Field Sizes
  38. 38. ©2016 CodeKaram Compressed OOPs 38 <wide-oop> = <narrow-oop-base> + (<narrow-oop> << 3) + <field-offset>
  39. 39. ©2016 CodeKaram Compressed OOPs 39 Heap Size? <4 GB (no encoding/ decoding needed) >4GB; <28GB (zero-based) <wide-oop> = <narrow-oop> <narrow-oop> << 3
  40. 40. ©2016 CodeKaram Compressed OOPs 40 Heap Size? >28 GB; <32 GB (regular) >32 GB; <64 GB * (change alignment) <wide-oop> = <narrow-oop-base> + (<narrow-oop> << 3) + <field- offset> <narrow-oop-base> + (<narrow-oop> << 4) + <field- offset>
  41. 41. ©2016 CodeKaram Compressed OOPs 41 Heap Size? <4 GB >4GB; <28GB <32GB <64GB Object Alignment? 8 bytes 8 bytes 8 bytes 16 bytes Offset Required? No No Yes Yes Shift by? No shift 3 3 4
  42. 42. ©2016 CodeKaram Compressed Class Pointers 42 • JDK 8 —> Perm Gen Removal —> Class Data outside of heap • Compressed class pointer space • contains class metadata • is a part of Metaspace
  43. 43. ©2016 CodeKaram Compressed Class Pointers 43 PermGen Removal Overview by Coleen Phillimore + Jon Masamitsu @JavaOne 2013
  44. 44. 44 The Helpers - Garbage Collection.
  45. 45. ©2016 CodeKaram Garbage Collection 45 Allocation + Reclamation
  46. 46. 46 Garbage Collection - Allocation.
  47. 47. ©2016 CodeKaram Generational Heap 47 Young Generation Old Generation Eden Survivors
  48. 48. 48 Fast Path Allocation - Thread Local Allocation Buffers (TLABs).
  49. 49. ©2016 CodeKaram Garbage Collection - Allocation 49 Most Allocations Eden Space Fast Path TLABs per thread: • allocate lock-free • bump-(your-own)-pointer allocation • only co-ordinate when needing a new TLAB
  50. 50. ©2016 CodeKaram TLAB 50 EdenThread 1 Thread 2 Thread 3 Thread 4 TLAB TLAB TLAB TLAB TLAB Thread 0
  51. 51. 51 (Min)TLABSize ResizeTLAB TLABWasteTargetPercent PrintTLAB
  52. 52. ©2016 CodeKaram TLAB 52 Eden TLAB TLAB TLAB TLAB TLAB
  53. 53. ©2016 CodeKaram TLAB 53 Eden TLAB TLAB TLAB TLAB TLAB TLABWasteTargetPercent (default = 1%)
  54. 54. ©2016 CodeKaram TLAB 54 Eden TLAB TLAB TLAB TLAB TLAB
  55. 55. ©2016 CodeKaram TLAB 55 Eden TLAB TLAB TLAB TLAB (Min)TLABSize
  56. 56. ©2016 CodeKaram TLAB 56 Eden TLAB TLAB TLAB ResizeTLAB (default = TRUE)
  57. 57. 57 Allocation - Non Uniform Memory Access (NUMA) Aware Allocator.
  58. 58. ©2016 CodeKaram NUMA Processing Node 0 Memory Controller DRAM Bank Processing Node 2 Memory Controller DRAM Bank Processing Node 1 Memory Controller DRAM Bank Processing Node 3 Memory Controller DRAM Bank 58
  59. 59. ©2016 CodeKaram UseNUMA 59 Processing Node 0 Memory Controller DRAM Bank Processing Node 1 Memory Controller DRAM Bank Thread 0 Area for Node 0 Area for Node 1 Thread 1 Thread 2 Eden
  60. 60. 60 UseNUMA UseNUMAInterleaving UseAdaptiveNUMAChunkSizing NUMAStats
  61. 61. 61 Garbage Collection - Reclamation.
  62. 62. ©2016 CodeKaram62 Old Generation Root Set Young Generation Garbage Collection -Reclamation via (Serial) Mark-Sweep-Compact
  63. 63. ©2016 CodeKaram Garbage Collection -Reclamation via Parallel Mark-Compact 63 Old Generation
  64. 64. ©2016 CodeKaram64 Garbage Collection -Reclamation via Parallel Mark-Compact Old Generation
  65. 65. ©2016 CodeKaram65 Garbage Collection -Reclamation via Parallel Mark-Compact Old Generation source region destination region
  66. 66. ©2016 CodeKaram66 Garbage Collection -Reclamation via Parallel Mark-Compact Old Generation source region destination region
  67. 67. ©2016 CodeKaram67 Garbage Collection -Reclamation via Parallel Mark-Compact Old Generation source region destination region source region
  68. 68. ©2016 CodeKaram68 Garbage Collection -Reclamation via Parallel Mark-Compact Old Generation source region destination region source region
  69. 69. ©2016 CodeKaram69 Garbage Collection -Reclamation via Parallel Mark-Compact Old Generation
  70. 70. ©2016 CodeKaram Garbage Collection - Reclamation via Mark-Sweep 70 Old Generation
  71. 71. ©2016 CodeKaram Garbage Collection - Reclamation via Mark-Sweep 71 Old Generation
  72. 72. ©2016 CodeKaram Garbage Collection - Reclamation via Scavenging 72 SurvivorEden Young Generation
  73. 73. ©2016 CodeKaram73 SurvivorEden Young Generation Garbage Collection - Reclamation via Scavenging
  74. 74. ©2016 CodeKaram74 SurvivorEden Young Generation Garbage Collection - Reclamation via Scavenging
  75. 75. ©2016 CodeKaram75 Garbage Collection In OpenJDK HotSpot The Throughput Collector - • Young Collections - Parallel Scavenge • Old Collections - Parallel Mark-Compact
  76. 76. ©2016 CodeKaram76 Garbage Collection In OpenJDK HotSpot CMS Collector - • Young Collections - Parallel New (similar to Parallel Scavenge) • Old Collections - (Mostly Concurrent) Mark- Sweep • Fallback Collections - Serial Mark-Sweep- Compact
  77. 77. ©2016 CodeKaram77 Garbage Collection In OpenJDK HotSpot G1 Collector - • Young and Mixed Collections - Compaction via Copying (similar to Parallel Scavenge) • Fallback Collections - Serial Mark-Sweep- Compact
  78. 78. 78 What’s The #1 Contributor To A GC Pause Duration?
  79. 79. Copying Costs! 79
  80. 80. ©2016 CodeKaram • Size generations keeping your application’s object longevity and size in mind. • short-lived; medium-lived; long-lived transient + permanent set. • premature promotions are a big problem! 80 Garbage Collection - Tuning Recommendations
  81. 81. ©2016 CodeKaram Generation Sizing [Eden: 4972.0M(4972.0M)->0.0B(4916.0M) Survivors: 148.0M->204.0M Heap: 5295.8M(10.0G)->379.4M(10.0G)] [Eden: Occupancy before GC(Eden size before GC)->Occupancy after GC(Eden size after GC) Survivors: Size before GC->Size after GC Heap: Occupancy before GC(Heap size before GC)- >Occupancy after GC(Heap size after GC)] 81
  82. 82. ©2016 CodeKaram Heap Information Plot 82 OccupancyinMBs 0 7500 15000 22500 30000 TimeStamps 2500 3750 5000 6250 7500 8750 10000 11250 12500 Eden Occupancy Before GC Eden Size After GC Survivor Size After GC Old Generation Occupancy After GC Heap Occupancy Before GC Heap Size
  83. 83. 83 What Are The ContributorS To GC Pause Frequency?
  84. 84. Allocation Rate and Promotion Rate 84
  85. 85. ©2016 CodeKaram • The faster the generation gets “filled”; the sooner a GC is triggered. • Premature promotions are a big problem! • Size your generations and age your objects appropriately. 85 Garbage Collection - Tuning Recommendations
  86. 86. ©2016 CodeKaram Plot Allocation & Promotion Rates 86
  87. 87. ©2016 CodeKaram87 Young Occupancy before GC Young Gen Size Old Gen Occupancy after GC Heap Occupancy before GC Heap Occupancy after GC Heap Size Timestamps
  88. 88. ©2016 CodeKaram Allocation Rate 88 38.692: [GC (Allocation Failure) [PSYoungGen: 917486K- >131054K(858112K)] 1004844K->224424K(2955264K), 0.0827570 secs] [Times: user=0.61 sys=0.00, real=0.08 secs] 51.013: [GC (Allocation Failure) [PSYoungGen: 858094K- >50272K(777728K)] 951464K->239014K(2874880K), 0.1414536 secs] [Times: user=0.70 sys=0.07, real=0.14 secs]
  89. 89. ©2016 CodeKaram Allocation Rate 89 38.692: [GC (Allocation Failure) [PSYoungGen: 917486K- >131054K(858112K)] 1004844K->224424K(2955264K), 0.0827570 secs] [Times: user=0.61 sys=0.00, real=0.08 secs] 51.013: [GC (Allocation Failure) [PSYoungGen: 858094K- >50272K(777728K)] 951464K->239014K(2874880K), 0.1414536 secs] [Times: user=0.70 sys=0.07, real=0.14 secs] Allocation rate = (858094K-131054K)/()
  90. 90. ©2016 CodeKaram Allocation Rate 90 38.692: [GC (Allocation Failure) [PSYoungGen: 917486K- >131054K(858112K)] 1004844K->224424K(2955264K), 0.0827570 secs] [Times: user=0.61 sys=0.00, real=0.08 secs] 51.013: [GC (Allocation Failure) [PSYoungGen: 858094K- >50272K(777728K)] 951464K->239014K(2874880K), 0.1414536 secs] [Times: user=0.70 sys=0.07, real=0.14 secs] Allocation rate = (858094K-131054K)/(51.013-38.692) = 57.6MB/s
  91. 91. ©2016 CodeKaram Promotion Rate 91 38.692: [GC (Allocation Failure) [PSYoungGen: 917486K- >131054K(858112K)] 1004844K->224424K(2955264K), 0.0827570 secs] [Times: user=0.61 sys=0.00, real=0.08 secs] 51.013: [GC (Allocation Failure) [PSYoungGen: 858094K- >50272K(777728K)] 951464K->239014K(2874880K), 0.1414536 secs] [Times: user=0.70 sys=0.07, real=0.14 secs] Promotion rate = ((239014K - 50272K) – ()
  92. 92. ©2016 CodeKaram Promotion Rate 92 38.692: [GC (Allocation Failure) [PSYoungGen: 917486K- >131054K(858112K)] 1004844K->224424K(2955264K), 0.0827570 secs] [Times: user=0.61 sys=0.00, real=0.08 secs] 51.013: [GC (Allocation Failure) [PSYoungGen: 858094K- >50272K(777728K)] 951464K->239014K(2874880K), 0.1414536 secs] [Times: user=0.70 sys=0.07, real=0.14 secs] Promotion rate = ((239014K - 50272K) – (951464K - 858094K))/ ()
  93. 93. ©2016 CodeKaram Promotion Rate 93 38.692: [GC (Allocation Failure) [PSYoungGen: 917486K- >131054K(858112K)] 1004844K->224424K(2955264K), 0.0827570 secs] [Times: user=0.61 sys=0.00, real=0.08 secs] 51.013: [GC (Allocation Failure) [PSYoungGen: 858094K- >50272K(777728K)] 951464K->239014K(2874880K), 0.1414536 secs] [Times: user=0.70 sys=0.07, real=0.14 secs] Promotion rate = ((239014K - 50272K) – (951464K - 858094K))/ (51.013 - 38.692) = 7.56MB/s
  94. 94. 94 String Interning and Deduplication
  95. 95. ©2016 CodeKaram95 Java String Object String1 String1 Object char []
  96. 96. 96 What If String1 Is Interned?
  97. 97. ©2016 CodeKaram97 Java Interned String Object String1 String1 Object char [] StringX Object char [] StringY Object char [] Pool of Strings Java Heap
  98. 98. ©2016 CodeKaram98 Java String Intern • unique + constant pool of strings • hashtable • default = 60013 (on LP64) • if pool already contain string.equals(String1)? • return string from the string pool • else, add String1 to the pool; return its reference
  99. 99. ©2016 CodeKaram99 Java Interned String Objects String1 String1 Object char [] String2 String2 Object char []
  100. 100. 100 If String1.intern() == String2.Intern() Then String1.equals(String2) Is True.
  101. 101. ©2016 CodeKaram101 Java Interned String Objects String1 String1 Object char [] StringX Object char [] StringY Object char [] String2 Pool of Strings Java Heap
  102. 102. 102 What If String1.equals(String2) And Both Are Not Interned*? * And You Are Using The G1 Collector
  103. 103. ©2016 CodeKaram103 Java String Deduplication (G1) String1 String1 Object char [] String2 String2 Object
  104. 104. ©2016 CodeKaram Further Reading 104 • Mark Price’s talk: https://qconlondon.com/presentation/hot-code- faster-code-addressing-jvm-warm • https://wiki.openjdk.java.net/display/HotSpot/Server+Compiler +Inlining+Messages • https://wiki.openjdk.java.net/display/HotSpot/EscapeAnalysis • Compressed Class Pointers: https://youtu.be/AHtfza2Tkt0?t=754 • String Deduplication: http://openjdk.java.net/jeps/192 • Perm Gen Removal: http://www.infoq.com/articles/Java- PERMGEN-Removed
  105. 105. ©2016 CodeKaram Appendix 105
  106. 106. ©2016 CodeKaram Real-World Issues & Workarounds - JDK 8 update 45+ 106 https://github.com/facebook/presto/commit/ 91e1b3bb6bbfffc62401025a24231cd388992d7c https://gist.github.com/nileema/ 6fb667a215e95919242f

×