JVM Magic


Virtual machines don't have to be slow; they don't even have to be slower than native code.
All you have to do is write your code, sit back, and let the JVM do its magic!
Learn about the JVM's runtime optimizations and why it is considered one of the best VMs in the world.

Published in: Technology

JVM Magic

  1. 1. The JVM Magic<br />Baruch Sadogursky<br />Consultant & Architect, AlphaCSP<br />
  2. 2. Agenda<br />Introduction<br />GC Magic 101<br />General Optimizations<br />Compiler Optimizations<br />What can I do?<br />Programming tips<br />JVM configuration flags<br />
  3. 3. Introduction<br />
  4. 4. Introduction<br />In the past, many considered the JVM to be Java’s Achilles’ heel<br />Interpreter?!<br />The JVM team improved performance by a factor of 300 to 3000<br />JDK 1.6 compared to JDK 1.0<br />Java is measured to be 50% to 100+% the speed of C and C++<br />Jake2 vs Quake2<br />How can it be?<br />
  5. 5. Java Virtual Machines Zoo<br />CEE-J<br />Excelsior JET<br />Hewlett-Packard<br />J9 (IBM)<br />Jbed<br />Jblend<br />JRockit<br />MRJ<br />MicroJvm<br />MS JVM<br />OJVM<br />PERC<br />Blackdown Java<br />CVM<br />Gemstone<br />Golden Code Development<br />Intent<br />Novell<br />NSIcom CrE-ME<br />ChaiVM<br />HotSpot<br />AegisVM<br />Apache Harmony<br />CACAO<br />Dalvik<br />IcedTea<br />IKVM.NET<br />Jamiga<br />JamVM<br />Jaos<br />JC<br />Jelatine JVM<br />JESSICA<br />Jikes RVM<br />JNode<br />JOP<br />Juice<br />Jupiter<br />JX<br />Kaffe<br />leJOS<br />Mika VM<br />Mysaifu<br />NanoVM<br />SableVM<br />Squawk virtual machine<br />SuperWaba<br />TinyVM<br />VMkit of Low Level Virtual Machine<br />Wonka VM<br />Xam<br />
  6. 6. HotSpot Virtual Machine<br />Originally developed by Longview Technologies; first released by Sun in 1999<br />Contains:<br />Class loader<br />Bytecode interpreter<br />2 Virtual machines<br />7 Garbage collectors<br />2 Compilers<br />Runtime libraries<br />
  7. 7. HotSpot Virtual Machine<br />Configured by hundreds of -XX flags<br />Reminder<br />-X options are non-standard<br />-XX options have specific system requirements for correct operation<br />Both are subject to change without notice<br />
  8. 8. GC Magic 101<br />
  9. 9. GC Is Slow?<br />GC has a bad performance reputation<br />Reduces throughput<br />Introduces pauses<br />Unpredictable<br />Uncontrolled<br />Performance degradation is proportional to object count<br />Just give me the damn free() and malloc()! I’ll be just fine!<br />Is it so?<br />
  10. 10. Generational Collectors<br />Weak generational hypothesis<br />Most objects die young (AKA Infant mortality)<br />Few old to young references<br />Generations: regions holding objects of different ages<br />GC is done separately once a generation fills<br />Different GC algorithms<br />The young (nursery) generation<br />Collected by “Minor garbage collection”<br />The old (tenured) generation<br />Collected by “Major garbage collection”<br />
  11. 11. GC Magic 101<br />Young is better than Tenured<br />Let your objects die in the young generation<br />When possible and when it makes sense<br />
  12. 12. GC Magic 101<br />Swapping is bad<br />Application's memory footprint should not exceed the available physical memory<br />
  13. 13. GC Magic 101<br />Choose:<br />Throughput (client)<br />Low-pause (server)<br />
  14. 14. GC Magic 101<br />http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html<br />
  15. 15. Tracing Collector Algorithms<br />Mark-Sweep collector<br />Mark phase marks each reachable object<br />Sweep phase “sweeps” the heap<br />Unmarked objects are reclaimed as garbage<br />Copying collector<br />Heap is divided into two equal spaces<br />When active space fills, live objects are copied to the unused space<br />Only live objects are examined<br />The roles of the spaces are then flipped<br />
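A minimal sketch of the mark and sweep phases described on this slide, written as a toy simulation over a hand-built object graph. The `ToyMarkSweep` and `Node` names are made up for illustration, and this is not how HotSpot implements its collectors; it only shows why unreachable objects (including cycles) end up unmarked and get reclaimed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy mark-sweep over a simulated heap; illustrative only. */
class ToyMarkSweep {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Mark phase: flag every node reachable from the given node. */
    static void mark(Node node) {
        if (node.marked) return;          // already visited, also stops on cycles
        node.marked = true;
        for (Node ref : node.refs) mark(ref);
    }

    /** Sweep phase: walk the whole heap, keep marked nodes, drop the rest. */
    static List<Node> sweep(List<Node> heap) {
        List<Node> survivors = new ArrayList<>();
        for (Node n : heap) {
            if (n.marked) {
                n.marked = false;         // reset the flag for the next cycle
                survivors.add(n);
            }
        }
        return survivors;
    }

    /** Builds a small graph, runs one collection, returns the surviving names. */
    static List<String> collect() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);                    // a -> b
        c.refs.add(d);                    // c -> d
        d.refs.add(c);                    // d -> c: a cycle nothing else points to
        List<Node> heap = Arrays.asList(a, b, c, d);
        mark(a);                          // a is the only root
        List<String> names = new ArrayList<>();
        for (Node n : sweep(heap)) names.add(n.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(collect());    // prints [a, b]
    }
}
```

Note that the sweep visits every node in the heap, live or dead, whereas a copying collector only touches live objects.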
  16. 16. Compaction<br />Compaction: The collector moves all live objects to the bottom of the heap<br />Remaining memory is reclaimed<br />Reduces the cost of object allocation<br />No potential fragmentation<br />The drawback is slower completion of GC<br />
  17. 17. The Young generation<br />Consists of Eden + two survivor spaces<br />Objects are initially allocated in Eden<br />All HotSpot young collectors are stop-the-world copying collectors<br />Done in parallel for parallel garbage collectors<br />Collections are relatively fast and proportional to number of live objects<br />
  18. 18. The Young generation<br />
  19. 19. The Tenured generation<br />Objects surviving several GC cycles are promoted to the tenured generation<br />Use -XX:MaxTenuringThreshold=# to change<br />Collector algorithms used are variations of Mark-Sweep<br />More space efficient<br />Characteristics<br />Lower garbage density<br />Bigger heap space<br />Fewer GC cycles<br />
  20. 20. Generation Collectors<br />
  21. 21. Garbage Collectors<br />
  22. 22. GC Flags<br />
  23. 23. When to Use<br />
  24. 24. Garbage First (G1)<br />New in JDK 1.6 u14 (May 29th)<br />All memory is divided into 1MB buckets<br />Calculates object liveness in buckets<br />Drops “dead” buckets<br />If a bucket is not total garbage, it’s not dropped<br />Collects the most garbage buckets first<br />Pauses only on “mark”<br />No sweep<br />User can provide pause time goals<br />Actual seconds or Percentage of runtime<br />G1 records bucket collection time and can estimate how many buckets to collect during pause<br />
  25. 25. Garbage First (G1)<br />Targets multi-processor machines and large heaps<br />G1 will be the long-term replacement for the CMS collector<br />Unlike CMS, compacts to battle fragmentation<br />A bucket’s space is fully reclaimed<br />Better throughput<br />Predictable pauses (high probability)<br />Garbage left in buckets with high live ratio<br />May be collected later<br />
  26. 26. Benefits of G1<br />No imbalance of young-tenured generation<br />Generations are only logical <br />Generations are merely sets of buckets <br />More predictable GC pauses<br />Parallelism and concurrency in collections <br />No fragmentation due to compaction<br />Better heap utilization <br />Better GC ergonomics <br />
  27. 27. Young GCs in G1<br />Done using evacuation pauses<br />Stop-The-World parallel collections<br />Evacuates surviving objects between sets of buckets<br />
  28. 28. Old GCs in G1<br />Drops dead buckets<br />Calculates liveness info per bucket<br />Identifies best buckets for subsequent evacuation pauses<br />Collects them piggy-backed on young GCs<br />
  29. 29. GC Ergonomics<br />
  30. 30. GC Ergonomics<br />Ergonomics goal is to provide good performance with little or no tuning<br />Better matches the needs of different application types<br />The HotSpot VM, garbage collector and heap size are automatically chosen<br />Based on OS, RAM and number of CPUs<br />Server vs. client class machine<br />Hints at the characteristics of the application<br />
  31. 31. GC Ergonomics<br />
  32. 32. GC Ergonomics<br />With the parallel collectors, one can specify performance goals<br />In contrast to specifying the heap size<br />Improves performance for large applications<br />Max Pause Time Goal<br />Use -XX:MaxGCPauseMillis=<N><br />Both generations separately<br />Or: Average + Variance<br />No pause time goal by default<br />
  33. 33. GC Ergonomics<br />Throughput Goal<br />Use -XX:GCTimeRatio=<N><br />The ratio of GC time to application time is 1/(1+N)<br />If N=19, GC time goal is 1/(1+19) or 5%<br />Default N is 99, meaning GC time is 1%<br />Minimum Footprint Goal<br />Priority of goals<br />Maximum pause time goal<br />Throughput goal<br />Minimum footprint goal<br />
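The throughput-goal arithmetic on this slide can be checked with a few lines of Java. This is only a worked calculation of the 1/(1+N) formula; the `GcTimeRatio` class and `gcTimeFraction` method are hypothetical names, not part of any JVM API.

```java
/** Worked arithmetic for -XX:GCTimeRatio=N; names are illustrative. */
class GcTimeRatio {
    /** Fraction of total run time the GC is allowed under the 1/(1+N) goal. */
    static double gcTimeFraction(int n) {
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        // N=19 is the slide's example, N=99 is the default it quotes.
        for (int n : new int[] {19, 99}) {
            System.out.println("N=" + n + " -> GC time goal "
                    + (100 * gcTimeFraction(n)) + "% of total time");
        }
    }
}
```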
  34. 34. GC Ergonomics<br />Performance goals may not be met<br />Pause time and throughput goals are somewhat contradictory<br />The pause time goal shrinks the generation<br />The throughput goal grows the generation<br />Statistics are kept by the GC<br />Adaptive to changes in application behavior<br />
  35. 35. GC Tweaking<br />
  36. 36. Heap Size<br />The larger the heap space, the better<br />For both young and old generation<br />Larger space: less frequent GCs, lower GC overhead, objects more likely to become garbage<br />Smaller space: faster GCs (not always! see later)<br />Sometimes max heap size is dictated by available memory and/or max space the JVM can address<br />You have to find a good balance between young and old generation size<br />
  37. 37. Heap Size<br />Maximize the number of objects reclaimed in the young generation<br />Application's memory footprint should not exceed the available physical memory<br />Swapping is bad<br />The above applies to all our GCs<br />
  38. 38. Heap Size<br />-Xmx<size> : max heap size<br />young generation + old generation<br />-Xms<size> : initial heap size<br />young generation + old generation<br />-Xmn<size> : young generation size<br />-XX:PermSize=<size> : permanent generation initial size<br />-XX:MaxPermSize=<size> : permanent generation max size<br />
  39. 39. Heap Size<br />When -Xms != -Xmx, heap growth or shrinking requires a Full GC<br />Set -Xms to desired heap size<br />Set -Xmx even higher “just in case”<br />Even full GC is better than OOM crash<br />Same for -XX:PermSize and -XX:MaxPermSize<br />Same for -XX:NewSize and -XX:MaxNewSize<br />-Xmn combines both<br />
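One way to see what the heap-size flags actually did is to ask the running JVM. The sketch below uses the standard `java.lang.Runtime` methods; the `HeapSizes` class and its `snapshot` helper are names invented for this example. Running it with different `-Xms` and `-Xmx` values (for instance `java -Xms256m -Xmx512m HeapSizes`, where the sizes are arbitrary) should change the reported numbers accordingly.

```java
/** Prints the heap limits the running JVM ended up with; names are illustrative. */
class HeapSizes {
    /** Returns {max heap, currently committed heap, free within committed}, in bytes. */
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] s = snapshot();
        System.out.println("max heap (bounded by -Xmx): " + s[0] / mb + " MB");
        System.out.println("committed heap (starts near -Xms): " + s[1] / mb + " MB");
        System.out.println("free within committed heap: " + s[2] / mb + " MB");
    }
}
```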
  40. 40. Tenuring<br />Measure tenuring with -XX:+PrintTenuringDistribution<br />Avoid tenuring for short or even medium-lived objects!<br />Less promotion into the old generation<br />Less frequent old GCs<br />Promote long-lived objects ASAP<br />Yeah, conflict with previous bullet<br />Better copy more, than promote more<br />-XX:TargetSurvivorRatio=<percent>, e.g., 50<br />How much of the survivor space should be filled<br />Typically leave extra space to deal with “spikes”<br />
  41. 41. Permanent Space<br />Classes aren’t unloaded by default<br />-XX:+CMSClassUnloadingEnabled to enable<br />Classloader should be collected<br />It holds references to classes<br />Each object holds a reference to its classloader<br />
  42. 42. GC Options<br />
  43. 43. GC Statistics Options<br />GC logging has extremely low / non-existent overhead<br />It’s very helpful when diagnosing production issues<br />Enable it<br />In production too!<br />-XX:+PrintGC<br />-XX:+PrintGCDetails<br />-XX:+PrintGCTimeStamps<br />-XX:+PrintTenuringDistribution<br />Shows the tenuring threshold and the ages of objects in the new generation<br />
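Besides the logging flags, basic GC counters can also be read from inside the application through the standard `java.lang.management` API. The sketch below is an addition to the slide's advice, not a replacement for GC logs; the `GcCounters` class and `report` method are hypothetical names, and the collector names printed depend on which collector the JVM selected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reads per-collector counters via JMX; class and method names are illustrative. */
class GcCounters {
    /** One line per collector: name, collections so far, accumulated time in ms. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```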
  44. 44. GC Is Slow? – The Answers<br />Reduces throughput<br />You choose<br />Introduces pauses<br />You choose<br />Unpredictable<br />Not any more<br />Uncontrolled<br />Configurable<br />Performance degradation is proportional to object count<br />Not true<br />Just give me the damn free() and malloc()! I’ll be just fine!<br />Bad idea (see more later)<br />
  45. 45. General Optimizations<br />
  46. 46. HotSpot Optimizations<br />JIT Compilation<br />Compiler Optimizations<br />Generates more performant code than you could write in native<br />Adaptive Optimization<br />Split Time Verification<br />Class Data Sharing<br />
  47. 47. Two Virtual Machines?<br />Client VM<br />Reducing start-up time and memory footprint<br />-client command-line flag<br />Server VM<br />Maximum program execution speed<br />-server command-line flag<br />Auto-detection<br />Server: >1 CPUs & >=2GB of physical memory<br />Win32 – always detected as client<br />Many 64bit OSes don’t have client VMs<br />
  48. 48. Just-In-Time Compilation<br />Everyone knows about JIT!<br />Hot code is compiled to native<br />What is “hot”?<br />Server VM – 10000 invocations<br />Client VM – 1500 invocations<br />Use -XX:CompileThreshold=# to change<br />More invocations – better optimizations<br />Fewer invocations – shorter warmup time<br />
  49. 49. Just-In-Time Compilation<br />The code is being optimized by the compiler<br />Coming soon…<br />
  50. 50. Adaptive Optimization<br />Allows HotSpot to uncompile previously compiled code<br />Much more aggressive, even speculative optimizations may be performed<br />And rolled back if something goes wrong or new data is gathered<br />E.g. classloading might invalidate inlining<br />
  51. 51. Split Time Verification<br />Java suffers from long boot time<br />One of the reasons is bytecode verification<br />Valid flow control<br />Type safety<br />Visibility<br />To ease the load on the weak KVM, J2ME started performing part of the verification at compile time<br />It’s good, so now it’s in Java SE 6 too<br />
  52. 52. Class Data Sharing<br />Helps improve startup time<br />During JDK installation, part of rt.jar is preloaded into a shared memory file, which is attached at runtime<br />No need to reload and reverify those classes every time<br />
  53. 53. Compiler Optimizations<br />
  54. 54. Two Types of Optimizations<br />Java has two compilers:<br />javac bytecode compiler<br />HotSpot VM JIT compiler<br />Both implement similar optimizations<br />Bytecode compiler is limited<br />Dynamic linking<br />Can apply only static optimizations<br />
  55. 55. Warning<br />Caution! Don’t try this at home yourself!<br />The source code you are about to see is not real!<br />It’s pseudo assembly code<br />Don’t write such code!<br />Source code should be readable and object-oriented<br />Bytecode will become performant automagically<br />
  56. 56. Optimization Rules<br />Make the common case fast<br />Don't worry about uncommon/infrequent case<br />Defer optimization decisions<br />Until you have data<br />Revisit decisions if data warrants<br />
  57. 57. Null check Elimination<br />Java is a null-safe language<br />Pointer can’t point to meaningless portion of memory<br />Null checks are added by the compiler, NullPointerException is thrown<br />JVM’s profiler can eliminate those checks<br />
  58. 58. Example – Original Source<br />
  59. 59. Example – Null Check Elimination<br />
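A hypothetical sketch of null check elimination (not the slide's original example, which is not part of this transcript). In the spirit of the deck's warning, the second method is a source-level picture of the effect, not code you should write: once one check proves `p` is non-null, the later implicit checks on the same reference are redundant. `NullCheckDemo` and `Point` are invented names.

```java
/** Source-level illustration of redundant null checks; names are illustrative. */
class NullCheckDemo {
    static final class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    /** As written: conceptually every field read carries an implicit null check. */
    static int sumAsWritten(Point p) {
        return p.x + p.y;   // check p, read x; check p again, read y
    }

    /** Picture of the optimized form: a single check covers both reads. */
    static int sumAfterElimination(Point p) {
        if (p == null) throw new NullPointerException();
        return p.x + p.y;   // p is known non-null here, no further checks needed
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println(sumAsWritten(p) + " " + sumAfterElimination(p)); // prints 7 7
    }
}
```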
  60. 60. Inlining<br />Love Encapsulation?<br />Getters and setters<br />Love clean and simple code?<br />Small methods<br />Use static code analysis?<br />Small methods<br />No penalty for using those!<br />JIT brings the implementation of these methods into a containing method<br />This optimization is known as “Inlining”<br />
  61. 61. Inlining<br />Not just about eliminating call overhead<br />Provides optimizer with bigger blocks<br />Enables other optimizations<br />hoisting, dead code elimination, code motion, strength reduction<br />
  62. 62. Inlining<br />But wait, all public non-final methods in Java are virtual!<br />HotSpot examines the exact case in place<br />In most cases there is only one implementation, which can be inlined<br />But wait, more implementations may be loaded later!<br />In such case HotSpot undoes the inlining<br />Speculative inlining<br />By default limited to 35 bytes of bytecode<br />Use -XX:MaxInlineSize=# to change<br />
  63. 63. Example - Inlining<br />
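A hypothetical sketch of what inlining does to getter and setter calls (again, not the slide's original example). The second method shows, at source level, roughly what the JIT may end up executing once the accessor bodies are pulled into the caller. `InliningDemo` and `Counter` are made-up names; you would keep writing the first form.

```java
/** Source-level picture of accessor inlining; names are illustrative. */
class InliningDemo {
    static final class Counter {
        private int value;
        int getValue() { return value; }
        void setValue(int v) { value = v; }
    }

    /** Readable, encapsulated code: three small method calls. */
    static int incrementAsWritten(Counter c) {
        c.setValue(c.getValue() + 1);
        return c.getValue();
    }

    /** Roughly what remains after the accessor bodies are inlined. */
    static int incrementAfterInlining(Counter c) {
        c.value = c.value + 1;   // direct field access, no call overhead
        return c.value;
    }

    public static void main(String[] args) {
        System.out.println(incrementAsWritten(new Counter()));     // prints 1
        System.out.println(incrementAfterInlining(new Counter())); // prints 1
    }
}
```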
  64. 64. Example – Source Code Revision<br />
  65. 65. Example – Source Code Revision<br />
  66. 66. Code Hoisting<br />Hoist = to raise or lift<br />Size optimization<br />Eliminate duplicate code in method bodies by hoisting expressions or statements<br />Duplicate bytecode, not necessarily source code <br />
  67. 67. Example – Code Hoisting<br />
  68. 68. Bounds Check Elimination<br />Java promises automatic boundary checks for arrays<br />Exception is thrown<br />If the programmer already checks the array bounds in the code, the automatic check can be eliminated<br />
  69. 69. Example – Bounds Check Elimination<br />
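A hypothetical sketch of a loop where the bounds check is provably redundant (not the slide's original example). The loop condition already guarantees `0 <= i < data.length`, so a JIT can, in principle, drop the per-access check. Whether a particular JVM actually does so here is an assumption; `BoundsCheckDemo` is an invented name.

```java
/** A loop whose index is provably in range; names are illustrative. */
class BoundsCheckDemo {
    static long sum(int[] data) {
        long total = 0;
        // i starts at 0 and stays below data.length, so data[i] can never
        // be out of bounds; the implicit range check adds no information.
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3})); // prints 6
    }
}
```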
  70. 70. Sub-Expression Elimination<br />Avoids redundant memory access<br />
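A hypothetical source-level sketch of common sub-expression elimination. The expression `b.width * scale` appears twice in the first method; the second method shows the effect of computing it once and reusing the result, which saves a field read and a multiplication. `CseDemo` and `Box` are invented names.

```java
/** Source-level picture of common sub-expression elimination; names are illustrative. */
class CseDemo {
    static final class Box {
        int width, height;
        Box(int width, int height) { this.width = width; this.height = height; }
    }

    /** As written: b.width is read and scaled twice. */
    static int asWritten(Box b, int scale) {
        int area = (b.width * scale) * b.height;
        int doubled = (b.width * scale) * 2;
        return area + doubled;
    }

    /** After elimination: the shared sub-expression is evaluated once. */
    static int afterElimination(Box b, int scale) {
        int scaledWidth = b.width * scale;   // single field read, single multiply
        return scaledWidth * b.height + scaledWidth * 2;
    }

    public static void main(String[] args) {
        Box b = new Box(3, 4);
        System.out.println(asWritten(b, 2) + " " + afterElimination(b, 2)); // prints 36 36
    }
}
```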
  71. 71. Loop Unrolling<br />Some loops shouldn’t be loops<br />In terms of performance, not code readability<br />Those can be unrolled into a set of statements<br />If the boundaries are dynamic, partial unroll will occur<br />
  72. 72. Example – Loop Unrolling<br />
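A hypothetical sketch of partial loop unrolling with a dynamic bound (not the slide's original example). The second method processes four elements per iteration and handles the leftovers in a short tail loop, which is roughly the shape a JIT can produce when the trip count is not known in advance. The unroll factor of 4 is an arbitrary choice for the illustration, and `UnrollDemo` is an invented name.

```java
/** Source-level picture of partial loop unrolling; names are illustrative. */
class UnrollDemo {
    /** As written: one element, one loop test and one increment per iteration. */
    static int sumAsWritten(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) s += a[i];
        return s;
    }

    /** Unrolled by 4: fewer loop tests and branches for the same work. */
    static int sumUnrolledBy4(int[] a) {
        int s = 0;
        int i = 0;
        int limit = a.length - (a.length % 4);   // largest multiple of 4 within bounds
        for (; i < limit; i += 4) {
            s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }
        for (; i < a.length; i++) s += a[i];     // tail: at most 3 leftover elements
        return s;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(sumAsWritten(a) + " " + sumUnrolledBy4(a)); // prints 55 55
    }
}
```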
  73. 73. Example – Inlining<br />
  74. 74. Escape Analysis<br />Escape analysis is not an optimization<br />It is a check that an object does not escape local scope<br />E.g. created in private method, assigned to local variable and not returned<br />Escape analysis opens up possibilities for lots of optimizations<br />
  75. 75. Scalar Replacement<br />Remember the rule “new == always new object”?<br />False!<br />JVM can optimize away allocations<br />Fields are hoisted into registers<br />Object becomes unneeded<br />But object creation is cheap!<br />Yep, but GC is not so cheap…<br />
  76. 76. Example – Source Code Revision<br />
  77. 77. Example – Scalar Replacement<br />
  78. 78. Example – Scalar Replacement<br />
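A hypothetical sketch of scalar replacement (not the slide's original example). The `Vec` object in the first method never escapes the method, so escape analysis can prove nobody else sees it; the second method is a source-level picture of the result, with the fields living in plain locals and no allocation at all. `ScalarReplacementDemo` and `Vec` are invented names.

```java
/** Source-level picture of scalar replacement; names are illustrative. */
class ScalarReplacementDemo {
    static final class Vec {
        final int x, y;
        Vec(int x, int y) { this.x = x; this.y = y; }
    }

    /** As written: allocates a Vec that is never stored, passed on or returned. */
    static int lengthSquaredAsWritten(int x, int y) {
        Vec v = new Vec(x, y);
        return v.x * v.x + v.y * v.y;
    }

    /** After scalar replacement: the fields become locals, the object disappears. */
    static int lengthSquaredAfterReplacement(int x, int y) {
        int vx = x;
        int vy = y;
        return vx * vx + vy * vy;
    }

    public static void main(String[] args) {
        System.out.println(lengthSquaredAsWritten(3, 4));        // prints 25
        System.out.println(lengthSquaredAfterReplacement(3, 4)); // prints 25
    }
}
```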
  79. 79. Lock Coarsening<br />HotSpot merges adjacent synchronized blocks using the same lock<br />The compiler is allowed to move statements into merged coarse blocks<br />Tradeoff between performance and responsiveness<br />Reduces instruction count<br />But locks are held longer<br />
  80. 80. Example – Source Code Revision<br />
  81. 81. Example – Lock Coarsening<br />
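A hypothetical sketch of lock coarsening (not the slide's original example). The first method takes and releases the same lock twice in a row; the second shows the merged form, with one acquire and one release but a longer critical section. `LockCoarseningDemo` is an invented name.

```java
/** Source-level picture of lock coarsening; names are illustrative. */
class LockCoarseningDemo {
    private final Object lock = new Object();
    private int a, b;

    /** As written: two adjacent synchronized blocks on the same lock. */
    void updateAsWritten() {
        synchronized (lock) { a++; }
        synchronized (lock) { b++; }
    }

    /** After coarsening: one acquire/release pair, but the lock is held longer. */
    void updateAfterCoarsening() {
        synchronized (lock) {
            a++;
            b++;
        }
    }

    int total() {
        synchronized (lock) { return a + b; }
    }

    public static void main(String[] args) {
        LockCoarseningDemo d = new LockCoarseningDemo();
        d.updateAsWritten();
        d.updateAfterCoarsening();
        System.out.println(d.total()); // prints 4
    }
}
```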
  82. 82. Lock Elision<br />A thread enters a lock that no other thread will synchronize on<br />Synchronization has no effect<br />Can be deduced using escape analysis<br />Such locks can be elided<br />Elides 4 StringBuffer synchronized calls:<br />
  83. 83. Example - Lock Elision<br />
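A hypothetical reconstruction of the kind of code the previous slide refers to (the slide's own example is not part of this transcript). The `StringBuffer` is local and never escapes, so no other thread can ever contend for its monitor; each of the four synchronized `StringBuffer` calls is then a candidate for lock elision. `LockElisionDemo` and `fullName` are invented names.

```java
/** Four synchronized StringBuffer calls on a thread-local object; names are illustrative. */
class LockElisionDemo {
    static String fullName(String first, String last) {
        StringBuffer sb = new StringBuffer(); // local, never leaves this method
        sb.append(first);                     // synchronized call 1
        sb.append(' ');                       // synchronized call 2
        sb.append(last);                      // synchronized call 3
        return sb.toString();                 // synchronized call 4
    }

    public static void main(String[] args) {
        System.out.println(fullName("Ada", "Lovelace")); // prints Ada Lovelace
    }
}
```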
  84. 84. Constant Folding<br />Trivial optimization<br />How many constants are there?<br />More than you think!<br />Inlining generates constants<br />Unrolling generates constants<br />Escape analysis generates constants<br />JIT determines what is constant at runtime<br />Whatever doesn’t change<br />
  85. 85. Constant Folding<br />Literals folding<br />Before: int foo = 9*10;<br />After: int foo = 90;<br />String folding or StringBuilder-ing<br />Before: String foo = "hi Joe " + (9*10);<br />After: String foo = new StringBuilder().append("hi Joe ").append(9 * 10).toString();<br />After: String foo = "hi Joe 90";<br />
  86. 86. Example – Constant Folding<br />
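The string case from the previous slide can be observed directly, at least for the part that `javac` handles. Because `"hi Joe " + (9 * 10)` is a compile-time constant expression, the Java Language Specification requires it to be folded and interned, so it is the very same object as the literal `"hi Joe 90"`. The sketch below checks that with an identity comparison; `ConstantFoldingDemo` is an invented name, and `==` on strings is used here on purpose, not as a recommendation.

```java
/** Observes compile-time constant folding of a string expression; names are illustrative. */
class ConstantFoldingDemo {
    static boolean foldedAtCompileTime() {
        String foo = "hi Joe " + (9 * 10);  // constant expression, folded by javac
        return foo == "hi Joe 90";          // identity check: same interned object
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime()); // prints true
    }
}
```

Constants that only become known at runtime (after inlining or unrolling, as the slides describe) are folded by the JIT instead and cannot be observed this way.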
  87. 87. Dead Code Elimination<br />Dead code - code that has no effect on the outcome of the program execution<br />public static void main(String[] args) {<br />long start = System.nanoTime();<br />int result = 0;<br />for (int i = 0; i < 10 * 1000 * 1000; i++) {<br /> result += Math.sqrt(i);<br />}<br />long duration = (System.nanoTime() - start) / 1000000;<br />System.out.format("Test duration: %d (ms) %n", duration);<br />}<br />
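In the snippet on this slide, `result` is never read, so the whole loop is dead code and the JIT is free to remove it; the benchmark may then measure nothing. A common remedy, sketched below under the assumption that printing the value is enough to keep it live, is to make the result observable. `DeadCodeDemo` and `sumOfRoots` are invented names, and this is still a naive timing loop, not a robust benchmark.

```java
/** Keeps the benchmarked computation live by using its result; names are illustrative. */
class DeadCodeDemo {
    /** Same loop as on the slide, but the accumulated value is returned. */
    static int sumOfRoots(int n) {
        int result = 0;
        for (int i = 0; i < n; i++) {
            result += Math.sqrt(i);   // compound assignment narrows back to int
        }
        return result;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = sumOfRoots(10 * 1000 * 1000);
        long duration = (System.nanoTime() - start) / 1000000;
        // Printing the result means the loop now affects the program's outcome.
        System.out.format("Test duration: %d (ms), result: %d%n", duration, result);
    }
}
```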
  88. 88. OSR - On Stack Replacement<br />Normally code is switched from interpretation to native in heap context<br />Before entering method<br />OSR - switch from interpretation to compiled code in local context<br />In the middle of a method call<br />JVM tracks code block execution count<br />Fewer optimizations<br />May prevent bounds check elimination and loop unrolling<br />
  89. 89. Out-Of-Order Execution<br />
  90. 90. Out-Of-Order Execution<br />
  91. 91. Programming & Tuning Tips<br />How Can I Help?<br />Just write good quality Java code<br />Object Orientation<br />Polymorphism<br />Abstraction<br />Encapsulation<br />DRY<br />KISS<br />Let the HotSpot optimize<br />
  92. 92. How Can I Help?<br />final keyword<br />For fields:<br />Allows caching<br />Allows lock coarsening<br />For methods:<br />Simplifies Inlining decisions<br />Immutable objects die younger<br />
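A minimal sketch of the style this slide recommends: a small immutable class with `final` fields, where a "change" creates a new short-lived object instead of mutating a long-lived one. `ImmutablePoint` and `withX` are invented names; how much a given JVM benefits from `final` here is not something this sketch demonstrates.

```java
/** A small immutable value class with final fields; names are illustrative. */
final class ImmutablePoint {
    private final int x;
    private final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    /** Returns a modified copy; the original instance is left untouched. */
    ImmutablePoint withX(int newX) {
        return new ImmutablePoint(newX, y);
    }

    public static void main(String[] args) {
        ImmutablePoint p = new ImmutablePoint(1, 2);
        ImmutablePoint q = p.withX(5);
        System.out.println(p.x() + " " + q.x() + " " + q.y()); // prints 1 5 2
    }
}
```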
  93. 93. JVM tuning tips<br />Reminder: -XX options are non-standard<br />Added for HotSpot development purposes<br />Mostly tested on Solaris 10<br />Platform dependent<br />Some options may contradict each other<br />Know and experiment with these options<br />
  94. 94. Monitoring & Troubleshooting<br />
  95. 95. References<br />The HotSpot Home Page<br />Java HotSpot VM Options<br />Dynamic compilation and performance measurement<br />Urban performance legends, revisited<br />Synchronization optimizations in Mustang<br />Robust Java benchmarking<br />Garbage Collection Tuning<br />
  96. 96. References<br />JavaOne 2009 Sessions:<br />Garbage Collection Tuning in the Java HotSpot™ Virtual Machine<br />Under the Hood: Inside a High-Performance JVM™ Machine<br />Practical Lessons in Memory Analysis<br />Debugging Your Production JVM™ Machine<br />Inside Out: A Modern Virtual Machine Revealed<br />
  97. 97. Thank you for your attention <br />Thanks to Ori Dar!<br />