JVM Magic

Virtual machines don't have to be slow; they don't even have to be slower than running native code.
All you have to do is write your code, sit back and let the JVM do its magic!
Learn about various JVM runtime optimizations and why it is considered one of the best VMs in the world.

    JVM Magic Presentation Transcript

    • The JVM Magic
      Baruch Sadogursky
      Consultant & Architect, AlphaCSP
    • Agenda
      Introduction
      GC Magic 101
      General Optimizations
      Compiler Optimizations
      What can I do?
      Programming tips
      JVM configuration flags
    • Introduction
    • Introduction
      In the past, the JVM was considered by many to be Java’s Achilles’ heel
      Interpreter?!
      The JVM team improved performance by 300 to 3000 times
      JDK 1.6 compared to JDK 1.0
      Java is measured to run at 50% to 100+% of the speed of C and C++
      Jake2 vs Quake2
      How can it be?
    • Java Virtual Machines Zoo
      CEE-J
      Excelsior JET
      Hewlett-Packard
      J9 (IBM)
      Jbed
      Jblend
      Jrockit
      MRJ
      MicroJvm
      MS JVM
      OJVM
      PERC
      Blackdown Java
      CVM
      Gemstone
      Golden Code Development
      Intent
      Novell
      NSIcom CrE-ME
      ChaiVM
      HotSpot
      AegisVM
      Apache Harmony
      CACAO
      Dalvik
      IcedTea
      IKVM.NET
      Jamiga
      JamVM
      Jaos
      JC
      Jelatine JVM
      JESSICA
      Jikes RVM
      Jnode
      JOP
      Juice
      Jupiter
      JX
      Kaffe
      leJOS
      Mika VM
      Mysaifu
      NanoVM
      SableVM
      Squawk virtual machine
      SuperWaba
      TinyVM
      VMkit of Low Level Virtual Machine
      Wonka VM
      Xam
    • HotSpot Virtual Machine
      Developed by Longview Technologies (acquired by Sun in 1997), first released in 1999
      Contains:
      Class loader
      Bytecode interpreter
      2 Virtual machines
      7 Garbage collectors
      2 Compilers
      Runtime libraries
    • HotSpot Virtual Machine
      Configured by hundreds of -XX flags
      Reminder
      -X options are non-standard
      -XX options have specific system requirements for correct operation
      Both are subject to change without notice
    • GC Magic 101
    • GC Is Slow?
      GC has a bad performance reputation
      Reduces throughput
      Introduces pauses
      Unpredictable
      Uncontrolled
      Performance degradation is proportional to object count
      Just give me the damn free() and malloc()! I’ll be just fine!
      Is it so?
    • Generational Collectors
      Weak generational hypothesis
      Most objects die young (AKA Infant mortality)
      Few old to young references
      Generations: regions holding objects of different ages
      GC is done separately once a generation fills
      Different GC algorithms
      The young (nursery) generation
      Collected by “Minor garbage collection”
      The old (tenured) generation
      Collected by “Major garbage collection”
    • GC Magic 101
      vs
      Young is better than Tenured
      Let your objects die in young generation
      When possible and makes sense
    • GC Magic 101
      vs
      Swapping is bad
      Application's memory footprint should not exceed the available physical memory
    • GC Magic 101
      vs
      Choose:
      Throughput (client)
      Low-pause (server)
    • GC Magic 101
      http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
    • Tracing Collector Algorithms
      Mark-Sweep collector
      Mark phase marks each reachable object
      Sweep phase “sweeps” the heap
      Unmarked objects are reclaimed as garbage
      Copying collector
      Heap is divided into two equal spaces
      When active space fills, live objects are copied to the unused space
      Only live objects are examined
      The roles of the spaces are then flipped
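The mark and sweep phases can be illustrated with a toy model. This is only a sketch: `ToyMarkSweep` and `Node` are hypothetical names, and a Java list stands in for the heap, whereas a real collector walks raw memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model only: a real collector works on raw heap memory, not Java lists.
class ToyMarkSweep {
    // Hypothetical heap object with outgoing references and a mark bit.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    // Mark phase: flag every node reachable from the roots.
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    // Sweep phase: remove unmarked nodes, reset marks on survivors,
    // and return how many nodes were reclaimed.
    static int sweep(List<Node> heap) {
        int reclaimed = 0;
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                it.remove();
                reclaimed++;
            }
        }
        return reclaimed;
    }

    // Four nodes: a -> b is reachable from the root, c <-> d is an unreachable cycle.
    static int demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        List<Node> heap = new ArrayList<>(List.of(a, b, c, d));
        mark(List.of(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("reclaimed: " + demo()); // prints "reclaimed: 2"
    }
}
```

Note that the unreachable cycle is reclaimed even though its nodes still reference each other, which is what distinguishes tracing from reference counting.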
    • Compaction
      Compaction: The collector moves all live objects to the bottom of the heap
      Remaining memory is reclaimed
      Reduces the cost of object allocation
      No potential fragmentation
      The drawback is slower completion of GC
    • The Young generation
      Consists of Eden + two survivor spaces
      Objects are initially allocated in Eden
      All HotSpot young collectors are stop-the-world copying collectors
      Done in parallel by the parallel garbage collectors
      Collections are relatively fast and proportional to number of live objects
    • The Young generation
    • The Tenured generation
      Objects surviving several GC cycles are promoted to the tenured generation
      Use -XX:MaxTenuringThreshold=# to change
      The collector algorithms used are variations of Mark-Sweep
      More space efficient
      Characteristics
      Lower garbage density
      Bigger heap space
      Fewer GC cycles
    • Generational Collectors
    • Garbage Collectors
    • GC Flags
    • When to Use
    • Garbage First (G1)
      New in JDK 1.6 u14 (May 29th)
      All memory is divided into 1MB buckets
      Calculates objects liveness in buckets
      Drops “dead” buckets
      If a bucket is not total garbage, it’s not dropped
      Collects the most garbage buckets first
      Pauses only on “mark”
      No sweep
      User can provide pause time goals
      Actual seconds or Percentage of runtime
      G1 records bucket collection time and can estimate how many buckets to collect during pause
    • Garbage First (G1)
      Targets multi-processor machines and large heaps
      G1 will be the long-term replacement for the CMS collector
      Unlike CMS, compacts to battle fragmentation
      A bucket’s space is fully reclaimed
      Better throughput
      Predictable pauses (high probability)
      Garbage left in buckets with high live ratio
      May be collected later
    • Benefits of G1
      No imbalance of young-tenured generation
      Generations are only logical
      Generations are merely sets of buckets
      More predictable GC pauses
      Parallelism and concurrency in collections
      No fragmentation due to compaction
      Better heap utilization
      Better GC ergonomics
    • Young GCs in G1
      Done using evacuation pauses
      Stop-The-World parallel collections
      Evacuates surviving objects between sets of buckets
    • Old GCs in G1
      Drops dead buckets
      Calculates liveness info per bucket
      Identifies the best buckets for subsequent evacuation pauses
      Collects them piggybacked on young GCs
    • GC Ergonomics
    • GC Ergonomics
      Ergonomics goal is to provide good performance with little or no tuning
      Better matches the needs of different application types
      The HotSpot VM, garbage collector and heap size are chosen automatically
      Based on the OS, RAM and number of CPUs
      Server vs. client class machine
      Hints at the characteristics of the application
    • GC Ergonomics
    • GC Ergonomics
      With the parallel collectors, one can specify performance goals
      In contrast to specifying the heap size
      Improves performance for large applications
      Max Pause Time Goal
      Use -XX:MaxGCPauseMillis=<N>
      Applied to both generations separately
      Or: Average + Variance
      No pause time goal by default
    • GC Ergonomics
      Throughput Goal
      Use -XX:GCTimeRatio=<N>
      The goal for GC time as a fraction of total time is 1/(1+N)
      If N=19, GC time goal is 1/(1+19) or 5%
      Default N is 99, meaning GC time is 1%
      Minimum Footprint Goal
      Priority of goals
      Maximum pause time goal
      Throughput goal
      Minimum footprint goal
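The 1/(1+N) arithmetic for -XX:GCTimeRatio can be checked with a few lines. The class and method names below are made up for illustration; only the formula comes from the slide.

```java
// Worked arithmetic for the -XX:GCTimeRatio=<N> goal described above.
class GcTimeRatioGoal {
    // Fraction of total run time the collector is allowed to use: 1 / (1 + N).
    static double gcTimeFraction(int n) {
        return 1.0 / (1.0 + n);
    }

    public static void main(String[] args) {
        // N=19 gives a 5% GC time goal; the default N=99 gives 1%.
        System.out.println("N=19: " + 100 * gcTimeFraction(19) + " percent");
        System.out.println("N=99: " + 100 * gcTimeFraction(99) + " percent");
    }
}
```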
    • GC Ergonomics
      Performance goals may not be met
      Pause time and throughput goals are somewhat contradictory
      The pause time goal shrinks the generation
      The throughput goal grows the generation
      Statistics are kept by the GC
      Adaptive to changes in application behavior
    • GC Tweaking
    • Heap Size
      The larger the heap space, the better
      For both young and old generation
      Larger space: less frequent GCs, lower GC overhead, objects more likely to become garbage
      Smaller space: faster GCs (not always! see later)
      Sometimes max heap size is dictated by available memory and/or max space the JVM can address
      You have to find a good balance between young and old generation size
    • Heap Size
      Maximize the number of objects reclaimed in the young generation
      Application's memory footprint should not exceed the available physical memory
      Swapping is bad
      The above apply to all our GCs
    • Heap Size
      -Xmx<size> : max heap size
      young generation + old generation
      -Xms<size> : initial heap size
      young generation + old generation
      -Xmn<size> : young generation size
      -XX:PermSize=<size> : permanent generation initial size
      -XX:MaxPermSize=<size> : permanent generation max size
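To see what the heap flags actually did, the running JVM can be asked for its own view of the heap. This is a minimal sketch using the standard `java.lang.Runtime` API; the class name `HeapSizeReport` is an assumption, and the value reported by `maxMemory()` may differ slightly from the exact -Xmx setting.

```java
// Prints the heap limits as the running JVM reports them.
class HeapSizeReport {
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to (influenced by -Xmx).
        System.out.println("max heap (MiB):       " + toMiB(rt.maxMemory()));
        // Heap currently reserved by the JVM (starts near -Xms).
        System.out.println("committed heap (MiB): " + toMiB(rt.totalMemory()));
        // Unused part of the committed heap.
        System.out.println("free heap (MiB):      " + toMiB(rt.freeMemory()));
    }
}
```

Running it as, for example, `java -Xms64m -Xmx256m HeapSizeReport` and then changing the flags makes the effect of each option visible.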
    • Heap Size
      When -Xms != -Xmx, heap growth or shrinking requires a Full GC
      Set -Xms to desired heap size
      Set -Xmx even higher “just in case”
      Even a Full GC is better than an OOM crash
      Same for -XX:PermSize and -XX:MaxPermSize
      Same for -XX:NewSize and -XX:MaxNewSize
      -Xmn combines both
    • Tenuring
      Measure tenuring with -XX:+PrintTenuringDistribution
      Avoid tenuring for short or even medium-lived objects!
      Less promotion into the old generation
      Less frequent old GCs
      Promote long-lived objects ASAP
      Yes, this conflicts with the previous bullet
      Better to copy more than to promote more
      -XX:TargetSurvivorRatio=<percent>, e.g., 50
      How much of the survivor space should be filled
      Typically leave extra space to deal with “spikes”
    • Permanent Space
      Classes aren’t unloaded by default
      -XX:+CMSClassUnloadingEnabled to enable
      A class can be unloaded only once its classloader is collected
      The classloader holds references to its classes
      Each object holds a reference to its class, and through it to the classloader
    • GC Options
    • GC Statistics Options
      GC logging has extremely low / non-existent overhead
      It’s very helpful when diagnosing production issues
      Enable it
      In production too!
      -XX:+PrintGC
      -XX:+PrintGCDetails
      -XX:+PrintGCTimeStamps
      -XX:+PrintTenuringDistribution
      Shows the tenuring threshold and the ages of objects in the new generation
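Besides the logging flags, coarse GC statistics can also be read from inside the application. The sketch below uses the standard `java.lang.management` API; it is a complement to the flags above, not a replacement for them, and the class name `GcStats` is an assumption.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Reads per-collector counters exposed by the running JVM.
class GcStats {
    // One line per collector: its name, collection count and accumulated time.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Collector names depend on the JVM and on the GC flags in use.
        describeCollectors().forEach(System.out::println);
    }
}
```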
    • GC Is Slow? – The Answers
      Reduces throughput
      You choose
      Introduces pauses
      You choose
      Unpredictable
      Not any more
      Uncontrolled
      Configurable
      Performance degradation is proportional to object count
      Not true
      Just give me the damn free() and malloc()! I’ll be just fine!
      Bad idea (see more later)
    • General Optimizations
    • HotSpot Optimizations
      JIT Compilation
      Compiler Optimizations
      Generates more performant code than you could write in native
      Adaptive Optimization
      Split Time Verification
      Class Data Sharing
    • Two Virtual Machines?
      Client VM
      Reducing start-up time and memory footprint
      -client CL flag
      Server VM
      Maximum program execution speed
      -server CL flag
      Auto-detection
      Server: >1 CPUs & >=2GB of physical memory
      Win32 – always detected as client
      Many 64bit OSes don’t have client VMs
    • Just-In-Time Compilation
      Everyone knows about JIT!
      Hot code is compiled to native
      What is “hot”?
      Server VM – 10000 invocations
      Client VM – 1500 invocations
      Use -XX:CompileThreshold=# to change
      More invocations – better optimizations
      Fewer invocations – shorter warmup time
    • Just-In-Time Compilation
      The code is being optimized by the compiler
      Coming soon…
    • Adaptive Optimization
      Allows HotSpot to uncompile previously compiled code
      Much more aggressive, even speculative optimizations may be performed
      And rolled back if something goes wrong or new data is gathered
      E.g. classloading might invalidate inlining
    • Split Time Verification
      Java suffers from long boot time
      One of the reasons is bytecode verification
      Valid flow control
      Type safety
      Visibility
      To ease the load on the weak KVM, J2ME started performing part of the verification at compile time
      It works well, so now it’s in Java SE 6 too
    • Class Data Sharing
      Helps improve startup time
      During JDK installation, part of rt.jar is preloaded into a shared memory file, which is attached at runtime
      No need to reload and reverify those classes every time
    • Compiler Optimizations
    • Two Types of Optimizations
      Java has two compilers:
      javac bytecode compiler
      HotSpot VM JIT compiler
      Both implement similar optimizations
      Bytecode compiler is limited
      Dynamic linking
      Can apply only static optimizations
    • Warning
      Caution! Don’t try this at home yourself!
      The source code you are about to see is not real!
      It’s pseudo assembly code
      Don’t write such code!
      Source code should be readable and object-oriented
      Bytecode will become performant automagically
    • Optimization Rules
      Make the common case fast
      Don't worry about uncommon/infrequent case
      Defer optimization decisions
      Until you have data
      Revisit decisions if data warrants
    • Null check Elimination
      Java is a null-safe language
      A pointer can’t point to a meaningless portion of memory
      Null checks are added by the compiler; NullPointerException is thrown when one fails
      The JVM’s profiler can eliminate those checks
    • Example – Original Source
    • Example – Null Check Elimination
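The slide's original code is not preserved in this transcript. As a stand-in, here is a hand-written sketch of the idea, with the implicit null checks spelled out as explicit source code. The names are hypothetical, and the JIT works on compiled code rather than on source like this.

```java
// Conceptual illustration only: implicit null checks written out by hand.
class NullCheckSketch {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Before: every field access conceptually carries its own null check.
    static int sumChecked(Point p) {
        if (p == null) throw new NullPointerException(); // check for p.x
        int x = p.x;
        if (p == null) throw new NullPointerException(); // redundant: p is already known non-null
        int y = p.y;
        return x + y;
    }

    // After: one check is enough to cover both accesses.
    static int sumOptimized(Point p) {
        if (p == null) throw new NullPointerException();
        return p.x + p.y;
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println(sumChecked(p) + " " + sumOptimized(p)); // prints "7 7"
    }
}
```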
    • Inlining
      Love Encapsulation?
      Getters and setters
      Love clean and simple code?
      Small methods
      Use static code analysis?
      Small methods
      No penalty for using those!
      JIT brings the implementation of these methods into a containing method
      This optimization is known as “Inlining”
    • Inlining
      Not just about eliminating call overhead
      Provides optimizer with bigger blocks
      Enables other optimizations
      hoisting, dead code elimination, code motion, strength reduction
    • Inlining
      But wait, all non-private, non-final instance methods in Java are virtual!
      HotSpot examines the actual call site
      In most cases there is only one implementation, which can be inlined
      But wait, more implementations may be loaded later!
      In such case HotSpot undoes the inlining
      Speculative inlining
      By default limited to 35 bytes of bytecode
      Use -XX:MaxInlineSize=# to change
    • Example - Inlining
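The slide's code is not preserved here, so the following is a hand-written approximation of what inlining small getters and setters amounts to. `Counter` and both methods are hypothetical; the JIT performs this on compiled code, and nobody should write the second form by hand.

```java
// Conceptual illustration only: the effect of inlining getters and setters.
class InliningSketch {
    static final class Counter {
        private int value;
        int getValue() { return value; }
        void setValue(int v) { value = v; }
    }

    // As written: clean, encapsulated, four method calls plus one more to read the result.
    static int incrementTwice(Counter c) {
        c.setValue(c.getValue() + 1);
        c.setValue(c.getValue() + 1);
        return c.getValue();
    }

    // Roughly what the code behaves like after inlining: direct field access, no calls.
    static int incrementTwiceInlined(Counter c) {
        c.value = c.value + 1;
        c.value = c.value + 1;
        return c.value;
    }

    public static void main(String[] args) {
        System.out.println(incrementTwice(new Counter()));        // prints 2
        System.out.println(incrementTwiceInlined(new Counter())); // prints 2
    }
}
```

Once the calls are gone, the optimizer sees one larger block and could, for instance, combine the two increments.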
    • Example – Source Code Revision
    • Example – Source Code Revision
    • Code Hoisting
      Hoist = to raise or lift
      Size optimization
      Eliminate duplicate code in method bodies by hoisting expressions or statements
      Duplicate bytecode, not necessarily source code
    • Example – Code Hoisting
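With the slide's code missing, a small hand-written sketch can stand in: an expression duplicated in both branches is hoisted above the branch. The method names are hypothetical, and the real optimization works on bytecode, as noted above.

```java
// Conceptual illustration only: hoisting a duplicated expression out of two branches.
class HoistingSketch {
    // As written: x * x appears in both branches.
    static int squareAsWritten(int x, boolean negate) {
        int result;
        if (negate) {
            int squared = x * x;
            result = -squared;
        } else {
            int squared = x * x;
            result = squared;
        }
        return result;
    }

    // After hoisting: the shared expression is computed once, before the branch.
    static int squareHoisted(int x, boolean negate) {
        int squared = x * x;
        return negate ? -squared : squared;
    }

    public static void main(String[] args) {
        System.out.println(squareAsWritten(5, true) + " " + squareHoisted(5, true)); // prints "-25 -25"
    }
}
```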
    • Bounds Check Elimination
      Java promises automatic bounds checks for arrays
      An exception is thrown when an index is out of bounds
      If the programmer already checks the array bounds in code, the automatic check can be eliminated
    • Example – Bounds Check Elimination
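As the original slide code is lost, here is a hand-written sketch of the idea. The first method spells out the implicit per-access check; the second shows why it is redundant when the loop condition already proves the index is in range. All names are made up for illustration.

```java
// Conceptual illustration only: the implicit array bounds check written out by hand.
class BoundsCheckSketch {
    // Before: each a[i] conceptually carries a range check.
    static long sumChecked(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            if (i < 0 || i >= a.length) throw new ArrayIndexOutOfBoundsException(i); // implicit check
            sum += a[i];
        }
        return sum;
    }

    // After: the loop condition already guarantees 0 <= i < a.length,
    // so the per-access check can be dropped.
    static long sumOptimized(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(sumChecked(data) + " " + sumOptimized(data)); // prints "10 10"
    }
}
```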
    • Sub-Expression Elimination
      Avoids redundant memory access
    • Loop Unrolling
      Some loops shouldn’t be loops
      In terms of performance, not code readability
      Those can be unrolled into a set of statements
      If the boundaries are dynamic, a partial unroll will occur
    • Example – Loop Unrolling
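The slide's code is not available, so this is a hand-written sketch of a partial unroll for a loop with a dynamic bound. The unroll factor of 4 is an arbitrary assumption chosen for illustration, not what HotSpot necessarily picks.

```java
// Conceptual illustration only: a loop partially unrolled by a factor of 4.
class LoopUnrollSketch {
    // As written.
    static int sumLoop(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Partially unrolled: four elements per iteration, then a cleanup loop
    // for the leftover elements when the length is not a multiple of 4.
    static int sumUnrolled(int[] a) {
        int sum = 0;
        int i = 0;
        int limit = a.length - (a.length % 4);
        for (; i < limit; i += 4) {
            sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(sumLoop(data) + " " + sumUnrolled(data)); // prints "55 55"
    }
}
```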
    • Example – Inlining
    • Escape Analysis
      Escape analysis is not an optimization
      It is a check that an object does not escape the local scope
      E.g. created in a private method, assigned to a local variable and not returned
      Escape analysis opens up possibilities for lots of optimizations
    • Scalar Replacement
      Remember the rule “new == always new object”?
      False!
      JVM can optimize away allocations
      Fields are hoisted into registers
      Object becomes unneeded
      But object creation is cheap!
      Yep, but GC is not so cheap…
    • Example – Source Code Revision
    • Example – Scalar Replacement
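The code from this slide is missing. The hand-written sketch below approximates the idea: an object that never escapes the method can be replaced by its fields held in plain local variables. `Point` and both methods are hypothetical, and whether the JIT actually applies this depends on escape analysis succeeding.

```java
// Conceptual illustration only: a non-escaping object replaced by scalars.
class ScalarReplacementSketch {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // As written: allocates a Point that is never stored or returned.
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // After scalar replacement: the fields live in locals, no allocation at all.
    static int distanceSquaredScalar(int x, int y) {
        int px = x;
        int py = y;
        return px * px + py * py;
    }

    public static void main(String[] args) {
        System.out.println(distanceSquared(3, 4) + " " + distanceSquaredScalar(3, 4)); // prints "25 25"
    }
}
```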
    • Example – Scalar Replacement
    • Lock Coarsening
      HotSpot merges adjacent synchronized blocks using the same lock
      The compiler is allowed to move statements into the merged coarse blocks
      A tradeoff between performance and responsiveness
      Reduces instruction count
      But locks are held longer
    • Example – Source Code Revision
    • Example – Lock Coarsening
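In place of the lost slide code, here is a hand-written sketch of two adjacent synchronized blocks on the same lock being merged into one. The class and its fields are hypothetical; the merged form is what the JIT may produce, not something to write by hand.

```java
// Conceptual illustration only: adjacent synchronized blocks merged into one.
class LockCoarseningSketch {
    private final Object lock = new Object();
    private int a;
    private int b;

    // As written: the lock is acquired and released twice.
    void updateAsWritten() {
        synchronized (lock) {
            a++;
        }
        synchronized (lock) {
            b++;
        }
    }

    // After coarsening: one acquire and release, but the lock is held a little longer.
    void updateCoarsened() {
        synchronized (lock) {
            a++;
            b++;
        }
    }

    int total() {
        synchronized (lock) {
            return a + b;
        }
    }

    public static void main(String[] args) {
        LockCoarseningSketch s = new LockCoarseningSketch();
        s.updateAsWritten();
        s.updateCoarsened();
        System.out.println(s.total()); // prints 4
    }
}
```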
    • Lock Elision
      A thread enters a lock that no other thread will synchronize on
      Synchronization has no effect
      Can be deduced using escape analysis
      Such locks can be elided
      Elides 4 StringBuffer synchronized calls:
    • Example - Lock Elision
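The slide's code did not survive extraction. One plausible shape, written here as an assumption, is a method-local `StringBuffer`: three `append` calls plus `toString` make four synchronized calls on an object no other thread can ever see.

```java
// Conceptual illustration only: a candidate for lock elision.
class LockElisionSketch {
    // The StringBuffer never escapes this method, so its four synchronized
    // calls (three appends and toString) can never contend with another thread.
    static String greet(String name) {
        StringBuffer sb = new StringBuffer();
        sb.append("Hello, ");
        sb.append(name);
        sb.append('!');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(greet("Joe")); // prints "Hello, Joe!"
    }
}
```

If escape analysis proves the buffer is thread-local, the JIT can drop the locking and leave only the string building.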
    • Constants Folding
      Trivial optimization
      How many constants are there?
      More than you think!
      Inlining generates constants
      Unrolling generates constants
      Escape analysis generates constants
      JIT determines what is constant at runtime
      Whatever doesn’t change
    • Constants Folding
      Literals folding
      Before: int foo = 9*10;
      After: int foo = 90;
      String folding or StringBuilder-ing
      Before: String foo = "hi Joe " + (9*10);
      After: String foo = new StringBuilder().append("hi Joe ").append(9 * 10).toString();
      After: String foo = "hi Joe 90";
    • Example – Constants Folding
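The compile-time part of constant folding can be observed directly, because the Java language requires constant string expressions to be folded and interned. The sketch below relies on that rule; the class and constant names are made up, and the `==` comparison is deliberate here, not a recommended way to compare strings.

```java
// Observing javac's compile-time constant folding.
class ConstantFoldingSketch {
    // A constant expression: javac stores 86400, not the multiplication.
    static final int SECONDS_PER_DAY = 24 * 60 * 60;

    // "hi Joe " + (9 * 10) is a compile-time constant, so it is folded to the
    // same interned string as the literal "hi Joe 90".
    static boolean foldedAtCompileTime() {
        String folded = "hi Joe " + (9 * 10);
        return folded == "hi Joe 90"; // identity comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(SECONDS_PER_DAY);       // prints 86400
        System.out.println(foldedAtCompileTime()); // prints true
    }
}
```

The runtime folding described above goes further, since the JIT can also treat values as constant that only become known while the program runs.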
    • Dead Code Elimination
      Dead code - code that has no effect on the outcome of the program execution
      public static void main(String[] args) {
          long start = System.nanoTime();
          int result = 0;
          for (int i = 0; i < 10 * 1000 * 1000; i++) {
              result += Math.sqrt(i);
          }
          // result is never read, so the JIT may treat the whole loop as dead code
          long duration = (System.nanoTime() - start) / 1000000;
          System.out.format("Test duration: %d (ms) %n", duration);
      }
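Because the loop's result is never used, the benchmark above may end up timing nothing. A common workaround, sketched here as an assumption rather than as the presenter's code, is to return and print the result so the loop stays live. Even then, a single timed run like this is only a rough measurement.

```java
// A variant of the benchmark whose result is consumed, so the loop is not dead code.
class SqrtBenchmark {
    // Sums the integer parts of sqrt(0) .. sqrt(iterations - 1).
    static long sumOfSqrts(int iterations) {
        long result = 0;
        for (int i = 0; i < iterations; i++) {
            result += (long) Math.sqrt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = sumOfSqrts(10 * 1000 * 1000);
        long durationMs = (System.nanoTime() - start) / 1_000_000;
        // Printing the result makes it observable, so the work cannot be removed.
        System.out.format("Test duration: %d (ms), result: %d%n", durationMs, result);
    }
}
```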
    • OSR - On Stack Replacement
      Normally code is switched from interpretation to native in heap context
      Before entering method
      OSR - switch from interpretation to compiled code in local context
      In the middle of a method call
      JVM tracks code block execution count
      Fewer optimizations
      May prevent bounds check elimination and loop unrolling
    • Out-Of-Order Execution
    • Out-Of-Order Execution
    • Programming & Tuning Tips
    • How Can I Help?
      Just write good quality Java code
      Object Orientation
      Polymorphism
      Abstraction
      Encapsulation
      DRY
      KISS
      Let the HotSpot optimize
    • How Can I Help?
      final keyword
      For fields:
      Allows caching
      Allows lock coarsening
      For methods:
      Simplifies Inlining decisions
      Immutable objects die younger
    • JVM tuning tips
      Reminder: -XX options are non-standard
      Added for HotSpot development purposes
      Mostly tested on Solaris 10
      Platform dependent
      Some options may contradict each other
      Know and experiment with these options
    • Monitoring & Troubleshooting
    • References
      The HotSpot Home Page
      Java HotSpot VM Options
      Dynamic compilation and performance measurement
      Urban performance legends, revisited
      Synchronization optimizations in Mustang
      Robust Java benchmarking
      Garbage Collection Tuning
    • References
      JavaOne 2009 Sessions:
      Garbage Collection Tuning in the Java HotSpot™ Virtual Machine
      Under the Hood: Inside a High-Performance JVM™ Machine
      Practical Lessons in Memory Analysis
      Debugging Your Production JVM™ Machine
      Inside Out: A Modern Virtual Machine Revealed
    • Thank you for your attention
      Thanks to Ori Dar!