JVM Magic
Virtual machines don't have to be slow; they don't even have to be slower than native code.
All you have to do is write your code, sit back, and let the JVM do its magic!
Learn about various JVM runtime optimizations and why the JVM is considered one of the best VMs in the world.

JVM Magic: Presentation Transcript

  • The JVM Magic
    Baruch Sadogursky
    Consultant & Architect, AlphaCSP
  • Agenda
    Introduction
    GC Magic 101
    General Optimizations
    Compiler Optimizations
    What can I do?
    Programming tips
    JVM configuration flags
    2
  • Introduction
  • Introduction
    In the past, the JVM was considered by many to be Java's Achilles' heel
    Interpreter?!
    The JVM team has improved performance by 300 to 3,000 times
    JDK 1.6 compared to JDK 1.0
    Java is measured to run at 50% to 100+% of the speed of C and C++
    Jake2 vs Quake2
    How can it be?
  • Java Virtual Machines Zoo
    CEE-J
    Excelsior JET
    Hewlett-Packard
    J9 (IBM)
    Jbed
    Jblend
    Jrockit
    MRJ
    MicroJvm
    MS JVM
    OJVM
    PERC
    Blackdown Java
    CVM
    Gemstone
    Golden Code Development
    Intent
    Novell
    NSIcomCrE-ME
    ChaiVM
    HotSpot
    AegisVM
    Apache Harmony
    CACAO
    Dalvik
    IcedTea
    IKVM.NET
    Jamiga
    JamVM
    Jaos
    JC
    Jelatine JVM
    JESSICA
    Jikes RVM
    Jnode
    JOP
    Juice
    Jupiter
    JX
    Kaffe
    leJOS
    Mika VM
    Mysaifu
    NanoVM
    SableVM
    Squawk virtual machine
    SuperWaba
    TinyVM
    VMkit of Low Level Virtual Machine
    Wonka VM
    Xam
    5
  • HotSpot Virtual Machine
    Developed by Longview Technologies back in 1999
    Contains:
    Class loader
    Bytecode interpreter
    2 Virtual machines
    7 Garbage collectors
    2 Compilers
    Runtime libraries
  • HotSpot Virtual Machine
    Configured by hundreds of –XX flags
    Reminder
    -X options are non-standard
    -XX options have specific system requirements for correct operation
    Both are subject to change without notice
  • GC Magic 101
  • GC Is Slow?
    GC has bad performance reputation
    Reduces throughput
    Introduces pauses
    Unpredictable
    Uncontrolled
    Performance degradation is proportional to object count
    Just give me the damn free() and malloc()! I’ll be just fine!
    Is it so?
  • Generational Collectors
    Weak generational hypothesis
    Most objects die young (AKA Infant mortality)
    Few old to young references
    Generations: regions holding objects of different ages
    GC is done separately once a generation fills
    Different GC algorithms
    The young (nursery) generation
    Collected by “Minor garbage collection”
    The old (tenured) generation
    Collected by “Major (full) garbage collection”
  • GC Magic 101
    Young is better than Tenured
    Let your objects die in young generation
    When possible and makes sense
    11
  • GC Magic 101
    12
    Swapping is bad
    Application's memory footprint should not exceed the available physical memory
  • GC Magic 101
    13
    Choose:
    Throughput (client)
    Low-pause (server)
  • GC Magic 101
    http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
    14
  • Tracking Collectors Algorithms
    Mark-Sweep collector
    Mark phase marks each reachable object
    Sweep phase “sweeps” the heap
    Non marked objects reclaimed as garbage
    Copying collector
    Heap is divided into two equal spaces
    When active space fills, live objects are copied to the unused space
    Only live objects are examined
    The roles of the spaces are then flipped
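    To make the two phases concrete, here is a toy, hypothetical sketch in Java of mark-and-sweep over an explicit object graph (an illustration only, not HotSpot code; the Node type and the List-based "heap" are invented):
    import java.util.ArrayList;
    import java.util.List;

    public class MarkSweepSketch {
        static final class Node {
            boolean marked;
            final Node[] refs;
            Node(Node... refs) { this.refs = refs; }
        }

        // Mark phase: mark every object reachable from a root.
        static void mark(Node node) {
            if (node == null || node.marked) return;
            node.marked = true;
            for (Node ref : node.refs) mark(ref);
        }

        // Sweep phase: walk the whole "heap"; unmarked objects are reclaimed
        // (here simply by not keeping them), marks are cleared for the next cycle.
        static List<Node> sweep(List<Node> heap) {
            List<Node> live = new ArrayList<Node>();
            for (Node n : heap) {
                if (n.marked) { n.marked = false; live.add(n); }
            }
            return live;
        }
    }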
  • Compaction
    Compaction: The collector moves all live objects to the bottom of the heap
    Remaining memory is reclaimed
    Reduces the cost of object allocation
    No potential fragmentation
    The drawback is slower completion of GC
  • The Young generation
    Consists of Eden + two survivor spaces
    Objects are initially allocated in Eden
    All HotSpot young collectors are stop-the-world copying collectors
    Done in parallel by the parallel garbage collectors
    Collections are relatively fast, with cost proportional to the number of live objects
  • The Young generation
  • The Tenured generation
    Objects surviving several GC cycles are promoted to the tenured generation
    Use -XX:MaxTenuringThreshold=# to change
    Collectors algorithms used are variations of Mark-Sweep
    More space efficient
    Characteristics
    Lower garbage density
    Bigger heap space
    Fewer GC cycles
  • Generation Collectors
  • Garbage Collectors
    21
  • GC Flags
    22
  • When to Use
    23
  • Garbage First (G1)
    New in JDK 1.6u14 (May 29th, 2009)
    All memory is divided into 1 MB buckets (regions)
    Calculates object liveness per bucket
    Drops “dead” buckets
    A bucket that is not entirely garbage is not dropped
    Collects the buckets with the most garbage first
    Pauses only on “mark”
    No sweep
    User can provide pause time goals
    Actual seconds or Percentage of runtime
    G1 records bucket collection time and can estimate how many buckets to collect during pause
  • Garbage First (G1)
    Targets multi-processor machines and large heaps
    G1 will be the long-term replacement for the CMS collector
    Unlike CMS, compacts to battle fragmentation
    A bucket’s space is fully reclaimed
    Better throughput
    Predictable pauses (high probability)
    Garbage left in buckets with high live ratio
    May be collected later
  • Benefits of G1
    No imbalance of young-tenured generation
    Generations are only logical
    Generations are merely sets of buckets
    More predictable GC pauses
    Parallelism and concurrency in collections
    No fragmentation due to compaction
    Better heap utilization
    Better GC ergonomics
  • Young GCs in G1
    Done using evacuation pauses
    Stop-The-World parallel collections
    Evacuates surviving objects between sets of buckets
  • Old GCs in G1
    Drops dead buckets
    Calculates liveness info per bucket
    Identifies best buckets for subsequent eviction pauses
    Collects them, piggy-backed on young GCs
  • GC Ergonomics
    29
  • GC Ergonomics
    Ergonomics goal is to provide good performance with little or no tuning
    Better matches the needs of different application types
    The HotSpot VM, garbage collector, and heap sizes are automatically chosen
    Based on the OS, RAM, and number of CPUs
    Server Vs. Client class machine
    Hints the characteristics of the application
  • GC Ergonomics
  • GC Ergonomics
    With the parallel collectors, one can specify performance goals
    In contrast to specifying the heap size
    Improves performance for large applications
    Max Pause Time Goal
    Use -XX:MaxGCPauseMillis=<N>
    For both generations separately
    Or: Average + Variance
    No pause time goal by default
  • GC Ergonomics
    Throughput Goal
    Use -XX:GCTimeRatio=<N>
    The ratio of GC time to application time is 1/(1+N)
    If N=19, GC time goal is 1/(1+19) or 5%
    Default N is 99, meaning GC time is 1%
    Minimum Footprint Goal
    Priority of goals
    Maximum pause time goal
    Throughput goal
    Minimum footprint goal
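    As a hedged illustration of the goal flags above (the collector choice, pause goal, ratio value, and MyApp are invented for illustration, not tuned recommendations):
    java -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -XX:GCTimeRatio=19 MyApp
    With N=19 this asks for at most 5% of total time in GC, alongside a 200 ms pause-time goal.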
  • GC Ergonomics
    Performance goals may not be met
    Pause time and throughput goals are somewhat contradicting
    The pause time goal shrinks the generation
    The throughput goal grows the generation
    Statistics are kept by the GC
    Adaptive to changes in application behavior
  • GC Tweaking
  • Heap Size
    The larger the heap space, the better
    For both young and old generation
    Larger space: less frequent GCs, lower GC overhead, objects more likely to become garbage
    Smaller space: faster GCs (not always! see later)
    Sometimes max heap size is dictated by available memory and/or max space the JVM can address
    You have to find a good balance between young and old generation size
  • Heap Size
    Maximize the number of objects reclaimed in the young generation
    Application's memory footprint should not exceed the available physical memory
    Swapping is bad
    The above apply to all our GCs
    37
  • Heap Size
    -Xmx<size> : max heap size
    young generation + old generation
    -Xms<size> : initial heap size
    young generation + old generation
    -Xmn<size> : young generation size
    -XX:PermSize=<size> : permanent generation initial size
    -XX:MaxPermSize=<size> : permanent generation max size
    38
  • Heap Size
    When -Xms != -Xmx, heap growth or shrinking requires a Full GC
    Set -Xms to desired heap size
    Set -Xmx even higher, “just in case”
    Even a full GC is better than an OOM crash
    Same for -XX:PermSize and -XX:MaxPermSize
    Same for -XX:NewSize and -XX:MaxNewSize
    -Xmn combines both
    39
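    A minimal sketch combining the sizing flags above, following this slide's advice (the sizes and MyApp are invented for illustration only):
    java -Xms2g -Xmx3g -Xmn512m -XX:PermSize=128m -XX:MaxPermSize=256m MyApp
    Here -Xms is set to the desired heap size and -Xmx somewhat higher “just in case”, as suggested above.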
  • Tenuring
    Measure tenuring with -XX:+PrintTenuringDistribution
    Avoid tenuring for short or even medium-lived objects!
    Less promotion into the old generation
    Less frequent old GCs
    Promote long-lived objects ASAP
    Yes, this conflicts with the previous bullet
    It is better to copy more than to promote more
    -XX:TargetSurvivorRatio=<percent>, e.g., 50
    How much of the survivor space should be filled
    Typically leave extra space to deal with “spikes”
    40
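    A hypothetical combination of the tenuring flags from this slide (the threshold and ratio values are illustrative, not tuned recommendations):
    java -XX:+PrintTenuringDistribution -XX:MaxTenuringThreshold=10 -XX:TargetSurvivorRatio=50 MyApp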
  • Permanent Space
    Classes aren’t unloaded by default
    -XX:+CMSClassUnloadingEnabled to enable
    Classloader should be collected
    It holds references to classes
    Each object holds reference to classloader
    41
  • GC Options
    42
  • GC Statistics Options
    GC logging has extremely low, almost non-existent overhead
    It’s very helpful when diagnosing production issues
    Enable it
    In production too!
    -XX:+PrintGC
    -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps
    -XX:+PrintTenuringDistribution
    Shows the tenuring threshold and the ages of objects in the new generation
    43
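    For example, the statistics options above can be turned on together; a sketch (the -Xloggc file name is an invented addition that redirects the output to a file):
    java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -Xloggc:gc.log MyApp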
  • GC Is Slow? – The Answers
    Reduces throughput
    You choose
    Introduces pauses
    You choose
    Unpredictable
    Not any more
    Uncontrolled
    Configurable
    Performance degradation is proportional to object count
    Not true
    Just give me the damn free() and malloc()! I’ll be just fine!
    Bad idea (see more later)
  • General Optimizations
  • HotSpot Optimizations
    JIT Compilation
    Compiler Optimizations
    Generates code more performant than what you could write in native code
    Adaptive Optimization
    Split Time Verification
    Class Data Sharing
  • Two Virtual Machines?
    Client VM
    Reducing start-up time and memory footprint
    -client CL flag
    Server VM
    Maximum program execution speed
    -server CL flag
    Auto-detection
    Server class: 2 or more CPUs and 2 GB or more of physical memory
    32-bit Windows is always detected as client
    Many 64-bit OSes don't have a client VM
    47
  • Just-In-Time Compilation
    Everyone knows about JIT!
    Hot code is compiled to native
    What is “hot”?
    Server VM – 10000 invocations
    Client VM – 1500 invocations
    Use -XX:CompileThreshold=# to change
    More invocations before compiling – better optimizations
    Fewer invocations – shorter warm-up time
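    A minimal sketch of lowering the threshold on the server VM (5000 is an arbitrary illustrative value, not a recommendation):
    java -server -XX:CompileThreshold=5000 MyApp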
  • Just-In-Time Compilation
    The code is being optimized by the compiler
    Coming soon…
  • Adaptive Optimization
    Allows HotSpot to deoptimize (“uncompile”) previously compiled code
    Much more aggressive, even speculative optimizations may be performed
    And rolled back if something goes wrong or new data gathered
    E.g. classloading might invalidate inlining
  • Split Time Verification
    Java suffers from long boot time
    One of the reasons is bytecode verification
    Valid flow control
    Type safety
    Visibility
    To ease the load on the weak KVM, J2ME started performing part of the verification at compile time
    It worked well, so now it's in Java SE 6 too
  • Class Data Sharing
    Helps improve startup time
    During JDK installation, part of rt.jar is preloaded into a shared archive file which is memory-mapped at runtime
    No need to reload and reverify those classes every time
  • Compiler Optimizations
  • Two Types of Optimizations
    Java has two compilers:
    javac bytecode compiler
    HotSpot VM JIT compiler
    Both implement similar optimizations
    Bytecode compiler is limited
    Dynamic linking
    Can apply only static optimizations
  • Warning
    Caution! Don’t try this at home yourself!
    The source code you are about to see is not real!
    It’s pseudo assembly code
    Don’t writesuch code!
    Source code should be readable and object-oriented
    Bytecode will become performant automagically
    55
  • Optimization Rules
    Make the common case fast
    Don't worry about uncommon/infrequent case
    Defer optimization decisions
    Until you have data
    Revisit decisions if data warrants
    56
  • Null check Elimination
    Java is a null-safe language
    A pointer can't point to a meaningless portion of memory
    Null checks are added by the compiler, and a NullPointerException is thrown on failure
    The JVM's profiler can eliminate redundant checks
    57
  • Example – Original Source
    58
  • Example – Null Check Elimination
    59
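    The example slides' code isn't in this transcript; the following is a hypothetical sketch of the pattern (class and method names invented): once s has been dereferenced successfully, the compiler-inserted null checks before later calls on the same reference are provably redundant and can be eliminated.
    public class NullCheckDemo {
        static int describe(String s) {
            int len = s.length();      // implicit null check happens here
            int idx = s.indexOf('a');  // this check is redundant: s is known non-null
            int hash = s.hashCode();   // redundant again
            return len + idx + hash;
        }
    }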
  • Inlining
    Love Encapsulation?
    Getters and setters
    Love clean and simple code?
    Small methods
    Use static code analysis?
    Small methods
    No penalty for using those!
    JIT brings the implementation of these methods into a containing method
    This optimization known as “Inlining”
  • Inlining
    Not just about eliminating call overhead
    Provides optimizer with bigger blocks
    Enables other optimizations
    hoisting, dead code elimination, code motion, strength reduction
    61
  • Inlining
    But wait, all public non-final methods in Java are virtual!
    HotSpot examines the exact case in place
    In most cases there is only one implementation, which can be inlined
    But wait, more implementations may be loaded later!
    In such case HotSpot undoes the inlining
    Speculative inlining
    By default limited to 35 bytes of bytecode
    Use -XX:MaxInlineSize=# to change
  • Example - Inlining
    63
  • Example – Source Code Revision
    64
  • Example – Source Code Revision
    65
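    The example slides' code isn't captured here; a hypothetical sketch of the kind of accessors the JIT inlines (the Point class is invented):
    public final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        // Tiny, monomorphic getters: far below the 35-bytecode default limit,
        // so calls to them are replaced with direct field loads.
        int getX() { return x; }
        int getY() { return y; }

        static int manhattan(Point p) {
            // After inlining this is effectively Math.abs(p.x) + Math.abs(p.y),
            // with no call overhead and more room for further optimizations.
            return Math.abs(p.getX()) + Math.abs(p.getY());
        }
    }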
  • Code Hoisting
    Hoist = to raise or lift
    Size optimization
    Eliminate duplicate code in method bodies by hoisting expressions or statements
    Duplicate bytecode, not necessarily source code
  • Example – Code Hoisting
    67
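    The slide's example isn't in the transcript; a hypothetical before/after sketch of the idea (names invented): the computation duplicated in both branches is hoisted so it appears only once.
    public class HoistingDemo {
        // Before: 'base * rate' is duplicated in both branches.
        static double priceBefore(boolean premium, double base, double rate) {
            if (premium) {
                return base * rate + 10.0;
            } else {
                return base * rate + 2.0;
            }
        }

        // After hoisting: the common computation is lifted out of the branches.
        static double priceAfter(boolean premium, double base, double rate) {
            double scaled = base * rate;
            return premium ? scaled + 10.0 : scaled + 2.0;
        }
    }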
  • Bounds Check Elimination
    Java promises automatic bounds checks for arrays
    An ArrayIndexOutOfBoundsException is thrown on violation
    If the programmer already checks the array bounds himself, the redundant automatic check can be eliminated
  • Example – Bounds Check Elimination
    69
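    The slide's example isn't in the transcript; a hypothetical sketch: the loop bound below is a.length, so every a[i] is provably in range and the automatic bounds check can be eliminated.
    public class BoundsCheckDemo {
        static long sum(int[] a) {
            long total = 0;
            for (int i = 0; i < a.length; i++) {
                total += a[i];   // index provably within bounds: check can be dropped
            }
            return total;
        }
    }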
  • Sub-Expression Elimination
    Avoids redundant memory access
    70
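    A small hypothetical illustration (names invented): the second read of a[i] is a redundant memory access that the compiler replaces with the value already loaded into a register.
    public class CseDemo {
        static int square(int[] a, int i) {
            return a[i] * a[i];   // compiled roughly as: t = a[i]; t * t
        }
    }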
  • Loop Unrolling
    Some loops shouldn’t be loops
    In terms of performance, not code readability
    Those can be unrolled into a set of statements
    If the bounds are dynamic, a partial unroll occurs
  • Example – Loop Unrolling
    72
  • Example – Inlining
    73
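    A hypothetical sketch of what unrolling a small fixed-bound loop amounts to (the JIT does this on compiled code, not on your source; names invented):
    public class UnrollDemo {
        // As written: a loop with a small constant bound.
        static int dot4(int[] a, int[] b) {
            int sum = 0;
            for (int i = 0; i < 4; i++) {
                sum += a[i] * b[i];
            }
            return sum;
        }

        // Roughly what the unrolled form computes: straight-line code with
        // no loop counter and no branch per iteration.
        static int dot4Unrolled(int[] a, int[] b) {
            return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
        }
    }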
  • Escape Analysis
    Escape analysis is not an optimization by itself
    It is an analysis that checks whether an object escapes its local scope
    E.g. created in private method, assigned to local variable and not returned
    Escape analysis opens up possibilities for lots of optimizations
  • Scalar Replacement
    Remember the rule “new == always new object”?
    False!
    JVM can optimize away allocations
    Fields are hoisted into registers
    Object becomes unneeded
    But object creation is cheap!
    Yes, but GC is not so cheap…
    75
  • Example – Source Code Revision
    76
  • Example – Scalar Replacement
    77
  • Example – Scalar Replacement
    78
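    The example slides' code isn't in the transcript; a hypothetical sketch (the Vec class is invented): the temporary object below never escapes the method, so escape analysis lets the JIT scalar-replace it.
    public class ScalarReplacementDemo {
        static final class Vec {
            final double dx, dy;
            Vec(double dx, double dy) { this.dx = dx; this.dy = dy; }
        }

        static double length(double x1, double y1, double x2, double y2) {
            // 'v' is created, read, and dropped entirely inside this method:
            // its fields can be hoisted into registers and the allocation removed.
            Vec v = new Vec(x2 - x1, y2 - y1);
            return Math.sqrt(v.dx * v.dx + v.dy * v.dy);
        }
    }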
  • Lock Coarsening
    HotSpot merges adjacent synchronized blocks using the same lock
    The compiler is allowed to move statements into the merged coarse block
    A tradeoff between performance and responsiveness
    Reduces instruction count
    But locks are held longer
  • Example – Source Code Revision
    80
  • Example – Lock Coarsening
    81
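    The slide's code isn't in the transcript; a hypothetical sketch of the pattern (names invented): two adjacent synchronized blocks on the same lock that HotSpot can merge into one coarser block.
    public class CoarseningDemo {
        private final Object lock = new Object();
        private int hits;
        private long lastSeen;

        void record(long now) {
            synchronized (lock) {     // two back-to-back blocks on the same monitor...
                hits++;
            }
            synchronized (lock) {     // ...can be coarsened into a single acquire/release,
                lastSeen = now;       // holding the lock slightly longer but locking less often
            }
        }
    }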
  • Lock Elision
    A thread enters a lock that no other thread will synchronize on
    Synchronization has no effect
    Can be deduced using escape analysis
    Such locks can be elided
    Elides 4 StringBuffer synchronized calls:
  • Example - Lock Elision
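    The example slide's code isn't in the transcript; a plausible sketch of the classic case (the method is invented): a StringBuffer that never escapes the method, so its four synchronized calls can be elided.
    public class ElisionDemo {
        static String greet(String name) {
            // 'sb' is purely local and never escapes, so escape analysis lets
            // the JIT elide the locking inside these four synchronized calls:
            StringBuffer sb = new StringBuffer();
            sb.append("Hello, ");
            sb.append(name);
            sb.append("!");
            return sb.toString();
        }
    }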
  • Constants Folding
    Trivial optimization
    How many constants are there?
    More than you think!
    Inlining generates constants
    Unrolling generates constants
    Escape analysis generates constants
    The JIT determines what is constant at runtime
    Whatever doesn’t change
  • Constants Folding
    Literals folding
    Before: int foo = 9 * 10;
    After: int foo = 90;
    String folding (or StringBuilder-ing)
    Before: String foo = "hi Joe " + (9*10);
    After: String foo = new StringBuilder().append("hi Joe ").append(9 * 10).toString();
    After: String foo = "hi Joe 90";
  • Example – Constants Folding
    86
  • Dead Code Elimination
    Dead code - code that has no effect on the outcome of the program execution
    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = 0;
        // 'result' is never used after the loop, so the JIT may treat the whole
        // loop as dead code and eliminate it, making this "benchmark" meaningless
        for (int i = 0; i < 10 * 1000 * 1000; i++) {
            result += Math.sqrt(i);
        }
        long duration = (System.nanoTime() - start) / 1000000;
        System.out.format("Test duration: %d (ms) %n", duration);
    }
  • OSR - On Stack Replacement
    Normally code is switched from interpreted to compiled execution at a method boundary
    Before entering the method
    OSR: switching from interpreted to compiled code in a local context
    In the middle of a running method
    The JVM tracks code block execution counts
    Fewer optimizations are possible
    May prevent bounds check elimination and loop unrolling
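    A hypothetical illustration: the hot loop below runs inside a single invocation of main(), so the only way to run it as compiled code is to replace the interpreted frame in the middle of the method (OSR).
    public class OsrDemo {
        public static void main(String[] args) {
            long sum = 0;
            // main() is invoked only once, so the invocation counter never gets hot;
            // the loop's backward-branch counter does, triggering an OSR compilation
            // that swaps in compiled code while this loop is still running.
            for (int i = 0; i < 100000000; i++) {
                sum += i;
            }
            System.out.println(sum);
        }
    }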
  • Out-Of-Order Execution
  • Out-Of-Order Execution
  • Programming & Tuning Tips
    91
  • How Can I Help?
    Just write good quality Java code
    Object Orientation
    Polymorphism
    Abstraction
    Encapsulation
    DRY
    KISS
    Let the HotSpot optimize
  • How Can I Help?
    final keyword
    For fields:
    Allows caching
    Allows lock coarsening
    For methods:
    Simplifies Inlining decisions
    Immutable objects die younger
    93
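    A hypothetical sketch of an immutable value class in the spirit of these tips (the Money class is invented):
    public final class Money {             // final class: no subclasses to reason about
        private final String currency;     // final fields: values can be safely cached
        private final long cents;

        public Money(String currency, long cents) {
            this.currency = currency;
            this.cents = cents;
        }

        // Small accessors on a final class are easy inlining targets.
        public String getCurrency() { return currency; }
        public long getCents() { return cents; }

        // "Modification" creates a new, short-lived object that can die young.
        public Money plus(long moreCents) { return new Money(currency, cents + moreCents); }
    }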
  • JVM tuning tips
    Reminder: -XX options are non-standard
    Added for HotSpot development purposes
    Mostly tested on Solaris 10
    Platform dependent
    Some options may contradict each other
    Know and experiment with these options
    94
  • Monitoring & Troubleshooting
    95
  • References
    The HotSpot Home Page
    Java HotSpot VM Options
    Dynamic compilation and performance measurement
    Urban performance legends, revisited
    Synchronization optimizations in Mustang
    Robust Java benchmarking
    Garbage Collection Tuning
    96
  • References
    JavaOne 2009 Sessions:
    Garbage Collection Tuning in the Java HotSpot™ Virtual Machine
    Under the Hood: Inside a High-Performance JVM™ Machine
    Practical Lessons in Memory Analysis
    Debugging Your Production JVM™ Machine
    Inside Out: A Modern Virtual Machine Revealed
    97
  • Thank you for your attention 
    Thanks to Ori Dar!