JVM performance options.
How do they work?
Dmitriy Dumanskiy
 Cogniance, Velti project
     Java Team Lead
-Xmx2048M -Xms2048M
-XX:ParallelGCThreads=8 -Xincgc
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+CMSIncrementalPacing
-XX:+AggressiveOpts
-XX:+CMSParallelRemarkEnabled
-XX:+DisableExplicitGC
-XX:MaxGCPauseMillis=500
-XX:SurvivorRatio=16
-XX:TargetSurvivorRatio=90
-XX:+UseAdaptiveGCBoundary
-XX:-UseGCOverheadLimit -Xnoclassgc
-XX:UseSSE=3 -XX:PermSize=128m
-XX:LargePageSizeInBytes=4m
Options may vary per
architecture / OS / JVM version
JVM 6 ~ 730 options

JVM 7 ~ 680 options
-X   : non-standard options (not supported by every JVM)
-XX  : unstable options (may change or disappear without notice)
Types

Boolean : -XX:+<option> or -XX:-<option>

Numeric : -XX:<option>=<number>

String   : -XX:<option>=<string>
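The three syntactic forms above can be told apart mechanically. The sketch below is only an illustration: the class name OptionSyntax and the method classify are hypothetical, and the numeric check (digits with an optional k/m/g/t size suffix) is a simplifying assumption, not the JVM's own parser.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: classifies a JVM argument by its syntax alone.
final class OptionSyntax {

    static String classify(String arg) {
        if (!arg.startsWith("-XX:")) {
            return "not -XX";
        }
        String body = arg.substring(4);
        // Boolean options are switched on with '+' and off with '-'.
        if (body.startsWith("+") || body.startsWith("-")) {
            return "boolean";
        }
        int eq = body.indexOf('=');
        if (eq < 0) {
            return "malformed";
        }
        String value = body.substring(eq + 1);
        // Assumption: digits plus an optional size suffix count as numeric.
        return value.matches("\\d+[kKmMgGtT]?") ? "numeric" : "string";
    }

    public static void main(String[] args) {
        // Lists the arguments the current JVM was started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String arg : jvmArgs) {
            System.out.println(classify(arg) + "  " + arg);
        }
    }
}
```

Running it with, for example, -XX:+AggressiveOpts -XX:SurvivorRatio=16 on the command line prints one classified line per argument.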
Categories


    Behavioral options

    Garbage Collection options

    Performance tuning options

    Debugging options
-XX:+DoEscapeAnalysis

      Analysis:

    Can the object be allocated on the stack?

    Is the object accessed by only one thread?
-XX:+DoEscapeAnalysis

      Analysis result:

    GlobalEscape

    ArgEscape

    NoEscape
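A minimal sketch of code that would typically fall into each of the three states. The class EscapeStates and its members are hypothetical names chosen for illustration. The actual classification is made by the JIT compiler and can differ, for example when the callee in the ArgEscape case is inlined.

```java
// Hypothetical illustration of the three escape states.
final class EscapeStates {

    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static Point lastPoint;

    // GlobalEscape: the object is stored in a static field,
    // so any thread can reach it after the method returns.
    static void globalEscape(int x, int y) {
        lastPoint = new Point(x, y);
    }

    // ArgEscape: the object is passed to another method,
    // but that method does not store it anywhere.
    static int argEscape(int x, int y) {
        Point p = new Point(x, y);
        return sum(p);
    }

    static int sum(Point p) {
        return p.x + p.y;
    }

    // NoEscape: the object never leaves this method,
    // so it is a candidate for scalar replacement.
    static int noEscape(int x, int y) {
        Point p = new Point(x, y);
        return p.x + p.y;
    }

    public static void main(String[] args) {
        globalEscape(2, 3);
        System.out.println(lastPoint.x + lastPoint.y);
        System.out.println(argEscape(2, 3));
        System.out.println(noEscape(2, 3));
    }
}
```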
NoEscape
class Cursor {
    String icon;
    int x;
    public void create() {
        Cursor c = new Cursor(); //HEAP
        c.icon = null;           //HEAP
        c.x = 0;                 //HEAP
    }
}
NoEscape → scalar replacement
class Cursor {
    String icon;
    int x;
    public void create() {
        String icon = null; //ref on stack frame
        int x = 0;           //int on stack frame
    }
}
-XX:+DoEscapeAnalysis

    ~20-60% of locks eliminated

    ~15-20% performance improvement
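Lock elimination follows from the same analysis: if a synchronized object never escapes the method, no other thread can contend for its lock. A small sketch, assuming a hypothetical class LockElision; whether the JIT actually removes the locks depends on the JVM version and on inlining.

```java
// Hypothetical illustration: locks on a non-escaping object.
final class LockElision {

    static String join(String a, String b) {
        // StringBuffer.append() is synchronized, but sb is only
        // reachable from this method, so its lock is never contended.
        StringBuffer sb = new StringBuffer();
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join("escape", "Analysis"));
    }
}
```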
-XX:+AggressiveOpts
Option                       -AggressiveOpts   +AggressiveOpts

AutoBoxCacheMax              128               20000

BiasedLockingStartupDelay    4000              500

EliminateAutoBox             false             true

OptimizeFill                 false             true

OptimizeStringConcat         false             true
-XX:AutoBoxCacheMax=size
Sets the value of IntegerCache.high:

class Integer {
    public static Integer valueOf(int i) {
        if (i >= -128 && i <= IntegerCache.high)
            return IntegerCache.cache[i + 128];
        else
            return new Integer(i);
    }
}
-XX:AutoBoxCacheMax=size


 new Integer(1) vs Integer.valueOf(1):


       valueOf is ~4 times faster
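The cache is visible from ordinary code through reference equality. A minimal sketch, with a hypothetical class IntegerCacheDemo; the result for values above 127 depends on the AutoBoxCacheMax setting of the running JVM, so it is printed but not assumed.

```java
// Hypothetical illustration of the Integer cache.
final class IntegerCacheDemo {

    // True when two valueOf() calls return the same cached instance.
    static boolean sameInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b;
    }

    public static void main(String[] args) {
        // Values in [-128, 127] are always cached.
        System.out.println("127   cached: " + sameInstance(127));
        // Depends on -XX:AutoBoxCacheMax (or -XX:+AggressiveOpts).
        System.out.println("20000 cached: " + sameInstance(20000));
    }
}
```

Running it once with default options and once with -XX:AutoBoxCacheMax=20000 should show the second line change.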
-XX:BiasedLockingStartupDelay=delay


  
      Lock states:

      Biased

      Thin

      Fat
-XX:-OptimizeStringConcat
String twenty = "12345678901234567890";
String sb = twenty + twenty + twenty +
  twenty;

is compiled into:

String twenty = "12345678901234567890";
String sb = new StringBuilder()
    .append(twenty).append(twenty)
    .append(twenty).append(twenty).toString();
-XX:-OptimizeStringConcat
String twenty = "12345678901234567890";
String sb = new StringBuilder()
    .append(twenty).append(twenty)
    .append(twenty).append(twenty).toString();

Buffer allocations as the builder grows:

              new char[16];
              new char[34];
              new char[70];
              new char[142];
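The sequence 16, 34, 70, 142 can be reproduced with simple arithmetic. The sketch below simulates the growth rule rather than calling StringBuilder itself; the rule (capacity + 1) * 2 is an assumption based on the JDK 6 implementation, and BuilderGrowth is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of StringBuilder buffer growth.
final class BuilderGrowth {

    // Returns the size of every buffer allocated while appending
    // `appends` chunks of `chunk` characters each.
    static List<Integer> allocations(int initialCapacity, int chunk, int appends) {
        List<Integer> sizes = new ArrayList<Integer>();
        int capacity = initialCapacity;
        int length = 0;
        sizes.add(capacity);
        for (int i = 0; i < appends; i++) {
            int needed = length + chunk;
            if (needed > capacity) {
                // Assumed growth rule: (old + 1) * 2, or the exact need if larger.
                capacity = Math.max((capacity + 1) * 2, needed);
                sizes.add(capacity);
            }
            length = needed;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Default capacity 16, four appends of 20 characters.
        System.out.println(allocations(16, 20, 4)); // prints [16, 34, 70, 142]
    }
}
```

The third append needs 60 characters, which still fits in the 70-character buffer, so only four buffers are allocated in total.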
-XX:+OptimizeStringConcat
String twenty = "12345678901234567890";
String sb = new StringBuilder()
    .append(twenty).append(twenty)
    .append(twenty).append(twenty).toString();

A single buffer allocation:

              new char[80];
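The same effect can be approximated by hand when the final length is known in advance. This is not what the JIT does internally, only a hand-written equivalent of the result; PresizedBuilder is a hypothetical name.

```java
// Hypothetical illustration: sizing the buffer once, up front.
final class PresizedBuilder {

    static StringBuilder concat(String s, int times) {
        // One allocation that is large enough for the whole result.
        StringBuilder sb = new StringBuilder(s.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(s);
        }
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = concat("12345678901234567890", 4);
        System.out.println(sb.length() + " chars, capacity " + sb.capacity());
    }
}
```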
-XX:+OptimizeStringConcat
String twenty = "12345678901234567890";
StringBuilder sb1 = new StringBuilder();
sb1.append(new StringBuilder()
     .append(twenty).append(twenty)
     .append(twenty).append(twenty)
);

A single buffer allocation for the nested builder:

                new char[80];
-XX:+OptimizeFill
Arrays.fill(), Arrays.copyOf() or code patterns such as:


for (int i = fromIndex; i < toIndex; i++) {
    a[i] = val;
}


are replaced with native machine instructions
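Both forms below produce the same array, which is why the JIT is free to treat the loop as a fill. FillDemo is a hypothetical name; the sketch shows only the equivalence, not the generated machine code.

```java
import java.util.Arrays;

// Hypothetical illustration: a fill loop and Arrays.fill() are equivalent.
final class FillDemo {

    static int[] fillWithLoop(int size, int fromIndex, int toIndex, int val) {
        int[] a = new int[size];
        // The loop pattern that the optimization recognizes.
        for (int i = fromIndex; i < toIndex; i++) {
            a[i] = val;
        }
        return a;
    }

    static int[] fillWithLibrary(int size, int fromIndex, int toIndex, int val) {
        int[] a = new int[size];
        Arrays.fill(a, fromIndex, toIndex, val);
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fillWithLoop(6, 1, 4, 7)));
        System.out.println(Arrays.toString(fillWithLibrary(6, 1, 4, 7)));
    }
}
```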
-XX:+EliminateAutoBox



    Removes unnecessary AutoBox operations


    Works only for Integers
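A typical candidate is a value that is boxed and immediately unboxed again. In the sketch below, AutoBoxDemo is a hypothetical name; both methods return the same sum, and the boxed variant is the kind of code the option is meant to make as cheap as the primitive one.

```java
// Hypothetical illustration of an unnecessary box/unbox pair.
final class AutoBoxDemo {

    static int sumBoxed(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            Integer boxed = i; // Integer.valueOf(i)
            sum += boxed;      // boxed.intValue()
        }
        return sum;
    }

    static int sumPrimitive(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(100) + " " + sumPrimitive(100));
    }
}
```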
-XX:+UseStringCache



Appears to be no longer used
-XX:+UseCompressedStrings



    For ASCII characters:
       char[] -> byte[]
-XX:+UseCompressedOops


    Heap size up to 32 GB

    References are 50% smaller

    JVM performance boost of 2-10%

    20-60% less memory consumption
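The 32 GB limit follows from arithmetic: a 32-bit reference that counts aligned object slots instead of bytes can address 2^32 slots. The sketch assumes the default object alignment of 8 bytes; CompressedOopsLimit is a hypothetical name.

```java
// Hypothetical worked example of the compressed-oops heap limit.
final class CompressedOopsLimit {

    // Addressable bytes = number of distinct references * alignment.
    static long maxHeapBytes(int referenceBits, int objectAlignmentBytes) {
        return (1L << referenceBits) * objectAlignmentBytes;
    }

    public static void main(String[] args) {
        long bytes = maxHeapBytes(32, 8); // assumed default alignment: 8 bytes
        System.out.println(bytes / (1024L * 1024L * 1024L) + " GB"); // prints 32 GB
    }
}
```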
-XX:+UseCompressedOops
[Bar chart: 32-bit vs 64-bit vs 64-bit with compressed oops; vertical axis from 0 to 1.2]
-XX:+EliminateLocks

Before:

synchronized (object) {
   //doSomething1
}
synchronized (object) {
   //doSomething2
}

After:

synchronized (object) {
   //doSomething1
   //doSomething2
}
-XX:+EliminateLocks
Before:

synchronized (object) {
   //doSomething1
}
//doSomething2
synchronized (object) {
   //doSomething3
}

After:

synchronized (object) {
   //doSomething1
   //doSomething2
   //doSomething3
}
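Written out as compilable code, the transformation looks like this. LockMerging and its methods are hypothetical names; the second method is a hand-written version of what the JIT may produce from the first, and both give the same result.

```java
// Hypothetical illustration: adjacent blocks on one monitor, merged.
final class LockMerging {

    private final Object lock = new Object();
    private int counter;

    // Two adjacent synchronized blocks on the same object.
    int separateBlocks() {
        synchronized (lock) {
            counter++;
        }
        synchronized (lock) {
            counter++;
            return counter;
        }
    }

    // The same work under a single lock acquisition.
    int mergedBlock() {
        synchronized (lock) {
            counter++;
            counter++;
            return counter;
        }
    }

    public static void main(String[] args) {
        System.out.println(new LockMerging().separateBlocks());
        System.out.println(new LockMerging().mergedBlock());
    }
}
```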
-XX:+UseLargePages

 The Translation-Lookaside Buffer (TLB) is a page translation cache
 that holds the most recently used virtual-to-physical address
 translations
-XX:CompileThreshold=n

    Client mode: n = 1500
    Server mode: n = 10000

More profile data means more optimizations
-XX:hashCode=n


Object.hashCode() - internal
address of the object?
-XX:hashCode=n
Values of n:

    0   –   Park-Miller RNG (default)

    1   –   f (address, global state)

    2   –   const 1

    3   –   sequence counter

    4   –   object address

    5   –   Thread-local Xorshift
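Whichever generator is selected, the value is computed once per object and then stays fixed. A small sketch, with a hypothetical class IdentityHashDemo, that prints an identity hash code so that runs with different -XX:hashCode=n values can be compared:

```java
// Hypothetical illustration of the identity hash code.
final class IdentityHashDemo {

    // The identity hash is assigned once and does not change afterwards.
    static boolean isStable(Object o) {
        int first = System.identityHashCode(o);
        int second = System.identityHashCode(o);
        return first == second;
    }

    public static void main(String[] args) {
        Object o = new Object();
        // Object.hashCode() is the identity hash unless a class overrides it.
        System.out.println(Integer.toHexString(o.hashCode()));
        System.out.println(Integer.toHexString(System.identityHashCode(o)));
        System.out.println("stable: " + isStable(o));
    }
}
```

With -XX:hashCode=2 every object should print 1, which makes the effect of the option easy to see.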
