JVM Internals - NHJUG Jan 2012

JVM Internals
Douglas Q. Hawkins
http://www.slideshare.net/dougqh
http://www.dougqh.net
dougqh@gmail.com

Monday, January 23, 12

Topics
Java Byte Code
File Format
Byte Code Examples
How Java 5 & 7 Features Are Implemented
JVM Optimizations


Why?

Besides techie edification, why is this useful?
A better understanding of the internals can help in deciphering some of the harder problems, but better still...
You’ll know that the compiler and JVM are doing a lot for you, letting you focus on writing readable code.

File Format


Class File Format
CA FE BA BE Minor Version Major Version

Constant Pool

Flags This Class Super Class
Interfaces

Fields

Methods

Attributes


Every file starts with the 4-byte magic number: 0xCAFEBABE
Followed by the minor and major version - the major version indicates Java 5, 6, 7, etc. (49, 50, 51)
Then a constant pool - which contains...
constants: int, long, String, etc.
references: method and field
descriptors: method and field
Followed by flags: modifiers for this class/interface
Followed by a reference to this class/interface - an index into the constant pool
Followed by the super class - also an index into the constant pool
Followed by a list of interface references - which are indices into the constant pool
Followed by fields
Followed by methods
And, finally, attributes, which are extra meta-information about the class...
- the name of the original source file
- annotation information
- information on inner (nested) classes
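The fixed-size header items listed above are big-endian integers, so a DataInputStream can read them directly. This is a minimal sketch, not a full parser: it stops right before the constant pool entries, and the class and field names are made up for illustration. Note that the minor version is stored before the major version.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public final class ClassHeader {
    final int magic;
    final int minor;
    final int major;
    final int constantPoolCount;

    ClassHeader(int magic, int minor, int major, int constantPoolCount) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads the items that precede the constant pool entries. */
    static ClassHeader read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int magic = in.readInt();            // u4 magic: 0xCAFEBABE
        int minor = in.readUnsignedShort();  // u2 minor_version comes first...
        int major = in.readUnsignedShort();  // ...then u2 major_version (49 = Java 5, 50 = Java 6, 51 = Java 7)
        int count = in.readUnsignedShort();  // u2 constant_pool_count (number of entries + 1)
        return new ClassHeader(magic, minor, major, count);
    }

    public static void main(String[] args) throws IOException {
        // Read this program's own .class file from the classpath.
        try (InputStream raw = ClassHeader.class.getResourceAsStream("ClassHeader.class")) {
            ClassHeader h = read(raw);
            // %X prints the negative int 0xCAFEBABE as unsigned hex: CAFEBABE
            System.out.printf("magic=%X minor=%d major=%d constant_pool_count=%d%n",
                    h.magic, h.minor, h.major, h.constantPoolCount);
        }
    }
}
```

The major version printed depends on the javac target used to compile the sketch.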

Class File Spec: http://java.sun.com/docs/books/jvms/second_edition/ClassFileFormat-Java5.pdf
History of CAFEBABE: http://en.wikipedia.org/wiki/Java_class_file

Class File Format
CA FE BA BE Minor Version Major Version
Constant Pool
Flags (public, private, protected, static, final, abstract, interface, annotation, enum, strictfp) This Class Super Class
Interfaces
Fields
Methods
Attributes


Field Format
Flags (public, private, protected, static, final, volatile, transient) Name Descriptor
Attributes
Fields consist of...
flags
followed by a name - actually an index to a UTF8 string entry in the constant pool
followed by a descriptor - i.e. the field type - also an index into the constant pool
- the type is the raw (erased) type
followed by attributes
- constant value
- specific (generic) type information - List<String>, etc.

Field Format (example)
Flags Name Descriptor Attributes
Name: “name”
Descriptor: “Ljava/lang/String;”
Attributes: ConstantValue

Method Format
Flags (public, private, protected, static, final, synchronized, native, varargs, strictfp) Name Descriptor
Attributes
Methods consist of...
flags
followed by a name - an index into the constant pool
followed by a descriptor - i.e. the raw parameter types and return type
followed by attributes
- exceptions & code
- specific exception information
- debugging information

Method Format (example)
Flags Name Descriptor Attributes
Name: “main”
Descriptor: “([Ljava/lang/String;)V”
Attributes: Exceptions, Code

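Constant Pool
1 Class -> 2
2 UTF8 (length 10) “HelloWorld”
3 Class -> 4
4 UTF8 (length 16) “java/lang/Object”
5 UTF8 (length 6) “<init>”
6 UTF8 (length 3) “()V”
7 UTF8 (length 4) “Code”
8 Methodref -> 3, 9
9 NameAndType -> 5, 6
10 UTF8 (length 4) “main”
11 UTF8 (length 22) “([Ljava/lang/String;)V”
12 Fieldref -> 13, 15
13 Class -> 14
14 UTF8 (length 16) “java/lang/System”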

Dissect the “Hello World” example a little...
Entry 1 is a class entry - a 2-byte index to a UTF8 entry that contains the name
Entry 2 is the name of the class
Similarly...
Entry 3 is a class entry for the parent class - it refers to Entry 4, which is the full name of the parent class
Skip over the constructor “<init>” and focus on main
Entry 10 is the name “main” & Entry 11 is the raw type descriptor for “main”
The [Ljava/lang/String; indicates String[] - the trailing V indicates that it returns void
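Descriptors like Entry 11 follow a small grammar. Each primitive type is a single letter (I for int, J for long, Z for boolean, V for void, and so on). L...; names a class, and a leading [ marks an array. The sketch below decodes a method descriptor back into Java-like syntax. The class and method names (Descriptors, describeMethod) are hypothetical, and the sketch assumes well-formed input.

```java
import java.util.ArrayList;
import java.util.List;

public final class Descriptors {
    /** Decodes one type starting at pos[0] and advances pos[0] past it. */
    static String parseType(String d, int[] pos) {
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";                    // only meaningful as a return type
            case '[': return parseType(d, pos) + "[]";  // array of the following type
            case 'L': {                                 // class type: L<internal/name>;
                int end = d.indexOf(';', pos[0]);
                String name = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                return name;
            }
            default: throw new IllegalArgumentException("bad descriptor character: " + c);
        }
    }

    /** Renders "([Ljava/lang/String;)V" as "void (java.lang.String[])". */
    static String describeMethod(String d) {
        if (d.charAt(0) != '(') {
            throw new IllegalArgumentException("not a method descriptor: " + d);
        }
        int[] pos = {1};
        List<String> params = new ArrayList<>();
        while (d.charAt(pos[0]) != ')') {
            params.add(parseType(d, pos));
        }
        pos[0]++;                                       // skip ')'
        String returnType = parseType(d, pos);
        return returnType + " (" + String.join(", ", params) + ")";
    }

    public static void main(String[] args) {
        System.out.println(describeMethod("([Ljava/lang/String;)V"));  // void (java.lang.String[])
        System.out.println(describeMethod("()V"));                     // void ()
        System.out.println(describeMethod("(IJLjava/lang/Object;)Z")); // boolean (int, long, java.lang.Object)
    }
}
```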

Browsing Class File Format

JClassLib Viewer http://www.ej-technologies.com/products/jclasslib/overview.html


ConstantValue
public final class HelloWorld {
public static final String MESSAGE = "Hello, World!";

public static final void main( final String... args ) {
System.out.println( MESSAGE );
}
}


Here, because the “MESSAGE” field is “static final” and initialized with a compile-time constant,
its value is stored in a “ConstantValue” attribute on the “MESSAGE” field.

Exceptions
public interface InputStreamProvider {
public abstract InputStream open() throws IOException;
}


Exception information is also stored in an attribute (Exceptions).
As it turns out, the JVM makes no distinction between checked and unchecked exceptions, which has an interesting
implication...

Exceptions
public final class NewInstance {
    public static void main(String... args) {
        try {
            Class.forName("net.dougqh.runtime.SomeClass").newInstance();
        } catch (
            InstantiationException |
            IllegalAccessException |
            ClassNotFoundException e)
        {
            e.printStackTrace();
        }
    }
}

public class SomeClass {
    public SomeClass() throws SomeException {
        throw new SomeException();
    }

    // a checked exception, nested (hence SomeClass$SomeException in the trace)
    public static class SomeException extends Exception {}
}

Exception in thread "main" net.dougqh.runtime.SomeClass$SomeException
    at net.dougqh.runtime.SomeClass.<init>
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0
    at sun.reflect.NativeConstructorAccessorImpl.newInstance
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance
    at java.lang.reflect.Constructor.newInstance
    at java.lang.Class.newInstance0
    at java.lang.Class.newInstance
    at net.dougqh.runtime.NewInstance.main


www.javapuzzlers.com
Because of an oversight in the original reflection API, Class.newInstance can throw a checked exception that is
not reported by the compiler: here, SomeException escapes main even though nothing declares or catches it.
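The same JVM behavior can be triggered without reflection. The sketch below uses a well-known trick that does not come from the talk. It casts to a type parameter, and that cast is erased at compile time, so a method can throw a checked IOException without declaring it. The names SneakyThrow, sneakyThrow, and openWithoutDeclaring are hypothetical.

```java
import java.io.IOException;

public final class SneakyThrow {
    // javac checks "throws T" against the T chosen by the caller; after erasure the
    // cast to T is a no-op, so the JVM simply throws whatever Throwable it was given.
    @SuppressWarnings("unchecked")
    static <T extends Throwable> void sneakyThrow(Throwable t) throws T {
        throw (T) t;
    }

    // Note: no "throws IOException" clause, yet an IOException escapes.
    static void openWithoutDeclaring() {
        SneakyThrow.<RuntimeException>sneakyThrow(new IOException("checked, but undeclared"));
    }

    public static void main(String[] args) {
        try {
            openWithoutDeclaring();
        } catch (Exception e) {
            // javac would reject "catch (IOException e)" here, since nothing declares it
            System.out.println(e.getClass().getName() + ": " + e.getMessage());
        }
    }
}
```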

Generics
public final class Generics {
public static final List<String> getStrings() {
return Collections.singletonList("foo");
}
}


Here, getStrings(), which returns List<String>, has a method descriptor that uses the raw type List: ()Ljava/util/List;
However, the exact generic type information is stored in the “Signature” attribute: ()Ljava/util/List<Ljava/lang/String;>;
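Reflection exposes both pieces of information. getReturnType() reflects the erased descriptor, while getGenericReturnType() is rebuilt from the Signature attribute. Here is a minimal sketch (the class name GenericsReflection is made up for the example):

```java
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.List;

public final class GenericsReflection {
    public static final List<String> getStrings() {
        return Collections.singletonList("foo");
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method m = GenericsReflection.class.getMethod("getStrings");
        // From the descriptor: ()Ljava/util/List;
        System.out.println(m.getReturnType().getName());            // java.util.List
        // From the Signature attribute: ()Ljava/util/List<Ljava/lang/String;>;
        System.out.println(m.getGenericReturnType().getTypeName()); // java.util.List<java.lang.String>
    }
}
```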

Annotations
@Inherited
@Retention( RetentionPolicy.RUNTIME )
public @interface Annotation {
public int foo() default 20;

public String bar();
}

@Annotation( bar="quux" )
class Annotated {}


An annotation is just an interface
The default values for each element are stored in an AnnotationDefault attribute on the corresponding method
The annotation information on a class or method is also stored in an attribute
In this case, since the annotation has a RUNTIME RetentionPolicy, it is stored in the RuntimeVisibleAnnotations
attribute
The values for the annotation are stored as element-value pairs within that attribute
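The annotation is kept in RuntimeVisibleAnnotations and its default in AnnotationDefault, so reflection can reach both. The sketch nests the slide's types inside a hypothetical AnnotationDemo class so everything fits in one file. Inside that class, the nested annotation type named Annotation shadows java.lang.annotation.Annotation, which is harmless here. SubAnnotated is an added, hypothetical subclass that shows the effect of @Inherited.

```java
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public final class AnnotationDemo {
    @Inherited
    @Retention(RetentionPolicy.RUNTIME)
    @interface Annotation {
        int foo() default 20;
        String bar();
    }

    @Annotation(bar = "quux")
    static class Annotated {}

    // @Inherited makes the class annotation visible on subclasses too
    static class SubAnnotated extends Annotated {}

    public static void main(String[] args) throws NoSuchMethodException {
        Annotation a = Annotated.class.getAnnotation(Annotation.class);
        System.out.println(a.bar());                                             // quux (explicit value)
        System.out.println(a.foo());                                             // 20 (the default)
        System.out.println(Annotation.class.getMethod("foo").getDefaultValue()); // 20
        System.out.println(Annotation.class.isInterface());                      // true
        System.out.println(SubAnnotated.class.getAnnotation(Annotation.class) != null); // true
    }
}
```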

Byte Code


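Stack Based Virtual Machine
0 iconst_1    stack: 1
1 iconst_2    stack: 2, 1
2 iadd        stack: 3 (1+2)
3 istore_0    stack: (empty)    local slot 0: 3
4 iload_0     stack: 3          local slot 0: 3
(local variable slots 0 1 2 3; top of stack listed first)
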

The JVM byte code format is stack-based, like many other VMs: CLR, PHP, and Python
In this example, the green field is the heap, the bar above it holds the local variable slots, and the column to the left is the operand stack

Let’s look at how to add 1 + 2 together and store the result into a local variable
First, we use an iconst_1 instruction to push 1 onto the stack
Java has special instructions for common numbers: -1 to 5 (iconst_m1 through iconst_5)
Next, an iconst_2 to push 2 onto the stack
Next, we use iadd, which pops the 1 & 2 off the stack, adds them together, and pushes the result back onto the stack
Next, we use an istore_0 to pop the result and store it into the first local variable slot
To load the value back from the local variable slots, we use an iload_0
Note: Similar to iconst, there are special istore/iload instructions for the most often used slots: 0-3
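The push/pop discipline can be made concrete with a toy interpreter for just the five instructions on the slide. This is not how HotSpot executes byte code. The instructions are plain strings, and the class and method names are made up.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public final class TinyStackMachine {
    /** Runs a toy program against the given local slots; returns the value left on top of the stack. */
    static int run(String[] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack; push/pop work on the head
        for (String insn : program) {
            if (insn.startsWith("iconst_")) {
                String n = insn.substring("iconst_".length());
                stack.push(n.equals("m1") ? -1 : Integer.parseInt(n)); // iconst_m1 .. iconst_5
            } else if (insn.equals("iadd")) {
                int right = stack.pop();
                int left = stack.pop();
                stack.push(left + right);                              // pop two, push the sum
            } else if (insn.startsWith("istore_")) {
                locals[Integer.parseInt(insn.substring("istore_".length()))] = stack.pop();
            } else if (insn.startsWith("iload_")) {
                stack.push(locals[Integer.parseInt(insn.substring("iload_".length()))]);
            } else {
                throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
            // ArrayDeque prints its head first, i.e. the top of the stack first
            System.out.printf("%-9s stack=%s locals=%s%n", insn, stack, Arrays.toString(locals));
        }
        return stack.peek();
    }

    public static void main(String[] args) {
        String[] program = {"iconst_1", "iconst_2", "iadd", "istore_0", "iload_0"};
        System.out.println("result = " + run(program, new int[4])); // result = 3
    }
}
```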


Parameters and Local Variables
static int volume(
    int width,
    int depth,
    int height )
{
    int area = width * depth;
    int volume = area * height;
    return volume;
}

local slots: 0 width, 1 depth, 2 height, 3 area, 4 volume

0 iloadand_0


Trace through a slightly more complicated example: calculating volume
- arguments are passed in the low local variable slots - 0 - 2 in this case
- first, to calculate area, load width and depth from slots 0 & 1 respectively
- multiply the values on the stack, then store the result into slot 3: area
- reload area & height - slots 3 & 2 respectively
- multiply the values and store the result into slot 4: volume
- reload volume and return it
Yes, the value is stored and then immediately reloaded in the byte code. Starting with Java 1.3, byte code is not
optimized by javac; almost all optimizations are left to the JVM to perform.

Static vs Virtual Methods
int volume(
    int width,
    int depth,
    int height )
{
    int area = width * depth;
    int volume = area * height;
    return volume;
}

local slots: 0 this, 1 width, 2 depth, 3 height, 4 area, 5 volume

0 iload_1
1 iload_2
2 imul
3 istore 4
5 iload 4
7 iload_3
8 imul
9 istore 5
11 iload 5
13 ireturn


In the prior example, you may have noticed that the method was static.
If the method isn’t static, then “this” is implicitly passed in the first slot (slot 0).
So, our arguments start at slot 1, and the loads and stores all shift accordingly.
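The slot numbering can be computed from a method's signature. The helper parameterSlots below is hypothetical, not a JDK API. It reserves slot 0 for this in instance methods, and it also covers a detail the slides don't show: long and double parameters occupy two consecutive slots. The methods volume and volumeVirtual mirror the two slides (the instance version is renamed so both fit in one class), and wide is an extra example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

public final class SlotLayout {
    /** Hypothetical helper: the local-variable slot where each parameter of m starts. */
    static int[] parameterSlots(Method m) {
        int slot = Modifier.isStatic(m.getModifiers()) ? 0 : 1; // instance methods reserve slot 0 for "this"
        Class<?>[] params = m.getParameterTypes();
        int[] slots = new int[params.length];
        for (int i = 0; i < params.length; i++) {
            slots[i] = slot;
            // long and double take two consecutive slots; every other type takes one
            slot += (params[i] == long.class || params[i] == double.class) ? 2 : 1;
        }
        return slots;
    }

    static int volume(int width, int depth, int height) {
        return width * depth * height;
    }

    int volumeVirtual(int width, int depth, int height) {
        return width * depth * height;
    }

    static long wide(long start, int count) {
        return start + count;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method s = SlotLayout.class.getDeclaredMethod("volume", int.class, int.class, int.class);
        Method v = SlotLayout.class.getDeclaredMethod("volumeVirtual", int.class, int.class, int.class);
        Method w = SlotLayout.class.getDeclaredMethod("wide", long.class, int.class);
        System.out.println(Arrays.toString(parameterSlots(s))); // [0, 1, 2]
        System.out.println(Arrays.toString(parameterSlots(v))); // [1, 2, 3]
        System.out.println(Arrays.toString(parameterSlots(w))); // [0, 2]
    }
}
```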

Hello World
System.out.println( “Hello World” );
0 getstatic System.out
3 ldc “Hello World”
5 invokevirtual PrintStream.println
8 return
(stack before the invokevirtual: “Hello World” on top of System.out; local slots 0 1 2 3)


Now, we know enough to understand “Hello World”

The first operation is a getstatic to load the value of System.out onto the stack
We need this reference to invoke println
Second, load the string “Hello World” onto the stack - the ldc indicates a load from the constant pool
Now, since this is a non-static method on a class, use invokevirtual to invoke PrintStream.println
This consumes the reference to System.out (which becomes “this” inside PrintStream.println) and the reference to
“Hello World”
These values are then mapped to the local slots for “this” and “msg” in the new stack frame

Hello World (inside println)
new stack frame for PrintStream.println - local slots: 0 this = System.out, 1 msg = “Hello World”


Types of Method Invocations
invokestatic - invoke static methods
invokevirtual - invoke instance method from class
invokeinterface - invoke instance method from interface
invokespecial - invoke <init> / invoke super method
invokedynamic - optimized dynamic look-up (in Java 7)


We’ve seen a call to invokevirtual, which is used for instance methods invoked through a class reference, but there are other invocation types, too.
invokestatic - for static methods
invokeinterface - for methods invoked through an interface reference (rather than a class reference)
invokespecial - for direct targets - like constructors or invoking a super method, where the call is not polymorphic
invokedynamic - new in Java 7, used by dynamic languages like JRuby for improved performance
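The java.lang.invoke API, added in Java 7 alongside invokedynamic, names the same invocation behaviors as "reference kinds". In the sketch below (Java 8+ for revealDirect; the class and helper names are made up), each lookup method returns a direct method handle. Its reference kind matches the instruction javac emits for the equivalent call in the earlier examples. One naming difference: a constructor handle reports newInvokeSpecial, which stands for the whole new/dup/invokespecial <init> sequence.

```java
import java.io.Closeable;
import java.io.PrintStream;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.math.BigDecimal;
import java.util.Arrays;

public final class InvocationKinds {
    /** Returns the reference kind of a direct method handle, e.g. "invokeStatic". */
    static String kind(MethodHandles.Lookup lookup, MethodHandle handle) {
        MethodHandleInfo info = lookup.revealDirect(handle);
        return MethodHandleInfo.referenceKindToString(info.getReferenceKind());
    }

    static String[] describeAll() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle max = lookup.findStatic(Math.class, "max",
                MethodType.methodType(int.class, int.class, int.class));  // like Math.max(10, 20)
        MethodHandle println = lookup.findVirtual(PrintStream.class, "println",
                MethodType.methodType(void.class, String.class));         // like System.out.println(...)
        MethodHandle close = lookup.findVirtual(Closeable.class, "close",
                MethodType.methodType(void.class));                       // like in.close() on a Closeable
        MethodHandle ctor = lookup.findConstructor(BigDecimal.class,
                MethodType.methodType(void.class, String.class));         // like new BigDecimal("2.0")
        return new String[] {
            kind(lookup, max), kind(lookup, println), kind(lookup, close), kind(lookup, ctor)
        };
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // [invokeStatic, invokeVirtual, invokeInterface, newInvokeSpecial]
        System.out.println(Arrays.toString(describeAll()));
    }
}
```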

New Object
BigDecimal num = new BigDecimal(“2.0”);
0 new BigDecimal
3 dup
4 ldc “2.0”
6 invokespecial BigDecimal.<init>
9 astore_0
(local slot 0: num)


Now, let’s look at an object allocation
The first step, new, allocates the object; however, this step does not yet invoke the constructor
It just allocates space on the heap for the object and pushes a reference to that uninitialized memory
Since invoking the constructor will consume a reference to the newly allocated BigDecimal, we
need a copy (“dup”), so that we’ll still have a reference left to store into “num”.
Next, we push “2.0” onto the stack
Then we invoke BigDecimal.<init>, which is the BigDecimal constructor.
It consumes the reference to “2.0” and the duplicate reference, leaving us with one reference to assign into “num”.

As you can see, construction is rather complicated; some of the past security holes in the byte code verifier
involved object construction because the sequence is non-trivial.
The CLR learned from this and has a single instruction (“newobj”) that both allocates and invokes the constructor, thus
making byte code verification easier.

From this example, you can also see why double-checked locking is broken in Java. Construction isn’t a single
step, and because of reordering, a reference to a not-yet-initialized object can be assigned to a field and seen by
another thread.
Since Java 5, declaring the field volatile establishes a “happens-before” relationship, so a thread that reads the
field will never see the reference before the constructor has finished.
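With the volatile fix mentioned above, double-checked locking is usually written as the following idiom. This is a minimal sketch. The class name and the lazily created BigDecimal are hypothetical, chosen to echo the slide.

```java
import java.math.BigDecimal;

public final class DoubleCheckedLocking {
    // volatile is what makes this idiom safe under the Java 5+ memory model: the write of
    // the fully constructed object happens-before any read that sees the reference.
    private static volatile BigDecimal value;

    static BigDecimal get() {
        BigDecimal result = value;                 // a single volatile read on the fast path
        if (result == null) {
            synchronized (DoubleCheckedLocking.class) {
                result = value;                    // re-check while holding the lock
                if (result == null) {
                    result = new BigDecimal("2.0");
                    value = result;                // publish the reference after construction
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(get());           // 2.0
        System.out.println(get() == get());  // true: created only once
    }
}
```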


Demo
javap -c


Conditionals
Original:
if ( x > 0 ) {
    return true;
} else {
    return false;
}
Byte Code:
0: iload_0
1: ifle 6
4: iconst_1
5: ireturn
6: iconst_0
7: ireturn

Original:
return x > 0 ? true : false;
Byte Code:
0: iload_0
1: ifle 8
4: iconst_1
5: goto 9
8: iconst_0
9: ireturn

Original:
return ( x > 0 );
Byte Code:
0: iload_0
1: ifle 8
4: iconst_1
5: goto 9
8: iconst_0
9: ireturn


Three ways to write a method that checks if a number is greater than 0.
The byte code is almost the same in all 3 cases; the ternary and the plain boolean expression compile to identical byte code.

Invoke Static
Original:
Math.max(10, 20);
Byte Code:
0: bipush 10
2: bipush 20
4: invokestatic Math.max
7: pop
8: return


Here, we see an extra pop after the invokestatic call.
That’s because the return value of max is left on the stack; since we don’t use it, the compiler generates a pop to
discard it.
If we store the value in a variable, the pop will be replaced with an istore

Invocations
Original:
FileInputStream in = new FileInputStream("foo");
in.close();
Byte Code:
0: new FileInputStream
3: dup
4: ldc "foo"
6: invokespecial FileInputStream.<init>
9: astore_0
10: aload_0
11: invokevirtual FileInputStream.close
14: return

Original:
Closeable in = new FileInputStream("foo");
in.close();
Byte Code:
0: new FileInputStream
3: dup
4: ldc "foo"
6: invokespecial FileInputStream.<init>
9: astore_0
10: aload_0
11: invokeinterface Closeable.close
16: return


In one example, close is called on the class type FileInputStream; in the other, it is called on the interface type
Closeable
In the first case, the compiler generates an invokevirtual call
In the second case, the compiler generates an invokeinterface call (a 5-byte instruction rather than 3 bytes, which is
why the return lands at 16 instead of 14)

For Loop
static int sum( int min, int max ){
    int sum = 0;
    for ( int i=min; i<max; ++i ){
        sum += i;
    }
    return sum;
}

before loop
0 iconst_0
1 istore_2
init & test
2 iload_0
3 istore_3
4 goto +10 //14
loop body
7 iload_2
8 iload_3
9 iadd
10 istore_2
inc
11 iinc 3 by 1
test
14 iload_3
15 iload_1
16 if_icmplt -9 //7
after loop
19 iload_2
20 ireturn


Examine a for loop example

The first 2 ops are the initialization of “sum”, load 0 and store in “sum” (slot 2)
The next 3 ops are the loop initialization and jump to the initial test...
- load the value of “min” (slot 0) into “i” (slot 3)
- then jump to the test
The test is placed at the end since it is generally performed after the body and step portions of the loop
The test...
- loads “i” (slot 3) and “max” (slot 1)
- if “i” is less than “max”, then it jumps back 9 bytes to the start of the loop body
The loop body...
- loads and adds “sum” and “i” (slots 2 and 3) and stores the result back into “sum” (slot 2)
Then the step / increment part of the loop happens...
- which just increments “i”
Then we flow straight into the test portion
If the test fails, we flow through to the after loop portion
Here, we load “sum” (slot 2) and return the result
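Java source has no goto, so the compiled layout can only be approximated in source. In the sketch below, an if guarding a do/while stands in for the initial jump to the test. The layout is otherwise the same: the body, then the step, then the test at the bottom. The class and method names are hypothetical.

```java
public final class BottomTestedLoop {
    /** The loop as written in the slide's source. */
    static int sum(int min, int max) {
        int sum = 0;
        for (int i = min; i < max; ++i) {
            sum += i;
        }
        return sum;
    }

    /** The same loop rearranged roughly the way the byte code lays it out. */
    static int sumAsCompiled(int min, int max) {
        int sum = 0;            // 0 iconst_0, 1 istore_2
        int i = min;            // 2 iload_0, 3 istore_3
        if (i < max) {          // stands in for "4 goto 14": the body is skipped if the first test fails
            do {
                sum += i;       // 7-10: loop body
                ++i;            // 11: iinc 3 by 1
            } while (i < max);  // 14-16: test at the bottom; if_icmplt branches back to the body
        }
        return sum;             // 19 iload_2, 20 ireturn
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 5) + " " + sumAsCompiled(1, 5)); // 10 10
    }
}
```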

Exception Handling
static int read( InputStream in ) {
    try {
        return in.read();
    } catch ( IOException e ) {
        return -1;
    } finally {
        IoUtils.closeQuietly( in );
    }
}

try / finally
0 aload_0
1 invokevirtual InputStream.read
4 istore_1
5 aload_0
6 invokestatic IoUtils.closeQuietly
9 iload_1
10 ireturn
catch / finally
11 pop
12 aload_0
13 invokestatic IoUtils.closeQuietly
16 iconst_m1
17 ireturn
finally
18 astore_2
19 aload_0
20 invokestatic IoUtils.closeQuietly
23 aload_2
24 athrow

Exception Table
start end handler Exception
0 5 11 IOException
0 5 18 any
11 12 18 any

Now, exception handling...
Exceptions are handled through extra meta-information (the exception table) that says how to handle different types
of exceptions over a range of byte code instructions.

The finally code is inlined in the try, catch, and finally portions of the generated byte code.
(Prior to Java 6, the regular javac compiler generated “jsr” and “ret” to jump to a single block of compiled “finally”
code.)

The “try / finally” section represents the normal flow.
- invoke InputStream.read
- store the result into an unnamed temporary variable (slot 1), because we need to run the finally code first
- run the finally code
- reload the temporary variable and return

The “catch / finally” section handles the IOException...
The exception table says that if an IOException is raised between instructions 0 and 5 (the try), jump to 11, the
start of this catch section.
The first step is a “pop”. Pop what? In this case, the IOException, which was automatically placed on the stack. Since
we don’t use it, we discard it. This implies that “e” is never assigned a local variable slot by the compiler.
Now, invoke IoUtils.closeQuietly (the finally block), then return -1.

The “finally” section handles everything else: the “any” entries send any other exception thrown in the try (0-5)
or in the catch (11-12) to 18, which stores the exception, runs the finally code, and then rethrows it with athrow.

Synchronization
void inc() {
    synchronized ( this ) {
        ++this.num;
    }
}

before try
0 aload_0
1 dup
2 astore_1
3 monitorenter
try / finally
4 aload_0
5 dup
6 getfield Counter.num
9 iconst_1
10 iadd
11 putfield Counter.num
14 aload_1
15 monitorexit
16 goto +6 //22
finally
19 aload_1
20 monitorexit
21 athrow
22 return

Exception Table
start end handler Exception
4 16 19 any
19 21 19 any

Interestingly enough, synchronization works the same way.
To understand synchronization, it is better to look at it as a lock and unlock within a try / finally.
And, that’s exactly how the byte code works.
And, just like a regular try / finally, the unlock (monitorexit) is inlined in both the normal path and the finally handler.

Synchronization
void inc() {
    lock( this );          // pseudocode for monitorenter
    try {
        ++this.num;
    } finally {
        unlock( this );    // pseudocode for monitorexit
    }
}
(the byte code and exception table are the same as for the synchronized block)
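java.util.concurrent.locks.ReentrantLock (Java 5+) lets you write the same lock/unlock-in-try/finally shape in real source. This sketch is not what javac generates for synchronized, which uses monitorenter/monitorexit on the object's own monitor. It only shows the equivalent structure. The class name LockedCounter is hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

public final class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int num;

    void inc() {
        lock.lock();            // plays the role of monitorenter
        try {
            ++num;
        } finally {
            lock.unlock();      // plays the role of monitorexit, on normal and exceptional exit
        }
    }

    int get() {
        lock.lock();
        try {
            return num;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedCounter counter = new LockedCounter();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    counter.inc();
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(counter.get()); // 4000: no increments are lost
    }
}
```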


Demo
Java 5
Java 7


In these demos, I demonstrate new language features by showing Java 5 and Java 7 code and then showing what it looks
like when it’s decompiled back into Java 4 code.
JAD - http://www.varaneckas.com/jad

Java 5



Auto-Boxing
Original:
public class AutoBoxing {
    public static void main(String[] args) {
        Integer foo = 20;
        Integer bar = 30;

        int sum = foo + bar;
        System.out.println(sum);
    }
}
Decompiled as Java 4:
public class AutoBoxing {
    public static void main(String args[]) {
        Integer foo = Integer.valueOf(20);
        Integer bar = Integer.valueOf(30);

        int sum = foo.intValue() + bar.intValue();
        System.out.println(sum);
    }
}


Here, we see how auto-boxing works.
The compiler injects the necessary calls to Integer.valueOf and Integer.intValue for us.
NOTE: Even if you don’t like auto-boxing, please call Integer.valueOf rather than calling new Integer.
Unlike new, Integer.valueOf returns cached instances of Integer for commonly used values (always at least -128 to 127).
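The cache is observable with ==, because boxing compiles to Integer.valueOf. The sketch below is illustrative (the class and method names are made up). Values outside -128..127 are not guaranteed to be cached: the upper bound can be raised with a JVM setting, so the 128 case is only false under default settings.

```java
public final class BoxingCache {
    /** Boxes the same int twice and reports whether both boxes are the identical object. */
    static boolean sameBox(int value) {
        Integer a = value;   // compiled to Integer.valueOf(value)
        Integer b = value;   // compiled to Integer.valueOf(value)
        return a == b;       // reference comparison, not value comparison
    }

    public static void main(String[] args) {
        System.out.println(sameBox(127));   // true: valueOf always caches -128..127
        System.out.println(sameBox(128));   // false with default JVM settings (outside the cache)
        Integer c = 128;
        Integer d = 128;
        System.out.println(c.equals(d));    // true: compare boxed values with equals, not ==
    }
}
```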

Enhanced For
Original:
public class EnhancedFor {
    static void array(String[] args) {
        for ( String arg : args ) {
            System.out.println(arg);
        }
    }

    static void iterable(
        Iterable<String> args)
    {
        for ( String arg: args ) {
            System.out.println(arg);
        }
    }
}
Decompiled as Java 4:
public class EnhancedFor {
    static void array(String args[]) {
        String arr$[] = args;
        int len$ = arr$.length;
        for (int i$ = 0; i$ < len$; i$++) {
            String arg = arr$[i$];
            System.out.println(arg);
        }
    }

    static void iterable(Iterable args) {
        String arg;
        for (Iterator i$ = args.iterator();
             i$.hasNext(); )
        {
            arg = (String) i$.next();
            System.out.println(arg);
        }
    }
}


In this slide, we see how the enhanced for gets handled by the compiler.

The array for loop converts to the canonical C-style loop, with one slight difference: the array length is hoisted
out of the loop as an invariant. (Although, this is a rather pointless optimization, because the JVM would do this at
runtime anyway.)

For an Iterable, a loop that uses an iterator is generated. In this example, we can also see that the compiler
injects a cast to the exact type, String, too.

Var-Args
Original:
public final class VarArgs {
    public static void main(String... args) {
        System.out.printf(
            "Hello %s %s", "Jon", "Doe");
    }
}
Decompiled as Java 4:
public final class VarArgs {
    public static transient void main(
        String[] args)
    {
        System.out.printf(
            "Hello %s %s",
            new Object[] {"Jon", "Doe"});
    }
}


In this example, we see var-args being used both in the signature and in the call to printf.
NOTE: I’ve declared a main method with var-args; since at the byte code level this is still just a String[], this
actually works just fine.

The “transient” modifier in the decompiled Java 4 is a bit amusing. This happens because Java ran out of flag bits
to use in Java 5, so they overloaded the “transient” bit (0x0080), which only applies to fields, to mean “var-args”
when applied to methods.

In the call to printf, we can see that the compiler injects a construction of a new Object[] and passes it as the last
arg to printf.
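The shared 0x0080 bit also shows up through reflection. Method.getModifiers() returns the raw method flags, so the generic Modifier helpers read the var-args bit as "transient". The class and method names below are made up for the example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public final class VarArgsFlag {
    public static void varArgs(String... args) {}

    public static void array(String[] args) {}

    public static void main(String[] args) throws NoSuchMethodException {
        // At the byte code level both parameters are String[]
        Method v = VarArgsFlag.class.getMethod("varArgs", String[].class);
        Method a = VarArgsFlag.class.getMethod("array", String[].class);
        System.out.println(v.isVarArgs());                          // true
        System.out.println(a.isVarArgs());                          // false
        // 0x0080 means ACC_VARARGS on a method but ACC_TRANSIENT on a field
        System.out.println(Modifier.isTransient(v.getModifiers())); // true
        System.out.println(Modifier.toString(v.getModifiers()));    // public static transient
    }
}
```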

Enum
Original:
public enum AnEnum {
    FOO,
    BAR,
    QUUX
}
Decompiled:
public static final class AnEnum
    extends Enum
{
    public static final AnEnum FOO =
        new AnEnum("FOO", 0);
    public static final AnEnum BAR =
        new AnEnum("BAR", 1);
    public static final AnEnum QUUX =
        new AnEnum("QUUX", 2);
    private static final AnEnum[] $VALUES =
        new AnEnum[]{FOO, BAR, QUUX};

    public static AnEnum[] values() {
        return (AnEnum[]) $VALUES.clone();
    }
    public static AnEnum valueOf(String name){
        return (AnEnum)Enum.valueOf(
            AnEnum.class, name);
    }
    private AnEnum(String s, int i) {
        super(s, i);
    }
}


For enums, the compiler does a great deal of work on your behalf -- even in the simplest case.
The compiler generates a constructor that takes a label and ordinal for each entry.
It then initializes a static final field for each constant from the original file.
These constants are all placed in a private $VALUES array.
Finally, the compiler generates a values() method and a valueOf() method for each enum class.
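The generated members can be observed from ordinary code. With javac, values() returns a fresh clone of the hidden array on every call, as in the decompiled listing. Each constant's ordinal and name are the values passed to the generated constructor. The wrapper class EnumDemo is made up for the example.

```java
import java.util.Arrays;

public final class EnumDemo {
    enum AnEnum { FOO, BAR, QUUX }

    public static void main(String[] args) {
        AnEnum[] first = AnEnum.values();
        AnEnum[] second = AnEnum.values();
        System.out.println(first == second);                 // false: values() clones the array each call
        System.out.println(Arrays.equals(first, second));    // true: same constants, same order
        System.out.println(AnEnum.valueOf("BAR").ordinal()); // 1: the ordinal passed to the constructor
        System.out.println(AnEnum.QUUX.name());              // QUUX: the label passed to the constructor
    }
}
```

Because of that clone, code that calls values() in a hot loop pays for an array copy each time. Caching the result in a local variable avoids it.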

Covariance
Original:
public interface Parent {
    Number calculate();
}

public class CovariantChild
    implements Parent
{
    public Integer calculate() {
        return 10;
    }
}
Decompiled:
public static interface Parent {
    public abstract Number calculate();
}

public class CovariantChild
    implements Parent
{
    public Integer calculate() {
        return Integer.valueOf(10);
    }

    public volatile Number calculate() {
        return calculate();
    }
}


A lesser known addition to Java 5 is the ability to have a covariant return type.
Here, the child type returns a more specific type of Number -- namely Integer.

The generated code is interesting. We end up with two “calculate” methods - one that returns Integer and another
that returns Number. The one that returns Number (a compiler-generated “bridge” method) satisfies the contract of
the parent and simply calls the more specific version that returns Integer.

Here, again, we see a curious modifier on a method: “volatile”. This is another situation where Java 5 overloaded an
existing flag bit: the bridge flag (0x0040) shares its value with volatile.

For more information on why this is type-safe, look up the Liskov Substitution Principle.
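Reflection can confirm the two methods. Method.isBridge() reports the generated method, and because the bridge flag shares bit 0x0040 with volatile, Modifier.isVolatile is true for it as well. The sketch nests the slide's types in a made-up BridgeDemo class. getDeclaredMethods() returns methods in no particular order, so each printed line describes itself.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public final class BridgeDemo {
    interface Parent {
        Number calculate();
    }

    static class CovariantChild implements Parent {
        @Override
        public Integer calculate() {
            return 10;
        }
    }

    public static void main(String[] args) {
        // Expect one Integer method (bridge=false) and one Number method (bridge=true, volatileBit=true).
        for (Method m : CovariantChild.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName() + " calculate()"
                    + " bridge=" + m.isBridge()
                    + " synthetic=" + m.isSynthetic()
                    + " volatileBit=" + Modifier.isVolatile(m.getModifiers()));
        }
    }
}
```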

Java 7


Multi-Catch
Original:
public final class EnhancedCatch {
    public static void main(String[] args){
        try {
            Class.
                forName("some.package.SomeClass").
                newInstance();
        } catch (
            InstantiationException |
            IllegalAccessException |
            ClassNotFoundException e)
        {
            throw new IllegalStateException(e);
        }
    }
}
Decompiled:
public final class EnhancedCatch {
    public static void main(String args[]) {
        try {
            Class.
                forName("some.package.SomeClass").
                newInstance();
        } catch (ReflectiveOperationException e){
            throw new IllegalStateException(e);
        }
    }
}


Java 7 adds the ability to handle multiple exception types in a single catch.
Great for ugly reflection code.
Here, the decompiled catch of all the reflection exceptions shows up as a single catch of their common parent,
ReflectiveOperationException (a new base class for reflection exceptions, also introduced in Java 7). In the byte code
itself, each alternative gets its own exception table entry, all pointing at the same handler.

Try With Resources
Original:
public class EnhancedTry {
    public static void main(
        String[] args)
        throws IOException
    {
        Properties properties =
            new Properties();

        try (InputStream in =
            new FileInputStream("my.properties"))
        {
            properties.load(in);
        }
    }
}
Decompiled (cleaned up):
public class EnhancedTry {
    public static void main(String args[])
        throws IOException
    {
        Properties properties = new Properties();
        InputStream in =
            new FileInputStream("my.properties");
        Throwable throwable = null;
        try {
            properties.load(in);
        } catch (Throwable throwable1) {
            throwable = throwable1;
            throw throwable1;
        } finally {
            if (in != null) {
                if (throwable != null) {
                    try {
                        in.close();
                    } catch (Throwable x2) {
                        throwable.addSuppressed(x2);
                    }
                } else {
                    in.close();
                }
            }
        }
    }
}

Java 7 also enhances try by allowing it to automatically close resources.
It generates a try / finally similar to what you’d write by hand.
Although, it puts the resource acquisition outside the try (which is correct but uncommon among many Java
programmers).
However, it does one more thing: it also adds code so that, if an exception happens when closing, the original
exception from the body is still the one propagated. And, even better, the exception raised by close() is added to the
suppressed list of the original exception using the new Java 7 method Throwable.addSuppressed.
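The suppression behavior is easy to observe with a resource whose close() always fails. In the sketch below, FailingResource and the other names are hypothetical. The body's exception is the one that propagates, and the failure from close() is attached to it.

```java
public final class SuppressedDemo {
    // Hypothetical resource whose close() always fails.
    static final class FailingResource implements AutoCloseable {
        @Override
        public void close() throws Exception {
            throw new IllegalStateException("close failed");
        }
    }

    /** Runs a try-with-resources whose body and close() both throw; returns what was caught. */
    static Throwable run() {
        try (FailingResource resource = new FailingResource()) {
            throw new RuntimeException("body failed");
        } catch (Exception e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable t = run();
        System.out.println(t.getMessage());                    // body failed
        System.out.println(t.getSuppressed()[0].getMessage()); // close failed
    }
}
```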

String Switch
Original:
switch (args[0]) {
    case "Hello":
        System.out.println("Hello, World!");
        break;

    case "Bye":
        System.out.println("Good Bye, World!");
        break;

    case "9\uffe7":
        System.out.println("Collision");
        break;
}
Decompiled:
String s = args[0];
byte byte0 = -1;
switch(s.hashCode()) {
    case 69609650: ... break;
    case 67278:
        if(s.equals("9\uFFE7")) {
            byte0 = 2;
        } else if(s.equals("Bye")) {
            byte0 = 1;
        }
        break;
}
switch(byte0) {
    case 0:
        System.out.println("Hello, World!");
        break;

    case 1:
        System.out.println("Good Bye, World!");
        break;

    case 2:
        System.out.println("Collision");
        break;
}

One last example from Java 7 -- string switch

String switch is implemented as a switch on the String’s hashCode.
However, hashCode is not unique, so the generated code must also perform an equals check.

To handle this, string switch actually generates two switch statements.
The first switches on the hashCode and, after the equals check, assigns a temporary variable the index of the
matching case from the original code.

Then the second switches on that index, each case containing the code from the original Java 7 cases.
Here, I’ve deliberately created a hash collision, so you can see how collisions are resolved: "Bye" and "9\uffe7"
both have the hashCode 67278.
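The two-switch translation can be written out by hand. This sketch approximates the decompiled shape above. The method name dispatch and the "no match" result are hypothetical, and javac's real output uses a synthetic temporary rather than a named variable. "9\uffe7" collides with "Bye" because '9' (57) * 31 + 0xFFE7 (65511) = 67278.

```java
public final class StringSwitchSketch {
    /** Hand-written two-stage version of the string switch. */
    static String dispatch(String s) {
        int index = -1;                    // plays the role of byte0
        switch (s.hashCode()) {            // first switch: on the hash
            case 69609650:                 // "Hello".hashCode()
                if (s.equals("Hello")) {
                    index = 0;
                }
                break;
            case 67278:                    // shared by "Bye" and the colliding string
                if (s.equals("9\uffe7")) {
                    index = 2;
                } else if (s.equals("Bye")) {
                    index = 1;
                }
                break;
            default:
                break;
        }
        switch (index) {                   // second switch: on the case index
            case 0: return "Hello, World!";
            case 1: return "Good Bye, World!";
            case 2: return "Collision";
            default: return "no match";
        }
    }

    public static void main(String[] args) {
        System.out.println("Bye".hashCode());     // 67278
        System.out.println("9\uffe7".hashCode()); // 67278
        System.out.println(dispatch("Bye"));      // Good Bye, World!
        System.out.println(dispatch("9\uffe7"));  // Collision
    }
}
```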

Compiler Optimizations

In the next few examples, I show the original code and the code after it has been decompiled.
By doing this, we can see some of the optimizations performed by the compiler.

Constant Folding
Original:
public final class StaticInitializer {
    private static final String LOG_FORMAT =
        "Started at %d ms";

    private static final long START_TIME =
        System.currentTimeMillis();

    private static final long START_TIME_2;

    static {
        START_TIME_2 = System.currentTimeMillis();
    }
}
Decompiled:
public final class StaticInitializer {
    private static final String LOG_FORMAT =
        "Started at %d ms";

    private static final long START_TIME =
        System.currentTimeMillis();

    private static final long START_TIME_2 =
        System.currentTimeMillis();
}


While modern Java compilers don’t do much optimization, they do some.
One example is constant folding -- when possible, the compiler computes simple constant expressions at compile
time.
This even includes string concatenation.
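Folded string concatenation is observable because the language specification says strings computed by constant expressions are interned, while strings concatenated at run time are new objects. The names in this sketch are made up, and PREFIX just echoes the slide's LOG_FORMAT.

```java
public final class ConstantFoldingDemo {
    static final String PREFIX = "Started at ";         // a constant variable
    static final String FOLDED = PREFIX + "%d ms";      // constant expression: concatenated by javac
    static final int MILLIS_PER_HOUR = 60 * 60 * 1000;  // folded to 3600000 at compile time

    private static String prefix() {
        return "Started at ";
    }

    /** Not a constant expression (it involves a method call), so it is built at run time. */
    static String runtimeFormat() {
        return prefix() + "%d ms";
    }

    public static void main(String[] args) {
        System.out.println(FOLDED == "Started at %d ms");          // true: folded and interned
        System.out.println(runtimeFormat() == "Started at %d ms"); // false: a new String at run time
        System.out.println(runtimeFormat().equals(FOLDED));        // true
        System.out.println(MILLIS_PER_HOUR);                       // 3600000
    }
}
```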

Constant Inlining
Original:
public class Inlining {
    public static final String
        INLINED_VERSION = "1.1.0";
    public static final String
        NOT_INLINED_VERSION = identity("1.2.0");

    private static String identity(
        String value)
    {
        return value;
    }

    public static void print() {
        System.out.println(INLINED_VERSION);
        System.out.println(NOT_INLINED_VERSION);
    }
}
Decompiled:
public class Inlining {
    public static final String
        INLINED_VERSION = "1.1.0";
    public static final String
        NOT_INLINED_VERSION = identity("1.2.0");

    private static String identity(
        String value)
    {
        return value;
    }

    public static void print() {
        System.out.println("1.1.0");
        System.out.println(NOT_INLINED_VERSION);
    }
}


Constants can also be inlined by the compiler
In this example, the compiler inlines INLINED_VERSION in the print method; however,
it does not inline NOT_INLINED_VERSION.
The reason is that NOT_INLINED_VERSION is not a compile-time constant expression, because its initializer invokes a method.

This has implications in the byte code, too.
INLINED_VERSION will have its value set through a ConstantValue attribute.
NOT_INLINED_VERSION will be initialized in a <clinit> method generated by the compiler and
called automatically when the class is first initialized.
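Inlining has a visible side effect. Reading INLINED_VERSION never touches the Inlining class, because the constant was copied into the caller, so the class's <clinit> does not run. Reading NOT_INLINED_VERSION needs a getstatic, which does trigger <clinit>. The sketch adds a hypothetical static block and LOG buffer to record when that happens. Everything outside the slide's fields is made up.

```java
public final class InliningDemo {
    static final StringBuilder LOG = new StringBuilder(); // records when Inlining is initialized

    static final class Inlining {
        static {
            LOG.append("<clinit> ran");
        }

        public static final String INLINED_VERSION = "1.1.0";
        public static final String NOT_INLINED_VERSION = identity("1.2.0");

        private static String identity(String value) {
            return value;
        }
    }

    public static void main(String[] args) {
        // The constant is copied into this class's byte code, so Inlining is not initialized yet.
        System.out.println(Inlining.INLINED_VERSION);     // 1.1.0
        System.out.println("log: [" + LOG + "]");         // log: []
        // A non-constant static field needs a getstatic, which triggers Inlining's <clinit>.
        System.out.println(Inlining.NOT_INLINED_VERSION); // 1.2.0
        System.out.println("log: [" + LOG + "]");         // log: [<clinit> ran]
    }
}
```

A practical consequence: if INLINED_VERSION changes, classes that read it keep the old value until they are recompiled.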

Dead Code Elimination
Original:
public class DeadCodeElimination {
    public static final boolean
        DEBUG_OFF = false;

    public static final boolean
        DEBUG_ON = true;

    public static void main(String[] args) {
        if ( DEBUG_OFF ) {
            System.out.println("never");
        }

        if ( DEBUG_ON ) {
            System.out.println("always");
        }
    }
}
Decompiled:
public class DeadCodeElimination {
    public static final boolean
        DEBUG_OFF = false;

    public static final boolean
        DEBUG_ON = true;

    public static void main(String args[]) {
        System.out.println("always");
    }
}


Along with inlining, the compiler can perform dead code elimination.
In this case, DEBUG_OFF is never true, so the compiler does not generate the “never” print at all.
In the DEBUG_ON case, the compiler realizes the if is always true and simply emits an
unconditional print of “always”.

Runtime Optimizations

HotSpot Lifecycle
[Diagram: a cycle of 1 Interpreted, 2 Profiling, 3 Dynamic Compilation, 4 Dynamic Decompilation]


Client compilation kicks in at invocation 1,500
Server compilation kicks in at invocation 10,000
Tiered compilation - C0, C1, C2
Method Replacement vs On-Stack Replacement

http://java.sun.com/products/hotspot/whitepaper.html
http://openjdk.java.net/groups/hotspot/docs/HotSpotGlossary.html
http://www.azulsystems.com/blog/cliff-click/2010-07-16-tiered-compilation
http://www.slideshare.net/drorbr/so-you-want-to-write-your-own-benchmark-presentation
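One way to watch this lifecycle is HotSpot's `-XX:+PrintCompilation` flag, which logs each method as the JIT compiles it. HotLoop below is a hypothetical example: compile it with javac and run `java -XX:+PrintCompilation HotLoop`. The exact log format, and the point at which `square` and `run` show up (possibly as an on-stack replacement of the loop), vary with the JVM version and whether tiered compilation is on.

```java
final class HotLoop {
    // A tiny method that becomes "hot" after enough invocations.
    static int square(int x) {
        return x * x;
    }

    // Calls square many times so its invocation counter crosses the compile threshold.
    static long run(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += square(i % 1000);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(run(100_000));  // 33283350000
    }
}
```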

Is This Optimized?
double sumU = 0, sumV = 0;
for ( int i = 0; i < 100; ++i ) {
  Vector2D vector = new Vector2D( i, i );
  synchronized ( vector ) {
    sumU += vector.getU();
    sumV += vector.getV();
  }
}

How many...?
Loop Iterations 100
Heap Allocations 100
Method Invocations 200
Lock Acquisitions 100


Let’s start the runtime optimization discussion with a simple question.
Is this optimized?
How many loop iterations does it do? 100
How many heap allocations? 100
How many method invocations? 200
How many lock acquisitions? 100
Surprisingly enough, the answer to all of these may actually be zero.
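For reference, here is a runnable version of the loop. Vector2D is not defined on the slide, so the immutable class below is an assumption. Whether the JIT actually removes the allocations, locks, and getter calls depends on the JVM version and flags; the program only shows the code those optimizations would apply to.

```java
final class IsThisOptimized {
    // Hypothetical Vector2D: the slide uses it but does not show its definition.
    static final class Vector2D {
        private final double u;
        private final double v;

        Vector2D(double u, double v) {
            this.u = u;
            this.v = v;
        }

        double getU() { return u; }
        double getV() { return v; }
    }

    static double[] sums() {
        double sumU = 0, sumV = 0;
        for (int i = 0; i < 100; ++i) {
            Vector2D vector = new Vector2D(i, i);  // never escapes this iteration
            synchronized (vector) {                // no other thread can ever see this lock
                sumU += vector.getU();             // small getters: inlining candidates
                sumV += vector.getV();
            }
        }
        return new double[] { sumU, sumV };
    }

    public static void main(String[] args) {
        double[] result = sums();
        System.out.println(result[0] + " " + result[1]);  // 4950.0 4950.0
    }
}
```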

Is This Optimized?
for ( int i = 0; i < 100; ++i ) {
}
Heap Allocations 0
Lock Acquisitions 0



Common Sub-Expression Elimination
int x = a + b;
int y = a + b;

int tmp = a + b;
int x = tmp;
int y = tmp;


Among the simplest optimizations is common sub-expression elimination.
Here the VM optimizes the code by performing the calculation of “a+b” only once.

Array Bounds Check Elimination
int[] nums = ...
for ( int i = 0; i < nums.length; ++i ) {
  System.out.println( "nums[" + i + "]=" + nums[ i ] );
}

int[] nums = ...
for ( int i = 0; i < nums.length; ++i ) {
  if ( i < 0 || i >= nums.length ) {
    throw new ArrayIndexOutOfBoundsException();
  }
  System.out.println( "nums[" + i + "]=" + nums[ i ] );
}


One of the nice things about the VM is that we don't have to worry about buffer overruns, because the VM checks
array bounds for us. But how much is that costing us?
For common loop patterns like this one, nothing. The VM recognizes that the loop condition already keeps i within bounds,
so it does not need to generate the bounds-checking code.
http://www.cs.umd.edu/~vibha/330/array-bounds.pdf
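The distinction the VM relies on can be sketched as two loops: in the first, the loop condition itself proves the index is in range; in the second, the index comes from data, so the check has to stay. BoundsChecks and its methods are hypothetical names, and whether a given JIT eliminates a particular check is not guaranteed.

```java
final class BoundsChecks {
    // i < nums.length (with i starting at 0) proves every nums[i] is in range,
    // so a JIT can typically drop the per-access bounds check here.
    static long sumAll(int[] nums) {
        long sum = 0;
        for (int i = 0; i < nums.length; ++i) {
            sum += nums[i];
        }
        return sum;
    }

    // The index comes from another array, so nothing proves it is in range:
    // the check on nums[...] must stay, and a bad index still throws.
    static long sumSelected(int[] nums, int[] indices) {
        long sum = 0;
        for (int j = 0; j < indices.length; ++j) {
            sum += nums[indices[j]];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] nums = {10, 20, 30, 40};
        System.out.println(sumAll(nums));                         // 100
        System.out.println(sumSelected(nums, new int[] {0, 3}));  // 50
        try {
            sumSelected(nums, new int[] {4});
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```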

Loop Invariant Hoisting
for ( int i = 0; i < nums.length; ++i ) {
  ...
}

int length = nums.length;
for ( int i = 0; i < length; ++i ) {
  ...
}


The VM can also realize that the length of the array does not change, so instead of looking up the length of the array on
each test, it can store the length once in a temporary variable and compare against that.
http://java.sun.com/products/hotspot/docs/whitepaper/Java_Hotspot_v1.4.1/Java_HSpot_WP_v1.4.1_1002_4.html

Loop Unrolling
int sum = 0;
for ( int i = 0; i < 10; ++i ) {
sum += i;
}

int sum = 0;
sum += 1;
...
sum += 9;


In some situations, the loop can even be unrolled into a simple linear code segment.
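The slide's loop has a constant trip count, so it can be unrolled completely. When the count is only known at run time, a JIT can still unroll partially: perform several iterations per loop test, then finish the leftovers. The hand-written sketch below (hypothetical Unrolling class, unroll factor of 4 chosen arbitrarily) shows the shape of that transformation; in practice you would write the simple loop and let the VM do this.

```java
final class Unrolling {
    // The loop as you would normally write it: one test per element.
    static int rolled(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; ++i) {
            sum += values[i];
        }
        return sum;
    }

    // Unrolled by 4: one loop test per four additions, then a remainder loop
    // for lengths that are not a multiple of 4.
    static int unrolledBy4(int[] values) {
        int sum = 0;
        int i = 0;
        int limit = values.length - (values.length % 4);
        for (; i < limit; i += 4) {
            sum += values[i];
            sum += values[i + 1];
            sum += values[i + 2];
            sum += values[i + 3];
        }
        for (; i < values.length; ++i) {  // remainder: 0 to 3 elements
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(rolled(values) + " " + unrolledBy4(values));  // 45 45
    }
}
```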

Method Inlining
Vector vector = ...
double magnitude = vector.magnitude();

Vector vector = ...
double magnitude = Math.sqrt(
  vector.u*vector.u + vector.v*vector.v );

Vector vector = ...
double magnitude;
if ( vector instanceof Vector2D ) {
  magnitude = Math.sqrt(
    vector.u*vector.u + vector.v*vector.v );
} else {
  magnitude = vector.magnitude();
}

Call type    Inlined
static       always
final        always
private      always
virtual      often
reflective   sometimes
dynamic      often


http://www.ibm.com/developerworks/library/j-jtp12214/
http://openjdk.java.net/groups/hotspot/docs/HotSpotGlossary.html
http://blog.headius.com/2009/01/my-favorite-hotspot-jvm-flags.html
http://java.sun.com/developer/technicalArticles/Networking/HotSpot/inlining.html
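The third snippet on the slide is a guarded inline: check the receiver's type, run the inlined body if it matches, otherwise fall back to the normal virtual call. The sketch below writes that transformation out by hand. Vector, Vector2D, and Vector3D are hypothetical types modelled on the slide; HotSpot's own guard compares the receiver's exact class, which behaves like `instanceof` here only because Vector2D is final.

```java
final class GuardedInlining {
    interface Vector {
        double magnitude();
    }

    // Hypothetical concrete type matching the slide's Vector2D.
    static final class Vector2D implements Vector {
        final double u;
        final double v;

        Vector2D(double u, double v) { this.u = u; this.v = v; }

        @Override
        public double magnitude() { return Math.sqrt(u * u + v * v); }
    }

    // A second implementation, so the call site is not trivially monomorphic.
    static final class Vector3D implements Vector {
        final double x, y, z;

        Vector3D(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }

        @Override
        public double magnitude() { return Math.sqrt(x * x + y * y + z * z); }
    }

    // The call as written in source: a virtual (interface) dispatch.
    static double viaVirtualCall(Vector vector) {
        return vector.magnitude();
    }

    // Hand-written equivalent of a type-guarded inline.
    static double viaGuardedInline(Vector vector) {
        if (vector instanceof Vector2D) {           // cheap type guard
            Vector2D v2 = (Vector2D) vector;
            return Math.sqrt(v2.u * v2.u + v2.v * v2.v);  // inlined body
        }
        return vector.magnitude();                  // fallback: ordinary dispatch
    }

    public static void main(String[] args) {
        Vector a = new Vector2D(3, 4);
        Vector b = new Vector3D(2, 3, 6);
        System.out.println(viaVirtualCall(a) + " " + viaGuardedInline(a));  // 5.0 5.0
        System.out.println(viaVirtualCall(b) + " " + viaGuardedInline(b));  // 7.0 7.0
    }
}
```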

Lock Coarsening
StringBuffer buffer = ...
buffer.append( "Hello" );
buffer.append( name );
buffer.append( "\n" );

lock( buffer ); buffer.append( "Hello" ); unlock( buffer );
lock( buffer ); buffer.append( name ); unlock( buffer );
lock( buffer ); buffer.append( "\n" ); unlock( buffer );

lock( buffer );
buffer.append( "Hello" );
buffer.append( name );
buffer.append( "\n" );
unlock( buffer );


Starting in Java 5, HotSpot optimizes locks by performing lock coarsening.
The VM realizes that repeatedly acquiring and releasing the same lock is expensive, so it may take a single larger lock
instead.
http://java.sun.com/performance/reference/whitepapers/6_performance.html#2.1

Other Lock Optimizations
Biased Locking
Adaptive Locking - Thread sleep vs. Spin lock


And even more lock optimizations are possible...
- biased locking - makes it cheap for the thread that last acquired a lock to acquire it again
- adaptive locking - dynamically detects whether a lock is usually held for a short or long period
  - if it is long, the thread is put to sleep
  - if it is short, the thread simply spins
http://java.sun.com/performance/reference/whitepapers/6_performance.html#2.1
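The spin-versus-sleep trade-off behind adaptive locking can be sketched in a few lines. The hypothetical SpinThenYieldLock below uses a fixed spin budget: spin briefly on the assumption the lock will be released soon, then start yielding the CPU. This is a simplification, not HotSpot's implementation: the VM adjusts spinning based on recent history and parks waiting threads, and application code should normally use `synchronized` or `java.util.concurrent.locks` instead. `Thread.onSpinWait` requires Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinThenYieldLock {
    private static final int SPIN_LIMIT = 1_000;  // hypothetical fixed spin budget
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        int spins = 0;
        while (!held.compareAndSet(false, true)) {
            if (spins < SPIN_LIMIT) {
                spins++;
                Thread.onSpinWait();  // expecting a short hold: stay on the CPU
            } else {
                Thread.yield();       // held for a while: let other threads run
            }
        }
    }

    void unlock() {
        held.set(false);  // volatile write publishes updates made under the lock
    }

    // Increments a shared counter from several threads under the lock.
    static int runCounter(int threadCount, int incrementsPerThread) {
        SpinThenYieldLock lock = new SpinThenYieldLock();
        int[] counter = {0};  // guarded by lock
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();  // join also makes the final count visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(runCounter(4, 10_000));  // 40000
    }
}
```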

Escape Analysis
Point p1 = new Point( x1, y1 ), p2 = new Point( x2, y2 );
synchronized ( p1 ) {
synchronized ( p2 ) {
double dx = p1.getX() - p2.getX();
double dy = p1.getY() - p2.getY();
double distance = Math.sqrt( dx*dx + dy*dy );
}
}


Finally, in Java 7, escape analysis is on by default.
With escape analysis, the VM can realize that an object never escapes a stack frame allowing
it to...
- elide heap allocation
- elide locks
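A runnable version of the slide's example, with a hypothetical immutable Point class. In `distance`, neither Point escapes, so with escape analysis the JIT may replace the objects with their fields and drop both locks; `remember` stores its Point in a static field, so that object escapes and must live on the heap. Running with `-XX:-DoEscapeAnalysis` turns the optimization off in HotSpot for comparison.

```java
final class EscapeAnalysisDemo {
    // Hypothetical Point: the slide uses one but does not define it.
    static final class Point {
        private final double x, y;

        Point(double x, double y) { this.x = x; this.y = y; }

        double getX() { return x; }
        double getY() { return y; }
    }

    // p1 and p2 are never stored, returned, or handed to code that keeps them,
    // so they do not escape: candidates for allocation and lock elision.
    static double distance(double x1, double y1, double x2, double y2) {
        Point p1 = new Point(x1, y1), p2 = new Point(x2, y2);
        synchronized (p1) {
            synchronized (p2) {
                double dx = p1.getX() - p2.getX();
                double dy = p1.getY() - p2.getY();
                return Math.sqrt(dx * dx + dy * dy);
            }
        }
    }

    // Contrast: this Point escapes through a static field.
    static Point lastPoint;

    static Point remember(double x, double y) {
        lastPoint = new Point(x, y);
        return lastPoint;
    }

    public static void main(String[] args) {
        System.out.println(distance(0, 0, 3, 4));  // 5.0
    }
}
```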


Escape Analysis

double dx = x1 - x2;
double dy = y1 - y2;



Runtime Demo

http://code.google.com/p/caliper/


To conclude the runtime optimization section, I’ll show some micro-benchmarks illustrating some of the optimizations.
Writing microbenchmarks for a dynamically optimizing VM is devilishly hard; fortunately, Google created a tool called Caliper
to make it easier. You can write JUnit 3-style benchmark classes to compare various implementation options.
http://code.google.com/p/caliper/

Loop Variable Placement
Inside
for ( int i = 0; i < ints.length; ++i ) {
  int x = ints[i];
  sum += x;
}

vs.
Outside
int x;
for ( int i = 0; i < ints.length; ++i ) {
  x = ints[i];
  sum += x;
}

vs.
No Variable
for ( int i = 0; i < ints.length; ++i ) {
  sum += ints[i];
}


First, let’s look at loop variable placement: declaring the variable inside the loop vs. outside vs. using no
variable at all.
All three take the same amount of time to run. In fact, declaring inside or outside produces essentially the same byte code.

My recommendation...
For a one-line loop body, skip the variable.
For a complicated loop body, declare the variable inside to keep the code easier to read and refactor.
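To check the byte code claim yourself, compile the three variants and compare them with `javap -c`. The class below is a hypothetical harness for that. Expect the inside and outside variants to differ at most in which local-variable slots `x` and `i` occupy, since javac assigns slots in declaration order.

```java
final class LoopVariablePlacement {
    // Variable declared inside the loop body.
    static int inside(int[] ints) {
        int sum = 0;
        for (int i = 0; i < ints.length; ++i) {
            int x = ints[i];
            sum += x;
        }
        return sum;
    }

    // Variable declared before the loop; the declaration itself emits no byte code.
    static int outside(int[] ints) {
        int sum = 0;
        int x;
        for (int i = 0; i < ints.length; ++i) {
            x = ints[i];
            sum += x;
        }
        return sum;
    }

    // No named variable at all.
    static int noVariable(int[] ints) {
        int sum = 0;
        for (int i = 0; i < ints.length; ++i) {
            sum += ints[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] ints = {1, 2, 3, 4};
        System.out.println(inside(ints) + " " + outside(ints) + " " + noVariable(ints));  // 10 10 10
    }
}
```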

Loop Invariant Hoisting
Regular For
for ( int i = 0; i < ints.length; ++i ) {
  sum += ints[i];
}

vs.
Manual Hoisting
for ( int i = 0, len = ints.length; i < len; ++i ) {
  sum += ints[i];
}

vs.
Enhanced For
for ( int x : ints ) {
  sum += x;
}


Now, we’ll compare...
- the canonical loop, which checks i against array.length on each test
- manually hoisting the length into a len temporary variable
- using Java 5’s enhanced for
Once again, they all take the same amount of time because the VM performs the hoisting for us.

Field Access
Direct
point.x
point.y

vs.
Virtual Accessor
point.getX()
point.getY()

vs.
Interface Accessor
point.getX()
point.getY()


Next, we’ll look at direct field access vs. using a virtual accessor method vs. using an interface accessor method.
Once again, the VM can optimize all of these by performing method inlining, so all three take the same amount of
time.
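For concreteness, the three access styles might look like this; Point and HasCoordinates are hypothetical types, not taken from the talk. javac compiles them to `getfield`, `invokevirtual`, and `invokeinterface` respectively, and it is the JIT's inlining that makes the three perform alike.

```java
final class FieldAccess {
    // Hypothetical interface for the "interface accessor" case.
    interface HasCoordinates {
        int getX();
        int getY();
    }

    // Hypothetical class exposing both public-style fields and accessors.
    static final class Point implements HasCoordinates {
        final int x, y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public int getX() { return x; }
        @Override public int getY() { return y; }
    }

    // Direct field reads: getfield.
    static int direct(Point point) {
        return point.x + point.y;
    }

    // Accessors called through the class type: invokevirtual.
    static int virtualAccessor(Point point) {
        return point.getX() + point.getY();
    }

    // Accessors called through the interface type: invokeinterface.
    static int interfaceAccessor(HasCoordinates point) {
        return point.getX() + point.getY();
    }

    public static void main(String[] args) {
        Point point = new Point(3, 4);
        System.out.println(direct(point) + " " + virtualAccessor(point) + " " + interfaceAccessor(point));  // 7 7 7
    }
}
```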

Lock Benchmark
StringBuilder - no locks
StringBuilder builder = new StringBuilder();
builder.append( "foo" );
builder.append( "bar" );
builder.append( "baz" );

vs.
StringBuffer - multiple locks
StringBuffer buffer = new StringBuffer();
buffer.append( "foo" );
buffer.append( "bar" );
buffer.append( "baz" );

vs.
StringBuffer - single lock
StringBuffer buffer = new StringBuffer();
synchronized( buffer ) {
buffer.append( "foo" );
buffer.append( "bar" );
buffer.append( "baz" );
}


Now, revisiting locking - compare...
Java 5’s StringBuilder which performs no locking
vs.
Plain StringBuffer code - multiple separate appends
vs.
StringBuffer - with a manually added bigger lock

The no lock version does come out slightly ahead, but it is close.
And, the attempt to manually improve performance by taking a bigger single lock actually comes in last.

Heap Elision Benchmark
Primitive Array
Arrays.sort(new int[]{...});

vs.
Boxed Array - no Comparator
Arrays.sort(new Integer[]{...});

vs.
Boxed Array - singleton Comparator
Arrays.sort(
new Integer[]{...},
IntComparator.INSTANCE);

vs.
Boxed Array - anonymous Comparator
Arrays.sort(
new Integer[]{...},
new Comparator<Integer>() {
...
});


Lastly, let’s look at heap elision by sorting some arrays.
No surprise, the primitive array is the most performant.
But the no-Comparator case, the singleton Comparator case, and the anonymous Comparator case all perform the same.
Even creating an anonymous Comparator every time does not impact performance much; in Java 7, no heap allocation may
take place at all.
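The four variants from the slide, made runnable. IntComparator is referenced but not shown in the talk, so the enum singleton below is an assumption about how it might be written; the sample data is arbitrary.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SortVariants {
    // Hypothetical IntComparator: an enum gives a simple, thread-safe singleton.
    enum IntComparator implements Comparator<Integer> {
        INSTANCE;

        @Override
        public int compare(Integer a, Integer b) {
            return Integer.compare(a, b);
        }
    }

    static int[] primitive() {
        int[] values = {5, 3, 9, 1, 7};
        Arrays.sort(values);  // no boxing, no comparator
        return values;
    }

    static Integer[] boxedNoComparator() {
        Integer[] values = {5, 3, 9, 1, 7};
        Arrays.sort(values);  // natural ordering via Integer.compareTo
        return values;
    }

    static Integer[] boxedSingletonComparator() {
        Integer[] values = {5, 3, 9, 1, 7};
        Arrays.sort(values, IntComparator.INSTANCE);  // one shared instance
        return values;
    }

    static Integer[] boxedAnonymousComparator() {
        Integer[] values = {5, 3, 9, 1, 7};
        // A new Comparator per call, not retained after sort returns;
        // escape analysis may be able to avoid allocating it.
        Arrays.sort(values, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                return Integer.compare(a, b);
            }
        });
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(primitive()));                 // [1, 3, 5, 7, 9]
        System.out.println(Arrays.toString(boxedNoComparator()));         // [1, 3, 5, 7, 9]
        System.out.println(Arrays.toString(boxedSingletonComparator()));  // [1, 3, 5, 7, 9]
        System.out.println(Arrays.toString(boxedAnonymousComparator()));  // [1, 3, 5, 7, 9]
    }
}
```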

Is This Optimized?
for ( int i = 0; i < 100; ++i ) {
}
Heap Allocations 0
Lock Acquisitions 0


So now, hopefully, you can see how this code may truly be optimized already.
Just write clean code and trust the VM to make it fast.
If you must optimize, always profile first and use a micro-benchmarking tool like Caliper.

Recommended Reading
Java Puzzlers
By Joshua Bloch and Neal Gafter
http://www.javapuzzlers.com/

Java Specialist Newsletter
http://www.javaspecialists.eu

Brian Goetz’s Articles
http://www.ibm.com/developerworks/views/java/libraryview.jsp?contentarea_by=Java+technology&search_by=brian+goetz

