1. JVM Internals
Douglas Q. Hawkins
http://www.slideshare.net/dougqh
http://www.dougqh.net
dougqh@gmail.com
Monday, January 23, 12
2. Topics
Java Byte Code
File Format
Byte Code Examples
How Java 5 & 7 Features Are Implemented
JVM Optimizations
3. Why?
Besides techie edification, why is this useful?
A better understanding of the internals can help in deciphering some of the harder problems, but, better still...
You’ll know that the compiler and JVM are doing a lot of work for you, letting you focus on writing readable code.
5. Class File Format
CA FE BA BE Minor Version Major Version
Constant Pool
Flags This Class Super Class
Interfaces
Fields
Methods
Attributes
Every class file starts with the 4-byte magic number: CAFEBABE
Followed by the minor and major version - the major version indicates Java 5, 6, 7, etc.
Then a constant pool - which contains...
constants: int, long, String, etc.
references: method and field
descriptors: method and field
Followed by flags: modifiers for this class/interface
Followed by a reference to this class/interface - an index into the constant pool
Followed by the super class - also an index into the constant pool
Followed by a list of interface references - which are indices into the constant pool
Followed by fields
Followed by methods
And, finally, attributes, which are extra meta-information about the class...
- the name of the original source file
- annotation information
- information on inner classes
Class File Spec: http://java.sun.com/docs/books/jvms/second_edition/ClassFileFormat-Java5.pdf
History of CAFEBABE: http://en.wikipedia.org/wiki/Java_class_file
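To make the layout above concrete, here is a minimal sketch (not part of the original talk) that reads only the fixed-size start of a class file: the magic number, the minor and major versions, and the constant pool count. The ClassFileHeader class and its methods are hypothetical; the field order and big-endian encoding follow the class file spec linked above. Parsing the constant pool itself is left out because its entries are variable-length.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Sketch: reads the fixed-size start of a class file (magic, versions, constant pool count). */
public final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE; // 4-byte magic number at offset 0

    final int minor;
    final int major;
    final int constantPoolCount;

    private ClassFileHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    /** Class files are big-endian, which is the byte order DataInputStream uses. */
    static ClassFileHeader read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int magic = in.readInt();
        if (magic != MAGIC) {
            throw new IOException("not a class file, magic = " + Integer.toHexString(magic));
        }
        int minor = in.readUnsignedShort();   // minor version comes first...
        int major = in.readUnsignedShort();   // ...then the major version
        int cpCount = in.readUnsignedShort(); // constant_pool_count: entries are indexed 1 .. count - 1
        return new ClassFileHeader(minor, major, cpCount);
    }

    /** Convenience wrapper for in-memory bytes. */
    static ClassFileHeader parse(byte[] bytes) {
        try {
            return read(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** 49 = Java 5, 50 = Java 6, 51 = Java 7, and so on (one major version per release). */
    static String javaRelease(int major) {
        return major >= 49 ? "Java " + (major - 44) : "Java 1.4 or earlier";
    }

    public static void main(String[] args) throws IOException {
        ClassFileHeader h;
        // Read our own compiled class file if it is on the class path; otherwise use a hand-made Java 7 header.
        try (InputStream in = ClassFileHeader.class.getResourceAsStream("ClassFileHeader.class")) {
            h = (in != null)
                ? read(in)
                : parse(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 51, 0, 14});
        }
        System.out.println("major=" + h.major + " (" + javaRelease(h.major) + "), minor=" + h.minor
            + ", constant pool count=" + h.constantPoolCount);
    }
}
```

DataInputStream is used because its readInt/readUnsignedShort already match the class file's big-endian u4/u2 encoding.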
6. Class File Format
CA FE BA BE Minor Version Major Version
Constant Pool
Flags This Class Super Class
Interfaces
Fields
Methods
Attributes
(Class flags shown: public, private, protected, static, final, abstract, interface, strictfp, annotation, enum)
7. Field Format
Flags Name Descriptor
Attributes
(Field flags shown: public, private, protected, static, final, volatile, transient)
Fields consist of...
flags
followed by the name - actually an index to a string (UTF) entry in the constant pool
followed by the descriptor - i.e. the field type - also an index into the constant pool
- the type is the raw (erased) type
followed by attributes
- constant value
- generic type information - List<String>, etc.
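A field descriptor is just the raw type written in the JVM's internal notation. The following sketch (a hypothetical FieldDescriptors class that uses only standard reflection) shows the mapping, including how a List<String> field collapses to plain Ljava/util/List; in its descriptor.

```java
import java.lang.reflect.Field;
import java.util.List;

/** Sketch: the JVM field descriptor for a Java type (always the raw, erased type). */
public final class FieldDescriptors {
    static String descriptor(Class<?> type) {
        if (type.isArray()) {
            return "[" + descriptor(type.getComponentType()); // one '[' per array dimension
        }
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == boolean.class) return "Z";
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == short.class) return "S";
        if (type == float.class) return "F";
        if (type == double.class) return "D";
        // Class types: "L" + internal name ('/' instead of '.', nested classes keep '$') + ";"
        return "L" + type.getName().replace('.', '/') + ";";
    }

    // Sample fields; List<String> erases to java/util/List in its descriptor.
    String name;
    List<String> names;
    int[][] grid;

    public static void main(String[] args) {
        for (Field f : FieldDescriptors.class.getDeclaredFields()) {
            System.out.println(f.getName() + " : " + descriptor(f.getType()));
        }
    }
}
```

The generic argument is not lost entirely; it lives in the field's attributes rather than in the descriptor.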
8. Field Format
Flags Name Descriptor
“name”
Attributes
9. Field Format
Flags Name Descriptor “Ljava/lang/String;”
Attributes
10. Field Format
Flags Name Descriptor
Attributes
ConstantValue
11. Method Format
Flags Name Descriptor
Attributes
(Method flags shown: public, private, protected, static, final, synchronized, native, strictfp, varargs)
Methods consist of...
flags
followed by the name - actually an index to a string (UTF) entry in the constant pool
followed by the descriptor - i.e. the raw parameter types and return type
followed by attributes
- exceptions & code
- generic type information - List<String>, etc.
- specific exception information
- debugging information
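The same notation extends to methods: the parameter descriptors in parentheses, followed by the return descriptor. A minimal sketch, assuming a hypothetical MethodDescriptors helper built on java.lang.reflect.Method:

```java
import java.lang.reflect.Method;

/** Sketch: builds a JVM method descriptor "(params)return" from a reflected Method. */
public final class MethodDescriptors {
    static String typeDescriptor(Class<?> type) {
        if (type.isArray()) return "[" + typeDescriptor(type.getComponentType());
        if (type == void.class) return "V";
        if (type == boolean.class) return "Z";
        if (type == long.class) return "J";
        if (type.isPrimitive()) {
            // byte->B, char->C, double->D, float->F, int->I, short->S: first letter, upper-cased
            return String.valueOf(Character.toUpperCase(type.getName().charAt(0)));
        }
        return "L" + type.getName().replace('.', '/') + ";";
    }

    static String methodDescriptor(Method m) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : m.getParameterTypes()) {
            sb.append(typeDescriptor(p));
        }
        return sb.append(')').append(typeDescriptor(m.getReturnType())).toString();
    }

    public static void main(String... args) throws NoSuchMethodException {
        // Varargs String... is just String[] at the byte code level (plus the varargs flag).
        Method main = MethodDescriptors.class.getMethod("main", String[].class);
        System.out.println(methodDescriptor(main)); // ([Ljava/lang/String;)V
    }
}
```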
12. Method Format
Flags Name Descriptor
“main”
Attributes
13. Method Format
Flags Name Descriptor “([Ljava/lang/String;)V”
Attributes
14. Method Format
Flags Name Descriptor
Attributes
Exceptions
Code
15. Constant Pool
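1: Class, name at #2
2: UTF, length 10: “HelloWorld”
3: Class, name at #4
4: UTF, length 16: “java/lang/Object”
5: UTF, length 6: “<init>”
6: UTF, length 3: “()V”
7: UTF, length 4: “Code”
8: Method, class #3, name & type #9
9: Name & Type, name #5, type #6
10: UTF, length 4: “main”
11: UTF, length 22: “([Ljava/lang/String;)V”
12: Field, class #13, name & type #15
13: Class, name at #14
14: UTF, length 16: “java/lang/System”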
Dissect the “Hello World” example a little...
Entry 1 is a class entry - a 2-byte index to a UTF entry that contains the name
Entry 2 is the name of the class
Similarly...
Entry 3 is a class entry for the parent class - it refers to Entry 4, which holds the full name of the parent class
Skip over the constructor “<init>” and focus on main
Entry 10 is the name “main” & Entry 11 is the raw type descriptor for “main”
In that descriptor, [Ljava/lang/String; indicates String[] and the trailing V indicates that it returns void
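Going the other direction, a descriptor such as entry 11 can be decoded with a small recursive-descent parser. This is an illustrative sketch (the DescriptorParser name is hypothetical) that handles primitives, arrays, and class types, and returns the parameter types followed by the return type:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: decodes a method descriptor such as "([Ljava/lang/String;)V" into Java-style type names. */
public final class DescriptorParser {
    private final String desc;
    private int pos;

    private DescriptorParser(String desc) {
        this.desc = desc;
    }

    /** Parses one type starting at pos and advances past it. */
    private String parseType() {
        char c = desc.charAt(pos++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";             // only legal as a return type
            case '[': return parseType() + "[]"; // array: the element type follows
            case 'L': {                          // class type: Lpackage/Name;
                int end = desc.indexOf(';', pos);
                String name = desc.substring(pos, end).replace('/', '.');
                pos = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("bad descriptor char '" + c + "' in " + desc);
        }
    }

    /** Returns the parameter types, followed by the return type as the last element. */
    static List<String> parseMethod(String desc) {
        DescriptorParser p = new DescriptorParser(desc);
        if (desc.charAt(p.pos++) != '(') {
            throw new IllegalArgumentException("method descriptors start with '(': " + desc);
        }
        List<String> types = new ArrayList<>();
        while (desc.charAt(p.pos) != ')') {
            types.add(p.parseType());
        }
        p.pos++;                  // skip ')'
        types.add(p.parseType()); // return type
        return types;
    }

    public static void main(String[] args) {
        System.out.println(parseMethod("([Ljava/lang/String;)V")); // [java.lang.String[], void]
        System.out.println(parseMethod("()V"));                    // [void]
    }
}
```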
16. Browsing Class File Format
JClassLib Viewer http://www.ej-technologies.com/products/jclasslib/overview.html
17. ConstantValue
public final class HelloWorld {
public static final String MESSAGE = "Hello, World!";
public static final void main( final String... args ) {
System.out.println( MESSAGE );
}
}
Here, because the “MESSAGE” field is “static final” and initialized with a compile-time constant,
the value is stored in a “ConstantValue” attribute on the “MESSAGE” field.
18. Exceptions
public interface InputStreamProvider {
public abstract InputStream open() throws IOException;
}
Exception information is also stored in an attribute: “Exceptions”.
As it turns out, the JVM makes no distinction between checked and unchecked exceptions, which has an interesting
implication...
19. Exceptions
public final class NewInstance {
public static void main(String... args) {
try {
Class.forName("net.dougqh.runtime.SomeClass").newInstance();
} catch ( InstantiationException | IllegalAccessException | ClassNotFoundException e ) {
e.printStackTrace();
}
}
}
public class SomeClass {
public SomeClass() throws SomeException {
throw new SomeException();
}
}
Exception in thread "main" net.dougqh.runtime.SomeClass$SomeException
at net.dougqh.runtime.SomeClass.<init>
at sun.reflect.NativeConstructorAccessorImpl.newInstance0
at sun.reflect.NativeConstructorAccessorImpl.newInstance
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance
at java.lang.reflect.Constructor.newInstance
at java.lang.Class.newInstance0
at java.lang.Class.newInstance
at net.dougqh.runtime.NewInstance.main
www.javapuzzlers.com
Because of an oversight in the original reflection API, Class.newInstance can throw a checked exception that is
not reported by the compiler
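The same gap can be exploited deliberately. The following sketch uses a well-known generics trick, often called a "sneaky throw" (it is not shown in the talk): because the cast to E is erased, javac treats the throw as unchecked, while the JVM simply throws the IOException. The class and method names are hypothetical. Note that a caller cannot even write catch (IOException e) around readConfig(), since javac rejects catching a checked exception that the try block never declares.

```java
import java.io.IOException;

/** Sketch: the JVM throws whatever it is given; "checked" is purely a javac concept. */
public final class SneakyThrow {
    /** The cast to E is erased, so at run time this just throws t unchanged. */
    @SuppressWarnings("unchecked")
    static <E extends Throwable> void sneakyThrow(Throwable t) throws E {
        throw (E) t;
    }

    /** Declares no checked exceptions, yet throws an IOException. */
    static void readConfig() {
        SneakyThrow.<RuntimeException>sneakyThrow(new IOException("config missing"));
    }

    public static void main(String[] args) {
        try {
            readConfig();
        } catch (Exception e) {
            System.out.println(e); // java.io.IOException: config missing
        }
    }
}
```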
20. Generics
public final class Generics {
public static final List<String> getStrings() {
return Collections.singletonList("foo");
}
}
Here, getStrings(), which returns List<String>, has a descriptor that uses only the raw type List
However, the exact type information is stored in the “Signature” attribute
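Reflection exposes both pieces of information. In this sketch (a hypothetical SignatureDemo class), Method.getReturnType() reflects the erased descriptor, while Method.getGenericReturnType() is rebuilt from the Signature attribute:

```java
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.List;

/** Sketch: erased descriptor type vs. the full generic type recorded in the Signature attribute. */
public final class SignatureDemo {
    public static List<String> getStrings() {
        return Collections.singletonList("foo");
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method m = SignatureDemo.class.getMethod("getStrings");
        System.out.println(m.getReturnType().getName());            // java.util.List
        System.out.println(m.getGenericReturnType().getTypeName()); // java.util.List<java.lang.String>
    }
}
```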
21. Annotations
@Inherited
@Retention( RetentionPolicy.RUNTIME )
public @interface Annotation {
public int foo() default 20;
public String bar();
}
@Annotation( bar="quux" )
class Annotated {}
An annotation is just an interface
The default value for each element (method) is stored in an AnnotationDefault attribute on that method
The annotation information on a class or method is also stored in an attribute
In this case, since the annotation has a RUNTIME RetentionPolicy, it is stored in the RuntimeVisibleAnnotations
attribute
The annotation’s values are stored as element-value pairs within that attribute
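Reading the annotation back through reflection shows where each piece lives. This sketch renames the slide's annotation to a hypothetical Tagged, since Annotation collides with java.lang.annotation.Annotation. Method.getDefaultValue() returns the default recorded for foo(), and @Inherited lets a subclass report the annotation too.

```java
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/** Sketch mirroring the slide's annotation, read back at run time. */
public final class AnnotationDemo {
    @Inherited
    @Retention(RetentionPolicy.RUNTIME)
    @interface Tagged {
        int foo() default 20; // the default lives on the foo() method, not on each use
        String bar();
    }

    @Tagged(bar = "quux") // recorded in Annotated's RuntimeVisibleAnnotations attribute
    static class Annotated {}

    static class SubAnnotated extends Annotated {} // sees @Tagged because of @Inherited

    public static void main(String[] args) throws NoSuchMethodException {
        Tagged t = Annotated.class.getAnnotation(Tagged.class);
        System.out.println(t.foo() + " " + t.bar());                              // 20 quux
        System.out.println(Tagged.class.getMethod("foo").getDefaultValue());      // 20
        System.out.println(Tagged.class.isInterface());                           // true
        System.out.println(SubAnnotated.class.isAnnotationPresent(Tagged.class)); // true
    }
}
```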
23. Stack Based Virtual Machine
0 iconst_1
1 iconst_2
2 iadd
3 istore_0
4 iload_0
Local variable slots: 0 1 2 3
The JVM byte code format is stack-based, like many other VMs: CLR, PHP, and Python
In this example, the green field is the heap, the bar above it is the local variable slots, and the column to the left is the stack
Let’s look at how to add 1 + 2 together and store the result into a local variable
First, we use an iconst_1 instruction to push 1 onto the stack
Java has special instructions for common numbers: -1 to 5.
Next, an iconst_2 to push 2 onto the stack
Next, we use iadd, which pops the 1 & 2 off the stack, adds them together, and pushes the result back onto the stack
Next, we use an istore_0 to store the result into the first local variable slot
To load the value back from the local variable slots, we use an iload_0
Note: Similar to iconst, there are special istore/iload instructions for the most often used slots: 0-3
24-29. Stack Based Virtual Machine (step by step; stack listed bottom to top)
after iconst_1: stack [1]
after iconst_2: stack [1, 2]
after iadd: stack [3]
after istore_0: stack [ ], slot 0 = 3
after iload_0: stack [3], slot 0 = 3
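The trace above can be reproduced with a toy stack machine. This is a deliberately simplified sketch: it assumes instructions are strings rather than binary opcodes and supports only the five instructions from these slides. TinyStackMachine is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy sketch of an operand stack plus local variable slots. */
public final class TinyStackMachine {
    final Deque<Integer> stack = new ArrayDeque<>(); // push/pop operate on the head (top of stack)
    final int[] locals = new int[4];

    void execute(String insn) {
        switch (insn) {
            case "iconst_1": stack.push(1); break;
            case "iconst_2": stack.push(2); break;
            case "iadd": {
                int right = stack.pop();
                int left = stack.pop();
                stack.push(left + right); // pops two operands, pushes their sum
                break;
            }
            case "istore_0": locals[0] = stack.pop(); break; // moves the top of stack into slot 0
            case "iload_0": stack.push(locals[0]); break;    // copies slot 0 back onto the stack
            default: throw new IllegalArgumentException("unsupported: " + insn);
        }
    }

    public static void main(String[] args) {
        TinyStackMachine vm = new TinyStackMachine();
        for (String insn : new String[] {"iconst_1", "iconst_2", "iadd", "istore_0", "iload_0"}) {
            vm.execute(insn);
            // ArrayDeque prints from the top of the stack down
            System.out.println(insn + " -> stack " + vm.stack + ", slot 0 = " + vm.locals[0]);
        }
    }
}
```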
30. Parameters and Local Variables
static int volume( int width, int depth, int height ) {
int area = width * depth;
int volume = area * height;
return volume;
}
0 iload_0
1 iload_1
2 imul
3 istore_3
4 iload_3
5 iload_2
6 imul
7 istore 4
9 iload 4
11 ireturn
Local variable slots: 0 width, 1 depth, 2 height, 3 area, 4 volume
Trace through a slightly more complicated example: calculating volume
- arguments are passed into the low local variable slots - 0 - 2 in this case
- first, to calculate area, load width and depth from slots 0 & 1 respectively
- multiply the values on the stack, then store the result into slot 3: area
- reload area & height - slots 3 & 2 respectively
- multiply the values and store into slot 4: volume
- reload volume and return
Yes, the value is stored and then immediately reloaded in the byte code. Starting with Java 1.3, javac does not
optimize byte code; all optimizations are left to the JVM to perform.
31. Static vs Virtual Methods
int volume( int width, int depth, int height ) {
int area = width * depth;
int volume = area * height;
return volume;
}
0 iload_1
1 iload_2
2 imul
3 istore 4
5 iload 4
7 iload_3
8 imul
9 istore 5
11 iload 5
13 ireturn
Local variable slots: 0 this, 1 width, 2 depth, 3 height, 4 area, 5 volume
In the prior example, you may have noticed that the method was static.
If the method isn’t static, then “this” is invisibly passed in the first slot (slot 0).
So, our arguments start at slot 1, and the loads and stores all shift accordingly.
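The slot assignment rule can be written down directly. This sketch (a hypothetical SlotLayout class) computes the starting slot of each parameter via reflection. It also accounts for a detail the slides don't show: long and double values occupy two slots each.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Sketch: which local variable slot each parameter starts in. */
public final class SlotLayout {
    static List<Integer> parameterSlots(Method m) {
        int slot = Modifier.isStatic(m.getModifiers()) ? 0 : 1; // instance methods reserve slot 0 for "this"
        List<Integer> slots = new ArrayList<>();
        for (Class<?> p : m.getParameterTypes()) {
            slots.add(slot);
            slot += (p == long.class || p == double.class) ? 2 : 1; // long/double take two slots
        }
        return slots;
    }

    static Method find(String name) {
        for (Method m : SlotLayout.class.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                return m;
            }
        }
        throw new IllegalArgumentException("no method " + name);
    }

    // The slides' two volume variants, plus one with wide (two-slot) parameters.
    static int staticVolume(int width, int depth, int height) {
        return width * depth * height;
    }

    int instanceVolume(int width, int depth, int height) {
        return width * depth * height;
    }

    static double wide(long a, int b, double c, int d) {
        return a + b + c + d;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"staticVolume", "instanceVolume", "wide"}) {
            System.out.println(name + " -> " + parameterSlots(find(name)));
        }
    }
}
```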
32. Hello World
System.out.println( "Hello World" );
0 getstatic System.out
3 ldc "Hello World"
5 invokevirtual PrintStream.println
8 return
Stack just before invokevirtual: System.out, "Hello World"
Now, we know enough to understand “Hello World”
The first operation is a getstatic to load the value of System.out onto the stack
We need this reference to invoke println
Second, load the string “Hello World” onto the stack - the ldc indicates a load from the constant pool
Now, since println is a non-static method on a class, use invokevirtual to invoke PrintStream.println
This consumes the reference to System.out (which becomes “this” inside PrintStream.println) and the reference to “Hello World”
These values are then mapped to local slots for “this” and “msg” in the new stack frame
36. Types of Method Invocations
invokestatic - invoke static methods
invokevirtual - invoke instance method from class
invokeinterface - invoke instance method from interface
invokespecial - invoke <init> / invoke super method
invokedynamic - optimized dynamic look-up (in Java 7)
We’ve seen a call to invokevirtual, which is used for instance methods invoked through a class reference, but there are other invocation types, too.
invokestatic - for static methods
invokeinterface - for methods invoked through an interface reference (rather than a class reference)
invokespecial - for direct targets - like constructors or invoking a super method, where the call is not polymorphic
invokedynamic - used by scripting languages like JRuby in Java 7 for improved performance
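To see each invocation kind in one place, here is a sketch whose comments name the instruction javac typically emits for each call. Treat these comments as expectations to verify with javap -c, not guarantees. Java 7's javac did not emit invokedynamic for Java source, so the example assumes Java 8+, where lambdas are bootstrapped through invokedynamic; on Java 9+ the string concatenations are compiled to invokedynamic as well. All class names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

/** Sketch: one call of each kind; comments name the instruction javac typically emits. */
public final class InvokeKinds {
    static class Base {
        String describe() {
            return "base";
        }
    }

    static class Derived extends Base {
        Derived() {
            super(); // invokespecial Base.<init>
        }

        @Override
        String describe() {
            return "derived/" + super.describe(); // super call: invokespecial Base.describe
        }
    }

    static String run() throws IOException {
        int max = Math.max(10, 20);           // invokestatic Math.max
        Base b = new Derived();               // new, dup, invokespecial Derived.<init>
        String d = b.describe();              // invokevirtual Base.describe, dispatched to Derived at run time
        Closeable c = new ByteArrayInputStream(new byte[0]);
        c.close();                            // invokeinterface Closeable.close
        Supplier<String> s = () -> "lambda";  // invokedynamic creates the Supplier (Java 8+)
        return max + " " + d + " " + s.get(); // s.get(): invokeinterface Supplier.get
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run()); // 20 derived/base lambda
    }
}
```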
37. New Object
BigDecimal num = new BigDecimal("2.0");
0 new BigDecimal
3 dup
4 ldc "2.0"
6 invokespecial BigDecimal.<init>
9 astore_0
Now, let’s look at an object allocation
The first step is to allocate the object with “new”; however, this step does not yet invoke the constructor
It just allocates space on the heap for the object and pushes a reference to that uninitialized memory onto the stack
Unfortunately, since invoking the constructor will consume a reference to the newly allocated BigDecimal, we
need to make a copy (“dup”) so that we’ll have a reference left to store into “num”.
Next, we push “2.0” onto the stack
Then we invoke BigDecimal.<init>, which is the BigDecimal constructor.
It consumes the reference to “2.0” and the duplicate reference, leaving us with one reference to assign into “num”.
As you can see, construction is rather complicated; some of the past security holes in the byte code verifier
involved object construction because the sequence is non-trivial.
The CLR learned from this and has a single “newobj” instruction that both allocates the object and invokes the constructor, thus
making byte code verification easier.
From this example, you can also see why double-checked locking is broken in Java. Construction isn’t a single
step, and because of reordering, it is possible for a reference to an uninitialized object to be assigned to a field.
In Java 5, declaring the field volatile guarantees a “happens-before” relationship, so a thread that reads the field is guaranteed to see a
fully constructed object.
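Because volatile is what makes double-checked locking safe from Java 5 onward, here is a minimal sketch of the corrected idiom. The LazyHolder and Config names are hypothetical. Reading the field into a local keeps the fast path to a single volatile read.

```java
/** Sketch: double-checked locking made safe by a volatile field (Java 5+ memory model). */
public final class LazyHolder {
    static final class Config {
        final String name;

        Config(String name) {
            this.name = name;
        }
    }

    // Without volatile, another thread could observe the reference before the constructor's writes are visible.
    private static volatile Config instance;

    static Config get() {
        Config local = instance;           // first check: one volatile read on the fast path
        if (local == null) {
            synchronized (LazyHolder.class) {
                local = instance;          // second check, now holding the lock
                if (local == null) {
                    local = new Config("default"); // new, dup, invokespecial Config.<init>
                    instance = local;      // volatile write publishes the fully constructed object
                }
            }
        }
        return local;
    }

    public static void main(String[] args) throws InterruptedException {
        final Config[] seen = new Config[4];
        Thread[] threads = new Thread[seen.length];
        for (int i = 0; i < threads.length; i++) {
            final int idx = i;
            threads[i] = new Thread(() -> seen[idx] = get());
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        for (Config c : seen) {
            System.out.println(c == seen[0] ? "same instance" : "different instance");
        }
    }
}
```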
43. Conditionals
if ( x > 0 ) {
return true;
} else {
return false;
}
0: iload_0
1: ifle 6
4: iconst_1
5: ireturn
6: iconst_0
7: ireturn
return x > 0 ? true : false;
0: iload_0
1: ifle 8
4: iconst_1
5: goto 9
8: iconst_0
9: ireturn
return ( x > 0 );
0: iload_0
1: ifle 8
4: iconst_1
5: goto 9
8: iconst_0
9: ireturn
Three ways to write a method that checks if a number is greater than 0.
The byte code is almost the same in all 3 cases.
44. Invoke Static
Math.max(10, 20);
0: bipush 10
2: bipush 20
4: invokestatic Math.max
7: pop
8: return
Here, we see an extra pop after the invokestatic call.
That’s because the return value of max is left on the stack; since we don’t use it, the compiler generates a pop to
discard it.
If we store the value in a variable, the pop will be replaced with an istore
45. Invocations
FileInputStream in = new FileInputStream("foo");
in.close();
0: new FileInputStream
3: dup
4: ldc "foo"
6: invokespecial FileInputStream.<init>
9: astore_0
10: aload_0
11: invokevirtual FileInputStream.close
14: return
Closeable in = new FileInputStream("foo");
in.close();
0: new FileInputStream
3: dup
4: ldc "foo"
6: invokespecial FileInputStream.<init>
9: astore_0
10: aload_0
11: invokeinterface Closeable.close
16: return
In one example, close is called on a class-type FileInputStream; in the other, it is called on an interface-type
Closeable
In the first case, the compiler generates an invokevirtual call
In the second case, the compiler generates an invokeinterface call
46. For Loop
static int sum( int min, int max ){
int sum = 0;
for ( int i=min; i<max; ++i ){
sum += i;
}
return sum;
}
before loop
0 iconst_0
1 istore_2
init & test loop
2 iload_0
3 istore_3
4 goto +10 //14
loop body
7 iload_2
8 iload_3
9 iadd
10 istore_2
inc
11 iinc 3 by 1
test
14 iload_3
15 iload_1
16 if_icmplt -9 //7
after loop
19 iload_2
20 ireturn
Examine a for loop example
The first 2 ops are the initialization of “sum”, load 0 and store in “sum” (slot 2)
The next 3 ops are the loop initialization and jump to the initial test...
- load the value of “min” (slot 0) into “i” (slot 3)
- then jump to the test
The test is placed at the end since it is generally performed after the body and step portions of the loop
The test...
- loads “i” (slot 3) and “max” (slot 1)
- if “i” is less than “max”, then it jumps back 9 bytes to the start of the loop body
The loop body...
- loads and adds “sum” and “i” (slots 2 and 3) and stores the result back into “sum” (slot 2)
Then the step / increment part of the loop happens...
- which just increments “i”
Then we flow straight into the test portion
If the test fails, we flow through to the after loop portion
Here, we load “sum” (slot 2) and return the result
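To check the control flow, this toy interpreter (a hypothetical sketch, not a real JVM component) executes the listing exactly as written, keyed by byte offset, so goto +10 and if_icmplt -9 resolve to offsets 14 and 7. Instructions are plain strings, and only the handful used by sum() are supported.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;

/** Toy sketch: runs the sum(min, max) listing from the slide, following its relative jumps. */
public final class LoopInterpreter {
    static int sum(int min, int max) {
        // Byte offset -> instruction text, copied from the slide.
        TreeMap<Integer, String> code = new TreeMap<>();
        code.put(0, "iconst_0");
        code.put(1, "istore_2");
        code.put(2, "iload_0");
        code.put(3, "istore_3");
        code.put(4, "goto +10");      // 4 + 10 = 14: jump to the test
        code.put(7, "iload_2");
        code.put(8, "iload_3");
        code.put(9, "iadd");
        code.put(10, "istore_2");
        code.put(11, "iinc 3 by 1");
        code.put(14, "iload_3");
        code.put(15, "iload_1");
        code.put(16, "if_icmplt -9"); // 16 - 9 = 7: back to the loop body
        code.put(19, "iload_2");
        code.put(20, "ireturn");

        int[] locals = {min, max, 0, 0}; // slot 0 min, 1 max, 2 sum, 3 i
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            String[] parts = code.get(pc).split(" ");
            String name = parts[0];
            int slot = -1;
            if (name.startsWith("iload_") || name.startsWith("istore_")) {
                slot = name.charAt(name.length() - 1) - '0'; // short forms encode the slot in the opcode
                name = name.substring(0, name.length() - 2);
            }
            Integer following = code.higherKey(pc);
            int next = following == null ? -1 : following; // default: fall through
            switch (name) {
                case "iconst_0": stack.push(0); break;
                case "iload": stack.push(locals[slot]); break;
                case "istore": locals[slot] = stack.pop(); break;
                case "iadd": stack.push(stack.pop() + stack.pop()); break;
                case "iinc": locals[Integer.parseInt(parts[1])] += Integer.parseInt(parts[3]); break;
                case "goto": next = pc + Integer.parseInt(parts[1]); break;
                case "if_icmplt": {
                    int right = stack.pop(); // pushed last: max
                    int left = stack.pop();  // pushed first: i
                    if (left < right) {
                        next = pc + Integer.parseInt(parts[1]);
                    }
                    break;
                }
                case "ireturn": return stack.pop();
                default: throw new IllegalStateException("unsupported instruction: " + name);
            }
            pc = next;
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 5)); // 1 + 2 + 3 + 4 = 10
    }
}
```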
47. Exception Handling
static int read( InputStream in ) {
  try {
    return in.read();
  } catch ( IOException e ) {
    return -1;
  } finally {
    IoUtils.closeQuietly( in );
  }
}
try / finally
0 aload_0
1 invokevirtual InputStream.read
4 istore_1
5 aload_0
6 invokestatic IoUtils.closeQuietly
9 iload_1
10 ireturn
catch / finally
11 pop
12 aload_0
13 invokestatic IoUtils.closeQuietly
16 iconst_m1
17 ireturn
finally
18 astore_2
19 aload_0
20 invokestatic IoUtils.closeQuietly
23 aload_2
24 athrow
Exception Table
start end handler Exception
0 5 11 IOException
0 5 18 any
11 12 18 any
Now, Exception handling...
Exceptions are handled through extra meta-information that says how to handle different types of exceptions
over a range of byte-code instructions.
The finally portion is inlined in the try, catch, and finally portions of the generated byte code.
(Prior to Java 6, the regular javac compiler generated “jsr” and “ret” instructions to jump to a single block of compiled “finally” code.)
The “try / finally” section represents the normal flow.
- invoke InputStream.read
- store the result into an unnamed temporary variable (slot 1), because we need to run the finally code before returning
- run the finally code
- reload the temporary variable and return
The “catch / finally” is the catching of the IOException...
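The exception table says that if an IOException is raised between offsets 0 and 5 (the try; the end offset is exclusive), jump to 11, the catch section.
The first step is “pop”. Pop what? In this case, the IOException, which the JVM automatically pushes onto the operand stack on entry to the handler. Since we don’t use it, we discard it. This implies that the compiler never assigns “e” a local variable slot.
Now, invoke IoUtils.closeQuietly (the finally block), then return -1.
The “finally” section at 18 handles any other exception (the two “any” entries): it stores the exception in slot 2, runs the finally code, then reloads the exception and rethrows it.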
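To make the inlined finally concrete, here is a sketch that writes out by hand what the byte code does: one copy of the finally body per exit path. `TrackingStream` and the local `closeQuietly` are hypothetical stand-ins (the slide's `IoUtils` is not shown), and the hand-written form only approximates the exception table; it is not javac's literal output.

```java
import java.io.IOException;
import java.io.InputStream;

class InlinedFinally {
    // Hypothetical stream: returns a fixed byte or fails, and counts close() calls.
    static final class TrackingStream extends InputStream {
        private final int value;
        private final boolean fail;
        int closeCount = 0;

        TrackingStream(int value, boolean fail) {
            this.value = value;
            this.fail = fail;
        }

        @Override public int read() throws IOException {
            if (fail) throw new IOException("read failed");
            return value;
        }

        @Override public void close() {
            closeCount++;
        }
    }

    // Stand-in for IoUtils.closeQuietly (hypothetical helper).
    static void closeQuietly(InputStream in) {
        try {
            in.close();
        } catch (IOException ignored) {
            // a "quiet" close swallows the failure
        }
    }

    // The original source, with try / catch / finally.
    static int read(InputStream in) {
        try {
            return in.read();
        } catch (IOException e) {
            return -1;
        } finally {
            closeQuietly(in);
        }
    }

    // Roughly what the byte code does: the finally body is copied into each exit path.
    static int readInlined(InputStream in) {
        int result;                 // the unnamed temporary in slot 1
        try {
            result = in.read();     // offsets 0-4, the range covered by the exception table
        } catch (IOException e) {
            closeQuietly(in);       // copy #2: catch path (offsets 11-17)
            return -1;
        } catch (Throwable t) {
            closeQuietly(in);       // copy #3: "any" handler (offsets 18-24)
            throw t;                // Java 7 precise rethrow: only unchecked types remain
        }
        closeQuietly(in);           // copy #1: normal path (offsets 5-10)
        return result;
    }

    public static void main(String[] args) {
        System.out.println(read(new TrackingStream(42, false)));       // 42
        System.out.println(readInlined(new TrackingStream(42, true))); // -1
    }
}
```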
48. Synchronization
void inc() {
  synchronized ( this ) {
    ++this.num;
  }
}
before try
0 aload_0
1 dup
2 astore_1
3 monitorenter
try / finally
4 aload_0
5 dup
6 getfield Counter.num
9 iconst_1
10 iadd
11 putfield Counter.num
14 aload_1
15 monitorexit
16 goto +6 //22
finally
19 aload_1
20 monitorexit
21 athrow
22 return
Exception Table
start end handler Exception
4 16 19 any
19 21 19 any
Interestingly enough, synchronization works the same way.
To understand synchronization, it is better to look at it as a lock and an unlock within a try / finally.
And that’s exactly how the byte code works.
And, just like a regular try / finally, the finally code (the monitorexit) is inlined in both the try and the finally handler.
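The lock / try / finally reading can be sketched side by side in Java. The `synchronized` block is the slide's form; the second method uses `java.util.concurrent.locks.ReentrantLock` purely as an analogy for monitorenter / monitorexit (it is a different lock from the object's monitor). The class, counters, and the `hammer` helper are hypothetical, and the lambda needs Java 8 or later.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockAsTryFinally {
    private int syncCount;
    private int lockCount;
    private final ReentrantLock lock = new ReentrantLock();

    // The slide's form: javac emits monitorenter / monitorexit, with monitorexit
    // copied into both the normal path and an "any" exception handler.
    void incSynchronized() {
        synchronized (this) {
            ++syncCount;
        }
    }

    // The same shape written by hand: acquire, then release in finally.
    void incExplicit() {
        lock.lock();
        try {
            ++lockCount;
        } finally {
            lock.unlock();
        }
    }

    int syncCount() {
        synchronized (this) {
            return syncCount;
        }
    }

    int lockCount() {
        lock.lock();
        try {
            return lockCount;
        } finally {
            lock.unlock();
        }
    }

    // Runs task perThread times on each of `threads` threads, then waits for all of them.
    static void hammer(Runnable task, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    task.run();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        LockAsTryFinally counter = new LockAsTryFinally();
        hammer(counter::incSynchronized, 4, 10_000);
        hammer(counter::incExplicit, 4, 10_000);
        System.out.println(counter.syncCount() + " " + counter.lockCount()); // 40000 40000
    }
}
```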
49. Synchronization
void inc() {
  lock( this );
  try {
    ++this.num;
  } finally {
    unlock( this );
  }
}
before try
0 aload_0
1 dup
2 astore_1
3 monitorenter
try / finally
4 aload_0
5 dup
6 getfield Counter.num
9 iconst_1
10 iadd
11 putfield Counter.num
14 aload_1
15 monitorexit
16 goto +6 //22
finally
19 aload_1
20 monitorexit
21 athrow
22 return
Exception Table
start end handler Exception
4 16 19 any
19 21 19 any
50. Demo
Java 5
Java 7
In these demos, I demonstrate new language features by showing Java 5 and Java 7 code and then showing what it looks like when it’s decompiled back into Java 4 code.
JAD - http://www.varaneckas.com/jad
52. Auto-Boxing
Original
public class AutoBoxing {
  public static void main(String[] args) {
    Integer foo = 20;
    Integer bar = 30;
    int sum = foo + bar;
    System.out.println(sum);
  }
}
Decompiled as Java 4
public class AutoBoxing {
  public static void main(String args[]) {
    Integer foo = Integer.valueOf(20);
    Integer bar = Integer.valueOf(30);
    int sum = foo.intValue() + bar.intValue();
    System.out.println(sum);
  }
}
Here, we see how auto-boxing works.
The compiler injects the necessary calls to Integer.valueOf and Integer.intValue for us.
NOTE: Even if you don’t like auto-boxing, please call Integer.valueOf rather than calling new Integer.
Unlike new, Integer.valueOf returns cached instances of Integer for commonly used values; its documentation guarantees that values from -128 to 127 are always cached.
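A small sketch of what auto-boxing compiles to and why `Integer.valueOf` pays off; the class and method names are hypothetical. Only the -128 to 127 range is guaranteed to be cached, so the identity checks stick to those values.

```java
class BoxingCache {
    // Auto-boxing: javac compiles this assignment as Integer.valueOf(20).
    static Integer boxed() {
        Integer foo = 20;
        return foo;
    }

    // Auto-unboxing: javac compiles this as foo.intValue() + bar.intValue().
    static int add(Integer foo, Integer bar) {
        return foo + bar;
    }

    public static void main(String[] args) {
        // Small values come from Integer's cache, so boxing twice yields the same object.
        System.out.println(boxed() == Integer.valueOf(20));               // true
        System.out.println(Integer.valueOf(127) == Integer.valueOf(127)); // true
        System.out.println(add(20, 30));                                  // 50
    }
}
```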
53. Enhanced For
Original
public class EnhancedFor {
  static void array(String[] args) {
    for ( String arg : args ) {
      System.out.println(arg);
    }
  }
  static void iterable(Iterable<String> args) {
    for ( String arg: args ) {
      System.out.println(arg);
    }
  }
}
Decompiled as Java 4
public class EnhancedFor {
  static void array(String args[]) {
    String arr$[] = args;
    int len$ = arr$.length;
    for (int i$ = 0; i$ < len$; i$++) {
      String arg = arr$[i$];
      System.out.println(arg);
    }
  }
  static void iterable(Iterable args) {
    String arg;
    for (Iterator i$ = args.iterator(); i$.hasNext(); ) {
      arg = (String) i$.next();
      System.out.println(arg);
    }
  }
}
In this slide, we see how the enhanced for gets handled by the compiler.
The array loop converts to the canonical C-style loop, with one slight difference: the array length is hoisted into a local variable before the loop. (This is a rather pointless optimization, because the JVM would do this at runtime anyway.)
For an Iterable, a loop that uses an Iterator is generated. In this example, we can also see that the compiler injects a cast to String.
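Both desugarings can be checked by hand. The sketch below (hypothetical names; it collects output into a String instead of printing, so the forms can be compared) writes the enhanced for next to the array loop and the Iterator loop the compiler produces.

```java
import java.util.Arrays;
import java.util.Iterator;

class EnhancedForDesugared {
    static String joinArray(String[] args) {
        StringBuilder out = new StringBuilder();
        for (String arg : args) {
            out.append(arg).append(';');
        }
        return out.toString();
    }

    // Array form: copy the reference, hoist the length, index explicitly.
    static String joinArrayDesugared(String[] args) {
        StringBuilder out = new StringBuilder();
        String[] arr = args;
        int len = arr.length;
        for (int i = 0; i < len; i++) {
            String arg = arr[i];
            out.append(arg).append(';');
        }
        return out.toString();
    }

    static String joinIterable(Iterable<String> args) {
        StringBuilder out = new StringBuilder();
        for (String arg : args) {
            out.append(arg).append(';');
        }
        return out.toString();
    }

    // Iterable form: an explicit Iterator loop.
    static String joinIterableDesugared(Iterable<String> args) {
        StringBuilder out = new StringBuilder();
        for (Iterator<String> it = args.iterator(); it.hasNext(); ) {
            // After erasure next() returns Object, so javac inserts a checkcast to String here.
            String arg = it.next();
            out.append(arg).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "c"};
        System.out.println(joinArray(words) + " " + joinArrayDesugared(words)); // a;b;c; a;b;c;
        System.out.println(joinIterable(Arrays.asList(words)) + " "
                + joinIterableDesugared(Arrays.asList(words)));                 // a;b;c; a;b;c;
    }
}
```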
54. Var-Args
Original
public final class VarArgs {
  public static void main(String... args) {
    System.out.printf("Hello %s %s", "Jon", "Doe");
  }
}
Decompiled as Java 4
public final class VarArgs {
  public static transient void main(String[] args) {
    System.out.printf("Hello %s %s", new Object[] {"Jon", "Doe"});
  }
}
In this example, we see var-args being used both in the signature and in the call to printf.
NOTE: I’ve declared a main method with var-args, since on a byte-code level this is still just a String[]. This
actually works just fine.
The “transient” modifier in the decompiled Java 4 is a bit amusing. This happens because Java ran out of flag bits
to use in Java 5, so they overloaded the “transient” bit which only applies to fields to mean “var-args” when
applied to methods.
In the call to printf, we can see that the compiler injects a construction of a new Object[] and passes it as the last
arg to printf.
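The shared flag bit is visible through reflection. In this sketch (class and method names are hypothetical), `Method.isVarArgs()` and `Modifier.isTransient()` both test bit 0x0080 of the method's modifiers, so they agree on a var-args method. The two calls to `join` compile to the same thing: an explicitly built String[].

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class VarArgsFlag {
    // A var-args method; at the byte-code level the parameter is just a String[].
    static String join(String... parts) {
        return String.join(" ", parts);
    }

    static Method joinMethod() {
        try {
            return VarArgsFlag.class.getDeclaredMethod("join", String[].class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Method m = joinMethod();
        System.out.println(m.isVarArgs());                          // true
        // The var-args flag reuses the bit that means "transient" on fields.
        System.out.println(Modifier.isTransient(m.getModifiers())); // true
        System.out.println(join("Jon", "Doe"));                     // Jon Doe
        System.out.println(join(new String[] {"Jon", "Doe"}));      // Jon Doe
    }
}
```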
55. Enum
Original
public enum AnEnum {
  FOO,
  BAR,
  QUUX
}
Decompiled as Java 4
public static final class AnEnum extends Enum {
  public static final AnEnum FOO = new AnEnum("FOO", 0);
  public static final AnEnum BAR = new AnEnum("BAR", 1);
  public static final AnEnum QUUX = new AnEnum("QUUX", 2);
  private static final AnEnum[] $VALUES = new AnEnum[]{FOO, BAR, QUUX};
  public static AnEnum[] values() {
    return (AnEnum[]) $VALUES.clone();
  }
  public static AnEnum valueOf(String name){
    return (AnEnum)Enum.valueOf(AnEnum.class, name);
  }
  private AnEnum(String s, int i) {
    super(s, i);
  }
}
For Enum-s, the compiler does a great deal of work on your behalf -- even in the simplest case.
The compiler generates a constructor that takes the name and ordinal of each constant.
It then initializes a static final field for each constant from the original file.
These constants are all placed in a hidden $VALUES array.
Finally, the compiler generates values() and valueOf() methods for each enum class.
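The generated members can be exercised directly. This sketch (hypothetical class name wrapping a nested copy of the slide's enum) shows the name and ordinal passed to the generated constructor, that javac's values() hands back a fresh copy of the hidden array on every call, and that valueOf agrees with Enum.valueOf.

```java
class EnumDesugared {
    enum AnEnum { FOO, BAR, QUUX }

    public static void main(String[] args) {
        // Each constant carries the name and ordinal given to the generated constructor.
        for (AnEnum e : AnEnum.values()) {
            System.out.println(e.name() + " " + e.ordinal()); // FOO 0, BAR 1, QUUX 2
        }
        // values() returns a clone, so callers cannot modify the hidden array.
        System.out.println(AnEnum.values() != AnEnum.values());                          // true
        System.out.println(AnEnum.valueOf("BAR") == Enum.valueOf(AnEnum.class, "BAR")); // true
        // Every enum type extends java.lang.Enum.
        System.out.println(AnEnum.class.getSuperclass() == Enum.class);                  // true
    }
}
```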
56. Covariance
Original
public interface Parent {
  Number calculate();
}
public class CovariantChild implements Parent {
  public Integer calculate() {
    return 10;
  }
}
Decompiled as Java 4
public static interface Parent {
  public abstract Number calculate();
}
public class CovariantChild implements Parent {
  public Integer calculate() {
    return Integer.valueOf(10);
  }
  public volatile Number calculate() {
    return calculate();
  }
}
A lesser known addition to Java 5 is the ability to have a covariant return type.
Here, the child type returns a more specific type of Number -- namely Integer.
The generated code is interesting. We end up with two “calculate” methods: one that returns Integer and another that returns Number. The one that returns Number satisfies the contract of the parent and simply calls the more specific version that returns Integer. (This compiler-generated method is known as a bridge method.)
Here, again, we see a curious modifier on a method: “volatile”. This is another situation where Java 5 overloaded an existing flag bit: the bit that means “volatile” on a field means “bridge” on a method.
For more information on why this is type-safe, look up the Liskov Substitution Principle.
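Reflection also exposes both calculate methods. In this sketch (a hypothetical wrapper class around the slide's types), `Method.isBridge()` identifies the compiler-generated one, and because the bridge bit is the same bit as volatile (0x0040), `Modifier.isVolatile` reports true for it.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class BridgeMethods {
    interface Parent {
        Number calculate();
    }

    static class CovariantChild implements Parent {
        @Override public Integer calculate() {
            return 10;
        }
    }

    // Finds CovariantChild's declared calculate() with the given return type.
    static Method calculateReturning(Class<?> returnType) {
        for (Method m : CovariantChild.class.getDeclaredMethods()) {
            if (m.getName().equals("calculate") && m.getReturnType() == returnType) {
                return m;
            }
        }
        throw new IllegalStateException("no calculate() returning " + returnType);
    }

    public static void main(String[] args) {
        Method specific = calculateReturning(Integer.class);
        Method bridge = calculateReturning(Number.class);
        System.out.println(specific.isBridge());                        // false
        System.out.println(bridge.isBridge());                          // true
        System.out.println(Modifier.isVolatile(bridge.getModifiers())); // true
        // A call through the interface goes via the bridge to the Integer version.
        Parent p = new CovariantChild();
        System.out.println(p.calculate());                              // 10
    }
}
```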
58. Multi-Catch
Original
public final class EnhancedCatch {
  public static void main(String[] args){
    try {
      Class.forName("some.package.SomeClass").newInstance();
    } catch (
      InstantiationException |
      IllegalAccessException |
      ClassNotFoundException e)
    {
      throw new IllegalStateException(e);
    }
  }
}
Decompiled as Java 4
public final class EnhancedCatch {
  public static void main(String args[]) {
    try {
      Class.forName("some.package.SomeClass").newInstance();
    } catch (ReflectiveOperationException e){
      throw new IllegalStateException(e);
    }
  }
}
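Java 7 adds the ability to handle multiple exception types in a single catch.
Great for ugly reflection code.
Here, the decompiler shows the catch of all the reflection exceptions as a single catch of their common parent, ReflectiveOperationException (a new base class for reflection exceptions, also introduced in Java 7). That common parent is also the static type of “e”; in the byte code itself, javac emits one exception table entry per listed type, all pointing at the same handler.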
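A sketch of the same idea with a return value instead of a rethrow (the class name and `tryCreate` helper are hypothetical). It uses `getDeclaredConstructor().newInstance()`, the non-deprecated replacement for `Class.newInstance()`, which is why two more exception types appear in the multi-catch. Assigning `e` to a `ReflectiveOperationException` variable compiles because the declared type of a multi-catch parameter is the closest common supertype of its alternatives.

```java
import java.lang.reflect.InvocationTargetException;

class MultiCatchSketch {
    // Instantiates a class by name via its no-arg constructor, or reports the failure type.
    static String tryCreate(String className) {
        try {
            Object o = Class.forName(className).getDeclaredConstructor().newInstance();
            return "created " + o.getClass().getName();
        } catch (ClassNotFoundException | InstantiationException | IllegalAccessException
                | NoSuchMethodException | InvocationTargetException e) {
            ReflectiveOperationException common = e; // static type of e is the common parent
            return "failed: " + common.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate("java.lang.StringBuilder")); // created java.lang.StringBuilder
        System.out.println(tryCreate("some.package.SomeClass"));  // failed: ClassNotFoundException
        System.out.println(tryCreate("java.lang.Integer"));       // failed: NoSuchMethodException
    }
}
```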
59. Try With Resources
Original
public class EnhancedTry {
  public static void main(String[] args) throws IOException {
    Properties properties = new Properties();
    try (InputStream in = new FileInputStream("my.properties")) {
      properties.load(in);
    }
  }
}
Decompiled
public class EnhancedTry {
  public static void main(String args[]) throws IOException {
    Properties properties = new Properties();
    InputStream in = new FileInputStream("my.properties");
    Throwable throwable = null;
    try {
      properties.load(in);
    } catch (Throwable throwable1) {
      throwable = throwable1;
    } finally {
      if (in != null) {
        try {
          in.close();
        } catch (Throwable x2) {
          throwable.addSuppressed(x2);
          throw throwable;
        }
      }
    }
  }
}
Java 7 also enhances try by allowing it to automatically close resources.
It generates a similar try / finally to what you’d write by hand.
However, it puts the resource acquisition outside the try (which is correct, but uncommon among Java programmers).
It also does one more thing: it adds code so that if an exception happens when closing, the original exception from the body is still the one propagated. Even better, the exception raised by close is added to the suppressed list of the original exception using the new Java 7 method Throwable.addSuppressed.
(The decompiled listing is only approximate; for example, it omits the rethrow of the original exception from the catch block.)
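Since the decompiled listing is approximate, the sketch below compares try-with-resources with a hand expansion modeled on the translation given in the Java Language Specification (section 14.20.3.1), simplified to a single resource. `FailingResource` is a hypothetical resource whose close() always fails, so both versions should report the body's exception with the close() failure attached as suppressed.

```java
import java.io.Closeable;
import java.io.IOException;

class TryWithResourcesSketch {
    // Hypothetical resource whose close() always fails.
    static final class FailingResource implements Closeable {
        @Override public void close() throws IOException {
            throw new IOException("close failed");
        }
    }

    // Java 7 form: the body fails, then close() fails too.
    static Throwable withSugar() {
        try (FailingResource r = new FailingResource()) {
            throw new IllegalStateException("body failed");
        } catch (Throwable escaped) {
            return escaped;
        }
    }

    // Hand expansion of the same statement.
    static Throwable byHand() {
        try {
            FailingResource r = new FailingResource(); // acquired outside the inner try
            Throwable primary = null;
            try {
                throw new IllegalStateException("body failed");
            } catch (Throwable t) {
                primary = t;
                throw t;                               // the body's exception keeps propagating
            } finally {
                if (r != null) {
                    if (primary != null) {
                        try {
                            r.close();
                        } catch (Throwable closeFailure) {
                            primary.addSuppressed(closeFailure);
                        }
                    } else {
                        r.close();
                    }
                }
            }
        } catch (Throwable escaped) {
            return escaped;
        }
    }

    public static void main(String[] args) {
        for (Throwable t : new Throwable[] {withSugar(), byHand()}) {
            // body failed / suppressed: close failed (printed twice)
            System.out.println(t.getMessage() + " / suppressed: " + t.getSuppressed()[0].getMessage());
        }
    }
}
```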
60. String Switch
Original
switch (args[0]) {
  case "Hello":
    System.out.println("Hello, World!");
    break;
  case "Bye":
    System.out.println("Good Bye, World!");
    break;
  case "9\uffe7":
    System.out.println("Collision");
    break;
}
Decompiled
String s = args[0];
byte byte0 = -1;
switch(s.hashCode()) {
  case 69609650: ... break;
  case 67278:
    if(s.equals("9\uFFE7")) {
      byte0 = 2;
    } else if(s.equals("Bye")) {
      byte0 = 1;
    }
    break;
}
switch(byte0) {
  case 0:
    System.out.println("Hello, World!");
    break;
  case 1:
    System.out.println("Good Bye, World!");
    break;
  case 2:
    System.out.println("Collision");
    break;
}
One last example from Java 7 -- string switch
String switch is implemented as a switch on the String’s hashCode.
However, hashCode is not unique, so the generated code must also perform an equals check.
To handle this, string switch actually generates two switch statements.
The first switches on the hashCode and, after an equals check, assigns a temporary variable the index of the matching case from the original code.
Then the second switches on that index, each case containing the code from the original Java 7 cases.
Here, I’ve deliberately created a hash collision, so you can see how collisions are resolved.
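The collision on the slide is "Bye" versus the two-character string "9\uFFE7" (the digit 9 followed by U+FFE7): 57 * 31 + 65511 = 67278, the same value as "Bye".hashCode(). The sketch below (hypothetical class and method names, returning strings instead of printing) puts the string switch next to a hand-written version of the two-switch translation.

```java
class StringSwitchSketch {
    // The Java 7 string switch from the slide.
    static String greet(String s) {
        switch (s) {
            case "Hello":
                return "Hello, World!";
            case "Bye":
                return "Good Bye, World!";
            case "9\uFFE7":
                return "Collision";
            default:
                return "?";
        }
    }

    // Hand-written two-switch translation: hashCode plus equals, then an index.
    static String greetDesugared(String s) {
        int index = -1;
        switch (s.hashCode()) {
            case 69609650:                 // hash of "Hello"
                if (s.equals("Hello")) index = 0;
                break;
            case 67278:                    // shared by "Bye" and the colliding string
                if (s.equals("9\uFFE7")) {
                    index = 2;
                } else if (s.equals("Bye")) {
                    index = 1;
                }
                break;
        }
        switch (index) {
            case 0: return "Hello, World!";
            case 1: return "Good Bye, World!";
            case 2: return "Collision";
            default: return "?";
        }
    }

    public static void main(String[] args) {
        System.out.println("Bye".hashCode() + " " + "9\uFFE7".hashCode()); // 67278 67278
        for (String s : new String[] {"Hello", "Bye", "9\uFFE7", "Ciao"}) {
            System.out.println(greet(s) + " | " + greetDesugared(s));
        }
    }
}
```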
61. Compiler Optimizations
In the next few examples, I show the original code and the code after it has been compiled and then decompiled.
By doing this, we can see some of the optimizations performed by the compiler.
JAD - http://www.varaneckas.com/jad
62. Constant Folding
Original
public final class StaticInitializer {
  private static final String LOG_FORMAT = "Started at %d ms";
  private static final long START_TIME = System.currentTimeMillis();
  private static final long START_TIME_2;
  static {
    START_TIME_2 = System.currentTimeMillis();
  }
}
Decompiled
public final class StaticInitializer {
  private static final String LOG_FORMAT = "Started at %d ms";
  private static final long START_TIME = System.currentTimeMillis();
  private static final long START_TIME_2 = System.currentTimeMillis();
}
While modern Java compilers don’t do much optimization, they do some.
One example is constant folding: when possible, the compiler computes simple constant expressions at compile time.
This even includes string concatenation.
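Folded string constants are interned, which makes folding observable with `==`. A sketch with hypothetical names; identity comparison is used here only to detect folding, not as a recommended way to compare strings.

```java
class ConstantFolding {
    static final String PREFIX = "Started at ";  // a constant variable
    static String mutablePrefix = "Started at "; // not final, so not a constant

    // Folded by javac into the single literal "Started at %d ms", which is interned.
    static String folded() {
        return "Started at " + "%d ms";
    }

    // A final field with a constant initializer also takes part in folding.
    static String viaConstant() {
        return PREFIX + "%d ms";
    }

    // A non-constant operand forces concatenation at run time, creating a new String.
    static String atRuntime() {
        return mutablePrefix + "%d ms";
    }

    public static void main(String[] args) {
        String literal = "Started at %d ms";
        System.out.println(folded() == literal);         // true
        System.out.println(viaConstant() == literal);    // true
        System.out.println(atRuntime() == literal);      // false
        System.out.println(atRuntime().equals(literal)); // true
    }
}
```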
63. Constant Inlining
Original
public class Inlining {
  public static final String INLINED_VERSION = "1.1.0";
  public static final String NOT_INLINED_VERSION = identity("1.2.0");
  private static String identity(String value) {
    return value;
  }
  public static void print() {
    System.out.println(INLINED_VERSION);
    System.out.println(NOT_INLINED_VERSION);
  }
}
Decompiled
public class Inlining {
  public static final String INLINED_VERSION = "1.1.0";
  public static final String NOT_INLINED_VERSION = identity("1.2.0");
  private static String identity(String value) {
    return value;
  }
  public static void print() {
    System.out.println("1.1.0");
    System.out.println(NOT_INLINED_VERSION);
  }
}
Constants can also be inlined by the compiler
In this example, the compiler inlines INLINED_VERSION in the print method; however, it does not inline NOT_INLINED_VERSION.
The reason is that NOT_INLINED_VERSION is not a compile-time constant expression, because its initializer invokes a method.
This has implications in the byte code, too.
INLINED_VERSION will have its value set through a ConstantValue attribute.
NOT_INLINED_VERSION will be initialized in a <clinit> method generated by the compiler and run automatically when the class is initialized (on first active use, not merely when it is loaded).
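The ConstantValue versus <clinit> difference is observable: reading a constant variable does not initialize the declaring class, because the reference is replaced by the value at compile time, while reading the other field does. In this sketch, `Versions`, `log`, and the reader methods are hypothetical.

```java
class ConstantInliningDemo {
    static final StringBuilder log = new StringBuilder();

    static class Versions {
        static {
            log.append("<clinit> ran;");
        }
        static final String INLINED_VERSION = "1.1.0";               // constant variable
        static final String NOT_INLINED_VERSION = identity("1.2.0"); // set in <clinit>

        private static String identity(String value) {
            return value;
        }
    }

    // Compiles to ldc "1.1.0": Versions is never touched, so it is not initialized.
    static String readInlined() {
        return Versions.INLINED_VERSION;
    }

    // Compiles to getstatic Versions.NOT_INLINED_VERSION, which triggers <clinit>.
    static String readNotInlined() {
        return Versions.NOT_INLINED_VERSION;
    }

    public static void main(String[] args) {
        System.out.println(readInlined() + " [" + log + "]");    // 1.1.0 []
        System.out.println(readNotInlined() + " [" + log + "]"); // 1.2.0 [<clinit> ran;]
    }
}
```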
64. Dead Code Elimination
Original
public class DeadCodeElimination {
  public static final boolean DEBUG_OFF = false;
  public static final boolean DEBUG_ON = true;
  public static void main(String[] args) {
    if ( DEBUG_OFF ) {
      System.out.println("never");
    }
    if ( DEBUG_ON ) {
      System.out.println("always");
    }
  }
}
Decompiled
public class DeadCodeElimination {
  public static final boolean DEBUG_OFF = false;
  public static final boolean DEBUG_ON = true;
  public static void main(String args[]) {
    System.out.println("always");
  }
}
Along with inlining, the compiler can perform dead code elimination.
In this case, DEBUG_OFF is never true, so the “never” print out is not generated by the
compiler.
Even in the DEBUG_ON case, the compiler realizes the if is always true and simply includes an
unconditional print of “always”.
66. HotSpot Lifecycle
1 Interpreted -> 2 Profiling -> 3 Dynamic Compilation -> 4 Dynamic Decompilation -> back to 1 Interpreted
Client compilation kicks in after 1,500 invocations (the client VM's default CompileThreshold)
Server compilation kicks in after 10,000 invocations
Tiered compilation - C0, C1, C2
Method Replacement vs On-Stack Replacement
http://java.sun.com/products/hotspot/whitepaper.html
http://openjdk.java.net/groups/hotspot/docs/HotSpotGlossary.html
http://www.azulsystems.com/blog/cliff-click/2010-07-16-tiered-compilation
http://www.slideshare.net/drorbr/so-you-want-to-write-your-own-benchmark-presentation
67. Is This Optimized?
double sumU = 0, sumV = 0;
for ( int i = 0; i < 100; ++i ) {
  Vector2D vector = new Vector2D( i, i );
  synchronized ( vector ) {
    sumU += vector.getU();
    sumV += vector.getV();
  }
}
How many...?
Loop Iterations 100
Heap Allocations 100
Method Invocations 200
Lock Acquisitions 100
Let’s start the runtime observation discussion with a simple question.
Is this optimized?
How many loop iterations does it do? 100
How many heap allocations? 100
How many method invocations? 200
How many lock acquisitions? 100
Surprisingly enough, the answer to all of these may actually be zero.
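Vector2D is not defined on the slide, so the sketch below assumes a minimal immutable class with getU() and getV() to make the loop runnable. Nothing in the program observes whether the JIT actually removes the allocations, calls, or locks; that depends on the VM and its settings.

```java
class IsThisOptimized {
    // Hypothetical stand-in for the slide's Vector2D.
    static final class Vector2D {
        private final double u;
        private final double v;

        Vector2D(double u, double v) {
            this.u = u;
            this.v = v;
        }

        double getU() { return u; }
        double getV() { return v; }
    }

    // The loop from the slide, returning both sums.
    static double[] sums() {
        double sumU = 0, sumV = 0;
        for (int i = 0; i < 100; ++i) {
            Vector2D vector = new Vector2D(i, i);
            synchronized (vector) { // a lock that never escapes this thread
                sumU += vector.getU();
                sumV += vector.getV();
            }
        }
        return new double[] {sumU, sumV};
    }

    public static void main(String[] args) {
        double[] s = sums();
        System.out.println(s[0] + " " + s[1]); // 4950.0 4950.0
    }
}
```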
68. Is This Optimized?
double sumU = 0, sumV = 0;
for ( int i = 0; i < 100; ++i ) {
  Vector2D vector = new Vector2D( i, i );
  synchronized ( vector ) {
    sumU += vector.getU();
    sumV += vector.getV();
  }
}
How many...?
Loop Iterations 0
Heap Allocations 0
Method Invocations 0
Lock Acquisitions 0
69. Common Sub-Expression Elimination
int x = a + b;
int y = a + b;
int tmp = a + b;
int x = tmp;
int y = tmp;
Among the simplest optimizations is common sub-expression elimination.
Here the VM optimizes the code by only performing the calculation of “a+b” once.
http://www.slideshare.net/drorbr/so-you-want-to-write-your-own-benchmark-presentation
70. Array Bounds Check Elimination
int[] nums = ...
for ( int i = 0; i < nums.length; ++i ) {
System.out.println( "nums[" + i + "]=" + nums[ i ] );
}
int[] nums = ...
for ( int i = 0; i < nums.length; ++i ) {
if ( i < 0 || i >= nums.length ) {
throw new ArrayIndexOutOfBoundsException();
}
System.out.println( "nums[" + i + "]=" + nums[ i ] );
}
One of the nice things about the VM is that we don’t have to worry about buffer overruns, because the VM checks array bounds for us. But how much is that costing us?
In common cases like this loop, nothing. The VM recognizes common patterns and realizes that it does not need to generate the bounds-checking code.
http://www.cs.umd.edu/~vibha/330/array-bounds.pdf
71. Loop Invariant Hoisting
for ( int i = 0; i < nums.length; ++i ) {
...
}
int length = nums.length;
for ( int i = 0; i < length; ++i ) {
...
}
The VM can also realize that the length of the array does not change, so instead of looking up the array length on each test, it can read it once into a temporary variable and compare against that.
http://java.sun.com/products/hotspot/docs/whitepaper/Java_Hotspot_v1.4.1/Java_HSpot_WP_v1.4.1_1002_4.html
72. Loop Unrolling
int sum = 0;
for ( int i = 0; i < 10; ++i ) {
sum += i;
}
int sum = 0;
sum += 1;
...
sum += 9;
In some situations, the loop can even be unrolled into a simple linear code segment.
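When the trip count is not a small constant, the general form of the transformation is partial unrolling: process several elements per iteration, then finish the leftovers in a short cleanup loop. The sketch below does this by hand with a factor of 4 (an arbitrary choice; the names are hypothetical). A JIT would normally do this for you, so this is an illustration rather than something to write by hand.

```java
class LoopUnrolling {
    // Plain loop.
    static int sum(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; ++i) {
            sum += values[i];
        }
        return sum;
    }

    // Unrolled by a factor of 4, with a cleanup loop for the 0 to 3 leftover elements.
    static int sumUnrolled(int[] values) {
        int sum = 0;
        int i = 0;
        int limit = values.length - (values.length % 4);
        for (; i < limit; i += 4) {
            sum += values[i];
            sum += values[i + 1];
            sum += values[i + 2];
            sum += values[i + 3];
        }
        for (; i < values.length; ++i) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] ten = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(sum(ten) + " " + sumUnrolled(ten)); // 45 45
    }
}
```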
74. Lock Coarsening
StringBuffer buffer = ...
buffer.append( "Hello" );
buffer.append( name );
buffer.append( "\n" );
StringBuffer buffer = ...
lock( buffer ); buffer.append( "Hello" ); unlock( buffer );
lock( buffer ); buffer.append( name ); unlock( buffer );
lock( buffer ); buffer.append( "\n" ); unlock( buffer );
StringBuffer buffer = ...
lock( buffer );
buffer.append( "Hello" );
buffer.append( name );
buffer.append( "\n" );
unlock( buffer );
Starting in Java 5, HotSpot optimizes locks by performing lock coarsening.
The VM realizes that repeatedly acquiring and releasing the same lock is expensive, so it may take a single larger lock instead.
http://java.sun.com/performance/reference/whitepapers/6_performance.html#2.1
75. Other Lock Optimizations
Biased Locking
Adaptive Locking - Thread sleep vs. Spin lock
And even more lock optimizations are possible...
- biased locking - makes it cheap for the thread that last acquired a lock to acquire it again
- adaptive locking - dynamically detects whether a lock is usually held for a short or long period
- if it is long, the thread is put to sleep
- if it is short, the thread will simply spin
http://java.sun.com/performance/reference/whitepapers/6_performance.html#2.1
76. Escape Analysis
Point p1 = new Point( x1, y1 ), p2 = new Point( x2, y2 );
synchronized ( p1 ) {
synchronized ( p2 ) {
double dx = p1.getX() - p2.getX();
double dy = p1.getY() - p2.getY();
double distance = Math.sqrt( dx*dx + dy*dy );
}
}
Finally, in Java 7 (and in Java 6 from update 23 on), escape analysis is on by default.
With escape analysis, the VM can realize that an object never escapes the stack frame of the method that created it, allowing it to...
- elide heap allocation
- elide locks
77. Escape Analysis
Point p1 = new Point( x1, y1 ), p2 = new Point( x2, y2 );
double dx = p1.getX() - p2.getX();
double dy = p1.getY() - p2.getY();
double distance = Math.sqrt( dx*dx + dy*dy );
78. Escape Analysis
Point p1 = new Point( x1, y1 ), p2 = new Point( x2, y2 );
double dx = p1.getX() - p2.getX();
double dy = p1.getY() - p2.getY();
double distance = Math.sqrt( dx*dx + dy*dy );
double dx = x1 - x2;
double dy = y1 - y2;
double distance = Math.sqrt( dx*dx + dy*dy );
79. Runtime Demo
http://code.google.com/p/caliper/
To conclude the runtime optimization section, I’ll show some micro-benchmarks illustrating some of the optimizations.
Writing microbenchmarks for a dynamically optimizing VM is devilishly hard; fortunately, Google created a tool called Caliper to make it easier. You can write JUnit 3-style Benchmark classes to compare various implementation options.
http://www.slideshare.net/drorbr/so-you-want-to-write-your-own-benchmark-presentation
http://code.google.com/p/caliper/
80. Loop Variable Placement
Inside
for ( int i = 0; i < ints.length; ++i ) {
  int x = ints[i];
  sum += x;
}
vs.
Outside
int x;
for ( int i = 0; i < ints.length; ++i ) {
  x = ints[i];
  sum += x;
}
vs.
No Variable
for ( int i = 0; i < ints.length; ++i ) {
  sum += ints[i];
}
First, let’s look at loop variable placement -- declaring the loop variable inside the loop vs. outside vs. using no
variable at all.
All three take the same amount of time to run. In fact, declaring inside or outside produces the same byte code.
My recommendation...
For a one-line loop body, skip the variable.
For a complicated loop body, declare the variable inside to keep the code easier to read and refactor.
81. Loop Invariant Hoisting
Regular For
for ( int i = 0; i < ints.length; ++i ) {
sum += ints[i];
}
vs.
Manual Hoisting
for ( int i = 0, len = ints.length; i < len; ++i ) {
sum += ints[i];
}
vs.
Enhanced For
for ( int x : ints ) {
sum += x;
}
Now, we’ll compare...
- the canonical loop which checks i against array.length each time in the test
- manually hoisting the length into a len temporary variable
- using Java 5’s enhanced for
Once again, they all take the same amount of time, because the VM performs the hoisting for us.
82. Field Access
Direct
point.x
point.y
vs.
Virtual Accessor
point.getX()
point.getY()
vs.
Interface Accessor
point.getX()
point.getY()
Next, we’ll look at direct field access vs. a virtual accessor method vs. an interface accessor method.
Once again, the VM can optimize all of these by performing method inlining, so all three take the same amount of time.
83. Locking Benchmark
StringBuilder - no locks
StringBuilder builder = new StringBuilder();
builder.append( "foo" );
builder.append( "bar" );
builder.append( "baz" );
vs.
StringBuffer - multiple locks
StringBuffer buffer = new StringBuffer();
buffer.append( "foo" );
buffer.append( "bar" );
buffer.append( "baz" );
vs.
StringBuffer - single lock
StringBuffer buffer = new StringBuffer();
synchronized( buffer ) {
buffer.append( "foo" );
buffer.append( "bar" );
buffer.append( "baz" );
}
Now, revisiting locking - compare...
Java 5’s StringBuilder which performs no locking
vs.
Plain StringBuffer code - multiple separate appends
vs.
StringBuffer - with a manually added bigger lock
The no lock version does come out slightly ahead, but it is close.
And, the attempt to manually improve performance by taking a bigger single lock actually comes in last.
84. Heap Elision Benchmark
Primitive Array
Arrays.sort(new int[]{...});
vs.
Boxed Array - no Comparator
Arrays.sort(new Integer[]{...});
vs.
Boxed Array - singleton Comparator
Arrays.sort(
new Integer[]{...},
IntComparator.INSTANCE);
vs.
Boxed Array - anonymous Comparator
Arrays.sort(
new Integer[]{...},
new Comparator<Integer>() {
...
});
Lastly, let’s look at heap elision by sorting some arrays.
No surprise, the primitive array is the most performant.
But the no-Comparator case, the singleton Comparator case, and the anonymous Comparator case all perform the same.
Even creating an anonymous Comparator every time does not impact performance much; in Java 7, no heap allocation may take place at all.
85. Is This Optimized?
double sumU = 0, sumV = 0;
for ( int i = 0; i < 100; ++i ) {
  Vector2D vector = new Vector2D( i, i );
  synchronized ( vector ) {
    sumU += vector.getU();
    sumV += vector.getV();
  }
}
How many...?
Loop Iterations 0
Heap Allocations 0
Method Invocations 0
Lock Acquisitions 0
So now, hopefully, you can see how this may truly be optimized already.
Just write clean code and trust the VM to make it fast.
If you must optimize, always profile first and use a micro-benchmarking tool like Caliper.
86. Recommended Reading
Java Puzzlers
By Joshua Bloch and Neal Gafter
http://www.javapuzzlers.com/
Java Specialist Newsletter
http://www.javaspecialists.eu
Brian Goetz’s Articles
http://www.ibm.com/developerworks/views/java/libraryview.jsp?contentarea_by=Java+technology&search_by=brian+goetz