©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
Implementation concerns on JVMs

Gil Tene, CTO & co-Founder, Azul Systems
http://objectlayout.org
ObjectLayout / StructuredArray

Workshop

!
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
ObjectLayout: quick recap
Purpose: Match the raw speed benefits C based
languages get from commonly used forms of memory
layout 

Focus: Speed. For regular Java Objects. On the heap.

Focus: Layout related speed benefits

Dead reckoning

Streaming (prefetch friendly)

It’s all about capturing “enabling semantic limitations”
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
ObjectLayout target forms
Easily identified C-based layout forms:

array of structs

struct foo[];

struct with struct inside

struct foo { int a; struct bar b; int c; };

struct with array at the end

struct packet { int length; char[] body; }
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
example of speed-enabling limitations
Why is Object[] inherently slower than struct foo[]?

Java: a mutable array of same-base-type objects

C: An immutable array of exact-same-type structures

Mutability (of the array itself) & non-uniform member
size both force de-reference, and both break streaming

StructuredArray<T>: An immutable array of [potentially]
mutable exact-same-type (T) objects

Supports Instantiation, get(), but not put()…
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
ObjectLayout: “rules”
“Vanilla” and “Intrinsified” implementation behavior
should be indistinguishable (except for speed)

Java code should not be able to crash JVM (or break
security lines).

E.g. poking at private final fields via reflection

But Unsafe is out of scope, obviously
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
What we need from JDKs/JVMs
A new heap concept: Contained objects

With associated GC behavior

A set of JVM calls (unsafe additions)

With obvious intrinsification into IR for some 

Optimizations

Mainly aimed at dead-reckoning and checkcast

Recognition of classes and support for hidden fields
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
StructuredArrayIntrinsifiableBase
https://github.com/ObjectLayout/IntrinsicObjectLayout

A version of the “vanilla” ObjectLayout implementation’s
StructuredArrayIntrinsifiableBase class

vanilla is at https://github.com/ObjectLayout/ObjectLayout

Includes all the Java-side (JDK specific) things needed
specifically for StructuredArray “go fast” support

Many things stubbed out (to compile on vanilla Java)

but commented TODOs everywhere tell you what
needs to be completed for a JDK to support this
stuff.
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
New heap concept: 

Contained objects
A new heap concept: “contained” and “container” objects

Contained and container objects are regular objects

A heap object can now be: contained, a container,
neither, or both

Given a contained object, there is a means of finding
the immediately containing object

If GC needs to move an object that is contained in a
live container object, it will move the entire container
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
StructuredArray<Foo>
StructuredArray<Foo>
Heap
StructuredArray<Foo>
Foo
(contained)Bar (contained) Doof
StructuredArray<Moo>[4]
Container/Contained

Relationships
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
Contained / container liveness
Containers implicitly keep their contained objects alive

So contained elements needs to be traversed when
a container is encountered during liveness
determination passes (e.g. marking or copying) 

!
But Contained objects don’t do anything special
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
What does GC need to do?
Simple: when any GC mechanism finds a need to move
a contained object, it must move the outermost
container as a whole

And that’s it

So it all comes down to:

How do we know an object is contained

How do we find the outermost live container

Multiple possible alternatives

Can vary by collector type and existing
infrastructure capabilities
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
Example ways of finding container
Back pointer from contained object

E.g. in a pre-header

B-Tree describing container address ranges

Only needs to be in usable form at relocation time

Can be developed at mark time…

Or at allocation time (with thread-local buffers)

Use object start array

Already there in collectors with imprecise cards

Hybrids
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
Identifying contained and container
Contained

Most likely via bit in header (aka mark word)

Could be done via BTree lookup instead

Container

Easiest is bit in header

Can be derived from Klass instead

Or via BTree lookup

Both indications only need to be correct at relocation 

Can be updated via marker

Or checked by relocator and fixed if stale
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
JVM API needs (as of July 2014)
Getting a ref to a contained object

This is the fast path get(index) call need:

offset = a.bodySize + (index * a.elementSize) 

deriveContainedObjectAtOffset(a, offset);

Allocation of a variable size slot on the heap

allocateHeapForClass(instanceClass, size);

Construction of an object at a given address

constructObjectAtOffset(o, offset, ctor, args);

Various derivations of size, footprint, padding
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
JVM API needs (cont.)
Derivations of size, footprint, padding:

getInstanceSize(instanceClass);

getInstanceSizeWhenContained(instanceClass);

getContainingObjectFootprint(containerClass,
containedElementSize, numberOfElements);

getContainingObjectFootprintWhenContained(…);

getPrePaddingInObjectSize();
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
Optimizations
Streaming benefits come directly from layout

No compiler optimizations needed

Dead-reckoning benefits require some compiler support

no dereferencing, but….

e = (T) ( a + a.bodySize + (index * a.elementSize) );

elementSize and bodySize are not constant

But optimizations similar to CHA & inline-cache apply

Also, generics have that ugly checkcast on get()….
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
Optimizations
CHA optimization of elementSize and bodySize:

StructuredArray<T> is often (usually?) subclassed

MassOrder extends StructuredArray<SimpleOrder>

constant bodySize, 

effectively constant elementSize and elementClass

Later subclassing can invalidate this… 

Inline cache (checkcast on a.elementClass only)

A given get() call site will usually deal with a
monomorphic StructuredArray (bodySize) AND a
monomorphic elementClass (elementSize)
©2014 Azul Systems, Inc.	
 	
 	
 	
 	
 	
Recognition of classes 

and support for hidden fields
Intrinsified implementations will make hard
assumptions about the true finality of certain fields

E.g. lengths, element sizes, and caches of those

E.g. array.length is a truly final instance field

Intrinsic implementations

“private final” in base class is not good enough

E.g. array.length is a truly final instance
field“private final” in base class is not good enough

Intrinsic implementations need hidden fields
Q & A
@giltene
http://www.azulsystems.com
!
http://objectlayout.org

https://github.com/ObjectLayout/ObjectLayout

JVM Language Summit: Object layout workshop

  • 1.
    ©2014 Azul Systems,Inc. Implementation concerns on JVMs Gil Tene, CTO & co-Founder, Azul Systems http://objectlayout.org ObjectLayout / StructuredArray Workshop !
  • 2.
    ©2014 Azul Systems,Inc. ObjectLayout: quick recap Purpose: Match the raw speed benefits C based languages get from commonly used forms of memory layout Focus: Speed. For regular Java Objects. On the heap. Focus: Layout related speed benefits Dead reckoning Streaming (prefetch friendly) It’s all about capturing “enabling semantic limitations”
  • 3.
    ©2014 Azul Systems,Inc. ObjectLayout target forms Easily identified C-based layout forms: array of structs struct foo[]; struct with struct inside struct foo { int a; struct bar b; int c; }; struct with array at the end struct packet { int length; char[] body; }
  • 4.
    ©2014 Azul Systems,Inc. example of speed-enabling limitations Why is Object[] inherently slower than struct foo[]? Java: a mutable array of same-base-type objects C: An immutable array of exact-same-type structures Mutability (of the array itself) & non-uniform member size both force de-reference, and both break streaming StructuredArray<T>: An immutable array of [potentially] mutable exact-same-type (T) objects Supports Instantiation, get(), but not put()…
  • 5.
    ©2014 Azul Systems,Inc. ObjectLayout: “rules” “Vanilla” and “Intrinsified” implementation behavior should be indistinguishable (except for speed) Java code should not be able to crash JVM (or break security lines). E.g. poking at private final fields via reflection But Unsafe is out of scope, obviously
  • 6.
    ©2014 Azul Systems,Inc. What we need from JDKs/JVMs A new heap concept: Contained objects With associated GC behavior A set of JVM calls (unsafe additions) With obvious intrinsification into IR for some Optimizations Mainly aimed at dead-reckoning and checkcast Recognition of classes and support for hidden fields
  • 7.
    ©2014 Azul Systems,Inc. StructuredArrayIntrinsifiableBase https://github.com/ObjectLayout/IntrinsicObjectLayout A version of the “vanilla” ObjectLayout implementation’s StructuredArrayIntrinsifiableBase class vanilla is at https://github.com/ObjectLayout/ObjectLayout Includes all the Java-side (JDK specific) things needed specifically for StructuredArray “go fast” support Many things stubbed out (to compile on vanilla Java) but commented TODOs everywhere tell you what needs to be completed for a JDK to support this stuff.
  • 8.
    ©2014 Azul Systems,Inc. New heap concept: Contained objects A new heap concept: “contained” and “container” objects Contained and container objects are regular objects A heap object can now be: contained, a container, neither, or both Given a contained object, there is a means of finding the immediately containing object If GC needs to move an object that is contained in a live container object, it will move the entire container
  • 9.
    ©2014 Azul Systems,Inc. StructuredArray<Foo> StructuredArray<Foo> Heap StructuredArray<Foo> Foo (contained)Bar (contained) Doof StructuredArray<Moo>[4] Container/Contained Relationships
  • 10.
    ©2014 Azul Systems,Inc. Contained / container liveness Containers implicitly keep their contained objects alive So contained elements needs to be traversed when a container is encountered during liveness determination passes (e.g. marking or copying) ! But Contained objects don’t do anything special
  • 11.
    ©2014 Azul Systems,Inc. What does GC need to do? Simple: when any GC mechanism finds a need to move a contained object, it must move the outermost container as a whole And that’s it So it all comes down to: How do we know an object is contained How do we find the outermost live container Multiple possible alternatives Can vary by collector type and existing infrastructure capabilities
  • 12.
    ©2014 Azul Systems,Inc. Example ways of finding container Back pointer from contained object E.g. in a pre-header B-Tree describing container address ranges Only needs to be in usable form at relocation time Can be developed at mark time… Or at allocation time (with thread-local buffers) Use object start array Already there in collectors with imprecise cards Hybrids
  • 13.
    ©2014 Azul Systems,Inc. Identifying contained and container Contained Most likely via bit in header (aka mark word) Could be done via BTree lookup instead Container Easiest is bit in header Can be derived from Klass instead Or via BTree lookup Both indications only need to be correct at relocation Can be updated via marker Or checked by relocator and fixed if stale
  • 14.
    ©2014 Azul Systems,Inc. JVM API needs (as of July 2014) Getting a ref to a contained object This is the fast path get(index) call need: offset = a.bodySize + (index * a.elementSize) deriveContainedObjectAtOffset(a, offset); Allocation of a variable size slot on the heap allocateHeapForClass(instanceClass, size); Construction of an object at a given address constructObjectAtOffset(o, offset, ctor, args); Various derivations of size, footprint, padding
  • 15.
    ©2014 Azul Systems,Inc. JVM API needs (cont.) Derivations of size, footprint, padding: getInstanceSize(instanceClass); getInstanceSizeWhenContained(instanceClass); getContainingObjectFootprint(containerClass, containedElementSize, numberOfElements); getContainingObjectFootprintWhenContained(…); getPrePaddingInObjectSize();
  • 16.
    ©2014 Azul Systems,Inc. Optimizations Streaming benefits come directly from layout No compiler optimizations needed Dead-reckoning benefits require some compiler support no dereferencing, but…. e = (T) ( a + a.bodySize + (index * a.elementSize) ); elementSize and bodySize are not constant But optimizations similar to CHA & inline-cache apply Also, generics have that ugly checkcast on get()….
  • 17.
    ©2014 Azul Systems,Inc. Optimizations CHA optimization of elementSize and bodySize: StructuredArray<T> is often (usually?) subclassed MassOrder extends StructuredArray<SimpleOrder> constant bodySize, effectively constant elementSize and elementClass Later subclassing can invalidate this… Inline cache (checkcast on a.elementClass only) A given get() call site will usually deal with a monomorphic StructuredArray (bodySize) AND a monomorphic elementClass (elementSize)
  • 18.
    ©2014 Azul Systems,Inc. Recognition of classes and support for hidden fields Intrinsified implementations will make hard assumptions about the true finality of certain fields E.g. lengths, element sizes, and caches of those E.g. array.length is a truly final instance field Intrinsic implementations “private final” in base class is not good enough E.g. array.length is a truly final instance field“private final” in base class is not good enough Intrinsic implementations need hidden fields
  • 19.