Graal and Truffle: One VM to Rule Them All


Thomas Wuerthinger
Oracle Labs
@thomaswue
12-December-2013,
at ETH Zurich

Disclaimer
The following is intended to provide some insight into a line of
research in Oracle Labs. It is intended for information purposes
only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and
should not be relied upon in making purchasing decisions. The
development, release, and timing of any features or
functionality described in connection with any Oracle product or
service remains at the sole discretion of Oracle. Any views
expressed in this presentation are my own and do not
necessarily reflect the views of Oracle.


Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

Agenda

§  One VM to Rule Them All?
§  Dynamic Compilation
§  Graal Compiler
§  Truffle System
§  Q&A



One Language to Rule Them All?
Let’s ask a search engine…



One Language to Rule Them All?
Let’s ask Stack Overflow…



Relative Speed of Programming Languages
(as measured by the Computer Language Benchmarks Game, ~1y ago)

Goal:
One VM for all languages means
interoperability and being able to
choose the best language for the task!




Static versus Dynamic Compilation (1)
§  Static (or ahead-of-time) Compilation
–  Compilation happens before program is run.
–  Can include profiling feedback from sample application runs.

§  Dynamic (or just-in-time) Compilation
–  Compilation happens while the program is running.
–  Baseline execution (interpreter or simple compiler) gathers profiling feedback.
–  Optimization => Deoptimization => Reoptimization cycles.
–  On-stack replacement (OSR) to switch between the tiers (two or more execution modes).


Static versus Dynamic Compilation (2)
§  Static (or ahead-of-time) Compilation
–  Fast start-up, because compilation and profiling are not part of application execution time.
–  Predictable performance, as only the source program affects the generated machine code.

§  Dynamic (or just-in-time) Compilation
–  Can exploit exact target platform properties when generating machine code.
–  Profiling feedback captures part of the application behavior and increases code quality.
–  The deoptimization capabilities allow the optimized code to be incomplete and/or use aggressive speculation.
–  Can use assumptions about the current state of the system (e.g., loaded classes) in the generated code.


Profiling Feedback for Java
§  Branch probabilities
–  Never-taken branches can be omitted.
–  Exact probabilities allow if-cascade reordering.

§  Loop frequencies
–  Guide loop unrolling and loop-invariant code motion.

§  Type profile
–  Optimize instanceof and checkcast type checks (i.e., speculate that only a specific set of types occurs).
–  Optimize virtual calls or interface calls.

Profiling feedback only helps when the program behavior during
the observed period matches the overall program behavior.
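
The type-profile bullet can be illustrated with a small sketch in plain Java. It is not Graal or HotSpot code: every name in it (`TypeProfileSketch`, `Shape`, `Circle`, `Square`, `speculativeArea`) is hypothetical. The fast path stands in for compiled code that speculates on the single receiver type seen during profiling, and the fallback stands in for the point where a real VM would deoptimize.

```java
// Hypothetical sketch, not Graal or HotSpot code: all names are made up for illustration.
final class TypeProfileSketch {
    interface Shape { double area(); }

    static final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        @Override public double area() { return Math.PI * r * r; }
    }

    static final class Square implements Shape {
        final double s;
        Square(double s) { this.s = s; }
        @Override public double area() { return s * s; }
    }

    // Counts how often the speculation failed (stand-in for deoptimization events).
    static int slowPathCount = 0;

    // What a compiler might emit for "shape.area()" if the profile only ever saw Circle.
    static double speculativeArea(Shape shape) {
        if (shape.getClass() == Circle.class) {
            Circle c = (Circle) shape;   // exact type known: the call can be inlined
            return Math.PI * c.r * c.r;
        }
        slowPathCount++;                 // a real VM would deoptimize here
        return shape.area();             // ordinary virtual dispatch as fallback
    }

    public static void main(String[] args) {
        System.out.println(speculativeArea(new Circle(1.0)));
        System.out.println(speculativeArea(new Square(2.0)));
        System.out.println("slow paths taken: " + slowPathCount);
    }
}
```

If the program later passes mostly `Square` objects, the guard fails on almost every call, which is the mismatch between observed and overall behavior that the note above warns about.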



Static Single Assignment (SSA) Form
§  Every variable is assigned only once.
§  Phis capture values coming from different control flow branches.
§  Commonly used in compilers as it simplifies optimizations and traversal along the def-use and use-def chains.

Original code:
...
if (condition) {
    x = value1 + value2;
} else {
    x = value2;
}
return x;

SSA form:
...
if (condition) {
    x1 = value1 + value2;
} else {
    x2 = value2;
}
x3 = phi(x1, x2);
return x3;



Graal is an …

... extensible,
dynamic compiler using
object-oriented Java programming,
a graph intermediate representation,
and Java snippets.



HotSpotVM versus GraalVM
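[Diagram: the HotSpot VM with the Client (30k LOC) and Server (120k LOC) compilers, written in C++, next to the Graal VM with the Graal compiler (60k LOC), written in Java. In both, the compiler is connected to HotSpot through a Compiler Interface and a Compilation Queue.]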

Why Java?
Robustness: Runtime exceptions not fatal.
Reflection: Annotations instead of macros.
Meta-Evaluation: IR subgraph expressible in Java code.
Extensibility: No language barrier to the application.
Tooling: Java IDEs speed up the development process.



Snippets for Graph Construction
Manual construction:
Node max(ValueNode a, ValueNode b) {
    // a > b is expressed as b < a, so the true successor returns a.
    IfNode ifNode = new IfNode(new IntegerLessThanNode(b, a));
    ifNode.trueSuccessor().setNext(new ReturnNode(a));
    ifNode.falseSuccessor().setNext(new ReturnNode(b));
    return ifNode;
}

Expression as snippet:
int max(int a, int b) {
    if (a > b) return a;
    else return b;
}


Data

Code

Lowering
§  Replace one node with multiple other nodes.
–  New nodes provide more detailed description of semantics.
–  New nodes can be optimized and moved separately.

§  General Java lowerings
–  Example: Replace an array store with null check, bounds check, store check, write operation.
if (array != null && index >= 0 && index < array.length &&
        canAssign(array.getClass().getComponentType(), value)) {
    *(array + 16 + index*8) = value;
} else { deoptimize; }

§  VM specific lowerings
–  Example: Replace a monitorenter with code that depends on the locking scheme used by the VM.
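
The array-store lowering can be mimicked in plain Java to make the individual checks visible. This is only a sketch under stated assumptions: `ArrayStoreLowering` and its `Deoptimize` exception are hypothetical names, the exception stands in for a real transfer to the interpreter, and the final store still goes through the normal (checked) Java array write rather than a raw memory write.

```java
// Hypothetical sketch: the checks of a lowered array store, spelled out one by one.
final class ArrayStoreLowering {
    /** Stand-in for "deoptimize"; compiled code would fall back to the interpreter instead. */
    static final class Deoptimize extends RuntimeException {
        Deoptimize(String reason) { super(reason); }
    }

    static void store(Object[] array, int index, Object value) {
        // Null check.
        if (array == null) throw new Deoptimize("null check");
        // Bounds check.
        if (index < 0 || index >= array.length) throw new Deoptimize("bounds check");
        // Store check: the value must be assignable to the array's component type.
        Class<?> componentType = array.getClass().getComponentType();
        if (value != null && !componentType.isInstance(value)) throw new Deoptimize("store check");
        // Write operation (in compiled code: a raw store at a computed address).
        array[index] = value;
    }

    public static void main(String[] args) {
        Object[] strings = new String[2];
        store(strings, 0, "ok");
        try {
            store(strings, 1, Integer.valueOf(1));
        } catch (Deoptimize e) {
            System.out.println("deoptimize: " + e.getMessage());
        }
    }
}
```

Once the checks are separate steps like this, each one can be optimized or moved on its own, which is the point of the lowering bullets above.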



Gradual Lowering
[Chart: nodes per bytecode (y-axis, 0 to 3) for Graal, Client, and Server at four stages: after parsing, after optimizations, after lowering, and before code emission.]

Numbers obtained while running the DaCapo benchmark suite.



Extensibility
•  Multiple Target Platforms (AMD64, SPARC, PTX, HSAIL)
•  Multiple Runtimes (HotSpot and Maxine)
•  Adding new types of Nodes
•  Adding new compiler Phases
abstract class Phase {
    abstract void run(Graph g);
}

for (IfNode n : graph.getNodes(IfNode.class)) {
    ...
}

Compiler has about 100 different individual modules.
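
The two fragments above can be combined into a small self-contained sketch. It only borrows the shape of `Phase` and `getNodes` from the slide; the `Node`, `IfNode`, `ReturnNode`, `Graph`, and `CountIfsPhase` classes below are simplified stand-ins written for this example, not Graal's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a toy graph and a phase that visits all nodes of one type.
final class PhaseSketch {
    static class Node {}
    static final class IfNode extends Node {}
    static final class ReturnNode extends Node {}

    static final class Graph {
        private final List<Node> nodes = new ArrayList<>();

        void add(Node n) { nodes.add(n); }

        // Returns all nodes that are instances of the given node class.
        <T extends Node> List<T> getNodes(Class<T> type) {
            List<T> result = new ArrayList<>();
            for (Node n : nodes) {
                if (type.isInstance(n)) result.add(type.cast(n));
            }
            return result;
        }
    }

    abstract static class Phase {
        abstract void run(Graph g);
    }

    // Example phase: counts the IfNodes instead of transforming them.
    static final class CountIfsPhase extends Phase {
        int count;

        @Override
        void run(Graph g) {
            for (IfNode n : g.getNodes(IfNode.class)) {
                count++;
            }
        }
    }

    public static void main(String[] args) {
        Graph g = new Graph();
        g.add(new IfNode());
        g.add(new ReturnNode());
        g.add(new IfNode());
        CountIfsPhase phase = new CountIfsPhase();
        phase.run(g);
        System.out.println("IfNodes: " + phase.count);
    }
}
```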



Graph IR
•  Static single assignment (SSA) form with def-use and use-def edges.
•  Program dependence graph (sea of nodes), but with explicit distinction between control flow and data flow edges.
•  Graph visualization tools: IdealGraphVisualizer and c1visualizer.

...
if (condition) {
    result = value1 + value2;
} else {
    result = value2;
}
return result;

[Graph for the code above, with the nodes: condition, If, Begin (2x), End (2x), Merge, Add, Phi, Return, value1, value2.]

Guards
int get(x) {
return x.field;
}



Guards
int get(x) {
if (cond) return x.field;
else return 0;
}



Eliding Exception Edges
[Diagram: a Catch node and three Operation nodes.]

Operation           Potential     Actual   Actual / Potential
Invoke                1296646      14454                1.11%
BoundsCheck            166770        498                0.30%
NullCheck             1525061        686                0.04%
OutOfMemory            110078          0                0.00%
CheckCast               99192          0                0.00%
DivRem                   6082          0                0.00%
MonitorNullCheck        33631          0                0.00%
TOTAL                 3237460      15638                0.48%

Numbers obtained while running the DaCapo benchmark suite.



Graal GPU Backends
[Diagram: JavaScript, Ruby, Python, … => Truffle AST => Graal IR; Java bytecodes => Graal IR; Graal IR => PTX and HSAIL.]

Java Peak Performance
§  SPECjvm2008
[Bar chart comparing Client, Graal, and Server; plotted values 114, 100, and 76 on an axis from 0 to 120.]

Configuration: Intel Core i7-3770 @ 3.4 GHz, 4 cores, 8 threads, 16 GB RAM
Comparison against HotSpot changeset tag hs25-b37 from June 13, 2013


Scala Peak Performance
§  Scala-Dacapo Benchmark Suite
[Bar chart comparing Client, Graal, and Server; plotted values 100, 106, and 61 on an axis from 0 to 120.]

Configuration: Intel Core i7-3770 @ 3.4 GHz, 4 cores, 8 threads, 16 GB RAM
Comparison against HotSpot changeset tag hs25-b37 from June 13, 2013


Your Compiler Extension?
http://openjdk.java.net/projects/graal/
graal-dev@openjdk.java.net
$ hg clone http://hg.openjdk.java.net/graal/graal
$ cd graal
$ ./mx.sh --vm graal build
$ ./mx.sh ideinit
$ ./mx.sh --vm graal vm

§  Graal Resources

https://wiki.openjdk.java.net/display/Graal/Main
§  Graal License: GPLv2



“Write Your Own Language”
Current situation

Prototype a new language
Parser and language work to build
syntax tree (AST), AST Interpreter
Write a “real” VM
In C/C++, still using AST interpreter,
spend a lot of time implementing
runtime system, GC, …
People start using it
People complain about performance
Define a bytecode format and
write bytecode interpreter
Performance is still bad
Write a JIT compiler
Improve the garbage collector



How it should be

Prototype a new language in Java
Parser and language work to build
syntax tree (AST)
Execute using AST interpreter
People start using it
And it is already fast

Truffle: System Structure

Written by:              Layer:                           Written in:
Application Developer    Guest Language Application       Guest Language
Language Developer       Guest Language Implementation    Managed Host Language
VM Expert                Host Services                    Managed Host Language or Unmanaged Language
OS Expert                OS                               Unmanaged Language (typically C or C++)

Speculate and Optimize …

[Diagram, three stages: (1) AST interpreter with uninitialized nodes; (2) node rewriting for profiling feedback yields an AST interpreter with rewritten nodes; (3) compilation using partial evaluation yields compiled code.]

Node transitions, legend: U = Uninitialized, I = Integer, S = String, D = Double, G = Generic

Partial Evaluation
§  Example function:
–  f(x, y) = x + y + 1

§  Partial evaluation of example function:
–  g(y) = f(1, y) = 1 + y + 1 = y + 2

§  Interpreter function:
–  f(program, arguments) = calculations to interpret the program

§  Partial evaluation of interpreter function (first Futamura projection):
–  g(arguments) = f(#specificProgram, arguments) = compiled version of #specificProgram that takes arguments as parameters
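
Both examples can be written down in plain Java as a rough sketch. The class and method names (`PartialEvaluationSketch`, `specializeF`, `specializeInterpreter`, the `Op` mini-language) are hypothetical. Closures are used here as a stand-in for a real partial evaluator: they fix the known input once and return a function of the remaining input, but they do not generate machine code the way Graal does.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: specialization by hand, using closures instead of a real partial evaluator.
final class PartialEvaluationSketch {
    // f(x, y) = x + y + 1
    static int f(int x, int y) { return x + y + 1; }

    // g(y) = f(x, y) for a fixed x; the constant part x + 1 is computed only once.
    static IntUnaryOperator specializeF(int x) {
        int folded = x + 1;
        return y -> y + folded;
    }

    // A tiny "language": a program is a sequence of operations on one accumulator.
    enum Op { INC, DOUBLE }

    // Interpreter function: f(program, argument).
    static int interpret(Op[] program, int argument) {
        int acc = argument;
        for (Op op : program) {
            switch (op) {
                case INC: acc = acc + 1; break;
                case DOUBLE: acc = acc * 2; break;
            }
        }
        return acc;
    }

    private static IntUnaryOperator compileOp(Op op) {
        switch (op) {
            case INC: return a -> a + 1;
            default: return a -> a * 2;   // DOUBLE
        }
    }

    // g(argument) = f(#specificProgram, argument): the dispatch on each Op happens
    // once here, not on every later call of the returned function.
    static IntUnaryOperator specializeInterpreter(Op[] program) {
        IntUnaryOperator compiled = IntUnaryOperator.identity();
        for (Op op : program) {
            compiled = compiled.andThen(compileOp(op));
        }
        return compiled;
    }

    public static void main(String[] args) {
        System.out.println(specializeF(1).applyAsInt(5));
        Op[] program = { Op.INC, Op.DOUBLE };
        System.out.println(interpret(program, 3));
        System.out.println(specializeInterpreter(program).applyAsInt(3));
    }
}
```

For any program and argument, the specialized function and the interpreter are expected to return the same value; only the amount of work done per call differs.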



… and Deoptimize and Reoptimize!

[Diagram, three stages: (1) deoptimization to the AST interpreter; (2) node rewriting to update profiling feedback; (3) recompilation using partial evaluation. The trees contain G, I, and D nodes.]

Object add(Object a, Object b) {
    if (a instanceof Integer && b instanceof Integer) {
        return (int) a + (int) b;
    } else if (a instanceof String && b instanceof String) {
        return (String) a + (String) b;
    } else {
        return genericAdd(a, b);
    }
}

int add(int a, int b) {
    return a + b;
}

String add(String a, String b) {
    return a + b;
}

Object add(Object a, Object b) {
    return genericAdd(a, b);
}

Node Implementation
class IAddNode extends BinaryNode {
    int executeInt(Frame f) throws UnexpectedResult {
        int a;
        try {
            a = left.executeInt(f);
        } catch (UnexpectedResult ex) {
            // Left operand is not an int: rewrite this node.
            throw rewrite(f, ex.result, right.execute(f));
        }
        int b;
        try {
            b = right.executeInt(f);
        } catch (UnexpectedResult ex) {
            // Right operand is not an int: rewrite this node.
            throw rewrite(f, a, ex.result);
        }
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException ex) {
            // Integer overflow: rewrite this node.
            throw rewrite(f, a, b);
        }
    }
}
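
The rewriting idea can be tried out without the Truffle API. The following is a simplified sketch, not the implementation from the slide: `NodeRewritingSketch` and all classes in it are hypothetical, there is no `Frame` and no `UnexpectedResult`, and only the states Uninitialized, Int, and Generic are modeled. An `AddNode` holds its current specialization and replaces it when the observed operands no longer fit.

```java
// Hypothetical sketch: an add node that rewrites its own specialization at run time.
final class NodeRewritingSketch {
    abstract static class Node {
        abstract Object execute();
    }

    static final class Constant extends Node {
        final Object value;
        Constant(Object value) { this.value = value; }
        @Override Object execute() { return value; }
    }

    abstract static class AddImpl {
        abstract Object add(AddNode holder, Object a, Object b);
    }

    static final class AddNode extends Node {
        final Node left, right;
        AddImpl impl = new UninitializedAdd();   // replaced as operands are observed
        AddNode(Node left, Node right) { this.left = left; this.right = right; }
        @Override Object execute() { return impl.add(this, left.execute(), right.execute()); }
    }

    static final class UninitializedAdd extends AddImpl {
        @Override Object add(AddNode holder, Object a, Object b) {
            // First execution: pick a specialization based on the actual operands.
            if (a instanceof Integer && b instanceof Integer) {
                holder.impl = new IntAdd();
            } else {
                holder.impl = new GenericAdd();
            }
            return holder.impl.add(holder, a, b);
        }
    }

    static final class IntAdd extends AddImpl {
        @Override Object add(AddNode holder, Object a, Object b) {
            if (a instanceof Integer && b instanceof Integer) {
                int x = (Integer) a;
                int y = (Integer) b;
                try {
                    return Math.addExact(x, y);
                } catch (ArithmeticException overflow) {
                    // Fall through to the rewrite below.
                }
            }
            // Speculation failed: rewrite to the generic version and redo the addition.
            holder.impl = new GenericAdd();
            return holder.impl.add(holder, a, b);
        }
    }

    static final class GenericAdd extends AddImpl {
        @Override Object add(AddNode holder, Object a, Object b) {
            if (a instanceof Number && b instanceof Number) {
                return ((Number) a).doubleValue() + ((Number) b).doubleValue();
            }
            return String.valueOf(a) + String.valueOf(b);
        }
    }

    public static void main(String[] args) {
        AddNode small = new AddNode(new Constant(1), new Constant(2));
        System.out.println(small.execute() + " via " + small.impl.getClass().getSimpleName());
        AddNode big = new AddNode(new Constant(Integer.MAX_VALUE), new Constant(1));
        System.out.println(big.execute() + " via " + big.impl.getClass().getSimpleName());
    }
}
```

The design choice mirrored here is that a node only moves toward more general states, so the number of rewrites per node is bounded.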

Specializing FSA
[Diagram: finite state automaton over node specializations; the labeled states are Uninitialized, Double, String, and Generic.]

Truffle DSL
@Specialization(rewriteOn=ArithmeticException.class)
int addInt(int a, int b) {
    return Math.addExact(a, b);
}

@Specialization
double addDouble(double a, double b) {
    return a + b;
}

@Generic
Object addGeneric(Frame f, Object a, Object b) {
    // Handling of String omitted for simplicity.
    Number aNum = Runtime.toNumber(f, a);
    Number bNum = Runtime.toNumber(f, b);
    return Double.valueOf(aNum.doubleValue() +
            bNum.doubleValue());
}


Inline Caching
[Diagram: inline cache states and their node chains. uninitialized: U; monomorphic: S => U; polymorphic: S => … => S => U; megamorphic: G.]
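
The four cache states can be sketched as a small call-site cache in plain Java. This is an illustration under assumptions, not Truffle code: `InlineCacheSketch`, `CallSite`, `lookup`, and the limit `MAX_ENTRIES` are all hypothetical, and the cache is a list of entries keyed by receiver class rather than a chain of AST nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: a call-site cache keyed by receiver class.
final class InlineCacheSketch {
    static final int MAX_ENTRIES = 2;   // assumed limit before the site turns megamorphic
    static int slowLookups = 0;         // counts full (uncached) lookups

    static final class Entry {
        final Class<?> type;
        final Function<Object, String> target;
        Entry(Class<?> type, Function<Object, String> target) {
            this.type = type;
            this.target = target;
        }
    }

    // Stand-in for an expensive method lookup.
    static Function<Object, String> lookup(Class<?> type) {
        slowLookups++;
        if (type == Integer.class) return o -> "int:" + o;
        if (type == String.class) return o -> "str:" + o;
        return o -> "obj:" + o;
    }

    static final class CallSite {
        private final List<Entry> entries = new ArrayList<>();
        private boolean megamorphic = false;

        String state() {
            if (megamorphic) return "megamorphic";
            if (entries.isEmpty()) return "uninitialized";
            return entries.size() == 1 ? "monomorphic" : "polymorphic";
        }

        String call(Object receiver) {
            Class<?> type = receiver.getClass();
            if (!megamorphic) {
                for (Entry e : entries) {
                    if (e.type == type) return e.target.apply(receiver);   // cache hit
                }
                if (entries.size() < MAX_ENTRIES) {
                    Entry e = new Entry(type, lookup(type));               // extend the cache
                    entries.add(e);
                    return e.target.apply(receiver);
                }
                megamorphic = true;   // too many receiver types: give up caching
                entries.clear();
            }
            return lookup(type).apply(receiver);                           // generic path
        }
    }

    public static void main(String[] args) {
        CallSite site = new CallSite();
        Object[] receivers = { 1, 2, "a", 2.5 };
        for (Object r : receivers) {
            System.out.println(site.call(r) + " -> " + site.state());
        }
    }
}
```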



Method Inlining



Truffle API Compiler Directives
§  Guards
if (condition) {
    // some code that is only valid if condition is true
} else {
    CompilerDirectives.transferToInterpreter();
}

§  Assumptions
Assumption assumption = Truffle.getRuntime().createAssumption();

assumption.check();
// some code that is only valid if assumption is true

assumption.invalidate();
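
How an assumption guards cached state can be shown with a self-contained sketch that does not depend on the Truffle API. `AssumptionSketch`, `SimpleAssumption`, `GlobalVariable`, and `CachedRead` are hypothetical classes invented for this example; the slow path in `execute` stands in for the transfer to the interpreter followed by re-specialization.

```java
// Hypothetical sketch: a cached read that is valid only while an assumption holds.
final class AssumptionSketch {
    static final class SimpleAssumption {
        private volatile boolean valid = true;
        boolean isValid() { return valid; }
        void invalidate() { valid = false; }
    }

    // A variable that is assumed to stay unchanged between writes.
    static final class GlobalVariable {
        private Object value;
        private SimpleAssumption unchanged = new SimpleAssumption();

        GlobalVariable(Object initial) { this.value = initial; }

        Object read() { return value; }
        SimpleAssumption unchangedAssumption() { return unchanged; }

        void write(Object newValue) {
            value = newValue;
            unchanged.invalidate();               // everything that relied on the old value is stale
            unchanged = new SimpleAssumption();   // fresh assumption for the new value
        }
    }

    // A read node that caches the value as long as the assumption is valid.
    static final class CachedRead {
        private final GlobalVariable variable;
        private SimpleAssumption assumption;
        private Object cached;
        int refills = 0;                          // how often the slow path ran

        CachedRead(GlobalVariable variable) { this.variable = variable; }

        Object execute() {
            if (assumption == null || !assumption.isValid()) {
                // Slow path: re-read the variable and depend on its current assumption.
                assumption = variable.unchangedAssumption();
                cached = variable.read();
                refills++;
            }
            return cached;                        // fast path: no access to the variable
        }
    }

    public static void main(String[] args) {
        GlobalVariable variable = new GlobalVariable("a");
        CachedRead read = new CachedRead(variable);
        System.out.println(read.execute());
        variable.write("b");
        System.out.println(read.execute());
        System.out.println("refills: " + read.refills);
    }
}
```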



Performance Number Disclaimers
§  All Truffle numbers reflect the current development snapshot.
–  Subject to change at any time (hopefully improve)
–  You have to know a benchmark to understand why it is slow or fast

§  We are not claiming to have complete language implementations.
–  JavaScript: quite complete, passing 99.8% of ECMAScript262 tests
–  Ruby: passing >45% of RubySpec language tests
–  R: early prototype

§  We measure against latest versions of competitors.
§  We measure peak performance (i.e., giving each benchmark enough iterations to warm up before starting measurement).

§  Benchmarks that are not shown
–  may not run at all, or
–  may not run fast



Peak Performance: JavaScript
Speedup relative to V8
[Bar chart: Truffle and SpiderMonkey on richards, deltablue, crypto, raytrace, navier-stokes, splay, earley-boyer, box2d, and gbemu, plus the Composite score; y-axis from 0.0 to 3.0.]

Selection of benchmarks from Google's Octane benchmark suite v1.0


Peak Performance: Ruby
Speedup relative to JRuby 1.7.5

[Bar chart comparing MRI 2.0.0, Topaz, and Truffle; y-axis from 0 to 16.]

Peak Performance: R
Speedup relative to GNU R
[Bar chart; y-axis from 0 to 100.]

Language Implementations

[Diagram with the labels: Simple Language, Ruby, C, R, JavaScript, Python, Smalltalk, Java, Your language?]

Your Language?
$ hg clone http://hg.openjdk.java.net/graal/graal
$ cd graal
$ ./mx.sh --vm server build
$ ./mx.sh ideinit
$ ./mx.sh --vm server unittest SumTest

§  Truffle API Resources

https://wiki.openjdk.java.net/display/Graal/Truffle+FAQ+and+Guidelines
§  Truffle API License: GPLv2 with Classpath Exception



Acknowledgements
Oracle Labs
Laurent Daynès
Erik Eckstein
Michael Haupt
Peter Kessler
Christos Kotselidis
David Leibs
Roland Schatz
Chris Seaton
Doug Simon
Michael Van De Vanter
Christian Wimmer
Christian Wirth
Mario Wolczko
Thomas Würthinger
Laura Hill (Manager)
Interns
Danilo Ansaloni
Daniele Bonetta
Shams Imam
Stephen Kell
Gregor Richards
Rifat Shariyar



JKU Linz
Prof. Hanspeter Mössenböck
Gilles Duboscq
Matthias Grimmer
Christian Häubl
Josef Haider
Christian Humer
Christian Huber
Manuel Rigger
Lukas Stadler
Bernhard Urban
Andreas Wöß
University of Edinburgh
Christophe Dubach
Juan José Fumero Alfonso
Ranjeet Singh
Toomas Remmelg
LaBRI
Floréal Morandat

University of California, Irvine
Prof. Michael Franz
Codrut Stancu
Gulfem Savrun Yeniceri
Wei Zhang
Purdue University
Prof. Jan Vitek
Tomas Kalibera
Petr Maj 
Lei Zhao
T. U. Dortmund
Prof. Peter Marwedel
Helena Kotthaus
Ingo Korb
University of California, Davis
Prof. Duncan Temple Lang
Nicholas Ulle

@thomaswue

Q/A
