Building High-Performance
Language Implementations
With Low Effort
Stefan Marr
FOSDEM 2015, Brussels, Belgium
January 31st, 2015
@smarr
http://stefan-marr.de
Why should you care about how
Programming Languages work?
SMBC: http://www.smbc-comics.com/?id=2088
• Performance isn’t magic
• Domain-specific languages
• More concise
• More productive
• It’s easier than it looks
• Often open source
• Contributions welcome
What’s “High-Performance”?
4
Based on latest data from http://benchmarksgame.alioth.debian.org/
Geometric mean over available benchmarks.
Disclaimer: Not indicative of application performance!
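A geometric mean multiplies the per-benchmark ratios and takes the n-th root, so a single outlier benchmark weighs less than it would in an arithmetic mean. The sketch below assumes slowdown ratios relative to some baseline; the numbers are hypothetical and are not taken from the benchmarks game data.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logarithms."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical slowdown factors, one per benchmark (not real measurements).
hypothetical_ratios = [2.0, 8.0]
gm = geometric_mean(hypothetical_ratios)
```

With these two made-up ratios the arithmetic mean would be 5.0, while the geometric mean is 4.0.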
Competitively Fast!
[Bar chart, axis 0 to 18: Java, V8, C#, Dart, Python, Lua, PHP, Ruby]
What’s “Low Effort”?
Small and Manageable
KLOC: 1000 Lines of Code, without blank lines and comments
[Bar chart, log scale 1 to 1000 KLOC. Bar values: 16, 260, 525, 562. Labels: V8 JavaScript, HotSpot Java Virtual Machine, Dart VM, Lua 5.3 interp.]
Language Implementation Approaches
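Interpreter: the Source Program (development time) and the Input go into the Interpreter, which produces the Output (run time). Simple, but often slow.
Compiler: the Source Program is turned into a Binary by the Compiler (development time); the Binary maps Input to Output (run time). More complex, but often faster. Not ideal for all languages.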
Modern Virtual Machines
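[Diagram: a Virtual Machine with Just-In-Time Compilation. The Source Program (development time) and the Input go into the Interpreter (run time); Runtime Info from the Interpreter feeds the Compiler, which produces a Binary; the result is the Output.]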
VMs are Highly Complex
Components: Interpreter, Compiler, Optimizer, CodeGen, Garbage Collector, Foreign Function Interface, Threads and Memory Model, Debugging, Profiling, …
Easily 500 KLOC
How to reuse most parts for a new language?
Make Interpreters Replaceable Components!
[Diagram: several interchangeable Interpreters on top of shared components: Compiler, Optimizer, CodeGen, Garbage Collector, Foreign Function Interface, Threads and Memory Model, …]
Interpreter-based Approaches
Truffle + Graal with Partial Evaluation (Oracle Labs) [3]
RPython with Meta-Tracing [2]
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
SELF-OPTIMIZING TREES
A Simple Technique for Language Implementation and Optimization
[1] Würthinger, T.; Wöß, A.; Stadler, L.; Duboscq, G.; Simon, D. & Wimmer, C. (2012), Self-Optimizing AST Interpreters, in 'Proc. of the 8th Dynamic Languages Symposium', pp. 73-82.
Code Convention
Python-ish: Interpreter Code
Java-ish: Application Code
A Simple
Abstract Syntax Tree Interpreter
13
root_node = parse(file)
root_node.execute(Frame())
if (condition) {
  cnt := cnt + 1;
} else {
  cnt := 0;
}
[AST diagram: root_node is an if node with children cond, cnt := cnt + 1, and cnt := 0]
Implementing AST Nodes
if (condition) {
  cnt := cnt + 1;
} else {
  cnt := 0;
}

class Literal(ASTNode):
    final value
    def execute(frame):
        return value

class VarWrite(ASTNode):
    child sub_expr
    final idx
    def execute(frame):
        val := sub_expr.execute(frame)
        frame.local_obj[idx] := val
        return val

class VarRead(ASTNode):
    final idx
    def execute(frame):
        return frame.local_obj[idx]

[AST diagram: if node with children cond, cnt := cnt + 1, and cnt := 0]
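The node classes above can be turned into a small runnable interpreter. The following is a minimal sketch in plain Python, not the talk's actual implementation: `Frame`, `Add`, `If`, `run`, and the slot indices `CNT` and `COND` are names assumed here for illustration.

```python
class Frame:
    """Holds the local variable slots of one activation."""
    def __init__(self, num_locals):
        self.local_obj = [None] * num_locals


class ASTNode:
    def execute(self, frame):
        raise NotImplementedError


class Literal(ASTNode):
    def __init__(self, value):
        self.value = value

    def execute(self, frame):
        return self.value


class VarRead(ASTNode):
    def __init__(self, idx):
        self.idx = idx

    def execute(self, frame):
        return frame.local_obj[self.idx]


class VarWrite(ASTNode):
    def __init__(self, idx, sub_expr):
        self.idx = idx
        self.sub_expr = sub_expr

    def execute(self, frame):
        val = self.sub_expr.execute(frame)
        frame.local_obj[self.idx] = val
        return val


class Add(ASTNode):  # hypothetical node, not shown on the slide
    def __init__(self, left, right):
        self.left, self.right = left, right

    def execute(self, frame):
        return self.left.execute(frame) + self.right.execute(frame)


class If(ASTNode):  # hypothetical node, not shown on the slide
    def __init__(self, cond, then_branch, else_branch):
        self.cond = cond
        self.then_branch = then_branch
        self.else_branch = else_branch

    def execute(self, frame):
        if self.cond.execute(frame):
            return self.then_branch.execute(frame)
        return self.else_branch.execute(frame)


CNT, COND = 0, 1  # slot indices chosen for this example


def run(cond, cnt):
    """Builds and executes: if (cond) { cnt := cnt + 1 } else { cnt := 0 }."""
    root_node = If(VarRead(COND),
                   VarWrite(CNT, Add(VarRead(CNT), Literal(1))),
                   VarWrite(CNT, Literal(0)))
    frame = Frame(2)
    frame.local_obj[CNT] = cnt
    frame.local_obj[COND] = cond
    root_node.execute(frame)
    return frame.local_obj[CNT]
```

Each node only knows how to execute itself and delegates to its children, which is what keeps this style of interpreter small.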
Self-Optimization by Node Specialization
cnt := cnt + 1

def UninitVarWrite.execute(frame):
    val := sub_expr.execute(frame)
    return specialize(val).execute_evaluated(frame, val)

def UninitVarWrite.specialize(val):
    if val instanceof int:
        return replace(IntVarWrite(sub_expr))
    elif …:
        …
    else:
        return replace(GenericVarWrite(sub_expr))

[AST diagram: the uninitialized variable write node for cnt := cnt + 1 is replaced by a specialized node]
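A runnable sketch of this node replacement, under the assumption that every node keeps a `parent` pointer so it can swap itself out. The helpers `adopt`, `replace`, `Root`, and the `VarWrite` base class are hypothetical names for this example and do not come from Truffle or RPython.

```python
class Frame:
    def __init__(self, num_locals):
        self.local_obj = [None] * num_locals
        self.local_int = [0] * num_locals


class Node:
    parent = None

    def adopt(self, child):
        child.parent = self
        return child

    def replace(self, new_node):
        """Swap this node for new_node in whichever parent field holds it."""
        parent = self.parent
        for name, value in list(vars(parent).items()):
            if value is self:
                setattr(parent, name, new_node)
        new_node.parent = parent
        return new_node


class Root(Node):
    def __init__(self, body):
        self.body = self.adopt(body)

    def execute(self, frame):
        return self.body.execute(frame)


class Literal(Node):
    def __init__(self, value):
        self.value = value

    def execute(self, frame):
        return self.value


class VarWrite(Node):
    def __init__(self, idx, sub_expr):
        self.idx = idx
        self.sub_expr = self.adopt(sub_expr)


class UninitVarWrite(VarWrite):
    def execute(self, frame):
        val = self.sub_expr.execute(frame)
        return self.specialize(val).execute_evaluated(frame, val)

    def specialize(self, val):
        if type(val) is int:
            return self.replace(IntVarWrite(self.idx, self.sub_expr))
        return self.replace(GenericVarWrite(self.idx, self.sub_expr))


class IntVarWrite(VarWrite):
    def execute(self, frame):
        val = self.sub_expr.execute(frame)
        if type(val) is not int:
            # Speculation failed: fall back to the generic node.
            generic = self.replace(GenericVarWrite(self.idx, self.sub_expr))
            return generic.execute_evaluated(frame, val)
        return self.execute_evaluated(frame, val)

    def execute_evaluated(self, frame, val):
        frame.local_int[self.idx] = val
        return val


class GenericVarWrite(VarWrite):
    def execute(self, frame):
        return self.execute_evaluated(frame, self.sub_expr.execute(frame))

    def execute_evaluated(self, frame, val):
        frame.local_obj[self.idx] = val
        return val
```

After the first execution the tree no longer contains the uninitialized node, so later executions skip the specialization decision entirely.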
Self-Optimization by Node Specialization
cnt := cnt + 1

def IntVarWrite.execute(frame):
    try:
        val := sub_expr.execute_int(frame)
        return execute_eval_int(frame, val)
    except ResultExp, e:
        return respecialize(e.result).execute_evaluated(frame, e.result)

def IntVarWrite.execute_eval_int(frame, anInt):
    frame.local_int[idx] := anInt
    return anInt

[AST diagram: cnt := cnt + 1 with an int variable write node]
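The exception-based protocol can be sketched as follows: a typed `execute_int` either returns an int or raises an exception that carries the value it already computed, so the caller can respecialize without evaluating the child twice. `UnexpectedResult` follows the name used later in the slides; `VarWriteSlot` is a hypothetical holder assumed here so the node can be swapped in place.

```python
class UnexpectedResult(Exception):
    """Raised by a typed execute method when the value has another type."""
    def __init__(self, result):
        super().__init__(result)
        self.result = result


class Frame:
    def __init__(self, num_locals):
        self.local_obj = [None] * num_locals
        self.local_int = [0] * num_locals


class Literal:
    def __init__(self, value):
        self.value = value

    def execute(self, frame):
        return self.value

    def execute_int(self, frame):
        if type(self.value) is int:
            return self.value
        raise UnexpectedResult(self.value)


class VarWriteSlot:
    """Hypothetical holder that lets the write node be replaced in place."""
    def __init__(self, idx, sub_expr):
        self.node = IntVarWrite(self, idx, sub_expr)

    def execute(self, frame):
        return self.node.execute(frame)


class IntVarWrite:
    def __init__(self, slot, idx, sub_expr):
        self.slot, self.idx, self.sub_expr = slot, idx, sub_expr

    def execute(self, frame):
        try:
            val = self.sub_expr.execute_int(frame)
        except UnexpectedResult as e:
            # The value is already computed; do not evaluate sub_expr again.
            generic = GenericVarWrite(self.idx, self.sub_expr)
            self.slot.node = generic
            return generic.execute_evaluated(frame, e.result)
        frame.local_int[self.idx] = val
        return val


class GenericVarWrite:
    def __init__(self, idx, sub_expr):
        self.idx, self.sub_expr = idx, sub_expr

    def execute(self, frame):
        return self.execute_evaluated(frame, self.sub_expr.execute(frame))

    def execute_evaluated(self, frame, val):
        frame.local_obj[self.idx] = val
        return val
```

Carrying the result inside the exception matters when the child expression has side effects, because re-executing it would repeat them.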
Some Possible Self-Optimizations
• Type profiling and specialization
• Value caching
• Inline caching
• Operation inlining
• Library Lowering
17
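Of the optimizations listed, inline caching is probably the easiest to show in a few lines. The sketch below is a monomorphic cache at a single call site and assumes Python's own class-based lookup stands in for the guest language's method lookup; `InlineCachedSend` and its `misses` counter are hypothetical names.

```python
class InlineCachedSend:
    """Call site that remembers the method found for the last receiver class."""
    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None
        self.cached_method = None
        self.misses = 0

    def execute(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:
            # Cache miss: do the slow lookup once and remember the result.
            self.misses += 1
            self.cached_class = cls
            self.cached_method = getattr(cls, self.selector)
        return self.cached_method(receiver, *args)


site = InlineCachedSend("upper")
first = site.execute("a")
second = site.execute("b")
```

As long as the receiver class stays the same, the call site pays for one class comparison instead of a full lookup.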
Library Lowering for Array class
createSomeArray() { return Array.new(1000, ‘fast fast fast’); }

class Array {
  static new(size, lambda) {
    return new(size).setAll(lambda);
  }
  setAll(lambda) {
    forEach((i, v) -> { this[i] = lambda.eval(); });
  }
}
class Object {
  eval() { return this; }
}
Optimizing for Object Values
createSomeArray() { return Array.new(1000, ‘fast fast fast’); }
[AST diagram: method invocation .new with children Array (global lookup), 1000 (int literal), and ‘fast’ (string literal)]
The string literal is an Object, but not a lambda: optimization potential.
Specialized new(size, lambda)
def UninitArrNew.execute(frame):
    size := size_expr.execute(frame)
    val := val_expr.execute(frame)
    return specialize(size, val).execute_evaluated(frame, size, val)

createSomeArray() { return Array.new(1000, ‘fast fast fast’); }

def UninitArrNew.specialize(size, val):
    if val instanceof Lambda:
        return replace(StdMethodInvocation())
    else:
        return replace(ArrNewWithValue())
Specialized new(size, lambda)
def ArrNewWithValue.execute_evaluated(frame, size, val):
    return Array([val] * size)

createSomeArray() { return Array.new(1000, ‘fast fast fast’); }
1 specialized node vs. 1000x `this[i] = lambda.eval()` and 1000x `eval() { return this; }`
[AST diagram: specialized .new node with children Array (global lookup), 1000 (int literal), and ‘fast’ (string literal)]
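The effect of this lowering can be sketched in plain Python. The generic path mirrors `setAll` and evaluates its argument once per element; the lowered path fills the array in a single step when the argument is a plain value. `Lambda`, `eval_value`, and the three `array_new*` functions are names assumed for this example.

```python
class Lambda:
    """Stand-in for a guest-language lambda object."""
    def __init__(self, fn):
        self.fn = fn

    def eval(self):
        return self.fn()


def eval_value(obj):
    """Mirrors Object.eval: a lambda is called, any other object is itself."""
    return obj.eval() if isinstance(obj, Lambda) else obj


def array_new_generic(size, value_or_lambda):
    """Unspecialized path: one eval per element."""
    arr = [None] * size
    for i in range(size):
        arr[i] = eval_value(value_or_lambda)
    return arr


def array_new_with_value(size, val):
    """Lowered path for non-lambda values: no per-element eval."""
    return [val] * size


def array_new(size, value_or_lambda):
    if isinstance(value_or_lambda, Lambda):
        return array_new_generic(size, value_or_lambda)
    return array_new_with_value(size, value_or_lambda)
```

Both paths must produce the same array for plain values; the lambda case keeps the generic path because each call may return something different.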
JUST-IN-TIME COMPILATION FOR
INTERPRETERS
Generating Efficient Native Code
22
How to Get Fast Program Execution?
VarWrite.execute(frame)          -> ..VW_execute()   # bin
IntVarWrite.execute(frame)       -> ..IVW_execute()  # bin
VarRead.execute(frame)           -> ..VR_execute()   # bin
Literal.execute(frame)           -> ..L_execute()    # bin
ArrayNewWithValue.execute(frame) -> ..ANWV_execute() # bin
Standard Compilation: 1 node at a time
Minimal Optimization Potential
Problems with Node-by-Node Compilation
Slow Polymorphic Dispatches

def IntVarWrite.execute(frame):
    try:
        val := sub_expr.execute_int(frame)
        return execute_eval_int(frame, val)
    except ResultExp, e:
        return respecialize(e.result).execute_evaluated(frame, e.result)

Runtime checks in general
[AST diagram: cnt := cnt + 1]
Compilation Unit based on User Program
Meta-Tracing [2]
Partial Evaluation Guided By AST [3]
[AST diagrams: if node with branches cnt := cnt + 1 and cnt := 0, shown once per approach]
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
RPython
Just-in-Time Compilation with
Meta Tracing
RPython
• Subset of Python
– Type-inferenced
• Generates VMs
[Diagram: Interpreter source -> RPython Toolchain -> VM containing Interpreter, Meta-Tracing JIT Compiler, Garbage Collector, …]
http://rpython.readthedocs.org/
Meta-Tracing of an Interpreter
[AST diagram: if node with branches cnt := cnt + 1 and cnt := 0]
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop
2009, ACM, pp. 18-25.
Meta Tracers need to know the Loops
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            jit_merge_point(node=self)
            cond = cond_expr.execute_bool(frame)
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
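To show why the tracer needs this annotation, here is a toy stand-in for the loop hook. It is not the RPython API: `ToyJitDriver`, `hot_threshold`, and `hot_loops` are hypothetical, and the condition and body are plain Python callables to keep the sketch short. The merge point gives the JIT a per-loop key at which it can count iterations and decide when a loop is hot enough to trace.

```python
class ToyJitDriver:
    """Counts how often each loop reaches its merge point (toy model only)."""
    def __init__(self, hot_threshold):
        self.hot_threshold = hot_threshold
        self.counters = {}
        self.hot_loops = []

    def jit_merge_point(self, node):
        count = self.counters.get(node, 0) + 1
        self.counters[node] = count
        if count == self.hot_threshold:
            # A real tracing JIT would start recording a trace here.
            self.hot_loops.append(node)


driver = ToyJitDriver(hot_threshold=3)


class WhileNode:
    def __init__(self, cond, body):
        self.cond, self.body = cond, body

    def execute(self, frame):
        while True:
            driver.jit_merge_point(node=self)
            if not self.cond(frame):
                break
            self.body(frame)


def increment(frame):
    frame["cnt"] += 1


frame = {"cnt": 0}
loop = WhileNode(lambda f: f["cnt"] < 100, increment)
loop.execute(frame)
```

Without the hook, the tracer would only see the loop of the interpreter itself, not the loop of the user program being interpreted.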
Tracing Records one Concrete Execution
class IntLessThan(ASTNode):
    child left_expr
    child right_expr
    def execute_bool(frame):
        try:
            left = left_expr.execute_int()
        except UnexpectedResult r:
            ...
        try:
            right = right_expr.execute_int()
        except UnexpectedResult r:
            ...
        return left < right

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
Tracing Records one Concrete Execution
class IntVarRead(ASTNode):
    final idx
    def execute_int(frame):
        if frame.is_int(idx):
            return frame.local_int[idx]
        else:
            new_node = respecialize()
            raise UnexpectedResult(new_node.execute())

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
Tracing Records one Concrete Execution
class IntVarRead(ASTNode):
    final idx
    def execute_int(frame):
        if frame.is_int(idx):
            return frame.local_int[idx]
        else:
            new_node = respecialize()
            raise UnexpectedResult(new_node.execute())

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
Tracing Records one Concrete Execution
class IntVarRead(ASTNode):
    final idx
    def execute_int(frame):
        if frame.is_int(idx):
            return frame.local_int[idx]
        else:
            new_node = respecialize()
            raise UnexpectedResult(new_node.execute())

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
Tracing Records one Concrete Execution
class IntLessThan(ASTNode):
    child left_expr
    child right_expr
    def execute_bool(frame):
        try:
            left = left_expr.execute_int()
        except UnexpectedResult r:
            ...
        try:
            right = right_expr.execute_int()
        except UnexpectedResult r:
            ...
        return left < right

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
guard_no_exception(Const(UnexpectedResult))
Tracing Records one Concrete Execution
class IntLessThan(ASTNode):
    child left_expr
    child right_expr
    def execute_bool(frame):
        try:
            left = left_expr.execute_int()
        except UnexpectedResult r:
            ...
        try:
            right = right_expr.execute_int()
        except UnexpectedResult r:
            ...
        return left < right

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
guard_no_exception(Const(UnexpectedResult))
guard(right_expr == Const(IntLiteral))
Tracing Records one Concrete Execution
class IntLessThan(ASTNode):
    child left_expr
    child right_expr
    def execute_bool(frame):
        try:
            left = left_expr.execute_int()
        except UnexpectedResult r:
            ...
        try:
            right = right_expr.execute_int()
        except UnexpectedResult r:
            ...
        return left < right

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
guard_no_exception(Const(UnexpectedResult))
guard(right_expr == Const(IntLiteral))
i5 := right_expr.value # Const(100)
Tracing Records one Concrete Execution
class IntLessThan(ASTNode):
    child left_expr
    child right_expr
    def execute_bool(frame):
        try:
            left = left_expr.execute_int()
        except UnexpectedResult r:
            ...
        try:
            right = right_expr.execute_int()
        except UnexpectedResult r:
            ...
        return left < right

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
guard_no_exception(Const(UnexpectedResult))
guard(right_expr == Const(IntLiteral))
i5 := right_expr.value # Const(100)
guard_no_exception(Const(UnexpectedResult))
Tracing Records one Concrete Execution
class IntLessThan(ASTNode):
    child left_expr
    child right_expr
    def execute_bool(frame):
        try:
            left = left_expr.execute_int()
        except UnexpectedResult r:
            ...
        try:
            right = right_expr.execute_int()
        except UnexpectedResult r:
            ...
        return left < right

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
guard_no_exception(Const(UnexpectedResult))
guard(right_expr == Const(IntLiteral))
i5 := right_expr.value # Const(100)
guard_no_exception(Const(UnexpectedResult))
b1 := i4 < i5
Tracing Records one Concrete Execution
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            jit_merge_point(node=self)
            cond = cond_expr.execute_bool(frame)
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
guard_no_exception(Const(UnexpectedResult))
guard(right_expr == Const(IntLiteral))
i5 := right_expr.value # Const(100)
guard_no_exception(Const(UnexpectedResult))
b1 := i4 < i5
guard_true(b1)
Tracing Records one Concrete Execution
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            jit_merge_point(node=self)
            cond = cond_expr.execute_bool(frame)
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

Trace
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
guard_no_exception(Const(UnexpectedResult))
guard(right_expr == Const(IntLiteral))
i5 := right_expr.value # Const(100)
guard_no_exception(Const(UnexpectedResult))
b1 := i4 < i5
guard_true(b1)
...
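The recording step can be imitated with a hand-instrumented toy. A real meta-tracer records the interpreter's operations automatically; in this sketch each node appends its own operations to a list, which is an assumption made only to keep the example small. The trace strings are illustrative and do not follow RPython's trace format.

```python
F_INT = "int"


class Frame:
    def __init__(self, ints):
        self.local_int = list(ints)
        self.layout = [F_INT] * len(ints)


class IntLiteral:
    def __init__(self, value):
        self.value = value

    def execute_int(self, frame, trace, name):
        trace.append("guard(%s is IntLiteral)" % name)
        trace.append("const %d" % self.value)
        return self.value


class IntVarRead:
    def __init__(self, idx):
        self.idx = idx

    def execute_int(self, frame, trace, name):
        trace.append("guard(%s is IntVarRead)" % name)
        trace.append("guard(frame.layout[%d] == F_INT)" % self.idx)
        trace.append("load frame.local_int[%d]" % self.idx)
        return frame.local_int[self.idx]


class IntLessThan:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def execute_bool(self, frame, trace):
        trace.append("guard(cond_expr is IntLessThan)")
        left = self.left.execute_int(frame, trace, "left_expr")
        right = self.right.execute_int(frame, trace, "right_expr")
        result = left < right
        trace.append("lt")
        # Only the outcome that actually happened is recorded, as a guard.
        trace.append("guard_true" if result else "guard_false")
        return result


def trace_condition(cond, frame):
    trace = []
    cond.execute_bool(frame, trace)
    return trace


cond = IntLessThan(IntVarRead(1), IntLiteral(100))
recorded = trace_condition(cond, Frame([0, 5]))
```

The trace is linear: branches that were not taken in this one execution do not appear, and the guards mark the places where a later run could diverge.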
Traces are Ideal for Optimization
Recorded trace:
guard(cond_expr == Const(IntLessThan))
guard(left_expr == Const(IntVarRead))
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[i1]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
guard_no_exception(Const(UnexpectedResult))
guard(right_expr == Const(IntLiteral))
i5 := right_expr.value # Const(100)
guard_no_exception(Const(UnexpectedResult))
b1 := i4 < i5
guard_true(b1)
...

With guards on constants and exception checks removed:
i1 := left_expr.idx # Const(1)
a1 := frame.layout
i2 := a1[Const(1)]
guard(i2 == Const(F_INT))
i3 := left_expr.idx # Const(1)
a2 := frame.local_int
i4 := a2[i3]
i5 := right_expr.value # Const(100)
b1 := i4 < i5
guard_true(b1)
...

With constants folded and variables renumbered:
a1 := frame.layout
i1 := a1[1]
guard(i1 == F_INT)
a2 := frame.local_int
i2 := a2[1]
b1 := i2 < 100
guard_true(b1)
...
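Because a trace is a straight line of operations, a single forward pass is enough for this kind of cleanup. The sketch below uses a made-up tuple encoding of trace operations (`const`, `guard_eq`, `load`, `lt`, `guard_true`), not RPython's actual intermediate representation, and applies constant propagation plus removal of guards whose operand is a known constant.

```python
def optimize(trace):
    """One forward pass: propagate constants and drop guards on constants."""
    known = {}  # variable name -> constant value
    out = []

    def val(x):
        return known.get(x, x)

    for op in trace:
        kind = op[0]
        if kind == "const":
            _, dst, value = op
            known[dst] = value  # remember the constant, emit nothing
        elif kind == "guard_eq":
            _, var, expected = op
            if var in known and known[var] == expected:
                continue  # a guard on a matching constant always holds
            out.append(("guard_eq", val(var), expected))
        elif kind == "load":
            _, dst, array, index = op
            out.append(("load", dst, array, val(index)))
        elif kind == "lt":
            _, dst, a, b = op
            out.append(("lt", dst, val(a), val(b)))
        else:
            out.append(op)
    return out


# Hypothetical encoding of the trace for `cnt < 100`.
trace = [
    ("const", "cond_class", "IntLessThan"),
    ("guard_eq", "cond_class", "IntLessThan"),
    ("const", "left_class", "IntVarRead"),
    ("guard_eq", "left_class", "IntVarRead"),
    ("const", "i1", 1),
    ("load", "i2", "frame.layout", "i1"),
    ("guard_eq", "i2", "F_INT"),
    ("const", "i3", 1),
    ("load", "i4", "frame.local_int", "i3"),
    ("const", "i5", 100),
    ("lt", "b1", "i4", "i5"),
    ("guard_true", "b1"),
]
optimized = optimize(trace)
```

The five remaining operations correspond to the last stage shown above: two loads, the layout guard, the comparison against 100, and the loop-condition guard.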
Truffle + Graal
Just-in-Time Compilation with
Partial Evaluation
Oracle Labs
Truffle+Graal
• Java framework
– AST interpreters
• Based on HotSpot JVM
[Diagram: Interpreter + Truffle Framework on the HotSpot JVM, together with Graal Compiler + Truffle Partial Evaluator, Garbage Collector, …]
http://www.ssw.uni-linz.ac.at/Research/Projects/JVM/Truffle.html
http://www.oracle.com/technetwork/oracle-labs/program-languages/overview/index-2301583.html
Partial Evaluation Guided By AST
[AST diagram: if node with branches cnt := cnt + 1 and cnt := 0]
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            cond = cond_expr.execute_bool(frame)
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}
Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            cond = cond_expr.execute_bool(frame)
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

class IntLessThan(ASTNode):
    child left_expr
    child right_expr
    def execute_bool(frame):
        try:
            left = left_expr.execute_int()
        except UnexpectedResult r:
            ...
        try:
            right = right_expr.execute_int()
        except UnexpectedResult r:
            ...
        return left < right
Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            try:
                left = cond_expr.left_expr.execute_int()
            except UnexpectedResult r:
                ...
            try:
                right = cond_expr.right_expr.execute_int()
            except UnexpectedResult r:
                ...
            cond = left < right
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}
Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            try:
                left = cond_expr.left_expr.execute_int()
            except UnexpectedResult r:
                ...
            try:
                right = cond_expr.right_expr.execute_int()
            except UnexpectedResult r:
                ...
            cond = left < right
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

class IntVarRead(ASTNode):
    final idx
    def execute_int(frame):
        if frame.is_int(idx):
            return frame.local_int[idx]
        else:
            new_node = respecialize()
            raise UnexpectedResult(new_node.ex
Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            try:
                if frame.is_int(1):
                    left = frame.local_int[1]
                else:
                    new_node = respecialize()
                    raise UnexpectedResult(new_node.execute())
            except UnexpectedResult r:
                ...
            try:
                right = cond_expr.right_expr.execute_int()
            except UnexpectedResult r:
                ...
            cond = left < right
            if not cond:
                break

while (cnt < 100) {
  cnt := cnt + 1;
}
Optimize Optimistically
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            try:
                if frame.is_int(1):
                    left = frame.local_int[1]
                else:
                    new_node = respecialize()
                    raise UnexpectedResult(new_node.execute())
            except UnexpectedResult r:
                ...
            try:
                right = cond_expr.right_expr.execute_int()
            except UnexpectedResult r:
                ...
            cond = left < right
            if not cond:
                break

while (cnt < 100) {
  cnt := cnt + 1;
}
Optimize Optimistically
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            if frame.is_int(1):
                left = frame.local_int[1]
            else:
                __deopt_return_to_interp()
            try:
                right = cond_expr.right_expr.execute_int()
            except UnexpectedResult r:
                ...
            cond = left < right
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}
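The optimistic step replaces the slow respecialization branch with a bail-out. A minimal sketch of that idea, assuming an exception named `Deopt` stands in for `__deopt_return_to_interp()` and a hand-written `interpreted_loop` stands in for the general interpreter; the `Frame` layout tracking is likewise hypothetical.

```python
class Deopt(Exception):
    """Signals that a speculation failed and the interpreter must take over."""


class Frame:
    def __init__(self, value):
        self.local_int = [0, 0]
        self.local_obj = [None, None]
        self.int_slots = set()
        self.write(1, value)

    def is_int(self, idx):
        return idx in self.int_slots

    def read(self, idx):
        return self.local_int[idx] if self.is_int(idx) else self.local_obj[idx]

    def write(self, idx, value):
        if type(value) is int:
            self.local_int[idx] = value
            self.int_slots.add(idx)
        else:
            self.local_obj[idx] = value
            self.int_slots.discard(idx)


def compiled_loop(frame):
    """Fast path that speculates that slot 1 holds an int."""
    while True:
        if not frame.is_int(1):
            raise Deopt()
        if not (frame.local_int[1] < 100):
            break
        frame.local_int[1] += 1


def interpreted_loop(frame):
    """Slow but fully general path."""
    while frame.read(1) < 100:
        frame.write(1, frame.read(1) + 1)


def run(frame):
    try:
        compiled_loop(frame)
        return "compiled"
    except Deopt:
        interpreted_loop(frame)
        return "interpreted"
```

The compiled code stays small because the rare case is not compiled at all; it only needs a cheap check and a way back to the interpreter.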
Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            if frame.is_int(1):
                left = frame.local_int[1]
            else:
                __deopt_return_to_interp()
            try:
                right = cond_expr.right_expr.execute_int()
            except UnexpectedResult r:
                ...
            cond = left < right
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

class IntLiteral(ASTNode):
    final value
    def execute_int(frame):
        return value
Partial Evaluation inlines
based on Runtime Constants
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            if frame.is_int(1):
                left = frame.local_int[1]
            else:
                __deopt_return_to_interp()
            try:
                right = 100
            except UnexpectedResult r:
                ...
            cond = left < right
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

class IntLiteral(ASTNode):
    final value
    def execute_int(frame):
        return value
Classic Optimizations:
Dead Code Elimination
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            if frame.is_int(1):
                left = frame.local_int[1]
            else:
                __deopt_return_to_interp()
            try:
                right = 100
            except UnexpectedResult r:
                ...
            cond = left < right
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

class IntLiteral(ASTNode):
    final value
    def execute_int(frame):
        return value
Classic Optimizations:
Constant Propagation
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            if frame.is_int(1):
                left = frame.local_int[1]
            else:
                __deopt_return_to_interp()
            right = 100
            cond = left < right
            if not cond:
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}

class IntLiteral(ASTNode):
    final value
    def execute_int(frame):
        return value
Classic Optimizations:
Loop Invariant Code Motion
class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        while True:
            if frame.is_int(1):
                left = frame.local_int[1]
            else:
                __deopt_return_to_interp()
            if not (left < 100):
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}
Classic Optimizations:
Loop Invariant Code Motion

class WhileNode(ASTNode):
    child cond_expr
    child body_expr
    def execute(frame):
        if not frame.is_int(1):
            __deopt_return_to_interp()
        while True:
            if not (frame.local_int[1] < 100):
                break
            body_expr.execute(frame)

while (cnt < 100) {
  cnt := cnt + 1;
}
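The overall effect of partial evaluation, specializing the interpreter to one fixed AST so that only the residual program remains, can be imitated by hand. In this sketch every node spells out its own residual Python source through a hypothetical `residual()` method, and the result is compiled with `exec`. Truffle derives the residual code automatically from the interpreter's execute methods, so this illustrates the outcome, not the mechanism.

```python
class IntLiteral:
    def __init__(self, value):
        self.value = value

    def residual(self):
        return repr(self.value)  # the constant is baked into the code


class IntVarRead:
    def __init__(self, idx):
        self.idx = idx

    def residual(self):
        return "local_int[%d]" % self.idx


class IntAdd:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def residual(self):
        return "(%s + %s)" % (self.left.residual(), self.right.residual())


class IntLessThan:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def residual(self):
        return "(%s < %s)" % (self.left.residual(), self.right.residual())


class IntVarWrite:
    def __init__(self, idx, expr):
        self.idx, self.expr = idx, expr

    def residual(self):
        return "local_int[%d] = %s" % (self.idx, self.expr.residual())


class WhileNode:
    def __init__(self, cond, body):
        self.cond, self.body = cond, body

    def residual(self):
        return "while %s:\n    %s" % (self.cond.residual(), self.body.residual())


def partially_evaluate(root):
    """Builds Python source for one fixed AST and compiles it."""
    source = "def compiled(local_int):\n"
    for line in root.residual().splitlines():
        source += "    " + line + "\n"
    namespace = {}
    exec(source, namespace)
    return source, namespace["compiled"]


loop = WhileNode(IntLessThan(IntVarRead(1), IntLiteral(100)),
                 IntVarWrite(1, IntAdd(IntVarRead(1), IntLiteral(1))))
source, compiled = partially_evaluate(loop)
local_int = [0, 0]
compiled(local_int)
```

The generated function contains no node objects and no dispatch: the AST structure and the literal 100 have become part of the code, which is the same end state the slides reach step by step.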
Compilation Unit based on User Program
Meta-Tracing [2]
Partial Evaluation Guided by AST [3]
[AST diagrams: if node with branches cnt := cnt + 1 and cnt := 0, shown once per approach]
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
WHAT’S POSSIBLE FOR A SIMPLE
INTERPRETER?
Results
59
Designed for Teaching:
• Simple
• Conceptual Clarity
• An Interpreter family
– in C, C++, Java, JavaScript, RPython, Smalltalk
Used in the past by:
http://som-st.github.io
60
Self-Optimizing SOMs
SOMMT (RTruffleSOM): Meta-Tracing with RPython, JIT compiled
github.com/SOM-st/RTruffleSOM
SOMPE (TruffleSOM): Partial Evaluation + Graal Compiler on the HotSpot JVM, JIT compiled
github.com/SOM-st/TruffleSOM
Java 8 -server vs. SOM+JIT
JIT-compiled Peak Performance
RPython: 3.5x slower (min. 1.6x, max. 6.3x)
Truffle+Graal: 2.8x slower (min. 3%, max. 5x)
[Plot: runtime normalized to Java (compiled or interpreted), axis ticks 1, 4, 8. Panels: Compiled SOMMT, Compiled SOMPE. Benchmarks: Bounce, BubbleSort, DeltaBlue, Fannkuch, Mandelbrot, NBody, Permute, Queens, QuickSort, Richards, Sieve, Storage, Towers]
Implementation: Smaller Than Lua
Meta-Tracing: SOMMT (RTruffleSOM)
Partial Evaluation: SOMPE (TruffleSOM)
KLOC: 1000 Lines of Code, without blank lines and comments
[Bar chart, log scale 1 to 1000 KLOC. Bar values: 4.2, 9.8, 16, 260, 525, 562. Labels: V8 JavaScript, HotSpot Java Virtual Machine, Dart VM, Lua 5.3 interp.]
CONCLUSION
64
Simple and Fast Interpreters are Possible!
• Self-optimizing AST interpreters
• RPython or Truffle for JIT Compilation
Literature on the ideas:
[1] Würthinger et al., Self-Optimizing AST Interpreters, Proc. of the 8th Dynamic Languages Symposium, 2012, pp. 73-82.
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
[4] Marr et al., Are We There Yet? Simple Language Implementation Techniques for the 21st Century. IEEE Software 31(5):60-67, 2014.
RPython
• #pypy on irc.freenode.net
• rpython.readthedocs.org
• Kermit Example interpreter
https://bitbucket.org/pypy/example-interpreter
• A Tutorial
http://morepypy.blogspot.be/2011/04/tutorial-writing-interpreter-with-pypy.html
• Language implementations
https://www.evernote.com/shard/s130/sh/4d42a591-c540-4516-9911-c5684334bd45/d391564875442656a514f7ece5602210
Truffle
• http://mail.openjdk.java.net/mailman/listinfo/graal-dev
• SimpleLanguage interpreter
https://github.com/OracleLabs/GraalVM/tree/master/graal/com.oracle.truffle.sl/src/com/oracle/truffle/sl
• A Tutorial
http://cesquivias.github.io/blog/2014/10/13/writing-a-language-in-truffle-part-1-a-simple-slow-interpreter/
• Project
– http://www.ssw.uni-linz.ac.at/Research/Projects/JVM/Truffle.html
– http://www.oracle.com/technetwork/oracle-labs/program-languages/overview/index-2301583.html
Big Thank You!
to both communities,
for help, answering questions, debugging support, etc…!!!
Languages: Small, Elegant, and Fast!
[Figures repeated from earlier slides: the AST sketches for meta-tracing and partial evaluation, and the peak-performance plot for Compiled SOMMT and Compiled SOMPE (runtime normalized to Java)]
RPython: 3.5x slower (min. 1.6x, max. 6.3x), 4.2 KLOC
Truffle+Graal: 2.8x slower (min. 3%, max. 5x), 9.8 KLOC
@smarr | http://stefan-marr.de

More Related Content

What's hot

[JavaOne 2011] Models for Concurrent Programming
[JavaOne 2011] Models for Concurrent Programming[JavaOne 2011] Models for Concurrent Programming
[JavaOne 2011] Models for Concurrent Programming
Tobias Lindaaker
 
NS2: Binding C++ and OTcl variables
NS2: Binding C++ and OTcl variablesNS2: Binding C++ and OTcl variables
NS2: Binding C++ and OTcl variables
Teerawat Issariyakul
 

What's hot (20)

Blocks & GCD
Blocks & GCDBlocks & GCD
Blocks & GCD
 
Exploiting Concurrency with Dynamic Languages
Exploiting Concurrency with Dynamic LanguagesExploiting Concurrency with Dynamic Languages
Exploiting Concurrency with Dynamic Languages
 
[JavaOne 2011] Models for Concurrent Programming
[JavaOne 2011] Models for Concurrent Programming[JavaOne 2011] Models for Concurrent Programming
[JavaOne 2011] Models for Concurrent Programming
 
To Swift 2...and Beyond!
To Swift 2...and Beyond!To Swift 2...and Beyond!
To Swift 2...and Beyond!
 
bluespec talk
bluespec talkbluespec talk
bluespec talk
 
Arduino C maXbox web of things slide show
Arduino C maXbox web of things slide showArduino C maXbox web of things slide show
Arduino C maXbox web of things slide show
 
Конверсия управляемых языков в неуправляемые
Конверсия управляемых языков в неуправляемыеКонверсия управляемых языков в неуправляемые
Конверсия управляемых языков в неуправляемые
 
Deuce STM - CMP'09
Deuce STM - CMP'09Deuce STM - CMP'09
Deuce STM - CMP'09
 
Introduction to Rust language programming
Introduction to Rust language programmingIntroduction to Rust language programming
Introduction to Rust language programming
 
iOS Development with Blocks
iOS Development with BlocksiOS Development with Blocks
iOS Development with Blocks
 
同態加密
同態加密同態加密
同態加密
 
Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)
 
NS2: Binding C++ and OTcl variables
NS2: Binding C++ and OTcl variablesNS2: Binding C++ and OTcl variables
NS2: Binding C++ and OTcl variables
 
Seeing with Python presented at PyCon AU 2014
Seeing with Python presented at PyCon AU 2014Seeing with Python presented at PyCon AU 2014
Seeing with Python presented at PyCon AU 2014
 
Blazing Fast Windows 8 Apps using Visual C++
Blazing Fast Windows 8 Apps using Visual C++Blazing Fast Windows 8 Apps using Visual C++
Blazing Fast Windows 8 Apps using Visual C++
 
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
 
Functional Reactive Programming with RxJS
Functional Reactive Programming with RxJSFunctional Reactive Programming with RxJS
Functional Reactive Programming with RxJS
 
NS2 Classifiers
NS2 ClassifiersNS2 Classifiers
NS2 Classifiers
 
Return of c++
Return of c++Return of c++
Return of c++
 
C++ & Java JIT Optimizations: Finding Prime Numbers
C++ & Java JIT Optimizations: Finding Prime NumbersC++ & Java JIT Optimizations: Finding Prime Numbers
C++ & Java JIT Optimizations: Finding Prime Numbers
 

Viewers also liked

Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
Stefan Marr
 
Swatch Creative Strategy
Swatch Creative StrategySwatch Creative Strategy
Swatch Creative Strategy
jcheavey
 

Viewers also liked (7)

Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
Tracing versus Partial Evaluation: Which Meta-Compilation Approach is Better ...
 
low effort judgement
low effort judgementlow effort judgement
low effort judgement
 
high effort judgement
high effort judgementhigh effort judgement
high effort judgement
 
Swatch Creative Strategy
Swatch Creative StrategySwatch Creative Strategy
Swatch Creative Strategy
 
Swatch case
Swatch caseSwatch case
Swatch case
 
Buying Decision Making Process
Buying Decision Making ProcessBuying Decision Making Process
Buying Decision Making Process
 
Consumer behaviour internal factors
Consumer behaviour internal factorsConsumer behaviour internal factors
Consumer behaviour internal factors
 

Similar to Building High-Performance Language Implementations With Low Effort

Building Efficient and Highly Run-Time Adaptable Virtual Machines
Building Efficient and Highly Run-Time Adaptable Virtual MachinesBuilding Efficient and Highly Run-Time Adaptable Virtual Machines
Building Efficient and Highly Run-Time Adaptable Virtual Machines
Guido Chari
 
CA-Lec4-RISCV-Instructions-1aaaaaaaaaa.pptx
CA-Lec4-RISCV-Instructions-1aaaaaaaaaa.pptxCA-Lec4-RISCV-Instructions-1aaaaaaaaaa.pptx
CA-Lec4-RISCV-Instructions-1aaaaaaaaaa.pptx
trupeace
 
Track c-High speed transaction-based hw-sw coverification -eve
Track c-High speed transaction-based hw-sw coverification -eveTrack c-High speed transaction-based hw-sw coverification -eve
Track c-High speed transaction-based hw-sw coverification -eve
chiportal
 

Similar to Building High-Performance Language Implementations With Low Effort (20)




Building High-Performance Language Implementations With Low Effort

  • 1. Building High-Performance Language Implementations With Low Effort Stefan Marr FOSDEM 2015, Brussels, Belgium January 31st, 2015 @smarr http://stefan-marr.de
  • 2. Why should you care about how Programming Languages work? 2 SMBC: http://www.smbc-comics.com/?id=2088
  • 3. 3 SMBC: http://www.smbc-comics.com/?id=2088 Why should you care about how Programming Languages work? • Performance isn’t magic • Domain-specific languages • More concise • More productive • It’s easier than it looks • Often open source • Contributions welcome
  • 4. What’s “High-Performance”? Competitively Fast! [Bar chart: geometric mean over available benchmarks for Java, V8, C#, Dart, Python, Lua, PHP, Ruby; axis from 0 to 18.] Based on latest data from http://benchmarksgame.alioth.debian.org/ Disclaimer: not indicative of application performance!
  • 5. What’s “Low Effort”? Small and Manageable. [Bar chart with axis ticks 1, 10, 100, 1000 for V8 JavaScript, HotSpot Java Virtual Machine, Dart VM, Lua 5.3 interp.; bar values shown: 16, 260, 525, 562.] KLOC: 1000 Lines of Code, without blank lines and comments.
  • 6. Language Implementation Approaches. Interpreter: Source Program, Interpreter, Input, Output (Development Time | Run Time). Simple, but often slow. Compiler: Source Program, Compiler, Binary, Input, Output (Development Time | Run Time). More complex, but often faster. Not ideal for all languages.
  • 7. Modern Virtual Machines: Virtual Machine with Just-In-Time Compilation. [Diagram: Source Program, Interpreter, Runtime Info, Compiler, Binary, Input, Output; Development Time | Run Time.]
  • 8. VMs are Highly Complex 8 Interpreter Input Output Compiler Optimizer Garbage Collector CodeGen Foreign Function Interface Threads and Memory Model How to reuse most parts for a new language? Debugging Profiling … Easily 500 KLOC
  • 9. How to reuse most parts for a new language? 9 Input Output Make Interpreters Replaceable Components! Interpreter Compiler Optimizer Garbage Collector CodeGen Foreign Function Interface Threads and Memory Model Garbage Collector … Interpreter Interpreter …
  • 10. Interpreter-based Approaches: Truffle + Graal with Partial Evaluation (Oracle Labs); RPython with Meta-Tracing. [3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204. [2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
  • 11. SELF-OPTIMIZING TREES A Simple Technique for Language Implementation and Optimization [1] Würthinger, T.; Wöß, A.; Stadler, L.; Duboscq, G.; Simon, D. & Wimmer, C. (2012), Self-Optimizing AST Interpreters, in 'Proc. of the 8th Dynamic Languages Symposium', pp. 73-82.
  • 13. A Simple Abstract Syntax Tree Interpreter: root_node = parse(file) root_node.execute(Frame()) Example program: if (condition) { cnt := cnt + 1; } else { cnt := 0; } [AST diagram: root_node is an `if` node with children cond, `cnt :=` (over `+` with operands cnt and 1), and `cnt :=` (over 0).]
  • 14. Implementing AST Nodes 14 if (condition) { cnt := cnt + 1; } else { cnt := 0; } class Literal(ASTNode): final value def execute(frame): return value class VarWrite(ASTNode): child sub_expr final idx def execute(frame): val := sub_expr.execute(frame) frame.local_obj[idx]:= val return val class VarRead(ASTNode): final idx def execute(frame): return frame.local_obj[idx] cnt 1 + cnt: = if cnt: = 0 cond
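The node classes on slide 14 can be turned into a small runnable interpreter. The following is a minimal sketch in plain Python, not the code from the talk: the `Frame`, `Add`, and `If` classes and the `build_example` helper are assumptions added so that the `if (condition) { cnt := cnt + 1; } else { cnt := 0; }` example can actually execute.

```python
class Frame:
    """Holds local variables in an indexed slot array."""
    def __init__(self, num_locals):
        self.local_obj = [None] * num_locals


class ASTNode:
    def execute(self, frame):
        raise NotImplementedError


class Literal(ASTNode):
    def __init__(self, value):
        self.value = value

    def execute(self, frame):
        return self.value


class VarRead(ASTNode):
    def __init__(self, idx):
        self.idx = idx

    def execute(self, frame):
        return frame.local_obj[self.idx]


class VarWrite(ASTNode):
    def __init__(self, idx, sub_expr):
        self.idx = idx
        self.sub_expr = sub_expr

    def execute(self, frame):
        val = self.sub_expr.execute(frame)
        frame.local_obj[self.idx] = val
        return val


class Add(ASTNode):  # assumed helper node, not shown on the slide
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def execute(self, frame):
        return self.left.execute(frame) + self.right.execute(frame)


class If(ASTNode):  # assumed helper node, not shown on the slide
    def __init__(self, cond, then_branch, else_branch):
        self.cond = cond
        self.then_branch = then_branch
        self.else_branch = else_branch

    def execute(self, frame):
        if self.cond.execute(frame):
            return self.then_branch.execute(frame)
        return self.else_branch.execute(frame)


def build_example(condition):
    """AST for: if (condition) { cnt := cnt + 1; } else { cnt := 0; }"""
    cnt = 0  # slot index of the variable cnt
    return If(Literal(condition),
              VarWrite(cnt, Add(VarRead(cnt), Literal(1))),
              VarWrite(cnt, Literal(0)))


frame = Frame(1)
frame.local_obj[0] = 41
result = build_example(True).execute(frame)
```

Every node only knows how to execute itself and its children, which is what makes this style simple to write and, without further optimization, slow to run.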
  • 15. Self-Optimization by Node Specialization 15 cnt := cnt + 1 def UninitVarWrite.execute(frame): val := sub_expr.execute(frame) return specialize(val). execute_evaluated(frame, val) uninitialized variable write cnt 1 + cnt: = cnt: = def UninitVarWrite.specialize(val): if val instanceof int: return replace(IntVarWrite(sub_expr)) elif …: … else: return replace(GenericVarWrite(sub_expr)) specialized
  • 16. Self-Optimization by Node Specialization 16 cnt := cnt + 1 def IntVarWrite.execute(frame): try: val := sub_expr.execute_int(frame) return execute_eval_int(frame, val) except ResultExp, e: return respecialize(e.result). execute_evaluated(frame, e.result) def IntVarWrite.execute_eval_int(frame, anInt): frame.local_int[idx] := anInt return anInt int variable write cnt 1 + cnt: =
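Slides 15 and 16 describe nodes that rewrite themselves based on the values they observe. The sketch below shows one way this could look in runnable Python. It is an illustration under assumptions: the parent-pointer based `replace`, the `Root` and `Input` nodes, and the simplified respecialization (falling back directly to `GenericVarWrite`) are not taken from the talk.

```python
class UnexpectedResult(Exception):
    """Carries a value that did not match the specialized type."""
    def __init__(self, result):
        super().__init__(result)
        self.result = result


class Frame:
    def __init__(self, num_locals, input_value):
        self.local_obj = [None] * num_locals   # boxed slots
        self.local_int = [0] * num_locals      # int-only slots
        self.input_value = input_value


class ASTNode:
    parent = None

    def adopt(self, child):
        child.parent = self
        return child

    def replace(self, new_node):
        """Swap this node for new_node in whichever parent field holds it."""
        parent = self.parent
        for name, value in list(vars(parent).items()):
            if value is self:
                setattr(parent, name, new_node)
        new_node.parent = parent
        return new_node


class Input(ASTNode):
    """Hypothetical leaf: yields the value the frame was created with."""
    def execute(self, frame):
        return frame.input_value

    def execute_int(self, frame):
        if type(frame.input_value) is int:
            return frame.input_value
        raise UnexpectedResult(frame.input_value)


class VarWriteBase(ASTNode):
    def __init__(self, idx, sub_expr):
        self.idx = idx
        self.sub_expr = self.adopt(sub_expr)


class UninitVarWrite(VarWriteBase):
    def execute(self, frame):
        val = self.sub_expr.execute(frame)
        return self.specialize(val).execute_evaluated(frame, val)

    def specialize(self, val):
        if type(val) is int:
            return self.replace(IntVarWrite(self.idx, self.sub_expr))
        return self.replace(GenericVarWrite(self.idx, self.sub_expr))


class IntVarWrite(VarWriteBase):
    def execute(self, frame):
        try:
            val = self.sub_expr.execute_int(frame)
        except UnexpectedResult as e:
            # The int assumption failed: fall back to the generic node.
            generic = self.replace(GenericVarWrite(self.idx, self.sub_expr))
            return generic.execute_evaluated(frame, e.result)
        return self.execute_evaluated(frame, val)

    def execute_evaluated(self, frame, val):
        frame.local_int[self.idx] = val
        return val


class GenericVarWrite(VarWriteBase):
    def execute(self, frame):
        return self.execute_evaluated(frame, self.sub_expr.execute(frame))

    def execute_evaluated(self, frame, val):
        frame.local_obj[self.idx] = val
        return val


class Root(ASTNode):
    def __init__(self, body):
        self.body = self.adopt(body)

    def execute(self, frame):
        return self.body.execute(frame)


root = Root(UninitVarWrite(0, Input()))
first = root.execute(Frame(1, 7))        # specializes to IntVarWrite
kind_after_int = type(root.body).__name__
second = root.execute(Frame(1, "seven"))  # respecializes to GenericVarWrite
kind_after_str = type(root.body).__name__
```

After the first execution the tree contains an `IntVarWrite`; once a non-int value shows up, the node replaces itself again and the tree settles on the generic version.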
  • 17. Some Possible Self-Optimizations • Type profiling and specialization • Value caching • Inline caching • Operation inlining • Library Lowering 17
  • 18. Library Lowering for Array class createSomeArray() { return Array.new(1000, ‘fast fast fast’); } 18 class Array { static new(size, lambda) { return new(size).setAll(lambda); } setAll(lambda) { forEach((i, v) -> { this[i] = lambda.eval(); }); } } class Object { eval() { return this; } }
  • 19. Optimizing for Object Values 19 createSomeArray() { return Array.new(1000, ‘fast fast fast’); } .new Array global lookup method invocation 1000 int literal ‘fast’ string literal Object, but not a lambda Optimization potential
  • 20. Specialized new(size, lambda) def UninitArrNew.execute(frame): size := size_expr.execute(frame) val := val_expr.execute(frame) return specialize(size, val). execute_evaluated(frame, size, val) 20 createSomeArray() { return Array.new(1000, ‘fast fast fast’); } def UninitArrNew.specialize(size, val): if val instanceof Lambda: return replace(StdMethodInvocation()) else: return replace(ArrNewWithValue())
  • 21. Specialized new(size, lambda) def ArrNewWithValue.execute_evaluated(frame, size, val): return Array([val] * size) createSomeArray() { return Array.new(1000, ‘fast fast fast’); } 1 specialized node vs. 1000x `this[i] = lambda.eval()` and 1000x `eval() { return this; }` [AST: .new with Array global lookup, 1000 int literal, ‘fast’ string literal; the .new node is specialized.]
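The library-lowering idea from slides 18 to 21 can be sketched as follows. This is a simplified illustration, not the implementation from the talk: `CallSite` is a hypothetical holder for the rewritable node, and Python lists stand in for the language's `Array`.

```python
class Lambda:
    """Stand-in for a language-level lambda."""
    def __init__(self, fn):
        self.fn = fn

    def eval(self):
        return self.fn()


class StdMethodInvocation:
    """Generic path: mirrors Array.new calling setAll, one eval per element."""
    def execute_evaluated(self, size, val):
        arr = [None] * size
        for i in range(size):
            # Object.eval() returns the object itself; Lambda.eval() calls it.
            arr[i] = val.eval() if isinstance(val, Lambda) else val
        return arr


class ArrNewWithValue:
    """Lowered path: the value is a plain object, so fill the array directly."""
    def execute_evaluated(self, size, val):
        return [val] * size


class UninitArrNew:
    def __init__(self, call_site):
        self.call_site = call_site

    def execute_evaluated(self, size, val):
        if isinstance(val, Lambda):
            node = StdMethodInvocation()
        else:
            node = ArrNewWithValue()
        self.call_site.node = node  # rewrite the call site on first execution
        return node.execute_evaluated(size, val)


class CallSite:
    """Hypothetical holder for the node implementing one Array.new call."""
    def __init__(self):
        self.node = UninitArrNew(self)

    def array_new(self, size, val):
        return self.node.execute_evaluated(size, val)


site = CallSite()
arr = site.array_new(1000, "fast fast fast")
lowered = isinstance(site.node, ArrNewWithValue)
```

The lowered node replaces 1000 calls to `eval()` and 1000 indexed writes with a single bulk operation, while a call site that actually receives a lambda keeps the generic behavior.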
  • 23. How to Get Fast Program Execution? 23 VarWrite.execute(frame) IntVarWrite.execute(frame) VarRead.execute(frame) Literal.execute(frame) ArrayNewWithValue.execute(frame) ..VW_execute() # bin ..IVW_execute() # bin ..VR_execute() # bin ..L_execute() # bin ..ANWV_execute() # bin Standard Compilation: 1 node at a time Minimal Optimization Potential
  • 24. Problems with Node-by-Node Compilation 24 cnt 1 + cnt: = Slow Polymorphic Dispatches def IntVarWrite.execute(frame): try: val := sub_expr.execute_int(frame) return execute_eval_int(frame, val) except ResultExp, e: return respecialize(e.result). execute_evaluated(frame, e.result) cnt: = Runtime checks in general
  • 25. Compilation Unit based on User Program Meta-Tracing Partial Evaluation Guided By AST 25 cnt 1 + cnt: = if cnt: = 0 cnt 1 + cnt: =if cnt: = 0 [3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204. [2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
  • 27. RPython • Subset of Python – statically type-inferred • Generates VMs. [Diagram: Interpreter source goes through the RPython Toolchain, producing a VM with Interpreter, Meta-Tracing JIT Compiler, Garbage Collector, …] http://rpython.readthedocs.org/
  • 28. Meta-Tracing of an Interpreter 28 cnt 1 +cnt:= if cnt:= 0 [2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
  • 29. Meta Tracers need to know the Loops class WhileNode(ASTNode): child cond_expr child body_expr def execute(frame): while True: jit_merge_point(node=self) cond = cond_expr.execute_bool(frame) if not cond: break body_expr.execute(frame) 29 while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan))
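In RPython, the loop hint from slide 29 is expressed through a `JitDriver` from `rpython.rlib.jit`, whose `greens` name the variables that identify a position in the user program and whose `reds` name the remaining live state. The sketch below assumes that API and falls back to a no-op stand-in when RPython is not importable, so it also runs as plain Python without any JIT. The `CntLessThan100` and `IncrementCnt` nodes are hypothetical simplifications of the AST for `while (cnt < 100) { cnt := cnt + 1; }`.

```python
try:
    # Real RPython API, only available inside an RPython/PyPy checkout.
    from rpython.rlib.jit import JitDriver
except (ImportError, SyntaxError):
    class JitDriver(object):
        """Stand-in so the sketch runs as plain Python (no JIT involved)."""
        def __init__(self, greens, reds):
            self.greens = greens
            self.reds = reds

        def jit_merge_point(self, **live_vars):
            pass  # the real driver marks a loop header for the tracer


# Greens identify the user-level loop; reds are the state that changes.
jitdriver = JitDriver(greens=['node'], reds=['frame'])


class Frame(object):
    def __init__(self, num_locals):
        self.local_int = [0] * num_locals


class CntLessThan100(object):
    """Hypothetical condition node for `cnt < 100`, with cnt in slot 0."""
    def execute_bool(self, frame):
        return frame.local_int[0] < 100


class IncrementCnt(object):
    """Hypothetical body node for `cnt := cnt + 1`."""
    def execute(self, frame):
        frame.local_int[0] += 1


class WhileNode(object):
    def __init__(self, cond_expr, body_expr):
        self.cond_expr = cond_expr
        self.body_expr = body_expr

    def execute(self, frame):
        while True:
            # Tell the meta-tracer that a user-level loop iteration starts here.
            jitdriver.jit_merge_point(node=self, frame=frame)
            if not self.cond_expr.execute_bool(frame):
                break
            self.body_expr.execute(frame)


frame = Frame(1)
WhileNode(CntLessThan100(), IncrementCnt()).execute(frame)
```

Without this hint the tracer would only see the interpreter's own loops, not the loops of the program being interpreted.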
  • 30. Tracing Records one Concrete Execution class IntLessThan(ASTNode): child left_expr child right_expr def execute_bool(frame): try: left = left_expr.execute_int() except UnexpectedResult r: ... try: right = right_expr.execute_int() except UnexpectedResult r: ... return left < right while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead))
  • 31. Tracing Records one Concrete Execution class IntVarRead(ASTNode): final idx def execute_int(frame): if frame.is_int(idx): return frame.local_int[idx] else: new_node = respecialize() raise UnexpectedResult(new_node.execute()) 31 while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead)) i1 := left_expr.idx # Const(1)
  • 32. Tracing Records one Concrete Execution class IntVarRead(ASTNode): final idx def execute_int(frame): if frame.is_int(idx): return frame.local_int[idx] else: new_node = respecialize() raise UnexpectedResult(new_node.execute()) 32 while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead)) i1 := left_expr.idx # Const(1) a1 := frame.layout i2 := a1[i1] guard(i2 == Const(F_INT))
  • 33. Tracing Records one Concrete Execution class IntVarRead(ASTNode): final idx def execute_int(frame): if frame.is_int(idx): return frame.local_int[idx] else: new_node = respecialize() raise UnexpectedResult(new_node.execute()) 33 while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead)) i1 := left_expr.idx # Const(1) a1 := frame.layout i2 := a1[i1] guard(i2 == Const(F_INT)) i3 := left_expr.idx # Const(1) a2 := frame.local_int i4 := a2[i3]
  • 34. Tracing Records one Concrete Execution class IntLessThan(ASTNode): child left_expr child right_expr def execute_bool(frame): try: left = left_expr.execute_int() except UnexpectedResult r: ... try: right = right_expr.execute_int() except UnexpectedResult r: ... return left < right while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead)) i1 := left_expr.idx # Const(1) a1 := frame.layout i2 := a1[i1] guard(i2 == Const(F_INT)) i3 := left_expr.idx # Const(1) a2 := frame.local_int i4 := a2[i3] guard_no_exception(Const(UnexpectedResult))
  • 35. Tracing Records one Concrete Execution class IntLessThan(ASTNode): child left_expr child right_expr def execute_bool(frame): try: left = left_expr.execute_int() except UnexpectedResult r: ... try: right = right_expr.execute_int() except UnexpectedResult r: ... return left < right while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead)) i1 := left_expr.idx # Const(1) a1 := frame.layout i2 := a1[i1] guard(i2 == Const(F_INT)) i3 := left_expr.idx # Const(1) a2 := frame.local_int i4 := a2[i3] guard_no_exception(Const(UnexpectedResult)) guard(right_expr == Const(IntLiteral))
  • 36. Tracing Records one Concrete Execution class IntLessThan(ASTNode): child left_expr child right_expr def execute_bool(frame): try: left = left_expr.execute_int() except UnexpectedResult r: ... try: right = right_expr.execute_int() except UnexpectedResult r: ... return left < right while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead)) i1 := left_expr.idx # Const(1) a1 := frame.layout i2 := a1[i1] guard(i2 == Const(F_INT)) i3 := left_expr.idx # Const(1) a2 := frame.local_int i4 := a2[i3] guard_no_exception(Const(UnexpectedResult)) guard(right_expr == Const(IntLiteral)) i5 := right_expr.value # Const(100)
  • 37. Tracing Records one Concrete Execution class IntLessThan(ASTNode): child left_expr child right_expr def execute_bool(frame): try: left = left_expr.execute_int() except UnexpectedResult r: ... try: right = right_expr.execute_int() except UnexpectedResult r: ... return left < right while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead)) i1 := left_expr.idx # Const(1) a1 := frame.layout i2 := a1[i1] guard(i2 == Const(F_INT)) i3 := left_expr.idx # Const(1) a2 := frame.local_int i4 := a2[i3] guard_no_exception(Const(UnexpectedResult)) guard(right_expr == Const(IntLiteral)) i5 := right_expr.value # Const(100) guard_no_exception(Const(UnexpectedResult))
  • 38. Tracing Records one Concrete Execution class IntLessThan(ASTNode): child left_expr child right_expr def execute_bool(frame): try: left = left_expr.execute_int() except UnexpectedResult r: ... try: right = right_expr.execute_int() except UnexpectedResult r: ... return left < right while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead)) i1 := left_expr.idx # Const(1) a1 := frame.layout i2 := a1[i1] guard(i2 == Const(F_INT)) i3 := left_expr.idx # Const(1) a2 := frame.local_int i4 := a2[i3] guard_no_exception(Const(UnexpectedResult)) guard(right_expr == Const(IntLiteral)) i5 := right_expr.value # Const(100) guard_no_exception(Const(UnexpectedResult)) b1 := i4 < i5
  • 39. Tracing Records one Concrete Execution class WhileNode(ASTNode): child cond_expr child body_expr def execute(frame): while True: jit_merge_point(node=self) cond = cond_expr.execute_bool(frame) if not cond: break body_expr.execute(frame) while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead)) i1 := left_expr.idx # Const(1) a1 := frame.layout i2 := a1[i1] guard(i2 == Const(F_INT)) i3 := left_expr.idx # Const(1) a2 := frame.local_int i4 := a2[i3] guard_no_exception(Const(UnexpectedResult)) guard(right_expr == Const(IntLiteral)) i5 := right_expr.value # Const(100) guard_no_exception(Const(UnexpectedResult)) b1 := i4 < i5 guard_true(b1)
  • 40. Tracing Records one Concrete Execution class WhileNode(ASTNode): child cond_expr child body_expr def execute(frame): while True: jit_merge_point(node=self) cond = cond_expr.execute_bool(frame) if not cond: break body_expr.execute(frame) while (cnt < 100) { cnt := cnt + 1; } Trace guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead)) i1 := left_expr.idx # Const(1) a1 := frame.layout i2 := a1[i1] guard(i2 == Const(F_INT)) i3 := left_expr.idx # Const(1) a2 := frame.local_int i4 := a2[i3] guard_no_exception(Const(UnexpectedResult)) guard(right_expr == Const(IntLiteral)) i5 := right_expr.value # Const(100) guard_no_exception(Const(UnexpectedResult)) b1 := i4 < i5 guard_true(b1) ...
  • 41. Traces are Ideal for Optimization. Recorded trace: guard(cond_expr == Const(IntLessThan)) guard(left_expr == Const(IntVarRead)) i1 := left_expr.idx # Const(1) a1 := frame.layout i2 := a1[i1] guard(i2 == Const(F_INT)) i3 := left_expr.idx # Const(1) a2 := frame.local_int i4 := a2[i3] guard_no_exception( Const(UnexpectedResult)) guard(right_expr == Const(IntLiteral)) i5 := right_expr.value # Const(100) guard_no_exception( Const(UnexpectedResult)) b1 := i4 < i5 guard_true(b1) ... Intermediate trace (guards on constant nodes removed): i1 := left_expr.idx # Const(1) a1 := frame.layout i1 := a1[Const(1)] guard(i1 == Const(F_INT)) i3 := left_expr.idx # Const(1) a2 := frame.local_int i4 := a2[i3] i5 := right_expr.value # Const(100) b1 := i2 < i5 guard_true(b1) ... Final trace (constants folded): a1 := frame.layout i1 := a1[1] guard(i1 == F_INT) a2 := frame.local_int i2 := a2[1] b1 := i2 < 100 guard_true(b1) ...
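The reduction shown on slide 41 can be mimicked with a tiny trace optimizer. The following sketch uses a made-up tuple encoding of trace operations (`guard_class`, `getfield`, `getitem`, and so on are illustrative names, not RPython's actual trace IR) and assumes that AST nodes are constant within one trace. The `guard_no_exception` entries from the slide are left out of this toy trace.

```python
# Each operation is (opname, result_variable_or_None, *arguments).
trace = [
    ("guard_class", None, "cond_expr", "IntLessThan"),
    ("guard_class", None, "left_expr", "IntVarRead"),
    ("getfield", "i1", "left_expr", "idx"),
    ("getfield", "a1", "frame", "layout"),
    ("getitem", "i2", "a1", "i1"),
    ("guard_value", None, "i2", "F_INT"),
    ("getfield", "i3", "left_expr", "idx"),
    ("getfield", "a2", "frame", "local_int"),
    ("getitem", "i4", "a2", "i3"),
    ("guard_class", None, "right_expr", "IntLiteral"),
    ("getfield", "i5", "right_expr", "value"),
    ("int_lt", "b1", "i4", "i5"),
    ("guard_true", None, "b1"),
]

# What is known about the (constant) AST nodes the trace was recorded for.
const_nodes = {
    "cond_expr": {"class": "IntLessThan"},
    "left_expr": {"class": "IntVarRead", "idx": 1},
    "right_expr": {"class": "IntLiteral", "value": 100},
}


def optimize(trace, const_nodes):
    env = {}   # trace variable -> known constant value
    out = []
    for op, result, *args in trace:
        # A class guard on a constant node that matches can never fail.
        if (op == "guard_class"
                and const_nodes.get(args[0], {}).get("class") == args[1]):
            continue
        # A field read from a constant node folds to its value.
        if op == "getfield" and args[0] in const_nodes:
            env[result] = const_nodes[args[0]][args[1]]
            continue
        # Everything else stays, with known constants substituted.
        out.append((op, result, *[env.get(a, a) for a in args]))
    return out


optimized = optimize(trace, const_nodes)
for operation in optimized:
    print(operation)
```

Only the reads from the frame, the type-tag guard, the comparison, and the loop guard remain, which matches the shape of the final trace on the slide.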
  • 42. Truffle + Graal Just-in-Time Compilation with Partial Evaluation Oracle Labs
  • 43. Truffle+Graal • Java framework – AST interpreters • Based on HotSpot JVM 43 Interpreter Graal Compiler + Truffle Partial Evaluator http://www.ssw.uni-linz.ac.at/Research/Projects/JVM/Truffle.html http://www.oracle.com/technetwork/oracle-labs/program-languages/overview/index-2301583.html Garbage Collector … + Truffle Framework HotSpot JVM
  • 44. Partial Evaluation Guided By AST 44 cnt 1 +cnt:= if cnt:= 0 [3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
  • 45. Partial Evaluation inlines based on Runtime Constants class WhileNode(ASTNode): child cond_expr child body_expr def execute(frame): while True: cond = cond_expr.execute_bool(frame) if not cond: break body_expr.execute(frame) 45 while (cnt < 100) { cnt := cnt + 1; }
  • 46. Partial Evaluation inlines based on Runtime Constants class WhileNode(ASTNode): child cond_expr child body_expr def execute(frame): while True: cond = cond_expr.execute_bool(frame) if not cond: break body_expr.execute(frame) 46 while (cnt < 100) { cnt := cnt + 1; } class IntLessThan(ASTNode): child left_expr child right_expr def execute_bool(frame): try: left = left_expr.execute_int() except UnexpectedResult r: ... try: right = right_expr.execute_int() expect UnexpectedResult r: ... return left < right
  • 47. Partial Evaluation inlines based on Runtime Constants class WhileNode(ASTNode): child cond_expr child body_expr def execute(frame): while True: try: left = cond_expr.left_expr.execute_int() except UnexpectedResult r: ... try: right = cond_expr.right_expr.execute_int() expect UnexpectedResult r: ... cond = left < right if not cond: break body_expr.execute(frame) 47 while (cnt < 100) { cnt := cnt + 1; }
  • 48. Partial Evaluation inlines based on Runtime Constants class WhileNode(ASTNode): child cond_expr child body_expr def execute(frame): while True: try: left = cond_expr.left_expr.execute_int() except UnexpectedResult r: ... try: right = cond_expr.right_expr.execute_int() expect UnexpectedResult r: ... cond = left < right if not cond: break body_expr.execute(frame) while (cnt < 100) { cnt := cnt + 1; } class IntVarRead(ASTNode): final idx def execute_int(frame): if frame.is_int(idx): return frame.local_int[idx] else: new_node = respecialize() raise UnexpectedResult(new_node.ex
  • 49. Partial Evaluation inlines based on Runtime Constants class WhileNode(ASTNode): child cond_expr child body_expr def execute(frame): while True: try: if frame.is_int(1): left = frame.local_int[1] else: new_node = respecialize() raise UnexpectedResult(new_node.execute()) except UnexpectedResult r: ... try: right = cond_expr.right_expr.execute_int() expect UnexpectedResult r: ... cond = left < right if not cond: break while (cnt < 100) { cnt := cnt + 1; }
  • 50. Optimize Optimistically class WhileNode(ASTNode): child cond_expr child body_expr def execute(frame): while True: try: if frame.is_int(1): left = frame.local_int[1] else: new_node = respecialize() raise UnexpectedResult(new_node.execute()) except UnexpectedResult r: ... try: right = cond_expr.right_expr.execute_int() expect UnexpectedResult r: ... cond = left < right if not cond: break while (cnt < 100) { cnt := cnt + 1; }
  • 51. Optimize Optimistically class WhileNode(ASTNode): child cond_expr child body_expr def execute(frame): while True: if frame.is_int(1): left = frame.local_int[1] else: __deopt_return_to_interp() try: right = cond_expr.right_expr.execute_int() expect UnexpectedResult r: ... cond = left < right if not cond: break body_expr.execute(frame) while (cnt < 100) { cnt := cnt + 1; }
  • 52. Partial Evaluation inlines based on Runtime Constants class WhileNode(ASTNode): child cond_expr child body_expr def execute(frame): while True: if frame.is_int(1): left = frame.local_int[1] else: __deopt_return_to_interp() try: right = cond_expr.right_expr.execute_int() expect UnexpectedResult r: ... cond = left < right if not cond: break body_expr.execute(frame) while (cnt < 100) { cnt := cnt + 1; } class IntLiteral(ASTNode): final value def execute_int(frame): return value
  • 53. Partial Evaluation inlines based on Runtime Constants
      class WhileNode(ASTNode):
        child cond_expr
        child body_expr
        def execute(frame):
          while True:
            if frame.is_int(1):
              left = frame.local_int[1]
            else:
              __deopt_return_to_interp()
            try:
              right = 100
            except UnexpectedResult r: ...
            cond = left < right
            if not cond: break
            body_expr.execute(frame)
      Source program: while (cnt < 100) { cnt := cnt + 1; }
      class IntLiteral(ASTNode):
        final value
        def execute_int(frame):
          return value
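The effect of inlining on runtime constants can be imitated with a small hand-written code generator. This is only an analogy: Truffle with Graal performs partial evaluation automatically on the interpreter's own code, whereas the `residual` and `specialize` helpers below are hypothetical and written by hand for each node type.

```python
class IntLiteral:
    def __init__(self, value):
        self.value = value  # final: fixed once the AST is built

    def execute_int(self, frame):
        return self.value

    def residual(self):
        # The constant is baked directly into the residual code.
        return repr(self.value)


class IntVarRead:
    def __init__(self, idx):
        self.idx = idx  # final: fixed once the AST is built

    def execute_int(self, frame):
        return frame[self.idx]

    def residual(self):
        return f"frame[{self.idx}]"


class LessThan:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def execute(self, frame):
        return self.left.execute_int(frame) < self.right.execute_int(frame)

    def residual(self):
        return f"({self.left.residual()} < {self.right.residual()})"


def specialize(node):
    """Return the residual source and a function compiled from it."""
    source = f"lambda frame: {node.residual()}"
    return source, eval(source)


# AST for the loop condition `cnt < 100`, with cnt stored in slot 1.
cond = LessThan(IntVarRead(1), IntLiteral(100))
source, fast_cond = specialize(cond)
```

Here `source` no longer contains node objects or `execute_int` calls, only the slot index and the literal. That mirrors how `right = cond_expr.right_expr.execute_int()` turns into `right = 100` on the slide.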
  • 54. Classic Optimizations: Dead Code Elimination
      class WhileNode(ASTNode):
        child cond_expr
        child body_expr
        def execute(frame):
          while True:
            if frame.is_int(1):
              left = frame.local_int[1]
            else:
              __deopt_return_to_interp()
            try:
              right = 100
            except UnexpectedResult r: ...
            cond = left < right
            if not cond: break
            body_expr.execute(frame)
      Source program: while (cnt < 100) { cnt := cnt + 1; }
      class IntLiteral(ASTNode):
        final value
        def execute_int(frame):
          return value
  • 55. Classic Optimizations: Constant Propagation
      class WhileNode(ASTNode):
        child cond_expr
        child body_expr
        def execute(frame):
          while True:
            if frame.is_int(1):
              left = frame.local_int[1]
            else:
              __deopt_return_to_interp()
            right = 100
            cond = left < right
            if not cond: break
            body_expr.execute(frame)
      Source program: while (cnt < 100) { cnt := cnt + 1; }
      class IntLiteral(ASTNode):
        final value
        def execute_int(frame):
          return value
  • 56. Classic Optimizations: Loop Invariant Code Motion
      class WhileNode(ASTNode):
        child cond_expr
        child body_expr
        def execute(frame):
          while True:
            if frame.is_int(1):
              left = frame.local_int[1]
            else:
              __deopt_return_to_interp()
            if not (left < 100): break
            body_expr.execute(frame)
      Source program: while (cnt < 100) { cnt := cnt + 1; }
  • 57. Classic Optimizations: Loop Invariant Code Motion
      class WhileNode(ASTNode):
        child cond_expr
        child body_expr
        def execute(frame):
          if not frame.is_int(1):
            __deopt_return_to_interp()
          while True:
            if not (frame.local_int[1] < 100): break
            body_expr.execute(frame)
      Source program: while (cnt < 100) { cnt := cnt + 1; }
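To see what hoisting the type guard buys, the two loop shapes can be compared while counting guard checks. This is a sketch under assumptions: the `Frame` class with its `guard_checks` counter and the `Deoptimize` exception are hypothetical, and the loop body is written inline because hoisting is only valid when the compiler can see that the body keeps the slot an int.

```python
class Deoptimize(Exception):
    """Stands in for __deopt_return_to_interp() on the slides."""


class Frame:
    """Hypothetical frame that counts how often the type guard runs."""

    def __init__(self, cnt):
        self.local_int = [None, cnt]
        self.guard_checks = 0

    def is_int(self, idx):
        self.guard_checks += 1
        return type(self.local_int[idx]) is int


def loop_guard_inside(frame):
    # Before loop-invariant code motion: guard on every iteration.
    while True:
        if not frame.is_int(1):
            raise Deoptimize()
        if not (frame.local_int[1] < 100):
            break
        frame.local_int[1] += 1  # body: cnt := cnt + 1


def loop_guard_hoisted(frame):
    # After loop-invariant code motion: guard once, before the loop.
    if not frame.is_int(1):
        raise Deoptimize()
    while True:
        if not (frame.local_int[1] < 100):
            break
        frame.local_int[1] += 1  # body: cnt := cnt + 1
```

Both versions leave `cnt` at 100 when started from 0, but the first one evaluates the guard on every pass through the loop and the second evaluates it once.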
  • 58. Compilation Unit based on User Program
      Meta-Tracing | Partial Evaluation Guided by AST
      [Figure: two copies of the example program's AST, with the nodes if, cnt :=, 0, cnt :=, +, cnt, 1]
      [2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
      [3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
  • 59. WHAT’S POSSIBLE FOR A SIMPLE INTERPRETER? Results
  • 60. Designed for Teaching: • Simple • Conceptual Clarity • An Interpreter family – in C, C++, Java, JavaScript, RPython, Smalltalk Used in the past by: http://som-st.github.io
  • 61. Self-Optimizing SOMs
      SOMMT (RTruffleSOM): Meta-Tracing, RPython, JIT compiled. github.com/SOM-st/RTruffleSOM
      SOMPE (TruffleSOM): Partial Evaluation + Graal Compiler on the HotSpot JVM, JIT compiled. github.com/SOM-st/TruffleSOM
  • 62. Java 8 -server vs. SOM+JIT: JIT-compiled Peak Performance
      Compiled SOMMT (RPython): 3.5x slower (min. 1.6x, max. 6.3x)
      Compiled SOMPE (Truffle+Graal): 2.8x slower (min. 3%, max. 5x)
      [Figure: two panels, Compiled SOMMT and Compiled SOMPE. Y-axis: runtime normalized to Java (compiled or interpreted), ticks at 1, 4, 8. Benchmarks: Bounce, BubbleSort, DeltaBlue, Fannkuch, Mandelbrot, NBody, Permute, Queens, QuickSort, Richards, Sieve, Storage, Towers]
  • 63. Implementation: Smaller Than Lua
      Meta-Tracing SOMMT (RTruffleSOM): 4.2 KLOC
      Partial Evaluation SOMPE (TruffleSOM): 9.8 KLOC
      KLOC: 1000 Lines of Code, without blank lines and comments
      [Bar chart, log scale from 1 to 1000 KLOC. Remaining bars: 16, 260, 525, 562 for Lua 5.3 interp., Dart VM, HotSpot Java Virtual Machine, V8 JavaScript (pairing of values to labels not preserved)]
  • 65. Simple and Fast Interpreters are Possible!
      • Self-optimizing AST interpreters
      • RPython or Truffle for JIT Compilation
      Literature on the ideas:
      [1] Würthinger et al., Self-Optimizing AST Interpreters, Proc. of the 8th Dynamic Languages Symposium, 2012, pp. 73-82.
      [2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
      [3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
      [4] Marr et al., Are We There Yet? Simple Language Implementation Techniques for the 21st Century. IEEE Software 31(5):60-67, 2014.
  • 66. Big Thank You! to both communities, for help, answering questions, debugging support, etc…!!!
      RPython
      • #pypy on irc.freenode.net
      • rpython.readthedocs.org
      • Kermit Example interpreter: https://bitbucket.org/pypy/example-interpreter
      • A Tutorial: http://morepypy.blogspot.be/2011/04/tutorial-writing-interpreter-with-pypy.html
      • Language implementations: https://www.evernote.com/shard/s130/sh/4d42a591-c540-4516-9911-c5684334bd45/d391564875442656a514f7ece5602210
      Truffle
      • http://mail.openjdk.java.net/mailman/listinfo/graal-dev
      • SimpleLanguage interpreter: https://github.com/OracleLabs/GraalVM/tree/master/graal/com.oracle.truffle.sl/src/com/oracle/truffle/sl
      • A Tutorial: http://cesquivias.github.io/blog/2014/10/13/writing-a-language-in-truffle-part-1-a-simple-slow-interpreter/
      • Project
        – http://www.ssw.uni-linz.ac.at/Research/Projects/JVM/Truffle.html
        – http://www.oracle.com/technetwork/oracle-labs/program-languages/overview/index-2301583.html
  • 67. Languages: Small, Elegant, and Fast!
      Compiled SOMMT (RPython): 4.2 KLOC, 3.5x slower (min. 1.6x, max. 6.3x)
      Compiled SOMPE (Truffle+Graal): 9.8 KLOC, 2.8x slower (min. 3%, max. 5x)
      [Figure: the example program's AST and the two benchmark panels from slide 62. Y-axis: runtime normalized to Java (compiled or interpreted)]
      @smarr | http://stefan-marr.de

Editor's Notes

  1. Self-optimizing interpreters are a good way to communicate with the compiler. AST nodes specialize themselves at runtime, based on observed types or values, using speculation and fallback handling. This communicates essential information to the optimizer.
  2. def IntVarWrite.execute_eval_int(frame, anInt): frame.local_int[idx] := anInt
  3. It is about how to determine the compilation unit. Remember, the interpreter is implemented in one language, and the compilation works on the meta-level. The main idea is that we want to take the implementation, add information from the execution context, and use that to do very aggressive and speculative optimizations on the interpreter implementation. This avoids the need to write custom JIT compilers.
  4. No control flow: all instructions are laid out directly. This is ideal for identifying data dependencies, removing redundant operations, and flattening abstraction levels for frameworks, etc.
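The notes mention a representation without control flow in which all instructions are laid out directly. Assuming this refers to the traces recorded by meta-tracing, one iteration of the example loop might look like the flat list below. The tuple-based operation format and `run_trace` are hypothetical and far simpler than RPython's actual trace representation.

```python
# One recorded iteration of: while (cnt < 100) { cnt := cnt + 1; }
# Branches of the interpreter have been replaced by guards.
TRACE = [
    ("guard_int", "cnt"),      # speculation: cnt holds an int
    ("guard_lt", "cnt", 100),  # loop condition observed while recording
    ("add_const", "cnt", 1),   # loop body: cnt := cnt + 1
]


def run_trace(trace, env):
    """Repeat the trace until a guard fails.

    Returns the number of completed iterations and the failing guard.
    """
    completed = 0
    while True:
        for op in trace:
            kind, var = op[0], op[1]
            if kind == "guard_int":
                if type(env[var]) is not int:
                    return completed, op
            elif kind == "guard_lt":
                if not (env[var] < op[2]):
                    return completed, op
            elif kind == "add_const":
                env[var] += op[2]
            else:
                raise ValueError(f"unknown trace operation: {kind}")
        completed += 1
```

Because the list is straight-line code, every operation sees exactly which earlier operations produced its inputs. That is what makes data dependencies and redundant guards easy to spot.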