This document proposes a two-level just-in-time compilation approach using one interpreter and one engine. It finds that by providing different interpreter definitions to the RPython meta-tracing compiler, different kinds of compilers and compilations can be derived, such as tracing, method, and threaded code compilers. The key idea is an adaptive RPython system that performs multitier compilation by generating different interpreters from a generic interpreter and driving the RPython engine accordingly. This challenges the assumption in the JIT community that a meta-tracing compiler can only perform tracing compilation.
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a... (Yusuke Izawa)
This document summarizes Yusuke Izawa's master's thesis defense on stack hybridization, a mechanism for bridging two compilation strategies - tracing and method-based - in a meta compiler framework. The proposal extends a meta-tracing just-in-time (JIT) compiler to apply different compilation strategies to different parts of a program based on call context. A proof-of-concept implementation in OCaml showed the hybrid approach was about 1.1x faster than a method-based only approach and over 100x faster than a tracing only approach.
GCC is a widely used open source compiler system developed by the GNU Project. It compiles C, C++, Java, Fortran and other languages. GCC has undergone major changes to its structure since 2005, including the addition of GENERIC and GIMPLE intermediate representations between the front end and back end. The front end parses source code into ASTs, then GIMPLE trees are optimized through many passes in the middle end before being converted to RTL for the back end code generation.
The document discusses the Cilk programming language and its runtime system for parallel programming. Cilk extends C with keywords like spawn and sync to express parallelism. It provides performance guarantees and automatically manages scheduling across processors. The runtime system uses work-stealing to map Cilk threads to processors with near-optimal efficiency. Cilk allows expressing parallelism while hiding low-level details like load balancing.
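Cilk itself extends C, but the spawn/sync structure described above is plain fork-join, which can be sketched in Python with concurrent.futures. This is an analogy only, with names of my own choosing: Python threads lack Cilk's work-stealing scheduler and performance guarantees.

```python
# A fork-join sketch loosely analogous to Cilk's spawn/sync. Illustration
# of the pattern only, not Cilk: Cilk schedules lightweight tasks with
# work-stealing, while this simply uses one extra OS thread.
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    # ordinary serial fib, used for the subproblems
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def fib_forkjoin(pool, n):
    if n < 2:
        return n
    future = pool.submit(fib, n - 1)   # "spawn": run fib(n-1) concurrently
    y = fib(n - 2)                     # meanwhile compute fib(n-2) here
    return future.result() + y         # "sync": join before combining

with ThreadPoolExecutor(max_workers=2) as pool:
    print(fib_forkjoin(pool, 10))  # 55
```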
eBPF Debugging Infrastructure - Current Techniques (Netronome)
eBPF (extended Berkeley Packet Filter), in particular with its driver-level hook XDP (eXpress Data Path), has increased in importance over the past few years. As a result, the ability to rapidly debug and diagnose problems is becoming more relevant. This talk will cover common issues faced and techniques to diagnose them, including the use of bpftool for map and program introspection, the use of disassembly to inspect generated assembly code and other methods such as using debug prints and how to apply these techniques when eBPF programs are offloaded to the hardware.
The talk will also explore where the current gaps in debugging infrastructure are and suggest some of the next steps to improve this, for example, integrations with tools such as strace, valgrind or even the LLDB debugger.
RaVioli: A Parallel Video Processing Library with Auto Resolution Adjustability (Matsuo and Tsumura lab.)
RaVioli is a parallel video processing library that provides auto resolution adjustability. It hides resolutions from programmers and allows for pseudo real-time processing by adjusting computational loads. The library includes semi-automatic parallelization functions such as automatic block decomposition and a pipelining interface with an automatic load balancing mechanism. Evaluation results demonstrate the library's ability to adjust frame rate and resolution, perform parallelization through block decomposition, and balance loads between pipeline stages.
GCC is a widely used open source compiler. It consists of frontends for languages like C and C++ and backends that generate code for different CPU architectures. The GCC Extensibility Made Easy (GEM) framework allows dynamically loading modules to extend GCC functionality. Examples include adding new language features, improving security, and facilitating operating system development.
The LLVM project is a collection of compiler and toolchain technologies, including an optimizer, code generators, and front-ends like llvm-gcc and Clang. The project aims to provide modular, reusable compiler components to reduce the time and cost of building compilers. It also seeks to implement modern compiler techniques to generate fast, optimized code. LLVM has been used to build fast C/C++ compilers like LLVM-GCC that show improvements in compilation speed and generated code quality compared to GCC.
This document discusses Veriloggen, a Python framework for generating Verilog HDL code from Python. It allows designing hardware at the register-transfer level using Python by mapping Python constructs to Verilog modules, always blocks, wires, and other Verilog constructs. Veriloggen includes modules for RTL generation (Core), connecting Python threads to finite state machines (Thread), and defining streaming hardware (Stream). It aims to support a "Veriloggen for DSL X" approach to create domain-specific hardware description languages in Python.
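The core idea, generating Verilog source from Python objects, can be sketched in a few lines. The class and method names below are hypothetical, chosen for illustration; they are not Veriloggen's actual API.

```python
# A toy sketch of the idea behind Veriloggen: build Verilog HDL text from
# Python objects. Hypothetical mini-API, not Veriloggen itself.
class Module:
    def __init__(self, name):
        self.name = name
        self.ports = []   # (direction, width, name)
        self.body = []    # raw Verilog statements

    def input(self, name, width=1):
        self.ports.append(("input", width, name))
        return name

    def output(self, name, width=1):
        self.ports.append(("output reg", width, name))
        return name

    def always_posedge(self, clk, stmt):
        # emit a clocked always block around a single statement
        self.body.append(
            "  always @(posedge %s) begin\n    %s\n  end" % (clk, stmt))

    def to_verilog(self):
        ports = ",\n".join(
            "  %s %s%s" % (d, "[%d:0] " % (w - 1) if w > 1 else "", n)
            for d, w, n in self.ports)
        return "module %s (\n%s\n);\n%s\nendmodule\n" % (
            self.name, ports, "\n".join(self.body))

m = Module("counter")
clk = m.input("clk")
led = m.output("led", 8)
m.always_posedge(clk, "led <= led + 1;")
print(m.to_verilog())
```

A real framework layers much more on top (state machines, streams, simulation hooks), but the Python-objects-to-HDL-text pipeline is the same shape.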
The document discusses challenges in GPU compilers. It begins with introductions and abbreviations. It then outlines the topics to be covered: a brief history of GPUs, what makes GPUs special, how to program GPUs, writing a GPU compiler including front-end, middle-end, and back-end aspects, and a few words about graphics. Key points are that GPUs are massively data-parallel, execute instructions in lockstep, and require supporting new language features like OpenCL as well as optimizing for and mapping to the GPU hardware architecture.
The document discusses using the GNU Debugger (gdb) to debug applications. It covers when to use a debugger, invoking and configuring gdb, setting breakpoints, examining stack frames and data, disassembling code, and viewing registers. Gdb allows stepping through code, viewing variables and memory, and setting conditional breakpoints to debug programs.
This document discusses hybrid OpenMP and MPI programming. It provides an introduction to hybrid programming and outlines some of the benefits such as exploiting shared memory parallelism within a node using OpenMP while also scaling across nodes with MPI. It discusses different parallelization strategies and considerations for debugging and optimizing hybrid codes. It also provides two examples of hybrid codes: a multi-dimensional array transpose algorithm and the Community Atmosphere Model climate simulation code.
Directive-based approach to Heterogeneous Computing (Ruymán Reyes)
The document discusses a directive-based approach to heterogeneous computing. It describes how applications used in HPC centers commonly use MPI and OpenMP programming models. It also discusses how complexity arises from mixing different Fortran dialects and the need for faster ways to migrate code to new architectures like accelerators without rewriting the code. The document proposes using directives to enhance legacy code for heterogeneous systems in a portable way.
The document discusses stacks and procedures in assembly language programming. It covers stack implementation using registers and instructions, parameter passing methods using registers or stack, and establishing stack frames using ENTER and LEAVE instructions. Procedures can be called using CALL and control returned using RET. The stack is used for temporary data storage, parameter passing, and storing return addresses for procedures.
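The CALL/RET mechanics described above can be modeled with a tiny interpreter of my own devising: CALL pushes the return address and jumps, RET pops it and resumes.

```python
# A minimal simulation of how CALL and RET use the stack. The "program"
# is a list of (op, arg) pairs executed with a program counter (pc).
def run(program):
    stack, pc, out = [], 0, []
    while pc < len(program):
        op, arg = program[pc]
        if op == "call":          # push return address, jump to target
            stack.append(pc + 1)
            pc = arg
        elif op == "ret":         # pop return address, resume the caller
            pc = stack.pop()
        elif op == "print":
            out.append(arg)
            pc += 1
        elif op == "halt":
            break
    return out

prog = [
    ("call", 3),        # 0: call subroutine at address 3
    ("print", "back"),  # 1: executed after RET returns here
    ("halt", None),     # 2
    ("print", "sub"),   # 3: subroutine body
    ("ret", None),      # 4
]
print(run(prog))  # ['sub', 'back']
```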
We describe ocl, a Python library built on top of pyOpenCL and numpy. It allows programming GPU devices using Python. Python functions which are marked up using the provided decorator are converted into C99/OpenCL and compiled using the JIT at runtime. This approach lowers the barrier to entry to programming GPU devices, since it requires only Python syntax and no external compilation or linking steps. The resulting Python program runs even if a GPU is not available. As an example of application, we solve the problem of computing the covariance matrix for historical stock prices and determining the optimal portfolio according to Modern Portfolio Theory.
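The fallback behavior the abstract describes, run as plain Python when no GPU is available, can be sketched with a decorator. The decorator name and the gpu_available() check below are hypothetical; this is the pattern, not ocl's actual API.

```python
# Sketch of the decorator pattern described above: mark a function for GPU
# compilation, but keep the pure-Python version as a fallback. Hypothetical
# names throughout; not ocl's real interface.
def gpu_available():
    return False  # stand-in: a real library would probe for OpenCL devices

def kernel(fn):
    if gpu_available():
        # A real library would translate fn's source to C99/OpenCL here
        # and JIT-compile it for the device.
        raise NotImplementedError
    return fn  # fallback: just run the original Python function

@kernel
def saxpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

print(saxpy(2.0, [1.0, 2.0], [3.0, 4.0]))  # [5.0, 8.0]
```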
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resources (Shinya Takamaeda-Y)
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resources (ReConFig2014@Cancun, Mexico)
flipSyrup, a new framework for rapid prototyping, is proposed.
Everything You Need to Know About the Intel® MPI Library (Intel® Software)
The document discusses tuning the Intel MPI library. It begins with an introduction to factors that impact MPI performance like CPUs, memory, network speed and job size. It notes that MPI libraries must make choices that may not be optimal for all applications. The document then outlines its plan to cover basic tuning techniques like profiling, hostfiles and process placement, as well as intermediate topics like point-to-point optimization and collective tuning. The goal is to help reduce time and memory usage of MPI applications.
The document discusses parallel programming using the Message Passing Interface (MPI). It provides an overview of MPI, including what MPI is, common implementations like OpenMPI, the general MPI API, point-to-point and collective communication functions, and how to perform basic operations like send, receive, broadcast and reduce. It also covers MPI concepts like communicators, blocking vs non-blocking communication, and references additional resources for learning more about MPI programming.
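The collective operations named above (broadcast, scatter, gather, reduce) have simple data-movement semantics. The sketch below is my own: it models each rank's data as a list entry purely to show what moves where; a real MPI program (e.g. with mpi4py) runs one process per rank instead.

```python
# Pure-Python model of MPI collective semantics: element i of each list
# stands for the data held by rank i.
def broadcast(per_rank, root=0):
    return [per_rank[root]] * len(per_rank)  # every rank gets root's value

def scatter(per_rank, root=0):
    return list(per_rank[root])              # root's list is split, one item per rank

def gather(per_rank, root=0):
    return list(per_rank)                    # root collects one value from each rank

def reduce(per_rank, op=lambda a, b: a + b):
    acc = per_rank[0]
    for v in per_rank[1:]:
        acc = op(acc, v)
    return acc                               # combined result ends up at the root

ranks = [10, 20, 30, 40]                     # rank i holds ranks[i]
print(broadcast(ranks))                      # [10, 10, 10, 10]
print(scatter([[1, 2, 3, 4], None, None, None]))  # rank i receives the i-th item
print(reduce(ranks))                         # 100
```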
This document provides an overview of using the Rcpp package to integrate C++ with R code in order to improve performance. It discusses getting started with Rcpp, converting R functions to C++, attributes and classes in Rcpp, handling missing values, Rcpp Sugar for vectorization, using the Standard Template Library, and examples. The key points covered are how Rcpp allows embedding C++ code in R and compiling it to create faster R functions, as well as techniques like Rcpp Sugar and the STL that help write efficient C++ code for R.
The document describes an algorithm for parsing context-free grammars called the CYK algorithm. It works by examining all possible decompositions of the input string into prefix-suffix pairs based on the grammar productions. It runs in O(n³) time, where n is the length of the input string, making it faster than an exhaustive search approach. The algorithm is demonstrated on an example grammar and string to show how it builds up the possible derivations in a bottom-up dynamic programming manner.
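The bottom-up table-filling the summary describes can be written out directly. This is a standard CYK implementation for a grammar in Chomsky normal form; the grammar encoding (a dict of productions) is my own choice.

```python
# CYK parsing for a grammar in Chomsky normal form. table[i][l] holds the
# nonterminals deriving the length-l substring starting at i; the cubic
# loop tries every prefix/suffix split of every substring.
def cyk(grammar, start, s):
    # grammar: nonterminal -> list of productions, where a production is
    # either a terminal string or a (B, C) pair of nonterminals
    n = len(s)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(s):                 # substrings of length 1
        for a, prods in grammar.items():
            if ch in prods:
                table[i][1].add(a)
    for length in range(2, n + 1):             # build longer substrings
        for i in range(n - length + 1):
            for split in range(1, length):     # prefix/suffix split point
                for a, prods in grammar.items():
                    for p in prods:
                        if (isinstance(p, tuple)
                                and p[0] in table[i][split]
                                and p[1] in table[i + split][length - split]):
                            table[i][length].add(a)
    return start in table[0][n]

# Toy grammar: S -> AB | BA, A -> 'a', B -> 'b'
g = {"S": [("A", "B"), ("B", "A")], "A": ["a"], "B": ["b"]}
print(cyk(g, "S", "ab"))   # True
print(cyk(g, "S", "aa"))   # False
```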
The document discusses compiler optimization techniques. It begins with an introduction to compiler optimizations and describes 16 specific optimization techniques including copy propagation, constant folding, dead code removal, and loop unrolling. It explains each technique in detail with examples. The key takeaway is that the more information a programmer provides to the compiler, the better job the compiler can do optimizing the code.
The document discusses various compiler optimizations including:
1. Procedure integration replaces procedure calls with the procedure body to eliminate function call overhead.
2. Common subexpression elimination replaces repeated computations of the same expression with a single variable to store the result.
3. Constant propagation replaces variables assigned a constant value with the constant throughout the code.
4. The document provides examples of these and other optimizations like copy propagation, code motion, induction variable elimination, and loop unrolling, which aim to improve performance by reducing instruction counts and improving pipeline utilization.
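One of the simplest optimizations in this family, constant folding, can be demonstrated with Python's standard ast module. The sketch below is my own and handles only addition of literal constants, but it shows the bottom-up rewrite that a real optimizer generalizes.

```python
# Constant folding over Python ASTs: replace a binary addition of two
# literal constants with the computed constant, folding bottom-up so that
# nested expressions like 1 + 2 + 3 collapse fully.
import ast

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)          # fold children first
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.op, ast.Add)):
            return ast.copy_location(
                ast.Constant(node.left.value + node.right.value), node)
        return node

tree = ast.parse("x = 1 + 2 + 3")
folded = ast.fix_missing_locations(ConstantFolder().visit(tree))
print(ast.unparse(folded))   # x = 6
```

(ast.unparse requires Python 3.9 or later.)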
International Journal of Engineering Research and Development (IJERD Editor)
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture Engineering,
Aerospace Engineering.
The document discusses run-time environments and activation records. It explains that activation records are used to manage information for each procedure call and are allocated on the stack. Activation records contain fields for return values, parameters, local variables, and more. When a procedure is called, its activation record is pushed onto the stack and popped off when it returns. Activation records allow recursive calls by creating a new record each time a procedure is activated.
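The push-on-call, pop-on-return behavior can be modeled explicitly. The frame layout below is a deliberate simplification of my own (just a parameter and a return-value slot), but it shows why recursion works: every activation gets its own record.

```python
# Factorial with an explicit stack of activation records instead of the
# language's call stack. Each dict is one activation record; a call pushes
# one and a return pops it.
def factorial(n):
    stack = [{"param": n, "retval": None}]   # the run-time stack
    result = None                            # last returned value
    while stack:
        frame = stack[-1]
        if frame["param"] <= 1:
            frame["retval"] = 1              # base case
        elif frame["retval"] is None and result is None:
            # "call": push a fresh activation record for factorial(n-1)
            stack.append({"param": frame["param"] - 1, "retval": None})
            continue
        else:
            # child has returned; combine its result with our parameter
            frame["retval"] = frame["param"] * result
        result = stack.pop()["retval"]       # "return": pop this record
    return result

print(factorial(5))  # 120
```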
Cray XT Porting, Scaling, and Optimization Best Practices (Jeff Larkin)
The document discusses optimization best practices for Cray XT systems. It covers choosing compilers and compiler flags, profiling and debugging codes at scale with hardware performance counters and CrayPAT tools, optimizing communication with MPI by using techniques like pre-posting receives and reducing collectives, and optimizing I/O. The document emphasizes testing optimizations on the number of nodes the application will actually run on.
This document discusses event tracing using VampirTrace and Vampir. It provides an overview of event tracing, including instrumentation, run-time measurement, and visualization. Event tracing involves instrumenting code to record events, running the instrumented code to generate trace files, and using tools like Vampir to analyze and visualize the trace files.
This document discusses porting, scaling, and optimizing applications on Cray XT systems. It covers topics such as choosing compilers, profiling and debugging applications at scale, understanding CPU affinity, and improvements in the Cray Message Passing Toolkit (MPT). The document provides guidance on leveraging different compilers, collecting performance data using hardware counters and CrayPAT, understanding MPI process binding, and enhancements in MPT 4.0 related to MPI standards support and communication optimizations.
The document discusses updates to the POSIX and C standards. Regarding POSIX, it summarizes the new features in POSIX:2008, including expanded API sets derived from Linux standards. For C, it outlines proposals and changes in C1X, the next revision of the C standard, such as new character types for UTF-16/32, bounds-checking interfaces, and dynamic memory allocation functions. It provides status updates on implementations in various operating systems.
Python is a general-purpose, high-level programming language and one of the simplest languages to pick up. Its syntax is simple, easy to remember, and quite expressive. When it comes to learning, the learning curve for Python has been found to be gentler than that of most other programming languages. Python is free, so you don't have to spend on licensing, and since it is open source its source code is freely available and can be redistributed and modified. Python was developed to bridge the gap between C and shell scripting, and it also incorporates the exception-handling feature from the ABC language. So we can say that Python was initially an interpreted language, but it was later made both compiled and interpreted.
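CPython makes the "compiled and interpreted" point visible directly: the compile() built-in turns source text into a bytecode object, and the dis module shows the instructions the interpreter loop then executes.

```python
# Source -> bytecode (compile step) -> interpreted execution. The dis
# module disassembles the bytecode so you can see the compiled form.
import dis

code = compile("print(1 + 2)", "<example>", "exec")
print(type(code).__name__)   # code
dis.dis(code)                # lists the bytecode instructions
```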
Learn more about Python programming with Learnbay.
Visit: www.learnbay.co
The core idea of PyPy is to produce a flexible and fast implementation of the Python programming language. The talk will cover the interpreter, translator and jit parts of the code and their relationships and the fundamental ways in which PyPy differs from other virtual machine implementations.
Talk at PyCon2022 over building binary packages for Python. Covers an overview and an in-depth look into pybind11 for binding, scikit-build for creating the build, and build & cibuildwheel for making the binaries that can be distributed on PyPI.
Network Traffic Monitoring Using Python and ntop (PyCon Italia)
Two-level Just-in-Time Compilation with One Interpreter and One Engine
1. Two-level Just-in-Time Compilation with One Interpreter and One Engine
Yusuke Izawa¹, Hidehiko Masuhara¹, Carl Friedrich Bolz-Tereick²
¹Tokyo Institute of Technology
²Heinrich-Heine-Universität Düsseldorf
PEPM 2022, January 18, 2022
Two-level JIT Compilation with .. PEPM 2022 1 / 15
2. Outline
1. Introduction: the folklore in the JIT community and our findings
2. Proposal: Adaptive RPython, which performs multitier compilation with “one interpreter” and “one engine”
3. Observation: confirming that Adaptive RPython “actually” works
3. Folklore: A Meta-JIT Compiler Performs a Fixed Kind of JIT Compilation
• Conventionally, one builds an interpreter from scratch to realize each different kind of JIT compiler:
− Interp_tracing → meta-JIT compiler (e.g. RPython) → tracing JIT (e.g. PyPy)
− Interp_method → meta-JIT compiler (e.g. Truffle/Graal) → method JIT (e.g. TruffleRuby)
− Interp_threaded → meta-JIT compiler → threaded code gen. [ICOOOLPS 2021]
(threaded code: code consisting of CALL insts to bytecode handlers, removing indirect branching)
6. Our Findings Will Affect the JIT Community’s Assumption
The JIT community assumes that:
• a meta-tracing JIT compiler can only do tracing compilation
RPython[interp, source] = p_tracing
But, with our findings:
• a meta-tracing JIT can do several kinds of compilation, such as
− method compilation
− threaded code compilation
− etc.
RPython[interp, source] = p_α, p_β, p_γ, · · ·
9. Our Findings
By providing different interp definitions to RPython, we can derive different kinds of outputs. E.g., when one passes:
• interp_tracing to RPython → tracing compilation
• interp_method to RPython → method compilation
• interp_threaded to RPython → threaded compilation
RPython[interp_tracing, source] = p_tracing
RPython[interp_method, source] = p_method
RPython[interp_threaded, source] = p_threaded
In other words, by changing the interpreter, we can get different kinds of compilers.
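The finding above can be sketched in plain Python. This is a toy illustration, not RPython itself, and every name in it is hypothetical: one generic engine, given different interpreter definitions and the same source program, derives outputs of different shapes — a linear trace of one hot path versus path-independent threaded code.

```python
# Toy sketch (all names hypothetical): one "engine", many interpreter
# definitions, different kinds of compiled output.

def engine(interp, bytecode):
    """Run `interp` over `bytecode` and return whatever it records."""
    return interp(bytecode)

def interp_tracing(bytecode):
    # Tracing-style: follow one concrete execution path and record it.
    trace, pc, acc = [], 0, 0
    while pc < len(bytecode):
        op, arg = bytecode[pc]
        trace.append((op, arg))      # record the operation actually executed
        if op == "ADD":
            acc += arg
        elif op == "JUMP_IF_LT" and acc < arg:
            pc = 0                   # taken branch: keep tracing the loop
            continue
        pc += 1
    return trace

def interp_threaded(bytecode):
    # Threaded-code-style: traverse the whole method body once and emit a
    # call to the handler of every instruction, independent of the path taken.
    return [("CALL_HANDLER", op) for op, _ in bytecode]

prog = [("ADD", 1), ("JUMP_IF_LT", 3)]
p_tracing  = engine(interp_tracing, prog)   # linear trace of the hot loop
p_threaded = engine(interp_threaded, prog)  # one handler call per instruction
```

Swapping `interp_tracing` for `interp_threaded` changes the kind of output without touching `engine` — the same relationship the slides write as RPython[interp, source] = p.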
15. Proposal: Multitier Compilation on Adaptive RPython
Adaptive RPython performs multitier compilation with “one interpreter” and “one engine”.
Optimization levels: threaded code → baseline (tracing, method, or tracing + method) → level 2 → · · ·
With Adaptive RPython:
• one generic interp. → a common interp. + slightly different definitions
• all performed on one engine = RPython
17. Overview: Adaptive RPython Performs Multitier Compilation
(1) A developer writes a generic interp.
(2) Pass information to the pre-processor: which instructions will be transformed?
(3) The pre-processor generates interps from the generic interp: I_common, I_tracing, I_threaded, I_method.
(4) Pass the source program and info about static and dynamic inputs to the Adaptive RPython P.E. system.
(5) Tracing compilation: choose I_common and I_tracing → RPython[I_common + I_tracing, P, V] = P′_tracing
(6) Threaded code gen. [Izawa et al., 2021]: choose I_common and I_threaded → RPython[I_common + I_threaded, P, V] = P′_threaded
(7) Method compilation: choose I_common and I_method → RPython[I_common + I_method, P, V] = P′_method
24. Overview: How to Drive the RPython Engine [ICOOOLPS 2021]
Meta-tracing JIT
• Trace the execution of an interp.
[Figure: a bytecode method (blocks A–D ending in JUMP, which calls a function E, F, RET) and its linear RPython trace:]
  [p0]
  i1 = load(..)
  i1 = int_add(..)
  i2 = int_lt(..)
  guard_true(i2) [p0]
  ..
  jump(p0)
Threaded code generation (achieved by tweaking an interp)
• Traverse the entire method body
• Do not trace inside the handlers, but leave a CALL to each handler
• Cut/stitch the temporal trace
[Figure: the same method compiled as handler calls plus a bridge:]
  [p0]
  i7 = call_i(ConstClass(DUP, ..))
  i12 = call_i(ConstClass(CONST_I ..))
  i16 = call_i(ConstClass(LT, ..))
  guard_true(i16) [p0]
  ...
  jump(p0)
  bridge:
  [p0]
  ...
  i28 = call_i(ConstClass(CALL, ..))
  ...
  i32 = call_i(ConstClass(RET, ..2))
  leave_portal_frame(0)
  finish(i32)
29. Method-traversal Interpreter: How to Drive the RPython Engine [ICOOOLPS 2021]
Traverse the entire method body depth-first with traverse_stack:

  @dont_look_inside     # suppress inlining of the handlers
  def ADD():
      ..

  while True:
      if opcode == JUMP_IF:
          top = pop()
          target = bytecode[pc]
          pc += 1
          if top.is_true():
              traverse_stack.push(pc)       # save the other side to traverse later
              pc = target
          else:
              traverse_stack.push(target)   # save the other side to traverse later
      elif opcode == JUMP:
          target = bytecode[pc]
          pc += 1
          if not traverse_stack.is_empty():
              pc = traverse_stack.pop()     # jump to a saved side
          else:
              finish()
      elif opcode == RET:
          if not traverse_stack.is_empty():
              pc = traverse_stack.pop()     # jump to a saved side
          else:
              return pop()
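The traversal described above can be made concrete with a small runnable sketch. The bytecode encoding and helper names here are our own toy choices, not the paper's: at a conditional branch one side is followed immediately and the other is pushed on `traverse_stack`, so a single pass visits the entire method body depth-first.

```python
# Toy method-traversal sketch: returns the order in which instruction
# positions are visited, covering both branch sides in one pass.

def traverse(bytecode):
    order, stack, pc = [], [], 0
    while True:
        op = bytecode[pc]
        order.append(pc)
        if op == "JUMP_IF":
            target = bytecode[pc + 1]  # operand slot holds the branch target
            stack.append(pc + 2)       # save the fall-through side for later
            pc = target                # traverse the taken side first
        elif op == "JUMP" or op == "RET":
            if stack:
                pc = stack.pop()       # resume a saved side instead of looping
            else:
                return order           # whole method body covered
        else:
            pc += 1

# positions: 0 JUMP_IF, 1 operand (target=3), 2 RET, 3 SUB, 4 RET
layout = traverse(["JUMP_IF", 3, "RET", "SUB", "RET"])
# visits 0, then the taken side (3, 4), then the saved side (2)
```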
33. The Design of the Generic Interpreter
• From a generic interp, Adaptive RPython generates interps, including the MTI (method-traversal interpreter).
Embed tier-specific definitions in a meta-tracing-based interpreter:
1. Declare JitTierDriver.
2. Define can_enter_tier1_XX at JUMP_IF, JUMP, and RET for threaded code gen. and method comp.
3. Define the interp. for the tracing JIT inside we_are_in_tier2.
4. The pre-processor generates the method-traversal interp and the tracing interp from this.

  jittierdriver = JitTierDriver(pc='pc')

  def interp(self):
      ..
      if opcode == JUMP_IF:
          target = bytecode[pc]
          jittierdriver.can_enter_tier1_branch(
              true_path=target, false_path=pc+1,
              cond=self.is_true)
          if we_are_in_tier2():
              # do stuff for tracing JIT
              ..
      elif opcode == JUMP:
          target = bytecode[pc]
          jittierdriver.can_enter_tier1_jump(target=target)
          if we_are_in_tier2():
              # do stuff for tracing JIT
              ..
      elif opcode == RET:
          w_x = self.pop()
          jittierdriver.can_enter_tier1_ret(ret_value=w_x)
          if we_are_in_tier2():
              # do stuff for tracing JIT
              ..
      elif ..
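Step 4 — deriving the method-traversal and tracing interps from the generic one — might conceptually look like the following toy pre-processor. The handler-table layout and all names here are hypothetical; the slides only say that JUMP_IF, JUMP, and RET carry tier-specific definitions while the other opcodes are shared.

```python
# Hypothetical pre-processor sketch: split a generic interpreter's handler
# table into one interpreter per tier, keeping the common handlers and
# overriding only the instructions marked for transformation.

GENERIC = {
    "ADD":     {"common": "do_add"},
    "JUMP_IF": {"common": "do_branch", "tracing": "trace_branch",
                "threaded": "emit_branch_stub"},
    "JUMP":    {"common": "do_jump", "tracing": "trace_jump",
                "threaded": "pop_traverse_stack"},
    "RET":     {"common": "do_ret", "tracing": "trace_ret",
                "threaded": "cut_and_stitch"},
}

def preprocess(generic, tier):
    """Derive one interpreter: common handlers plus this tier's overrides."""
    interp = {}
    for op, defs in generic.items():
        interp[op] = defs.get(tier, defs["common"])
    return interp

I_tracing  = preprocess(GENERIC, "tracing")   # for tracing compilation
I_threaded = preprocess(GENERIC, "threaded")  # the method-traversal interp
```

The point of the split is that ADD and other plain opcodes are written once and shared (I_common), matching the "one generic interpreter" claim.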
37. Observation: Can Adaptive RPython “Actually” Work? (1)
Setup
• Write a TLA lang. interpreter in Adaptive RPython
• Run the TLA interpreter on small examples
− loopabit: a nested loop
− callabit: two functions – one suitable for tracing, the other for threaded code gen. (method)
NOTE
• The current multitier is the combination of threaded code gen. and tracing (two-level)
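The slides do not show the TLA source of callabit; purely as an illustration of the intended shape, its two functions might look like this in Python (all code here is our guess, not the benchmark itself): g is a hot, predictable loop suited to tracing, while f is branchy and call-heavy, suited to threaded code generation.

```python
# Hypothetical shape of the callabit benchmark's two functions.

def g(n):
    # trace-friendly: one hot loop with a stable path
    acc = 0
    for i in range(n):
        acc += i
    return acc

def f(n):
    # threaded/method-friendly: unpredictable branches that would make
    # a tracing JIT record many diverging traces
    total = 0
    for i in range(n):
        if i % 3 == 0:
            total += g(i)   # f calls g, as in the slides' call/ret diagram
        elif i % 3 == 1:
            total -= i
        else:
            total += 2 * i
    return total
```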
38. Observation: Can Adaptive RPython “Actually” Work? (2)
Situation in callabit: increasing the optimization level
program                   | JIT applied to f | JIT applied to g
callabit_baseline_interp  | threaded         | (interpreted)
callabit_baseline_only    | threaded         | threaded
callabit_tracing_baseline | tracing          | baseline
callabit_baseline_tracing | threaded         | tracing
callabit_tracing_only     | tracing          | tracing
[Figure: function f (written for threaded code gen.) calls function g (written for tracing comp.) and returns; each configuration applies the listed strategy to each side of the call.]
43. Observation: Running Speeds and Trace Sizes
• It actually worked: the running speed approaches that of tracing compilation.
• Promising signs: multitier runs at the same speed but with a smaller code size than a single tier → it might get good performance in the future.
[Figure: “TLA w/ Adaptive RPython (stable speed)”: speed-up ratio (interp = 1, scale 0.0–3.0; higher is better) and number of traces (0–400; smaller is better) for callabit_baseline_interp, callabit_baseline_only, callabit_baseline_tracing, callabit_tracing_baseline, and callabit_tracing_only.]
45. Conclusion and Future Work
Conclusion
• Adaptive RPython actually worked on a small lang.
• One engine, one interpreter, multitier outputs:
RPython[I, P, V] = P′
RPython[I_common + I_tracing, P, V] = P′_tracing
RPython[I_common + I_threaded, P, V] = P′_threaded
RPython[I_common + I_method, P, V] = P′_method
(the common interp. and the tweaked defs. derive from the generic interp.)
Future Work
• Decide the multitier compilation strategy
− How to shift between levels?
− How to decide an appropriate level?
• Implement our ideas on PyPy
48. References
Izawa, Y., Masuhara, H., Bolz-Tereick, C. F., and Cong, Y. (2021). Threaded code generation with a meta-tracing JIT compiler. The Journal of Object Technology, Special Issue for ICOOOLPS 2021, pages 1–11.