Just-In-Time Compiler in PHP 8
Nikita Popov @ betterCode PHP 8
About Me
● Dmitry Stogov works on JIT
● I work on everything else :)
● My JIT involvement is mostly QA
Just-In-Time (JIT) Compiler
Without JIT: PHP Code → Opcodes → Virtual Machine → CPU
With JIT: PHP Code → Opcodes → JIT → Machine Code → CPU
History
● Old project started by Zend in PHP 5 times
● Mainly implemented by Dmitry Stogov
● Early prototypes: the rest of PHP was too slow for the JIT to matter
  – Too many allocations
  – Too much memory usage
  – Too much pointer chasing
  – Cache locality is key
● The PHPNG project (later: PHP 7) was started to optimize PHP
● Large performance improvements (2x), no JIT needed!
● SSA and type inference from the JIT were integrated into opcache
● Used for opcode optimizations
  – Constant Propagation
  – Dead Code Elimination
  – Refcount Optimization
Configuration
● Enable opcache
● opcache.jit_buffer_size=128M
● Done!
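To check at runtime whether these settings took effect, one can ask opcache for its status. The following is a minimal sketch, not an official recipe: the helper name `jit_status_summary` is hypothetical, and it assumes that `opcache_get_status()` in PHP 8 reports a `jit` entry with `enabled` and `buffer_size` keys. It falls back to a plain message when opcache is not loaded or is disabled.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: summarize whether the JIT is active in this process.
 * Assumes the PHP 8 layout of opcache_get_status() (a 'jit' sub-array).
 */
function jit_status_summary(): string
{
    if (!function_exists('opcache_get_status')) {
        return 'opcache extension not loaded';
    }

    // Returns false when opcache is disabled for this SAPI.
    $status = @opcache_get_status(false);
    if (!is_array($status) || !isset($status['jit'])) {
        return 'opcache disabled or no JIT information available';
    }

    $jit = $status['jit'];
    if (empty($jit['enabled'])) {
        return 'JIT disabled';
    }

    return sprintf('JIT enabled, buffer size %d bytes', $jit['buffer_size'] ?? 0);
}

echo jit_status_summary(), PHP_EOL;
```

On the command line, opcache is normally off unless `opcache.enable_cli=1` is set, so the script may report a disabled opcache there even when the web SAPI has the JIT running.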
Configuration
● Advanced configuration:
  – opcache.jit (CRTO)
  – opcache.jit_debug, opcache.jit_bisect_limit
  – opcache.jit_max_root_traces, opcache.jit_max_side_traces, opcache.jit_max_exit_counters
  – opcache.jit_hot_loop, opcache.jit_hot_func, opcache.jit_hot_return, opcache.jit_hot_side_exit
  – opcache.jit_blacklist_root_trace, opcache.jit_blacklist_side_trace
  – opcache.jit_max_loop_unrolls, opcache.jit_max_recursive_calls, opcache.jit_max_recursive_returns, opcache.jit_max_polymorphic_calls
  – https://www.php.net/manual/en/opcache.configuration.php
Performance
[Bar chart: results for bench.php, micro_bench.php, PHP-Parser, amphp, Symfony Demo and "With Preloading", on an axis from 0 to 3.5, relative to the baseline (Opcache + No JIT); the bar values are not recoverable]
● Heavily depends on workload
● Larger impact the more time is spent executing PHP code (rather than e.g. DB queries)
● More useful for "non-standard" applications
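Since the benefit depends so much on the workload, measuring your own code is the most reliable guide. Below is a small timing sketch under stated assumptions: `sum_to` and `time_ms` are hypothetical names, and the loop is a deliberately CPU-bound toy, which is the kind of code where a JIT has the best chance of helping.

```php
<?php
declare(strict_types=1);

/** CPU-bound toy workload: sums the integers 0 .. $n - 1. */
function sum_to(int $n): int|float
{
    $sum = 0;
    for ($i = 0; $i < $n; $i++) {
        $sum += $i;
    }
    return $sum;
}

/** Runs $fn once and returns the elapsed wall-clock time in milliseconds. */
function time_ms(callable $fn): float
{
    $start = hrtime(true);          // monotonic clock, nanoseconds
    $fn();
    return (hrtime(true) - $start) / 1e6;
}

$ms = time_ms(fn () => sum_to(1_000_000));
printf("sum_to(1_000_000) took %.2f ms\n", $ms);
```

One way to compare is to run the same script twice from the CLI, once with `-d opcache.enable_cli=1 -d opcache.jit_buffer_size=128M` and once with `-d opcache.jit_buffer_size=0`. A single run is noisy, so repeat the measurement a few times before drawing conclusions.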
Function JIT
● opcache.jit=function
● Always JITs a whole function
[Diagram: the same pipeline as before (PHP Code, Opcodes, Virtual Machine, JIT, Machine Code, CPU) with an added "Trigger" step]
● Trigger: When to JIT
  – 0: All functions, on script load
  – 1: All functions, on first execution
  – 2: Profile first request, JIT hot functions
  – 3: Profile on the fly, JIT hot functions
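The trigger values above correspond to the "T" in the four-digit `opcache.jit` value (CRTO). The sketch below decodes only that digit, using the descriptions from the slide. It assumes the value is written as four decimal digits with the trigger in third position; `describe_trigger` and `JIT_TRIGGERS` are hypothetical names, and digits not listed on the slide are reported as unknown rather than guessed.

```php
<?php
declare(strict_types=1);

/** Trigger descriptions exactly as listed on the slide. */
const JIT_TRIGGERS = [
    0 => 'All functions, on script load',
    1 => 'All functions, on first execution',
    2 => 'Profile first request, JIT hot functions',
    3 => 'Profile on the fly, JIT hot functions',
];

/**
 * Hypothetical helper: describe the trigger digit of a CRTO value.
 * Assumption: four decimal digits, trigger (T) is the third one.
 */
function describe_trigger(string $crto): string
{
    if (!preg_match('/^\d{4}$/', $crto)) {
        throw new InvalidArgumentException("Expected four digits, got '$crto'");
    }

    $trigger = (int) $crto[2];
    return JIT_TRIGGERS[$trigger] ?? "trigger $trigger (not covered by the slide)";
}

echo describe_trigger('1205'), PHP_EOL;   // All functions, on script load
```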
<?php
function sum(int $n) {
    $sum = 0;
    for ($i = 0; $i < $n; $i++) {
        $sum += $i;
    }
    return $sum;
}

<?php
function sum(int $n) {
entry:
    $sum = 0;
    $i = 0;
    goto cond;
loop:
    $sum += $i;
    $i++;
cond:
    if ($i < $n) goto loop;
finish:
    return $sum;
}

<?php
function sum(int $n) {
entry:
    $sum_0 = 0;
    $i_0 = 0;
    goto cond;
loop:
    $sum_2 = $sum_1 + $i_1;
    $i_2 = $i_1 + 1;
cond:
    $sum_1 = phi(entry: $sum_0, loop: $sum_2);
    $i_1 = phi(entry: $i_0, loop: $i_2);
    if ($i_1 < $n) goto loop;
finish:
    return $sum_1;
}

<?php
function sum(int $n) {
entry:
    $sum_0 = 0; # int
    $i_0 = 0; # int
    goto cond;
loop:
    $sum_2 = $sum_1 + $i_1; # int|float
    $i_2 = $i_1 + 1; # int
cond:
    $sum_1 = phi(entry: $sum_0, loop: $sum_2); # int|float
    $i_1 = phi(entry: $i_0, loop: $i_2); # int
    if ($i_1 < $n) goto loop;
finish:
    return $sum_1;
}
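The `int|float` annotation on `$sum` reflects a PHP language rule: adding two integers yields a float when the result does not fit into an integer. The short sketch below demonstrates that rule; `add_ints` is a hypothetical name used only for illustration.

```php
<?php
declare(strict_types=1);

/** Adds two ints the way the "+" operator does; the result may be a float. */
function add_ints(int $a, int $b): int|float
{
    return $a + $b;
}

var_dump(add_ints(1, 2));            // int(3)
var_dump(is_float(add_ints(PHP_INT_MAX, 1)));   // bool(true): overflow promotes to float
```

This is why type inference alone cannot prove that `$sum` stays an integer, even though every operand in the loop starts out as one.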
...
.L2:
    mov $0x0, 0x60(%r14)                    # Assign int(0) to $sum (%r14: frame pointer)
    mov $0x4, 0x68(%r14)
    xor %rdx, %rdx                          # Assign 0 to $i (in register)
    jmp .L5
.L3:
    mov %rsi, 0x50(%r14)
    mov $0x4, 0x58(%r14)
    cmp $0x4, 0x68(%r14)                    # Check whether $sum is int
    jnz .L10
    mov 0x60(%r14), %rax                    # Load $sum to register
    add %rdx, %rax                          # Add $sum and $i
    jo .L9                                  # Check if addition overflowed
    mov %rax, 0x60(%r14)                    # Write result back
.L4:
    add $0x1, %rdx                          # Increment $i (in register)
.L5:
...

...
.L9:
    vxorps %xmm0, %xmm0, %xmm0
    vcvtsi2sd 0x60(%r14), %xmm0, %xmm0      # Convert $sum to float
    vxorps %xmm1, %xmm1, %xmm1
    vcvtsi2sd %rdx, %xmm1, %xmm1            # Convert $i to float
    vaddsd %xmm1, %xmm0, %xmm0              # Add $sum and $i as floats
    vmovsd %xmm0, 0x60(%r14)
    mov $0x5, 0x68(%r14)                    # Mark $sum slot as float
    jmp .L4
.L10:
    vxorps %xmm0, %xmm0, %xmm0
    vcvtsi2sd %rdx, %xmm0, %xmm0            # Convert $i to float
    vaddsd 0x60(%r14), %xmm0, %xmm0         # Add (float)$i to $sum
    vmovsd %xmm0, 0x60(%r14)
    jmp .L4
.L11:
...
● This code is almost certainly unused!
● Can't store $sum in register, because it might turn float
Tracing JIT
[Diagram: VM Execution + Profiling → (Hot) → Trace Collection → Trace Compilation → Trace Execution → (Deoptimization) → VM Execution + Profiling]
<?php
function sum(int $n) {
entry:
    $sum = 0;
    $i = 0;
    goto cond;
loop:
    $sum += $i;
    $i++;
cond:
    if ($i < $n) goto loop;
finish:
    return $sum;
}

<?php
trace:
    if ($i < $n)
    $sum += $i;
    $i++;
    goto trace;

<?php
$sum_0 = ...;
$i_0 = ...;
trace:
    $sum_1 = phi($sum_0, $sum_2);
    $i_1 = phi($i_0, $i_2);
    if ($i_1 < $n)
    $sum_2 = $sum_1 + $i_1;
    $i_2 = $i_1 + 1;
    goto trace;

<?php
$sum_0 = ...; # int
$i_0 = ...;
trace:
    $sum_1 = phi($sum_0, $sum_2);
    $i_1 = phi($i_0, $i_2);
    if ($i_1 < $n) # does not exit
    $sum_2 = $sum_1 + $i_1; # int
    $i_2 = $i_1 + 1;
    goto trace;
    sub $0x10, %rsp
    mov $EG(jit_trace_num), %rax
    mov $0x1, (%rax)
    cmp $0x4, 0x68(%r14)                # Check if $sum is int (exit 0)
    jnz jit$$trace_exit_0
    mov 0x50(%r14), %rcx                # Load $n, $sum, $i into registers
    mov 0x60(%r14), %rdx
    mov 0x70(%r14), %rsi
.L1:
    cmp %rcx, %rsi                      # Check $i < $n (exit 1)
    jge jit$$trace_exit_1
    mov %rdx, %rax                      # $sum += $i, check overflow (exit 2)
    add %rsi, %rax
    jo jit$$trace_exit_2
    mov %rax, %rdx
    add $0x1, %rsi                      # $i++
    mov $EG(vm_interrupt), %rax         # Check VM interrupt, like timeout (exit 3)
    cmp $0x0, (%rax)
    jz .L1
    jmp jit$$trace_exit_3
● Exits go to VM or side traces
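The guard-and-exit structure of this trace can be modelled in plain PHP. The sketch below is only an analogy, not how the JIT is implemented: `run_trace` is a hypothetical function that runs the integer-only fast path and returns the number of the guard that stopped it, mirroring exits 0 to 2 above. Exit 3 (VM interrupt) is not modelled, and `$i` is assumed to be non-negative.

```php
<?php
declare(strict_types=1);

/**
 * Toy model of trace 1. State is passed by reference, standing in for the
 * VM stack slots the real trace reads and writes.
 * Returns the exit number: 0 = $sum is not int, 1 = loop finished,
 * 2 = the addition would overflow.
 */
function run_trace(int $n, int|float &$sum, int &$i): int
{
    if (!is_int($sum)) {
        return 0;                       // exit 0: type guard failed
    }

    while (true) {
        if ($i >= $n) {
            return 1;                   // exit 1: $i < $n no longer holds
        }
        if ($sum > PHP_INT_MAX - $i) {
            return 2;                   // exit 2: $sum + $i would overflow
        }
        $sum += $i;                     // int-only fast path
        $i++;
    }
}

$sum = 0;
$i = 0;
$exit = run_trace(5, $sum, $i);
echo "exit $exit, sum $sum, i $i", PHP_EOL;   // exit 1, sum 10, i 5
```

After any exit, the caller still holds a consistent `$sum` and `$i`, which is the property that lets execution continue in the VM or in a side trace.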
TRACE-2$sum$5:
    mov $EG(jit_trace_num), %rax
    mov $0x2, (%rax)
    mov 0x70(%r14), %rax
    cmp 0x50(%r14), %rax
    jge jit$$trace_exit_0
    cmp $0x5, 0x68(%r14)                # Check if $sum is float
    jnz jit$$trace_exit_1
    vxorps %xmm0, %xmm0, %xmm0          # $sum += (float) $i
    vcvtsi2sd %rax, %xmm0, %xmm0
    vaddsd 0x60(%r14), %xmm0, %xmm0
    vmovsd %xmm0, 0x60(%r14)
    add $0x1, 0x70(%r14)
    mov $EG(vm_interrupt), %rax
    cmp $0x0, (%rax)
    jz TRACE-1$sum$5+4
    jmp jit$$trace_exit_2
[The trace 1 listing is shown again, with one of its exits pointing to Trace 2]
Interception
● Each opcode stores a "VM handler" pointer
● Replace handler at function entry, loop headers, returns
● Handler counts executions and invokes JIT
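The counting-handler idea can be illustrated with a small PHP class. This is a loose analogy under stated assumptions, not the engine's mechanism: `HotCounter` is a hypothetical name, the "interpreted" and "compiled" code paths are ordinary closures, and the threshold stands in for settings such as `opcache.jit_hot_func`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a replaced VM handler: counts executions and,
 * once the threshold is reached, swaps in a "compiled" implementation.
 */
final class HotCounter
{
    private int $count = 0;
    private bool $compiled = false;

    public function __construct(
        private int $threshold,
        private \Closure $handler,   // current implementation (starts as the slow path)
        private \Closure $compile,   // returns the fast implementation when invoked
    ) {
    }

    public function __invoke(mixed ...$args): mixed
    {
        if (!$this->compiled && ++$this->count >= $this->threshold) {
            $this->handler = ($this->compile)();   // "invoke the JIT" once
            $this->compiled = true;
        }
        return ($this->handler)(...$args);
    }

    public function isCompiled(): bool
    {
        return $this->compiled;
    }
}

$increment = new HotCounter(
    3,
    fn (int $x): int => $x + 1,
    fn (): \Closure => fn (int $x): int => $x + 1,
);

for ($call = 1; $call <= 4; $call++) {
    $increment($call);
    printf("call %d: compiled=%s\n", $call, $increment->isCompiled() ? 'yes' : 'no');
}
```

Once the swap has happened, later calls skip the counting branch entirely, which loosely corresponds to a hot function no longer paying for profiling.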
Trace Collection
● Separate VM that collects type info while executing
● Traces can span different loops and functions
  – Calls effectively get "inlined"
Code Generation
● Early prototypes used LLVM
  – Architecture agnostic
  – Supports many sophisticated optimizations
  – But: Extremely slow compile-times
● Now using DynASM from the LuaJIT project
  – Very fast
  – But: Architecture specific
|.macro LONG_MATH_REG, opcode, dst_reg, src_reg
||  switch (opcode) {
||      case ZEND_ADD:
|           add dst_reg, src_reg
||          break;
||      case ZEND_SUB:
|           sub dst_reg, src_reg
||          break;
||      case ZEND_MUL:
|           imul dst_reg, src_reg
||          break;
||      case ZEND_BW_OR:
|           or dst_reg, src_reg
||          break;
||      case ZEND_BW_AND:
|           and dst_reg, src_reg
||          break;
        ...
||  }
|.endmacro
● Lines starting with "||": C code
● Lines starting with "|": X86 Assembly with placeholders
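The macro's idea is that ordinary control flow picks which instruction gets emitted, at code-generation time. The PHP sketch below imitates that with strings. It is an analogy only: DynASM emits machine code rather than text, and `long_math_reg` is a hypothetical function that covers just the five opcodes visible in the excerpt.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical text-only analogue of LONG_MATH_REG: choose the mnemonic
 * for an opcode and fill in the register placeholders.
 */
function long_math_reg(string $opcode, string $dstReg, string $srcReg): string
{
    $mnemonic = match ($opcode) {
        'ZEND_ADD'    => 'add',
        'ZEND_SUB'    => 'sub',
        'ZEND_MUL'    => 'imul',
        'ZEND_BW_OR'  => 'or',
        'ZEND_BW_AND' => 'and',
        default       => throw new InvalidArgumentException("Unhandled opcode $opcode"),
    };

    return "$mnemonic $dstReg, $srcReg";
}

echo long_math_reg('ZEND_ADD', 'rax', 'rdx'), PHP_EOL;   // add rax, rdx
```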
Code Generation
● DynASM itself supports many architectures
● But JIT code has to be written for each
● No support for M1 at this time, sorry!
Closing Thoughts
● Performance benefit is workload dependent
  – Try it!
● Room for improvement
  – E.g. optimizations (loop invariant code motion, etc.)
● Concern: Stability
  – Increased potential for hard-to-debug, hard-to-reproduce bugs
● Concern: Maintenance
  – Only one person really understands the JIT
Thank You!
