JIT compilation
for CPython
Dmitry Alimov
SPb Python
JIT compilation and JIT history
My experience with JIT in CPython
Python projects that use JIT and projects for JIT
What is JIT compilation
The earliest JIT compiler is generally attributed to John McCarthy's work on LISP in 1960
In 1968 Ken Thompson used JIT compilation for regular expressions in the text editor QED
Popularized by Java; James Gosling used the term from 1993
The term is borrowed from just-in-time manufacturing, also known as just-in-time production or the Toyota Production System (TPS)
My experience with JIT in CPython
def fibonacci(n):
    """Returns n-th Fibonacci number"""
    a = 0
    b = 1
    if n < 1:
        return a
    i = 0
    while i < n:
        temp = a
        a = b
        b = temp + b
        i += 1
    return a
Fibonacci Sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...
Let’s JIT it
1) Convert function to machine code at run-time
2) Execute this machine code
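The two steps above fit naturally into a decorator (the AST dump in this talk shows the function carried a `jit` decorator). A minimal sketch, assuming a hypothetical `compile_to_native` backend; the stand-in below just returns the original function so the example runs anywhere, and it is not the implementation from the talk.

```python
import functools


def compile_to_native(func):
    """Hypothetical backend: a real one would emit machine code for `func`.

    This stand-in returns the original function unchanged.
    """
    return func


def jit(func):
    """Compile `func` on its first call, then reuse the compiled version."""
    compiled = None

    @functools.wraps(func)
    def wrapper(*args):
        nonlocal compiled
        if compiled is None:
            try:
                compiled = compile_to_native(func)  # step 1: compile at run time
            except NotImplementedError:
                compiled = func                     # fall back to the interpreter
        return compiled(*args)                      # step 2: execute

    return wrapper


@jit
def add_one(x):
    return x + 1


print(add_one(41))
```

Compiling lazily on the first call means functions that are never called cost nothing, and the fallback keeps unsupported code working.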
Convert function to AST
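import ast
import inspect
lines = inspect.getsource(func)  # source text of the function being compiled
node = ast.parse(lines)          # parse the source into an AST
visitor = Visitor()              # Visitor is the talk's own class (not shown), presumably an ast.NodeVisitor
visitor.visit(node)              # presumably the next step: walk the tree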
FunctionDef(name='fibonacci', args=arguments(args=[Name(id='n', ctx=Param())],
vararg=None, kwarg=None, defaults=[]), body=[
Expr(value=Str(s='Returns n-th Fibonacci number')),
Assign(targets=[Name(id='a', ctx=Store())], value=Num(n=0)),
Assign(targets=[Name(id='b', ctx=Store())], value=Num(n=1)),
If(test=Compare(left=Name(id='n', ctx=Load()), ops=[Lt()], comparators=[Num(n=1)]), body=[
Return(value=Name(id='a', ctx=Load()))
], orelse=[]),
Assign(targets=[Name(id='i', ctx=Store())], value=Num(n=0)),
While(test=Compare(left=Name(id='i', ctx=Load()), ops=[Lt()], comparators=[Name(id='n', ctx=Load())]), body=[
Assign(targets=[Name(id='temp', ctx=Store())], value=Name(id='a', ctx=Load())),
Assign(targets=[Name(id='a', ctx=Store())], value=Name(id='b', ctx=Load())),
Assign(targets=[Name(id='b', ctx=Store())], value=BinOp(
left=Name(id='temp', ctx=Load()), op=Add(), right=Name(id='b', ctx=Load()))),
AugAssign(target=Name(id='i', ctx=Store()), op=Add(), value=Num(n=1))
], orelse=[]),
Return(value=Name(id='a', ctx=Load()))
], decorator_list=[Name(id='jit', ctx=Load())])
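A tree like the one above can be walked with `ast.NodeVisitor`. The sketch below is an illustration under assumptions, not the talk's `Visitor` class: `VariableCollector` is a hypothetical name, it targets Python 3 (where the node names differ slightly from the Python 2 dump above), and it parses a source string instead of calling `inspect.getsource`, which needs a function defined in a file.

```python
import ast
import textwrap

SOURCE = textwrap.dedent('''
    def fibonacci(n):
        a = 0
        b = 1
        if n < 1:
            return a
        i = 0
        while i < n:
            temp = a
            a = b
            b = temp + b
            i += 1
        return a
''')


class VariableCollector(ast.NodeVisitor):
    """Collects argument and local-variable names in order of first appearance."""

    def __init__(self):
        self.arguments = []
        self.local_vars = []

    def visit_FunctionDef(self, node):
        self.arguments = [arg.arg for arg in node.args.args]
        self.generic_visit(node)  # keep walking into the function body

    def visit_Name(self, node):
        is_store = isinstance(node.ctx, ast.Store)
        known = node.id in self.local_vars or node.id in self.arguments
        if is_store and not known:
            self.local_vars.append(node.id)


visitor = VariableCollector()
visitor.visit(ast.parse(SOURCE))
print(visitor.arguments)   # ['n']
print(visitor.local_vars)  # ['a', 'b', 'i', 'temp']
```

Knowing the arguments and locals up front is what makes it possible to give each variable its own register in the next step.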
# for x64 system
args_registers = ['rdi', 'rsi', 'rdx', ...]
registers = ['rax', 'rbx', 'rcx', ...]
# return register: rax
def fibonacci(n):    # n ⇔ rdi
    ...
    return a         # a ⇔ rax
    MOV rax, 0
    MOV rbx, 1
    CMP rdi, 1
    JNL label0
    RET
label0:
    MOV rcx, 0
loop0:
    MOV rdx, rax
    MOV rax, rbx
    ADD rdx, rbx
    MOV rbx, rdx
    INC rcx
    CMP rcx, rdi
    JL loop0
    RET
    MOV <a>, 0
    MOV <b>, 1
    CMP <n>, 1
    JNL label0
    RET
label0:
    MOV <i>, 0
loop0:
    MOV <temp>, <a>
    MOV <a>, <b>
    ADD <temp>, <b>
    MOV <b>, <temp>
    INC <i>
    CMP <i>, <n>
    JL loop0
    RET
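Going from the `<name>` template to real registers is a substitution step. A minimal sketch, assuming a first-come, first-served assignment with no spilling; `assign_registers` is a hypothetical helper, and the register lists are shortened versions of the ones on the slide.

```python
import re

ARGS_REGISTERS = ['rdi', 'rsi', 'rdx']    # argument registers, in call order
REGISTERS = ['rax', 'rbx', 'rcx', 'rdx']  # registers for locals; rax is returned

TEMPLATE = """\
MOV <a>, 0
MOV <b>, 1
CMP <n>, 1
MOV <i>, 0
MOV <temp>, <a>
"""


def assign_registers(template, arguments):
    """Replace <name> placeholders with registers, first come first served."""
    mapping = dict(zip(arguments, ARGS_REGISTERS))
    # Do not hand out a register that already holds an argument.
    free = [reg for reg in REGISTERS if reg not in mapping.values()]
    for name in re.findall(r'<(\w+)>', template):
        if name not in mapping:
            if not free:
                raise RuntimeError('out of registers; spilling is not handled')
            mapping[name] = free.pop(0)
    code = re.sub(r'<(\w+)>', lambda m: mapping[m.group(1)], template)
    return code, mapping


code, mapping = assign_registers(TEMPLATE, ['n'])
print(code)
```

With this ordering `a` is the first local seen, so it lands in `rax`, which is also the return register; a more general allocator would have to guarantee that explicitly.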
ASM to machine code
from pwnlib.asm import asm
code = asm(asm_code, arch='amd64')
Create function in memory
1) Allocate memory
Signatures in C/C++
POSIX:
void *mmap(void *addr, size_t length, int prot, int flags,
           int fd, off_t offset);
int mprotect(void *addr, size_t len, int prot);
void *memcpy(void *dest, const void *src, size_t n);
int munmap(void *addr, size_t length);
Windows:
LPVOID VirtualAlloc(LPVOID lpAddress, SIZE_T dwSize,
                    DWORD flAllocationType, DWORD flProtect);
BOOL VirtualProtect(LPVOID lpAddress, SIZE_T dwSize,
                    DWORD flNewProtect, PDWORD lpflOldProtect);
void *memcpy(void *dest, const void *src, size_t count);
BOOL VirtualFree(LPVOID lpAddress, SIZE_T dwSize, DWORD dwFreeType);
Create function in memory
import ctypes
libc = ctypes.CDLL(None)  # assumption: libc loaded this way (Unix-like); not shown on the slide
mmap_func = libc.mmap
mmap_func.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_size_t]
mmap_func.restype = ctypes.c_void_p
memcpy_func = libc.memcpy
memcpy_func.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
memcpy_func.restype = ctypes.c_void_p  # memcpy returns dest, a plain pointer
Create function in memory
machine_code = '\x48\xc7\xc0\x00\x00\x00\x00\x48\xc7\xc3\x01\x00\x00\x00\x48...'  # truncated on the slide
machine_code_size = len(machine_code)
addr = mmap_func(None, machine_code_size, PROT_READ | PROT_WRITE | PROT_EXEC,
                 ...)  # remaining arguments are cut off on the slide
memcpy_func(addr, machine_code, machine_code_size)
func = ctypes.CFUNCTYPE(ctypes.c_uint64)(addr)
func.argtypes = [ctypes.c_uint32]
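The allocate, copy and call sequence can be tried end to end with a much smaller function. This is a sketch under assumptions, not the talk's code: it targets a Unix-like x86-64 system that allows writable and executable pages, takes the `PROT_*` and `MAP_*` constants from Python's `mmap` module, and uses the hypothetical helpers `encode_return_constant` and `make_native_function`. The encoding `48 c7 c0 imm32` for `MOV rax, imm32` is the same one that opens the machine code above.

```python
import ctypes
import mmap
import struct


def encode_return_constant(value):
    """x86-64 machine code for: MOV rax, imm32 ; RET."""
    return b'\x48\xc7\xc0' + struct.pack('<i', value) + b'\xc3'


def make_native_function(machine_code):
    """Copy machine code into executable memory and wrap it as a callable.

    Assumes a Unix-like x86-64 system; the mapping is never freed here,
    a real implementation would call munmap when the function is dropped.
    """
    libc = ctypes.CDLL(None)
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]
    libc.mmap.restype = ctypes.c_void_p
    addr = libc.mmap(None, len(machine_code),
                     mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if addr is None or addr == ctypes.c_void_p(-1).value:  # MAP_FAILED
        raise OSError('mmap failed')
    ctypes.memmove(addr, machine_code, len(machine_code))
    return ctypes.CFUNCTYPE(ctypes.c_uint64)(addr)
```

On a suitable machine, `make_native_function(encode_return_constant(42))()` should return 42. On hardened systems that forbid pages that are both writable and executable, the usual alternative is to map the memory read-write, copy the code, and then switch it to read-execute with `mprotect`.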
Python 2.7, timing loop: for _ in range(1000000): ...

n     No JIT (s)   JIT (s)
0        0.153      0.882
10       1.001      0.878
20       1.805      0.942
30       2.658      0.955
60       4.800      0.928
90       7.117      0.922
500     50.611      1.251
Python 3.7, timing loop: for _ in range(1000000): ...

n     No JIT (s)   JIT (s)
0        0.150      1.079
10       1.093      0.971
20       2.206      1.135
30       3.313      1.204
60       6.815      1.198
90      10.458      1.270
500     63.949      1.652
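Timings of this kind can be reproduced with the standard `timeit` module. A minimal sketch for the interpreted version only, assuming the tables report the total time of one million calls; `time_calls` is a hypothetical helper, and the `lambda` adds a small constant overhead to every call.

```python
import timeit


def fibonacci(n):
    """Returns n-th Fibonacci number"""
    a = 0
    b = 1
    if n < 1:
        return a
    i = 0
    while i < n:
        temp = a
        a = b
        b = temp + b
        i += 1
    return a


def time_calls(func, n, calls=1000000):
    """Total wall-clock seconds for `calls` invocations of func(n)."""
    return timeit.timeit(lambda: func(n), number=calls)


# A small call count keeps the demonstration quick.
print(time_calls(fibonacci, 30, calls=1000))
```

The JIT column stays nearly flat as `n` grows, which suggests that its time is dominated by the cost of calling through `ctypes`, not by the loop itself.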
Python 2.7 vs 3.7
No JIT: 10.524 s
JIT: 1.185 s
JIT ~8.5 times faster
JIT compilation time: ~0.08 s
No JIT: 7.942 s
JIT: 0.887 s
JIT ~8.5 times faster
JIT compilation time: ~0.07 s
* fibonacci(n=92) = 0x68a3dd8e61eccfbd
fibonacci(n=93) = 0xa94fad42221f2702
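The two values in the footnote sit at the edge of what a 64-bit register can hold, which is presumably why they are listed. A short check in plain Python, whose integers never overflow:

```python
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

for n in (92, 93, 94):
    value = fibonacci(n)
    # Columns: n, value in hex, fits in int64, fits in uint64
    print(n, hex(value), value <= INT64_MAX, value <= UINT64_MAX)
```

`fibonacci(92)` is the last value that fits in a signed 64-bit integer and `fibonacci(93)` the last that fits in an unsigned one; from `n = 94` on, a JIT-compiled version working in 64-bit registers would wrap around silently while the interpreted one keeps growing.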
    0 LOAD_CONST         1 (0)
    3 STORE_FAST         1 (a)
    6 LOAD_CONST         2 (1)
    9 STORE_FAST         2 (b)
   12 LOAD_FAST          0 (n)
   15 LOAD_CONST         2 (1)
   18 COMPARE_OP         0 (<)
   21 POP_JUMP_IF_FALSE 28
   24 LOAD_FAST          1 (a)
   27 RETURN_VALUE
>> 28 LOAD_CONST         1 (0)
   31 STORE_FAST         3 (i)
   34 SETUP_LOOP        48 (to 85)
>> 37 LOAD_FAST          3 (i)
   40 LOAD_FAST          0 (n)
   43 COMPARE_OP         0 (<)
   46 POP_JUMP_IF_FALSE 84
   49 LOAD_FAST          1 (a)
   52 STORE_FAST         4 (temp)
   55 LOAD_FAST          2 (b)
   58 STORE_FAST         1 (a)
   61 LOAD_FAST          4 (temp)
   64 LOAD_FAST          2 (b)
   67 BINARY_ADD
   68 STORE_FAST         2 (b)
   71 LOAD_FAST          3 (i)
   74 LOAD_CONST         2 (1)
   77 INPLACE_ADD
   78 STORE_FAST         3 (i)
   81 JUMP_ABSOLUTE     37
>> 84 POP_BLOCK
>> 85 LOAD_FAST          1 (a)
   88 RETURN_VALUE
    MOV rax, 0
    MOV rbx, 1
    CMP rdi, 1
    JNL label0
    RET
label0:
    MOV rcx, 0
loop0:
    MOV rdx, rax
    MOV rax, rbx
    ADD rdx, rbx
    MOV rbx, rdx
    INC rcx
    CMP rcx, rdi
    JL loop0
    RET
33 (VM opcodes)
14 (real machine instructions)
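The opcode count can be checked on any interpreter with the `dis` module. A small sketch for Python 3; the exact number depends on the CPython version, so it will not necessarily be the 33 from the Python 2.7 listing.

```python
import dis


def fibonacci(n):
    """Returns n-th Fibonacci number"""
    a = 0
    b = 1
    if n < 1:
        return a
    i = 0
    while i < n:
        temp = a
        a = b
        b = temp + b
        i += 1
    return a


instructions = list(dis.get_instructions(fibonacci))
for instruction in instructions:
    print(instruction.offset, instruction.opname, instruction.argrepr)
print(len(instructions), "VM opcodes")
```

Each of these opcodes is dispatched by the interpreter loop and operates on boxed Python objects, so the gap to native code is larger than the raw instruction counts alone suggest.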
Numba makes Python code fast
Numba is an open source JIT compiler that translates a subset of Python and
NumPy code into fast machine code
- Parallelization
- SIMD Vectorization
- GPU Acceleration
from numba import jit
import numpy as np

@jit(nopython=True)  # Set "nopython" mode for best performance, equivalent to @njit
def go_fast(a):  # Function is compiled to machine code when called the first time
    trace = 0
    for i in range(a.shape[0]):  # Numba likes loops
        trace += np.tanh(a[i, i])  # Numba likes NumPy functions
    return a + trace  # Numba likes NumPy broadcasting
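from numba import cuda

@cuda.jit  # compile the function as a CUDA kernel
def matmul(A, B, C):
    """Perform square matrix multiplication of C = A * B"""
    i, j = cuda.grid(2)  # absolute position of this thread in the 2D grid
    if i < C.shape[0] and j < C.shape[1]:
        tmp = 0.
        for k in range(A.shape[1]):
            tmp += A[i, k] * B[k, j]
        C[i, j] = tmp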
LLVM — compiler infrastructure project
Tutorial “Building a JIT: Starting out with KaleidoscopeJIT”
LLVMPy — Python bindings for LLVM
LLVMLite project by Numba team — lightweight LLVM Python binding for writing
JIT compilers
PeachPy: x86-64 assembler embedded in Python
Portable Efficient Assembly Code-generator in Higher-level Python
from peachpy.x86_64 import *
ADD(eax, 5).encode()
# bytearray(b'\x83\xc0\x05')
MOVAPS(xmm0, xmm1).encode_options()
# [bytearray(b'\x0f(\xc1'), bytearray(b'\x0f)\xc8')]
VPSLLVD(ymm0, ymm1, [rsi + 8]).encode_length_options()
# {6: bytearray(b'\xc4\xe2uGF\x08'),
#  7: bytearray(b'\xc4\xe2uGD&\x08'),
#  9: bytearray(b'\xc4\xe2uG\x86\x08\x00\x00\x00')}
PyPy is a fast, compliant alternative implementation of the Python language
Python programs often run faster on PyPy thanks to its Just-in-Time compiler
PyPy works best when executing long-running programs where a significant
fraction of the time is spent executing Python code
“If you want your code to run faster, you should probably just use PyPy”
— Guido van Rossum (creator of Python)
Other projects
Pyjion — A JIT for Python based upon CoreCLR
Pyston — built using LLVM and modern JIT techniques
Psyco — extension module which can greatly speed up the execution of code
The first just-in-time compiler for Python, now unmaintained and dead
Unladen Swallow — was an attempt to make LLVM be a JIT compiler for CPython
2. John Aycock: A Brief History of Just-In-Time. ACM Computing Surveys (CSUR), volume 35, issue 2, pages 97-113, June 2003, DOI: 10.1145/857076.857077
Thank you

