Optimizing tail recursion in Python using bytecode
(all errors are my own)
Python has a limit on recursion depth:
def factorial(n, accum):
    if n <= 1:
        return accum
    return factorial(n-1, accum*n)
RuntimeError: maximum recursion depth exceeded
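A minimal sketch of how the limit shows up in practice. The helper name `exceeds_limit` is made up for illustration; on Python 3.5+ the exception raised is `RecursionError`, which is a subclass of `RuntimeError`, so catching `RuntimeError` covers both.

```python
import sys

def factorial(n, accum):
    if n <= 1:
        return accum
    return factorial(n - 1, accum * n)

def exceeds_limit(depth):
    """Return True if recursing `depth` levels hits the interpreter limit."""
    try:
        factorial(depth, 1)
    except RuntimeError:  # RecursionError (3.5+) is a subclass of RuntimeError
        return True
    return False

limit = sys.getrecursionlimit()      # commonly 1000, but configurable
print(exceeds_limit(10))             # shallow recursion is fine
print(exceeds_limit(limit + 100))    # deeper than the limit raises
```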
• Optimize recursive function calls so that they
don’t create new frames, thus avoiding stack
overflow.
• What we want: eliminate the recursive call;
instead, reset the variables and jump to the
beginning of the function.
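Written by hand, the target of that transformation looks roughly like the loop below. This is only a source-level illustration of "reset the variables and jump to the beginning"; the name `factorial_loop` is hypothetical, and the talk does this at the bytecode level rather than by rewriting the source.

```python
def factorial_loop(n, accum):
    while True:  # the top of the loop plays the role of "beginning of the function"
        if n <= 1:
            return accum
        # Instead of calling factorial(n-1, accum*n):
        # rebind the arguments and jump back to the top.
        n, accum = n - 1, accum * n

print(factorial_loop(5, 1))  # → 120
```

Because no new frame is created per step, the depth limit no longer applies: `factorial_loop(5000, 1)` returns normally.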
How do you change the insides of a function?
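new_bytecode = ''
for byte in f.__code__.co_code:
    if instruction[byte] == 'LOAD_GLOBAL f':
        pass  # get rid of this instruction
    elif instruction[byte] == 'CALL_FUNCTION':
        # replace it with resetting variables and jumping to 0
        for arg in reversed(args):  # last argument is on top of the stack
            new_bytecode += 'STORE_FAST arg'
        new_bytecode += 'JUMP_ABSOLUTE 0'
    else:  # regular byte
        new_bytecode += byte
f.__code__.co_code = new_bytecode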
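One practical detail the pseudocode glosses over: in CPython, `co_code` is read-only, so the last assignment cannot be done literally. A sketch of the mechanics, assuming Python 3.8+ (where code objects have a `replace` method): build a new code object and attach it to the function. The "rewrite" here is deliberately the identity (the same bytes), since real opcode rewriting depends heavily on the interpreter version.

```python
def factorial(n, accum):
    if n <= 1:
        return accum
    return factorial(n - 1, accum * n)

# Code objects are immutable: direct assignment to co_code fails.
try:
    factorial.__code__.co_code = b""
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# Instead, build a new code object and swap it into the function.
# Identity rewrite: a real optimizer would produce different bytes here.
rewritten = bytes(factorial.__code__.co_code)
factorial.__code__ = factorial.__code__.replace(co_code=rewritten)

print(assignment_failed)   # direct assignment was rejected
print(factorial(5, 1))     # the function still works after the swap
```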
Not only does it work, it works FASTER
than the original function:
• Timed 10000 calls to fact(450).
Original fact: 1.7009999752
Optimized fact: 1.6970000267
• And faster than other ways of optimizing this.
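A measurement along these lines could be set up with `timeit`. This is a sketch only: `fact_loop` is a hand-written loop standing in for the optimized function (not the bytecode-rewritten one from the talk), `time_calls` is a hypothetical helper, and the numbers it produces depend on the machine and interpreter.

```python
import timeit

def fact_recursive(n, accum=1):
    if n <= 1:
        return accum
    return fact_recursive(n - 1, accum * n)

def fact_loop(n, accum=1):
    # Stand-in for the optimized version: same arithmetic, no new frames.
    while n > 1:
        n, accum = n - 1, accum * n
    return accum

def time_calls(func, n=450, calls=10000):
    """Total time in seconds for `calls` invocations of func(n)."""
    return timeit.timeit(lambda: func(n), number=calls)

# Fewer calls than the 10000 above, to keep the demo quick.
print("Recursive:", time_calls(fact_recursive, calls=200))
print("Loop:     ", time_calls(fact_loop, calls=200))
```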
If our function calls another function…
def sq(x): return x*x
def sum_squares(n, accum):
    if n < 1:
        return accum
    return sum_squares(n-1, accum+sq(n))
• Our initial algorithm was removing all calls to a
function, not only the recursive calls, so this would
break.
How do you battle this?
• We need to keep track of function calls and
remove only the recursive calls.
• Unfortunately, bytecode doesn’t know which
function it’s calling: it just calls whatever is on
the stack:
29 CALL_FUNCTION 2
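This can be checked with the standard `dis` module. The sketch below defines the functions through `exec` so that they are module-level in their own namespace (an assumption made here so the names are loaded as globals), then lists what the load and call instructions actually carry. Opcode names vary by CPython version: the call opcode is `CALL_FUNCTION` on older releases and `CALL` on 3.11+.

```python
import dis

SOURCE = """
def sq(x):
    return x * x

def sum_squares(n, accum):
    if n < 1:
        return accum
    return sum_squares(n - 1, accum + sq(n))
"""
namespace = {}
exec(SOURCE, namespace)  # module-level definitions, so names load as globals
sum_squares = namespace["sum_squares"]

instructions = list(dis.get_instructions(sum_squares))

# Function names appear only on the LOAD_GLOBAL instructions...
loaded_globals = [ins.argval for ins in instructions
                  if ins.opname == "LOAD_GLOBAL"]

# ...while each call instruction carries just an argument count.
call_args = [ins.argval for ins in instructions
             if ins.opname.startswith("CALL")]

print(loaded_globals)
print(call_args)
```

Both calls look alike apart from the argument count, which is why the rewriter has to work out on its own which call belongs to which loaded function.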
So we just need to keep track of the stack
• When we hit ‘LOAD_GLOBAL self’, we start
keeping track of the stack size (stack_size = 0).
• Now, with every byte, we update the stack size.
• Once we hit stack_size = 0, it means this byte
was the recursive call, so we remove it.
• It allows us to not get rid of calls to other
functions (e.g., identity).
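The stack-tracking idea can be sketched on a simplified instruction stream. Everything below is illustrative: `INSTRUCTIONS` is a hand-written listing for `return sum_squares(n-1, accum+sq(n))` modeled on the older opcodes shown in these slides, and `stack_effect` is a tiny hand-made table covering only those opcodes, not CPython's real one.

```python
# Hand-written, simplified instruction stream (an assumption, not real dis output).
INSTRUCTIONS = [
    ("LOAD_GLOBAL", "sum_squares"),
    ("LOAD_FAST", "n"),
    ("LOAD_CONST", 1),
    ("BINARY_SUBTRACT", None),
    ("LOAD_FAST", "accum"),
    ("LOAD_GLOBAL", "sq"),
    ("LOAD_FAST", "n"),
    ("CALL_FUNCTION", 1),
    ("BINARY_ADD", None),
    ("CALL_FUNCTION", 2),
    ("RETURN_VALUE", None),
]

def stack_effect(opname, arg):
    """Net change in stack depth for the few opcodes used above."""
    if opname.startswith("LOAD_"):
        return 1       # pushes one value
    if opname.startswith("BINARY_"):
        return -1      # pops two operands, pushes one result
    if opname == "CALL_FUNCTION":
        return -arg    # pops `arg` arguments and the callable, pushes the result
    raise ValueError("unhandled opcode: " + opname)

def find_recursive_calls(instructions, self_name):
    """Return indices of the CALL_FUNCTION instructions that call self_name."""
    found = []
    depth = None  # None: not currently inside a recursive call expression
    for index, (opname, arg) in enumerate(instructions):
        if depth is None:
            if (opname, arg) == ("LOAD_GLOBAL", self_name):
                depth = 0  # start tracking: stack_size = 0
            continue
        depth += stack_effect(opname, arg)
        if opname == "CALL_FUNCTION" and depth == 0:
            found.append(index)  # back at 0: this is the recursive call
            depth = None
    return found

print(find_recursive_calls(INSTRUCTIONS, "sum_squares"))  # → [9]
```

The inner `CALL_FUNCTION 1` (the call to `sq`) leaves the tracked size at 3, so it is skipped; only the outer `CALL_FUNCTION 2` brings it back to 0. The sketch does not handle a recursive call nested inside the arguments of another recursive call.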
• Make it harder to break.
• Translate “normal” (non-tail) recursion into
tail-recursion (possibly with ASTs)
• Handle mutual recursion
…And some crazy ideas: