1. Chapter 7
Recursion
Recursive definitions
and recursive functions
When we define something in terms that refer back to the thing we are
defining, we have what is called a recursive definition.
Recursion: See Recursion.
Not all examples are meaningless like this. Those that are useful consist of
base cases and recursive cases.
2. Base case: 0! = 1
Recursive case: n! = n * (n - 1)!
Base cases: F(0) = 0, F(1) = 1
Recursive case: F(n) = F(n - 1) + F(n - 2)
The factorial of a number n, or the Fibonacci numbers, are classically
defined recursively like this. In the base cases we know the value of the
function immediately, and in the recursive cases we have to do some
computation that involves the function on another input. That "other
input" has to get us closer to a base case, or we will never get an answer,
similar to how an infinite loop never terminates.
Determining whether a recursive function has a value is as hard as
determining whether an arbitrary algorithm terminates (the halting
problem), so no tool can check this for us; we need to guarantee it
ourselves.
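To illustrate with a minimal example of my own (not from the slides): a recursive call must move toward a base case, and when it does not, Python eventually aborts the call chain with a RecursionError.

```python
def countdown(n):
    if n == 0:                 # base case: value known immediately
        return "done"
    return countdown(n - 1)    # recursive case: moves closer to the base case

def runaway(n):
    return runaway(n + 1)      # never approaches a base case

print(countdown(3))            # reaches the base case and terminates
try:
    runaway(0)
except RecursionError:
    print("never reached a base case; Python gave up")
```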
[Slide: the base cases and recursive cases, translated into Python functions]
If we have a recursive definition, then translating that into a Python function
that calculates a value based on it is usually straightforward. Recursive
functions work just as you would expect from recursive definitions, and we
won’t talk more about how to write recursive functions in this lecture. We
will, instead, see how function calls are generally implemented on
computers and how this makes recursive functions possible.
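As a sketch of how direct that translation is, here are the two classic definitions as Python functions (assuming the usual conventions 0! = 1 and F(0) = 0, F(1) = 1):

```python
def factorial(n):
    if n == 0:                       # base case
        return 1
    return n * factorial(n - 1)      # recursive case: smaller input

def fib(n):
    if n < 2:                        # base cases: fib(0) = 0, fib(1) = 1
        return n
    return fib(n - 1) + fib(n - 2)   # recursive case
```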
3. When we call a recursive function, we start a chain of function calls that
(hopefully) eventually hits a base case where we can get a value. Then,
returning from the recursion, we combine the result of each call with the
values in the instance that made it. We return from the function calls in
the reverse order from the one in which we made them.
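A small instrumented example (mine, not the lecture's) makes this order visible: calls are printed on the way down, and returns are printed in the reverse order on the way back up.

```python
def fact(n, depth=0):
    print("  " * depth + f"calling fact({n})")
    if n == 0:
        result = 1                           # base case
    else:
        result = n * fact(n - 1, depth + 1)  # combine on the way back up
    print("  " * depth + f"fact({n}) returns {result}")
    return result

fact(3)
```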
Function calls, call
stacks, and call stack
frames
If you have, so far, considered Python functions as not much different from
mathematical functions (just mappings from input to output, except that
they can have side effects), that is because this is a useful abstraction. It
is a useful way to think about functions, and you rarely need to think more
deeply about them. Still, you will learn something about computers if you
try to understand how functions and function calls are implemented on a
machine.
4. When you run a computer program, you essentially have a sequence of
simple instructions and the CPU can execute them and, when the
instructions say so, jump to another point in the instructions. When you call
a function, you need to provide it with arguments and reserve space for its
local variables. If you want to return from the function again, and the
function can be called from more than one location, you need to remember
that as well.
Caveat: Python runs on a virtual machine, so the reality is a little different
from what is explained here, but the concepts are the same.
If we have a recursive function, we will have several simultaneous instances
of calls to the same function, so we cannot reserve space for the local
variables and the return location once and for all. We have to do this
dynamically, as needed. (This isn't only the case for recursive functions, but
for any function that might, through some chain of function calls, be called
again before its first call returns.)
5. The way computers solve this issue is to keep a stack of function calls, or
call frames as they are sometimes called. When you call a function, you
push a return location, space for local variables, and the function arguments
onto the top of the stack. When you then execute the code for the function,
you access local variables relative to the stack pointer. If the same function
is called more than once, the active call is the one at the top of the stack.
When you return from a function call, you remove the call frame from the top
of the stack and continue executing from the location you stored in the
frame.
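To make the idea concrete, here is a sketch that simulates the call stack with an explicit Python list. Each pushed value stands in for one call frame; a real frame would also hold a return location and local variables.

```python
def factorial_explicit_stack(n):
    stack = []
    while n > 0:          # the "calling" phase: push one frame per call
        stack.append(n)
        n -= 1
    result = 1            # base case reached: 0! = 1
    while stack:          # the "returning" phase: pop frames in reverse order
        result *= stack.pop()
    return result
```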
6. Tail-recursion
A special class of recursive functions is the tail-recursive ones. These are
functions where the recursive call is the very last thing the function does: it
doesn't do anything with the result of the recursive call except return it.
In a chain of tail-recursive calls, all except the first function call will have the
same return location. Since we do not modify any local variables after the
recursive calls, variables in the frames on the stack will not be modified ever
again before they are popped off the stack.
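Euclid's algorithm for the greatest common divisor is a natural example of a tail-recursive function (my example, not the slide's): the recursive call's result is returned directly, with nothing combined afterwards.

```python
def gcd(a, b):
    if b == 0:             # base case
        return a
    return gcd(b, a % b)   # tail call: the result is returned unchanged
```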
7. In such situations it is possible to optimize away the actual recursion by
reusing the call frame of the first function call. Many programming
languages implement this optimization. Python isn't as dependent on
functions and recursion as some other languages, and the optimization is
not implemented here. We can always implement it ourselves, though.
It sometimes requires a little rewriting to make a function tail-recursive. If
there is no more than one recursive call per function, it is relatively easy to
do this. If not, it can still be done but it is substantially harder—and not
worth the effort in Python.
Sometimes it is easier to write a recursive function than an iterative function;
if you do this, it is good to know that you can optimize it as well.
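A sketch of doing the optimization ourselves, using factorial: first rewrite it to be tail-recursive with an accumulator parameter (my addition, to move the multiplication before the recursive call), then replace the tail call with a parameter update inside a loop.

```python
def factorial_tail(n, acc=1):
    if n == 0:
        return acc
    return factorial_tail(n - 1, acc * n)   # tail call

def factorial_loop(n, acc=1):
    # The hand-made tail-call optimization: reuse the "frame" by
    # updating the parameters and looping instead of calling.
    while n > 0:
        n, acc = n - 1, acc * n
    return acc
```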
8. Often the recursive function is simpler to understand than the iterative
one, but since the transformation is always the same, you soon start to
recognize the pattern. If you can write an iterative function straight away,
then don't bother going through recursion, but often it is easier to think up
a recursion first and then do the transformation. The divide-and-conquer
algorithms do this. On this slide, I have written a recursive selection sort
and the transformation of it. You can compare it with the iterative selection
sort from Chapter 5.
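The slide's code is not reproduced in these notes, but a recursive selection sort and its loop transformation might look like this (a sketch; the details may differ from the slide's version):

```python
def selection_sort_rec(xs, i=0):
    if i >= len(xs) - 1:                    # base case: nothing left to place
        return xs
    m = min(range(i, len(xs)), key=lambda j: xs[j])
    xs[i], xs[m] = xs[m], xs[i]             # put the smallest remaining first
    return selection_sort_rec(xs, i + 1)    # tail call on a smaller problem

def selection_sort_iter(xs):
    i = 0
    while i < len(xs) - 1:                  # the tail call became a loop
        m = min(range(i, len(xs)), key=lambda j: xs[j])
        xs[i], xs[m] = xs[m], xs[i]
        i += 1
    return xs
```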
A very simple divide-and-conquer algorithm is quick sort. It is defined
recursively as shown here, and you will agree that it is a very simple
function. (The partition function is a bit more involved, but you get to
implement it yourself.)
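A sketch of the recursive quick sort; the partition shown here (Lomuto style) is just one possible implementation, since the lecture leaves it for you to write:

```python
def partition(xs, lo, hi):
    # One possible partition: pivot is xs[hi]; after the loop, every
    # element left of index i is smaller than the pivot.
    pivot = xs[hi]
    i = lo
    for j in range(lo, hi):
        if xs[j] < pivot:
            xs[i], xs[j] = xs[j], xs[i]
            i += 1
    xs[i], xs[hi] = xs[hi], xs[i]   # put the pivot in its final position
    return i

def quick_sort(xs, lo=0, hi=None):
    if hi is None:
        hi = len(xs) - 1
    if lo < hi:
        p = partition(xs, lo, hi)
        quick_sort(xs, lo, p - 1)   # sort left of the pivot
        quick_sort(xs, p + 1, hi)   # sort right of the pivot
    return xs
```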
9. We cannot get rid of both recursive calls in the tail-recursion
transformation, but we can get rid of one of them. If we want to keep the
stack depth small (it is a limited resource), we can guarantee that it grows
only logarithmically in the input size by recursing only on the smaller
sub-problem (can you see why the depth is logarithmic then?)
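A sketch of that variant, assuming a Lomuto-style partition as the helper: loop on the larger part and recurse only on the smaller one, so each recursive call covers at most half the current range.

```python
def partition(xs, lo, hi):
    # Lomuto-style partition (one possible choice): pivot is xs[hi].
    pivot, i = xs[hi], lo
    for j in range(lo, hi):
        if xs[j] < pivot:
            xs[i], xs[j] = xs[j], xs[i]
            i += 1
    xs[i], xs[hi] = xs[hi], xs[i]
    return i

def quick_sort(xs, lo=0, hi=None):
    if hi is None:
        hi = len(xs) - 1
    while lo < hi:
        p = partition(xs, lo, hi)
        if p - lo < hi - p:             # recurse on the smaller part...
            quick_sort(xs, lo, p - 1)
            lo = p + 1                  # ...and loop on the larger part
        else:
            quick_sort(xs, p + 1, hi)
            hi = p - 1
    return xs
```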
Exercises!
Time to put recursion-fu
into practice