ARM procedure calling conventions and recursion
1. Professor David Brailsford
(dfb@cs.nott.ac.uk)
School of Computer Science
University of Nottingham
Extra material courtesy of:
Jaume Bacardit, Thorsten Altenkirch and Liyang Hu — School of CS, Univ. of Nottm.
Steve Furber, Jim Garside and Pete Jinks — School of CS, Univ. of Manchester
Lecture 08: ARM Procedure Calling Conventions and Recursion
Page 1
© David Brailsford 2011
2. What is a procedure?
◆ A portion of code within a larger program. Often called
a subroutine or procedure in imperative languages like C
methods in OO languages like Java
and functions in functional languages like Haskell
◆ Functions return a value. So some purists would say that a C
function returning void is actually a procedure !
◆ Procedures are necessary for:
reducing duplication of code and enabling re-use
decomposing complex programs into manageable parts
◆ Procedures can call each other and can even call themselves
◆ What happens when we call a procedure?
The caller is suspended; control hands over to the callee
Callee performs the requested task
Callee returns control to the caller
3. An Example in C
#include <math.h>
#include <stdio.h>
int f (int x, int y) {
    return sqrt(x * x + y * y);
}
int main ( ) {
    printf ("f(5,12) = %d\n", f(5, 12));
}
◆ main calls function f: control jumps to a new piece of code but keeps
track of where we were before
◆ When f finishes, control returns to the next instruction of the
original code
4. Basic procedure calls on ARM
◆ We already know that the BL instruction uses R14 as the link register (LR)
This is where it stores the return address
◆ So in simple cases, at the end of the procedure, all we need to do
is MOV PC, LR
◆ In simple cases routines may be able to do their job solely with registers
◆ We’ve seen some simple examples of this with the strcpy and strchr
procedures in courseworks.
◆ But we need conventions for register usage to avoid over-writing
and misunderstandings
◆ Thus we have the APCS (ARM Procedure Call Standard) to guide us
5. APCS Register Use Convention
Register  APCS name  APCS role
0         a1         Argument 1 / integer result / scratch register
1         a2         Argument 2 / scratch register
2         a3         Argument 3 / scratch register
3         a4         Argument 4 / scratch register
4         v1         Register variable 1
5         v2         Register variable 2
6         v3         Register variable 3
7         v4         Register variable 4
8         v5         Register variable 5
9         sb/v6      Static base / Register variable 6
10        sl/v7      Stack limit / Register variable 7
11        fp         Frame pointer
12        ip         Scratch register / specialist use by linker
13        sp         Lower end of current stack frame
14        lr         Link address / scratch register
15        pc         Program counter
6. Caller Saved Registers
◆ R0–R3 used to pass arguments into a function
◆ But inside the function they may be used for any purpose (they are
scratch registers). R0 often delivers back the result
◆ Caller must expect R0–R3 contents to be trashed (i.e. over-written)
when a function call returns.
◆ If caller doesn’t want this to happen then it must save R0–R3
contents beforehand (typically in memory).
◆ A typical simple leaf function e.g. strlen (i.e. one
which does not call any other function), provided it uses only
R0–R3, only needs BL to jump in and MOV PC, LR to return
7. Callee Saved Registers
◆ R4–R8 (R4–R10 in some variants of APCS) are registers whose contents
any called function is required to preserve.
◆ Therefore they must have unchanged values when control returns to
the calling routine (e.g. the main program)
◆ So if the called function needs these registers for extra workspace
then it must save them (hence: callee saved)
◆ Of course, if they have been saved then they must be restored
before returning to the caller.
◆ Registers are limited in number. Memory has much larger capacity
◆ We need a disciplined way to save stuff in memory. Best solution
is a stack
8. The Stack Concept
◆ A stack provides last in, first out storage
◆ It is a most important data structure in Computer Science
◆ Placing words on the stack is termed pushing
◆ Taking words off the stack is called popping
9. Stack Implementation Choices
◆ Do we grow the stack downwards (descending addresses) or
upwards (ascending addresses) in memory?
◆ We need a stack pointer register (SP) to hold address
of the top of stack (this SP is R13 on the ARM)
◆ But should R13 point to topmost filled location (stack full)
◆ Or should it point to next empty location just beyond top of stack
(stack empty)
◆ No single ‘right answer’. But ARM like many other systems
uses a “full descending” approach
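The "full descending" choice can be sketched in C. This is only a model under stated assumptions: the array memory[] stands in for RAM, sp is an array index rather than a byte address, and the names FdStack, fd_init, fd_push and fd_pop are invented for illustration. "Full" means sp always indexes the topmost filled word; "descending" means a push moves sp towards lower addresses.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_WORDS 16

/* Hypothetical model: memory[] stands in for RAM, sp is an index into it. */
typedef struct {
    uint32_t memory[STACK_WORDS];
    size_t   sp;    /* index of the topmost filled word ("full" stack) */
} FdStack;

/* An empty stack has sp just above the highest usable word. */
static void fd_init(FdStack *s) { s->sp = STACK_WORDS; }

/* Push: decrement first, then store (grows towards lower addresses). */
static int fd_push(FdStack *s, uint32_t value) {
    if (s->sp == 0) return -1;              /* no room left */
    s->sp -= 1;
    s->memory[s->sp] = value;
    return 0;
}

/* Pop: load the topmost filled word, then increment. */
static int fd_pop(FdStack *s, uint32_t *out) {
    if (s->sp == STACK_WORDS) return -1;    /* nothing to pop */
    *out = s->memory[s->sp];
    s->sp += 1;
    return 0;
}
```

In an "empty" scheme the store would come before the decrement instead, because sp would point at the next free word.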
10. Standard ARM C address space
◆ ARM C compilers generally arrange the memory address space
as follows (boundary markers flush left, regions indented):
top of memory
    stack
stack pointer (sp)
stack limit (sl)
    unused
top of heap
    heap
top of application
    static data
static base (sb)
    application's image code
application base address
11. Multiple Loads and Stores
◆ If we want to store register values on the stack in memory it’s good
to do this en bloc
◆ This is much more efficient than lots of individual STR and LDR
instructions
◆ ARM supplies Load and Store Multiple instructions (LDM and STM)
for just this purpose
◆ Just like the pre-index modes for single LDR/STR instructions we can
use a base register as the indexer — with an option for write-back
◆ In a stack-based discipline we use SP (R13) as the memory indexer
◆ ARM assemblers support a range of suffixes for different stack regimes
◆ But the APCS uses ‘full descending’ STMFD and LDMFD options
12. Addressing modes and stack suffix options
◆ There are four addressing modes for multiple load/store instructions
IA: Increment After
IB: Increment Before
DA: Decrement After
DB: Decrement Before
◆ Stack Orientated Suffixes
Stack type         Push             Pop
Full descending    STMFD (STMDB)    LDMFD (LDMIA)
Full ascending     STMFA (STMIB)    LDMFA (LDMDA)
Empty descending   STMED (STMDA)    LDMED (LDMIB)
Empty ascending    STMEA (STMIA)    LDMEA (LDMDB)
◆ Diagrams (a) to (d): words touched by LDMxx R10, {R0, R1, R4} or
STMxx R10, {R0, R1, R4}, shown as offsets from the address in the base register
Address    (a) IA   (b) IB   (c) DA   (d) DB
base+12             R4                          High addresses
base+8     R4       R1
base+4     R1       R0
base       R0                R4
base-4                       R1       R4
base-8                       R0       R1
base-12                               R0        Low addresses
◆ We need only the first line of the table above (and diagrams (a) and (d))
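The four addressing modes differ only in which block of words they touch and in what gets written back to the base register. A small C sketch of that arithmetic, assuming byte addresses and 4-byte words; the enum and function names are made up for illustration:

```c
#include <stdint.h>

typedef enum { MODE_IA, MODE_IB, MODE_DA, MODE_DB } LdmMode;

/* Lowest byte address touched when `count` registers are transferred. */
uint32_t ldm_lowest_address(LdmMode mode, uint32_t base, uint32_t count) {
    switch (mode) {
    case MODE_IA: return base;                  /* base .. base+4*(count-1) */
    case MODE_IB: return base + 4;              /* base+4 .. base+4*count   */
    case MODE_DA: return base - 4 * count + 4;  /* ends at base             */
    case MODE_DB: return base - 4 * count;      /* ends at base-4           */
    }
    return base;
}

/* Value left in the base register when write-back is requested. */
uint32_t ldm_writeback(LdmMode mode, uint32_t base, uint32_t count) {
    if (mode == MODE_IA || mode == MODE_IB) {
        return base + 4 * count;
    }
    return base - 4 * count;
}
```

With a base of 0x1000 and three registers, a DB transfer (the STMFD push) touches 0xFF4 to 0xFFC and writes back 0xFF4, so the base register ends up pointing at the topmost filled word, as a full descending stack requires.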
13. Multiple Loads and Stores — Details
◆ In the Full Descending scheme a multiple store (STMFD) corresponds
to pushing register contents onto the stack
◆ Conversely a multiple load (LDMFD) corresponds to a pop from the stack
◆ These operations could use the mnemonics STMDB and LDMIA if preferred
◆ Let’s assume we want to retrieve data from the stack into registers
◆ Consider LDMFD SP, {R0-R3}. Here the SP holds the base address
◆ The overall effect is equivalent to:
LDR R0, [SP]
LDR R1, [SP, #4]
LDR R2, [SP, #8]
LDR R3, [SP, #12]
◆ But notice, in the above sequence, that SP itself has not been changed
◆ If we want SP to be altered (and we usually will) we write
LDMFD SP!, {R0-R3}
14. Stack Frames and Link Registers — Details
◆ Data stored on the stack as part of a function call forms part of
the stack frame for that function invocation.
◆ A stack frame can have stored register values, and also allocated
space for local variables declared within the function
◆ The stack frame also stores ‘housekeeping’ information e.g. the
current value of the LR. (We’ll see why shortly)
◆ When a procedure is exited and we return to the caller of the function,
then the whole stack frame content must be popped.
◆ This is why local variables vanish once a function is exited
◆ When doing Load/Store Multiple we generally give a list of registers in
curly braces e.g. LDMFD SP, {R1-R4, LR}
◆ Remember: lowest-address item goes to the lowest numbered register
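The ordering rule can be modelled in C by treating the register list as a 16-bit mask, so the order in which registers are written inside the braces cannot matter. This is a sketch under assumptions: mem[] is word-indexed memory starting at the base address, and ldm_ia is a hypothetical helper, not a real API.

```c
#include <stdint.h>

/* Hypothetical model of an increment-after load multiple.
   Bit r of reglist set means register Rr is in the list.
   Returns the number of words transferred. */
int ldm_ia(const uint32_t *mem, uint16_t reglist, uint32_t regs[16]) {
    int word = 0;
    for (int r = 0; r < 16; r++) {          /* always lowest register first */
        if (reglist & (1u << r)) {
            regs[r] = mem[word];            /* lowest address, lowest register */
            word += 1;
        }
    }
    return word;
}
```

For the list {R1-R4, LR} the mask has bits 1, 2, 3, 4 and 14 set, so R1 receives the word at the base address and LR receives the fifth word.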
15. Storing the Link Register — Details
◆ Recall: if we are in a leaf function (which doesn’t call anything else)
we don’t need to store the LR. But in all other cases we do! Why?
main            func1                       func2
...             STMFD sp!, {regs, lr}       ...
...             ...                         ...
BL func1        BL func2                    ...
...             ...                         ...
                LDMFD sp!, {regs, pc}       MOV pc, lr
◆ The BL func1 in main stores the return address in LR (R14)
◆ But then the BL func2 inside func1 overwrites it
◆ So func2 returns to func1 OK, but if func1 then tried to return to
main using MOV PC, LR the LR would be wrong: it would still hold the
return address inside func1
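The problem can be mimicked in C by modelling LR as a single variable that every BL overwrites. Everything here is an assumption made for illustration: return addresses are just the labels RETURN_TO_MAIN and RETURN_TO_FUNC1, and each function returns the address it would jump to on exit.

```c
#include <stdint.h>

enum { RETURN_TO_MAIN = 100, RETURN_TO_FUNC1 = 200 };

static uint32_t lr;               /* models R14, the link register */
static uint32_t stack[8];
static int sp = 8;                /* full descending, as an array index */

/* BL: store the return address in LR (the branch itself is not modelled). */
static void bl(uint32_t return_address) { lr = return_address; }

/* func1 that does not save LR. Returns the address it jumps to on exit. */
uint32_t func1_unsafe(void) {
    bl(RETURN_TO_FUNC1);          /* BL func2 overwrites LR */
    /* func2 runs and comes back here via MOV pc, lr */
    return lr;                    /* MOV pc, lr: still points into func1 */
}

/* func1 that stacks LR on entry and pops it straight into PC on exit. */
uint32_t func1_safe(void) {
    stack[--sp] = lr;             /* STMFD sp!, {lr} */
    bl(RETURN_TO_FUNC1);          /* BL func2 overwrites LR */
    return stack[sp++];           /* LDMFD sp!, {pc} */
}

/* main: BL func1, then report where func1 ended up returning to. */
uint32_t call_from_main(uint32_t (*func1)(void)) {
    bl(RETURN_TO_MAIN);
    return func1();
}
```

call_from_main(func1_unsafe) yields RETURN_TO_FUNC1, which is the wrong destination, whereas call_from_main(func1_safe) yields RETURN_TO_MAIN.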
16. Storing the Link Register — More Details
◆ We definitely need to stack the LR value for all non-leaf functions !
◆ Note the stack frame push and pop instructions at start and end of func1
◆ Note how the LDMFD asks that the stored LR value be put back into PC
◆ This causes instantaneous return to main. Cute !
◆ This kind of trick can be used for ‘tail continued’ functions
◆ However, we usually have some ‘clearing up’ to do before we can return
◆ Let’s look at a real example of the situation in the previous diagram
◆ We’ll use strchr (see later slide) as our ‘leaf function’
◆ This program is a pin-number generator using a character as the ‘seed’
17. The leaf function version of strchr
◆ The index of the first occurrence of a given character within a
string is found using strchr
◆ For example the index of ‘o’ in ‘Hello’ is 4 (indexing from 0)
◆ The final coursework gives you a C version of strchr and asks
you to convert it to ARM assembler.
◆ Let’s assume that this routine has been written and that it expects, on
entry that R1 contains the start address of the string
◆ Also assume that R2 contains the character to be searched for
◆ The index value will be returned in R0
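As a rough guide to what the routine computes, here is one possible C rendering of an index-returning search. It is not the coursework version: the name str_index and the choice of -1 for "not found" are assumptions, since the slides do not say what happens when the character is absent.

```c
/* Index of the first occurrence of ch in str, counting from 0.
   Returns -1 if ch does not occur (an assumed convention). */
int str_index(const char *str, char ch) {
    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] == ch) {
            return i;
        }
    }
    return -1;
}
```

Note that this differs from the standard C library strchr, which returns a pointer rather than an index.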
18. The func1 function
◆ We save, on the stack frame, R4-R8 (which APCS says we must preserve)
and also LR
◆ Main program. PIN code issued is current year-number (2011) plus input
character’s index position in the chosen string. Returned in R0
func1   STMFD   SP!, {R4-R8, LR}
        ; strchr trashes R4 and lots of other stuff may be added
        ; here, later, that may well trash R5-R8 (which APCS says
        ; we must save). We now get ready to call strchr
        ; R1-3 untouched so should be OK
        BL      strchr              ; expects str. address in R1 and ch. in R2
        ADD     R0, R0, R5          ; index + year-number (set up in R5 by main)
        LDMFD   SP!, {R4-R8, PC}    ; restore R4-R8 and return result in R0
19. Global strings and main prog.
◆ Here are the global string declarations and the main program
stack   EQU     0x1000
        B       main
mesg1   DEFB    "the quick brown fox jumps over the lazy dog\0"
mesg2   DEFB    "Please type a single lower-case alphabetic character: \0"
mesg3   DEFB    "\nOK - your pin number is \0"
        ALIGN
main    ADR     R0, mesg2
        SWI     3               ; print the prompt
        SWI     1               ; get the character from keyboard
        MOV     R2, R0          ; seed char now in R2
        ADR     R1, mesg1
        ADR     R0, mesg3
        SWI     3               ; OK - your pin number is
        LDR     R5, =2011       ; not possible with a MOV
        MOV     SP, #stack
        BL      func1
        SWI     4               ; print out pin number
        SWI     2
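For comparison, the arithmetic of the whole pin-number program fits in a few lines of C. This is a sketch only: pin_for and index_of are hypothetical names, the keyboard and printing steps are left out, and a character that is not in the string gives index -1 here, which is an assumption rather than something the slides specify.

```c
/* Hypothetical helper: index of first occurrence of ch, or -1 if absent. */
static int index_of(const char *str, char ch) {
    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] == ch) {
            return i;
        }
    }
    return -1;
}

/* Mirrors func1: the year-number plus the seed character's index. */
int pin_for(char seed) {
    const char *mesg1 = "the quick brown fox jumps over the lazy dog";
    const int year = 2011;                  /* the value main loads into R5 */
    return year + index_of(mesg1, seed);
}
```

So a seed of 't' (index 0) gives 2011 and a seed of 'q' (index 4) gives 2015.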
20. Notes (+ the stack picture)
◆ Registers R1, R2 and R5 contain vital info. for func1
◆ Notice that R1 and R2 are passed over into strchr
◆ The value returned from strchr (in R0) has the R5 contents added to it inside func1
◆ Be clear that after the STMFD SP!, {R4-R8, LR} ‘push’ the
stack looks like:
        ...     High addresses
        LR
        R8
        R7
        R6
        R5
SP →    R4      Low addresses
21. Coping with recursion
◆ A recursive function is one that calls itself.
◆ Recursive function theory is of enormous importance for Maths and CS
◆ There has to be a way of escaping from the recursion. Otherwise it will
go on for ever (consuming CPU time and memory)
◆ The classic example is the factorial function defined as follows:
factorial (n) = n × factorial (n − 1)
factorial (0) = 1
◆ Thus, factorial(4) = 4 × 3 × 2 × 1 × 1 = 24
◆ Here’s how it is expressed in C:
int factorial (int n)
{
    if (n==0) return 1;
    else
        return n * factorial (n-1);
}
22. More about recursion
◆ For more information see my ‘Notes on Recursion’ handout
◆ Let’s look at how to do recursion in ARM assembler
◆ And then afterwards be very thankful that the C compiler lets us write
the version that was on the last slide !
◆ One of the simplest examples is factorial so let’s do that
◆ The stack will build up a lot of instances of n in separate stack
frames waiting to be consumed and multiplied together
◆ If a function calls itself it has to be written with extraordinary care
to be general enough to cope with:
Initial case when called from main
Final case when local instance of n has value 0
◆ Program we give next takes input argument in R1 and delivers result in R0
23. The factorial program
stack       EQU     0x1000
input       EQU     6
            B       main                ; skip over the string data
result      DEFB    " factorial is \0"
            ALIGN
factorial   CMP     R1, #0
            MOVEQ   R1, #1
            BEQ     exit                ; base case -- no need for new frame
            STMFD   SP!, {R1, LR}
            SUB     R1, R1, #1
            BL      factorial
            LDMFD   SP!, {R1, LR}       ; restore R1 and LR
exit        MUL     R0, R1, R0          ; answer builds up in R0 (Rd and Rm
                                        ; must differ on older ARM cores)
            MOV     PC, LR
main        MOV     R1, #input
            MOV     SP, #stack
            MOV     R0, R1
            SWI     4                   ; print the input value
            ADR     R0, result
            SWI     3                   ; print " factorial is "
            MOV     R0, #1              ; running product starts at 1
            BL      factorial
            SWI     4                   ; print the result
            SWI     2
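What the assembler version does with the stack can be made visible in C by managing the stack by hand instead of letting the compiler do it. This is a sketch, not a translation: only the stacked values of n are modelled (the stacked LR values are omitted), the names are invented, and returning 0 for inputs above 12 is an assumed way of refusing results that would not fit in 32 bits.

```c
#include <stdint.h>

#define MAX_DEPTH 16

/* Factorial using an explicit full descending stack of pending n values. */
uint32_t factorial_explicit_stack(uint32_t n) {
    uint32_t stack[MAX_DEPTH];
    int sp = MAX_DEPTH;             /* empty stack */
    uint32_t result = 1;            /* plays the role of R0 */

    if (n > 12) {
        return 0;                   /* 13! does not fit in 32 bits */
    }
    /* Descent: each recursive call pushes its own instance of n. */
    while (n != 0) {
        stack[--sp] = n;            /* STMFD SP!, {R1, LR} */
        n -= 1;                     /* SUB R1, R1, #1 */
    }
    /* Base case reached: n == 0 contributes a factor of 1. */
    result *= 1;
    /* Unwinding: each return pops one n and multiplies it in. */
    while (sp < MAX_DEPTH) {
        result *= stack[sp++];      /* LDMFD SP!, {R1, LR} then MUL */
    }
    return result;
}
```

For an input of 6 the stack holds 6, 5, 4, 3, 2, 1 at its deepest point, and the unwinding multiplies them back in starting from 1, giving 720.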
24. Example stack frames
◆ Diagrams below show:
(a) build up of simple stack frames for factorial
(b) more general block diagram of typical stack frame
        (a)                     (b)
        ...             FP →    ...                 High addresses
        LR                      LR
        3                       Saved Registers
        LR                      Local variables
        2               SP →    ...
        LR
SP →    1                                           Low addresses
25. More about stack management
◆ Note the factorial stack contains different instances of n
◆ Generating correct code for stack-frame handling is the compiler’s job
◆ Things like factorial, Fibonacci and Ackermann are increasingly
tough tests of your compiler’s handling of recursion !
◆ Stack frames can be cleared down by LDMFD ‘pop’ operations
◆ But also useful to have a Frame Pointer (FP) to start of current frame
(FP is R11 in the APCS scheme)
◆ Quick clear down of a frame can be done with MOV SP, FP
◆ If arguments and local variables are kept on stack frames what about global
(and static) variables? Answer: you need something like DEFW
◆ Start point of static variable area can be kept in the static base
register (R9 on ARM)
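The difference between the two kinds of storage shows up directly in C. In this small sketch (names invented for illustration) the local variable is re-created in a fresh stack frame on every call, while the static variable lives in the static data area and keeps its value between calls.

```c
static int calls_so_far = 0;    /* static data: one copy for the whole run */

/* Returns local * 100 + calls_so_far so both values can be inspected. */
int count_call(void) {
    int local = 0;              /* stack frame: a new copy on every call */
    local += 1;
    calls_so_far += 1;
    return local * 100 + calls_so_far;
}
```

The first call returns 101 and the second returns 102: local is always 1, but calls_so_far has survived the popping of the first stack frame.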