ARM procedure calling conventions and recursion





  1. Lecture 08: ARM Procedure Calling Conventions and Recursion
     Professor David Brailsford, School of Computer Science, University of Nottingham
     Extra material courtesy of: Jaume Bacardit, Thorsten Altenkirch and Liyang Hu (School of CS, Univ. of Nottm.); Steve Furber, Jim Garside and Pete Jinks (School of CS, Univ. of Manchester)
     © David Brailsford 2011
  2. What is a procedure?
     ◆ A portion of code within a larger program. Often called:
        - a subroutine or procedure in imperative languages like C
        - a method in OO languages like Java
        - a function in functional languages like Haskell
     ◆ Functions return a value. So some purists would say that a C function returning void is actually a procedure!
     ◆ Procedures are necessary for:
        - reducing duplication of code and enabling re-use
        - decomposing complex programs into manageable parts
     ◆ Procedures can call each other and can even call themselves
     ◆ What happens when we call a procedure?
        - The caller is suspended; control hands over to the callee
        - Callee performs the requested task
        - Callee returns control to the caller
  3. An Example in C
         #include <math.h>
         #include <stdio.h>

         int f (int x, int y) {
             return sqrt(x * x + y * y);
         }

         int main ( ) {
             printf ("f(5,12) = %d\n", f(5, 12));
         }
     ◆ The call to f in main jumps to a new piece of code (function f) but keeps track of where we were before
     ◆ When f finishes, control returns to the next instruction of the original code (main)
  4. Basic procedure calls on ARM
     ◆ We already know that the BL instruction uses R14 as the link register (LR). This is where it stores the return address
     ◆ So in simple cases, at the end of the procedure, all we need to do is MOV PC, LR
     ◆ In simple cases routines may be able to do their job solely with registers
     ◆ We’ve seen some simple examples of this with the strcpy and strchr procedures in the courseworks
     ◆ But we need conventions for register usage to avoid over-writing and misunderstandings
     ◆ Thus we have the APCS (ARM Procedure Call Standard) to guide us
  5. APCS Register Use Convention
         Register  APCS name  APCS role
         0         a1         Argument 1 / integer result / scratch register
         1         a2         Argument 2 / scratch register
         2         a3         Argument 3 / scratch register
         3         a4         Argument 4 / scratch register
         4         v1         Register variable 1
         5         v2         Register variable 2
         6         v3         Register variable 3
         7         v4         Register variable 4
         8         v5         Register variable 5
         9         sb/v6      Static base / Register variable 6
         10        sl/v7      Stack limit / Register variable 7
         11        fp         Frame pointer
         12        ip         Scratch register / specialist use by linker
         13        sp         Lower end of current stack frame
         14        lr         Link address / scratch register
         15        pc         Program counter
  6. Caller Saved Registers
     ◆ R0–R3 are used to pass arguments into a function
     ◆ But inside the function they may be used for any purpose (they are scratch registers). R0 often delivers back the result
     ◆ Caller must expect R0–R3 contents to be trashed (i.e. over-written) when a function call returns
     ◆ If caller doesn’t want this to happen then it must save R0–R3 contents beforehand (typically in memory)
     ◆ A typical simple leaf function, e.g. strlen (i.e. one which does not call any other function), provided it uses only R0–R3, only needs BL to jump in and MOV PC, LR to return
  7. Callee Saved Registers
     ◆ R4–R8 (R4–R10 in some variants of APCS) are registers which any called function is required to preserve
     ◆ Therefore they must have unchanged values when control returns to the calling routine (e.g. the main program)
     ◆ So if the called function needs these registers for extra workspace then it must save them (hence: callee saved)
     ◆ Of course, if they have been saved then they must be restored before returning to the caller
     ◆ Registers are limited in number. Memory has much larger capacity
     ◆ We need a disciplined way to save stuff in memory. Best solution is a stack
  8. The Stack Concept
     ◆ A stack provides last in, first out storage
     ◆ It is a most important data structure in Computer Science
     ◆ Placing words on the stack is termed pushing
     ◆ Taking words off the stack is called popping
  9. Stack Implementation Choices
     ◆ Do we grow the stack downwards (descending addresses) or upwards (ascending addresses) in memory?
     ◆ We need a stack pointer register (SP) to hold the address of the top of stack (this SP is R13 on the ARM)
     ◆ But should R13 point to the topmost filled location (stack full)?
     ◆ Or should it point to the next empty location just beyond the top of stack (stack empty)?
     ◆ No single ‘right answer’. But ARM, like many other systems, uses a “full descending” approach
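The "full descending" choice can be modelled in a few lines of C. This is only a sketch under stated assumptions: the array `mem`, the index `sp` and the helpers `push` and `pop` are invented for illustration and are not taken from the lecture or from any ARM toolchain. `sp` is a word index rather than a byte address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model: mem[] stands in for RAM and sp for R13.
   sp is a word index, so "descending" means the index gets smaller. */
uint32_t mem[STACK_WORDS];
int sp = STACK_WORDS;              /* empty stack: sp sits just above the stack area */

/* Full descending push: decrement first, then store,
   so sp always points at the last word that was filled. */
void push(uint32_t value) {
    assert(sp > 0);                /* would run off the bottom of the stack area */
    sp -= 1;
    mem[sp] = value;
}

/* Full descending pop: load from sp, then increment. */
uint32_t pop(void) {
    assert(sp < STACK_WORDS);      /* nothing left to pop */
    uint32_t value = mem[sp];
    sp += 1;
    return value;
}
```

An "empty" stack would instead store first and then move the pointer, and an "ascending" stack would move the index upwards on a push.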
  10. Standard ARM C address space
      ◆ ARM C compilers generally arrange the memory address space as follows:
                                 <- top of memory
              stack
                                 <- stack pointer (sp)
                                 <- stack limit (sl)
              unused
                                 <- top of heap
              heap
                                 <- top of application
              static data
                                 <- static base (sb)
              code
                                 <- application base address
        (static data and code together make up the application’s image)
  11. Multiple Loads and Stores
      ◆ If we want to store register values on the stack in memory it’s good to do this en bloc
      ◆ This is much more efficient than lots of individual STR and LDR instructions
      ◆ ARM supplies Load and Store Multiple instructions (LDM and STM) for just this purpose
      ◆ Just like the pre-index modes for single LDR/STR instructions we can use a base register as the indexer, with an option for write-back
      ◆ In a stack-based discipline we use SP (R13) as the memory indexer
      ◆ ARM assemblers support a range of suffixes for different stack regimes
      ◆ But the APCS uses the ‘full descending’ STMFD and LDMFD options
  12. Addressing modes and stack suffix options
      ◆ There are four addressing modes for multiple load/store instructions:
         - IA: Increment After
         - IB: Increment Before
         - DA: Decrement After
         - DB: Decrement Before
      ◆ Stack-orientated suffixes:
              Stack type         Push            Pop
              Full descending    STMFD (STMDB)   LDMFD (LDMIA)
              Full ascending     STMFA (STMIB)   LDMFA (LDMDA)
              Empty descending   STMED (STMDA)   LDMED (LDMIB)
              Empty ascending    STMEA (STMIA)   LDMEA (LDMDB)
      [Figure: four memory diagrams, (a) IA, (b) IB, (c) DA and (d) DB, showing where R0, R1 and R4 sit between high and low addresses, relative to the base register, for LDMxx R10, {R0, R1, R4} and STMxx R10, {R0, R1, R4}]
      ◆ We need only the first line of the above table (and diagrams (a) and (d))
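The four addressing modes differ only in which addresses are touched and in what is written back to the base register. The sketch below works that out in C; the type `ldm_mode` and the two function names are hypothetical, chosen for this illustration. In every mode the lowest-numbered register goes with the lowest address.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { IA, IB, DA, DB } ldm_mode;   /* hypothetical names for this sketch */

/* Lowest byte address touched when n registers are transferred
   using base address `base`. */
uint32_t lowest_address(ldm_mode mode, uint32_t base, unsigned n) {
    switch (mode) {
    case IA: return base;                /* base, base+4, ...       */
    case IB: return base + 4;            /* base+4, base+8, ...     */
    case DA: return base - 4 * n + 4;    /* ..., base-4, base       */
    case DB:
    default: return base - 4 * n;        /* ..., base-8, base-4     */
    }
}

/* Value left in the base register when write-back ('!') is requested. */
uint32_t written_back_base(ldm_mode mode, uint32_t base, unsigned n) {
    if (mode == IA || mode == IB) {
        return base + 4 * n;
    }
    return base - 4 * n;
}
```

With a base of 0x1000 and three registers, a DB store (STMFD) fills 0x0FF4 to 0x0FFC and writes back 0x0FF4; an IA load (LDMFD) from 0x0FF4 reads the same three words and writes back 0x1000, which is why the pair behaves as push and pop.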
  13. Multiple Loads and Stores: Details
      ◆ In the Full Descending scheme a multiple store (STMFD) corresponds to pushing register contents onto the stack
      ◆ Conversely a multiple load (LDMFD) corresponds to a pop from the stack
      ◆ These operations could use the mnemonics STMDB and LDMIA if preferred
      ◆ Let’s assume we want to retrieve data from the stack into registers
      ◆ Consider LDMFD SP, {R0-R3}. Here the SP holds the base address
      ◆ The overall effect is equivalent to:
              LDR R0, [SP]
              LDR R1, [SP, #4]
              LDR R2, [SP, #8]
              LDR R3, [SP, #12]
      ◆ But notice, in the above sequence, that SP itself has not been changed
      ◆ If we want SP to be altered (and we usually will) we write
              LDMFD SP!, {R0-R3}
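The difference between LDMFD SP, {...} and LDMFD SP!, {...} can be checked with a small C model. Everything here is an assumption made for illustration: `reg` and `ram` are an invented machine state, `ldmfd` is a hypothetical helper, and the register list is passed as a bit mask (bit i set means Ri is in the list).

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS 64

/* Hypothetical machine state: reg[13] plays the role of SP
   and holds a byte address into ram[]. */
uint32_t reg[16];
uint32_t ram[RAM_WORDS];

/* Model of LDMFD base{!}, {list}. Registers are loaded in ascending
   register number from ascending addresses, starting at reg[base]. */
void ldmfd(int base, uint16_t reg_mask, int writeback) {
    uint32_t addr = reg[base];
    for (int i = 0; i < 16; i++) {
        if (reg_mask & (1u << i)) {
            assert(addr % 4 == 0 && addr / 4 < RAM_WORDS);
            reg[i] = ram[addr / 4];
            addr += 4;
        }
    }
    if (writeback) {
        reg[base] = addr;   /* the '!' form: base now points past the loaded words */
    }
}
```

Calling `ldmfd(13, 0x000F, 0)` loads R0 to R3 and leaves SP alone; passing 1 as the last argument also advances SP by 16 bytes, which is what makes the load a true pop.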
  14. Stack Frames and Link Registers: Details
      ◆ Data stored on the stack as part of a function call forms part of the stack frame for that function invocation
      ◆ A stack frame can have stored register values, and also allocated space for local variables declared within the function
      ◆ The stack frame also stores ‘housekeeping’ information, e.g. the current value of the LR. (We’ll see why shortly)
      ◆ When a procedure is exited and we return to the caller of the function, then the whole stack frame content must be popped
      ◆ This is why local variables vanish once a function is exited
      ◆ When doing Load/Store Multiple we generally give a list of registers in curly braces, e.g. LDMFD SP, {R1-R4, LR}
      ◆ Remember: lowest-address item goes to the lowest numbered register
  15. Storing the Link Register: Details
      ◆ Recall: if we are in a leaf function (which doesn’t call anything else) we don’t need to store the LR. But in all other cases we do! Why?
              main          func1                     func2
                            STMFD sp!, {regs, lr}
              ...           ...                       ...
              BL func1      BL func2
              ...           ...                       ...
                            LDMFD sp!, {regs, pc}     MOV pc, lr
      ◆ The BL func1 in main stores the return address in LR (R14)
      ◆ But then the BL func2 inside func1 overwrites it
      ◆ So func2 returns to func1 OK, but if func1 then tried to return to main using MOV PC, LR, the LR would be wrong!
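The overwriting of LR can be mimicked in C with a plain variable. This is a toy model, not real control flow: `lr` stands in for R14, the two constants are made-up "return addresses", and each function returns the address that control would jump to when it exits. All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for the return addresses that each BL stores. */
enum { RETURN_INTO_MAIN = 100, RETURN_INTO_FUNC1 = 200 };

int lr;                          /* models R14 */
int saved_words[8];              /* models the stack */
int stack_top = 8;               /* full descending */

/* func1 written as if it were a leaf: LR is never saved. */
int func1_no_save(void) {
    lr = RETURN_INTO_FUNC1;      /* BL func2 overwrites LR */
    /* func2 returns with MOV PC, LR and lands back here correctly */
    return lr;                   /* MOV PC, LR now jumps back into func1 itself */
}

/* func1 written as a proper non-leaf function. */
int func1_with_save(void) {
    assert(stack_top > 0);
    saved_words[--stack_top] = lr;      /* STMFD sp!, {lr} */
    lr = RETURN_INTO_FUNC1;             /* BL func2 */
    return saved_words[stack_top++];    /* LDMFD sp!, {pc} */
}
```

Entered with `lr == RETURN_INTO_MAIN`, the first version exits to `RETURN_INTO_FUNC1` (wrong), while the second exits to `RETURN_INTO_MAIN`.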
  16. Storing the Link Register: More Details
      ◆ We definitely need to stack the LR value for all non-leaf functions!
      ◆ Note the stack frame push and pop instructions at start and end of func1
      ◆ Note how the LDMFD asks that the stored LR value be put back into PC
      ◆ This causes instantaneous return to main. Cute!
      ◆ This kind of trick can be used for ‘tail continued’ functions
      ◆ However, we usually have some ‘clearing up’ to do before we can return
      ◆ Let’s look at a real example of the situation in the previous diagram
      ◆ We’ll use strchr (see later slide) as our ‘leaf function’
      ◆ This program is a pin-number generator using a character as the ‘seed’
  17. The leaf function version of strchr
      ◆ The index of the first occurrence of a given character within a string is found using strchr
      ◆ For example the index of ‘o’ in ‘Hello’ is 4 (indexing from 0)
      ◆ The final coursework gives you a C version of strchr and asks you to convert it to ARM assembler
      ◆ Let’s assume that this routine has been written and that it expects, on entry, that R1 contains the start address of the string
      ◆ Also assume that R2 contains the character to be searched for
      ◆ The index value will be returned in R0
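For readers without the coursework sheet, an index-returning search might look like the C sketch below. It is not the coursework's version: the name `index_of` is hypothetical (the standard C library `strchr` returns a pointer, not an index), and returning -1 when the character is absent is an assumption, since the slides do not say what happens on a miss.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name, chosen to avoid clashing with the standard strchr. */
int index_of(const char *str, char ch) {
    assert(str != NULL);
    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] == ch) {
            return i;            /* index of the first occurrence */
        }
    }
    return -1;                   /* assumption: the slides do not define a miss */
}
```

In register terms, `str` corresponds to R1, `ch` to R2 and the return value to R0.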
  18. The func1 function
      ◆ We save, on the stack frame, R4-R8 (which APCS says we must preserve) and also LR
      ◆ Main program: the PIN code issued is the current year-number (2011) plus the input character’s index position in the chosen string. It is returned in R0
              func1   STMFD SP!, {R4-R8, LR}
                      ; strchr trashes R4 and lots of other stuff may be added
                      ; here, later, that may well trash R5-R8 (which APCS says
                      ; we must save). We now get ready to call strchr
                      ; R1-3 untouched so should be OK
                      BL strchr              ; expects str. address in R1 and ch. in R2
                      ADD R0, R0, R5
                      LDMFD SP!, {R4-R8,PC}  ; restore R4-R8 and return result in R0
  19. Global strings and main prog.
      ◆ Here are the global string declarations and the main program
              stack   EQU 0x1000
                      B main
              mesg1   DEFB "the quick brown fox jumps over the lazy dog\0"
              mesg2   DEFB "Please type a single lower-case alphabetic character: \0"
              mesg3   DEFB "\nOK - your pin number is \0"
                      ALIGN
              main    ADR R0, mesg2
                      SWI 3
                      SWI 1            ; get the character from keyboard
                      MOV R2, R0       ; seed char now in R2
                      ADR R1, mesg1
                      ADR R0, mesg3
                      SWI 3            ; OK - your pin number is
                      LDR R5, =2011    ; not possible with a MOV
                      MOV SP, #stack
                      BL func1
                      SWI 4            ; print out pin number
                      SWI 2
  20. Notes (+ the stack picture)
      ◆ Registers R1, R2 and R5 contain vital info. for func1
      ◆ Notice that R1 and R2 are passed over into strchr
      ◆ The value returned from strchr (in R0) has the R5 contents added to it inside func1
      ◆ Be clear that after the STMFD SP!, {R4-R8, LR} ‘push’ the stack looks like:
                      ...      High addresses
                      LR
                      R8
                      R7
                      R6
                      R5
              SP ->   R4       Low addresses
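As a cross-check on what the assembler program computes, here is a rough C equivalent of the PIN calculation. It is a sketch only: `make_pin` and `index_of` are hypothetical names, `index_of` stands in for the coursework's strchr, and the keyboard and printing SWI calls are left out.

```c
#include <assert.h>
#include <stddef.h>

const char pangram[] = "the quick brown fox jumps over the lazy dog";   /* mesg1 */

/* Stand-in for the coursework strchr: index of first occurrence, or -1. */
int index_of(const char *str, char ch) {
    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] == ch) {
            return i;
        }
    }
    return -1;
}

/* Mirrors func1: year-number (the R5 value) plus the index returned in R0. */
int make_pin(char seed) {
    int idx = index_of(pangram, seed);
    assert(idx >= 0);            /* the prompt asks for a lower-case letter */
    return 2011 + idx;
}
```

For example, the seed 't' sits at index 0 and gives 2011, while 'q' sits at index 4 and gives 2015.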
  21. Coping with recursion
      ◆ A recursive function is one that calls itself
      ◆ Recursive function theory is of enormous importance for Maths and CS
      ◆ There has to be a way of escaping from the recursion. Otherwise it will go on for ever (consuming CPU time and memory)
      ◆ The classic example is the factorial function defined as follows:
              factorial (n) = n × factorial (n − 1)
              factorial (0) = 1
      ◆ Thus, factorial(4) = 4 × 3 × 2 × 1 × 1 = 24
      ◆ Here’s how it is expressed in C:
              int factorial (int n) {
                  if (n==0)
                      return 1;
                  else
                      return n * factorial (n-1);
              }
  22. More about recursion
      ◆ For more information see my ‘Notes on Recursion’ handout
      ◆ Let’s look at how to do recursion in ARM assembler
      ◆ And then afterwards be very thankful that the C compiler lets us write the version that was on the last slide!
      ◆ One of the simplest examples is factorial so let’s do that
      ◆ The stack will build up a lot of instances of n in separate stack frames waiting to be consumed and multiplied together
      ◆ If a function calls itself it has to be written with extraordinary care to be general enough to cope with:
         - the initial case, when called from main
         - the final case, when the local instance of n has value 0
      ◆ The program we give next takes its input argument in R1 and delivers the result in R0
  23. The factorial program
              stack     EQU 0x1000
              input     EQU 6
                        B main
              result    DEFB " factorial is "
                        ALIGN
              factorial CMP R1, #0
                        MOVEQ R1, #1
                        BEQ exit             ; base case -- no need for new frame
                        STMFD SP!, {R1, LR}
                        SUB R1, R1, #1
                        BL factorial
                        LDMFD SP!, {R1,LR}   ; restore R1 and LR
              exit      MUL R0, R1, R0       ; answer builds up in R0 (Rd and Rm must differ on older ARM cores)
                        MOV PC, LR
              main      MOV R1, #input
                        MOV SP, #stack
                        MOV R0, R1
                        SWI 4
                        ADR R0, result
                        SWI 3
                        MOV R0, #1
                        BL factorial
                        SWI 4
                        SWI 2
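To see how the answer "builds up in R0", the assembler routine can be restated in C with the stack made explicit. This is a hedged sketch, not a translation the lecture provides: `frames`, `top` and `factorial_explicit` are invented names, and the saved LR values are omitted because a C loop does not need them.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_WORDS 64

uint32_t frames[FRAME_WORDS];    /* hypothetical stack area */
int top = FRAME_WORDS;           /* full descending, like SP */

uint32_t factorial_explicit(uint32_t n) {
    uint32_t acc = 1;            /* R0: main sets it to 1 before the call */
    int start = top;

    /* Descent: each recursive call pushes its own n (R1). */
    while (n != 0) {
        assert(top > 0);         /* guard against overflowing the stack area */
        frames[--top] = n;       /* STMFD SP!, {R1, LR}, with LR left out */
        n -= 1;                  /* SUB R1, R1, #1 */
    }

    /* Base case: R1 is forced to 1, so the first multiply leaves acc unchanged. */
    acc *= 1;

    /* Unwind: each return pops one saved n and multiplies it in. */
    while (top < start) {
        acc *= frames[top++];    /* LDMFD SP!, {R1, LR} then MUL */
    }
    return acc;
}
```

For an input of 4 the pushes are 4, 3, 2, 1 and the pops multiply in 1, 2, 3, 4, giving 24, with the stack pointer back where it started.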
  24. Example stack frames
      ◆ Diagrams below show:
         (a) build up of simple stack frames for factorial
         (b) more general block diagram of typical stack frame
              (a)                              (b)
                      ...   High addresses     FP ->  ...
                      LR                              LR
                      3                               Saved registers
                      LR                              Local variables
                      2                        SP ->  ...
                      LR
              SP ->   1     Low addresses
  25. More about stack management
      ◆ Note the factorial stack contains different instances of n
      ◆ Generating correct code for stack-frame handling is the compiler’s job
      ◆ Things like factorial, Fibonacci and Ackermann are increasingly tough tests of your compiler’s handling of recursion!
      ◆ Stack frames can be cleared down by LDMFD ‘pop’ operations
      ◆ But it is also useful to have a Frame Pointer (FP) to the start of the current frame (FP is R11 in the APCS scheme)
      ◆ Quick clear down of a frame can be done with MOV SP, FP
      ◆ If arguments and local variables are kept on stack frames, what about global (and static) variables? Answer: you need something like DEFW
      ◆ The start point of the static variable area can be kept in the static base register (R9 on ARM)