Professor David Brailsford
                                                  School of Computer Science
                                                   University of Nottingham

                                                   Extra material courtesy of:
                   Jaume Bacardit, Thorsten Altenkirch and Liyang Hu — School of CS, Univ. of Nottm.
                          Steve Furber, Jim Garside and Pete Jinks — School of CS, Univ. of Manchester
                      Lecture 08: ARM Procedure Calling Conventions and Recursion
Page 1
© David Brailsford 2011
What is a procedure?
           ◆ A portion of code within a larger program. Often called
                          a subroutine or procedure in imperative languages like C
                          methods in OO languages like Java
                          and functions in functional languages like Haskell
           ◆ Functions return a value. So some purists would say that a C
               function returning void is actually a procedure!
           ◆ Procedures are necessary for:
                          reducing duplication of code and enabling re-use
                          decomposing complex programs into manageable parts
           ◆ Procedures can call each other and can even call themselves
           ◆ What happens when we call a procedure?
                              The caller is suspended; control hands over to the callee
                              Callee performs the requested task
                              Callee returns control to the caller
An Example in C

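           ◆ Calling f jumps to a new piece of code, but the machine keeps track of where we were before
           ◆ When f returns, execution continues at the next instruction of the original code (in main)

            #include <stdio.h>
            #include <math.h>

            int f (int x, int y) {
                return sqrt(x * x + y * y);     /* sqrt(169) = 13.0, converted to int */
            }

            int main ( ) {
                printf ("f(5,12) = %d\n", f(5, 12));
                return 0;
            }
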
Basic procedure calls on ARM
           ◆ We already know that the BL instruction uses R14 as the link register (LR)
                   This is where it stores the return address
           ◆ So in simple cases, at the end of the procedure, all we need to do
               is MOV PC, LR
           ◆ In simple cases routines may be able to do their job solely with registers
           ◆ We’ve seen some simple examples of this with the strcpy and strchr
                   procedures in courseworks.
           ◆ But we need conventions for register usage to avoid over-writing
                   and misunderstandings
           ◆ Thus we have the APCS (ARM Procedure Call Standard) to guide us

APCS Register Use Convention

                          Register   APCS name                   APCS Role
                             0           a1      Argument 1/integer result/ scratch register
                             1           a2      Argument 2/scratch register
                             2           a3      Argument 3/scratch register
                             3           a4      Argument 4/scratch register
                             4           v1      Register variable 1
                             5           v2      Register variable 2
                             6           v3      Register variable 3
                             7           v4      Register variable 4
                             8           v5      Register variable 5
                             9         sb/v6     Static base / Register variable 6
                            10         sl/v7     Stack limit / Register variable 7
                            11           fp      Frame pointer
                            12           ip      Scratch register/ specialist use by linker
                            13           sp      Lower end of current stack frame
                            14            lr     link address / scratch register
                            15           pc      Program counter
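The table can be encoded as a small lookup structure. This is a minimal C sketch for illustration only: apcs_name, apcs_is_argument_register and apcs_is_callee_saved are hypothetical names, and the callee-saved check follows the R4-R8 rule given on the "Callee Saved Registers" slide, with the R4-R10 variant as an option.

```c
#include <stdbool.h>

/* APCS names for R0-R15 as listed in the table. R9 and R10 also double
   as v6 and v7; the first name in the table is used here. */
static const char *const apcs_name[16] = {
    "a1", "a2", "a3", "a4",        /* R0-R3: arguments / scratch        */
    "v1", "v2", "v3", "v4", "v5",  /* R4-R8: register variables         */
    "sb", "sl",                    /* R9, R10: static base, stack limit */
    "fp", "ip", "sp", "lr", "pc"   /* R11-R15                           */
};

/* True for the argument registers R0-R3 (a1-a4), which a called
   function may freely overwrite. */
static bool apcs_is_argument_register(int r)
{
    return r >= 0 && r <= 3;
}

/* True if a called function must preserve register r: R4-R8, or R4-R10
   in the APCS variants that treat sb/sl as v6/v7. */
static bool apcs_is_callee_saved(int r, bool r4_to_r10_variant)
{
    int last = r4_to_r10_variant ? 10 : 8;
    return r >= 4 && r <= last;
}
```

For example, apcs_name[13] is "sp", and apcs_is_callee_saved(9, false) is false because R9 is only preserved in the R4-R10 variant.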

Caller Saved Registers
           ◆ R0–R3 are used to pass arguments into a function
           ◆ But inside the function they may be used for any purpose (they are
                   scratch registers). R0 often delivers back the result
           ◆ Caller must expect R0–R3 contents to be trashed (i.e. over-written)
                   when a function call returns.
           ◆ If caller doesn’t want this to happen then it must save R0–R3
                   contents beforehand (typically in memory).
           ◆ A typical simple leaf function such as strlen (i.e. one
                   which does not call any other function), provided it uses only
                   R0–R3, needs only BL to jump in and MOV PC, LR to return
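As an illustration of a leaf function, here is a C sketch of a strlen-style routine. The name my_strlen is hypothetical (chosen to avoid clashing with the C library's strlen), and whether a compiler really keeps everything in R0-R3 depends on the compiler and its optimisation settings.

```c
#include <stddef.h>

/* A leaf function: it calls nothing else, and its only state (s and n)
   is small enough that a compiler can typically keep it in the
   argument/scratch registers and return with a single MOV PC, LR. */
size_t my_strlen(const char *s)
{
    size_t n = 0;           /* count of characters seen so far      */
    while (s[n] != '\0')    /* stop at the terminating zero byte     */
        n++;
    return n;               /* under the APCS the result goes in R0  */
}
```

Because my_strlen never executes a BL, LR still holds the caller's return address when it finishes, so nothing needs to be pushed on the stack.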

Callee Saved Registers
           ◆ R4–R8 (R4–R10 in some variants of APCS) are registers which any
                   called function is required to save.
           ◆ Therefore they must have unchanged values when control returns to
                   the calling routine (e.g. the main program)
           ◆ So if the called function needs these registers for extra workspace
                   then it must save them (hence: callee saved)
           ◆ Of course, if they have been saved then they must be restored
                   before returning to the caller.
           ◆ Registers are limited in number. Memory has much larger capacity
           ◆ We need a disciplined way to save stuff in memory. Best solution
                   is a stack

The Stack Concept
           ◆ A stack provides last in, first out storage
           ◆ It is one of the most important data structures in Computer Science
           ◆ Placing words on the stack is termed pushing
           ◆ Taking words off the stack is called popping
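A minimal C sketch of pushing and popping on a last-in, first-out stack, assuming a fixed capacity of 16 words. WordStack, push and pop are hypothetical names used only for illustration; a machine stack does the same job using a region of memory and the SP register instead of an array and a count.

```c
#include <stdbool.h>

#define STACK_CAPACITY 16   /* arbitrary size chosen for this sketch */

typedef struct {
    int items[STACK_CAPACITY];
    int count;              /* number of words currently on the stack */
} WordStack;

/* Pushing: place a word on top of the stack. */
static bool push(WordStack *st, int value)
{
    if (st->count == STACK_CAPACITY)
        return false;       /* no room left: refuse the push */
    st->items[st->count++] = value;
    return true;
}

/* Popping: take the most recently pushed word off the stack. */
static bool pop(WordStack *st, int *value)
{
    if (st->count == 0)
        return false;       /* nothing to pop */
    *value = st->items[--st->count];
    return true;
}
```

Pushing 1, 2 and 3 and then popping three times yields 3, 2 and 1: last in, first out.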

Stack Implementation Choices

           ◆ Do we grow the stack downwards (descending addresses) or
                   upwards (ascending addresses) in memory?
           ◆ We need a stack pointer register (SP) to hold the address
                   of the top of stack (this SP is R13 on the ARM)
           ◆ But should R13 point to the topmost filled location (a 'full' stack)?
           ◆ Or should it point to the next empty location just beyond the top of stack
                   (an 'empty' stack)?
           ◆ There is no single 'right answer', but ARM, like many other systems,
                   uses a 'full descending' approach
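The 'full descending' choice fixes the order of operations: because SP points at the topmost filled word and the stack grows towards lower addresses, a push must decrement SP before storing, and a pop must load before incrementing. The C sketch below models this over a small simulated memory; the memory size, starting address and the names push_fd and pop_fd are assumptions for illustration. On the ARM a single-register push and pop can be written STR Rx, [SP, #-4]! and LDR Rx, [SP], #4.

```c
#include <stdint.h>

#define MEM_WORDS 256                    /* simulated RAM: byte addresses 0..0x3FF */
static uint32_t mem[MEM_WORDS];

static uint32_t read_word(uint32_t addr)          { return mem[addr / 4]; }
static void write_word(uint32_t addr, uint32_t v) { mem[addr / 4] = v; }

/* Push: decrement first (the next free word is below the current top),
   then store, so sp ends up pointing at the newly filled word. */
static void push_fd(uint32_t *sp, uint32_t value)
{
    *sp -= 4;
    write_word(*sp, value);
}

/* Pop: load the word sp points at (it is full), then increment. */
static uint32_t pop_fd(uint32_t *sp)
{
    uint32_t value = read_word(*sp);
    *sp += 4;
    return value;
}
```

Starting from sp = 0x400 (nothing stacked yet), one push leaves sp = 0x3FC with the value stored at 0x3FC, and a matching pop restores sp to 0x400.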

Standard ARM C address space
           ◆ ARM C compilers generally arrange the memory address space
                   as follows (highest addresses at the top):

                                                              top of memory
                                                  stack
                                                              stack pointer (sp)
                                                              stack limit (sl)

                                                              top of heap
                                                  heap
                                                              top of application
                                                  static data
                                                              static base (sb)
                                                  code
                                                              application base address

                   (static data and code together form the application's image)
Multiple Loads and Stores

           ◆ If we want to store register values on the stack in memory it’s good
                   to do this en bloc
           ◆ This is much more efficient than lots of individual STR and LDR instructions
           ◆ ARM supplies Load and Store Multiple instructions (LDM and STM)
                   for just this purpose
           ◆ Just like the pre-index modes for single LDR/STR instructions we can
                   use a base register as the indexer — with an option for write-back
           ◆ In a stack-based discipline we use SP (R13) as the memory indexer
           ◆ ARM assemblers support a range of suffixes for different stack regimes
           ◆ But the APCS uses ‘full descending’ STMFD and LDMFD options

Addressing modes and stack suffix options

           ◆ There are four addressing modes for multiple load/store instructions:
                  IA: Increment After
                  IB: Increment Before
                  DA: Decrement After
                  DB: Decrement Before

           ◆ Stack-oriented suffixes:
                  Stack type          Push              Pop
                  Full descending     STMFD (STMDB)     LDMFD (LDMIA)
                  Full ascending      STMFA (STMIB)     LDMFA (LDMDA)
                  Empty descending    STMED (STMDA)     LDMED (LDMIB)
                  Empty ascending     STMEA (STMIA)     LDMEA (LDMDB)

           ◆ Where LDMxx R10, {R0, R1, R4} or STMxx R10, {R0, R1, R4} places each
                   register, relative to the base address held in R10:

                  Address       (a) IA   (b) IB   (c) DA   (d) DB
                  R10 + 12               R4                          High addresses
                  R10 + 8       R4       R1
                  R10 + 4       R1       R0
                  R10           R0                R4
                  R10 - 4                         R1       R4
                  R10 - 8                         R0       R1
                  R10 - 12                                 R0        Low addresses

           ◆ We need only the first line of the suffix table (and diagrams (a) and (d))
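The diagrams can be checked with a small C model that computes, for each of the four modes, the address used by each listed register and the value the base register would hold after write-back ('!'). AddrMode and block_addresses are hypothetical names; the rule encoded is the one stated on these slides, that the lowest-numbered register always goes to the lowest address.

```c
#include <stdint.h>

typedef enum { IA, IB, DA, DB } AddrMode;

/* Fill addr[0..n-1] with the address used for each of the n listed
   registers (in ascending register-number order) and return the base
   value after write-back. */
static uint32_t block_addresses(AddrMode mode, uint32_t base, int n, uint32_t addr[])
{
    uint32_t start, wb;
    switch (mode) {
    case IA: start = base;              wb = base + 4u * n; break;
    case IB: start = base + 4;          wb = base + 4u * n; break;
    case DA: start = base - 4u * n + 4; wb = base - 4u * n; break;
    default: /* DB */
             start = base - 4u * n;     wb = base - 4u * n; break;
    }
    /* Registers always occupy consecutive ascending words from 'start'. */
    for (int i = 0; i < n; i++)
        addr[i] = start + 4u * i;
    return wb;
}
```

With base 0x1000 and the list {R0, R1, R4}, mode DB puts R0, R1 and R4 at 0xFF4, 0xFF8 and 0xFFC and writes back 0xFF4, matching diagram (d): this is the STMFD push. Mode IA reads from 0x1000 upwards and writes back 0x100C, matching diagram (a): the LDMFD pop.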
Multiple Loads and Stores — Details
           ◆ In the Full Descending scheme a multiple store (STMFD) corresponds
                   to pushing register contents onto the stack
           ◆ Conversely a multiple load (LDMFD) corresponds to a pop from the stack
           ◆ These operations could use the mnemonics STMDB and LDMIA if preferred
           ◆ Let’s assume we want to retrieve data from the stack into registers
           ◆ Consider LDMFD SP, {R0-R3}. Here the SP holds the base address
           ◆ The overall effect is equivalent to:
                  LDR     R0,   [SP]
                  LDR     R1,   [SP, #4]
                  LDR     R2,   [SP, #8]
                  LDR     R3,   [SP, #12]

           ◆ But notice, in the above sequence, that SP itself has not been changed
           ◆ If we want SP to be altered (and we usually will) we add the write-back '!':
                  LDMFD SP!, {R0-R3}
                  which leaves SP 16 bytes (4 words) higher, just above the popped words
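The equivalence with the four LDR instructions, and the effect of '!', can be modelled in C. This is a sketch under stated assumptions: mem and reg are a hypothetical simulated memory and register file, ldmfd is a hypothetical helper, and the case where the base register also appears in the list is not modelled.

```c
#include <stdint.h>
#include <stdbool.h>

#define MEM_WORDS 64
static uint32_t mem[MEM_WORDS];   /* simulated word-addressed memory     */
static uint32_t reg[16];          /* simulated R0-R15; reg[13] is the SP */

/* LDMFD Rn{!}, {list}, i.e. LDMIA: bit i of reg_mask set means Ri is in
   the list. Registers are loaded in ascending number order from ascending
   addresses, starting at the base address itself. */
static void ldmfd(int rn, uint16_t reg_mask, bool writeback)
{
    uint32_t addr = reg[rn];
    for (int i = 0; i < 16; i++) {
        if (reg_mask & (1u << i)) {
            reg[i] = mem[addr / 4];
            addr += 4;
        }
    }
    if (writeback)
        reg[rn] = addr;           /* '!': SP now points just above the popped words */
}
```

With SP = 0x40, ldmfd(13, 0x000F, false) behaves like LDMFD SP, {R0-R3} and leaves SP at 0x40, while ldmfd(13, 0x000F, true) behaves like LDMFD SP!, {R0-R3} and leaves SP at 0x50.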
Stack Frames and Link Registers — Details
           ◆ Data stored on the stack as part of a function call forms part of
                   the stack frame for that function invocation.
           ◆ A stack frame can have stored register values, and also allocated
                   space for local variables declared within the function
           ◆ The stack frame also stores ‘housekeeping’ information e.g. the
                   current value of the LR. (We’ll see why shortly)
           ◆ When a procedure is exited and we return to the caller of the function,
                   then the whole stack frame content must be popped.
           ◆ This is why local variables vanish once a function is exited
           ◆ When doing Load/Store Multiple we generally give a list of registers in
              curly braces e.g. LDMFD SP, {R1–R4, LR}
           ◆ Remember: lowest-address item goes to the lowest numbered register
Storing the Link Register — Details
           ◆ Recall: if we are in a leaf function (which doesn’t call anything else)
                   we don’t need to store the LR. But in all other cases we do! Why?

                              main              func1                     func2
                               ...              STMFD sp!, {regs, lr}      ...
                             BL func1           ...                        ...
                               ...              BL func2                   MOV pc, lr
                                                ...
                                                LDMFD sp!, {regs, pc}

           ◆ The BL func1 in main stores the return address in LR (R14)
           ◆ But then the BL func2 inside func1 overwrites it
           ◆ So func2 returns to func1 correctly, but if func1 then returned to main
               using MOV PC, LR it would go to the wrong place: LR now holds the
               return address inside func1, not the one in main!
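The problem can be replayed with a tiny C model of the PC and LR. Everything here is an assumption for illustration: Cpu, bl, mov_pc_lr and func1_return_address are hypothetical names, and the code addresses 0x100, 0x200, 0x208 and 0x300 are invented. Saving LR in a local variable stands in for the STMFD/LDMFD pair in func1.

```c
#include <stdint.h>

typedef struct {
    uint32_t pc;   /* R15: address of the instruction being executed */
    uint32_t lr;   /* R14: link register                             */
} Cpu;

/* BL target: LR = address of the instruction after the BL, then jump. */
static void bl(Cpu *cpu, uint32_t target)
{
    cpu->lr = cpu->pc + 4;
    cpu->pc = target;
}

/* MOV PC, LR: return to wherever LR currently points. */
static void mov_pc_lr(Cpu *cpu)
{
    cpu->pc = cpu->lr;
}

/* Replays main -> func1 -> func2 -> back, and returns the address that
   func1 finally returns to. */
static uint32_t func1_return_address(int func1_saves_lr)
{
    const uint32_t BL_FUNC1_IN_MAIN  = 0x100;   /* hypothetical addresses */
    const uint32_t FUNC1             = 0x200;
    const uint32_t BL_FUNC2_IN_FUNC1 = 0x208;
    const uint32_t FUNC2             = 0x300;

    Cpu cpu = { BL_FUNC1_IN_MAIN, 0 };
    uint32_t stacked_lr = 0;

    bl(&cpu, FUNC1);                 /* main: BL func1 sets LR = 0x104          */
    if (func1_saves_lr)
        stacked_lr = cpu.lr;         /* STMFD sp!, {regs, lr}                    */
    cpu.pc = BL_FUNC2_IN_FUNC1;      /* func1 runs on to its BL func2            */
    bl(&cpu, FUNC2);                 /* BL func2 overwrites LR with 0x20C        */
    mov_pc_lr(&cpu);                 /* func2 (a leaf): MOV pc, lr back to func1 */
    if (func1_saves_lr)
        cpu.pc = stacked_lr;         /* LDMFD sp!, {regs, pc}                    */
    else
        mov_pc_lr(&cpu);             /* naive MOV pc, lr: LR is still 0x20C      */
    return cpu.pc;
}
```

func1_return_address(1) yields 0x104, the instruction after BL func1 in main, whereas func1_return_address(0) yields 0x20C, which sends func1 back into itself.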

Storing the Link Register — More Details

           ◆ We definitely need to stack the LR value for all non-leaf functions!
           ◆ Note the stack frame push and pop instructions at the start and end of func1
           ◆ Note how the LDMFD asks for the stored LR value to be put back into PC
           ◆ This causes an instantaneous return to main. Cute!
           ◆ This kind of trick can be used for ‘tail continued’ functions
           ◆ However, we usually have some ‘clearing up’ to do before we can return
           ◆ Let’s look at a real example of the situation in the previous diagram
           ◆ We’ll use strchr (see later slide) as our ‘leaf function’
           ◆ This program is a pin-number generator using a character as the ‘seed’

The leaf function version of strchr

           ◆ The index of the first occurrence of a given character within a
                   string is found using strchr
           ◆ For example the index of ‘o’ in ‘Hello’ is 4 (indexing from 0)
           ◆ The final coursework gives you a C version of strchr and asks
                   you to convert it to ARM assembler.
           ◆ Let’s assume that this routine has been written and that it expects, on
                   entry, that R1 contains the start address of the string
           ◆ Also assume that R2 contains the character to be searched for
           ◆ The index value will be returned in R0
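A C sketch of the index-returning strchr described above, with comments showing which register each value occupies in the assembler version. The name strchr_index is hypothetical (the C library's own strchr returns a pointer, not an index), and returning -1 when the character is absent is an assumption: the slide does not say what happens in that case.

```c
/* str corresponds to R1, ch to R2; the returned index corresponds to R0. */
int strchr_index(const char *str, char ch)
{
    for (int i = 0; str[i] != '\0'; i++)
        if (str[i] == ch)
            return i;     /* index of the first occurrence, counting from 0 */
    return -1;            /* assumed "not found" value */
}
```

strchr_index("Hello", 'o') gives 4, as in the example. Because mesg1 in the program that follows is a pangram, every lower-case letter is found in it; for instance 'q' is at index 4.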

The func1 function
           ◆ We save R4-R8 (which the APCS says we must preserve) and also LR
                   on the stack frame
           ◆ Main program: the PIN code issued is the current year (2011) plus the input
                   character's index position in the chosen string. It is returned in R0

   func1                  STMFD SP!, {R4-R8, LR}
                                ; strchr trashes R4, and code added here later may well
                                ; trash R5-R8 (which the APCS says we must save).
                                ; We now get ready to call strchr:
                                ; R1-R3 are untouched so should still be OK
                           BL strchr              ; expects str. address in R1 and ch. in R2
                           ADD R0, R0, R5         ; PIN = index + 2011 (R5 set up by main)
                           LDMFD SP!, {R4-R8, PC} ; restore R4-R8 and return result in R0

Global strings and main prog.
     ◆ Here are the global string declarations and the main program

   stack                  EQU          0x1000
   B main
   mesg1                  DEFB         "the quick brown fox jumps over the lazy dog\0"
   mesg2                  DEFB         "Please type a single lower-case alphabetic character: \0"
   mesg3                  DEFB         "\nOK - your pin number is \0"
   main                         ADR R0, mesg2
                                SWI 3            ; print the prompt
                                SWI 1            ; get the character from keyboard
                                MOV R2, R0       ; seed char now in R2
                                ADR R1, mesg1
                                ADR R0, mesg3
                                SWI 3            ; OK - your pin number is
                                LDR R5, =2011    ; 2011 cannot be encoded as a MOV immediate
                                MOV SP, #stack
                                BL func1
                                SWI 4            ; print out pin number
                                SWI 2

Notes (+ the stack picture)

           ◆ Registers R1, R2 and R5 contain vital information for func1
           ◆ Notice that R1 and R2 are passed on into strchr
           ◆ Inside func1, R5 (2011) is added to the index that strchr returns in R0
           ◆ Be clear that after the STMFD SP!, {R4-R8, LR} 'push' the
                   stack looks like this:

                                         ...       High addresses
                                         LR
                                         R8
                                         R7
                                         R6
                                         R5
                          SP →           R4        Low addresses
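The picture follows directly from the STMFD rules: SP is first lowered by 4 bytes per listed register, then the registers are stored lowest-numbered first from that lowest address upwards. A C sketch under stated assumptions: mem and reg are a hypothetical simulated memory and register file, stmfd_wb is a hypothetical helper for STMFD with write-back, and the small memory and starting SP are arbitrary (the real program uses 0x1000).

```c
#include <stdint.h>

#define MEM_WORDS 64
static uint32_t mem[MEM_WORDS];   /* simulated word-addressed memory          */
static uint32_t reg[16];          /* simulated R0-R15; R13 = SP, R14 = LR     */

/* STMFD Rn!, {list} (i.e. STMDB with write-back); bit i of reg_mask set
   means Ri is in the list. */
static void stmfd_wb(int rn, uint16_t reg_mask)
{
    int count = 0;
    for (int i = 0; i < 16; i++)              /* how many registers are listed? */
        if (reg_mask & (1u << i))
            count++;

    uint32_t lowest = reg[rn] - 4u * (uint32_t)count;   /* decrement before */
    uint32_t addr = lowest;
    for (int i = 0; i < 16; i++) {            /* lowest register -> lowest address */
        if (reg_mask & (1u << i)) {
            mem[addr / 4] = reg[i];
            addr += 4;
        }
    }
    reg[rn] = lowest;                         /* '!': SP points at R4's slot, the top of the full stack */
}
```

The list {R4-R8, LR} corresponds to the mask 0x41F0 (bits 4 to 8 and bit 14). Starting from SP = 0x100, stmfd_wb(13, 0x41F0) leaves SP at 0xE8 with R4 stored there, R5 to R8 above it, and LR in the highest word at 0xFC, exactly as in the picture.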
Coping with recursion

           ◆ A recursive function is one that calls itself.
           ◆ Recursive function theory is of enormous importance for Maths and CS
           ◆ There has to be a way of escaping from the recursion. Otherwise it will
                   go on for ever (consuming CPU time and memory)
           ◆ The classic example is the factorial function defined as follows:
                          factorial (n) = n × factorial (n − 1)     for n > 0
                          factorial (0) = 1
           ◆ Thus, factorial(4) = 4 × 3 × 2 × 1 × 1 = 24
           ◆ Here’s how it is expressed in C:
                          int factorial (int n) {
                                  if (n == 0) return 1;
                                  return n * factorial (n - 1);
                          }
More about recursion
           ◆ For more information see my ‘Notes on Recursion’ handout
           ◆ Let’s look at how to do recursion in ARM assembler
           ◆ And afterwards be very thankful that the C compiler lets us write
                   the version that was on the last slide!
           ◆ One of the simplest examples is factorial so let’s do that
           ◆ The stack will build up a lot of instances of n in separate stack
                   frames waiting to be consumed and multiplied together
           ◆ If a function calls itself it has to be written with extraordinary care
                   to be general enough to cope with:
                         Initial case when called from main
                         Final case when local instance of n has value 0
           ◆ The program given next takes its input argument in R1 and delivers its result in R0
The factorial program
   stack EQU              0x1000
   input EQU              6
   B main                                       ; must come before any DEFB data
   result DEFB            " factorial is \0"
   factorial              CMP R1, #0
                          MOVEQ R1, #1
                          BEQ exit              ; base case -- no need for new frame
                          STMFD SP!, {R1, LR}   ; save this instance of n and the return address
                          SUB R1, R1, #1
                          BL factorial
                          LDMFD SP!, {R1, LR}   ; restore R1 and LR
   exit                   MUL R0, R1, R0        ; answer builds up in R0 (older ARMs require Rd != Rm)
                          MOV PC, LR

   main                   MOV R1, #input
                          MOV SP, #stack
                          MOV R0, R1
                          SWI 4                 ; print the input number
                          ADR R0, result
                          SWI 3                 ; print " factorial is "
                          MOV R0, #1            ; the accumulator in R0 must start at 1
                          BL factorial
                          SWI 4                 ; print the result (720 for input 6)
                          SWI 2
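To see why the accumulator in R0 ends up holding n!, here is a C model that mirrors the structure of the assembler routine step by step rather than the textbook C version. It is a sketch: r0 and factorial_asm_model are hypothetical names, r0 stands for the R0 register (which the caller must set to 1 first), and the parameter n plays the role of R1, with C's own call frames doing the job of STMFD/LDMFD.

```c
#include <stdint.h>

static uint32_t r0;     /* models R0: the accumulator, caller sets it to 1 */

static void factorial_asm_model(uint32_t n)     /* n models R1 */
{
    if (n == 0) {
        n = 1;                       /* MOVEQ R1, #1 ; BEQ exit (no new frame) */
    } else {
        /* STMFD SP!, {R1, LR}: this call's n is kept in its own frame */
        factorial_asm_model(n - 1);  /* SUB R1, R1, #1 ; BL factorial */
        /* LDMFD SP!, {R1, LR}: n is this frame's saved value again */
    }
    r0 = r0 * n;                     /* exit: MUL R0, R1, R0 */
}
```

Setting r0 = 1 and calling factorial_asm_model(6) multiplies r0 by 1, 1, 2, 3, 4, 5, 6 as the frames unwind, leaving 720. The multiplications happen on the way back out of the recursion, which is why each frame must keep its own copy of n.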
Example stack frames
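   ◆ Diagrams below show:
                          (a) build up of simple stack frames for factorial (input 6: each
                              STMFD SP!, {R1, LR} pushes one instance of n and a return address)
                          (b) more general block diagram of a typical stack frame

                 (a)                                          (b)
                 ...        High addresses                    ...              High addresses
                 LR   (return into main)               FP →   LR
                 6                                            Saved registers
                 LR   (return into factorial)                 Local variables
                 5                                            ...              Low addresses
                 ...
                 LR
           SP →  1          Low addresses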

More about stack management

           ◆ Note the factorial stack contains different instances of n
           ◆ Generating correct code for stack-frame handling is the compiler’s job
           ◆ Things like factorial, Fibonacci and Ackermann's function are increasingly
                   tough tests of your compiler's handling of recursion!
           ◆ Stack frames can be cleared down by LDMFD ‘pop’ operations
           ◆ But also useful to have a Frame Pointer (FP) to start of current frame
                          (FP is R11 in the APCS scheme)
           ◆ Quick clear down of a frame can be done with MOV SP, FP
           ◆ If arguments and local variables are kept in stack frames, what about global
                   (and static) variables? Answer: they live at fixed addresses, reserved
                   with directives such as DEFW
           ◆ The start point of the static variable area can be kept in the static base
                   register (sb, which is R9 in the APCS scheme)
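To contrast the two kinds of storage, here is a C sketch in which a static counter is shared by every recursive invocation while each invocation has its own n. The names calls and factorial_counted are hypothetical, and exactly where a compiler places calls is implementation-dependent; the comments describe the typical arrangement in the address-space model shown earlier.

```c
/* 'calls' is static: one copy at a fixed address in the static data
   area, shared by every invocation, never part of any stack frame. */
static int calls = 0;

/* Each invocation gets its own n (in a register or in its own stack
   frame), so an outer call's n survives the inner recursive call. */
static int factorial_counted(int n)
{
    calls++;
    if (n == 0)
        return 1;
    return n * factorial_counted(n - 1);
}
```

factorial_counted(5) returns 120 and leaves calls equal to 6 (one increment for each of n = 5, 4, 3, 2, 1, 0), because all six invocations update the same static variable while each multiplies by its own n.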

Buffer Overflows
C programming session9 -
C programming  session9 -C programming  session9 -
C programming session9 -
02 - Introduction to the cdecl ABI and the x86 stack
02 - Introduction to the cdecl ABI and the x86 stack02 - Introduction to the cdecl ABI and the x86 stack
02 - Introduction to the cdecl ABI and the x86 stack


ARM procedure calling conventions and recursion

  • 1. Professor David Brailsford, School of Computer Science, University of Nottingham
    Extra material courtesy of: Jaume Bacardit, Thorsten Altenkirch and Liyang Hu (School of CS, Univ. of Nottm.); Steve Furber, Jim Garside and Pete Jinks (School of CS, Univ. of Manchester)
    Lecture 08: ARM Procedure Calling Conventions and Recursion
    Page 1 © David Brailsford 2011
  • 2. What is a procedure?
    ◆ A portion of code within a larger program. Often called:
        a subroutine or procedure in imperative languages like C
        a method in OO languages like Java
        a function in functional languages like Haskell
    ◆ Functions return a value, so some purists would say that a C function returning void is actually a procedure!
    ◆ Procedures are necessary for:
        reducing duplication of code and enabling re-use
        decomposing complex programs into manageable parts
    ◆ Procedures can call each other and can even call themselves
    ◆ What happens when we call a procedure?
        The caller is suspended; control passes to the callee
        The callee performs the requested task
        The callee returns control to the caller
  • 3. An Example in C
        #include <math.h>
        #include <stdio.h>

        int f (int x, int y) {
            return sqrt(x * x + y * y);
        }

        int main ( ) {
            printf ("f(5,12) = %d\n", f(5, 12));
        }
    ◆ The call to f in main jumps to a new piece of code but keeps track of where we were before
    ◆ When f finishes, it returns to the next instruction of the original code
  • 4. Basic procedure calls on ARM
    ◆ We already know that the BL instruction uses R14 as the link register (LR); this is where it stores the return address
    ◆ So in simple cases, all we need to do at the end of the procedure is
        MOV PC, LR
    ◆ In simple cases routines may be able to do their job solely with registers
    ◆ We've seen some simple examples of this with the strcpy and strchr procedures in the courseworks
    ◆ But we need conventions for register usage to avoid over-writing and misunderstandings
    ◆ Thus we have the APCS (ARM Procedure Call Standard) to guide us
  • 5. APCS Register Use Convention
        Register  APCS name  APCS role
        0         a1         Argument 1 / integer result / scratch register
        1         a2         Argument 2 / scratch register
        2         a3         Argument 3 / scratch register
        3         a4         Argument 4 / scratch register
        4         v1         Register variable 1
        5         v2         Register variable 2
        6         v3         Register variable 3
        7         v4         Register variable 4
        8         v5         Register variable 5
        9         sb/v6      Static base / register variable 6
        10        sl/v7      Stack limit / register variable 7
        11        fp         Frame pointer
        12        ip         Scratch register / specialist use by linker
        13        sp         Lower end of current stack frame
        14        lr         Link address / scratch register
        15        pc         Program counter
  • 6. Caller Saved Registers
    ◆ R0–R3 are used to pass arguments into a function
    ◆ But inside the function they may be used for any purpose (they are scratch registers). R0 often delivers back the result
    ◆ The caller must expect R0–R3 contents to be trashed (i.e. over-written) when a function call returns
    ◆ If the caller doesn't want this to happen then it must save R0–R3 contents beforehand (typically in memory)
    ◆ A typical simple leaf function (i.e. one which does not call any other function), e.g. strlen, needs only BL to jump in and MOV PC, LR to return, provided it uses only R0–R3
  • 7. Callee Saved Registers
    ◆ R4–R8 (R4–R10 in some variants of APCS) are registers which any called function is required to save
    ◆ Therefore they must have unchanged values when control returns to the calling routine (e.g. the main program)
    ◆ So if the called function needs these registers for extra workspace then it must save them (hence: callee saved)
    ◆ Of course, if they have been saved then they must be restored before returning to the caller
    ◆ Registers are limited in number. Memory has much larger capacity
    ◆ We need a disciplined way to save stuff in memory. The best solution is a stack
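To make the caller-saved/callee-saved split concrete, here is a small C model rather than ARM code. It is a sketch under stated assumptions: the register file is just an array, the names Regs, callee and is_callee_saved are hypothetical, and it follows the R4–R8 variant described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: the 16 ARM registers as a plain array. */
typedef struct {
    uint32_t r[16];
} Regs;

/* Registers 4-8 are callee-saved (4-10 in some APCS variants);
   0-3 are scratch, so the caller must save them if it still needs them. */
static int is_callee_saved(int reg) {
    assert(reg >= 0 && reg < 16);
    return reg >= 4 && reg <= 8;
}

/* A callee that follows the convention: it uses R0-R8 as workspace,
   restores R4-R8 before returning, and leaves its result in R0. */
static void callee(Regs *cpu) {
    uint32_t saved[5];
    memcpy(saved, &cpu->r[4], sizeof saved);   /* like STMFD SP!, {R4-R8, LR} */

    for (int i = 0; i <= 8; i++)
        cpu->r[i] = 0xDEADBEEFu;               /* scribble over the workspace */
    cpu->r[0] = 42;                            /* the result is delivered in R0 */

    memcpy(&cpu->r[4], saved, sizeof saved);   /* like LDMFD SP!, {R4-R8, PC} */
}
```

After a call, R4–R8 hold exactly what the caller left there, while R1–R3 contain junk: that asymmetry is the whole convention.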
  • 8. The Stack Concept
    ◆ A stack provides last in, first out storage
    ◆ It is a most important data structure in Computer Science
    ◆ Placing words on the stack is termed pushing
    ◆ Taking words off the stack is called popping
  • 9. Stack Implementation Choices
    ◆ Do we grow the stack downwards (descending addresses) or upwards (ascending addresses) in memory?
    ◆ We need a stack pointer register (SP) to hold the address of the top of the stack (SP is R13 on the ARM)
    ◆ But should R13 point to the topmost filled location (stack full)?
    ◆ Or should it point to the next empty location just beyond the top of the stack (stack empty)?
    ◆ There is no single ‘right answer’, but ARM, like many other systems, uses a “full descending” approach
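The “full descending” choice can be modelled in C as follows. This is a hypothetical sketch (Stack, stack_init, push, pop and peek are illustrative names, and memory is a small array): SP holds a byte address, is decremented before each store, and therefore always points at the topmost filled word.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical model of a full descending stack. sp is a byte address,
   like R13, and mem[] covers byte addresses 0 .. STACK_WORDS*4 - 1. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;
} Stack;

/* Like MOV SP, #stack: SP starts just above the highest stack word. */
static void stack_init(Stack *s) { s->sp = STACK_WORDS * 4; }

/* Push: decrement first, then store, so SP points at the topmost filled word. */
static void push(Stack *s, uint32_t value) {
    assert(s->sp >= 4);                       /* stack overflow */
    s->sp -= 4;
    s->mem[s->sp / 4] = value;
}

/* Pop: load from SP, then increment. */
static uint32_t pop(Stack *s) {
    assert(s->sp < STACK_WORDS * 4);          /* stack underflow */
    uint32_t value = s->mem[s->sp / 4];
    s->sp += 4;
    return value;
}

/* Because the stack is "full", SP addresses real data: the top item. */
static uint32_t peek(const Stack *s) {
    assert(s->sp < STACK_WORDS * 4);
    return s->mem[s->sp / 4];
}
```

An “empty” stack would swap the order inside push and pop (store then move, move then load), so SP would point at the next free slot instead.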
  • 10. Standard ARM C address space
    ◆ ARM C compilers generally arrange the memory address space as follows, from the top of memory downwards (regions are indented; the other labels mark boundaries):
        top of memory
            stack
        stack pointer (sp)
        stack limit (sl)
            unused
        top of heap
            heap
        top of application
            static data
        static base (sb)
            application's image (code)
        application base address
  • 11. Multiple Loads and Stores
    ◆ If we want to store register values on the stack in memory, it's good to do this en bloc
    ◆ This is much more efficient than lots of individual STR and LDR instructions
    ◆ ARM supplies Load and Store Multiple instructions (LDM and STM) for just this purpose
    ◆ Just like the pre-index modes for single LDR/STR instructions, we can use a base register as the indexer, with an option for write-back
    ◆ In a stack-based discipline we use SP (R13) as the memory indexer
    ◆ ARM assemblers support a range of suffixes for different stack regimes
    ◆ But the APCS uses the ‘full descending’ STMFD and LDMFD options
  • 12. Addressing modes and stack suffix options
    ◆ There are four addressing modes for multiple load/store instructions:
        IA: Increment After
        IB: Increment Before
        DA: Decrement After
        DB: Decrement Before
    ◆ Stack-orientated suffixes:
        Stack type          Push             Pop
        Full descending     STMFD (STMDB)    LDMFD (LDMIA)
        Full ascending      STMFA (STMIB)    LDMFA (LDMDA)
        Empty descending    STMED (STMDA)    LDMED (LDMIB)
        Empty ascending     STMEA (STMIA)    LDMEA (LDMDB)
    [Figure: memory layouts for LDMxx/STMxx R10, {R0, R1, R4} in modes (a) IA, (b) IB, (c) DA, (d) DB, from low to high addresses]
    ◆ We need only the first line of the above table (and diagrams (a) and (d))
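The four addressing modes differ only in where the block of words starts and in how far the base register moves on write-back. The C sketch below computes those addresses; Mode and multiple_addresses are hypothetical names, and the arithmetic follows the usual IA/IB/DA/DB definitions. It also shows why STMFD (STMDB) and LDMFD (LDMIA) on the same SP touch the same words.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { IA, IB, DA, DB } Mode;

/* Fill addr[0..n-1] with the word addresses an LDM/STM touches for an
   n-register list when the base register holds `base`, and return the
   value written back to the base register when "!" is used.
   addr[0] is the lowest address and belongs to the lowest-numbered register. */
static uint32_t multiple_addresses(Mode mode, uint32_t base, unsigned n, uint32_t addr[]) {
    assert(n >= 1 && n <= 16);
    uint32_t start;
    switch (mode) {
    case IA: start = base;              break;
    case IB: start = base + 4;          break;
    case DA: start = base - 4 * n + 4;  break;
    case DB: start = base - 4 * n;      break;
    default: assert(0); return base;
    }
    for (unsigned i = 0; i < n; i++)
        addr[i] = start + 4 * i;
    /* increment modes move the base up by 4n, decrement modes move it down */
    return (mode == IA || mode == IB) ? base + 4 * n : base - 4 * n;
}
```

For example, a three-register STMFD from SP = 0x1000 writes 0xFF4, 0xFF8 and 0xFFC and leaves SP at 0xFF4; the matching LDMFD reads the same three words and restores SP to 0x1000.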
  • 13. Multiple Loads and Stores: Details
    ◆ In the Full Descending scheme a multiple store (STMFD) corresponds to pushing register contents onto the stack
    ◆ Conversely a multiple load (LDMFD) corresponds to a pop from the stack
    ◆ These operations could use the mnemonics STMDB and LDMIA if preferred
    ◆ Let's assume we want to retrieve data from the stack into registers
    ◆ Consider LDMFD SP, {R0-R3}. Here SP holds the base address
    ◆ The overall effect is equivalent to:
        LDR R0, [SP]
        LDR R1, [SP, #4]
        LDR R2, [SP, #8]
        LDR R3, [SP, #12]
    ◆ But notice that in the above sequence SP itself has not been changed
    ◆ If we want SP to be altered (and we usually will) we write
        LDMFD SP!, {R0-R3}
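The effect of STMFD SP! and of LDMFD SP with and without ‘!’ can be simulated in C. In this hypothetical sketch (Cpu, word_at, stmfd_wb and ldmfd are illustrative names), a register list is a 16-bit mask in which bit k selects Rk, memory is a 1 KB array, and SP may not appear in the list.

```c
#include <assert.h>
#include <stdint.h>

enum { REG_SP = 13, MEM_WORDS = 256 };

/* Hypothetical machine: 16 registers and 1 KB of word memory. */
typedef struct {
    uint32_t r[16];
    uint32_t mem[MEM_WORDS];
} Cpu;

static uint32_t *word_at(Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &c->mem[addr / 4];
}

/* STMFD SP!, {list}: the lowest-numbered register goes to the lowest
   address, and SP ends up pointing at it (decrement before). */
static void stmfd_wb(Cpu *c, uint16_t list) {
    assert(!(list & (1u << REG_SP)));
    unsigned n = 0;
    for (int k = 0; k < 16; k++)
        n += (list >> k) & 1u;
    uint32_t addr = c->r[REG_SP] - 4 * n;
    c->r[REG_SP] = addr;
    for (int k = 0; k < 16; k++)
        if (list & (1u << k)) {
            *word_at(c, addr) = c->r[k];
            addr += 4;
        }
}

/* LDMFD SP{!}, {list}: load upwards from SP (increment after).
   SP changes only when `writeback` is set, i.e. the "!" form. */
static void ldmfd(Cpu *c, uint16_t list, int writeback) {
    assert(!(list & (1u << REG_SP)));
    uint32_t addr = c->r[REG_SP];
    for (int k = 0; k < 16; k++)
        if (list & (1u << k)) {
            c->r[k] = *word_at(c, addr);
            addr += 4;
        }
    if (writeback)
        c->r[REG_SP] = addr;
}
```

Calling ldmfd with writeback = 0 corresponds to the four separate LDRs at offsets 0, 4, 8 and 12: the registers are reloaded but SP stays where it was.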
  • 14. Stack Frames and Link Registers: Details
    ◆ Data stored on the stack as part of a function call forms part of the stack frame for that function invocation
    ◆ A stack frame can hold stored register values, and also allocated space for local variables declared within the function
    ◆ The stack frame also stores ‘housekeeping’ information, e.g. the current value of the LR (we'll see why shortly)
    ◆ When a procedure is exited and we return to its caller, the whole stack frame content must be popped
    ◆ This is why local variables vanish once a function is exited
    ◆ When doing Load/Store Multiple we generally give a list of registers in curly braces, e.g. LDMFD SP, {R1-R4, LR}
    ◆ Remember: the lowest-address item goes to the lowest-numbered register
  • 15. Storing the Link Register: Details
    ◆ Recall: if we are in a leaf function (one which doesn't call anything else) we don't need to store the LR. But in all other cases we do! Why?
        main      ...
                  BL func1
                  ...

        func1     STMFD sp!, {regs, lr}
                  ...
                  BL func2
                  ...
                  LDMFD sp!, {regs, pc}

        func2     ...
                  ...
                  MOV pc, lr
    ◆ The BL func1 in main stores the return address in LR (R14)
    ◆ But then the BL func2 inside func1 overwrites it
    ◆ So func2 returns to func1 correctly, but if func1 had not saved LR and returned to main using MOV PC, LR, then LR would be wrong: it would still point back into func1!
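The link-register problem can be reproduced with a toy C model in which return ‘addresses’ are just labels. Everything here (Machine, bl, func1, func2, RET_IN_MAIN, RET_IN_FUNC1) is hypothetical; the save_lr flag switches between func1 stacking LR as in the listing above and func1 simply executing MOV pc, lr.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical return "addresses": just labels for where a call returns to. */
enum { RET_IN_MAIN = 100, RET_IN_FUNC1 = 200 };

typedef struct {
    uint32_t lr;          /* the link register, R14 */
    uint32_t stack[8];    /* saved LR values, standing in for stack frames */
    int depth;
} Machine;

/* BL: store the return address in LR (then jump). */
static void bl(Machine *m, uint32_t return_addr) { m->lr = return_addr; }

/* func2 is a leaf: MOV pc, lr simply returns to whatever LR holds. */
static uint32_t func2(const Machine *m) { return m->lr; }

/* func1 calls func2; the result is the address func1 itself returns to. */
static uint32_t func1(Machine *m, int save_lr) {
    if (save_lr) {
        assert(m->depth < 8);
        m->stack[m->depth++] = m->lr;         /* STMFD sp!, {regs, lr} */
    }
    bl(m, RET_IN_FUNC1);                       /* BL func2 overwrites LR */
    assert(func2(m) == RET_IN_FUNC1);          /* func2 returns to func1 correctly */
    if (save_lr)
        return m->stack[--m->depth];           /* LDMFD sp!, {regs, pc} */
    return m->lr;                              /* MOV pc, lr with a stale LR */
}
```

With save_lr set, func1 returns to RET_IN_MAIN; without it, func1 "returns" to RET_IN_FUNC1, i.e. back into itself.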
  • 16. Storing the Link Register: More Details
    ◆ We definitely need to stack the LR value for all non-leaf functions!
    ◆ Note the stack frame push and pop instructions at the start and end of func1
    ◆ Note how the LDMFD asks that the stored LR value be put back into the PC
    ◆ This causes an immediate return to main. Cute!
    ◆ This kind of trick can be used for ‘tail continued’ functions
    ◆ However, we usually have some ‘clearing up’ to do before we can return
    ◆ Let's look at a real example of the situation in the previous diagram
    ◆ We'll use strchr (see the next slide) as our ‘leaf function’
    ◆ The program is a PIN-number generator using a character as the ‘seed’
  • 17. The leaf function version of strchr
    ◆ strchr finds the index of the first occurrence of a given character within a string
    ◆ For example, the index of ‘o’ in ‘Hello’ is 4 (indexing from 0)
    ◆ The final coursework gives you a C version of strchr and asks you to convert it to ARM assembler
    ◆ Let's assume that this routine has been written and that it expects, on entry, that R1 contains the start address of the string
    ◆ Also assume that R2 contains the character to be searched for
    ◆ The index value will be returned in R0
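A C counterpart of the leaf routine might look like this. It is only a sketch: the coursework's own C version is not reproduced here, strchr_index is a hypothetical name chosen to avoid clashing with the standard library strchr, and returning -1 when the character is absent is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Return the 0-based index of the first occurrence of ch in str,
   or -1 if ch does not occur (the not-found value is an assumption). */
static int strchr_index(const char *str, char ch) {
    assert(str != NULL);
    for (int i = 0; str[i] != '\0'; i++)
        if (str[i] == ch)
            return i;
    return -1;
}
```

Like the assembler version, this calls nothing else, so it would be a leaf function: a hand translation can keep the string pointer, the character and the index entirely in registers.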
  • 18. The func1 function
    ◆ On the stack frame we save R4-R8 (which APCS says we must preserve) and also LR
    ◆ The PIN code issued by the main program is the current year number (2011) plus the input character's index position in the chosen string. It is returned in R0
        func1   STMFD SP!, {R4-R8, LR}
                ; strchr trashes R4, and lots of other stuff may be added
                ; here, later, that may well trash R5-R8 (which APCS says
                ; we must save). We now get ready to call strchr
                ; R1-R3 untouched so should be OK
                BL    strchr            ; expects str. address in R1 and ch. in R2
                ADD   R0, R0, R5        ; index + 2011 (R5 is set up by main)
                LDMFD SP!, {R4-R8, PC}  ; restore R4-R8 and return result in R0
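The PIN rule itself can be checked in C. This sketch (pin_for and MESG1 are hypothetical names) uses the standard library strchr, which returns a pointer rather than an index, so the index is the pointer difference; the -1 result for an absent character is an assumption.

```c
#include <assert.h>
#include <string.h>

/* The chosen string from the main program. */
static const char MESG1[] = "the quick brown fox jumps over the lazy dog";

/* PIN = 2011 + index of the seed character in MESG1,
   or -1 if the character is absent (an assumption). */
static int pin_for(char seed) {
    if (seed == '\0')
        return -1;                    /* strchr would match the terminator */
    const char *p = strchr(MESG1, seed);
    if (p == NULL)
        return -1;
    return 2011 + (int)(p - MESG1);
}
```

Because the chosen string is a pangram, every lower-case letter is found, so valid seeds give PINs from 2011 (‘t’, index 0) to 2053 (‘g’, index 42).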
  • 19. Global strings and main prog.
    ◆ Here are the global string declarations and the main program
        stack   EQU   0x1000
                B     main
        mesg1   DEFB  "the quick brown fox jumps over the lazy dog\0"
        mesg2   DEFB  "Please type a single lower-case alphabetic character: \0"
        mesg3   DEFB  "\nOK - your pin number is \0"
                ALIGN
        main    ADR   R0, mesg2
                SWI   3
                SWI   1            ; get the character from keyboard
                MOV   R2, R0       ; seed char now in R2
                ADR   R1, mesg1
                ADR   R0, mesg3
                SWI   3            ; OK - your pin number is
                LDR   R5, =2011    ; not possible with a MOV
                MOV   SP, #stack
                BL    func1
                SWI   4            ; print out pin number
                SWI   2
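The comment on LDR R5, =2011 reflects the classic 32-bit ARM rule for data-processing immediates: the operand must be an 8-bit value rotated right by an even number of bits. The C sketch below (rotl32 and is_arm_immediate are hypothetical names) tests that rule; it does not cover Thumb encodings. For values that fail, the assembler's LDR Rd, =value pseudo-instruction loads the constant from a nearby literal pool instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by s bits (0 <= s < 32). */
static uint32_t rotl32(uint32_t v, unsigned s) {
    assert(s < 32);
    return s == 0 ? v : (v << s) | (v >> (32 - s));
}

/* True if v equals some 8-bit value rotated right by 0, 2, ..., 30 bits:
   undo each possible rotation and check whether the rest fits in 8 bits. */
static bool is_arm_immediate(uint32_t v) {
    for (unsigned rot = 0; rot < 32; rot += 2)
        if (rotl32(v, rot) <= 0xFFu)
            return true;
    return false;
}
```

2011 (0x7DB) has set bits spread over 11 bit positions, so no rotation squeezes it into 8 bits, whereas 0x1000 (used by MOV SP, #stack) can be encoded, for example as 1 rotated right by 20.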
  • 20. Notes (+ the stack picture)
    ◆ Registers R1, R2 and R5 contain vital information for func1
    ◆ Notice that R1 and R2 are passed on into strchr
    ◆ Inside func1, R5 (2011) is added to the value that strchr returns in R0
    ◆ Be clear that after the STMFD SP!, {R4-R8, LR} ‘push’ the stack looks like this:
        ...          (high addresses)
        LR
        R8
        R7
        R6
        R5
        R4   ← SP    (low addresses)
  • 21. Coping with recursion
    ◆ A recursive function is one that calls itself
    ◆ Recursive function theory is of enormous importance for Maths and CS
    ◆ There has to be a way of escaping from the recursion. Otherwise it will go on for ever (consuming CPU time and memory)
    ◆ The classic example is the factorial function, defined as follows:
        factorial(n) = n × factorial(n − 1)   for n > 0
        factorial(0) = 1
    ◆ Thus, factorial(4) = 4 × 3 × 2 × 1 × 1 = 24
    ◆ Here's how it is expressed in C:
        int factorial (int n) {
            if (n == 0)
                return 1;
            else
                return n * factorial (n - 1);
        }
  • 22. More about recursion
    ◆ For more information see my ‘Notes on Recursion’ handout
    ◆ Let's look at how to do recursion in ARM assembler
    ◆ And afterwards be very thankful that the C compiler lets us write the version on the last slide!
    ◆ One of the simplest examples is factorial, so let's do that
    ◆ The stack will build up a lot of instances of n in separate stack frames, waiting to be consumed and multiplied together
    ◆ If a function calls itself it has to be written with extraordinary care, to be general enough to cope with:
        the initial case, when called from main
        the final case, when the local instance of n has value 0
    ◆ The program we give next takes its input argument in R1 and delivers its result in R0
  • 23. The factorial program
        stack     EQU   0x1000
        input     EQU   6
                  B     main
        result    DEFB  " factorial is \0"
                  ALIGN
        factorial CMP   R1, #0
                  MOVEQ R1, #1
                  BEQ   exit            ; base case -- no need for new frame
                  STMFD SP!, {R1, LR}
                  SUB   R1, R1, #1
                  BL    factorial
                  LDMFD SP!, {R1, LR}   ; restore R1 and LR
        exit      MUL   R0, R1, R0      ; answer builds up in R0 (Rd must differ from Rm before ARMv6)
                  MOV   PC, LR
        main      MOV   R1, #input
                  MOV   SP, #stack
                  MOV   R0, R1
                  SWI   4
                  ADR   R0, result
                  SWI   3
                  MOV   R0, #1
                  BL    factorial
                  SWI   4
                  SWI   2
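The way the assembly version stacks one copy of n per call and multiplies on the way back out can be mimicked in C with an explicit array in place of the machine stack. This is a hypothetical model (factorial_frames and peak_frames are illustrative names), not compiler output; it also reports how many frames were live at the deepest point.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 20   /* 20! is the largest factorial that fits in 64 bits */

/* The first loop plays the part of the recursive calls, pushing each n
   (the stacked R1) until the base case; the second loop plays the
   returns, popping each n and multiplying it into the answer (R0). */
static unsigned long long factorial_frames(unsigned n, int *peak_frames) {
    unsigned stack[MAX_N];           /* one saved n per stack frame */
    int frames = 0;
    assert(n <= MAX_N);

    while (n != 0) {                 /* STMFD SP!, {R1, LR}; SUB R1, R1, #1; BL factorial */
        stack[frames++] = n;
        n--;
    }
    if (peak_frames != NULL)
        *peak_frames = frames;       /* the base case pushes no frame */

    unsigned long long r0 = 1;       /* main sets R0 to 1; the base case multiplies by 1 */
    while (frames > 0)               /* LDMFD SP!, {R1, LR}; MUL; MOV PC, LR */
        r0 *= stack[--frames];
    return r0;
}
```

For input 6 this gives 720 with 6 frames live at the deepest point; since each STMFD SP!, {R1, LR} pushes 8 bytes, SP then sits 48 bytes below 0x1000.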
  • 24. Example stack frames
    ◆ The diagrams below show:
        (a) the build-up of simple stack frames for factorial
        (b) a more general block diagram of a typical stack frame
    [Figure (a): factorial frames from high to low addresses: LR, 3, LR, 2, LR, 1, with SP pointing at the lowest. Figure (b): a typical frame from high to low addresses: FP at the start of the frame, LR, saved registers, local variables, SP]
  • 25. More about stack management
    ◆ Note that the factorial stack contains different instances of n
    ◆ Generating correct code for stack-frame handling is the compiler's job
    ◆ Things like factorial, Fibonacci and Ackermann are increasingly tough tests of your compiler's handling of recursion!
    ◆ Stack frames can be cleared down by LDMFD ‘pop’ operations
    ◆ But it is also useful to have a Frame Pointer (FP) to the start of the current frame (FP is R11 in the APCS scheme)
    ◆ A quick clear-down of a frame can be done with MOV SP, FP
    ◆ If arguments and local variables are kept on stack frames, what about global (and static) variables? Answer: you need something like DEFW
    ◆ The start point of the static variable area can be kept in the static base register (R9 on ARM)
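For readers who want to try the harder recursion tests mentioned above, here are plain recursive C versions of Fibonacci and the Ackermann function. These use the standard textbook definitions (fib(0) = 0, fib(1) = 1, and the two-argument Ackermann–Péter form), chosen as an assumption since the slides do not define them; fib and ack are hypothetical names.

```c
#include <assert.h>

/* Naive doubly recursive Fibonacci: two calls per invocation. */
static unsigned long fib(unsigned n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* Ackermann-Peter function: a call's arguments can themselves be a
   recursive call, so depth and call count explode even for tiny inputs. */
static unsigned long ack(unsigned long m, unsigned long n) {
    assert(m <= 3);                  /* larger m is impractical to evaluate */
    if (m == 0)
        return n + 1;
    if (n == 0)
        return ack(m - 1, 1);
    return ack(m - 1, ack(m, n - 1));
}
```

Factorial needs one frame per level, Fibonacci makes two calls per frame, and Ackermann nests a call inside another call's argument, which is why each is a progressively harder test of a compiler's stack-frame handling.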