Arm chap 3 last


Published in: Education, Technology

  1. ARM (Advanced RISC Machines): ARM Instruction Set
  2. Stack Processing
     • A stack is usually implemented as a linear data structure that grows up (an ascending stack) or down (a descending stack) in memory.
     • A stack pointer holds the address of the current top of the stack, either by pointing to the last valid data item pushed onto the stack (a full stack) or by pointing to the vacant slot where the next data item will be placed (an empty stack).
     • ARM multiple register transfer instructions support all four forms of stack:
       – Full ascending: grows up; base register points to the highest address containing a valid item
       – Empty ascending: grows up; base register points to the first empty location above the stack
       – Full descending: grows down; base register points to the lowest address containing a valid item
       – Empty descending: grows down; base register points to the first empty location below the stack
  3. Stack Processing with Load-Store Multiple
     • The ARM architecture uses the load-store multiple instructions to carry out stack operations.
     • The pop operation (removing data from a stack) uses a load multiple instruction; similarly, the push operation (placing data onto the stack) uses a store multiple instruction.
     • When using a stack you have to decide whether the stack will grow up or down in memory. A stack is either ascending (A) or descending (D). Ascending stacks grow towards higher memory addresses; in contrast, descending stacks grow towards lower memory addresses.
     • When you use a full stack (F), the stack pointer sp points to an address that is the last used or full location (i.e., sp points to the last item on the stack). In contrast, if you use an empty stack (E), the sp points to an address that is the first unused or empty location (i.e., it points after the last item on the stack).
     • There are a number of load-store multiple addressing mode aliases available to support stack operations (see Table). Next to the pop column is the actual load multiple instruction equivalent.
  4. For example, a full ascending stack would have the notation FA appended to the load multiple instruction: LDMFA. This would be translated into an LDMDA instruction.
  5. Example 20
     The STMFD instruction pushes registers onto the stack, updating the sp. The figure shows a push onto a full descending stack. When the stack grows, the stack pointer points to the last full entry in the stack.
       PRE   r1 = 0x00000002
             r4 = 0x00000003
             sp = 0x00080014
             STMFD sp!, {r1,r4}
       POST  r1 = 0x00000002
             r4 = 0x00000003
             sp = 0x0008000c
     NOTE: The stack pointer points to the last full entry in the stack.
  6. Example 21
     In contrast, the next figure shows a push operation on an empty stack using the STMED instruction. The STMED instruction pushes the registers onto the stack but updates register sp to point to the next empty location.
       PRE   r1 = 0x00000002
             r4 = 0x00000003
             sp = 0x00080010
             STMED sp!, {r1,r4}
       POST  r1 = 0x00000002
             r4 = 0x00000003
             sp = 0x00080008
     NOTE: The sp points to the next empty location.
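The two push examples above can be checked with a small behavioural model. This is a sketch in Python, not ARM code: the function name `stm_push`, the dictionary-based memory, and the register dictionary are all assumptions made for illustration. The mapping from stack aliases to the underlying store-multiple modes (FD to DB, ED to DA, FA to IB, EA to IA) is the standard ARM one.

```python
# Push (STM) aliases: stack type -> underlying store-multiple addressing mode.
STM_STACK_ALIASES = {"FD": "DB", "ED": "DA", "FA": "IB", "EA": "IA"}


def stm_push(memory, regs, sp, reg_names, stack_type):
    """Model STM<stack_type> sp!, {reg_names} and return the updated sp.

    memory is a dict mapping word addresses to 32-bit values (hypothetical model).
    """
    mode = STM_STACK_ALIASES[stack_type]
    size = 4 * len(reg_names)
    if mode == "DB":      # decrement before: sp ends on the last full entry
        start, new_sp = sp - size, sp - size
    elif mode == "DA":    # decrement after: sp ends on the next empty slot
        start, new_sp = sp - size + 4, sp - size
    elif mode == "IB":    # increment before
        start, new_sp = sp + 4, sp + size
    else:                 # "IA", increment after
        start, new_sp = sp, sp + size
    # The lowest-numbered register always goes to the lowest address.
    ordered = sorted(reg_names, key=lambda name: int(name[1:]))
    for offset, name in enumerate(ordered):
        memory[start + 4 * offset] = regs[name]
    return new_sp


registers = {"r1": 0x00000002, "r4": 0x00000003}
full_descending = {}
empty_descending = {}
sp_after_stmfd = stm_push(full_descending, registers, 0x00080014, ["r1", "r4"], "FD")
sp_after_stmed = stm_push(empty_descending, registers, 0x00080010, ["r1", "r4"], "ED")
```

With the PRE values from Examples 20 and 21, the model returns 0x0008000c for STMFD and 0x00080008 for STMED, matching the POST values on the slides.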
  7. Block Copy Addressing
  8. Stack Examples
     (Figure: memory contents and new SP position after STMFD sp!, {r0,r1,r3-r5}; STMED sp!, {r0,r1,r3-r5}; STMFA sp!, {r0,r1,r3-r5}; and STMEA sp!, {r0,r1,r3-r5}. The old SP is at 0x400 in each case; the diagram spans addresses 0x3e8 to 0x418.)
  9. Load-Store Instructions
     • Three basic forms to move data between ARM registers and memory:
       – Single register load and store instruction
         • A byte, a 16-bit half-word, a 32-bit word
       – Multiple register load and store instruction
         • To save or restore workspace registers for procedure entry and exit
         • To copy blocks of data
       – Single register swap instruction
         • A value in a register to be exchanged with a value in memory
         • To implement semaphores to ensure mutual exclusion on accesses
  10. Single Register Swap Instruction
     • The swap instruction is a special case of a load-store instruction. It swaps the contents of memory with the contents of a register.
     • This instruction is an atomic operation: it reads and writes a location in the same bus operation, preventing any other instruction from reading or writing to that location until it completes.
     • Swap cannot be interrupted by any other instruction or any other bus access. We say the system “holds the bus” until the transaction is complete.
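  11. Syntax: SWP{B}{<cond>} Rd,Rm,[Rn]
     Operation: Rd <- [Rn], [Rn] <- Rm
     (Figure: step 1 reads the memory location addressed by Rn into a temporary; step 2 writes Rm to that memory location; step 3 copies the temporary into Rd.)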
  12. Example 21
     The swap instruction loads a word from memory into register r0 and overwrites the memory with register r1.
       PRE   mem32[0x9000] = 0x12345678
             r0 = 0x00000000
             r1 = 0x11112222
             r2 = 0x00009000
             SWP r0, r1, [r2]
       POST  mem32[0x9000] = 0x11112222
             r0 = 0x12345678
             r1 = 0x11112222
             r2 = 0x00009000
     This instruction is particularly useful when implementing semaphores and mutual exclusion in an operating system. You can see from the syntax that this instruction can also have a byte size qualifier B, so this instruction allows for both a word and a byte swap.
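The register transfer in the swap example can be traced with a short Python model. It is only a sketch: `swp`, the register dictionary, and the word-addressed memory dictionary are hypothetical names chosen for illustration, and the model says nothing about the bus locking that makes the real instruction atomic.

```python
def swp(memory, regs, rd, rm, rn):
    """Model SWP rd, rm, [rn] on a word-addressed memory dict."""
    address = regs[rn]
    temporary = memory[address]    # step 1: read the memory word
    memory[address] = regs[rm]     # step 2: overwrite it with Rm
    regs[rd] = temporary           # step 3: deliver the old word to Rd


memory = {0x9000: 0x12345678}
regs = {"r0": 0x00000000, "r1": 0x11112222, "r2": 0x00009000}
swp(memory, regs, "r0", "r1", "r2")
```

Reading the memory word into a temporary first is what allows Rd and Rm to be the same register, in which case the instruction exchanges the register and the memory word.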
  13. (Figure: register and memory contents before and after SWP r0, r1, [r2]. Before: mem32[0x9000] = 0x12345678, r0 = 0x00000000, r1 = 0x11112222, r2 = 0x00009000. After: mem32[0x9000] = 0x11112222, r0 = 0x12345678, r1 and r2 unchanged. The neighbouring memory words shown in the diagram are not modified.)
  14. Concept of SEMAPHORE
     • In computer science, a semaphore is a variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment.
     • A semaphore, in its most basic form, is a protected integer variable that can facilitate and restrict access to shared resources in a multi-processing environment.
     • The two most common kinds of semaphores are counting semaphores and binary semaphores. Counting semaphores represent multiple resources, while binary semaphores, as the name implies, represent two possible states (generally 0 or 1; locked or unlocked).
  15. Semaphore Operations
     • A semaphore can only be accessed using the following operations: wait() and signal() (signal() is also called release()).
     • wait() is called when a process wants access to a resource. In a restaurant analogy, this is equivalent to an arriving customer trying to get an open table. If there is an open table, that is, the semaphore is greater than zero, the customer can take that resource and sit at the table. If there is no open table and the semaphore is zero, that process must wait until a resource becomes available.
     • signal() is called when a process is done using a resource, that is, when the patron has finished the meal.
     • The following is an implementation of this counting semaphore (where the value can be greater than 1):
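A minimal sketch of such a counting semaphore, written in Python rather than taken from the slides. The class name `CountingSemaphore` and its internals are assumptions for illustration; only the `wait()` and `signal()` operations come from the text. It uses the standard library's `threading.Condition` to block callers while the count is zero.

```python
import threading


class CountingSemaphore:
    """Illustrative counting semaphore with wait() and signal()."""

    def __init__(self, value=1):
        self._value = value                  # number of free resources
        self._condition = threading.Condition()

    def wait(self):
        """Take one resource, blocking while none is available."""
        with self._condition:
            while self._value == 0:
                self._condition.wait()
            self._value -= 1

    def signal(self):
        """Return one resource and wake one waiting caller, if any."""
        with self._condition:
            self._value += 1
            self._condition.notify()

    def value(self):
        with self._condition:
            return self._value


tables = CountingSemaphore(2)   # two open tables
tables.wait()                   # first customer sits down
tables.wait()                   # second customer sits down; count is now 0
tables.signal()                 # one customer leaves; count is back to 1
```

Initialising the count to 1 gives a binary semaphore, which is the case used for mutual exclusion.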
  16. Binary Semaphore and Mutual Exclusion
     • In this implementation, a process wanting to enter its critical section has to acquire the binary semaphore, which then gives it mutual exclusion until it signals that it is done.
     • For example, we have semaphore s, initially 1, and two processes, P1 and P2, that want to enter their critical sections at the same time. P1 first calls wait(s). The value of s is decremented to 0 and P1 enters its critical section. While P1 is in its critical section, P2 calls wait(s), but because the value of s is zero, it must wait until P1 finishes its critical section and executes signal(s).
     • When P1 calls signal(s), the value of s is incremented to 1, and P2 can then proceed to execute in its critical section (after decrementing the semaphore again). Mutual exclusion is achieved because only one process can be in its critical section at any time.
  17. Example 22
     This example shows a simple data guard that can be used to protect data from being written by another task. The SWP instruction “holds the bus” until the transaction is complete.
       loop
             LDR r1, =semaphore    ; load the address of the semaphore
             MOV r2, #1
             SWP r3, r2, [r1]      ; hold the bus until complete
             CMP r3, #1
             BEQ loop
     The address pointed to by the semaphore contains either the value 0 or 1. When the semaphore equals 1, the service in question is being used by another process. The routine will continue to loop around until the service is released by the other process, in other words, when the semaphore address location contains the value 0.
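The logic of the guard loop can be followed in a short Python model. This is a sketch under stated assumptions: `swap_word`, `try_acquire`, and `release` are hypothetical helper names, memory is a plain dictionary, and a single call to `try_acquire` stands for one pass through the loop, so it does not reproduce the atomicity the real SWP gets from holding the bus.

```python
def swap_word(memory, address, new_value):
    """Model SWP: store new_value and return the word that was there."""
    old_value = memory[address]
    memory[address] = new_value
    return old_value


def try_acquire(memory, semaphore_address):
    """One pass of the guard loop; True means the service was free."""
    previous = swap_word(memory, semaphore_address, 1)   # SWP r3, r2, [r1]
    return previous == 0                                 # CMP r3, #1 / BEQ loop


def release(memory, semaphore_address):
    """Mark the service as free again by writing 0 to the semaphore."""
    memory[semaphore_address] = 0


memory = {0x9000: 0}                         # semaphore starts free
first_attempt = try_acquire(memory, 0x9000)  # takes the guard
second_attempt = try_acquire(memory, 0x9000) # fails: guard already held
release(memory, 0x9000)
```

Note that a failed attempt writes 1 over a location that already holds 1, so retrying never disturbs the current owner.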
  18. ARM instructions by instruction class
     1. Data Processing Instructions
     2. Branch Instructions
     3. Load-Store Instructions
     4. Software Interrupt Instruction
     5. Program Status Register Instructions
  19. Software Interrupt Instruction
     Introduction
     • The software interrupt instruction is used for calls to the operating system and is often called a supervisor call.
     • It puts the processor into supervisor mode and begins executing instructions from address 0x08.
     Binary encoding
       Bits 31-28: COND
       Bits 27-24: OPCODE
       Bits 23-0:  24-bit (interpreted) immediate
  20. Software Interrupt Instruction: Description
     Binary encoding: bits 31-28 COND, bits 27-24 OPCODE, bits 23-0 24-bit (interpreted) immediate.
     Description: To return to the instruction after the SWI, the system routine must not only copy r14_svc back into the PC but also restore the CPSR from SPSR_svc.
  21. Syntax: SWI{<cond>} SWI_number
  22. Example 23
     Here we have a simple example of an SWI call with SWI number 0x123456, used by ARM toolkits as a debugging SWI. Typically the SWI instruction is executed in user mode.
       PRE   cpsr = nzcVqift_USER
             pc = 0x00008000
             lr = 0x003fffff ; lr = r14
             r0 = 0x12
             0x00008000 SWI 0x123456
       POST  cpsr = nzcVqIft_SVC
             spsr = nzcVqift_USER
             pc = 0x00000008
             lr = 0x00008004
             r0 = 0x12
     Since SWI instructions are used to call operating system routines, you need some form of parameter passing. This is achieved using registers. In this example, register r0 is used to pass the parameter 0x12. The return values are also passed back via registers.
     Code called the SWI handler is required to process the SWI call. The handler obtains the SWI number using the address of the executed instruction, which is calculated from the link register lr.
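How a handler recovers the SWI number can be sketched in Python. The function name `swi_number` and the dictionary memory are assumptions for illustration, and the sketch covers ARM state only, where lr holds the address of the instruction after the SWI, so the SWI itself sits at lr - 4 and its number is the low 24 bits of the instruction word.

```python
SWI_NUMBER_MASK = 0x00FFFFFF   # bits 23-0 of the instruction word


def swi_number(memory, lr):
    """Return the 24-bit SWI number of the instruction that set lr (ARM state)."""
    instruction = memory[lr - 4]          # the SWI is one word before the return address
    return instruction & SWI_NUMBER_MASK  # drop the condition and opcode fields


# SWI 0x123456 with the "always" condition encodes as 0xEF123456.
memory = {0x00008000: 0xEF123456}
number = swi_number(memory, 0x00008004)
```

With the values from Example 23 (SWI at 0x00008000, lr = 0x00008004), the model yields 0x123456.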
  23. ARM instructions by instruction class
     1. Data Processing Instructions
     2. Branch Instructions
     3. Load-Store Instructions
     4. Software Interrupt Instruction
     5. Program Status Register Instructions (MSR, MRS): self-study; refer to Steve Furber.
  24. Byte organizations
     • Little-endian mode: the lowest-order byte resides in the low-order bits of the word
     • Big-endian mode: the lowest-order byte is stored in the highest bits of the word
  25. Byte organizations (figure)
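The two byte orders can be compared directly with Python's standard `struct` module. This is an illustrative sketch; the helper name `word_to_bytes` and the sample word are assumptions, and the byte sequence shown is in order of increasing memory address.

```python
import struct


def word_to_bytes(word, little_endian=True):
    """Return the four bytes of a 32-bit word in increasing address order."""
    byte_order = "<" if little_endian else ">"
    return struct.pack(byte_order + "I", word)


word = 0x12345678
little = word_to_bytes(word, little_endian=True)   # 0x78 is stored first
big = word_to_bytes(word, little_endian=False)     # 0x12 is stored first
```

In little-endian mode the byte at the lowest address is 0x78, the least significant byte; in big-endian mode it is 0x12, the most significant byte.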
  26. Thumb Mode
     • Thumb is a 16-bit instruction set
       – Optimized for code density from C code
       – Improved performance from narrow memory
       – Subset of the functionality of the ARM instruction set
     • Core has two execution states: ARM and Thumb
       – Switch between them using the BX instruction
     • Thumb has characteristic features:
       – Most Thumb instructions are executed unconditionally
       – Many Thumb data processing instructions use a 2-address format
       – Thumb instruction formats are less regular than ARM instruction formats, as a result of the dense encoding
  27. Thumb has higher code density
     • Code density is defined as the space taken up in memory by an executable program.
     • On average, a Thumb implementation of the same code takes up around 30% less memory than the equivalent ARM implementation.
     • Figure 4.1 shows the same divide code routine implemented in ARM and Thumb assembly code. Even though the Thumb implementation uses more instructions, the overall memory footprint is reduced. Code density was the main driving force for the Thumb instruction set.
  28. Even though the Thumb implementation uses more instructions, the overall memory footprint is reduced. Code density was the main driving force for the Thumb instruction set. Because it was also designed as a compiler target, rather than for hand-written assembly code, we recommend that you write Thumb-targeted code in a high-level language like C or C++.
  29. Thumb Register Usage
     • In Thumb state, you do not have direct access to all registers.
     • Only the low registers r0 to r7 are fully accessible.
     • The higher registers r8 to r12 are only accessible with MOV, ADD, or CMP instructions.
     • CMP and all the data processing instructions that operate on low registers update the condition flags in the cpsr.
  30. Thumb Instruction Set (1/3)
  31. Thumb Instruction Set (2/3)
  32. Thumb Instruction Set (3/3)
  33. Thumb Instruction Entry and Exit
     • T bit, bit 5 of the CPSR
       – If T = 1, the processor interprets the instruction stream as 16-bit Thumb instructions
       – If T = 0, the processor interprets it as standard ARM instructions
     • Thumb entry
       – ARM cores start up, after reset, executing ARM instructions
       – Executing a Branch and Exchange instruction (BX) sets the T bit if the bottom bit of the specified register was set, and switches the PC to the address given in the remainder of the register
     • Thumb exit
       – Executing a Thumb BX instruction
  34. ARM-Thumb Interworking
     • ARM-Thumb interworking is the name given to the method of linking ARM and Thumb code together for both assembly and C/C++.
     • To call a Thumb routine from an ARM routine, the core has to change state. This state change is shown in the T bit of the cpsr.
     • The BX and BLX branch instructions cause a switch between ARM and Thumb state while branching to a routine.
     • The BX lr instruction returns from a routine, also with a state switch if necessary.
  35. BX and BLX
     • There are two versions of the BX or BLX instructions: an ARM instruction and a Thumb equivalent.
     • The ARM BX instruction enters Thumb state only if bit 0 of the address in Rn is set to binary 1; otherwise it enters ARM state. The Thumb BX instruction does the same.
     Syntax: BX Rn
             BLX Rn | label
  36. Interworking Instructions
     • Interworking is achieved using the Branch Exchange instructions
       – In Thumb state: BX Rn
       – In ARM state (on Thumb-aware cores only): BX<condition> Rn
       where Rn can be any register (r0 to r15)
     • BX performs a branch to an absolute address in the 4 GB address space by copying Rn to the program counter
     • Bit 0 of Rn specifies the state to change to
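The role of bit 0 in a branch exchange can be written out as a small Python model. The function name `bx` and the returned pair are assumptions made for illustration; the sketch only captures the state selection and the clearing of bit 0 in the branch target, not alignment checks or pipeline effects.

```python
def bx(rn_value):
    """Model BX Rn: return (new_pc, thumb_state) for a 32-bit register value."""
    thumb_state = bool(rn_value & 1)      # bit 0 selects Thumb (1) or ARM (0)
    new_pc = rn_value & 0xFFFFFFFE        # bit 0 is not part of the branch address
    return new_pc, thumb_state


into_thumb = bx(0x00008001)    # address with bit 0 set: arrive in Thumb state
back_to_arm = bx(0x00009000)   # word-aligned address: arrive in ARM state
```

This is why the ARM-state code in an interworking sequence adds 1 to the Thumb label before BX, while a return to a word-aligned ARM label needs no adjustment.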
  37. Switching between States
  38. Example 24
     ; Start off in ARM state
                 CODE32
                 ADR r0,Into_Thumb+1   ; generate branch target address and set bit 0,
                                       ; hence arrive in Thumb state
                 BX r0                 ; branch exchange to Thumb
                 …
                 CODE16                ; assemble subsequent code as Thumb
     Into_Thumb  …
                 ADR r5,Back_to_ARM    ; generate branch target to word-aligned address,
                                       ; hence bit 0 is cleared
                 BX r5                 ; branch exchange to ARM
                 …
                 CODE32                ; assemble subsequent code as ARM
     Back_to_ARM …
  39. Summary
  40. ARM data instructions
     ADD   Add
     ADC   Add with carry
     SUB   Subtract
     SBC   Subtract with carry
     RSB   Reverse subtract: RSB r0,r1,r2 gives r0 = r2 - r1
     RSC   Reverse subtract with carry
     MUL   Multiply
     MLA   Multiply and accumulate: MLA r0,r1,r2,r3 gives r0 = r1 x r2 + r3
  41. ARM data instructions
     AND   Bit-wise AND
     ORR   Bit-wise OR
     EOR   Bit-wise exclusive-OR
     BIC   Bit clear
     • BIC r0,r1,r2 sets r0 to r1 AND NOT r2. It uses the second source operand as a mask: where a bit in the mask is 1, the corresponding bit in the first source operand is cleared.
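The bit-clear operation is easy to confuse with a plain AND, so a short Python check may help. The function name `bic` is an assumption for illustration, and values are treated as unsigned 32-bit words.

```python
MASK32 = 0xFFFFFFFF


def bic(operand, mask):
    """Model BIC: clear every bit of operand that is set in mask."""
    return operand & ~mask & MASK32


cleared = bic(0xFFFF00FF, 0x0000000F)   # clears the low four bits
```

Here the low four bits of 0xFFFF00FF are cleared, giving 0xFFFF00F0, while all other bits pass through unchanged.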
  42. ARM data instructions
     LSL   Logical shift left (zero fill)
     LSR   Logical shift right (zero fill)
     ASL   Arithmetic shift left
     ASR   Arithmetic shift right, copies the sign bit
     ROR   Rotate right
     RRX   Rotate right extended with C, performs a 33-bit rotate
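The differences between the shift and rotate operations show up clearly on concrete bit patterns. The following Python sketch models them on unsigned 32-bit values; the function names are assumptions for illustration, shift amounts are assumed to be in the range 1 to 31, and only `rrx` models the carry flag.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left: zeros enter from the right."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical shift right: zeros enter from the left."""
    return value >> amount


def asr(value, amount):
    """Arithmetic shift right: the sign bit (bit 31) is copied in from the left."""
    shifted = value >> amount
    if value & 0x80000000:
        shifted |= (MASK32 << (32 - amount)) & MASK32
    return shifted


def ror(value, amount):
    """Rotate right: bits leaving on the right re-enter on the left."""
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def rrx(value, carry):
    """Rotate right by one through the carry flag (a 33-bit rotate).

    Returns (result, new_carry).
    """
    result = ((carry & 1) << 31) | (value >> 1)
    return result, value & 1
```

ASL is the same operation as LSL, which is why no separate function is needed for it.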
  43. ARM comparison instructions
     • These only set the values of the NZCV bits
     CMP   Compare
     CMN   Negated compare, uses an addition to set the status bits
     TST   Bit-wise test, a bit-wise AND
     TEQ   Bit-wise negated test, an exclusive-OR
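What CMP does to the NZCV bits can be sketched in Python. The function name `cmp_flags` and the dictionary it returns are assumptions for illustration; operands are unsigned 32-bit values, and the carry flag follows the ARM convention for subtraction, where C = 1 means no borrow occurred.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Model CMP a, b: compute a - b, discard the result, return the flags."""
    result = (a - b) & MASK32
    negative = result >> 31                           # N: bit 31 of the result
    zero = int(result == 0)                           # Z: result is zero
    carry = int(a >= b)                               # C: no borrow (unsigned a >= b)
    overflow = ((a ^ b) & (a ^ result)) >> 31         # V: signed overflow
    return {"N": negative, "Z": zero, "C": carry, "V": overflow}


equal = cmp_flags(5, 5)
below = cmp_flags(0, 1)
```

Comparing equal values sets Z and C; comparing 0 with 1 sets N and clears C because the subtraction borrows.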
  44. ARM move instructions
     MOV   Move:         MOV r0,r1 ; r0 = r1
     MVN   Move negated: MVN r0,r1 ; r0 = NOT(r1)
  45. ARM load-store instructions
     LDR     Load
     STR     Store
     LDRH    Load half-word
     STRH    Store half-word
     LDRSH   Load half-word signed
     LDRB    Load byte
     STRB    Store byte
     ADR     Set register to address
  46. C Assignments in ARM Instructions
     • x = (a + b) - c;
     • Use r0 for a, r1 for b, r2 for c, and r3 for x.
     • Use r4 as the register for indirect addressing.
     • Load the values of a, b, and c into registers.
     • Store the value of x back to memory.
  47. C Assignments in ARM Instructions
     x = (a + b) - c;
       ADR r4,a       ; get address for a
       LDR r0,[r4]    ; get value of a
       ADR r4,b       ; get address for b, reusing r4
       LDR r1,[r4]    ; load value of b
       ADD r3,r0,r1   ; set result for x to a + b
       ADR r4,c       ; get address for c
       LDR r2,[r4]    ; get value of c
       SUB r3,r3,r2   ; complete computation of x
       ADR r4,x       ; get address for x
       STR r3,[r4]    ; store x at proper location
  48. C Assignments in ARM Instructions
     • y = a*(b + c);
     • Use r0 for both a and b, r1 for c, and r2 for y.
     • Use r4 to store addresses for indirect addressing.
  49. C Assignments in ARM Instructions
     y = a*(b + c);
       ADR r4,b       ; get address for b
       LDR r0,[r4]    ; get value of b
       ADR r4,c       ; get address for c
       LDR r1,[r4]    ; get value of c
       ADD r2,r0,r1   ; compute partial result of y = b + c
       ADR r4,a       ; get address for a
       LDR r0,[r4]    ; get value of a
       MUL r2,r2,r0   ; compute final value of y = a*(b + c)
       ADR r4,y       ; get address for y
       STR r2,[r4]    ; store value of y at proper location
  50. C Assignments in ARM Instructions
     • z = (a << 2) | (b & 15);
     • Use r0 for a and z, and r1 for b.
     • Use r4 for addresses.
  51. C Assignments in ARM Instructions
     z = (a << 2) | (b & 15);
       ADR r4,a           ; get address for a
       LDR r0,[r4]        ; get value of a
       MOV r0,r0,LSL #2   ; perform shift (a << 2)
       ADR r4,b           ; get address for b
       LDR r1,[r4]        ; get value of b
       AND r1,r1,#15      ; perform logical AND (b & 15)
       ORR r1,r0,r1       ; compute final value of z
       ADR r4,z           ; get address for z
       STR r1,[r4]        ; store value of z
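The three translations can be traced step by step in Python, with one line per data-processing instruction. This is an illustrative sketch: the function names are assumptions, the loads and stores through r4 are omitted, and results are masked to 32 bits to mimic register width.

```python
MASK32 = 0xFFFFFFFF


def compute_x(a, b, c):
    """x = (a + b) - c"""
    r3 = (a + b) & MASK32      # ADD r3,r0,r1
    r3 = (r3 - c) & MASK32     # SUB r3,r3,r2
    return r3


def compute_y(a, b, c):
    """y = a*(b + c)"""
    r2 = (b + c) & MASK32      # ADD r2,r0,r1
    r2 = (r2 * a) & MASK32     # MUL r2,r2,r0
    return r2


def compute_z(a, b):
    """z = (a << 2) | (b & 15)"""
    r0 = (a << 2) & MASK32     # MOV r0,r0,LSL #2
    r1 = b & 15                # AND r1,r1,#15
    r1 = r0 | r1               # ORR r1,r0,r1
    return r1


x = compute_x(5, 7, 2)
y = compute_y(3, 4, 5)
z = compute_z(0x10, 0x3F)
```

The masking matters at the edges: for example, shifting 0x40000001 left by two discards the top bit and leaves 0x00000004, just as a 32-bit register would.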