1ARM Architecture ARM core : key component for many embedded systems that needhigh code density, small size, low power e....
2 Sign Extend -> converts signed 8/16 bit to 32 bit value and places in reg. Two source registers (Rn and Rm) and one re...
On Chip Debug Hardware3
4 Instructions are 32-bit wide and address is word alignedCPU STATES and MODES: Mode determines which registers are acti...
5 When the processor is executing in ARM state: All instructions are 32 bits wide All instructions must be word aligned...
6CPSR: 32-bit register with condition flags, control bits, status & ext. Only privileged modes have full write access to...
7Banked Registers:
8 Total 37 registers = 30 general purpose + 6 status + 1 PC Different set of register in different mode of operation Us...
9ARM Data Processing Syntax : <opcode> {<cc>} {S} Rd, Rn, op2 ‘op2’ normally comes from barrel shifter and can be the fo...
10
11ARM The Barrel ShifterDestinationCF 0 Destination CFLSL : Logical Left Shift ASR: Arithmetic Right ShiftMultiplication b...
12ARM Data Processing Instructions CMP,CMN,TST & TEQ always update flags (even if ‘S’ is not used assuffix) and do not al...
13ARM Immediate OperandImmediate Operand (32-bit): obtained by 8-bit constant rotated right even number of positions i.e....
14Data processing: ADD R9, R5, R5, LSL #3 ; R9 = R5+(R5*8) RSB R9, R5, R5, LSR #3 ; R9 = (R5/8) – R5 MOV R12, R4, ROR R...
15ARM Conditional Execution
16 Set the flags, and then use various conditional code CMP r0, # 0 if (a==0) x=0; (here r0 = a, r1= x) MOVEQ r1, # 0 i...
17 B <cc> label : branch to label( MOV LR, PC can be used before above inst. to store return add.) BL <cc> subroutine_la...
18ARM Multiply Normal (32-bit result) and long(64-bit result) multiplication Syntax: MUL {<cc>} {S} Rd, Rm, Rs ; Rd = R...
19ARM Load & Store Instructions Data movement between registers and memory Instructions : opcode <cc> Rd, <address>LDR S...
20Load & Store Instructions Choice of indexing :- Pre-index, Pre-index write back and post indexaddressing Post index an...
21ARM Pre & Post indexing0x50x5r10x200BaseRegister 0x200r00x5SourceRegisterfor STROffset12 0x20cr10x200OriginalBaseRegiste...
22ARM Load/Store Multiple Multiple register load and store with single instruction Syntax : LDM <CC> <add_mode> Rn {!} ...
23ARM Stack OperationsExample : Let R1=0x00000002, R4=0x00000003,SP=0x00000814 STMFD sp! , {R1,R4} ; full descending stac...
24
25ARM Miscellaneous Instr. SWP <cc> Rd, Rm, [Rn] Swap a word between memory and a register tmp= mem32[Rn], mem32[Rn]=Rm...
26 Count leading zeros : CLZ <cc> Rd, RmPseudo Instructions: LDR Rd, =constant (assembly pseudo instruction)if constant ...
27Exceptions: Generated by internal (e.g. undefined inst.) or external (e.g.interrupts) sources On exception, processor ...
28ARM Exceptions Events from internal and external sources that diverts normal flowof execution Reset and SWI switches p...
29ARM Exceptions When an exception occurs, the ARM automatically: Copies CPSR into SPSR_<mode> Sets appropriate CPSR bi...
30ARM ExceptionsReturn from Exceptions: When exception occurs, return address stored in LR (i.e.PC-4)may not be address o...
ARM Exceptions Return from pre-fetch abort : PC not updated, so to return on same instruction Return :SUB LR, LR, #4MOV...
32ARM Exceptions Exception Priorities: Reset is highest priority exception initializes memory, caches,stack pointer etc....
33Software Interrupt User mode uses SWI instruction (that causes exception) toaccess privileged operation (e.g. OS servic...
34Software Interrupt Top level SWI handler determines SWI_number and uses this numberto call appropriate SWI service rout...
35Software Interrupt Instruction BL jump_table save return address in LR_SVC.Routine ‘num0’ returns to supervisor mode af...
36Nested SWIsReentrant SWI Handling: Corruption of SPSR and LR by nested SWI calls causesproblem e.g. 2nd SWI exception i...
37Nested SWIsLDR R10, [LR, # - 4] ; read SWI instruction opcodeAND R10, R10, # 0x00FFFFFF ; get 24-bit number in R10MOV R1...
38Software Interrupt Suffix ‘S’ in MOVS signifies that SPSR is also copied to CPSR
39Thumb Instructions On average, thumb program takes 35% less memory (highcode density) 16-bit fixed size instructions =...
40Thumb Instructions ARM-Thumb inter-working:BX and BLX instructions of ARM and Thumb does samethingCODE32 ; followings a...
41Thumb InstructionsB <cc> label : branch to label with condition Branch range is -256 to +254B label : branch to label w...
42 Data Processing Instructions: ADD/ADC/AND/BIC/EOR/MOV/MUL/MVN/NEG/ORR/SBC/SUB Rd, Rn ADD/SUB Rd, Rn #immed3 ADD/MOV...
43 Multiple Register Load/Store LDM / STM { IA } Rn!, { low register list} Stack Instructions: POP { low register_list...
44ARM Programs[1] Bit-Field Manipulation: Packing/Unpacking of bit fields (variable size) e.g. variable lengthcode Used ...
45 Three functions in packing: (1) Align byte-stream pointer (2)insertcodes to bitbuff and store bitbuff in mem.(3) finis...
46codelen R5 ; length of current codebitbuff R6 ; 32-bit big endian bufferbitsfree R7 ; no. of bits free in ‘bitbuff’temp ...
47write_code ; 2nd routine ( to write codes in buffer & store buffer if; it gets full )SUBS bitsfree, bitsfree, codelen ; ...
48write_finish ; 3RD routine (to finish packing)RSBS temp, bitsfree, #32 ; temp = no. of used bits in ‘bitbuff’finish_loop...
[2] SIMD processing: Let us consider graphics example of processing multiple 8-bit pixels of animage Problem : merge two...
IMG_W equ 176IMG_H equ 144pz R0 ; pointer to destination imagepx R1 ; pointer to first image Xpy R2 ; pointer to second im...
AND x, mask, xxAND y, mask, yySUB x, x, yMUL x, a, xADD x, x, y, LSL #8AND z, mask, x, LSR#8AND x, mask, xx, LSR #8AND y, ...
52
ARM7 TDMI block diagram53
External Interface through AMBA Bus54AMBAInterfaceInst. & data cacheMMUARM CoreCP15EmbeddedICE & JTAGWriteBufferAMBAAddres...
 JTAG TAP controller: Basically used to test PCB assembly, interconnect or even sub blockinside IC without any physical ...
DATA BUS Uni & Bidirectional Data Bus: When BUSEN is HIGH, all instruction and input data are presented toDIN[31:0] wher...
Ppt2 arm

  1. ARM Architecture
     • ARM core: a key component of many embedded systems that need high code density, small size and low power, e.g. cell phones, handheld PDAs, cameras.
     • Adopts the RISC design philosophy:
         • Reduced number of fixed-size instructions (simple and powerful)
         • Pipelining, load/store architecture, large register set
     • But differs from pure RISC:
         • Variable cycle execution for certain instructions
         • Inline barrel shifter, leading to a few complex instructions
         • Thumb state (16-bit instruction set)
         • Conditional execution of instructions
         • DSP instructions
     • Pipeline: three basic stages in ARM7TDMI (fetch, decode, execute); five stages in ARM9 and six in ARM10.
     • Performance: MIPS @ clock frequency, mW @ (voltage, clock frequency).
     • Software for an ARM embedded system: boot code, operating system and application programs.
  2. • Sign extend: converts a signed 8/16-bit value to a 32-bit value and places it in a register.
     • Two source registers (Rn and Rm) and one result register (Rd).
     • Barrel shifter: preprocesses Rm before it enters the ALU.
  3. On-Chip Debug Hardware
  4. ARM Architecture: CPU States and Modes
     • Instructions are 32 bits wide and addresses are word aligned.
     • The mode determines which registers are active and the access rights to the program status register.
     • A non-privileged mode has write access only to the condition flags of the current program status register (CPSR) and read access to the remaining fields.
     • After reset, the processor is in 'supervisor' mode, in which the OS kernel operates.
     • Programs and applications run in 'user' mode.
     • IRQ and FIQ are associated with interrupts.
     • Exception modes are the modes other than user and system.
  5. ARM Architecture: Processor States
     • When the processor is executing in ARM state:
         • All instructions are 32 bits wide.
         • All instructions must be word aligned.
         • Therefore the PC value is stored in bits [31:2], with bits [1:0] undefined (an instruction cannot be halfword or byte aligned).
     • When the processor is executing in Thumb state:
         • All instructions are 16 bits wide.
         • All instructions must be halfword aligned.
         • Therefore the PC value is stored in bits [31:1], with bit [0] undefined (an instruction cannot be byte aligned).
     • When the processor is executing in Jazelle state:
         • All instructions are 8 bits wide.
         • The processor executes Java bytecodes.
  6. ARM Architecture: CPSR
     • CPSR: 32-bit register with condition flags, control bits, status and extension fields.
     • Only privileged modes have full write access to the CPSR.
     • Every processor mode except user mode can change mode by writing directly to the mode bits of the CPSR.
     • N = Negative result from ALU (bit 31 of the result)
     • Z = Zero result from ALU
     • C = ALU operation produced a Carry (for a subtraction, C is cleared when a borrow occurs, i.e. when the unsigned result would be negative)
     • V = ALU operation oVerflowed
     • Flags are updated only if the suffix 'S' is added to the instruction.
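The flag definitions above can be illustrated with a small Python model. This is only a sketch of 32-bit ADDS/SUBS behaviour; the function names `flags_after_add` and `flags_after_sub` are made up for the illustration and are not ARM mnemonics.

```python
MASK32 = 0xFFFFFFFF

def flags_after_add(a, b):
    """Return (result, N, Z, C, V) for a 32-bit ADDS a, b."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    n = result >> 31                      # bit 31 of the result
    z = int(result == 0)
    c = int(full > MASK32)                # carry out of bit 31
    # Overflow: both operands have a sign that differs from the result's sign.
    v = (((a ^ result) & (b ^ result)) >> 31) & 1
    return result, n, z, c, v

def flags_after_sub(a, b):
    """Return (result, N, Z, C, V) for SUBS a, b, computed as a + NOT(b) + 1."""
    a &= MASK32
    b &= MASK32
    full = a + ((~b) & MASK32) + 1
    result = full & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(full > MASK32)                # C = 1 means "no borrow"
    # Overflow: operands have different signs and the result's sign differs from a.
    v = (((a ^ b) & (a ^ result)) >> 31) & 1
    return result, n, z, c, v
```

For example, `flags_after_sub(3, 5)` clears C because the subtraction borrows, while `flags_after_sub(5, 5)` sets both Z and C.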
  7. Banked Registers
  8. ARM Architecture: Registers
     • Total 37 registers = 30 general purpose + 6 status + 1 PC.
     • A different set of registers is visible in each mode of operation.
     • User and System modes use the same set of registers.
     • Shaded registers (banked registers) are hidden from user/system mode and available only in exception modes.
     • R13 = stack pointer (SP). Each exception mode has its own SP.
     • R14 = link register (LR). It holds the return address of a subroutine called with the BL instruction. Each exception mode has its own SP and LR.
     • BL <cc> subroutine_label (LR automatically stores the return address)
     • The return can be done in two ways: MOV PC, LR or BX LR.
  9. ARM Data Processing
     • Syntax: <opcode>{<cc>}{S} Rd, Rn, op2
     • 'op2' normally comes from the barrel shifter.
     • Rm and Rs should not be PC (r15) in the shift/rotate-by-register form of 'op2'.
     • Shift and rotate affect the N, Z and C flags.
     • The # value for shift and rotate is a 5-bit unsigned integer.
  11. ARM: The Barrel Shifter
     • LSL (Logical Shift Left): zeros enter at the LSB, the last bit shifted out goes to CF. Multiplication by a power of 2.
     • LSR (Logical Shift Right): zeros enter at the MSB, the last bit shifted out goes to CF. Division by a power of 2.
     • ASR (Arithmetic Shift Right): division by a power of 2, preserving the sign bit.
     • ROR (Rotate Right): bit rotate with wrap-around from LSB to MSB.
     • RRX (Rotate Right Extended): single-bit rotate with wrap-around from CF to MSB.
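As a rough illustration of the five shifter operations, the following Python sketch models each one on 32-bit values. The function names simply mirror the mnemonics; carry-flag output is modelled only for `rrx`, which is a simplification.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: multiply by 2**n, discarding bits above bit 31."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: unsigned divide by 2**n."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: signed divide by 2**n, sign bit replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32                      # reinterpret as a negative number
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right: bits leaving the LSB re-enter at the MSB."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry):
    """Rotate right extended: returns (result, new_carry)."""
    x &= MASK32
    return (carry << 31) | (x >> 1), x & 1
```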
  12. ARM Data Processing Instructions
     • CMP, CMN, TST and TEQ always update the flags (even if 'S' is not used as a suffix) and do not alter any register. They use only Rn and 'op2'.
     • MOV and MVN use only two operands, i.e. Rd and 'op2'.
  13. ARM Immediate Operand
     • Immediate operand (32-bit): obtained from an 8-bit constant rotated right by an even number of positions, i.e. 0, 2, 4, ..., 30.
     • The instruction code contains 8 bits for the constant and 4 bits for the rotate.
     • The assembler converts immediate values to the rotate form:
         MOV r0, #4096          ; uses 0x40 ror 26
         ADD r1, r2, #0xFF0000  ; uses 0xFF ror 16
     • Examples (range of 32-bit constants by rotating #0, #8 & #32 positions)
     • The complement of a valid 32-bit constant obtained as above is also a valid 32-bit constant (the assembler can use MVN instead of MOV).
     • Valid 32-bit constants: 0xFF, 0x104, 0xFF00, 0xF000000F, 0x0FFFFFF0
     • Invalid 32-bit constants: 0x101, 0x103, 0xFF1, 0xFF03, 0xFF04
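The encoding rule can be checked mechanically. The sketch below, with the hypothetical helper names `immediate_encodings` and `is_encodable`, lists every (8-bit constant, rotate) pair that produces a given value and treats a value as valid if it or its complement fits, matching the valid and invalid lists above.

```python
MASK32 = 0xFFFFFFFF

def rol(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def immediate_encodings(value):
    """List every (imm8, rotate) pair such that imm8 ROR rotate == value."""
    value &= MASK32
    found = []
    for rotate in range(0, 32, 2):        # only even rotations are encodable
        imm8 = rol(value, rotate)         # rotating left undoes the rotate right
        if imm8 <= 0xFF:
            found.append((imm8, rotate))
    return found

def is_encodable(value):
    """True if the value, or its complement (via MVN), fits the immediate form."""
    return bool(immediate_encodings(value) or immediate_encodings(~value))
```

Note that a value may have several encodings: `immediate_encodings(4096)` contains `(0x40, 26)` as on the slide, but also `(0x01, 20)`.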
  14. Data processing:
         ADD R9, R5, R5, LSL #3   ; R9 = R5 + (R5 * 8)
         RSB R9, R5, R5, LSR #3   ; R9 = (R5 / 8) - R5
         MOV R12, R4, ROR R3      ; R12 = R4 rotated right by the value of R3
         CMP R7, R5               ; update flags after (R7 - R5)
     Conditional Execution:
     • ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field (e.g. MOVEQ R0, R1).
     • The condition reflects the status of the flags.
     • If the condition is true, the instruction executes normally; otherwise it is not executed.
     • Advantage: greater pipeline performance and higher code density, leading to higher instruction throughput.
  15. ARM Conditional Execution
  16. ARM Conditional Execution
     • Set the flags, and then use various condition codes (here r0 = a, r1 = x):
         CMP   r0, #0      ; if (a == 0) x = 0;
         MOVEQ r1, #0
         MOVGT r1, #1      ; if (a > 0) x = 1;
     • Set of conditional compare instructions:
         CMP   r0, #4      ; if (a == 4 or a == 10) x = 0;
         CMPNE r0, #10
         MOVEQ r1, #0
     • Reduces the number of instructions. Example (here r1 = a, r2 = b):
         while (a != b) { if (a > b) a = a - b; else b = b - a; }
       With branches only:
         loop:     CMP r1, r2
                   BEQ finish
                   BLT lessthan
                   SUB r1, r1, r2
                   B   loop
         lessthan: SUB r2, r2, r1
                   B   loop
         finish
       With conditional execution:
         loop1:    CMP   r1, r2
                   SUBGT r1, r1, r2
                   SUBLT r2, r2, r1
                   BNE   loop1
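To see why the four-instruction `loop1` version works, here is a small Python model of it. It assumes positive integer inputs and relies on the fact that SUBGT and SUBLT (without the 'S' suffix) leave the flags set by CMP untouched; `gcd_conditional` is a name invented for this sketch.

```python
def gcd_conditional(a, b):
    """Model of loop1; returns (result, number_of_loop_passes)."""
    passes = 0
    while True:
        passes += 1
        # CMP r1, r2 sets the flags once per pass.
        gt, lt, ne = a > b, a < b, a != b
        if gt:                 # SUBGT r1, r1, r2
            a -= b
        if lt:                 # SUBLT r2, r2, r1
            b -= a
        if not ne:             # BNE loop1 falls through when the operands were equal
            return a, passes
```

Each pass costs the same four instructions regardless of which operand is larger, and there are no taken branches inside the loop body.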
  17. ARM Branch Instructions
     • B <cc> label: branch to label (MOV LR, PC can be used before this instruction to store the return address).
     • BL <cc> subroutine_label (LR automatically stores the return address).
     • The processor core shifts the offset field left by 2 positions, sign-extends it and adds it to the PC.
         • ±32 Mbyte range
         • How to perform longer branches? (use BX Rm)
     • BX Rm: branch with exchange.
         • If the LSB of Rm is 1, the processor switches to Thumb state; otherwise it remains in ARM state. PC = Rm & 0xFFFFFFFE.
         • Useful for interworking between ARM and Thumb state.
     • BLX Rm: similar to BX Rm but additionally stores the return address in LR.
     • BLX label:
         • Branches within a ±32 Mbyte range, with LR storing the return address.
         • Makes T = 1 and enters Thumb state.
     • The T bit must not be changed by writing directly to the CPSR to change the state of the CPU.
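The offset arithmetic for B and BL can be sketched as follows. The model assumes ARM state, where the PC reads as the address of the branch instruction plus 8 because of the pipeline; `branch_target` and its parameter names are illustrative only.

```python
def branch_target(instr_addr, imm24):
    """Compute the destination of an ARM B/BL from its 24-bit offset field."""
    offset = imm24 & 0xFFFFFF
    if offset & 0x800000:              # sign-extend the 24-bit field
        offset -= 1 << 24
    pc = instr_addr + 8                # PC is two instructions ahead in ARM state
    return (pc + (offset << 2)) & 0xFFFFFFFF
```

The largest forward offset is 0x7FFFFF words, i.e. just under 32 Mbytes, which is where the ±32 Mbyte range comes from.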
  18. ARM Multiply
     • Normal (32-bit result) and long (64-bit result) multiplication.
     • Syntax:
         MUL{<cc>}{S} Rd, Rm, Rs                        ; Rd = Rm * Rs
         MLA{<cc>}{S} Rd, Rm, Rs, Rn                    ; Rd = (Rm * Rs) + Rn
         [U or S]MULL{<cond>}{S} RdLo, RdHi, Rm, Rs     ; RdHi,RdLo := Rm * Rs
         [U or S]MLAL{<cond>}{S} RdLo, RdHi, Rm, Rs     ; RdHi,RdLo := (Rm * Rs) + RdHi,RdLo
     • MUL and MLA truncate the result to the least significant 32 bits.
     • Rd must be a different register from Rm. Because multiplication is commutative, Rs and Rm can be swapped to satisfy this.
     • N and Z flags are affected (only if the suffix 'S' is used).
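The difference between the truncating and the long multiplies can be shown with a short Python sketch. The function names mirror the mnemonics; the long forms return a `(RdLo, RdHi)` tuple, which is a convention chosen for this example.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul(rm, rs):
    """MUL: only the low 32 bits of the product are kept."""
    return (rm * rs) & MASK32

def mla(rm, rs, rn):
    """MLA: multiply-accumulate, truncated to 32 bits."""
    return (rm * rs + rn) & MASK32

def umull(rm, rs):
    """UMULL: unsigned 64-bit product as (RdLo, RdHi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32

def smull(rm, rs):
    """SMULL: signed 64-bit product as (RdLo, RdHi)."""
    product = (to_signed(rm) * to_signed(rs)) & 0xFFFFFFFFFFFFFFFF
    return product & MASK32, product >> 32
```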
  19. ARM Load & Store Instructions
     • Data movement between registers and memory.
     • Instructions: opcode <cc> Rd, <address>
         LDR    STR     ; 32-bit word load & store
         LDRB   STRB    ; byte load & store
         LDRH   STRH    ; 16-bit halfword load & store
         LDRSB          ; signed byte load
         LDRSH          ; signed halfword load
     • LDRB and LDRH copy 8-bit and 16-bit quantities from memory to the destination register and force the high bits of the destination register to zero. For LDRSB and LDRSH the high bits of the destination register are filled by sign extension.
     • Address: formed from a base register and an offset.
         • The base register can be any general-purpose register, including PC.
         • Offset (for 32-bit word and unsigned byte): immediate (# 12-bit value), register, or scaled register (Rm with shift/rotate by # immediate only).
         • Offset for H, SH and SB: immediate value (# 8-bit) or register.
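Zero extension versus sign extension can be seen in a small Python model of the byte and halfword loads. Memory is represented here as a `bytes` object in little-endian order, which is an assumption of the sketch, and the function names mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF

def ldrb(mem, addr):
    """LDRB: load a byte, upper 24 bits forced to zero."""
    return mem[addr]

def ldrsb(mem, addr):
    """LDRSB: load a byte and sign-extend it to 32 bits."""
    value = mem[addr]
    return (value - 0x100) & MASK32 if value & 0x80 else value

def ldrh(mem, addr):
    """LDRH: load a little-endian halfword, upper 16 bits forced to zero."""
    return mem[addr] | (mem[addr + 1] << 8)

def ldrsh(mem, addr):
    """LDRSH: load a little-endian halfword and sign-extend it to 32 bits."""
    value = ldrh(mem, addr)
    return (value - 0x10000) & MASK32 if value & 0x8000 else value
```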
  20. Load & Store Instructions
     • Choice of indexing: pre-index, pre-index with write-back, and post-index addressing.
     • Post-index and pre-index with write-back modify the base register value.
     • Examples:
         LDR  R8, [R3, #-3]         ; load R8 from address R3-3 (pre-index)
         LDR  R3, [R9], #4          ; load R3 from address R9, then R9 = R9+4 (post-index)
         STRB R7, [R6, #-1]!        ; store byte from R7 at R6-1, then decrement R6 (pre-index with write-back)
         LDR  R0, [PC, -R2]         ; load R0 from PC-R2
         LDR  R11, [R3, R5, LSL #2] ; load R11 from R3 + R5*4
     • Note: by default we assume 'little endian' format, where the lower byte of a word is stored at the lower address. In 'big endian' format the lower byte of a word is stored at the higher address.
  21. ARM Pre & Post Indexing
     Initial state: source register r0 = 0x5, base register r1 = 0x200, offset = 12.
     • Pre-indexed: STR r0, [r1, #12] stores 0x5 at address 0x20c; r1 keeps its original value 0x200.
     • Pre-indexed with write-back: STR r0, [r1, #12]! stores 0x5 at address 0x20c; r1 is updated to 0x20c.
     • Post-indexed: STR r0, [r1], #12 stores 0x5 at address 0x200; r1 = 0x20c after the instruction.
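The three addressing modes differ only in which address is used for the access and whether the base register is updated. A minimal Python sketch, assuming a dictionary-based register file and memory and a hypothetical `str_word` helper:

```python
def str_word(mem, regs, rd, rn, offset, mode):
    """Model STR rd with base rn in mode 'pre', 'pre_wb' or 'post'."""
    base = regs[rn]
    if mode == "pre":            # STR rd, [rn, #offset]
        addr = base + offset
    elif mode == "pre_wb":       # STR rd, [rn, #offset]!
        addr = base + offset
        regs[rn] = addr
    elif mode == "post":         # STR rd, [rn], #offset
        addr = base
        regs[rn] = base + offset
    else:
        raise ValueError("unknown addressing mode: " + mode)
    mem[addr] = regs[rd]
    return addr
```

Running it with r0 = 0x5 and r1 = 0x200 reproduces the three cases of the slide.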
  22. ARM Load/Store Multiple
     • Multiple register load and store with a single instruction.
     • Syntax:
         LDM <cc> <add_mode> Rn{!}, {registers}{^}
         STM <cc> <add_mode> Rn{!}, {registers}{^}
       where add_mode is IA | IB | DA | DB.
     • Rn (base address) must not be PC, and must not appear in the register list if ! (write-back) is specified.
     • Block memory copy: R9 points to the start of the source, R11 points to the end of the source, R10 points to the start of the destination.
         loop: LDMIA R9!, {R0}
               STMIA R10!, {R0}
               CMP   R9, R11
               BNE   loop
     • Stack operations: SP replaces Rn; add_mode is FD | FA | ED | EA.
  23. ARM Stack Operations
     • Example: let R1 = 0x00000002, R4 = 0x00000003, SP = 0x00000814.
         STMFD sp!, {R1, R4}   ; full descending stack write
       After the instruction: SP = 0x0000080c, mem[0x810] = R4, mem[0x80c] = R1.
     • Only exception modes use '^' (not used in user/system mode).
     • F and E signify whether SP points to a location that is full or empty.
     • The stack is either ascending (growing towards high memory addresses) or descending (growing towards low memory addresses).
     • One of the following pairs is used to save context at the start of a routine/handler and retrieve context at the end of the routine/handler.
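The full-descending behaviour in the example can be modelled in a few lines of Python. Registers are keyed by number (13 = SP) and memory is a dictionary of word addresses; both representations, and the names `stmfd` and `ldmfd`, are assumptions made for the sketch.

```python
SP = 13

def stmfd(mem, regs, reglist):
    """STMFD sp!, {reglist}: push so the lowest register ends at the lowest address."""
    sp = regs[SP]
    for reg in sorted(reglist, reverse=True):
        sp -= 4                      # descending: decrement first (stack is "full")
        mem[sp] = regs[reg]
    regs[SP] = sp                    # write-back requested by '!'

def ldmfd(mem, regs, reglist):
    """LDMFD sp!, {reglist}: pop in ascending register order."""
    sp = regs[SP]
    for reg in sorted(reglist):
        regs[reg] = mem[sp]
        sp += 4
    regs[SP] = sp
```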
  25. ARM Miscellaneous Instructions
     • SWP <cc> Rd, Rm, [Rn]: swap a word between memory and a register.
         tmp = mem32[Rn], mem32[Rn] = Rm and Rd = tmp
     • SWPB <cc> Rd, Rm, [Rn]: swap a byte between memory and a register.
         tmp = mem8[Rn], mem8[Rn] = Rm and Rd = tmp
     • The swap instruction is atomic: it reads and writes a location in the same bus operation, so no other access can intervene. Useful in implementing semaphores and mutual exclusion.
     CPSR instructions:
     • MRS{<cc>} Rd, <CPSR | SPSR>                      ; copy from PSR to register
     • MSR{<cc>} <CPSR | SPSR>_<fields>, Rm
     • MSR{<cc>} <CPSR | SPSR>_<fields>, #immediate
     • <fields> can be f, s, x and c, representing the respective byte of the CPSR/SPSR:
         MSR cpsr_c, R0     ; update only the control byte of CPSR
         MSR cpsr_fsc, R0   ; update the flags, status and control bytes of CPSR
     • In user mode you can read all CPSR bits but you can update only the f byte.
  26. • Count leading zeros: CLZ <cc> Rd, Rm
     Pseudo Instructions:
     • LDR Rd, =constant (assembler pseudo-instruction): if the constant can be constructed with MOV or MVN, then that instruction is actually generated. Otherwise the assembler generates a PC-relative LDR instruction that reads the constant from the literal pool. You are responsible for ensuring that there is a literal pool within a 4 KB range.
     • ADR Rd, label: this pseudo-instruction writes the address of the label into the register, using a PC-relative expression.
  27. ARM Exceptions
     • Exceptions are generated by internal (e.g. undefined instruction) or external (e.g. interrupt) sources.
     • On an exception, the processor changes mode. The address of the next instruction is copied to LR_<mode> and the CPSR is copied to SPSR_<mode>. Here LR_<mode> and SPSR_<mode> are the LR and SPSR of the newly entered exception mode.
     • A forced mode change (writing the mode bits) does not copy the CPSR to SPSR_<mode>.
  28. ARM Exceptions
     • Events from internal and external sources that divert the normal flow of execution.
     • Reset and SWI switch the processor to Supervisor mode.
     • Exception vector table: holds the starting address of each exception handler.
     • Each exception handler needs to restore the registers and the state of the CPU.
  29. ARM Exceptions
     • When an exception occurs, the ARM automatically:
         • Copies the CPSR into SPSR_<mode>
         • Sets the appropriate CPSR bits to:
             • switch to ARM state (i.e. makes T = 0)
             • change to the exception mode
             • disable IRQ interrupts
             • disable FIQ, only when an FIQ or reset occurs
         • Stores the return address (i.e. PC - 4) in LR_<mode>
         • Sets the PC to the vector address
     • To return, the exception handler needs to:
         • Restore the CPSR from SPSR_<mode>
         • Restore the PC from LR_<mode>
  30. ARM Exceptions: Return from Exceptions
     • When an exception occurs, the return address stored in LR (i.e. PC-4) may not be the address of the next instruction (because the PC may or may not have been updated when the exception occurs).
     • Normally PC points to the instruction being fetched, PC-4 points to the instruction being decoded and PC-8 points to the instruction being executed.
     • Return from SWI and undefined instruction:
         • The PC is not updated when these exceptions are taken, so PC-4 is the actual return address, which is already in LR.
         • Return from handler:
             MOVS PC, LR
     • Return from IRQ and FIQ exceptions:
         • An interrupt exception occurs only after the PC is updated, so PC-4 points one instruction beyond the actual return address.
         • Return from handler:
             SUB  LR, LR, #4
             MOVS PC, LR
  31. ARM Exceptions
     • Return from prefetch abort: the PC is not updated, so to return to the same instruction:
             SUB  LR, LR, #4
             MOVS PC, LR
     • Return from data abort: the PC is updated, so to return to the same instruction:
             SUB  LR, LR, #8
             MOVS PC, LR
     • The suffix 'S' on an instruction whose destination is PC (e.g. MOVS PC, LR) restores the CPSR from SPSR_<mode>.
  32. ARM Exceptions: Exception Priorities
     • Reset is the highest-priority exception; its handler initializes memory, caches, stack pointers etc.
     • The lowest priority is shared by two mutually exclusive exceptions: SWI and Undefined.
     • IRQ is disabled when any exception occurs.
     • FIQ is disabled only when Reset or FIQ occurs; otherwise it remains unchanged.
     • Placing Data Abort above FIQ ensures that the data abort is actually registered before the FIQ is handled.
  33. Software Interrupt
     • User mode uses the SWI instruction (which causes an exception) to access privileged operations (e.g. OS services) from Supervisor mode.
     • Syntax: SWI <cc> SWI_number (24-bit)
         • SWI_number represents a particular service or feature of the OS.
         • SWI_number = SWI_opcode AND 0x00ffffff
     • When the CPU executes the SWI instruction, it:
         • Copies the CPSR to SPSR_svc of Supervisor mode
         • Sets the appropriate CPSR bits to change to the exception mode and disable IRQ
         • Stores the return address in LR_svc
         • Sets the PC to the vector address
  34. Software Interrupt
     • The top-level SWI handler determines the SWI_number and uses this number to call the appropriate SWI service routine.
         STMFD SP!, {R0-R12, LR}       ; save context of user mode (LR here is LR_svc)
         LDR   R10, [LR, #-4]          ; read SWI instruction opcode
         AND   R10, R10, #0x00FFFFFF   ; get 24-bit number in R10
         MOV   R10, R10, LSL #2        ; word-align the offset
         ADD   R9, R9, R10             ; add base R9 to offset
         BLX   R9                      ; go to the appropriate location in the jump table
         LDMFD SP!, {R0-R12, PC}^      ; return from handler (to user mode), restore registers and CPSR
     • R9 is a pointer to the beginning of the jump table. R10 (offset) picks out a particular entry from the jump table.
     • The ^ in the last instruction causes SPSR_svc to be copied to the CPSR automatically if PC appears in the register list.
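The decode-and-dispatch step of the handler can be mimicked in Python. In this sketch memory is a dictionary from address to 32-bit opcode and the jump table is a dictionary from byte offset to a Python callable; `swi_number`, `handle_swi` and the routine names are hypothetical and stand in for the assembly above.

```python
def swi_number(opcode):
    """Extract the 24-bit SWI number from the instruction word."""
    return opcode & 0x00FFFFFF

def handle_swi(memory, lr, jump_table):
    """Read the SWI opcode at LR-4 and call the matching service routine."""
    opcode = memory[lr - 4]          # LDR R10, [LR, #-4]
    number = swi_number(opcode)      # AND R10, R10, #0x00FFFFFF
    offset = number << 2             # MOV R10, R10, LSL #2 (one word per entry)
    return jump_table[offset]()      # ADD R9, R9, R10 / BLX R9
```

LR-4 is used because, on entry to the SWI handler, LR_svc holds the address of the instruction after the SWI.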
  35. Software Interrupt
     • The instruction BL jump_table saves the return address in LR_svc. Routine 'num0' returns to supervisor mode after completion.
     • When the context is restored in Supervisor mode, LR is copied to the PC and the processor switches back to user mode.
     • Software interrupts can be nested by writing a SWI instruction inside a SWI routine.
  36. Nested SWIs: Reentrant SWI Handling
     • Corruption of the SPSR and LR by nested SWI calls causes problems, e.g. a 2nd SWI exception inside the 1st SWI routine may corrupt SPSR_svc and LR_svc.
     • Remedy: save the context (i.e. registers, SPSR and LR) at the beginning of the handler so that each SWI call preserves the environment of its caller. When the SWI routine has completed, restore the context.
     • The following assembly code of a SWI handler is reentrant and safely handles nested SWI calls.
     • Register R9 (base address) points to the beginning of the branch table.
         SWI_handler:
             STMFD SP!, {R0-R12, LR}   ; store registers and LR_svc
             MRS   R2, SPSR            ; get SPSR_svc into register R2
             STR   R2, [SP, #-4]!      ; store SPSR_svc on the stack
  37. Nested SWIs
             LDR   R10, [LR, #-4]          ; read SWI instruction opcode
             AND   R10, R10, #0x00FFFFFF   ; get 24-bit number in R10
             MOV   R10, R10, LSL #2        ; word-align the offset
             ADD   R9, R9, R10             ; add base R9 to offset
             BLX   R9                      ; go to the appropriate location in the branch table
             LDR   R2, [SP], #4            ; restore SPSR_svc from the stack
             MSR   SPSR, R2
             LDMFD SP!, {R0-R12, LR}       ; restore registers
             MOVS  PC, LR                  ; return from current routine
  38. Software Interrupt
     • The suffix 'S' in MOVS PC, LR signifies that the SPSR is also copied to the CPSR.
  39. Thumb Instructions
     • On average, a Thumb program takes 35% less memory (high code density).
     • 16-bit fixed-size instructions give higher performance than ARM code on a system with a 16-bit data bus.
     • How do Thumb instructions differ from ARM?
         • Only the branch instruction (B label) is executed conditionally.
         • Barrel shift operations are separate instructions.
         • Multiple load/store (LDM/STM) supports only IA mode.
         • PUSH and POP instructions for stack operations (full descending stack only).
         • No instructions to access the CPSR, SPSR and coprocessors.
         • Restricted register access.
     • You must switch to ARM state to alter the CPSR and SPSR and to access a coprocessor.
  40. Thumb Instructions: ARM-Thumb Interworking
     • The BX and BLX instructions of ARM and Thumb do the same thing.
         CODE32                      ; the following code is word aligned
             LDR R0, =thumbcode+1    ; set LSB of R0 to 1, point R0[31:1] to thumbcode
             MOV LR, PC              ; store return address
             BX  R0                  ; branch to Thumb state
         CODE16                      ; the following code is halfword aligned
         thumbcode
             ADD R1, #1              ; Thumb instructions
             . . .
             BX  LR                  ; return to ARM state
     Thumb Instructions: Branch Instructions:
  41. Thumb Instructions
     • B <cc> label: branch to label with a condition.
         • Branch range is -256 to +254.
     • B label: branch to label without a condition code.
         • Branch range is -2048 to +2046.
     • BL subroutine_label (LR automatically stores the return address; unlike ARM, it cannot be conditional).
         • ±4 Mbyte range
     • BX Rm: branch with exchange.
         • If the LSB of Rm is 0, the processor switches to ARM state; otherwise it remains in Thumb state. PC = Rm & 0xFFFFFFFE.
     • BLX Rm: similar to BX Rm but additionally stores the return address in LR.
     • BLX label:
         • Branches within a ±4 Mbyte range, with LR storing the return address.
         • Makes T = 0 and enters ARM state.
  42. Thumb Instructions
     Data Processing Instructions:
         ADD/ADC/AND/BIC/EOR/MOV/MUL/MVN/NEG/ORR/SBC/SUB Rd, Rn
         ADD/SUB Rd, Rn, #immed3
         ADD/MOV/SUB Rd, #immed8
         ADD/SUB Rd, Rn, Rm
         ADD Rd, PC, #immed8*4      (i.e. 0, 4, 8, ..., 1020)
         ADD Rd, SP, #immed8*4
         ADD/SUB SP, #immed7*4      (i.e. 0, 4, 8, ..., 508)
         CMN/CMP/TST Rn, Rm
         CMP Rn, #immed8
         MOV Rn, Rd
     Barrel Shift Instructions:
         LSL/LSR/ASR Rd, Rm, #immed5
         ASR/LSL/LSR/ROR Rd, Rs
     Single Register Load/Store Instructions:
         LDR/STR {B|H} Rd, [Rn, #immed5]
         LDR {H | SB | SH} Rd, [Rn, Rm]
         STR {B | H} Rd, [Rn, Rm]
         LDR Rd, [PC, #immed8*4]
         LDR/STR Rd, [SP, #immed8*4]
  43. Thumb Instructions
     Multiple Register Load/Store:
         LDM/STM {IA} Rn!, {low register list}
     Stack Instructions:
         POP  {low register_list, PC}
         PUSH {low register_list, LR}
     • SP does not appear in the instruction but is updated automatically.
     • The stack is always full descending.
     Software Interrupt:
         SWI Number (8-bit)
     • Switches to ARM state and takes similar actions to the ARM equivalent SWI.
     • Unlike ARM, it cannot be executed conditionally.
  44. ARM Programs
     [1] Bit-Field Manipulation:
     • Packing/unpacking of bit fields (variable size), e.g. variable-length codes.
     • Used to create a compressed file that packs items at bit granularity.
     • Example, bit-field pack/unpack: R0 contains the code to be written to R1. Let Rm contain the number of free bits available in R1, and let 'codelen' be the length of the code.
     Algorithm for variable-length code packing:
     • Pack variable-length codes to create a byte stream.
     • Initially codes are packed into a 32-bit buffer (register R1) from MSB to LSB. Once the buffer is full, it can be stored to memory.
     • Sometimes a code needs to be split into two parts. We fill the buffer with the 1st part, store the buffer in memory and write the 2nd part into the empty buffer.
  45. • Three functions in packing: (1) align the byte-stream pointer, (2) insert codes into 'bitbuff' and store 'bitbuff' in memory, (3) finish the byte stream.
     • The byte-stream pointer may not be word aligned at the end of a write. The next write must begin at a word-aligned address.
     ARM CODE:
         bytestream R0   ; current byte address in output stream
         code       R4   ; current code
  46.    codelen    R5   ; length of current code
         bitbuff    R6   ; 32-bit big-endian buffer
         bitsfree   R7   ; number of bits free in 'bitbuff'
         temp       R8   ; used bits of 'bitbuff'
     write_start                                      ; 1st routine (to word-align 'bytestream')
         MOV    bitbuff, #0
         MOV    bitsfree, #32
     align_loop:
         TST    bytestream, #3                        ; is 'bytestream' aligned?
         LDRNEB code, [bytestream, #-1]!              ; if not, get byte
         SUBNE  bitsfree, bitsfree, #8                ; update 'bitsfree'
         ORRNE  bitbuff, code, bitbuff, ROR #8        ; copy byte into 'bitbuff'
         BNE    align_loop                            ; loop until 'bytestream' is aligned
         MOV    bitbuff, bitbuff, ROR #8              ; adjust 'bitbuff'
         MOV    PC, LR                                ; return
  47. write_code                                       ; 2nd routine (to write codes into the buffer and store the buffer if it gets full)
         SUBS   bitsfree, bitsfree, codelen           ; is bitsfree > code length?
         BLE    buff_full                             ; if not, branch to 'buff_full'
         ORR    bitbuff, bitbuff, code, LSL bitsfree  ; otherwise write code
         MOV    PC, LR                                ; return
     buff_full:
         RSB    bitsfree, bitsfree, #0                ; make 'bitsfree' positive
         ORR    bitbuff, bitbuff, code, LSR bitsfree  ; write 1st part of split code
         STR    bitbuff, [bytestream], #4             ; store 'bitbuff' in memory
         RSB    bitsfree, bitsfree, #32               ; update 'bitsfree'
         MOV    bitbuff, code, LSL bitsfree           ; write 2nd part of split code
         MOV    PC, LR                                ; return
  48. write_finish                                     ; 3rd routine (to finish packing)
         RSBS   temp, bitsfree, #32                   ; temp = number of used bits in 'bitbuff'
     finish_loop:
         MOVGT  bitbuff, bitbuff, ROR #24             ; bring the next most significant byte into bits [7:0]
         STRGTB bitbuff, [bytestream], #1             ; store bytes of 'bitbuff' in memory, starting from the MSB
         SUBGTS temp, temp, #8                        ; update 'temp'
         BGT    finish_loop                           ; loop while temp > 0
         MOV    PC, LR                                ; return
     Note: the code above assumes 'big endian' data transfer.
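The packing logic of the second and third routines can be followed more easily in a high-level model. The Python class below is only a sketch: `BitPacker` is an invented name, the alignment routine is omitted (the output is assumed to start empty), each code is assumed to fit in `codelen` bits, and the buffer is written out in big-endian byte order as the note above states.

```python
MASK32 = 0xFFFFFFFF

class BitPacker:
    def __init__(self):
        self.out = bytearray()   # plays the role of the memory at 'bytestream'
        self.bitbuff = 0         # 32-bit buffer, filled from the MSB down
        self.bitsfree = 32       # number of free bits in bitbuff

    def write_code(self, code, codelen):
        self.bitsfree -= codelen                     # SUBS bitsfree, bitsfree, codelen
        if self.bitsfree > 0:                        # code fits with room to spare
            self.bitbuff |= code << self.bitsfree
            return
        spill = -self.bitsfree                       # bits that do not fit (may be 0)
        self.bitbuff |= code >> spill                # 1st part of the split code
        self.out += self.bitbuff.to_bytes(4, "big")  # STR bitbuff, [bytestream], #4
        self.bitsfree = 32 - spill
        self.bitbuff = (code << self.bitsfree) & MASK32  # 2nd part in an empty buffer

    def finish(self):
        used = 32 - self.bitsfree
        while used > 0:                              # emit remaining bytes, MSB first
            self.out.append(self.bitbuff >> 24)
            self.bitbuff = (self.bitbuff << 8) & MASK32
            used -= 8
        return bytes(self.out)
```

For instance, packing the 3-bit code 0b101 followed by the 8-bit code 0xFF yields the bit string 10111111 111, which is padded with zeros to the two bytes 0xBF 0xE0.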
  49. ARM Programs
     [2] SIMD processing:
     • Let us consider a graphics example of processing multiple 8-bit pixels of an image.
     • Problem: merge two images X and Y to produce a new image Z by scaling X with a/256 and Y with 1 - (a/256), where 0 < a < 256.
     • Let xn, yn and zn denote the nth 8-bit pixel of X, Y and Z:
         zn = (a/256) xn + (1 - a/256) yn
         zn = wn/256, where wn = a (xn - yn) + 256 yn
     • We load four pixels at once into a 32-bit ARM register: xx = [x3, x2, x1, x0].
     • We need two expanded pixels in an ARM register: x = [0, x2, 0, x0].
  50.    IMG_W equ 176
         IMG_H equ 144
         pz  R0   ; pointer to destination image
         px  R1   ; pointer to first image X
         py  R2   ; pointer to second image Y
         a   R3   ; 8-bit scaling factor
         xx  R4   ; holds four pixels of X
         yy  R5   ; holds four pixels of Y
         x   R6   ; holds two expanded pixels of X, i.e. [0, x2, 0, x0]
         y   R7   ; holds two expanded pixels of Y, i.e. [0, y2, 0, y0]
         z   R8   ; holds four pixels of Z
         cnt R9   ; number of remaining pixels
         STMFD sp!, {R4-R8, LR}
         MOV   cnt, #IMG_W * IMG_H
         LDR   mask, =0x00FF00FF
     loop:
         LDR   xx, [px], #4
         LDR   yy, [py], #4
  51.    AND   x, mask, xx
         AND   y, mask, yy
         SUB   x, x, y
         MUL   x, a, x
         ADD   x, x, y, LSL #8
         AND   z, mask, x, LSR #8
         AND   x, mask, xx, LSR #8
         AND   y, mask, yy, LSR #8
         SUB   x, x, y
         MUL   x, a, x
         ADD   x, x, y, LSL #8
         AND   x, mask, x, LSR #8
         ORR   z, z, x, LSL #8
         STR   z, [pz], #4
         SUBS  cnt, cnt, #4
         BGT   loop
         LDMFD sp!, {r4-r8, PC}
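The packed arithmetic in the loop body can be checked against the per-pixel formula with a short Python model. `blend4` is a name chosen for this sketch; it processes the even pixels and the odd pixels in two passes exactly as the assembly does, using the same 0x00FF00FF mask.

```python
MASK32 = 0xFFFFFFFF
LANES = 0x00FF00FF               # selects pixels 0 and 2 of a packed word

def blend4(xx, yy, a):
    """Blend four packed 8-bit pixels: z = (a*x + (256 - a)*y) >> 8 per pixel."""
    def blend_pair(shift):
        x = (xx >> shift) & LANES            # [0, x2, 0, x0] or [0, x3, 0, x1]
        y = (yy >> shift) & LANES
        # SUB, MUL, ADD ..., LSL #8: each 16-bit lane now holds w = a*(x-y) + 256*y
        w = ((x - y) * a + (y << 8)) & MASK32
        return (w >> 8) & LANES              # keep the top byte of each lane
    return blend_pair(0) | (blend_pair(8) << 8)
```

Each lane value w stays below 65536 because w = a·x + (256 − a)·y ≤ 255·256, so the two 16-bit lanes never overflow into each other even though the intermediate difference x − y may be negative.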
  53. ARM7TDMI block diagram
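  54. External Interface through AMBA Bus
     (Figure: block diagram with the ARM core, CP15, MMU, instruction and data cache, write buffer, EmbeddedICE & JTAG, and the AMBA interface; signals shown are virtual address, physical address, instructions and data, AMBA address and AMBA data.)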
  55. JTAG TAP controller:
     • Basically used to test a PCB assembly, interconnect or even a sub-block inside an IC without any physical probe.
     • JTAG scan chain: an embedded solution for testing an IC for certain static faults (shorts, opens, and logic errors).
     • ICs supporting JTAG have four additional pins: Test Clock (TCK), Test Mode Select (TMS), Test Data Input (TDI), and Test Data Output (TDO).
     EmbeddedICE (In-Circuit Emulator):
     • Used to debug the software of an embedded system through breakpoints and watchpoints.
     • A breakpoint is an address at which program execution halts.
     • A watchpoint is a value that may combine address, data or control signals. When a match occurs, a debug event is generated that halts processor execution.
     • Uses JTAG as the transport mechanism to access the on-chip debug modules inside the target CPU.
  56. Data Bus
     • Unidirectional and bidirectional data bus:
         • When BUSEN is HIGH, all instructions and input data are presented on DIN[31:0], whereas output data appears on DOUT[31:0].
         • When BUSEN is LOW, only the bidirectional D[31:0] is used.
     • The unidirectional data bus is used for coprocessor/external IC connection.