ARM AAE - Instruction Sets

4

Share

Upcoming SlideShare
Unit II Arm 7 Introduction
Unit II Arm 7 Introduction
Loading in …3
×
1 of 42
1 of 42

More Related Content

Related Books

Free with a 14 day trial from Scribd

See all

Related Audiobooks

Free with a 14 day trial from Scribd

See all

ARM AAE - Intrustion Sets

  1. 1. SOFTWARE & SYSTEMS DESIGN 3 – Instruction Sets
  2. 2. AGENDA
       • Instruction Sets
         VFP and NEON
         Pipelines
         Cycle Counting
       AAETC3v00 Instruction Sets
  3. 3. INSTRUCTION SET
       • ARM instruction set
         – All instructions are 32-bit
         – Most instructions can be executed conditionally
       • Thumb instruction set
         – 16-bit instruction set
         – No conditional execution (except for branches)
         – Optimized for code density from C code (~65% of ARM code size)
       • Thumb-2 technology
         – Extension to Thumb instruction set
         – Mix of 16-bit and 32-bit instructions
         – Conditional execution via IT instruction
         – Higher performance than Thumb and smaller than ARM
  4. 4. ASSEMBLER SYNTAX
       • Data processing instructions
           <operation><condition> Rd, Rm, <op2>
           ADDEQ r4, r5, r6          // if (EQ) r4 = r5 + r6
           ORR   r2, r3, r6, LSL #4  // r2 = r3 | (r6 << 4)
           SUBS  r5, r7, #4          // r5 = r7 - 4; set flags
           MOV   r4, #7              // r4 = 7
       • Memory access instructions
           <operation><size> Rd, [<address>]
           LDR   r0, [r6, #4]        // r0 = *(r6 + 4)
           STRB  r4, [r7], #8        // *(byte *) r7 = r4; r7 += 8
           <operation><addressing mode> <Rn>{!}, <registers list>
           LDMIA r0, {r1, r2, r7}
           STMFD sp!, {r4-r11, lr}
       • Program flow instructions
           <branch> <label>
           BL foo
           B  bar
  5. 5. DATA PROCESSING INSTRUCTIONS
       • These instructions operate on the contents of registers
         – They DO NOT affect memory
       • Manipulation (has destination register)
           arithmetic: ADD ADC SUB SBC RSB RSC
           logical:    AND EOR ORR ORN BIC
           move:       MOV MVN
           (T2 on the slide marks Thumb-2 only instructions, such as ORN)
       • Comparison (set flags only)
           CMN (ADDS)  CMP (SUBS)  TST (ANDS)  TEQ (EORS)
       • Syntax: <Operation>{S}{<cond>} {Rd,} Rn, Operand2
       • Examples:
           ADD r0, r1, r2   ; r0 = r1 + r2
           TEQ r0, r1       ; if r0 == r1, Z flag will be set
           MOV r0, r1       ; copy r1 to r0
  6. 6. MULTIPLY / DIVIDE
       • 32-bit multiplication (Rn × Rm, optional accumulation +/- Ra): MUL MLA MLS
       • 64-bit multiplication (Rn × Rm, optional accumulation + RdHi:RdLo): UMULL SMULL UMLAL SMLAL
       • Examples:
           MLA r0, r1, r2, r3         ; r0 = r3 + (r1 * r2)
           [U|S]MULL r4, r5, r2, r3   ; r5:r4 = r2 * r3
       • Division (optional in 7-A):
           SDIV r0, r1, r2            ; signed: r0 = r1 / r2
           UDIV r0, r1, r2            ; unsigned: r0 = r1 / r2
  7. 7. BIT MANIPULATION INSTRUCTIONS
           BFI  r0, r0, #9, #6    ; Bit Field Insert
           UBFX r1, r0, #18, #7   ; Bit Field Extract (result is zero extended)
           BFC  r1, #3, #4        ; Bit Field Clear
           RBIT r2, r1            ; Reverse Bit Order
       [Figure: 32-bit register diagrams (bit 31 down to bit 0) showing r0, r1 and r2 before and after each instruction]
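The bit-field instructions on this slide can be modelled in a few lines of Python. This is only a sketch of the architectural behaviour on 32-bit values; the function names are illustrative and the register diagrams from the slide are not reproduced.

```python
MASK32 = 0xFFFFFFFF

def bfi(rd, rn, lsb, width):
    """BFI Rd, Rn, #lsb, #width: copy the low `width` bits of rn into rd at bit `lsb`."""
    field = ((1 << width) - 1) << lsb
    return ((rd & ~field) | ((rn << lsb) & field)) & MASK32

def ubfx(rn, lsb, width):
    """UBFX Rd, Rn, #lsb, #width: extract `width` bits starting at `lsb`, zero extended."""
    return (rn >> lsb) & ((1 << width) - 1)

def bfc(rd, lsb, width):
    """BFC Rd, #lsb, #width: clear `width` bits starting at bit `lsb`."""
    field = ((1 << width) - 1) << lsb
    return rd & ~field & MASK32

def rbit(rm):
    """RBIT Rd, Rm: reverse the order of all 32 bits."""
    return int(f"{rm & MASK32:032b}"[::-1], 2)
```

For example, `ubfx(0x12345678, 8, 8)` picks out the byte `0x56`, and applying `rbit` twice returns the original value.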
  8. 8. BYTE REVERSAL
       • Byte Reversal Instructions
           REV{cond} Rd, Rm     Reverses the bytes in a word
           REV16{cond} Rd, Rm   Reverses the bytes in each halfword
           REVSH{cond} Rd, Rm   Reverses the bottom two bytes, and sign extends to 32 bits
       • V6 and later
           REV r0, r0
       • Pre-V6
           EOR r1, r0, r0, ROR #16
           BIC r1, r1, #0xFF0000
           MOV r0, r0, ROR #8
           EOR r0, r0, r1, LSR #8
       [Figure: bytes 3 2 1 0 of r0 reordered to 0 1 2 3 by REV r0, r0]
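To see why the four-instruction pre-V6 sequence is equivalent to a single REV, it helps to step through it on plain integers. The sketch below mirrors each instruction with one Python statement; the helper names are illustrative, and REV16 and REVSH are modelled alongside for comparison.

```python
MASK32 = 0xFFFFFFFF

def ror(x, n):
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def rev_pre_v6(r0):
    """Byte-reverse r0 using the pre-V6 instruction sequence from the slide."""
    r1 = r0 ^ ror(r0, 16)          # EOR r1, r0, r0, ROR #16
    r1 &= ~0x00FF0000 & MASK32     # BIC r1, r1, #0xFF0000
    r0 = ror(r0, 8)                # MOV r0, r0, ROR #8
    r0 ^= r1 >> 8                  # EOR r0, r0, r1, LSR #8
    return r0

def rev(x):
    """REV: reverse the four bytes of a word."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")

def rev16(x):
    """REV16: reverse the two bytes inside each halfword."""
    return (((x & 0x00FF00FF) << 8) | ((x >> 8) & 0x00FF00FF)) & MASK32

def revsh(x):
    """REVSH: reverse the bottom two bytes and sign extend to 32 bits."""
    low = x & 0xFFFF
    swapped = ((low & 0xFF) << 8) | (low >> 8)
    return swapped | 0xFFFF0000 if swapped & 0x8000 else swapped
```

Running `rev_pre_v6` and `rev` over a handful of test words should give identical results, which is a quick way to convince yourself the EOR/BIC trick is correct.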
  9. 9. SIMD
       • ARMv6 added a number of instructions which perform SIMD (Single Instruction Multiple Data) operations using ARM registers
         – Includes instructions for addition, subtraction, multiplication and sum of absolute differences
         – Instructions can work on four 8-bit quantities, or two 16-bit quantities
         – Signed/unsigned and saturating versions available of many instructions
         – CPSR GE bits used instead of normal ALU flags
       • There are instructions for packing (PKHBT/PKHTB) and unpacking (UXTH/UXTB) registers
       [Figure: UADD16 Rd, Rm, Rs adds the two halfwords of Rm and Rs independently into Rd; the upper sum sets GE[3:2], the lower sum sets GE[1:0]]
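The UADD16 operation in the figure can be sketched as two independent 16-bit additions. The model below assumes both inputs are unsigned 32-bit values and returns the GE bits as a 4-bit integer; the function name and return shape are illustrative choices, not part of any ARM tool.

```python
def uadd16(rm, rs):
    """Model UADD16 Rd, Rm, Rs: returns (rd, ge) for unsigned 32-bit inputs."""
    lo = (rm & 0xFFFF) + (rs & 0xFFFF)   # lower halfword sum, may carry out
    hi = (rm >> 16) + (rs >> 16)         # upper halfword sum, may carry out
    rd = ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)
    ge = 0
    if lo >= 0x10000:                    # carry out of the lower lane sets GE[1:0]
        ge |= 0b0011
    if hi >= 0x10000:                    # carry out of the upper lane sets GE[3:2]
        ge |= 0b1100
    return rd, ge
```

Note that a carry out of the lower halfword does not propagate into the upper halfword; it is only recorded in the GE bits.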
  10. 10. SATURATED MATH AND CLZ
       • Support for Saturated Arithmetic
         – Targeted at DSP & control applications
         – Overflow sets Q flag (sticky) not V, and sets result to +/- max value (0x7FFFFFFF or 0x80000000)
           QSUB{cond} Rd, Rm, Rn    ; Rd = saturate(Rm - Rn)
           QADD{cond} Rd, Rm, Rn    ; Rd = saturate(Rm + Rn)
           QDSUB{cond} Rd, Rm, Rn   ; Rd = saturate(Rm - saturate(Rn * 2))
           QDADD{cond} Rd, Rm, Rn   ; Rd = saturate(Rm + saturate(Rn * 2))
       • Count Leading Zeros
           CLZ{cond} Rd, Rm
         – Returns number of unset bits before the most significant set bit
         – Example (bit 31 to bit 0): 0000 0000 0010 1100 0100 1110 0111 0100; CLZ returns 10 in this case
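A small Python model makes the saturating behaviour and the CLZ example concrete. This is a sketch only: values are treated as Python integers in the signed 32-bit range, and the Q flag is returned as a boolean rather than being sticky across calls.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000

def saturate(value):
    """Clamp to the signed 32-bit range; returns (result, q) where q means saturation occurred."""
    if value > INT32_MAX:
        return INT32_MAX, True
    if value < INT32_MIN:
        return INT32_MIN, True
    return value, False

def qadd(rm, rn):
    """QADD Rd, Rm, Rn: Rd = saturate(Rm + Rn)."""
    return saturate(rm + rn)

def qsub(rm, rn):
    """QSUB Rd, Rm, Rn: Rd = saturate(Rm - Rn)."""
    return saturate(rm - rn)

def qdadd(rm, rn):
    """QDADD Rd, Rm, Rn: Rd = saturate(Rm + saturate(Rn * 2))."""
    doubled, q1 = saturate(rn * 2)
    result, q2 = saturate(rm + doubled)
    return result, q1 or q2

def clz(rm):
    """CLZ Rd, Rm for an unsigned 32-bit value."""
    return 32 - rm.bit_length()
```

The bit pattern on the slide corresponds to `0x002C4E74`, and `clz(0x002C4E74)` gives 10, matching the slide.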
  11. 11. SATURATION
       • Saturate a value to a specified bit position (effectively saturating to any power of 2)
         – USAT - Unsigned saturate 32-bit
           • Syntax: USAT Rd, #sat, Rm {shift}
           • Operation: Rd = Saturate(Shift(Rm), #sat)
         – Variants
           SSAT - signed saturation
           USAT16 - saturates two 16-bit unsigned halfwords (no shift allowed)
           SSAT16 - signed saturation of two 16-bit halfwords (no shift allowed)
         – #sat is specified as an immediate value in the range 0 to 31
         – {shift} is optional and is limited to LSL or ASR
         – Q flag is set if saturation occurs
       [Figure: bit patterns at the saturation position showing max and min for unsigned saturation, and max (0...011...1) and min (1...100...0) for signed saturation]
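As a rough illustration of saturating to a bit position, the sketch below clamps a Python integer to the unsigned range 0 to 2^sat - 1 (USAT) or the signed range -2^(sat-1) to 2^(sat-1) - 1 (SSAT). The optional shift is left out, and the boolean return value stands in for the Q flag; both function names are illustrative.

```python
def usat(value, sat):
    """Unsigned saturate to `sat` bits: result in [0, 2**sat - 1]; returns (result, q)."""
    upper = (1 << sat) - 1
    result = min(max(value, 0), upper)
    return result, result != value

def ssat(value, sat):
    """Signed saturate to `sat` bits: result in [-2**(sat-1), 2**(sat-1) - 1]; returns (result, q)."""
    upper = (1 << (sat - 1)) - 1
    lower = -(1 << (sat - 1))
    result = min(max(value, lower), upper)
    return result, result != value
```

For instance, `usat(300, 8)` clamps to 255 and `ssat(-200, 8)` clamps to -128, with the second element of each result reporting that saturation occurred.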
  12. 12. SINGLE / DOUBLE REGISTER DATA TRANSFER
       • Use to move data between one or two registers and memory
           LDRD   STRD   Doubleword
           LDR    STR    Word
           LDRB   STRB   Byte
           LDRH   STRH   Halfword
           LDRSB         Signed byte load
           LDRSH         Signed halfword load
       • Syntax:
         – LDR{<size>}{<cond>} Rd, <address>
         – STR{<size>}{<cond>} Rd, <address>
       • Example:
         – LDRB r0, [r1]   ; load bottom byte of r0 from the byte of memory at address in r1
       [Figure: a value loaded from memory into bits 31 to 0 of Rd; any remaining space is zero filled or sign extended]
  13. 13. ADDRESSING MEMORY
       • The address accessed by LDR/STR is specified by a base register with an optional offset
         – Base register only (no offset)
             LDR r0, [r1]
         – Base register plus constant
             LDR r0, [r1, #8]
         – Base register, plus register (optionally shifted by an immediate value)
             LDR r0, [r1, r2]
             LDR r0, [r1, r2, LSL #2]
         – The offset can be either added or subtracted from the base register
             LDR r0, [r1, #-8]
             LDR r0, [r1, -r2]
             LDR r0, [r1, -r2, LSL #2]
       [Figure: address = r1 +/- (#8 or r2, LSL #2); the word at that memory address is loaded into r0]
  14. 14. PRE- AND POST-INDEXED ADDRESSING
       • Pre-indexed (add offset before memory access)
           LDR r0, [r1, #12]{!}
         – If ‘!’ present, update base register (r1)
       • Post-indexed (add offset after memory access)
           LDR r0, [r1], #12
         – Always update base register (r1)
       [Figure: pre-indexed access uses address r1 + #12; post-indexed access uses address r1 and then writes r1 + #12 back to r1]
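The difference between the two forms is easiest to see by tracking the base register. The following sketch models registers and word-addressed memory as plain dictionaries; the helper names and the dictionary representation are assumptions made for illustration only.

```python
def ldr_pre(regs, mem, rd, rn, offset, writeback=False):
    """LDR rd, [rn, #offset]{!}: the offset is applied before the access."""
    address = regs[rn] + offset
    regs[rd] = mem[address]
    if writeback:              # the '!' form writes the new address back to rn
        regs[rn] = address

def ldr_post(regs, mem, rd, rn, offset):
    """LDR rd, [rn], #offset: access at rn first, then always update rn."""
    regs[rd] = mem[regs[rn]]
    regs[rn] += offset

# Hypothetical memory contents and starting register values.
mem = {0x1000: 11, 0x100C: 22}
regs = {"r0": 0, "r1": 0x1000}
ldr_pre(regs, mem, "r0", "r1", 12)        # r0 = 22, r1 stays 0x1000
ldr_post(regs, mem, "r0", "r1", 12)       # r0 = 11, r1 becomes 0x100C
```

The model ignores the case where the destination and base register are the same, which the architecture treats specially when writeback is used.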
  15. 15. MULTIPLE REGISTER DATA TRANSFER
       • These instructions move data between multiple registers and memory
       • Syntax
           <LDM|STM>{<addressing_mode>}{<cond>} Rb{!}, <register list>
       • 4 addressing modes
         – Increment after/before (IA, IB)
         – Decrement after/before (DA, DB)
       • Also PUSH/POP, equivalent to STMDB/LDMIA with SP! as base register
       • Example
           LDM r10, {r0,r1,r4}   ; load registers, using r10 base
           PUSH {r4-r6,pc}       ; store registers, using SP base
       [Figure: placement of r0, r1 and r4 in memory relative to the base register (Rb = r10) for IA (the default), IB, DA and DB, drawn with increasing address upwards]
  16. 16. INSTRUCTIONS FOR LOADING CONSTANTS
       • The assembler provides some instructions for loading values into registers
         – These are the recommended mechanisms for loading constants into registers
       • PC- or register-relative constants
           ADR Rn, label
         – Add or subtract an immediate value to or from the PC to generate the address of the label into the specified register, using one instruction
         – ADRL pseudo instruction uses two instructions, giving a better range
         – Can be used to generate addresses for position independent code (but only if in same code section)
         – Constant determined at run time
       • Absolute constants
           LDR Rn, =<constant>
           LDR Rn, =label
         – Pseudo instruction
         – Assembler will use optimal sequence to generate constant into specified register (one of MOV, MVN or an LDR from a literal pool)
         – Can load to the PC, causing a branch
         – Use for absolute addressing and references outside the current section (resulting in position dependent code)
         – Constant determined at assembly or link time
  17. 17. LDR= EXAMPLES
       • The following examples show how the LDR= pseudo instruction makes code more readable, portable and flexible
           Code                     Disassembly
           LDR r0, =0x2543          MOV r0, #0x2543
           LDR r0, =0xFFFF43FF      MVN r0, #0xBC00
           LDR r0, =0xFFFFF5        LDR r0, [pc, #xx]
                                    ...
                                    DCD 0xFFFFF5
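One plausible way to reason about which form the assembler picks is to test whether the constant (or its bitwise inverse) fits the ARM "8-bit value rotated right by an even amount" immediate encoding, then fall back to a 16-bit MOV immediate or a literal pool load. The sketch below is an assumption-laden model of that decision, not the algorithm of any particular assembler; both function names are hypothetical and r0 is hard-coded as the destination.

```python
MASK32 = 0xFFFFFFFF

def is_modified_immediate(value):
    """True if value is an 8-bit constant rotated right by an even number of bits."""
    for rot in range(0, 32, 2):
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated <= 0xFF:
            return True
    return False

def choose_ldr_expansion(value):
    """Guess the instruction an LDR r0, =value pseudo instruction expands to."""
    value &= MASK32
    if is_modified_immediate(value):
        return f"MOV r0, #0x{value:X}"
    inverted = ~value & MASK32
    if is_modified_immediate(inverted):
        return f"MVN r0, #0x{inverted:X}"
    if value <= 0xFFFF:
        # Assumes a core with the 16-bit immediate form of MOV (ARMv6T2 or later).
        return f"MOV r0, #0x{value:X}"
    return "LDR r0, [pc, #<offset>]  ; constant placed in a literal pool"
```

Under these assumptions the three constants on the slide map to MOV, MVN and a literal pool load respectively, which agrees with the disassembly column.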
  18. 18. BRANCH INSTRUCTIONS
       • Branch instructions have the following format
           B{<cond>} label
         – Might not cause a pipeline flush (branch prediction)
         – Branch range depends on instruction set and width
       • A BL instruction additionally generates a return address in r14 (lr)
         – Returning is performed by restoring the program counter (pc) from lr
       • Example: C source
           void func1 (void)
           {
             :
             func2();
             :
           }
         Corresponding assembly
           func1                func2
             :                    :
             BL func2             :
             :                    BX lr
  19. 19. BRANCH RANGES
       • The range of a branch instruction depends on which instruction set is being used
       • It also varies between different types of branch
                           ARM       Thumb
           B               ±32MB     ±16MB
           CBZ/CBNZ                  126 bytes
           BL/BLX (imm)    ±32MB     ±16MB
           BLX (reg)       Any       Any
           BX              Any       Any
           TBB                       510 bytes
           TBH                       131070 bytes
       “Any” indicates an instruction which can branch to any address in the 4GB address space
  20. 20. READING AND WRITING PC
       • In general, writing PC causes a branch to the value written
         – Bit zero controls the execution state (ARM or Thumb) at the destination
         – The bottom bit of the destination address is always forced to zero
         – Writing a value with ‘10’ in the bottom two bits results in unpredictable behavior
         – Note that architectures prior to ARMv7 do not change state when the PC is written directly
       • Loading PC from memory behaves similarly
         – Architectures prior to ARMv5T do not change state when the PC is loaded from memory
       • The PC reads as the address of the current instruction plus an offset
         – In ARM state, the offset is 8
         – In Thumb state, the offset is 4
         – This reflects the 3-stage structure of the ARM7TDMI pipeline
         – In Thumb state, the bottom bit always reads as zero
         – In ARM state, the bottom two bits will always read as zero
  21. 21. CHANGING STATE
       • Changing between ARM and Thumb states (or “interworking”) can be carried out using the Branch Exchange instruction
           BX Rn
           BLX Rn
         – Bit 0 of Rn determines the exchange behavior
           • Unset (0) - change to (or remain in) ARM state
           • Set (1) - change to (or remain in) Thumb state
       • Branch and Link with Exchange
           BLX offset   ; ARM/Thumb instruction which always changes state (and sets LR)
         – Used to branch to a subroutine which is known to be in the opposite instruction set
         – When branching to imported labels use BL; the linker will substitute BLX if necessary
       • All instructions which modify the PC can cause a state change
         – Depending on bit 0 of the result
         – For data processing instructions, state changes only if S variant not used
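The role of bit 0 in an interworking branch can be sketched as a tiny function. It takes the value held in Rn and returns the destination address with bit 0 cleared together with the resulting state; the function name and the use of an exception for the unpredictable case are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def bx(rn_value):
    """Model BX Rn: returns (destination address, state) based on bit 0 of Rn."""
    thumb = bool(rn_value & 1)
    if not thumb and rn_value & 0b10:
        # Bottom two bits '10': the slides describe this as unpredictable.
        raise ValueError("unpredictable: ARM-state target is not word aligned")
    address = rn_value & ~1 & MASK32     # bit 0 is never part of the address
    return address, "Thumb" if thumb else "ARM"
```

So `bx(0x8001)` lands at `0x8000` in Thumb state, while `bx(0x8000)` lands at the same address in ARM state.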
  22. 22. IF-THEN
       • Thumb only, makes the next 1-4 instructions conditional
       • Syntax
           IT{T|E}{T|E}{T|E} <cond>
         – Any condition code may be used
         – Doesn’t affect condition flags
         – 16-bit instructions in the IT block do not affect condition flags (except CMP, CMN & TST)
         – 32-bit instructions do affect condition flags (normal rules apply)
         – No need to write this instruction: the assembler will insert it for you where necessary
       • Current “if-then status” stored in CPSR
         – Conditional block may be safely interrupted and returned to
         – Not recommended to branch into or out of ‘if-then’ block
       • Example
           ; if (r0 == 0)
           ;   r0 = *r1 + 2;
           ; else
           ;   r0 = *r2 + 4;

           ; if
           CMP   r0, #0
           ITTEE EQ
           ; then
           LDREQ r0, [r1]
           ADDEQ r0, #2
           ; else
           LDRNE r0, [r2]
           ADDNE r0, #4
  23. 23. STATUS REGISTER ACCESS
       • MRS and MSR allow contents of CPSR/SPSR to be transferred to/from a general purpose register or be set to an immediate value
         – MSR allows the whole status register, or just parts of it, to be updated
           MRS r0,CPSR        ; read CPSR into r0
           BIC r0,r0,#0x80    ; clear bit 7 to enable IRQ
           MSR CPSR_c,r0      ; write modified value to ‘c’ byte only
         – User mode programs may read all bits of CPSR but may only change the flag bits
       • CPS can be used to directly modify some bits in the CPSR
         – These are related to interrupt enable/disable and operating mode
       • SETEND instruction selects the endianness of data accesses
         – For use in systems with mixed endian data (e.g. peripherals)
           SETEND BE
           LDR r0, [r7], #4   ; big-endian
           SETEND LE
           LDR r1, [r7], #4   ; little-endian
  24. 24. SYSTEM CONTROL INSTRUCTIONS
       • ARM uses coprocessors for “internal functions” so as not to enforce a particular memory map
         – System Control Coprocessor: cp15
           • Used for processor configuration: System ID, caches, MMU, TCMs, etc.
         – Debug Coprocessor: cp14
           • Can be used to access debug control registers
         – VFP and NEON: cp10 and cp11
       • In earlier versions of the architecture, designers were permitted to add external coprocessors
         – This is not permitted in ARMv7 architecture profiles
  25. 25. AGENDA
         Instruction Sets
       • VFP and NEON
         Pipelines
         Cycle Counting
  26. 26. VFP ARCHITECTURE
       • VFP (Vector Floating Point) is ARM’s floating point architecture
         – There have been 4 versions of the architecture to date (VFPv1 is no longer supported)
         – VFPv2 is supported by ARM9 and ARM11 processor families
         – VFPv3 and VFPv4 are optional extensions to the ARMv7-AR architecture profiles
       • VFPv3 (Cortex-A8, Cortex-A9, Cortex-R4, Cortex-R5)
         – Can be implemented with either 16 (VFPv3-D16) or 32 (VFPv3-D32) registers
         – Can be extended with half-precision conversion functions
       • VFPv4 (Cortex-A5, Cortex-A7 and Cortex-A15)
         – Includes half-precision conversion functions
         – Supports fused multiply-add operations
  27. 27. THE NEON ARCHITECTURE EXTENSION
       • NEON refers to the Advanced SIMD instruction set extension
         – Optional extension to ARMv7-AR architecture profiles
         – The NEON register set is separate from the core register bank
         – NEON instructions support parallel operations on vectors of elements held in registers
         – Advanced SIMDv1 is the base NEON architecture
           • Can be extended with half-precision conversion functions
         – Advanced SIMDv2 adds fused multiply-add operations
  28. 28. AGENDA
         Instruction Sets
         VFP and NEON
       • Pipelines
         Cycle Counting
  29. 29. HISTORIC PIPELINES
       [Figure: pipeline stage diagrams; stage names as recovered from the slide]
       ARM7:      Fetch, Decode, Execute
       ARM9:      Fetch, Decode, Execute, Memory, Writeback
       ARM1136:   Fetch 1, Fetch 2, Decode, Issue, followed by parallel paths:
                    Shift, ALU, Saturate, Writeback
                    MAC 1, MAC 2, MAC 3, Writeback
                    Address, Data 1, Data 2, Writeback
       Cortex-A9: Fetch 1, Fetch 2, Fetch 3, Queue, Decode, Rename, Issue, followed by parallel paths:
                    Execute 1, Execute 2, Writeback (two such paths)
                    MAC 1, MAC 2, Writeback
                    Address, Load/Store, Writeback
                    Data Engine
  30. 30. ARM7TDMI PIPELINE (DATA PROC)
       [Figure: pipeline occupancy chart over cycles 1 to 6 for the operations ADD, SUB, MOV, AND, ORR, EOR, CMP, RSB; each operation passes through Fetch, Decode and Execute in consecutive cycles]
       • In this example it takes 6 clock cycles to execute 6 instructions
       • All operations here are on registers (single cycle execution)
       • Clock cycles per Instruction (CPI) = 1
  31. 31. ARM7TDMI PIPELINE (LDR)
       [Figure: pipeline occupancy chart over cycles 1 to 6 for the operations ADD, SUB, LDR, MOV, AND, ORR; LDR occupies Execute, Data and Writeback after Fetch and Decode, holding up the instructions behind it]
       • In this example it takes 6 clock cycles to execute 4 instructions
       • Clock cycles per Instruction (CPI) = 1.5
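The CPI figures quoted for the ARM7TDMI follow from simple arithmetic: CPI is total cycles divided by instructions completed. The sketch below uses a deliberately simplified cost model, assuming 1 cycle per register data-processing instruction and 3 cycles per LDR (Execute, Data, Writeback as drawn on the slide); the table and function names are illustrative.

```python
# Simplified per-instruction cycle costs (an assumption for illustration).
CYCLES = {"ADD": 1, "SUB": 1, "MOV": 1, "AND": 1, "ORR": 1, "EOR": 1, "LDR": 3}

def total_cycles(program):
    """Sum the cycle cost of each mnemonic in a straight-line program."""
    return sum(CYCLES[op] for op in program)

def cpi(program):
    """Clock cycles per instruction for the program."""
    return total_cycles(program) / len(program)

data_only = ["ADD", "SUB", "MOV", "AND", "ORR", "EOR"]
with_load = ["ADD", "SUB", "LDR", "MOV"]
print(cpi(data_only))   # 6 cycles / 6 instructions
print(cpi(with_load))   # 6 cycles / 4 instructions
```

This kind of table-driven counting only works for a fully deterministic pipeline; it ignores branches, interrupts and memory wait states.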
  32. 32. ARM7TDMI PIPELINE (BRANCH)
       [Figure: pipeline occupancy chart over cycles 1 to 5]
           Address   Operation
           0x8000    BL    (Fetch, Decode, Execute, Linkret, Adjust)
           0x8004    X     (Fetch, Decode; discarded)
           0x8008    XX    (Fetch; discarded)
           0x8FEC    ADD   (Fetch, Decode, Execute)
           0x8FF0    SUB   (Fetch, Decode, Execute)
           0x8FF4    MOV   (Fetch, Decode)
       • Refilling the pipeline
       • Note that the core is executing in ARM state
  33. 33. ARM7TDMI PIPELINE (INTERRUPT)
       [Figure: pipeline occupancy chart over cycles 1 to 8 with an IRQ arriving during execution; stages shown include Fetch, Decode, Execute, Linkret and Adjust]
           Address   Operation
           0x8000    ADD
           0x8004    SUB
           0x8008    MOV
           0x800C    X
           0x0018    B (to 0xAF00)
           0x001C    XX
           0x0020    XXX
           0xAF00    STMFD
           0xAF04    MOV
           0xAF08    LDR
       • IRQ interrupt minimum latency (service routine entry) = 7 cycles
  34. 34. ARM9TDMI PIPELINE (LDR INTERLOCK)
       [Figure: pipeline occupancy chart over cycles 1 to 9; operations as recovered from the slide: ADD R1, R1, R2; SUB R3, R4, R1; ORR R8, R3, R4; AND R6, R3, R1; LDR R4, [R7]; EOR R3, R1, R2]
       Key: F - Fetch, D - Decode, E - Execute, I - Interlock, M - Memory, W - Writeback
       • In this example it takes 7 clock cycles to execute 6 instructions, CPI of 1.2
       • The LDR instruction immediately followed by a data operation using the same register causes an interlock
  35. 35. ARM9TDMI PIPELINE (LDR)
       [Figure: pipeline occupancy chart over cycles 1 to 9; operations as recovered from the slide: ADD R1, R1, R2; SUB R3, R4, R1; ORR R8, R3, R4; AND R6, R3, R1; LDR R4, [R7]; EOR R3, R1, R2]
       Key: F - Fetch, D - Decode, E - Execute, I - Interlock, M - Memory, W - Writeback
       • In this example it takes 6 cycles to execute 6 instructions, CPI of 1
       • Cycle 4 has simultaneous I & D memory accesses
       • Cycle 5: R4 data available to ORR before written to register
         – Internal forwarding paths are used
  36. 36. CORTEX-R4 PIPELINE
       [Figure: pipeline block diagram; labels as recovered from the slide]
       Common decode pipeline: Prefetch Unit, Fetch1, Fetch2, Instruction queue, Pre-Decode, Decode, Issue
       Branch stages: Branch1, Branch2, Branch3
       4 parallel back end pipelines:
           Shift, ALU, Sat, Wr
           MAC 1, MAC 2, MAC 3, Wr
           AGU, Data Cache 1, Data Cache 2, Format, Wr
           FPU (Optional): FPU0, FPU1, FPU2
       • Dual issue can occur for certain instruction sequences
       • Enabled at reset, can be disabled in CP15
       • AGU = Address Generation Unit
       • Separate divide pipeline for hardware DIV instruction
  37. 37. CORTEX-A9 PIPELINE
       [Figure: pipeline block diagram; labels as recovered from the slide: Prefetch Unit (instruction address, 64-bit instruction fetching), IQ, De, Re, ISS, BM; Main (P0): Ex1, Ex2, WB; Dual (P1): Ex1, Ex2, WB; Mac (M): M1, M2, WB; Load/store (LS): AGU, LSU, WB; Data Engine (DE)]
       • IQ: Instruction Queue
       • Re: Register renaming
       • BM: Branch Monitor
       • P0: Main execution pipeline
       • M: MAC pipeline
       • P1: Secondary (“dual”) execution pipeline
       • AGU: Address Generation Unit
       • LSU: Load/Store Unit
       • DE: Data Engine - (NEON and/or FPU) pipeline
  38. 38. CORTEX-A15 AND CORTEX-A7
       [Figure: two pipeline block diagrams; labels as recovered from the slide]
       Cortex-A15: Fetch; Loop Cache; Decode, Rename & Dispatch; Queue; Issue; then Integer, Integer, Multiply, Floating-Point / NEON, Branch, Load, Store; Writeback
       Cortex-A7: Fetch; Decode; Queue; Issue (Dual Issue); then Integer, Multiply, Floating-Point / NEON, Load/Store; Writeback
       • Cortex-A15 and Cortex-A7 form an architecturally-identical pair
       • Cortex-A15 is optimized for performance
       • Cortex-A7 is optimized for power consumption
       • Together they can be built into a big.LITTLE configuration
  39. 39. AGENDA
         Instruction Sets
         VFP and NEON
         Pipelines
       • Cycle Counting
  40. 40. CYCLE COUNTING
       • Early pipelines (e.g. ARM7TDMI) were entirely deterministic and predictable
       • Later pipelines introduce interlocks and inter-instruction dependencies
         – Address, resource and data dependencies are all possible
         – Interactions between instructions become very complicated
       • On ARMv7 cores, manual cycle counting is not really possible, so one of the following is needed instead:
         – Cycle-accurate trace
         – Simulation models
         – Performance Monitoring Unit (see later)
  41. 41. PERFORMANCE MONITORING HARDWARE
       • ARMv7-A cores include a performance monitoring unit (PMU)
       • A PMU provides a non-intrusive method of collecting execution information from the core
         – Enabling the PMU does not change the timing of the core
       • The PMU provides:
         – Cycle counter – counts execution cycles (optional 1/64 divider)
         – Programmable event counters
           • The number of counters and available events vary between cores
         – The PMU can be configured to generate interrupts if a counter overflows
       • Some examples common to most cores:
         – Cache hits or misses, TLB misses (on MMU cores), branch prediction (correct/incorrect predictions), number of instructions executed, etc.
       • Some events are architecturally defined while others are core-dependent
         – Check the ARM ARM and your core’s TRM for a full list
  42. 42. SOFTWARE & SYSTEMS DESIGN 3 – Instruction Sets
