Q4.11: ARM Architecture


Resource: Q4.11
Name: ARM Architecture
Date: 28-11-2011
Speaker: David Brash

Published in: Technology

  1. ARMv7-A Architecture Overview
     David Brash, Architecture Program Manager, ARM Ltd.
  2. ARM Architecture roadmap
     • Announced Aug-2010:
       • Large Physical Address Extension (PA => 40 bits)
       • Virtualization Extension
     • (v8 builds on these)
  3. ARMv7-A registers
     [Figure: register file shown per processor mode]
     • General-purpose registers: R0 to R12, R13 (SP), R14 (LR); in the Hyp column R14 is labelled LR_(BL)
     • Banked registers:
       • SP_svc, LR_svc
       • SP_irq, LR_irq
       • SP_und, LR_und
       • SP_abt, LR_abt
       • SP_fiq, LR_fiq, R8_fiq to R12_fiq
       • SP_hyp
     • Status and link registers: CPSR; SPSR_svc, SPSR_abt, SPSR_und, SPSR_irq, SPSR_fiq, SPSR_hyp; ELR_hyp
     • CPSR bits:
       • Flags: NZCVQ
       • IT[7:0] status bits
       • GE[3:0] Adv SIMD flags
       • J (Jazelle)
       • T (Thumb)
       • E (endian)
       • I, A, F masks
       • M[4:0] (mode)
  4. Classic ARM MMU
     • 32-bit physical address space
     • 2-level translation tables
       • Pointed to by TTBR0 (user mappings) and TTBR1 (kernel mappings, but with restrictions on the user/kernel memory split)
       • 32-bit page table entries
     • 1st level contains 4096 entries (4 pages for the PGD)
       • 1MB section per entry, or
       • Pointer to a 2nd level table
       • Implementation-defined 16MB supersections
     • 2nd level contains 256 entries, each pointing to a 4KB page
       • 1KB per 2nd level page table
     • ARMv6/v7 introduced TEX remapping
       • Memory type becomes a 3-bit index
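The table sizes on the Classic ARM MMU slide follow from how a 32-bit virtual address is sliced. The sketch below is a minimal illustration for the small-page (4KB) case only; the function and constant names are hypothetical, and sections and supersections are not modelled.

```python
ENTRY_BYTES = 4  # 32-bit page table entries

def classic_split(va):
    """Split a 32-bit VA into (1st level index, 2nd level index, page offset)."""
    l1_index = (va >> 20) & 0xFFF  # 4096 entries, each covering 1MB
    l2_index = (va >> 12) & 0xFF   # 256 entries, each covering 4KB
    offset = va & 0xFFF            # byte offset inside the 4KB page
    return l1_index, l2_index, offset

L1_TABLE_BYTES = 4096 * ENTRY_BYTES  # 16KB, i.e. 4 pages for the PGD
L2_TABLE_BYTES = 256 * ENTRY_BYTES   # 1KB per 2nd level table
```

A 2nd level table of 1KB does not fill a 4KB page, which is the layout problem the Limitations slide returns to.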
  5. Classic ARM MMU (cont'd)
     • Other features
       • XN (eXecute Never) bit
       • Different memory types: Normal (cacheable and non-cacheable), Device, Strongly Ordered
       • Shareability attributes for SMP systems
     • ASID-tagged TLB (ARMv6 onwards)
       • Avoids TLB flushing at context switch
       • 8-bit ASID value assigned to an mm_struct
       • Dynamically allocated (there can be more than 256 processes)
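One common way to hand out an 8-bit ASID to more than 256 address spaces is to pair it with a generation counter and flush the TLB only when the ASID space wraps. The sketch below illustrates that idea under stated assumptions; it is not the Linux implementation, and the class name, the reserved ASID 0 and the flush counter are all hypothetical.

```python
ASID_BITS = 8
NUM_ASIDS = 1 << ASID_BITS

class AsidAllocator:
    """Generation-based ASID allocation (illustrative sketch only)."""

    def __init__(self):
        self.generation = 1
        self.next_asid = 1      # ASID 0 kept reserved in this sketch (assumption)
        self.assigned = {}      # mm identifier -> (generation, asid)
        self.tlb_flushes = 0    # counts simulated full TLB flushes

    def asid_for(self, mm):
        entry = self.assigned.get(mm)
        if entry is not None and entry[0] == self.generation:
            return entry[1]     # still valid: no TLB flush at context switch
        if self.next_asid == NUM_ASIDS:
            # ASID space exhausted: start a new generation and flush once
            self.generation += 1
            self.next_asid = 1
            self.tlb_flushes += 1
        asid = self.next_asid
        self.next_asid += 1
        self.assigned[mm] = (self.generation, asid)
        return asid
```

With this scheme a context switch between two address spaces of the same generation needs no flush, and 300 address spaces cost a single flush at the point where the 255 usable ASIDs run out.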
  6. Classic ARM MMU Limitations
     • Only 32-bit physical address space
       • Growing market requiring more than 4GB of physical address space (both RAM and peripherals)
       • Supersections can be used to allow up to 40-bit addresses using 16MB sections (implementation-defined feature)
     • Prior to ARMv6, no direct link between access permissions and Linux PTE bits
       • Simplified permission model introduced with ARMv6 but not used by Linux
     • 2nd level page table does not fill a full 4KB page
     • ARM Linux workarounds
       • Separate array for the Linux PTE bits
       • 1st level entry consists of two 32-bit locations pointing to 2KB of 2nd level page table entries
  7. The ARMv7-A Virtualization Extensions
     Popek and Goldberg summarized the concept in 1974:
     • Equivalence/Fidelity
       • A program running under the hypervisor should exhibit a behaviour essentially identical to that demonstrated when running on an equivalent machine directly.
     • Resource control / Safety
       • The hypervisor should be in complete control of the virtualized resources.
     • Efficiency/Performance
       • A statistically dominant fraction of machine instructions must be executed without hypervisor intervention.
     "Formal Requirements for Virtualizable Third Generation Architectures". Communications of the ACM 17
  8. ARM hypervisor support philosophy
     • Virtual machine (VM) scheduling and resource sharing
       • New Hyp mode for Hypervisor execution
     • Minimise Hypervisor intervention for "routine" GuestOS tasks
       • Guest OS page table management
       • Interrupt control
       • Guest OS device drivers
     • Syndrome support for trapping key instructions
       • GuestOS load/store emulation
       • Privileged control instructions
     • System instructions (MRS, MSR) to read/write key registers
     • Virtualized ID register management
  9. ARM virtualization extension - modes
     A new privilege layer – hypervisor mode
     • Guest OS kernel given the same privilege structure as for a non-virtualized environment
       • Can run the same instructions
     • Hypervisor mode has higher privilege
     • VMM can control a wide range of OS accesses to hardware
     [Figure: App1 and App2 in User mode (non-privileged) run on Guest Operating System 1 and Guest Operating System 2 in Supervisor mode (privileged), above the Virtual Machine Monitor (VMM) or Hypervisor in Hyp mode (more privileged)]
  10. Hypervisor mode and security extensions
      • Hypervisor mode applies to the normal world
      • Secure world supports a single virtual machine
      • Monitor mode controls transitions between worlds
      [Figure: normal world with App1/App2 on Guest Operating System 1 and Guest Operating System 2 above the Virtual Machine Monitor (VMM) or Hypervisor; secure world with Trusted App1/Trusted App2 on the Secure World OS; the TrustZone Monitor sits between the two worlds]
  11. ARMv7-A Virtualization - key features 1
      • Trap and control support (HYP mode)
        • Rich set of trap options (TLB/cache ops, ID groups, instructions)
      • Syndrome register support
        • EC: exception class (instr type, I/D abort into/within HYP)
        • IL: instruction length (0 == 16-bit; 1 == 32-bit)
        • ISS: instruction specific syndrome (instr fields, reason code, ...)
      • Syndrome register layout: EC in bits [31:26], IL in bit [25], ISS in bits [24:0]
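The syndrome layout on the slide (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) can be unpacked with a few shifts and masks. The helper below is a minimal sketch; the function name and the returned dictionary keys are hypothetical, and the EC and ISS values are treated as opaque numbers rather than decoded further.

```python
def decode_syndrome(value):
    """Unpack a 32-bit syndrome value into its EC, IL and ISS fields."""
    ec = (value >> 26) & 0x3F   # exception class, bits [31:26]
    il = (value >> 25) & 0x1    # instruction length flag, bit [25]
    iss = value & 0x1FFFFFF     # instruction specific syndrome, bits [24:0]
    return {
        "ec": ec,
        "instr_bits": 32 if il else 16,  # IL: 0 == 16-bit, 1 == 32-bit
        "iss": iss,
    }
```

A hypervisor trap handler would typically branch on the EC field first and only then interpret the ISS, since the ISS encoding depends on the exception class.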
  12. ARMv7-A Virtualization - key features 2
      • Dedicated Exception Link Register (ELR)
        • Stores the preferred return address on exception entry
        • New instruction – ERET – for exception return from HYP mode
        • Other modes overload exception model and procedure call LRs
        • R14 used by exception entry, BL and BLX instructions
      • Address translation
  13. Two layers of address translation
      • Hardware has 2-stage memory translation
      • Stage 1 translation owned by the guest OS: tables from the Guest OS translate VA to IPA
      • Stage 2 translation owned by the VMM: a second set of tables from the VMM translates IPA to PA
      • Allows aborts to be routed to the appropriate software layer
      [Figure: virtual address map -> intermediate physical address map -> real system physical address map]
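The two-stage scheme can be modelled as two independent page lookups, with the failing stage recorded so that the abort can be routed to the guest OS (stage 1) or the VMM (stage 2). This is a toy sketch under simplifying assumptions: each stage is a flat dictionary from page number to page number, 4KB pages only, and every name is hypothetical.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1

class TranslationFault(Exception):
    """Raised when a lookup misses; 'stage' says which layer should handle it."""
    def __init__(self, stage, addr):
        super().__init__("stage %d fault at 0x%x" % (stage, addr))
        self.stage = stage
        self.addr = addr

def lookup(table, addr, stage):
    """Translate one address through a flat {page: page} table."""
    page = addr >> PAGE_SHIFT
    if page not in table:
        raise TranslationFault(stage, addr)
    return (table[page] << PAGE_SHIFT) | (addr & PAGE_MASK)

def translate(va, stage1, stage2):
    """VA -> IPA using the guest tables, then IPA -> PA using the VMM tables."""
    ipa = lookup(stage1, va, stage=1)
    return lookup(stage2, ipa, stage=2)
```

A stage 1 miss corresponds to an ordinary guest page fault, while a stage 2 miss is something only the VMM can resolve, for example by backing the guest's intermediate physical page with real memory.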
  14. LPAE – Stage 1 (VA => {I}PA)
      • Managed by the OS
      • 64-bit descriptors, 512 entries per table
        • 4KB table size == 4KB page size
      • 1 or 2 Translation Table Base Registers
      • 3 levels of table supported
        • Up to 9 address bits per level
        • 2 bits at Level 1
        • 9 bits at Levels 2 and 3
      [Figure: 32-bit virtual address map from 0 to 2^32 showing TTBR0 space, a not-mapped (Fault) region and TTBR1 space, with boundaries labelled 2^(32-TTCR.T0SZ) and 2^(32-TTCR.T1SZ)]
      • T1SZ ≤ T0SZ
      • If T1SZ == T0SZ == 0 then TTBR0 is always used
      • Virtual address fields: VA[31:30] Level 1 index, VA[29:21] Level 2 table index, VA[20:12] Level 3 table (page) index, VA[11:0] page offset
      • Page table entry (bit markers 63, 52, 40, 30, 12, 2, 1, 0): Upper attributes, SBZ, Address out, SBZ, Lower attributes, Block/Table, Valid
        • Lower attributes (block/page): memory type; cache policy; Shareability, AF, nG, NS flags; access permissions
        • Upper attributes: Table override bits (NS, XN, PXN, AP[1:0]); Block/Page XN, PXN bits; Block/Page 16x entry contiguous hint; IGNORED bits
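The stage 1 field boundaries above (2 bits at Level 1, 9 bits at Levels 2 and 3, 12 bits of page offset) translate directly into shifts and masks. A minimal sketch, with a hypothetical function name, for the case where all three levels are walked down to a 4KB page:

```python
def lpae_stage1_split(va):
    """Split a 32-bit VA into LPAE stage 1 indices and the page offset."""
    l1 = (va >> 30) & 0x3     # VA[31:30]: one of 4 first-level entries
    l2 = (va >> 21) & 0x1FF   # VA[29:21]: one of 512 second-level entries
    l3 = (va >> 12) & 0x1FF   # VA[20:12]: one of 512 third-level entries
    offset = va & 0xFFF       # VA[11:0]: byte offset in the 4KB page
    return l1, l2, l3, offset
```

With 64-bit descriptors, a 512-entry table occupies 512 * 8 = 4096 bytes, which is why the slide notes that table size equals page size.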
  15. LPAE – Stage 2 (IPA => PA)
      • Managed by the VMM / hypervisor
      • Same table walk scheme as stage 1, now with up to a 40-bit input address:
        • 2x contiguous 4KB tables allowed at L1; supports a 2^40 address space
        • 1 to 16 contiguous tables allowed at L2; supports 2^30 to 2^34 address spaces
      • 1x Translation Table Base Register (VTTBR)
      • Page table entry (bit markers 63, 52, 40, 30, 12, 2, 1, 0): Upper attributes, SBZ, Address out, SBZ, Lower attributes, Block/Table, Valid
        • Upper attributes (block/page only): XN bit; 16x entry contiguous hint; IGNORED bits
        • Lower attributes (block/page): memory type; cache policy; Shareability, AF bit; R/W access permissions
      • Input address fields (bit markers 39, 34, 30, 21, 12, 0):
        • Walk starting at L1: IPA[39:30], IPA[29:21], IPA[20:12], IPA[11:0]
        • Walk starting at L2: IPA[33:21], IPA[20:12], IPA[11:0]
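The "2x contiguous tables at L1" and "1 to 16 contiguous tables at L2" figures follow from how many index bits the starting level has to absorb. The sketch below works through that arithmetic; the helper names are hypothetical, and it assumes 8-byte descriptors and 4KB tables as stated on the slides.

```python
ENTRY_BYTES = 8
TABLE_BYTES = 4096
LEVEL_SHIFT = {1: 30, 2: 21, 3: 12}  # lowest input-address bit indexed at each level

def concatenated_tables(input_bits, start_level):
    """Number of contiguous 4KB tables needed at the starting level."""
    index_bits = input_bits - LEVEL_SHIFT[start_level]
    entries = 1 << index_bits
    return max(1, entries * ENTRY_BYTES // TABLE_BYTES)

def stage2_split_from_l1(ipa):
    """Split a 40-bit IPA for a walk that starts at Level 1."""
    return ((ipa >> 30) & 0x3FF,  # IPA[39:30]: 1024 entries across 2 tables
            (ipa >> 21) & 0x1FF,  # IPA[29:21]
            (ipa >> 12) & 0x1FF,  # IPA[20:12]
            ipa & 0xFFF)          # IPA[11:0]
```

For example, a 34-bit input address starting at L2 needs 13 index bits (IPA[33:21]), which is 8192 entries or 16 contiguous tables, matching the upper end of the range on the slide.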
  16. Linux Support for ARM LPAE
      Catalin Marinas, ELC Europe 2011
  17. ARM LPAE Features
      • 40-bit physical addresses (1TB)
      • 40-bit intermediate physical addresses (guest PA space)
      • 3-level translation tables
        • Pointed to by TTBR0 (user mappings) and TTBR1 (kernel mappings)
        • Not as restrictive on the user/kernel memory split (can use 3:1)
        • With a 1GB kernel mapping, the 1st level is skipped
        • 64-bit entries in each level
      • 1st level contains 4 entries (stage 1 translation)
        • 1GB section, or
        • Pointer to a 2nd level table
      • 2nd level contains 512 entries (4KB in total)
        • 2MB section, or
        • Pointer to a 3rd level table
  18. ARM LPAE Features (cont'd)
      • 3rd level contains 512 entries (4KB)
        • Each addressing a 4KB range
        • Possibility to set a contiguity flag for 16 consecutive pages
      • LDRD/STRD (64-bit load/store) instructions are atomic on ARM processors supporting LPAE
      • Only the simplified page permission model is supported
        • No kernel RW and user RO combination
      • Domains are no longer present (they have already been removed in ARMv7 Linux)
      • Additional spare bits to be used by the OS
        • Dedicated bits for user, read-only and access flag (young) settings
  19. ARM LPAE Features (cont'd)
      • ASID is part of the TTBR0 register
        • Simpler context switching code (no need to deal with speculative TLB fetching with the wrong ASID)
        • The Context ID register can be used solely for debug/trace
      • Additional permission control
        • PXN – Privileged eXecute Never
        • SCTLR.WXN, SCTLR.UWXN – prevent execution from writable locations (the latter only for user accesses)
        • APTable – restrict permissions in subsequent page table levels
        • XNTable, PXNTable – override XN and PXN bits in subsequent page table levels
      • New registers for the memory region attributes
        • MAIR0, MAIR1 – 32-bit Memory Attribute Indirection Registers
        • 8 memory types can be configured at a time
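Two 32-bit MAIR registers holding 8 memory types implies one 8-bit attribute field per type. The sketch below shows how a 3-bit index from a descriptor would select its field, assuming attributes 0 to 3 live in MAIR0 and 4 to 7 in MAIR1, lowest index in the lowest byte. The function name is hypothetical, and the attribute byte values are treated as opaque.

```python
def mair_attr(mair0, mair1, index):
    """Return the 8-bit attribute selected by a 3-bit memory type index."""
    if not 0 <= index < 8:
        raise ValueError("memory type index must be in 0..7")
    reg = mair0 if index < 4 else mair1   # Attr0-3 in MAIR0, Attr4-7 in MAIR1
    return (reg >> (8 * (index % 4))) & 0xFF
```

The indirection means an OS can change what a memory type means by rewriting one register field instead of touching every page table entry that uses it.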
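  20. ARM LPAE Features
      [Figure: LPAE table walk. TTBRx points to the 1st level table (PGD), indexed by VA[31:30], whose 64-bit entries are 1GB blocks or table pointers. The 4KB 2nd level table (PMD), indexed by VA[29:21], holds 64-bit entries that are 2MB blocks or table pointers. The 4KB 3rd level table (PTE), indexed by VA[20:12], holds 64-bit PTEs.]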
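The sizes quoted across the LPAE Features slides are consistent with each other, which a short calculation makes explicit. This is a sketch of the arithmetic only; the constant and function names are hypothetical.

```python
KB, MB, GB, TB = 1 << 10, 1 << 20, 1 << 30, 1 << 40
ENTRY_BYTES = 8  # 64-bit entries at every level

# level -> (number of entries, bytes mapped by one entry)
LEVELS = {1: (4, 1 * GB), 2: (512, 2 * MB), 3: (512, 4 * KB)}

def table_coverage(level):
    """Bytes of address space mapped by one full table at this level."""
    entries, per_entry = LEVELS[level]
    return entries * per_entry

def table_bytes(level):
    """Memory occupied by one table at this level."""
    return LEVELS[level][0] * ENTRY_BYTES

CONTIGUOUS_HINT_BYTES = 16 * 4 * KB   # 16 consecutive 4KB pages = 64KB
PHYS_SPACE_BYTES = 1 << 40            # 40-bit physical addresses = 1TB
```

Each 3rd level table covers exactly what one 2nd level entry maps (2MB), and each 2nd level table covers what one 1st level entry maps (1GB), so the three levels nest without gaps.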
  21. Linux and ARM LPAE
      • Linux + ARM LPAE has the same memory layout as the classic MMU implementation
        • Described in Documentation/arm/memory.txt
        • 0..TASK_SIZE – user space
        • PAGE_OFFSET-16M..PAGE_OFFSET-2M – module space
        • PAGE_OFFSET-2M..PAGE_OFFSET – highmem mappings
      • Highmem is already supported by the classic MMU
        • Memory beyond 4G is only accessible via highmem
        • Page mapping functions use pfn (32-bit variable, PAGE_SHIFT == 12, maximum 44-bit physical addresses)
      • Original ARM kernel port assumed 32-bit physical addresses
        • LPAE redefines phys_addr_t, dma_addr_t as u64 (generic typedefs)
        • New ATAG_MEM64 defined for specifying the 64-bit memory layout
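The "maximum 44-bit physical addresses" figure is just the 32-bit pfn width plus PAGE_SHIFT, and the move to u64 address types follows from what a 32-bit variable does to an address above 4GB. A minimal sketch of both points; the helper names are hypothetical, and Python integers are masked by hand to imitate a 32-bit type.

```python
PAGE_SHIFT = 12
PFN_BITS = 32
MAX_PHYS_BITS = PFN_BITS + PAGE_SHIFT   # 32 + 12 = 44

def phys_to_pfn(pa):
    """Page frame number of a physical address."""
    return pa >> PAGE_SHIFT

def pfn_to_phys(pfn):
    """Physical address of the first byte of a page frame."""
    return pfn << PAGE_SHIFT

def as_u32(value):
    """Imitate storing a value in a 32-bit variable."""
    return value & 0xFFFFFFFF
```

For instance, the 40-bit address 0x8_8000_0000 has pfn 0x880000, which fits a 32-bit variable comfortably, whereas the address itself stored as 32 bits collapses to 0x8000_0000.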
  22. Linux and ARM LPAE (cont'd)
      • Hard-coded assumptions about 2 levels of page tables
        • PGDIR_(SHIFT|SIZE|MASK) references converted to PMD_*
      • swapper_pg_dir extended to cover both 1st and 2nd levels of page tables
        • 1 page for PGD (only 4 entries used for stage 1 translations)
        • 4 pages for PMD
        • init_mm.pgd points to swapper_pg_dir
      • TTBR0 used for user mappings and always points to PGD
      • TTBR1 used for kernel mapping:
        • 3:1 split – TTBR1 points to 4th page of PMD (2 levels only, 1GB)
          • Classic MMU does not allow the use of TTBR1 for the 3:1 split
        • 2:2 split – TTBR1 points to 3rd PGD entry
        • 1:3 split – TTBR1 points to 1st PGD entry
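How an address ends up under TTBR0 or TTBR1 for each split can be sketched as a range check driven by the T0SZ and T1SZ fields. The slides do not give the field values per split, so the (T0SZ, T1SZ) pairs below, the handling of the T1SZ == 0 case and all names are assumptions made for illustration.

```python
VA_SPACE = 1 << 32

# Assumed (T0SZ, T1SZ) settings for each user:kernel split; not from the slides.
SPLITS = {"3:1": (0, 2), "2:2": (0, 1), "1:3": (2, 0)}

def select_ttbr(va, t0sz, t1sz):
    """Return 'TTBR0', 'TTBR1' or 'FAULT' for a 32-bit virtual address."""
    if t1sz == 0:
        if t0sz == 0:
            return "TTBR0"                    # T1SZ == T0SZ == 0: TTBR0 always used
        # Assumption: TTBR0 covers the bottom region, TTBR1 everything above it
        return "TTBR0" if va < (VA_SPACE >> t0sz) else "TTBR1"
    ttbr1_base = VA_SPACE - (VA_SPACE >> t1sz)  # TTBR1 region sits at the top
    if va >= ttbr1_base:
        return "TTBR1"
    if t0sz == 0 or va < (VA_SPACE >> t0sz):
        return "TTBR0"
    return "FAULT"                            # gap between the two regions
```

Under these assumptions the 3:1 split sends the top 1GB (from 0xC0000000) to TTBR1, a region small enough to need only 2 levels, which is consistent with TTBR1 pointing directly at a PMD page on the slide.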
  23. Linux and ARM LPAE (cont'd)
      • The page table definitions have been separated into pgtable*-2level.h and pgtable*-3level.h files
        • Few PTE bits shared between classic and LPAE definitions
        • Negated definitions of L_PTE_EXEC and L_PTE_WRITE to match the corresponding hardware bits
        • Memory types are the same and they represent an index in the TEX remapping registers (PRRR/NMRR or MAIR0/MAIR1)
      • The proc-v7.S file has been duplicated into proc-v7lpae.S
        • Different register setup for TTBRx and TEX remapping (MAIRx)
        • Simpler cpu_v7_set_pte_ext (1:1 mapping between hardware and software PTE bits)
        • Simpler cpu_v7_switch_mm (ASID switched with TTBR0)
      • Current ARM code converted to pgtable-nopud.h
        • Not using 'nopmd' with classic MMU
  24. Linux and ARM LPAE (cont'd)
      • Lowmem is mapped using 2MB sections in the 2nd level table
      • PGD and PMD only allocated from lowmem
      • PTE tables can be allocated from highmem
      • Exception handling
        • Different IFSR/DFSR register structure and exception numbering – arch/arm/mm/fault.c modified accordingly
        • Error reporting includes PMD information as well
      • PGD allocation/freeing
        • Kernel PGD entries copied to the new PGD during pgd_alloc()
        • Modules and pkmap entries added to the PMD during fault handling
      • Identity mapping (PA == VA)
        • Required for enabling or disabling the MMU – secondary CPU booting, CPU hotplug, power management, kexec
  25. Linux and ARM LPAE (cont'd)
      • Uses pgd_alloc() and pgd_free()
      • When PHYS_OFFSET > PAGE_OFFSET, kernel PGD entries may be overridden
      • swapper_pg_dir entries marked with an additional bit – L_PGD_SWAPPER
      • pgd_free() skips such entries during clean-up
  26. Current Status and Future
      • Initial development done on software models
      • Tested on real hardware (FPGA and silicon)
      • Parts of the LPAE patch set already in mainline
        • Mainly preparatory patches, not core functionality
      • Aiming for full support in Linux 3.3
      • Hardware supporting LPAE
        • ARM Cortex-A15, Cortex-A7 processors
        • TI OMAP5 (dual-core ARM Cortex-A15)
      • Other developments
        • KVM support for Cortex-A15 – implemented by Christoffer Dall at Columbia University
        • Uses the Virtualisation extensions together with the LPAE stage 2 translations
  27. Reference
      • ARM Architecture Reference Manual rev C
        • Currently beta, not publicly available yet
      • Specifications publicly available on ARM Infocenter
        • http://infocenter.arm.com/
        • ARM architecture -> Reference Manuals -> ARMv7-AR LPA Virtualisation Extensions
      • Linux patches – ARM architecture development tree
        • Hosts the latest ARM architecture developments before patches are merged into the mainline kernel
        • git://github.com/cmarinas/linux.git
        • When the kernel.org accounts are back: git://git.kernel.org/pub/scm/linux/cmarinas/linux-arm-arch.git
  28. ARM Architecture System Features
  29. Interrupt Controller
      • ARM standardising on the "Generic Interrupt Controller" [GIC]
        • Supports the ARMv7 Security Extension
        • Interrupt groups support (Nonsecure) IRQ and (Secure) FIQ split
        • Distributor and cpu interface functionality
      • GICv2 adds support for a virtualized cpu interface
        • VMM manages the physical interface and queues entries for each VM.
        • GuestOS (VM) configures, acknowledges, processes and completes interrupts directly. Traps to the VMM where necessary.
        • Hypervisor can virtualize FIQs from the physical (Nonsecure, IRQ) interface.
  30. Interrupt Handling – GICv2
      • Interrupt sources: software; private (per cpu) peripheral; shared peripheral
      • GICv1
        • Distributor – control and configuration
          • Assignment (≤8 cpu interfaces)
          • Interrupt grouping (Grp0, Grp1)
        • Cpu Interface0 to Cpu InterfaceN
          • cpu-specific control + state
          • Interrupt ACK / END (retire)
          • Grp0: (Secure*, FIQ), taken by the Secure handler (monitor mode) or OS FIQ handler
          • Grp1: IRQ always, taken by the OS IRQ handler
      • Virtualization support (NEW)
        • Hypervisor – virtualized distributor: Grp1 physical => Grp1/Grp0 virtual
        • Virtual interrupt management: interrupt queues (list registers)
        • Virtual Cpu InterfaceN
          • cpu-specific control + state
          • Interrupt ACK / END (retire)
          • Virtual Grp0: FIQ, taken by the GuestOS FIQ handler
          • Virtual Grp1: IRQ, taken by the GuestOS IRQ handler
      * ARM Architecture Security Extns
  31. Generic Timer
      • Shared "always on" counter
      • Fast read access for reliable distribution of time
      • Timer events configurable as ≥ CompareValue or ≤ 0
      • Maskable interrupts
      • Offset capability for virtual time
      • Event stream support
      • Hypervisor trap support
      [Figure: implementation example. An SoC counter and power controller in an always-powered power domain feed a counter interface in each CPU cluster over the timer distribution bus. CPU Cluster0 and CPU Cluster1 each contain CPU0/Timer0 and CPU1/Timer1 with their caches, a shared cache and a GIC, and connect through the memory interconnect and memory controller; system events and the peripheral bus are also shown.]
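The "offset capability for virtual time" and the two event conditions on the Generic Timer slide reduce to simple comparisons. The sketch below models them with plain integers; the function names are hypothetical, counter width and wrap-around are deliberately ignored, and treating the ≤ 0 condition as a signed down-counting value is an assumption.

```python
def virtual_count(physical_count, offset):
    """Virtual time as seen by a guest: physical count minus a per-VM offset."""
    return physical_count - offset

def compare_event(count, compare_value):
    """Event condition of the form: count >= CompareValue."""
    return count >= compare_value

def countdown_event(timer_value):
    """Event condition of the form: timer value <= 0 (assumed signed)."""
    return timer_value <= 0
```

A hypervisor that paused a VM for 1000 ticks could add 1000 to that VM's offset so the guest never observes the gap, while the shared physical counter keeps running for everyone else.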
  32. System MMU
      • System MMU architecture translation options from pre-set tables
        • No translation
        • VA -> IPA (Stage 1) or IPA -> PA (Stage 2) only
        • VA -> PA (Stage 1 and Stage 2)
      • Can share page tables with ARM cores – relate contexts to VMIDs/ASIDs
      [Figure: Core0 to CoreN, each with L1 I-cache and L1 D-cache, share an L2 cache on the system interconnect; a DMA master reaches system memory through the System MMU. Labels: Local Coherence Bus (no snooping on bus); "Event pulse" enabling next cycle notifications.]
  33. Completing the SoC
      • Intermediate Physical Address (IPA): the address the OS believes it's accessing
      • Physical Address (PA): the actual address the VMM assigns
      [Figure: two Quad Cortex-A15 clusters and a GIC-400 on an AMBA 4 Cache Coherent Interconnect CCI-400; an I/O device, Mali Graphics, LCD and Video reach the interconnect through MMU-400 blocks (IPA in, PA out); DMC-400 Dynamic Memory Controller with PHYs to DDR3/LPDDR2; AXI Network Interconnects to slaves.]
  34. To summarize: ARMv7-A additions
      • A natural progression of well established features applied to an architecture long associated with low power.
      • 'Classic' uses and solutions will apply to ARM markets:
        • Low cost/low power server opportunities
        • Application OS + RTOS/microkernel smart mobile devices
        • Feature split by 'Manufacturers' versus 'User' VM environment
        • Automotive, home, ...
      • Opportunities for new use cases?
        • Today's entrepreneurs/ideas => tomorrow's major businesses
  35. Thank you