SOFTWARE & SYSTEMS DESIGN
4 – Memory Systems
AAETC4v00
AGENDA
• Memory System Hierarchy
• Tightly Coupled Memory
• Alignment, Endianness and Ordering
• VMSA and PMSA
• Caches and Coherency
• Barriers and Synchronization
MEMORY SUBSYSTEM
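[Figure: memory subsystem of an ARM core, showing the L1 I-cache and L1 D-cache, MMU/MPU, CP15, write buffer (WB) and bus interface units, connected through the AMBA interconnect to the L2 cache; the L1, L2 and L3 levels of the hierarchy are marked]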
• MMU
– Supports virtual memory; included in all Cortex-A processors
• MPU
– Provides memory protection only; included in all Cortex-R processors
– Some Cortex-M processors support an optional MPU
TYPICAL MEMORY MAP
[Figure: typical memory map with ROM, RAM, peripherals, the PL310 L2 cache controller and the GIC/SCU/timer region]
WHAT IS TIGHTLY COUPLED MEMORY?
• An alternative approach to caches
– Allows for high performance operation with slow external memory
– Supported on Cortex-R processors
• Fast memory, local to the processor
– Provides high-speed performance without accessing the system bus
– Smaller die-area penalty than an equivalent amount of cache
• Appears at fixed locations within the physical memory map
– Code and data can be copied to the TCMs by application or library code
– Some processors include DMA access or an external AXI interface to the TCMs
• These can be used to preload the TCMs
• Cortex-R4/R5 provides an external AXI slave port for access to the TCMs
• Precise real-time performance can be predicted for MPU-based cores
– MMU-enabled cores have to perform address translation for TCM accesses
• TLB checks will be made and table walks can occur
TCM CONFIGURATION
• TCM-enabled cores support two interfaces
– Traditionally referred to as I-TCM and D-TCM
– Also referred to as TCM-A and TCM-B, e.g. Cortex-R4
• Each TCM interface can be configured individually using CP15 operations
– Physical base address (must be a multiple of the size)
– Can overlay external memory
– Memory size (depends on core and implementation)
– Enable/Disable
• External pin(s) determine the post-reset configuration
– Makes it possible for the system to boot from TCM memory
– INITRAM pin(s) enable the TCMs during core reset
– LOCZRAM pin allows TCM address selection before reset
– Supported on Cortex-R4
• When enabled, TCMs must not overlap
[Figure: memory map with TCM-A at 0x0 and TCM-B at 0x200000, overlaying external memory]
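The two placement rules above (the base address is a multiple of the size, and enabled TCMs must not overlap) are easy to check in software before programming the CP15 TCM region registers. The sketch below is illustrative only: `tcm_region`, `tcm_base_valid` and `tcms_overlap` are hypothetical names, the structure is not a real CP15 register layout, and sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one TCM interface (not a CP15 register layout). */
typedef struct {
    uint32_t base;    /* physical base address */
    uint32_t size;    /* size in bytes, assumed to be a power of two */
    bool enabled;
} tcm_region;

/* Rule 1: the base address must be a multiple of the size. */
static bool tcm_base_valid(const tcm_region *t)
{
    assert(t->size != 0 && (t->size & (t->size - 1)) == 0);
    return (t->base & (t->size - 1)) == 0;
}

/* Rule 2: enabled TCMs must not overlap (half-open ranges [base, base + size)). */
static bool tcms_overlap(const tcm_region *a, const tcm_region *b)
{
    if (!a->enabled || !b->enabled)
        return false;
    uint64_t a_end = (uint64_t)a->base + a->size;
    uint64_t b_end = (uint64_t)b->base + b->size;
    return a->base < b_end && b->base < a_end;
}
```

With hypothetical 64KB TCMs, TCM-A at 0x0 and TCM-B at 0x200000 pass both checks; moving a 64KB TCM to 0x8000 fails the alignment check and also overlaps TCM-A.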
ALIGNMENT AND ENDIANNESS
• ARMv4/v5 data alignment
– Prior to ARMv6, all hardware data accesses had to be size aligned (for
example, words on word boundaries)
– Unaligned accesses could be caught by hardware
– Unaligned data in software was accessed by a series of aligned memory
accesses
• ARMv6/v7 data alignment
– Data accesses can be unaligned
• Only a subset of load/store instructions support unaligned accesses
• Unaligned accesses are only allowed to addresses marked as Normal
– The load/store unit performs the access using aligned memory accesses and makes the combined data available to the CPU
• ARM processors are little-endian
– But can be configured to access big-endian memory systems
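The idea of building an unaligned access out of aligned ones (done in software before ARMv6, and by the load/store unit from ARMv6 onwards) can be modelled in C. This is a minimal sketch assuming a little-endian, byte-addressed memory image `mem`; `read_aligned_le` and `load32_unaligned_le` are hypothetical helpers, not a description of any particular core's hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Read an aligned 32-bit word from a byte-addressed image as little-endian. */
static uint32_t read_aligned_le(const uint8_t *mem, uint32_t addr)
{
    assert((addr & 3u) == 0);
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}

/* Emulate an unaligned little-endian word load with at most two aligned loads. */
static uint32_t load32_unaligned_le(const uint8_t *mem, uint32_t addr)
{
    uint32_t shift = (addr & 3u) * 8u;              /* misalignment in bits */
    uint32_t lo = read_aligned_le(mem, addr & ~3u); /* word containing first byte */
    if (shift == 0)
        return lo;                                  /* already aligned */
    uint32_t hi = read_aligned_le(mem, (addr & ~3u) + 4u);
    return (lo >> shift) | (hi << (32u - shift));   /* splice the two words */
}
```

Reading address 1 from an image holding the bytes 0x00, 0x11, 0x22, ... combines the aligned words at 0 and 4 and yields 0x44332211.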
MEMORY ORDERING MODEL
• The ARM architecture defines a weakly-ordered model:
– between accesses to Normal memory regions
– between Normal memory and Device memory accesses
• This means that accesses might not occur in program order
• The architecture also allows speculative accesses
– Data or instructions fetched from memory before being explicitly referenced
– Examples of speculative access include:
• Branch prediction
• Out-of-order data loads
• Speculative cache line fills
• Speculative data accesses are only allowed to Normal memory
• Speculative instruction fetches are allowed to any region not marked as XN
WHY DO I CARE ABOUT ACCESS ORDER?
• In most cases precise access order does not matter
– But sometimes it is necessary to force access ordering
• Examples of when ordering matters:
– Sharing data between different threads/CPUs
• e.g. mail boxes
– Sharing data with peripherals
• e.g. DMA operations
– Modifying instruction memory
• e.g. loading a program into RAM or scatter loading
– Modifying the memory management scheme
• e.g. context switching or demand paging
• Where access order is important you may need to use barrier instructions
• Compilers/assemblers will not automatically insert barriers for you!
V6/V7 MEMORY TYPES
• In ARMv6/ARMv7, address locations must be described in terms of a type
• The type tells the processor how accesses to that location must behave
– Memory access ordering rules
– Caching and buffering behavior
– Speculation
• There are three mutually exclusive memory types
– Normal: data and instructions
– Device: devices/peripherals
– Strongly-ordered: devices/peripherals, or data used by legacy code
ACCESS ORDERING
• For Normal memory, ARM implements a weakly-ordered memory model
– This means that, in the absence of address or data dependencies, accesses may be re-ordered, combined and/or repeated, because accesses to Normal memory have no side effects on the system
– Speculative accesses are permitted
• Access ordering
– The table shows the ordering enforced between two memory accesses (A1 and A2) in each type of memory
– “<” indicates that access A1 must complete before access A2
– Barrier instructions are required to enforce ordering beyond the default behavior in the table
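As a sketch of the default ordering rules, assuming the ARMv7 behaviour described in the ARM Architecture Reference Manual (Normal accesses are not ordered with respect to Normal or Device accesses, Device accesses are ordered with respect to each other, and Strongly-ordered accesses are ordered with respect to all types), the rules can be written as a lookup. The names are hypothetical, and the shareability conditions the manual attaches to Device-to-Device ordering are ignored.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MEM_NORMAL, MEM_DEVICE, MEM_STRONGLY_ORDERED } mem_type;

/* ordered[a1][a2] is true when A1 must complete before A2 ("<"),
 * where A1 precedes A2 in program order. */
static const bool ordered[3][3] = {
    /* A2:               Normal  Device  Strongly-ordered */
    /* A1 Normal */    { false,  false,  true },
    /* A1 Device */    { false,  true,   true },
    /* A1 S-ordered */ { true,   true,   true },
};

/* True when the architecture does not order A1 before A2 by default,
 * so a barrier is needed if the program depends on that order. */
static bool needs_barrier(mem_type a1, mem_type a2)
{
    return !ordered[a1][a2];
}
```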
VMSA AND PMSA
• Protected Memory System Architecture
– Allows protection of configurable memory regions
– Regions are defined by a base address and length
– The number of regions available varies between processors
– Protection is based on access type and privilege level
– Does not support virtual address translation
• Virtual Memory System Architecture
– Implements virtual memory translation
– Supported by all Cortex-A processors
– Uses page tables for translation configuration
– Also implements a full access protection scheme
– Extended to 40-bit physical addressing on the latest cores (e.g. Cortex-A15)
MEMORY PROTECTION UNIT
• A Memory Protection Unit (MPU) provides basic memory management
• Allows attributes to be applied to different address regions
• All accesses are checked against the MPU regions
• Each region has:
– Base address
– Size
– Attributes (e.g. type)
• Available on:
– ARM1156T2(F)-S
– Cortex-R family
[Figure: example MPU configuration of a memory map containing FLASH, SRAM and peripherals]
– MPU region 0: 4GB, No Access
– MPU region 1 (FLASH): 32MB, Read Only, Normal (Cached), Executable
– MPU region 2 (Peripherals): 256MB, Read/Write, Device (Bufferable), Execute Never (XN)
– MPU region 3 (SRAM): 256KB, Read/Write, Normal (Cached, bufferable), Executable
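A minimal C model of the MPU check, assuming (as in the ARMv7-R PMSA) that when regions overlap the highest-numbered region takes priority, which is what lets region 0 act as a 4GB "no access" background. The base addresses for FLASH (0x00000000), SRAM (0x20000000) and peripherals (0x40000000) are hypothetical, since the example map gives only sizes, and real MPUs also impose size and alignment rules that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MPU_NO_ACCESS, MPU_READ_ONLY, MPU_READ_WRITE } mpu_access;

typedef struct {
    uint32_t base;
    uint64_t size;      /* bytes; 64-bit so a 4GB region can be expressed */
    mpu_access access;
    bool xn;            /* Execute Never */
} mpu_region;

/* Hypothetical layout of the example map (base addresses are assumptions). */
static const mpu_region example_map[4] = {
    { 0x00000000u, 0x100000000ull, MPU_NO_ACCESS,  true  }, /* region 0: background */
    { 0x00000000u, 32ull << 20,    MPU_READ_ONLY,  false }, /* region 1: FLASH */
    { 0x40000000u, 256ull << 20,   MPU_READ_WRITE, true  }, /* region 2: peripherals */
    { 0x20000000u, 256ull << 10,   MPU_READ_WRITE, false }, /* region 3: SRAM */
};

/* Return the highest-numbered region containing addr, or NULL if none. */
static const mpu_region *mpu_lookup(const mpu_region *r, size_t n, uint32_t addr)
{
    for (size_t i = n; i-- > 0;)
        if (addr >= r[i].base && (uint64_t)addr < (uint64_t)r[i].base + r[i].size)
            return &r[i];
    return NULL;
}

/* True if the access is allowed; false corresponds to an MPU fault. */
static bool mpu_allows(const mpu_region *r, size_t n, uint32_t addr, bool write, bool exec)
{
    const mpu_region *m = mpu_lookup(r, n, addr);
    if (m == NULL || m->access == MPU_NO_ACCESS)
        return false;
    if (write && m->access != MPU_READ_WRITE)
        return false;
    if (exec && m->xn)
        return false;
    return true;
}
```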
VIRTUAL MEMORY
• The core issues “Virtual Addresses” (VA)
• Memory is accessed using “Physical Addresses” (PA)
• Translation is carried out automatically by the Memory Management Unit (MMU)
• Translation configuration is stored in page tables in external memory
[Figure: a virtual memory map (vectors, OS, application space, peripherals) mapped onto a physical memory map (FLASH, RAM, peripherals); regions carry attributes such as privileged access, user access, uncached and read-only]
THE MEMORY MANAGEMENT UNIT
• The Memory Management Unit (MMU) handles translation of virtual addresses to physical addresses
[Figure: the ARM core issues virtual addresses to the MMU (TLBs and table walk unit), which reads the translation tables in memory and passes physical addresses on to the caches and memory]
• Provides hardware to read the translation tables in memory, called table walking
• CP15 Translation Table Base Registers (TTBRs) store the physical base addresses of the tables
• Translation Look-aside Buffers (TLBs) cache recent translations
• A core can have separate instruction and data TLBs, or a shared unified TLB
• When the MMU is enabled, all accesses by the core are passed through it
• The MMU uses cached translations from the TLB(s) or performs a table walk
• Translation must occur before the cache lookup can complete
LEVEL ONE PAGE TABLES
[Figure: the Translation Table Base (TTB) locates the first-level table (byte offsets 0x0 to 0x3FFC); the virtual address from the ARM core selects an entry, which supplies the physical address used to access memory]
• Diagram shows a single-level page table
• VA to PA mapping at 1MB resolution
– 4096 entries of 4 bytes (a 16KB table) cover the 4GB address space
• Translation carried out in a single step
• Page table lookup is done automatically by the MMU
• Recent translations are cached in the internal TLB
LEVEL TWO PAGE TABLES
[Figure: the Translation Table Base (TTB) locates the first-level table (byte offsets 0x0 to 0x3FFC); a first-level entry points to a second-level page table (byte offsets 0x0 to 0x3FC), whose entries point to 4KB pages]
• A second-level page table allows mapping at 4KB resolution
– Each second-level table has 256 entries of 4 bytes (1KB) and covers the 1MB of one first-level entry
• Translation requires two page table look-ups
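The one-step (1MB section) and two-step (4KB page) walks can be modelled in C. This is a sketch under stated assumptions: it follows the ARMv7 short-descriptor format, handles only sections and small pages (supersections, large pages, permission and attribute bits are ignored), and models physical memory as a hypothetical word array `phys` starting at PA 0; `read_word` and `translate` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* phys models physical memory as 32-bit words starting at PA 0. */
static uint32_t read_word(const uint32_t *phys, uint32_t pa)
{
    return phys[pa >> 2];
}

/* Simplified short-descriptor walk. ttb is the 16KB-aligned physical address
 * of the first-level table. Returns false on a fault or an unmodelled encoding. */
static bool translate(const uint32_t *phys, uint32_t ttb, uint32_t va, uint32_t *pa)
{
    /* Level one: VA[31:20] selects one of 4096 entries (offsets 0x0-0x3FFC). */
    uint32_t l1 = read_word(phys, ttb + ((va >> 20) << 2));

    switch (l1 & 3u) {
    case 2u: /* section: single-step translation at 1MB resolution */
        if (l1 & (1u << 18))
            return false;                /* supersection: not modelled */
        *pa = (l1 & 0xFFF00000u) | (va & 0x000FFFFFu);
        return true;
    case 1u: { /* page table: points at a second-level table */
        uint32_t l2_base = l1 & 0xFFFFFC00u;
        /* Level two: VA[19:12] selects one of 256 entries (offsets 0x0-0x3FC). */
        uint32_t l2 = read_word(phys, l2_base + (((va >> 12) & 0xFFu) << 2));
        if ((l2 & 2u) == 0)
            return false;                /* fault, or 64KB large page: not modelled */
        *pa = (l2 & 0xFFFFF000u) | (va & 0x00000FFFu);
        return true;
    }
    default: /* 0b00 is a fault; 0b11 is not modelled here */
        return false;
    }
}
```

In the MMU this walk is performed by hardware on a TLB miss; the result is then cached in the TLB so that later accesses to the same section or page skip the walk.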
ACCESS PERMISSIONS AND XN
• Access permissions are determined by the AP[2:0] bits in the page table descriptor

AP    Privileged   User         Notes
000   No access    No access    Permission fault
001   Read/Write   No access    Privileged mode access only
010   Read/Write   Read         Permission fault on user write
011   Read/Write   Read/Write   Full access
100   -            -            Reserved
101   Read         No access    Privileged mode read only
110   Read         Read         Permission fault on writes†
111   Read         Read         Permission fault on writes

• “eXecute Never” (XN) prevents instruction execution from a region
• Speculative instruction fetches are also suppressed
• The core never makes speculative data accesses to Device or Strongly-ordered memory
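The AP encodings on this slide can be decoded mechanically. This sketch uses hypothetical names (`ap_decode`, `ap_permits`), treats 110 and 111 identically as the table does, and ignores XN and domain checks.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ACC_NONE, ACC_READ, ACC_RW } ap_access;

typedef struct {
    ap_access privileged;
    ap_access user;
    bool reserved;
} ap_decoded;

/* Decode a 3-bit AP[2:0] field according to the AP table. */
static ap_decoded ap_decode(unsigned ap)
{
    switch (ap & 7u) {
    case 0: return (ap_decoded){ ACC_NONE, ACC_NONE, false };
    case 1: return (ap_decoded){ ACC_RW,   ACC_NONE, false };
    case 2: return (ap_decoded){ ACC_RW,   ACC_READ, false };
    case 3: return (ap_decoded){ ACC_RW,   ACC_RW,   false };
    case 5: return (ap_decoded){ ACC_READ, ACC_NONE, false };
    case 6:
    case 7: return (ap_decoded){ ACC_READ, ACC_READ, false };
    default: return (ap_decoded){ ACC_NONE, ACC_NONE, true }; /* 100: reserved */
    }
}

/* True if the access is permitted; false corresponds to a permission fault. */
static bool ap_permits(unsigned ap, bool privileged, bool write)
{
    ap_decoded d = ap_decode(ap);
    ap_access a = privileged ? d.privileged : d.user;
    if (d.reserved || a == ACC_NONE)
        return false;
    return write ? (a == ACC_RW) : true;
}
```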
MMU CONFIGURATION AND MAINTENANCE
• Enabling the MMU
– The MMU is disabled at reset and is enabled via the SCTLR.M bit
– MMU page tables contain the memory type configuration (including shareability, cacheability, bufferability, access permissions, etc.)
– All of this must be configured before the MMU is enabled
• TLB maintenance
– TLBs cache memory translation information
– They must be invalidated when translation table contents are changed
– They may also need invalidation on a context switch
– The ASID (Address Space Identifier) is provided to minimize this
– TLBs should be invalidated by the startup code on reset
• When the MMU is disabled
– PA = VA, i.e. no address translation is performed
– Instruction accesses may be cached (controlled by the SCTLR.I bit)
– Data accesses are not cached and are all treated as Strongly-ordered
– No access permission checks are performed
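To see why an ASID reduces TLB invalidation on a context switch, here is a toy, fully associative TLB whose entries are tagged with an ASID. It is a sketch with hypothetical names: real TLBs also hold global (non-ASID) entries, use core-specific replacement policies, and are maintained through CP15 operations rather than C functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 8

typedef struct {
    bool valid;
    uint8_t asid;   /* address space that owns this translation */
    uint32_t vpn;   /* virtual page number (VA >> 12) */
    uint32_t pfn;   /* physical frame number (PA >> 12) */
} tlb_entry;

typedef struct {
    tlb_entry e[TLB_ENTRIES];
    unsigned next;  /* round-robin replacement index */
} tlb;

/* Hit only if both the page and the current ASID match. */
static bool tlb_lookup(const tlb *t, uint8_t asid, uint32_t va, uint32_t *pa)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        const tlb_entry *x = &t->e[i];
        if (x->valid && x->asid == asid && x->vpn == (va >> 12)) {
            *pa = (x->pfn << 12) | (va & 0xFFFu);
            return true;
        }
    }
    return false;   /* miss: the MMU would perform a table walk */
}

/* Record a translation, e.g. after a table walk. */
static void tlb_fill(tlb *t, uint8_t asid, uint32_t va, uint32_t pa)
{
    t->e[t->next] = (tlb_entry){ true, asid, va >> 12, pa >> 12 };
    t->next = (t->next + 1) % TLB_ENTRIES;
}

/* Needed when translation table contents change. */
static void tlb_invalidate_all(tlb *t)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = false;
}
```

Entries for two processes mapping the same virtual page can coexist, so switching between them only changes the current ASID; changing the translation tables themselves still requires an explicit invalidation.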
CACHES IN CORTEX-A SERIES PROCESSORS
• Applications processors are usually implemented with two levels of cache
– Separate (Harvard) L1 instruction and data caches per core
• Relatively small (typically 32KB), providing fast access inside the L1 subsystem
– A single (unified) L2 cache (integrated or external, depending on the CPU)
• Relatively large (up to 8MB), with access times slower than L1 memory accesses
• The MMU uses information contained in the translation tables to control which memory locations are cached
[Figure: dual-core example in which CPU0 and CPU1 each have an MMU, CP15, I-cache and D-cache; bus interface units connect them through the AMBA interconnect to the L2 cache, SRAM, external DRAM and APB peripherals]
CACHE TERMINOLOGY
• You should know the meaning of the following terms…
– Line
– Way
– Set
– Tag
– Index
– Offset
– Data RAM
– Tag RAM
– Valid and Dirty Bits
[Figure: an address split into Tag, Index and Offset fields; the Tag RAM and Data RAM are organized into ways, and the lines at the same index across all ways form a set]
HOW IS DATA STORED IN MY CACHE?
• Caches handle data in lines (32 or 64 bytes per cache line)
– The physical address is used to determine the location of data in the cache
• Bottom bits (offset) identify the word/byte in the line
• Middle bits (index) identify which line
• Top bits (tag) identify the remainder of the address
• Each line in the cache includes:
– Tag bits from the associated physical address
– Valid bit: indicates whether the line contains valid data
– Dirty data bit(s): indicate whether the line (or part of it) holds data that is not coherent with external memory
• To reduce cache contention, ARM caches are “set associative”
– There are multiple possible cache locations (ways) for any given address
– A victim counter decides which cache way will be used for an allocation
– The replacement policy used by the victim counter varies by core
[Figure: the Tag, Index and Offset fields of an address selecting a set across the ways of the Tag RAM and Data RAM]
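The tag/index/offset split is simple arithmetic on the address. A sketch with hypothetical helper names, assuming power-of-two line sizes and set counts, where the number of sets is the cache size divided by (line size × number of ways):

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t tag, index, offset;
} cache_addr;

/* Split addr for a cache with line_bytes per line and num_sets sets. */
static cache_addr split_address(uint32_t addr, uint32_t line_bytes, uint32_t num_sets)
{
    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);
    assert(num_sets != 0 && (num_sets & (num_sets - 1)) == 0);
    cache_addr a;
    a.offset = addr & (line_bytes - 1);              /* bottom bits: byte in line */
    a.index = (addr / line_bytes) & (num_sets - 1);  /* middle bits: which set */
    a.tag = addr / line_bytes / num_sets;            /* top bits: the remainder */
    return a;
}

/* Sets = cache size / (line size * ways). */
static uint32_t sets_in_cache(uint32_t cache_bytes, uint32_t line_bytes, uint32_t ways)
{
    return cache_bytes / (line_bytes * ways);
}
```

For the 2-way, 4-set, 16-byte-line cache used in the example that follows, address 0x0000007C splits into tag 1, index 3 (binary 11) and offset 12 (binary 1100). A 32KB, 4-way cache with 32-byte lines has 256 sets.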
EXAMPLE MEMORY ACCESS
[Figure: a 2-way set-associative cache (Way 0, Way 1) next to main memory (addresses 0x00000000 to 0x00000090); the 32-bit address 0x0000007C splits into tag ...001, index 11, word offset 11 and byte offset 00]
• Memory read:
    LDR r1, [r0]        ; r0 = 0x0000007C
1. A cache lookup is performed
2. Cache miss: the tag match fails for the given index in all cache ways
3. A cache linefill is performed
4. The victim counter specifies which cache way to use (evicting the previous data)
5. The cache returns the requested word to the core
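The five steps can be traced with a toy cache of the same shape as the example (2 ways, 4 sets, 16-byte lines). This is a sketch: the names are hypothetical, the victim counter is a simple round-robin (real replacement policies vary by core), and only word reads are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAYS 2
#define SETS 4
#define LINE 16

typedef struct {
    bool valid;
    uint32_t tag;
    uint8_t data[LINE];
} cache_line;

typedef struct {
    cache_line line[SETS][WAYS];
    unsigned victim;           /* round-robin victim counter */
    unsigned hits, misses;
} toy_cache;

/* Read an aligned 32-bit little-endian word through the cache; mem is main memory. */
static uint32_t cache_read32(toy_cache *c, const uint8_t *mem, uint32_t addr)
{
    assert((addr & 3u) == 0);
    uint32_t offset = addr % LINE;
    uint32_t index = (addr / LINE) % SETS;
    uint32_t tag = addr / (LINE * SETS);
    cache_line *hit = NULL;

    /* 1. Lookup: compare the tag in every way of the selected set. */
    for (unsigned w = 0; w < WAYS; w++)
        if (c->line[index][w].valid && c->line[index][w].tag == tag)
            hit = &c->line[index][w];

    if (hit != NULL) {
        c->hits++;
    } else {
        /* 2-4. Miss: the victim counter picks a way, which receives the linefill.
         * A write-back cache would first write the victim back if it were dirty. */
        c->misses++;
        hit = &c->line[index][c->victim];
        c->victim = (c->victim + 1) % WAYS;
        hit->valid = true;
        hit->tag = tag;
        memcpy(hit->data, mem + (addr - offset), LINE);
    }

    /* 5. Return the requested word to the core. */
    const uint8_t *p = hit->data + offset;
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

Addresses 0x3C, 0x7C and 0xBC all map to index 3 with different tags, so after the third of them is loaded the line holding 0x7C has been evicted and the next read of 0x7C misses again.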
CACHE BEHAVIOR
• Cache lookup
– The core checks to see if a memory address is currently in the cache
– A “cache miss” occurs if the data is not found
• The cache may then automatically load the relevant data
• This is called a “cache linefill”
– A “cache hit” occurs if the data is found
• The data is immediately returned to the core
• No external memory access takes place
• Cache Eviction
– In order to make space for new data, existing cache data may have to be
evicted
– In “writeback” mode, dirty data will have to be written back to memory first
• Victim counter
– This is an internal value used to select the data for eviction
CACHE MODES AND POLICIES
• Allocation policy
– Controls when new data is loaded into the cache
– A read-allocate policy only allocates new data on a read miss
– A write-allocate policy also allocates on a write miss
• Eviction policy
– Governs the selection of lines for eviction
– A round-robin policy cycles through the lines in a fixed order
– A random policy selects a line at random
• Write-through and Write-back
– Controls what happens when a write operation hits in the cache
– A write-through cache also updates external memory in parallel
– A write-back cache updates only the cache line and marks it dirty; external memory is updated later, when the line is cleaned or evicted
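The difference between the two write-hit policies shows up in the number of external writes. A sketch modelling a single cached line, with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { WRITE_THROUGH, WRITE_BACK } write_policy;

/* One cached line: enough to contrast the two write-hit policies. */
typedef struct {
    write_policy policy;
    uint32_t cached;         /* value held in the cache line */
    bool dirty;              /* only set under write-back */
    uint32_t memory;         /* value held in external memory */
    unsigned memory_writes;  /* external write transactions issued */
} line_model;

static void write_hit(line_model *l, uint32_t value)
{
    l->cached = value;
    if (l->policy == WRITE_THROUGH) {
        l->memory = value;   /* external memory updated as well */
        l->memory_writes++;
    } else {
        l->dirty = true;     /* memory is stale until the line is cleaned or evicted */
    }
}

/* Clean: write dirty data back so that memory and cache are coherent. */
static void clean(line_model *l)
{
    if (l->dirty) {
        l->memory = l->cached;
        l->memory_writes++;
        l->dirty = false;
    }
}
```

Ten write hits cause ten external writes under write-through but none under write-back until the line is cleaned, at which point a single write makes memory coherent again.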
WHEN SHOULD I ENABLE CACHES?
• Caches are disabled on reset
– Architecturally, caches are not guaranteed to be in a known state at reset
– Need to be invalidated by software on Cortex-A9
– Not required on Cortex-A5/A7/A15
• The L1 instruction cache can be enabled without enabling the MMU
– Many boot loaders will enable the I cache, but not the D cache
• Data caching is only possible once the MMU is enabled
– Appropriate cache policies must be configured in the translation tables
• The L2 cache should generally be enabled with the L1 data cache
– On Cortex-A15 and Cortex-A7 the L2 (unified) cache is always enabled
• But no lookup occurs unless the L1 D-cache on one of the CPUs in the cluster is
also enabled
– On Cortex-A9 or A5 an external L2 cache (like PL310) is enabled separately
• Via a write to a memory mapped control register
Performance is very poor if instructions are not fetched from cache!
CACHE MAINTENANCE OPERATIONS
• Caches require maintenance to ensure that the program always has
access to the correct data
– Cache clean
• Writes out “dirty data” so that external memory and cache are coherent
• Only applicable to write-back caching
– Cache invalidate
• Marks lines as invalid and therefore available for new data
• When is maintenance required?
– Context switches which modify the mapping between address tags and
physical addresses
– To write self-modifying code, data written via the data cache must be read
back via the instruction cache
• The data cache must be cleaned and the instruction cache invalidated
– If an external engine has modified external memory (e.g. DMA)
• Data cache must be invalidated
COHERENCY OPERATIONS
• Type of operation
– Invalidate - clear the Valid bits on the particular cache / branch predictor entry
– Clean - updates the external memory system with Dirty cache line(s)
• Which entries
– All - the entire cache (not available for the data/unified cache)
– MVA - a specific virtual address
– Set/Way - a specific cache line (not available for the branch predictor)
• Scope
– PoC - Point of Coherency (discussed later)
– PoU - Point of Unification (discussed later)
• Inner shareable
– Operations that can be “broadcast”
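Maintenance by MVA operates on one cache line per operation, so a buffer is covered by rounding its start address down to a line boundary and stepping by the line size. The sketch below only computes those addresses; on real hardware each one would be passed to a CP15 by-MVA operation such as DCCMVAC (clean to PoC), followed by a DSB. The function name is hypothetical, the line size is assumed to be known (for example from the Cache Type Register), and wrap-around at the top of the 32-bit address space is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the line-aligned addresses a by-MVA maintenance loop must visit to
 * cover [start, start + len). Returns the number of lines; at most max
 * addresses are written to out. line_size must be a power of two. */
static size_t maint_lines(uint32_t start, uint32_t len, uint32_t line_size,
                          uint32_t *out, size_t max)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (len == 0)
        return 0;
    uint32_t first = start & ~(line_size - 1);  /* round down to a line boundary */
    uint32_t end = start + len;                 /* exclusive end, no wrap assumed */
    size_t n = 0;
    for (uint32_t a = first; a < end; a += line_size) {
        if (n < max)
            out[n] = a;   /* real code would issue the by-MVA operation here */
        n++;
    }
    return n;
}
```

A 64-byte buffer at 0x1004 with 32-byte lines touches three lines: 0x1000, 0x1020 and 0x1040.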
POINT OF UNIFICATION (POU)
[Figure: a core with separate I and D caches, a TLB and the CP15 System Control Coprocessor; the Point of Unification is marked where their paths meet]
• The point at which instruction, data and TLB accesses see the same copy of memory
• Generally the L2 cache or memory; depends on the system design
POINT OF COHERENCY (POC)
[Figure: two masters (Master A and Master B), each with I and D caches and CP15; the Point of Coherency is marked where both see the same memory]
• The point at which all agents see the same copy of memory
• Generally the external memory system; again, very system dependent
POU VS. POC
[Figure: two example systems, each a core with I and D caches, a TLB and CP15; the second system adds an L2 cache. The Point of Unification and Point of Coherency are marked in each]
ASYMMETRIC MULTI-PROCESSING (AMP)
[Figure: CPU1 and CPU2 each run their own tasks with their own peripheral, and exchange data through shared RAM]
• Each CPU may run a different program
– May also see a different memory map
– May also have its own set of interrupts
• Data exchange through shared memory
– CPU L1 caches will need to be managed for coherency
SYMMETRIC MULTI-PROCESSING (SMP)
[Figure: CPU 1 and CPU 2 share one pool of tasks, RAM and peripherals]
SMP VS. AMP
• SMP – Symmetric Multi-Processing
– All tasks share a common view of memory and peripherals
– Tasks can be dynamically shared across multiple CPUs
– Simplifies software development
• Provides increased productivity for the programmer
• AMP – Asymmetric Multi-Processing
– Code portability and design flexibility
– Programmer statically assigns tasks to a CPU
– Enables tasks to be isolated from each other
• Each task may have a different view of memory
A MULTICORE ARM PROCESSOR
[Figure: a multicore processor with two Cortex-A9 cores, a Snoop Control Unit that maintains L1 cache coherency, an Interrupt Distributor, shared architectural peripherals, a shared external bus interface and CoreSight debug infrastructure]
SNOOP CONTROL UNIT
[Figure: the Snoop Control Unit maintains L1 cache coherency]
SNOOP CONTROL UNIT (SCU)
• The Snoop Control Unit (SCU) maintains coherency between L1 data caches
– Arbitrates accesses to L2 AXI master interface(s), for both instructions and data
– Duplicated Tag RAMs keep track of what data is allocated in each CPU’s cache
• Separate interfaces into L1 data caches for coherency maintenance
• Can optionally use address filtering
– Directs accesses to a configured memory range to AXI Master port 1
[Figure: CPU0 to CPU3, each with a D$ and an I$, connect to the Snoop Control Unit, which holds a duplicate tag RAM for each CPU and drives AXI Master 0 and AXI Master 1]
SYNCHRONIZATION
• Shared resources in a multi-threaded or multi-processor system need protection in critical code sections
• Operating systems provide resources such as spinlocks or mutexes
• A simple spinlock can be built using ARM's exclusive load and store instructions (LDREX/STREX)
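A portable sketch of a simple spinlock using C11 atomics. Compilers targeting ARMv7 typically implement `atomic_flag_test_and_set_explicit` with an LDREX/STREX retry loop plus the barriers required by the requested memory order, so this is close to a hand-written exclusive-access version. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag locked;
} spinlock;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock *l)
{
    /* Atomic test-and-set; acquire ordering keeps the critical section's
     * accesses from being performed before the lock is taken. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;  /* spin (real code might wait with WFE here) */
}

static bool spin_trylock(spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void spin_unlock(spinlock *l)
{
    /* Release ordering makes the critical section's writes visible
     * before the lock is seen as free. */
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```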
BARRIERS
• The ARM architecture includes barrier instructions to force access order
and access completion at a specific point
DMB – Data Memory Barrier
DSB – Data Synchronization Barrier
ISB – Instruction Synchronization Barrier
• This course provides a simple introduction to barriers and their use, but…
– If you are writing code where ordering is important we recommend also
reading:
– ARM Architecture Reference Manual ARMv7-A/R Edition (Rev C)
• A3.8 Memory access order
• B2.2.9 Ordering of cache and branch predictor maintenance operations
• B3.10.1 TLB maintenance operations and the memory order model
• Appendix G Barrier Litmus Tests
– Includes worked examples
DMB VS DSB
• A Data Memory Barrier (DMB) is less restrictive than a Data
Synchronization Barrier (DSB)
• For a DMB:
– No memory accesses after the DMB in program order are started until all
memory accesses before the DMB in program order have been seen by the
rest of the system
• A DSB doesn’t complete until:
– All memory accesses before the DSB in program order have completed, and
– All cache, branch predictor and TLB maintenance operations issued by the
local processor have completed
– Furthermore, no instruction that appears after the DSB in program order can
execute until the DSB completes
• Use DSBs when necessary, but don't overuse them
MAIL BOX EXAMPLE
• P0 – DMB needed to ensure the mail box write is seen BEFORE the flag is updated
• P1 – DMB needed to ensure the mail box is read AFTER the flag is seen

P0 – Write message, then flag data as available
    LDR r1, =ADDR_MAILBOX_DATA
    LDR r2, =ADDR_MAILBOX_FLAG
    ; write a new message into the mail box
    STR r5, [r1]
    DMB
    ; set available flag (1) to signal mail box full
    MOV r0, #1
    STR r0, [r2]

P1 – Wait for flag, then read message
    LDR r1, =ADDR_MAILBOX_DATA
    LDR r2, =ADDR_MAILBOX_FLAG
    ; wait for flag (initially 0 = empty)
loop
    LDR r12, [r2]
    CMP r12, #1
    BNE loop
    DMB
    ; read message
    LDR r0, [r1]
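The same mailbox protocol in C11, as a sketch: the release fence in the writer and the acquire fence in the reader play the roles of the two DMBs, and on ARMv7 compilers typically emit a DMB for each. The `mailbox` layout and function names are hypothetical; the flag uses 1 for full and 0 for empty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mailbox: one data word plus a flag (0 = empty, 1 = full). */
typedef struct {
    uint32_t data;
    atomic_uint flag;
} mailbox;

/* P0: write the message, fence, then set the flag. */
static void mailbox_post(mailbox *mb, uint32_t msg)
{
    mb->data = msg;
    atomic_thread_fence(memory_order_release);  /* role of P0's DMB */
    atomic_store_explicit(&mb->flag, 1u, memory_order_relaxed);
}

/* P1: spin until the flag is set, fence, then read the message. */
static uint32_t mailbox_wait(mailbox *mb)
{
    while (atomic_load_explicit(&mb->flag, memory_order_relaxed) != 1u)
        ;
    atomic_thread_fence(memory_order_acquire);  /* role of P1's DMB */
    return mb->data;
}
```

Without the two fences, both the compiler and the weakly-ordered hardware would be free to make the flag visible before the message, which is exactly the reordering the DMBs prevent.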
ISB
• The ARM architecture defines context as the system settings in CP15
• Context-changing operations include:
– Cache, TLB, and branch predictor maintenance operations
– Changes to system control registers (e.g. SCTLR, TTBCR, TTBRn, CONTEXTIDR)
• The effect of a context-changing operation is only guaranteed to be seen
after a context synchronization event
– Taking an exception
– Returning from an exception
– Instruction Synchronization Barrier (ISB)
• An ISB flushes the pipeline and re-fetches the instructions from the cache (or memory)
– Guarantees that effects of any completed context-changing operation before the
ISB are visible to any instruction after the barrier
– Also guarantees that context-changing operations after the ISB instruction only
take effect after the ISB has been executed
CP15 EXAMPLE
• To enable the FPU/NEON you first need to enable access to cp10 and cp11
– This is done by writing to the Coprocessor Access Control Register (CPACR)
    MRC p15, 0, r1, c1, c0, 2     ; read CPACR into r1
    ORR r1, r1, #(0xf << 20)      ; enable full access for cp10 and cp11
    MCR p15, 0, r1, c1, c0, 2     ; write back into CPACR
    ISB
    MOV r0, #0x40000000
    VMSR FPEXC, r0                ; enable FPU and NEON (FPEXC.EN, bit 30)
• Without the ISB, the processor could already have decoded the VMSR as an Undefined Instruction before the MCR is executed
– The ISB ensures that the update to the CPACR is seen by the processor when decoding the VMSR
SELF-MODIFYING CODE (1)
P0 loads a new program into memory, which then gets executed by P0 and P1
P0
    STR r11, [r1]       ; save instruction to program memory
    DCCMVAU r1          ; clean D-$ so instruction is visible to I-$
    DSB                 ; ensure clean completes on all CPUs
    ICIMVAU r1          ; discard stale data from I-$ …
    BPIMVA r1           ; … and from branch predictor
    DSB                 ; ensure I-$/BP invalidates complete for all
    STR r0, [r2]        ; set flag == 1 to signal completion
    ISB                 ; synchronize context on this processor
    MOV pc, r1          ; branch to new code
P1-Pn
    WAIT ([r2] == 1)    ; wait for flag signaling completion
                        ; no DSB required here
    ISB                 ; synchronize context on this processor
    MOV pc, r1          ; execute newly saved instruction
POWER ISA introduction and what’s new in ISA V3.1 (Overview)POWER ISA introduction and what’s new in ISA V3.1 (Overview)
POWER ISA introduction and what’s new in ISA V3.1 (Overview)
 
Board support package_on_linux
Board support package_on_linuxBoard support package_on_linux
Board support package_on_linux
 
WEEK6_COMPUTER_ORGANIZATION.pptx
WEEK6_COMPUTER_ORGANIZATION.pptxWEEK6_COMPUTER_ORGANIZATION.pptx
WEEK6_COMPUTER_ORGANIZATION.pptx
 
Basic Computer Architecture
Basic Computer ArchitectureBasic Computer Architecture
Basic Computer Architecture
 
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
 
Synchronization linux
Synchronization linuxSynchronization linux
Synchronization linux
 
Unit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisationUnit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisation
 
Unit 1 processormemoryorganisation
Unit 1 processormemoryorganisationUnit 1 processormemoryorganisation
Unit 1 processormemoryorganisation
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architectures
 
10. compute-part-2
10. compute-part-210. compute-part-2
10. compute-part-2
 
Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler
 
Cs intro-ca
Cs intro-caCs intro-ca
Cs intro-ca
 
EC8791 ARM Processor and Peripherals.pptx
EC8791 ARM Processor and Peripherals.pptxEC8791 ARM Processor and Peripherals.pptx
EC8791 ARM Processor and Peripherals.pptx
 
cs-procstruc.ppt
cs-procstruc.pptcs-procstruc.ppt
cs-procstruc.ppt
 
VMworld 2013: Extreme Performance Series: Monster Virtual Machines
VMworld 2013: Extreme Performance Series: Monster Virtual Machines VMworld 2013: Extreme Performance Series: Monster Virtual Machines
VMworld 2013: Extreme Performance Series: Monster Virtual Machines
 
4.1 Introduction 145• In this section, we first take a gander at a.pdf
4.1 Introduction 145• In this section, we first take a gander at a.pdf4.1 Introduction 145• In this section, we first take a gander at a.pdf
4.1 Introduction 145• In this section, we first take a gander at a.pdf
 
Computer architecture for HNDIT
Computer architecture for HNDITComputer architecture for HNDIT
Computer architecture for HNDIT
 
It322 intro 2
It322 intro 2It322 intro 2
It322 intro 2
 

More from Anh Dung NGUYEN

ARM AAE - Developing Code for ARM
ARM AAE - Developing Code for ARMARM AAE - Developing Code for ARM
ARM AAE - Developing Code for ARM
Anh Dung NGUYEN
 
ARM AAE - Intrustion Sets
ARM AAE - Intrustion SetsARM AAE - Intrustion Sets
ARM AAE - Intrustion Sets
Anh Dung NGUYEN
 
ARM AAE - Introduction
ARM AAE - IntroductionARM AAE - Introduction
ARM AAE - Introduction
Anh Dung NGUYEN
 
AAME ARM Techcon2013 006v02 Implementation Diversity
AAME ARM Techcon2013 006v02 Implementation DiversityAAME ARM Techcon2013 006v02 Implementation Diversity
AAME ARM Techcon2013 006v02 Implementation Diversity
Anh Dung NGUYEN
 
AAME ARM Techcon2013 005v02 System Startup
AAME ARM Techcon2013 005v02 System StartupAAME ARM Techcon2013 005v02 System Startup
AAME ARM Techcon2013 005v02 System Startup
Anh Dung NGUYEN
 
AAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and OptimizationAAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and Optimization
Anh Dung NGUYEN
 
AAME ARM Techcon2013 002v02 Advanced Features
AAME ARM Techcon2013 002v02 Advanced FeaturesAAME ARM Techcon2013 002v02 Advanced Features
AAME ARM Techcon2013 002v02 Advanced Features
Anh Dung NGUYEN
 
AAME ARM Techcon2013 003v02 Software Development
AAME ARM Techcon2013 003v02  Software DevelopmentAAME ARM Techcon2013 003v02  Software Development
AAME ARM Techcon2013 003v02 Software Development
Anh Dung NGUYEN
 
AAME ARM Techcon2013 Intro
AAME ARM Techcon2013 IntroAAME ARM Techcon2013 Intro
AAME ARM Techcon2013 Intro
Anh Dung NGUYEN
 
AAME ARM Techcon2013 001v02 Architecture and Programmer's model
AAME ARM Techcon2013 001v02 Architecture and Programmer's modelAAME ARM Techcon2013 001v02 Architecture and Programmer's model
AAME ARM Techcon2013 001v02 Architecture and Programmer's model
Anh Dung NGUYEN
 

More from Anh Dung NGUYEN (10)

ARM AAE - Developing Code for ARM
ARM AAE - Developing Code for ARMARM AAE - Developing Code for ARM
ARM AAE - Developing Code for ARM
 
ARM AAE - Intrustion Sets
ARM AAE - Intrustion SetsARM AAE - Intrustion Sets
ARM AAE - Intrustion Sets
 
ARM AAE - Introduction
ARM AAE - IntroductionARM AAE - Introduction
ARM AAE - Introduction
 
AAME ARM Techcon2013 006v02 Implementation Diversity
AAME ARM Techcon2013 006v02 Implementation DiversityAAME ARM Techcon2013 006v02 Implementation Diversity
AAME ARM Techcon2013 006v02 Implementation Diversity
 
AAME ARM Techcon2013 005v02 System Startup
AAME ARM Techcon2013 005v02 System StartupAAME ARM Techcon2013 005v02 System Startup
AAME ARM Techcon2013 005v02 System Startup
 
AAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and OptimizationAAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and Optimization
 
AAME ARM Techcon2013 002v02 Advanced Features
AAME ARM Techcon2013 002v02 Advanced FeaturesAAME ARM Techcon2013 002v02 Advanced Features
AAME ARM Techcon2013 002v02 Advanced Features
 
AAME ARM Techcon2013 003v02 Software Development
AAME ARM Techcon2013 003v02  Software DevelopmentAAME ARM Techcon2013 003v02  Software Development
AAME ARM Techcon2013 003v02 Software Development
 
AAME ARM Techcon2013 Intro
AAME ARM Techcon2013 IntroAAME ARM Techcon2013 Intro
AAME ARM Techcon2013 Intro
 
AAME ARM Techcon2013 001v02 Architecture and Programmer's model
AAME ARM Techcon2013 001v02 Architecture and Programmer's modelAAME ARM Techcon2013 001v02 Architecture and Programmer's model
AAME ARM Techcon2013 001v02 Architecture and Programmer's model
 

Recently uploaded

special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Steve Thomason
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Nguyen Thanh Tu Collection
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 

Recently uploaded (20)

special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 

ARM AAE - Memory Systems

  • 1. SOFTWARE & SYSTEMS DESIGN 4 – Memory Systems
  • 2. AGENDA
    – Memory System Hierarchy (this section)
    – Tightly Coupled Memory
    – Alignment, Endianness and Ordering
    – VMSA and PMSA
    – Caches and Coherency
    – Barriers and Synchronization
  • 3. MEMORY SUBSYSTEM
    [Diagram: ARM core containing L1 I-cache, L1 D-cache, MMU/MPU, CP15, bus interface units and WB, connected through an L2 cache to the AMBA interconnect; cache levels L1, L2, L3]
    • MMU
      – Supports virtual memory; included in all Cortex-A processors
    • MPU
      – Provides memory protection only; included in all Cortex-R processors
      – Some Cortex-M processors support an optional MPU
  • 5. AGENDA
    – Memory System Hierarchy
    – Tightly Coupled Memory (this section)
    – Alignment, Endianness and Ordering
    – VMSA and PMSA
    – Caches and Coherency
    – Barriers and Synchronization
  • 6. WHAT IS TIGHTLY COUPLED MEMORY?
    • An alternative approach to caches
      – Allows high-performance operation with slow external memory
      – Supported on Cortex-R processors
    • Fast memory, local to the processor
      – Provides high-speed access without using the system bus
      – Smaller die-size penalty than an equivalent amount of cache
    • Appears at fixed locations within the physical memory map
      – Code and data can be copied to TCMs by application or library code
      – Some processors provide DMA access or an external AXI interface to the TCMs
        • These can be used for TCM preloading
        • Cortex-R4/R5 provide an external AXI slave port for access to TCMs
    • Real-time performance can be predicted precisely on MPU-based cores
      – MMU-enabled cores must perform address translation for TCM accesses
        • TLB checks are made and table walks can occur
  • 7. TCM CONFIGURATION
    • TCM-enabled cores support two interfaces
      – Traditionally referred to as I-TCM and D-TCM
      – Also referred to as TCM-A and TCM-B (e.g. Cortex-R4)
    • Each TCM interface can be configured individually using CP15 operations
      – Physical base address (must be a multiple of the TCM size); a TCM can overlay external memory
      – Memory size (depends on core and implementation)
      – Enable/disable
    • External pin(s) determine the post-reset configuration
      – This makes it possible to boot the system from TCM
      – INITRAM pin(s) enable the TCMs during core reset
      – LOCZRAM pin selects the TCM address before reset (supported on Cortex-R4)
    • When enabled, TCMs must not overlap
    [Diagram: memory map with TCM-A at 0x0 and TCM-B at 0x200000, overlaying external memory]
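A minimal sketch of the two placement rules on this slide (each base address must be a multiple of its TCM size, and enabled TCMs must not overlap), checked in C. The `tcm_cfg` struct and the sizes used with it are hypothetical; on a real core these values are held in core-specific CP15 TCM region registers, whose layout is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one TCM interface (not a real CP15 layout). */
typedef struct {
    uint32_t base;   /* physical base address */
    uint32_t size;   /* size in bytes */
    bool enabled;
} tcm_cfg;

/* Rule 1: the base address must be a multiple of the TCM size. */
static bool tcm_base_valid(const tcm_cfg *t) {
    return t->size != 0 && (t->base % t->size) == 0;
}

/* Rule 2: two enabled TCMs must not overlap (half-open ranges). */
static bool tcm_overlap(const tcm_cfg *a, const tcm_cfg *b) {
    if (!a->enabled || !b->enabled)
        return false;
    uint64_t a_end = (uint64_t)a->base + a->size;
    uint64_t b_end = (uint64_t)b->base + b->size;
    return a->base < b_end && b->base < a_end;
}

/* A pair is acceptable if every enabled TCM is aligned and they do not overlap. */
static bool tcm_pair_valid(const tcm_cfg *a, const tcm_cfg *b) {
    return (!a->enabled || tcm_base_valid(a)) &&
           (!b->enabled || tcm_base_valid(b)) &&
           !tcm_overlap(a, b);
}
```

Disabled TCMs are skipped, since both rules only matter for interfaces that are actually mapped.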
  • 8. AGENDA
    – Memory System Hierarchy
    – Tightly Coupled Memory
    – Alignment, Endianness and Ordering (this section)
    – VMSA and PMSA
    – Caches and Coherency
    – Barriers and Synchronization
  • 9. ALIGNMENT AND ENDIANNESS
    • ARMv4/v5 data alignment
      – Prior to ARMv6, all hardware data accesses had to be size-aligned (for example, words on word boundaries)
      – Unaligned accesses could be trapped by the hardware
      – Unaligned data was accessed in software by a series of aligned memory accesses
    • ARMv6/v7 data alignment
      – Data accesses can be unaligned
        • Only a subset of load/store instructions support unaligned accesses
        • Unaligned accesses are only allowed to addresses marked as Normal memory
      – The load/store unit performs aligned memory accesses and makes the data available to the CPU
    • ARM processors are little-endian by default
      – But they can be configured to access big-endian memory systems
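The software approach that pre-ARMv6 code relied on, assembling unaligned data from aligned (here, byte) accesses, can be sketched in C as follows. The function names are hypothetical. `memcpy` is the portable way to request an unaligned load in native byte order; on ARMv6/v7 targets with unaligned access enabled a compiler may turn it into a single `LDR`, but that depends on the compiler and its flags.

```c
#include <stdint.h>
#include <string.h>

/* Build a 32-bit value from four byte loads, least significant byte first.
   Byte loads are always aligned, so this works at any address and does not
   depend on the host's own endianness. */
static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Same bytes interpreted as big-endian (most significant byte first). */
static uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Unaligned load in the host's native byte order; memcpy avoids the
   undefined behavior of dereferencing a misaligned uint32_t pointer. */
static uint32_t load_native32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Reading from `buf + 1` of the bytes `{0x00, 0x78, 0x56, 0x34, 0x12}` gives 0x12345678 as little-endian and 0x78563412 as big-endian.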
  • 10. MEMORY ORDERING MODEL
    • The ARM architecture defines a weak ordering model…
      … between accesses to Normal memory regions
      … between Normal memory and Device memory accesses
    • This means that accesses might not occur in program order
    • The architecture also allows speculative accesses
      – Data or instructions fetched from memory before being explicitly referenced
      – Examples of speculative access include:
        • Branch prediction
        • Out-of-order data loads
        • Speculative cache linefills
    • Speculative data accesses are only allowed to Normal memory
    • Speculative instruction fetches are allowed to any region not marked as XN
  • 11. WHY DO I CARE ABOUT ACCESS ORDER?
    • In most cases the precise access order does not matter
      – But sometimes it is necessary to enforce access ordering
    • Examples of when ordering matters:
      – Sharing data between different threads/CPUs
        • e.g. mailboxes
      – Sharing data with peripherals
        • e.g. DMA operations
      – Modifying instruction memory
        • e.g. loading a program into RAM or scatter loading
      – Modifying the memory management scheme
        • e.g. context switching or demand paging
    • Where access order is important, you may need to use barrier instructions
    • Compilers/assemblers will not automatically insert barriers for you!
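A mailbox shared between two threads or CPUs is the classic case where ordering matters: the consumer must not observe the "full" flag before the payload. A minimal single-producer, single-consumer sketch, assuming a C11 toolchain with `<stdatomic.h>`; the type and function names are hypothetical. The release store and acquire load ask the compiler to emit whatever barriers the target needs (on ARMv7 typically a `DMB`), whereas plain loads and stores get none.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-slot mailbox. */
typedef struct {
    uint32_t payload;   /* ordinary data in Normal memory */
    atomic_bool full;   /* flag that publishes the payload */
} mailbox;

static void mailbox_init(mailbox *mb) {
    mb->payload = 0;
    atomic_init(&mb->full, false);
}

/* Producer: write the payload, then publish it. */
static bool mailbox_post(mailbox *mb, uint32_t value) {
    if (atomic_load_explicit(&mb->full, memory_order_acquire))
        return false;   /* previous message not yet consumed */
    mb->payload = value;
    /* Release: the payload write cannot become visible after the flag write. */
    atomic_store_explicit(&mb->full, true, memory_order_release);
    return true;
}

/* Consumer: check the flag, then read the payload. */
static bool mailbox_take(mailbox *mb, uint32_t *out) {
    /* Acquire: the payload read cannot be performed before the flag read. */
    if (!atomic_load_explicit(&mb->full, memory_order_acquire))
        return false;
    *out = mb->payload;
    atomic_store_explicit(&mb->full, false, memory_order_release);
    return true;
}
```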
  • 12. ARMv6/v7 MEMORY TYPES
    • In ARMv6/ARMv7, every address location must be described in terms of a memory type
    • The type tells the processor how accesses to that location must behave
      – Memory access ordering rules
      – Caching and buffering behavior
      – Speculation
    • There are three mutually exclusive memory types
      – Normal: data and instructions
      – Device: devices/peripherals
      – Strongly-ordered: devices/peripherals, or data used by legacy code
  • 13. ACCESS ORDERING
    • For Normal memory, ARM implements a weakly-ordered memory model
      – In the absence of address or data dependencies, accesses may be reordered, combined and/or repeated without affecting the system
      – Speculative accesses are permitted
    • Access ordering
      – The table shows the ordering enforced between two memory accesses (A1 and A2) in each type of memory
        [Table: ordering between A1 and A2 for each combination of memory types (not recovered)]
      – "<" indicates that access A1 must complete before access A2
      – Barrier instructions are required to enforce ordering beyond the default behavior in the table
  • 14. AGENDA
    – Memory System Hierarchy
    – Tightly Coupled Memory
    – Alignment, Endianness and Ordering
    – VMSA and PMSA (this section)
    – Caches and Coherency
    – Barriers and Synchronization
  • 15. VMSA AND PMSA
    • Protected Memory System Architecture (PMSA)
      – Allows protection of configurable memory regions
      – Regions are defined by a base address and length
      – The number of available regions varies between processors
      – Protection is based on access type and privilege
      – Does not support virtual address translation
    • Virtual Memory System Architecture (VMSA)
      – Implements virtual memory translation
      – Supported by all Cortex-A processors
      – Uses page tables to configure translation
      – Also implements a full access protection scheme
      – Extended to 40-bit physical addressing on the latest cores (e.g. Cortex-A15)
  • 16. MEMORY PROTECTION UNIT
    • A Memory Protection Unit (MPU) provides basic memory management
      – Allows attributes to be applied to different address regions
      – All accesses are checked against the MPU regions
    • Each region has:
      – Base address
      – Size
      – Attributes (e.g. memory type)
    • Available on:
      – ARM1156T2(F)-S
      – Cortex-R family
    [Diagram: example memory map
      – MPU region 0: 4GB, no access
      – MPU region 1: FLASH, 32MB, read-only, Normal (cached), executable
      – MPU region 2: peripherals, 256MB, read/write, Device (bufferable), Execute Never (XN)
      – MPU region 3: SRAM, 256KB, read/write, Normal (cached, bufferable), executable]
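A minimal sketch of how an MPU might resolve an address against the regions in this example. It assumes the PMSAv7 rule that, where regions overlap, the highest-numbered region takes priority, which is what lets region 0 act as a 4GB "no access" background. The base addresses used with it are hypothetical (the diagram gives only sizes), and privilege levels and memory types are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ACC_NONE, ACC_RO, ACC_RW } mpu_access;

typedef struct {
    uint32_t base;
    uint64_t size;      /* bytes; 64-bit so that a 4GB region fits */
    mpu_access access;
    bool xn;            /* Execute Never */
} mpu_region;

/* Index of the region governing addr, or -1 if none matches.
   Scanning from the top means higher-numbered regions win on overlap. */
static int mpu_find(const mpu_region *r, size_t n, uint32_t addr) {
    for (size_t i = n; i-- > 0;) {
        if (addr >= r[i].base && (uint64_t)addr < (uint64_t)r[i].base + r[i].size)
            return (int)i;
    }
    return -1;
}

/* Is a data read/write or an instruction fetch permitted at addr? */
static bool mpu_allows(const mpu_region *r, size_t n, uint32_t addr,
                       bool is_write, bool is_fetch) {
    int i = mpu_find(r, n, addr);
    if (i < 0 || r[i].access == ACC_NONE)
        return false;
    if (is_write && r[i].access != ACC_RW)
        return false;
    if (is_fetch && r[i].xn)
        return false;
    return true;
}
```

With FLASH at 0x0, SRAM at 0x20000000 and peripherals at 0x40000000 (all hypothetical), a write to FLASH or an instruction fetch from the peripheral region is rejected, and any address outside regions 1 to 3 falls through to region 0.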
  • 17. VIRTUAL MEMORY
    • The core issues "virtual addresses" (VA)
    • Memory is accessed using "physical addresses" (PA)
    • Translation is carried out automatically by the Memory Management Unit (MMU)
    • Translation configuration is stored in page tables in external memory
    [Diagram: a virtual memory map (OS, application space, vectors, peripherals; regions marked privileged access, user access, uncached, read-only) mapped onto a physical memory map (FLASH, RAM, peripherals)]
  • 18. THE MEMORY MANAGEMENT UNIT
    [Diagram: ARM core → MMU (TLBs, table walk unit) → caches → memory; the table walk unit reads the translation tables in memory; virtual address space on the core side, physical address space on the memory side]
    • The Memory Management Unit (MMU) handles translation of virtual addresses to physical addresses
    • Provides hardware to read translation tables in memory, called table walking
      – CP15 Translation Table Base Registers (TTBRs) store the physical base addresses of the tables
    • Translation Lookaside Buffers (TLBs) cache recent translations
      – A core can have separate instruction and data TLBs, or a shared unified TLB
    • When the MMU is enabled, all accesses by the core pass through it
      – The MMU uses cached translations from the TLB(s) or performs a table walk
      – Translation must occur before the cache lookup can complete
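A minimal sketch of a TLB sitting in front of the table walk unit: a hit returns a cached translation, while a miss triggers a walk and allocates an entry. The 4-entry size, round-robin replacement, 4KB page granularity and the `walk_offset` stand-in for a real table walk are all hypothetical; real TLBs also hold attributes, permissions and ASIDs.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 4u   /* hypothetical TLB size */

typedef struct {
    bool valid;
    uint32_t vpn;   /* virtual page number  (VA >> 12) */
    uint32_t ppn;   /* physical page number (PA >> 12) */
} tlb_entry;

typedef struct {
    tlb_entry e[TLB_ENTRIES];
    unsigned next;    /* round-robin replacement pointer */
    unsigned walks;   /* number of table walks performed */
} tlb;

typedef uint32_t (*walk_fn)(uint32_t vpn);   /* returns the ppn */

/* Hypothetical table walk: maps every VA page to VA + 0x80000000. */
static uint32_t walk_offset(uint32_t vpn) { return vpn + 0x80000u; }

static uint32_t tlb_translate(tlb *t, uint32_t va, walk_fn walk) {
    uint32_t vpn = va >> 12;
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        if (t->e[i].valid && t->e[i].vpn == vpn)
            return (t->e[i].ppn << 12) | (va & 0xFFFu);   /* TLB hit */
    uint32_t ppn = walk(vpn);                            /* TLB miss: walk */
    t->walks++;
    t->e[t->next] = (tlb_entry){ true, vpn, ppn };       /* allocate entry */
    t->next = (t->next + 1u) % TLB_ENTRIES;
    return (ppn << 12) | (va & 0xFFFu);
}

/* Needed whenever the translation tables change. */
static void tlb_invalidate_all(tlb *t) {
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = false;
}
```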
  • 19. LEVEL ONE PAGE TABLES
    [Diagram: the Translation Table Base (TTB) points to a first-level table with entries at byte offsets 0x0 to 0x3FFC (4096 four-byte entries); the core's virtual address (VA) selects an entry, which yields the physical address (PA) in memory]
    • The diagram shows a single-level page table
    • VA to PA mapping at 1MB resolution
    • Translation is carried out in a single step
    • The page table lookup is done automatically by the MMU
    • Recent translations are cached in an internal TLB
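The single-level lookup can be sketched in C for the ARMv7 short-descriptor format: VA[31:20] indexes the 4096-entry table, and a section descriptor supplies PA[31:20] while VA[19:0] passes through unchanged. This sketch treats bits[1:0] = 0b10 as a 1MB section and ignores supersections, the optional PXN bit and all attribute fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* 4096 word-sized entries (byte offsets 0x0..0x3FFC), each covering 1MB. */
#define L1_ENTRIES 4096u

/* Simplified check: bits[1:0] == 0b10 marks a section descriptor. */
static bool l1_is_section(uint32_t desc) { return (desc & 0x3u) == 0x2u; }

/* Translate va through a table of 1MB section descriptors.
   Returns false (translation fault) if the entry is not a section. */
static bool l1_translate(const uint32_t table[L1_ENTRIES], uint32_t va, uint32_t *pa) {
    uint32_t desc = table[va >> 20];                    /* VA[31:20] selects entry */
    if (!l1_is_section(desc))
        return false;
    *pa = (desc & 0xFFF00000u) | (va & 0x000FFFFFu);    /* section base | VA[19:0] */
    return true;
}
```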
  • 20. LEVEL TWO PAGE TABLES
    [Diagram: the Translation Table Base (TTB) points to a first-level table (byte offsets 0x0 to 0x3FFC); a first-level entry points to a second-level table (byte offsets 0x0 to 0x3FC), whose entries each map a 4KB page]
    • A second-level page table allows mapping at 4KB resolution
    • Translation requires two page table lookups
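Extending the walk to two levels, again assuming the ARMv7 short-descriptor format: a first-level entry with bits[1:0] = 0b01 points to a 1KB second-level table (256 entries, matching offsets 0x0 to 0x3FC), VA[19:12] selects the second-level entry, and a small-page descriptor supplies PA[31:12]. Physical memory is modelled as a hypothetical word array indexed by physical address / 4; large pages and attributes are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two-level walk (simplified short-descriptor format).
   mem: hypothetical physical memory, one uint32_t per 4 bytes.
   ttb: physical address of the first-level table. */
static bool walk_4k(const uint32_t *mem, uint32_t ttb, uint32_t va, uint32_t *pa) {
    /* Level 1: VA[31:20] selects the first-level entry. */
    uint32_t l1 = mem[(ttb + ((va >> 20) << 2)) >> 2];
    if ((l1 & 0x3u) != 0x1u)
        return false;                          /* not a page table pointer */
    uint32_t l2_base = l1 & 0xFFFFFC00u;       /* 1KB-aligned second-level table */

    /* Level 2: VA[19:12] selects one of 256 entries. */
    uint32_t l2 = mem[(l2_base + (((va >> 12) & 0xFFu) << 2)) >> 2];
    if ((l2 & 0x2u) == 0)
        return false;                          /* fault (or large page, not handled) */
    *pa = (l2 & 0xFFFFF000u) | (va & 0xFFFu);  /* page base | VA[11:0] */
    return true;
}
```

The example in the test places the first-level table at physical address 0x0 and one second-level table at 0x4000, so VA 0x00523ABC resolves to PA 0x80001ABC.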
  • 21. ACCESS PERMISSIONS AND XN
    • Access permissions are determined by the AP[2:0] bits in the page table descriptor
      AP    Privileged   User         Notes
      000   No access    No access    Permission fault
      001   Read/Write   No access    Privileged mode access only
      010   Read/Write   Read         Permission fault on user write
      011   Read/Write   Read/Write   Full access
      100   -            -            Reserved
      101   Read         No access    Privileged mode read-only
      110   Read         Read         Permission fault on writes†
      111   Read         Read         Permission fault on writes
    • "eXecute Never" (XN) prevents instruction execution from a region
      – Speculative instruction fetches are also suppressed
    • The core never makes speculative accesses to Device or Strongly-ordered memory
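The AP table can be encoded directly as a lookup. This is a sketch for data accesses only; the names are hypothetical, treating the reserved encoding 100 as a fault is an assumption of the sketch, and XN, domains and the access flag are not modelled.

```c
#include <stdbool.h>

typedef enum { PERM_NONE, PERM_RO, PERM_RW } perm;

typedef struct { perm priv; perm user; bool reserved; } ap_perms;

/* Decode AP[2:0] following the access permission table. */
static ap_perms ap_decode(unsigned ap) {
    switch (ap & 0x7u) {
    case 0: return (ap_perms){ PERM_NONE, PERM_NONE, false };
    case 1: return (ap_perms){ PERM_RW,   PERM_NONE, false };
    case 2: return (ap_perms){ PERM_RW,   PERM_RO,   false };
    case 3: return (ap_perms){ PERM_RW,   PERM_RW,   false };
    case 5: return (ap_perms){ PERM_RO,   PERM_NONE, false };
    case 6:
    case 7: return (ap_perms){ PERM_RO,   PERM_RO,   false };
    default: return (ap_perms){ PERM_NONE, PERM_NONE, true };  /* 100: reserved */
    }
}

/* Would this data access raise a permission fault? */
static bool ap_faults(unsigned ap, bool privileged, bool is_write) {
    ap_perms p = ap_decode(ap);
    if (p.reserved)
        return true;                      /* assumption: reserved faults */
    perm have = privileged ? p.priv : p.user;
    if (have == PERM_NONE)
        return true;
    return is_write && have != PERM_RW;   /* write to a read-only mapping */
}
```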
  • 22. MMU CONFIGURATION AND MAINTENANCE
    • Enabling the MMU
      – The MMU is disabled at reset and is enabled via the SCTLR.M bit
      – MMU page tables contain the memory type configuration (including shareability, cacheability, bufferability, access permissions, etc.)
      – All of this must be configured before the MMU is enabled
    • TLB maintenance
      – TLBs cache memory translation information
      – They must be invalidated when translation table contents are changed
      – They may also need invalidation on a context switch; ASIDs (Address Space Identifiers) are provided to minimize this
      – TLBs should be invalidated by the startup code on reset
    • When the MMU is disabled
      – PA = VA, i.e. no address translation is performed
      – Instruction accesses may be cached (controlled by the SCTLR.I bit)
      – Data accesses are not cached and are all treated as Strongly-ordered
      – No access permission checks are performed
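The SCTLR rules on this slide can be illustrated with a host-side sketch that treats an SCTLR value as a plain integer. The bit positions (M = bit 0, C = bit 2, I = bit 12) are those of ARMv7-A; the helper names are hypothetical, and the page-table cacheability attributes that also gate data caching are not modelled. On hardware the read-modify-write would use the CP15 instructions shown in the comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-A SCTLR bit positions. */
#define SCTLR_M (1u << 0)    /* MMU enable */
#define SCTLR_C (1u << 2)    /* data and unified cache enable */
#define SCTLR_I (1u << 12)   /* instruction cache enable */

/* Read-modify-write of an SCTLR value, preserving all other bits.
   On hardware: MRC p15, 0, <Rt>, c1, c0, 0 to read, ORR in the bits,
   MCR p15, 0, <Rt>, c1, c0, 0 to write, then ISB; only after the
   translation tables are set up and the TLBs invalidated. */
static uint32_t sctlr_enable_mmu_and_caches(uint32_t sctlr) {
    return sctlr | SCTLR_M | SCTLR_C | SCTLR_I;
}

/* With the MMU off, data accesses are treated as Strongly-ordered and
   never cached, whatever SCTLR.C says. */
static bool data_cacheable(uint32_t sctlr) {
    return (sctlr & SCTLR_M) && (sctlr & SCTLR_C);
}

/* Instruction caching depends only on SCTLR.I, even with the MMU off. */
static bool instr_cacheable(uint32_t sctlr) {
    return (sctlr & SCTLR_I) != 0;
}
```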
  • 23. AGENDA
    – Memory System Hierarchy
    – Tightly Coupled Memory
    – Alignment, Endianness and Ordering
    – VMSA and PMSA
    – Caches and Coherency (this section)
    – Barriers and Synchronization
  • 24. CACHES IN CORTEX-A SERIES PROCESSORS
    • Applications processors are usually implemented with two levels of cache
      – Separate (Harvard) L1 instruction and data caches per core
        • Relatively small (typically 32KB), providing fast access inside the L1 subsystem
      – A single (unified) L2 cache (integrated or external, depending on the CPU)
        • Relatively large (up to 8MB), with access times slower than L1 memory accesses
    • The MMU uses information contained in the translation tables to control which memory locations are cached
    [Diagram: two CPUs (CPU0, CPU1), each with MMU, I-cache, D-cache and CP15, connected through a bus interface unit to a shared L2 cache; AMBA interconnects link the L2 to SRAM, external DRAM and an APB peripheral bus]
  • 25. CACHE TERMINOLOGY
    • You should know the meaning of the following terms…
      – Line
      – Way
      – Set
      – Tag
      – Index
      – Offset
      – Data RAM
      – Tag RAM
      – Valid and dirty bits
    [Diagram: an address split into tag, index and offset fields; the index selects a set across the ways of the tag RAM and data RAM]
  • 26. HOW IS DATA STORED IN MY CACHE?
    • Caches handle data in lines (32 or 64 bytes per cache line)
      – The physical address is used to determine the location of data in the cache
        • Bottom bits (offset) identify the word/byte in the line
        • Middle bits (index) identify which line
        • Top bits (tag) identify the remainder of the address
    • Each line in the cache includes:
      – Tag bits from the associated physical address
      – Valid bit: indicates whether the line exists in the cache
      – Dirty data bit(s): indicate whether the line (or part of it) is not coherent with external memory
    • To reduce cache contention, ARM caches are "set associative"
      – There are multiple possible cache locations (ways) for any given address
      – A victim counter decides which cache way will be used for an allocation
      – The replacement policy used by the victim counter varies by core
    [Diagram: address split into tag, index and offset; the index selects a set across the ways of the tag RAM and data RAM]
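The tag/index/offset split follows directly from the cache geometry. A sketch assuming a power-of-two line size and set count; the type names are hypothetical. For example, a 32KB 4-way cache with 32-byte lines has 32KB / (4 × 32B) = 256 sets, giving 5 offset bits and 8 index bits, with the remaining 19 bits forming the tag.

```c
#include <stdint.h>

/* Hypothetical geometry description; both values must be powers of two,
   and offset bits + index bits must be less than 32. */
typedef struct {
    uint32_t line_bytes;   /* e.g. 32 or 64 */
    uint32_t num_sets;     /* number of sets (lines per way) */
} cache_geom;

typedef struct { uint32_t tag, index, offset; } addr_fields;

static unsigned log2u(uint32_t x) {
    unsigned n = 0;
    while (x > 1u) { x >>= 1; n++; }
    return n;
}

static addr_fields split_address(cache_geom g, uint32_t pa) {
    unsigned off_bits = log2u(g.line_bytes);
    unsigned idx_bits = log2u(g.num_sets);
    addr_fields f;
    f.offset = pa & (g.line_bytes - 1u);                 /* bottom bits */
    f.index  = (pa >> off_bits) & (g.num_sets - 1u);     /* middle bits */
    f.tag    = pa >> (off_bits + idx_bits);              /* top bits */
    return f;
}
```

With 32-byte lines and 256 sets, address 0x80001234 splits into offset 0x14, index 0x91 and tag 0x40000.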
  • 27. EXAMPLE MEMORY ACCESS
    [Diagram: main memory in 16-byte lines (0x00000000 to 0x00000090); a 2-way cache (Way 0, Way 1) with tag and data RAMs and a victim counter; the 32-bit address 0x0000007C splits into tag …001, index 11 and offset 11 00]
    • Memory read: LDR r1,[0x0000007C]
      1. A cache lookup is performed
      2. Cache miss: the tag match fails for the given index in all cache ways
      3. A cache linefill is performed
      4. The victim counter specifies which cache way to use (evicting the previous data)
      5. The cache returns the requested word to the core
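The five steps can be replayed with a small simulation. The geometry (16-byte lines, 4 sets, 2 ways) is inferred from the diagram, where memory lines are 0x10 apart and the index field is two bits wide, so 0x7C has offset 0xC, index 3 and tag 1. A single round-robin victim counter shared by all sets is an assumption; as the slides note, the replacement policy varies by core. Only tags are tracked, not data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Geometry inferred from the example diagram. */
#define LINE_BYTES 16u
#define NUM_SETS   4u
#define NUM_WAYS   2u

typedef struct {
    bool     valid[NUM_SETS][NUM_WAYS];
    uint32_t tag[NUM_SETS][NUM_WAYS];
    unsigned victim;          /* round-robin victim counter (shared, assumed) */
    unsigned hits, misses;
} cache_model;

/* Look up pa; on a miss, perform a simulated linefill into the way chosen
   by the victim counter. Returns the way that now holds the line. */
static unsigned cache_access(cache_model *c, uint32_t pa) {
    uint32_t index = (pa / LINE_BYTES) % NUM_SETS;    /* bits [5:4] */
    uint32_t tag   = pa / (LINE_BYTES * NUM_SETS);    /* bits [31:6] */

    /* 1. Lookup: compare the tag in every way of the indexed set. */
    for (unsigned w = 0; w < NUM_WAYS; w++) {
        if (c->valid[index][w] && c->tag[index][w] == tag) {
            c->hits++;
            return w;                                 /* hit */
        }
    }
    /* 2. Miss in all ways. 3./4. Linefill into the victim way (evicting it). */
    c->misses++;
    unsigned w = c->victim;
    c->victim = (c->victim + 1u) % NUM_WAYS;
    c->valid[index][w] = true;
    c->tag[index][w] = tag;
    /* 5. The requested word would now be returned from the filled line. */
    return w;
}
```

Reading 0x7C misses and fills way 0; a later read of 0x70 hits in the same line, while two further misses to index 3 (0x3C, then 0xBC) cycle the victim counter and evict the original line.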
  • 28. CACHE BEHAVIOR
    • Cache lookup
      – The core checks whether a memory address is currently in the cache
      – A "cache miss" occurs if the data is not found
        • The cache may then automatically load the relevant data
        • This is called a "cache linefill"
      – A "cache hit" occurs if the data is found
        • The data is immediately returned to the core
        • No external memory access takes place
    • Cache eviction
      – To make space for new data, existing cache data may have to be evicted
      – In write-back mode, dirty data has to be written back to memory first
    • Victim counter
      – An internal value used to select the data for eviction
CACHE MODES AND POLICIES
• Allocation policy
– Controls when new data is loaded into the cache
– A read-allocate policy only allocates new data on a read miss
– A write-allocate policy also allocates on a write miss
• Eviction policy
– Governs the selection of lines for eviction
– A round-robin policy cycles through the ways in a fixed order
– A random policy selects a way at random
• Write-through and write-back
– Control what happens when a write operation hits in the cache
– A write-through cache updates external memory as well as the cache
– A write-back cache updates only the cache and marks the line dirty; external memory is updated when the line is evicted or cleaned
WHEN SHOULD I ENABLE CACHES?
• Caches are disabled on reset
– Architecturally, caches are not guaranteed to be in a known state at reset
– They must be invalidated by software on Cortex-A9
– This is not required on Cortex-A5/A7/A15
• The L1 instruction cache can be enabled without enabling the MMU
– Many boot loaders enable the I-cache, but not the D-cache
• Data caching is only possible once the MMU is enabled
– Appropriate cache policies must be configured in the translation tables
• The L2 cache should generally be enabled together with the L1 data cache
– On Cortex-A15 and Cortex-A7 the (unified) L2 cache is always enabled
• But no lookup occurs unless the L1 D-cache on at least one CPU in the cluster is also enabled
– On Cortex-A9 or Cortex-A5, an external L2 cache (such as the PL310) is enabled separately
• Via a write to a memory-mapped control register
Performance is very poor if instructions are not fetched from cache!
CACHE MAINTENANCE OPERATIONS
• Caches require maintenance to ensure that the program always has access to the correct data
– Cache clean
• Writes out “dirty data” so that external memory and cache are coherent
• Only applicable to write-back caching
– Cache invalidate
• Marks lines as invalid and therefore available for new data
• When is maintenance required?
– On context switches that change the mapping between address tags and physical addresses
– For self-modifying code: data written via the data cache must be read back via the instruction cache
• The data cache must be cleaned and the instruction cache invalidated
– If an external agent has modified external memory (e.g. DMA)
• The data cache must be invalidated
COHERENCY OPERATIONS
• Type of operation
– Invalidate: clears the Valid bits on the particular cache/branch predictor entry
– Clean: updates the external memory system with dirty cache line(s)
• Which entries
– All: the entire cache (not available for the data/unified cache)
– MVA: a specific (modified) virtual address
– Set/Way: a specific cache line (not available for the branch predictor)
• Scope
– PoC: Point of Coherency (discussed later)
– PoU: Point of Unification (discussed later)
• Inner shareable
– Operations that can be “broadcast” to the other cores in the Inner Shareable domain
POINT OF UNIFICATION (PoU)
[Figure: a core with I-cache, D-cache, TLB and CP15 system control coprocessor; the Point of Unification is where the instruction, data and TLB paths meet]
• The point at which instruction, data and TLB accesses see the same copy of memory
• Generally the L2 cache or memory, depending on the system design
POINT OF COHERENCY (PoC)
[Figure: Master A and Master B, each with I- and D-caches and a CP15 system control coprocessor; the Point of Coherency is where both masters' paths meet]
• The point at which all agents see the same copy of memory
• Generally the external memory system, though again this is very system dependent
PoU VS. PoC
[Figure: the Point of Unification and Point of Coherency compared for a core without an L2 cache and for a core with an L2 cache]
ASYMMETRIC MULTI-PROCESSING (AMP)
[Figure: CPU1 and CPU2, each running its own task(s) with its own peripheral, exchanging data through shared RAM]
• Each CPU may run a different program
– May also see a different memory map
– May also have its own set of interrupts
• Data exchange through shared memory
– CPU L1 caches will need to be managed for coherency
SMP VS. AMP
• SMP – Symmetric Multi-Processing
– All tasks share a common view of memory and peripherals
– Tasks can be dynamically shared across multiple CPUs
– Simplifies software development
• Provides increased productivity for the programmer
• AMP – Asymmetric Multi-Processing
– Code portability and design flexibility
– Programmer statically assigns tasks to a CPU
– Enables tasks to be isolated from each other
• Each task may have a different view of memory
A MULTICORE ARM PROCESSOR
[Figure: a multicore processor with two Cortex-A9 processor cores, a Snoop Control Unit that maintains L1 cache coherency, an Interrupt Distributor, shared architectural peripherals, CoreSight debug infrastructure and a shared external bus interface]
SNOOP CONTROL UNIT
[Figure: the Snoop Control Unit, which maintains L1 cache coherency]
SNOOP CONTROL UNIT (SCU)
• The Snoop Control Unit (SCU) maintains coherency between the L1 data caches
– Arbitrates accesses to the L2 AXI master interface(s), for both instructions and data
– Duplicated Tag RAMs keep track of what data is allocated in each CPU’s cache
• Separate interfaces into the L1 data caches for coherency maintenance
• Optionally, can use address filtering
– Directing accesses to a configured memory range to AXI Master port 1
[Figure: CPU0 to CPU3, each with D$ and I$, connected to the Snoop Control Unit, which holds a duplicate tag RAM per CPU and drives AXI Master 0 and AXI Master 1]
AGENDA
Memory System Hierarchy
Tightly Coupled Memory
Alignment, Endianness and Ordering
VMSA and PMSA
Caches and Coherency
• Barriers and Synchronization
SYNCHRONIZATION
• Shared resources in a multi-threaded or multiprocessor system need protection in critical code sections
• Operating systems provide resources such as spinlocks and mutexes
• A simple spinlock can be built using ARM’s exclusive load and store instructions (LDREX/STREX)
[Figure: example spinlock code]
BARRIERS
• The ARM architecture includes barrier instructions to force access order and access completion at a specific point
– DMB: Data Memory Barrier
– DSB: Data Synchronization Barrier
– ISB: Instruction Synchronization Barrier
• This course provides a simple introduction to barriers and their use, but…
– If you are writing code where ordering is important, we recommend also reading:
– ARM Architecture Reference Manual ARMv7-A/R Edition (Rev C)
• A3.8 Memory access order
• B2.2.9 Ordering of cache and branch predictor maintenance operations
• B3.10.1 TLB maintenance operations and the memory order model
• Appendix G Barrier Litmus Tests
– Includes worked examples
DMB VS DSB
• A Data Memory Barrier (DMB) is less restrictive than a Data Synchronization Barrier (DSB)
• For a DMB:
– No memory accesses after the DMB in program order are started until all memory accesses before the DMB in program order have been seen by the rest of the system
• A DSB doesn’t complete until:
– All memory accesses before the DSB in program order have completed, and
– All cache, branch predictor and TLB maintenance operations issued by the local processor have completed
– Furthermore, no instruction that appears after the DSB in program order can execute until the DSB completes
• Use a DSB when necessary, but don’t overuse it
MAIL BOX EXAMPLE
• P0: a DMB is needed to ensure the mail box data is seen BEFORE the flag is updated
• P1: a DMB is needed to ensure the mail box is read AFTER the flag is seen
P0: write the message, then flag it as available
    LDR   r1, =ADDR_MAILBOX_DATA
    LDR   r2, =ADDR_MAILBOX_FLAG
    STR   r5, [r1]          ; write a new message into the mail box
    DMB                     ; message must be visible before the flag
    MOV   r0, #1
    STR   r0, [r2]          ; set available flag to signal mail box full
P1: wait for the flag, then read the message
    LDR   r1, =ADDR_MAILBOX_DATA
    LDR   r2, =ADDR_MAILBOX_FLAG
loop
    LDR   r12, [r2]         ; wait for flag (0 = mail box empty)
    CMP   r12, #0
    BEQ   loop
    DMB                     ; flag must be observed before the message is read
    LDR   r0, [r1]          ; read message
ISB
• The ARM architecture defines context as the system settings in CP15
• Context-changing operations include:
– Cache, TLB and branch predictor maintenance operations
– Changes to system control registers (e.g. SCTLR, TTBCR, TTBRn, CONTEXTIDR)
• The effect of a context-changing operation is only guaranteed to be seen after a context synchronization event:
– Taking an exception
– Returning from an exception
– Instruction Synchronization Barrier (ISB)
• An ISB flushes the pipeline and re-fetches the instructions from the cache (or memory)
– Guarantees that the effects of any completed context-changing operation before the ISB are visible to any instruction after the barrier
– Also guarantees that context-changing operations after the ISB instruction only take effect after the ISB has been executed
CP15 EXAMPLE
• To enable FPU/NEON you first need to enable access to cp10 and cp11
– This is done by writing to the Coprocessor Access Control Register (CPACR)
    MRC   p15, 0, r1, c1, c0, 2   ; read CPACR into r1
    ORR   r1, r1, #(0xf << 20)    ; enable full access for cp10 & cp11
    MCR   p15, 0, r1, c1, c0, 2   ; write back into CPACR
    ISB
    MOV   r0, #0x40000000
    VMSR  FPEXC, r0               ; enable FPU and NEON (FPEXC.EN)
• Without the ISB, the processor could already have decoded the VMSR as an Undefined Instruction before the MCR takes effect
– The ISB ensures the update to the CPACR is seen by the processor when decoding the VMSR
SELF-MODIFYING CODE (1)
• P0 loads a new program into memory, which is then executed by P0 and P1
• DCCMVAU, ICIMVAU and BPIMVA denote the corresponding CP15 maintenance operations (performed with MCR)
P0:
    STR   r11, [r1]   ; save instruction to program memory
    DCCMVAU r1        ; clean D-$ so the instruction is visible to the I-$
    DSB               ; ensure the clean completes on all CPUs
    ICIMVAU r1        ; discard stale data from the I-$ …
    BPIMVA r1         ; … and from the branch predictor
    DSB               ; ensure the I-$/BP invalidates complete on all CPUs
    STR   r0, [r2]    ; set flag == 1 to signal completion
    ISB               ; synchronize context on this processor
    MOV   pc, r1      ; branch to new code
P1-Pn:
    WAIT ([r2] == 1)  ; wait for flag signaling completion
                      ; no DSB required here
    ISB               ; synchronize context on this processor
    MOV   pc, r1      ; execute newly saved instruction