This document discusses memory ordering and synchronization in multithreaded programs. It begins with background on mutexes, semaphores, and their differences. It then discusses problems that can occur with lock-based synchronization, such as deadlock, priority inversion, and performance issues. Lock-free programming techniques using atomic operations are presented as a way to synchronize access without locks. Finally, memory ordering, consistency models, and barriers, together with their implementations in compilers, the Linux kernel, and the ARM architecture, are covered in detail.
3. Background
• Synchronization of multithreaded programs
  – Mutex (mutual exclusion)
    • Ensures that no two processes or threads are in their critical sections at the same time
      – Here, a critical section is a period during which the process accesses a shared resource, such as shared memory
4. Background
  – Semaphore
    • A mutex is essentially the same thing as a binary semaphore, and sometimes uses the same basic implementation
    • However, the term "mutex" describes a construct that prevents two processes from accessing a shared resource concurrently
    • The term "binary semaphore" describes a construct that limits access to a single resource
    • In many cases a mutex has a concept of an "owner"
      – The process that locked the mutex is the only process allowed to unlock it. In contrast, semaphores generally do not have this restriction
  – Semaphore vs. mutex
    • http://www.kernel.org/doc/Documentation/mutex-design.txt
5. Synchronization and mutex
Common synchronization methods
Reference: http://msdn.microsoft.com/en-us/library/ms810047.aspx

Common synchronization methods (synchronization method / description / Windows mechanisms):
  – Interlocked operations: Provide atomic logical, arithmetic, and list-manipulation operations that are both thread-safe and multiprocessor-safe. (InterlockedXxx and ExInterlockedXxx routines)
  – Mutexes: Provide (mutually) exclusive access to memory. (Spin locks, fast mutexes, kernel mutexes, synchronization events)
  – Shared/exclusive lock: Allows one thread to write or many threads to read the protected data. (Executive resources)
  – Counted semaphore: Allows a fixed number of acquisitions. (Semaphores)

Windows mutex mechanisms (type of mutex / IRQL considerations / recursion and thread details):
  – Interrupt spin lock: Acquisition raises IRQL to DIRQL and returns the previous IRQL to the caller. Not recursive; release on the same thread as acquire.
  – Spin lock: Acquisition raises IRQL to DISPATCH_LEVEL and returns the previous IRQL to the caller. Not recursive; release on the same thread as acquire.
  – Queued spin lock: Acquisition raises IRQL to DISPATCH_LEVEL and stores the previous IRQL in the lock owner handle. Not recursive; release on the same thread as acquire.
  – Fast mutex: Acquisition raises IRQL to APC_LEVEL and stores the previous IRQL in the lock. Not recursive; release on the same thread as acquire.
  – Kernel mutex (a kernel dispatcher object): Enters a critical region upon acquisition and leaves it upon release. Recursive; release on the same thread as acquire.
  – Synchronization event (a kernel dispatcher object): Acquisition does not change IRQL. Wait at IRQL <= APC_LEVEL and signal at IRQL <= DISPATCH_LEVEL. Not recursive; release on the same thread or on a different thread.
  – Unsafe fast mutex: Acquisition does not change IRQL. Acquire and release at IRQL <= APC_LEVEL. Not recursive; release on the same thread as acquire.
6. What is wrong with mutexes?
• Mutexes are perfectly fine, but you have a problem if there is lock contention
  – If you want your algorithm to be fast, you want to use the available cores as much as possible instead of letting them sleep
  – A thread can hold a mutex and be de-scheduled by the CPU (because of a cache miss, or because its time slice is over); then all the threads that want to acquire this mutex will be blocked
  – And if you have a lot of blocking, the OS also needs to do more context switches, which are expensive because they clear the caches
Reference: http://woboq.com/blog/introduction-to-lockfree-programming.html
7. What is wrong with mutexes?
• Problems with locking
  – Deadlock
  – Priority inversion
    • A low-priority process holds a lock required by a higher-priority process
  – Convoying
    • All the other processes slow to the speed of the slowest one
  – Async-signal safety
    • Signal handlers can't use lock-based primitives
  – Kill-tolerant availability
    • What happens if threads are killed or crash while holding locks?
  – Pre-emption tolerance
    • What happens if you're pre-empted while holding a lock?
  – Overall performance
Reference: http://www.cs.cmu.edu/~410-s05/lectures/L31_LockFree.pdf
8. So how can we do it without locking?
• Lock-free programming
  – Thread-safe access to shared data without the use of synchronization primitives such as mutexes
  – Practical with hardware support
    • Modern CPUs have something called atomic operations
    • The use of shared memory and an atomic instruction provides the mutual exclusion
9. Atomic operations
• Atomic operations
  – Processors have instructions that can be used to implement lock-free and wait-free algorithms
    • Atomic read-write
    • Atomic swap, also called XCHG
    • Test-and-set
    • Fetch-and-add
    • Compare-and-swap (CAS)
      – Compare and Exchange (CMPXCHG) instruction in the x86 and Itanium architectures
      – ABA problem
        » http://woboq.com/blog/introduction-to-lockfree-programming.html
Reference:
http://en.wikipedia.org/wiki/Atomic_operation
http://en.wikipedia.org/wiki/Read-modify-write
10. Atomic operations
• Load-Link/Store-Conditional
  – The LDREX and STREX instructions in ARM split the operation of atomically updating memory into two separate steps. Together, they provide atomic updates in conjunction with exclusive monitors that track exclusive memory accesses. Load-Exclusive and Store-Exclusive must only access memory regions marked as Normal
  – For example
    » LDREX R1, [R0] performs a Load-Exclusive from the address in R0, places the value into R1, and updates the exclusive monitor(s)
    » STREX R2, R1, [R0] performs a Store-Exclusive operation to the address in R0, conditionally storing the value from R1 and indicating success or failure in R2
  – Exclusive accesses to memory locations marked as Non-shareable are checked only against the local monitor. Exclusive accesses to memory locations marked as Shareable are checked against both the local monitor and the global monitor
Reference:
http://infocenter.arm.com/help/topic/com.arm.doc.dht0008a/ch01s02s01.html
http://infocenter.arm.com/help/topic/com.arm.doc.dht0008a/CJAGCFAF.html
11. Atomic operations
• GCC built-in functions for atomic memory access
  – http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Atomic-Builtins.html
• Atomic operations supported in the Linux kernel
  – https://www.kernel.org/doc/Documentation/atomic_ops.txt
• Atomic operations supported in C11/C++11
  – C11 defines a new _Atomic() type specifier. You can declare an atomic integer like this:
      _Atomic(int) counter;
  – C++11 moves this declaration into the standard library:
      #include <atomic>
      std::atomic<int> counter;
Reference:
http://www.informit.com/articles/article.aspx?p=1832575
12. Atomic operations
• Is an atomic operation enough?
• Linux-v3.7.8/arch/arm/include/asm/atomic.h

static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
{
	unsigned long oldval, res;

	smp_mb();

	do {
		__asm__ __volatile__("@ atomic_cmpxchg\n"
		"ldrex	%1, [%3]\n"
		"mov	%0, #0\n"
		"teq	%1, %4\n"
		"strexeq %0, %5, [%3]\n"
		    : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
		    : "r" (&ptr->counter), "Ir" (old), "r" (new)
		    : "cc");
	} while (res);

	smp_mb();

	return oldval;
}

Note the smp_mb() memory barriers before and after the LDREX/STREX retry loop: the atomic operation alone is not enough.
Reference:
http://lxr.linux.no/#linux+v3.7.8/arch/arm/include/asm/atomic.h#L115
13. Memory barrier
Before talking about memory barriers, let's look at memory ordering first.
14. Memory ordering
• Memory ordering - memory access ordering
  – Program order
    • The order of the program's object code as seen by the CPU, which might differ from the order in the source code due to compiler optimizations
  – Execution order
    • Can differ from program order due to both compiler and CPU implementation optimizations
  – Perceived order
    • Can differ from the execution order due to caching, interconnect, and memory-system optimizations
• Why memory reordering?
  – Performance!
Reference:
http://www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf
http://preshing.com/20120930/weak-vs-strong-memory-models
15. Memory consistency models
• Memory models - memory consistency models
• Sequential consistency
  – All reads and all writes are in order
• Relaxed consistency
  – Some types of reordering are allowed
    • Loads can be reordered after loads (for better working of cache coherency, better scaling)
    • Loads can be reordered after stores
    • Stores can be reordered after stores
    • Stores can be reordered after loads
• Weak consistency
  – Reads and writes are arbitrarily reordered, limited only by explicit memory barriers
16. Weak vs. strong memory models
Reference: http://preshing.com/20120930/weak-vs-strong-memory-models
17. Memory ordering in some architectures
SPARC TSO = total store order (default)
SPARC RMO = relaxed memory order (not supported on recent CPUs)
SPARC PSO = partial store order (not supported on recent CPUs)

Reordering permitted            Alpha ARMv7 PA-RISC POWER RMO PSO TSO x86 x86-oostore AMD64 IA-64 zSeries
Loads reordered after loads       Y     Y     Y      Y     Y   -   -   -      Y         -     Y      -
Loads reordered after stores      Y     Y     Y      Y     Y   -   -   -      Y         -     Y      -
Stores reordered after stores     Y     Y     Y      Y     Y   Y   -   -      Y         -     Y      -
Stores reordered after loads      Y     Y     Y      Y     Y   Y   Y   Y      Y         Y     Y      Y
Atomics reordered with loads      Y     Y     -      Y     Y   -   -   -      -         -     Y      -
Atomics reordered with stores     Y     Y     -      Y     Y   Y   -   -      -         -     Y      -
Dependent loads reordered         Y     -     -      -     -   -   -   -      -         -     -      -
Incoherent instruction cache/pipeline: permitted on most of these architectures (10 of the 12 columns in the original table)

Reference: http://en.wikipedia.org/wiki/Memory_ordering
18. Types of Memory Barrier
• #LoadLoad
– Ensures that loads before the barrier complete before any loads after it
• #StoreStore
– Ensures that stores before the barrier are visible before any stores after it
• #LoadStore
– Ensures that loads before the barrier complete before any stores after it
• #StoreLoad
– A StoreLoad barrier ensures that all stores performed before the barrier are visible to other
processors, and that all loads performed after the barrier receive the latest value that is visible at the
time of the barrier
Reference:
http://preshing.com/20120710/memory-barriers-are-like-source-control-operations
// Thread 1: publish the data, then set the flag
Value = x;            // Publish some data
STORESTORE_FENCE();   // Prevent reordering of the stores
IsPublished = 1;      // Set shared flag to indicate availability of data

// Thread 2: check the flag, then read the data
if (IsPublished)      // Load and check shared flag
{
    LOADLOAD_FENCE(); // Prevent reordering of the loads
    return Value;     // Load published value
}
19. Memory barrier in compiler
• GCC compiler memory barrier
– These barriers prevent a compiler from reordering instructions,
they do not prevent reordering by CPU.
• GCC support for hardware memory barriers
– This builtin issues a full memory barrier.
Reference:
http://en.wikipedia.org/wiki/Memory_ordering
http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Atomic-Builtins.html
asm volatile("" ::: "memory");
or
__asm__ __volatile__ ("" ::: "memory");
__sync_synchronize();
20. Memory barriers in Linux kernel
• General barrier
– barrier()
• Compiler barrier only. The compiler will not reorder memory accesses from one side of this
statement to the other. This has no effect on the order that the processor actually executes
the generated instructions.
• Mandatory barriers
– mb()
• A full system memory barrier. All memory operations before the mb() in the instruction
stream will be committed before any operations after the mb() are committed. This ordering
will be visible to all bus masters in the system. It will also ensure the order in which
accesses from a single processor reaches slave devices.
– rmb()
• Like mb(), but only guarantees ordering between read accesses. That is, all read
operations before an rmb() will be committed before any read operations after the rmb().
– wmb()
• Like mb(), but only guarantees ordering between write accesses. That is, all write
operations before a wmb() will be committed before any write operations after the wmb().
Reference:
http://blogs.arm.com/software-
enablement/448-memory-access-ordering-
part-2-barriers-and-the-linux-kernel/
http://www.kernel.org/doc/Documentation/
memory-barriers.txt
21. Memory barriers in Linux kernel
• SMP conditional barriers
– smp_mb()
• Similar to mb(), but only guarantees ordering between cores/processors within an
SMP system. All memory accesses before the smp_mb() will be visible to all cores
within the SMP system before any accesses after the smp_mb().
– smp_rmb()
• Like smp_mb(), but only guarantees ordering between read accesses.
– smp_wmb()
• Like smp_mb(), but only guarantees ordering between write accesses.
– SMP barriers are a subset of mandatory barriers, not a superset.
• An SMP barrier cannot replace a mandatory barrier, but a mandatory barrier can
replace an SMP barrier.
• Implicit barriers
– Locking constructs in the kernel act as implicit SMP barriers, in the same way
as pthread synchronization operations do in user space.
– I/O accessor macros (readb(), iowrite32()) for the ARM architecture act as
explicit memory barriers when kernel is compiled with
CONFIG_ARM_DMA_MEM_BUFFERABLE. This was added in linux-2.6.35.
• arch/arm/include/asm/io.h
• arch/arm/mm/Kconfig
Reference:
https://www.kernel.org/doc/Documentation/memory-barriers.txt
23. Memory ordering in ARM Architecture
• Memory types
– Normal memory
• Normal memory is effectively for all of your data and executable code
• This memory type permits speculative reads, merging of accesses and repeating of
reads without side effects
• Accesses to Normal memory can always be buffered, and in most situations they
are also cached - but they can be configured to be uncached
• There is no implicit ordering of Normal memory accesses
– Device memory and Strongly-ordered memory
• Used with memory mapped peripherals or other control registers
• Processors implementing the LPAE treat Device and Strongly-ordered memory
regions identically
• ARMv7-A processors that do not implement the LPAE can set device memory to be
Shareable or Non-shareable
• Accesses to these types of memory must happen exactly the number of times that
executing the program suggests they should
• There is no guarantee about ordering between memory accesses to different
devices, or usually between accesses of different memory types
Reference:
http://blogs.arm.com/software-enablement/594-memory-access-ordering-part-3-memory-access-ordering-in-the-arm-architecture/
24. Memory ordering in ARM Architecture
• Attributes of the ARM memory types
– Normal
• Shareable or Non-shareable
• Cacheable or Non-cacheable
– Device (w/o LPAE)
• Shareable or Non-shareable
– Device (w LPAE)
• Always shareable
– Strongly-ordered
• Always shareable
• Accesses must wait for the slave's acknowledgement before completing
ARM ® Architecture Reference
Manual
ARMv7-A and ARMv7-R edition
25. Memory ordering in ARM Architecture
• Figure A3-5 shows the memory ordering between two explicit accesses A1 and A2,
where A1 occurs before A2 in program order
– For Device and Strongly-ordered accesses, accesses must arrive at any particular
memory-mapped peripheral or block of memory in program order, that is, A1 must arrive
before A2. There are no ordering restrictions about when accesses arrive at different
peripherals or blocks of memory.
– Normal memory accesses can arrive at any memory-mapped peripheral or block of
memory in any order.
26. Memory ordering in ARM Architecture
• Barriers
– Barriers were introduced progressively into the ARM architecture
• Some ARMv5 processors, such as the ARM926EJ-S, implemented a Drain Write
Buffer cp15 operation, which halted execution until any buffered writes had drained
into the external memory system
• With the introduction of the ARMv6 memory model, this operation was redefined in
more architectural terms and became the Data Synchronization Barrier
– ARMv6 also introduced the new Data Memory Barrier and Flush Prefetch Buffer
cp15 operations
• ARMv7 evolved the memory model somewhat, extending the meaning of the
barriers - and the Flush Prefetch Buffer operation was renamed the Instruction
Synchronization Barrier
• ARMv7 also allocated dedicated instruction encodings for the barrier operations
– Use of the cp15 operations is now deprecated and software targeting ARMv7 or
later should use the DMB, DSB and ISB mnemonics.
• And finally, ARMv7 extended the Shareability concept to cover both Inner-shareable
and Outer-shareable domains
– This together with AMBA4 ACE gives us barriers that propagate into the memory
system
27. Memory ordering in ARM Architecture
– Instruction Synchronization Barrier (ISB)
• The ISB ensures that any subsequent instructions are fetched anew
from cache or memory, so that privilege and access permissions are
checked against the current MMU configuration
– It is used to ensure any previously executed context-changing
operations will have completed by the time the ISB completes
• Access type and domain are not really relevant for this barrier
– It is not used in any of the Linux memory barrier primitives, but
appears in memory management, cache control and context
switching code
28. Memory ordering in ARM Architecture
– Data Memory Barrier (DMB)
• DMB prevents reordering of data accesses instructions across itself
– All data accesses by this processor/core before the DMB will be
visible to all other masters within the specified shareability domain
before any of the data accesses after it
– It also ensures that any explicit preceding data/unified cache
maintenance operations have completed before any subsequent
data accesses are executed
– The DMB instruction takes two optional parameters: an operation
type (stores only - 'ST' - or loads and stores) and a domain
– The default operation type is loads and stores and the default
domain is System
• In the Linux kernel, the DMB instruction is used for the smp_*mb()
macros
29. Memory ordering in ARM Architecture
– Data Synchronization Barrier (DSB)
• DSB enforces the same ordering as the Data Memory Barrier
– But it also blocks execution of any further instructions until
synchronization is complete
– It also waits until all cache and branch predictor maintenance
operations have completed for the specified shareability domain
– If the access type is load and store then it also waits for any TLB
maintenance operations to complete.
• In the Linux kernel, the DSB instruction is used for the *mb() macros.
30. Memory ordering in ARM Architecture
• Shareability domains
– Shareability domains define "zones" within the bus topology within which memory
accesses are to be kept consistent (taking place in a predictable way) and
potentially coherent (with hardware support)
– Outside of this domain, observers might not see the same order of memory
accesses as inside it

Domain           Abbreviation  Description
Non-shareable    NSH           A domain consisting only of the local agent. Accesses that never need to be
                               synchronized with other cores, processors or devices. Not normally used in
                               SMP systems.
Inner Shareable  ISH           A domain potentially shared by multiple agents, but usually not all agents in
                               the system. A system can have multiple Inner Shareable domains. An operation
                               that affects one Inner Shareable domain does not affect other Inner Shareable
                               domains in the system.
Outer Shareable  OSH           A domain almost certainly shared by multiple agents, and quite likely
                               consisting of several Inner Shareable domains. An operation that affects an
                               Outer Shareable domain also implicitly affects all Inner Shareable domains
                               within it. For processors such as the Cortex-A15 MPCore that implement the
                               LPAE, all Device memory accesses are considered Outer Shareable. For other
                               processors, the shareability attribute can be set explicitly (to Shareable
                               or Non-shareable).
Full system      SY            An operation on the full system affects all agents in the system: all
                               Non-shareable regions, all Inner Shareable regions and all Outer Shareable
                               regions. Simple peripherals such as UARTs, and several more complex ones, do
                               not normally need to be placed in a restricted shareability domain.
Reference:
http://infocenter.arm.com/help/topic/com.arm.doc.dui0489c/CIHGHHIE.html
ARMv7
31. Memory ordering in ARM Architecture
Allocated values for the data barriers (DMB/DSB) - ARMv8
32. Memory ordering in ARM Architecture
• The shareability domains example
4 cores per cluster,
2 clusters per chip
34. Memory model supported in C++11
• C++ Memory model
– Sequential consistent/acquire-release/relaxed
• http://en.cppreference.com/w/cpp/atomic/memory_order
• http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
35. Acquire and Release Semantics
• ARMv8 AArch64/AArch32 support load-acquire/store-release
instructions
– The Load-Acquire/Store-Release instructions can remove the requirement to use
the explicit DMB memory barrier instruction
Reference:
http://preshing.com/20120913/acquire-and-release-semantics
http://www.arm.com/files/downloads/ARMv8_Architecture.pdf
Acquire semantics is a property which can only apply to
operations which read from shared memory. The operation is
then considered a read-acquire. Acquire semantics prevent
memory reordering of the read-acquire with any read or write
operation which follows it in program order.
Release semantics is a property which can only apply to
operations which write to shared memory. The operation is then
considered a write-release. Release semantics prevent memory
reordering of the write-release with any read or write operation
which precedes it in program order.
36. Acquire and Release Semantics
• A demo example

// Without ordering
// Shared global variables
int A = 0;
int Ready = 0;

// Thread 1
A = 42;
Ready = 1;

// Thread 2
int r1 = Ready;
int r2 = A;

// Possible results: r1 = 0, r2 = 0
//                   r1 = 0, r2 = 42
//                   r1 = 1, r2 = 0
//                   r1 = 1, r2 = 42

// With acquire/release
// Shared global variables
int A = 0;
std::atomic<int> Ready{0};

// Thread 1
A = 42;
Ready.store(1, std::memory_order_release);

// Thread 2
int r1 = Ready.load(std::memory_order_acquire);
int r2 = A;

// Possible results: r1 = 0, r2 = 0
//                   r1 = 0, r2 = 42
//                   r1 = 1, r2 = 42
37. Acquire and Release Semantics
• A Write-Release Can Synchronize-With a Read-Acquire
// Thread 1
void SendTestMessage(void* param)
{
// Copy to shared memory using non-atomic stores.
g_payload.tick = clock();
g_payload.str = "TestMessage";
g_payload.param = param;
// Perform an atomic write-release to indicate that the message is ready.
g_guard.store(1, std::memory_order_release);
}
// Thread 2
bool TryReceiveMessage(Message& result)
{
// Perform an atomic read-acquire to check whether the message is ready.
int ready = g_guard.load(std::memory_order_acquire);
if (ready != 0)
{
// Yes. Copy from shared memory using non-atomic loads.
result.tick = g_payload.tick;
result.str = g_payload.str;
result.param = g_payload.param;
return true;
}
// No.
return false;
}
Reference:
http://preshing.com/20130823/the-synchronizes-with-relation/
39. Volatile vs. memory-order/atomic
• What does the volatile keyword mean?
Reference:
http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484
40. Volatile vs. memory-order/atomic
• C programmers have often taken volatile to mean that the variable could
be changed outside of the current thread of execution
– as a result, they are sometimes tempted to use it in kernel code
when shared data structures are being used
– In other words, they have been known to treat volatile types as a sort
of easy atomic variable, which they are not
– The use of volatile in kernel code is almost never correct
• The key point to understand with regard to volatile is that its purpose is to
suppress optimization, which is almost never what one really wants to do
• In the kernel, one must protect shared data structures against unwanted
concurrent access, which is very much a different task
• Like volatile, the kernel primitives which make concurrent access to data
safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent
unwanted optimization. If they are being used properly, there will be no
need to use volatile as well
Reference:
https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
41. Volatile vs. memory-order/atomic
• To safely write lock-free code that communicates between threads without using
locks
– prefer to use ordered atomic variables
– Java/.NET volatile, C++0x atomic<T>, and C-compatible atomic_T
• To safely communicate with special hardware or other memory that has unusual
semantics
– use un-optimizable variables: ISO C/C++ volatile
– Remember that reads and writes of these variables are not necessarily
atomic
• To protect shared data structures against unwanted concurrent access in kernel
code
– use kernel concurrent access primitives, like spinlocks, mutexes, memory
barriers
• Finally, to express a variable that both has unusual semantics and has any or all
of the atomicity and/or ordering guarantees needed for lock-free coding
– only the ISO C++11 Standard provides a direct way to spell it: volatile
atomic<T>
43. Usage of memory barrier
instructions
• In what situations might I need to insert memory barrier instructions?
– Mutexes
Reference:
http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf
http://infocenter.arm.com/help/topic/com.arm.doc.faqs/ka14041.html
LOCKED EQU 1
UNLOCKED EQU 0
lock_mutex
; Is mutex locked?
LDREX r1, [r0] ; Check if locked
CMP r1, #LOCKED ; Compare with "locked"
WFEEQ ; Mutex is locked, go into standby
BEQ lock_mutex ; On waking re-check the mutex
; Attempt to lock mutex
MOV r1, #LOCKED
STREX r2, r1, [r0] ; Attempt to lock mutex
CMP r2, #0x0 ; Check whether store completed
BNE lock_mutex ; If store failed, try again
DMB ; Required before accessing protected resource
BX lr
unlock_mutex
DMB ; Ensure accesses to protected resource have completed
MOV r1, #UNLOCKED ; Write "unlocked" into lock field
STR r1, [r0]
DSB ; Ensure update of the mutex occurs before other CPUs wake
SEV ; Send event to other CPUs, wakes any CPU waiting on using WFE
BX lr
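A portable C++11 analogue of the mutex above (a sketch, not the kernel's or ARM's implementation; `run` and the counter are illustrative). On ARM, test_and_set compiles to an exclusive load/store loop like the LDREX/STREX above, and the acquire ordering on lock plus release ordering on unlock stand in for the two DMBs; there is no portable equivalent of WFE/SEV, so this version simply spins.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic_flag mutex_flag = ATOMIC_FLAG_INIT;
int protected_counter = 0;

void lock_mutex() {
    // Acquire ordering: accesses to the protected resource cannot be
    // hoisted above the lock (the role of the DMB after STREX succeeds).
    while (mutex_flag.test_and_set(std::memory_order_acquire))
        ; // spin until the flag was previously clear
}

void unlock_mutex() {
    // Release ordering: accesses to the protected resource cannot sink
    // below the unlock (the role of the DMB before the STR).
    mutex_flag.clear(std::memory_order_release);
}

int run(int nthreads, int per_thread) {
    protected_counter = 0;
    std::vector<std::thread> ts;
    for (int i = 0; i < nthreads; ++i)
        ts.emplace_back([per_thread] {
            for (int j = 0; j < per_thread; ++j) {
                lock_mutex();
                ++protected_counter; // the protected resource
                unlock_mutex();
            }
        });
    for (auto& t : ts) t.join();
    return protected_counter;
}
```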
44. Usage of memory barrier instructions
– Memory Remapping
• Consider a situation where your reset handler/boot code lives in Flash memory (ROM),
which is aliased to address 0x0 to ensure that your program boots correctly from the vector
table, which normally resides at the bottom of memory (see left-hand-side memory map).
• After you have initialized your system, you may wish to turn off the Flash memory alias so
that you can use the bottom portion of memory for RAM (see right-hand-side memory
map). The following code (running from the permanent Flash memory region) disables the
Flash alias, then calls a memory block copying routine (e.g., memcpy) to copy some code
to the bottom portion of memory (RAM) and executes it.
• Question: where must the barriers be placed? One correct placement:

MOV r0, #0
MOV r1, #REMAP_REG
STR r0, [r1]            ; Disable Flash alias
DMB                     ; Ensure the str above has completed
BL block_copy_routine() ; Block copy code into RAM
DSB                     ; Ensure the block copy has completed
ISB                     ; Flush the pipeline so the processor fetches the new instructions
BL copied_routine()     ; Execute copied routine (now in RAM)
45. Usage of memory barrier instructions
– Self-modifying code
– If the memory the block copy routine writes to is marked as 'cacheable', the
instruction cache must be invalidated so that the processor does not execute
stale 'cached' code.
– For 'write-back' regions, the data cache must be cleaned before the instruction
cache is invalidated.
Overlay_manager
; ...
BL block_copy            ; Copy new routine from ROM to RAM
DSB                      ; Ensure block copy has completed
ISB                      ; Flush pipeline to ensure processor fetches new instructions
B relocated_code         ; Branch to new routine

Overlay_manager
; ...
BL block_copy            ; Copy new routine from ROM to RAM
DSB                      ; Ensure block copy has completed
data_cache_clean         ; Clean the cache so that the new routine is written out to memory
DSB                      ; Ensure data cache clean has completed
icache_and_pb_invalidate ; Invalidate the instruction cache and branch predictor so that the
                         ; old routine is no longer cached
DSB                      ; Ensure invalidate has completed
ISB                      ; Flush pipeline to ensure processor fetches new instructions
B relocated_code         ; Branch to new routine