1. III Year –CSE
Regulation 2017
Department of Computer Science &
Engineering
Ramco Institute of Technology
CS8603 DISTRIBUTED SYSTEMS
RIT/CSE/CS8603-DS/UNIT V
2. 5.2. Distributed Shared Memory
5.2.2. Memory Consistency
Models
UNIT V
Source:
Ajay D Kshemkalyani & Mukesh Singhal (2010). Distributed
Computing: Principles, Algorithms and Systems. Cambridge
University Press
4. 5.2.2. Memory Consistency Models
Memory coherence is the ability of the system to
execute memory operations correctly.
Memory consistency models determine when
data updates are propagated and what level of
inconsistency is acceptable.
For example, assume the following:
There are n processes, and process Pi issues s_i memory operations.
All the operations issued by a process are executed sequentially.
We then get (s_1 + s_2 + ... + s_n)! / (s_1! s_2! ... s_n!)
possible interleavings.
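The count above can be checked numerically. The following is a minimal Python sketch; the function name `count_interleavings` is chosen here for illustration and does not come from the source text.

```python
from math import factorial


def count_interleavings(ops_per_process):
    """Number of ways to interleave the operations of all processes
    while keeping each process's own operations in program order."""
    total = factorial(sum(ops_per_process))
    for s in ops_per_process:
        total //= factorial(s)
    return total


print(count_interleavings([2, 2]))     # 4!/(2!2!) = 6
print(count_interleavings([3, 2, 1]))  # 6!/(3!2!1!) = 60
```

Even two processes with ten operations each already give 184756 interleavings, which is why a consistency model is needed to state which of them are allowed.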
The memory coherence model defines which interleavings
are permitted.
Traditionally, a Read returns the value written by the
most recent Write.
With replicas and concurrent accesses, the "most recent"
Write is ambiguous.
The DSM consistency model is a contract between the DSM
system and the application programmer.
Hence, a clear definition of correctness is required in
such a system.
The DSM system enforces a particular memory
consistency model.
https://www.youtube.com/watch?v=uAqIa-mtjJ4
To use DSM, one must also implement a distributed
synchronization service.
It includes the use of locks, semaphores, and
message passing.
In most implementations, data is read from local copies
of the data, but updates to data must be propagated to
the other copies of the data.
Some of the memory consistency models are:
Strict Consistency/ Atomic Consistency/Linearizability
Sequential consistency
Causal Consistency
PRAM (Pipelined RAM) / Processor Consistency
Slow Memory
Strict Consistency/ Atomic Consistency/Linearizability
It corresponds to the notion of correctness on the
traditional Von Neumann architecture or the uniprocessor
machine.
Any Read to a location (variable) should return the
value written by the most recent Write to that location
(variable).
Two salient features:
(i) a common global time axis is implicitly available in a
uniprocessor system
(ii) each Write is immediately visible to all processes.
Adapting this correctness model to a DSM system
with operations that can be concurrently issued by the
various processes gives the strict consistency model,
also known as the atomic consistency model.
It can be formally specified as follows:
1. Any Read to a location (variable) is required to return
the value written by the most recent Write to that location
(variable) as per a global time reference.
For operations that do not overlap as per the global time
reference, the specification is clear.
For operations that overlap as per the global time
reference, the following further specifications are
necessary.
2. All operations appear to be executed atomically and
sequentially.
3. All processors see the same ordering of events, which
is equivalent to the global-time occurrence of non-
overlapping events.
An alternate way to specify this consistency model is in
terms of invocation and response to each read and write
operation.
Each operation takes a finite time interval.
Hence, different operations by different processors can
overlap in time.
However, the invocation and the response to each
invocation can both be separately viewed as being atomic
events.
An execution sequence in global time is viewed as a
sequence Seq of such invocations and responses.
Clearly, Seq must satisfy the following conditions:
(Liveness:) Each invocation must have a corresponding
response.
(Correctness:) The projection of Seq on any processor i,
Strict Consistency/ Atomic Consistency/Linearizability
(Example)
Figure (a) The execution is not linearizable because,
although the Read by P2 begins after Write(x, 4), the Read
returns the value that existed before the Write. Hence, a
permutation of Seq satisfying condition 2 above on global
time order does not exist.
Figure (b) The execution is linearizable. The global order
of operations (corresponding to invocation, response pairs
in Seq), consistent with the real-time occurrence, is:
Write(y, 2), Write(x, 4), Read(x, 4), Read(y, 2). This
permutation of Seq satisfies conditions 1 and 2.
Figure (c) The execution is not linearizable. The two
dependencies, Read(x, 0) before Write(x, 4) and Read(y, 0)
before Write(y, 2), cannot both be satisfied in a global order
while satisfying the local order of operations at each
processor. Hence, there does not exist any permutation
of Seq satisfying conditions 1 and 2.
Strict Consistency/ Atomic
Consistency/Linearizability (Implementation)
Simulating a global time axis is expensive.
The implementation assumes full replication and total
order broadcast support.
Sequential Consistency Model(SC)
Linearizability is too strict for most practical
purposes, and it is very expensive to implement.
Programmers can deal with weaker models.
The first weaker model, sequential consistency (SC),
was proposed by Lamport.
It is the strongest memory model for DSM that is used
in practice.
It uses a logical time reference instead of the global
time reference.
Sequential consistency is specified as follows:
The result of any execution is the same as if all
operations of the processors were executed in
some sequential order.
The operations of each individual processor appear
in this sequence in the local program order.
More formally, a sequence Seq of invocation and
response events is sequentially consistent if there is a
permutation Seq' of adjacent pairs of corresponding
<invoc, resp> events satisfying:
1. For every variable v, the projection of Seq' on v,
denoted Seq'_v, is such that every Read (adjacent <invoc,
resp> event pair) returns the most recent Write (adjacent
<invoc, resp> event pair) that immediately preceded it.
2. If the response op1(resp) of operation op1 at process
Pi occurred before the invocation op2(invoc) of operation
op2 by process Pi in Seq, then op1 (adjacent <invoc, resp>
event pair) occurs before op2 (adjacent <invoc, resp> event
pair) in Seq'.
Condition 1 is the same as that for linearizability.
Condition 2 differs from that for linearizability.
It specifies that the common order Seq must satisfy
only the local order of events at each processor,
instead of the global order of non-overlapping
events.
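The two conditions can be illustrated with a brute-force check that tries every interleaving respecting local program order. This is only a sketch for tiny histories, not an algorithm from the source; the function name, the tuple encoding of operations, and the two sample histories are assumptions made here for illustration.

```python
def is_sequentially_consistent(programs, initial=0):
    """Brute-force SC check for a tiny history.

    programs: one list per process, in program order (condition 2).
    Each operation is ("W", var, value) or ("R", var, value_returned).
    """
    def search(positions, memory):
        if all(p == len(prog) for p, prog in zip(positions, programs)):
            return True  # every operation has been placed in the order
        for i, prog in enumerate(programs):
            p = positions[i]
            if p == len(prog):
                continue
            kind, var, value = prog[p]
            advanced = positions[:i] + (p + 1,) + positions[i + 1:]
            if kind == "W":
                if search(advanced, {**memory, var: value}):
                    return True
            elif memory.get(var, initial) == value:  # condition 1
                if search(advanced, memory):
                    return True
        return False

    return search(tuple(0 for _ in programs), {})


# Hypothetical two-process histories (P1's operations first, then P2's).
history_a = [[("W", "x", 4), ("R", "y", 2)], [("W", "y", 2), ("R", "x", 0)]]
history_c = [[("W", "x", 4), ("R", "y", 0)], [("W", "y", 2), ("R", "x", 0)]]
print(is_sequentially_consistent(history_a))  # True
print(is_sequentially_consistent(history_c))  # False
```

The first history admits the order Write(y, 2), Read(x, 0), Write(x, 4), Read(y, 2). The second does not admit any order, because each Read of 0 would have to precede the other process's Write while also following its own process's Write.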
Sequential Consistency Model(Example)
Figure 12.4(a) The execution is sequentially consistent.
The global order is: Write(y, 2), Read(x, 0), Write(x, 4),
Read(y, 2).
Figure 12.4(b) As the execution is linearizable (seen in
Section 12.2.1), it is also sequentially consistent. The
global order of operations (corresponding to invocation,
response pairs in Seq), consistent with the real-time
occurrence, is: Write(y, 2), Write(x, 4), Read(x, 4), Read(y, 2).
Figure 12.4(c) The execution is not sequentially
consistent (and hence not linearizable). The two
dependencies, Read(x, 0) before Write(x, 4) and Read(y, 0)
before Write(y, 2), cannot both be satisfied in a global order
while satisfying the local order of operations at each
processor.
Sequential Consistency Model(Implementation)
SC should be easier to implement than linearizability.
Global time ordering need not be preserved across
processes.
It is sufficient to use total order broadcasts for the
Write operations only.
In the simplified algorithm, no total order broadcast is
required for Read operations, because:
1. all consecutive operations by the same processor are
ordered in the same order because pipelining is not
used;
2. Read operations by different processors are
independent of each other and need to be ordered only
with respect to the Write operations in the execution.
Implementation of SC using Local Read Operation:
A Read operation completes atomically, whereas a Write
operation does not.
Between the invocation of a Write by Pi (line 1b) and its
acknowledgement (lines 2a, 2b), there may be multiple Write
operations initiated by other processors that take effect
at Pi (line 2a).
Thus, a Write issued locally has its completion locally
delayed.
Such an algorithm is acceptable for Read-intensive
programs.
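The behavior described above can be simulated in a few lines of Python. This is a single-threaded sketch under stated assumptions: a shared list stands in for the total order broadcast layer, delivery is triggered by hand, and the class and method names are invented for illustration. The step labels in the comments follow the line numbers quoted on the slide as far as they can be inferred.

```python
class TotalOrderBroadcast:
    """Toy stand-in: one shared log fixes a single delivery order."""

    def __init__(self):
        self.log = []

    def broadcast(self, sender, var, value):
        self.log.append((sender, var, value))


class LocalReadReplica:
    def __init__(self, pid, tob):
        self.pid = pid
        self.tob = tob
        self.memory = {}
        self.delivered = 0  # number of log entries applied locally
        self.acked = []     # locally issued Writes that have completed

    def read(self, var):
        # (1a) a Read is served from the local replica immediately
        return self.memory.get(var, 0)

    def write(self, var, value):
        # (1b) a Write is only broadcast here; it completes on delivery
        self.tob.broadcast(self.pid, var, value)

    def deliver_next(self):
        sender, var, value = self.tob.log[self.delivered]
        self.delivered += 1
        self.memory[var] = value             # (2a) apply in total order
        if sender == self.pid:
            self.acked.append((var, value))  # (2b) acknowledge own Write


tob = TotalOrderBroadcast()
p1, p2 = LocalReadReplica(1, tob), LocalReadReplica(2, tob)
p2.write("x", 7)
p1.write("x", 4)
p1.deliver_next()                # P2's Write takes effect at P1 first
print(p1.read("x"), p1.acked)    # 7 []
p1.deliver_next()
print(p1.read("x"), p1.acked)    # 4 [('x', 4)]
```

P1's own Write is acknowledged only after the earlier Write from P2 has been applied locally, which is the delayed completion the slide refers to.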
Implementation of SC using Local Write Operation:
For Write-intensive programs, it is desirable that a locally
issued Write gets acknowledged immediately (as in lines
2a–2c), even though the total order broadcast for the Write,
and the actual update for the Write, may not go into effect
by updating the variable at the same time (line 3a).
The algorithm achieves this at the cost of delaying a Read
operation by a processor until all previously issued local
Write operations by that same processor have locally gone
into effect (i.e., previous Writes issued locally have
updated the local variables being written to).
The variable counter is used to track the number of
Write operations that have been locally initiated but
not completed at any time.
A Read operation completes only if there are no prior
locally initiated Write operations that have not written
to their variables (line 1a), i.e., there are no pending
locally initiated Write operations to any variable.
Otherwise, a Read operation is delayed until after all
previously initiated Write operations have written to
their local variables (lines 3b–3d), which happens
after the total order broadcasts associated with the
Write have delivered the broadcast message locally.
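The role of the counter can be shown with a small single-threaded Python simulation. It is a sketch under assumptions: a plain shared list stands in for total order broadcast, delivery is triggered by hand, pending Reads are queued rather than blocking, and all names are invented here. The step labels in the comments follow the line numbers quoted on the slide as far as they can be inferred.

```python
class LocalWriteReplica:
    def __init__(self, pid, log):
        self.pid = pid
        self.log = log           # shared list standing in for total order broadcast
        self.memory = {}
        self.delivered = 0
        self.counter = 0         # locally issued Writes not yet applied locally
        self.pending_reads = []  # variables whose Reads are waiting
        self.read_results = []   # (var, value) for completed Reads

    def write(self, var, value):
        self.counter += 1                        # (2a)
        self.log.append((self.pid, var, value))  # (2b) broadcast
        return "ack"                             # (2c) immediate acknowledgement

    def read(self, var):
        if self.counter == 0:                    # (1a) no pending local Writes
            self.read_results.append((var, self.memory.get(var, 0)))
        else:
            self.pending_reads.append(var)       # delay the Read

    def deliver_next(self):
        sender, var, value = self.log[self.delivered]
        self.delivered += 1
        self.memory[var] = value                 # (3a) apply in total order
        if sender == self.pid:
            self.counter -= 1                    # (3b)-(3c)
            if self.counter == 0:                # (3d) release delayed Reads
                for v in self.pending_reads:
                    self.read_results.append((v, self.memory.get(v, 0)))
                self.pending_reads.clear()


log = []
p1 = LocalWriteReplica(1, log)
print(p1.write("x", 4))   # ack
p1.read("x")              # delayed: one local Write is still pending
print(p1.read_results)    # []
p1.deliver_next()
print(p1.read_results)    # [('x', 4)]
```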
Causal Consistency
Sequential consistency requires that Write operations
issued by different processors be seen in some common
order by all processors.
This can be relaxed to require only that Writes that
are causally related be seen in the same order by all
processors, whereas "concurrent" Writes may be
seen by different processors in different orders.
The resulting consistency model is the causal
consistency model.
The causality relation for shared memory systems
is defined as follows:
Local order: At a processor, the serial order of the
events defines the local causal order.
Inter-process order: A Write operation causally
precedes a Read operation issued by another
processor if the Read returns a value written by the
Write.
Transitive closure: The transitive closure of the
above two relations defines the (global) causal
order.
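The three rules can be turned into a short Python sketch that builds the causal order for a small execution. The function name, the operation labels, and the sample execution are hypothetical and chosen only for illustration.

```python
def causal_order(programs, reads_from):
    """Return the set of pairs (a, b) such that a causally precedes b.

    programs: dict mapping a process to its operations in program order.
    reads_from: list of (write_op, read_op) pairs, where the Read
    returned the value written by the Write.
    """
    edges = set(reads_from)                  # inter-process order
    for ops in programs.values():
        edges.update(zip(ops, ops[1:]))      # local order
    closure = set(edges)                     # transitive closure
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical execution: P2 reads P1's value and then writes; P3 writes independently.
programs = {"P1": ["w1_x2"], "P2": ["r2_x2", "w2_x7"], "P3": ["w3_x4"]}
order = causal_order(programs, [("w1_x2", "r2_x2")])
print(("w1_x2", "w2_x7") in order)  # True: causally related Writes
print(("w3_x4", "w2_x7") in order)  # False: concurrent Writes
```

Under causal consistency, every processor must see `w1_x2` before `w2_x7`, while `w3_x4` may be seen at any position relative to them.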
Causal Consistency (Example)
Figure (a) The execution is sequentially consistent (and
hence causally consistent). Both P3 and P4 see the
operations at P1 and P2 in sequential order and in causal
order.
Figure (b) The execution is not sequentially consistent but
it is causally consistent. Both P3 and P4 see the
operations at P1 and P2 in causal order because the lack
of a causality relation between the Writes by P1 and by P2
allows the values written by the two processors to be seen
in different orders in the system. The execution is not
sequentially consistent because there is no global order
satisfying the contradictory ordering requirements set by
the Reads by P3 and by P4.
Figure (c) The execution is not causally consistent
because the second Read by P4 returns 4 after P4 has
already returned 7 in an earlier Read.
PRAM (pipelined RAM) or processor consistency
Causal consistency requires all causally related
Writes to be seen in the same order by all
processors.
This may be too restrictive for some applications.
In PRAM consistency, only Write operations issued by the
same processor must be seen by others in the order they
were issued; Writes from different processors may be seen
by other processors in different orders.
All operations issued by any processor appear to
the other processors in a FIFO pipelined
sequence.
PRAM (pipelined RAM) or processor
consistency(Example)
In the previous Figure (c), the execution is PRAM
consistent (even though it is not causally
consistent) because (trivially) both P3 and P4 see
the updates made by P1 and P2 in FIFO order
along the channels P1 to P3 and P2 to P3, and
along P1 to P4 and P2 to P4, respectively.
PRAM (pipelined RAM) or processor
consistency(Implementation)
PRAM consistency can be implemented using
FIFO broadcast
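One way to picture the FIFO requirement is a replica that applies each sender's updates strictly in the order that sender issued them. The Python sketch below assumes that each update carries a per-sender sequence number starting at 0; that numbering scheme and all names are assumptions made here, not details from the source.

```python
class PramReplica:
    def __init__(self):
        self.memory = {}
        self.next_seq = {}  # sender -> next expected sequence number
        self.buffer = {}    # (sender, seq) -> (var, value), held until in order
        self.applied = []   # order in which updates took effect here

    def receive(self, sender, seq, var, value):
        self.buffer[(sender, seq)] = (var, value)
        expected = self.next_seq.get(sender, 0)
        # Apply as many consecutive updates from this sender as possible.
        while (sender, expected) in self.buffer:
            v, val = self.buffer.pop((sender, expected))
            self.memory[v] = val
            self.applied.append((sender, v, val))
            expected += 1
        self.next_seq[sender] = expected


replica = PramReplica()
replica.receive("P1", 1, "x", 7)  # arrives early, so it is buffered
replica.receive("P2", 0, "y", 3)  # other senders are not held back
replica.receive("P1", 0, "x", 2)  # releases both of P1's updates in order
print(replica.applied)
# [('P2', 'y', 3), ('P1', 'x', 2), ('P1', 'x', 7)]
```

Updates from P1 are never reordered with respect to each other, but nothing constrains how they interleave with updates from P2.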
Slow memory
It is the next weaker consistency model.
It represents a location-relative weakening of the
PRAM model.
Here, only Write operations issued by the same
processor to the same memory location must be
observed in the same order by all the processors.
Slow memory(Example)
Figure (a) The updates to each of the variables are
seen pipelined separately in a FIFO fashion. The “x”
pipeline from P1 to P2 is slower than the “y” pipeline
from P1 to P2. Thus, the overtaking effect is allowed.
However, PRAM consistency is violated because the
FIFO property is violated over the single common
“pipeline” from P1 to P2 – the update to y is seen by
P2 but the much older value of x = 0 is seen by P2
later.
Figure (b) Slow memory consistency is violated
because the FIFO property is violated for the pipeline
for variable x. “x = 7” is seen by P2 before it sees “x
=0” and “x =2” although 7 was written to x after the
Slow memory(Implementation)
Slow memory can be implemented using a broadcast
primitive in which the FIFO property needs to be
satisfied only for updates to the same variable.
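The weaker requirement can be sketched in Python as a replica that keeps a separate FIFO "pipeline" per (sender, variable) pair. The per-variable sequence numbers and all names below are assumptions made for illustration only.

```python
class SlowMemoryReplica:
    def __init__(self):
        self.memory = {}
        self.next_seq = {}  # (sender, var) -> next expected sequence number
        self.buffer = {}    # (sender, var, seq) -> value, held until in order
        self.applied = []   # order in which updates took effect here

    def receive(self, sender, var, seq, value):
        self.buffer[(sender, var, seq)] = value
        key = (sender, var)
        expected = self.next_seq.get(key, 0)
        # FIFO order is enforced only within one sender's updates to one variable.
        while (sender, var, expected) in self.buffer:
            val = self.buffer.pop((sender, var, expected))
            self.memory[var] = val
            self.applied.append((var, val))
            expected += 1
        self.next_seq[key] = expected


# P1 issued x = 2, then x = 7, then y = 9; the messages arrive out of order.
replica = SlowMemoryReplica()
replica.receive("P1", "y", 0, 9)  # y overtakes both updates to x: allowed
replica.receive("P1", "x", 1, 7)  # buffered until x's first update arrives
replica.receive("P1", "x", 0, 2)
print(replica.applied)  # [('y', 9), ('x', 2), ('x', 7)]
```

The update to y takes effect before the older updates to x, which PRAM consistency would forbid, yet the two updates to x still take effect in issue order.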
Other models based on synchronization instructions
The behavior of DSM differs based on the consistency
model.
The programmer's logic also depends on the
underlying consistency model.
In these models, consistency conditions apply only to
special "synchronization" instructions, e.g., barrier
synchronization.
Non-sync statements may be executed in any order
by the various processors.
Some of the other consistency models are:
Weak Consistency
Release Consistency
Entry Consistency
Other models based on synchronization instructions-
Weak Consistency Model
All Writes are propagated to other processes, and all
Writes done elsewhere are brought locally, at a sync
instruction.
Properties
Accesses to sync variables are sequentially consistent
Access to sync variable is not permitted unless all Writes
elsewhere have completed
No data access is allowed until all previous
synchronization variable accesses have been
performed
Drawback
The memory system cannot tell whether a process is
beginning access to shared variables (entering the CS) or
has finished access to shared variables (exiting the CS).
Other models based on synchronization instructions-
Release Consistency Model
Drawback of the Weak Consistency Model:
When a synchronization variable is accessed, the
memory does not know whether this is being done
because (1) the process has finished writing the shared
variables (exiting the CS) or (2) it is about to begin
reading them (entering the CS).
Release consistency distinguishes these two kinds of
synchronization access.
Acquire accesses tell the memory system that a critical
region is about to be entered. Hence, the actions for
case 2 above need to be performed to ensure that local
replicas of variables are made consistent with remote ones.
Release accesses say that a critical region has just been
exited.
The following rules apply to protected variables:
All previously initiated Acquire operations must
complete successfully before a process can access
a protected shared variable.
All accesses to a protected shared variable must
complete before a Release operation can be
performed.
The Acquire and Release operations effectively
follow the PRAM consistency model.
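The division of work between Acquire and Release can be sketched as follows. This is a deliberately simplified, single-threaded Python model: a plain dictionary stands in for the remote copies, mutual exclusion itself is not modeled, and every name is an assumption made here for illustration.

```python
class RcProcess:
    def __init__(self, home):
        self.home = home  # dict standing in for the remote copies
        self.local = {}   # local replicas of the shared variables
        self.dirty = {}   # local Writes not yet propagated

    def acquire(self):
        # Entering the critical region: make local replicas consistent
        # with the remote ones.
        self.local = dict(self.home)

    def write(self, var, value):
        self.local[var] = value
        self.dirty[var] = value

    def read(self, var):
        return self.local.get(var, 0)

    def release(self):
        # Exiting the critical region: propagate this process's Writes.
        self.home.update(self.dirty)
        self.dirty.clear()


home = {}
p1, p2 = RcProcess(home), RcProcess(home)
p1.acquire()
p1.write("x", 5)
p1.release()
print(p2.read("x"))  # 0: P2 has not performed an Acquire yet
p2.acquire()
print(p2.read("x"))  # 5
```

Writes made inside the critical region become visible elsewhere only after the writer's Release and the reader's next Acquire, which is what lets the memory system defer propagation.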
Other models based on synchronization
instructions- Entry Consistency Model
Each ordinary shared variable is associated with
a synchronization variable (e.g., lock, barrier)
On an Acquire or Release of a synchronization
variable, consistency actions are performed only for
the ordinary variables guarded by that
synchronization variable.
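The association between synchronization variables and the data they guard can be sketched in Python as a map from lock names to variable names. This is a simplified, single-threaded illustration: the `guards` map, the class name, and the dictionary standing in for remote copies are all assumptions made here, and actual locking is not modeled.

```python
class EcProcess:
    def __init__(self, home, guards):
        self.home = home      # dict standing in for the remote copies
        self.guards = guards  # lock name -> set of guarded variable names
        self.local = {}

    def acquire(self, lock):
        # Fetch only the variables guarded by this synchronization variable.
        for var in self.guards[lock]:
            if var in self.home:
                self.local[var] = self.home[var]

    def release(self, lock):
        # Propagate only the variables guarded by this synchronization variable.
        for var in self.guards[lock]:
            if var in self.local:
                self.home[var] = self.local[var]


home = {"x": 1, "y": 2}
guards = {"lock_x": {"x"}, "lock_y": {"y"}}
p = EcProcess(home, guards)
p.acquire("lock_x")
print(p.local)  # {'x': 1}: y is not fetched
p.local["x"] = 10
p.release("lock_x")
print(home)     # {'x': 10, 'y': 2}
```

Compared with release consistency, each Acquire or Release touches fewer variables, at the cost of requiring the programmer to declare which variables each synchronization variable guards.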