SlideShare a Scribd company logo
Replication & Consistency
EECE 411: Design of Distributed Software Applications
Logistics
 Project
 P01
 Non-blocking IO lecture: Nov 4th.
 P02 deadline on NEXT Wednesday (November 17th)
 Next quiz
 Tuesday: November 16th
EECE 411: Design of Distributed Software Applications
Problem Technique Advantage
Partitioning  Consistent hashing Incremental scalability
High availability for
writes
 Eventual consistency
 Quorum protocol
 Vector clocks with
reconciliation during reads
High availability
Handling temporary
failures
 ‘Sloppy’ quorum
protocol and hinted
handoff
Provides high availability
and durability guarantee
when some of the replicas
are not available.
Recovering from
permanent failures
 Anti-entropy using
Merkle trees
Synchronizes divergent
replicas in the background.
Membership and
failure detection
 Gossip-based
membership protocol
Preserves symmetry and
avoids having a centralized
registry for storing
membership and node
liveness information.
Last time:
Amazon’s Dynamo service
EECE 411: Design of Distributed Software Applications
Replication: Creating and using multiple copies of
data (or services)
Why replicate?
 Improve system reliability
 Prevent data loss
 i.e. provide data durability
 Increase data availability
 Note: availability ≠ durability
 Increase confidence: e.g. deal with byzantine failures
 Improve performance
 Scaling throughput
 Reduce access times
EECE 411: Design of Distributed Software Applications
Issue 1. Dealing with data changes
 Consistency models
 What is the semantic the system implements
 (luckily) applications do not always require strict consistency
 Consistency protocols
 How to implement the semantic agreed upon?
Issue 2. Replica management
 How many replicas?
 Where to place them?
 When to get rid of them?
Issue 3. Redirection/Routing
 Which replica should clients use?
What are the issues?
EECE 411: Design of Distributed Software Applications
Performance and Scalability
To keep replicas consistent, we generally need to ensure that
all conflicting operations are done in the same order
everywhere
Conflicting operations: From the world of transactions:
 Read–write conflict: a read operation and a write
operation act concurrently
 Write–write conflict: two concurrent write operations
Problem: Guaranteeing global ordering on conflicting
operations may be costly, reducing scalability
Solution: Weaken consistency requirements so that hopefully
global synchronization can be avoided
Scalability TENSION Management overheads
EECE 411: Design of Distributed Software Applications
Client‘s view of the data-store
Ideally: black box -- complete ‘transparency’ over how data
is stored and managed
EECE 411: Design of Distributed Software Applications
Management’ system view on data store
Management system: Controls the allocated resources and
aims to provide transparency
EECE 411: Design of Distributed Software Applications
Consistency model
Management system: Controls the allocated resources and
aims to provide transparency
Consistency model: Contract between the data store and
the clients: The data store specifies the results of read and
write operations in the presence of concurrent operations.
EECE 411: Design of Distributed Software Applications
Roadmap for next few classes
 Consistency models:
 contracts between the data store and the clients
that specify the results of read and write
operations are in the presence of concurrency.
 Protocols
 To manage update propagation
 To manage replicas:
 creation, placement, deletion
 To assign client requests to replicas
EECE 411: Design of Distributed Software Applications
Consistency models
 Data centric: Assume a global, data-store view
(i.e., across all clients)
 Continuous consistency
 Limit the deviation between replicas
 Models based on uniform ordering of operations
 Constraints on operation ordering at the data-store level
 Client centric
 Assume client-independent
views of the datastore
 Constraints on operation
ordering for each client
independently
EECE 411: Design of Distributed Software Applications
Notations
For operations on a Data Store:
 Read: Ri(x) a -- client i reads a from location x
 Write: Wi(x) b -- client i writes b at location x
P1: W(x)1
P2: R(x)NIL R(x)1
EECE 411: Design of Distributed Software Applications
Sequential Consistency
The result of any execution is the same as if :
 operations by all processes on the data store were
executed in some sequential order, and
 the operations of each individual process appear in this
sequence in the order specified by its program.
Note: we talk about interleaved execution – there is some
total ordering for all operations taken together
EECE 411: Design of Distributed Software Applications
Sequential Consistency (example)
Process1: Process 2 Process 3
X  1 Y  1 Z  1
print (Y, Z) print (X, Z) print (X, Y)
Is output: 00 00 00 permitted?
What about: 11 11 11?
What about: 00 10 10?
EECE 411: Design of Distributed Software Applications
Causal Consistency (1)
Writes that are potentially causally related must be seen
by all processes in the same order. Concurrent writes
may be seen in a different order by different
processes.
Causally related relationship:
 A read is causally related to the write that provided the data the
read got.
 A write is causally related to a read that happened before this
write in the same process.
 If write1  read, and read  write2, then write1  write2.
Concurrent <=> not causally related
EECE 411: Design of Distributed Software Applications
Causal Consistency (Example)
 This sequence is allowed with a causally-consistent
store, but not with sequentially or strictly consistent
store.
 Note: W1(x)a  W2(x)b, but not W2 (x)b  W1(x)c
EECE 411: Design of Distributed Software Applications
Causal Consistency: (More Examples)
A correct sequence of events in a causally-consistent store.
A violation of a causally-consistent store.
EECE 411: Design of Distributed Software Applications
Increasing granularity: Grouping Operations
Basic idea: You don’t care that reads and writes of a
series of operations are immediately known to other
processes. You just want the effect of the series itself
to be known.
One Solution:
 Accesses to synchronization variables are
sequentially consistent.
 No access to a synchronization variable is allowed to
be performed until all previous writes have completed
everywhere.
 No read data access is allowed to be performed until
all previous accesses to synchronization variables
have been performed.
EECE 411: Design of Distributed Software Applications
Consistency models
 Data centric: Assume a global, data-store view
(i.e., across all clients)
 Continuous consistency
 Limit the deviation between replicas
 Models based on uniform ordering of operations
 Constraints on operation ordering at the data-store level
 Client centric
 Assume client-independent
views of the datastore
 Constraints on operation
ordering for each client
independently
EECE 411: Design of Distributed Software Applications
Continuous Consistency
Observation: We can actually talk a about a degree of
consistency:
 replicas may differ in their numerical value
 replicas may differ in their relative staleness
 Replicas may differ with respect to (number and order) of
performed update operations
Conit: consitency unit  specifies the data unit over
which consistency is to be enforced.
EECE 411: Design of Distributed Software Applications
Example
Conit: contains the variables x and y:
 Each replica maintains a vector clock
 B sends A operation [<5,B>: x := x + 2]; A has made
this operation permanent (cannot be rolled back)
 A has three pending operations  order deviation = 3
EECE 411: Design of Distributed Software Applications
 Data centric: solutions at the data store level
 Continuous consistency
 limit the deviation between replicas, or
 Models based on uniform ordering of operations
 Constraints on operation ordering at the data-store level
 Client centric
 Assume client-independent views of the datastore
 Constraints on operation ordering for each client
independently
Consistency models: contracts between the data store
and the clients
EECE 411: Design of Distributed Software Applications
Eventual Consistency
Idea: If no updates take place for a long enough period
time, all replicas will gradually (i.e., eventually) become
consistent.
When? Situations where eventual consistency models may
make sense
 Mostly read-only workloads
 No concurrent updates
 Advantages/Drawbacks
EECE 411: Design of Distributed Software Applications
Client-centric Consistency Models
 System model
 Monotonic reads
 Monotonic writes
 Read-your-writes
 Write-follows-reads
Goal: Avoid system-wide consistency, by concentrating on
what each client independently wants, (instead of what
should be maintained by servers with a global view)
EECE 411: Design of Distributed Software Applications
Example: Consistency for Mobile Users
Example: Distributed database to which a user has access
through her notebook.
 Notebook acts as a front end to the database.
 At location A user accesses the database with reads/updates
 At location B user continues work, but unless it accesses the
same server as the one at location A, she may detect
inconsistencies:
 updates at A may not have yet been propagated to B
 user may be reading newer entries than the ones available at A:
 user updates at B may eventually conflict with those at A
Note: The only thing the user really needs is that the entries
updated and/or read at A, are available at B the way she left
them in A.
 Idea: the database will appear to be consistent to the user
EECE 411: Design of Distributed Software Applications
Example - Basic Architecture
EECE 411: Design of Distributed Software Applications
Client-centric Consistency
Idea: Guarantees some degree of data access consistency for
a single client/process point of view.
Notations:
 Xi[t]  Version of data item x at time t at local replica Li
 WS(xi [t])  working set (all write operations) at Li
up to time t
 WS(xi [t]; xj [t])  indicates that it is known that
WS(xi [t]) is included in WS(xj [t])
EECE 411: Design of Distributed Software Applications
Monotonic-Read Consistency
Intuition: Client “sees” the same or newer version of
data.
Definition: If a process reads the value of a data item
x, any successive read operation on x by that process
will always return that same or a more recent value.
EECE 411: Design of Distributed Software Applications
Monotonic reads – Examples
 Automatically reading your personal calendar updates
from different servers.
 Monotonic Reads guarantees that the user sees always more
recent updates, no matter from which server the reading
takes place.
 Reading (not modifying) incoming e-mail while you
are on the move.
 Each time you connect to a different e-mail server, that
server fetches (at least) all the updates from the server you
previously visited.
EECE 411: Design of Distributed Software Applications
Monotonic-Write Consistency
Intuition: A write happens on a replica only if it’s
brought up to date with preceding write operations
on same data (but possibly at different replicas)
Definition: A write operation by a process on a data
item x is completed before any successive write
operation on x by the same process.
WS
EECE 411: Design of Distributed Software Applications
Monotonic writes – Examples
 Updating a program at server S2, and ensuring that
all components on which compilation and linking
depends, are also placed at S2.
 Maintaining versions of replicated files in the correct
order everywhere (propagate the previous version to
the server where the newest version is installed).
EECE 411: Design of Distributed Software Applications
Read-Your-Writes Consistency
Intuition: All previous writes are always completed
before any successive read
Definition: The effect of a write operation by a
process on data item x, will always be seen by a
successive read operation on x by the same process.
EECE 411: Design of Distributed Software Applications
Read-Your-Writes - Examples
 Updating your Web page and guaranteeing that your
Web browser shows the newest version instead of its
cached copy.
 Password database
EECE 411: Design of Distributed Software Applications
Writes-Follow-Reads Consistency
Intuition: Any successive write operation on x will be
performed on a copy of x that is same or more recent
than the last read.
Definition: A write operation by a process on a data item
x following a previous read operation on x by the same
process, is guaranteed to take place on the same or a
more recent value of x that was read.
 Examples: news groups
EECE 411: Design of Distributed Software Applications
Writes-Follow-Reads - Examples
 See reactions to posted articles only if you have the
original posting
 a read “pulls in” the corresponding write operation.
EECE 411: Design of Distributed Software Applications
Rodamap
Updating replicas
 Consistency models
 What is the semantic the system provides in case of updates
 (luckily) applications do not always require strict consistency
 Consistency protocols:
 How is this semantic implemented
Replica and content management
 How many replicas?
 Where to place them?
 When to get rid of them?
Redirection/Routing
 Which replica should clients use?
EECE 411: Design of Distributed Software Applications
Reminder: System view
Management system: Controls the allocated resources and
aims to provide replication transparency
Consistency model: contract between the data store and
the clients
EECE 411: Design of Distributed Software Applications
Next: implementation techniques
Updating replicas
 Consistency models (how to deal with updated data)
 (luckily) applications do not always require strict consistency
 Consistency protocols: Implementation issues
 How is a consistency model implemented
Replica and content management
 How many replicas?
 Where to place them?
 When to get rid of them?
Redirection/Routing
 Which replica should clients use?
EECE 411: Design of Distributed Software Applications
Reminder: Types of consistency models
 Consistency model: contract between the data store
and the clients
 the data store specifies precisely what the results of read and
write operations are in the presence of concurrency.
 Data centric: solutions at the data store level
 Continuous consistency: limit the deviation between replicas, or
 Constraints on operation ordering at the data-store level
 Client centric
 Constraints on operation ordering
for each client independently
EECE 411: Design of Distributed Software Applications
Today: Consistency protocols
Question: How does one design the protocols to
implement the desired consistency model?
 Data centric
 Constraints on operation ordering at the data-store level
 Sequential consistency.
 Continuous consistency: limit the deviation between replicas
 Client centric
 Constraints on operation ordering
for each client independently
EECE 411: Design of Distributed Software Applications
Reminder: Monotonic-Read Consistency
Definition: If a process reads the value of a data item x,
any successive read operation on x by that process will
always return that same or a more recent value.
 Intuition: Client “sees” the same or a newer version of
the data.
EECE 411: Design of Distributed Software Applications
Implementation of monotonic-read consistency
Sketch:
 Globally unique identifier for each write operation
 Quiz-like question: explain how exactly the globally
unique identifier is (1) generated, and (2) used.
 Each client keeps track of:
 Write IDs relevant to the read operations performed so
far on various objects (ReadSet)
 When a client launches a read
 Client sends the ReadSet to the replica server
 The server checks that all updates in the ReadSet
have been performed. [If necessary] Fetches
unperformed updates
 Returns the value
EECE 411: Design of Distributed Software Applications
More quiz-like questions
Sketch a Design for the consistency model
 Monotonic-writes
 Read-your-writes
 Writes-follow-reads
Write pseudo-code
EECE 411: Design of Distributed Software Applications
Today: Consistency protocols
Question: How does one design the protocols to
implement the desired consistency model?
 Data centric
 Constraints on operation ordering at the data-store level
 Sequential consistency, causal consistency, etc
 Continuous consistency: limit the deviation between replicas
 Client centric
 Constraints on operation ordering
for each client independently
EECE 411: Design of Distributed Software Applications
Overview: Data-centric
Consistency Protocols
EECE 411: Design of Distributed Software Applications
Overview: Data-centric
Consistency Protocols
EECE 411: Design of Distributed Software Applications
Usage: distributed databases and file systems that require a
high degree of fault tolerance. Replicas often placed on
same LAN.
Issues: blocking vs. non-blocking
Primary-Based Protocols
EECE 411: Design of Distributed Software Applications
Primary-backup protocol with local writes
 The primary migrates to the node that executes the write
 Usage: Mobile computing in disconnected mode
 Idea: ship all relevant files to user before disconnecting, and update
later.
EECE 411: Design of Distributed Software Applications
Overview: Data-centric
Consistency Protocols
EECE 411: Design of Distributed Software Applications
Active Replication (1/2)
 Model: Remote operations/updates (and not the
result of state changes) are replicated
 Control replication (and not data replication)
 Requires total update ordering (either through a
sequencer or using Lamport’s protocol).
 Example: Replicated distributed objects
 Problems
 Operations need to
be sequenced
 Dealing with workflows
EECE 411: Design of Distributed Software Applications
Replicated-Write: Active Replication (2/2)
 Dealing with workflows in a general setting
EECE 411: Design of Distributed Software Applications
Overview: Data-centric
Consistency Protocols
EECE 411: Design of Distributed Software Applications
Replicated writes: Quorums
Problem: some replicas may not be available all the time.
 leads to low availability (for writes)
Solution: Quorum-based protocols: Ensure that each
operation is carried on enough replicas (not on all)
 a majority ‘vote’ is established before each operation
 distinguish a read quorum and a write quorum:
EECE 411: Design of Distributed Software Applications
Valid quorums
Must satisfy two basic rules
 A read quorum should “intersect” any prior write quorum at >= 1
processes
 A write quorum should intersect any other write quorum
So, in a group of size N:
 Qr + Qw > N, and
 Qw + Qw > N results in Q>w > N/2
EECE 411: Design of Distributed Software Applications
Quorums: Mechanics for read and write
Setup: Replicas of data items have “versions”,
 Versions are numbered (timestamped)
 Timestamps must increase monotonically and include a
process id to break ties
Reads are easy:
 Send RPCs until Qr processes reply
 Then use the replica with the largest timestamp (the
most recently updated replica)
EECE 411: Design of Distributed Software Applications
Quorums: Mechanics for read and write (cont)
Writes are trickier:
 Need to determine version number
 Need to make sure all replicas are updated atomically
 Consequence: need a ‘commit’ protocol to protect from
concurrent writes
EECE 411: Design of Distributed Software Applications
Quorum based protocols: Write algorithm
EECE 411: Design of Distributed Software Applications
Consistency protocols for continuous consistency
Question: How does one design the protocols to
implement the desired consistency model?
 Data centric
 Constraints on operation ordering at the data-store level
 Sequential consistency.
 Continuous consistency: limit the deviation between
replicas
 Client centric
 Constraints on operation ordering
for each client independently
EECE 411: Design of Distributed Software Applications
Continuous Consistency
EECE 411: Design of Distributed Software Applications
Continuous Consistency (reminder)
Observation: We can talk a about a degree of
consistency:
 replicas may differ in their numerical value
 replicas may differ in their relative staleness
 There may differences with respect to (number and
order) of performed update operations
Conit: consistency unit  specifies the data unit over
which consistency is to be measured.
EECE 411: Design of Distributed Software Applications
Continuous Consistency:
Bounding Numerical Errors (I)
Principle: consider a conit (data item) x and let weight(W(x))
denote the numerical change in its value after a write
operation W.
 [for simplicity] Assume that for all W(x), weight(W) > 0
 W is initially forwarded to one of the N replicas: the origin(W).
 TW[i, j] are the writes executed by server Si that originated from Sj
TW[i, j]=Σ {weight(W)| origin(W)=Sj & W in log(Si)}
Note: Actual value of x: vi : value of x at replica i
EECE 411: Design of Distributed Software Applications
Continuous Consistency: Bounding Numerical Errors (II)
Reminder: TW[i, j] writes executed by server Si originated at Sj
Problem: We need to ensure that v(t) − vi < di for every Si.
Approach: Let every server Sk maintain a view TWk[i, j] for
what Sk believes is the value of TW[i, j].
 This information can be piggybacked when an update is propagated.
Note: 0 ≤ TWk[i, j] ≤ TW[i, j] ≤ TW[j, j]
Solution: Sk sends operations from its log to Si when it sees that
TWk[i, k] is getting too far from TW[k, k],
 in particular, when TW[k, k] − TWk[i, k] > di/(N − 1).
Note: Staleness can be done analogously, by essentially keeping track of what
has been seen last from Sk
EECE 411: Design of Distributed Software Applications
Issue 1. Dealing with data changes
 Consistency models
 What is the semantic the system implements
 (luckily) applications do not always require strict consistency
 Consistency protocols
 How to implement the semantic agreed upon?
Issue 2. Replica management
 How many replicas?
 Where to place them?
 When to get rid of them?
Issue 3. Redirection/Routing
 Which replica should clients use?
Rodamap
EECE 411: Design of Distributed Software Applications
Quorums
Servers
Clients
State:
…
State:
…
State:
…
[Barbara Liskov]
EECE 411: Design of Distributed Software Applications
Quorums
Servers
Clients
State:
…
State:
…
State:
…
A A
X
[Barbara Liskov]
EECE 411: Design of Distributed Software Applications
Quorums
Servers
Clients
State:
…
State:
…
A
State:
…
A
X
[Barbara Liskov]
EECE 411: Design of Distributed Software Applications
C-A-P
choose two
C
A P
Fox&Brewer
“CAP Theorem”
consistency
Availability Partition-resilience
Claim: every distributed
system is on one side of
the triangle.
CA: available, and
consistent, unless
there is a partition.
AP: a reachable replica
provides service even in
a partition, but may be
inconsistent.
CP: always consistent, even
in a partition, but a reachable
replica may deny service
without agreement of the
others (e.g., quorum).
EECE 411: Design of Distributed Software Applications
Issue 1. Dealing with data changes
 Consistency models
 What is the semantic the system implements
 (luckily) applications do not always require strict consistency
 Consistency protocols
 How to implement the semantic agreed upon?
Issue 2. Replica management
 How many replicas?
 Where to place them?
 When to get rid of them?
Issue 3. Redirection/Routing
 Which replica should clients use?
Rodamap
EECE 411: Design of Distributed Software Applications
Reminder: System view
Management system: Controls the allocated resources and
aims to provide replication transparency
Consistency model: contract between the data store and
the clients
EECE 411: Design of Distributed Software Applications
 Consistency model to have in mind for next
slides: eventual consistency
 Example applications: web content
distribution – HTML, videos, etc
EECE 411: Design of Distributed Software Applications
Replica server placement
Problem: Figure out what the best K places are out of N
possible locations.
 Select best location out of N – j, j:0..K-1 for which the
average distance to clients is minimal. Then choose the
next best server.
 (Note: The first chosen location minimizes the average distance to
all clients.)
 Computationally expensive.
 Select the k largest autonomous system and place a
server at the best-connected host.
 Computationally expensive.
 Position nodes in a d-dimensional geometric space, where
distance reflects latency. Identify the K regions with
highest density and place a server in every one.
 Computationally cheap.
EECE 411: Design of Distributed Software Applications
Content replication and placement
Model: We consider data objects
(don't worry whether they contain data or code, or both)
Distinguish different processes: A process is capable
of hosting a replica of an object or data:
 Permanent replicas: Process/machine always having a
replica
 Server-initiated replica: Process that can dynamically
host a replica on request of another server in the data
store
 Client-initiated replica: Process that can dynamically
host a replica on request of a client (client caches)
EECE 411: Design of Distributed Software Applications
Content Replication: Server-Initiated Replicas
 Keep track of access counts per file, aggregated by
considering server closest to requesting clients
 Number of accesses drops below threshold D  drop file
 Number of accesses exceeds threshold R  replicate file
 Number of access between D and R  migrate file
EECE 411: Design of Distributed Software Applications
Client initiated replicas
Issue: What do I propagate when content is dynamic?
State vs. operations
 Propagate only notification/invalidation of update
(often used for caches)
 Transfer data from one copy to another (often for
distributed databases)
 Propagate the update operation to other copies
(active replication)
Note: No single approach is the best, but depends
highly on (1) available bandwidth and read-to-write
ratio at replicas; (2) consistency model
EECE 411: Design of Distributed Software Applications
Propagating updates (1/3)
Issue Push-based Pull-based
State of server List of client replicas and caches None
Messages sent
Update message (and possibly fetch
update later)
Poll and update
Response time at
client
Immediate (or fetch-update time) Fetch-update time
 Pushing updates: server-initiated approach, in which
the update is propagated regardless whether target
asked for it.
 Pulling updates: client-initiated approach, in which
client requests to be updated.
EECE 411: Design of Distributed Software Applications
Propagating updates: Leases (2/3)
Observation: We can dynamically switch between pull and
push using leases:
 Lease: A contract in which the server promises to push updates to
the client until the lease expires.
Technique: Make lease expiration time dependent on system's
behavior (adaptive leases):
 Age-based leases: An object that hasn't changed for a
long time, will not change in the near future, so provide a
long-lasting lease
 Renewal-frequency based leases: The more often a
client requests a specific object, the longer the expiration
time for that client (for that object) will be
 State-based leases: The more loaded a server is, the
shorter the expiration times become
EECE 411: Design of Distributed Software Applications
Propagating updates: Epidemic protocols (3/3)
 Problem so far: server scalability
 Server needs to provide updates to all system participants
 What if Internet-scale system (P2P)?
 Epidemic protocols:
 Nodes periodically pair up and exchange state (or updates)
 Push and/or pull exchanges
 Issues
 Generated traffic – mapping on physical topology
 When is an update distributed to everyone?
 Probabilistic guarantees
 Freeriding
 Use: server-less environments, volatile nodes
 mobile networks, P2P systems
EECE 411: Design of Distributed Software Applications
Issue 1. Dealing with data changes
 Consistency models
 What is the semantic the system implements
 (luckily) applications do not always require strict consistency
 Consistency protocols
 How to implement the semantic agreed upon?
Issue 2. Replica management
 How many replicas?
 Where to place them?
 When to get rid of them?
Issue 3. Redirection/Routing
 Which replica should clients use?
Summary

More Related Content

Similar to documents.pub_replication-consistency.ppt

Coupling based structural metrics for measuring the quality of a software (sy...
Coupling based structural metrics for measuring the quality of a software (sy...Coupling based structural metrics for measuring the quality of a software (sy...
Coupling based structural metrics for measuring the quality of a software (sy...
Mumbai Academisc
 
Distributed Software Engineering with Client-Server Computing
Distributed Software Engineering with Client-Server ComputingDistributed Software Engineering with Client-Server Computing
Distributed Software Engineering with Client-Server Computing
Haseeb Rehman
 
5.7 Parallel Processing - Reactive Programming.pdf.pptx
5.7 Parallel Processing - Reactive Programming.pdf.pptx5.7 Parallel Processing - Reactive Programming.pdf.pptx
5.7 Parallel Processing - Reactive Programming.pdf.pptx
MohamedBilal73
 
Parallel and Distributed Computing chapter 1
Parallel and Distributed Computing chapter 1Parallel and Distributed Computing chapter 1
Parallel and Distributed Computing chapter 1
AbdullahMunir32
 
Machine Learning and Deep Software Variability
Machine Learning and Deep Software VariabilityMachine Learning and Deep Software Variability
Machine Learning and Deep Software Variability
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Towards high performance computing(hpc) through parallel programming paradigm...
Towards high performance computing(hpc) through parallel programming paradigm...Towards high performance computing(hpc) through parallel programming paradigm...
Towards high performance computing(hpc) through parallel programming paradigm...
ijpla
 
Introduction to Java Enterprise Edition
Introduction to Java Enterprise EditionIntroduction to Java Enterprise Edition
Introduction to Java Enterprise Edition
Abdalla Mahmoud
 
advanced computer architesture-conditions of parallelism
advanced computer architesture-conditions of parallelismadvanced computer architesture-conditions of parallelism
advanced computer architesture-conditions of parallelism
Pankaj Kumar Jain
 
Reactive: Programming -> Systems -> Architecture
Reactive: Programming -> Systems -> ArchitectureReactive: Programming -> Systems -> Architecture
Reactive: Programming -> Systems -> Architecture
Aleksey Izmailov
 
Embedded system
Embedded systemEmbedded system
Embedded system
mangal das
 
Embedded 100912065920-phpapp02
Embedded 100912065920-phpapp02Embedded 100912065920-phpapp02
Embedded 100912065920-phpapp02
Atv Reddy
 
Chapter Introductionn to distributed system .pptx
Chapter Introductionn to distributed system .pptxChapter Introductionn to distributed system .pptx
Chapter Introductionn to distributed system .pptx
Tekle12
 
Orleans: Cloud Computing for Everyone - SOCC 2011
Orleans: Cloud Computing for Everyone - SOCC 2011Orleans: Cloud Computing for Everyone - SOCC 2011
Orleans: Cloud Computing for Everyone - SOCC 2011
Jorgen Thelin
 
Cs556 section1
Cs556 section1Cs556 section1
Cs556 section1
farshad33
 
Hpc 6 7
Hpc 6 7Hpc 6 7
Hpc 6 7
Yasir Khan
 
System Structure for Dependable Software Systems
System Structure for Dependable Software SystemsSystem Structure for Dependable Software Systems
System Structure for Dependable Software Systems
Vincenzo De Florio
 
Lecture 04 - Loose Coupling
Lecture 04 - Loose CouplingLecture 04 - Loose Coupling
Lecture 04 - Loose Coupling
phanleson
 
Chapter 6-Consistency and Replication.ppt
Chapter 6-Consistency and Replication.pptChapter 6-Consistency and Replication.ppt
Chapter 6-Consistency and Replication.ppt
sirajmohammed35
 
Consistency of data replication
Consistency of data replicationConsistency of data replication
Consistency of data replication
ijitjournal
 
How to write shared libraries!
How to write shared libraries!How to write shared libraries!
How to write shared libraries!
Stanley Ho
 

Similar to documents.pub_replication-consistency.ppt (20)

Coupling based structural metrics for measuring the quality of a software (sy...
Coupling based structural metrics for measuring the quality of a software (sy...Coupling based structural metrics for measuring the quality of a software (sy...
Coupling based structural metrics for measuring the quality of a software (sy...
 
Distributed Software Engineering with Client-Server Computing
Distributed Software Engineering with Client-Server ComputingDistributed Software Engineering with Client-Server Computing
Distributed Software Engineering with Client-Server Computing
 
5.7 Parallel Processing - Reactive Programming.pdf.pptx
5.7 Parallel Processing - Reactive Programming.pdf.pptx5.7 Parallel Processing - Reactive Programming.pdf.pptx
5.7 Parallel Processing - Reactive Programming.pdf.pptx
 
Parallel and Distributed Computing chapter 1
Parallel and Distributed Computing chapter 1Parallel and Distributed Computing chapter 1
Parallel and Distributed Computing chapter 1
 
Machine Learning and Deep Software Variability
Machine Learning and Deep Software VariabilityMachine Learning and Deep Software Variability
Machine Learning and Deep Software Variability
 
Towards high performance computing(hpc) through parallel programming paradigm...
Towards high performance computing(hpc) through parallel programming paradigm...Towards high performance computing(hpc) through parallel programming paradigm...
Towards high performance computing(hpc) through parallel programming paradigm...
 
Introduction to Java Enterprise Edition
Introduction to Java Enterprise EditionIntroduction to Java Enterprise Edition
Introduction to Java Enterprise Edition
 
advanced computer architesture-conditions of parallelism
advanced computer architesture-conditions of parallelismadvanced computer architesture-conditions of parallelism
advanced computer architesture-conditions of parallelism
 
Reactive: Programming -> Systems -> Architecture
Reactive: Programming -> Systems -> ArchitectureReactive: Programming -> Systems -> Architecture
Reactive: Programming -> Systems -> Architecture
 
Embedded system
Embedded systemEmbedded system
Embedded system
 
Embedded 100912065920-phpapp02
Embedded 100912065920-phpapp02Embedded 100912065920-phpapp02
Embedded 100912065920-phpapp02
 
Chapter Introductionn to distributed system .pptx
Chapter Introductionn to distributed system .pptxChapter Introductionn to distributed system .pptx
Chapter Introductionn to distributed system .pptx
 
Orleans: Cloud Computing for Everyone - SOCC 2011
Orleans: Cloud Computing for Everyone - SOCC 2011Orleans: Cloud Computing for Everyone - SOCC 2011
Orleans: Cloud Computing for Everyone - SOCC 2011
 
Cs556 section1
Cs556 section1Cs556 section1
Cs556 section1
 
Hpc 6 7
Hpc 6 7Hpc 6 7
Hpc 6 7
 
System Structure for Dependable Software Systems
System Structure for Dependable Software SystemsSystem Structure for Dependable Software Systems
System Structure for Dependable Software Systems
 
Lecture 04 - Loose Coupling
Lecture 04 - Loose CouplingLecture 04 - Loose Coupling
Lecture 04 - Loose Coupling
 
Chapter 6-Consistency and Replication.ppt
Chapter 6-Consistency and Replication.pptChapter 6-Consistency and Replication.ppt
Chapter 6-Consistency and Replication.ppt
 
Consistency of data replication
Consistency of data replicationConsistency of data replication
Consistency of data replication
 
How to write shared libraries!
How to write shared libraries!How to write shared libraries!
How to write shared libraries!
 

Recently uploaded

Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
revolutionary575
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata
 
Willis Tower //Sears Tower- Supertall Building .pdf
Willis Tower //Sears Tower- Supertall Building .pdfWillis Tower //Sears Tower- Supertall Building .pdf
Willis Tower //Sears Tower- Supertall Building .pdf
LINAT
 
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
erynsouthern
 
Welcome back to Instagram. Sign in to check out what your
Welcome back to Instagram. Sign in to check out what yourWelcome back to Instagram. Sign in to check out what your
Welcome back to Instagram. Sign in to check out what your
Virni Arrora
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
huseindihon
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
Kanchana Weerasinghe
 
Nipissing University degree offer Nipissing diploma Transcript
Nipissing University degree offer Nipissing diploma TranscriptNipissing University degree offer Nipissing diploma Transcript
Nipissing University degree offer Nipissing diploma Transcript
zyqedad
 
Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...
chetankumar9855
 
Oracle Database Desupported Features on 23ai (Part A)
Oracle Database Desupported Features on 23ai (Part A)Oracle Database Desupported Features on 23ai (Part A)
Oracle Database Desupported Features on 23ai (Part A)
Alireza Kamrani
 
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
sharonblush
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
dizzycaye
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
huseindihon
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
kihus38
 
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
vrvipin164
 
Harendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting PortfolioHarendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting Portfolio
harendmgr
 
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECTMUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
GaneshGanesh399816
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
45unexpected
 
Research proposal seminar ,Research Methodology
Research proposal seminar ,Research MethodologyResearch proposal seminar ,Research Methodology
Research proposal seminar ,Research Methodology
doctorzlife786
 
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
janvikumar4133
 

Recently uploaded (20)

Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
 
Willis Tower //Sears Tower- Supertall Building .pdf
Willis Tower //Sears Tower- Supertall Building .pdfWillis Tower //Sears Tower- Supertall Building .pdf
Willis Tower //Sears Tower- Supertall Building .pdf
 
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
 
Welcome back to Instagram. Sign in to check out what your
Welcome back to Instagram. Sign in to check out what yourWelcome back to Instagram. Sign in to check out what your
Welcome back to Instagram. Sign in to check out what your
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
 
Nipissing University degree offer Nipissing diploma Transcript
Nipissing University degree offer Nipissing diploma TranscriptNipissing University degree offer Nipissing diploma Transcript
Nipissing University degree offer Nipissing diploma Transcript
 
Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...Amul goes international: Desi dairy giant to launch fresh ...
Amul goes international: Desi dairy giant to launch fresh ...
 
Oracle Database Desupported Features on 23ai (Part A)
Oracle Database Desupported Features on 23ai (Part A)Oracle Database Desupported Features on 23ai (Part A)
Oracle Database Desupported Features on 23ai (Part A)
 
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
 
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
 
Harendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting PortfolioHarendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting Portfolio
 
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECTMUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
MUMBAI MONTHLY RAINFALL CAPSTONE PROJECT
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
 
Research proposal seminar ,Research Methodology
Research proposal seminar ,Research MethodologyResearch proposal seminar ,Research Methodology
Research proposal seminar ,Research Methodology
 
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
 

documents.pub_replication-consistency.ppt

  • 2. EECE 411: Design of Distributed Software Applications Logistics  Project  P01  Non-blocking IO lecture: Nov 4th.  P02 deadline on NEXT Wednesday (November 17th)  Next quiz  Tuesday: November 16th
  • 3. EECE 411: Design of Distributed Software Applications Problem Technique Advantage Partitioning  Consistent hashing Incremental scalability High availability for writes  Eventual consistency  Quorum protocol  Vector clocks with reconciliation during reads High availability Handling temporary failures  ‘Sloppy’ quorum protocol and hinted handoff Provides high availability and durability guarantee when some of the replicas are not available. Recovering from permanent failures  Anti-entropy using Merkle trees Synchronizes divergent replicas in the background. Membership and failure detection  Gossip-based membership protocol Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information. Last time: Amazon’s Dynamo service
  • 4. EECE 411: Design of Distributed Software Applications Replication: Creating and using multiple copies of data (or services) Why replicate?  Improve system reliability  Prevent data loss  i.e. provide data durability  Increase data availability  Note: availability ≠ durability  Increase confidence: e.g. deal with byzantine failures  Improve performance  Scaling throughput  Reduce access times
  • 5. EECE 411: Design of Distributed Software Applications Issue 1. Dealing with data changes  Consistency models  What is the semantic the system implements  (luckily) applications do not always require strict consistency  Consistency protocols  How to implement the semantic agreed upon? Issue 2. Replica management  How many replicas?  Where to place them?  When to get rid of them? Issue 3. Redirection/Routing  Which replica should clients use? What are the issues?
  • 6. EECE 411: Design of Distributed Software Applications Performance and Scalability To keep replicas consistent, we generally need to ensure that all conflicting operations are done in the same order everywhere Conflicting operations: From the world of transactions:  Read–write conflict: a read operation and a write operation act concurrently  Write–write conflict: two concurrent write operations Problem: Guaranteeing global ordering on conflicting operations may be costly, reducing scalability Solution: Weaken consistency requirements so that hopefully global synchronization can be avoided Scalability TENSION Management overheads
  • 7. EECE 411: Design of Distributed Software Applications Client‘s view of the data-store Ideally: black box -- complete ‘transparency’ over how data is stored and managed
  • 8. EECE 411: Design of Distributed Software Applications Management’ system view on data store Management system: Controls the allocated resources and aims to provide transparency
  • 9. EECE 411: Design of Distributed Software Applications Consistency model Management system: Controls the allocated resources and aims to provide transparency Consistency model: Contract between the data store and the clients: The data store specifies the results of read and write operations in the presence of concurrent operations.
  • 10. EECE 411: Design of Distributed Software Applications Roadmap for next few classes  Consistency models:  contracts between the data store and the clients that specify the results of read and write operations are in the presence of concurrency.  Protocols  To manage update propagation  To manage replicas:  creation, placement, deletion  To assign client requests to replicas
  • 11. EECE 411: Design of Distributed Software Applications Consistency models  Data centric: Assume a global, data-store view (i.e., across all clients)  Continuous consistency  Limit the deviation between replicas  Models based on uniform ordering of operations  Constraints on operation ordering at the data-store level  Client centric  Assume client-independent views of the datastore  Constraints on operation ordering for each client independently
  • 12. EECE 411: Design of Distributed Software Applications Notations For operations on a Data Store:  Read: Ri(x) a -- client i reads a from location x  Write: Wi(x) b -- client i writes b at location x P1: W(x)1 P2: R(x)NIL R(x)1
  • 13. EECE 411: Design of Distributed Software Applications Sequential Consistency The result of any execution is the same as if :  operations by all processes on the data store were executed in some sequential order, and  the operations of each individual process appear in this sequence in the order specified by its program. Note: we talk about interleaved execution – there is some total ordering for all operations taken together
  • 14. EECE 411: Design of Distributed Software Applications Sequential Consistency (example) Process1: Process 2 Process 3 X  1 Y  1 Z  1 print (Y, Z) print (X, Z) print (X, Y) Is output: 00 00 00 permitted? What about: 11 11 11? What about: 00 10 10?
  • 15. EECE 411: Design of Distributed Software Applications Causal Consistency (1) Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order by different processes. Causally related relationship:  A read is causally related to the write that provided the data the read got.  A write is causally related to a read that happened before this write in the same process.  If write1  read, and read  write2, then write1  write2. Concurrent <=> not causally related
  • 16. EECE 411: Design of Distributed Software Applications Causal Consistency (Example)  This sequence is allowed with a causally-consistent store, but not with sequentially or strictly consistent store.  Note: W1(x)a  W2(x)b, but not W2 (x)b  W1(x)c
  • 17. EECE 411: Design of Distributed Software Applications Causal Consistency: (More Examples) A correct sequence of events in a causally-consistent store. A violation of a causally-consistent store.
  • 18. EECE 411: Design of Distributed Software Applications Increasing granularity: Grouping Operations Basic idea: You don’t care that reads and writes of a series of operations are immediately known to other processes. You just want the effect of the series itself to be known. One Solution:  Accesses to synchronization variables are sequentially consistent.  No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere.  No read data access is allowed to be performed until all previous accesses to synchronization variables have been performed.
  • 19. EECE 411: Design of Distributed Software Applications Consistency models  Data centric: Assume a global, data-store view (i.e., across all clients)  Continuous consistency  Limit the deviation between replicas  Models based on uniform ordering of operations  Constraints on operation ordering at the data-store level  Client centric  Assume client-independent views of the datastore  Constraints on operation ordering for each client independently
  • 20. EECE 411: Design of Distributed Software Applications Continuous Consistency Observation: We can actually talk a about a degree of consistency:  replicas may differ in their numerical value  replicas may differ in their relative staleness  Replicas may differ with respect to (number and order) of performed update operations Conit: consitency unit  specifies the data unit over which consistency is to be enforced.
  • 21. EECE 411: Design of Distributed Software Applications Example Conit: contains the variables x and y:  Each replica maintains a vector clock  B sends A operation [<5,B>: x := x + 2]; A has made this operation permanent (cannot be rolled back)  A has three pending operations  order deviation = 3
  • 22. EECE 411: Design of Distributed Software Applications  Data centric: solutions at the data store level  Continuous consistency  limit the deviation between replicas, or  Models based on uniform ordering of operations  Constraints on operation ordering at the data-store level  Client centric  Assume client-independent views of the datastore  Constraints on operation ordering for each client independently Consistency models: contracts between the data store and the clients
  • 23. EECE 411: Design of Distributed Software Applications Eventual Consistency Idea: If no updates take place for a long enough period time, all replicas will gradually (i.e., eventually) become consistent. When? Situations where eventual consistency models may make sense  Mostly read-only workloads  No concurrent updates  Advantages/Drawbacks
  • 24. EECE 411: Design of Distributed Software Applications Client-centric Consistency Models  System model  Monotonic reads  Monotonic writes  Read-your-writes  Write-follows-reads Goal: Avoid system-wide consistency, by concentrating on what each client independently wants, (instead of what should be maintained by servers with a global view)
  • 25. EECE 411: Design of Distributed Software Applications Example: Consistency for Mobile Users Example: Distributed database to which a user has access through her notebook.  Notebook acts as a front end to the database.  At location A user accesses the database with reads/updates  At location B user continues work, but unless it accesses the same server as the one at location A, she may detect inconsistencies:  updates at A may not have yet been propagated to B  user may be reading newer entries than the ones available at A:  user updates at B may eventually conflict with those at A Note: The only thing the user really needs is that the entries updated and/or read at A, are available at B the way she left them in A.  Idea: the database will appear to be consistent to the user
  • 26. EECE 411: Design of Distributed Software Applications Example - Basic Architecture
  • 27. EECE 411: Design of Distributed Software Applications Client-centric Consistency Idea: Guarantees some degree of data access consistency for a single client/process point of view. Notations:  Xi[t]  Version of data item x at time t at local replica Li  WS(xi [t])  working set (all write operations) at Li up to time t  WS(xi [t]; xj [t])  indicates that it is known that WS(xi [t]) is included in WS(xj [t])
  • 28. EECE 411: Design of Distributed Software Applications Monotonic-Read Consistency Intuition: Client “sees” the same or newer version of data. Definition: If a process reads the value of a data item x, any successive read operation on x by that process will always return that same or a more recent value.
  • 29. EECE 411: Design of Distributed Software Applications Monotonic reads – Examples  Automatically reading your personal calendar updates from different servers.  Monotonic Reads guarantees that the user sees always more recent updates, no matter from which server the reading takes place.  Reading (not modifying) incoming e-mail while you are on the move.  Each time you connect to a different e-mail server, that server fetches (at least) all the updates from the server you previously visited.
  • 30. EECE 411: Design of Distributed Software Applications Monotonic-Write Consistency Intuition: A write happens on a replica only if it’s brought up to date with preceding write operations on same data (but possibly at different replicas) Definition: A write operation by a process on a data item x is completed before any successive write operation on x by the same process. WS
  • 31. EECE 411: Design of Distributed Software Applications Monotonic writes – Examples  Updating a program at server S2, and ensuring that all components on which compilation and linking depends, are also placed at S2.  Maintaining versions of replicated files in the correct order everywhere (propagate the previous version to the server where the newest version is installed).
  • 32. EECE 411: Design of Distributed Software Applications Read-Your-Writes Consistency Intuition: All previous writes are always completed before any successive read Definition: The effect of a write operation by a process on data item x, will always be seen by a successive read operation on x by the same process.
  • 33. EECE 411: Design of Distributed Software Applications Read-Your-Writes - Examples  Updating your Web page and guaranteeing that your Web browser shows the newest version instead of its cached copy.  Password database
  • 34. EECE 411: Design of Distributed Software Applications Writes-Follow-Reads Consistency Intuition: Any successive write operation on x will be performed on a copy of x that is same or more recent than the last read. Definition: A write operation by a process on a data item x following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x that was read.  Examples: news groups
  • 35. EECE 411: Design of Distributed Software Applications Writes-Follow-Reads - Examples  See reactions to posted articles only if you have the original posting  a read “pulls in” the corresponding write operation.
  • 36. EECE 411: Design of Distributed Software Applications Rodamap Updating replicas  Consistency models  What is the semantic the system provides in case of updates  (luckily) applications do not always require strict consistency  Consistency protocols:  How is this semantic implemented Replica and content management  How many replicas?  Where to place them?  When to get rid of them? Redirection/Routing  Which replica should clients use?
  • 37. EECE 411: Design of Distributed Software Applications Reminder: System view Management system: Controls the allocated resources and aims to provide replication transparency Consistency model: contract between the data store and the clients
  • 38. EECE 411: Design of Distributed Software Applications Next: implementation techniques Updating replicas  Consistency models (how to deal with updated data)  (luckily) applications do not always require strict consistency  Consistency protocols: Implementation issues  How is a consistency model implemented Replica and content management  How many replicas?  Where to place them?  When to get rid of them? Redirection/Routing  Which replica should clients use?
  • 39. EECE 411: Design of Distributed Software Applications Reminder: Types of consistency models  Consistency model: contract between the data store and the clients  the data store specifies precisely what the results of read and write operations are in the presence of concurrency.  Data centric: solutions at the data store level  Continuous consistency: limit the deviation between replicas, or  Constraints on operation ordering at the data-store level  Client centric  Constraints on operation ordering for each client independently
  • 40. EECE 411: Design of Distributed Software Applications Today: Consistency protocols Question: How does one design the protocols to implement the desired consistency model?  Data centric  Constraints on operation ordering at the data-store level  Sequential consistency.  Continuous consistency: limit the deviation between replicas  Client centric  Constraints on operation ordering for each client independently
  • 41. EECE 411: Design of Distributed Software Applications Reminder: Monotonic-Read Consistency Definition: If a process reads the value of a data item x, any successive read operation on x by that process will always return that same or a more recent value.  Intuition: Client “sees” the same or a newer version of the data.
  • 42. EECE 411: Design of Distributed Software Applications Implementation of monotonic-read consistency Sketch:  Globally unique identifier for each write operation  Quiz-like question: explain how exactly the globally unique identifier is (1) generated, and (2) used.  Each client keeps track of:  Write IDs relevant to the read operations performed so far on various objects (ReadSet)  When a client launches a read  Client sends the ReadSet to the replica server  The server checks that all updates in the ReadSet have been performed. [If necessary] Fetches unperformed updates  Returns the value
  • 43. EECE 411: Design of Distributed Software Applications More quiz-like questions Sketch a Design for the consistency model  Monotonic-writes  Read-your-writes  Writes-follow-reads Write pseudo-code
  • 44. EECE 411: Design of Distributed Software Applications Today: Consistency protocols Question: How does one design the protocols to implement the desired consistency model?  Data centric  Constraints on operation ordering at the data-store level  Sequential consistency, causal consistency, etc  Continuous consistency: limit the deviation between replicas  Client centric  Constraints on operation ordering for each client independently
  • 45. EECE 411: Design of Distributed Software Applications Overview: Data-centric Consistency Protocols
  • 46. EECE 411: Design of Distributed Software Applications Overview: Data-centric Consistency Protocols
  • 47. EECE 411: Design of Distributed Software Applications Usage: distributed databases and file systems that require a high degree of fault tolerance. Replicas often placed on same LAN. Issues: blocking vs. non-blocking Primary-Based Protocols
  • 48. EECE 411: Design of Distributed Software Applications Primary-backup protocol with local writes  The primary migrates to the node that executes the write  Usage: Mobile computing in disconnected mode  Idea: ship all relevant files to user before disconnecting, and update later.
  • 49. EECE 411: Design of Distributed Software Applications Overview: Data-centric Consistency Protocols
  • 50. EECE 411: Design of Distributed Software Applications Active Replication (1/2)  Model: Remote operations/updates (and not the result of state changes) are replicated  Control replication (and not data replication)  Requires total update ordering (either through a sequencer or using Lamport’s protocol).  Example: Replicated distributed objects  Problems  Operations need to be sequenced  Dealing with workflows
  • 51. EECE 411: Design of Distributed Software Applications Replicated-Write: Active Replication (2/2)  Dealing with workflows in a general setting
  • 52. EECE 411: Design of Distributed Software Applications Overview: Data-centric Consistency Protocols
  • 53. EECE 411: Design of Distributed Software Applications Replicated writes: Quorums Problem: some replicas may not be available all the time.  leads to low availability (for writes) Solution: Quorum-based protocols: Ensure that each operation is carried on enough replicas (not on all)  a majority ‘vote’ is established before each operation  distinguish a read quorum and a write quorum:
  • 54. EECE 411: Design of Distributed Software Applications Valid quorums Must satisfy two basic rules  A read quorum should “intersect” any prior write quorum at >= 1 processes  A write quorum should intersect any other write quorum So, in a group of size N:  Qr + Qw > N, and  Qw + Qw > N results in Q>w > N/2
  • 55. EECE 411: Design of Distributed Software Applications Quorums: Mechanics for read and write Setup: Replicas of data items have “versions”,  Versions are numbered (timestamped)  Timestamps must increase monotonically and include a process id to break ties Reads are easy:  Send RPCs until Qr processes reply  Then use the replica with the largest timestamp (the most recently updated replica)
  • 56. EECE 411: Design of Distributed Software Applications Quorums: Mechanics for read and write (cont) Writes are trickier:  Need to determine version number  Need to make sure all replicas are updated atomically  Consequence: need a ‘commit’ protocol to protect from concurrent writes
  • 57. EECE 411: Design of Distributed Software Applications Quorum based protocols: Write algorithm
  • 58. EECE 411: Design of Distributed Software Applications Consistency protocols for continuous consistency Question: How does one design the protocols to implement the desired consistency model?  Data centric  Constraints on operation ordering at the data-store level  Sequential consistency.  Continuous consistency: limit the deviation between replicas  Client centric  Constraints on operation ordering for each client independently
  • 59. EECE 411: Design of Distributed Software Applications Continuous Consistency
  • 60. EECE 411: Design of Distributed Software Applications Continuous Consistency (reminder) Observation: We can talk a about a degree of consistency:  replicas may differ in their numerical value  replicas may differ in their relative staleness  There may differences with respect to (number and order) of performed update operations Conit: consistency unit  specifies the data unit over which consistency is to be measured.
  • 61. EECE 411: Design of Distributed Software Applications Continuous Consistency: Bounding Numerical Errors (I) Principle: consider a conit (data item) x and let weight(W(x)) denote the numerical change in its value after a write operation W.  [for simplicity] Assume that for all W(x), weight(W) > 0  W is initially forwarded to one of the N replicas: the origin(W).  TW[i, j] are the writes executed by server Si that originated from Sj TW[i, j]=Σ {weight(W)| origin(W)=Sj & W in log(Si)} Note: Actual value of x: vi : value of x at replica i
  • 62. EECE 411: Design of Distributed Software Applications Continuous Consistency: Bounding Numerical Errors (II) Reminder: TW[i, j] writes executed by server Si originated at Sj Problem: We need to ensure that v(t) − vi < di for every Si. Approach: Let every server Sk maintain a view TWk[i, j] for what Sk believes is the value of TW[i, j].  This information can be piggybacked when an update is propagated. Note: 0 ≤ TWk[i, j] ≤ TW[i, j] ≤ TW[j, j] Solution: Sk sends operations from its log to Si when it sees that TWk[i, k] is getting too far from TW[k, k],  in particular, when TW[k, k] − TWk[i, k] > di/(N − 1). Note: Staleness can be done analogously, by essentially keeping track of what has been seen last from Sk
  • 63. EECE 411: Design of Distributed Software Applications Issue 1. Dealing with data changes  Consistency models  What is the semantic the system implements  (luckily) applications do not always require strict consistency  Consistency protocols  How to implement the semantic agreed upon? Issue 2. Replica management  How many replicas?  Where to place them?  When to get rid of them? Issue 3. Redirection/Routing  Which replica should clients use? Rodamap
  • 64. EECE 411: Design of Distributed Software Applications Quorums Servers Clients State: … State: … State: … [Barbara Liskov]
  • 65. EECE 411: Design of Distributed Software Applications Quorums Servers Clients State: … State: … State: … A A X [Barbara Liskov]
  • 66. EECE 411: Design of Distributed Software Applications Quorums Servers Clients State: … State: … A State: … A X [Barbara Liskov]
  • 67. EECE 411: Design of Distributed Software Applications C-A-P choose two C A P Fox&Brewer “CAP Theorem” consistency Availability Partition-resilience Claim: every distributed system is on one side of the triangle. CA: available, and consistent, unless there is a partition. AP: a reachable replica provides service even in a partition, but may be inconsistent. CP: always consistent, even in a partition, but a reachable replica may deny service without agreement of the others (e.g., quorum).
  • 68. EECE 411: Design of Distributed Software Applications Issue 1. Dealing with data changes  Consistency models  What is the semantic the system implements  (luckily) applications do not always require strict consistency  Consistency protocols  How to implement the semantic agreed upon? Issue 2. Replica management  How many replicas?  Where to place them?  When to get rid of them? Issue 3. Redirection/Routing  Which replica should clients use? Rodamap
  • 69. EECE 411: Design of Distributed Software Applications Reminder: System view Management system: Controls the allocated resources and aims to provide replication transparency Consistency model: contract between the data store and the clients
  • 70. EECE 411: Design of Distributed Software Applications  Consistency model to have in mind for next slides: eventual consistency  Example applications: web content distribution – HTML, videos, etc
  • 71. EECE 411: Design of Distributed Software Applications Replica server placement Problem: Figure out what the best K places are out of N possible locations.  Select best location out of N – j, j:0..K-1 for which the average distance to clients is minimal. Then choose the next best server.  (Note: The first chosen location minimizes the average distance to all clients.)  Computationally expensive.  Select the k largest autonomous system and place a server at the best-connected host.  Computationally expensive.  Position nodes in a d-dimensional geometric space, where distance reflects latency. Identify the K regions with highest density and place a server in every one.  Computationally cheap.
  • 72. EECE 411: Design of Distributed Software Applications Content replication and placement Model: We consider data objects (don't worry whether they contain data or code, or both) Distinguish different processes: A process is capable of hosting a replica of an object or data:  Permanent replicas: Process/machine always having a replica  Server-initiated replica: Process that can dynamically host a replica on request of another server in the data store  Client-initiated replica: Process that can dynamically host a replica on request of a client (client caches)
  • 73. EECE 411: Design of Distributed Software Applications Content Replication: Server-Initiated Replicas  Keep track of access counts per file, aggregated by considering server closest to requesting clients  Number of accesses drops below threshold D  drop file  Number of accesses exceeds threshold R  replicate file  Number of access between D and R  migrate file
  • 74. EECE 411: Design of Distributed Software Applications Client initiated replicas Issue: What do I propagate when content is dynamic? State vs. operations  Propagate only notification/invalidation of update (often used for caches)  Transfer data from one copy to another (often for distributed databases)  Propagate the update operation to other copies (active replication) Note: No single approach is the best, but depends highly on (1) available bandwidth and read-to-write ratio at replicas; (2) consistency model
  • 75. EECE 411: Design of Distributed Software Applications Propagating updates (1/3) Issue Push-based Pull-based State of server List of client replicas and caches None Messages sent Update message (and possibly fetch update later) Poll and update Response time at client Immediate (or fetch-update time) Fetch-update time  Pushing updates: server-initiated approach, in which the update is propagated regardless whether target asked for it.  Pulling updates: client-initiated approach, in which client requests to be updated.
  • 76. EECE 411: Design of Distributed Software Applications Propagating updates: Leases (2/3) Observation: We can dynamically switch between pull and push using leases:  Lease: A contract in which the server promises to push updates to the client until the lease expires. Technique: Make lease expiration time dependent on system's behavior (adaptive leases):  Age-based leases: An object that hasn't changed for a long time, will not change in the near future, so provide a long-lasting lease  Renewal-frequency based leases: The more often a client requests a specific object, the longer the expiration time for that client (for that object) will be  State-based leases: The more loaded a server is, the shorter the expiration times become
  • 77. EECE 411: Design of Distributed Software Applications Propagating updates: Epidemic protocols (3/3)  Problem so far: server scalability  Server needs to provide updates to all system participants  What if Internet-scale system (P2P)?  Epidemic protocols:  Nodes periodically pair up and exchange state (or updates)  Push and/or pull exchanges  Issues  Generated traffic – mapping on physical topology  When is an update distributed to everyone?  Probabilistic guarantees  Freeriding  Use: server-less environments, volatile nodes  mobile networks, P2P systems
  • 78. EECE 411: Design of Distributed Software Applications Issue 1. Dealing with data changes  Consistency models  What is the semantic the system implements  (luckily) applications do not always require strict consistency  Consistency protocols  How to implement the semantic agreed upon? Issue 2. Replica management  How many replicas?  Where to place them?  When to get rid of them? Issue 3. Redirection/Routing  Which replica should clients use? Summary