Primary-copy replication designates one replica as the primary copy where updates are applied. The primary propagates updates asynchronously to secondary replicas to ensure eventual consistency. Multi-master replication allows updates to multiple copies but requires reconciliation of conflicting updates. Various algorithms like majority consensus and Thomas' write rule aim to ensure serializability despite failures and asynchronous replication.
UNIT IV FAILURE RECOVERY AND FAULT TOLERANCE 9
Basic Concepts-Classification of Failures – Basic Approaches to Recovery; Recovery in
Concurrent System; Synchronous and Asynchronous Checkpointing and Recovery; Check
pointing in Distributed Database Systems; Fault Tolerance; Issues - Two-phase and Nonblocking
Commit Protocols; Voting Protocols; Dynamic Voting Protocols;
UNIT IV FAILURE RECOVERY AND FAULT TOLERANCE 9
Basic Concepts-Classification of Failures – Basic Approaches to Recovery; Recovery in
Concurrent System; Synchronous and Asynchronous Checkpointing and Recovery; Check
pointing in Distributed Database Systems; Fault Tolerance; Issues - Two-phase and Nonblocking
Commit Protocols; Voting Protocols; Dynamic Voting Protocols;
Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
This is a presentation for Chapter 7 Distributed system management
Book: DISTRIBUTED COMPUTING , Sunita Mahajan & Seema Shah
Prepared by Students of Computer Science, Ain Shams University - Cairo - Egypt
Process Migration in Heterogeneous Systemsijsrd.com
Process migration is very important aspect of distributed operating system. It is the act of transferring a process between two machines. This aspect of process migration deals with the decisions of transferring the state of process from one computer to another i.e. distributing the processes. The process migration mechanism must package the entire state of the process executing in the originating machine so that the destination machine may continue its execution. The process should not normally be concerned by any changes in the environment, other than in obtaining better performance. This paper will present the mechanism of Process Migration in heterogeneous distributed system.
Scheduling in distributed systems - Andrii VozniukAndrii Vozniuk
My EPFL candidacy exam presentation: http://wiki.epfl.ch/edicpublic/documents/Candidacy%20exam/vozniuk_andrii_candidacy_writeup.pdf
Here I present how schedulers work in three distributed data processing systems and their possible optimizations. I consider Gamma - a parallel database, MapReduce - a data-intensive system and Condor - a compute-intensive system.
This talk is based on the following papers:
1) Batch Scheduling in Parallel Database Systems by Manish Mehta, Valery Soloviev and David J. DeWitt
2) Improving MapReduce performance in heterogeneous environments by Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz and Ion Stoica
3) Batch Scheduling in Parallel Database Systems by Manish Mehta, Valery Soloviev and David J. DeWitt
The timing behavior of the OS must be predictable - services of the OS: Upper bound on the execution time!
2. OS must manage the timing and scheduling
OS possibly has to be aware of task deadlines;
(unless scheduling is done off-line).
3. The OS must be fast
Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
This is a presentation for Chapter 7 Distributed system management
Book: DISTRIBUTED COMPUTING , Sunita Mahajan & Seema Shah
Prepared by Students of Computer Science, Ain Shams University - Cairo - Egypt
Process Migration in Heterogeneous Systemsijsrd.com
Process migration is very important aspect of distributed operating system. It is the act of transferring a process between two machines. This aspect of process migration deals with the decisions of transferring the state of process from one computer to another i.e. distributing the processes. The process migration mechanism must package the entire state of the process executing in the originating machine so that the destination machine may continue its execution. The process should not normally be concerned by any changes in the environment, other than in obtaining better performance. This paper will present the mechanism of Process Migration in heterogeneous distributed system.
Scheduling in distributed systems - Andrii VozniukAndrii Vozniuk
My EPFL candidacy exam presentation: http://wiki.epfl.ch/edicpublic/documents/Candidacy%20exam/vozniuk_andrii_candidacy_writeup.pdf
Here I present how schedulers work in three distributed data processing systems and their possible optimizations. I consider Gamma - a parallel database, MapReduce - a data-intensive system and Condor - a compute-intensive system.
This talk is based on the following papers:
1) Batch Scheduling in Parallel Database Systems by Manish Mehta, Valery Soloviev and David J. DeWitt
2) Improving MapReduce performance in heterogeneous environments by Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz and Ion Stoica
3) Batch Scheduling in Parallel Database Systems by Manish Mehta, Valery Soloviev and David J. DeWitt
The timing behavior of the OS must be predictable - services of the OS: Upper bound on the execution time!
2. OS must manage the timing and scheduling
OS possibly has to be aware of task deadlines;
(unless scheduling is done off-line).
3. The OS must be fast
Learn strategies to maintain your database's high availability even during peak use periods. MariaDB's Field CTO Max Mether offers best practices for high availability, disaster recovery and more.
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
Flink Forward San Francisco 2022.
At Stripe we have created a complete end to end exactly-once processing pipeline to process financial data at scale, by combining the exactly-once power from Flink, Kafka, and Pinot together. The pipeline provides exactly-once guarantee, end-to-end latency within a minute, deduplication against hundreds of billions of keys, and sub-second query latency against the whole dataset with trillion level rows. In this session we will discuss the technical challenges of designing, optimizing, and operating the whole pipeline, including Flink, Kafka, and Pinot. We will also share our lessons learned and the benefits gained from exactly-once processing.
by
Xiang Zhang & Pratyush Sharma & Xiaoman Dong
A tutorial-like technical presentation that covers fundamental approaches for replication along with their advantages, disadvantages, comparisons with each other etc.
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenEDB
Dieses Webinar hilft Ihnen, die Unterschiede zwischen den verschiedenen Replikationsansätzen zu verstehen, die Anforderungen der jeweiligen Strategie zu erkennen und sich über die Möglichkeiten klar zu werden, was mit jeder einzelnen zu erreichen ist. Damit werden Sie hoffentlich eher in der Lage sein, herauszufinden, welche PostgreSQL-Replikationsarten Sie wirklich für Ihr System benötigen.
- Wie physische und logische Replikation in PostgreSQL funktionieren
- Unterschiede zwischen synchroner und asynchroner Replikation
- Vorteile, Nachteile und Herausforderungen bei der Multi-Master-Replikation
- Welche Replikationsstrategie für unterschiedliche Use-Cases besser geeignet ist
Referent:
Borys Neselovskyi, Regional Sales Engineer DACH, EDB
------------------------------------------------------------
For more #webinars, visit http://bit.ly/EDB-Webinars
Download free #PostgreSQL whitepapers: http://bit.ly/EDB-Whitepapers
Read our #Postgres Blog http://bit.ly/EDB-Blogs
Follow us on Facebook at http://bit.ly/EDB-FB
Follow us on Twitter at http://bit.ly/EDB-Twitter
Follow us on LinkedIn at http://bit.ly/EDB-LinkedIn
Reach us via email at marketing@enterprisedb.com
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...ScyllaDB
Outbrain is the world's largest content discovery program. Learn about their use case with Scylla where they lowered latency while doing 20X IOPS of Cassandra.
Presentation was delivered in a fault tolerance class which talk about the achieving fault tolerance in databases by making use of the replication.Different commercial databases were studied and looked into the approaches they took for replication.Then based on the study an architecture was suggested for military database design using an asynchronous approach and making use of the cluster patterns.
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic
harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or
unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied
students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
Biological screening of herbal drugs: Introduction and Need for
Phyto-Pharmacological Screening, New Strategies for evaluating
Natural Products, In vitro evaluation techniques for Antioxidants, Antimicrobial and Anticancer drugs. In vivo evaluation techniques
for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardio protective, Diuretics and
Antifertility, Toxicity studies as per OECD guidelines
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit-www.vavaclasses.com
3. 2/29/2012 3
1. Introduction
• Replication - using multiple copies of a server or
resource for better availability and performance.
– Replica and Copy are synonyms
• If you’re not careful, replication can lead to
– worse performance - updates must be applied to all
replicas and synchronized
– worse availability - some algorithms require multiple
replicas to be operational for any of them to be used
7. 2/29/2012 7
Replicated Database
Database
Server 1
Database
Server 2
Database
Server 3
• Objective
– Availability
– Performance
• Transparency
– 1 copy serializability
• Challenge
– Propagating and synchronizing updates
8. 2/29/2012 8
Replicated Server
• Can replicate servers on a common resource
– Data sharing - DB servers communicate with shared disk
Resource
Server Replica 1 Server Replica 2
Client
• Helps availability for process (not resource) failure
• Requires a replica cache coherence mechanism, so
this helps performance only if
– little conflict between transactions at different servers or
– loose coherence guarantees (e.g. read committed)
9. 2/29/2012 9
Replicated Resource
• To get more improvement in availability,
replicate the resources (too)
• Also increases potential throughput
• This is what’s usually meant by replication
• It’s the scenario we’ll focus on
Resource replica
Server Replica 1 Server Replica 2
ClientClient
Resource replica
10. 2/29/2012 10
Synchronous Replication
• Replicas function just like a non-replicated resource
– Txn writes data item x. System writes all replicas of x.
– Synchronous – replicas are written within the update txn
– Asynchronous – One replica is updated immediately.
Other replicas are updated later
• Problems with synchronous replication
– Too expensive for most applications, due to heavy
distributed transaction load (2-phase commit)
– Can’t control when updates are applied to replicas
Write(x1)
Write(x2)
Write(x3)
x1
x2
x3
Start
Write(x)
Commit
11. 2/29/2012 11
Synchronous Replication - Issues
r1[xA]
r2[yD] w2[xB]
w1[yC]yD fails
xA fails
Not equivalent to a
one-copy execution,
even if xA and yD
never recover!
• DBMS products support it only in special situations
• If you just use transactions, availability suffers.
• For high-availability, the algorithms are complex and
expensive, because they require heavy-duty
synchronization of failures.
• … of failures? How do you synchronize failures?
• Assume replicas xA, xB of x and yC, yD of y
12. 2/29/2012 12
Atomicity & Isolation Goal
• One-copy serializability (abbr. 1SR)
– An execution of transactions on the replicated database has
the same effect as a serial execution on a one-copy database.
• Readset (resp. writeset) - the set of data items (not
copies) that a transaction reads (resp. writes).
• 1SR Intuition: the execution is SR and in an equivalent
serial execution, for each txn T and each data item x in
readset(T), T reads from the most recent txn that wrote
into any copy of x.
• To check for 1SR, first check for SR (using SG), then
see if there’s equivalent serial history with the above
property
13. 2/29/2012 13
Atomicity & Isolation (cont’d)
• Previous example was not 1SR. It is equivalent to
– r1[xA] w1[yC] r2[yD] w2[xB] and
– r2[yD] w2[xB] r1[xA] w1[yC]
– but in both cases, the second transaction does not read its
input from the previous transaction that wrote that input.
• These are 1SR
– r1[xA] w1[yD] r2[yD] w2[xB]
– r1[xA] w1[yC] w1[yD] r2[yD] w2[xA] w2[xB]
• The previous history is the one you would expect
– Each transaction reads one copy of its readset and
writes into all copies of its writeset
• But it may not always be feasible, because some copies
may be unavailable.
14. 2/29/2012 14
Asynchronous Replication
• Asynchronous replication
– Each transaction updates one replica.
– Updates are propagated later to other replicas.
• Primary copy: Each data item has a primary copy
– All transactions update the primary copy
– Other copies are for queries and failure handling
• Multi-master: Transactions update different copies
– Useful for disconnected operation, partitioned network
• Both approaches ensure that
– Updates propagate to all replicas
– If new updates stop, replicas converge to the same state
• Primary copy ensures serializability, and often 1SR
– Multi-master does not.
15. 2/29/2012 15
2. Primary-Copy Replication
• Designate one replica as the primary copy (publisher)
• Transactions may update only the primary copy
• Updates to the primary are sent later to secondary replicas
(subscribers) in the order they were applied to the primary
T1: Start
… Write(x1) ...
Commit
x1T2
Tn
... Primary
Copy
x2
xm
...
Secondaries
16. 2/29/2012 16
Update Propagation
• Collect updates at the primary using triggers or
by post-processing the log
– Triggers: on every update at the primary, a trigger fires
to store the update in the update propagation table.
– Log post-processing: “sniff” the log to generate update
propagations
• Log post-processing (log sniffing)
– Saves triggered update overhead during on-line txn.
– But R/W log synchronization has a (small) cost
• Optionally identify updated fields to compress log
• Most DB systems support this today.
17. 2/29/2012 17
Update Processing 1/2
• At the replica, for each tx T in the propagation stream,
execute a refresh tx that applies T’s updates to replica.
• Process the stream serially
– Otherwise, conflicting transactions may run in a
different order at the replica than at the primary.
– Suppose log contains w1[x] c1 w2[x] c2.
Obviously, T1 must run before T2 at the replica.
– So the execution of update transactions is serial.
• Optimizations
– Batching: {w(x)} {w(y)} -> {w(x), w(y)}
– “Concurrent” execution
18. 2/29/2012 18
Update Processing 2/2
• To get a 1SR execution at the replica
– Refresh transactions and read-only queries use an
atomic and isolated mechanism (e.g., 2PL)
• Why this works
– The execution is serializable
– Each state in the serial execution is one that
occurred at the primary copy
– Each query reads one of those states
• Client view
– Session consistency
19. 2/29/2012 19
Request Propagation
• Or propagate requests (e.g. txn-bracketed stored proc calls)
• Requirements
– Must ensure same order at primary and replicas
– Determinism
• This is often a txn middleware (not DB) feature.
• An alternative to propagating updates is to propagate
procedure calls (e.g., a DB stored procedure call).
SP1: Write(x)
Write(y)x, y
DB-A w[x]
w[y]
SP1: Write(x)
Write(y) x, y
DB-B
w[x]
w[y]Replicate
Call(SP1)
20. 2/29/2012 20
Failure & Recovery Handling 1/3
• Secondary failure - nothing to do till it recovers
– At recovery, apply the updates it missed while down
– Needs to determine which updates it missed,
just like non-replicated log-based recovery
– If down for too long, may be faster to get a whole copy
• Primary failure
– Normally, secondaries wait till the primary recovers
– Can get higher availability by electing a new primary
– A secondary that detects primary’s failure starts a new
election by broadcasting its unique replica identifier
– Other secondaries reply with their replica identifier
– The largest replica identifier wins
21. 2/29/2012 21
Failure & Recovery Handling 2/3
• Primary failure (cont’d)
– All replicas must now check that they have the
same updates from the failed primary
– During the election, each replica reports the id of the
last log record it received from the primary
– The most up-to-date replica sends its latest updates to
(at least) the new primary.
22. 2/29/2012 22
Failure & Recovery Handling 3/3
• Primary failure (cont’d)
– Lost updates
– Could still lose an update that committed at the primary
and wasn’t forwarded before the primary failed …
but solving it requires synchronous replication
(2-phase commit to propagate updates to replicas)
– One primary and one backup
• There is always a window for lost updates.
23. 2/29/2012 23
Communications Failures
• Secondaries can’t distinguish a primary failure from a
communication failure that partitions the network.
• If the secondaries elect a new primary and the old primary
is still running, there will be a reconciliation problem
when they’re reunited. This is multi-master.
• To avoid this, one partition must know it’s the only one
that can operate. It can’t communicate with other
partitions to figure this out.
• Could make a static decision.
E.g., the partition that has the primary wins.
• Dynamic solutions are based on Majority Consensus
24. 2/29/2012 24
Majority Consensus
• Whenever a set of communicating replicas detects a
replica failure or recovery, they test if they have a
majority (more than half) of the replicas.
• If so, they can elect a primary
• Only one set of replicas can have a majority.
• Doesn’t work with an even number of copies.
– Useless with 2 copies
• Quorum consensus
– Give a weight to each replica
– The replica set that has a majority of the weight wins
– E.g. 2 replicas, one has weight 1, the other weight 2
25. 2/29/2012 25
3. Multi-Master Replication
• Some systems must operate when partitioned.
– Requires many updatable copies, not just one primary
– Conflicting updates on different copies are detected late
• Classic example - salesperson’s disconnected laptop
Customer table (rarely updated) Orders table (insert mostly)
Customer log table (append only)
– So conflicting updates from different salespeople are rare
• Use primary-copy algorithm, with multiple masters
– Each master exchanges updates (“gossips”) with other replicas
when it reconnects to the network
– Conflicting updates require reconciliation (i.e. merging)
• In Lotus Notes, Access, SQL Server, Oracle, …
26. 2/29/2012 26
Example of Conflicting Updates
• Replicas end up in different states
Replica 1
Initially x=0
T1: X=1
Primary
Initially x=0
Send (X=1)
Replica 2
Initially x=0
T2: X=2
Send (X=1)
X=1
X=1
X=2
Send (X=2)
X=2
Send (X=2)
• Assume all updates propagate via the primary
time
27. 2/29/2012 27
Thomas’ Write Rule
• To ensure replicas end up in the same state
– Tag each data item with a timestamp
– A transaction updates the value and timestamp of data
items (timestamps monotonically increase)
– An update to a replica is applied only if the update’s
timestamp is greater than the data item’s timestamp
– You only need timestamps of data items that were
recently updated (where an older update could still be
floating around the system)
• All multi-master products use some variation of this
• Robert Thomas, ACM TODS, June ’79
28. 2/29/2012 28
Thomas Write Rule Serializability
• Replicas end in the same state, but neither T1 nor T2 reads
the other’s output, so the execution isn’t serializable.
• This requires reconciliation
Replica 1
T1: read x=0 (TS=0)
T1: X=1, TS=1
Primary
Initially x=0,TS=0
Send (X=1, TS=1)
Replica 2
T1: read x=0 (TS=0)
T2: X=2, TS=2
Send (X=1, TS=1)
X=1, TS=1
X=1,TS=1X=2, TS=2
Send (X=2, TS=2)
X=2, TS=2
Send (X=2, TS=2)
29. 2/29/2012 29
Multi-Master Performance
• The longer a replica is disconnected and
performing updates, the more likely it will
need reconciliation
• The amount of propagation activity increases
with more replicas
– If each replica is performing updates,
the effect is quadratic in the number of
replicas
30. 2/29/2012 30
Making Multi-Master Work
• Transactions
– T1: x++ {x=1} at replica 1
– T2: x++ {x=1} at replica 2
– T3: x++ {y=1} at replica 3
– Replica 2 and 3 already exchanged updates
• On replica 1
– Current state { x=1, y=0 }
– Receive update from replica 2 {x=1, y=1}
– Receive update from replica 3 {x=1, y=1}
31. 2/29/2012 31
Making Multi-Master Work
• Time in a distributed system
– Emulate global clock
– Use local clock
– Logical clock
– Vector clock
• Dependency tracking metadata
– Per data item
– Per replica
– This could be bigger than the data
32. 2/29/2012 32
Microsoft Access and SQL Server
• Each row R of a table has 4 additional columns
– Globally unique id (GUID)
– Generation number, to determine which updates from
other replicas have been applied
– Version num = the number of updates to R
– Array of [replica, version num] pairs, identifying the
largest version num it got for R from every other replica
• Uses Thomas’ write rule, based on version nums
– Access uses replica id to break ties.
– SQL Server 7 uses subscriber priority or custom conflict
resolution.
33. 2/29/2012 33
4. Other Approaches (1/2)
• Non-transactional replication using timestamped
updates and variations of Thomas’ write rule
– Directory services are managed this way
• Quorum consensus per-transaction
– Read and write a quorum of copies
– Each data item has a version number and timestamp
– Each read chooses a replica with largest version
number
– Each write increments version number one greater
than any one it has seen
– No special work needed for a failure or recovery
34. 2/29/2012 34
Other Approaches 2/2
• Read-one replica, write-all-available replicas
– Requires careful management of failures and
recoveries
• E.g., Virtual partition algorithm
– Each node knows the nodes it can communicate
with, called its view
– Txn T can execute if its home node has a view
including a quorum of T’s readset and writeset
– If a node fails or recovers, run a view formation
protocol (much like an election protocol)
– For each data item with a read quorum, read the
latest version and update the others with smaller
version #.
35. 2/29/2012 35
Summary
• State-of-the-art products have rich functionality.
– It’s a complicated world for app designers
– Lots of options to choose from
• Most failover stories are weak
– Fine for data warehousing
– For 247 TP, need better integration with
cluster node failover