Different database consistency models explained by The Dude.
Choosing a database is hard, and most developers take the word of bloggers and the grand proclamations of database inventors.
These slides help you understand the various database consistency models so that you can break your own requirements down into a consistency model and then choose the appropriate database.
5. Consistency and Histories
Invariants are the fixed set of requirements that a system should never violate.
For example: a client should never see an older copy of the data.
When all the assumed invariants hold, the system is said to be
consistent.
A consistency model is the set of all allowed histories.
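To make the idea of checking a history against an invariant concrete, here is a minimal sketch. It assumes a history is a list of (client, operation, version) events, and the function name never_sees_older_copy is made up for illustration. It checks only the example invariant above: a client should never see an older copy of the data.

```python
from typing import List, Tuple

# Each event is (client, operation, version). This shape is an assumption
# made for the sketch, not a standard history format.
Event = Tuple[str, str, int]

def never_sees_older_copy(history: List[Event]) -> bool:
    """Return True if no client reads a version older than one it already read."""
    latest_seen = {}
    for client, op, version in history:
        if op != "read":
            continue
        if version < latest_seen.get(client, version):
            return False
        latest_seen[client] = version
    return True

allowed = [("c1", "write", 1), ("c1", "read", 1), ("c2", "write", 2), ("c1", "read", 2)]
violating = [("c1", "read", 2), ("c1", "read", 1)]

print(never_sees_older_copy(allowed))    # True
print(never_sees_older_copy(violating))  # False
```

A consistency model that includes this invariant would allow the first history and reject the second.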
13. A transaction changes a value and, before it has committed, another transaction overwrites
the value and commits.
What happens if T1 wants to roll back? Should x = 0 or x = 2?
What are dirty writes?
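The rollback question can be played out with a toy store. This is only a sketch: NaiveStore and its before-image rollback are assumptions made for illustration, and the values (x starts at 0, T1 writes 1, T2 writes 2) are chosen to match the question on the slide.

```python
class NaiveStore:
    """A store with no write locks, so uncommitted values can be overwritten."""

    def __init__(self, x):
        self.x = x
        self.before_image = {}  # txn name -> value x had before that txn first wrote

    def write(self, txn, value):
        self.before_image.setdefault(txn, self.x)
        self.x = value

    def commit(self, txn):
        # Committing just discards the undo information for that transaction.
        self.before_image.pop(txn, None)

    def rollback(self, txn):
        # Restores the before-image, clobbering any later committed write.
        self.x = self.before_image.pop(txn)

store = NaiveStore(x=0)
store.write("T1", 1)   # T1 writes x = 1, not yet committed
store.write("T2", 2)   # T2 overwrites T1's uncommitted value
store.commit("T2")     # T2 commits x = 2
store.rollback("T1")   # T1 rolls back to its before-image
print(store.x)         # 0: T2's committed write of 2 is gone
```

Restoring 0 throws away a committed write, and keeping 2 means T1's rollback did not really undo anything cleanly. Neither answer is good, which is why dirty writes are prevented rather than repaired.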
20. There is no Dirty Write since T2 commits before T1 writes x. There is no Dirty
Read since there is no read after a write. Nevertheless, at the end of this
sequence T2's update will be lost, even if T2 commits.
What are lost updates?
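A minimal sketch of that sequence, with made-up numbers (x starts at 100, T2 adds 20, T1 adds 30):

```python
balance = {"x": 100}

t1_read = balance["x"]         # T1 reads x = 100
t2_read = balance["x"]         # T2 reads x = 100
balance["x"] = t2_read + 20    # T2 writes x = 120 and commits
balance["x"] = t1_read + 30    # T1 writes x = 130 from its stale read, then commits

print(balance["x"])            # 130, although both updates together should give 150
```

T2 committed before T1 wrote, so there is no dirty write, yet T2's update has vanished from the final value.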
21. Cursor stability introduces the concept of a cursor, which refers to a particular object being
accessed by a transaction.
Transactions may have multiple cursors.
When a transaction reads an object using a cursor, that object cannot be modified
by any other transaction until the cursor is released, or the transaction commits.
How are lost updates mitigated by Cursor Stability?
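A rough way to picture a cursor is as a per-object lock held from the read until the write. The sketch below assumes exactly that. CursorStore, open_cursor and write_and_release are hypothetical names, and a real database would implement cursors quite differently.

```python
import threading

class CursorStore:
    """Toy store in which opening a cursor on a key locks that key."""

    def __init__(self, data):
        self.data = dict(data)
        self.locks = {key: threading.Lock() for key in data}

    def open_cursor(self, key):
        # Blocks while another transaction holds a cursor on the same key.
        self.locks[key].acquire()
        return self.data[key]

    def write_and_release(self, key, value):
        self.data[key] = value
        self.locks[key].release()

store = CursorStore({"x": 100})

def add(amount):
    current = store.open_cursor("x")               # read through the cursor
    store.write_and_release("x", current + amount)

threads = [threading.Thread(target=add, args=(n,)) for n in (20, 30)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store.data["x"])  # 150: neither update is lost
```

Because the second transaction cannot read x until the first has written and released its cursor, it can never base its write on a stale value.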
24. Once some effects of a transaction A are observed by transaction B, all effects of A are
visible to B. Also, if a later version written by some transaction is observed, then all the versions from that
transaction will be visible.
Only committed values are observed, as monotonic atomic views implies read
committed.
It is the isolation counterpart of “Atomicity” in ACID: how much of a committed transaction
should be visible to others? All or nothing.
What are monotonic atomic views?
26. A transaction reads a value, and later the same read returns a different
value.
What are non-repeatable reads?
27. You started a transaction from your Spring Boot application.
The default isolation level was set to READ COMMITTED.
You read a value and made a decision based on it. This triggered an update.
In the validation phase you did another read.
The values did not match.
Someone changed your value, and the decision you made was wrong.
Remedy: use a higher isolation level or use SELECT FOR UPDATE.
Real life use case
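The same flow can be sketched without a database. Everything here is an assumption made for illustration: the dictionary stands in for committed data, row_lock stands in for the row lock that SELECT FOR UPDATE would take, and concurrent_update plays the other transaction. A real database would make the other writer wait. The sketch makes it fail immediately so that it runs in a single thread.

```python
import threading

committed = {"stock": 5}
row_lock = threading.Lock()   # stands in for the row lock taken by SELECT FOR UPDATE

def concurrent_update(new_value):
    """Another transaction tries to change the row; it gives up if the row is locked."""
    if not row_lock.acquire(blocking=False):
        return False
    committed["stock"] = new_value
    row_lock.release()
    return True

# READ COMMITTED without a lock: the value changes between the two reads.
first = committed["stock"]
concurrent_update(0)
second = committed["stock"]
print(first, second)           # 5 0

# Same flow with the row locked for the duration of the transaction.
committed["stock"] = 5
with row_lock:
    first = committed["stock"]
    blocked = not concurrent_update(0)
    second = committed["stock"]
print(first, second, blocked)  # 5 5 True
```

With the lock held, the decision and the validation read see the same value, which is the effect the remedy is after.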
29. In a snapshot isolated system, each transaction appears to operate on an
independent, consistent snapshot of the database. Its changes are visible only to
that transaction until commit time, when all changes become visible atomically. If
transaction T1 has modified an object x, and another transaction T2 committed a
write to x after T1’s snapshot began, and before T1’s commit, then T1 must abort.
What is snapshot isolation?
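The abort rule can be sketched with a toy store. SnapshotDB and its logical clock are assumptions made for illustration. Real systems keep multiple versions of each object instead of copying the whole database per transaction.

```python
class Conflict(Exception):
    pass

class SnapshotDB:
    """Toy snapshot isolation: private snapshots plus a write-conflict check at commit."""

    def __init__(self, data):
        self.data = dict(data)
        self.commit_ts = {key: 0 for key in data}  # logical time of last commit per key
        self.clock = 0

    def begin(self):
        return {"start": self.clock, "snapshot": dict(self.data), "writes": {}}

    def read(self, txn, key):
        # A transaction sees its own writes, otherwise its snapshot.
        return txn["writes"].get(key, txn["snapshot"][key])

    def write(self, txn, key, value):
        txn["writes"][key] = value  # visible only to this transaction until commit

    def commit(self, txn):
        for key in txn["writes"]:
            # Someone committed a write to this key after our snapshot began.
            if self.commit_ts.get(key, 0) > txn["start"]:
                raise Conflict(key)
        self.clock += 1
        for key, value in txn["writes"].items():
            self.data[key] = value
            self.commit_ts[key] = self.clock

db = SnapshotDB({"x": 0})
t1, t2 = db.begin(), db.begin()
db.write(t1, "x", 1)
db.write(t2, "x", 2)
db.commit(t2)                 # T2 commits its write to x first
try:
    db.commit(t1)
    outcome = "committed"
except Conflict:
    outcome = "aborted"
print(outcome, db.data["x"])  # aborted 2
```

T1 modified x, T2 committed a write to x after T1's snapshot began and before T1's commit, so T1 must abort.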
31. A Phantom occurs when a transaction does a predicate-based read (e.g.
SELECT… WHERE P) and another transaction writes a data item matched by that
predicate while the first transaction is still in flight.
What are phantom reads?
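A minimal sketch of a phantom, using an in-memory list in place of a table. The rows, the dept column and the helper select_where are all made up for illustration.

```python
rows = [{"name": "walter", "dept": "bowling"}, {"name": "donny", "dept": "bowling"}]

def select_where(predicate):
    """Predicate-based read, like SELECT name ... WHERE P."""
    return [row["name"] for row in rows if predicate(row)]

def in_bowling(row):
    return row["dept"] == "bowling"

first = select_where(in_bowling)                   # T1 runs the predicate read
rows.append({"name": "dude", "dept": "bowling"})   # T2 inserts a matching row and commits
second = select_where(in_bowling)                  # T1 repeats the same read

print(first)   # ['walter', 'donny']
print(second)  # ['walter', 'donny', 'dude']
```

No row that T1 read was changed, so locking the rows it saw would not have helped. The new row matched the predicate, not any existing row.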
32. If each transaction is correct by itself, then a schedule that comprises any serial
execution of these transactions is correct: "Serial" means that transactions do not
overlap in time and cannot interfere with each other, i.e., complete isolation
between each other exists. Any order of the transactions is legitimate, if no
dependencies among them exist. As a result, a schedule that comprises any
execution (not necessarily serial) that is equivalent (in its outcome) to any serial
execution of these transactions, is correct.
What is the serializable isolation level?
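The "equivalent in its outcome to some serial execution" test can be written down directly as a brute-force check. This is a sketch for intuition only. The transactions t1 and t2 and the starting balances are made up, and real databases enforce serializability with locking or conflict detection rather than by enumerating serial orders.

```python
from itertools import permutations

def t1(db):
    """Move 40 from x to y."""
    db["x"] -= 40
    db["y"] += 40

def t2(db):
    """Add 10% interest to x."""
    db["x"] = db["x"] * 11 // 10

def serial_outcomes(initial, transactions):
    """Final states produced by every serial order of the transactions."""
    outcomes = []
    for order in permutations(transactions):
        db = dict(initial)
        for txn in order:
            txn(db)
        outcomes.append(db)
    return outcomes

def is_serializable(final_state, initial, transactions):
    """True if the final state matches the outcome of some serial execution."""
    return final_state in serial_outcomes(initial, transactions)

initial = {"x": 50, "y": 50}
print(serial_outcomes(initial, [t1, t2]))
# [{'x': 11, 'y': 90}, {'x': 15, 'y': 90}]

# An interleaving in which T2's interest was overwritten by T1's stale write.
print(is_serializable({"x": 10, "y": 90}, initial, [t1, t2]))  # False
print(is_serializable({"x": 11, "y": 90}, initial, [t1, t2]))  # True
```

Either serial order is acceptable, which is why both outcomes pass. Serializability does not say which order you get.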
33. Because a serializable execution is equivalent to some serial execution of the transactions, a transaction
doing predicate queries is not affected by other transactions.
Another option is index crabbing.
How are phantom reads handled?
35. Linearizability implies that every operation appears to take place atomically, in some order,
consistent with the real-time ordering of those operations: e.g., if operation A
completes before operation B begins, then B should logically take effect after A.
Points to note:
1. One object
2. Atomic, indivisible steps: all or nothing
3. Real-time constraint: if one operation happens before another, the result should
reflect that order
4. Usually applies to a single object, addressed by one key or a set of keys
What is linearizability?
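For a single register, the definition can be checked by brute force on small histories. The sketch assumes each operation is a tuple (name, kind, value, start, end) with made-up timestamps. It tries every ordering, so it is only usable for a handful of operations.

```python
from itertools import permutations

def is_linearizable(ops, initial=0):
    """Brute-force check for one register. Each op is (name, kind, value, start, end)."""
    for order in permutations(ops):
        # Real-time constraint: an operation that completed before another began
        # must not be placed after it.
        respects_time = all(
            not (later[4] < earlier[3])
            for i, earlier in enumerate(order)
            for later in order[i + 1:]
        )
        if not respects_time:
            continue
        # Replay the order: every read must return the latest written value.
        value, ok = initial, True
        for _, kind, v, _, _ in order:
            if kind == "write":
                value = v
            elif v != value:
                ok = False
                break
        if ok:
            return True
    return False

fresh = [("w", "write", 1, 0, 2), ("r", "read", 1, 3, 4)]
stale = [("w", "write", 1, 0, 2), ("r", "read", 0, 3, 4)]
overlapping = [("w", "write", 1, 0, 5), ("r", "read", 0, 1, 2)]

print(is_linearizable(fresh))        # True
print(is_linearizable(stale))        # False: the write finished before the read began
print(is_linearizable(overlapping))  # True: the read may take effect before the write
```

The stale history is the interesting one. The read returned the old value even though the write had already completed in real time, and no ordering can explain that.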
36. Serializability:
1. Multi object.
2. Multi operations.
3. Arbitrary Order.
4. No overlap in time.
5. No real time constraint.
6. Desired by database community
Linearizability:
1. Single object.
2. Single operation.
3. Real time Order.
4. No overlap in time.
5. Real time constraint.
6. Desired by distributed systems community
38. Baseball as an example for
Replicated database consistency
Source https://jepsen.io/consistency
Source https://www.microsoft.com/en-us/research/wp-content/uploads/2011/10/ConsistencyAndBaseballReport.pdf
39. The game starts with the score of 0-0. The visitors bat first and remain at bat until
they make three outs. Then the home team bats until it makes three outs.
Baseball
45. What is the point, dude?
Identify the various actors of the system.
Based on the actors, identify the problems that different actors of the
system can live with and the problems that are bad for business.
Convert the needed invariants of your system and the anomalies the
system can live with into a consistency model.
Then, please use the suitable database.
Thank You Dudes
46. If, after so many slides, you still feel like saying ….
47. You: I will just use MongoDB. It is a web scale database.
The slides are pretty heavy. At least the first slide should be good, I thought.
All references are from the movie The Big Lebowski, from which the philosophy of Dudeism started.
Over a year ago, when I joined TW, I became more interested in databases. I wanted to enroll in courses and deepen my understanding of databases and of distributed systems in general. I started enrolling in courses and realized that the vocabulary used in academic courses is bizarre. What you think you know is always different from what you really know. It might leave you perplexed.
These words in academic database papers are like street codes, man: without them you cannot hustle.
We all know databases. At least I hope so. I do not want to pull out Wikipedia on you all.
If I want to design a system, I will have an objective for it. There are certain things it must do, certain things it must not do, and other things it should not do but that might be outside its scope.
Invariants are the requirements that a system should not violate.
The last definition we will understand in the next slide.
A simple Ruby program. What is the output? How do we validate that it is correct?
It could be seen as a single register with reads and writes and no concurrency.
There is only one client in the above history.
Histories are how we validate our distributed systems or databases.
From top to bottom.
More consistency from bottom to top.
Implies.
Network Partition
Explain Legend
Dirty writes: the anomaly no one wants.
How can some other transaction overwrite the uncommitted value from another transaction?
First transaction showed the first interest in the value. It should be locked.
The bare minimum you could expect from a database. <Show meme>
T1 is working to withdraw 40 from x and deposit it to y. It will eventually come to 10 + 90, satisfying the x + y = 100 constraint, but T2 is able to observe the intermediate inconsistent state.
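A minimal sketch of that transfer, assuming both accounts start at 50 (the starting balances are not stated here, but they are consistent with the 10 + 90 result):

```python
accounts = {"x": 50, "y": 50}   # invariant: x + y == 100

accounts["x"] -= 40                          # T1 withdraws 40 from x (uncommitted)
seen_by_t2 = accounts["x"] + accounts["y"]   # T2 reads both values mid-transfer
accounts["y"] += 40                          # T1 deposits 40 into y and commits

print(seen_by_t2)                            # 60: T2 saw a state that breaks the invariant
print(accounts["x"] + accounts["y"])         # 100: the committed state is fine
```

The committed state never violates the constraint. Only the uncommitted intermediate state does, and that is the state T2 read.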
Cursor stability is a consistency model which strengthens read committed by preventing lost updates. It introduces the concept of a cursor, which refers to a particular object being accessed by a transaction. Transactions may have multiple cursors. When a transaction reads an object using a cursor, that object cannot be modified by any other transaction until the cursor is released, or the transaction commits.
SELECT FOR UPDATE
First transaction showed the first interest in the value. It should be locked.
All the previous anomalies were due to the fact that transactions saw something they were not supposed to see, or saw something from the future or the past. Let’s break the word down.
Give an example. Transaction saw something and things changed.
First transaction showed the first interest in the value. It should be locked.
Give brief explanation. Timestamps and stuff. <Show meme>
How many of us remember this from college? It provides higher isolation, as it only interleaves unrelated transactions. The transactions appear to execute in some serial order. But in what order? To which linearizability will say <show meme>. It solves phantom reads.
Serializability: different kinds of operations, multi-operation, multi-object, arbitrary total order. If the result of an order is equivalent to that of a serial order, it is correct.
Serializability: Give me a real-time constraint
Linearizability: Give me multi-object