Transaction
Sanjoy Roy
A question for you?
What isolation level you have in
your (operational) database?
If you are using Oracle and
isolation level is
READ COMMITTED
then the following anomalies
can happen:
In database systems many things can go
wrong:
Q uer y + update Q uer y + update
Can transaction help
to simplify these issues?
rea
dwrite
rea
dwrite
What is transaction?
1975 IBM System R
NoSQL Transactions
When dealing with
transactions it is important to
understand exactly what
safety guarantees it can
provide and associated cost.
ACID
A C I D
Not about concurrency
A C I D
about handling faults (crashes)
Update listing
set buyer = John
Insert into
invoices …
Ok
TimeDB
Alice
Any of these fault can happen:
 Crash/power failure
 Deadlock
 Network fault
 Constraint violation
 …..
Update listing
set buyer = John
Insert into
invoices …
Ok
Time
Alice
DB
Transactions =
Multi-objects atomicity
Roll back writes on abort
Any of these fault can happen:
 Crash/power failure
 Deadlock
 Network fault
 Constraint violation
 …..
Abort
A C I D
consistency
ACID CAP
A C I D
ConcurrencyIsolation
Alice
Bob
Get counter
42
Get counter
42
[ 42 + 1 = 43 ]
[ 42 + 3 = 45 ]
Set counter 43
OK
Set counter 45
OK
Time
DB
Isolation is the trickiest part,
more on this in later slides.
A C I D
durability
Durability means the promise
that any data is committed
successfully (that it has written)
during the transaction will not be
forgotten even if database
crashes.
Durability
Archive Tape fsync to Disk Replication
Isolation levels
Serializable
Repeatable Read Snapshot Isolation
Read committed
Read uncommitted
“Weaker than”
Serializable
Serializable isolation means that the
database guarantees that transactions
have the same effect as if they ran
serially, i.e. one at a time, without any
concurrency.
Serializability
vs
Linearizability
Linearizability Serializability
It is a guarantee about single
operations on single objects.
It provides a real-time (i.e.,
wall-clock) guarantee on the
behavior of a set of single
operations (often reads and
writes) on a single object
(e.g., distributed register or
data item).
It is a guarantee about
transactions, or groups of one or
more operations over one or
more objects.
It guarantees that the execution of
a set of transactions (usually
containing read and write
operations) over multiple items is
equivalent to some serial
execution (total ordering) of the
transactions.
Serializable isolation has a
performance cost
and many databases don’t want
to pay that price.
HAT, not CAP: Towards Highly Available Transactions
Peter Bailis et al, 2013
READ COMMITTED
Most basic level transaction isolation.
It makes two guarantees:
No dirty reads No dirty writes
Default in Postgres, Oracle, and SQL Server
Dirty reads
Alice
Bob
DB
TimeGet X
2
Set X = 3
Ok
Get X
3
Commit
Dirty Read
No dirty reads
Alice
Bob
DB
Time
Get X
2
Set X = 3
Ok
2
Commit
3
Get X Get X
Dirty writes
Alice
Bob
Lis ting
Time
Update listing set
buyer = Alice where
id = 1
Ok
Invoic e
Update listing set
buyer = Bob
where id = 1
O
k
Update invoice set
recipient = Bob where
listing_id = 1
O
k
Update invoice set
recipient = Alice where
listing_id = 1
Ok
Commit
Commit
Read skew can happen under read committed
X = 500
Y= 500
Backup
Set X = X - 100
Ok
Set Y = Y + 100
Ok
Get Y
500
Commit
400
600
Get X
400
400 + 600 = 1000
500 + 400 = 900
Account 1
Account 2
Alice
Time
Read skew can be prevented by
Snapshot Isolation/MVCC
Postgres “Repeatable Read”
MySQL “Repeatable Read” – for read only transactions
SQL Server “Snapshot”
Oracle “Serializable”
Snapshot Isolation
Snapshot Isolation is a guarantee that all reads
made in a transaction will see a consistent
snapshot of the database (in practice it reads the
last committed values that existed at the time it
started), and the transaction itself will
successfully commit only if no updates it has
made conflict with any concurrent updates made
since that snapshot.
Snapshot isolation is a popular feature: it is supported by PostgreSQL, MySQL with the InnoDB storage engine, Oracle, SQL
Server, and more.
In Snapshot Isolation
readers never block writers
writers never block readers.
Alice
Bob
DB
Time
Get X 2
Lost update problem
Get X 2
Set X = 4
Ok
Set X = 6
X = 2
Ok
Here Alice’s change
is lost.
Solutions to lost update problem
 Atomic Operations
 Explicit Locking
 Automatically detecting lost updates
 Compare-and-set
 Conflict resolution and replication
Write skew
A transaction reads something, makes a
decision based on the value it saw, and writes
the decision to the database. However, by the
time the write is made, the premise of the
decision is no longer true.
Write skew example
Account Amount
Alice £70
Bob £80
Constraint
Either account can be overdrawn if
Alice’s Amount + Bob’s Amount > 0
Write skew example
Account Amount
Alice - £30
Bob £80
Constraint
Either account can be overdrawn if
Alice’s Amount + Bob’s Amount > 0
Transaction 1
Alice withdraws £100
Permitted, 70 + 80 = £150
Write skew example
Account Amount
Alice £70
Bob - £20
Constraint
Either account can be overdrawn if
Alice’s Amount + Bob’s Amount > 0
Transaction 2
Bob withdraws £100
Permitted, 70 + 80 = £150
Write skew example
Account Amount
Alice £70
Bob £80
Account Amount
Alice - £30
Bob £80
Account Amount
Alice £70
Bob - £20
Transaction 1
Alice withdraws £100
Permitted, 70 + 80 = £150
Transaction 2
Bob withdraws £100
Permitted, 70 + 80 = £150
Account Amount
Alice - £30
Bob - £20
Solutions to write skew
Serializable
Lock everything you read
( 2 PL, pessimistic)
Literally serial order
(H-store, VoltDB)
Detect conflict & abort
(SSI, optimistic)
Non repeatable read
A transaction retrieves a row twice and
the values within the row differ between
the reads.
Non repeatable read example
Transaction 1
/* Query 1 */
SELECT * FROM users WHERE id = 1;
/* Query 1 */
SELECT * FROM users WHERE id = 1; COMMIT;
/* lock-based REPEATABLE READ */
Transaction 2
/* Query 2 */
UPDATE users SET age = 21 WHERE id =
1; COMMIT;
/* in multiversion concurrency control, or
lock-based READ COMMITTED */
Phantom reads
A transaction reads rows that match
some search condition. Another client
makes a write that affects the results of
that search.
Phantom read example
Transaction 1
/* Query 1 */
SELECT * FROM users
WHERE age BETWEEN 10 AND 30;
/* Query 1 */
SELECT * FROM users
WHERE age BETWEEN 10 AND 30;
Transaction 2
/* Query 2 */
INSERT INTO users(id,name,age)
VALUES ( 3, 'Bob', 27 ); COMMIT;
This search condition will give
different result now.
Summary
• Without transactions, data can become inconsistent in
various error scenarios.
• Transactions can hugely reduce the number of potential
error cases when there is a complex database access pattern.
A large class of errors is reduced down to a simple
transaction abort, and the application just needs to try again.
• There are several widely used isolation levels, in particular
read committed, snapshot isolation (sometimes called
repeatable read) and serializable.
• Different race conditions can happen: dirty reads, dirty
writes, read skew, lost updates, write skews, phantom reads.

Transaction

  • 1.
  • 2.
    A question foryou? What isolation level you have in your (operational) database?
  • 3.
    If you areusing Oracle and isolation level is READ COMMITTED then the following anomalies can happen:
  • 7.
    In database systemsmany things can go wrong: Q uer y + update Q uer y + update
  • 8.
    Can transaction help tosimplify these issues?
  • 9.
  • 10.
  • 11.
  • 12.
    When dealing with transactionsit is important to understand exactly what safety guarantees it can provide and associated cost.
  • 13.
  • 14.
    A C ID Not about concurrency
  • 15.
    A C ID about handling faults (crashes)
  • 16.
    Update listing set buyer= John Insert into invoices … Ok TimeDB Alice
  • 17.
    Any of thesefault can happen:  Crash/power failure  Deadlock  Network fault  Constraint violation  …..
  • 18.
    Update listing set buyer= John Insert into invoices … Ok Time Alice DB
  • 19.
  • 20.
    Any of thesefault can happen:  Crash/power failure  Deadlock  Network fault  Constraint violation  ….. Abort
  • 21.
    A C ID consistency
  • 22.
  • 23.
    A C ID ConcurrencyIsolation
  • 24.
    Alice Bob Get counter 42 Get counter 42 [42 + 1 = 43 ] [ 42 + 3 = 45 ] Set counter 43 OK Set counter 45 OK Time DB
  • 25.
    Isolation is thetrickiest part, more on this in later slides.
  • 26.
    A C ID durability
  • 27.
    Durability means thepromise that any data is committed successfully (that it has written) during the transaction will not be forgotten even if database crashes.
  • 28.
    Durability Archive Tape fsyncto Disk Replication
  • 29.
    Isolation levels Serializable Repeatable ReadSnapshot Isolation Read committed Read uncommitted “Weaker than”
  • 30.
    Serializable Serializable isolation meansthat the database guarantees that transactions have the same effect as if they ran serially, i.e. one at a time, without any concurrency.
  • 31.
  • 32.
    Linearizability Serializability It isa guarantee about single operations on single objects. It provides a real-time (i.e., wall-clock) guarantee on the behavior of a set of single operations (often reads and writes) on a single object (e.g., distributed register or data item). It is a guarantee about transactions, or groups of one or more operations over one or more objects. It guarantees that the execution of a set of transactions (usually containing read and write operations) over multiple items is equivalent to some serial execution (total ordering) of the transactions.
  • 33.
    Serializable isolation hasa performance cost and many databases don’t want to pay that price.
  • 34.
    HAT, not CAP:Towards Highly Available Transactions Peter Bailis et al, 2013
  • 35.
    READ COMMITTED Most basiclevel transaction isolation. It makes two guarantees: No dirty reads No dirty writes Default in Postgres, Oracle, and SQL Server
  • 36.
    Dirty reads Alice Bob DB TimeGet X 2 SetX = 3 Ok Get X 3 Commit Dirty Read
  • 37.
    No dirty reads Alice Bob DB Time GetX 2 Set X = 3 Ok 2 Commit 3 Get X Get X
  • 38.
    Dirty writes Alice Bob Lis ting Time Updatelisting set buyer = Alice where id = 1 Ok Invoic e Update listing set buyer = Bob where id = 1 O k Update invoice set recipient = Bob where listing_id = 1 O k Update invoice set recipient = Alice where listing_id = 1 Ok Commit Commit
  • 39.
    Read skew canhappen under read committed X = 500 Y= 500 Backup Set X = X - 100 Ok Set Y = Y + 100 Ok Get Y 500 Commit 400 600 Get X 400 400 + 600 = 1000 500 + 400 = 900 Account 1 Account 2 Alice Time
  • 40.
    Read skew canbe prevented by Snapshot Isolation/MVCC Postgres “Repeatable Read” MySQL “Repeatable Read” – for read only transactions SQL Server “Snapshot” Oracle “Serializable”
  • 41.
    Snapshot Isolation Snapshot Isolationis a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice it reads the last committed values that existed at the time it started), and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot. Snapshot isolation is a popular feature: it is supported by PostgreSQL, MySQL with the InnoDB storage engine, Oracle, SQL Server, and more.
  • 42.
    In Snapshot Isolation readersnever block writers writers never block readers.
  • 43.
    Alice Bob DB Time Get X 2 Lostupdate problem Get X 2 Set X = 4 Ok Set X = 6 X = 2 Ok Here Alice’s change is lost.
  • 44.
    Solutions to lostupdate problem  Atomic Operations  Explicit Locking  Automatically detecting lost updates  Compare-and-set  Conflict resolution and replication
  • 45.
    Write skew A transactionreads something, makes a decision based on the value it saw, and writes the decision to the database. However, by the time the write is made, the premise of the decision is no longer true.
  • 46.
    Write skew example AccountAmount Alice £70 Bob £80 Constraint Either account can be overdrawn if Alice’s Amount + Bob’s Amount > 0
  • 47.
    Write skew example AccountAmount Alice - £30 Bob £80 Constraint Either account can be overdrawn if Alice’s Amount + Bob’s Amount > 0 Transaction 1 Alice withdraws £100 Permitted, 70 + 80 = £150
  • 48.
    Write skew example AccountAmount Alice £70 Bob - £20 Constraint Either account can be overdrawn if Alice’s Amount + Bob’s Amount > 0 Transaction 2 Bob withdraws £100 Permitted, 70 + 80 = £150
  • 49.
    Write skew example AccountAmount Alice £70 Bob £80 Account Amount Alice - £30 Bob £80 Account Amount Alice £70 Bob - £20 Transaction 1 Alice withdraws £100 Permitted, 70 + 80 = £150 Transaction 2 Bob withdraws £100 Permitted, 70 + 80 = £150 Account Amount Alice - £30 Bob - £20
  • 50.
    Solutions to writeskew Serializable Lock everything you read ( 2 PL, pessimistic) Literally serial order (H-store, VoltDB) Detect conflict & abort (SSI, optimistic)
  • 51.
    Non repeatable read Atransaction retrieves a row twice and the values within the row differ between the reads.
  • 52.
    Non repeatable readexample Transaction 1 /* Query 1 */ SELECT * FROM users WHERE id = 1; /* Query 1 */ SELECT * FROM users WHERE id = 1; COMMIT; /* lock-based REPEATABLE READ */ Transaction 2 /* Query 2 */ UPDATE users SET age = 21 WHERE id = 1; COMMIT; /* in multiversion concurrency control, or lock-based READ COMMITTED */
  • 53.
    Phantom reads A transactionreads rows that match some search condition. Another client makes a write that affects the results of that search.
  • 54.
    Phantom read example Transaction1 /* Query 1 */ SELECT * FROM users WHERE age BETWEEN 10 AND 30; /* Query 1 */ SELECT * FROM users WHERE age BETWEEN 10 AND 30; Transaction 2 /* Query 2 */ INSERT INTO users(id,name,age) VALUES ( 3, 'Bob', 27 ); COMMIT; This search condition will give different result now.
  • 55.
    Summary • Without transactions,data can become inconsistent in various error scenarios. • Transactions can hugely reduce the number of potential error cases when there is a complex database access pattern. A large class of errors is reduced down to a simple transaction abort, and the application just needs to try again. • There are several widely used isolation levels, in particular read committed, snapshot isolation (sometimes called repeatable read) and serializable. • Different race conditions can happen: dirty reads, dirty writes, read skew, lost updates, write skews, phantom reads.

Editor's Notes

  • #6 on March 2nd, 2014, the Flexcoin Bitcoin exchange was subject to such a concurrency-related attack: The attacker. . . successfully exploited a flaw in the code which allows transfers between Flexcoin users. By sending thousands of simultaneous requests, the attacker was able to “move” coins from one user account to another until the sending account was overdrawn, before balances were updated. This was then repeated through multiple accounts, snowballing the amount, until the attacker withdrew the coins. As a result of this attack, all Bitcoins stored in the Flexcoin exchange were stolen, all users lost their stored Bitcoins, and the exchange was forced to shut down.
  • #7 This presentation is mostly based on the chapter 7 (Transactions) of this book. Every developer should read this book. Martin has explained many complex topics in easy and beautiful way in this book.
  • #8 Different ways database systems can go wrong: The database system software and hardware may fail at any time (including in the middle of a write operation) The application may crash at any time (including halfway through a series of operations) Network disruption can unexpectedly cut off the application from the database or from one node from another Several clients may write to the database at the same time, overwriting each other changes A client may read partially updated data which does not make sense Race conditions between clients can create surprising bugs In order to be reliable system has to deal with these faults, but it requires lot of work, lot of careful thinking, and also lot of testing to ensure the system actually works.
  • #9 For decades, transactions have been the mechanism of choice for simplifying these issues.
  • #10 Transaction is a way for an application to group several reads and writes together into a logical unit. Conceptually all reads and writes in a transaction are executed as one operation: either the entire transaction succeeds (commits) or fails (abort, rollback). If it fails, the application can safely retry. Transactions were created in order to simplify the programming model for the applications accessing a database. By using transactions, application is free to ignore certain potential error scenarios and concurrency issues because database takes care of them instead (these are called safety guarantees).
  • #11 Most of relational databases today follow the transaction style that was introduced in 1975 by IBM System R, the first SQL database. Although some implementation might have changed but the general idea behind the transaction support in MySQL, PostgreSQL, Oracle, SQL server is similar to System R.
  • #12 Many new generation databases have abandoned transactions entirely or redefined the word “transaction” to describe a much weaker set of guarantees that it had previously been understood. Transactions have advantages and limitations. In order to understand these trade-offs we need to know the details of the guarantees that transactions can provide both in normal operation, and in various extreme (but realistic) circumstances.
  • #13 Even though at first look transactions seem straight forward but there are many subtle but important details that come into play when dealing with transaction.
  • #14 The safety guarantees provided by transactions are often described by the well- known acronym ACID, which stands for Atomicity, Consistency, Isolation and Durability. It was coined in 1983 by Theo Härder and Andreas Reuter in an effort to establish precise terminology for fault-tolerance mechanisms in databases. In practice one database’s implementation of ACID does not equal another’s implementation. There is a lot of ambiguity around the meaning of isolation. Today when a system claims to be “ACID compliant”, it is not always clear what guarantees it can provide. Systems that are do not meet ACID criteria are sometimes called BASE (Basically Available, Soft state, Eventual consistency).
  • #15 In general, atomic refers to something that cannot be broken down into smaller parts. ACID atomicity is not about concurrency. It does not describe what happens if several processes try to access the same data at the same time.
  • #16 Instead, ACID atomicity describes what happens if a client wants to make several writes, but a fault occurs after some of the writes have been processed.
  • #17 In this example, Alice is selling a product to John. In this flow, many things can go wrong.
  • #20 If the writes are grouped together into an atomic transaction, and the transaction cannot be completed (committed) due to a fault, then the transaction is aborted and the database must discard or undo any writes it has made so far in that transaction. Without atomicity, if an error occurs part way through making multiple changes, it’s difficult to know which changes have taken effect and which haven’t.
  • #23 In CAP theorem, C = consistency means linearizability. In ACID, C = consistency refers to an application specific notation of the database is in a good state. The idea of the ACID consistency is that there are some statements about your data (invariants) that must always be true, like customer id must always be unique, customer’s balance must not be negative, in accounting system debits and credits across all accounts must always be balanced etc. If a transaction starts with a database that is valid according to some invariants and any write during the transaction preserves these invariants then it can be said that these invariants are satisfied.
  • #24 In ACID, Isolation means concurrently executing transactions are isolated from each other. Classic database textbooks formalized isolation as serializability which tells that each transaction can pretend that it is the only transaction running on the entire database. The database ensures that when the transactions have committed, the result is the same as if they had run serially (one after another), even though in reality they may have run concurrently.
  • #25 Most databases are accessed by several clients at the same time. This is not a problem if they are reading and writing different parts of the database, but if they are accessing the same database records, there is a possibility to run into concurrency problems (race conditions). Here a race condition between two clients concurrently updating a counter.
  • #29 Historically durability means writing to an archive tape. On a single node database system, it means that data has been written to an non-volatile storage such as hard disk or SSD. On a replicated database system, durability may mean data has been successfully copied to some other nodes. In order to provide durability guarantee database system must wait until these writes or replications complete before reporting a transaction as successfully commit. Perfect durability does not exist: if all your hard disks and all your backups are destroyed at the same time, there’s obviously nothing your database can do to save you.
  • #30 Concurrency issues (race conditions) only come into play when one transaction reads data that is concurrently modified by another transaction, or when two transactions try to simultaneously modify the same data. Concurrency bugs are hard to find by testing, concurrency is also very difficult to reason about, especially in a large application. Also application has many concurrent users, so any piece of data could unexpectedly change at any time. For this reason, databases have long tried to hide concurrency issues from application developers by providing transaction isolation.
  • #31 Serializable isolation is usually regarded as the strongest isolation level. It guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, serially, without any concurrency. Thus, the database guarantees that if the transactions behave correctly when run individually, they continue to be correct when run concurrently. In other words, the database prevents all possible race conditions.
  • #32 Serializability and linearizability are both important properties about interleaving of operations in databases and distributed systems. It is easy to get them confused.
  • #33 In plain English, under linearizability, writes should appear to be instantaneous. Imprecisely, once a write completes, all later reads (where “later” is defined by wall-clock start time) should return the value of that write or the value of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write. Serializability is the traditional “I” or isolation, in ACID. If users’ transactions each preserve application correctness (“C” or consistency, in ACID), a serializable execution also preserves correctness. Therefore, serializability is a mechanism for guaranteeing database correctness. Unlike linearizability, serializability does not - by itself - impose any real-time constraints on the ordering of transactions. Serializability is also not composable. Serializability does not imply any kind of deterministic order—it simply requires that some equivalent serial execution exists.
  • #34 In practice, Isolation is unfortunately not that simple. Most databases that provide serializability today use one of three techniques: 1. Actual serial execution: executing only one transaction at a time, in serial order, on a single thread. Executing all transactions serially makes concurrency control much simpler, but limits the transaction throughput of the database to the speed of a single CPU core on a single machine. Read-only transactions may execute elsewhere, using snapshot isolation, but for applications with high write throughput the single-threaded transaction processor can become a serious bottleneck. 2. Two phase locking: Here several transactions are allowed to concurrently read the same object as long as nobody is writing to it. But as soon as anyone wants to write (modify or delete) an object, exclusive access is required. In 2PL, writers don’t just block other writers, but a reader must also block a writer, and vice versa. Two types of lock are used: shared lock and exclusive lock. Transaction throughput and response times of queries are significantly worse under two-phase locking. This is due to the overhead of acquiring and releasing the locks and reduced concurrency. 3. Optimistic concurrency control techniques such as serializable snapshot isolation (SSI): SSI is a fairly new algorithm that avoids most of the downsides of the previous approaches. It uses an optimistic approach, allowing transactions to proceed without blocking. When a transaction wants to commit, it is checked, and aborted if the execution was not serializable. SSI provides full serializability, but has only a small performance penalty compared to snapshot isolation. Today SSI is used both in single-node databases (the serializable isolation level in PostgreSQL since version 9.1) and distributed databases (like CockroachDB).
  • #36 Some databases support an even weaker isolation level called Read Uncommitted which prevents dirty writes, but does not prevent dirty reads.
  • #37 Imagine one transaction has written some data to the database, but has not yet committed or aborted. Can another transaction see that uncommitted data? If yes, that is called a dirty read. Read committed isolation level must prevent dirty reads. This means that any writes by a transaction only become visible to others when that transaction commits. There are a few reasons why it is useful to prevent dirty reads: If a transaction needs to update several objects, a dirty read means that another transaction may see some of the updates but not others. If a transaction aborts, any writes it has made need to be rolled back. If the database allows dirty reads, that means a transaction may see data that was later rolled back, i.e. which was never actually committed to the database.
  • #39 In this example buyer is Bob but invoice recipient is Alice. In dirty write, later write is overwriting an uncommitted value. With dirty writes, conflicting writes from different transactions can be mixed up. Read Committed isolation level must prevent dirty writes, usually by delaying the second write until the first write’s transaction has committed or aborted.
  • #40 Read skew is also called non repeatable read.
  • #41 Snapshot Isolation is being called by many databases in different names. Oracle calls it Serializable whereas Postgres and MySQL call it Repeatable Read. The reason for this naming confusion is that the SQL standard doesn’t have the concept of snapshot isolation, because the standard is based on System R’s 1975 definition of isolation levels and snapshot isolation hadn’t yet been invented then. Instead, it defines repeatable read, which looks similar to snapshot isolation. However, there has been a formal definition of repeatable read in the research literature but most implementations don’t satisfy that formal definition. And to top it off, IBM DB2 uses repeatable read to refer to serializability. As a result, nobody really knows what repeatable read means.
  • #43 In snapshot isolation write locks are used to prevent dirty writes. This means a transaction that holds the write lock prevent another transaction to write on the same object. However reads do not require locking. It performs better than serializability and also at the same time avoids most of concurrency anomalies that serializability avoids. Snapshot isolation is implemented by using a technique called Multi Version Concurrency Control (MVCC). In snapshot isolation the database must potentially keep several different committed versions of an object, because various in-progress transactions may need to see the state of the database at different points in time. Because it maintains several versions of an object side-by-side, this technique is known as multi‐version concurrency control (MVCC).
  • #44 The lost update problem can occur if an application reads some value from the database, modifies it, and writes back the modified value (a read-modify-write cycle). If two transactions do this concurrently, one of the modifications can be lost, because the second write does not include the first modification. Examples: 1. Incrementing a counter or updating an account balance. 2. Making a local change to a complex value, e.g. adding an element to a list within a JSON document. 3. Two users editing a wiki page at the same time, where each user saves their changes by sending the entire page contents to the server, overwriting whatever is currently in the database.
  • #45 Atomic operations are usually implemented by taking an exclusive lock on the object when it is read, so that no other transaction can read it until the update has been applied. This technique is sometimes known as cursor stability. Explicit locking is done by the application itself. For example, application can use “FOR UPDATE” clause in query. The “FOR UPDATE” clause indicates that the database should take a lock on all rows returned by the query. This should be done carefully otherwise race condition can happen. Some databases automatically detect lost updates. For example, PostgreSQL’s repeatable read, Oracle’s serializable and SQL Server’s snapshot isolation levels automatically detect when a lost update has occurred, and abort the offending transactions. In this case, when transaction manager detects that a lost update occurred, the transaction is aborted and must retry its read-modify-write cycle. In Compare-and-set, an update is only allowed to happen if the value has not changed since it was last read. If the current value does not match what was being previously read, the update has no effect, and the read-modify-write cycle must be retried. Locks and Compare-and-set will not work in replicated databases. Because databases with multi-leader or leaderless replication usually allow several writes to happen concurrently and replicate them asynchronously, so they cannot guarantee that there is a single up-to-date copy of the data which locks and compare-and-set based technique assume. A common approach in such replicated databases is to allow concurrent writes to create several conflicting versions of a value (also known as siblings), and to use application code or special data structures to resolve and merge these versions after the fact. Atomic operations can work well in a replicated context, especially if they are commutative (i.e. you can apply them in a different order on different replicas, and still get the same result). Riak 2.0 datatypes use atomic operations to prevent lost updates. Unfortunately, Last Write Wins (LWW) is the default conflict resolution technique used in many replicated databases which is prone to Lost Updates.
  • #50 Each update is safe by itself, but the final commit state violates the constraint Alice’ Amount + Bob’s Amount > 0. This problem was not detected by First Committer Wins because two different data items were updated, each under the assumption that the other remained stable. Hence the name "Write Skew". Other examples: meeting room booking system, Multiplayer game, claiming username Write skew is not dirty writes or lost updates.
  • #53 In this example, Transaction 2 commits successfully, which means that its changes to the row with id 1 should become visible. However, Transaction 1 has already seen a different value for age in that row. 
  • #54 This effect, where a write in one transaction changes the result of a search query in another transaction, is called a phantom.
  • #55 Phantom read can occur when range locks are not acquired on performing a SELECT ... WHERE operation. The phantom reads anomaly is a special case of Non-repeatable reads when Transaction 1 repeats a ranged SELECT ... WHERE query and, between both operations, Transaction 2 creates (i.e. INSERT) new rows (in the target table) which fulfill that WHERE clause. Use serializable isolation level to overcome this problem.