Distributed transactions
- ARITRA DAS
Database transactions
A group of operations executed as a single, independent unit
for data retrieval or updates.
States
ACID properties
➔ Atomicity
➔ Consistency
➔ Isolation
➔ Durability
Consistency
➔ Constraints are satisfied
➔ Data integrity is maintained
Durability
➔ Changes that have been committed to the database should
remain even in the case of software and hardware failure
Atomicity
➔ All or nothing
➔ Abortability
➔ Log for crash recovery
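
A minimal sketch of all-or-nothing behaviour, using Python's built-in sqlite3 module. The accounts table, the transfer function and the balance check are assumptions made for illustration; they are not from the slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # abort the transaction
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # commits
failed = transfer(conn, "alice", "bob", 500)  # rolls back both updates
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, neither account reflects the aborted updates.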
Isolation
➔ Concurrently running transactions shouldn’t interfere
with each other.
Dirty write
➔ One transaction overwrites uncommitted data of another
transaction
Dirty read
➔ One transaction sees uncommitted data of a different
transaction
Non-repeatable reads
➔ Repeated reads of the same object within one transaction
return different values.
Phantom reads
➔ A write in one transaction changes the set of rows matched
by a search condition that another transaction depends on
Isolation levels
Read uncommitted
➔ Transaction takes a write-lock on the row whose data it’s
modifying.
➔ Prevents dirty write.
➔ Highest performance, since no read lock.
Read committed
➔ Maintains the last committed value in memory.
➔ Read operations are given the last committed value (no
dirty reads)
➔ Write operations take a row-level write-lock (no dirty
writes)
Snapshot isolation
➔ Each transaction sees the data as it was when the
transaction started.
➔ Database maintains several versions of the same data
(multi-version concurrency control, MVCC)
➔ Takes write-lock.
➔ Prevents dirty writes, dirty reads, non-repeatable reads.
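
The multi-version idea can be sketched with a toy in-memory store. MVCCStore and its method names are hypothetical, and the sketch leaves out write locks and uncommitted data; it only shows why a transaction keeps seeing the snapshot it started with.

```python
class MVCCStore:
    """Toy multi-version store: each key keeps a list of (commit_ts, value)."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical timestamp of the latest commit

    def begin(self):
        # A transaction's snapshot is the commit timestamp it started at.
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def commit_write(self, key, value):
        # Append a new version instead of overwriting the old one.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

store = MVCCStore()
store.commit_write("x", 1)
t1 = store.begin()             # t1's snapshot is taken here
store.commit_write("x", 2)     # another transaction commits a new version
first = store.read("x", t1)    # t1 still sees the old value
second = store.read("x", t1)   # repeated read returns the same value
t2 = store.begin()
latest = store.read("x", t2)   # a later transaction sees the new value
```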
Serializable
➔ Transactions may execute in parallel, but the end result
is the same as if they had executed one at a time, serially,
without any concurrency.
➔ Complex and slower than others.
Serializable isolation techniques
➔ Execute in serial order
➔ Two-phase locking (2PL)
◆ Shared lock for reading, exclusive lock for writing
◆ Pessimistic
➔ Serializable snapshot isolation
◆ No locks, database checks for conflicts when commit attempt is made.
◆ In case of a conflict transactions are aborted.
◆ Optimistic
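
A simplified sketch of the optimistic approach: transactions run without locks, and a conflict check happens when commit is attempted. This is a per-key version check, not the full serializable snapshot isolation algorithm; the Database, Transaction and ConflictError names are hypothetical.

```python
class ConflictError(Exception):
    """Raised when a transaction's reads are stale at commit time."""

class Database:
    def __init__(self):
        self.data = {}  # key -> (version, value)

class Transaction:
    def __init__(self, db):
        self.db = db
        self.read_versions = {}  # key -> version seen at first read
        self.writes = {}         # buffered writes, applied at commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        version, value = self.db.data.get(key, (0, None))
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Abort if anything this transaction read has changed since.
        for key, seen in self.read_versions.items():
            current, _ = self.db.data.get(key, (0, None))
            if current != seen:
                raise ConflictError(key)
        for key, value in self.writes.items():
            version, _ = self.db.data.get(key, (0, None))
            self.db.data[key] = (version + 1, value)

db = Database()
db.data["counter"] = (1, 10)
t1, t2 = Transaction(db), Transaction(db)
t1.write("counter", t1.read("counter") + 1)
t2.write("counter", t2.read("counter") + 1)
t1.commit()
try:
    t2.commit()
    t2_aborted = False
except ConflictError:
    t2_aborted = True  # t2 would have to be retried by the application
```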
Isolation levels on different dbs
Distributed systems
➔ The nodes operate concurrently.
➔ The nodes fail independently.
➔ The nodes do not share a global clock.
Distributed transactions
➔ Transactions in a distributed system, spanned across two
or more nodes.
➔ Transactions involving two or more nodes that are separated
by a network and can therefore be partitioned from each other.
2 phase commit
➔ Provides atomicity
➔ Synchronous
➔ Requires a global coordinator (most of the time)
➔ 2 Phases
◆ Prepare
◆ Commit
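
The two phases can be sketched in-process as follows. The Participant class and the two_phase_commit function are hypothetical; real participants would be remote resource managers that persist their prepared state before voting.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        # Vote yes only if the local work can be committed later.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"

def two_phase_commit(participants):
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit): commit only if every vote was yes, else abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMITTED"
    for p in participants:
        p.abort()
    return "ABORTED"

happy = [Participant("P1"), Participant("P2")]
happy_result = two_phase_commit(happy)

unhappy = [Participant("P1"), Participant("P2", can_commit=False)]
unhappy_result = two_phase_commit(unhappy)
```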
2PC failures: in prepare
2PC failures: after prepare
1. P1 -> Prepared ACK
2. P2 -> Prepared ACK
3. P1 -> Committed ACK
4. P2 -> Commit fails
5. Coordinator -> Retry indefinitely
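
Step 5 can be sketched as a retry loop. FlakyParticipant and commit_until_acknowledged are hypothetical names; the sketch assumes that commit is idempotent, so retrying it is safe, and it omits the back-off a real coordinator would use between attempts.

```python
class FlakyParticipant:
    """Participant whose commit fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success):
        self.failures_left = failures_before_success
        self.state = "PREPARED"

    def commit(self):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("commit not acknowledged")
        self.state = "COMMITTED"

def commit_until_acknowledged(participant):
    # Once every participant has voted yes, the decision is final:
    # the coordinator cannot abort, so it keeps retrying the commit.
    attempts = 0
    while True:
        attempts += 1
        try:
            participant.commit()
            return attempts
        except ConnectionError:
            continue  # a real coordinator would wait before retrying

p2 = FlakyParticipant(failures_before_success=2)
attempts = commit_until_acknowledged(p2)
```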
2PC failures: coordinator failure
➔ Before prepare
◆ Clients abort
➔ After getting ACK from participants
◆ Participants wait for the coordinator to come back up
◆ Coordinator comes back, reads the log and acts on it
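
A sketch of what "reads the log and acts on it" could look like. The log format and the recover function are hypothetical, and the rule that a transaction with no logged commit decision is aborted is an assumption (the presumed-abort convention), not something stated on the slide.

```python
def recover(log, participants):
    """Replay the coordinator log after a restart.

    log: list of (txn_id, record) tuples, where record is
    "PREPARE", "COMMIT" or "ABORT", in the order they were written.
    participants: list of dicts that receive the final decision per txn_id.
    """
    last_record = {}
    for txn_id, record in log:
        last_record[txn_id] = record  # the latest record for a txn wins

    outcomes = {}
    for txn_id, record in last_record.items():
        # Assumption: no logged COMMIT means the decision was never made,
        # so aborting is safe.
        decision = "COMMIT" if record == "COMMIT" else "ABORT"
        for participant in participants:
            participant[txn_id] = decision  # re-send the decision
        outcomes[txn_id] = decision
    return outcomes

log = [("t1", "PREPARE"), ("t1", "COMMIT"), ("t2", "PREPARE")]
p1, p2 = {}, {}
outcomes = recover(log, [p1, p2])
```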
2PC benefits
➔ Guarantees atomicity
➔ Provides read-write isolation
➔ Provides strong consistency
2PC disadvantages
➔ Synchronous and Blocking
➔ Hold locks
➔ Some problems in 2PC are addressed by 3PC
SAGA
➔ Async and reactive
➔ Communication over message bus
➔ Compensating transactions on failure
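
A minimal in-process sketch of compensating transactions. The run_saga function and the order, stock and payment steps are hypothetical, and the message bus is left out: each step is paired with a compensation, and on failure the compensations of the completed steps run in reverse order.

```python
def run_saga(steps):
    """steps: list of (action, compensation) callables, run in order."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Undo the already-completed steps, most recent first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True

log = []

def fail_payment():
    raise RuntimeError("payment declined")

steps = [
    (lambda: log.append("order created"), lambda: log.append("order cancelled")),
    (lambda: log.append("stock reserved"), lambda: log.append("stock released")),
    (fail_payment, lambda: log.append("payment refunded")),
]
succeeded = run_saga(steps)
```

The failed step is not compensated, because it never took effect; only the two steps before it are undone.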
Compensating transaction
Saga pros and cons
➔ Pros
◆ Async, non-blocking
◆ Atomicity (through compensating transactions)
➔ Cons
◆ No isolation
Types of saga
➔ Choreography
➔ Orchestration
Choreography
➔ No central coordinator
➔ Participants emit and subscribe to messages
➔ Simpler to implement
➔ Provides loose coupling
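
A sketch of choreography with a toy, synchronous, in-process message bus. MessageBus and the event names are hypothetical; a real system would use an asynchronous broker. No component knows the whole flow: each participant only reacts to one event and emits the next.

```python
from collections import defaultdict

class MessageBus:
    def __init__(self):
        self.handlers = defaultdict(list)  # event name -> subscribed handlers
        self.history = []                  # every event published, in order

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, payload):
        self.history.append(event)
        for handler in list(self.handlers[event]):
            handler(payload)

bus = MessageBus()
# Payment service: reacts to a new order and emits the payment result.
bus.subscribe("OrderCreated", lambda order: bus.publish("PaymentCharged", order))
# Order service: reacts to the payment and approves the order.
bus.subscribe("PaymentCharged", lambda order: bus.publish("OrderApproved", order))

bus.publish("OrderCreated", {"order_id": 1})
```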
Orchestration
➔ Central orchestrator coordinates the events
➔ Works in command reply async style
➔ Less coupling
➔ Smart orchestrator, dumb services
Saga isolation anomalies
➔ Lost update
➔ Dirty read
➔ Non-repeatable reads
Lost update
Dirty read
Countermeasures
➔ Semantic lock— An application-level lock. This can be an actual DB lock,
or adding an indicator that this record is being updated with something
like *_PENDING added to the status.
➔ Commutative updates— Design update operations to be executable in any
order.
➔ Pessimistic view— Reorder the steps of a saga to minimize business risk.
➔ Reread value— Prevent dirty writes by rereading data to verify that it’s
unchanged before overwriting it.
➔ Version file— Record the updates to a record so that they can be
reordered.
➔ By value— Use each request’s business risk to dynamically select the
concurrency mechanism.
Semantic lock: lost update
1. Create order SAGA -> Flag :: PENDING_APPROVAL
2. Cancel order SAGA -> can’t cancel as flag is
PENDING_APPROVAL
3. Create order SAGA -> Flag :: APPROVED
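
The three steps above can be sketched with an in-memory dict standing in for the orders table. The function names are hypothetical; the PENDING marker in the status acts as the semantic lock.

```python
def create_order(orders, order_id):
    # The create-order saga marks the record as in progress.
    orders[order_id] = "PENDING_APPROVAL"

def approve_order(orders, order_id):
    # Final step of the create-order saga: release the semantic lock.
    orders[order_id] = "APPROVED"

def cancel_order(orders, order_id):
    # The cancel-order saga refuses to touch a record that another saga
    # is still working on.
    if "PENDING" in orders[order_id]:
        return False
    orders[order_id] = "CANCELLED"
    return True

orders = {}
create_order(orders, 1)
cancel_while_pending = cancel_order(orders, 1)  # blocked by the lock
approve_order(orders, 1)
status_after_approve = orders[1]
cancel_after_approve = cancel_order(orders, 1)  # lock released, cancel allowed
```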
Reread value: lost update
1. Create order saga creates an order
2. Cancel order saga cancels the order
3. Create order saga tries to approve the order
a. Rereads the order status
b. The order is cancelled, so it does nothing
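
A sketch of the reread step, again with a dict standing in for the orders table and hypothetical function names. In a real database the check and the write would have to happen atomically, for example as a conditional update; that is an assumption beyond what the slide states.

```python
def create_order(orders, order_id):
    orders[order_id] = "PENDING_APPROVAL"

def cancel_order(orders, order_id):
    # No semantic lock in this variant: cancelling is always allowed.
    orders[order_id] = "CANCELLED"

def approve_order(orders, order_id, expected_status="PENDING_APPROVAL"):
    # Reread the current status and only overwrite it if it is unchanged.
    if orders[order_id] != expected_status:
        return False  # another saga changed the order, so do nothing
    orders[order_id] = "APPROVED"
    return True

orders = {}
create_order(orders, 1)
cancel_order(orders, 1)
approved = approve_order(orders, 1)  # the cancellation is not overwritten
```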
Conclusion
➔ Maintaining ACID properties in distributed systems is
hard.
➔ It’s essential to understand atomicity and isolation to
determine what is required by your systems.
➔ If you can identify what kind of anomalies might happen
in your system, then you can take countermeasures.
➔ There’s no silver bullet.
