Distributed transactions involve multiple servers and require protocols like two-phase commit to ensure atomicity. Transactions must be serializable across servers through techniques like locking, timestamps, or optimistic concurrency control. The two-phase commit protocol involves servers voting to commit or abort in phase 1, and executing the decision in phase 2 to ensure consistency across distributed systems.
Transactions and Concurrency Control in distributed systems. Transaction properties, classification, and transaction implementation. Flat, Nested, and Distributed transactions. Inconsistent Retrievals, Lost Update, Dirty Read, and Premature Writes Problem
A distributed database system is a collection of loosely coupled sites that are independent of each other.
Distributed transaction model
Concurrency control
2 phase commit protocol
A transaction is a unit of program execution that accesses and possibly updates various data items.
Usually, a transaction is initiated by a user program written in a high-level data-manipulation language or programming language (for example, SQL, COBOL, C, C++, or Java), where it is delimited by statements (or function calls) of the form begin transaction and end transaction.
Transaction concept, ACID property, Objectives of transaction management, Types of transactions, Objectives of Distributed Concurrency Control, Concurrency Control anomalies, Methods of concurrency control, Serializability and recoverability, Distributed Serializability, Enhanced lock based and timestamp based protocols, Multiple granularity, Multi version schemes, Optimistic Concurrency Control techniques
Transaction Processing Monitors represent an early type of middleware that is still widely used for performing distributed transactions involving multiple databases.
Usually TPMs employ the two phase commit protocol that ensures ACID properties (Atomicity, Consistency, Isolation, Durability) as in relational databases.
3. 5 March, 2002 3
Transactions
• Definition
– sequence of server operations
– originate from databases (banking, airline reservation, etc)
– atomic operations or sequences (free from interference by
other clients and server crashes)
– durable (when completed, saved in permanent storage)
• Issues in transaction processing
– need to maximise concurrency while ensuring consistency
• serial equivalence/serializability (= same effect as a serial
execution)
– must be recoverable from failures
4. 5 March, 2002 4
Distributed transactions
• Definition
– access objects which are managed by multiple servers
– can be flat or nested
• Sources of difficulties
– all servers must agree to commit or abort
• two-phase commit protocol
– concurrency control in a distributed environment
• locking, timestamps
• optimistic concurrency control
– failures!
• deadlocks, recovery from aborted transactions
5. 5 March, 2002 5
Transaction handling
• Requires coordinator server, with open/close/abort
• Start new transaction (returns unique TID)
openTransaction() -> trans;
• Then invoke operations on recoverable objects
A.withdraw(100);
B.deposit(300);
• If all goes well end transaction (commit or abort)
– closeTransaction(trans) -> (commit, abort);
• Otherwise
– abortTransaction(trans);
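The open/close/abort interface above can be illustrated with a minimal single-process sketch. The class names `Account` and `Coordinator`, the `join` call, and the tentative-balance bookkeeping are assumptions made for illustration; the slides only name `openTransaction`, `closeTransaction` and `abortTransaction`.

```python
import itertools


class InsufficientFunds(Exception):
    pass


class Account:
    """A recoverable object: updates stay tentative until commit (assumed design)."""

    def __init__(self, balance):
        self.balance = balance
        self.tentative = {}  # TID -> tentative balance

    def _current(self, tid):
        return self.tentative.get(tid, self.balance)

    def withdraw(self, tid, amount):
        if self._current(tid) < amount:
            raise InsufficientFunds(amount)
        self.tentative[tid] = self._current(tid) - amount

    def deposit(self, tid, amount):
        self.tentative[tid] = self._current(tid) + amount

    def commit(self, tid):
        self.balance = self.tentative.pop(tid, self.balance)

    def abort(self, tid):
        self.tentative.pop(tid, None)


class Coordinator:
    def __init__(self):
        self._ids = itertools.count(1)
        self._objects = {}  # TID -> objects touched by the transaction

    def open_transaction(self):
        tid = next(self._ids)  # unique TID
        self._objects[tid] = set()
        return tid

    def join(self, tid, obj):
        self._objects[tid].add(obj)

    def close_transaction(self, tid):
        for obj in self._objects.pop(tid):
            obj.commit(tid)
        return "commit"

    def abort_transaction(self, tid):
        for obj in self._objects.pop(tid):
            obj.abort(tid)
        return "abort"


def transfer(coord, src, dst, amount):
    """Withdraw from src and deposit into dst as one transaction."""
    tid = coord.open_transaction()
    coord.join(tid, src)
    coord.join(tid, dst)
    try:
        src.withdraw(tid, amount)
        dst.deposit(tid, amount)
    except InsufficientFunds:
        return coord.abort_transaction(tid)
    return coord.close_transaction(tid)
```

A failed withdrawal leaves both balances untouched, because nothing is written to `balance` before `close_transaction`.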
6. 5 March, 2002 6
Distributed transactions
• Flat structure:
– client makes requests to more than one server
– request completed before going on to next
– sequential access to objects
• Nested structure:
– arranged in levels: top level can open sub-transactions
– any depth of nesting
– objects in different servers can be invoked in parallel
– better performance
7. 5 March, 2002 7
Distributed transactions
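Figure: (a) Flat transaction (b) Nested transactions
(a) The client's flat transaction T invokes operations at servers X, Y and Z.
(b) The client's nested transaction T opens subtransactions T1 and T2 (at servers X and Y), which in turn open T11 and T12 (at M and N) and T21 and T22 (at N and P).
E.g. T11, T12 can run in parallel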
8. 5 March, 2002 8
How it works...
• Client
– issues openTransaction() to coordinator in any server
– coordinator executes it and returns unique TID to client
TID = server IP address + unique transaction ID
• Servers
– communicate with each other
– keep track of who is who
– coordinator: responsible for commit/abort at the end
– participant: can join(Trans, RefToParticipant)
• manages object accessed in transaction
• keeps track of recoverable objects
• cooperates with coordinator
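As a rough sketch of the bookkeeping described above, a TID can be modelled as the pair (server address, locally unique number), and `join` as registering a participant reference with the coordinator. The class layout and the string participant references are hypothetical; only the TID composition and the `join(Trans, RefToParticipant)` operation come from the slide.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class TID:
    server_address: str  # address of the server hosting the coordinator
    local_id: int        # number unique within that server


class Coordinator:
    def __init__(self, server_address):
        self.server_address = server_address
        self._counter = itertools.count(1)
        self.participants = {}  # TID -> list of participant references

    def open_transaction(self):
        tid = TID(self.server_address, next(self._counter))
        self.participants[tid] = []
        return tid

    def join(self, tid, participant_ref):
        # A participant joins once, the first time one of its objects is used.
        if participant_ref not in self.participants[tid]:
            self.participants[tid].append(participant_ref)
```

Because the server address is part of the TID, two coordinators on different servers can hand out identifiers without talking to each other.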
9. 5 March, 2002 9
Distributed flat banking transaction
Figure: the client's flat transaction T accesses accounts A, B, C and D held at servers BranchX, BranchY and BranchZ. Each branch server is a participant and sends join to the coordinator.
Client code:
T = openTransaction
a.withdraw(4);
c.deposit(4);
b.withdraw(3);
d.deposit(3);
closeTransaction
Each request sent to a server carries the transaction identifier, e.g. b.withdraw(T, 3);
Note: the coordinator is in one of the servers, e.g. BranchX
10. 5 March, 2002 10
One-phase commit
• Distributed transactions
– multiple servers, must either be committed or aborted
• One-phase commit
– coordinator communicates commit/abort to participants
– keeps repeating the request until all acknowledged
• But… server cannot abort part of a transaction:
– when the server crashed and has been replaced...
– when deadlock has been detected and resolved…
• Problem
– when part aborted, the whole transaction may have to be
aborted
11. 5 March, 2002 11
Two-phase commit
• Phase 1 (voting phase)
(1) coordinator sends canCommit? to participants
(2) participant replies with vote (Yes or No); before voting Yes
prepares to commit by saving objects in permanent storage,
and if No aborts
• Phase 2 (completion according to outcome of vote)
(3) coordinator collects votes (including own)
• if no failures and all Yes, sends doCommit to participants
• otherwise, sends doAbort to participants
(4) participants that voted Yes wait for doCommit or doAbort
and act accordingly; confirm their action to coordinator by
sending haveCommitted
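The four steps above can be sketched in a single process as follows. This is a simplified illustration, not a networked implementation: the `Participant` class, its `can_prepare` flag and the state names are assumptions, and saving objects to permanent storage is only indicated by a comment.

```python
class Participant:
    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # hypothetical switch to force a No vote
        self.state = "active"

    def can_commit(self, tid):
        if self.can_prepare:
            # A real participant would save its objects in permanent storage here.
            self.state = "prepared"
            return True
        self.state = "aborted"  # voting No aborts immediately
        return False

    def do_commit(self, tid):
        self.state = "committed"
        return "haveCommitted"

    def do_abort(self, tid):
        self.state = "aborted"


def two_phase_commit(tid, participants, coordinator_vote=True):
    # Phase 1: collect votes, including the coordinator's own.
    votes = [coordinator_vote]
    for p in participants:
        try:
            votes.append(p.can_commit(tid))
        except Exception:
            votes.append(False)  # a failure counts as a No vote

    # Phase 2: commit only if every vote is Yes, otherwise abort.
    if all(votes):
        for p in participants:
            p.do_commit(tid)
        return "commit"
    for p in participants:
        if p.state == "prepared":  # only Yes voters are waiting for a decision
            p.do_abort(tid)
    return "abort"
```

A single No vote is enough to abort every participant, which is what gives the protocol its all-or-nothing outcome.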
12. 5 March, 2002 12
Communication in 2-phase protocol
• Step 1 (coordinator)
– sends canCommit? to participant
– status: prepared to commit (waiting for votes)
• Step 2 (participant)
– replies Yes
– status: prepared to commit (uncertain)
• Step 3 (coordinator)
– sends doCommit
– status: committed
• Step 4 (participant)
– replies haveCommitted
– status: committed
• Coordinator status after receiving haveCommitted: done
18. 5 March, 2002 13
What can go wrong...
• In distributed systems
– objects stored/managed at different servers
• Server crashes
– participant: save in permanent storage when preparing to
commit, retrieve data after crash
– coordinator: delay till replaced, or cooperative approach
• Messages fail to arrive (server crash or link failure)
– use timeout for each step that may block (but no reliable
failure detector, asynchronous communication)
– if uncertain, participant prompts coordinator by getDecision
– if in doubt (e.g. initial canCommit? or votes missing), abort!
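The timeout rules above amount to a small decision table, sketched here. The state names `"active"` and `"uncertain"` and both function names are labels chosen for this illustration.

```python
def participant_on_timeout(state):
    """Action a participant takes when a wait times out, given its state."""
    if state == "active":
        # canCommit? never arrived: no vote was cast, so aborting is safe.
        return "abort"
    if state == "uncertain":
        # Voted Yes: cannot decide alone, so ask the coordinator.
        return "getDecision"
    return "none"  # already committed or aborted


def coordinator_decision(votes, expected):
    """votes maps participant -> True/False for replies received in time."""
    missing = set(expected) - set(votes)
    if missing or not all(votes.values()):
        return "doAbort"  # if in doubt, abort
    return "doCommit"
```

The asymmetry is the important part: a participant that has not voted may abort unilaterally, while one that has voted Yes must wait for the coordinator's decision.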
19. 5 March, 2002 14
Nested transactions
• Top-level transaction
– starts subtransactions with unique TID (extension of the
parent TID)
– subtransaction joins parent transaction
– completes when all subtransactions have completed
– can commit even if one of its subtransactions aborted...
• Subtransactions
– can be independent (e.g. act on different bank accounts)
– can execute in parallel, at different servers
– can provisionally commit or abort
– if parent aborts, must abort too
20. 5 March, 2002 15
Nested banking transaction
Figure: the client's top-level transaction T opens four subtransactions, T1 to T4, one per operation, on accounts A, B, C and D held at servers X, Y and Z.
Client code:
T = openTransaction
openSubTransaction
a.withdraw(10);
openSubTransaction
b.withdraw(20);
openSubTransaction
c.deposit(10);
openSubTransaction
d.deposit(20);
closeTransaction
If b.withdraw aborts due to insufficient funds, no need to abort the whole transaction
21. 5 March, 2002 16
Nested two-phase commit
• Used to decide when top-level transaction commits
• Top-level transaction
– is coordinator in two-phase commit
– knows all subtransactions that joined
– keeps record of subtransaction info
• Subtransactions
– report status back to parent
– when abort: reports abort, ignoring children status (now
orphans)
– when provisionally commit: reports status of all child
subtransactions
22. 5 March, 2002 17
Transaction T decides to commit
• T1: provisional commit (at X)
– T11: abort (at server M)
– T12: provisional commit (at N)
• T2: aborted (at Y)
– T21: provisional commit (at N)
– T22: provisional commit (at P)
• T21 and T22 are orphans: their parent T2 has aborted
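The outcome for this tree can be worked out mechanically: a provisionally committed subtransaction can commit only if none of its ancestors has aborted, otherwise it is an orphan. The sketch below assumes the parent links implied by the names (T11 and T12 under T1, T21 and T22 under T2); the function name and the dictionary encoding are illustrative.

```python
def classify(parent, status):
    """Split provisionally committed subtransactions into committable and orphans."""
    committable, orphans = [], []
    for name in sorted(status):
        if status[name] != "provisional":
            continue
        # Walk up the tree until the top is reached or an aborted ancestor is found.
        ancestor = parent.get(name)
        while ancestor is not None and status.get(ancestor) != "aborted":
            ancestor = parent.get(ancestor)
        (orphans if ancestor is not None else committable).append(name)
    return committable, orphans


parent = {"T1": "T", "T2": "T", "T11": "T1", "T12": "T1",
          "T21": "T2", "T22": "T2"}
status = {"T1": "provisional", "T2": "aborted", "T11": "aborted",
          "T12": "provisional", "T21": "provisional", "T22": "provisional"}

committable, orphans = classify(parent, status)
```

With the statuses from the slide, `committable` holds T1 and T12, and `orphans` holds T21 and T22.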
24. 5 March, 2002 18
Hierarchic two-phase commit
• Multi-level nested protocol
– coordinator of top-level transaction is coordinator
– coordinator sends canCommit? to coordinator of
subtransactions one level down the tree
– propagate to next level down the tree, etc
– aborted subtransactions ignored
– participants collect replies from children before replying
• if any provisionally committed subtransaction found,
prepares the object and votes Yes
• if none found, assume must have crashed and vote No
• Second phase (completion using doCommit)
– same as before
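A recursive sketch of the first phase of the hierarchic protocol follows. The `Sub` class, the `"unknown"` status (standing in for a server that crashed and lost its record) and treating the top-level transaction as `"provisional"` are assumptions made to keep the example short.

```python
class Sub:
    def __init__(self, name, status, children=()):
        self.name = name
        self.status = status  # "provisional", "aborted" or "unknown"
        self.children = list(children)
        self.prepared = False


def can_commit(node):
    """Return the vote (True = Yes) for the subtree rooted at node."""
    if node.status != "provisional":
        # No provisionally committed record found: assume a crash, vote No.
        return False
    # Aborted children are ignored; the rest are asked one level down.
    votes = [can_commit(c) for c in node.children if c.status != "aborted"]
    if all(votes):
        node.prepared = True  # prepare the objects, then vote Yes
        return True
    return False
```

Children of an aborted subtransaction are never asked, so they are never prepared, which matches the rule that aborted subtransactions are ignored.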
25. 5 March, 2002 19
Concurrency control
• Needed at each server
– to ensure consistency
• In distributed systems
– consistency needed across multiple servers
• Methods
– Locking
• processes running at different servers can lock objects
– Timestamping
• global unique timestamps
– Optimistic concurrency control
• validate transaction at multiple servers before committing
26. 5 March, 2002 20
Locking
• Locks
– control availability of objects
– lock manager held at the same server as objects
– to acquire lock: contact server
– to release: must delay until transactions commit/abort
• Issues
– locks acquired independently
– cyclic dependencies may arise
T: locks A for writing; U: locks B for writing;
T: wants to read B - must wait; U: wants to read A - must wait;
– distributed deadlock detection and resolution needed
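The T/U example above forms a cycle in the wait-for graph, which is what deadlock detection looks for. The sketch below assumes the whole graph has been gathered in one place; in a distributed setting each server only knows its local edges, so this shows the detection step only, not how the edges are collected.

```python
def find_cycle(wait_for):
    """wait_for maps a transaction to the transactions it waits for.

    Returns the transactions on one cycle, or None if there is no deadlock.
    """
    path, finished = [], set()

    def visit(t):
        if t in finished:
            return None
        if t in path:
            return path[path.index(t):]  # back edge: cycle found
        path.append(t)
        for u in sorted(wait_for.get(t, ())):
            cycle = visit(u)
            if cycle:
                return cycle
        path.pop()
        finished.add(t)
        return None

    for t in sorted(wait_for):
        cycle = visit(t)
        if cycle:
            return cycle
    return None


# T holds A and wants B; U holds B and wants A.
deadlock = find_cycle({"T": {"U"}, "U": {"T"}})
```

Once a cycle is found, resolution means aborting one of the transactions on it so that the others can proceed.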
27. 5 March, 2002 21
Timestamp ordering
• If a single server...
– coordinator issues unique timestamp to each transaction
– versions of objects committed in timestamp order
– ensures serializability
• In distributed transactions
– coordinator issues globally unique timestamps to the client
opening transaction:
<local timestamp, server ID>
– synchronised clocks sometimes used for efficiency
– objects committed in global timestamp order
– conflicts resolved, or else abort
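A pair <local timestamp, server ID> can be compared lexicographically, which makes timestamps from different servers totally ordered. The sketch below pairs that with the basic timestamp-ordering rule (reject an operation that arrives too late). That rule is a simplification chosen for illustration: it has no tentative versions, and the class and exception names are hypothetical.

```python
class TooLate(Exception):
    """Raised when an operation conflicts with a later timestamp; the caller aborts."""


class TimestampedObject:
    def __init__(self, value):
        self.value = value
        self.read_ts = (0, 0)   # largest timestamp that has read the value
        self.write_ts = (0, 0)  # timestamp that wrote the current value

    def read(self, ts):
        if ts < self.write_ts:
            raise TooLate("value already overwritten by a later transaction")
        self.read_ts = max(self.read_ts, ts)
        return self.value

    def write(self, ts, value):
        if ts < self.read_ts or ts < self.write_ts:
            raise TooLate("a later transaction has already read or written")
        self.value, self.write_ts = value, ts


# Python tuples compare element by element, so the server ID breaks ties.
assert (5, 1) < (5, 2) < (6, 1)
```

Because ties on the local timestamp are broken by server ID, no two transactions ever share a timestamp even when clocks on different servers agree.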
28. 5 March, 2002 22
Optimistic concurrency control
• If a single server...
– alternative to locking (avoids overhead and deadlocks)
– transactions allowed to proceed but
– validated before allowed to commit: if conflict arises may be
aborted
• transactions given numbers at the start of validation
• serialised according to this order
• In distributed transactions
– must be validated by multiple independent servers (in the
first phase of two-phase commit protocol)
– global validation needed (serialise across servers)
– parallel also possible
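One common way to validate is backward validation: the read set of the committing transaction is checked against the write sets of transactions that committed while it was running. The slides do not say which validation scheme is meant, so the choice of backward validation, the function names and the per-server dictionary are all assumptions.

```python
def backward_validate(read_set, overlapping_write_sets):
    """Pass if nothing the transaction read was written by an overlapping commit."""
    return all(read_set.isdisjoint(ws) for ws in overlapping_write_sets)


def validate_at_all_servers(per_server):
    """per_server maps server -> (read set, write sets of overlapping commits).

    The transaction may commit only if every server validates it, which fits
    into the voting phase of two-phase commit.
    """
    return all(backward_validate(rs, wss) for rs, wss in per_server.values())
```

Local validation at each server does not by itself guarantee that all servers serialise the transactions in the same order, which is why the slide also calls for global validation.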
29. 5 March, 2002 23
Other issues
• Distributed deadlocks!
– often unavoidable, since cannot predict dependencies and
server crashes possible
– use deadlock detection, priorities, etc
• Recovery
– must ensure that the effects of all committed transactions and none of the
aborted transactions are recorded in permanent storage
– use logging, recovery files, shadowing, etc
• See textbook for more info
30. 5 March, 2002 24
Summary
• Transactions
– crucial to the running of large distributed systems
– atomic, durable, serializable
– order of updates important
– require two-phase commit protocol
• Distributed transactions
– run on multiple servers
– can be flat or nested
– hierarchical two-phase commit
– concurrency control adapted to distributed environment