One of the many challenges of a distributed architecture is preserving the consistency of data across different systems. During this one-hour presentation, we are going to explore a number of strategies for maintaining consistency, going from the most basic options up to an automated recovery mechanism using compensations and reservations - what’s commonly referred to as a “saga” pattern. Our journey will be based on a hypothetical food delivery application on which we will analyze various decisions and their tradeoffs. The discussion will stay at an abstract, architectural level for the most part, with only a few code examples.
In the agenda:
- Idempotency and Retries
- 2 Phase Commit
- Eventual Consistency
- Compensations
- Reservations
- The Saga Pattern
4. 119 VictorRentea.ro
a training by
2-Phase-Commit (2PC)
1) Voting Phase
- All participants notify the coordinator if their local transaction would commit OK
2) Commit Phase
- Coordinator decides to commit if all voted "Yes", or to roll back otherwise; notifies all participants
§Downsides
- Can still fail, requiring recovery steps
- Involves locking, doesn't scale well
- Not supported by some resources: requires XA drivers and a JTA coordinator
- Requires direct connection to remote DB
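The two phases can be sketched in memory as follows. This is only an illustration of the voting and commit logic, not an XA/JTA integration: the `Participant` interface, `FakeParticipant` and the `run` method are hypothetical names introduced here, and the sketch ignores the crash-recovery and locking concerns listed under Downsides.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory illustration of two-phase commit; not a real XA/JTA coordinator. */
class TwoPhaseCommitSketch {

    /** Hypothetical participant contract: vote first, then commit or roll back. */
    interface Participant {
        boolean prepare();  // phase 1: "would my local transaction commit OK?"
        void commit();      // phase 2, if everyone voted yes
        void rollback();    // phase 2, if anyone voted no
    }

    /** Returns true if the global transaction committed. */
    static boolean run(List<Participant> participants) {
        // Phase 1 (voting): ask each participant, stop at the first "no".
        List<Participant> prepared = new ArrayList<>();
        boolean allYes = true;
        for (Participant p : participants) {
            if (p.prepare()) {
                prepared.add(p);
            } else {
                allYes = false; // assumed: a "no" voter aborts its own local work
                break;
            }
        }
        // Phase 2: commit everywhere, or roll back those that had voted yes.
        if (allYes) {
            for (Participant p : participants) p.commit();
        } else {
            for (Participant p : prepared) p.rollback();
        }
        return allYes;
    }

    /** Test double that records what the coordinator asked it to do. */
    static class FakeParticipant implements Participant {
        final boolean vote;
        String outcome = "none";
        FakeParticipant(boolean vote) { this.vote = vote; }
        public boolean prepare() { return vote; }
        public void commit() { outcome = "committed"; }
        public void rollback() { outcome = "rolled back"; }
    }
}
```

Even in this toy version, every participant has to hold its prepared state until the coordinator decides, which is where the locking cost comes from.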
Scenario #1
@Entity // in user-api
public class User {
    @Id @GeneratedValue
    private long id;
    private String name;
    private LocalDateTime lastMessageTime;
}
@Entity // in message-api
public class Message {
    @Id @GeneratedValue
    private long id;
    private long userId;
    private String contents;
    private LocalDateTime messageTimestamp;
}
Scenario #1 - Consistency Strategies
POST message-api/messages ...
§Sync call to sync state
- PUT user-api/users/{uid}/lastMessageTimestamp
- Fragile: What if user-api is down? Retry? For how long?
§Async: send a message via a durable queue (e.g. RabbitMQ)
- e.g. MessageSentEvent
- What if the MQ broker is down? →
§Avoid synchronization: redesign the service boundaries
- GET message-api/lastMessageTimestamp?user={uid}
Retry
A call failed or timed out
Let me try again...
Is the operation IDEMPOTENT?
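A bounded retry with exponential backoff is one possible answer to "Retry? For how long?". The sketch below is a minimal illustration, assuming the wrapped operation is idempotent; `RetrySketch`, `retry`, `maxAttempts` and `initialDelayMs` are hypothetical names, not part of any library.

```java
import java.util.function.Supplier;

/** Minimal bounded retry with exponential backoff; only safe for idempotent calls. */
class RetrySketch {

    /** Tries the call up to maxAttempts times, doubling the wait after each failure. */
    static <T> T retry(Supplier<T> call, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(delay);
                    delay *= 2; // back off so a struggling service is not hammered
                }
            }
        }
        throw last; // all attempts failed: surface the last error to the caller
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

An in-process retry like this is lost if the caller itself crashes, which is why the later slides move towards persistent forms of retry.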
Idempotent Operation
= can be applied many times without changing the result. Examples:
§Get Product by id via GET?
- ✅ YES: the call does not change any data on the server
§Cancel Payment by id via DELETE
- ✅ YES: canceling it again has no additional effect
§Update Product price by id via PUT
- ✅ YES: we would just set the same price again
§Place Order { items: [..] } via POST or MQ
- ❌ NO if a retry would create a second order
- ✅ YES if we deduplicate via lastPlacedOrders = Map<custId, List<orderJsonHash>> (TTL 1h)
§Place Order { items: [..], clickId/messageID: UUID } via POST or MQ
- ✅ YES if we deduplicate via Set<lastSeenClickIds>
§Place Order { id: UUID, items: [..] } via PUT or MQ = Client-generated ID 🤔
- ✅ YES: a duplicate would cause a PK/UK violation
- In DB: alternate UK, next to numeric PK
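The clickId variant can be sketched as below. It is a small variation on `Set<lastSeenClickIds>`: instead of a set it keeps a map from clickId to the created order id, so a retry gets the original result back. All names (`IdempotentOrderSketch`, `placeOrder`, `orderCount`) are hypothetical, and the in-memory map stands in for what would need to be a shared, expiring store in a real deployment.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

/** Deduplicates "Place Order" requests by a client-supplied clickId. */
class IdempotentOrderSketch {
    private final Map<UUID, Long> orderIdByClickId = new ConcurrentHashMap<>();
    private final List<String> createdOrders = new CopyOnWriteArrayList<>();
    private final AtomicLong nextOrderId = new AtomicLong(1);

    /** Returns the order id; a retry with the same clickId returns the same id. */
    long placeOrder(UUID clickId, String itemsJson) {
        // computeIfAbsent runs the creation logic at most once per clickId.
        return orderIdByClickId.computeIfAbsent(clickId, id -> {
            createdOrders.add(itemsJson); // stands in for the real "INSERT order"
            return nextOrderId.getAndIncrement();
        });
    }

    /** Number of orders actually created (duplicates excluded). */
    int orderCount() {
        return createdOrders.size();
    }
}
```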
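Update DB and send a Message
void f() {
    mq.send(..);
    repo.save(..);
}
@Transactional
void f() {
    repo.saveAndFlush(..);
    mq.send(..);
}
@TransactionalEventListener(phase = AFTER_COMMIT)
void afterCommit(..) {
    mq.send(..);
}
Failure points (💥) for each ordering:
db.update(data);
db.commit;
mq.send(message);💥 // DB changed, but the message is lost
db.update(data);
mq.send(message);
db.commit;💥 // message already sent, but the DB change is rolled back
mq.send(message);
db.update(data);💥 // message already sent, but the DB change never happens
db.commit;💥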
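Receive a Message and Update DB
If the ack is not sent, the MQ would retry the message
→ Listeners should be idempotent (e.g. by tracking processed IDs in SEEN_MESSAGES_IDS)
db.update(data);
mq.ack(message);💥 // message is redelivered and the update would run twice
mq.ack(message);
db.update(data);💥 // message is already acked, so the update is lost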
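An idempotent listener for the "update, then ack" ordering could look like the sketch below. The set stands in for a SEEN_MESSAGES_IDS table; `IdempotentListenerSketch`, `onMessage` and the integer `balance` are hypothetical names chosen for illustration. In a real service the "remember the ID" step and the business update would presumably share one local DB transaction.

```java
import java.util.HashSet;
import java.util.Set;

/** Listener that tolerates redelivery by remembering processed message IDs. */
class IdempotentListenerSketch {
    // Stands in for the SEEN_MESSAGES_IDS table.
    private final Set<String> seenMessageIds = new HashSet<>();
    private int balance = 0; // stands in for the business data being updated

    /** Returns true if the message was applied, false if it was a duplicate. */
    synchronized boolean onMessage(String messageId, int amount) {
        if (!seenMessageIds.add(messageId)) {
            return false; // redelivered message: skip the update, just ack again
        }
        balance += amount;
        return true;
    }

    synchronized int balance() {
        return balance;
    }
}
```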
Transactional Outbox Table
Problem: update DB and send message atomically.
2PC is not an option.
Solution:
§Instead of sending the message, INSERT it in a 'MESSAGES_TO_SEND' table
§A scheduler polls this table, sends messages in order and removes them
§A form of 'persistent retry'
§Can raise alarms if a message is delayed too much
§:/ Could send duplicate messages
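The outbox steps above can be sketched in memory as follows. Two lists stand in for the business table and the MESSAGES_TO_SEND table, and `synchronized` stands in for the single local transaction that would write both rows; `OutboxSketch`, `saveAndEnqueue`, `pollOnce` and `pending` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** In-memory stand-in for a transactional outbox and its polling sender. */
class OutboxSketch {
    private final List<String> data = new ArrayList<>();   // business table
    private final List<String> outbox = new ArrayList<>(); // MESSAGES_TO_SEND

    /** Both writes happen together, as they would inside one DB transaction. */
    synchronized void saveAndEnqueue(String row, String message) {
        data.add(row);
        outbox.add(message);
    }

    /** One scheduler tick: send in insertion order, remove each message once sent. */
    synchronized int pollOnce(Consumer<String> sender) {
        int sent = 0;
        while (!outbox.isEmpty()) {
            String message = outbox.get(0);
            try {
                sender.accept(message);
            } catch (RuntimeException e) {
                break; // broker unavailable: keep the message for the next tick
            }
            // A crash between send and remove would resend on restart,
            // which is the "could send duplicate messages" caveat.
            outbox.remove(0);
            sent++;
        }
        return sent;
    }

    synchronized int pending() {
        return outbox.size();
    }
}
```

Because duplicates remain possible, consumers of these messages still need to be idempotent.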
Transactional Outbox Table
Change Data Capture (CDC)
http://debezium.io tails the transaction log and publishes every change to a Kafka topic
Saga Pattern
Problem: Run a business transaction across multiple services (separate DBs)
Solution: Saga Pattern
§Implement the business transaction as a sequence of local transactions
§Each local transaction updates the DB and sends a message (command or event) to trigger the next local transaction
§If a local transaction fails, the saga executes compensating transactions to undo the previously committed transactions
§Compensating actions must be retry-able
§Use a (timed) reservation for non-reversible steps + confirmation/cancel
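A timed reservation with confirm/cancel might look like the sketch below, using stock as the reserved resource. It is a simplified illustration under several assumptions: one unit per order, time passed in explicitly as `nowMs` to keep it testable, and hypothetical names throughout (`StockReservationSketch`, `reserve`, `confirm`, `cancel`). `reserve` and `cancel` are written to be safe to retry.

```java
import java.util.HashMap;
import java.util.Map;

/** Timed stock reservation: reserve first, then confirm or cancel (or let it expire). */
class StockReservationSketch {
    private int available;
    private final Map<String, Long> expiryByOrderId = new HashMap<>();

    StockReservationSketch(int available) {
        this.available = available;
    }

    /** Holds one unit for the order until nowMs + ttlMs; retrying is harmless. */
    synchronized boolean reserve(String orderId, long nowMs, long ttlMs) {
        releaseExpired(nowMs);
        if (expiryByOrderId.containsKey(orderId)) return true; // already reserved
        if (available == 0) return false;
        available--;
        expiryByOrderId.put(orderId, nowMs + ttlMs);
        return true;
    }

    /** Makes the reservation permanent; fails if it expired or never existed. */
    synchronized boolean confirm(String orderId, long nowMs) {
        releaseExpired(nowMs);
        return expiryByOrderId.remove(orderId) != null; // stock stays deducted
    }

    /** Compensation: gives the unit back; calling it twice has no extra effect. */
    synchronized void cancel(String orderId) {
        if (expiryByOrderId.remove(orderId) != null) available++;
    }

    private void releaseExpired(long nowMs) {
        int before = expiryByOrderId.size();
        expiryByOrderId.values().removeIf(expiry -> expiry <= nowMs);
        available += before - expiryByOrderId.size();
    }

    synchronized int available() {
        return available;
    }
}
```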
Sync Saga
Legend: Sync RPC, Transaction
Choreographed
Each party commits, then calls the next step.
On error, each party must call undo on all previously completed steps.
++COUPLING
Orchestrated
Orchestrator calls all parties synchronously.
On error: orchestrator calls compensating 'undo' endpoints for previously completed steps.
NOT SCALABLE, FRAGILE
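The orchestrated variant can be sketched as a loop that remembers completed steps and undoes them on failure. This is a minimal in-process illustration, not any framework's API: `Step`, `run` and the `step` helper are hypothetical, and compensating in reverse order is a common convention assumed here rather than something the slide prescribes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Synchronous orchestrator: runs steps in order, compensates completed ones on failure. */
class OrchestratedSagaSketch {

    /** Hypothetical contract for one saga step owned by one service. */
    interface Step {
        String name();
        void execute();    // the local transaction (here: a synchronous call)
        void compensate(); // the 'undo' endpoint; should be safe to retry
    }

    /** Returns true if every step succeeded; appends what happened to the log. */
    static boolean run(List<Step> steps, List<String> log) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.execute();
                completed.push(step);
                log.add("done " + step.name());
            } catch (RuntimeException e) {
                log.add("failed " + step.name());
                // Undo the completed steps, most recent first (assumed order).
                while (!completed.isEmpty()) {
                    Step done = completed.pop();
                    done.compensate();
                    log.add("undo " + done.name());
                }
                return false;
            }
        }
        return true;
    }

    /** Small factory for illustration: a step that optionally fails on execute. */
    static Step step(String name, boolean fails) {
        return new Step() {
            public String name() { return name; }
            public void execute() {
                if (fails) throw new IllegalStateException(name + " failed");
            }
            public void compensate() { /* would call the service's undo endpoint */ }
        };
    }
}
```

If the orchestrator crashes midway, this in-memory list of completed steps is gone, which illustrates why the synchronous form is considered fragile.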
Async Saga
Legend: Async Message, Transaction
Orchestrated
Orchestrator sends messages to parties.
On error: it sends compensating command messages to previously completed steps
Choreographed
Each service commits and sends a message to the next service
On error, a party:
a) publishes a failure event, listened to by all previous parties (coupling++)
b) sends compensating commands to all parties stamped on the message (Routing Slip)
c) notifies a Saga Execution Coordinator
Choreographed Compensations
§Routing Slip Pattern = Accumulate all previous "undo" actions
- Each service appends its own "undo" information to the message sent forward
- On any error on received message => call/message all UNDO actions
§Error Event upstream
- All previous steps undo on LegalCheckFailedEvent{orderId}
Diagram: Stock, Payment, Legal; undo actions: 1) cancel stock reservation, 2) undo payment
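The routing slip idea can be sketched with an immutable message that each service extends with its own undo action. The class and method names (`RoutingSlipSketch`, `Message`, `withUndo`, `stock`, `payment`, `legalFails`) are hypothetical, and undo actions are plain strings here where a real system would carry compensating commands or endpoints.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Routing slip: the forwarded message accumulates the "undo" actions so far. */
class RoutingSlipSketch {

    /** Immutable message passed from service to service. */
    static class Message {
        final String orderId;
        final List<String> undoActions;

        Message(String orderId, List<String> undoActions) {
            this.orderId = orderId;
            this.undoActions = Collections.unmodifiableList(new ArrayList<>(undoActions));
        }

        /** Returns a copy of this message with one more undo action appended. */
        Message withUndo(String action) {
            List<String> next = new ArrayList<>(undoActions);
            next.add(action);
            return new Message(orderId, next);
        }
    }

    // Each service does its local work, then stamps its own undo on the slip.
    static Message stock(Message in) { return in.withUndo("cancel stock reservation"); }
    static Message payment(Message in) { return in.withUndo("undo payment"); }

    /** Legal check fails: the slip tells it which compensations to trigger. */
    static List<String> legalFails(Message in) {
        return in.undoActions;
    }
}
```

The failing service needs no knowledge of who ran before it, since everything it must undo travels with the message.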
UML State Diagram
A state machine reacts to various external signals (inputs) in different ways (outputs), depending on its current state.
Diagram labels: initial state; states "locked" and "unlocked"; external signals "insert coin" and "push in stack"; side-effects/actions "/capture coin", "/release coin", "/🤨"
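A state machine of this kind can be sketched as below. The exact transitions are an assumption based on the classic coin-operated turnstile, since the diagram itself did not survive: a coin unlocks and is captured, a push locks again, an extra coin while unlocked is released, and a push while locked does nothing. The names `TurnstileSketch`, `State`, `Signal` and `on` are hypothetical.

```java
/** Coin-operated turnstile as a two-state machine (transitions are assumed). */
class TurnstileSketch {
    enum State { LOCKED, UNLOCKED }
    enum Signal { INSERT_COIN, PUSH }

    private State state = State.LOCKED; // initial state

    /** Applies an external signal and returns the side-effect it triggers. */
    String on(Signal signal) {
        if (state == State.LOCKED && signal == Signal.INSERT_COIN) {
            state = State.UNLOCKED;
            return "capture coin";
        }
        if (state == State.UNLOCKED && signal == Signal.INSERT_COIN) {
            return "release coin"; // already unlocked: give the coin back
        }
        if (state == State.UNLOCKED && signal == Signal.PUSH) {
            state = State.LOCKED;
            return "none";
        }
        return "none"; // LOCKED + PUSH: nothing happens
    }

    State state() {
        return state;
    }
}
```

The same signal produces a different output depending on the current state, which is the property a saga relies on when it tracks the progress of each business transaction.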
Exercise: Food Delivery App
Feed hungry people with food from restaurants delivered by couriers.
Assume customers, restaurants and couriers have an app installed.
High level flow:
1. Hungry customer orders food FF from restaurant RR
2. Accept card payment via external payment gateway
3. Tell RR to cook FF
4. Find a courier CC
5. CC picks up FF from RR
6. CC delivers food to customer
7. Charge a fee to RR for the service
Sagas are Hard
§Keep hard consistency constraints within the boundary of one service.
- (that is, don't distribute)
§Manual intervention could be cheaper (e.g. by 2nd level support)
- e.g. log.error("[CALL-SUPPORT] Out of stock for order {}", ...);
- Implement a Saga to recover from frequent or expensive failures
§Use a Saga framework (or learn from it)
- Orchestrated: Camunda, Apache Camel
- Choreographed: Eventuate, Seata, Axon Saga