Wait, isn't Cross-Shard Transactions a Sharding feature? What does it have to do with Replication? Well, in order to make sharded transactions work, we needed a distributed commit protocol to handle various failovers. One key stage of this protocol is making sure that all shards are “prepared” before they commit. As a result, the replication system needed to make a promise: a transaction that is in the “prepare” state will survive failovers. This means that we don’t corrupt your transactions and we don’t deadlock your system. We worry about those things so that you don’t have to. In this talk, we will follow the lifetime of a sharded transaction and show you how we keep your data safe. You’ll hear about design decisions, implementation details, and key behaviors from two of the engineers who built Prepare Support for Sharded Transactions.
7. Definition of commit
Committed Transaction: When all operations in a MongoDB
transaction are visible outside of the transaction
Majority Committed: When an operation is replicated to a majority
of the replica set
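The "majority committed" definition above can be illustrated with a small sketch. It assumes a simplified model in which each replica set member reports a single integer for the newest operation it has replicated; the function names are hypothetical, and real optimes and voting rules are more involved than this.

```python
def majority_commit_point(applied_optimes):
    """Return the newest optime that a majority of members have replicated.

    `applied_optimes` holds one integer per member (an assumption made
    for this sketch; it is not the server's actual optime format).
    """
    if not applied_optimes:
        raise ValueError("replica set has no members")
    majority = len(applied_optimes) // 2 + 1
    # Sorted newest first, the entry at index majority - 1 has been
    # reached by at least `majority` members.
    newest_first = sorted(applied_optimes, reverse=True)
    return newest_first[majority - 1]


def is_majority_committed(op_time, applied_optimes):
    """True if the operation at `op_time` is on a majority of members."""
    return op_time <= majority_commit_point(applied_optimes)
```

With three members at optimes 10, 8 and 5, two members have reached 8, so an operation at 8 is majority committed while one at 9 is not.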
12. Committing Transactions
There are two possibilities when committing on multiple shards:
• All shards successfully commit
• One or more shards fail to commit
We need something to help us deal with both cases
15. The Prepare State
A transaction in the prepared state is guaranteed to be able to
commit
All participating shards must prepare the transaction on a majority
before any shard can commit
If any shard fails to prepare, then no shard will commit
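The three rules on this slide describe a two-phase commit. The sketch below is a minimal in-memory model of that idea, not the server's implementation: the `Shard` class, its methods, and `run_two_phase_commit` are hypothetical names, and the majority replication of the prepare is only noted in a comment.

```python
from enum import Enum


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Shard:
    """Hypothetical stand-in for one shard participating in a transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "in_progress"

    def prepare(self):
        # Assumption: in a real deployment this would return True only
        # after the prepare has been replicated to a majority.
        if self.can_prepare:
            self.state = "prepared"
        return self.can_prepare

    def commit(self):
        if self.state != "prepared":
            raise RuntimeError(f"{self.name} cannot commit from {self.state}")
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_two_phase_commit(shards):
    """Commit on every shard only if every shard prepared; otherwise abort all."""
    # Phase 1: ask every shard to prepare (a list, so none are skipped).
    votes = [shard.prepare() for shard in shards]
    if not all(votes):
        for shard in shards:
            shard.abort()
        return Decision.ABORT
    # Phase 2: every shard is prepared, so each commit is expected to succeed.
    for shard in shards:
        shard.commit()
    return Decision.COMMIT
```

If one of three shards is created with `can_prepare=False`, the decision is `Decision.ABORT` and no shard ends up committed, which is the "if any shard fails to prepare, then no shard will commit" rule.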
21. The Prepare State
The storage engine only stores in-memory information about a
prepared transaction
Replication is in charge of the durability guarantees for prepare
35. Startup Recovery with Prepared Transactions
[Diagram: a sequence of Session State tables around the Consistent Point In Time; aeff42 remains Prepared while bdfe32 goes from Prepared to Committed.]
36. Startup Recovery with Prepared Transactions
This is safe because:
1. We cannot change prepared transactions
2. Prepare conflicts on the primary will prevent any conflicting
writes from succeeding
3. We don’t accept reads while in recovery
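One way to picture startup recovery is as a replay of oplog entries that rebuilds the in-memory session state. The sketch below assumes a hypothetical oplog format (dicts with `session` and `op` keys) and replays from the beginning rather than from a checkpoint, so it is an illustration of the idea only.

```python
def recover_session_states(oplog):
    """Rebuild a session-state table by replaying oplog entries in order.

    Each entry is a dict such as {"session": "aeff42", "op": "prepare"}.
    This format is an assumption for the sketch, not the real oplog schema.
    """
    states = {}
    for entry in oplog:
        session, op = entry["session"], entry["op"]
        if op == "prepare":
            states[session] = "Prepared"
        elif op == "commit":
            states[session] = "Committed"
        elif op == "abort":
            states[session] = "Aborted"
        # Other entry types do not change transaction state in this model.
    return states


oplog = [
    {"session": "aeff42", "op": "prepare"},
    {"session": "bdfe32", "op": "prepare"},
    {"session": "bdfe32", "op": "commit"},
]
states = recover_session_states(oplog)
```

After the replay, `states` maps aeff42 to Prepared and bdfe32 to Committed. Because the prepare entries live in the replicated oplog, the table can be rebuilt even though the storage engine kept the prepared state only in memory.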
43. Rollback (4.0+)
Stable Timestamp – Consistent majority committed point in time for
replication and storage
Common Point – Point in time after which the rollback node and the
new primary diverge
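The common point can be illustrated by comparing two oplogs. This sketch assumes each oplog is a list of hypothetical `(timestamp, term)` tuples sharing a common start and scans forward for simplicity; an actual rollback would more likely search backward from the newest entries.

```python
def find_common_point(rollback_oplog, primary_oplog):
    """Return the index of the last entry both oplogs share, or -1 if none."""
    index = -1
    for position, (mine, theirs) in enumerate(zip(rollback_oplog, primary_oplog)):
        if mine != theirs:
            break
        index = position
    return index


def entries_to_roll_back(rollback_oplog, primary_oplog):
    """Entries on the rolling-back node that come after the common point."""
    common = find_common_point(rollback_oplog, primary_oplog)
    return rollback_oplog[common + 1:]


# Entries are (timestamp, term); the two histories diverge after (2, 1).
rolling_back = [(1, 1), (2, 1), (3, 1), (4, 1)]
new_primary = [(1, 1), (2, 1), (3, 2)]
to_undo = entries_to_roll_back(rolling_back, new_primary)
```

Here the common point is the entry `(2, 1)`, and `to_undo` holds the two entries written on the diverged branch.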
49. Rollback with Prepared Transactions
[Diagram: oplog history marking stableTS and the common point, followed by two branches: the new primary branch and the rolling back branch.]
50. How do we handle all these scenarios?
The same way
51. Rollback with Prepared Transactions
[Diagram: the same history (stableTS, common point, new primary branch, rolling back branch) with Session State tables for transactions aeff42 and bdfe32, both shown as Prepared.]
52. Your data is safe!
We didn't lose your prepared transactions
We didn't corrupt your prepared data
53. Takeaways
Why we need the prepare state
Why prepare needs to be durable
How replication makes prepared transactions durable