Replication Internals: The Life of a Write

Transcript

  • 1. Andy Schwerin, Lead Engineer, MongoDB
  • 2.
    • Goals of Replication
    • Replication Architecture
    • A representative write
  • 3.
    • High availability for processing reads and writes
      – Automatic leader election
    • Support many network topologies
      – Tag sets
    • Accessible consistency model
      – Ordered operation log
    • Client can trade latency for durability
      – Write concern
  • 4. OPLOG: { ts: 4, op: "i", ns: "d.c", o: { _id: 10, name: "john" } } …
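As a rough illustration of how an entry like the one on slide 4 could be replayed, the Python sketch below models a collection as a dict keyed by `_id`. `apply_entry` and the in-memory `collections` store are hypothetical names for illustration, not MongoDB internals, and only the insert op (`"i"`) shown on the slide is handled.

```python
# Hypothetical in-memory model: {namespace: {_id: document}}.
def apply_entry(collections, entry):
    """Replay one oplog entry against the in-memory collections."""
    if entry["op"] != "i":
        raise ValueError(f"unsupported op: {entry['op']!r}")
    doc = entry["o"]
    # Keying by _id means replaying the same insert twice leaves one copy.
    collections.setdefault(entry["ns"], {})[doc["_id"]] = dict(doc)
    return collections


# The entry from slide 4, written as a Python dict.
entry = {"ts": 4, "op": "i", "ns": "d.c", "o": {"_id": 10, "name": "john"}}
collections = apply_entry({}, entry)
```

In this model, applying the same entry a second time leaves the collection unchanged, which is convenient when a batch has to be replayed.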
  • 5. [Diagram labels: PRIMARY OPLOG 4; SECONDARY OPLOG 8 9; SECONDARY OPLOG 4 5] When a secondary oplog is not a prefix of the primary oplog…
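The prefix condition on slide 5 can be checked mechanically. The sketch below represents each oplog as a list of timestamps and reports whether a secondary's log is a prefix of the primary's, and where the two logs stop agreeing. The function names and the sample timestamps are assumptions made for illustration; the slide does not spell out what happens after the condition is detected, so the sketch stops at detection.

```python
def common_prefix_length(secondary_ts, primary_ts):
    """Number of leading entries on which the two oplogs agree."""
    n = 0
    for s, p in zip(secondary_ts, primary_ts):
        if s != p:
            break
        n += 1
    return n


def is_prefix(secondary_ts, primary_ts):
    """True when every secondary entry matches the primary, in order."""
    return common_prefix_length(secondary_ts, primary_ts) == len(secondary_ts)


# Illustrative timestamps only; they are not read off the slide's diagram.
primary = [1, 2, 3, 4, 5]
lagging = [1, 2, 3]        # behind, but still a prefix
diverged = [1, 2, 3, 8, 9]  # holds entries the primary never had
```

Here `lagging` only needs to fetch more entries, while `diverged` agrees with the primary for its first three entries and no further.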
  • 6. w:?
  • 7. w:1: Could lose the write when the primary disappears, without notification.
  • 8. w:majority: Over half of the nodes must fail to lose the write. And an outside operator must intervene before new writes are
  • 9. w:all: All nodes have the write before the primary responds. But writes cannot complete if any nodes are
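To make the trade-off on slides 7 to 9 concrete, the sketch below computes how many nodes must hold a write before the primary can respond, for a replica set of a given size. `acks_required` is a hypothetical helper, and treating the slide's `w:all` label as "every node in the set" is an assumption of this sketch.

```python
def acks_required(w, num_nodes):
    """Nodes that must hold the write before the primary responds."""
    if w == "majority":
        return num_nodes // 2 + 1   # strictly more than half
    if w == "all":                  # the slide's label, read as every node
        return num_nodes
    if isinstance(w, int) and 1 <= w <= num_nodes:
        return w                    # w:1 is the primary alone
    raise ValueError(f"unsupported write concern: {w!r}")


# A three-node set, like the P / S1 / S2 diagrams in these slides.
for w in (1, 2, "majority", "all"):
    print(w, acks_required(w, 3))
```

A larger value means the client waits for more nodes, which is the latency-for-durability trade mentioned on slide 3.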
  • 10. d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}}) [Diagram: OPLOG, d.c, OPLOG; P TS:6, S1 TS:6, S2 TS:2]
    1. Fetch oplog entries
    2. Apply to collections
    3. Write to local oplog
    4. Notify primary
    5. Repeat
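The five steps on slide 10 can be sketched as a single pass of a secondary's sync loop. This is a toy in-memory simulation: `Node`, `sync_once`, and the `reported` dict standing in for "notify primary" are all hypothetical names, and real replication fetches over the network rather than reading the primary's list directly.

```python
class Node:
    """Toy replica set member: an oplog plus in-memory collections."""

    def __init__(self, name):
        self.name = name
        self.oplog = []          # entries in timestamp order
        self.collections = {}    # {namespace: {_id: document}}

    def last_ts(self):
        return self.oplog[-1]["ts"] if self.oplog else 0


def sync_once(primary, secondary, reported):
    """One pass of steps 1-4; the caller repeats it (step 5)."""
    # 1. Fetch oplog entries newer than what this secondary already has.
    since = secondary.last_ts()
    batch = [e for e in primary.oplog if e["ts"] > since]
    # 2. Apply to collections (inserts only in this sketch).
    for e in batch:
        doc = e["o"]
        secondary.collections.setdefault(e["ns"], {})[doc["_id"]] = dict(doc)
    # 3. Write to the local oplog.
    secondary.oplog.extend(batch)
    # 4. Notify the primary of the new position.
    reported[secondary.name] = secondary.last_ts()
    return len(batch)


p, s1 = Node("P"), Node("S1")
p.oplog = [
    {"ts": 2, "op": "i", "ns": "d.c", "o": {"_id": 1, "name": "amy"}},
    {"ts": 6, "op": "i", "ns": "d.c", "o": {"_id": 10, "name": "john"}},
]
s1.oplog = p.oplog[:1]           # S1 starts at TS:2
reported = {}
sync_once(p, s1, reported)       # S1 catches up to TS:6
```

A second call to `sync_once` fetches nothing, since S1's last timestamp already matches the primary's.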
  • 11. d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}}) [Diagram: OPLOG, OBSERVER, BATCH, BATCH, PREFETCH, APPLIER, BATCH, x.y, d.c, d.c, OPLOG; P TS:6, S1 TS:6, S2 TS:2]
  • 12. d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}}) { ts: 4, op: "i", ns: "d.c", o: { _id: 10, name: "john" } } [Diagram: OPLOG, d.c; P TS:4, S1 TS:2, S2 TS:2]
  • 13. d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}}) [Diagram: OPLOG, OBSERVER, BATCH, d.c, OPLOG; P TS:6, S1 TS:2, S2 TS:2]
  • 14. d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}}) [Diagram: OPLOG, BATCH, d.c, OPLOG, OBSERVER; P TS:6, S1 TS:2, S2 TS:2]
  • 15. [Diagram: OBSERVER, BATCH, BATCH, PREFETCH, OPLOG; Allow readers]
    • Split batch into arbitrary work units
    • Assign work to prefetch threads
    • Entries processed in any order
    • All while admitting readers
  • 16. [Diagram: OBSERVER, BATCH, BATCH, PREFETCH, OPLOG, BATCH, x.y, d.c, APPLIER; Allow readers]
    • Assign entries to workers by target collection
    • Disable schema constraints
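One way to picture "assign entries to workers by target collection" is to hash each entry's namespace to a worker queue, so that all entries for one collection land on the same worker and keep their relative order. The sketch below is an assumption about how such partitioning might look, not MongoDB's actual scheme; `assign_to_workers` is a hypothetical name.

```python
import zlib


def assign_to_workers(batch, num_workers):
    """Partition a batch into per-worker queues keyed by namespace."""
    queues = [[] for _ in range(num_workers)]
    for entry in batch:
        # crc32 is stable across runs, unlike Python's salted str hash.
        idx = zlib.crc32(entry["ns"].encode("utf-8")) % num_workers
        queues[idx].append(entry)  # batch order is kept within a queue
    return queues


batch = [
    {"ts": 5, "op": "i", "ns": "x.y", "o": {"_id": 1}},
    {"ts": 6, "op": "i", "ns": "d.c", "o": {"_id": 10, "name": "john"}},
    {"ts": 7, "op": "i", "ns": "x.y", "o": {"_id": 2}},
]
queues = assign_to_workers(batch, 2)
```

Because a namespace always maps to the same queue, two workers never touch the same collection, and each collection still sees its entries in timestamp order.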
  • 17-19. [Diagram: OBSERVER, BATCH, BATCH, PREFETCH, OPLOG, BATCH, x.y, d.c, APPLIER; Exclude readers]
    • Concurrency control excludes readers
    • Oplog entries applied in timestamp order
  • 20. [Diagram: OBSERVER, BATCH, BATCH, PREFETCH, APPLIER, BATCH, x.y, d.c, OPLOG; Allow readers]
    • Readmit readers
    • Move entries from batch to oplog
    • Begin processing next batch
  • 21. [Diagram: OPLOG, OBSERVER, BATCH, BATCH, PREFETCH, APPLIER, BATCH, x.y, d.c, d.c, OPLOG; Allow readers; P TS:6, S1 TS:6, S2 TS:2]
  • 22. [Diagram: OPLOG, d.c; P TS:6, S1 TS:6, S2 TS:2]
    • Consults list of waiting clients
    • Looks for those waiting for ts:6 or earlier on S1
    • Sends acknowledgement!
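The acknowledgement step on slide 22 can be sketched as a scan over waiting clients each time a secondary reports progress. In the sketch below, `acknowledge`, the waiter dicts, and the client names are hypothetical, and counting the primary itself toward `w` is an assumption that matches `w:1` meaning the primary alone.

```python
def acknowledge(waiters, node_progress):
    """Split waiters into acknowledged clients and those still waiting.

    waiters: list of {"client": str, "ts": int, "w": int}
    node_progress: {node name: last timestamp that node has written}
    """
    acked, still_waiting = [], []
    for waiter in waiters:
        # Count nodes that have reached this write's timestamp.
        have = sum(1 for ts in node_progress.values() if ts >= waiter["ts"])
        if have >= waiter["w"]:
            acked.append(waiter["client"])
        else:
            still_waiting.append(waiter)
    return acked, still_waiting


# State from the slide: S1 has just reported TS:6, S2 is still at TS:2.
progress = {"P": 6, "S1": 6, "S2": 2}
waiters = [
    {"client": "client-a", "ts": 6, "w": 2},  # the w:2 insert from the slides
    {"client": "client-b", "ts": 6, "w": 3},  # would also need S2
]
acked, still_waiting = acknowledge(waiters, progress)
```

With this state the `w:2` client is acknowledged, while a client that asked for three nodes keeps waiting until S2 also reaches ts 6.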