Andy Schwerin
Lead Engineer, MongoDB
Replication Internals: The Life of a Write
Published in: Technology

  1. Andy Schwerin, Lead Engineer, MongoDB
  2. • Goals of Replication • Replication Architecture • A representative write
  3. • High availability for processing reads and writes: automatic leader election
     • Support many network topologies: tag sets
     • Accessible consistency model: ordered operation log
     • Client can trade latency for durability: write concern
  4. OPLOG: { ts: 4, op: "i", ns: "d.c", o: { _id: 10, name: "john" } } …
  5. [Diagram: PRIMARY OPLOG 4; SECONDARY OPLOG 8 9; SECONDARY OPLOG 4 5] When a secondary oplog is not a prefix of the primary oplog…
  6. w:?
  7. w:1: could lose the write when the primary disappears, without notification.
  8. w:majority: over half of the nodes must fail to lose the write. And, an outside operator must intervene before new writes are
  9. w:all: all nodes have the write before the primary responds. But, cannot complete writes if any nodes are
  10. [Diagram: OPLOG, d.c, OPLOG; P TS:6, S1 TS:6, S2 TS:2] d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}})
      (1) Fetch oplog entries (2) Apply to collections (3) Write to local oplog (4) Notify primary (5) Repeat
  11. [Diagram: OPLOG, OBSERVER, BATCH, BATCH, PREFETCH, APPLIER, BATCH, x.y, d.c, d.c, OPLOG; P TS:6, S1 TS:6, S2 TS:2] d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}})
  12. [Diagram: OPLOG, d.c; P TS:4, S1 TS:2, S2 TS:2] d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}}) produces the oplog entry { ts: 4, op: "i", ns: "d.c", o: { _id: 10, name: "john" } }
  13. [Diagram: OPLOG, OBSERVER, BATCH, d.c, OPLOG; P TS:6, S1 TS:2, S2 TS:2] d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}})
  14. [Diagram: OPLOG, BATCH, d.c, OPLOG, OBSERVER; P TS:6, S1 TS:2, S2 TS:2] d.c.insert({_id: 10, name: 'john'}, {wC: {w: 2}})
  15. [Diagram: OBSERVER, BATCH, BATCH, PREFETCH, OPLOG; readers allowed]
      • Split batch into arbitrary work units
      • Assign work to prefetch threads
      • Entries processed in any order
      • All while admitting readers
  16. [Diagram: OBSERVER, BATCH, BATCH, PREFETCH, OPLOG, BATCH, x.y, d.c, APPLIER; readers allowed]
      • Assign entries to workers by target collection
      • Disable schema constraints
  17. [Diagram: OBSERVER, BATCH, BATCH, PREFETCH, OPLOG, BATCH, x.y, d.c, APPLIER; readers excluded]
      • Concurrency control excludes readers
      • Oplog entries applied in timestamp order
  18. [Diagram: OBSERVER, BATCH, BATCH, PREFETCH, OPLOG, BATCH, x.y, d.c, APPLIER; readers excluded]
      • Concurrency control excludes readers
      • Oplog entries applied in timestamp order
  19. [Diagram: OBSERVER, BATCH, BATCH, PREFETCH, OPLOG, BATCH, x.y, d.c, APPLIER; readers excluded]
      • Concurrency control excludes readers
      • Oplog entries applied in timestamp order
  20. [Diagram: OBSERVER, BATCH, BATCH, PREFETCH, APPLIER, BATCH, x.y, d.c, OPLOG; readers allowed]
      • Readmit readers
      • Move entries from batch to oplog
      • Begin processing next batch
  21. [Diagram: OPLOG, OBSERVER, BATCH, BATCH, PREFETCH, APPLIER, BATCH, x.y, d.c, d.c, OPLOG; readers allowed; P TS:6, S1 TS:6, S2 TS:2]
  22. [Diagram: OPLOG, d.c; P TS:6, S1 TS:6, S2 TS:2]
      • Consults list of waiting clients
      • Looks for those waiting for ts:6 or earlier on S1
      • Sends acknowledgement!
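
The write traced in slides 10 to 22 can be sketched as a small in-memory simulation. This is only an illustration of the steps named on the slides (fetch oplog entries, apply to collections, write to local oplog, notify primary, then acknowledge waiting clients), not MongoDB's implementation. The class names `Primary` and `Secondary`, their methods, the `client_id` argument, and the use of a simple counter as the timestamp are all assumptions made for the example. Prefetching and the exclusion of readers during the apply phase are left out.

```python
from collections import defaultdict


class Primary:
    """Toy primary (hypothetical): an oplog, per-secondary progress, waiting clients."""

    def __init__(self, secondary_names):
        self.oplog = []                        # ordered oplog entries
        self.collections = defaultdict(dict)   # ns -> {_id: document}
        self.progress = {n: 0 for n in secondary_names}  # last ts each secondary reported
        self.waiters = []                      # (ts, w, client_id) not yet acknowledged
        self.acked = []                        # client ids acknowledged, in order

    def insert(self, ns, doc, w, client_id):
        ts = len(self.oplog) + 1               # assumption: counter stands in for a timestamp
        self.collections[ns][doc["_id"]] = doc
        self.oplog.append({"ts": ts, "op": "i", "ns": ns, "o": doc})
        self.waiters.append((ts, w, client_id))
        self._acknowledge()                    # w:1 is satisfied by the primary alone
        return ts

    def fetch_after(self, ts):
        return [e for e in self.oplog if e["ts"] > ts]

    def notify(self, secondary_name, ts):
        self.progress[secondary_name] = ts
        self._acknowledge()

    def _acknowledge(self):
        # Consult the list of waiting clients; acknowledge those whose write
        # is now held by at least w nodes (the primary counts as one).
        still_waiting = []
        for ts, w, client_id in self.waiters:
            holders = 1 + sum(1 for p in self.progress.values() if p >= ts)
            if holders >= w:
                self.acked.append(client_id)
            else:
                still_waiting.append((ts, w, client_id))
        self.waiters = still_waiting


class Secondary:
    """Toy secondary (hypothetical) running one pass of the replication loop."""

    def __init__(self, name):
        self.name = name
        self.oplog = []
        self.collections = defaultdict(dict)

    def last_ts(self):
        return self.oplog[-1]["ts"] if self.oplog else 0

    def replicate_once(self, primary):
        batch = primary.fetch_after(self.last_ts())    # 1. fetch oplog entries
        if not batch:
            return
        by_ns = defaultdict(list)                      # group by target collection
        for entry in batch:
            by_ns[entry["ns"]].append(entry)
        for ns, entries in by_ns.items():              # 2. apply to collections
            for entry in entries:                      #    (timestamp order within each ns)
                if entry["op"] == "i":
                    self.collections[ns][entry["o"]["_id"]] = entry["o"]
        self.oplog.extend(batch)                       # 3. write to local oplog
        primary.notify(self.name, self.last_ts())      # 4. notify primary


primary = Primary(["S1", "S2"])
s1 = Secondary("S1")
primary.insert("d.c", {"_id": 10, "name": "john"}, w=2, client_id="client-1")
print(primary.acked)   # nothing acknowledged yet: only the primary has the write
s1.replicate_once(primary)
print(primary.acked)   # acknowledged once S1 reports the write
```

In this sketch a `w=2` write is acknowledged as soon as one secondary reports a timestamp at or past the write, which mirrors the final slide: the primary looks for clients waiting on ts:6 or earlier on S1 and sends the acknowledgement.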