Advanced Replication Internals
Internals of replication in MongoDB. The slides cover replication source selection, the replication process, elections (and their rules), and oplog transformation.

This presentation was given at the MongoDB San Francisco conference.

Published in: Technology
Transcript

  • 1. Advanced Replication Internals
  • 2. Design and Goals
    Goals
    ■ Highly Available
    ■ Consistent Data
    ■ Automatic Failover
    ■ Multi-Region/DC
    ■ Dynamic Reads
    Design
    ● All DBs, each node
    ● Quorum/Election
    ● Smart clients
    ● Source selection
    ● Read Preferences
    ● Record operations
    ● Asynchronous
    ● Write/Replication acknowledgements
  • 3. Goal: High Availability
    ● Node Redundancy: Duplicate Data
    ● Record Write Operations
    ● Apply Write Operations
    ● Use capped collection called "oplog"
  • 4. Replication Operations: Insert
    ● oplog entry (fields):
      ○ op, o
    {
      "ns" : "test.gamma",
      "op" : "i",
      "v" : 2,
      "ts" : Timestamp(1350504342, 5),
      "o" : { "_id" : 2, "x" : "hi" }
    }
  • 5. Replication Operations: Update
    ● oplog entry (fields):
      ○ o = update, o2 = query
    {
      "ns" : "test.tags",
      "op" : "u",
      "v" : 2,
      "ts" : Timestamp(1368049619, 1),
      "o2" : { "_id" : 1 },
      "o" : { "$set" : { "tags.4" : "e" } }
    }
  • 6. Operation Transformation
    ● Idempotent (update by _id)
    ● Multi-update/delete (results in many ops)
    ● Array modifications (replacement)
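
A minimal sketch of the transformation idea on slide 6, assuming a multi-document `$inc` is logged as one per-document update that targets `_id` and sets the resulting value. The helpers `transform_multi_inc` and `apply_op` are hypothetical names for illustration and are not MongoDB APIs; only flat (non-dotted) fields are handled.

```python
def transform_multi_inc(ns, docs, matches, field, amount):
    """Expand one multi-document $inc into idempotent per-document oplog entries."""
    ops = []
    for doc in docs:
        if matches(doc):
            doc[field] = doc.get(field, 0) + amount  # apply the write on the primary
            ops.append({
                "ns": ns,
                "op": "u",
                "o2": {"_id": doc["_id"]},           # query: target exactly one document
                "o": {"$set": {field: doc[field]}},  # update: final value, not a delta
            })
    return ops


def apply_op(store, op):
    """Apply an oplog entry to a dict keyed by _id (assumption: flat fields only)."""
    if op["op"] == "i":
        store[op["o"]["_id"]] = dict(op["o"])
    elif op["op"] == "u":
        store[op["o2"]["_id"]].update(op["o"]["$set"])
```

Because each entry carries the final value rather than a delta, applying the same entry twice leaves a replica in the same state, which is what makes re-applying a batch safe.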
  • 7. Interchangeable
    ● All members maintain oplog + dbs
    ● All able to take over, or be used for same functions
  • 8. Replication Process
    ● Record oplog entry on write
    ● Idempotent entries
    ● Pulled by replicas
      1. Read over network
      2. Buffer locally
      3. Apply in batch
      4. Repeat
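
The pull loop on slide 8 can be sketched as follows. This is an illustration under simplifying assumptions, not the server's implementation: timestamps are plain integers, the source oplog is an in-memory list, and `replicate` and `apply_batch` are hypothetical names.

```python
from collections import deque


def replicate(source_oplog, last_applied_ts, apply_batch, batch_size=3):
    """Pull entries newer than last_applied_ts, buffer them, apply in batches."""
    buffer = deque()
    # 1. Read over network (simulated by iterating an in-memory list).
    for entry in source_oplog:
        if entry["ts"] > last_applied_ts:
            buffer.append(entry)  # 2. Buffer locally.
    # 3. Apply in batch, 4. repeat until the buffer drains.
    while buffer:
        batch = [buffer.popleft() for _ in range(min(batch_size, len(buffer)))]
        apply_batch(batch)
        last_applied_ts = batch[-1]["ts"]
    return last_applied_ts
```

The returned timestamp is the position from which the next pull would resume.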
  • 9. Read + Apply Decoupled
    ● Background oplog reader thread
    ● Pool of oplog applier threads (by collection)
    [Diagram labels: Repl Source, Network, Buffer, Applier Thread Pool (16), DB1, DB2, DB3, DB4, Local Oplog, Batch Complete]
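
A sketch of applying one batch with a pool of applier threads partitioned by collection, as slide 9 describes. The default of 16 workers comes from the slide's diagram; the function name `apply_batch_parallel` and the `apply_one` callback are assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def apply_batch_parallel(batch, apply_one, workers=16):
    """Group a batch by namespace and apply each group on its own worker."""
    by_ns = defaultdict(list)
    for entry in batch:
        by_ns[entry["ns"]].append(entry)  # oplog order is kept within a collection

    def run(entries):
        for entry in entries:
            apply_one(entry)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every group and re-raises any worker exception.
        list(pool.map(run, by_ns.values()))
```

Partitioning by collection keeps the order of operations within a collection while letting different collections proceed in parallel; the batch counts as complete only when every group has finished.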
  • 10. Replication Metrics
    "network": {
      "bytes": 103830503805,
      "readersCreated": 2248,
      "getmores": { "totalMillis": 257461206, "num": 2152267 },
      "ops": 7285440
    },
    "buffer": {
      "sizeBytes": 0,
      "maxSizeBytes": 268435456,
      "count": 0
    },
    "preload": {
      "docs": { "totalMillis": 0, "num": 0 },
      "indexes": { "totalMillis": 23142318, "num": 14560667 }
    },
    "apply": {
      "batches": { "totalMillis": 231847, "num": 1797105 },
      "ops": 7285440
    },
    "oplog": {
      "insertBytes": 106866610253,
      "insert": { "totalMillis": 1756725, "num": 7285440 }
    }
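
The counters on slide 10 are totals, so averages have to be derived. The sketch below copies a subset of the slide's numbers into a dict and computes a few ratios; the variable names and the choice of ratios are assumptions for illustration.

```python
# Subset of the counters shown on the slide.
repl = {
    "network": {
        "bytes": 103830503805,
        "getmores": {"totalMillis": 257461206, "num": 2152267},
        "ops": 7285440,
    },
    "apply": {
        "batches": {"totalMillis": 231847, "num": 1797105},
        "ops": 7285440,
    },
}


def avg_millis(timer):
    """Average milliseconds per event for a {totalMillis, num} counter pair."""
    return timer["totalMillis"] / timer["num"] if timer["num"] else 0.0


avg_getmore_ms = avg_millis(repl["network"]["getmores"])            # time per getmore
avg_batch_ms = avg_millis(repl["apply"]["batches"])                 # time per applied batch
ops_per_batch = repl["apply"]["ops"] / repl["apply"]["batches"]["num"]
bytes_per_op = repl["network"]["bytes"] / repl["network"]["ops"]
```

With these numbers a getmore takes a little under 120 ms on average and a batch contains roughly four operations.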
  • 11. Good Replication States
    ● Initial Sync
      ○ Record oplog start position
      ○ Clone/copy all dbs
      ○ Set minvalid, apply oplog since start
      ○ Build indexes
    ● Replication Batch: MinValid
  • 12. Goal: Consistent Data
    ● Single Master
    ● Quorum (majority)
    ● Ordered Oplog
  • 13. Consistent Data
    Why a single master?
  • 14. Election Events
    Election events:
    ● Primary failure
    ● Stepdown (manual)
    ● Reconfigure
    ● Quorum loss
  • 15. Election Nomination Disqualifications
    A replica will nominate itself unless:
    ● Priority: 0 or arbiter
    ● Not freshest
    ● Just stepped down (in unelectable state)
    ● Would be vetoed by anyone because
      ○ There is a Primary already
      ○ They don't have us in their config
      ○ Higher priority member out there
    ● Higher config version out there
  • 16. The Election
    Nomination:
    ● If it looks like a tie, sleep random time (unless first node)
    Voting:
    ● If all goes well, only one nominee
    ● All voting members vote for one nominee
    ● Majority of votes wins
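
The nomination rules and the majority vote from slides 15 and 16 can be sketched as below. This is a simplified model, not the server's election code: the `Member` fields, `may_nominate`, and `has_majority` are hypothetical names, optimes are plain integers, and the "they don't have us in their config" veto is left out.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    priority: float = 1.0
    arbiter: bool = False
    optime: int = 0
    config_version: int = 1
    is_primary: bool = False
    just_stepped_down: bool = False


def may_nominate(me, visible):
    """Return True if `me` passes the self-nomination checks listed on the slide."""
    others = [m for m in visible if m.name != me.name]
    data_bearing = [m for m in others if not m.arbiter]
    if me.priority == 0 or me.arbiter:
        return False                                   # can never be primary
    if me.just_stepped_down:
        return False                                   # still in unelectable state
    if any(m.optime > me.optime for m in data_bearing):
        return False                                   # not freshest
    if any(m.is_primary for m in others):
        return False                                   # veto: primary already exists
    if any(m.priority > me.priority for m in data_bearing):
        return False                                   # veto: higher priority member
    if any(m.config_version > me.config_version for m in others):
        return False                                   # higher config version out there
    return True


def has_majority(votes_for, total_voters):
    """A nominee wins with a strict majority of all voting members."""
    return votes_for > total_voters // 2
```

Counting the majority against all voting members, not only the reachable ones, is what prevents two partitions from each electing a primary.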
  • 17. Goal: Automatic Failover
    ● Single Master
    ● Smart Clients
    ● Discovery
  • 18. Discovery
    isMaster command:
      setName: <name>,
      ismaster: true, secondary: false, arbiterOnly:
      hosts: [ <visible nodes> ],
      passives: [ <prio:0 nodes> ],
      arbiters: [ <nodes> ],
      primary: <active primary>,
      tags: {<tags>},
      me: <me>
  • 19. Failover Scenario
    [Diagram labels: Client; P, S, S; Discovery (isMaster); Active Primary]
  • 20. Failover Scenario
    [Diagram labels: Client; P, S, S; Active Primary; Failed Primary]
  • 21. Failover Scenario
    [Diagram labels: Client; Failed, P, S; Discovery (isMaster); Active Primary]
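
A sketch of how a smart client might use isMaster replies to rediscover the primary after a failover, as in the scenario above. The reply fields (`ismaster`, `primary`, `hosts`, `passives`) are the ones listed on slide 18; `find_primary` and the `is_master` callable are hypothetical stand-ins for a driver's network layer, not a real driver API.

```python
def find_primary(seeds, is_master):
    """Walk the set from seed hosts until a member reports ismaster: true.

    `is_master(host)` must return the reply as a dict or raise ConnectionError.
    """
    pending = list(seeds)
    seen = set()
    while pending:
        host = pending.pop(0)
        if host in seen:
            continue
        seen.add(host)
        try:
            reply = is_master(host)
        except ConnectionError:
            continue                      # failed member: try the next candidate
        if reply.get("ismaster"):
            return host
        hinted = reply.get("primary")
        if hinted:
            pending.insert(0, hinted)     # follow the member's view of the primary first
        pending.extend(reply.get("hosts", []) + reply.get("passives", []))
    return None                           # no primary visible right now
```

A client would typically retry this after a short delay when it returns `None`, since an election may still be in progress.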
  • 22. Replication Source Selection
    ● Select closest source
      ○ Limit to members that are neither hidden nor slave delayed
      ○ If nothing, try again with hidden/slave delayed
      ○ Select node with fastest "ping" time
      ○ Must be fresher
    ● Choose source when
      ○ Starting
      ○ Any error with existing source (network, query)
      ○ Any member is 30s ahead of current source
    ● Manual override
      ○ replSetSyncSource -- good until we choose again
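
The selection rules on slide 22 can be sketched as a two-pass search. The candidate fields (`optime`, `ping_ms`, `hidden`, `slave_delay`) and both function names are assumptions for illustration; optimes are treated as seconds.

```python
def choose_sync_source(my_optime, candidates):
    """Pick the fresher member with the lowest ping, preferring normal members."""

    def pick(pool):
        fresher = [c for c in pool if c["optime"] > my_optime]  # must be fresher
        return min(fresher, key=lambda c: c["ping_ms"], default=None)

    normal = [c for c in candidates
              if not c.get("hidden") and not c.get("slave_delay")]
    # First pass: normal members only. Second pass: hidden/delayed members too.
    return pick(normal) or pick(candidates)


def should_rechoose(source_optime, member_optimes, max_lead_secs=30):
    """True if any member is at least max_lead_secs ahead of the current source."""
    return any(t - source_optime >= max_lead_secs for t in member_optimes)
```

The 30-second threshold comes from the slide; whether the comparison is inclusive is a guess here.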
  • 23. Goal: Datacenter Aware
    ● Dynamic replication topologies
    ● Beachhead data center server
    [Diagram label: P]
  • 24. Goal: Dynamic Reads
    Controls for consistency
    ● Default to Primary
    ● Non-primary allowed
    ● Based on
      ○ Locality (ping/tags)
      ○ Tags
    [Diagram labels: Client; S, P, S; Tags: A,B; Tags: B, C]
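
A sketch of the read-routing controls on slide 24: reads default to the primary, and non-primary reads are chosen by tags and ping time. Tags are modeled as simple label sets to match the slide's diagram, and the member fields and `select_for_read` are hypothetical; this is not a driver's actual selection algorithm.

```python
def select_for_read(members, mode="primary", tags=frozenset()):
    """Choose a member for a read.

    mode: "primary" (default), "secondary", or "nearest".
    tags: labels a non-primary candidate must all carry.
    """
    if mode == "primary":
        pool = [m for m in members if m["state"] == "P"]
    else:
        pool = [m for m in members
                if (mode == "nearest" or m["state"] == "S")
                and set(tags) <= m["tags"]]
    # Locality: among eligible members, prefer the lowest ping time.
    return min(pool, key=lambda m: m["ping_ms"], default=None)
```

Reading from a non-primary trades consistency for locality, since replication is asynchronous and a secondary may lag behind the primary.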
  • 25. Asynchronous Replication
    ● Important considerations
    ● Additional requirements
    ● System/Application controls
  • 26. Write Propagation
    ● Write Concern
    ● Replication requirements
    ● Timing
    ● Dynamic requirements
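
The slide names write concern without detail, so the following is only a sketch of the general idea: a write counts as acknowledged once enough members have applied it. The function `is_acknowledged`, the integer timestamps, and the treatment of every listed member as a voter are assumptions for illustration.

```python
def is_acknowledged(write_ts, applied_ts_by_member, w):
    """Check whether a write at write_ts satisfies write concern w.

    w is a member count or the string "majority".
    """
    total = len(applied_ts_by_member)
    needed = total // 2 + 1 if w == "majority" else w
    acked = sum(1 for ts in applied_ts_by_member.values() if ts >= write_ts)
    return acked >= needed
```

Because replication is asynchronous, a caller would poll a check like this, usually with a timeout, until the requirement is met.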
  • 27. Exceptional Conditions
    ● Multiple Primaries
    ● Rollback
    ● Too stale
  • 28. Design and Goals
    Goals
    ■ Highly Available
    ■ Consistent Data
    ■ Automatic Failover
    ■ Multi-Region/DC
    ■ Dynamic Reads
    Design
    ● All DBs, each node
    ● Quorum/Election
    ● Smart clients
    ● Source selection
    ● Read Preferences
    ● Record operations
    ● Asynchronous
    ● Write/Replication acknowledgements
  • 29. Thanks
    Questions?
