
MongoDB 2.8 Replication Internals: Fitting it all together


MongoDB replication internal architecture for 2.8

Abstract:
Replication in MongoDB requires deep integration with almost every part of the codebase and has important hooks into systems such as storage, indexing, command processing, and querying. Most of the replication components have recently had a major overhaul to make further improvements possible. This talk covers what those pieces are, how they interact, and the interesting choices made during their design. It examines the replication protocols (really commands), writes and write concern enforcement, consensus behaviors (elections, leader/follower roles, majority), and the details of oplog generation and application on replicas. While a large part of the talk is a technical overview of the big pieces, we will dive into many important areas to ensure better understanding. The audience will be able to greatly affect which areas we focus on during the session, so come with ideas and a focus.

Published in: Technology

MongoDB 2.8 Replication Internals: Fitting it all together

  1. Replication Internals: Fitting Everything Together
  2. 2.8, Refactored
     ● Architecture as of 2.8
     ● Unit testable; more, and faster, C++ tests
     ● Many changes (heartbeats, locking, future)
     ● Interop with 2.6
     ● Larger replica sets
  3. Large Blocks
     ● Topology Manager (state machine)
     ● Replication Coordinator (repl facade)
     ● Applier (replicate/apply oplog)
     ● Executor (network, heartbeats, serialization)
     ● Commands (re-config, init, status, etc.)
     ● External (writes, storage, query, commands)
  4. Blocks (diagram): CFG, Topology Manager, Applier, Replication Coordinator, Oplog, CMDs, Writes, Query, Executor
  5. Blocks (diagram): CFG, Topology Manager, Applier, Replication Coordinator, Oplog, CMDs, Writes, Query, Executor
  6. Topology
     ● Maintains Authoritative State
       o Heartbeat, ping, member state
       o Roles and transitions
     ● Contains Decision Logic
     ● Unit Testable
     ● Serial Access
     (diagram highlight: CFG, Topology Manager)
  7. Examples
     ● updateConfig
     ● prepare*Response for commands
     ● getSyncSource, *
     ● setFollowerMode (state)
     ● processHeartbeat
     ● prepareHeartbeatResponse
  8. PrepareHeartbeatResponse

         Status TopologyCoordinatorImpl::prepareHeartbeatResponse(...) {
             // Check error conditions, then set response fields
             …
             response->setElectable(!_getMyUnelectableReason(...));
             response->setHbMsg(_getHbmsg(...));
             response->setTime(...);
             response->setOpTime(lastOpApplied);
             if (!_syncSource) {
                 response->setSyncingTo(_syncSource);
             }
             …

     Source: topology_coordinator_impl.cpp:628
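
The slide's snippet elides its arguments and error handling, so it cannot be compiled as shown. As a rough illustration of why this kind of function is easy to unit test, here is a minimal sketch that builds a heartbeat response purely from in-memory state. The `HeartbeatResponse` and `NodeView` types and their fields are hypothetical stand-ins, not MongoDB's real classes, and the sketch assumes the sync source is reported only when one is set.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins; these are NOT MongoDB's real types.
struct HeartbeatResponse {
    bool electable = false;
    std::string hbMsg;
    std::uint64_t opTime = 0;
    std::string syncingTo;  // empty when this node has no sync source
};

struct NodeView {
    std::string unelectableReason;  // empty means the node is electable
    std::string hbMsg;
    std::uint64_t lastOpApplied = 0;
    std::string syncSource;         // empty when not syncing from anyone
};

// A pure function of topology state: no network or disk access,
// so a test can call it directly with hand-built inputs.
HeartbeatResponse prepareHeartbeatResponse(const NodeView& node) {
    HeartbeatResponse response;
    response.electable = node.unelectableReason.empty();
    response.hbMsg = node.hbMsg;
    response.opTime = node.lastOpApplied;
    if (!node.syncSource.empty()) {  // assumption: report only a set sync source
        response.syncingTo = node.syncSource;
    }
    return response;
}
```

A test can fill a `NodeView`, call the function, and compare fields, with no sockets or threads involved.
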
  9. Failover Scenario (diagram labels: Heartbeats, P, S, Health Check (rsHB), Active Primary, S)
  10. Failover Scenario (diagram labels: Heartbeats, P, S, Active Primary, Failed, S)
  11. Failover Scenario (diagram labels: Heartbeats, Failed, P, Health Check (rsHB), S)
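
The failover diagrams show members exchanging heartbeats and a member being marked failed. A minimal sketch of that decision, under stated assumptions: a member counts as down when no heartbeat reply has arrived within a timeout, and a node stands for election only when it sees no live primary and can see a strict majority of the set. The `Member` type, the function names, and the timeout handling are all hypothetical; the real election logic has more conditions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical view of another replica set member.
struct Member {
    bool isPrimary = false;
    Clock::time_point lastHeartbeat;  // when the last heartbeat reply arrived
};

// A member counts as down when no heartbeat reply arrived within `timeout`.
bool isDown(const Member& m, Clock::time_point now,
            std::chrono::milliseconds timeout) {
    return now - m.lastHeartbeat > timeout;
}

// True when no live primary is visible and this node, counting itself,
// can see a strict majority of the whole set.
bool shouldStandForElection(const std::vector<Member>& others,
                            Clock::time_point now,
                            std::chrono::milliseconds timeout) {
    std::size_t up = 1;  // this node itself
    bool primaryAlive = false;
    for (const Member& m : others) {
        if (isDown(m, now, timeout)) continue;
        ++up;
        if (m.isPrimary) primaryAlive = true;
    }
    const std::size_t setSize = others.size() + 1;
    return !primaryAlive && up * 2 > setSize;
}
```

The majority requirement is what keeps a node that is cut off from the rest of the set from electing itself.
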
  12. Blocks (diagram): CFG, Topology Manager, Applier, Replication Coordinator, Oplog, CMDs, Writes, Query, Executor
  13. Replication Coordinator
      ● Interface to other subsystems
      ● Uses executor to schedule
        o Commands
        o Elections, Initiate, Reconfig
        o Role/State Changes
      ● Unit Testable
        o With help, requires mocking out bridge for subsystems
      (diagram highlight: Replication Coordinator)
  14. Blocks (diagram): Applier, Replication Coordinator, CFG, Oplog, CMDs, Writes, Query, Executor, Topology Manager
  15. Examples
      ● process*Response for commands
      ● awaitReplication* (for writes or migration)
      ● isReplEnabled
      ● canAcceptWrites*
  16. Accepting writes

          static bool checkIsMasterForDatabase(const std::string& db, ...) {
              if (!getReplicationCoordinator()->canAcceptWritesForDatabase(db)) {
                  errorDetail->setErrCode(ErrorCodes::NotMaster);
                  errorDetail->setErrMessage("Not primary while writing to " + ns);
                  return false;
              }
              return true;
          }
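
The slide's function depends on elided parameters (`ns`, `errorDetail`) and on the real coordinator. As a self-contained sketch of the same gate, the code below uses a hypothetical `ReplCoordinatorSketch` class and `checkCanWrite` helper. It assumes a simplified rule: a standalone node or a primary accepts every write, and any other member accepts writes only to the unreplicated `local` database.

```cpp
#include <cassert>
#include <string>

// Hypothetical member states; real replica sets have more.
enum class MemberState { Primary, Secondary, Recovering };

// Hypothetical facade, loosely modeled on the slide's Replication Coordinator.
class ReplCoordinatorSketch {
public:
    ReplCoordinatorSketch(bool replEnabled, MemberState state)
        : replEnabled_(replEnabled), state_(state) {}

    bool isReplEnabled() const { return replEnabled_; }

    bool canAcceptWritesForDatabase(const std::string& db) const {
        if (!replEnabled_) return true;                   // standalone node
        if (state_ == MemberState::Primary) return true;  // primary takes all writes
        return db == "local";  // assumption: "local" stays writable everywhere
    }

private:
    bool replEnabled_;
    MemberState state_;
};

// Mirrors the slide's check, reporting failure through an out-parameter.
bool checkCanWrite(const ReplCoordinatorSketch& coord, const std::string& ns,
                   std::string* errMessage) {
    assert(errMessage != nullptr);
    const std::string db = ns.substr(0, ns.find('.'));  // "test.tags" -> "test"
    if (!coord.canAcceptWritesForDatabase(db)) {
        *errMessage = "Not primary while writing to " + ns;
        return false;
    }
    return true;
}
```

Keeping the decision behind one coordinator method means the write path does not need to know about member states at all.
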
  17. Blocks (diagram): CFG, Topology Manager, Applier, Replication Coordinator, Oplog, CMDs, Writes, Query, Executor
  18. Applier
      ● Reads from *upstream* oplog
      ● Applier operations transformations
      ● Mostly unchanged since 2.4
      ● Includes UpdatePosition commands
      (diagram highlight: Applier)
  19. Read + Apply Decoupled
      ● Background oplog reader thread (net)
      ● Pool of oplog applier threads (by collection)
      (diagram labels: Repl Source, Network, Buffer, Applier Pool, DB1, DB2, DB3, DB4, Local Oplog)
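
The "by collection" grouping on this slide can be sketched as a partitioning step: each buffered operation is routed to a bucket chosen by hashing its namespace, so all operations on one collection stay in their original order inside a single bucket. The `OplogOp` type and `partitionByNamespace` function are hypothetical, and the sketch stops at partitioning; handing each bucket to its own applier thread is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, minimal oplog operation: namespace plus a sequence number.
struct OplogOp {
    std::string ns;  // e.g. "test.tags"
    int seq;         // position in the fetched batch
};

// Splits a batch into `workers` buckets keyed by namespace hash.
// Operations on the same collection land in the same bucket and keep
// their relative order, so one thread per bucket can apply them safely.
std::vector<std::vector<OplogOp>> partitionByNamespace(
        const std::vector<OplogOp>& batch, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::vector<OplogOp>> buckets(workers);
    const std::hash<std::string> hasher;
    for (const OplogOp& op : batch) {
        buckets[hasher(op.ns) % workers].push_back(op);
    }
    return buckets;
}
```

Operations on different collections may be applied out of their original relative order under this scheme, which is only acceptable when they do not depend on each other.
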
  20. Replication Operations
      oplog entry (fields): o = update, o2 = query

          { "ns" : "test.tags",
            "op" : "u", "v" : 2, "ts": ...,
            "o2" : { "_id" : 1 },
            "o" : { "$set" : { "tags.4" : "e" } } }
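
To make the `o2` (query) and `o` (update) roles concrete, here is a minimal sketch that applies a `$set`-style update to an in-memory collection. Documents are modeled as flat maps from dotted path to string value, which is a simplification chosen for the example and not how documents are really stored. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

using Document = std::map<std::string, std::string>;  // dotted path -> value
using Collection = std::map<int, Document>;           // _id -> document

// Hypothetical in-memory form of an update ("op" : "u") oplog entry.
struct UpdateOp {
    std::string ns;  // "ns"
    int id;          // from "o2" : { "_id" : ... }, selects the document
    Document set;    // from "o" : { "$set" : { ... } }, fields to assign
};

// Assigns every field in `op.set` on the matching document.
// Returns false when no document has the requested _id.
bool applyUpdate(Collection& coll, const UpdateOp& op) {
    Collection::iterator it = coll.find(op.id);
    if (it == coll.end()) return false;
    for (const auto& field : op.set) {
        it->second[field.first] = field.second;  // plain assignment
    }
    return true;
}
```

Because the entry records the resulting value rather than a relative change, applying it a second time leaves the document unchanged, which is the idempotency property the Write Request slide relies on.
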
  21. Blocks (diagram): CFG, Topology Manager, Applier, Replication Coordinator, Oplog, CMDs, Writes, Query, Executor
  22. Executor
      ● Serializes access to Topology state
      ● Serializes global state changes wrt db writes
      ● Processes network requests in IO pool
      ● Supports event/signal notification
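
One common way to serialize access to shared state is to funnel every change through a single task queue that runs callbacks one at a time. The sketch below shows that idea only; the `SerialExecutor` class is hypothetical and omits the network I/O pool and event notification that the slide also lists.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical serial executor: tasks may be scheduled from any thread,
// but they run strictly one at a time, in FIFO order, on the thread
// that calls runReady().
class SerialExecutor {
public:
    void schedule(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(std::move(task));
    }

    // Drains the queue, including tasks scheduled by running tasks.
    // Returns how many tasks were executed.
    std::size_t runReady() {
        std::size_t ran = 0;
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (queue_.empty()) return ran;
                task = std::move(queue_.front());
                queue_.pop_front();
            }
            task();  // run outside the lock so the task can call schedule()
            ++ran;
        }
    }

private:
    std::mutex mutex_;
    std::deque<std::function<void()>> queue_;
};
```

State touched only from inside scheduled tasks needs no lock of its own, since no two tasks ever run concurrently.
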
  23. Write Request
      ● Sent by user
      ● Interpreted by command subsystem
      ● Checked by replication coordinator
      ● Executed
      ● Idempotent entry recorded in oplog
      ● ~ Replicated
      ● ~ Possibly verified during user write request
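
The last step, verifying replication during the user's write request, amounts to asking whether enough members have applied the write's oplog position. A minimal sketch, assuming positions are plain integers and that the primary's own position is included in the list; `majorityOf` and `writeConcernSatisfied` are hypothetical names, not the coordinator's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Smallest member count that forms a strict majority of a set of size n.
std::size_t majorityOf(std::size_t n) { return n / 2 + 1; }

// True once at least `w` members (the primary included) have applied
// an oplog position at or beyond `target`.
bool writeConcernSatisfied(const std::vector<std::uint64_t>& lastApplied,
                           std::uint64_t target, std::size_t w) {
    assert(w > 0);
    std::size_t reached = 0;
    for (std::uint64_t opTime : lastApplied) {
        if (opTime >= target) ++reached;
    }
    return reached >= w;
}
```

A caller waiting on a majority write concern would re-evaluate this check each time a member reports a new position.
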
  24. Write Request (diagram): Applier, Replication Coordinator, CFG, Oplog, CMDs, Writes, Query, Executor, Topology Manager
  25. ● Topology Manager (state machine)
      ● Replication Coordinator (repl facade)
      ● Applier (replicate/apply oplog)
      ● Executor (network, heartbeats, serialization)
      ● Commands (re-config, init, status, etc.)
      ● External (writes, storage, query, commands)
  26. Thanks. Questions?
