
Replication, Durability, and Disaster Recovery


This session introduces the basic components of high availability before going into a deep dive on MongoDB replication. We'll explore some of the advanced capabilities with MongoDB replication and best practices to ensure data durability and redundancy. We'll also look at various deployment scenarios and disaster recovery configurations.


Replication, Durability, and Disaster Recovery

  1. 1. Replication & Durability
  2. 2. @spf13 AKA Steve Francia. 15+ years building the internet. Father, husband, skateboarder. Chief Solutions Architect @ 10gen, responsible for drivers, integrations, web & docs.
  3. 3. Agenda • Intro to replication • How MongoDB does Replication • Configuring a Replica Set • Advanced Replication • Durability • High Availability Scenarios
  4. 4. Replication
  5. 5. Use cases
  6. 6. Use cases • High Availability (auto-failover)
  7. 7. Use cases • High Availability (auto-failover) • Read Scaling (extra copies to read from)
  8. 8. Use cases • High Availability (auto-failover) • Read Scaling (extra copies to read from) • Backups • Online • Delayed Copy (fat finger) • Point in Time (PiT) backups
  9. 9. Use cases • High Availability (auto-failover) • Read Scaling (extra copies to read from) • Backups • Online • Delayed Copy (fat finger) • Point in Time (PiT) backups • Use (hidden) replica for secondary workload • Analytics • Data-processing • Integration with external systems
  10. 10. Types of outage
  11. 11. Types of outage. Planned: • Hardware upgrade • O/S or file-system tuning • Relocation of data to new file-system / storage • Software upgrade
  12. 12. Types of outage. Planned: • Hardware upgrade • O/S or file-system tuning • Relocation of data to new file-system / storage • Software upgrade. Unplanned: • Hardware failure • Data center failure • Region outage • Human error • Application corruption
  13. 13. Replica Set features
  14. 14. Replica Set features • A cluster of N servers
  15. 15. Replica Set features • A cluster of N servers • Any (one) node can be primary
  16. 16. Replica Set features • A cluster of N servers • Any (one) node can be primary • Consensus election of primary
  17. 17. Replica Set features • A cluster of N servers • Any (one) node can be primary • Consensus election of primary • Automatic failover
  18. 18. Replica Set features • A cluster of N servers • Any (one) node can be primary • Consensus election of primary • Automatic failover • Automatic recovery
  19. 19. Replica Set features • A cluster of N servers • Any (one) node can be primary • Consensus election of primary • Automatic failover • Automatic recovery • All writes to primary
  20. 20. Replica Set features • A cluster of N servers • Any (one) node can be primary • Consensus election of primary • Automatic failover • Automatic recovery • All writes to primary • Reads can be to primary (default) or a secondary
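The last bullet is the one most applications notice. A minimal mongo shell sketch of opting in to secondary reads, assuming a hypothetical orders collection (by default every query goes to the primary):

      // A sketch only; "orders" is an illustrative collection name.
      rs.slaveOk()                          // allow this connection to read from secondaries
                                            // (equivalent to db.getMongo().setSlaveOk())
      db.orders.find({ status : "open" })   // this query may now be served by a secondary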
  21. 21. How MongoDB Replication Works
  22. 22. How MongoDB Replication works [diagram: Member 1, Member 2, Member 3] • Set is made up of 2 or more nodes
  23. 23. How MongoDB Replication works [diagram: Member 1, Member 3, Member 2 (Primary)] • Election establishes the PRIMARY • Data replication from PRIMARY to the secondaries
  24. 24. How MongoDB Replication works [diagram: Member 1 and Member 3 negotiate a new master; Member 2 DOWN] • PRIMARY may fail • Automatic election of a new PRIMARY if a majority of members remain
  25. 25. How MongoDB Replication works [diagram: Member 1 (Primary), Member 3, Member 2 DOWN] • New PRIMARY elected • Replica Set re-established
  26. 26. How MongoDB Replication works [diagram: Member 1 (Primary), Member 3, Member 2 Recovering] • Automatic recovery
  27. 27. How MongoDB Replication works [diagram: Member 1 (Primary), Member 2, Member 3] • Replica Set re-established
  28. 28. How Is Data Replicated?
  29. 29. How Is Data Replicated? • Change operations are written to the oplog • The oplog is a capped collection (fixed size) • Must have enough space to allow new secondaries to catch up (from scratch or from a backup) • Must have enough space to cope with any applicable slaveDelay
  30. 30. How Is Data Replicated? • Change operations are written to the oplog • The oplog is a capped collection (fixed size) • Must have enough space to allow new secondaries to catch up (from scratch or from a backup) • Must have enough space to cope with any applicable slaveDelay • Secondaries query the primary's oplog and apply what they find • All replicas contain an oplog
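For a concrete look at the oplog described above, you can peek at it from the mongo shell on any member; this is a read-only inspection sketch, not something applications normally do:

      // The oplog lives in the "local" database of every replica set member.
      use local
      db.oplog.rs.find().sort({ $natural : -1 }).limit(1)   // most recently replicated operation
      db.printReplicationInfo()                             // configured oplog size and the time window it covers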
  31. 31. Configuring a Replica Set
  32. 32. Creating a Replica Set
      $ ./mongod --replSet <name>
      > cfg = {
          _id : "<name>",
          members : [
            { _id : 0, host : "sf1.acme.com" },
            { _id : 1, host : "sf2.acme.com" },
            { _id : 2, host : "sf3.acme.com" }
          ]
        }
      > use admin
      > rs.initiate(cfg)
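One hedged way to try this on a single machine is to start three mongod processes that share the same --replSet name before running rs.initiate(); the set name, ports, data paths, and log files below are purely illustrative:

      $ ./mongod --replSet myset --port 27017 --dbpath /data/rs1 --fork --logpath /data/rs1.log
      $ ./mongod --replSet myset --port 27018 --dbpath /data/rs2 --fork --logpath /data/rs2.log
      $ ./mongod --replSet myset --port 27019 --dbpath /data/rs3 --fork --logpath /data/rs3.log

Then connect a mongo shell to one of them, build the cfg document with the matching host:port values, and call rs.initiate(cfg) as above.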
  33. 33. Managing a Replica Set
  34. 34. Managing a Replica Set • rs.conf() Shell helper: get current configuration
  35. 35. Managing a Replica Set • rs.conf() Shell helper: get current configuration • rs.initiate(<cfg>) Shell helper: initiate replica set
  36. 36. Managing a Replica Set • rs.conf() Shell helper: get current configuration • rs.initiate(<cfg>) Shell helper: initiate replica set • rs.reconfig(<cfg>) Shell helper: reconfigure a replica set
  37. 37. Managing a Replica Set • rs.conf() Shell helper: get current configuration • rs.initiate(<cfg>) Shell helper: initiate replica set • rs.reconfig(<cfg>) Shell helper: reconfigure a replica set • rs.add("hostname:<port>") Shell helper: add a new member
  38. 38. Managing a Replica Set • rs.conf() Shell helper: get current configuration • rs.initiate(<cfg>) Shell helper: initiate replica set • rs.reconfig(<cfg>) Shell helper: reconfigure a replica set • rs.add("hostname:<port>") Shell helper: add a new member • rs.remove("hostname:<port>") Shell helper: remove a member
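A short sketch of these helpers used together; the fourth hostname extends the sf*.acme.com examples from the earlier slide and the priority tweak is illustrative:

      rs.add("sf4.acme.com:27017")        // add a new data-bearing member
      rs.remove("sf3.acme.com:27017")     // drop an existing member
      cfg = rs.conf()                     // fetch the current configuration...
      cfg.members[1].priority = 5         // ...modify it...
      rs.reconfig(cfg)                    // ...and apply it (may trigger an election)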
  39. 39. Managing a Replica Set
  40. 40. Managing a Replica Set • rs.status() Reports status of the replica set from one node's point of view
  41. 41. Managing a Replica Set • rs.status() Reports status of the replica set from one node's point of view • rs.stepDown(<secs>) Request the primary to step down
  42. 42. Managing a Replica Set • rs.status() Reports status of the replica set from one node's point of view • rs.stepDown(<secs>) Request the primary to step down • rs.freeze(<secs>) Prevents any changes to the current replica set configuration (primary/secondary status); use during backups
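A minimal usage sketch for these helpers; the durations are arbitrary:

      rs.status()        // state, health, and optime of every member, from this node's view
      rs.stepDown(60)    // run on the primary: step down and stay ineligible for 60 seconds
      rs.freeze(120)     // run on a secondary: do not seek election for 120 seconds (e.g. during a backup)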
  43. 43. Advanced Replication
  44. 44. Lots of Features • Delayed • Hidden • Priorities • Tags
  45. 45. Slave Delay
  46. 46. Slave Delay • Lags behind master by a configurable time delay
  47. 47. Slave Delay • Lags behind master by a configurable time delay • Automatically hidden from clients
  48. 48. Slave Delay • Lags behind master by a configurable time delay • Automatically hidden from clients • Protects against operator errors • Fat fingering • Application corrupts data
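A hedged configuration sketch for a delayed member; the member index and one-hour delay are illustrative. A delayed member must have priority 0, and in this era of MongoDB the field was named slaveDelay:

      cfg = rs.conf()
      cfg.members[3].priority   = 0       // a delayed member can never become primary
      cfg.members[3].hidden     = true    // keep it out of client reads
      cfg.members[3].slaveDelay = 3600    // apply operations one hour behind the primary
      rs.reconfig(cfg)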
  49. 49. Other member types
  50. 50. Other member types • Arbiters • Don't store a copy of the data • Vote in elections • Used as a tie breaker
  51. 51. Other member types • Arbiters • Don't store a copy of the data • Vote in elections • Used as a tie breaker • Hidden • Not reported in isMaster • Hidden from slaveOk reads
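A sketch of adding these member types from the shell; the arbiter hostname and member index are illustrative:

      rs.addArb("arb1.acme.com:27017")    // arbiter: votes in elections, stores no data
      cfg = rs.conf()
      cfg.members[2].priority = 0         // a hidden member must have priority 0
      cfg.members[2].hidden   = true      // omitted from isMaster, never used for slaveOk reads
      rs.reconfig(cfg)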
  52. 52. Priorities
  53. 53. Priorities • Priority: a number between 0 and 100 • Used during an election: • Most up to date • Highest priority • Less than 10s behind the failed Primary • Allows weighting of members during failover
  54. 54. Priorities - example: A (p:10), B (p:10), C (p:1), D (p:1), E (p:0)
  55. 55. Priorities - example: A (p:10), B (p:10), C (p:1), D (p:1), E (p:0) • Assuming all members are up to date
  56. 56. Priorities - example: A (p:10), B (p:10), C (p:1), D (p:1), E (p:0) • Assuming all members are up to date • Members A or B will be chosen first • Highest priority
  57. 57. Priorities - example: A (p:10), B (p:10), C (p:1), D (p:1), E (p:0) • Assuming all members are up to date • Members A or B will be chosen first • Highest priority • Members C or D will be chosen when: • A and B are unavailable • A and B are not up to date
  58. 58. Priorities - example: A (p:10), B (p:10), C (p:1), D (p:1), E (p:0) • Assuming all members are up to date • Members A or B will be chosen first • Highest priority • Members C or D will be chosen when: • A and B are unavailable • A and B are not up to date • Member E is never chosen • priority:0 means it cannot be elected
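The layout above could be expressed as a replica set configuration roughly like this; the host values are the placeholders from the slide, not real hostnames:

      cfg = { _id : "someSet", members : [
          { _id : 0, host : "A", priority : 10 },
          { _id : 1, host : "B", priority : 10 },
          { _id : 2, host : "C", priority : 1 },
          { _id : 3, host : "D", priority : 1 },
          { _id : 4, host : "E", priority : 0 }   // priority 0: never eligible to become primary
      ]}
      rs.initiate(cfg)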
  59. 59. Durability
  60. 60. Durability Options
  61. 61. Durability Options • Fire and forget
  62. 62. Durability Options • Fire and forget • Write Concern
  63. 63. Write Concern • Whether a write requires a return trip & what the return trip should depend on
  64. 64. Write Concern
  65. 65. Write Concern • w: the number of servers to replicate to (or majority)
  66. 66. Write Concern • w: the number of servers to replicate to (or majority) • wtimeout: timeout in ms waiting for replication
  67. 67. Write Concern • w: the number of servers to replicate to (or majority) • wtimeout: timeout in ms waiting for replication • j: wait for journal sync
  68. 68. Write Concern • w: the number of servers to replicate to (or majority) • wtimeout: timeout in ms waiting for replication • j: wait for journal sync • tags: ensure replication to n nodes of a given tag
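In the mongo shell of this era, these options are passed to getLastError on the same connection that issued the write; a hedged example with an illustrative orders collection:

      db.orders.insert({ _id : 1, status : "open" })
      db.runCommand({ getLastError : 1,
                      w        : "majority",   // acknowledged by a majority of members
                      wtimeout : 5000,         // stop waiting after 5 seconds
                      j        : true })       // and only after the primary's journal sync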
  69. 69. Fire and Forget [diagram: Driver → Primary: write; apply in memory] • Operations are applied in memory • No waiting for persistence to disk • MongoDB clients do not block waiting to confirm the operation completed
  70. 70. Wait for error [diagram: Driver → Primary: write, getLastError; apply in memory] • Operations are applied in memory • No waiting for persistence to disk • MongoDB clients do block waiting to confirm the operation completed
  71. 71. Wait for journal sync [diagram: Driver → Primary: write, getLastError with j:true; apply in memory; write to journal] • Operations are applied in memory • Wait for persistence to journal • MongoDB clients do block waiting to confirm the operation completed
  72. 72. Wait for fsync [diagram: Driver → Primary: write, getLastError with fsync:true; apply in memory; write to journal (if enabled); fsync] • Operations are applied in memory • Wait for persistence to journal • Wait for persistence to disk • MongoDB clients do block waiting to confirm the operation completed
  73. 73. Wait for replication [diagram: Driver → Primary: write, getLastError with w:2; Primary → Secondary: replicate] • Operations are applied in memory • No waiting for persistence to disk • Waiting for replication to n nodes • MongoDB clients do block waiting to confirm the operation completed
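Expressed as shell commands, the five scenarios above differ only in what (if anything) is sent after the write; the collection name and values are illustrative:

      db.orders.insert({ _id : 2 })                                 // fire and forget: no getLastError at all
      db.runCommand({ getLastError : 1 })                           // wait for error
      db.runCommand({ getLastError : 1, j : true })                 // wait for journal sync
      db.runCommand({ getLastError : 1, fsync : true })             // wait for fsync
      db.runCommand({ getLastError : 1, w : 2, wtimeout : 5000 })   // wait for replication to 2 members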
  74. 74. Tagging • Control over where data is written to • Each member can have one or more tags: tags: {dc: "stockholm"} or tags: {dc: "stockholm", ip: "192.168", rack: "row3-rk7"} • Replica set defines rules for where data resides • Rules are defined in the RS config and can be changed without changing application code
  75. 75. Tagging - example
      {
        _id : "someSet",
        members : [
          { _id : 0, host : "A", tags : { "dc" : "ny" } },
          { _id : 1, host : "B", tags : { "dc" : "ny" } },
          { _id : 2, host : "C", tags : { "dc" : "sf" } },
          { _id : 3, host : "D", tags : { "dc" : "sf" } },
          { _id : 4, host : "E", tags : { "dc" : "cloud" } }
        ],
        settings : {
          getLastErrorModes : {
            veryImportant   : { "dc" : 3 },
            sortOfImportant : { "dc" : 2 }
          }
        }
      }
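A hedged sketch of how the modes defined above would be used; the write and timeout are illustrative, and getLastError must run on the same connection as the write:

      db.orders.insert({ region : "ny", total : 100 })
      db.runCommand({ getLastError : 1, w : "veryImportant", wtimeout : 10000 })
      // "veryImportant" waits until members in 3 distinct "dc" tag values have the write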
  76. 76. High Availability Scenarios
  77. 77. Single Node • Downtime inevitable • If the node crashes, human intervention might be needed • Should absolutely run with journaling to prevent data loss
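A minimal start-up sketch for the journaling point; the data path is illustrative (and journaling is on by default for 64-bit builds from 2.0 onward):

      $ ./mongod --journal --dbpath /data/db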
  78. 78. Replica Set 1 [diagram includes an Arbiter] • Single datacenter • Single switch & power • One node failure: automatic recovery of a single node crash • Points of failure: • Power • Network • Datacenter
  79. 79. Replica Set 2 [diagram includes an Arbiter] • Single datacenter • Multiple power/network zones • Automatic recovery of single node crash • w=2 not viable, as losing 1 node means no writes • Points of failure: • Datacenter • Two node failure
  80. 80. Replica Set 3 • Single datacenter • Multiple power/network zones • Automatic recovery of single node crash • w=2 viable as 2/3 online • Points of failure: • Datacenter • Two node failure
  81. 81. When disaster strikes
  82. 82. Replica Set 4 • Multi datacenter • DR node for safety • Can't do a multi-datacenter durable write safely since there is only 1 node in the distant DC
  83. 83. Replica Set 5 • Three data centers • Can survive full data center loss • Can do w= { dc : 2 } to guarantee write in 2 data centers
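One hedged way to express the w = { dc : 2 } idea is through getLastErrorModes, assuming every member carries a dc tag; the mode name multiDC is illustrative:

      cfg = rs.conf()
      cfg.settings = { getLastErrorModes : { multiDC : { dc : 2 } } }   // extend any existing settings as needed
      rs.reconfig(cfg)
      db.runCommand({ getLastError : 1, w : "multiDC", wtimeout : 10000 })  // acknowledged in 2 distinct data centers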
  84. 84. Set size | Use?    | Data Protection | High Availability | Notes
      One      | X       | No              | No                | Must use --journal to protect against crashes
      Two      |         | Yes             | No                | On loss of one member, surviving member is read only
      Three    |         | Yes             | Yes - 1 failure   | On loss of one member, surviving two members can elect a new primary
      Four     | X       | Yes             | Yes - 1 failure*  | * On loss of two members, surviving two members are read only
      Five     | Typical | Yes             | Yes - 2 failures  | On loss of two members, surviving three members can elect a new primary
  85. 85. Questions? • http://spf13.com • http://github.com/spf13 • @spf13 • Download at mongodb.org • We're hiring!! Contact us at jobs@10gen.com
