Replication, Durability, and Disaster Recovery
This session introduces the basic components of high availability before going into a deep dive on MongoDB replication. We'll explore some of the advanced capabilities with MongoDB replication and best practices to ensure data durability and redundancy. We'll also look at various deployment scenarios and disaster recovery configurations.


Replication & Durability

@spf13
AKA Steve Francia
15+ years building the internet
Father, husband, skateboarder
Chief Solutions Architect, responsible for drivers, integrations, web & docs

Agenda
• Intro to replication
• How MongoDB does Replication
• Configuring a Replica Set
• Advanced Replication
• Durability
• High Availability Scenarios

Replication

Use cases
• High Availability (auto-failover)
• Read Scaling (extra copies to read from)
• Backups
  • Delayed Copy (fat finger)
  • Online, Point in Time (PiT) backups
• Use (hidden) replica for secondary workload
  • Analytics
  • Data-processing
  • Integration with external systems

Types of outage
Planned
• Hardware upgrade
• O/S or file-system tuning
• Relocation of data to new file-system / storage
• Software upgrade
Unplanned
• Hardware failure
• Data center failure
• Region outage
• Human error
• Application corruption

Replica Set features
• A cluster of N servers
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
• All writes to primary
• Reads can be to primary (default) or a secondary

How MongoDB Replication

How MongoDB Replication works
• Set is made up of 2 or more nodes
  [Diagram: Member 1, Member 2, Member 3]
• Election establishes the PRIMARY
• Data replication from PRIMARY to SECONDARY
  [Diagram: Member 2 is Primary]
• PRIMARY may fail
• Automatic election of new PRIMARY if a majority exists
  [Diagram: Member 2 DOWN; the remaining members negotiate a new master]
• New PRIMARY elected
• Replica Set re-established
  [Diagram: Member 1 is Primary, Member 2 DOWN]
• Automatic recovery
  [Diagram: Member 2 Recovering]
• Replica Set re-established
  [Diagram: Member 1 is Primary, Member 2 back in the set]
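
The failover walk-through can be modeled in a few lines of plain JavaScript. This is a toy sketch, not MongoDB's election protocol: it assumes one vote per member, keeps a healthy primary in place, and otherwise promotes the lowest-numbered healthy member (a made-up tie-break). The function name `failover` is hypothetical.

```javascript
// Toy model of the failover walk-through (not MongoDB's election protocol).
// Assumptions: every member has one vote, a healthy primary keeps its role,
// and otherwise the lowest-numbered healthy member becomes primary.
function failover(members, currentPrimary) {
  const up = members.filter((m) => m.up).map((m) => m.id);
  // Electing a primary needs a strict majority of all members.
  if (up.length <= members.length / 2) return null; // no primary: set is read only
  if (up.includes(currentPrimary)) return currentPrimary;
  return Math.min(...up);
}

// Member 2 starts as primary, then goes DOWN.
const members = [
  { id: 1, up: true },
  { id: 2, up: false },
  { id: 3, up: true },
];
console.log(failover(members, 2)); // 1: Member 1 takes over
```

When Member 2 recovers, calling `failover` again with Member 1 as the current primary leaves Member 1 in charge, matching the last slide where the set is simply re-established.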

How Is Data Replicated?
• Change operations are written to the oplog
  • The oplog is a capped collection (fixed size)
  • Must have enough space to allow new secondaries to catch up (from scratch or from a backup)
  • Must have enough space to cope with any applicable slaveDelay
• Secondaries query the primary's oplog and apply what they find
  • All replicas contain an oplog
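
A toy model of oplog replication, assuming a simplified oplog of `{ ts, key, value }` entries rather than MongoDB's real oplog format. The `Primary` and `Secondary` classes are hypothetical. The sketch shows why a capped oplog must be large enough: a secondary whose next needed entry has already been overwritten can no longer catch up incrementally.

```javascript
// Toy model of oplog-based replication; not MongoDB's real oplog format.
class Primary {
  constructor(oplogCapacity) {
    this.data = new Map();
    this.oplog = []; // capped: the oldest entry is dropped once full
    this.capacity = oplogCapacity;
    this.nextTs = 1; // monotonically increasing operation timestamp
  }

  set(key, value) {
    this.data.set(key, value);
    this.oplog.push({ ts: this.nextTs++, key, value });
    if (this.oplog.length > this.capacity) this.oplog.shift();
  }
}

class Secondary {
  constructor() {
    this.data = new Map();
    this.lastTs = 0; // timestamp of the last applied oplog entry
  }

  // Query the primary's oplog and apply every entry newer than lastTs.
  sync(primary) {
    const oldest = primary.oplog.length ? primary.oplog[0].ts : primary.nextTs;
    if (this.lastTs + 1 < oldest) {
      // Entries this secondary still needs were overwritten: full resync required.
      throw new Error("too stale: oplog no longer covers this secondary");
    }
    for (const entry of primary.oplog) {
      if (entry.ts > this.lastTs) {
        this.data.set(entry.key, entry.value);
        this.lastTs = entry.ts;
      }
    }
  }
}

// Usage: a 3-entry oplog; a secondary that falls too far behind must resync.
const primary = new Primary(3);
const s1 = new Secondary();
primary.set("a", 1);
primary.set("b", 2);
s1.sync(primary); // s1 now has a=1, b=2
```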

Configuring a Replica Set

Creating a Replica Set
  $ ./mongod --replSet <name>

  > cfg = {
      _id : "<name>",
      members : [
        { _id : 0, host : "sf1.acme.com" },
        { _id : 1, host : "sf2.acme.com" },
        { _id : 2, host : "sf3.acme.com" }
      ]
    }
  > use admin
  > rs.initiate(cfg)
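
The same config document can be assembled programmatically. `buildReplSetConfig` and the set name `acme` are hypothetical; only the `_id` / `members` / `host` shape comes from the slide.

```javascript
// Hypothetical helper: builds the config document passed to rs.initiate().
// Member _ids are assigned 0, 1, 2, ... in the order the hosts are given.
function buildReplSetConfig(name, hosts) {
  return {
    _id: name, // must match the --replSet <name> each mongod was started with
    members: hosts.map((host, i) => ({ _id: i, host: host })),
  };
}

const cfg = buildReplSetConfig("acme", ["sf1.acme.com", "sf2.acme.com", "sf3.acme.com"]);
console.log(JSON.stringify(cfg));
// In the mongo shell you would then run: rs.initiate(cfg)
```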

Managing a Replica Set
• rs.conf(): shell helper, get current configuration
• rs.initiate(<cfg>): shell helper, initiate replica set
• rs.reconfig(<cfg>): shell helper, reconfigure a replica set
• rs.add("hostname:<port>"): shell helper, add a new member
• rs.remove("hostname:<port>"): shell helper, remove a member
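
`rs.reconfig()` takes a whole config document, so membership changes amount to editing that document. The helpers below are hypothetical pure functions that sketch such edits offline (they never contact a server). They assume the new member takes the next unused `_id` and that every new config carries a higher `version`.

```javascript
// Hypothetical helpers: pure functions that edit a replica set config
// document offline, as you might before calling rs.reconfig(cfg).
function addMember(cfg, host) {
  // Assumption: the new member takes the next unused _id.
  const nextId = cfg.members.reduce((max, m) => Math.max(max, m._id), -1) + 1;
  return {
    ...cfg,
    version: cfg.version + 1, // a new config must carry a higher version
    members: [...cfg.members, { _id: nextId, host: host }],
  };
}

function removeMember(cfg, host) {
  return {
    ...cfg,
    version: cfg.version + 1,
    members: cfg.members.filter((m) => m.host !== host),
  };
}

const base = {
  _id: "acme",
  version: 1,
  members: [
    { _id: 0, host: "sf1.acme.com:27017" },
    { _id: 1, host: "sf2.acme.com:27017" },
  ],
};
const grown = addMember(base, "sf3.acme.com:27017");
const shrunk = removeMember(grown, "sf1.acme.com:27017");
```

Returning new objects instead of mutating `base` keeps the previous configuration available if the reconfiguration has to be rolled back.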

Managing a Replica Set
• rs.status(): reports the status of the replica set from one node's point of view
• rs.stepDown(<secs>): requests the primary to step down
• rs.freeze(<secs>): prevents the current member from seeking election as primary for <secs> seconds, freezing its primary/secondary status; use during backups
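
A sketch of reading the output of `rs.status()`, assuming only the `members[].name`, `members[].stateStr` and `members[].health` fields. `summarizeStatus` and the sample document are made up for illustration. In a live shell you would pass `rs.status()` instead.

```javascript
// Hypothetical helper: find the primary and any unhealthy members in an
// rs.status()-style document (health is 1 when up, 0 when down).
function summarizeStatus(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  const unhealthy = status.members.filter((m) => m.health !== 1).map((m) => m.name);
  return { primary: primary ? primary.name : null, unhealthy: unhealthy };
}

// Made-up sample shaped like rs.status() output.
const sample = {
  set: "acme",
  members: [
    { name: "sf1.acme.com:27017", stateStr: "PRIMARY", health: 1 },
    { name: "sf2.acme.com:27017", stateStr: "SECONDARY", health: 1 },
    { name: "sf3.acme.com:27017", stateStr: "(not reachable/healthy)", health: 0 },
  ],
};
console.log(summarizeStatus(sample));
```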

Advanced Replication

Lots of Features
• Delayed
• Hidden
• Priorities
• Tags

Slave Delay
• Lags behind the master by a configurable time delay
• Automatically hidden from clients
• Protects against operator errors
  • Fat fingering
  • Application corrupts data
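
A delayed member is configured through its member document. The sketch below is hypothetical: `slaveDelay` is the field name of this talk's era (MongoDB 5.0 renamed it `secondaryDelaySecs`), priority 0 keeps the member from being elected, and the oplog check is rough arithmetic that assumes a steady write rate. The 5000 MB and 250 MB/hour figures are invented for illustration.

```javascript
// Hypothetical helper: member document for a delayed secondary.
// slaveDelay is this talk's field name; MongoDB 5.0 renamed it secondaryDelaySecs.
function delayedMember(id, host, delaySecs) {
  return {
    _id: id,
    host: host,
    priority: 0, // never eligible to become primary
    hidden: true, // set explicitly so clients do not read delayed data
    slaveDelay: delaySecs,
  };
}

// Rough arithmetic, assuming a steady write rate: hours of history the oplog holds.
function oplogWindowHours(oplogSizeMB, writeMBPerHour) {
  return oplogSizeMB / writeMBPerHour;
}

// The window must cover the delay plus some catch-up margin.
function oplogCoversDelay(oplogSizeMB, writeMBPerHour, delaySecs, marginHours) {
  return oplogWindowHours(oplogSizeMB, writeMBPerHour) >= delaySecs / 3600 + marginHours;
}

// A 5000 MB oplog at 250 MB/hour holds about 20 hours of writes.
console.log(oplogCoversDelay(5000, 250, 8 * 3600, 4)); // true: 20 >= 8 + 4
```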

Other member types
• Arbiters
  • Don't store a copy of the data
  • Vote in elections
  • Used as a tie breaker
• Hidden
  • Not reported in isMaster
  • Hidden from slaveOk reads
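
Member documents for the two special member types might look like this. The helper names are hypothetical; `arbiterOnly`, `hidden` and `priority` are replica set config fields. The check encodes the rule that hidden members must have priority 0 (priority defaults to 1 when omitted).

```javascript
// Hypothetical helpers building member documents for the two member types.
function arbiter(id, host) {
  return { _id: id, host: host, arbiterOnly: true }; // votes, stores no data
}

function hiddenMember(id, host) {
  return { _id: id, host: host, hidden: true, priority: 0 };
}

// Hidden members must have priority 0; priority defaults to 1 when omitted.
function checkMember(m) {
  const priority = m.priority === undefined ? 1 : m.priority;
  if (m.hidden && priority !== 0) return "hidden members must have priority 0";
  return null;
}
```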

Priorities
• Priority: a number between 0 and 100
• Used during an election:
  • Most up to date
  • Highest priority
  • Less than 10s behind failed Primary
• Allows weighting of members during failover

Priorities - example
  A      B      C     D     E
  p:10   p:10   p:1   p:1   p:0
• Assuming all members are up to date
• Members A or B will be chosen first
  • Highest priority
• Members C or D will be chosen when:
  • A and B are unavailable
  • A and B are not up to date
• Member E is never chosen
  • priority:0 means it cannot be elected
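
The example can be replayed with a simplified model of the slide's three rules. It is a sketch, not MongoDB's election algorithm: it ignores the majority requirement and vote counting, measures "up to date" as an `optime` in seconds, and applies the 10-second rule against the freshest reachable member instead of the failed primary. `pickPrimary` is a hypothetical name.

```javascript
// Simplified model of the slide's election rules (not MongoDB's algorithm):
// among reachable members within 10s of the freshest optime, prefer the
// highest priority, then the freshest; priority 0 is never electable.
function pickPrimary(members) {
  const alive = members.filter((m) => m.up);
  if (alive.length === 0) return null;
  const freshest = Math.max(...alive.map((m) => m.optime));
  const eligible = alive.filter((m) => m.priority > 0 && freshest - m.optime < 10);
  eligible.sort((a, b) => b.priority - a.priority || b.optime - a.optime);
  return eligible.length > 0 ? eligible[0].name : null;
}

// The slide's example: all members up to date (optime in seconds).
const members = [
  { name: "A", priority: 10 },
  { name: "B", priority: 10 },
  { name: "C", priority: 1 },
  { name: "D", priority: 1 },
  { name: "E", priority: 0 },
].map((m) => ({ ...m, up: true, optime: 100 }));

console.log(pickPrimary(members)); // A or B
```

Marking A and B down, or setting their `optime` 10 or more seconds behind, makes the sketch pick C or D; if only E is reachable it returns `null`.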

Durability

Durability Options
• Fire and forget
• Write Concern

Write Concern
• Whether a write requires a return trip
• What the return trip should depend on

Write Concern
• w: the number of servers to replicate to (or majority)
• wtimeout: timeout in ms waiting for replication
• j: wait for journal sync
• tags: ensure replication to n nodes of given tag
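
The options map onto write concern documents. The `w`, `wtimeout` and `j` field names are MongoDB's; the 5000 ms timeouts are arbitrary examples, and `veryImportant` refers to the custom mode defined in the tagging example. `majorityOf` is a hypothetical helper showing how many acknowledgements a majority of n voting members requires.

```javascript
// Write concern documents for the options above. Field names (w, wtimeout, j)
// are MongoDB's; the 5000 ms timeouts are arbitrary example values.
const writeConcerns = {
  fireAndForget: { w: 0 }, // no acknowledgement requested
  waitForError: { w: 1 }, // primary acknowledges the write
  waitForJournal: { w: 1, j: true }, // ...after writing it to the journal
  waitForTwo: { w: 2, wtimeout: 5000 }, // primary plus one secondary
  waitForMajority: { w: "majority", wtimeout: 5000 },
  waitForTagRule: { w: "veryImportant", wtimeout: 5000 }, // custom mode from the tagging example
};

// Hypothetical helper: acknowledgements needed for a majority of n voting members.
function majorityOf(n) {
  return Math.floor(n / 2) + 1;
}

console.log(majorityOf(3), majorityOf(5)); // 2 3
```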

Fire and Forget
[Diagram: Driver sends a write to the Primary, which applies it in memory]
• Operations are applied in memory
• No waiting for persistence to disk
• MongoDB clients do not block waiting to confirm the operation completed

Wait for error
[Diagram: Driver sends a write, then getLastError; the Primary applies the write in memory]
• Operations are applied in memory
• No waiting for persistence to disk
• MongoDB clients do block waiting to confirm the operation completed

Wait for journal sync
[Diagram: Driver sends a write, then getLastError with j:true; the Primary applies the write in memory and writes it to the journal]
• Operations are applied in memory
• Wait for persistence to journal
• MongoDB clients do block waiting to confirm the operation completed

Wait for fsync
[Diagram: Driver sends a write, then getLastError with fsync:true; the Primary applies the write in memory, writes it to the journal (if enabled), then runs fsync]
• Operations are applied in memory
• Wait for persistence to journal
• Wait for persistence to disk
• MongoDB clients do block waiting to confirm the operation completed

Wait for replication
[Diagram: Driver sends a write, then getLastError with w:2; the Primary applies the write in memory and replicates it to a Secondary]
• Operations are applied in memory
• No waiting for persistence to disk
• Waiting for replication to n nodes
• MongoDB clients do block waiting to confirm the operation completed
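
The five acknowledgement modes can be summarized as a toy predicate over how far a write has progressed. This is an illustrative model, not server logic: `isAcknowledged` is a hypothetical name, it handles numeric `w` only, counts the primary as one of the `w` members, and uses the legacy `fsync` flag from the getLastError era.

```javascript
// Toy predicate: has a write progressed far enough to acknowledge the driver?
// state = { applied, journaled, fsynced, replicatedTo } where replicatedTo
// counts members holding the write, the primary included.
function isAcknowledged(wc, state) {
  const w = wc.w === undefined ? 1 : wc.w;
  if (typeof w !== "number") throw new Error("this sketch handles numeric w only");
  if (w === 0) return true; // fire and forget: the driver does not wait
  if (!state.applied) return false; // applied in memory on the primary
  if (wc.j && !state.journaled) return false; // wait for journal sync
  if (wc.fsync && !state.fsynced) return false; // wait for fsync to disk
  return state.replicatedTo >= w; // wait for replication to w members
}

const inMemoryOnly = { applied: true, journaled: false, fsynced: false, replicatedTo: 1 };
console.log(isAcknowledged({ w: 1 }, inMemoryOnly)); // true
console.log(isAcknowledged({ w: 1, j: true }, inMemoryOnly)); // false
```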

Tagging
• Control over where data is written to
• Each member can have one or more tags:
    tags: {dc: "stockholm"}
    tags: {dc: "stockholm", ip: "192.168", rack: "row3-rk7"}
• Replica set defines rules for where data resides
• Rules are defined in the replica set config, so they can change without changing application code

Tagging - example
  {
    _id : "someSet",
    members : [
      {_id : 0, host : "A", tags : {"dc": "ny"}},
      {_id : 1, host : "B", tags : {"dc": "ny"}},
      {_id : 2, host : "C", tags : {"dc": "sf"}},
      {_id : 3, host : "D", tags : {"dc": "sf"}},
      {_id : 4, host : "E", tags : {"dc": "cloud"}}
    ],
    settings : {
      getLastErrorModes : {
        veryImportant : {"dc" : 3},
        sortOfImportant : {"dc" : 2}
      }
    }
  }
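
A sketch of how a `getLastErrorModes` rule is evaluated: a rule like `{ "dc" : 3 }` is met once members with at least 3 distinct `dc` tag values have acknowledged the write. `satisfiesMode` is a hypothetical helper, and the config object reproduces the tagging example with a comma before `settings` so it parses.

```javascript
// The tagging example as a JavaScript object.
const config = {
  _id: "someSet",
  members: [
    { _id: 0, host: "A", tags: { dc: "ny" } },
    { _id: 1, host: "B", tags: { dc: "ny" } },
    { _id: 2, host: "C", tags: { dc: "sf" } },
    { _id: 3, host: "D", tags: { dc: "sf" } },
    { _id: 4, host: "E", tags: { dc: "cloud" } },
  ],
  settings: {
    getLastErrorModes: {
      veryImportant: { dc: 3 },
      sortOfImportant: { dc: 2 },
    },
  },
};

// Simplified check of a getLastErrorModes rule: { dc: 3 } is met once the
// acknowledging members cover at least 3 distinct values of the "dc" tag.
function satisfiesMode(cfg, modeName, ackedIds) {
  const rule = cfg.settings.getLastErrorModes[modeName];
  const acked = cfg.members.filter((m) => ackedIds.includes(m._id));
  return Object.entries(rule).every(([tag, count]) => {
    const values = acked.filter((m) => m.tags && tag in m.tags).map((m) => m.tags[tag]);
    return new Set(values).size >= count;
  });
}

console.log(satisfiesMode(config, "veryImportant", [0, 2, 4])); // true: ny, sf, cloud
console.log(satisfiesMode(config, "sortOfImportant", [0, 1])); // false: only ny
```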

High Availability Scenarios

Single Node
• Downtime inevitable
• If the node crashes, human intervention might be needed
• Should absolutely run with journaling to prevent data loss

Replica Set 1 (diagram includes an Arbiter)
• Single datacenter
• Single switch & power
• Survives one node failure
• Automatic recovery of single node crash
• Points of failure:
  • Power
  • Network
  • Datacenter

Replica Set 2 (diagram includes an Arbiter)
• Single datacenter
• Multiple power/network zones
• Automatic recovery of single node crash
• w=2 not viable: losing 1 data-bearing node means w=2 writes can no longer be acknowledged
• Points of failure:
  • Datacenter
  • Two node failure

Replica Set 3
• Single datacenter
• Multiple power/network zones
• Automatic recovery of single node crash
• w=2 viable, as 2 of 3 nodes remain online after one failure
• Points of failure:
  • Datacenter
  • Two node failure
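
The w=2 reasoning in Replica Sets 2 and 3 can be checked with a small calculation. This sketch assumes one vote per member, reads Replica Set 2 as two data-bearing members plus an arbiter (an assumption based on its w=2 note), counts arbiters as voters that never acknowledge writes, and only models failures of data-bearing members. `assessFailure` is a hypothetical name.

```javascript
// Hypothetical helper: after some data-bearing members fail, can the set
// still elect a primary, and can w:N writes still be acknowledged?
// Assumes one vote per member; arbiters vote but never acknowledge writes.
function assessFailure({ dataBearing, arbiters, failedDataBearing, w }) {
  const voters = dataBearing + arbiters;
  const votersUp = voters - failedDataBearing;
  const canElectPrimary = votersUp > voters / 2; // strict majority required
  const survivingData = dataBearing - failedDataBearing;
  const wViable = canElectPrimary && survivingData >= w;
  return { canElectPrimary, wViable };
}

// Replica Set 2: two data-bearing members plus an arbiter, one member lost.
console.log(assessFailure({ dataBearing: 2, arbiters: 1, failedDataBearing: 1, w: 2 }));
// Replica Set 3: three data-bearing members, one member lost.
console.log(assessFailure({ dataBearing: 3, arbiters: 0, failedDataBearing: 1, w: 2 }));
```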

When disaster

Replica Set 4
• Multi datacenter
• DR node for safety
• Can't do multi data center durable writes safely, since there is only 1 node in the distant DC

Replica Set 5
• Three data centers
• Can survive full data center loss
• Can use a getLastErrorModes rule of { dc : 2 } (passed as w: "<modeName>") to guarantee a write reaches 2 data centers
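
Whether a layout survives the loss of a whole data center comes down to majority arithmetic. `survivesAnyDcLoss` is a hypothetical helper assuming one vote per member; the 2/2/1 layout mirrors the ny/sf/cloud tagging example, while the 3/1 layout is a made-up stand-in for a primary site plus a single DR node.

```javascript
// Hypothetical helper: does the set keep a voting majority (and so can still
// elect a primary) after losing any single data center?
// membersByDc maps a data-center name to its number of voting members.
function survivesAnyDcLoss(membersByDc) {
  const counts = Object.values(membersByDc);
  const total = counts.reduce((sum, n) => sum + n, 0);
  return counts.every((lost) => total - lost > total / 2);
}

console.log(survivesAnyDcLoss({ ny: 2, sf: 2, cloud: 1 })); // true
console.log(survivesAnyDcLoss({ main: 3, dr: 1 })); // false: losing "main" leaves 1 of 4
```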

Set size  Use?  Data Protection  High Availability  Notes
One       X     No               No                 Must use --journal to protect against crashes
Two             Yes              No                 On loss of one member, surviving member is read only
Three           Yes              Yes - 1 failure    On loss of one member, surviving two members can elect a new primary
Four      X     Yes              Yes - 1 failure*   * On loss of two members, surviving two members are read only
Five            Yes              Yes - 2 failures   On loss of two members, surviving three members can elect a new primary
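
The High Availability column follows from majority arithmetic: with n voting members an election needs floor(n/2) + 1 of them, so the set tolerates n minus that many failures. A minimal sketch, assuming every member votes (the function name is hypothetical):

```javascript
// Failures a set of n voting members can tolerate and still elect a primary.
function faultTolerance(n) {
  const majority = Math.floor(n / 2) + 1;
  return n - majority;
}

for (let n = 1; n <= 5; n++) {
  console.log(n, faultTolerance(n)); // prints 1 0, 2 0, 3 1, 4 1, 5 2 (one pair per line)
}
```

The even sizes explain the X marks in the table: a fourth member adds a copy of the data but no extra failure tolerance over three.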

http://spf13.com
http://github.com/s
@spf13

Questions?
Download at mongodb.org
We're hiring!! Contact us at jobs@10gen.com