Replication and Replica Sets

Speaker notes
  • Basic explanation: 2 or more nodes form the set; a quorum (majority) is needed to elect a primary.
  • Initialize -> election: a primary is chosen, and data replication runs from the primary to the secondaries.
  • Primary down / network failure: automatic election of a new primary if a majority exists.
  • New primary elected: replication is established from the new primary.
  • Down node comes back up: rejoins the set, recovers, and then serves as a secondary.
  • Primary: data member. Secondary: hot standby. Arbiter: voting member only.
  • Priority: a floating-point number between 0 and 1000. The highest-priority member that is up to date wins ("up to date" == within 10 seconds of the primary). If a higher-priority member catches up, it will force an election and win. Slave delay: lags behind the primary by a configurable time delay and is automatically hidden from clients; protects against operator errors (fat-fingering) and application data corruption.
  • Analytics node: good for integrations with Hadoop, Storm, etc.
  • Consistency: write concern and read preferences.
  • Not really fire-and-forget. The return arrow confirms that the network successfully transferred the packet(s) of data, i.e. that the TCP ACK response was received.
  • Presenter should mention: the default is w:1. w:"majority" is what most people should use for durability; "majority" is a special token signifying that more than half of the nodes in the set have acknowledged the write.
  • Using 'someDCs' so that in the event of an outage, at least a majority of the DCs would receive the change. This favors availability over durability.
  • Using 'allDCs' because we want to make certain all DCs have this piece of data. If any of the DCs are down, this would timeout. This favors durability over availability.
  • Upgrade/maintenance; common deployment scenarios.
  • A good question to ask the audience: "Why wouldn't you set w = {dc : 3}?" Why would you ever do that? What would be the complications?
  • Schema; oplog.
  • rs.syncFrom allows administrators to configure which member of the replica set the current member will pull data from. Specify the member to sync from in the form [hostname]:[port].
  • Replica set members will not sync from members without indexes unless buildIndexes: false. To prevent inconsistency between members of a replica set, if a member has members[n].buildIndexes set to false, other members will not sync from it unless they also have buildIndexes set to false. See SERVER-4160 for more information.
  • New option to configure index pre-fetching during replication: by default, secondaries pre-fetch the indexes associated with a query to improve replication throughput in most cases. The replIndexPrefetch setting and --replIndexPrefetch option allow administrators to disable this feature or have mongod pre-fetch only the index on the _id field. See SERVER-6718 for more information.
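
    A minimal shell sketch of the rs.syncFrom helper described above, assuming a 2.2+ mongo shell (the hostname is illustrative):

        > rs.syncFrom("hostB.example.com:27017")  // tell this member to pull its replication data from B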
  • Transcript of "Replication and Replica Sets"

    1. #MongoSeoul: Replication and Replica Sets. Steve Francia, Chief Evangelist, 10gen
    2. Agenda
       • Replica Set Lifecycle
       • Developing with Replica Sets
       • Operational Considerations
       • Behind the Curtain
    3. Why Replication?
       • How many have faced node failures?
       • How many have been woken up from sleep to perform a failover?
       • How many have experienced issues due to network latency?
       • Different uses for data
         – Normal processing
         – Simple analytics
    4. Replica Set Lifecycle
    5. Replica Set – Creation
    6. Replica Set – Initialize
    7. Replica Set – Failure
    8. Replica Set – Failover
    9. Replica Set – Recovery
    10. Replica Set – Recovered
    11. Replica Set Roles & Configuration
    12. Replica Set Roles
    13. Configuration Options

        > conf = {
              _id : "mySet",
              members : [
                  {_id : 0, host : "A", priority : 3},
                  {_id : 1, host : "B", priority : 2},
                  {_id : 2, host : "C"},
                  {_id : 3, host : "D", hidden : true},
                  {_id : 4, host : "E", hidden : true, slaveDelay : 3600}
              ]
          }
        > rs.initiate(conf)
    14. Configuration Options – same configuration, annotated: members A (priority 3) and B (priority 2) form the Primary DC.
    15. Configuration Options – member C sits in the Secondary DC; default priority = 1.
    16. Configuration Options – member D (hidden : true) is the analytics node.
    17. Configuration Options – member E (hidden : true, slaveDelay : 3600) is the backup node.
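
        A hedged sketch of changing such a configuration later with the standard rs.conf()/rs.reconfig() shell helpers (the priority change is illustrative):

            > conf = rs.conf()              // fetch the current configuration
            > conf.members[1].priority = 5  // e.g. raise B above A
            > rs.reconfig(conf)             // may trigger an election, per the priority rules in the notes above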
    18. Developing with Replica Sets
    19. Strong Consistency
    20. Delayed Consistency
    21. Write Concern
       • Network acknowledgement
       • Wait for error
       • Wait for journal sync
       • Wait for replication
    22. Unacknowledged
    23. MongoDB Acknowledged (wait for error)
    24. Wait for Journal Sync
    25. Wait for Replication
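
        A minimal shell sketch of these write-concern levels using the 2.2-era getLastError command (the collection name is illustrative):

            > db.blogs.insert({title : "foo"})
            > db.runCommand({getLastError : 1})                  // wait for error (acknowledged)
            > db.runCommand({getLastError : 1, j : true})        // wait for journal sync
            > db.runCommand({getLastError : 1, w : "majority"})  // wait for replication to a majority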
    26. Tagging
       • New in 2.0.0
       • Control where data is written to, and read from
       • Each member can have one or more tags
         – tags : {dc : "ny"}
         – tags : {dc : "ny", subnet : "192.168", rack : "row3rk7"}
       • Replica set defines rules for write concerns
       • Rules can change without changing app code
    27. Tagging Example

        {
            _id : "mySet",
            members : [
                {_id : 0, host : "A", tags : {"dc" : "ny"}},
                {_id : 1, host : "B", tags : {"dc" : "ny"}},
                {_id : 2, host : "C", tags : {"dc" : "sf"}},
                {_id : 3, host : "D", tags : {"dc" : "sf"}},
                {_id : 4, host : "E", tags : {"dc" : "cloud"}}
            ],
            settings : {
                getLastErrorModes : {
                    allDCs : {"dc" : 3},
                    someDCs : {"dc" : 2}
                }
            }
        }

        > db.blogs.insert({...})
        > db.runCommand({getLastError : 1, w : "someDCs"})
    28. Wait for Replication (Tagging)
    29. Read Preference Modes
       • 5 modes (new in 2.2)
         – primary (only) - default
         – primaryPreferred
         – secondary
         – secondaryPreferred
         – nearest
       When more than one node is possible, the closest node is used for reads (all modes but primary).
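
        A minimal shell sketch of choosing a mode per query, assuming a shell with the cursor.readPref() helper:

            > db.blogs.find().readPref("secondaryPreferred")  // prefer a secondary, fall back to the primary
            > db.blogs.find().readPref("nearest")             // lowest-latency member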
    30. Tagged Read Preference
       • Custom read preferences
       • Control where you read from by (node) tags
         – E.g. { "disk" : "ssd", "use" : "reporting" }
       • Use in conjunction with standard read preferences
         – Except primary
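
        A hedged sketch combining a mode with a tag set, assuming the cursor.readPref(mode, tagSet) signature from the 2.2 shell (the tags are illustrative):

            > db.blogs.find().readPref("secondary", [{"disk" : "ssd", "use" : "reporting"}])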
    31. Operational Considerations
    32. Maintenance and Upgrade
       • No downtime
       • Rolling upgrade/maintenance
         – Start with a secondary
         – Primary last
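
        A minimal sketch of the last step, assuming the standard rs.stepDown() helper (the argument is how many seconds the member will not seek re-election):

            > rs.stepDown(60)  // demote the primary so a secondary takes over; then maintain it like any other secondary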
    33. Replica Set – 1 Data Center
       • Single data center
       • Single switch & power
       • Points of failure:
         – Power
         – Network
         – Data center
         – Two-node failure
       • Automatic recovery of single-node crash
    34. Replica Set – 2 Data Centers
       • Multi data center
       • DR node for safety
       • Can't do a multi-data-center durable write safely, since there is only 1 node in the distant DC
    35. Replica Set – 3 Data Centers
       • Three data centers
       • Can survive full data center loss
       • Can do w = { dc : 2 } to guarantee a write in 2 data centers (with tags)
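
        In practice the w = { dc : 2 } shorthand maps to a getLastErrorModes entry like someDCs from slide 27; a hedged sketch, with a timeout so a down DC fails the wait instead of blocking forever:

            > db.runCommand({getLastError : 1, w : "someDCs", wtimeout : 5000})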
    36. Behind the Curtain
    37. Implementation details
       • Heartbeat every 2 seconds
         – Times out in 10 seconds
       • Local DB (not replicated)
         – system.replset
         – oplog.rs
           • Capped collection
           • Idempotent version of operation stored
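
        A minimal shell sketch for inspecting these structures (local is the database name; both collections are the ones named above):

            > use local
            > db.system.replset.find()                           // the stored replica set configuration
            > db.oplog.rs.find().sort({$natural : -1}).limit(1)  // the most recent oplog entry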
    38. Op(erations) Log is idempotent
        (the document shown after each command is the corresponding oplog entry)

        > db.replsettest.insert({_id : 1, value : 1})
        { "ts" : Timestamp(1350539727000, 1), "h" : NumberLong("6375186941486301201"),
          "op" : "i", "ns" : "test.replsettest", "o" : { "_id" : 1, "value" : 1 } }

        > db.replsettest.update({_id : 1}, {$inc : {value : 10}})
        { "ts" : Timestamp(1350539786000, 1), "h" : NumberLong("5484673652472424968"),
          "op" : "u", "ns" : "test.replsettest", "o2" : { "_id" : 1 },
          "o" : { "$set" : { "value" : 11 } } }

        (Note how the $inc is stored as a $set of the resulting value, which makes replaying the entry idempotent.)
    39. Single operation can have many entries

        > db.replsettest.update({}, {$set : {name : "foo"}}, false, true)
        { "ts" : Timestamp(1350540395000, 1), "h" : NumberLong("-4727576249368135876"), "op" : "u",
          "ns" : "test.replsettest", "o2" : { "_id" : 2 }, "o" : { "$set" : { "name" : "foo" } } }
        { "ts" : Timestamp(1350540395000, 2), "h" : NumberLong("-7292949613259260138"), "op" : "u",
          "ns" : "test.replsettest", "o2" : { "_id" : 3 }, "o" : { "$set" : { "name" : "foo" } } }
        { "ts" : Timestamp(1350540395000, 3), "h" : NumberLong("-1888768148831990635"), "op" : "u",
          "ns" : "test.replsettest", "o2" : { "_id" : 1 }, "o" : { "$set" : { "name" : "foo" } } }

        (The multi-update produces one oplog entry per matched document.)
    40. What's New in 2.2
       • Read preference support with sharding
         – Drivers too
       • Improved replication over WAN/high-latency networks
       • rs.syncFrom command
       • buildIndexes setting
       • replIndexPrefetch setting
    41. Just Use It
       • Use replica sets
       • Easy to set up
         – Try on a single machine
       • Check the docs page for replica set tutorials
         – http://docs.mongodb.org/manual/replication/#tutorials
    42. #MongoSeoul: Thank You. Steve Francia, Chief Evangelist, 10gen