The document discusses MongoDB replication and replica sets. It covers the lifecycle of replica sets including creation, initialization, failure, and recovery. It also discusses replica set roles and configuration options. Additionally, it addresses considerations for developing with replica sets like strong/delayed consistency, write concerns, tagging, and read preferences. Finally, it discusses operational considerations like maintenance/upgrades and different replica set deployment architectures.
3. Why Replication?
• How many have faced node failures?
• How many have been woken up to
perform a failover?
• How many have experienced issues due to
network latency?
• Different uses for data
– Normal processing
– Simple analytics
26. Tagging
• New in 2.0.0
• Control where data is written to, and read from
• Each member can have one or more tags
– tags: {dc: "ny"}
– tags: {dc: "ny", subnet: "192.168", rack: "row3rk7"}
• Replica set defines rules for write concerns
• Rules can change without changing app code
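The tagging bullets above can be sketched as a replica set configuration document. This is a minimal illustration in plain JavaScript (the mongo shell's language), not a recommended topology: the set name `rs0`, the hostnames, the `sf` data center, and the rule name `multiDC` are all hypothetical.

```javascript
// Hypothetical replica set configuration with tagged members.
// Set name, hostnames, and the rule name "multiDC" are placeholders.
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "ny1.example.net:27017", tags: { dc: "ny" } },
    { _id: 1, host: "ny2.example.net:27017",
      tags: { dc: "ny", subnet: "192.168", rack: "row3rk7" } },
    { _id: 2, host: "sf1.example.net:27017", tags: { dc: "sf" } }
  ],
  settings: {
    // Named write-concern rule: a write must be acknowledged by members
    // carrying at least 2 distinct values of the "dc" tag.
    getLastErrorModes: { multiDC: { dc: 2 } }
  }
};

// In a mongo shell connected to the set, this document would be applied
// with rs.initiate(config) or rs.reconfig(config). Here it is only printed.
console.log(JSON.stringify(config.settings));
```

An application would then ask for the rule by name (for example `w: "multiDC"`), so the rule's definition can change in the set configuration without touching application code.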
29. Read Preference Modes
• 5 modes (new in 2.2)
– primary (only) - Default
– primaryPreferred
– secondary
– secondaryPreferred
– nearest
When more than one member is eligible, the closest one is used
for reads (all modes except primary)
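The five modes can be modeled with a few lines of plain JavaScript. This is a simplified sketch for intuition only, not a driver implementation: the member objects, the `pingMs` field, and the function name `selectMember` are assumptions, and real drivers apply their own latency rules rather than always picking the single lowest ping.

```javascript
// Simplified model of read preference modes (not how any driver is written).
// Each member is { host, state: "PRIMARY" | "SECONDARY", pingMs }.
function selectMember(mode, members) {
  const primaries = members.filter(m => m.state === "PRIMARY");
  const secondaries = members.filter(m => m.state === "SECONDARY");
  let candidates;
  switch (mode) {
    case "primary":
      candidates = primaries; break;
    case "primaryPreferred":
      candidates = primaries.length ? primaries : secondaries; break;
    case "secondary":
      candidates = secondaries; break;
    case "secondaryPreferred":
      candidates = secondaries.length ? secondaries : primaries; break;
    case "nearest":
      candidates = primaries.concat(secondaries); break;
    default:
      throw new Error("unknown read preference mode: " + mode);
  }
  if (candidates.length === 0) return null; // no eligible member
  // Among the eligible members, take the one with the lowest ping.
  return candidates.reduce((best, m) => (m.pingMs < best.pingMs ? m : best));
}

const members = [
  { host: "a:27017", state: "PRIMARY", pingMs: 12 },
  { host: "b:27017", state: "SECONDARY", pingMs: 3 },
  { host: "c:27017", state: "SECONDARY", pingMs: 8 }
];
console.log(selectMember("primary", members).host);   // a:27017
console.log(selectMember("secondary", members).host); // b:27017
console.log(selectMember("nearest", members).host);   // b:27017
```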
30. Tagged Read Preference
• Custom read preferences
• Control where you read from by (node) tags
– E.g. { "disk": "ssd", "use": "reporting" }
• Use in conjunction with standard read
preferences
– Except primary
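Tag matching can be sketched as follows. This is an illustrative model in plain JavaScript, assuming a member matches a tag set when it carries every key/value pair in it, and that a list of tag sets is tried in order; the helper names and hostnames are hypothetical.

```javascript
// A member matches a tag set when it has every key/value pair in that set.
function matchesTagSet(memberTags, tagSet) {
  return Object.keys(tagSet).every(key => memberTags[key] === tagSet[key]);
}

// Tag sets are tried in order; the first set matching any member wins.
function filterByTagSets(members, tagSets) {
  for (const tagSet of tagSets) {
    const matched = members.filter(m => matchesTagSet(m.tags || {}, tagSet));
    if (matched.length > 0) return matched;
  }
  return [];
}

const members = [
  { host: "r1:27017", tags: { disk: "ssd", use: "reporting" } },
  { host: "r2:27017", tags: { disk: "spinning", use: "reporting" } },
  { host: "p1:27017", tags: { disk: "ssd", use: "production" } }
];

// Only r1 carries both disk: "ssd" and use: "reporting".
const reporting = filterByTagSets(members, [{ disk: "ssd", use: "reporting" }]);
console.log(reporting.map(m => m.host));
```

In the mongo shell, a mode and tag sets can be combined with a call such as `db.getMongo().setReadPref("secondary", [{ disk: "ssd", use: "reporting" }])`.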
32. Maintenance and Upgrade
• No downtime
• Rolling upgrade/maintenance
– Start with Secondary
– Primary last
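The ordering rule above (secondaries first, primary last) can be expressed as a small helper. This is a sketch under stated assumptions: the function name and hostnames are hypothetical, and the `state` strings mirror the `stateStr` values reported by `rs.status()`.

```javascript
// Order members for a rolling upgrade: every non-primary first, primary last.
function rollingUpgradeOrder(members) {
  const nonPrimaries = members.filter(m => m.state !== "PRIMARY");
  const primaries = members.filter(m => m.state === "PRIMARY");
  return nonPrimaries.concat(primaries).map(m => m.host);
}

const order = rollingUpgradeOrder([
  { host: "a:27017", state: "PRIMARY" },
  { host: "b:27017", state: "SECONDARY" },
  { host: "c:27017", state: "SECONDARY" }
]);
console.log(order.join(" -> ")); // b:27017 -> c:27017 -> a:27017
```

Before the last step, operators typically run `rs.stepDown()` on the primary so that an already-upgraded secondary takes over and the old primary can be upgraded as a secondary.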
33. Replica Set – 1 Data Center
• Single datacenter
• Single switch & power
• Points of failure:
– Power
– Network
– Data center
– Two node failure
• Automatic recovery of
single node crash
34. Replica Set – 2 Data Centers
• Multi data center
• DR node for safety
• Can’t do a durable multi-data-center
write safely, since there is only 1
node in the distant DC
35. Replica Set – 3 Data Centers
• Three data centers
• Can survive full data
center loss
• Can do w= { dc : 2 } to
guarantee write in 2
data centers (with tags)
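What a rule like `{ dc: 2 }` demands can be sketched as a check over the members that have acknowledged a write: the acknowledgements must cover at least 2 distinct values of the `dc` tag. The function name, hostnames, and data center names below are hypothetical; this models the rule, not the server's implementation.

```javascript
// Returns true when the acknowledging members cover, for each tag in the
// rule, at least the required number of distinct tag values.
function satisfiesTagRule(rule, ackedMembers) {
  return Object.keys(rule).every(tag => {
    const distinctValues = new Set(
      ackedMembers
        .filter(m => m.tags && tag in m.tags)
        .map(m => m.tags[tag])
    );
    return distinctValues.size >= rule[tag];
  });
}

const ny1 = { host: "ny1:27017", tags: { dc: "ny" } };
const ny2 = { host: "ny2:27017", tags: { dc: "ny" } };
const sf1 = { host: "sf1:27017", tags: { dc: "sf" } };

console.log(satisfiesTagRule({ dc: 2 }, [ny1, ny2])); // false: one DC only
console.log(satisfiesTagRule({ dc: 2 }, [ny1, sf1])); // true: two DCs
```

Note that two acknowledgements from the same data center do not satisfy the rule; the count is of distinct tag values, not of members.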
36. Recent improvements
• Read preference support with sharding
– Drivers too
• Improved replication over WAN/high-latency
networks
• rs.syncFrom command
• buildIndexes setting
• replIndexPrefetch setting
37. Just Use It
• Use replica sets
• Easy to set up
– Try on a single machine
• Check doc page for RS tutorials
– http://docs.mongodb.org/manual/replication/#tutorials
Basic explanation: 2 or more nodes form the set. Quorum.
Initialize -> Election. Primary + data replication from primary to secondary.
Primary down/network failure: automatic election of a new primary if a majority exists.
New primary elected. Replication established from the new primary.
Down node comes up. Rejoins the set. Recovery, and then secondary.
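The "if a majority exists" condition in the notes above is simple arithmetic: more than half of the voting members must be reachable. A minimal sketch, with a hypothetical function name:

```javascript
// A new primary can be elected only if strictly more than half of the
// voting members can see each other.
function hasMajority(totalVoters, reachableVoters) {
  return reachableVoters > Math.floor(totalVoters / 2);
}

console.log(hasMajority(3, 2)); // true: 2 of 3 is a majority
console.log(hasMajority(2, 1)); // false: 1 of 2 is not
console.log(hasMajority(4, 2)); // false: an even split has no majority
```

This is why a two-member set cannot elect a primary after losing one member, and why an arbiter (a voting member that holds no data) is often added to reach an odd number of voters.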
Primary: data member. Secondary: hot standby. Arbiters: voting member.
Priority: floating point number between 0 and 1000. The highest-priority member that is up to date wins. Up to date == within 10 seconds of the primary. If a higher-priority member catches up, it will force an election and win. Slave delay: lags behind the master by a configurable time delay. Automatically hidden from clients. Protects against operator errors: fat fingering, application corrupts data.
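The priority and slave delay options in these notes map onto per-member fields in the replica set configuration. The fragment below is a sketch with hypothetical hostnames and values; it assumes a delayed member is given priority 0 (so it can never become primary) and is marked hidden explicitly.

```javascript
// Hypothetical members array showing priority and slaveDelay settings.
const members = [
  { _id: 0, host: "db1.example.net:27017", priority: 2 }, // preferred primary
  { _id: 1, host: "db2.example.net:27017", priority: 1 },
  {
    _id: 2,
    host: "delayed.example.net:27017",
    priority: 0,      // never eligible to become primary
    hidden: true,     // not advertised to client applications
    slaveDelay: 3600  // stay one hour (in seconds) behind the primary
  }
];

const delayed = members.filter(m => m.slaveDelay > 0);
console.log(delayed.map(m => m.host));
```

With a one-hour delay, an accidental drop or a corrupting application bug on the primary leaves a window in which the delayed member still holds the earlier data.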
Analytics good for integrations with Hadoop, Storm, etc.
Consistency. Write preferences. Read preferences.
Not really fire and forget. This return arrow is to confirm that the network successfully transferred the packet(s) of data. This confirms that the TCP ACK response was received.
Presenter should mention: Default is w:1. w:majority is what most people should use for durability. "majority" is a special token here, signifying that more than half of the nodes in the set have acknowledged the write.
Using 'someDCs' so that in the event of an outage, at least a majority of the DCs would receive the change. This favors availability over durability.
Using 'allDCs' because we want to make certain all DCs have this piece of data. If any of the DCs are down, this would timeout. This favors durability over availability.
Upgrade/maintenance. Common deployment scenarios.
A good question to ask the audience: 'Why wouldn't you set w={dc:3}?' Why would you ever do that? What would be the complications?
Schema. Oplog.
rs.syncFrom allows administrators to configure the member of a replica set that the current member will pull data from. Specify the name of the member you want to sync from in the form [hostname]:[port].
Replica set members will not sync from members without indexes unless buildIndexes: false. To prevent inconsistency between members of replica sets, if a member has members[n].buildIndexes set to true, it will not sync from a member that has members[n].buildIndexes set to false. See SERVER-4160 for more information.
New option to configure index pre-fetching during replication. By default, when replicating operations, secondaries pre-fetch the indexes associated with a query, which improves replication throughput in most cases. The replIndexPrefetch setting and --replIndexPrefetch option allow administrators to disable this feature or to have the mongod pre-fetch only the index on the _id field. See SERVER-6718 for more information.