2. Agenda
1) What is replication?
2) How is replication handled?
3) Replica sets or master-slave replication
4) What is sharding, and why use it?
5) Implementing sharding
3. Replication
> Replication is the process of synchronizing data across multiple
servers.
> Replication provides redundancy and increases data availability.
With multiple copies of data on different database servers, replication
protects a database from the loss of a single server.
> Disaster recovery
> No downtime for maintenance (such as backups, index rebuilds, and
compaction)
> Read scaling (extra copies to read from)
4. How Replication Works
> MongoDB achieves replication through replica sets. A replica set is
a group of mongod instances that host the same data set.
- A replica set is a group of two or more nodes (a minimum of three
nodes is generally recommended).
- In a replica set, one node is the primary and the remaining nodes are
secondaries.
- All data replicates from the primary to the secondaries.
- During automatic failover or maintenance, an election is held
and a new primary node is elected.
- After a failed node recovers, it rejoins the replica set and works
as a secondary node.
5. Replica Set Members
1) Primary
2) Secondaries
2.1) Priority 0 Replica Set Members
2.2) Hidden Replica Set Members.
2.3) Delayed Replica Set Members
3) Arbiter
6. Primary Replica Set Member
● The primary is the only member in the replica set that
receives write operations.
● MongoDB applies write operations on the primary and
then records the operations on the primary’s oplog.
● Secondary members replicate this log and apply the
operations to their data sets.
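The flow described above (apply on the primary, record in the oplog, replay on the secondaries) can be sketched with a toy in-memory model. This is only an illustration and not how mongod is implemented; the `Node` class, the tuple-shaped operations, and the `sync` helper are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy replica set member: a key-value data set plus an oplog."""
    data: dict = field(default_factory=dict)
    oplog: list = field(default_factory=list)

    def apply(self, op):
        # Apply one operation to the local data set.
        kind, key, value = op
        if kind == "set":
            self.data[key] = value
        elif kind == "delete":
            self.data.pop(key, None)


def write_to_primary(primary, op):
    # The primary applies the write first, then records it in its oplog.
    primary.apply(op)
    primary.oplog.append(op)


def sync(secondary, primary):
    # The secondary copies the oplog entries it has not seen yet and replays them.
    for op in primary.oplog[len(secondary.oplog):]:
        secondary.apply(op)
        secondary.oplog.append(op)


primary, secondary = Node(), Node()
write_to_primary(primary, ("set", "a", 1))
write_to_primary(primary, ("set", "b", 2))
write_to_primary(primary, ("delete", "a", None))
sync(secondary, primary)
print(secondary.data)  # {'b': 2}
```

Because the secondary replays the same operations in the same order, it ends up with the same data set as the primary.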
7. Priority 0 Replica Set Members
A secondary maintains a copy of the primary’s data set.
A priority 0 member is a secondary that cannot become primary.
● Priority 0 members cannot trigger elections. Otherwise these members function as normal
secondaries.
● A priority 0 member maintains a copy of the data set, accepts read operations, and votes in
elections.
● Configure a secondary as a priority 0 member to prevent it from becoming primary, which is
particularly useful in multi-data center deployments.
8. Hidden Replica Set Members
● A hidden member maintains a copy of the primary’s data
set but is invisible to client applications.
● Hidden members must always be priority 0 members and
so cannot become primary.
● The db.isMaster() method does not display hidden
members. Hidden members, however, may vote in
elections.
9. Delayed Replica Set Members
● Delayed members contain copies of a replica set’s data set.
● However, a delayed member’s data set reflects an earlier, or delayed,
state of the set.
● Delayed members must be priority 0 members. Set the priority to 0 to prevent a delayed
member from becoming primary.
● Delayed members should be hidden members. Always prevent applications from
seeing and querying delayed members.
● Delayed members vote in elections for primary if members[n].votes is set to 1.
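To make the "earlier state" idea concrete, here is a minimal sketch of what a delayed member would hold if it only applied oplog entries older than its delay. The function name `delayed_view`, the tuple layout, and the timestamps are assumptions made for illustration; the oplog is assumed to be sorted by timestamp.

```python
def delayed_view(oplog, now, delay_secs):
    """Build the data set from entries that are at least delay_secs old."""
    data = {}
    for ts, key, value in oplog:
        if ts <= now - delay_secs:
            data[key] = value
    return data


# Hypothetical oplog entries: (timestamp in seconds, key, value)
oplog = [
    (0, "status", "draft"),
    (1800, "status", "review"),
    (5400, "status", "published"),
]
now = 7200

current = delayed_view(oplog, now, 0)      # a normal secondary
delayed = delayed_view(oplog, now, 3600)   # a member delayed by one hour
print(current)  # {'status': 'published'}
print(delayed)  # {'status': 'review'}
```

The delayed copy still shows the value from an hour ago, which is why applications should not be allowed to query it.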
10. Replica Set Arbiter
● An arbiter does not have a copy of the data set and cannot
become a primary.
● Replica sets may have arbiters to add a vote in elections
for primary.
● Arbiters always have exactly 1 election vote, and thus
allow replica sets to have an uneven number of voting
members without the overhead of an additional member
that replicates data.
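The value of an uneven number of voting members follows from simple arithmetic: electing a primary needs a majority of the voting members. The helper names below are hypothetical; the sketch only computes the majority and how many voters can be lost while a majority remains.

```python
def majority(voting_members):
    # Smallest number of votes that is more than half of the voters.
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    # How many voters can be unavailable while a majority is still reachable.
    return voting_members - majority(voting_members)


for n in range(2, 6):
    print(n, "voters: majority", majority(n), "tolerates", fault_tolerance(n), "failure(s)")
# 2 voters: majority 2 tolerates 0 failure(s)
# 3 voters: majority 2 tolerates 1 failure(s)
# 4 voters: majority 3 tolerates 1 failure(s)
# 5 voters: majority 3 tolerates 2 failure(s)
```

Two data-bearing members alone tolerate no failures; adding an arbiter as a third voter lets the set survive the loss of one member.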
11. DEMO FOR REPLICATION
● Create a replica set with 5 members of different
kinds (e.g. primary, secondary, hidden, arbiter,
and priority 0).
● Insert data and watch the behavior of the delayed and
arbiter members and of the other secondary members.
● Shut down the primary and trigger an election.
● Adjust the priority of a replica set member and prevent
a secondary from becoming primary.
● Configure a non-voting replica set member.
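As a rough sketch of what the demo configuration might look like, the dictionary below mirrors the shape of a replica set configuration document (`_id`, `members`, `priority`, `hidden`, `arbiterOnly`, `votes`). The set name `rs0` and the host names are placeholders, and `check_config`, `electable`, and `voters` are hypothetical helpers that check only a few of the rules from the slides, not everything the server validates.

```python
config = {
    "_id": "rs0",  # placeholder replica set name
    "members": [
        {"_id": 0, "host": "host0.example:27017", "priority": 2},
        {"_id": 1, "host": "host1.example:27017"},
        {"_id": 2, "host": "host2.example:27017", "priority": 0},
        {"_id": 3, "host": "host3.example:27017", "priority": 0, "hidden": True},
        {"_id": 4, "host": "host4.example:27017", "arbiterOnly": True},
    ],
}


def check_config(cfg):
    """Return a list of rule violations (empty when none are found)."""
    problems = []
    for m in cfg["members"]:
        priority = m.get("priority", 1)
        if m.get("hidden", False) and priority != 0:
            problems.append(f"member {m['_id']}: hidden members must have priority 0")
        if m.get("votes", 1) == 0 and priority != 0:
            problems.append(f"member {m['_id']}: non-voting members must have priority 0")
    return problems


def electable(cfg):
    # Members that may become primary: priority above 0 and not an arbiter.
    return [m["_id"] for m in cfg["members"]
            if m.get("priority", 1) > 0 and not m.get("arbiterOnly", False)]


def voters(cfg):
    # Members that vote in elections (votes defaults to 1).
    return [m["_id"] for m in cfg["members"] if m.get("votes", 1) > 0]


print(check_config(config))  # []
print(electable(config))     # [0, 1]
print(voters(config))        # [0, 1, 2, 3, 4]
```

To try the non-voting step of the demo, one could set `"votes": 0` together with `"priority": 0` on a member and run the helpers again.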
12. Sharding
● Sharding is a method for distributing data
across multiple machines.
● MongoDB uses sharding to support
deployments with very large data sets and
high throughput operations.
● MongoDB supports horizontal scaling through
sharding.
14. Shard Keys
● To distribute the documents in a collection,
MongoDB partitions the collection using the
shard key.
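● The shard key consists of a field or fields
that exist in every document in the target
collection. In older MongoDB releases, shard
key values are immutable.
● You choose the shard key when sharding a
collection. In older MongoDB releases the choice
of shard key cannot be changed after sharding;
MongoDB 5.0 and later can reshard a collection.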
15. Shard Key
● Chunk: a contiguous range of shard key
values within a particular shard. MongoDB
splits chunks when they grow beyond the
configured chunk size, which by default is 64
megabytes.
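Chunk splitting can be pictured with a small toy model. This sketch assumes a chunk is a sorted list of (shard key, document size) pairs and splits at the middle document whenever the total exceeds the limit; MongoDB chooses its split points differently, so treat `split_chunk` purely as an illustration.

```python
MB = 1024 * 1024


def chunk_size(chunk):
    return sum(size for _, size in chunk)


def split_chunk(chunk, max_size):
    """Split a sorted chunk in half, recursively, until every piece fits."""
    if chunk_size(chunk) <= max_size or len(chunk) < 2:
        return [chunk]
    mid = len(chunk) // 2
    return split_chunk(chunk[:mid], max_size) + split_chunk(chunk[mid:], max_size)


# 160 hypothetical documents of 1 MB each, with shard keys 0..159.
docs = [(key, MB) for key in range(160)]
chunks = split_chunk(docs, 64 * MB)

for c in chunks:
    print(f"keys {c[0][0]}..{c[-1][0]}: {chunk_size(c) // MB} MB")
# keys 0..39: 40 MB
# keys 40..79: 40 MB
# keys 80..119: 40 MB
# keys 120..159: 40 MB
```

Each resulting chunk still covers a contiguous range of shard key values, which is the property the definition above relies on.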
16. The Perfect Shard Key
If you think about it, the perfect shard key would have the following
characteristics:
● All inserts, updates, and deletes would each be distributed
uniformly across all of the shards in the cluster
● All queries would be uniformly distributed across all of the shards in
the cluster
● All operations would only target the shards of interest: an update or
delete would never be sent to a shard which didn't own the data being
modified
● Similarly, a query would never be sent to a shard which holds none
of the data being queried
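The targeting goal in the last two points can be sketched as a lookup over chunk ranges. The chunk table, shard names, and `target_shards` function are hypothetical; the sketch only shows that a query filtering on the shard key can be sent to the shards whose ranges overlap it, while a query without the shard key has to visit every shard.

```python
import math

# Hypothetical chunk table: (lower bound inclusive, upper bound exclusive, shard)
CHUNKS = [
    (-math.inf, 100, "shardA"),
    (100, 200, "shardB"),
    (200, math.inf, "shardC"),
]


def target_shards(chunks, key_range=None):
    """Return the shards a query must visit.

    key_range is an inclusive (low, high) range on the shard key,
    or None when the query does not filter on the shard key.
    """
    if key_range is None:
        return sorted({shard for _, _, shard in chunks})
    low, high = key_range
    return sorted({shard for lo, hi, shard in chunks if lo <= high and low < hi})


print(target_shards(CHUNKS, (150, 150)))  # ['shardB']
print(target_shards(CHUNKS, (50, 120)))   # ['shardA', 'shardB']
print(target_shards(CHUNKS))              # ['shardA', 'shardB', 'shardC']
```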
17. Hashed Vs Ranged Sharding
● Hashed shard keys use a hashed index of a
single field as the shard key to partition data
across your sharded cluster.
● Range-based sharding involves dividing data
into contiguous ranges determined by the
shard key values. In this model, documents
with “close” shard key values are likely to be in
the same chunk or shard.
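The difference between the two strategies shows up clearly when the shard key only ever increases. The sketch below is a simulation under stated assumptions: four shards, made-up range boundaries, and MD5 as a stand-in hash function (not necessarily what MongoDB uses internally). Both strategies assign ranges to shards; the hashed one assigns ranges of the hash value rather than of X itself.

```python
import bisect
import hashlib
from collections import Counter

SHARDS = 4
# Hypothetical chunk boundaries on X: shard 0 holds X < 250, shard 3 holds X >= 750.
RANGE_BOUNDS = [250, 500, 750]


def ranged_shard(x):
    # Ranged sharding: the shard is chosen by where X falls among the boundaries.
    return bisect.bisect_right(RANGE_BOUNDS, x)


def hashed_shard(x):
    # Hashed sharding: hash X first, then split the 64-bit hash space into equal ranges.
    digest = hashlib.md5(str(x).encode()).digest()
    hashed = int.from_bytes(digest[:8], "big")
    return (hashed * SHARDS) >> 64


new_keys = range(1000, 2000)  # monotonically increasing inserts
ranged = Counter(ranged_shard(x) for x in new_keys)
hashed = Counter(hashed_shard(x) for x in new_keys)

print("ranged:", dict(ranged))  # every insert lands on shard 3
print("hashed:", dict(sorted(hashed.items())))  # spread over all four shards
```

With ranged sharding, nearby keys stay together (good for range queries) but new inserts pile onto one shard; with hashed sharding, the inserts spread out but nearby keys are scattered.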
18. Given a collection using a monotonically increasing value X as
the shard key, ranged sharding sends incoming inserts to the
chunk that holds the highest values of X, so a single shard
receives most of the writes.
By using a hashed index on X, incoming inserts are instead
spread across the chunks in the cluster.
Ranged sharding is most efficient when the shard key displays the following traits:
● Large shard key cardinality
● Low shard key frequency
● Non-monotonically changing shard keys
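These three traits can be estimated from a sample of candidate shard key values. The helper below is a rough sketch with a hypothetical name, `shard_key_stats`; it treats the sample as being in insertion order and reports the number of distinct values, the share taken by the most common value, and whether the values only ever move in one direction.

```python
from collections import Counter


def shard_key_stats(values):
    """Summarize a non-empty sample of shard key values in insertion order."""
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)
    pairs = list(zip(values, values[1:]))
    return {
        "cardinality": len(counts),
        "max_frequency": max(counts.values()) / len(values),
        "monotonic": all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs),
    }


countries = ["US", "US", "US", "IN", "DE", "US"]
timestamps = [1, 2, 3, 4, 5, 6]

print(shard_key_stats(countries))
print(shard_key_stats(timestamps))
```

In this sample the country field has low cardinality and one dominant value, while the timestamp field has high cardinality but changes monotonically, so neither would suit ranged sharding well on its own.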