Evolution of MongoDB Sharding and Its Best Practices - Ranjith A - Mydbops Team
1. Evolution of MongoDB Sharding and Its Best Practices
Presented by
Ranjith A
Database Engineer @ Mydbops
Mydbops 11th Webinar
www.mydbops.com info@mydbops.com
3. Mydbops at a Glance
● Founded in 2015, HQ in Bangalore, India, 70+ employees.
● Mydbops focuses on Database Consulting with core specialization in MySQL, MongoDB and PostgreSQL
Administration and Support.
● Mydbops was created with the goal of developing a DevOps model for Database Administration.
● We help organisations scale MySQL/MongoDB/PostgreSQL and implement advanced technologies in
MySQL/MongoDB/PostgreSQL.
5. Agenda
● Intro
● Sharding Overview
● Sharding Architecture
● Types of Sharding
● Things to take care of before choosing a shard key
● Evolution of sharding from 3.6 to 5.0
● Q/A
7. Sharding Overview
● Sharding is a method for distributing data across multiple machines.
● By using MongoDB sharding, we can handle very large data sets and high-throughput operations.
● Database systems with large data sets or high throughput applications can challenge the capacity of a single server.
● For example, if our system is read-heavy, it will exhaust the CPU capacity of the server. Working set sizes
larger than the system's RAM stress the I/O capacity of disk drives.
8. Sharding Overview
We can scale a system with two engineering approaches.
Vertical Scaling:
● Achieved by a MongoDB replica set / standalone server.
● Data set resides on a single server.
● Scale by increasing the capacity of a single server.
● There is a limit to how far a single server's capacity can be increased.
Horizontal Scaling (Sharding):
● Achieved by MongoDB Sharding.
● Distributes the data set across multiple servers.
● Scale by adding additional servers as required.
● We can easily add or remove servers as required.
10. Sharding Architecture - (Data Shard)
● MongoDB supports horizontal scaling through MongoDB Sharded Cluster.
● MongoDB shards data at the collection level, distributing the collection data across the shards in the cluster.
● MongoDB Sharded Cluster consists of the following components:
1. Shards
2. Mongos
3. Config server
Shard:
● Each shard contains a subset of the sharded data.
● Each shard can be deployed as a replica set.
11. Sharding Architecture - (Mongos)
Mongos:
● The mongos acts as a query router, providing an interface between client applications and the sharded cluster.
● Applications never connect or communicate directly with the shards.
● The mongos tracks what data is on which shard by caching the metadata from the config servers.
● The mongos uses the metadata to route operations from applications and clients to the mongod instances.
● After the mongos receives responses from all shards, it merges the data and returns the result document to the client.
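The routing-and-merging behavior described above can be sketched in a few lines of Python. This is a hypothetical scatter-gather simulation, not MongoDB code: the shard names, documents, and helper `route_and_merge` are all illustrative.

```python
# Hypothetical sketch of the scatter-gather pattern mongos follows:
# send the query to every shard that may hold matching data, then
# merge the per-shard partial results into one sorted response.
shards = {
    "shard0": [{"_id": 1, "city": "Chennai"}, {"_id": 4, "city": "Delhi"}],
    "shard1": [{"_id": 2, "city": "Bangalore"}, {"_id": 3, "city": "Mumbai"}],
}

def route_and_merge(predicate, sort_key="_id"):
    """Query every shard, then merge and sort the partial results."""
    partial_results = []
    for shard_name, docs in shards.items():
        partial_results.extend(d for d in docs if predicate(d))
    return sorted(partial_results, key=lambda d: d[sort_key])

result = route_and_merge(lambda d: d["_id"] >= 2)
print([d["_id"] for d in result])  # documents from both shards, merged and sorted
```

In a real cluster, mongos can also skip the scatter phase entirely when the query includes the shard key, routing to only the shard(s) that own the matching chunks.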
12. Sharding Architecture - (Config)
Config Server:
● Config servers store metadata and configuration settings for the cluster. As of MongoDB 3.4, config servers must
be deployed as a replica set (CSRS).
● If your cluster has a single config server, then the config server is a single point of failure.
● If the config server is inaccessible, the cluster is not accessible.
● If you cannot recover the data on a config server, the cluster will be inoperable.
● Always deploy the config server as a three-member replica set for production deployments.
13. Types of Sharding
MongoDB supports two sharding methods for distributing data across sharded clusters.
● Hashed sharding
● Ranged sharding
14. Hashed sharding
● Hashed Sharding involves computing a hash of the shard key field's value. We can use either a single field hashed index
or a compound hashed index (New in 4.4) as the shard key.
● MongoDB automatically computes the hashes when resolving queries using hashed indexes.
● Hashed sharding provides a more even data distribution across the sharded cluster.
● The field we choose as the hashed shard key should have good cardinality (many unique values).
● The default _id field is a good example of high cardinality (ObjectId values).
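The even distribution that hashed sharding provides can be simulated in Python. This is an illustrative sketch: it uses MD5 modulo a chunk count, whereas MongoDB's real hashed index uses its own hash function, and the chunk count here is arbitrary.

```python
import hashlib

NUM_CHUNKS = 4

def hashed_chunk(shard_key_value):
    """Assign a document to a chunk by hashing its shard key value.
    Illustrative only: MongoDB's hashed index uses a different hash."""
    digest = hashlib.md5(str(shard_key_value).encode()).hexdigest()
    return int(digest, 16) % NUM_CHUNKS

# Even monotonically increasing keys (like default ObjectId _id values)
# spread across all chunks once hashed.
counts = [0] * NUM_CHUNKS
for key in range(10_000):
    counts[hashed_chunk(key)] += 1
print(counts)
```

The counts come out roughly equal for each chunk, which is exactly the property that makes hashed sharding resilient to monotonically growing keys (at the cost of making range queries scatter to all shards).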
16. Hashed sharding
Command to enable Sharding: (Collection Level)
sh.shardCollection( "databasename.collectionname", { "field" : "hashed" } )
● Before enabling sharding for a particular collection, make sure sharding is enabled on the database and that an index
exists on the fields we are going to use as the shard key.
18. Ranged sharding
● In range-based sharding, data is split into contiguous ranges determined by the shard key values.
● Data with "close" shard key values are likely to be in the same chunk or shard.
● This improves the performance of read queries that target documents within a contiguous range.
● Poor shard key selection will affect both read and write performance.
Command to enable Sharding: (Collection Level)
sh.shardCollection( "databasename.collectionname", { "field" : 1 } )
● Before enabling sharding for a particular collection, make sure sharding is enabled on the database and that an index
exists on the fields we are going to use as the shard key.
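The chunk lookup behind ranged sharding can be sketched with a binary search over chunk boundaries. The boundary values and chunk names below are hypothetical; each chunk covers an inclusive lower bound and exclusive upper bound, as MongoDB chunks do.

```python
import bisect

# Illustrative chunk boundaries for a ranged shard key.
# chunk0 = (minKey, 100), chunk1 = [100, 200), chunk2 = [200, 300),
# chunk3 = [300, maxKey)
chunk_uppers = [100, 200, 300]
chunk_names = ["chunk0", "chunk1", "chunk2", "chunk3"]

def chunk_for(shard_key_value):
    """Locate the chunk whose range contains the shard key value."""
    return chunk_names[bisect.bisect_right(chunk_uppers, shard_key_value)]

# Documents with "close" shard key values land in the same chunk,
# which is why ranged sharding lets range queries target a single shard.
print(chunk_for(150), chunk_for(160))  # both resolve to chunk1
```

A query for keys between 150 and 160 therefore needs only the shard owning `chunk1`, instead of scattering to every shard as a hashed key would require.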
19. Zone sharding
● In sharded clusters, a single shard can be associated with one or more zones, and a zone with one or more shards.
● Zones represent a group of shards and associate one or more ranges of shard key values to that zone.
● Zone ranges are always inclusive of the lower boundary and exclusive of the upper boundary.
● From MongoDB 4.0.2, dropping a collection deletes its associated zone/tag ranges.
● Zone information is stored in the config.shards & config.tags collections.
● Starting from MongoDB 4.4, we can shard a collection and define zones using compound shard keys, including
mixing a hashed key with non-hashed keys.
21. Zone sharding
Command to create Zone range:
sh.updateZoneKeyRange("dbname.collectionname", { fieldname: "minkey" }, { fieldname: "maxkey" },
"zonename")
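The inclusive-lower / exclusive-upper boundary rule for zone ranges can be demonstrated with a small Python sketch. The zone names and string boundaries here are hypothetical, not from the deck.

```python
# Illustrative sketch of zone range matching, mirroring the semantics of
# sh.updateZoneKeyRange(): ranges are inclusive of the lower boundary and
# exclusive of the upper boundary. Zone names/boundaries are hypothetical.
zones = [
    {"zone": "zoneA", "min": "A", "max": "M"},  # covers ["A", "M")
    {"zone": "zoneB", "min": "M", "max": "Z"},  # covers ["M", "Z")
]

def zone_for(shard_key_value):
    """Return the zone whose range contains the value, else None."""
    for z in zones:
        if z["min"] <= shard_key_value < z["max"]:  # inclusive min, exclusive max
            return z["zone"]
    return None  # value falls outside every zone range

print(zone_for("Chennai"), zone_for("Mumbai"))
```

Note the boundary case: a key equal to `"M"` belongs to `zoneB`, not `zoneA`, because the upper boundary of a zone range is exclusive.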
22. Shard key
The shard key is the key used to distribute the data evenly among all shards. A good shard key satisfies the points below.
● High Cardinality
● Low Frequency
● Non-Monotonically Changing Shard Keys
● Sharding Query Patterns
23. Shard key
High Cardinality:
● A high-cardinality shard key yields more chunks & evenly distributed data.
● A low-cardinality shard key yields fewer chunks & poorly distributed data.
● Each unique shard key value can exist on no more than a single chunk at any given time.
Low Frequency:
● A low-frequency shard key (values repeated rarely) yields more evenly distributed data.
● A high-frequency shard key (a few values repeated across many documents) concentrates data in a few chunks.
● Shard key cardinality & monotonically changing shard keys also contribute to the distribution of the data.
24. Shard key
Non-Monotonically Changing Shard Keys:
● A monotonically increasing or decreasing shard key tends to direct all inserts to a single chunk within the cluster.
● Each chunk has its own min & max value.
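The hotspot effect of a monotonic shard key can be simulated directly. The chunk boundaries below are illustrative: because every new key is greater than all previous ones, every insert falls into the last chunk (the one ending at maxKey).

```python
import bisect

# Illustrative chunk boundaries; chunk 3 covers [3000, maxKey).
chunk_uppers = [1000, 2000, 3000]

def chunk_index(key):
    """Index of the chunk whose range contains the key."""
    return bisect.bisect_right(chunk_uppers, key)

inserts_per_chunk = [0] * 4
next_key = 3000  # keys have already grown past the last split point
for _ in range(100):
    inserts_per_chunk[chunk_index(next_key)] += 1
    next_key += 1  # monotonically increasing, like a timestamp or ObjectId
print(inserts_per_chunk)  # every insert lands in the final chunk
```

This single hot chunk is why a raw timestamp or ObjectId makes a poor ranged shard key, and why hashing such a key (as in hashed sharding) restores even write distribution.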
25. Evolution of sharding from 3.6 to 5.0
Version | Modify shard key field value | Refine shard key | Change shard key | New variables | New features
3.6 | NO | NO | NO | orphanCleanupDelaySecs | Shard must be a replica set; all shard members have chunk metadata
4.2 | YES | NO | NO | sh.setBalancerState(true) / sh.setBalancerState(false) | Modify shard key field value except the immutable _id field
4.4 | YES | YES | NO | Hedged reads | Refinable shard keys; hedged reads; compound shard keys with a hashed field; remove multiple shards at a time; shard key size limit removed
5.0 | YES | YES | YES | reshardCollection | Change the shard key; change the name of a sharded collection
26. Sharding Features in 3.6
● Shards must be replica sets.
● All members of the shard replica set maintain the metadata regarding chunk metadata. This prevents reads from
the secondaries from returning orphaned data.
● Based on the orphanCleanupDelaySecs (new in 3.6) parameter, the migrated chunk is deleted from the source shard after the delay.
● orphanCleanupDelaySecs - default 900 seconds (15 min)
Set the orphanCleanupDelaySecs value to 20 min at mongod service start:
mongod --setParameter orphanCleanupDelaySecs=1200
setParameter command:
db.adminCommand( { setParameter: 1, orphanCleanupDelaySecs: 1200 } )
27. Sharding Features in 4.2
● We can update a document's shard key value, unless the shard key field is the immutable _id field.
● In earlier versions, we couldn't change a document's shard key value.
28. Sharding Features in 4.2
New variables in 4.2:
● sh.startBalancer() / sh.setBalancerState(true) - now also enables auto-splitting for the sharded cluster
● sh.stopBalancer() / sh.setBalancerState(false) - now also disables auto-splitting for the sharded cluster
● sh.enableAutoSplit() - enables auto-splitting without enabling the balancer
29. Sharding Features in 4.4
● Refinable Shard Keys - refine a collection's shard key by adding a suffix field or fields to the existing key
db.employee.createIndex({"employeeid" : 1, "mailid": 1})
db.adminCommand( {refineCollectionShardKey: "mydbops.employee",
key: { "employeeid" : 1, "mailid": 1 }} )
● Starting in 4.4, shard key fields can be missing from documents in a sharded collection.
● In earlier versions, shard key fields had to exist in every document of a sharded collection.
● Support Hedged Reads - To minimize latencies
● Support compound shard keys with a hashed field.
● More than one removeShard operation can run at a time.
● MongoDB removes the 512-byte limit on the shard key size.
30. Sharding Features in 5.0
● We have an option to change the shard key by using the reshardCollection command.
● We can change the name of a sharded collection by using the renameCollection command.
Things to take care of before resharding:
● MongoDB blocks writes for up to two seconds while finalizing the resharding operation.
● Available storage space should be at least 1.2x the size of the collection you want to reshard.
● Ensure disk utilisation stays below 50% and CPU utilisation below 80%.
● The new shard key cannot have a uniqueness constraint.
● Resharding is not supported for a collection that has a uniqueness constraint.
31. Sharding Features in 5.0
The following commands and methods are not supported on the collection while the resharding operation is in progress.
● collMod
● convertToCapped
● createIndexes
● createIndex()
● drop()
● dropIndexes
● dropIndex()
● renameCollection
● renameCollection()
32. Sharding Features in 5.0
The following commands and methods are not supported on the cluster while the resharding operation is in progress.
● addShard
● removeShard
● db.createCollection()
● dropDatabase
Resharding command example:
● db.adminCommand({ reshardCollection: "mydbops.client", key: {"cperiod": 1} } )