Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations by providing the capability for horizontal scaling.
Back to Basics 2017: Webinar 4
Introduction to Sharding
Joe Drumgoole
Director of Developer Advocacy, EMEA
MongoDB
@jdrumgoole
V1.1
Summary of Parts 1 to 3
• Introduction to NoSQL
• Your First MongoDB Application
• Introduction to Replica Sets
• MongoDB Compass, MongoDB Atlas
Agenda
• Sharding – What is it? Why do we need it?
• The architecture of a sharded cluster
• Sharded cluster constraints
• How a sharded cluster works in practice
But There Is More
[Diagram: Application and Driver, three mongos routers, Config Server]
Construction
• Build Cluster
• Identify shard key
• Sharding happens on individual collections
• To shard a collection:
sh.shardCollection( "MUGS.members", { "members.member_id" : 1 } )
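The construction steps above can be sketched as a small shell script. This is a minimal sketch, not the presenter's own script: the function name shardMembersCollection is hypothetical, and `sh` and `db` are the helpers the mongo shell provides when connected to a mongos. They are passed in as arguments so that loading the file runs nothing until the function is called.

```javascript
// Sketch only: `sh` and `db` are the mongo shell helpers available when
// connected to a mongos. The function name is an illustrative assumption.
function shardMembersCollection(sh, db) {
  const mugs = db.getSiblingDB("MUGS");

  // 1. Allow collections in the MUGS database to be sharded.
  sh.enableSharding("MUGS");

  // 2. Back the shard key with an index that starts with the key.
  mugs.members.createIndex({ "members.member_id": 1 });

  // 3. Shard the collection on the chosen key (ranged sharding).
  return sh.shardCollection("MUGS.members", { "members.member_id": 1 });
}
```

In a shell session connected to a mongos, calling shardMembersCollection(sh, db) and then sh.status() shows the resulting chunk layout.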
Shard Keys
• User defines shard key
• Shard key defines range of data
• Key space is like points on a line
• Range is a segment of that line
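The "points on a line" picture can be made concrete with a toy lookup. This is a sketch under stated assumptions: the chunk table, shard names and boundaries are invented for illustration, and -Infinity and Infinity stand in for the ends of the key space. Each range includes its lower bound and excludes its upper bound.

```javascript
// Hypothetical chunk table: each chunk owns the half-open range [min, max).
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" },
];

// Return the chunk whose segment of the line contains the given key.
function findChunk(chunkTable, key) {
  return chunkTable.find((c) => key >= c.min && key < c.max);
}

console.log(findChunk(chunks, 150).shard); // shard1
```

Because the segments cover the whole line without overlapping, every key value maps to exactly one chunk.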
Shard Key Constraints
• Shard keys are immutable
• Shard keys should have high cardinality
• Shard key values need not be unique (but a unique index must have the shard key as a prefix)
• Shard key must exist in every document
• Limited to 512 bytes in size
• Cannot be a multi-key (array)
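Three of the constraints above (the key exists, is not an array, and fits in 512 bytes) can be checked on a single document with a sketch like the following. MongoDB enforces these rules itself; the function name shardKeyProblems is hypothetical, only top-level fields are handled, and the size is approximated from the JSON encoding rather than the BSON encoding that the real limit applies to.

```javascript
// Illustrative check only; the server enforces the real rules.
const MAX_KEY_BYTES = 512;

function shardKeyProblems(doc, field) {
  const value = doc[field];
  if (value === undefined) {
    return ["shard key field is missing"];
  }
  const problems = [];
  if (Array.isArray(value)) {
    problems.push("shard key value is an array");
  }
  // Rough size estimate via JSON; an assumption, not the BSON size.
  const bytes = new TextEncoder().encode(JSON.stringify(value)).length;
  if (bytes > MAX_KEY_BYTES) {
    problems.push("shard key value exceeds 512 bytes");
  }
  return problems;
}
```

An empty result means the document passes these three checks. It says nothing about cardinality, which depends on the whole collection rather than on one document.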
Once a chunk reaches the maximum chunk size, mongos asks mongod to split it
+ mongod finds the split points with an internal command called splitVector
mongod estimates the number of documents on each side of the split
+ based on the average document size from `db.stats()`
Chunk split is a **logical** operation (no data has moved)
Max on first chunk should be 14
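To illustrate why a split is only a logical operation, here is a simplified sketch. It is not how splitVector works internally: it simply splits at the median key, and the function name and sample values are assumptions. The point is that only the range boundaries change, and both halves stay on the same shard.

```javascript
// Split one chunk's range at the median shard-key value it contains.
// Only range metadata changes; no documents are moved.
function splitChunk(chunk, sortedKeys) {
  const inChunk = sortedKeys.filter((k) => k >= chunk.min && k < chunk.max);
  if (inChunk.length < 2) return [chunk];
  const splitPoint = inChunk[Math.floor(inChunk.length / 2)];
  // If every key is identical there is no usable split point.
  if (splitPoint === inChunk[0]) return [chunk];
  return [
    { ...chunk, max: splitPoint },
    { ...chunk, min: splitPoint },
  ];
}

const parts = splitChunk({ min: 0, max: 100, shard: "shard0" }, [3, 8, 21, 40, 57, 90]);
console.log(parts);
```

With these sample keys the chunk [0, 100) becomes [0, 40) and [40, 100), both still on shard0. The unsplittable case, where every document shares one key value, is one reason a shard key needs high cardinality.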
The balancer runs on the primary of the config server replica set (on a mongos before MongoDB 3.4)
Once the difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts
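The balancing decision can be sketched as follows. This is a toy model, not the balancer's actual code: the function name planMigration is hypothetical, the threshold is passed in as a parameter rather than taken from the server, and "above the threshold" is read as strictly greater, which is an assumption.

```javascript
// chunkCounts maps shard name to number of chunks, e.g. { shard0: 10, shard1: 2 }.
// Returns the next migration to perform, or null if the cluster is balanced enough.
function planMigration(chunkCounts, threshold) {
  const shards = Object.keys(chunkCounts);
  if (shards.length < 2) return null;
  let most = shards[0];
  let least = shards[0];
  for (const s of shards) {
    if (chunkCounts[s] > chunkCounts[most]) most = s;
    if (chunkCounts[s] < chunkCounts[least]) least = s;
  }
  const diff = chunkCounts[most] - chunkCounts[least];
  // Assumption: "above the threshold" means strictly greater than.
  return diff > threshold ? { from: most, to: least } : null;
}

console.log(planMigration({ shard0: 10, shard1: 2, shard2: 5 }, 2));
```

With these sample counts the difference between shard0 and shard1 is 8, so one chunk would move from shard0 to shard1. Repeating the call after each move models successive migrations in a balancing round.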