This document discusses MongoDB sharding. It explains that sharding allows scaling a MongoDB deployment across multiple servers. Key aspects covered include the sharding architecture with config servers, shards, and mongos routers. It also discusses concepts like shard keys, chunks, and how the balancer migrates chunks for even data distribution. The document provides an example demo of setting up a sharded cluster and checking the sharding configuration and status.
2. Agenda
• Why Sharding
• Sharding Architecture
• What is Sharding
• Sharding Balancer
• Writes/Reads with Sharding
• Sharding Limitation
• Demo
3. Why Sharding
• All writes go to the master
• Latency-sensitive queries still go to the master
• A single replica set is limited to 12 nodes
• Memory can’t be large enough when the active dataset is big
• Local disk is not big enough
• Vertical upgrades are too expensive
5. Config Servers
• We run three config servers (each a mongod process) in a production cluster, or one in a test environment
• Changes are made using two-phase commit to provide strong consistency among all three config servers
• If any config server is down, the cluster metadata becomes read-only
• The system stays online as long as at least one of the three config servers is up
6. Shards
• Each shard can be a single master, a master/slave pair, or a replica set
• A replica set provides automatic failover for the sharded cluster
• Shards are regular mongod processes
7. Mongos
• The sharding router
• Acts just like a mongod to clients, making the cluster “invisible” to them
• You can run as many as you want
• Running one on each application server is suggested
• It caches metadata from the config servers
9. What is Sharding
• It’s range-based
• Automatic balancing for changes in load and data distribution
• Convert from a single replica set to a sharded cluster without downtime
• Easy addition of new shards without downtime
• Scaling to one thousand nodes
• No single point of failure
• Automatic failover
10. Shard key
• It can be one or more fields
• Every document needs a shard key (null is OK)
• A document’s shard key can’t be updated
• MongoDB's sharding is order-preserving. You can define the shard key in ascending or descending order, like { tag : 1, timestamp : -1 }
• Values of different types sort as: null < numbers < strings < objects < arrays < binary data < ObjectIds < booleans < dates < regular expressions
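The cross-type ordering listed above can be sketched as a comparator. This is a simplified illustration in plain JavaScript, not MongoDB's own comparison code: the names typeRank and compareKeys are hypothetical, and binary data and ObjectIds are left out because plain JavaScript has no built-in equivalents.

```javascript
// Rank of each value type, following the order listed on the slide.
// Ranks 5 (binary data) and 6 (ObjectIds) are intentionally unused here.
function typeRank(value) {
  if (value === null) return 0;
  if (typeof value === "number") return 1;
  if (typeof value === "string") return 2;
  if (Array.isArray(value)) return 4;
  if (typeof value === "boolean") return 7;
  if (value instanceof Date) return 8;
  if (value instanceof RegExp) return 9;
  if (typeof value === "object") return 3; // any other object
  throw new Error("type not covered by this sketch");
}

// Compare by type rank first, then by value within the same type.
// Within-type comparison is only meaningful here for numbers, strings,
// booleans and dates.
function compareKeys(a, b) {
  const rankDiff = typeRank(a) - typeRank(b);
  if (rankDiff !== 0) return rankDiff;
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

// Mixed-type sample values (made up for illustration)
const sorted = [true, "apple", 42, null, new Date(0), 7].sort(compareKeys);
console.log(sorted);
```

Because the type rank is checked first, every number sorts before every string regardless of the values themselves.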
11. Chunk
• A chunk is a contiguous range of data from a particular collection
• A collection is broken into chunks by shard key range
• A chunk is a logical concept, not a physical reality. Each chunk covers min <= key < max, with $minKey and $maxKey as the outermost bounds
• Each document must belong to one and only one chunk
• The default size is 64 MB; it can be changed with --chunkSize
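The half-open range rule (min <= key < max) can be sketched as a lookup over a chunk table. This is a minimal sketch, not mongos code: the chunk table, the shard names, the numeric shard key, and the use of -Infinity/Infinity in place of $minKey/$maxKey are all assumptions made for illustration.

```javascript
// Sentinels standing in for $minKey / $maxKey (assumed representation)
const MIN_KEY = -Infinity;
const MAX_KEY = Infinity;

// Hypothetical chunk table for a collection sharded on a numeric key
const chunks = [
  { min: MIN_KEY, max: 100, shard: "shard1" },
  { min: 100, max: 500, shard: "shard2" },
  { min: 500, max: MAX_KEY, shard: "shard1" },
];

// Return the single chunk whose range contains the key: min <= key < max
function findChunk(chunkTable, key) {
  const matches = chunkTable.filter((c) => c.min <= key && key < c.max);
  if (matches.length !== 1) {
    throw new Error("key must belong to exactly one chunk, found " + matches.length);
  }
  return matches[0];
}

console.log(findChunk(chunks, 99).shard);
console.log(findChunk(chunks, 100).shard);
```

The lower bound is inclusive and the upper bound exclusive, so a key equal to a split point belongs to the chunk that starts there. That is what guarantees each document falls into one and only one chunk.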
19. Chunk Migration
• Chunk migration is an expensive operation
• Only one chunk migration happens at any time
• The balancer automatically migrates chunks between shards, based on the overall size of each shard
• You can also move chunks manually
20. Sharding Balancer
• Keeps data evenly distributed across all shards
• Minimizes the amount of data transferred
• For a balancing round to occur, a shard must have at least nine more chunks than the least-populous shard
• It can be turned off from the config database:
• db.settings.update({"_id" : "balancer"}, {"$set" : {"stopped" : true }}, true)
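The balancing rule can be sketched as a small decision function. This is an illustration only, not the balancer's implementation: planBalancingRound and its input shape are hypothetical, and the threshold of nine is taken from the slide and kept as a parameter because it may not hold for every MongoDB version.

```javascript
// Threshold from the slide; an assumption for any particular MongoDB version
const CHUNK_DIFF_THRESHOLD = 9;

// chunkCounts maps shard name -> number of chunks (hypothetical input shape).
// Returns null when no round is needed, otherwise the source and target
// shard for a single chunk move.
function planBalancingRound(chunkCounts, threshold = CHUNK_DIFF_THRESHOLD) {
  const entries = Object.entries(chunkCounts);
  if (entries.length < 2) return null;
  let most = entries[0];
  let least = entries[0];
  for (const entry of entries) {
    if (entry[1] > most[1]) most = entry;
    if (entry[1] < least[1]) least = entry;
  }
  if (most[1] - least[1] < threshold) return null; // not imbalanced enough
  return { from: most[0], to: least[0] };
}

console.log(planBalancingRound({ shard1: 12, shard2: 4 })); // difference of 8
console.log(planBalancingRound({ shard1: 13, shard2: 4 })); // difference of 9
```

Requiring a gap before acting keeps the balancer from shuffling chunks back and forth over small differences, which fits the goal of minimizing the amount of data transferred.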
25. Choosing Shard Key
• A good shard key distributes reads and writes while keeping the data you’re using together
• Don’t use an ascending shard key like an ID
• Don’t use a low-cardinality shard key like continent
• Don’t use a random shard key like an MD5 hash
• Good example: coarsely ascending key + search key
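The difference between an ascending key and a "coarsely ascending key + search key" can be sketched with a small simulation that counts how many new inserts land in each chunk. All names, split points, and sample values below (a month string plus a user name) are made up for illustration; real chunk boundaries depend on the data.

```javascript
// Compare two compound keys (arrays) field by field
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// boundaries are sorted split points; chunk i holds keys in
// [boundaries[i-1], boundaries[i]), with open ends on both sides
function chunkIndex(boundaries, key) {
  let i = 0;
  while (i < boundaries.length && compareKeys(key, boundaries[i]) >= 0) i++;
  return i;
}

function countPerChunk(boundaries, keys) {
  const counts = new Array(boundaries.length + 1).fill(0);
  for (const key of keys) counts[chunkIndex(boundaries, key)]++;
  return counts;
}

// Case 1: strictly ascending key, such as an auto-incrementing ID.
// Every new ID is larger than all split points.
const idBoundaries = [[1000], [2000], [3000]];
const newIds = [];
for (let id = 3001; id <= 3100; id++) newIds.push([id]);
const ascendingCounts = countPerChunk(idBoundaries, newIds);

// Case 2: coarsely ascending key (month) + search key (user name).
// The current month is already split by user name.
const compoundBoundaries = [["2012-03", "h"], ["2012-03", "p"]];
const users = ["alice", "bob", "henry", "kate", "peter", "zoe"];
const newDocs = users.map((u) => ["2012-03", u]);
const compoundCounts = countPerChunk(compoundBoundaries, newDocs);

console.log(ascendingCounts, compoundCounts);
```

In the first case all 100 inserts hit the last chunk, so one shard takes every write. In the second case the writes for the current month spread across three chunks, while documents for the same month and user still sit next to each other.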
27. Sharding Limitation
• A unique index can’t be created without the shard key as a prefix
• You can’t update the shard key
• Only one chunk moves in the cluster at a time
• Sharding does not yet support data center awareness
• Adding new shards brings more traffic to the existing cluster
• 20 PB size limit
28. Demo
• Start up the shards
• Start up the config servers
• Start up mongos
• Configure the shards
• Shard the data
• Look at the config data on the config server
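The "Configure the shards" and "Shard the data" steps might look like the sketch below when run through the mongo shell's sh helper while connected to mongos. The host names, ports, database name, collection name, and shard key are hypothetical placeholders, and the sketch assumes a shell version that provides sh.addShard, sh.enableSharding, and sh.shardCollection. The steps are wrapped in a function that takes the helper as an argument, so the same file loads harmlessly outside the shell.

```javascript
// Register two shards, enable sharding on a database, then shard one
// collection. All hosts and names are placeholders, not values from the demo.
function configureSharding(shHelper) {
  shHelper.addShard("localhost:10000");
  shHelper.addShard("localhost:10001");
  shHelper.enableSharding("demo");
  shHelper.shardCollection("demo.users", { username: 1 });
}

// In a mongo shell connected to mongos, the global sh object exists.
// Outside the shell this branch is skipped.
if (typeof sh !== "undefined") {
  configureSharding(sh);
}
```

The shards, config servers, and mongos must already be running before these commands are issued, which matches the order of the steps above.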
35. Look at config data
• Log in to the config database
• db.shards.find()
• db.databases.find()
• db.chunks.find()
• db.printShardingStatus(true)
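Once the chunk documents are in hand, a quick way to see how evenly data is spread is to count chunks per shard. The helper below is a sketch: chunksPerShard is a hypothetical name, it relies only on each chunk document having a shard field, and the sample documents are made up rather than copied from a real cluster.

```javascript
// Count chunks per shard from documents shaped like those in the
// chunks collection of the config database. Only "shard" is read.
function chunksPerShard(chunkDocs) {
  const counts = {};
  for (const doc of chunkDocs) {
    counts[doc.shard] = (counts[doc.shard] || 0) + 1;
  }
  return counts;
}

// Made-up sample documents; the ns field is shown only for context
const sampleChunks = [
  { ns: "demo.users", shard: "shard0000" },
  { ns: "demo.users", shard: "shard0001" },
  { ns: "demo.users", shard: "shard0000" },
];

console.log(chunksPerShard(sampleChunks));
```

In the shell, after switching to the config database, the same function could be fed with db.chunks.find().toArray().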
36. Recommended Reading
• MongoDB documentation
• http://www.mongodb.org/display/DOCS/Sharding
• Book “Scaling MongoDB”
• You can find it on www.safaribooksonline.com