Sharding in MongoDB Days 2013
Sharding presentation used at the MongoDB Days 2013 conferences in North America: Seattle, Chicago,

Presentation Transcript

  • #MongoDBDays Chicago Introduction To Sharding J. Randall Hunt
 Hackathoner, MongoDB
 @jrhunt, randall@mongodb.com
  • In Today's Talk • What? Why? When? • How? • What's happening behind the scenes?
  • What Is Sharding?
  • This is a picture of my cat.
  • This is a picture of ~100 cats. http://a1.s6img.com/cdn/0011/p/3123272_8220815_lz.jpg
  • This is a cat trying to find a home webserver mongod
  • 100 cats trying to find a home. webserver (not to scale) mongod
  • Scale Up?
  • Data Store Scalability • Custom Hardware • Custom Software
    In the past you had two options for achieving data store scalability: 1) custom hardware (Oracle?) or 2) custom software (Google, Facebook).
    These solutions were custom because the problems were not yet common enough. The number of people on the internet 10 years ago is incredibly small compared to the number of people who will be using web services 10 years from now.
  • Scale Out?
  • The MongoDB Sharding Solution • Automatically partition your data • Worry about failover at the partition layer • Application independent • Free and open source
  • Why Do I Shard?
  • Input/Output • Your input/output exceeds the capacity of a single node or replica set. This is not easy to do!
  • Working Set Exceeds Physical Memory • RAM has to hold: Data • Indexes • Sorts • Aggregations
  • How Does Sharding Work?
  • MongoDB's Sharding Infrastructure
  • MongoDB's Sharding Infrastructure app server mongod
  • MongoDB's Sharding Infrastructure app server mongod mongod mongod
  • MongoDB's Sharding Infrastructure app server shard
  • MongoDB's Sharding Infrastructure app server mongos shard
  • MongoDB's Sharding Infrastructure app server mongos mongod --configsvr shard
  • Terminology • Shards • Chunks • Config Servers • mongos
    A shard is a server, or a collection of servers, that holds a subset of a collection's data in the form of chunks, which are split up according to a shard key.
    A chunk is a group of data falling in a particular range of the shard key; it can be moved logically from server to server.
    Config servers hold information about where chunks live.
    mongos is the router and balancer: it communicates with the config servers and figures out how to intelligently direct your query.
  • What exactly is a shard? • A shard is a node of the cluster • Can be a single mongod or an entire replica set (primary and secondaries)
    Now what do shards hold? Chunks, which are partitions of your data that live in certain ranges.
  • Partitioning • User defines a shard key or uses hash-based sharding • Shard key defines a range of data • The key space is like points on a line • A range is a segment of that line
    Key space: -∞ to +∞. Remember interval notation?
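The idea of ranges as segments of a line can be sketched in a few lines of plain JavaScript. This is only an illustrative model, not MongoDB's internal code: the chunk table, the boundary values "g" and "p", the shard names, and the helper names `inRange` and `findChunk` are all assumptions made up for the example.

```javascript
// Hypothetical chunk table: each chunk covers the half-open range [min, max)
// of the key space, and together the chunks cover -Infinity to +Infinity.
const chunks = [
  { min: -Infinity, max: "g", shard: "shard0000" },
  { min: "g", max: "p", shard: "shard0001" },
  { min: "p", max: Infinity, shard: "shard0002" },
];

// Treat -Infinity / Infinity as the open ends of the key space.
function inRange(key, min, max) {
  const aboveMin = min === -Infinity || key >= min;
  const belowMax = max === Infinity || key < max;
  return aboveMin && belowMax;
}

// Find the single chunk whose range contains the given shard key value.
function findChunk(key) {
  return chunks.find((c) => inRange(key, c.min, c.max));
}

console.log(findChunk("felix").shard);    // shard0000
console.log(findChunk("garfield").shard); // shard0001
console.log(findChunk("tom").shard);      // shard0002
```

Because the ranges are half-open, a key equal to a boundary (for example "g") belongs to the chunk that starts at that boundary.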
  • Data Distribution • Initially a single chunk • Default max chunk size: 64 MB • MongoDB will automatically split and migrate chunks as they reach the max size
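A rough sketch of the "split when a chunk reaches the max size" rule follows. It assumes a toy model in which a chunk is a sorted list of documents with distinct keys and a split happens at the median key; the names `chunkSize` and `maybeSplit` and the 20 MB document sizes are hypothetical, and the real split logic is more involved.

```javascript
const MB = 1024 * 1024;
const MAX_CHUNK_BYTES = 64 * MB; // default max chunk size quoted on the slide

// Toy model (an assumption): a chunk is { min, max, docs }, where docs is a
// list of { key, bytes } sorted by key, with distinct keys.
function chunkSize(chunk) {
  return chunk.docs.reduce((sum, d) => sum + d.bytes, 0);
}

// Split an oversized chunk into two at its median key; otherwise keep it.
function maybeSplit(chunk, maxBytes = MAX_CHUNK_BYTES) {
  if (chunkSize(chunk) <= maxBytes || chunk.docs.length < 2) return [chunk];
  const mid = Math.floor(chunk.docs.length / 2);
  const splitKey = chunk.docs[mid].key;
  return [
    { min: chunk.min, max: splitKey, docs: chunk.docs.slice(0, mid) },
    { min: splitKey, max: chunk.max, docs: chunk.docs.slice(mid) },
  ];
}

// The initial single chunk spans the whole key space. Four 20 MB documents
// add up to 80 MB, which exceeds the 64 MB limit.
const initial = {
  min: -Infinity,
  max: Infinity,
  docs: ["a", "b", "c", "d"].map((key) => ({ key, bytes: 20 * MB })),
};

const parts = maybeSplit(initial);
console.log(parts.map((p) => [p.min, p.max, p.docs.length]));
```

After the split, the two halves share the boundary key "c": the first chunk ends just before it and the second starts at it, so the key space is still fully covered.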
  • Shards and Shard Keys
  • Shards and Shard Keys Chunks!
  • Shards and Shard Keys Chunks! Shard Keys!
  • What is a config server? • A config server stores shard metadata • It stores chunk ranges and locations • Run with 3 in production!
    This is not a replica set; the three servers are purely for failover purposes. Pro tip: use CNAMEs to identify these.
  • What is a mongos? • Acts as a router / balancer for queries and ops • No local data (persists all info to the config servers) • Can run with just one or many
  • MongoDB's Sharding Infrastructure (diagram: three app servers, three mongos routers, three shards, three config servers)
  • Get Started With Sharding? 1. Choose a shard key (we'll talk about this later) 2. Start config servers 3. Turn on sharding 4. Profit.
  • Mechanics of Sharding Oh hey there devops!
  • Start the Configuration Server
    mongod --configsvr
    Starts a configuration server on the default port (27019)
  • Start the mongos router
    mongos --configdb catconf.mongodb.com:27019
  • Start the mongod
    mongod --shardsvr
    Starts a mongod with the default shard port (27018). The shard is not yet connected to the rest of the cluster. It could have already been a part of the cluster.
  • Add the Shard
    On mongos: sh.addShard('cat1.mongodb.com:27018')
    For a replica set: sh.addShard('<rsname>/<seedlist>')
  • Check that everything is working!
    [mongos] admin> db.runCommand({ listShards: 1 })
    { "shards": [ { "_id": "shard0000", "host": "cat1.mongodb.com:27018" } ], "ok": 1 }
  • Now enable sharding • Enable Sharding on a database
 sh.enableSharding("<dbname>") • Shard a collection (with a key):
 sh.shardCollection(
 "<dbname>.cats",
 {"name": 1}) • Use a compound shard key to prevent duplicates
 sh.shardCollection(
 "<dbname>.cats",
 {"name": 1, "uniqueid": 1})
  • Tag Aware Sharding • Total control over the distribution of your data! • Tag a range of shard keys:
 sh.addTagRange(<collection>,<min>,<max>,<tag>) • Tag a shard:
 sh.addShardTag("shard0000","NYC")
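To make the effect of tags concrete, here is a small stand-alone model of how a tagged key range restricts which shards may hold it. It is a sketch under assumptions: the `zip` field, the second shard "shard0001", the "SFO" tag, and the helpers `tagFor` and `eligibleShards` are invented for illustration, and only the "shard0000" / "NYC" pairing comes from the slide.

```javascript
// Mirrors sh.addShardTag(shard, tag). "shard0001" and "SFO" are hypothetical.
const shardTags = {
  shard0000: ["NYC"],
  shard0001: ["SFO"],
};

// Mirrors sh.addTagRange(namespace, min, max, tag) for an assumed shard key
// on a "zip" field. Each range includes min and excludes max.
const tagRanges = [
  { min: { zip: "10001" }, max: { zip: "10281" }, tag: "NYC" },
  { min: { zip: "94102" }, max: { zip: "94135" }, tag: "SFO" },
];

// Look up the tag (if any) that covers a given shard key value.
function tagFor(zip) {
  const range = tagRanges.find((r) => zip >= r.min.zip && zip < r.max.zip);
  return range ? range.tag : null;
}

// Shards that are allowed to hold a document with this shard key value.
function eligibleShards(zip) {
  const tag = tagFor(zip);
  const all = Object.keys(shardTags);
  if (tag === null) return all; // untagged ranges may live on any shard
  return all.filter((s) => shardTags[s].includes(tag));
}

console.log(eligibleShards("10011")); // [ 'shard0000' ]
console.log(eligibleShards("94110")); // [ 'shard0001' ]
console.log(eligibleShards("60601")); // [ 'shard0000', 'shard0001' ]
```

In this model, key values outside every tagged range stay unconstrained, so they can still be balanced across all shards.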

  • The Balancer • Ensures even distribution of chunks across the cluster • Transparent to driver and application • Very tuneable but defaults are often sensible • Try to minimize clock skew with ntpd
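The "even distribution of chunks" goal can be illustrated with a simplified balancing loop. This is a sketch of the general idea only, assuming that balance is judged by chunk counts and that a difference of 2 triggers a migration; the function name `balance` and the threshold value are assumptions, not the real balancer's settings.

```javascript
// Simplified model: repeatedly move one chunk from the shard with the most
// chunks to the shard with the fewest, until the gap is below the threshold.
// The threshold of 2 is an assumed value for illustration.
function balance(chunkCounts, threshold = 2) {
  if (threshold < 2) throw new Error("threshold must be at least 2");
  const counts = { ...chunkCounts };
  const migrations = [];
  while (true) {
    const names = Object.keys(counts).sort((a, b) => counts[a] - counts[b]);
    const least = names[0];
    const most = names[names.length - 1];
    if (counts[most] - counts[least] < threshold) break;
    counts[most] -= 1;
    counts[least] += 1;
    migrations.push({ from: most, to: least });
  }
  return { counts, migrations };
}

const result = balance({ shard0000: 10, shard0001: 2, shard0002: 0 });
console.log(result.counts);            // { shard0000: 4, shard0001: 4, shard0002: 4 }
console.log(result.migrations.length); // 6
```

The application never sees these moves in this model either: only the chunk-to-shard mapping changes, which is what "transparent to driver and application" refers to.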
  • Routing Requests (Oh hi there application developers!)
  • Cluster Request Routing • Scatter Gather • Targeted • Choose your own adventure!
  • Targeted Query
    1. Routable request received by mongos
    2. Request routed to the appropriate shard
    3. Shard returns results to mongos
    4. mongos returns results to client
  • Non-targeted queries
    1. Request received by mongos
    2. mongos farms the request out to all shards
    3. Shards return results to mongos
    4. mongos returns results to client
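The difference between the two routing paths comes down to whether the query includes the shard key. The sketch below models that decision for simple equality queries only; the chunk boundary "m", the shard names, and the `route` function are assumptions for illustration, and it is not how mongos is actually implemented.

```javascript
const SHARD_KEY = "name"; // matches the {"name": 1} shard key used earlier

// Hypothetical chunk table with a single split point at "m".
const chunks = [
  { min: -Infinity, max: "m", shard: "shard0000" },
  { min: "m", max: Infinity, shard: "shard0001" },
];
const allShards = [...new Set(chunks.map((c) => c.shard))];

// Decide which shards must be contacted for an equality query.
function route(query) {
  if (!(SHARD_KEY in query)) {
    // No shard key in the query: every shard might hold matching documents.
    return { type: "scatter-gather", shards: allShards };
  }
  const key = query[SHARD_KEY];
  const chunk = chunks.find(
    (c) =>
      (c.min === -Infinity || key >= c.min) &&
      (c.max === Infinity || key < c.max)
  );
  return { type: "targeted", shards: [chunk.shard] };
}

console.log(route({ name: "felix" }));  // targeted, shard0000 only
console.log(route({ color: "black" })); // scatter-gather, both shards
```

This is also why choosing a shard key that appears in common queries matters: it turns scatter-gather requests into targeted ones.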
  • Choosing A Shard Key
  • Things to remember! • Shard key values are immutable • Shard key must be indexed • It is limited to 512 bytes in size • Try to choose a field used in queries • Should not be monotonically increasing! • Only the shard key can be guaranteed unique across shards
  • How to choose your key? • Cardinality • Write Distribution • Query Isolation • Reliability • Index Locality
    Cardinality: can your data be broken down enough? Query isolation: query targeting to a specific shard. Reliability: shard outages.
    A good shard key can: optimize routing, minimize (unnecessary) traffic, and allow the best scaling.
    Consider pre-splitting. No unique indexes on keys unless they are part of the shard key. Geo keys cannot be part of a shard key: $near won't work, but the $geo commands work fine.
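The warning about monotonically increasing keys can be seen in a small simulation of write distribution. Everything here is a hypothetical model: the id range, the chunk boundaries, and the four-shard layout are made up, and the FNV-1a hash merely stands in for MongoDB's own hashed-index function, which is different.

```javascript
// FNV-1a, a simple 32-bit string hash. It is a stand-in for MongoDB's hashed
// index function (which differs) and is only here to scramble key order.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const SHARDS = 4;

// Range sharding on an increasing key: the boundaries were chosen from older,
// smaller ids, so every new id lands in the last chunk (the one ending at +∞).
function rangeShard(id, boundaries) {
  const idx = boundaries.findIndex((b) => id < b);
  return idx === -1 ? boundaries.length : idx;
}

// Hashed sharding: divide the 32-bit hash space into equal ranges.
function hashedShard(id) {
  return Math.floor(fnv1a(String(id)) / (2 ** 32 / SHARDS));
}

// Count how many of 1000 new inserts (ids 1000..1999) each shard receives.
function countWrites(pick) {
  const counts = new Array(SHARDS).fill(0);
  for (let id = 1000; id < 2000; id++) counts[pick(id)] += 1;
  return counts;
}

const boundaries = [250, 500, 750]; // split points set while ids were < 1000
const rangeCounts = countWrites((id) => rangeShard(id, boundaries));
const hashedCounts = countWrites(hashedShard);

console.log(rangeCounts);  // [ 0, 0, 0, 1000 ]: one shard takes every write
console.log(hashedCounts); // writes spread over the shards
```

The range-based run shows the hot spot directly: all new writes hit the last shard. The trade-off with hashing, in this model as in general, is that range queries on the key can no longer be targeted to a single shard.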
  • Thanks! • What's Next? • Resources:
 https://education.mongodb.com/
 https://www.mongodb.com/presentations • Me:
 @jrhunt, randall@mongodb.com
    In summary, and this is not a sales pitch: lots of other databases out there have sharding and replication, but not many of them provide the granularity of control that you need for your applications while maintaining sensible defaults.