#MongoDBDays Chicago

Introduction To Sharding
J. Randall Hunt

Hackathoner, MongoDB

@jrhunt, randall@mongodb.com
In Today's Talk

•

What? Why? When?

•

How?

•

What's happening behind the scenes?
What Is Sharding?
This is a picture of my cat.
This is a picture of ~100 cats.

http://a1.s6img.com/cdn/0011/p/3123272_8220815_lz.jpg
This is a cat trying to find a home

webserver

mongod
100 cats trying to find a home.

webserver

(not to scale)

mongod
Scale Up?
Data Store Scalability

•

Custom Hardware

•

Custom Software

In the past you've had two options for achieving data stor...
Scale Out?
Scale Out?
The MongoDB Sharding Solution
•

Automatically partition your data

•

Worry about failover at the partition layer

•

App...
Why Do I Shard?
Input/Output

Your input/output exceeds the capacity of a single node or replica set.

This is not easy to do!
Working Set Exceeds Physical Memory

RAM
Working Set Exceeds Physical Memory

Data

RAM
Working Set Exceeds Physical Memory

Data

RAM

Indexes
Working Set Exceeds Physical Memory

Data

RAM Sorts

Indexes
Working Set Exceeds Physical Memory

Data

RAM Sorts

Indexes

Aggregations
Working Set Exceeds Physical Memory

Data

Indexes
RAM

Sorts

Aggregations
Working Set Exceeds Physical Memory
How Does Sharding Work?
MongoDB's Sharding Infrastructure
MongoDB's Sharding Infrastructure
app server

mongod
MongoDB's Sharding Infrastructure
app server

mongod
mongod
mongod
MongoDB's Sharding Infrastructure
app server

shard
MongoDB's Sharding Infrastructure
app server

shard
MongoDB's Sharding Infrastructure
app server

mongos

shard
MongoDB's Sharding Infrastructure
app server

mongos

mongod --configsvr

shard
MongoDB's Sharding Infrastructure
app server

mongos

mongod --configsvr

shard
Terminology
•

Shards

•

Chunks

•

Config Servers

•

mongos

A shard is a server, or a collection of servers, that holds...
What exactly is a shard?
•

Shard is a node of the cluster

•

Can be a single mongod or an entire replica set

Shard

Mon...
Partitioning
•

User defines a shard key or uses hash based sharding

•

Shard key defines a range of data

•

The key space...
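
The idea of a key space cut into ranges can be sketched in a few lines of plain JavaScript. This is only an illustration: the chunk table, the shard names, and the use of null as a stand-in for -∞ and +∞ are assumptions, not what the config servers actually store.

```javascript
// Hypothetical chunk table for a collection sharded on "name".
// Each chunk covers the half-open range [min, max) of shard key values.
// null stands in for -infinity (as min) or +infinity (as max).
const chunks = [
  { min: null, max: "g", shard: "shard0000" },
  { min: "g", max: "p", shard: "shard0001" },
  { min: "p", max: null, shard: "shard0000" },
];

// Return the chunk whose range contains the given shard key value.
function findChunk(chunkTable, key) {
  return chunkTable.find(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
}

console.log(findChunk(chunks, "felix").shard);
console.log(findChunk(chunks, "garfield").shard);
console.log(findChunk(chunks, "tom").shard);
```

Because every range is closed on the left and open on the right, each key value falls into exactly one chunk, and one shard can own several chunks.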
Data Distribution
Initially a single chunk
Default Max Chunk Size: 64 MB
MongoDB will automatically split and migrate chunks...
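
A toy model of the splitting behaviour, assuming distinct numeric shard key values and counting chunk size in documents instead of bytes. The limit of 4 documents and the split-at-the-median rule are assumptions chosen to keep the sketch short; they are not MongoDB's actual split logic.

```javascript
// Toy model: a chunk holds sorted shard key values; "size" is counted in
// documents rather than bytes to keep the sketch short.
const MAX_DOCS_PER_CHUNK = 4; // stand-in for the 64 MB default

// Insert a key (assumed distinct) and split the chunk if it grows past the limit.
function insertKey(chunkList, key) {
  const idx = chunkList.findIndex(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
  const chunk = chunkList[idx];
  chunk.keys.push(key);
  chunk.keys.sort((a, b) => a - b);
  if (chunk.keys.length > MAX_DOCS_PER_CHUNK) {
    // Split at the median key: [min, split) and [split, max).
    const mid = Math.floor(chunk.keys.length / 2);
    const splitPoint = chunk.keys[mid];
    const left = { min: chunk.min, max: splitPoint, keys: chunk.keys.slice(0, mid) };
    const right = { min: splitPoint, max: chunk.max, keys: chunk.keys.slice(mid) };
    chunkList.splice(idx, 1, left, right);
  }
  return chunkList;
}

// Start with one chunk spanning the whole key space, as on the slide.
const dataChunks = [{ min: null, max: null, keys: [] }];
for (const k of [10, 20, 30, 40, 50, 60, 70]) insertKey(dataChunks, k);
console.log(dataChunks.map((c) => [c.min, c.max, c.keys.length]));
```

After the seven inserts the single initial chunk has become three chunks with boundaries at 30 and 50. Migration of chunks between shards is left out here.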
Shards and Shard Keys
Shards and Shard Keys
Chunks!
Shards and Shard Keys
Chunks!

Shard Keys!
What is a config server?
•

A config server is for storing shard meta-data

•

It stores chunk ranges and locations

•

Run ...
What is a mongos?
•

Acts as a router / balancer for queries and ops

•

No local data (persists all info to the config ser...
MongoDB's Sharding Infrastructure
App Server

Config
Node 1
Secondary
Server

App Server

App Server

Mongos

Mongos

Mongo...
Get Started With Sharding?
1. Choose a shard key (we'll talk about this later)
2. Start config servers
3. Turn on sharding
...
Mechanics of Sharding
Oh hey there devops!
Start the Configuration Server

Config
Node 1

Secondary
Server

mongod --configsvr
Starts a configuration server on the defa...
Start the mongos router

Mongos

Config
Node 1

Secondary
Server

mongos --configdb catconf.mongodb.com:27019
Start the mongod
Mongos

Config
Node 1

Secondary
Server

Shard
Mongod

mongod --shardsvr
Starts a mongod with the default ...
Add the Shard
Mongos

Config
Node 1

Secondary
Server

Shard
Mongod

On mongos:
sh.addShard('cat1.mongodb.com:27018')
For a...
Check that everything is working!
Mongos

Config
Node 1

Secondary
Server

Shard
Mongod

[mongos] admin> db.runCommand({ li...
Now enable sharding
•

Enable Sharding on a database

sh.enableSharding("<dbname>")

•

Shard a collection (with a key):

...
Tag Aware Sharding
•

Total control over the distribution of your data!

•

Tag a range of shard keys:

sh.addTagRange(<co...
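
How a tag range narrows down where a key may live can be sketched as follows. The zip-code ranges, the second shard, and the "SF" tag are hypothetical values invented for the illustration; only sh.addTagRange and sh.addShardTag come from the slide.

```javascript
// Hypothetical tags, mirroring sh.addShardTag("shard0000", "NYC").
const shardTags = { shard0000: ["NYC"], shard0001: ["SF"] };

// Hypothetical tag ranges on a zip-code shard key: min inclusive, max exclusive.
const tagRanges = [
  { min: "10001", max: "10281", tag: "NYC" },
  { min: "94102", max: "94135", tag: "SF" },
];

// Return the shards allowed to hold the given shard key value.
function eligibleShards(key) {
  const range = tagRanges.find((r) => key >= r.min && key < r.max);
  // Keys outside every tag range may live on any shard.
  if (!range) return Object.keys(shardTags);
  return Object.keys(shardTags).filter((s) => shardTags[s].includes(range.tag));
}

console.log(eligibleShards("10012")); // inside the NYC range
console.log(eligibleShards("60601")); // not covered by any tag range
```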
The Balancer

•
•

Transparent to driver and application

•

try to minimize clock skew with ntpd

Ensures even distributi...
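
What "even distribution of chunks" means can be shown with a deliberately simplified loop: keep moving one chunk from the fullest shard to the emptiest one until they are close enough. The threshold of 2 and the chunk counts are assumptions for the example, and the real balancer is more involved than this.

```javascript
// Toy balancer: chunk counts per shard, keyed by shard name.
// The threshold of 2 is an illustrative assumption, not a MongoDB default.
// Assumes at least one shard is present.
function balance(chunkCounts, threshold = 2) {
  const counts = { ...chunkCounts };
  const migrations = [];
  for (;;) {
    const names = Object.keys(counts);
    const most = names.reduce((a, b) => (counts[b] > counts[a] ? b : a));
    const least = names.reduce((a, b) => (counts[b] < counts[a] ? b : a));
    // Stop once the fullest and emptiest shards are close enough.
    if (counts[most] - counts[least] < threshold) break;
    counts[most] -= 1;
    counts[least] += 1;
    migrations.push({ from: most, to: least });
  }
  return { counts, migrations };
}

const result = balance({ shard0000: 9, shard0001: 2, shard0002: 1 });
console.log(result.counts, result.migrations.length);
```

Starting from 9, 2 and 1 chunks, the loop ends with 4 chunks on each shard after 5 migrations, all of them away from shard0000.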
Routing Requests
(Oh hi there application developers!)
Cluster Request Routing

Scatter Gather

Targeted

Choose your own adventure!
Targeted Query

Mongos

Shard

Shard

Shard
Routable request received
1

Mongos

Shard

Shard

Shard
Request routed to appropriate shard
1

Mongos

2

Shard

Shard

Shard
Shard returns results
1

Mongos

2
3

Shard

Shard

Shard
mongos returns results to client
1
4
Mongos

2
3

Shard

Shard

Shard
Non-targeted queries

Mongos

Shard

Shard

Shard
request received
1

Mongos

Shard

Shard

Shard
Farm request out to all shards
1

Mongos

2

Shard

2

Shard

2

Shard
shards return results to mongos
1

Mongos

2
3

Shard

2

2
3

Shard

3

Shard
mongos returns results to client
1
4
Mongos

2
3

Shard

2

2
3

Shard

3

Shard
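
The two request paths above can be condensed into a small routing function. This is a sketch, not the mongos implementation: the chunk table and shard names are hypothetical, and only an exact match on the shard key is treated as targeted here.

```javascript
// Hypothetical chunk table for a collection sharded on "name".
// Ranges are half-open [min, max); null stands in for -infinity / +infinity.
const routingTable = [
  { min: null, max: "g", shard: "shard0000" },
  { min: "g", max: "p", shard: "shard0001" },
  { min: "p", max: null, shard: "shard0002" },
];

// Decide which shards a query has to visit.
function route(table, shardKeyField, query) {
  const value = query[shardKeyField];
  if (typeof value === "string") {
    // The query names the shard key: only the owning shard is contacted.
    const chunk = table.find(
      (c) => (c.min === null || value >= c.min) && (c.max === null || value < c.max)
    );
    return { type: "targeted", shards: [chunk.shard] };
  }
  // No usable shard key in the query: ask every shard and merge the results.
  const allShards = [...new Set(table.map((c) => c.shard))];
  return { type: "scatter gather", shards: allShards };
}

console.log(route(routingTable, "name", { name: "garfield" }));
console.log(route(routingTable, "name", { color: "orange" }));
```

The first query includes the shard key and touches one shard; the second does not and has to be sent to all three.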
Choosing A Shard Key
Things to remember!
•
•

Shard key values are immutable

•

Shard key must be indexed

•

It is limited to 512 bytes in si...
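
The full slide also warns that a shard key should not be monotonically increasing. The simulation below shows why, under stated assumptions: three shards, chunk boundaries at 1000 and 2000, and 32-bit FNV-1a as a stand-in hash (not the hash MongoDB uses for hashed shard keys).

```javascript
// 32-bit FNV-1a, used here only as a stand-in hash function.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const SHARDS = 3;

// Range sharding on a counter-like key: hypothetical chunk boundaries at
// 1000 and 2000, so every key >= 2000 falls in the last chunk.
function shardForRangeKey(id) {
  if (id < 1000) return 0;
  if (id < 2000) return 1;
  return 2;
}

// Hash-based sharding: the 32-bit hash space is cut into equal ranges.
function shardForHashedKey(id) {
  return Math.floor((fnv1a(String(id)) / 2 ** 32) * SHARDS);
}

// Count where 1000 new documents with increasing ids would be written.
function writeCounts(pickShard) {
  const counts = new Array(SHARDS).fill(0);
  for (let id = 3000; id < 4000; id++) counts[pickShard(id)] += 1;
  return counts;
}

console.log("range key: ", writeCounts(shardForRangeKey));
console.log("hashed key:", writeCounts(shardForHashedKey));
```

With the increasing key, all 1000 writes land on the shard that owns the last chunk. With the hashed key they are spread over the shards, at the cost of losing range locality for queries.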
How to choose your key?
•

Cardinality

•

Write Distribution

•

Query Isolation

•

Reliability

•

Index Locality

Card...
Thanks!
•

What's Next?

•

Resources:

https://education.mongodb.com/

https://www.mongodb.com/presentations

•

Me:

@jr...
Sharding in MongoDB Days 2013
Sharding presentation used at the MongoDB Days 2013 conferences in North America: Seattle, Chicago,

Published in: Technology

Sharding in MongoDB Days 2013

  1. 1. #MongoDBDays Chicago Introduction To Sharding J. Randall Hunt
 Hackathoner, MongoDB
 @jrhunt, randall@mongodb.com
  2. 2. In Today's Talk • What? Why? When? • How? • What's happening behind the scenes?
  3. 3. What Is Sharding?
  4. 4. This is a picture of my cat.
  5. 5. This is a picture of ~100 cats. http://a1.s6img.com/cdn/0011/p/3123272_8220815_lz.jpg
  6. 6. This is a cat trying to find a home webserver mongod
  7. 7. 100 cats trying to find a home. webserver (not to scale) mongod
  8. 8. Scale Up?
  9. 9. Data Store Scalability • Custom Hardware • Custom Software In the past you've had two options for achieving data store scalability: 1) custom hardware (Oracle?) 2) custom software (Google, Facebook). The reason these things were custom was that these problems were not yet common enough. The number of people on the internet 10 years ago was incredibly small compared to the number of people using web services 10 years from now.
  10. 10. Scale Out?
  11. 11. Scale Out?
  12. 12. The MongoDB Sharding Solution • Automatically partition your data • Worry about failover at the partition layer • Application independent • Free and open source
  13. 13. Why Do I Shard?
  14. 14. Input/Output Your input/output exceeds the capacity of a single node or replica set. This is not easy to do!
  15. 15. Working Set Exceeds Physical Memory RAM
  16. 16. Working Set Exceeds Physical Memory Data RAM
  17. 17. Working Set Exceeds Physical Memory Data RAM Indexes
  18. 18. Working Set Exceeds Physical Memory Data RAM Sorts Indexes
  19. 19. Working Set Exceeds Physical Memory Data RAM Sorts Indexes Aggregations
  20. 20. Working Set Exceeds Physical Memory Data Indexes RAM Sorts Aggregations
  21. 21. Working Set Exceeds Physical Memory
  22. 22. How Does Sharding Work?
  23. 23. MongoDB's Sharding Infrastructure
  24. 24. MongoDB's Sharding Infrastructure app server mongod
  25. 25. MongoDB's Sharding Infrastructure app server mongod mongod mongod
  26. 26. MongoDB's Sharding Infrastructure app server shard
  27. 27. MongoDB's Sharding Infrastructure app server shard
  28. 28. MongoDB's Sharding Infrastructure app server mongos shard
  29. 29. MongoDB's Sharding Infrastructure app server mongos mongod --configsvr shard
  30. 30. MongoDB's Sharding Infrastructure app server mongos mongod --configsvr shard
  31. 31. Terminology • Shards • Chunks • Config Servers • mongos A shard is a server, or a collection of servers, that holds chunks of data, which are split up according to a shard key; a shard holds a subset of a collection's data. A chunk is a group of data falling in a particular range based on the shard key, and it can be moved logically from server to server. Config servers hold information about where chunks live. mongos is the router and balancer: it communicates with the config servers and figures out how to direct your query intelligently.
  32. 32. What exactly is a shard? • Shard is a node of the cluster • Can be a single mongod or an entire replica set Shard Mongod Shard or Primary Secondary Secondary Now what do shards hold? Chunks, which are partitions of your data that live in certain ranges.
  33. 33. Partitioning • User defines a shard key or uses hash based sharding • Shard key defines a range of data • The key space is like points on a line • A range is a segment of that line -∞ Remember interval notation? Key Space +∞
  34. 34. Data Distribution Initially a single chunk Default Max Chunk Size: 64 MB MongoDB will automatically split and migrate chunks as they reach the max size Mongos Mongos Mongos Config Node 1 Secondary Server Shard 1 Mongod Shard 2
  35. 35. Shards and Shard Keys
  36. 36. Shards and Shard Keys Chunks!
  37. 37. Shards and Shard Keys Chunks! Shard Keys!
  38. 38. What is a config server? • A config server is for storing shard meta-data • It stores chunk ranges and locations • Run with 3 in production! Config Node 1 Secondary Server Config Node 1 Secondary Server or Config Node 1 Secondary Server Config Node 1 Secondary Server This is not a replica set; the three servers are purely for failover purposes. Pro tip: use CNAMEs to identify these.
  39. 39. What is a mongos? • Acts as a router / balancer for queries and ops • No local data (persists all info to the config servers) • Can run with just one or many App Server App Server App Server App Server or Mongos Mongos Mongos
  40. 40. MongoDB's Sharding Infrastructure App Server Config Node 1 Secondary Server App Server App Server Mongos Mongos Mongos Shard Shard Shard Config Node 1 Secondary Server Config Node 1 Secondary Server
  41. 41. Get Started With Sharding? 1. Choose a shard key (we'll talk about this later) 2. Start config servers 3. Turn on sharding 4. Profit.
  42. 42. Mechanics of Sharding Oh hey there devops!
  43. 43. Start the Configuration Server Config Node 1 Secondary Server mongod --configsvr Starts a configuration server on the default port (27019)
  44. 44. Start the mongos router Mongos Config Node 1 Secondary Server mongos --configdb catconf.mongodb.com:27019
  45. 45. Start the mongod Mongos Config Node 1 Secondary Server Shard Mongod mongod --shardsvr Starts a mongod with the default shard port (27018). Shard is not yet connected to the rest of the cluster. Could have already been a part of the cluster.
  46. 46. Add the Shard Mongos Config Node 1 Secondary Server Shard Mongod On mongos: sh.addShard('cat1.mongodb.com:27018') For a replica set: sh.addShard('<rsname>/<seedlist>')
  47. 47. Check that everything is working! Mongos Config Node 1 Secondary Server Shard Mongod [mongos] admin> db.runCommand({ listshards: 1 }) { "shards": [ { "_id": "shard0000", "host": "cat1.mongodb.com:27018" } ], "ok": 1 }
  48. 48. Now enable sharding • Enable Sharding on a database
 sh.enableSharding("<dbname>") • Shard a collection (with a key):
 sh.shardCollection(
 "<dbname>.cat",
 {"name": 1}) • Use a compound shard key to prevent duplicates
 sh.shardCollection(
 "<dbname>.cats",
 {"name": 1, "uniqueid": 1})
  49. 49. Tag Aware Sharding • Total control over the distribution of your data! • Tag a range of shard keys:
 sh.addTagRange(<collection>,<min>,<max>,<tag>) • Tag a shard:
 sh.addShardTag("shard0000","NYC")

  50. 50. The Balancer • Ensures even distribution of chunks across the cluster • Transparent to driver and application • Very tunable but defaults are often sensible Try to minimize clock skew with ntpd.
  51. 51. Routing Requests (Oh hi there application developers!)
  52. 52. Cluster Request Routing Scatter Gather Targeted Choose your own adventure!
  53. 53. Targeted Query Mongos Shard Shard Shard
  54. 54. Routable request received 1 Mongos Shard Shard Shard
  55. 55. Request routed to appropriate shard 1 Mongos 2 Shard Shard Shard
  56. 56. Shard returns results 1 Mongos 2 3 Shard Shard Shard
  57. 57. mongos returns results to client 1 4 Mongos 2 3 Shard Shard Shard
  58. 58. Non-targeted queries Mongos Shard Shard Shard
  59. 59. request received 1 Mongos Shard Shard Shard
  60. 60. Farm request out to all shards 1 Mongos 2 Shard 2 Shard 2 Shard
  61. 61. shards return results to mongos 1 Mongos 2 3 Shard 2 2 3 Shard 3 Shard
  62. 62. mongos returns results to client 1 4 Mongos 2 3 Shard 2 2 3 Shard 3 Shard
  63. 63. Choosing A Shard Key
  64. 64. Things to remember! • Shard key values are immutable • Shard key must be indexed • It is limited to 512 bytes in size • Try to choose a field used in queries • Should not be monotonically increasing! Shard key is immutable. Only the shard key can be guaranteed unique across shards.
  65. 65. How to choose your key? • Cardinality • Write Distribution • Query Isolation • Reliability • Index Locality Cardinality: can your data be broken down enough? Query isolation: query targeting to a specific shard. Reliability: shard outages.
 A good shard key can: optimize routing, minimize (unnecessary) traffic, and allow best scaling.
 Consider pre-splitting. No unique index keys unless part of the shard key. Geo keys cannot be part of a shard key; $near won't work, but the $geo commands work fine.
  66. 66. Thanks! • What's Next? • Resources:
 https://education.mongodb.com/
 https://www.mongodb.com/presentations • Me:
 @jrhunt, randall@mongodb.com In summary, and this is not a sales pitch: lots of other databases out there have sharding and replication, but not many of them provide the granularity of control that you need for your applications while maintaining sensible defaults.