#MongoDBDays

Introduction to Sharding
Craig Wilson
Software Engineer, MongoDB
@craiggwilson
Sharding is a solution for scalability
Examining Growth
•  User Growth
–  1995: 0.4% of the world’s population
–  Today: 30% of the world is online (~2.2B)
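–  Emerging Markets & Mobile
•  Data Set Growth
–  Facebook’s data set is around 100 petabytes
–  4 billion photos taken in the last year (4x a decade ago)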
Do you need to Shard?
Read/Write Throughput Exceeds I/O
Working Set Exceeds Physical Memory
Sharding in MongoDB
Horizontally Scalable
Application Independent
One API
What is a Shard?
Replica Set
[Diagram: one Primary and two Secondaries]
Single Node in a Cluster
[Diagram: three shards, each a replica set with one primary (P) and two secondaries (S)]
Composed of Chunks
•  Grouping of data based on a range
•  Default Max Size: 64 MB
Chunks Have Ranges
[Diagram: chunks labeled by key range: A-B, M, S-Z]
Chunks Get Split
[Diagram: the S-Z chunk has split into S-V and W-Z; A-B and M are unchanged]
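A minimal sketch of the split on this slide, in plain JavaScript rather than MongoDB's own implementation. The function name `splitChunk` and the chunk shape `{ min, max }` are hypothetical, ranges are assumed to be half-open (`[min, max)`), and the split point is supplied by the caller instead of being chosen from the data.

```javascript
// Hypothetical chunk shape: { min, max } covering the half-open range [min, max).
function splitChunk(chunk, splitPoint) {
  if (!(chunk.min < splitPoint && splitPoint < chunk.max)) {
    throw new RangeError("split point must fall strictly inside the chunk");
  }
  // Both halves share the split point as a boundary, so they cannot overlap.
  return [
    { min: chunk.min, max: splitPoint },
    { min: splitPoint, max: chunk.max },
  ];
}

// Mirrors the slide: the S-Z chunk becomes S-V and W-Z.
// "[" is the character right after "Z", used here as an exclusive upper bound.
const [left, right] = splitChunk({ min: "S", max: "[" }, "W");
console.log(left, right);
```

Sharing the boundary value between the two halves is what keeps the key space fully covered after a split: every key that belonged to the old chunk belongs to exactly one of the new ones.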
Chunks Get Migrated
•  One shard has 7 more chunks than another
•  Triggered manually
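One way to picture the imbalance check, sketched in plain JavaScript. The threshold of 7 comes from the slide; the function name `pickMigration` and the shard-name-to-chunk-count input are hypothetical, and real thresholds and policy depend on the MongoDB version. The sketch only decides whether a migration is warranted and between which shards; how it gets triggered is outside its scope.

```javascript
// Threshold from the slide: a migration is considered once one shard
// holds 7 more chunks than another. Treat the exact value as an assumption.
const MIGRATION_THRESHOLD = 7;

// chunkCounts is a hypothetical map of shard name -> number of chunks.
function pickMigration(chunkCounts, threshold = MIGRATION_THRESHOLD) {
  const entries = Object.entries(chunkCounts);
  if (entries.length < 2) return null;
  entries.sort((a, b) => a[1] - b[1]);
  const [toShard, fewest] = entries[0];
  const [fromShard, most] = entries[entries.length - 1];
  // Below the threshold the cluster is treated as balanced enough.
  if (most - fewest < threshold) return null;
  return { from: fromShard, to: toShard };
}

console.log(pickMigration({ shard0: 12, shard1: 4, shard2: 6 })); // { from: 'shard0', to: 'shard1' }
console.log(pickMigration({ shard0: 8, shard1: 4, shard2: 6 }));  // null
```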
How does it all work?
Configuration
•  3 Config Servers
–  Just mongod
–  Stores chunk ranges and location
–  Not a replica set
Routers
•  Mongos
–  Both a router and a balancer
–  No local data
–  Can have 1 or many

Cluster
[Diagram: two applications connect to two mongos routers, backed by three config servers and three shards, each shard a replica set with one primary (P) and two secondaries (S)]
Query Routing
Shard Key
•  Defines the range of data called a Key Space
•  Defines the distribution of documents in a collection
•  Every document must contain the Shard Key
•  Shard Keys are immutable
Chunks
•  Each chunk contains a non-overlapping range of Shard Key values
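To make the non-overlapping property concrete, here is a small plain-JavaScript sketch of looking up the chunk that owns a shard key value. The chunk table, the shard names, and `findChunk` are all hypothetical, and ranges are assumed to be half-open (`[min, max)`).

```javascript
// Hypothetical chunk table, sorted by min; each range is [min, max).
const chunkTable = [
  { min: "A", max: "M", shard: "shard0" },
  { min: "M", max: "S", shard: "shard1" },
  { min: "S", max: "[", shard: "shard2" }, // "[" sorts just after "Z"
];

// Because ranges never overlap, at most one chunk can match a key.
function findChunk(table, key) {
  return table.find((c) => c.min <= key && key < c.max) || null;
}

console.log(findChunk(chunkTable, "Mongo").shard); // shard1
console.log(findChunk(chunkTable, "Zed").shard);   // shard2
console.log(findChunk(chunkTable, "123"));         // null: outside every range
```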
3 Types of Queries
•  Targeted Queries
•  Scatter Gather Queries
•  Scatter Gather Queries with Sorting
Targeted Queries
•  Query contains the shard key
[Diagram: Mongos in front of three shards, each with one primary (P) and two secondaries (S)]
Scatter Gather Queries
•  Query does not contain the shard key
[Diagram: Mongos in front of three shards, each with one primary (P) and two secondaries (S)]
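The difference between a targeted and a scatter gather query can be sketched in plain JavaScript. This is an illustration under assumptions, not how mongos is implemented: the routing table, the shard key field `user`, and the function `targetShards` are hypothetical, and only equality queries are handled.

```javascript
// Hypothetical routing table keyed on the shard key field "user".
const SHARD_KEY = "user";
const routingTable = [
  { min: 0, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" },
];

// Returns the shards a router would have to contact for an equality query.
function targetShards(query) {
  if (Object.prototype.hasOwnProperty.call(query, SHARD_KEY)) {
    const value = query[SHARD_KEY];
    // Targeted: only the shard whose chunk range contains the value.
    return routingTable
      .filter((c) => c.min <= value && value < c.max)
      .map((c) => c.shard);
  }
  // Scatter gather: no shard key in the query, so every shard must be asked.
  return [...new Set(routingTable.map((c) => c.shard))];
}

console.log(targetShards({ user: 123 }));        // [ 'shard1' ]
console.log(targetShards({ subject: "hello" })); // [ 'shard0', 'shard1', 'shard2' ]
```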
Scatter Gather Queries with Sort
•  Query does not contain the shard key
•  Sorting is done first on the Shard
•  Results are merged in Mongos
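The sort-then-merge step can be sketched as a k-way merge in plain JavaScript. `mergeSorted` is a hypothetical stand-in for the router's merge, and the sketch assumes each shard has already returned its results sorted ascending on the same field.

```javascript
// Each shard returns its own results already sorted by the sort field.
// mergeSorted is a hypothetical stand-in for the router's merge step.
function mergeSorted(shardResults, field) {
  const cursors = shardResults.map(() => 0); // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (cursors[i] >= shardResults[i].length) continue; // shard exhausted
      if (
        best === -1 ||
        shardResults[i][cursors[i]][field] <
          shardResults[best][cursors[best]][field]
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every shard exhausted
    merged.push(shardResults[best][cursors[best]++]);
  }
}

const perShard = [
  [{ time: 1 }, { time: 5 }],
  [{ time: 2 }, { time: 3 }],
  [{ time: 4 }],
];
console.log(mergeSorted(perShard, "time").map((d) => d.time)); // [ 1, 2, 3, 4, 5 ]
```

Because every shard's list is already sorted, the router only ever compares the next unread document from each shard, so it never has to re-sort the combined result.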
How do I pick a good Shard Key?
Considerations
•  Cardinality
•  Write Distribution
•  Query Isolation
•  Reliability
•  Index Locality
Example: Email Storage
> db.emails.find({ user: 123 })
{
  _id: ObjectId(),
  user: 123,
  time: Date(),
  subject: "...",
  recipients: [],
  body: "...",
  attachments: []
}
Example: Email Storage
Shard Key    Cardinality   Write Scaling   Query Isolation   Reliability           Index Locality
_id          Doc level     One shard       Scatter/gather    All users affected    Good
hash(_id)    Hash level    All Shards      Scatter/gather    All users affected    Poor
user         Many docs     All Shards      Targeted          Some users affected   Good
user, time   Doc level     All Shards      Targeted          Some users affected   Good
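A rough illustration of why the comparison lists "One shard" for writes on `_id` but "All Shards" on `hash(_id)`. Everything here is an assumption made for the sketch: an integer counter stands in for an ever-increasing `_id`, FNV-1a stands in for the hash (it is not the function MongoDB uses for hashed keys), and shards are picked by modulo instead of by chunk ranges over hash values.

```javascript
const SHARD_COUNT = 3;

// Ranged key: three hypothetical chunks over an increasing counter.
// New, larger ids always fall into the last (open-ended) chunk.
function rangedShard(id) {
  if (id < 1000) return 0;
  if (id < 2000) return 1;
  return 2;
}

// FNV-1a (32-bit), a stand-in hash for illustration only.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function hashedShard(id) {
  return fnv1a(String(id)) % SHARD_COUNT;
}

// Count where 3000 fresh inserts (ids 2000..4999) would land.
function distribution(pickShard) {
  const counts = new Array(SHARD_COUNT).fill(0);
  for (let id = 2000; id < 5000; id++) counts[pickShard(id)]++;
  return counts;
}

console.log("ranged:", distribution(rangedShard)); // [ 0, 0, 3000 ]
console.log("hashed:", distribution(hashedShard));
```

With the ranged key every new insert lands on the shard holding the highest range, while the hashed key scatters neighbouring ids across shards. That same scattering is why the comparison rates index locality for `hash(_id)` as poor.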
How do I get up and running?
5 Steps
•  Launch Config Servers
•  Launch Mongos
•  Launch Shards
•  Add Shards
•  Enable Sharding
Launch Config Servers
•  mongod --configsvr
•  Starts 1 config server on the default port 27019
[Diagram: three config servers]
Launch Mongos
•  mongos --configdb hostname:27019,hostname2:27019,hostname3:27019
[Diagram: Mongos alongside the three config servers]
Launch Shards
•  Nothing special, just like a normal replica set
[Diagram: one shard (a primary and two secondaries) alongside Mongos and the three config servers]
Add Shards
•  Connect to mongos via the shell
•  sh.addShard("<rsname>/<seedlist>")
[Diagram: one shard (a primary and two secondaries) alongside Mongos and the three config servers]
Verify that the shard was added
db.runCommand({ listShards: 1 })
{
  shards: [
    { _id: "shard0000", host: "<hostname>:27017" }
  ],
  "ok": 1
}
Enable Sharding
•  Enable sharding on a database
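–  sh.enableSharding("<dbname>")
•  Shard a collection with the given key
–  sh.shardCollection("<dbname>.people", { country: 1 })
–  sh.shardCollection("<dbname>.cars", { year: 1, uniqueid: 1 })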
Tag Aware Sharding
•  Tag aware sharding allows you to control the distribution of your data
•  Tag a range of shard keys
–  sh.addTagRange(<collection>, <min>, <max>, <tag>)
•  Tag a shard
–  sh.addShardTag(<shard>, <tag>)
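A rough model of what tagging sets up, in plain JavaScript. The tag names, key ranges, and shard names are made up for illustration, and `shardsForKey` is a hypothetical helper, not part of the MongoDB shell. The sketch assumes half-open ranges and that keys outside every tagged range may live on any shard.

```javascript
// Hypothetical tag ranges over a "country" shard key, half-open [min, max).
const tagRanges = [
  { min: "A", max: "N", tag: "EAST" },
  { min: "N", max: "[", tag: "WEST" }, // "[" sorts just after "Z"
];

// Hypothetical shard tags: which tags each shard carries.
const shardTags = {
  shard0: ["EAST"],
  shard1: ["WEST"],
  shard2: ["WEST"],
};

// Shards eligible to hold a key: those carrying the tag of the key's range.
function shardsForKey(key) {
  const range = tagRanges.find((r) => r.min <= key && key < r.max);
  if (!range) return Object.keys(shardTags); // untagged range: any shard
  return Object.keys(shardTags).filter((s) => shardTags[s].includes(range.tag));
}

console.log(shardsForKey("Canada")); // [ 'shard0' ]
console.log(shardsForKey("Spain"));  // [ 'shard1', 'shard2' ]
```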
Conclusion
Read/Write Throughput Exceeds I/O
Working Set Exceeds Physical Memory
Sharding Enables Scale
MongoDB’s Auto-Sharding
–  Easy to Configure
–  Consistent Interface
–  Free and Open Source
#MongoDBDays

Thank You
Craig Wilson
Software Engineer, MongoDB
@craiggwilson
Published in: Technology
Introduction to sharding

  1. #MongoDBDays Introduction to Sharding Craig Wilson Software Engineer, MongoDB @craiggwilson
  2. Sharding is a solution for scalability
  3. Examining Growth •  User Growth –  1995: 0.4% of the world’s population –  Today: 30% of the world is online (~2.2B) –  Emerging Markets & Mobile •  Data Set Growth –  Facebook’s data set is around 100 petabytes –  4 billion photos taken in the last year (4x a decade ago)
  4. Do you need to Shard?
  5. Read/Write Throughput Exceeds I/O
  6. Working Set Exceeds Physical Memory
  7. Sharding in MongoDB
  8. Horizontally Scalable
  9. Application Independent
  10. One API
  11. What is a Shard?
  12. Replica Set [Diagram: one Primary and two Secondaries]
  13. Single Node in a Cluster [Diagram: three shards, each with one primary (P) and two secondaries (S)]
  14. Composed of Chunks •  Grouping of data based on a range •  Default Max Size: 64 MB
  15. Chunks Have Ranges [Diagram: chunks A-B, M, S-Z]
  16. Chunks Get Split [Diagram: chunks A-B, M, S-V, W-Z]
  17. Chunks Get Migrated •  One shard has 7 more chunks than another •  Triggered manually
  18. Chunks Get Migrated •  One shard has 7 more chunks than another •  Triggered manually
  19. Chunks Get Migrated •  One shard has 7 more chunks than another •  Triggered manually
  20. How does it all work?
  21. Configuration •  3 Config Servers –  Just mongod –  Stores chunk ranges and location –  Not a replica set
  22. Routers •  Mongos –  Both a router and a balancer –  No local data –  Can have 1 or many
  23. Cluster [Diagram: two applications, two mongos routers, three config servers, and three shards, each with one primary (P) and two secondaries (S)]
  24. Query Routing
  25. Shard Key •  Defines the range of data called a Key Space •  Defines the distribution of documents in a collection •  Every document must contain the Shard Key •  Shard Keys are immutable
  26. Chunks •  Each chunk contains a non-overlapping range of Shard Key values
  27. 3 Types of Queries •  Targeted Queries •  Scatter Gather Queries •  Scatter Gather Queries with Sorting
  28. Targeted Queries •  Query contains the shard key [Diagram: Mongos in front of three shards]
  29. Scatter Gather Queries •  Query does not contain the shard key [Diagram: Mongos in front of three shards]
  30. Scatter Gather Queries with Sort •  Query does not contain the shard key •  Sorting is done first on the Shard •  Results are merged in Mongos [Diagram: Mongos in front of three shards]
  31. How do I pick a good Shard Key?
  32. Considerations •  Cardinality •  Write Distribution •  Query Isolation •  Reliability •  Index Locality
  33. Example: Email Storage > db.emails.find({ user: 123 }) { _id: ObjectId(), user: 123, time: Date(), subject: "...", recipients: [], body: "...", attachments: [] }
  34. Example: Email Storage (table columns) Cardinality, Write Scaling, Query Isolation, Reliability, Index Locality
  35. Example: Email Storage •  _id: Cardinality Doc level; Write Scaling One shard; Query Isolation Scatter/gather; Reliability All users affected; Index Locality Good
  36. Example: Email Storage •  _id: Cardinality Doc level; Write Scaling One shard; Query Isolation Scatter/gather; Reliability All users affected; Index Locality Good •  hash(_id): Cardinality Hash level; Write Scaling All Shards; Query Isolation Scatter/gather; Reliability All users affected; Index Locality Poor
  37. Example: Email Storage •  _id: Cardinality Doc level; Write Scaling One shard; Query Isolation Scatter/gather; Reliability All users affected; Index Locality Good •  hash(_id): Cardinality Hash level; Write Scaling All Shards; Query Isolation Scatter/gather; Reliability All users affected; Index Locality Poor •  user: Cardinality Many docs; Write Scaling All Shards; Query Isolation Targeted; Reliability Some users affected; Index Locality Good
  38. Example: Email Storage •  _id: Cardinality Doc level; Write Scaling One shard; Query Isolation Scatter/gather; Reliability All users affected; Index Locality Good •  hash(_id): Cardinality Hash level; Write Scaling All Shards; Query Isolation Scatter/gather; Reliability All users affected; Index Locality Poor •  user: Cardinality Many docs; Write Scaling All Shards; Query Isolation Targeted; Reliability Some users affected; Index Locality Good •  user, time: Cardinality Doc level; Write Scaling All Shards; Query Isolation Targeted; Reliability Some users affected; Index Locality Good
  39. How do I get up and running?
  40. 5 Steps •  Launch Config Servers •  Launch Mongos •  Launch Shards •  Add Shards •  Enable Sharding
  41. Launch Config Servers •  mongod --configsvr •  Starts 1 config server on the default port 27019
  42. Launch Mongos •  mongos --configdb hostname:27019,hostname2:27019,hostname3:27019
  43. Launch Shards •  Nothing special, just like a normal replica set
  44. Add Shards •  Connect to mongos via the shell •  sh.addShard("<rsname>/<seedlist>")
  45. Verify that the shard was added db.runCommand({ listShards: 1 }) { shards: [ { _id: "shard0000", host: "<hostname>:27017" } ], "ok": 1 }
  46. Enable Sharding •  Enable sharding on a database –  sh.enableSharding("<dbname>") •  Shard a collection with the given key –  sh.shardCollection("<dbname>.people", { country: 1 }) –  sh.shardCollection("<dbname>.cars", { year: 1, uniqueid: 1 })
  47. Tag Aware Sharding •  Tag aware sharding allows you to control the distribution of your data •  Tag a range of shard keys –  sh.addTagRange(<collection>, <min>, <max>, <tag>) •  Tag a shard –  sh.addShardTag(<shard>, <tag>)
  48. Conclusion
  49. Read/Write Throughput Exceeds I/O
  50. Working Set Exceeds Physical Memory
  51. Sharding Enables Scale MongoDB’s Auto-Sharding –  Easy to Configure –  Consistent Interface –  Free and Open Source
  52. #MongoDBDays Thank You Craig Wilson Software Engineer, MongoDB @craiggwilson