Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Шардинг в MongoDB, Henrik Ingo (MongoDB)

3,109 views

Published on

Доклад Хенрика Инго на HighLoad++ 2014.

Published in: Internet
  • Be the first to comment

Шардинг в MongoDB, Henrik Ingo (MongoDB)

  1. 1. Solutions Architect, MongoDB Henrik Ingo @h_ingo Introduction to Sharding
  2. 2. Agenda •Scaling Data •MongoDB'sApproach •Architecture •Configuration •Mechanics
  3. 3. MongoDB's Approach to Sharding
  4. 4. Partitioning • User defines shard key • Shard key defines range of data • Key space is like points on a line • Range is a segment of that line
  5. 5. Initially 1 chunk Default max chunk size: 64mb MongoDB automatically splits & migrates chunks when max reached Data Distribution
  6. 6. Queries routed to specific shards MongoDB balances cluster MongoDB migrates data to new nodes Routing and Balancing
  7. 7. MongoDB Auto-Sharding • Minimal effort required – Same interface as single mongod • Two steps – Enable Sharding for a database – Shard collection within database
  8. 8. Architecture
  9. 9. What is a Shard? • Shard is a node of the cluster • Shard can be a single mongod or a replica set
  10. 10. Sharding infrastructure
  11. 11. Configuration
  12. 12. Example Cluster
  13. 13. mongod --configsvr Starts a configuration server on the default port (27019) Starting the Configuration Server
  14. 14. mongos --configdb <hostname>:27019 For 3 configuration servers: mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3> This is always how to start a new mongos, even if the cluster is already running Start the mongos Router
  15. 15. mongod --shardsvr Starts a mongodwith the default shard port (27018) Shard is not yet connected to the rest of the cluster Shard may have already been running in production Start the shard database
  16. 16. On mongos: – sh.addShard(‘<host>:27018’) Adding a replica set: – sh.addShard(‘<rsname>/<seedlist>’) Add the Shard
  17. 17. db.runCommand({ listshards:1 }) { "shards" : [{"_id”: "shard0000”,"host”: ”<hostname>:27018” } ], "ok" : 1 } Verify that the shard was added
  18. 18. Enabling Sharding • Enable sharding on a database sh.enableSharding(“<dbname>”) • Shard a collection with the given key sh.shardCollection(“<dbname>.people”,{“country”:1}) • Use a compound shard key to prevent duplicates sh.shardCollection(“<dbname>.cars”,{“year”:1, ”uniqueid”:1})
  19. 19. Tag Aware Sharding • Tag aware sharding allows you to control the distribution of your data • Tag a range of shard keys – sh.addTagRange(<collection>,<min>,<max>,<tag>) • Tag a shard – sh.addShardTag(<shard>,<tag>)
  20. 20. Routing Requests
  21. 21. Cluster Request Routing • Targeted Queries • Scatter Gather Queries • Scatter Gather Queries with Sort
  22. 22. Cluster Request Routing: Targeted Query
  23. 23. Routable request received
  24. 24. Request routed to appropriate shard
  25. 25. Shard returns results
  26. 26. Mongos returns results to client
  27. 27. Cluster Request Routing: Non-Targeted Query
  28. 28. Non-Targeted Request Received
  29. 29. Request sent to all shards
  30. 30. Shards return results to mongos
  31. 31. Mongos returns results to client
  32. 32. Cluster Request Routing: Non-Targeted Query with Sort
  33. 33. Non-Targeted request with sort received
  34. 34. Request sent to all shards
  35. 35. Query and sort performed locally
  36. 36. Shards return results to mongos
  37. 37. Mongos merges sorted results
  38. 38. Mongos returns results to client
  39. 39. Shard Key
  40. 40. Shard Key • Shard key is immutable • Shard key values are immutable • Shard key must be indexed • Shard key limited to 512 bytes in size • Shard key used to route queries – Choose a field commonly used in queries • Only shard key can be unique across shards – `_id` field is only unique within individual shard
  41. 41. Shard Key Considerations • Cardinality • Write Distribution • Query Isolation • Reliability • Index Locality
  42. 42. { _id: ObjectId(), user: 123, time: Date(), subject: “...”, recipients: [], body: “...”, attachments: [] } Example: Email Storage Most common scenario, can be applied to 90% cases Each document can be up to 16MB Each user may have GBs of storage Most common query: get user emails sorted by time Indexes on {_id}, {user, time}, {recipients}
  43. 43. Example: Email Storage Cardinality Write Scaling Query Isolation Reliability Index Locality
  44. 44. Example: Email Storage Cardinality Write Scaling Query Isolation Reliability Index Locality _id Doc level One shard Scatter/gat her All users affected Good
  45. 45. Example: Email Storage Cardinality Write Scaling Query Isolation Reliability Index Locality _id Doc level One shard Scatter/gat her All users affected Good hash(_id) Hash level All Shards Scatter/gat her All users affected Poor
  46. 46. Example: Email Storage Cardinality Write Scaling Query Isolation Reliability Index Locality _id Doc level One shard Scatter/gat her All users affected Good hash(_id) Hash level All Shards Scatter/gat her All users affected Poor user Many docs All Shards Targeted Some users affected Good
  47. 47. Example: Email Storage Cardinality Write Scaling Query Isolation Reliability Index Locality _id Doc level One shard Scatter/gat her All users affected Good hash(_id) Hash level All Shards Scatter/gat her All users affected Poor user Many docs All Shards Targeted Some users affected Good user, time Doc level All Shards Targeted Some users affected Good
  48. 48. @h_ingo Questions?
  49. 49. Solutions Architect, MongoDB Henrik Ingo @h_ingo Thank you!

×