Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Black, 10gen


Published on

Sharding allows you to distribute load across multiple servers and keep your data balanced across those servers. This session will review MongoDB’s sharding support, including an architectural overview, design principles, and automation.

Published in: Technology
  • Be the first to comment

MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Black, 10gen

  1. 1. Software Engineer, 10gen@brandonmblackBrandon Black#MongoDBDaysIntroduction toSharding
  2. 2. Agenda• Scaling Data• MongoDBs Approach• Architecture• Configuration• Mechanics
  3. 3. Scaling Data
  4. 4. Examining Growth• User Growth– 1995: 0.4% of the world‟s population– Today: 30% of the world is online (~2.2B)– Emerging Markets & Mobile• Data Set Growth– Facebook‟s data set is around 100 petabytes– 4 billion photos taken in the last year (4x a decade ago)
  5. 5. Read/Write Throughput Exceeds I/O
  6. 6. Working Set Exceeds PhysicalMemory
  7. 7. Vertical Scalability (Scale Up)
  8. 8. Horizontal Scalability (Scale Out)
  9. 9. Data Store Scalability• Custom Hardware– Oracle• Custom Software– Facebook + MySQL– Google
  10. 10. Data Store Scalability Today• MongoDB Auto-Sharding• A data store that is– 100% Free– Publicly available– Open-source (– Horizontally scalable– Application independent
  11. 11. MongoDBs Approachto Sharding
  12. 12. Partitioning• User defines shard key• Shard key defines range of data• Key space is like points on a line• Range is a segment of that line
  13. 13. Data Distribution• Initially 1 chunk• Default max chunk size: 64mb• MongoDB automatically splits & migrates chunkswhen max reached
  14. 14. Routing and Balancing• Queries routed to specificshards• MongoDB balances cluster• MongoDB migrates data tonew nodes
  15. 15. MongoDB Auto-Sharding• Minimal effort required– Same interface as single mongod• Two steps– Enable Sharding for a database– Shard collection within database
  16. 16. Architecture
  17. 17. What is a Shard?• Shard is a node of the cluster• Shard can be a single mongod or a replica set
  18. 18. • Config Server– Stores cluster chunk ranges and locations– Can have only 1 or 3 (production must have 3)– Not a replica setMeta Data Storage
  19. 19. Routing and Managing Data• Mongos– Acts as a router / balancer– No local data (persists to config database)– Can have 1 or many
  20. 20. Sharding infrastructure
  21. 21. Configuration
  22. 22. Example Cluster• Don’t use this setup in production!- Only one Config server (No FaultTolerance)- Shard not in a replica set (LowAvailability)- Only one mongos and shard (No Performance Improvement)- Useful for developmentor demonstrating configuration mechanics
  23. 23. Starting the Configuration Server• mongod --configsvr• Starts a configuration server on the default port (27019)
  24. 24. Start the mongos Router• mongos --configdb <hostname>:27019• For 3 configuration servers:mongos--configdb<host1>:<port1>,<host2>:<port2>,<host3>:<port3>• This is always how to start a new mongos, even if the clusteris already running
  25. 25. Start the shard database• mongod --shardsvr• Starts a mongod with the default shard port (27018)• Shard is not yet connected to the rest of the cluster• Shard may have already been running in production
  26. 26. Add the Shard• On mongos:- sh.addShard(„<host>:27018‟)• Adding a replica set:- sh.addShard(„<rsname>/<seedlist>‟)
  27. 27. Verify that the shard was added• db.runCommand({ listshards:1 }){ "shards" :[{"_id”:"shard0000”,"host”:”<hostname>:27018”} ],"ok" : 1}
  28. 28. Enabling Sharding• Enable sharding on a databasesh.enableSharding(“<dbname>”)• Shard a collection with the given keysh.shardCollection(“<dbname>.people”,{“country”:1})• Use a compound shard key to prevent duplicatessh.shardCollection(“<dbname>.cars”,{“year”:1,”uniqueid”:1})
  29. 29. Tag Aware Sharding• Tag aware sharding allows you to control thedistribution of your data• Tag a range of shard keyssh.addTagRange(<collection>,<min>,<max>,<tag>)• Tag a shardsh.addShardTag(<shard>,<tag>)
  30. 30. Mechanics
  31. 31. Partitioning• Remember its based on ranges
  32. 32. Chunk is a section of the entire range
  33. 33. Chunk splitting• Achunk is split once it exceeds the maximum size• There is no split point if all documents have the same shardkey• Chunk split is a logical operation (no data is moved)
  34. 34. Balancing• Balancer is running on mongos• Once the difference in chunks between the most dense shardand the least dense shard is above the migration threshold, abalancing round starts
  35. 35. Acquiring the Balancer Lock• The balancer on mongos takes out a “balancer lock”• To see the status of these locks:use configdb.locks.find({_id: “balancer”})
  36. 36. Moving the chunk• The mongos sends a moveChunk command to source shard• The source shard then notifies destination shard• Destination shard starts pulling documents from source shard
  37. 37. Committing Migration• When complete, destination shard updates config server- Provides new locationsof the chunks
  38. 38. Cleanup• Source shard deletes moved data- Must wait for open cursors to either close or time out- NoTimeoutcursors may preventthe release of the lock• The mongos releases the balancer lock after old chunks aredeleted
  39. 39. Routing Requests
  40. 40. Cluster Request Routing• Targeted Queries• Scatter Gather Queries• Scatter Gather Queries with Sort
  41. 41. Cluster Request Routing: TargetedQuery
  42. 42. Routable request received
  43. 43. Request routed to appropriate shard
  44. 44. Shard returns results
  45. 45. Mongos returns results to client
  46. 46. Cluster Request Routing: Non-TargetedQuery
  47. 47. Non-Targeted Request Received
  48. 48. Request sent to all shards
  49. 49. Shards return results to mongos
  50. 50. Mongos returns results to client
  51. 51. Cluster Request Routing: Non-TargetedQuery with Sort
  52. 52. Non-Targeted request with sortreceived
  53. 53. Request sent to all shards
  54. 54. Query and sort performed locally
  55. 55. Shards return results to mongos
  56. 56. Mongos merges sorted results
  57. 57. Mongos returns results to client
  58. 58. Shard Key
  59. 59. Shard Key• Shard key is immutable• Shard key values are immutable• Shard key must be indexed• Shard key limited to 512 bytes in size• Shard key used to route queries– Choose a field commonly used in queries• Only shard key can be unique across shards– `_id` field is only unique within individual shard
  60. 60. Shard Key Considerations• Cardinality• Write Distribution• Query Isolation• Reliability• Index Locality
  61. 61. Conclusion
  62. 62. Read/Write Throughput Exceeds I/O
  63. 63. Working Set Exceeds PhysicalMemory
  64. 64. Sharding Enables Scale• MongoDB‟sAuto-Sharding– Easy to Configure– Consistent Interface– Free and Open-Source
  65. 65. • What‟s next?– Hash-Based Sharding in MongoDB 2.4 (2:50pm)– Webinar: Indexing and Query Optimization (May 22nd)– Online Education Program– MongoDB User Group• Resources
  66. 66. Software Engineer, 10gen@brandonmblackBrandon Black#MongoDBDaysThank You