Sharding Overview



  1. 1. #MongoDBdays: Sharding. Edouard Servan-Schreiber, Ph.D., Director of Solution Architecture, 10gen
  2. 2. [Figure: visual representation of vertical scaling] 1970-2000: Vertical scalability (scale up)
  3. 3. [Figure: visual representation of horizontal scaling] Google, ~2000: Horizontal scalability (scale out)
  4. 4. Why shard?
  5. 5. Working Set Exceeds Physical Memory
  6. 6. Read/Write Throughput Exceeds I/O
  7. 7. Partition data based on ranges
      • User defines shard key
      • Shard key defines range of data
      • Key space is like points on a line
      • Range is a segment of that line
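
The "points on a line" picture can be sketched in a few lines of plain JavaScript. This is only an illustration: the `chunks` table, the `findChunk` helper, the shard names, and the boundary values are hypothetical and are not MongoDB APIs. Each range is treated as half-open, `[min, max)`.

```javascript
// Hypothetical range table: each entry owns the half-open key range [min, max).
// The shard names and boundaries are made up for illustration.
const chunks = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardB" },
  { min: 200, max: Infinity, shard: "shardC" },
];

// Return the entry whose segment of the key line contains the given key.
function findChunk(chunkTable, key) {
  return chunkTable.find((c) => key >= c.min && key < c.max);
}

console.log(findChunk(chunks, 42).shard);  // shardA
console.log(findChunk(chunks, 100).shard); // shardB (lower bound is inclusive)
console.log(findChunk(chunks, 999).shard); // shardC
```

Because the segments cover the whole line without overlapping, every key value maps to exactly one range.
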
  8. 8. Distribute data in chunks across nodes
      • Initially 1 chunk
      • Default max chunk size: 64 MB
      • MongoDB automatically splits and migrates chunks when the maximum size is reached
  9. 9. MongoDB manages data access
      • Queries routed to specific shards
      • MongoDB balances cluster
      • MongoDB migrates data to new nodes
  10. 10. Architecture
  11. 11. Data stored in shard
      • Shard is a node of the cluster
      • Shard can be a single mongod or a replica set
  12. 12. Config server stores metadata
      • Config server
          – Stores cluster chunk ranges and locations
          – Can have only 1 or 3 (production must have 3)
          – Uses two-phase commit (not a replica set)
  13. 13. Mongos manages the data routing
      • Mongos
          – Acts as a router / balancer
          – No local data (persists to config database)
          – Can have 1 or many
  14. 14. Sharding infrastructure
  15. 15. Mechanics
  16. 16. Partitioning
      • Remember it's based on ranges
  17. 17. Chunk is a section of the entire range
  18. 18. Chunk splitting
      • A chunk is split once it exceeds the maximum size
      • There is no split point if all documents have the same shard key
      • Chunk split is a logical operation (no data is moved)
      • If a split creates too large a discrepancy in chunk count across the cluster, a balancing round starts
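
The splitting rules above can be modelled roughly as follows. This is a toy sketch, not MongoDB's implementation: the chunk shape `{ min, max, docs }`, the `maybeSplit` helper, and the choice of the median distinct key as the split point are all assumptions made for illustration. Nothing here moves between shards; a split only replaces one range with two.

```javascript
const MAX_CHUNK_BYTES = 64 * 1024 * 1024; // the 64 MB default quoted in the slides

// Total size of the documents that fall in a chunk's range.
function chunkBytes(chunk) {
  return chunk.docs.reduce((sum, d) => sum + d.bytes, 0);
}

// Returns [chunk] when no split is needed or possible, otherwise [left, right].
function maybeSplit(chunk, maxBytes = MAX_CHUNK_BYTES) {
  if (chunkBytes(chunk) <= maxBytes) return [chunk];

  const keys = [...new Set(chunk.docs.map((d) => d.key))].sort((a, b) => a - b);
  // With a single distinct shard key value there is nowhere to cut the range.
  if (keys.length < 2) return [chunk];

  // Assumed policy for this sketch: cut at the median distinct key.
  const splitKey = keys[Math.floor(keys.length / 2)];
  return [
    { min: chunk.min, max: splitKey, docs: chunk.docs.filter((d) => d.key < splitKey) },
    { min: splitKey, max: chunk.max, docs: chunk.docs.filter((d) => d.key >= splitKey) },
  ];
}

// Usage with a tiny 100-byte limit so the example stays readable.
const oversized = { min: 0, max: 10, docs: [1, 2, 3, 4].map((k) => ({ key: k, bytes: 30 })) };
console.log(maybeSplit(oversized, 100).map((c) => [c.min, c.max])); // two ranges: 0..3 and 3..10

const sameKey = { min: 0, max: 10, docs: [7, 7, 7, 7].map((k) => ({ key: k, bytes: 30 })) };
console.log(maybeSplit(sameKey, 100).length); // 1: oversized but unsplittable
```

The second case is why a low-cardinality shard key is risky: the chunk keeps growing and can never be divided.
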
  19. 19. Balancing
      • The balancer runs on mongos
      • Once the difference in chunk count between the shard with the most chunks and the shard with the fewest is above the migration threshold, a balancing round starts
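
The balancing trigger can be expressed as a small decision function. This is a sketch under stated assumptions: `pickMigration` is a hypothetical helper, and the threshold is passed in as a parameter because the slides do not give its value.

```javascript
// Decide whether a balancing round should move a chunk, and between which shards.
// chunkCounts maps shard name -> number of chunks; migrationThreshold is an assumed input.
function pickMigration(chunkCounts, migrationThreshold) {
  const entries = Object.entries(chunkCounts);
  if (entries.length < 2) return null;

  let most = entries[0];
  let least = entries[0];
  for (const entry of entries) {
    if (entry[1] > most[1]) most = entry;
    if (entry[1] < least[1]) least = entry;
  }

  // A round starts only when the gap is above the threshold.
  if (most[1] - least[1] <= migrationThreshold) return null;
  return { from: most[0], to: least[0] };
}

console.log(pickMigration({ shardA: 10, shardB: 2, shardC: 5 }, 4)); // from shardA to shardB
console.log(pickMigration({ shardA: 5, shardB: 4 }, 4));             // null: already balanced enough
```
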
  20. 20. Acquiring the Balancer Lock
      • The balancer on mongos takes out a "balancer lock"
      • To see the status of these locks:
          - `use config`
          - `db.locks.find({ _id: "balancer" })`
  21. 21. Moving the chunk
      • The mongos sends a "moveChunk" command to the source shard
      • The source shard then notifies the destination shard
      • The destination claims the chunk's shard-key range
      • The destination shard starts pulling documents from the source shard
  22. 22. Committing Migration
      • When complete, the destination shard updates the config server
          - Provides new locations of the chunks
  23. 23. Cleanup
      • Source shard deletes moved data
          - Must wait for open cursors to either close or time out
          - NoTimeout cursors may prevent the release of the lock
      • Mongos releases the balancer lock after old chunks are deleted
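
The copy, commit, and cleanup sequence above can be traced with an in-memory model. This is a deliberately simplified sketch: the `cluster` object and the `moveChunk` function below are hypothetical stand-ins (the real `moveChunk` is a server command), and the balancer lock and the wait for open cursors are left out.

```javascript
// Toy cluster: config holds chunk -> shard metadata, shards hold the documents.
const cluster = {
  config: [
    { min: 0, max: 100, shard: "shardA" },
    { min: 100, max: 200, shard: "shardA" },
  ],
  shards: {
    shardA: [{ key: 5 }, { key: 150 }],
    shardB: [],
  },
};

function moveChunk(state, range, fromName, toName) {
  const chunk = state.config.find((c) => c.min === range.min && c.max === range.max);
  if (!chunk || chunk.shard !== fromName) throw new Error("chunk is not on the source shard");

  const inRange = (doc) => doc.key >= range.min && doc.key < range.max;
  const source = state.shards[fromName];

  // 1. Move: the destination pulls copies of the documents in the range.
  state.shards[toName].push(...source.filter(inRange));

  // 2. Commit: the metadata now reports the new location of the chunk.
  chunk.shard = toName;

  // 3. Cleanup: the source deletes the data it no longer owns.
  state.shards[fromName] = source.filter((doc) => !inRange(doc));
}

moveChunk(cluster, { min: 100, max: 200 }, "shardA", "shardB");
console.log(cluster.config[1].shard);           // shardB
console.log(cluster.shards.shardA.length);      // 1
console.log(cluster.shards.shardB[0].key);      // 150
```

Note that between steps 1 and 3 both shards hold the documents, which is why the metadata commit in step 2, not the copy, decides where queries are sent.
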
  24. 24. Routing Requests
  25. 25. Cluster Request Routing
      • Targeted Queries
      • Scatter Gather Queries
      • Scatter Gather Queries with Sort
  26. 26. Cluster Request Routing: Targeted Query
  27. 27. Routable request received
  28. 28. Request routed to appropriate shard
  29. 29. Shard returns results
  30. 30. Mongos returns results to client
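
The routing decision behind a targeted query can be sketched as below. The `routingTable`, the `targetShards` helper, and the `userId` field are hypothetical; for simplicity, only a numeric equality match on the shard key is treated as targetable, and anything else falls back to contacting every shard.

```javascript
// Hypothetical routing metadata: key ranges [min, max) and the shard that owns each.
const routingTable = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: Infinity, shard: "shardB" },
];

// Return the list of shards that must receive the query.
function targetShards(table, shardKeyField, query) {
  const value = query[shardKeyField];
  if (typeof value === "number") {
    // The query pins the shard key, so exactly one range can match.
    const owner = table.find((c) => value >= c.min && value < c.max);
    return [owner.shard];
  }
  // No usable shard key in the query: every shard has to be asked.
  return [...new Set(table.map((c) => c.shard))];
}

console.log(targetShards(routingTable, "userId", { userId: 42 }));   // only shardA
console.log(targetShards(routingTable, "userId", { name: "alice" })); // shardA and shardB
```
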
  31. 31. Cluster Request Routing: Non-Targeted Query
  32. 32. Non-Targeted Request Received
  33. 33. Request sent to all shards
  34. 34. Shards return results to mongos
  35. 35. Mongos returns results to client
  36. 36. Cluster Request Routing: Non-Targeted Query with Sort
  37. 37. Non-Targeted request with sort received
  38. 38. Request sent to all shards
  39. 39. Query and sort performed locally
  40. 40. Shards return results to mongos
  41. 41. Mongos merges sorted results
  42. 42. Mongos returns results to client
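
The scatter-gather-with-sort sequence can be illustrated with a small merge. This is a sketch, not mongos internals: `shardData`, `queryShard`, `mergeSorted`, and `scatterGatherSorted` are hypothetical names, and the sort is assumed to be ascending on a numeric field.

```javascript
// Hypothetical per-shard data; each shard holds an unsorted subset of the collection.
const shardData = {
  shardA: [{ name: "carol", age: 41 }, { name: "alice", age: 29 }],
  shardB: [{ name: "bob", age: 35 }, { name: "dave", age: 23 }],
};

// Runs on each shard: filter and sort locally (filter returns a copy, so data is not mutated).
function queryShard(docs, predicate, sortField) {
  return docs.filter(predicate).sort((a, b) => a[sortField] - b[sortField]);
}

// Runs on the router: merge already-sorted lists by repeatedly taking the smallest head.
function mergeSorted(sortedLists, sortField) {
  const positions = sortedLists.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < sortedLists.length; i++) {
      if (positions[i] >= sortedLists[i].length) continue; // this list is exhausted
      if (
        best === -1 ||
        sortedLists[i][positions[i]][sortField] < sortedLists[best][positions[best]][sortField]
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(sortedLists[best][positions[best]++]);
  }
}

function scatterGatherSorted(shards, predicate, sortField) {
  const partials = Object.values(shards).map((docs) => queryShard(docs, predicate, sortField));
  return mergeSorted(partials, sortField);
}

const sortedResult = scatterGatherSorted(shardData, (d) => d.age > 25, "age");
console.log(sortedResult.map((d) => d.name).join(", ")); // alice, bob, carol
```

The router never re-sorts the full result set; it only interleaves streams that each shard has already ordered.
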
  43. 43. Shard Key
  44. 44. Shard Key
      • Choose a field commonly used in queries
      • Shard key is immutable
      • Shard key values are immutable
      • Shard key requires an index on the fields contained in the key
      • Uniqueness of the `_id` field is only guaranteed within an individual shard
      • Shard key is limited to 512 bytes in size
  45. 45. Shard Key Considerations
      • Cardinality
      • Write distribution
      • Query isolation
      • Data distribution
  46. 46. Hashed Sharding
  47. 47. Hashed Shard Keys
      • New in MongoDB 2.4
      • Uses hashed indexes
      • The mongos will route all equality queries to a specific shard or set of shards; however, the mongos must route range queries to all shards.
      • When using a hashed shard key on an empty collection, MongoDB automatically pre-splits the range of 64-bit hash values into chunks
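
The idea behind hashed shard keys and pre-splitting can be sketched as follows. Two loud assumptions: the hash is 32-bit FNV-1a, chosen only because it is short to write (it is not the hash MongoDB uses, and MongoDB's hash space is 64-bit), and `preSplit` and `chunkIndexFor` are hypothetical helpers.

```javascript
// FNV-1a, a simple 32-bit hash used here only as a stand-in.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash;
}

// Pre-split the 32-bit hash space into equal half-open ranges, one per chunk.
function preSplit(numChunks) {
  const size = Math.ceil(2 ** 32 / numChunks);
  return Array.from({ length: numChunks }, (_, i) => ({
    min: i * size,
    max: Math.min((i + 1) * size, 2 ** 32),
  }));
}

// An equality lookup hashes the key and lands in exactly one range.
function chunkIndexFor(key, hashChunks) {
  const h = fnv1a(String(key));
  return hashChunks.findIndex((c) => h >= c.min && h < c.max);
}

// Monotonically increasing keys (1, 2, 3, ...) get scattered over the ranges
// instead of all landing in the last one, as they would with range sharding.
const hashChunks = preSplit(4);
const perChunk = [0, 0, 0, 0];
for (let key = 1; key <= 1000; key++) perChunk[chunkIndexFor(key, hashChunks)]++;
console.log(perChunk);
```

A range query such as "keys between 10 and 20" gains nothing from this layout, since adjacent keys hash to unrelated positions; that is why range queries go to all shards.
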
  48. 48. Use Cases for Hashed Shard Keys
      • Write-heavy applications
      • Need good write distribution
      • Tolerant of slower scatter-gather queries
      • Rarely perform range queries (everything is directed to one document)
  49. 49. #ConferenceHashTag. Thank You. Speaker Name, Job Title, 10gen