MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by Brandon Black, 10gen

In version 2.4, MongoDB introduces hash-based sharding, a new option for distributing data in sharded collections. Hash-based sharding and range-based sharding present different advantages for MongoDB users deploying large scale systems. In this talk, we'll provide an overview of this new feature and discuss when to use hash-based sharding or range-based sharding.

Notes
  • Remind everyone what a sharded cluster is. We will take a close look at how sharded clusters work and at the new hashed shard key feature in 2.4.
  • Isolating queries (to a few shards). Scatter-gather (high latency, but not bad). Hash keys.
  • Min value is included. Max value is not included.
  • Balancer is running on mongos. Once the difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts.
  • Moved chunk on shard2 should be gray
  • Source shard deletes moved data. Must wait for open cursors to either close or time out. NoTimeout cursors may prevent the release of the lock. Mongos releases the balancer lock after old chunks are deleted.
  • Moving data is expensive (I/O, network bandwidth). Moving many chunks takes a long time (only one chunk can move at a time). Balancing and migrations compete for resources with your application.
  • The mongos does not have to load the whole set into memory since each shard sorts locally. The mongos can just getMore from the shards as needed and incrementally return the results to the client.
  • What’s the solution to sharding on incremental values as a shard key?
  • Uses the hashed index
  • Range based (best); hash based (uniform writes, but range queries are not routed); tag aware.
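
The note above about sorted scatter-gather queries (each shard sorts locally, and the mongos pulls results as needed) can be sketched as a streaming merge. This is a minimal illustration, not mongos code: `shardCursor` and `mergeSorted` are hypothetical names, generators stand in for shard cursors and their getMore calls, and the sample documents are made up.

```javascript
// Hypothetical stand-in for a shard cursor: yields documents that the
// shard has already sorted locally, one at a time (like repeated getMore).
function* shardCursor(sortedDocs) {
  for (const doc of sortedDocs) {
    yield doc;
  }
}

// Merge several already-sorted cursors into one sorted stream.
// Only one buffered document per shard is held at any time, so the
// full result set never has to sit in memory.
function* mergeSorted(cursors, compare) {
  const heads = cursors.map((cursor) => ({ cursor, next: cursor.next() }));
  while (true) {
    let best = null;
    for (const head of heads) {
      if (head.next.done) continue; // this shard is exhausted
      if (best === null || compare(head.next.value, best.next.value) < 0) {
        best = head;
      }
    }
    if (best === null) return; // every shard is exhausted
    yield best.next.value;
    best.next = best.cursor.next(); // fetch the next document from that shard only
  }
}

// Made-up sample data: two shards, each returning its own sorted results.
const shard1 = shardCursor([{ value: -9 }, { value: -5 }, { value: 3 }]);
const shard2 = shardCursor([{ value: 0 }, { value: 6 }, { value: 7 }, { value: 10 }]);

const merged = [...mergeSorted([shard1, shard2], (a, b) => a.value - b.value)]
  .map((doc) => doc.value);
console.log(merged);
```

A linear scan over the heads is enough for a handful of shards; with many shards a heap would be the more usual choice.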
  • Transcript of "MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by Brandon Black, 10gen"

    1. Hash-Based Sharding in MongoDB 2.4
        Brandon Black (@brandonmblack), Software Engineer, 10gen
        #MongoDBDays
    2. Agenda
        • Mechanics of Sharding
          – Key space
          – Chunks
          – Balancing
        • Request Routing
        • Hashed Shard Keys
          – Why use hashed shard keys
          – How to enable hashed shard keys
          – Limitations
    3. Sharded Cluster
    4. Sharding Your Data
    5. What Is A Shard Key?
        • Shard key is used to partition your collection
        • Shard key must exist in every document
        • Shard key is immutable
        • Shard key values are immutable
        • Shard key must be indexed
        • Shard key is used to route requests to shards
    6. The Key Space
        {x: 10} {x: -5} {x: -9} {x: 7} {x: 6} {x: 0}
    7. Inserting Data
        {x: 0} {x: 6} {x: 7} {x: -5} {x: 10} {x: -9}
    8. Inserting Data
        {x: 0} {x: 6} {x: 7} {x: -5} {x: 10} {x: -9}
    9. Chunk Range and Size
        {x: 0} {x: 6} {x: 7} {x: -5} {x: 10} {x: -9}
    10. Inserting Further Data
        {x: 0} {x: 6} {x: 7} {x: -5} {x: 10} {x: -9} {x: 9} {x: -7} {x: 3}
    11. Chunk Splitting
        {x: 0} {x: 6} {x: 7} {x: -5} {x: 10} {x: -9} (diagram labels: 0, 0)
        • A chunk is split once it exceeds the maximum size
        • There is no split point if all documents have the same shard key
        • Chunk split is a logical operation (no data is moved)
        • If a split creates too large a discrepancy in chunk count across the cluster, a balancing round starts
    12. Data Distribution
        • MinKey to 0 lives on Shard1
        • 0 to MaxKey lives on Shard2
        • Mongos routes queries appropriately
    13. Mongos Routes Data
        Chunk ranges: minKey to 0, 0 to maxKey
            db.test.insert({ x: -1000 })
    14. Mongos Routes Data
        Chunk ranges: minKey to 0, 0 to maxKey
            db.test.insert({ x: -1000 })
    15. Unbalanced Shards
        Chunk ranges: minKey to 0, 0 to maxKey
    16. Balancing
        • Migration threshold
        • Number of chunks less than 20, migration threshold of 2
        • 21-80, migration threshold 4
        • >80, migration threshold 8
    17. Moving the chunk
        • One chunk of data is copied from Shard 1 to Shard 2
    18. Committing Migration
        • Once everyone agrees the data has moved, that chunk gets deleted from Shard 1.
    19. Cleanup
        • Other mongos have to find out about new configuration
    20. Effects of Migrations
        • Expensive
        • Can take a long time
        • Competes for limited resources
    21. Picking A Shard Key
        • Cardinality
        • Optimize routing
        • Minimize (unnecessary) traffic
        • Allow best scaling
    22. Routing Requests
    23. Cluster Request Routing
        • Targeted Queries
        • Scatter Gather Queries
        • Scatter Gather Queries with Sort
    24. Cluster Request Routing: Targeted Query
    25. Routable Request Received
    26. Request routed to appropriate shard
    27. Shard returns results
    28. Mongos returns results to client
    29. Cluster Request Routing: Non-Targeted Query
    30. Non-Targeted Request Received
    31. Request sent to all shards
    32. Shards return results to mongos
    33. Mongos returns results to client
    34. Cluster Request Routing: Non-Targeted Query with Sort
    35. Non-Targeted request with sort received
    36. Request sent to all shards
    37. Query and sort performed locally
    38. Shards return results to mongos
    39. Mongos merges sorted results
    40. Mongos returns results to client
    41. What About ObjectId?
            ObjectId("51597ca8e28587b86528edfd")
        • Used for _id
        • 12-byte value
        • Generated by the driver if not specified
        • Theoretically globally unique
    42. What About ObjectId?
            ObjectId("51597ca8e28587b86528edfd")
        12 bytes: Timestamp | MAC | PID | Counter
    43. Sharding on ObjectId
            // enabling sharding on test database
            mongos> sh.enableSharding("test")
            { "ok" : 1 }
            // sharding the test collection
            mongos> sh.shardCollection("test.test", {_id: 1})
            { "collectionsharded" : "test.test", "ok" : 1 }
            // create a loop inserting data
            mongos> for (x=0; x<10000; x++) {
            ... db.test.insert({value:x})
            ... }
    44. ObjectId Chunk Distribution
            shards:
            { "_id" : "shard0000", "host" : "localhost:30000" }
            { "_id" : "shard0001", "host" : "localhost:30001" }
            databases:
            { "_id" : "test", "partitioned" : true, "primary" : "shard0001" }
            test.test
            shard key: { "_id" : 1 }
            chunks:
            shard0001 3
            { "_id" : { "$minKey" : 1 } } -->> { "_id" : ObjectId("...") } on : shard0001 { "t" : 1000, "i" : 1 }
            { "_id" : ObjectId("...") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 1000, "i" : 2 }
    45. ObjectId Results In A “Hot Shard”
        Chunk ranges: minKey to 0, 0 to maxKey
    46. Sharding on incremental values like timestamp is not optimum for even distribution
    47. Hashed Shard Keys
    48. Hashed Shard Keys
            {x:2} md5 c81e728d9d4c2f636f067f89cc14862c
            {x:3} md5 eccbc87e4b5ce2fe28308fd9f2a7baf3
            {x:1} md5 c4ca4238a0b923820dcc509a6f75849b
    49. Hashed Shard Key Eliminates “Hot Shard”
        Chunk ranges: minKey to 0, 0 to maxKey
    50. Under the Hood
        • Create a hashed index used for sharding
        • Uses the first 64 bits of md5 hash of field
        • Hash both data and BSON type
        • Represented as a NumberLong in the shell
    51. Hash on both data and BSON type
            // hash on 1 as an integer
            > db.runCommand({_hashBSONElement:1})
            {
              "key" : 1,
              "seed" : 0,
              "out" : NumberLong("5902408780260971510"),
              "ok" : 1
            }
            // hash on "1" as a string
            > db.runCommand({_hashBSONElement:"1"})
            {
              "key" : "1",
              "seed" : 0,
              "out" : NumberLong("-2448670538483119681"),
              "ok" : 1
            }
    52. Enabling Hashed Indexes
        • Create index:
            db.collection.ensureIndex({field : "hashed"})
    53. Using Hash Shard Keys
        • Enable sharding on collection:
            sh.shardCollection("test.collection", {field: "hashed"})
    54. Sharding on Hashed ObjectId
            // enabling sharding on test database
            mongos> sh.enableSharding("test")
            { "ok" : 1 }
            // shard by hashed _id field
            mongos> sh.shardCollection("test.hash", {_id:"hashed"})
            { "collectionsharded" : "test.hash", "ok" : 1 }
    55. Pre-Splitting the Data
            databases:
            { "_id" : "test", "partitioned" : true, "primary" : "shard0001" }
            test.hash
            shard key: { "_id" : "hashed" }
            chunks:
            shard0000 2
            shard0001 2
            { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 2 }
            { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 3 }
            { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 { "t" : 2000, "i" : 4 }
            { "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 2000, "i" : 5 }
    56. Inserting Into Hashed Shard Key Collection
            // create a loop inserting data
            mongos> for (x=0; x<10000; x++) {
            ... db.hash.insert({value:x})
            ... }
    57. Even Distribution of Chunks
            test.hash
            shard key: { "_id" : "hashed" }
            chunks:
            shard0000 4
            shard0001 4
            { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-7374407069602479355") } on : shard0000 { "t" : 2000, "i" : 8 }
            { "_id" : NumberLong("-7374407069602479355") } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 9 }
            { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong("-2456929743513174890") } on : shard0000 { "t" : 2000, "i" : 6 }
            { "_id" : NumberLong("-2456929743513174890") } -->> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 7 }
            { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("1483539935376971743") } on : shard0001 { "t" : 2000, "i" : 12 }
    58. Hash Keys Are Great for Equality Queries
        • Equality queries directed to a specific shard
        • Will use the index
        • Most efficient query possible
    59. Explain Plan of an Equality Query
            mongos> db.hash.find({x:1}).explain()
            {
              "cursor" : "BtreeCursor x_hashed",
              "n" : 1,
              "nscanned" : 1,
              "nscannedObjects" : 1,
              "millisShardTotal" : 0,
              "numQueries" : 1,
              "numShards" : 1,
              "indexBounds" : {
                "x" : [
                  [
                    NumberLong("5902408780260971510"),
                    NumberLong("5902408780260971510")
                  ]
                ]
              },
              "millis" : 0
            }
    60. Not So Good for a Range Query
        • Range queries scatter gather
        • Don’t use the index
        • Inefficient query
    61. Explain Plan of a Range Query
            mongos> db.hash.find({x:{$gt:1, $lt:99}}).explain()
            {
              "cursor" : "BasicCursor",
              "n" : 97,
              "nChunkSkips" : 0,
              "nYields" : 0,
              "nscanned" : 1000,
              "nscannedAllPlans" : 1000,
              "nscannedObjects" : 1000,
              "nscannedObjectsAllPlans" : 1000,
              "millisShardTotal" : 0,
              "millisShardAvg" : 0,
              "numQueries" : 2,
              "numShards" : 2,
              "millis" : 3
            }
    62. Limitations
        • Cannot use a compound key
        • Key cannot have an array value
        • Incompatible with tag aware sharding
          – Tags would be assigned the value of the hash, not the value of the underlying key
        • Key with poor cardinality is going to give a hash with poor cardinality
          – Floating point numbers are squashed. E.g. 100.4 will be hashed as 100
    63. Summary
        • There are 3 different approaches for sharding
        • Hash shard keys give great distribution
        • Hash shard keys are good for equality
        • Pick the right shard key for your application
    64. Thank You
        Brandon Black (@brandonmblack), Software Engineer, 10gen
        #MongoDBDays
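
The hashed shard key slides (48 to 61) can be tied together in a small routing sketch. It rests on stated assumptions rather than MongoDB internals: `hashKey`, `findChunk`, `routeEquality` and `routeRange` are hypothetical names, the hash is md5 over a made-up "type:value" string encoding read as a little-endian signed 64-bit integer, and so its outputs will not match `_hashBSONElement`. The chunk boundaries are copied from the pre-split output on slide 55, and ranges include the min and exclude the max, as the speaker notes state.

```javascript
const crypto = require("crypto");

// Hypothetical stand-in for the hashed index: md5 over the value and its
// type, keeping the first 64 bits as a signed integer. The string encoding
// and byte order are assumptions made for illustration only.
function hashKey(value) {
  const digest = crypto
    .createHash("md5")
    .update(`${typeof value}:${String(value)}`)
    .digest();
  return digest.readBigInt64LE(0);
}

// Chunk ranges taken from the pre-split output on slide 55.
// null stands for minKey (as min) or maxKey (as max).
const chunks = [
  { min: null, max: -4611686018427387902n, shard: "shard0000" },
  { min: -4611686018427387902n, max: 0n, shard: "shard0000" },
  { min: 0n, max: 4611686018427387902n, shard: "shard0001" },
  { min: 4611686018427387902n, max: null, shard: "shard0001" },
];

// A chunk owns [min, max): min is included, max is not.
function findChunk(hashed) {
  return chunks.find(
    (c) => (c.min === null || hashed >= c.min) && (c.max === null || hashed < c.max)
  );
}

// Equality query: hash the value and target exactly one shard.
function routeEquality(value) {
  return [findChunk(hashKey(value)).shard];
}

// Range query on the original values: hashing destroys ordering, so the
// range cannot be mapped to chunks and every shard must be asked.
function routeRange() {
  return [...new Set(chunks.map((c) => c.shard))];
}

// Consecutive values land on chunks unrelated to their order.
for (let x = 1; x <= 6; x++) {
  console.log(x, hashKey(x).toString(), routeEquality(x));
}
console.log("range query goes to:", routeRange());
```

Because 1 and "1" are encoded with their types, they hash differently, which mirrors the "hash both data and BSON type" point on slide 50.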