
MongoDB Auto-Sharding at Mongo Seattle


Aaron Staple's presentation at Mongo Seattle


  1. MongoDB auto sharding
     Aaron Staple
     aaron@10gen.com
     Mongo Seattle
     July 27, 2010
  2. MongoDB v1.6 out next week!
  3. Why Scale Horizontally?
     Vertical scaling is expensive
     Horizontal scaling is more incremental – works well in the cloud
     Will always be able to scale wider than higher
  4. Distribution Models
     Ad-hoc partitioning
     Consistent hashing (Dynamo)
     Range based partitioning (BigTable/PNUTS)
  5. Auto Sharding
     Each piece of data is exclusively controlled by a single node (shard)
     Each node (shard) has exclusive control over a well defined subset of the data
     Database operations run against one shard when possible, multiple when necessary
     As system load changes, the assignment of data to shards is rebalanced automatically
  6. Mongo Sharding
     Basic goal is to make this
     [diagram: client -> ? -> three mongod processes]
  7. Mongo Sharding
     Look like this
     [diagram: client -> single mongod]
  8. Mongo Sharding
     Mapping of documents to shards is controlled by the shard key
     Can convert from a single master to a sharded cluster with 0 downtime
     Most functionality of a single Mongo master is preserved
     Fully consistent
  9. Shard Key
  10. Shard Key Examples
      { user_id: 1 }
      { state: 1 }
      { lastname: 1, firstname: 1 }
      { tag: 1, timestamp: -1 }
      { _id: 1 }
        This is the default
        Careful when using ObjectId
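      As a rough sketch of how one of these keys would be declared (the
      database and collection names here are hypothetical), sharding is
      enabled per database and the key is set per collection from a mongos:

        // hypothetical names; run against the admin database via mongos
        > use admin
        > db.runCommand({ enablesharding: "app" })
        > db.runCommand({ shardcollection: "app.users", key: { lastname: 1, firstname: 1 } })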
  11. Architecture Overview
      [diagram: client -> mongos -> three mongod shards holding
       -inf <= user_id < 2000, 2000 <= user_id < 5500, 5500 <= user_id < +inf]
  12. Query I
      Shard key { user_id: 1 }
      db.users.find( { user_id: 5000 } )
      Query appropriate shard
  13. Query II
      Shard key { user_id: 1 }
      db.users.find( { user_id: { $gt: 4000, $lt: 6000 } } )
      Query appropriate shard(s)
  14. { a : …, b : …, c : … } – a is declared shard key
      find( { a : { $gt : 333, $lt : 400 } } )
  15. { a : …, b : …, c : … } – a is declared shard key
      find( { a : { $gt : 333, $lt : 2012 } } )
  16. Query III
      Shard key { user_id: 1 }
      db.users.find( { hometown: 'Seattle' } )
      Query all shards
  17. { a : …, b : …, c : … } – a is declared shard key
      find( { a : { $gt : 333, $lt : 2012 } } )
  18. { a : …, b : …, c : … } – secondary query, secondary index
      ensureIndex( { b: 1 } )
      find( { b : 99 } )
      This case works well when the result set is small (as here), and also when the queries are large tasks
  19. Query IV
      Shard key { user_id: 1 }
      db.users.find( { hometown: 'Seattle' } ).sort( { user_id: 1 } )
      Query all shards, in sequence
  20. Query V
      Shard key { user_id: 1 }
      db.users.find( { hometown: 'Seattle' } ).sort( { lastname: 1 } )
      Query all shards in parallel, perform merge sort
      Secondary index on { lastname: 1 } can be used
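      A minimal sketch of this case (assuming a users collection like the
      one above): build the secondary index once, then each shard can
      return its results already ordered for the merge sort.

        // hypothetical collection; index creation via mongos reaches every shard
        > db.users.ensureIndex( { lastname: 1 } )
        > db.users.find( { hometown: 'Seattle' } ).sort( { lastname: 1 } )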
  21. Map/Reduce
      Map/Reduce was designed for distributed systems
      Map/Reduce jobs will run on all relevant shards in parallel, subject to the query spec
  22. Map/Reduce
      > m = function() { emit(this.user_id, 1); }
      > r = function(k, vals) { return 1; }
      > res = db.events.mapReduce(m, r, { query : { type: 'sale' } });
      > db[res.result].find().limit(2)
      { "_id" : 8321073716060 , "value" : 1 }
      { "_id" : 7921232311289 , "value" : 1 }
  23. Writes
      Inserts are routed to the appropriate shard (the inserted doc must contain the shard key)
      Removes are sent to the shards whose key ranges could match
      Updates are sent to the shards whose key ranges could match
      Writes to multiple shards run in parallel if asynchronous, in sequence if synchronous
      Updates cannot modify the shard key (see the sketch below)
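      A small sketch of routed writes (collection and values are
      illustrative), assuming the shard key is { user_id: 1 }:

        // the insert carries the shard key, so it goes to exactly one shard
        > db.users.insert( { user_id: 5000, hometown: 'Seattle' } )
        // the update's query names the shard key, so it is routed the same way
        > db.users.update( { user_id: 5000 }, { $set: { hometown: 'Portland' } } )
        // an update like { $set: { user_id: 9999 } } would be rejected –
        // the shard key itself cannot be changed by an update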
  24. Choice of Shard Key
      Pick a key that exists in every document and is generally a component of your queries
      Often you want a unique key
      If not, consider the granularity of the key and potentially add fields (see the sketch below)
      The shard key is generally composed of fields you would put in an index if operating on a single machine
      In a sharded configuration, the shard key will be indexed automatically
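      A hedged example of the granularity point (database, collection, and
      field names are hypothetical): { state: 1 } alone is coarse, since
      every document for one state shares a single key value; adding a
      second field lets the data be split more finely.

        // { state: 1 } alone would be coarse – all of one state's documents
        // share one key value; adding user_id gives finer granularity
        > db.runCommand({ shardcollection: "app.people", key: { state: 1, user_id: 1 } })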
  25. Shard Key Examples (again)
      { user_id: 1 }
      { state: 1 }
      { lastname: 1, firstname: 1 }
      { tag: 1, timestamp: -1 }
      { _id: 1 }
        This is the default
        Careful when using ObjectId
  26. Bit.ly Example
      ~50M users
      ~10K concurrently using the server at peak
      ~12.5B shortens per month (1K/sec peak)
      History of all shortens per user stored in mongo
  27. Bit.ly Example – Schema
      { "_id" : "lthp",
        "domain" : null,
        "keyword" : null,
        "title" : "bit.ly, a simple url shortener",
        "url" : "http://jay.bit.ly/",
        "labels" : [
          { "text" : "lthp", "name" : "user_hash" },
          { "text" : "BUFx", "name" : "global_hash" },
          { "text" : "http://jay.bit.ly/", "name" : "url" },
          { "text" : "bit.ly, a simple url shortener", "name" : "title" }
        ],
        "ts" : 1229375771,
        "global_hash" : "BUFx",
        "user" : "jay",
        "media_type" : null }
  28. Bit.ly Example
      Shard key { user: 1 }
      Indices
        { _id: 1 }
        { user: 1 }
        { global_hash: 1 }
        { user: 1, ts: -1 }
      Query
        db.history.find( {
            user : user,
            "labels.text" : …
        } ).sort( { ts: -1 } )
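      A sketch of how the remaining indices listed above might be created
      on the history collection ({ _id: 1 } exists automatically, and the
      shard key { user: 1 } is indexed automatically as noted earlier):

        > db.history.ensureIndex( { global_hash: 1 } )
        > db.history.ensureIndex( { user: 1, ts: -1 } )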
  29. Balancing
      The whole point of auto sharding is that mongo balances the shards for you
      The balancing algorithm is complicated; the basic idea is that this
      [diagram: two shards holding -inf <= user_id < 1000 and 1000 <= user_id < +inf]
  30. Balancing
      Becomes this
      [diagram: two shards holding -inf <= user_id < 3000 and 3000 <= user_id < +inf]
  31. Balancing
      Current balancing metric is data size
      Future possibilities – cpu, disk utilization
      There is some flexibility built into the partitioning algorithm – so we aren't thrashing data back and forth between shards
      Only move one 'chunk' of data at a time – a conservative choice that limits the total overhead of balancing
  32. System Architecture
  33. Shard
      Regular mongod process(es), storing all documents for a given key range
      Handles all reads/writes for this key range as well
      Each shard indexes the data contained within it
      Can be a single mongod, master/slave, or a replica set
  34. Shard – Chunk
      In a sharded cluster, data is partitioned across shards by shard key
      Within a shard, data is partitioned into chunks by shard key
      A chunk is the smallest unit of data for balancing
      Data moves between shards at chunk granularity
      Upper limit on chunk size is 200MB
      Special case if the shard key range is open ended
  35. Shard – Replica Sets
      Replica sets provide data redundancy and auto failover
      In the case of sharding, this means redundancy and failover per shard
      All typical replica set operations are possible
        For example, write with w=N (see the sketch below)
      Replica sets were specifically designed to work as shards
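      A minimal sketch of writing with w=N from the mongo shell (collection
      name and numbers are illustrative); getLastError with w waits until
      the write has replicated to that many members of the shard's replica
      set:

        > db.history.insert( { user: "jay", ts: 1229375771 } )
        > db.runCommand( { getlasterror: 1, w: 2, wtimeout: 10000 } )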
  36. System Architecture
  37. Mongos
      Sharding router – distributes reads/writes to the sharded cluster
      Client interface is the same as a mongod
      Can have as many mongos instances as you want
      Can run on app server machines to avoid extra network traffic
      Mongos also initiates balancing operations
      Keeps per-chunk metadata in RAM – 1MB RAM per 1TB of user data in the cluster
  38. System Architecture
  39. Config Server
      3 config servers
      Changes are made with a 2 phase commit
      If any of the 3 servers goes down, config data becomes read only
      The sharded cluster will remain online as long as 1 of the config servers is running
      Config metadata size estimate: 1MB metadata per 1TB of data in the cluster
  40. Shared Machines
  41. Bit.ly Architecture
  42. Limitations
      Unique index constraints not expressed by the shard key are not enforced across shards
      Updates to a document's shard key aren't allowed (you can remove and reinsert, but it's not atomic – see the sketch below)
      Balancing metric is limited to # of chunks right now – but this will be enhanced
      Right now only one chunk moves in the cluster at a time – this means balancing can be slow; it's a conservative choice we've made to keep the overhead of balancing low for now
      20 petabyte size limit
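      A sketch of the remove-and-reinsert workaround (names and values are
      made up; the shard key here is assumed to be { user_id: 1 }). Note
      that the document is briefly absent between the two operations:

        > var doc = db.users.findOne( { user_id: 5000 } )
        > db.users.remove( { user_id: 5000 } )
        > doc.user_id = 6000
        > db.users.insert( doc )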
  43. Start Up Config Servers
      $ mkdir -p ~/dbs/config
      $ ./mongod --dbpath ~/dbs/config --port 20000
      Repeat as necessary
  44. Start Up Mongos
      $ ./mongos --port 30000 --configdb localhost:20000
      No dbpath
      Repeat as necessary
  45. Start Up Shards
      $ mkdir -p ~/dbs/shard1
      $ ./mongod --dbpath ~/dbs/shard1 --port 10000
      Repeat as necessary
  46. Configure Shards
      $ ./mongo localhost:30000/admin
      > db.runCommand({ addshard : "localhost:10000", allowLocal : true })
      {
          "added" : "localhost:10000",
          "ok" : true
      }
      Repeat as necessary
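      In production a shard would typically be a replica set rather than a
      single mongod; roughly, the set is named in the host string passed to
      addshard (the set and host names here are hypothetical):

        > db.runCommand({ addshard : "shard_a/host1:10000,host2:10000" })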
  47. Shard Data
      > db.runCommand({ "enablesharding" : "foo" })
      > db.runCommand({ "shardcollection" : "foo.bar", "key" : { "_id" : 1 } })
      Repeat as necessary
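      Once the collection is sharded, reads and writes go through mongos as
      usual; a quick check (still connected to the mongos from the previous
      step, document contents made up):

        > use foo
        > db.bar.insert( { _id: 1, msg: "hello" } )
        > db.bar.find( { _id: 1 } )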
  48. Production Configuration
      Multiple config servers
      Multiple mongos servers
      Replica sets for each shard
      Use getLastError/w correctly
  49. Looking at config data
      Mongo shell connected to a config server, config database
      > db.shards.find()
      { "_id" : "shard0", "host" : "localhost:10000" }
      { "_id" : "shard1", "host" : "localhost:10001" }
  50. Looking at config data
      > db.databases.find()
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "foo", "partitioned" : false, "primary" : "shard1" }
      { "_id" : "x", "partitioned" : false, "primary" : "shard0" }
      { "_id" : "test", "partitioned" : true, "primary" : "shard0", "sharded" :
          {
              "test.foo" : { "key" : { "x" : 1 }, "unique" : false }
          }
      }
  51. Looking at config data
      > db.chunks.find()
      { "_id" : "test.foo-x_MinKey",
        "lastmod" : { "t" : 1276636243000, "i" : 1 },
        "ns" : "test.foo",
        "min" : { "x" : { $minKey : 1 } },
        "max" : { "x" : { $maxKey : 1 } },
        "shard" : "shard0"
      }
  52. Looking at config data
      > db.printShardingStatus()
      --- Sharding Status ---
      sharding version: { "_id" : 1, "version" : 3 }
      shards:
        { "_id" : "shard0", "host" : "localhost:10000" }
        { "_id" : "shard1", "host" : "localhost:10001" }
      databases:
        { "_id" : "admin", "partitioned" : false, "primary" : "config" }
        { "_id" : "foo", "partitioned" : false, "primary" : "shard1" }
        { "_id" : "x", "partitioned" : false, "primary" : "shard0" }
        { "_id" : "test", "partitioned" : true, "primary" : "shard0", "sharded" : { "test.foo" : { "key" : { "x" : 1 }, "unique" : false } } }
        test.foo chunks:
          { "x" : { $minKey : 1 } } -->> { "x" : { $maxKey : 1 } } on : shard0 { "t" : 1276636243 …
  53. Give it a Try!
      Download from mongodb.org
      Sharding is production ready in 1.6, which is scheduled for release next week
      For now, use 1.5 (unstable) to try sharding
