Introduction to Sharding with MongoDB

Alberto Lerner, Software Engineer at 10gen, presents at MongoUK in London, June 2010

  1. 1. Introduction to MongoDB Sharding<br />Alberto Lerner<br />Software Engineer – 10gen<br />alerner@10gen.com<br />
  2. 2. What is it about?<br />It’s not about sharding, it’s resharding<br />What can sharding do for you<br />What you must do first to obtain it<br />Use case<br />
  3. 3. Sharding Basics<br />To maintain the impression that things look like this<br />SearchCriteria<br />using an index<br />scanning the collection<br />
  4. 4. Sharding Basics (cont)<br />When they actually are like this<br />SearchCriteria<br />using an index<br />scanning the collection<br />
  5. 5. A Detail<br />Partitioning a collection is relatively easy<br />A bit of application logic to find a partition and that’s it<br />Or is it?<br />
  6. 6. The Certainty<br />Things change<br />You get spotted, your query volume grows<br />You build new functionality, your access pattern changes<br />You buy new machines, your fixed partitioning scheme goes out the window<br />
  7. 7. Insurance<br />Sharding is not about partitioning. It’s about repartitioning without you bothering to ask<br />Adding or removing shards<br />Splitting and moving chunks*<br />Logic of finding a chunk is MongoDB’s, not the application’s<br />* Chunk: an (arbitrary) unit that can move at once between shards<br />
  8. 8. What is it about?<br />It’s not about sharding, it’s resharding<br />What can sharding do for you<br />What you must do first to obtain it<br />Use case<br />
  9. 9. Starting to Shard<br />You can load data into a sharded collection or shard an existing one*<br />Automatic range partitioning will take place<br />The data placement will be taken care of<br />By default, it will be sharded over _id but you can specify a different sharding key<br />An index will be built automatically over that key<br />* 1.6<br />
  10. 10. On Writes<br />Write capacity becomes the sum of the shards’ capacities<br />
  11. 11. A digression<br />A shard can actually live in a group of replicated servers<br />Fault-tolerance is obtained that way<br />Our focus here is incremental scalability and aggregated performance<br />
  12. 12. On Reads, I<br />Lookup over the shard key or a prefix thereof<br />Sharding at its best!<br />Search criteria can be satisfied by a single chunk<br />Lookup inside chunk uses index<br />May or may not need to access the collection<br />Example:<br />Shard by user_id, return the user’s name<br />
  13. 13. On Reads, II<br />Lookup over secondary index<br />Not bad: merges results from shards<br />Example: {country : “UK”} with secondary index over country<br />
  14. 14. On Reads, III<br />Lookups where indexes won’t help<br />Traversing shards sequentially or in parallel?*<br />*1.6<br />
  15. 15. What is it about?<br />It’s not about sharding, it’s resharding<br />What can sharding do for you<br />What you must do first to obtain it<br />Use case<br />
  16. 16. The Sharding Key<br />Choose wisely; you’re marrying it<br />Often, you’re better off defining a unique key that stores data the application wants to query<br />(Internally generated _id is really not it)<br />
  17. 17. Mind Your Queries<br />Sure, dynamic partitioning is automatic<br />But, ultimately, the system’s response time and scalability are connected to how your application queries it<br />If most important queries fall into category I, remaining ones in II, and seldom any query that matters in III, you’ll be fine<br />
  18. 18. Pick Your Indexes<br />MongoDB allows sharding and secondary indexes<br />Critical queries that are not served by the sharding index can use help<br />Sometimes, you can’t help them all…<br />Index selection is a trade-off between querying and updates/insertions/deletions<br />
  19. 19. What is it about?<br />It’s not about sharding, it’s resharding<br />What can sharding do for you<br />What you must do first to obtain it<br />Use case<br />
  20. 20. Bit.ly History<br />User creates URL shortener<br />Sharding is used to store all past URLs of a user<br />Sharding key: user_id<br />Indexes: timestamp(desc)<br />Queries:<br />Shortened URLs by a given user<br />Last n URLs by any user<br />
  21. 21. Take Away<br />Picture to keep in mind<br />
  22. 22. Questions?<br />www.mongodb.org<br />
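
The slides describe range partitioning into chunks, lookups that target a single shard when the criteria include the shard key, lookups that fan out to every shard otherwise, and chunks moving between shards without the application changing. The toy model below sketches those ideas in plain Python. It is not MongoDB's implementation: the class name `ShardedCollection`, its methods, the split points, and the shard names are all hypothetical, and a linear scan inside each shard stands in for an index.

```python
from bisect import bisect_right


class ShardedCollection:
    """Toy range-partitioned collection (illustrative only, not MongoDB code)."""

    def __init__(self, shard_key, split_points, chunk_to_shard):
        # Chunk i holds shard-key values in [split_points[i-1], split_points[i]).
        # With n split points there are n + 1 chunks.
        self.shard_key = shard_key
        self.split_points = list(split_points)
        self.chunk_to_shard = list(chunk_to_shard)
        self.shards = {name: [] for name in set(chunk_to_shard)}

    def chunk_for(self, key_value):
        # Binary search over the sorted split points finds the chunk number.
        return bisect_right(self.split_points, key_value)

    def shard_for(self, key_value):
        return self.chunk_to_shard[self.chunk_for(key_value)]

    def insert(self, doc):
        self.shards[self.shard_for(doc[self.shard_key])].append(doc)

    def find(self, criteria):
        """Return (matching documents, names of the shards contacted)."""
        if self.shard_key in criteria:
            # Criteria include the shard key: one shard can answer.
            targets = [self.shard_for(criteria[self.shard_key])]
        else:
            # No shard key: ask every shard and merge the results.
            targets = sorted(self.shards)
        hits = [doc for name in targets for doc in self.shards[name]
                if all(doc.get(k) == v for k, v in criteria.items())]
        return hits, targets

    def move_chunk(self, chunk, to_shard):
        # Relocate one chunk; callers of find() and insert() do not change.
        source = self.chunk_to_shard[chunk]
        self.shards.setdefault(to_shard, [])
        staying, moving = [], []
        for doc in self.shards[source]:
            in_chunk = self.chunk_for(doc[self.shard_key]) == chunk
            (moving if in_chunk else staying).append(doc)
        self.shards[source] = staying
        self.shards[to_shard].extend(moving)
        self.chunk_to_shard[chunk] = to_shard


# Three chunks over user_id: (-inf, 100), [100, 200), [200, +inf)
coll = ShardedCollection("user_id", [100, 200], ["shard0", "shard1", "shard0"])
for uid, name, country in [(5, "ann", "UK"), (150, "bob", "US"), (250, "cat", "UK")]:
    coll.insert({"user_id": uid, "name": name, "country": country})

by_key, key_targets = coll.find({"user_id": 150})        # targets one shard
by_country, all_targets = coll.find({"country": "UK"})   # contacts every shard
coll.move_chunk(2, "shard2")                              # rebalance onto a new shard
after_move, moved_targets = coll.find({"user_id": 250})  # same query, new location
```

In this sketch the query on `user_id` touches one shard and the query on `country` touches all of them, which is the difference between the first and the later read categories in the slides. After `move_chunk`, the same `find` call reaches the new shard with no change on the caller's side.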
