Introduction to Sharding with MongoDB

Alberto Lerner, Software Engineer at 10gen, presents at MongoUK in London, June 2010

Published in: Technology, Design

  1. Introduction to MongoDB Sharding
     Alberto Lerner
     Software Engineer, 10gen
     alerner@10gen.com
  2. What is it about?
     It’s not about sharding, it’s resharding
     What can sharding do for you
     What you must do first to obtain it
     Use case
  3. Sharding Basics
     To maintain the impression that things look like this
     [Diagram: search criteria, using an index, scanning the collection]
  4. Sharding Basics (cont)
     When they actually are like this
     [Diagram: search criteria, using an index, scanning the collection]
  5. A Detail
     Partitioning a collection is relatively easy
     A bit of application logic to find a partition and that’s it
     Or is it?
  6. The Certainty
     Things change
     You get spotted, your querying volume grows
     You build new functionality, your access pattern changes
     You buy new machines, your fixed partitioning scheme goes out the window
  7. Insurance
     Sharding is not about partitioning. It’s about repartitioning without you bothering to ask
     Adding or removing shards
     Splitting and moving chunks*
     The logic of finding a chunk is MongoDB’s, not the application’s
     * Chunk: an (arbitrary) unit that can move at once between shards
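
A rough illustration of the point in slide 7, that locating a chunk is the database's job and chunks can be split or moved without the application noticing. This is a toy routing table, not MongoDB's actual implementation; the class name `ChunkMap`, its methods, and the numeric keys are all assumptions made for the sketch.

```python
from bisect import bisect_right


class ChunkMap:
    """Toy routing table (hypothetical): sorted chunk lower bounds, one owner each."""

    def __init__(self, first_shard):
        # Start with a single chunk covering the whole (numeric) key space.
        self.lower_bounds = [float("-inf")]
        self.owners = [first_shard]

    def find(self, key):
        """Return (chunk_index, shard) for the chunk whose range contains key."""
        i = bisect_right(self.lower_bounds, key) - 1
        return i, self.owners[i]

    def split(self, at_key):
        """Split the chunk containing at_key in two; both halves stay on one shard."""
        i, shard = self.find(at_key)
        if self.lower_bounds[i] == at_key:
            return  # at_key is already a chunk boundary
        self.lower_bounds.insert(i + 1, at_key)
        self.owners.insert(i + 1, shard)

    def move(self, chunk_index, to_shard):
        """Reassign a whole chunk to another shard; callers of find() are unaffected."""
        self.owners[chunk_index] = to_shard


chunks = ChunkMap("shardA")
chunks.split(100)
chunks.split(200)
chunks.move(1, "shardB")  # the chunk [100, 200) now lives on shardB
```

The application only ever calls `find`; splits and moves change the table behind it, which is the "insurance" the slide describes.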
  8. What is it about?
     It’s not about sharding, it’s resharding
     What can sharding do for you
     What you must do first to obtain it
     Use case
  9. Starting to Shard
     You can load data into a sharded collection or shard an existing one*
     Automatic range partitioning will take place
     The data placement will be taken care of
     By default, it will be sharded over _id, but you can specify a different sharding key
     An index will be built automatically over that key
     * 1.6
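
To make "automatic range partitioning" and "data placement will be taken care of" from slide 9 concrete, here is a simplified simulation. It assumes a chunk splits at its median once it holds more than a fixed number of documents, and that the new chunk goes to the shard with the fewest chunks. The threshold `MAX_DOCS_PER_CHUNK`, the function `load`, and the placement rule are illustrative assumptions, not MongoDB's real policy.

```python
MAX_DOCS_PER_CHUNK = 4  # hypothetical threshold, kept tiny for illustration


def load(keys, shards):
    """Insert keys one by one; split over-full chunks and spread them over shards."""
    # Chunks are kept in key-range order; each records its keys and its shard.
    chunks = [{"keys": [], "shard": shards[0]}]
    for key in keys:
        # Range partitioning: use the last chunk whose smallest key is <= key,
        # falling back to the first chunk for keys below every boundary.
        target = chunks[0]
        for chunk in chunks:
            if chunk["keys"] and chunk["keys"][0] <= key:
                target = chunk
        target["keys"].append(key)
        target["keys"].sort()
        if len(target["keys"]) > MAX_DOCS_PER_CHUNK:
            # Split at the median; the upper half becomes a new chunk.
            mid = len(target["keys"]) // 2
            upper = target["keys"][mid:]
            target["keys"] = target["keys"][:mid]
            # Place the new chunk on the shard that currently holds the fewest chunks.
            chunks_per_shard = {s: 0 for s in shards}
            for chunk in chunks:
                chunks_per_shard[chunk["shard"]] += 1
            new_shard = min(shards, key=lambda s: chunks_per_shard[s])
            chunks.insert(chunks.index(target) + 1, {"keys": upper, "shard": new_shard})
    return chunks


result = load(list(range(20)), ["s0", "s1"])
```

The caller only supplies keys and a list of shards; where each document ends up falls out of the split and placement rules.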
  10. On Writes
      Write capacity becomes the sum of the shards’ capacities
  11. A digression
      A shard can actually live in a group of replicated servers
      Fault-tolerance is obtained that way
      Our focus here is incremental scalability and aggregated performance
  12. On Reads, I
      Lookup over the shard key or a prefix thereof
      Sharding at its best!
      Search criteria can be satisfied by a single chunk
      Lookup inside chunk uses index
      May or may not need to access the collection
      Example:
      Shard by user_id, return the user’s name
  13. On Reads, II
      Lookup over secondary index
      Not bad: merges results from shards
      Example: {country: "UK"} with secondary index over country
  14. On Reads, III
      Lookups where indexes won’t help
      Traversing shards sequentially or in parallel?*
      * 1.6
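
The three read categories in slides 12 to 14 can be summarized as a small decision function. This is a sketch of the classification only, under the assumption that a query is described by the set of fields it filters on; `plan_read` and its parameters are hypothetical names, not a MongoDB API.

```python
def plan_read(query_fields, shard_key, indexed_fields):
    """Classify a query into read category "I", "II", or "III".

    query_fields   -- fields the query filters on
    shard_key      -- tuple of fields forming the shard key, in order
    indexed_fields -- fields covered by a secondary index
    """
    fields = set(query_fields)
    # Category I: the query constrains the shard key or a leading prefix of it,
    # so it can be routed by key range instead of being sent everywhere.
    prefix_len = 0
    for part in shard_key:
        if part not in fields:
            break
        prefix_len += 1
    if prefix_len > 0:
        return "I"
    # Category II: no shard key, but a secondary index helps on every shard;
    # the per-shard results are then merged.
    if fields & set(indexed_fields):
        return "II"
    # Category III: no index helps, so every shard scans its part of the collection.
    return "III"


examples = [
    plan_read({"user_id"}, ("user_id",), {"country"}),
    plan_read({"country"}, ("user_id",), {"country"}),
    plan_read({"age"}, ("user_id",), {"country"}),
]
```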
  15. What is it about?
      It’s not about sharding, it’s resharding
      What can sharding do for you
      What you must do first to obtain it
      Use case
  16. The Sharding Key
      Choose wisely; you’re marrying it
      Often, you’re better off defining a unique key that stores data the application wants to query
      (Internally generated _id is really not it)
  17. Mind Your Queries
      Sure, dynamic partitioning is automatic
      But, ultimately, the system’s response time and scalability are tied to how your application queries it
      If the most important queries fall into category I, the remaining ones into II, and seldom any query that matters into III, you’ll be fine
  18. Pick Your Indexes
      MongoDB allows sharding and secondary indexes
      Critical queries that are not served by the sharding index can use help
      Sometimes, you can’t help them all…
      Index selection is a trade-off between querying and updates/insertions/deletions
  19. What is it about?
      It’s not about sharding, it’s resharding
      What can sharding do for you
      What you must do first to obtain it
      Use case
  20. Bit.ly History
      User creates URL shortener
      Sharding is used to store all past URLs of a user
      Sharding key: user_id
      Indexes: timestamp (desc)
      Queries:
      Shortened URLs by a given user
      Last n URLs by any user
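
The two queries in slide 20 map onto the read categories above: "URLs by a given user" touches one shard, while "last n URLs by any user" has to merge per-shard results ordered by timestamp. The sketch below simulates that with in-memory lists. The split points in `BOUNDARIES`, the tuple layout, and the function names are assumptions for illustration; nothing here is bit.ly's actual schema.

```python
import heapq
from bisect import bisect_right
from itertools import islice

BOUNDARIES = [100, 200]  # hypothetical user_id split points, giving 3 shards
shards = [[] for _ in range(len(BOUNDARIES) + 1)]


def insert(user_id, timestamp, url):
    """Store a shortened URL on the shard that owns user_id's key range."""
    shard = shards[bisect_right(BOUNDARIES, user_id)]
    shard.append((timestamp, user_id, url))
    # Keeping each shard newest-first stands in for an index on timestamp (desc).
    shard.sort(reverse=True)


def urls_by_user(user_id):
    """Query on the shard key: only the owning shard is consulted."""
    shard = shards[bisect_right(BOUNDARIES, user_id)]
    return [url for _, uid, url in shard if uid == user_id]


def last_n(n):
    """Query on the secondary index: merge the newest-first streams of all shards."""
    merged = heapq.merge(*shards, reverse=True)
    return [url for _, _, url in islice(merged, n)]


insert(5, 1, "a")
insert(150, 2, "b")
insert(5, 3, "c")
insert(250, 4, "d")
```

Here `urls_by_user(5)` reads a single list, whereas `last_n` consumes from every shard but stops after n entries because each stream is already ordered.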
  21. Take Away
      Picture to keep in mind
  22. Questions?
      www.mongodb.org