Hellenic MongoDB user group - Introduction to sharding

Hellenic MongoDB user group.
16 Jan 2013,
1st meetup

Published in: Technology

  • 1. Introduction to sharding. Christos Soulios, Software Architect, Persado. Jan 2013
  • 2. Let's start with an example
      • We have launched our latest and greatest web application.
      • We use the MongoDB database, which is fast and cool.
      • We have even set up replication for high availability.
      • Our application turns out to be popular and we are already planning our next project.
      • Cool!
  • 3. Unfortunately, our website becomes too popular too fast. And this causes problems.
  • 4. MongoDB problems when the dataset grows
      • The dataset does not fit on local disks. Solution: let's buy more disks.
      • Database indexes do not fit in memory. They have to be paged in and out, and the database becomes sluggish. Solution: let's buy more memory.
      • High-throughput write operations cause high contention on the infamous MongoDB locks. Now what?
      • We need to scale horizontally. We need sharding.
  • 5. What is sharding?
      • Sharding is automatic data partitioning.
      • It distributes data evenly across cluster nodes (called shards).
      • It allows for seamless querying. Almost no functionality is lost compared to a single master.
      • It keeps the database consistent.
  • 6. How sharding works
      • Collection data is broken into chunks based on ranges of a selected collection field. This field is called the shard key.
      • Chunks are evenly distributed across shards. Each data chunk is controlled by a single shard.
      • Special config servers are responsible for storing which shard controls which chunks.
      • Database clients communicate with the shards through the mongos router process.
      • The mongos router appears to the client as a normal mongod server. Sharding is transparent to the client.
      • For each database operation, the mongos router queries the config servers using the shard key and redirects the operation to the correct shards.
      • As more data is inserted, ranges are split into more chunks.
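The range lookup described on this slide can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how mongos is implemented: the chunk boundaries, the shard names, and the `shard_for` helper are all hypothetical, and the shard key is assumed to be `user_id`.

```python
from bisect import bisect_right

# Hypothetical chunk table, as the config servers might record it.
# Chunk i covers shard key values from CHUNK_LOWER_BOUNDS[i] (inclusive)
# up to the next lower bound (exclusive). Values are illustrative only.
CHUNK_LOWER_BOUNDS = [float("-inf"), 1000, 3000, 5000]
CHUNK_OWNERS = ["shard0", "shard1", "shard2", "shard0"]


def shard_for(user_id):
    """Return the shard that owns the chunk containing user_id."""
    index = bisect_right(CHUNK_LOWER_BOUNDS, user_id) - 1
    return CHUNK_OWNERS[index]


for user_id in (45, 4503, 1153, 5434):
    print(user_id, "->", shard_for(user_id))
```

Note that a shard can own several chunks (here `shard0` owns the first and the last range), which is what lets the system move individual chunks around later.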
  • 7. Example (Users collection)
      {'user_id': 45, 'username': 'asterix', 'email': …, 'last_login': '11/11/2012'},
      {'user_id': 4503, 'username': 'gandalf', 'email': …, 'last_login': '01/14/2013'},
      {'user_id': 1153, 'username': 'superman', 'email': …, 'last_login': '10/30/2012'},
      {'user_id': 5434, 'username': 'darth_vader', 'email': …, 'last_login': '07/01/2012'}
      > db.runCommand( { shardCollection: "test.users", key: { username: 1 } } )
  • 8. Shard architecture (sharding by user_id)
  • 9. Database operations
      • All queries are routed through the mongos process.
      • Insert operations are routed by shard key. The shard key is required.
      • Querying by shard key routes the query only to the shards that own the matching chunks.
      • Querying by a non-shard key scatters the query to all shards and gathers the results.
      • Updates and deletes behave like queries.
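The difference between a targeted query and a scatter-gather query can be shown with a small in-memory model. This is a sketch under assumptions: the `SHARDS` layout, the `user_id` ranges, and the `find` helper are invented for illustration and only handle equality queries.

```python
# Hypothetical in-memory "cluster": three shards holding the users from the
# example, partitioned by user_id ranges chosen only for illustration.
SHARDS = {
    "shard0": {"range": (0, 1000),
               "docs": [{"user_id": 45, "username": "asterix"}]},
    "shard1": {"range": (1000, 3000),
               "docs": [{"user_id": 1153, "username": "superman"}]},
    "shard2": {"range": (3000, 10000),
               "docs": [{"user_id": 4503, "username": "gandalf"},
                        {"user_id": 5434, "username": "darth_vader"}]},
}


def find(query):
    """Return (matching docs, shards contacted) for an equality query."""
    if "user_id" in query:
        # Targeted: only the shard whose range contains the shard key value.
        targets = [name for name, shard in SHARDS.items()
                   if shard["range"][0] <= query["user_id"] < shard["range"][1]]
    else:
        # Scatter-gather: no shard key in the query, so every shard is asked.
        targets = list(SHARDS)
    results = [doc for name in targets for doc in SHARDS[name]["docs"]
               if all(doc.get(field) == value for field, value in query.items())]
    return results, targets


print(find({"user_id": 4503}))        # contacts one shard
print(find({"username": "asterix"}))  # contacts all shards
```

Both calls return the right document, but the second one costs a round trip to every shard, which is why the shard key should match the most critical queries.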
  • 10. Data balancing
      • The system becomes unbalanced when one shard stores more data chunks than the others.
      • Data is automatically balanced without intervention from the client application or the administrator.
  • 11. Data balancing
      • The range of the loaded shard is split and chunks are migrated to other shards.
  • 12. Data balancing
      • Config servers are updated using a two-phase commit process to ensure database consistency.
      • The system ends up balanced.
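A toy version of the balancing loop from the last three slides might look like the following. It models only chunk counts per shard; the `balance` function and its `threshold` parameter are hypothetical stand-ins, and the two-phase commit of the config metadata is left out entirely.

```python
def balance(chunk_counts, threshold=1):
    """Move one chunk at a time from the fullest to the emptiest shard.

    chunk_counts maps shard name -> number of chunks it holds. The loop
    stops once the gap between the fullest and the emptiest shard is at
    most `threshold` (an illustrative value, not MongoDB's real setting).
    """
    counts = dict(chunk_counts)  # work on a copy
    migrations = []
    while True:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] <= threshold:
            return counts, migrations
        counts[fullest] -= 1
        counts[emptiest] += 1
        migrations.append((fullest, emptiest))


final_counts, moves = balance({"shard0": 10, "shard1": 2, "shard2": 3})
print(final_counts)
print(moves)
```

In this run every migration starts from the overloaded `shard0`, and the cluster ends with the same number of chunks on each shard.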
  • 13. Choosing a shard key
      • Choosing a good shard key is critical. Once chosen, we are stuck with it.
      • The shard key must be immutable.
      • It should distribute the data load evenly across shards.
      • It should be of high cardinality. Enumerated values are not good shard keys.
      • It should not be monotonically increasing. ObjectIds, dates or database sequences are not good shard keys, because they create hotspots.
      • It should be used by the most critical queries to provide query isolation. Avoid scatter-gather queries.
      • It should provide good data affinity to avoid disk-to-memory transfers (random values are not good shard keys).
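The hotspot caused by a monotonically increasing key can be seen by counting which chunk each write lands on. This is a minimal sketch: the split points, the key space 0..999, and the multiplicative scattering are assumptions chosen only to make the contrast visible.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical split points dividing the key space 0..999 into four chunks.
SPLIT_POINTS = [250, 500, 750]


def chunk_index(key):
    """Index of the range chunk that contains key."""
    return bisect_right(SPLIT_POINTS, key)


def write_distribution(keys):
    """Count how many writes land on each chunk."""
    return Counter(chunk_index(key) for key in keys)


# Monotonically increasing keys (like a sequence): the newest 100 inserts.
sequential = range(900, 1000)
# A toy stand-in for keys that are spread over the whole key space.
scattered = [(n * 37) % 1000 for n in sequential]

print("sequential:", dict(write_distribution(sequential)))
print("scattered: ", dict(write_distribution(scattered)))
```

With sequential keys every write goes to the last chunk, so a single shard takes the whole insert load. The scattered keys spread writes over all four chunks, but as the slide warns, fully random keys give up data locality, which motivates the compound key discussed later.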
  • 14. Choosing a shard key
      • Know your data. It is important.
      • What is the expected dataset size?
      • What is the write throughput?
      • What does the data look like? Which fields are random or increasing? Are there low-cardinality fields?
      • Can we identify any access patterns for reads? What data is indexed?
      • What is the active working set? Is there historical data that is no longer used after some time?
  • 15. Choosing a shard key
      • It is not trivial.
      • Most of the time there is no single field that can be used as a shard key.
      • We have to invent one.
  • 16. Choosing a shard key
      • Applications usually access recently inserted data more often.
      • What about a compound shard key? What about a combination of a coarsely ascending field and a commonly queried search key?
      • The coarsely ascending key should have a few hundred chunks per value. This provides good data locality and even distribution.
      • The search key provides query isolation.
      • Rule of thumb: {coarseLocality: 1, search: 1}
  • 17. Example (Tweets collection)
      {user: 'asterix', ts: ISODate("2013-01-14T22:53:33.123Z"), month: '2013-01', retweets: 45, client: 'TweetDeck', text: 'MongoDB sharding is super cool!'}
      • We are typically looking for the latest tweets of a user. Therefore, a combination of the 'month' and 'user' fields would make a good shard key.
      • The month field is coarsely ascending, so only the latest tweets need to be brought into memory.
      • The user field is a commonly searched key.
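Routing on a compound key such as month + user can be sketched with Python tuples, which compare field by field just like a compound key with month first. The chunk boundaries, shard names, and the `shard_for` helper below are hypothetical and only illustrate the ordering.

```python
from bisect import bisect_right

# Hypothetical chunk lower bounds on the compound key (month, user).
# A chunk covers keys from its lower bound up to the next lower bound.
CHUNK_LOWER_BOUNDS = [
    ("", ""),          # everything before the first split point
    ("2012-12", "m"),
    ("2013-01", ""),
    ("2013-01", "m"),
]
CHUNK_OWNERS = ["shard0", "shard1", "shard2", "shard0"]


def shard_for(month, user):
    """Return the shard owning the chunk that contains (month, user)."""
    index = bisect_right(CHUNK_LOWER_BOUNDS, (month, user)) - 1
    return CHUNK_OWNERS[index]


# The latest tweets of one user live in a single chunk on a single shard.
print(shard_for("2013-01", "asterix"))
# Another user in the same month may land on a different shard.
print(shard_for("2013-01", "superman"))
```

Because the month comes first, recent documents cluster in the newest chunks, while the user component splits each month into several chunks, so writes for the current month are not all pinned to one shard.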
  • 18. Conclusion
      • Sharding allows MongoDB databases to scale horizontally.
      • Shard balancing is performed automatically by the system.
      • Sharding is transparent to the client application.
      • Choosing a good shard key is critical.
      • Choosing a good shard key is not trivial.
      • Be creative and experiment with your data before choosing the shard key.
  • 19. Questions?