Your SlideShare is downloading. ×
Sharding
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Sharding

6,030

Published on

MongoDB was designed for humongous amounts of data, with the ability to scale horizontally via sharding. In this session, we’ll look at MongoDB’s approach to partitioning data, and the architecture of …

MongoDB was designed for humongous amounts of data, with the ability to scale horizontally via sharding. In this session, we’ll look at MongoDB’s approach to partitioning data, and the architecture of a sharded system. We’ll walk you through configuration of a sharded system, and look at how data is balanced across servers and requests are routed.

Published in: Technology
1 Comment
2 Likes
Statistics
Notes
No Downloads
Views
Total Views
6,030
On Slideshare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
269
Comments
1
Likes
2
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Introduction to Sharding Tyler Brock Software Engineer, 10genWednesday, March 27, 13
  • 2. Agenda • Scaling Data • MongoDBs Approach • Architecture • Configuration • MechanicsWednesday, March 27, 13
  • 3. Scaling DataWednesday, March 27, 13
  • 4. Examining GrowthWednesday, March 27, 13
  • 5. Examining Growth • User Growth – 1995: 0.4% of the world’s population – Today: 30% of the world is online (~2.2B) – Emerging Markets & MobileWednesday, March 27, 13
  • 6. Examining Growth • User Growth – 1995: 0.4% of the world’s population – Today: 30% of the world is online (~2.2B) – Emerging Markets & Mobile • Data Set Growth – Facebook’s data set is around 100 petabytes – 4 billion photos taken in the last year (4x a decade ago)Wednesday, March 27, 13
  • 7. Working Set Exceeds Physical MemoryWednesday, March 27, 13
  • 8. Read/Write Throughput Exceeds I/OWednesday, March 27, 13
  • 9. Vertical Scalability (Scale Up)Wednesday, March 27, 13
  • 10. Horizontal Scalability (Scale Out)Wednesday, March 27, 13
  • 11. Data Store Scalability • Custom Hardware – Oracle • Custom Software – Facebook + MySQL – GoogleWednesday, March 27, 13
  • 12. Data Store Scalability Today • MongoDB Auto-Sharding • A data store that is – Free – Publicly available – Open Source (https://github.com/mongodb/mongo) – Horizontally scalable – Application independentWednesday, March 27, 13
  • 13. MongoDBs Approach to ShardingWednesday, March 27, 13
  • 14. Partitioning • User defines shard key • Shard key defines range of data • Key space is like points on a line • Range is a segment of that lineWednesday, March 27, 13
  • 15. Data Distribution • Initially 1 chunk • Default max chunk size: 64mb • MongoDB automatically splits & migrates chunks when max reachedWednesday, March 27, 13
  • 16. Routing and Balancing • Queries routed to specific shards • MongoDB balances cluster • MongoDB migrates data to new nodesWednesday, March 27, 13
  • 17. MongoDB Auto-Sharding • Minimal effort required – Same interface as single mongod • Two steps – Enable Sharding for a database – Shard collection within databaseWednesday, March 27, 13
  • 18. ArchitectureWednesday, March 27, 13
  • 19. What is a Shard? • Shard is a node of the cluster • Shard can be a single mongod or a replica setWednesday, March 27, 13
  • 20. Meta Data Storage • Config Server – Stores cluster chunk ranges and locations – Can have only 1 or 3 (production must have 3) – Not a replica setWednesday, March 27, 13
  • 21. Routing and Managing Data • Mongos – Acts as a router / balancer – No local data (persists to config database) – Can have 1 or manyWednesday, March 27, 13
  • 22. Sharding infrastructureWednesday, March 27, 13
  • 23. ConfigurationWednesday, March 27, 13
  • 24. Example Cluster • Don’t use this setup in production! - Only one Config server (No Fault Tolerance) - Shard not in a replica set (Low Availability) - Only one mongos and shard (No Performance Improvement)Wednesday, March 27, 13
  • 25. Starting the Configuration Server • mongod --configsvr • Starts a configuration server on the default port (27019)Wednesday, March 27, 13
  • 26. Start the mongos Router • mongos --configdb <hostname>:27019 • For 3 configuration servers: mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3> • This is always how to start a new mongos, even if the cluster is already runningWednesday, March 27, 13
  • 27. Start the shard database • mongod --shardsvr • Starts a mongod with the default shard port (27018) • Shard is not yet connected to the rest of the cluster • Shard may have already been running in productionWednesday, March 27, 13
  • 28. Add the Shard • On mongos: - sh.addShard(‘<host>:27018’) • Adding a replica set:Wednesday, March 27, 13
  • 29. Verify that the shard was added • db.runCommand({ listshards:1 }) { "shards" : ! [{"_id”: "shard0000”,"host”: ”<hostname>:27018” } ], "ok" : 1 }Wednesday, March 27, 13
  • 30. Enabling Sharding • Enable sharding on a database sh.enableSharding(“<dbname>”) • Shard a collection with the given key sh.shardCollection(“<dbname>.people”,{“country”:1}) • Use a compound shard key to prevent duplicates sh.shardCollection(“<dbname>.cars”,{“year”:1, ”uniqueid”:1})Wednesday, March 27, 13
  • 31. Tag Aware Sharding • Tag aware sharding allows you to control the distribution of your data • Tag a range of shard keys – sh.addTagRange(<collection>,<min>,<max>,<tag>) • Tag a shard – sh.addShardTag(<shard>,<tag>)Wednesday, March 27, 13
  • 32. MechanicsWednesday, March 27, 13
  • 33. Partitioning • Remember its based on rangesWednesday, March 27, 13
  • 34. Chunk is a section of the entire rangeWednesday, March 27, 13
  • 35. Chunk splitting • A chunk is split once it exceeds the maximum size • There is no split point if all documents have the same shard key • Chunk split is a logical operation (no data is moved)Wednesday, March 27, 13
  • 36. Balancing • Balancer is running on mongos • Once the difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round startsWednesday, March 27, 13
  • 37. Acquiring the Balancer Lock • The balancer on mongos takes out a “balancer lock” • To see the status of these locks: use config db.locks.find({ _id: “balancer” })Wednesday, March 27, 13
  • 38. Moving the chunk • The mongos sends a moveChunk command to source shard • The source shard then notifies destination shard • Destination shard starts pulling documents from source shardWednesday, March 27, 13
  • 39. Committing Migration • When complete, destination shard updates config server - Provides new locations of the chunksWednesday, March 27, 13
  • 40. Cleanup • Source shard deletes moved data - Must wait for open cursors to either close or time out - NoTimeout cursors may prevent the release of the lock • The mongos releases the balancer lock after old chunks areWednesday, March 27, 13
  • 41. Routing RequestsWednesday, March 27, 13
  • 42. Cluster Request Routing • Targeted Queries • Scatter Gather Queries • Scatter Gather Queries with SortWednesday, March 27, 13
  • 43. Cluster Request Routing: Targeted QueryWednesday, March 27, 13
  • 44. Routable request receivedWednesday, March 27, 13
  • 45. Request routed to appropriate shardWednesday, March 27, 13
  • 46. Shard returns resultsWednesday, March 27, 13
  • 47. Mongos returns results to clientWednesday, March 27, 13
  • 48. Cluster Request Routing: Non-Targeted QueryWednesday, March 27, 13
  • 49. Non-Targeted Request ReceivedWednesday, March 27, 13
  • 50. Request sent to all shardsWednesday, March 27, 13
  • 51. Shards return results to mongosWednesday, March 27, 13
  • 52. Mongos returns results to clientWednesday, March 27, 13
  • 53. Cluster Request Routing: Non-Targeted Query with SortWednesday, March 27, 13
  • 54. Non-Targeted request with sort receivedWednesday, March 27, 13
  • 55. Request sent to all shardsWednesday, March 27, 13
  • 56. Query and sort performed locallyWednesday, March 27, 13
  • 57. Shards return results to mongosWednesday, March 27, 13
  • 58. Mongos merges sorted resultsWednesday, March 27, 13
  • 59. Mongos returns results to clientWednesday, March 27, 13
  • 60. Shard KeyWednesday, March 27, 13
  • 61. Shard Key • Shard key is immutable • Shard key values are immutable • Shard key must be indexed • Shard key limited to 512 bytes in size • Shard key used to route queries – Choose a field commonly used in queries • Only shard key can be unique across shards – `_id` field is only unique within individual shardWednesday, March 27, 13
  • 62. Shard Key Considerations • Cardinality • Write Distribution • Query Isolation • Reliability • Index LocalityWednesday, March 27, 13
  • 63. ConclusionWednesday, March 27, 13
  • 64. Read/Write Throughput Exceeds I/OWednesday, March 27, 13
  • 65. Working Set Exceeds Physical MemoryWednesday, March 27, 13
  • 66. Sharding Enables Scale • MongoDB’s Auto-Sharding – Easy to Configure – Consistent Interface – Free and Open SourceWednesday, March 27, 13
  • 67. • What’s next? – [ Insert Related Talks ] – [ Insert Upcoming Webinars ] – MongoDB User Group • Resources https://education.10gen.com/ http://www.10gen.com/presentationsWednesday, March 27, 13
  • 68. Thank You Tyler Brock Software Engineer, 10genWednesday, March 27, 13

×