Sharding

For the first time this year, 10gen will be offering a track completely dedicated to Operations at MongoSV, 10gen's annual MongoDB user conference on December 4. Learn more at MongoSV.com

Slide notes
  • Ops will be most interested in Configuration; Dev will be most interested in Mechanics.
  • Build – a document-oriented database maps perfectly to object-oriented languages. Scale – MongoDB presents a clear path to scalability that isn't ops-intensive, providing the same interface for a sharded cluster as for a single instance.
  • In 2005, there were two ways to achieve datastore scalability: lots of money to purchase custom hardware, or lots of money to develop custom software.
  • Indexes should be contained in the working set.
  • The solution to the preceding problems. (Add graphic?)
  • Add arrows for, or mention, the communication between shards (migrations).
  • Once the chunk size is reached, mongos asks mongod to split the chunk via an internal function called splitVector(). mongod counts the number of documents on each side of the split, based on the average document size from db.stats(). A chunk split is a **logical** operation (no data has moved). Max on the first chunk should be 14.
  • The moved chunk on shard2 should be gray.
  • Use this slide to talk about clients that have had this problem and worked around it by sharding.
  • Transcript

    • 1. Sharding – Speaker Name, Job Title, 10gen (#ConferenceHashtag)
    • 2. Agenda
      • Scaling data
      • Why shard
      • Architecture
      • Configuration
      • Mechanics
      • Solutions
    • 3. The story of scaling data
    • 4. AltaVista, 1995: Vertical Scalability [visual representation of vertical scaling]
    • 5. Google, ~2000: Horizontal Scalability [visual representation of horizontal scaling]
    • 6. Data Store Scalability in 2005
      • Custom Hardware – Oracle
      • Custom Software – Facebook + MySQL
    • 7. Data Store Scalability Today
      • MongoDB auto-sharding available in 2009
      • A data store that is
        – Free
        – Publicly available
        – Open source
        – Horizontally scalable
    • 8. Why shard?
    • 9. Working Set Exceeds Physical Memory
    • 10. Read/Write Throughput Exceeds I/O
    • 11. MongoDB Auto-Sharding
      • Minimal effort required
        – Same interface as a single mongod
      • Two steps
        – Enable sharding for a database
        – Shard a collection within the database
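      A minimal sketch of those two steps in the mongo shell, run against a mongos; the "records" database and "people" collection names are illustrative (they reappear on slide 24):

          // Step 1: enable sharding for the database.
          sh.enableSharding("records")
          // Step 2: shard a collection within it, on an illustrative key.
          sh.shardCollection("records.people", { country: 1 })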
    • 12. Architecture
    • 13. Architecture – Components
      • shard
        – Can be standalone or a replica set
    • 14. Architecture – Components
      • Config Server
        – Stores cluster metadata (chunk ranges and locations)
        – Can have only 1 or 3 (production must have 3)
        – Two-phase commit (not a replica set)
    • 15. Architecture – Components
      • Mongos
        – Acts as a router / balancer
        – No local data (persists to the config database)
        – Can have 1 or many
    • 16. Sharding infrastructure
    • 17. Configuration
    • 18. Example cluster setup
      • Don't use this setup in production!
        – Only one config server (no fault tolerance)
        – Shard not in a replica set (low availability)
        – Only one mongos and one shard (no performance improvement)
    • 19. Start the config server
      • "mongod --configsvr"
      • Starts a config server on the default port (27019)
    • 20. Start the mongos router
      • "mongos --configdb <hostname>:27019"
      • For 3 config servers: "mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3>"
      • This is always how to start a new mongos, even if the cluster is already running
    • 21. Start the shard database
      • "mongod --shardsvr"
      • Starts a mongod with the default shard port (27018)
      • Shard is not yet connected to the rest of the cluster
      • Shard may have already been running in production
    • 22. Add the shard
      • On mongos: "sh.addShard('<host>:27018')"
      • Adding a replica set: "sh.addShard('<rsname>/<seedlist>')"
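      Both forms in the mongo shell; the hostnames and replica set name below are hypothetical:

          // Standalone mongod as a shard.
          sh.addShard("shardhost.example.net:27018")
          // Replica set as a shard: <rsname>/<seedlist>.
          sh.addShard("rs0/node1.example.net:27018,node2.example.net:27018")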
    • 23. Verify that the shard was added
      • db.runCommand({ listshards: 1 })
      • { "shards" : [ { "_id" : "shard0000", "host" : "<hostname>:27018" } ], "ok" : 1 }
    • 24. Enabling Sharding
      • Enable sharding on a database
        – sh.enableSharding("records")
      • Shard a collection with the given key
        – sh.shardCollection("records.people", { "country": 1 })
      • Use a compound shard key to prevent duplicates
        – sh.shardCollection("records.cars", { "year": 1, "uniqueid": 1 })
    • 25. Tag Aware Sharding
      • Tag aware sharding allows you to control the distribution of your data
      • Tag a range of shard keys
        – sh.addTagRange(<collection>, <min>, <max>, <tag>)
      • Tag a shard
        – sh.addShardTag(<shard>, <tag>)
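      A sketch of the two calls together, assuming "records.people" is sharded on { country: 1 }; the shard name, tag, and key range are illustrative:

          // Tag a shard, then pin a shard-key range to that tag.
          sh.addShardTag("shard0000", "US")
          // Chunks whose country falls in [ "US", "UT" ) – i.e. string
          // values starting with "US" – stay on shards tagged "US".
          sh.addTagRange("records.people", { country: "US" }, { country: "UT" }, "US")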
    • 26. Mechanics
    • 27. Partitioning
      • User-defined shard key
      • Range-based partitioning
      • Initially 1 chunk
      • Default max chunk size: 64 MB
      • Once the max size is reached, a split occurs
    • 28. Partitioning
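      The resulting chunks are just metadata in the config database; one way to inspect them (a sketch, reusing the illustrative "records.people" namespace):

          // On a mongos: each document in config.chunks describes one chunk
          // by its min/max shard-key bounds and its owning shard.
          use config
          db.chunks.find({ ns: "records.people" }, { min: 1, max: 1, shard: 1 })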
    • 29. Chunk splitting
      • Once the max chunk size is reached, mongos asks the mongod to split the chunk (via an internal splitVector() function)
      • A split is a logical operation – no data is moved
    • 30. Balancing
      • The balancer runs on mongos
      • Once the difference in chunks between the most dense shard and the least dense shard exceeds the migration threshold, a balancing round starts
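      Shell helpers for checking on the balancer (a sketch; output shapes vary by version):

          sh.getBalancerState()   // true if the balancer is enabled
          sh.isBalancerRunning()  // true if a balancing round is in flight
          sh.status()             // shards, databases, and chunk distribution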
    • 31. Acquiring the Balancer Lock
      • The balancer on mongos takes out a "balancer lock"
      • To see the status of these locks:
        – use config
        – db.locks.find({ _id: "balancer" })
    • 32. Moving the chunk
      • The mongos sends a "moveChunk" command to the source shard
      • The source shard then notifies the destination shard
      • The destination clears the chunk's shard-key range
      • The destination shard starts pulling documents from the source shard
    • 33. Committing Migration
      • When complete, the destination shard updates the config server
        – Provides the new locations of the chunks
    • 34. Cleanup
      • The source shard deletes the moved data
        – Must wait for open cursors to either close or time out
        – noTimeout cursors may prevent the release of the lock
      • Mongos releases the balancer lock after the old chunks are deleted
    • 35. Cluster Request Routing
      • Targeted queries
      • Scatter-gather queries
      • Scatter-gather queries with sort
    • 36. Cluster Request Routing: Targeted Query
    • 37. Routable request received
    • 38. Request routed to appropriate shard
    • 39. Shard returns results
    • 40. Mongos returns results to client
    • 41. Cluster Request Routing: Non-Targeted Query
    • 42. Non-Targeted Request Received
    • 43. Request sent to all shards
    • 44. Shards return results to mongos
    • 45. Mongos returns results to client
    • 46. Cluster Request Routing: Non-Targeted Query with Sort
    • 47. Non-targeted request with sort received
    • 48. Request sent to all shards
    • 49. Query and sort performed locally
    • 50. Shards return results to mongos
    • 51. Mongos merges sorted results
    • 52. Mongos returns results to client
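      One way to see which of these routing modes a query takes is explain(), run through a mongos; a sketch against the illustrative "records.people" collection sharded on country:

          // Targeted: the predicate contains the shard key, so mongos can
          // route to the single shard owning that key range.
          db.people.find({ country: "US" }).explain()
          // Scatter-gather: no shard key, so every shard is queried; adding
          // a sort makes each shard sort locally and mongos merge the streams.
          db.people.find({ age: { $gt: 30 } }).sort({ age: 1 }).explain()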
    • 53. Shard Key
      • Choose a field common to queries
      • The shard key is immutable
      • Shard key values are immutable
      • The shard key requires an index on the fields contained in the key
      • Uniqueness of the _id field is only guaranteed within an individual shard
      • The shard key is limited to 512 bytes in size
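      The index requirement means a populated collection needs its shard-key index built before sharding; a sketch with the illustrative "records.cars" names from slide 24:

          // Build the supporting index first, then shard the collection.
          db.cars.createIndex({ year: 1, uniqueid: 1 })
          sh.shardCollection("records.cars", { year: 1, uniqueid: 1 })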
    • 54. Shard Key Considerations
      • Cardinality
      • Write distribution
      • Query isolation
      • Data distribution
    • 55. Sharding enables scale
    • 56. Working Set Exceeds Physical Memory
    • 57. Read/Write Throughput Exceeds I/O – Click tracking
    • 58. Thank You – Speaker Name, Job Title, 10gen (#ConferenceHashTag)
