For the first time this year, 10gen will be offering a track completely dedicated to Operations at MongoSV, 10gen's annual MongoDB user conference on December 4. Learn more at MongoSV.com

Slide notes
  • Ops will be most interested in Configuration; Dev will be most interested in Mechanics.
  • Build: a document-oriented database maps naturally to object-oriented languages. Scale: MongoDB presents a clear path to scalability that isn't ops-intensive, providing the same interface for a sharded cluster as for a single instance.
  • In 2005, there were two ways to achieve datastore scalability: spend lots of money on custom hardware, or spend lots of money developing custom software.
  • Indexes should be contained in the working set.
  • The solution to the preceding problems. (Add graphic?)
  • Add arrows for, or otherwise mention, the communication between shards (migrations).
  • Once the max chunk size is reached, mongos asks mongod to split the chunk via an internal function called splitVector(). mongod counts the number of documents on each side of the split, based on the average document size from `db.stats()`. A chunk split is a **logical** operation (no data has moved). Max on the first chunk should be 14.
  • Moved chunk on shard2 should be gray
  • Use this slide to talk about clients that have had this problem and worked around it by sharding.
  • Use this slide to talk about clients that have had this problem and worked around it by sharding.

Transcript

  • 1. Sharding. Speaker Name, Job Title, 10gen. #ConferenceHashtag
  • 2. Agenda• Scaling data• Why shard• Architecture• Configuration• Mechanics• Solutions
  • 3. The story of scaling data
  • 4. AltaVista, 1995: Vertical Scalability (visual representation of vertical scaling)
  • 5. Google, ~2000: Horizontal Scalability (visual representation of horizontal scaling)
  • 6. Data Store Scalability in 2005• Custom Hardware – Oracle• Custom Software – Facebook + MySQL
  • 7. Data Store Scalability Today• MongoDB auto-sharding available in 2009• A data store that is – Free – Publicly available – Open source – Horizontally scalable
  • 8. Why shard?
  • 9. Working Set Exceeds Physical Memory
  • 10. Read/Write Throughput Exceeds I/O
  • 11. MongoDB Auto-Sharding• Minimal effort required – Same interface as single mongod• Two steps – Enable Sharding for a database – Shard collection within database
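    A minimal sketch of those two steps from the mongo shell ("records" and "people" are illustrative names; the Configuration slides below walk through the commands in detail):

      sh.enableSharding("records")                          // step 1: enable sharding for a database
      sh.shardCollection("records.people", { country: 1 })  // step 2: shard a collection within it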
  • 12. Architecture
  • 13. Architecture – Components• shard – Can be standalone or a replica set
  • 14. Architecture – Components• Config Server – Stores cluster meta-data (chunk ranges and locations) – Can have only 1 or 3 (production must have 3) – Two phase commit (not a replica set)
  • 15. Architecture – Components• Mongos – Acts as a router / balancer – No local data (persists to config database) – Can have 1 or many
  • 16. Sharding infrastructure
  • 17. Configuration
  • 18. Example cluster setup• Don't use this setup in production! - Only one Config server (No Fault Tolerance) - Shard not in a replica set (Low Availability) - Only one Mongos and shard (No Performance Improvement)
  • 19. Start the config server• “mongod --configsvr”• Starts a config server on the default port (27019)
  • 20. Start the mongos router• “mongos --configdb <hostname>:27019”• For 3 config servers: “mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3>”• This is always how to start a new mongos, even if the cluster is already running
  • 21. Start the shard database• “mongod --shardsvr”• Starts a mongod with the default shard port (27018)• Shard is not yet connected to the rest of the cluster• Shard may have already been running in production
  • 22. Add the shard• On mongos: "sh.addShard('<host>:27018')"• Adding a replica set: "sh.addShard('<rsname>/<seedlist>')"
  • 23. Verify that the shard was added• db.runCommand({ listshards: 1 })• { "shards" : [ { "_id" : "shard0000", "host" : "<hostname>:27018" } ], "ok" : 1 }
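    Putting slides 18–23 together, a minimal single-shard demo setup might look like the following sketch (hostnames and dbpaths are illustrative; as slide 18 warns, never run one config server and one shard in production):

      # demo only: one config server, one router, one shard
      mongod --configsvr --dbpath /data/configdb
      mongos --configdb cfg1.example.net:27019
      mongod --shardsvr --dbpath /data/shard0

      # from a mongo shell connected to the mongos:
      sh.addShard("shard0.example.net:27018")
      db.adminCommand({ listShards: 1 })   // verify; the listshards command runs against the admin database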
  • 24. Enabling Sharding• Enable sharding on a database – sh.enableSharding("records")• Shard a collection with the given key – sh.shardCollection("records.people", { "country": 1 })• Use a compound shard key to prevent duplicates – sh.shardCollection("records.cars", { "year": 1, "uniqueid": 1 })
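    After running these commands, the standard sh.status() helper prints the shards, sharded databases, and chunk distribution, which is a quick way to confirm they took effect (output abridged and version-dependent):

      sh.status()
      // --- Sharding Status ---
      //   shards: { "_id" : "shard0000", "host" : "<hostname>:27018" }
      //   databases: { "_id" : "records", "partitioned" : true, ... }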
  • 25. Tag Aware Sharding• Tag aware sharding allows you to control the distribution of your data• Tag a range of shard keys – sh.addTagRange(<collection>,<min>,<max>,<tag>)• Tag a shard – sh.addShardTag(<shard>,<tag>)
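    A concrete sketch of the two helpers above (shard names, tags, and ranges are invented for illustration; tag ranges are min-inclusive, max-exclusive):

      sh.addShardTag("shard0000", "US")
      sh.addShardTag("shard0001", "EU")
      // keep documents with "DE" <= country < "FR" on shards tagged "EU"
      sh.addTagRange("records.people", { country: "DE" }, { country: "FR" }, "EU")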
  • 26. Mechanics
  • 27. Partitioning• User-defined shard key• Range-based partitioning• Initially 1 chunk• Default max chunk size: 64 MB• Once max size is reached, a split occurs
  • 28. Partitioning
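    The 64 MB default above is stored in the config database and can be inspected or changed from a mongos; a sketch (lowering it, e.g. for testing, increases split frequency cluster-wide):

      use config
      db.settings.find({ _id: "chunksize" })
      db.settings.save({ _id: "chunksize", value: 32 })   // set max chunk size to 32 MB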
  • 29. Chunk splitting• Once the max chunk size is reached, mongos asks mongod to split the chunk• An internal function called splitVector() determines the split point, based on document counts and average document size• A chunk split is a logical operation (no data has moved)
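    Splits normally happen automatically, but the shell also exposes helpers to request one by hand (collection and split point are illustrative):

      sh.splitFind("records.people", { country: "M" })   // split the chunk containing this doc at a median point
      sh.splitAt("records.people", { country: "M" })     // or split exactly at this key value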
  • 30. Balancing• Balancer is running on mongos• Once the difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts
  • 31. Acquiring the Balancer Lock• The balancer on mongos takes out a "balancer lock"• To see the status of these locks: - use config - db.locks.find({ _id: "balancer" })
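    Besides querying config.locks directly, the shell has helpers for checking and toggling the balancer, useful around maintenance windows (a sketch):

      sh.getBalancerState()    // is balancing enabled?
      sh.isBalancerRunning()   // is a balancing round in progress right now?
      sh.stopBalancer()        // disable balancing, waiting for any in-flight round
      sh.startBalancer()       // re-enable balancing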
  • 32. Moving the chunk• The mongos sends a “moveChunk” command to source shard• The source shard then notifies destination shard• The destination clears the chunk shard-key range• Destination shard starts pulling documents from source shard
  • 33. Committing Migration• When complete, destination shard updates config server - Provides new locations of the chunks
  • 34. Cleanup• Source shard deletes moved data - Must wait for open cursors to either close or time out - NoTimeout cursors may prevent the release of the lock• Mongos releases the balancer lock after old chunks are deleted
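    The migration machinery described in slides 32–34 can also be driven by hand with the moveChunk command (namespace, find document, and target shard are illustrative):

      // move the chunk containing this document to shard0001
      db.adminCommand({ moveChunk: "records.people",
                        find: { country: "UK" },
                        to: "shard0001" })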
  • 35. Cluster Request Routing• Targeted Queries• Scatter Gather Queries• Scatter Gather Queries with Sort
  • 36. Cluster Request Routing: Targeted Query
  • 37. Routable request received
  • 38. Request routed to appropriate shard
  • 39. Shard returns results
  • 40. Mongos returns results to client
  • 41. Cluster Request Routing: Non-Targeted Query
  • 42. Non-Targeted Request Received
  • 43. Request sent to all shards
  • 44. Shards return results to mongos
  • 45. Mongos returns results to client
  • 46. Cluster Request Routing: Non-Targeted Query with Sort
  • 47. Non-targeted request with sort received
  • 48. Request sent to all shards
  • 49. Query and sort performed locally
  • 50. Shards return results to mongos
  • 51. Mongos merges sorted results
  • 52. Mongos returns results to client
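    Which routing path a query takes can be observed by running explain() through mongos; a sketch using the earlier records.people example (exact output fields vary by version):

      // targeted: the predicate includes the shard key, so mongos contacts only matching shards
      db.people.find({ country: "US" }).explain()
      // scatter-gather: no shard key in the predicate, so every shard is queried
      db.people.find({ age: { $gt: 30 } }).explain()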
  • 53. Shard Key• Choose a field common to queries• Shard key is immutable• Shard key values are immutable• Shard key requires an index on the fields contained in the key• Uniqueness of the `_id` field is only guaranteed within an individual shard• Shard key limited to 512 bytes in size
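    Because the shard key needs a supporting index, and one is only built automatically when the collection is empty, sharding a populated collection typically starts by creating that index; a sketch using the compound key from slide 24:

      db.cars.ensureIndex({ year: 1, uniqueid: 1 })   // era-appropriate shell spelling of createIndex
      sh.shardCollection("records.cars", { year: 1, uniqueid: 1 })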
  • 54. Shard Key Considerations• Cardinality• Write distribution• Query isolation• Data Distribution
  • 55. Sharding enables scale
  • 56. Working Set Exceeds Physical Memory
  • 57. Read/Write Throughput Exceeds I/O: Click tracking
  • 58. Thank You. Speaker Name, Job Title, 10gen. #ConferenceHashTag