
Managing data and operation distribution in MongoDB

Managing data and operation distribution in MongoDB - Percona Live 2019, Austin


  1. Managing Data and Operation Distribution In MongoDB. Antonios Giannopoulos and Jason Terpko, DBAs @ Rackspace/ObjectRocket. linkedin.com/in/antonis/ | linkedin.com/in/jterpko/
  2. Introduction: Antonios Giannopoulos, Jason Terpko
  3. Overview • Sharded Cluster • Shard Key Selection • Shard Key Operations • Chunk Management • Data Distribution • Orphaned Documents • Q&A
  4. Sharded Cluster • Cluster Metadata • Data Layer • Query Routing • Cluster Communication
  5. Cluster Metadata
  6. Data Layer (diagram: shards s1, s2, …, sN)
  7. Replication. Data redundancy relies on an idempotent log of operations (the oplog).
  8. Query Routing (diagram: mongos routing to shards s1, s2, …, sN)
  9. Sharded Cluster (diagram: shards s1, s2, …, sN)
  10. Cluster Communication. How do independent components become a cluster and communicate?
     ● Replica Set ○ Replica Set Monitor ○ Replica Set Configuration ○ Network Interface ASIO Replication / Network Interface ASIO Shard Registry ○ Misc: replSetName, keyFile, clusterRole
     ● Mongos Configuration ○ configDB Parameter ○ Network Interface ASIO Shard Registry ○ Replica Set Monitor ○ Task Executor
     ● Post Add Shard ○ Collection config.shards ○ Replica Set Monitor ○ Task Executor Pool ○ config.system.sessions
  11. Primary Shard (diagram: database <foo> lives on its primary shard; shards s1, s2, …, sN)
  12. Collection UUID. With featureCompatibilityVersion 3.6, all collections are assigned an immutable UUID, recorded in config.collections in the cluster metadata and on the data layer (mongod).
  13. Collection UUID (continued). Important: • UUIDs for a namespace must match across the cluster • Use 4.0+ tools for a sharded cluster restore
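     A minimal pymongo sketch of that consistency check; the host names and the namespace foo.col are illustrative assumptions, not from the deck:

        # Sketch: compare the UUID in the cluster metadata with the UUID the
        # shard reports locally for the same namespace.
        from pymongo import MongoClient

        mongos = MongoClient("mongodb://mongos.example.com:27017")      # hypothetical host
        shard = MongoClient("mongodb://shard1-primary.example.com:27017")

        meta = mongos["config"]["collections"].find_one({"_id": "foo.col"})
        local = next(shard["foo"].list_collections(filter={"name": "col"}))

        # Both are BSON binary UUIDs; they must match for a healthy namespace.
        print(meta["uuid"] == local["info"]["uuid"])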
  14. Shard Key Selection • Profiling • Identify shard key candidates • Pick a shard key • Challenges
  15. Sharding (diagram): database <foo>, collection <foo> across shards s1, s2, …, sN. Shards are physical partitions; chunks are logical partitions.
  16. What is a Chunk? Chunks are the logical partitions your collection is divided into; the mission of the shard key is to create chunks and determine how data is distributed across the cluster.
     ● Maximum size is defined in config.settings ○ Default 64MB
     ● Before 3.4.11: hardcoded maximum document count of 250,000
     ● Version 3.4.11 and higher: 1.3 times the configured chunk size divided by the average document size
     ● The chunk map is stored in config.chunks ○ A continuous range from MinKey to MaxKey
     ● The chunk map is cached at both the mongos and the mongod ○ Query Routing ○ Sharding Filter
     ● Chunks are distributed by the Balancer ○ Using moveChunk ○ Up to maxSize
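     A minimal pymongo sketch for inspecting that chunk map; the mongos address and the namespace foo.col are illustrative assumptions:

        # Sketch: walk the chunk map for one namespace, in shard key order.
        from pymongo import MongoClient

        config = MongoClient("mongodb://mongos.example.com:27017")["config"]

        for chunk in config["chunks"].find({"ns": "foo.col"}).sort("min", 1):
            print(chunk["shard"], chunk["min"], "->", chunk["max"])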
  17. Shard Key Selection: Profiling. Profiling helps identify your workload. It requires level 2 (db.setProfilingLevel(2)), and you may need to increase the profiler size.
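     A hedged pymongo sketch of that setup; the host and the 512 MB capped size are illustrative assumptions:

        # Sketch: enlarge system.profile first, so a busy workload does not
        # wrap the capped collection too quickly, then enable full profiling.
        from pymongo import MongoClient

        db = MongoClient("mongodb://shard1.example.com:27017")["foo"]

        db.command("profile", 0)                   # profiling must be off to resize
        db["system.profile"].drop()
        db.create_collection("system.profile", capped=True, size=512 * 1024 * 1024)
        db.command("profile", 2)                   # level 2 captures all operations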
  18. Shard Key Selection: Profiling → Candidates. Export statement types with their frequency and statement patterns with their frequency; this produces a list of shard key candidates.
  19. Shard Key Selection: Profiling → Candidates → Built-in Constraints. The key and its values are immutable. It must not contain NULLs. Update and findAndModify operations must contain the shard key. Unique constraints must be maintained by a prefix of the shard key. A shard key cannot contain special index types (e.g. text). These constraints potentially reduce the list of candidates.
  20. Shard Key Selection: Profiling → Candidates → Built-in Constraints → Schema Constraints. Consider cardinality, monotonically increasing keys, data hotspots, operational hotspots, and targeted vs. scatter-gather operations.
  21. Shard Key Selection: Profiling → Candidates → Built-in Constraints → Schema Constraints → Future. Watch for poor cardinality, growth and data hotspots, data pruning and TTL indexes, and schema changes. Try to simulate the dataset at 3, 6, and 12 months.
  22. Shard Key Operations • Apply a shard key • Revert a shard key
  23. Apply a shard key:
     1) Create the associated index.
     2) Make sure the balancer is stopped: sh.stopBalancer(); sh.getBalancerState()
     3) Apply the shard key: sh.shardCollection("foo.col", {field1: 1, ..., fieldN: 1})
     4) Allow a burn period.
     5) Start the balancer.
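     The same procedure as a pymongo sketch; the host, namespace, and key pattern are illustrative assumptions:

        # Sketch: index, stop the balancer, shard the collection, then (after
        # a burn period) restart the balancer.
        from pymongo import MongoClient

        client = MongoClient("mongodb://mongos.example.com:27017")
        admin = client.admin

        client["foo"]["col"].create_index([("field1", 1)])   # supporting index
        admin.command("balancerStop")
        admin.command("enableSharding", "foo")               # if not already enabled
        admin.command("shardCollection", "foo.col", key={"field1": 1})
        # ...allow a burn period, then:
        admin.command("balancerStart")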
  24. Sharding (diagram): sh.shardCollection("foo.foo", <key>), a burn period, then sh.startBalancer(); chunks of database <foo> spread across shards s1, s2, …, sN.
  25. Revert a shard key. Reasons fall into two categories: those that affect functionality (exceptions, inconsistent data, …) and those that affect performance (operational hotspots, …).
     Dump/Restore:
     o Requires downtime for writes, and in some cases reads
     o A time-consuming operation
     o You may restore into a sharded or an unsharded collection
     o It is better to pre-create the indexes
     o The same or a new cluster can be used
     o A streaming dump/restore is an option
     o In special cases, such as time-series data, it can be fast
  26. Revert a shard key.
     Dual writes:
     o Mongo-to-Mongo connector or change streams
     o No downtime
     o Requires extra capacity
     o May increase latency
     o The same or a new cluster can be used
     o Adds complexity
     Alter the config database:
     o Requires downtime, but minimal
     o Easy during the burn period
     o Time-consuming if chunks are distributed
     o Has overhead during chunk moves
  27. Revert a shard key. Process:
     1) Disable the balancer: sh.stopBalancer()
     2) Move all chunks to the primary shard (skip during the burn period)
     3) Stop one secondary of the config server replica set (for rollback)
     4) Stop all mongos and all shards
     5) On the config server replica set primary execute:
        db.getSiblingDB('config').chunks.remove({ns: <collection name>})
        db.getSiblingDB('config').collections.remove({_id: <collection name>})
     6) Start all mongos and shards
     7) Start the stopped secondary of the config server replica set
     Rollback:
     • After step 6, stop all mongos and shards
     • Stop the running members of the config server replica set and wipe their data directories
     • Start all config server replica set members
     • Start all mongos and shards
  28. Revert a shard key. An online option was requested in SERVER-4000 and may be supported in 4.2. Further reading: Morphus: Supporting Online Reconfigurations in Sharded NoSQL Systems, http://dprg.cs.uiuc.edu/docs/ICAC2015/Conference.pdf
     Special use cases. Extending a shard key by adding field(s) ({a:1} to {a:1, b:1}):
     o Possible (and easier) if b's max and min (per a) are predefined
     o For example, {year, month} extended to {year, month, day}
     Reducing the fields of a shard key ({a:1, b:1} to {a:1}):
     o Possible (and easier) if all distinct "a" values are on the same shard
     o And there are no chunks with the same "a.min" (that adds complexity)
  29. Revert a shard key. Always perform a dry run. The balancer and autosplit must be disabled, and you must take downtime during the change. (There might be a more optimal code path, but the above one worked like a charm.)
  30. Chunk Splitting and Merging • Pre-splitting • Auto Splits • Manual Intervention
  31. Distribution Goal (diagram): database <foo>, 200G total, primary shard s1; the goal is an even 25% (50G) per shard across s1, s2, …, s4.
  32. Pre-Split – Hashed Keys. Shard keys using MongoDB's hashed index allow the use of numInitialChunks.
     Hashing mechanism: jdoe@gmail.com → MD5 694ea0904ceaf766c6738166ed89bafb → 64 bits of the MD5 as a 64-bit integer, NumberLong("7588178963792066406").
     Estimation:
     Size = collection size (in MB) / 32 → 1,600 = 51,200 / 32
     Count = number of documents / 125,000 → 800 = 100,000,000 / 125,000
     Limit = number of shards * 8192 → 32,768 = 4 * 8192
     numInitialChunks = Min(Max(Size, Count), Limit) → 1,600 = Min(Max(1600, 800), 32768)
     Command: db.runCommand({ shardCollection: "foo.users", key: { "uid": "hashed" }, numInitialChunks: 1600 });
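     The estimation as a small Python sketch, using the example figures from the slide:

        # Sketch: the numInitialChunks estimate described above.
        def num_initial_chunks(coll_size_mb, doc_count, shards):
            size = coll_size_mb // 32        # one chunk per 32 MB of data
            count = doc_count // 125_000     # one chunk per 125,000 documents
            limit = shards * 8192            # upper bound: 8192 chunks per shard
            return min(max(size, count), limit)

        print(num_initial_chunks(51_200, 100_000_000, 4))  # -> 1600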
  33. Pre-Split – Deterministic. Use case: a collection containing user profiles with email as the unique key. Prerequisites: 1. Shard key analysis complete 2. Understanding of access patterns 3. Knowledge of the data 4. Unique key constraint
  34. Pre-Split – Deterministic (Prerequisites → Split): perform the initial chunk splits.
  35. Pre-Split – Deterministic (Prerequisites → Split → Balance): balance the pre-split chunks.
  36. Pre-Split – Deterministic (Prerequisites → Split → Balance → Split): split further, as sketched below.
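     A pymongo sketch of deterministic splitting with the split command; the host and split points are hypothetical and would in practice come from the shard key analysis:

        # Sketch: split foo.users at chosen email boundaries (run via mongos).
        from pymongo import MongoClient

        admin = MongoClient("mongodb://mongos.example.com:27017").admin

        for boundary in ["e", "j", "o", "t"]:     # hypothetical split points
            admin.command("split", "foo.users", middle={"email": boundary})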
  37. Automatic Splitting. Controlling auto-split: sh.enableAutoSplit() / sh.disableAutoSplit(). Alternatively, the mongos: it is the component responsible for tracking statistics (bytes-written statistics), and there are usually multiple mongos servers for HA.
  38. Sub-Optimal Distribution (diagram): database <foo>, 200G total, primary shard s1; chunks are balanced, yet s1* holds 40% of the data while s2 and s4 hold 20% each.
  39–43. Maintenance – Splitting. Five helpful resources: collStats, config.chunks, dataSize, oplog.rs, and system.profile*. (*With setProfilingLevel at 2, analyze both reads and writes.) A dataSize sketch follows.
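     Of those, dataSize measures an individual chunk range. A hedged pymongo sketch; the host, namespace, and bounds are illustrative, with real bounds taken from config.chunks:

        # Sketch: measure the size and document count of one chunk's range.
        from pymongo import MongoClient

        db = MongoClient("mongodb://mongos.example.com:27017")["foo"]

        result = db.command(
            "dataSize", "foo.col",
            keyPattern={"field1": 1},
            min={"field1": "a"},          # chunk bounds from config.chunks
            max={"field1": "m"},
        )
        print(result["size"], result["numObjects"])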
  44. Sub-Optimal Distribution (diagram, repeated): chunks are balanced, yet s1* holds 40% of the data while s2 and s4 hold 20% each.
  45. Maintenance – Merging (Analyze): identify contiguous runs of small chunks.
  46. Maintenance – Merging (Analyze → Move): move the chunks of a run onto the same shard, since a merge requires contiguous chunks on one shard.
  47. Maintenance – Merging (Analyze → Move → Merge): merge the contiguous ranges, as sketched below.
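     A pymongo sketch of the merge step; the host and bounds are illustrative, with real values being the min of the first chunk and the max of the last contiguous chunk from config.chunks:

        # Sketch: merge a contiguous run of chunks that live on the same shard.
        from pymongo import MongoClient

        admin = MongoClient("mongodb://mongos.example.com:27017").admin

        admin.command(
            "mergeChunks", "foo.col",
            bounds=[{"field1": "a"}, {"field1": "m"}],  # [first min, last max]
        )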
  48. Balancing • Balancer overview • Balancing with defaults • Creating a better distribution • Improving balancing
  49. Balancer. The balancer process is responsible for redistributing the chunks of a sharded collection evenly among the shards, for every sharded collection. It takes into account the number of chunks, not the amount of data.
     Migration thresholds: fewer than 20 chunks → 2; 20–79 chunks → 4; 80 and greater → 8.
     Jumbo chunks: MongoDB cannot move a chunk if its document count is greater than 1.3 times the result of dividing the configured chunk size by the average document size. db.collection.stats() includes the avgObjSize field, which represents the average document size in the collection. Prior to 3.4.11 the maximum was 250,000 documents.
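     That jumbo threshold as a quick Python sketch, with example figures:

        # Sketch: the document-count ceiling above which a chunk becomes "jumbo".
        def max_docs_per_chunk(chunk_size_bytes, avg_obj_size_bytes):
            return int(1.3 * chunk_size_bytes / avg_obj_size_bytes)

        # 64 MB chunks, 512-byte average documents (avgObjSize from stats())
        print(max_docs_per_chunk(64 * 1024 * 1024, 512))  # -> 170393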
  50. Balancer. Parallel migrations: before 3.4, one migration at a time; from 3.4 on, parallel migrations as long as neither source nor destination is involved in another migration.
     Settings:
     chunkSize: default is 64M; lives in config.settings
     _waitForDelete: default is false; lives in config.settings
     _secondaryThrottle: default is true (from 3.4, WiredTiger uses false); lives in config.settings
     activeWindow: default is 24h; lives in config.settings
     maxSize: default is unlimited; lives in config.shards
     disableBalancing: disables/enables balancing per collection
     autoSplit: disables/enables splits
     A sketch of editing these settings follows.
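     A hedged pymongo sketch of adjusting two of those settings; the host and values are illustrative:

        # Sketch: chunk size and the balancing window live as documents in
        # config.settings.
        from pymongo import MongoClient

        config = MongoClient("mongodb://mongos.example.com:27017")["config"]

        config["settings"].update_one(
            {"_id": "chunksize"}, {"$set": {"value": 64}}, upsert=True
        )
        config["settings"].update_one(
            {"_id": "balancer"},
            {"$set": {"activeWindow": {"start": "01:00", "stop": "05:00"}}},
            upsert=True,
        )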
  51. Balancing. The balancer only cares about the number of chunks per shard (diagram: best case vs. our case vs. our goal).
  52. Balancing. The "apple algorithm" we are going to introduce is simple. For a collection, it requires an ordered chunk map with these attributes: chunk size, chunk bounds (min, max), and the shard each chunk belongs to.
     1) Pick the first chunk (current).
     2) Merge current with the next chunk.
     3) If the merged size is lower than a configured threshold, go to step 2.
     4) Otherwise, merge current with next and set next as current.
     Let's now see the implementation in Python (a condensed sketch follows; slides 53–57 show the full code).
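     A condensed sketch of that merge pass, assuming the chunk map entries are dicts like {"min": ..., "max": ..., "shard": ..., "size_mb": ...}; this is not the deck's exact code, which lives in the GitHub repo linked on slide 62:

        # Sketch: group an ordered chunk map into merge candidates that stay
        # below a size threshold; each group maps to one mergeChunks call.
        def plan_merges(chunks, threshold_mb=64):
            groups, current = [], None
            for chunk in chunks:
                if current is None:
                    current = dict(chunk)
                elif current["size_mb"] + chunk["size_mb"] < threshold_mb:
                    current["max"] = chunk["max"]          # extend merged range
                    current["size_mb"] += chunk["size_mb"]
                else:
                    groups.append(current)                 # threshold reached,
                    current = dict(chunk)                  # start a new group
            if current is not None:
                groups.append(current)
            return groups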
  53. Balancing – Variables (code shown on the slide; see the GitHub repo below).
  54. Balancing – Basic functions (code shown on the slide).
  55. Balancing – Main function (code shown on the slide).
  56. Balancing – Helper functions (code shown on the slide).
  57. Balancing – Output (shown on the slide).
  58. Balancing. Can the algorithm do better? Can we improve the balancing after running the script?
  59. Balancing. Making the bounds stricter and adding more parameters would improve it. Or chunk buckets may be the answer: the script produces chunks sized between chunksize/2 and chunksize, which improves balancing but may not achieve a perfect distribution. The idea is to categorize the chunks into buckets between chunksize/2 and chunksize and have each shard hold an equal number of chunks from each bucket.
  60. Balancing – Buckets. For example, with chunksize=64 we can create the following buckets:
     o Bucket 1 for sizes between 32 and 36 MiB
     o Bucket 2 for sizes between 36 and 40 MiB
     o Bucket 3 for sizes between 40 and 44 MiB
     o Bucket 4 for sizes between 44 and 48 MiB
     o Bucket 5 for sizes between 48 and 52 MiB
     o Bucket 6 for sizes between 52 and 56 MiB
     o Bucket 7 for sizes between 56 and 60 MiB
     o Bucket 8 for sizes between 60 and 64 MiB
     More buckets mean more accuracy, but may cause more chunk moves; the diversity of the chunk sizes plays a major role.
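     A tiny Python sketch of that bucket assignment (eight 4 MiB buckets over the 32–64 MiB range):

        # Sketch: map a chunk size to one of the buckets listed above.
        def bucket_for(size_mib, chunk_size_mib=64, buckets=8):
            low = chunk_size_mib / 2                   # 32 MiB
            width = (chunk_size_mib - low) / buckets   # 4 MiB per bucket
            return min(int((size_mib - low) // width) + 1, buckets)

        print(bucket_for(35))  # -> 1 (32-36 MiB)
        print(bucket_for(63))  # -> 8 (60-64 MiB)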
  61. Balancing – Buckets (diagram).
  62. Balancing – Get the code. GitHub repo: https://bit.ly/2M0LnxG
  63. Orphaned Documents • Definition • Issues • Cleanup
  64. Definition/Impact. Definition: orphaned documents are documents on a shard that also exist in chunks on other shards.
     How they can occur: a failed migration, a failed cleanup (RangeDeleter), or direct access to the shards.
     Impact: space, performance, and application consistency.
  65. Cleanup. cleanupOrphaned: must run on every shard; removes the orphans automatically; no dry run and poor reporting (a sketch follows). Draining shard(s): expensive in storage and performance; first locate the shards with orphans.
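     A pymongo sketch of driving cleanupOrphaned on one shard's primary, iterating with startingFromKey as documented for this era of MongoDB; the host and namespace are illustrative:

        # Sketch: sweep a whole namespace on one shard, range by range.
        from pymongo import MongoClient

        shard_admin = MongoClient("mongodb://shard1-primary.example.com:27017").admin

        next_key = {}
        while next_key is not None:
            result = shard_admin.command(
                "cleanupOrphaned", "foo.col", startingFromKey=next_key
            )
            next_key = result.get("stoppedAtKey")  # absent once the sweep is done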
  66. Cleanup Cont. There are ways to scan more intelligently (the queries run against the config database):
     • Skip unsharded collections: db.collections.find({"dropped": false}, {_id: 1})
     • Skip collections without migrations: db.changelog.distinct("ns", {"what": "moveChunk.start"})
     • Check the first event: the changelog is a capped collection
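     Combining the two filters as a pymongo sketch; the host is an illustrative assumption:

        # Sketch: candidate namespaces = sharded collections that had migrations.
        from pymongo import MongoClient

        config = MongoClient("mongodb://mongos.example.com:27017")["config"]

        sharded = {c["_id"] for c in config["collections"].find({"dropped": False}, {"_id": 1})}
        migrated = set(config["changelog"].distinct("ns", {"what": "moveChunk.start"}))

        print(sorted(sharded & migrated))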
  67. Cleanup Cont. An offline method to clean up orphans: mongodump/mongorestore the shards with orphans along with the config.chunks collection, then remove the documents in all ranges belonging to the shard(s); the "leftovers" are the orphaned documents. It's a bit trickier with "hashed" keys, since the chunk ranges cover the hashed values rather than the raw key values (the mongo shell's convertShardKeyToHashed() helper, available in 4.0+, maps a value to its hash).
  68. Questions?
  69. Rate Our Session
  70. We're Hiring! Looking to join a dynamic & innovative team? https://www.objectrocket.com/careers/ or email careers@objectrocket.com
  71. Thank you! Address: 401 Congress Ave, Suite 1950, Austin, TX 78701 | Support: 1-800-961-4454 | Sales: 1-888-440-3242 | www.objectrocket.com
