
How Sitecore depends on MongoDB for scalability and performance, and what it can teach you

Percona Live 2017 - How Sitecore depends on MongoDB for scalability and performance, and what it can teach you, by Antonios Giannopoulos and Grant Killian


  1. 1. How Sitecore depends on MongoDB for scalability and performance, and what it can teach you Antonios Giannopoulos Database Administrator – ObjectRocket Grant Killian Sitecore Architect - Rackspace Percona Live 2017
  2. 2. Agenda We are going to discuss: Key terms - Introduction to Sitecore - Introduction to MongoDB Best Practices for MongoDB with Sitecore Scaling Sitecore Benchmarks
  3. 3. Who We Are Antonios Giannopoulos Database Administrator w/ ObjectRocket Grant Killian Sitecore Architect w/ Rackspace Sitecore MVP
  4. 4. Sitecore Architecture Minimum necessary to understand this talk
  5. 5. Gartner Magic Quadrant for WCM (Web Content Management) -Sept 2016
  6. 6. Sitecore is a framework for building websites...
  7. 7. Sitecore ♥ MongoDB because . . . ● Unstructured document model is a better fit for Sitecore analytics vs traditional database rows ● ∞ scalability ● Introduces key flexibility to the system ○ HTTP Session state ○ Optional repository for other Sitecore modules ○ 100% replacement for SQL Server (experimental) ■ $$$
  8. 8. MongoDB replica-set A group of mongod processes that maintain the same dataset Replica sets provide: - Redundancy - High availability - Scaling
  9. 9. MongoDB replica-set Consists of at least 3 nodes - Up to 50 nodes in 3.0 and higher - 12 on previous versions A replica-set node may be either: - Primary - Secondary - Arbiter
  10. 10. MongoDB replica-set Asynchronous replication - Delay between PRI and SECs - SECs pull and apply operations Automatic failover - If a PRI fails a SEC takes its place
  11. 11. MongoDB replica-set Best Practices - Odd number of members - Use same server specs - Reliable network connections - Adjust the oplog accordingly
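
The "odd number of members" guideline follows from how elections work: a primary can only be elected by a majority of voting members. The sketch below works through that arithmetic; the function names are illustrative and not part of any MongoDB API.

```javascript
// Votes needed to elect a primary in a replica set with `members` voting nodes.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// Number of voting nodes that can be lost while a primary can still be elected.
function faultTolerance(members) {
  return members - majority(members);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

A fourth member raises the majority from 2 to 3 without raising the number of tolerated failures, which is why an even member count adds cost but no extra availability.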
  12. 12. MongoDB Sharded Clusters Consists of: Mongos - It’s a statement (query) router - Connection interface for the driver - makes sharding transparent Config Servers: Holds cluster metadata - location of the data Shards: Contains a subset of the sharded data
  13. 13. MongoDB Sharded Clusters
  14. 14. MongoDB Sharded Clusters Best Practices - Deploy shards as replica-sets - Reliable network connections - But most important… pick a shard key Undoing a shard key choice might require downtime
  15. 15. MongoDB Sharded Clusters What makes a good shard key: - High cardinality - No null values - Immutable field(s) - No monotonically increasing fields - Even read/write distribution - Even data distribution - Read targeting/locality Most important: choose a shard key according to your application requirements
  16. 16. MongoDB Storage Engines MongoDB version 3.0 and higher supports: - MMAPv1 - WiredTiger - RocksDB (Percona Server) - In Memory (Percona Server) - Fractal Tree (Percona Server)
  17. 17. Sitecore MongoDB Databases 1. Analytics - customer visit metrics (IP address, browser, pages…) 2. Tracking_contact - contact processing 3. Tracking_history - history worker queue for full rebuilds 4. Tracking_live - task queue for real-time processing 5. Private_session - “classic” http session state 6. Shared_session - meta http session state for contacts (engagement state for lifetime of interactions…)
  18. 18. For example . . . Graphic courtesy of http://www.techphoria414.com
  19. 19. Scaling Sitecore – Separate Workloads Move each Sitecore database to a separate instance Sitecore uses a different connection string per database connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_" /> connectionString="mongodb://_mongo_server_02_:_port_number_/_analytics_database_name_" /> Instances can be optimized according to their workload
  20. 20. Scaling Sitecore – Polyglot Use a different storage engine per database: - Different instances - Sharded clusters, different storage engines per shard Percona In-memory storage engine is a good fit for _sessions - Based on the in-memory storage engine used in MongoDB Enterprise Edition - _sessions data are not persistent
  21. 21. Scaling Sitecore - Sharding What to shard: - Large collections for capacity - Busy collections for load distribution How to pick a shard key: - Collect a representative statement sample and identify statement patterns - Pick a shard key that scales the workload/statements - Meet sharding constraints
  22. 22. Scaling Sitecore - Sharding From Sitecore documentation: “Sitecore calculates diskspace sizing projections using 5KB per interaction and 2.5KB per identified contact and these two items make up 80% of the diskspace” Sharding interaction and contact for capacity.
  23. 23. Scaling Sitecore - Sharding Collection Interaction Receives: Inserts, Queries and Updates Read/Write Ratio: 60-40 Updates are using the _id Queries are using: "_id, ContactId": 80% "ContactId, _t": 5% "ContactId, ContactVisitIndex": 15%
  24. 24. Scaling Sitecore - Sharding Collection Interaction Recommended shard key is _id:1 or _id:hashed - Scales the vast majority of statements - But… a few scatter-gather queries (around 20%) {ContactId:1} is also decent, but: - Updates on sharded collections MUST use the shard key (or {multi:true}) - _id is an exception to that rule - _id is generated by the application, not the driver - Potential for jumbo chunks
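
The "around 20%" scatter-gather figure can be reproduced from the query mix on the previous slide. The sketch below is a simplification: it assumes a query is routed to a single shard whenever it filters (by equality) on every shard key field, and broadcast to all shards otherwise. The function name and data layout are illustrative only.

```javascript
// Query patterns for the Interaction collection, with the shares quoted on the slides.
const interactionQueries = [
  { fields: ["_id", "ContactId"], share: 80 },
  { fields: ["ContactId", "_t"], share: 5 },
  { fields: ["ContactId", "ContactVisitIndex"], share: 15 },
];

// Sum the share of queries that do not include every shard key field
// (assumption: those are the ones mongos must scatter to all shards).
function scatterGatherShare(queries, shardKeyFields) {
  return queries
    .filter((q) => !shardKeyFields.every((f) => q.fields.includes(f)))
    .reduce((sum, q) => sum + q.share, 0);
}

console.log("shard key _id:", scatterGatherShare(interactionQueries, ["_id"]), "% scatter-gather");
console.log("shard key ContactId:", scatterGatherShare(interactionQueries, ["ContactId"]), "% scatter-gather");
```

Under this model {ContactId:1} targets every query pattern, which is why the slide calls it decent; the caveats listed on the slide concern updates and jumbo chunks rather than reads.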
  25. 25. Scaling Sitecore - Sharding Collection Interaction Choose your shard key according to your engine - MMAP _id:1 or _id:hashed - WiredTiger _id:1 or _id:hashed or ContactId:1 Sitecore may optimize sharding by including ContactId on the updates
  26. 26. Scaling Sitecore - Sharding Collection Contacts Receives: Inserts, Queries and Updates Read/Write Ratio: 80-20 Updates are using the _id Queries are using the _id (with additional fields) Recommended shard key is _id:1 or _id:hashed
  27. 27. Scaling Sitecore - Sharding Collection Devices Recommended shard key is _id:1 or _id:hashed Collection ClassificationsMap Recommended shard key is _id:1 or _id:hashed Collection KeyBehaviorCache Recommended shard key is _id:1 or _id:hashed
  28. 28. Scaling Sitecore - Sharding Collection GeoIps Recommended shard key is _id:1 or _id:hashed Collection OperationStatuses Recommended shard key is _id:1 or _id:hashed Collection ReferringSites Recommended shard key is _id:1 or _id:hashed
  29. 29. Scaling Sitecore - Sharding {_id:1} vs {_id:hashed} Client-generated _id values are monotonically increasing, thus "hashed" is added for randomness The Sitecore _id is a .NET UUID (Universally Unique Identifier) stored in the BinData datatype Example: "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==")
  30. 30. Scaling Sitecore - Sharding {_id:1} vs {_id:hashed} You may use the uuidhelpers.js utility to convert _id to UUID Download from: https://github.com/mongodb/mongo-csharp-driver/blob/master/uuidhelpers.js > doc = db.test.findOne() { "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==") } > doc._id.toCSUUID() CSUUID("d4c9e0d5-d4d5-47f0-a20f-96ba589b716c")
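
For readers without the mongo shell at hand, the conversion can be approximated in plain Node.js. This is a minimal sketch, not the code in uuidhelpers.js: it assumes the value is BinData subtype 3 written in the legacy C# byte order, where the first three UUID fields are stored little-endian. The function name base64ToCSUUID is made up for this example.

```javascript
// Convert the base64 payload of a BinData(3, ...) value (legacy C# byte order)
// into the canonical UUID string.
function base64ToCSUUID(base64) {
  const bytes = Buffer.from(base64, "base64");
  if (bytes.length !== 16) {
    throw new Error("expected 16 bytes, got " + bytes.length);
  }
  const field = (start, end, littleEndian) => {
    // Copy the slice so that reverse() does not modify `bytes` in place.
    const copy = Buffer.from(bytes.subarray(start, end));
    return (littleEndian ? copy.reverse() : copy).toString("hex");
  };
  return [
    field(0, 4, true),    // 4-byte field, byte order reversed
    field(4, 6, true),    // 2-byte field, byte order reversed
    field(6, 8, true),    // 2-byte field, byte order reversed
    field(8, 10, false),  // remaining bytes are kept in stored order
    field(10, 16, false),
  ].join("-");
}

console.log(base64ToCSUUID("1eDJ1NXU8EeiD5a6WJtxbA=="));
// prints d4c9e0d5-d4d5-47f0-a20f-96ba589b716c, matching the slide
```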
  31. 31. Scaling Sitecore - Sharding Use {_id:"hashed"} when you have an empty collection Using numInitialChunks lets you pre-split and distribute empty chunks. - Avoids chunk splits - Avoids chunk moves db.adminCommand( { shardCollection: <collection>, key: {_id:"hashed"}, numInitialChunks: <number> } ), where <number> is less than 8192 per shard.
  32. 32. Scaling Sitecore - Sharding Use {_id:"hashed"} when you have an empty collection Define numInitialChunks: Size = Collection size (in MB)/32 Count = Number of documents/125000 Limit = Number of shards*8192 numInitialChunks = Min(Max(Size, Count), Limit)
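
The sizing rule above can be worked through as a small function. This is a sketch under stated assumptions: rounding up with Math.ceil is an addition here (the slide does not say how to round), the projected size and document count are invented numbers, and "analytics.Interactions" is a placeholder namespace. The second function only builds the command document shown on the previous slide; it does not talk to a cluster.

```javascript
// numInitialChunks = Min(Max(Size, Count), Limit), as defined on the slide.
function numInitialChunks(collectionSizeMB, documentCount, shardCount) {
  const bySize = Math.ceil(collectionSizeMB / 32);      // Size
  const byCount = Math.ceil(documentCount / 125000);    // Count
  const limit = shardCount * 8192;                      // Limit
  return Math.min(Math.max(bySize, byCount), limit);
}

// Build the document to pass to db.adminCommand(...) in the mongo shell.
function shardCollectionCommand(namespace, chunks) {
  return { shardCollection: namespace, key: { _id: "hashed" }, numInitialChunks: chunks };
}

// Hypothetical projection: 64,000 MB and 500 million documents on 2 shards.
const chunks = numInitialChunks(64000, 500000000, 2);
console.log(chunks); // 4000: Size = 2000, Count = 4000, Limit = 16384
console.log(JSON.stringify(shardCollectionCommand("analytics.Interactions", chunks)));
```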
  33. 33. Scaling Sitecore - Sharding Move Primary Move each Sitecore database to a different shard: (analytics, tracking_live …) db.runCommand( { movePrimary: <databaseName>, to: <newPrimaryShard> } ) Requires downtime for live databases
  34. 34. Scaling Sitecore – Secondary Reads You can configure Secondary Reads from the driver (secondary or secondaryPreferred) connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary" /> In 3.4, maxStalenessSeconds was introduced to control stale reads It specifies, in seconds, how stale a secondary can be before the client stops using it for read operations
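
As a rough illustration of combining a read preference with maxStalenessSeconds, the helper below assembles a connection URI. The helper name, host and database are placeholders, and choosing secondaryPreferred is just one of the two modes the slide mentions. The check reflects MongoDB's documented minimum of 90 seconds for maxStalenessSeconds.

```javascript
// Build a MongoDB URI that sends reads to secondaries, optionally bounding staleness.
function secondaryReadUri(host, port, database, maxStalenessSeconds) {
  const params = new URLSearchParams({ readPreference: "secondaryPreferred" });
  if (maxStalenessSeconds !== undefined) {
    if (maxStalenessSeconds < 90) {
      throw new RangeError("maxStalenessSeconds must be at least 90");
    }
    params.set("maxStalenessSeconds", String(maxStalenessSeconds));
  }
  return `mongodb://${host}:${port}/${database}?${params.toString()}`;
}

console.log(secondaryReadUri("mongo01.example.com", 27017, "analytics", 120));
// mongodb://mongo01.example.com:27017/analytics?readPreference=secondaryPreferred&maxStalenessSeconds=120
```

When the URI is placed in an XML connectionString attribute, the `&` between options has to be written as `&amp;`.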
  35. 35. Scaling Sitecore – Secondary Reads Use ReplicaSet Tags to target reads: - Direct reads to specific replica set nodes - Reduces availability conf = rs.conf(); conf.members[0].tags = {"db": "analytics"}; rs.reconfig(conf) Set readPreferenceTags on the connection string, together with a non-primary read preference connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary&amp;readPreferenceTags=db:analytics" /> Order matters when setting multiple tags
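
"Order matters" refers to the fact that each readPreferenceTags option in the URI is one tag set, and the driver tries the tag sets in the order they appear. The sketch below builds such a URI from an ordered list; the helper name and host are placeholders, and the trailing empty tag set (meaning "any eligible member") is an optional fallback added for illustration.

```javascript
// Build a URI with one readPreferenceTags option per tag set, preserving order.
function taggedReadUri(host, port, database, tagSets) {
  const options = ["readPreference=secondary"];
  for (const tags of tagSets) {
    const pairs = Object.entries(tags).map(([key, value]) => `${key}:${value}`);
    options.push(`readPreferenceTags=${pairs.join(",")}`);
  }
  return `mongodb://${host}:${port}/${database}?${options.join("&")}`;
}

// Prefer members tagged db:analytics; fall back to any secondary if none match.
console.log(taggedReadUri("mongo01.example.com", 27017, "analytics", [{ db: "analytics" }, {}]));
```

Dropping the empty fallback set keeps reads strictly on tagged members, which is the availability trade-off the slide warns about.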
  36. 36. Scaling Sitecore – Multi Region Challenges: - Direct reads to the closest node - Direct writes to the closest node - Single database entity for reporting - Minimum complexity
  37. 37. Scaling Sitecore – Multi Region Replica Set: - Target reads using nearest read preference - Target reads using region based tags - Writes must go to the Primary - Requires at least one secondary per region
  38. 38. Scaling Sitecore – Multi Region Sharded cluster: - Target reads using nearest read preference - Target reads using region based tags - Requires at least one secondary per region - Writes must go to the Primaries - Tags or Zones are based on shard key ranges - Add location to shard key as prefix – change the source code
  39. 39. Scaling Sitecore – Multi Region Mongo to Mongo connector: - Creates a pipeline from a MongoDB cluster to another MongoDB cluster - Reads and replicates oplog operations - Easy deployment mongo-connector -m <name:port> -t <name:port> -d <database>
  40. 40. Scaling Sitecore – Connector oplog oplog db.foo.insert({a:1}) db.foo.insert({_id:1, a:1}) { "ts" : Timestamp(), "h" : NumberLong(), "v" : 2, "op" : "i", "ns" : "foo.foo", "o" : { "_id" : 1, "a" : 1 } }
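
To make the replay idea concrete, here is a toy model of applying oplog entries to a target. It is not mongo-connector's code: the target is an in-memory Map instead of a MongoDB cluster, only insert ("i") and delete ("d") entries are handled, and the function name is invented. The point it illustrates is that oplog entries always carry the document's _id, so replaying the same entry twice leaves the target unchanged.

```javascript
// Toy target store: namespace -> Map of _id -> document.
function applyOplogEntry(target, entry) {
  if (!target.has(entry.ns)) {
    target.set(entry.ns, new Map());
  }
  const collection = target.get(entry.ns);
  switch (entry.op) {
    case "i": // insert: "o" holds the full document, including _id
      collection.set(entry.o._id, entry.o);
      break;
    case "d": // delete: "o" holds the _id of the removed document
      collection.delete(entry.o._id);
      break;
    default:
      // Other operation types are ignored in this sketch.
      break;
  }
}

const target = new Map();
const insert = { op: "i", ns: "foo.foo", o: { _id: 1, a: 1 } };
applyOplogEntry(target, insert);
applyOplogEntry(target, insert); // replaying is harmless
console.log(target.get("foo.foo").size); // 1
```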
  41. 41. Scaling Sitecore – Multi Region Mongo to Mongo Connector
  42. 42. Scaling Sitecore – Multi Region Mongo to Mongo Connector
  43. 43. Scaling Sitecore – Multi Region Mongo to Mongo Connector
  44. 44. Benchmarks Benchmark 1: Single/Replica set MMAP vs Single shard/Replica set WiredTiger (3.2.8) Results: WiredTiger is 9.5% faster Benchmark 2: Sharded cluster MMAP vs Sharded cluster WiredTiger (Analytics sharded on {_id:1}) Results: WiredTiger is 9.4% faster
  45. 45. So what? - Evaluate your MongoDB architecture to determine if it would benefit from scaling - If scaling is in order, consider this talk as a reference - Recognize how MongoDB’s versatility makes it relevant to a wide variety of applications
  46. 46. What's next? - Test MongoRocks (Percona Server) against Sitecore - Test In-Memory (Percona Server) for sessions or cache(s) - Expand sharding recommendations on add-ons - Evaluate other Sitecore modules for suitability with MongoDB - Re-invent our benchmarks
  47. 47. We’re Hiring! Looking to join a dynamic & innovative team? Justine is here at Percona Live 2017. Reach out directly to our recruiter at justine.marmolejo@rackspace.com
  48. 48. Questions? Thank you!!! antonios.giannopoulos@rackspace.co.uk @iamantonios 🍍 grant.killian@rackspace.com @sitecoreagent
