How Sitecore depends on MongoDB for scalability and performance, and what it can teach you
May 5, 2017
Percona Live 2017 - How Sitecore depends on MongoDB for scalability and performance, and what it can teach you, by Antonios Giannopoulos and Grant Killian
1. How Sitecore depends on MongoDB
for scalability and performance, and
what it can teach you
Antonios Giannopoulos
Database Administrator – ObjectRocket
Grant Killian
Sitecore Architect - Rackspace
Percona Live 2017
2. Agenda
We are going to discuss:
Key terms
- Introduction to Sitecore
- Introduction to MongoDB
Best Practices for MongoDB with Sitecore
Scaling Sitecore
Benchmarks
3. Who We Are
Antonios Giannopoulos
Database Administrator w/ ObjectRocket
Grant Killian
Sitecore Architect w/ Rackspace
Sitecore MVP
9. Sitecore ♥ MongoDB because . . .
● Unstructured document model is a better fit for
Sitecore analytics vs traditional database rows
● ∞ scalability
● Introduces key flexibility to the system
○ HTTP Session state
○ Optional repository for other Sitecore modules
○ 100% replacement for SQL Server (experimental)
■ $$$
10. MongoDB replica-set
A group of mongod processes that maintain the same dataset
Replica sets provide:
- Redundancy
- High availability
- Scaling
11. MongoDB replica-set
Consists of at least 3 nodes
- Up to 50 nodes in 3.0 and higher
- 12 on previous versions
A replica-set node may be either:
- Primary
- Secondary
- Arbiter
13. MongoDB replica-set
Best Practices
- Odd number of members
- Use same server specs
- Reliable network connections
- Adjust the oplog accordingly
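The "odd number of members" advice follows from election arithmetic: a primary can only be elected by a majority of voting members. A minimal sketch of that arithmetic (the function name electionMath is illustrative, not a MongoDB API):

```javascript
// Majority needed to elect a primary, and how many voting members may fail
// before the replica set can no longer elect one.
function electionMath(votingMembers) {
  const majority = Math.floor(votingMembers / 2) + 1;
  return { majority, faultTolerance: votingMembers - majority };
}

for (const n of [3, 4, 5]) {
  const { majority, faultTolerance } = electionMath(n);
  console.log(`${n} members: majority ${majority}, tolerates ${faultTolerance} failure(s)`);
}
```

Going from 3 to 4 members raises the required majority without tolerating any additional failure, which is why an even member count buys nothing.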
14. MongoDB Sharded Clusters
Consists of:
Mongos
- It’s a statement (query) router
- Connection interface for the driver - makes sharding transparent
Config Servers: Hold cluster metadata - location of the data
Shards: Each contains a subset of the sharded data
16. MongoDB Sharded Clusters
Best Practices
- Deploy shards as replica-sets
- Reliable network connections
- But most important… pick a good shard key
Undoing a shard key choice might require downtime
17. MongoDB Sharded Clusters
What makes a good shard key:
- High Cardinality
- Non-null values
- Immutable field(s)
- Not monotonically increasing fields
- Even read/write distribution
- Even data distribution
- Read targeting/locality
Most important: choose a shard key according to your application requirements
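To illustrate why monotonically increasing fields make poor range shard keys, here is a small simulation. It is only a sketch: the chunk boundaries are made up, and hashKey uses the first 4 bytes of an MD5 digest as a stand-in, which is not MongoDB's exact hash function.

```javascript
const crypto = require("crypto");

// Hypothetical stand-in for a hashed shard key (NOT MongoDB's real hash).
function hashKey(value) {
  return crypto.createHash("md5").update(String(value)).digest().readUInt32BE(0);
}

// upperBounds holds the exclusive upper bound of every chunk except the last,
// which is open-ended (like a chunk ending at MaxKey).
function chunkIndex(key, upperBounds) {
  const i = upperBounds.findIndex((bound) => key < bound);
  return i === -1 ? upperBounds.length : i;
}

function countPerChunk(keys, upperBounds) {
  const counts = new Array(upperBounds.length + 1).fill(0);
  for (const key of keys) counts[chunkIndex(key, upperBounds)]++;
  return counts;
}

// 1000 new inserts whose keys keep growing past the existing boundaries.
const newKeys = Array.from({ length: 1000 }, (_, i) => 3000 + i);
const rangeCounts = countPerChunk(newKeys, [1000, 2000, 3000]);

// The same inserts with a hashed key over four equal ranges of the hash space.
const quarter = 2 ** 32 / 4;
const hashedCounts = countPerChunk(newKeys.map(hashKey), [quarter, 2 * quarter, 3 * quarter]);

console.log("range key :", rangeCounts);  // every insert lands in the last chunk
console.log("hashed key:", hashedCounts); // inserts spread across all chunks
```

With the range key, a single chunk (and therefore a single shard) absorbs all new writes; the hashed variant spreads them out.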
18. MongoDB Storage Engines
MongoDB version 3.0 and higher supports:
- MMAPv1
- WiredTiger
- RocksDB (Percona Server)
- In Memory (Percona Server)
- Fractal Tree (Percona Server)
19. Sitecore MongoDB Databases
1. Analytics - customer visit metrics (IP address, browser, pages…)
2. Tracking_contact - contact processing
3. Tracking_history - history worker queue for full rebuilds
4. Tracking_live - task queue for real-time processing
5. Private_session - “classic” http session state
6. Shared_session - meta http session state for contacts
(engagement state for lifetime of interactions…)
20. For example . . .
Graphic courtesy of http://www.techphoria414.com
21. Scaling Sitecore – Separate Workloads
Move each Sitecore database to a separate instance
Sitecore uses a different connection string per database
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_" />
connectionString="mongodb://_mongo_server_02_:_port_number_/_analytics_database_name_" />
Instances can be optimized according to their workload
22. Scaling Sitecore – Polyglot
Use a different storage engine per database:
- Different instances
- Sharded clusters, different storage engines per shard
Percona In-memory storage engine is a good fit for _sessions
- Based on the in-memory storage engine used in MongoDB Enterprise Edition
- _sessions data are not persistent
23. Scaling Sitecore - Sharding
What to shard:
- Large collections for capacity
- Busy collections for load distribution
How to pick a shard key:
- Collect a representative statement sample and identify statement patterns
- Pick a shard key that scales the workload/statements
- Meet sharding constraints
24. Scaling Sitecore - Sharding
From Sitecore documentation: “Sitecore calculates
diskspace sizing projections using 5KB per
interaction and 2.5KB per identified contact and
these two items make up 80% of the diskspace”
Sharding interaction and contact for capacity.
25. Scaling Sitecore - Sharding
Collection Interaction
Receives: Inserts, Queries and Updates
Read/Write Ratio: 60-40
Updates are using the _id
Queries are using:
"_id, ContactId": 80%
"ContactId, _t": 5%
"ContactId, ContactVisitIndex": 15%
26. Scaling Sitecore - Sharding
Collection Interaction
Recommended shard key is _id:1 or _id:hashed
- Scale vast majority of statements
- But… few scatter-gather queries (around 20%)
{ContactId:1} is also decent, But:
- Updates on sharded collections MUST use the shard key (or {multi:true}); _id is an exception to that rule
- _id is generated by the application not the driver
- Potential for Jumbo chunks
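The "around 20%" scatter-gather figure can be derived from the query mix reported for the Interaction collection. The sketch below assumes that a query is targeted only when it filters on the shard key field with an equality match; the function name targetedShare is illustrative.

```javascript
// Query patterns and their share of the Interaction workload, as reported above.
const patterns = [
  { fields: ["_id", "ContactId"], share: 80 },
  { fields: ["ContactId", "_t"], share: 5 },
  { fields: ["ContactId", "ContactVisitIndex"], share: 15 },
];

// Percentage of queries that can be routed to a single shard for a given
// single-field shard key (assumption: equality matches on that field).
function targetedShare(shardKeyField, queryPatterns) {
  return queryPatterns
    .filter((p) => p.fields.includes(shardKeyField))
    .reduce((sum, p) => sum + p.share, 0);
}

for (const key of ["_id", "ContactId"]) {
  const targeted = targetedShare(key, patterns);
  console.log(`${key}: ${targeted}% targeted, ${100 - targeted}% scatter-gather`);
}
```

On queries alone, ContactId targets everything; it is the update rule and the jumbo chunk risk listed above that keep _id as the recommendation.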
27. Scaling Sitecore - Sharding
Collection Interaction
Choose your shard key according to your engine
- MMAP _id:1 or _id:hashed
- WiredTiger _id:1 or _id:hashed or ContactId:1
Sitecore may optimize sharding by including ContactId on the updates
28. Scaling Sitecore - Sharding
Collection Contacts
Receives: Inserts, Queries and Updates
Read/Write Ratio: 80-20
Updates are using the _id
Queries are using the _id (with additional fields)
Recommended shard key is _id:1 or _id:hashed
29. Scaling Sitecore - Sharding
Collection Devices
Recommended shard key is _id:1 or _id:hashed
Collection ClassificationsMap
Recommended shard key is _id:1 or _id:hashed
Collection KeyBehaviorCache
Recommended shard key is _id:1 or _id:hashed
30. Scaling Sitecore - Sharding
Collection GeoIps
Recommended shard key is _id:1 or _id:hashed
Collection OperationStatuses
Recommended shard key is _id:1 or _id:hashed
Collection ReferringSites
Recommended shard key is _id:1 or _id:hashed
31. Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}
Client-generated _id values are often monotonically increasing, so "hashed" is used to add randomness
The Sitecore _id is a .NET UUID (Universally Unique Identifier) stored in the BinData datatype
Example: "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==")
32. Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}
You may use the uuidhelpers.js utility to convert _id to UUID
Download from: https://github.com/mongodb/mongo-csharp-driver/blob/master/uuidhelpers.js
>doc = db.test.findOne()
{ "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==") }
>doc._id.toCSUUID()
CSUUID("d4c9e0d5-d4d5-47f0-a20f-96ba589b716c")
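For readers without the mongo shell at hand, the same conversion can be sketched in Node.js. This assumes the legacy C# byte order used with BinData subtype 3, where the first three UUID groups are stored byte-reversed; the function name csharpLegacyUuid is illustrative and not part of uuidhelpers.js.

```javascript
// Convert the base64 payload of a BinData(3, ...) value written by the .NET
// driver into the canonical UUID string.
function csharpLegacyUuid(base64) {
  const bytes = Buffer.from(base64, "base64");
  if (bytes.length !== 16) throw new Error("expected 16 bytes");
  const hex = Buffer.concat([
    Buffer.from(bytes.subarray(0, 4)).reverse(), // 4-byte group, byte-swapped
    Buffer.from(bytes.subarray(4, 6)).reverse(), // 2-byte group, byte-swapped
    Buffer.from(bytes.subarray(6, 8)).reverse(), // 2-byte group, byte-swapped
    bytes.subarray(8, 16),                       // remaining 8 bytes as stored
  ]).toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

console.log(csharpLegacyUuid("1eDJ1NXU8EeiD5a6WJtxbA=="));
// → d4c9e0d5-d4d5-47f0-a20f-96ba589b716c
```

The output matches the CSUUID value shown in the shell session above.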
33. Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection
Using numInitialChunks lets you pre-split and distribute empty chunks.
- Avoid chunk splits
- Avoid chunk moves
db.adminCommand( { shardCollection: "<database>.<collection>", key: {_id:"hashed"}, numInitialChunks: <number> } )
<number> must be less than 8192 per shard.
34. Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection
Define numInitialChunks
Size= Collection size (in MB)/32
Count= Number of documents/125000
Limit= Number of shards*8192
numInitialChunks = Min(Max(Size, Count), Limit)
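The formula above can be written as a small helper. This is a sketch: the parameter names are illustrative, the sizes are the expected figures for the collection once loaded, and rounding up to a whole chunk is an added assumption.

```javascript
// numInitialChunks = Min(Max(Size, Count), Limit), as defined above.
function numInitialChunks({ collectionSizeMB, documentCount, shardCount }) {
  const size = collectionSizeMB / 32;      // chunks needed by data size
  const count = documentCount / 125000;    // chunks needed by document count
  const limit = shardCount * 8192;         // upper bound per the slide
  // Assumption: round up so a partial chunk still counts as one chunk.
  return Math.min(Math.ceil(Math.max(size, count)), limit);
}

// Example with made-up numbers: 64,000 MB and 10 million documents on 3 shards.
console.log(numInitialChunks({ collectionSizeMB: 64000, documentCount: 10000000, shardCount: 3 }));
// → 2000
```

Here the size term (2000) dominates the count term (80) and stays well under the limit of 24576.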
35. Scaling Sitecore - Sharding
Move Primary
Move each Sitecore database to a different shard:
(analytics, tracking_live …)
db.runCommand( { movePrimary: <databaseName>, to: <newPrimaryShard> } )
Requires downtime for live databases
36. Scaling Sitecore – Secondary Reads
You can configure secondary reads from the driver (secondary or secondaryPreferred)
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary" />
In 3.4 maxStalenessSeconds was introduced to control stale reads
Specifies, in seconds, how stale a secondary can be before the client stops using
it for read operations
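A minimal sketch of assembling such a connection string, combining readPreference with maxStalenessSeconds. The helper buildMongoUri and the host name are hypothetical; the check reflects MongoDB's documented minimum of 90 seconds for maxStalenessSeconds.

```javascript
// Build a MongoDB URI with optional secondary-read settings.
function buildMongoUri({ host, port, database, readPreference, maxStalenessSeconds }) {
  const params = new URLSearchParams();
  if (readPreference) params.set("readPreference", readPreference);
  if (maxStalenessSeconds !== undefined) {
    if (maxStalenessSeconds < 90) {
      throw new RangeError("maxStalenessSeconds must be at least 90");
    }
    params.set("maxStalenessSeconds", String(maxStalenessSeconds));
  }
  const query = params.toString();
  return `mongodb://${host}:${port}/${database}${query ? "?" + query : ""}`;
}

// Hypothetical host and database names, for illustration only.
console.log(buildMongoUri({
  host: "mongo01.example.net",
  port: 27017,
  database: "analytics",
  readPreference: "secondaryPreferred",
  maxStalenessSeconds: 120,
}));
```

When the resulting URI is pasted into an XML configuration file, each & between options has to be written as &amp;.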
37. Scaling Sitecore – Secondary Reads
Use ReplicaSet Tags to target reads:
- Direct reads to specific replica set nodes
- Reduces availability
conf = rs.conf();
conf.members[0].tags = {"db": "analytics"}
rs.reconfig(conf)
Set readPreferenceTags on the connection string (tags are key:value pairs and need a non-primary read preference)
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary&amp;readPreferenceTags=db:analytics" />
Order matters when setting multiple tags
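Why order matters: drivers try the tag sets in the order given and use the first one that matches at least one eligible member. The sketch below mimics that rule with plain objects; it is an illustration of the selection logic, not driver code, and the member names and tags are made up.

```javascript
// Return the members matching the first tag set that matches anything.
// An empty tag set {} matches every member and is the usual fallback.
function selectByTagSets(members, tagSets) {
  for (const tagSet of tagSets) {
    const matches = members.filter((m) =>
      Object.entries(tagSet).every(([key, value]) => (m.tags || {})[key] === value)
    );
    if (matches.length > 0) return matches;
  }
  return []; // nothing eligible: the read fails
}

// Hypothetical replica set members.
const members = [
  { host: "node0", tags: { db: "analytics" } },
  { host: "node1", tags: { db: "sessions" } },
  { host: "node2", tags: {} },
];

console.log(selectByTagSets(members, [{ db: "analytics" }, {}]).map((m) => m.host));
console.log(selectByTagSets(members, [{ db: "reporting" }]).map((m) => m.host));
```

The second call returns nothing because no member carries the tag and there is no fallback tag set, which is the availability trade-off noted above.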
38. Scaling Sitecore – Multi Region
Challenges:
- Direct reads to the closest node
- Direct writes to the closest node
- Single database entity for reporting
- Minimum complexity
39. Scaling Sitecore – Multi Region
Replica Set:
- Target reads using the nearest read preference
- Target reads using region based tags
- Writes must go to the Primary
- Requires at least one secondary per region
40. Scaling Sitecore – Multi Region
Sharded cluster:
- Target reads using the nearest read preference
- Target reads using region based tags
- Requires at least one secondary per region
- Writes must go to the Primaries
- Tags or Zones are based on shard key ranges
- Add location to shard key as prefix – change the source code
41. Scaling Sitecore – Multi Region
Mongo to Mongo connector:
- Creates a pipeline from a MongoDB cluster to another
MongoDB cluster
- Reads and replicates oplog operations
- Easy deployment
mongo-connector -m <name:port> -t <name:port> -d <database>
46. Benchmarks
Benchmark 1: Single/Replica set MMAP vs Single shard/Replica set
WiredTiger (3.2.8)
Results: WiredTiger is 9.5% faster
Benchmark 2: Sharded cluster MMAP vs Sharded cluster
WiredTiger (Analytics sharded on {_id:1})
Results: WiredTiger is 9.4% faster
47. So what?
- Evaluate your MongoDB architecture to determine if it
would benefit from scaling
- If scaling is in order, consider this talk as a
reference
- Recognize how MongoDB’s versatility makes it
relevant to a wide variety of applications
48. What's next?
- Test MongoRocks (Percona Server) against Sitecore
- Test In-Memory (Percona Server) for sessions or
cache(s)
- Expand sharding recommendations on add-ons
- Evaluate other Sitecore modules for suitability with
MongoDB
- Re-invent our benchmarks
49. We’re Hiring!
Looking to join a dynamic & innovative team?
Justine is here at Percona Live 2017.
Reach out directly to our recruiter at justine.marmolejo@rackspace.com