Presented by Ger Hartnett, Manager, Technical Services, MongoDB
Experience level: Advanced
Ger will take you on a ride through some memorable customer stories. You'll hear about some of the more unusual MongoDB use cases, the idiosyncratic choices behind them, and their paths to success. You'll laugh, you'll cry, and you'll learn never to shard collections on booleans again.
About You - Hands Up if...
• You work in operations
• You work in development
• You have a MongoDB system in production
• You have contacted MongoDB Technical Services (support)
• How many of you attended this talk last year?
Stories
• We collect observations about common mistakes, to share the experience of many
• Names have been changed to protect the (mostly) innocent
• No animals were harmed during the making of this presentation (but maybe
some DBAs and engineers had light emotional scarring)
• While you might be new to MongoDB we have deep experience that you can
leverage
The Stories
• Discovering a DR flaw during a data centre outage
• Complex documents, memory and an upgrade “surprise”
• Wild success “uncovers” the wrong shard key
Story #1: Recovering from a Disaster
• Prospect in the process of signing up for a subscription
• Called us late on a Friday: a data centre power outage had taken down 30+ servers (11 shards)
• When they started bringing up the first shard, the nodes crashed with data
corruption
• 17TB of data, very little free disk space, JOURNALLING DISABLED!
Recovery Plan
• Multisite team worked with customer over weekend to put plan in place
• Stop everything
• Repair config servers with mongodump / mongorestore
• In each replica set
– Start secondary in read only mode
– Mount NFS storage for repaired files
– Repair a former primary node
– Iterative rsync to seed a secondary
Recovering Each Shard
• Start secondary read only
• Mount NFS storage for repair
• Repair former primary node
• Iterative rsync to seed a secondary
[Diagram: a replica set with a primary and two secondaries]
Implementing the Plan
• Multiple calls checking progress at every step
• Config servers repaired
• Read-only shards started
• Repairing each shard primary while doing document count checks (sketched below); some documents were missing, 9k on one shard
• Provided a method to “dump --repair” and diff to recover most of the 9k missing documents
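As an illustration of the document count checks, a minimal mongo shell sketch (the collection names and expected counts here are hypothetical, captured before the outage):

// Compare per-collection counts on a repaired node against
// counts recorded before the outage (illustrative values).
var expected = { orders: 1234567, products: 98765 };
db.getCollectionNames().forEach(function (name) {
  if (expected[name] === undefined) return;
  var actual = db.getCollection(name).count();
  if (actual !== expected[name]) {
    print(name + ": " + (expected[name] - actual) + " documents missing");
  }
});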
Aftermath and Lessons Learned
• Signed up for a subscription
• Enabled journalling
• Added a second data center with a RS member in each
• Put together disaster recovery procedures and backups, and tested them
Key Takeaways for You
• If you are departing significantly from the standard config, check with us (e.g. if you think journalling is a bad idea)
• Use two DCs in different buildings, on different flood plains, and not in the path of the same storm (e.g. secondaries in AWS)
• DR/backups are useless if you haven’t tested them
Story #2: Complex Documents, Memory and a “Surprise”
• Well established ecommerce site selling diverse goods in 20+ countries
• After switching to WiredTiger in production, performance dropped; this was the opposite of what they were expecting
Product Catalog: Original Schema
{ _id: 375,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... >
inventory: 423
}
What’s Good About this Schema
• Each document contains all the data about a given product, across all
languages/locales
• Very efficient way to retrieve the English, French, German, etc. translations of
a single product’s information in one query
However…
That is not how the product data is actually used
(except perhaps by translation staff)
Consequences
• WiredTiger reads/rewrites the whole document
• Each document contained ~20x more data than any common use case
needed
• MongoDB lets you request just a subset of a document’s contents (using a projection; see the sketch below), but…
– Typically the whole document is loaded into RAM
• There are other overheads (like readahead)
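To make the projection point concrete, a hedged sketch (the collection name is assumed; field names follow the schema above):

// Ask for only the en_US fields of one product via a projection.
db.products.find(
  { _id: 375 },
  { en_US: 1 }
)
// The projection trims what is sent back to the client, but the
// server still typically reads the whole multi-locale document
// from disk into RAM first.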
Visualising the Read Problem
{ _id: 42,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... > }
<READAHEAD OVERHEAD>
{ _id: 709,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... > }
<READAHEAD OVERHEAD>
{ _id: 3600,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... > }
- Data in RED is loaded into RAM and used.
- Data in BLUE takes up memory but is not required.
- Readahead padding in GREEN makes things even more inefficient.
What Did We Recommend?
• Design for your use case, your most common query pattern
– In this case: 99.99% of queries want the product data for exactly one
locale at a time
– Move the frequently changing fields to a new collection
• Eliminate inefficiencies on the system
– Make reading from disk less wasteful, maximise I/O capabilities by
reducing readahead
Product Catalog: Eventual Schema
{ _id: "375-en_US",
name : ..., description : ..., <etc...> }
{ _id: "375-en_GB",
name : ..., description : ..., <etc...> }
{ _id: "375-fr_FR",
name : ..., description : ..., <etc...> }
... and so on for other locales ...
db.inventory
{ _id: "375", count : NumberLong(1234), <etc...> }
Aftermath and Lessons Learned
• Faster updates
• Queries induced minimal overhead
• More than 20x as many distinct products fit in memory at once
• Disk I/O utilisation reduced
• UI latency decreased
Key Takeaways
• When doing a major version/storage-engine upgrade, test in staging with
some proportion of production data/workload
• Sometimes putting everything into one document is counterproductive
Story #3: Wild Success Uncovers the Wrong Shard Key
• Started out as the error “[Balancer] caught exception … tag ranges not valid for: db.coll”
• 11 shards; they had added 2 new shards to keep up with traffic; 400+ databases
• Lots of code changes ahead of the Super Bowl
• Spotted slow 300+s queries; decided to build some indexes without telling us
• Production went down
Diagnosing the Issues #1
• The red-herring hunt begins
• Transparent Huge Pages enabled in production
• Chaotic call: 20 people talking at once, then in the middle of the call everything started working again
• Barrage of tickets and calls
• Connection storms
Diagnosing the Issues #2
• Got inconsistent and missing log files
• Discovered repeated scatter-gather queries returning the same results
• Secondary reads
• Heavy load on some shards and low disk space
Diagnosing the Issues #3
• Shard Key – string with year/month and customer id
{
  _id: ObjectId("4c4ba5e5e8aabf3"),
  count: 1025,
  changes: { … },
  modified: { date: "2015_02", customerId: 314159 }
}
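One plausible reconstruction of how the collection was sharded on that key (the namespace matches the error message above; treat this as an assumption, not the customer's exact command):

// Compound shard key on the month string and customer id.
sh.shardCollection("db.coll", { "modified.date": 1, "modified.customerId": 1 })
// Because "modified.date" only changes once a month, a whole
// month's inserts land in a narrow range of chunks and shards.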
Diagnosing the Issues #4
• First heard about the DDoS attack
• Missing tag ranges on some collections
• Stopped the balancer, which reduced system load from chunk moves
• Two clusters had a mongos each on the same server
Fixing the Issues
• Script to fix the tag ranges
• Proposed a finer-granularity shard key, but this was not possible because of the 30TB of data
• Moved mongos processes to dedicated servers
• Re-enabled the balancer for short windows with waitForDelete and secondaryThrottle
• Put together scripts to pre-split and move empty chunks to quiet shards, based on traffic from the month before (see the sketch below)
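A hedged sketch of the pre-split-and-move idea (namespace, split points, and shard names are all illustrative):

// Run against a mongos before the new month begins.
var ns = "db.coll";
var quietShards = ["shard0005", "shard0009"];   // chosen from last month's traffic
var splitPoints = [100000, 200000, 300000];     // customer ids estimated from last month
splitPoints.forEach(function (cust, i) {
  sh.splitAt(ns, { "modified.date": "2015_03", "modified.customerId": cust });
  sh.moveChunk(ns, { "modified.date": "2015_03", "modified.customerId": cust },
               quietShards[i % quietShards.length]);
});

// Throttled balancing, set via the config database:
var conf = db.getSiblingDB("config");
conf.settings.update({ _id: "balancer" },
  { $set: { _secondaryThrottle: true, _waitForDelete: true } },
  { upsert: true });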
The Diagnosis in Retrospect
• The outage did not appear to have been related to either the invalid tag
ranges or the earlier failed moves
• The step-downs did not help resolve the outage but did highlight some
queries that needed to be fixed
• The DDoS was the ultimate cause of the outage and led to the diagnosis of
deeper issues
• The deepest issue was the shard key
Aftermath and Lessons Learned
• Signed up for a Named TSE
• Now doing pre-split and move before the end of every month
• Check with us before making other changes (e.g. building new indexes)
Key Takeaways for You
• Choosing a shard key is a pivotal decision - make it carefully
• Understand current bottleneck
• Monitor insert distribution and chunk ranges
• Look for slow queries (logs & mtools; see the sketch below)
• Run mongos, mongod, and config servers on dedicated servers, or use containers/cgroups
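Alongside log analysis with mtools, a quick in-shell check for long-running operations (the 30-second threshold here is arbitrary):

// Print operations that have been running for more than 30 seconds.
db.currentOp(true).inprog.forEach(function (op) {
  if (op.secs_running && op.secs_running > 30) {
    print(op.opid + "  " + (op.ns || "") + "  " + op.secs_running + "s");
  }
});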