Read this presentation to learn lessons from a real MongoDB Technical support story. You’ll see how three issues impacted the performance of a high-volume retail web application.
Learn how we diagnosed a sub-optimal data model (schema), an incorrect storage setting, and an under-tested upgrade to help the customer scale their application.
Fixing Sub-optimal Performance in a Retail Application
1. Ger Hartnett
Director of Technical Services (EMEA), MongoDB @ghartnett #MongoDB
Tales from the Field
Part two: Fixing Sub-optimal Performance in
a Retail Application
3. ●The main talk should take 30-35 minutes
●You can submit questions via the chat box
●We’ll answer as many as possible at the end
●We will send the slides and recording
tomorrow via email
●The final webinar in the series will take place
on Thursday 21st April – 14:00 BST | 15:00
CEST
Before we start
4. ●You work in operations
●You work in development
●You have a MongoDB system in production
●You have contacted MongoDB Technical
Services (support)
●You attended the last webinar (part 1)
A quick poll - add a word to the
chat to let me know your
perspective
5. ●We collect observations about common
mistakes to share the experience of many customers
●Names have been changed to protect the
(mostly) innocent
●No animals were harmed during the making
of this presentation (but maybe some DBAs
and engineers had light emotional scarring)
●While you might be new to MongoDB, we
have deep experience that you can leverage
Stories
6. 1. Discovering a DR flaw during a data
centre outage
2. Complex documents, memory and
an upgrade “surprise”
3. Wild success “uncovers” the wrong
shard key
The Stories (part two today)
8. Story #1: Recovering from a
disaster
●Prospect in the process of signing up for a
subscription
●Called us late on a Friday: a data centre power
outage had left 30+ servers (11 shards) down
●When they started bringing up the first
shard, the nodes crashed with data
corruption
●17TB of data, very little free disk space,
JOURNALLING DISABLED!
9. Recovering each shard
1. Start secondary read only
2. Mount NFS storage for repair
3. Repair former primary node
4. Iterative rsync to seed a secondary
(Diagram: replica set with one primary and two secondaries)
10. Key takeaways for you
●If you are departing significantly from
standard config, check with us (e.g. if you
think journalling is a bad idea)
●Use two DCs in different buildings on different
flood plains, not in the path of the same
storm (e.g. secondaries in AWS)
●DR/backups are useless if you haven’t
tested them
11. Story #2: Complex documents,
memory and an upgrade
“surprise”
●Well established ecommerce site selling
diverse goods in 20+ countries
●After switching to WiredTiger in production,
performance dropped – the opposite of
what they were expecting
12. {
_id: 375,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... >
inventory: 423
}
Product Catalog: Original
Schema
13. What’s good about this schema?
● Each document contains all the data about a given
product, across all languages/locales
● Very efficient way to retrieve the English, French,
German, etc. translations of a single product’s
information in one query
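For example, a single findOne() on this schema returns every locale's translation at once (the collection name db.catalog is an assumption for illustration):

db.catalog.findOne({ _id: 375 })  // returns en_US, en_GB, fr_FR, de_DE, ... in one round trip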
14. However…
That is not how the product data is
actually used
(except perhaps by translation staff)
17. Consequences
●WiredTiger reads/rewrites the whole document
●Each document contained ~20x more data than
any common use case needed
●MongoDB lets you request just a subset of a
document’s contents (using a projection), but…
o Typically the whole document is still loaded into RAM
●There are other overheads (like readahead)
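As a rough sketch of the projection point above (again assuming a db.catalog collection), the client can ask for just one locale, but the storage engine still reads and caches the entire document:

// Only the en_GB subdocument comes back over the wire...
db.catalog.findOne({ _id: 375 }, { en_GB: 1 })
// ...but the whole multi-locale document is typically loaded into RAM first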
18. { _id: 42,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... > }
<READAHEAD OVERHEAD>
{ _id: 709,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... > }
<READAHEAD OVERHEAD>
{ _id: 3600,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... > }
Visualising the read problem
- Data in RED are loaded into RAM
and used.
- Data in BLUE take up memory but
are not required.
- Readahead padding in GREEN
makes things even more inefficient
20. What did we recommend?
● Design for your use case, your most common query
pattern
o In this case: 99.99% of queries want the product
data for exactly one locale at a time
o Move the frequently changing fields to a new
collection
● Eliminate inefficiencies on the system
o Make reading from disk less wasteful, maximise I/O
capabilities by reducing readahead
21. { _id: "375-en_US",
name : ..., description : ..., <etc...> }
{ _id: "375-en_GB",
name : ..., description : ..., <etc...> }
{ _id: "375-fr_FR",
name : ..., description : ..., <etc...> }
... and so on for other locales ...
db.inventory
{ _id: "375", count : NumberLong(1234), <etc...> }
Product Catalog: Eventual
Schema
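With this split, the hot paths become small, targeted operations - a sketch assuming the per-locale documents live in a db.catalog collection (db.inventory is taken from the slide):

// Fetch exactly one locale's product data
db.catalog.findOne({ _id: "375-en_GB" })

// Adjust the frequently changing counter without rewriting any catalog document
db.inventory.update({ _id: "375" }, { $inc: { count: NumberLong(-1) } })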
22. Aftermath & lessons learned
●Faster updates
●Queries induced minimal overhead
●More than 20x as many distinct products fit in
memory at once
●Disk I/O utilization reduced
●UI latency decreased
23. Key Takeaways
●When doing a major version/storage-engine
upgrade, test in staging with some
proportion of production data/workload
●Sometimes putting everything into one
document is counter productive
25. Story #3: Wild success uncovers
the wrong shard key
●Started out as error “[Balancer] caught
exception … tag ranges not valid for: db.coll”
●11 shards, they had added 2 new shards to
keep up with traffic - 400+ databases
●Lots of code changes ahead of the
Super Bowl
●Spotted slow (300+ second) queries, and decided to
build some indexes without telling us
●Production went down
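The slide doesn't say how those indexes were built, but in MongoDB versions of that era a default (foreground) index build blocked operations on the database for its duration, so an unannounced build on a busy cluster could easily stall production. A sketch of the background option that was the usual workaround (collection and field names here are hypothetical):

db.orders.createIndex({ customerId: 1, createdAt: -1 }, { background: true })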
28. ●You can submit questions via the chat box
●We will send the slides and recording
tomorrow via email
●Part 3: the next webinar will take place on
Thursday 21st April – 14:00 BST | 15:00
CEST
www.mongodb.com/webinars
Questions
Some borrowed, some merged into a single narrative
Some of the people that inspired them may well be here in this room today
Bill's Bulk Updates randomly affected an ever larger data set.
In order to cope with the database size, Bill added more shards.
The cluster scaled linearly, as intended.
Well, it might fix things, but it’s expensive and the real problem is the efficiency