This document covers techniques for optimizing and scaling MongoDB deployments: schema design, indexing, workload monitoring, vertical scaling with resources such as RAM and SSDs, and horizontal scaling with sharding. The key recommendations are to optimize the schema and indexes before scaling, understand the workload, and ensure proper indexing when sharding for horizontal scale.
Has your app taken off? Are you thinking about scaling? MongoDB makes it easy to horizontally scale out with built-in automatic sharding, but did you know that sharding isn't the only way to achieve scale with MongoDB?
In this webinar, we'll review three different ways to achieve scale with MongoDB. We'll cover how you can optimize your application design and configure your storage to achieve scale, as well as the basics of horizontal scaling. You'll walk away with a thorough understanding of options to scale your MongoDB application.
Topics covered include:
- Scaling Vertically
- Hardware Considerations
- Index Optimization
- Schema Design
- Sharding
4. Premature Optimization
• "There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
- Donald Knuth, 1974
8. The Importance of Schema Design
• MongoDB schemas are designed in the opposite way from relational schemas!
• Relational Schema:
– normalize data
– write complex queries to join the data
– let the query planner figure out how to make queries efficient
• MongoDB Schema:
– denormalize the data
– create a (potentially complex) schema with prior knowledge of your actual (not just predicted) query patterns
– write simple queries
9. Real World Example: Optimizing Schema for Scale
Product catalog schema for a retailer selling in 20 countries:
{
    _id: 375,
    en_US: { name: …, description: …, <etc…> },
    en_GB: { name: …, description: …, <etc…> },
    fr_FR: { name: …, description: …, <etc…> },
    fr_CA: { name: …, description: …, <etc…> },
    de_DE: …,
    de_CH: …,
    <… and so on for other locales …>
}
10. What's good about this schema?
• Each document contains all the data about the product across all possible locales.
• It is the most efficient way to retrieve all translations of a product in a single query (English, French, German, etc.).
11. But that's not how the data was accessed
db.catalog.find( { _id: 375 }, { en_US: true } );
db.catalog.find( { _id: 375 }, { fr_FR: true } );
db.catalog.find( { _id: 375 }, { de_DE: true } );
… and so forth for other locales
The data model did not fit the access pattern.
12. Why is this inefficient?
Data in RED are being used. Data in BLUE take up memory but are not in demand.
{
    _id: 375,
    en_US: { name: …, description: …, <etc…> },
    en_GB: { name: …, description: …, <etc…> },
    fr_FR: { name: …, description: …, <etc…> },
    fr_CA: { name: …, description: …, <etc…> },
    de_DE: …,
    de_CH: …,
    <… and so on for other locales …>
}
{
    _id: 42,
    en_US: { name: …, description: …, <etc…> },
    en_GB: { name: …, description: …, <etc…> },
    fr_FR: { name: …, description: …, <etc…> },
    fr_CA: { name: …, description: …, <etc…> },
    de_DE: …,
    de_CH: …,
    <… and so on for other locales …>
}
13. Consequences of the schema
• Each document contained 20x more data than the common use case requires
• Disk IO was too high for the relatively modest query load on the dataset
• MongoDB lets you request a subset of a document's contents via projection…
• … but the entire document must be loaded into RAM to service the request
14. Consequences of the schema redesign
• Queries induced minimal memory overhead
• 20x as many distinct products fit in RAM at once
• Disk IO utilization reduced
• Application latency reduced
{
    _id: "375-en_GB",
    name: …,
    description: …,
    <… the rest of the document …>
}
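With the redesigned schema, the application asks for exactly the locale it needs; a minimal usage sketch (the compound _id string follows the slide above):
db.catalog.find( { _id: "375-en_GB" } );   // loads only the en_GB translation into RAM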
15. Schema Design Patterns
• Pattern: pre-computing interesting quantities, ideally with each write operation (see the sketch after this list)
• Pattern: putting unrelated items in different collections to take advantage of indexing
• Anti-pattern: appending to arrays ad infinitum
• Anti-pattern: importing relational schemas directly into MongoDB
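To make the pre-computing pattern concrete, here is a minimal sketch in the mongo shell; the collection and field names (posts, comments, commentCount) are hypothetical, not taken from the webinar:
// Maintain the interesting quantity (a comment count) with the write itself,
// instead of recomputing it at read time.
db.posts.update(
    { _id: postId },                     // postId: placeholder for the target document
    {
        $push: { comments: newComment }, // the write operation
        $inc: { commentCount: 1 }        // the pre-computed quantity
    }
);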
16. Schema Design Tips
• Avoid inherently slow operations
– Updates of unindexed arrays of several thousand elements
– Updates of indexed arrays of several hundred elements
– Document moves
• Arrays are great, but know how to use them (see the sketch below)
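One way to keep an array usable rather than letting it grow ad infinitum is to cap it on every write; a hedged sketch (field names hypothetical; $slice with $each requires MongoDB 2.4+):
// Append a reading but keep only the most recent 1000 elements
db.sensors.update(
    { _id: sensorId },
    { $push: { readings: { $each: [ newReading ], $slice: -1000 } } }
);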
17. Schema Design resources
• Blog series, "6 rules of thumb"
– Part 1: http://goo.gl/TFJ3dr
– Part 2: http://goo.gl/qTdGhP
– Part 3: http://goo.gl/JFO1pI
18. Indexing
• Indexes are tree-structured sets of references to your documents (minimal example below)
• Indexes are the single biggest tunable performance factor in the database
• Indexing and schema design go hand in hand
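As a minimal illustration (collection and field names hypothetical), a single-field index is built with the ensureIndex shell helper of this era:
db.products.ensureIndex( { sku: 1 } );   // 1 = ascending B-tree on sku
db.products.find( { sku: "AB-123" } );   // can now walk the tree instead of scanning the collection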
19. Indexing Mistakes
• Failing to build necessary indexes
• Building unnecessary indexes
• Running ad-hoc queries in production
20. Indexing Fixes
• Failing to build necessary indexes
– Run .explain() (example below), examine the slow query log, mtools, and the system.profile collection
• Building unnecessary indexes
– Talk to your application developers about usage
• Running ad-hoc queries in production
– Use a staging environment, use secondaries
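For example, .explain() reports whether a query used an index; a sketch against a hypothetical collection (output field names vary by server version — in the 2.x shell a collection scan shows a BasicCursor, an index hit a BtreeCursor):
db.users.find( { email: "alice@example.com" } ).explain();
// "cursor" : "BasicCursor"         → collection scan; a necessary index is missing
// "cursor" : "BtreeCursor email_1" → the email index was used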
24. mtools
• http://github.com/rueckstiess/mtools
• log file analysis for poorly performing queries
– Show me queries that took more than 1000 ms from 6 am to 6 pm:
– mlogfilter mongodb.log --from 06:00 --to 18:00 --slow 1000 > mongodb-filtered.log
29. But there's an index!?!
db.system.indexes.find().toArray()
[{
    "v" : 1,
    "key" : {
        "company" : 1,
        "employeeId" : 1
    },
    "ns" : "test.docs",
    "name" : "company_1_employeeId_1"
}]
This isn't the index you're looking for.
30. Did you see the problem?
{
    _id: ObjectId("53b9ab7e939f1e229b4f574c"),
    firstName: "Alice",
    lastName: "Smith",
    parent: {
        company: 22794,
        employeeId: 83881
    }
}
31. The index was created incorrectly
db.system.indexes.find().toArray()
[{
    "v" : 1,
    "key" : {
        "parent.company" : 1,
        "parent.employeeId" : 1
    },
    "ns" : "test.docs",
    "name" : "parent.company_1_parent.employeeId_1"
}]
Subdocument paths needed
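For completeness, the corrected index above would be created with the full dotted subdocument paths; a sketch using the ensureIndex helper:
// Index the embedded fields via their dotted paths
db.docs.ensureIndex( { "parent.company": 1, "parent.employeeId": 1 } );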
32. Indexing Strategies
• Create indexes that support your queries!
• Create highly selective indexes
• Eliminate duplicate indexes with a compound index, if possible
– db.collection.ensureIndex({A:1, B:1, C:1})
– allows queries using a leftmost prefix
• Order compound index fields: equality, sort, then range (see the sketch after this list)
– see http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/
• Create indexes that support covered queries
• Prevent collection scans in pre-production environments
– mongod --notablescan
– db.getSiblingDB("admin").runCommand( { setParameter: 1, notablescan: 1 } )
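To make the equality-sort-range ordering concrete, a hedged sketch (collection and field names are hypothetical):
// Query: equality on status, sort on orderDate, range on qty
db.orders.find( { status: "shipped", qty: { $lt: 100 } } ).sort( { orderDate: -1 } );
// Matching compound index: equality field, then sort field, then range field
db.orders.ensureIndex( { status: 1, orderDate: -1, qty: 1 } );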
33. Monitoring Your Workload
• Log files, iostat, mtools, and mongotop are for debugging
• MongoDB Management Service (MMS) can do metrics collection and reporting
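The system.profile collection mentioned under "Indexing Fixes" is fed by the built-in profiler; a hedged sketch of enabling it for slow operations (100 ms is the server's default slow threshold):
db.setProfilingLevel(1, 100);                           // profile operations slower than 100 ms
db.system.profile.find().sort( { ts: -1 } ).limit(5);   // inspect the most recent slow operations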
38. Cloud Version of MMS
1. Go to http://mms.mongodb.com
2. Create an account
3. Install one agent in your datacenter
4. Add hosts from the web interface
5. Enjoy!
42. RAM - Measure your working set and index sizes
• db.serverStatus({workingSet:1}).workingSet
{
    "computationTimeMicros": 2751,
    "note": "thisIsAnEstimate",
    "overSeconds": 1084,
    "pagesInMemory": 2041
}
• db.stats().indexSize
2032880640
• In this example, (2041 * 4096) + 2032880640 = 2041240576 bytes = 1.9 GB
• Note: this is a subset of the virtual memory used by mongod
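The same arithmetic can be scripted in the shell; a sketch that assumes 4 KB pages, as the slide does (the workingSet estimator is specific to MMAPv1-era servers):
var ws = db.serverStatus( { workingSet: 1 } ).workingSet;
var bytes = ws.pagesInMemory * 4096 + db.stats().indexSize;   // working set pages + total index size
print( (bytes / 1024 / 1024 / 1024).toFixed(1) + " GB" );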
43. Real World Example: Vertical Scaling
• System that tracked status information for entities in the business
• State changes happen in batches; sometimes 10% of entities get updated, sometimes 100% get updated
45. Adding shards to scale horizontally
• Application was a success! Business entities grew by a factor of 5
• Cluster capacity multiplied by 5, but so did the TCO
[Diagram: Application / mongos routing to mongod shards; "…16 more shards…"]
46. More success means more shards
• 10x growth means … 200 shards
• Horizontal scaling with sharding is linear scaling, but an order of magnitude was needed
• Bulk updates of random documents approach the speed of the disks
47. Final architecture
• Scaling the random IOPS with SSDs was a vertical scaling approach
[Diagram: Application / mongos → mongod on SSD]
48. Before you add hardware…
• Make sure you are solving the right scaling problem
• Remedy schema and index problems first
– schema and index problems can look like hardware problems
• Tune the Operating System
– ulimits, swap, NUMA, NOOP scheduler with hypervisors
• Tune the IO subsystem
– ext4 or XFS vs SAN, RAID10, readahead, noatime
• See the MongoDB "production notes" page
• Heed logfile startup warnings
49. Today’s Webinar Agenda: Achieve Scale
1. Optimization Tips
2. Scale Vertically
3. Horizontal Scaling: The Basics of Sharding
53. Rule of Thumb
To make good decisions about MongoDB implementations, you must understand MongoDB, your applications, the workload your applications generate, and your business requirements.
54. Summary
• Don't throw hardware at the problem until you examine all other possibilities (schema, indexes, OS, IO subsystem)
• Know what is considered "normal" performance by monitoring
• Horizontal scaling in MongoDB is implemented with sharding, but you must understand schema design and indexing before you shard
Sharding a sub-optimally designed database will not make it performant
55. Today’s Webinar Agenda: Achieve Scale
1. Optimization Tips
– Schema Design
– Indexes
– Monitoring your Workload
2. Scale Vertically
3. Horizontal Scaling: The Basics of Sharding
56. Limited Time: Get Expert Advice for Free
If you’re thinking about scaling, why reinvent the wheel? Our experts can collaborate with you to provide detailed guidance.
Sign up for a free one-hour consult: http://bit.ly/1rkXcfN
57. Questions?
Stay tuned after the webinar and take our survey for your chance to win MongoDB schwag.
58. Thank You
Jake Angerman
Sr. Solutions Architect, MongoDB
Editor's Notes
trap: concern about correctness overrides optimization at scale
importing a relational schema directly into MongoDB is an anti-pattern!
different parts of the world are awake and shopping at a given time
Anti-pattern: embedding highly volatile data in an array
these may look like performance tips instead of schema design tips
sub-optimal query might be $unwind followed by $match instead of projection
100ms threshold by default
shard key aside
Indexes should be contained in working set.
In this case I had a 50GB database but only ~2GB were needed in RAM
this applies to both vertical and horizontal scaling
The order presented is the order you should analyze