Schema Patterns and
Your Storage Engine
Agenda
Schema Design 101
MMAPv1
WiredTiger
Examples
Update Process
Why do we have so many options?
MMAPv1 | WT | 3rd Party | Your own?
Available 3.2: In-memory | Encrypted
Example of Document Definition
• App1
- Realtime dashboard
- Ad hoc queries
- User profiles
• App2
- Heavy writes batch process
- Analytics workload
- Multi-tenant application
Schema Design 101
Why is Schema Design Important?
• Defines our application's interactions
• How we define data
• Can considerably impact performance
• Impacts the DBA's sleep!
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123, 47.232],
  profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Fields can contain an array of sub-documents
Typed field values
Fields can contain arrays
Document Model
class Item(object):
    def __init__(self, name, car, date):
        self.name = name
        self.car = car
        self.date = date

class Car(object):
    def __init__(self, brand, manufactor, date):
        self.brand = brand
        self.manufactor = manufactor
        self.date = date
Natural Representation
Flexible Schema
Aligned with Development
{ _id: 1,
'greetings':'hello'
}
{ _id: 2,
'k':'greetings',
'v': 'hello'
}
{ _id: 1,
'k':'greetings',
'v': 'hello'
}
{ _id: 2,
  'car': { 'manufactor': 'somecarmaker',
           'brand': 'somebrand',
           'date': ISODate("2016-06-27")
  },
  'date': ISODate("2016-06-27")
}
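The nested document above lines up with the Item and Car classes shown earlier. A minimal Python sketch of that mapping, assuming dataclasses stand in for the slide's classes, `datetime` stands in for ISODate, and the `name` value and the `to_document` helper are hypothetical:

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class Car:
    brand: str
    manufactor: str  # spelling follows the field name used on the slide
    date: datetime


@dataclass
class Item:
    name: str
    car: Car
    date: datetime


def to_document(item: Item, doc_id: int) -> dict:
    """Turn an Item and its embedded Car into one nested document."""
    doc = asdict(item)  # asdict recurses into the nested Car dataclass
    doc["_id"] = doc_id
    return doc


when = datetime(2016, 6, 27)
item = Item(name="example", car=Car("somebrand", "somecarmaker", when), date=when)
doc = to_document(item, 2)
```

The object graph and the stored document have the same shape, which is the "natural representation" point: no join table sits between Item and Car.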
What to Consider
Data Structure | Change Rate | Data Lifecycle | Concurrency
What to Consider
Data Structure
[Diagrams: collections users, accounts, reviews and products; clients Client1, Client2, Client3 and Client4]
What to Consider
{
  _id: 1,
  model: "ford",
  year: 2014,
  picture: BinData(2, "afGGAF677..."),
}
{
  _id: 1,
  model: "ford",
  date: {year: 2014, month: 5, day: 1},
  picture: "AAFF123BEFC...",
}
{
  _id: 1,
  model: ObjectId("21213312"),
  date: ISODate("20140501"),
  picture: "AAFF123BEFC...",
}
Change Rate
Different data types
Different field structure
Change Rate
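When documents in one collection drift between data types and field structures, every reader has to cope with each variant. A minimal Python sketch of that cost, assuming the three car documents above and a hypothetical `model_year` helper (`datetime` stands in for ISODate):

```python
from datetime import datetime


def model_year(doc: dict) -> int:
    """Return the year regardless of which schema variant the document uses."""
    if "year" in doc:                 # variant 1: year: 2014
        return int(doc["year"])
    date = doc.get("date")
    if isinstance(date, dict):        # variant 2: date: {year, month, day}
        return int(date["year"])
    if isinstance(date, datetime):    # variant 3: date: ISODate(...)
        return date.year
    raise ValueError("document has no recognisable year")


variants = [
    {"_id": 1, "model": "ford", "year": 2014},
    {"_id": 1, "model": "ford", "date": {"year": 2014, "month": 5, "day": 1}},
    {"_id": 1, "model": "ford", "date": datetime(2014, 5, 1)},
]
years = [model_year(d) for d in variants]
```

Each new variant adds a branch here and in every query or index that touches the field, which is why the rate of schema change is worth planning for.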
What to Consider
{
"product_id": "b872ad6e34f87102cf866fead4e10e29",
"purchases":{
"2016": {
"06": 2823,
"05": 5535,
...
"total": 1222312
},
"2015": {...},
"2014": {...},
"2013": {...},
}
}
Data Lifecycle
What we actually
read and write
Data Lifecycle
What we don't read or write anymore
Data Lifecycle
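One way to act on the data lifecycle is to keep the years that are still read and written apart from the years that are not. A minimal Python sketch, assuming a hypothetical `split_by_lifecycle` helper and illustrative values for the years the slide elides:

```python
import copy


def split_by_lifecycle(doc: dict, hot_years: set) -> tuple:
    """Split a purchases document into a hot part and an archive part."""
    hot = {"product_id": doc["product_id"], "purchases": {}}
    cold = {"product_id": doc["product_id"], "purchases": {}}
    for year, months in doc["purchases"].items():
        target = hot if year in hot_years else cold
        target["purchases"][year] = copy.deepcopy(months)
    return hot, cold


doc = {
    "product_id": "b872ad6e34f87102cf866fead4e10e29",
    "purchases": {
        "2016": {"06": 2823, "05": 5535},
        "2015": {"12": 10},  # illustrative; the slide does not show these values
        "2014": {"12": 20},  # illustrative
    },
}
hot, cold = split_by_lifecycle(doc, hot_years={"2016"})
```

The hot document stays small and is the only one the application touches day to day; the archive document can live in another collection.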
What to Consider
Concurrency
MMAPv1
Disclaimer!
WiredTiger is our default storage engine from 3.2 onwards
MMAPv1 / Basics
• Data is mapped into virtual memory for fast access
• Document pointers are requested on each access
• If in memory = fast
• If not = disk seek
• Indexes follow the same structure
• Allocation is file-based, per database
db.collection.update(
  {a: 1},
  { $set: {b: 1},
    $inc: {c: 10},
    $push: {d: "hello"}
  })
$set Operator
MMAPv1 / Schema Design Best Practices
db.collection.insert(
{
_id: 1,
name: 'Norberto',
parking_tickets: {},
eat_outs:[
null,
null,
null,
null
]
}
)
Document
Pre-allocation
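The pre-allocated insert above can be sketched in Python as follows. The helper names are hypothetical, and this only models the document shape, not MMAPv1's on-disk padding; in practice placeholders are usually chosen to be about the same size as the values that will replace them.

```python
def preallocated_user(user_id: int, name: str, eat_out_slots: int = 4) -> dict:
    """Build a document whose array slots already exist as null placeholders."""
    return {
        "_id": user_id,
        "name": name,
        "parking_tickets": {},
        "eat_outs": [None] * eat_out_slots,
    }


def record_eat_out(doc: dict, slot: int, value) -> None:
    """Fill a reserved slot; the array keeps its length."""
    doc["eat_outs"][slot] = value


user = preallocated_user(1, "Norberto")
record_eat_out(user, 0, "2016-06-27")
```

Later writes overwrite a reserved slot instead of appending, so the array does not gain elements over time.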
db.movies.update(
{_id: "b872ad6e34"},
{$push:{
reviews: {
$each:
["great", "5*","awful"],
$slice: 10
}
}})
Keep Documents
Small
MMAPv1 / Schema Design Best Practices
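The `$push` with `$each` and `$slice` shown above caps the array so the document cannot grow without bound. A minimal Python emulation of that behaviour on a plain list, assuming a hypothetical `push_each_slice` helper (this is not a driver call):

```python
def push_each_slice(existing: list, each: list, slice_n: int) -> list:
    """Append every item in `each`, then trim the array the way $slice does."""
    combined = existing + each
    if slice_n == 0:
        return []                   # $slice: 0 empties the array
    if slice_n > 0:
        return combined[:slice_n]   # positive: keep the first N elements
    return combined[slice_n:]       # negative: keep the last N elements


reviews = [f"review-{i}" for i in range(9)]
new = ["great", "5*", "awful"]
first_ten = push_each_slice(reviews, new, 10)
last_ten = push_each_slice(reviews, new, -10)
```

With a positive `$slice`, pushes beyond the cap are discarded; a negative value keeps the most recent entries instead, which is often what a reviews list wants.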
WiredTiger
WiredTiger / Basics
• MVCC Storage Engine
• Compression
• Document Level Concurrency Control
• Better resource allocation
WiredTiger Engine
[Diagram: Python API, C API, Java API; Schema & Cursors; Transactions, Snapshots; Row storage, Column storage; Cache; Page read/write, Block management; Logging; Database Files, Log Files]
WiredTiger / Schema Design Best Practices
WiredTiger / Operational Best Practices
Cache Size
[Diagram: Disk; RAM divided into cacheSizeGB, Heap and OS]
Document Level Concurrency Control
[Diagram: mongod > Collection > Doc1, Doc2, Doc3, Doc4, Doc5, Doc6 ... DocX]
MMAPv1 / Operational Best Practices
Document Level Concurrency Control
[Diagram: mongod > Collection1/doc1, Collection2/doc2, Collection3/doc3, Collection4/doc4, Collection5/doc5, Collection6/doc6, Collection7/doc7 ... CollectionX/docX]
WiredTiger / Operational Best Practices
Document Level Concurrency Control | Cache Size | Compression
[Diagram: Disk; RAM divided into cacheSizeGB, Heap and OS; mongod > Collections > Doc1 ... DocX; a callout labelled RAM shows first the document, then its compressed form]
{
  "message": "hello MongoDB World",
  "user": "Norberto",
  "channel": "twitter",
  "comments": [
    {"text": "howdy",
     "user": "Ross"}]
}
gzip(0x0012314001222321)
WiredTiger/ Schema Design - Compression
{
"product_id": "b872ad6e34",
"name": "MongoDB Atlas",
"company": "MongoDB",
"description": "MongoDBAAS",
"comments": [
{"text":"Beautiful"}
]
}
MMAPv1
{
  "pid": "b872ad6e34",
  "n": "MongoDB Atlas",
  "c": "MongoDB",
  "d": "MongoDBAAS",
  "cs": [
    {"t": "Beautiful"}
  ]
}
WiredTiger
{
  "product_id": "b872ad6e34",
  "name": "MongoDB Atlas",
  "company": "MongoDB",
  "description": "MongoDBAAS",
  "comments": [
    {"text": "Beautiful"}
  ]
}
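The effect of compression on field-name length can be measured roughly. The sketch below is an approximation under stated assumptions: it uses JSON and Python's `zlib` as stand-ins for BSON and the engine's block compressor, the 1,000 generated documents and the `shorten` and `sizes` helpers are hypothetical, and real ratios will differ.

```python
import json
import zlib

# Mapping from the descriptive field names to the abbreviated ones on the slide.
LONG_TO_SHORT = {"product_id": "pid", "name": "n", "company": "c",
                 "description": "d", "comments": "cs", "text": "t"}


def shorten(value):
    """Recursively rename long field names to their short forms."""
    if isinstance(value, dict):
        return {LONG_TO_SHORT.get(k, k): shorten(v) for k, v in value.items()}
    if isinstance(value, list):
        return [shorten(v) for v in value]
    return value


def sizes(docs) -> tuple:
    """Return (raw bytes, zlib-compressed bytes) for a list of documents."""
    raw = json.dumps(docs).encode("utf-8")
    return len(raw), len(zlib.compress(raw))


long_docs = [{"product_id": f"{i:010x}", "name": "MongoDB Atlas",
              "company": "MongoDB", "description": "MongoDBAAS",
              "comments": [{"text": "Beautiful"}]} for i in range(1000)]
short_docs = shorten(long_docs)

raw_long, packed_long = sizes(long_docs)
raw_short, packed_short = sizes(short_docs)
saving_raw = raw_long - raw_short           # bytes saved by short names, uncompressed
saving_packed = packed_long - packed_short  # bytes saved by short names, compressed
```

Repeated field names compress well, so the saving from abbreviating them shrinks once the data is compressed. That is the reason readable names are easier to justify on WiredTiger than on MMAPv1.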
WiredTiger/ Schema Design – Index Prefix Compression
{
"user": "Norberto",
"country": "Portugal",
"last_comment": "European Champions!!!",
}
WiredTiger/ Schema Design – Index Prefix Compression
{
"user": "Norberto",
"country": "Portugal",
"last_comment": "Iceland beat England :-)",
}
db.users.createIndex( {"country": 1} )
Australia-Zimbabwe
A-N N-Z
N-Q Q-Z…
record_id: 0x12311
{
  key: "Por",
  record_id: 0x12311
}
{
  key: "Por",
  record_id: 0x123BB
}
{
  key: "Pol",
  record_id: 0xF23CB
}
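Index entries with shared prefixes such as "Por" and "Pol" are what prefix compression exploits. A minimal Python sketch of the general technique, assuming sorted keys and hypothetical helper names; it illustrates the idea only and is not WiredTiger's actual on-disk format:

```python
def prefix_compress(sorted_keys: list) -> list:
    """Encode each key as (length shared with the previous key, remaining suffix)."""
    entries = []
    prev = ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def prefix_decompress(entries: list) -> list:
    """Rebuild the original keys from (shared length, suffix) pairs."""
    keys = []
    prev = ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = ["Poland", "Portugal", "Portugal"]
entries = prefix_compress(keys)
restored = prefix_decompress(entries)
```

Low-cardinality indexed fields such as `country` repeat the same key many times, so most entries store little more than a record id.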
Does Picking a Storage Engine Matter?
In-place Updates
Memory
{
country: "Portugal",
star: "Cristiano"
}
{
country: "Iceland",
star: "Hannes Magnusson"
}
{
country: "Netherlands",
star: "Arthur Viegers"
}
{
country: "Portugal",
star: "Cristiano"
}
In-place Updates
Memory
{
country: "Portugal",
star: "Cristiano"
}
{
country: "Iceland",
star: "Hannes Magnusson"
}
{
country: "Netherlands",
star: "Arthur Viegers"
}
db.teams.update({country: "Portugal"},
{"$set": {
"star": "Cristiano Ronaldo"
}})
{
country: "Portugal",
star: "Cristiano Ronaldo"
}
MMAPv1
In-place Updates
Memory
version 1
{
  country: "Portugal",
  star: "Cristiano"
}
db.teams.update({country: "Portugal"},
  {"$set": {
    "star": "Cristiano Ronaldo"
  }})
version 2
{
  country: "Portugal",
  star: "Cristiano Ronaldo"
}
WiredTiger
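The difference between the two engines on this update can be modelled with a toy example. Both classes below are hypothetical illustrations, not engine code: one overwrites the stored document, the other keeps the old version and adds a new one, as the version 1 / version 2 labels suggest.

```python
class InPlaceStore:
    """Toy model of an in-place update: the stored document is overwritten."""

    def __init__(self, doc: dict):
        self.doc = dict(doc)

    def update(self, changes: dict) -> None:
        self.doc.update(changes)

    def read(self) -> dict:
        return self.doc


class VersionedStore:
    """Toy model of a versioned update: each write adds a new document version."""

    def __init__(self, doc: dict):
        self.versions = [dict(doc)]

    def update(self, changes: dict) -> None:
        new_version = dict(self.versions[-1])
        new_version.update(changes)
        self.versions.append(new_version)

    def read(self, version: int = -1) -> dict:
        return self.versions[version]


team = {"country": "Portugal", "star": "Cristiano"}

in_place = InPlaceStore(team)
seen_before = in_place.read()
in_place.update({"star": "Cristiano Ronaldo"})

versioned = VersionedStore(team)
snapshot = versioned.read()
versioned.update({"star": "Cristiano Ronaldo"})
```

A reader holding `snapshot` still sees version 1 after the write, while `seen_before` reflects the change. Writing whole new versions is also why large documents that receive many small updates deserve a second look when moving to WiredTiger.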
Insert Heavy
server
db.pool.insert(
{
"subject": "Euro 2016",
"winner": "Portugal"
}
)
mongod
Insert Heavy
Server 1
mongod
db.pool.insert(
{
"subject": "Euro 2016",
"winner": "Portugal"
}
)
MMAPv1
Server 2
mongod
Server 3
mongod
Sharding
Insert Heavy
server
mongod
db.pool.insert(
{
"subject": "Euro 2016",
"winner": "Portugal"
}
)
mongod
mongod
MMAPv1
Micro-Sharding
https://www.mongodb.com/blog/post/mongodb-single-platform-all-financial-data-ahl
Insert Heavy
server
db.pool.insert(
{
"subject": "Euro 2016",
"winner": "Portugal"
}
)
mongod
WiredTiger
CPU availability
Insert Heavy
server
mongod
db.pool.insert(
{
"subject": "Euro 2016",
"winner": "Portugal"
}
)
WiredTiger
server
mongod
server
mongod
Sharding
Buckets
{
  "product_id": "b872ad6e34f87...",
  "visits": {
    "store_1": {
      "Jan_31": 2823,
      "Jan_30": 5535,
      ...
      "total": 1222312
    },
    "store_2": {
      "Jan_31": 2823,
      "Jan_28": 5535,
      ...
      "total": 1222312
    }
  }
}
db.visits.update(
  {"product_id": "b872ad6e34f87..."},
  {"$inc": {
    "visits.store_2.March_19": 10,
    "visits.store_1.April_1": 68,
    "visits.store_2.total": 10,
    "visits.store_1.total": 68
  }}
)
Buckets
{
  "product_id": "b872ad6e34f87...",
  "visits": {
    "store_1": {
      "Jan_31": 2823,
      "Jan_30": 5535,
      "April_1": 68,
      "total": 1222380
    },
    "store_2": {
      "Jan_31": 2823,
      "Jan_28": 5535,
      "March_19": 10,
      "total": 1222322
    }
  }
}
In-place update
MMAPv1
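The bucket update relies on `$inc` with dotted paths: counters that already exist are incremented, and missing ones are created. A minimal Python emulation on a plain dict, assuming a hypothetical `inc` helper and the store names `store_1` and `store_2` (the elided `product_id` is kept as on the slide):

```python
def inc(doc: dict, increments: dict) -> None:
    """Apply $inc-style increments given dotted paths, creating missing fields."""
    for path, amount in increments.items():
        *parents, leaf = path.split(".")
        node = doc
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = node.get(leaf, 0) + amount


bucket = {
    "product_id": "b872ad6e34f87...",
    "visits": {
        "store_1": {"Jan_31": 2823, "Jan_30": 5535, "total": 1222312},
        "store_2": {"Jan_31": 2823, "Jan_28": 5535, "total": 1222312},
    },
}
inc(bucket, {
    "visits.store_2.March_19": 10,
    "visits.store_1.April_1": 68,
    "visits.store_2.total": 10,
    "visits.store_1.total": 68,
})
```

One bucket document absorbs many small counter changes, so how the engine applies each change (in place or as a new document version) dominates the cost of this pattern.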
Buckets
{
  "product_id": "b872ad6e34f87...",
  "visits": {
    "store_1": {
      "Jan_31": 2823,
      "Jan_30": 5535,
      ...
      "total": 1222312
    },
    "store_2": {
      "Jan_31": 2823,
      "Jan_28": 5535,
      ...
      "total": 1222312
    }
  }
}
New version
{
  "product_id": "b872ad6e34f87...",
  "visits": {
    "store_1": {
      "Jan_31": 2823,
      "Jan_30": 5610,
      "April_1": 10,
      "total": 1222312
    },
    "store_2": {
      "Jan_31": 2823,
      "Jan_28": 5545,
      "March_19": 68,
      "total": 12229032
    }
  }
}
WiredTiger
Moving to WiredTiger
What to Look For
Bucketing | Read/Write Ratio | In-place Update
Update Process
• Apply changes to your schema design
• Check if there's any performance regression
• Make sure you have CPU resources available
• Swap secondary nodes
• Swap primary
• Enjoy the increased performance!
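The node order in the steps above (secondaries first, primary last) can be sketched as a small planning helper. This is an illustration only: the host names and the `rolling_swap_order` function are hypothetical, and the actual restart of each mongod on the new storage engine is out of scope.

```python
def rolling_swap_order(members: dict) -> list:
    """Return hosts in swap order: every secondary first, the primary last.

    `members` maps a host name to its role, "PRIMARY" or "SECONDARY".
    """
    secondaries = sorted(h for h, role in members.items() if role == "SECONDARY")
    primaries = [h for h, role in members.items() if role == "PRIMARY"]
    return secondaries + primaries


plan = rolling_swap_order({
    "node-a": "PRIMARY",
    "node-b": "SECONDARY",
    "node-c": "SECONDARY",
})
```

Leaving the primary for last keeps the replica set writable while each secondary is swapped and catches up.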
Example of Document Definition
• App1: WiredTiger or MMAPv1
- Realtime dashboard
- Ad hoc queries
- User profiles
• App2: WiredTiger
- Heavy writes batch process
- Analytics workload
- Multi-tenant application
Feel free to reach out!
Twitter: @nleite
norberto@mongodb.com
#mongodbeurope
Obrigado (thank you)

Webinar: Schema Patterns and Your Storage Engine