Jumpstart: Introduction to Schema Design

October 12, 2017 | Bespoke | San Francisco
#MDBlocal
JUMP START
Introduction to Schema Design

Sigfrido “Sig” Narváez
Sr. Solution Architect
@SigNarvaez

Why this talk?
1) Schema Design in MongoDB is important
2) Schema Design in MongoDB differs from Relational
3) New patterns that were not possible before
4) Almost all performance issues are due to Schema Design

What I want you to get out of this talk
At the end of this presentation, you should be able to:
1) Understand the Document Model
2) Leverage the Document Model to Design your Use Case
3) Optimize your Schema for humongous workloads, for GIANT ideas

Variety of Use Cases

High Level
Schema Design

Relational Schema Design
https://iedei.files.wordpress.com/2012/04/mercedes-f1-car-disassembled.jpeg

MongoDB Schema Design
http://www.northlodge.org/f1/2002/bar/bar-2002-01-honda-02.jpg

How MongoDB
Stores Data

MongoDB uses JSON Documents
JSON is a widely used standard
JSON is fast, lean, flexible
JSON is supported by almost all languages and tools
JSON is used for modern communication & data exchange
MongoDB stores BSON (Binary JSON)

Documents are Rich Data Structures
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: { type: 'Point',
              coordinates: [45.123, 47.232] },
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
• Fields
• Typed field values
• Fields can contain arrays
• Fields can contain an array of sub-documents
• ORM not needed
• Index/Query at any level

Find car collectors within a 10-mile radius
of London who own cars from the 1970s
http://static2.uk.businessinsider.com/image/57925947dd089545048b49d4-1200/jay-lenos-garage-jaguar-e-type.jpg

Rich indexes
db.collectors.createIndex({
  "location": "2dsphere",
  "cars.year": 1
});

Expressive Queries
[Query screenshot; callouts: London, 10 miles (in radians), 1970s]
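The query on this slide did not survive as text. The callouts (London, 10 miles in radians, 1970s) suggest something along the lines of the sketch below. It only builds the filter document, so it runs without a database. The coordinates in LONDON, the helper names, and the use of $geoWithin with $centerSphere are assumptions, not the speaker's original query.

```javascript
// Sketch only: builds a filter for db.collectors.find(...).
// LONDON and the helper names are assumptions; field names follow the sample document.
const EARTH_RADIUS_MILES = 3963.2; // approximate equatorial radius

function milesToRadians(miles) {
  return miles / EARTH_RADIUS_MILES;
}

function collectorsNear(center, miles, fromYear, toYear) {
  return {
    // $centerSphere takes [ [longitude, latitude], radiusInRadians ]
    location: { $geoWithin: { $centerSphere: [center, milesToRadians(miles)] } },
    // $elemMatch makes both year bounds apply to the same car
    cars: { $elemMatch: { year: { $gte: fromYear, $lt: toYear } } }
  };
}

const LONDON = [-0.1276, 51.5072]; // [longitude, latitude], approximate
const filter = collectorsNear(LONDON, 10, 1970, 1980);
console.log(JSON.stringify(filter, null, 2));
// In the mongo shell: db.collectors.find(filter)
```

Using $elemMatch rather than a plain "cars.year" range matters for arrays: without it, one car could satisfy the lower bound and a different car the upper bound.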

Documents are FLEXIBLE
Different Documents in the same ProductsCatalog collection in MongoDB
Polymorphic Schema – Aligns with OOP principles
Car Paint products
{
  sku: 'PAINTZXC123',
  product_name: 'Metallic Paint',
  colors: ['Red', 'Green'],
  size_gallons: [5, 10]
}
Car T-Shirt products
{
  sku: 'TSHRTASD43546',
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  colors: ['Heather Gray' … ],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}
Car Wheels products
{
  sku: 'WHEELBVCX6543',
  product_name: '19" 5-spoke',
  material: 'aluminum alloy',
  color: 'silver',
  frame_material: 'aluminum',
  package_height: '20.5x32.9x55',
  weight_lbs: 5.15,
  wheel_size_in: 19
}

Flexible Data Governance
All products must have an SKU and Name
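One way to express the "SKU and Name" rule is a collection validator. The sketch below assumes MongoDB 3.6 or later (for $jsonSchema) and the field names sku and product_name from the product examples. The missingRequired helper is a local stand-in for illustration and is not part of MongoDB.

```javascript
// Sketch only: options for db.createCollection("ProductsCatalog", options).
// Assumes MongoDB 3.6+ ($jsonSchema) and the field names from the product examples.
const options = {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["sku", "product_name"],
      properties: {
        sku: { bsonType: "string" },
        product_name: { bsonType: "string" }
      }
    }
  }
};

// Hypothetical local stand-in for the server-side check, for illustration only.
function missingRequired(doc) {
  return options.validator.$jsonSchema.required.filter((field) => !(field in doc));
}

console.log(missingRequired({ sku: "PAINTZXC123", colors: ["Red"] }));
```

Only the two governed fields are constrained, so paint, T-shirt and wheel documents can still carry their own attributes.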

Schema Design

Why Schema Design is important
Almost all performance issues are related to Schema Design
It’s all about the data
Questions to keep in mind
• Data access patterns?
• Number of Reads vs Updates?
• Expected size of a Document?

Collections & Documents
People Collection
{
  id: 1,
  city: 'London'
}
{
  id: 2,
  first_name: 'Ortega',
  surname: 'Alvaro',
  city: 'Valencia'
}
Cars Collection
{
  model: 'Bentley',
  year: 1973,
  owner: 1
}
{
  model: 'Rolls Royce',
  year: 1965,
  color: 'gold',
  owner: 1
}

Embedding Documents
People Collection, Car Collection → Car Collector Collection
{
  id: 1,
  city: 'London'
}
{
  year: 1973,
  owner: 1
}
{
  year: 1965,
  color: 'gold',
  owner: 1
}
No need for JOINS!
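The move from two collections to one can be sketched in plain JavaScript. The helper name embedCars and the sample data are assumptions for illustration. In-memory arrays stand in for the People and Car collections.

```javascript
// Sketch only: folds referenced car documents into their owners.
// embedCars and the sample data are assumptions for illustration.
function embedCars(people, cars) {
  return people.map((person) => ({
    ...person,
    cars: cars
      .filter((car) => car.owner === person.id)
      .map(({ owner, ...car }) => car) // drop the now-redundant owner reference
  }));
}

const people = [{ id: 1, city: "London" }, { id: 2, city: "Valencia" }];
const cars = [
  { model: "Bentley", year: 1973, owner: 1 },
  { model: "Rolls Royce", year: 1965, color: "gold", owner: 1 }
];

const collectors = embedCars(people, cars);
console.log(JSON.stringify(collectors, null, 2));
```

Once the cars live inside the owner's document, finding one person's cars is a single-document read rather than a join.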

Which Cars belong to Paul?
How fast can you answer this?

What about what I know from relational?

REFERENCING & EMBEDDING
https://docs.mongodb.com/manual/core/data-modeling-introduction/

Healthcare use case

1-1
Embed – weak entity

1-M
Modeled in 2 possible ways
Embed
Patients
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [
    {
      id: 12345,
      date: ISODate("2015-02-15"),
      type: "Cat scan",
      … },
    {
      id: 12346,
      date: ISODate("2015-02-15"),
      type: "blood test",
      … }]
}
Reference
Patients
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [12345, 12346]
}
Procedures
{
  _id: 12345,
  date: ISODate("2015-02-15"),
  type: "Cat scan",
  … }
{
  _id: 12346,
  date: ISODate("2015-02-15"),
  type: "blood test",
  … }
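With the reference model, the application (or a $lookup stage) has to resolve the procedure ids. A minimal sketch of the application-side version follows. In-memory arrays stand in for the Patients and Procedures collections, and resolveProcedures is a hypothetical helper.

```javascript
// Sketch only: resolves referenced procedure ids on the application side.
// With a live database this would be two finds (or one $lookup stage);
// in-memory arrays stand in for the collections here.
function resolveProcedures(patient, procedures) {
  const byId = new Map(procedures.map((p) => [p._id, p]));
  return {
    ...patient,
    procedures: patient.procedures
      .map((id) => byId.get(id))
      .filter((p) => p !== undefined) // skip dangling references
  };
}

const patient = { _id: 2, first: "Joe", last: "Patient", procedures: [12345, 12346] };
const procedures = [
  { _id: 12345, type: "Cat scan" },
  { _id: 12346, type: "blood test" }
];

const resolved = resolveProcedures(patient, procedures);
console.log(JSON.stringify(resolved, null, 2));
```

The embedded model needs none of this, at the cost of larger patient documents. Which one fits depends on how often procedures are read together with the patient.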

M-M
Relational approach: join table
Physicians: name, specialty, phone
Hospitals: name
HosPhysicanRel: hospitalId, physicianId
No Join Tables
Use arrays instead

M-M
Embedding Physicians in Hospitals collection
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [
    {
      id: 12345,
      name: "Joe Doctor",
      address: {…},
      … },
    {
      id: 12346,
      name: "Mary Well",
      address: {…},
      … }]
}
{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [
    {
      id: 63633,
      name: "Harold Green",
      address: {…},
      … },
    {
      id: 12345,
      name: "Joe Doctor",
      address: {…},
      … }]
}
Data Duplication … is ok!

M-M
Referencing
Hospitals
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [12345, 12346]
}
{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [63633, 12345]
}
Physicians
{
  id: 63633,
  name: "Harold Green",
  hospitals: [2],
  … }
{
  id: 12345,
  name: "Joe Doctor",
  hospitals: [1, 2],
  … }
{
  id: 12346,
  name: "Mary Well",
  hospitals: [1],
  … }
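When both sides hold reference arrays, linking a physician to a hospital takes two writes. The sketch below only builds the two update specifications. The helper linkOps and the collection names hospitals and physicians are assumptions. The field names follow the slide.

```javascript
// Sketch only: the two updates needed to link a physician and a hospital
// when both sides hold reference arrays. linkOps and the collection names
// are hypothetical; field names follow the slide.
function linkOps(hospitalId, physicianId) {
  return {
    hospitals: {
      filter: { _id: hospitalId },
      update: { $addToSet: { physicians: physicianId } } // no duplicates on repeat
    },
    physicians: {
      filter: { id: physicianId },
      update: { $addToSet: { hospitals: hospitalId } }
    }
  };
}

const ops = linkOps(2, 12345);
console.log(JSON.stringify(ops, null, 2));
// In the mongo shell:
//   db.hospitals.updateOne(ops.hospitals.filter, ops.hospitals.update)
//   db.physicians.updateOne(ops.physicians.filter, ops.physicians.update)
```

The two writes are separate operations, so the arrays can drift apart if one of them fails. A periodic consistency check is one possible correction mechanism.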

MongoDB Schema
Design Patterns

Entertainment Use Case

SUBSET Pattern
• You want to display dependent information, but only part of it
• The rest of the data is fetched only if needed
• Examples:
  ‒ The Cast of a Movie
  ‒ Last 10 Movies an Actor has starred in
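For the "last 10 movies" example, one possible approach is to cap the embedded array at write time with $push, $each and a negative $slice. The sketch below builds that update and mimics its effect locally. The field name recent_movies and both helper names are assumptions, not from the talk.

```javascript
// Sketch only: an update that appends a movie and keeps the most recent entries.
// recent_movies and the helper names are assumptions, not from the talk.
function pushRecentMovie(movie, keep = 10) {
  return { $push: { recent_movies: { $each: [movie], $slice: -keep } } };
}

// Local stand-in that mimics $push with $each and a negative $slice.
function applyLocally(actor, movie, keep = 10) {
  const recent = [...(actor.recent_movies || []), movie].slice(-keep);
  return { ...actor, recent_movies: recent };
}

let actor = { name: "Example Actor", recent_movies: [] };
for (let i = 1; i <= 12; i++) {
  actor = applyLocally(actor, { title: `Movie ${i}` });
}
console.log(actor.recent_movies.map((m) => m.title));
// In the mongo shell: db.actors.updateOne({ name: "Example Actor" }, pushRecentMovie({ title: "..." }))
```

The full filmography would live in its own collection and be fetched only when a user asks for it.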

SCHEMA DESIGN PATTERNS
Address a precise use case in a problem
• Similar to GoF Patterns for OOD
• Not modeling of relationship or full "solution" of a problem
NoSQL patterns often address performance
• E.g. reduced reads
• If performance is not an issue, design for simplicity
Data Duplication
• It’s ok!
• Data change & update cadence
• Understand implication of stale data
• Define a Correction Mechanism

Full talk from MDBW17 – here today!
https://explore.mongodb.com

Internet of things
Use Cases

Weather Use Case

Our Weather Company
• Weather history and forecast service
• Weather station is on the network
• Different measurements (temperature, wind speed, humidity, etc.)
• We currently cover Chicago region with a few hundred stations
• Many users looking for historic details and forecasting
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg

Data Modeling for Time Series
Document per measurement
{
  sensor_id : 520124,
  date : ISODate("2017-02-25T17:00:00.000Z"),
  temperature : 9,
  humidity : 60
}
{
  sensor_id : 520124,
  date : ISODate("2017-02-25T17:01:00.000Z"),
  temperature : 10,
  humidity : 60
}
{
  sensor_id : 520124,
  date : ISODate("2017-02-25T17:02:00.000Z"),
  temperature : 10,
  humidity : 59
}

Things to consider
• Relational Approach?
• What will happen if we scale? (Region, Frequency, Stations)
• Data & Index Size?

Bucketing
{
  sensor_id : 520124,
  measurements : [
    {
      date : ISODate("2017-02-25T17:13:00.000Z"),
      temperature : 9,
      humidity : 60
    },
    {
      date : ISODate("2017-02-25T17:13:05.000Z"),
      temperature : 15,
      humidity : 55
    }, … ]
}

Things to consider
• Size of a Document?

Bucketing by transactions
{
  sensor_id : 520124,
  measurements : [
    {
      date : ISODate("2017-02-25T17:13:00.000Z"),
      temperature : 9,
      humidity : 60
    },
    {
      date : ISODate("2017-02-25T17:13:05.000Z"),
      temperature : 15,
      humidity : 55
    }, … ],
  txCount : 100
}
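A common way to fill count-limited buckets is a single upsert per reading. The sketch below builds that operation. The limit of 100 mirrors txCount on the slide, while the helper bucketUpsert and the collection name measurements are assumptions.

```javascript
// Sketch only: one upsert per reading. A bucket accepts readings while
// txCount is below the limit; once it is full the filter no longer matches
// and the upsert starts a new bucket document.
// bucketUpsert and the collection name are hypothetical.
function bucketUpsert(sensorId, reading, limit = 100) {
  return {
    filter: { sensor_id: sensorId, txCount: { $lt: limit } },
    update: {
      $push: { measurements: reading },
      $inc: { txCount: 1 }
    },
    options: { upsert: true }
  };
}

const op = bucketUpsert(520124, {
  date: new Date("2017-02-25T17:13:00.000Z"),
  temperature: 9,
  humidity: 60
});
console.log(JSON.stringify(op, null, 2));
// In the mongo shell: db.measurements.updateOne(op.filter, op.update, op.options)
```

Capping the count keeps each document's size bounded, which speaks to the "Size of a Document?" question.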

Things to consider
• How will user access the data?

Bucketing by time and transactions
{
  sensor_id : 520124,
  start_date : ISODate("2017-02-25T17:00:00.000Z"),
  end_date : ISODate("2017-02-25T18:00:00.000Z"),
  measurements : [
    {
      date : ISODate("2017-02-25T17:13:00.000Z"),
      temperature : 9,
      humidity : 60
    },
    {
      date : ISODate("2017-02-25T17:13:05.000Z"),
      temperature : 15,
      humidity : 55
    }, … ],
  txCount : 100
}
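The start_date and end_date of a time bucket can be derived from each reading's timestamp. The sketch below assumes one-hour buckets in UTC, matching the window on the slide. The helper hourBucket is hypothetical.

```javascript
// Sketch only: computes the one-hour window (UTC) that a reading falls into.
// One-hour buckets match the slide; hourBucket is a hypothetical helper.
const HOUR_MS = 60 * 60 * 1000;

function hourBucket(date) {
  const start = new Date(Math.floor(date.getTime() / HOUR_MS) * HOUR_MS);
  const end = new Date(start.getTime() + HOUR_MS);
  return { start_date: start, end_date: end };
}

const window = hourBucket(new Date("2017-02-25T17:13:05.000Z"));
console.log(window.start_date.toISOString(), window.end_date.toISOString());
```

Queries for a time range can then match on start_date and end_date without scanning every measurement.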

Things to consider
• How will user access the data?
• How will Data Scientists access the data?

Bucketing by time & transactions with pre-aggregation
{
  sensor_id : 520124,
  start_date : ISODate("2017-02-25T17:00:00.000Z"),
  end_date : ISODate("2017-02-25T18:00:00.000Z"),
  measurements : [
    {
      date : ISODate("2017-02-25T17:13:00.000Z"),
      temperature : 9,
      humidity : 60
    },
    {
      date : ISODate("2017-02-25T17:13:05.000Z"),
      temperature : 15,
      humidity : 55
    }, … ],
  txCount : 100,
  avg_temperature : 10,
  avg_humidity : 53
}
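One way to keep the averages current is to maintain running sums alongside the readings. The sketch below does this in memory. The sum_temperature and sum_humidity fields and the helper addReading are assumptions, since the slide stores only the averages.

```javascript
// Sketch only: keeps running sums next to the readings so averages are a
// cheap read. The sum_* fields and addReading are assumptions; the slide
// stores the averages directly.
function addReading(bucket, reading) {
  const txCount = bucket.txCount + 1;
  const sumT = bucket.sum_temperature + reading.temperature;
  const sumH = bucket.sum_humidity + reading.humidity;
  return {
    ...bucket,
    measurements: [...bucket.measurements, reading],
    txCount,
    sum_temperature: sumT,
    sum_humidity: sumH,
    avg_temperature: sumT / txCount,
    avg_humidity: sumH / txCount
  };
}

let bucket = {
  sensor_id: 520124,
  measurements: [],
  txCount: 0,
  sum_temperature: 0,
  sum_humidity: 0
};
bucket = addReading(bucket, { temperature: 9, humidity: 60 });
bucket = addReading(bucket, { temperature: 15, humidity: 55 });
console.log(bucket.avg_temperature, bucket.avg_humidity);
```

Against a live database, the sums and count could be updated with $inc in the same write that pushes the reading, with the average computed at read time.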

TOOLS

MongoDB Compass

MGenerateJS
{
  "first_name" : { "$string" : { "length" : 30 }},
  "last_name" : { "$string" : { "length" : 30 }},
  "cell" : "$number",
  "city" : { "$string" : { "length" : 30 }},
  "location" : [ "$number", "$number"],
  "professions" : { "$array" : [
    { "$choose" : [ "banking", "finance", "trader" ] },
    { "$number": [1, 3] }
  ] },
  "physicians" : { "$array" : [
    {
      "name" : { "$string" : { "length" : 30 }},
      "last_visit" : { "$string" : { "length" : 30 }},
      "last_visit_dt" : "$datetime"
    },
    { "$number" : [1, 5]}
  ] }
}
• Model schema using JSON
• Generate thousands to millions of docs
• Try out queries
• Measure performance
• Iterate!!

Conclusion
• MongoDB Schema Design is different
• MongoDB Schema is flexible and supports modern requirements
• Embedding Documents is key and removes the need for joins
• Pre-Aggregation will improve query performance
• You can still have “relations” – use patterns
• Buckets improve the overall experience when dealing with big data
• What are your next steps?

Resources
university.mongodb.com explore.mongodb.com mongodb.com/user-groups
