OCTOBER 12, 2017 | BESPOKE | SAN FRANCISCO
#MDBlocal
JUMP START
Introduction to Schema Design
Sigfrido “Sig” Narváez
Sr. Solution Architect
@SigNarvaez
Why this talk?
1) Schema Design in MongoDB is important
2) Schema Design in MongoDB differs from Relational
3) New patterns that were not possible before
4) Almost all performance issues are due to Schema Design
What I want you to get out of this talk
At the end of this presentation, you should be able to:
1) Understand the Document Model
2) Leverage the Document Model to Design your Use Case
3) Optimize your Schema for humongous workloads, for GIANT ideas
Variety of Use Cases
High-Level Schema Design
Relational Schema Design
https://iedei.files.wordpress.com/2012/04/mercedes-f1-car-disassembled.jpeg
MongoDB Schema Design
http://www.northlodge.org/f1/2002/bar/bar-2002-01-honda-02.jpg
How MongoDB Stores Data
MongoDB uses JSON Documents
JSON is a widely used standard
JSON is fast, lean, flexible
JSON is supported by almost all languages and tools
JSON is used for modern communication & data exchange
MongoDB stores BSON (Binary JSON)
Documents are Rich Data Structures
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: { type: 'Point',
              coordinates: [45.123, 47.232] },
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Fields
Typed field values
Fields can contain arrays
Fields can contain an array of sub-documents
ORM not needed
Index/Query at any level
Find car collectors within a 10-mile radius of London who own cars from the 1970s
http://static2.uk.businessinsider.com/image/57925947dd089545048b49d4-1200/jay-lenos-garage-jaguar-e-type.jpg
Rich indexes
db.collectors.createIndex({
  "location": "2dsphere",
  "cars.year": 1
});
Expressive Queries
[Query screenshot; callouts: London, 10 Miles (in Radians), 1970s]
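The query shown on this slide did not survive as text. A minimal sketch of what such a filter could look like, assuming the `location` and `cars.year` fields from the index above. The London coordinates and the `collectorsNear` helper are illustrative assumptions, not taken from the talk. `$centerSphere` takes its radius in radians, hence the division by the Earth's radius in miles.

```javascript
// Approximate radius of the Earth in miles, used to convert miles to radians.
const EARTH_RADIUS_MILES = 3963.2;

// Hypothetical helper: builds a filter for collectors near a point who own
// at least one car built within [fromYear, toYear].
function collectorsNear(lng, lat, miles, fromYear, toYear) {
  return {
    location: {
      $geoWithin: { $centerSphere: [[lng, lat], miles / EARTH_RADIUS_MILES] }
    },
    // $elemMatch makes both bounds apply to the same car in the array.
    cars: { $elemMatch: { year: { $gte: fromYear, $lte: toYear } } }
  };
}

// Assumed coordinates for central London, as [longitude, latitude].
const filter = collectorsNear(-0.1276, 51.5072, 10, 1970, 1979);
console.log(JSON.stringify(filter, null, 2));
// In the mongo shell this could be passed as: db.collectors.find(filter)
```

Without `$elemMatch`, the two year bounds could be satisfied by different cars in the same array, which is usually not what the question intends.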
Documents are FLEXIBLE
Different Documents in the same ProductsCatalog collection in MongoDB
Polymorphic Schema – Aligns with OOP principles
Car Paint products
{
  sku: 'PAINTZXC123',
  product_name: 'Metallic Paint',
  colors: ['Red', 'Green'],
  size_gallons: [5, 10]
}
Car T-Shirt products
{
  sku: 'TSHRTASD43546',
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  colors: ['Heather Gray' … ],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}
Car Wheels products
{
  sku: 'WHEELBVCX6543',
  product_name: '19" 5-spoke',
  material: 'aluminum alloy',
  color: 'silver',
  frame_material: 'aluminum',
  package_height: '20.5x32.9x55',
  weight_lbs: 5.15,
  wheel_size_in: 19
}
Flexible Data Governance
All products must have an SKU and Name
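One way to express this rule is a collection validator. The sketch below assumes the `$jsonSchema` operator (available from MongoDB 3.6) and the `sku` and `product_name` field names from the catalog example. `missingRequired` is a hypothetical local helper that only mimics the `required` check so the example runs without a server.

```javascript
// Options document that could be passed to
//   db.createCollection("ProductsCatalog", options)
const options = {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["sku", "product_name"],
      properties: {
        sku: { bsonType: "string" },
        product_name: { bsonType: "string" }
      }
    }
  }
};

// Hypothetical stand-in for the server-side check:
// lists the required fields that a document lacks.
function missingRequired(doc, schema) {
  return schema.required.filter((field) => !(field in doc));
}

const schema = options.validator.$jsonSchema;
console.log(missingRequired({ sku: "PAINTZXC123", product_name: "Metallic Paint", colors: ["Red"] }, schema)); // []
console.log(missingRequired({ product_name: "T-shirt" }, schema)); // [ 'sku' ]
```

Fields outside `required` and `properties` stay unconstrained, so the polymorphic product documents remain valid.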
Schema Design
Why Schema Design is important
Almost all performance issues are related to Schema Design
It’s all about the data
Questions to keep in mind
Data access patterns?
Number of Reads vs Updates?
Expected size of a Document?
Collections & Documents
People Collection
{
  id: 1,
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London'
}
{
  id: 2,
  first_name: 'Ortega',
  surname: 'Alvaro',
  city: 'Valencia'
}
Cars Collection
{
  model: 'Bentley',
  year: 1973,
  color: 'silver',
  owner: 1
}
{
  model: 'Rolls Royce',
  year: 1965,
  color: 'gold',
  owner: 1
}
Embedding Documents
People Collection
{
  id: 1,
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London'
}
Car Collection
{
  model: 'Bentley',
  year: 1973,
  color: 'silver',
  owner: 1
}
{
  model: 'Rolls Royce',
  year: 1965,
  color: 'gold',
  owner: 1
}
Embedding Documents
People Collection
{
  id: 1,
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London'
}
Car Collection
{
  model: 'Bentley',
  year: 1973,
  color: 'silver',
  owner: 1
}
{
  model: 'Rolls Royce',
  year: 1965,
  color: 'gold',
  owner: 1
}
Car Collector Collection
No need for JOINS!
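The merge of the two collections into one can be sketched in plain JavaScript. This is an illustration under assumptions: `embedCars` is a hypothetical helper, and `collectors` is only an assumed name for the resulting collection.

```javascript
// Sample data from the slides: two separate collections linked by owner -> id.
const people = [{ id: 1, first_name: "Paul", surname: "Miller", city: "London" }];
const cars = [
  { model: "Bentley", year: 1973, color: "silver", owner: 1 },
  { model: "Rolls Royce", year: 1965, color: "gold", owner: 1 }
];

// Hypothetical helper: folds each person's cars into an embedded array.
function embedCars(people, cars) {
  return people.map((person) => ({
    ...person,
    cars: cars
      .filter((car) => car.owner === person.id)
      .map(({ owner, ...rest }) => rest) // the owner reference is no longer needed
  }));
}

const collectors = embedCars(people, cars);
console.log(JSON.stringify(collectors[0], null, 2));
// "Which cars belong to Paul?" then becomes a single read, e.g. in the mongo shell:
//   db.collectors.findOne({ first_name: "Paul" }).cars
```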
Which Cars belong to Paul?
How fast can you answer this?
What about what I know from relational?
REFERENCING & EMBEDDING
https://docs.mongodb.com/manual/core/data-modeling-introduction/
Healthcare use case
1-1
Embed – weak entity
1-M
Modeled in 2 possible ways
Embed
Patients
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [
    {
      id: 12345,
      date: ISODate("2015-02-15"),
      type: "Cat scan",
      … },
    {
      id: 12346,
      date: ISODate("2015-02-15"),
      type: "blood test",
      … }]
}
Reference
Patients
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [12345, 12346]
}
Procedures
{
  _id: 12345,
  date: ISODate("2015-02-15"),
  type: "Cat scan",
  … }
{
  _id: 12346,
  date: ISODate("2015-02-15"),
  type: "blood test",
  … }
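With the referenced model, reading a patient together with their procedures needs a second step. A minimal sketch, assuming collections named `patients` and `procedures` and an output field `procedure_docs` (all three names are assumptions). The pipeline uses the `$lookup` stage. `resolveProcedures` is a hypothetical in-memory equivalent so the example runs without a server, and dates are omitted for brevity.

```javascript
// Referenced model from the slide: the patient holds procedure ids.
const patients = [{ _id: 2, first: "Joe", last: "Patient", procedures: [12345, 12346] }];
const procedures = [
  { _id: 12345, type: "Cat scan" },
  { _id: 12346, type: "blood test" }
];

// Pipeline that could be run as db.patients.aggregate(pipeline).
const pipeline = [
  { $match: { _id: 2 } },
  {
    $lookup: {
      from: "procedures",
      localField: "procedures",
      foreignField: "_id",
      as: "procedure_docs"
    }
  }
];

// In-memory equivalent of the $lookup stage, to show what the extra read does.
function resolveProcedures(patient, allProcedures) {
  const wanted = new Set(patient.procedures);
  return { ...patient, procedure_docs: allProcedures.filter((p) => wanted.has(p._id)) };
}

const joined = resolveProcedures(patients[0], procedures);
console.log(joined.procedure_docs.map((p) => p.type)); // [ 'Cat scan', 'blood test' ]
```

The embedded model returns the same information in one read, at the cost of a larger patient document.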
M-M
Physicians: name, specialty, phone
Hospitals: name
Join table HosPhysicanRel: hospitalId, physicianId
X No Join Tables
Use arrays instead
M-M
Embedding Physicians in Hospitals collection
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [
    {
      id: 12345,
      name: "Joe Doctor",
      address: {…},
      … },
    {
      id: 12346,
      name: "Mary Well",
      address: {…},
      … }]
}
{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [
    {
      id: 63633,
      name: "Harold Green",
      address: {…},
      … },
    {
      id: 12345,
      name: "Joe Doctor",
      address: {…},
      … }]
}
Data Duplication … is ok!
M-M
Referencing
Hospitals
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [12345, 12346]
}
{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [63633, 12345]
}
Physicians
{
  id: 63633,
  name: "Harold Green",
  hospitals: [2],
  … }
{
  id: 12345,
  name: "Joe Doctor",
  hospitals: [1, 2],
  … }
{
  id: 12346,
  name: "Mary Well",
  hospitals: [1],
  … }
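When both sides hold an array of references, each new link has to be written twice. A minimal sketch of keeping the two arrays in step, assuming `$addToSet` updates sent to each collection. `linkUpdates` and `addToSet` are hypothetical helpers, and the physician in the example is made up.

```javascript
// Update documents for the two sides of the relationship. Each could be sent
// with updateOne and a filter on the hospital's or the physician's id.
function linkUpdates(hospitalId, physicianId) {
  return {
    hospitalUpdate: { $addToSet: { physicians: physicianId } },
    physicianUpdate: { $addToSet: { hospitals: hospitalId } }
  };
}

// In-memory stand-in for $addToSet: append only if the value is not present yet.
function addToSet(array, value) {
  if (!array.includes(value)) array.push(value);
  return array;
}

const hospital = { _id: 2, name: "Plainmont Hospital", physicians: [63633, 12345] };
const physician = { id: 99999, name: "Hypothetical Physician", hospitals: [1] };

addToSet(hospital.physicians, physician.id);
addToSet(physician.hospitals, hospital._id);
addToSet(physician.hospitals, hospital._id); // repeating the link changes nothing
console.log(hospital.physicians, physician.hospitals);
```

The two writes touch different documents, so the application has to handle the case where only one of them succeeds.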
MongoDB Schema Design Patterns
Entertainment Use Case
SUBSET Pattern
• You want to display dependent information, but only part of it
• The rest of the data is fetched only if needed
• Examples:
‒ The Cast of a Movie
‒ Last 10 Movies an Actor has starred in
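A minimal sketch of the first example, keeping only the main cast inside the movie document. The helper `withCastSubset`, the field names `main_cast` and `cast_size`, and the sample data are all assumptions made for illustration.

```javascript
// Hypothetical helper: embeds only the first `n` cast members, trimmed to the
// fields needed for display. The full cast would live in its own collection
// and be read only when the user asks for it.
function withCastSubset(movie, fullCast, n) {
  return {
    ...movie,
    main_cast: fullCast.slice(0, n).map(({ actor_id, name }) => ({ actor_id, name })),
    cast_size: fullCast.length
  };
}

// Illustrative data, not from the talk.
const fullCast = [
  { actor_id: 1, name: "Actor A", role: "Lead", bio: "long text" },
  { actor_id: 2, name: "Actor B", role: "Support", bio: "long text" },
  { actor_id: 3, name: "Actor C", role: "Extra", bio: "long text" }
];

const movie = withCastSubset({ _id: 100, title: "Example Movie" }, fullCast, 2);
console.log(movie);
```

The embedded subset is a duplicate of data held elsewhere, so it needs refreshing whenever the underlying cast entries change.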
SCHEMA DESIGN PATTERNS
Address a precise use case in a problem
• Similar to GoF Patterns for OOD
• Not a modeling of relationships or a full "solution" to a problem
NoSQL patterns often address performance
• E.g. reduced reads
• If performance is not an issue, design for simplicity
Data Duplication
• It's OK!
• Data change & update cadence
• Understand the implications of stale data
• Define a Correction Mechanism
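A correction mechanism can be as simple as one update that rewrites every duplicated copy. The sketch below uses the hospitals that embed physicians as its example. `renamePhysician` and `applyRename` are hypothetical helpers; the filter and update rely on the positional `$` operator, and the in-memory loop only mimics their effect.

```javascript
// Filter and update that could be sent as db.hospitals.updateMany(filter, update).
// The positional $ operator targets the array element matched by the filter.
function renamePhysician(physicianId, newName) {
  return {
    filter: { "physicians.id": physicianId },
    update: { $set: { "physicians.$.name": newName } }
  };
}

// In-memory stand-in that applies the same correction to every embedded copy
// and reports how many copies were changed.
function applyRename(hospitals, physicianId, newName) {
  let corrected = 0;
  for (const hospital of hospitals) {
    for (const physician of hospital.physicians) {
      if (physician.id === physicianId) {
        physician.name = newName;
        corrected += 1;
      }
    }
  }
  return corrected;
}

const hospitals = [
  { _id: 1, physicians: [{ id: 12345, name: "Joe Doctor" }, { id: 12346, name: "Mary Well" }] },
  { _id: 2, physicians: [{ id: 63633, name: "Harold Green" }, { id: 12345, name: "Joe Doctor" }] }
];
console.log(applyRename(hospitals, 12345, "Joseph Doctor")); // 2
```

How often such a correction runs depends on the update cadence and on how long stale copies can be tolerated.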
Full talk from MDBW17 – here today!
https://explore.mongodb.com
Internet of Things Use Cases
Weather Use Case
Our Weather Company
• Weather history and forecast service
• Weather station is on the network
• Different measurements (temperature, wind speed, humidity, etc.)
• We currently cover Chicago region with a few hundred stations
• Many users looking for historic details and forecasting
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg
Data Modeling for Time Series
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg
Document per measurement
{
  sensor_id: 520124,
  date: ISODate("2017-02-25T17:00:00.000Z"),
  temperature: 9,
  humidity: 60
}
{
  sensor_id: 520124,
  date: ISODate("2017-02-25T17:01:00.000Z"),
  temperature: 10,
  humidity: 60
}
{
  sensor_id: 520124,
  date: ISODate("2017-02-25T17:02:00.000Z"),
  temperature: 10,
  humidity: 59
}
Data Modeling for Time Series
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg
Things to consider
• Relational Approach?
• What will happen if we scale?
(Region, Frequency, Stations)
• Data & Index Size?
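A back-of-the-envelope calculation makes the scaling question concrete. The numbers are assumptions, not figures from the talk: 300 stations (the deck only says "a few hundred"), one reading per minute, and a 365-day year. It compares one document per measurement with grouping an hour of readings into a single document.

```javascript
// Assumptions (not from the talk): 300 stations, one reading per minute.
const STATIONS = 300;
const READINGS_PER_HOUR = 60;
const HOURS_PER_YEAR = 24 * 365;

// Number of documents written per year when each document holds
// `readingsPerDoc` measurements.
function docsPerYear(stations, readingsPerDoc) {
  const readings = stations * READINGS_PER_HOUR * HOURS_PER_YEAR;
  return Math.ceil(readings / readingsPerDoc);
}

console.log(docsPerYear(STATIONS, 1));  // one document per measurement: 157680000
console.log(docsPerYear(STATIONS, 60)); // one document per sensor per hour: 2628000
```

Every index on the collection holds an entry per document, so the document count drives index size as well as data size.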
Data Modeling for Time Series
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg
Bucketing
{
  sensor_id: 520124,
  measurements: [
    {
      date: ISODate("2017-02-25T17:13:00.000Z"),
      temperature: 9,
      humidity: 60
    },
    {
      date: ISODate("2017-02-25T17:13:05.000Z"),
      temperature: 15,
      humidity: 55
    }, … ]
}
Data Modeling for Time Series
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg
Things to consider
• Relational Approach?
• What will happen if we scale?
• Data & Index Size?
• Size of a Document?
Data Modeling for Time Series
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg
Bucketing by transactions
{
  sensor_id: 520124,
  measurements: [
    {
      date: ISODate("2017-02-25T17:13:00.000Z"),
      temperature: 9,
      humidity: 60
    },
    {
      date: ISODate("2017-02-25T17:13:05.000Z"),
      temperature: 15,
      humidity: 55
    }, … ],
  txCount: 100
}
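A minimal sketch of how a reading could be appended to a count-limited bucket. It assumes an upsert whose filter only matches buckets that still have room; the capacity of 100, the collection name `measurements` and both helper functions are assumptions. `addToBuckets` mimics the upsert in memory so the example runs without a server.

```javascript
const MAX_PER_BUCKET = 100; // assumed bucket capacity

// Filter and update that could be sent as
//   db.measurements.updateOne(filter, update, { upsert: true })
function bucketUpsert(sensorId, measurement) {
  return {
    filter: { sensor_id: sensorId, txCount: { $lt: MAX_PER_BUCKET } },
    update: { $push: { measurements: measurement }, $inc: { txCount: 1 } }
  };
}

// In-memory stand-in for the upsert: reuse an open bucket or start a new one.
function addToBuckets(buckets, sensorId, measurement) {
  let bucket = buckets.find((b) => b.sensor_id === sensorId && b.txCount < MAX_PER_BUCKET);
  if (!bucket) {
    bucket = { sensor_id: sensorId, measurements: [], txCount: 0 };
    buckets.push(bucket);
  }
  bucket.measurements.push(measurement);
  bucket.txCount += 1;
  return bucket;
}

const buckets = [];
for (let i = 0; i < 250; i++) {
  const date = new Date(Date.UTC(2017, 1, 25, 17, 0, i));
  addToBuckets(buckets, 520124, { date, temperature: 9, humidity: 60 });
}
console.log(buckets.map((b) => b.txCount)); // [ 100, 100, 50 ]
```

Capping the count keeps each document's size predictable, which is one answer to the "Size of a Document?" question.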
Data Modeling for Time Series
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg
Things to consider
• Relational Approach?
• What will happen if we scale?
• Data & Index Size?
• Size of a Document?
• How will users access the data?
Data Modeling for Time Series
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg
Bucketing by time and transactions
{
  sensor_id: 520124,
  start_date: ISODate("2017-02-25T17:00:00.000Z"),
  end_date: ISODate("2017-02-25T18:00:00.000Z"),
  measurements: [
    {
      date: ISODate("2017-02-25T17:13:00.000Z"),
      temperature: 9,
      humidity: 60
    },
    {
      date: ISODate("2017-02-25T17:13:05.000Z"),
      temperature: 15,
      humidity: 55
    }, … ],
  txCount: 100
}
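The hour boundaries in `start_date` and `end_date` can be derived from a reading's timestamp. A minimal sketch, assuming one-hour buckets aligned to UTC as in the sample document; `hourBucket` and `bucketFilter` are hypothetical helpers.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Returns the start and end of the hour-long bucket that contains `date`.
function hourBucket(date) {
  const start = new Date(Math.floor(date.getTime() / HOUR_MS) * HOUR_MS);
  return { start_date: start, end_date: new Date(start.getTime() + HOUR_MS) };
}

// Filter that selects the bucket for a reading:
// same sensor, same hour, and not yet full.
function bucketFilter(sensorId, date, maxPerBucket) {
  const { start_date, end_date } = hourBucket(date);
  return { sensor_id: sensorId, start_date, end_date, txCount: { $lt: maxPerBucket } };
}

const reading = new Date("2017-02-25T17:13:05.000Z");
console.log(hourBucket(reading).start_date.toISOString()); // 2017-02-25T17:00:00.000Z
console.log(hourBucket(reading).end_date.toISOString());   // 2017-02-25T18:00:00.000Z
```

With the boundaries stored on the bucket, a query for a time range can match on `start_date` and `end_date` without scanning the `measurements` array.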
Data Modeling for Time Series
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg
Things to consider
• Relational Approach?
• What will happen if we scale?
• Data & Index Size?
• Size of a Document?
• How will users access the data?
• How will Data Scientists access the data?
Data Modeling for Time Series
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg
Bucketing by time & transactions with pre-aggregation
{
  sensor_id: 520124,
  start_date: ISODate("2017-02-25T17:00:00.000Z"),
  end_date: ISODate("2017-02-25T18:00:00.000Z"),
  measurements: [
    {
      date: ISODate("2017-02-25T17:13:00.000Z"),
      temperature: 9,
      humidity: 60
    },
    {
      date: ISODate("2017-02-25T17:13:05.000Z"),
      temperature: 15,
      humidity: 55
    }, … ],
  txCount: 100,
  avg_temperature: 10,
  avg_humidity: 53
}
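One way to keep the pre-aggregated fields current is to update a running mean each time a reading is added. This is a sketch under assumptions, not the talk's implementation: `addWithAverages` is a hypothetical helper that works on an in-memory bucket. On a server, a common alternative is to store running sums with `$inc` and derive the average on read.

```javascript
// Adds a reading to a bucket and keeps the pre-computed averages current.
// Running mean: newAvg = oldAvg + (x - oldAvg) / newCount.
function addWithAverages(bucket, measurement) {
  const n = bucket.txCount + 1;
  bucket.measurements.push(measurement);
  bucket.avg_temperature += (measurement.temperature - bucket.avg_temperature) / n;
  bucket.avg_humidity += (measurement.humidity - bucket.avg_humidity) / n;
  bucket.txCount = n;
  return bucket;
}

const bucket = {
  sensor_id: 520124,
  measurements: [],
  txCount: 0,
  avg_temperature: 0,
  avg_humidity: 0
};
addWithAverages(bucket, { temperature: 9, humidity: 60 });
addWithAverages(bucket, { temperature: 15, humidity: 55 });
console.log(bucket.avg_temperature, bucket.avg_humidity); // 12 57.5
```

Readers that only need the hourly average can then skip the `measurements` array entirely.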
TOOLS
MongoDB Compass
MGenerateJS
• Model schema using JSON
• Generate thousands to millions of docs
• Try out queries
• Measure performance
• Iterate!!
{
  "first_name" : { "$string" : { "length" : 30 }},
  "last_name" : { "$string" : { "length" : 30 }},
  "cell" : "$number",
  "city" : { "$string" : { "length" : 30 }},
  "location" : [ "$number", "$number" ],
  "professions" : { "$array" : [
    { "$choose" : [ "banking", "finance", "trader" ] },
    { "$number" : [1, 3] }
  ] },
  "physicians" : { "$array" : [
    {
      "name" : { "$string" : { "length" : 30 }},
      "last_visit" : { "$string" : { "length" : 30 }},
      "last_visit_dt" : "$datetime"
    },
    { "$number" : [1, 5] }
  ] }
}
Conclusion
• MongoDB Schema Design is different
• MongoDB Schema is flexible and supports modern requirements
• Embedding Documents is key and removes the need for joins
• Pre-Aggregation will improve query performance
• You can still have “relations” – use patterns
• Buckets improve the overall experience when dealing with big data
• What are your next steps?
Resources
university.mongodb.com
explore.mongodb.com
mongodb.com/user-groups
THANK YOU

Editor's Notes

  • #3 Welcome everyone, thank you for being here. I am Sig Narvaez, a Sr. Solution Architect based in Southern California. I am in my 3rd year with MongoDB, and before joining I was a customer, an advocate and a MongoDB Master. I know it's early, but let me tell you, I think you have chosen the right jumpstart talk :) Schema design is one of my favorite topics, because this is how I fell in love with technology: once I realized that I could develop and simply write and read my data the way I needed to, it felt liberating. I have worked with relational databases, flat files and even proprietary file formats, and after using JSON and document databases, I realized there was no going back.
  • #4 Now, even though MongoDB has a flexible schema, and yes, we can ingest any data as long as it is JSON, schema design is still very important. Most of us were probably taught to model in relational form throughout university and have applied this in our profession, so we may use the same patterns when we start using MongoDB because, well, it's what we know. But that may or may not be the best way, or the way to get the most value out of this type of database. Some concepts remain relevant and some may not be that useful, and we need to be open to new ways of thinking about and designing data schemas that were not possible before. The last point about why this talk is important: we have learned from our field engineers, who work directly with customers on performance issues, that the root cause, most of the time, is schema design, because the design was either too relational or the JSON document style was taken to the extreme. It's really, really important to understand the tradeoffs.
  • #5 And so what I really want you to get out of this talk is to understand the document model and how to leverage it for designing around your use cases, so that you can go out and build GIANT ideas. ROADMAP / AGENDA ?
  • #6 Because MongoDB is used
  • #7 MongoDB is used in many, many use cases, so I am going to use different use cases / industries (automotive, healthcare, IoT).
  • #8 In relational, we are taught to model the physical world by breaking it apart into two-dimensional concepts, because this is what the database can store: tables of rows and columns. To express complex objects, we have to break them down into their components, forced into two dimensions. READS: assemble the whole every time. WRITES: disassemble the whole; locks are all-or-nothing. ROUNDTRIPS: assemble/disassemble on every trip. Error-prone, slow, non-agile, etc. Frameworks (ORMs) to the rescue, with more moving parts.
  • #9 LOGIC: deals with the whole, not parts. ENGINEER'S MINDSET: objects, not tabular. APIs: JSON, RESTful, objects over the wire. READS: read the whole in a single operation (or what is needed most of the time). WRITES: write the whole, or update sub-parts; atomic. DATA MODEL: model for use cases, performance, flexibility. Breaking previous rules (e.g. de-normalizing) is OK!
  • #21 The NATURAL way of modeling data in MongoDB is to use embedding.
  • #25 You can reference. No RI, no cascades – these are app-level concerns. We have joins: $lookup, $graphLookup.
  • #26 Let’s say we are going to build a medical system where patients have medical procedures, doctors work at many hospitals and vice-versa, and folks want to see this information in a mobile app.
  • #33 Let’s say we are going to build a home movie rental system. Our clients will search mainly by Movies and by Actors. If I pull up a Movie, I want to see the main Cast right away. If I pull up an Actor or Actress, I want to see the movies they have appeared in. This is a natural M-M relationship.
  • #53 Have a great day at the conference – cool new 3.6 stuff