#MDBW17
{Schema Design}
JUMP START
#MDBW17
WHY THIS TALK?
1) Schema Design in MongoDB is important
2) Schema Design in MongoDB differs from Relational
3) Almost all performance issues are due to Schema Design
#MDBW17
WHAT I WANT YOU TO GET OUT OF THIS TALK
At the end of this presentation, you should be able to:
1) Understand the Document Model
2) Leverage the Document Model to Design your Use Case
3) Optimize your Schema for humongous workloads
HIGH LEVEL SCHEMA DESIGN
#MDBW17
RELATIONAL SCHEMA DESIGN
https://iedei.files.wordpress.com/2012/04/mercedes-f1-car-disassembled.jpeg
#MDBW17
MONGODB SCHEMA DESIGN
http://www.northlodge.org/f1/2002/bar/bar-2002-01-honda-02.jpg
How MongoDB Stores Data
#MDBW17
MONGODB USES JSON DOCUMENTS
JSON is a widely used standard
• JSON is fast, lean, flexible
• JSON is supported by almost all languages and tools
• JSON is used for modern communication & data exchange
MongoDB stores BSON (Binary JSON)
#MDBW17
DOCUMENTS ARE RICH DATA STRUCTURES
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: { type: 'Point',
              coordinates: [45.123, 47.232] },
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
• Fields (first_name, surname, city, …)
• Typed field values (cell is a number, location is a GeoJSON point)
• Fields can contain arrays (Profession)
• Fields can contain an array of sub-documents (cars)
Schema Design
#MDBW17
IT IS ALL ABOUT THE DATA
Questions to keep in mind
• Data access patterns?
• Number of Reads vs Updates?
• Expected size of a Document?
Why Schema Design is important
• Almost all performance issues are related to Schema Design
#MDBW17
COLLECTIONS & DOCUMENTS
People Collection
{
  id: 1,
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London'
}
{
  id: 2,
  first_name: 'Ortega',
  surname: 'Alvaro',
  city: 'Valencia'
}
Cars Collection
{
  model: 'Bentley',
  year: 1973,
  color: 'silver',
  owner: 1
}
{
  model: 'Rolls Royce',
  year: 1965,
  color: 'gold',
  owner: 1
}
#MDBW17
EMBEDDING DOCUMENTS
People Collection
{
  id: 1,
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London'
}
Car Collection
{
  model: 'Bentley',
  year: 1973,
  color: 'silver',
  owner: 1
}
{
  model: 'Rolls Royce',
  year: 1965,
  color: 'gold',
  owner: 1
}
#MDBW17
EMBEDDING DOCUMENTS
People Collection
{
  id: 1,
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London'
}
Car Collection
{
  model: 'Bentley',
  year: 1973,
  color: 'silver',
  owner: 1
}
{
  model: 'Rolls Royce',
  year: 1965,
  color: 'gold',
  owner: 1
}
People Collection (cars embedded)
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  cars: [
    {
      model: 'Bentley',
      year: 1973,
      color: 'silver'
    },
    {
      model: 'Rolls Royce',
      year: 1965,
      color: 'gold'
    }
  ]
}
#MDBW17
WHICH CARS BELONG TO PAUL?
HOW FAST CAN YOU ANSWER THIS?
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  cars: [
    {
      model: 'Bentley',
      year: 1973,
      color: 'silver'
    },
    {
      model: 'Rolls Royce',
      year: 1965,
      color: 'gold'
    }
  ]
}
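To make the question concrete, here is a minimal in-memory sketch (plain JavaScript arrays standing in for collections, no MongoDB driver involved). The function names `carsOfReferenced` and `carsOfEmbedded` are hypothetical and only illustrate the two access paths: two lookups for the referenced model versus one document read for the embedded model.

```javascript
// Referenced model: people and cars live in separate "collections".
const people = [
  { id: 1, first_name: 'Paul', surname: 'Miller', city: 'London' },
  { id: 2, first_name: 'Ortega', surname: 'Alvaro', city: 'Valencia' },
];
const cars = [
  { model: 'Bentley', year: 1973, color: 'silver', owner: 1 },
  { model: 'Rolls Royce', year: 1965, color: 'gold', owner: 1 },
];

// Two steps: find the person, then scan the cars for that owner id.
function carsOfReferenced(firstName) {
  const person = people.find((p) => p.first_name === firstName);
  if (!person) return [];
  return cars.filter((c) => c.owner === person.id).map((c) => c.model);
}

// Embedded model: the cars travel with the person document.
const peopleEmbedded = [
  {
    first_name: 'Paul',
    surname: 'Miller',
    city: 'London',
    cars: [
      { model: 'Bentley', year: 1973, color: 'silver' },
      { model: 'Rolls Royce', year: 1965, color: 'gold' },
    ],
  },
];

// One step: the answer is already inside the document we found.
function carsOfEmbedded(firstName) {
  const person = peopleEmbedded.find((p) => p.first_name === firstName);
  return person ? person.cars.map((c) => c.model) : [];
}

console.log(carsOfReferenced('Paul')); // [ 'Bentley', 'Rolls Royce' ]
console.log(carsOfEmbedded('Paul'));   // [ 'Bentley', 'Rolls Royce' ]
```

Both functions return the same answer; the difference is how many places the data has to be fetched from.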
WHY DEVELOPERS LOVE MONGODB
#MDBW17
DOCUMENTS == OBJECTS
#MDBW17
DEVELOPMENT – BEFORE MONGODB
{ CODE }, DB SCHEMA, XML CONFIG
APPLICATION, RELATIONAL DATABASE
OBJECT RELATIONAL MAPPING
#MDBW17
DEVELOPMENT – BEFORE MONGODB
{ CODE }, DB SCHEMA, XML CONFIG
APPLICATION, RELATIONAL DATABASE
OBJECT RELATIONAL MAPPING
Time? Focus?
#MDBW17
DEVELOPMENT – WITH MONGODB
{ CODE }
Agility
APPLICATION
INTERNET OF THINGS USE CASES
#MDBW17
OUR WEATHER COMPANY
• Weather history and forecast service
• Each weather station is on the network
• Different measurements (temperature, wind speed, humidity, etc.)
• We currently cover the Vienna region with a few hundred stations
• Many users looking for historic details and forecasting
http://www.switchdoc.com/wp-content/uploads/2015/01/41-tvY-gqZL.jpg
#MDBW17
DATA MODELLING FOR TIME SERIES
{
  sensor_id : 520124,
  date : ISODate("2017-02-25T17:00:00.000Z"),
  temperature : 9,
  humidity : 60
}
{
  sensor_id : 520124,
  date : ISODate("2017-02-25T17:01:00.000Z"),
  temperature : 10,
  humidity : 60
}
{
  sensor_id : 520124,
  date : ISODate("2017-02-25T17:02:00.000Z"),
  temperature : 10,
  humidity : 59
}
Document per measurement
#MDBW17
DATA MODELLING FOR TIME SERIES
Things to consider
• Relational Approach?
• What will happen if we scale? (Region, Frequency, Stations)
• Data & Index Size?
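A rough back-of-the-envelope calculation shows why the document-per-measurement model raises these questions. The figures below are assumptions for illustration only: 300 stations stands in for "a few hundred", and one measurement per station per minute matches the sample timestamps. `documentsPerDay` is a hypothetical helper.

```javascript
// Assumed workload: every station writes one document per measurement.
function documentsPerDay(stations, measurementsPerMinute) {
  const minutesPerDay = 60 * 24;
  return stations * measurementsPerMinute * minutesPerDay;
}

const perDay = documentsPerDay(300, 1); // assumption: 300 stations, 1 reading/minute
const perYear = perDay * 365;

console.log(perDay);  // 432000
console.log(perYear); // 157680000

// Scaling any one dimension multiplies the total:
// ten times the stations, or a reading every 6 seconds, gives ten times the documents.
console.log(documentsPerDay(3000, 1));  // 4320000
console.log(documentsPerDay(300, 10));  // 4320000
```

Every one of those documents also needs an entry in each index on the collection, so index size grows at the same rate as the data.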
#MDBW17
DATA MODELLING FOR TIME SERIES
{
  sensor_id : 520124,
  measurements : [
    {
      date : ISODate("2017-02-25T17:13:00.000Z"),
      temperature : 9,
      humidity : 60
    },
    {
      date : ISODate("2017-02-25T17:13:05.000Z"),
      temperature : 15,
      humidity : 55
    }, … ],
}
Bucketing
#MDBW17
DATA MODELLING FOR TIME SERIES
Things to consider
• Relational Approach?
• What will happen if we scale?
• Data & Index Size?
• Size of a Document?
#MDBW17
DATA MODELLING FOR TIME SERIES
{
  sensor_id : 520124,
  measurements : [
    {
      date : ISODate("2017-02-25T17:13:00.000Z"),
      temperature : 9,
      humidity : 60
    },
    {
      date : ISODate("2017-02-25T17:13:05.000Z"),
      temperature : 15,
      humidity : 55
    }, … ],
  txCount : 100,
}
Bucketing by transactions
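The idea of capping a bucket by its transaction count can be sketched in memory as follows. This is an illustration, not the presenter's code: `addMeasurement` is a hypothetical helper, and the cap of 100 is an assumption taken from the `txCount : 100` value on the slide.

```javascript
// Append a measurement to the sensor's open bucket, or start a new bucket
// once the current one has reached the cap.
function addMeasurement(buckets, sensorId, measurement, bucketSize = 100) {
  let bucket = buckets.find(
    (b) => b.sensor_id === sensorId && b.txCount < bucketSize
  );
  if (!bucket) {
    bucket = { sensor_id: sensorId, measurements: [], txCount: 0 };
    buckets.push(bucket);
  }
  bucket.measurements.push(measurement);
  bucket.txCount += 1;
  return bucket;
}

// Usage: 250 readings for one sensor end up in three buckets.
const buckets = [];
for (let i = 0; i < 250; i++) {
  addMeasurement(buckets, 520124, {
    date: new Date(Date.UTC(2017, 1, 25, 17, 0, i)),
    temperature: 9,
    humidity: 60,
  });
}
console.log(buckets.map((b) => b.txCount)); // [ 100, 100, 50 ]
```

Against a real MongoDB collection the same effect is commonly achieved with a single upsert that filters on `txCount` being below the cap and applies `$push` and `$inc`; the exact call depends on the driver in use.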
#MDBW17
DATA MODELLING FOR TIME SERIES
Things to consider
• Relational Approach?
• What will happen if we scale?
• Data & Index Size?
• Size of a Document?
• How will user access the data?
#MDBW17
DATA MODELLING FOR TIME SERIES
{
  sensor_id : 520124,
  start_date : ISODate("2017-02-25T17:00:00.000Z"),
  end_date : ISODate("2017-02-25T18:00:00.000Z"),
  measurements : [
    {
      date : ISODate("2017-02-25T17:13:00.000Z"),
      temperature : 9,
      humidity : 60
    },
    {
      date : ISODate("2017-02-25T17:13:05.000Z"),
      temperature : 15,
      humidity : 55
    }, … ],
  txCount : 100,
}
Bucketing by time and transactions
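A small sketch of how the `start_date` and `end_date` boundaries could be derived and used. The one-hour window is an assumption read off the sample document, and both `hourBounds` and `findBuckets` are hypothetical helpers working on in-memory arrays.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Round a timestamp down to the start of its UTC hour and return the window.
function hourBounds(date) {
  const start = new Date(Math.floor(date.getTime() / HOUR_MS) * HOUR_MS);
  const end = new Date(start.getTime() + HOUR_MS);
  return { start_date: start, end_date: end };
}

// Select the buckets of one sensor whose window overlaps [from, to).
function findBuckets(buckets, sensorId, from, to) {
  return buckets.filter(
    (b) => b.sensor_id === sensorId && b.start_date < to && b.end_date > from
  );
}

const bounds = hourBounds(new Date('2017-02-25T17:13:05.000Z'));
console.log(bounds.start_date.toISOString()); // 2017-02-25T17:00:00.000Z
console.log(bounds.end_date.toISOString());   // 2017-02-25T18:00:00.000Z
```

With the window stored on the bucket, a user asking for "the last hour" only needs to match on `sensor_id` and the two boundary fields instead of inspecting every measurement.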
#MDBW17
DATA MODELLING FOR TIME SERIES
Things to consider
• Relational Approach?
• What will happen if we scale?
• Data & Index Size?
• Size of a Document?
• How will user access the data?
• How will Data Scientists access the data?
#MDBW17
DATA MODELLING FOR TIME SERIES
{
  sensor_id : 520124,
  start_date : ISODate("2017-02-25T17:00:00.000Z"),
  end_date : ISODate("2017-02-25T18:00:00.000Z"),
  measurements : [
    {
      date : ISODate("2017-02-25T17:13:00.000Z"),
      temperature : 9,
      humidity : 60
    },
    {
      date : ISODate("2017-02-25T17:13:05.000Z"),
      temperature : 15,
      humidity : 55
    }, … ],
  txCount : 100,
  sum_temperature : 1600,
  sum_humidity : 5700
}
Bucketing by time and transactions with pre-aggregation
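The benefit of the pre-aggregated fields is that an average becomes a division instead of a pass over the array. The sketch below keeps the running sums up to date on every write; `addWithPreAggregation` and `averages` are hypothetical helpers, and the field names are taken from the sample document.

```javascript
// Keep the running totals in step with every appended measurement.
function addWithPreAggregation(bucket, m) {
  bucket.measurements.push(m);
  bucket.txCount += 1;
  bucket.sum_temperature += m.temperature;
  bucket.sum_humidity += m.humidity;
}

// Averages come straight from the stored sums; the array is never scanned.
function averages(bucket) {
  return {
    temperature: bucket.sum_temperature / bucket.txCount,
    humidity: bucket.sum_humidity / bucket.txCount,
  };
}

// With the totals from the sample document:
const sampleTotals = { txCount: 100, sum_temperature: 1600, sum_humidity: 5700 };
console.log(averages(sampleTotals)); // { temperature: 16, humidity: 57 }

// Building a bucket from the two measurements shown above:
const bucket = {
  sensor_id: 520124,
  measurements: [],
  txCount: 0,
  sum_temperature: 0,
  sum_humidity: 0,
};
addWithPreAggregation(bucket, { temperature: 9, humidity: 60 });
addWithPreAggregation(bucket, { temperature: 15, humidity: 55 });
console.log(averages(bucket)); // { temperature: 12, humidity: 57.5 }
```

The trade-off is a slightly more expensive write in exchange for reads that touch three numbers per bucket.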
#MDBW17
MONGODB COMPASS
After 1 minute of writing into both the simple schema and the bucket schema (buckets of 100 documents)
#MDBW17
CONCLUSION
• MongoDB Schema Design is different from relational schema design
• MongoDB Schema is flexible and supports modern requirements
• Embedding Documents is key and removes the need for joins
• Pre-Aggregation will improve query performance
• Buckets improve the overall experience when dealing with big data
• What are your next steps?
MongoDB Schema Design
