MongoDB Schema Design
Tips & Tricks
Grupo Undanet
August 2017, Salamanca
Who am I
Juan Roy
Twitter: @juanroycouto
Email: juanroycouto@gmail.com
MongoDB DBA at Grupo Undanet
How to model your data for MongoDB. Design your schema for easy scalability, performance and simplicity.

Published in: Data & Analytics

  1. MongoDB Schema Design Tips & Tricks
     Grupo Undanet
     August 2017, Salamanca
  2. Who am I
     Juan Roy
     Twitter: @juanroycouto
     Email: juanroycouto@gmail.com
     MongoDB DBA at Grupo Undanet
  3. Agenda
     ● What is MongoDB
     ● What is a JSON Document
     ● What a Document Must Contain
     ● Relational Approach vs Document Model
     ● Normalization vs Denormalization
     ● Embedding Documents
     ● Things to Keep in Mind
     ● Goals
     ● Over Normalization
     ● Overloaded Documents
     ● Working Set
     ● Historic Information
     ● 1-1
     ● 1-Few (Embedding & Referencing)
     ● N-1
     ● 1-Many
     ● Many-Many
     ● Recap
  4. What is MongoDB
     ● Non-Relational Database
     ● NoSQL Multipurpose Database
     ● Main Characteristics:
       ○ Scalability
       ○ High Availability
       ○ Automatic Failover
       ○ …
     ● Document-based (JSON)

     SQL        MongoDB
     Database   Database
     Table      Collection
     Row        Document
  5. What is a JSON Document
     {
       "_id" : ObjectId("59400587962fe33db2194129"),
       "description" : "MICHELIN 285/30 ZR21 PILOT SUPER SPORT 2012",
       "date" : ISODate("2017-08-28T04:02:32Z"),
       "property" : {
         "tag" : {
           "noisebands" : "1",
           "rollingresistance" : "B",
           "noise" : "69",
           "wetgrip" : "A"
         },
         "ratio" : 30
       },
       "ecotasa" : [
         { "country" : "724", "price" : NumberDecimal("1.380000") },
         { "country" : "620", "price" : NumberDecimal("0.000000") }
       ],
       "location" : {
         "type" : "Point",
         "coordinates" : [ -5.724332, 40.959219 ]
       }
     }
     Field types shown: _id ("_id"), string ("description"), date ("date"), subdocument ("property"), number ("ratio"), array ("ecotasa"), geo-location ("location").
  6. What a Document Must Contain
     ● Ideally
       ○ All the data related to the application's principal item
       ○ 1 document per item
     ● In practice
       ○ The most frequently accessed data

     Application   Principal Item
     Catalog       Article
     Finance       Client
  7. Relational Approach vs Document Model
     {
       "_id" : ObjectId("59400587962fe33db2194129"),
       "description" : "MICHELIN 285/30 ZR21 PILOT SUPER SPORT 2012",
       "date" : ISODate("2017-08-28T04:02:32Z"),
       "property" : {
         "tag" : {
           "noisebands" : "1",
           "rollingresistance" : "B",
           "noise" : "69",
           "wetgrip" : "A"
         },
         "ratio" : 30
       },
       "ecotasa" : [
         { "country" : "724", "price" : NumberDecimal("1.380000") },
         { "country" : "620", "price" : NumberDecimal("0.000000") }
       ],
       "location" : {
         "type" : "Point",
         "coordinates" : [ -5.724332, 40.959219 ]
       }
     }
  8. Normalization vs Denormalization
     Normalization (separate collections):
       People
       { _id : 1, name : 'Peter', city : 'Salamanca' }
       Motorbikes
       { _id : 1, owner : 1, color : 'red', model : 'Suzuki' }
       { _id : 2, owner : 1, color : 'black', model : 'Harley Davidson' }
     Denormalization (embedded array):
       People
       {
         _id : 1,
         name : 'Peter',
         city : 'Salamanca',
         motorbikes : [
           { model : 'Suzuki', color : 'red' },
           { model : 'Harley Davidson', color : 'black' }
         ]
       }
  9. Embedding Documents
     People
     { _id : 1, name : 'Peter', city : 'Salamanca' }
     Motorbikes
     { _id : 1, owner : 1, color : 'red', model : 'Suzuki' }
     { _id : 2, owner : 1, color : 'black', model : 'Harley Davidson' }

     People (with the motorbikes embedded)
     {
       _id : 1,
       name : 'Peter',
       city : 'Salamanca',
       motorbikes : [
         { model : 'Suzuki', color : 'red' },
         { model : 'Harley Davidson', color : 'black' }
       ]
     }
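The move from the two collections to the single embedded document can be sketched in plain JavaScript. This is only an illustration over in-memory arrays, not MongoDB driver code, and `embedMotorbikes` is a hypothetical helper name that does not appear in the slides.

```javascript
// Normalized input, as on the slide: one array stands in for each collection.
const people = [{ _id: 1, name: 'Peter', city: 'Salamanca' }];
const motorbikes = [
  { _id: 1, owner: 1, color: 'red', model: 'Suzuki' },
  { _id: 2, owner: 1, color: 'black', model: 'Harley Davidson' },
];

// Hypothetical helper: returns new person documents with their motorbikes embedded.
function embedMotorbikes(peopleDocs, motorbikeDocs) {
  return peopleDocs.map((person) => ({
    ...person,
    motorbikes: motorbikeDocs
      .filter((bike) => bike.owner === person._id)
      // Keep only the descriptive fields; _id and owner are no longer needed.
      .map(({ model, color }) => ({ model, color })),
  }));
}

const denormalized = embedMotorbikes(people, motorbikes);
console.log(JSON.stringify(denormalized, null, 2));
```

After this step a single read of Peter's document returns both motorbikes, which is the point of embedding.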
  10. Things to Keep in Mind
      ● Avoid Relational Approach
      ● What will happen if we scale
      ● Size of:
        ○ Data
        ○ Index
        ○ Document
      ● How will users access the data
        ○ Normal users
        ○ Machine Learning
        ○ Business Intelligence
  11. Goals
      ● Performance
      ● Scalability
      ● Simplicity
  12. Over Normalization
      ● The relational model has been moved directly to the MongoDB model.
      ● In the relational world it is common to have one table per concept, and tables have no arrays.
      ● As a result, a single action requires multiple queries instead of querying the data just once.
  13. Overloaded Documents
      ● This problem can arise when the application packs lots of rarely used data into its frequently accessed documents.
      ● Every read of such a document pulls the rarely used data into the cache too, which makes it more likely that other important data is evicted.
      ● Multiply this across a collection and the net result is that the server could be paging a lot more data than necessary in order to service the application.
  14. Working Set
      The Working Set is the size of:
      ● Our data (only the most frequently accessed part)
      plus
      ● Our indexes
      The Working Set must fit in RAM!
  15. Working Set
      The Working Set does not fit in RAM, what should I do?
      ● Add more RAM to our machine
      ● Shard
      ● Reduce the size of our Working Set:
        ○ Limit our arrays
        ○ Limit our embedded documents
        ○ …
        ○ Benefits:
          ■ Fast data retrieval
          ■ One query brings all the information needed
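As a rough illustration of the working set rule, the check below adds the most accessed data to the indexes and compares the sum with the available RAM. All sizes are hypothetical, the function names are assumptions made for this sketch, and the calculation deliberately ignores the memory that the operating system and the server process need for themselves.

```javascript
const GiB = 1024 ** 3; // bytes in one gibibyte

// Working set = most accessed data + indexes (the definition used in the slides).
function workingSetBytes(hotDataBytes, indexBytes) {
  return hotDataBytes + indexBytes;
}

// Simplified check: true when the working set is no larger than the given RAM.
function fitsInRam(hotDataBytes, indexBytes, ramBytes) {
  return workingSetBytes(hotDataBytes, indexBytes) <= ramBytes;
}

const ram = 16 * GiB; // hypothetical machine

// 12 GiB of hot data + 6 GiB of indexes = 18 GiB, which exceeds 16 GiB.
console.log(fitsInRam(12 * GiB, 6 * GiB, ram)); // false

// After limiting arrays and embedded documents: 9 GiB + 6 GiB = 15 GiB.
console.log(fitsInRam(9 * GiB, 6 * GiB, ram)); // true
```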
  16. Historic Information
      ● When data grows continuously (historical data) and we embed it in our main collection, each document ends up holding a lot of information that is rarely needed. We may still want to store it for analytics purposes, so we keep it out of the user document.
      ● This does not apply to information with limited growth (addresses, phone numbers, etc.).
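One way to picture this split is sketched below in plain JavaScript. The `logins` field, the sample values and the `splitHistory` helper are all hypothetical; they only illustrate moving an unbounded history out of the user document while bounded data stays embedded.

```javascript
// Hypothetical user document that has accumulated a login history.
const user = {
  _id: 1,
  name: 'Rick',
  phone_numbers: ['555-111-1234'], // limited growth: stays embedded
  logins: ['2017-08-01', '2017-08-02', '2017-08-03'], // grows without bound
};

// Hypothetical helper: removes the history field from the document and
// returns one separate document per history entry.
function splitHistory(doc, historyField) {
  const { [historyField]: history = [], ...slimDoc } = doc;
  const historyDocs = history.map((value) => ({ user_id: doc._id, value }));
  return { slimDoc, historyDocs };
}

const { slimDoc, historyDocs } = splitHistory(user, 'logins');
console.log(slimDoc);
console.log(historyDocs.length);
```

The slim document is what the application reads on every request; the history documents would live in a second collection used for analytics.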
  17. 1-1
      id  name  phone_number  zip_code
      1   Rick  555-111-1234  01209
      2   Mike  555-222-2345  30062

      Users
      { _id : 1, name : 'Rick', phone_number : '555-111-1234', zip_code : '01209' }
      { _id : 2, name : 'Mike', phone_number : '555-222-2345', zip_code : '30062' }
  18. 1-Few
      ● Referencing (or Normalization)
        ○ To show a user's information we need joins (or more than one query). This implies random seeks, a very low-performance operation!
      ● Embedding (or Denormalization)
        ○ We can avoid joins via denormalization. This implies redundant data and more complex application logic to avoid inconsistencies.
        ○ Arrays let us avoid that redundancy, and this solution brings performance benefits.
        ○ With denormalization there are many possible data models, which makes it harder to settle on one.
  19. 1-Few
      id  name  zip_code
      1   Rick  01209
      2   Mike  30062

      id  user_id  phone_number
      1   1        555-111-1234
      2   2        555-222-2345
      3   2        555-333-3456
  20. 1-Few (MongoDB-Embedding)
      ● The approach that gives us the best performance and data consistency guarantees.
      ● Locality: MongoDB stores documents contiguously on disk, so putting all the data you need into one document means that you are never more than one seek away from everything you need.
      ● Atomicity and Isolation: by embedding we get atomicity (transactionality), because a write to a single document is atomic.

      { _id : 2, name : 'Mike', zip_code : '30062', phone_numbers : [ '555-222-2345', '555-333-3456' ] }
  21. 1-Few (MongoDB-Referencing)
      { _id : 2, name : 'Mike', zip_code : '30062', phone_numbers : [ 2, 3 ] }

      { _id : 2, user_id : 2, phone_number : '555-222-2345' }
      { _id : 3, user_id : 2, phone_number : '555-333-3456' }

      ● With referencing we lose transactionality.
      ● We need:
        ○ More than one query
        ○ To use $lookup (joins)
      ● This approach performs worse than embedding.
      ● If we have to read our data frequently, it is better to embed it.
      ● In exchange, it gives us flexibility to project only the desired fields.
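The extra round trip that referencing costs can be made visible with a small simulation. The two `Map` objects stand in for the two collections, and the `queries` counter stands in for round trips to the server; none of this is the MongoDB driver API, and every function name here is an assumption made for the sketch.

```javascript
// In-memory stand-ins for the two collections shown on the slide.
const users = new Map([
  [2, { _id: 2, name: 'Mike', zip_code: '30062', phone_numbers: [2, 3] }],
]);
const phones = new Map([
  [2, { _id: 2, user_id: 2, phone_number: '555-222-2345' }],
  [3, { _id: 3, user_id: 2, phone_number: '555-333-3456' }],
]);

let queries = 0; // counts simulated round trips

function findUser(id) {
  queries += 1;
  return users.get(id);
}

function findPhonesByIds(ids) {
  queries += 1; // a single query that filters on a list of _id values
  return ids.map((id) => phones.get(id)).filter(Boolean);
}

// Resolves the references so the caller sees the actual phone numbers.
function userWithPhones(id) {
  const user = findUser(id);
  const numbers = findPhonesByIds(user.phone_numbers).map((p) => p.phone_number);
  return { ...user, phone_numbers: numbers };
}

const mike = userWithPhones(2);
console.log(mike.phone_numbers, 'queries:', queries);
```

With the embedded model from the previous slide the same information comes back from the first lookup alone, so the counter would stop at 1 instead of 2.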
  22. N-1
      { _id : 2, name : 'Mike', zip_code : '30062', phone_numbers : [ 2, 3 ], address : '13, Rue del Percebe' }
      { _id : 1, name : 'Rick', zip_code : '01209', phone_numbers : [ 1 ], address : '13, Rue del Percebe' }

      What if two people share an address?
      ● Does that mean that you have to store the address twice? Yes, you do have to store it twice, three times, etc.
      ● This is better than making unnecessary joins. The extra disk space you are going to need will make your queries faster.
  23. 1-Many
      Case: A blog with hundreds, or even thousands, of comments for a given post.
      Embedding carries significant penalties:
      ● The larger a document is, the more RAM it uses. The fewer documents fit in RAM, the more likely the server is to page fault to retrieve documents, and ultimately page faults lead to random disk I/O.
      ● Growing documents must eventually be copied to larger spaces.
      ● The document never stops growing.
      ● MongoDB documents have a hard size limit of 16 MB.
      Referencing:
      ● The document will not grow because we will have one document per comment in a second collection.
      ● Suited to one-to-many relationships with very high or unpredictable cardinality.
      Solution: We may only wish to display the first three comments when showing a blog entry. Loading more than that simply wastes RAM.
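A possible reading of this solution is a hybrid: every comment gets its own document in a second collection, and the post embeds only a bounded preview. The sketch below models that with plain arrays and objects. The `first_comments` and `comment_count` fields and the `addComment` helper are hypothetical names, not something the slides prescribe.

```javascript
const PREVIEW_SIZE = 3; // the "first three comments" mentioned on the slide

// Hypothetical helper: stores every comment as its own document and
// embeds only a bounded preview in the post document.
function addComment(post, commentsCollection, text) {
  const comment = { _id: commentsCollection.length + 1, post_id: post._id, text };
  commentsCollection.push(comment);
  if (post.first_comments.length < PREVIEW_SIZE) {
    post.first_comments.push({ _id: comment._id, text });
  }
  post.comment_count += 1;
  return comment;
}

const post = { _id: 1, title: 'Schema design', first_comments: [], comment_count: 0 };
const comments = []; // stands in for the second collection

for (let i = 1; i <= 1000; i += 1) {
  addComment(post, comments, `comment ${i}`);
}

console.log(post.first_comments.length, post.comment_count, comments.length); // 3 1000 1000
```

The post document stays small however many comments arrive, and showing the blog entry still needs only one read.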
  24. Many-Many
      ● We will embed a list of _id values in both directions
      ● We no longer have redundant information

      Product
      { _id : 'My product', category_ids : [ 'My category',... ] }
      Category
      { _id : 'My category', product_ids : [ 'My product', … ] }
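Keeping the two _id lists in step is the application's job. The sketch below shows one way to do it over in-memory `Map` objects that stand in for the two collections; `link` and `productsOf` are hypothetical helper names, and this is not MongoDB driver code.

```javascript
// In-memory stand-ins for the Product and Category collections.
const products = new Map([['My product', { _id: 'My product', category_ids: [] }]]);
const categories = new Map([['My category', { _id: 'My category', product_ids: [] }]]);

// Hypothetical helper: records the relationship on both sides, without duplicates.
function link(productId, categoryId) {
  const product = products.get(productId);
  const category = categories.get(categoryId);
  if (!product.category_ids.includes(categoryId)) product.category_ids.push(categoryId);
  if (!category.product_ids.includes(productId)) category.product_ids.push(productId);
}

// Resolves a category's products by following its embedded list of _id values.
function productsOf(categoryId) {
  return categories.get(categoryId).product_ids.map((id) => products.get(id));
}

link('My product', 'My category');
link('My product', 'My category'); // linking twice leaves a single entry on each side

console.log(productsOf('My category'));
```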
  25. Recap
      ● Avoid round trips to the database.
      ● User events should only generate a small number of queries.
      ● Use arrays where they fit, provided they will not grow indefinitely.
      ● Don't just migrate relational schemas.
      ● Data that is queried together should be in the same document whenever possible.
      ● Store the last login time, plus the shopping cart, in the user document, since that is all we need for the landing page.
      ● Embedding for performance and atomicity (transactionality).
      ● Referencing for huge relationships.
      Ultimately, the decision depends on the access patterns of your application.
  26. Questions?
  27. Thank you!
      Thank you for your attention!
