SlideShare a Scribd company logo
1 of 22
MongoDB Schema
Design Patterns
Presented By :
Kalpit Pandit
panditkalpit@gmail.com
Agenda
 Schema Pattern?
 Data Modelling
 Design Patterns
 Approximation
 Computed
 Bucket
 Extended Reference
 Outlier
 Subset
panditkalpit@gmail.com
What are the Schema Patterns?
 Schema patterns are very similar to Software Design Patterns but focused on designing the
schema.
 These are some building blocks identified by developers upon years of experience which
helps in designing the schema of tables/collections in a way such that the application
becomes more scalable in terms of reads/writes.
 There are no perfect schema patterns. These are just guidelines and some industry standards
you can follow to build a better schema.
 You can also use multiple schema patterns together to solve one particular use-case.
 Let me reiterate this again there is no pattern which can be applied as plug and play. You
need to modify it to fit your use-case.
panditkalpit@gmail.com
Data Modeling
Embedding
Referencing
panditkalpit@gmail.com
Design Patterns
panditkalpit@gmail.com
Approximation Pattern
 This pattern can be used when the use-case is write-intensive but the correctness of the result
set is not of priority.
 let's say we need to implement a feature which says the number of people viewed that
product like a view counter and the counter logically need not be accurate it can be an
approximate i.e instead of 1878 it can be 1800 which is also fine since the accuracy is not
critical and can survive with approximate data.
 instead of updating the counter in the database with every visit, we could build in a counter
and only update by 50.
panditkalpit@gmail.com
Approximatio
n Pattern
 Useful when
 expensive calculations are frequently done
 the precision of those calculations is not the highest
priority
 Pros
 fewer writes to the database
 Reduction in CPU workload for frequent computations
 Cons
 exact numbers aren’t being represented
 implementation must be done in the application
 Example
 Visitor Count
 Total Product views
panditkalpit@gmail.com
Computed Pattern
 Have you ever wondered, how can
YouTube get the real time total views of
a video?
 The video above has 13B views, but if
you were to count that in any DB using
db.getCollection("video- views")
.find({}).count();
 The query above would take ages to
count the total views of the video. But
on YouTube you can see the total view in
an instant right.
 So let me introduce you to the
computed value pattern.
panditkalpit@gmail.com
Computed
Pattern
 Useful when
 Very read-intensive data access patterns
 Data needs to be repeatedly computed by the application
 Computation done in conjunction with any update or at
defined intervals - every hour for example
 Pros
 Reduction in CPU workload for frequent computations
 Cons
 It may be difficult to identify the need for this pattern
 Examples
 Revenue or viewer
 Time series data
 Product catalogues
panditkalpit@gmail.com
Bucket Pattern
 With data coming in as a stream over a
period of time (time series data) we may be
inclined to store each measurement in its
own document, as if we were using a
relational database.
 We could end up having to index sensor_id
and timestamp for every single measurement
to enable rapid access.
 We can "bucket" this data, by time, into
documents that hold the measurements from
a particular time span. We can also
programmatically add additional information
to each of these "buckets".
 Benefits in terms of index size savings,
potential query simplification, and the ability
to use that pre -aggregated data in our
documents.
panditkalpit@gmail.com
Bucket
Pattern
 Useful when
 needing to manage streaming data
 time-series
 real-time analytics
 Internet of Things (IoT)
 Pros
 reduces the overall number of documents in a collection
 improves index performance
 can simplify data access by leveraging pre -aggregation,
e.g., average temperature = sum/count
 Example
 IoT, time series
panditkalpit@gmail.com
Extended Reference
 In an e-commerce application order,
customer and the inventory are separate
logical entities.
 However, the full retrieval of an order requires
to join data from different entities.
 A customer can have N orders, creating a 1-N
relationship.
 Embedding all the customer information
inside each order
 reduce the JOIN operation
 results in a lot of duplicated information
 not all the customer data may be
needed.
panditkalpit@gmail.com
Extended Reference
 Instead of embedding (i.e.
duplicating) all the data of an
external entity (i.e. another
document), we only copy the fields
we access frequently.
 Instead of including a reference to
join the information, we only
embed those fields of the highest
priority and most frequently
accessed.
panditkalpit@gmail.com
Extended
Reference
 Useful when
 your application is experiencing lots of JOIN operations to
bring together frequently accessed data
 Pros
 improves performance when there are a lot of join
operations
 faster reads and a reduction in the complexity of data
fetching
 Sometimes duplication of data is better because you keep
the historical values (e.g., shipping address of the order)
 Cons
 data duplication, it works best if such data rarely change
(e.g., user-id, name)
panditkalpit@gmail.com
Outlier Pattern
 With the Outlier Pattern, we are working to
prevent a few queries or documents
driving our solution towards one that
would not be optimal for the majority of
our use cases
 E-Commerce selling books
 who has purchased a particular book?
 store an array of user_id who purchased
the book, in each book document
 You have a solution that works for 99.99%
of the cases, but what happens when a
top-seller book is released? o You cannot
store millions of user_ids due to the
document size limit (16 MB)
panditkalpit@gmail.com
Outlier
 Add a new field to "flag" the
document as an outlier, e.g.,
“has_extras”
 Move the overflow information into
a separate document linked with
the book's id.
 Inside the application, we would be
able to easily determine if a
document “has extras”.
 Only in such outlier cases, the
application would retrieve the extra
information.
panditkalpit@gmail.com
Outlier
Pattern
 Useful when
 few queries or documents that don’t fit into the rest of your
typical data patterns
 Pros
 prevents a few documents or queries from determining an
application’s solution.
 queries are tailored for “typical” use cases, but outliers are still
addressed
 Cons
 often tailored for specific queries, therefore ad hoc queries
may not perform well
 much of this pattern is done with application code
 Example
 social network relationships
 book sales
 movie reviews
panditkalpit@gmail.com
Subset Pattern
 When the working set of data and
indexes grows beyond the physical
RAM allotted, performance is reduced
as disk accesses starts to occur and
data rolls out of RAM
 add more RAM to the server
 sharding our collection, but that
comes with additional costs and
complexities
 reduce the size of our working set
with the Subset pattern
 Caused by large documents which have
a lot of data that isn't actually used by
the application
 e-commerce site that has a list of
reviews for a product. o accessing
that product's data, we'd only
need the most recent ten or so
reviews.
 pulling in the entirety of the
product data with all the reviews
could easily cause the working set
to uselessly expand
panditkalpit@gmail.com
Subset
Pattern
 Useful when
 the working set exceed the capacity of RAM due to large
documents that have much of the data in the document
not being used by the application
 Pros
 reduction in the overall size of the working set.
 shorter disk access time for the most frequently used data
 Cons
 we must manage the subset
 pulling in additional data requires additional trips to the
database
 Example
 reviews for a product
panditkalpit@gmail.com
References
 https://www.mongodb.com/blog/post/b
uilding-with-patterns-a-summary
 https://www.youtube.com/@MongoDB
 MongoDB Developer Day Conference
panditkalpit@gmail.com
Q&A
panditkalpit@gmail.com
Thank you
Kalpit Pandit
panditkalpit@gmail.com

More Related Content

Similar to MongoDb Schema Pattern - Kalpit Pandit.pptx

Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBMongoDB
 
Real Time Analytics
Real Time AnalyticsReal Time Analytics
Real Time AnalyticsMohsin Hakim
 
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...stevemcpherson
 
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida  Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida CLARA CAMPROVIN
 
Data warehouse pricing & cost: what you'll really spend
Data warehouse pricing & cost: what you'll really spendData warehouse pricing & cost: what you'll really spend
Data warehouse pricing & cost: what you'll really spendnoviari sugianto
 
MongoDB.local Seattle 2019: Advanced Schema Design Patterns
MongoDB.local Seattle 2019: Advanced Schema Design PatternsMongoDB.local Seattle 2019: Advanced Schema Design Patterns
MongoDB.local Seattle 2019: Advanced Schema Design PatternsMongoDB
 
Real Time Analytics
Real Time AnalyticsReal Time Analytics
Real Time AnalyticsMohsin Hakim
 
TechoERP.pdf
TechoERP.pdfTechoERP.pdf
TechoERP.pdfTechoERP
 
Thinking Outside the Cube: How In-Memory Bolsters Analytics
Thinking Outside the Cube: How In-Memory Bolsters AnalyticsThinking Outside the Cube: How In-Memory Bolsters Analytics
Thinking Outside the Cube: How In-Memory Bolsters AnalyticsInside Analysis
 
Teradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional ModelsTeradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional Modelspepeborja
 
Scaling Web Apps P Falcone
Scaling Web Apps P FalconeScaling Web Apps P Falcone
Scaling Web Apps P Falconejedt
 
Amplitude wave architecture - Test
Amplitude wave architecture - TestAmplitude wave architecture - Test
Amplitude wave architecture - TestKiran Naiga
 
Datalayer Best Practices with Observepoint
Datalayer Best Practices with ObservepointDatalayer Best Practices with Observepoint
Datalayer Best Practices with ObservepointMike Plant
 
Bodo Value Guide.pdf
Bodo Value Guide.pdfBodo Value Guide.pdf
Bodo Value Guide.pdfGregHanchin1
 
Implementing_S1000D_Best_Business_Practices_Means_Understanding_Your
Implementing_S1000D_Best_Business_Practices_Means_Understanding_YourImplementing_S1000D_Best_Business_Practices_Means_Understanding_Your
Implementing_S1000D_Best_Business_Practices_Means_Understanding_YourMichael Cook
 
Getting Started with ML in AdTech - AWS Online Tech Talks
Getting Started with ML in AdTech - AWS Online Tech TalksGetting Started with ML in AdTech - AWS Online Tech Talks
Getting Started with ML in AdTech - AWS Online Tech TalksAmazon Web Services
 
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBLa creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBMongoDB
 
MongoDB in a Mainframe World
MongoDB in a Mainframe WorldMongoDB in a Mainframe World
MongoDB in a Mainframe WorldMongoDB
 

Similar to MongoDb Schema Pattern - Kalpit Pandit.pptx (20)

Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDB
 
Real Time Analytics
Real Time AnalyticsReal Time Analytics
Real Time Analytics
 
Building a SaaS Style Application
Building a SaaS Style ApplicationBuilding a SaaS Style Application
Building a SaaS Style Application
 
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
 
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida  Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
 
Data warehouse pricing & cost: what you'll really spend
Data warehouse pricing & cost: what you'll really spendData warehouse pricing & cost: what you'll really spend
Data warehouse pricing & cost: what you'll really spend
 
MongoDB.local Seattle 2019: Advanced Schema Design Patterns
MongoDB.local Seattle 2019: Advanced Schema Design PatternsMongoDB.local Seattle 2019: Advanced Schema Design Patterns
MongoDB.local Seattle 2019: Advanced Schema Design Patterns
 
Real Time Analytics
Real Time AnalyticsReal Time Analytics
Real Time Analytics
 
TechoERP.pdf
TechoERP.pdfTechoERP.pdf
TechoERP.pdf
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
Thinking Outside the Cube: How In-Memory Bolsters Analytics
Thinking Outside the Cube: How In-Memory Bolsters AnalyticsThinking Outside the Cube: How In-Memory Bolsters Analytics
Thinking Outside the Cube: How In-Memory Bolsters Analytics
 
Teradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional ModelsTeradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional Models
 
Scaling Web Apps P Falcone
Scaling Web Apps P FalconeScaling Web Apps P Falcone
Scaling Web Apps P Falcone
 
Amplitude wave architecture - Test
Amplitude wave architecture - TestAmplitude wave architecture - Test
Amplitude wave architecture - Test
 
Datalayer Best Practices with Observepoint
Datalayer Best Practices with ObservepointDatalayer Best Practices with Observepoint
Datalayer Best Practices with Observepoint
 
Bodo Value Guide.pdf
Bodo Value Guide.pdfBodo Value Guide.pdf
Bodo Value Guide.pdf
 
Implementing_S1000D_Best_Business_Practices_Means_Understanding_Your
Implementing_S1000D_Best_Business_Practices_Means_Understanding_YourImplementing_S1000D_Best_Business_Practices_Means_Understanding_Your
Implementing_S1000D_Best_Business_Practices_Means_Understanding_Your
 
Getting Started with ML in AdTech - AWS Online Tech Talks
Getting Started with ML in AdTech - AWS Online Tech TalksGetting Started with ML in AdTech - AWS Online Tech Talks
Getting Started with ML in AdTech - AWS Online Tech Talks
 
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBLa creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDB
 
MongoDB in a Mainframe World
MongoDB in a Mainframe WorldMongoDB in a Mainframe World
MongoDB in a Mainframe World
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

MongoDb Schema Pattern - Kalpit Pandit.pptx

  • 1. MongoDB Schema Design Patterns Presented By : Kalpit Pandit panditkalpit@gmail.com
  • 2. Agenda  Schema Pattern?  Data Modelling  Design Patterns  Approximation  Computed  Bucket  Extended Reference  Outlier  Subset panditkalpit@gmail.com
  • 3. What are the Schema Patterns?  Schema patterns are very similar to Software Design Patterns but focused on designing the schema.  These are some building blocks identified by developers upon years of experience which helps in designing the schema of tables/collections in a way such that the application becomes more scalable in terms of reads/writes.  There are no perfect schema patterns. These are just guidelines and some industry standards you can follow to build a better schema.  You can also use multiple schema patterns together to solve one particular use-case.  Let me reiterate this again there is no pattern which can be applied as plug and play. You need to modify it to fit your use-case. panditkalpit@gmail.com
  • 6. Approximation Pattern  This pattern can be used when the use-case is write-intensive but the correctness of the result set is not of priority.  let's say we need to implement a feature which says the number of people viewed that product like a view counter and the counter logically need not be accurate it can be an approximate i.e instead of 1878 it can be 1800 which is also fine since the accuracy is not critical and can survive with approximate data.  instead of updating the counter in the database with every visit, we could build in a counter and only update by 50. panditkalpit@gmail.com
  • 7. Approximatio n Pattern  Useful when  expensive calculations are frequently done  the precision of those calculations is not the highest priority  Pros  fewer writes to the database  Reduction in CPU workload for frequent computations  Cons  exact numbers aren’t being represented  implementation must be done in the application  Example  Visitor Count  Total Product views panditkalpit@gmail.com
  • 8. Computed Pattern  Have you ever wondered, how can YouTube get the real time total views of a video?  The video above has 13B views, but if you were to count that in any DB using db.getCollection("video- views") .find({}).count();  The query above would take ages to count the total views of the video. But on YouTube you can see the total view in an instant right.  So let me introduce you to the computed value pattern. panditkalpit@gmail.com
  • 9. Computed Pattern  Useful when  Very read-intensive data access patterns  Data needs to be repeatedly computed by the application  Computation done in conjunction with any update or at defined intervals - every hour for example  Pros  Reduction in CPU workload for frequent computations  Cons  It may be difficult to identify the need for this pattern  Examples  Revenue or viewer  Time series data  Product catalogues panditkalpit@gmail.com
  • 10. Bucket Pattern  With data coming in as a stream over a period of time (time series data) we may be inclined to store each measurement in its own document, as if we were using a relational database.  We could end up having to index sensor_id and timestamp for every single measurement to enable rapid access.  We can "bucket" this data, by time, into documents that hold the measurements from a particular time span. We can also programmatically add additional information to each of these "buckets".  Benefits in terms of index size savings, potential query simplification, and the ability to use that pre -aggregated data in our documents. panditkalpit@gmail.com
  • 11. Bucket Pattern  Useful when  needing to manage streaming data  time-series  real-time analytics  Internet of Things (IoT)  Pros  reduces the overall number of documents in a collection  improves index performance  can simplify data access by leveraging pre -aggregation, e.g., average temperature = sum/count  Example  IoT, time series panditkalpit@gmail.com
  • 12. Extended Reference  In an e-commerce application order, customer and the inventory are separate logical entities.  However, the full retrieval of an order requires to join data from different entities.  A customer can have N orders, creating a 1-N relationship.  Embedding all the customer information inside each order  reduce the JOIN operation  results in a lot of duplicated information  not all the customer data may be needed. panditkalpit@gmail.com
  • 13. Extended Reference  Instead of embedding (i.e. duplicating) all the data of an external entity (i.e. another document), we only copy the fields we access frequently.  Instead of including a reference to join the information, we only embed those fields of the highest priority and most frequently accessed. panditkalpit@gmail.com
  • 14. Extended Reference  Useful when  your application is experiencing lots of JOIN operations to bring together frequently accessed data  Pros  improves performance when there are a lot of join operations  faster reads and a reduction in the complexity of data fetching  Sometimes duplication of data is better because you keep the historical values (e.g., shipping address of the order)  Cons  data duplication, it works best if such data rarely change (e.g., user-id, name) panditkalpit@gmail.com
  • 15. Outlier Pattern  With the Outlier Pattern, we are working to prevent a few queries or documents driving our solution towards one that would not be optimal for the majority of our use cases  E-Commerce selling books  who has purchased a particular book?  store an array of user_id who purchased the book, in each book document  You have a solution that works for 99.99% of the cases, but what happens when a top-seller book is released? o You cannot store millions of user_ids due to the document size limit (16 MB) panditkalpit@gmail.com
  • 16. Outlier  Add a new field to "flag" the document as an outlier, e.g., “has_extras”  Move the overflow information into a separate document linked with the book's id.  Inside the application, we would be able to easily determine if a document “has extras”.  Only in such outlier cases, the application would retrieve the extra information. panditkalpit@gmail.com
  • 17. Outlier Pattern  Useful when  few queries or documents that don’t fit into the rest of your typical data patterns  Pros  prevents a few documents or queries from determining an application’s solution.  queries are tailored for “typical” use cases, but outliers are still addressed  Cons  often tailored for specific queries, therefore ad hoc queries may not perform well  much of this pattern is done with application code  Example  social network relationships  book sales  movie reviews panditkalpit@gmail.com
  • 18. Subset Pattern  When the working set of data and indexes grows beyond the physical RAM allotted, performance is reduced as disk accesses starts to occur and data rolls out of RAM  add more RAM to the server  sharding our collection, but that comes with additional costs and complexities  reduce the size of our working set with the Subset pattern  Caused by large documents which have a lot of data that isn't actually used by the application  e-commerce site that has a list of reviews for a product. o accessing that product's data, we'd only need the most recent ten or so reviews.  pulling in the entirety of the product data with all the reviews could easily cause the working set to uselessly expand panditkalpit@gmail.com
  • 19. Subset Pattern  Useful when  the working set exceed the capacity of RAM due to large documents that have much of the data in the document not being used by the application  Pros  reduction in the overall size of the working set.  shorter disk access time for the most frequently used data  Cons  we must manage the subset  pulling in additional data requires additional trips to the database  Example  reviews for a product panditkalpit@gmail.com