SlideShare a Scribd company logo
1 of 22
MongoDB Schema
Design Patterns
Presented By :
Kalpit Pandit
panditkalpit@gmail.com
Agenda
 Schema Pattern?
 Data Modelling
 Design Patterns
 Approximation
 Computed
 Bucket
 Extended Reference
 Outlier
 Subset
panditkalpit@gmail.com
What are the Schema Patterns?
 Schema patterns are very similar to Software Design Patterns but focused on designing the
schema.
 These are some building blocks identified by developers upon years of experience which
helps in designing the schema of tables/collections in a way such that the application
becomes more scalable in terms of reads/writes.
 There are no perfect schema patterns. These are just guidelines and some industry standards
you can follow to build a better schema.
 You can also use multiple schema patterns together to solve one particular use-case.
 Let me reiterate this again there is no pattern which can be applied as plug and play. You
need to modify it to fit your use-case.
panditkalpit@gmail.com
Data Modeling
Embedding
Referencing
panditkalpit@gmail.com
Design Patterns
panditkalpit@gmail.com
Approximation Pattern
 This pattern can be used when the use-case is write-intensive but the correctness of the result
set is not of priority.
 let's say we need to implement a feature which says the number of people viewed that
product like a view counter and the counter logically need not be accurate it can be an
approximate i.e instead of 1878 it can be 1800 which is also fine since the accuracy is not
critical and can survive with approximate data.
 instead of updating the counter in the database with every visit, we could build in a counter
and only update by 50.
panditkalpit@gmail.com
Approximatio
n Pattern
 Useful when
 expensive calculations are frequently done
 the precision of those calculations is not the highest
priority
 Pros
 fewer writes to the database
 Reduction in CPU workload for frequent computations
 Cons
 exact numbers aren’t being represented
 implementation must be done in the application
 Example
 Visitor Count
 Total Product views
panditkalpit@gmail.com
Computed Pattern
 Have you ever wondered, how can
YouTube get the real time total views of
a video?
 The video above has 13B views, but if
you were to count that in any DB using
db.getCollection("video- views")
.find({}).count();
 The query above would take ages to
count the total views of the video. But
on YouTube you can see the total view in
an instant right.
 So let me introduce you to the
computed value pattern.
panditkalpit@gmail.com
Computed
Pattern
 Useful when
 Very read-intensive data access patterns
 Data needs to be repeatedly computed by the application
 Computation done in conjunction with any update or at
defined intervals - every hour for example
 Pros
 Reduction in CPU workload for frequent computations
 Cons
 It may be difficult to identify the need for this pattern
 Examples
 Revenue or viewer
 Time series data
 Product catalogues
panditkalpit@gmail.com
Bucket Pattern
 With data coming in as a stream over a
period of time (time series data) we may be
inclined to store each measurement in its
own document, as if we were using a
relational database.
 We could end up having to index sensor_id
and timestamp for every single measurement
to enable rapid access.
 We can "bucket" this data, by time, into
documents that hold the measurements from
a particular time span. We can also
programmatically add additional information
to each of these "buckets".
 Benefits in terms of index size savings,
potential query simplification, and the ability
to use that pre -aggregated data in our
documents.
panditkalpit@gmail.com
Bucket
Pattern
 Useful when
 needing to manage streaming data
 time-series
 real-time analytics
 Internet of Things (IoT)
 Pros
 reduces the overall number of documents in a collection
 improves index performance
 can simplify data access by leveraging pre -aggregation,
e.g., average temperature = sum/count
 Example
 IoT, time series
panditkalpit@gmail.com
Extended Reference
 In an e-commerce application order,
customer and the inventory are separate
logical entities.
 However, the full retrieval of an order requires
to join data from different entities.
 A customer can have N orders, creating a 1-N
relationship.
 Embedding all the customer information
inside each order
 reduce the JOIN operation
 results in a lot of duplicated information
 not all the customer data may be
needed.
panditkalpit@gmail.com
Extended Reference
 Instead of embedding (i.e.
duplicating) all the data of an
external entity (i.e. another
document), we only copy the fields
we access frequently.
 Instead of including a reference to
join the information, we only
embed those fields of the highest
priority and most frequently
accessed.
panditkalpit@gmail.com
Extended
Reference
 Useful when
 your application is experiencing lots of JOIN operations to
bring together frequently accessed data
 Pros
 improves performance when there are a lot of join
operations
 faster reads and a reduction in the complexity of data
fetching
 Sometimes duplication of data is better because you keep
the historical values (e.g., shipping address of the order)
 Cons
 data duplication, it works best if such data rarely change
(e.g., user-id, name)
panditkalpit@gmail.com
Outlier Pattern
 With the Outlier Pattern, we are working to
prevent a few queries or documents
driving our solution towards one that
would not be optimal for the majority of
our use cases
 E-Commerce selling books
 who has purchased a particular book?
 store an array of user_id who purchased
the book, in each book document
 You have a solution that works for 99.99%
of the cases, but what happens when a
top-seller book is released? o You cannot
store millions of user_ids due to the
document size limit (16 MB)
panditkalpit@gmail.com
Outlier
 Add a new field to "flag" the
document as an outlier, e.g.,
“has_extras”
 Move the overflow information into
a separate document linked with
the book's id.
 Inside the application, we would be
able to easily determine if a
document “has extras”.
 Only in such outlier cases, the
application would retrieve the extra
information.
panditkalpit@gmail.com
Outlier
Pattern
 Useful when
 few queries or documents that don’t fit into the rest of your
typical data patterns
 Pros
 prevents a few documents or queries from determining an
application’s solution.
 queries are tailored for “typical” use cases, but outliers are still
addressed
 Cons
 often tailored for specific queries, therefore ad hoc queries
may not perform well
 much of this pattern is done with application code
 Example
 social network relationships
 book sales
 movie reviews
panditkalpit@gmail.com
Subset Pattern
 When the working set of data and
indexes grows beyond the physical
RAM allotted, performance is reduced
as disk accesses starts to occur and
data rolls out of RAM
 add more RAM to the server
 sharding our collection, but that
comes with additional costs and
complexities
 reduce the size of our working set
with the Subset pattern
 Caused by large documents which have
a lot of data that isn't actually used by
the application
 e-commerce site that has a list of
reviews for a product. o accessing
that product's data, we'd only
need the most recent ten or so
reviews.
 pulling in the entirety of the
product data with all the reviews
could easily cause the working set
to uselessly expand
panditkalpit@gmail.com
Subset
Pattern
 Useful when
 the working set exceed the capacity of RAM due to large
documents that have much of the data in the document
not being used by the application
 Pros
 reduction in the overall size of the working set.
 shorter disk access time for the most frequently used data
 Cons
 we must manage the subset
 pulling in additional data requires additional trips to the
database
 Example
 reviews for a product
panditkalpit@gmail.com
References
 https://www.mongodb.com/blog/post/b
uilding-with-patterns-a-summary
 https://www.youtube.com/@MongoDB
 MongoDB Developer Day Conference
panditkalpit@gmail.com
Q&A
panditkalpit@gmail.com
Thank you
Kalpit Pandit
panditkalpit@gmail.com

More Related Content

Similar to MongoDb Schema Pattern - Kalpit Pandit.pptx

Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBMongoDB
 
Real Time Analytics
Real Time AnalyticsReal Time Analytics
Real Time AnalyticsMohsin Hakim
 
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...stevemcpherson
 
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida  Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida CLARA CAMPROVIN
 
Data warehouse pricing & cost: what you'll really spend
Data warehouse pricing & cost: what you'll really spendData warehouse pricing & cost: what you'll really spend
Data warehouse pricing & cost: what you'll really spendnoviari sugianto
 
MongoDB.local Seattle 2019: Advanced Schema Design Patterns
MongoDB.local Seattle 2019: Advanced Schema Design PatternsMongoDB.local Seattle 2019: Advanced Schema Design Patterns
MongoDB.local Seattle 2019: Advanced Schema Design PatternsMongoDB
 
Real Time Analytics
Real Time AnalyticsReal Time Analytics
Real Time AnalyticsMohsin Hakim
 
TechoERP.pdf
TechoERP.pdfTechoERP.pdf
TechoERP.pdfTechoERP
 
Thinking Outside the Cube: How In-Memory Bolsters Analytics
Thinking Outside the Cube: How In-Memory Bolsters AnalyticsThinking Outside the Cube: How In-Memory Bolsters Analytics
Thinking Outside the Cube: How In-Memory Bolsters AnalyticsInside Analysis
 
Teradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional ModelsTeradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional Modelspepeborja
 
Scaling Web Apps P Falcone
Scaling Web Apps P FalconeScaling Web Apps P Falcone
Scaling Web Apps P Falconejedt
 
Amplitude wave architecture - Test
Amplitude wave architecture - TestAmplitude wave architecture - Test
Amplitude wave architecture - TestKiran Naiga
 
Datalayer Best Practices with Observepoint
Datalayer Best Practices with ObservepointDatalayer Best Practices with Observepoint
Datalayer Best Practices with ObservepointMike Plant
 
Bodo Value Guide.pdf
Bodo Value Guide.pdfBodo Value Guide.pdf
Bodo Value Guide.pdfGregHanchin1
 
Implementing_S1000D_Best_Business_Practices_Means_Understanding_Your
Implementing_S1000D_Best_Business_Practices_Means_Understanding_YourImplementing_S1000D_Best_Business_Practices_Means_Understanding_Your
Implementing_S1000D_Best_Business_Practices_Means_Understanding_YourMichael Cook
 
Getting Started with ML in AdTech - AWS Online Tech Talks
Getting Started with ML in AdTech - AWS Online Tech TalksGetting Started with ML in AdTech - AWS Online Tech Talks
Getting Started with ML in AdTech - AWS Online Tech TalksAmazon Web Services
 
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBLa creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBMongoDB
 
MongoDB in a Mainframe World
MongoDB in a Mainframe WorldMongoDB in a Mainframe World
MongoDB in a Mainframe WorldMongoDB
 

Similar to MongoDb Schema Pattern - Kalpit Pandit.pptx (20)

Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDB
 
Real Time Analytics
Real Time AnalyticsReal Time Analytics
Real Time Analytics
 
Building a SaaS Style Application
Building a SaaS Style ApplicationBuilding a SaaS Style Application
Building a SaaS Style Application
 
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
 
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida  Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
Jet Reports es la herramienta para construir el mejor BI y de forma mas rapida
 
Data warehouse pricing & cost: what you'll really spend
Data warehouse pricing & cost: what you'll really spendData warehouse pricing & cost: what you'll really spend
Data warehouse pricing & cost: what you'll really spend
 
MongoDB.local Seattle 2019: Advanced Schema Design Patterns
MongoDB.local Seattle 2019: Advanced Schema Design PatternsMongoDB.local Seattle 2019: Advanced Schema Design Patterns
MongoDB.local Seattle 2019: Advanced Schema Design Patterns
 
Real Time Analytics
Real Time AnalyticsReal Time Analytics
Real Time Analytics
 
TechoERP.pdf
TechoERP.pdfTechoERP.pdf
TechoERP.pdf
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
Thinking Outside the Cube: How In-Memory Bolsters Analytics
Thinking Outside the Cube: How In-Memory Bolsters AnalyticsThinking Outside the Cube: How In-Memory Bolsters Analytics
Thinking Outside the Cube: How In-Memory Bolsters Analytics
 
Teradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional ModelsTeradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional Models
 
Scaling Web Apps P Falcone
Scaling Web Apps P FalconeScaling Web Apps P Falcone
Scaling Web Apps P Falcone
 
Amplitude wave architecture - Test
Amplitude wave architecture - TestAmplitude wave architecture - Test
Amplitude wave architecture - Test
 
Datalayer Best Practices with Observepoint
Datalayer Best Practices with ObservepointDatalayer Best Practices with Observepoint
Datalayer Best Practices with Observepoint
 
Bodo Value Guide.pdf
Bodo Value Guide.pdfBodo Value Guide.pdf
Bodo Value Guide.pdf
 
Implementing_S1000D_Best_Business_Practices_Means_Understanding_Your
Implementing_S1000D_Best_Business_Practices_Means_Understanding_YourImplementing_S1000D_Best_Business_Practices_Means_Understanding_Your
Implementing_S1000D_Best_Business_Practices_Means_Understanding_Your
 
Getting Started with ML in AdTech - AWS Online Tech Talks
Getting Started with ML in AdTech - AWS Online Tech TalksGetting Started with ML in AdTech - AWS Online Tech Talks
Getting Started with ML in AdTech - AWS Online Tech Talks
 
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBLa creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDB
 
MongoDB in a Mainframe World
MongoDB in a Mainframe WorldMongoDB in a Mainframe World
MongoDB in a Mainframe World
 

Recently uploaded

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

MongoDb Schema Pattern - Kalpit Pandit.pptx

  • 1. MongoDB Schema Design Patterns Presented By : Kalpit Pandit panditkalpit@gmail.com
  • 2. Agenda  Schema Pattern?  Data Modelling  Design Patterns  Approximation  Computed  Bucket  Extended Reference  Outlier  Subset panditkalpit@gmail.com
  • 3. What are the Schema Patterns?  Schema patterns are very similar to Software Design Patterns but focused on designing the schema.  These are some building blocks identified by developers upon years of experience which helps in designing the schema of tables/collections in a way such that the application becomes more scalable in terms of reads/writes.  There are no perfect schema patterns. These are just guidelines and some industry standards you can follow to build a better schema.  You can also use multiple schema patterns together to solve one particular use-case.  Let me reiterate this again there is no pattern which can be applied as plug and play. You need to modify it to fit your use-case. panditkalpit@gmail.com
  • 6. Approximation Pattern  This pattern can be used when the use-case is write-intensive but the correctness of the result set is not of priority.  let's say we need to implement a feature which says the number of people viewed that product like a view counter and the counter logically need not be accurate it can be an approximate i.e instead of 1878 it can be 1800 which is also fine since the accuracy is not critical and can survive with approximate data.  instead of updating the counter in the database with every visit, we could build in a counter and only update by 50. panditkalpit@gmail.com
  • 7. Approximatio n Pattern  Useful when  expensive calculations are frequently done  the precision of those calculations is not the highest priority  Pros  fewer writes to the database  Reduction in CPU workload for frequent computations  Cons  exact numbers aren’t being represented  implementation must be done in the application  Example  Visitor Count  Total Product views panditkalpit@gmail.com
  • 8. Computed Pattern  Have you ever wondered, how can YouTube get the real time total views of a video?  The video above has 13B views, but if you were to count that in any DB using db.getCollection("video- views") .find({}).count();  The query above would take ages to count the total views of the video. But on YouTube you can see the total view in an instant right.  So let me introduce you to the computed value pattern. panditkalpit@gmail.com
  • 9. Computed Pattern  Useful when  Very read-intensive data access patterns  Data needs to be repeatedly computed by the application  Computation done in conjunction with any update or at defined intervals - every hour for example  Pros  Reduction in CPU workload for frequent computations  Cons  It may be difficult to identify the need for this pattern  Examples  Revenue or viewer  Time series data  Product catalogues panditkalpit@gmail.com
  • 10. Bucket Pattern  With data coming in as a stream over a period of time (time series data) we may be inclined to store each measurement in its own document, as if we were using a relational database.  We could end up having to index sensor_id and timestamp for every single measurement to enable rapid access.  We can "bucket" this data, by time, into documents that hold the measurements from a particular time span. We can also programmatically add additional information to each of these "buckets".  Benefits in terms of index size savings, potential query simplification, and the ability to use that pre -aggregated data in our documents. panditkalpit@gmail.com
  • 11. Bucket Pattern  Useful when  needing to manage streaming data  time-series  real-time analytics  Internet of Things (IoT)  Pros  reduces the overall number of documents in a collection  improves index performance  can simplify data access by leveraging pre -aggregation, e.g., average temperature = sum/count  Example  IoT, time series panditkalpit@gmail.com
  • 12. Extended Reference  In an e-commerce application order, customer and the inventory are separate logical entities.  However, the full retrieval of an order requires to join data from different entities.  A customer can have N orders, creating a 1-N relationship.  Embedding all the customer information inside each order  reduce the JOIN operation  results in a lot of duplicated information  not all the customer data may be needed. panditkalpit@gmail.com
  • 13. Extended Reference  Instead of embedding (i.e. duplicating) all the data of an external entity (i.e. another document), we only copy the fields we access frequently.  Instead of including a reference to join the information, we only embed those fields of the highest priority and most frequently accessed. panditkalpit@gmail.com
  • 14. Extended Reference  Useful when  your application is experiencing lots of JOIN operations to bring together frequently accessed data  Pros  improves performance when there are a lot of join operations  faster reads and a reduction in the complexity of data fetching  Sometimes duplication of data is better because you keep the historical values (e.g., shipping address of the order)  Cons  data duplication, it works best if such data rarely change (e.g., user-id, name) panditkalpit@gmail.com
  • 15. Outlier Pattern  With the Outlier Pattern, we are working to prevent a few queries or documents driving our solution towards one that would not be optimal for the majority of our use cases  E-Commerce selling books  who has purchased a particular book?  store an array of user_id who purchased the book, in each book document  You have a solution that works for 99.99% of the cases, but what happens when a top-seller book is released? o You cannot store millions of user_ids due to the document size limit (16 MB) panditkalpit@gmail.com
  • 16. Outlier  Add a new field to "flag" the document as an outlier, e.g., “has_extras”  Move the overflow information into a separate document linked with the book's id.  Inside the application, we would be able to easily determine if a document “has extras”.  Only in such outlier cases, the application would retrieve the extra information. panditkalpit@gmail.com
  • 17. Outlier Pattern  Useful when  few queries or documents that don’t fit into the rest of your typical data patterns  Pros  prevents a few documents or queries from determining an application’s solution.  queries are tailored for “typical” use cases, but outliers are still addressed  Cons  often tailored for specific queries, therefore ad hoc queries may not perform well  much of this pattern is done with application code  Example  social network relationships  book sales  movie reviews panditkalpit@gmail.com
  • 18. Subset Pattern  When the working set of data and indexes grows beyond the physical RAM allotted, performance is reduced as disk accesses starts to occur and data rolls out of RAM  add more RAM to the server  sharding our collection, but that comes with additional costs and complexities  reduce the size of our working set with the Subset pattern  Caused by large documents which have a lot of data that isn't actually used by the application  e-commerce site that has a list of reviews for a product. o accessing that product's data, we'd only need the most recent ten or so reviews.  pulling in the entirety of the product data with all the reviews could easily cause the working set to uselessly expand panditkalpit@gmail.com
  • 19. Subset Pattern  Useful when  the working set exceed the capacity of RAM due to large documents that have much of the data in the document not being used by the application  Pros  reduction in the overall size of the working set.  shorter disk access time for the most frequently used data  Cons  we must manage the subset  pulling in additional data requires additional trips to the database  Example  reviews for a product panditkalpit@gmail.com